The goal of this project is to create a multimodal dataset that is stored in a database and made accessible through a REST API. An iOS app uses this API to display the multimodal documents; the app can be controlled via eye tracking.
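A minimal sketch of how a client could fetch documents from the API is shown below; the base URL, endpoint path, and response fields are assumptions made for illustration, the actual routes are described in the wiki.

```python
# Illustrative client call; endpoint and fields are assumed, see the wiki
# for the documented routes.
import requests

BASE_URL = "http://localhost:8000"  # assumed local deployment

response = requests.get(f"{BASE_URL}/documents", timeout=10)
response.raise_for_status()

for doc in response.json():
    # Print a short summary of each multimodal document.
    print(doc.get("title"), "-", len(doc.get("sentences", [])), "sentences")
```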
Documents (e.g. books, Wikipedia articles, ...) can be processed to create a multimodal dataset. For this, the focus words of each sentence are identified. A focus word is a word that is both complex and depictable. Then, for every sentence with at least one focus word, an image is retrieved from the image dataset. In the best case, the image represents the focus word(s) as well as the context of the sentence. The next step is to save different versions of that image in which the focus word(s) are highlighted. A document containing at least one sentence that has at least one focus word and an image is saved to the database.
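The sketch below outlines this pipeline in Python. The word lists and helper functions (`is_focus_word`, `retrieve_image`) are illustrative placeholders only; the actual builder uses a trained complex word identifier, a visual concreteness measure, and CLIP-based image retrieval instead.

```python
# Hedged sketch of the focus-word pipeline; names and word lists are
# placeholders, not the project's actual implementation.
from dataclasses import dataclass, field
from typing import List, Optional

# Placeholder vocabularies; the real builder scores complexity and
# depictability with dedicated models instead of fixed lists.
COMPLEX_WORDS = {"photosynthesis", "glacier"}
DEPICTABLE_WORDS = {"glacier", "tree", "photosynthesis"}


def is_focus_word(word: str) -> bool:
    # A focus word is both complex and depictable.
    w = word.lower().strip(".,;:!?")
    return w in COMPLEX_WORDS and w in DEPICTABLE_WORDS


def retrieve_image(sentence: str, focus_words: List[str]) -> Optional[str]:
    # Placeholder: the real builder scores images from the image dataset
    # against the sentence and focus words and returns the best match.
    return "img_0001" if focus_words else None


@dataclass
class Sentence:
    text: str
    focus_words: List[str] = field(default_factory=list)
    image_id: Optional[str] = None


def process_document(sentences: List[str]) -> List[Sentence]:
    """Keep only sentences with at least one focus word and an image."""
    processed = []
    for text in sentences:
        focus = [w for w in text.split() if is_focus_word(w)]
        if not focus:
            continue
        image_id = retrieve_image(text, focus)
        if image_id is None:
            continue
        processed.append(Sentence(text, focus, image_id))
    # A document is only kept if at least one sentence survived.
    return processed


if __name__ == "__main__":
    doc = ["The glacier slowly carved the valley.", "It was cold."]
    for s in process_document(doc):
        print(s)
```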
A video of the app can be found in this repository.
Please have a look at the wiki for a detailed description of how to install and use the multimodal dataset builder, the API, and the frontend.
The wiki also covers the basic mechanisms and techniques. The documentation for the backend and the NLP pipeline can be found in the respective folders.
The iOS app is built using Apple's basic UI framework, UIKit. Therefore, we did not document the default mechanisms and functions, such as generating basic UI elements. Moreover, it is common in iOS development to use long signatures and variable names that describe the functions well enough, so less documentation is needed. However, the interesting part of the application is how the eye tracking works, which is done using multiple service classes. We documented them since they are not trivial. They can be found in the files located in the Multimodal Learning App/Multimodal Learning/Tools folder.
The multimodal dataset builder in particular builds on the work of others, which is referenced here:
- Idea, Pipeline and Parameters
- Simple Wikipedia Articles
- Images/Paper
- Complex Word Identifier
- Visual Concreteness and Implementation
- Image Retrieval/Paper
- Image Highlighting
For the multimodal dataset creation, we included the complex word identifier and miniCLIP in our code; that is why we want to explicitly point out how they are licensed. A short, illustrative sketch of CLIP-based sentence/image scoring can be found at the end of this section.
| Name | License | URL |
|---|---|---|
| Complex Word Identifier | Unknown | https://github.com/in2dblue/mastersThesis |
| miniCLIP | Apache 2.0 | https://github.com/HendrikStrobelt/miniClip |
The file "DEP_LICENSES.md" shows how the dependencies of the multimodal dataset builder are licensed.