The generalizability and few-shot capabilities of large language models (LLMs) like GPT-3 have opened up new possibilities for countless innovative applications. LLMs demonstrate impressive context and language understanding, which enables them to solve problems that were previously intractable with deep learning. This proficiency is based on the structure and knowledge extracted from a vast array of diverse and complex texts. Up to this point, the application of large-scale language (pre-)training to the construction of multimodal models has mostly been limited to specialized tasks, such as visual question answering or captioning, or has required expensive data annotation. These attempts thus failed to make convincing use of the potential and flexibility of large language models with their hundreds of billions of parameters. We succeeded in building a fully self-supervised multimodal model by combining an existing (self-built) multilingual LLM with a pre-trained image encoder. We will discuss our approach to augmenting generative language models with additional modalities using adapter-based finetuning. The language model weights remain unchanged during training, allowing the transfer of encyclopedic knowledge and in-context learning abilities from language pretraining. This approach makes multimodal enhancement efficient even for very large language models and adds world knowledge and context understanding previously seen only in language models. We will further discuss current work using the multimodal embedding for search, explainability and classification.
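To make the adapter idea in the abstract concrete, here is a minimal PyTorch sketch of the general technique: a frozen language model stack is wrapped with small trainable bottleneck adapters, and features from a frozen image encoder are projected into the token-embedding space and prepended to the text as a visual prefix. All module names, dimensions, and the toy layer stack below are illustrative assumptions, not the speakers' actual implementation.

```python
import torch
import torch.nn as nn

class Adapter(nn.Module):
    """Small bottleneck MLP inserted after a frozen sub-layer. The
    zero-initialized up-projection means training starts from the
    unmodified language model's behavior."""
    def __init__(self, d_model: int, bottleneck: int = 64):
        super().__init__()
        self.down = nn.Linear(d_model, bottleneck)
        self.up = nn.Linear(bottleneck, d_model)
        nn.init.zeros_(self.up.weight)
        nn.init.zeros_(self.up.bias)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Residual branch: adapter output is added to the frozen activations.
        return x + self.up(torch.relu(self.down(x)))

class AdaptedBlock(nn.Module):
    """Wraps one frozen transformer block with a trainable adapter."""
    def __init__(self, frozen_block: nn.Module, d_model: int):
        super().__init__()
        self.block = frozen_block
        self.adapter = Adapter(d_model)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.adapter(self.block(x))

d_model = 512
# Toy stand-in for a pretrained LLM layer stack; in practice this would be
# loaded from a checkpoint. It is kept completely frozen.
blocks = nn.ModuleList(
    nn.TransformerEncoderLayer(d_model, nhead=8, batch_first=True)
    for _ in range(4)
)
for p in blocks.parameters():
    p.requires_grad = False

model = nn.Sequential(*(AdaptedBlock(b, d_model) for b in blocks))

# Image features from a frozen encoder are projected into the embedding
# space and prepended to the text embeddings as a visual prefix.
image_proj = nn.Linear(1024, d_model)          # 1024 = image-encoder dim (illustrative)
image_features = torch.randn(2, 144, 1024)     # dummy batch of image features
text_embeddings = torch.randn(2, 32, d_model)  # dummy token embeddings
sequence = torch.cat([image_proj(image_features), text_embeddings], dim=1)
out = model(sequence)
print(out.shape)  # torch.Size([2, 176, 512])

# Only the adapters and the image projection receive gradients.
trainable = [p for p in model.parameters() if p.requires_grad]
optimizer = torch.optim.AdamW(trainable + list(image_proj.parameters()), lr=1e-4)
```

Because the frozen weights are untouched and the adapters start as identity mappings, the trainable parameter count stays tiny relative to the LLM, which is what makes this kind of multimodal finetuning tractable even at very large model sizes.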
Aleph Alpha is an artificial intelligence research & development company from Heidelberg, Germany. Aleph Alpha aims to revolutionize the accessibility and usability of AI towards an era of Transformative Artificial Intelligence in Europe.
Please help us plan ahead by registering for the event at our meetup event site.
After the event, there will be a social get-together with food and drinks courtesy of the Division of Medical Image Computing and Interactive Machine Learning Group at the DKFZ.
| What? | Human compatible world models across sizes, languages and modalities |
|---|---|
| Who? | Jonas Andrulis, Constantin Eichenberg, Robert Baldock; Aleph Alpha |
| When? | September 27th 2022 @ 6pm |
| Where? | DKFZ Communication Center (K1+K2), Im Neuenheimer Feld 280 |
| Registration | meetup event-site |
We are happy to see that so many of you are interested in this event!
To allow as many people as possible to attend, the following rules apply:
Attendance requires proof of 3G COVID status, and wearing an FFP2 mask is mandatory throughout the event.