Human compatible world models across sizes, languages and modalities

Jonas Andrulis, Constantin Eichenberg, Robert Baldock; Aleph Alpha

upcoming: September 27th 2022 @ 6pm
Did you know there is an AI startup working on "General Artificial Intelligence" right here in Heidelberg?
Yes, there is! And they will be joining us!

We are excited to have Jonas Andrulis, Constantin Eichenberg and Robert Baldock, the CEO and research scientists of the Heidelberg startup Aleph Alpha, present their work and insights on multimodal learners and world models in person at the DKFZ. As an example of their work towards making AI technologies accessible to a wide audience, this year they open-sourced MAGMA, a multimodal model that can process images and text in any combination.


The generalizability and few-shot capabilities of large language models (LLMs) like GPT-3 have opened up new possibilities for countless innovative applications. LLMs demonstrate an impressive understanding of context and language, which enables them to solve problems that were previously intractable with deep learning. This level of proficiency is based on the structure and knowledge extracted from a huge array of diverse and complex texts. Up to this point, the application of large-scale language (pre-)training to the construction of multimodal models has mostly been limited to specialized tasks, like visual QA or captioning, or has required expensive data annotation. These attempts thus failed to make convincing use of the potential and flexibility of large language models with their hundreds of billions of parameters. We succeeded in building a multimodal model trained in a fully self-supervised manner by combining an existing (self-built) multilingual LLM with a pre-trained image encoder. We will discuss our approach of augmenting generative language models with additional modalities using adapter-based finetuning. The language model weights remain unchanged during training, allowing for the transfer of encyclopedic knowledge and in-context learning abilities from language pretraining. This approach makes multimodal enhancement efficient even for very large language models and adds world knowledge and context understanding previously seen only in language models. We will further discuss current work using the multimodal embedding for search, explainability and classification.
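
To illustrate the idea of adapter-based finetuning with frozen language model weights described in the abstract, here is a minimal toy sketch in plain Python. It is not the speakers' implementation: the names (FrozenLayer, Adapter, projection, forward), the tiny dimensions, the single-vector image representation and the zero-initialised up-projection are all assumptions made for illustration, and attention is omitted entirely. It only shows the general pattern: image features are mapped into the language model's embedding space, prepended to the text embeddings, and small trainable adapters are added on top of layers whose weights are never updated.

```python
import random

def matvec(W, x):
    """Multiply matrix W (a list of rows) by vector x."""
    return [sum(w * v for w, v in zip(row, x)) for row in W]

def rand_matrix(rows, cols, rng, scale=0.1):
    """Small random matrix standing in for learned weights."""
    return [[rng.uniform(-scale, scale) for _ in range(cols)] for _ in range(rows)]

def relu(v):
    return [max(0.0, a) for a in v]

class FrozenLayer:
    """Hypothetical stand-in for one pretrained language-model layer (never updated)."""
    def __init__(self, dim, rng):
        self.W = rand_matrix(dim, dim, rng)

    def __call__(self, x):
        return relu(matvec(self.W, x))

class Adapter:
    """Bottleneck adapter with a residual connection: x + up(relu(down(x)))."""
    def __init__(self, dim, bottleneck, rng):
        self.down = rand_matrix(bottleneck, dim, rng)
        # Assumption: zero-initialised up-projection, so the adapter starts as the identity.
        self.up = [[0.0] * bottleneck for _ in range(dim)]

    def __call__(self, x):
        hidden = relu(matvec(self.down, x))
        return [a + b for a, b in zip(x, matvec(self.up, hidden))]

    def num_params(self):
        return sum(len(row) for row in self.down) + sum(len(row) for row in self.up)

def forward(image_features, text_embeddings, projection, layer, adapter):
    """Prepend the projected image vector to the text and apply frozen layer + adapter per position."""
    image_token = matvec(projection, image_features)
    sequence = [image_token] + text_embeddings
    return [adapter(layer(token)) for token in sequence]

rng = random.Random(0)
DIM, BOTTLENECK, IMG_DIM = 16, 2, 4

layer = FrozenLayer(DIM, rng)                # frozen: language-model weights
adapter = Adapter(DIM, BOTTLENECK, rng)      # trainable
projection = rand_matrix(DIM, IMG_DIM, rng)  # trainable: image features -> embedding space

image = [0.5] * IMG_DIM
text = [[0.1] * DIM, [0.2] * DIM]
outputs = forward(image, text, projection, layer, adapter)

frozen_params = DIM * DIM
trainable_params = adapter.num_params() + DIM * IMG_DIM
print("frozen parameters:", frozen_params)
print("trainable parameters:", trainable_params)
```

In this sketch only the adapter and the projection would receive gradient updates during training, so the layer's original behaviour, and with it the knowledge acquired during language pretraining, stays available. With the zero-initialised up-projection, the adapted model initially produces exactly the same outputs as the frozen layer alone.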


  • Jonas Andrulis, CEO and founder of Aleph Alpha
  • Constantin Eichenberg, Researcher at Aleph Alpha
  • Robert Baldock, Senior researcher at Aleph Alpha

Aleph Alpha is an artificial intelligence research & development company from Heidelberg, Germany. Aleph Alpha aims to revolutionize the accessibility and usability of AI towards an era of Transformative Artificial Intelligence in Europe.


Event Info

Please help us plan ahead by registering for the event on our meetup event-site.
After the event, there will be a social get-together with food and drinks courtesy of the Division of Medical Image Computing and Interactive Machine Learning Group at the DKFZ.

What? Human compatible world models across sizes, languages and modalities
Who? Jonas Andrulis, Constantin Eichenberg, Robert Baldock; Aleph Alpha
When? September 27th 2022 @ 6pm
Where? DKFZ Communication Center (K1+K2), Im Neuenheimer Feld 280
Registration meetup event-site

Corona rules

We are happy to see that so many of you are interested in this event! To allow as many people as possible to attend it, the following rules apply:
Attendance requires proof of 3G corona status, and wearing an FFP2 mask is mandatory throughout the whole event.