Google has released the Multimodal Live API for Gemini, an interface that lets developers build applications with real-time interactions. The API processes and responds to text, audio, and video input in real time, opening a new chapter in human-computer interaction: natural, dialogue-oriented communication with AI systems.
The Multimodal Live API is built on WebSockets to ensure low-latency server-to-server communication. It supports tool use, including function calling, code execution, and Google Search integration. These tools can be combined within a single request, enabling complex responses without requiring multiple prompts.
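To make the interaction model concrete, here is a minimal sketch of a text session, assuming the experimental google-genai Python SDK; the model name gemini-2.0-flash-exp and the v1alpha API version follow Google's launch documentation, and YOUR_API_KEY is a placeholder:

```python
import asyncio
from google import genai

# Assumes the experimental google-genai SDK and a valid API key.
client = genai.Client(api_key="YOUR_API_KEY",
                      http_options={"api_version": "v1alpha"})

MODEL_ID = "gemini-2.0-flash-exp"          # experimental Gemini 2.0 Flash
CONFIG = {"response_modalities": ["TEXT"]}  # request text responses

async def main():
    # connect() opens a WebSocket session to the Multimodal Live API.
    async with client.aio.live.connect(model=MODEL_ID, config=CONFIG) as session:
        await session.send(input="Hello, Gemini!", end_of_turn=True)
        # Stream the model's reply as it arrives over the socket.
        async for response in session.receive():
            if response.text:
                print(response.text, end="")

asyncio.run(main())
```

The same session object also accepts audio and video input, so one WebSocket connection can carry the entire conversation.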
Key features of the Multimodal Live API include:

- Bidirectional streaming of text, audio, and video over a single WebSocket connection
- Low-latency responses suitable for natural, real-time conversation
- Built-in tool use: function calling, code execution, and Google Search
- Multimodal output, including natively generated audio
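As an illustration of combining these tools, the following configuration sketch declares Google Search, code execution, and a custom function in a single session setup. The overall structure follows Google's launch examples, but the set_light_color function is purely hypothetical:

```python
# Sketch of a Live API session config combining several tools in one
# request. The hypothetical set_light_color function stands in for any
# custom function the application exposes to the model.
CONFIG = {
    "tools": [
        {"google_search": {}},   # ground answers with live search results
        {"code_execution": {}},  # let the model run generated code
        {"function_declarations": [{
            "name": "set_light_color",  # hypothetical custom function
            "description": "Set the color of a smart light.",
            "parameters": {
                "type": "OBJECT",
                "properties": {
                    "rgb_hex": {"type": "STRING"},
                },
            },
        }]},
    ],
}
```

When the model decides to call the custom function, the session emits a tool-call message over the same WebSocket connection, and the client sends back the function's result to continue the turn.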
The Multimodal Live API opens up a variety of application possibilities for real-time interaction, from natural voice conversations to applications that respond to live audio and video streams.
Google offers developers several resources for experimenting with the Multimodal Live API and building their own applications, including the API reference and documentation on ai.google.dev.
The Multimodal Live API is currently available with the experimental Gemini 2.0 Flash model. Google plans to make the API generally available in January 2025 and to offer additional model sizes.
Gemini 2.0 is Google's latest generation of AI models, designed for the agentic era. With advances in multimodality, such as native image and audio output, and native tool use, Gemini 2.0 enables the development of new AI agents that move closer to the vision of a universal assistant.
Gemini 2.0 Flash, the first model in the Gemini 2.0 family, offers improved performance with fast response times: it outperforms Gemini 1.5 Pro on key benchmarks while running twice as fast. In addition to multimodal inputs, Gemini 2.0 Flash supports multimodal outputs, such as natively generated images and steerable text-to-speech audio. It can also natively call tools such as Google Search, code execution, and user-defined functions.
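To show what the native audio output looks like from the client side, here is a hedged sketch of a Live API session that requests spoken responses. The speech_config structure and the prebuilt voice name Kore follow Google's launch-time documentation but should be treated as assumptions:

```python
import asyncio
from google import genai

client = genai.Client(api_key="YOUR_API_KEY",
                      http_options={"api_version": "v1alpha"})

CONFIG = {
    "generation_config": {
        "response_modalities": ["AUDIO"],  # ask for speech instead of text
        "speech_config": {
            "voice_config": {
                # "Kore" is one of the prebuilt voices listed at launch.
                "prebuilt_voice_config": {"voice_name": "Kore"}
            }
        },
    }
}

async def speak():
    async with client.aio.live.connect(model="gemini-2.0-flash-exp",
                                       config=CONFIG) as session:
        await session.send(input="Read me a haiku about the sea.",
                           end_of_turn=True)
        async for response in session.receive():
            # Audio arrives as raw 16-bit PCM chunks (24 kHz at launch).
            if response.data:
                ...  # hand the bytes to an audio player of your choice

asyncio.run(speak())
```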
Sources:
https://ai.google.dev/api/multimodal-live
https://developers.googleblog.com/en/gemini-2-0-level-up-your-apps-with-real-time-multimodal-interactions/
https://blog.google/technology/google-deepmind/google-gemini-ai-update-december-2024/
https://blog.google/feed/gemini-jules-colab-updates/
https://www.youtube.com/watch?v=9hE5-98ZeCg
https://x.com/googledevs?lang=de
https://cloud.google.com/vertex-ai/generative-ai/docs/model-reference/multimodal-live