November 30, 2024

New Open-Source Language Models OLMo-2 and Llama-3.1 Available on Anychat


The Allen Institute for AI (AI2) has released its latest language models, OLMo-2-1124-13B-Instruct and Llama-3.1-Tulu-3-8B, on the Anychat platform. The release lets users test both models directly and explore them for a range of applications.
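Both checkpoints are also published on Hugging Face, so they can be tried locally as an alternative to the Anychat demo. A minimal sketch, assuming a recent transformers release with OLMo-2 support, sufficient GPU memory, and AI2's repo naming on Hugging Face:

```python
# Minimal sketch: querying both released checkpoints via Hugging Face
# transformers. Repo IDs follow AI2's naming on Hugging Face; requires
# a recent transformers version with OLMo-2 support and accelerate.
from transformers import pipeline

for model_id in ("allenai/OLMo-2-1124-13B-Instruct",
                 "allenai/Llama-3.1-Tulu-3-8B"):
    generator = pipeline("text-generation", model=model_id,
                         device_map="auto", torch_dtype="auto")
    messages = [{"role": "user", "content": "Summarize what makes OLMo-2 open."}]
    result = generator(messages, max_new_tokens=128)
    # The pipeline returns the chat history with the assistant reply appended.
    print(model_id, "->", result[0]["generated_text"][-1]["content"])
```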

OLMo-2: A Fully Open Language Model

OLMo-2 is the latest iteration of AI2's Open Language Model (OLMo) series. The models were trained on up to 5 trillion tokens and aim to advance research on language models. AI2 emphasizes the complete openness of OLMo-2: in addition to the model weights, the training data, code, and evaluation metrics are publicly available. This approach promotes transparency and reproducibility in AI research.

OLMo-2 was trained in two phases using a curriculum approach. The first phase comprised the majority of the training budget and utilized the OLMo-Mix-1124 dataset, a collection of approximately 3.9 trillion tokens from various sources. The second phase used a mixture of filtered web data and high-quality domain-specific data, available as Dolmino-Mix-1124.
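Both data mixes are published openly, which makes it possible to inspect what each training phase saw. The sketch below streams one document from each mix; the Hugging Face dataset IDs are assumptions derived from the mix names above, and streaming avoids downloading the multi-terabyte corpora:

```python
# Sketch: peeking at the data behind the two pretraining phases.
# The dataset IDs mirror the mix names and are assumptions here.
from datasets import load_dataset

phase1 = load_dataset("allenai/olmo-mix-1124", split="train", streaming=True)
phase2 = load_dataset("allenai/dolmino-mix-1124", split="train", streaming=True)

for name, stream in [("OLMo-Mix-1124", phase1), ("Dolmino-Mix-1124", phase2)]:
    doc = next(iter(stream))
    print(name, "sample:", doc["text"][:120].replace("\n", " "))
```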

Of particular note is OLMo-2-Instruct, a variant specifically optimized for instruction tasks. These models were trained with the Tülu-3 dataset and are designed to understand and follow complex instructions. According to AI2, OLMo-2-13B-Instruct outperforms comparable open-weight models such as Qwen 2.5 14B Instruct, Tülu 3 8B, and Llama 3.1 8B Instruct.
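To see the exact instruction format an instruct variant was tuned on, one can render its chat template without running the model at all. A short sketch; the prompt content is just an example:

```python
# Inspecting the instruct model's chat template: apply_chat_template
# renders user messages into the prompt format used during tuning.
from transformers import AutoTokenizer

tok = AutoTokenizer.from_pretrained("allenai/OLMo-2-1124-13B-Instruct")
messages = [{"role": "user",
             "content": "Plan a three-step experiment, then list its risks."}]
prompt = tok.apply_chat_template(messages, tokenize=False,
                                 add_generation_prompt=True)
print(prompt)  # the formatted prompt string, ending with the assistant turn
```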

Llama-3.1 and Tülu-3

The release of Llama-3.1-Tulu-3-8B on Anychat highlights the importance of post-training for model performance. Tülu-3 is a post-training recipe developed by AI2 that combines several techniques: supervised fine-tuning (SFT), preference tuning with Direct Preference Optimization (DPO), and Reinforcement Learning with Verifiable Rewards (RLVR).
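As a rough illustration of the DPO component: the objective rewards the policy for making the preferred response more likely, relative to a frozen reference model, than the rejected one. A minimal sketch of the standard DPO loss, where the beta value is a hypothetical choice and not a Tülu-3 hyperparameter:

```python
# Sketch of the standard DPO objective (notation from the DPO paper).
# beta=0.1 is a hypothetical choice, not a Tülu-3 hyperparameter.
import torch
import torch.nn.functional as F

def dpo_loss(policy_chosen_logps, policy_rejected_logps,
             ref_chosen_logps, ref_rejected_logps, beta=0.1):
    """Log-probabilities are summed over the response tokens of each pair."""
    chosen_ratio = policy_chosen_logps - ref_chosen_logps
    rejected_ratio = policy_rejected_logps - ref_rejected_logps
    # Push the policy to prefer the chosen response over the rejected one,
    # regularized toward the reference model by beta.
    return -F.logsigmoid(beta * (chosen_ratio - rejected_ratio)).mean()

# Example with dummy summed log-probs for a single preference pair:
loss = dpo_loss(torch.tensor([-12.0]), torch.tensor([-15.0]),
                torch.tensor([-13.0]), torch.tensor([-14.5]))
print(loss)  # small positive loss, since the policy already prefers "chosen"
```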

Applying the Tülu-3 recipe to Llama-3.1 is intended to improve the model's performance across a range of tasks, particularly instruction following. The combination of SFT, DPO, and RLVR lets the model learn both from human preference data and from automatically checkable reward signals.
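What makes a reward "verifiable" in RLVR is that it can be checked programmatically rather than estimated by a learned reward model, for instance whether a math answer is exactly correct. A toy sketch; the function name and answer-extraction pattern are hypothetical simplifications:

```python
# Toy sketch of a verifiable reward: 1.0 only if the completion's final
# answer matches the known solution. The extraction rule is a
# hypothetical simplification of what a real RLVR verifier would do.
import re

def verifiable_reward(completion: str, gold_answer: str) -> float:
    match = re.search(r"answer is\s*(-?\d+(?:\.\d+)?)", completion.lower())
    return 1.0 if match and match.group(1) == gold_answer else 0.0

print(verifiable_reward("So the answer is 42.", "42"))  # 1.0
print(verifiable_reward("I think it's 41.", "42"))      # 0.0
```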

Anychat as a Platform for AI Experiments

The availability of OLMo-2 and Llama-3.1 on Anychat gives researchers and developers a valuable opportunity to experiment with these models and evaluate their capabilities. Anychat serves as a platform for directly comparing different models and exploring new applications.

The release of these models underscores the ongoing trend towards the democratization of AI technologies and allows a broader community to participate in the development and application of language models.
