April 29, 2025

BitNet v2 Improves 1-Bit LLM Efficiency with 4-Bit Activations

Large language models (LLMs) have revolutionized the world of artificial intelligence. However, their impressive capabilities in text generation and processing come at a high computational and memory cost, which poses a challenge especially for deployment on resource-constrained devices. A promising approach to optimizing LLMs is quantization, which reduces the numerical precision of model parameters and activations. A particularly extreme variant is the family of 1-bit LLMs, in which the weights are reduced to a single bit (or, in the ternary case of BitNet b1.58, roughly 1.58 bits). Implementing such models is complex, however, and typically comes with some loss of accuracy.

A research team has now introduced BitNet v2, an approach that significantly improves the efficiency of 1-bit LLMs. The core obstacle to quantizing LLM activations to very low bit widths is outliers: a small number of activation values whose magnitude far exceeds the rest. These outliers force the quantizer to cover a wide dynamic range, leaving only a few levels for the bulk of the values and thus degrading accuracy. BitNet v2 tackles this problem by introducing native 4-bit activations in combination with a Hadamard transformation.
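
To make the outlier problem concrete, the following sketch (illustrative, not taken from the paper) applies simple symmetric absmax quantization at 4 bits. A single large activation inflates the quantization scale, so the remaining values land on only a few of the available levels and the reconstruction error jumps:

```python
import numpy as np

def quantize_absmax(x, bits=4):
    """Symmetric absmax quantization to a signed `bits`-bit grid, then dequantize."""
    qmax = 2 ** (bits - 1) - 1              # 7 positive levels for 4-bit
    scale = np.abs(x).max() / qmax          # one outlier inflates this scale
    return np.clip(np.round(x / scale), -qmax, qmax) * scale

rng = np.random.default_rng(0)
acts = rng.normal(0.0, 1.0, 1024)           # well-behaved activations
acts_outlier = acts.copy()
acts_outlier[0] = 40.0                      # a single LLM-style outlier

err_clean = np.abs(acts - quantize_absmax(acts)).mean()
# Measure error only on the non-outlier entries to show the collateral damage:
err_outlier = np.abs(acts_outlier[1:] - quantize_absmax(acts_outlier)[1:]).mean()
print(f"mean abs error without outlier: {err_clean:.4f}")
print(f"mean abs error with outlier:    {err_outlier:.4f}")
```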

The innovative component of BitNet v2 is the H-BitLinear module, which applies a Hadamard transform to the activations before quantizing them. The Hadamard transform is an orthogonal linear transformation: it preserves the total energy of the activation vector while spreading the contribution of each outlier across all dimensions. The resulting distribution is smoother and close to Gaussian, which suits low-bit quantization far better, since extreme values are flattened out and the available quantization levels concentrate on the range where most values actually lie.
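
This effect can be sketched with a fast Walsh-Hadamard transform. The snippet below is a schematic illustration, not the authors' H-BitLinear implementation: it rotates an activation vector containing a few large outliers, and because the transform is orthonormal, the total energy is preserved while each outlier's magnitude is spread across all dimensions, so the peak value, and with it the absmax quantization scale, drops sharply.

```python
import numpy as np

def hadamard_transform(x):
    """Orthonormal fast Walsh-Hadamard transform along the last axis.
    The length must be a power of two."""
    n = x.shape[-1]
    assert n & (n - 1) == 0, "length must be a power of two"
    y = x.astype(np.float64).copy()
    h = 1
    while h < n:
        # Pair up elements h apart and replace them with their sum/difference.
        y = y.reshape(-1, n // (2 * h), 2, h)
        a = y[:, :, 0, :] + y[:, :, 1, :]
        b = y[:, :, 0, :] - y[:, :, 1, :]
        y = np.stack([a, b], axis=2).reshape(-1, n)
        h *= 2
    return (y / np.sqrt(n)).reshape(x.shape)

rng = np.random.default_rng(0)
x = rng.normal(0.0, 1.0, 4096)
x[rng.integers(0, 4096, size=8)] = 30.0      # inject a few extreme outliers

y = hadamard_transform(x)
print(f"peak |x| before: {np.abs(x).max():.1f}")   # dominated by the outliers
print(f"peak |y| after:  {np.abs(y).max():.1f}")   # outlier energy spread out
```

Since the normalized Hadamard matrix is orthogonal and its own inverse, the rotation does not need to be undone separately at inference time; in BitNet v2 the H-BitLinear weights are simply trained with the transform in place.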

In experiments, BitNet v2 showed promising results. Models trained from scratch with 8-bit activations matched the performance of BitNet b1.58, a predecessor model. More remarkably, BitNet v2 suffers only minimal performance degradation with native 4-bit activations. This enables a significant reduction in memory requirements and computational cost, especially during batch inference, i.e., the simultaneous processing of multiple inputs.
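
A back-of-the-envelope calculation illustrates the savings on the activation side during batched inference. The model dimensions below are illustrative assumptions, not figures from the paper:

```python
# Rough activation-buffer sizes for one transformer layer at different
# activation bit widths. Hidden size, sequence length and batch size are
# illustrative assumptions, not numbers from the BitNet v2 paper.
hidden_size = 4096
seq_len = 2048
batch = 32

def activation_gib(bits):
    """GiB needed to hold one (batch, seq_len, hidden_size) activation tensor."""
    return batch * seq_len * hidden_size * bits / 8 / 2**30

for bits in (16, 8, 4):
    print(f"{bits:>2}-bit activations: {activation_gib(bits):.3f} GiB")
# Moving from 8-bit to 4-bit activations halves this memory traffic and
# allows the matrix multiplications to run on narrower (e.g. INT4) compute paths.
```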

The development of BitNet v2 is an important step toward more efficient, resource-friendly LLMs. Combining 1-bit weights and 4-bit activations via the Hadamard transform preserves high performance while cutting resource requirements. This opens up new possibilities for deploying LLMs on a wider range of devices, from mobile phones to embedded systems.

For companies like Mindverse, which specialize in developing and deploying AI solutions, such advances in LLM compression are highly relevant. More efficient models enable more cost-effective, scalable offerings for customers. Integrating techniques like BitNet v2 into Mindverse's product portfolio could help bring the power of LLMs to a wider audience and drive the development of innovative AI applications.

Bibliography:
- https://arxiv.org/abs/2504.18415
- https://huggingface.co/papers
- https://medium.com/data-science-in-your-pocket/bitnet-b1-58-2b4t-the-1st-1-bit-llm-is-here-35f0315089c6
- https://news.ycombinator.com/item?id=39535800
- https://github.com/HuangOwen/Awesome-LLM-Compression/blob/main/README.md
- https://paperswithcode.com/paper/bitnet-a4-8-4-bit-activations-for-1-bit-llms
- https://openreview.net/forum?id=vWR3KuiQur
- https://github.com/xlite-dev/Awesome-LLM-Inference/blob/main/README.md
- https://blog.openvino.ai/blog-posts/q424-technology-update---low-precision-and-model-optimization