As companies continue to adopt large language models (LLMs) in various applications, one of the key challenges they face is improving the models’ factual knowledge and reducing hallucinations. In a new paper, researchers at Meta AI propose “scalable memory layers,” which could be one of several possible solutions to this problem.
Scalable memory layers add more parameters to LLMs to increase their learning capacity without requiring additional compute. The architecture is useful for applications where you can spare extra memory for factual knowledge but also want the inference speed of nimbler models.
Dense and memory layers
Traditional language models use “dense layers” to encode large amounts of information in their parameters. In dense layers, all parameters are used at full capacity and are mostly activated at the same time during inference. Dense layers can learn complex functions, but scaling them up requires additional compute and energy.
In contrast, for simple factual knowledge, much simpler layers with associative memory architectures would be more efficient and interpretable. That’s what memory layers do: they use sparse activations and key-value lookup mechanisms to encode and retrieve knowledge. Sparse layers take up more memory than dense layers but only use a small portion of their parameters at a time, which makes them much more computationally efficient.
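To make the mechanism concrete, here is a minimal, illustrative sketch of a key-value memory layer in PyTorch. It is not the architecture from the Meta paper, which uses more sophisticated key lookups and far larger tables; the sizes, names and top-k selection below are assumptions chosen for readability.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SimpleMemoryLayer(nn.Module):
    """Illustrative key-value memory layer: each token's hidden state acts as
    a query, the top-k most similar keys are selected, and the output is a
    weighted sum of the corresponding values. Only k of the num_keys value
    vectors are read per token, so activations stay sparse."""

    def __init__(self, dim: int, num_keys: int = 4096, topk: int = 8):
        super().__init__()
        self.keys = nn.Parameter(torch.randn(num_keys, dim) / dim ** 0.5)
        self.values = nn.Embedding(num_keys, dim)  # large, sparsely accessed table
        self.topk = topk

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq, dim) hidden states acting as queries
        scores = x @ self.keys.t()                   # (batch, seq, num_keys)
        top_scores, top_idx = scores.topk(self.topk, dim=-1)
        weights = F.softmax(top_scores, dim=-1)      # normalize over selected keys
        selected = self.values(top_idx)              # (batch, seq, topk, dim)
        return (weights.unsqueeze(-1) * selected).sum(dim=-2)

layer = SimpleMemoryLayer(dim=64)
out = layer(torch.randn(2, 10, 64))   # -> (2, 10, 64)
```

The key property is that only the `topk` selected value vectors are read per token, so the table can grow very large while per-token compute stays small.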
Memory layers have existed for several years but are rarely used in modern deep learning architectures, partly because they are not optimized for current hardware accelerators.
Current frontier LLMs typically use some form of “mixture of experts” (MoE) architecture, which relies on a mechanism vaguely similar to memory layers. MoE models are composed of many smaller expert components that specialize in specific tasks. At inference time, a routing mechanism determines which experts are activated based on the input sequence. PEER, an architecture recently developed by Google DeepMind, extends MoE to millions of experts, providing more granular control over which parameters are activated during inference.
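For comparison, here is a toy top-k MoE block showing the routing idea. The expert count, router design and top-k value are illustrative and not tied to any particular production model.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TinyMoE(nn.Module):
    """Toy mixture-of-experts block: a linear router scores all experts,
    each token is sent to its top-k experts, and their outputs are combined
    with the router weights. Only k experts run per token."""

    def __init__(self, dim: int, num_experts: int = 8, topk: int = 2):
        super().__init__()
        self.router = nn.Linear(dim, num_experts)
        self.experts = nn.ModuleList(
            [nn.Sequential(nn.Linear(dim, 4 * dim), nn.GELU(), nn.Linear(4 * dim, dim))
             for _ in range(num_experts)]
        )
        self.topk = topk

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (tokens, dim) flattened token representations
        gate_scores = self.router(x)                          # (tokens, num_experts)
        top_scores, top_idx = gate_scores.topk(self.topk, dim=-1)
        weights = F.softmax(top_scores, dim=-1)
        out = torch.zeros_like(x)
        for slot in range(self.topk):
            for e, expert in enumerate(self.experts):
                mask = top_idx[:, slot] == e                  # tokens routed to expert e
                if mask.any():
                    out[mask] += weights[mask, slot:slot + 1] * expert(x[mask])
        return out

moe = TinyMoE(dim=32)
y = moe(torch.randn(16, 32))   # -> (16, 32)
```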
Upgrading Memory Layers
Memory layers are compute-light but memory-heavy, which presents specific challenges for today’s hardware and software infrastructures. In their paper, the Meta researchers propose several modifications that address these challenges and enable their widespread use.
First, the researchers configured the memory layers for parallelization, distributing them across multiple GPUs to store millions of key-value pairs without changing other layers of the model. They also implemented a special CUDA kernel to handle high-memory-bandwidth operations, and developed a parameter-sharing mechanism that supports a single set of memory parameters across multiple memory layers within a model. This means that the keys and values used for lookups are shared across layers.
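A rough sketch of the parameter-sharing idea, leaving out the cross-GPU sharding and the custom CUDA kernels: several memory layers consult a single shared key-value table, and only a small query projection is layer-specific. The class and attribute names here are hypothetical, not from the paper’s code.

```python
import torch
import torch.nn as nn

class SharedMemoryBank(nn.Module):
    """One key/value table reused by every memory layer in the model."""
    def __init__(self, dim: int, num_keys: int = 4096):
        super().__init__()
        self.keys = nn.Parameter(torch.randn(num_keys, dim) / dim ** 0.5)
        self.values = nn.Embedding(num_keys, dim)

class MemoryLayer(nn.Module):
    """Memory layer that looks up the shared bank; only its query
    projection is layer-specific, so adding layers adds few parameters."""
    def __init__(self, dim: int, bank: SharedMemoryBank, topk: int = 8):
        super().__init__()
        self.bank = bank                       # shared, not duplicated per layer
        self.query_proj = nn.Linear(dim, dim)  # per-layer parameters stay small
        self.topk = topk

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        q = self.query_proj(x)
        scores = q @ self.bank.keys.t()
        top_scores, top_idx = scores.topk(self.topk, dim=-1)
        weights = torch.softmax(top_scores, dim=-1)
        return (weights.unsqueeze(-1) * self.bank.values(top_idx)).sum(dim=-2)

bank = SharedMemoryBank(dim=64)
layers = nn.ModuleList([MemoryLayer(64, bank) for _ in range(3)])  # 3 layers, 1 table
```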
These modifications make it possible to implement memory layers within LLMs without slowing down the model.
“Memory layers, with their sparse activations, complement dense networks well, providing increased capacity for knowledge acquisition while being computationally lightweight,” the researchers write. “They can be scaled efficiently and offer practitioners an interesting new direction for trading off memory and computation.”
To test the memory layers, the researchers modified Llama models by replacing one or more dense layers with a shared memory layer. They compared the memory-enhanced models to dense LLMs as well as MoE and PEER models on several tasks, including factual question answering, scientific and common-sense world knowledge, and coding.
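As a rough illustration of the replacement step (not the authors’ actual code), one might swap the feed-forward sublayer of selected transformer blocks for a shared memory layer roughly like this; the `layers` and `ffn` attribute names and the chosen indices are assumptions, since they vary across codebases.

```python
import torch.nn as nn

def replace_ffn_with_memory(model: nn.Module, memory_layer: nn.Module,
                            layer_indices=(4, 8, 12)):
    """Hypothetical helper: replace the feed-forward sublayer of the chosen
    transformer blocks with one shared memory layer. Assumes the model
    exposes its blocks as `model.layers`, each with an `ffn` attribute."""
    for i in layer_indices:
        model.layers[i].ffn = memory_layer  # the same shared module fills every slot
    return model

# Usage (hypothetical): model = replace_ffn_with_memory(model, MemoryLayer(...))
```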

Their results show that memory models improve significantly over dense baselines and rival models that use two to four times more compute. They also match the performance of MoE models with the same compute budget and parameter count. The models’ performance is particularly notable on tasks that require factual knowledge. For example, on factual question answering, a memory model with 1.3 billion parameters approaches the performance of Llama-2-7B, which was trained on twice as many tokens and 10 times as much compute.
Additionally, the researchers found that the benefits of memory models remain consistent as models grow, with experiments scaling from 134 million to 8 billion parameters.
“Given these results, we strongly advocate that memory layers be integrated into all next-generation AI architectures,” the researchers write, while adding that much remains to be done. “In particular, we hope that new learning methods can be developed to push the effectiveness of these layers even further, allowing for less forgetting, fewer hallucinations and continual learning.”