Date: 2023-09-28

Large Language Models (LLMs) have transformed Natural Language Processing (NLP) with their ability to handle intricate tasks. However, there has been no open recipe for building long-context models that match the performance of proprietary offerings. Existing open-source long-context models often fall short in evaluations and tend to neglect the need to maintain strong performance on short-context tasks.

In a recent paper titled “Effective Long-Context Scaling of Foundation Models,” the Meta AI research team introduces a series of long-context LLMs built through continual pretraining from LLAMA 2. The paper reports that these models outperform all existing open-source models.

The proposed models are built by continual pretraining from LLAMA 2 checkpoints on an additional 400 billion tokens formed into long training sequences. The core architecture of LLAMA 2 is preserved, apart from one necessary modification to the positional encoding that lets the model attend over longer contexts.

By reducing the rotation angle of the rotary positional encoding (RoPE), the researchers mitigate the decay in attention between distant tokens, improving the model’s ability to attend over longer contexts.
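The effect of a smaller rotation angle can be illustrated with a minimal sketch in plain Python. It assumes the standard RoPE formulation, in which dimension pair i rotates by theta_i = base^(-2i/d) radians per token, so raising the base shrinks the angle of every pair except the first. The base values 10,000 and 500,000 are those the paper reports for LLAMA 2 and for the long-context models respectively. The head dimension of 128, the 4,096-token distance, the one-radian cutoff, and the helper names are illustrative assumptions, not details from the article.

```python
def rope_frequencies(head_dim, base):
    """Return theta_i = base ** (-2i / head_dim) for each dimension pair i."""
    if head_dim % 2 != 0:
        raise ValueError("head_dim must be even")
    return [base ** (-2.0 * i / head_dim) for i in range(head_dim // 2)]


def slow_pairs(head_dim, base, distance):
    """Count pairs that rotate by less than 1 radian across `distance` tokens.

    The 1-radian cutoff is an illustrative threshold, not a value from the paper.
    """
    freqs = rope_frequencies(head_dim, base)
    return sum(1 for theta in freqs if distance * theta < 1.0)


HEAD_DIM = 128    # assumed head dimension
DISTANCE = 4096   # assumed distance between two tokens

for base in (10_000.0, 500_000.0):
    freqs = rope_frequencies(HEAD_DIM, base)
    print(
        f"base={base:>9.0f}  slowest angle per token={min(freqs):.2e}  "
        f"pairs under 1 rad at {DISTANCE} tokens: "
        f"{slow_pairs(HEAD_DIM, base, DISTANCE)}"
    )
```

Under these assumed values, 6 of the 64 dimension pairs stay within one radian at a distance of 4,096 tokens with the smaller base, against 23 with the larger one. This is only one rough way to picture why smaller rotation angles weaken the decay for distant tokens; it is not the analysis used in the paper.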

The team also emphasizes the importance of data quality in achieving strong long-context performance. They find that, in continual pretraining, data quality plays a more critical role than the length of the texts used.

To improve long-context abilities, the researchers employ a simple and cost-effective approach to instruction tuning. They take an existing large and diverse short-prompt dataset and augment it with self-instruct long-context data generated by LLAMA 2 CHAT. This strategy lets the model learn a diverse set of skills from the short-prompt data and transfer that knowledge to long-context scenarios.

Extensive evaluations, including language modeling and synthetic context probing tasks, show that the proposed models improve on LLAMA 2 consistently across most standard tasks and significantly on long-context tasks.

This pioneering work by the Meta AI research team has the potential to democratize access to long-context LLMs and advance the field of Natural Language Processing. It could enable researchers and developers to tackle more complex language understanding tasks, marking a significant step forward for AI-driven language models.

Source: “Effective Long-Context Scaling of Foundation Models” – Meta AI (https://ai.meta.com)

Author: Hecate He | Editor: Chain Zhang