Date: 2023-09-27

Large Language Models (LLMs) have driven significant advances in Natural Language Processing (NLP), Natural Language Understanding (NLU), and Natural Language Generation (NLG). Models such as LLaMA and LLaMA2 have demonstrated remarkable capabilities in understanding and generating natural language. However, their limited context size makes lengthy documents and queries difficult to handle.

To address this issue, researchers have developed LongLoRA, an efficient fine-tuning approach that extends the context size of pre-trained LLMs such as LLaMA2 without incurring excessive computational cost. The method speeds up context extension in two ways.

First, LongLoRA introduces shift short attention (S2-Attn) for context extension during fine-tuning. Dense global attention is still needed at inference time, but fine-tuning can use sparse local attention instead, which saves a significant amount of computation. S2-Attn can be added to the training code with just two additional lines, which makes context extension both effective and efficient.
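The idea of sparse local attention with a shift can be illustrated with a small index-level sketch. It assumes, following the LongLoRA paper's description, that the sequence is split into fixed-size groups, that attention is computed only within each group, and that half of the attention heads see the tokens shifted by half a group so that information can flow across group boundaries. The helper names `s2_attn_groups` and `attention_pairs` are hypothetical and are not taken from the project's code; the sketch tracks token indices only and performs no actual attention computation.

```python
def s2_attn_groups(seq_len, group_size, shifted):
    """Return the token-index groups inside which attention is computed.

    Hypothetical helper for illustration; not taken from the LongLoRA code.
    """
    if group_size <= 0 or seq_len % group_size != 0:
        raise ValueError("seq_len must be a multiple of a positive group_size")
    positions = list(range(seq_len))
    if shifted:
        # Shift tokens by half a group so the group boundaries move;
        # the tokens pushed off the front wrap around to the end.
        shift = group_size // 2
        positions = positions[shift:] + positions[:shift]
    # Split the (possibly shifted) sequence into consecutive groups.
    return [positions[i:i + group_size] for i in range(0, seq_len, group_size)]


def attention_pairs(seq_len, group_size=None):
    """Count query-key pairs scored per head (causal masking ignored)."""
    if group_size is None:
        return seq_len * seq_len  # dense global attention
    num_groups = seq_len // group_size
    return num_groups * group_size * group_size  # attention within groups only


plain_heads = s2_attn_groups(8, 4, shifted=False)   # [[0, 1, 2, 3], [4, 5, 6, 7]]
shifted_heads = s2_attn_groups(8, 4, shifted=True)  # [[2, 3, 4, 5], [6, 7, 0, 1]]
```

In this toy case the shifted heads place tokens 3 and 4 in the same group, which the unshifted heads keep apart. For 8 tokens in groups of 4, `attention_pairs(8, 4)` gives 32 scored pairs against 64 for dense attention, and the gap widens as the sequence grows. In the sketch the last shifted group wraps around and mixes the final and first tokens; how a real implementation treats that group is not covered here.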

Second, LongLoRA revisits the fine-tuning procedure with a focus on parameter-efficient context extension. The researchers found that low-rank adaptation (LoRA) works well for extending the context, provided that the embedding and normalization layers are also trainable. This allows effective context extension without significantly increasing the computational burden.
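To give a sense of why low-rank adaptation is cheap, the sketch below counts trainable parameters for a single weight matrix. It relies on the standard LoRA formulation, in which a frozen weight of shape d_out x d_in is updated through two small trainable matrices of shapes d_out x r and r x d_in. The function names and the dimensions used (4096 x 4096 with rank 8) are illustrative assumptions, not figures from the article, and the count leaves out the embedding and normalization layers that are also trained.

```python
def full_params(d_out, d_in):
    """Trainable parameters when the whole weight matrix is fine-tuned."""
    return d_out * d_in


def lora_params(d_out, d_in, rank):
    """Trainable parameters for a rank-`rank` LoRA update of that matrix.

    B has shape (d_out, rank) and A has shape (rank, d_in); the original
    weight stays frozen. Illustrative helper, not a library API.
    """
    return rank * (d_out + d_in)


# Illustrative sizes only (assumed, not taken from the article).
d_out, d_in, rank = 4096, 4096, 8
full = full_params(d_out, d_in)        # 16777216
lora = lora_params(d_out, d_in, rank)  # 65536
fraction = lora / full                 # 0.00390625, i.e. 1/256
```

Under these assumed sizes, the low-rank update trains fewer than half a percent of the parameters of the matrix it adapts.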

LongLoRA has shown strong empirical results on a variety of tasks with LLaMA2 models from 7B and 13B up to 70B. On a single machine with 8 A100 GPUs, the method extends the context of LLaMA2 7B from 4k to 100k tokens, and that of LLaMA2 70B to 32k tokens. Importantly, LongLoRA keeps the original model architectures, so it remains compatible with existing methods and tools.

In addition, the researchers have built a dataset called LongQA to support supervised fine-tuning and the practical use of LongLoRA. It contains more than 3,000 question-answer pairs with long contexts.

LongLoRA offers a promising solution to extend the context sizes of large language models, allowing them to handle longer documents and queries effectively. The research paper and code can be found on the project’s GitHub repository [source].
