Multi Token Prediction Increases AI Model Speed Three Times, Says Meta

2 mins read May 7, 2024

A study by Meta researchers shows that multi-token prediction can improve the performance of LLMs.
The technique uses multiple output heads to predict several tokens simultaneously.
According to the study, it adds no extra cost in memory or time, as it keeps the same basic architecture.

Training language models to predict multiple tokens at once results in better sample efficiency, say researchers at Meta.

Large language models such as Llama and ChatGPT are usually trained for next-token prediction, but the new approach can achieve better performance.

What is the single-token prediction technique?

The multi-token prediction technique provides a significant edge in some scenarios, running generative tasks up to three times faster, but it is not a one-size-fits-all solution for every type of model. The technique still has room for improvement, and for some LLM applications it could become a robust tool.

To put it more clearly, traditional LLM training uses an approach called “next-token prediction,” in which the model predicts only the single next token in a given sequence.

In an automated process, the predicted token is appended to the input, and the step is repeated across the entire text provided, so that the model learns common patterns and develops the ability to produce logical and consistent text.
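The predict-append-repeat loop described above can be sketched with a toy model. This is only an illustration, not the approach used by any real LLM: the bigram counter standing in for the model, and the names `train_bigram`, `predict_next`, and `generate`, are assumptions made for this example.

```python
from collections import Counter, defaultdict


def train_bigram(tokens):
    """Count which token follows each token in the training sequence."""
    counts = defaultdict(Counter)
    for current, following in zip(tokens, tokens[1:]):
        counts[current][following] += 1
    return counts


def predict_next(counts, context):
    """Return the most frequent follower of the last token, or None if unseen."""
    followers = counts.get(context[-1])
    if not followers:
        return None
    return followers.most_common(1)[0][0]


def generate(counts, prompt, max_new_tokens):
    """Predict one token, append it to the input, and repeat."""
    sequence = list(prompt)
    for _ in range(max_new_tokens):
        token = predict_next(counts, sequence)
        if token is None:
            break  # the toy model has never seen a follower for this token
        sequence.append(token)
    return sequence


corpus = "the cat sat on the mat".split()
model = train_bigram(corpus)
print(generate(model, ["cat"], 3))  # ['cat', 'sat', 'on', 'the']
```

Each pass through the loop produces exactly one token, which is why generation cost grows with the length of the output.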

This technique has drawbacks: by processing only the next token, the model becomes too focused on local patterns in the text and overlooks predictions that can only be made with reasoning.

Another problem is that the model must be fed huge amounts of data to reach the fluency in language that humans achieve with very little text.

Multi-token prediction enables 3X speed

In the new multi-token approach proposed by Meta, the LLM is trained to predict several future tokens at once from each position in the training data. The researchers used a simple architecture for multi-token prediction that does not require extra resources such as training time or memory.
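As a rough sketch of what predicting several future tokens from each position means for the training data, the example below builds the targets for each position and sums one cross-entropy term per predicted token. The helper names `multi_token_examples` and `multi_token_loss`, and the choice to sum the per-head losses, are assumptions for illustration and are not taken from Meta's code.

```python
import math


def multi_token_examples(tokens, num_future):
    """Pair each context with the next `num_future` tokens as targets."""
    examples = []
    for t in range(len(tokens) - num_future):
        context = tokens[: t + 1]
        targets = tokens[t + 1 : t + 1 + num_future]
        examples.append((context, targets))
    return examples


def multi_token_loss(head_probs, targets):
    """Sum the cross-entropy of each head against its own future token.

    head_probs[i] is a dict mapping token -> probability, produced by the
    (hypothetical) head responsible for the token at offset i + 1.
    """
    return -sum(math.log(probs[target]) for probs, target in zip(head_probs, targets))


tokens = ["the", "cat", "sat", "on", "the", "mat"]
for context, targets in multi_token_examples(tokens, num_future=2):
    print(context, "->", targets)
```

With `num_future=1` the targets reduce to the usual next-token setup, so the standard objective is a special case of this one.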

The researchers kept the same Transformer architecture that most LLMs already use, but changed it to accommodate multi-token prediction by increasing the number of output heads from one to several and assigning one head to each predicted token.
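The idea of a shared model body with one output head per predicted token can be illustrated with a minimal sketch. Everything here is an assumption made for the example: the class name `MultiHeadPredictor`, the single-layer trunk, the plain linear heads, and the random weights. A real Transformer is far larger, and the heads in the study may be built differently.

```python
import random


def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


def random_matrix(rows, cols, rng):
    """Build a rows x cols matrix of small random weights."""
    return [[rng.uniform(-1.0, 1.0) for _ in range(cols)] for _ in range(rows)]


class MultiHeadPredictor:
    """Toy shared trunk with one output head per future token."""

    def __init__(self, hidden_size, vocab_size, num_heads, seed=0):
        rng = random.Random(seed)
        # Shared trunk: a stand-in for the Transformer body.
        self.trunk = random_matrix(hidden_size, hidden_size, rng)
        # Head i scores the vocabulary for the token at offset i + 1.
        self.heads = [
            random_matrix(vocab_size, hidden_size, rng) for _ in range(num_heads)
        ]

    def forward(self, features):
        """Run the trunk once, then every head on the same hidden state."""
        hidden = [max(0.0, h) for h in matvec(self.trunk, features)]
        return [matvec(head, hidden) for head in self.heads]

    def predict(self, features):
        """Return the highest-scoring token id from each head."""
        return [
            max(range(len(logits)), key=logits.__getitem__)
            for logits in self.forward(features)
        ]


model = MultiHeadPredictor(hidden_size=4, vocab_size=6, num_heads=3)
print(model.predict([1.0, 0.5, -0.5, 2.0]))  # one token id per head
```

The trunk runs once per position and only the small heads are repeated, which is the design choice that keeps the added cost low in this sketch.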

In this way, for inference the model uses the same basic next-token prediction strategy, but by utilizing multiple heads, it can speed up the process. The research study says,

“While cost-free and simple, multi-token prediction is an effective modification to train stronger and faster transformer models.”
Source: Meta.

The researchers found that the technique produced subpar results on smaller models, but the results became better than average on larger models and kept improving with model size. As the study says,

“The method is increasingly useful for larger model sizes, and keeps its appeal when training for multiple epochs. Gains are especially pronounced on generative benchmarks like coding, where our models consistently outperform strong baselines by several percentage points.”
Source: Meta.

The researchers also said that the multi-token prediction technique makes the model up to three times faster at producing results, at little or no extra cost.

Aamir Sheikh

Aamir is a tech journalist with nearly six years of experience in the crypto and tech industries. He graduated from MAJ University with an MBA in Finance and Marketing. He now works with Cryptopolitan, where he reports on the latest developments in the cryptocurrency markets and price predictions.
