In the rapidly evolving landscape of artificial intelligence, a relatively new player has emerged on the scene, causing a stir in the tech world. Mixtral 8x7B, developed by the French artificial intelligence company Mistral AI, is turning heads with its unique approach and impressive capabilities. This article delves into what makes Mixtral 8x7B stand out and why it’s creating a buzz in the AI community.
Mixtral 8x7B: A game changer
While the tech world was captivated by Google’s Gemini update to Bard, Mixtral 8x7B quietly entered the fray. What sets it apart is its use of a sparse Mixture of Experts (MoE) architecture, in which only part of the network processes each token. This differs from the conventional dense design, in which every parameter is used for every token. Mixtral 8x7B has 46.7 billion parameters in total, yet it uses only a fraction of them for any given token, so it needs far less compute per token than a dense model of the same size.
Mixtral 8x7B’s performance is not to be underestimated. According to Mistral AI, it matches or outperforms OpenAI’s GPT-3.5 and Meta’s Llama 2 70B on most standard benchmarks. The model’s weights are released under the Apache 2.0 license, allowing anyone to access and use them. It handles English, French, Italian, German, and Spanish, and it can also generate code.
Meet Mistral AI – The brains behind the AI revolution
Mistral AI, the brains behind Mixtral, is a French AI company founded by researchers with previous experience at both Meta and Google. In 2023, Mistral AI made waves by securing around 450 million euros in funding. The release of Mixtral 8x7B, their latest model, was far from traditional: it arrived as a bare torrent magnet link shared on Twitter.
The MoE advantage
Mixtral uses an MoE architecture that routes each incoming token to a subset of experts within the model. Each expert is a feed-forward neural network, and every layer of Mixtral 8x7B contains eight of them. MoE designs in general can also be hierarchical, where an expert is itself another MoE. When a prompt is submitted to Mixtral 8x7B, a router network scores the experts for each token and selects the two best-suited ones, and their outputs are then combined into a single result.
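The routing step described above can be illustrated with a small sketch. This is a toy illustration rather than Mixtral’s actual code: the function names, the made-up router scores, and the toy experts are all hypothetical, and weighting the two outputs with a softmax over the selected scores is an assumption about how the outputs are combined.

```python
import math

def top2_route(router_logits):
    """Pick the two highest-scoring experts and return (index, weight) pairs."""
    ranked = sorted(range(len(router_logits)),
                    key=lambda i: router_logits[i], reverse=True)
    top = ranked[:2]
    # Assumption: normalize with a softmax over just the two selected scores,
    # so the two weights sum to 1.
    peak = max(router_logits[i] for i in top)
    exps = [math.exp(router_logits[i] - peak) for i in top]
    total = sum(exps)
    return [(i, e / total) for i, e in zip(top, exps)]

def moe_layer(token, router_logits, experts):
    """Run only the two selected experts and sum their weighted outputs."""
    out = [0.0] * len(token)
    for index, weight in top2_route(router_logits):
        expert_out = experts[index](token)
        out = [o + weight * v for o, v in zip(out, expert_out)]
    return out

# Eight toy "experts": expert k simply scales the token vector by (k + 1).
experts = [lambda x, k=k: [(k + 1) * v for v in x] for k in range(8)]

token = [1.0, 2.0]
router_logits = [0.1, 2.0, -1.0, 0.3, 2.0, 0.0, -0.5, 0.2]  # made-up scores

selected = top2_route(router_logits)
output = moe_layer(token, router_logits, experts)
print(selected, output)
```

In a real model the router scores would come from a small learned layer applied to the token’s hidden state, and the six experts that are not selected do no work for that token, which is where the compute savings come from.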
MoEs have their strengths and weaknesses. They are compute-efficient during pre-training but can fall prey to overfitting during fine-tuning. Overfitting, in this context, means a model fits its training data too closely and generalizes poorly to inputs it has not seen. On the positive side, MoEs offer faster inference than a dense model with the same total number of parameters, since only a subset of experts is used for each token.
However, every expert must still be loaded into memory, so the model demands enough RAM to hold roughly 47 billion parameters. That figure is lower than the 56 billion the name “8x7B” suggests because only the expert feed-forward blocks are replicated eight times. The remaining parameters, such as those in the attention layers, are shared rather than duplicated per expert.
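The arithmetic behind these figures can be sketched roughly as follows. The layer sizes below are not given in this article; they are assumed from the model’s publicly released configuration, and the formula ignores any details outside the main weight matrices and normalization vectors.

```python
# Assumed configuration values (not stated in the article).
HIDDEN = 4096          # width of the token representation
FFN = 14336            # inner width of each expert feed-forward block
LAYERS = 32
EXPERTS = 8
ACTIVE_EXPERTS = 2     # experts used per token
VOCAB = 32000
HEADS = 32
KV_HEADS = 8           # assumes grouped-query attention
HEAD_DIM = 128

def count_params(experts_counted):
    """Count parameters, including `experts_counted` experts per layer."""
    expert = 3 * HIDDEN * FFN  # three weight matrices per expert
    attention = (2 * HIDDEN * HEADS * HEAD_DIM        # query and output
                 + 2 * HIDDEN * KV_HEADS * HEAD_DIM)  # key and value
    router = HIDDEN * EXPERTS
    norms = 2 * HIDDEN
    per_layer = experts_counted * expert + attention + router + norms
    embeddings = 2 * VOCAB * HIDDEN  # input embedding and output projection
    return LAYERS * per_layer + embeddings + HIDDEN  # plus the final norm

total_params = count_params(EXPERTS)          # must all fit in memory
active_params = count_params(ACTIVE_EXPERTS)  # used for any single token
naive_params = 8 * 7_000_000_000              # what "8x7B" suggests

print(f"total:  {total_params / 1e9:.1f}B")
print(f"active: {active_params / 1e9:.1f}B")
print(f"naive:  {naive_params / 1e9:.1f}B")
```

Under these assumptions the total lands at about 46.7 billion parameters, while only about 12.9 billion are exercised per token, which is why memory requirements track the larger number and inference speed tracks the smaller one.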
User-friendly and accessible
One of Mixtral 8x7B’s notable features is its accessibility. It can be fine-tuned by users and deployed by anyone with a powerful enough computer. Users can run it locally using LM Studio, which gives them full control over the model’s behavior. Guardrails against potentially harmful content can be enabled, although they are not activated by default, so anyone who wants that protection must switch it on.
For those who prefer not to run Mixtral locally or lack the necessary hardware, it’s available on Hugging Face. Hugging Face’s implementation comes with default guardrails and offers an experience similar to ChatGPT running GPT-3.5 in terms of performance and the range of queries it can handle. Mixtral 8x7B doesn’t specialize in a specific domain; rather, it’s a versatile, general-purpose large language model.
The future of generative AI
As technology continues to advance, 2023 has witnessed a surge in generative AI models. The landscape is expected to evolve further in the coming year, with the possibility of more models being released and continuous improvements. With rumors circulating about OpenAI and the potential advent of Artificial General Intelligence, the AI world is poised for even more exciting developments in the near future. Mixtral is set to be part of that future.