
Elon Musk announces GROK 3 training at Memphis with NVIDIA H100 GPUs

In this post:

  • Elon Musk has kicked off GROK 3 training at Memphis and plans to use 100,000 NVIDIA H100 GPUs. 
  • xAI canceled a $10 billion server deal with Oracle to build its own supercomputer, and Musk says the world’s most advanced AI could be ready by December. 
  • xAI opted for H100 GPUs rather than waiting for the upcoming H200 so as not to lose momentum.

Elon Musk has officially announced the commencement of GROK 3 training at the Memphis supercomputer facility, equipped with NVIDIA’s current-generation H100 GPUs. The facility, which Musk calls ‘the most powerful AI training cluster in the world,’ began operating on Monday, running 100,000 liquid-cooled H100 GPUs on a single RDMA fabric.

Training kicked off at 4:20 am local time in Memphis. In a follow-up post on X, Musk said the world’s “most advanced AI” could be ready by December of this year, and he congratulated the teams from xAI, X, and NVIDIA for their excellent work. 

xAI shifts strategy and cancels Oracle server deal

The announcement comes in the wake of the recent cancellation of a $10 billion server deal between xAI and Oracle. Musk indicated that the xAI Gigafactory of Compute, initially expected to be operational by the fall of 2025, has started operations ahead of schedule.


xAI had earlier rented its AI compute from Oracle but decided to walk away in order to build its own advanced supercomputer. The project will now harness state-of-the-art H100 GPUs, which cost around $30,000 each. GROK 2 was trained on 20,000 GPUs, and GROK 3 needs five times as many to build a more sophisticated AI chatbot; the rough math is sketched below.
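As a rough sanity check on those figures (the $30,000 unit price is the article’s estimate, not an official NVIDIA list price), the implied scale and hardware cost work out as follows:

```python
# Back-of-the-envelope check on the article's GPU figures.
grok2_gpus = 20_000
scale_factor = 5                      # GROK 3 reportedly uses five times as many
grok3_gpus = grok2_gpus * scale_factor

unit_price_usd = 30_000               # article's estimated price per H100

print(f"GROK 3 GPU count: {grok3_gpus:,}")                              # 100,000
print(f"Implied GPU spend: ${grok3_gpus * unit_price_usd / 1e9:.1f}B")  # ~$3.0B
```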


This is quite surprising, especially because NVIDIA has only recently announced the upcoming H200 GPUs, which are also based on the Hopper architecture. xAI decided to begin training with H100 GPUs rather than wait for the H200 or the forthcoming Blackwell-based B100 and B200 GPUs. The H200, which entered mass production in Q2, promises significant performance enhancements, but xAI’s immediate focus is on leveraging its existing H100 infrastructure to meet its ambitious targets.

Analyst questions power supply for Memphis Supercluster

Dylan Patel, an AI and semiconductor analyst, raised concerns about powering the Memphis Supercluster. He pointed out that the current grid supply of 7 megawatts can only sustain about 4,000 GPUs. The Tennessee Valley Authority (TVA) is expected to supply 50MW to the facility under a deal expected to be signed by August 1. However, the substation needed to meet the full power demand will only be completed in late 2024. 


Analyzing satellite images, Patel noted that Musk has deployed 14 VoltaGrid mobile generators, each yielding 2.5 megawatts. Altogether, the generators produce 35 megawatts of electricity. Combined with the 8MW from the grid, that makes a total of 43MW, which is enough to power about 32,000 H100 GPUs with some power capping; the arithmetic is sketched below.
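For readers who want to check the numbers, here is a minimal sketch of Patel’s power arithmetic in Python. The per-GPU budgets are derived from the article’s own figures; the ~700 W H100 SXM power draw mentioned in the comments is general knowledge about the part, not a claim from Patel.

```python
# Sketch of the power arithmetic cited by Dylan Patel (figures from the article).
generators = 14
mw_per_generator = 2.5                 # each VoltaGrid mobile generator
grid_mw = 8                            # grid supply cited alongside the generators

generator_mw = generators * mw_per_generator    # 14 * 2.5 = 35 MW
total_mw = generator_mw + grid_mw               # 35 + 8  = 43 MW

gpus_supported = 32_000                # Patel's estimate with power capping
kw_per_gpu = total_mw * 1_000 / gpus_supported  # implied all-in budget per GPU

print(f"Generators: {generator_mw:.0f} MW, total supply: {total_mw:.0f} MW")
print(f"Implied budget: ~{kw_per_gpu:.2f} kW per GPU")        # ~1.34 kW

# Cross-check with Patel's earlier point: 7 MW of grid-only supply for
# ~4,000 GPUs implies about 1.75 kW per GPU before power capping.
print(f"Grid-only budget: ~{7_000 / 4_000:.2f} kW per GPU")
```

Both implied budgets are plausible: an H100 SXM module alone can draw up to about 700 W, and the remainder covers each GPU’s share of CPU, networking, and cooling overhead.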


