Tesla's New FSD Training Cluster With 10,000 GPUs Comes Online
Today, Tesla is switching on a massive new computing cluster to advance its Full Self-Driving (FSD) technology. Tesla investor and enthusiast Sawyer Merritt shared details about the system on X.com.
Tesla uses graphics processing units (GPUs) to train the neural networks behind its AI systems. Training involves feeding these networks vast amounts of video so the system learns to operate a vehicle autonomously and safely. NVIDIA, a leading GPU manufacturer, recently released a new chip, the H100, which offers significantly higher performance than its predecessor, the A100. The H100 doesn't come cheap, however: NVIDIA's latest training-focused GPU costs about $30,000 apiece, and Tesla needs far more than a handful.
Tesla's new training cluster is powered by 10,000 NVIDIA H100 GPUs. At roughly $30,000 each, that comes to about $300 million for the GPUs alone.
Demand for the H100 is so high, however, that NVIDIA cannot produce enough units to meet the growing needs of Tesla and the rest of the industry.
The Power of Dojo
As a result, Tesla is investing over $1 billion to develop its own supercomputer, Dojo. Dojo is built around a custom-designed chip for training the neural networks used in FSD and in Optimus, Tesla's humanoid robot. Elon Musk, Tesla's CEO, has indicated that if NVIDIA could supply enough chips, Dojo might not have been necessary. Given the current shortage, however, Tesla has chosen to take matters into its own hands.
Merritt noted that securing enough computing power to train FSD is a significant challenge, and Tesla is spending heavily to meet it. Musk announced that the company plans to invest more than $2 billion in 2023, and again in 2024, to expand its computing capacity. The investment is essential to developing FSD and signals Tesla's commitment to innovation and leadership in the automotive industry.
Groundbreaking Data Centers
Beyond building its own supercomputer, Tesla is also designing what it calls its "1st of its kind Data Centers." A recent job listing for a Senior Engineering Program Manager in Austin, Texas, home to Tesla's Gigafactory Texas, hints at the scale and complexity of the initiative.
These data centers are integral to Tesla's broader strategy. The company needs extensive computing resources to process and analyze the large volumes of video data its self-driving software learns from, and the Dojo supercomputer already demonstrates its commitment in this area.