AMD approached to make world’s fastest AI supercomputer powered by 1.2 million GPUs

Forward-looking: It is no secret that Nvidia has been the dominant GPU provider to data centers, but there is now a very real possibility that AMD could become a serious contender in this market as demand grows. AMD was recently approached by a client asking it to create an AI training cluster consisting of a staggering 1.2 million GPUs. That would potentially make it 30x more powerful than Frontier, the current fastest supercomputer. AMD supplied less than 2% of data center GPUs in 2023.

In an interview with The Next Platform, Forrest Norrod, AMD’s GM of Datacenter Solutions, revealed that the company had received genuine inquiries from clients to build AI training clusters using 1.2 million GPUs. To put that into perspective, current AI training clusters are typically built from a few thousand GPUs connected via high-speed interconnect across several local server racks.

The scale now being considered for AI development is unprecedented. “Some of the training clusters that are being contemplated are truly mind-boggling,” Norrod said. For comparison, the largest known supercomputer used to train AI models is Frontier, which has 37,888 AMD Instinct MI250X GPUs. By GPU count alone, AMD’s potential supercomputer would be roughly 30 times the size of Frontier.
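The "30 times" figure can be checked with simple division. The snippet below is only a back-of-the-envelope sketch: it compares raw GPU counts using the two numbers quoted in this article and assumes, unrealistically, that every GPU delivers the same performance. The function and constant names are illustrative, not taken from any AMD material.

```python
# Back-of-the-envelope comparison by GPU count only.
# Assumption: equal per-GPU performance, which will not hold across GPU generations.
PROPOSED_CLUSTER_GPUS = 1_200_000  # cluster size quoted in the article
FRONTIER_GPUS = 37_888             # Frontier's GPU count quoted in the article


def gpu_count_ratio(proposed: int, baseline: int) -> float:
    """Return how many times larger the proposed cluster is, by GPU count."""
    if baseline <= 0:
        raise ValueError("baseline must be a positive GPU count")
    return proposed / baseline


ratio = gpu_count_ratio(PROPOSED_CLUSTER_GPUS, FRONTIER_GPUS)
print(f"{ratio:.1f}x as many GPUs as Frontier")  # prints "31.7x as many GPUs as Frontier"
```

The result is slightly above the rounded "30x" headline figure. Actual performance would depend on the GPU generation used and on how well the interconnect scales, neither of which the article specifies.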

Of course, it isn’t that simple. Even at current scales, there are plenty of pitfalls to consider when creating AI training clusters. AI training requires low latency to deliver timely results and consumes significant amounts of power, and hardware failures must be accounted for even with just a few thousand GPUs.

Most servers run at around 20% utilization and handle thousands of small, asynchronous jobs on remote machines. However, the rise of AI training is leading to a significant change in server structure. To keep up with machine learning models and algorithms, an AI data center must be equipped with large amounts of computing power designed specifically for the job. AI training is essentially one large, synchronous job that requires every node in the cluster to pass information back and forth as quickly as possible.

What’s most interesting is that these figures are coming from AMD, which accounted for less than 2% of data center GPU shipments in 2023. Nvidia, which made up the other 98%, has remained tight-lipped about what its clients have asked it to build. Given that Nvidia is the market leader, we can only imagine what it is working on.

While the proposed 1.2 million GPU supercomputer might seem outlandish, Norrod said that “very sober people” are considering spending up to 100 billion dollars on AI training clusters. This shouldn’t come as a surprise, since the past few years in the tech world have been defined by the explosion in AI advancements. It seems that companies are ready to invest significant sums in AI and machine learning to remain competitive.
