AMD admits its Instinct MI300X AI accelerator still can’t quite beat Nvidia’s H100 Hopper

In context: The first official performance benchmarks for AMD's Instinct MI300X accelerator, designed for data center and AI applications, have surfaced. Compared to Nvidia's Hopper, the new chip secured mixed results in MLPerf Inference v4.1, an industry-standard benchmarking suite for AI systems with workloads designed to evaluate AI accelerator training and inference performance.

On Wednesday, AMD released benchmarks comparing the performance of its MI300X with Nvidia's H100 GPU to showcase its generative AI inference capabilities. For the Llama 2-70B model, a system with eight Instinct MI300X processors reached a throughput of 21,028 tokens per second in server mode and 23,514 tokens per second in offline mode when paired with an EPYC Genoa CPU. The numbers are slightly lower than those achieved by eight Nvidia H100 accelerators, which hit 21,605 tokens per second in server mode and 24,525 tokens per second in offline mode when paired with an unspecified Intel Xeon processor.


When tested with an EPYC Turin processor, the MI300X fared slightly better, reaching a throughput of 22,021 tokens per second in server mode, marginally higher than the H100's score. However, in offline mode, the MI300X still scored lower than the H100 system, reaching only 24,110 tokens per second.

The MI300X supports a larger memory capacity than the H100, potentially allowing it to run a 70-billion-parameter model like Llama 2-70B on a single GPU at FP8 precision, thereby avoiding the network overhead associated with splitting the model across multiple GPUs. For reference, each Instinct MI300X features 192 GB of HBM3 memory and delivers a peak memory bandwidth of 5.3 TB/s. In comparison, the Nvidia H100 supports up to 80 GB of HBM3 memory with up to 3.35 TB/s of bandwidth.
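A quick back-of-envelope calculation shows why the memory gap matters for single-GPU inference. The sketch below counts only model weights (KV cache and activations are ignored for simplicity, so real requirements are somewhat higher); the function name is illustrative, not from any benchmark tooling.

```python
def model_weight_gb(params_billion: float, bytes_per_param: float) -> float:
    """Approximate weight footprint in GB for a dense model:
    1e9 parameters at N bytes each is roughly N GB."""
    return params_billion * bytes_per_param

# FP8 stores one byte per parameter; FP16 stores two.
fp8_gb = model_weight_gb(70, 1.0)   # ~70 GB
fp16_gb = model_weight_gb(70, 2.0)  # ~140 GB

print(f"Llama 2-70B @ FP8:  ~{fp8_gb:.0f} GB (fits in the MI300X's 192 GB)")
print(f"Llama 2-70B @ FP16: ~{fp16_gb:.0f} GB (exceeds the H100's 80 GB)")
```

Even at FP8, the roughly 70 GB of weights alone sit close to the H100's 80 GB ceiling once cache and activations are added, while the MI300X's 192 GB leaves ample headroom.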



The results largely align with Nvidia's recent claims that its Blackwell and Hopper chips offer massive performance gains over competing solutions, including the AMD Instinct MI300X. Likewise, Nvidia presented data showing that in Llama 2 tests, a system with eight MI300X processors reached only 23,515 tokens per second at 750 watts in offline mode, while the H100 achieved 24,525 tokens per second at 700 watts. The numbers for server mode are similar, with the MI300X hitting 21,028 tokens per second while the H100 scored 21,606 tokens per second at lower wattage.
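Those wattage figures imply a performance-per-watt gap as well. A minimal sketch of the ratio, assuming the reported wattage is a sustained per-accelerator figure and that both systems scale it identically across eight GPUs (so the ratio is only meaningful as a relative comparison):

```python
# Offline-mode figures reported by Nvidia, as cited above.
results = {
    "MI300X (8x)": {"tokens_per_s": 23_515, "watts": 750},
    "H100 (8x)":   {"tokens_per_s": 24_525, "watts": 700},
}

for name, r in results.items():
    # Relative efficiency: system throughput over reported wattage.
    eff = r["tokens_per_s"] / r["watts"]
    print(f"{name}: {eff:.1f} tokens/s per watt")
```

By this rough measure the H100 delivers about 35 tokens per second per watt versus roughly 31 for the MI300X, so the Hopper part leads on both raw throughput and efficiency in this test.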
