Nvidia unveils inference microservices that can deploy AI applications in minutes

Jensen Huang, CEO of Nvidia, gave a keynote at the Computex trade show in Taiwan about transforming AI models with Nvidia NIM (Nvidia inference microservices) so that AI applications can be deployed within minutes rather than weeks.

He said the world’s 28 million developers can now download Nvidia NIM — inference microservices that provide models as optimized containers — to deploy on clouds, data centers or workstations. It gives them the ability to easily build generative AI applications for copilots, chatbots and more, in minutes rather than weeks, he said.

These new generative AI applications are becoming increasingly complex and often utilize multiple models with different capabilities for generating text, images, video, speech and more. Nvidia NIM dramatically increases developer productivity by providing a simple, standardized way to add generative AI to their applications.


NIM also enables enterprises to maximize their infrastructure investments. For example, running Meta Llama 3-8B in a NIM produces up to three times more generative AI tokens on accelerated infrastructure than without NIM. This lets enterprises boost efficiency and use the same amount of compute infrastructure to generate more responses.

Nearly 200 technology partners — including Cadence, Cloudera, Cohesity, DataStax, NetApp, Scale AI and Synopsys — are integrating NIM into their platforms to speed generative AI deployments for domain-specific applications, such as copilots, code assistants, digital human avatars and more. Hugging Face is now offering NIM — starting with Meta Llama 3.

“Every enterprise is looking to add generative AI to its operations, but not every enterprise has a dedicated team of AI researchers,” said Huang. “Integrated into platforms everywhere, accessible to developers everywhere, running everywhere — Nvidia NIM is helping the technology industry put generative AI in reach for every organization.”

Enterprises can deploy AI applications in production with NIM through the Nvidia AI Enterprise software platform. Starting next month, members of the Nvidia Developer Program can access NIM for free for research, development and testing on their preferred infrastructure.


More than 40 microservices power Gen AI models

NIMs will be useful in a variety of businesses including healthcare.

NIM containers are pre-built to speed model deployment for GPU-accelerated inference and can include Nvidia CUDA software, Nvidia Triton Inference Server and Nvidia TensorRT-LLM software.


Over 40 Nvidia and community models are available to experience as NIM endpoints on ai.nvidia.com, including Databricks DBRX, Google’s open model Gemma, Meta Llama 3, Microsoft Phi-3, Mistral Large, Mixtral 8x22B and Snowflake Arctic.

Developers can now access Nvidia NIM microservices for Meta Llama 3 models from the Hugging Face AI platform. This lets developers easily access and run the Llama 3 NIM in just a few clicks using Hugging Face Inference Endpoints, powered by NVIDIA GPUs on their preferred cloud.
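
NIM endpoints expose an OpenAI-style chat-completions interface, so calling a hosted Llama 3 NIM looks much like calling any other chat API. Here is a minimal sketch using only the standard library; the base URL and model name reflect Nvidia's hosted catalog at the time of writing but should be treated as assumptions, and the API key is a placeholder for your own credentials:

```python
import json
import urllib.request

# Assumed base URL of Nvidia's hosted NIM endpoints (check ai.nvidia.com).
NIM_BASE_URL = "https://integrate.api.nvidia.com/v1"

def build_chat_request(model: str, prompt: str, max_tokens: int = 256) -> dict:
    """Build an OpenAI-style chat-completions payload for a NIM endpoint."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": max_tokens,
    }

def call_nim(payload: dict, api_key: str) -> dict:
    """POST the payload to the hosted endpoint and return the parsed JSON."""
    req = urllib.request.Request(
        f"{NIM_BASE_URL}/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)

# Example usage (requires a real key, so it is left commented out):
# response = call_nim(
#     build_chat_request("meta/llama3-8b-instruct", "Summarize NIM in one sentence."),
#     "YOUR_NVIDIA_API_KEY",
# )
```

Because the wire format matches the OpenAI chat-completions schema, existing client libraries that allow a custom base URL can typically be pointed at a NIM endpoint with no other changes.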

Enterprises can use NIM to run applications for generating text, images and video, speech and digital humans. With Nvidia BioNeMo NIM microservices for digital biology, researchers can build novel protein structures to accelerate drug discovery.

Dozens of healthcare companies are deploying NIM to power generative AI inference across a range of applications, including surgical planning, digital assistants, drug discovery and clinical trial optimization.

Hundreds of AI ecosystem partners embedding NIM

Platform providers including Canonical, Red Hat, Nutanix and VMware (acquired by Broadcom) are supporting NIM on open-source KServe or enterprise solutions. AI application companies Hippocratic AI, Glean, Kinetica and Redis are also deploying NIM to power generative AI inference.


Leading AI tools and MLOps partners — including Amazon SageMaker, Microsoft Azure AI, Dataiku, DataRobot, deepset, Domino Data Lab, LangChain, Llama Index, Replicate, Run.ai, Securiti AI and Weights & Biases — have also embedded NIM into their platforms to enable developers to build and deploy domain-specific generative AI applications with optimized inference.

Global system integrators and service delivery partners Accenture, Deloitte, Infosys, Latentview, Quantiphi, SoftServe, TCS and Wipro have created NIM competencies to help the world’s enterprises quickly develop and deploy production AI strategies.

Enterprises can run NIM-enabled applications virtually anywhere, including on Nvidia-certified systems from global infrastructure manufacturers Cisco, Dell Technologies, Hewlett Packard Enterprise, Lenovo and Supermicro, as well as server manufacturers ASRock Rack, Asus, Gigabyte, Ingrasys, Inventec, Pegatron, QCT, Wistron and Wiwynn. NIM microservices have also been integrated into Amazon Web Services, Google Cloud, Azure and Oracle Cloud Infrastructure.

Industry leaders Foxconn, Pegatron, Amdocs, Lowe’s and ServiceNow are among the businesses using NIM for generative AI applications in manufacturing, healthcare, financial services, retail, customer service and more.


Foxconn — the world’s largest electronics manufacturer — is using NIM in the development of domain-specific LLMs embedded into a variety of internal systems and processes in its AI factories for smart manufacturing, smart cities and smart electric vehicles.

Developers can experiment with Nvidia microservices at ai.nvidia.com at no charge. Enterprises can deploy production-grade NIM microservices with Nvidia AI Enterprise running on Nvidia-certified systems and leading cloud platforms. Starting next month, members of the Nvidia Developer Program will gain free access to NIM for research and testing.

Nvidia-certified systems program


Fueled by generative AI, enterprises globally are creating “AI factories,” where data comes in and intelligence comes out.

Nvidia is positioning its technology as essential to that shift, offering validated systems and reference architectures that reduce the risk and time involved in deploying specialized infrastructure capable of supporting complex, computationally intensive generative AI workloads.

Nvidia also announced today the expansion of its Nvidia-certified systems program, which designates leading partner systems as suited for AI and accelerated computing, so customers can confidently deploy these platforms from the data center to the edge.

Two new certification types are now included: Nvidia-certified Spectrum-X Ready systems for AI in the data center and Nvidia-certified IGX systems for AI at the edge. Each Nvidia-certified system undergoes rigorous testing and is validated to provide enterprise-grade performance, manageability, security and scalability for Nvidia AI Enterprise software workloads, including generative AI applications built with Nvidia NIM (Nvidia inference microservices). The systems provide a trusted pathway to design and implement efficient, reliable infrastructure.

The world’s first Ethernet fabric built for AI, the Nvidia Spectrum-X AI Ethernet platform combines the Nvidia Spectrum-4 SN5000 Ethernet switch series, Nvidia BlueField-3 SuperNICs and networking acceleration software to deliver 1.6x AI networking performance over traditional Ethernet fabrics.

Nvidia-certified Spectrum-X Ready servers will act as building blocks for high-performance AI computing clusters and support powerful Nvidia Hopper architecture and Nvidia L40S GPUs.

Nvidia-certified IGX Systems


Nvidia IGX Orin is an enterprise-ready AI platform for the industrial edge and medical applications that features industrial-grade hardware, a production-grade software stack and long-term enterprise support.


It includes the latest technologies in device security, remote provisioning and management, along with built-in extensions, to deliver high-performance AI and proactive safety for low-latency, real-time applications in such areas as medical diagnostics, manufacturing, industrial robotics, agriculture and more.

Top Nvidia ecosystem partners are set to achieve the new certifications. Asus, Dell Technologies, Gigabyte, Hewlett Packard Enterprise, Ingrasys, Lenovo, QCT and Supermicro will soon offer the certified systems.

And certified IGX systems will soon be available from Adlink, Advantech, Aetina, Ahead, Cosmo Intelligent Medical Devices (a division of Cosmo Pharmaceuticals), Dedicated Computing, Leadtek, Onyx and Yuan.

Nvidia also said that deploying generative AI in the enterprise is about to get easier than ever. Nvidia NIM, a set of generative AI inference microservices, will work with KServe, open-source software that automates putting AI models to work at the scale of a cloud computing application.

The combination ensures generative AI can be deployed like any other large enterprise application. It also makes NIM widely available through platforms from dozens of companies, such as Canonical, Nutanix and Red Hat.

The integration of NIM on KServe extends Nvidia’s technologies to the open-source community, ecosystem partners and customers. Through NIM, they can all access the performance, support and security of the Nvidia AI Enterprise software platform with an API call — the push-button of modern programming.
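
KServe deploys models as InferenceService custom resources, so serving a NIM through KServe amounts to declaring a manifest around the NIM container. A rough sketch of what such a manifest could look like, built as a plain Python dict so the example stays self-contained; the container image reference and port here are illustrative assumptions, not Nvidia's published spec, so consult the official NIM and KServe documentation for the supported fields:

```python
def nim_inference_service(name: str, image: str) -> dict:
    """Sketch of a KServe InferenceService manifest wrapping a NIM container."""
    return {
        "apiVersion": "serving.kserve.io/v1beta1",  # KServe's serving API group
        "kind": "InferenceService",
        "metadata": {"name": name},
        "spec": {
            "predictor": {
                # KServe allows a custom container as the predictor; the NIM
                # container serves its API on port 8000 by convention (assumed).
                "containers": [
                    {
                        "name": "nim",
                        "image": image,  # illustrative image reference
                        "ports": [{"containerPort": 8000, "protocol": "TCP"}],
                    }
                ]
            }
        },
    }

# Hypothetical image name shown for illustration only:
manifest = nim_inference_service(
    "llama3-nim", "nvcr.io/nim/meta/llama3-8b-instruct:latest"
)
```

Serialized to YAML and applied with `kubectl`, a manifest of this shape is what lets KServe scale the NIM like any other cloud application.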

Meanwhile, Huang said Meta Llama 3, Meta’s openly available state-of-the-art large language model — trained and optimized using Nvidia accelerated computing — is dramatically boosting healthcare and life sciences workflows, helping deliver applications that aim to improve patients’ lives.

Now available as a downloadable Nvidia NIM inference microservice at ai.nvidia.com, Llama 3 is equipping healthcare developers, researchers and companies to innovate responsibly across a wide variety of applications. The NIM comes with a standard application programming interface that can be deployed anywhere.
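
Because each NIM exposes the same standard interface wherever it runs, client code can target a self-hosted container exactly as it targets a hosted endpoint, with only the base URL changing. A minimal sketch of that portability, assuming a downloaded Llama 3 NIM listening on the conventional local port 8000 (the port, model name and path are assumptions, not documented guarantees):

```python
import json
import urllib.request

def nim_request(base_url: str, model: str, prompt: str) -> urllib.request.Request:
    """Build a chat-completions request for a NIM at any base URL."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        base_url.rstrip("/") + "/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )

# The same helper serves a self-hosted container or a hosted endpoint;
# only the base URL differs (both URLs are assumed for illustration):
local_req = nim_request("http://localhost:8000/v1",
                        "meta/llama3-8b-instruct", "Hello")
hosted_req = nim_request("https://integrate.api.nvidia.com/v1",
                         "meta/llama3-8b-instruct", "Hello")
```

This is the "deployed anywhere" property in practice: the application layer stays identical whether the model runs in a hospital's own data center or on a public cloud.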

For use cases spanning surgical planning and digital assistants to drug discovery and clinical trial optimization, developers can use Llama 3 to easily deploy optimized generative AI models for copilots, chatbots and more.
