When NVIDIA launched the Hopper architecture in 2022, Jensen Huang said, "NVIDIA H100 is the engine of the world's AI infrastructure that enterprises use to accelerate their AI-driven businesses." NVIDIA Hopper is built to handle complex AI workloads such as training large models and running inference at scale, with up to 30x faster inference than the NVIDIA A100. But how does NVIDIA Hopper achieve this level of performance? Let's explore this in our latest blog.
What is NVIDIA Hopper?
NVIDIA Hopper is a GPU architecture designed to accelerate complex AI and high-performance computing (HPC) workloads. It is named after the American computer scientist and mathematician Grace Hopper. The Hopper architecture is optimised for tasks requiring large-scale parallel processing and enhanced memory efficiency. NVIDIA Hopper GPUs serve a wide range of users, including researchers, developers and enterprises, helping them achieve faster results in their AI and machine learning applications.
NVIDIA Hopper Architecture
The NVIDIA Hopper architecture packs over 80 billion transistors, manufactured on a cutting-edge TSMC 4N process, and introduces the NVLink Switch System, Confidential Computing, the Transformer Engine and second-generation MIG. These features power the NVIDIA H100 and the NVIDIA H200 across AI workloads, from training to inference and from generative AI to deep learning.
Features of NVIDIA Hopper Architecture
The NVIDIA Hopper architecture is built with several key innovations including:
- Transformer Engine: The Transformer Engine delivers up to 9x faster AI training and up to 30x faster AI inference on large language models compared to the prior-generation A100.
- NVLink Switch System: The fourth-generation NVLink achieves 900 GB/s of bidirectional bandwidth per GPU, while NVSwitch scales H200 clusters, providing exceptional throughput for trillion-parameter AI models.
- Confidential Computing: The NVIDIA Hopper GPUs are the first to use confidential computing for data protection during processing. This feature maintains the confidentiality and integrity of AI models and algorithms deployed on any Hopper GPU.
- Second-Generation Multi-Instance GPU (MIG): The enhanced MIG enables multi-tenant configurations with up to seven secure instances per GPU, isolating users while delivering optimal resource allocation for video analytics or smaller workloads.
- Dynamic Programming Execution: DPX instructions accelerate dynamic programming algorithms, such as those used in DNA alignment and graph analytics, by up to 7x over Ampere GPUs.
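To make the dynamic programming point concrete, here is a minimal sketch of Smith-Waterman local alignment, a classic DNA alignment algorithm. It is plain Python and does not use DPX instructions or a GPU at all. It only illustrates the kind of inner loop (an add followed by a max over neighbouring cells) that this class of hardware instruction is aimed at. The function name and the scoring values (`match`, `mismatch`, `gap`) are illustrative assumptions.

```python
def smith_waterman_score(a: str, b: str, match: int = 2,
                         mismatch: int = -1, gap: int = -2) -> int:
    """Return the best local-alignment score between sequences a and b.

    Scoring values are illustrative assumptions, not a standard.
    """
    # Only two rows of the DP table are needed to compute the score.
    prev = [0] * (len(b) + 1)
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            # Core recurrence: add a score, then take the max of the options.
            curr[j] = max(
                0,                    # start a new local alignment
                prev[j - 1] + sub,    # align a[i-1] with b[j-1]
                prev[j] + gap,        # gap in b
                curr[j - 1] + gap,    # gap in a
            )
            best = max(best, curr[j])
        prev = curr
    return best


if __name__ == "__main__":
    print(smith_waterman_score("GATTACA", "TTAC"))
```

Every cell depends only on its three neighbours, so cells along an anti-diagonal can be computed in parallel, which is why this family of algorithms maps well to GPUs.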
Use Cases of NVIDIA Hopper GPUs
NVIDIA Hopper GPUs are designed for high-performance workloads. Check out the use cases of the NVIDIA Hopper GPUs:
- AI Inference: NVIDIA Hopper GPUs deliver industry-leading performance for deploying AI models into production environments. Their ability to process massive amounts of data at high speeds allows for real-time predictions in applications such as autonomous vehicles, healthcare diagnostics, and e-commerce recommendation systems, ensuring rapid and accurate results across a wide range of industries.
- Conversational AI: Optimised for natural language processing (NLP), Hopper GPUs power conversational AI systems, including chatbots and virtual assistants. They efficiently handle the large models and data volumes typical in conversational AI, ensuring high-speed processing for real-time conversations and seamless integration into business solutions such as customer service automation and virtual personal assistants.
- Data Analytics: With superior computational capabilities, Hopper GPUs accelerate data analytics by enabling the rapid processing of massive datasets. Their ability to perform complex calculations in parallel significantly reduces the time required to derive insights from big data, providing valuable intelligence across sectors such as finance, marketing, and logistics, for faster decision-making and competitive advantage.
- Deep Learning Training: Hopper GPUs are ideally suited for deep learning tasks, providing the power needed to train large-scale neural networks. Through optimised tensor operations and memory management, they enable significantly reduced training times, allowing researchers to focus on refining models and accelerating AI breakthroughs in areas like image recognition, speech processing, and natural language understanding.
- Generative AI: For applications such as content creation, simulation, and design, Hopper GPUs provide the computational horsepower necessary for training and executing generative AI models. These models, used in creative tasks such as art generation, video creation, and virtual environments, benefit from the parallel processing and efficiency offered by Hopper, making AI-driven creativity faster and more diverse.
- Prediction and Forecasting: In sectors such as finance, logistics, and retail, Hopper GPUs help process large volumes of data to generate precise predictions and forecasts. These capabilities improve decision-making by delivering accurate real-time insights and forecasts, helping businesses with everything from stock market predictions to supply chain management and demand forecasting.
- Scientific Research and Simulation: Hopper GPUs excel in high-performance computing (HPC) applications, making them invaluable for simulations and scientific research. Their massive computational power enables researchers to conduct highly complex simulations in fields such as astrophysics, climate modelling, and computational chemistry. For memory-intensive tasks, their high memory bandwidth ensures data is processed and accessed efficiently, significantly accelerating time to results.
Performance of NVIDIA Hopper GPUs: NVIDIA H100 vs NVIDIA H200
The NVIDIA Hopper GPUs, the NVIDIA H100 and the NVIDIA H200, have set benchmarks for high-performance computing and AI workloads, with capabilities designed for modern demands. Here's how their performance compares:
Memory and Bandwidth
- NVIDIA H100: The NVIDIA H100 has HBM3 memory with a capacity of up to 80 GB and a memory bandwidth of approximately 3.35 TB/s on the SXM variant (about 2 TB/s on the PCIe variant). It provides robust support for generative AI and HPC applications that need high memory bandwidth.
- NVIDIA H200: The NVIDIA H200 has next-gen HBM3e memory with a capacity of 141 GB and a bandwidth of 4.8 TB/s. This is nearly double the capacity of the NVIDIA H100 and about 1.4x the bandwidth of the H100 SXM, significantly enhancing its ability to handle larger datasets and memory-intensive applications.
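The capacity and bandwidth ratios quoted above can be checked with a few lines of arithmetic. The sketch below assumes the H100 SXM bandwidth figure of 3.35 TB/s and the H100 PCIe figure of 2.0 TB/s. Only the H200 numbers and the 80 GB capacity are taken directly from the comparison, and the constant names are illustrative.

```python
# Figures from the comparison above.
H100_MEMORY_GB = 80
H200_MEMORY_GB = 141
H200_BANDWIDTH_TBS = 4.8

# Assumptions: per-variant H100 bandwidth figures.
H100_SXM_BANDWIDTH_TBS = 3.35
H100_PCIE_BANDWIDTH_TBS = 2.0


def ratio(new: float, old: float) -> float:
    """Return how many times larger `new` is than `old`."""
    if old <= 0:
        raise ValueError("baseline must be positive")
    return new / old


capacity_ratio = ratio(H200_MEMORY_GB, H100_MEMORY_GB)
bandwidth_ratio_sxm = ratio(H200_BANDWIDTH_TBS, H100_SXM_BANDWIDTH_TBS)
bandwidth_ratio_pcie = ratio(H200_BANDWIDTH_TBS, H100_PCIE_BANDWIDTH_TBS)

if __name__ == "__main__":
    print(f"Capacity:         {capacity_ratio:.2f}x")
    print(f"Bandwidth (SXM):  {bandwidth_ratio_sxm:.2f}x")
    print(f"Bandwidth (PCIe): {bandwidth_ratio_pcie:.2f}x")
```

Under these assumptions, the "1.4x more bandwidth" figure corresponds to a comparison against the H100 SXM rather than the PCIe card.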
AI Inference Performance
- NVIDIA H100: The NVIDIA H100 delivers strong inference throughput for large language models like GPT-3 and Llama 2 at standard batch sizes.
- NVIDIA H200: The NVIDIA H200 provides 2x the inference performance for models like Llama 2-70B, supporting batch sizes up to 32. This substantial improvement enables faster processing and efficient scaling for enterprise-level AI applications.
HPC and Scientific Computing
- NVIDIA H100: The NVIDIA H100 offers excellent performance for traditional HPC applications. It remains a reliable choice for a wide variety of simulation and research workloads.
- NVIDIA H200: The NVIDIA H200 delivers up to 110x faster time to results on HPC applications compared to CPUs. Its increased memory bandwidth is crucial for demanding simulations, scientific research, and AI workloads requiring rapid data transfer.
Performance of NVIDIA H100 vs NVIDIA H200 for LLM Workloads
When working with advanced AI models like Llama and GPT, scalability and throughput are imperative. Here’s how the NVIDIA H100 and NVIDIA H200 perform on popular LLM benchmarks [See Source]:
| Model | Batch size (H100) | Batch size (H200) | Throughput improvement |
| --- | --- | --- | --- |
| Llama 2 (13B) | 64 | 128 | Up to 2x |
| Llama 2 (70B) | 8 | 32 | Up to 4x |
| GPT-3 (175B) | 64 | 128 | Up to 2x |
Llama 2 Performance
As seen above, the NVIDIA H200 offers significant improvements for Llama 2 (13B), supporting batch sizes up to 128 while maintaining higher inference throughput, whereas the NVIDIA H100 processes batch sizes of up to 64 efficiently.
For Llama 2 (70B), NVIDIA H100 provides solid performance on standard batch sizes of up to 8. The NVIDIA H200 can handle larger batch sizes, increasing capacity fourfold to batch sizes of 32. This dramatically accelerates throughput, making it ideal for real-time AI applications.
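Why does more memory translate into larger batch sizes? A rough way to see it is to estimate the key-value (KV) cache each sequence needs and divide the memory left over after loading the weights. The sketch below is a back-of-the-envelope estimate, not a reproduction of the benchmark above. It assumes the published Llama 2 70B shape (80 layers, 8 KV heads, head dimension 128), a 16-bit KV cache, weights quantised to 8 bits (about 70 GB), a single GPU, and a hypothetical 4,096 tokens per sequence. It ignores activations, fragmentation and framework overhead, so real limits will be lower.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelShape:
    layers: int
    kv_heads: int
    head_dim: int


# Assumption: Llama 2 70B attention shape (grouped-query attention).
LLAMA2_70B = ModelShape(layers=80, kv_heads=8, head_dim=128)


def kv_bytes_per_token(shape: ModelShape, bytes_per_value: int = 2) -> int:
    """KV-cache bytes for one token: keys and values for every layer."""
    return 2 * shape.layers * shape.kv_heads * shape.head_dim * bytes_per_value


def max_batch(total_memory_gb: float, weight_gb: float,
              tokens_per_sequence: int, shape: ModelShape) -> int:
    """Upper bound on concurrent sequences that fit after loading weights."""
    free_bytes = (total_memory_gb - weight_gb) * 1e9
    if free_bytes <= 0:
        return 0  # the weights alone do not fit
    per_sequence = kv_bytes_per_token(shape) * tokens_per_sequence
    return int(free_bytes // per_sequence)


if __name__ == "__main__":
    for name, memory_gb in (("H100", 80), ("H200", 141)):
        # Hypothetical scenario: 70 GB of weights, 4,096 tokens per sequence.
        print(name, max_batch(memory_gb, 70, 4096, LLAMA2_70B))
```

The absolute numbers depend heavily on the assumptions, but the pattern holds: once the weights are loaded, the H200's extra capacity goes almost entirely to the KV cache, so the feasible batch size grows much faster than the raw memory ratio.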
GPT-3 Performance
An 8-GPU NVIDIA H100 SXM configuration delivers reliable performance at batch sizes of up to 64 for the GPT-3 (175B) model. With the same 8-GPU SXM configuration, the NVIDIA H200 doubles batch size capacity to 128 for faster computation.
Efficiency in Inference
Inference is a compute-intensive process that benefits significantly from the advanced architecture and memory bandwidth of the NVIDIA H200. By doubling inference performance compared to the NVIDIA H100, the NVIDIA H200 enables faster responses and real-time capabilities in scenarios involving massive datasets or concurrent queries. Generative AI applications such as retrieval-augmented generation (RAG), complex question answering and AI-based chatbots benefit most from the NVIDIA H200.
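One way to see why memory bandwidth matters for inference: when generating text at small batch sizes, each new token typically requires streaming the model weights from GPU memory once, so bandwidth sets a floor on the time per decode step. The sketch below computes that floor. It assumes a hypothetical 70 GB of weights, and it takes 3.35 TB/s (H100 SXM) and 4.8 TB/s (H200) as the bandwidth figures. It ignores compute time, KV-cache reads and multi-GPU communication, so it is a lower bound rather than a prediction.

```python
def min_decode_step_ms(weight_gb: float, bandwidth_tbs: float) -> float:
    """Lower bound on one decode step: time to stream all weights once."""
    if bandwidth_tbs <= 0:
        raise ValueError("bandwidth must be positive")
    seconds = (weight_gb * 1e9) / (bandwidth_tbs * 1e12)
    return seconds * 1e3


if __name__ == "__main__":
    # Assumptions: 70 GB of weights; per-GPU bandwidth figures in TB/s.
    for name, bandwidth in (("H100 SXM", 3.35), ("H200", 4.8)):
        print(f"{name}: at least {min_decode_step_ms(70, bandwidth):.1f} ms per step")
```

On this simple model, bandwidth alone accounts for roughly a 1.4x difference per step. Gains beyond that would come from the larger batches that the extra memory capacity allows.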
NVIDIA Hopper GPUs on AI Supercloud
At the AI Supercloud, we offer the NVIDIA Hopper GPUs, but we don’t just deliver hardware: we optimise it to match your specific needs. On the AI Supercloud, you get:
- Reference Architecture: The AI Supercloud features reference architectures developed in partnership with NVIDIA, including the NVIDIA HGX H100 and NVIDIA HGX H200, providing state-of-the-art solutions for AI and HPC workloads.
- Customisation: We offer full customisation, allowing you to tailor hardware configurations, including GPUs, CPUs, RAM, storage and middleware to meet your specific workload requirements.
- Advanced Networking: Our solution integrates NVIDIA-certified WEKA storage with GPUDirect Storage support, alongside advanced networking solutions like NVLink and NVIDIA Quantum-2 InfiniBand for faster AI performance.
- Scalable Solutions: You can scale effortlessly by accessing additional GPU resources on-demand for workload bursting through Hyperstack. If you have more demanding needs, you can scale up to thousands of NVIDIA Hopper GPUs within as little as 8 weeks.
Want to Get Started? Talk to a Solutions Engineer
Book a call with our specialists to discover the best solution for your project’s budget, timeline, and technologies.
FAQs
What is NVIDIA Hopper?
NVIDIA Hopper is an advanced GPU architecture designed for high-performance AI and HPC workloads, featuring innovations like the Transformer Engine and NVLink Switch System for optimal performance in training, inference, and generative AI applications.
What are NVIDIA Hopper GPUs used for?
NVIDIA Hopper GPUs are ideal for demanding tasks such as AI inference, deep learning training, scientific simulations, data analytics, and generative AI, accelerating these workloads for faster and more efficient results.
What is so special about NVIDIA Hopper?
The NVIDIA Hopper architecture stands out with innovations like the Transformer Engine for AI model training speedups, increased memory capacity, and scalable multi-instance configurations, providing unmatched performance in large-scale AI tasks.
Can I get Hopper GPUs on the AI Supercloud?
Yes, AI Supercloud offers the NVIDIA HGX H100 and NVIDIA HGX H200, optimised and customisable to meet your specific AI and HPC workload needs for maximum performance.
Are the NVIDIA Hopper GPUs scalable?
Yes, the NVIDIA Hopper GPUs available on the AI Supercloud are fully scalable, allowing you to easily scale your resources as your AI and HPC workloads grow. With on-demand GPU resources and fast provisioning, you can scale up to thousands of NVIDIA Hopper GPUs in as little as eight weeks.