NVIDIA on Tuesday (Eastern Time) announced a major breakthrough in its collaboration with French AI startup Mistral AI. Leveraging NVIDIA's latest chip technology, Mistral AI's new open-source model family achieves significant gains in performance, efficiency, and deployment flexibility.
The highlight of the collaboration is the Mistral Large 3 model, which demonstrated a 10x performance boost on NVIDIA's GB200 NVL72 system compared with the previous-generation H200 chip. This leap translates into better user experiences, lower cost per response, and higher energy efficiency: the model can process more than 5 million tokens per second per megawatt (MW) of power consumed.
Beyond large models, the Ministral 3 series of compact models has been optimized for NVIDIA's edge platforms, enabling deployment on RTX PCs, laptops, and Jetson devices. Enterprises can thus deploy AI applications in any scenario, from cloud to edge, without relying on continuous network connectivity.
Mistral AI's newly released model family includes one large frontier model and nine smaller models, all accessible via open-source platforms like Hugging Face and major cloud providers. Industry experts view this release as marking a new phase of "distributed intelligence" in open-source AI, bridging the gap between research breakthroughs and practical applications.
**GB200 System Powers Large Model Performance Leap**
Mistral Large 3 is a Mixture of Experts (MoE) model with 675 billion total parameters, 41 billion active parameters, and a 256K-token context window. Its architecture activates only the most relevant experts for each token rather than the full network, preserving accuracy while scaling efficiently.
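To make the routing idea concrete, here is a minimal, illustrative sketch of top-k expert routing in Python. The layer sizes, expert count, and random weights are stand-ins for exposition, not Mistral Large 3's actual configuration.

```python
import numpy as np

# Minimal sketch of Mixture-of-Experts routing: a router scores all experts
# for each token, and only the top-k experts actually run, so the active
# parameter count stays far below the total. Sizes are illustrative.
rng = np.random.default_rng(0)
d_model, n_experts, top_k = 64, 8, 2

experts = [rng.standard_normal((d_model, d_model)) * 0.02 for _ in range(n_experts)]
router_w = rng.standard_normal((d_model, n_experts)) * 0.02

def moe_layer(token: np.ndarray) -> np.ndarray:
    logits = token @ router_w                  # router score per expert
    top = np.argsort(logits)[-top_k:]          # indices of the top-k experts
    weights = np.exp(logits[top]) / np.exp(logits[top]).sum()  # softmax over chosen experts
    # Only the selected experts are evaluated; the rest are skipped entirely.
    return sum(w * (token @ experts[i]) for w, i in zip(weights, top))

out = moe_layer(rng.standard_normal(d_model))
print(out.shape)  # (64,)
```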
NVIDIA stated that Mistral Large 3 achieved best-in-class performance on the GB200 NVL72 through a suite of optimizations tailored to advanced MoE models. Three key technologies enabled the breakthrough: Wide Expert Parallelism for optimized MoE kernel execution, NVFP4 low-precision inference for reduced compute cost, and the Dynamo distributed inference framework for improved long-context processing.
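As a rough illustration of why block-scaled low-precision formats like NVFP4 cut inference cost, the sketch below implements generic block-scaled 4-bit quantization with a simple symmetric integer codebook. The real NVFP4 format (FP4 values sharing per-block scales) differs in encoding details, so treat this as an analogy rather than NVIDIA's implementation.

```python
import numpy as np

# Generic block-scaled 4-bit quantization: each small block of weights
# shares one scale, and each value is stored in roughly 4 bits, shrinking
# memory traffic and compute relative to FP16/BF16.
def quantize_blocked(w: np.ndarray, block: int = 16):
    w = w.reshape(-1, block)
    scales = np.abs(w).max(axis=1, keepdims=True) / 7.0        # one scale per block
    q = np.clip(np.round(w / scales), -7, 7).astype(np.int8)   # 4-bit range [-7, 7]
    return q, scales

def dequantize_blocked(q: np.ndarray, scales: np.ndarray) -> np.ndarray:
    return (q * scales).reshape(-1)

w = np.random.default_rng(1).standard_normal(256).astype(np.float32)
q, s = quantize_blocked(w)
err = np.abs(w - dequantize_blocked(q, s)).mean()
print(f"mean abs quantization error: {err:.4f}")
```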
The model is compatible with mainstream inference frameworks like TensorRT-LLM, SGLang, and vLLM, allowing developers to flexibly deploy it across NVIDIA GPUs of varying scales with precision formats and hardware configurations suited to their needs.
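As an example of that flexibility, here is a minimal offline-inference sketch using vLLM's Python API. The Hugging Face model id is a hypothetical placeholder, since the exact repository names are not confirmed here.

```python
from vllm import LLM, SamplingParams

# Minimal offline inference with vLLM. The model id is a hypothetical
# placeholder for one of the Mistral 3 checkpoints on Hugging Face;
# substitute the actual repository name.
llm = LLM(model="mistralai/Ministral-3-8B-Instruct")
params = SamplingParams(temperature=0.7, max_tokens=128)

outputs = llm.generate(["Explain Mixture of Experts in one sentence."], params)
print(outputs[0].outputs[0].text)
```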
**Small Models Optimized for Edge Deployment**
The Ministral 3 series comprises nine high-performance dense models across three parameter sizes (3B, 8B, and 14B), each offered in base, instruction-tuned, and reasoning variants. All support vision, handle 128K–256K-token contexts, and are multilingual.
On an NVIDIA RTX 5090 GPU, these small models achieve up to 385 tokens per second. Jetson Thor devices deliver 52 tokens per second at single concurrency and scale to 273 tokens per second at a concurrency of 8 using vLLM containers. Collaborations with Ollama and llama.cpp further optimize edge performance, enabling deployment on GeForce RTX AI PCs, DGX Spark, and Jetson devices for faster iteration, lower latency, and stronger data privacy.
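For local deployment, Ollama exposes a simple HTTP API on the developer's machine. The sketch below assumes a hypothetical `ministral:8b` model tag, since the official tag name is not given here.

```python
import requests

# Query a locally running Ollama server (default port 11434).
# "ministral:8b" is a hypothetical model tag; pull the real one first,
# e.g. with `ollama pull <tag>`.
resp = requests.post(
    "http://localhost:11434/api/generate",
    json={
        "model": "ministral:8b",
        "prompt": "Give one advantage of on-device inference.",
        "stream": False,  # return the full response in one JSON object
    },
    timeout=120,
)
print(resp.json()["response"])
```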
Because it can run on a single GPU, Ministral 3 can power robots, autonomous drones, vehicles, smartphones, and laptops, making it well suited to network-constrained or offline environments.
**Mistral Accelerates Commercialization**
This release marks Mistral AI's latest move to compete with leaders such as OpenAI, Google, and DeepSeek. Founded in 2023, the company raised €1.7 billion last September, including €1.3 billion from ASML and participation from NVIDIA, at a valuation of €11.7 billion.
Guillaume Lample, Mistral AI's co-founder and chief scientist, noted that while large closed models excel on initial benchmarks, fine-tuned small models often match or surpass them in enterprise-specific use cases, at lower cost and higher speed.
Mistral AI is rapidly commercializing, recently partnering with HSBC to provide models for financial analysis and translation tasks. Multi-million-dollar contracts with other firms and projects in physical AI (e.g., robotics with Singapore's HTX, Germany's Helsing, and Stellantis) underscore its expansion.
Mistral Large 3 and Ministral-14B-Instruct are now available through NVIDIA's API catalog as preview APIs, and upcoming NIM microservices will enable easy deployment on any GPU-accelerated infrastructure. All Mistral 3 models can be downloaded from Hugging Face.
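NVIDIA's API catalog serves models through an OpenAI-compatible endpoint, so a preview call can look like the sketch below. The model id string is an assumption for illustration; check the catalog entry for the exact name.

```python
from openai import OpenAI

# Call a preview model in NVIDIA's API catalog via its OpenAI-compatible
# endpoint. The model id is a hypothetical placeholder.
client = OpenAI(
    base_url="https://integrate.api.nvidia.com/v1",
    api_key="YOUR_NVIDIA_API_KEY",  # replace with your own key
)

resp = client.chat.completions.create(
    model="mistralai/mistral-large-3-instruct",  # hypothetical id
    messages=[{"role": "user", "content": "What is the GB200 NVL72?"}],
    max_tokens=128,
)
print(resp.choices[0].message.content)
```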