

Nemotron 3 Ultra — reasoning and orchestration model from NVIDIA built on a hybrid Transformer-Mamba Mixture-of-Experts architecture with 550B parameters and 1M token context.
NVIDIA Nemotron 3 Ultra is a next-generation reasoning and orchestration model based on a hybrid Transformer-Mamba Mixture-of-Experts architecture with 550 billion total parameters and 55 billion active parameters per forward pass. Designed for enterprise-grade agentic workflows, it delivers strong performance on complex reasoning tasks, multi-step analysis, and long-document understanding — with support for contexts up to 1 million tokens.
Nemotron 3 Ultra uses a hybrid Transformer-Mamba architecture combined with Mixture-of-Experts (MoE) routing. The MoE design activates only 55B of 550B parameters per token, enabling efficient inference at scale. The Mamba layers provide linear-complexity sequence modeling for long-context tasks, while Transformer attention layers handle high-precision reasoning over complex inputs.
- Extended Context: Supports up to 1M token context for long-document analysis and retrieval.
- Complex Reasoning: Optimized for chain-of-thought, multi-step problem solving, and logical inference.
- Tool Use: Supports function calling for agentic and orchestration workflows.
- Agent Orchestration: Designed as both an orchestrator and sub-agent in multi-agent pipelines.
- Instruction Following: Strong performance on precise instruction adherence across diverse tasks.
- Code Generation: Capable of generating, reviewing, and debugging complex code across languages.
- Long-Context Summarization: Processes and summarizes large documents, codebases, and transcripts.
NVIDIA Nemotron 3 Ultra is a next-generation reasoning and orchestration model based on a hybrid Transformer-Mamba Mixture-of-Experts architecture with 550 billion total parameters and 55 billion active parameters per forward pass. Designed for enterprise-grade agentic workflows, it delivers strong performance on complex reasoning tasks, multi-step analysis, and long-document understanding — with support for contexts up to 1 million tokens.
Nemotron 3 Ultra uses a hybrid Transformer-Mamba architecture combined with Mixture-of-Experts (MoE) routing. The MoE design activates only 55B of 550B parameters per token, enabling efficient inference at scale. The Mamba layers provide linear-complexity sequence modeling for long-context tasks, while Transformer attention layers handle high-precision reasoning over complex inputs.
- Extended Context: Supports up to 1M token context for long-document analysis and retrieval.
- Complex Reasoning: Optimized for chain-of-thought, multi-step problem solving, and logical inference.
- Tool Use: Supports function calling for agentic and orchestration workflows.
- Agent Orchestration: Designed as both an orchestrator and sub-agent in multi-agent pipelines.
- Instruction Following: Strong performance on precise instruction adherence across diverse tasks.
- Code Generation: Capable of generating, reviewing, and debugging complex code across languages.
- Long-Context Summarization: Processes and summarizes large documents, codebases, and transcripts.