MiMo V2.5 Pro

MiMo V2.5 Pro is the Pro tier of Xiaomi's MiMo v2.5 family, a Mixture-of-Experts (MoE) reasoning model built for agentic workflows, software engineering, and long-horizon tasks. It supports a 1.1M-token context window and up to 131K output tokens.

Capabilities: Reasoning, Tool Use, Vision (Image), File Input, Implicit Caching

index.ts

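import { streamText } from 'ai'

// Start a streaming request; the model is addressed by its AI Gateway slug.
const result = streamText({
  model: 'xiaomi/mimo-v2.5-pro',
  prompt: 'Why is the sky blue?'
})

// Print each text chunk as it arrives.
for await (const chunk of result.textStream) {
  process.stdout.write(chunk)
}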


Playground

Try out MiMo V2.5 Pro by Xiaomi. Usage is billed to your team at API rates. Free users (those who haven't made a payment) get $5 of credits every 30 days.

Providers

Route requests across multiple providers. Copy a provider slug to set your preference. Visit the docs for more info. Using a provider means you agree to their terms, listed under Legal.

Provider	Context	Latency	Throughput	Input	Output	Cache Read	Cache Write	Release Date	Legal
xiaomi	1.1M	2.6s	50tps	$1.00/M	$3.00/M	$0.2/M	—	04/22/2026	Terms, Privacy
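
The per-million-token rates listed for this provider can be turned into a rough per-request estimate. The sketch below is illustrative only: the function and field names are made up for this example, and it assumes cached prompt tokens are billed at the cache read rate while all other prompt tokens are billed at the input rate. Check the live pricing before relying on the numbers.

```typescript
// Rates copied from the Providers listing above (USD per million tokens).
// The interface and function names are illustrative, not part of any SDK.
interface Rates {
  inputPerM: number
  outputPerM: number
  cacheReadPerM: number
}

const MIMO_V25_PRO_RATES: Rates = {
  inputPerM: 1.0,
  outputPerM: 3.0,
  cacheReadPerM: 0.2,
}

interface Usage {
  inputTokens: number       // uncached prompt tokens
  cachedInputTokens: number // prompt tokens assumed to be served from the cache
  outputTokens: number
}

// Returns the estimated cost in USD for one request.
function estimateCostUsd(usage: Usage, rates: Rates = MIMO_V25_PRO_RATES): number {
  const perToken = (ratePerM: number) => ratePerM / 1_000_000
  return (
    usage.inputTokens * perToken(rates.inputPerM) +
    usage.cachedInputTokens * perToken(rates.cacheReadPerM) +
    usage.outputTokens * perToken(rates.outputPerM)
  )
}

// Example: a 200K-token prompt, half of it cached, with an 8K-token reply.
const cost = estimateCostUsd({
  inputTokens: 100_000,
  cachedInputTokens: 100_000,
  outputTokens: 8_000,
})
console.log(cost.toFixed(4)) // "0.1440"
```

In this example the cached half of the prompt costs about a fifth of the uncached half, which is why stable prompt prefixes matter for long-context runs.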

More models by Xiaomi

Model	Context	Latency	Throughput	Input	Output	Cache Read	Cache Write	Release Date
[missing]	1.1M	1.6s	111tps	$0.40/M	$2.00/M	$0.08/M	—	04/22/2026
[missing]	[missing]	2.2s	46tps	$1.00/M	$3.00/M	$0.2/M	—	03/18/2026
[missing]	262K	1.6s	133tps	$0.10/M	$0.30/M	$0.02/M	—	12/17/2025

About MiMo V2.5 Pro

MiMo V2.5 Pro is the Pro variant in Xiaomi's MiMo v2.5 family, released April 22, 2026 under the MIT license. Compared with the standard tier, Pro has a larger parameter pool and activates a larger share of it per token, which raises reasoning depth at a higher per-token cost.

Like the rest of the line, MiMo V2.5 Pro uses a Mixture-of-Experts (MoE) stack with hybrid attention. Sliding-window and full attention combine in a fixed ratio, which cuts KV-cache storage versus dense attention at the same sequence length. Three multi-token prediction (MTP) blocks raise output tokens per inference step. The full window of 1.1M tokens fits long agent traces, repos, or document sets.
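
To make the KV-cache claim concrete, here is a back-of-the-envelope sketch of how mixing sliding-window and full-attention layers reduces the number of cached token positions. Every number in it (layer count, ratio, window size) is a hypothetical placeholder chosen for the arithmetic, not a published MiMo V2.5 Pro specification, and the layer layout is an assumption.

```typescript
// All values here are hypothetical placeholders, not MiMo V2.5 Pro specs.
interface HybridAttentionConfig {
  layers: number             // total attention layers
  fullAttentionEvery: number // one full-attention layer per this many layers
  windowTokens: number       // sliding-window size used by the other layers
}

// Counts cached token positions summed over all layers.
// Multiply by 2 (keys and values) * kvHeads * headDim * bytesPerValue for bytes.
function cachedPositions(seqLen: number, cfg: HybridAttentionConfig): number {
  let total = 0
  for (let layer = 0; layer < cfg.layers; layer++) {
    const isFull = (layer + 1) % cfg.fullAttentionEvery === 0
    // Full-attention layers keep every position; windowed layers keep at most one window.
    total += isFull ? seqLen : Math.min(seqLen, cfg.windowTokens)
  }
  return total
}

const hypotheticalConfig: HybridAttentionConfig = {
  layers: 48,
  fullAttentionEvery: 6,
  windowTokens: 4096,
}
const contextTokens = 1_000_000

const hybrid = cachedPositions(contextTokens, hypotheticalConfig)
// Setting fullAttentionEvery to 1 makes every layer full attention (the dense baseline).
const dense = cachedPositions(contextTokens, { ...hypotheticalConfig, fullAttentionEvery: 1 })

console.log((hybrid / dense).toFixed(3)) // "0.170"
```

Under these made-up settings the hybrid layout caches roughly 17% of what dense attention would at the same sequence length; the real ratio depends on the model's actual configuration.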

MiMo V2.5 Pro supports reasoning, tool calling, file input, vision, and implicit prompt caching. Call it through xiaomi via AI Gateway. For lower-cost everyday work, see mimo-v2.5.

What To Consider When Choosing a Provider

Configuration: MiMo V2.5 Pro sits at the Pro end of MiMo v2.5. Per-token cost is higher than mimo-v2.5, but accuracy on hard math, agentic, and engineering tasks is the reason to pick it. Use AI Gateway's routing and fallback to send easy work to mimo-v2.5 and reserve MiMo V2.5 Pro for the requests where reasoning depth pays off.
Zero Data Retention: AI Gateway does not currently support Zero Data Retention for this model. See the documentation for models that support ZDR.
Authentication: AI Gateway authenticates requests using an API key or OIDC token. You do not need to manage provider credentials directly.
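
The routing idea in the Configuration point can be sketched on the client side as follows. This is a minimal illustration, not AI Gateway's built-in routing or fallback feature. `xiaomi/mimo-v2.5-pro` is the slug shown on this page; `xiaomi/mimo-v2.5` is an assumed slug for the standard tier, and the `call` parameter stands in for whatever function actually sends the request.

```typescript
// 'xiaomi/mimo-v2.5-pro' appears on this page; 'xiaomi/mimo-v2.5' is an assumed slug.
const PRO = 'xiaomi/mimo-v2.5-pro'
const STANDARD = 'xiaomi/mimo-v2.5'

type Difficulty = 'easy' | 'hard'

// Hard requests try Pro first and fall back to the standard tier;
// easy requests go straight to the standard tier.
function modelOrder(difficulty: Difficulty): string[] {
  return difficulty === 'hard' ? [PRO, STANDARD] : [STANDARD]
}

// Tries each model in order and returns the first successful result.
// `call` is a hypothetical function that sends one request to the given model.
async function callWithFallback<T>(
  models: string[],
  call: (model: string) => Promise<T>,
): Promise<T> {
  let lastError: unknown = new Error('no models given')
  for (const model of models) {
    try {
      return await call(model)
    } catch (err) {
      lastError = err // remember the failure and move on to the next model
    }
  }
  throw lastError
}
```

A caller would pass something like `callWithFallback(modelOrder('hard'), sendRequest)`, where `sendRequest` wraps the SDK or HTTP call. Classifying a request as easy or hard is left to the application.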

When to Use MiMo V2.5 Pro

Best For

Long-Horizon Agents: Trajectories that span thousands of tool calls in a single run
Complex Software Engineering: Issue resolution, repo-level edits, and multi-file refactors
Math and Proofs: Long logical chains where intermediate reasoning steps matter
Long-Context Reasoning: Documents and codebases approaching 1.1M tokens
Pro-Tier MoE: Higher active-parameter compute for harder reasoning workloads

Consider Alternatives When

Cost-Sensitive Workloads: mimo-v2.5 costs less per token for everyday agent or code work
Short Prompt-and-Reply Calls: A smaller model is enough when you don't need deep reasoning
Speed-First Tasks: mimo-v2-flash from the previous generation is throughput-tuned
Simple Extraction Jobs: A lightweight model handles extraction and classification at lower cost

Conclusion

MiMo V2.5 Pro is the Pro pick in Xiaomi's MiMo v2.5 lineup. Use it for long-horizon agents, complex software engineering, and math-heavy reasoning. Pair it with mimo-v2.5 through AI Gateway routing so you can balance cost and quality across a mixed workload.

Frequently Asked Questions

How does MiMo V2.5 Pro differ from `mimo-v2.5`?

It's the Pro tier. MiMo V2.5 Pro has a larger parameter pool than `mimo-v2.5` and activates a larger share of it per token, trading higher per-token cost for stronger reasoning, code, and agentic scores.

What architecture does MiMo V2.5 Pro use?

A Mixture-of-Experts (MoE) stack with hybrid attention. Each token activates a subset of expert blocks, and sliding-window plus full attention combine to keep KV-cache storage manageable across the 1.1M-token window.

What's the context window for MiMo V2.5 Pro?

1.1M tokens. Hybrid attention keeps long-context runs practical, and multi-token prediction raises output tokens per inference step.

Does MiMo V2.5 Pro support tool calling and reasoning modes?

Yes. MiMo V2.5 Pro supports reasoning and tool calling, both exposed through AI Gateway. Use them through the AI SDK, the Chat Completions API, the Responses API, or any other supported format.

How do I authenticate requests to MiMo V2.5 Pro through AI Gateway?

Authenticate with an AI Gateway API key, managed in your AI Gateway project settings, or with an OIDC token. Use `xiaomi/mimo-v2.5-pro` as the model ID in API calls. AI Gateway handles routing, retries, and failover for the xiaomi provider, so you do not manage provider credentials directly.

What does MiMo V2.5 Pro cost?

See the pricing section on this page for today's rates. AI Gateway tracks each provider's pricing for MiMo V2.5 Pro, so the numbers shown stay current.

Does MiMo V2.5 Pro support zero data retention?

Not currently. Zero Data Retention (ZDR) is offered on a per-provider basis and is not available for this model. See https://vercel.com/docs/ai-gateway/capabilities/zdr for details.

Can I route between MiMo V2.5 Pro and `mimo-v2.5` automatically?

Yes. AI Gateway supports fallback and routing. Send hard reasoning, code, and agentic requests to MiMo V2.5 Pro and use `mimo-v2.5` for simpler tasks to keep costs in check.
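
The tool-calling and authentication answers above can be sketched as a single Chat Completions request. This is a hedged illustration rather than official client code: the endpoint URL is an assumption about AI Gateway's OpenAI-compatible API, `YOUR_API_KEY` is a placeholder, and `get_weather` is a made-up tool. The request body follows the widely used Chat Completions `tools` format.

```typescript
// Assumed OpenAI-compatible endpoint; confirm the URL in the AI Gateway docs.
const GATEWAY_URL = 'https://ai-gateway.vercel.sh/v1/chat/completions'

interface ChatRequest {
  url: string
  init: { method: string; headers: Record<string, string>; body: string }
}

// Builds a Chat Completions request that offers the model one tool.
// `get_weather` is a hypothetical tool used only for illustration.
function buildToolCallRequest(apiKey: string, userPrompt: string): ChatRequest {
  const body = {
    model: 'xiaomi/mimo-v2.5-pro',
    messages: [{ role: 'user', content: userPrompt }],
    tools: [
      {
        type: 'function',
        function: {
          name: 'get_weather',
          description: 'Look up the current weather for a city',
          parameters: {
            type: 'object',
            properties: { city: { type: 'string' } },
            required: ['city'],
          },
        },
      },
    ],
  }
  return {
    url: GATEWAY_URL,
    init: {
      method: 'POST',
      headers: {
        Authorization: `Bearer ${apiKey}`, // the API key travels as a bearer token
        'Content-Type': 'application/json',
      },
      body: JSON.stringify(body),
    },
  }
}

const toolRequest = buildToolCallRequest('YOUR_API_KEY', 'What is the weather in Paris?')
// To send it: const response = await fetch(toolRequest.url, toolRequest.init)
```

If the model decides to use the tool, the response carries a tool call with JSON arguments; the application runs the tool and sends the result back in a follow-up message.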
