MiMo M2.5

MiMo M2.5 is the standard-tier model in Xiaomi's MiMo v2.5 family, a Mixture-of-Experts (MoE) stack with reasoning, tool use, and multimodal input. It supports a context window of 1.1M tokens and a maximum output of 131.1K tokens.

Capabilities: Reasoning, Tool Use, Implicit Caching, File Input, Vision (Image)

index.ts

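import { streamText } from 'ai'

const result = streamText({
  model: 'xiaomi/mimo-v2.5',
  prompt: 'Why is the sky blue?'
})

// Consume the stream; without this loop nothing is printed.
for await (const chunk of result.textStream) {
  process.stdout.write(chunk)
}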


Playground

Try out MiMo M2.5 by Xiaomi. Usage is billed to your team at API rates. Free users (those who haven't made a payment) get $5 of credits every 30 days.

Providers

Route requests across multiple providers. Copy a provider slug to set your preference. Visit the docs for more info. Using a provider means you agree to their terms, listed under Legal.

Provider	Context	Latency	Throughput	Input	Output	Cache Read	Cache Write	Release Date
xiaomi	1.1M	1.6s	111tps	$0.40/M	$2.00/M	$0.08/M	—	04/22/2026

Legal: Terms, Privacy
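As a rough worked example of the rates listed above, the sketch below estimates the cost of a single request. The helper name `estimateCostUsd` and the token counts are hypothetical, and it assumes that cached prompt tokens are billed at the cache-read rate instead of the input rate. Rates may change, so treat the constants as illustrative.

```typescript
// Rates in USD per one million tokens, copied from the provider row above.
// They may change; treat them as illustrative.
const RATES_PER_MILLION = {
  input: 0.4,
  output: 2.0,
  cacheRead: 0.08,
} as const;

interface Usage {
  inputTokens: number; // uncached prompt tokens
  cachedInputTokens: number; // prompt tokens served from the implicit cache (assumed billing)
  outputTokens: number; // generated tokens
}

// Hypothetical helper: estimate the USD cost of a single request.
function estimateCostUsd(usage: Usage): number {
  const perToken = (ratePerMillion: number): number => ratePerMillion / 1_000_000;
  return (
    usage.inputTokens * perToken(RATES_PER_MILLION.input) +
    usage.cachedInputTokens * perToken(RATES_PER_MILLION.cacheRead) +
    usage.outputTokens * perToken(RATES_PER_MILLION.output)
  );
}

// 200K fresh input + 800K cached input + 10K output:
// 0.08 + 0.064 + 0.02, or about $0.164.
const exampleCostUsd = estimateCostUsd({
  inputTokens: 200_000,
  cachedInputTokens: 800_000,
  outputTokens: 10_000,
});
```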

More models by Xiaomi

Model	Context	Latency	Throughput	Input	Output	Cache Read	Cache Write	Release Date
[name missing]	1.1M	2.6s	50tps	$1.00/M	$3.00/M	$0.2/M	—	04/22/2026
[name missing]	[missing]	2.2s	46tps	$1.00/M	$3.00/M	$0.2/M	—	03/18/2026
[name missing]	262K	1.6s	133tps	$0.10/M	$0.30/M	$0.02/M	—	12/17/2025

About MiMo M2.5

MiMo M2.5 is a MoE language model from Xiaomi, released April 22, 2026 under the MIT license. Each forward pass activates a subset of total parameters, which keeps per-token compute lower than a dense model at the same parameter count.
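To make "activates a subset of total parameters" concrete, here is a minimal sketch of generic top-k expert routing. It is a textbook-style illustration, not Xiaomi's actual router: the function name `routeTopK`, the logits, and the choice of k are all hypothetical.

```typescript
interface ExpertChoice {
  expert: number; // index of the selected expert
  weight: number; // mixing weight, renormalized over the selected experts
}

// Generic top-k routing: softmax over router logits, keep the k most
// probable experts, and renormalize their weights to sum to 1.
// Only the selected experts would run for this token.
function routeTopK(logits: number[], k: number): ExpertChoice[] {
  const max = Math.max(...logits); // subtract the max for numerical stability
  const exps = logits.map((x) => Math.exp(x - max));
  const total = exps.reduce((a, b) => a + b, 0);
  const top = exps
    .map((e, expert) => ({ expert, p: e / total }))
    .sort((a, b) => b.p - a.p)
    .slice(0, k);
  const topTotal = top.reduce((a, t) => a + t.p, 0);
  return top.map((t) => ({ expert: t.expert, weight: t.p / topTotal }));
}

// With four hypothetical experts and k = 2, experts 1 and 3 are selected.
const chosen = routeTopK([0.1, 2.0, -1.0, 1.5], 2);
```

With k much smaller than the number of experts, per-token compute scales with the selected experts only, which is the effect the paragraph above describes.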

The architecture uses hybrid attention, interleaving sliding-window and full attention to cut KV-cache storage at long sequence lengths. A multi-token prediction (MTP) block raises output tokens per step during inference. The full window of 1.1M tokens lets MiMo M2.5 reason over large documents, repos, or long agent trajectories.
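The KV-cache saving from interleaving sliding-window and full attention can be sketched with simple arithmetic. Every number below (layer counts, window size, head count, head dimension, value width) is a hypothetical placeholder, not MiMo M2.5's published configuration; only the shape of the calculation is the point.

```typescript
interface AttentionLayout {
  fullLayers: number; // layers that cache keys/values for the whole sequence
  slidingLayers: number; // layers that cache only the most recent window
  windowTokens: number; // sliding-window size in tokens
  kvHeads: number;
  headDim: number;
  bytesPerValue: number; // for example, 2 for 16-bit values
}

// Approximate KV-cache size in bytes for one sequence.
function kvCacheBytes(layout: AttentionLayout, seqTokens: number): number {
  // Keys and values (hence the factor 2) per token, per layer.
  const perTokenPerLayer = 2 * layout.kvHeads * layout.headDim * layout.bytesPerValue;
  const slidingTokens = Math.min(seqTokens, layout.windowTokens);
  return (
    perTokenPerLayer *
    (layout.fullLayers * seqTokens + layout.slidingLayers * slidingTokens)
  );
}

// Hypothetical 48-layer model: 8 full-attention layers, 40 sliding-window layers.
const hybridLayout: AttentionLayout = {
  fullLayers: 8,
  slidingLayers: 40,
  windowTokens: 4096,
  kvHeads: 8,
  headDim: 128,
  bytesPerValue: 2,
};
const allFullLayout: AttentionLayout = { ...hybridLayout, fullLayers: 48, slidingLayers: 0 };

// At one million tokens the hybrid layout needs roughly 17% of the all-full cache.
const cacheRatio = kvCacheBytes(hybridLayout, 1_000_000) / kvCacheBytes(allFullLayout, 1_000_000);
```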

MiMo M2.5 supports reasoning, tool calling, file input, vision, and implicit prompt caching. Call it through xiaomi via AI Gateway. For the higher-capability tier, see mimo-v2.5-pro.

What To Consider When Choosing a Provider

Configuration: MiMo M2.5 balances cost, capability, and context length. The MoE design keeps active compute small, but routing and serving a 300B-class MoE still requires capable infrastructure on the provider side. Use AI Gateway's cost tracking and model fallback to mix MiMo M2.5 with mimo-v2.5-pro on harder workloads.
Zero Data Retention: AI Gateway does not currently support Zero Data Retention for this model. See the documentation for models that support ZDR.
Authentication: AI Gateway authenticates requests using an API key or OIDC token. You do not need to manage provider credentials directly.
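As an illustration of API-key authentication, the sketch below builds (but does not send) a Chat Completions request. The endpoint URL is an assumption about AI Gateway's OpenAI-compatible API and should be confirmed against the documentation; `buildChatRequest` is a hypothetical helper, and the key is passed in by the caller, for example from an environment variable.

```typescript
// Assumed OpenAI-compatible endpoint for AI Gateway; confirm against the docs.
const GATEWAY_CHAT_URL = 'https://ai-gateway.vercel.sh/v1/chat/completions';

interface ChatRequest {
  url: string;
  init: {
    method: 'POST';
    headers: Record<string, string>;
    body: string;
  };
}

// Hypothetical helper: assemble the request without performing any network I/O.
function buildChatRequest(apiKey: string, prompt: string): ChatRequest {
  if (apiKey.trim() === '') {
    throw new Error('Missing AI Gateway API key');
  }
  return {
    url: GATEWAY_CHAT_URL,
    init: {
      method: 'POST',
      headers: {
        Authorization: `Bearer ${apiKey}`, // the gateway key, not a provider credential
        'Content-Type': 'application/json',
      },
      body: JSON.stringify({
        model: 'xiaomi/mimo-v2.5',
        messages: [{ role: 'user', content: prompt }],
      }),
    },
  };
}

// Usage (performs a network call, so it is left commented out):
// const { url, init } = buildChatRequest(myKey, 'Why is the sky blue?');
// const response = await fetch(url, init);
```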

When to Use MiMo M2.5

Best For

Agentic Workflows: Tool-using agents that string together many calls in one session
Software Engineering: Code generation, refactors, and repo-scale analysis with a window of 1.1M tokens
Multimodal Input: Reasoning over mixed text, images, and uploaded files
Long Context: Documents, codebases, or chat histories that approach 1.1M tokens
Cost-Aware Reasoning: Lower active-parameter compute than dense models at similar scores
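For the long-context case, a quick pre-flight check can tell whether a prompt is likely to fit. This is a minimal sketch under two stated assumptions: that reserved output tokens count against the same 1.1M window, and that roughly four characters correspond to one token (a crude heuristic, not the model's tokenizer). The helper names are hypothetical, and the limits are the rounded figures shown on this page.

```typescript
const CONTEXT_WINDOW_TOKENS = 1_100_000; // 1.1M, rounded as listed on this page
const MAX_OUTPUT_TOKENS = 131_100; // 131.1K, rounded as listed on this page

// Crude estimate: about 4 characters per token. Real counts depend on the tokenizer.
function estimateTokens(text: string): number {
  return Math.ceil(text.length / 4);
}

// Assumes prompt and reserved output share one window (an assumption, see above).
function fitsInContext(
  promptTokens: number,
  reservedOutputTokens: number = MAX_OUTPUT_TOKENS,
): boolean {
  return promptTokens + reservedOutputTokens <= CONTEXT_WINDOW_TOKENS;
}

// A 900K-token prompt leaves room for the full output budget; 1M tokens does not.
const fitsSmaller = fitsInContext(900_000);
const fitsLarger = fitsInContext(1_000_000);
```

If the check fails, either reserve a smaller output budget or trim the prompt before sending.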

Consider Alternatives When

Maximum Reasoning Depth: mimo-v2.5-pro activates more parameters per step on the hardest math and engineering tasks
Speed-First Throughput: mimo-v2-flash is the throughput-tuned option in the previous generation
Simple Classification: A smaller, cheaper model handles short extraction at lower cost
Strict Text Pipelines: A text-only model is fine when your inputs are never images or files

Conclusion

MiMo M2.5 is the standard tier of Xiaomi's MiMo v2.5 family. Use it for agentic workflows, multimodal input, code, and long-context analysis. Pair it with mimo-v2.5-pro through AI Gateway routing so harder jobs land on the higher tier.
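One way to send harder jobs to the higher tier is a small selection function in application code. The sketch below is only an illustration: the `Task` shape and the "hard" flag are hypothetical, and the identifier `xiaomi/mimo-v2.5-pro` is inferred from the `xiaomi/mimo-v2.5` naming pattern, so verify it before use.

```typescript
type Tier = 'standard' | 'pro';

interface Task {
  kind: 'chat' | 'code' | 'math' | 'agent';
  hard: boolean; // hypothetical flag set by the caller
}

const MODEL_IDS: Record<Tier, string> = {
  standard: 'xiaomi/mimo-v2.5',
  pro: 'xiaomi/mimo-v2.5-pro', // assumed identifier; confirm before use
};

// Escalate to the pro tier only for hard math, code, or agentic tasks.
function pickModel(task: Task): string {
  const escalates = task.kind === 'math' || task.kind === 'code' || task.kind === 'agent';
  return task.hard && escalates ? MODEL_IDS.pro : MODEL_IDS.standard;
}

// Usage with the AI SDK (network call, left commented out):
// const result = streamText({ model: pickModel({ kind: 'math', hard: true }), prompt })
```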

Frequently Asked Questions

How does MiMo M2.5 differ from mimo-v2.5-pro?
MiMo M2.5 is the standard tier. mimo-v2.5-pro activates a larger share of a larger parameter pool per step, so it costs more but reasons further on harder math, code, and agentic problems.

What architecture does MiMo M2.5 use?
A Mixture-of-Experts (MoE) stack with hybrid attention. Each token activates a subset of expert blocks, and sliding-window and full attention layers combine to keep the KV cache manageable at long context lengths.

What's the context window for MiMo M2.5?
1.1M tokens. Hybrid attention keeps KV-cache memory in check so long-context runs stay practical.

Does MiMo M2.5 support tool calling and reasoning modes?
Yes. MiMo M2.5 supports tool calling and a reasoning mode, both exposed through AI Gateway. Use them through the AI SDK, the Chat Completions API, the Responses API, or any other supported format.

How do I authenticate requests to MiMo M2.5 through AI Gateway?
Authenticate with an AI Gateway API key or OIDC token; you do not need to manage provider credentials directly. Use xiaomi/mimo-v2.5 as the model in API calls. AI Gateway handles routing, retries, and failover across the providers that serve the model (xiaomi).

What does MiMo M2.5 cost?
See the pricing section on this page for today's rates. AI Gateway tracks each provider's pricing for MiMo M2.5, so the numbers shown stay current.

Does MiMo M2.5 support zero data retention?
Not currently. Zero Data Retention is offered on a per-provider basis and is not available for this model. See https://vercel.com/docs/ai-gateway/capabilities/zdr for details.

Is MiMo M2.5 available under an open-source license?
Yes. The MiMo v2.5 line is released under the MIT license, which allows commercial use, modification, and redistribution.
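
For the tool-calling question above, the sketch below shows one way to handle a tool call on the client side using the OpenAI-style Chat Completions shapes. The `get_weather` tool, its stubbed result, and the helper `runToolCall` are hypothetical; no network request is made here.

```typescript
// Tool definition in the OpenAI-style Chat Completions "tools" shape.
const tools = [
  {
    type: 'function' as const,
    function: {
      name: 'get_weather', // hypothetical tool
      description: 'Look up the current temperature for a city',
      parameters: {
        type: 'object',
        properties: { city: { type: 'string' } },
        required: ['city'],
      },
    },
  },
];

interface ToolCall {
  id: string;
  type: 'function';
  function: { name: string; arguments: string }; // arguments arrive as a JSON string
}

interface ToolMessage {
  role: 'tool';
  tool_call_id: string;
  content: string;
}

// Hypothetical local implementations, keyed by tool name.
const handlers: Record<string, (args: Record<string, unknown>) => unknown> = {
  get_weather: (args) => ({ city: args.city, temperatureC: 21 }), // stubbed value
};

// Run one tool call and wrap the result as a "tool" message for the next request.
function runToolCall(call: ToolCall): ToolMessage {
  const handler = handlers[call.function.name];
  if (!handler) {
    const error = { error: `Unknown tool: ${call.function.name}` };
    return { role: 'tool', tool_call_id: call.id, content: JSON.stringify(error) };
  }
  const args = JSON.parse(call.function.arguments) as Record<string, unknown>;
  return { role: 'tool', tool_call_id: call.id, content: JSON.stringify(handler(args)) };
}
```

In a full loop, the `tools` array goes into the request body, each entry of the assistant message's `tool_calls` is passed to `runToolCall`, and the resulting messages are appended to the conversation before the next request.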
