
MiMo M2.5

MiMo M2.5 is the mid-tier model in Xiaomi's MiMo v2.5 family, a Mixture-of-Experts (MoE) stack with reasoning, tool use, and multimodal input. It supports a context window of 1.1M tokens and a maximum output of 131.1K tokens.

Capabilities: Reasoning, Tool Use, Implicit Caching, File Input, Vision (Image)
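index.ts
import { streamText } from 'ai'

const result = streamText({
  model: 'xiaomi/mimo-v2.5',
  prompt: 'Why is the sky blue?'
})

// Print the response text as it streams in
for await (const chunk of result.textStream) {
  process.stdout.write(chunk)
}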

Playground

Try out MiMo M2.5 by Xiaomi. Usage is billed to your team at API rates. Free users (those who haven't made a payment) get $5 of credits every 30 days.

Providers

Route requests across multiple providers. Copy a provider slug to set your preference. Visit the docs for more info. Using a provider means you agree to their terms, listed under Legal.

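| Provider | Context | Latency | Throughput | Input | Output | Cache | Web Search | Per Query | Capabilities | ZDR | No Training | Release Date |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Xiaomi (Legal: Terms, Privacy) | 1.1M | 1.6s | 111tps | $0.40/M | $2.00/M | Read: $0.08/M | | | | | | 04/22/2026 |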
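
The listed rates can be turned into a rough per-request estimate. The sketch below uses the prices from the provider table ($0.40/M input, $2.00/M output, $0.08/M cache read). The `Usage` shape and the `estimateCostUSD` helper are hypothetical names introduced for illustration, and the sketch assumes cached prompt tokens are billed at the cache-read rate instead of the input rate. Check your actual usage data before relying on it.

```typescript
// Listed rates for xiaomi/mimo-v2.5 in USD per million tokens (from the provider table).
const RATES = {
  input: 0.40,
  output: 2.00,
  cacheRead: 0.08,
} as const

// Hypothetical usage shape; field names are assumptions, not a Gateway response format.
interface Usage {
  inputTokens: number        // prompt tokens not served from cache
  cachedInputTokens: number  // prompt tokens served from the implicit cache
  outputTokens: number
}

// Hypothetical helper: converts token counts into an estimated USD cost.
function estimateCostUSD(usage: Usage): number {
  const perToken = (ratePerMillion: number) => ratePerMillion / 1_000_000
  return (
    usage.inputTokens * perToken(RATES.input) +
    usage.cachedInputTokens * perToken(RATES.cacheRead) +
    usage.outputTokens * perToken(RATES.output)
  )
}

// Example: a 1M-token prompt where 800K tokens hit the cache, plus 10K output tokens.
const cost = estimateCostUSD({
  inputTokens: 200_000,
  cachedInputTokens: 800_000,
  outputTokens: 10_000,
})
console.log(cost.toFixed(3)) // prints "0.164"
```

In this example the uncached input costs $0.08, the cached input $0.064, and the output $0.02, so caching most of a long prompt changes the bill noticeably.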
Throughput

P50 throughput on live AI Gateway traffic, in tokens per second (TPS). Visit the docs for more info.

Latency

P50 time to first token (TTFT) on live AI Gateway traffic, in milliseconds. View the docs for more info.

Uptime

Direct request success rate on AI Gateway and per-provider. Visit the docs for more info.

More models by Xiaomi

| Model | Context | Latency | Throughput | Input | Output | Cache | Web Search | Per Query | Capabilities | Providers | ZDR | No Training | Release Date |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| | 1.1M | 2.6s | 50tps | $1.00/M | $3.00/M | Read: $0.2/M | | | | xiaomi | | | 04/22/2026 |
| | 1M | 2.2s | 46tps | $1.00/M | $3.00/M | Read: $0.2/M | | | | xiaomi | | | 03/18/2026 |
| | 262K | 1.6s | 133tps | $0.10/M | $0.30/M | Read: $0.02/M | | | | chutes, novita, xiaomi | | | 12/17/2025 |

About MiMo M2.5

MiMo M2.5 is a MoE language model from Xiaomi, released April 22, 2026 under the MIT license. Each forward pass activates only a subset of the total parameters, which keeps per-token compute lower than in a dense model with the same parameter count.

The architecture uses hybrid attention, interleaving sliding-window and full attention to cut KV-cache storage at long sequence lengths. A multi-token prediction (MTP) block raises output tokens per step during inference. The full window of 1.1M tokens lets MiMo M2.5 reason over large documents, repos, or long agent trajectories.
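
To see why interleaving sliding-window and full attention shrinks the KV cache, consider a back-of-the-envelope calculation. A full-attention layer stores keys and values for every token in the sequence, while a sliding-window layer stores them only for the most recent window. Every number in the sketch below (layer count, interleaving ratio, window size, head shape) is a made-up illustrative value, not MiMo M2.5's actual configuration, which this page does not state.

```typescript
// All fields are hypothetical illustration values, not MiMo M2.5's real configuration.
interface KvConfig {
  layers: number         // total attention layers
  fullEvery: number      // one full-attention layer per `fullEvery` layers
  window: number         // sliding-window size in tokens
  kvHeads: number        // key/value heads per layer
  headDim: number        // dimension of each head
  bytesPerValue: number  // 2 for fp16/bf16
}

// Returns KV-cache size in bytes for the hybrid layout and for an all-full-attention baseline.
function kvCacheBytes(seqLen: number, cfg: KvConfig): { hybrid: number; allFull: number } {
  // Factor of 2 covers both keys and values.
  const perTokenPerLayer = 2 * cfg.kvHeads * cfg.headDim * cfg.bytesPerValue
  const fullLayers = Math.floor(cfg.layers / cfg.fullEvery)
  const slidingLayers = cfg.layers - fullLayers
  // Sliding-window layers never cache more than `window` tokens.
  const hybridTokens = fullLayers * seqLen + slidingLayers * Math.min(seqLen, cfg.window)
  return {
    hybrid: perTokenPerLayer * hybridTokens,
    allFull: perTokenPerLayer * cfg.layers * seqLen,
  }
}

const exampleCfg: KvConfig = {
  layers: 48, fullEvery: 6, window: 4096, kvHeads: 8, headDim: 128, bytesPerValue: 2,
}
const { hybrid, allFull } = kvCacheBytes(1_000_000, exampleCfg)
const GiB = 1024 ** 3
console.log(`all-full: ${(allFull / GiB).toFixed(1)} GiB, hybrid: ${(hybrid / GiB).toFixed(1)} GiB`)
console.log(`reduction: ${(allFull / hybrid).toFixed(1)}x`) // prints "reduction: 5.9x"
```

With these assumed values the saving at long sequence lengths approaches the ratio of total layers to full-attention layers, because the sliding-window layers contribute almost nothing once the sequence is much longer than the window.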

MiMo M2.5 supports reasoning, tool calling, file input, vision, and implicit prompt caching. It is served by the xiaomi provider through AI Gateway. For the higher-capability tier, see mimo-v2.5-pro.
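
As a minimal sketch of a tool-calling request, the snippet below builds an OpenAI-style Chat Completions payload for xiaomi/mimo-v2.5. The endpoint URL is an assumption about AI Gateway's OpenAI-compatible API and should be checked against the docs. The `get_weather` tool and the `buildRequest` helper are hypothetical names used only for illustration. The code only constructs the request, so it runs without network access.

```typescript
// Assumption: AI Gateway's OpenAI-compatible Chat Completions endpoint; verify in the docs.
const GATEWAY_URL = 'https://ai-gateway.vercel.sh/v1/chat/completions'

interface ToolDef {
  type: 'function'
  function: {
    name: string
    description: string
    parameters: Record<string, unknown> // JSON Schema describing the tool's arguments
  }
}

// Hypothetical tool definition for illustration; not part of any real API.
const getWeather: ToolDef = {
  type: 'function',
  function: {
    name: 'get_weather',
    description: 'Look up the current weather for a city',
    parameters: {
      type: 'object',
      properties: { city: { type: 'string' } },
      required: ['city'],
    },
  },
}

// Hypothetical helper: assembles the URL and fetch options for one request.
function buildRequest(apiKey: string, prompt: string, tools: ToolDef[]) {
  return {
    url: GATEWAY_URL,
    init: {
      method: 'POST',
      headers: {
        'Content-Type': 'application/json',
        Authorization: `Bearer ${apiKey}`,
      },
      body: JSON.stringify({
        model: 'xiaomi/mimo-v2.5',
        messages: [{ role: 'user', content: prompt }],
        tools,
      }),
    },
  }
}

const req = buildRequest('YOUR_API_KEY', 'What is the weather in Beijing?', [getWeather])
console.log(req.init.body)
```

To send the request, pass the result to `fetch(req.url, req.init)`. If the model decides to use the tool, an OpenAI-style response carries the call under `choices[0].message.tool_calls`.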

What To Consider When Choosing a Provider

  • Configuration: MiMo M2.5 balances cost, capability, and context length. The MoE design keeps active compute small, but routing and serving a 300B-class MoE still requires capable infrastructure on the provider side. Use AI Gateway's cost tracking and model fallback to mix MiMo M2.5 with mimo-v2.5-pro on harder workloads.
  • Zero Data Retention: AI Gateway does not currently support Zero Data Retention for this model. See the documentation for models that support ZDR.
  • Authentication: AI Gateway authenticates requests using an API key or OIDC token. You do not need to manage provider credentials directly.
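
The fallback idea from the Configuration point can be sketched on the client side. This is a stand-in that tries model IDs in order from your own code. It is not AI Gateway's built-in fallback feature, whose configuration is covered in the Gateway docs. The `withFallback` helper, the `fakeCall` stub, and the `xiaomi/mimo-v2.5-pro` ID (inferred from the mimo-v2.5-pro slug) are assumptions for illustration.

```typescript
// A function that performs one request against the given model ID.
type ModelCall<T> = (modelId: string) => Promise<T>

// Hypothetical helper: tries each model ID in order and returns the first success.
async function withFallback<T>(
  modelIds: string[],
  call: ModelCall<T>,
): Promise<{ modelId: string; result: T }> {
  let lastError: unknown
  for (const modelId of modelIds) {
    try {
      return { modelId, result: await call(modelId) }
    } catch (err) {
      lastError = err // remember the failure and move on to the next model
    }
  }
  throw lastError ?? new Error('No model IDs provided')
}

// Stand-in for a real request; it fails for the first model to exercise the fallback path.
const fakeCall: ModelCall<string> = async (modelId) => {
  if (modelId === 'xiaomi/mimo-v2.5') throw new Error('simulated failure')
  return `answer from ${modelId}`
}

const outcome = withFallback(['xiaomi/mimo-v2.5', 'xiaomi/mimo-v2.5-pro'], fakeCall)
outcome.then((o) => console.log(o.modelId, '->', o.result))
```

In real use, `call` would wrap a request such as `streamText` or a Chat Completions call, and the order of the list encodes which tier you prefer to try first.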

When to Use MiMo M2.5

Best For

  • Agentic Workflows: Tool-using agents that string together many calls in one session
  • Software Engineering: Code generation, refactors, and repo-scale analysis with a window of 1.1M tokens
  • Multimodal Input: Reasoning over mixed text, images, and uploaded files
  • Long Context: Documents, codebases, or chat histories that approach 1.1M tokens
  • Cost-Aware Reasoning: Lower active-parameter compute than dense models with similar benchmark scores

Consider Alternatives When

  • Maximum Reasoning Depth: mimo-v2.5-pro activates more parameters per step on the hardest math and engineering tasks
  • Speed-First Throughput: mimo-v2-flash is the throughput-tuned option in the previous generation
  • Simple Classification: A smaller, cheaper model handles short extraction at lower cost
  • Strict Text Pipelines: A text-only model is fine when your inputs are never images or files

Conclusion

MiMo M2.5 is the standard tier of Xiaomi's MiMo v2.5 family. Use it for agentic workflows, multimodal input, code, and long-context analysis. Pair it with mimo-v2.5-pro through AI Gateway routing so harder jobs land on the higher tier.

Frequently Asked Questions

  • How does MiMo M2.5 differ from mimo-v2.5-pro?

    MiMo M2.5 is the standard tier. mimo-v2.5-pro draws on a larger parameter pool and activates a larger share of it per step, so it costs more but handles harder math, code, and agentic problems better.

  • What architecture does MiMo M2.5 use?

    A Mixture-of-Experts (MoE) stack with hybrid attention. Each token activates a subset of expert blocks, and sliding-window plus full attention combine to keep KV cache manageable at long context lengths.

  • What's the context window for MiMo M2.5?

    1.1M tokens. Hybrid attention keeps KV-cache memory in check so long-context runs stay practical.

  • Does MiMo M2.5 support tool calling and reasoning modes?

    Yes. MiMo M2.5 supports tool calling and a reasoning mode, both exposed through AI Gateway. Use them through the AI SDK, the Chat Completions API, the Responses API, or any other supported format.

  • How do I authenticate requests to MiMo M2.5 through AI Gateway?

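    Create an API key in AI Gateway project settings and send it with each request. An OIDC token also works. Use xiaomi/mimo-v2.5 as the model ID in API calls. AI Gateway handles routing and retries. xiaomi is currently the only provider listed for this model.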

  • What does MiMo M2.5 cost?

    See the pricing section on this page for today's rates. AI Gateway tracks each provider's pricing for MiMo M2.5, so the numbers shown stay current.

  • Does MiMo M2.5 support zero data retention?

    Not currently. Zero Data Retention is offered on a per-provider basis, and it is not available for this model. See https://vercel.com/docs/ai-gateway/capabilities/zdr for details.

  • Is MiMo M2.5 available under an open-source license?

    Yes. The MiMo v2.5 line is released under the MIT license, which allows commercial use, modification, and redistribution.