MiMo V2 Flash

MiMo V2 Flash is Xiaomi's Mixture-of-Experts (MoE) reasoning model with 309B total parameters and 15B active per forward pass. It uses hybrid attention and multi-token prediction for inference efficiency, and supports a context window of 262.1K tokens at $0.10 per million input tokens and $0.30 per million output tokens.

Reasoning, Tool Use

index.ts

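import { streamText } from 'ai'

// Model ID in 'creator/model' form, resolved via AI Gateway.
const result = streamText({
  model: 'xiaomi/mimo-v2-flash',
  prompt: 'Why is the sky blue?'
})

// Read the stream and print each text chunk as it arrives.
for await (const chunk of result.textStream) {
  process.stdout.write(chunk)
}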


About MiMo V2 Flash

MiMo V2 Flash is a Mixture-of-Experts model from Xiaomi, released December 17, 2025 under the MIT license. Each forward pass activates only 15B of the 309B total parameters, which keeps per-token cost down while the full parameter count stores broader knowledge.

The architecture uses hybrid attention: sliding-window attention and global attention are combined in a fixed ratio, and the sliding-window layers use a 128-token window. This cuts KV-cache storage compared with using full attention everywhere and makes a context window of 262.1K tokens practical. A multi-token prediction (MTP) block enables speculative decoding, which speeds up generation at inference time.
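
To make the KV-cache saving concrete, here is a rough sketch that counts cached token positions per group of layers. Only the 128-token window comes from the description above. The 5:1 ratio of sliding-window to global layers, the 262,144-token context length (taken as the exact value behind 262.1K), and the names HybridConfig and kvCacheFraction are illustrative assumptions.

```typescript
// Rough KV-cache estimate for hybrid attention, counted in cached token
// positions per group of layers (head count and head size cancel in the ratio).
// ASSUMPTION: swaPerGlobal = 5 is a hypothetical ratio, not stated in the text.

interface HybridConfig {
  windowSize: number   // sliding-window size in tokens
  swaPerGlobal: number // sliding-window layers per global layer (assumed)
}

function kvCacheFraction(contextTokens: number, cfg: HybridConfig): number {
  const layersPerGroup = cfg.swaPerGlobal + 1
  // A sliding-window layer keeps at most windowSize positions.
  const swaPositions = cfg.swaPerGlobal * Math.min(contextTokens, cfg.windowSize)
  // A global layer keeps every position in the context.
  const globalPositions = contextTokens
  // Baseline: every layer in the group uses global attention.
  const fullAttention = layersPerGroup * contextTokens
  return (swaPositions + globalPositions) / fullAttention
}

const fraction = kvCacheFraction(262144, { windowSize: 128, swaPerGlobal: 5 })
console.log(`KV cache vs. all-global attention: ${(fraction * 100).toFixed(1)}%`)
```

Under these assumptions the cache shrinks to roughly one sixth of the all-global baseline at full context, because the global layer dominates once the context is much longer than the window.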

For benchmark figures and methodology, see https://novita.ai/ (listed in the model changelog as the MiMo v2 Flash announcement).

MiMo V2 Flash is text-in, text-out. It is available through AI Gateway from the providers novita, chutes, and xiaomi. Input costs $0.10 per million tokens and output costs $0.30 per million tokens.
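
As a quick illustration of the pricing, the sketch below estimates the cost of a single request from its token counts. The two rates are the listed prices. The helper name estimateCostUSD and the example token counts are hypothetical, and actual billing may differ by provider.

```typescript
// Listed prices in USD per one million tokens.
const INPUT_USD_PER_MILLION = 0.1
const OUTPUT_USD_PER_MILLION = 0.3

// estimateCostUSD is a hypothetical helper, not part of any SDK.
function estimateCostUSD(inputTokens: number, outputTokens: number): number {
  const weighted =
    inputTokens * INPUT_USD_PER_MILLION + outputTokens * OUTPUT_USD_PER_MILLION
  return weighted / 1_000_000
}

// Example: a 200K-token prompt with a 4K-token answer.
const cost = estimateCostUSD(200000, 4000)
console.log(`Estimated cost: $${cost.toFixed(4)}`)
```

Even a prompt that fills most of the context window costs only a few cents at these rates, with input tokens accounting for nearly all of it.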
