MiMo V2 Flash

MiMo V2 Flash is Xiaomi's Mixture-of-Experts (MoE) reasoning model with 309B total parameters and 15B active per forward pass. It uses hybrid attention and multi-token prediction for inference efficiency, and supports a context window of 262.1K tokens at $0.1 per million input tokens and $0.3 per million output tokens.

Reasoning, Tool Use
index.ts
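import { streamText } from 'ai'
// Start a streaming completion from MiMo V2 Flash
const result = streamText({
  model: 'xiaomi/mimo-v2-flash',
  prompt: 'Why is the sky blue?'
})
// Print each text chunk as it arrives
for await (const chunk of result.textStream) {
  process.stdout.write(chunk)
}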

About MiMo V2 Flash

MiMo V2 Flash is a Mixture-of-Experts model from Xiaomi, released December 17, 2025 under the MIT license. Each forward pass activates only 15B of the 309B total parameters (about 5%), which keeps per-token compute cost down while the full parameter count stores broader knowledge.

The architecture uses hybrid attention: sliding-window layers (with a 128-token window) and global-attention layers are interleaved in a fixed ratio. Sliding-window layers only cache the most recent 128 tokens, which cuts KV-cache storage versus standard full attention and makes a context window of 262.1K tokens practical. A multi-token prediction (MTP) block enables speculative decoding, which speeds up generation.
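To make the KV-cache saving concrete, here is a rough back-of-the-envelope sketch. It assumes every layer stores the same number of bytes per cached token, and it uses a hypothetical ratio of 5 sliding-window layers per global layer, since the text above does not state the actual ratio. The names `HybridAttentionConfig` and `kvCacheFraction` are made up for this illustration.

```typescript
// Illustrative only: slidingLayersPerGlobal is a hypothetical value, not a published spec.
interface HybridAttentionConfig {
  contextTokens: number;          // sequence length being cached
  windowTokens: number;           // sliding-window size (128 in the description above)
  slidingLayersPerGlobal: number; // assumed number of sliding-window layers per global layer
}

// Returns the hybrid KV-cache size as a fraction of an all-global-attention cache,
// assuming each layer stores the same number of bytes per cached token.
function kvCacheFraction(cfg: HybridAttentionConfig): number {
  const { contextTokens, windowTokens, slidingLayersPerGlobal } = cfg;
  // A sliding-window layer keeps at most `windowTokens` entries.
  const slidingEntries = slidingLayersPerGlobal * Math.min(windowTokens, contextTokens);
  // A global layer keeps one entry per token in the context.
  const globalEntries = contextTokens;
  // Baseline: every layer in the group uses global attention.
  const fullEntries = (slidingLayersPerGlobal + 1) * contextTokens;
  return (slidingEntries + globalEntries) / fullEntries;
}

const fraction = kvCacheFraction({
  contextTokens: 262100,
  windowTokens: 128,
  slidingLayersPerGlobal: 5, // hypothetical ratio
});
console.log(`Hybrid cache is ${(fraction * 100).toFixed(1)}% of a full-attention cache`);
```

Under these assumptions, the global layers account for almost all of the cache at long context, so the saving is governed mainly by how many layers use the sliding window rather than by the window size itself.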

For benchmark figures and methodology, see https://novita.ai/ (listed in the model changelog as the MiMo v2 Flash announcement).

MiMo V2 Flash is text-in, text-out. Call it through AI Gateway via the novita, chutes, or xiaomi providers; input is $0.1 per million tokens and output is $0.3 per million tokens.
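As a quick way to budget a request at these rates, the following sketch converts token counts into an estimated cost. The function name `estimateCostUSD` and the example token counts are hypothetical, and the sketch assumes the listed prices apply uniformly across providers.

```typescript
// Prices per million tokens, taken from the listing above.
const INPUT_USD_PER_MILLION = 0.1;
const OUTPUT_USD_PER_MILLION = 0.3;

// Estimate the cost of one request from its input and output token counts.
function estimateCostUSD(inputTokens: number, outputTokens: number): number {
  const inputCost = (inputTokens / 1e6) * INPUT_USD_PER_MILLION;
  const outputCost = (outputTokens / 1e6) * OUTPUT_USD_PER_MILLION;
  return inputCost + outputCost;
}

// Hypothetical request: 200,000 prompt tokens and 4,000 generated tokens.
console.log(estimateCostUSD(200000, 4000).toFixed(4)); // prints 0.0212
```

Even a prompt that fills most of the context window stays in the range of a few cents at these rates, so output length matters less to total cost than it would for models with a steeper output price.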