MiMo V2 Flash
MiMo V2 Flash is Xiaomi's Mixture-of-Experts (MoE) reasoning model with 309B total parameters and 15B active per forward pass. It uses hybrid attention and multi-token prediction for inference efficiency. It supports a context window of 262.1K tokens and is priced at $0.1 per million input tokens and $0.3 per million output tokens.
import { streamText } from 'ai'

const result = streamText({
  model: 'xiaomi/mimo-v2-flash',
  prompt: 'Why is the sky blue?'
})

About MiMo V2 Flash
MiMo V2 Flash is a Mixture-of-Experts model from Xiaomi, released December 17, 2025 under the MIT license. Each forward pass activates only 15B of the 309B total parameters (about 5%), which keeps per-token compute down while the full parameter set stores broader knowledge.
The architecture uses hybrid sliding-window attention: sliding-window attention and global attention are combined in a fixed ratio, with a 128-token sliding window. This cuts KV-cache storage compared with using global attention throughout and makes a context window of 262.1K tokens practical. A multi-token prediction (MTP) block enables speculative decoding, which can speed up generation at inference time.
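To get a feel for why the hybrid scheme shrinks the KV cache, the following is a rough back-of-the-envelope sketch, not the model's actual implementation. It assumes that attention is organized per layer, that a global layer caches every token while a sliding-window layer caches at most the last 128 tokens, and it counts only cached token positions (ignoring heads and hidden dimensions). The function names and the layer counts in the example are hypothetical; the source states only that the ratio is fixed, not what it is.

```typescript
// Window size stated in the text above.
const WINDOW_TOKENS = 128;

type LayerKind = 'global' | 'sliding';

// Number of token positions one layer must keep in its KV cache.
// Assumption: a sliding-window layer never needs more than the last WINDOW_TOKENS positions.
function cachedPositions(seqLen: number, kind: LayerKind): number {
  return kind === 'global' ? seqLen : Math.min(seqLen, WINDOW_TOKENS);
}

// Fraction of the all-global KV cache that a hybrid stack needs.
// slidingLayers and globalLayers are hypothetical inputs, not published figures.
function hybridCacheFraction(
  seqLen: number,
  slidingLayers: number,
  globalLayers: number
): number {
  if (seqLen <= 0 || slidingLayers < 0 || globalLayers < 0 || slidingLayers + globalLayers === 0) {
    throw new RangeError('seqLen must be positive and layer counts non-negative, with at least one layer');
  }
  const hybrid =
    slidingLayers * cachedPositions(seqLen, 'sliding') +
    globalLayers * cachedPositions(seqLen, 'global');
  const allGlobal = (slidingLayers + globalLayers) * cachedPositions(seqLen, 'global');
  return hybrid / allGlobal;
}

// Example with an illustrative mix of 3 sliding-window layers per global layer
// at roughly the 262.1K-token context length mentioned in the text.
console.log(hybridCacheFraction(262_100, 3, 1));
```

At long sequence lengths the sliding-window layers contribute almost nothing, so the fraction approaches the share of global layers in the stack. For sequences shorter than the window, the two layer kinds cache the same amount and the fraction is 1.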
For benchmark figures and methodology, see https://novita.ai/ (listed in the model changelog as the MiMo v2 Flash announcement).
MiMo V2 Flash is text-in, text-out. Through AI Gateway it is available from the providers novita, chutes, and xiaomi; input costs $0.1 per million tokens and output costs $0.3 per million tokens.
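As a quick illustration of what the listed prices mean per request, here is a minimal sketch of a cost estimate. The helper `estimateCostUsd` is hypothetical and not part of any SDK; it simply applies the two per-million-token rates quoted above and ignores anything else that might affect billing.

```typescript
// Prices taken from the text above, in USD per one million tokens.
const INPUT_USD_PER_MILLION = 0.1;
const OUTPUT_USD_PER_MILLION = 0.3;

// Hypothetical helper: estimates the cost of a single request in USD.
function estimateCostUsd(inputTokens: number, outputTokens: number): number {
  if (inputTokens < 0 || outputTokens < 0) {
    throw new RangeError('token counts must be non-negative');
  }
  return (
    (inputTokens / 1_000_000) * INPUT_USD_PER_MILLION +
    (outputTokens / 1_000_000) * OUTPUT_USD_PER_MILLION
  );
}

// Example: a 200,000-token prompt with a 2,000-token reply.
const cost = estimateCostUsd(200_000, 2_000);
console.log(cost.toFixed(4)); // prints "0.0206"
```

Because output tokens cost three times as much as input tokens, long generations dominate the bill sooner than long prompts do.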