Qwen 3.6 27B

Qwen 3.6 27B is a native vision-language model in Alibaba's Qwen 3.6 line. It is built on a hybrid linear-attention plus sparse mixture-of-experts architecture, has a context window of 256K tokens, and improves on agentic coding, math and code reasoning, spatial intelligence, and object detection.

Capabilities: Reasoning, Tool Use, Implicit Caching, File Input, Vision (Image)

index.ts

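import { streamText } from 'ai'

// Request a streamed completion, referencing the model by its gateway ID.
const result = streamText({
  model: 'alibaba/qwen3.6-27b',
  prompt: 'Why is the sky blue?'
})

// Print the response text as it streams in.
for await (const chunk of result.textStream) {
  process.stdout.write(chunk)
}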


Playground

Try out Qwen 3.6 27B by Alibaba. Usage is billed to your team at API rates. Free users (those who haven't made a payment) get $5 of credits every 30 days.

Providers

Route requests across multiple providers. Copy a provider slug to set your preference. Visit the docs for more info. Using a provider means you agree to their terms, listed under Legal.

Context: 256K
Latency: 0.7s
Throughput: 146tps
Input: $0.60/M
Output: $3.60/M
Release Date: 04/22/2026
Legal: Terms, Privacy

More models by Alibaba

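Latency: 1.1s; Throughput: 118tps; Input: $2.50/M; Output: $7.50/M; Cache Read: $0.25/M; Cache Write: $3.13/M; Release Date: 05/21/2026
Context: 240K; Latency: 2.2s; Throughput: 56tps; Input: $1.30/M; Output: $7.80/M; Cache Read: $0.26/M; Cache Write: $1.63/M; Release Date: 04/20/2026
Latency: 1.6s; Throughput: 109tps; Input: $0.50/M; Output: $3.00/M; Cache Read: $0.1/M; Cache Write: $0.63/M; Release Date: 04/02/2026
Latency: 1.2s; Throughput: 167tps; Input: $0.10/M; Output: $0.40/M; Cache Read: $0.0/M; Cache Write: $0.13/M; Release Date: 02/24/2026
Latency: 1.4s; Throughput: 110tps; Input: $0.40/M; Output: $2.40/M; Cache Read: $0.04/M; Cache Write: $0.5/M; Release Date: 02/16/2026
Context: 256K; Latency: 1.2s; Throughput: 32tps; Input: $0.50/M; Output: $1.20/M; Release Date: 07/22/2025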

About Qwen 3.6 27B

Qwen 3.6 27B, released on April 22, 2026, is a native vision-language model from the Qwen 3.6 generation of Alibaba's Qwen 3 family. It is built on a hybrid architecture that combines linear attention mechanisms with a sparse mixture-of-experts (MoE) framework, a design intended to keep inference efficient at long context while preserving the capability profile of a larger network.

Compared with the prior 3.5-35B-A3B generation, Qwen 3.6 27B brings improvements across several axes. Agentic coding ability is stronger, which matters for pipelines that chain tool calls and multi-step plans. Mathematical and code reasoning have been upgraded for benchmark-style problem solving and real-world programming tasks. Spatial intelligence, object localization, and object detection are sharper, which improves the model's grounding when it must reason about positions of elements within an image.
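To make "pipelines that chain tool calls" concrete, here is a minimal sketch of the dispatch step in such a pipeline: it takes the tool calls returned in an assistant message and produces the follow-up `tool` messages. The message shapes assume the OpenAI-style Chat Completions tool-calling format; the `handlers` registry and its `add` tool are hypothetical and stand in for whatever tools a real agent registers.

```typescript
// Shapes assume the OpenAI-style Chat Completions tool-calling format.
interface ToolCall {
  id: string
  type: 'function'
  function: { name: string; arguments: string } // arguments is a JSON string
}

interface ToolMessage {
  role: 'tool'
  tool_call_id: string
  content: string
}

type ToolHandler = (args: Record<string, unknown>) => Promise<unknown> | unknown

// Hypothetical registry; a real pipeline would wire in file, shell, or API tools.
const handlers: Record<string, ToolHandler> = {
  add: (args) => Number(args.a) + Number(args.b),
}

// Run every tool call the model requested and build the follow-up messages.
async function runToolCalls(calls: ToolCall[]): Promise<ToolMessage[]> {
  const results: ToolMessage[] = []
  for (const call of calls) {
    const handler = handlers[call.function.name]
    let content: string
    try {
      if (!handler) throw new Error(`Unknown tool: ${call.function.name}`)
      const output = await handler(JSON.parse(call.function.arguments))
      content = JSON.stringify(output)
    } catch (err) {
      // Report failures back to the model instead of aborting the plan.
      content = JSON.stringify({ error: String(err) })
    }
    results.push({ role: 'tool', tool_call_id: call.id, content })
  }
  return results
}
```

Returning errors as tool results, rather than throwing, is one possible design choice: it lets the model see the failure and adjust the next step of a multi-step plan.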

Native vision-language support means images are treated as first-class inputs alongside text rather than processed through a bolt-on encoder. Qwen 3.6 27B accepts file input and is tagged for reasoning, tool use, and implicit caching, so it can ingest documents and images, decide when to invoke registered tools, and reuse cached prefixes when serving repeated long prompts. The context window of 256K tokens accommodates extended multimodal sessions, document plus image inputs, and long agent traces.
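As a rough illustration of images travelling alongside text in a single turn, the sketch below assembles a request body that interleaves a question with an image. It assumes the OpenAI-style Chat Completions content-part format (`text` and `image_url` parts) is accepted for this model; `buildVisionRequest` and the example URL are hypothetical names used only for illustration.

```typescript
// Content parts in the OpenAI-style Chat Completions format (assumed here).
type ContentPart =
  | { type: 'text'; text: string }
  | { type: 'image_url'; image_url: { url: string } }

interface VisionRequest {
  model: string
  messages: { role: 'user'; content: ContentPart[] }[]
}

// Build one user turn that interleaves a question with an image.
function buildVisionRequest(question: string, imageUrl: string): VisionRequest {
  return {
    model: 'alibaba/qwen3.6-27b',
    messages: [
      {
        role: 'user',
        content: [
          { type: 'text', text: question },
          { type: 'image_url', image_url: { url: imageUrl } },
        ],
      },
    ],
  }
}

const request = buildVisionRequest(
  'Where is the submit button in this screenshot?',
  'https://example.com/screenshot.png', // placeholder URL
)
console.log(JSON.stringify(request, null, 2))
```

Payload size and image-count limits are not stated on this page, so check them with the selected provider before sending large or numerous images.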

You can integrate Qwen 3.6 27B through AI SDK, Chat Completions API, Responses API, Messages API, or other API formats, from TypeScript or Python. Maximum output is 256K tokens per request.
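For the Chat Completions route, a minimal sketch might look like the following. The base URL is an assumption and should be confirmed against the AI Gateway documentation; `askQwen` and `HttpPost` are hypothetical names. The HTTP function is passed in as a parameter so the sketch does not depend on a particular runtime; the global `fetch` fits the expected shape.

```typescript
// Minimal shape of an HTTP POST function; the global `fetch` satisfies it.
type HttpPost = (
  url: string,
  init: { method: 'POST'; headers: Record<string, string>; body: string },
) => Promise<{ ok: boolean; status: number; json(): Promise<unknown> }>

// Assumed OpenAI-compatible base URL; confirm it in the AI Gateway docs.
const BASE_URL = 'https://ai-gateway.vercel.sh/v1'

async function askQwen(post: HttpPost, apiKey: string, prompt: string): Promise<string> {
  const res = await post(`${BASE_URL}/chat/completions`, {
    method: 'POST',
    headers: {
      Authorization: `Bearer ${apiKey}`, // AI Gateway API key or OIDC token
      'Content-Type': 'application/json',
    },
    body: JSON.stringify({
      model: 'alibaba/qwen3.6-27b',
      messages: [{ role: 'user', content: prompt }],
    }),
  })
  if (!res.ok) throw new Error(`Gateway request failed with status ${res.status}`)
  // Response shape assumed to follow the Chat Completions format.
  const data = (await res.json()) as { choices: { message: { content: string } }[] }
  return data.choices[0].message.content
}
```

A call would then read `await askQwen(fetch, apiKey, 'Why is the sky blue?')`, with `apiKey` loaded from wherever your application keeps its gateway credential.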

What To Consider When Choosing a Provider

Configuration: Sparse MoE architectures keep active compute small per token, but providers serve the model through different infrastructure paths. Check the live cost and latency metrics on this page before sizing high-volume workloads, and confirm vision payload limits with your selected provider.
Zero Data Retention: AI Gateway does not currently support Zero Data Retention for this model. See the documentation for models that support ZDR.
Authentication: AI Gateway authenticates requests using an API key or OIDC token. You do not need to manage provider credentials directly.
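When sizing a high-volume workload, a back-of-the-envelope estimate can be derived from the metrics listed on this page. The sketch below hard-codes the values shown at the time of writing ($0.60/M input, $3.60/M output, 0.7s latency, 146tps), treats the listed latency as time to first token, and ignores cache discounts; the workload numbers are invented for illustration, and live metrics will drift.

```typescript
// Snapshot of the metrics listed on this page; live values will differ.
const INPUT_USD_PER_M = 0.6
const OUTPUT_USD_PER_M = 3.6
const TIME_TO_FIRST_TOKEN_S = 0.7 // assumes "Latency" means time to first token
const THROUGHPUT_TPS = 146

interface Workload {
  requestsPerDay: number
  inputTokens: number // per request
  outputTokens: number // per request
}

// Daily spend without any cache discount applied.
function estimateDailyCostUsd(w: Workload): number {
  const perRequest =
    (w.inputTokens * INPUT_USD_PER_M + w.outputTokens * OUTPUT_USD_PER_M) / 1_000_000
  return perRequest * w.requestsPerDay
}

// Rough wall-clock time for one streamed response.
function estimateResponseSeconds(outputTokens: number): number {
  return TIME_TO_FIRST_TOKEN_S + outputTokens / THROUGHPUT_TPS
}

// Hypothetical workload: 10,000 requests/day, 8,000 tokens in, 500 tokens out.
const workload: Workload = { requestsPerDay: 10_000, inputTokens: 8_000, outputTokens: 500 }
console.log(estimateDailyCostUsd(workload).toFixed(2)) // 66.00
console.log(estimateResponseSeconds(500).toFixed(1)) // 4.1
```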

When to Use Qwen 3.6 27B

Best For

Agentic Multimodal Coding: Pipelines that combine reasoning, tool use, and image input within a single session
Spatial And Object Tasks: Workloads requiring sharper object localization, detection, and 2D grounding
Math And Code Reasoning: Step-by-step accuracy in mathematical and programming problems over raw throughput
Long-Context Multimodal Sessions: Combined text and image inputs handled within the window of 256K tokens
Repeated Long Prefixes: Implicit caching reduces cost on shared system prompts and document inputs
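The last two points suggest a simple prompt-assembly pattern, sketched here under stated assumptions: content that repeats across requests goes first so the prefix stays identical from call to call, and the part that changes goes last. The 256,000-token budget is a reading of "256K" (the exact limit may differ), the four-characters-per-token estimate is a crude heuristic rather than the model's tokenizer, and `buildMessages` is a hypothetical helper.

```typescript
interface Message {
  role: 'system' | 'user'
  content: string
}

const CONTEXT_WINDOW_TOKENS = 256_000 // "256K" as listed; exact limit may differ

// Crude heuristic (about 4 characters per token); not the model's tokenizer.
function roughTokenCount(text: string): number {
  return Math.ceil(text.length / 4)
}

// Put content that repeats across requests first so the prefix stays
// identical, and append the part that changes last.
function buildMessages(systemPrompt: string, sharedDocument: string, question: string): Message[] {
  const messages: Message[] = [
    { role: 'system', content: systemPrompt },
    { role: 'user', content: sharedDocument },
    { role: 'user', content: question },
  ]
  const total = messages.reduce((sum, m) => sum + roughTokenCount(m.content), 0)
  if (total > CONTEXT_WINDOW_TOKENS) {
    throw new Error(`Estimated ${total} tokens exceeds the context window`)
  }
  return messages
}
```

Whether a given request actually hits the cache depends on the provider, so treat the ordering as a precondition for cache reuse rather than a guarantee.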

Consider Alternatives When

Text-Only Workloads: A dedicated text model offers lower cost per token when vision is never used
Maximum Reasoning Depth: A thinking-mode model is a better match when extended chain-of-thought matters most
Latency-Critical Pipelines: A smaller, faster vision model serves simple multimodal tasks at lower cost
Image Or Video Generation: A generation-class model fits tasks that produce pixels rather than read them

Conclusion

Qwen 3.6 27B brings the Qwen 3.6 generation's improvements in agentic coding, reasoning, and spatial intelligence to a hybrid linear-attention plus sparse MoE architecture. Routing through AI Gateway gives you a single integration surface across the Qwen 3.6 line, with provider failover and consolidated billing.

Frequently Asked Questions

What architecture does Qwen 3.6 27B use?
Qwen 3.6 27B is built on a hybrid architecture that integrates linear attention mechanisms with a sparse mixture-of-experts framework. The combination is designed to keep inference efficient at long context while preserving the capability profile of a larger network.

How does Qwen 3.6 27B compare to the prior Qwen 3.5-35B-A3B generation?
Alibaba positions Qwen 3.6 27B as a clear step up from the 3.5-35B-A3B generation, with significantly improved agentic coding, mathematical and code reasoning, spatial intelligence, and object localization and detection performance.

What modalities does Qwen 3.6 27B accept?
Qwen 3.6 27B is a native vision-language model. It accepts interleaved text and images within a single context window of up to 256K tokens, and supports file input alongside text and vision.

Does Qwen 3.6 27B support tool calling?
Yes. Qwen 3.6 27B is tagged for tool use and is tuned for agentic coding workflows. The model can select and invoke registered functions across multi-turn sessions through AI Gateway.

How do I access Qwen 3.6 27B through AI Gateway?
Authenticate with an AI Gateway API key or OIDC token and reference `alibaba/qwen3.6-27b` as the model. You can call Qwen 3.6 27B through AI SDK, Chat Completions API, Responses API, Messages API, or other API formats, from TypeScript or Python.

Does Qwen 3.6 27B support zero data retention?
Zero Data Retention is not currently available for this model. Zero Data Retention is offered on a per-provider basis. See https://vercel.com/docs/ai-gateway/capabilities/zdr for details.

Where can I see live latency and cost data for Qwen 3.6 27B?
This page shows live throughput, time-to-first-token, and pricing metrics for Qwen 3.6 27B measured across real AI Gateway traffic.
