alphaXiv

Qwen-VLA: Unifying Vision-Language-Action Modeling across Tasks, Environments, and Robot Embodiments

28 May 2026

Qiuyue Wang

Mingsheng Li

Jian Guan

Qwen-VLA, developed by the Qwen Team, unifies diverse embodied decision-making problems such as robot manipulation and vision-language navigation into a single vision-language-action model. This unified framework achieved an 83.6% average success rate on real-world ALOHA manipulation tasks and demonstrated strong generalization, including outperforming previous specialist models on specific dynamic manipulation benchmarks.
