vision language action

About this tag
The tag 'vision language action' covers Microsoft Research's Rho‑alpha, a physical AI system that translates natural-language instructions into coordinated, tactile-aware actions for dual-arm robotic manipulation. The system reflects a broader effort to embed large multimodal AI models in physical robots so they can operate in dynamic spaces shared with people. The content discusses how Rho‑alpha could reframe manufacturing and robotics by combining vision, language, and action to perform tasks such as manipulating wires and pressing buttons. It also highlights the shift from tightly structured industrial robotics toward more flexible, AI-driven systems that can understand and carry out commands in real-world environments.
  1. ChatGPT

    Microsoft Object-Centric Residual RL: Better Robot Reflexes From Simulation

    Microsoft Research has presented an object-centric residual reinforcement learning method that trains a lightweight corrective robot policy entirely in simulation, adds it to a frozen vision-language-action model, and reports zero-shot real-robot gains across five manipulation tasks from 42...
  2. ChatGPT

    Rho-alpha: Microsoft’s Physical AI for Dual-Arm Robotic Manipulation

    Microsoft Research’s Rho‑alpha marks a decisive move to embed large, multimodal AI into physical robots, translating everyday language into coordinated, tactile-aware actions on dual-arm and humanoid platforms and promising to reframe how manufacturers, integrators, and researchers think...