Overview

Robotic manipulation is one of the most fascinating and challenging problems in robotics, with broad applications in manufacturing, customer service, healthcare, household tasks, and more. While learning-based visual policies have achieved impressive results, such as manipulating Rubik's cubes, they are typically trained and tested in the same environments on specific tasks, and thus lack the ability to generalize to new scenes, objects, and tasks. Recently, foundation models such as large language models (LLMs) and vision-language models (VLMs) have demonstrated strong abilities to encode vast amounts of world knowledge and generalize to new domains, offering a promising path toward enhancing robots' generalization capabilities. In this workshop, we aim to unite researchers from different communities to push the boundaries of generalizable robotic manipulation, spanning foundation models, perception, planning, embodied AI, simulators, sim2real, and beyond. To benchmark the generalization performance of robot policies, we present two challenges in the workshop:

  • GemBench: evaluating vision-and-language-based manipulation across novel rigid objects, articulated objects, and long-horizon tasks
  • Colosseum: assessing robot policies under various environmental perturbations, such as changes in lighting conditions and camera poses

By bringing together experts from interdisciplinary fields, we hope to address current challenges, explore cutting-edge research, and identify future directions that will benefit both academic research and industrial applications.

Confirmed Speakers

Schedule

Organizers

Contact

Please feel free to send us your questions by email at robomanigrail@gmail.com.