Robotic manipulation is one of the most fascinating and challenging problems in robotics, with broad applications in manufacturing, customer service, healthcare, household tasks, and more. While learning-based visual policies have achieved impressive results, such as manipulating Rubik’s cubes, they are typically trained and tested in the same environments on specific tasks and lack the ability to generalize to new scenes, objects, and tasks. Recently, foundation models such as large language models (LLMs) and vision-language models (VLMs) have demonstrated a strong ability to encode vast amounts of world knowledge and to generalize to new domains, offering a promising path toward enhancing robots’ generalization capabilities. In this workshop, we aim to unite researchers from different communities, including foundation models, perception, planning, embodied AI, simulation, and sim2real, to push the boundaries of generalizable robotic manipulation. To benchmark the generalization performance of robot policies, we present two challenges in the workshop:
By bringing together experts across these disciplines, we hope to address current challenges, explore cutting-edge research, and identify future directions that will benefit both academic research and industrial applications.