GemBench Challenge

GemBench comprises 16 training tasks with 31 variations, covering seven action primitives. The testing set includes 44 tasks with 92 variations, organized into four progressively more challenging levels that systematically evaluate generalization: novel placements, novel rigid objects, novel articulated objects, and long-horizon tasks.

Colosseum Challenge

Colosseum evaluates how well models generalize across a range of scene perturbations. It applies 14 perturbation factors to 20 distinct RLBench tasks, which are grouped into three tiers (simple, intermediate, and complex) according to the number of waypoints involved, i.e. the task horizon. In total, Colosseum provides 20,371 unique task perturbation instances.

Real Robot Challenge

We will deploy your models on a real robot platform, as shown in the image below, to assess their real-world generalization capabilities. The testing tasks are similar to the tasks in GemBench's four generalization levels, and we will provide a small set of real robot data for fine-tuning. Submissions use the same container-based format as the GemBench track. Teams may participate in the real robot track alone if they prefer. This is an independent track and will be awarded separately.

Evaluation

Simulator-based Evaluation

To participate in the GemBench and Colosseum challenges, please register your team using this registration form.

Note that the challenges are evaluated independently. Please review the guidelines for your chosen challenge(s):

GemBench Challenge Guidelines: View Guidelines
Colosseum Challenge Guidelines: View Guidelines

Real Robot Evaluation

Please register your team using this registration form. The submission guidelines are the same as for the GemBench challenge.

Dates

GemBench and Colosseum submission deadline: May 23 23:59 CET (extended from May 12 23:59 CET)
GemBench and Colosseum report deadline: May 30 23:59 CET (extended from May 19 23:59 CET)
Real robot challenge deadline: June 1 23:59 CET

GemBench Leaderboard

#	Team	Members	Affiliation	Report	Performance on Public Split	Performance on Private Split
1	AIM3	Ziyang Li, Yuting Mei, Sipeng Zheng, Qin Jin	Renmin University of China, BeingBeyond	GRASP to GemBench Challenge	46.4%	46.3%
2	MiRA	Posheng Chen, Sin-Yi Chiu, Chuan-Yu Wu, Jia-Fong Yeh, Hung-Ting Su, Chih-Han Chen, Yan-Xiang Qiu, Winston Hsu	National Taiwan University	Hyperparameter-Tuned Object Grounding and Early-Stopped Motion Control Enhance 3D-LOTUS++ on GemBench	40.3%	43.5%
3	BridgeVLA	Peiyan Li, Yixiang Chen, Hongtao Wu, Xiao Ma, Xiangnan Wu, Yan Huang, Liang Wang, Tao Kong, Tieniu Tan	CASIA, ByteDance Seed, UCAS, FiveAges, NJU	Input-Output Alignment for Efficient 3D Manipulation Learning with Vision-Language Models	36.3%	24.3%
-	RIL-LAB	Qiang Nie, Zhiyuan Zhang, Xianzu Wu, Xin Li, Junyu Shi, Yong Sun, Tianao Shen	HKUST(GZ)	Model Evaluation and Fine-Tuning Using Data Augmentation	29.9%	-