Chengbo Yuan, Rui Zhou^, Mengzhen Liu^, Yingdong Hu, Shengjie Wang, Li Yi, Chuan Wen, Shanghang Zhang, Yang Gao*.
[Project Website] [arXiv] [Dataset] [Pi0-VLA Code] [BibTeX]
^ Indicates equal contribution. * Corresponding author.
MotionTrans is the first framework to achieve explicit end-to-end human-to-robot motion transfer, establishing motion-level policy learning from human data. By cotraining on 15 robot tasks and 15 human tasks, we enable both Diffusion Policy and Pi0-VLA to directly perform 10+ human tasks. Here we open-source all code of the framework, including:
- (1) Robot teleoperation.
- (2) Human data collection.
- (3) MotionTrans Dataset.
- (4) Human-robot data processing.
- (5) Human data replay (on robot).
- (6) Policy human-robot cotraining / finetuning.
- (7) Checkpoints (Weights) of cotrained policy.
- (8) Policy inference and deployment (on robot).
conda create -n dexmimic python=3.10
conda activate dexmimic
pip install -r requirements.txt
pip install torch torchvision peft open3d viser
pip install huggingface-hub==0.21.4 pin==3.3.1 numpy==1.24.4
Since we rely on the ZED 2 camera for visual observation, please also install the ZED SDK following the official instructions.
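After installing the SDK, a quick way to verify that the camera and its Python API work is a minimal grab loop. The snippet below is only an install-verification sketch based on the standard ZED Python API (pyzed), not part of this repository:

```python
# Minimal ZED 2 sanity check using the official ZED Python API (pyzed).
# Install-verification sketch only; not part of the MotionTrans codebase.
import pyzed.sl as sl

zed = sl.Camera()
init_params = sl.InitParameters()
init_params.camera_resolution = sl.RESOLUTION.HD720
init_params.camera_fps = 30

if zed.open(init_params) != sl.ERROR_CODE.SUCCESS:
    raise RuntimeError("Failed to open ZED camera; check the SDK installation.")

image = sl.Mat()
if zed.grab() == sl.ERROR_CODE.SUCCESS:
    zed.retrieve_image(image, sl.VIEW.LEFT)
    print("Grabbed a left image:", image.get_width(), "x", image.get_height())
zed.close()
```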
Please follow documents/1.robot_teleoperation.md.
Please follow documents/2.human_data_collection.md.
Please download the dataset from this Hugging Face link: MotionTrans Dataset. Details of the dataset can be found in documents/3.motiontrans_dataset.md.
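If you prefer to pull the dataset programmatically, a minimal sketch using huggingface_hub is shown below; the repo id is a placeholder, so replace it with the one from the dataset link above:

```python
# Download the MotionTrans dataset from the Hugging Face Hub.
# "<hf-username>/MotionTrans-Dataset" is a placeholder repo id; replace it with
# the actual id from the dataset link above.
from huggingface_hub import snapshot_download

local_path = snapshot_download(
    repo_id="<hf-username>/MotionTrans-Dataset",  # placeholder
    repo_type="dataset",
    local_dir="data/motiontrans_raw",             # choose any local folder
)
print("Dataset downloaded to:", local_path)
```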
For data processing:
bash scripts_data/zarr_human_data_conversion_batch.sh
bash scripts_data/zarr_robot_data_conversion_batch.sh
This will process all tasks in the raw data folder and save the processed data in zarr format. For instruction augmentation (with OpenAI ChatGPT), check out scripts/zarr_get_diverse_instruction.sh. For data visualization, first run the visualization-version processing scripts for a single task:
bash scripts_data/zarr_human_data_conversion_vis.sh
bash scripts_data/zarr_robot_data_conversion_vis.sh
Then run the visualization script (remember to update data_path in the .sh file to the zarr folder generated above):
bash scripts_data/data_visualization.sh
This will open a window visualizing the overlapped point clouds for a quick check, along with an interactive Viser viewer for detailed inspection. The visualization results are shown below:
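Independently of the viewers, a quick programmatic sanity check of a processed zarr store can also be handy. The path and key layout below are assumptions; check the conversion scripts for the actual structure:

```python
# List every array in a processed zarr store with its shape and dtype.
# The store path and the exact group/key layout are assumptions; adjust them to
# the output of the conversion scripts above.
import zarr

root = zarr.open("data/processed/example_task.zarr", mode="r")  # placeholder path

def walk(group, prefix=""):
    """Recursively print all arrays under a zarr group."""
    for name, item in group.items():
        path = f"{prefix}/{name}"
        if hasattr(item, "shape"):  # an array
            print(f"{path}: shape={item.shape}, dtype={item.dtype}")
        else:                       # a sub-group
            walk(item, path)

walk(root)
```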
To replay the processed human data on the robot, run:
bash scripts/replay.sh
Then follow the instructions in your terminal to control the replay process.
We provide the Diffusion Policy codebase in this repository. For Pi0-VLA, please refer to MotionTrans-Pi0-VLA.
For human-robot multi-task cotraining (zero-shot setting in the paper), run:
bash scripts/dp_base_cotraining.sh
Checkpoints will be saved as .ckpt files. Along with each checkpoint, a .yaml file recording (the order of) all training tasks will also be saved; it is needed later for policy inference and robot control.
The cotrained checkpoints can be downloaded from here.
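A minimal sketch of how this task-order .yaml might be consumed at inference time is shown below; the file name and its structure are hypothetical examples, so use the file actually written next to your checkpoint:

```python
# Load the task-order .yaml saved alongside a checkpoint so that task conditioning
# at inference matches the training order.
# The path and the file's structure here are hypothetical examples.
import yaml

with open("outputs/dp_base_cotraining/tasks.yaml", "r") as f:  # hypothetical path
    task_order = yaml.safe_load(f)

print("Training tasks in order:")
for idx, task in enumerate(task_order):
    print(idx, task)
```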
For finetuning with a few robot demonstrations (the few-shot setting in the paper), run:
bash scripts/dp_base_finetune_5demo.sh
bash scripts/dp_base_finetune_20demo.sh
To deploy the trained Diffusion Policy (DP) on the robot, run:
bash scripts/dp_infer.sh
and follow the instructions in your terminal to control the robot execution. Note that the parameters and tricks for action-chunk-based inference strongly affect performance; please refer to the comments in scripts/dp_infer.sh for details. Pay particular attention to robot_action_horizon, robot_steps_per_inference, gripper_action_horizon, and gripper_steps_per_inference, which should be set carefully to balance inference horizon against action jittering (for more discussion and potential improvements, see the blog post on Real-Time Action Chunking).
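To make the roles of these parameters concrete, the schematic below shows receding-horizon execution of an action chunk: the policy predicts action_horizon steps, but only the first steps_per_inference of them are executed before re-planning. This is an illustration, not the actual inference loop in scripts/dp_infer.sh:

```python
# Schematic receding-horizon execution of action chunks.
# Illustration only; NOT the actual inference loop used in scripts/dp_infer.sh.
import numpy as np

def run_policy(obs, action_horizon=16, action_dim=7):
    """Stand-in for the trained policy: returns a chunk of future actions."""
    return np.zeros((action_horizon, action_dim))  # dummy (horizon, action_dim) chunk

def control_loop(get_obs, execute_action, steps_per_inference=8, total_steps=200):
    """Execute only the first `steps_per_inference` actions of every predicted chunk.

    Smaller steps_per_inference -> more frequent re-planning (less open-loop drift,
    but more chunk boundaries where jitter can appear); larger values do the opposite.
    """
    executed = 0
    while executed < total_steps:
        chunk = run_policy(get_obs())
        for action in chunk[:steps_per_inference]:
            execute_action(action)
            executed += 1
            if executed >= total_steps:
                break

# Example usage with dummy I/O:
control_loop(get_obs=lambda: None, execute_action=lambda a: None)
```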
To deploy the trained Pi0-VLA on the robot, first start the policy server (refer to MotionTrans-Pi0-VLA), and then run:
bash scripts/pi0_infer.sh
and follow the instructions in your terminal to control the robot execution. The script above starts a client that communicates with the policy server and then controls the robot. For more details, please refer to the official Pi0-VLA repository.
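For reference, the client side typically boils down to something like the sketch below using the openpi_client package; the host/port and the observation keys are assumptions, so follow scripts/pi0_infer.sh and MotionTrans-Pi0-VLA for the actual interface:

```python
# Sketch of a Pi0 policy client talking to a running policy server.
# The host/port and the observation dictionary keys are assumptions; see
# scripts/pi0_infer.sh and MotionTrans-Pi0-VLA for the actual interface.
import numpy as np
from openpi_client import websocket_client_policy

client = websocket_client_policy.WebsocketClientPolicy(host="localhost", port=8000)

observation = {
    "image": np.zeros((224, 224, 3), dtype=np.uint8),  # placeholder camera frame
    "state": np.zeros(7, dtype=np.float32),            # placeholder robot state
    "prompt": "pick up the cup",                       # task instruction
}

result = client.infer(observation)
action_chunk = np.asarray(result["actions"])  # (horizon, action_dim) chunk to execute
print("Received action chunk with shape:", action_chunk.shape)
```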
This repository builds on code from Data-Scaling-Laws, UMI, Open-Television, ARCap, OpenPi, Viser, RoboEngine, OneTwoVLA, EgoHOI, DROID, and Pinocchio. We sincerely appreciate their contributions to the open-source community, which have significantly supported this project. We also sincerely thank our AI collaborators ChatGPT, Kimi, and GitHub Copilot!
If you find this repository useful, please kindly cite our work:
@article{yuan2025motiontrans,
title={MotionTrans: Human VR Data Enable Motion-Level Learning for Robotic Manipulation Policies},
author={Yuan, Chengbo and Zhou, Rui and Liu, Mengzhen and Hu, Yingdong and Wang, Shengjie and Yi, Li and Wen, Chuan and Zhang, Shanghang and Gao, Yang},
journal={arXiv preprint arXiv:2509.17759},
year={2025}
}
