This is a TensorFlow-based implementation of our approach PIPER (Primitive-Informed Preference-based Hierarchical Reinforcement Learning via Hindsight Relabeling), which leverages preference-based learning to mitigate non-stationarity and infeasible subgoal generation in hierarchical reinforcement learning.
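PIPER trains a reward model from preference feedback over trajectory segments (see the --reward_model and --reward_batch_size flags in the training commands below). As a rough illustration only — this is a minimal sketch, not the code in this repository, and the network, names, and shapes are all hypothetical — a Bradley-Terry-style preference loss in TensorFlow might look like:

# Illustrative sketch of a Bradley-Terry preference loss (hypothetical names).
import tensorflow as tf

# Toy reward network mapping an observation to a scalar reward.
reward_net = tf.keras.Sequential([
    tf.keras.layers.Dense(64, activation="relu"),
    tf.keras.layers.Dense(1),
])

def preference_loss(segment_a, segment_b, prefs):
    # segment_a, segment_b: [batch, horizon, obs_dim] trajectory segments.
    # prefs: [batch] floats, 1.0 if segment_a is preferred, else 0.0.
    return_a = tf.reduce_sum(tf.squeeze(reward_net(segment_a), -1), axis=1)
    return_b = tf.reduce_sum(tf.squeeze(reward_net(segment_b), -1), axis=1)
    # Bradley-Terry model: P(a preferred over b) = sigmoid(return_a - return_b).
    return tf.reduce_mean(tf.nn.sigmoid_cross_entropy_with_logits(
        labels=prefs, logits=return_a - return_b))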
To install, run the following commands:
# Install the code
git clone -b master --single-branch https://github.com/Utsavz/piper.git
virtualenv piper
source $PWD/piper/bin/activate
pip install numpy
pip install -r src/requirements.txt
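After installation, you can sanity-check the environment (assuming requirements.txt pulls in TensorFlow):
# Verify the install inside the virtualenv
python -c "import tensorflow as tf; print(tf.__version__)"
To run the demo, use the following scripts: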
# For Maze navigation environment
python experiment/play.py --dir=maze_piper_0 --render=1 --rollouts=10
# For Pick and place environment
python experiment/play.py --dir=pick_piper_0 --render=1 --rollouts=10
# For Push environment
python experiment/play.py --dir=push_piper_0 --render=1 --rollouts=10
# For Hollow environment
python experiment/play.py --dir=hollow_piper_0 --render=1 --rollouts=10
# For Franka kitchen environment
python experiment/play.py --dir=kitchen_piper_0 --render=1 --rollouts=10
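The demo commands differ only in the log directory, so all five can also be driven from a short Python script built from the exact commands above:

# Run every demo listed above, one environment after another.
import subprocess

for logdir in ["maze_piper_0", "pick_piper_0", "push_piper_0",
               "hollow_piper_0", "kitchen_piper_0"]:
    subprocess.run(["python", "experiment/play.py", f"--dir={logdir}",
                    "--render=1", "--rollouts=10"], check=True)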
To train, use the following scripts. For baselines, change the parameters accordingly:
# For Maze navigation environment
python experiment/train.py --env="FetchMazeReach-v1" --logdir="maze_piper_0" --n_epochs=3100 --reward_batch_size=50 --seed=0 --bc_loss=0 --num_hrl_layers=2 --reward_model=1 --q_reg=1
# For Pick and place environment
python experiment/train.py --env="FetchPickAndPlace-v1" --logdir="pick_piper_0" --n_epochs=3000 --reward_batch_size=50 --seed=0 --bc_loss=1 --num_hrl_layers=2 --reward_model=1 --q_reg=1
# For Push environment
python experiment/train.py --env="FetchPush-v1" --logdir="push_piper_0" --n_epochs=13000 --reward_batch_size=50 --seed=0 --bc_loss=1 --num_hrl_layers=2 --reward_model=1 --q_reg=1
# For Hollow environment
python experiment/train.py --env="FetchPickAndPlaceHollow-v1" --logdir="hollow_piper_0" --n_epochs=60000 --reward_batch_size=100 --seed=0 --bc_loss=1 --num_hrl_layers=2 --reward_model=1 --q_reg=1
# For Franka kitchen environment
python experiment/train.py --env="kitchen-complete-v0" --logdir="kitchen_piper_0" --n_epochs=3000 --reward_batch_size=50 --seed=0 --bc_loss=1 --num_hrl_layers=2 --reward_model=1 --q_reg=1
To plot the success rate performances, use the following scripts:
# For Maze navigation environment
python experiment/plot.py --dir1=maze_piper_0:piper --plot_name="maze"
# For Pick and place environment
python experiment/plot.py --dir1=pick_piper_0:piper --plot_name="pick"
# For Push environment
python experiment/plot.py --dir1=push_piper_0:piper --plot_name="push"
# For Hollow environment
python experiment/plot.py --dir1=hollow_piper_0:piper --plot_name="hollow"
# For Franka kitchen environment
python experiment/plot.py --dir1=kitchen_piper_0:piper --plot_name="kitchen"
Here we provide the success rate results for the various environments (Maze navigation, Pick and place, Push, Hollow, and Franka kitchen):