Open-source codebase for Terminator Policy Gradient (TermPG), from "Reinforcement Learning with a Terminator" (NeurIPS 2022).
To use Terminator, make sure Python 3 is installed and pip is up to date. This project was tested with Python 3.8.
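To check which interpreter and pip versions you have:

```
python3 --version
python3 -m pip --version
```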
Clone the repository:

```
git clone https://github.com/guytenn/Terminator.git
```

It is recommended to install the requirements inside a virtual environment. To set one up, follow these steps:

```
cd Terminator/
python3 -m venv terminator_env
source terminator_env/bin/activate
# Upgrade Pip
pip install --upgrade pip
```

While in the Terminator directory, install the requirements with:
```
pip install .
```

You can find the latest version of Backseat Driver here (currently only supports Linux).
Download and unzip the files to `src/envs/backseat_driver/build/`.
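A minimal sketch of that step, assuming the downloaded archive is named `backseat_driver.zip` (the actual file name may differ):

```
# Hypothetical archive name; replace with the file you downloaded
mkdir -p src/envs/backseat_driver/build
unzip backseat_driver.zip -d src/envs/backseat_driver/build/
```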
Your file system should be organized as follows:

```
src/envs/backseat_driver/build/BackseatDriverTerm_BurstDebugInformation_DoNotShip
src/envs/backseat_driver/build/BackseatDriverTerm_Data
src/envs/backseat_driver/build/BackseatDriverTerm.x86_64
src/envs/backseat_driver/build/UnityPlayer.so
```

The following command runs TermPG with its default parameters:
```
python3 run.py --learn_costs --termination_gamma
```

Below is a list of the arguments you can change; an illustrative example command follows each table.
| TermPG Arguments | Description |
|---|---|
| `--learn_costs` | Learn costs according to TermPG |
| `--termination_gamma` | Use a dynamic discount factor according to TermPG |
| `--cost_coef 1` | Cost coefficient for termination in the environment |
| `--bonus_coef 1` | Bonus coefficient for termination cost confidence in TermPG |
| `--bonus_type 'maxmin'` | Type of bonus to use for costs (one of: 'none', 'std', 'maxmin') |
| `--reward_penalty_coef 0` | Penalty coefficient for costs (penalize reward by estimated costs) |
| `--termination_penalty 0` | A penalty for termination (reward-shaping variant) |
| `--reward_bonus_coef 0` | Bonus coefficient for optimism in costs |
| `--window 30` | Window size for termination |
| `--env_window -1` | The real window the environment will use for termination; -1 uses the default window |
| `--n_ensemble 3` | Number of networks in the cost model ensemble |
| `--term_train_steps 30` | Number of training steps for the terminator |
| `--term_batch_size 64` | Batch size for the terminator |
| `--term_replay_size 1000` | Replay size for the terminator |
| `--cost_in_state` | Adds the true cost to the state (TerMDP with known costs) |
| `--no_termination` | Disables termination in the environment |
| `--cost_history_in_state` | Adds the history of costs to the state in addition to the accumulated cost |
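For example, a hypothetical run that keeps the learned-cost pipeline but switches to a std-based bonus with a larger ensemble (the values here are illustrative, not tuned settings):

```
# Illustrative flag values only
python3 run.py --learn_costs --termination_gamma --bonus_type 'std' --bonus_coef 2 --n_ensemble 5
```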
| General Arguments | Description |
|---|---|
| `--train_timesteps 1000000` | Number of simulation timesteps to train a policy |
| `--train_batch_size 1024` | Number of timesteps collected for each SGD round; this defines the size of each SGD epoch |
| `--batch_size 32` | Total SGD batch size across all devices; this defines the minibatch size within each epoch |
| `--num_epochs 3` | Number of SGD iterations in each outer loop (i.e., number of epochs to execute per train batch) |
| `--graphics` | When enabled, renders the environment |
| `--wandb` | Log to wandb |
| `--project_name` | Project name for wandb logging |
| `--run_name` | Run name for wandb logging |
| `--num_processes 8` | Number of workers during training (-1 uses all CPUs) |
| `--num_gpus 1` | Number of GPUs to use for training |
| `--entropy_coeff 0` | Entropy loss coefficient |
| `--use_lstm` | Use a recurrent policy |
| `--clean_data` | Removes all model files in src/data |
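Likewise, a hypothetical training command that mixes TermPG and general arguments and logs to wandb (the project and run names below are placeholders, and the flag values are illustrative):

```
# Placeholder wandb names; choose your own
python3 run.py --learn_costs --termination_gamma \
    --train_timesteps 2000000 --num_processes -1 --num_gpus 1 \
    --wandb --project_name my_terminator_project --run_name termpg_backseat_driver
```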
To cite our paper, please use:

```
@article{tennenholtz2022reinforcement,
  title={Reinforcement Learning with a Terminator},
  author={Tennenholtz, Guy and Merlis, Nadav and Shani, Lior and Mannor, Shie and Shalit, Uri and Chechik, Gal and Hallak, Assaf and Dalal, Gal},
  journal={arXiv preprint arXiv:2205.15376},
  year={2022}
}
```