Releases · google-research/batch-ppo
TensorFlow Agents 1.4.0
Features:
- Split episodes into chunks for training. This reduces memory requirements when training from pixels and in some cases increases data efficiency.
- Use lambda variable initializers everywhere to support embedding the simulation into a larger graph (see the sketch after this list).
- Upgrade to newest Gym version, including new environment names and dtypes for spaces.
- Support regularization losses returned by the network.
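
For context on the lambda-initializer item, here is a minimal sketch of the pattern, assuming the TF 1.x API of that era; the function name, variable names, and shapes are illustrative rather than taken from the repository:

```python
import tensorflow as tf  # written against the TF 1.x API used at the time

def simulation_variables(batch_size, observ_shape):
    # Passing a callable instead of a concrete tensor defers creation of the
    # initial-value op until the variable is built, so these variables can be
    # embedded into whatever graph and scope the surrounding model sets up.
    # Names and shapes here are illustrative, not the repository's own.
    observ = tf.Variable(
        lambda: tf.zeros([batch_size] + list(observ_shape), tf.float32),
        trainable=False, name='observ')
    reward = tf.Variable(
        lambda: tf.zeros([batch_size], tf.float32),
        trainable=False, name='reward')
    return observ, reward
```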
Improvements:
- Remove MuJoCo dependency from tests.
- Speed up smoke tests for faster iteration times.
- Enable continuous integration.
Bugs:
- Fix off-by-one bug in the FrameHistory environment wrapper.
TensorFlow Agents 1.3.0
Features:
- Represent policies as tf.distribution objects, so that the algorithms are independent of the action distribution.
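
A minimal sketch of what a distribution-valued policy can look like, assuming the TF 1.x API; the network layout, layer sizes, and names are illustrative, not the repository's own:

```python
import tensorflow as tf  # TF 1.x style, as used by the project at the time

def gaussian_policy(observ, action_size):
    # Build a Gaussian policy head and return it as a distribution object.
    # The algorithm only touches the generic interface (sample, log_prob,
    # entropy), so a Categorical head for discrete actions could drop in
    # without changing the loss code. Layer sizes are illustrative.
    hidden = tf.layers.dense(observ, 64, tf.nn.relu)
    mean = tf.layers.dense(hidden, action_size)
    logstd = tf.get_variable(
        'logstd', [action_size], tf.float32, tf.zeros_initializer())
    return tf.distributions.Normal(loc=mean, scale=tf.exp(logstd))

# Distribution-agnostic usage inside the algorithm:
# policy = gaussian_policy(observ, action_size)
# action = policy.sample()
# log_prob = tf.reduce_sum(policy.log_prob(action), axis=-1)
```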
Improvements:
- Move reusable components into the agents.parts package.
- Add nesting tools to handle nested tuples, lists, and dicts.
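
A minimal sketch of the kind of nesting helper meant here, under the assumption that it maps a function over arbitrarily nested tuples, lists, and dicts; the name nested_map and the usage example are illustrative, not the library's own implementation:

```python
def nested_map(function, structure):
    # Apply a function to every leaf of a structure built from nested
    # tuples, lists, and dicts, preserving the original layout.
    if isinstance(structure, dict):
        return {key: nested_map(function, value)
                for key, value in structure.items()}
    if isinstance(structure, (tuple, list)):
        return type(structure)(nested_map(function, element)
                               for element in structure)
    return function(structure)

# Example: cast every array in a nested observation to float32.
# nested_map(lambda x: x.astype('float32'),
#            {'image': image, 'sensors': (position, velocity)})
```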
Bugs:
- Fix PPO not learning on GPU by placing the optimizer on the GPU.
TensorFlow Agents 1.2.0
Features:
- Use a single optimizer for PPO to train shared feature layers better (see the sketch after this list).
- Allow calling methods of the process environment.
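
A minimal sketch of the single-optimizer idea, assuming a combined scalar loss in TF 1.x; the loss weighting, learning rate, and function name are illustrative, not the repository's values:

```python
import tensorflow as tf  # TF 1.x style

def joint_training_op(policy_loss, value_loss,
                      value_weight=0.5, learning_rate=1e-4):
    # Combine both objectives into one scalar and take a single optimizer
    # step, so gradients from the policy surrogate and the value regression
    # are summed on any shared feature layers instead of being applied in
    # separate, possibly conflicting updates.
    total_loss = policy_loss + value_weight * value_loss
    optimizer = tf.train.AdamOptimizer(learning_rate)
    return optimizer.minimize(total_loss)
```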
Improvements:
- Improve default and MuJoCo configs.
- Report both training and evaluation scores.
Bugs:
- Fix likelihood calculation that halved the gradients for the action standard deviation.
TensorFlow Agents 1.1.0
Features:
- Policy networks are now defined as functions mapping sequences of observations to sequences of actions, so feed-forward policies are faster and memory-based agents are easier to implement. Previously, networks were restricted to being defined as RNNCells (see the sketch after this list).
- All functions of the agent interface now receive a tensor of agent indices. This adds the flexibility to process observations in smaller batches. Previously, perform() and experience() were defined on data from all the environments.
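
A minimal sketch of a policy defined as a function over observation sequences, assuming inputs shaped [batch, time, observation] and the TF 1.x API; layer sizes and names are illustrative:

```python
import tensorflow as tf  # TF 1.x style

def feed_forward_policy(observations, action_size):
    # Map observation sequences of shape [batch, time, observation] to action
    # means of shape [batch, time, action]. Dense layers act on the last axis,
    # so all time steps are processed in one pass, which is why feed-forward
    # policies became faster; a memory-based policy would instead carry its own
    # state across the time axis (e.g. via tf.nn.dynamic_rnn).
    hidden = tf.layers.dense(observations, 64, tf.nn.relu)
    mean = tf.layers.dense(hidden, action_size)
    return mean
```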
TensorFlow Agents 1.0.0
Initial release.