Distributed training with Kubernetes

Opening this issue to start a discussion about whether it would be worth investing in making it easy to run TensorFlow Agents on K8s.

For some inspiration, you can look at the [TfJob CRD](https://github.com/tensorflow/k8s/blob/master/tf_job_design_doc.md).

Some questions:
  1. Is there a need to distribute the environments across multiple machines?
  1. What is the communication pattern between the simulations and the TensorFlow job?
     * Is data fetched from all simulations simultaneously?
     * Does each simulation need to be individually addressable?
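
To make the second question concrete, here is a minimal sketch of the two access patterns I have in mind. Everything in it is hypothetical: `FakeSimulation`, `step_all`, and `step_one` are illustrative names, not part of TensorFlow Agents or the TfJob CRD, and local threads stand in for whatever remote transport would actually be used.

```python
from concurrent.futures import ThreadPoolExecutor


class FakeSimulation:
    """Hypothetical stand-in for a remote environment (not a real library API)."""

    def __init__(self, index):
        self.index = index

    def step(self, action):
        # A real simulation would advance its state; this one just echoes its inputs.
        return {"env": self.index, "observation": action * 2}


def step_all(simulations, actions):
    """Pattern A: send one action to every simulation and gather the results together."""
    with ThreadPoolExecutor(max_workers=len(simulations)) as pool:
        # pool.map returns results in input order, regardless of completion order.
        return list(pool.map(lambda pair: pair[0].step(pair[1]),
                             zip(simulations, actions)))


def step_one(simulations, index, action):
    """Pattern B: address a single simulation by its index."""
    return simulations[index].step(action)


simulations = [FakeSimulation(i) for i in range(4)]
batch = step_all(simulations, actions=[1, 2, 3, 4])
single = step_one(simulations, index=2, action=5)
```

If only pattern A is needed, the simulations could probably sit behind a single service; pattern B would suggest each one needs a stable identity of its own.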
