Opening this issue to start a discussion about whether it would be worth investing in making it easy to run TensorFlow Agents on K8s.
For some inspiration, you can look at the TfJob CRD.
Some questions:
- Is there a need to be able to distribute the environments across multiple machines?
- What is the communication pattern between the simulations and the TensorFlow job?
  * Is data fetched from all simulations simultaneously?
  * Does each simulation need to be individually addressable?