ParaK8s

ParaK8s is a framework for organizing and running large-scale machine learning experiments on multi-node Kubernetes clusters.
Its core philosophy is that all experiments are trajectories within a large function space: each experiment is fully described by a configuration file specifying data pipelines, model architecture, logging, results tracking, and model registry settings.
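Here is a sketch of what such a configuration might contain. It assumes a JSON file with one section for each concern named above. The section and field names (`data`, `model`, `logging`, `tracking`, `registry`, `batch_size`, the MLflow URI) are hypothetical and are not the actual ParaK8s schema.

```python
import json
from dataclasses import dataclass
from typing import Any


# Hypothetical schema: section and field names are illustrative assumptions,
# not the actual ParaK8s configuration format.
@dataclass
class ExperimentConfig:
    name: str
    data: dict[str, Any]      # data pipeline: source, preprocessing, batch size
    model: dict[str, Any]     # model architecture and hyperparameters
    logging: dict[str, Any]   # log level and destination
    tracking: dict[str, Any]  # results tracking, e.g. an MLflow tracking URI
    registry: dict[str, Any]  # where the trained model is registered

    @classmethod
    def from_json(cls, text: str) -> "ExperimentConfig":
        raw = json.loads(text)
        if not isinstance(raw, dict):
            raise ValueError("config must be a JSON object")
        required = {"name", "data", "model", "logging", "tracking", "registry"}
        missing = required - set(raw)
        if missing:
            raise ValueError(f"config is missing sections: {sorted(missing)}")
        # Unknown top-level keys are ignored in this sketch.
        return cls(**{key: raw[key] for key in required})


example = """
{
  "name": "ecg-deepcnn-baseline",
  "data": {"source": "ecg.csv", "batch_size": 64},
  "model": {"architecture": "DeepCNN", "num_classes": 5},
  "logging": {"level": "INFO"},
  "tracking": {"tracking_uri": "http://mlflow:5000", "experiment": "ecg"},
  "registry": {"model_name": "ecg-deepcnn"}
}
"""
cfg = ExperimentConfig.from_json(example)
print(cfg.model["architecture"])  # DeepCNN
```

Validating the required sections at load time means a malformed configuration fails before any cluster resources are requested.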

Core Tenets

Experiments as Configurations
Every experiment is uniquely determined by its configuration file, model parameters, and data pipelines.
Reproducibility
All experiments are reproducible via configuration files, Docker images, and version-controlled datasets.
Scalability
ParaK8s can orchestrate hundreds of experiments across multi-node Kubernetes clusters, leveraging GPUs where available.
Modularity
Pipelines, models, and data preprocessing functions are designed to be reusable across experiments.
Experiment Tracking & Artifact Management
Integration with MLflow, databases (Postgres), and model registries makes every experiment auditable.
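You can make "uniquely determined" concrete by deriving an experiment ID from the three inputs the Reproducibility tenet names: the configuration, the Docker image, and the dataset version. The `experiment_id` helper below is a hypothetical sketch and is not part of ParaK8s. It hashes a canonical JSON serialization, so reordering keys in the configuration does not change the ID.

```python
import hashlib
import json
from typing import Any


def experiment_id(config: dict[str, Any], image_tag: str, dataset_version: str) -> str:
    """Hypothetical helper: derive a stable ID from everything that
    determines an experiment (config, Docker image, dataset version)."""
    payload = json.dumps(
        {"config": config, "image": image_tag, "dataset": dataset_version},
        sort_keys=True,          # nested keys are sorted too, so key order is irrelevant
        separators=(",", ":"),   # fixed separators keep the serialization canonical
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()[:16]


a = experiment_id({"model": {"depth": 4}, "data": {"batch_size": 64}}, "parak8s:1.0", "v3")
b = experiment_id({"data": {"batch_size": 64}, "model": {"depth": 4}}, "parak8s:1.0", "v3")
print(a == b)  # True
```

The ID could, for example, be recorded with each run in the tracking server. Two runs with identical inputs would then be easy to spot.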

Example Experiments

1. Pneumonia X-Ray Classifier

Objective:
Classify X-ray images to detect the presence of pneumonia, outputting a probability for each image.

Key Achievements:

Achieved 85% categorical accuracy and an AUC of 0.82.
Applied transfer learning and data augmentation in TensorFlow.
Parallelized preprocessing using Keras PyDataset.
Built reusable GPU-enabled Docker images.
Deployed 100 experiments across a multi-node Kubernetes cluster.
Configured a Postgres database, an MLflow tracking server, and a local Docker repository.
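A batch of experiments like this can be sketched in two steps. First, expand a hyperparameter grid into one configuration per combination. Then emit one Kubernetes `batch/v1` Job per configuration. The grid values, image name, container arguments (`--config`), and config paths are hypothetical. The `nvidia.com/gpu` limit assumes the NVIDIA device plugin is installed on the GPU nodes.

```python
import itertools
from typing import Any


def expand_grid(base: dict[str, Any], grid: dict[str, list[Any]]) -> list[dict[str, Any]]:
    """Return one config per combination of the values in `grid`."""
    keys = sorted(grid)
    configs = []
    for values in itertools.product(*(grid[k] for k in keys)):
        cfg = dict(base)               # shallow copy of the shared settings
        cfg.update(zip(keys, values))  # overlay this combination
        configs.append(cfg)
    return configs


def job_manifest(name: str, image: str, config_path: str, gpus: int = 1) -> dict[str, Any]:
    """Build a Kubernetes batch/v1 Job that runs one experiment."""
    return {
        "apiVersion": "batch/v1",
        "kind": "Job",
        "metadata": {"name": name},
        "spec": {
            "backoffLimit": 0,  # do not retry a failed experiment automatically
            "template": {
                "spec": {
                    "restartPolicy": "Never",
                    "containers": [{
                        "name": "experiment",
                        "image": image,
                        # Hypothetical entrypoint argument pointing at the config file.
                        "args": ["--config", config_path],
                        # GPUs are requested via the device plugin's extended resource.
                        "resources": {"limits": {"nvidia.com/gpu": gpus}},
                    }],
                }
            },
        },
    }


configs = expand_grid({"model": "transfer-learning-cnn"},
                      {"learning_rate": [1e-3, 1e-4], "batch_size": [16, 32, 64]})
jobs = [job_manifest(f"xray-{i:03d}", "localhost:5000/parak8s-gpu:latest",
                     f"/configs/xray-{i:03d}.json")
        for i in range(len(configs))]
print(len(jobs))  # 6
```

Each manifest can be serialized to JSON or YAML and submitted with `kubectl apply -f`. The scheduler then places the Jobs across whichever nodes have free GPUs.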

2. ECG Abnormality Classifier

Objective:
Detect abnormalities in ECG signals using a multi-class classifier.

Key Achievements:

Built a DeepCNN model to classify ECG beats into 5 categories.
Achieved competitive AUC, precision, recall, and categorical accuracy.
Preprocessed and normalized ECG signals using TensorFlow operations, ensuring consistency across devices.
Managed large datasets with a custom ECGDataset class that subclasses tf.keras.utils.Sequence.
Defined experiments entirely through configuration files describing architecture, batch sizes, and data pipelines.
Deployed multiple experiments on a multi-node Kubernetes cluster with GPU support.
Integrated metrics tracking via MLflow and artifact management in a model registry.
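The following sketch shows the batching pattern a `tf.keras.utils.Sequence` subclass follows: `len()` returns the number of batches, and indexing returns one `(signals, labels)` batch. To keep it dependency-free, it is written in plain Python rather than TensorFlow. The class name `ECGBatches` and the per-beat z-score normalization are assumptions. The README does not specify the normalization that ECGDataset applies.

```python
import math
import statistics
from typing import Sequence


def zscore(signal: Sequence[float], eps: float = 1e-8) -> list[float]:
    """Normalize one ECG beat to zero mean and unit variance (assumed scheme)."""
    mean = statistics.fmean(signal)
    std = statistics.pstdev(signal)
    return [(x - mean) / (std + eps) for x in signal]  # eps guards flat signals


class ECGBatches:
    """Hypothetical stand-in mirroring the tf.keras.utils.Sequence interface.
    A TensorFlow version would subclass Sequence and return tensors."""

    def __init__(self, signals, labels, batch_size: int):
        if len(signals) != len(labels):
            raise ValueError("signals and labels must have the same length")
        self.signals = signals
        self.labels = labels
        self.batch_size = batch_size

    def __len__(self) -> int:
        # Number of batches; the last one may be smaller than batch_size.
        return math.ceil(len(self.signals) / self.batch_size)

    def __getitem__(self, idx: int):
        # Raising IndexError also lets a plain for-loop stop after the last batch.
        if not 0 <= idx < len(self):
            raise IndexError(idx)
        start = idx * self.batch_size
        stop = start + self.batch_size
        x = [zscore(s) for s in self.signals[start:stop]]  # normalize lazily, per batch
        y = self.labels[start:stop]
        return x, y
```

Normalizing inside `__getitem__` means only one batch is processed at a time, which is what keeps memory bounded for large datasets.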

Pipeline Overview:

Load and preprocess ECG signals from CSV.
Batch and normalize signals using the ECGDataset class.
Train DeepCNN with class-weighted loss for imbalanced datasets.
Evaluate with AUC, precision, recall, and categorical accuracy.
Save trained models and track metrics in MLflow.
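The class-weighted loss in the training step can be sketched with the "balanced" heuristic: each class gets weight n_samples / (num_classes × count). This is the same formula scikit-learn uses for `class_weight="balanced"`. The README does not say how ParaK8s computes its weights, so treat this choice as an assumption. The sketch also assumes integer class labels; one-hot labels would need an argmax first.

```python
from collections import Counter
from typing import Sequence


def balanced_class_weights(labels: Sequence[int], num_classes: int) -> dict[int, float]:
    """Weight each class inversely to its frequency so that rare beat
    types contribute as much to the loss as common ones."""
    counts = Counter(labels)
    missing = [c for c in range(num_classes) if counts[c] == 0]
    if missing:
        raise ValueError(f"no training examples for classes {missing}")
    n = len(labels)
    return {c: n / (num_classes * counts[c]) for c in range(num_classes)}


print(balanced_class_weights([0] * 8 + [1] * 2, num_classes=2))  # {0: 0.625, 1: 2.5}
# The resulting dict has the form Keras expects for Model.fit's class_weight
# argument, e.g. model.fit(train_data, epochs=..., class_weight=weights).
```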

Together, these examples show ParaK8s running reproducible experiments on both image and time-series data, using modular configurations, GPU acceleration, and cluster-level orchestration.
