Automated Kubernetes resource recommendation tool that analyzes historical Prometheus metrics to generate right-sized CPU and memory recommendations for your deployments.
Managing resource allocation for hundreds of Kubernetes pods across multiple namespaces is challenging. Over-provisioning wastes cloud spend, while under-provisioning puts application performance at risk. With around 400 pods to analyze and optimize based on actual usage patterns, manual analysis becomes impractical.
This tool was created to automate the process of analyzing pod resource usage and generating data-driven recommendations. It integrates with Prometheus to analyze historical metrics, calculates optimal resource requests and limits based on statistical analysis (mean for requests, P95 for limits), and provides actionable recommendations through multiple output formats.
The tool runs as a Kubernetes CronJob, deployed via Helm chart, that:
- Connects to Prometheus to retrieve historical CPU and memory metrics for all running pods
- Analyzes usage patterns over a configurable time period (default: 7 days)
- Calculates recommendations using statistical analysis with configurable safety buffers
- Generates YAML patches ready for deployment updates
- Sends reports to Slack with detailed HTML tables and patch files
- Historical Analysis: Analyzes configurable time periods (default 7 days) of Prometheus metrics
- Statistical Recommendations: Uses mean values for requests and P95 percentiles for limits
- Multiple Output Formats: Generates both human-readable tables and machine-readable YAML patches
- Slack Integration: Automated reports with HTML-formatted tables and downloadable patch files
- Safety Buffers: Configurable buffer percentage to add headroom to recommendations
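The statistical step described above (mean for requests, P95 for limits, plus a buffer) can be sketched as follows. This is a minimal illustration, not the tool's actual code: the function names `percentile` and `recommend` are hypothetical, the nearest-rank percentile method is an assumption, and so is the choice to apply the buffer to both values and to raise the limit to the request when skewed data would otherwise push it below.

```python
import math
from statistics import mean


def percentile(samples, pct):
    """Nearest-rank percentile: the smallest sample with at least pct% of samples at or below it."""
    if not samples:
        raise ValueError("no samples to analyze")
    ordered = sorted(samples)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


def recommend(samples, buffer_percent=20.0):
    """Return (request, limit) as the buffered mean and the buffered P95 of the samples."""
    factor = 1 + buffer_percent / 100
    request = mean(samples) * factor
    limit = percentile(samples, 95) * factor
    # Assumption of this sketch: Kubernetes rejects a limit below its request,
    # so clamp the limit when a few large outliers drag the mean above P95.
    return request, max(limit, request)
```

With samples expressed in millicores, `recommend(samples, buffer_percent=20)` returns a request and limit in the same unit; the same function applies to memory samples in bytes.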
- Kubernetes cluster
- Prometheus with container_cpu_usage_seconds_total and container_memory_working_set_bytes metrics
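To illustrate how these two metrics might be read over the analysis window, here is a hedged sketch that builds PromQL expressions and a URL for the Prometheus HTTP API `query_range` endpoint. The helper names, the 5-minute rate window and step, and the `container!=""` filter (intended to skip pod-level aggregate series) are assumptions for illustration; the tool's real queries may differ.

```python
from urllib.parse import urlencode


def cpu_query(namespace, pod, rate_window="5m"):
    """PromQL for per-container CPU usage in cores, derived from the cumulative counter."""
    selector = f'namespace="{namespace}",pod="{pod}",container!=""'
    return f"rate(container_cpu_usage_seconds_total{{{selector}}}[{rate_window}])"


def memory_query(namespace, pod):
    """PromQL for per-container working-set memory in bytes (a gauge, so no rate())."""
    selector = f'namespace="{namespace}",pod="{pod}",container!=""'
    return f"container_memory_working_set_bytes{{{selector}}}"


def range_query_url(prometheus_url, query, end_ts, hours=168, step="5m"):
    """Build a /api/v1/query_range URL covering the last `hours` hours before end_ts (Unix seconds)."""
    params = {
        "query": query,
        "start": end_ts - hours * 3600,
        "end": end_ts,
        "step": step,
    }
    return f"{prometheus_url.rstrip('/')}/api/v1/query_range?{urlencode(params)}"
```

The returned URL can be fetched with any HTTP client; the samples in the JSON response would then feed the statistical analysis.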
- Clone the repository:
git clone git@github.com:umairedu/kube-right-sizer.git
cd kube-right-sizer
- Configure environment variables in helm/resource-right-sizing/production/values.yaml:
resource_sizing:
  plain:
    PROMETHEUS_URL: "http://prometheus-server.default.svc:9090"
    KUBERNETES_USE_IN_CLUSTER_CONFIG: "true"
    TARGET_NAMESPACE: "" # Empty for all namespaces, or comma-separated list
    EXCLUDED_NAMESPACES: "kube-system"
    HOURS: "168" # 7 days
    BUFFER_PERCENT: "20"
    OUTPUT_FORMAT: "both"
  secrets:
    SLACK_TOKEN: "your-slack-token"
    SLACK_CHANNEL: "your-channel"
    SLACK_VERIFY_SSL: "true"
- Deploy the Helm chart:
helm install resource-right-sizing ./helm/resource-right-sizing \
-f ./helm/resource-right-sizing/production/values.yaml \
-n <your-namespace>
- Install dependencies:
pip install -r requirements.txt
- Configure environment variables (copy env.sample to .env and update):
cp env.sample .env
# Edit .env with your configuration
- Run locally:
python3 main.py
All configuration is done via environment variables:
- PROMETHEUS_URL: Prometheus server URL
- KUBERNETES_USE_IN_CLUSTER_CONFIG: Use in-cluster config (true/false)
- TARGET_NAMESPACE: Comma-separated list of namespaces to scan (empty for all)
- EXCLUDED_NAMESPACES: Comma-separated list of namespaces to exclude
- HOURS: Number of hours of historical data to analyze
- BUFFER_PERCENT: Safety buffer percentage to add to recommendations
- OUTPUT_FORMAT: Output format (table, yaml, or both)
- SLACK_TOKEN: Slack bot token for notifications
- SLACK_CHANNEL: Slack channel for notifications
- SLACK_VERIFY_SSL: SSL verification for Slack API (true/false)
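As a rough sketch of how these variables could be parsed into typed settings, the example below uses only the standard library. The project tree lists Pydantic for config.py, so this is an illustration rather than the tool's implementation: the `Settings` class, `load_settings` function, and every default except the documented 7-day (168-hour) window are assumptions.

```python
from dataclasses import dataclass
from typing import List, Mapping


def _split_csv(value: str) -> List[str]:
    """Split a comma-separated value, dropping blanks so an empty string yields []."""
    return [item.strip() for item in value.split(",") if item.strip()]


def _as_bool(value: str) -> bool:
    """Interpret common truthy spellings; anything else counts as false."""
    return value.strip().lower() in {"1", "true", "yes"}


@dataclass(frozen=True)
class Settings:
    prometheus_url: str
    in_cluster: bool
    target_namespaces: List[str]  # empty list means "scan all namespaces"
    excluded_namespaces: List[str]
    hours: int
    buffer_percent: float
    output_format: str


def load_settings(env: Mapping[str, str]) -> Settings:
    """Build Settings from an environment-like mapping, validating OUTPUT_FORMAT."""
    output_format = env.get("OUTPUT_FORMAT", "both").lower()
    if output_format not in {"table", "yaml", "both"}:
        raise ValueError(f"unsupported OUTPUT_FORMAT: {output_format}")
    return Settings(
        prometheus_url=env["PROMETHEUS_URL"],  # treated as required in this sketch
        in_cluster=_as_bool(env.get("KUBERNETES_USE_IN_CLUSTER_CONFIG", "false")),
        target_namespaces=_split_csv(env.get("TARGET_NAMESPACE", "")),
        excluded_namespaces=_split_csv(env.get("EXCLUDED_NAMESPACES", "")),
        hours=int(env.get("HOURS", "168")),
        buffer_percent=float(env.get("BUFFER_PERCENT", "20")),
        output_format=output_format,
    )
```

Calling `load_settings(os.environ)` after `import os` would read the live environment; passing a plain dict keeps the function easy to test.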
The tool prints color-coded terminal output that compares current and recommended resources.
kube-right-sizer/
├── main.py                 # Main application entry point
├── config.py               # Configuration management (Pydantic)
├── services/
│   ├── kubernetes.py       # Kubernetes API interactions
│   ├── prometheus.py       # Prometheus metrics queries
│   └── slack.py            # Slack notification integration
├── helm/
│   └── resource-right-sizing/
│       ├── Chart.yaml
│       ├── production/
│       │   └── values.yaml
│       └── templates/
│           ├── cron_job.yaml
│           ├── rbac.yaml
│           └── secrets.yaml
├── Dockerfile
├── requirements.txt
└── env.sample


