A personal learning hub for machine learning, data science, and Kaggle competitions.
Welcome to Kaggle Knowledge, a curated collection of notebooks, experiments, and learning materials I develop while improving my machine learning skills through Kaggle competitions and hands-on projects.
This repository serves three main purposes:
First, it is a structured space where I practice:

- Data preprocessing & cleaning
- Feature engineering
- Model building (RandomForest, XGBoost, CatBoost, etc.)
- Cross-validation & hyperparameter tuning
- Kaggle submission pipelines (a minimal end-to-end sketch follows this list)
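The sketch below shows what such a pipeline looks like end to end. It assumes a typical tabular competition with `train.csv`/`test.csv` files, a numeric `target` column, and an `id` column; those names are placeholders, not the files of any specific competition in this repo.

```python
# Minimal end-to-end sketch: load -> preprocess -> cross-validate -> submit.
# File names and the "target"/"id" columns are placeholders for a typical
# Kaggle tabular competition; adjust them per competition.
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

train = pd.read_csv("train.csv")
test = pd.read_csv("test.csv")

# Basic preprocessing: keep numeric columns, fill gaps with the median.
X = train.drop(columns=["target"]).select_dtypes("number")
X = X.fillna(X.median())
y = train["target"]

model = RandomForestClassifier(n_estimators=300, random_state=42)

# 5-fold cross-validation as a sanity check before submitting.
scores = cross_val_score(model, X, y, cv=5, scoring="accuracy")
print(f"CV accuracy: {scores.mean():.4f} +/- {scores.std():.4f}")

# Fit on the full training set and write a submission file.
model.fit(X, y)
test_X = test[X.columns].fillna(X.median())
pd.DataFrame({"id": test["id"], "target": model.predict(test_X)}).to_csv(
    "submission.csv", index=False
)
```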
Second, each notebook explores:

- Different modeling strategies
- Alternative feature engineering ideas
- Error analysis & model interpretation (see the interpretation sketch after this list)
- Methods to improve leaderboard scores
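As one concrete example of the interpretation work, here is a minimal sketch using scikit-learn's permutation importance on a held-out split. Synthetic data stands in for a real competition dataset so the snippet runs on its own.

```python
# Minimal interpretation sketch: permutation importance on a held-out
# split. Synthetic data replaces a real competition dataset so the
# snippet is self-contained.
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split

X_arr, y = make_classification(
    n_samples=500, n_features=8, n_informative=3, random_state=42
)
X = pd.DataFrame(X_arr, columns=[f"f{i}" for i in range(8)])

X_train, X_val, y_train, y_val = train_test_split(
    X, y, test_size=0.2, random_state=42
)
model = RandomForestClassifier(n_estimators=300, random_state=42).fit(
    X_train, y_train
)

# Shuffle each feature in turn and measure the drop in validation score;
# large drops mark features the model genuinely relies on.
result = permutation_importance(model, X_val, y_val, n_repeats=10, random_state=42)
ranked = sorted(zip(X_val.columns, result.importances_mean), key=lambda t: -t[1])
for name, importance in ranked:
    print(f"{name}: {importance:.4f}")
```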
Third, as I learn new techniques, I document:

- What worked
- What didn't
- Why certain models behave differently
- Key insights from competitions
This makes it easier to revisit and apply methods across future projects.
Each competition gets its own folder containing modeling notebooks (baseline → advanced).
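For illustration, the layout looks something like the following (the competition and notebook names here are hypothetical, not a listing of the actual contents):

```
kaggle-knowledge/
├── titanic/
│   ├── 01-baseline.ipynb
│   └── 02-random-forest.ipynb
└── house-prices/
    ├── 01-baseline.ipynb
    └── 02-gradient-boosting.ipynb
```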
This repository primarily uses:
- Python
- Pandas, NumPy, scikit-learn, XGBoost, CatBoost, LightGBM
- Matplotlib / Seaborn
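All of these are published on PyPI, so an environment for running the notebooks can typically be set up with a single install command (using the package names as published on PyPI):

```bash
pip install pandas numpy scikit-learn xgboost catboost lightgbm matplotlib seaborn
```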
By maintaining this repository, I aim to:
- Build a strong understanding of ML modeling workflows
- Master Kaggle competition techniques
- Develop reproducible machine learning pipelines
- Track improvement over time
- Prepare for real-world data science and ML engineering work