This is a supervised learning project that builds models to help telecommunications service providers predict each customer's probability of churning, based on labeled historical data.
- Preprocessed the dataset through data cleaning, categorical feature encoding, and standardization.
- Trained supervised learning models, including Logistic Regression, Random Forest, and K-Nearest Neighbors, and applied regularization with tuned hyperparameters to reduce overfitting.
- Evaluated classification performance with 5-fold cross-validation and analyzed feature importance to identify the factors that most influence the churn predictions.
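The workflow in the list can be illustrated with a minimal, dependency-free Python sketch. It is not the project's code, and several pieces are assumptions made for illustration. The dataset is synthetic and its columns (`tenure`, `monthly_charges`, `contract`) are hypothetical. Only L2-regularized logistic regression is shown, trained by plain batch gradient descent. The candidate regularization strengths and the use of accuracy as the cross-validation metric are also assumptions. Finally, absolute coefficients on standardized inputs serve as a rough importance proxy; the project may instead have used something like Random Forest importances.

```python
import math
import random

# Hypothetical categorical feature and its levels (not taken from the project's data).
CONTRACTS = ["month-to-month", "one-year", "two-year"]
FEATURES = ["tenure", "monthly_charges"] + [f"contract={c}" for c in CONTRACTS]


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def make_synthetic_churn(n=150, seed=42):
    """Generate made-up customers; churn is likelier with short tenure,
    high charges, and a month-to-month contract."""
    rng = random.Random(seed)
    rows, labels = [], []
    for _ in range(n):
        tenure, charges = rng.uniform(0, 72), rng.uniform(20, 120)
        contract = rng.choice(CONTRACTS)
        logit = -0.05 * tenure + 0.03 * charges + (1.5 if contract == "month-to-month" else -1.0)
        labels.append(1 if rng.random() < sigmoid(logit) else 0)
        rows.append([tenure, charges, contract])
    return rows, labels


def one_hot(rows, col, categories):
    # Replace the categorical value at index `col` with 0/1 indicator columns.
    return [r[:col] + [1.0 if r[col] == c else 0.0 for c in categories] + r[col + 1:] for r in rows]


def standardize(train, test):
    # Z-score every column using mean/std of the training rows only (avoids leakage).
    d = len(train[0])
    means = [sum(r[j] for r in train) / len(train) for j in range(d)]
    stds = [math.sqrt(sum((r[j] - means[j]) ** 2 for r in train) / len(train)) or 1.0 for j in range(d)]

    def scale(rows):
        return [[(r[j] - means[j]) / stds[j] for j in range(d)] for r in rows]

    return scale(train), scale(test)


def predict_proba(w, b, x):
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)


def fit_logreg(X, y, lam, lr=0.1, epochs=200):
    # Batch gradient descent on mean log-loss + (lam / 2) * ||w||^2; the bias is not penalized.
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        gw, gb = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            err = predict_proba(w, b, xi) - yi
            for j in range(d):
                gw[j] += err * xi[j]
            gb += err
        w = [wj - lr * (g / n + lam * wj) for wj, g in zip(w, gw)]
        b -= lr * gb / n
    return w, b


def cv_accuracy(X, y, lam, k=5, seed=0):
    # Mean accuracy over k shuffled folds; scaling is refit inside each fold.
    idx = list(range(len(X)))
    random.Random(seed).shuffle(idx)
    scores = []
    for test_idx in (idx[i::k] for i in range(k)):
        held = set(test_idx)
        train_idx = [i for i in idx if i not in held]
        Xtr, Xte = standardize([X[i] for i in train_idx], [X[i] for i in test_idx])
        w, b = fit_logreg(Xtr, [y[i] for i in train_idx], lam)
        correct = sum((predict_proba(w, b, x) >= 0.5) == bool(y[i]) for x, i in zip(Xte, test_idx))
        scores.append(correct / len(test_idx))
    return sum(scores) / k


raw, y = make_synthetic_churn()
X = one_hot(raw, 2, CONTRACTS)

# Pick the L2 strength with the best 5-fold CV accuracy (the candidate grid is an assumption).
scores = {lam: cv_accuracy(X, y, lam) for lam in [0.001, 0.01, 0.1, 1.0]}
best_lam = max(scores, key=scores.get)

# Refit on all rows; on standardized inputs, |coefficient| serves as a rough importance score.
Xs, _ = standardize(X, X)
w, b = fit_logreg(Xs, y, best_lam)
ranking = sorted(zip(FEATURES, w), key=lambda p: abs(p[1]), reverse=True)
print(f"best lambda: {best_lam}")
for name, coef in ranking:
    print(f"{name:26s} {coef:+.3f}")
```

The scaler is refit inside each fold so that held-out rows never influence the standardization statistics. In practice, scikit-learn's `OneHotEncoder`, `StandardScaler`, `LogisticRegression` (whose `C` parameter is the inverse of the regularization strength) and `GridSearchCV` with `cv=5` cover the same steps with far less code.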