A machine learning project that uses logistic regression to classify apples as good or bad from numerical quality features. The data is cleaned and scaled, logistic regression models are trained with and without L2 regularization, and each model is evaluated with standard classification metrics and an ROC curve.
- Data Preprocessing: Cleans missing values and applies feature scaling.
- Logistic Regression: Implements classification with and without regularization.
- Model Comparison: Evaluates multiple models with different regularization strengths.
- Performance Metrics: Calculates accuracy, precision, recall, F1-score, and ROC-AUC.
- ROC Curve Visualization: Plots the ROC curve of the best model.
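
The metrics named in the list above all derive from the four confusion-matrix counts. The sketch below shows the standard formulas in plain Python. The function name `classification_metrics` and the counts are made up for illustration and are not taken from the project's code, which relies on `sklearn.metrics`.

```python
def classification_metrics(tp, fp, fn, tn):
    """Return (accuracy, precision, recall, f1) from confusion-matrix counts.

    Assumes at least one predicted positive and one actual positive,
    so none of the denominators is zero.
    """
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    precision = tp / (tp + fp)   # share of predicted "good" that really is good
    recall = tp / (tp + fn)      # share of actual "good" that was found
    f1 = 2 * precision * recall / (precision + recall)
    return accuracy, precision, recall, f1


# Made-up counts for 100 apples, with "good" as the positive class.
acc, prec, rec, f1 = classification_metrics(tp=40, fp=10, fn=5, tn=45)
print(f"accuracy={acc:.2f} precision={prec:.2f} recall={rec:.2f} f1={f1:.2f}")
```

ROC-AUC is not included because it depends on the ranking of predicted scores rather than on a single set of counts.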

The dataset is publicly available: Apple Quality Dataset.
- Various numerical attributes related to apple quality.
- Target variable: `Quality` (good = 1, bad = 0).
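
A minimal sketch of the preprocessing described above, run on a tiny in-memory table instead of the real file. The column names `size`, `sweetness` and `quality` are hypothetical, and dropping rows with missing values is an assumption: the project may impute them instead.

```python
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Tiny stand-in for the real dataset; column names are hypothetical.
df = pd.DataFrame({
    "size": [1.2, -0.5, None, 0.3, 2.1, -1.4, 0.8, -0.2],
    "sweetness": [0.4, 1.1, -0.7, None, 0.9, -1.2, 0.1, 0.6],
    "quality": ["good", "bad", "good", "bad", "good", "bad", "good", "bad"],
})

# Assumption: rows with missing values are dropped rather than imputed.
df = df.dropna().reset_index(drop=True)

# Encode the target as good = 1, bad = 0.
y = df["quality"].map({"good": 1, "bad": 0})
X = df.drop(columns="quality")

# Stratify so both classes appear in the train and test splits.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.33, stratify=y, random_state=42
)

# Fit the scaler on the training split only, then reuse it on the test split.
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)
```

Fitting the scaler on the training split alone keeps test-set statistics out of the model, which matters when the reported metrics are meant to reflect unseen data.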

| Library | Purpose |
|---|---|
| `pandas` | Data processing & cleaning |
| `sklearn.model_selection` | Train-test splitting |
| `sklearn.preprocessing` | Feature scaling |
| `sklearn.linear_model` | Logistic Regression models |
| `sklearn.metrics` | Performance evaluation |
| `matplotlib.pyplot` | Visualization (ROC Curve) |
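
The sketch below shows one way the scikit-learn modules in the table could fit together to compare regularization strengths. It uses synthetic data from `make_classification` rather than the apple dataset, so its numbers will not match the project's results. In scikit-learn, `C` is the inverse of the regularization strength, so a smaller `C` means a stronger L2 penalty. Using a very large `C` to stand in for "no regularization" is an assumption made here for portability across scikit-learn versions, not necessarily what `main.py` does.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import (
    accuracy_score,
    f1_score,
    precision_score,
    recall_score,
    roc_auc_score,
)
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the apple data (assumption: 7 numeric features).
X, y = make_classification(
    n_samples=1000, n_features=7, n_informative=5,
    n_clusters_per_class=1, random_state=42,
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)

scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)

# LogisticRegression applies an L2 penalty by default; smaller C = stronger penalty.
# C=1e10 makes the penalty negligible and approximates an unregularized model.
models = {
    "No Regularization": LogisticRegression(C=1e10, max_iter=1000),
    "L2 (C=0.1)": LogisticRegression(C=0.1, max_iter=1000),
    "L2 (C=1)": LogisticRegression(C=1.0, max_iter=1000),
    "L2 (C=10)": LogisticRegression(C=10.0, max_iter=1000),
}

results = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    y_pred = model.predict(X_test)                # hard 0/1 labels
    y_score = model.predict_proba(X_test)[:, 1]   # probability of class 1
    results[name] = {
        "accuracy": accuracy_score(y_test, y_pred),
        "precision": precision_score(y_test, y_pred),
        "recall": recall_score(y_test, y_pred),
        "f1": f1_score(y_test, y_pred),
        "roc_auc": roc_auc_score(y_test, y_score),
    }
    print(name, {k: round(v, 3) for k, v in results[name].items()})
```

ROC-AUC is computed from predicted probabilities rather than hard labels, since it measures how well the model ranks positive examples above negative ones.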

1️⃣ Clone the repository:

    git clone https://github.com/your-username/apple-quality-classification.git
    cd apple-quality-classification

2️⃣ Install dependencies:

    pip install pandas scikit-learn matplotlib

3️⃣ Run the Python script:

    python main.py

| Model Type | Accuracy | Precision | Recall | F1-score | ROC-AUC |
|---|---|---|---|---|---|
| No Regularization | 0.85 | 0.87 | 0.83 | 0.85 | 0.86 |
| L2 (C=0.1) | 0.88 | 0.89 | 0.86 | 0.87 | 0.89 |
| L2 (C=1) | 0.90 | 0.91 | 0.88 | 0.89 | 0.91 |
| L2 (C=10) | 0.89 | 0.90 | 0.87 | 0.88 | 0.90 |

Best Model: L2 regularization with C=1, which scores highest on every reported metric (accuracy 0.90, precision 0.91, recall 0.88, F1-score 0.89, ROC-AUC 0.91).
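
A minimal sketch of how an ROC curve for the selected model (L2, C=1) might be plotted. It again uses synthetic data, so the curve will differ from the project's. The output file name `roc_curve.png` is hypothetical, and the script may display the figure with `plt.show()` instead of saving it.

```python
import matplotlib.pyplot as plt
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score, roc_curve
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the apple data (assumption: 7 numeric features).
X, y = make_classification(
    n_samples=1000, n_features=7, n_informative=5,
    n_clusters_per_class=1, random_state=42,
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)
scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)

# L2-regularized model with C=1, the configuration reported as best.
model = LogisticRegression(C=1.0, max_iter=1000).fit(X_train, y_train)
y_score = model.predict_proba(X_test)[:, 1]

# roc_curve sweeps the decision threshold and returns one point per threshold.
fpr, tpr, _ = roc_curve(y_test, y_score)
auc = roc_auc_score(y_test, y_score)

fig, ax = plt.subplots()
ax.plot(fpr, tpr, label=f"L2 (C=1), AUC = {auc:.2f}")
ax.plot([0, 1], [0, 1], linestyle="--", label="Chance")
ax.set_xlabel("False positive rate")
ax.set_ylabel("True positive rate")
ax.set_title("ROC curve")
ax.legend(loc="lower right")
fig.savefig("roc_curve.png")  # hypothetical output path
```

The dashed diagonal marks a classifier that guesses at random, and the further the curve bows toward the top-left corner, the better the model separates good apples from bad ones.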
