
DanEinstein/DiabetesML


Diabetes Risk Prediction Tool

Overview

This application provides an interactive interface for predicting the likelihood of diabetes based on key patient biometric data. It is built using Streamlit for the frontend and a pre-trained scikit-learn machine learning model for the prediction logic.

The tool is a proof of concept focused on the predictive-modeling side of public health and wellness.

Current Model Status

Model Type: Logistic Regression

Training Data: PIMA Indians Diabetes Dataset

Current Accuracy: Approximately 74%

Key Features: The model requires 8 input features, which are automatically scaled using a saved StandardScaler object before prediction.
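As a rough illustration of what the saved scaler and model do at prediction time, the sketch below applies the StandardScaler formula, z = (x - mean) / std, to each feature and then the binary logistic regression formula, p = 1 / (1 + e^-(w·z + b)). It is a minimal pure-Python sketch under stated assumptions, not the app's code: the feature order follows the column names of the commonly distributed PIMA CSV, and every mean, standard deviation, coefficient, and the intercept are hypothetical placeholders rather than values taken from scaler.pkl or diabetes_model.pkl.

```python
import math

# Column order of the commonly distributed PIMA CSV (assumed to match the app's input order).
FEATURES = [
    "Pregnancies", "Glucose", "BloodPressure", "SkinThickness",
    "Insulin", "BMI", "DiabetesPedigreeFunction", "Age",
]

# Hypothetical placeholder parameters; the real values live in scaler.pkl and diabetes_model.pkl.
MEANS = [3.8, 120.9, 69.1, 20.5, 79.8, 32.0, 0.47, 33.2]
STDS = [3.4, 32.0, 19.4, 16.0, 115.2, 7.9, 0.33, 11.8]
COEFS = [0.4, 1.1, -0.2, 0.0, -0.1, 0.7, 0.3, 0.2]
INTERCEPT = -0.9


def standardize(values):
    """Apply the StandardScaler formula z = (x - mean) / std to each feature."""
    return [(x - m) / s for x, m, s in zip(values, MEANS, STDS)]


def risk_probability(values):
    """Return P(diabetes) = sigmoid(w . z + b) for one patient's 8 raw features."""
    if len(values) != len(FEATURES):
        raise ValueError(f"expected {len(FEATURES)} features, got {len(values)}")
    z = standardize(values)
    logit = sum(w * zi for w, zi in zip(COEFS, z)) + INTERCEPT
    return 1.0 / (1.0 + math.exp(-logit))


patient = [2, 148, 72, 35, 0, 33.6, 0.627, 50]
print(f"Risk score: {risk_probability(patient):.1%}")
```

With the real assets loaded, the equivalent scikit-learn calls would be scaler.transform(X) followed by model.predict_proba(X_scaled)[:, 1], where X is a 2-D array with one row per patient in the same column order used during training. Skipping the scaling step would feed the model raw values on a scale it was never trained on.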

Prerequisites

To run this application, you must have the following installed on your system:

Git (for cloning the repository)

Python (3.8+)

pip (Python package installer)

Setup and Installation

  1. Clone the Repository

First, clone the project files from GitHub to your local machine:

git clone https://github.com/DanEinstein/DiabetesML.git
cd DiabetesML

  2. Install Dependencies

Once in the project directory, install the core libraries used in the project (Streamlit, pandas, NumPy, scikit-learn, and joblib):

pip install streamlit pandas numpy scikit-learn joblib

  3. Required Model Assets

For the application to function, the following two files must be in the same directory as display.py. They are generated by a separate model training script and should be committed to the Git repository so the app can be deployed immediately.

diabetes_model.pkl (The saved Logistic Regression model)

scaler.pkl (The saved StandardScaler object, critical for preprocessing user inputs)
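Because the app cannot run without both files, a small startup check can report which assets are missing before anything is loaded. This is a hypothetical helper (missing_assets is not part of the repository), and it only checks that the files exist, not that they contain a valid model or scaler.

```python
from pathlib import Path

# The two files that must sit next to display.py.
REQUIRED_ASSETS = ("diabetes_model.pkl", "scaler.pkl")


def missing_assets(app_dir):
    """Return the names of required model assets that are absent from app_dir."""
    base = Path(app_dir)
    return [name for name in REQUIRED_ASSETS if not (base / name).is_file()]


if __name__ == "__main__":
    # Assumes the check is run from the directory that contains display.py.
    missing = missing_assets(Path.cwd())
    if missing:
        print("Missing model assets:", ", ".join(missing))
    else:
        print("All model assets found.")
```

In display.py, a non-empty result could be reported with st.error(...) followed by st.stop(), so the page shows a clear message instead of a traceback.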

How to Run the Application

Navigate to the project directory in your terminal and use the Streamlit execution command:

streamlit run display.py

Streamlit will automatically open the application in your default web browser (usually at http://localhost:8501).

Future Expansion and Roadmap

The following key features are planned to transform this application into a continuous learning system:

  1. Data Collection and Feedback Loop

The application will be enhanced to actively collect user-submitted data.

Persistent Storage: Log every set of user input features, together with the resulting model prediction (risk score), to persistent storage (e.g., a CSV file, MongoDB, or Firestore).

Truth Label Mechanism: Add a mechanism (e.g., a simple form) to collect each patient's actual outcome over time, creating a feedback loop for model improvement.
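Both items can be sketched with Python's standard csv module: each prediction is appended to a CSV log with a generated record ID and an empty outcome column, and the true outcome is written back later under that ID. The file layout, the field names, and the helpers log_prediction and record_outcome are all hypothetical; in a multi-user deployment, a database such as MongoDB or Firestore would replace the file I/O.

```python
import csv
import uuid
from datetime import datetime, timezone
from pathlib import Path

FEATURES = [
    "Pregnancies", "Glucose", "BloodPressure", "SkinThickness",
    "Insulin", "BMI", "DiabetesPedigreeFunction", "Age",
]
# Hypothetical log layout: one row per prediction, outcome left blank until known.
LOG_FIELDS = ["record_id", "timestamp", *FEATURES, "risk_score", "outcome"]


def log_prediction(log_path, features, risk_score):
    """Append one prediction to the CSV log and return its record ID."""
    if len(features) != len(FEATURES):
        raise ValueError(f"expected {len(FEATURES)} features, got {len(features)}")
    path = Path(log_path)
    is_new = not path.exists()
    record_id = uuid.uuid4().hex
    row = {
        "record_id": record_id,
        "timestamp": datetime.now(timezone.utc).isoformat(),
        **dict(zip(FEATURES, features)),
        "risk_score": risk_score,
        "outcome": "",  # filled in later by record_outcome
    }
    with path.open("a", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=LOG_FIELDS)
        if is_new:
            writer.writeheader()
        writer.writerow(row)
    return record_id


def record_outcome(log_path, record_id, outcome):
    """Store the confirmed outcome (0 or 1) for a logged prediction; return True if found."""
    path = Path(log_path)
    with path.open(newline="") as f:
        rows = list(csv.DictReader(f))
    found = False
    for row in rows:
        if row["record_id"] == record_id:
            row["outcome"] = str(outcome)
            found = True
    # Rewrites the whole file: simple, but not safe for concurrent writers.
    with path.open("w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=LOG_FIELDS)
        writer.writeheader()
        writer.writerows(rows)
    return found
```

Keeping the record ID separate from any personal details lets the outcome form refer back to a prediction without storing who made it.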

  2. Model Retraining and Improvement

The training pipeline will be automated and fed with new data collected from the live app.

Dataset Integration: Modify the training script to automatically load and merge the original PIMA data with the new collected data.

Scheduled Retraining: Establish a process to periodically re-run the training script on the larger, combined dataset, with the goal of raising accuracy above the current approximately 74%.
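A sketch of the merge step and of a simple promotion gate, assuming both datasets are available as lists of dicts (for example, read with csv.DictReader). The names merge_training_rows and should_promote are hypothetical, the label column is assumed to be Outcome as in the commonly distributed PIMA CSV, and collected rows are assumed to carry a lowercase outcome field that stays blank until a true label arrives. Rows without a confirmed outcome are skipped because they cannot serve as training labels.

```python
FEATURES = [
    "Pregnancies", "Glucose", "BloodPressure", "SkinThickness",
    "Insulin", "BMI", "DiabetesPedigreeFunction", "Age",
]
TARGET = "Outcome"  # label column in the commonly distributed PIMA CSV (assumed)


def _features(row):
    """Convert one row's feature values (possibly strings from a CSV) to floats."""
    return {name: float(row[name]) for name in FEATURES}


def merge_training_rows(pima_rows, collected_rows):
    """Combine original PIMA rows with app-collected rows that have a confirmed outcome."""
    merged = [{**_features(r), TARGET: int(r[TARGET])} for r in pima_rows]
    for r in collected_rows:
        outcome = str(r.get("outcome", "")).strip()
        if not outcome:
            continue  # no true label yet, so the row cannot be used for training
        merged.append({**_features(r), TARGET: int(outcome)})
    return merged


def should_promote(new_accuracy, current_accuracy=0.74, min_gain=0.0):
    """Return True only if a retrained model beats the current accuracy by more than min_gain."""
    return new_accuracy > current_accuracy + min_gain


if __name__ == "__main__":
    pima = [{**{name: "1" for name in FEATURES}, "Outcome": "1"}]
    collected = [
        {**{name: "2" for name in FEATURES}, "outcome": "0"},
        {**{name: "3" for name in FEATURES}, "outcome": ""},  # still awaiting a label
    ]
    print(len(merge_training_rows(pima, collected)))  # 2
    print(should_promote(0.76))  # True
```

The 0.74 default mirrors the current accuracy quoted above. For the comparison to mean anything, the old and new models should be scored on the same held-out test data.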

  3. Deployment and Versioning

To maintain stability during updates, a robust deployment strategy is needed.

Model Versioning: Adopt a naming convention (e.g., diabetes_model_v1.pkl, diabetes_model_v2.pkl) to track different model versions, making it easy to roll back to a stable version if a newly trained model performs poorly.

A/B Testing: Future plans may involve integrating model serving capabilities to allow for A/B testing of new model versions against the current live model.
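Following the naming convention above, the sketch below finds versioned model files, picks the newest one, and proposes the next file name; rolling back then amounts to loading an earlier path from the list. Versions are compared as integers so that v10 sorts after v2. It also adds a deterministic, hash-based traffic split as one possible way to run an A/B test. All function names, the 10% default candidate share, and the use of a user ID for bucketing are assumptions, not part of the current app.

```python
import hashlib
import re
from pathlib import Path

# Matches the naming convention diabetes_model_v1.pkl, diabetes_model_v2.pkl, ...
VERSION_PATTERN = re.compile(r"diabetes_model_v(\d+)\.pkl")


def model_versions(model_dir):
    """Return (version, path) pairs sorted numerically, so v10 comes after v2."""
    found = []
    for path in Path(model_dir).iterdir():
        match = VERSION_PATTERN.fullmatch(path.name)
        if match:
            found.append((int(match.group(1)), path))
    return sorted(found, key=lambda item: item[0])


def latest_model(model_dir):
    """Return the path of the highest-numbered model, or None if there is none."""
    versions = model_versions(model_dir)
    return versions[-1][1] if versions else None


def next_version_name(model_dir):
    """Return the file name a newly trained model should be saved under."""
    versions = model_versions(model_dir)
    next_number = versions[-1][0] + 1 if versions else 1
    return f"diabetes_model_v{next_number}.pkl"


def ab_bucket(user_id, candidate_share=0.1):
    """Assign a user to 'candidate' or 'current' deterministically from a hash of their ID."""
    digest = hashlib.sha256(user_id.encode("utf-8")).digest()
    # 48 bits fit exactly in a float, so this value is always in [0, 1).
    fraction = int.from_bytes(digest[:6], "big") / 2**48
    return "candidate" if fraction < candidate_share else "current"
```

Hashing the ID, rather than picking at random per request, keeps each user on the same model version across visits, which makes the two groups' results easier to compare.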

Application Usage

Input Data: Use the input fields (organized into three columns) to enter the 8 required patient metrics (e.g., Pregnancies, Glucose Concentration, BMI).

Analyze Risk: Click the Analyze Risk button.

View Result: The app will display the prediction, including:

A clear diagnosis (High Risk Detected or Low Risk Detected).

The model's calculated risk score (probability) as a percentage.

A disclaimer reminding users that the tool is not a substitute for medical advice.

(Screenshots of the Diabetes App)
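The result step can be sketched as a small formatting helper. It assumes a 0.5 probability cutoff, which matches what scikit-learn's LogisticRegression.predict does for a two-class model (it predicts the positive class when the predicted probability exceeds 0.5); the app's actual threshold is not stated here, and the function name format_result is hypothetical.

```python
def format_result(probability, threshold=0.5):
    """Return the (label, percentage) pair shown to the user for one prediction."""
    if not 0.0 <= probability <= 1.0:
        raise ValueError("probability must be between 0 and 1")
    # Assumed cutoff: strictly above 0.5 mirrors LogisticRegression.predict for two classes.
    label = "High Risk Detected" if probability > threshold else "Low Risk Detected"
    return label, f"{probability:.1%}"


print(format_result(0.73))  # ('High Risk Detected', '73.0%')
print(format_result(0.18))  # ('Low Risk Detected', '18.0%')
```

In display.py, the label could then be shown with st.error for high risk and st.success for low risk, followed by the percentage and the disclaimer.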
