This project is based on the Behavioral Risk Factor Surveillance System (BRFSS), an annual survey of 350,000 people used to assess emerging trends in public health and classify health-related risk behaviors. For this project, I sampled 20,000 cases and 9 variables. The main objective was to identify which habits and behaviors pose the greatest risk to people's health and how they are distributed across age and gender.
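The sampling step can be sketched roughly as follows. This is a minimal illustration, not the notebook's actual code: the DataFrame is a synthetic stand-in for the full survey, and the column names (`age`, `sex`, `smoker`, `extra_col`) and the seed are assumptions chosen for the example. Only three variables are kept here for brevity, where the project keeps nine.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for the full survey table.
# All column names below are hypothetical, not the real BRFSS variable names.
rng = np.random.default_rng(0)
n_full = 350_000
full = pd.DataFrame({
    "age": rng.integers(18, 90, size=n_full),
    "sex": rng.choice(["Female", "Male"], size=n_full),
    "smoker": rng.choice([0, 1], size=n_full),
    "extra_col": rng.normal(size=n_full),  # a variable we do not keep
})

# Keep only the variables of interest, then draw a reproducible random sample.
keep = ["age", "sex", "smoker"]
sample = full[keep].sample(n=20_000, random_state=42).reset_index(drop=True)

print(sample.shape)  # (20000, 3)
```

Fixing `random_state` makes the sample reproducible, so the same 20,000 cases are drawn each time the notebook is rerun.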
The data was obtained from the BRFSS website. BRFSS is a program of the Centers for Disease Control and Prevention (CDC).
This project was implemented using Python and Jupyter Notebooks. The Python libraries used include Pandas, Seaborn, Matplotlib, and NumPy.
The project involved data cleaning, data manipulation, and exploratory data analysis.
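As a rough sketch of what these three stages can look like in Pandas, the example below uses a tiny made-up table. The column names, age bins, and values are assumptions for illustration only and are not taken from the project's data.

```python
import numpy as np
import pandas as pd

# Tiny made-up table; columns and values are hypothetical.
raw = pd.DataFrame({
    "age": [23, 35, 35, 51, 67, np.nan, 44],
    "sex": ["Female", "male", "male", "Female", "Male", "Female", None],
    "smoker": [1, 0, 0, 1, 0, 1, 0],
})

# Data cleaning: drop exact duplicates and rows missing key fields,
# then normalize inconsistent capitalization in the category labels.
clean = (
    raw.drop_duplicates()
       .dropna(subset=["age", "sex"])
       .assign(sex=lambda d: d["sex"].str.capitalize())
)

# Data manipulation: derive an age-group column from the numeric age.
clean["age_group"] = pd.cut(
    clean["age"],
    bins=[17, 34, 54, 120],
    labels=["18-34", "35-54", "55+"],
)

# Exploratory analysis: share of smokers within each age group and sex.
rates = clean.groupby(["age_group", "sex"], observed=True)["smoker"].mean()
print(rates)
```

Grouping by both `age_group` and `sex` gives one rate per combination, which is the kind of breakdown needed to compare a behavior across age and gender.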
The analysis identified key habits and behaviors that pose significant risks to people's health and provided insights into how these risks vary by age and gender.
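One way to view how a behavior varies by age and gender is a grouped bar chart. The sketch below uses Seaborn on a small made-up table; the column names and values are hypothetical and do not reproduce the project's results. The `matplotlib.use("Agg")` line is only there so the script runs without a display and is not needed inside a Jupyter notebook.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; unnecessary in a notebook
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Made-up data: one row per respondent, columns are hypothetical.
df = pd.DataFrame({
    "age_group": ["18-34", "18-34", "35-54", "35-54", "55+", "55+"] * 2,
    "sex": ["Female", "Male"] * 6,
    "smoker": [0, 1, 1, 0, 0, 1, 1, 1, 0, 0, 1, 0],
})

# Bar height is the mean of the 0/1 indicator, i.e. the share of smokers,
# shown per age group with one bar per sex.
fig, ax = plt.subplots(figsize=(6, 4))
sns.barplot(data=df, x="age_group", y="smoker", hue="sex", ax=ax)
ax.set_xlabel("Age group")
ax.set_ylabel("Share reporting smoking")
ax.set_title("Illustrative data only")
fig.tight_layout()
```

The same pattern applies to any binary risk indicator: swap the `y` column and the chart shows that behavior's prevalence across the same age and gender groups.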
Through this project, I learned how to visualize multi-variable relationships and perform exploratory data analysis.
In the future, I plan to enhance this project by incorporating machine learning algorithms.
This project is based on OpenIntro Statistics (4th Ed.).