thikyi/DataScienceCapstoneTwo

Los Angeles County Crime Rate Prediction

Context

Los Angeles has a crime rate of 2,759 per 100,000 people, which is higher than the national average of 2,580 per 100,000 people. I developed a machine learning model that integrates demographics and historical crime reports to predict the likelihood of specific crimes occurring in specific areas within Los Angeles. A data-driven crime prediction model like this can help policymakers and communities implement targeted security measures and make informed decisions.

Project Approach

Data Acquisition

The data was acquired from two datasets on the Los Angeles open data portal: Crime Data from 2010 to 2019 (https://data.lacity.org/Public-Safety/Crime-Data-from-2010-to-2019/63jg-8b9z/explore) and Crime Data from 2020 to Present (https://data.lacity.org/Public-Safety/Crime-Data-from-2020-to-Present/2nrs-mtv8).
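Combining the two exports into one table could look roughly like the minimal pandas sketch below; it is not the project's Step1 code. It assumes both datasets were downloaded as CSV files whose columns match once header whitespace is stripped, and it uses `DR_NO` (assumed to be the record identifier), `AREA`, and `Crm Cd` as column names; check these assumptions against the actual exports. The de-duplication and missing-value rules are hypothetical placeholders, and the in-memory strings stand in for the downloaded files.

```python
import io

import pandas as pd


def combine_crime_exports(old_csv, new_csv):
    """Concatenate two crime-data CSV exports into one DataFrame.

    Assumption: both exports share the same columns once whitespace is
    stripped from the header names.
    """
    frames = []
    for source in (old_csv, new_csv):
        df = pd.read_csv(source)
        # Normalize header names, e.g. "AREA " -> "AREA"
        df.columns = df.columns.str.strip()
        frames.append(df)
    combined = pd.concat(frames, ignore_index=True)
    # Hypothetical rule: drop records present in both exports, keyed on DR_NO
    combined = combined.drop_duplicates(subset="DR_NO")
    # Hypothetical rule: drop rows missing fields the model relies on
    combined = combined.dropna(subset=["AREA", "Crm Cd"])
    return combined.reset_index(drop=True)


# Tiny in-memory stand-ins for the two downloaded CSV exports
old_export = io.StringIO("DR_NO,AREA ,Crm Cd\n1,1,510\n2,2,624\n")
new_export = io.StringIO("DR_NO,AREA,Crm Cd\n2,2,624\n3,3,\n")
crimes = combine_crime_exports(old_export, new_export)
print(crimes)
```

Stripping header whitespace before concatenating matters because `pd.concat` aligns on exact column names, so `"AREA "` and `"AREA"` would otherwise become two half-empty columns.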

Solution Overview

  1. Step1_DataWrangling.ipynb: Combined the two data sources above, cleaned the data, and handled missing values.
  2. Step2_Exploratory_data_analysis.ipynb: Explored the dataset statistically to understand its characteristics, patterns, and potential issues, and created relevant features that capture the characteristics of crime areas.
  3. Step3_Preprocessing_and_training.ipynb: Prepared the data for model training and performed feature engineering on both categorical and numerical data using StandardScaler and one-hot encoding, generating dummy-variable datasets.
  4. Step4_Modeling.ipynb: Trained different types of regression models to predict crime rate trends from the influencing factors (crime type, area, and victim characteristics). Linear Regression, Random Forest Regressor, Gradient Boosting Regressor, and Decision Tree models were explored, and their performance was assessed using MAE, MSE, and RMSE. The Decision Tree was chosen as the best model, and the model file is saved in the models directory.
  5. ProjectReport was prepared.
  6. A slide presentation was prepared.
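The preprocessing and model comparison in steps 3 and 4 can be sketched with scikit-learn as follows. This is an illustration, not the project's code: the data is synthetic, the column names (`area`, `crime_type`, `victim_age`, `crime_count`) are hypothetical, and `OneHotEncoder` inside a `ColumnTransformer` is used in place of generating separate dummy datasets. RMSE is computed as the square root of MSE.

```python
import numpy as np
import pandas as pd
from sklearn.base import clone
from sklearn.compose import ColumnTransformer
from sklearn.ensemble import GradientBoostingRegressor, RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error, mean_squared_error
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler
from sklearn.tree import DecisionTreeRegressor

# Synthetic stand-in data; column names and the target are hypothetical
rng = np.random.default_rng(0)
n = 500
data = pd.DataFrame({
    "area": rng.choice(["Central", "Hollywood", "Newton"], size=n),
    "crime_type": rng.choice(["burglary", "assault", "theft"], size=n),
    "victim_age": rng.integers(18, 80, size=n),
})
# Hypothetical target: a noisy count that depends on area and crime type
base = data["area"].map({"Central": 30, "Hollywood": 20, "Newton": 25})
bump = data["crime_type"].map({"burglary": 5, "assault": 10, "theft": 15})
data["crime_count"] = base + bump + rng.normal(0, 2, size=n)

X = data[["area", "crime_type", "victim_age"]]
y = data["crime_count"]
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

# Scale the numeric column and one-hot encode the categorical columns
preprocess = ColumnTransformer(
    [
        ("num", StandardScaler(), ["victim_age"]),
        ("cat", OneHotEncoder(handle_unknown="ignore"), ["area", "crime_type"]),
    ],
    sparse_threshold=0,  # always return a dense array
)

models = {
    "Linear Regression": LinearRegression(),
    "Decision Tree": DecisionTreeRegressor(random_state=0),
    "Random Forest": RandomForestRegressor(n_estimators=100, random_state=0),
    "Gradient Boosting": GradientBoostingRegressor(random_state=0),
}

results = {}
for name, model in models.items():
    # Each pipeline gets its own copy of the (unfitted) preprocessing step
    pipe = Pipeline([("preprocess", clone(preprocess)), ("model", model)])
    pipe.fit(X_train, y_train)
    pred = pipe.predict(X_test)
    mse = mean_squared_error(y_test, pred)
    results[name] = {
        "MAE": mean_absolute_error(y_test, pred),
        "MSE": mse,
        "RMSE": np.sqrt(mse),  # RMSE is the square root of MSE
    }

for name, scores in results.items():
    print(
        f"{name}: MAE={scores['MAE']:.2f} "
        f"MSE={scores['MSE']:.2f} RMSE={scores['RMSE']:.2f}"
    )
```

Wrapping the encoder, scaler, and model in one `Pipeline` means the scaler is fit only on the training split, which avoids leaking test-set statistics. Scores on this synthetic data say nothing about which model performs best on the real crime data.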

Footer Note

Not all files could be uploaded to GitHub because of its file size limits. The full data files processed throughout the project can be found at https://drive.google.com/drive/folders/1Gyf_0yEHZs2v8h3dXyWcxj6AWsMRsXUs?usp=sharing

About

2nd Data Science Capstone Project as part of Springboard's assignments
