Los Angeles has a crime rate of 2,759 per 100,000 people, which is higher than the national average of 2,580 per 100,000 people. I developed a machine learning model that integrates demographic data and historical crime reports to predict the likelihood of specific crimes occurring in specific areas within Los Angeles. By providing a data-driven crime prediction model, this project aims to help policymakers and communities implement targeted security measures and make informed decisions.
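
Both figures are rates normalized to 100,000 residents. Below is a minimal sketch of that normalization and of how far apart the two quoted rates are. The helpers `rate_per_100k` and `relative_difference` are hypothetical illustrations, not code from the project notebooks.

```python
def rate_per_100k(incidents: int, population: int) -> float:
    """Convert a raw incident count into a rate per 100,000 residents."""
    if population <= 0:
        raise ValueError("population must be positive")
    # Multiply first so integer inputs stay exact before the division.
    return incidents * 100_000 / population


def relative_difference(value: float, baseline: float) -> float:
    """Fractional difference of `value` relative to `baseline`."""
    return (value - baseline) / baseline


# Rates quoted in this README (per 100,000 people).
la_rate = 2759
national_rate = 2580

diff = relative_difference(la_rate, national_rate)
print(f"Los Angeles is {diff:.1%} above the national average")
# prints: Los Angeles is 6.9% above the national average
```
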

The data was acquired from the City of Los Angeles open data portal: Crime Data from 2010 to 2019 (https://data.lacity.org/Public-Safety/Crime-Data-from-2010-to-2019/63jg-8b9z/explore) and Crime Data from 2020 to Present (https://data.lacity.org/Public-Safety/Crime-Data-from-2020-to-Present/2nrs-mtv8).
- Step1_DataWrangling.ipynb: Combined the two data sources above, cleaned the data, and handled missing values.
- Step2_Exploratory_data_analysis.ipynb: Statistically explored the dataset to understand its characteristics, patterns, and potential issues, and created relevant features that capture the characteristics of crime areas.
- Step3_Preprocessing_and_training.ipynb: Prepared the data for model training by engineering features for both categorical and numerical data: numerical features were scaled with StandardScaler, categorical features were one-hot encoded, and dummy-variable datasets were generated.
- Step4_Modeling.ipynb: Trained different types of regression models to predict crime rate trends based on the influencing factors (crime type, area, and victim characteristics). Linear Regression, Random Forest Regressor, Gradient Boosting Regressor, and Decision Tree models were explored, and their performance was assessed using MAE, MSE, and RMSE. The Decision Tree was chosen as the best model, and the model file is saved in the models folder.
- ProjectReport: the written report for the project.
- Slides Presentation: the presentation slides for the project.
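
The Step 1 wrangling can be sketched with pandas as below. The column names are assumed to mirror fields in the LA crime exports (verify them against the downloaded files), and the specific cleaning rules (dropping duplicate report numbers, treating non-positive victim ages as missing and imputing the median, filling unknown victim sex with "X") are illustrative assumptions, not necessarily what Step1_DataWrangling.ipynb does.

```python
import pandas as pd

# Assumed column subset; the real exports contain many more columns.
COLUMNS = ["DR_NO", "DATE OCC", "AREA NAME", "Crm Cd Desc", "Vict Age", "Vict Sex"]


def combine_and_clean(older: pd.DataFrame, newer: pd.DataFrame) -> pd.DataFrame:
    """Stack the two extracts, drop duplicate reports, and handle missing values."""
    df = pd.concat([older[COLUMNS], newer[COLUMNS]], ignore_index=True)

    # A report number should appear only once across both extracts.
    df = df.drop_duplicates(subset="DR_NO").copy()

    # Parse occurrence dates; unparseable values become NaT and are dropped
    # together with rows missing the area or crime description.
    df["DATE OCC"] = pd.to_datetime(df["DATE OCC"], errors="coerce")
    df = df.dropna(subset=["DATE OCC", "AREA NAME", "Crm Cd Desc"]).copy()

    # Assumption: non-positive ages mean "unknown"; impute them with the median.
    age = df["Vict Age"].where(df["Vict Age"] > 0)
    df["Vict Age"] = age.fillna(age.median())

    # Assumption: missing victim sex becomes an explicit "X" category.
    df["Vict Sex"] = df["Vict Sex"].fillna("X")
    return df.reset_index(drop=True)


# Toy rows standing in for the 2010-2019 and 2020-present extracts.
older = pd.DataFrame({
    "DR_NO": [1, 2, 3],
    "DATE OCC": ["2019-01-05", "2019-02-11", "not a date"],
    "AREA NAME": ["Central", "Hollywood", "Central"],
    "Crm Cd Desc": ["BURGLARY", "VEHICLE - STOLEN", "BURGLARY"],
    "Vict Age": [34, 0, 51],
    "Vict Sex": ["M", None, "F"],
})
newer = pd.DataFrame({
    "DR_NO": [2, 4],  # report 2 also appears in the older extract
    "DATE OCC": ["2019-02-11", "2020-04-15"],
    "AREA NAME": ["Hollywood", "Pacific"],
    "Crm Cd Desc": ["VEHICLE - STOLEN", "ROBBERY"],
    "Vict Age": [0, 28],
    "Vict Sex": [None, "M"],
})

clean = combine_and_clean(older, newer)
print(clean)
```

Here the duplicate of report 2 is dropped, report 3 is removed because its date cannot be parsed, and the unknown age of report 2 is replaced by the median of the remaining valid ages.
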
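
Step 2 mentions creating features that capture the characteristics of crime areas. One common way to turn individual reports into an area-level quantity a regression model can predict is to count reports per area, crime type, and month. The sketch below assumes that aggregation and a target column named `crime_count` (a hypothetical name). It is not necessarily the feature set built in Step2_Exploratory_data_analysis.ipynb.

```python
import pandas as pd


def monthly_counts(df: pd.DataFrame) -> pd.DataFrame:
    """Aggregate individual reports into monthly counts per area and crime type."""
    return (
        df.assign(month=df["DATE OCC"].dt.to_period("M"))  # e.g. 2020-01
        .groupby(["AREA NAME", "Crm Cd Desc", "month"])
        .size()  # number of reports in each group
        .reset_index(name="crime_count")
    )


reports = pd.DataFrame({
    "DATE OCC": pd.to_datetime(["2020-01-03", "2020-01-20", "2020-02-07", "2020-01-15"]),
    "AREA NAME": ["Central", "Central", "Central", "Pacific"],
    "Crm Cd Desc": ["BURGLARY", "BURGLARY", "BURGLARY", "ROBBERY"],
})

counts = monthly_counts(reports)
print(counts)
```
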
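
The Step 3 preprocessing (StandardScaler for numerical columns, one-hot encoding for categorical columns, dummy variables) can be sketched with scikit-learn's ColumnTransformer. The feature lists, the `crime_count` target, and the toy rows are hypothetical placeholders. The key design point is to fit the scaler and encoder on the training split only and reuse them on the test split, so no test-set statistics leak into training.

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Hypothetical feature lists and target; substitute the columns used in Step 3.
NUMERIC = ["Vict Age"]
CATEGORICAL = ["AREA NAME", "Crm Cd Desc", "Vict Sex"]
TARGET = "crime_count"

# Scale numeric columns and one-hot encode categorical ones in one transformer.
# handle_unknown="ignore" maps categories unseen during fit to all-zero columns;
# sparse_threshold=0 forces a dense NumPy array as output.
preprocess = ColumnTransformer(
    [
        ("num", StandardScaler(), NUMERIC),
        ("cat", OneHotEncoder(handle_unknown="ignore"), CATEGORICAL),
    ],
    sparse_threshold=0,
)

# Toy rows standing in for the prepared dataset.
df = pd.DataFrame({
    "Vict Age": [25, 40, 33, 58, 19, 46],
    "AREA NAME": ["Central", "Pacific", "Central", "Hollywood", "Pacific", "Central"],
    "Crm Cd Desc": ["BURGLARY", "ROBBERY", "ROBBERY", "BURGLARY", "BURGLARY", "ROBBERY"],
    "Vict Sex": ["M", "F", "F", "M", "X", "F"],
    TARGET: [12, 7, 9, 15, 4, 10],
})

X_train, X_test, y_train, y_test = train_test_split(
    df[NUMERIC + CATEGORICAL], df[TARGET], test_size=2, random_state=0
)

# Learn scaling statistics and categories from the training split only.
X_train_t = preprocess.fit_transform(X_train)
X_test_t = preprocess.transform(X_test)

# pd.get_dummies is a quick way to build a dummy-variable dataset, but it
# encodes whatever categories appear in the frame it is given.
dummies = pd.get_dummies(df[CATEGORICAL])
print(X_train_t.shape, X_test_t.shape, dummies.shape)
```
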
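
The Step 4 comparison of the four regressors on MAE, MSE, and RMSE can be sketched as follows. Synthetic data from `make_regression` stands in for the project's features, so the ranking printed here says nothing about the project's result (the README reports that the Decision Tree was chosen). The hyperparameters, the `save_model` helper, and the file name `decision_tree.joblib` are hypothetical.

```python
from pathlib import Path

import joblib
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import GradientBoostingRegressor, RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error, mean_squared_error
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor

MODELS = {
    "Linear Regression": LinearRegression(),
    "Decision Tree": DecisionTreeRegressor(random_state=42),
    "Random Forest": RandomForestRegressor(n_estimators=100, random_state=42),
    "Gradient Boosting": GradientBoostingRegressor(random_state=42),
}


def evaluate(models, X_train, X_test, y_train, y_test):
    """Fit each model and report MAE, MSE, and RMSE on the held-out split."""
    results = {}
    for name, model in models.items():
        model.fit(X_train, y_train)
        pred = model.predict(X_test)
        mse = mean_squared_error(y_test, pred)
        results[name] = {
            "MAE": mean_absolute_error(y_test, pred),
            "MSE": mse,
            "RMSE": float(np.sqrt(mse)),  # RMSE is the square root of MSE
        }
    return results


def save_model(model, directory="models", filename="decision_tree.joblib"):
    """Persist a fitted model with joblib (hypothetical path and file name)."""
    path = Path(directory)
    path.mkdir(parents=True, exist_ok=True)
    joblib.dump(model, path / filename)
    return path / filename


# Synthetic stand-in data so the sketch runs on its own.
X, y = make_regression(n_samples=300, n_features=6, noise=10.0, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

results = evaluate(MODELS, X_train, X_test, y_train, y_test)
for name, scores in results.items():
    print(f"{name:18s} MAE={scores['MAE']:.2f} MSE={scores['MSE']:.2f} RMSE={scores['RMSE']:.2f}")

best = min(results, key=lambda name: results[name]["RMSE"])
print("Lowest RMSE:", best)
```

Calling `save_model(MODELS[best])` would write the fitted estimator under `models/`; `joblib.load` reads it back for later predictions.
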

Not all files could be uploaded to GitHub because of its file size limits. The full data files processed throughout the project can be found at https://drive.google.com/drive/folders/1Gyf_0yEHZs2v8h3dXyWcxj6AWsMRsXUs?usp=sharing
