This repo contains notebooks in which I explore several datasets and apply machine learning to them. Its purpose is to share and to learn.
-
Nobel Prize winners: an analysis of how Nobel Prize winners and prizes have evolved over time, broken down by gender, along with some visual EDA.
-
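The gender breakdown can be sketched in plain Python. This is not the notebook's code. It assumes each winner is a hypothetical `(year, sex)` pair, uses made-up toy records, and computes the share of winners recorded as female per decade. The notebook itself may use pandas and different column names.

```python
from collections import Counter

# Hypothetical toy records (year, sex); NOT taken from the real Nobel dataset.
winners = [
    (1903, "Female"), (1903, "Male"), (1911, "Female"),
    (1921, "Male"), (1964, "Female"), (2009, "Female"), (2009, "Male"),
]


def female_share_by_decade(records):
    """Return {decade: fraction of winners recorded as "Female"}, sorted by decade."""
    totals = Counter()
    females = Counter()
    for year, sex in records:
        decade = year // 10 * 10  # e.g. 1903 -> 1900
        totals[decade] += 1
        if sex == "Female":
            females[decade] += 1
    return {decade: females[decade] / totals[decade] for decade in sorted(totals)}


print(female_share_by_decade(winners))
```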
Drunk Iowa: this dataset contains breathalyzer test results; check the notebook for details of the analysis.
-
DGA (Domain Generation Algorithm): some malware uses an algorithm that generates a large number of random-looking domain names. This notebook uses NLP to build a classifier that distinguishes legitimate domain names from algorithmically generated ones. This project was part of my master's degree project.
-
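The notebook's exact NLP features are not described here, so the following is only a minimal sketch of one common idea, swapped in for illustration: score a domain by how familiar its character bigrams are compared with a list of known-legitimate names. Random-looking DGA names tend to contain rare bigrams and therefore get lower scores. The training list `LEGIT`, the smoothing constant `alpha` and the bigram-space size `vocab` are all assumptions; a real classifier would be trained on labeled data and would need a tuned decision threshold.

```python
import math
from collections import Counter

# Hypothetical training list of legitimate second-level names (no TLD).
LEGIT = ["google", "facebook", "amazon", "wikipedia", "youtube",
         "twitter", "linkedin", "netflix", "microsoft", "apple"]


def bigrams(name):
    """Character bigrams, with ^ and $ marking the start and end of the name."""
    s = "^" + name.lower() + "$"
    return [s[i:i + 2] for i in range(len(s) - 1)]


def train_bigram_counts(domains):
    """Count how often each bigram occurs across the legitimate names."""
    counts = Counter()
    for d in domains:
        counts.update(bigrams(d))
    return counts


def familiarity_score(name, counts, alpha=1.0, vocab=40 * 40):
    """Average add-alpha smoothed log-frequency of the name's bigrams.

    Higher (less negative) means the name looks more like the training names.
    `vocab` roughly approximates the number of possible bigrams (~40 symbols squared).
    """
    total = sum(counts.values())
    grams = bigrams(name)
    return sum(
        math.log((counts[g] + alpha) / (total + alpha * vocab)) for g in grams
    ) / len(grams)


counts = train_bigram_counts(LEGIT)
print(familiarity_score("goodbook", counts))  # pronounceable, familiar bigrams
print(familiarity_score("xq7zkw9v", counts))  # random-looking, DGA-style
```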
Titanic: an analysis of the Kaggle Titanic dataset, where the goal is to predict whether a passenger survived.
-
Moby Dick: a word frequency analysis of the book using nltk, requests and bs4.
-
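As a rough, dependency-free sketch of the same pipeline, the standard library's `html.parser` can stand in for bs4's text extraction and a regular expression for nltk's tokenizer. The stopword set below is a tiny hypothetical subset (nltk's English list is much longer), and the inline HTML string replaces the page the notebook downloads with requests.

```python
import re
from collections import Counter
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collect the text nodes of an HTML document (a stand-in for bs4's get_text())."""

    def __init__(self):
        super().__init__()
        self.parts = []

    def handle_data(self, data):
        self.parts.append(data)


# Tiny hypothetical stopword subset; nltk's English stopword list is far longer.
STOPWORDS = {"the", "a", "an", "and", "of", "to", "in", "is", "it", "me", "i"}


def top_words(html, n=10):
    """Return the n most common non-stopword words in the page's text."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()  # flush any buffered text
    text = " ".join(parser.parts).lower()
    words = re.findall(r"[a-z']+", text)  # regex tokenizer instead of nltk
    return Counter(w for w in words if w not in STOPWORDS).most_common(n)


sample = "<html><body><p>Call me Ishmael.</p><p>The whale, the whale and the sea.</p></body></html>"
print(top_words(sample, 3))
```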
Linux repository commit analysis: the notebook finds the most productive year in terms of commits within the project, as well as the most productive contributors. Guess who the most productive one is?
-
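Both questions reduce to counting. The sketch below assumes each commit is an `(author, Unix timestamp)` pair, such as the output of `git log --format='%at|%an'` split on `|`; the authors and timestamps are hypothetical toy values, not results from the Linux repository.

```python
from collections import Counter
from datetime import datetime, timezone

# Hypothetical (author, Unix timestamp) pairs; not real Linux commit data.
commits = [
    ("Alice", 1104537600),  # 2005-01-01 UTC
    ("Bob", 1136073600),    # 2006-01-01 UTC
    ("Alice", 1136160000),  # 2006-01-02 UTC
    ("Alice", 1167609600),  # 2007-01-01 UTC
]


def commit_stats(commits, top_n=2):
    """Return (year with the most commits, top_n authors by commit count)."""
    per_year = Counter(
        datetime.fromtimestamp(ts, tz=timezone.utc).year for _, ts in commits
    )
    per_author = Counter(author for author, _ in commits)
    busiest_year = per_year.most_common(1)[0][0]
    return busiest_year, per_author.most_common(top_n)


print(commit_stats(commits))
```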
Housing: a dataset from a Kaggle competition; this kernel covers data imputation and gives a basic introduction to XGBoost.
-
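The notebook's imputation strategy isn't spelled out here; one common choice, sketched below, fills missing numeric values with the median of the observed values in that column. The column name `lot_frontage` and the rows are hypothetical, and the function returns new dictionaries rather than modifying the input.

```python
from statistics import median

# Hypothetical rows; None marks a missing value.
rows = [
    {"lot_frontage": 60},
    {"lot_frontage": None},
    {"lot_frontage": 80},
    {"lot_frontage": 70},
]


def impute_median(rows, column):
    """Return copies of rows with None in `column` replaced by the observed median."""
    observed = [r[column] for r in rows if r[column] is not None]
    fill = median(observed)
    return [{**r, column: fill if r[column] is None else r[column]} for r in rows]


print(impute_median(rows, "lot_frontage"))
```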
Airplanes: a dataset analysis with PySpark.