Purpose: Build a model that automatically predicts the genre of any new song supplied by a user. An ideal use case is a music streaming website using the system to identify genres for unlabelled songs.
Techniques used: SVM, KNN, Logistic Regression, QDA, Random Forest, Gradient Boosting, Multilayer Perceptron
Tools used: Jupyter Notebook, Python
Overview of each notebook:
- audio_analysis/Audio EDA v0.1ss.ipynb: The code contains:
  - Exploratory audio data analysis
- audio_analysis/Audio Models+Ensemble v0.1ss.ipynb: The code contains:
  - Audio Modeling - QDA, Logistic Regression, SVM
  - Ensemble models
- text_analysis/final_text_analysis.ipynb (text modeling, 5000 words and 1000 words): The code contains:
  - Text Modeling - Random Forest, Gradient Boosting, Multilayer Perceptron
  - Ensemble Model
- data_eda/word_cloud.ipynb: The code contains:
  - Text EDA
- Ensemble Boosting.ipynb: The code contains:
  - Boosting for ensemble
- LyricBagofWordsClassifier.ipynb: The code contains:
  - Text modeling (500 words) - Random Forest, SVM, KNN, Logistic Regression
Datasets:
- Genre Annotations: http://www.tagtraum.com/msd_genre_datasets.html (File name: msd_tagtraum_cd2c.cls.zip)
- Audio Dataset: https://labrosa.ee.columbia.edu/millionsong/tasteprofile
- Lyrics Dataset: https://labrosa.ee.columbia.edu/millionsong/musixmatch
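
The text notebooks are described with 5000-, 1000- and 500-word variants. Assuming these refer to restricting bag-of-words lyric features to the N most frequent words (an assumption; the notebooks themselves are the authority), the idea can be sketched roughly as follows. The function names `top_n_vocabulary` and `to_feature_vector` and the toy songs are hypothetical and are not taken from the notebooks.

```python
from collections import Counter


def top_n_vocabulary(songs, n):
    """Return the n most frequent words across all songs.

    Ties are broken alphabetically so the result is deterministic.
    """
    totals = Counter()
    for counts in songs:
        totals.update(counts)  # adds each song's word counts to the totals
    ranked = sorted(totals.items(), key=lambda item: (-item[1], item[0]))
    return [word for word, _ in ranked[:n]]


def to_feature_vector(counts, vocabulary):
    """Map one song's word counts onto a fixed-order vector (missing words become 0)."""
    return [counts.get(word, 0) for word in vocabulary]


# Toy data (hypothetical): each song is a dict of word -> count.
songs = [
    {"love": 4, "baby": 2, "night": 1},
    {"love": 1, "road": 3, "truck": 2},
    {"night": 2, "fire": 1, "love": 2},
]

vocabulary = top_n_vocabulary(songs, n=3)
features = [to_feature_vector(song, vocabulary) for song in songs]
```

Each row of `features` could then be passed, together with its genre label, to any of the classifiers listed above. A smaller N gives a more compact feature matrix at the cost of dropping rarer words.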