For my final project, I chose a free topic: classifying tweets by U.S. politicians according to their political party. This repository contains my project proposal and progress report, the data used for this project (in the folder "data_files"), and the Jupyter notebooks containing my code and documentation (in the folder "notebooks"). The demonstration is given in the YouTube videos linked below.
My code is divided among the Jupyter notebooks listed below, all of which are posted in this repository (in the folder "notebooks"). The documentation for my code is included in these notebooks as text cells explaining the code. The code can be tested by running the notebooks in Google Colaboratory or with Anaconda, among other options. Notebooks:
- Notebook 0 - Introduction: introduction and summary of the project.
- Notebook 1 - Tweet Scraper: gathering data using Tweepy and the Twitter API.
- Notebook 2 - Data Preprocessing: preparing the data for analysis.
- Notebook 3 - Word Frequencies by Party: investigating the word distribution by political party.
- Notebook 4 - Political Party Classifier: training classifiers that take text and classify it by political party.
- Notebook 5 - LDA: discovering topics in the collection of tweets, and investigating how they break down by political party.
- Notebook 6 - Conclusion: summary of results and ideas for future work.
My demonstration is divided into the seven YouTube videos linked below, with one video corresponding to each of the Jupyter notebooks listed above.
- Notebook 0 - Introduction: https://youtu.be/MJDm-hJuJio
- Notebook 1 - Tweet Scraper: https://youtu.be/BaAiCVspBG8
- Notebook 2 - Data Preprocessing: https://youtu.be/Bv-PXU5ld7U
- Notebook 3 - Word Frequencies by Party: https://youtu.be/sjoV9uaMVjM
- Notebook 4 - Political Party Classifier: https://youtu.be/Nf_paaPLxmk
- Notebook 5 - LDA: https://youtu.be/LU-F3MSooZQ
- Notebook 6 - Conclusion: https://youtu.be/Yjtk0ISA-M0