Update: Data coming soon! This repository is for hosting the code and data for the publication: "Understanding political polarization using languagemodels: A dataset and method".
This work was done as part of the course project 'Understanding Political Polarisation using NLP' for 11785-Intro to Deep Learning course at Carnegie Mellon University.
The structure explaining different branches of the report are:
- web-scraping-dev
senators.py - contains beautiful soup library that fetches all the senators as well as represntatives
along with their label R or D from each Congress page in the range 104 to 117.
data-congress.py - webscraping for fetching the content in each politicians wikipedia page along with
their personal details such as name, party, education, political life etc.
- Transformers-Pipeline
TRANSFORMERS_IDL.ipynb - The enitre pipeline for running different transformers model and getting model embeddings.
*Preliminary-tests
Prelimnary_tests_IDL_project.ipynb - Pipeline for Wor2Vec and KNN
Roberta_Large.ipynb - Pipeline for RoBERTa transformer model and SVC
Updated_Doc2Vec_tests_IDL_project.ipynb - Pipeline for Doc2Vec
-
senators.pyis the first file that executes to generate a .csv for all wikipedia links and labels -
data-congress.pywill dump required text content in each page to JSON file by reading the csv geneated bysenators.py -
Prelimnary_tests_IDL_project.ipynbhelps in getting initial baseline/preliminary test results using Word2Vec -
Roberta_Large.ipynb,Updated_Doc2Vec_tests_IDL_project.ipynbandTRANSFORMERS_IDL.ipynbare various other experiment pipline with different models
If you find our work useful in your research or if you use parts of this code, please consider citing our paper: Gode, S., Bare, S., Raj, B., & Yoo, H. (2023). Understanding political polarization using language models: A dataset and method. AI Magazine, 1–7. https://doi.org/10.1002/aaai.12104
For LaTeX users, you can use the following BibTeX entry:
@article{gode2023understanding,
title={Understanding political polarization using language models: A dataset and method},
author={Gode, S. and Bare, S. and Raj, B. and Yoo, H.},
journal={AI Magazine},
pages={1--7},
year={2023},
doi={https://doi.org/10.1002/aaai.12104}
}