A collection of scripts that scrape and format comments from several major news publications.
Install the following dependencies in your terminal.
Use the command
pip install
To obtain user comments with the New York Times scraper, you must have a New York Times Developer API key.
To obtain user comments with the Washington Post or FiveThirtyEight scraper, you must have Selenium installed, along with a ChromeDriver executable whose path you pass to the scraper.
In order to use the scrapers' write_to_gsheet() methods, you must have service account and OAuth2 credentials from the Google API Console.
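Because the scrapers need an API key or a ChromeDriver path at construction time, it can help to keep those values out of your source code. The following is a minimal sketch, not part of this project: the helper `load_setting` and the environment variable names `NYT_API_KEY` and `CHROMEDRIVER_PATH` are hypothetical and chosen only for illustration.

```python
import os


def load_setting(name: str) -> str:
    """Return the value of an environment variable, or raise a clear error.

    `name` is any variable name you choose, e.g. the hypothetical
    NYT_API_KEY or CHROMEDRIVER_PATH.
    """
    value = os.environ.get(name, "").strip()
    if not value:
        raise RuntimeError(
            f"Set the {name} environment variable before running the scraper."
        )
    return value
```

With this helper, `my_api_key = load_setting("NYT_API_KEY")` and `my_chromedriver_path = load_setting("CHROMEDRIVER_PATH")` would supply the values used when initializing the scrapers.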
The New York Times scraper obtains a comment's Article URL, Parent ID, Comment ID, User Display Name, Comment Body, Upload Date, Number of Likes, Number of Replies, and Editor's Selection.
The Washington Post scraper obtains a comment's Article URL, User Display Name, Comment Body, Upload Date, and Number of Likes.
The FiveThirtyEight scraper obtains a comment's Article URL, User Display Name, Comment Body, and Upload Date.
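The three field lists above overlap: every scraper returns the article URL, display name, body, and upload date, and the other fields depend on the publication. As an illustration only, one way to picture a single comment record is the dataclass below. The class name `Comment` and its attribute names are assumptions made for this sketch; the scrapers' actual return type and key names are not documented here.

```python
from dataclasses import dataclass, asdict
from typing import Optional


@dataclass
class Comment:
    # Fields listed for all three scrapers
    article_url: str
    user_display_name: str
    comment_body: str
    upload_date: str
    # Listed for New York Times and Washington Post only
    number_of_likes: Optional[int] = None
    # Listed for New York Times only
    parent_id: Optional[str] = None
    comment_id: Optional[str] = None
    number_of_replies: Optional[int] = None
    editors_selection: Optional[bool] = None


# A FiveThirtyEight-style record leaves the optional fields unset.
example = Comment(
    article_url="https://example.com/article",
    user_display_name="reader1",
    comment_body="An example comment.",
    upload_date="2021-08-01",
)
record = asdict(example)  # plain dict, convenient for tabular output
```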
Begin by initializing a new instance of your desired scraper.
WaPo_Scraper = washingtonpost(my_chromedriver_path)
NYT_Scraper = nyt(my_api_key)
FiveThirtyEight_Scraper = fivethirtyeight(my_chromedriver_path)
You can retrieve a list of comments from a single article by passing the article URL to the get_article_comments() method.
my_article = "https://www.washingtonpost.com/politics/2021/04/13/risk-reward-calculus-johnson-johnson-vaccine-visualized/"
WaPo_Scraper.get_article_comments(my_article)
You can retrieve a list of comments from a list of articles with the get_comments_from_multiple_articles() method.
my_article_list = ["https://www.nytimes.com/2015/04/12/opinion/sunday/david-brooks-the-moral-bucket-list.html", "https://www.nytimes.com/2019/06/21/science/giant-squid-cephalopod-video.html", "https://www.nytimes.com/2021/08/01/insider/the-olympics-that-feel-like-only-competitions.html"]
NYT_Scraper.get_comments_from_multiple_articles(my_article_list)
You can retrieve a list of articles from a Google Spreadsheet with the get_articles_from_spreadsheet() method.
FiveThirtyEight_Scraper.get_articles_from_spreadsheet(spreadsheet_url, sheet_number)
You can convert a list of comments into a Pandas dataframe with the get_dataframe() method.
WaPo_Scraper.get_dataframe(comments_list)
You can write a dataframe of comments into a Google Spreadsheet with the write_to_gsheet() method.
NYT_Scraper.write_to_gsheet(dataframe, gsheet_path, gsheet_name, sheet_number)
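To make the relationship between get_article_comments() and get_comments_from_multiple_articles() more concrete, the sketch below shows one plausible way to combine per-article results into a single list. It is not the project's implementation: `collect_comments` and `fake_fetch` are hypothetical names, the dictionary keys are illustrative, and skipping duplicate URLs is a design choice of this sketch rather than documented behavior.

```python
from typing import Callable, Dict, List

Comment = Dict[str, str]


def collect_comments(
    article_urls: List[str],
    fetch_one: Callable[[str], List[Comment]],
) -> List[Comment]:
    """Fetch comments for each URL in order and return one flat list."""
    all_comments: List[Comment] = []
    seen = set()
    for url in article_urls:
        if url in seen:  # skip repeated URLs so comments are not double-counted
            continue
        seen.add(url)
        all_comments.extend(fetch_one(url))
    return all_comments


def fake_fetch(url: str) -> List[Comment]:
    """Stand-in for a scraper's get_article_comments(url); returns canned data."""
    return [{"Article URL": url, "Comment Body": "example comment"}]
```

With a real scraper instance, `fetch_one` could be a bound method such as `NYT_Scraper.get_article_comments`, assuming it returns a list of comments as described above.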