Text Mining Project #4

Open
pdublish wants to merge 2 commits into sd16fall:master from pdublish:master

Conversation

@pdublish pdublish commented Oct 3, 2016

No description provided.

@poosomooso poosomooso left a comment

Great job on the project! Just a few things.

In your first two blocks, you pickle the list of tweets and then load it back in the same box. The purpose of pickling is to grab the data from Twitter just once, when you begin the project. Then, if you put the code down and come back to it later, you don't have to run the Twitter search again (which is slow and unreliable, because internet). Instead, you have a pickled file, and all you have to do is call "pickle.load" to get your list of tweets back. So definitely keep the pickle.dump and pickle.load in different boxes, and just run the box that loads from the file whenever you can.
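A minimal sketch of that split, assuming a made-up `fetch_tweets` stand-in for the Twitter search and a made-up file path (neither comes from the project itself):

```python
import os
import pickle
import tempfile


def fetch_tweets(query):
    """Hypothetical stand-in for the slow Twitter search."""
    return [f"example tweet about {query} #{i}" for i in range(3)]


# Hypothetical path, for illustration only
TWEET_FILE = os.path.join(tempfile.gettempdir(), "gunsense_tweets.pickle")

# --- Box 1: run once, when the data is first collected ---
tweets = fetch_tweets("gunsense")
with open(TWEET_FILE, "wb") as f:
    pickle.dump(tweets, f)

# --- Box 2: run in every later session instead of searching again ---
with open(TWEET_FILE, "rb") as f:
    tweets = pickle.load(f)
```

In a notebook, the two halves would live in separate boxes, and only the second one needs to run once the file exists.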

I also noticed that you often have two code blocks that are almost identical, except that one uses the gunsense tweets and the other uses the guncontrol tweets. If you find yourself copy/pasting a lot, put the code block inside a function, and make the one thing that changes (in this case, the list of tweets) a parameter to that function. It'll make your code a lot less repetitive.
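As a rough illustration of the idea, here is a sketch with a hypothetical `most_common_words` helper and made-up sample tweets. The analysis inside the function is a placeholder, not what the project actually computes:

```python
from collections import Counter


def most_common_words(tweets, n=3):
    """Return the n most frequent lowercase words across a list of tweets."""
    words = []
    for tweet in tweets:
        words.extend(tweet.lower().split())
    return Counter(words).most_common(n)


# Made-up sample data standing in for the two real tweet lists
gunsense_tweets = ["we need gun sense", "gun sense now"]
guncontrol_tweets = ["gun control debate", "control the debate"]

# One function, called twice, instead of two nearly identical blocks
gunsense_top = most_common_words(gunsense_tweets)
guncontrol_top = most_common_words(guncontrol_tweets)
```

Any fix or tweak to the analysis then happens in one place and applies to both lists.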

In general, good job using descriptive names for everything, and putting comments around.

Pretty small thing that's mostly important for security reasons: try not to put your API keys on GitHub (or the internet in general). There is a risk of bad people getting hold of your keys and using them for bad purposes. Generally, you want to put them in a separate file (that you don't commit), and read the key from that file when you need it. That way people on the internet can't see your API key.
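One way this could look, as a sketch only: the file name, the JSON layout, and the `load_api_key` helper are all assumptions made for illustration, and the key written here is fake.

```python
import json
import os
import tempfile


def load_api_key(path, name="api_key"):
    """Read one key from a JSON file that is kept out of version control."""
    with open(path) as f:
        return json.load(f)[name]


# Demo only: write a fake key file to a temp directory so the sketch runs.
# In a real project the file would be created by hand and listed in .gitignore.
key_path = os.path.join(tempfile.gettempdir(), "keys_example.json")
with open(key_path, "w") as f:
    json.dump({"api_key": "not-a-real-key"}, f)

api_key = load_api_key(key_path)
```

The code that gets committed then contains only the path to the key file, never the key itself.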

But all in all, great project, great visualizations, and kudos for doing research and using Indico to help you. If you need any clarification on my comments, feel free to ask! Text is not the best medium for explaining code.

Project version 2
@poosomooso

Hi Prabha! Some final comments.

A lot of your code is pretty straightforward, and your variable names are really great and readable. I think you should use functions more, though. For example, have a function called 'reload_tweets' that takes in the name of the pickle file and returns the list of tweets. Or a function called 'get_lin_regress_rsquared' that gets the r_squared for a given list of sentiments and objectivities. A lot of times, you copy and paste code where the only part you change is the list of tweets you are using. Writing functions really cuts down on copy/pasting and makes your code cleaner (because your function names and docstrings should be able to tell the reader what the code does!).
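For instance, 'get_lin_regress_rsquared' might look roughly like the sketch below. It computes r-squared by hand for a simple least-squares line fit using only built-in Python. The project may well use a library routine for this instead, so treat the body as an assumption rather than the project's method:

```python
def get_lin_regress_rsquared(sentiments, objectivities):
    """Return r-squared for a least-squares line fit of objectivities on sentiments."""
    n = len(sentiments)
    if n != len(objectivities) or n < 2:
        raise ValueError("need two equal-length lists with at least two points")
    mean_x = sum(sentiments) / n
    mean_y = sum(objectivities) / n
    # Sums of squared deviations and of cross deviations
    sxx = sum((x - mean_x) ** 2 for x in sentiments)
    syy = sum((y - mean_y) ** 2 for y in objectivities)
    sxy = sum((x - mean_x) * (y - mean_y)
              for x, y in zip(sentiments, objectivities))
    if sxx == 0 or syy == 0:
        raise ValueError("r-squared is undefined when either list is constant")
    # For a one-variable linear fit, r-squared is the squared correlation
    return sxy ** 2 / (sxx * syy)


# Made-up numbers: the second list is exactly twice the first
perfect_fit = get_lin_regress_rsquared([0.1, 0.2, 0.3], [0.2, 0.4, 0.6])
```

With the function in place, each party's tweets need only one call rather than a pasted copy of the regression code.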

Your code could also use more comments. You have good comments about what each section does, but in nonobvious parts, like when you take the mean of the lists for the different political parties, you could add a short comment on why you are taking the mean. Also, your function should have a docstring, so instead of:

##Function for sentiment analysis
def sentiment_analysis(tweets):

do:

def sentiment_analysis(tweets):
    """Function for sentiment analysis"""

It doesn't seem important, but docstrings are the Python standard, and you also need them if you choose to write doctests, since doctests live inside the docstring.
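To show how the two fit together, here is a small sketch of a docstring that carries doctests. The `mean_sentiment` function and its numbers are invented for the example and are not taken from the project:

```python
import doctest


def mean_sentiment(scores):
    """Return the average of a list of sentiment scores.

    >>> mean_sentiment([0.25, 0.75])
    0.5
    >>> mean_sentiment([1.0])
    1.0
    """
    return sum(scores) / len(scores)


if __name__ == "__main__":
    # Runs every example found in this module's docstrings
    doctest.testmod()
```

Running the file directly checks each `>>>` example against the output written below it, and prints nothing when they all pass.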

But overall your code is awesome--you do tons of analysis, make use of a library we didn't even suggest, and the code (besides the copy/pasting) is very readable and clean. Awesome project!
