Mini Project 1: Text Mining and Analysis #5

Open
jlee66 wants to merge 5 commits into sd16fall:master from jlee66:master

jlee66 commented Oct 3, 2016

No description provided.

poosomooso left a comment

Nice work! I have a couple comments for you.

First of all, your writeup was pretty thorough, but I wanted to bring up that you said the average polarity was 0.122. A polarity of 0.122 is actually pretty neutral, since polarity ranges from -1 (most negative) to 1 (most positive), so that fact might change your analysis. Maybe comment on the range of the polarity, from the most negative tweet to the most positive one? And if you find that the results were mostly neutral, it's fine to say that your hypothesis was wrong and the data was more boring than you anticipated. It's more about the process and what you learned than the results.
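As a rough sketch of how the range could be reported: assuming you already have each tweet's text paired with its polarity (the first value returned by pattern's `sentiment()`), the hypothetical helper below picks out the most negative and most positive tweets and the spread between them. The function name and the input shape are illustrative, not part of the original notebook.

```python
from typing import List, Tuple


def polarity_range(scored: List[Tuple[str, float]]):
    """Return the most negative tweet, the most positive tweet, and the spread.

    `scored` is assumed to be a list of (tweet_text, polarity) pairs,
    with each polarity between -1 and 1.
    """
    if not scored:
        raise ValueError("need at least one scored tweet")
    # min/max compare the pairs by their polarity (the second element)
    most_negative = min(scored, key=lambda pair: pair[1])
    most_positive = max(scored, key=lambda pair: pair[1])
    # distance between the two extremes, somewhere in [0, 2]
    spread = most_positive[1] - most_negative[1]
    return most_negative, most_positive, spread
```

If both extremes sit close to 0 and the spread is small, that would back up reading the 0.122 average as mostly neutral.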

In the first code box of your notebook, you get your data, pickle it, and unpickle it, all in one box. The point of pickling is that you only have to grab data from Twitter once, at the beginning of your work on the project. That way, if you put down your code or close the notebook and come back later, you don't have to rerun the code that searches Twitter (since that's slow and unreliable); instead, you run the box with the two lines that load from the pickle file. So you should separate the code that pickles the Twitter data and the code that loads the pickled data into different boxes, and from then on, just run the box that loads the pickled data.
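A minimal sketch of that split, assuming the fetched tweets are a plain list of strings. The file name and the helper names are hypothetical; in a notebook, the body of each function would simply be the contents of its own box.

```python
import pickle

CACHE_PATH = "pogba_tweets.pickle"  # hypothetical file name for the cached tweets


# Box 1: run once, right after searching Twitter, to save the results to disk.
def save_tweets(texts, path=CACHE_PATH):
    with open(path, "wb") as f:
        pickle.dump(texts, f)


# Box 2: run every time you reopen the notebook; it never touches Twitter.
def load_tweets(path=CACHE_PATH):
    with open(path, "rb") as f:
        return pickle.load(f)
```

After the first run, only the loading box needs to be executed, so every later analysis works on the same saved set of tweets.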

In general, your code would benefit from a lot more comments. For example, there are places where you test the sentiment method (in the second code box, where you print the sentiment of a specific tweet), and it would be nice to know your intentions, like a comment saying that you are testing a method to see what kind of output you get.

Also, in box 2 you keep calling the t.search function, even though you already generated a list of tweets earlier. For consistency and efficiency, you should use your previous list of tweets and get the sentiments of those, rather than getting the sentiments of a new list of tweets that you grab from Twitter in each of your for loops.
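Sketched under the assumption that `texts` is the list you already fetched (or loaded from the pickle file) and that the scoring function behaves like pattern's `sentiment()`, which returns a (polarity, subjectivity) pair. The scorer is passed in as an argument only so the sketch can run without pattern installed; `score_tweets` is a hypothetical name.

```python
from typing import Callable, List, Sequence, Tuple


def score_tweets(texts: Sequence[str],
                 sentiment_fn: Callable[[str], Tuple[float, float]]) -> List[float]:
    """Return the polarity of each stored tweet, in order.

    Works on the list that was already fetched instead of calling
    t.search again, so every number comes from the same set of tweets.
    """
    polarities = []
    for text in texts:
        # one scoring call per tweet; subjectivity is ignored here
        polarity, _subjectivity = sentiment_fn(text)
        polarities.append(polarity)
    return polarities
```

With pattern installed, the call would look like `score_tweets(reloaded_copy_of_texts, sentiment)`.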

Also, you might benefit from a larger data set; I think the t.search function can take a count of up to 100.

But good work! When doing revisions, if you need me to clarify anything I said, please ask. Conveying concepts over text is a difficult task.

jlee66 commented Oct 11, 2016

"Turning in my revised mini project 1 finalized" is the final one!

jlee66 commented Oct 11, 2016

Also, for the project write-up, I kept the results of my first attempt because the tweets kept changing. Thank you very much!

poosomooso commented

Hi Jason! Some more comments.

In general, your code could use a lot more comments. For example, you could put a comment above the `for tweet in t.search('Pogba', start=i, count=30):` loop saying that you are grabbing the text of 30 tweets and putting them in a list. Or, in the code block where you load the pickle file, you could explain that the pickle file should contain a list of 30 tweets. These comments help others who read your code understand what you intended and what you are trying to accomplish. They also help you when you go back through your own code: instead of reading the code and trying to figure out what you meant, you can read a comment like 'adding up all the sentiments in this list to take the average', and what you're doing becomes obvious.
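As an illustration of the kind of comments meant here, a hypothetical averaging helper might read as follows. The name `average_polarity` is made up, and the sketch assumes `polarities` is a list of polarity floats computed from the pickled tweets.

```python
def average_polarity(polarities):
    # Take the average polarity of the saved tweets.
    # Each polarity is between -1 and 1, so the average is too.
    if not polarities:
        # No tweets were loaded, so there is nothing to average.
        raise ValueError("polarities is empty")
    # Add up all the polarities in this list...
    total = sum(polarities)
    # ...then divide by the number of tweets to get the average.
    return total / len(polarities)
```

Raising an error on an empty list, rather than returning 0, is a deliberate choice in this sketch: a 0 would look like a perfectly neutral result.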

You have a lot of repetitive for loops, where you loop through all the tweets once to print their sentiments, and then loop through all of them again to add those sentiments to a list. For example:

```python
for tweet in reloaded_copy_of_texts:
    print(sentiment(tweet))
sentimentList = []
for tweet in reloaded_copy_of_texts:
    sentimentList.append(sentiment(tweet))
```

`reloaded_copy_of_texts` isn't changing, and you are using the exact same for loop twice in a row, so you can simplify the above code to:

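```python
from pattern.en import sentiment  # returns a (polarity, subjectivity) pair

sentimentList = []
for tweet in reloaded_copy_of_texts:
    score = sentiment(tweet)  # score each tweet once, then reuse the result
    print(score)
    sentimentList.append(score)
```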

Now you get the same results while going through the list of tweets only once.

Overall, pretty good job! The code has great variable names and is fairly straightforward, and you got some meaningful results.
