Sarcasm detection using BERT

This project uses NLP techniques to classify whether tweets are sarcastic. A pre-trained BERT model is trained further on labeled tweets and then used to make the predictions.

How to run the code

The executable code resides in the notebook Sentiment_Analysis_with_BERT.ipynb, which is meant to be run in Google Colab. Click on the button below to open the file in Colab.
Once in Colab, the code needs to run on a GPU. Navigate to Edit > Notebook settings and select GPU from the Hardware accelerator dropdown.
Run all the code blocks in order by clicking the black 'Play' button at the top of each block.
At the end, all the predictions are stored in answer.txt in the output folder in the workspace.
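
Before running the full notebook, it can help to confirm that the Colab runtime actually sees a GPU. The snippet below is a minimal sketch and is not taken from the notebook; the helper name gpu_available is hypothetical, and it reports False when TensorFlow is not installed.

```python
def gpu_available():
    """Return True if TensorFlow can see at least one GPU device."""
    try:
        import tensorflow as tf
    except ImportError:
        # TensorFlow is missing, so no GPU can be used from it.
        return False
    return len(tf.config.list_physical_devices("GPU")) > 0


print("GPU available:", gpu_available())
```

If this prints False in Colab, re-check the Hardware accelerator setting before starting training.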

A video tutorial is available HERE

How the code works

This project uses BERT (Bidirectional Encoder Representations from Transformers), a pre-trained language model that performs strongly on many NLP tasks. BERT can be trained further (fine-tuned) to solve text classification problems such as this one. It is built on the Transformer, an architecture whose attention mechanism learns contextual relations between words (or sub-words) in a text.
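
To make the idea of attention concrete, here is a toy sketch of scaled dot-product attention in plain Python. It is illustrative only and is not code from this project: real BERT applies learned projections to produce queries, keys and values and uses many attention heads, whereas this sketch assumes hand-written 2-dimensional embeddings and uses them directly as queries, keys and values.

```python
import math


def softmax(scores):
    """Turn raw scores into weights that are positive and sum to 1."""
    largest = max(scores)
    exps = [math.exp(s - largest) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention(queries, keys, values):
    """Scaled dot-product attention over lists of equal-length vectors."""
    dim = len(keys[0])
    outputs = []
    for q in queries:
        # Similarity of this token to every token, scaled by sqrt(dim).
        scores = [
            sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(dim)
            for k in keys
        ]
        weights = softmax(scores)
        # Each output is a weighted average of the value vectors.
        outputs.append([
            sum(w * v[j] for w, v in zip(weights, values))
            for j in range(len(values[0]))
        ])
    return outputs


# Hypothetical toy embeddings for three tokens (assumed, not real BERT vectors).
tokens = ["great", "weather", "today"]
embeddings = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]

contextual = attention(embeddings, embeddings, embeddings)
for token, vector in zip(tokens, contextual):
    print(token, [round(x, 3) for x in vector])
```

Each output vector mixes information from all three tokens, which is what lets the representation of a word depend on its context.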

The HuggingFace Transformers library provides the TensorFlow version of the BERT model used here.

At a high level, the code works as follows:

Copy the testing and training data from GitHub to the Colab workspace.
Read the testing and training data from the jsonl files and convert them into csv files.
Clean the input data by removing URL and USER tags from the tweets.
Split the training dataset into training and validation sets. These are used to train the model.
Extract only the required columns for further processing.
Create the BERT model and tokenizer.
Convert the training and validation data into the BERT input format using the helper functions defined in the notebook.
Use model.compile to set the optimizer and loss function used to train the model.
Call model.fit to train the model on the training and validation data.
Make predictions on the test data with the trained model.
Write the results to answer.txt in the output folder in the workspace.
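
The data preparation steps (reading jsonl, cleaning, splitting, writing csv) can be sketched with the standard library alone. This is a minimal illustration under stated assumptions, not the notebook's code: the field names "label" and "response", the label value "SARCASM", the literal tag spellings "@USER" and "<URL>", and all function names are assumed for the example. The notebook lists sklearn as a dependency, so it may well split the data differently; a seeded shuffle is swapped in here to keep the sketch dependency-free.

```python
import csv
import io
import json
import random
import re

# Assumed tag spellings; adjust to match the actual dataset.
TAG_PATTERN = re.compile(r"<URL>|@USER")


def clean_tweet(text):
    """Remove URL and USER tags and collapse leftover whitespace."""
    return " ".join(TAG_PATTERN.sub(" ", text).split())


def read_jsonl(lines):
    """Parse one JSON object per non-empty line."""
    return [json.loads(line) for line in lines if line.strip()]


def to_rows(records):
    """Keep only the columns needed for training (assumed field names)."""
    return [
        {
            "label": 1 if record["label"] == "SARCASM" else 0,
            "text": clean_tweet(record["response"]),
        }
        for record in records
    ]


def split_rows(rows, validation_fraction=0.2, seed=42):
    """Shuffle with a fixed seed, then split into train and validation."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    n_validation = int(len(shuffled) * validation_fraction)
    cut = len(shuffled) - n_validation
    return shuffled[:cut], shuffled[cut:]


def write_csv(rows, handle):
    """Write rows with a header line to an open text handle."""
    writer = csv.DictWriter(handle, fieldnames=["label", "text"])
    writer.writeheader()
    writer.writerows(rows)


# Made-up sample lines standing in for the real jsonl file.
sample_lines = [
    '{"label": "SARCASM", "response": "@USER Oh great, another Monday <URL>"}',
    '{"label": "NOT_SARCASM", "response": "@USER Thanks for the help!"}',
    '{"label": "SARCASM", "response": "Love waiting in line <URL>"}',
    '{"label": "NOT_SARCASM", "response": "See you at noon @USER"}',
]

rows = to_rows(read_jsonl(sample_lines))
train_rows, validation_rows = split_rows(rows, validation_fraction=0.25)

buffer = io.StringIO()
write_csv(train_rows, buffer)
print(buffer.getvalue())
```

Writing to an in-memory buffer keeps the example self-contained; in the notebook the same writer would target a file in the Colab workspace.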

Dependencies

python
tensorflow
transformers
pandas
sklearn
os
urllib
jsonlines
csv

References

Sentiment Analysis in 10 Minutes with BERT and TensorFlow by Orhan G. Yalçın
FigLang2020-Sarcasm-detection Github

