
Toxic_Comment_Classification

My attempt at the Kaggle Toxic Comment Classification Challenge.

I built a model that estimates the probability of a comment belonging to each of the six toxicity classes (toxic, severe_toxic, obscene, threat, insult, identity_hate). I used XGBoost on feature vectors generated from GloVe and Google News Word2Vec embeddings.

The model achieved an overall AUC of 0.82.
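As a rough guide, here is a minimal sketch of that pipeline (not the repository's exact code): it averages pre-trained Word2Vec embeddings per comment and trains one XGBoost classifier per class. The file names, hyperparameters, and the comment_to_vec helper are illustrative assumptions.

```python
# Minimal sketch: average pre-trained word vectors per comment, then train
# one XGBoost classifier per toxicity class. Paths and parameters are
# illustrative assumptions, not the repo's actual settings.
import numpy as np
import pandas as pd
import xgboost as xgb
from gensim.models import KeyedVectors

LABELS = ["toxic", "severe_toxic", "obscene", "threat", "insult", "identity_hate"]

w2v = KeyedVectors.load_word2vec_format(
    "GoogleNews-vectors-negative300.bin", binary=True
)

def comment_to_vec(text, dim=300):
    """Average the embeddings of all in-vocabulary tokens in a comment."""
    tokens = [t for t in text.lower().split() if t in w2v]
    if not tokens:
        return np.zeros(dim)
    return np.mean([w2v[t] for t in tokens], axis=0)

train = pd.read_csv("train.csv")
X = np.vstack([comment_to_vec(c) for c in train["comment_text"]])

# One binary classifier per class; the competition metric is the mean
# column-wise AUC across the six labels.
for label in LABELS:
    clf = xgb.XGBClassifier(n_estimators=300, max_depth=6, learning_rate=0.1)
    clf.fit(X, train[label])
```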

Resources needed:

  • Download the data from the Kaggle competition page here
  • Download the GloVe word vectors here; choose the glove.840B.300d model
  • Download the GoogleNews Word2Vec vectors here
  • To use the Keras model built in example_to_clarify.py, download the 20 Newsgroups dataset

Note:

final_try.py is an implementation of the XGBoost algorithm on the same data.

To Do:

  1. You can definitely do much more hyperparameter optimization, especially for the LSTM model. For example, try playing around with max_features, max_len, dropout_rate, the size of the Dense layer, etc. (see the sketch after this list).

  2. You can try different feature engineering and normalization techniques for the text data.

  3. In general, try playing around with parameters like batch_size, num_epochs, and learning_rate.

  4. Try a different optimizer, such as Adagrad, Adadelta, or SGD.
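To show where those knobs live, here is a minimal Keras LSTM in the spirit of the one in this repo; every layer size and hyperparameter below is an illustrative starting point, not the repository's actual configuration.

```python
import numpy as np
from keras.models import Sequential
from keras.layers import Embedding, LSTM, Dropout, Dense

# All values below are assumptions to tune, not the repo's settings.
max_features = 20000   # vocabulary size
max_len = 100          # padded sequence length
dropout_rate = 0.2
dense_size = 64

model = Sequential([
    Embedding(max_features, 128, input_length=max_len),
    LSTM(64),
    Dropout(dropout_rate),
    Dense(dense_size, activation="relu"),
    Dense(6, activation="sigmoid"),  # one independent output per toxicity class
])

# Swap "adam" for "adagrad", "adadelta", or "sgd" to compare optimizers;
# pass an optimizer object (e.g. keras.optimizers.Adam(learning_rate=...))
# to control the learning rate directly.
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])

# Stand-in data so the sketch runs; in practice these come from tokenizing
# the comments and stacking the six label columns.
X_train = np.random.randint(0, max_features, size=(256, max_len))
y_train = np.random.randint(0, 2, size=(256, 6))
model.fit(X_train, y_train, batch_size=32, epochs=2, validation_split=0.1)
```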
