Welcome to a Natural Language Processing series using the Natural Language Toolkit (NLTK) module with Python.
- A token is each "entity" that is part of whatever was split up based on rules.
- For example, each word is a token when a sentence is "tokenized" into words.
- Each sentence can also be a token if you tokenize the sentences out of a paragraph, as in the sketch below.
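A minimal sketch of both kinds of tokenization with NLTK; the sample text here is just an illustration:

```python
from nltk.tokenize import sent_tokenize, word_tokenize

# The Punkt tokenizer models may need to be fetched once via nltk.download("punkt").
example_text = "Hello Mr. Smith, how are you doing today? The weather is great and Python is awesome."

# Sentence tokens: each sentence of the paragraph becomes one token.
print(sent_tokenize(example_text))

# Word tokens: each word (and punctuation mark) becomes one token.
print(word_tokenize(example_text))
```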
- Stop words are words that carry little to no meaning, so we want to remove them.
- Words like "we", "she", "is", "a", and so on; a filtering sketch follows below.
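One way to filter stop words out of a tokenized sentence, assuming the English stop word list that ships with NLTK's corpora; the example sentence is made up:

```python
from nltk.corpus import stopwords
from nltk.tokenize import word_tokenize

# The stop word list may need nltk.download("stopwords") first.
stop_words = set(stopwords.words("english"))

example_sentence = "This is a sample sentence, showing off the stop words filtration."
words = word_tokenize(example_sentence)

# Keep only the tokens that are not in the English stop word list.
filtered_sentence = [w for w in words if w.lower() not in stop_words]
print(filtered_sentence)
```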
- The idea of stemming is a sort of normalization method.
- Many variations of words carry the same meaning, other than when tense is involved.
- The reason we stem is to shorten the lookup and normalize sentences, as in the sketch below.
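A short sketch of stemming with NLTK's PorterStemmer; the example words and sentence are invented for illustration:

```python
from nltk.stem import PorterStemmer
from nltk.tokenize import word_tokenize

ps = PorterStemmer()

# Variations of the same word reduce to a common stem.
example_words = ["python", "pythoner", "pythoning", "pythoned"]
for w in example_words:
    print(ps.stem(w))

# Stemming a whole sentence, token by token.
new_text = "It is very important to be pythonly while you are pythoning with python."
print([ps.stem(w) for w in word_tokenize(new_text)])
```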
- Stemming can often create non-existent words, whereas lemmas are actual words.
- A stem sometimes has no meaning in a dictionary, but a lemma will definitely have meaning; see the lemmatization sketch below.
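For contrast, a lemmatization sketch using NLTK's WordNetLemmatizer; the words are arbitrary examples, and the pos argument tells the lemmatizer which part of speech to assume (it defaults to noun):

```python
from nltk.stem import WordNetLemmatizer

# WordNet data may need nltk.download("wordnet") first.
lemmatizer = WordNetLemmatizer()

print(lemmatizer.lemmatize("cats"))              # cat
print(lemmatizer.lemmatize("geese"))             # goose
print(lemmatizer.lemmatize("better", pos="a"))   # good
print(lemmatizer.lemmatize("running", pos="v"))  # run
```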
- Part-of-speech tagging means labeling the words in a sentence as nouns, adjectives, verbs, etc., as in the sketch below.
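A quick sketch of part-of-speech tagging on a made-up sentence:

```python
import nltk
from nltk.tokenize import word_tokenize

# The tagger model may need nltk.download("averaged_perceptron_tagger") first.
sentence = "The quick brown fox jumps over the lazy dog."
tokens = word_tokenize(sentence)

# Each token is paired with a Penn Treebank tag such as DT, JJ, NN, or VBZ.
print(nltk.pos_tag(tokens))
```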
- Chunking groups words into hopefully meaningful chunks.
- One of the main goals of chunking is to group words into what are known as "noun phrases."
- These are phrases of one or more words that contain a noun and maybe some descriptive words, as in the sketch below.
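A sketch of chunking noun phrases with a regular-expression grammar; the sentence and the grammar here are simple illustrations, not the only way to define a noun phrase:

```python
import nltk
from nltk.tokenize import word_tokenize

sentence = "The little yellow dog barked at the big cat."
tagged = nltk.pos_tag(word_tokenize(sentence))

# Noun phrase: an optional determiner, any number of adjectives, then a noun.
grammar = r"NP: {<DT>?<JJ>*<NN>}"
chunk_parser = nltk.RegexpParser(grammar)

# The result is a Tree whose NP subtrees are the noun-phrase chunks.
chunked = chunk_parser.parse(tagged)
print(chunked)
```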
- The idea of named entity recognition is to have the machine immediately be able to pull out "entities" like people, places, things, locations, monetary figures, and more.
- There are two major options with NLTK's named entity recognition:
- Either recognize all named entities
- Or recognize named entities as their respective type, like people, locations, organizations, etc.; see the sketch below.
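A sketch of both options using nltk.ne_chunk on an invented sentence; binary=True lumps every named entity under a single NE label, while the default keeps the types:

```python
import nltk
from nltk.tokenize import word_tokenize

# ne_chunk relies on the "maxent_ne_chunker" and "words" resources from nltk.download().
sentence = "Mark Zuckerberg founded Facebook in California."
tagged = nltk.pos_tag(word_tokenize(sentence))

# Option 1: all named entities, labeled simply as NE.
print(nltk.ne_chunk(tagged, binary=True))

# Option 2: named entities labeled by type, such as PERSON, ORGANIZATION, or GPE.
print(nltk.ne_chunk(tagged, binary=False))
```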