Stop Words – Natural Language Processing With Python and NLTK p.2

One of the largest parts of any data analysis, natural language processing included, is pre-processing. This is the process used to "clean up" and prepare your data for analysis.

One of the first pre-processing steps is to filter out stop words. Stop words are words that you want to exclude from any analysis. These are words that carry little meaning on their own, or carry conflicting meanings that you simply do not want to deal with.

The NLTK module comes pre-packaged with sets of stop words for many languages, and you can easily append more words to these lists.
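
As a rough sketch of the filtering idea described above: the snippet below uses a tiny hand-picked stop-word set and a simple regex tokenizer so that it runs without any downloads. Both `SAMPLE_STOP_WORDS` and `simple_tokenize` are illustrative stand-ins, not part of NLTK; with NLTK installed and its `stopwords` corpus downloaded, you would typically build the set with `set(stopwords.words("english"))` and tokenize with `word_tokenize` instead.

```python
import re

# A tiny, hand-picked stop-word set for illustration only (an assumption).
# With NLTK and its corpus available you would typically use:
#   from nltk.corpus import stopwords
#   stop_words = set(stopwords.words("english"))
SAMPLE_STOP_WORDS = {"this", "is", "an", "a", "the", "off"}


def simple_tokenize(text):
    """Rough stand-in for nltk.word_tokenize: splits words and punctuation."""
    return re.findall(r"\w+|[^\w\s]", text)


def remove_stop_words(tokens, stop_words):
    """Return the tokens whose lowercased form is not in stop_words."""
    # Lowercasing makes "This" match the stop word "this".
    return [t for t in tokens if t.lower() not in stop_words]


example = "This is an example showing off stop word filtration."
tokens = simple_tokenize(example)
filtered = remove_stop_words(tokens, SAMPLE_STOP_WORDS)

# Appending your own stop words is ordinary set manipulation.
custom_stop_words = SAMPLE_STOP_WORDS | {"example"}
filtered_custom = remove_stop_words(tokens, custom_stop_words)

print(filtered)
print(filtered_custom)
```

Keeping the stop words in a set rather than a list makes each membership check fast, and comparing lowercased tokens avoids the case where a capitalized word at the start of a sentence slips through the filter.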

Playlist link: https://www.youtube.com/watch?v=FLZvOKSCkxY&list=PLQVvvaa0QuDf2JswnfiGkliBInZnIC4HL&index=1

sample code: http://pythonprogramming.net
http://hkinsley.com
https://twitter.com/sentdex
http://sentdex.com
http://seaofbtc.com

Comments

Mj says:

They are using your material without referencing it here: https://pythonspot.com/nltk-stop-words/

lucid storm says:

what if you want to print only first rows of the non tokenized and tokenized thingy (when a text is very long)

Vadivel chennai says:

Pretty simple and clean. Keep it up. If possible pl reduce key stroke sound.

Suvid Singhal says:

I am having an error

TypeError                                 Traceback (most recent call last)
<ipython-input-9-84092b7d701b> in <module>
      6
      7 for w in words:
----> 8     if w not in stopwords:
      9         filtered_sentence.append(w)
     10 print(filtered_sentence)

TypeError: argument of type 'WordListCorpusReader' is not iterable

Code:

    example = "Hello, Mr. John. How are you?"
    stop_words = set(stopwords.words("english"))
    words = word_tokenize(example)
    filtered_sentence = []
    for w in words:
        if w not in stopwords:
            filtered_sentence.append(w)
    print(filtered_sentence)

Sasha Marova says:

I love you!!!!!! Ur videos are really helpful and easy to learn!

lucid storm says:

so you imported some ready set of word, how do you import your own text for analysis

Bhargab Sarma says:

Thanks... very nicely explained

Alec says:

'one-liner'

you can tell he doesn't have a formal education in python because he doesn't know the difference between for loops & generators, which are completely different objects.

laxman banoth says:

Time stamp order program in txt summary

Jiyeon Jeon says:

great video. nlp learners needs you;)

Gokul Sundeep says:

6:10 how did you do that commenting?

Ross Moffitt says:

Thank you sir. This is very helpful.

Shravankumar Shetty says:

Hey I'm getting an error as first download nltk.download("stopwords"). But once I do it, still getting an error as, 'WordListCorpusReader' object has no attribute 'word'

UPDATE: it's working fine, once I did nltk.download("punkt")

mohammad zaman dehbashi says:

thank you man

Sidharth Babu says:

Hi I'd like to do a question answering system using nltk and I'd like to know that the necessary tools for the same and the procedures/stages also…hope that you'd notice this comment and process my request ..thanku☺

Chris Austin says:

lol nice one 🙂

Triều Lê says:

can I build other kinds of language with NLTK, or it only apply English?

Yoseph Solomon says:

hehehe, "there's that D again…."

Vidit Khanna says:

Appreciate short duration videos.

Shrinivas Iyengar says:

Hi. In your previous video, you had an example text "Hello Mr. Smith, how are you doing today? The weather is great, and Python is awesome. The sky is pinkish-blue. You should not eat cardboard." When you run the same "stop-word-removal" code on this text, the output is: ['Hello', 'Mr.', 'Smith', ',', 'today', '?', 'The', 'weather', 'great', ',', 'Python', 'awesome', '.', 'The', 'sky', 'pinkish-blue', '.', 'You', 'eat', 'cardboard', '.']
Don't you think this just omits much of the important parts of the text?

Software Developer says:

When I compare the word to the stop words the loop removes the letters, not the word.

TrailData Analytics says:

Great one !

Pradeep Singh says:

'this' is a stop word…but 'This' is not ???

syeda bushra says:

how to remove stopwords in a complete file????how to set a directory of a file???

Apeksha Tadge says:

Sometimes removing the stop words can change the meaning of sentence ..how can we handle this ? help needed  words like cannot,not etc

shaik nashwa says:

I need the stopwords removal for telugu language

rohit sancheti says:

this reminds me of Kevin from The Office!!

Homero Baroni says:

Hello, I'm still trying to understand this w letters between the "for" and "in". Are they specific to nltk?

Prawigya Pariyar says:

why did you use a set to get the stop words if stopwords.word("english") is already a list. Is there any reason behind that?
