Stop Words – Natural Language Processing With Python and NLTK p.2

One of the largest parts of any data analysis, natural language processing included, is pre-processing: the steps used to "clean up" and prepare your data for analysis.

One of the first pre-processing steps is to filter out stop words. Stop words are words that you want to exclude from an analysis. These are typically very common words, such as "the", "is", and "and", that carry little meaning on their own, or words that carry conflicting meanings that you simply do not want to deal with.

The NLTK module comes pre-packaged with sets of stop words for many languages, but you can also easily append more words to a list.
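As a rough illustration of the idea, here is a minimal sketch of stop word filtering and of extending the list. The tiny `stop_words` set and the hand-written `tokens` list are assumptions made so the sketch runs without downloading anything; with NLTK and its stopwords corpus available, the set would instead come from `set(stopwords.words("english"))` and the tokens from `word_tokenize`.

```python
# Assumed stand-in for set(stopwords.words("english")); a real list is much longer.
stop_words = {"the", "is", "and"}

# Assumed stand-in for the output of word_tokenize on some text.
tokens = ["the", "sky", "is", "pinkish-blue", "and", "python", "is", "awesome"]

# Keep only the tokens that are not stop words.
filtered = [t for t in tokens if t not in stop_words]
print(filtered)

# Extending the list: add your own words to the set before filtering.
custom_stop_words = stop_words | {"awesome"}
filtered_custom = [t for t in tokens if t not in custom_stop_words]
print(filtered_custom)
```

Using the union operator leaves the original set untouched, so the default and the customized lists can be compared side by side.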

I am having an error:

TypeError                                 Traceback (most recent call last)
<ipython-input-9-84092b7d701b> in <module>
      7 for w in words:
----> 8     if w not in stopwords:
      9         filtered_sentence.append(w)
     10 print(filtered_sentence)

TypeError: argument of type 'WordListCorpusReader' is not iterable


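from nltk.corpus import stopwords
from nltk.tokenize import word_tokenize
example = "Hello, Mr. John. How are you?"
stop_words = set(stopwords.words("english"))
words = word_tokenize(example)
filtered_sentence = []
for w in words:
    if w not in stopwords:
        filtered_sentence.append(w)
print(filtered_sentence)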
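The traceback in this comment points at the membership test: it checks `w` against `stopwords`, the imported corpus reader, rather than against `stop_words`, the set built from it. The sketch below reproduces that situation without NLTK. `StandInCorpusReader` is a hypothetical stand-in written for this example (it is not an NLTK class), and the token list is hand-written rather than produced by `word_tokenize`.

```python
class StandInCorpusReader:
    """Hypothetical stand-in for the corpus reader: it has words(),
    but it is not itself a container you can test membership against."""

    def words(self, fileid):
        # Assumed sample; the real English list is much longer.
        return ["how", "are", "you"]


stopwords = StandInCorpusReader()
stop_words = set(stopwords.words("english"))

# Hand-written tokens standing in for word_tokenize(example).
words = ["Hello", ",", "Mr.", "John", ".", "How", "are", "you", "?"]

# Testing against the reader object itself fails with a TypeError.
try:
    "are" in stopwords
    raised = False
except TypeError:
    raised = True
print("membership test on the reader raised TypeError:", raised)

# Testing against the set (note the underscore) works as intended.
filtered_sentence = []
for w in words:
    if w not in stop_words:
        filtered_sentence.append(w)
print(filtered_sentence)
```

"How" survives the filter here because the comparison is case-sensitive and the sample list only contains the lowercase form.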



So you imported a ready-made set of words. How do you import your own text for analysis?

you can tell he doesn't have a formal education in python because he doesn't know the difference between for loops & generators, which are completely different objects.

6:10 how did you do that commenting?

Hey, I'm getting an error telling me to first run nltk.download("stopwords"). But once I do that, I still get an error: 'WordListCorpusReader' object has no attribute 'word'

UPDATE: it's working fine once I ran nltk.download("punkt")

Hi, I'd like to build a question answering system using NLTK, and I'd like to know the necessary tools and the procedures/stages involved… I hope you'll notice this comment and consider my request. Thank you ☺

Can I work with other languages in NLTK, or does it only apply to English?

Hi. In your previous video, you had an example text "Hello Mr. Smith, how are you doing today? The weather is great, and Python is awesome. The sky is pinkish-blue. You should not eat cardboard." When you run the same "stop-word-removal" code on this text, the output is: ['Hello', 'Mr.', 'Smith', ',', 'today', '?', 'The', 'weather', 'great', ',', 'Python', 'awesome', '.', 'The', 'sky', 'pinkish-blue', '.', 'You', 'eat', 'cardboard', '.']
Don't you think this just omits much of the important parts of the text?

When I compare the word to the stop words the loop removes the letters, not the word.
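One possible cause of this symptom, offered as a guess because the comment does not include its code, is looping over the raw string instead of over a list of tokens. Iterating a Python string yields single characters, so any one-letter stop word removes that letter everywhere. In this sketch the stop word set is an assumed sample and `str.split()` stands in for `word_tokenize`.

```python
stop_words = {"a", "i", "is", "this"}  # assumed sample, not the NLTK list
sentence = "this is a big idea"

# Looping over the string compares single characters against the stop words,
# so every "a" and "i" character is dropped.
by_char = [c for c in sentence if c not in stop_words]
print("".join(by_char))

# Looping over tokens compares whole words instead.
by_word = [w for w in sentence.split() if w not in stop_words]
print(by_word)
```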

'this' is a stop word…but 'This' is not ???
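The membership test is an exact string comparison, and NLTK's English stop word list is lowercase, so "This" does not match "this". A minimal sketch of one way around it is to lowercase each token only for the comparison. The stop word set and the tokens below are assumed samples.

```python
stop_words = {"this", "is", "a"}  # assumed sample of lowercase stop words
tokens = ["This", "is", "a", "Test"]

# Exact comparison: "This" is kept because only "this" is in the set.
case_sensitive = [t for t in tokens if t not in stop_words]
print(case_sensitive)

# Lowercase the token for the comparison, but keep its original casing
# in the output.
case_insensitive = [t for t in tokens if t.lower() not in stop_words]
print(case_insensitive)
```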

How do I remove stop words from a complete file? And how do I set the directory of a file?

Sometimes removing the stop words can change the meaning of a sentence. Can we handle this? Help needed with words like "cannot", "not", etc.
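One common approach, sketched here under assumptions rather than taken from the tutorial, is to remove the negation words from the stop word set before filtering so they stay in the text. The stop word set, the `negations` set, and the tokens are all hand-written samples.

```python
stop_words = {"i", "do", "not", "no", "the", "this", "is"}  # assumed sample
negations = {"not", "no"}  # words we choose to keep; adjust to your task

tokens = ["i", "do", "not", "like", "this", "movie"]

# Plain filtering drops "not" and flips the apparent meaning.
plain = [t for t in tokens if t not in stop_words]
print(plain)

# Set difference gives a stop word set that no longer contains the negations.
negation_aware_stop_words = stop_words - negations
negation_aware = [t for t in tokens if t not in negation_aware_stop_words]
print(negation_aware)
```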

I need stop word removal for the Telugu language.

Hello, I'm still trying to understand the w between the "for" and the "in". Is it specific to NLTK?

Why did you use a set for the stop words if stopwords.words("english") already returns a list? Is there any reason behind that?
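A likely reason is lookup speed: `in` on a list scans the elements one by one, while `in` on a set is a hash lookup that takes roughly constant time on average. The filtered result is the same either way. The sketch below uses a generated list of 200 placeholder words as an assumed stand-in for the real stop word list, and the timings it prints will vary by machine.

```python
import timeit

# Assumed stand-in for stopwords.words("english"): 200 placeholder words.
stop_list = [f"word{i}" for i in range(200)]
stop_set = set(stop_list)

tokens = ["word199", "python", "word0", "nltk"] * 1000

# Both containers give the same filtered result.
from_list = [t for t in tokens if t not in stop_list]
from_set = [t for t in tokens if t not in stop_set]

# Time the two membership tests; the set is usually much faster.
list_time = timeit.timeit(
    lambda: [t for t in tokens if t not in stop_list], number=5
)
set_time = timeit.timeit(
    lambda: [t for t in tokens if t not in stop_set], number=5
)
print(f"list: {list_time:.4f}s  set: {set_time:.4f}s")
```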

