Natural Language Processing | TF-IDF for Machine Learning | Text Preprocessing



Here is a detailed discussion of the Term Frequency and Inverse Document Frequency in Natural Language Processing.
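As a rough illustration of the two quantities, here is a minimal from-scratch sketch using one common textbook definition. Term frequency is the word's count divided by the number of words in the sentence, and inverse document frequency is log(N / df), where N is the number of sentences and df is how many of them contain the word. The three preprocessed sentences ("boy good", "girl good", "boy girl good") are an assumption modelled on the boy/girl example in the comments. Library implementations such as scikit-learn use a smoothed variant, so their numbers differ.

```python
import math
from collections import Counter

# Assumed toy corpus: the boy/girl sentences after lowercasing and stop-word removal.
corpus = ["boy good", "girl good", "boy girl good"]
docs = [sentence.split() for sentence in corpus]
n_docs = len(docs)

# Vocabulary: every distinct word, sorted so the columns have a stable order.
vocab = sorted({word for doc in docs for word in doc})

# Document frequency: number of sentences that contain each word at least once.
df = {word: sum(word in doc for doc in docs) for word in vocab}

# Textbook IDF: log(N / df). A word present in every sentence gets log(1) = 0.
idf = {word: math.log(n_docs / df[word]) for word in vocab}

def tf_idf_row(doc):
    """TF (count / sentence length) multiplied by IDF for every vocabulary word."""
    counts = Counter(doc)
    return [counts[word] / len(doc) * idf[word] for word in vocab]

matrix = [tf_idf_row(doc) for doc in docs]

print(vocab)
for row in matrix:
    print([round(value, 4) for value in row])
```

With this definition the "good" column is zero in every row, because "good" occurs in all three sentences and therefore carries no information for telling them apart.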

For more videos on ML or deep learning, please check the links below:

NLP playlist:…

Deep Learning :

Statistics in ML :

Feature Engineering:

Data Preprocessing Techniques:

Machine learning:


A K says:

Honestly, I searched many videos to understand TF-IDF... this video is “The best” among the rest!

Ishant Yadav says:

Sir, this is the best playlist to learn about NLP, and it saves a lot of time researching source material. I am referring to it for my internship. Really underrated channel; I wish you many more subscribers and much more support.

Anjali Arora says:

So you haven't implemented TF-IDF completely? I could not understand where the vocabulary is.


Hi @Krish
I have tried using TF-IDF on the boy/girl example, but it does not give 0 for the word "good". I am getting the following result:
['boy', 'girl', 'good']
array([[0.78980693, 0.        , 0.61335554],
       [0.        , 0.78980693, 0.61335554],
       [0.61980538, 0.61980538, 0.48133417]])
Why is that?
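The numbers above match scikit-learn's defaults rather than the textbook formula. By default `TfidfVectorizer` uses `smooth_idf=True`, which computes idf(t) = ln((1 + n) / (1 + df(t))) + 1, and then L2-normalises each row (`norm='l2'`). Because of the "+ 1", a word that appears in every sentence, like "good", gets an IDF of 1 instead of 0. The sketch below recomputes the matrix by hand with NumPy, assuming the three sentences reduce to "boy good", "girl good" and "boy girl good" after preprocessing:

```python
import numpy as np

# Raw term counts; columns ordered as ['boy', 'girl', 'good'] (assumed preprocessing).
counts = np.array([
    [1, 0, 1],  # "boy good"
    [0, 1, 1],  # "girl good"
    [1, 1, 1],  # "boy girl good"
], dtype=float)

n_docs = counts.shape[0]
df = (counts > 0).sum(axis=0)  # sentences containing each word: [2, 2, 3]

# scikit-learn's smoothed IDF: ln((1 + n) / (1 + df)) + 1, so "good" gets 1.0, not 0.
idf = np.log((1 + n_docs) / (1 + df)) + 1

# TF-IDF before normalisation, then scale each row to unit Euclidean (L2) length.
weights = counts * idf
weights /= np.linalg.norm(weights, axis=1, keepdims=True)

print(np.round(weights, 8))
```

The row normalisation is what makes "good" look large in the first two rows: after the IDF step it is only slightly smaller than "boy", and both are rescaled so the row has length 1.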

Prem ranjan says:

Sir, I love your work. I am currently doing a specialization in Data Science and AI, and I have learned more from you than in my two years of college. Keep up the good work, sir!

manoj jena says:

Does the code work for Odia language text?

Moulindu Sarkar says:

I used TF-IDF on the paragraph you considered here, but got something like this:
paragraph = """The boy is good. The girl is good. The boy and girl are good"""
array([[0.78980693, 0.        , 0.61335554],
       [0.        , 0.78980693, 0.61335554],
       [0.61980538, 0.61980538, 0.48133417]])
But according to the formula, shouldn't the first and second rows each contain two zeros?

suvarna deore says:

Thank you, Krish sir

dharmendra singh says:

Excellent explanation!

Arnold Nana says:

Great video

Dhiraj Sharma says:

Error: Expected 2D array, got 1D array instead

Dhiraj Sharma says:

Hi, thank you so much for the awesome video. I am getting the error mentioned below; could you please help me with it? Thanks
Error : ""

salman haider says:

I'm getting an error while importing the TfidVectorizer package.
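One thing worth checking: the class is spelled `TfidfVectorizer` ("Tfidf", not "Tfid") and lives in `sklearn.feature_extraction.text`. The package is installed with pip as `scikit-learn` but imported as `sklearn`. A minimal check, assuming scikit-learn is installed:

```python
# The pip package is "scikit-learn"; the import name is "sklearn".
import sklearn
from sklearn.feature_extraction.text import TfidfVectorizer  # note: "Tfidf", not "Tfid"

print(sklearn.__version__)

# A tiny fit to confirm the import works: 2 documents, 3 distinct words.
print(TfidfVectorizer().fit_transform(["boy good", "girl good"]).shape)
```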

Henok Gashaw says:

Excellent explanation!!!!!


Great Video!!!

Fun Time says:

Thank you sir!


Hey Krish, you are the best instructor I have ever seen; you deserve more and more. You explain each thing the way it should be explained. I will always be one of your members; I have learnt something from each of your videos.

Anant Chourasia says:

Bro that was an awesome explanation 🤐 Keep it up ✌🏻

Shubham Teke says:

Sir, I got an error: 'list' object has no attribute 'lower'. How do I solve this error?
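This error typically appears when the vectorizer is given a list of token lists (for example, the output of tokenising each sentence) instead of a list of strings, because its default preprocessing calls `.lower()` on each document. A minimal sketch of the usual fix, assuming each sentence has already been tokenised, joins the tokens back into one string per sentence before vectorising:

```python
from sklearn.feature_extraction.text import TfidfVectorizer

# Assumed input: each sentence already tokenised into a list of words.
tokenised = [["boy", "good"], ["girl", "good"], ["boy", "girl", "good"]]

# Passing `tokenised` directly fails: each document is lowercased with str.lower(),
# and a list has no .lower() method. Join the tokens back into strings instead.
corpus = [" ".join(tokens) for tokens in tokenised]

vectorizer = TfidfVectorizer()
X = vectorizer.fit_transform(corpus)
print(X.shape)
```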

Karthikeyan Palanisamy says:

Amazing Content. No words to thank for explaining so beautifully 🙂

Kayode Oyedele says:

This is great. You are the best, man. Really nice videos.

Sandipan Sarkar says:

Superb video, Krish; it really helps in understanding NLP. Thanks.

dnakhawa says:

You are the best in Data Science, sir.

Sudeshna Dutta says:

Hi Krish. I had one question. There's a parameter called max_features in CountVectorizer as well as in TfidfVectorizer. How does it work? I am assuming that when the frequency distribution is calculated and sorted, we can choose the top 'n' features. Is that correct? Please let me know. Thank you

Aditya Sharma says:

Fantastic, sir, you are making Data Science easier day by day. One query, sir: when we convert the corpus using TfidfVectorizer or CountVectorizer, we get an array of shape (31, 114). What is this 114?
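The matrix returned by `fit_transform` has shape (number of documents, number of vocabulary terms). So (31, 114) would mean 31 documents, presumably the sentences of the paragraph, and 114 distinct terms that survived preprocessing, with one column per term. A quick sketch to confirm this on a small assumed corpus:

```python
from sklearn.feature_extraction.text import TfidfVectorizer

corpus = ["boy good", "girl good", "boy girl good"]  # 3 documents, 3 distinct words

vectorizer = TfidfVectorizer()
X = vectorizer.fit_transform(corpus)

n_docs, n_terms = X.shape
print(n_docs, n_terms)

# The second dimension always equals the size of the learned vocabulary.
print(n_terms == len(vectorizer.vocabulary_))
```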

sindhu nannapaneni says:

Thank you so much for taking a step forward to help all who strive to learn data science; kudos to your great efforts. Your videos are really helpful, and I am gaining knowledge as well as confidence by watching them. God bless you, Krish.
