Natural Language Processing (Part 5): Topic Modeling with Latent Dirichlet Allocation in Python

This six-part video series goes through an end-to-end Natural Language Processing (NLP) project in Python to compare stand-up comedy routines.

– Natural Language Processing (Part 1): Introduction to NLP & Data Science
– Natural Language Processing (Part 2): Data Cleaning & Text Pre-Processing in Python
– Natural Language Processing (Part 3): Exploratory Data Analysis & Word Clouds in Python
– Natural Language Processing (Part 4): Sentiment Analysis with TextBlob in Python
– Natural Language Processing (Part 5): Topic Modeling with Latent Dirichlet Allocation in Python
– Natural Language Processing (Part 6): Text Generation with Markov Chains in Python

All of the supporting Python code can be found here: https://github.com/adashofdata/nlp-in-python-tutorial
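
For orientation, Part 5 builds a gensim LDA model on top of a scikit-learn document-term matrix. A minimal sketch of that pipeline (the toy documents below are made up; see the repository for the full notebook):

from sklearn.feature_extraction.text import CountVectorizer
from gensim import matutils, models
import scipy.sparse

# toy documents, just to make the sketch self-contained
docs = ["the cat sat on the mat",
        "dogs and cats make great pets",
        "stock markets rise and fall daily"]

# document-term matrix of word counts
cv = CountVectorizer(stop_words="english")
dtm = cv.fit_transform(docs)

# gensim expects a term-document matrix (terms as rows, documents as columns)
corpus = matutils.Sparse2Corpus(scipy.sparse.csr_matrix(dtm.transpose()))
id2word = dict((v, k) for k, v in cv.vocabulary_.items())

# fit a small LDA model and inspect the topics
lda = models.LdaModel(corpus=corpus, id2word=id2word, num_topics=2, passes=10)
print(lda.print_topics())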

Comments

Praveen Kumar Maduri says:

Great explanation and nice work

Solopacker Podcast says:

Nice! Really enjoyed the explanation! We're googling around for a technique to identify a SEQUENCE of topics within documents, to test the hypothesis that most of "these" documents follow a similar order of topics. If you happen to know a resource we can check out, we'd appreciate the nudge =) Best wishes and stay safe

sai bhargav L says:

Nice video, but the voice is very low. Please use a better mic.

Claudiu Clement says:

This is by far the best LDA explanation video. Awesome job!

Rachhek Shrestha says:

Haha, the words in the topics are so inappropriate. But great video!

huda baraiki says:

Thank you Alice so so so much for this amazing illustration and application! 🌻

Soumyadip Sarkar says:

🔥🔥🔥

lalith Shankar says:

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-137-cbe5724bdb63> in <module>()
      1 corpus_transformed = ldana[corpusna]
      2 corpus_transformed
----> 3 list(zip([a for [(a,b)] in corpus_transformed], data_dtmna.index))

<ipython-input-137-cbe5724bdb63> in <listcomp>(.0)
      1 corpus_transformed = ldana[corpusna]
      2 corpus_transformed
----> 3 list(zip([a for [(a,b)] in corpus_transformed], data_dtmna.index))

ValueError: too many values to unpack (expected 1)

I get this when I try to run the last line. How do I fix it?
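
The [(a,b)] pattern in that last line assumes gensim returns exactly one (topic, probability) pair per document; when a document is assigned more than one topic above the probability threshold, the unpacking fails with this ValueError. One possible workaround (keeping the notebook's names ldana, corpusna, and data_dtmna) is to keep only the most probable topic for each document:

corpus_transformed = ldana[corpusna]
# take the highest-probability topic per document instead of assuming a single pair
dominant_topics = [max(doc, key=lambda pair: pair[1])[0] for doc in corpus_transformed]
list(zip(dominant_topics, data_dtmna.index))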

Wenyang Qian says:

One might need these downloads for it to work:

import nltk
nltk.download('punkt')
nltk.download('averaged_perceptron_tagger')
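
punkt is the tokenizer model behind nltk's word_tokenize and averaged_perceptron_tagger is the part-of-speech model behind pos_tag, which the notebook's nouns-only filtering relies on. A quick check that both downloads worked (the sentence is just a toy example):

from nltk import word_tokenize, pos_tag

# needs the 'punkt' and 'averaged_perceptron_tagger' models downloaded above
tokens = word_tokenize("The comedian told a long joke about airports.")
nouns = [word for word, tag in pos_tag(tokens) if tag.startswith('NN')]
print(nouns)   # expected: ['comedian', 'joke', 'airports']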

Wenyang Qian says:

Much better than DeepLearning's NLP series!!! What a gem on YouTube. Thank you!

Susovan Dey says:

Great… You explained it in an awesome way. Btw, can this be used for text clustering??
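
LDA output can be used that way: assigning each document to its most probable topic gives a rough (soft) clustering of the texts. A minimal sketch, assuming a fitted gensim LdaModel named lda and a matching gensim corpus:

# the dominant topic per document doubles as a cluster label
cluster_labels = [max(lda[doc], key=lambda pair: pair[1])[0] for doc in corpus]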

Kay Yes says:

Thank you Alice. Very useful video

Acentro Puebla says:

Thank you so much. I learned a lot from your videos.

Inside Topo says:

This is the best LDA video I've ever seen. It is always easier to understand with examples.

Jae Hee Hwang says:

I had fun watching your videos. They were very real, applicable and informative! Thank you for these videos and I'll look forward to more videos from you. 🙂

Gels says:

are you Chinese?

ramnaresh raghuwanshi says:

Well explained. Thanks for uploading the video!!

Balasubramaniam Dakshinamoorthi says:

When I try to run your topic modeling code on my csv file, I get a "TypeError: no supported conversion for types: (dtype('O'),)" error. I changed your code as shown below since I don't have the pickle files:

import pandas as pd
from gensim import matutils, models
import scipy.sparse

data = pd.read_csv('C:/Users/tbadi/TestIncidentDataCSV.csv')   # my local file

tdm = data.transpose()
tdm.head()

sparse_counts = scipy.sparse.csr_matrix(tdm)
corpus = matutils.Sparse2Corpus(sparse_counts)

# cv = pickle.load(open("cv_stop.pkl", "rb"))
id2word = dict((v, k) for k, v in cv.vocabulary_.items())
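
That TypeError comes from handing raw text columns (dtype object) straight to scipy.sparse.csr_matrix, which needs a numeric document-term matrix, and cv ends up undefined once the pickle load is commented out. A possible fix is to build the counts and the vectorizer from the csv directly; this sketch assumes the text lives in a column named 'text':

import pandas as pd
import scipy.sparse
from sklearn.feature_extraction.text import CountVectorizer
from gensim import matutils, models

data = pd.read_csv('C:/Users/tbadi/TestIncidentDataCSV.csv')

# build a numeric document-term matrix first; csr_matrix cannot convert raw strings
cv = CountVectorizer(stop_words='english')
data_dtm = cv.fit_transform(data['text'].astype(str))   # 'text' is an assumed column name

# gensim expects terms as rows and documents as columns
sparse_counts = scipy.sparse.csr_matrix(data_dtm.transpose())
corpus = matutils.Sparse2Corpus(sparse_counts)
id2word = dict((v, k) for k, v in cv.vocabulary_.items())

lda = models.LdaModel(corpus=corpus, id2word=id2word, num_topics=2, passes=10)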

Ali Akram says:

Bless your soul!
