Comparing machine learning models in scikit-learn

We’ve learned how to train different machine learning models and make predictions, but how do we actually choose which model is “best”? We’ll cover the train/test split process for model evaluation, which allows you to avoid “overfitting” by estimating how well a model is likely to perform on new data. We’ll use that same process to locate optimal tuning parameters for a KNN model, and then we’ll re-train our model so that it’s ready to make real predictions.
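As a rough illustration of the workflow described above, the sketch below splits the data, tries a range of K values for KNN, picks the K with the highest testing accuracy, and then re-trains on all of the data. It assumes the iris dataset bundled with scikit-learn, a 40% test split, K from 1 to 25, and an arbitrary `random_state`; none of these specifics come from the text above.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn import metrics

# Assumption: the iris dataset stands in for "the data"
X, y = load_iris(return_X_y=True)

# Hold out 40% of the rows for testing (split size and seed are arbitrary choices)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.4, random_state=4
)

# Record the testing accuracy for each candidate value of K
scores = {}
for k in range(1, 26):
    knn = KNeighborsClassifier(n_neighbors=k)
    knn.fit(X_train, y_train)
    y_pred = knn.predict(X_test)
    scores[k] = metrics.accuracy_score(y_test, y_pred)

# Pick the K with the highest testing accuracy (ties go to the smallest K)
best_k = max(scores, key=scores.get)

# Re-train on ALL of the data before making real predictions
final_model = KNeighborsClassifier(n_neighbors=best_k)
final_model.fit(X, y)

# Predict the class of one made-up, out-of-sample observation
prediction = final_model.predict([[3, 5, 4, 2]])
```

Because the testing accuracy depends on which rows land in the test set, a different `random_state` may point to a different best K.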

Download the notebook: https://github.com/justmarkham/scikit-learn-videos
Quora explanation of overfitting: http://www.quora.com/What-is-an-intuitive-explanation-of-overfitting/answer/Jessica-Su
Estimating prediction error: https://www.youtube.com/watch?v=_2ij6eaaSl0&t=2m34s
Understanding the Bias-Variance Tradeoff: http://scott.fortmann-roe.com/docs/BiasVariance.html
Guiding questions for that article: https://github.com/justmarkham/DAT8/blob/master/homework/09_bias_variance.md
Visualizing bias and variance: http://work.caltech.edu/library/081.html

WANT TO GET BETTER AT MACHINE LEARNING? HERE ARE YOUR NEXT STEPS:

1) WATCH my scikit-learn video series:
https://www.youtube.com/playlist?list=PL5-da3qGB5ICeMbQuqbbCOQWcS6OYBr5A

2) SUBSCRIBE for more videos:
https://www.youtube.com/dataschool?sub_confirmation=1

3) JOIN “Data School Insiders” to access bonus content:
https://www.patreon.com/dataschool

4) ENROLL in my Machine Learning course:
https://www.dataschool.io/learn/

5) LET’S CONNECT!
– Newsletter: https://www.dataschool.io/subscribe/
– Twitter: https://twitter.com/justmarkham
– Facebook: https://www.facebook.com/DataScienceSchool/
– LinkedIn: https://www.linkedin.com/in/justmarkham/

Comments

Data School says:

Having problems with the code? I just finished updating the notebooks to use scikit-learn 0.23 and Python 3.9 🎉! You can download the updated notebooks here: https://github.com/justmarkham/scikit-learn-videos

Bijaya Manandhar says:

Would you please clarify why we need to use `solver='liblinear'` as one of the parameters in the LogisticRegression model? Why do we leave the rest of the parameters at their defaults? Also, why do we import `metrics` from `sklearn` to compute accuracy, when we could simply use the `score` method of the `LogisticRegression` model that we imported?
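On the accuracy part of this question: for a classifier, the model's own `score` method returns mean accuracy, so it should agree with `metrics.accuracy_score`. Here is a minimal sketch of that comparison, assuming the iris dataset and `max_iter=1000`; the solver is left at the library default rather than the `liblinear` setting asked about.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn import metrics

# Assumption: the iris dataset is used as example data
X, y = load_iris(return_X_y=True)

# max_iter is raised only so the default solver has room to converge
logreg = LogisticRegression(max_iter=1000)
logreg.fit(X, y)

# Option 1: predict first, then compare predictions with the true labels
y_pred = logreg.predict(X)
acc_from_metrics = metrics.accuracy_score(y, y_pred)

# Option 2: let the model predict and compute accuracy in one call
acc_from_score = logreg.score(X, y)
```

The `metrics` module is still handy when you already have predictions in hand, or when you want a metric other than accuracy.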

Royal Albert says:

your videos have helped me a lot. Thank you

Stanley SI says:

Stunningly clear logic and structure. Thank you!

Amit Kumar says:

Understanding the Bias-Variance Tradeoff: this doc has links to ensure sites have a check in Fig. 1

Richie Gamer says:

Thank you very much! 🙂

Saleh Afzoon says:

sklearn.cross_validation has been renamed to sklearn.model_selection

Hassan Abdullahi says:

What a perfect explanation, thank you sir

Aditya Sawant says:

Can we say that KNN would overfit as K values get smaller?
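One way to explore this question is to compare training accuracy with testing accuracy for several values of K. The sketch below assumes the iris dataset, a 40% test split, and an arbitrary set of K values. With K=1 each training point is its own nearest neighbor, so training accuracy is perfect even though testing accuracy may not be, which is the usual signature of an overly complex model.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

# Assumption: the iris dataset and this particular split are illustrative only
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.4, random_state=4
)

train_acc = {}
test_acc = {}
for k in (1, 5, 15, 30):
    knn = KNeighborsClassifier(n_neighbors=k)
    knn.fit(X_train, y_train)
    # Accuracy on the rows the model has already seen
    train_acc[k] = knn.score(X_train, y_train)
    # Accuracy on the held-out rows
    test_acc[k] = knn.score(X_test, y_test)

for k in train_acc:
    print(f"K={k:2d}  train={train_acc[k]:.3f}  test={test_acc[k]:.3f}")
```

A large gap between the two columns at small K would suggest overfitting; how large that gap is depends on the dataset and the split.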

Humphrey Muriuki says:

I am getting the warning below when I try to use the LogisticRegression model:

"ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT."

Does anyone know how I can resolve it?
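This warning means the lbfgs solver stopped at its iteration limit before converging. One common remedy is to raise `max_iter`. Here is a minimal sketch, assuming the iris dataset and an arbitrary limit of 1000:

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression

# Assumption: the iris dataset is used as example data
X, y = load_iris(return_X_y=True)

# Give the solver more iterations than the default limit
logreg = LogisticRegression(max_iter=1000)
logreg.fit(X, y)

# n_iter_ reports how many iterations the solver actually used
n_iterations_used = logreg.n_iter_
train_accuracy = logreg.score(X, y)
```

If `n_iter_` still equals `max_iter` on your own data, the limit may need to go higher, or the features may need scaling.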

Levon9 says:

The links at the bottom of your comments are great.

Maratimus Lion says:

💯💯💯⚡⚡⚡⚡⚡⚡⚡⚡⚡⚡⚡ make train_test_split(X, y, test_size=0.2, random_state=42) and get 100% accuracy 💯💯💯💯💯💯🎉🎉🎉🎉

Louise Buijs says:

Thanks so much for all these videos! I'm doing an internship at a really nice group, but they're letting me figure out most of the stuff by myself, so this is super useful!

Gautam Jain says:

Man, he just makes it so easy to learn.

Wish we had half as good teachers as him in school.

A. S. says:

I love you man, I have watched every single video of yours.

Brenden Song says:

Thank you for the great class! I learned so much from your video!!!

Ali Fazal says:

You're doing a great job. I would just emphasize giving more examples that are relatable and speaking as if you're talking to another person in the room. I only give feedback because that's what I would've wanted from people tuning in.

Chromenia Studio says:

It is just awesome to understand the concept from you. Thanks a ton!

Suhail Chougle says:

This is by far the best scikit-learn tutorial on YouTube. I can say this because I have seen almost every tutorial, and this one covers everything starting from scratch. I knew how all the algorithms work, but what I needed was how to implement them, from loading the data set to all the terminology to checking the accuracy and whatnot, and this series has everything I was looking for. Thank you so much for this. Really appreciate it.

Adrita Anwar says:

from sklearn.linear_model import LogisticRegression
logreg = LogisticRegression()
logreg.fit(X, y)
logreg.predict(X)

Why is the above code showing the following warning?

C:\Users\ASUS\anaconda3\lib\site-packages\sklearn\linear_model\_logistic.py:762: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
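The warning text itself suggests scaling the data as one remedy. Here is a minimal sketch of that option, assuming the iris dataset as `X` and `y` and using a `StandardScaler` inside a pipeline; the pipeline is an illustrative choice, not something taken from the code above.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Assumption: X and y come from the iris dataset
X, y = load_iris(return_X_y=True)

# Standardize each feature to zero mean and unit variance, then fit the model.
# Features on a common scale usually let lbfgs converge in fewer iterations.
scaled_logreg = make_pipeline(StandardScaler(), LogisticRegression())
scaled_logreg.fit(X, y)

# The pipeline applies the same scaling automatically at prediction time
y_pred = scaled_logreg.predict(X)
train_accuracy = scaled_logreg.score(X, y)
```

Keeping the scaler in the pipeline also means that, when you later use a train/test split, the scaling is learned from the training rows only.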

Priyanshu Gupta says:

Your way of delivery is exceptional. I have never seen anybody teach as well as you. It made me interested in ML. Thanks bro… God bless you

Miguel Gutierrez says:

One of the best guides to "machine learning" that I have seen. Thank you for giving us the opportunity to learn. Also, your pronunciation is perfect for Spanish speakers.

mohak Agarwal says:

still the best video for beginners
