## Comparing machine learning models in scikit-learn

We've learned how to train different machine learning models and make predictions, but how do we actually choose which model is "best"? We'll cover the train/test split process for model evaluation, which allows you to avoid "overfitting" by estimating how well a model is likely to perform on new data. We'll use that same process to locate optimal tuning parameters for a KNN model, and then we'll re-train our model so that it's ready to make real predictions.
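The workflow described above can be sketched as follows. This is a minimal sketch rather than the notebook's exact code: the iris dataset, the 40% test size, `random_state=4`, the range of K from 1 to 25, and the sample observation are all assumptions chosen for illustration.

```python
from sklearn import metrics
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

# Assumption: the iris dataset stands in for "your data"
X, y = load_iris(return_X_y=True)

# Step 1: hold out part of the data so models are scored on unseen rows
# (test_size and random_state are illustrative choices)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.4, random_state=4
)

# Step 2: compare candidate models by their testing accuracy
candidates = {
    "logreg": LogisticRegression(max_iter=1000),
    "knn_5": KNeighborsClassifier(n_neighbors=5),
}
test_accuracy = {}
for name, model in candidates.items():
    model.fit(X_train, y_train)
    test_accuracy[name] = metrics.accuracy_score(y_test, model.predict(X_test))

# Step 3: use the same split to search for a good value of K
k_scores = {}
for k in range(1, 26):
    knn = KNeighborsClassifier(n_neighbors=k)
    knn.fit(X_train, y_train)
    k_scores[k] = metrics.accuracy_score(y_test, knn.predict(X_test))

# Simplification: on ties, max() keeps the smallest K with the top score
best_k = max(k_scores, key=k_scores.get)

# Step 4: re-train on ALL of the data before making real predictions
final_model = KNeighborsClassifier(n_neighbors=best_k)
final_model.fit(X, y)

# Hypothetical out-of-sample observation (four iris measurements)
prediction = final_model.predict([[3, 5, 4, 2]])

print(test_accuracy, best_k, prediction)
```

Re-training on the full dataset in the last step matters because the held-out rows were only set aside for evaluation; once a model and its parameters are chosen, discarding that data would waste training information.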

Download the notebook: https://github.com/justmarkham/scikit-learn-videos

Quora explanation of overfitting: http://www.quora.com/What-is-an-intuitive-explanation-of-overfitting/answer/Jessica-Su

Estimating prediction error: https://www.youtube.com/watch?v=_2ij6eaaSl0&t=2m34s

Understanding the Bias-Variance Tradeoff: http://scott.fortmann-roe.com/docs/BiasVariance.html

Guiding questions for that article: https://github.com/justmarkham/DAT8/blob/master/homework/09_bias_variance.md

Visualizing bias and variance: http://work.caltech.edu/library/081.html

WANT TO GET BETTER AT MACHINE LEARNING? HERE ARE YOUR NEXT STEPS:

1) WATCH my scikit-learn video series:

https://www.youtube.com/playlist?list=PL5-da3qGB5ICeMbQuqbbCOQWcS6OYBr5A

2) SUBSCRIBE for more videos:

https://www.youtube.com/dataschool?sub_confirmation=1

3) JOIN "Data School Insiders" to access bonus content:

https://www.patreon.com/dataschool

4) ENROLL in my Machine Learning course:

https://www.dataschool.io/learn/

5) LET'S CONNECT!

- Newsletter: https://www.dataschool.io/subscribe/

- Twitter: https://twitter.com/justmarkham

- Facebook: https://www.facebook.com/DataScienceSchool/

- LinkedIn: https://www.linkedin.com/in/justmarkham/

Having problems with the code? I just finished updating the notebooks to use scikit-learn 0.23 and Python 3.9 🎉! You can download the updated notebooks here: https://github.com/justmarkham/scikit-learn-videos

Would you please clarify why we need to use `solver='liblinear'` as one of the parameters of the LogisticRegression model? Why do we leave the rest of the parameters at their defaults? Also, why do we import `metrics` from `sklearn` to compute accuracy, when we could simply use the `score` method of the `LogisticRegression` model that we imported?
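On the `metrics` versus `score` part of that question, a small check suggests the two approaches agree for a classifier, since `score` reports mean accuracy. The sketch below is an illustration under assumptions, not the notebook's code: the iris data, the split settings, and `max_iter=1000` (used here instead of `solver='liblinear'`) are all choices made for this example. As for the solver, one likely reason, which the post itself does not state, is that scikit-learn 0.22 changed the default solver from `liblinear` to `lbfgs`, so passing it explicitly reproduces the older behavior.

```python
from sklearn import metrics
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Assumption: iris data and these split settings are illustrative only
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.4, random_state=4
)

# Assumption: default solver with a higher iteration cap, for illustration
logreg = LogisticRegression(max_iter=1000)
logreg.fit(X_train, y_train)

# Route 1: predict first, then score the predictions explicitly
acc_from_metrics = metrics.accuracy_score(y_test, logreg.predict(X_test))

# Route 2: let the classifier predict and score in one call
acc_from_score = logreg.score(X_test, y_test)

print(acc_from_metrics, acc_from_score)
```

The explicit `metrics` route is handy when you already have predictions in hand or want a different metric; the `score` method is a shortcut for the accuracy case.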

your videos have helped me a lot. Thank you

Stunningly clear logic and structure. Thank you!

Understanding the Bias-Variance Tradeoff: this doc has links to ensure sites have a check in Fig. 1

Thank you very much! 🙂

`sklearn.cross_validation` has been renamed to `sklearn.model_selection`
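For anyone hitting an import error on older code, the updated import looks like this. It is a minimal sketch with a toy dataset invented for the example; the `cross_validation` module was deprecated in scikit-learn 0.18 and removed in 0.20.

```python
# Old (removed in scikit-learn 0.20):
#   from sklearn.cross_validation import train_test_split
from sklearn.model_selection import train_test_split

# Toy data, invented purely to show the call
X = [[i] for i in range(10)]
y = [0, 1] * 5

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)
print(len(X_train), len(X_test))
```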

What a perfect explanation, thank you sir

Can we say that KNN would overfit as K values get smaller?
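One way to explore that question is to compare training accuracy with testing accuracy for several values of K. The sketch below is only an illustration: the iris data, the split settings, and the particular K values are assumptions. With K=1, each training point is its own nearest neighbor, so the model effectively memorizes the training set, which is the usual sign of a high-complexity model that may overfit.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

# Assumption: iris data and these split settings are illustrative only
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.4, random_state=4
)

# Map each K to (training accuracy, testing accuracy)
results = {}
for k in (1, 5, 25, 60):
    knn = KNeighborsClassifier(n_neighbors=k)
    knn.fit(X_train, y_train)
    results[k] = (knn.score(X_train, y_train), knn.score(X_test, y_test))

for k, (train_acc, test_acc) in results.items():
    print(f"K={k:>2}  train={train_acc:.3f}  test={test_acc:.3f}")
```

A large gap between training and testing accuracy at small K would point toward overfitting, while both numbers dropping at very large K would point toward underfitting.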

I am getting the warning below when I try to use the LogisticRegression model: "ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT." Does anyone know how I can resolve it?

The links at the bottom of your comments are great.

💯💯💯⚡⚡⚡⚡⚡⚡⚡⚡⚡⚡⚡

Use `train_test_split(X, y, test_size=0.2, random_state=42)` and get 100% accuracy 💯💯💯💯💯💯🎉🎉🎉🎉

Thanks so much for all these videos! I'm doing an internship at a really nice group, but they're letting me figure out most of the stuff by myself, so this is super useful!

Man, he just makes it so easy to learn.

Wish we had half as good teachers as him in school.

I love you man, i have watched every single video of yours.

Thank you for the great class! I learned so much from your video!!!

You're doing a great job. I would just suggest giving more examples that are relatable and speaking as if you're talking to another person in the room. I only give feedback because that's what I would've wanted from people tuning in.

It is just awesome to understand the concept from you. Thanks a ton!

This is by far the best scikit-learn tutorial on YouTube. I can say this because I have seen almost every tutorial, and this one covers everything starting from scratch. I knew how all the algorithms work, but what I needed was how to implement them, from loading the dataset, to all the terminology, to checking the accuracy and so on. This series has everything I was looking for. Thank you so much for this, I really appreciate it.

from sklearn.linear_model import LogisticRegression

logreg=LogisticRegression()

logreg.fit(X,y)

logreg.predict(X)

Why is the above code showing the following warning?

C:\Users\ASUS\anaconda3\lib\site-packages\sklearn\linear_model\_logistic.py:762: ConvergenceWarning: lbfgs failed to converge (status=1):

STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:

https://scikit-learn.org/stable/modules/preprocessing.html

Please also refer to the documentation for alternative solver options:

https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression

n_iter_i = _check_optimize_result(
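The warning text itself names two remedies: raising `max_iter` or scaling the data. A minimal sketch of both follows. It assumes `X` and `y` are the iris data, since the commenter's actual data is not shown, and the value `max_iter=1000` is an arbitrary illustrative choice.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Assumption: iris stands in for the X and y used in the comment
X, y = load_iris(return_X_y=True)

# Option 1: give the lbfgs solver more iterations (default is 100)
logreg_more_iter = LogisticRegression(max_iter=1000)
logreg_more_iter.fit(X, y)

# Option 2: standardize the features first, which usually helps the
# solver converge in fewer iterations
logreg_scaled = make_pipeline(StandardScaler(), LogisticRegression())
logreg_scaled.fit(X, y)

print(logreg_more_iter.predict(X[:3]), logreg_scaled.predict(X[:3]))
```

Note that this is a warning rather than an error: the model is still fitted and can make predictions, but its coefficients may not have fully converged.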

Your way of delivery is exceptional. I have never seen somebody teach as well as you. It made me interested in ML. Thanks bro… God bless you

One of the best guides to "machine learning" that I have seen. Thank you for giving us the opportunity to learn. Also, your pronunciation is perfect for Spanish speakers.

still the best video for beginners