Comparing machine learning models in scikit-learn

We’ve learned how to train different machine learning models and make predictions, but how do we actually choose which model is “best”? We’ll cover the train/test split process for model evaluation, which allows you to avoid “overfitting” by estimating how well a model is likely to perform on new data. We’ll use that same process to locate optimal tuning parameters for a KNN model, and then we’ll re-train our model so that it’s ready to make real predictions.
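The workflow described above can be sketched roughly as follows. This is a minimal illustration, not the notebook's exact code: the iris dataset, the 40% test size, `random_state=4`, the K range of 1 to 25, and the sample observation `[3, 5, 4, 2]` are all assumptions chosen for the example.

```python
from sklearn.datasets import load_iris
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

# Assumed example data: the iris dataset bundled with scikit-learn
X, y = load_iris(return_X_y=True)

# Step 1: hold out part of the data so the model is scored on unseen observations
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.4, random_state=4
)

# Step 2: record testing accuracy for each candidate value of K
scores = {}
for k in range(1, 26):
    knn = KNeighborsClassifier(n_neighbors=k)
    knn.fit(X_train, y_train)
    scores[k] = accuracy_score(y_test, knn.predict(X_test))

# Pick the first K that reaches the highest testing accuracy
best_k = max(scores, key=scores.get)

# Step 3: retrain on all of the data before making real predictions
final_knn = KNeighborsClassifier(n_neighbors=best_k)
final_knn.fit(X, y)
prediction = final_knn.predict([[3, 5, 4, 2]])  # hypothetical new observation
```

Testing accuracy is only an estimate, so a different `random_state` may point to a different "best" K.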

Download the notebook:
Quora explanation of overfitting:
Estimating prediction error:
Understanding the Bias-Variance Tradeoff:
Guiding questions for that article:
Visualizing bias and variance:


Data School says:

Having problems with the code? I just finished updating the notebooks to use scikit-learn 0.23 and Python 3.9 🎉! You can download the updated notebooks here:

Bijaya Manandhar says:

Would you please clarify why we need to use `solver='liblinear'` as one of the parameters in the LogisticRegression model? Why do we leave the rest of the parameters at their defaults? Also, why do we import `metrics` from `sklearn` to compute accuracy, when we could simply use the `score` method of the `LogisticRegression` model that we imported?
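On the second part of this question, a short sketch suggests that the two approaches give the same number for a classifier: `metrics.accuracy_score` compares labels you pass in, while the model's `score` method predicts and then computes accuracy in one step. The dataset, split settings, and `max_iter=1000` below are assumptions for illustration. On the solver, the default changed from `liblinear` to `lbfgs` in scikit-learn 0.22, which may be why the notebooks set it explicitly, though that is a guess rather than something stated here.

```python
from sklearn import metrics
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Assumed example data and split settings
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.4, random_state=4
)

# max_iter is raised here only so the default solver has room to converge
logreg = LogisticRegression(max_iter=1000)
logreg.fit(X_train, y_train)

# Option 1: predict first, then compare predictions with the true labels
y_pred = logreg.predict(X_test)
acc_from_metrics = metrics.accuracy_score(y_test, y_pred)

# Option 2: let the model predict and compute mean accuracy in one call
acc_from_score = logreg.score(X_test, y_test)

print(acc_from_metrics, acc_from_score)
```

Keeping the predictions in `y_pred` is handy when you also want other metrics from the `metrics` module, which `score` does not provide.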

Amit Kumar says:

Understanding the Bias-Variance Tradeoff: this doc has links to ensure sites have a check in Fig. 1

Saleh Afzoon says:

`sklearn.cross_validation` has been renamed to `sklearn.model_selection`.
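In practice, that means the import line changes while the function call stays the same. A minimal sketch, assuming the iris dataset and a 40% test size:

```python
# Old import (removed in current scikit-learn releases):
# from sklearn.cross_validation import train_test_split

# Current import
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.4, random_state=4
)

print(X_train.shape, X_test.shape)
```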

Aditya Sawant says:

Can we say that KNN would overfit as K values get smaller?
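One way to explore this question is to compare training accuracy with testing accuracy for a few values of K. The sketch below assumes the iris dataset and an arbitrary selection of K values. A large gap between the two numbers is a common sign of overfitting, and with K=1 each training point is its own nearest neighbor, so training accuracy is typically perfect regardless of how well the model generalizes.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

# Assumed example data and split settings
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.4, random_state=4
)

train_acc = {}
test_acc = {}
for k in (1, 5, 15, 45):  # arbitrary K values chosen for illustration
    knn = KNeighborsClassifier(n_neighbors=k)
    knn.fit(X_train, y_train)
    train_acc[k] = knn.score(X_train, y_train)  # accuracy on data the model has seen
    test_acc[k] = knn.score(X_test, y_test)     # accuracy on held-out data
    print(k, train_acc[k], test_acc[k])
```

On a dataset as small and clean as iris the gap may be slight, so this is only a way to look at the trend, not proof of it.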

Humphrey Muriuki says:

I am getting the warning below when I try to use the LogisticRegression model: "ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT." Does anyone know how I can resolve it?
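One commonly suggested remedy is to give the solver more iterations through the `max_iter` parameter, which defaults to 100. The sketch below assumes the iris dataset, and the value 1000 is an arbitrary choice rather than a recommended setting.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression

# Assumed example data
X, y = load_iris(return_X_y=True)

# max_iter defaults to 100; 1000 is an arbitrary larger value for this sketch
logreg = LogisticRegression(max_iter=1000)
logreg.fit(X, y)

# n_iter_ reports how many iterations the solver actually used
iterations_used = int(logreg.n_iter_.max())
print(iterations_used)
```

If `iterations_used` is below `max_iter`, the solver stopped because it converged rather than because it ran out of iterations.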

Louise Buijs says:

Thanks so much for all these videos! I'm doing an internship at a really nice group, but they're letting me figure out most of the stuff by myself, so this is super useful!

Ali Fazal says:

You're doing a great job. I would just suggest giving more examples that are relatable, and speaking as if you're talking to another person in the room. I only give feedback because that's what I would've wanted from people tuning in.

Suhail Chougle says:

This is by far the best scikit-learn tutorial on YouTube. I can say this because I have seen almost every tutorial, and this one covers everything starting from scratch. I knew how all the algorithms work, but what I needed was how to implement them, from loading the dataset to all the terminology to checking the accuracy and so on, and this series has everything I was looking for. Thank you so much for this. Really appreciate it.

Adrita Anwar says:

from sklearn.linear_model import LogisticRegression

Why is the above code showing the following warning? ConvergenceWarning: lbfgs failed to converge (status=1):

Increase the number of iterations (max_iter) or scale the data as shown in:
Please also refer to the documentation for alternative solver options:
n_iter_i = _check_optimize_result(
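The warning text itself points to scaling the data as another option. A minimal sketch of that idea, assuming the iris dataset and using `StandardScaler` in a pipeline so that the scaling is learned from the training data only:

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Assumed example data and split settings
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.4, random_state=4
)

# Standardize each feature, then fit logistic regression with default settings
pipe = make_pipeline(StandardScaler(), LogisticRegression())
pipe.fit(X_train, y_train)

accuracy = pipe.score(X_test, y_test)
print(accuracy)
```

Features on a similar scale often let the solver converge in fewer iterations, although whether the warning disappears depends on the data.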

Miguel Gutierrez says:

One of the best guides to "machine learning" that I have seen. Thank you for giving us the opportunity to learn. Also, your pronunciation is perfect for Spanish speakers.


