Comparing machine learning models in scikit-learn

We’ve learned how to train different machine learning models and make predictions, but how do we actually choose which model is “best”? We’ll cover the train/test split procedure for model evaluation, which helps you guard against “overfitting” by estimating how well a model is likely to perform on new, out-of-sample data. We’ll use that same procedure to search for good tuning parameter values for a K-nearest neighbors (KNN) model, and then we’ll retrain our model so that it’s ready to make real predictions.
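As a rough illustration of the workflow described above, here is a minimal sketch in scikit-learn. The iris dataset, the 40% test size, the random seed, the range of K values from 1 to 25, and the sample measurement passed to predict are all assumptions chosen for illustration; the notebook linked below is the authoritative version.

```python
from sklearn.datasets import load_iris
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

# Assumption: iris is used here only as a small, built-in example dataset.
X, y = load_iris(return_X_y=True)

# Step 1: hold out part of the data so the model is scored on
# observations it did not see during training.
# Assumption: test_size and random_state are arbitrary illustrative values.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.4, random_state=4
)

# Step 2: try several values of K and record the testing accuracy of each.
# Assumption: the search range 1..25 is an illustrative choice.
k_range = range(1, 26)
scores = {}
for k in k_range:
    knn = KNeighborsClassifier(n_neighbors=k)
    knn.fit(X_train, y_train)
    y_pred = knn.predict(X_test)
    scores[k] = accuracy_score(y_test, y_pred)

# Pick the K with the highest testing accuracy.
# If several values tie, max() keeps the smallest K, because the dict
# preserves the ascending order in which the values were inserted.
best_k = max(scores, key=scores.get)

# Step 3: retrain with the chosen K on all of the available data, so no
# labeled observations are wasted before making real predictions.
final_model = KNeighborsClassifier(n_neighbors=best_k)
final_model.fit(X, y)

# Predict the class of one hypothetical, made-up flower measurement.
prediction = final_model.predict([[3, 5, 4, 2]])
print("chosen K:", best_k, "predicted class:", prediction[0])
```

Testing accuracy from a single split depends on which observations happen to land in the test set, so the chosen K should be read as a reasonable estimate and not a guaranteed optimum.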

Download the notebook: https://github.com/justmarkham/scikit-learn-videos
Quora explanation of overfitting: http://www.quora.com/What-is-an-intuitive-explanation-of-overfitting/answer/Jessica-Su
Estimating prediction error: https://www.youtube.com/watch?v=_2ij6eaaSl0&t=2m34s
Understanding the Bias-Variance Tradeoff: http://scott.fortmann-roe.com/docs/BiasVariance.html
Guiding questions for that article: https://github.com/justmarkham/DAT8/blob/master/homework/09_bias_variance.md
Visualizing bias and variance: http://work.caltech.edu/library/081.html

WANT TO GET BETTER AT MACHINE LEARNING? HERE ARE YOUR NEXT STEPS:

1) WATCH my scikit-learn video series:
https://www.youtube.com/playlist?list=PL5-da3qGB5ICeMbQuqbbCOQWcS6OYBr5A

2) SUBSCRIBE for more videos:
https://www.youtube.com/dataschool?sub_confirmation=1

3) JOIN “Data School Insiders” to access bonus content:
https://www.patreon.com/dataschool

4) ENROLL in my Machine Learning course:
https://www.dataschool.io/learn/

5) LET’S CONNECT!
– Newsletter: https://www.dataschool.io/subscribe/
– Twitter: https://twitter.com/justmarkham
– Facebook: https://www.facebook.com/DataScienceSchool/
– LinkedIn: https://www.linkedin.com/in/justmarkham/