## Comparing machine learning models in scikit-learn

We've learned how to train different machine learning models and make predictions, but how do we actually choose which model is "best"? We'll cover the train/test split process for model evaluation, which allows you to avoid "overfitting" by estimating how well a model is likely to perform on new data. We'll use that same process to locate optimal tuning parameters for a KNN model, and then we'll re-train our model so that it's ready to make real predictions.
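The workflow described above can be sketched roughly as follows. This is a minimal illustration, not the notebook's exact code: the iris dataset, the 40% test size, the `random_state` value, the K range of 1 to 25, and the sample observation are all assumptions chosen for the example.

```python
from sklearn import metrics
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

# Assumed dataset: iris (150 observations, 4 features, 3 classes)
X, y = load_iris(return_X_y=True)

# Hold out part of the data so accuracy is measured on unseen observations
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.4, random_state=4
)

# Record testing accuracy for each candidate value of K
scores = {}
for k in range(1, 26):
    knn = KNeighborsClassifier(n_neighbors=k)
    knn.fit(X_train, y_train)
    y_pred = knn.predict(X_test)
    scores[k] = metrics.accuracy_score(y_test, y_pred)

# Pick the K with the highest testing accuracy (smallest K wins ties)
best_k = max(scores, key=scores.get)

# Re-train on ALL of the data before making real predictions
final_knn = KNeighborsClassifier(n_neighbors=best_k)
final_knn.fit(X, y)

# Predict the class of one made-up, out-of-sample observation
prediction = final_knn.predict([[3, 5, 4, 2]])
print(best_k, scores[best_k], prediction)
```

Taking the single highest-scoring K is only one way to choose. When many values of K score equally well, a value from the middle of that range may be a more stable choice.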

Download the notebook: https://github.com/justmarkham/scikit-learn-videos

Quora explanation of overfitting: http://www.quora.com/What-is-an-intuitive-explanation-of-overfitting/answer/Jessica-Su

Estimating prediction error: https://www.youtube.com/watch?v=_2ij6eaaSl0&t=2m34s

Understanding the Bias-Variance Tradeoff: http://scott.fortmann-roe.com/docs/BiasVariance.html

Guiding questions for that article: https://github.com/justmarkham/DAT8/blob/master/homework/09_bias_variance.md

Visualizing bias and variance: http://work.caltech.edu/library/081.html

Having problems with the code? I just finished updating the notebooks to use

Would you please clarify why we need to pass `solver='liblinear'` to the `LogisticRegression` model, and why we leave the rest of the parameters at their defaults? Also, why do we import `metrics` from `sklearn` to compute accuracy, when we could simply use the `score` method of the `LogisticRegression` model itself?
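On the second part of that question, a small check suggests the two approaches agree: for classifiers, the `score` method returns mean accuracy, which is what `metrics.accuracy_score` computes from the predictions. As for the solver, scikit-learn 0.22 changed the default from `liblinear` to `lbfgs`, so pinning `solver='liblinear'` presumably keeps results in line with the original video; that motive is an assumption, not something stated here. The sketch below assumes the iris data, restricted to two classes so that `liblinear` handles a plain binary problem.

```python
from sklearn import metrics
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Assumed data: iris, keeping only classes 0 and 1 (a binary problem)
X, y = load_iris(return_X_y=True)
mask = y < 2
X, y = X[mask], y[mask]

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.4, random_state=4
)

logreg = LogisticRegression(solver='liblinear')
logreg.fit(X_train, y_train)

# Approach 1: predict first, then compare predictions with the true labels
y_pred = logreg.predict(X_test)
acc_from_metrics = metrics.accuracy_score(y_test, y_pred)

# Approach 2: let the model predict and score in a single call
acc_from_score = logreg.score(X_test, y_test)

print(acc_from_metrics, acc_from_score)
```

The `metrics` module becomes more useful once you want something other than accuracy, or already have `y_pred` in hand and want to evaluate it in several ways.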

Understanding the Bias-Variance Tradeoff: this doc has links to ensure sites have a check in Fig.1

`sklearn.cross_validation` has been renamed to `sklearn.model_selection`.
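In practice, only the import line needs to change. A minimal sketch, using a tiny made-up dataset purely for illustration:

```python
# Old import (the module was removed in scikit-learn 0.20):
# from sklearn.cross_validation import train_test_split

# Current import:
from sklearn.model_selection import train_test_split

# Toy data, invented for this example: 8 observations, 1 feature
X = [[0], [1], [2], [3], [4], [5], [6], [7]]
y = [0, 0, 0, 0, 1, 1, 1, 1]

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)
print(len(X_train), len(X_test))
```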

Can we say that KNN would overfit as K values get smaller?
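One way to explore this question is to compare training accuracy with testing accuracy for a few values of K. Smaller K means a more complex model, and a large gap between the two accuracies is a common sign of overfitting. The sketch below assumes the iris dataset and an arbitrary set of K values; it is an illustration, not a general proof.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

# Assumed dataset and split settings, chosen only for illustration
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.4, random_state=4
)

train_acc = {}
test_acc = {}
for k in (1, 5, 15, 40):
    knn = KNeighborsClassifier(n_neighbors=k)
    knn.fit(X_train, y_train)
    # Accuracy on the data the model has already seen
    train_acc[k] = knn.score(X_train, y_train)
    # Accuracy on held-out data
    test_acc[k] = knn.score(X_test, y_test)

for k in train_acc:
    print(k, train_acc[k], test_acc[k])
```

With K=1, each training point is its own nearest neighbor, so the model essentially memorizes the training set. That is why training accuracy alone is a poor guide to how the model will perform on new data.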

I am getting the error below when I try to use the LogisticRegression model: "ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT." Does anyone know how I can resolve it?

```python
from sklearn.linear_model import LogisticRegression

logreg = LogisticRegression()
logreg.fit(X, y)
logreg.predict(X)
```

Why is the above code showing the following warning?

```
C:\Users\ASUS\anaconda3\lib\site-packages\sklearn\linear_model\_logistic.py:762: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
```
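The warning text itself suggests two remedies: raise `max_iter`, or scale the features. Both can be sketched as below. The iris dataset stands in for the commenter's `X` and `y`, which are not shown, and `max_iter=1000` is an arbitrary value rather than a recommended setting.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Assumed stand-in data; substitute your own X and y
X, y = load_iris(return_X_y=True)

# Option 1: give the lbfgs solver more iterations to converge
logreg = LogisticRegression(max_iter=1000)
logreg.fit(X, y)
acc_more_iter = logreg.score(X, y)

# Option 2: standardize the features first, which often helps lbfgs
# converge within the default number of iterations
scaled_logreg = make_pipeline(StandardScaler(), LogisticRegression())
scaled_logreg.fit(X, y)
acc_scaled = scaled_logreg.score(X, y)

print(acc_more_iter, acc_scaled)
```

Note that this is a warning, not an error: the model is still fitted and can still make predictions, but its coefficients may not have fully converged. A third option, mentioned in the warning, is to try a different solver.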

