MIT 6.034 Artificial Intelligence, Fall 2010

View the complete course: http://ocw.mit.edu/6-034F10

Instructor: Patrick Winston

Can multiple weak classifiers be used to make a strong one? We examine the boosting algorithm, which adjusts the weight of each classifier, and work through the math. We end with how boosting doesn't seem to overfit, and mention some applications.

License: Creative Commons BY-NC-SA

More information at http://ocw.mit.edu/terms

More courses at http://ocw.mit.edu

This is such a clear path to understanding. Thank you, Prof. Winston.

Right aisle, 2:38: he's exited.

Handwriting model

At 16:25, doesn't the orange line at the bottom symbolize the exact same thing as the orange line at the very left? Both say "Everything is +" or "Everything is -". And then we don't have 12 classifiers but only 10.

Why is a coin flip a weak classifier if p1 > p2 with p1 + p2 = 1? 0.5×p1 + 0.5×p2 is still 0.5.
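A quick numeric check of that point, as a sketch with made-up priors p1 = 0.7 and p2 = 0.3 (these numbers are assumptions for illustration, not from the lecture): a fair coin lands at exactly 0.5 accuracy no matter how skewed the classes are, so it is not better than chance; even the trivial rule "always predict the majority class" already does better.

```python
import random

random.seed(0)

# Assumed class priors: p1 = 0.7 positives, p2 = 0.3 negatives (p1 > p2, p1 + p2 = 1).
labels = [1 if random.random() < 0.7 else -1 for _ in range(100_000)]

# Fair coin flip: expected accuracy 0.5*p1 + 0.5*p2 = 0.5, i.e. exactly chance.
coin_acc = sum(1 for y in labels if random.choice([1, -1]) == y) / len(labels)

# Always predicting the majority class scores about p1 > 0.5: genuinely better than chance.
majority_acc = sum(1 for y in labels if y == 1) / len(labels)

print(round(coin_acc, 2), round(majority_acc, 2))  # roughly 0.5 and 0.7
```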

I didn't get the part where new weights are scaled to 1/2. What good does it do?
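For anyone else puzzled by that step, here is a minimal sketch of the standard AdaBoost-style reweighting the lecture describes (the function name and example weights are mine, for illustration): rescaling so the misclassified points together carry exactly half the total weight means the previous classifier now scores exactly 0.5 on the reweighted data, so the next weak classifier cannot reuse its trick and is pushed to do better on the points it got wrong.

```python
import numpy as np

def adaboost_reweight(weights, correct):
    """One reweighting step (a sketch, not Winston's exact notation).

    After the update, misclassified samples sum to 1/2 of the total
    weight and correctly classified samples sum to the other 1/2.
    """
    weights = np.asarray(weights, dtype=float)
    correct = np.asarray(correct, dtype=bool)

    error = weights[~correct].sum()  # weighted error of the current classifier
    # Scale right answers to total 1/2 and wrong answers to total 1/2.
    return np.where(correct,
                    weights / (2 * (1 - error)),
                    weights / (2 * error))

# Example: 4 samples with uniform weights; the stump gets the last one wrong.
w = adaboost_reweight([0.25, 0.25, 0.25, 0.25], [True, True, True, False])
print(w)  # the wrong sample now has weight 0.5; the three right ones share 0.5
```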

A comment on the transcription: a lot of the places transcribed [Inaudible] are where he says "Schapire", i.e. Robert Schapire, the inventor of boosting.

Thanks for amazing lecture!

An amazing lecture, I've enjoyed every second.

Question: would this work well for classification with a very unbalanced data set?

Minority class at about 1 percent.

Why is there a sheep in the first row?

How do the stumps tighten back in?

Perfect teaching! Great job, Sir.

Way to go, Doctor, the explanation is very clear and unique. I was just wondering if anyone has an idea what application was being used to demonstrate the algorithm.

The not-overfitting thing is really mind-blowing, because it seems to me like the VC dimension of the demonstrated classifier is infinite. I was about to write a question like this:

Does the volume of the space on which the classification result depends on an outlier decrease in every case, or are there cases (of low probability) in which outliers come to occupy more volume?

I guess that the volume decreases if there are good samples around the outlier, and that the volume can stay large if the outlier lies far away from the subspace in which the good samples lie. If that holds, it is still unlikely to get test data points in that volume even if it stays large.

If somebody knows about this, please let me know.

It's incredible that there are even empty seats in this lecture. Truly an amazing professor

What do you mean by "data exaggeration"?

So how does the program choose the number of classifiers to use?

Man, that straight line on the board in the beginning, what a pro

just awesome

I would like to thank you for your fantastic contribution to all of science, and especially to the computer field.

Oh, I wish I'd learned this in college. Close, but not quite. Thanks, MIT, I guess.

Phenomenal lecture. Easy to understand and, as said before, great handwriting. Thanks for sharing, it is much appreciated 🙂

So to solve a data set, is there a program to first determine which learning method is the correct choice to produce the correct results: KNN, SVM, boosting, etc.?

Thank you for this lecture.

The last part, about the "Thank God" holes, is an excellent explanation.

switch speed to 1.25 😀

An outstanding teacher. I appreciate Dr Winston. He explains confusing stuff in a very simple way.

Great teacher!