Image Recognition and Python Part 1

Sample code for this series:

There are many applications for image recognition. One of the most familiar is facial recognition, which is the art of matching faces in pictures to identities. Image recognition goes much further, however. It can allow computers to translate written text on paper into digital text, and it can help the field of machine vision, where robots and other devices recognize people and objects.

Here, our goal is to begin to use machine learning, in the form of pattern recognition, to teach our program what text looks like. In this case, we'll use numbers, but this could translate to all letters of the alphabet, words, faces, really anything at all. The more complex the image, the more complex the code will need to become. When it comes to letters and characters, however, it is relatively simple.

How is it done? Just like any problem, especially in programming, we just need to break it down into steps, and the problem becomes much easier to solve. Let's break it down!

First, we know we want to show the program an image, and have it compare it to patterns that it knows to make an educated guess on what the current image is. This means we're going to need some "memory" of sorts, filled with examples. In the case of this tutorial, we'd like to do image recognition for the numbers zero through nine. So we'd like to be able to show it any random 2, and have it know the image to be a 2 based on the previous examples of 2's that it has seen and memorized.

Next, we need to consider how we'll do this. A computer doesn't read text like we read text. We naturally put things together into a pattern, but a machine just reads the data. In the case of a picture, it reads in the image data and displays, pixel by pixel, what it is told to display. Past that, a machine makes no attempt to decide whether it is showing a couch or a bird. So, our database of examples will actually be pixel information.

To keep things simple, we should probably "threshold" the images. This means we store every pixel as either black or white. In RGB, that's 255, 255, 255 for white or 0, 0, 0 for black, per pixel. Sometimes there is an alpha channel too! What we can then do is take any image and, if a pixel's brightness (say, the average of its color values) is greater than 125, call it more of a "white" and convert the entire pixel to 255. If it is less than or equal to 125, we call it more of a "black" and convert it to 0.

This might be problematic in some circumstances where we have a dark color on a darker color, usually a type of image meant to fool machines. Instead, we could find the "middle" color on average for the current image, and threshold anything lighter to white and anything darker to black. This works very well for two-dimensional images of things like characters, but less well for images that rely on shading, say a picture of a ball.
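To make the thresholding idea concrete, here is a minimal sketch in plain Python. It assumes an image is simply a list of rows of (R, G, B) or (R, G, B, A) tuples, and that "brightness" means the average of the three color channels; the function names `brightness` and `threshold` are made up for illustration and are not taken from the series' sample code.

```python
def brightness(pixel):
    """Average of the R, G, B channels; any alpha channel is ignored."""
    r, g, b = pixel[:3]
    return (r + g + b) / 3


def threshold(pixels, cutoff=None):
    """Return a copy of `pixels` with every pixel forced to black or white.

    `pixels` is a list of rows, each row a list of RGB(A) tuples.
    If `cutoff` is None, the mean brightness of the whole image is used,
    which is the "middle color on average" idea described above.
    The result always holds plain RGB tuples (alpha is dropped).
    """
    if cutoff is None:
        values = [brightness(p) for row in pixels for p in row]
        cutoff = sum(values) / len(values)
    white, black = (255, 255, 255), (0, 0, 0)
    return [[white if brightness(p) > cutoff else black for p in row]
            for row in pixels]


# A tiny 2x2 "image": dark grey marks on an even darker background.
image = [[(40, 40, 40), (90, 90, 90)],
         [(90, 90, 90), (40, 40, 40)]]

fixed = threshold(image, cutoff=125)  # every pixel is below 125, so all black
adaptive = threshold(image)           # cutoff is the image mean (65 here)
```

With the fixed cutoff of 125 the whole image collapses to black, which is exactly the "dark color on a darker color" problem. Using the image's own mean keeps the two shades apart.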

Once we've done this, all we need to do is save the string of pixel values for a bunch of "example" images. We can start with a bunch of fonts, plus some hand-drawn examples. There are also data dumps of examples available. This is an example of "training" our program on data.
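As a rough sketch of what saving those examples could look like, the snippet below flattens an already-thresholded image into a string of 1s and 0s and writes one labelled line per example. The `label::bits` line format and the helper names are assumptions chosen for illustration, and `io.StringIO` stands in for a real file on disk.

```python
import io

WHITE = (255, 255, 255)
BLACK = (0, 0, 0)


def to_bits(pixels):
    """Flatten a thresholded image into a string: '1' for white, '0' for black."""
    return "".join("1" if p == WHITE else "0" for row in pixels for p in row)


def save_examples(examples, out):
    """Write one 'label::bits' line per (label, thresholded image) pair."""
    for label, pixels in examples:
        out.write(f"{label}::{to_bits(pixels)}\n")


def load_examples(src):
    """Read the lines back into a list of (label, bits) pairs."""
    pairs = []
    for line in src:
        label, bits = line.strip().split("::")
        pairs.append((label, bits))
    return pairs


# Two toy 3x3 "digits" drawn in black on white.
W, B = WHITE, BLACK
one = [[W, B, W],
       [W, B, W],
       [W, B, W]]
seven = [[B, B, B],
         [W, W, B],
         [W, W, B]]

store = io.StringIO()  # hypothetical stand-in for an examples file
save_examples([("1", one), ("7", seven)], store)
store.seek(0)
examples = load_examples(store)
```

Storing several lines per label (different fonts, different handwriting) is what turns this file into the "memory" the program compares against.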

If we have a decently sized database, then we are ready to try to compare some numbers. A good idea would be to hand-draw an example for your program to compare to. To compare, we'd simply do the same thing to the question image. We'd threshold the image into black or white pixels, then take that pixel list and compare it to all of our examples. In the end, each character will have some number of "hits," the pixels where its examples match the question image. Whichever character has the most "hits" is likely to be the correct one. Done, we've recognized that image.
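The comparison step can be sketched like this, assuming each stored example is a (label, bits) pair where `bits` is a string of 1s and 0s of the same length as the question image. The names `count_hits` and `recognize` are hypothetical, and counting one hit per matching pixel is just one simple way to score a match.

```python
from collections import Counter


def count_hits(examples, question):
    """Tally, per label, how many pixel positions match the question image."""
    hits = Counter()
    for label, bits in examples:
        hits[label] += sum(a == b for a, b in zip(bits, question))
    return hits


def recognize(examples, question):
    """Return the label whose examples collected the most hits."""
    return count_hits(examples, question).most_common(1)[0][0]


# Toy 3x3 examples: '1' is white, '0' is black, read row by row.
examples = [
    ("1", "101101101"),
    ("1", "100101101"),
    ("7", "000110110"),
]
question = "101101001"  # a slightly damaged "1"

hits = count_hits(examples, question)
guess = recognize(examples, question)
```

Because hits are summed over every example, a label with more stored examples gets more chances to score. Keeping the number of examples per character roughly equal avoids skewing the guess.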

If you think about it, this is actually very similar to how we humans recognize things. Naturally, many children do not immediately distinguish between couches and love seats. "What is the difference?" many of them ask. There is a bit of a grey area between them, and they have many similarities. Generally, a lot of learning comes by example. After seeing hundreds of couches, thousands of chairs, and hundreds of love seats, a person soon begins to easily distinguish between them, because they have quite a bit of sample data to compare to. This is even how we read text. A number 5 really does mean nothing to a baby. They only begin to learn what a number 5 is as they are shown it over and over, being told it is "5." Eventually, they understand that to be a 5, and they can see 5 in multiple font types and still recognize it to be a 5.


mohammed adnan says:

I want to write a function that loads a particular image from a specific path on the computer,

and the output will be a table as follows:

the first column will be the colors in the picture (in hexadecimal code)

and the second column will be how many times each color is repeated in the image.

Sharath Pawar says:

How do I install all these packages? They seem to be .whl packages.

Gurubux Gill says:

Cheers Sentdex!
Syllabus –
1. Introduction and Dependencies

2. Understanding Pixel Arrays

3. More Pixel Arrays

4. Graphing our images in Matplotlib

5. Thresholding

6. Thresholding Function

7. Thresholding Logic

8. Saving our Data For Training and Testing

9. Basic Testing

10. Testing, visualization, and moving forward

Naimish Keswani says:

y python 2 not 3??

Zon Lom says:

I want to write a program that does video processing: counting plants in a row. Any ideas on where to start?

Casper Lind says:

didn't work

Umesh.B.Zagade says:

I need your help. Please give your thoughts.
By using Python I want to determine a person's behavior and personality, taking images of their handwriting as input.

Moeen Ahmad says:

Hi, hope you are well.
Sir, I have Anaconda 3 with Python 3.6 and I am trying to import OpenCV on it but am unable to import it.
Kindly guide me on this.

Gaurav Misra Sankar says:

Why use 2.7 and not 3+ versions please ? Can you kindly explain why. Thank you!


I am not able to import any of these three libraries. I've downloaded them from the website mentioned in the video. can anyone help?

k2datrack says:

Hi Sentdex, I'm new to Python and 5 years late to this video. I noticed the Pillow versions have changed. Will they still be useful for the tutorial?

Sujith Palanivasagam says:

For those who are facing problems with the command: 'import PIL' , I have found the solution!
Just type 'pip install Pillow' in the shell

Nil Saha says:

Sorry, ignore my last message. Got images.

Nil Saha says:

Can't access your image data.

Movie Craze says:

hey i'm trying to run a code and getting an error: 'C:\bld\opencv_1506447021968\work\opencv-3.3.0\modules\imgproc\src\color.cpp:10638: error: (-215) scn == 3 || scn == 4 in function cv::cvtColor'
please help me with this..
Here's my code:

import cv2
import numpy as np

#recognizer = cv2.face.createLBPHFaceRecognizer()
recognizer = cv2.face.LBPHFaceRecognizer_create()

FaceDetect = cv2.CascadeClassifier('haarcascade_frontalface_alt.xml')
cap = cv2.VideoCapture(0)"recognizer\trainingData.yml")
#font =cv2.InitFont(,4,1,0,4)
while 1:
ret,img =
gray = cv2.cvtColor(img,cv2.COLOR_RGB2GRAY) <<<ERROR IS ON THIS LINE
faces = FaceDetect.detectMultiScale(gray,1.3,5)
for (x,y,w,h) in faces:
if cv2.waitKey(1) & 0xFF==ord('q'):

Hageregna/ሀገረኛ tube says:

why not python 3.6 ? bro

Ranjit Singh says:

Hi, great videos. I have a problem: every time I press F5, it goes to "GUI" just fine, but when I click edit to show the image it gives me a long error saying:

Exception in Tkinter callback
Traceback (most recent call last):
  File "", line 1699, in __call__
    return self.func(*args)
  File "C:/Users/RSandhu/AppData/Local/Programs/Python/Python36-32/", line 38, in showImg
    load = Image.open('download.jpg')
  File "", line 2530, in open
    fp = builtins.open(filename, "rb")
FileNotFoundError: [Errno 2] No such file or directory: 'download.png'

Please help. What does it mean by No such file or directory: 'download.png' when it is in my folder? It gives me this error every time I try to display an image.

Ninkambazi Elitwaza says:

Why not download Python 3? I have documents saying that OCR is also supported by Python 3.

harshit srivastava says:

Does this use neural networks?

Aerika Kapadia says:

For image processing in Python, which version is best: 2.7 or 3.5?

Gurpreet Sidhu says:

Why are we not using Python 3?

Luke Schroeder says:

Hello Sentdex, I have been watching your videos for a long time and absolutely LOVE them! You are a great teacher and have been a huge help in my Python programming experience. However, for this specific video, I have been having some problems. And before I get any "Just google it", I would like you to know that I have indeed googled this problem and couldn't find an answer anywhere. My problem is that whenever I type in the command "from PIL import Image", it either gives me the error "ImportError: cannot import name Image" or "ImportError: cannot import name _imaging". I am running the 64-bit version of Python 2.7.13. Any help would be much appreciated :)

Eduardo Zúñiga says:

Hi! Do you recommend using Anaconda?
