Recognizing handwritten text is a problem that can be traced back to the first automatic machines that needed to recognize individual characters in handwritten documents. Think about, for example, the ZIP codes on letters at the post office and the automation needed to recognize these five digits. Perfect recognition of these codes is necessary in order to sort mail automatically and efficiently. Included among the other applications that may come to mind is OCR (Optical Character Recognition) software. OCR software must read handwritten text, or pages of printed books, for general electronic documents in which each character is well defined.
The Digits Dataset
The scikit-learn library provides numerous datasets that are useful for testing many problems of data analysis and prediction of the results. Also in this case there is a dataset of images called Digits.
This dataset consists of 1,797 images that are 8x8 pixels in size
Let's start with importing the dataset from scikit-learn:
We can visually check the contents of this result using the matplotlib library.
The dataset is a training set consisting of 1,797 images
The Digits data set of scikit-learn library provides numerous data-sets that are useful for testing many problems of data analysis and prediction of the results. Some Scientist claims that it predicts the digit accurately 95% of the times. Perform data Analysis to accept or reject this Hypothesis.
An estimator that is useful in this case is sklearn.svm.SVC, which uses the technique of Support Vector Classification (SVC).As it works with image datasets better compared to other machine learning algorithms.
Support vector machine is one of the simple algorithms that every machine learning expert should have used in his/her arsenal. The Support vector machine is highly preferred by many as it produces significant accuracy with less computation power. Support Vector Machine, abbreviated as SVM can be used for both regression and classification tasks. But, it is widely used in classification objectives.
What is Support Vector Machine?
from sklearn import svm
svc = svm.SVC(gamma=0.001, C=100.)
once we have defined a predictive model, we must instruct it with a training set, which is a set of data in which we already know the belonging class
Given the large number of elements contained in the Digits dataset, we shall certainly obtain a very effective model, i.e., one that’s capable of recognizing with good certainty the handwritten number.
This dataset contains 1,797 elements, and so you can consider the first 1,750 as a training set and will use the remaining as a validation set. We can see in detail a few of these remaining handwritten digits by using the matplotlib library:
Training the model :
Now we can train the svc estimator that we have defined earlier on the training data.
We can see that the svc estimator has learned correctly. It is able to recognize the handwritten digits, interpreting correctly all digits of the validation set.
In the above case, we have got 100% accurate predictions, but this may not be the case at all times.
We will be running for at least 3 cases, each case for the different range of training and validation sets.