An Efficient OCR System Based on Regional Features Using ASVM as the Classifier

In handwritten document images, poor handwriting sometimes leaves gaps between a diacritic and its character, or between a diacritic and the header line. These gaps create small, separate text blocks, which lead to improper text-line segmentation and overlapping, and hence to wrong recognition results. As a result, the accuracy of the algorithm degrades. In the proposed work, an Adaptive SVM (ASVM) is used to improve the accuracy of the system.


INTRODUCTION:
Optical character recognition (OCR), also commonly known as optical character reader, is the electronic or mechanical conversion of images of typed, handwritten or printed text into machine-encoded text. In OCR, dedicated software converts scanned images of documents into electronic text so that the digitized data can be searched, indexed and retrieved. OCR systems are designed to support many real-world applications, such as extracting data from business documents, checks, passports, invoices, bank statements, insurance documents and license plates. Each application relies on data sets containing hundreds or thousands of scanned document images, which are used to train and improve the system; these training sets are usually labelled by humans so that the engine has accurate data to learn from and apply, which makes it smarter. OCR is widely used as a form of data entry from printed paper records, such as passport documents, invoices, bank statements, computerized receipts, business cards, mail, printouts of static data, or any other suitable documentation. It is a research area within computer vision, artificial intelligence and pattern recognition.

Handwritten text recognition (HTR) can be defined as the ability of a computer to transform handwritten input, represented in its spatial form of graphical marks, into an equivalent symbolic representation such as ASCII text. This handwritten input usually comes from sources such as paper documents, photographs, electronic pens and touchscreens.

Fig.1 Handwritten document converted into text
Approaches for learning optical character recognition: The following are approaches used to train optical character recognition systems.

Support Vector Machine:
The Support Vector Machine (SVM) is a widely used machine learning classifier for texture classification and pattern recognition problems. An SVM is designed to work with two classes: it determines the hyperplane that separates the two classes with the largest margin, and different kernels allow this separation to be non-linear. Multi-class problems such as character recognition are handled by combining several binary SVMs, for example in a one-vs-rest or one-vs-one scheme.
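As a rough illustration of the hyperplane idea, the sketch below trains a linear soft-margin SVM with the Pegasos stochastic sub-gradient method in plain Python. This is an assumption for illustration only: the paper does not state which solver or kernel it uses, the names `train_linear_svm` and `predict` are hypothetical, and the adaptive variant (ASVM) is not shown. A kernel would replace the dot product, and a character recognizer would train one such binary classifier per character class (one-vs-rest).

```python
import random


def _augment(x):
    # Append a constant 1.0 so the bias is learned (and regularised) as the last weight.
    return [float(v) for v in x] + [1.0]


def train_linear_svm(samples, labels, lam=0.01, epochs=200, seed=0):
    """Hypothetical helper: fit a linear soft-margin SVM with Pegasos.

    samples: list of feature vectors; labels: +1 or -1 for each sample.
    lam: regularisation strength (trade-off between margin width and errors).
    Returns the weight vector w, with the bias stored as the last entry.
    """
    rng = random.Random(seed)
    data = [_augment(x) for x in samples]
    w = [0.0] * len(data[0])
    t = 0
    for _ in range(epochs):
        order = list(range(len(data)))
        rng.shuffle(order)
        for i in order:
            t += 1
            eta = 1.0 / (lam * t)  # Pegasos step size
            x, y = data[i], labels[i]
            margin = y * sum(wj * xj for wj, xj in zip(w, x))
            # Gradient step on the regulariser (lam/2)*||w||^2 shrinks w ...
            w = [(1.0 - eta * lam) * wj for wj in w]
            # ... plus a hinge-loss sub-gradient step if the margin is violated.
            if margin < 1.0:
                w = [wj + eta * y * xj for wj, xj in zip(w, x)]
    return w


def predict(w, x):
    # The sign of w.x + b tells which side of the hyperplane x lies on.
    score = sum(wj * xj for wj, xj in zip(w, _augment(x)))
    return 1 if score >= 0 else -1


# Toy two-class data standing in for the feature vectors of two characters.
pos = [(2, 2), (3, 1), (1, 3)]
neg = [(-2, -2), (-3, -1), (-1, -3)]
w = train_linear_svm(pos + neg, [1, 1, 1, -1, -1, -1])
print([predict(w, p) for p in pos + neg])  # [1, 1, 1, -1, -1, -1]
```

Absorbing the bias into the weight vector keeps the update rule short; the trade-off is that the bias is regularised along with the other weights, which is a common simplification rather than the textbook SVM formulation.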

PROPOSED METHODOLOGY
OCR involves several steps to read the characters from a scanned image. In the proposed research, a model has been built for handwritten images: the system extracts the characters from handwritten images and writes them to a text file.

Fig.3 Steps of OCR System
Data Acquisition: The most important initial phase in OCR is acquiring the image, either from a device sensor such as a PDA or tablet in the case of online recognition, or directly as an image containing characters for offline recognition. The image should be in a standard format such as JPEG or BMP.

Pre-Processing: The goal of pre-processing is to simplify the pattern recognition problem without losing any vital information. It reduces noise and inconsistent data, and it enhances the image and prepares it for the next steps.

Segmentation: Segmentation is an integral part of any text-based recognition system. It ensures efficient classification and recognition, and the accuracy of character recognition depends heavily on the segmentation phase.

Normalization: The segmentation process provides isolated characters that are ready to pass through the feature extraction stage. These isolated characters are resized to a specific size depending on the method used, so that each character is represented as an m × n matrix.

Feature Extraction: Features are extracted from the isolated objects/alphabets to form feature vectors. These feature vectors are then used by the classifier to match each input unit with its target output unit. Feature extraction methods are based on three types of features:
➢ Statistical
➢ Structural
➢ Global transformations and moments
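The segmentation step, and the diacritic-gap problem described in the abstract, can be illustrated with a horizontal projection profile. The sketch below is an assumption, not the authors' algorithm: it binarizes a grayscale image with a fixed threshold, cuts the page into text-line bands at empty rows, and merges any band thinner than a hypothetical `min_height` (for example an isolated diacritic strip) into its nearest neighbouring band instead of treating it as a separate text line.

```python
def binarize(gray, threshold=128):
    # Pre-processing: 1 = ink (dark pixel), 0 = background.
    return [[1 if px < threshold else 0 for px in row] for row in gray]


def segment_lines(binary, min_height=2):
    """Return text-line bands as (top, bottom) row ranges, bottom exclusive."""
    profile = [sum(row) for row in binary]  # ink pixels per row
    bands, start = [], None
    for y, count in enumerate(profile):
        if count > 0 and start is None:
            start = y  # a band of ink begins
        elif count == 0 and start is not None:
            bands.append([start, y])  # an empty row closes the band
            start = None
    if start is not None:
        bands.append([start, len(profile)])

    # Merge bands thinner than min_height (e.g. detached diacritics)
    # into whichever neighbouring band is separated by the smaller gap.
    i = 0
    while i < len(bands) and len(bands) > 1:
        top, bottom = bands[i]
        if bottom - top >= min_height:
            i += 1
            continue
        gap_prev = top - bands[i - 1][1] if i > 0 else float("inf")
        gap_next = bands[i + 1][0] - bottom if i + 1 < len(bands) else float("inf")
        if gap_prev <= gap_next:
            bands[i - 1][1] = bottom  # extend the previous line downwards
        else:
            bands[i + 1][0] = top  # extend the next line upwards
        del bands[i]
    return [tuple(b) for b in bands]


# Toy page: a lone diacritic mark (row 1) above the first text line.
W = 6
blank = [255] * W
ink = [0, 0, 255, 0, 0, 255]
dot = [255, 0, 255, 255, 255, 255]
page = [blank, dot, blank, ink, ink, ink, blank, blank, blank, ink, ink, ink, blank]
print(segment_lines(binarize(page)))  # [(1, 6), (9, 12)]
```

Without the merge step the diacritic row would be reported as a text line of its own, which is exactly the kind of small block the abstract says leads to wrong results.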

Classification:
Classification is the last stage, in which the classifier is trained with the feature vectors obtained during feature extraction against the required targets.

Post-Processing: The goal of post-processing is to incorporate context and shape information, since using such information in all stages of an OCR system is necessary for meaningful improvements in recognition rates.
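One simple way to bring context into post-processing is lexicon-based correction. The sketch below is an assumption rather than the paper's method: it replaces each recognized word that is not in a word list with its closest entry according to Python's `difflib.get_close_matches`, and leaves the word unchanged when nothing is similar enough. `LEXICON` is a hypothetical word list built from the sample sentence in the results section.

```python
import difflib
import re

# Hypothetical lexicon; a real system would use a full dictionary.
LEXICON = ["the", "future", "is", "what", "will", "happen", "in", "time",
           "after", "present", "its", "arrival", "considered", "inevitable",
           "due", "to", "existence", "of", "and", "laws", "physics"]


def correct_with_lexicon(text, lexicon, cutoff=0.6):
    """Replace out-of-lexicon words by their most similar lexicon entry."""
    known = set(lexicon)

    def fix(match):
        word = match.group(0)
        if word.lower() in known:
            return word  # already a valid word: keep it (and its case)
        best = difflib.get_close_matches(word.lower(), lexicon, n=1, cutoff=cutoff)
        return best[0] if best else word  # corrected words come back lower-case

    return re.sub(r"[A-Za-z]+", fix, text)


ocr_output = "its arrival is considerd inevitable oue to the extstence of time"
print(correct_with_lexicon(ocr_output, LEXICON))
# its arrival is considered inevitable due to the existence of time
```

The `cutoff` controls how aggressive the correction is: a lower value fixes more recognition errors but also risks replacing correct out-of-vocabulary words.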

Feature extraction:
The following is the feature matching and classification algorithm for matching the extracted character image with other images of the same character, which may be taken at different times, from different viewpoints, or by different sensors.
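The normalization and regional feature steps can be sketched as follows. The paper does not define its regional features, so this assumes zoning: the character image is resized to an m × n matrix by nearest-neighbour sampling, divided into a grid of regions, and the fraction of ink pixels in each region becomes one entry of the feature vector. The function names, the 8 × 8 size and the 4 × 4 grid are illustrative choices.

```python
def normalize(glyph, m=8, n=8):
    """Resize a binary character image (list of rows) to m x n by nearest neighbour."""
    h, w = len(glyph), len(glyph[0])
    return [[glyph[r * h // m][c * w // n] for c in range(n)] for r in range(m)]


def zoning_features(glyph, grid=(4, 4)):
    """Ink density of each region of a normalised glyph, read row by row.

    The glyph height and width are assumed to be multiples of the grid size.
    """
    rows, cols = grid
    bh, bw = len(glyph) // rows, len(glyph[0]) // cols
    features = []
    for i in range(rows):
        for j in range(cols):
            region = [glyph[r][c]
                      for r in range(i * bh, (i + 1) * bh)
                      for c in range(j * bw, (j + 1) * bw)]
            features.append(sum(region) / len(region))
    return features


# A 4 x 4 glyph whose ink occupies the top-left quarter.
glyph = [[1, 1, 0, 0],
         [1, 1, 0, 0],
         [0, 0, 0, 0],
         [0, 0, 0, 0]]
print(zoning_features(normalize(glyph)))
# [1.0, 1.0, 0.0, 0.0, 1.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]
```

Because every character is first normalized to the same size, each one yields a feature vector of the same length (16 here), which is what a classifier such as an SVM expects as input.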

RESULT AND DISCUSSION:
Table columns: No., Images, Real Text, Recognized Text, Accuracy (%)

No. 1
Real Text: The electrical resistance of an electrical conductor is a measure of the difficulty to pass an electric current through that conoctor. The future is what will happe in the time after the present.Its arrival is considerd inevitable due to the existence of time and the laws of physics.
Recognized Text: The future is what will happe in the time after the present its arrival is considerd inevitable oue to the extstence of time and the laws of physics.

Fig.4 Accuracy of proposed work
The figure above represents the accuracy of the proposed work as a graph, in which the X-axis denotes the number of samples used for testing and the Y-axis denotes the accuracy of the proposed work in percent. From the graph, it is observed that the average accuracy on handwritten images is more than 94%. The accuracy of the artificial neural network (ANN), support vector machine (SVM) and adaptive support vector machine (ASVM) classifiers is also compared in the form of a bar graph. This comparison shows that the proposed character recognition system with ASVM is more accurate than with the ANN and SVM classifiers, owing to its better training.
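The paper reports accuracy as a percentage but does not say how it is computed. One common choice, assumed here, is character-level accuracy: one minus the Levenshtein edit distance between the real and recognized text, divided by the length of the real text. The sketch below implements that assumed metric; `char_accuracy` is a hypothetical name.

```python
def levenshtein(a, b):
    """Minimum number of insertions, deletions and substitutions turning a into b."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # delete ca
                           cur[j - 1] + 1,             # insert cb
                           prev[j - 1] + (ca != cb)))  # substitute (free if equal)
        prev = cur
    return prev[-1]


def char_accuracy(real_text, recognized_text):
    """Character-level accuracy in percent, floored at 0."""
    if not real_text:
        return 100.0 if not recognized_text else 0.0
    errors = levenshtein(real_text, recognized_text)
    return max(0.0, 1.0 - errors / len(real_text)) * 100.0


print(levenshtein("kitten", "sitting"))                     # 3
print(round(char_accuracy("existence", "extstence"), 1))   # 88.9
```

Averaging `char_accuracy` over all test samples would give a single figure comparable to the average accuracy quoted above, provided the paper's metric is indeed character-based.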

Comparison between Classifiers
Conclusion:
Due to overlapping and touching characters, there is no significant gap between text lines, so two or more text lines fall into the same text block, which leads to wrong results. The main focus of this research is to experiment in depth with, and find alternative solutions to, the image segmentation and character recognition problems in overlapped character recognition. The existing work applies an SVM classifier, which achieves lower accuracy; therefore, an Adaptive SVM is applied to improve the accuracy of the system.

Future work:
In future, an artificial neural network can be used together with an optimization algorithm to achieve better results by removing more of the noisy data from the images used by the character recognition system. Using an artificial neural network as the classifier instead of SVM, combined with an optimization technique, is expected to increase the precision of character recognition and decrease the noise rate.