New Southern Engineering Enterprises Co.,Ltd. - OCR - The First Step of Integrating RPA with Artificial Intelligence

2023/06/30

OCR - The First Step of Integrating RPA with Artificial Intelligence

OCR, or Optical Character Recognition, is the process of converting images or scanned files into text data. The technology has been around for nearly a century and underpins many applications we encounter in daily life: handheld scanners, license plate recognition in parking lots, and identity document recognition. In today's era of information explosion, paper-based processing can no longer keep up, which makes OCR an essential component of digital transformation for both businesses and the public sector.

How Do Computers Interpret Images?

Unlike the human eye, a computer interprets an image as a grid of numerical values. Take a Convolutional Neural Network (CNN) as an example: the image is converted into a two-dimensional array, and filters are applied to it to extract image features (similar to the filters that photo editing software uses to enhance an image). The network captures features pixel by pixel or block by block, then flattens them into a one-dimensional array for computational analysis and interpretation.
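The filtering and flattening steps can be sketched in a few lines of plain Python. This is only an illustration under simplifying assumptions: the 4x5 image, the vertical-edge filter, and the function names `apply_filter` and `flatten` are made up for the example, and a real CNN learns its filter values during training rather than having them written by hand.

```python
def apply_filter(image, kernel):
    """Slide `kernel` over every position where it fits inside `image`
    and return the resulting feature map (the sliding-window sum used by CNN layers)."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    feature_map = []
    for r in range(out_h):
        row = []
        for c in range(out_w):
            total = 0
            for i in range(kh):
                for j in range(kw):
                    total += image[r + i][c + j] * kernel[i][j]
            row.append(total)
        feature_map.append(row)
    return feature_map


def flatten(feature_map):
    """Turn the two-dimensional feature map into a one-dimensional list."""
    return [value for row in feature_map for value in row]


# Hypothetical image: dark on the left (0), bright on the right (9).
image = [
    [0, 0, 0, 9, 9],
    [0, 0, 0, 9, 9],
    [0, 0, 0, 9, 9],
    [0, 0, 0, 9, 9],
]

# Hand-written filter that responds to dark-to-bright vertical edges.
vertical_edge = [
    [-1, 0, 1],
    [-1, 0, 1],
    [-1, 0, 1],
]

features = apply_filter(image, vertical_edge)
vector = flatten(features)
print(features)  # [[0, 27, 27], [0, 27, 27]]
print(vector)    # [0, 27, 27, 0, 27, 27]
```

The feature map is zero where the image is uniform and large where the brightness changes, which is the sense in which a filter "extracts" a feature.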

Different tasks call for different image algorithms and processing procedures. For example, YOLO (You Only Look Once) is commonly used for object detection, while FaceNet is popular for facial recognition. In the context of OCR, we use a CNN combined with an RNN (Recurrent Neural Network) for text recognition, text analysis, and semantic classification.

Preprocessing also depends on where the image comes from. For example, photos captured by digital cameras often undergo "denoising" to reduce the interference of image noise and improve the accuracy of subsequent feature extraction. Documents produced by scanners may require "skew correction" to straighten pages that were tilted during scanning.
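As a rough illustration of denoising, the sketch below applies a 3x3 median filter, one common way to remove isolated "salt and pepper" pixels. The article does not say which denoising method is used, so the choice of a median filter, the function name `median_filter`, and the sample pixel values are all assumptions made for the example.

```python
from statistics import median


def median_filter(image):
    """Replace each interior pixel with the median of its 3x3 neighborhood.
    Border pixels are left unchanged to keep the sketch short."""
    height, width = len(image), len(image[0])
    result = [row[:] for row in image]
    for r in range(1, height - 1):
        for c in range(1, width - 1):
            window = [
                image[r + dr][c + dc]
                for dr in (-1, 0, 1)
                for dc in (-1, 0, 1)
            ]
            result[r][c] = median(window)
    return result


# Hypothetical grayscale patch: uniform value 10 with two noisy pixels.
noisy = [
    [10, 10, 10, 10],
    [10, 255, 10, 10],
    [10, 10, 0, 10],
    [10, 10, 10, 10],
]

cleaned = median_filter(noisy)
print(cleaned[1][1], cleaned[2][2])  # 10 10
```

Because the median ignores extreme values, the two outliers disappear while the surrounding pixels keep their original value.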

To minimize color interference, speed up computation, and make key features easier to extract, images are typically converted to grayscale, so that every color is represented by a shade of gray. "Binarization" then reduces the image to pure black and white, which separates the text from the background.
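A minimal sketch of these two steps is shown below. It assumes the widely used ITU-R BT.601 luma weights for the grayscale conversion and a fixed threshold of 128 for binarization; the function names and the sample pixels are hypothetical, and production systems often pick the threshold adaptively instead.

```python
def to_grayscale(rgb_image):
    """Convert (R, G, B) pixels to single gray values using BT.601 luma weights."""
    return [
        [round(0.299 * r + 0.587 * g + 0.114 * b) for (r, g, b) in row]
        for row in rgb_image
    ]


def binarize(gray_image, threshold=128):
    """Mark dark pixels as ink (1) and light pixels as background (0).
    The fixed threshold is an assumption made for this example."""
    return [[1 if value < threshold else 0 for value in row] for row in gray_image]


# Hypothetical 2x2 image: white paper, dark blue ink, pale yellow paper, black ink.
rgb_image = [
    [(255, 255, 255), (0, 0, 128)],
    [(255, 255, 200), (0, 0, 0)],
]

gray = to_grayscale(rgb_image)
binary = binarize(gray)
print(gray)    # [[255, 15], [249, 0]]
print(binary)  # [[0, 1], [0, 1]]
```

Both ink colors end up as 1 and both paper colors as 0, so later steps no longer have to deal with color at all.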

OCR Character Recognition

After preprocessing has separated the text from the background, the next step is to identify each character. In the past, OCR often relied on a technique called "template matching," which compares each character image extracted from the page with a set of pre-existing templates and takes the template with the highest similarity as the match. Because it depends on pre-prepared templates, this type of algorithm works well for printed fonts that are neat and consistent.
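The idea behind template matching can be sketched as follows. The 3x5 glyphs, the three-letter template set, and the pixel-agreement score are all invented for illustration; real systems use far larger templates and more robust similarity measures.

```python
# Hypothetical binarized templates: "1" is ink, "0" is background.
TEMPLATES = {
    "I": ["111", "010", "010", "010", "111"],
    "L": ["100", "100", "100", "100", "111"],
    "T": ["111", "010", "010", "010", "010"],
}


def similarity(glyph, template):
    """Fraction of pixels that have the same value in the glyph and the template."""
    pairs = [
        (g, t)
        for glyph_row, template_row in zip(glyph, template)
        for g, t in zip(glyph_row, template_row)
    ]
    return sum(g == t for g, t in pairs) / len(pairs)


def match_character(glyph, templates=TEMPLATES):
    """Return the label of the most similar template and its score."""
    scores = {label: similarity(glyph, tpl) for label, tpl in templates.items()}
    best = max(scores, key=scores.get)
    return best, scores[best]


# A "T" with one smudged pixel in the fourth row.
smudged_t = ["111", "010", "010", "011", "010"]

label, score = match_character(smudged_t)
print(label, round(score, 3))  # T 0.933
```

The smudged glyph still matches "T" best because 14 of its 15 pixels agree with that template. A glyph in an unfamiliar font would agree with no template well, which is the weakness of this approach.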

As the number of fonts grew, template matching proved too inflexible to keep up. This led to the emergence of "feature extraction" techniques. Unlike template matching, this approach looks for the specific features that define each character. Even across different fonts, a character can be recognized accurately as long as its features fall within a certain range. This not only gives OCR greater flexibility but also reduces the impact of low-quality images.
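To make this concrete, the sketch below computes one simple structural feature: the number of enclosed holes in a binarized glyph. The hole count is a hypothetical feature chosen for this example, not one named by the article, and a real recognizer would combine many features; the glyphs and the function name `count_holes` are likewise made up.

```python
def count_holes(glyph):
    """Count background regions fully enclosed by ink ("1") in a binary glyph."""
    height, width = len(glyph), len(glyph[0])
    seen = [[False] * width for _ in range(height)]

    def flood(start_r, start_c):
        # Mark every background pixel reachable from the start (4-connected).
        stack = [(start_r, start_c)]
        while stack:
            r, c = stack.pop()
            if not (0 <= r < height and 0 <= c < width):
                continue
            if seen[r][c] or glyph[r][c] == "1":
                continue
            seen[r][c] = True
            stack.extend([(r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)])

    # Background that touches the border is outside the character, not a hole.
    for r in range(height):
        flood(r, 0)
        flood(r, width - 1)
    for c in range(width):
        flood(0, c)
        flood(height - 1, c)

    holes = 0
    for r in range(height):
        for c in range(width):
            if glyph[r][c] == "0" and not seen[r][c]:
                holes += 1
                flood(r, c)
    return holes


# The same letter "O" in two hypothetical fonts, plus "B" and "C".
o_small = ["111", "101", "111"]
o_large = ["01110", "10001", "10001", "10001", "01110"]
b_glyph = ["110", "101", "110", "101", "110"]
c_glyph = ["111", "100", "111"]

print(count_holes(o_small), count_holes(o_large))  # 1 1
print(count_holes(b_glyph), count_holes(c_glyph))  # 2 0
```

Both versions of "O" produce the same feature value even though their pixels differ, whereas a pixel-by-pixel template comparison would treat them as different shapes.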

However, forms, signatures, and handwriting are harder to recognize because of variations in writing style, writing tools, paper texture, and other factors. Modern OCR technology therefore incorporates machine learning. Machine learning-based OCR automatically learns useful features from large amounts of image data, which brings its reading ability closer to that of a human. Compared to traditional OCR methods, it is better at handling complex scenarios and benefits from the computational power of modern GPUs for fast processing. It is used in most OCR software packages today.

Integration of OCR in Various Industries

In industries that process large volumes of images or documents, such as finance, food, and manufacturing, the way an organization handles them is a quick indicator of its level of digitization and efficiency.

Extracting text from image files alone does not constitute a complete "digital transformation," because tedious tasks such as input, calculation, and verification still have to be carried out, and manual work is prone to human error. These repetitive and time-consuming operations are well suited to "RPA (Robotic Process Automation) bots."

RPA bots have become one of the technologies most widely adopted by global enterprises in recent years. Although they are called robots, they have no artificial intelligence of their own and can only follow predefined steps. When combined with OCR technology, RPA bots gain the ability to read, enabling them to interpret unstructured images and documents. Here are some examples of their applications in various fields:

1. Automated retrieval of account balances in online banking by using OCR to recognize login verification codes.

2. Automated input of policy information by using OCR to recognize the contents of scanned policy documents.

3. Automated extraction of invoice information by using OCR to recognize the contents of scanned invoices.
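The third example above can be sketched as a small post-processing step that a bot might run on OCR output. Everything specific here is an assumption made for illustration: the invoice layout, the field labels, the sample text, and the function name `extract_invoice_fields`. The OCR step itself is represented only by the text it would return.

```python
import re
from decimal import Decimal

# Hypothetical text as it might come back from an OCR engine.
OCR_TEXT = """\
Invoice No: INV-2023-0630
Date: 2023/06/30
Widget A    2 x 150.00    300.00
Widget B    1 x 200.00    200.00
Total: 500.00
"""


def extract_invoice_fields(text):
    """Pull the invoice number, date, and total out of OCR text and
    check that the line amounts add up to the stated total."""
    number = re.search(r"Invoice No:\s*(\S+)", text)
    date = re.search(r"Date:\s*(\d{4}/\d{2}/\d{2})", text)
    total = re.search(r"Total:\s*([\d.]+)", text)
    if not (number and date and total):
        raise ValueError("required invoice field not found in OCR text")

    # The last number on each "quantity x unit price" line is the line amount.
    line_amounts = [
        Decimal(amount)
        for amount in re.findall(r"x\s+[\d.]+\s+([\d.]+)\s*$", text, flags=re.MULTILINE)
    ]
    total_value = Decimal(total.group(1))
    return {
        "number": number.group(1),
        "date": date.group(1),
        "total": total_value,
        "lines_match_total": sum(line_amounts, Decimal("0")) == total_value,
    }


fields = extract_invoice_fields(OCR_TEXT)
print(fields["number"], fields["date"], fields["total"], fields["lines_match_total"])
# INV-2023-0630 2023/06/30 500.00 True
```

The consistency check is the kind of verification work that would otherwise be done by hand. When it fails, a bot can route the invoice to a person instead of entering a possibly misread amount.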

The First Step of Integrating RPA with Artificial Intelligence

"Digital transformation" has become a hot topic of concern for numerous enterprises and public sectors. It is a key trend that thrives in the wave of digitalization. While implementing RPA bots can optimize human resources and streamline cumbersome workflows, there is still significant room for "digital optimization" behind it. By integrating artificial intelligence into the equation, these virtual employees can possess a broader range of business processing capabilities, serve through diverse application channels, and operate in more flexible ways.

If you are facing a large volume of document conversion tasks and want to make better use of your staff, introducing an RPA bot combined with OCR is a crucial step toward digital optimization, and a necessary next step on your digital transformation journey.