What is OCR? How can a Beginner Implement it using Azure Cognitive Services?

What is OCR?

OCR stands for Optical Character Recognition. It is the process of identifying alphanumeric characters in an image and converting them into machine-readable text. The following steps outline the OCR procedure:

  1. Obtain an image
  2. Pre-process the image
  3. Apply a character-recognition algorithm
  4. Post-process the results

Images can be obtained with scanners or cameras. When used correctly, scanners preserve the exact layout of a page, whereas camera-based images tend to distort the dimensions and positions of characters and words because of perspective (the camera is rarely held exactly parallel to the page). These issues can be alleviated through pre-processing.
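Skew introduced during capture is usually corrected during pre-processing. One common approach, sketched below under simplifying assumptions, is the projection-profile method: try a range of candidate angles, rotate the ink pixels by each one, and keep the angle at which the pixels collapse into the fewest, densest horizontal rows. The sketch assumes the image has already been reduced to a list of (x, y) ink-pixel coordinates with y measured upward; the function names are illustrative and not part of any library.

```python
import math
from collections import Counter


def projection_score(points, angle_deg):
    """Rotate (x, y) ink pixels by -angle_deg and score how sharply they
    group into horizontal rows (sum of squared row counts)."""
    theta = math.radians(angle_deg)
    rows = Counter(
        round(-x * math.sin(theta) + y * math.cos(theta)) for x, y in points
    )
    return sum(count * count for count in rows.values())


def estimate_skew(points, max_angle=10):
    """Return the whole-degree angle in [-max_angle, max_angle] whose
    correction best levels the text."""
    candidates = range(-max_angle, max_angle + 1)
    return max(candidates, key=lambda angle: projection_score(points, angle))


if __name__ == "__main__":
    # A single text baseline tilted upward by 5 degrees.
    tilted = [(x, x * math.tan(math.radians(5))) for x in range(51)]
    print(estimate_skew(tilted))  # 5
```

Once the angle is known, the image would be rotated by its negative before recognition. A finer angle grid trades speed for precision.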

Figure 1: Printed text offers a less complex image to operate on

Pre-processing is mostly done to make it easier for computer systems to identify the characters in an image. A wide range of pre-processing algorithms can be applied depending on the requirements. Some of these algorithms include:

  • De-skew
  • Line removal
  • Layout analysis
  • Normalization

Although pre-processing is meant to enhance the image, applying such filters incorrectly can distort or erase characters and so compromise the validity of the extracted data.
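Normalization and thresholding are among the simplest pre-processing steps, and they also show how a filter applied carelessly can destroy data. The minimal sketch below assumes an 8-bit grayscale image stored as a list of rows; it stretches the contrast to the full 0-255 range and then binarizes with a fixed threshold. The threshold of 128 is an arbitrary assumption; real pipelines usually choose it adaptively.

```python
def normalize(gray):
    """Min-max stretch 8-bit grayscale intensities to the full 0-255 range."""
    lo = min(min(row) for row in gray)
    hi = max(max(row) for row in gray)
    if hi == lo:  # a flat image has no contrast to stretch
        return [[0] * len(row) for row in gray]
    return [[round((p - lo) * 255 / (hi - lo)) for p in row] for row in gray]


def binarize(gray, threshold=128):
    """Mark dark pixels as ink (1) and light pixels as background (0)."""
    return [[1 if p < threshold else 0 for p in row] for row in gray]


if __name__ == "__main__":
    # A faded scan: ink at 140-160, paper at 190-200.
    faded = [[140, 200], [160, 190]]
    print(binarize(faded))             # [[0, 0], [0, 0]]  (text lost)
    print(binarize(normalize(faded)))  # [[1, 0], [1, 0]]  (text kept)
```

Thresholding the faded scan directly wipes out every character, while normalizing first preserves them: the order and parameters of filters matter.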

The step that follows pre-processing is the actual character recognition. One of the most basic character-recognition algorithms is pattern matching: the image of each character is compared pixel by pixel with stored samples (glyphs), and the closest sample is chosen. However, this approach breaks down for handwritten text, whose shapes rarely match a stored sample pixel for pixel.
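A minimal sketch of pixel-by-pixel pattern matching, assuming each character has already been isolated and scaled to the same size as the stored glyphs. The 3x3 templates are toy examples invented for illustration; real glyph libraries use far larger bitmaps. The best score doubles as a crude confidence level.

```python
TEMPLATES = {  # hypothetical 3x3 binary glyphs (1 = ink)
    "I": [[0, 1, 0], [0, 1, 0], [0, 1, 0]],
    "L": [[1, 0, 0], [1, 0, 0], [1, 1, 1]],
}


def match_score(glyph, template):
    """Fraction of pixels that agree between two equally sized bitmaps."""
    total = sum(len(row) for row in glyph)
    same = sum(
        g == t
        for glyph_row, template_row in zip(glyph, template)
        for g, t in zip(glyph_row, template_row)
    )
    return same / total


def recognize(glyph, templates=TEMPLATES):
    """Return (label, score) for the best-matching stored template."""
    label = max(templates, key=lambda name: match_score(glyph, templates[name]))
    return label, match_score(glyph, templates[label])


if __name__ == "__main__":
    noisy_i = [[0, 1, 0], [0, 1, 0], [0, 1, 1]]  # one stray pixel
    print(recognize(noisy_i))  # ('I', 0.8888888888888888)
```

A handwritten "I" drawn with a slanted stroke would disagree with both templates in many pixels, which is exactly the weakness described above.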

To overcome the hurdles posed by handwritten text, algorithms can be designed to extract and match features (such as lines, closed loops, line directions, and intersections) rather than raw pixels. Feature extraction also reduces dimensionality, which improves efficiency.
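Zoning is one of the simplest feature types and is used below purely as an illustration: the bitmap is divided into a grid, and each cell is reduced to its ink density. Glyphs of different sizes then map to vectors of the same short length, which can be compared by Euclidean distance. The reference glyphs and function names are hypothetical.

```python
import math


def zone_features(glyph, zones=2):
    """Reduce a binary bitmap to the ink density of each cell in a zones x zones grid.

    Assumes the bitmap is at least `zones` pixels high and wide.
    """
    h, w = len(glyph), len(glyph[0])
    features = []
    for zr in range(zones):
        for zc in range(zones):
            cells = [
                glyph[r][c]
                for r in range(zr * h // zones, (zr + 1) * h // zones)
                for c in range(zc * w // zones, (zc + 1) * w // zones)
            ]
            features.append(sum(cells) / len(cells))
    return features


def nearest_label(glyph, references):
    """Return the reference label whose feature vector is closest to the glyph's."""
    target = zone_features(glyph)
    return min(
        references,
        key=lambda name: math.dist(target, zone_features(references[name])),
    )


REFERENCES = {  # hypothetical 4x4 reference glyphs
    "I": [[0, 1, 1, 0]] * 4,
    "L": [[1, 0, 0, 0]] * 3 + [[1, 1, 1, 1]],
}

if __name__ == "__main__":
    # A larger 6x6 "L": pixel matching against 4x4 glyphs is impossible,
    # but the 4-value feature vectors are directly comparable.
    big_l = [[1, 0, 0, 0, 0, 0]] * 5 + [[1, 1, 1, 1, 1, 0]]
    print(nearest_label(big_l, REFERENCES))  # L
```

Here a 36-pixel glyph is summarized by just four numbers, which is the dimensionality reduction mentioned above.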

Figure 2: Patterns in handwritten text are extremely diverse and can pose a real problem for machine learning algorithms

The confidence level is a metric that indicates how certain the algorithm is about its prediction. Text set in standard fonts and font sizes generally yields higher confidence levels. The final step, post-processing, refines the raw output, for example by checking recognized words against a lexicon of expected words. Beyond these four basic steps, OCR accuracy can be further improved through application-specific optimizations.
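Post-processing is where confidence levels become useful. The sketch below assumes the recognizer returns (word, confidence) pairs with confidence between 0 and 1; that input format is an assumption, not any specific service's output. Words below a threshold are replaced by their closest match in an application-specific lexicon using `difflib` from the Python standard library. The 0.8 threshold and the lexicon are illustrative.

```python
import difflib


def correct_low_confidence(words, lexicon, min_confidence=0.8):
    """Swap low-confidence words for their closest lexicon entry, if one is close enough."""
    corrected = []
    for text, confidence in words:
        if confidence < min_confidence:
            matches = difflib.get_close_matches(text, lexicon, n=1, cutoff=0.6)
            if matches:
                text = matches[0]
        corrected.append(text)
    return corrected


if __name__ == "__main__":
    raw = [("INVOICE", 0.97), ("T0tal", 0.55), ("xyz", 0.40)]
    print(correct_low_confidence(raw, ["INVOICE", "Total", "Date"]))
    # ['INVOICE', 'Total', 'xyz']
```

Leaving unmatched words untouched, as with "xyz" here, avoids forcing genuinely unknown tokens such as names or numbers into the lexicon.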

OCR on Azure

Building OCR from scratch is a complex and tedious endeavor that requires extensive domain knowledge and experience. Speaking for myself, I come from a general computer science background and had no formal technical training in OCR.

However, with the help of Azure's Cognitive Services, OCR becomes accessible to newcomers to the field and to novice programmers alike. The service exposes a simple REST interface, which feels familiar to most developers and is easy to use.

On Azure, OCR is offered as a sub-service of the Computer Vision API. To use Microsoft's OCR service, you therefore need to obtain a Computer Vision key on Azure. Free trial keys are easy to obtain for testing the OCR service.

Regions

Like the LUIS service, Computer Vision is not yet available in every region. At the time of writing, the following regions offer the Computer Vision service:

  • East US
  • East US 2
  • South Central US
  • West US
  • West Central US
  • West US 2
  • North Europe
  • West Europe
  • East Asia
  • Southeast Asia
  • Brazil South
  • Australia East
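The region chosen for the resource becomes the first part of the endpoint URL used by the REST API. The helper below assumes that a region's identifier is its display name in lowercase with spaces removed (for example, West US 2 becomes westus2), which holds for the regions listed above; the list reflects the availability described here and may have changed since.

```python
SUPPORTED_REGIONS = [
    "East US", "East US 2", "South Central US", "West US", "West Central US",
    "West US 2", "North Europe", "West Europe", "East Asia", "Southeast Asia",
    "Brazil South", "Australia East",
]


def region_id(display_name):
    """Turn a display name such as 'West US 2' into an identifier like 'westus2'."""
    return display_name.replace(" ", "").lower()


def endpoint(display_name):
    """Build the base Cognitive Services endpoint for a listed region."""
    if display_name not in SUPPORTED_REGIONS:
        raise ValueError(f"Computer Vision not listed for region: {display_name}")
    return f"https://{region_id(display_name)}.api.cognitive.microsoft.com"
```

For example, `endpoint("West US 2")` returns `"https://westus2.api.cognitive.microsoft.com"`, and an unlisted region raises `ValueError` early instead of failing later with a network error.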

Pricing

Unlike LUIS, the Computer Vision service is offered in a variety of pricing tiers:

Tier  Features                   Unit price
Free  5,000 transactions/month   Free
S1    10 transactions/second     0-1 million transactions: INR 66 per 1,000 transactions
                                 1-5 million transactions: INR 52.80 per 1,000 transactions
                                 Over 5 million transactions: INR 41.97 per 1,000 transactions
S2    10 transactions/second     0-1 million transactions: INR 99.15 per 1,000 transactions
                                 1-5 million transactions: INR 66.10 per 1,000 transactions
                                 Over 5 million transactions: INR 42.97 per 1,000 transactions
S3    10 transactions/second     INR 165.25 per 1,000 transactions
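The pricing table does not say whether the S1 and S2 volume rates apply to all transactions or only to those inside each band. The sketch below assumes graduated pricing, where each rate applies only within its own band, and uses `Decimal` to avoid floating-point rounding in currency arithmetic. The rates are the S1 figures listed above; the function name is illustrative.

```python
from decimal import Decimal

# S1 bands from the table: (upper bound in transactions, INR per 1,000).
# None marks the open-ended top band.
S1_BANDS = [
    (1_000_000, Decimal("66")),
    (5_000_000, Decimal("52.8")),
    (None, Decimal("41.97")),
]


def graduated_cost(transactions, bands):
    """Monthly cost in INR, assuming each rate applies only inside its band."""
    cost = Decimal("0")
    lower = 0
    for upper, rate_per_1k in bands:
        if transactions <= lower:
            break
        top = transactions if upper is None else min(transactions, upper)
        cost += Decimal(top - lower) * rate_per_1k / 1000
        if upper is None:
            break
        lower = upper
    return cost


if __name__ == "__main__":
    # 2M transactions: 66,000 + 52,800 = INR 118,800
    print(graduated_cost(2_000_000, S1_BANDS))
```

If the service instead charged every transaction at the rate of the band reached, the same 2 million transactions would cost 2,000 x INR 52.80 = INR 105,600, so the assumption matters.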

The Computer Vision API distinguishes between OCR for printed text and OCR for handwritten text, and each is served by a different route. As discussed earlier, printed text is far simpler to analyze, mostly because of its standard fonts and the clear distinction between background and foreground. However, printed text in more complicated imagery may be better handled by the handwritten-text route. Handwritten text poses immense problems because of its varied nature: individuals add their own flair to what they write, influenced by environmental and mental factors, which leads to a diverse set of possible shapes for every character in the English alphabet.

The REST API for printed text can be accessed by using the following URL:

  • https://{region-name}.api.cognitive.microsoft.com/vision/{version}/ocr

This endpoint accepts a POST request with the image as binary data in the request body. Alternatively, the image can be supplied by URL. The response is a JSON object containing the recognized text, organized into regions, lines, and words, each with a bounding box.
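A minimal sketch of calling the printed-text endpoint with only the Python standard library. Several details are assumptions rather than facts stated in this article: the `v2.0` version segment, the `Ocp-Apim-Subscription-Key` header, the `language` and `detectOrientation` query parameters, and the regions/lines/words response layout. Building the request separately from sending it lets you inspect it without network access or a real key.

```python
import json
import urllib.request
from urllib.parse import urlencode


def build_ocr_request(region, version, key, image_bytes=None, image_url=None,
                      language="unk", detect_orientation=True):
    """Build (but do not send) a POST request to the printed-text OCR route."""
    query = urlencode({
        "language": language,  # "unk" is assumed to mean auto-detect
        "detectOrientation": str(detect_orientation).lower(),
    })
    url = f"https://{region}.api.cognitive.microsoft.com/vision/{version}/ocr?{query}"
    headers = {"Ocp-Apim-Subscription-Key": key}
    if image_bytes is not None:
        headers["Content-Type"] = "application/octet-stream"  # raw image bytes
        body = image_bytes
    elif image_url is not None:
        headers["Content-Type"] = "application/json"  # body is {"url": ...}
        body = json.dumps({"url": image_url}).encode("utf-8")
    else:
        raise ValueError("supply either image_bytes or image_url")
    return urllib.request.Request(url, data=body, headers=headers, method="POST")


def extract_lines(ocr_json):
    """Join the words of every line, walking regions -> lines -> words."""
    return [
        " ".join(word["text"] for word in line.get("words", []))
        for region in ocr_json.get("regions", [])
        for line in region.get("lines", [])
    ]


if __name__ == "__main__":
    req = build_ocr_request("westus", "v2.0", "<your-key>",
                            image_url="https://example.com/receipt.png")
    print(req.full_url)
    # Sending requires a real key:
    # with urllib.request.urlopen(req) as resp:
    #     print("\n".join(extract_lines(json.load(resp))))
```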

OCR for handwritten text works slightly differently. Because handwriting recognition is more complex, the request is accepted but not processed immediately: Azure returns a 202 (Accepted) response with the operation ID, provided in the Operation-Location response header. This operation needs to be polled periodically to check its status. Once the status indicates success, the output of the OCR operation can be retrieved. The URL for handwritten text is as follows:

  • https://{region-name}.api.cognitive.microsoft.com/vision/{version}/recognizeText
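The polling pattern can be sketched as follows. The sketch assumes the operation URL is taken from the Operation-Location header of the 202 response and that the status JSON has a `status` field whose terminal values are "Succeeded" and "Failed"; it normalizes case and spaces so it does not depend on the exact spelling. The fetch function is passed in, so the loop can be tested without a network.

```python
import json
import time
import urllib.request


def make_fetcher(operation_url, key):
    """Return a function that GETs the operation status as parsed JSON."""
    def fetch():
        req = urllib.request.Request(
            operation_url, headers={"Ocp-Apim-Subscription-Key": key})
        with urllib.request.urlopen(req) as resp:
            return json.load(resp)
    return fetch


def poll_operation(fetch_status, interval_s=1.0, timeout_s=30.0, sleep=time.sleep):
    """Call fetch_status until the operation succeeds, fails, or times out."""
    waited = 0.0
    while True:
        result = fetch_status()
        # Normalize e.g. "Succeeded" or "Not Started" before comparing.
        status = result.get("status", "").replace(" ", "").lower()
        if status == "succeeded":
            return result
        if status == "failed":
            raise RuntimeError("OCR operation failed")
        if waited >= timeout_s:
            raise TimeoutError(f"OCR operation still {status!r} after {waited}s")
        sleep(interval_s)
        waited += interval_s
```

In practice one would call `poll_operation(make_fetcher(operation_url, key))`, where `operation_url` is read from the 202 response. A fixed interval keeps the sketch simple; an exponential backoff would reduce the number of calls for long-running operations.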

Competition

Compared with other cloud-based OCR services, Microsoft's Computer Vision API does not offer anything out of the ordinary. Still, the Redmond giant's services are fairly cost-effective and offer a simple interface that does not scare beginners away.

IBM did offer an OCR service through Watson, but the company has since restricted it to a private beta. This leaves Google as Microsoft's only real competitor.

In my experience, Microsoft's Computer Vision service offers a more straightforward approach and slightly better accuracy, at least on the sample set I used.

If concepts such as computer vision and OCR interest you, have a look at the IoT courses we offer. They showcase the possibilities of computer vision and what the future holds for such technologies in IoT. Check out our IoT courses:

  1. Fundamentals of IoT – Level 1
  2. Working with Electronics in IoT – Level 2
  3. Cloud Robotics and Advanced IoT Architecture – Level 3
  4. Designing and Implementing IoT Solutions – Level 4

Let us know in the comments section which tools and services you use for OCR. We'd love to hear about services we may have missed.

Drop a query if you have any questions regarding OCR (Optical Character Recognition) and I will get back to you quickly.

WRITTEN BY CloudThat

CloudThat is a leading provider of cloud training and consulting services, empowering individuals and organizations to leverage the full potential of cloud computing. With a commitment to delivering cutting-edge expertise, CloudThat equips professionals with the skills needed to thrive in the digital era.

Comments

  1. Sekhar

    Dec 20, 2018

    Please let me know: can we read the text at a particular location in an image? (For example, from an invoice I want to read only the invoice number and the total. Is that possible?)
    This is possible using templates in other OCR engines such as LeadTools and OmniPage.
    But I want this feature with Cognitive Services.

    • Sangram Rath

      Dec 22, 2018

      Sekhar, please take a look at the Computer Vision API of Azure. I believe it can.
