
Bengali Graphemes

Understanding the problem

Optical character recognition (OCR) is particularly challenging for Bengali and for other languages written in alpha-syllabary scripts. One of the main reasons is the non-linear structure of these scripts: diacritics can attach to a base character in several positions rather than simply following it in sequence. A quick comparison of various orthographies can be seen below.

One way to address this problem is to linearize the detection of graphemes, which can help improve OCR. A visual representation of graphemes can be seen in the image below.


Different vowel diacritics (green) and consonant diacritics (red) used in Bengali orthography. The placement of the diacritics does not depend on the grapheme root.

We have the following distribution of grapheme components in Bengali:

Number of unique grapheme roots: 168
Number of unique vowel diacritics: 11
Number of unique consonant diacritics: 7
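
As a rough illustration of what these counts imply, the arithmetic below compares treating every (root, vowel diacritic, consonant diacritic) triple as its own class with predicting the three components separately. The product is only an upper bound on the number of graphemes, since not every combination need occur in real text, and the variable names are hypothetical.

```python
# Component counts taken from the list above.
N_ROOTS, N_VOWELS, N_CONSONANTS = 168, 11, 7

# Upper bound on distinct graphemes if every combination were a separate class.
joint_classes = N_ROOTS * N_VOWELS * N_CONSONANTS

# Total number of outputs if each component is predicted on its own.
separate_outputs = N_ROOTS + N_VOWELS + N_CONSONANTS

print(f"joint classes (upper bound): {joint_classes}")
print(f"separate outputs:            {separate_outputs}")
```

Predicting the components separately keeps the output space small, which is one reason a per-component formulation is attractive here.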

Solutions

OCR systems pass information linearly (left to right or right to left), so alpha-syllabary scripts have to be converted to a linear representation before a convolution-based OCR can be applied. Once this is achieved, we can approach the problem with a multi-head convolutional network, with one head each for the grapheme root, the vowel diacritic, and the consonant diacritic.

My main motivation was to build the best-performing single inference model. I experimented with a couple of convolution-based models. A comparison of the models is shown in the results table below.
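
A minimal sketch of what a multi-head convolutional network could look like is given below, assuming PyTorch and single-channel 64x64 input images. The tiny backbone, the class name `MultiHeadCNN`, and the input size are illustrative assumptions only; they are not the EfficientNet or ResNet models evaluated in this project.

```python
import torch
from torch import nn


class MultiHeadCNN(nn.Module):
    """Shared convolutional backbone with one classification head per component."""

    def __init__(self, n_roots=168, n_vowels=11, n_consonants=7):
        super().__init__()
        # Toy backbone (assumption): two conv layers, then global average pooling.
        self.backbone = nn.Sequential(
            nn.Conv2d(1, 32, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.MaxPool2d(2),
            nn.Conv2d(32, 64, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),
            nn.Flatten(),  # -> (batch, 64)
        )
        # One linear head per grapheme component, all fed by the same features.
        self.root_head = nn.Linear(64, n_roots)
        self.vowel_head = nn.Linear(64, n_vowels)
        self.consonant_head = nn.Linear(64, n_consonants)

    def forward(self, x):
        features = self.backbone(x)
        return (
            self.root_head(features),
            self.vowel_head(features),
            self.consonant_head(features),
        )


model = MultiHeadCNN()
images = torch.zeros(2, 1, 64, 64)  # dummy batch: 2 grayscale images
root_logits, vowel_logits, consonant_logits = model(images)
print(root_logits.shape, vowel_logits.shape, consonant_logits.shape)
```

Sharing one backbone across the three heads means a single forward pass yields all three predictions, which fits the goal of a single inference model.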

Results

Models are evaluated using a hierarchical macro-averaged recall. First, a standard macro-averaged recall is calculated for each component (grapheme root, vowel diacritic, or consonant diacritic). The final score is the weighted average of those three scores, with the grapheme root given double weight.
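
The metric described above can be sketched in plain Python as follows. The function names are hypothetical, and `macro_recall` is a simplified version that averages only over classes present in the true labels; a library implementation may treat classes that appear only in the predictions differently.

```python
from collections import defaultdict


def macro_recall(y_true, y_pred):
    """Unweighted mean of per-class recall over the classes present in y_true."""
    hits = defaultdict(int)
    totals = defaultdict(int)
    for truth, pred in zip(y_true, y_pred):
        totals[truth] += 1
        if truth == pred:
            hits[truth] += 1
    return sum(hits[c] / totals[c] for c in totals) / len(totals)


def weighted_score(root_recall, vowel_recall, consonant_recall):
    """Weighted average of the three recalls; the grapheme root counts double."""
    weights = (2, 1, 1)
    recalls = (root_recall, vowel_recall, consonant_recall)
    return sum(w * r for w, r in zip(weights, recalls)) / sum(weights)


# Toy example: class 0 has recall 1/2 and class 1 has recall 2/2.
toy_recall = macro_recall([0, 0, 1, 1], [0, 1, 1, 1])

# Combining three per-component validation recalls into one overall score.
overall = weighted_score(0.971, 0.988, 0.978)
print(round(toy_recall, 3), round(overall, 3))
```

The last call uses the EfficientNet B1 validation recalls from the results table, and the result agrees with the overall validation score reported there.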

| Model name | Validation: Root | Validation: Vowel Diacritic | Validation: Consonant Diacritic | Validation: Overall | Public score | Private score |
|---|---|---|---|---|---|---|
| EfficientNet B1 | 0.971 | 0.988 | 0.978 | 0.977 | 0.9638 | 0.9385 |
| EfficientNet B2 | 0.921 | 0.978 | 0.987 | 0.952 | 0.9615 | 0.9245 |
| Resnet18 | 0.951 | 0.982 | 0.978 | 0.966 | 0.9599 | 0.9223 |