Utilizing ANNs More Efficiently
It has come to my attention that GOCR has its shortcomings.
The problem is that small pixel changes caused by surrounding noise make it recognize h’s as b’s, and little things like that. Most of these OCR packages were never designed to learn new alphabets. If you open up the OCRAD source code, you’ll see that it breaks each letter down into a list of features, which are then used to assign a probability to the most likely letter. None of it is trained; it is all pre-planned and hard-coded into the program. I’m not 100% sure how GOCR training works, but I think it happens at the pixel level.
A while back I did a post on training a neural network at the pixel level to recognize characters. It took a long time because it was written in PHP (please use the C++ libraries for training unless you have a lifetime to spend training the neurons), but it started to work, although not as accurately as GOCR. The problems were the obvious ones again, like h’s getting picked up as b’s. You can see why the neurons failed to recognize the character.
The nice thing about neural networks is that they’re pretty simple to use once you get to grips with choices like the number of layers you might want. The other nice thing is that we can stack them together, in a similar way to how full adders are chained: just as you can pass a carry flag from one adder to the next, you can pass the output of one net into the next.
So the reason our neural nets are failing is that the pixels differ a little here and there, and without some knowledge of exactly how a letter is formed it’s difficult to know which letter it is. If we take a step back from the pixel level, there are a couple of things we can analyze. We can look for hills and valleys; an ‘n’, for example, has a space inside it. If we calculate the baseline of the text, we can also identify whether the text has strokes that dip below the baseline or rise very high above it. Using just these features we can train a neural net to narrow the guess down to a range of characters. Then we can feed its output into our second neural net, which works at the pixel level or concatenates the output of a couple of other nets.
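To make the feature idea a bit more concrete, here is a rough Python sketch of the kind of features I mean. Everything in it is my own assumption for illustration: the tiny 5x9 glyphs, the '#' and '.' pixel format, the function names, and the fact that the x-height and baseline rows are simply handed in rather than calculated. It is not how GOCR or OCRAD do it.

```python
from collections import deque

# Hypothetical 5x9 glyphs: '#' is ink, '.' is background.
# Rows 0-2 are the ascender zone, rows 3-6 the letter body, rows 7-8 the descender zone.
H_GLYPH = ["#....", "#....", "#....", "####.", "#...#", "#...#", "#...#", ".....", "....."]
B_GLYPH = ["#....", "#....", "#....", "####.", "#...#", "#...#", "####.", ".....", "....."]

def count_enclosed_holes(grid):
    """Count background regions that cannot reach the edge of the image."""
    h, w = len(grid), len(grid[0])
    seen = [[False] * w for _ in range(h)]
    holes = 0
    for r in range(h):
        for c in range(w):
            if grid[r][c] == "#" or seen[r][c]:
                continue
            # Flood fill one background region and note whether it touches the border.
            touches_border = False
            queue = deque([(r, c)])
            seen[r][c] = True
            while queue:
                y, x = queue.popleft()
                if y in (0, h - 1) or x in (0, w - 1):
                    touches_border = True
                for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                    ny, nx = y + dy, x + dx
                    if 0 <= ny < h and 0 <= nx < w and not seen[ny][nx] and grid[ny][nx] != "#":
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            if not touches_border:
                holes += 1
    return holes

def has_bottom_opening(grid, baseline_row):
    """True if the baseline row has a gap with ink on both sides (a valley open downward)."""
    return "." in grid[baseline_row].strip(".")

def extract_features(grid, xheight_row, baseline_row):
    """Return [ascender, descender, enclosed holes, bottom opening] as floats."""
    ascender = any("#" in row for row in grid[:xheight_row])
    descender = any("#" in row for row in grid[baseline_row + 1:])
    return [
        float(ascender),
        float(descender),
        float(count_enclosed_holes(grid)),
        float(has_bottom_opening(grid, baseline_row)),
    ]

h_features = extract_features(H_GLYPH, xheight_row=3, baseline_row=6)
b_features = extract_features(B_GLYPH, xheight_row=3, baseline_row=6)
```

On these two toy glyphs the 'h' and the 'b' differ by only two pixels, but they come out with different feature vectors: the 'b' has a closed loop and the 'h' has a valley that is open at the bottom. That is the sort of knowledge a net working purely on pixels has to discover for itself.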
The idea behind this is that feature extraction is a proven technique that gets very good results until the characters deviate from the norm. Using multiple nets, we should be able to combine the ability to train on a new alphabet with the power of feature extraction.
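As a sketch of how two stacked nets might be wired together, here is a small Python example. It only shows the plumbing: the weights are random placeholders rather than trained values, each "net" is a single sigmoid layer with no bias, and the sizes (4 features, 3 character groups, 45 pixels, 26 letters) and all the names are assumptions I made up for illustration.

```python
import math
import random

def make_layer(n_in, n_out, rng):
    """Build an untrained layer: one row of random weights per output neuron."""
    return [[rng.uniform(-1, 1) for _ in range(n_in)] for _ in range(n_out)]

def forward(layer, inputs):
    """Run one fully connected layer with a sigmoid activation."""
    return [
        1.0 / (1.0 + math.exp(-sum(w * x for w, x in zip(row, inputs))))
        for row in layer
    ]

rng = random.Random(0)  # fixed seed so the placeholder weights are repeatable
N_FEATURES, N_GROUPS, N_PIXELS, N_LETTERS = 4, 3, 45, 26

# Stage 1 sees only the extracted features and guesses at a range of characters.
feature_net = make_layer(N_FEATURES, N_GROUPS, rng)
# Stage 2 sees the raw pixels plus the stage 1 guess, like a carry passed along.
pixel_net = make_layer(N_PIXELS + N_GROUPS, N_LETTERS, rng)

def classify(features, pixels):
    """Feed the feature net's output into the pixel net alongside the pixels."""
    group_scores = forward(feature_net, features)
    letter_scores = forward(pixel_net, pixels + group_scores)
    return group_scores, letter_scores

# Example call with a hand-picked feature vector and a blank 5x9 pixel grid.
group_scores, letter_scores = classify([1.0, 0.0, 1.0, 0.0], [0.0] * N_PIXELS)
```

In a real setup both nets would be trained, ideally with one of the C++ libraries, and the second net could just as easily take the outputs of several feature nets instead of raw pixels.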
Friday, July 11th, 2008
