Utilizing ANNs More Efficiently

It has come to my attention that GOCR has its shortcomings. :D The problem is that small pixel changes caused by surrounding noise make it recognize h’s as b’s, and little things like that. Most of these OCR packages were never designed to learn new alphabets. If you open up the OCRAD source code you’ll see that it breaks each letter down into a list of features, which are then used to assign a probability to the most likely letter. None of it is trained. It is all pre-planned and hard coded into the program. I’m not 100% sure how GOCR training works but I think it happens at the pixel level.

Now a while back I did a post on training a neural network at the pixel level to recognize characters. It took a long time because it was written in PHP (please use the C++ libraries for training unless you have a lifetime to train the neurons) but it started to work, although not as accurately as GOCR. The problems were the obvious ones again, like h’s getting picked up as b’s. You can see why the neurons failed to recognize the character: at the pixel level the two letters are nearly identical.

The nice thing about neural networks is that they’re pretty simple to use once you get to grips with the number of layers you might want and so on. The other nice thing is that we can stack them together, in a similar way to how full adders are chained: just as one adder passes its carry flag to the next, one net can pass its output to the next.

So the reason our neural nets are failing is that the pixels differ a little here and there, and without some knowledge of exactly how a letter is formed it’s difficult to know which letter it is. If we take a step back from the pixel level there are a couple of things we can analyze. We can look for hills & valleys. For example, an ‘n’ has a space inside it. If we work out the baseline of the text we can also tell whether the word has strokes that dip below the baseline or rise well above it. Just using these features we can train a neural net to narrow the guess down to a range of characters. Then we can feed its output into a second neural net, which either works at the pixel level or concatenates the output of a couple of other nets.
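To make those features concrete, here is a minimal sketch in plain Python. It is not how GOCR or OCRAD do it. The function name `glyph_features`, the toy 5x7 bitmaps, and the idea that the x-height and baseline rows are already known are all assumptions made for illustration. It reports whether a glyph has ink above the x-height or below the baseline, counts fully enclosed spaces, and counts gaps along the baseline (the open "valley" under an ‘n’ or ‘h’).

```python
from collections import deque

# Toy 5x7 bitmaps, '#' = ink. Rows 0-1 are the ascender zone, row 5 is the
# baseline and row 6 is the descender zone (all assumed, not measured).
H_GLYPH = ["#....", "#....", "####.", "#...#", "#...#", "#...#", "....."]
B_GLYPH = ["#....", "#....", "####.", "#...#", "#...#", "####.", "....."]

def glyph_features(rows, x_height_row, baseline_row):
    """Return a few coarse shape features for one binary glyph bitmap."""
    h, w = len(rows), len(rows[0])
    ink = [[c == "#" for c in r] for r in rows]
    seen = [[False] * w for _ in range(h)]

    def flood(sy, sx):
        # Mark every background cell 4-connected to (sy, sx).
        seen[sy][sx] = True
        queue = deque([(sy, sx)])
        while queue:
            y, x = queue.popleft()
            for ny, nx in ((y + 1, x), (y - 1, x), (y, x + 1), (y, x - 1)):
                if 0 <= ny < h and 0 <= nx < w and not ink[ny][nx] and not seen[ny][nx]:
                    seen[ny][nx] = True
                    queue.append((ny, nx))

    # Background reachable from the border is "outside" the letter.
    for y in range(h):
        for x in range(w):
            on_border = y in (0, h - 1) or x in (0, w - 1)
            if on_border and not ink[y][x] and not seen[y][x]:
                flood(y, x)

    # Any background region still unvisited is an enclosed space.
    holes = 0
    for y in range(h):
        for x in range(w):
            if not ink[y][x] and not seen[y][x]:
                holes += 1
                flood(y, x)

    # Gaps between separate ink runs on the baseline row (open valleys).
    baseline = "".join("#" if v else " " for v in ink[baseline_row])
    bottom_gaps = max(len(baseline.split()) - 1, 0)

    return {
        "ascender": any(any(ink[y]) for y in range(x_height_row)),
        "descender": any(any(ink[y]) for y in range(baseline_row + 1, h)),
        "holes": holes,
        "bottom_gaps": bottom_gaps,
    }

h_feats = glyph_features(H_GLYPH, x_height_row=2, baseline_row=5)
b_feats = glyph_features(B_GLYPH, x_height_row=2, baseline_row=5)
print(h_feats["holes"], h_feats["bottom_gaps"])  # 0 1
print(b_feats["holes"], b_feats["bottom_gaps"])  # 1 0
```

The two toy bitmaps differ by only a few pixels on the baseline row, yet the hole count and the baseline gap count separate them cleanly, which is exactly the h versus b confusion the pixel-level net struggles with.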

The idea behind this is that feature extraction is a proven technique that gets very good results until the characters deviate from the norm. Using multiple nets we should be able to combine the ability to train a new alphabet with the power of feature extraction.
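As a rough sketch of the wiring only, the snippet below chains two tiny nets in plain Python: the first sees a handful of shape features and outputs a guess over the alphabet, and the second sees the raw pixels plus that guess. Everything here is an assumption for illustration: the class `TinyNet`, the layer sizes, the 5x7 pixel grid, and the four-feature input are made up, and the weights are random rather than trained, so the scores mean nothing until a training step is added.

```python
import math
import random

def dense(inputs, weights, biases):
    # One fully connected layer: a weighted sum plus bias per output unit.
    return [sum(w * x for w, x in zip(row, inputs)) + b
            for row, b in zip(weights, biases)]

def softmax(z):
    # Turn raw scores into values that are positive and sum to 1.
    m = max(z)
    exps = [math.exp(v - m) for v in z]
    total = sum(exps)
    return [v / total for v in exps]

def random_layer(n_in, n_out, rng):
    weights = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out

class TinyNet:
    """Hypothetical one-hidden-layer net with untrained random weights."""

    def __init__(self, n_in, n_hidden, n_out, rng):
        self.layer1 = random_layer(n_in, n_hidden, rng)
        self.layer2 = random_layer(n_hidden, n_out, rng)

    def forward(self, x):
        hidden = [math.tanh(v) for v in dense(x, *self.layer1)]
        return softmax(dense(hidden, *self.layer2))

def stacked_forward(feature_net, pixel_net, features, pixels):
    # Stage 1: coarse guess from shape features only.
    guess = feature_net.forward(features)
    # Stage 2: pixels plus the stage-1 guess, like a carry passed along.
    return pixel_net.forward(list(pixels) + guess)

N_CLASSES = 26      # assumed: lowercase a-z
N_FEATURES = 4      # assumed: ascender, descender, holes, bottom gaps
N_PIXELS = 5 * 7    # assumed: a 5x7 glyph bitmap

rng = random.Random(0)
feature_net = TinyNet(N_FEATURES, 8, N_CLASSES, rng)
pixel_net = TinyNet(N_PIXELS + N_CLASSES, 16, N_CLASSES, rng)

features = [1.0, 0.0, 1.0, 0.0]
pixels = [rng.choice([0.0, 1.0]) for _ in range(N_PIXELS)]
scores = stacked_forward(feature_net, pixel_net, features, pixels)
print(len(scores))
```

Feeding the first net's whole output vector forward, rather than just its top pick, lets the pixel-level net overrule the feature net when a character deviates from the norm.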

7 Responses to “Utilizing ANNs More Efficiently”

  1. Laboratory Testing Says:

    That’s actually a very cool idea. I want to write a test program to see if it actually works.

  2. Harry Says:

    I’m just not sure of the best way to interlink them that’s all. Do you add it as an extra bunch of inputs with the pixel information or link two nets together… I haven’t tried the latter method but I reckon it might give the best results. Obviously you need to train 100,000 odd times but that’s just neural nets for ya.

  3. no-one Says:

    What you’re describing is hierarchical classification. You might want to look into boosting as well.

    I don’t see any reason you should use a neural net here over any other classifier. Look into some other classifiers, such as naive Bayes classifiers and Support Vector Machines, or even Bayesian networks. You’ll find that fast, free, open source solutions exist for all of them.

    If you want to get more complicated start looking at adaptive mixture models and expectation maximization.

  4. Harry Says:

    I guess I use neural networks because they’re simple and easy to understand as well as being pretty powerful. You can knock some code out pretty fast and it will do a lot. Several of those I haven’t even come across :D . I like hidden Markov models too because they make a lot of probabilistic decisions on many different levels, which is especially nice when you have something that conforms to certain boundaries that can be described by a Markov process. But it takes a lot longer to code something that uses that technology.

    What surprises me is that although there are open source solutions for these bits and pieces, no one has fitted all the parts together. We have a choice in open source OCR between gocr, ocrad, and tesseract. The user doesn’t get much choice in software configuration. The ocr program either works or fails.

  5. no-one Says:

    I see your point. Perhaps I should work on a package for building your own OCR using your choice of feature extraction, choice of classifier, include some example training and test sets, etc. Think there’s a market?

  6. Harry Says:

    Yeah, a specialist niche market. I doubt there’s a huge number interested which is why we need to steal the guys from gocr camp and make them work on our projects open source style. If it did cursive offline handwriting recognition that would raise a few more eyebrows from people with different motives. Even if it was writer dependent.

    I’m currently just using ANNs for segmentation because I’m lazy. I like to staple ANNs together and leave them training in the background. I wonder if, with enough of them stapled together, each testing for different parameters and all trained on noisy input, we could outperform some of the best OCR software with half the code. Don’t go overboard on the feature extraction, just keep it to basic observations of different things, then link it with another ANN testing on another observation, and then just plug all the outputs into one final network. Steal someone’s layout analysis code from somewhere. Strip out their copyright notices and put it all together. Hire a used car salesman and sell it to people walking by who were only looking to buy stamps or something.

  7. Debt Reduction Services Says:

    Hey, I love this blog - I will try and keep up with it!! Please keep more coming :) I wish I could start a blog but I don’t have much time. I consider your advice as very valuable.

    Really Great. :) Thanks,
