In my last post I said that to separate characters you simply need to flood fill them and work out the extreme points to find where to fit a rectangle around each one. Hahaha. I’ve spent the last few days trying to port optimized flood fill functions to PHP. Normally I’d take the sane, easy option and use pre-written code like GOCR, but apparently I like pain.
The problem is the optimized bit. I can write a simple recursive flood fill function that calls itself until it’s done, but I had no idea how to write something reasonably fast. The more complicated captchas will need a fair bit of speed, because you will want to crack a fair few of them in a short period of time. This site is where I eventually found a simple routine that worked. My issue was that I had first picked another routine, ported it, and then it broke. Miserably. After filling only two lines.
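For comparison, the simple recursive version mentioned above might look roughly like this. It is only a sketch: it works on a plain 2D array of pixel values rather than a GD image, and the names recursiveFill, $grid and $box are made up for illustration. Every black pixel costs a function call plus four more attempts on its neighbours, which is where the slowness (and the deep recursion on big letters) comes from.

```php
<?php
// Hypothetical helper: naive 4-way recursive flood fill on a 2D array.
// $grid[$y][$x] is 0 for a black (letter) pixel and 1 for background;
// filled pixels are set to 2. $box grows to cover every filled pixel.
function recursiveFill(array &$grid, int $x, int $y, array &$box): void
{
    // stop at the image edge or at anything that is not black
    if ($y < 0 || $y >= count($grid) || $x < 0 || $x >= count($grid[$y])) return;
    if ($grid[$y][$x] !== 0) return;

    $grid[$y][$x] = 2;
    $box['x1'] = min($box['x1'], $x);
    $box['x2'] = max($box['x2'], $x);
    $box['y1'] = min($box['y1'], $y);
    $box['y2'] = max($box['y2'], $y);

    // one call per neighbour
    recursiveFill($grid, $x + 1, $y, $box);
    recursiveFill($grid, $x - 1, $y, $box);
    recursiveFill($grid, $x, $y + 1, $box);
    recursiveFill($grid, $x, $y - 1, $box);
}

// a tiny 5x4 "image" holding an L shape and a separate blob on the right
$grid = array(
    array(1, 0, 1, 1, 1),
    array(1, 0, 1, 1, 0),
    array(1, 0, 0, 1, 0),
    array(1, 1, 1, 1, 1),
);
// start the box at the seed pixel, the same way the GD version does
$box = array('x1' => 1, 'x2' => 1, 'y1' => 0, 'y2' => 0);
recursiveFill($grid, 1, 0, $box);
print_r($box);
```

Only the L shape touching the seed pixel gets filled, so the box ends up wrapped around that one letter and the blob on the right is left alone.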
Here is my code. First it loads in an image of a captcha. It then scans two lines along the horizontal axis, one at 1/4 of the way down the picture and one at 3/4 of the way down. When it hits a character it floodfills it. The floodfill function returns the extreme positions of pixels which gives us a rectangle around that letter.
<?php
function floodFillScanlineStack($image, $x, $y)
{
    // the colour we are replacing - black letters
    $oldColour = 0;
    // the colour we want to shade the letters in - red just because we can
    $fillColour = imagecolorallocate($image, 255, 0, 0);
    // we need the image width & height
    $w = imagesx($image);
    $h = imagesy($image);
    // set the rectangle co-ords
    $rectangle = array("x1" => $x, "x2" => $x, "y1" => $y, "y2" => $y);
    // filling a colour with itself would never finish
    if ($oldColour == $fillColour) return $rectangle;
    $stack = array();
    $stack[] = array("x" => $x, "y" => $y);
    while (count($stack) > 0)
    {
        $pos = array_pop($stack);
        $x = $pos['x'];
        $y = $pos['y'];
        // walk up to the top of this vertical run of black pixels
        $y1 = $y;
        while ($y1 >= 0 && imagecolorat($image, $x, $y1) == $oldColour) $y1--;
        $y1++;
        $spanLeft = 0;
        $spanRight = 0;
        // walk back down the run, filling as we go
        while ($y1 < $h && imagecolorat($image, $x, $y1) == $oldColour)
        {
            // here we set the pixel colour
            imagesetpixel($image, $x, $y1, $fillColour);
            // use these to find our rectangle around the letter
            if ($x < $rectangle['x1'])
                $rectangle['x1'] = $x;
            if ($y1 < $rectangle['y1'])
                $rectangle['y1'] = $y1;
            if ($x > $rectangle['x2'])
                $rectangle['x2'] = $x;
            if ($y1 > $rectangle['y2'])
                $rectangle['y2'] = $y1;
            // queue one seed per run of black pixels in the column to the left
            if ($spanLeft == 0 && $x > 0 && imagecolorat($image, $x - 1, $y1) == $oldColour)
            {
                $stack[] = array("x" => $x - 1, "y" => $y1);
                $spanLeft = 1;
            }
            else if ($spanLeft == 1 && $x > 0 && imagecolorat($image, $x - 1, $y1) != $oldColour)
            {
                $spanLeft = 0;
            }
            // same for the column to the right; $x + 1 has to stay inside
            // the image, so the limit is $w - 1 rather than $w
            if ($spanRight == 0 && $x < $w - 1 && imagecolorat($image, $x + 1, $y1) == $oldColour)
            {
                $stack[] = array("x" => $x + 1, "y" => $y1);
                $spanRight = 1;
            }
            else if ($spanRight == 1 && $x < $w - 1 && imagecolorat($image, $x + 1, $y1) != $oldColour)
            {
                $spanRight = 0;
            }
            $y1++;
        }
    }
    return $rectangle;
}
function floodfill_char($image, $x)
{
    // fill from whichever of the two scan lines hit a black pixel
    if ((imagecolorat($image, $x, 12) == 0))
        return floodFillScanlineStack($image, $x, 12);
    if ((imagecolorat($image, $x, 38) == 0))
        return floodFillScanlineStack($image, $x, 38);
}
function split_chars_along_vertical($image)
{
    $w = imagesx($image);
    $h = imagesy($image);
    /* $rgb = imagecolorat($img, $x, $y);
    $r += $rgb >> 16;
    $g += $rgb >> 8 & 255;
    $b += $rgb & 255; */
    // scan along each vertical line looking for black pixels
    // we'll only scan two lines of pixels to save time. Both along the center
    // split slightly apart
    $letters = array();
    for ($index = 0; $index < $w; $index++)
    {
        // check two lines of pixels one at 12 down, one at 38 down
        // the picture is 50 pixels tall by the way
        if ((imagecolorat($image, $index, 12) == 0) || (imagecolorat($image, $index, 38) == 0))
        {
            // fill the character and return a rectangle around it
            $rectangle = floodfill_char($image, $index);
            $letterWidth = $rectangle['x2'] - $rectangle['x1'] + 1;
            $letterHeight = $rectangle['y2'] - $rectangle['y1'] + 1;
            // pull this letter out into a new image
            $singleLetter = imagecreatetruecolor($letterWidth, $letterHeight);
            imagecopy($singleLetter, $image, 0, 0, $rectangle['x1'], $rectangle['y1'],
                $letterWidth, $letterHeight);
            $letters[] = $singleLetter;
            // find the next character: the loop's $index++ moves us
            // to the first column past this letter
            $index = $rectangle['x2'];
        }
    }
    return $letters;
}
$image = @imagecreatefrompng('82.clean.png');
if ($image === false) { die('Unable to open image'); }
$letters = split_chars_along_vertical($image);
if (count($letters) == 0) { die('No letters found'); }
// dump the first letter to the screen
header("Content-Type: image/png");
imagepng($letters[0]);
?>
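The listing hardcodes the two scan rows as 12 and 38 because my captcha is 50 pixels tall. If yours vary in height, the rows could be worked out from the height instead. Here is a rough sketch of that idea on a plain 2D array (0 is black) rather than a GD image; scanRows and firstBlackColumn are hypothetical names, not functions from the code above, and the grid is assumed to have at least one row.

```php
<?php
// Hypothetical helper: rows at roughly 1/4 and 3/4 of the image height.
function scanRows(int $height): array
{
    return array(intdiv($height, 4), intdiv(3 * $height, 4));
}

// Hypothetical helper: first column at or after $from where either
// scan row holds a black (0) pixel, or -1 if there is none.
function firstBlackColumn(array $grid, int $from): int
{
    $rows = scanRows(count($grid));
    $w = count($grid[0]);
    for ($x = $from; $x < $w; $x++) {
        foreach ($rows as $y) {
            if ($grid[$y][$x] === 0) return $x;
        }
    }
    return -1;
}

// 4 rows tall, so the scan rows are 1 and 3
$grid = array(
    array(1, 1, 1, 1),
    array(1, 1, 0, 1),
    array(1, 1, 1, 1),
    array(1, 0, 1, 1),
);
print_r(scanRows(50));
echo firstBlackColumn($grid, 0), "\n";
```

With integer division a 50 pixel image gives rows 12 and 37, one off from the 38 I picked by hand, which makes no real difference as long as both rows cut through every letter.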
To run through the code and show how it works, I made this neat little gif. To be honest it’s probably just a waste of my bandwidth, but it looks pretty cool.