Archive for July, 2008

The most fucktarded content rewriting

So what’s the one thing you shouldn’t forget to do when you scrape someone’s content? Link to them? Yeah, that’d be a good start.

http://seosandbox.com/2008/07/25/user-contributed-captcha-breaking-w-phpbb2-example/

Type this twat’s URL in and you’ll have flashbacks to March 2008 and a post on bluehatseo.com, except someone swallowed a very small thesaurus. Bluehatseo.com got DDoS’d for posting that article. I think this guy should get a dose of the same.

On the plus side, he didn’t use Markov’d content; he simply replaces certain words with others and doesn’t reorder the sentences. What we need is a site with a captcha that says “rewrite this sentence so it means the same thing in a different way”. Put some porn on the site, take the answers and use them in our content rewriter. Filter out dodgy words, etc.

Anyway, I was gonna make a post about how to compile Windows programs from Linux, since I’m finally having some success with it, but this guy made me hit the trigger button.

Wednesday, July 30th, 2008

Money.co.uk: Keeping up?

Have you all been keeping up with money.co.uk? They’ve put out a couple of bits of linkbait that I’ve seen. Maybe you’ve seen more. I wonder if it worked?

Money search

Click the picture to see a google.co.uk search for “money”. They’re not first; I think they’re about 5th. But who knows what else they have to turn loose on the search engines?

Friday, July 25th, 2008

phpBB3 Code

Windows is a total pain. I spent ages trying to get this phpBB3 code to compile on it. The code is messy as anything; it was written pretty fast just to get the job done. If people are interested I might release piratebay, which has a lot cleaner code. I actually used separate modules :D

The code just runs through the entire phpBB3 captcha, filling in areas until it finds a small one. It assumes that any small area must be part of a letter. It then blurs all the little squares together, spaces the letters out better, rotates them and dumps a file that GOCR can read.
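The fill-and-keep-the-small-areas step can be sketched roughly like this. It’s a minimal Python sketch, not the tool’s actual code: it assumes the captcha has already been binarised into a grid of 0/1 pixels, and `label_regions`, `small_region_mask`, `to_pbm` and the `max_size` threshold for “small” are all hypothetical names. It labels 4-connected regions with a breadth-first flood fill, keeps only the small ones, and writes them out as plain PBM (P1), a netpbm format GOCR should be able to read. The blur, spacing and rotation steps are left out.

```python
from collections import deque

def label_regions(grid):
    """Flood-fill every 4-connected region of equal-valued pixels.

    Returns a list of (value, pixels) pairs, pixels being (row, col) tuples."""
    rows, cols = len(grid), len(grid[0])
    seen = [[False] * cols for _ in range(rows)]
    regions = []
    for r in range(rows):
        for c in range(cols):
            if seen[r][c]:
                continue
            value = grid[r][c]
            seen[r][c] = True
            queue = deque([(r, c)])
            pixels = []
            while queue:  # breadth-first flood fill
                y, x = queue.popleft()
                pixels.append((y, x))
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if (0 <= ny < rows and 0 <= nx < cols
                            and not seen[ny][nx] and grid[ny][nx] == value):
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            regions.append((value, pixels))
    return regions

def small_region_mask(grid, max_size):
    """1 for pixels in regions of at most max_size pixels (assumed letter fragments)."""
    mask = [[0] * len(grid[0]) for _ in grid]
    for _value, pixels in label_regions(grid):
        if len(pixels) <= max_size:
            for y, x in pixels:
                mask[y][x] = 1
    return mask

def to_pbm(mask):
    """Serialise a 0/1 mask as plain-text PBM (P1), where 1 is black."""
    lines = ["P1", f"{len(mask[0])} {len(mask)}"]
    lines += [" ".join(str(v) for v in row) for row in mask]
    return "\n".join(lines) + "\n"
```

Writing `to_pbm(small_region_mask(grid, max_size))` out to a `.pbm` file gives GOCR something to chew on. A sensible `max_size` would depend on how big the background cells are in the captcha you’re dealing with.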

The Windows executable barely compiled. Hopefully it works, but since I don’t use Windows I have no idea really :D

Here it is: PhpBB3 Crack tool

Monday, July 21st, 2008

Would you like to be showered with quality links?

If you would… Here’s the plan.

Make some ridiculously shit web page that does nothing except look cool. Maybe you take a test and get some shit answer out. Whatever it is, make sure people want to put some kind of pointless widget from it on their blog. Whilst you’re at it, squeeze a sneaky link back to your real site into the widget.

Then you’ll be wanting to buy or bully your way into a review on a top site. Let’s say… John Chow… That sounds cool.

Oh no, wait. It’s just been done. 

Tuesday, July 15th, 2008

Utilizing ANNs More Efficiently

It has come to my attention that GOCR has its shortcomings. :D The problem is that small pixel changes from surrounding noise make it recognize h’s as b’s, and little things like that. Most of these OCR packages were never designed to learn new alphabets. If you open up the OCRAD source code, you’ll see that it breaks each letter down into a list of features, which are then used to assign a probability as to which letter it most likely is. None of it is trained; it’s all pre-planned and hard-coded into the program. I’m not 100% sure how GOCR training works, but I think it happens at the pixel level.

A while back I did a post on training a neural network at the pixel level to recognize characters. It took a long time because it was in PHP (please use the C++ libraries for training unless you have a lifetime to spend training neurons), but it started to work, although not as accurately as GOCR. The problems were the obvious ones, like h’s getting picked up as b’s again. You can see why the neurons failed: an h differs from a b only by the few pixels that close off the bottom of the bowl.
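To see why a pixel-level net trips up on h and b, here’s a toy version. It’s a single-layer perceptron in Python rather than the multi-layer net from the earlier post, and the 5x5 bitmaps are made up for illustration. Trained on one b and one h, the only pixels it ends up caring about are the two that close the bottom of the b’s bowl, so one stray noise pixel there is enough to turn an h into a b.

```python
def flatten(rows):
    """Turn a list of '1'/'.' strings into a flat 0/1 pixel vector."""
    return [1 if ch == "1" else 0 for row in rows for ch in row]

def predict(weights, bias, x):
    """Step activation: 1 means 'b', 0 means 'h' in this toy example."""
    return 1 if sum(w * xi for w, xi in zip(weights, x)) + bias > 0 else 0

def train_perceptron(samples, epochs=50, lr=1.0):
    """Classic perceptron rule: nudge the weights on every misclassified sample."""
    weights = [0.0] * len(samples[0][0])
    bias = 0.0
    for _ in range(epochs):
        mistakes = 0
        for x, target in samples:
            error = target - predict(weights, bias, x)
            if error:
                mistakes += 1
                weights = [w + lr * error * xi for w, xi in zip(weights, x)]
                bias += lr * error
        if mistakes == 0:  # converged: every training sample is classified correctly
            break
    return weights, bias

# Hypothetical 5x5 glyphs: they differ only in the middle of the bottom row.
B = ["1....", "1....", "1111.", "1..1.", "1111."]
H = ["1....", "1....", "1111.", "1..1.", "1..1."]
NOISY_H = ["1....", "1....", "1111.", "1..1.", "11.1."]  # one stray pixel

weights, bias = train_perceptron([(flatten(B), 1), (flatten(H), 0)])
```

After training, `predict(weights, bias, flatten(NOISY_H))` returns 1, i.e. the noisy h is read as a b.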

The nice thing about neural networks is that they’re pretty simple to use once you get to grips with how many layers you want and so on. The other nice thing is that we can stack them together, a bit like chaining full adders, where the carry flag from one adder is passed into the next.
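The full adder comparison can actually be taken literally. In this Python sketch (my own illustration, not code from any OCR package) each “neuron” is a simple threshold unit with hand-set weights. The carry neuron fires when at least two inputs are on; the sum neuron is stacked on top of it and takes the carry as an extra input with a weight of -2. Chaining adders then passes the carry from one stage to the next, which is the same wiring as feeding one net’s output into the next net’s input.

```python
def neuron(inputs, weights, threshold):
    """Threshold unit: outputs 1 when the weighted input sum reaches the threshold."""
    return 1 if sum(x * w for x, w in zip(inputs, weights)) >= threshold else 0

def full_adder(a, b, carry_in):
    """Two stacked neurons; the second one sees the first one's output."""
    # Carry neuron: fires when at least two of the three inputs are on.
    carry_out = neuron([a, b, carry_in], [1, 1, 1], 2)
    # Sum neuron: raw inputs plus the carry weighted -2, so it fires for 1 or 3 inputs.
    total = neuron([a, b, carry_in, carry_out], [1, 1, 1, -2], 1)
    return total, carry_out

def ripple_add(x, y, bits=8):
    """Chain full adders, passing each carry into the next stage."""
    carry = 0
    result = 0
    for i in range(bits):
        s, carry = full_adder((x >> i) & 1, (y >> i) & 1, carry)
        result |= s << i
    return result, carry  # final carry is the overflow bit
```

`ripple_add(5, 3)` gives `(8, 0)`. For OCR you’d swap the hand-set weights for trained nets, but the idea of passing one stage’s output along as an extra input is the same.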

So our neural nets are failing because the pixels differ a little here and there, and without some knowledge of how a letter is formed it’s difficult to tell which letter it is. If we take a step back from the pixel level, there are a couple of things we can analyze. We can look for hills and valleys; for example, an ‘n’ has a space inside it. If we calculate the baseline of the text, we can also tell whether a letter has strokes that dip below the baseline (like a ‘p’) or rise well above it (like an ‘h’). Using just these features, we can train a neural net to narrow its guess down to a range of characters. Then we can feed that output into a second neural net, which either works at the pixel level or combines the outputs of a couple of other nets.
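Here’s a rough Python sketch of the feature side. The exact feature set is my assumption based on the ideas above: ink above the x-height (an ascender), ink below the baseline (a descender), how many separate strokes sit on the baseline (the “valleys” opening downward, like the inside of an ‘n’), and how many enclosed holes the letter has, found with a flood fill from the border. It assumes the x-height and baseline rows have already been worked out, and the glyphs are hypothetical ‘#’/‘.’ bitmaps. The resulting vector is the kind of thing you’d feed into the first net.

```python
from collections import deque

def count_holes(grid):
    """Count background regions that can't be reached from the image border."""
    h, w = len(grid), len(grid[0])
    seen = [[False] * w for _ in range(h)]

    def fill(sr, sc):
        # Breadth-first flood fill over background (False) cells.
        seen[sr][sc] = True
        queue = deque([(sr, sc)])
        while queue:
            r, c = queue.popleft()
            for nr, nc in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)):
                if 0 <= nr < h and 0 <= nc < w and not seen[nr][nc] and not grid[nr][nc]:
                    seen[nr][nc] = True
                    queue.append((nr, nc))

    # Flood the outside first: background touching the border is not a hole.
    for r in range(h):
        for c in range(w):
            if (r in (0, h - 1) or c in (0, w - 1)) and not grid[r][c] and not seen[r][c]:
                fill(r, c)
    # Whatever background is left over is enclosed; count each region once.
    holes = 0
    for r in range(h):
        for c in range(w):
            if not grid[r][c] and not seen[r][c]:
                holes += 1
                fill(r, c)
    return holes

def count_runs(row):
    """Number of separate ink runs in one row of pixels."""
    runs, prev = 0, False
    for cell in row:
        if cell and not prev:
            runs += 1
        prev = cell
    return runs

def glyph_features(rows, x_height_row, baseline_row):
    """Feature vector for a '#'/'.' glyph, given precomputed x-height and baseline rows."""
    grid = [[ch == "#" for ch in row] for row in rows]
    return [
        float(any(any(row) for row in grid[:x_height_row])),      # ascender above x-height
        float(any(any(row) for row in grid[baseline_row + 1:])),  # descender below baseline
        float(count_runs(grid[baseline_row])),                    # strokes sitting on the baseline
        float(count_holes(grid)),                                 # enclosed holes
    ]

# Hypothetical 5x5 glyphs: x-height at row 2, baseline at row 4.
B = ["#....", "#....", "####.", "#..#.", "####."]
H = ["#....", "#....", "####.", "#..#.", "#..#."]
```

`glyph_features(B, 2, 4)` gives `[1.0, 0.0, 1.0, 1.0]` and `glyph_features(H, 2, 4)` gives `[1.0, 0.0, 2.0, 0.0]`, so the b/h pair that confuses the pixel-level net differs in two of the four features here.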

The idea behind this is that feature extraction is a proven technique that gets very good results until the characters deviate from the norm. Using multiple nets we should be able to combine the ability to train a new alphabet with the power of feature extraction.

Friday, July 11th, 2008

Record search information

Wouldn’t it be cool if you knew every search your visitors typed into Google, even though it’s not on your domain? With Internet Explorer 7, now you can! I haven’t checked this vulnerability out properly, so I’m not quite sure of the details, but it looks pretty severe.

I think your page may need to stay open for this to work, so navigating away *may* destroy the recording, but if they open a new window it should still work, I think. The scammers will be having a field day anyway.

Click here, unless you use Internet Explorer, in which case…

Click here for Disney and ignorance

Thursday, July 3rd, 2008