Archive for June, 2008

Anti-Cookie Stuffing

I feel like writing right now. Weird. But anyway. Cookie stuffing works. Cookie stuffing is on the limits of even my ethics :D . Cookie stuffing should be solved. Why don’t the big companies seem to care? Maybe they make too much to even notice and just factor it in as an inevitable loss with affiliate marketing. On to solving. Cookie stuffing can generally be done in three ways:

hidden IFRAME

cross-domain browser bug

an image pointing to a site with an affiliate URL

If there is a cross-domain javascript browser bug then you are fecked. Nothing you can really do to solve that.

Ok so the easy one. An image is the easiest way to cookie stuff, you can do it on any forum, blog, etc. It’s easy. Instant traffic with no work. But. It’s a CSRF (cross-site request forgery)! Beating it simply needs a two-step process on the affiliate merchant’s website. The visitor’s browser loads the affiliate URL, but then that page triggers another HTTP request to a tokenized URL, with the token based on the user’s session cookie. Remember the stuffer can’t read a cookie and can’t place a cookie because it’s not on their domain. All they can do is make basic HTTP requests. So if the tokenized URL is never loaded, that must mean the page was never parsed by a browser (an image tag fetches the URL but never parses the HTML that comes back), so the merchant doesn’t give the user an affiliate cookie. Simple.
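The two-step check could be sketched roughly like this in Python. It is only an illustration under assumptions: the class name, the /confirm URL, the in-memory dictionaries and the secret key are all made up for the example, and a real merchant would hang this off their own session handling.

```python
import hashlib
import hmac
import secrets

# Hypothetical per-deployment secret; not from any real affiliate system.
SECRET_KEY = secrets.token_bytes(32)


def make_token(session_id: str, affiliate_id: str) -> str:
    """Derive a token bound to one session and one affiliate."""
    message = f"{session_id}:{affiliate_id}".encode()
    return hmac.new(SECRET_KEY, message, hashlib.sha256).hexdigest()


class AffiliateTracker:
    """Toy model of the merchant side of the two-step check."""

    def __init__(self):
        self.pending = {}   # session_id -> affiliate_id awaiting confirmation
        self.credited = {}  # session_id -> affiliate_id that earned the cookie

    def landing_page(self, session_id: str, affiliate_id: str) -> str:
        """Step 1: the affiliate URL is requested. No credit is given yet.

        Returns the tokenized URL that the landing page HTML would reference.
        """
        self.pending[session_id] = affiliate_id
        return f"/confirm?token={make_token(session_id, affiliate_id)}"

    def confirm(self, session_id: str, token: str) -> bool:
        """Step 2: only reached if a browser parsed the page and fetched the URL."""
        affiliate_id = self.pending.get(session_id)
        if affiliate_id is None:
            return False
        expected = make_token(session_id, affiliate_id)
        if not hmac.compare_digest(expected, token):
            return False
        # Only now does the affiliate get credited for this session.
        self.credited[session_id] = self.pending.pop(session_id)
        return True
```

A stuffed image tag would only ever trigger landing_page, so the session stays in pending and never reaches credited.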

Ok now before I started writing this I thought the IFRAME method could never be algorithmically detected. You’ve got the obvious checks, such as checking the referrer URL and sending a bot to make sure the page isn’t breaking the rules, but that’s a laborious process. However, if I remember correctly, browser security rules let a page inside an IFRAME work out from javascript that it has a parent frame, even though it can’t read the parent’s contents across domains (and the parent can’t read the IFRAME’s either). So if javascript is enabled then it should be easy to check that the page has not been IFRAMED. If it has been IFRAMED that’s a big red flag, and I think javascript can also test the IFRAME to make sure it conforms to the rules right there and then.

Cookie stuffing solved. Anybody going to do anything about it?

Sunday, June 29th, 2008

Computer Recognized Photographs

On my wild and wacky adventures on the Internet, this is pretty amazing.

http://wang.ist.psu.edu/cgi-bin/zwang/alip_result1.cgi?test=1

It’s a computer program categorizing images based on statistical probabilities. Now where’s the download source code button? Darn those people!

Saturday, June 28th, 2008

Indian Digging

I’m pretty busy working on some code, hence the minimal posting. But I was thinking last night about getting a power user digg account without actually having to go and interact with the community because that is horrendously boring. Indians are pretty cheap right ;) . I wonder how much it would cost to get them to build you a power user account on a social bookmarking site. Just lay out a plan to follow every day, listing how many hours to spend on each task, and then pay them by the hour.

Anybody done this? Is it economically sound? You’ll be paying for an asset (or a liability depending on if it works :D ), the account, that you can use more than once unlike when you purchase diggs.

Tuesday, June 24th, 2008

Blackhat Tools

I forgot to mention that Mark from DigeratiMarketing has compiled a pretty large list of free tools to download, to which I contributed a captcha breaker for SMF forums and a samair proxy list scraper.

Get them here

When I say I forgot to mention I mean I couldn’t bring myself to give him a link as he keeps asking for them :D (See previous blog comments). It’s a slippery slope, today he asks for links and tomorrow we’ve given him the keys to the empire. Watch out folks! He’s evil. It’s how they all start.

Only kidding, I meant to link to the tools as some of them are super neat, but it slipped my mind.

Saturday, June 14th, 2008

Pythagoras was a spammer?

I don’t remember blogging about this before so here goes. There are times when you’ll be algorithmically processing lines and a line of a certain length should make your application carry out a certain function on it. Think digg’s captcha ;) .

Now the issue is you can’t just measure the horizontal distance or the vertical distance on its own, because the line might be at an arbitrary angle. So we use Pythagoras’ theorem. Yes it’s simple but sometimes you can miss this obvious stuff when you’re coding. I did for ages ;) , which is why I’m blogging about it.

Pythagoras Theorem

Simply take the horizontal distance and the vertical distance, square them both, add them together, find the square root and you have the length of your line. Easy. Incidentally you need this in digg because a couple of short lines come off of F’s and so on.
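As a quick sketch in Python (the function names and the 5 pixel threshold are made up for illustration, not taken from any real captcha code):

```python
import math


def line_length(x1, y1, x2, y2):
    """Length of the line from (x1, y1) to (x2, y2) using Pythagoras' theorem."""
    dx = x2 - x1  # horizontal distance
    dy = y2 - y1  # vertical distance
    return math.sqrt(dx * dx + dy * dy)


def is_short_stub(start, end, min_length=5.0):
    """True if the line is shorter than min_length (a hypothetical threshold)."""
    return line_length(*start, *end) < min_length
```

So line_length(0, 0, 3, 4) gives 5.0, the classic 3-4-5 triangle, whatever angle the line sits at.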

Thursday, June 12th, 2008

Maxxed out Server

Let’s make a long story short. I’ve come into contact with a server which I need to max out the resources on. I’ve been trying my best but it needs something more. If I ran some captcha cracking stuff on it as an API service, how many of you would be interested? I’d be a lot cheaper than employing bored students. What captchas would you all want destroyed? :D

Monday, June 9th, 2008

Rigged List

Remember that 10 bloggers who rock post? Apparently I rigged it. Here are the actual comments.

    1. rob Says:
      this list is no where near complete. In fact, i think this whole thing was rigged.
    2. Harry Says:
      I did say it wasn’t complete. You think it was rigged? What are you some kind of conspiracy theorist?

So Rob of Seocracy what did you expect? Some kind of democratic process? Maybe I should have had blog primaries. I guess some people would still say it was tailored towards a particular demographic. Then we could argue the order is all wrong and I’m biased. Then I’d recount the votes and accidentally lose a few. Rigged ballot machines. You name it.

So if it’s so incomplete then fill in the gaps and I’ll link to you from here <—.

Update: Rob was kidding, apparently ;) . Or so he says :D . Anyway his site’s pretty cool. Check out some of his articles like:

This one

Also this one

Tuesday, June 3rd, 2008

Youtube’s 18+ Filters Don’t Work

How easy is it to get past youtube’s 18+ filters without actually signing up as an 18+ user? Hmmm… well we could… embed the video… That won’t work will it? Omg, it does. Still. I think I mentioned this before on my old blog. Does this not undermine the entire point of having an 18+ filter? It’s completely possible for someone underage to embed the video into a local web page, thereby not having to own up to being a kid.

It could be argued that only sites that have asked people to admit to being 18 will embed these videos, however what if they don’t check? That’s then Youtube’s fault, serving up content to underage people without checks. What if someone puts the video into their youtube profile, no checks there either. Some kid could stumble onto it and play it. That’s not unfeasible, someone with an 18+ video embedded into their profile subscribes to a popular video, thereby showing up in the list of subscribers. A kid clicks through. Some of the embedded videos even automatically play.

Have I become a crusader for all that’s right and fair? Nah, not really, I just think it’s pretty stupid and I wonder if Youtube are breaking the law.

Want to test it for yourself? WARNING: Kids please don’t use this.

It’s pretty simple: use the link below, find an 18+ video, type in the watch?v=… url and click the button. I’ve defaulted it to some aladdin video in case you click accidentally :D . Or maybe I just couldn’t find the right niche in porn for you all ;) .

Click here to test for yourself

I leave you with a quote that just entered my head completely randomly:
“In cyberspace no-one can hear you power up your hard drives… AHHHHHHHHHHHHHHH!!”

Sunday, June 1st, 2008

Removing Lines Across Letters

Squidoo and Recaptcha.

Both have an annoying line going through them which joins the letters together. But how effective is this really? The thing about recaptcha is that it shows two words: one it already knows the answer to, and one that OCR software was unable to read. So in recaptcha we only need to type in the known word correctly, because it has no way to check the other one. We’ll probably need some pretty decent OCR software and an approximation module that guesses how close a word is to a proper English word.
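The approximation module could be as simple as fuzzy matching against a word list. Here is a minimal sketch using Python’s standard difflib; the tiny WORDS list, the function name and the 0.6 cutoff are all assumptions for the example, and a real module would load a full dictionary.

```python
import difflib

# Hypothetical stand-in for a proper English dictionary.
WORDS = ["morning", "overlooks", "inquiry", "upon"]


def closest_word(ocr_text, dictionary=WORDS, cutoff=0.6):
    """Return the dictionary word most similar to the OCR output, or None."""
    matches = difflib.get_close_matches(
        ocr_text.lower(), dictionary, n=1, cutoff=cutoff
    )
    return matches[0] if matches else None
```

A typical OCR slip like reading "m" as "rn" still lands on the right word, while junk that resembles nothing in the list comes back as None.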

But that line through it destroys any chance of standard OCR software recognising anything. However here’s a weakness. The line generally starts somewhere approximately in the middle and often sticks out from the end of the letter slightly. It shouldn’t be too hard to pick up where the line starts and possibly ends. From there we can assume that it won’t ever be thicker than a certain amount and will move by a limited amount. We can roughly track the line whenever it exits a letter. From that we can estimate where it has been travelling and what part is letter and what part is line.

Incidentally this works pretty well on digg except the vast quantity of lines and differing shades make it harder to pick them all up. Often you’ll pick up what looks like a line and have to flip 180 degrees to make sure you haven’t missed anything. The other problem with digg is it’s easy to end up with breaks in the lines and have to “trace blind”. If you have already traced enough of the line that’s not too hard because it’s a pretty basic algorithm to keep tracing through blank space with straight lines until we hit the rest of the line. Just be aware you might be a pixel, or very rarely two, away from the actual line.
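A rough Python sketch of the “trace blind” idea, under some loud assumptions: the image is a hypothetical grid of 0 (blank) and 1 (ink) indexed as grid[y][x], the traced path is a list of (x, y) points with x increasing, and the lookback, max_gap and tolerance values are invented for the example.

```python
def trace_blind(grid, path, lookback=4, max_gap=10, tolerance=2):
    """Extend a straight line from the end of path across blank pixels.

    Returns the first ink pixel (x, y) found near the projected line,
    or None if nothing turns up within max_gap columns.
    """
    if len(path) < 2:
        return None  # not enough of the line traced to estimate a direction
    recent = path[-lookback:]
    (x0, y0), (x1, y1) = recent[0], recent[-1]
    slope = (y1 - y0) / (x1 - x0)  # rise per column over the recent points
    height, width = len(grid), len(grid[0])
    for step in range(1, max_gap + 1):
        x = x1 + step
        if x >= width:
            break
        predicted = round(y1 + slope * step)
        # Check the predicted row first, then one and two rows either side,
        # since the real line may be a pixel or two off the projection.
        for offset in sorted(range(-tolerance, tolerance + 1), key=abs):
            y = predicted + offset
            if 0 <= y < height and grid[y][x]:
                return (x, y)
    return None
```

The tolerance window is what absorbs the pixel or two of drift between the projected line and the real one.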

With Squidoo the line is a lot easier to identify, but there is a lot more distortion in the letters. The distortion could be an issue; it might need another algorithm to beat it if it won’t train out.

Anyway below is some code with a line detection algorithm. It assumes the furthest points left and right of a series of letters are part of the line. It then tries to trace along the line. It’s nowhere near perfect at the moment as it suffers from some issues when trying to build the line at the end that cause it to favour travelling upwards. But it shows that with a bit more tweaking those lines can be removed. I compiled it on Linux, so it’ll be easier to test inside a Linux VM if you’re running Windows. The pics below show a perfect case scenario.

Goopwoot Squidoo Captcha

Squidoo Cleaned Up

Download Code with Link Below:

Code to detect and draw over the line in Squidoo Captchas

==================================
ALGORITHM OVERVIEW
==================================

- Try and draw as many lines as possible only allowing the line to move up or down by one pixel with each pixel travelled in the horizontal axis. If we hit a blank space stop drawing this line. Carry on drawing the others.

- Find the parts in the image where the line is most likely to skip. See pic below.

First Stage Squidoo Breaking

- Calculate the average incline per pixel movement in the horizontal axis between each section where the line “jumps”. Height/Width

- Use this incline to latch onto the closest shaded pixel. Then finally smooth the line.

==================================
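The first and third steps of the overview can be sketched in a few lines of Python. This is not the downloadable code, just a minimal illustration on a hypothetical grid of 0 (blank) and 1 (ink) indexed as grid[y][x]; the function names are made up.

```python
def trace_segment(grid, start_x, start_y):
    """Follow ink to the right, moving at most one row per column.

    Stops at the first column where no ink is reachable (a blank space).
    """
    height, width = len(grid), len(grid[0])
    path = [(start_x, start_y)]
    x, y = start_x, start_y
    while x + 1 < width:
        next_y = None
        # Prefer staying level; trying "up" before "down" is an arbitrary
        # tie-break and could bias the trace upwards on thick lines.
        for candidate in (y, y - 1, y + 1):
            if 0 <= candidate < height and grid[candidate][x + 1]:
                next_y = candidate
                break
        if next_y is None:
            break  # hit a blank space, so stop drawing this line
        x, y = x + 1, next_y
        path.append((x, y))
    return path


def average_incline(end_of_left, start_of_right):
    """Height/Width between two segments either side of a jump."""
    (x1, y1), (x2, y2) = end_of_left, start_of_right
    return (y2 - y1) / (x2 - x1)
```

Running trace_segment from every candidate start pixel gives the set of partial lines, and average_incline between the end of one and the start of the next gives the slope to follow across the jump.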

What about other captchas like myspace where the letters actually touch? Hmmm… I wonder if they considered the best choice of font ;) ?

Sunday, June 1st, 2008