Copy Paste Google Scraper

Bored of using the DOM to parse your data? That library is overkill for simple tasks. Anyway, it’s pretty simple to write a program to scrape Google, but to make it easier, here’s how I do it. Make sure that the scraper code from here (it provides the scrape_page() function) is in the same PHP file or included. Feel free to use this code in any tool you want.

function scrape_google($url)
{
    // get a page of results
    $page = scrape_page($url);

    // get the block of organic search engine links;
    // return an empty list if the markup doesn't match
    if (!preg_match('/<h2 class=r>(.*)<\/h2>/', $page, $matches)) {
        return array();
    }
    $link_list = $matches[1];

    // get a list of URLs: put each link on its own line,
    // then capture the href value of every result link
    $link_list = str_replace('</a>', "</a>\n", $link_list);
    preg_match_all('/<a href="([^"]*)" class(.*)<\/a>/', $link_list, $matches);
    $link_list = $matches[1];

    return $link_list;

    // DEBUG: All this below is debugging stuff I've left in
    // (it sits after the return, so move it above the return to use it)

    // create a string to print to screen
    //$str_link_list = implode("\n", $link_list);
    //echo "<pre>" . $str_link_list . "</pre>";

    // save all links to a file
    //$fp = fopen("out", "a");
    //fwrite($fp, $str_link_list . "\n");
    //fclose($fp);

    // get the next page
    //preg_match("/<td nowrap class=b><a href=\"(.*)\"><div id=nn><\/div>Next<\/a>/", $page, $matches);
    //echo "<a href='?googleurl=" . urlencode("http://www.google.com" . $matches[1]) . "'>Next Page - " . $matches[1] . "</a>";
}
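Google's live result markup changes often, so it can help to check the parsing steps offline first. Below is a minimal sketch of the same two steps (grab the <h2 class=r> block, then pull out each href) run against a made-up sample string instead of a downloaded page. The helper name extract_result_links() and the sample HTML are hypothetical, chosen to mimic the old Google result format this post targets; nothing here fetches a page.

```php
<?php
// Hypothetical helper: the parsing steps of scrape_google() without the download.
// The patterns assume the old "<h2 class=r><a href=... class=l>" result markup.
function extract_result_links($page)
{
    // Step 1: greedy (.*) without the /s modifier captures everything from the
    // first <h2 class=r> to the last </h2> on the same line.
    if (!preg_match('/<h2 class=r>(.*)<\/h2>/', $page, $matches)) {
        return array();
    }
    $block = $matches[1];

    // Step 2: put each link on its own line, then capture every href value.
    $block = str_replace('</a>', "</a>\n", $block);
    preg_match_all('/<a href="([^"]*)" class(.*)<\/a>/', $block, $matches);

    // $matches[1] holds the first capture group (the URLs) for every match.
    return $matches[1];
}

// Made-up sample markup (hypothetical), all on one line like the old result pages.
$sample = '<div><h2 class=r><a href="http://example.com/one" class=l>One</a></h2>'
        . '<h2 class=r><a href="http://example.org/two" class=l>Two</a></h2></div>';

print_r(extract_result_links($sample));
```

If this returns the URLs but scrape_google() gives you an empty array, the page you downloaded most likely no longer uses that markup and the two patterns need updating.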

So I grabbed this URL from the address bar and stuffed it into scrape_google():

25 Responses to “Copy Paste Google Scraper”

  1. kamo Says:

    good stuff…welcome to my feed reader :)

  2. Lavadev Says:

    Thanks for the useful code. Keep up the good work, I’m subscribing :)

  3. Adam Says:

    I’ve subscribed too, very useful stuff!

  4. Harry Says:

    Some moron left in a bug (no guesses who). I ripped this code out of a script I wrote, converted it to a function and tested it, but didn’t realise I had left in a line that referenced a variable which no longer exists.

    Fixed. Sorry about that.

  5. elusid Says:

    I keep getting an empty array… the page scraper function works… do you think I may need to change the regex?

  6. Harry Says:

    I just tested it and it works. However, if you downloaded the code before I removed the $url = $_GET['googleurl']; line at the top, then that’s definitely what’s screwing it up. That’s me being stupid.

    The only other thing I noticed, and I’m not sure if this is just on Linux: I copied and pasted my code back off this website into a test program and it told me the code contained Unicode characters, which turned out to be the quotes. The quotes were changed to ? signs. I think that might just be Linux, though; I’ll have to look at why that’s happening. If you’ve got the page scraper function working, that’s probably not the issue.

    Failing that, I only tested the program on google.com and google.co.uk.

    Is any of this helping? :)

  7. Elliott Russell Says:

    Great Stuff! Must have taken a while :O

  8. Harry Says:

    Took ages ;) … j/k. Actually, it didn’t take too long, thanks to PHP’s amazing string matching functions.

  9. free tv Says:

    That is really a nice thing. I have used it also. I really liked it.

  10. no deposit poker Says:

    Does it really work? It looks very simple. Thanks for this.

  11. Online Gambling Says:

    If this thing works then this is cool.

  12. Sports Betting Says:

    Looking very cool. I am gonna try this.

  13. Kamagra Says:

    That is really a nice thing. I will try it.

  14. Genevois Says:

    For elusid: have you tried it using a valid proxy?

  15. Bookmakers Says:

    Thanks for the useful code. Keep up the good work.

  16. Online Casino Says:

    That is really a nice thing. I have used it also.

  17. Casino Says:

    It works really well. It looks very simple.

  18. James Says:

    Yeah!! (Wrings hands)! Nice blog you have here. I’ve really enjoyed reading your latest posts. Keep it that way.

  19. hollywood forum Says:

    lol.. that is a nice post. I really enjoyed reading it.

  20. salvia Says:

    Hmm… that code really works nicely. Thanks for the code.

  21. Will Sheppard Says:

    For screen scraping, I enjoy using Perl with HTML::TreeBuilder; there’s a good tutorial here:
    http://search.cpan.org/~petek/HTML-Tree-3.23/lib/HTML/Tree/Scanning.pod

  22. Harry Says:

    For a minute there I thought this was shameless self-promotion from Will :D. If it’s a decent tutorial, I’d have left it there anyway.

  23. kotoh Says:

    Hi everybody..

    I’m pretty n00b about this scraping stuff. Is it possible to run this script on Windows? If so, how? Sorry for the stupid question, but I really don’t know how.

  24. Pharm10 Says:

    Very nice site!

  25. 5ubliminal Says:

    Try my version … looks a bit better :)
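Harry’s comment 6 notes that copying the code back off this page turned its quotes into Unicode characters. If you hit that, a minimal PHP sketch like the one below converts the curly quotes back to plain ASCII before you run the code. The helper name straighten_quotes() is hypothetical, it only handles the four curly quote characters seen in this post, and the \u{...} escapes need PHP 7 or later.

```php
<?php
// Hypothetical helper: replace curly quotes (as a blog engine may substitute
// into pasted code) with plain ASCII quotes. Only these four characters are handled.
function straighten_quotes($code)
{
    $map = array(
        "\u{201C}" => '"',  // left double quotation mark
        "\u{201D}" => '"',  // right double quotation mark
        "\u{2018}" => "'",  // left single quotation mark
        "\u{2019}" => "'",  // right single quotation mark
    );
    // strtr() with an array replaces every occurrence of each key in a single pass.
    return strtr($code, $map);
}

// A line of the debug code as it appeared on the page, with curly quotes.
$pasted = "\$fp = fopen(\u{201D}out\u{201D}, \u{201C}a\u{201D});";
echo straighten_quotes($pasted), "\n";
```

Running the whole pasted file through it once is usually simpler than hunting down each quote by hand.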
