Simple Copy Paste Scraper Function

If you’ve looked at the scraping code in the guest post on BlueHatSeo.com, it’s a little incomplete (in my opinion :P). It’s missing cookie support and has a couple of flexibility issues. The code below lets you send POST variables easily and keeps session data in a cookie file.

It’s really simple to use. To fetch a page with no proxy and no POST variables:

$htmlcode = scrape_page("http://www.google.com/");

With POST variables:

$htmlcode = scrape_page("http://www.google.com/", 1, "var1=1&var2=2&var3=3");

With a proxy (defaults to 127.0.0.1:8118, the usual Privoxy port in front of Tor):

$htmlcode = scrape_page("http://www.google.com/", 0, "", 1);
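If you’d rather not build the POST string by hand, PHP’s built-in http_build_query() can do it from an array and URL-encode the values along the way. A small sketch: the field names are the same placeholder names used above, and the "a b&c" value is only there to show the encoding.

```php
<?php
// Placeholder field names, matching the POST example above.
$fields = http_build_query(array(
    'var1' => 1,
    'var2' => 2,
    'var3' => 3,
), '', '&');

// Values with spaces or ampersands are encoded for you.
$escaped = http_build_query(array('q' => 'a b&c'), '', '&');

// Either string can then be passed as the third argument, e.g.:
// $htmlcode = scrape_page("http://www.google.com/", 1, $fields);
```

Here $fields comes out as the same var1=1&var2=2&var3=3 string used in the example, and $escaped becomes q=a+b%26c.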

Here’s the code. All you need to do is point the cookie path at a text file (one the PHP process can write to, if you’re on Linux) and set the proxy to your own proxy address.

<?php

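// Fetches $page with cURL and returns the response body (false on failure).
// $post=1 sends $fields as the POST body; $proxy=1 routes through the proxy below.
function scrape_page($page, $post=0, $fields=null, $proxy=0)
{
    // cookie path: the file must be writable by the PHP process
    $file_cookie = "/path/to/cookie/file/cookies";

    $ch = curl_init($page);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
    // read cookies from, and save them back to, the same file so sessions persist
    curl_setopt($ch, CURLOPT_COOKIEJAR, $file_cookie);
    curl_setopt($ch, CURLOPT_COOKIEFILE, $file_cookie);
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, 1);

    if($proxy==1)
        curl_setopt($ch, CURLOPT_PROXY, "127.0.0.1:8118");

    // note: these two options switch off SSL certificate checking
    curl_setopt($ch, CURLOPT_SSL_VERIFYHOST, 0);
    curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, 0);
    curl_setopt($ch, CURLOPT_USERAGENT,
        "Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1)");

    if($post==1)
    {
        curl_setopt($ch, CURLOPT_POST, 1);
        curl_setopt($ch, CURLOPT_POSTFIELDS, $fields);
    }

    $response = curl_exec($ch);

    // uncomment to debug; this has to run before curl_close()
    //echo curl_error($ch);

    curl_close($ch);

    return $response;
}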

?>

The only thing it is missing is some good old error checking.
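For what it’s worth, here is a rough sketch of what that error checking could look like. It is a stripped-down variant rather than a drop-in replacement: scrape_page_checked() is a made-up name, the 10 second timeout is an arbitrary choice, and the cookie, proxy and POST options are left out to keep it short. It separates transport failures (curl_exec() returning false) from HTTP error statuses, and reads the error details from the handle right after the request.

```php
<?php
// Hypothetical helper, not part of the original function.
// Returns an array with 'ok', 'status', 'error' and 'body' keys.
function scrape_page_checked($page)
{
    $ch = curl_init($page);
    if ($ch === false) {
        return array('ok' => false, 'status' => 0,
                     'error' => 'could not initialise cURL', 'body' => null);
    }

    curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, 1);
    curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, 10); // assumed value, in seconds

    $response = curl_exec($ch);

    // Collect the details while the handle is still valid.
    $errno  = curl_errno($ch);
    $error  = curl_error($ch);
    $status = curl_getinfo($ch, CURLINFO_HTTP_CODE);

    if ($response === false) {
        // Transport-level failure: DNS, refused connection, timeout, bad scheme...
        return array('ok' => false, 'status' => $status,
                     'error' => "cURL error $errno: $error", 'body' => null);
    }
    if ($status >= 400) {
        // The server answered, but with an error status.
        return array('ok' => false, 'status' => $status,
                     'error' => "HTTP status $status", 'body' => $response);
    }
    return array('ok' => true, 'status' => $status, 'error' => null, 'body' => $response);
}
```

The caller then checks $result['ok'] before touching $result['body'], and can log $result['error'] when a request fails.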

11 Responses to “Simple Copy Paste Scraper Function”

  1. Collectibles Says:

    Thanks for the code.

  2. bishi Says:

    im going to put this function into use. i hate curl, but this is a lot better than file_get_contents

  3. Dark SEO Programming » Blog Archive » Copy Paste Google Scraper Says:

    […] scrape google, but just to make it easier here’s how I do it. Make sure that the scraper code from here is in the same php file or included. Feel free to use this code for any tool you […]

  4. Dark SEO Programming » Blog Archive » Email Verification Says:

    […] a free email service that supports webmail, and a page scraping utility. Hmmmm… Guess what, my page scraping code will work excellently with webmail services. What’s really handy is as long as you point the […]

  5. Bobbink SEO Blog Says:

    I will test this code next week. It looks ok, besides the lack of error checking!

  6. Trond Says:

    Thanks for the code!
    I’ll try it this weekend :-)

    Rgds,
    Trond

  7. Chris Devon Says:

    Just wanted to let you know that the link to your guest post is down

  8. Harry Says:

    It wasn’t my guest post. Somebody else’s. I dunno what happened to it :S

  9. Busby SEO Test Says:

    Great example and tutorial. I’ll try this

  10. Yusan Says:

    How to use this code?

  11. ekolhoca Says:

    Great example and tutorial. I’ll try this.
