Copy Paste Google Scraper
Tired of using the DOM to parse your data? That library is huge for simple tasks. Anyway, it's pretty simple to write a program to scrape Google, but to make it even easier, here's how I do it. Make sure that the scraper code from here is in the same PHP file or included. Feel free to use this code for any tool you want.
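If you don't have that scraper code handy, a stand-in along these lines should work. This is only a sketch of what a scrape_page() helper might look like, assuming the cURL extension is installed; the user agent string and the timeout are my own guesses, not values from the original scraper.

```php
<?php
// Hypothetical stand-in for scrape_page(); not the original scraper code.
// Wrapped in function_exists() so it won't clash if the real one is included.
if (!function_exists('scrape_page')) {
    function scrape_page($url)
    {
        $ch = curl_init($url);
        // return the response body as a string instead of printing it
        curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
        // follow redirects
        curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);
        // assumed values: a browser-like user agent and a 30 second timeout
        curl_setopt($ch, CURLOPT_USERAGENT, 'Mozilla/5.0');
        curl_setopt($ch, CURLOPT_TIMEOUT, 30);
        $page = curl_exec($ch);
        // curl_exec() returns false on failure; hand back an empty string instead
        return $page === false ? '' : $page;
    }
}
```

Returning an empty string on failure means the regexes further down simply find nothing instead of choking on a boolean.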
function scrape_google($url)
{
    // get a page of results
    $page = scrape_page($url);
    // get the block of organic SE links
    if (!preg_match("/<h2 class=r>(.*)<\/h2>/", $page, $matches)) {
        return array(); // nothing matched, so there are no links to return
    }
    $link_list = $matches[1];
    // get a list of URLs: put each link on its own line so a match can't run across links
    $link_list = str_replace("</a>", "</a>\n", $link_list);
    preg_match_all("/<a href=\"(.*?)\" class(.*)<\/a>/", $link_list, $matches);
    $link_list = $matches[1];
    return $link_list;
    // DEBUG: All this below is debugging stuff I've left in (it never runs, it sits after the return)
    // create a string to print to screen
    //$str_link_list = implode("\n", $link_list);
    //echo "<pre>" . $str_link_list . "</pre>";
    // save all links to a file
    //$fp = fopen("out", "a");
    //fwrite($fp, $str_link_list . "\n");
    //fclose($fp);
    // get the next page
    //preg_match("/<td nowrap class=b><a href=\"(.*)\"><div id=nn><\/div>Next<\/a>/", $page, $matches);
    //echo "<a href='?googleurl=" . urlencode("http://www.google.com" . $matches[1]) . "'>Next Page - " . $matches[1] . "</a>";
}
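If you want to see what the two regex steps do without hitting Google, here's a small offline sketch. extract_links() is a hypothetical helper I made up for this demo (it's just the parsing half of scrape_google()), and the sample markup is invented to look like the 2008-era result format the patterns expect.

```php
<?php
// Hypothetical helper: the same two regex steps as scrape_google(),
// but it takes the page HTML directly so no request is made.
function extract_links($page)
{
    // Step 1: grab everything between the first result heading and the last one
    if (!preg_match('/<h2 class=r>(.*)<\/h2>/', $page, $matches)) {
        return array();
    }
    // Step 2: one anchor per line, then capture each href
    $link_list = str_replace('</a>', "</a>\n", $matches[1]);
    preg_match_all('/<a href="(.*?)" class(.*)<\/a>/', $link_list, $matches);
    return $matches[1];
}

// Made-up sample markup, not real Google output
$sample = '<div><h2 class=r><a href="http://example.com/" class=l>Example</a></h2>'
        . '<h2 class=r><a href="http://example.org/page" class=l>Other</a></h2></div>';

print_r(extract_links($sample));
```

The newline trick in step 2 matters because "." doesn't match a newline by default, so each match is confined to a single link.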
So I grabbed this URL from the address bar and stuffed it into this function:
Tuesday, March 11th, 2008
