Screen-scraping Current Google Images

(Actually, this is a test of Google's Code prettifier.)

Google updated its Google Images search result output a while ago. Here's a piece of PHP5 that screen-scrapes the results for your personal use (check Google's robots.txt if in doubt). This sample search engine grabs an array of Google image results using the getGoogleImages() function and then outputs them on a blank page. Right-clicking a thumbnail swaps it for the full-size image, just to illustrate how you can use the different result properties:

<?php
header("Content-type: text/html; charset=utf-8");
?><!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">

<head>
    <title>Get Images</title>
</head>
<body>

<?php

$results = getGoogleImages('horses');
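foreach ($results as $result) {
    // Escape for the UTF-8 page. The image URL lands inside a JavaScript string,
    // so it is JSON-encoded (PHP 5.2+) before being escaped for the attribute.
    $image = htmlspecialchars(json_encode($result['image']), ENT_QUOTES, 'UTF-8');
    echo '<p><a href="' . htmlspecialchars($result['url'], ENT_QUOTES, 'UTF-8') . '">' .
            '<img src="' . htmlspecialchars($result['thumbnail'], ENT_QUOTES, 'UTF-8') . '" alt="" ' .
            'oncontextmenu="this.src=' . $image . ';return false;" ' .
            'style="border: 1px solid black" /></a><br />' .
            '<em>' . htmlspecialchars($result['description'], ENT_QUOTES, 'UTF-8') . '</em>' .
            '</p>';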

}

?>

</body>
</html><?php

function getGoogleImages($q, $doSafeSearch = false)
{
    $results = array();

    $safe = ($doSafeSearch) ? 'on' : 'off';
    $url = 'http://images.google.com/images?safe=' . $safe .
            '&q=' . urlencode($q);
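    $result = file_get_contents($url);
    if ($result === false) {
        return $results; // the request failed
    }

    $from = 'dyn.Img("';
    $startPos = strpos($result, $from);
    $endPos = strpos($result, ');dyn.updateStatus');
    if ($startPos === false || $endPos === false) {
        return $results; // markers not found; the page layout probably changed
    }
    // substr() expects a length as its third argument, not an end position.
    $startPos += strlen($from);
    $functions = substr($result, $startPos, $endPos - $startPos);
    $functions = explode('");dyn.Img("', $functions);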

    foreach ($functions as $f) {

        $i = count($results);
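        // Pad to 12 fields so list() never reads a missing offset.
        list($results[$i]['url'], $v1, $hash,
                $results[$i]['image'],
                $results[$i]['width'], $results[$i]['height'],
                $results[$i]['description'],
                $v2, $v3, $more, $extension, $domain) = array_pad(explode('","', $f), 12, '');
        // Keep only the page URL; drop the parameters that follow '&h'.
        list($results[$i]['url']) = explode('&h', $results[$i]['url']);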

        $prefix = 'http://tbn0.google.com/images?q=tbn:';
        $results[$i]['thumbnail'] = $prefix . $hash . ':' . $results[$i]['image'];
        $results[$i]['description'] = strip_tags($results[$i]['description']);
    }

    return $results;
}

?>

Instead of "normal" image tags, the Google HTML output delivers the image information in the JavaScript portion of the page, as a series of dyn.Img(...) calls. The nice thing is that this is very easy to split up, because it's already in a very structured format: the calls are separated by ");dyn.Img(" and the fields within each call by ",". So above, we can easily explode the string into its sub parts like description, original page URL, thumbnail and so on.
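
To see the splitting in isolation, here is a minimal sketch that runs without any network access. The splitDynImgCalls() helper and the sample payload are made up for illustration; only the field order follows getGoogleImages() above, and the real output may well differ. Unlike the function above, the sketch looks for the end marker including the closing quote, so the last field comes out clean.

```php
<?php
// Hypothetical helper (not part of the original script): splits the
// dyn.Img(...) calls found in a chunk of JavaScript into arrays of raw fields.
function splitDynImgCalls($js)
{
    $from = 'dyn.Img("';
    $start = strpos($js, $from);
    // Assumption: the end marker includes the closing quote of the last field.
    $end = strpos($js, '");dyn.updateStatus');
    if ($start === false || $end === false) {
        return array();
    }
    $start += strlen($from);

    // substr() takes a length, so subtract the start offset from the end.
    $calls = explode('");dyn.Img("', substr($js, $start, $end - $start));
    $rows = array();
    foreach ($calls as $call) {
        $rows[] = explode('","', $call);
    }
    return $rows;
}

// Made-up sample payload with two results of 12 fields each.
$sample = 'dyn.Img("http://example.com/horses.html&h=300&w=400","x","abc123",'
    . '"http://example.com/horse.jpg","400","300","A <b>brown</b> horse",'
    . '"x","x","400 x 300 - 20k","jpg","example.com");'
    . 'dyn.Img("http://example.org/pony.html&h=100&w=200","x","def456",'
    . '"http://example.org/pony.png","200","100","Pony",'
    . '"x","x","200 x 100 - 8k","png","example.org");dyn.updateStatus();';

$rows = splitDynImgCalls($sample);
list($pageUrl) = explode('&h', $rows[0][0]);

echo count($rows), "\n";             // 2
echo $pageUrl, "\n";                 // http://example.com/horses.html
echo $rows[0][3], "\n";              // http://example.com/horse.jpg
echo strip_tags($rows[0][6]), "\n";  // A brown horse
```

Returning an empty array when the markers are missing keeps the caller's foreach loop safe if the page layout changes again.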

Feel free to run with the code, or see its output on a sample page.

Google Blogoscoped 2007