Google Blogoscoped

Wednesday, June 28, 2006

Is Google Objective? Manual Edits in Search Results

Google claims that their search results “are generated completely objectively and are independent of the beliefs and preferences of those who work at Google.” At the Berlin search engines workshop on Monday, however, Elizabeth Van Couvering argued that Google didn’t find their search engine “on the road.” It’s true that every ranking algorithm for a search engine was at some point written by a necessarily subjective human engineer – so how objective can a search ranking be? I think we can break this question up into four different levels of “objectivity.” (These levels may overlap more than they are clearly separate entities.)

Level 1: Perceived results relevance without manual edits

When I say “perceived relevance,” I mean that an engineer at Google, or any other search engine, is trying to rank the “best” pages on top for a particular search query (the more relevant a search result is to its query, the better for the user), and that what’s relevant or best is a highly subjective issue to begin with. An engineer/programmer/developer must come up with a basic concept for ranking pages, like “let’s think of web links as votes on pages, and call this thing PageRank.” An engineer must then evaluate the search results for different queries (with the help of feedback from external quality testers, actual usage data and so on), and fine-tune the algo again and again, for example to battle search result spam.
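The “links as votes” idea can be sketched in a few lines. This is a toy version of the PageRank computation, not Google’s actual implementation; the graph, the iteration count, and the damping factor of 0.85 (the value given in the original PageRank paper) are illustrative assumptions.

```python
# Toy sketch of "web links as votes on pages" (simplified PageRank).
# Not Google's code; graph and parameters are made up for illustration.

def pagerank(links, iterations=50, d=0.85):
    """links maps each page to the list of pages it links to."""
    pages = set(links) | {p for targets in links.values() for p in targets}
    rank = {p: 1.0 / len(pages) for p in pages}  # start with equal rank
    for _ in range(iterations):
        # Everyone gets a small base share, plus "votes" via inbound links.
        new_rank = {p: (1.0 - d) / len(pages) for p in pages}
        for page, targets in links.items():
            if not targets:
                continue
            share = d * rank[page] / len(targets)  # each outlink is a vote
            for target in targets:
                new_rank[target] += share
        rank = new_rank
    return rank

graph = {"a": ["b", "c"], "b": ["c"], "c": ["a"]}
ranks = pagerank(graph)
print(max(ranks, key=ranks.get))  # "c" – it collects votes from both "a" and "b"
```

The subjectivity the text describes lives in exactly these choices: what counts as a vote, how much the damping factor discounts it, and when the engineer decides the results “look right.”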

Now, I don’t think there’s a way to get around the subjectivity on this level, because there is no such thing as a truly “objective” result ranking. Any ranking must reflect the human values of the team who came up with the ranking algos (or of those who judged the results through feedback polls), unless indeed we find the source code on the road... which wouldn’t be too desirable either, even if it were realistic. At this level, however, we can argue that to some extent, “all individual pages and search queries are equal.”

Level 2: Perceived results relevance through manual edits

On top of trying to rank pages solely automatically & algorithmically, manual edits treat certain pages or search queries as “unequal,” meaning they receive special treatment (we can still talk about algorithms, but these algorithms are peppered with manually maintained data).
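One way to picture such “unequal” treatment is an exception table consulted on top of the algorithmic ranking. This is a hypothetical sketch – the table names, domains, and query are invented, and real search engines are far more involved – but it shows how a blacklist and hand-pinned results layer over otherwise automatic output.

```python
# Hypothetical sketch: manual edits layered on an algorithmic ranking.
# BLACKLIST and PINNED are invented illustrations, not real Google data.

BLACKLIST = {"spam-example.test"}            # pages treated as "unequal"
PINNED = {"maps": ["maps.example.test"]}     # queries treated as "unequal"

def edited_results(query, algorithmic_results):
    """Drop blacklisted pages, then pin hand-chosen results on top."""
    results = [url for url in algorithmic_results if url not in BLACKLIST]
    return PINNED.get(query, []) + results

print(edited_results("maps", ["spam-example.test", "other.test"]))
# → ['maps.example.test', 'other.test']
```

The algorithm is still an algorithm, as the text says – it’s just peppered with data that encodes human, case-by-case judgments.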

Level 3: Perceived results irrelevance through manual edits (for a perceived larger overall relevance)

From such examples, we can see that while we can’t always tell whether manual removals and such were fair, we can always argue that at least the search engine creators deemed them fair – removing spam sites, for example, makes the search engine return more relevant results on top. However, there’s another type of manual edit: the one where even the search engineers agree that the results are made worse. I’m thinking of the thing we stop calling a “filter” and start calling “censorship.”

For example, when Google agreed to self-censor German search results based on a manual blacklist of sites (e.g. those containing Neo-Nazi material), they did so voluntarily, but one might argue they didn’t really like doing it. They made the decision in reaction to semi-official German regulations, possibly trying to prevent further, stronger censorship, or at least trying not to be shut out of the German market on principle. This was a very clear clash with Google’s principles – you just had to read their help files at the time, where they said they don’t censor*. It also made the results, taken on their own, less relevant; clearly, searching for stormfront.org and getting no results (on Google.de) is worse than getting the actual Stormfront.org site as a result (on Google.com), at least measured by relevance.

Why might there be a potentially larger “overall relevance” for search results on this level? Well, if Google were to leave the German market on principle, since they oppose censorship at least by their old standards, they might leave Germans with what they may deem less relevant results**. Yeah, Yahoo might be up to par with Google’s relevance, but I bet Google engineers think differently – it’s sort of their job to do so. So from their unique subjective perspective, any market without Google is a market with less relevant search results***, even when that market has other search engines available****.

Level 4: Results relevance not a top priority

Well, and then there’s the point where search engine creators do not even have result relevancy as their top priority, mostly replacing it with money-makin’ priorities – we could title this level “plain vanilla evil” or “let’s care about the money instead of the user”***** or “Dilbert cartoon boss doing random stuff.”

*To quote from an official help entry that has since been changed, but was active for a long time: “Google does not censor results for any search term. The order and content of our results are completely automated; we do not manipulate our search results by hand. We believe strongly in allowing the democracy of the web to determine the inclusion and ranking of sites in our search results.” At the time, Google also didn’t always disclose censorship in their German search results.

**It should be noted that a principled withdrawal from a country could escalate the conflict between that country’s population and the censorship laws they’re governed by, which in the end could result in search engines being allowed to display uncensored results, thus increasing overall relevancy... but that’s a matter of debate and speculation. I’d argue that if, say, Google withdrew from Germany over self-censorship objections, then – considering their 90%+ market share in Germany – they might cause more than just a public outcry; they might cause subtle but important changes in German politics and laws.

***Quote Google’s statement from 2006 on their self-censored Google.cn: “We ultimately reached our decision by asking ourselves which course would most effectively further Google’s mission to organize the world’s information and make it universally useful and accessible. Or, put simply: how can we provide the greatest access to information to the greatest number of people?
Filtering our search results clearly compromises our mission. Failing to offer Google search at all to a fifth of the world’s population, however, does so far more severely.”

****This issue became more visible in early 2006 with Google’s self-censored Chinese search engine, because the censorship behind these “potentially more relevant” results partly strengthens a repressive regime and its human rights violations. There are some important details adding complexity to this specific case of Google.cn; for instance, Google.com was already 90% accessible from within China (by Google’s own records), so you might argue they didn’t even try to increase relevance, only speed. Still, there’s at least a theoretical issue where search engine creators may feel they’ll either be blocked significantly, or are “forced” to agree to make their results less relevant (by their own standards).

*****Of course, if all you want to do is make money, it may be an even smarter decision to care about the user first (just look at Google’s success, partly based on their user-centric interfaces), but that’s another issue.
