Hard-Coding Bias in Google "Algorithmic" Search Results
I present categories of searches for which available evidence indicates Google has "hard-coded" its own links to appear at the top of algorithmic search results, and I offer a methodology for detecting certain kinds of tampering by comparing Google results for similar searches. I compare Google's hard-coded results with Google's public statements and promises, including a dozen denials but at least one admission. I tabulate affected search terms and examine other mechanisms also granting favored placement to Google's ancillary services. I conclude by analyzing the impact of Google's tampering on users and competition, and by proposing principles to block Google's bias.
Disclosure: I serve as a consultant to various companies that compete with Google. But I write on my own -- not at the suggestion or request of any client, without approval or payment from any client.
Searches for health-related keywords (example: acne) show a similar pattern: The top-of-listing title, inset image, and left-most details link all link to Google Health. Like Google Finance, Google Health also suffers low acceptance in its category. But Google nonetheless places Google Health links in the most prominent positions. And here too, three prominent links all feature Google's own service.
How do Google's less popular services come to receive such valuable placements? Does favorable PageRank (or other favorable reputation) of google.com spill over onto other Google services to guarantee top position under standard ranking algorithms? (Google has made a similar claim in defending why its house ads systematically enjoy prominent placements.) Or have Google staff manually adjusted ("hard-coded") search results to provide special treatment to other Google services? I believe the latter theory offers the more convincing explanation. The next three sections offer evidence supporting this view.
Diagnosing Hard-Coding: "The Comma Test"
In general, adding a comma to the end of a search query does not yield predictable changes in algorithmic search results. Try it for yourself using my comma test search tool. Notice: core algorithmic results change little or not at all when a comma is added -- though ads and rich result boxes (such as maps, products, and videos) often vary from search to search.
But for a subset of search terms, adding a trailing comma yields a large change in results. Add a comma to a finance term, for example requesting "CSCO," (with the trailing comma) rather than "CSCO". Suddenly, the prominent Google Finance links disappear. Same for health keywords: Search for "acne," rather than "acne" and Google no longer features Google Health. See the screenshots at right.
Suppose the prominent links to Google Finance and Health were actually the result of a genuine algorithmic search -- the same process that yields Google's ordinary algorithmic search results. Then, as confirmed through my comma search tool and through collective experience with Google Search, a trailing comma should not change which results are listed.
But in fact a comma causes a large and systematic change: the addition of a single tiny comma causes the prominent Google Finance and Health links to disappear completely. What system design could explain that disappearance? My best assessment: If Google staff manually specified that a given result should appear at the top of results when users search for a specific search term, they might well forget to include search term variants with appended commas. Then a user searching for an unexpected variant, including by adding a comma, can see "true" Google algorithmic search, unaffected by the manual overrides.
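The comparison behind the comma test can be sketched in a few lines of Python. This is a minimal illustration, not the tool referenced above: the `fetch` callable is a hypothetical stand-in for whatever process returns the ordered algorithmic result URLs for a query, and the canned results and `.example` addresses are invented purely to show the logic.

```python
from typing import Callable, Dict, List

# Hypothetical: a function mapping a query to its ordered list of
# algorithmic result URLs. How results are collected is left open.
ResultFetcher = Callable[[str], List[str]]


def comma_test(query: str, fetch: ResultFetcher) -> Dict[str, object]:
    """Check whether the top result for `query` vanishes when a
    trailing comma is appended to the query."""
    plain = fetch(query)
    with_comma = fetch(query + ",")
    top = plain[0] if plain else None
    # Flag the query when the top-most link for the plain query is
    # absent from the entire result list for the comma variant.
    vanished = top is not None and top not in with_comma
    return {"query": query, "top_result": top, "vanished_with_comma": vanished}


# Invented data for illustration only; not actual search results.
CANNED_RESULTS = {
    "acne": ["https://health.example/acne", "https://a.example/", "https://b.example/"],
    "acne,": ["https://a.example/", "https://b.example/"],
    "weather": ["https://w.example/", "https://x.example/"],
    "weather,": ["https://w.example/", "https://x.example/"],
}


def canned_fetch(query: str) -> List[str]:
    return CANNED_RESULTS.get(query, [])


if __name__ == "__main__":
    for q in ("acne", "weather"):
        print(comma_test(q, canned_fetch))
```

A flagged query is only a candidate for closer inspection. Ordinary variation between searches could also move a link, so the pattern matters most when it recurs systematically across a whole category of terms.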
Predetermining Hard-Coding Scope: Topics Lists and Index Pages
Which keywords are affected by Google's apparent hard-coding? As to health terms, Google's Health Topics index page offers an answer. A search for any of the 2,642 terms listed on the page, in exact match with no variation whatsoever, yields the prominent Google Health links presented above. Furthermore, for each and every such term, the Google Health links always occupy the three prominent positions described above. Indeed, in my testing, the Google Health links never appear in any position below the absolute top of the page.
While exact searches for the listed terms yield prominent links to Google Health, any tiny variant causes the Google Health results to disappear completely. Compare results for "a sore throat" (no Google Health results) to "sore throat" (prominent Google Health results), and compare "my acne" and "stop acne" (no Google Health results) to "acne" (prominent Google Health results).
The Google Health Topics page thus reveals a remarkable combination of characteristics: 1) complete, 100% success at achieving the top-most algorithmic search position for all terms where Health results appear at all, 2) complete, 100% success at achieving this top-most position for every single one of the 2,642 keywords Health elected to write about, yet 3) complete failure (0% success) as to even the tiniest variations of listed terms.
These characteristics are inconsistent with ordinary algorithmic web search. Ordinarily, sites achieve an intermediate level of success, ranking more favorably for some terms than others. Rare is the site that can state with confidence that, for months on end, it will enjoy the top-most algorithmic position for a competitive search term, not to mention thousands of terms en masse. Equally rare is the site that obtains top position for every single term it covers. That said, when a site achieves top position for some terms, it ordinarily also receives a high position for close variants of those terms. In ordinary algorithmic web search, it would be highly unusual for a site to obtain top-most position for a term, but no listing at all for a close variant of that term.
The Health Topics page thus reveals multiple further differences between the process yielding these favored Google links versus the process that powers algorithmic search: differences in predictability of ranking, differences in the scope of keywords included, differences in robustness to search variants. These differences provide further reason to doubt that these prominent Google links result from ordinary algorithmic web search systems.
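The three characteristics can be tabulated with a short script. The following is a sketch under stated assumptions: `fetch` is a hypothetical function returning ordered result URLs for a query, `favored_host` is the hostname of the service under examination, and the sample terms, variants, and `.example` URLs are invented for illustration.

```python
from typing import Callable, Dict, List
from urllib.parse import urlparse


def host_of(url: str) -> str:
    """Return the lowercase hostname portion of a URL."""
    return urlparse(url).netloc.lower()


def scope_profile(
    terms: List[str],
    variants: Dict[str, List[str]],
    fetch: Callable[[str], List[str]],
    favored_host: str,
) -> Dict[str, float]:
    """Measure how often the favored host tops exact-term results, and
    how often it appears anywhere in results for close variants."""

    def top_is_favored(query: str) -> bool:
        results = fetch(query)
        return bool(results) and host_of(results[0]) == favored_host

    def appears_anywhere(query: str) -> bool:
        return any(host_of(u) == favored_host for u in fetch(query))

    variant_queries = [v for t in terms for v in variants.get(t, [])]
    exact_hits = sum(top_is_favored(t) for t in terms)
    variant_hits = sum(appears_anywhere(v) for v in variant_queries)
    return {
        "exact_top_rate": exact_hits / len(terms) if terms else 0.0,
        "variant_any_rate": (
            variant_hits / len(variant_queries) if variant_queries else 0.0
        ),
    }


# Invented data for illustration only; not actual search results.
SAMPLE_RESULTS = {
    "acne": ["https://health.example/acne", "https://other.example/acne"],
    "sore throat": ["https://health.example/sore-throat", "https://other.example/st"],
    "my acne": ["https://other.example/acne"],
    "stop acne": ["https://other.example/stop"],
    "a sore throat": ["https://other.example/st"],
}


def sample_fetch(query: str) -> List[str]:
    return SAMPLE_RESULTS.get(query, [])


if __name__ == "__main__":
    profile = scope_profile(
        terms=["acne", "sore throat"],
        variants={"acne": ["my acne", "stop acne"], "sore throat": ["a sore throat"]},
        fetch=sample_fetch,
        favored_host="health.example",
    )
    print(profile)
```

An ordinary site would be expected to show intermediate values for both rates. A profile that sits at the extremes for both, top position on every listed term and no appearance on any close variant, is the signature described above.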
Google Usually Promises Unbiased Results, but Occasionally Admits Otherwise
A cynical user might expect Google to prominently link to its own services. After all, keeping users on Google properties means more opportunities to show ads -- hence greater revenue. And every click Google sends through a no-cost algorithmic link is a lost revenue opportunity.
But on numerous occasions, Google has promised not to succumb to temptation to bias its search results. To the contrary, Google has committed to provide users with the best possible links, chosen fairly and even-handedly. For example, Google has promised:
[When] we roll[ed] out Google Finance, we did put the Google link first. It seems only fair right, we do all the work for the search page and all these other things, so we do put it first... That has actually been our policy, since then, because of Finance. So for Google Maps again, it's the first link.
Mayer's statement is a first-hand admission from a person in a position to know -- direct support for my conclusion that Google intentionally and manually gives its services top position, notwithstanding a lower placement from Google's ordinary algorithms.
The Breadth and Effect of Tampering
Preceding sections flag hard-coding in the areas of finance and health conditions. But my testing has revealed anomalous "algorithmic" link patterns in numerous other areas.
For example, in the realm of travel planning, searches like "bos to sfo" yield links to Google's preferred partners. Google can use these links to play favorites -- offering valuable traffic to selected sites in exchange for their loyalty or other benefits. Meanwhile, Google's preferred placements are strikingly opaque to users. In the screenshot at right, clicking the prominent top-of-page "Flights from Boston, MA to San Francisco, CA" link takes users to Expedia, as does pressing Enter while in the "departing" or "returning" textboxes. But nothing in the on-screen listing gives any indication that the "Flights" link or Enter key send a user to Expedia. And if Google began to send such users to another site instead, there's nothing Expedia could do to stand in the way.
So too in the realm of movies. Type a movie name into Google, like "the social network" or "movies 02138", and Google always replies with links to its own movies service. (See the second screenshot at right.) At best, a competing movies service can aspire to a result somewhat below -- but Google always takes the best position for itself.
As to maps, Google's favored treatment of its own service has had a particularly clear effect. A 2009 Hitwise analysis found that 61% of visits to Google Maps came directly from Google -- abundant free traffic Google can give its map service thanks to the popularity of its search service. Consumer Watchdog's 2010 Traffic Report shows that at the same time Google began sharply increasing the frequency and prominence of Google Maps displays and links, traffic to Mapquest (previously the most popular maps site) fell sharply.
Google's favored treatment of its own properties permeates Google's many focused search services. In a search for justin bieber, the top-most link featured Google News. A lower block of links featured iLike music, presented in partnership with Google. A "Videos" section presented two YouTube videos, and the heading introducing that section referred users to a page linking six YouTube videos before the first video drawn from another source. Further down the page, an "Images" section referred users to Image Search. All told, Google left just 775 pixels (53%) of vertical space for ordinary algorithmic results, while Google's own services took the remaining 47%. On one view Google has assembled useful results cutting across formats and media. But Google's systematic promotion of its own services sharply reduces the space available for others. And with results systematically featuring listings from Google and its hand-selected partners, there is reason to doubt Google's promise of unbiased results guided only by relevance.
In many instances, Google's hard-coded results provide useful information that users appreciate. For example, when searching for the social network, users benefit from immediately seeing a listing of theaters and showtimes without needing to click through to another site. There, Google Movies helps users save a click and get results faster. But does saving a click justify hard-coding? I doubt it. For one, much of Google's hard-coding does not save a click; for example, top-placed Google Health links don't accelerate users' searches or save users clicks. Furthermore, the essence of algorithmic search is extracting information from others' sites. Google has already built systems to extract key details from independent sites -- longstanding services such as core algorithmic search, image search, and product search, as well as new functions like reviews, addresses, and hours of operation integrated with site listings. So Google can obtain the efficiency and time-saving benefits of detail-rich results without needing to systematically put its own services first.
In some instances, Google's hard-coded results are affirmatively unhelpful. Consider a search for patent 9999999. (Patent lawyers often use a series of 9's as a placeholder or example patent number.) As with most patent searches, Google presents a prominent top-of-page link to its own Google Patents service. But in fact Google has no information on patent 9999999, and clicking Google's prominent link yields only an error. See the final screenshot at right. In contrast, requests for the lower-linked results yield actual useful information. So, for this search, Google's hard-coded link indisputably reduces the usefulness of Google results.
Implications & Response
It is well-known that the top-most algorithmic link enjoys a large share of search traffic -- 34%+ according to Chitika. Meanwhile, even the second link gets less than half as many clicks -- less than 17%. If these figures apply equally to Google's hard-coded links, then every time Google puts its own link first, it takes a third of all available clicks for itself -- while cutting by half the traffic provided to the site that would otherwise be ranked first. But Google's hard-coded links tend to be distinctive and graphic-rich (pictures in Health results, charts in Finance, etc.), so the actual effect is likely to be even larger.
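The arithmetic behind the "third of all clicks" and "cutting by half" claims can be worked through directly. This sketch treats the Chitika figures as exact values (34% for the first link, 17% for the second), although the text gives them only as bounds ("34%+" and "less than 17%"). The search volume and the function name are assumptions made for illustration.

```python
# Click-through shares taken from the figures cited above, expressed as
# whole percentages. Treating the bounds as exact values is an assumption.
FIRST_LINK_PCT = 34
SECOND_LINK_PCT = 17


def displacement(searches: int) -> dict:
    """Estimate clicks when a hard-coded link takes the first position
    and the site formerly ranked first drops to second."""
    inserted_clicks = searches * FIRST_LINK_PCT / 100
    displaced_before = searches * FIRST_LINK_PCT / 100
    displaced_after = searches * SECOND_LINK_PCT / 100
    return {
        "inserted_link_clicks": inserted_clicks,
        "displaced_site_before": displaced_before,
        "displaced_site_after": displaced_after,
        "displaced_site_fraction_lost": 1 - displaced_after / displaced_before,
    }


if __name__ == "__main__":
    # Hypothetical volume of 1,000 searches for a single term.
    print(displacement(1000))
```

Under these assumptions, per 1,000 searches the inserted link collects about 340 clicks, while the displaced site falls from about 340 clicks to about 170. Because the real second-position share is stated as less than 17%, the displaced site's loss would be at least half.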
When facing antitrust scrutiny, Google typically cites its use of algorithms as a key defense -- arguing that because search results are, purportedly, generated by computer algorithm, antitrust review is not necessary. I emphatically disagree. For one, an algorithm can indeed be biased; consider an algorithm that elects to place all Google results before all competing services, or an algorithm that reduces the prominence of links to any site that uses competing services rather than Google's offerings. But more than that, I dispute the premise: The preceding sections present good reason to doubt that Google's results are always generated by algorithm.
Indeed, Google's use of hard-coding and other adjustments to search results gives Google an important advantage in any sector that requires or benefits from substantial algorithmic search traffic. By directing users to Google services, Google can make its offerings take off in a broad class of services -- be it health, finance, maps, video, travel, or otherwise. Any Google business that needs "algorithmic" traffic can get it, free, in huge quantity. Meanwhile, entrepreneurs recognize and anticipate that Google may bury their results as it favors its own services -- blunting the incentive to build a business that competes with Google or competes with a service Google might plausibly develop. With Google already putting its Health and Finance sites first, even when user consensus is that other sites are preferable in these categories, it's easy to envision a future where user preferences and genuine excellence are less important than Google's rote power.
I credit that Google's power is not unlimited. If a new Google service falls sufficiently short of the competition, users may insist on going elsewhere. For example, when Google Video had little popular material but YouTube featured substantial copyright-infringing videos, Google's own analysis revealed that users flocked to YouTube (1, 2, 3, 4). Nonetheless, if Google's offering achieves a modicum of quality, Google can use hard-coding and other search biases to enjoy traffic volumes others cannot match.
I am struck by similarities between the favored treatment Google gives its own services and the favored treatment airlines previously gave their own flights in computer reservation systems (CRS's) they respectively owned. For example, when travel agents searched for flights through Apollo, a CRS then owned by United Airlines, United flights would come up first -- even if other carriers offered lower prices or nonstop service. The Department of Justice intervened, culminating in rules prohibiting any CRS owned by an airline from ordering listings "us[ing] any factors directly or indirectly relating to carrier identity" (14 CFR 255). The same principle applies here: Google ought not rank results by any metric that distinctively favors Google.
I credit that it is less than straightforward to adapt CRS rules to search engines. CRS's sort a limited number of flights along a defined set of criteria (e.g. departure time, arrival time, total travel time, number of connections, price). In contrast, search engines must analyze a startling volume of web pages with arbitrarily many attributes. Still, the same principles hold true: A firm ought not use dominance in one area (CRS or web search) to suppress competition in unrelated fields (flights or independent web services). CRS rules continue to embody that principle, and it's time to insist on similar evenhandedness in online search.
Finally, I am concerned that Google has made inaccurate representations to the public including to users, publishers, advertisers, investors, and regulators. Comparing Google's "we do not manually change results" and similar promises to Mayer's quote and my findings, I can only conclude that Google's promises are not true. If so, Google should retract its misstatements and issue a correction.
Posted: November 15, 2010.