CFP Presentation on Search Engine Omissions; Spyware Workshop Comments (updated June 3, 2004)

Today I presented Empirical Research on Search Engine Omissions at Computers, Freedom, and Privacy (CFP) in Berkeley, CA. My presentation focused on two prior empirical projects in which I documented sites missing from Google search results: Localized Google Search Result Exclusions (documenting 100+ controversial sites missing from some of Google’s country-specific sites, including .fr and .ch) and Empirical Analysis of Google SafeSearch (documenting thousands of unobjectionable and non-sexually-explicit sites missing from results when users enable Google’s SafeSearch feature to attempt to omit sexually-explicit content).

On Monday I was in DC for the FTC’s Spyware Workshop. I thought the final panel, Governmental Responses to Spyware, did a fine job of explaining the legislative options on the table, and of noting the pressure to address the problem of spyware for the large and growing number of affected users. But I was dismayed that the first panel (Defining Spyware) classified as fine and unobjectionable certain programs that, in my experience, users rarely want, yet often find installed on their computers. Key among these undesired programs is software from Claria (formerly Gator) and WhenU. The technical experts on the second and third panels agreed that these programs pose major problems and costs for users and tech support staff. Yet the first panel seemed to think them perfectly honorable.

Also puzzling was a new position paper from the Consumer Software Working Group recently convened by CDT. Examples of Unfair, Deceptive or Devious Practices Involving Software (PDF) purports to offer a listing of bad behaviors that software ought not perform. It certainly lists plenty of behaviors that are so outrageous as to be beyond dispute. But what it misses (indeed, ignores) are the harder cases, i.e. the programs that make spyware a more complicated issue, and the programs that affect the most users. For example, the Examples document condemns software installed without any notice to the user. It is silent about, and thereby is taken to endorse, the far more typical practice of showing a user a license agreement and/or disclosure that describes the software in euphemisms, but admittedly does provide at least some notice of the software’s purpose.

What to make of the document’s failure to consider the methods actually used by the controversial software with the highest installation rates? Perhaps one explanation is that Claria and WhenU helped draft the report! (See the signatories listed on page five.) That said, the document doesn’t purport to be comprehensive. Perhaps a future version will address the problems of drive-bys and euphemistic, lengthy, or poorly-presented licenses.

For more on the workshop, and another critical reaction, see other attendees’ notes on forums (especially a recent post by Eric Howes). See also impressive studies from PC Pitstop showing that more than 75% of Gator users don’t even know they have Gator (PDF) (not to mention consenting to Gator’s license agreements) and more than 85% for WhenU (PDF).

See also a transcript of the workshop (PDF).


Benjamin Edelman v. N2H2, Inc.

I sought to research and document sites categorized and restricted by Internet blocking program N2H2. N2H2’s block site list is protected by technical measures including an encryption system, but I sought to write software that would nonetheless allow me to access, analyze, and report its contents. However, I feared that conducting this work might expose me to liability for violation of the N2H2 License, of the Copyright Act of 1976, and of the Digital Millennium Copyright Act, as well as for misappropriation of N2H2’s trade secrets. With representation by the ACLU, I therefore sought from federal court a declaratory judgment that I could conduct this research and publication without fear of liability.

Case details including litigation documents

Empirical Analysis of Google SafeSearch

Google offers interested users a version of its search engine restricted by a service it calls SafeSearch, intended to omit references to sites with “pornography and explicit sexual content.” However, testing indicates that SafeSearch blocks at least tens of thousands of web pages without any sexually-explicit content, whether graphical or textual. Blocked results include sites operated by educational institutions, non-profits, news media, and national and local governments. Among searches on sensitive topics such as reproductive health, SafeSearch blocks results in a way that seems essentially random; it is difficult to construct a rational non-arbitrary basis for which pages are allowed and which are omitted. Full article.

Web Sites Sharing IP Addresses: Prevalence and Significance

Web Sites Sharing IP Addresses: Prevalence and Significance. (September 2013)

More than 87% of active domain names are found to share their IP addresses (i.e. their web servers) with one or more additional domains, and more than two thirds of active domain names share their addresses with fifty or more additional domains. While this IP sharing is typically transparent to ordinary users, it causes complications for those who seek to filter the Internet by restricting users’ ability to access certain controversial content on the basis of the IP address used to host that content. With so many sites sharing IP addresses, IP-based filtering efforts are bound to produce “overblocking”: accidental and often unanticipated denial of access to web sites that abide by the stated filtering rules.
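
To make the two measurements and the overblocking effect concrete, here is a minimal sketch in Python. It is not the code used in the study: the function names, the input format (a mapping from domain name to IP address), and the sample data are all assumptions made for illustration. The sample uses reserved example domains and documentation-only IP addresses.

```python
from collections import defaultdict


def sharing_stats(domain_to_ip, threshold=50):
    """Return (share_any, share_many) as fractions of all domains.

    share_any:  domains whose IP also hosts at least one other domain.
    share_many: domains whose IP also hosts at least `threshold` others.
    `domain_to_ip` is a hypothetical input: {domain name: IP address}.
    """
    domains_by_ip = defaultdict(set)
    for domain, ip in domain_to_ip.items():
        domains_by_ip[ip].add(domain)

    total = len(domain_to_ip)
    if total == 0:
        return 0.0, 0.0

    share_any = 0
    share_many = 0
    for domains in domains_by_ip.values():
        others = len(domains) - 1  # additional domains on the same IP
        if others >= 1:
            share_any += len(domains)
        if others >= threshold:
            share_many += len(domains)
    return share_any / total, share_many / total


def overblocked(domain_to_ip, targeted_domains):
    """Domains made unreachable as a side effect of IP-based blocking.

    Blocking every IP that hosts a targeted domain also blocks each
    untargeted domain on those same IPs; those are returned here.
    """
    targeted = set(targeted_domains)
    blocked_ips = {domain_to_ip[d] for d in targeted if d in domain_to_ip}
    return {
        domain
        for domain, ip in domain_to_ip.items()
        if ip in blocked_ips and domain not in targeted
    }


# Illustrative data only (not from the study).
sample = {
    "a.example": "192.0.2.1",
    "b.example": "192.0.2.1",
    "c.example": "192.0.2.1",
    "d.example": "192.0.2.2",
}
```

In this sample, a filter that targets only a.example by IP address would also cut off b.example and c.example, which is the overblocking described above.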

Empirical Analysis of Internet Filtering in China with Jonathan Zittrain

Empirical Analysis of Internet Filtering in China – full article.

The authors are collecting data on the methods, scope, and depth of selective barriers to Internet access through Chinese networks. Tests from May 2002 through November 2002 indicate at least four distinct and independently operable methods of Internet filtering, with a documentable leap in filtering sophistication beginning in September 2002. The authors document thousands of sites rendered inaccessible using the most common and longstanding filtering practice. These sites were found through connections to the Internet by telephone dial-up link and through proxy servers in China. Once so connected, the authors attempted to access approximately two hundred thousand web sites. The authors tracked 19,032 web sites that were inaccessible from China on multiple occasions while remaining accessible from the United States. Such sites contained information about news, politics, health, commerce, and entertainment. See highlights of blocked pages. The authors conclude (1) that the Chinese government maintains an active interest in preventing users from viewing certain web content, both sexually explicit and non-sexually explicit; (2) that it has managed to configure overlapping nationwide systems to effectively — if at times irregularly — block such content from users who do not regularly seek to circumvent such blocking; and (3) that such blocking systems are becoming more refined even as they are likely more labor- and technology-intensive to maintain than cruder predecessors.
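
The criterion described above (inaccessible from China on multiple occasions while remaining accessible from the United States) can be sketched as a simple filter over test records. This is an illustrative assumption about how such records might be tallied, not the authors' actual testing system; the record format, the function name, and the `min_occasions` parameter are hypothetical.

```python
from collections import Counter


def likely_blocked(observations, min_occasions=2):
    """Return URLs that failed from China while succeeding from the US
    on at least `min_occasions` separate tests.

    `observations` is a hypothetical iterable of
    (url, reachable_from_china, reachable_from_us) tuples, one per test.
    Tests where the site was also unreachable from the US are ignored,
    since the site may simply have been down.
    """
    counts = Counter()
    for url, china_ok, us_ok in observations:
        if us_ok and not china_ok:
            counts[url] += 1
    return {url for url, n in counts.items() if n >= min_occasions}


# Illustrative records only (not data from the study).
records = [
    ("news.example", False, True),
    ("news.example", False, True),
    ("down.example", False, False),
    ("down.example", False, False),
    ("flaky.example", False, True),
    ("flaky.example", True, True),
    ("ok.example", True, True),
]
```

Requiring repeated failures, and a successful control request from the United States at the time of each failure, keeps ordinary outages and one-off network errors from being counted as filtering.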

Revised and published as Internet Filtering in China (IEEE Internet Computing 2003). Project joint with Jonathan Zittrain.

Localized Google search result exclusions: Statement of issues and call for data

The authors are studying exclusions from search engine search results, and have found some 113 sites excluded, in whole or in part, from Google’s French and German search results. Learn more about the situation and context, test the exclusions for yourself, and submit further sites suspected to be excluded. Full article.

Joint with Jonathan Zittrain.

Replacement of Google with Alternative Search Systems in China: Documentation and Screen Shots

The authors are studying Internet filtering in countries worldwide, and current investigations focus on restrictions on web access in China. Using a web-based system to test web filtering in China, the authors previously determined and confirmed that Google was inaccessible from at least one testing location in China; initially, in testing beginning August 29, a request for Google led to the error “host not found,” consistent with requests for other inaccessible or blocked sites. However, using related methods, the authors have now confirmed and documented reports that Chinese Internet access currently provides pages other than the ordinary Google home page in response to requests for Google; such behavior is believed to have begun on September 8. The screen shots in this article document six instances of this replacement. Full article.

Project joint with Jonathan Zittrain.