CFP Presentation on Search Engine Omissions; Spyware Workshop Comments updated June 3, 2004

Today I presented Empirical Research on Search Engine Omissions at Computers, Freedom, and Privacy (CFP) in Berkeley, CA. My presentation focused on two prior empirical projects in which I documented sites missing from Google search results: Localized Google Search Result Exclusions (documenting 100+ controversial sites missing from Google’s localized versions, including .fr and .ch) and Empirical Analysis of Google SafeSearch (documenting thousands of unobjectionable and non-sexually-explicit sites missing from Google results when users enable Google’s SafeSearch feature to attempt to omit sexually-explicit content).

On Monday I was in DC for the FTC’s Spyware Workshop. I thought the final panel, Governmental Responses to Spyware, did a fine job of explaining the legislative options on the table, and of noting the pressure to address the problem of spyware for the large and growing number of affected users. But I was dismayed that the first panel (Defining Spyware) classified as fine and unobjectionable certain programs that, in my experience, users rarely want, yet often find installed on their computers. Key among these undesired programs are software from Claria (formerly Gator) and WhenU. The technical experts on the second and third panels agreed that these programs pose major problems and costs for users and tech support staff. Yet the first panel seemed to think them perfectly honorable.

Also puzzling was a new position paper from the Consumer Software Working Group recently convened by CDT. Examples of Unfair, Deceptive or Devious Practices Involving Software (PDF) purports to offer a listing of bad behaviors that software ought not perform. It certainly lists plenty of behaviors that are so outrageous as to be beyond dispute. But what it misses (indeed, ignores) are the harder cases, i.e. the programs that make spyware a more complicated issue, and the programs that affect the most users. For example, the Examples document condemns software installed without any notice to the user. It is silent about, and thereby is taken to endorse, the far more typical practice of showing a user a license agreement and/or disclosure that describes the software in euphemisms but admittedly does provide at least some notice of the software’s purpose.

What to make of the document’s failure to consider the methods actually used by the controversial software with the highest installation rates? Perhaps one explanation is that Claria and WhenU helped draft the report! (See the signatories listed on page five.) That said, the document doesn’t purport to be comprehensive. Perhaps a future version will address the problems of drive-bys and euphemistic, lengthy, or poorly-presented licenses.

For more on the workshop, and another critical reaction, see other attendees’ notes on forums (especially a recent post by Eric Howes). See also impressive studies from PC Pitstop showing that more than 75% of Gator users don’t even know they have Gator (PDF) (not to mention consenting to Gator’s license agreements) and more than 85% for WhenU (PDF).

See also a transcript of the workshop (PDF).


Sites Blocked by ADL HateFilter with Jonathan Zittrain

Like numerous other Internet filtering programs, the Anti-Defamation League’s HateFilter attempts to prevent users from knowing which specific web sites are deemed off-limits. However, this research presents a method for efficiently determining which specific sites are blocked, and this site reports results. Numerous sites are blocked that no longer offer content meeting ADL’s definitions (if they ever did), including sites now offering other substantive content, sites that offer only error messages, and sites that no longer exist.

Continued: Sites Blocked by ADL HateFilter

Benjamin Edelman v. N2H2, Inc.

I sought to research and document sites categorized and restricted by Internet blocking program N2H2. N2H2’s block site list is protected by technical measures including an encryption system, but I sought to write software that would nonetheless allow me to access, analyze, and report its contents. However, I feared that conducting this work might expose me to liability for violation of the N2H2 License, of the Copyright Act of 1976, and of the Digital Millennium Copyright Act, as well as for misappropriation of N2H2’s trade secrets. With representation by the ACLU, I therefore sought from federal court a declaratory judgement that I could conduct this research and publication without fear of liability.

Case details including litigation documents

Empirical Analysis of Google SafeSearch

Google offers interested users a version of its search engine restricted by a service it calls SafeSearch, intended to omit references to sites with “pornography and explicit sexual content.” However, testing indicates that SafeSearch blocks at least tens of thousands of web pages without any sexually-explicit content, whether graphical or textual. Blocked results include sites operated by educational institutions, non-profits, news media, and national and local governments. Among searches on sensitive topics such as reproductive health, SafeSearch blocks results in a way that seems essentially random; it is difficult to construct a rational non-arbitrary basis for which pages are allowed and which are omitted. Full article.
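The comparison at the heart of this kind of testing can be sketched as a set difference between two result lists for the same query, one collected with filtering off and one with filtering on. The sketch below is illustrative only and is not the code used in the study: the function name `omitted_by_filter` and the example URLs are hypothetical, and it assumes the two result lists have already been collected by some other means.

```python
def omitted_by_filter(unfiltered_results, filtered_results):
    """Return URLs present in the unfiltered results but absent from the
    filtered results, preserving the original ranking order."""
    filtered = set(filtered_results)
    return [url for url in unfiltered_results if url not in filtered]


# Hypothetical result lists for a single query (placeholder URLs).
unfiltered = [
    "http://example.edu/health",
    "http://example.org/clinic",
    "http://example.com/adult",
]
filtered = ["http://example.org/clinic"]

omitted = omitted_by_filter(unfiltered, filtered)
```

A URL appearing in `omitted` shows only that the filter dropped it. Deciding whether the page actually lacks sexually-explicit content still requires a separate review of the page itself.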

Web Sites Sharing IP Addresses: Prevalence and Significance

Web Sites Sharing IP Addresses: Prevalence and Significance. (September 2003)

More than 87% of active domain names are found to share their IP addresses (i.e. their web servers) with one or more additional domains, and more than two thirds of active domain names share their addresses with fifty or more additional domains. While this IP sharing is typically transparent to ordinary users, it causes complications for those who seek to filter the Internet, that is, to restrict users’ ability to access certain controversial content on the basis of the IP address used to host that content. With so many sites sharing IP addresses, IP-based filtering efforts are bound to produce “overblocking”: accidental and often unanticipated denial of access to web sites that abide by the stated filtering rules.
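As a rough illustration of how such sharing figures, and the resulting overblocking, can be computed, the sketch below groups domains by IP address. It is a minimal example under stated assumptions, not the study’s code: the helper names `sharing_stats` and `collateral_domains` are hypothetical, the domain-to-IP mapping is assumed to have been resolved already, and the sample uses reserved documentation addresses.

```python
from collections import defaultdict


def sharing_stats(domain_to_ip, thresholds=(1, 50)):
    """For each threshold t, return the fraction of domains that share
    their IP address with at least t other domains."""
    by_ip = defaultdict(set)
    for domain, ip in domain_to_ip.items():
        by_ip[ip].add(domain)

    total = len(domain_to_ip)
    stats = {}
    for t in thresholds:
        # A domain on an IP hosting n domains shares it with n - 1 others.
        count = sum(len(ds) for ds in by_ip.values() if len(ds) - 1 >= t)
        stats[t] = count / total if total else 0.0
    return stats


def collateral_domains(domain_to_ip, targeted_domain):
    """List the other domains that become unreachable if the targeted
    domain is blocked by IP address."""
    ip = domain_to_ip[targeted_domain]
    return sorted(
        d for d, addr in domain_to_ip.items()
        if addr == ip and d != targeted_domain
    )


# Hypothetical sample data (placeholder domains and addresses).
sample = {
    "a.example": "192.0.2.1",
    "b.example": "192.0.2.1",
    "c.example": "192.0.2.1",
    "d.example": "192.0.2.2",
}
stats = sharing_stats(sample, thresholds=(1, 3))
blocked_too = collateral_domains(sample, "a.example")
```

In this sample, blocking the address that hosts `a.example` also blocks `b.example` and `c.example`, which is the overblocking effect described above.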

Empirical Analysis of Internet Filtering in China with Jonathan Zittrain

Empirical Analysis of Internet Filtering in China – full article.

The authors are collecting data on the methods, scope, and depth of selective barriers to Internet access through Chinese networks. Tests from May 2002 through November 2002 indicate at least four distinct and independently operable methods of Internet filtering, with a documentable leap in filtering sophistication beginning in September 2002. The authors document thousands of sites rendered inaccessible using the most common and longstanding filtering practice. These sites were found through connections to the Internet by telephone dial-up link and through proxy servers in China. Once so connected, the authors attempted to access approximately two hundred thousand web sites. The authors tracked 19,032 web sites that were inaccessible from China on multiple occasions while remaining accessible from the United States. Such sites contained information about news, politics, health, commerce, and entertainment. See highlights of blocked pages. The authors conclude (1) that the Chinese government maintains an active interest in preventing users from viewing certain web content, both sexually explicit and non-sexually explicit; (2) that it has managed to configure overlapping nationwide systems to effectively — if at times irregularly — block such content from users who do not regularly seek to circumvent such blocking; and (3) that such blocking systems are becoming more refined even as they are likely more labor- and technology-intensive to maintain than cruder predecessors.
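The classification rule described above (inaccessible from China on multiple occasions while remaining accessible from the United States) can be sketched as follows. This is a simplified illustration rather than the authors’ testing system: the function name `likely_blocked`, the tuple layout of the observations, and the default of two failures as the meaning of “multiple occasions” are all assumptions made for the example.

```python
def likely_blocked(observations, min_failures=2):
    """Return sites that failed from China at least `min_failures` times
    in tests where the same site was reachable from the United States.

    `observations` is an iterable of
    (site, reachable_from_china, reachable_from_us) tuples, one per test.
    """
    failures = {}
    for site, ok_china, ok_us in observations:
        # Count only failures that cannot be explained by the site being
        # down everywhere: it must have been reachable from the US.
        if ok_us and not ok_china:
            failures[site] = failures.get(site, 0) + 1
    return sorted(site for site, n in failures.items() if n >= min_failures)


# Hypothetical test log (placeholder site names).
log = [
    ("news.example", False, True),
    ("news.example", False, True),
    ("shop.example", False, True),   # a single failure: not enough
    ("shop.example", True, True),
    ("down.example", False, False),  # unreachable everywhere
    ("down.example", False, False),
]
blocked = likely_blocked(log)
```

Requiring the site to be reachable from a second location at the time of the test helps separate filtering from ordinary outages, and requiring repeated failures helps separate it from transient network errors.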

Revised and published as Internet Filtering in China (IEEE Internet Computing 2003).