Privacy Lapse at Google JotSpot

Benjamin Edelman

Google's JotSpot service posts sensitive user data, despite specific promises to the contrary in JotSpot's privacy policy. JotSpot even allows this information to be indexed by Google's search crawlers -- despite repeated press releases claiming that user information is "secure" and protected by "firewalls and other advanced security technologies." JotSpot's postings are, by all indications, accidental. But in the context of a series of similar slip-ups, these postings raise questions about the efficacy of Google's model of hosted applications.


Screenshot 1: JotSpot-Hosted Sites List Registered Users

Screenshot 2: JotSpot Lists User Details (including full email address)

Screenshot 3: Google Search Makes JotSpot User Listings Easy to Find

Google has a strong incentive to assure the privacy of sensitive user information stored on Google servers. After all, if users lose confidence in Google's model of server-based computing, users are unlikely to be willing to store their data in Google systems. Historically, Google has had few widely-publicized privacy breaches -- so, to date, users have few specific reasons to distrust the privacy of data stored on Google's servers. But as Google grows, slip-ups are inevitable. This article presents one such error: Google JotSpot pages that reveal usernames, full names, email addresses and more -- contrary to JotSpot's privacy policy and contrary to user expectations.

Google's JotSpot service provides shared document editing for wiki-style online collaboration, so JotSpot collects a variety of sensitive information -- user details along with who viewed and edited which pages when. But JotSpot's privacy policy tightly limits how JotSpot can use the information it collects and who JotSpot may give that information to. Despite these promises, JotSpot posts sensitive user details on publicly-available web pages.

This page documents the sensitive data I have found on Google JotSpot servers, analyzes likely harms, compares JotSpot's practices with its promises, and considers implications in light of Google's broad and growing efforts to store user data on centralized servers.

JotSpot's Privacy Leaks

For wikis hosted at Google JotSpot, "user management" pages offer lists of all registered users. See Screenshot 1. For each listed user, a link offers access to a detail page presenting more information about that specific user. Detail pages report usernames, full names, and full email addresses, along with technical details such as preferred edit style, time zone, and (for some users) instant message usernames. See Screenshot 2.

This data is posted for thousands of wikis hosted at Google JotSpot. Searching for "user management" at jot.com (via the Google search ["user management" site:jot.com]) yields 2,400+ results. See Screenshot 3.

This data is available even for secure wikis. For example, when I request codebook.jot.com, I am redirected to the login screen shown in Screenshot 4. By all indications, the operators of this wiki have elected to deny access by the general public. Yet the codebook wiki's user listing is widely available. See Screenshot 5 (user list) and Screenshots 6-8 (user details for codebook participants as distinguished as uber-cyberprof Lawrence Lessig).

JotSpot also allows the public to view the special roles of group administrators. See Screenshot 9, showing the various levels of administrators in the codebook Jot group.

Harms

I see three distinct harms from these JotSpot postings.

1) Posting email addresses on publicly-available web pages invites massive unsolicited commercial email. Crawlers read publicly-accessible pages and add listed addresses to their distribution lists. A 2003 study by the Center for Democracy and Technology found that more than 95% of tested spam was sent to addresses publicly posted on the web -- confirming the inadvisability of posting email addresses for all to see. Yet Google JotSpot posts email addresses without taking any steps to protect users from spam. In particular, JotSpot posts addresses in ordinary easily-readable text and in ordinary unencoded HTML, without the encoding many sites now recommend (1, 2, 3).

2) Full names and group memberships are reasonably viewed as sensitive and unsuitable for public distribution by JotSpot. Users and administrators have a particularly strong expectation of privacy in the context of closed groups like codebook.jot.com, where access to group discussion requires registration. Moreover, page titles on user-detail pages explicitly indicate that the pages are "restricted" -- falsely suggesting that the contents of the pages are not available to the general public, when in fact the pages are available to anyone who cares to look.

3) The additional data provided by JotSpot exposes users to unpredictable cross-cutting attacks. For example, with a combination of a user's name, email address, JotSpot group membership and role, and instant message username, a perpetrator could send a compelling social engineering attack -- perhaps pretending to be a group administrator seeking assistance or document review.
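To make the encoding mentioned in the first harm concrete, here is a minimal sketch of one common approach: writing each character of an address as an HTML numeric character reference, so the raw address never appears in the page source. The function name and the sample address are hypothetical, and this is not a description of what JotSpot or any cited site actually does. Such encoding deters only simple harvesters that scan for plain-text addresses; a crawler that decodes HTML entities will still recover the address.

```python
import html


def encode_email_for_html(address: str) -> str:
    """Write every character as an HTML numeric character reference.

    Hypothetical helper for illustration only. Browsers render the
    result as the original address, but the page source no longer
    contains the literal "user@host" string.
    """
    return "".join(f"&#{ord(ch)};" for ch in address)


if __name__ == "__main__":
    sample = "user@example.com"  # placeholder address, not real data
    encoded = encode_email_for_html(sample)
    print(encoded)
    # Decoding the entities restores the original address,
    # which is what a browser does when rendering the page.
    print(html.unescape(encoded) == sample)
```

A site that wants stronger protection would more likely hide addresses from guests entirely, as the article notes Google does elsewhere.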

Separately, if this posting violates JotSpot's privacy policy (as discussed below), such a violation is a harm in and of itself. The FTC has stated that violating a privacy policy, even inadvertently, may be an unfair or deceptive trade practice. (See e.g. Eli Lilly and Company.)

JotSpot's Promises: Privacy and Security

JotSpot's privacy policy specifically limits how JotSpot can use the information it receives from users. JotSpot promises that it "will not sell, rent, or share any user data with third parties in personally identifiable form without [users'] express permission" except as disclosed in the JotSpot privacy policy. As best I can tell, no provision of the privacy policy allows JotSpot to share users' full names, email addresses, administrative roles, or linked IM accounts. I therefore conclude that JotSpot's distribution of this information exceeds the permission users granted when users accepted JotSpot's privacy policy.

JotSpot offers specific representations as to the purported security of its operations. In three separate press releases (1, 2, 3), JotSpot specifically promises that its service is "secure." JotSpot's privacy policy also promises adequate security: JotSpot claims "Firewalls and other advanced security technologies are employed to prevent interference or access from outside intruders. These safeguards help prevent unauthorized access, maintain data accuracy, and ensure the appropriate use of data." Perhaps firewalls and other advanced techniques are used, but these techniques did nothing to prevent the easy access demonstrated above. Finally, as shown in Screenshots 6-8, JotSpot page titles reiterate that sensitive information is purportedly "restricted" by the JotSpot system -- even when in fact no such restriction exists.

JotSpot's Privacy Lapse in a Google'd World of Server-Based Computing

Google's recent services present a vision of server-based computing -- with users' search history, email, calendar, documents, presentations, spreadsheets, and even medical history all stored on systems Google operates.

Google's centralized approach to data storage reflects a major change from current practice. At present, users (and their employers) generally directly control the systems that house their data -- so users (or employers) can examine security practices first-hand, can personally assess security glitches, and can discuss relevant practices with responsible designers and administrators. Not so in Google's world, where implementation is delegated to Google, where Google typically does not provide robust customer support, and where Google is unlikely to discuss the details of its privacy and security policies. Indeed, users are asked to trust Google's approach without any apparent way to verify what protections Google has implemented on their behalf. Furthermore, Google's terms of service and other agreements systematically disclaim any promise that systems will be secure.

In fact, a series of recent vulnerabilities has shown the limits of relying on Google security. For example, a July 2008 Google glitch let any user obtain the full name associated with a Gmail account. A September 2007 vulnerability let arbitrary web sites modify users' Gmail accounts to forward mail to attackers, if users were logged in to Gmail with their passwords saved. A January 2007 vulnerability let arbitrary web sites retrieve users' Gmail contact lists.

Publicly-reported vulnerabilities probably significantly understate the true scope of privacy lapses at Google. Consider Google's likely response when its staff find vulnerabilities. For companies covered by the data breach notification laws present in at least 44 states, consumer notification is generally compulsory. But Google is generally not subject to these notification requirements: While Google collects extensive information about its users, Google's records typically do not include the specific data elements (e.g. social security numbers and financial information) that trigger notification statutes. As a result, there is no guarantee that Google would tell users about whatever further privacy lapses Google uncovers; certainly Google's privacy policies make no such guarantee. Thus, there's strong reason to suspect that Google has actually faced additional data breaches beyond those known to the public.

Managing potential vulnerabilities becomes that much harder as Google's services grow in number and complexity. Meanwhile, as these services become increasingly widely used, each slip-up exposes an ever-larger amount of data. So far few users seem concerned, but I suspect these hidden challenges will ultimately impede the server-based applications Google envisions.

Google's Response

Google responded to c|net coverage of this privacy lapse by claiming its systems are operating just as intended. Google argued: "The information in these wikis is accessible because they have been set to public on the Site Permissions page. Users are always in control of the information they share. If wikis are set to private, no information will be publicly accessible."

I see four separate problems with Google's argument:

1) Users never agreed to the postings at issue. As best I can tell, users nowhere agreed to have their email addresses (and other personal information) posted for all to see. For example, a JotSpot account creation page requested user details without mentioning where or how this information would be displayed.

If users actually agreed to have their email addresses and other data shared by JotSpot, at least some users should remember granting that permission. It might be informative to survey a large number of affected users to see how they thought their data would be shared. To get started, I checked with a computer security expert whose details I found within JotSpot listings. Based on his industry expertise, he might reasonably be expected to recall agreeing to share the information JotSpot posted. But he told me he does not recall being asked, nor does he believe he granted consent for his details to be posted.

Rather, as detailed below, JotSpot's posting of user data stemmed not from user decisions, but from decisions made by the administrators who configure wikis hosted at JotSpot. In fact, the third sentence of Google's response confirms that administrators, not ordinary users, play the key role; ordinary users cannot set wikis to public or private. Thus, Google errs in its second sentence, where Google claims "users" control their information sharing.

2) Administrator permission is insufficient to justify posting sensitive data about specific individual users. The difference between administrator permission and user permission is crucial for the data at issue here. Users' email addresses pertain not to the group as a whole, but to the corresponding individual users. It is nonsensical to ask administrators to grant permission to share data that is not theirs to give.

Moreover, administrators are not party to JotSpot's privacy commitment to users. JotSpot's privacy policy has the force of contract between JotSpot and its users -- but wiki administrators are not part of that contract, and the privacy policy is nowhere conditioned on wiki administrators' actions. JotSpot users would be alarmed by a privacy policy that said their sensitive data could be distributed at the whim of independent wiki administrators -- and, indeed, the privacy policy says nothing of the kind.

3) Administrators' supposed decisions were ambiguous and ill-informed. As best I can tell, JotSpot user lists (including email addresses) became publicly available if an administrator used JotSpot's "Global Settings" screen to set "guest user priveleges" to include "read pages: yes." (See screenshot 3 of JotSpot's GlobalSettingsManageDoc reference.) But notice the plain language of this setting, letting administrators specify whether guest users may "read pages" (emphasis added). Making "pages" publicly available in no way implies similar distribution of a user list -- not to mention users' email addresses. There is no reason to think an administrator who chose to let the public "read pages" also intended to distribute user lists and user email addresses.

A savvy JotSpot administrator might find JotSpot's "BrowseUsersListDoc" reference. In a final paragraph, that page opaquely mentions the security implications of the user list function: "Administrators may set page permissions so that user profiles are not visible. This could be a requirement for teams with higher security requirements." But this terse description offers little benefit to typical administrators. For one, this text appears in the documentation of an entirely separate administrative function; once an administrator sets a site to let guests "read pages," user lists and email addresses are already available -- without the administrator ever finding this BrowseUsersListDoc reference. Furthermore, the quoted text is remarkably hard to understand. Compare the following alternative: "By default, if you set your site contents to be visible to the general public, then you will also provide the general public with your full user list, including user email addresses."

4) JotSpot's approach to user lists and user email addresses is unreasonable and ill-advised. What administrator would want to share user email addresses given the well-known risk of spam from email harvesting? Google rightly masks email addresses in Google Groups, in Orkut, and in other Google systems that might otherwise provide fodder for address harvesters. There's no good reason to proceed differently here. Instead, Google should prioritize defaults and options that accommodate reasonable users, reasonable administrators, and standard use cases. If Google elects to offer privacy settings widely viewed as unwise or insecure, Google could helpfully alert administrators to their possible errors -- rather than making such errors so natural that they are virtually inevitable.

On the most charitable view, these privacy lapses stemmed from Google JotSpot's complexity -- from the subtle interactions between user preferences, administrator preferences, user disclosures, and administrative disclosures. JotSpot's complexity certainly creates a heightened opportunity for confusion, error, and unexpected outcomes. But such complexity is inherent in the multi-user, collaborative systems Google increasingly offers. Who can adjust security and privacy settings for a shared Google Docs draft? A shared calendar? A family's shared medical records? These questions deserve clear and easy answers. Users shouldn't have to ponder obscure or convoluted documentation to figure out where their data may end up.

Going forward, developers of collaborative software should clarify exactly which users have the power to show or hide what user data. Such clarity could manifest itself in sites' engineering plans, user interfaces, privacy policies, and documentation. For JotSpot, I suspect this approach would yield an alternative design -- perhaps posting a user's details only if both 1) the user accepted such posting, and 2) the site administrator enabled such posting. Whatever the details, I'm confident that careful evaluation would yield an approach importantly superior to JotSpot's current practice.
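The two-condition design suggested above can be sketched in a few lines. This is an illustration under stated assumptions, not JotSpot's actual data model: the class names, field names, and the `public_view` function are all hypothetical. The sketch keeps "guests may read pages" separate from "guests may view the user list", and it defaults every sharing flag to off.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class UserProfile:
    username: str
    full_name: str
    email: str
    # Hypothetical per-user opt-in; off unless the user turns it on.
    allows_public_listing: bool = False


@dataclass(frozen=True)
class SiteSettings:
    # Hypothetical administrator settings, both off by default.
    guests_may_read_pages: bool = False
    guests_may_view_user_list: bool = False  # distinct from page access


def public_view(user: UserProfile, site: SiteSettings) -> dict:
    """Return the profile fields a guest may see.

    Details are shown only when BOTH the administrator has enabled
    public user listings AND the individual user has accepted them.
    Letting guests read pages has no effect on profile visibility.
    """
    if site.guests_may_view_user_list and user.allows_public_listing:
        return {
            "username": user.username,
            "full_name": user.full_name,
            "email": user.email,
        }
    return {}
```

Under this sketch, an administrator who lets the public "read pages" exposes no user details, and an administrator who enables the user list exposes only those users who separately opted in.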

I notified Google of this privacy lapse on October 23. On October 27, at least some of the affected sites were modified to prevent the disclosures. As of October 30, when this issue began to attract media attention, my tests indicate that every affected site was modified to prevent the disclosures. On October 31, I received a message from JotSpot staff indicating that all User Management pages have been set to private.

Posted: October 30, 2008.