Measuring Typosquatting -- Online Appendix
Tyler Moore & Benjamin Edelman* - Web Appendix to Measuring the Perpetrators and Funders of Typosquatting
Abstract: We describe a method for identifying "typosquatting", the intentional registration of misspellings of popular website addresses. We estimate that at least 938,000 typosquatting domains target the top 3,264 .com sites, and we crawl more than 285,000 of these domains to analyze their revenue sources. We find that 80% are supported by pay-per-click ads, often advertising the correctly spelled domain and its competitors. Another 20% include static redirection to other sites. We present an automated technique that uncovered 75 otherwise legitimate websites which benefited from direct links from thousands of misspellings of competing websites. Using regression analysis, we find that websites in categories with higher pay-per-click ad prices face more typosquatting registrations, indicating that ad platforms such as Google AdWords exacerbate typosquatting. However, our investigations also confirm the feasibility of significantly reducing typosquatting. We find that typosquatting is highly concentrated: Of typo domains showing Google ads, 63% use one of five advertising IDs, and some large name servers host typosquatting domains as much as four times as often as the web as a whole.
Paper Contents: Introduction - Structure and Strategy of the Domaining Business - Measuring Typosquatting - How Typosquatting Domains are Used - Do Pay-Per-Click Ads Promote Typosquatting? - Countering Typosquatting - Conclusions
This online appendix lists specific typosquatting domains we found using the search process detailed in Measuring the Perpetrators and Funders of Typosquatting. We built automated systems to classify the revenue sources of each typosquatting domain, as detailed in section 3. In the links that follow, we present specific victims and perpetrators of typosquatting.
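As a rough illustration of what a search for typo domains involves, the sketch below enumerates candidate misspellings one edit away from a target domain (one character inserted, deleted, substituted, or two adjacent characters swapped). This is an assumption made for illustration, not a description of the search process in the paper; the function name `single_edit_typos` and the character set are hypothetical, and a real search would also need to check which candidates are actually registered.

```python
import string

# Characters permitted in a domain label (assumption: ASCII-only labels).
ALPHABET = string.ascii_lowercase + string.digits + "-"


def single_edit_typos(domain: str) -> set[str]:
    """Return candidate domains whose first label is one edit away from `domain`'s."""
    label, dot, suffix = domain.partition(".")
    candidates = set()
    for i in range(len(label) + 1):
        head, tail = label[:i], label[i:]
        # Insertion of one character at position i.
        for ch in ALPHABET:
            candidates.add(head + ch + tail)
        if tail:
            # Deletion of the character at position i.
            candidates.add(head + tail[1:])
            # Substitution of the character at position i.
            for ch in ALPHABET:
                candidates.add(head + ch + tail[1:])
        if len(tail) > 1:
            # Transposition of the characters at positions i and i + 1.
            candidates.add(head + tail[1] + tail[0] + tail[2:])
    # The correctly spelled label is not a typo.
    candidates.discard(label)
    # Drop labels that are empty or that start or end with a hyphen.
    valid = {c for c in candidates
             if c and not c.startswith("-") and not c.endswith("-")}
    return {c + dot + suffix for c in valid}
```

For example, `single_edit_typos("google.com")` includes `gogle.com` (a deletion), `googel.com` (a transposition), and `goofle.com` (a substitution), but not `google.com` itself.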
Victims of typosquatting
Most Popular Websites - This page details selected popular sites that are highly targeted by typosquatting.
Self-Advertising on Typo Domains - Many typosquatting domains display pay-per-click links promoting the same merchants that are targeted by typosquatting. This page lists popular websites suffering high rates of self-advertising on typo domains.
Self-Advertising on Typo Domains - Screenshots - This page presents screenshots of specific typo domains that prominently present ads for the same sites users attempted to visit. This page also notes the ad platform and partner IDs that profit from these typo domains.
Top Targets of Redirects to Competing Domains - Some typosquatting domains redirect users to competitors' sites. That is, if a user mistypes one site's address, the user might end up at a competitor's service. This listing provides specific examples.
Perpetrators of typosquatting
Large Name Servers Resolving Many Typo Domains - We list selected large name servers resolving many typo domains, along with example typo domains and their revenue services. The frequent typosquatting on these large servers indicates that the problem of typosquatting is concentrated on certain hosts.
Small Name Servers Resolving Many Typo Domains - We list selected smaller name servers resolving a high proportion of typo domains, along with example typo domains and how they are used where available. While these servers host fewer domains, they have the highest rates of typosquatting -- raising questions of how and why these servers came to host such a high proportion of typosquatting domains.
Large Name Servers Resolving Few Typo Domains - These name servers reflect particular success, by the corresponding server operators, at avoiding typosquatting -- raising questions of why other name servers were so much less successful.
Google PPC Ad Client IDs Widely Used in Typosquatting - This page lists Google PPC ad partner IDs that appear on many typo domains.
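The concentration figure quoted in the abstract (63% of typo domains showing Google ads use one of five advertising IDs) is a top-k share. A minimal sketch of that calculation follows, assuming a list with one ad client ID per crawled typo domain; the function name `top_k_share` and the input format are hypothetical, not taken from the paper.

```python
from collections import Counter


def top_k_share(ad_ids, k=5):
    """Fraction of domains whose ad ID is among the k most frequent IDs."""
    counts = Counter(ad_ids)
    total = sum(counts.values())
    if total == 0:
        return 0.0
    # Sum the domain counts of the k most common IDs.
    top = sum(n for _, n in counts.most_common(k))
    return top / total
```

Called with one entry per typo domain and `k=5`, the function returns the share of domains served by the five most common IDs.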
Estimating exposure to typosquatting
Estimating Visitors and Advertising Costs of Typo Domains - Using Alexa data on the popularity of top sites and their typosquatting knock-offs, we estimate the total number of visitors reaching typosquatting domains, and the associated costs to advertisers.
* One of the authors (Edelman) previously served as co-counsel in litigation against Google, arising out of Google's use of typosquatting domains to display advertising. See Vulcan Golf, LLC, et al. v. Google, et al., N.D.Ill., Case No. 1:2007cv03371.
Posted: February 17, 2010