How Many .com Domain Names Are Unused?


When looking for .com names, I've been frustrated by how many are already taken but appear to be unused. It can feel like people are registering every pronounceable combination of letters in every major language, and even the unpronounceable short ones. Is there rampant domain speculation, or do I just think of the same names as everyone else? Let's look at the data...

There are currently 137 million .com domain names registered.[1] Of these, roughly 1/3 are in use (businesses, personal websites, email, etc.), another 1/3 appear to be unused, and the last 1/3 are used for a variety of speculative purposes.

.com Domain Usage, from a sample of 2,188 domains (99% CI ±3%[2])
[Chart grouped into In Use, Not In Use, and Speculation: Content 31%, Ads 23%, No Web Server 11%, Empty 9.2%, For Sale 7.1%, Error 5.7%, Parked 4.8%, Gambling 3.0%; Mail, Redirect, Private, and Porn make up the remainder.]
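
The 2,188-domain sample size appears to follow from the worst-case bound in Thompson's paper (footnote 2) for estimating every category proportion to within ±3% simultaneously at 99% confidence. The sketch below is a minimal reimplementation of that bound, not the author's code: it takes the maximum over m of z²(1/m)(1 − 1/m)/d², where z is the upper α/(2m) point of the standard normal distribution. The function name and the search range for m (`max_m`) are illustrative choices.

```python
from statistics import NormalDist


def thompson_sample_size(alpha: float, d: float, max_m: int = 50) -> float:
    """Worst-case sample size for estimating all multinomial proportions
    to within +/- d with overall confidence 1 - alpha (Thompson, 1987)."""
    best = 0.0
    for m in range(1, max_m + 1):
        # Upper alpha/(2m) quantile of the standard normal distribution.
        z = NormalDist().inv_cdf(1 - alpha / (2 * m))
        n = z * z * (1 / m) * (1 - 1 / m) / (d * d)
        best = max(best, n)
    return best


n = thompson_sample_size(alpha=0.01, d=0.03)
print(f"{n:.1f}")  # about 2188.7, with the maximum at m = 2
```

The result is just under 2,189, which agrees with the 2,188 domains sampled here up to rounding.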

How I Determined These Numbers

I started by crawling a random sample of the domains from the top-level .com DNS zone file,[3] until reaching 100,000 valid domains.[4]
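
A minimal sketch of this sampling step, assuming a simplified zone file in which each delegation appears on one line as `<owner> [ttl] [class] NS <nameserver>`. Real zone files also contain directives, glue records, and blank-owner continuation lines; this sketch skips those rather than fully parsing them. The `is_valid` predicate is a hypothetical stand-in for whatever check rejected invalid records (for example, a missing WHOIS record).

```python
import random
from typing import Callable, Iterable, List, Optional


def delegated_domains(zone_lines: Iterable[str], origin: str = "com") -> List[str]:
    """Return unique owner names that have NS records, in first-seen order."""
    seen = set()
    domains = []
    for line in zone_lines:
        # Skip blank lines, comments, and directives such as $ORIGIN or $TTL.
        if not line.strip() or line.startswith((";", "$")):
            continue
        # Blank-owner continuation lines repeat an owner seen earlier; skip them.
        if line[0].isspace():
            continue
        fields = line.split()
        # Only NS records define delegated domains; glue A/AAAA records are ignored.
        if len(fields) < 3 or "NS" not in (f.upper() for f in fields[1:-1]):
            continue
        name = fields[0].lower().rstrip(".")
        if name in ("@", origin):
            continue  # the zone apex itself, not a registered domain
        if not name.endswith("." + origin):
            name = f"{name}.{origin}"  # relative owner name
        if name not in seen:
            seen.add(name)
            domains.append(name)
    return domains


def sample_valid(domains: List[str], target: int,
                 is_valid: Callable[[str], bool],
                 seed: Optional[int] = None) -> List[str]:
    """Visit domains in random order until `target` valid ones are found."""
    rng = random.Random(seed)
    order = domains[:]
    rng.shuffle(order)
    sample = []
    for domain in order:
        if is_valid(domain):  # hypothetical check, e.g. "has a WHOIS record"
            sample.append(domain)
            if len(sample) == target:
                break
    return sample
```

Shuffling the full list and stopping at the target count keeps every delegated name equally likely to be drawn, while invalid records are simply passed over.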

For each domain, I collected the following:

The crawl took a little over 48 hours from a single server in a Singapore data center. I ran a follow-up crawl for any domains that had failed to connect over HTTP or HTTPS, in case of transient errors. Finally, for the 2,188 domains to be categorized, I manually checked any that had failed, in case the crawler had timed out or JavaScript had blocked DOM events.

Then, I wrote a script to help me categorize websites based on their screenshot and body.

The categorization script presents the possible categories as a list of buttons, with Content being the default.
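
The button interface itself isn't reproduced here, but its key behaviour (a fixed set of categories with Content preselected) can be sketched as a terminal prompt. This is a hypothetical stand-in, not a reconstruction of the author's script; `CATEGORIES`, `parse_choice`, and `prompt` are illustrative names.

```python
CATEGORIES = [
    "Content", "Ads", "No Web Server", "Empty", "For Sale", "Error",
    "Parked", "Gambling", "Mail", "Redirect", "Private", "Porn",
]
DEFAULT = "Content"


def parse_choice(answer: str) -> str:
    """Map a typed answer (number or name) to a category; empty keeps the default."""
    answer = answer.strip()
    if not answer:
        return DEFAULT
    if answer.isdecimal() and 1 <= int(answer) <= len(CATEGORIES):
        return CATEGORIES[int(answer) - 1]
    for category in CATEGORIES:
        if category.lower() == answer.lower():
            return category
    raise ValueError(f"unknown category: {answer!r}")


def prompt(domain: str) -> str:
    """Ask until a valid category is given for `domain`."""
    menu = "  ".join(f"{i}) {c}" for i, c in enumerate(CATEGORIES, start=1))
    while True:
        try:
            return parse_choice(input(f"{domain}\n{menu}\n[{DEFAULT}] > "))
        except ValueError as err:
            print(err)
```

Making Content the default means the most common case costs a single keypress, which matters when labelling a couple of thousand domains by hand.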

I used the script to categorize domains over the next 2 days.[8] In some cases the screenshot and body were not sufficient, so I manually opened the domain in a web browser for inspection.
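
Footnote 8 describes bulk-categorizing obvious repeats with a regular expression after previewing the matches. A minimal sketch of that preview-then-apply workflow, assuming the crawled bodies are held in a dict keyed by domain. The function names are hypothetical, and while the title string is the one quoted in the footnote, the category it is filed under here is purely illustrative.

```python
import re
from typing import Dict, List


def preview(pattern: str, bodies: Dict[str, str],
            uncategorized: List[str]) -> List[str]:
    """List uncategorized domains whose body matches, for manual review."""
    regex = re.compile(pattern)
    return [d for d in uncategorized if regex.search(bodies.get(d, ""))]


def bulk_categorize(matches: List[str], category: str,
                    labels: Dict[str, str]) -> None:
    """Assign the reviewed matches to `category`, never overwriting a label."""
    for domain in matches:
        labels.setdefault(domain, category)


bodies = {
    "a.com": "<title>Error 404 (Not Found)!!1</title>",
    "b.com": "<title>Welcome</title>",
}
labels: Dict[str, str] = {}
pattern = re.escape("<title>Error 404 (Not Found)!!1</title>")
hits = preview(pattern, bodies, [d for d in bodies if d not in labels])
print(hits)  # ['a.com']
bulk_categorize(hits, "Error", labels)  # illustrative category choice
```

Keeping preview and assignment as separate steps leaves room to inspect the matches for over-broad patterns before anything is labelled.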

Summary Statistics and Insights

Top 10 .com Domain Registrars, from a sample of 100,000 domains
GoDaddy.com, LLC: 33%
Tucows Domains Inc.: 5.3%
Alibaba Cloud Computing (Beijing) Co., Ltd.: 4.2%
Network Solutions, LLC: 3.8%
eNom, Inc.: 3.5%
NameCheap, Inc.: 3.3%
1&1 Internet SE: 2.5%
PDR Ltd. d/b/a PublicDomainRegistry.com: 2.5%
Xin Net Technology Corporation: 2.1%
Wild West Domains, LLC: 1.7%
Google Inc.: 1.5%
OVH: 1.5%
The remaining 1,841 registrars: 35%
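
Shares like these come from tallying the registrar field across the crawled WHOIS records. A minimal sketch with `collections.Counter`, assuming the registrar names have already been extracted into a list (WHOIS parsing is not shown). Numbered entities like the DropCatch.com LLCs in footnote 9 would be counted as separate registrars unless they were merged first. The function name and sample data are illustrative.

```python
from collections import Counter
from typing import List, Tuple


def registrar_shares(registrars: List[str], top: int) -> List[Tuple[str, float]]:
    """Return the `top` registrars by share, plus one bucket for the rest."""
    counts = Counter(registrars)
    total = sum(counts.values())
    leaders = counts.most_common(top)
    rest = total - sum(n for _, n in leaders)
    remaining = len(counts) - len(leaders)
    rows = [(name, n / total) for name, n in leaders]
    if remaining:
        rows.append((f"the remaining {remaining:,} registrars", rest / total))
    return rows


sample = ["GoDaddy.com, LLC"] * 6 + ["Tucows Domains Inc."] * 2 + ["OVH", "Google Inc."]
for name, share in registrar_shares(sample, top=2):
    print(f"{name}: {share:.0%}")
```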
Domain ages according to WHOIS creation dates, from a sample of 100,000 domains
[Histogram: x-axis domain age (in years), 0 to 25; y-axis share of domains, 0% to 25%]
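
Ages like these can be derived by comparing each WHOIS creation date with the crawl date. A minimal sketch, assuming creation dates are already parsed into `datetime.date` values and using 2019-01-21 (the zone file date in footnote 3) as the reference. Bucketing by whole years is an assumption here and not necessarily how the chart was binned.

```python
from collections import Counter
from datetime import date
from typing import Dict, Iterable


def age_in_years(created: date, reference: date) -> int:
    """Whole years elapsed between creation and the reference date."""
    years = reference.year - created.year
    if (reference.month, reference.day) < (created.month, created.day):
        years -= 1  # anniversary not yet reached in the reference year
    return years


def age_histogram(created_dates: Iterable[date], reference: date) -> Dict[int, float]:
    """Fraction of domains at each whole-year age, sorted by age."""
    ages = Counter(age_in_years(d, reference) for d in created_dates)
    total = sum(ages.values())
    return {age: n / total for age, n in sorted(ages.items())}


ref = date(2019, 1, 21)
print(age_in_years(date(1995, 8, 14), ref))  # 23
```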

Domain Categories

These categories evolved as I worked. For example, I hadn't anticipated the high number of gambling domains (aliases).

For most categories I've included a random sample of screenshots from that category, excluding redundant ones.

Content (31% or ~43 million)

Content is the category of any domain with a website displaying unique content. It doesn't matter what the content is, as long as it appears to be unique for the domain and publicly accessible. When I was unsure, I placed domains in this category by default.

Ads (23% or ~31 million)

Note that half the domains in this category are GoDaddy parking pages, on which GoDaddy places Google ads based on the keywords related to the domain name.

No Web Server (11% or ~16 million)

If neither the bare domain (e.g. example.com) nor its www subdomain accepted a connection and returned a valid response on port 80 or 443, and the domain had no MX records, I placed it in this category. Some of these domains likely have a non-web use, such as an FTP or video game server, but I expect them to be a small fraction. Additionally, the crawling server was only configured for IPv4, so any IPv6-only websites would have been grouped here.
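
The rule above, together with the Mail fallback described further down, can be written as a small decision function. This sketch assumes a hypothetical `responds(url)` callable that reports whether an endpoint returned a valid response, plus the domain's MX records. It only covers domains with no working web server, since the remaining categories were assigned by inspecting the page.

```python
from itertools import product
from typing import Callable, List


def probe_urls(domain: str) -> List[str]:
    """The four endpoints checked: ports 80 and 443, bare domain and www."""
    return [f"{scheme}://{host}/"
            for scheme, host in product(("http", "https"), (domain, f"www.{domain}"))]


def classify_unreachable(domain: str, responds: Callable[[str], bool],
                         mx_records: List[str]) -> str:
    """Return 'reachable', 'Mail', or 'No Web Server' for one domain."""
    if any(responds(url) for url in probe_urls(domain)):
        return "reachable"  # categorized later by looking at the page
    return "Mail" if mx_records else "No Web Server"
```

In practice `responds` would wrap an HTTP client with timeouts and certificate verification (see footnote 7); it is left abstract here.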

Empty (9.2% or ~13 million)

An Empty domain is one for which a web server is answering requests, but returning empty pages, 404s, or unfilled templates (such as default WordPress installs).

The difference between an Empty domain and a Parked domain is that the Empty domain has presumably been configured by the user, but no content has been added yet.

For Sale (7.1% or ~9.8 million)

Many domains are listed For Sale, usually by domain investors, through various brokers and marketplaces. Nearly half of this category appears to be domains sold by HugeDomains, although their website lists only "over 200,000" domains available for purchase (a fraction of their ~4 million domains, if the sample is representative). I only included domains from recognizable marketplaces, or where the contact details were not part of an ad placement, because ad networks and domain brokers often falsely claim to represent a domain owner (I categorized all such domains as Ads instead).

Error (5.7% or ~7.9 million)

If a domain returned any type of error, whether an HTTP error or an in-page error, it belongs to this category.

Note that I might have miscategorized some Private domains as Errors if they used basic authentication, as I did not distinguish authentication failures (401 Unauthorized, or 403 Forbidden on some servers, when no credentials are supplied) from other errors.
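
One way to separate password-protected sites from genuine errors, which this study did not attempt, is to look for an authentication challenge. Under HTTP's authentication framework (RFC 7235, now RFC 9110), a server that wants credentials replies 401 Unauthorized with a `WWW-Authenticate` header, and basic authentication uses the `Basic` scheme. A minimal sketch, assuming the crawler exposes the status code and response headers; the function name is hypothetical. Servers that answer with a bare 403 would still land in Error.

```python
from typing import Mapping


def error_or_private(status: int, headers: Mapping[str, str]) -> str:
    """Classify a response, treating an authentication challenge as Private."""
    # Header names are case-insensitive, so normalize them before lookup.
    lowered = {k.lower(): v for k, v in headers.items()}
    if status == 401 and lowered.get("www-authenticate"):
        return "Private"
    if status >= 400:
        return "Error"
    return "other"
```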

Parked (4.8% or ~6.5 million)

Parked domains are those that display a page from the registrar or host explaining that the domain has not been set up yet. To qualify as Parked, a domain had to serve a page without any external ads. It could advertise its own services, but it couldn't place ads from an ad network.

Gambling (3.0% or ~4 million)

All websites in this category are in Chinese and are operating under aliases, often short strings of numbers or consonants (e.g. 17770012 or tdwhtr). They follow common templates and contain similar images, often with automatically-generated logos. I assume their purpose is to attract people who think the names are lucky.
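
The naming pattern described here could be flagged automatically. This is a rough heuristic of my own, not a method used in this study: treat the second-level label as alias-like if it is all digits or contains no vowels. It would also flag some legitimate names (short acronyms, or words like "rhythm"), so at best it could pre-sort candidates for manual review.

```python
import re

ALL_DIGITS = re.compile(r"[0-9]+")
NO_VOWELS = re.compile(r"[b-df-hj-np-tv-z]+")  # letters other than a, e, i, o, u


def looks_like_alias(domain: str) -> bool:
    """Flag labels like 17770012 or tdwhtr (digits only, or consonants only)."""
    label = domain.lower().split(".")[0]
    return bool(ALL_DIGITS.fullmatch(label) or NO_VOWELS.fullmatch(label))
```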

Mail (2.6% or ~3.5 million)

Any domain not in any other category, but with MX DNS records (for email), I categorized as Mail. I did not attempt to see if the mail server was working or if delivery was possible. It's possible that many of these domains are not actually used for email, but I've given them the benefit of the doubt.

Redirect (1.1% or ~1.6 million)

Redirects include vanity domains pointing to Facebook pages, alternative names for businesses, etc.
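
Whether a domain counts as a Redirect can be checked by comparing the host the crawler finally landed on with the domain it started from. A minimal sketch with `urllib.parse`, assuming the final URL after following redirects is already known. Treating the www subdomain as the same site is an assumption, and the function name is hypothetical.

```python
from urllib.parse import urlsplit


def redirects_elsewhere(domain: str, final_url: str) -> bool:
    """True if the final URL's host is neither the domain nor its www subdomain."""
    host = (urlsplit(final_url).hostname or "").rstrip(".")
    base = domain.lower()
    return host not in (base, f"www.{base}")
```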

Private (0.64% or ~0.9 million)

Private domains did not appear to have any content accessible without first logging in (or in some cases registering).

Porn (0.59% or ~0.8 million)

Similar to gambling websites, a number of pornographic websites operate under various aliases. The websites were predominantly in Chinese and the domains followed similar naming patterns. As many of the sites display pornographic material directly (not after a warning), I've not included the screenshots here.
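
Summing the category shares reproduces the rough thirds from the introduction. The grouping below (In Use, Not In Use, Speculation) is an inference from the chart legend, not something the text spells out. The percentages are the rounded figures from the section headings, so the totals are approximate and the extrapolated counts will not exactly match the figures above.

```python
SHARES = {  # percent of sampled domains, rounded as in the section headings
    "Content": 31, "Mail": 2.6, "Redirect": 1.1, "Private": 0.64,
    "No Web Server": 11, "Empty": 9.2, "Error": 5.7, "Parked": 4.8,
    "Ads": 23, "For Sale": 7.1, "Gambling": 3.0, "Porn": 0.59,
}
GROUPS = {  # assumed grouping, inferred from the chart legend
    "In Use": ["Content", "Mail", "Redirect", "Private"],
    "Not In Use": ["No Web Server", "Empty", "Error", "Parked"],
    "Speculation": ["Ads", "For Sale", "Gambling", "Porn"],
}
TOTAL_DOMAINS = 137_756_106  # active .com zone size from footnote 1

for group, members in GROUPS.items():
    share = sum(SHARES[m] for m in members)
    estimate = share / 100 * TOTAL_DOMAINS
    print(f"{group}: {share:.1f}% (~{estimate / 1e6:.0f} million)")
```

Each group lands between roughly 31% and 35%, and the shares add up to 99.73% rather than 100% because of rounding.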

  1. ^ According to https://www.verisign.com/en_US/channel-resources/domain-registry-products/zone-file/index.xhtml there are 137,756,106 .com domains in the "active zone" as of 2019-01-27. I had previously verified this number against the DNS zone file downloaded on 2019-01-21.
  2. ^ Steven K. Thompson. Sample Size for Estimating Multinomial Proportions. The American Statistician, 41(1):42-46, February 1987.
  3. ^ I downloaded the zone file from Verisign at 2019-01-21 02:00 UTC and crawled the domains from 2019-01-21 11:20:52 UTC to 2019-01-23 14:04:40 UTC.
  4. ^ Not all records in the zone file are valid domains. Some do not have a WHOIS record and may act as honeypots to catch people distributing and using zone files without permission. It's possible that there are also valid domains that act as honeypots, but without any way to identify them I've ignored that possibility for the purpose of this study. Additionally, approximately 1% of the records in the zone file are for name servers, not registered domains. I excluded them from all analysis (i.e. only 98,854 of the 100,000 crawled records are used).
  5. ^ WHOIS records are directly from Verisign's WHOIS server.
  6. ^ I collected DNS records by issuing a DNS ANY query directly to the name servers listed in the domain's WHOIS record (in order to avoid inaccuracies due to caching and recursive resolution). A small number of DNS providers do not respond correctly or at all to ANY queries.
  7. ^ The crawler verified SSL certificates, so any HTTPS-only websites with invalid SSL certificates were classified as Error.
  8. ^ I did not manually categorize every website. When I noticed repetitive and obvious cases, such as when the title of the page was <title>Error 404 (Not Found)!!1</title>, I used an appropriate regular expression to bulk-categorize website bodies that matched. I previewed the matches beforehand to ensure that they were not overly broad, but it's possible that I misclassified some edge cases.
  9. ^ DropCatch.com uses numbered LLCs like DropCatch.com 1000 LLC, DropCatch.com 1001 LLC, DropCatch.com 1002 LLC, etc. Other drop-catching operators have similar collections of names, but not all alternate registrars are named so obviously.
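
Footnote 6 describes sending an ANY query straight to a domain's authoritative name servers. The query is a small fixed-format message (RFC 1035), sketched below using only the standard library. Sending it and parsing the reply are left out, and the function name is hypothetical. QTYPE 255 means ANY, and the recursion-desired bit is left clear because the authoritative servers are queried directly. RFC 8482 allows servers to answer ANY with a minimal response, which may explain some of the providers that did not respond as expected.

```python
import random
import struct
from typing import Optional

QTYPE_ANY = 255
QCLASS_IN = 1


def build_any_query(domain: str, query_id: Optional[int] = None) -> bytes:
    """Encode a DNS query message asking for all records (ANY) for `domain`."""
    if query_id is None:
        query_id = random.randrange(0x10000)
    # Header: ID, flags=0 (standard query, recursion not desired),
    # QDCOUNT=1, ANCOUNT=NSCOUNT=ARCOUNT=0.
    header = struct.pack("!6H", query_id, 0, 1, 0, 0, 0)
    qname = b""
    for label in domain.rstrip(".").split("."):
        raw = label.encode("ascii")  # IDNs must already be in punycode form
        if not 0 < len(raw) < 64:
            raise ValueError(f"invalid label: {label!r}")
        qname += bytes([len(raw)]) + raw
    qname += b"\x00"  # the empty root label terminates the name
    return header + qname + struct.pack("!HH", QTYPE_ANY, QCLASS_IN)
```

The packet could be sent with `socket.socket(socket.AF_INET, socket.SOCK_DGRAM).sendto(packet, (server_ip, 53))`. Large ANY responses may come back truncated over UDP (TC bit set), in which case the query needs to be retried over TCP.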