How Many .com Domain Names Are Unused?
When looking for .com names, I've been frustrated by how many are already taken but appear to be unused. It can feel like people are registering every pronounceable combination of letters in every major language, and even the unpronounceable short ones. Is there rampant domain speculation, or do I just think of the same names as everyone else? Let's look at the data...
There are currently 137 million .com domain names registered.1 Of these, roughly 1/3 are in use (businesses, personal websites, email, etc.), another 1/3 appear to be unused, and the last 1/3 are used for a variety of speculative purposes.
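These fractions come from a sample rather than the whole zone, so it is worth asking how precise they are. The footnotes cite Thompson's sample-size method for multinomial proportions. Its worst-case bound says that, with probability at least 1 - α, every category share is simultaneously within d of its true value when n ≥ max over m of z²(1/m)(1 - 1/m)/d². Here z is the standard normal quantile at 1 - α/(2m). The sketch below inverts that bound to get d. Applying it with α = 0.05 and n = 98,854 (the records kept after excluding name server records) is an assumption about how the bound fits this study, and the function names are illustrative.

```python
from math import sqrt
from statistics import NormalDist


def worst_case_d2n(alpha: float, max_categories: int = 50) -> float:
    """Thompson's worst-case value of d**2 * n at significance level alpha.

    For each candidate m, the least favourable distribution spreads its mass
    evenly (1/m) over m categories; the bound keeps the largest value over m.
    """
    best = 0.0
    for m in range(1, max_categories + 1):
        z = NormalDist().inv_cdf(1 - alpha / (2 * m))  # upper alpha/(2m) quantile
        best = max(best, z * z * (1 / m) * (1 - 1 / m))
    return best


def margin_of_error(n: int, alpha: float = 0.05) -> float:
    """Half-width d within which all shares fall simultaneously, for sample size n."""
    return sqrt(worst_case_d2n(alpha) / n)


if __name__ == "__main__":
    n = 98_854  # assumed: records kept after excluding name server records
    print(f"d^2 * n = {worst_case_d2n(0.05):.5f}")
    print(f"all shares within +/-{margin_of_error(n):.2%} at 95% confidence")
```

Under these assumptions the margin works out to roughly ±0.36 percentage points. That is small next to the larger categories, but comparable to the gap between the two smallest ones (Private and Porn).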
How I Determined These Numbers
I started by crawling a random sample of the domains from the top-level .com DNS zone file,3 until reaching 100,000 valid domains.4
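A random sample like this can be drawn in two steps. First collect the delegated names from the zone file, then visit them in shuffled order until enough pass a validity check. The sketch below is a simplified illustration, not the crawler used for this study. It assumes one record per line in the form `OWNER [TTL] [CLASS] NS TARGET`, with owner names relative to the COM. origin. Real zone files may omit repeated owners, and this parser does not handle that. `is_valid` stands in for a hypothetical check such as "has a WHOIS record". Because the sketch keeps only owners of NS records, it skips name server host records up front instead of excluding them afterwards.

```python
import random
from typing import Callable, Iterable, List


def delegated_names(zone_lines: Iterable[str]) -> List[str]:
    """Unique .com names that have NS records, in first-seen order.

    Assumes one record per line, "OWNER [TTL] [CLASS] NS TARGET", with the
    owner written out on every line and relative to the COM. origin.
    """
    seen = set()
    names = []
    for line in zone_lines:
        if line.startswith((";", "$")):
            continue  # comments and directives such as $ORIGIN
        fields = line.split()
        if len(fields) < 3:
            continue
        if "NS" not in (f.upper() for f in fields[1:-1]):
            continue  # skips host records, e.g. glue A/AAAA records for name servers
        owner = fields[0].rstrip(".").lower()
        if owner in ("@", "com"):
            continue  # the zone apex itself, not a registered domain
        name = owner if owner.endswith(".com") else f"{owner}.com"
        if name not in seen:
            seen.add(name)
            names.append(name)
    return names


def sample_valid(names: List[str], target: int,
                 is_valid: Callable[[str], bool], seed: int = 0) -> List[str]:
    """Visit names in random order until `target` of them pass `is_valid`."""
    order = list(names)
    random.Random(seed).shuffle(order)
    picked = []
    for name in order:
        if is_valid(name):
            picked.append(name)
            if len(picked) == target:
                break
    return picked


if __name__ == "__main__":
    zone = [
        "$ORIGIN COM.",
        "EXAMPLE NS NS1.EXAMPLE",
        "EXAMPLE NS NS2.EXAMPLE",
        "NS1.EXAMPLE A 192.0.2.1",
        "OTHER 172800 IN NS NS1.EXAMPLE",
    ]
    names = delegated_names(zone)
    # Stand-in for a real validity check (e.g. a WHOIS lookup); always True here.
    print(sample_valid(names, target=2, is_valid=lambda name: True))
```

Shuffling once and stopping at the target keeps every valid domain equally likely to be chosen. A fixed seed makes the sample reproducible.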
For each domain, I collected the following:
- the WHOIS record5
- all DNS records for the top-level domain and the www subdomain
- HTTP and HTTPS7 responses (status code, headers, and bodies) for the root page of the top-level domain and the www subdomain
- screenshots of the root page as viewed by Mozilla Firefox 64.0 on Linux
Then, I wrote a script to help me categorize websites based on their screenshot and body.
I used the script to categorize domains over the next 2 days.8 In some cases the screenshot and body were not sufficient, so I manually opened the domain in a web browser for inspection.
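Some cases were repetitive and obvious, such as pages titled `<title>Error 404 (Not Found)!!1</title>`. For these, a regular expression can label many bodies at once, after previewing a few matches. The sketch below is a minimal version of that idea. It assumes bodies are stored in a dict keyed by domain and that existing labels should never be overwritten. The rule list, `preview`, and `bulk_categorize` are hypothetical names, not the study's script.

```python
import re
from typing import Dict, List, Tuple

# Hypothetical rules: (regex searched in the page body, category to assign).
RULES: List[Tuple[re.Pattern, str]] = [
    (re.compile(r"<title>\s*Error 404 \(Not Found\)!!1\s*</title>", re.IGNORECASE), "Error"),
]


def preview(bodies: Dict[str, str], pattern: re.Pattern, limit: int = 5) -> List[str]:
    """Return up to `limit` matching domains so a rule can be eyeballed first."""
    return [domain for domain, body in bodies.items() if pattern.search(body)][:limit]


def bulk_categorize(bodies: Dict[str, str], labels: Dict[str, str]) -> Dict[str, str]:
    """Label uncategorized domains whose body matches a rule; the first match wins."""
    result = dict(labels)
    for domain, body in bodies.items():
        if domain in result:
            continue  # never overwrite an existing (e.g. manual) decision
        for pattern, category in RULES:
            if pattern.search(body):
                result[domain] = category
                break
    return result
```

Previewing before applying a rule is the guard against overly broad patterns. Edge cases can still slip through.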
Summary Statistics and Insights
- GoDaddy is the registrar for 1/3 of all .com domain names. That's roughly 45 million domain names. Of those, 1 in 3 have parking pages. In other words, more than 10% of all .com domain names host GoDaddy ads pages.
- While there are 1,851 registrars in the sample, the majority of those are controlled by a smaller number of operators. For example, over 1,000 of the registrars are controlled by DropCatch.com alone.9
- 25% of domains were registered within the last year.
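When operators use numbered entities like DropCatch.com's, grouping WHOIS registrar names by operator is mostly string normalization. The sketch below strips an assumed trailing "<number> LLC" suffix. It then counts how many distinct registrar names collapse to each operator. The suffix pattern is an assumption, and the sketch will miss operators whose alternate registrars are named less obviously.

```python
import re
from collections import Counter
from typing import Iterable

# Assumed naming pattern: "<Operator> <number> LLC", e.g. "DropCatch.com 1000 LLC".
NUMBERED_SUFFIX = re.compile(r"\s+\d+(?:\s+LLC)?\s*$", re.IGNORECASE)


def operator_of(registrar: str) -> str:
    """Strip a trailing '<number> [LLC]' suffix to recover the likely operator."""
    return NUMBERED_SUFFIX.sub("", registrar.strip())


def registrars_per_operator(registrars: Iterable[str]) -> Counter:
    """Count how many distinct registrar names collapse to each operator."""
    return Counter(operator_of(name) for name in set(registrars))


if __name__ == "__main__":
    seen = ["DropCatch.com 1000 LLC", "DropCatch.com 1001 LLC",
            "DropCatch.com 1002 LLC", "Example Registrar, Inc."]
    print(registrars_per_operator(seen).most_common())
```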
These categories evolved as I worked. For example, I hadn't anticipated the high number of gambling domains (aliases).
For most categories I've included a random sample of screenshots from that category, excluding redundant ones.
Content (31% or ~43 million)
Content is the category of any domain with a website displaying unique content. It doesn't matter what the content is, as long as it appears to be unique for the domain and publicly accessible. When I was unsure, I placed domains in this category by default.
Ads (23% or ~31 million)
Note that half the domains in this category are GoDaddy parking pages, on which GoDaddy places Google ads based on the keywords related to the domain name.
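The "half" figure is consistent with the GoDaddy numbers in the summary statistics. About 1/3 of .com names are at GoDaddy, and about 1 in 3 of those show parking pages with ads:

```latex
\frac{1}{3}\times\frac{1}{3}=\frac{1}{9}\approx 11\%\ \text{of all .com domains},
\qquad
\frac{11\%}{23\%}\approx 0.48\approx\frac{1}{2}
```

In absolute terms, 137,756,106 / 9 ≈ 15.3 million, which is about half of the ~31 million Ads domains.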
No Web Server (11% or ~16 million)
If I was unable to connect to, or receive a valid response from, port 80 or 443 for either the top-level domain or the www subdomain and the domain had no MX records, I placed the domain in this category. Some of these domains likely have some non-web use, such as an FTP or video game server, but I expect them to be a small fraction. Additionally, the crawling server was only configured for IPv4, so any IPv6-only websites would have been grouped here.
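The No Web Server rule combines four web checks with an MX lookup. Below is a minimal sketch of just the decision step. It assumes the connection attempts have already been made and recorded as a mapping from (host, port) to whether a valid response came back. `web_endpoints` and `is_no_web_server` are illustrative names. With an IPv4-only crawler, an IPv6-only site would simply show up as four failed endpoints.

```python
from typing import Dict, Tuple

Endpoint = Tuple[str, int]  # (host, port)


def web_endpoints(domain: str) -> Tuple[Endpoint, ...]:
    """The four endpoints checked: the domain and its www subdomain, on ports 80 and 443."""
    return tuple((host, port) for host in (domain, f"www.{domain}") for port in (80, 443))


def is_no_web_server(domain: str, responded: Dict[Endpoint, bool], has_mx: bool) -> bool:
    """True when no endpoint gave a valid response and the domain has no MX records.

    Endpoints missing from `responded` are treated as failures.
    """
    any_web = any(responded.get(ep, False) for ep in web_endpoints(domain))
    return not any_web and not has_mx
```

Domains that fail every web check but do have MX records fall through to the Mail category instead.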
Empty (9.2% or ~13 million)
An Empty domain is one for which a web server is answering requests, but returning empty pages, 404s, or unfilled templates (such as default WordPress installs).
The difference between an Empty domain and a Parked domain is that the Empty domain has presumably been configured by the user, but no content has been added yet.
For Sale (7.1% or ~9.8 million)
Many domains are listed For Sale, usually by domain investors, through various brokers and marketplaces. Nearly half of this category appears to be domains sold by HugeDomains, although their website lists only "over 200,000" domains available for purchase (a fraction of their ~4 million domains if the sample is representative). I only included domains from recognizable marketplaces, or domains whose contact details were not part of an ad placement, because ad networks and domain brokers often falsely claim to represent a domain owner (I categorized all such domains as Ads instead).
Error (5.7% or ~7.9 million)
If a domain returned any type of error, whether an HTTP error or an in-page error, it belongs to this category.
Note that I might have miscategorized some Private domains as Errors if they used basic authentication. I did not distinguish authentication challenges (401 Unauthorized, or 403 Forbidden on some servers, when no basic auth credentials are supplied) from other errors.
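One way to separate these cases would be to inspect the status code and the challenge header. A server asking for basic auth normally replies 401 Unauthorized with a WWW-Authenticate header naming the Basic scheme. The sketch below is a hypothetical heuristic, not something the study ran. Servers that answer with a bare 403 would still look like any other error, and the check only inspects the first scheme in the challenge.

```python
from typing import Mapping


def looks_like_basic_auth(status: int, headers: Mapping[str, str]) -> bool:
    """Heuristic: a 401 whose WWW-Authenticate challenge starts with the Basic scheme."""
    # Header names are case-insensitive, so normalise them before the lookup.
    lowered = {name.lower(): value for name, value in headers.items()}
    challenge = lowered.get("www-authenticate", "")
    return status == 401 and challenge.strip().lower().startswith("basic")
```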
Parked (4.8% or ~6.5 million)
Parked domains are those that display a page from the registrar or host explaining that the domain has not been set up yet. To qualify as Parked, a domain had to serve a page without any external ads. It could advertise its own services, but it couldn't place ads from an ad network.
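The line between Parked and Ads comes down to whether the page pulls in an ad network. The rough heuristic below assumes that ads arrive through a `<script src>` tag from a known ad host. The host list is illustrative and far from complete; pagead2.googlesyndication.com is Google's AdSense script host. Ads injected through iframes or inline scripts would be missed.

```python
import re
from typing import Iterable, Set

# Illustrative, incomplete list of ad-network script hosts (an assumption).
AD_HOSTS = ("pagead2.googlesyndication.com",)

# Captures the host of an absolute or protocol-relative <script src=...>.
SCRIPT_SRC = re.compile(r"<script[^>]+src=[\"']?(?:https?:)?//([^/\"'\s>]+)", re.IGNORECASE)


def script_hosts(body: str) -> Set[str]:
    """Hosts of all external scripts referenced by the page body."""
    return {m.group(1).lower() for m in SCRIPT_SRC.finditer(body)}


def parked_or_ads(body: str, ad_hosts: Iterable[str] = AD_HOSTS) -> str:
    """Label a registrar placeholder page: Ads if it loads an ad-network script."""
    hosts = script_hosts(body)
    return "Ads" if any(host in hosts for host in ad_hosts) else "Parked"
```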
Gambling (3.0% or ~4 million)
All websites in this category are in Chinese and are operating under aliases, often short strings of numbers or consonants (e.g. 17770012 or tdwhtr). They follow common templates and contain similar images, often with automatically-generated logos. I assume their purpose is to attract people who think the names are lucky.
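The study identified these domains by their page content, not their names. A name-only pre-filter for the alias pattern might look like the hypothetical sketch below. It flags short second-level labels made only of digits or only of consonants. The 8-character cutoff is an arbitrary assumption. The filter would also flag legitimate short abbreviations, so at most it could narrow down candidates for manual review.

```python
import re

DIGITS_ONLY = re.compile(r"[0-9]+")
CONSONANTS_ONLY = re.compile(r"[b-df-hj-np-tv-z]+")  # 'y' counted as a consonant


def looks_like_alias(domain: str, max_len: int = 8) -> bool:
    """Flag short second-level labels made only of digits or only of consonants."""
    label = domain.lower()
    if label.endswith(".com"):
        label = label[: -len(".com")]
    if not label or len(label) > max_len:
        return False
    return bool(DIGITS_ONLY.fullmatch(label) or CONSONANTS_ONLY.fullmatch(label))
```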
Mail (2.6% or ~3.5 million)
Any domain not in any other category, but with MX DNS records (for email), I categorized as Mail. I did not attempt to see if the mail server was working or if delivery was possible. It's possible that many of these domains are not actually used for email, but I've given them the benefit of the doubt.
Redirect (1.1% or ~1.6 million)
Redirects include vanity domains pointing to Facebook pages, alternative names for businesses, etc.
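The study does not say exactly how redirects were detected. One plausible rule, sketched below as an assumption, is to resolve the Location header against the request URL and check whether it leaves the domain. Treating example.com and www.example.com as the same site is a further assumption. Under it, a hop from the bare domain to www, or from HTTP to HTTPS, would not count as a Redirect.

```python
from urllib.parse import urljoin, urlsplit


def _bare(host: str) -> str:
    """Lowercase a host name and drop a trailing dot and a leading 'www.'."""
    host = host.lower().rstrip(".")
    return host[4:] if host.startswith("www.") else host


def redirects_elsewhere(domain: str, request_url: str, location: str) -> bool:
    """True if a Location header points to a host other than the domain or its www alias."""
    target = urlsplit(urljoin(request_url, location)).hostname or ""
    return _bare(target) != _bare(domain)
```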
Private (0.64% or ~0.9 million)
Private domains did not appear to have any content accessible without first logging in (or in some cases registering).
Porn (0.59% or ~0.8 million)
Similar to gambling websites, a number of pornographic websites operate under various aliases. The websites were predominantly in Chinese and the domains followed similar naming patterns. As many of the sites display pornographic material directly (not after a warning), I've not included the screenshots here.
- ^ According to https://www.verisign.com/en_US/channel-resources/domain-registry-products/zone-file/index.xhtml there are 137,756,106 .com domains in the "active zone" as of 2019-01-27. I had previously verified this number against the DNS zone file downloaded on 2019-01-21.
- ^ Steven K. Thompson. Sample Size for Estimating Multinomial Proportions. The American Statistician, 41(1):42-46, February 1987.
- ^ I downloaded the zone file from Verisign at 2019-01-21 02:00 UTC and crawled the domains from 2019-01-21 11:20:52 UTC to 2019-01-23 14:04:40 UTC.
- ^ Not all records in the zone file are valid domains. Some do not have a WHOIS record and may act as honeypots to catch people distributing and using zone files without permission. It's possible that there are also valid domains that act as honeypots, but without any way to identify them I've ignored that possibility for the purpose of this study. Additionally, approximately 1% of the records in the zone file are for name servers, not top-level domains. I excluded them from all analysis (i.e. only 98,854 of the 100,000 crawled records are used).
- ^ WHOIS records are directly from Verisign's WHOIS server.
- ^ I collected DNS records by issuing a DNS ANY query directly to the name servers listed in the domain's WHOIS record (in order to avoid inaccuracies due to caching and recursive resolution). A small number of DNS providers do not respond correctly, or at all, to ANY queries.
- ^ The crawler verified SSL certificates, so any HTTPS-only websites with invalid SSL certificates were classified as
- ^ I did not manually categorize every website. When I noticed repetitive and obvious cases, such as when the title of the page was <title>Error 404 (Not Found)!!1</title>, I used an appropriate regular expression to bulk categorize website bodies that matched. I previewed the matches beforehand to ensure that they were not overly broad, but it's possible that I misclassified some edge cases.
- ^ DropCatch.com uses numbered LLCs like DropCatch.com 1000 LLC, DropCatch.com 1001 LLC, DropCatch.com 1002 LLC, etc. Other drop-catching operators have similar collections of names, but not all alternate registrars are named so obviously.