How to Extract the Registrable Domain (eTLD+1) From a URL the Right Way
Extracting the registrable domain from a URL sounds trivial until you try it in production. You take a URL like `https://shop.example.co.uk/cart?id=99`, strip the path, split on dots, grab the last two pieces, and you get `co.uk`. That is not a domain anyone can register — it is a public suffix. This is the single most common bug in domain-parsing code, and it appears in analytics pipelines, cookie-scoping logic, allowlists, and security tools across the industry.
This guide explains what the registrable domain (also called eTLD+1 or the apex domain) actually is, why naive string splitting fails, and how to extract it correctly using the Public Suffix List (PSL) with battle-tested libraries in Python, JavaScript, and Java.
Key Takeaways
• The registrable domain is the effective top-level domain plus one label (eTLD+1) — the highest level a person or organization can actually register.
• You cannot compute it by taking the last two dot-separated segments. Suffixes like `co.uk`, `com.au`, and `github.io` span multiple labels.
• The Public Suffix List is the authoritative source of truth for which suffixes are “public.”
• Use a maintained PSL-backed library: tldextract or publicsuffix2 (Python), psl or tldts (JavaScript), Guava InternetDomainName (Java).
• Watch the edge cases: deep subdomains, multi-level ccTLDs, and private suffixes like `blogspot.com`.
What is the registrable domain (eTLD+1)?
The registrable domain is the part of a hostname that an individual or organization can purchase from a registrar. It is composed of two pieces:
- The effective TLD (eTLD) — also called the public suffix. This is the portion of the name under which the public can register subdomains directly. It may be a single label (`.com`) or several (`.co.uk`, `.org.au`, `.github.io`).
- Plus one label — the next label to the left of the public suffix. That label is what someone registers.
Put them together and you get eTLD+1, the registrable domain.
Consider `https://blog.example.co.uk/post/123`. Breaking the hostname `blog.example.co.uk` into its functional parts:
| URL part | Value | Role |
|---|---|---|
| Scheme | `https` | Protocol |
| Subdomain | `blog` | Host label under the registrable domain |
| Registrable label | `example` | The “+1” label that was registered |
| Public suffix (eTLD) | `co.uk` | Where registration happens |
| Registrable domain (eTLD+1) | `example.co.uk` | Apex domain |
| Path | `/post/123` | Resource location |
The registrable domain here is `example.co.uk`, not `co.uk` and not `blog.example.co.uk`.
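Before any suffix lookup can happen, the hostname has to be separated from the scheme, port, and path. A minimal sketch of that first step using Python's standard `urllib.parse` follows; the helper name `split_url` is illustrative and not part of any library.

```python
from urllib.parse import urlsplit

def split_url(url: str) -> dict:
    """Separate the parts of a URL that matter for domain extraction."""
    parts = urlsplit(url)
    return {
        "scheme": parts.scheme,
        "hostname": parts.hostname,  # lowercased, with port and credentials removed
        "path": parts.path,
    }

info = split_url("https://blog.example.co.uk/post/123")
print(info["scheme"])    # https
print(info["hostname"])  # blog.example.co.uk
print(info["path"])      # /post/123
```

This only isolates the hostname. Deciding where the public suffix ends still requires the Public Suffix List.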
Why does naive dot-splitting fail?
The intuitive approach is to split the hostname on `.` and keep the last two labels. It works for simple cases and breaks everywhere else.
```python
def naive_registrable(host):
    return ".".join(host.split(".")[-2:])

print(naive_registrable("www.example.com"))      # example.com ✓ lucky
print(naive_registrable("shop.example.co.uk"))   # co.uk ✗ WRONG
print(naive_registrable("user.github.io"))       # github.io ✗ WRONG
print(naive_registrable("site.example.com.au"))  # com.au ✗ WRONG
```
The problem is that the number of labels in a public suffix is not fixed. A top-level domain like `.com` is one label, but country-code structures like `.co.uk`, `.com.au`, `.co.jp`, and `.ac.nz` are two. Some are even deeper, such as `.k12.ca.us`. On top of that, many platforms operate their own multi-label suffixes — `github.io`, `cloudfront.net`, and `s3.amazonaws.com` among them — where each user gets a subdomain that should be treated as a distinct site.
There is no algorithm that can infer suffix length from the string alone. The only correct approach is to consult a curated list of which suffixes are public.
Never extract a domain by taking the last two dot-segments. The day your code meets a `.co.uk`, `.com.br`, or `.github.io` host, it will silently treat `co.uk` as a registrable domain — collapsing thousands of unrelated sites into one bucket, mis-scoping cookies across tenants, or letting an allowlist match far more than you intended. Always resolve the suffix against a Public Suffix List library. The cost is one dependency; the alternative is a class of bug that hides until it reaches a UK, Australian, or Brazilian user.
What is the Public Suffix List?
The Public Suffix List (PSL) is a community-maintained registry, originally created by Mozilla, that enumerates every domain suffix under which the public can directly register names. Browsers use it to decide cookie scope and to group sites for security boundaries; you should use it for exactly the same reason.
The list has two sections that matter for extraction:
- ICANN suffixes — the “official” registry-operated suffixes, such as `com`, `co.uk`, `com.au`, and `org`.
- Private suffixes — suffixes operated by private companies that delegate subdomains to users, such as `blogspot.com`, `github.io`, and `herokuapp.com`. These are listed so that, for example, two different `*.github.io` user sites are treated as separate registrable domains.
This private section is the subtle part. Depending on your use case you may want `user.github.io` to resolve to `user.github.io` (treating the PSL private section as authoritative) or to `github.io` (ICANN-only). Good libraries let you choose.
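The two sections are delimited inside the list file by marker comments, which is what lets a library keep them apart. The sketch below parses a tiny hand-written sample in that format; `SAMPLE` and `parse_psl` are illustrative names, and the sample is nowhere near the real list.

```python
# A hand-written excerpt in the PSL file format; the real file is far longer.
SAMPLE = """\
// ===BEGIN ICANN DOMAINS===
com
uk
co.uk
// ===END ICANN DOMAINS===
// ===BEGIN PRIVATE DOMAINS===
github.io
herokuapp.com
// ===END PRIVATE DOMAINS===
"""

def parse_psl(text):
    """Split PSL-formatted text into (icann_rules, private_rules)."""
    icann, private = set(), set()
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("// ===BEGIN ICANN DOMAINS==="):
            current = icann
        elif line.startswith("// ===BEGIN PRIVATE DOMAINS==="):
            current = private
        elif line.startswith("// ===END"):
            current = None
        elif line and not line.startswith("//") and current is not None:
            current.add(line.split()[0])  # a rule ends at the first whitespace
    return icann, private

icann_rules, private_rules = parse_psl(SAMPLE)
print(sorted(icann_rules))    # ['co.uk', 'com', 'uk']
print(sorted(private_rules))  # ['github.io', 'herokuapp.com']
```

Keeping the two sets separate is what makes an ICANN-only mode possible later.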
A few representative entries clarify how the list reshapes extraction:
| Hostname | Matched public suffix | Registrable domain (eTLD+1) |
|---|---|---|
| `www.example.com` | `com` | `example.com` |
| `shop.example.co.uk` | `co.uk` | `example.co.uk` |
| `a.b.example.com.au` | `com.au` | `example.com.au` |
| `myapp.herokuapp.com` | `herokuapp.com` (private) | `myapp.herokuapp.com` |
| `octocat.github.io` | `github.io` (private) | `octocat.github.io` |
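The lookups in the table can be approximated with a longest-suffix match. The following is a simplified sketch, not a replacement for a real library: the rule sets are a hand-picked subset, wildcard and exception rules are ignored, and unknown suffixes return `None` rather than falling back to a default rule. The names `ICANN_RULES`, `PRIVATE_RULES`, and `registrable_domain` are assumptions made for illustration.

```python
from typing import Optional

# Hand-picked rules for illustration only; the real list has thousands of entries.
ICANN_RULES = {"com", "uk", "co.uk", "au", "com.au", "io"}
PRIVATE_RULES = {"github.io", "herokuapp.com"}

def registrable_domain(host: str, include_private: bool = True) -> Optional[str]:
    """Return the eTLD+1 for host, or None if there is none in this toy list."""
    rules = ICANN_RULES | PRIVATE_RULES if include_private else ICANN_RULES
    labels = host.lower().rstrip(".").split(".")
    # Walk from the longest candidate suffix to the shortest.
    for i in range(len(labels)):
        if ".".join(labels[i:]) in rules:
            if i == 0:
                return None  # the host is itself a public suffix
            return ".".join(labels[i - 1:])
    return None  # no rule matched (simplification)

print(registrable_domain("shop.example.co.uk"))   # example.co.uk
print(registrable_domain("a.b.example.com.au"))   # example.com.au
print(registrable_domain("octocat.github.io"))    # octocat.github.io
print(registrable_domain("octocat.github.io", include_private=False))  # github.io
print(registrable_domain("co.uk"))                # None
```

The `include_private` switch mirrors the choice described above: with the private section honored, each `*.github.io` site is its own registrable domain; without it, they all collapse to `github.io`.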
How do you extract the registrable domain in Python?
The most popular tool is tldextract, which downloads and caches the PSL and splits any URL into subdomain, domain, and suffix.
```python
import tldextract

ext = tldextract.extract("https://shop.example.co.uk/cart?id=99")
print(ext.subdomain)          # shop
print(ext.domain)             # example
print(ext.suffix)             # co.uk
print(ext.registered_domain)  # example.co.uk <- eTLD+1
print(ext.top_domain_under_public_suffix)  # example.co.uk
```
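`registered_domain` gives you the registrable domain directly, and recent tldextract releases expose the same value as `top_domain_under_public_suffix`. The library handles IDNs, ports, and missing schemes gracefully. Note that tldextract ignores the PSL private section by default, so `octocat.github.io` resolves to `github.io` unless you build the extractor with `include_psl_private_domains=True`.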
If you want explicit control over the ICANN-versus-private distinction, publicsuffix2 (or the newer publicsuffixlist) exposes both:
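```python
from publicsuffix2 import PublicSuffixList

psl = PublicSuffixList()
# Despite its name, get_public_suffix() in publicsuffix2 returns the
# registrable domain, a naming quirk kept for backward compatibility.
print(psl.get_public_suffix("octocat.github.io"))
# Check the publicsuffix2 documentation for the exact meaning of `strict`
# in your installed version before relying on it.
print(psl.get_public_suffix("octocat.github.io", strict=False))
```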
For production services, pin the PSL version or refresh it on a schedule. A stale list will miss newly delegated suffixes and mis-classify recent gTLDs.
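One simple way to enforce a refresh schedule is to check the age of the cached list file before using it. The sketch below assumes the list is stored as a local file; the function name `needs_refresh`, the seven-day default, and the example path are all hypothetical, and the download step itself is left out.

```python
import os
import time

def needs_refresh(path: str, max_age_days: float = 7.0) -> bool:
    """Return True if the cached list is missing or older than max_age_days."""
    try:
        age_seconds = time.time() - os.path.getmtime(path)
    except OSError:
        return True  # no cached copy yet
    return age_seconds > max_age_days * 86400

# Hypothetical cache location; adjust to wherever your service stores the list.
if needs_refresh("/var/cache/psl/public_suffix_list.dat"):
    print("PSL cache is missing or stale; schedule a refresh")
```

A check like this can run at service start-up or from a cron job, so a stale list is noticed rather than silently used.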
How do you extract the registrable domain in JavaScript?
In Node.js and the browser, two libraries dominate: psl (a thin PSL wrapper) and tldts (a faster, fuller URL parser). Note that `tld.js` is the older, now largely superseded predecessor of `tldts`.
```javascript
// npm install psl
const psl = require('psl');

const parsed = psl.parse('shop.example.co.uk');
console.log(parsed.subdomain); // shop
console.log(parsed.domain);    // example.co.uk <- eTLD+1
console.log(parsed.tld);       // co.uk
```
`tldts` accepts full URLs and returns a richer result, which is convenient when you are parsing arbitrary links:
```javascript
// npm install tldts
const { parse } = require('tldts');

const r = parse('https://a.b.example.com.au/path');
console.log(r.domain);       // example.com.au <- eTLD+1
console.log(r.subdomain);    // a.b
console.log(r.publicSuffix); // com.au
console.log(r.isIcann);      // true
```
`tldts` also exposes flags to control whether private suffixes are honored, which matters when you decide how to treat platforms like `github.io`.
How do you extract the registrable domain in Java?
On the JVM, Guava’s `InternetDomainName` ships with an embedded copy of the Public Suffix List, so you need no extra data file.
```java
import com.google.common.net.InternetDomainName;

InternetDomainName host = InternetDomainName.from("shop.example.co.uk");

// topPrivateDomain() == eTLD+1, the registrable domain
System.out.println(host.topPrivateDomain());    // example.co.uk
System.out.println(host.publicSuffix());        // co.uk
System.out.println(host.isUnderPublicSuffix()); // true
```
`topPrivateDomain()` is Guava’s name for the registrable domain. Guard against `IllegalStateException` for hosts that have no registrable domain (a bare public suffix like `co.uk`) by checking `isUnderPublicSuffix()` first. Because Guava bundles the PSL, remember to upgrade the Guava dependency periodically so the list stays current.
What edge cases break domain extraction?
Even with a PSL library, a few situations deserve explicit handling:
- Deep subdomains. `a.b.c.example.com` still has the registrable domain `example.com`. The library collapses everything left of the registrable label into the subdomain — make sure your code reads the registrable-domain field, not a reassembled string.
- Multi-level ccTLDs. `example.com.au`, `example.co.jp`, and `example.org.uk` each have a two-label suffix. This is precisely where naive code fails.
- Private suffixes. Decide deliberately whether `user.blogspot.com` should be `user.blogspot.com` (private section honored) or `blogspot.com` (ICANN only). For multi-tenant security boundaries, honoring private suffixes is usually correct.
- Bare public suffixes. Inputs like `co.uk` or `github.io` have no registrable domain. Your code should return null or an explicit error, not a partial string.
- IP addresses and `localhost`. These are not domains at all. Validate before extraction.
- Internationalized domains (IDNs). Normalize to punycode (or consistently to Unicode) before matching, since the PSL stores entries in a canonical form.
- Trailing dots and uppercase. Lowercase the host and strip a fully-qualified trailing dot (`example.com.`) before lookup.
A robust extractor lowercases the host, strips ports and trailing dots, rejects IPs and `localhost`, and returns a clear null when no registrable domain exists.
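The normalization steps above can be sketched with the standard library alone. This is one possible arrangement, not a definitive implementation: `normalize_host` is a hypothetical helper, it treats any single-label name like `localhost` as having no registrable domain, and it relies on Python's built-in `idna` codec, which follows the older IDNA 2003 rules.

```python
import ipaddress
from typing import Optional
from urllib.parse import urlsplit

def normalize_host(value: str) -> Optional[str]:
    """Return a lowercase ASCII hostname ready for PSL lookup, or None."""
    value = value.strip()
    # urlsplit only recognises the host when a scheme or leading "//" is present.
    if "://" not in value and not value.startswith("//"):
        value = "//" + value
    try:
        host = urlsplit(value).hostname  # lowercased, port removed
    except ValueError:
        return None
    if not host:
        return None
    host = host.rstrip(".")  # drop a fully-qualified trailing dot
    if "." not in host:
        return None  # localhost and other single-label names
    try:
        ipaddress.ip_address(host)
        return None  # IP literal, not a domain name
    except ValueError:
        pass
    try:
        return host.encode("idna").decode("ascii")  # IDN -> punycode
    except UnicodeError:
        return None

print(normalize_host("https://Shop.Example.CO.UK:443/cart"))  # shop.example.co.uk
print(normalize_host("example.com."))                         # example.com
print(normalize_host("192.168.0.1"))                          # None
print(normalize_host("localhost:3000"))                       # None
```

The result of this function is what you would pass to a PSL library; a `None` here means there is nothing to look up.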
Run your domain-processing services on DarazHost
If you are building or hosting an application that parses, classifies, or routes by domain — an analytics ingester, a link scanner, an email-deliverability checker, or a multi-tenant platform that scopes data by registrable domain — you need an environment where you control the runtime and can refresh the Public Suffix List on your own schedule.
DarazHost Linux SSD VPS plans give you root access to run Python or Node.js parsing services exactly the way you want: install `tldextract`, schedule PSL updates with cron, and keep your dependencies pinned. With SSD-backed performance, 99.9% uptime, and 24/7 technical support, your domain-processing workloads stay fast and available. For smaller utilities and APIs, cPanel SSD Web Hosting offers a quick path to deployment, and every plan can be paired with a domain registration so your own apex domain is sorted too. Explore the plans to find the right fit.
Frequently asked questions
Is the registrable domain the same as the apex domain?
Yes. “Apex domain,” “root domain,” “naked domain,” and eTLD+1 all refer to the same thing: the registrable domain, which is the public suffix plus the one label registered beneath it, such as `example.co.uk`.
Why can’t I just hardcode a list of two-part TLDs?
Because the set of multi-label suffixes changes constantly. New gTLDs launch, ccTLD operators add structures, and private platforms register new suffixes. A hardcoded list goes stale and silently misclassifies hosts. The Public Suffix List is maintained for exactly this reason.
Should I honor the PSL private section?
It depends on your goal. For security boundaries and cookie-like scoping, honoring the private section (so each `*.github.io` is distinct) is usually correct. For registrar-level grouping, you may prefer ICANN-only suffixes. Most libraries expose a flag to switch between the two.
How often should I update the Public Suffix List?
For production systems, refresh it on a regular cadence; many teams update weekly or monthly. Pin a known-good version in your build, and automate refreshes so you do not silently run a list that is months out of date.
What does a PSL library return for an input that has no registrable domain?
For a bare public suffix like `co.uk`, or for an IP address or `localhost`, there is no eTLD+1. Well-behaved libraries return null, an empty value, or raise an explicit error. Always handle that case rather than assuming a value is present.