Bulk Domain Lookup with Python: RDAP, WHOIS, DNS & Registrar APIs
If you have ever needed to check the status of a hundred, a thousand, or a hundred thousand domain names at once, you already know that manual lookups do not scale. Whether you are auditing a brand’s domain portfolio, hunting for available names, or enriching a dataset with registration metadata, the right approach is a programmatic bulk domain lookup built in Python.
This guide walks through the four main techniques for querying domains at scale, when to use each, and how to do it without getting your IP blocked. The short version: for automation, reach for RDAP or an official registrar API before you ever touch WHOIS scraping.
Key Takeaways
• RDAP is the modern, structured (JSON) replacement for WHOIS and is the best default for bulk lookups.
• WHOIS still works via `python-whois`, but its unstructured text and aggressive rate limits make it fragile at volume.
• A DNS-based check (does the domain have NS or A records?) is the fastest way to make a rough “is it taken” guess.
• A registrar’s official API is the only fully reliable source for true availability and pricing.
• Always respect rate limits, add backoff, cache results, and never scrape abusively.
What are the four ways to look up domains in bulk?
Before writing any code, it helps to understand the trade-offs. Each method answers a slightly different question and carries a different cost in reliability, speed, and politeness.
| Approach | Returns | Format | Speed at volume | Best for |
|---|---|---|---|---|
| RDAP | Registration status, registrar, dates, nameservers | Structured JSON | Good (rate-friendly) | Default for automation and metadata |
| WHOIS (`python-whois`) | Registration details | Unstructured text | Poor (strict limits, blocks) | One-off or low-volume legacy lookups |
| DNS (`dnspython`) | Whether NS/A records exist | Structured records | Excellent | Fast “is it taken” heuristic |
| Registrar API | True availability, pricing, registration | Structured JSON | Good (provider-dependent) | Buying decisions, accurate availability |
The key takeaway from this table: no single method does everything. A robust pipeline often combines a fast DNS pre-filter with an RDAP enrichment pass, falling back to a registrar API when you need a definitive availability answer.
Why is RDAP the best default for bulk domain lookups?
The Registration Data Access Protocol (RDAP) is the IETF-standardized successor to WHOIS. Instead of returning a wall of free-form text that varies by registry, RDAP returns clean, predictable JSON. That single difference makes it dramatically easier to parse, validate, and store, which is exactly what you want when processing thousands of domains.
You can query a registry’s RDAP endpoint directly, or use the bootstrap service at `rdap.org`, which redirects your request to the authoritative server for that TLD. The latter is convenient because you do not have to maintain a map of which registry handles which extension.
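If you would rather query registries directly, the map of TLDs to servers does not have to be hand-maintained: IANA publishes an RDAP bootstrap file for this purpose. The sketch below assumes that file's shape (a `services` list in which each entry pairs a list of TLDs with a list of base URLs) and that you have already downloaded and parsed it. The helper name `rdap_base_url` and the sample data are illustrative, and matching only on the last label is a simplification.

```python
def rdap_base_url(domain, bootstrap):
    """Return the RDAP base URL for a domain's TLD, or None if unlisted."""
    # Simplification: match on the last label only.
    tld = domain.rstrip(".").rsplit(".", 1)[-1].lower()
    for tlds, urls in bootstrap.get("services", []):
        if tld in tlds and urls:
            return urls[0]
    return None


# Made-up sample in the bootstrap file's shape; real data comes from IANA.
SAMPLE_BOOTSTRAP = {
    "services": [
        [["com", "net"], ["https://rdap.registry-a.example/"]],
        [["org"], ["https://rdap.registry-b.example/"]],
    ]
}

base = rdap_base_url("Example.COM", SAMPLE_BOOTSTRAP)
print(base + "domain/example.com" if base else "no RDAP server listed")
```

A `None` result is a useful signal in its own right: it marks the TLDs for which a WHOIS fallback may be needed.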
Here is a polite bulk RDAP loop that reads domains, queries each one, handles retries and rate limits, and writes results to CSV:
```python
import csv
import time

import requests

RDAP_BOOTSTRAP = "https://rdap.org/domain/"


def rdap_lookup(domain, max_retries=3):
    """Query RDAP for a single domain. Returns a dict of parsed fields."""
    for attempt in range(max_retries):
        try:
            resp = requests.get(
                RDAP_BOOTSTRAP + domain,
                timeout=10,
                headers={"User-Agent": "bulk-domain-checker/1.0"},
            )
        except requests.RequestException:
            time.sleep(2 ** attempt)  # exponential backoff
            continue

        if resp.status_code == 404:
            return {"domain": domain, "registered": False, "registrar": None}
        if resp.status_code == 429:
            # Retry-After may be seconds or an HTTP date; fall back to backoff
            retry_after = resp.headers.get("Retry-After", "")
            time.sleep(int(retry_after) if retry_after.isdigit() else 2 ** attempt)
            continue
        if resp.ok:
            data = resp.json()
            registrar = None
            for entity in data.get("entities", []):
                if "registrar" in entity.get("roles", []):
                    # vcardArray is ["vcard", [[name, params, type, value], ...]]
                    vcard = entity.get("vcardArray", [None, []])
                    for prop in vcard[1]:
                        if prop and prop[0] == "fn":
                            registrar = prop[3]
            events = {e.get("eventAction"): e.get("eventDate") for e in data.get("events", [])}
            return {
                "domain": domain,
                "registered": True,
                "registrar": registrar,
                "status": ", ".join(data.get("status", [])),
                "created": events.get("registration"),
                "expires": events.get("expiration"),
                "nameservers": ", ".join(
                    ns.get("ldhName", "") for ns in data.get("nameservers", [])
                ),
            }
        time.sleep(2 ** attempt)
    return {"domain": domain, "error": "lookup_failed"}


def bulk_rdap(domains, out_path="results.csv", delay=1.0):
    rows = []
    for domain in domains:
        rows.append(rdap_lookup(domain.strip()))
        time.sleep(delay)  # be polite between requests

    # Rows can have different keys (errors, 404s), so take the union
    fieldnames = sorted({k for r in rows for k in r})
    with open(out_path, "w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=fieldnames)
        writer.writeheader()
        writer.writerows(rows)


if __name__ == "__main__":
    with open("domains.txt") as f:
        bulk_rdap([line for line in f if line.strip()])
```
Notice the 429 handling and exponential backoff. These are not optional niceties; they are what keeps your script working past the first few hundred requests.
The insight that saves you the most pain: for bulk domain lookups, prefer RDAP (or a registrar’s official API) over scraping WHOIS. WHOIS at volume is unstructured, aggressively rate-limited, and a reliable way to get your IP blocked. Many people start with `python-whois` because it is the first result they find, hit a wall at a few hundred queries, and assume bulk lookups are simply hard. They are not. RDAP was designed for exactly this: clean JSON, predictable fields, and servers that expect automated clients. Switching protocols, not building a smarter scraper, is the real fix.
How do you do bulk WHOIS lookups in Python (and why be careful)?
WHOIS is the legacy protocol, and `python-whois` makes it easy to call. It still has its place for small, occasional lookups, but you should understand its limits before pointing it at a large list.
```python
import time

import whois  # pip install python-whois


def whois_lookup(domain):
    try:
        w = whois.whois(domain)
        return {
            "domain": domain,
            "registrar": w.registrar,
            "creation_date": str(w.creation_date),
            "expiration_date": str(w.expiration_date),
        }
    except Exception as e:
        return {"domain": domain, "error": str(e)}


domains = ["example.com", "example.org"]  # placeholder list; load your own
for d in domains:
    print(whois_lookup(d))
    time.sleep(3)  # WHOIS servers are unforgiving; go slow
```
The problems show up at scale. WHOIS output is unstructured free text that differs between registries, so parsing is brittle. WHOIS servers enforce strict, often undocumented rate limits, and crossing them gets your IP temporarily or permanently blocked. For these reasons, treat `python-whois` as a fallback, not a foundation.
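One small illustration of that brittleness: with `python-whois`, date fields are commonly reported to come back as a single `datetime`, a list of them, a raw string, or `None`, depending on the registry's output. The helper below is a minimal sketch for collapsing those shapes into one value before storing them. The name `first_date` is hypothetical, and picking the first non-empty entry is an assumption, not a rule from the library.

```python
from datetime import datetime


def first_date(value):
    """Collapse a WHOIS date field to one ISO string, or None."""
    if isinstance(value, (list, tuple)):
        # Assumption: the first non-empty entry is good enough.
        value = next((v for v in value if v is not None), None)
    if isinstance(value, datetime):
        return value.isoformat()
    return str(value) if value else None


print(first_date([datetime(2020, 1, 2), datetime(2020, 1, 3)]))
print(first_date(None))
```

Running every date field through a helper like this gives one consistent output column instead of a mix of stringified lists and timestamps.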
How can DNS give you a fast “is it registered” check?
If you only need a quick heuristic for whether a domain is taken, DNS resolution is by far the fastest method. A domain that has NS (nameserver) records is delegated and therefore registered. The `dnspython` library makes this a couple of lines:
```python
import dns.resolver  # pip install dnspython


def is_registered_via_dns(domain):
    try:
        dns.resolver.resolve(domain, "NS")
        return True  # has nameservers -> registered
    except dns.resolver.NXDOMAIN:
        return False  # does not exist -> likely available
    except (dns.resolver.NoAnswer, dns.resolver.NoNameservers):
        return True  # exists but no NS answer; treat as registered
    except Exception:
        return None  # inconclusive (timeout, etc.)
```
The caveat: DNS tells you whether a domain is delegated, not whether it is available to register. A registered domain can have no nameservers configured or be on hold, in which case it returns NXDOMAIN just like an unregistered name. Use DNS as a pre-filter to cheaply discard obviously taken domains, then confirm the interesting ones with RDAP or a registrar API.
Adding concurrency without being abusive
For large lists, sequential requests are slow. You can speed things up with a bounded thread pool, but keep the worker count modest and the delays in place so you stay polite:
```python
from concurrent.futures import ThreadPoolExecutor


def bulk_dns(domains, workers=8):
    # Results come back in the same order as the input list
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(is_registered_via_dns, domains))
```
Concurrency is a tool, not a license. Eight to sixteen workers against your own recursive resolver is reasonable; hundreds of parallel requests against a registry’s RDAP endpoint is abuse and will get you blocked.
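One way to keep the delays in place while still using several workers is to share a single rate limiter across the pool, so the total request rate stays fixed no matter how many threads are running. The sketch below is one possible shape for that. `RateLimiter` and `throttled_map` are illustrative names, not part of any library, and the default interval is an arbitrary assumption to tune against the service you are calling.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor


class RateLimiter:
    """Allow at most one call per `interval` seconds across all threads."""

    def __init__(self, interval):
        self.interval = interval
        self._lock = threading.Lock()
        self._next_slot = time.monotonic()

    def wait(self):
        # Reserve the next free time slot under the lock, then sleep outside it.
        with self._lock:
            now = time.monotonic()
            slot = max(self._next_slot, now)
            self._next_slot = slot + self.interval
        time.sleep(max(0.0, slot - now))


def throttled_map(func, items, workers=4, interval=0.5):
    """Map func over items in a thread pool, spacing call starts by interval."""
    limiter = RateLimiter(interval)

    def task(item):
        limiter.wait()
        return func(item)

    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(task, items))
```

With this in place the worker count only hides latency; the interval, not the pool size, decides how hard you hit the remote server.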
When should you use a registrar’s official API?
DNS and RDAP tell you whether a domain is *registered*. They do not reliably tell you whether it is *available to buy right now*, or what it costs. For that, you need a registrar’s official API.
Most registrars expose authenticated endpoints for domain availability checks, pricing, and even registration, returning structured JSON. This is the most reliable source for purchasing decisions because the registrar is the system of record. The trade-off is that each registrar’s API is different, so your integration is provider-specific and usually requires an API key and account.
```python
import requests


def check_availability(domain, api_key):
    # Placeholder endpoint; every registrar's URL, auth, and response differ
    resp = requests.get(
        "https://api.example-registrar.com/v1/domains/available",
        params={"domain": domain},
        headers={"Authorization": f"Bearer {api_key}"},
        timeout=10,
    )
    resp.raise_for_status()
    return resp.json()  # e.g. {"domain": …, "available": true, "price": …}
```
A practical pattern: use DNS or RDAP to narrow a huge list down to candidates that look available, then verify only those candidates through the registrar API to confirm availability and pricing. This minimizes calls to the rate-limited, sometimes metered registrar endpoint.
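That funnel can be sketched as a single function that takes the three checks as arguments. The name `funnel` and the result shape are assumptions for illustration. The checks are expected to behave like the helpers in this guide: a DNS check returning `True`, `False`, or `None`, an RDAP lookup returning a dict with a `registered` key, and a registrar call returning a dict with an `available` key. Stub checks are used here so the example runs offline.

```python
def funnel(domains, dns_check, rdap_check, registrar_check):
    """Run cheap checks first; call the registrar only for survivors."""
    results = {}
    for domain in domains:
        if dns_check(domain) is True:
            results[domain] = {"available": False, "source": "dns"}
            continue
        if rdap_check(domain).get("registered"):
            results[domain] = {"available": False, "source": "rdap"}
            continue
        # Only names that look free so far reach the metered endpoint.
        answer = registrar_check(domain)
        results[domain] = {
            "available": bool(answer.get("available")),
            "source": "registrar",
        }
    return results


# Stub checks with made-up data; swap in real lookups.
registrar_calls = []


def stub_registrar(domain):
    registrar_calls.append(domain)
    return {"available": True}


demo = funnel(
    ["taken.example", "held.example", "free.example"],
    dns_check=lambda d: d == "taken.example",
    rdap_check=lambda d: {"registered": d == "held.example"},
    registrar_check=stub_registrar,
)
print(demo["free.example"], registrar_calls)
```

In the stubbed run only one of the three names reaches the registrar, which is the point of the pattern.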
What are the best practices for bulk domain lookups?
Whichever method you choose, a few habits separate a script that works once from one that runs reliably for years:
- Prefer RDAP or official APIs over scraping. Structured data is easier to parse and the servers expect automated clients.
- Respect rate limits. Add delays between requests and honor `Retry-After` and HTTP 429 responses.
- Use exponential backoff on transient failures instead of hammering a failing endpoint.
- Cache results. Domain registration data changes slowly; do not re-query the same domain repeatedly within a short window.
- Keep concurrency modest and point heavy DNS workloads at a resolver you control.
- Log and checkpoint. For large lists, write results incrementally so a crash does not cost you the whole run.
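The caching and checkpointing habits can share one mechanism: an append-only JSON Lines file that doubles as a cache. The sketch below is one possible shape, not a prescribed design. The function names, the `checked_at` field, and the one-day default for `max_age` are all assumptions for illustration. `lookup` stands for any per-domain function that returns a dict, such as an RDAP lookup.

```python
import json
import os
import time


def load_checkpoint(path, max_age=86400):
    """Return {domain: row} for rows newer than max_age seconds."""
    fresh = {}
    if not os.path.exists(path):
        return fresh
    cutoff = time.time() - max_age
    with open(path, encoding="utf-8") as f:
        for line in f:
            try:
                row = json.loads(line)
            except json.JSONDecodeError:
                continue  # blank or half-written line from a crash
            if row.get("checked_at", 0) >= cutoff:
                fresh[row["domain"]] = row
    return fresh


def run_with_checkpoint(domains, lookup, path, max_age=86400):
    """Look up only domains without a fresh cached row; append as we go."""
    cache = load_checkpoint(path, max_age)
    with open(path, "a", encoding="utf-8") as f:
        for domain in domains:
            if domain in cache:
                continue
            row = dict(lookup(domain), domain=domain, checked_at=time.time())
            f.write(json.dumps(row) + "\n")
            f.flush()  # push each row out so a crash loses little
            cache[domain] = row
    return cache
```

Re-running the same command after a crash then resumes where the previous run stopped, and a scheduled re-check skips anything looked up within `max_age`.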
Run your domain scripts on infrastructure you control
Bulk lookups, scheduled re-checks, and availability monitors all run best on a stable server rather than a laptop that sleeps. A DarazHost VPS gives you root access and a clean Python environment to install `requests`, `dnspython`, and `python-whois`, set up cron jobs for recurring scans, and run long jobs without the timeouts or IP restrictions you would hit on shared hosting. When your script finally surfaces the names you want, you can register them through DarazHost domain registration and keep everything (scripts, scheduling, and DNS) under one roof, backed by 24/7 support.
Frequently Asked Questions
Is RDAP a full replacement for WHOIS? For most lookups, yes. RDAP is the IETF-standardized successor and returns structured JSON instead of free-form text. Generic TLDs are required by ICANN to offer it, but country-code TLDs are not, and a number of them still answer only over WHOIS, so keep a WHOIS fallback for those cases.
Will bulk domain lookups get my IP blocked? They can if you ignore rate limits. Scraping WHOIS at volume is the fastest way to get blocked. Using RDAP or official APIs, adding delays and exponential backoff, and keeping concurrency modest keeps you within acceptable use.
Can I check thousands of domains for availability for free? You can check whether thousands of domains are *registered* for free using RDAP or DNS. Confirming true *availability and pricing* generally requires a registrar’s API, which may be free with an account but is often rate-limited or metered.
Which Python library should I start with? Use `requests` for RDAP and registrar APIs, `dnspython` for DNS checks, and `python-whois` only as a low-volume fallback. RDAP via `requests` is the best foundation for a bulk pipeline.
How often should I re-check domain data? Registration data changes slowly, so cache results and re-check on a schedule that matches your need, often daily or weekly. For drop-catching or time-sensitive availability monitoring, you will check more frequently, which makes respecting rate limits even more important.