eCommerce Sitemap: XML and HTML Sitemaps for Online Stores

An eCommerce sitemap is a structured file that lists the important URLs on your online store so search engines can discover, crawl, and index them efficiently. While any website benefits from a sitemap, online stores have a particularly strong case: they tend to carry large catalogs, deeply nested product and category pages, frequently changing inventory, and thousands of URLs that no single navigation menu can fully expose. Without a well-built sitemap, your newest products and deepest category pages may sit undiscovered for weeks.

This guide explains what eCommerce sitemaps are, how XML and HTML sitemaps differ, how to structure them for large catalogs, and how to keep them clean so that search engines spend their crawl budget on the pages that actually earn revenue.

Key Takeaways
• An XML sitemap is built for search engines; an HTML sitemap is built for human visitors.
• Online stores need sitemaps more than most sites because of large catalogs and deep, hard-to-reach product pages.
• XML sitemaps have hard limits (50,000 URLs / 50MB uncompressed), so large stores split them and reference each file from a sitemap index.
• Include products and categories; exclude cart, checkout, and faceted/filtered duplicate URLs.
• Sitemap hygiene, not just having a sitemap, is what protects crawl budget on enterprise-scale stores.

What is an eCommerce sitemap and why do online stores need one?

A sitemap is a file that tells search engines which URLs on your store you consider worth crawling. For a small brochure site, this is a convenience. For an online store, it is closer to a necessity.

Consider the structure of a typical catalog. A store might have a handful of top-level categories, dozens of subcategories beneath them, and thousands of individual product pages several clicks deep. Search engine crawlers discover pages primarily by following links, and the deeper a product sits, the less likely a crawler is to reach it organically before exhausting its allotted crawl budget for your domain. A sitemap shortcuts this process by handing the crawler a direct list of URLs.

Online stores also change constantly. New products launch, seasonal lines retire, prices update, and stock levels shift. A maintained sitemap signals these changes, helping search engines re-crawl updated pages and discover fresh inventory faster than link-following alone would allow.

What is the difference between XML and HTML sitemaps?

The two sitemap types serve different audiences. An XML sitemap is a machine-readable file submitted to search engines, while an HTML sitemap is a human-readable page linked within your store. Most online stores benefit from having both.

| Aspect | XML Sitemap | HTML Sitemap |
| --- | --- | --- |
| Primary audience | Search engine crawlers | Human visitors |
| Format | Structured XML markup | A regular web page with links |
| Location | Usually `/sitemap.xml`, submitted to Search Console | A linked page, often in the footer |
| Contains | URLs plus optional metadata (last modified, etc.) | Categorized, clickable links |
| Main benefit | Faster, more complete discovery and indexing | Navigation aid and internal link distribution |
| Size limits | 50,000 URLs / 50MB per file | Practical UX limits only |

In short, the XML sitemap accelerates indexing, and the HTML sitemap improves user navigation while passing internal link signals to deep pages.

How is an XML sitemap structured for an online store?

An XML sitemap is a list of `<url>` entries wrapped in a `<urlset>` element. Each entry contains a `<loc>` (the URL) and may include optional fields such as `<lastmod>` to indicate when the page last changed. For online stores, the `<lastmod>` value is especially useful because it helps search engines prioritize re-crawling recently updated product pages.

A minimal entry looks like this:

```xml
<url>
  <loc>https://www.example-store.com/products/wireless-headphones</loc>
  <lastmod>2026-06-20</lastmod>
</url>
```

Keep entries to canonical, indexable URLs only. Pointing a sitemap at redirects, error pages, or non-canonical variants sends mixed signals and wastes the crawler’s time.
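For a custom or headless build, entries like the one above can be generated from catalog data. The following is a minimal sketch using Python's standard library; the function name `build_urlset` and the `(url, last_modified)` input shape are assumptions made for illustration, not part of any platform's API.

```python
import xml.etree.ElementTree as ET
from datetime import date

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"

def build_urlset(pages):
    """Return sitemap XML for an iterable of (url, last_modified) pairs.

    `last_modified` is a datetime.date, or None when the date is unknown.
    """
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for url, last_modified in pages:
        entry = ET.SubElement(urlset, "url")
        ET.SubElement(entry, "loc").text = url
        if last_modified is not None:
            # <lastmod> accepts a plain YYYY-MM-DD date
            ET.SubElement(entry, "lastmod").text = last_modified.isoformat()
    body = ET.tostring(urlset, encoding="unicode")
    return '<?xml version="1.0" encoding="UTF-8"?>\n' + body

# Hypothetical catalog rows; only canonical, indexable URLs should be passed in.
xml_text = build_urlset([
    ("https://www.example-store.com/products/wireless-headphones", date(2026, 6, 20)),
    ("https://www.example-store.com/category/audio", None),
])
```

ElementTree escapes characters such as `&` in URLs automatically, which is one reason to build the file with an XML library rather than string concatenation.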

How do sitemap index files work for large stores?

XML sitemaps have hard limits: a single file may contain no more than 50,000 URLs and must not exceed 50MB uncompressed. A large catalog will blow past these limits quickly, so the solution is to split the catalog across multiple sitemap files and reference all of them from a single sitemap index file.

A sitemap index is itself an XML file, but instead of listing pages it lists other sitemaps:

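```xml
<?xml version="1.0" encoding="UTF-8"?>
<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <sitemap>
    <loc>https://www.example-store.com/sitemap-products-1.xml</loc>
  </sitemap>
  <sitemap>
    <loc>https://www.example-store.com/sitemap-products-2.xml</loc>
  </sitemap>
  <sitemap>
    <loc>https://www.example-store.com/sitemap-categories.xml</loc>
  </sitemap>
</sitemapindex>
```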

A common practice is to organize child sitemaps by type, for example one or more product sitemaps, a category sitemap, and an image sitemap. This makes it easier to diagnose indexing issues in Search Console, because coverage reports can be read per sitemap rather than for one undifferentiated blob.
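The splitting step can be sketched in a few lines of Python. This is an illustrative outline only: the helper names and the `sitemap-products-N.xml` naming pattern are assumptions, and the sketch enforces the 50,000-URL limit but does not measure file size, so a real generator would also need to check the 50MB ceiling.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
MAX_URLS_PER_FILE = 50_000  # per-file URL limit

def chunk_urls(urls, size=MAX_URLS_PER_FILE):
    """Split a list of URLs into consecutive chunks of at most `size` items."""
    return [urls[i:i + size] for i in range(0, len(urls), size)]

def child_sitemap_urls(base_url, prefix, chunk_count):
    """Name one child sitemap per chunk, numbered from 1 (hypothetical scheme)."""
    return [f"{base_url}/{prefix}-{n}.xml" for n in range(1, chunk_count + 1)]

def build_sitemap_index(sitemap_urls):
    """Return sitemap index XML that lists each child sitemap."""
    index = ET.Element("sitemapindex", xmlns=SITEMAP_NS)
    for sitemap_url in sitemap_urls:
        child = ET.SubElement(index, "sitemap")
        ET.SubElement(child, "loc").text = sitemap_url
    body = ET.tostring(index, encoding="unicode")
    return '<?xml version="1.0" encoding="UTF-8"?>\n' + body
```

A catalog of 120,001 product URLs, for example, would produce three chunks (50,000, 50,000, and 20,001 URLs) and therefore three child files in the index.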

What should an eCommerce sitemap include and exclude?

The guiding principle is simple: a sitemap should list the pages you want indexed and nothing else. Including junk URLs dilutes the signal and burns crawl budget.

Include:

  • Product pages that are live and indexable
  • Category and subcategory pages that you want ranking
  • Important content pages such as buying guides, brand pages, and your blog
  • Image and video URLs associated with products, where supported

Exclude:

  • Cart and checkout pages, which serve no search purpose
  • Account, login, and order-confirmation pages
  • Faceted and filtered URLs that generate near-duplicate content
  • Internal search result pages
  • Non-canonical URL variants and tracking-parameter URLs
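The include and exclude rules above can be expressed as a simple URL filter. The sketch below is hedged: the path prefixes and query parameter names are hypothetical and would need to match your own store's URL scheme, and it does not check canonical tags or HTTP status, which a production pipeline should also verify.

```python
from urllib.parse import urlsplit, parse_qs

# Hypothetical path prefixes for pages that should never be listed.
EXCLUDED_PATH_PREFIXES = (
    "/cart", "/checkout", "/account", "/login", "/order-confirmation", "/search",
)
# Hypothetical facet and tracking parameters that mark non-canonical variants.
EXCLUDED_QUERY_PARAMS = {
    "color", "size", "price", "brand", "sort",
    "utm_source", "utm_medium", "utm_campaign",
}

def belongs_in_sitemap(url):
    """Return True if the URL passes the exclusion rules."""
    parts = urlsplit(url)
    path = parts.path.lower()
    # Match the prefix exactly or as a parent folder, so /cartridges is not
    # mistaken for /cart.
    for prefix in EXCLUDED_PATH_PREFIXES:
        if path == prefix or path.startswith(prefix + "/"):
            return False
    params = parse_qs(parts.query, keep_blank_values=True)
    if any(name.lower() in EXCLUDED_QUERY_PARAMS for name in params):
        return False
    return True

candidates = [
    "https://www.example-store.com/products/wireless-headphones",
    "https://www.example-store.com/cart",
    "https://www.example-store.com/category/shoes?color=red&size=9",
    "https://www.example-store.com/products/wireless-headphones?utm_source=newsletter",
]
clean = [u for u in candidates if belongs_in_sitemap(u)]
```

Running a filter like this as the last step before writing the file keeps exclusion logic in one place, which makes it easier to update when new filters are added to the storefront.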

How should faceted navigation and out-of-stock products be handled?

Faceted navigation, the filters that let shoppers narrow results by color, size, price, and brand, is one of the biggest sources of crawl waste on large stores. Each combination of filters can generate a unique URL, and a catalog with several filter dimensions can spawn an effectively infinite number of low-value, near-duplicate pages. These should be kept out of the sitemap, and ideally controlled further with canonical tags, `noindex` directives, or crawl rules so that search engines are not lured into crawling them.
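A quick calculation shows how fast filter combinations multiply. The facet counts below are made up for illustration, and the sketch assumes single-select filters (one value or none per facet); multi-select filters would grow far faster.

```python
from math import prod

def filtered_url_count(facet_sizes):
    """Count filtered URL variants for one category page.

    Assumes each facet is single-select: a shopper picks one value or leaves
    the facet unset, so a facet with n values contributes n + 1 states.
    """
    combinations = prod(n + 1 for n in facet_sizes.values())
    return combinations - 1  # subtract the unfiltered page itself

# Hypothetical facet sizes for a single category.
facets = {"color": 12, "size": 8, "price": 5, "brand": 40}
per_category = filtered_url_count(facets)
print(per_category)        # 28781
print(per_category * 200)  # 5756200 across 200 hypothetical categories
```

Four modest filters turn one category page into tens of thousands of crawlable variants, which is why these URLs are handled by rule rather than reviewed one by one.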

Out-of-stock products require a judgment call. If an item will return, keeping its URL live and in the sitemap preserves accumulated ranking signals. If it is permanently gone, the cleaner path is usually a redirect to a relevant category or replacement product, in which case the discontinued URL should leave the sitemap. The worst outcome is a sitemap full of dead-end pages, because it teaches crawlers that your sitemap is unreliable.

Here is what separates stores that index well from stores that struggle: for a large catalog, sitemap hygiene matters more than simply having a sitemap. Once you cross into tens of thousands of URLs, search engines will not crawl everything on every visit. They allocate a finite crawl budget, and every faceted duplicate, every parameterized variant, and every dead out-of-stock page you leave in your sitemap is a crawl spent on a page that earns nothing. A lean, accurate sitemap is effectively a set of instructions telling the crawler, “spend your limited attention here.” The competitive advantage at scale is not the sitemap’s existence but its discipline.

Do online stores need image and video sitemaps?

Product imagery is a meaningful traffic source through image search, and image sitemaps help search engines associate your product photos with their pages. Image information can be embedded directly within your product sitemap entries using image-specific tags, or maintained as a dedicated image sitemap. For stores that publish product demos, unboxings, or how-to videos, a video sitemap can similarly surface that content in video search results with metadata such as title, description, and duration.

These are optional layers, but for visually driven categories such as fashion, furniture, and home goods, they can unlock discovery channels that a text-only sitemap would never reach.
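Embedding image information in a product sitemap might look like the following sketch. It uses the image sitemap namespace `http://www.google.com/schemas/sitemap-image/1.1` with `image:image` and `image:loc` tags; the function name and input shape are assumptions for illustration, and the prefixed tag names are written literally to keep the example short.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
IMAGE_NS = "http://www.google.com/schemas/sitemap-image/1.1"

def build_image_urlset(products):
    """Return sitemap XML for (page_url, [image_url, ...]) pairs."""
    # Both namespaces are declared once on the root element.
    urlset = ET.Element("urlset", {"xmlns": SITEMAP_NS, "xmlns:image": IMAGE_NS})
    for page_url, image_urls in products:
        entry = ET.SubElement(urlset, "url")
        ET.SubElement(entry, "loc").text = page_url
        for image_url in image_urls:
            image = ET.SubElement(entry, "image:image")
            ET.SubElement(image, "image:loc").text = image_url
    body = ET.tostring(urlset, encoding="unicode")
    return '<?xml version="1.0" encoding="UTF-8"?>\n' + body

# Hypothetical product with two photos.
image_xml = build_image_urlset([
    (
        "https://www.example-store.com/products/wireless-headphones",
        [
            "https://www.example-store.com/images/wireless-headphones-front.jpg",
            "https://www.example-store.com/images/wireless-headphones-side.jpg",
        ],
    ),
])
```

Each product page keeps a single `<url>` entry, with one `image:image` child per photo, so the images stay associated with the page they appear on.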

What about the HTML sitemap for users?

An HTML sitemap is a single page, usually linked from the footer, that organizes your store’s main categories and key pages into a clean, clickable hierarchy. Its first job is user experience: a visitor who cannot find a section through the main navigation can use it as a directory.

Its quieter job is internal linking. An HTML sitemap creates crawlable links to category pages and important sections, distributing internal link signals and giving crawlers another discovery path. For very large catalogs you would not list every product here (that belongs in the XML sitemap), but a well-structured HTML sitemap covering categories and key landing pages complements the XML layer nicely.

How do you generate and submit an eCommerce sitemap?

Most modern platforms handle sitemap generation automatically. WooCommerce stores typically rely on an SEO plugin to produce and update XML sitemaps, while Magento includes built-in sitemap generation with configurable settings for products, categories, and images. Hosted platforms generally expose a sitemap at a predictable URL without any setup.

When a platform’s native output is not enough, options include dedicated SEO plugins, standalone sitemap generators, and custom scripts for headless or bespoke builds. Whichever route you choose, the priorities are the same: the sitemap should update automatically as inventory changes, respect the 50,000-URL and 50MB limits with proper splitting, and exclude the noise discussed above.

To get your sitemap working in search:

  1. Locate or generate your sitemap (often at `/sitemap.xml`).
  2. Reference it in robots.txt with a `Sitemap:` directive so crawlers can find it.
  3. Submit it to Google Search Console under the Sitemaps report, and to other webmaster tools you use.
  4. Monitor coverage in Search Console to catch indexing errors, excluded URLs, and discovered-not-indexed pages.
  5. Keep it fresh so new products appear and removed products drop off automatically.
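Step 2 is easy to verify programmatically. The sketch below uses Python's standard `urllib.robotparser` module (its `site_maps()` method requires Python 3.8 or later) to read the `Sitemap:` lines from robots.txt content; the sample file and the helper name `sitemaps_declared` are hypothetical.

```python
from urllib.robotparser import RobotFileParser

def sitemaps_declared(robots_txt):
    """Return the sitemap URLs declared in robots.txt content."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    # site_maps() returns None when no Sitemap: line is present.
    return parser.site_maps() or []

# Hypothetical robots.txt for an online store.
sample_robots = """\
User-agent: *
Disallow: /cart
Disallow: /checkout

Sitemap: https://www.example-store.com/sitemap.xml
"""

print(sitemaps_declared(sample_robots))
# ['https://www.example-store.com/sitemap.xml']
```

For a large store, the URL declared here would normally be the sitemap index, so that one line covers every child sitemap.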

Built for large catalogs: DarazHost eCommerce Hosting

Sitemaps tell search engines what to crawl, but your hosting determines how efficiently those crawls actually complete. When a crawler hits tens of thousands of product URLs, slow response times shrink the number of pages it fetches per visit, quietly throttling your indexing. DarazHost eCommerce hosting is built for exactly this scenario: SSD-backed storage and server-side caching keep response times low so crawlers and shoppers alike move through large catalogs quickly. Our infrastructure is scalable, absorbing both crawler traffic and real customer demand without slowing down, and every plan includes free SSL, performance tuning for crawl efficiency, and 24/7 expert support. If your store is outgrowing its current setup, fast and reliable hosting is the foundation that lets a clean sitemap do its job.


How do you keep an eCommerce sitemap updated?

A sitemap is not a one-time deliverable. Because inventory shifts daily, your sitemap should regenerate automatically whenever products are added, edited, or removed. Most platforms and SEO plugins handle this dynamically, but you should still verify it periodically.

Build a simple maintenance rhythm: confirm new products appear in the sitemap, confirm removed products drop off, watch the Search Console coverage report for spikes in errors or excluded URLs, and re-check your exclusion rules whenever you add new filters or facets to the storefront. A sitemap that drifts out of sync with the live catalog slowly loses the crawler’s trust, which is the opposite of what you want.
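The first two checks in that rhythm amount to comparing two sets of URLs. The following is a minimal sketch, assuming you can export the list of live, indexable URLs from your catalog; the function names and sample data are illustrative only.

```python
import xml.etree.ElementTree as ET

NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}

def sitemap_locs(xml_text):
    """Extract the set of <loc> values from a sitemap's XML text."""
    root = ET.fromstring(xml_text.encode("utf-8"))
    return {loc.text.strip() for loc in root.findall("sm:url/sm:loc", NS) if loc.text}

def sitemap_drift(xml_text, live_urls):
    """Compare a sitemap against the live catalog.

    Returns URLs that are live but missing from the sitemap, and URLs that
    are still listed but no longer live (stale).
    """
    listed = sitemap_locs(xml_text)
    live = set(live_urls)
    return {"missing": sorted(live - listed), "stale": sorted(listed - live)}

# Hypothetical sitemap and catalog export.
sample_sitemap = """<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://www.example-store.com/products/wireless-headphones</loc></url>
  <url><loc>https://www.example-store.com/products/discontinued-speaker</loc></url>
</urlset>"""
live_catalog = [
    "https://www.example-store.com/products/wireless-headphones",
    "https://www.example-store.com/products/new-earbuds",
]
report = sitemap_drift(sample_sitemap, live_catalog)
```

Here `report["missing"]` would contain the new earbuds page and `report["stale"]` the discontinued speaker. A scheduled job that flags non-empty results is one way to catch drift before it shows up in coverage reports.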

Frequently Asked Questions

Do I need both an XML and an HTML sitemap for my store? Most stores benefit from both. The XML sitemap drives indexing for search engines, while the HTML sitemap aids human navigation and distributes internal links. They serve different audiences and do not replace each other.

How many URLs can a single XML sitemap contain? A single XML sitemap file can hold up to 50,000 URLs and must stay under 50MB uncompressed. Larger catalogs split URLs across multiple files referenced by a sitemap index file.

Should out-of-stock products stay in my sitemap? If the product will return, keep its URL live and in the sitemap to preserve ranking signals. If it is permanently discontinued, redirect the URL to a relevant page and remove it from the sitemap rather than leaving a dead end.

Why should I exclude faceted or filtered URLs from my sitemap? Faceted navigation can generate enormous numbers of near-duplicate URLs. Including them wastes crawl budget on low-value pages and can dilute the ranking signals of your canonical category pages.

Where do I submit my eCommerce sitemap? Submit it through Google Search Console’s Sitemaps report and any other webmaster tools you use. Also reference the sitemap URL in your robots.txt file so crawlers can discover it automatically.

Conclusion

For online stores, a sitemap is not a formality; it is part of how products get found. An XML sitemap accelerates discovery and indexing, sitemap index files keep large catalogs within technical limits, and an HTML sitemap improves navigation while strengthening internal links. But the real differentiator at scale is hygiene: by excluding cart pages, faceted duplicates, and dead inventory, you guide your store’s crawl budget toward the pages that drive sales. Pair that discipline with fast, scalable hosting, and search engines can crawl, index, and rank your catalog the way it deserves.
