**Sitemap** is an XML file (typically at /sitemap.xml) listing the canonical URLs of a site, used by search engine crawlers to discover and prioritize content for indexing.
A sitemap follows the open sitemaps.org protocol, an XML format introduced by Google in 2005 and jointly adopted by Google, Yahoo, and Microsoft in 2006. The file lives at the site root by convention, though its actual location can be declared in robots.txt with a Sitemap: line or submitted directly to Google Search Console for explicit crawl signaling. Each entry can carry optional metadata (lastmod, priority, changefreq), but Google has stated publicly that it ignores priority and changefreq and uses lastmod only when the value is consistently accurate.
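As a rough illustration of the format described above, the sketch below builds a minimal sitemap with Python's standard library. The function name `build_sitemap`, the example.com URLs, and the dates are hypothetical; only `loc` and the optional `lastmod` are emitted, on the assumption that the other fields add little.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the sitemaps.org protocol.
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(entries):
    """Build a sitemap from (loc, lastmod) pairs; lastmod may be None.

    Returns the document as UTF-8 encoded bytes.
    """
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for loc, lastmod in entries:
        url = ET.SubElement(urlset, "url")
        ET.SubElement(url, "loc").text = loc
        if lastmod:
            # W3C date format, e.g. YYYY-MM-DD
            ET.SubElement(url, "lastmod").text = lastmod
    return ET.tostring(urlset, encoding="utf-8", xml_declaration=True)


if __name__ == "__main__":
    xml_bytes = build_sitemap([
        ("https://example.com/", "2024-01-15"),
        ("https://example.com/about", None),
    ])
    print(xml_bytes.decode("utf-8"))
```

To advertise the resulting file, robots.txt would carry a line such as `Sitemap: https://example.com/sitemap.xml` (the URL here is a placeholder).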
Each sitemap file is limited to 50,000 URLs and 50MB uncompressed. For sites that exceed either limit, the protocol supports sitemap index files that point to multiple child sitemaps, allowing very large libraries to be enumerated without any single file breaking the size limit. Specialized extensions exist for images, video, and news, each adding tags that surface media beyond the default URL entries.
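The splitting step can be sketched as follows. This is a minimal example, not a complete generator: it chunks only by URL count (it does not check the byte-size limit), and the names `plan_sitemaps`, `build_index`, the `sitemap-N.xml` file naming, and the example.com base URL are all assumptions made for illustration.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
MAX_URLS = 50_000  # per-file URL limit in the protocol


def chunk(urls, size=MAX_URLS):
    """Yield consecutive slices of at most `size` URLs."""
    for i in range(0, len(urls), size):
        yield urls[i:i + size]


def build_index(child_locs):
    """Build a sitemap index document listing the child sitemap URLs."""
    index = ET.Element("sitemapindex", xmlns=SITEMAP_NS)
    for loc in child_locs:
        sitemap = ET.SubElement(index, "sitemap")
        ET.SubElement(sitemap, "loc").text = loc
    return ET.tostring(index, encoding="utf-8", xml_declaration=True)


def plan_sitemaps(urls, base="https://example.com"):
    """Return (index_xml, {child_filename: list_of_urls})."""
    children = {
        f"sitemap-{n}.xml": part
        for n, part in enumerate(chunk(urls), start=1)
    }
    index_xml = build_index(f"{base}/{name}" for name in children)
    return index_xml, children


if __name__ == "__main__":
    urls = [f"https://example.com/p/{i}" for i in range(120_000)]
    index_xml, children = plan_sitemaps(urls)
    for name, part in children.items():
        print(name, len(part))
```

Each child list would then be written out as its own sitemap file, with the index submitted in place of a single sitemap.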
Sitemaps complement rather than replace the natural crawl. A site with thorough internal linking can be discovered without one, but a sitemap accelerates discovery of new or orphaned content and provides a clean canonical-URL signal in cases where the crawl might otherwise surface query-string variants or alternate paths. For content-loop work, the sitemap is the canonical declaration of "what URLs should exist on this site."
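One way to use the sitemap as that declaration is to compare it against the set of URLs a link-following crawl actually reached. The sketch below assumes the crawl results are already available as a set of strings; `sitemap_urls`, `diff_against_crawl`, and the sample URLs are hypothetical names chosen for this example.

```python
import xml.etree.ElementTree as ET

NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def sitemap_urls(xml_text):
    """Extract the set of <loc> values from a sitemap document."""
    root = ET.fromstring(xml_text)
    return {
        loc.text.strip()
        for loc in root.findall("sm:url/sm:loc", NS)
        if loc.text
    }


def diff_against_crawl(declared, crawled):
    """Compare declared sitemap URLs with URLs found by crawling links."""
    return {
        # Declared in the sitemap but never reached via internal links.
        "orphaned": declared - crawled,
        # Reached by the crawl but not declared, e.g. query-string variants.
        "undeclared": crawled - declared,
    }


if __name__ == "__main__":
    sample = (
        '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">'
        "<url><loc>https://example.com/</loc></url>"
        "<url><loc>https://example.com/orphan</loc></url>"
        "</urlset>"
    )
    crawled = {"https://example.com/", "https://example.com/?ref=nav"}
    print(diff_against_crawl(sitemap_urls(sample), crawled))
```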
