
Sitemap Analysis and Generation

v20260410
seo-sitemap
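Analyzes and generates standard XML sitemaps: validates format, URLs, status codes, and the robots.txt reference, splits files automatically by URL count, and produces structure documentation. Suited to site audits and site launches.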

Sitemap Analysis & Generation

When to Use

  • Use when analyzing an existing XML sitemap or generating a new one.
  • Use when the user mentions sitemap issues, sitemap generation, or sitemap validation.
  • Use when checking URL coverage, sitemap limits, and sitemap quality rules.

Mode 1: Analyze Existing Sitemap

Validation Checks

  • Valid XML format
  • URL count ≤ 50,000 per file and file size ≤ 50 MB uncompressed (protocol limits)
  • All URLs return HTTP 200
  • <lastmod> dates reflect real modification times (not all identical)
  • No reliance on <priority> or <changefreq>: both are still valid in the protocol but are ignored by Google
  • Sitemap referenced in robots.txt
  • Compare crawled pages vs sitemap; flag missing pages
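
Several of the checks above can be run offline against the sitemap text alone. The following is a minimal sketch using Python's standard library; the function names (`check_sitemap`, `missing_from_sitemap`) are hypothetical, the "Critical" severity for invalid XML is an assumption, and the HTTP 200 and robots.txt checks are left out because they need network access.

```python
import xml.etree.ElementTree as ET

NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}
MAX_URLS = 50_000  # per-file limit from the sitemap protocol


def sitemap_locs(root):
    """Collect the <loc> values of every <url> entry."""
    return [
        (url.findtext("sm:loc", default="", namespaces=NS) or "").strip()
        for url in root.findall("sm:url", NS)
    ]


def check_sitemap(xml_text):
    """Return (severity, message) findings for one <urlset> document."""
    try:
        root = ET.fromstring(xml_text)
    except ET.ParseError as exc:
        line, column = exc.position
        # Severity for invalid XML is an assumption; the source does not rank it.
        return [("Critical", f"Invalid XML at line {line}, column {column}")]

    findings = []
    urls = root.findall("sm:url", NS)
    if len(urls) > MAX_URLS:
        findings.append(("Critical", f"{len(urls)} URLs exceed the {MAX_URLS} limit"))

    lastmods = [u.findtext("sm:lastmod", namespaces=NS) for u in urls]
    if len(urls) > 1 and None not in lastmods and len(set(lastmods)) == 1:
        findings.append(("Low", "All <lastmod> values are identical"))

    if any(u.find("sm:priority", NS) is not None
           or u.find("sm:changefreq", NS) is not None for u in urls):
        findings.append(("Info", "<priority>/<changefreq> present (ignored by Google)"))
    return findings


def missing_from_sitemap(xml_text, crawled_urls):
    """Crawled URLs that the sitemap does not list, in sorted order."""
    listed = set(sitemap_locs(ET.fromstring(xml_text)))
    return sorted(set(crawled_urls) - listed)
```

The severities reuse the labels from the Common Issues table, so the findings can be copied into a report without remapping.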

Quality Signals

  • Sitemap index file if >50k URLs
  • Split by content type (pages, posts, images, videos)
  • No non-canonical URLs in sitemap
  • No noindexed URLs in sitemap
  • No redirected URLs in sitemap
  • HTTPS URLs only (no HTTP)
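
As an illustration of the per-URL signals above, here is a small sketch that flags a single crawled page. It assumes a crawler has already collected the status code, canonical target, and noindex state; the `PageInfo` record and its field names are hypothetical, not part of this skill.

```python
from dataclasses import dataclass
from typing import Optional
from urllib.parse import urlsplit


@dataclass
class PageInfo:
    """Hypothetical crawl record; field names are illustrative only."""
    url: str
    status: int
    canonical: Optional[str] = None  # target of rel="canonical", if any
    noindex: bool = False            # from meta robots or X-Robots-Tag


def quality_flags(page):
    """Return the reasons this URL should not appear in a sitemap."""
    flags = []
    if urlsplit(page.url).scheme != "https":
        flags.append("non-HTTPS URL")
    if 300 <= page.status < 400:
        flags.append("redirected URL")
    elif page.status != 200:
        flags.append("non-200 URL")
    if page.noindex:
        flags.append("noindexed URL")
    if page.canonical and page.canonical != page.url:
        flags.append("non-canonical URL")
    return flags
```

An empty list means the URL passes every signal checked here. The canonical comparison is a plain string match, so a real audit would probably normalize trailing slashes and casing first.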

Common Issues

Issue                      | Severity | Fix
>50k URLs in single file   | Critical | Split with sitemap index
Non-200 URLs               | High     | Remove or fix broken URLs
Noindexed URLs included    | High     | Remove from sitemap
Redirected URLs included   | Medium   | Update to final URLs
All identical lastmod      | Low      | Use actual modification dates
Priority/changefreq used   | Info     | Can remove (ignored by Google)

Mode 2: Generate New Sitemap

Process

  1. Ask for business type (or auto-detect from existing site)
  2. Load industry template from ../seo-plan/assets/ directory
  3. Interactive structure planning with user
  4. Apply quality gates:
    • ⚠️ WARNING at 30+ location pages (require 60%+ unique content)
    • 🛑 HARD STOP at 50+ location pages (require justification)
  5. Generate valid XML output
  6. Split at 50k URLs with sitemap index
  7. Generate STRUCTURE.md documentation
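
The quality gates in step 4 could be expressed as a small function. This sketch takes one literal reading of the thresholds: from 30 location pages the gate warns and passes only with at least 60% unique content, and from 50 it stops unless a justification is given. The function name, parameters, and return shape are assumptions made for illustration.

```python
WARNING_AT = 30      # location pages that trigger a warning
HARD_STOP_AT = 50    # location pages that trigger a hard stop
MIN_UNIQUE = 0.60    # required share of unique content at the warning level


def location_page_gate(location_pages, unique_ratio=None, justification=""):
    """Return (level, proceed) for a planned number of location pages.

    level is "ok", "warning", or "hard_stop"; proceed says whether the
    stated requirement for that level has been met.
    """
    if location_pages >= HARD_STOP_AT:
        return ("hard_stop", bool(justification.strip()))
    if location_pages >= WARNING_AT:
        meets_unique = unique_ratio is not None and unique_ratio >= MIN_UNIQUE
        return ("warning", meets_unique)
    return ("ok", True)
```

For example, `location_page_gate(35, unique_ratio=0.45)` returns `("warning", False)`, which signals that generation should pause until the pages carry more unique content.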

Safe Programmatic Pages (OK at scale)

✅ Integration pages (with real setup docs)
✅ Template/tool pages (with downloadable content)
✅ Glossary pages (200+ word definitions)
✅ Product pages (unique specs, reviews)
✅ User profile pages (user-generated content)

Penalty Risk (avoid at scale)

❌ Location pages with only city name swapped
❌ "Best [tool] for [industry]" without industry-specific value
❌ "[Competitor] alternative" without real comparison data
❌ AI-generated pages without human review and unique value

Sitemap Format

Standard Sitemap

<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <loc>https://example.com/page</loc>
    <lastmod>2026-02-07</lastmod>
  </url>
</urlset>

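A document in the standard format above can be produced with Python's `xml.etree.ElementTree`, which also escapes characters such as `&` in URLs. This is a minimal sketch; `build_urlset` is a hypothetical helper, and `ET.indent` requires Python 3.9 or later.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_urlset(entries):
    """Serialize (loc, lastmod) pairs as a <urlset>; lastmod may be None."""
    ET.register_namespace("", SITEMAP_NS)  # emit the default namespace, no prefix
    urlset = ET.Element(f"{{{SITEMAP_NS}}}urlset")
    for loc, lastmod in entries:
        url = ET.SubElement(urlset, f"{{{SITEMAP_NS}}}url")
        ET.SubElement(url, f"{{{SITEMAP_NS}}}loc").text = loc
        if lastmod:  # omit <lastmod> rather than invent a date
            ET.SubElement(url, f"{{{SITEMAP_NS}}}lastmod").text = lastmod
    ET.indent(urlset, space="  ")  # pretty-print; Python 3.9+
    return ET.tostring(urlset, encoding="utf-8", xml_declaration=True)


if __name__ == "__main__":
    xml_bytes = build_urlset([("https://example.com/page", "2026-02-07")])
    print(xml_bytes.decode("utf-8"))
```

Leaving out `<lastmod>` when no real modification date is known avoids the "all identical lastmod" issue listed under Common Issues.
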
Sitemap Index (for >50k URLs)

<?xml version="1.0" encoding="UTF-8"?>
<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <sitemap>
    <loc>https://example.com/sitemap-pages.xml</loc>
    <lastmod>2026-02-07</lastmod>
  </sitemap>
  <sitemap>
    <loc>https://example.com/sitemap-posts.xml</loc>
    <lastmod>2026-02-07</lastmod>
  </sitemap>
</sitemapindex>

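Splitting at the 50,000-URL limit and writing the matching index could look like the sketch below. The helper names and the `sitemap-N.xml` naming in the usage example are assumptions chosen for illustration; splitting by content type, as recommended under Quality Signals, would replace the plain chunking.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
MAX_URLS = 50_000  # per-file limit from the sitemap protocol


def chunk_urls(urls, size=MAX_URLS):
    """Split a URL list into consecutive chunks of at most `size` items."""
    urls = list(urls)
    return [urls[i:i + size] for i in range(0, len(urls), size)]


def build_index(sitemap_entries):
    """Serialize (loc, lastmod) pairs as a <sitemapindex>; lastmod may be None."""
    ET.register_namespace("", SITEMAP_NS)
    index = ET.Element(f"{{{SITEMAP_NS}}}sitemapindex")
    for loc, lastmod in sitemap_entries:
        sitemap = ET.SubElement(index, f"{{{SITEMAP_NS}}}sitemap")
        ET.SubElement(sitemap, f"{{{SITEMAP_NS}}}loc").text = loc
        if lastmod:
            ET.SubElement(sitemap, f"{{{SITEMAP_NS}}}lastmod").text = lastmod
    ET.indent(index, space="  ")  # Python 3.9+
    return ET.tostring(index, encoding="utf-8", xml_declaration=True)


if __name__ == "__main__":
    all_urls = [f"https://example.com/p{i}" for i in range(120_001)]
    chunks = chunk_urls(all_urls)
    # Hypothetical file naming: sitemap-1.xml, sitemap-2.xml, ...
    entries = [(f"https://example.com/sitemap-{n}.xml", "2026-02-07")
               for n in range(1, len(chunks) + 1)]
    print(build_index(entries).decode("utf-8"))
```
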
Error Handling

  • URL unreachable: Report the HTTP status code (or the connection error, if no response was received) and suggest checking whether the site is live
  • No sitemap found: Check common locations (/sitemap.xml, /sitemap_index.xml, robots.txt reference) before reporting "not found"
  • Invalid XML format: Report specific parsing errors with line numbers
  • Rate limiting detected: Back off and report partial results with a note about retry timing
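
The "no sitemap found" rule above implies an ordered list of places to look. This sketch builds that list from an already-fetched robots.txt body plus the two common paths; fetching is left out, and `sitemap_candidates` is a hypothetical helper name.

```python
from urllib.parse import urljoin

COMMON_PATHS = ("/sitemap.xml", "/sitemap_index.xml")


def sitemap_candidates(site_root, robots_txt=""):
    """Return candidate sitemap URLs: robots.txt references first, no duplicates."""
    candidates = []
    for line in robots_txt.splitlines():
        line = line.split("#", 1)[0].strip()  # drop trailing comments
        key, sep, value = line.partition(":")
        # The field name is matched case-insensitively.
        if sep and key.strip().lower() == "sitemap" and value.strip():
            candidates.append(value.strip())
    candidates += [urljoin(site_root, path) for path in COMMON_PATHS]
    return list(dict.fromkeys(candidates))  # de-duplicate, keep order
```

Only after every candidate has been tried and failed should the analysis report "not found".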

Output

For Analysis

  • VALIDATION-REPORT.md: analysis results
  • Issues list with severity
  • Recommendations

For Generation

  • sitemap.xml (or split files with index)
  • STRUCTURE.md: site architecture documentation
  • URL count and organization summary
Info
Category Programming & Development
Name seo-sitemap
Version v20260410
Size 4.06KB
Updated 2026-04-14