Next.js SEO索引与爬取预算优化

v20260602

nextjs-seo-indexing

本指南专门为Next.js应用设计，用于优化SEO索引和管理爬取预算。内容涵盖修复Google Search Console的覆盖率错误、正确设置规范化标签、审核sitemap，以及优化静态渲染和内部链接结构，确保网站最大化搜索可见性。

Next.js SEO 索引搜索控制台 Sitemap 规范化抓取预算网页开发

获取技能

474 次下载

概览

Next.js SEO Indexing & Crawl Budget Skill

Fix Google Search Console coverage issues, canonical problems, sitemap errors, and crawl budget waste in Next.js apps.

When to Use

Use when a Next.js site has Google Search Console coverage issues such as duplicate canonicals, accidental noindex, crawl waste, or discovered-but-not-indexed URLs.
Use when auditing sitemap, robots.txt, redirect, internal-linking, or static-rendering problems before an SEO release.
Use when you need framework-specific examples for Next.js App Router metadata, generateMetadata, robots.js, and sitemap routes.

Understanding Search Console Coverage States

Status	Meaning	Fix
Crawled – not indexed	Google crawled but chose not to index	Improve content quality + canonical + internal links
Duplicate without canonical	Multiple URLs serve same content, no canonical	Add explicit canonical to the preferred URL
Excluded by noindex	`noindex` tag present	Remove noindex if page should be indexed
Duplicate, Google chose different canonical	Google prefers a different URL than you specified	Align canonical with the URL Google naturally picks
Alternative page with proper canonical	Correct — non-preferred duplicate pointing to canonical	Expected behavior, not a problem
Not found 404	Page deleted or URL changed	Add redirect or restore page
Discovered – not indexed	Google knows it exists but hasn't crawled it	Improve internal linking + crawl budget
Page with redirect	Redirect chain or redirect to wrong target	Shorten redirect chain, verify destination

Step 1 — Canonical Audit

Next.js App Router (metadata export)

// app/blog/my-post/page.js
export const metadata = {
  title: 'My Post Title',
  alternates: {
    canonical: 'https://www.yourdomain.com/blog/my-post',
  },
};

Next.js App Router (generateMetadata)

export async function generateMetadata({ params }) {
  return {
    alternates: {
      canonical: `https://www.yourdomain.com/blog/${params.slug}`,
    },
  };
}

Common canonical mistakes to fix:

// ❌ WRONG — relative URL
canonical: '/blog/my-post'

// ❌ WRONG — missing trailing slash inconsistency  
// (pick one and stick with it sitewide)

// ✓ CORRECT — absolute URL, consistent scheme + subdomain
canonical: 'https://www.yourdomain.com/blog/my-post'

Step 2 — Noindex Audit

Find pages that are accidentally noindexed:

# Search for noindex in metadata
rg -n --glob '*.{js,ts,jsx,tsx}' 'noindex|robots.*noindex' app pages

# Check layout.js — a noindex here affects ALL pages
grep -n "robots" app/layout.js

In Next.js App Router, robots in the root layout applies globally. Only set it there if you want the whole site affected.

// app/layout.js — only set robots if you need sitewide control
export const metadata = {
  // ✓ Allow indexing
  robots: { index: true, follow: true },
  // ❌ This would noindex the entire site:
  // robots: { index: false }
};

Step 3 — Sitemap Health

Verify sitemap routes return 200 + valid XML

curl -sI https://www.yourdomain.com/sitemap.xml | grep -i "content-type\|status"
curl -s https://www.yourdomain.com/sitemap.xml | head -20

Next.js App Router sitemap (recommended pattern)

// app/sitemap.js
export default async function sitemap() {
  const baseUrl = 'https://www.yourdomain.com';
  
  // Static pages
  const staticPages = [
    { url: baseUrl, lastModified: new Date(), changeFrequency: 'daily', priority: 1.0 },
    { url: `${baseUrl}/about`, lastModified: new Date(), changeFrequency: 'monthly', priority: 0.8 },
  ];
  
  // Dynamic pages (fetch from DB or CMS)
  const posts = await getPosts(); // your data fetch
  const dynamicPages = posts.map(post => ({
    url: `${baseUrl}/blog/${post.slug}`,
    lastModified: new Date(post.updatedAt),
    changeFrequency: 'weekly',
    priority: 0.7,
  }));
  
  return [...staticPages, ...dynamicPages];
}

Multiple sitemaps (sitemap index)

// app/sitemap-tools/sitemap.js  
// app/sitemap-blog/sitemap.js
// Each returns an array of URL entries

Step 4 — Static Rendering Verification

Pages must be statically generated (or SSR with metadata in HTML) for Google to see SEO tags.

# Check build output — pages should show ● (static) not λ (dynamic)
npm run build 2>&1 | grep -E "○|●|λ|/blog|/tools"

○  /about             (static)
●  /blog/[slug]       (SSG)  ← good
λ  /api/data          (serverless) ← expected for APIs

If important pages are λ (fully dynamic with no static generation), add:

// app/blog/[slug]/page.js
export async function generateStaticParams() {
  const posts = await getPosts();
  return posts.map(post => ({ slug: post.slug }));
}

Step 5 — Internal Linking Audit

Pages with zero internal links are rarely indexed. Every important page should be reachable from:

Homepage or navigation
A sitemap
At least one other content page

# Find pages that have no inbound links from other pages
# (manual check — grep for the slug across all files)
grep -r "/blog/my-orphan-post" --include="*.{js,ts,jsx,tsx,md}" . | grep -v "sitemap\|the-page-itself"

Step 6 — Redirect Audit

# Find all redirects in Next.js config
grep -A 3 "redirects" next.config.js

# Check for redirect chains (A → B → C — should be A → C)
# Test a suspected chain:
curl -sI https://www.yourdomain.com/old-url | grep -i location

// next.config.js — keep redirects flat (no chains)
async redirects() {
  return [
    {
      source: '/old-url',
      destination: '/new-url', // Must NOT itself redirect
      permanent: true, // 308 for SEO
    },
  ];
}

Step 7 — robots.txt Check

curl -s https://www.yourdomain.com/robots.txt

# ✓ Good
User-agent: *
Allow: /
Sitemap: https://www.yourdomain.com/sitemap.xml

# ❌ Bad — disallows crawling of important content
Disallow: /blog/
Disallow: /tools/

// app/robots.js (Next.js App Router)
export default function robots() {
  return {
    rules: { userAgent: '*', allow: '/' },
    sitemap: 'https://www.yourdomain.com/sitemap.xml',
  };
}

Indexing Checklist

All important pages have absolute canonical URLs
No important pages accidentally noindexed
Sitemap routes return 200 with valid XML
Sitemap submitted to Google Search Console
Important pages statically generated (●) in build output
No redirect chains (A→B→C should be A→C)
robots.txt allows important content
Every important page has ≥1 internal inbound link
generateStaticParams added for dynamic routes with known slugs

Limitations

Does not guarantee Google will index a page; final indexing decisions remain with the search engine.
Requires access to the codebase, deployed URLs, and ideally Google Search Console data for confident diagnosis.
Treat recommendations that change URL structure, redirects, or canonical policy as production-impacting and review them before deployment.

信息

Category 编程开发

Name nextjs-seo-indexing

版本 v20260602

大小 7.57KB

Source sickn33/antigravity-awesome-skills

更新时间 2026-06-03