Files
2026-06-06 05:21:10 +00:00

11 KiB
Executable File

Table of Contents (TOC) — Complete Reference

This is the single source of truth for all TOC rules. Other files should reference this file instead of duplicating TOC instructions.

Overview

DOCX TOC is a 3-step process: Code → Post-process → User opens Word.

Step A: docx-js code generates empty TOC field structure
Step B: add_toc_placeholders.py fills it with visible placeholder entries
Step C: User opens Word → "Update Field" → real page numbers replace placeholders

All 3 steps are mandatory. Skipping any step results in a broken or empty TOC.

When to Add TOC

  • Recommended: Long or complex documents with many headings (reports, theses, papers, manuals)
  • Do NOT add: Resumes, contracts, letters, exam papers, short documents
  • postcheck rule: If document contains a "目录" title but no TableOfContents element → error

Step A: Code Generation (docx-js)

Insert 4 elements in sequence:

const { TableOfContents, Paragraph, TextRun, PageBreak, AlignmentType } = require("docx");

// 1. TOC title — ⛔ DO NOT use HeadingLevel (or TOC will index itself!)
new Paragraph({
  alignment: AlignmentType.CENTER,
  spacing: { before: 480, after: 360 },
  children: [new TextRun({
    text: "目  录",  // or "Table of Contents" for English docs
    bold: true, size: 32,
    font: { eastAsia: "SimHei", ascii: "Times New Roman" }
  })],
}),

// 2. TOC field element — ⚠️ first parameter is NOT displayed, it's internal name only
new TableOfContents("Table of Contents", {
  hyperlink: true,
  headingStyleRange: "1-3",  // match HeadingLevel range used in document
}),

// 3. ★ MANDATORY Refresh Hint — tells user how to update page numbers
new Paragraph({
  spacing: { before: 200 },
  children: [new TextRun({
    text: "Note: This Table of Contents is generated via field codes. To ensure page number accuracy after editing, please right-click the TOC and select \"Update Field.\"",
    italics: true, size: 18, color: "888888"
  })]
}),

// 4. ★ MANDATORY PageBreak after TOC — prevents TOC and body merging on same page
new Paragraph({ children: [new PageBreak()] }),

Heading Requirements

⚠️ CRITICAL: TOC only picks up paragraphs with heading: HeadingLevel.HEADING_X.

// ✅ Correct — Heading style, TOC can index
new Paragraph({
  heading: HeadingLevel.HEADING_1,
  children: [new TextRun({ text: "第一章 引言", bold: true, size: 32, color: c(P.primary) })]
})

// ❌ Wrong — manual bold + large font, TOC cannot detect
new Paragraph({
  children: [new TextRun({ text: "第一章 引言", bold: true, size: 32, color: c(P.primary) })]
})

Exceptions:

  • Cover title: does NOT need Heading style (should not appear in TOC)
  • "目录" title: MUST NOT use Heading style (prevents TOC from indexing itself)

Step B: Post-Processing Script

MUST run after generating the DOCX file:

python3 "$DOCX_SCRIPTS/add_toc_placeholders.py" output.docx --auto

What the script does

  1. Extracts Heading 1-3 from the document as TOC entries
  2. Fixes docx-js fldChar structure bug (begin+instrText+separate merged in one <w:r>)
  3. Patches settings.xml with updateFields=true (Word prompts to refresh on open)
  4. Ensures Heading styles have outlineLvl (required for TOC field update)
  5. Ensures TOC 1/2/3 styles exist in styles.xml
  6. Injects placeholder entries with HYPERLINK + PAGEREF between separate and end fldChars
  7. Handles duplicate heading texts (each gets its own bookmark)

Error handling

The script exits with code 1 if:

  • No TOC field structure found (missing TableOfContents element)
  • TOC field has begin but no separate fldChar (malformed structure)
  • Field structure exists but no TOC instrText detected

If exit code = 1 → the generated code is wrong. Fix the code and regenerate.

Options

# Auto mode (recommended — default behavior)
python3 "$DOCX_SCRIPTS/add_toc_placeholders.py" output.docx --auto

# Manual entries
python3 "$DOCX_SCRIPTS/add_toc_placeholders.py" output.docx \
  --entries '[{"level":1,"text":"Chapter 1","page":"1"},{"level":2,"text":"Section 1.1","page":"2"}]'

Step C: User Opens in Word/WPS

  • Word: Detects updateFields=true → prompts "Update field?" → click Yes → real page numbers
  • WPS: May NOT auto-prompt. User must: right-click TOC → "Update Field" → "Update entire table"

The placeholder entries ensure TOC is not blank even without updating — users see heading titles with approximate page numbers.

Multi-Section Page Numbering

When a document has a TOC, the TOC MUST be in its own section so that body page numbering starts from 1. This applies to all document types with a TOC (reports, whitepapers, PRDs, academic papers, etc.) — not just academic papers.

Mandatory 3-section architecture for documents with cover + TOC:

sections: [
  { /* Section 1: Cover — no page number, no footer */
    properties: {
      page: { size: pgSize, margin: pgMargin },
      // ⚠️ Do NOT set page.pageNumbers here — docx-js emits empty <pgNumType/> which confuses WPS
    },
  },
  { /* Section 2: Front matter (abstract, TOC) — Roman numerals */
    properties: {
      type: SectionType.NEXT_PAGE,
      page: {
        size: pgSize, margin: pgMargin,
        pageNumbers: { start: 1, formatType: NumberFormat.UPPER_ROMAN },  // I, II, III...
      },
    },
    footers: { default: pageNumFooter() },  // see footer rules below
    children: [/* abstract + TOC title + TableOfContents + PageBreak */]
  },
  { /* Section 3: Body — Arabic numerals starting from 1 */
    properties: {
      type: SectionType.NEXT_PAGE,
      page: {
        size: pgSize, margin: pgMargin,
        pageNumbers: { start: 1, formatType: NumberFormat.DECIMAL },  // 1, 2, 3...
      },
    },
    footers: { default: pageNumFooter() },
    children: [/* body content */]
  },
]

⚠️ Page Number API — Correct Nesting (CRITICAL)

Page number settings MUST be nested inside page.pageNumbers, NOT at properties top level:

// ❌ WRONG — docx-js ignores these, pgNumType will be empty
properties: {
  pageNumberStart: 1,
  pageNumberFormatType: NumberFormat.DECIMAL,
}

// ✅ CORRECT — docx-js writes start= and fmt= attributes
properties: {
  page: {
    pageNumbers: { start: 1, formatType: NumberFormat.DECIMAL },
  },
}

WPS may ignore pgNumType fmt in the section properties. To ensure correct display, the footer PAGE field MUST include an explicit format switch via post-processing:

After generating the docx, unzip and patch each footer XML:

  • Roman numeral footer: replace PAGE with PAGE \* ROMAN \\** MERGEFORMAT
  • Arabic numeral footer: replace PAGE \* arabic \* MERGEFORMAT

⚠️ NEVER use \* decimal in instrTextdecimal is a docx-js API enum value (NumberFormat.DECIMAL for pgNumType XML attribute), NOT a valid Word field format switch. Using it causes page numbers to render as "1decimal", "2decimal". The correct Word field switch for Arabic numerals is always \* arabic.

// Post-process footer XML:
footerXml = footerXml.replace(
  /(<w:instrText[^>]*>)\s*PAGE\s*(<\/w:instrText>)/g,
  '$1 PAGE \\* ROMAN \\** MERGEFORMAT $2'  // or "arabic" for body section
);

Also remove any empty <w:pgNumType/> from the cover section (docx-js emits these even when no pageNumbers is set):

docXml = docXml.replace(/<w:pgNumType\/>/g, "");

Page Numbering Rules

Section Content Format Start Footer
Cover Title page None No footer
Front matter Abstract, TOC Roman (I, II, III) 1 PAGE \* ROMAN
Body Main content Arabic (1, 2, 3) 1 PAGE \* arabic

⚠️ The body section MUST set pageNumbers: { start: 1 } — otherwise page numbers continue from the front matter pages, causing TOC page references to be offset. This is the #1 cause of "TOC page numbers are wrong".

Common Causes of Incorrect Page Numbers

Cause Fix
pageNumberStart at properties top level Move to page: { pageNumbers: { start: 1 } }
Cover section emits empty <pgNumType/> Post-process to remove it
Footer uses bare PAGE without format switch Post-process to add \* roman or \* arabic
Cover and body in same section Separate cover into its own section
Multiple sections without pageNumbers.start Explicitly set on each section needing independent counting
headingStyleRange doesn't match headings Ensure headingStyleRange: "1-3" covers all HeadingLevel values used
Cover section has header/footer Don't set header/footer on cover section

TOC Refresh Hint (MANDATORY)

⚠️ When the document contains a TOC, you MUST add the following hint paragraph between the TableOfContents element and the PageBreak (so it appears on the TOC page, not the body page). This ensures users know how to refresh page numbers after editing.

new Paragraph({
  spacing: { before: 200 },
  children: [new TextRun({
    text: "Note: This Table of Contents is generated via field codes. To ensure page number accuracy after editing, please right-click the TOC and select \"Update Field.\"",
    italics: true, size: 18, color: "888888"
  })]
}),

5 Common TOC Bugs

# Bug Symptom Fix
1 "目录" heading uses HeadingLevel.HEADING_1 TOC includes "目录" as an entry Remove heading: from TOC title paragraph
2 No PageBreak after TableOfContents TOC and body text on same page Add new Paragraph({ children: [new PageBreak()] }) after TOC
3 Missing TableOfContents element Script cannot inject placeholders, TOC is empty Always include new TableOfContents(...) in code
4 Headings use bold+large instead of HeadingLevel TOC is empty even after running script Change all body headings to heading: HeadingLevel.HEADING_X
5 Script not run or exit code ignored TOC page shows only title + blank space Always run script; if exit code = 1, fix code and regenerate

Checklist (for self-check during generation)

  • Document has 3+ H1 → TOC is included
  • "目录" heading does NOT use HeadingLevel (prevents self-indexing)
  • new TableOfContents(...) element present (not just plain text)
  • PageBreak exists after TOC element (prevents merging with body)
  • All body chapter headings use heading: HeadingLevel.HEADING_X
  • add_toc_placeholders.py --auto runs after generation
  • Script exit code checked — if 1, fix code and regenerate
  • TOC page has visible placeholder content (not empty)
  • TOC Refresh Hint present — italic gray note after TOC PageBreak telling user to right-click → "Update Field"
  • outlineLevel: 0 for H1, 1 for H2, etc. (needed for TOC field update)