Files
mantle-ai-trader/skills/docx/references/docx-js-advanced.md
2026-06-06 05:21:10 +00:00

258 lines
8.2 KiB
Markdown
Executable File

# docx-js Advanced Features
Advanced API for complex document scenarios. Load this when creating documents with TOC, cover pages, footnotes, multi-section layouts, or post-processing needs.
## Table of Contents (TOC)
**→ See `references/toc.md` for the complete TOC reference** (3-step process, code examples, page numbering, common bugs, checklist).
## Cover Page Design (Vertical Centering)
Use large `spacing.before` to push content down for visual centering:
```js
// Approximate vertical center on A4:
// Total printable height ≈ 14000 twips
// For title at ~40% from top: before = 5600
const coverSection = {
properties: {
page: { /* standard A4 */ },
// No headers/footers on cover page
},
children: [
new Paragraph({ spacing: { before: 5600 } }), // spacer
new Paragraph({
alignment: AlignmentType.CENTER,
children: [new TextRun({
text: title,
font: { ascii: "Calibri", eastAsia: "SimHei" },
size: 52, bold: true, color: palette.primary,
})],
}),
// ... subtitle, author, date
],
};
```
For multi-section documents, put the cover in its own section so it can have different headers/footers.
## Footnotes
```js
const { FootnoteReferenceRun, Footnote } = require("docx");
const doc = new Document({
footnotes: {
1: { children: [new Paragraph({ children: [new TextRun({ text: "Smith, J. (2024). Research Methods. Academic Press, pp. 45-67.", size: 18 })] })] },
2: { children: [new Paragraph({ children: [new TextRun({ text: "Zhang, W. (2023). \u201c数据分析方法研究\u201d. 科学通报, 68(12), 1234-1250.", size: 18 })] })] },
},
sections: [{
children: [
new Paragraph({
children: [
new TextRun({ text: "According to recent studies" }),
new FootnoteReferenceRun(1), // superscript [1]
new TextRun({ text: ", data analysis methods have evolved" }),
new FootnoteReferenceRun(2), // superscript [2]
new TextRun({ text: "." }),
],
}),
],
}],
});
```
### Academic Reference Pattern
For sequential references [1][2][3]..., pre-define all footnotes in the `footnotes` object with numeric keys, then reference them inline with `FootnoteReferenceRun(n)`.
## keepNext — Element Binding
Prevent page breaks between related elements:
```js
// Heading stays with next paragraph
new Paragraph({
heading: HeadingLevel.HEADING_2,
keepNext: true, // don't break after this
children: [new TextRun({ text: "Table 1: Results" })],
})
// Table immediately follows on same page
// Caption stays with image
new Paragraph({
keepNext: true,
alignment: AlignmentType.CENTER,
children: [new TextRun({ text: "Figure 1: Architecture Diagram", italics: true, size: 20 })],
})
// ImageRun paragraph follows
```
Use `keepNext: true` for:
- Heading → first paragraph of section
- Table caption → table
- Image → image caption
- "Figure X" label → image
## Page Break Rules
Follow the document type strategy defined in SOUL.md Rule 1.
**Structural breaks (always):**
- Cover page → TOC
- TOC → main content
- Main content → back cover
**Content breaks (by document type):**
- Academic / teaching → `new Paragraph({ children: [new PageBreak()] })` before each H1 chapter
- Business report → PageBreak before each H1; H2 flows naturally
- Resume / contract / letter → No content page breaks
- Short article → No content page breaks
**Anti-tear (mandatory):**
```js
// Heading stays with next paragraph
new Paragraph({
heading: HeadingLevel.HEADING_1,
keepNext: true,
children: [new TextRun("Chapter Title")],
})
// Table caption stays with table
new Paragraph({
keepNext: true,
children: [new TextRun({ text: "Table 1: Summary", italics: true })],
})
// Image caption stays with image
new Paragraph({
keepNext: true,
children: [new TextRun({ text: "Figure 1: Architecture", italics: true })],
})
```
**Never:**
- PageBreak inside tables
- PageBreak as standalone element (must be inside Paragraph)
- PageBreak at the END of the last section (causes blank page)
```js
// Correct: page break between cover and TOC
new Paragraph({ children: [new PageBreak()] })
```
## Quotes Escaping in JS Strings
**⚠️⚠️⚠️ CRITICAL — #1 MOST COMMON BUG ⚠️⚠️⚠️**
Bare Chinese curly quotation marks (`""` `''`) in JS string literals **WILL break syntax and crash document generation**. This bug occurs most often in **Chinese body text** where curly quotes are used for emphasis, proper nouns, event names, or quoted speech — e.g., `"双11"`, `"前低后高"`, `"618"大促`. **Every single occurrence** of `""''` in text content MUST be Unicode-escaped. No exceptions.
**MANDATORY RULE: Before writing ANY `TextRun`, `para()`, or string containing Chinese text, scan the text for `""''` characters and replace ALL of them with `\u201c \u201d \u2018 \u2019`.**
| Character | Unicode | Escape method |
|-----------|---------|---------------|
| `"` `"` | `\u201c` `\u201d` | Unicode escape `\u201c` `\u201d` |
| `'` `'` | `\u2018` `\u2019` | Unicode escape `\u2018` `\u2019` |
| `"` | U+0022 | `\"` or wrap string in single quotes / template literal |
| `'` | U+0027 | `\'` or wrap string in double quotes / template literal |
```js
// ❌ WRONG — curly quotes in Chinese text break JS syntax (VERY COMMON MISTAKE)
content.push(para("2025年四个季度行业增速呈现"前低后高"的态势。在"618"大促、"双11""双12"活动拉动下增长显著。"));
new TextRun({ text: "他说"你好"" })
new TextRun({ text: 'It's a test' })
// ✅ CORRECT — ALL curly quotes replaced with Unicode escapes
content.push(para("2025年四个季度行业增速呈现\u201c前低后高\u201d的态势。在\u201c618\u201d大促、\u201c双11\u201d\u201c双12\u201d活动拉动下增长显著。"));
new TextRun({ text: "他说\u201c你好\u201d" })
new TextRun({ text: "It\u2019s a test" })
// ✅ CORRECT — straight quotes escaped or use alternate delimiters
new TextRun({ text: "He said \"hello\"" })
new TextRun({ text: 'He said "hello"' })
new TextRun({ text: `He said "hello"` })
```
## Multi-Section Documents
Different headers/footers per section:
```js
const doc = new Document({
sections: [
{
// Section 1: Cover — no header/footer
properties: { page: { /* ... */ } },
children: coverChildren,
},
{
// Section 2: Front matter — Roman page numbers
properties: {
type: SectionType.NEXT_PAGE,
page: {
/* size, margin... */
pageNumbers: { start: 1, formatType: NumberFormat.UPPER_ROMAN },
},
},
headers: { default: new Header({ children: [] }) },
footers: {
default: new Footer({
children: [new Paragraph({
alignment: AlignmentType.CENTER,
children: [new TextRun({ children: [PageNumber.CURRENT], size: 18 })],
})],
}),
},
children: tocAndAbstract,
},
{
// Section 3: Main content — Arabic page numbers
properties: {
type: SectionType.NEXT_PAGE,
page: {
/* size, margin... */
pageNumbers: { start: 1, formatType: NumberFormat.DECIMAL },
},
},
headers: {
default: new Header({
children: [new Paragraph({
alignment: AlignmentType.CENTER,
children: [new TextRun({ text: docTitle, size: 18, color: "888888" })],
})],
}),
},
footers: { default: footerWithPageNumbers },
children: mainContent,
},
],
});
```
## Converting DOCX to PDF
```bash
# Using LibreOffice (headless)
libreoffice --headless --convert-to pdf output.docx
# ⚠️ TOC Rule: If document has TOC, warn user that:
# 1. LibreOffice conversion may show empty TOC
# 2. User should open in Word first, update fields (Ctrl+A → F9), save, then convert
# 3. Or use Word's "Save as PDF" for best results
```
## Converting DOCX to Images
```bash
# Step 1: Convert to PDF
libreoffice --headless --convert-to pdf output.docx
# Step 2: Convert PDF to images
pdftoppm -png -r 200 output.pdf output_page
# This generates output_page-1.png, output_page-2.png, etc.
# Use -r 200 for good quality (200 DPI)
```
Useful for generating preview thumbnails or when user needs images instead of document files.