258 lines
8.2 KiB
Markdown
Executable File
258 lines
8.2 KiB
Markdown
Executable File
# docx-js Advanced Features
|
|
|
|
Advanced API for complex document scenarios. Load this when creating documents with TOC, cover pages, footnotes, multi-section layouts, or post-processing needs.
|
|
|
|
## Table of Contents (TOC)
|
|
|
|
**→ See `references/toc.md` for the complete TOC reference** (3-step process, code examples, page numbering, common bugs, checklist).
|
|
|
|
## Cover Page Design (Vertical Centering)
|
|
|
|
Use large `spacing.before` to push content down for visual centering:
|
|
|
|
```js
|
|
// Approximate vertical center on A4:
|
|
// Total printable height ≈ 14000 twips
|
|
// For title at ~40% from top: before = 5600
|
|
const coverSection = {
|
|
properties: {
|
|
page: { /* standard A4 */ },
|
|
// No headers/footers on cover page
|
|
},
|
|
children: [
|
|
new Paragraph({ spacing: { before: 5600 } }), // spacer
|
|
new Paragraph({
|
|
alignment: AlignmentType.CENTER,
|
|
children: [new TextRun({
|
|
text: title,
|
|
font: { ascii: "Calibri", eastAsia: "SimHei" },
|
|
size: 52, bold: true, color: palette.primary,
|
|
})],
|
|
}),
|
|
// ... subtitle, author, date
|
|
],
|
|
};
|
|
```
|
|
|
|
For multi-section documents, put the cover in its own section so it can have different headers/footers.
|
|
|
|
## Footnotes
|
|
|
|
```js
|
|
const { FootnoteReferenceRun, Footnote } = require("docx");
|
|
|
|
const doc = new Document({
|
|
footnotes: {
|
|
1: { children: [new Paragraph({ children: [new TextRun({ text: "Smith, J. (2024). Research Methods. Academic Press, pp. 45-67.", size: 18 })] })] },
|
|
2: { children: [new Paragraph({ children: [new TextRun({ text: "Zhang, W. (2023). \u201c数据分析方法研究\u201d. 科学通报, 68(12), 1234-1250.", size: 18 })] })] },
|
|
},
|
|
sections: [{
|
|
children: [
|
|
new Paragraph({
|
|
children: [
|
|
new TextRun({ text: "According to recent studies" }),
|
|
new FootnoteReferenceRun(1), // superscript [1]
|
|
new TextRun({ text: ", data analysis methods have evolved" }),
|
|
new FootnoteReferenceRun(2), // superscript [2]
|
|
new TextRun({ text: "." }),
|
|
],
|
|
}),
|
|
],
|
|
}],
|
|
});
|
|
```
|
|
|
|
### Academic Reference Pattern
|
|
|
|
For sequential references [1][2][3]..., pre-define all footnotes in the `footnotes` object with numeric keys, then reference them inline with `FootnoteReferenceRun(n)`.
|
|
|
|
## keepNext — Element Binding
|
|
|
|
Prevent page breaks between related elements:
|
|
|
|
```js
|
|
// Heading stays with next paragraph
|
|
new Paragraph({
|
|
heading: HeadingLevel.HEADING_2,
|
|
keepNext: true, // don't break after this
|
|
children: [new TextRun({ text: "Table 1: Results" })],
|
|
})
|
|
// Table immediately follows on same page
|
|
|
|
// Caption stays with image
|
|
new Paragraph({
|
|
keepNext: true,
|
|
alignment: AlignmentType.CENTER,
|
|
children: [new TextRun({ text: "Figure 1: Architecture Diagram", italics: true, size: 20 })],
|
|
})
|
|
// ImageRun paragraph follows
|
|
```
|
|
|
|
Use `keepNext: true` for:
|
|
- Heading → first paragraph of section
|
|
- Table caption → table
|
|
- Image → image caption
|
|
- "Figure X" label → image
|
|
|
|
## Page Break Rules
|
|
|
|
Follow the document type strategy defined in SOUL.md Rule 1.
|
|
|
|
**Structural breaks (always):**
|
|
- Cover page → TOC
|
|
- TOC → main content
|
|
- Main content → back cover
|
|
|
|
**Content breaks (by document type):**
|
|
- Academic / teaching → `new Paragraph({ children: [new PageBreak()] })` before each H1 chapter
|
|
- Business report → PageBreak before each H1; H2 flows naturally
|
|
- Resume / contract / letter → No content page breaks
|
|
- Short article → No content page breaks
|
|
|
|
**Anti-tear (mandatory):**
|
|
```js
|
|
// Heading stays with next paragraph
|
|
new Paragraph({
|
|
heading: HeadingLevel.HEADING_1,
|
|
keepNext: true,
|
|
children: [new TextRun("Chapter Title")],
|
|
})
|
|
|
|
// Table caption stays with table
|
|
new Paragraph({
|
|
keepNext: true,
|
|
children: [new TextRun({ text: "Table 1: Summary", italics: true })],
|
|
})
|
|
|
|
// Image caption stays with image
|
|
new Paragraph({
|
|
keepNext: true,
|
|
children: [new TextRun({ text: "Figure 1: Architecture", italics: true })],
|
|
})
|
|
```
|
|
|
|
**Never:**
|
|
- PageBreak inside tables
|
|
- PageBreak as standalone element (must be inside Paragraph)
|
|
- PageBreak at the END of the last section (causes blank page)
|
|
|
|
```js
|
|
// Correct: page break between cover and TOC
|
|
new Paragraph({ children: [new PageBreak()] })
|
|
```
|
|
|
|
## Quotes Escaping in JS Strings
|
|
|
|
**⚠️⚠️⚠️ CRITICAL — #1 MOST COMMON BUG ⚠️⚠️⚠️**
|
|
|
|
Bare Chinese curly quotation marks (`""` `''`) in JS string literals **WILL break syntax and crash document generation**. This bug occurs most often in **Chinese body text** where curly quotes are used for emphasis, proper nouns, event names, or quoted speech — e.g., `"双11"`, `"前低后高"`, `"618"大促`. **Every single occurrence** of `""''` in text content MUST be Unicode-escaped. No exceptions.
|
|
|
|
**MANDATORY RULE: Before writing ANY `TextRun`, `para()`, or string containing Chinese text, scan the text for `""''` characters and replace ALL of them with `\u201c \u201d \u2018 \u2019`.**
|
|
|
|
| Character | Unicode | Escape method |
|
|
|-----------|---------|---------------|
|
|
| `"` `"` | `\u201c` `\u201d` | Unicode escape `\u201c` `\u201d` |
|
|
| `'` `'` | `\u2018` `\u2019` | Unicode escape `\u2018` `\u2019` |
|
|
| `"` | U+0022 | `\"` or wrap string in single quotes / template literal |
|
|
| `'` | U+0027 | `\'` or wrap string in double quotes / template literal |
|
|
|
|
```js
|
|
// ❌ WRONG — curly quotes in Chinese text break JS syntax (VERY COMMON MISTAKE)
|
|
content.push(para("2025年四个季度行业增速呈现"前低后高"的态势。在"618"大促、"双11""双12"活动拉动下增长显著。"));
|
|
new TextRun({ text: "他说"你好"" })
|
|
new TextRun({ text: 'It's a test' })
|
|
|
|
// ✅ CORRECT — ALL curly quotes replaced with Unicode escapes
|
|
content.push(para("2025年四个季度行业增速呈现\u201c前低后高\u201d的态势。在\u201c618\u201d大促、\u201c双11\u201d\u201c双12\u201d活动拉动下增长显著。"));
|
|
new TextRun({ text: "他说\u201c你好\u201d" })
|
|
new TextRun({ text: "It\u2019s a test" })
|
|
|
|
// ✅ CORRECT — straight quotes escaped or use alternate delimiters
|
|
new TextRun({ text: "He said \"hello\"" })
|
|
new TextRun({ text: 'He said "hello"' })
|
|
new TextRun({ text: `He said "hello"` })
|
|
```
|
|
|
|
## Multi-Section Documents
|
|
|
|
Different headers/footers per section:
|
|
|
|
```js
|
|
const doc = new Document({
|
|
sections: [
|
|
{
|
|
// Section 1: Cover — no header/footer
|
|
properties: { page: { /* ... */ } },
|
|
children: coverChildren,
|
|
},
|
|
{
|
|
// Section 2: Front matter — Roman page numbers
|
|
properties: {
|
|
type: SectionType.NEXT_PAGE,
|
|
page: {
|
|
/* size, margin... */
|
|
pageNumbers: { start: 1, formatType: NumberFormat.UPPER_ROMAN },
|
|
},
|
|
},
|
|
headers: { default: new Header({ children: [] }) },
|
|
footers: {
|
|
default: new Footer({
|
|
children: [new Paragraph({
|
|
alignment: AlignmentType.CENTER,
|
|
children: [new TextRun({ children: [PageNumber.CURRENT], size: 18 })],
|
|
})],
|
|
}),
|
|
},
|
|
children: tocAndAbstract,
|
|
},
|
|
{
|
|
// Section 3: Main content — Arabic page numbers
|
|
properties: {
|
|
type: SectionType.NEXT_PAGE,
|
|
page: {
|
|
/* size, margin... */
|
|
pageNumbers: { start: 1, formatType: NumberFormat.DECIMAL },
|
|
},
|
|
},
|
|
headers: {
|
|
default: new Header({
|
|
children: [new Paragraph({
|
|
alignment: AlignmentType.CENTER,
|
|
children: [new TextRun({ text: docTitle, size: 18, color: "888888" })],
|
|
})],
|
|
}),
|
|
},
|
|
footers: { default: footerWithPageNumbers },
|
|
children: mainContent,
|
|
},
|
|
],
|
|
});
|
|
```
|
|
|
|
## Converting DOCX to PDF
|
|
|
|
```bash
|
|
# Using LibreOffice (headless)
|
|
libreoffice --headless --convert-to pdf output.docx
|
|
|
|
# ⚠️ TOC Rule: If document has TOC, warn user that:
|
|
# 1. LibreOffice conversion may show empty TOC
|
|
# 2. User should open in Word first, update fields (Ctrl+A → F9), save, then convert
|
|
# 3. Or use Word's "Save as PDF" for best results
|
|
```
|
|
|
|
## Converting DOCX to Images
|
|
|
|
```bash
|
|
# Step 1: Convert to PDF
|
|
libreoffice --headless --convert-to pdf output.docx
|
|
|
|
# Step 2: Convert PDF to images
|
|
pdftoppm -png -r 200 output.pdf output_page
|
|
|
|
# This generates output_page-1.png, output_page-2.png, etc.
|
|
# Use -r 200 for good quality (200 DPI)
|
|
```
|
|
|
|
Useful for generating preview thumbnails or when user needs images instead of document files.
|