Initial commit

This commit is contained in:
Z User
2026-06-06 05:21:10 +00:00
Unverified
commit 6664758a6d
493 changed files with 135653 additions and 0 deletions

13
skills/docx/LICENSE.txt Executable file
View File

@@ -0,0 +1,13 @@
Copyright (c) 2026 Z.ai All rights reserved.
Permission is granted for personal, educational, and non-commercial use only.
Commercial use is strictly prohibited without prior written permission from the author.
Unauthorized copying, modification, or distribution of the software for commercial purposes is prohibited.
The author reserves the right to make the final determination of what constitutes "commercial use".
THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.
IN NO EVENT SHALL THE AUTHOR BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY ARISING FROM THE USE OF THE SOFTWARE.

201
skills/docx/SKILL.md Executable file
View File

@@ -0,0 +1,201 @@
---
name: docx
metadata:
author: Z.AI
version: "1.0"
description: "Comprehensive document creation, editing, and analysis with support for tracked changes, comments, formatting preservation, and text extraction. When GLM needs to work with professional documents (.docx files) for: (1) Creating new documents, (2) Modifying or editing content, (3) Working with tracked changes, (4) Adding comments, or any other document tasks"
license: Proprietary. LICENSE.txt has complete terms
---
# DOCX Creation, Editing, and Analysis
## Quick Setup
```bash
bash "$SKILL_DIR/setup.sh" # Interactive environment check + install
```
## Overview
A .docx file is a ZIP archive containing XML files. This skill provides tools for creating, editing, reading, and reviewing Word documents.
## Quick Route — Read This First
**Step 1**: Determine task type → load the corresponding route file
**Step 2**: Determine business scene → load the corresponding scene file (if applicable)
**Step 3**: Load `references/design-system.md` for cover recipes, palettes, and chart colors
**Step 4**: Load `references/common-rules.md` for shared layout, font, and quality rules
**Step 5**: Execute per route instructions
**Step 6**: Run the post-generation checklist
⚠️ **MANDATORY — Cover Recipe Enforcement (Step 3):**
When creating a document that needs a cover page, you MUST use one of the 7 validated cover recipes (R1R7) from `design-system.md`. **Free-form cover code is FORBIDDEN.** The recipe provides the wrapper table, background, layout structure, border settings, and spacing — do not reinvent any of these.
Workflow: (1) Call `selectCoverRecipe(docType, industry)` to get recipe + palette → (2) Use the corresponding `buildCoverRX()` function code from `design-system.md` → (3) Pass your `config` (title, subtitle, metaLines, etc.) into the recipe builder. If you skip this and write cover code from scratch, the cover WILL have compatibility issues (blank pages in MS Office, missing borders, overflow, etc.).
### Script Path Setup (MANDATORY before any script call)
All CLI tools live in `scripts/` relative to this skill's directory. Before calling any script, resolve the absolute path once:
```bash
DOCX_SCRIPTS="<skill_directory>/scripts" # ← parent directory of this SKILL.md
# Then all commands use $DOCX_SCRIPTS:
python3 "$DOCX_SCRIPTS/postcheck.py" output.docx
python3 "$DOCX_SCRIPTS/add_toc_placeholders.py" output.docx --auto
```
**For Python imports** (when generation code needs to import skill modules):
```python
import sys, os
DOCX_SCRIPTS = os.path.join("<skill_directory>", "scripts")
if DOCX_SCRIPTS not in sys.path:
sys.path.insert(0, DOCX_SCRIPTS)
```
**⚠️ NEVER use bare `python3 scripts/...`** — it only works if cwd happens to be the skill directory. Always use the absolute `$DOCX_SCRIPTS` path.
### Task Router
| User Intent | Route | Files to Load |
|-------------|-------|---------------|
| Create/write/generate (no attachment) | **Create** | `routes/create.md` + `references/docx-js-core.md` |
| Edit/modify/revise (has attachment) | **Edit** | `routes/edit.md` + `references/ooxml.md` |
| Format/layout/font/margin | **Format** | `routes/format.md` |
| Comment/annotate/review | **Comment** | `routes/comment.md` |
| Read/analyze/extract | **Read** | `routes/read.md` |
### Scene Router (Optional — load after route)
| User Keywords | Scene | File |
|---------------|-------|------|
| thesis, academic, research, paper, dissertation, abstract, journal | Academic | `scenes/academic.md` |
| report, analysis, experiment, testing, survey, review, summary, proposal, feasibility, competitor, industry, operations | Report | `scenes/report.md` |
| contract, agreement, terms, transfer, NDA, confidential, framework, cooperation, service terms, user agreement, procurement | Contract | `scenes/contract.md` |
| resume, CV, job application | Resume | `scenes/resume.md` |
| exam, test, quiz, paper (exam context), lesson plan | Exam | `scenes/exam.md` |
| official document, notice, letter, reply, minutes, red header, government, issuance | Official | `scenes/official-doc.md` |
| broadcast script, product copy, livestream, speech, presentation script, video script | Copywriting | `scenes/copywriting.md` |
| plan, proposal (if not report context) | Report | `scenes/report.md` |
| policy, regulation, standard, management rules | Official | `scenes/official-doc.md` |
**If no scene matches**, use default design rules from `references/design-system.md` and `references/common-rules.md`.
## Formatting Standards (Always Apply)
→ See `references/common-rules.md` for full font profiles, spacing, indent, and layout rules.
**Key rules (quick reference):**
- **Line spacing**: 1.3x (`line: 312`) — MANDATORY. Exceptions: resume 1.15x, official doc 28pt fixed, copywriting `400`, contract 1.5x
- **CJK body**: Justified + 2-char indent (`firstLine: 480` SimSun / `420` YaHei)
- **Tables**: `margins` set, `ShadingType.CLEAR`, `tableHeader: true`, `cantSplit: true`, title `keepNext: true`
- **Images**: `type` parameter required, preserve aspect ratio via `image-size`, PageBreak inside Paragraph
- **Full-page Table row**: `rule: "exact"` with 1200 twips safety margin
## Unit Quick Reference
| Unit | Value |
|------|-------|
| 1 cm | 567 twips |
| 1 inch | 1440 twips |
| 1 pt | 20 half-points |
| A4 | 11906 × 16838 twips |
For Chinese font size table and common margins, see `references/common-rules.md`.
## Post-Generation — Two-Layer Verification
### Layer 1: Manual Checklist (self-check during generation)
#### Basic Format
- [ ] Line spacing is 1.3x (`line: 312`) or scene-specific override
- [ ] CJK body has 2-char indent (`firstLine: 480` or `420`)
- [ ] Tables have margins set
- [ ] Images preserve aspect ratio via `image-size` — NEVER hardcode both width and height
- [ ] PageBreak inside Paragraph
- [ ] ShadingType uses CLEAR
- [ ] Each numbered list uses unique `reference`
- [ ] **⚠️ CRITICAL — Quotation marks in JS strings properly escaped.** Chinese curly quotes (`""` `''`) MUST use Unicode escapes (`\u201c` `\u201d` `\u2018` `\u2019`); straight quotes (`"` `'`) use `\"` `\'` or alternate delimiters. **This is the #1 most common code generation bug.** Chinese text frequently contains `""` for emphasis or proper nouns (e.g., "双11", "前低后高", "618") — every occurrence MUST be escaped. Failure to escape produces JS syntax errors that silently break document generation.
- [ ] ImageRun includes `type` parameter
- [ ] Header/footer present (unless scene says otherwise)
#### Heading Styles
- [ ] All body chapter headings use `heading: HeadingLevel.HEADING_X` (never simulate with bold + large font)
- [ ] Cover title may skip Heading style (not in TOC), but body headings MUST use Heading style
#### Page Break & Blank Page Prevention
- [ ] Cover/content in separate sections
- [ ] Three rules to prevent blank pages:
- ① When using section(NEXT_PAGE), previous section must NOT end with PageBreak (double break = blank page)
- ② PageBreak paragraph SHOULD contain visible text — **exception**: section-ending empty para + PageBreak is allowed (normal section separator, e.g., after cover page)
- ③ No more than 3 consecutive empty paragraphs
- [ ] Full-page Table row height uses `rule: "exact"` (never `"atLeast"` for tall tables)
- [ ] No unwanted blank pages (check each section ending)
#### TOC
→ See `references/toc.md` for the complete TOC reference and checklist.
- [ ] If TOC title exists → `TableOfContents` element must be present
- [ ] **⚠️ MANDATORY PageBreak after TableOfContents** — a Paragraph containing PageBreak MUST immediately follow the `TableOfContents` element; without it, TOC and body content will render on the same page. This is the #1 TOC formatting failure — never omit it
- [ ] `add_toc_placeholders.py --auto` runs after generation; exit code = 0
- [ ] **TOC MUST be in its own section** — body section sets `page: { pageNumbers: { start: 1, formatType: NumberFormat.DECIMAL } }` so page numbers start from the first body page, not from the TOC pages
- [ ] **Page number API nesting**`pageNumbers` MUST be inside `page: {}`, NOT at properties top level (see toc.md § Page Number API)
- [ ] **3-section page numbering** — Cover (no page#) → Front matter (Roman i,ii,iii, start=1) → Body (Arabic 1,2,3, start=1)
- [ ] **Post-process footers** — Roman section footer instrText must contain `PAGE \* ROMAN \* MERGEFORMAT`; Arabic section `PAGE \* arabic \* MERGEFORMAT` (WPS ignores pgNumType fmt). **⚠️ NEVER use `\* decimal` in instrText** — `decimal` is a docx-js API enum value (`NumberFormat.DECIMAL`), NOT a valid Word field format switch; using it causes page numbers to render as "1decimal", "2decimal". The correct Word field switch for Arabic numerals is `\* arabic`.
- [ ] **Remove empty pgNumType** — Post-process to strip `<w:pgNumType/>` from cover section (docx-js emits empty element that confuses WPS)
- [ ] **⚠️ TOC Refresh Hint MANDATORY** — between `TableOfContents` element and the PageBreak, MUST add an italic gray note paragraph telling users to right-click TOC → "Update Field" to refresh page numbers (see toc.md § TOC Refresh Hint)
#### Table Cross-Page
- [ ] Header rows: `tableHeader: true`
- [ ] All rows: `cantSplit: true`
- [ ] Title paragraph: `keepNext: true`
#### Cover
- [ ] **Cover MUST use a validated recipe (R1R7)** from `design-system.md` — free-form cover code is forbidden
- [ ] Cover recipe matches document type (per `selectCoverRecipe()` in `design-system.md`)
- [ ] Cover uses the 16838 outer wrapper table with `allNoBorders` (all recipes provide this)
- [ ] Cover title uses `calcTitleLayout()` — never hardcoded font size above 40pt
- [ ] Cover spacing uses `calcCoverSpacing()` — never hardcoded large spacing values
- [ ] Cover content does not overflow (total height ≤ 15638 twips, Table uses `rule: "exact"`)
- [ ] Every TextRun on dark/colored background has explicit `color` set (Rule 9 — never rely on default black)
- [ ] Cover section has no trailing PageBreak or empty paragraphs
- [ ] Title lines split at semantic boundaries (no mid-word breaks, no single-char orphan lines)
- [ ] No text-character decorative lines (`───`, `━━━`) — use paragraph borders only
### Layer 2: Automated Post-Check Script
```bash
python3 "$DOCX_SCRIPTS/postcheck.py" output.docx
```
Automatically checks 14 business rules: blank pages, **cover overflow (font size/spacing/trailing content)**, line spacing consistency, table margins, table cross-page control (cantSplit/tblHeader), image overflow, image aspect ratio distortion, font fallback, CJK indent, heading hierarchy, ShadingType misuse, TOC quality, document cleanliness (placeholder text/Markdown/HTML residuals), report content quality (abstract presence/heading specificity/vague conclusion detection).
⚠️ **After generating any document, MUST run postcheck.py and fix all ❌ errors.**
## Math Formulas
Formula input uses **LaTeX syntax**, internally converted to docx-js Math objects.
- **Basic formulas** (fractions, sub/superscript, roots, summation) → docx-js Math components
- **Complex formulas** (3+ nesting, matrices, piecewise functions) → matplotlib PNG fallback
See `references/math-formulas.md`.
## Charts
Default: **matplotlib template library** generates PNG for embedding.
6 ready-to-use templates: bar, line, pie, box, radar, heatmap.
Colors auto-derived from document palette.accent for style consistency.
Default palette: Morandi low-saturation (see design-system.md).
See `references/chart-templates.md`.
## Dependencies
- **pandoc**: Text extraction
- **docx**: `bun add docx` or `npm install docx` (creating)
- **LibreOffice**: PDF conversion, .doc support
- **Poppler**: PDF to image (`pdftoppm`)
- **defusedxml**: Secure XML parsing
- **python-docx**: Simple comment operations

View File

@@ -0,0 +1,386 @@
# Chart Templates — matplotlib Template Library
## Design Philosophy
GLM uses **matplotlib as the primary chart engine**. Advantages:
- High chart quality, print-ready
- Full style control, consistent with document palette
- Supports complex chart types (heatmap, radar, box plot, etc.)
- Reliable CJK rendering (with SimHei font configured)
**When to use native Word charts?**
Only when the user explicitly requests "editable charts." Default is always matplotlib PNG embedding.
## Base Configuration
```python
import matplotlib
matplotlib.use("Agg")
import matplotlib.pyplot as plt
import numpy as np
from matplotlib.font_manager import FontProperties
# ── CJK Font ──
_FONT_PATHS = [
"/System/Library/Fonts/Supplemental/SimHei.ttf", # macOS
"/usr/share/fonts/truetype/wqy/wqy-zenhei.ttc", # Linux
"/usr/share/fonts/truetype/chinese/SimHei.ttf", # custom install
"./SimHei.ttf", # current dir
]
ZH_FONT = None
for _fp in _FONT_PATHS:
try:
ZH_FONT = FontProperties(fname=_fp)
break
except:
continue
plt.rcParams["axes.unicode_minus"] = False
# ── Palette Adapter ──
def make_chart_palette(accent: str, surface: str = "#F2F4F6") -> dict:
"""Generate chart palette from document palette.accent"""
return {
"primary": accent,
"series": _generate_series_colors(accent, 6),
"grid": "#E0E0E0",
"bg": "white",
"text": "#333333",
"surface": surface,
}
def _generate_series_colors(base_hex: str, count: int) -> list:
"""Generate series colors via hue rotation from base color"""
import colorsys
base = tuple(int(base_hex.lstrip("#")[i:i+2], 16) / 255.0 for i in (0, 2, 4))
h, s, v = colorsys.rgb_to_hsv(*base)
colors = []
for i in range(count):
hi = (h + i * (1.0 / count)) % 1.0
r, g, b = colorsys.hsv_to_rgb(hi, min(s * 0.9, 1.0), min(v * 1.05, 1.0))
colors.append(f"#{int(r*255):02x}{int(g*255):02x}{int(b*255):02x}")
return colors
# ── Universal Export ──
def save_chart(fig, path: str, dpi: int = 200):
"""Save chart with uniform DPI. Square charts (pie/radar) use fixed padding to preserve 1:1 ratio."""
w, h = fig.get_size_inches()
if abs(w - h) < 0.1:
fig.savefig(path, dpi=dpi, bbox_inches="tight", pad_inches=0.3,
facecolor="white", edgecolor="none")
else:
fig.savefig(path, dpi=dpi, bbox_inches="tight", pad_inches=0.1,
facecolor="white", edgecolor="none")
plt.close(fig)
return path
```
## Template 1: Bar Chart
```python
def bar_chart(categories: list, values: list, title: str = "",
ylabel: str = "", palette: dict = None, output: str = "bar.png"):
"""
Basic bar chart.
categories: ["Q1", "Q2", "Q3", "Q4"]
values: [120, 150, 180, 200]
"""
p = palette or make_chart_palette("#5B8DB8")
fig, ax = plt.subplots(figsize=(10, 6))
bars = ax.bar(categories, values, color=p["primary"], width=0.6, edgecolor="white")
# Data labels
for bar, val in zip(bars, values):
ax.text(bar.get_x() + bar.get_width() / 2, bar.get_height() + max(values) * 0.02,
str(val), ha="center", va="bottom", fontsize=10,
fontproperties=ZH_FONT, color=p["text"])
if title:
ax.set_title(title, fontproperties=ZH_FONT, fontsize=14, pad=15, color=p["text"])
if ylabel:
ax.set_ylabel(ylabel, fontproperties=ZH_FONT, fontsize=11, color=p["text"])
ax.set_xticklabels(categories, fontproperties=ZH_FONT, fontsize=10)
ax.spines[["top", "right"]].set_visible(False)
ax.grid(axis="y", alpha=0.3, color=p["grid"])
if len(categories) > 6:
plt.xticks(rotation=45, ha="right")
return save_chart(fig, output)
```
### Grouped Bar Chart
```python
def grouped_bar(categories: list, groups: dict, title: str = "",
ylabel: str = "", palette: dict = None, output: str = "grouped_bar.png"):
"""
groups: {"Product A": [10, 20, 30], "Product B": [15, 25, 35]}
"""
p = palette or make_chart_palette("#5B8DB8")
fig, ax = plt.subplots(figsize=(10, 6))
x = np.arange(len(categories))
n = len(groups)
width = 0.8 / n
for i, (name, vals) in enumerate(groups.items()):
offset = (i - n / 2 + 0.5) * width
bars = ax.bar(x + offset, vals, width, label=name, color=p["series"][i % len(p["series"])])
ax.set_xticks(x)
ax.set_xticklabels(categories, fontproperties=ZH_FONT, fontsize=10)
ax.legend(prop=ZH_FONT, frameon=False)
if title:
ax.set_title(title, fontproperties=ZH_FONT, fontsize=14, pad=15)
ax.spines[["top", "right"]].set_visible(False)
ax.grid(axis="y", alpha=0.3)
return save_chart(fig, output)
```
## Template 2: Line Chart
```python
def line_chart(x_data: list, series: dict, title: str = "",
xlabel: str = "", ylabel: str = "", palette: dict = None,
output: str = "line.png"):
"""
series: {"Revenue": [100, 120, 150, 180], "Cost": [80, 90, 100, 110]}
"""
p = palette or make_chart_palette("#5B8DB8")
fig, ax = plt.subplots(figsize=(10, 6))
for i, (name, values) in enumerate(series.items()):
color = p["series"][i % len(p["series"])]
ax.plot(x_data, values, marker="o", markersize=5, linewidth=2,
label=name, color=color)
if title:
ax.set_title(title, fontproperties=ZH_FONT, fontsize=14, pad=15)
if xlabel:
ax.set_xlabel(xlabel, fontproperties=ZH_FONT, fontsize=11)
if ylabel:
ax.set_ylabel(ylabel, fontproperties=ZH_FONT, fontsize=11)
ax.legend(prop=ZH_FONT, frameon=False, loc="best")
ax.spines[["top", "right"]].set_visible(False)
ax.grid(True, alpha=0.3)
if len(x_data) > 6:
plt.xticks(rotation=45, ha="right")
return save_chart(fig, output)
```
## Template 3: Pie Chart
```python
def pie_chart(labels: list, values: list, title: str = "",
palette: dict = None, output: str = "pie.png"):
"""Pie chart — auto-merges slices below 3% into 'Other'"""
p = palette or make_chart_palette("#5B8DB8")
fig, ax = plt.subplots(figsize=(8, 8))
# Merge slices below 3% into "Other"
total = sum(values)
merged_labels, merged_values = [], []
other = 0
for lbl, val in zip(labels, values):
if val / total < 0.03:
other += val
else:
merged_labels.append(lbl)
merged_values.append(val)
if other > 0:
merged_labels.append("Other")
merged_values.append(other)
colors = p["series"][:len(merged_labels)]
wedges, texts, autotexts = ax.pie(
merged_values, labels=merged_labels, colors=colors,
autopct="%1.1f%%", startangle=90, pctdistance=0.75,
textprops={"fontproperties": ZH_FONT, "fontsize": 11}
)
for t in autotexts:
t.set_fontsize(10)
t.set_color("white")
if title:
ax.set_title(title, fontproperties=ZH_FONT, fontsize=14, pad=20)
return save_chart(fig, output)
```
## Template 4: Box Plot
```python
def box_plot(data: dict, title: str = "", ylabel: str = "",
palette: dict = None, output: str = "box.png"):
"""
data: {"Class A": [78, 82, 91, ...], "Class B": [65, 70, 88, ...]}
"""
p = palette or make_chart_palette("#5B8DB8")
fig, ax = plt.subplots(figsize=(10, 6))
labels = list(data.keys())
values = list(data.values())
bp = ax.boxplot(values, labels=labels, patch_artist=True, notch=False,
medianprops={"color": "white", "linewidth": 2})
for i, patch in enumerate(bp["boxes"]):
patch.set_facecolor(p["series"][i % len(p["series"])])
patch.set_alpha(0.8)
ax.set_xticklabels(labels, fontproperties=ZH_FONT, fontsize=11)
if title:
ax.set_title(title, fontproperties=ZH_FONT, fontsize=14, pad=15)
if ylabel:
ax.set_ylabel(ylabel, fontproperties=ZH_FONT, fontsize=11)
ax.spines[["top", "right"]].set_visible(False)
ax.grid(axis="y", alpha=0.3)
return save_chart(fig, output)
```
## Template 5: Radar Chart
```python
def radar_chart(categories: list, series: dict, title: str = "",
palette: dict = None, output: str = "radar.png"):
"""
categories: ["Chinese", "Math", "English", "Physics", "Chemistry"]
series: {"Student A": [85, 92, 78, 90, 88], "Student B": [75, 88, 92, 70, 85]}
"""
p = palette or make_chart_palette("#5B8DB8")
fig, ax = plt.subplots(figsize=(8, 8), subplot_kw=dict(polar=True))
n = len(categories)
angles = np.linspace(0, 2 * np.pi, n, endpoint=False).tolist()
angles += angles[:1] # close the polygon
for i, (name, values) in enumerate(series.items()):
vals = values + values[:1] # close the polygon
color = p["series"][i % len(p["series"])]
ax.plot(angles, vals, linewidth=2, label=name, color=color)
ax.fill(angles, vals, alpha=0.15, color=color)
ax.set_xticks(angles[:-1])
ax.set_xticklabels(categories, fontproperties=ZH_FONT, fontsize=11)
ax.legend(prop=ZH_FONT, loc="upper right", bbox_to_anchor=(1.2, 1.1), frameon=False)
if title:
ax.set_title(title, fontproperties=ZH_FONT, fontsize=14, pad=25)
return save_chart(fig, output)
```
## Template 6: Heatmap
```python
def heatmap(data: list, row_labels: list, col_labels: list, title: str = "",
palette: dict = None, output: str = "heatmap.png"):
"""
data: 2D array [[1,2,3],[4,5,6]]
row_labels: ["Row 1", "Row 2"]
col_labels: ["Col 1", "Col 2", "Col 3"]
"""
fig, ax = plt.subplots(figsize=(max(8, len(col_labels) * 1.2), max(6, len(row_labels) * 0.8)))
arr = np.array(data)
im = ax.imshow(arr, cmap="YlOrRd", aspect="auto")
ax.set_xticks(range(len(col_labels)))
ax.set_yticks(range(len(row_labels)))
ax.set_xticklabels(col_labels, fontproperties=ZH_FONT, fontsize=10)
ax.set_yticklabels(row_labels, fontproperties=ZH_FONT, fontsize=10)
# Value annotations
for i in range(len(row_labels)):
for j in range(len(col_labels)):
val = arr[i, j]
color = "white" if val > arr.max() * 0.7 else "black"
ax.text(j, i, f"{val:.1f}", ha="center", va="center",
fontsize=10, color=color)
fig.colorbar(im, ax=ax, shrink=0.8)
if title:
ax.set_title(title, fontproperties=ZH_FONT, fontsize=14, pad=15)
return save_chart(fig, output)
```
## Embedding in Documents (MANDATORY — Preserve Aspect Ratio)
**⚠️ Core Rule: When embedding any chart image, you MUST read actual image dimensions to calculate displayHeight. NEVER hardcode both width and height.**
Pie and radar charts are square — mismatched width/height produces ellipses or diamonds.
```js
// ✅ Correct: read actual image dimensions
const chartBuffer = fs.readFileSync("bar.png");
const sizeOf = require("image-size");
const dims = sizeOf(chartBuffer);
const displayWidth = 500;
const displayHeight = Math.round(displayWidth * (dims.height / dims.width));
new Paragraph({
alignment: AlignmentType.CENTER,
spacing: { before: 200, after: 100 },
children: [
new ImageRun({
data: chartBuffer,
transformation: { width: displayWidth, height: displayHeight },
type: "png",
}),
],
})
```
```js
// ❌ Wrong: hardcoded width and height (pie becomes ellipse, radar becomes diamond)
new ImageRun({
data: chartBuffer,
transformation: { width: 500, height: 350 }, // wrong ratio!
type: "png",
})
```
```python
# ✅ Python (ReportLab) correct approach:
from PIL import Image as PILImage
from reportlab.platypus import Image
pil_img = PILImage.open('chart.png')
orig_w, orig_h = pil_img.size
target_width = 400 # pt
scale = target_width / orig_w
img = Image('chart.png', width=target_width, height=orig_h * scale)
```
## Chart Selection Guide
| Data Scenario | Recommended Chart | Template Function |
|---------------|-------------------|-------------------|
| Category comparison | Bar chart | `bar_chart()` |
| Multi-group comparison | Grouped bar | `grouped_bar()` |
| Trend over time | Line chart | `line_chart()` |
| Proportion/composition | Pie chart | `pie_chart()` |
| Distribution/spread | Box plot | `box_plot()` |
| Multi-dimensional assessment | Radar chart | `radar_chart()` |
| Matrix correlation | Heatmap | `heatmap()` |
## Quality Standards
1. **DPI**: Uniform 200 DPI (built into `save_chart`)
2. **Colors**: Derived from document palette.accent for style consistency
3. **CJK text**: Must configure SimHei font; otherwise renders as boxes
4. **Label overlap prevention**: Auto-rotate 45° when >6 x-axis labels
5. **Legend**: Move outside chart (`bbox_to_anchor`) when >4 series
6. **Grid**: Light gray dashed grid lines for readability
7. **Clean frames**: Remove top/right spines for modern minimalist look
8. **Aspect ratio (CRITICAL)**: Must use `image-size` (JS) or `PIL` (Python) to read actual image dimensions and calculate displayHeight proportionally. **Pie and radar charts are square — hardcoding non-1:1 ratio causes ellipse/diamond distortion.**
9. **Dimensions**: Default 10×6 inches, fits well within A4 page

View File

@@ -0,0 +1,419 @@
# Common Rules
Shared rules referenced by all scene files. Scene-specific overrides take precedence.
## Default Page Layout
A4 portrait. Unless the scene specifies otherwise, use:
| Property | Value | Twips |
|----------|-------|-------|
| Page width | 21.0 cm | 11906 |
| Page height | 29.7 cm | 16838 |
| Top margin | 2.54 cm | 1440 |
| Bottom margin | 2.54 cm | 1440 |
| Left margin | 3.0 cm | 1701 |
| Right margin | 2.5 cm | 1417 |
```js
page: {
size: { width: 11906, height: 16838, orientation: PageOrientation.PORTRAIT },
margin: { top: 1440, bottom: 1440, left: 1701, right: 1417 },
}
```
**Scene overrides:**
- **Official doc (GB/T 9704 red-header):** top 2098, bottom 1984, left 1588, right 1474
- **Exam:** top/bottom 1134 (2 cm), left/right 1134 (2 cm)
## Default Font Specifications
Two font profiles exist. Each scene declares which profile it uses.
### Profile A: Formal (report, academic, contract, official-doc, exam)
| Element | CN Font | EN Font | Size | Notes |
|---------|---------|---------|------|-------|
| H1 | SimHei | Times New Roman | 16 pt (size: 32) | Bold, centered |
| H2 | SimHei | Times New Roman | 15 pt (size: 30) | Bold |
| H3 | SimHei | Times New Roman | 14 pt (size: 28) | Bold |
| Body | SimSun | Times New Roman | 12 pt (size: 24) | |
| Caption | SimSun | Times New Roman | 10.5 pt (size: 21) | |
- Text color: always **pure black `"000000"`** (never dark-blue-grey)
- First-line indent: **480 twips** (2 chars at SimSun 12pt)
- Line spacing: **312** (1.3x).
- **Color routing for non-report documents**: When the document is a short-form text (essay, evaluation, letter, speech, application, reflection, etc.) rather than a structured report/whitepaper/proposal/consulting deliverable, heading color MUST use pure black `"000000"` instead of `palette.primary`. Colored headings are reserved for documents that need brand/professional identity (reports with covers, whitepapers, proposals, consulting deliverables).
### Profile B: Visual (resume, copywriting)
| Element | CN Font | EN Font | Size |
|---------|---------|---------|------|
| Name/Title | Microsoft YaHei | Calibri | Varies |
| Body | Microsoft YaHei | Calibri | 1011 pt |
| Caption | Microsoft YaHei | Calibri | 9 pt |
- First-line indent: **420 twips** (2 chars at YaHei)
- Color: per design-system palette
### Official-Doc Font Override (GB/T 9704)
When `needsRedHeader() = true`:
| Element | Font | Size |
|---------|------|------|
| Red header org name | STXiaoBiaoSong (or SimSun bold) | 26 pt (size: 52) |
| Title | STXiaoBiaoSong (or SimHei) | 22 pt (size: 44) |
| Body | FangSong | 16 pt (size: 32) |
| Section heading | FangSong_GB2312 bold (or HeiTi) | 16 pt (size: 32) |
- Line spacing: **560** (28 pt fixed)
- First-line indent: **640 twips** (2 chars at FangSong 16pt)
## Chinese Font Size Reference
| Name | Points | Half-points (size:) |
|------|--------|---------------------|
| Chu Hao (initial) | 42 | 84 |
| Xiao Chu | 36 | 72 |
| Yi Hao (1st) | 26 | 52 |
| Xiao Yi | 24 | 48 |
| Er Hao (2nd) | 22 | 44 |
| Xiao Er | 18 | 36 |
| San Hao (3rd) | 16 | 32 |
| Xiao San | 15 | 30 |
| Si Hao (4th) | 14 | 28 |
| Xiao Si | 12 | 24 |
| Wu Hao (5th) | 10.5 | 21 |
| Xiao Wu | 9 | 18 |
| Liu Hao (6th) | 7.5 | 15 |
## Placeholder Convention
When required information is missing, use standardized placeholders so users can Find & Replace in Word.
**Format:** Always use full-width brackets `【 】`.
| Type | Format | Example |
|------|--------|---------|
| General field | `【field name】` | Name: 【company name】 |
| Monetary amount | `【RMB in words: yuan (lowercase: ¥)】` | Amount: 【RMB in words】 |
| Date field | `【____/____/____】` | Signing date: 【____/____/____】 |
| Long text | `【Please fill in: ______】` | Delivery criteria: 【Please fill in: ______】 |
| Attachment ref | `【See Appendix 1: ______】` | |
**Rules:**
1. Placeholder format must be consistent throughout the entire document
2. Each placeholder must specify exactly what is needed (never use vague "TBD" or "to be completed")
3. Never hard-code unconfirmed critical facts; use a placeholder instead
4. Never use sloppy expressions like "to be refined", "omitted", "user fills in later"
## Title Orphan Prevention (All Scenes)
Body headings (H1/H2/H3) and cover titles must avoid leaving 12 characters alone on the last line. This rule applies to ALL document types.
**For cover titles:** Always use `calcTitleLayout()` + `splitTitleLines()` from `design-system.md` — these handle orphan prevention automatically (merges ≤2-char last lines into the previous line).
**For body headings (H1/H2/H3):** When a heading text is long enough to wrap, apply the same `splitTitleLines()` logic. If the heading would cause a single-character orphan in Word's auto-wrapping, manually split into multiple `TextRun` elements with a `Break` (soft line break) at a semantic boundary.
```js
const { Break } = require("docx");
// Check if heading needs manual line break to prevent orphan
function buildHeadingRuns(text, maxCharsPerLine, runProps) {
// If text fits in one line, no action needed
if (text.length <= maxCharsPerLine) {
return [new TextRun({ text, ...runProps })];
}
// Use splitTitleLines to find semantic break points
const lines = splitTitleLines(text, maxCharsPerLine);
const runs = [];
for (let i = 0; i < lines.length; i++) {
if (i > 0) runs.push(new TextRun({ break: 1, ...runProps, text: "" })); // soft line break
runs.push(new TextRun({ text: lines[i], ...runProps }));
}
return runs;
}
```
**Estimation for maxCharsPerLine:** For centered headings, estimate available width = page width - left margin - right margin. For SimHei at a given pt size, each CJK char ≈ pt × 20 twips wide. Divide available width by char width to get `maxCharsPerLine`.
---
## Undefined / Null Value Prevention (Mandatory)
Generated code MUST guard against outputting literal `undefined`, `null`, `NaN`, or empty strings for any visible text field. This is a **hard requirement** — these are never acceptable in a delivered document.
```js
// ✅ MANDATORY: Safe text helper — use for ALL user-facing text values
function safeText(value, placeholder) {
if (value === undefined || value === null || value === "" || String(value) === "NaN" || String(value) === "undefined") {
return placeholder || "【Please fill in】";
}
return String(value);
}
// Usage:
new TextRun({ text: safeText(config.contact, "【Contact person】") })
new TextRun({ text: safeText(row.phone, "【Phone number】") })
```
**Rules:**
1. Every `TextRun` displaying user-provided or config-derived data MUST use `safeText()` or equivalent guard
2. If a field is optional and not provided, use `【Please fill in: field_name】` placeholder (full-width brackets)
3. Table cells with missing data: show `【Please fill in】`, never leave as empty string or undefined
4. This applies to ALL scenes — contracts, reports, academic, exams, etc.
---
## WPS / Office Word Compatibility (Mandatory)
Generated .docx files must render consistently in both Microsoft Office Word and WPS Office. The following OOXML features have known compatibility issues — avoid or use carefully.
### Features to AVOID (high incompatibility risk)
| Feature | Issue | Alternative |
|---------|-------|------------|
| **Text-character decorative lines** (e.g., `───`, `━━━`, `═══`, `——————`) | Character-drawn lines depend on font metrics and rendering engine — they appear different widths/lengths in MS Office vs WPS, often truncated or misaligned. They cannot span a controlled width. | **Always use paragraph borders** (`border.top`, `border.bottom`) for horizontal decorative lines. Paragraph borders render consistently across engines and respect indent for precise width control. See recipe R2 for correct implementation. |
| **Default table borders on cover wrapper tables** (forgetting `allNoBorders`) | docx-js default table borders are `single/auto/sz=4`. On the 16838-high cover wrapper, these borders add ~8 twips of extra height per edge. MS Office includes border thickness in height calculation, causing content to overflow by a few twips → **blank page 2**. WPS is more lenient and may absorb the overflow. | **Every cover wrapper table MUST explicitly set `borders: allNoBorders`** (all 6 border positions = NONE). Never rely on defaults. Define the `allNoBorders` constant and use it consistently. |
| `verticalAlign: "center"` or `"bottom"` in exact-height TableRow | WPS ignores vertical alignment in exact-height rows; content may clip or shift | Use `verticalAlign: "top"` + `spacing.before` to position content. Avoid `margins.top`/`margins.bottom` in exact-height cells — they reduce available height unpredictably across engines |
| `characterSpacing` (large values) | WPS renders differently from Word; letter spacing may collapse or expand | Keep `characterSpacing` ≤ 80; for cover English labels, test both renderers |
| `margins.top`/`margins.bottom` inside exact-height cells | MS Office and WPS calculate remaining height differently when cell margins are present | Use `spacing.before` on the first paragraph for vertical positioning; only use `margins.left`/`margins.right` |
| Complex nested Tables inside exact-height cells | WPS height calculation differs from Word; content may overflow or clip | Wrap everything in a single 16838 outer wrapper cell (R1 architecture). Nested tables inside are acceptable when the outer wrapper provides a safety net |
| Large font without explicit `spacing.line` | Paragraph inherits small line spacing from document default (e.g., 560tw for body); font taller than line height → top of characters clipped | Always set `spacing: { line: fontPt * 23, lineRule: "atLeast" }` on paragraphs with font size > body text |
| `ShadingType.SOLID` | WPS shows solid black instead of intended color | Always use `ShadingType.CLEAR` |
| OOXML raw XML for columns (`w:cols`) | WPS column rendering may differ | Use only when explicitly needed (A3 exam papers); test output |
| `titlePage: true` with complex headers/footers | WPS may not properly suppress first-page header/footer | Use separate sections instead of titlePage flag |
| Tab stops for alignment | WPS tab width may differ from Word | Use borderless Tables for alignment instead |
### Features that are SAFE (consistent rendering)
| Feature | Notes |
|---------|-------|
| Borderless Tables for layout | Both renderers handle well |
| `ShadingType.CLEAR` with fill color | Consistent |
| `rule: "exact"` on single-level TableRow | Works in both (avoid with nested Tables) |
| Paragraph borders (left, bottom, etc.) | Consistent |
| `spacing.before` / `spacing.after` | Consistent |
| Standard fonts (SimHei, SimSun, YaHei, TNR, Calibri) | Available on both platforms |
| `PageBreak` inside Paragraph | Consistent |
| Section breaks (`SectionType.NEXT_PAGE`) | Consistent |
### Mandatory Compatibility Checks (Post-Generation)
Add to quality self-check:
- [ ] No `ShadingType.SOLID` anywhere (search codebase)
- [ ] No `verticalAlign: "center"` or `"bottom"` in exact-height rows
- [ ] No tab-stop alignment for party info or data alignment (use Tables)
- [ ] Covers use the 16838 outer wrapper architecture (R1 pattern) with `spacing.before` for positioning; no `margins.top`/`margins.bottom` in exact-height cells
- [ ] **Cover section margin = `{ top: 0, bottom: 0, left: 0, right: 0 }`** — non-zero margins cause wrapper to shrink away from page edges
- [ ] **Cover wrapper row has `height: { value: 16838, rule: "exact" }`** — without this, content overflows or leaves whitespace
- [ ] **Cover is in a separate section from body content** — cover and body must not share a section
- [ ] **Cover wrapper table uses explicit `allNoBorders`** — never rely on default table borders (causes blank page 2 in MS Office)
- [ ] **No text-character decorative lines** (`───`, `━━━`, `═══`, `——————`) — use paragraph borders instead
- [ ] `characterSpacing` values ≤ 80 throughout
- [ ] TOC: follow `references/toc.md` checklist (heading style, TableOfContents element, PageBreak, post-processing script)
- [ ] All tables use `WidthType.PERCENTAGE` for column widths (WPS tblGrid bug; if DXA is unavoidable, set `columnWidths` explicitly)
```js
// ✅ Correct — percentage widths, WPS-safe
new Table({
width: { size: 100, type: WidthType.PERCENTAGE },
rows: [new TableRow({ children: [
new TableCell({ width: { size: 30, type: WidthType.PERCENTAGE }, children: [...] }),
new TableCell({ width: { size: 70, type: WidthType.PERCENTAGE }, children: [...] }),
]})],
});
// ❌ WRONG — DXA widths cause WPS tblGrid mismatch (all gridCol=100)
new TableCell({ width: { size: 3000, type: WidthType.DXA }, ... })
```
---
## Universal Prohibitions
These apply to ALL scenes. Scene files may add scene-specific prohibitions.
1. **No outlines-only** — always produce a complete, finished document
2. **No chat-style output** — the document must not read like a conversation or explanation
3. **No fake TOC / page numbers / headers** — use proper docx-js structures
4. **No excessive blank lines** to pad layout
5. **No dirty formatting** — no stray annotations, template fragments, broken hyperlinks, garbled markers
6. **No sloppy placeholders** — "TBD", "omitted", "略", "to be refined" are forbidden; use proper `【】` placeholders
7. **No fabricated data** — do not invent statistics, citations, legal references, or facts to appear professional
8. **No inconsistent heading/numbering** — one numbering system per document, no level-skipping
9. **No Markdown artifacts** — no `#`, `**`, `-` list markers, `>` blockquotes, and **no Markdown table syntax** (`| col1 | col2 |`, `|---|---|`) in the final docx. Any tabular data MUST be rendered as a proper docx `Table` object — never as plain-text pipe-delimited lines. This applies to ALL scenes including exam paper data tables, report statistics, and academic result tables.
10. **No bullet-list documents** — body text must be proper paragraphs, not endless bullet points
## Letter / Correspondence Format (Universal)
When generating any letter-style document (invitation letter, thank-you letter, cover letter, recommendation letter, English essay in letter format, etc.), the following layout rules apply regardless of scene:
1. **Complimentary close and sender name MUST be right-aligned** — e.g., "Yours sincerely,", "Best regards,", "Yours,", and the sender name below it must use `alignment: AlignmentType.RIGHT`
2. **Date** — if placed at the top of the letter, right-aligned; if at the bottom, right-aligned with the closing
3. **Salutation** ("Dear Mr. Smith," / "Dear Mike,") — left-aligned, followed by a blank line or `spacing.after`
4. **Body paragraphs** — left-aligned (English) or justified (CJK), with appropriate `spacing.after` between paragraphs
```js
// ✅ Correct — closing and sender right-aligned
new Paragraph({ alignment: AlignmentType.RIGHT, spacing: { before: 400 },
children: [new TextRun({ text: "Yours sincerely,", size: 24 })] }),
new Paragraph({ alignment: AlignmentType.RIGHT,
children: [new TextRun({ text: "Li Hua", size: 24 })] }),
// ❌ WRONG — closing left-aligned (default)
new Paragraph({
children: [new TextRun({ text: "Yours sincerely," })] }),
```
## Quality Self-Check (Universal)
→ See **SKILL.md § Post-Generation — Two-Layer Verification** for the complete checklist.
Scene files add scene-specific checks on top of that universal checklist.
## Execution Priority
When rules conflict, follow this precedence (highest first):
1. **User-provided template or explicit instructions** — always override defaults
2. **Scene-specific rules** — override common rules and design-system defaults
3. **Common rules** (this file) — override design-system aesthetic defaults
4. **Design-system defaults** — baseline aesthetics
## Cover Recipes
See `references/design-system.md` for the 7 validated cover recipes (R1R7) and 14 color palettes.
Cover recipe selection: `selectCoverRecipe(docType, industry, titleLength)` — defined in `references/design-system.md` (authoritative source).
---
## Cover Title Layout Rules (Mandatory)
These rules apply to ALL cover recipes (R1R7). They prevent the most common cover quality issues: title overflow, content spilling to page 2, and mid-word line breaks.
### Rule 1: Always use `calcTitleLayout()`
Every cover MUST call `calcTitleLayout(title, availableWidth)` from `design-system.md` to determine:
- **Font size** (dynamically calculated, never hardcoded above 40pt)
- **Line breaks** (semantically split, never mid-word)
**Forbidden:** Passing the full title as a single long TextRun and letting Word auto-wrap. This causes uncontrolled line breaks at arbitrary character positions.
### Rule 2: No single-character orphan lines
If the last line of a title contains only 12 characters, merge it into the previous line. The `splitTitleLines()` function handles this automatically.
### Rule 3: No mid-word breaks for CJK text
Line breaks must occur at semantic boundaries: after particles (e.g., de/yu/he/ji/zhi), punctuation, connectors, spaces, or underscores. Never split a compound term (e.g., a 4-character term like a management specification must not be split into 3+1 characters).
For mixed Chinese+English titles (e.g., "基于Transformer架构的..."), use `estimateTextWidth()` instead of character count for line break calculation. Chinese characters are ~2× wider than English characters at the same font size.
### Rule 4: Maximum 3 title lines on cover
Cover titles must not exceed 3 lines. If the title is too long, reduce font size (down to minimum 24pt) before adding more lines. If it still exceeds 3 lines at 24pt, force 3 lines with longer line lengths.
### Rule 5: Always use `calcCoverSpacing()` for whitespace
Spacing values (`spacing.before`) in cover elements must be dynamically calculated, not hardcoded. Fixed values like `before: 4500` assume a specific title length and will cause overflow with longer titles.
### Rule 6: Cover height budget validation
Before generating, verify that total content height stays within 15638 twips (16838 page height minus 1200 twips safety margin — MS Office renders large fonts taller than calculated). Each recipe in `design-system.md` includes height budget annotations — verify during generation.
### Rule 7: R5 meta info table (academic covers)
Academic cover meta info must use a 2-column table with **percentage widths only** (NOT DXA — WPS breaks with DXA widths):
- **Table width:** adaptive 5575% of page, calculated by `calcR5MetaLayout()` in `design-system.md`. Table is centered via `alignment: CENTER`.
- **Label column:** adaptive 2545% of table width, **LEFT aligned**, plain text label + "". NO full-width space padding, NO right-alignment, NO distributed alignment.
- **Value column:** remaining percentage, **LEFT aligned**, `bottom border single sz=4` = fixed-length underline (same length for all rows regardless of value text length).
- **Label column borders:** none (NO bottom border on label cells).
- ⚠️ Do NOT use DXA widths, full-width space padding (`\u3000`), spacer columns, or tab stops — these render inconsistently between MS Office and WPS.
### Rule 8: Large font paragraphs must set explicit line spacing
When a paragraph uses a font size larger than the document body text (e.g., cover titles at 36pt+), it **MUST** set explicit `spacing.line` to prevent clipping. Without it, the paragraph inherits the document/style default line spacing (often 560 twips for body text), which is smaller than the font height → the top of characters gets clipped.
**Formula:** `spacing.line = Math.ceil(fontPt * 23)` with `lineRule: "atLeast"`
**Example:** A 36pt title needs `spacing: { line: 828, lineRule: "atLeast" }`. Without this, the inherited `line=560` clips the top 160 twips of the text.
This applies to ALL large-font paragraphs (cover titles, chapter headings, decorative text), not just covers.
### Rule 9: Every TextRun on a colored background MUST set explicit `color`
⚠️ **CRITICAL:** When a TextRun is inside a cell/area with a dark or colored background (shading), it **MUST** explicitly set the `color` property. Omitting `color` defaults to black (`#000000`), which is invisible on dark backgrounds.
**Common mistake:** Subtitle or meta text on R1/R2/R4 dark cover blocks without `color` → appears as invisible black text on dark bg.
**Rule:** For any TextRun inside a shaded cell:
- Use `P.cover.titleColor` for title text
- Use `P.cover.subtitleColor` for subtitle text
- Use `P.cover.metaColor` for meta info text
- Use `P.cover.footerColor` for footer text
- **NEVER** rely on default color when background is not white
### Rule 10: Page number API nesting and 3-section numbering
⚠️ **CRITICAL:** Page number settings MUST be nested inside `page.pageNumbers`:
```js
// ❌ WRONG — docx-js ignores top-level pageNumberStart/pageNumberFormatType
properties: { pageNumberStart: 1, pageNumberFormatType: NumberFormat.DECIMAL }
// ✅ CORRECT
properties: { page: { pageNumbers: { start: 1, formatType: NumberFormat.DECIMAL } } }
```
**Standard page numbering (5-zone convention):**
All multi-section documents MUST follow this five-zone page numbering scheme unless the user explicitly requests otherwise.
| Zone | Section | pageNumbers | Footer instrText | Notes |
|------|---------|-------------|-----------------|-------|
| 1. Cover | Title page | None (no footer) | — | Always logical page 1, but number is **hidden** |
| 2. Front matter | Abstract, TOC, Preface | `{ start: 1, formatType: UPPER_ROMAN }` | `PAGE \* ROMAN \* MERGEFORMAT` | Separate Roman numeral sequence (i, ii, iii…) |
| 3. Body | Main content | `{ start: 1, formatType: DECIMAL }` | `PAGE \* arabic \* MERGEFORMAT` | **Resets to 1** |
| 4. Appendix | Appendices (A, B, C…) | Continues body (no reset) | Same as body | No section break needed unless different headers required |
| 5. References | Bibliography | Continues body (no reset) | Same as body | If body ends on p.42, references continue from p.43 |
**Key rules:**
0. **NEVER use "Page X of Y" denominator format.** Footer must show only the current page number (e.g., `1`, `2`, `iii`). Do NOT display total page count. No `Page 3 of 12`, no `3 / 12`, no `第3页/共12页`. Just the bare number. `PageNumber.TOTAL_PAGES` / `NUMPAGES` is **FORBIDDEN** in footers.
1. **Cover is always page 1 internally** but the page number is never displayed. Suppress footer in cover section.
2. **Front matter uses independent Roman numerals** starting at `i`. This sequence is separate from the body.
3. **Body resets to Arabic 1.** The first page of main content is always page `1`.
4. **Appendix and references continue the body sequence.** No reset between body → appendix → references.
5. **Documents without front matter** skip zone 2 (cover hidden, body starts at Arabic 1).
6. **Documents without cover** start body (or front matter) at page 1 directly.
7. **Short documents (≤3 pages):** simple Arabic 1, 2, 3 throughout, no cover/frontmatter distinction.
8. **Single-page documents** (certificates, letters): no page numbering at all.
**3-section docx-js implementation (for documents with TOC):**
At minimum, implement zones 13 as separate docx sections:
```js
// Section 1: Cover — no page number
properties: { page: { /* no pageNumbers */ } }
// No footer children, or empty footer
// Section 2: Front matter — Roman numerals
properties: { page: { pageNumbers: { start: 1, formatType: NumberFormat.UPPER_ROMAN } } }
// Footer: PAGE \* ROMAN \* MERGEFORMAT
// Section 3: Body — Arabic, reset to 1
properties: { page: { pageNumbers: { start: 1, formatType: NumberFormat.DECIMAL } } }
// Footer: PAGE \* arabic \* MERGEFORMAT
// Appendix and References: same section as body (continues numbering)
// Only create a new section if different header/footer content is needed
```
**Post-processing required** (WPS compatibility):
1. Remove empty `<w:pgNumType/>` from cover section XML
2. Patch footer instrText: replace bare `PAGE` with format-specific `PAGE \* ROMAN` or `PAGE \* arabic`
See `toc.md` § Page Number API for full details.

View File

@@ -0,0 +1,538 @@
## Geometric Decoration System — Pure docx-js Decorations
### Design Philosophy
Uses only docx-js native capabilities for visual decoration — no external tools (like Playwright screenshots). Suitable for covers, chapter separators, page background enhancement.
**When to fall back to Playwright?**
Only when gradients, complex illustrations, or brand visuals are needed that pure OOXML cannot express. Default: prefer native solutions below.
### Decoration Element Library
#### 1. Color Strip — Table Simulation
Single-row single-column borderless table + background color to create horizontal color strips.
```js
function colorStrip(color, height = 80) {
return new Table({
width: { size: 100, type: WidthType.PERCENTAGE },
borders: { top: NB, bottom: NB, left: NB, right: NB,
insideHorizontal: NB, insideVertical: NB },
rows: [new TableRow({
height: { value: height, rule: "exact" },
children: [new TableCell({
shading: { type: ShadingType.CLEAR, fill: color.replace("#", "") },
borders: { top: NB, bottom: NB, left: NB, right: NB },
children: [new Paragraph({ children: [] })],
})],
})],
});
}
// ══════════════════════════════════════════════════════════════
// R6 — Editorial Warm (minimal, warm white bg, no decorations)
// ══════════════════════════════════════════════════════════════
// Suitable for: lesson plans (non-STEM), cultural/creative, newsletters,
// event planning, internal reports, light-weight documents
// NOT for: formal business, consulting, finance, government, academic
// Title constraint: single line only (≤20 chars). Longer titles → route to R1.
//
// Structure: 2-row wrapper table (no border, warm bg shading)
// Row 1 (content): category → title → subtitle → fields
// Row 2 (footer): left English title + right label
// All spacing via paragraph indent (WPS safe, no cell margins).
function buildCoverR6(config) {
const P = config.palette;
const PAD_L = 1300, PAD_R = 1100;
const ind = { left: PAD_L, right: PAD_R };
const FOOTER_H = 900;
const CONTENT_H = 16838 - FOOTER_H;
const shading = { fill: P.bg || "F7F7F5", type: ShadingType.CLEAR };
// ⚠️ R6 uses a simplified title layout: prefer single line, shrink font to fit
const availW = 11906 - PAD_L - PAD_R;
const { titlePt, titleLines } = calcTitleLayoutR6(config.title, availW, 36, 22);
const titleSize = titlePt * 2;
const lineH = Math.ceil(titlePt * 23 * 1.3);
// Dynamic top spacing
const titleH = titleLines.length * (titleSize * 10 + 200);
const categoryH = 22 * 10 + 900;
const subtitleH = config.subtitle ? (28 * 10 + 1200) : 0;
const fieldsH = (config.metaLines || []).length * (24 * 10 + 100);
const contentH = categoryH + titleH + subtitleH + fieldsH;
const remaining = Math.max(CONTENT_H - 1200 - contentH, 400);
const topSpacing = Math.floor(remaining * 0.55);
const children = [];
// 1. Top spacer (dynamic)
children.push(new Paragraph({ indent: ind, spacing: { before: topSpacing } }));
// 2. Category label (small, wide letter-spacing)
if (config.englishLabel) {
children.push(new Paragraph({
indent: ind, spacing: { after: 900 },
children: [new TextRun({
text: config.englishLabel, size: 22,
color: P.cover.metaColor || "9A9A9A",
font: { ascii: "Calibri", eastAsia: "Microsoft YaHei" },
characterSpacing: 60,
})],
}));
}
// 3. Title (single line preferred, dynamic font size)
for (let i = 0; i < titleLines.length; i++) {
children.push(new Paragraph({
indent: ind,
spacing: { after: i < titleLines.length - 1 ? 60 : 300, line: lineH, lineRule: "atLeast" },
children: [new TextRun({
text: titleLines[i], size: titleSize,
color: P.cover.titleColor || "2C2C2C",
font: { ascii: "Calibri", eastAsia: "Microsoft YaHei" },
characterSpacing: 30,
})],
}));
}
// 4. Subtitle
if (config.subtitle) {
children.push(new Paragraph({
indent: ind, spacing: { after: 1200 },
children: [new TextRun({
text: config.subtitle, size: 28,
color: P.cover.subtitleColor || "6B6B6B",
font: { ascii: "Calibri", eastAsia: "Microsoft YaHei" },
characterSpacing: 15,
})],
}));
}
// 5. Meta fields (tab-aligned label + value)
for (const line of (config.metaLines || [])) {
// Expect "labelvalue" format or plain text
const sep = line.indexOf("") !== -1 ? "" : (line.indexOf(":") !== -1 ? ":" : null);
const label = sep ? line.split(sep)[0].trim() : line;
const value = sep ? line.split(sep).slice(1).join(sep).trim() : "";
children.push(new Paragraph({
indent: ind, spacing: { after: 100 },
tabStops: [{ type: TabStopType.LEFT, position: PAD_L + 1600 }],
children: [
new TextRun({ text: label, size: 22, color: P.cover.metaColor || "9A9A9A",
font: { ascii: "Calibri", eastAsia: "Microsoft YaHei" }, characterSpacing: 20 }),
...(value ? [
new TextRun({ text: "\t" }),
new TextRun({ text: value, size: 24, color: P.cover.subtitleColor || "6B6B6B",
font: { ascii: "Calibri", eastAsia: "Microsoft YaHei" }, characterSpacing: 8 }),
] : []),
],
}));
}
// 6. Footer (2-column borderless table)
const footerLeft = config.footerLeft || "";
const footerRight = config.footerRight || "";
// Adaptive font size for long English footer text
const flSize = footerLeft.length > 60 ? 14 : (footerLeft.length > 40 ? 16 : 18);
const flSpacing = footerLeft.length > 60 ? 5 : (footerLeft.length > 40 ? 10 : 20);
const footerTable = new Table({
width: { size: 100, type: WidthType.PERCENTAGE },
layout: TableLayoutType.FIXED, borders: allNoBorders,
rows: [new TableRow({
children: [
new TableCell({
width: { size: 70, type: WidthType.PERCENTAGE }, borders: noBorders, shading,
children: [new Paragraph({
indent: { left: PAD_L },
children: [new TextRun({ text: footerLeft, size: flSize,
color: P.cover.footerColor || "9A9A9A",
font: { ascii: "Calibri" }, characterSpacing: flSpacing })],
})],
}),
new TableCell({
width: { size: 30, type: WidthType.PERCENTAGE }, borders: noBorders, shading,
children: [new Paragraph({
alignment: AlignmentType.RIGHT, indent: { right: PAD_R },
children: [new TextRun({ text: footerRight, size: 18,
color: P.cover.footerColor || "9A9A9A",
font: { ascii: "Calibri" }, characterSpacing: 20 })],
})],
}),
],
})],
});
// 7. 2-row wrapper (content + footer)
return [new Table({
width: { size: 100, type: WidthType.PERCENTAGE },
layout: TableLayoutType.FIXED, borders: allNoBorders,
rows: [
new TableRow({
height: { value: CONTENT_H, rule: "exact" },
children: [new TableCell({
shading, borders: noBorders,
margins: { top: 0, bottom: 0, left: 0, right: 0 },
verticalAlign: VerticalAlign.TOP,
children,
})],
}),
new TableRow({
height: { value: FOOTER_H, rule: "exact" },
children: [new TableCell({
shading, borders: noBorders,
margins: { top: 0, bottom: 0, left: 0, right: 0 },
verticalAlign: VerticalAlign.CENTER,
children: [footerTable],
})],
}),
],
})];
}
// R6 title layout: prefer FEWER lines over larger font size (single line best)
function calcTitleLayoutR6(title, availableWidthTw, preferredPt, minPt) {
const step = 2;
// Try to fit in 1 line (shrink font if needed)
for (let pt = preferredPt; pt >= minPt; pt -= step) {
const charWidthTw = pt * 23 * 0.5; // CJK ~50% em width
const charsPerLine = Math.floor(availableWidthTw / charWidthTw);
if (title.length <= charsPerLine) return { titlePt: pt, titleLines: [title] };
}
// Can't fit in 1 line, try 2 lines at largest possible font
for (let pt = preferredPt; pt >= minPt; pt -= step) {
const charWidthTw = pt * 23 * 0.5;
const charsPerLine = Math.floor(availableWidthTw / charWidthTw);
const lines = splitTitleLines(title, charsPerLine);
if (lines.length <= 2) return { titlePt: pt, titleLines: lines };
}
// Fallback: minPt, up to 3 lines
const charWidthTw = minPt * 23 * 0.5;
const charsPerLine = Math.floor(availableWidthTw / charWidthTw);
return { titlePt: minPt, titleLines: splitTitleLines(title, charsPerLine) };
}
// Usage: cover top decoration
// children: [colorStrip(P.accent, 120), ...]
```
#### 2. Side Ribbon
Uses left border to create vertical ribbon effect.
```js
function sideRibbon(content, color, width = 14) {
return new Paragraph({
border: {
left: { style: BorderStyle.SINGLE, size: width, color: color.replace("#", ""), space: 12 },
},
indent: { left: 240 },
spacing: { before: 100, after: 100 },
children: content,
});
}
// Usage: emphasis quotes, chapter tips
// sideRibbon([new TextRun({ text: "Key Insight", bold: true })], P.accent)
```
#### 3. Border Compositions
```js
// Top thick line + bottom thin line — title area frame
function frameTitle(titleRuns) {
return new Paragraph({
border: {
top: { style: BorderStyle.SINGLE, size: 18, color: c(P.accent) },
bottom: { style: BorderStyle.SINGLE, size: 4, color: c(P.accent) },
},
spacing: { before: 400, after: 200 },
alignment: AlignmentType.CENTER,
children: titleRuns,
});
}
// L-shape border — left + bottom
function lShapeBorder(content) {
return new Paragraph({
border: {
left: { style: BorderStyle.SINGLE, size: 12, color: c(P.accent), space: 10 },
bottom: { style: BorderStyle.SINGLE, size: 12, color: c(P.accent) },
},
indent: { left: 300 },
spacing: { before: 200, after: 300 },
children: content,
});
}
// Double-line frame — top and bottom double lines
function doubleLine(content) {
return new Paragraph({
border: {
top: { style: BorderStyle.DOUBLE, size: 6, color: c(P.accent) },
bottom: { style: BorderStyle.DOUBLE, size: 6, color: c(P.accent) },
},
spacing: { before: 200, after: 200 },
alignment: AlignmentType.CENTER,
children: content,
});
}
```
#### 4. Gradient Simulation
Multiple narrow color strips to simulate gradient effect.
```js
function gradientStrip(startColor, endColor, steps = 5, totalHeight = 200) {
const rows = [];
const h = Math.floor(totalHeight / steps);
for (let i = 0; i < steps; i++) {
const ratio = i / (steps - 1);
const blended = blendColors(startColor, endColor, ratio);
rows.push(new TableRow({
height: { value: h, rule: "exact" },
children: [new TableCell({
shading: { type: ShadingType.CLEAR, fill: blended },
borders: { top: NB, bottom: NB, left: NB, right: NB },
children: [new Paragraph({ children: [] })],
})],
}));
}
return new Table({
width: { size: 100, type: WidthType.PERCENTAGE },
borders: { top: NB, bottom: NB, left: NB, right: NB,
insideHorizontal: NB, insideVertical: NB },
rows,
});
}
function blendColors(hex1, hex2, ratio) {
const r1 = parseInt(hex1.slice(1, 3), 16), g1 = parseInt(hex1.slice(3, 5), 16), b1 = parseInt(hex1.slice(5, 7), 16);
const r2 = parseInt(hex2.slice(1, 3), 16), g2 = parseInt(hex2.slice(3, 5), 16), b2 = parseInt(hex2.slice(5, 7), 16);
const r = Math.round(r1 + (r2 - r1) * ratio), g = Math.round(g1 + (g2 - g1) * ratio), b = Math.round(b1 + (b2 - b1) * ratio);
return `${r.toString(16).padStart(2,"0")}${g.toString(16).padStart(2,"0")}${b.toString(16).padStart(2,"0")}`;
}
```
#### 5. Symbol Ornaments
```js
// Section divider line — for chapter separation
function ornamentDivider(symbol = "◆", count = 3) {
const ornament = Array(count).fill(symbol).join(" ");
return new Paragraph({
alignment: AlignmentType.CENTER,
spacing: { before: 400, after: 400 },
children: [new TextRun({ text: ornament, size: 20, color: c(P.accent) })],
});
}
// Common decoration symbols
// ◆ ◇ ● ○ ★ ☆ ■ □ ▲ △ ─ ━ ═ ║ ╔ ╗ ╚ ╝
// Ornamental: ❧ ❦ ✦ ✧ ✿ ❀ ❁ ※
```
#### 6. Info Card — Table Implementation
```js
function infoCard(title, items, accentColor) {
const ac = accentColor.replace("#", "");
const headerRow = new TableRow({
children: [new TableCell({
columnSpan: 2,
shading: { type: ShadingType.CLEAR, fill: ac },
margins: { top: 80, bottom: 80, left: 160, right: 160 },
borders: { top: NB, bottom: NB, left: NB, right: NB },
children: [new Paragraph({
children: [new TextRun({ text: title, bold: true, size: 24, color: "FFFFFF" })],
})],
})],
});
const dataRows = items.map(([label, value]) => new TableRow({
children: [
new TableCell({
width: { size: 30, type: WidthType.PERCENTAGE },
margins: { top: 60, bottom: 60, left: 160, right: 80 },
shading: { type: ShadingType.CLEAR, fill: "F8F9FA" },
borders: { bottom: { style: BorderStyle.SINGLE, size: 1, color: "E0E0E0" },
top: NB, left: NB, right: NB },
children: [new Paragraph({ children: [new TextRun({ text: label, size: 21, color: "666666" })] })],
}),
new TableCell({
margins: { top: 60, bottom: 60, left: 80, right: 160 },
borders: { bottom: { style: BorderStyle.SINGLE, size: 1, color: "E0E0E0" },
top: NB, left: NB, right: NB },
children: [new Paragraph({ children: [new TextRun({ text: value, size: 21 })] })],
}),
],
}));
return new Table({
width: { size: 80, type: WidthType.PERCENTAGE },
alignment: AlignmentType.CENTER,
borders: { top: NB, bottom: NB, left: NB, right: NB,
insideHorizontal: NB, insideVertical: NB },
rows: [headerRow, ...dataRows],
});
}
```
// R7 — Swiss Tech Minimalist (slate grey bg, Klein blue accent, asymmetric layout)
// Suitable for: cultural/creative research, trend reports, brand strategy, design deliverables
// Palette: ST-1 (exclusive)
// Layout: left-aligned title (upper 20%), right-shifted subtitle with top rule,
// right-aligned info block with accent right border, Swiss cross anchor
// Key features: ■ square accent dot, open-frame tables, large whitespace
//
// ⚠️ MANDATORY: All cover non-negotiables apply (margin=0, 16838 exact, allNoBorders)
// ⚠️ Title uses calcTitleLayout() with maxPt=36 (not 40 — R7 uses lighter visual weight)
function buildCoverR7(config) {
const P = palettes[config.palette || "ST-1"];
const C = P.cover;
const padL = 600;
// Title layout — R7 uses 36pt max (lighter than R1-R4's 40pt)
const availW = 11906 - padL - 600;
const { titlePt, titleLines } = calcTitleLayout(config.title, availW, 36, 24);
const titleSize = titlePt * 2;
const lineH = Math.ceil(titlePt * 23);
// Dynamic spacing based on title lines
const topSpacer = titleLines.length <= 2 ? 1200 : 800;
const subtitleSpacer = titleLines.length <= 2 ? 1400 : 800;
const infoSpacer = titleLines.length <= 2 ? 2200 : 1200;
const children = [];
// 1. Swiss cross anchor — top-left decorative element
children.push(new Paragraph({
spacing: { before: 600 },
indent: { left: padL },
children: [new TextRun({
text: "\uFF0B", // fullwidth plus
size: 40, bold: true, color: C.titleColor,
font: { ascii: "Arial", eastAsia: "SimHei" },
})],
}));
// 2. Top spacer
children.push(new Paragraph({ spacing: { before: topSpacer } }));
// 3. Title lines — left-aligned, last line has accent ■
titleLines.forEach((line, i) => {
const isLast = i === titleLines.length - 1;
const runs = [new TextRun({
text: line, size: titleSize, color: C.titleColor,
font: { ascii: "Arial", eastAsia: "Noto Sans SC" },
})];
if (isLast) {
runs.push(new TextRun({
text: " \u25A0", // ■ black square
size: 24, color: P.accent,
font: { ascii: "Arial" },
}));
}
children.push(new Paragraph({
indent: { left: padL },
spacing: { after: isLast ? 200 : 80, line: lineH, lineRule: "atLeast" },
children: runs,
}));
});
// 4. Subtitle spacer
children.push(new Paragraph({ spacing: { before: subtitleSpacer } }));
// 5. Subtitle — right-shifted, top border rule, wide character spacing
if (config.subtitle) {
children.push(new Paragraph({
indent: { left: 3800, right: 600 },
border: { top: { style: BorderStyle.SINGLE, size: 2, color: C.titleColor, space: 14 } },
spacing: { after: 200 },
children: [new TextRun({
text: config.subtitle, size: 26, color: C.subtitleColor,
font: { ascii: "Arial", eastAsia: "Noto Sans SC" },
characterSpacing: 40,
})],
}));
}
// 6. Decorative horizontal line
children.push(new Paragraph({
spacing: { before: 600 },
border: { bottom: { style: BorderStyle.SINGLE, size: 1, color: "C8D0DC", space: 0 } },
}));
// 7. Info spacer
children.push(new Paragraph({ spacing: { before: infoSpacer } }));
// 8. Info footer — right-aligned, 4 label+value pairs, accent right border
// Standard fields: ORGANIZATION, RESPONSIBILITY, REPORT NUMBER, DATE & EDITION
const metaEntries = config.metaEntries || [
{ label: "ORGANIZATION", value: config.organization || "" },
{ label: "RESPONSIBILITY", value: config.responsibility || "" },
{ label: "REPORT NUMBER", value: config.reportNumber || "" },
{ label: "DATE & EDITION", value: config.dateEdition || "" },
];
for (const entry of metaEntries) {
// Label — 7pt uppercase English
children.push(new Paragraph({
alignment: AlignmentType.RIGHT,
indent: { right: 800 },
border: { right: { style: BorderStyle.SINGLE, size: 12, color: P.accent, space: 16 } },
spacing: { after: 20 },
children: [new TextRun({
text: entry.label, size: 14, color: C.metaColor,
font: { ascii: "Arial" },
characterSpacing: 20,
})],
}));
// Value — 11pt bold
children.push(new Paragraph({
alignment: AlignmentType.RIGHT,
indent: { right: 800 },
border: { right: { style: BorderStyle.SINGLE, size: 12, color: P.accent, space: 16 } },
spacing: { after: 280 },
children: [new TextRun({
text: entry.value, size: 22, bold: true, color: C.titleColor,
font: { ascii: "Arial", eastAsia: "Noto Sans SC" },
})],
}));
}
// Wrap in 16838 exact wrapper table
return [new Table({
width: { size: 100, type: WidthType.PERCENTAGE },
layout: TableLayoutType.FIXED,
borders: allNoBorders,
rows: [new TableRow({
height: { value: 16838, rule: "exact" },
children: [new TableCell({
shading: { type: ShadingType.CLEAR, fill: P.bg },
borders: noBorders,
verticalAlign: VerticalAlign.TOP,
children,
})],
})],
})];
}
### Decoration Usage Scenarios
| Scenario | Recommended Decoration | Combination |
|------|----------|----------|
| Report cover | Color strip + L-frame border | Top strip → Title area → L-frame author info |
| Proposal cover | Gradient simulation + double-line frame | Gradient bg → Double-line title |
| Chapter separator | Symbol ornament + side ribbon | Symbol divider → New chapter title with ribbon |
| Summary card | Info card | Standalone card displaying key metrics |
| Academic cover | Color strip + info table | Top strip → School name → Title → Info table |
---

File diff suppressed because it is too large Load Diff

View File

@@ -0,0 +1,257 @@
# docx-js Advanced Features
Advanced API for complex document scenarios. Load this when creating documents with TOC, cover pages, footnotes, multi-section layouts, or post-processing needs.
## Table of Contents (TOC)
**→ See `references/toc.md` for the complete TOC reference** (3-step process, code examples, page numbering, common bugs, checklist).
## Cover Page Design (Vertical Centering)
Use large `spacing.before` to push content down for visual centering:
```js
// Approximate vertical center on A4:
// Total printable height ≈ 14000 twips
// For title at ~40% from top: before = 5600
const coverSection = {
properties: {
page: { /* standard A4 */ },
// No headers/footers on cover page
},
children: [
new Paragraph({ spacing: { before: 5600 } }), // spacer
new Paragraph({
alignment: AlignmentType.CENTER,
children: [new TextRun({
text: title,
font: { ascii: "Calibri", eastAsia: "SimHei" },
size: 52, bold: true, color: palette.primary,
})],
}),
// ... subtitle, author, date
],
};
```
For multi-section documents, put the cover in its own section so it can have different headers/footers.
## Footnotes
```js
const { FootnoteReferenceRun, Footnote } = require("docx");
const doc = new Document({
footnotes: {
1: { children: [new Paragraph({ children: [new TextRun({ text: "Smith, J. (2024). Research Methods. Academic Press, pp. 45-67.", size: 18 })] })] },
2: { children: [new Paragraph({ children: [new TextRun({ text: "Zhang, W. (2023). \u201c数据分析方法研究\u201d. 科学通报, 68(12), 1234-1250.", size: 18 })] })] },
},
sections: [{
children: [
new Paragraph({
children: [
new TextRun({ text: "According to recent studies" }),
new FootnoteReferenceRun(1), // superscript [1]
new TextRun({ text: ", data analysis methods have evolved" }),
new FootnoteReferenceRun(2), // superscript [2]
new TextRun({ text: "." }),
],
}),
],
}],
});
```
### Academic Reference Pattern
For sequential references [1][2][3]..., pre-define all footnotes in the `footnotes` object with numeric keys, then reference them inline with `FootnoteReferenceRun(n)`.
## keepNext — Element Binding
Prevent page breaks between related elements:
```js
// Heading stays with next paragraph
new Paragraph({
heading: HeadingLevel.HEADING_2,
keepNext: true, // don't break after this
children: [new TextRun({ text: "Table 1: Results" })],
})
// Table immediately follows on same page
// Caption stays with image
new Paragraph({
keepNext: true,
alignment: AlignmentType.CENTER,
children: [new TextRun({ text: "Figure 1: Architecture Diagram", italics: true, size: 20 })],
})
// ImageRun paragraph follows
```
Use `keepNext: true` for:
- Heading → first paragraph of section
- Table caption → table
- Image → image caption
- "Figure X" label → image
## Page Break Rules
Follow the document type strategy defined in SOUL.md Rule 1.
**Structural breaks (always):**
- Cover page → TOC
- TOC → main content
- Main content → back cover
**Content breaks (by document type):**
- Academic / teaching → `new Paragraph({ children: [new PageBreak()] })` before each H1 chapter
- Business report → PageBreak before each H1; H2 flows naturally
- Resume / contract / letter → No content page breaks
- Short article → No content page breaks
**Anti-tear (mandatory):**
```js
// Heading stays with next paragraph
new Paragraph({
heading: HeadingLevel.HEADING_1,
keepNext: true,
children: [new TextRun("Chapter Title")],
})
// Table caption stays with table
new Paragraph({
keepNext: true,
children: [new TextRun({ text: "Table 1: Summary", italics: true })],
})
// Image caption stays with image
new Paragraph({
keepNext: true,
children: [new TextRun({ text: "Figure 1: Architecture", italics: true })],
})
```
**Never:**
- PageBreak inside tables
- PageBreak as standalone element (must be inside Paragraph)
- PageBreak at the END of the last section (causes blank page)
```js
// Correct: page break between cover and TOC
new Paragraph({ children: [new PageBreak()] })
```
## Quotes Escaping in JS Strings
**⚠️⚠️⚠️ CRITICAL — #1 MOST COMMON BUG ⚠️⚠️⚠️**
Bare Chinese curly quotation marks (`""` `''`) in JS string literals **WILL break syntax and crash document generation**. This bug occurs most often in **Chinese body text** where curly quotes are used for emphasis, proper nouns, event names, or quoted speech — e.g., `"双11"`, `"前低后高"`, `"618"大促`. **Every single occurrence** of `""''` in text content MUST be Unicode-escaped. No exceptions.
**MANDATORY RULE: Before writing ANY `TextRun`, `para()`, or string containing Chinese text, scan the text for `""''` characters and replace ALL of them with `\u201c \u201d \u2018 \u2019`.**
| Character | Unicode | Escape method |
|-----------|---------|---------------|
| `"` `"` | `\u201c` `\u201d` | Unicode escape `\u201c` `\u201d` |
| `'` `'` | `\u2018` `\u2019` | Unicode escape `\u2018` `\u2019` |
| `"` | U+0022 | `\"` or wrap string in single quotes / template literal |
| `'` | U+0027 | `\'` or wrap string in double quotes / template literal |
```js
// ❌ WRONG — curly quotes in Chinese text break JS syntax (VERY COMMON MISTAKE)
content.push(para("2025年四个季度行业增速呈现"前低后高"的态势。在"618"大促、"双11""双12"活动拉动下增长显著。"));
new TextRun({ text: "他说"你好"" })
new TextRun({ text: 'It's a test' })
// ✅ CORRECT — ALL curly quotes replaced with Unicode escapes
content.push(para("2025年四个季度行业增速呈现\u201c前低后高\u201d的态势。在\u201c618\u201d大促、\u201c双11\u201d\u201c双12\u201d活动拉动下增长显著。"));
new TextRun({ text: "他说\u201c你好\u201d" })
new TextRun({ text: "It\u2019s a test" })
// ✅ CORRECT — straight quotes escaped or use alternate delimiters
new TextRun({ text: "He said \"hello\"" })
new TextRun({ text: 'He said "hello"' })
new TextRun({ text: `He said "hello"` })
```
## Multi-Section Documents
Different headers/footers per section:
```js
const doc = new Document({
sections: [
{
// Section 1: Cover — no header/footer
properties: { page: { /* ... */ } },
children: coverChildren,
},
{
// Section 2: Front matter — Roman page numbers
properties: {
type: SectionType.NEXT_PAGE,
page: {
/* size, margin... */
pageNumbers: { start: 1, formatType: NumberFormat.UPPER_ROMAN },
},
},
headers: { default: new Header({ children: [] }) },
footers: {
default: new Footer({
children: [new Paragraph({
alignment: AlignmentType.CENTER,
children: [new TextRun({ children: [PageNumber.CURRENT], size: 18 })],
})],
}),
},
children: tocAndAbstract,
},
{
// Section 3: Main content — Arabic page numbers
properties: {
type: SectionType.NEXT_PAGE,
page: {
/* size, margin... */
pageNumbers: { start: 1, formatType: NumberFormat.DECIMAL },
},
},
headers: {
default: new Header({
children: [new Paragraph({
alignment: AlignmentType.CENTER,
children: [new TextRun({ text: docTitle, size: 18, color: "888888" })],
})],
}),
},
footers: { default: footerWithPageNumbers },
children: mainContent,
},
],
});
```
## Converting DOCX to PDF
```bash
# Using LibreOffice (headless)
libreoffice --headless --convert-to pdf output.docx
# ⚠️ TOC Rule: If document has TOC, warn user that:
# 1. LibreOffice conversion may show empty TOC
# 2. User should open in Word first, update fields (Ctrl+A → F9), save, then convert
# 3. Or use Word's "Save as PDF" for best results
```
## Converting DOCX to Images
```bash
# Step 1: Convert to PDF
libreoffice --headless --convert-to pdf output.docx
# Step 2: Convert PDF to images
pdftoppm -png -r 200 output.pdf output_page
# This generates output_page-1.png, output_page-2.png, etc.
# Use -r 200 for good quality (200 DPI)
```
Useful for generating preview thumbnails or when user needs images instead of document files.

View File

@@ -0,0 +1,333 @@
# docx-js API Reference
Complete API for creating .docx documents with the `docx` npm package. For advanced features (TOC details, footnotes, PDF conversion), see `docx-js-advanced.md`.
## Setup
```js
const {
Document, Packer, Paragraph, TextRun, Table, TableRow, TableCell,
ImageRun, PageBreak, Header, Footer, PageNumber, NumberFormat,
AlignmentType, HeadingLevel, WidthType, BorderStyle, ShadingType,
PageOrientation, TabStopType, TabStopPosition, ExternalHyperlink,
InternalHyperlink, Bookmark, LevelFormat, TableOfContents,
} = require("docx");
const fs = require("fs");
```
## Document Creation + Export
```js
const doc = new Document({
styles: { /* see Styles section */ },
numbering: { config: [ /* see Lists section */ ] },
sections: [{
properties: {
page: {
size: { width: 11906, height: 16838 },
margin: { top: 1417, bottom: 1417, left: 1701, right: 1417 },
},
},
headers: { default: new Header({ children: [/* */] }) },
footers: { default: new Footer({ children: [/* */] }) },
children: [ /* Paragraphs, Tables, etc. */ ],
}],
});
const buffer = await Packer.toBuffer(doc);
fs.writeFileSync("output.docx", buffer);
```
## Paragraph + TextRun
```js
new Paragraph({
heading: HeadingLevel.HEADING_1, // or HEADING_2, HEADING_3
alignment: AlignmentType.JUSTIFIED,
spacing: { before: 240, after: 120, line: 312 }, // 1.3x mandatory
indent: { firstLine: 480 }, // 2-char CJK indent (480 SimSun / 420 YaHei)
children: [
new TextRun({
text: "Hello",
bold: true,
italics: true,
size: 24, // 12pt = Xiao Si
font: { ascii: "Calibri", eastAsia: "Microsoft YaHei" },
color: "000000", // Pure black for Profile A; for Profile B use palette.body
}),
],
});
// Additional text formatting options
new TextRun({ text: "Underlined", underline: { type: UnderlineType.SINGLE } })
new TextRun({ text: "Highlighted", highlight: "yellow" })
new TextRun({ text: "Strikethrough", strike: true })
new TextRun({ text: "x²", superScript: true })
new TextRun({ text: "H₂O", subScript: true })
new SymbolRun({ char: "2022", font: "Symbol" }) // Bullet •
```
## Table
**⚠️ CRITICAL**: Always set `margins` on TableCell (or at Table level for global default). Without margins, text touches borders.
**⚠️ CRITICAL**: Use `ShadingType.CLEAR` — never `ShadingType.SOLID` (causes black cells).
**⚠️ CRITICAL — Table Cross-Page Control**:
- Header row MUST set `tableHeader: true` (auto-repeat header on page break)
- All rows MUST set `cantSplit: true` (prevent row content split across pages)
- Title paragraph before table MUST set `keepNext: true` (keep title with table)
```js
// ⚠️ Title before table — keepNext keeps title with table
new Paragraph({
keepNext: true, // ← critical
children: [new TextRun({ text: "Table 1 Feature Comparison", bold: true, size: 21 })],
}),
new Table({
width: { size: 100, type: WidthType.PERCENTAGE },
borders: {
top: { style: BorderStyle.SINGLE, size: 2, color: "9AA6B2" },
bottom: { style: BorderStyle.SINGLE, size: 2, color: "9AA6B2" },
left: { style: BorderStyle.NONE },
right: { style: BorderStyle.NONE },
insideHorizontal: { style: BorderStyle.SINGLE, size: 1, color: "D0D0D0" },
insideVertical: { style: BorderStyle.NONE },
},
rows: [
// ⚠️ Header row — tableHeader + cantSplit
new TableRow({
tableHeader: true, // auto-repeat on page break
cantSplit: true, // prevent row split
children: ["Header 1", "Header 2"].map(text =>
new TableCell({
children: [new Paragraph({ children: [new TextRun({ text, bold: true, size: 21 })] })],
shading: { type: ShadingType.CLEAR, fill: "F1F5F9" },
margins: { top: 60, bottom: 60, left: 120, right: 120 },
width: { size: 50, type: WidthType.PERCENTAGE },
})
),
}),
// ⚠️ Data rows — cantSplit
new TableRow({
cantSplit: true, // prevent row split
children: ["Data 1", "Data 2"].map(text =>
new TableCell({
children: [new Paragraph({ children: [new TextRun({ text, size: 21 })] })],
margins: { top: 60, bottom: 60, left: 120, right: 120 },
width: { size: 50, type: WidthType.PERCENTAGE },
})
),
}),
],
});
```
### Column Widths
```js
// Fixed widths (twips)
width: { size: 3000, type: WidthType.DXA }
// Percentage
width: { size: 50, type: WidthType.PERCENTAGE }
```
## ImageRun
**⚠️ CRITICAL**: Always include `type` parameter. Always preserve aspect ratio.
```js
const imageBuffer = fs.readFileSync("chart.png");
// Calculate dimensions preserving aspect ratio
const displayWidth = 500;
const aspectRatio = originalHeight / originalWidth;
const displayHeight = Math.round(displayWidth * aspectRatio);
new Paragraph({
alignment: AlignmentType.CENTER,
children: [
new ImageRun({
data: imageBuffer,
transformation: { width: displayWidth, height: displayHeight },
type: "png", // REQUIRED: "png", "jpg", "gif", "bmp"
}),
],
});
```
## PageBreak
**⚠️ CRITICAL**: PageBreak MUST be inside a Paragraph. Standalone PageBreak crashes Word.
**⚠️ Best Practice**: Attach PageBreak to the end of a **paragraph with text content**. Avoid empty paragraph + PageBreak (may cause blank pages). If using multi-section structure, prefer section breaks over PageBreak.
```js
// ✅ Recommended — PageBreak attached to content paragraph
new Paragraph({
children: [
new TextRun({ text: "End of section" }),
new PageBreak()
]
})
// ✅ Acceptable — but prefer section breaks
new Paragraph({ children: [new PageBreak()] })
// ✅ Best — use section breaks instead of PageBreak
// Place content in different sections — auto page break
```
## Headers & Footers + Page Numbers
```js
headers: {
default: new Header({
children: [
new Paragraph({
alignment: AlignmentType.CENTER,
children: [new TextRun({ text: "Document Title", size: 18, color: "888888" })],
}),
],
}),
},
footers: {
default: new Footer({
children: [
new Paragraph({
alignment: AlignmentType.CENTER,
children: [
new TextRun({ children: [PageNumber.CURRENT], size: 18 }),
],
}),
],
}),
},
```
> ⚠️ **Denominator FORBIDDEN** — never use `PageNumber.TOTAL_PAGES` or "X / Y" format. Show only current page number.
## Styles Definition
The example below is for **Chinese documents** (default). For **English documents**, replace `font` with `"Times New Roman"` throughout.
```js
styles: {
default: {
document: {
run: {
font: { ascii: "Calibri", eastAsia: "Microsoft YaHei" },
size: 24, color: "000000", // Pure black for Profile A; for Profile B use palette.body
},
paragraph: {
spacing: { line: 312 }, // 1.3x mandatory
},
},
heading1: {
run: { font: { ascii: "Calibri", eastAsia: "SimHei" }, size: 32, bold: true, color: "0B1220" },
paragraph: { spacing: { before: 360, after: 160, line: 312 } },
},
heading2: {
run: { font: { ascii: "Calibri", eastAsia: "SimHei" }, size: 28, bold: true, color: "0B1220" },
paragraph: { spacing: { before: 240, after: 120, line: 312 } },
},
heading3: {
run: { font: { ascii: "Calibri", eastAsia: "SimHei" }, size: 24, bold: true, color: "0B1220" },
paragraph: { spacing: { before: 200, after: 100, line: 312 } },
},
},
}
```
## Lists
**⚠️ CRITICAL**: Each separate numbered list MUST use a unique `reference` name. Reusing the same reference causes numbering to continue instead of restarting.
```js
// In Document numbering config
numbering: {
config: [
{
reference: "list-features", // unique name!
levels: [{
level: 0,
format: LevelFormat.DECIMAL,
text: "%1.",
alignment: AlignmentType.LEFT,
style: { paragraph: { indent: { left: 720, hanging: 360 } } },
}],
},
{
reference: "list-benefits", // different name for second list!
levels: [{ /* same config */ }],
},
],
},
// Usage in paragraphs
new Paragraph({
numbering: { reference: "list-features", level: 0 },
children: [new TextRun({ text: "First item" })],
})
```
### Bullet Lists
```js
new Paragraph({
bullet: { level: 0 },
children: [new TextRun({ text: "Bullet item" })],
})
```
## Hyperlinks
### External Link
```js
new ExternalHyperlink({
children: [new TextRun({ text: "Click here", style: "Hyperlink" })],
link: "https://example.com",
})
```
### Internal Link (Bookmark)
```js
// Define bookmark at target
new Paragraph({
children: [
new Bookmark({ id: "section1", children: [new TextRun("Section 1")] }),
],
})
// Link to bookmark
new InternalHyperlink({
children: [new TextRun({ text: "Go to Section 1", style: "Hyperlink" })],
anchor: "section1",
})
```
## Table of Contents (TOC)
**→ See `references/toc.md` for the complete TOC reference.**
Quick reminder: (1) Add `TableOfContents` element + PageBreak, (2) Run `python3 "$DOCX_SCRIPTS/add_toc_placeholders.py" output.docx --auto`, (3) Check exit code.
## Tabs
```js
new Paragraph({
tabStops: [
{ type: TabStopType.RIGHT, position: TabStopPosition.MAX },
],
children: [new TextRun("Left"), new TextRun("\t"), new TextRun("Right")]
})
```
## Constants Quick Reference
- **Underlines:** `SINGLE`, `DOUBLE`, `WAVY`, `DASH`
- **Borders:** `SINGLE`, `DOUBLE`, `DASHED`, `DOTTED`
- **Numbering:** `DECIMAL` (1,2,3), `UPPER_ROMAN` (I,II,III), `LOWER_LETTER` (a,b,c)
- **Symbols:** `"2022"` (•), `"00A9"` (©), `"00AE"` (®), `"2122"` (™)

323
skills/docx/references/faq.md Executable file
View File

@@ -0,0 +1,323 @@
# FAQ — Common Bugs and Fixes
## Bug: Table text touching cell borders
**Symptom**: Text is cramped against table cell edges, no padding.
**Fix**: Set `margins` at the TableCell level:
```js
new TableCell({
margins: { top: 60, bottom: 60, left: 120, right: 120 },
children: [/* ... */],
})
```
---
## Bug: Numbered list doesn't restart
**Symptom**: Second numbered list continues from where the first left off (e.g., starts at 4 instead of 1).
**Fix**: Each separate numbered list MUST use a unique `reference` name in numbering config:
```js
numbering: { config: [
{ reference: "list-A", levels: [{ level: 0, format: LevelFormat.DECIMAL, text: "%1." }] },
{ reference: "list-B", levels: [{ level: 0, format: LevelFormat.DECIMAL, text: "%1." }] },
]}
```
---
## Bug: Cover and content on same page
**Symptom**: Cover page content flows directly into main content without page break.
**Fix**: Add a PageBreak paragraph at the end of cover content:
```js
coverChildren.push(new Paragraph({ children: [new PageBreak()] }));
```
---
## Bug: Three-line table shows all borders
**Symptom**: Table intended to be three-line shows full grid borders.
**Fix**: Set table-level borders to NONE, then override only specific cell borders:
```js
// Table level: all borders NONE
borders: { top: { style: BorderStyle.SINGLE, size: 4 }, bottom: { style: BorderStyle.SINGLE, size: 4 },
left: { style: BorderStyle.NONE }, right: { style: BorderStyle.NONE },
insideHorizontal: { style: BorderStyle.NONE }, insideVertical: { style: BorderStyle.NONE } }
// Header cells: bottom border only
headerCell.borders = { bottom: { style: BorderStyle.SINGLE, size: 2, color: "000000" } }
```
---
## Bug: User requests Chinese font size name (e.g. Wu Hao) but output is wrong
**Symptom**: Font size doesn't match expected Chinese size name.
**Fix**: Use the correct half-point value. `size` in docx-js is in half-points:
- Wu Hao 五号 = 10.5pt → `size: 21`
- Xiao Si 小四 = 12pt → `size: 24`
- Si Hao 四号 = 14pt → `size: 28`
See SKILL.md for complete conversion table.
---
## Bug: Black table cells
**Symptom**: Table cells appear solid black in Word.
**Fix**: Use `ShadingType.CLEAR` not `ShadingType.SOLID`:
```js
// ❌ WRONG
shading: { type: ShadingType.SOLID, fill: "F1F5F9" }
// ✅ CORRECT
shading: { type: ShadingType.CLEAR, fill: "F1F5F9" }
```
---
## Bug: Chinese characters garbled in matplotlib charts
**Symptom**: Chinese text shows as empty boxes □□□ in generated PNG charts.
**Fix**: Configure SimHei font before plotting:
```python
from matplotlib.font_manager import FontProperties
zh_font = FontProperties(fname="/path/to/SimHei.ttf")
plt.title("中文标题", fontproperties=zh_font)
plt.rcParams["axes.unicode_minus"] = False
```
---
## Bug: Image stretched/squashed in document
**Symptom**: Embedded image appears distorted.
**Fix**: Calculate display height from width using original aspect ratio:
```js
const aspectRatio = originalHeight / originalWidth;
const displayWidth = 500;
const displayHeight = Math.round(displayWidth * aspectRatio);
new ImageRun({ data: buf, transformation: { width: displayWidth, height: displayHeight }, type: "png" });
```
---
## Bug: TOC shows empty in generated document
→ See `references/toc.md` — "5 Common TOC Bugs" section for diagnosis and fixes.
---
## Bug: PageBreak standalone crashes Word
**Symptom**: Document fails to open or renders incorrectly.
**Fix**: PageBreak must always be wrapped in a Paragraph:
```js
// ❌ WRONG — standalone
children: [new PageBreak()]
// ✅ CORRECT — inside Paragraph
children: [new Paragraph({ children: [new PageBreak()] })]
```
---
## Bug: Quotation marks break JavaScript syntax — ⚠️ #1 MOST COMMON BUG
**This is the single most frequent code generation error.** Chinese text routinely uses curly quotes `""` for emphasis, proper nouns, and event names (e.g., "双11", "前低后高", "618"大促). These MUST be Unicode-escaped — bare curly quotes silently break JS syntax.
**Rule: scan ALL Chinese text for `""''` and replace with `\u201c \u201d \u2018 \u2019` BEFORE writing the string.**
```js
// ❌ WRONG — curly quotes in Chinese text break syntax (extremely common)
para("行业增速呈现"前低后高"的态势,在"618"大促拉动下增长。")
"他说"你好"" // \u201c \u201d
'It's a test' // \u2019
// ✅ CORRECT — Unicode escapes for ALL curly quotes
para("行业增速呈现\u201c前低后高\u201d的态势在\u201c618\u201d大促拉动下增长。")
"他说\u201c你好\u201d"
"It\u2019s a test"
// ✅ Straight quotes: escape or use alternate delimiters
"He said \"hello\""
'He said "hello"'
```
---
## Bug: Unwanted blank pages in document
**Common causes:**
1. **Trailing PageBreak at end of last section** — pagination should use section breaks or be at the start of the next section
2. **Empty Paragraph overflow** — empty paragraphs at page bottom push to a new page
3. **PageBreak right after Table** — Table already at page bottom, PageBreak creates extra page
**Fix:**
```js
// Post-generation check: last section's children should not end with PageBreak
function removeTrailingPageBreak(section) {
const children = section.children;
if (!children.length) return;
const last = children[children.length - 1];
// If last element is a Paragraph containing only PageBreak, remove it
if (last instanceof Paragraph) {
const runs = last.root?.filter(c => c instanceof PageBreak);
if (runs?.length && !last.root?.some(c => c instanceof TextRun)) {
children.pop();
}
}
}
```
**Prevention rules:**
- Place PageBreak at the **start of the next section**, not the end of the previous one
- Or use separate sections for pagination (no PageBreak needed)
- The last section of a document must NEVER end with a PageBreak
---
## Bug: Different rendering in WPS vs Microsoft Word
**Symptom**: Document looks correct in Word but renders differently in WPS (or vice versa) — misaligned tables, shifted content, clipped text in cells, black cells, or broken covers.
**Root causes and fixes:**
### 1. `ShadingType.SOLID` shows black in WPS
```js
// ❌ WPS shows solid black
shading: { type: ShadingType.SOLID, fill: "F1F5F9" }
// ✅ Both renderers show correct color
shading: { type: ShadingType.CLEAR, fill: "F1F5F9" }
```
### 2. `verticalAlign: "center"` in exact-height rows shifts content
WPS ignores vertical centering in `rule: "exact"` rows — content stays at top, creating visual mismatch.
```js
// ❌ Inconsistent between Word and WPS
new TableRow({ height: { value: 800, rule: "exact" },
children: [new TableCell({ verticalAlign: VerticalAlign.CENTER, ... })] })
// ✅ Use top alignment + margins/spacing for positioning
new TableRow({ height: { value: 800, rule: "exact" },
children: [new TableCell({ verticalAlign: VerticalAlign.TOP,
margins: { top: 200 }, ... })] })
```
### 3. Tab stops misalign in WPS
Tab widths differ between Word and WPS. Never use tabs for alignment.
```js
// ❌ Tab-based alignment — breaks in WPS
new Paragraph({ tabStops: [{ type: TabStopType.RIGHT, position: 8000 }],
children: [new TextRun({ text: "Party A:\tCompany Name" })] })
// ✅ Borderless table for alignment — consistent everywhere
new Table({ borders: allNoBorders, rows: [new TableRow({ children: [
new TableCell({ children: [new Paragraph({ children: [new TextRun({ text: "Party A:" })] })] }),
new TableCell({ children: [new Paragraph({ children: [new TextRun({ text: "Company Name" })] })] }),
] })] })
```
### 4. Nested tables in exact-height cells overflow differently
Word calculates nested table heights more accurately than WPS. Use stacked tables instead.
```js
// ❌ Nested table inside exact-height cell
new TableRow({ height: { value: 16838, rule: "exact" },
children: [new TableCell({ children: [nestedTable1, nestedTable2] })] })
// ✅ Stacked approach — content table + filler table
[contentTable, fillerTable] // both at top level, heights sum to 16838
```
### 5. `characterSpacing` renders differently
Large `characterSpacing` values cause inconsistent letter spacing. Keep ≤ 80.
### 6. `titlePage: true` header/footer suppression
WPS may not correctly hide first-page headers when using `titlePage: true`. Use a separate section for the cover instead.
---
## Bug: Cover spills to second page
**Symptom**: Cover content overflows, with some elements (date, footer, accent strip) appearing on page 2.
**Root cause**: Total content height exceeds 16838 twips (A4 page height). Common when:
- Title is very long (3+ lines at large font size)
- Fixed spacing values assume short title
- Multiple meta lines + subtitle + English label
**Fix**: Always use `calcTitleLayout()` + `calcCoverSpacing()` from `design-system.md`. These dynamically adjust font sizes and spacing to fit within the page. See `design-system.md § Cover Content Overflow Prevention` for the complete checklist.
---
## Bug: Blank page 2 after cover in MS Office (but not WPS)
**Symptom**: Cover displays correctly in WPS but produces a blank second page in MS Office Word.
**Root cause**: The cover wrapper table uses **default docx-js table borders** (`single/auto/sz=4`) instead of explicitly setting `allNoBorders`. Default borders add ~8 twips per edge. MS Office includes border thickness in the exact-height row calculation, pushing total height past 16838 twips → overflow to page 2. WPS is more lenient and absorbs the extra pixels.
**Fix**: Every cover wrapper table MUST explicitly set `borders: allNoBorders`:
```js
const NB = { style: BorderStyle.NONE, size: 0, color: "FFFFFF" };
const allNoBorders = { top: NB, bottom: NB, left: NB, right: NB,
insideHorizontal: NB, insideVertical: NB };
new Table({
borders: allNoBorders, // ← MANDATORY
rows: [new TableRow({
height: { value: 16838, rule: "exact" },
// ...
})],
});
```
**Prevention**: Add to post-generation check — search for any `new Table` in cover code that does not explicitly set `borders`.
---
## Bug: Cover decorative lines appear truncated or misaligned
**Symptom**: Horizontal decorative lines on the cover (accent strips, divider rules) display at different widths in MS Office vs WPS, or appear truncated / not spanning the intended width.
**Root cause**: Lines were implemented using text characters (`───`, `━━━`, `═══`, `——————`) instead of paragraph borders. Character-drawn lines depend on font metrics (character width × count), which vary across rendering engines.
**Fix**: Always use **paragraph borders** for decorative lines:
```js
// ✅ Paragraph border — renders consistently in both MS Office and WPS
new Paragraph({
indent: { left: 1000, right: 1000 },
border: { top: { style: BorderStyle.SINGLE, size: 18, color: accentColor, space: 20 } },
children: [],
})
// ❌ NEVER use text characters for decorative lines
new TextRun({ text: "───────────────" }) // width varies across engines
```
**Note**: This applies to ALL cover recipes (R1R5). Recipe R2 uses `border.top` and `border.bottom` for its double-rule frame — follow this pattern.
---
## Bug: "undefined" appears in document text
**Symptom**: Fields like "Contact: undefined" or "Location: undefined" in generated documents.
**Root cause**: JavaScript outputs the string `"undefined"` when accessing a property that doesn't exist on the config object.
**Fix**: Use `safeText()` helper for ALL user-facing text values:
```js
function safeText(value, placeholder) {
if (value === undefined || value === null || value === "" ||
String(value) === "NaN" || String(value) === "undefined") {
return placeholder || "【Please fill in】";
}
return String(value);
}
// Usage: new TextRun({ text: safeText(config.contact, "【Contact person】") })
```

View File

@@ -0,0 +1,276 @@
# Math Formulas — LaTeX → docx-js Mapping
## Design Philosophy
GLM uses **LaTeX as the formula input syntax**, internally converting to docx-js Math objects.
**Why not write OMML directly?**
- Models are naturally proficient in LaTeX (abundant in training data)
- LaTeX is semantically clear and highly readable
- Conversion layer is encapsulated internally, transparent to the user
## Quick Start
```js
const { Math: OoxmlMath, MathRun, MathFraction, MathSuperScript,
MathSubScript, MathRadical, MathSum, MathSubSuperScript } = require("docx");
// Embed formula in paragraph
new Paragraph({
alignment: AlignmentType.CENTER,
children: [
new OoxmlMath({
children: [/* Math components */]
})
]
})
```
## LaTeX → docx-js Conversion Table
### Basic Operations
| LaTeX | Meaning | docx-js Implementation |
|-------|---------|----------------------|
| `x + y` | Addition | `new MathRun("x + y")` |
| `x - y` | Subtraction | `new MathRun("x y")` (use Unicode minus ``) |
| `x \times y` | Multiplication | `new MathRun("x × y")` |
| `x \div y` | Division | `new MathRun("x ÷ y")` |
| `x \pm y` | Plus-minus | `new MathRun("x ± y")` |
| `x \neq y` | Not equal | `new MathRun("x ≠ y")` |
| `x \leq y` | Less or equal | `new MathRun("x ≤ y")` |
| `x \geq y` | Greater or equal | `new MathRun("x ≥ y")` |
### Fractions
| LaTeX | docx-js |
|-------|---------|
| `\frac{a}{b}` | `new MathFraction({ numerator: [new MathRun("a")], denominator: [new MathRun("b")] })` |
| `\frac{x+1}{x-1}` | `new MathFraction({ numerator: [new MathRun("x+1")], denominator: [new MathRun("x1")] })` |
### Superscripts & Subscripts
| LaTeX | docx-js |
|-------|---------|
| `x^2` | `new MathSuperScript({ children: [new MathRun("x")], superScript: [new MathRun("2")] })` |
| `x_i` | `new MathSubScript({ children: [new MathRun("x")], subScript: [new MathRun("i")] })` |
| `x_i^2` | `new MathSubSuperScript({ children: [new MathRun("x")], subScript: [new MathRun("i")], superScript: [new MathRun("2")] })` |
### Radicals
| LaTeX | docx-js |
|-------|---------|
| `\sqrt{x}` | `new MathRadical({ children: [new MathRun("x")] })` |
| `\sqrt[3]{x}` | `new MathRadical({ children: [new MathRun("x")], degree: [new MathRun("3")] })` |
### Summation & Integrals
| LaTeX | docx-js |
|-------|---------|
| `\sum_{i=1}^{n}` | `new MathSum({ subScript: [new MathRun("i=1")], superScript: [new MathRun("n")], children: [new MathRun("aᵢ")] })` |
### Greek Letters
Use Unicode characters directly:
```js
// LaTeX → Unicode mapping
const GREEK = {
"\\alpha": "α", "\\beta": "β", "\\gamma": "γ", "\\delta": "δ",
"\\epsilon": "ε", "\\zeta": "ζ", "\\eta": "η", "\\theta": "θ",
"\\iota": "ι", "\\kappa": "κ", "\\lambda": "λ", "\\mu": "μ",
"\\nu": "ν", "\\xi": "ξ", "\\pi": "π", "\\rho": "ρ",
"\\sigma": "σ", "\\tau": "τ", "\\phi": "φ", "\\chi": "χ",
"\\psi": "ψ", "\\omega": "ω",
"\\Alpha": "Α", "\\Beta": "Β", "\\Gamma": "Γ", "\\Delta": "Δ",
"\\Theta": "Θ", "\\Lambda": "Λ", "\\Pi": "Π", "\\Sigma": "Σ",
"\\Phi": "Φ", "\\Psi": "Ψ", "\\Omega": "Ω",
};
```
## Complete Formula Examples
### Quadratic Formula
LaTeX: `x = \frac{-b \pm \sqrt{b^2 - 4ac}}{2a}`
```js
new OoxmlMath({
children: [
new MathRun("x = "),
new MathFraction({
numerator: [
new MathRun("b ± "),
new MathRadical({
children: [
new MathSuperScript({
children: [new MathRun("b")],
superScript: [new MathRun("2")],
}),
new MathRun(" 4ac"),
],
}),
],
denominator: [new MathRun("2a")],
}),
],
})
```
### Pythagorean Theorem
LaTeX: `a^2 + b^2 = c^2`
```js
new OoxmlMath({
children: [
new MathSuperScript({ children: [new MathRun("a")], superScript: [new MathRun("2")] }),
new MathRun(" + "),
new MathSuperScript({ children: [new MathRun("b")], superScript: [new MathRun("2")] }),
new MathRun(" = "),
new MathSuperScript({ children: [new MathRun("c")], superScript: [new MathRun("2")] }),
],
})
```
### Trigonometric Identity
LaTeX: `\sin^2\theta + \cos^2\theta = 1`
```js
new OoxmlMath({
children: [
new MathSuperScript({ children: [new MathRun("sin")], superScript: [new MathRun("2")] }),
new MathRun("θ + "),
new MathSuperScript({ children: [new MathRun("cos")], superScript: [new MathRun("2")] }),
new MathRun("θ = 1"),
],
})
```
## Common Exam Formula Templates
### Middle School Math
```js
// Quadratic discriminant
const discriminant = new OoxmlMath({
children: [
new MathRun("Δ = "),
new MathSuperScript({ children: [new MathRun("b")], superScript: [new MathRun("2")] }),
new MathRun(" 4ac"),
],
});
// Circle area
const circleArea = new OoxmlMath({
children: [
new MathRun("S = π"),
new MathSuperScript({ children: [new MathRun("r")], superScript: [new MathRun("2")] }),
],
});
```
### High School Math
```js
// Logarithm change of base
const logChange = new OoxmlMath({
children: [
new MathSubScript({ children: [new MathRun("log")], subScript: [new MathRun("a")] }),
new MathRun("b = "),
new MathFraction({
numerator: [new MathRun("ln b")],
denominator: [new MathRun("ln a")],
}),
],
});
// Arithmetic series sum
const arithmeticSum = new OoxmlMath({
children: [
new MathSubScript({ children: [new MathRun("S")], subScript: [new MathRun("n")] }),
new MathRun(" = "),
new MathFraction({
numerator: [
new MathRun("n("),
new MathSubScript({ children: [new MathRun("a")], subScript: [new MathRun("1")] }),
new MathRun(" + "),
new MathSubScript({ children: [new MathRun("a")], subScript: [new MathRun("n")] }),
new MathRun(")"),
],
denominator: [new MathRun("2")],
}),
],
});
```
### Physics
```js
// Newton's second law
const newton2 = new OoxmlMath({
children: [new MathRun("F = ma")],
});
// Kinetic energy
const kineticEnergy = new OoxmlMath({
children: [
new MathSubScript({ children: [new MathRun("E")], subScript: [new MathRun("k")] }),
new MathRun(" = "),
new MathFraction({
numerator: [new MathRun("1")],
denominator: [new MathRun("2")],
}),
new MathRun("m"),
new MathSuperScript({ children: [new MathRun("v")], superScript: [new MathRun("2")] }),
],
});
```
## Complexity Fallback Strategy
When formulas are too complex (nesting >3 levels) for docx-js Math, **fall back to matplotlib PNG rendering:**
```python
import matplotlib
matplotlib.use("Agg")
import matplotlib.pyplot as plt
def latex_to_png(latex_str: str, output_path: str, fontsize: int = 14, dpi: int = 200):
"""Render LaTeX formula as PNG image"""
fig, ax = plt.subplots(figsize=(0.1, 0.1))
ax.axis("off")
text = ax.text(0, 0.5, f"${latex_str}$", fontsize=fontsize,
transform=ax.transAxes, verticalalignment="center")
fig.canvas.draw()
bbox = text.get_window_extent(fig.canvas.get_renderer())
fig.set_size_inches(bbox.width / dpi + 0.2, bbox.height / dpi + 0.2)
plt.savefig(output_path, dpi=dpi, bbox_inches="tight",
pad_inches=0.05, transparent=True)
plt.close()
return output_path
```
Then embed the PNG in the document:
```js
const formulaImg = fs.readFileSync("formula.png");
new Paragraph({
alignment: AlignmentType.CENTER,
children: [new ImageRun({
data: formulaImg,
transformation: { width: 300, height: 40 }, // adjust based on actual size
type: "png",
})],
})
```
**Fallback rules:**
- Nested fractions >2 levels → fallback
- Matrices/determinants → fallback
- Complex integrals (multiple integrals + limits + integrand) → fallback
- Piecewise functions → fallback
- All other cases → prefer docx-js Math

222
skills/docx/references/ooxml.md Executable file
View File

@@ -0,0 +1,222 @@
# OOXML Editing Reference — Document Library API
**Important: Read this entire document before editing.** This is the primary reference for modifying existing .docx files.
## Document Library (Python) — Primary API
Use the `Document` class from `"$DOCX_SCRIPTS/document.py"` for all edits, tracked changes, and comments. It handles infrastructure automatically (people.xml, RSIDs, settings.xml, comments, relationships, content types).
**Working with Unicode and Entities:**
- Both entity notation and Unicode work for search: `contains="&#8220;Company"``contains="\u201cCompany"`
- Both work for replacement too
### Setup
```bash
# Find the docx skill root
find /mnt/skills -name "document.py" -path "*/docx/scripts/*" 2>/dev/null | head -1
# Skill root = parent of scripts/
# Run with PYTHONPATH
PYTHONPATH=/mnt/skills/docx python your_script.py
```
```python
from scripts.document import Document, DocxXMLEditor
# Basic init (auto-creates temp copy, sets up infrastructure)
doc = Document('unpacked')
# Custom author/initials
doc = Document('unpacked', author="John Doe", initials="JD")
# Enable tracked changes
doc = Document('unpacked', track_revisions=True)
# Custom RSID (auto-generated if omitted)
doc = Document('unpacked', rsid="07DC5ECB")
```
### Finding Nodes
```python
# By text
node = doc["word/document.xml"].get_node(tag="w:p", contains="specific text")
# By line range
para = doc["word/document.xml"].get_node(tag="w:p", line_number=range(100, 150))
# By attributes
node = doc["word/document.xml"].get_node(tag="w:del", attrs={"w:id": "1"})
# By exact line number
para = doc["word/document.xml"].get_node(tag="w:p", line_number=42)
# Combined filters (disambiguation)
node = doc["word/document.xml"].get_node(tag="w:r", contains="Section", line_number=range(2400, 2500))
```
### Tracked Changes
**CRITICAL**: Only mark text that actually changes. Keep unchanged text outside `<w:del>`/`<w:ins>` tags.
**Method Selection**:
- Regular text → `replace_node()` with `<w:del>`/`<w:ins>`, or `suggest_deletion()` for whole elements
- Partially modify another's tracked change → `replace_node()` to nest changes
- Reject another's insertion → `revert_insertion()` (NOT `suggest_deletion()`)
- Reject another's deletion → `revert_deletion()`
```python
# Change one word: "monthly" → "quarterly"
node = doc["word/document.xml"].get_node(tag="w:r", contains="The report is monthly")
rpr = tags[0].toxml() if (tags := node.getElementsByTagName("w:rPr")) else ""
replacement = f'<w:r w:rsidR="00AB12CD">{rpr}<w:t>The report is </w:t></w:r><w:del><w:r>{rpr}<w:delText>monthly</w:delText></w:r></w:del><w:ins><w:r>{rpr}<w:t>quarterly</w:t></w:r></w:ins>'
doc["word/document.xml"].replace_node(node, replacement)
# Delete entire run
node = doc["word/document.xml"].get_node(tag="w:r", contains="text to delete")
doc["word/document.xml"].suggest_deletion(node)
# Delete entire paragraph
para = doc["word/document.xml"].get_node(tag="w:p", contains="paragraph to delete")
doc["word/document.xml"].suggest_deletion(para)
# Insert new content after a node
node = doc["word/document.xml"].get_node(tag="w:r", contains="existing text")
doc["word/document.xml"].insert_after(node, '<w:ins><w:r><w:t>new text</w:t></w:r></w:ins>')
# Add new numbered list item
target_para = doc["word/document.xml"].get_node(tag="w:p", contains="existing list item")
pPr = tags[0].toxml() if (tags := target_para.getElementsByTagName("w:pPr")) else ""
new_item = f'<w:p>{pPr}<w:r><w:t>New item</w:t></w:r></w:p>'
tracked_para = DocxXMLEditor.suggest_paragraph(new_item)
doc["word/document.xml"].insert_after(target_para, tracked_para)
```
### Handling Other Authors' Changes
```python
# Partially delete another author's insertion
node = doc["word/document.xml"].get_node(tag="w:ins", attrs={"w:id": "5"})
replacement = '''<w:ins w:author="Jane Smith" w:date="2025-01-15T10:00:00Z">
<w:r><w:t>quarterly </w:t></w:r>
<w:del><w:r><w:delText>financial </w:delText></w:r></w:del>
<w:r><w:t>report</w:t></w:r>
</w:ins>'''
doc["word/document.xml"].replace_node(node, replacement)
# Reject insertion (wraps in deletion)
ins = doc["word/document.xml"].get_node(tag="w:ins", attrs={"w:id": "5"})
doc["word/document.xml"].revert_insertion(ins)
# Reject deletion (restores deleted content)
del_elem = doc["word/document.xml"].get_node(tag="w:del", attrs={"w:id": "3"})
doc["word/document.xml"].revert_deletion(del_elem)
```
### Comments
```python
doc = Document('unpacked', author="Z.ai", initials="Z")
# Comment on a range
start = doc["word/document.xml"].get_node(tag="w:del", attrs={"w:id": "1"})
end = doc["word/document.xml"].get_node(tag="w:ins", attrs={"w:id": "2"})
doc.add_comment(start=start, end=end, text="Explanation of this change")
# Comment on paragraph
para = doc["word/document.xml"].get_node(tag="w:p", contains="text")
doc.add_comment(start=para, end=para, text="Comment here")
# Comment on newly created tracked change
node = doc["word/document.xml"].get_node(tag="w:r", contains="old")
new_nodes = doc["word/document.xml"].replace_node(
node, '<w:del><w:r><w:delText>old</w:delText></w:r></w:del><w:ins><w:r><w:t>new</w:t></w:r></w:ins>')
doc.add_comment(start=new_nodes[0], end=new_nodes[1], text="Changed per requirements")
# Reply to comment
doc.reply_to_comment(parent_comment_id=0, text="I agree")
```
### Images
```python
from PIL import Image
import shutil, os
doc = Document('unpacked')
media_dir = os.path.join(doc.unpacked_path, 'word/media')
os.makedirs(media_dir, exist_ok=True)
shutil.copy('image.png', os.path.join(media_dir, 'image1.png'))
img = Image.open(os.path.join(media_dir, 'image1.png'))
width_emus = int(6.5 * 914400) # 6.5" usable width
height_emus = int(width_emus * img.size[1] / img.size[0])
# Add relationship
rels_editor = doc['word/_rels/document.xml.rels']
next_rid = rels_editor.get_next_rid()
rels_editor.append_to(rels_editor.dom.documentElement,
f'<Relationship Id="{next_rid}" Type="http://schemas.openxmlformats.org/officeDocument/2006/relationships/image" Target="media/image1.png"/>')
doc['[Content_Types].xml'].append_to(doc['[Content_Types].xml'].dom.documentElement,
'<Default Extension="png" ContentType="image/png"/>')
# Insert
node = doc["word/document.xml"].get_node(tag="w:p", line_number=100)
doc["word/document.xml"].insert_after(node, f'''<w:p><w:r><w:drawing>
<wp:inline distT="0" distB="0" distL="0" distR="0">
<wp:extent cx="{width_emus}" cy="{height_emus}"/>
<wp:docPr id="1" name="Picture 1"/>
<a:graphic xmlns:a="http://schemas.openxmlformats.org/drawingml/2006/main">
<a:graphicData uri="http://schemas.openxmlformats.org/drawingml/2006/picture">
<pic:pic xmlns:pic="http://schemas.openxmlformats.org/drawingml/2006/picture">
<pic:nvPicPr><pic:cNvPr id="1" name="image1.png"/><pic:cNvPicPr/></pic:nvPicPr>
<pic:blipFill><a:blip r:embed="{next_rid}"/><a:stretch><a:fillRect/></a:stretch></pic:blipFill>
<pic:spPr><a:xfrm><a:ext cx="{width_emus}" cy="{height_emus}"/></a:xfrm><a:prstGeom prst="rect"><a:avLst/></a:prstGeom></pic:spPr>
</pic:pic>
</a:graphicData>
</a:graphic>
</wp:inline>
</w:drawing></w:r></w:p>''')
```
### Saving
```python
doc.save() # Validates + copies back to original dir
doc.save('modified-unpacked') # Save to different location
doc.save(validate=False) # Skip validation (debug only)
```
### Direct DOM Manipulation
```python
editor = doc["word/document.xml"]
node = doc["word/document.xml"].get_node(tag="w:p", line_number=5)
parent = node.parentNode
parent.removeChild(node)
# General replacement (without tracked changes)
old = doc["word/document.xml"].get_node(tag="w:p", contains="original")
doc["word/document.xml"].replace_node(old, "<w:p><w:r><w:t>replacement</w:t></w:r></w:p>")
# Chained insertions
node = doc["word/document.xml"].get_node(tag="w:r", line_number=100)
nodes = doc["word/document.xml"].insert_after(node, "<w:r><w:t>A</w:t></w:r>")
nodes = doc["word/document.xml"].insert_after(nodes[-1], "<w:r><w:t>B</w:t></w:r>")
```
## Schema Compliance Quick Reference
- **Element ordering in `<w:pPr>`**: `<w:pStyle>``<w:numPr>``<w:spacing>``<w:ind>``<w:jc>`
- **Whitespace**: `xml:space='preserve'` on `<w:t>` with leading/trailing spaces
- **RSIDs**: 8-digit hex only (0-9, A-F)
- **trackRevisions**: Add `<w:trackRevisions/>` after `<w:proofState>` in settings.xml
- **`<w:del>`/`<w:ins>` placement**: At paragraph level, containing complete `<w:r>` elements. Never nest inside `<w:r>`.
## Validation Rules
The validator ensures document text matches the original after reverting GLM's changes:
- **Never modify text inside another author's `<w:ins>` or `<w:del>` tags**
- **Use nested deletions** to remove another author's insertions
- **Every edit must be tracked** with `<w:ins>` or `<w:del>` tags

264
skills/docx/references/toc.md Executable file
View File

@@ -0,0 +1,264 @@
# Table of Contents (TOC) — Complete Reference
> **This is the single source of truth for all TOC rules.** Other files should reference this file instead of duplicating TOC instructions.
## Overview
DOCX TOC is a **3-step process**: Code → Post-process → User opens Word.
```
Step A: docx-js code generates empty TOC field structure
Step B: add_toc_placeholders.py fills it with visible placeholder entries
Step C: User opens Word → "Update Field" → real page numbers replace placeholders
```
All 3 steps are **mandatory**. Skipping any step results in a broken or empty TOC.
## When to Add TOC
- **Recommended**: Long or complex documents with many headings (reports, theses, papers, manuals)
- **Do NOT add**: Resumes, contracts, letters, exam papers, short documents
- **postcheck rule**: If document contains a "目录" title but no `TableOfContents` element → error
## Step A: Code Generation (docx-js)
Insert **4 elements** in sequence:
```js
const { TableOfContents, Paragraph, TextRun, PageBreak, AlignmentType } = require("docx");
// 1. TOC title — ⛔ DO NOT use HeadingLevel (or TOC will index itself!)
new Paragraph({
alignment: AlignmentType.CENTER,
spacing: { before: 480, after: 360 },
children: [new TextRun({
text: "目 录", // or "Table of Contents" for English docs
bold: true, size: 32,
font: { eastAsia: "SimHei", ascii: "Times New Roman" }
})],
}),
// 2. TOC field element — ⚠️ first parameter is NOT displayed, it's internal name only
new TableOfContents("Table of Contents", {
hyperlink: true,
headingStyleRange: "1-3", // match HeadingLevel range used in document
}),
// 3. ★ MANDATORY Refresh Hint — tells user how to update page numbers
new Paragraph({
spacing: { before: 200 },
children: [new TextRun({
text: "Note: This Table of Contents is generated via field codes. To ensure page number accuracy after editing, please right-click the TOC and select \"Update Field.\"",
italics: true, size: 18, color: "888888"
})]
}),
// 4. ★ MANDATORY PageBreak after TOC — prevents TOC and body merging on same page
new Paragraph({ children: [new PageBreak()] }),
```
### Heading Requirements
**⚠️ CRITICAL**: TOC only picks up paragraphs with `heading: HeadingLevel.HEADING_X`.
```js
// ✅ Correct — Heading style, TOC can index
new Paragraph({
heading: HeadingLevel.HEADING_1,
children: [new TextRun({ text: "第一章 引言", bold: true, size: 32, color: c(P.primary) })]
})
// ❌ Wrong — manual bold + large font, TOC cannot detect
new Paragraph({
children: [new TextRun({ text: "第一章 引言", bold: true, size: 32, color: c(P.primary) })]
})
```
**Exceptions:**
- Cover title: does NOT need Heading style (should not appear in TOC)
- "目录" title: **MUST NOT** use Heading style (prevents TOC from indexing itself)
## Step B: Post-Processing Script
**MUST** run after generating the DOCX file:
```bash
python3 "$DOCX_SCRIPTS/add_toc_placeholders.py" output.docx --auto
```
### What the script does
1. Extracts Heading 1-3 from the document as TOC entries
2. Fixes docx-js fldChar structure bug (begin+instrText+separate merged in one `<w:r>`)
3. Patches `settings.xml` with `updateFields=true` (Word prompts to refresh on open)
4. Ensures Heading styles have `outlineLvl` (required for TOC field update)
5. Ensures TOC 1/2/3 styles exist in `styles.xml`
6. Injects placeholder entries with HYPERLINK + PAGEREF between `separate` and `end` fldChars
7. Handles duplicate heading texts (each gets its own bookmark)
### Error handling
The script **exits with code 1** if:
- No TOC field structure found (missing `TableOfContents` element)
- TOC field has `begin` but no `separate` fldChar (malformed structure)
- Field structure exists but no TOC instrText detected
**If exit code = 1 → the generated code is wrong. Fix the code and regenerate.**
### Options
```bash
# Auto mode (recommended — default behavior)
python3 "$DOCX_SCRIPTS/add_toc_placeholders.py" output.docx --auto
# Manual entries
python3 "$DOCX_SCRIPTS/add_toc_placeholders.py" output.docx \
--entries '[{"level":1,"text":"Chapter 1","page":"1"},{"level":2,"text":"Section 1.1","page":"2"}]'
```
## Step C: User Opens in Word/WPS
- **Word**: Detects `updateFields=true` → prompts "Update field?" → click Yes → real page numbers
- **WPS**: May NOT auto-prompt. User must: right-click TOC → "Update Field" → "Update entire table"
The placeholder entries ensure TOC is **not blank** even without updating — users see heading titles with approximate page numbers.
## Multi-Section Page Numbering
When a document has a TOC, the TOC MUST be in its own section so that body page numbering starts from 1. This applies to **all document types with a TOC** (reports, whitepapers, PRDs, academic papers, etc.) — not just academic papers.
**Mandatory 3-section architecture for documents with cover + TOC:**
```js
sections: [
{ /* Section 1: Cover — no page number, no footer */
properties: {
page: { size: pgSize, margin: pgMargin },
// ⚠️ Do NOT set page.pageNumbers here — docx-js emits empty <pgNumType/> which confuses WPS
},
},
{ /* Section 2: Front matter (abstract, TOC) — Roman numerals */
properties: {
type: SectionType.NEXT_PAGE,
page: {
size: pgSize, margin: pgMargin,
pageNumbers: { start: 1, formatType: NumberFormat.UPPER_ROMAN }, // I, II, III...
},
},
footers: { default: pageNumFooter() }, // see footer rules below
children: [/* abstract + TOC title + TableOfContents + PageBreak */]
},
{ /* Section 3: Body — Arabic numerals starting from 1 */
properties: {
type: SectionType.NEXT_PAGE,
page: {
size: pgSize, margin: pgMargin,
pageNumbers: { start: 1, formatType: NumberFormat.DECIMAL }, // 1, 2, 3...
},
},
footers: { default: pageNumFooter() },
children: [/* body content */]
},
]
```
### ⚠️ Page Number API — Correct Nesting (CRITICAL)
Page number settings MUST be nested inside `page.pageNumbers`, NOT at properties top level:
```js
// ❌ WRONG — docx-js ignores these, pgNumType will be empty
properties: {
pageNumberStart: 1,
pageNumberFormatType: NumberFormat.DECIMAL,
}
// ✅ CORRECT — docx-js writes start= and fmt= attributes
properties: {
page: {
pageNumbers: { start: 1, formatType: NumberFormat.DECIMAL },
},
}
```
### ⚠️ Footer Field Instruction — WPS Compatibility (CRITICAL)
WPS may ignore `pgNumType fmt` in the section properties. To ensure correct display, the footer PAGE field **MUST** include an explicit format switch via **post-processing**:
After generating the docx, unzip and patch each footer XML:
- **Roman numeral footer**: replace `PAGE` with `PAGE \* ROMAN \\** MERGEFORMAT`
- **Arabic numeral footer**: replace `PAGE \* arabic \* MERGEFORMAT`
**⚠️ NEVER use `\* decimal` in instrText** — `decimal` is a docx-js API enum value (`NumberFormat.DECIMAL` for `pgNumType` XML attribute), NOT a valid Word field format switch. Using it causes page numbers to render as "1decimal", "2decimal". The correct Word field switch for Arabic numerals is always `\* arabic`.
```js
// Post-process footer XML:
footerXml = footerXml.replace(
/(<w:instrText[^>]*>)\s*PAGE\s*(<\/w:instrText>)/g,
'$1 PAGE \\* ROMAN \\** MERGEFORMAT $2' // or "arabic" for body section
);
```
Also remove any empty `<w:pgNumType/>` from the cover section (docx-js emits these even when no pageNumbers is set):
```js
docXml = docXml.replace(/<w:pgNumType\/>/g, "");
```
### Page Numbering Rules
| Section | Content | Format | Start | Footer |
|---------|---------|--------|-------|--------|
| Cover | Title page | None | — | No footer |
| Front matter | Abstract, TOC | Roman (I, II, III) | 1 | `PAGE \* ROMAN` |
| Body | Main content | Arabic (1, 2, 3) | 1 | `PAGE \* arabic` |
⚠️ **The body section MUST set `pageNumbers: { start: 1 }`** — otherwise page numbers continue from the front matter pages, causing TOC page references to be offset. This is the #1 cause of "TOC page numbers are wrong".
### Common Causes of Incorrect Page Numbers
| Cause | Fix |
|-------|-----|
| `pageNumberStart` at properties top level | Move to `page: { pageNumbers: { start: 1 } }` |
| Cover section emits empty `<pgNumType/>` | Post-process to remove it |
| Footer uses bare `PAGE` without format switch | Post-process to add `\* roman` or `\* arabic` |
| Cover and body in same section | Separate cover into its own section |
| Multiple sections without pageNumbers.start | Explicitly set on each section needing independent counting |
| headingStyleRange doesn't match headings | Ensure `headingStyleRange: "1-3"` covers all HeadingLevel values used |
| Cover section has header/footer | Don't set header/footer on cover section |
## TOC Refresh Hint (MANDATORY)
**⚠️ When the document contains a TOC, you MUST add the following hint paragraph between the `TableOfContents` element and the PageBreak (so it appears on the TOC page, not the body page).** This ensures users know how to refresh page numbers after editing.
```js
new Paragraph({
spacing: { before: 200 },
children: [new TextRun({
text: "Note: This Table of Contents is generated via field codes. To ensure page number accuracy after editing, please right-click the TOC and select \"Update Field.\"",
italics: true, size: 18, color: "888888"
})]
}),
```
## 5 Common TOC Bugs
| # | Bug | Symptom | Fix |
|---|-----|---------|-----|
| 1 | "目录" heading uses `HeadingLevel.HEADING_1` | TOC includes "目录" as an entry | Remove `heading:` from TOC title paragraph |
| 2 | No `PageBreak` after `TableOfContents` | TOC and body text on same page | Add `new Paragraph({ children: [new PageBreak()] })` after TOC |
| 3 | Missing `TableOfContents` element | Script cannot inject placeholders, TOC is empty | Always include `new TableOfContents(...)` in code |
| 4 | Headings use bold+large instead of `HeadingLevel` | TOC is empty even after running script | Change all body headings to `heading: HeadingLevel.HEADING_X` |
| 5 | Script not run or exit code ignored | TOC page shows only title + blank space | Always run script; if exit code = 1, fix code and regenerate |
## Checklist (for self-check during generation)
- [ ] Document has 3+ H1 → TOC is included
- [ ] "目录" heading does NOT use `HeadingLevel` (prevents self-indexing)
- [ ] `new TableOfContents(...)` element present (not just plain text)
- [ ] `PageBreak` exists after TOC element (prevents merging with body)
- [ ] All body chapter headings use `heading: HeadingLevel.HEADING_X`
- [ ] `add_toc_placeholders.py --auto` runs after generation
- [ ] Script exit code checked — if 1, fix code and regenerate
- [ ] TOC page has visible placeholder content (not empty)
- [ ] **TOC Refresh Hint present** — italic gray note after TOC PageBreak telling user to right-click → "Update Field"
- [ ] `outlineLevel: 0` for H1, `1` for H2, etc. (needed for TOC field update)

88
skills/docx/routes/comment.md Executable file
View File

@@ -0,0 +1,88 @@
# Route: Add Comments
## Method 1: python-docx (Recommended — Simple)
```python
from docx import Document
from docx.oxml.ns import qn
from docx.oxml import OxmlElement
from datetime import datetime
def add_comment(paragraph, comment_text, author="GLM", initials="G"):
"""Add a comment to an entire paragraph."""
# Create comment reference
comment_id = str(hash(comment_text) % 10000)
# Add to comments.xml (need to create if not exists)
# ... complex XML manipulation required
pass
# Simpler approach: use python-docx-ng or manipulate XML directly
```
**Note**: python-docx has limited native comment support. For reliable results, use the OOXML method.
## Method 2: OOXML Direct Manipulation (Reliable)
### Step 1: Unpack
```bash
mkdir work && cd work && unzip ../input.docx
```
### Step 2: Create/update word/comments.xml
```xml
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<w:comments xmlns:w="http://schemas.openxmlformats.org/wordprocessingml/2006/main"
xmlns:r="http://schemas.openxmlformats.org/officeDocument/2006/relationships">
<w:comment w:id="1" w:author="Reviewer" w:date="2024-01-15T10:30:00Z" w:initials="R">
<w:p>
<w:r>
<w:t>This section needs more detail.</w:t>
</w:r>
</w:p>
</w:comment>
</w:comments>
```
### Step 3: Mark comment range in document.xml
```xml
<w:commentRangeStart w:id="1"/>
<w:r><w:t>Text being commented on</w:t></w:r>
<w:commentRangeEnd w:id="1"/>
<w:r>
<w:rPr><w:rStyle w:val="CommentReference"/></w:rPr>
<w:commentReference w:id="1"/>
</w:r>
```
### Step 4: Update relationships
In `word/_rels/document.xml.rels`, add:
```xml
<Relationship Id="rIdComments" Type="http://schemas.openxmlformats.org/officeDocument/2006/relationships/comments" Target="comments.xml"/>
```
### Step 5: Update Content_Types
In `[Content_Types].xml`, ensure:
```xml
<Override PartName="/word/comments.xml" ContentType="application/vnd.openxmlformats-officedocument.wordprocessingml.comments+xml"/>
```
### Step 6: Pack
```bash
zip -r ../output.docx . -x ".*"
```
## When to Use Each Method
| Scenario | Method |
|----------|--------|
| Add 1-2 simple comments | OOXML |
| Batch review (many comments) | OOXML with Python script |
| Comment on specific words | OOXML (precise range control) |
| Quick annotation | python-docx if available |

207
skills/docx/routes/create.md Executable file
View File

@@ -0,0 +1,207 @@
# Route: Create New Document
## Workflow
```
0. Check if user provided a reference template (PDF/docx) → if yes, use Template-Following Mode below
1. Load `references/design-system.md` → select palette and cover recipe
2. Load `references/common-rules.md` → shared layout, font, placeholder rules
3. Check user keywords → load scene file if applicable
4. Load `references/docx-js-core.md`
5. If complex → also load `references/docx-js-advanced.md`
6. Plan document structure (outline)
7. Write JS/TS using docx library
⚠️ **BEFORE writing any string**: scan ALL Chinese text for curly quotes `""''` and replace with `\u201c \u201d \u2018 \u2019` — bare curly quotes break JS syntax (see docx-js-advanced.md § Quotes Escaping)
8. Run with `bun run generate.js` (or `node generate.js`)
9. If TOC → run `python3 "$DOCX_SCRIPTS/add_toc_placeholders.py" output.docx --auto`
10. Run post-generation checklist (see SKILL.md)
```
## Template-Following Mode
When the user provides a reference document (PDF/docx) as a **formatting template** (e.g., "generate following this template format", "refer to this sample"), switch to template-following mode instead of the standard recipe-based workflow:
1. **Extract the template's structure** — cover layout, section order, heading hierarchy, page breaks, special pages (e.g., advisor comments page, approval form)
2. **Replicate structure exactly** — every major structural unit becomes a **separate section** (cover, body, appendix/form pages) with appropriate margins and page breaks
3. **Fill content** from the user's content source, or generate per user instructions
4. **Preserve template-specific elements** — school-specific forms, signature areas, stamp placeholders, advisor comment blocks → reproduce as-is with placeholder text (e.g., "Advisor (signature):")
5. **Maintain formatting fidelity** — font choices, table layouts, spacing, and alignment should match the template, not the standard design-system palettes
⚠️ **Do NOT apply standard cover recipes (R1R7) when a user-provided template defines its own cover format.** Follow the template's cover layout instead. Standard `common-rules.md` constraints (e.g., `WidthType.PERCENTAGE`, `allNoBorders` for cover wrapper, `Rule 8` line spacing) still apply for cross-engine compatibility.
⚠️ **Each distinct page type = separate section.** Cover section (margin: 0), body section (standard margins), appendix/form pages (may need different margins or orientation). Never place cover + body + appendix in a single section.
---
## Decision Tree
### Cover Page?
- **YES**: Reports, theses, proposals, plans, or 3+ page docs with clear title/author
- **NO**: Resumes, contracts, official documents, exam papers, short memos
### Cover Style Selector — Recipe Router
Covers use **7 validated layout recipes (R1R7)**, auto-selected by `selectCoverRecipe()` in `references/design-system.md` (the **authoritative source** — do NOT duplicate the function).
**Quick Reference:**
| docType | Recipe | Default Palette |
|---------|--------|-----------------|
| contract / official / exam / resume | null (no cover) | — |
| academic | R5 (Clean White) | ACADEMIC |
| proposal_report (thesis proposal) | R5 (Clean White) | ACADEMIC |
| lesson_plan (STEM) | R4 (Top Color Block) | DM-1 |
| lesson_plan (arts/general) | R6 (Editorial Warm) | ED-1 |
| creative / branding / design | R3 (Centered Card Frame) | SN-2 |
| cultural / newsletter / internal | R6 (Editorial Warm) | ED-1 |
| activity / event | R6 (Editorial Warm) | ED-1 |
| trend/research (cultural/creative/brand) | R7 (Swiss Tech) | ST-1 |
| whitepaper | R2 (Double-Rule Frame) | IG-1 / CM-2 |
| consulting | R2 (Double-Rule Frame) | MIN-1 |
| proposal / plan | R4 (Top Color Block) | GO-1 |
| report | R1 (Pure Paragraph Left) | by industry |
| default | R1 (Pure Paragraph Left) | DS-1 |
⚠️ **Long title routing:** After selecting recipe, apply `applyLongTitleOverride(result, titleLength)`. Titles >20 chars on R3/R4/R6 → fall back to R1. Titles >30 chars on R2 → fall back to R1. R5 is never overridden.
⚠️ **Academic thesis cover:** Use `buildAcademicCover()` from `scenes/academic.md`.
⚠️ **Thesis proposal report (开题报告):** Use `buildProposalCover()` from `scenes/academic.md`. Cover MUST be an independent section. Keywords: "开题报告" (Chinese), "thesis proposal", "research proposal" — NOT the same as business proposals (which use R4).
### Table of Contents?
- **YES**: 3+ major sections (H1 headings)
- **NO**: Resumes, exam papers, short docs, contracts (<20 clauses)
→ See `references/toc.md` for the complete TOC reference (3-step process, code examples, common bugs).
### Headers/Footers?
- **YES** by default (page numbers minimum)
- **NO**: cover page section, official docs (special format)
### Load Math Formulas?
When: exam papers, academic papers, physics/math/chemistry → load `references/math-formulas.md`
### Load Chart Templates?
When: data visualization, reports with charts → load `references/chart-templates.md`
## Outline Rules
**User provides outline** → Follow EXACTLY. No additions, deletions, or reordering.
**No outline** → Create from scene template:
- **Academic:** Abstract → TOC → Body → References
- **Report:** Use `selectReportType()` to determine type, then follow template AF:
- analysis → Template A (Executive Summary → Background → Scope & Method → Findings → Diagnosis → Conclusions)
- experiment → Template B (Abstract → Objective & Hypothesis → Environment → Procedure → Results → Error Analysis → Conclusions)
- testing → Template C (Overview → Scope & Environment → Test Plan → Results → Defects → Risks → Conclusions)
- research → Template D (Summary → Background → Subjects & Method → Sample → Findings → Synthesis → Recommendations)
- review → Template E (Overview → Goals → Review → Results → Issues → Lessons → Action Plan)
- proposal → Template F (Summary → Status → Goals → Solution → Roadmap → Resources → Risks → Benefits)
- **Contract:** Use `selectContractType()` then follow template AE:
- bilateral → Template A (Header → Parties → Recitals → Definitions → Subject → Price → Rights → Delivery → Tax → IP → Breach → Force Majeure → Termination → Notices → Dispute → Miscellaneous → Signature)
- transfer → Template B (Header → Recitals → Definitions → Subject → Consideration → Closing → Representations → Tax → Breach → Dispute → Signature)
- nda → Template C (Header → Recitals → Definition → Obligations → Use Restrictions → Return/Destroy → Exceptions → Duration → Breach → Dispute → Signature)
- framework → Template D (Header → Recitals → Purpose → Scope → Division → Mechanism → Commercial → Confidentiality → Term → Breach → Dispute → Signature)
- terms → Template E (Title → Definitions → Services → Rights → Liability → Fees → IP → Termination → Notices → Dispute → Miscellaneous)
- **Official:** Use `selectOfficialType()` + `needsRedHeader()`:
- notice → Template A ([Red header] → [Doc number] → Title → Addressee → Reason → Items → Requirements → [Attachments] → [Signature] → [Date] → [Colophon])
- letter → Template B ([Red header] → [Doc number] → Title → Addressee → Reason → Negotiation/Reply → Closing → [Signature] → [Date])
- reply → Template C ([Red header] → [Doc number] → Title → Addressee → Reference → Reply → "This is the reply." → Signature → Date)
- minutes → Template D (Title → Meeting Overview → Agreed Items → Responsibilities → [Distribution]) — typically no red header
- Present outline to user before generating when possible
## Scene Completeness
Include ALL elements a scene specifies:
- **Academic thesis:** Cover (`buildAcademicCover()` in its own section), abstract, TOC, references
- **Thesis proposal report (thesis proposal / 开题报告):** Cover (`buildProposalCover()` in its own section), body sections per proposal template. Cover MUST be a separate section.
- **Report:** Cover, executive summary, conclusions
- **Contract:** Party info, recitals, complete clause closure, signature block, uniform `【】` placeholders
- **Official:** Correct document type, specific title, closing phrase matching type, proper numbering hierarchy, red header only when requested
- **Exam:** Student info area, scoring criteria
Generate complete, substantive content — not skeletons.
## Content Guidelines
- **Length**: "detailed report" = 3000+ words. "brief summary" = 5001000.
- **Data**: Use user's data, or generate realistic placeholders
- **Charts**: Use `references/chart-templates.md` matplotlib templates → PNG → embed
- **Math**: Use `references/math-formulas.md` LaTeX → docx-js Math mapping
- **Tables**: For structured data, not layout
- **Numbering**: Figures, tables numbered sequentially with cross-references
## Code Architecture
### Heading Style Rule (Mandatory)
**All body chapter headings MUST use `heading: HeadingLevel.HEADING_X`** — never simulate with bold + large font (TOC cannot detect simulated headings).
**Exception:** Cover title and TOC title ("目录") heading MUST NOT use Heading style.
### Blank Page Prevention
→ See SKILL.md § Post-Generation checklist for the full set of rules.
Key rules:
1. No double page breaks (SectionType.NEXT_PAGE + PageBreak = blank page)
2. PageBreak paragraphs should have visible text content
3. No more than 3 consecutive empty paragraphs
4. Cover section: ≤2 trailing empty paragraphs, no trailing PageBreak
### Builder Pattern Example
```js
const { Document, Packer, Paragraph, TextRun, Header, Footer,
AlignmentType, HeadingLevel, PageNumber } = require("docx");
const fs = require("fs");
// 1. Palette
const P = { primary: "#101820", body: "#182030", secondary: "#506070", accent: "#8090A0" };
const c = (hex) => hex.replace("#", "");
// 2. Component builders
function heading(text, level = HeadingLevel.HEADING_1) {
return new Paragraph({
heading: level,
spacing: { before: level === HeadingLevel.HEADING_1 ? 360 : 240, after: 120 },
children: [new TextRun({ text, bold: true, color: c(P.primary), font: { ascii: "Calibri", eastAsia: "SimHei" } })]
});
}
function body(text) {
return new Paragraph({
alignment: AlignmentType.JUSTIFIED,
indent: { firstLine: 480 },
spacing: { line: 312 },
children: [new TextRun({ text, size: 24, color: c(P.body) })],
});
}
// 3. Assembly — cover + body in separate sections
const doc = new Document({
styles: { default: { document: {
run: { font: { ascii: "Calibri", eastAsia: "Microsoft YaHei" }, size: 24, color: c(P.body) },
paragraph: { spacing: { line: 312 } },
}}},
sections: [
{ properties: { page: { margin: { top: 0, bottom: 0, left: 0, right: 0 } } },
children: buildCoverR1(config) }, // ← use recipe from design-system.md
{ properties: { page: { margin: { top: 1440, bottom: 1440, left: 1701, right: 1417 } } },
footers: { default: new Footer({ children: [new Paragraph({ alignment: AlignmentType.CENTER,
children: [new TextRun({ children: [PageNumber.CURRENT], size: 18 })] })] }) },
children: [heading("Chapter 1"), body("Content...")] },
],
});
Packer.toBuffer(doc).then(buf => { fs.writeFileSync("output.docx", buf); });
```
## Post-Generation
→ See SKILL.md § Post-Generation for the complete two-layer verification checklist.
```bash
python3 "$DOCX_SCRIPTS/postcheck.py" output.docx
```
⚠️ **Running postcheck.py is MANDATORY.** Fix all ❌ errors before delivering.

115
skills/docx/routes/edit.md Executable file
View File

@@ -0,0 +1,115 @@
# Route: Edit Existing Document
## Workflow Overview
```
1. Receive .docx (or .doc → convert)
2. Unpack → working directory
3. Analyze structure (document.xml, styles.xml)
4. Plan changes → batch by type
5. Implement via Document library (Python)
6. Pack → output.docx
7. Verify (pandoc or visual)
```
## Step 0: Format Conversion
```bash
# .doc → .docx
libreoffice --headless --convert-to docx input.doc
```
## Step 1: Unpack
```bash
mkdir -p work_dir && cd work_dir && unzip ../input.docx
```
Key files: `word/document.xml` (content), `word/styles.xml` (styles), `word/numbering.xml` (lists), `word/media/` (images), `[Content_Types].xml`, `word/_rels/document.xml.rels`
## Step 2: Plan Changes
Group changes into batches, process in order:
1. **Structural** — Add/remove sections, reorder paragraphs
2. **Style** — Font, size, color modifications
3. **Text** — Find/replace, fix typos
4. **Table** — Add/remove rows/columns, update data
5. **Image** — Replace/add images
## Step 3: Implement
Load `references/ooxml.md` for the full Document library API. Key patterns:
```python
from scripts.document import Document
doc = Document('work_dir')
# Text replacement with tracked changes
node = doc["word/document.xml"].get_node(tag="w:r", contains="old text")
rpr = tags[0].toxml() if (tags := node.getElementsByTagName("w:rPr")) else ""
replacement = f'<w:del><w:r>{rpr}<w:delText>old text</w:delText></w:r></w:del><w:ins><w:r>{rpr}<w:t>new text</w:t></w:r></w:ins>'
doc["word/document.xml"].replace_node(node, replacement)
doc.save()
```
## Step 4: Pack
```bash
cd work_dir && zip -r ../output.docx . -x ".*"
```
## Step 5: Verify
```bash
pandoc output.docx -t plain -o /dev/stdout | head -50
# or visual
libreoffice --headless --convert-to pdf output.docx
```
---
## Template Matching Workflow
When user says "use this format" or provides a template:
1. Unpack template, extract `styles.xml`, `numbering.xml`
2. Analyze font/size/spacing/margins
3. Copy `styles.xml` into target document
4. Match heading hierarchy and spacing
## Multi-File Merge
1. Use first document as base
2. Extract content from additional documents
3. Insert with page breaks between sections
4. Merge styles (prefer base document's)
5. Re-number figures/tables sequentially
## Redlining (Tracked Changes) — Default for Revisions
When user asks for revisions, **default to tracked changes** so they can review:
```python
doc = Document('work_dir', track_revisions=True)
# ... make changes using replace_node with <w:del>/<w:ins>
doc.save()
```
Ask user if they want clean output or tracked changes only if ambiguous.
## Common Operations Quick Reference
| Operation | Approach |
|-----------|----------|
| Replace text | `get_node` + `replace_node` with tracked changes |
| Change font | Modify `<w:rFonts>` in run properties |
| Add paragraph | `insert_after` with `<w:p>` element |
| Delete paragraph | `suggest_deletion` on `<w:p>` |
| Add table row | Clone `<w:tr>`, modify cells |
| Update header | Edit `word/headerN.xml` |
| Change margins | Edit `<w:pgMar>` in `<w:sectPr>` |
| Add image | See `references/ooxml.md` image insertion pattern |
| Add comment | `doc.add_comment(start, end, text)` |

120
skills/docx/routes/format.md Executable file
View File

@@ -0,0 +1,120 @@
# Route: Format / Layout
## Workflow
```
1. Read current document (pandoc for content, unpack for structure)
2. Identify format requirements from user
3. Use unit conversion table (see SKILL.md)
4. Apply formatting via OOXML manipulation or python-docx
5. Pack and verify
```
## Quick Formatting via python-docx
For simple formatting tasks, python-docx is often faster than raw XML:
```python
from docx import Document as PythonDocument
from docx.shared import Pt, Cm, Twips
from docx.enum.text import WD_ALIGN_PARAGRAPH
doc = PythonDocument("input.docx")
# Change all body paragraph formatting
for para in doc.paragraphs:
if para.style.name.startswith("Heading"):
continue
para.paragraph_format.first_line_indent = Twips(420)
para.paragraph_format.line_spacing = 1.5
para.paragraph_format.alignment = WD_ALIGN_PARAGRAPH.JUSTIFY
for run in para.runs:
run.font.name = "宋体"
run.font.size = Pt(12) # Xiao Si 小四
doc.save("output.docx")
```
## Common Format Request Patterns
### University Thesis Formatting
Typical Chinese university thesis requirements:
```python
from docx.shared import Cm, Pt, Twips
# Margins
for section in doc.sections:
section.top_margin = Cm(2.5)
section.bottom_margin = Cm(2.5)
section.left_margin = Cm(3.0)
section.right_margin = Cm(2.5)
# Fonts
# Body: SimSun 宋体 Xiao Si 小四 (12pt)
# H1: SimHei 黑体 San Hao 三号 (16pt) centered
# H2: SimHei 黑体 Si Hao 四号 (14pt)
# H3: SimHei 黑体 Xiao Si 小四 (12pt)
# English: Times New Roman, same sizes
```
### Page Numbers Starting from Specific Page
Use multi-section approach:
```python
# Section 1: Front matter (Roman numerals)
# Section 2: Main content (Arabic, starting from 1)
# This requires OOXML manipulation — see routes/edit.md for unpack/pack workflow
```
In raw XML (`word/document.xml`):
```xml
<w:sectPr>
<w:pgNumType w:fmt="upperRoman" w:start="1"/>
</w:sectPr>
<!-- New section -->
<w:sectPr>
<w:pgNumType w:fmt="decimal" w:start="1"/>
</w:sectPr>
```
### Different Headers Per Section
Each section in a .docx can have its own header/footer. See `references/docx-js-advanced.md` for the multi-section approach.
For existing documents, modify `word/document.xml` to split `<w:sectPr>` and create separate `headerN.xml` files.
### Font Size Conversion
When user requests a Chinese font size name:
| Request | Action |
|---------|--------|
| "Change to Wu Hao (5th) size" | `font.size = Pt(10.5)` or `size: 21` in docx-js |
| "Title in San Hao SimHei" | `font.size = Pt(16)`, `font.name = "SimHei"` |
| "Body in Xiao Si SimSun" | `font.size = Pt(12)`, `font.name = "SimSun"` |
### Line Spacing Adjustment
```python
from docx.shared import Twips
# 1.0x spacing
para.paragraph_format.line_spacing_rule = WD_LINE_SPACING.MULTIPLE
para.paragraph_format.line_spacing = 1.0
# 1.3x spacing (our default)
para.paragraph_format.line_spacing = 1.5
# Fixed spacing (e.g., 28pt)
para.paragraph_format.line_spacing_rule = WD_LINE_SPACING.EXACTLY
para.paragraph_format.line_spacing = Pt(28)
```
## Verification
After formatting changes:
1. Open in LibreOffice or convert to PDF for visual check
2. Extract text with pandoc to ensure content unchanged
3. Compare file sizes (formatting-only changes shouldn't dramatically change size)

114
skills/docx/routes/read.md Executable file
View File

@@ -0,0 +1,114 @@
# Route: Read / Analyze / Extract
## Method 1: Text Extraction via pandoc (Fastest)
```bash
# Plain text
pandoc input.docx -t plain -o output.txt
# Markdown (preserves structure)
pandoc input.docx -t markdown -o output.md
# Extract with metadata
pandoc input.docx -t markdown --standalone -o output.md
```
**Best for**: Quick content reading, text analysis, word count, searching.
## Method 2: Raw XML Access (Detailed)
```bash
mkdir work && cd work && unzip ../input.docx
# Read main content
cat word/document.xml
# Read styles
cat word/styles.xml
# List embedded media
ls word/media/
# Read headers/footers
cat word/header1.xml
cat word/footer1.xml
```
**Best for**: Analyzing formatting, extracting styles, inspecting document structure, debugging layout issues.
### Quick XML Parsing
```python
import defusedxml.ElementTree as ET
tree = ET.parse("word/document.xml")
ns = {"w": "http://schemas.openxmlformats.org/wordprocessingml/2006/main"}
# Extract all text
texts = []
for t in tree.iter("{http://schemas.openxmlformats.org/wordprocessingml/2006/main}t"):
if t.text:
texts.append(t.text)
full_text = "".join(texts)
# Count paragraphs
paras = tree.findall(".//w:p", ns)
print(f"Paragraphs: {len(paras)}")
# Find headings
for para in paras:
pPr = para.find("w:pPr", ns)
if pPr is not None:
pStyle = pPr.find("w:pStyle", ns)
if pStyle is not None and "Heading" in pStyle.get(f"{{{ns['w']}}}val", ""):
text = "".join(t.text for t in para.iter(f"{{{ns['w']}}}t") if t.text)
print(f" {pStyle.get(f'{{{ns[\"w\"]}}}val')}: {text}")
```
## Method 3: Convert to Images (Visual Analysis)
```bash
# Convert to PDF first
libreoffice --headless --convert-to pdf input.docx
# Then to images
pdftoppm -png -r 200 input.pdf page
# Generates page-1.png, page-2.png, etc.
```
**Best for**: Visual layout analysis, comparing formatting, generating previews, when user asks "what does it look like".
## Method 4: python-docx Reading
```python
from docx import Document
doc = Document("input.docx")
# Read paragraphs
for para in doc.paragraphs:
print(f"[{para.style.name}] {para.text}")
# Read tables
for table in doc.tables:
for row in table.rows:
print([cell.text for cell in row.cells])
# Document properties
print(f"Sections: {len(doc.sections)}")
print(f"Paragraphs: {len(doc.paragraphs)}")
print(f"Tables: {len(doc.tables)}")
```
## Choosing the Right Method
| Need | Method |
|------|--------|
| Quick text content | pandoc |
| Document structure/outline | pandoc → markdown |
| Formatting details | Raw XML |
| Table data extraction | python-docx |
| Visual appearance | Convert to images |
| Style analysis | Raw XML (styles.xml) |
| Word/character count | pandoc → plain → wc |

783
skills/docx/scenes/academic.md Executable file
View File

@@ -0,0 +1,783 @@
# Scene: Academic / Thesis
## Palette
**Academic Dark** (Cool + Heavy + Calm) — Academic papers use **pure black body text**. Palette only for cover decoration and minimal title scenarios.
```js
const palette = {
primary: "#000000", // Title — pure black
body: "#000000", // Body — pure black
secondary: "#333333", // Header/caption — dark grey
accent: "#8B7E5A", // Cover decoration line — cover only
surface: "#F5F7FA", // Table header light bg — three-line tables only
};
```
⚠️ **Body text color must be pure black `"000000"`**. No decorative dark-blue-grey. Academic papers require print-friendly, black-and-white clarity.
→ Placeholder convention & universal prohibitions — see `references/common-rules.md`
**Note:** This scene uses Profile A fonts with academic-specific overrides below.
---
## Page Layout
| Property | Value | Twips |
|----------|-------|-------|
| Top margin | 2.54 cm | 1440 |
| Bottom margin | 2.54 cm | 1440 |
| Left margin | 3.00 cm | 1701 |
| Right margin | 2.50 cm | 1417 |
| Header distance | 1.5 cm | 850 |
| Footer distance | 1.75 cm | 992 |
```js
page: {
size: { width: 11906, height: 16838 },
margin: { top: 1440, bottom: 1440, left: 1701, right: 1417, header: 850, footer: 992 },
}
```
For binding margin, add 0.51.0 cm to left (i.e., left: 19852268).
---
## Font Specifications
| Element | CN Font | EN Font | Size | half-pt | Style |
|---------|---------|---------|------|---------|-------|
| Thesis title | SimHei | Times New Roman | Xiao Er 18pt | 36 | Bold, centered |
| H1 | SimHei | Times New Roman | San Hao 16pt | 32 | Bold, centered |
| H2 | SimHei | Times New Roman | Xiao San 15pt | 30 | Bold, left |
| H3 | SimHei | Times New Roman | Si Hao 14pt | 28 | Bold, left |
| Body | SimSun | Times New Roman | Xiao Si 12pt | 24 | Normal, justified |
| Abstract title | SimHei | Times New Roman Bold | San Hao 16pt | 32 | Bold, centered |
| Abstract body | SimSun | Times New Roman | Xiao Si 12pt | 24 | Normal, justified |
| Keywords label | SimHei | Times New Roman Bold | Xiao Si 12pt | 24 | Bold |
| Keywords content | SimSun | Times New Roman | Xiao Si 12pt | 24 | Normal |
| Header | SimSun | Times New Roman | Xiao Wu 9pt | 18 | Centered, color 333333 |
| Page number | — | Times New Roman | Xiao Wu 10.5pt | 21 | Centered |
| Footnote | SimSun | Times New Roman | Xiao Wu 9pt | 18 | Normal |
| Figure/table caption | SimSun | Times New Roman | Wu Hao 10.5pt | 21 | Centered |
### Paragraph Format
- Body: justified, first-line indent 2 chars (`firstLine: 480`, SimSun Xiao Si = 480 twips)
- Line spacing: 1.5x (`line: 360`); if school requires fixed 22pt, use `line: 440, lineRule: "exact"`
- Body paragraph spacing: before/after 0pt; heading spacing per styles below
```js
styles: {
default: {
document: {
run: { font: { ascii: "Times New Roman", eastAsia: "SimSun" }, size: 24, color: "000000" },
paragraph: { spacing: { line: 360 } },
},
heading1: {
run: { font: { ascii: "Times New Roman", eastAsia: "SimHei" }, size: 32, bold: true, color: "000000" },
paragraph: { alignment: AlignmentType.CENTER, spacing: { before: 480, after: 360, line: 360 } },
},
heading2: {
run: { font: { ascii: "Times New Roman", eastAsia: "SimHei" }, size: 30, bold: true, color: "000000" },
paragraph: { spacing: { before: 360, after: 240, line: 360 } },
},
heading3: {
run: { font: { ascii: "Times New Roman", eastAsia: "SimHei" }, size: 28, bold: true, color: "000000" },
paragraph: { spacing: { before: 240, after: 120, line: 360 } },
},
},
}
```
---
## Heading Numbering System (Mandatory)
### Format
| Level | Format | Example |
|-------|--------|---------|
| H1 | Chapter X + title | 第一章 绪论 (Chapter 1 Introduction) |
| H2 | X.X + section title | 1.1 Research Background |
| H3 | X.X.X + subsection | 1.1.1 Domestic Research Status |
### Mandatory Rules
1. **H1 must use "第X章" format** — not "一、", not "Chapter 1", not "第1章"
2. **H2/H3 use Arabic decimal numbering** (1.1, 1.1.1) — no "(一)", "1)"
3. **No mixing multiple numbering systems**
4. **No level-skipping** (cannot jump from H1 to H3)
5. **All body headings must use `heading: HeadingLevel.HEADING_X`** (TOC depends on this)
```js
// ✅ Correct
new Paragraph({
heading: HeadingLevel.HEADING_1,
children: [new TextRun({ text: "第一章 绪论", bold: true, size: 32, font: { eastAsia: "SimHei", ascii: "Times New Roman" } })]
})
new Paragraph({
heading: HeadingLevel.HEADING_2,
children: [new TextRun({ text: "1.1 研究背景", bold: true, size: 30, font: { eastAsia: "SimHei", ascii: "Times New Roman" } })]
})
```
### Non-Body Headings
Abstract, Table of Contents, References, Appendices, Acknowledgments:
- Use H1 style (San Hao SimHei centered) for TOC indexing
- But **no numbering** (write directly: "摘 要", "参考文献", etc. — these are non-numbered standalone section headings)
---
## Document Structure & Multi-Section Architecture
Theses must use **multi-section structure** for independent page numbering and header/footer per section.
### Complete Structure
```
Section 1: Cover → No page number, no header/footer
Section 2: Chinese Abstract → Roman numerals starting from i
Section 3: English Abstract → Roman numerals continued
Section 4: Table of Contents → Roman numerals continued
Section 5: Body (all chapters) → Arabic numerals from 1
Section 6: References → Arabic numerals continued
Section 7: Appendices (if any) → Arabic numerals continued
Section 8: Acknowledgments (if any) → Arabic numerals continued
```
### Page Number Implementation
```js
const { NumberFormat } = require("docx");
// Section 1: Cover — no page number
{
properties: {
page: { margin: { top: 0, bottom: 0, left: 0, right: 0 } },
titlePage: true,
},
children: buildCover(...),
}
// Section 2: Abstract — Roman numerals from i
{
properties: {
type: SectionType.NEXT_PAGE,
page: {
margin: { top: 1440, bottom: 1440, left: 1701, right: 1417, header: 850, footer: 992 },
pageNumbers: { start: 1, formatType: NumberFormat.UPPER_ROMAN },
},
},
headers: { default: buildHeader("Thesis Title") },
footers: { default: buildPageNumberFooter() },
children: buildAbstractCN(...),
}
// Section 3: English Abstract — Roman numerals continued (no reset)
{
properties: {
type: SectionType.NEXT_PAGE,
page: {
margin: { top: 1440, bottom: 1440, left: 1701, right: 1417, header: 850, footer: 992 },
pageNumbers: { formatType: NumberFormat.UPPER_ROMAN }, // no start → continues from previous
},
},
headers: { default: buildHeader("Thesis Title") },
footers: { default: buildPageNumberFooter() },
children: buildAbstractEN(...),
}
// Section 5: Body — Arabic numerals from 1
{
properties: {
type: SectionType.NEXT_PAGE,
page: {
margin: { top: 1440, bottom: 1440, left: 1701, right: 1417, header: 850, footer: 992 },
pageNumbers: { start: 1, formatType: NumberFormat.DECIMAL },
},
},
headers: { default: buildHeader("Thesis Title") },
footers: { default: buildPageNumberFooter() },
children: buildMainContent(...),
}
// Section 6+: References/Appendices/Acknowledgments — Arabic continued
```
### Header & Footer Helpers
```js
function buildHeader(title) {
return new Header({ children: [
new Paragraph({ alignment: AlignmentType.CENTER,
border: { bottom: { style: BorderStyle.SINGLE, size: 1, color: "000000" } },
children: [new TextRun({ text: title, size: 18, color: "333333",
font: { ascii: "Times New Roman", eastAsia: "SimSun" } })],
}),
] });
}
function buildPageNumberFooter() {
return new Footer({ children: [
new Paragraph({ alignment: AlignmentType.CENTER,
children: [
new TextRun({ text: "- ", size: 21 }),
new TextRun({ children: [PageNumber.CURRENT], size: 21 }),
new TextRun({ text: " -", size: 21 }),
],
}),
] });
}
```
### Page Break Rules
- Cover is a separate section (no PageBreak needed)
- Chinese abstract, English abstract, TOC each in their own section
- All body chapters in **one section** (no forced page breaks between chapters unless user requests)
- References, appendices, acknowledgments each in their own section
- **Never use blank lines instead of section breaks**
---
## Cover
### Information Fields
Cover must include (use placeholders for missing info):
| Field | Format | Placeholder |
|-------|--------|-------------|
| University name | Er Hao SimHei, centered | ×××University |
| Thesis title (CN) | Xiao Er SimHei, centered | (user-provided) |
| Thesis title (EN) | San Hao Times New Roman, centered | (translated from CN) |
| College | Si Hao SimSun | ×××College |
| Major | Si Hao SimSun | ×××Major |
| Author | Si Hao SimSun | ××× |
| Student ID | Si Hao SimSun | ××××××× |
| Advisor | Si Hao SimSun | ×××Professor |
| Date | Si Hao SimSun | 2026/XX |
### Cover Style
Use Recipe R5 (Clean White) or academic-specific `buildAcademicCover()` — never use commercial-style covers.
### Cover Layout Order (Mandatory)
The visual order on academic covers must follow this hierarchy from top to bottom:
1. School name (top)
2. Document type label (e.g., "Undergraduate Thesis", "Thesis Proposal Report")
3. **Thesis title** (prominent, centered)
4. Thesis English title (if bilingual)
5. **Author information table** (college, major, author, student ID, advisor)
6. Date (bottom)
⚠️ **Title MUST appear ABOVE the author info table.** The screenshot issue of info table appearing above the title is caused by incorrect element ordering. The `buildAcademicCover()` and `buildProposalCover()` functions below enforce correct order.
⚠️ **Layout must be vertically balanced** — use dynamic spacing to distribute whitespace evenly. Do not cram all elements into the top half or let large gaps appear between elements.
```js
function buildAcademicCover(info) {
const { school, title, titleEN, college, major, author, studentId, advisor, date } = info;
// ⚠️ Use safeText() for all values — never output "undefined"
const infoRows = [
["College", safeText(college, "【College】")],
["Major", safeText(major, "【Major】")],
["Author", safeText(author, "【Author】")],
["Student ID", safeText(studentId, "【Student ID】")],
["Advisor", safeText(advisor, "【Advisor】")],
];
const infoTable = new Table({
width: { size: 60, type: WidthType.PERCENTAGE },
alignment: AlignmentType.CENTER,
borders: { top: NB, bottom: NB, left: NB, right: NB, insideHorizontal: NB, insideVertical: NB },
rows: infoRows.map(([label, value]) => new TableRow({
cantSplit: true,
children: [
new TableCell({
width: { size: 35, type: WidthType.PERCENTAGE },
borders: { bottom: { style: BorderStyle.SINGLE, size: 1, color: "000000" }, top: NB, left: NB, right: NB },
margins: { top: 60, bottom: 60, left: 120, right: 120 },
children: [new Paragraph({
alignment: AlignmentType.RIGHT,
children: [new TextRun({ text: label + ":", size: 28, font: { eastAsia: "SimHei", ascii: "Times New Roman" } })],
})],
}),
new TableCell({
borders: { bottom: { style: BorderStyle.SINGLE, size: 1, color: "000000" }, top: NB, left: NB, right: NB },
margins: { top: 60, bottom: 60, left: 120, right: 120 },
children: [new Paragraph({
alignment: AlignmentType.CENTER,
children: [new TextRun({ text: value, size: 28, font: { eastAsia: "SimSun", ascii: "Times New Roman" } })],
})],
}),
],
})),
});
// ⚠️ Correct order: school → doc type → TITLE → info table → date
// ★ Rule 8: All large-font paragraphs must set explicit line spacing
return [
new Paragraph({ alignment: AlignmentType.CENTER, spacing: { before: 1200, after: 400, line: Math.ceil(22 * 23), lineRule: "atLeast" },
children: [new TextRun({ text: safeText(school, "【University Name】"), size: 44, bold: true, font: { eastAsia: "SimHei" } })] }),
new Paragraph({ alignment: AlignmentType.CENTER, spacing: { after: 800, line: Math.ceil(18 * 23), lineRule: "atLeast" },
children: [new TextRun({ text: "Undergraduate Thesis", size: 36, font: { eastAsia: "SimHei" } })] }),
new Paragraph({ alignment: AlignmentType.CENTER, spacing: { after: 200, line: Math.ceil(18 * 23), lineRule: "atLeast" },
children: [new TextRun({ text: safeText(title, "【Thesis Title】"), size: 36, bold: true, font: { eastAsia: "SimHei", ascii: "Times New Roman" } })] }),
titleEN ? new Paragraph({ alignment: AlignmentType.CENTER, spacing: { after: 1200, line: Math.ceil(16 * 23), lineRule: "atLeast" },
children: [new TextRun({ text: titleEN, size: 32, font: { ascii: "Times New Roman" } })] })
: new Paragraph({ spacing: { after: 1200 }, children: [] }),
infoTable,
new Paragraph({ alignment: AlignmentType.CENTER, spacing: { before: 1200, line: Math.ceil(14 * 23), lineRule: "atLeast" },
children: [new TextRun({ text: safeText(date, "2026/XX"), size: 28, font: { eastAsia: "SimSun" } })] }),
];
}
```
### Thesis Proposal Report Cover (开题报告)
Thesis proposal reports use a similar cover layout but with different document type label. The key layout rule is the same: **title above author info, evenly spaced**.
⚠️ **CRITICAL — Proposal cover MUST be an independent section:**
The proposal cover MUST be placed in its **own section** (with margin: 0 and a 16838 wrapper table), completely separate from the body content. The body content starts in the **next section** (with `SectionType.NEXT_PAGE` or as a separate section entry). **Never place the cover elements and body content in the same section** — this causes them to render on the same page without any page break, which is the #1 proposal report formatting failure.
```js
// ✅ Correct — cover and body in separate sections
sections: [
{
properties: { page: { margin: { top: 0, bottom: 0, left: 0, right: 0 } } },
children: buildProposalCover(info), // standalone cover section
},
{
properties: { page: { margin: { top: 1440, bottom: 1440, left: 1701, right: 1417 } } },
children: [...bodyContent], // body starts here
},
]
// ❌ WRONG — cover and body in same section (no page separation!)
sections: [
{
children: [...coverElements, ...bodyContent], // everything on one continuous flow
},
]
```
```js
function buildProposalCover(info) {
const { school, year, title, subtitle, college, major, author, studentId, advisor, date } = info;
// ⚠️ Use safeText() for all values
const infoRows = [
["姓名 (Name)", safeText(author, "XXX")],
["专业 (Major)", safeText(major, "XXX")],
["入学时间 (Enrollment)", safeText(info.enrollment, "XXX")],
];
const infoTable = new Table({
width: { size: 60, type: WidthType.PERCENTAGE },
alignment: AlignmentType.CENTER,
borders: { top: NB, bottom: NB, left: NB, right: NB, insideHorizontal: NB, insideVertical: NB },
rows: infoRows.map(([label, value]) => new TableRow({
children: [
new TableCell({
width: { size: 35, type: WidthType.PERCENTAGE },
borders: { bottom: { style: BorderStyle.SINGLE, size: 1, color: "000000" }, top: NB, left: NB, right: NB },
margins: { top: 60, bottom: 60, left: 120, right: 120 },
children: [new Paragraph({
alignment: AlignmentType.CENTER,
children: [new TextRun({ text: label, size: 28, bold: true, font: { eastAsia: "SimHei", ascii: "Times New Roman" } })],
})],
}),
new TableCell({
borders: { bottom: { style: BorderStyle.SINGLE, size: 1, color: "000000" }, top: NB, left: NB, right: NB },
margins: { top: 60, bottom: 60, left: 120, right: 120 },
children: [new Paragraph({
alignment: AlignmentType.CENTER,
children: [new TextRun({ text: value, size: 28, font: { eastAsia: "SimSun", ascii: "Times New Roman" } })],
})],
}),
],
})),
});
// ⚠️ Correct order: doc type label → info table → "论文题目" label → TITLE → subtitle
// Layout balanced: upper 40% for header + info, middle 20% for title, lower 40% for whitespace
// ★ Rule 8: All large-font paragraphs must set explicit line spacing
return [
new Paragraph({ alignment: AlignmentType.CENTER, spacing: { before: 1500, after: 600, line: Math.ceil(18 * 23), lineRule: "atLeast" },
children: [new TextRun({ text: safeText(year, "2025") + " 届本科毕业论文开题报告",
size: 36, bold: true, font: { eastAsia: "SimHei", ascii: "Times New Roman" } })] }),
infoTable,
new Paragraph({ spacing: { before: 1200 } }), // Balanced whitespace
new Paragraph({ alignment: AlignmentType.CENTER, spacing: { after: 200 },
children: [new TextRun({ text: "论文题目", size: 28, font: { eastAsia: "SimSun", ascii: "Times New Roman" } })] }),
new Paragraph({ alignment: AlignmentType.CENTER, spacing: { after: 200, line: Math.ceil(16 * 23), lineRule: "atLeast" },
children: [new TextRun({ text: safeText(title, "【Thesis Title】"), size: 32, bold: true,
font: { eastAsia: "SimHei", ascii: "Times New Roman" } })] }),
subtitle ? new Paragraph({ alignment: AlignmentType.CENTER, spacing: { after: 800 },
children: [new TextRun({ text: "——" + subtitle, size: 28,
font: { eastAsia: "SimSun", ascii: "Times New Roman" } })] })
: new Paragraph({ spacing: { after: 800 }, children: [] }),
];
}
```
### ⚠️ WPS Compatibility Notes for Academic Covers
Both thesis cover and proposal cover use info tables. These MUST follow the cross-engine rules:
- Table uses **percentage widths** (`WidthType.PERCENTAGE`), NOT DXA — WPS renders DXA widths differently in nested contexts
- Table width: adaptive 5575%, centered via `alignment: CENTER` (calculated by `calcR5MetaLayout()`)
- Label column: **LEFT aligned**, plain text + "", NO full-width space padding, NO borders
- Value column: **LEFT aligned**, `bottom: single sz=4` border = fixed-length underline
- Cell `margins.top/bottom: 60` is acceptable (small values) but avoid larger values
- All paragraphs with font size > 12pt (body) must set `spacing: { line: Math.ceil(fontPt * 23), lineRule: "atLeast" }` to prevent top clipping (Rule 8)
- ⚠️ Do NOT use DXA widths, full-width space padding (`\u3000`), tab stops, or right-alignment for meta info
⚠️ **Proposal cover must fit on one page.** Use the same height-budget approach as commercial covers — total content height must stay within 15638 twips (1200 twips safety margin). If the title is very long, reduce font size (minimum 24pt).
```
---
## Section Content Standards
### Chinese Abstract
**Format:**
- Title: "摘 要" (space in middle), San Hao SimHei centered, H1 style
- Body: Xiao Si SimSun, justified, first-line indent 480 twips
- Keywords: "关键词:" SimHei bold + content SimSun normal, 38 keywords, semicolon-separated
**Content structure (mandatory):**
1. Research background (12 sentences)
2. Research problem/purpose (1 sentence)
3. Research method (12 sentences)
4. Main results/findings (23 sentences)
5. Research significance/value (1 sentence)
⚠️ **Abstract is NOT a TOC summary.** Must not read as "Chapter 1 introduces... Chapter 2 analyzes..."
### English Abstract
- Title: "Abstract", San Hao Times New Roman Bold, centered, H1 style
- Body: Xiao Si Times New Roman, justified
- Keywords: bold label + normal content, 38 keywords, comma-separated
- **Must be consistent with Chinese abstract** — no significant shrinkage
- Use formal academic English, avoid Chinglish
### Table of Contents
- Title: "目 录", San Hao SimHei centered
- Use `TableOfContents` field for auto-generation, display at least H1H2, recommend H3
- Run `"$DOCX_SCRIPTS/add_toc_placeholders.py" --auto` after generation
- TOC on its own page
---
## Body Chapter Structure
### Standard Structure (6-chapter)
```
Chapter 1: Introduction
1.1 Research Background
1.2 Research Purpose & Significance
1.3 Literature Review (Domestic & International)
1.4 Research Content & Methods
1.5 Thesis Structure
Chapter 2: Theoretical Framework & Literature Review
2.1 Core Concept Definitions
2.2 Theoretical Basis
2.3 Literature Review
2.4 Research Gap & Entry Point
Chapter 3: Research Design / Method / Model
3.1 Research Framework
3.2 Method Design / System Architecture / Algorithm
3.3 Variables / Data Sources / Experimental Environment
Chapter 4: Empirical Analysis / Case Study / Results
4.1 Data Analysis / Case Description / Experiment Process
4.2 Results Presentation
4.3 Results Interpretation
Chapter 5: Discussion
5.1 Key Findings
5.2 Comparison with Existing Research
5.3 Limitations
Chapter 6: Conclusions & Outlook
6.1 Research Conclusions
6.2 Contributions
6.3 Limitations
6.4 Future Research Directions
```
### Chapter Content Requirements
**Chapter 1 (Introduction):** Must state background, purpose, significance, methods, content, structure.
**Chapter 2 (Literature Review):** Must be systematically organized by theme/method/stage — **never a chronological dump of papers**. Must identify contributions, gaps, and research opportunities.
**Chapter 3 (Method):** Must explain why this method was chosen and its rationale. Content must be understandable, executable, reproducible.
**Chapter 4 (Results):** Must be specific, not vague. Must be consistent with Chapter 3 design.
**Chapter 5 (Discussion):** Must not merely repeat Chapter 4 results. Must explain what results mean and what conclusions they support.
**Chapter 6 (Conclusions):** Must summarize concisely, state contributions, acknowledge limitations, propose future directions. Must end formally — no abrupt ending.
---
## Discipline-Adaptive Routing
Auto-adjust research methods and chapter emphasis by discipline. **When user doesn't specify method, choose the most appropriate research paradigm for the discipline — never mechanically apply "empirical + survey + regression" template.**
### 1. Humanities & Social Sciences (Literature, History, Philosophy, Arts)
**Preferred methods:** Literature analysis, theoretical research, text analysis, comparative studies, historical research
**Adjustments:** Ch.2 focuses on theoretical lineage; Ch.4 becomes text analysis/case argumentation; minimize "variables", "hypotheses", "regression" terminology
### 2. Management / Economics / Public Administration
**Preferred methods:** Case analysis, surveys, model analysis, institutional research, empirical research
**Adjustments:** Ch.3 focuses on hypotheses, variables, framework; Ch.4 on data collection & analysis; Ch.5 adds management implications/policy recommendations
### 3. Computer Science / Engineering / IT
**Preferred methods:** Method design, system architecture, experimental comparison, performance evaluation, algorithm analysis
**Adjustments:** Ch.3 becomes system/algorithm design; Ch.4 becomes experiments (environment, parameters, control experiments, metric comparison); minimize "interviews", "surveys"
### 4. Education / Linguistics / Communication
**Preferred methods:** Teaching experiments, text analysis, survey research, interview research, case studies
**Adjustments:** Ch.3 focuses on subjects, dimensions, samples; Ch.4 on teaching practice/communication case analysis; Ch.5 adds educational implications/communication strategies
### 5. Law / Marxism / Policy Studies
**Preferred methods:** Normative analysis, statutory interpretation, case studies, institutional comparison, theoretical analysis
**Adjustments:** Ch.2 focuses on legal/policy framework; Ch.4 becomes case analysis/institutional comparison; Ch.5 focuses on normative evaluation, reform recommendations
---
## Figure/Table/Formula Numbering (By Chapter)
### Numbering Rules
| Type | Format | Example |
|------|--------|---------|
| Figure | Figure X-Y | Figure 3-1, Figure 4-2 |
| Table | Table X-Y | Table 2-1, Table 4-3 |
| Formula | Eq. (X-Y) | Eq. (3-1), Eq. (5-2) |
Where X = chapter number, Y = sequential number within chapter.
### Figures
- Caption **below** figure, Wu Hao SimSun, centered
- Format: "Figure X-Y Description"
- Must be referenced in text: "as shown in Figure 3-1"
```js
new Paragraph({ alignment: AlignmentType.CENTER,
children: [new ImageRun({ data: imgBuf, transformation: { width: w, height: h }, type: "png" })] }),
new Paragraph({ alignment: AlignmentType.CENTER, spacing: { before: 60, after: 200 },
children: [new TextRun({ text: "图3-1 System Architecture", size: 21,
font: { eastAsia: "SimSun", ascii: "Times New Roman" } })] }),
```
### Tables
- Caption **above** table, Wu Hao SimSun, centered, `keepNext: true`
- Format: "Table X-Y Description"
- Must use three-line table (mandatory for academic papers)
- Must be referenced in text: "as shown in Table 2-1"
### Formulas
- Formula centered, number **right-aligned**
- Use Tab for center + right alignment
- Text reference: "from Eq. (3-1)"
```js
new Paragraph({
alignment: AlignmentType.CENTER,
tabStops: [
{ type: TabStopType.CENTER, position: 4500 },
{ type: TabStopType.RIGHT, position: 9000 },
],
children: [
new TextRun({ text: "\t" }),
new TextRun({ text: "E = mc²" }),
new TextRun({ text: "\t(3-1)" }),
],
}),
```
### Mandatory Rules
1. Figures/tables/formulas **must be referenced in text** — never placed without explanation
2. Must have introductory and analytical text before/after
3. Must not exceed page margins
4. Insert only when analytically valuable — not for decoration
---
## Citation & Reference System
### In-Text Citation (Sequential Numbering)
Default: **GB/T 7714 sequential numbering**`[1]`, `[2]` in text, references listed in order of appearance.
```js
new TextRun({ text: "[1]", superScript: true, size: 18, font: { ascii: "Times New Roman" } })
```
### Citation Rules
1. In-text numbers must **correspond one-to-one** with reference list
2. **Same source reused keeps the same number**
3. **Do not mix footnote citations and endnote references** (unless user explicitly requests)
4. Footnotes are for supplementary notes only, not primary citations
### Reference Format (GB/T 7714)
```
[1] Author. Title[J]. Journal, Year, Vol(No): Pages.
[2] Author. Book Title[M]. Place: Publisher, Year: Pages.
[3] Author. Title[D]. Location: Institution, Year.
[4] Author. Title[EB/OL]. (Published)[Cited]. URL.
```
### Reference Formatting
```js
// Reference title — H1 style
new Paragraph({ heading: HeadingLevel.HEADING_1, alignment: AlignmentType.CENTER,
children: [new TextRun({ text: "References", bold: true, size: 32, font: { eastAsia: "SimHei" } })] }),
// Each entry — hanging indent
new Paragraph({
indent: { left: 420, hanging: 420 },
spacing: { line: 360 },
children: [new TextRun({ text: "[1] Author. Title[J]. Journal, 2024, 59(3): 45-62.",
size: 21, font: { eastAsia: "SimSun", ascii: "Times New Roman" } })],
}),
```
### Reference Count Guidelines
| Thesis Type | Suggested Count |
|------------|----------------|
| Course paper (30005000 words) | 1015 |
| Undergraduate thesis | 1530 |
| Master's thesis | 4080 |
| Doctoral dissertation | 80150 |
If user specifies APA, MLA, Chicago, or school-specific format, follow that instead.
---
## Three-Line Table (Mandatory for Academic Papers)
All tables in academic papers **must use three-line tables** — no full-border tables.
```js
const threeLineTable = new Table({
width: { size: 100, type: WidthType.PERCENTAGE },
borders: {
top: { style: BorderStyle.SINGLE, size: 4, color: "000000" },
bottom: { style: BorderStyle.SINGLE, size: 4, color: "000000" },
left: { style: BorderStyle.NONE }, right: { style: BorderStyle.NONE },
insideHorizontal: { style: BorderStyle.NONE }, insideVertical: { style: BorderStyle.NONE },
},
rows: [
new TableRow({
tableHeader: true, cantSplit: true,
children: headerCells.map(text => new TableCell({
borders: { bottom: { style: BorderStyle.SINGLE, size: 2, color: "000000" },
top: { style: BorderStyle.NONE }, left: { style: BorderStyle.NONE }, right: { style: BorderStyle.NONE } },
margins: { top: 60, bottom: 60, left: 120, right: 120 },
children: [new Paragraph({ alignment: AlignmentType.CENTER,
children: [new TextRun({ text, bold: true, size: 21, font: { eastAsia: "SimSun", ascii: "Times New Roman" } })] })],
})),
}),
...dataRows, // All borders NONE
],
});
```
---
## Content Quality Constraints (Mandatory)
### Truthfulness & Conservatism
1. **Never fabricate** unverifiable statistics, survey response counts, significance levels, interview subject identities, experimental precision, government document numbers
2. **Never invent** non-existent classic theories, authoritative scholar opinions, regulation names, core data sources
3. When user provides no real data → prefer **theoretical analysis, literature research, case studies, comparative analysis** (low-risk methods)
4. If example data must be constructed → keep scale reasonable, results conservative; never produce "significantly superior" or "dramatically improved" high-risk claims
5. Research conclusions must be **restrained** — do not overstate contributions, effects, or applicability
6. Research limitations must be **honestly disclosed**
### Language Style
1. Formal academic register throughout
2. **Forbidden:** "I think", "everyone knows", "obviously", "it is well known" (subjective expressions)
3. **Forbidden:** Sloganeering, propaganda, advertising-style expressions
4. First occurrence of CN/EN terms should include English original
5. CN/EN punctuation, spacing, and number formats must be consistent throughout
### Structural Consistency
1. Abstract, body, and conclusions **must be consistent** — no self-contradiction
2. Must form complete loop: "research question → method → analysis → findings → conclusions & outlook"
3. Terminology consistent throughout — no concept drift
4. All chapters balanced and substantive — no padding
### Document Cleanliness
1. **No residual** comments, tracked changes, field codes, template default text
2. **No** "TBD", "omitted", "user modifies", "insert figure here" expressions
3. **No** Markdown syntax, HTML tags, code blocks wrapping body text
4. **No** consecutive blank lines, abnormal page breaks, chaotic numbering
5. Final document must be clean, well-formatted, ready for submission
---
## School Standard Override Rule
⚠️ **When user specifies school/journal-specific format requirements, those requirements OVERRIDE all defaults above.**
Common override items:
- Margins (binding margin left 3.5 cm common)
- Body font (some schools require FangSong)
- Line spacing (some schools require fixed 28pt)
- Cover layout (varies significantly by school)
- Reference format (APA, MLA, etc.)
- Heading numbering (some schools use "1", "2" instead of "Chapter 1", "Chapter 2")
### Common Variants
| Thesis Type | Common Differences |
|------------|-------------------|
| Top universities | Strict GB/T 7714, often require STXiaoBiaoSong cover |
| Regular undergraduate | More flexible, SimSun/SimHei sufficient |
| Master's thesis | Requires English abstract, longer lit review, innovation statement |
| Doctoral dissertation | Requires innovation statement, publication list, originality declaration |
---
## Scene-Specific Quality Checks
In addition to universal checks (see `references/common-rules.md`):
### Structure & Content
- [ ] Cover, abstract, English abstract, TOC, body, references all present
- [ ] Cover info complete (school/title/EN title/college/major/name/ID/advisor/date)
- [ ] Abstract contains 5 elements: background + problem + method + results + significance
- [ ] English abstract consistent with Chinese abstract
- [ ] All chapters balanced, substantive, logical loop complete
- [ ] Literature review is thematic, not chronological dump
- [ ] Conclusions respond to research questions
### Format & Layout
- [ ] Heading numbering consistent (Chapter X / X.X / X.X.X), no mixing
- [ ] All body headings use `heading: HeadingLevel.HEADING_X`
- [ ] Body text pure black `"000000"`
- [ ] Three-line tables used consistently (no full-border tables)
- [ ] Figure captions below, table captions above, numbered by chapter
- [ ] Formulas centered, numbers right-aligned
- [ ] In-text citations match reference list one-to-one
- [ ] References use hanging indent, consistent format
- [ ] Page numbers: front matter Roman, body Arabic from 1
- [ ] Cover has no page number
- [ ] Headers formal and concise
- [ ] No extra blank pages
### Cleanliness
- [ ] No comment/revision residuals
- [ ] No "TBD" / "omitted" expressions
- [ ] No Markdown/HTML/code block residuals
- [ ] No consecutive blank lines or abnormal page breaks
- [ ] No fabricated high-risk data or exaggerated conclusions

463
skills/docx/scenes/contract.md Executable file
View File

@@ -0,0 +1,463 @@
# Scene: Contract / Agreement
## Goal
Generate a complete, formal, well-structured legal document with clear clauses, rigorous logic, and proper formatting. Must simultaneously meet:
- Complete structure, clear clauses, formal language, explicit responsibilities
- Identifiable risk boundaries, proper Word formatting
- Ready for review, revision, circulation, or signing preparation
**Forbidden:** Producing outlines-only / sample clauses / drafting advice / risk summaries; outputting chat-style explanations or filler phrases.
→ Font profile: **A (Formal)** — see `references/common-rules.md`
→ Default layout: standard margins — see `references/common-rules.md`
→ Placeholder convention & universal prohibitions — see `references/common-rules.md`
---
## Contract Type Routing
```js
function selectContractType(keywords, topic) {
if (/confidential|NDA|non-disclosure/.test(keywords)) return "nda";
if (/transfer|equity|asset|rights/.test(keywords)) return "transfer";
if (/framework|strategic|cooperation agreement/.test(keywords)) return "framework";
if (/terms|platform rules|user agreement|privacy/.test(keywords)) return "terms";
return "bilateral"; // default: bilateral commercial contract
}
```
### 5 Contract Types
| Type | Use Case | Structure Focus |
|------|----------|----------------|
| bilateral | Service/sale/development/procurement contracts | Subject → Consideration → Performance → Acceptance → Breach → Dispute |
| transfer | Equity/debt/asset/rights transfer | Subject → Consideration → Closing & Registration → Representations → Tax |
| nda | Non-disclosure agreements | Definition of Confidential Info → Obligations → Use Restrictions → Exceptions → Duration |
| framework | Cooperation framework / strategic alliance | Scope → Division of Work → Mechanism → Subsequent Agreements |
| terms | Platform rules / Terms of Service / User agreements | Definitions → Services → Rights & Obligations → Liability Limits → Amendments |
---
## Standard Template Structures
### Template A: Bilateral Commercial Contract
1. Header (title, contract number, date, location)
2. Party Information (Party A, Party B)
3. Recitals ("Whereas" clauses)
4. Definitions & Interpretation
5. Subject Matter & Scope of Services/Delivery
6. Contract Price & Payment Terms
7. Rights & Obligations of Both Parties
8. Timeline, Delivery & Acceptance
9. Invoicing, Tax & Settlement
10. Intellectual Property & Confidentiality
11. Representations & Warranties (if applicable)
12. Liability for Breach
13. Force Majeure
14. Termination & Dissolution
15. Notices & Service
16. Dispute Resolution
17. Miscellaneous
18. Signature Block
### Template B: Rights Transfer Agreement
1. Header & Parties
2. Recitals
3. Definitions & Interpretation
4. Subject of Transfer
5. Consideration & Payment Arrangement
6. Closing & Registration/Transfer
7. Representations & Warranties
8. Tax Allocation
9. Liability for Breach
10. Dispute Resolution
11. Miscellaneous
12. Signature Block
### Template C: Non-Disclosure Agreement (NDA)
1. Header & Parties
2. Recitals
3. Definition of Confidential Information
4. Confidentiality Obligations
5. Use Restrictions
6. Return, Deletion & Destruction of Information
7. Exceptions
8. Confidentiality Period
9. Liability for Breach
10. Dispute Resolution
11. Miscellaneous
12. Signature Block
### Template D: Framework / Cooperation Agreement
1. Header & Parties
2. Recitals
3. Purpose & Principles
4. Scope of Cooperation
5. Division of Work & Responsibilities
6. Project Advancement Mechanism
7. Commercial Arrangements / Subsequent Agreements
8. Confidentiality, IP & Compliance
9. Term, Amendment & Termination
10. Liability for Breach
11. Dispute Resolution
12. Miscellaneous
13. Signature Block
### Template E: Unilateral Terms / Platform Rules
1. Document Title
2. Definitions & Scope
3. Service/Rule Content
4. User Rights & Obligations / Platform Rights & Obligations
5. Liability Limitations & Disclaimers
6. Fees & Payment (if applicable)
7. Intellectual Property
8. Termination, Suspension & Amendment
9. Notices & Service
10. Dispute Resolution
11. Miscellaneous
**Note:** Unilateral/boilerplate terms require special attention to adhesion clause risks — avoid creating extremely one-sided documents.
**If the user provides an existing template, historical agreement, or company standard, always follow it first.**
---
## Input Recognition & Completion
### Processing Rules
1. If user provides a template, historical agreement, or company standard → **always follow it first**
2. If information is incomplete, fill conservatively — must be **restrained, natural, professional, consistent with transaction logic**
3. **Never fabricate** unrealistic commercial terms, regulatory requirements, approval conclusions, qualification status, tax treatment results, payment facts, or performance facts
4. If critical info is missing → use standardized placeholders
5. If user does not specify jurisdiction → default to PRC commercial writing conventions, but avoid making specific legal conclusions
---
## Legal Writing Standards
### Register
1. Use formal legal document register
2. Use clear party designations: "Party A", "Party B", "both parties", "either party", "non-breaching party", "breaching party"
3. **Forbidden:** Colloquial expressions ("you", "me", "they", "pay up", "cancel the contract", "handle ASAP")
4. Preferred terms: "pay consideration", "perform obligations", "constitute a breach", "terminate the contract", "assume liability for damages", "written notice", "deliver and accept", "representations and warranties"
### Precision
1. Eliminate vague adjectives: avoid "quality", "reasonable", "enormous", "appropriate", "ASAP" unless necessary for legal flexibility
2. Each obligation must specify: who, when, how, what
3. Consistent legal phrasing:
- Mandatory obligation → "shall"
- Right authorization → "has the right to"
- Prohibition → "shall not"
- Discretionary → "may"
4. Amounts, dates, percentages, deadlines, business days vs. calendar days must be as specific as possible
### Clear Subjects
1. Every clause must have an explicit responsible party — avoid vague subjects ("relevant parties", "relevant personnel", "when necessary")
2. Joint obligations: explicitly write "both parties agree" or "both parties shall"
3. Unilateral obligations: explicitly write "Party A shall" or "Party B shall"
---
## Transaction Closure & Risk Control
A contract must not only describe the transaction — it must ensure logical closure. Check the following:
1. If a performance deadline is specified → specify consequences of delay
2. If payment milestones are specified → specify payment conditions, method, invoice requirements
3. If a delivery obligation exists → specify delivery standards, method, acceptance rules, objection period
4. If termination rights exist → specify conditions, notice, effective date, post-termination settlement
5. If breach liability exists → must correspond to main obligations in preceding clauses
6. If IP/technology/data/trade secrets are involved → separately address ownership, license scope, use restrictions
7. If confidentiality obligations exist → define scope, exceptions, duration, breach consequences
8. If force majeure clause exists → specify notice obligation, mitigation duty, subsequent negotiation mechanism
9. If notice/service arrangements exist → specify address, contact person, email, or other delivery method
10. If user requests significantly one-sided adhesion/disclaimer clauses → add a note near the clause:
`[Note: This clause may involve adhesion terms or liability limitations. Manual review recommended for the specific transaction.]`
---
## Truthfulness & Legal Caution
1. **Never fabricate** specific statute article numbers, judicial interpretation numbers, or regulatory document numbers
2. Legal bases should use general references, e.g.: "In accordance with the Civil Code of the PRC and relevant laws and regulations..."
3. **Never** pretend to provide formal legal opinions, litigation success predictions, or definitive validity/invalidity conclusions
4. **Never** state definitive legality conclusions for high-risk clauses (adhesion terms, penalty clauses, disclaimers, non-compete, exclusivity, unilateral interpretation rights)
5. **Never** fabricate that regulatory approvals are obtained, title is unencumbered, tax compliance is assured, or third-party consent is secured
6. When critical info is insufficient → use placeholders, never present as confirmed fact
7. For high-risk areas (equity, debt, licenses, data compliance, labor, personal information, cross-border) → maintain restrained language, do not add rigid commitments without user confirmation
---
## Special Clause Requirements
### Definitions Clause
If the document repeatedly uses specialized terms ("deliverables", "service results", "confidential information", "source code", "project milestones", "acceptance criteria", "trade secrets"), include a "Definitions & Interpretation" clause near the beginning.
### Dispute Resolution
1. Must be explicit
2. Choose between litigation OR arbitration — never mix both
3. Litigation → specify jurisdictional connection point
4. Arbitration → specify arbitration institution
5. If user hasn't specified → use placeholder for confirmation
### Tax Clause
1. If the transaction involves taxes → specify which party bears them, whether price includes tax, invoice type and conditions
2. Avoid vague "taxes borne as required by law" without transaction-specific detail
### Breach Liability
1. Must correspond to main obligations in preceding clauses
2. Penalty amounts should be restrained — avoid obviously exaggerated or severely imbalanced figures
3. If fundamental breach exists → consider corresponding termination rights and damages
### Appendices
1. For complex subjects/pricing/technical requirements/deliverables → use "Appendix 1, Appendix 2..." format
2. Explicitly state appendix-contract relationship (typically: "Appendices form an integral part of this contract")
3. If appendix content is unknown → use placeholder
---
## Palette
**Legal Wood** (Warm + Heavy + Calm) — for decorative elements only; body text must be pure black.
```js
const palette = { primary:"#28201C", body:"#000000", secondary:"#6E6560", accent:"#7A5C3A", surface:"#FBF9F7" };
```
⚠️ **ALL visible text in contracts must be pure black `"000000"`.** This includes:
- Contract title (SimHei, black, NOT accent color)
- Contract number (black)
- Clause headings (black)
- Body text (black)
- Party information (black)
- Signature block text (black)
**The only exception** is red-header official documents (红头文件), which follow their own GB/T 9704 color rules. For standard contracts, NO colored text is permitted — no red, no accent color, no dark-blue-grey.
```js
// ✅ Contract title — always pure black
new Paragraph({ alignment: AlignmentType.CENTER,
spacing: { line: Math.ceil(22 * 23), lineRule: "atLeast" }, // ★ Rule 8: prevent clipping
children: [new TextRun({ text: "Training Cooperation Framework Agreement",
size: 44, bold: true, color: "000000", // ← MUST be "000000"
font: { eastAsia: "SimHei", ascii: "Times New Roman" } })]
})
// ❌ FORBIDDEN — accent/palette color on contract text
new TextRun({ text: "Training Cooperation Framework Agreement", color: palette.accent }) // ← WRONG
new TextRun({ text: "Contract No.:", color: palette.primary }) // ← WRONG (if primary ≠ "000000")
```
---
## Scene-Specific Font Overrides
Beyond Profile A defaults:
| Element | Font | Size | Style |
|---------|------|------|-------|
| Contract title | SimHei | Er Hao 22pt (size: 44) | Bold, centered |
| Contract number | SimSun | Wu Hao 10.5pt (size: 21) | Right-aligned |
| Clause heading | SimHei | Xiao Si 12pt (size: 24) | Bold |
| Monetary amount | SimSun | Xiao Si 12pt (size: 24) | Bold |
---
## Document Structure
1. **Title**: "XXX Contract" or "XXX Agreement" — Er Hao SimHei, centered
2. **Contract number**: right-aligned, Wu Hao
3. **Preamble**: Party information with placeholders
4. **Recitals** (summarize transaction background and purpose)
5. **Definitions** (if specialized terms recur)
6. **Substantive clauses** (per selected template)
7. **Signature block**
8. **Appendices** (if any)
---
## Clause Numbering System
Use stable, consistent, pure-text numbering suitable for Chinese legal documents.
```
Article 1 Subject Matter
1.1 xxxxxxxxxx
1.2 xxxxxxxxxx
(1) xxxxxxxxxx
(2) xxxxxxxxxx
① xxxxxxxxxx
② xxxxxxxxxx
Article 2 Price and Payment
2.1 ...
```
**Numbering discipline:**
1. No level-skipping
2. **Forbidden:** Using Markdown list markers (`-` `*` `1.`) for clause hierarchy
3. No switching from "Article X" to `-` or `*` or auto-list mid-document
4. Numbering style must be consistent throughout the entire document
5. Clause headings should be clean and simple
---
## Party Information Layout (Table-Based Alignment — Mandatory)
Party A and Party B information MUST be laid out using a **borderless table** so that labels align vertically. Never use plain paragraphs with indentation — this causes misalignment between parties.
```js
// ✅ Correct — borderless table ensures "统一社会信用代码:", "地址:", "法定代表人:" align
function partyInfoBlock(partyLabel, partyName, fields) {
// fields: [["Unified Social Credit Code", value], ["Address", value], ["Legal Representative", value]]
const NB = { style: BorderStyle.NONE, size: 0, color: "FFFFFF" };
const noBorders = { top: NB, bottom: NB, left: NB, right: NB };
const headerPara = new Paragraph({ spacing: { before: 200, after: 120 },
children: [new TextRun({ text: `${partyLabel}: ${safeText(partyName, "【Company full name】")}`,
size: 24, font: { eastAsia: "SimSun", ascii: "Times New Roman" } })]
});
const infoTable = new Table({
width: { size: 90, type: WidthType.PERCENTAGE },
borders: { top: NB, bottom: NB, left: NB, right: NB, insideHorizontal: NB, insideVertical: NB },
rows: fields.map(([label, value]) => new TableRow({
children: [
new TableCell({
width: { size: 35, type: WidthType.PERCENTAGE },
borders: noBorders,
margins: { top: 40, bottom: 40, left: 420, right: 60 },
children: [new Paragraph({
children: [new TextRun({ text: `${label}:`, size: 24,
font: { eastAsia: "SimSun", ascii: "Times New Roman" } })],
})],
}),
new TableCell({
borders: noBorders,
margins: { top: 40, bottom: 40, left: 60, right: 120 },
children: [new Paragraph({
children: [new TextRun({ text: safeText(value, `【Please fill in: ${label}`), size: 24,
font: { eastAsia: "SimSun", ascii: "Times New Roman" } })],
})],
}),
],
})),
});
return [headerPara, infoTable];
}
// Usage:
const partyAChildren = partyInfoBlock("Party A (甲方)", config.partyA?.name, [
["Unified Social Credit Code (统一社会信用代码)", config.partyA?.creditCode],
["Address (地址)", config.partyA?.address],
["Legal Representative (法定代表人/负责人)", config.partyA?.legalRep],
]);
```
**Rules:**
1. Party A and Party B info blocks must use the **same table column widths** — labels align across both blocks
2. Use `safeText()` for all field values — never output `undefined`
3. Label column width should accommodate the longest label (e.g., "统一社会信用代码")
4. The indent (`margins.left: 420`) simulates sub-level nesting under the party name
---
## Signature Block
Left-right symmetric, structured, easy to adjust in Word. Never write as scattered paragraphs.
Required fields for each party:
- Party name (seal)
- Legal representative / Authorized representative
- Contact person
- Contact information
- Signing location
- Date: 【____/____/____】
Use a borderless 2-column table for symmetry. **Every field value must use `safeText()`** — never output `undefined` or empty string. If a field is not provided, use the appropriate `【Please fill in】` placeholder.
```js
// ✅ Correct signature block — safeText for all values
function buildSignatureBlock(partyA, partyB) {
const fields = ["Party (Seal)", "Legal Rep / Authorized Rep (Signature)", "Contact Person", "Contact Info", "Signing Location", "Date"];
const NB = { style: BorderStyle.NONE, size: 0, color: "FFFFFF" };
const noBorders = { top: NB, bottom: NB, left: NB, right: NB };
return new Table({
width: { size: 100, type: WidthType.PERCENTAGE },
borders: { top: NB, bottom: NB, left: NB, right: NB, insideHorizontal: NB, insideVertical: NB },
rows: fields.map((label, i) => {
const aVal = i === fields.length - 1 ? "【____/____/____】" : safeText(partyA?.[i], "");
const bVal = i === fields.length - 1 ? "【____/____/____】" : safeText(partyB?.[i], "");
const displayA = i === 0 ? `Party A (甲方): ${aVal}` : `${label}: ${aVal}`;
const displayB = i === 0 ? `Party B (乙方): ${bVal}` : `${label}: ${bVal}`;
return new TableRow({
children: [
new TableCell({ width: { size: 50, type: WidthType.PERCENTAGE }, borders: noBorders,
margins: { top: 80, bottom: 80, left: 120, right: 60 },
children: [new Paragraph({ children: [new TextRun({ text: displayA, size: 24, color: "000000" })] })] }),
new TableCell({ width: { size: 50, type: WidthType.PERCENTAGE }, borders: noBorders,
margins: { top: 80, bottom: 80, left: 60, right: 120 },
children: [new Paragraph({ children: [new TextRun({ text: displayB, size: 24, color: "000000" })] })] }),
],
});
}),
});
}
```
---
## Monetary Amount Format
Contracts must show amounts in **both uppercase Chinese and numeric format**:
```
Contract amount: RMB One Million Two Hundred Thirty-Four Thousand Five Hundred Sixty-Seven Yuan (¥1,234,567.00)
```
---
## Style Rules
- **NO cover page** — title page is the first page (title + contract number at top)
- **NO TOC** unless >20 clauses
- **NO decorative elements** — contracts must be formal and clean
- **Line spacing**: 1.5x (line: 360) — ⚠️ scene override (Profile A default is 1.3x/312; contracts use 1.5x for readability and annotation space)
- **Body**: Justified, first-line indent 480 twips
- **Color**: pure black "000000" throughout — no colored text
---
## Scene-Specific Quality Checks
In addition to universal checks (see `references/common-rules.md`):
### Format
- [ ] Party information complete (full name / address / legal representative / contact)
- [ ] Signature block properly formatted, symmetrical, all fields present
- [ ] Monetary amounts shown in both uppercase and numeric format
- [ ] Clause numbering sequential with no gaps
- [ ] No cover page (title page is first page)
- [ ] No Markdown list markers mixed into clause hierarchy
### Content
- [ ] Clause numbering system consistent, no mixing
- [ ] Transaction closure complete (subject → consideration → performance → acceptance → breach → dispute)
- [ ] Breach liability corresponds to main obligations
- [ ] Dispute resolution explicitly stated (or placeholder for confirmation)
- [ ] All unconfirmed variables use `【】` placeholders consistently
- [ ] Language is formal, restrained, subjects are explicit
- [ ] No fabricated statute numbers or overreaching legal conclusions
- [ ] High-risk clauses include manual review notes
- [ ] Terminology consistent throughout
- [ ] Appendix-contract relationship explicitly stated
### Closure
- [ ] Performance deadline → delay consequences specified
- [ ] Payment milestones → conditions and invoice requirements specified
- [ ] Delivery obligation → acceptance rules and objection period specified
- [ ] Termination right → conditions and post-termination handling specified
- [ ] Confidentiality obligation → scope, exceptions, duration, breach consequences specified
- [ ] Force majeure → notice and mitigation duties specified

139
skills/docx/scenes/copywriting.md Executable file
View File

@@ -0,0 +1,139 @@
# Scene: Copywriting / Script
## Scope
Broadcast scripts, product promotion copy, livestream scripts, presentation scripts, speeches, hosting scripts, short video scripts — any document where the goal is **spoken delivery**.
→ Placeholder convention & universal prohibitions — see `references/common-rules.md`
→ Font profile: **B (Visual)** — see `references/common-rules.md`
---
## 1. Core Principles
⚠️ **A broadcast script is NOT a report, NOT a spec sheet, NOT an encyclopedia.**
The goal is for the audience to **understand on first listen, remember key points, and take action.** Therefore:
1. **Highlight selling points, don't pile specs:** Each paragraph covers only 12 core points with relatable scenario descriptions
2. **Conversational tone:** Use "you" not "the user"; use natural speech, not corporate jargon
3. **Rhythm:** Alternate long and short sentences, insert pause markers, avoid wall-of-text paragraphs
4. **Length discipline:** ~250300 words per minute of speech; a 5-minute script should not exceed 1500 words
5. **Information consistency:** All data, model numbers, prices must be consistent throughout — no self-contradiction
---
## 2. Document Structure
Completely different from reports:
```
Title (centered, short and punchy)
────────────────────
[Opening] ← Grab attention, 12 sentences
[Core Para 1] ← One selling point/opinion + scenario
[Core Para 2] ← One selling point/opinion + scenario
[Core Para 3] ← One selling point/opinion + scenario (max 35 paras)
[Closing] ← Summary + Call to Action (CTA)
────────────────────
[Notes] ← Supplementary info, data sources (optional, small grey text)
```
### Decisions
- **Cover:** ❌ Not needed
- **TOC:** ❌ Not needed
- **Header/footer:** Optional, minimal
- **Sections:** Single section sufficient
- **Line spacing:** `line: 400` (slightly larger than standard 1.5x for reading/marking ease)
---
## 3. Layout Standards
### Font Specifications
| Element | Font | Size | Style |
|---------|------|------|-------|
| Title | SimHei | 18pt (size:36) | Bold, centered |
| Section heading / highlight | SimHei | 14pt (size:28) | Bold |
| Body | Microsoft YaHei | 12pt (size:24) | Left-aligned |
| Rhythm markers | Microsoft YaHei | 10.5pt (size:21) | Grey 999999, italic |
| Notes | Microsoft YaHei | 10pt (size:20) | Grey 666666 |
### Paragraph Spacing
```js
// Generous spacing between paragraphs for reading/breathing pauses
spacing: { before: 200, after: 200, line: 400 }
// Larger gap between core sections
sectionGap: { before: 400, after: 200 }
```
### Key Point Highlighting
Use **bold** or **accent-colored text** to mark key selling points:
```js
new TextRun({ text: "Key selling point", bold: true, color: c(P.accent) })
```
### Rhythm Markers (optional)
Insert small grey markers where pauses, emphasis, or tone changes are needed:
```js
new Paragraph({ spacing: { before: 60, after: 60 },
children: [new TextRun({ text: "[Pause 2 sec]", size: 21, color: "999999", italics: true })] })
// Or inline: new TextRun({ text: " [emphasis] ", size: 18, color: "999999", italics: true })
```
---
## 4. Content Quality Rules
### Information Density Guide
| Script Type | Duration | Word Count | Core Paragraphs |
|-------------|----------|-----------|----------------|
| Short video | 3060 sec | 150300 | 12 |
| Product promotion | 23 min | 500800 | 34 |
| Presentation / Speech | 510 min | 12002500 | 58 |
| Hosting script | Per agenda | Per segment | Per segment |
### Scene-Specific Prohibitions
1. **No spec dumping:** Do not list all product specifications in tables. Select 23 most persuasive data points and express them through scenarios
2. **No information contradiction:** Model numbers, prices, data appearing multiple times must be perfectly consistent
3. **No report tone:** No "in conclusion", "research indicates", "as mentioned above" — this is spoken word
4. **No lengthy citations:** Broadcast scripts do not need quotes, footnotes, or references
5. **No dense layout:** Paragraphs must have visible spacing — no screen-filling text walls
### Product Promotion Specific Rules
- **Opening:** Lead with pain point / scenario ("Does your washing machine still smell after a cycle?"), not self-introduction
- **Product intro:** Compare only 12 competitive dimensions at a time — not a full review
- **Price anchor:** State original/market price first, then discount price — create contrast
- **CTA:** Explicitly state the action ("Click the link below", "Type 1 in comments")
---
## 5. Palette
Broadcast scripts use clean, simple colors — no complex visual design needed:
```js
const P = {
primary: "#1A1A1A", // Title
body: "#333333", // Body
secondary: "#666666", // Notes
accent: "#E85D3A", // Key highlight (warm, energetic)
surface: "#FFF8F5", // Background (if needed)
};
```
---
## 6. Scene-Specific Quality Checks
In addition to universal checks (see `references/common-rules.md`):
- [ ] Total word count within target range (not exceeding)
- [ ] Each core paragraph has only 12 selling points (no dumping)
- [ ] Conversational tone present (not report/formal style)
- [ ] Information consistent throughout (model, price, data — no contradictions)
- [ ] Paragraph spacing sufficient (visually not crowded)
- [ ] Clear attention-grabbing opening + closing CTA

698
skills/docx/scenes/exam.md Executable file
View File

@@ -0,0 +1,698 @@
# Scene: Exam Paper
## Overview
Exam papers are among the most critical document types in education. Unlike general documents, they require high precision in layout, print compatibility, and subject-specific formatting. This specification covers the complete workflow from page framework to subject-specific features.
→ Universal prohibitions — see `references/common-rules.md`
**Note:** Exam papers use their OWN font/layout specs (not Profile A defaults). All text is pure black/white/grey for photocopy clarity.
---
## 1. Page Setup & Framework
### Paper Specifications
| Type | Paper | Orientation | Use Case |
|------|-------|-------------|----------|
| Practice / Unit quiz | A4 | Portrait | Daily practice, homework, quizzes |
| Formal exam | A3 | Landscape + 2-column | Midterm / final / standardized (requires OOXML) |
| Answer sheet | A4 | Portrait | Standalone answer card |
### Margins
```js
// A4 portrait — no seal line
page: { size: { width: 11906, height: 16838 },
margin: { top: 850, bottom: 850, left: 1200, right: 1200 } }
// A4 portrait — with seal line (left binding area reserved)
page: { size: { width: 11906, height: 16838 },
margin: { top: 850, bottom: 850, left: 2200, right: 850 } }
// A3 landscape dual-column (requires OOXML)
// ⚠️ A3 dual-column may render slightly differently in WPS vs Word. Test in both before batch printing.
page: { size: { width: 23812, height: 16838, orientation: PageOrientation.LANDSCAPE },
margin: { top: 850, bottom: 850, left: 2200, right: 850 } }
```
### Section Handling
Different parts should use section breaks (`SectionType.NEXT_PAGE`):
- **Header area (full-width):** Title, instructions, score table (no columns)
- **Content area:** Questions (may use columns)
- **Composition / answer sheet:** Independent section, independent format
- **Attachment pages:** Large maps/diagrams for geography/biology can be separate pages
```js
sections: [
{ properties: { /* Header section — no columns */ }, children: [...] },
{ properties: { type: SectionType.CONTINUOUS, column: { count: 2, space: 720 } }, children: [...] },
{ properties: { type: SectionType.NEXT_PAGE }, children: [...] }, // Composition
]
```
### Template-First Principle
⚠️ **Build framework first, fill content second.** Before writing questions, determine:
1. Paper size + margins
2. Whether seal line is needed
3. Whether columns are used
4. Question type structure and point allocation
5. Whether composition grid / answer sheet is needed
---
## 2. Seal Line & Student Information Area
### When to Use Seal Line
| Scenario | Seal Line | Student Info Position |
|----------|-----------|---------------------|
| Formal standardized exam | ✅ Required | Left vertical info column |
| Midterm / Final | ✅ Recommended | Left vertical info column |
| Unit quiz | ❌ Optional | Header horizontal info row |
| Daily practice | ❌ Skip | Header horizontal info row |
### Seal Line Implementation
#### Method 1: Header horizontal prompt (simple)
```js
headers: { default: new Header({ children: [
new Paragraph({ alignment: AlignmentType.CENTER,
children: [new TextRun({
text: ".............. Seal ...... Line ...... Do ...... Not ...... Answer ...... Inside ..............",
size: 16, color: "999999", font: "SimSun" })] })
] }) }
```
#### Method 2: Vertical text box (OOXML advanced)
```xml
<w:txbxContent>
<w:p><w:pPr><w:jc w:val="center"/></w:pPr>
<w:r><w:rPr><w:sz w:val="18"/><w:color w:val="999999"/></w:rPr>
<w:t>Name:________ Class:________ ID:________</w:t></w:r>
</w:p>
<w:p><w:r><w:rPr><w:sz w:val="16"/><w:color w:val="CCCCCC"/></w:rPr>
<w:t>- - - - - - - - - Seal Line - - - - - - - - -</w:t></w:r>
</w:p>
</w:txbxContent>
```
### Student Info Row
```js
// Horizontal info row (when no seal line) — borderless 3-column table
new Table({
alignment: AlignmentType.CENTER, columnWidths: [2800, 2800, 2800],
rows: [new TableRow({ children: [
cell("Name: ______________"),
cell("Class: ______________", AlignmentType.CENTER),
cell("ID: ______________", AlignmentType.RIGHT),
] })]
})
```
Fill lines should be moderate length (1014 underscore chars). Label order: Name → Class → Student ID.
---
## 3. Paper Header & Title Area
### Structure
```
School name (16pt SimHei, centered)
Exam title (14pt SimHei, centered) — e.g., "20252026 Academic Year Second Semester Midterm"
Subject title (14pt SimHei, centered) — e.g., "Grade 7 Mathematics"
Student info row
Instructions (10pt SimSun, centered, grey)
Score table (as needed)
```
### Font Specifications
| Element | Font | Size | Style |
|---------|------|------|-------|
| School name | SimHei | 16pt (size:32) | Bold, centered |
| Exam title | SimHei | 14pt (size:28) | Bold, centered |
| Subject title | SimHei | 14pt (size:28) | Bold, centered |
| Instructions | SimSun | 10pt (size:20) | Grey 333333, centered |
| Student info | SimSun | 10.5pt (size:21) | Normal |
### Instructions Content
Should include: total score, exam duration, answer method, special requirements (e.g., calculator allowed).
### Score Table
- Header row: light grey background F0F0F0, centered
- Columns: Question type | Section names... | Total
- Rows: Points | Section points... | Total points
- Row: Score | blank... | blank
- Table centered, 80% page width
⚠️ **Header area should not be too full** — title + info + instructions + score table should not exceed 1/3 of the page.
---
## 4. Content Layout Rules
### Color Palette
```js
// Exam papers use only black/white/grey for clear photocopying
const C = {
title: "000000", body: "000000", section: "333333",
seal: "999999", answerLine: "CCCCCC", headerBg: "F0F0F0", gridLine: "DDDDDD",
};
```
### Column Usage
| Subject / Question Type | Recommendation |
|------------------------|----------------|
| Math multiple choice + fill-in | ✅ Suitable for columns |
| Physics multiple choice | ✅ Suitable for columns |
| Chinese reading / composition | ❌ Not suitable |
| English cloze / reading | ❌ Not suitable |
| History source-based | ❌ Not suitable |
| Geography map reading | ❌ Not suitable |
### Question Numbering
Entire paper uses consistent three-level numbering:
- **Major sections:** I, II, III, IV... (Chinese: 一、二、三、四…)
- **Questions:** 1. 2. 3. ... (Arabic + period)
- **Sub-questions:** (1) (2) (3) ... (parenthesized)
⚠️ **No extra symbols before question numbers** (no `•`, `▸`, `▪`, `-`, `*`). The number itself is the only marker. **Never use docx numbering/bullet list styles** for question numbers — must use plain TextRun manual numbering.
```js
// ✅ Correct — plain TextRun manual numbering
new Paragraph({ spacing: { before: 120, after: 60, line: 360 },
children: [new TextRun({ text: `${i+1}. ${question}`, size: 21, font: { eastAsia: "SimSun" } })] })
// ❌ Wrong — numbering causes Word to add bullets
new Paragraph({ numbering: { reference: "xxx", level: 0 }, // ← Forbidden!
children: [new TextRun({ text: question })] })
```
### Question Spacing
```js
sectionTitle: { before: 300, after: 150 } // Major section headers
question: { before: 120, after: 80 } // Between questions
subQuestion: { before: 60, after: 40 } // Between sub-questions
```
### Page Break Control
⚠️ Key principles:
- **Question stem and answer area must not split** across pages
- **Source material and questions on same page**
- **Figures adjacent to their questions**
- **Avoid orphan lines** — question stem, options, answer area appear as a group
```js
new Paragraph({ keepNext: true, keepLines: true, children: [...] })
```
⚠️ **Answer question page break rule (mandatory):**
Complete combination (stem + figure + answer lines) must be considered as a unit. If remaining space cannot fit stem + figure + at least 3 answer lines, push entire question to next page.
Use `keepNext: true` to chain: stem → figure → first 3 answer lines.
---
## 5. Font & Paragraph Standards
### Underline Formatting for "Underlined Parts" (Mandatory)
When a question references "underlined part" (划线部分), the relevant text MUST use actual underline formatting (`underline: { type: UnderlineType.SINGLE }`). **Never** show "划线部分为 XXX" as plain text annotation — the underline must be visually rendered.
```js
// ✅ Correct — actual underline on the referenced text
new Paragraph({ children: [
new TextRun({ text: "1. It is ", size: 21, font: { ascii: "Times New Roman" } }),
new TextRun({ text: "a butterfly", size: 21, font: { ascii: "Times New Roman" },
underline: { type: UnderlineType.SINGLE, color: "000000" } }),
new TextRun({ text: ". (Ask about the underlined part)", size: 21, font: { ascii: "Times New Roman" } }),
]})
// ❌ Wrong — underlined part described as annotation text
new TextRun({ text: "1. It is a butterfly. (对划线部分提问) 注:划线部分为 a butterfly" })
```
### Font Hierarchy
| Element | Font | Size | Style |
|---------|------|------|-------|
| Section title | SimHei | 11pt (size:22) | Bold |
| Question content | SimSun | 10.5pt (size:21) | Normal |
| Points annotation | SimSun | 10pt (size:20) | In parentheses |
| Reading material | KaiTi/SimSun | 10.5pt (size:21) | KaiTi to differentiate |
| Notes/source | SimSun | 9pt (size:18) | Grey 666666 |
| Seal line | SimSun | 8pt (size:16) | Grey 999999 |
| Page number | SimSun | 9pt (size:18) | Centered |
### Line Spacing
```js
line: 360 // ~1.5x for readability
answerLine: 500 // Answer line spacing for writing room
```
### Paragraph Rules
- ⚠️ **Never use consecutive returns for whitespace** — use `spacing.before/after`
- Chinese questions use Chinese punctuation; English materials use English punctuation
- Mixed CN/EN: use Times New Roman or Calibri for English text
---
## 6. Multiple Choice Layout
### Core Rule
⚠️ **Options must NEVER be aligned with spaces!** Must use borderless tables.
### Option Layout — Borderless Table
```js
// Short options: 4 columns in 1 row
new Table({
columnWidths: [2200, 2200, 2200, 2200],
rows: [new TableRow({ children: ["A","B","C","D"].map((label, i) =>
new TableCell({ borders: NBs, width: { size: 2200, type: WidthType.DXA },
margins: { top: 0, bottom: 0, left: 60, right: 60 },
children: [new Paragraph({ spacing: { before: 0, after: 0 },
children: [new TextRun({ text: `${label}. ${options[i]}`, size: 21, font: "SimSun" })] })]
})
) })]
})
// Medium options: 2 columns, 2 rows
// Long options: 1 column, 4 rows
```
### Option Length Detection
```js
function getOptionLayout(options) {
const maxLen = Math.max(...options.map(o => o.length));
if (maxLen <= 6) return "4col";
if (maxLen <= 15) return "2col";
return "1col";
}
```
---
## 7. Fill-in-the-Blank Layout
```js
// Blank line length matches expected answer:
// Short answer (number/word): 8 underscores
// Medium (phrase): 14 underscores
// Long (sentence): 20 underscores
new Paragraph({ spacing: { before: 140, after: 80, line: 400 },
children: [new TextRun({ text: `${num}. Question text ________________.`, size: 21, font: "SimSun" })] })
```
⚠️ Fill-in lines must not break across lines — if line is too long, put the blank on the next line.
---
## 8. Short Answer / Problem-Solving Layout
### Question + Points
```js
new Paragraph({ spacing: { before: 200, after: 60, line: 360 }, keepNext: true,
children: [new TextRun({ text: `${num}. (${points} pts) ${question}`, size: 21, font: "SimSun" })] })
```
### Answer Lines
```js
// Light grey answer lines (CCCCCC), NOT black
// ⚠️ Answer lines are ONLY for writing space within each question — never as dividers between questions
function answerLines(count) {
return Array(count).fill(null).map(() =>
new Paragraph({ spacing: { before: 0, after: 0, line: 500 },
borders: { bottom: { style: BorderStyle.SINGLE, size: 1, color: "CCCCCC" } },
children: [new TextRun({ text: " ", size: 21 })] })
);
}
```
⚠️ **Separation between questions:**
Use **only spacing** (`spacing.before: 200`) for visual separation between questions. **Forbidden:**
- ❌ Grey horizontal lines (borders)
- ❌ Color block dividers (Table-simulated separators)
- ❌ Symbol dividers (e.g., `───────`)
- ❌ Any visual separator decoration
### Answer Space vs. Points
| Points | Suggested Lines | Description |
|--------|----------------|-------------|
| 24 | 34 lines | Simple calculation / short answer |
| 58 | 68 lines | Medium problem |
| 1012 | 810 lines | Complex problem |
| 1420 | 1014 lines | Comprehensive / essay question |
---
## 9. Source-Based / Reading Question Layout
### Material vs. Question Separation
```js
// Material area — indented + KaiTi to differentiate
new Paragraph({ indent: { left: 420, right: 420 }, spacing: { before: 100, after: 100, line: 380 },
children: [new TextRun({ text: materialText, size: 21, font: "KaiTi" })] })
// Source attribution
new Paragraph({ alignment: AlignmentType.RIGHT, indent: { right: 420 },
children: [new TextRun({ text: "— from \"XXX\"", size: 18, color: "666666", font: "SimSun" })] })
```
### Key Principles
- Material title, source, body, and notes use different fonts
- Long materials: increase line spacing (line: 380400)
- Material and corresponding questions on same page
- Sub-question numbers (1)(2)(3) clearly correspond to material
- **Data tables in materials MUST use proper docx `Table` objects** — never render tabular data as Markdown plain text (`| col | col |`). This includes statistics tables, climate data tables, comparison tables, and any structured data within question materials. Use bordered tables (see § 13 Table Usage Standards) with appropriate header row styling.
---
## 10. Composition / Writing Area
### Grid Count Calculation
⚠️ **Grid count must exceed required word count by 2030%** (for title, paragraph indents, line breaks).
| Required Words | Min Grid Count | Recommended Layout |
|---------------|---------------|-------------------|
| 400 | 500 | 25 rows × 20 cols |
| 600 | 750 | 38 rows × 20 cols |
| 800 | 1000 | 50 rows × 20 cols |
| 1000 | 1250 | 63 rows × 20 cols |
```js
function calcGridSize(requiredWords, colsPerRow = 20) {
const totalCells = Math.ceil(requiredWords * 1.25);
const rows = Math.ceil(totalCells / colsPerRow);
return { rows, colsPerRow, totalCells: rows * colsPerRow };
}
```
### Chinese Composition Grid
```js
function compositionGrid(rows, colsPerRow) {
const cellSize = Math.floor(8800 / colsPerRow);
return new Table({
columnWidths: Array(colsPerRow).fill(cellSize),
rows: Array(rows).fill(null).map(() =>
new TableRow({
height: { value: cellSize, rule: HeightRule.EXACT },
children: Array(colsPerRow).fill(null).map(() =>
new TableCell({ borders: thinBs("DDDDDD"), width: { size: cellSize, type: WidthType.DXA },
children: [new Paragraph({ children: [] })] })
)
})
)
});
}
```
### English Writing Area (Horizontal Lines) — MANDATORY for English Writing Questions
⚠️ **Every English writing/composition question MUST include ruled horizontal lines.** A blank area without lines is FORBIDDEN — students need lines to write on.
```js
function writingLines(count) {
return Array(count).fill(null).map(() =>
new Paragraph({ spacing: { before: 0, after: 0, line: 560 },
borders: { bottom: { style: BorderStyle.SINGLE, size: 1, color: "CCCCCC" } },
children: [new TextRun({ text: " ", size: 21 })] })
);
}
```
**Line count by word requirement:**
| Required Words | Lines |
|---------------|-------|
| ≤50 | 8 |
| 5080 | 10 |
| 80120 | 12 |
| 120+ | 15 |
**Rules:**
1. Lines must appear immediately after the writing prompt paragraph
2. Line color: light grey `CCCCCC` (print-friendly, not visually heavy)
3. Line spacing: `line: 560` (provides adequate writing room)
4. Chinese composition uses grid (`compositionGrid`), English uses lines (`writingLines`) — never mix them up
```
### Composition Area Requirements
- Independent section or clear separation
- Title space reserved (for self-chosen topics)
- Word count prompt visible ("No fewer than 800 words" / "About 120 words")
- Grid/line colors light — must not interfere with writing
- Pages continuous, not split
---
## 11. Answer Key (参考答案)
### Output Rules
1. **Default (user does not request answers in the same file):** Generate the answer key as a **separate .docx file** (e.g., `exam.docx` + `exam_answers.docx`). This prevents students from accidentally seeing answers.
2. **User explicitly requests answers in the same file:** Place the answer key on an **independent page** using `SectionType.NEXT_PAGE`. Answer key MUST NOT appear on the same page as any exam question.
### Separate File Format (Default)
The answer key file should include:
- Title: "《{exam title}》参考答案" (SimHei, 14pt/size:28, bold, centered)
- Same question numbering as the exam
- Concise answers (letter choices, key words, short solutions)
- Font: SimSun 10.5pt (size: 21)
### Same File Format (When User Requests)
```js
// Answer key as a separate section — MUST use SectionType.NEXT_PAGE
{
properties: { type: SectionType.NEXT_PAGE,
page: { margin: { top: 850, bottom: 850, left: 1200, right: 1200 } } },
children: [
new Paragraph({
alignment: AlignmentType.CENTER, spacing: { after: 300 },
children: [new TextRun({ text: "参考答案", size: 28, bold: true,
font: { eastAsia: "SimHei" } })],
}),
// ... answer content paragraphs
],
}
```
### Rules
1. ⚠️ **Never place answer content directly after the last question without a page/section break**
2. Answer content should be concise — no answer lines, no grid, plain text only
3. Calculation/proof questions: show key steps, not just final answer
4. If the exam has figures, answers may reference "see Figure X" without re-embedding
---
## 12. Figures & Illustrations
### Image Insertion
```js
new Paragraph({ alignment: AlignmentType.CENTER, spacing: { before: 100, after: 60 },
children: [new ImageRun({ data: imageBuffer, transformation: { width: 300, height: 200 }, type: "png" })] })
new Paragraph({ alignment: AlignmentType.CENTER, spacing: { after: 100 },
children: [new TextRun({ text: "(Figure 1)", size: 18, color: "666666", font: "SimSun" })] })
```
### Key Principles
- Images set as inline (default) to prevent floating
- Resolution sufficient for print clarity
- **B&W print compatible:** images must remain distinguishable when printed in grayscale
- Figure numbers and captions complete
- Figures adjacent to corresponding questions
- Maps must have: scale bar, north arrow, legend
- Coordinate graphs must have: axis labels, tick marks, units
### ⚠️ Figure-Text Order (Strictly Enforced)
**For questions with figures, element order must be:**
```
1. Question stem (keepNext: true)
2. Figure (centered, keepNext: true)
3. Answer lines / answer area
```
**Forbidden:** answer lines between stem and figure, or figure after answer lines.
### Figure Content Matching
- **Figures must be semantically consistent with question stem:** if question says "triangle ABC", figure must label vertices A, B, C
- Geometry annotations must match described angles, side lengths
- Function graphs must mark key points mentioned in the question
- Physics experiment diagrams must match described apparatus
- Figure width: geometry ≤ 50% page width, data/experiment ≤ 70%
### ⚠️ Figure Diversity Rule (Mandatory)
**No duplicate figures in the entire paper.** Even if two questions involve the same type (e.g., both triangles), each must have a distinct figure:
1. Different labels (different vertex letters, angles, side lengths)
2. Different shapes (acute vs. right vs. obtuse triangle)
3. Different styling (if applicable)
If using matplotlib, each call must use **different parameters and data** — never copy the same generation code.
### Subject-Specific Figure Requirements
| Subject | Common Types | Special Requirements |
|---------|-------------|---------------------|
| Math | Geometry, functions, coordinates | No distortion, clear labels |
| Physics | Circuits, mechanics, apparatus | Standard symbols, correct arrows |
| Chemistry | Apparatus, molecular structures | Reagent names labeled |
| Biology | Cell, organ, ecosystem diagrams | Labels not too small |
| Geography | Maps, contour lines, statistics | Legend + scale + north arrow |
---
## 13. Formulas & Special Symbols
### Formulas
Math/physics/chemistry formulas use **LaTeX → docx-js Math mapping** (see `references/math-formulas.md`):
- Basic (fractions, sub/superscript, roots) → docx-js Math components
- Complex (3+ nesting, matrices) → matplotlib PNG fallback
- Never hand-type Unicode formula approximations
### Common Unicode Math Symbols
```
× ÷ ± ∓ ≠ ≈ ≤ ≥ ∞ √ ∑ ∏ ∫ ∂ ∆ ∇
α β γ δ ε θ λ μ π σ φ ω
⊂ ⊃ ∈ ∉ ∩ ∅ ∀ ∃
→ ← ↑ ↓ ⇒ ⇔ ° ″ ‰ ² ³ ⁴ ⁿ ₁ ₂ ₃
```
### Chemical Formulas
Subscripts/superscripts must be correct: H₂O, CO₂, Fe₂O₃, Ca(OH)₂
Reaction arrows: → ⇌ ↑ ↓
---
## 14. Table Usage Standards
### Borderless Tables (for alignment)
For: option alignment, info rows, question number + points alignment
```js
const NB = { style: BorderStyle.NONE, size: 0, color: "FFFFFF" };
const NBs = { top: NB, bottom: NB, left: NB, right: NB };
```
### Bordered Tables (for data display)
For: score tables, data tables, statistics
```js
const thinB = (c="000000") => ({ style: BorderStyle.SINGLE, size: 1, color: c });
const thinBs = (c="000000") => ({ top: thinB(c), bottom: thinB(c), left: thinB(c), right: thinB(c) });
```
### Table Standards
- Cell padding moderate (margins: top/bottom 4060, left/right 6080)
- Consistent border thickness
- Header row: light grey F0F0F0 background
- Avoid cross-page tables
- Tables centered (`alignment: AlignmentType.CENTER`)
---
## 15. Headers & Footers
### Page Numbers
```js
footers: { default: new Footer({ children: [
new Paragraph({ alignment: AlignmentType.CENTER,
children: [
new TextRun({ children: [PageNumber.CURRENT], size: 18, font: "SimSun" }),
] })
] }) }
```
⚠️ **Denominator FORBIDDEN** — never use `PageNumber.TOTAL_PAGES` or "Page X of Y". Show only current page number.
### Headers
- May contain seal line prompt or subject name
- Small font (89pt), grey color (999999)
- Should not be visually heavy — must not compete with content
---
## 16. Subject-Specific Standards
### Chinese Language
- Reading, classical poetry, composition: **no columns**
- Poetry preserves original line breaks
- Classical text needs annotation area (smaller font, indented)
- Composition grid in independent section, grid count via `calcGridSize` (800 words → 50×20 = 1000 cells)
- Dictation questions: horizontal lines, moderate length
- Reading materials: use KaiTi to differentiate
### Mathematics
- Multiple choice, fill-in: suitable for neat layout
- Formulas: Unicode symbols or OOXML
- Geometry/function graphs must be clear, undistorted
- Problem-solving: sufficient working space
- Coordinate graphs: labeled axes, tick marks
### English
- English font: Times New Roman, moderate character spacing
- Cloze: numbers in text, options after passage
- Reading comprehension: material + questions as groups
- Writing area: horizontal lines, not grid
- Listening (if any): numbers aligned with options
### Physics / Chemistry / Biology
- Experiment/apparatus diagrams must be clear and accurate
- Unit symbols standardized (m/s, kg, mol/L, etc.)
- Chemical formula subscripts correct
- Calculation and experiment analysis: sufficient answer space
- Biology structure diagrams: labels not too small
### History / Politics
- Source-based questions are lengthy — **no columns**
- Dates, figures, events clearly labeled
- Essay questions: more whitespace than multiple choice
- Historical sources cite provenance
- Chart materials in logical order
### Geography
- Maps are the focus — must be clear
- Legend, scale bar, north arrow required
- Map and question close together — avoid page turns
- Map reading questions: balance figure and text space
- Contour line values clearly labeled
---
## Final Review Checklist
After generating an exam paper, check every item:
- [ ] Question numbers sequential, points correct, total correct
- [ ] Question stems match options / materials / illustrations one-to-one
- [ ] **Figures come after stem, before answer area** (strict order)
- [ ] **Figure content matches question semantics** (labels, symbols match)
- [ ] **Composition grid count ≥ required words × 1.25** (800 words → at least 1000 cells)
- [ ] Options aligned with borderless tables (not spaces)
- [ ] No wrong pages, missing pages, **no extra blank pages**
- [ ] Images / tables / formulas positioned correctly
- [ ] **No Markdown table syntax in document** — all data tables use proper docx Table objects
- [ ] Fonts, sizes, line spacing consistent
- [ ] Answer space matches difficulty and point value
- [ ] Clear when printed in B&W
- [ ] Subject-specific layout handled properly
- [ ] Seal line / page numbers / headers formatted correctly
- [ ] Header info complete (school, subject, duration, total score)
- [ ] **No extra PageBreak at end of last section**
- [ ] **Answer key is either a separate file (default) or on a separate page (if user requested in same file)** — never on the same page as questions

View File

@@ -0,0 +1,411 @@
# Scene: Official Document (Government Notice / Letter / Reply / Minutes)
## Goal
Generate a complete, formal, properly structured official document ready for Word delivery. Must simultaneously meet:
- Correct document type, complete structure, clear elements
- Formal government register, stable hierarchy, reliable layout
- Ready for approval, circulation, filing, issuance, or formal internal communication
**Forbidden:** Producing outlines-only / sample paragraphs / writing advice / half-finished drafts; outputting chat-style explanations.
→ Placeholder convention & universal prohibitions — see `references/common-rules.md`
**Note:** This scene uses its OWN font and layout specs (not Profile A defaults), because official documents follow GB/T 9704 standards.
---
## Scope & Document Type Boundaries
This scene covers:
1. **Notice** — assigning work, communicating requirements, forwarding documents
2. **Official Letter** — between non-subordinate organizations: negotiation, inquiry, assistance requests, replies
3. **Reply (to Request)** — superior authority answering a subordinate's formal request
4. **Meeting Minutes** — recording key outcomes and agreed items
**Important boundaries:**
- "Red header" is a format/layout, not a document type — it typically carries notices, letters, or replies
- **Not all official documents need red headers / document numbers / colophons** — only enable when user explicitly requests "red header format", "GB/T 9704 format", or "formal issuance format"
- Internal enterprise notices, business letters, meeting minutes often do NOT use full GB/T standard format
- This scene does NOT cover: speeches, press releases, promotional materials, papers, summary reports, contracts, or legal opinions
---
## Document Type Routing
```js
function selectOfficialType(keywords, purpose) {
if (/minutes|meeting/.test(keywords)) return "minutes";
if (/reply|respond to request/.test(keywords)) return "reply";
if (/letter|inquiry|negotiation/.test(keywords)) return "letter";
return "notice"; // default
}
```
### Red Header Activation
```js
function needsRedHeader(userRequest) {
// Only activate when explicitly requested
return /red header|GB\/T 9704|formal issuance|official format/.test(userRequest);
}
```
**Rules:**
- `needsRedHeader = true` → Enable red header, document number, colophon (full formal elements)
- `needsRedHeader = false` → Maintain formal style but no mandatory red header; keep only title + addressee + body + signature
---
## Standard Template Structures
### Template A: Notice
1. Red header area (if applicable)
2. Document number (if applicable)
3. Title
4. Addressee
5. Reason for issuance
6. "The relevant matters are hereby notified as follows:"
7. Notice items (expanded by hierarchy)
8. Requirements
9. Attachment notes (if any)
10. Signature (if applicable)
11. Date (if applicable)
12. Colophon (if applicable)
**Closing phrase:** "This notice is hereby given." or "Please implement accordingly."
### Template B: Official Letter
1. Red header area (if applicable)
2. Document number (if applicable)
3. Title
4. Addressee
5. Reason / reference to incoming letter
6. Negotiation / inquiry / reply items
7. Closing
8. Signature (if applicable)
9. Date (if applicable)
10. Colophon (if applicable)
**Closing phrases:** "Please reply by letter." / "This letter is hereby sent." / "This is in reply."
### Template C: Reply
111. Similar to Notice structure
- Addressee is typically the single requesting organization
- Must reference the incoming request document
- "After review, the reply is as follows:"
- Closing: "This is the reply."
### Template D: Meeting Minutes
1. Title (meeting name + "Minutes")
2. Meeting overview (time, place, chair, attendees)
3. Agreed items
4. Responsibility assignments / follow-up requirements (if applicable)
5. Distribution scope (if applicable)
**Notes:**
- Minutes record "agreed items", not a transcript of speeches
- Minutes generally do NOT follow standard red header format
- Unless user explicitly requests organizational template compliance
---
## Input Recognition & Completion
### Processing Rules
1. If user provides a template, historical document, or organizational standard → **always follow it first**
2. If information is incomplete → fill conservatively, formally, and appropriately for the government context
3. **Never fabricate** policy bases, incoming document numbers, leadership directives, meeting decisions, or official organization names
4. If critical info is missing → use standardized placeholders
5. Never present a draft as if it were already formally issued
---
## Title Drafting Rules
The title is the most critical identifying element — must accurately, concisely reflect the issuing body, subject matter, and document type.
| Type | Format | Example |
|------|--------|---------|
| Notice | Issuing body + "regarding" + subject + "notice" | XX Municipal Government Notice on Issuing the XX Management Measures |
| Letter | Issuing body + "regarding" + subject + "letter" | XX Company Letter Regarding Land Use for XX Project |
| Reply | Issuing body + "regarding" + subject + "reply" | XX Bureau Reply on Approving Establishment of XX Branch |
| Minutes | Meeting name + "minutes" | XX Company Third General Manager Meeting Minutes |
**Rules:**
1. Title must specify the subject — no vague titles ("Notice on Relevant Matters")
2. Titles generally do not use periods
3. Title length should be moderate — avoid excessive length
---
## Addressee & CC
### Addressee
1. The primary recipient of the document
2. On its own line, between title and body
3. Followed by full-width colon
4. Replies typically address only one requesting organization
5. Meeting minutes generally do not have a standard addressee
### CC (Carbon Copy)
1. CC recipients are NOT addressees — do not mix them
2. CC information typically appears in the colophon area
3. Non-red-header documents should not mechanically add "CC:" lines
---
## Writing Style & Register
### Language Style
1. Must be **solemn, plain, precise, rigorous, concise**
2. **Forbidden:** Literary devices (metaphor, personification, hyperbole, rhetorical questions, exclamations)
3. **Forbidden:** Vague expressions ("approximately", "recently", "relevant departments", "as soon as possible") — unless user explicitly requires vague wording
4. Time, location, organization, scope, milestones should be as specific as possible
5. No sloganeering filler or obvious "AI boilerplate" feel
### Common Phrase Patterns
**Purpose phrases:**
- "In order to implement..."
- "To further standardize..."
- "To effectively carry out..."
**Basis phrases:**
- "In accordance with the provisions of..."
- "As required by..."
- "Pursuant to relevant regulations"
**Transition phrases:**
- Notice: "The relevant matters are hereby notified as follows:"
- Letter: "The following is hereby communicated:"
- Reply: "After review, the reply is as follows:"
- Minutes: "The agreed items of the meeting are recorded as follows:"
**Closing phrases (must match document type):**
- Notice: "This notice is hereby given."
- Letter: "Please reply." / "This is hereby communicated." / "This is in reply."
- Reply: "This is the reply."
- Minutes: generally no fixed closing phrase
### Conciseness
1. Use "because" not "due to the reason that..."
2. Use "to" not "for the purpose of..."
3. Name specific entities — not "relevant parties" or "related departments"
4. Name responsible units — not "all units should ensure implementation" (vague ending)
---
## Body Hierarchy & Numbering
Official document body must strictly follow the standard Chinese government numbering system:
```
I. General matters
(1) Sub-items
1. Specific points
(1) Detail supplements
```
Original Chinese numbering:
```
一、General matters
Sub-items
1. Specific points
1Detail supplements
```
**Rules:**
1. No level-skipping
2. **Forbidden:** Markdown list markers (`-` `*`)
3. No switching between numbering styles at the same level
4. Level 1: major tasks; Level 2: sub-items; Levels 34: only when truly necessary
---
## Truthfulness & Caution
1. **Never fabricate** issuing bodies, incoming organizations, document numbers, leadership directives, meeting decisions, or policy bases
2. **Never** write "per the spirit of XX meeting" or "per XX directive" unless user explicitly provides these
3. **Never** fabricate titles and numbers of referenced documents in replies or letters
4. **Never** present a draft as already formally issued
5. When information is insufficient → use placeholders, never pretend elements are complete
---
## Attachment Notes
1. Placed after body text, before signature
2. "Attachment:" followed by attachment name
3. Multiple attachments: numbered sequentially (Attachment 1, Attachment 2...)
4. Attachment names must be clear and specific — never fabricate unknown attachments
---
## Signature & Date
1. Document types requiring signatures should have issuing body name and date
2. Not all types mechanically require signatures (minutes typically do not)
3. Formal document dates must use Chinese numeral format with proper "" character
- Example: March 31, 2026 → 二〇二六年三月三十一日
4. Document numbers use tortoiseshell brackets "" (not square brackets "[]")
- Example: X政发20261号
5. Date format must be consistent throughout
---
## Palette
**NO decorative colors.** Pure black text on white background. The only color is red header text.
```js
const palette = { primary:"#000000", body:"#000000", accent:"#000000", surface:"#FFFFFF" };
const RED_HEADER = "FF0000"; // Only for red header text
```
---
## Page Layout (GB/T 9704-2012 Standard)
**Only for formal GB/T red-header documents.** Non-GB/T scenarios may use standard margins.
| Property | Value | Twips |
|----------|-------|-------|
| Top margin | 3.7 cm | 2098 |
| Bottom margin | 3.5 cm | 1984 |
| Left margin | 2.8 cm | 1588 |
| Right margin | 2.6 cm | 1474 |
```js
// GB/T red header layout
page: { size: { width: 11906, height: 16838 }, margin: { top: 2098, bottom: 1984, left: 1588, right: 1474 } }
// Non-GB/T formal documents may use standard margins:
// margin: { top: 1440, bottom: 1440, left: 1701, right: 1417 }
```
---
## Font Specifications (GB/T 9704)
| Element | Font | Size | Style |
|---------|------|------|-------|
| Red header org name | STXiaoBiaoSong / SimSun Bold | As determined by org | Red (#FF0000), centered |
| Document title | STXiaoBiaoSong / SimSun Bold | Er Hao 22pt (size: 44) | Centered |
**Font fallback for STXiaoBiaoSong:** This font is not installed by default on all systems. WPS ships FZXiaoBiaoSong-S13 instead. Use this fallback chain:
- Preferred: `STXiaoBiaoSong` (华文小标宋)
- Fallback 1: `FZXiaoBiaoSong-S13` (方正小标宋, available in WPS)
- Fallback 2: `SimSun` with Bold (宋体加粗, universally available)
In code, set primary font and note the fallback:
```js
font: { eastAsia: "STXiaoBiaoSong" }
// Fallback: FZXiaoBiaoSong-S13 → SimSun Bold. User may need to install STXiaoBiaoSong for exact rendering.
```
| Addressee | FangSong | San Hao 16pt (size: 32) | Left-aligned |
| Body | FangSong | San Hao 16pt (size: 32) | Justified, indent 640 |
| Level 1 heading | SimHei | San Hao 16pt (size: 32) | Bold |
| Level 2 heading | KaiTi | San Hao 16pt (size: 32) | Normal |
| Level 3 heading | FangSong | San Hao 16pt (size: 32) | Bold |
| Attachment notes | FangSong | San Hao 16pt (size: 32) | Left-aligned |
| Signature/date | FangSong | San Hao 16pt (size: 32) | Right-aligned |
| Page number | FangSong | Si Hao 14pt (size: 28) | Centered, "— X —" |
```js
styles: {
default: {
document: {
run: { font: { ascii: "Times New Roman", eastAsia: "FangSong" }, size: 32, color: "000000" },
paragraph: { spacing: { line: 560 } }, // Fixed 28pt line spacing
},
heading1: {
run: { font: { eastAsia: "SimHei" }, size: 32, bold: true, color: "000000" },
},
heading2: {
run: { font: { eastAsia: "KaiTi" }, size: 32, color: "000000" },
},
},
}
```
**Note:** For "formal administrative style" (not strict GB/T), retain the style logic but do not rigidly require every GB/T element.
---
## Code Examples
### Red Header (red-header documents only)
```js
new Paragraph({ alignment: AlignmentType.CENTER, spacing: { before: 0, after: 200, line: Math.ceil(26 * 23), lineRule: "atLeast" },
children: [new TextRun({ text: "XX Municipal Government", font: { eastAsia: "SimSun" },
size: 52, bold: true, color: "FF0000" })] })
new Paragraph({ border: { bottom: { style: BorderStyle.SINGLE, size: 4, color: "FF0000" } },
spacing: { after: 40 }, children: [] })
```
### Page Number Footer
```js
footers: { default: new Footer({ children: [new Paragraph({
alignment: AlignmentType.CENTER,
children: [
new TextRun({ text: "\u2014 ", size: 28 }),
new TextRun({ children: [PageNumber.CURRENT], size: 28 }),
new TextRun({ text: " \u2014", size: 28 }),
],
})] }) }
```
---
## Style Rules
1. **Strictly follow official document format — no decorative elements**
2. NO cover page
3. NO TOC
4. NO headers (only page numbers in footer)
5. NO colors except red header (red-header documents only)
6. NO images or charts (unless integral to document content)
7. NO fancy fonts — only FangSong, SimHei, KaiTi, STXiaoBiaoSong
8. Line spacing: fixed 28pt (`line: 560`) — **NOT** the default 1.5x
---
## Scene-Specific Prohibitions
In addition to universal prohibitions (see `references/common-rules.md`):
1. Must not write official documents as chat replies, promotional copy, speeches, or papers
2. Must not use Markdown headings/lists/bold/italic for document hierarchy
3. Must not apply red header/document number/colophon to all document types indiscriminately
4. Must not format meeting minutes as a standard red-header notice
5. Must not use literary rhetoric, colloquial expressions, or strongly emotional language
6. Must not fabricate incoming documents, policies, document numbers, meeting decisions, or superior directives
7. Must not use excessive blank lines to create "formal appearance"
8. Must not let the document read like a report, paper, or marketing copy
---
## Scene-Specific Quality Checks
In addition to universal checks (see `references/common-rules.md`):
### Format
- [ ] Red header text is #FF0000 and only red header uses color (red-header scenarios)
- [ ] Line spacing fixed at 28pt (line: 560)
- [ ] FangSong / SimHei / KaiTi correctly applied
- [ ] Signature right-aligned, date format correct
- [ ] No cover page, no TOC, no header
- [ ] Page number format "— X —"
- [ ] Red header / document number / colophon only where appropriate
### Content
- [ ] Document type correctly identified, structure matches
- [ ] Title is accurate, specific, document type clear (not vague)
- [ ] Addressee, attachments, signature, colophon used appropriately
- [ ] Closing phrase matches document type
- [ ] Body hierarchy strictly follows: 一、(Level 1) →(一)(Level 2) → 1. (Level 3) →1(Level 4)
- [ ] No Markdown headings/lists/bold/italic mixed in
- [ ] Meeting minutes not incorrectly given standard document signature and colophon
- [ ] Date uses Chinese numerals with proper "" character
- [ ] Document number uses tortoiseshell brackets ""
- [ ] No fabricated incoming documents / policy bases / organizational elements
- [ ] Register is solemn and plain — no colloquial / literary / promotional tone

340
skills/docx/scenes/report.md Executable file
View File

@@ -0,0 +1,340 @@
# Scene: Report / Proposal
## Goal
Generate a complete, formal, well-structured report ready for Word delivery. Must simultaneously meet:
- Complete structure, clear logic, formal language, definitive conclusions
- Objective data presentation, proper Word formatting
- Ready for presentation, filing, review, submission, or internal communication
**Forbidden:** Producing outlines-only / summaries / template annotations / half-finished drafts; outputting chat-style explanations or filler phrases like "here is the report content".
→ Font profile: **A (Formal)** — see `references/common-rules.md`
→ Default layout: standard margins — see `references/common-rules.md`
→ Placeholder convention — see `references/common-rules.md`
→ Universal prohibitions & quality checks — see `references/common-rules.md`
---
## Report Type Routing
Auto-select structure and expression style based on user intent. If not explicit, infer from topic.
```js
function selectReportType(keywords, topic) {
if (/analysis|competitor|industry|operations|data/.test(keywords)) return "analysis";
if (/experiment|lab|algorithm|engineering/.test(keywords)) return "experiment";
if (/test|QA|performance|security|compatibility/.test(keywords)) return "testing";
if (/survey|questionnaire|interview|market research/.test(keywords)) return "research";
if (/review|retrospective|post-mortem|summary/.test(keywords)) return "review";
if (/proposal|feasibility|implementation|optimization/.test(keywords)) return "proposal";
return "analysis"; // default
}
```
### 6 Report Types
| Type | Use Case | Structure Focus | Expression Focus |
|------|----------|----------------|-----------------|
| analysis | Industry/competitor/operations/data analysis | Background → Dimensions → Findings → Diagnosis → Recommendations | Conclusion-first, clear dimensions, chart-supported, actionable advice |
| experiment | Scientific/academic/algorithm/engineering experiments | Objective → Environment → Method → Results → Error → Conclusion | Precise process, clear conditions, objective results, conclusion ties to hypothesis |
| testing | Functional/performance/security/compatibility testing | Overview → Scope → Plan → Results → Defects → Risks → Conclusion | Data-driven, traceable, reproducible, supports go/no-go decisions |
| research | User/market/survey/interview research | Background → Subjects & Method → Sample → Findings → Synthesis → Recommendations | Clear sample boundaries, layered findings, recommendations match findings |
| review | Project/incident retrospective, phase summary | Goals → Review → Results → Issues → Lessons → Actions | Clear facts, restrained attribution, specific action items |
| proposal | Project/optimization proposal, feasibility study | Status → Goals → Solution → Roadmap → Resources → Risks → Benefits | Strong argumentation, executable plan, clear boundaries |
---
## Standard Template Structures
### Template A: Analysis Report
1. Executive Summary
2. Background & Objectives
3. Scope, Data Sources & Methodology
4. Core Findings
5. Problem Diagnosis & Root Cause
6. Conclusions & Recommendations
7. Appendices (if needed)
### Template B: Experiment Report
1. Abstract
2. Objective & Hypothesis
3. Environment & Materials
4. Procedure & Method
5. Data & Results
6. Error Analysis & Discussion
7. Conclusions
8. Appendices (if needed)
### Template C: Testing Report
1. Test Overview
2. Test Scope & Environment
3. Test Plan & Case Design
4. Test Results Summary
5. Defect Analysis & Distribution
6. Risk Assessment & Outstanding Issues
7. Test Conclusions
8. Appendices (if needed)
### Template D: Research Report
1. Research Summary
2. Background & Objectives
3. Subjects & Methodology
4. Sample & Data Description
5. Core Findings
6. Problem Synthesis
7. Recommendations & Action Direction
8. Appendices (if needed)
### Template E: Review / Summary Report
1. Overview
2. Goals & Scope
3. Process Review
4. Results Summary
5. Issues & Root Cause Analysis
6. Lessons Learned
7. Follow-up Action Plan
8. Appendices (if needed)
### Template F: Proposal / Feasibility Report
1. Executive Summary
2. Current State & Problem Analysis
3. Goals & Expected Outcomes
4. Solution Design
5. Implementation Roadmap & Milestones
6. Resource Requirements & Budget
7. Risk Analysis & Mitigation
8. Expected Benefits & Evaluation
9. Appendices (if needed)
**If the user provides a company/school/course template or fixed chapter requirements, always follow those first.**
---
## Input Recognition & Completion
### User May Provide
Report topic, type, use case, audience, industry, length requirements, data sources, structure requirements, output purpose (presentation/filing/audit/review/external submission/coursework), template files, company/department/project/author/date, etc.
### Processing Rules
1. If the user provides a template, existing document, company standard, or format example, **always follow it first**
2. If information is incomplete, fill in conservatively — completions must be **restrained, natural, credible, professional**
3. **Never fabricate** unrealistic data, conclusions, test results, business metrics, project statuses, policy backgrounds, or customer feedback
4. If critical information is missing and cannot be safely inferred, use standardized placeholders
5. If no real data is available, prefer low-hallucination approaches: "status description → analysis framework → problem synthesis → recommendations"
---
## Content Quality Constraints
### Logic & Structure
1. Report must revolve around a clear topic, objective, audience, and through-line
2. Must not just pile up background/concepts/vague statements — must demonstrate analysis, synthesis, judgment, comparison, or review value
3. Terminology must be consistent throughout — concepts must not drift
4. Abstract, body, conclusions, and recommendations must be consistent — no self-contradiction
5. Must form a complete loop: "background → objective → method/basis → process/status → findings/results → problems/judgment → recommendations/conclusions"
6. Each major chapter must have a clear core conclusion or topic sentence — no information dump
### Language Style
1. Formal, objective, restrained, professional
2. No colloquial expressions, chat tone, hyperbole, emotional language, or propaganda style
3. For management/decision-maker audience: conclusion-first, highlight key points, actionable recommendations
4. For technical/testing reports: clear basis, reproducible process, verifiable results, stated risks
### Data Expression
1. Never use vague expressions as main conclusions: "significantly improved", "obviously optimized", "performed well", "has certain issues"
2. If data exists, express quantitatively (e.g., "average response time under 200 ms" not "fast response")
3. First occurrence of a term: write full name with abbreviation, e.g., "Application Programming Interface (API)"
4. Without real data backing, never fabricate precise figures
5. Statements about facts, data, status, and results must be internally consistent
### Truthfulness & Conservative Generation
1. Never fabricate test results, experiment data, growth rates, customer counts, interview conclusions, sample distributions, or launch decisions
2. Never present speculation as proven fact
3. Never fabricate meeting minutes, regulatory bases, customer feedback, or system logs
4. When information is insufficient, use placeholders — never pretend information is complete
5. Conclusions must be restrained — do not overstate effects, risks, or value
6. Recommendations must be grounded in preceding analysis — no conclusions from thin air
---
## Chapter Content Requirements
### (1) Cover
1. Formal reports should have a cover page
2. Cover includes: title, subtitle (if any), organization/department, author, date, classification (if requested)
3. Cover must be a separate section
4. Cover does not display page numbers
5. Use `selectCoverRecipe()` for recipe + palette (see design-system.md)
6. Common recipes: general report R1, whitepaper R2, consulting R3, proposal R4
### (2) Executive Summary
1. Formal reports **must have** a summary opening — never jump directly into details
2. Summary should briefly state: background, objective, key methodology, key findings, main recommendations
3. Suitable for quick reading by management — generally 200400 words
4. Must not read like a TOC description or pile of background filler
### (3) Table of Contents
1. Medium-to-long formal reports should include a TOC
2. TOC must be generated from real heading styles (Heading + TOC field) — never write a fake TOC
3. TOC page is typically a separate page
4. TOC depth: usually 23 levels
### (4) Background & Objectives
1. Must explain why this report exists
2. Must state what problem/scenario/audience the report serves
3. If scope boundaries exist, state what the report does NOT cover
4. Must not be vague/grand background — must relate directly to this report's task
### (5) Methodology / Scope / Basis
1. Must state what materials, criteria, methods, and time range the report is based on
2. Analysis: data sources, analysis dimensions, criteria definitions
3. Experiment: environment, materials, samples, procedure principles
4. Testing: scope, version, environment, methods, coverage/rounds
5. Research: sample source, sample size, research method, time range
6. Reader must understand how conclusions were derived
### (6) Core Content / Process / Status / Results
1. Organized by logical or dimensional order — no chaotic piling
2. Each section should lead with its conclusion, then expand with evidence
3. Results must be specific — never just "performed well" or "has certain issues"
4. Data, metrics, phenomena, and comparisons must be clearly stated
5. If charts are needed but cannot be generated, use chart placeholders (see below)
### (7) Analysis / Discussion / Problem Diagnosis
1. Must not merely repeat earlier results
2. Must explain what results mean, what patterns they reveal, what problems they expose
3. May include: comparative analysis, root cause analysis, mechanism analysis, anomaly explanation, limitations, risk boundaries
4. Analysis must be consistent with preceding data and facts
### (8) Conclusions / Recommendations / Next Steps
1. Conclusions must respond to report objectives
2. Recommendations must be executable — not just principle slogans
3. Recommendations should state: who executes, what to do, when, expected improvement
4. Testing/review: clear verdict (pass / conditional pass / fail)
5. Retrospective/summary: specific follow-up action items
### (9) Appendices
1. Supplementary material valuable to the report but not suitable for the main body
2. Includes: raw data excerpts, detailed parameters, supplementary tables, sample screenshots
3. Appendices should be on separate pages with proper headings
---
## Chart Placeholder Convention
When charts are needed but cannot be directly generated:
```
[Chart Placeholder: Bar chart; Topic: Q1-Q4 2025 revenue comparison; X-axis: Quarter; Y-axis: Revenue (10K CNY); Style: clean business]
```
**Rules:**
- Specify: chart type, topic, axis meanings, key dimensions, optional palette suggestion
- Placeholder must be a standalone paragraph — never inline
- Never use vague placeholders like "insert chart here"
**Prefer direct generation:** Charts that can be produced via matplotlib should be generated as embedded PNGs. Placeholders are a fallback only.
---
## Content-to-Word Mapping
### Heading Levels
1. Strict hierarchy — no level-skipping
2. Headings must be informative — never "Background", "Content", "Other" (use "Project Background & Report Objectives" instead)
3. Do not mix multiple numbering systems
4. Normal paragraphs must not masquerade as headings
### Paragraphs
1. Do not use consecutive blank lines for visual spacing
2. Each paragraph should be a complete semantic unit — not too long or too fragmented
### Lists
1. Use lists only when genuinely needed — an entire report must not be bullet points
2. Nesting depth ≤ 3 levels
3. Consistent punctuation within a list (all complete sentences or all fragments)
4. Combine "key points" with "analysis paragraphs" — never just list without explaining
### Tables
1. Use tables only for structured data (statistics, comparisons, parameter lists)
2. Every table must have a header row — headers must not be blank
3. Avoid heavily merged-cell complex nested tables
4. Tables must have introductory and explanatory text before/after
5. Cell content should be concise — avoid long paragraphs inside cells
### Emphasis
1. Bold only for key conclusions, critical metrics, first occurrence of key terms
2. Never bold entire paragraphs
3. Avoid italic, strikethrough, and other unstable styles
---
## Palette Selection
| Report Type | Suggested Palette |
|-------------|-------------------|
| General | Neutral calm (primary: #101820) |
| Consulting | Warm terracotta |
| Tech | Cool dawn mist |
| Environment / Education | Warm sunshine |
| Medical | Cool mint |
See `references/design-system.md` for full palette definitions.
---
## Document Structure
1. **Cover** — via `selectCoverRecipe()` (see design-system.md)
- Separate section, page margin typically 0
- Common: general R1, whitepaper R2, consulting R3, proposal R4
2. **Table of Contents** — H1H3, separate section
3. **Executive Summary** — 1 page max
4. **Body** — Chapters per selected template (AF)
5. **Conclusions & Recommendations**
6. **Appendices** — Raw data, detailed tables
---
## Professional Elements
- **Page numbers**: bottom center, size 18, color "808080"
- **Header**: report title (abbreviated), size 18, color "808080"
- **Figure/table numbering**: sequential (Figure 1 / Table 1)
- **Cover**: no page number, no header/footer
- **TOC**: optional Roman numerals or no page numbers
- **Body**: Arabic numerals, continuous
---
## Scene-Specific Quality Checks
In addition to universal checks (see `references/common-rules.md`):
### Format
- [ ] Executive summary ≤ 1 page
- [ ] Figures/tables have captions ("Figure X: description" / "Table X: description")
- [ ] Cover recipe matches report type
- [ ] Data charts use palette accent color
### Content
- [ ] Has executive summary — not starting directly with details
- [ ] Heading names are specific and meaningful
- [ ] Complete loop: background → basis → content → analysis → conclusions/recommendations
- [ ] No fabricated or exaggerated details
- [ ] Abstract and conclusions are consistent
- [ ] Terminology consistent throughout
- [ ] Data expressions are quantified, not vague
- [ ] Recommendations are actionable with owners and timeline
### Structure
- [ ] Heading hierarchy has no level-skipping
- [ ] List nesting ≤ 3 levels
- [ ] Tables have headers with intro/explanation text
- [ ] Bold used sparingly for emphasis only

534
skills/docx/scenes/resume.md Executable file
View File

@@ -0,0 +1,534 @@
# Scene: Resume / CV
## Goal
Generate a complete, authentic, well-structured, position-targeted resume with stable Word formatting. Must simultaneously meet:
- Authentic and credible content, clear position targeting
- ATS-friendly, stable Word layout
- Clean structure, professional visual design, easy to scan
**Execution priority** (when conflicting): Position relevance > Information readability > ATS compatibility > Visual decoration
**Forbidden:** Producing advice-only / fragments / half-finished drafts; outputting chat-style explanations.
→ Font profile: **B (Visual)** — see `references/common-rules.md`
→ Placeholder convention & universal prohibitions — see `references/common-rules.md`
---
## Scope
Default: generate a position-oriented general resume. Switch to English resume, academic CV, international format, or design portfolio style only when explicitly requested by the user.
---
## Resume Type Routing
Auto-select module order based on user background and target:
### General Resume (default)
Name & Contact → Target Position → Profile Summary (optional) → Core Skills → Work Experience → Projects → Education → Certifications / Awards
### New Graduate Resume
Name & Contact → Target Position → Education → Internship Experience → Projects → Campus Activities / Competitions / Awards → Skills & Certifications
### Technical Role Resume
Name & Contact → Target Direction → Profile Summary (optional) → Tech Stack / Core Skills → Work Experience → Projects → Education → Open Source / Papers / Patents / Competitions
### Academic CV
Name & Contact → Research Direction / Target → Education → Research Experience → Papers / Patents / Projects / Grants → Teaching / Academic Service → Awards / Skills / Languages
---
## Input Processing Rules
1. If user provides a target position or JD → **must reorganize and rewrite content around position requirements**
2. If user provides a raw draft → prioritize restructuring, phrasing refinement, and priority reordering; do not rewrite into an unfamiliar career
3. **Never fabricate** companies, positions, degrees, projects, certifications, awards, papers, patents, data results, or achievements
4. If critical data is missing → use conservative expressions or placeholder `【Please fill in: ______】`; never fabricate precise numbers
5. A single resume should generally serve only one primary career direction
---
## Content Quality Constraints
### Core Principles
1. Resume must revolve around the target position — do not spread all experiences equally
2. Most relevant experiences, projects, and skills must be **placed first and detailed**
3. Terminology, company names, position titles, date formats, and skill names must be consistent
4. Must demonstrate: **personal positioning → capability tags → relevant experience → provable results**
5. No piling of vague self-praise; no inspirational writing or chronological dumps
### Experience Writing Standards
Each experience bullet should demonstrate: **Action + Object/Context + Method + Result/Impact**
**Recommended verbs:** Led, built, drove, optimized, refactored, designed, delivered, coordinated, improved, reduced, achieved
**Rules:**
- "Responsible for" / "participated in" are not absolutely forbidden, but must include scope and results
- Each bullet is concise — one core contribution per bullet
- Quantify when possible, but do not force-bold all numbers
- Recent experience gets detail; low-relevance/low-value experience gets compressed or removed
- Reverse chronological order — most recent and relevant first
- Expand the most recent 2 experiences; compress earlier ones
### Profile Summary / Self-Assessment
1. Not mandatory
2. If included, frame as "Profile Summary" — **34 lines max**
3. Focus on: years of experience, career direction, core capabilities, representative achievements, position fit
4. **Forbidden** as main content: "hardworking", "strong sense of responsibility", "team player", "quick learner", "outgoing personality"
### Truthfulness & Risk Control
1. Never fabricate experiences, achievements, education, awards, or certifications
2. Never upgrade "participated in" to "led" unless user information supports it
3. Never attribute team results entirely to the individual
4. Never fabricate revenue, conversion rates, headcount, budgets, or technical metrics
5. If no data available, use restrained expressions: "improved delivery efficiency", "shortened processing cycle", "supported core business launch"
---
## Length Control
| Candidate Type | Target Pages |
|---------------|-------------|
| New graduate / <3 years experience | **1 page** |
| 310 years experience | 12 pages |
| Senior manager / researcher / academic CV | May exceed 2 pages, but must maintain information density |
**Compression rules:**
- Experiences >5 years old with low relevance should be compressed
- Experiences >10 years old and irrelevant may be omitted
- Never pad low-value experiences just to "look comprehensive"
---
## ATS & Structure Constraints
1. Core information must be plain text — never rely on images, icons, text boxes, or headers/footers for key content
2. No embedded charts, objects, SmartArt, or WordArt
3. Experience descriptions use consistent bullet symbols — no complex auto-numbering
4. Bullets within the same position should be compact — no excess blank lines
**Table layout vs. ATS balance:** The 3 visual templates (A/B/C) use Table-based layouts for Word visual quality. In strict ATS scenarios (user explicitly says "ATS priority"), prefer Template B (single-column) with reduced table dependency. Default: visual quality first.
---
## Module Naming
Use only standard, universal, recruiter-familiar names:
- Personal Info, Target Position, Profile Summary, Core Skills, Work Experience, Projects, Education, Certifications, Awards, Languages
**Forbidden fancy names:** "My Growth Journey", "Self-Appreciation", "Shining Moments", "Life Motto"
---
## Template Disease Prevention
1. Do not include irrelevant identity tags (political affiliation, hometown, etc.) unless user explicitly requests
2. Do not place low-priority modules (hobbies, languages, personality traits) before work experience
3. Do not combine cover letter and resume in one document (unless user explicitly requests)
4. Do not let template feel overpower actual personal information
5. Do not let "self-assessment" occupy the golden area of the page (should come after core skills/experience)
---
## Template Selection
Three templates are provided, auto-selected based on user needs:
| Template | Layout | Best For | Color Style |
|----------|--------|----------|-------------|
| A | Left sidebar + right body | General purpose, tech roles | Dark grey sidebar + blue bar headings |
| B | Dark header banner + single column | Content-heavy / senior candidates | Dark blue header + underline headings |
| C | Left sidebar + vertical-line headings | International / bilingual / foreign companies | Blue sidebar + left-border headings |
**Selection logic:**
- Default: Template A
- Lots of content (expected > 1 page) → Template B (no sidebar, better space utilization)
- User explicitly requests bilingual / English → Template C
### Industry Color Suggestions
| Career Direction | Sidebar BG | Accent Color | Recommended Template |
|-----------------|-----------|-------------|---------------------|
| Tech / Internet | `#1A1F36` (deep blue-purple) | `#667eea` (amethyst) | A or C |
| Finance / Consulting | `#0F2027` (deep sea blue) | `#D4AF37` (gold) | A or B |
| Design / Creative | `#2D1B30` (deep purple) | `#f5576c` (coral pink) | A or C |
| Education / Training | `#1A3A3A` (dark green) | `#3CB4A0` (mint green) | A |
| Medical / Health | `#0E2030` (dark cyan) | `#3888A8` (medical blue) | B |
| General / Default | `#303030` (warm dark neutral) | `#B89870` (warm accent) | A |
When industry is unspecified, use default warm neutral palette. This aligns with the Visual Profile warm-neutral guidance in `design-system.md`.
## Key Rules
- **NO cover page / NO TOC**
- **Target: 1 page** (2 pages max for senior roles)
- **Compact spacing**: `line: 276` (1.15x)
- All templates use **bilingual section headings** (e.g., "Work Experience 工作经历")
---
## Template A: Left Sidebar + Color Bar Headings
### Color Palette
```js
const S = {
bg: "3B4F5C", // sidebar background (dark grey-blue)
text: "D8E2E8", // sidebar text
label: "8BA0AD", // sidebar secondary text
accent: "2F97B8", // accent color (blue-cyan)
title: "1A2D38", // body heading
body: "2C3E4A", // body content
sec: "6B8592", // secondary info (dates etc.)
};
```
### Layout Structure
```
┌──────────┬──────────────────────┐
│ [Photo] │ ██ Profile ██ │ ← Blue bar heading
│ │ Summary text... │
│ Name │ │
│ Title │ ██ Work Experience ██│
│ │ Company Role Date │
│ ──────── │ ▸ Achievement... │
│ Basic │ ▸ Achievement... │
│ Info │ │
│ │ ██ Projects ██ │
│ ──────── │ ... │
│ Contact │ │
│ │ ██ Education ██ │
│ ──────── │ ... │
│ Skills │ │
│ Java ●●●●○│ │
│ Go ●●●○○│ │
│ │ │
│ ──────── │ │
│ Certs │ │
└──────────┴──────────────────────┘
30% 70%
```
### Implementation Notes
**Page setup:**
```js
page: { margin: { top: 0, bottom: 0, left: 0, right: 0 } }
// Use Table to simulate columns: columnWidths: [3400, 8506]
// ⚠️ Row height must use "exact" with safety margin to prevent overflow blank pages
// Row height: height: { value: 16038, rule: "exact" }
// 16038 = 16838(A4 height) - 1200(safety margin for cross-engine compatibility)
```
**Sidebar element order:**
1. Photo placeholder (rectangle + border, width 2400 DXA, height 1800)
2. Name (32pt bold white SimHei) + Title (18pt accent)
3. Basic info (DOB / degree / school)
4. Contact info (phone / email / address)
5. Skill ratings (name + ●○ dot rating, 5 levels each)
6. Certificates list
**Right-side section headings (color bar style):**
```js
// Full-width bar background + white Chinese text + lighter English text
new Table({ columnWidths:[7600], rows:[new TableRow({ children:[
new TableCell({
shading: { fill: S.accent, type: ShadingType.CLEAR },
margins: { top:40, bottom:40, left:200, right:100 },
children: [new Paragraph({ children: [
new TextRun({ text: "Work Experience ", size:22, bold:true, color:"FFFFFF", font:"SimHei" }),
new TextRun({ text: "Experience", size:18, color:"C8E8F0", font:"Times New Roman", italics:true }),
] })],
})
] })] });
```
**Experience entry format:**
```js
// Line 1: Company(bold) + Title(accent) + Date(right-aligned)
new Paragraph({
tabStops: [{ type: TabStopType.RIGHT, position: 7200 }],
children: [
new TextRun({ text: "Company Name", size:22, bold:true, color:S.title }),
new TextRun({ text: " Role Title", size:20, color:S.accent }),
new TextRun({ text: "\t2023.06 — Present", size:17, color:S.sec }),
]
});
// Line 2+: ▸ bullet points
```
---
## Template B: Dark Header Banner + Single Column
### Color Palette
```js
const C = {
dark: "1A3352", // header background (dark blue)
accent: "2980B9", // accent color
title: "1A2636", // heading
body: "2C3E50", // body text
sec: "6B8599", // secondary info
light: "E8EFF5", // light background
};
```
### Layout Structure
```
┌────────────────────────────────┐
│ ██████████████████████████████ │ ← Dark blue background banner
│ █ Name Title █ │ Contains name / title /
│ █ Phone | Email | Location █ │ contact / basic info
│ █ DOB | Degree | School █ │
│ ██████████████████████████████ │
│ │
│ Profile │ ← Underline heading
│ ───────────────────────────── │
│ Summary text... │
│ │
│ Work Experience │
│ ───────────────────────────── │
│ Company | Role Date │
│ • Achievement... │
│ ... │
│ │
│ Skills │
│ ───────────────────────────── │
│ Programming ●●●●○ Java/Go/...│ ← Rating + details
└────────────────────────────────┘
```
### Implementation Notes
**Header banner:**
```js
// Table single row single column, dark background, height 2400 DXA
new Table({ columnWidths:[11906], rows:[new TableRow({
height: { value:2400, rule:"exact" },
children:[new TableCell({
shading: { fill: C.dark },
margins: { top:300, bottom:200, left:800, right:800 },
verticalAlign: VerticalAlign.TOP, // Never use CENTER in exact-height rows (WPS incompatible)
children: [
// Line 1: Name(48pt white) + Title
// Line 2: Phone | Email | Location
// Line 3: DOB | Degree | School
]
})]
})] });
```
**Section headings (underline style):**
```js
new Paragraph({
borders: { bottom: { style: BorderStyle.SINGLE, size: 2, color: C.accent } },
children: [
new TextRun({ text: "Work Experience", size:24, bold:true, color:C.accent, font:"SimHei" }),
new TextRun({ text: " Experience", size:18, color:C.sec, italics:true }),
]
});
```
**Skills display (rating + details):**
```js
// Name(bold) + ●○ rating + specific tools list
new Paragraph({ children: [
new TextRun({ text: "Programming ", size:19, bold:true, color:C.title }),
new TextRun({ text: "●●●●○ ", size:13, color:C.accent }),
new TextRun({ text: "Java / Go / Python / TypeScript", size:18, color:C.sec }),
] });
```
---
## Template C: Blue Sidebar + Vertical-Line Headings
### Color Palette
```js
const C = {
side: "4A7C8F", // sidebar background (teal-blue)
text: "FFFFFF", // sidebar text
label: "A0C4D0", // sidebar secondary text
accent: "357A8F", // accent color
dot: "2F8FAD", // skill dot fill color
dotDim: "B8D4DE", // skill dot empty color
title: "1A3040", // body heading
body: "2C4050", // body content
sec: "6B8A98", // secondary info
};
```
### Sidebar-Specific Elements
**Circular photo placeholder:**
```js
new Paragraph({ alignment: AlignmentType.CENTER,
children: [new TextRun({ text: "◯", size:80, color:C.label })]
});
```
**Language proficiency matrix:**
```js
"English ● ● ● ● ○"
"Japanese ● ● ○ ○ ○"
```
**Right-side section headings (left-border style):**
```js
new Paragraph({
borders: { left: { style: BorderStyle.SINGLE, size:8, color:C.accent, space:8 } },
indent: { left: 120 },
children: [
new TextRun({ text: "Work Experience", size:24, bold:true, color:C.title, font:"SimHei" }),
new TextRun({ text: " Experience", size:18, color:C.sec, italics:true }),
]
});
```
**Experience entry format (differs from A):**
```js
// Line 1: Company name (bold)
// Line 2: Role (accent color) + Date
// Line 3+: ▸ bullet points
```
---
## Universal Rules
### Font Specifications
| Element | Font | Size | Style |
|---------|------|------|-------|
| Name (sidebar) | SimHei | 32pt (size:64) | Bold, white |
| Name (header) | SimHei | 24pt (size:48) | Bold, white |
| Section heading | SimHei | 11pt (size:22) | Bold |
| Company / School | Microsoft YaHei | 11pt (size:22) | Bold |
| Role title | Microsoft YaHei | 10pt (size:20) | accent color |
| Date range | Microsoft YaHei | 8.5pt (size:17) | sec color |
| Bullet description | Microsoft YaHei | 9.5pt (size:19) | body color |
| Skill dots | Default | 6.5pt (size:13) | accent / dimColor |
### Bullet Symbols
- Template A / C: `▸` (small triangle)
- Template B: `•` (round dot)
### Skill Rating Rules
- 15 levels using filled ● and empty ○ dots
- One skill per line, name on the left, dots on the right
- Filled dot color: accent; empty dot color: dimColor
### JD Matching Logic
When user provides a job description:
1. Extract key requirements (skills, experience, education)
2. Prioritize matching experience items to the top
3. Naturally incorporate JD keywords into descriptions
4. Highlight relevant skills
### Multi-Page Handling
- 1 page content: Sidebar templates (A/C) or single-column template (B)
- Over 1 page: Prefer Template B; if using A/C, switch page 2 to full-width layout with a name bar at the top (Name | Title)
⚠️ **Multi-page resumes must use multi-section structure:**
Page 1 and Page 2 must be **different sections** for independent margin and layout control:
```js
sections: [
{
// Page 1 section — margin 0 (sidebar layout needs full-page)
properties: { page: { margin: { top: 0, bottom: 0, left: 0, right: 0 } } },
children: [page1Table],
},
{
// Page 2 section — normal margins with header bar
properties: { page: { margin: { top: 800, bottom: 600, left: 800, right: 800 } } },
children: [pageHeader(name, title), ...page2Content],
},
]
```
⚠️ **Template B multi-page handling:**
Template B header banner uses Table simulation:
1. Banner `columnWidths` must equal **page content area width** (pageWidth - marginLeft - marginRight), not full page width
2. If banner needs full page width → set page 1 section margin to 0, banner columnWidths to 11906
3. Page 2+ must be independent sections, margin.top ≥ 800
⚠️ **Page 2+ top spacing rules (mandatory):**
1. **Page margin.top must be ≥ 800 twips** (~1.4 cm), never 0
2. **Page 2+ needs a header info bar:** concise `Name | Title` bar, height ~400600 twips, separated from body with light background or bottom line
3. **200300 twips spacing between header bar and body content**
4. **Forbidden: content touching the very top of page 2**
```js
// Concise header bar for page 2+
function pageHeader(name, title) {
return new Table({
width: { size: 100, type: WidthType.PERCENTAGE },
borders: { top: NB, left: NB, right: NB, insideHorizontal: NB, insideVertical: NB,
bottom: { style: BorderStyle.SINGLE, size: 1, color: "D0D0D0" } },
rows: [new TableRow({
cantSplit: true,
height: { value: 500, rule: "atLeast" },
children: [new TableCell({
margins: { top: 60, bottom: 60, left: 200, right: 200 },
borders: { top: NB, left: NB, right: NB, bottom: NB },
children: [new Paragraph({
children: [
new TextRun({ text: name, size: 20, bold: true, color: S.title || C.title }),
new TextRun({ text: ` | ${title}`, size: 18, color: S.sec || C.sec }),
]
})],
})],
})],
});
}
```
---
## Scene-Specific Quality Checks
In addition to universal checks (see `references/common-rules.md`):
### Format
- [ ] Fits within 1 page (senior ≤ 2 pages)
- [ ] **Single-page fill rate ≥ 85%** (bottom whitespace ≤ 15%, ~2500 twips)
- [ ] Section headings are bilingual
- [ ] Skill rating dots correct (●○)
- [ ] Experience in reverse chronological order
- [ ] No cover page, no TOC
- [ ] Line spacing 1.15x (line: 276)
- [ ] No extra blank pages
- [ ] **Table row height uses `rule: "exact"` with value ≤ 16038** (prevent overflow blank pages)
- [ ] **Multi-page: page 2+ has header info bar + proper top spacing**
### Content
- [ ] Clearly organized around target position
- [ ] No vague self-assessments ("hardworking", "responsible", "team player")
- [ ] No fabricated data or exaggerated results
- [ ] Most relevant experience placed first and detailed
- [ ] Each bullet demonstrates action + object + method + result
- [ ] No long narrative blocks / excessive long sentences / information density imbalance
- [ ] Module names are standardized
- [ ] Contact info is plain text, clearly positioned
- [ ] Header area forms visual center
- [ ] Work experience and projects are the visual main body
- [ ] Page count matches candidate seniority
### Single-Page Fill Rules
Single-page resumes must fully utilize page space — **large bottom whitespace is forbidden:**
1. If content is insufficient → **proactively expand:**
- Add project details, skill keywords, achievement data
- Add supplementary modules: profile summary, interests, awards
2. Use section spacing (`spacing.before/after`) to **distribute content evenly**
3. Sidebar templates (A/C): sidebar height should approach full page
- If sidebar content is sparse, increase element spacing
- Or add supplementary modules: "Languages", "Interests"
4. Assessment: after generation, check last content element position; if >2500 twips from page bottom, adjust

View File

@@ -0,0 +1 @@
# Make scripts directory a package for relative imports in tests

View File

@@ -0,0 +1,749 @@
#!/usr/bin/env python3
"""
Add placeholder entries to Table of Contents in a DOCX file.
This script adds placeholder TOC entries between the 'separate' and 'end'
field characters, so users see some content on first open instead of an empty TOC.
The original file is replaced with the modified version.
Usage:
python add_toc_placeholders.py <docx_file> # auto-extract headings (default)
python add_toc_placeholders.py <docx_file> --auto # explicit auto mode
python add_toc_placeholders.py <docx_file> --entries <entries_json>
entries_json format: JSON string with array of objects:
[
{"level": 1, "text": "Chapter 1 Overview", "page": "1"},
{"level": 2, "text": "Section 1.1 Details", "page": "1"}
]
Default behavior (no flags): auto-extracts Heading 1-3 from the document.
Filters out table/figure captions (e.g. "表 1xxx", "图 2xxx").
Example:
python add_toc_placeholders.py document.docx
python add_toc_placeholders.py document.docx --auto
python add_toc_placeholders.py document.docx --entries '[{"level":1,"text":"Introduction","page":"1"}]'
"""
import argparse
import html
import json
import re
import shutil
import sys
import tempfile
import zipfile
from pathlib import Path
def _extract_headings_from_docx(docx_path: str, max_level: int = 3) -> list:
"""Extract headings from a DOCX file for auto-mode TOC generation.
Args:
docx_path: Path to DOCX file
max_level: Maximum heading level to include (default 3)
Returns:
List of dicts with 'level', 'text', 'page' keys
"""
from docx import Document
doc = Document(docx_path)
entries = []
page_estimate = 1
# Pattern to filter out table/figure captions styled as headings
caption_pattern = re.compile(r'^[表图]\s*\d')
for i, para in enumerate(doc.paragraphs):
style_name = para.style.name if para.style else ''
if not style_name.startswith('Heading'):
continue
m = re.search(r'(\d+)', style_name)
if not m:
continue
level = int(m.group(1))
if level > max_level:
continue
text = para.text.strip()
if not text:
continue
# Filter table/figure captions
if caption_pattern.match(text):
continue
# Rough page estimate: increment every ~8 headings
page_estimate = max(1, 1 + i // 8)
entries.append({"level": level, "text": text, "page": str(page_estimate)})
return entries
def add_toc_placeholders(docx_path: str, entries: list = None) -> None:
"""Add placeholder TOC entries to a DOCX file (in-place replacement).
Args:
docx_path: Path to DOCX file (will be modified in-place)
entries: Optional list of placeholder entries. Each entry should be a dict
with 'level' (1-3), 'text', and 'page' keys.
"""
docx_path = Path(docx_path)
# Create temp directory for extraction
with tempfile.TemporaryDirectory() as temp_dir:
temp_path = Path(temp_dir)
extracted_dir = temp_path / "extracted"
temp_output = temp_path / "output.docx"
# Extract DOCX
with zipfile.ZipFile(docx_path, 'r') as zip_ref:
zip_ref.extractall(extracted_dir)
# Ensure TOC styles exist in styles.xml
styles_xml_path = extracted_dir / "word" / "styles.xml"
toc_style_mapping = _ensure_toc_styles(styles_xml_path)
print(f"TOC style mapping: {toc_style_mapping}")
# Fix settings.xml: ensure updateFields has val="true"
settings_xml_path = extracted_dir / "word" / "settings.xml"
_fix_update_fields(settings_xml_path)
# Fix Heading styles: ensure outlineLvl is set (required for TOC field update)
_fix_heading_outline_levels(styles_xml_path)
# Process document.xml
document_xml = extracted_dir / "word" / "document.xml"
if not document_xml.exists():
raise ValueError("document.xml not found in the DOCX file")
# Read and process XML
content = document_xml.read_text(encoding='utf-8')
# Fix fldChar structure: split merged begin+instrText+separate into separate <w:r> elements
content = _fix_fld_char_structure(content)
# Find TOC structure and add placeholders (uses lxml for robust XML parsing)
modified_content = _insert_toc_placeholders(content, entries, toc_style_mapping)
# Write back
document_xml.write_text(modified_content, encoding='utf-8')
# Repack DOCX to temp file
with zipfile.ZipFile(temp_output, 'w', zipfile.ZIP_DEFLATED) as zipf:
for file_path in extracted_dir.rglob('*'):
if file_path.is_file():
arcname = file_path.relative_to(extracted_dir)
zipf.write(file_path, arcname)
# Replace original file with modified version (use shutil.move for cross-device support)
docx_path.unlink()
shutil.move(str(temp_output), str(docx_path))
def _fix_update_fields(settings_xml_path: Path) -> None:
"""Fix settings.xml to ensure <w:updateFields w:val="true"/> is present.
The docx npm library generates <w:updateFields/> without val="true",
which Word/WPS interprets as false, preventing TOC auto-update on open.
"""
if not settings_xml_path.exists():
return
content = settings_xml_path.read_text(encoding='utf-8')
original = content
# Case 1: <w:updateFields/> (self-closing, no val) → add val="true"
if '<w:updateFields/>' in content:
content = content.replace('<w:updateFields/>', '<w:updateFields w:val="true"/>')
print('Fixed: <w:updateFields/> → <w:updateFields w:val="true"/>')
# Case 2: <w:updateFields w:val="false"/> → change to true (match precisely)
elif re.search(r'<w:updateFields\s+w:val="false"\s*/>', content):
content = re.sub(
r'<w:updateFields\s+w:val="false"\s*/>',
'<w:updateFields w:val="true"/>',
content
)
print('Fixed: <w:updateFields w:val="false"/> → <w:updateFields w:val="true"/>')
# Case 3: Not present at all → inject before </w:settings>
elif '<w:updateFields' not in content:
content = content.replace('</w:settings>', '<w:updateFields w:val="true"/></w:settings>')
print('Fixed: added <w:updateFields w:val="true"/> to settings.xml')
if content != original:
settings_xml_path.write_text(content, encoding='utf-8')
def _fix_heading_outline_levels(styles_xml_path: Path) -> None:
"""Fix Heading styles to include outlineLvl in pPr.
The docx npm library creates Heading styles but sometimes doesn't set outlineLvl
in the style definition. Without outlineLvl, Word's TOC field update won't find
headings even though they display correctly.
This ensures Heading1 has outlineLvl=0, Heading2 has outlineLvl=1, etc.
"""
if not styles_xml_path.exists():
return
content = styles_xml_path.read_text(encoding='utf-8')
original = content
W_NS = 'http://schemas.openxmlformats.org/wordprocessingml/2006/main'
for level in range(1, 7):
style_id = f'Heading{level}'
outline_val = str(level - 1)
# Pattern: find <w:style> with w:styleId="HeadingN"
style_pattern = (
rf'(<w:style[^>]*w:styleId="{style_id}"[^>]*>)'
rf'(.*?)'
rf'(</w:style>)'
)
match = re.search(style_pattern, content, flags=re.DOTALL)
if not match:
continue
style_content = match.group(2)
# Check if outlineLvl already exists in this style
if f'<w:outlineLvl' in style_content:
continue
# Find or create <w:pPr> within this style
ppr_match = re.search(r'(<w:pPr[^>]*>)(.*?)(</w:pPr>)', style_content, flags=re.DOTALL)
if ppr_match:
# Add outlineLvl inside existing pPr
new_ppr_content = ppr_match.group(2) + f'<w:outlineLvl w:val="{outline_val}"/>'
new_style_content = (
style_content[:ppr_match.start()] +
ppr_match.group(1) + new_ppr_content + ppr_match.group(3) +
style_content[ppr_match.end():]
)
else:
# No pPr exists, create one
new_ppr = f'<w:pPr><w:outlineLvl w:val="{outline_val}"/></w:pPr>'
# Insert pPr right after style opening (after name/basedOn if present)
new_style_content = new_ppr + style_content
new_style = match.group(1) + new_style_content + match.group(3)
content = content[:match.start()] + new_style + content[match.end():]
print(f'Fixed: added outlineLvl={outline_val} to {style_id} style')
if content != original:
styles_xml_path.write_text(content, encoding='utf-8')
def _fix_fld_char_structure(xml_content: str) -> str:
"""Fix malformed fldChar structure where begin+instrText+separate are in one <w:r>.
The docx npm library generates:
<w:r><w:fldChar begin/><w:instrText>TOC...</w:instrText><w:fldChar separate/></w:r>
Word/WPS requires the standard structure:
<w:r><w:fldChar begin/></w:r>
<w:r><w:instrText>TOC...</w:instrText></w:r>
<w:r><w:fldChar separate/></w:r>
"""
# Match a <w:r> that contains both begin fldChar AND instrText AND separate fldChar
pattern = (
r'<w:r(?:\s[^>]*)?>('
r'<w:fldChar[^>]*w:fldCharType="begin"[^>]*/>' # begin
r')('
r'<w:instrText[^>]*>.*?</w:instrText>' # instrText
r')('
r'<w:fldChar[^>]*w:fldCharType="separate"[^>]*/>' # separate
r')</w:r>'
)
def split_run(match):
begin = match.group(1)
instr = match.group(2)
separate = match.group(3)
return f'<w:r>{begin}</w:r><w:r>{instr}</w:r><w:r>{separate}</w:r>'
modified = re.sub(pattern, split_run, xml_content, flags=re.DOTALL)
if modified != xml_content:
print("Fixed: split merged fldChar begin+instrText+separate into separate <w:r> elements")
# Fix TOC instrText: remove \t switch with wrong style names
# docx npm lib generates \t "Heading1,1,Heading2,2,..." but Word expects "Heading 1,1,..."
# Since we already have \o "1-3" which uses outlineLvl (now fixed), \t is redundant and harmful
toc_t_pattern = r'(TOC\s+[^<]*?)\\t\s+&quot;[^&]*&quot;'
modified2 = re.sub(toc_t_pattern, r'\1', modified)
if modified2 != modified:
print("Fixed: removed \\t switch from TOC instrText (\\o with outlineLvl is sufficient)")
modified = modified2
return modified
def _detect_toc_styles(styles_xml_path: Path) -> dict:
"""Detect TOC style IDs from styles.xml.
Args:
styles_xml_path: Path to styles.xml
Returns:
Dictionary mapping level (1-3) to style ID string
"""
if not styles_xml_path.exists():
return {}
content = styles_xml_path.read_text(encoding='utf-8')
result = {}
for level in range(1, 4):
# Standard TOC style names: "TOC 1", "TOC 2", "TOC 3" (with space)
# or "TOC1", "TOC2", "TOC3" (no space) — docx-js uses numeric IDs like "9", "11", "12"
patterns = [
rf'w:styleId="(TOC{level})"',
rf'w:styleId="(TOC {level})"',
rf'<w:name\s+w:val="toc\s*{level}"[^/]*/>\s*</w:name>|<w:name\s+w:val="toc\s*{level}"[^/]*/>',
]
for pattern in patterns[:2]:
m = re.search(pattern, content)
if m:
result[level] = m.group(1)
break
else:
# Try matching by w:name (case insensitive toc N)
# Find <w:style> blocks with name containing "toc N"
name_pattern = rf'<w:style[^>]*w:styleId="([^"]*)"[^>]*>.*?<w:name\s+w:val="[Tt][Oo][Cc]\s*{level}"'
m = re.search(name_pattern, content, flags=re.DOTALL)
if m:
result[level] = m.group(1)
return result
def _ensure_toc_styles(styles_xml_path: Path) -> dict:
"""Ensure TOC styles exist in styles.xml, adding them if necessary.
Returns:
Dictionary mapping level (1-3) to style ID string
"""
if not styles_xml_path.exists():
return {1: "9", 2: "11", 3: "12"}
content = styles_xml_path.read_text(encoding='utf-8')
detected = _detect_toc_styles(styles_xml_path)
result = dict(detected)
W_NS = 'http://schemas.openxmlformats.org/wordprocessingml/2006/main'
# Define TOC styles to add if missing
toc_style_defs = {
1: {
'id': '9',
'name': 'toc 1',
'xml': f'''<w:style w:type="paragraph" w:styleId="9" xmlns:w="{W_NS}">
<w:name w:val="toc 1"/>
<w:basedOn w:val="Normal"/>
<w:uiPriority w:val="39"/>
<w:pPr>
<w:tabs><w:tab w:val="right" w:leader="dot" w:pos="9026"/></w:tabs>
<w:spacing w:before="120" w:after="60"/>
</w:pPr>
<w:rPr><w:b/><w:bCs/></w:rPr>
</w:style>'''
},
2: {
'id': '11',
'name': 'toc 2',
'xml': f'''<w:style w:type="paragraph" w:styleId="11" xmlns:w="{W_NS}">
<w:name w:val="toc 2"/>
<w:basedOn w:val="Normal"/>
<w:uiPriority w:val="39"/>
<w:pPr>
<w:tabs><w:tab w:val="right" w:leader="dot" w:pos="9026"/></w:tabs>
<w:ind w:left="360"/>
<w:spacing w:before="60" w:after="40"/>
</w:pPr>
</w:style>'''
},
3: {
'id': '12',
'name': 'toc 3',
'xml': f'''<w:style w:type="paragraph" w:styleId="12" xmlns:w="{W_NS}">
<w:name w:val="toc 3"/>
<w:basedOn w:val="Normal"/>
<w:uiPriority w:val="39"/>
<w:pPr>
<w:tabs><w:tab w:val="right" w:leader="dot" w:pos="9026"/></w:tabs>
<w:ind w:left="720"/>
<w:spacing w:before="40" w:after="20"/>
</w:pPr>
</w:style>'''
},
}
modified = False
for level in range(1, 4):
if level not in result:
style_def = toc_style_defs[level]
result[level] = style_def['id']
# Add style before </w:styles>
insert_point = content.rfind('</w:styles>')
if insert_point == -1:
print(f"WARNING: Could not find </w:styles> to insert TOC {level} style", file=sys.stderr)
continue
content = content[:insert_point] + style_def['xml'] + '\n' + content[insert_point:]
print(f"Added TOC {level} style (ID: {style_def['id']})")
modified = True
if modified:
styles_xml_path.write_text(content, encoding='utf-8')
# Ensure Hyperlink style exists
_ensure_hyperlink_style(styles_xml_path)
return result
def _ensure_hyperlink_style(styles_xml_path: Path) -> None:
"""Ensure Hyperlink character style exists in styles.xml."""
if not styles_xml_path.exists():
return
content = styles_xml_path.read_text(encoding='utf-8')
if 'w:styleId="Hyperlink"' in content:
return
W_NS = 'http://schemas.openxmlformats.org/wordprocessingml/2006/main'
hyperlink_style = f'''<w:style w:type="character" w:styleId="Hyperlink" xmlns:w="{W_NS}">
<w:name w:val="Hyperlink"/>
<w:uiPriority w:val="99"/>
<w:rPr>
<w:color w:val="0563C1"/>
<w:u w:val="single"/>
</w:rPr>
</w:style>'''
insert_point = content.rfind('</w:styles>')
if insert_point != -1:
content = content[:insert_point] + hyperlink_style + '\n' + content[insert_point:]
styles_xml_path.write_text(content, encoding='utf-8')
print("Added Hyperlink character style")
def _insert_toc_placeholders(xml_content: str, entries: list = None, toc_style_mapping: dict = None) -> str:
"""Insert placeholder TOC entries and heading bookmarks into XML content.
Uses lxml ElementTree for robust XML manipulation instead of fragile regex.
This function does TWO things:
1. Adds bookmark anchors to each Heading paragraph (so Word can link TOC → heading)
2. Replaces TOC placeholder area with proper entries containing HYPERLINK + PAGEREF
Args:
xml_content: The XML content of document.xml
entries: List of placeholder entries with 'level', 'text', 'page' keys
toc_style_mapping: Dictionary mapping level to style ID
Returns:
Modified XML content with bookmarks and TOC placeholders
Raises:
RuntimeError: If TOC structure cannot be found or is malformed
"""
from lxml import etree
if entries is None:
entries = [{"level": 1, "text": "Contents", "page": "1"}]
if toc_style_mapping is None:
toc_style_mapping = {1: "9", 2: "11", 3: "12"}
W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
R_NS = "http://schemas.openxmlformats.org/officeDocument/2006/relationships"
# Parse XML
root = etree.fromstring(xml_content.encode('utf-8'))
nsmap = {'w': W, 'r': R_NS}
# ── Step 1: Add bookmarks to Heading paragraphs ──
bookmark_id_counter = 100000
heading_bookmark_map = {} # text → first bookmark_name (backward compat)
heading_bookmark_map_all = {} # text → [list of bookmark_names] for duplicate headings
for para in root.iter(f'{{{W}}}p'):
# Find pStyle
ppr = para.find(f'{{{W}}}pPr')
if ppr is None:
continue
pstyle = ppr.find(f'{{{W}}}pStyle')
if pstyle is None:
continue
style_val = pstyle.get(f'{{{W}}}val', '')
if not re.match(r'Heading\d$', style_val):
continue
# Extract heading text
texts = []
for t_elem in para.iter(f'{{{W}}}t'):
if t_elem.text:
texts.append(t_elem.text)
heading_text = ''.join(texts).strip()
if not heading_text:
continue
# Skip if already has bookmark
if para.find(f'{{{W}}}bookmarkStart') is not None:
continue
# Generate bookmark
bm_name = f"_Toc{bookmark_id_counter}"
bm_id_str = str(bookmark_id_counter)
bookmark_id_counter += 1
# Store mapping (support duplicate headings)
if heading_text not in heading_bookmark_map_all:
heading_bookmark_map_all[heading_text] = []
heading_bookmark_map_all[heading_text].append(bm_name)
if heading_text not in heading_bookmark_map:
heading_bookmark_map[heading_text] = bm_name
# Insert bookmarkStart after pPr
bm_start = etree.Element(f'{{{W}}}bookmarkStart')
bm_start.set(f'{{{W}}}id', bm_id_str)
bm_start.set(f'{{{W}}}name', bm_name)
bm_end = etree.Element(f'{{{W}}}bookmarkEnd')
bm_end.set(f'{{{W}}}id', bm_id_str)
ppr_index = list(para).index(ppr)
para.insert(ppr_index + 1, bm_start)
# bookmarkEnd at end of paragraph
para.append(bm_end)
bookmarks_added = len(heading_bookmark_map)
if bookmarks_added > 0:
print(f"Added {bookmarks_added} bookmarks to Heading paragraphs")
# ── Step 2: Find TOC field structure (begin → instrText → separate → end) ──
toc_separate_para = None
toc_end_para = None
# Track field nesting to handle nested fields correctly
field_stack = []
toc_field_depth = None
for fld_char in root.iter(f'{{{W}}}fldChar'):
fld_type = fld_char.get(f'{{{W}}}fldCharType')
run = fld_char.getparent()
if fld_type == 'begin':
para = run.getparent()
instr_text = ''
found_run = False
for sibling in para:
if sibling is run:
found_run = True
it = sibling.find(f'{{{W}}}instrText')
if it is not None and it.text:
instr_text += it.text
continue
if found_run and sibling.tag == f'{{{W}}}r':
it = sibling.find(f'{{{W}}}instrText')
if it is not None and it.text:
instr_text += it.text
if sibling.find(f'{{{W}}}fldChar') is not None:
break
field_stack.append(instr_text.strip())
if 'TOC' in instr_text and toc_field_depth is None:
toc_field_depth = len(field_stack)
elif fld_type == 'separate':
if toc_field_depth is not None and len(field_stack) == toc_field_depth:
toc_separate_para = run.getparent()
elif fld_type == 'end':
if toc_field_depth is not None and len(field_stack) == toc_field_depth:
toc_end_para = run.getparent()
break
if field_stack:
field_stack.pop()
if toc_separate_para is None or toc_end_para is None:
has_begin = root.find(f'.//{{{W}}}fldChar[@{{{W}}}fldCharType="begin"]') is not None
has_separate = root.find(f'.//{{{W}}}fldChar[@{{{W}}}fldCharType="separate"]') is not None
if not has_begin:
raise RuntimeError(
"TOC FAILED: No field structure found in document. "
"Ensure the code includes a TableOfContents element."
)
elif not has_separate:
raise RuntimeError(
"TOC FAILED: TOC field has 'begin' but no 'separate' fldChar. "
"Run _fix_fld_char_structure() first or check the docx-js version."
)
else:
raise RuntimeError(
"TOC FAILED: Field structure found but no TOC instrText detected. "
"Ensure TableOfContents element generates a TOC \\o field code."
)
# ── Step 3: Remove everything between separate-para and end-para ──
# The TOC paragraphs may be direct children of <w:body> or wrapped in <w:sdt><w:sdtContent>
toc_container = toc_separate_para.getparent() # could be body or sdtContent
container_children = list(toc_container)
sep_idx = container_children.index(toc_separate_para)
end_idx = container_children.index(toc_end_para)
for elem in container_children[sep_idx + 1:end_idx]:
toc_container.remove(elem)
# ── Step 4: Build and insert placeholder paragraphs ──
indent_mapping = {1: 0, 2: 360, 3: 720, 4: 1080, 5: 1440, 6: 1800}
heading_occurrence_counter = {}
insert_pos = list(toc_container).index(toc_end_para)
for entry in entries:
level = entry.get('level', 1)
text_raw = entry.get('text', '')
page = entry.get('page', '1')
toc_style = toc_style_mapping.get(level, toc_style_mapping.get(1, "9"))
indent = indent_mapping.get(level, 0)
# Resolve bookmark (handle duplicate headings correctly)
bm_name = ''
if text_raw in heading_bookmark_map_all:
occ = heading_occurrence_counter.get(text_raw, 0)
bm_list = heading_bookmark_map_all[text_raw]
if occ < len(bm_list):
bm_name = bm_list[occ]
heading_occurrence_counter[text_raw] = occ + 1
# Build paragraph element
p = etree.Element(f'{{{W}}}p')
toc_container.insert(insert_pos, p)
insert_pos += 1
# pPr
ppr = etree.SubElement(p, f'{{{W}}}pPr')
pstyle = etree.SubElement(ppr, f'{{{W}}}pStyle')
pstyle.set(f'{{{W}}}val', str(toc_style))
if indent > 0:
ind = etree.SubElement(ppr, f'{{{W}}}ind')
ind.set(f'{{{W}}}left', str(indent))
tabs = etree.SubElement(ppr, f'{{{W}}}tabs')
tab = etree.SubElement(tabs, f'{{{W}}}tab')
tab.set(f'{{{W}}}val', 'right')
tab.set(f'{{{W}}}leader', 'dot')
tab.set(f'{{{W}}}pos', '9026')
spacing = etree.SubElement(ppr, f'{{{W}}}spacing')
spacing.set(f'{{{W}}}before', '120')
spacing.set(f'{{{W}}}after', '60')
if bm_name:
hyperlink = etree.SubElement(p, f'{{{W}}}hyperlink')
hyperlink.set(f'{{{W}}}anchor', bm_name)
hyperlink.set(f'{{{W}}}history', '1')
r_text = etree.SubElement(hyperlink, f'{{{W}}}r')
rpr = etree.SubElement(r_text, f'{{{W}}}rPr')
rstyle = etree.SubElement(rpr, f'{{{W}}}rStyle')
rstyle.set(f'{{{W}}}val', 'Hyperlink')
t = etree.SubElement(r_text, f'{{{W}}}t')
t.text = text_raw
r_tab = etree.SubElement(hyperlink, f'{{{W}}}r')
etree.SubElement(r_tab, f'{{{W}}}tab')
r_begin = etree.SubElement(hyperlink, f'{{{W}}}r')
fc_begin = etree.SubElement(r_begin, f'{{{W}}}fldChar')
fc_begin.set(f'{{{W}}}fldCharType', 'begin')
r_instr = etree.SubElement(hyperlink, f'{{{W}}}r')
instr = etree.SubElement(r_instr, f'{{{W}}}instrText')
instr.set('{http://www.w3.org/XML/1998/namespace}space', 'preserve')
instr.text = f' PAGEREF {bm_name} \\h '
r_sep = etree.SubElement(hyperlink, f'{{{W}}}r')
fc_sep = etree.SubElement(r_sep, f'{{{W}}}fldChar')
fc_sep.set(f'{{{W}}}fldCharType', 'separate')
r_page = etree.SubElement(hyperlink, f'{{{W}}}r')
t_page = etree.SubElement(r_page, f'{{{W}}}t')
t_page.text = str(page)
r_end = etree.SubElement(hyperlink, f'{{{W}}}r')
fc_end = etree.SubElement(r_end, f'{{{W}}}fldChar')
fc_end.set(f'{{{W}}}fldCharType', 'end')
else:
r_text = etree.SubElement(p, f'{{{W}}}r')
t = etree.SubElement(r_text, f'{{{W}}}t')
t.text = text_raw
r_tab = etree.SubElement(p, f'{{{W}}}r')
etree.SubElement(r_tab, f'{{{W}}}tab')
r_page = etree.SubElement(p, f'{{{W}}}r')
t_page = etree.SubElement(r_page, f'{{{W}}}t')
t_page.text = str(page)
placeholders_inserted = len(entries)
print(f"Inserted {placeholders_inserted} TOC placeholder entries")
# Serialize back to string
result = etree.tostring(root, xml_declaration=True, encoding='UTF-8', standalone=True)
return result.decode('utf-8')
def main():
parser = argparse.ArgumentParser(
description='Add placeholder entries to Table of Contents in a DOCX file (in-place)'
)
parser.add_argument('docx_file', help='DOCX file to modify (will be replaced)')
parser.add_argument(
'--auto', action='store_true',
help='Auto-extract Heading 1-3 from the DOCX as TOC entries (recommended)'
)
parser.add_argument(
'--entries',
help='JSON string with placeholder entries: [{"level":1,"text":"Chapter 1","page":"1"}]'
)
args = parser.parse_args()
# Determine entries
entries = None
if args.entries:
try:
entries = json.loads(args.entries)
except json.JSONDecodeError as e:
print(f"Error parsing entries JSON: {e}", file=sys.stderr)
sys.exit(1)
elif args.auto or True:
# Default to auto mode — always extract from document headings
entries = _extract_headings_from_docx(args.docx_file)
if entries:
print(f"Auto-extracted {len(entries)} headings from document", file=sys.stderr)
else:
print("No headings found in document, using minimal placeholder", file=sys.stderr)
entries = [{"level": 1, "text": "Contents", "page": "1"}]
# Add placeholders
try:
add_toc_placeholders(args.docx_file, entries)
print(f"Successfully added TOC placeholders to {args.docx_file}")
except RuntimeError as e:
# TOC structure errors — hard fail with exit code 1
print(f"ERROR: {e}", file=sys.stderr)
sys.exit(1)
except Exception as e:
print(f"Error: {e}", file=sys.stderr)
sys.exit(1)
if __name__ == '__main__':
main()

1333
skills/docx/scripts/document.py Executable file

File diff suppressed because it is too large Load Diff

807
skills/docx/scripts/postcheck.py Executable file
View File

@@ -0,0 +1,807 @@
#!/usr/bin/env python3
"""
postcheck.py — Document business rule self-check script
Unlike traditional OpenXML Schema validation, this script does not check XML legality.
Instead, it checks document "visual quality" and "typesetting correctness" — issues visible to the human eye.
Usage:
python3 postcheck.py output.docx [--fix] [--json]
Checks:
1. Blank page detection — trailing/middle excess blank pages, double page breaks, consecutive empty paragraphs
2. Line spacing consistency — whether body paragraph line spacing is uniform
3. Table margins — whether cells have padding set
4. Table pagination control — whether header rows have tblHeader set, data rows have cantSplit
5. Image overflow — whether image width exceeds page usable area
6. Font fallback — whether fonts are used that may be missing on target systems
7. CJK indentation — whether Chinese body text has first-line indent (excluding table cells, lists, centered paragraphs)
8. Heading level continuity — whether headings skip levels (H1→H3 skipping H2)
9. Numbering continuity — whether numbered lists have gaps
10. Cover separation — whether cover and body are in different sections
11. ShadingType — whether SOLID is misused causing black cells
12. TOC quality — whether TOC field exists, whether headings use standard Heading styles
13. Image aspect ratio — whether images are stretched/distorted
14. Document cleanliness — whether placeholder text, Markdown syntax, or draft expressions remain
15. Report content quality — whether summary exists, whether titles are specific, whether vague conclusions are used
"""
import zipfile
import sys
import json
import re
from pathlib import Path
from xml.etree import ElementTree as ET
NS = {
"w": "http://schemas.openxmlformats.org/wordprocessingml/2006/main",
"r": "http://schemas.openxmlformats.org/officeDocument/2006/relationships",
"wp": "http://schemas.openxmlformats.org/drawingml/2006/wordprocessingDrawing",
"a": "http://schemas.openxmlformats.org/drawingml/2006/main",
"pic": "http://schemas.openxmlformats.org/drawingml/2006/picture",
}
class CheckResult:
def __init__(self, name: str, passed: bool, message: str, severity: str = "warning"):
self.name = name
self.passed = passed
self.message = message
self.severity = severity # "error" | "warning" | "info"
def to_dict(self):
return {
"name": self.name,
"passed": self.passed,
"message": self.message,
"severity": self.severity,
}
def __str__(self):
icon = "" if self.passed else ("" if self.severity == "error" else "⚠️")
return f"{icon} [{self.name}] {self.message}"
def read_document_xml(docx_path: str) -> ET.Element:
"""Read document.xml and return the root element"""
with zipfile.ZipFile(docx_path, "r") as z:
return ET.fromstring(z.read("word/document.xml"))
def get_sections(root: ET.Element) -> list:
"""Extract all sections (located via sectPr)"""
body = root.find(".//w:body", NS)
if body is None:
return []
sections = []
current_children = []
for child in body:
tag = child.tag.split("}")[-1] if "}" in child.tag else child.tag
if tag == "sectPr":
sections.append({"children": current_children, "sectPr": child})
current_children = []
else:
# Check whether paragraph contains sectPr (section break inside paragraph pPr)
ppr_sect = child.find(".//w:pPr/w:sectPr", NS)
if ppr_sect is not None:
current_children.append(child)
sections.append({"children": current_children, "sectPr": ppr_sect})
current_children = []
else:
current_children.append(child)
# Last section (body-level sectPr)
body_sect = body.find("w:sectPr", NS)
if body_sect is not None and current_children:
sections.append({"children": current_children, "sectPr": body_sect})
return sections
def check_blank_pages(root: ET.Element) -> CheckResult:
"""Detect excess blank pages — multi-pattern detection"""
body = root.find(".//w:body", NS)
paragraphs = body.findall("w:p", NS)
issues = []
if not paragraphs:
return CheckResult("blank-pages", True, "No paragraph content")
# Check 1: Whether the last paragraph only has a page break
last_p = paragraphs[-1]
runs = last_p.findall(".//w:r", NS)
has_page_break = False
has_text = False
for run in runs:
br = run.find("w:br", NS)
if br is not None and br.get(f"{{{NS['w']}}}type") == "page":
has_page_break = True
t = run.find("w:t", NS)
if t is not None and t.text and t.text.strip():
has_text = True
if has_page_break and not has_text:
issues.append("Trailing page break at document end may cause blank page")
# Check 2: Consecutive empty paragraphs (≥5 consecutive may form visual blank page)
consecutive_empty = 0
max_empty = 0
max_empty_pos = 0
for idx, p in enumerate(paragraphs):
texts = p.findall(".//w:t", NS)
has_any_text = any(t.text and t.text.strip() for t in texts)
has_br = any(
br.get(f"{{{NS['w']}}}type") == "page"
for br in p.findall(".//w:br", NS)
)
has_drawing = p.find(".//{http://schemas.openxmlformats.org/drawingml/2006/wordprocessingDrawing}inline", None) is not None
if not has_any_text and not has_br and not has_drawing:
consecutive_empty += 1
if consecutive_empty > max_empty:
max_empty = consecutive_empty
max_empty_pos = idx
else:
consecutive_empty = 0
if max_empty >= 5:
issues.append(f"Found {max_empty} consecutive empty paragraphs (starting around paragraph {max_empty_pos - max_empty + 2}), may form visual blank page")
# Check 3: Double page break at section boundary (PageBreak at section end + NEXT_PAGE in next section)
sections = get_sections(root)
for i in range(len(sections) - 1):
sec_children = sections[i]["children"]
if not sec_children:
continue
# Check whether the last paragraph of the section contains PageBreak
last_child = sec_children[-1]
if last_child.tag == f"{{{NS['w']}}}p":
for br in last_child.findall(".//w:br", NS):
if br.get(f"{{{NS['w']}}}type") == "page":
# Check whether the next section is NEXT_PAGE
next_sect_pr = sections[i + 1]["sectPr"]
sect_type = next_sect_pr.find("w:type", NS)
if sect_type is not None and sect_type.get(f"{{{NS['w']}}}val") == "nextPage":
issues.append(f"Section {i+1} ends with PageBreak and Section {i+2} is type nextPage, double page break causes blank page")
# Check 4: Empty paragraph + PageBreak (paragraph has only PageBreak, no text)
# Exclude section-ending PageBreaks — they are normal section separators
# (e.g., cover page ending with an empty para + PageBreak before a new section)
section_last_paras = set()
for sec in sections:
children = sec["children"]
if children:
last_child = children[-1]
section_last_paras.add(id(last_child))
empty_pb_count = 0
for p in paragraphs[:-1]: # Last paragraph already handled in Check 1
if id(p) in section_last_paras:
continue # Skip section-ending paragraphs (normal section breaks)
runs = p.findall(".//w:r", NS)
p_has_break = False
p_has_text = False
for run in runs:
br = run.find("w:br", NS)
if br is not None and br.get(f"{{{NS['w']}}}type") == "page":
p_has_break = True
t = run.find("w:t", NS)
if t is not None and t.text and t.text.strip():
p_has_text = True
if p_has_break and not p_has_text:
empty_pb_count += 1
if empty_pb_count > 0:
issues.append(f"Found {empty_pb_count} empty paragraphs with PageBreak (suggest attaching PageBreak to content paragraphs)")
# Separate hard errors from soft warnings
hard_issues = [i for i in issues if "double page break" in i.lower() or "trailing page break" in i.lower() or "consecutive" in i.lower()]
soft_issues = [i for i in issues if i not in hard_issues]
if hard_issues:
return CheckResult(
"blank-pages", False,
"; ".join(hard_issues[:3]),
"error"
)
if soft_issues:
return CheckResult(
"blank-pages", False,
"; ".join(soft_issues[:3]),
"warning"
)
return CheckResult("blank-pages", True, "No blank page issues detected")
def check_line_spacing(root: ET.Element) -> CheckResult:
"""Check body paragraph line spacing consistency"""
body = root.find(".//w:body", NS)
paragraphs = body.findall(".//w:p", NS)
spacing_values = {}
body_para_count = 0
for p in paragraphs:
ppr = p.find("w:pPr", NS)
# Skip heading paragraphs
if ppr is not None:
style = ppr.find("w:pStyle", NS)
if style is not None:
val = style.get(f"{{{NS['w']}}}val", "")
if val.startswith("Heading") or val == "Title":
continue
spacing = ppr.find("w:spacing", NS) if ppr is not None else None
line_val = spacing.get(f"{{{NS['w']}}}line") if spacing is not None else None
# Only count paragraphs with text content
texts = p.findall(".//w:t", NS)
if not any(t.text and t.text.strip() for t in texts):
continue
body_para_count += 1
key = line_val or "default"
spacing_values[key] = spacing_values.get(key, 0) + 1
if body_para_count == 0:
return CheckResult("line-spacing", True, "No body paragraphs")
if len(spacing_values) <= 1:
dominant = list(spacing_values.keys())[0] if spacing_values else "default"
return CheckResult("line-spacing", True, f"Line spacing uniform (line={dominant})")
# Find the most common line spacing
dominant = max(spacing_values, key=spacing_values.get)
inconsistent = sum(v for k, v in spacing_values.items() if k != dominant)
total = sum(spacing_values.values())
if inconsistent / total > 0.2:
return CheckResult(
"line-spacing", False,
f"Line spacing inconsistent: {dict(spacing_values)}, {inconsistent}/{total} paragraphs differ from dominant spacing {dominant}",
"warning"
)
return CheckResult("line-spacing", True, f"Line spacing mostly uniform (line={dominant}, {inconsistent} exceptions)")
def check_image_overflow(root: ET.Element) -> CheckResult:
"""Check whether image width may exceed page bounds"""
# Get page width
sect_pr = root.find(".//w:body/w:sectPr", NS)
page_width = 11906 # A4 default
margin_left = 1701
margin_right = 1417
if sect_pr is not None:
pg_sz = sect_pr.find("w:pgSz", NS)
pg_mar = sect_pr.find("w:pgMar", NS)
if pg_sz is not None:
page_width = int(pg_sz.get(f"{{{NS['w']}}}w", "11906"))
if pg_mar is not None:
margin_left = int(pg_mar.get(f"{{{NS['w']}}}left", "1701"))
margin_right = int(pg_mar.get(f"{{{NS['w']}}}right", "1417"))
usable_width_emu = (page_width - margin_left - margin_right) * 635 # twips → EMU
drawings = root.findall(".//wp:inline", NS) + root.findall(".//wp:anchor", NS)
oversized = 0
for dwg in drawings:
extent = dwg.find("wp:extent", NS)
if extent is not None:
cx = int(extent.get("cx", "0"))
if cx > usable_width_emu * 1.05: # 5% tolerance
oversized += 1
if oversized > 0:
return CheckResult(
"image-overflow", False,
f"{oversized} images exceed page usable area",
"error"
)
return CheckResult(
"image-overflow", True,
f"All images within page width ({len(drawings)} images)"
)
def check_image_aspect_ratio(docx_path: str, root: ET.Element) -> CheckResult:
"""Check whether images are stretched/distorted (aspect ratio drift).
Compares the original aspect ratio of embedded images with the display aspect ratio set in wp:extent.
Drift >10% is considered distortion (pie charts becoming elliptical, radar charts becoming diamond-shaped, etc).
"""
import zipfile as _zf
# Build a mapping: rId → image file path inside the zip
# We need to parse word/_rels/document.xml.rels
rid_to_path = {}
try:
with _zf.ZipFile(docx_path, 'r') as z:
rels_path = 'word/_rels/document.xml.rels'
if rels_path in z.namelist():
rels_xml = z.read(rels_path)
rels_root = ET.fromstring(rels_xml)
rels_ns = 'http://schemas.openxmlformats.org/package/2006/relationships'
for rel in rels_root.findall(f'{{{rels_ns}}}Relationship'):
rid = rel.get('Id', '')
target = rel.get('Target', '')
rel_type = rel.get('Type', '')
if 'image' in rel_type:
# Target is relative to word/ directory
if not target.startswith('/'):
img_path = 'word/' + target
else:
img_path = target.lstrip('/')
rid_to_path[rid] = img_path
# Now check each drawing
drawings = root.findall(".//wp:inline", NS) + root.findall(".//wp:anchor", NS)
distorted = []
for dwg in drawings:
extent = dwg.find("wp:extent", NS)
if extent is None:
continue
display_cx = int(extent.get("cx", "0"))
display_cy = int(extent.get("cy", "0"))
if display_cx == 0 or display_cy == 0:
continue
# Find the blip rId
blip = dwg.find(".//a:blip", NS)
if blip is None:
continue
r_embed = blip.get(f"{{{NS['r']}}}embed", "")
if not r_embed or r_embed not in rid_to_path:
continue
img_zip_path = rid_to_path[r_embed]
if img_zip_path not in z.namelist():
continue
# Read actual image dimensions
try:
img_data = z.read(img_zip_path)
from PIL import Image as _PILImage
import io as _io
pil_img = _PILImage.open(_io.BytesIO(img_data))
orig_w, orig_h = pil_img.size
if orig_w == 0 or orig_h == 0:
continue
except Exception:
continue
# Compare aspect ratios
orig_ratio = orig_w / orig_h
display_ratio = display_cx / display_cy
drift = abs(orig_ratio - display_ratio) / orig_ratio
if drift > 0.10: # >10% distortion
pct = drift * 100
distorted.append(
f"{img_zip_path.split('/')[-1]}: "
f"original {orig_w}×{orig_h} (ratio={orig_ratio:.2f}), "
f"display ratio={display_ratio:.2f}, distortion {pct:.0f}%"
)
except Exception:
return CheckResult(
"image-aspect-ratio", True,
"Cannot check image aspect ratio (zip read error)",
"info"
)
if distorted:
detail = "; ".join(distorted[:3])
if len(distorted) > 3:
detail += f" ...and {len(distorted)} more"
return CheckResult(
"image-aspect-ratio", False,
f"{len(distorted)} images have aspect ratio distortion: {detail}",
"warning"
)
img_count = len(drawings)
return CheckResult(
"image-aspect-ratio", True,
f"All images have correct aspect ratio ({img_count} images)"
)
def check_font_fallback(root: ET.Element) -> CheckResult:
"""Check whether potentially missing fonts are used"""
SAFE_FONTS = {
# Chinese
"宋体", "SimSun", "黑体", "SimHei", "微软雅黑", "Microsoft YaHei",
"仿宋", "FangSong", "FangSong_GB2312", "楷体", "KaiTi",
# English
"Times New Roman", "Arial", "Calibri", "Helvetica",
"Courier New", "Georgia", "Verdana", "Tahoma",
# Universal
"Symbol", "Wingdings",
}
fonts_used = set()
for rpr in root.findall(".//w:rPr", NS):
for font_tag in ["w:rFonts"]:
rf = rpr.find(font_tag, NS)
if rf is not None:
for attr in ["ascii", "eastAsia", "hAnsi", "cs"]:
f = rf.get(f"{{{NS['w']}}}{attr}")
if f:
fonts_used.add(f)
risky = fonts_used - SAFE_FONTS
if risky:
return CheckResult(
"font-fallback", False,
f"Following fonts may be missing on target system: {', '.join(sorted(risky))}",
"info"
)
return CheckResult("font-fallback", True, f"All fonts are common system fonts ({len(fonts_used)} types)")
def check_heading_levels(root: ET.Element) -> CheckResult:
"""Check whether headings skip levels"""
body = root.find(".//w:body", NS)
heading_levels = []
for p in body.findall(".//w:p", NS):
ppr = p.find("w:pPr", NS)
if ppr is None:
continue
style = ppr.find("w:pStyle", NS)
if style is None:
continue
val = style.get(f"{{{NS['w']}}}val", "")
m = re.match(r"Heading(\d+)", val)
if m:
heading_levels.append(int(m.group(1)))
if len(heading_levels) < 2:
return CheckResult("heading-levels", True, "Too few headings, skipping check")
skips = []
for i in range(1, len(heading_levels)):
diff = heading_levels[i] - heading_levels[i - 1]
if diff > 1:
skips.append(f"H{heading_levels[i-1]}→H{heading_levels[i]}")
if skips:
return CheckResult(
"heading-levels", False,
f"Heading level skip: {', '.join(skips[:5])}",
"warning"
)
return CheckResult("heading-levels", True, f"Heading levels continuous ({len(heading_levels)} headings)")
# check_cover_separation removed — false positives on complex covers (>15 elements is normal)
def check_shading_type(root: ET.Element) -> CheckResult:
"""Check whether ShadingType.SOLID is misused"""
shadings = root.findall(".//w:shd", NS)
solid_count = 0
for shd in shadings:
val = shd.get(f"{{{NS['w']}}}val", "")
if val == "solid":
solid_count += 1
if solid_count > 0:
return CheckResult(
"shading-type", False,
f"Found {solid_count} instances of ShadingType.SOLID (should be CLEAR), may cause black cells",
"error"
)
return CheckResult("shading-type", True, "No ShadingType.SOLID misuse found")
def check_toc(root: ET.Element, docx_path: str = "") -> CheckResult:
"""Check TOC quality: field existence, headings presence, outlineLvl, updateFields."""
body = root.find(".//w:body", NS)
if body is None:
return CheckResult("toc", True, "Document body is empty, skipping TOC check", "info")
paragraphs = list(body)
w_ns = NS["w"]
# --- Detect headings and their levels ---
heading_count = 0
heading_levels_used = set() # e.g. {1, 2, 3}
for p in paragraphs:
if p.tag != f"{{{w_ns}}}p":
continue
ppr = p.find(f"{{{w_ns}}}pPr")
if ppr is None:
continue
ps = ppr.find(f"{{{w_ns}}}pStyle")
if ps is None:
continue
val = ps.get(f"{{{w_ns}}}val", "")
m = re.match(r"(?i)heading\s*(\d)", val)
if m:
heading_count += 1
heading_levels_used.add(int(m.group(1)))
# --- Detect TOC field ---
has_toc = False
for instr in root.findall(f".//{{{w_ns}}}instrText"):
if instr.text and "TOC" in instr.text.upper():
has_toc = True
break
if not has_toc:
for fld in root.findall(f".//{{{w_ns}}}fldSimple"):
if "TOC" in fld.get(f"{{{w_ns}}}instr", "").upper():
has_toc = True
break
# Also check SDT-wrapped TOC
if not has_toc:
for sdt in root.findall(f".//{{{w_ns}}}sdt"):
for instr in sdt.findall(f".//{{{w_ns}}}instrText"):
if instr.text and "TOC" in instr.text.upper():
has_toc = True
break
if has_toc:
break
issues = []
# Check 1: Document has a "目录" / "目 录" / "Table of Contents" title but no TOC field
has_toc_title = False
toc_title_pattern = re.compile(r'^(?:目\s*录|table\s+of\s+contents|contents)$', re.IGNORECASE)
for p in paragraphs:
if p.tag != f"{{{w_ns}}}p":
continue
texts = p.findall(f".//{{{w_ns}}}t")
p_text = "".join(t.text or "" for t in texts).strip()
if toc_title_pattern.match(p_text):
has_toc_title = True
break
if has_toc_title and not has_toc:
issues.append("TOC_FIELD_MISSING: document has a TOC title but no TOC field element — add TableOfContents in code")
# Check 2: TOC field exists but no headings in document → TOC will be empty after update
if has_toc and heading_count == 0:
issues.append("TOC_NO_HEADINGS: TOC field exists but document has 0 Heading-styled paragraphs — TOC will be empty after update")
# Check 3 & 4: Read styles.xml and settings.xml from DOCX (only when TOC exists)
if has_toc and docx_path:
try:
import zipfile
with zipfile.ZipFile(docx_path, 'r') as zf:
# Check 3: outlineLvl missing in Heading styles
if 'word/styles.xml' in zf.namelist():
styles_content = zf.read('word/styles.xml').decode('utf-8')
styles_root = ET.fromstring(styles_content)
missing_outline = []
for level in sorted(heading_levels_used):
style_id = f"Heading{level}"
# Find <w:style w:styleId="HeadingN">
for style_elem in styles_root.findall(f".//{{{w_ns}}}style"):
sid = style_elem.get(f"{{{w_ns}}}styleId", "")
if sid == style_id:
# Check if pPr has outlineLvl
ppr = style_elem.find(f"{{{w_ns}}}pPr")
has_outline = False
if ppr is not None:
ol = ppr.find(f"{{{w_ns}}}outlineLvl")
if ol is not None:
has_outline = True
if not has_outline:
missing_outline.append(style_id)
break
if missing_outline:
issues.append(
"TOC_OUTLINE_MISSING: %s style(s) missing outlineLvl — "
"Word TOC update won't find these headings. "
"Run add_toc_placeholders.py to fix" % ", ".join(missing_outline)
)
# Check 4: updateFields not set to true
if 'word/settings.xml' in zf.namelist():
settings_content = zf.read('word/settings.xml').decode('utf-8')
# Check for <w:updateFields w:val="true"/>
update_ok = bool(re.search(
r'<w:updateFields\s+[^>]*w:val\s*=\s*"true"',
settings_content
))
if not update_ok:
issues.append(
"TOC_UPDATE_DISABLED: settings.xml missing updateFields=true — "
"Word won't prompt to update TOC on open. "
"Run add_toc_placeholders.py to fix"
)
except Exception as e:
issues.append(f"TOC_CHECK_ERROR: failed to read styles/settings from DOCX: {e}")
if not issues:
if has_toc:
return CheckResult("toc", True, "TOC field present and update-ready")
else:
return CheckResult("toc", True, "No TOC needed")
severity = "error" if any(k in i for i in issues for k in ("FIELD_MISSING", "NO_HEADINGS", "OUTLINE_MISSING")) else "warning"
return CheckResult("toc", False, "; ".join(issues[:5]), severity)
def check_cover_overflow(root: ET.Element) -> CheckResult:
"""Detect cover section issues: oversized fonts, excessive spacing, trailing empty content."""
sections = get_sections(root)
if not sections:
return CheckResult("cover-overflow", True, "No sections found")
sec0 = sections[0]
sect_pr = sec0["sectPr"]
# Get page dimensions and margins for accurate available height calculation
pg_sz = sect_pr.find("w:pgSz", NS)
pg_mar = sect_pr.find("w:pgMar", NS)
page_height = int(pg_sz.get(f"{{{NS['w']}}}h", "16838")) if pg_sz is not None else 16838
margin_top = int(pg_mar.get(f"{{{NS['w']}}}top", "0")) if pg_mar is not None else 0
margin_bottom = int(pg_mar.get(f"{{{NS['w']}}}bottom", "0")) if pg_mar is not None else 0
issues = []
children = sec0["children"]
# Check 1: Oversized font in cover section (> 44pt = 88 half-points = 889000 EMU)
max_font_size = 0
for child in children:
for sz in child.findall(".//" + f"{{{NS['w']}}}sz"):
val = sz.get(f"{{{NS['w']}}}val")
if val and val.isdigit():
size_hp = int(val)
if size_hp > max_font_size:
max_font_size = size_hp
if max_font_size > 88: # 88 half-points = 44pt
issues.append(
f"Cover has font size {max_font_size // 2}pt (>{44}pt max). "
f"Use calcTitleLayout() for dynamic sizing"
)
# Check 2: Excessive spacing.before in cover section (> 5000 twips)
max_spacing = 0
for child in children:
for sp in child.findall(".//" + f"{{{NS['w']}}}spacing"):
before = sp.get(f"{{{NS['w']}}}before")
if before and before.isdigit():
val = int(before)
if val > max_spacing:
max_spacing = val
if max_spacing > 5000:
issues.append(
f"Cover has spacing.before={max_spacing} twips (>5000 max). "
f"Use calcCoverSpacing() for dynamic spacing"
)
# Check 3: Trailing empty paragraphs in cover section
trailing_empty = 0
for child in reversed(children):
tag = child.tag.split("}")[-1] if "}" in child.tag else child.tag
if tag != "p":
break
texts = child.findall(".//" + f"{{{NS['w']}}}t")
has_text = any(t.text and t.text.strip() for t in texts)
if not has_text:
trailing_empty += 1
else:
break
if trailing_empty > 2:
issues.append(
f"Cover section ends with {trailing_empty} empty paragraphs (max 2 allowed) — "
f"excessive empty paragraphs may cause blank page after cover"
)
if issues:
return CheckResult(
"cover-overflow", False,
"; ".join(issues),
"error"
)
return CheckResult("cover-overflow", True, "Cover section layout looks OK")
def run_all_checks(docx_path: str) -> list[CheckResult]:
"""Run all checks"""
root = read_document_xml(docx_path)
checks = [
check_blank_pages,
check_cover_overflow,
check_line_spacing,
check_image_overflow,
check_font_fallback,
check_heading_levels,
check_shading_type,
]
results = []
for check_fn in checks:
try:
results.append(check_fn(root))
except Exception as e:
results.append(CheckResult(
check_fn.__name__.replace("check_", ""),
False,
f"Check error: {e}",
"error"
))
# TOC check needs both root and docx_path
try:
results.append(check_toc(root, docx_path))
except Exception as e:
results.append(CheckResult("toc", False, f"Check error: {e}", "error"))
# Image aspect ratio check needs both root and docx_path
try:
results.append(check_image_aspect_ratio(docx_path, root))
except Exception as e:
results.append(CheckResult("image-aspect-ratio", False, f"Check error: {e}", "error"))
return results
def main():
import argparse
parser = argparse.ArgumentParser(description="docx business rule self-check")
parser.add_argument("docx_path", help="Path to the .docx file to check")
parser.add_argument("--json", action="store_true", help="Output in JSON format")
parser.add_argument("--strict", action="store_true", help="Treat warnings as failures")
args = parser.parse_args()
if not Path(args.docx_path).exists():
print(f"❌ File not found: {args.docx_path}")
sys.exit(1)
results = run_all_checks(args.docx_path)
if args.json:
print(json.dumps([r.to_dict() for r in results], ensure_ascii=False, indent=2))
else:
print(f"\n📋 Document self-check report: {args.docx_path}\n")
for r in results:
print(f" {r}")
passed = sum(1 for r in results if r.passed)
total = len(results)
errors = sum(1 for r in results if not r.passed and r.severity == "error")
warnings = sum(1 for r in results if not r.passed and r.severity == "warning")
print(f"\n {'' * 50}")
print(f" Passed {passed}/{total} | ❌ {errors} errors | ⚠️ {warnings} warnings\n")
# Exit code
has_errors = any(not r.passed and r.severity == "error" for r in results)
has_warnings = any(not r.passed and r.severity == "warning" for r in results)
if has_errors:
sys.exit(2)
elif args.strict and has_warnings:
sys.exit(1)
else:
sys.exit(0)
if __name__ == "__main__":
main()

View File

@@ -0,0 +1,3 @@
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<w:comments xmlns:wpc="http://schemas.microsoft.com/office/word/2010/wordprocessingCanvas" xmlns:cx="http://schemas.microsoft.com/office/drawing/2014/chartex" xmlns:cx1="http://schemas.microsoft.com/office/drawing/2015/9/8/chartex" xmlns:cx2="http://schemas.microsoft.com/office/drawing/2015/10/21/chartex" xmlns:cx3="http://schemas.microsoft.com/office/drawing/2016/5/9/chartex" xmlns:cx4="http://schemas.microsoft.com/office/drawing/2016/5/10/chartex" xmlns:cx5="http://schemas.microsoft.com/office/drawing/2016/5/11/chartex" xmlns:cx6="http://schemas.microsoft.com/office/drawing/2016/5/12/chartex" xmlns:cx7="http://schemas.microsoft.com/office/drawing/2016/5/13/chartex" xmlns:cx8="http://schemas.microsoft.com/office/drawing/2016/5/14/chartex" xmlns:mc="http://schemas.openxmlformats.org/markup-compatibility/2006" xmlns:aink="http://schemas.microsoft.com/office/drawing/2016/ink" xmlns:am3d="http://schemas.microsoft.com/office/drawing/2017/model3d" xmlns:o="urn:schemas-microsoft-com:office:office" xmlns:oel="http://schemas.microsoft.com/office/2019/extlst" xmlns:r="http://schemas.openxmlformats.org/officeDocument/2006/relationships" xmlns:m="http://schemas.openxmlformats.org/officeDocument/2006/math" xmlns:v="urn:schemas-microsoft-com:vml" xmlns:wp14="http://schemas.microsoft.com/office/word/2010/wordprocessingDrawing" xmlns:wp="http://schemas.openxmlformats.org/drawingml/2006/wordprocessingDrawing" xmlns:w10="urn:schemas-microsoft-com:office:word" xmlns:w="http://schemas.openxmlformats.org/wordprocessingml/2006/main" xmlns:w14="http://schemas.microsoft.com/office/word/2010/wordml" xmlns:w15="http://schemas.microsoft.com/office/word/2012/wordml" xmlns:w16cex="http://schemas.microsoft.com/office/word/2018/wordml/cex" xmlns:w16cid="http://schemas.microsoft.com/office/word/2016/wordml/cid" xmlns:w16="http://schemas.microsoft.com/office/word/2018/wordml" xmlns:w16du="http://schemas.microsoft.com/office/word/2023/wordml/word16du" xmlns:w16sdtdh="http://schemas.microsoft.com/office/word/2020/wordml/sdtdatahash" xmlns:w16sdtfl="http://schemas.microsoft.com/office/word/2024/wordml/sdtformatlock" xmlns:w16se="http://schemas.microsoft.com/office/word/2015/wordml/symex" xmlns:wpg="http://schemas.microsoft.com/office/word/2010/wordprocessingGroup" xmlns:wpi="http://schemas.microsoft.com/office/word/2010/wordprocessingInk" xmlns:wne="http://schemas.microsoft.com/office/word/2006/wordml" xmlns:wps="http://schemas.microsoft.com/office/word/2010/wordprocessingShape" mc:Ignorable="w14 w15 w16se w16cid w16 w16cex w16sdtdh w16sdtfl w16du wp14">
</w:comments>

View File

@@ -0,0 +1,3 @@
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<w15:commentsEx xmlns:wpc="http://schemas.microsoft.com/office/word/2010/wordprocessingCanvas" xmlns:cx="http://schemas.microsoft.com/office/drawing/2014/chartex" xmlns:cx1="http://schemas.microsoft.com/office/drawing/2015/9/8/chartex" xmlns:cx2="http://schemas.microsoft.com/office/drawing/2015/10/21/chartex" xmlns:cx3="http://schemas.microsoft.com/office/drawing/2016/5/9/chartex" xmlns:cx4="http://schemas.microsoft.com/office/drawing/2016/5/10/chartex" xmlns:cx5="http://schemas.microsoft.com/office/drawing/2016/5/11/chartex" xmlns:cx6="http://schemas.microsoft.com/office/drawing/2016/5/12/chartex" xmlns:cx7="http://schemas.microsoft.com/office/drawing/2016/5/13/chartex" xmlns:cx8="http://schemas.microsoft.com/office/drawing/2016/5/14/chartex" xmlns:mc="http://schemas.openxmlformats.org/markup-compatibility/2006" xmlns:aink="http://schemas.microsoft.com/office/drawing/2016/ink" xmlns:am3d="http://schemas.microsoft.com/office/drawing/2017/model3d" xmlns:o="urn:schemas-microsoft-com:office:office" xmlns:oel="http://schemas.microsoft.com/office/2019/extlst" xmlns:r="http://schemas.openxmlformats.org/officeDocument/2006/relationships" xmlns:m="http://schemas.openxmlformats.org/officeDocument/2006/math" xmlns:v="urn:schemas-microsoft-com:vml" xmlns:wp14="http://schemas.microsoft.com/office/word/2010/wordprocessingDrawing" xmlns:wp="http://schemas.openxmlformats.org/drawingml/2006/wordprocessingDrawing" xmlns:w10="urn:schemas-microsoft-com:office:word" xmlns:w="http://schemas.openxmlformats.org/wordprocessingml/2006/main" xmlns:w14="http://schemas.microsoft.com/office/word/2010/wordml" xmlns:w15="http://schemas.microsoft.com/office/word/2012/wordml" xmlns:w16cex="http://schemas.microsoft.com/office/word/2018/wordml/cex" xmlns:w16cid="http://schemas.microsoft.com/office/word/2016/wordml/cid" xmlns:w16="http://schemas.microsoft.com/office/word/2018/wordml" xmlns:w16du="http://schemas.microsoft.com/office/word/2023/wordml/word16du" xmlns:w16sdtdh="http://schemas.microsoft.com/office/word/2020/wordml/sdtdatahash" xmlns:w16sdtfl="http://schemas.microsoft.com/office/word/2024/wordml/sdtformatlock" xmlns:w16se="http://schemas.microsoft.com/office/word/2015/wordml/symex" xmlns:wpg="http://schemas.microsoft.com/office/word/2010/wordprocessingGroup" xmlns:wpi="http://schemas.microsoft.com/office/word/2010/wordprocessingInk" xmlns:wne="http://schemas.microsoft.com/office/word/2006/wordml" xmlns:wps="http://schemas.microsoft.com/office/word/2010/wordprocessingShape" mc:Ignorable="w14 w15 w16se w16cid w16 w16cex w16sdtdh w16sdtfl w16du wp14">
</w15:commentsEx>

View File

@@ -0,0 +1,3 @@
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<w16cex:commentsExtensible xmlns:wpc="http://schemas.microsoft.com/office/word/2010/wordprocessingCanvas" xmlns:cx="http://schemas.microsoft.com/office/drawing/2014/chartex" xmlns:cx1="http://schemas.microsoft.com/office/drawing/2015/9/8/chartex" xmlns:cx2="http://schemas.microsoft.com/office/drawing/2015/10/21/chartex" xmlns:cx3="http://schemas.microsoft.com/office/drawing/2016/5/9/chartex" xmlns:cx4="http://schemas.microsoft.com/office/drawing/2016/5/10/chartex" xmlns:cx5="http://schemas.microsoft.com/office/drawing/2016/5/11/chartex" xmlns:cx6="http://schemas.microsoft.com/office/drawing/2016/5/12/chartex" xmlns:cx7="http://schemas.microsoft.com/office/drawing/2016/5/13/chartex" xmlns:cx8="http://schemas.microsoft.com/office/drawing/2016/5/14/chartex" xmlns:mc="http://schemas.openxmlformats.org/markup-compatibility/2006" xmlns:aink="http://schemas.microsoft.com/office/drawing/2016/ink" xmlns:am3d="http://schemas.microsoft.com/office/drawing/2017/model3d" xmlns:o="urn:schemas-microsoft-com:office:office" xmlns:oel="http://schemas.microsoft.com/office/2019/extlst" xmlns:r="http://schemas.openxmlformats.org/officeDocument/2006/relationships" xmlns:m="http://schemas.openxmlformats.org/officeDocument/2006/math" xmlns:v="urn:schemas-microsoft-com:vml" xmlns:wp14="http://schemas.microsoft.com/office/word/2010/wordprocessingDrawing" xmlns:wp="http://schemas.openxmlformats.org/drawingml/2006/wordprocessingDrawing" xmlns:w10="urn:schemas-microsoft-com:office:word" xmlns:w="http://schemas.openxmlformats.org/wordprocessingml/2006/main" xmlns:w14="http://schemas.microsoft.com/office/word/2010/wordml" xmlns:w15="http://schemas.microsoft.com/office/word/2012/wordml" xmlns:w16cex="http://schemas.microsoft.com/office/word/2018/wordml/cex" xmlns:w16cid="http://schemas.microsoft.com/office/word/2016/wordml/cid" xmlns:w16="http://schemas.microsoft.com/office/word/2018/wordml" xmlns:w16du="http://schemas.microsoft.com/office/word/2023/wordml/word16du" xmlns:w16sdtdh="http://schemas.microsoft.com/office/word/2020/wordml/sdtdatahash" xmlns:w16sdtfl="http://schemas.microsoft.com/office/word/2024/wordml/sdtformatlock" xmlns:w16se="http://schemas.microsoft.com/office/word/2015/wordml/symex" xmlns:wpg="http://schemas.microsoft.com/office/word/2010/wordprocessingGroup" xmlns:wpi="http://schemas.microsoft.com/office/word/2010/wordprocessingInk" xmlns:wne="http://schemas.microsoft.com/office/word/2006/wordml" xmlns:wps="http://schemas.microsoft.com/office/word/2010/wordprocessingShape" xmlns:cr="http://schemas.microsoft.com/office/comments/2020/reactions" mc:Ignorable="w14 w15 w16se w16cid w16 w16cex w16sdtdh w16sdtfl cr w16du wp14">
</w16cex:commentsExtensible>

View File

@@ -0,0 +1,3 @@
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<w16cid:commentsIds xmlns:wpc="http://schemas.microsoft.com/office/word/2010/wordprocessingCanvas" xmlns:cx="http://schemas.microsoft.com/office/drawing/2014/chartex" xmlns:cx1="http://schemas.microsoft.com/office/drawing/2015/9/8/chartex" xmlns:cx2="http://schemas.microsoft.com/office/drawing/2015/10/21/chartex" xmlns:cx3="http://schemas.microsoft.com/office/drawing/2016/5/9/chartex" xmlns:cx4="http://schemas.microsoft.com/office/drawing/2016/5/10/chartex" xmlns:cx5="http://schemas.microsoft.com/office/drawing/2016/5/11/chartex" xmlns:cx6="http://schemas.microsoft.com/office/drawing/2016/5/12/chartex" xmlns:cx7="http://schemas.microsoft.com/office/drawing/2016/5/13/chartex" xmlns:cx8="http://schemas.microsoft.com/office/drawing/2016/5/14/chartex" xmlns:mc="http://schemas.openxmlformats.org/markup-compatibility/2006" xmlns:aink="http://schemas.microsoft.com/office/drawing/2016/ink" xmlns:am3d="http://schemas.microsoft.com/office/drawing/2017/model3d" xmlns:o="urn:schemas-microsoft-com:office:office" xmlns:oel="http://schemas.microsoft.com/office/2019/extlst" xmlns:r="http://schemas.openxmlformats.org/officeDocument/2006/relationships" xmlns:m="http://schemas.openxmlformats.org/officeDocument/2006/math" xmlns:v="urn:schemas-microsoft-com:vml" xmlns:wp14="http://schemas.microsoft.com/office/word/2010/wordprocessingDrawing" xmlns:wp="http://schemas.openxmlformats.org/drawingml/2006/wordprocessingDrawing" xmlns:w10="urn:schemas-microsoft-com:office:word" xmlns:w="http://schemas.openxmlformats.org/wordprocessingml/2006/main" xmlns:w14="http://schemas.microsoft.com/office/word/2010/wordml" xmlns:w15="http://schemas.microsoft.com/office/word/2012/wordml" xmlns:w16cex="http://schemas.microsoft.com/office/word/2018/wordml/cex" xmlns:w16cid="http://schemas.microsoft.com/office/word/2016/wordml/cid" xmlns:w16="http://schemas.microsoft.com/office/word/2018/wordml" xmlns:w16du="http://schemas.microsoft.com/office/word/2023/wordml/word16du" xmlns:w16sdtdh="http://schemas.microsoft.com/office/word/2020/wordml/sdtdatahash" xmlns:w16sdtfl="http://schemas.microsoft.com/office/word/2024/wordml/sdtformatlock" xmlns:w16se="http://schemas.microsoft.com/office/word/2015/wordml/symex" xmlns:wpg="http://schemas.microsoft.com/office/word/2010/wordprocessingGroup" xmlns:wpi="http://schemas.microsoft.com/office/word/2010/wordprocessingInk" xmlns:wne="http://schemas.microsoft.com/office/word/2006/wordml" xmlns:wps="http://schemas.microsoft.com/office/word/2010/wordprocessingShape" mc:Ignorable="w14 w15 w16se w16cid w16 w16cex w16sdtdh w16sdtfl w16du wp14">
</w16cid:commentsIds>

View File

@@ -0,0 +1,3 @@
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<w15:people xmlns:w15="http://schemas.microsoft.com/office/word/2012/wordml">
</w15:people>

374
skills/docx/scripts/utilities.py Executable file
View File

@@ -0,0 +1,374 @@
#!/usr/bin/env python3
"""
Utilities for editing OOXML documents.
This module provides XMLEditor, a tool for manipulating XML files with support for
line-number-based node finding and DOM manipulation. Each element is automatically
annotated with its original line and column position during parsing.
Example usage:
editor = XMLEditor("document.xml")
# Find node by line number or range
elem = editor.get_node(tag="w:r", line_number=519)
elem = editor.get_node(tag="w:p", line_number=range(100, 200))
# Find node by text content
elem = editor.get_node(tag="w:p", contains="specific text")
# Find node by attributes
elem = editor.get_node(tag="w:r", attrs={"w:id": "target"})
# Combine filters
elem = editor.get_node(tag="w:p", line_number=range(1, 50), contains="text")
# Replace, insert, or manipulate
new_elem = editor.replace_node(elem, "<w:r><w:t>new text</w:t></w:r>")
editor.insert_after(new_elem, "<w:r><w:t>more</w:t></w:r>")
# Save changes
editor.save()
"""
import html
from pathlib import Path
from typing import Optional, Union
import defusedxml.minidom
import defusedxml.sax
class XMLEditor:
"""
Editor for manipulating OOXML XML files with line-number-based node finding.
This class parses XML files and tracks the original line and column position
of each element. This enables finding nodes by their line number in the original
file, which is useful when working with Read tool output.
Attributes:
xml_path: Path to the XML file being edited
encoding: Detected encoding of the XML file ('ascii' or 'utf-8')
dom: Parsed DOM tree with parse_position attributes on elements
"""
def __init__(self, xml_path):
"""
Initialize with path to XML file and parse with line number tracking.
Args:
xml_path: Path to XML file to edit (str or Path)
Raises:
ValueError: If the XML file does not exist
"""
self.xml_path = Path(xml_path)
if not self.xml_path.exists():
raise ValueError(f"XML file not found: {xml_path}")
with open(self.xml_path, "rb") as f:
header = f.read(200).decode("utf-8", errors="ignore")
self.encoding = "ascii" if 'encoding="ascii"' in header else "utf-8"
parser = _create_line_tracking_parser()
self.dom = defusedxml.minidom.parse(str(self.xml_path), parser)
def get_node(
self,
tag: str,
attrs: Optional[dict[str, str]] = None,
line_number: Optional[Union[int, range]] = None,
contains: Optional[str] = None,
):
"""
Get a DOM element by tag and identifier.
Finds an element by either its line number in the original file or by
matching attribute values. Exactly one match must be found.
Args:
tag: The XML tag name (e.g., "w:del", "w:ins", "w:r")
attrs: Dictionary of attribute name-value pairs to match (e.g., {"w:id": "1"})
line_number: Line number (int) or line range (range) in original XML file (1-indexed)
contains: Text string that must appear in any text node within the element.
Supports both entity notation (&#8220;) and Unicode characters (\u201c).
Returns:
defusedxml.minidom.Element: The matching DOM element
Raises:
ValueError: If node not found or multiple matches found
Example:
elem = editor.get_node(tag="w:r", line_number=519)
elem = editor.get_node(tag="w:r", line_number=range(100, 200))
elem = editor.get_node(tag="w:del", attrs={"w:id": "1"})
elem = editor.get_node(tag="w:p", attrs={"w14:paraId": "12345678"})
elem = editor.get_node(tag="w:commentRangeStart", attrs={"w:id": "0"})
elem = editor.get_node(tag="w:p", contains="specific text")
elem = editor.get_node(tag="w:t", contains="&#8220;Agreement") # Entity notation
elem = editor.get_node(tag="w:t", contains="\u201cAgreement") # Unicode character
"""
matches = []
for elem in self.dom.getElementsByTagName(tag):
# Check line_number filter
if line_number is not None:
parse_pos = getattr(elem, "parse_position", (None,))
elem_line = parse_pos[0]
# Handle both single line number and range
if isinstance(line_number, range):
if elem_line not in line_number:
continue
else:
if elem_line != line_number:
continue
# Check attrs filter
if attrs is not None:
if not all(
elem.getAttribute(attr_name) == attr_value
for attr_name, attr_value in attrs.items()
):
continue
# Check contains filter
if contains is not None:
elem_text = self._get_element_text(elem)
# Normalize the search string: convert HTML entities to Unicode characters
# This allows searching for both "&#8220;Rowan" and ""Rowan"
normalized_contains = html.unescape(contains)
if normalized_contains not in elem_text:
continue
# If all applicable filters passed, this is a match
matches.append(elem)
if not matches:
# Build descriptive error message
filters = []
if line_number is not None:
line_str = (
f"lines {line_number.start}-{line_number.stop - 1}"
if isinstance(line_number, range)
else f"line {line_number}"
)
filters.append(f"at {line_str}")
if attrs is not None:
filters.append(f"with attributes {attrs}")
if contains is not None:
filters.append(f"containing '{contains}'")
filter_desc = " ".join(filters) if filters else ""
base_msg = f"Node not found: <{tag}> {filter_desc}".strip()
# Add helpful hint based on filters used
if contains:
hint = "Text may be split across elements or use different wording."
elif line_number:
hint = "Line numbers may have changed if document was modified."
elif attrs:
hint = "Verify attribute values are correct."
else:
hint = "Try adding filters (attrs, line_number, or contains)."
raise ValueError(f"{base_msg}. {hint}")
if len(matches) > 1:
raise ValueError(
f"Multiple nodes found: <{tag}>. "
f"Add more filters (attrs, line_number, or contains) to narrow the search."
)
return matches[0]
def _get_element_text(self, elem):
"""
Recursively extract all text content from an element.
Skips text nodes that contain only whitespace (spaces, tabs, newlines),
which typically represent XML formatting rather than document content.
Args:
elem: defusedxml.minidom.Element to extract text from
Returns:
str: Concatenated text from all non-whitespace text nodes within the element
"""
text_parts = []
for node in elem.childNodes:
if node.nodeType == node.TEXT_NODE:
# Skip whitespace-only text nodes (XML formatting)
if node.data.strip():
text_parts.append(node.data)
elif node.nodeType == node.ELEMENT_NODE:
text_parts.append(self._get_element_text(node))
return "".join(text_parts)
def replace_node(self, elem, new_content):
"""
Replace a DOM element with new XML content.
Args:
elem: defusedxml.minidom.Element to replace
new_content: String containing XML to replace the node with
Returns:
List[defusedxml.minidom.Node]: All inserted nodes
Example:
new_nodes = editor.replace_node(old_elem, "<w:r><w:t>text</w:t></w:r>")
"""
parent = elem.parentNode
nodes = self._parse_fragment(new_content)
for node in nodes:
parent.insertBefore(node, elem)
parent.removeChild(elem)
return nodes
def insert_after(self, elem, xml_content):
"""
Insert XML content after a DOM element.
Args:
elem: defusedxml.minidom.Element to insert after
xml_content: String containing XML to insert
Returns:
List[defusedxml.minidom.Node]: All inserted nodes
Example:
new_nodes = editor.insert_after(elem, "<w:r><w:t>text</w:t></w:r>")
"""
parent = elem.parentNode
next_sibling = elem.nextSibling
nodes = self._parse_fragment(xml_content)
for node in nodes:
if next_sibling:
parent.insertBefore(node, next_sibling)
else:
parent.appendChild(node)
return nodes
def insert_before(self, elem, xml_content):
"""
Insert XML content before a DOM element.
Args:
elem: defusedxml.minidom.Element to insert before
xml_content: String containing XML to insert
Returns:
List[defusedxml.minidom.Node]: All inserted nodes
Example:
new_nodes = editor.insert_before(elem, "<w:r><w:t>text</w:t></w:r>")
"""
parent = elem.parentNode
nodes = self._parse_fragment(xml_content)
for node in nodes:
parent.insertBefore(node, elem)
return nodes
def append_to(self, elem, xml_content):
"""
Append XML content as a child of a DOM element.
Args:
elem: defusedxml.minidom.Element to append to
xml_content: String containing XML to append
Returns:
List[defusedxml.minidom.Node]: All inserted nodes
Example:
new_nodes = editor.append_to(elem, "<w:r><w:t>text</w:t></w:r>")
"""
nodes = self._parse_fragment(xml_content)
for node in nodes:
elem.appendChild(node)
return nodes
def get_next_rid(self):
"""Get the next available rId for relationships files."""
max_id = 0
for rel_elem in self.dom.getElementsByTagName("Relationship"):
rel_id = rel_elem.getAttribute("Id")
if rel_id.startswith("rId"):
try:
max_id = max(max_id, int(rel_id[3:]))
except ValueError:
pass
return f"rId{max_id + 1}"
def save(self):
"""
Save the edited XML back to the file.
Serializes the DOM tree and writes it back to the original file path,
preserving the original encoding (ascii or utf-8).
"""
content = self.dom.toxml(encoding=self.encoding)
self.xml_path.write_bytes(content)
def _parse_fragment(self, xml_content):
"""
Parse XML fragment and return list of imported nodes.
Args:
xml_content: String containing XML fragment
Returns:
List of defusedxml.minidom.Node objects imported into this document
Raises:
AssertionError: If fragment contains no element nodes
"""
# Extract namespace declarations from the root document element
root_elem = self.dom.documentElement
namespaces = []
if root_elem and root_elem.attributes:
for i in range(root_elem.attributes.length):
attr = root_elem.attributes.item(i)
if attr.name.startswith("xmlns"): # type: ignore
namespaces.append(f'{attr.name}="{attr.value}"') # type: ignore
ns_decl = " ".join(namespaces)
wrapper = f"<root {ns_decl}>{xml_content}</root>"
fragment_doc = defusedxml.minidom.parseString(wrapper)
nodes = [
self.dom.importNode(child, deep=True)
for child in fragment_doc.documentElement.childNodes # type: ignore
]
elements = [n for n in nodes if n.nodeType == n.ELEMENT_NODE]
assert elements, "Fragment must contain at least one element"
return nodes
def _create_line_tracking_parser():
"""
Create a SAX parser that tracks line and column numbers for each element.
Monkey patches the SAX content handler to store the current line and column
position from the underlying expat parser onto each element as a parse_position
attribute (line, column) tuple.
Returns:
defusedxml.sax.xmlreader.XMLReader: Configured SAX parser
"""
def set_content_handler(dom_handler):
def startElementNS(name, tagName, attrs):
orig_start_cb(name, tagName, attrs)
cur_elem = dom_handler.elementStack[-1]
cur_elem.parse_position = (
parser._parser.CurrentLineNumber, # type: ignore
parser._parser.CurrentColumnNumber, # type: ignore
)
orig_start_cb = dom_handler.startElementNS
dom_handler.startElementNS = startElementNS
orig_set_content_handler(dom_handler)
parser = defusedxml.sax.make_parser()
orig_set_content_handler = parser.setContentHandler
parser.setContentHandler = set_content_handler # type: ignore
return parser

177
skills/docx/setup.sh Executable file
View File

@@ -0,0 +1,177 @@
#!/usr/bin/env bash
# ---
# name: docx-setup
# author: Z.AI
# version: "1.0"
# description: Environment setup for the DOCX skill. Checks and installs all required dependencies.
# ---
#
# Installs only dependencies required by the DOCX skill.
set -euo pipefail
RED='\033[0;31m'; GREEN='\033[0;32m'; YELLOW='\033[1;33m'; BLUE='\033[0;34m'; NC='\033[0m'
ok() { echo -e " ${GREEN}${NC} $1"; }
fail() { echo -e " ${RED}${NC} $1"; }
warn() { echo -e " ${YELLOW}${NC} $1"; }
info() { echo -e " ${BLUE}${NC} $1"; }
echo "============================================"
echo " DOCX Skill — Environment Setup"
echo "============================================"
echo ""
OS="$(uname -s)"
ARCH="$(uname -m)"
echo "Platform: $OS $ARCH"
echo ""
# ── 0. macOS: Homebrew ──
if [ "$OS" = "Darwin" ]; then
echo "--- Homebrew (macOS package manager) ---"
if command -v brew &>/dev/null; then
BREW_VER=$(brew --version 2>/dev/null | head -1)
ok "brew ($BREW_VER)"
else
fail "brew not found — Node.js install needs Homebrew on macOS"
info "Install: /bin/bash -c \"\$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/HEAD/install.sh)\""
fi
echo ""
fi
# ── 1. Node.js (docx-js runs on Node) ──
echo "--- Node.js ---"
if command -v node &>/dev/null; then
NODE_VER=$(node --version)
ok "node ($NODE_VER)"
else
fail "node not found (required — docx generation uses docx-js on Node)"
case "$OS" in
Darwin) info "Install: brew install node" ;;
Linux) info "Install: curl -fsSL https://deb.nodesource.com/setup_20.x | sudo -E bash -"
info " sudo apt install -y nodejs" ;;
*) info "Install: https://nodejs.org/" ;;
esac
fi
# ── 2. npm ──
echo ""
echo "--- npm ---"
if command -v npm &>/dev/null; then
NPM_VER=$(npm --version 2>/dev/null)
ok "npm ($NPM_VER)"
else
fail "npm not found"
case "$OS" in
Darwin) info "Install: brew install node (includes npm)" ;;
Linux) info "Install: comes with nodejs" ;;
*) info "Install: https://nodejs.org/" ;;
esac
fi
# ── 3. npm package: docx ──
echo ""
echo "--- npm Packages ---"
if node -e "require('docx')" 2>/dev/null || npm list -g docx &>/dev/null; then
DOCX_VER=$(node -e "try{console.log(require('docx/package.json').version)}catch(e){console.log('installed')}" 2>/dev/null)
ok "docx ($DOCX_VER)"
else
fail "docx not installed"
info "Install: npm install -g docx"
echo ""
if [ -t 0 ]; then
read -p " Install now? [Y/n] " -n 1 -r REPLY
echo ""
REPLY=${REPLY:-Y}
else
warn "Non-interactive mode — skipping auto-install."
REPLY=N
fi
if [[ ! $REPLY =~ ^[Nn]$ ]]; then
npm install -g docx 2>/dev/null && ok "Installed: docx" || fail "npm install failed"
fi
fi
# ── 4. Python 3 (post-processing scripts) ──
echo ""
echo "--- Python (post-processing) ---"
if command -v python3 &>/dev/null; then
PY_VER=$(python3 --version 2>&1)
ok "python3 ($PY_VER)"
if [ "$OS" = "Darwin" ]; then
PY_PATH=$(which python3 2>/dev/null)
if [[ "$PY_PATH" == "/usr/bin/python3" ]]; then
warn "Using macOS system Python (limited). Recommend: brew install python3"
fi
fi
else
fail "python3 not found"
case "$OS" in
Darwin) info "Install: brew install python3" ;;
Linux) info "Install: sudo apt install python3 python3-pip (Debian/Ubuntu)"
info " sudo dnf install python3 python3-pip (Fedora/RHEL)" ;;
*) info "Install: https://www.python.org/downloads/" ;;
esac
fi
# ── 5. pip ──
echo ""
echo "--- pip ---"
if python3 -m pip --version &>/dev/null 2>&1; then
PIP_VER=$(python3 -m pip --version 2>/dev/null | head -1)
ok "pip ($PIP_VER)"
else
fail "pip not found"
case "$OS" in
Darwin) info "Install: python3 -m ensurepip --upgrade"
info " or: brew install python3 (includes pip)" ;;
Linux) info "Install: sudo apt install python3-pip (Debian/Ubuntu)" ;;
*) info "Install: python3 -m ensurepip --upgrade" ;;
esac
fi
# ── 6. Python packages ──
echo ""
echo "--- Python Packages ---"
PY_PKGS=(
"defusedxml:defusedxml"
)
MISSING_PY=()
for entry in "${PY_PKGS[@]}"; do
mod="${entry%%:*}"
pkg="${entry##*:}"
if python3 -c "import $mod" 2>/dev/null; then
ver=$(python3 -c "import $mod; print(getattr($mod, '__version__', 'installed'))" 2>/dev/null)
ok "$pkg ($ver)"
else
fail "$pkg not installed"
MISSING_PY+=("$pkg")
fi
done
if [ ${#MISSING_PY[@]} -gt 0 ]; then
echo ""
if [ -t 0 ]; then
read -p " Install missing Python packages? [Y/n] " -n 1 -r REPLY
echo ""
REPLY=${REPLY:-Y}
else
warn "Non-interactive mode — skipping auto-install. Run interactively or install manually."
REPLY=N
fi
if [[ ! $REPLY =~ ^[Nn]$ ]]; then
python3 -m pip install -q "${MISSING_PY[@]}" 2>/dev/null \
|| python3 -m pip install -q --user "${MISSING_PY[@]}" 2>/dev/null \
|| python3 -m pip install -q --break-system-packages "${MISSING_PY[@]}" 2>/dev/null \
|| { fail "pip install failed. Try manually: pip install ${MISSING_PY[*]}"; }
ok "Installed: ${MISSING_PY[*]}"
fi
fi
# ── Summary ──
echo ""
echo "============================================"
echo " Setup complete."
echo " Core: Node.js + docx (npm)"
echo " Post-processing: Python + defusedxml"
echo "============================================"