Initial commit

skills/docx/LICENSE.txt

Copyright (c) 2026 Z.ai All rights reserved.

Permission is granted for personal, educational, and non-commercial use only.

Commercial use is strictly prohibited without prior written permission from the author.

Unauthorized copying, modification, or distribution of the software for commercial purposes is prohibited.

The author reserves the right to make the final determination of what constitutes "commercial use".

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.

IN NO EVENT SHALL THE AUTHOR BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY ARISING FROM THE USE OF THE SOFTWARE.

skills/docx/SKILL.md

---
name: docx
metadata:
  author: Z.AI
  version: "1.0"
description: "Comprehensive document creation, editing, and analysis with support for tracked changes, comments, formatting preservation, and text extraction. When GLM needs to work with professional documents (.docx files) for: (1) Creating new documents, (2) Modifying or editing content, (3) Working with tracked changes, (4) Adding comments, or any other document tasks"
license: Proprietary. LICENSE.txt has complete terms
---

# DOCX Creation, Editing, and Analysis

## Quick Setup

```bash
bash "$SKILL_DIR/setup.sh"  # Interactive environment check + install
```

## Overview

A .docx file is a ZIP archive containing XML files. This skill provides tools for creating, editing, reading, and reviewing Word documents.
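
Because the container is an ordinary ZIP file, Python's standard `zipfile` module is enough to look inside one. The sketch below is illustrative and not one of the skill's scripts; `list_docx_parts` and `read_part` are hypothetical helper names, and it assumes the usual package layout with the body stored in `word/document.xml`.

```python
import zipfile


def list_docx_parts(source) -> list:
    """List the XML and relationship parts inside a .docx (a ZIP archive).

    `source` may be a file path or a binary file-like object.
    Media files such as word/media/image1.png are skipped by the extension filter.
    """
    with zipfile.ZipFile(source) as zf:
        return sorted(n for n in zf.namelist() if n.endswith((".xml", ".rels")))


def read_part(source, name: str = "word/document.xml") -> str:
    """Return one part (by default the main body) decoded as UTF-8 text."""
    with zipfile.ZipFile(source) as zf:
        return zf.read(name).decode("utf-8")
```

For a typical generated file, `list_docx_parts("output.docx")` includes entries such as `[Content_Types].xml`, `word/document.xml`, and `word/styles.xml`.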

## Quick Route — Read This First

- **Step 1**: Determine the task type → load the corresponding route file
- **Step 2**: Determine the business scene → load the corresponding scene file (if applicable)
- **Step 3**: Load `references/design-system.md` for cover recipes, palettes, and chart colors
- **Step 4**: Load `references/common-rules.md` for shared layout, font, and quality rules
- **Step 5**: Execute the route instructions
- **Step 6**: Run the post-generation checklist

⚠️ **MANDATORY — Cover Recipe Enforcement (Step 3):**

When creating a document that needs a cover page, you MUST use one of the 7 validated cover recipes (R1–R7) from `design-system.md`. **Free-form cover code is FORBIDDEN.** The recipe provides the wrapper table, background, layout structure, border settings, and spacing; do not reinvent any of these.

Workflow: (1) call `selectCoverRecipe(docType, industry)` to get the recipe and palette → (2) use the corresponding `buildCoverRX()` function code from `design-system.md` → (3) pass your `config` (title, subtitle, metaLines, etc.) into the recipe builder. If you skip this and write cover code from scratch, the cover WILL have compatibility issues (blank pages in MS Office, missing borders, overflow, etc.).

### Script Path Setup (MANDATORY before any script call)

All CLI tools live in `scripts/` relative to this skill's directory. Before calling any script, resolve the absolute path once:

```bash
DOCX_SCRIPTS="<skill_directory>/scripts"  # <skill_directory> = the directory containing this SKILL.md

# Then all commands use $DOCX_SCRIPTS:
python3 "$DOCX_SCRIPTS/postcheck.py" output.docx
python3 "$DOCX_SCRIPTS/add_toc_placeholders.py" output.docx --auto
```

**For Python imports** (when generation code needs to import skill modules):

```python
import os
import sys

DOCX_SCRIPTS = os.path.join("<skill_directory>", "scripts")
if DOCX_SCRIPTS not in sys.path:
    sys.path.insert(0, DOCX_SCRIPTS)  # make skill modules importable
```

**⚠️ NEVER use bare `python3 scripts/...`**: it only works if the current working directory happens to be the skill directory. Always use the absolute `$DOCX_SCRIPTS` path.
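
When generation code drives the CLI tools from Python instead of a shell, the same rule applies: build the script path from the skill directory, never from the current working directory. A minimal sketch, assuming the scripts are plain Python files launched with the current interpreter; `build_script_cmd` and `run_docx_script` are hypothetical helpers, not part of `scripts/`.

```python
import os
import subprocess
import sys


def build_script_cmd(skill_dir: str, script: str, *args: str) -> list:
    """Build an argv list that calls a skill script by absolute path.

    Independent of the current working directory, unlike `python3 scripts/...`.
    """
    scripts_dir = os.path.join(os.path.abspath(skill_dir), "scripts")
    return [sys.executable, os.path.join(scripts_dir, script), *args]


def run_docx_script(skill_dir: str, script: str, *args: str) -> int:
    """Run a skill script and return its exit code without raising on failure."""
    cmd = build_script_cmd(skill_dir, script, *args)
    return subprocess.run(cmd, check=False).returncode
```

For example, `run_docx_script(skill_dir, "add_toc_placeholders.py", "output.docx", "--auto")` returns the exit code that the TOC checklist expects to be 0.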

### Task Router

| User Intent | Route | Files to Load |
|-------------|-------|---------------|
| Create/write/generate (no attachment) | **Create** | `routes/create.md` + `references/docx-js-core.md` |
| Edit/modify/revise (has attachment) | **Edit** | `routes/edit.md` + `references/ooxml.md` |
| Format/layout/font/margin | **Format** | `routes/format.md` |
| Comment/annotate/review | **Comment** | `routes/comment.md` |
| Read/analyze/extract | **Read** | `routes/read.md` |

### Scene Router (Optional — load after route)

| User Keywords | Scene | File |
|---------------|-------|------|
| thesis, academic, research, paper, dissertation, abstract, journal | Academic | `scenes/academic.md` |
| report, analysis, experiment, testing, survey, review, summary, proposal, feasibility, competitor, industry, operations | Report | `scenes/report.md` |
| contract, agreement, terms, transfer, NDA, confidential, framework, cooperation, service terms, user agreement, procurement | Contract | `scenes/contract.md` |
| resume, CV, job application | Resume | `scenes/resume.md` |
| exam, test, quiz, paper (exam context), lesson plan | Exam | `scenes/exam.md` |
| official document, notice, letter, reply, minutes, red header, government, issuance | Official | `scenes/official-doc.md` |
| broadcast script, product copy, livestream, speech, presentation script, video script | Copywriting | `scenes/copywriting.md` |
| plan, proposal (if not report context) | Report | `scenes/report.md` |
| policy, regulation, standard, management rules | Official | `scenes/official-doc.md` |

**If no scene matches**, use the default design rules from `references/design-system.md` and `references/common-rules.md`.
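
The Scene Router table can be read as an ordered, first-match keyword lookup. The sketch below assumes that reading (rows checked top to bottom, whole-word, case-insensitive matching); `SCENE_TABLE` and `pick_scene` are hypothetical names, and context-dependent rows such as "paper (exam context)" are simplified to plain keywords, so an exam paper would still route to Academic here.

```python
import re

# (keywords, scene file) in the table's order; the first row with a match wins.
SCENE_TABLE = [
    (["thesis", "academic", "research", "paper", "dissertation", "abstract", "journal"],
     "scenes/academic.md"),
    (["report", "analysis", "experiment", "testing", "survey", "review", "summary",
      "proposal", "feasibility", "competitor", "industry", "operations"],
     "scenes/report.md"),
    (["contract", "agreement", "terms", "transfer", "nda", "confidential", "framework",
      "cooperation", "service terms", "user agreement", "procurement"],
     "scenes/contract.md"),
    (["resume", "cv", "job application"], "scenes/resume.md"),
    (["exam", "test", "quiz", "lesson plan"], "scenes/exam.md"),
    (["official document", "notice", "letter", "reply", "minutes", "red header",
      "government", "issuance"], "scenes/official-doc.md"),
    (["broadcast script", "product copy", "livestream", "speech",
      "presentation script", "video script"], "scenes/copywriting.md"),
    (["plan"], "scenes/report.md"),
    (["policy", "regulation", "standard", "management rules"], "scenes/official-doc.md"),
]


def pick_scene(request: str):
    """Return the scene file for a request, or None to use the default rules."""
    text = request.lower()
    for keywords, scene in SCENE_TABLE:
        for kw in keywords:
            # Whole-word match, so "nda" does not fire on "agenda" or "Monday".
            if re.search(r"\b" + re.escape(kw) + r"\b", text):
                return scene
    return None
```

With this lookup, `pick_scene("Draft an NDA for our supplier")` returns `"scenes/contract.md"`, and a request with no keyword returns `None`, which corresponds to falling back to the default design rules.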

## Formatting Standards (Always Apply)

→ See `references/common-rules.md` for full font profiles, spacing, indent, and layout rules.

**Key rules (quick reference):**
- **Line spacing**: 1.3x (`line: 312`) — MANDATORY. Exceptions: resume 1.15x, official doc 28pt fixed, copywriting `400`, contract 1.5x
- **CJK body**: Justified + 2-char indent (`firstLine: 480` SimSun / `420` YaHei)
- **Tables**: `margins` set, `ShadingType.CLEAR`, `tableHeader: true`, `cantSplit: true`, title `keepNext: true`
- **Images**: `type` parameter required, preserve aspect ratio via `image-size`, PageBreak inside Paragraph
- **Full-page Table row**: `rule: "exact"` with a 1200-twip safety margin
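
The numbers in these rules follow from two unit conventions: with the default auto line rule, `line` is measured in 240ths of a line (240 = single spacing), while fixed line spacing and indents are measured in twips (1/20 pt). A full-width CJK character is one em wide, so a 2-character indent is 2 × font size in points × 20. The helper names below are illustrative only.

```python
def auto_line_value(multiple: float) -> int:
    """`spacing.line` for an auto (proportional) line rule: 240 = single spacing."""
    return round(240 * multiple)


def exact_line_value(points: float) -> int:
    """`spacing.line` for a fixed (exact) line rule, in twips."""
    return round(points * 20)


def first_line_indent(chars: int, font_size_pt: float) -> int:
    """Indent in twips for `chars` full-width CJK characters, each one em wide."""
    return round(chars * font_size_pt * 20)
```

This reproduces the values above: `auto_line_value(1.3)` gives 312, `exact_line_value(28)` gives the official-document 560, and `first_line_indent(2, 12)` gives 480 for SimSun 12 pt, while 420 and 640 correspond to 10.5 pt and 16 pt. The copywriting value `400` works out to about 1.67x.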

## Unit Quick Reference

| Unit | Value |
|------|-------|
| 1 cm | 567 twips |
| 1 inch | 1440 twips |
| 1 pt | 20 twips |
| 1 pt (font size) | 2 half-points (docx-js `size:`) |
| A4 | 11906 × 16838 twips |

For the Chinese font size table and common margins, see `references/common-rules.md`.
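
The conversions in this table are fixed ratios: 1440 twips per inch and 2.54 cm per inch, so 1 cm ≈ 566.93 twips, which rounds to 567. In this small sketch (illustrative function names), each measurement is converted directly from centimetres rather than multiplied by the pre-rounded 567; that is what reproduces the A4 size exactly (21 × 567 would give 11907, not 11906).

```python
TWIPS_PER_INCH = 1440
CM_PER_INCH = 2.54


def cm_to_twips(cm: float) -> int:
    """Convert centimetres to twips, rounding only once at the end."""
    return round(cm / CM_PER_INCH * TWIPS_PER_INCH)


def pt_to_twips(pt: float) -> int:
    """1 pt = 20 twips."""
    return round(pt * 20)


def pt_to_half_points(pt: float) -> int:
    """docx-js TextRun `size:` is in half-points: 1 pt = 2 half-points."""
    return round(pt * 2)
```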

## Post-Generation — Two-Layer Verification

### Layer 1: Manual Checklist (self-check during generation)

#### Basic Format
- [ ] Line spacing is 1.3x (`line: 312`) or a scene-specific override
- [ ] CJK body has 2-char indent (`firstLine: 480` or `420`)
- [ ] Tables have margins set
- [ ] Images preserve aspect ratio via `image-size`; NEVER hardcode both width and height
- [ ] PageBreak inside Paragraph
- [ ] ShadingType uses CLEAR
- [ ] Each numbered list uses a unique `reference`
- [ ] **⚠️ CRITICAL: Quotation marks in JS strings must be properly escaped.** Chinese curly quotes (`“` `”` `‘` `’`) MUST use Unicode escapes (`\u201c` `\u201d` `\u2018` `\u2019`); straight quotes (`"` `'`) use `\"` `\'` or alternate delimiters. **This is the #1 most common code-generation bug.** Chinese text frequently wraps emphasized words or proper nouns in curly quotes (e.g., “双11”, “前低后高”, “618”), and every occurrence MUST be escaped. Unescaped quotes produce JS syntax errors that break document generation.
- [ ] ImageRun includes `type` parameter
- [ ] Header/footer present (unless the scene says otherwise)
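
One way to satisfy the quote-escaping rule mechanically is to pass every text string through an escaping function before writing it into generated JS source. A minimal sketch, assuming the generated code uses double-quoted string literals; `to_js_string` is a hypothetical helper, not part of the skill.

```python
# Characters that must not appear raw inside a double-quoted JS string literal,
# plus the curly quotes this checklist requires to be written as \u escapes.
_JS_ESCAPES = {
    "\\": "\\\\",
    '"': '\\"',
    "'": "\\'",
    "\n": "\\n",
    "\r": "\\r",
    "\u2028": "\\u2028",  # line separator: not allowed raw in older JS engines
    "\u2029": "\\u2029",  # paragraph separator
    "\u201c": "\\u201c",  # left double curly quote
    "\u201d": "\\u201d",  # right double curly quote
    "\u2018": "\\u2018",  # left single curly quote
    "\u2019": "\\u2019",  # right single curly quote
}


def to_js_string(text: str) -> str:
    """Return `text` as a double-quoted JS string literal safe to paste into generated code."""
    return '"' + "".join(_JS_ESCAPES.get(ch, ch) for ch in text) + '"'
```

For example, passing `“双11”` produces the literal `"\u201c双11\u201d"`, so the JS parser never sees a raw quote character inside the string.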

#### Heading Styles
- [ ] All body chapter headings use `heading: HeadingLevel.HEADING_X` (never simulate with bold + large font)
- [ ] Cover title may skip Heading style (not in TOC), but body headings MUST use Heading style

#### Page Break & Blank Page Prevention
- [ ] Cover/content in separate sections
- [ ] Three rules to prevent blank pages:
  - ① When using section(NEXT_PAGE), the previous section must NOT end with a PageBreak (double break = blank page)
  - ② A PageBreak paragraph SHOULD contain visible text. **Exception**: a section-ending empty paragraph + PageBreak is allowed (normal section separator, e.g., after the cover page)
  - ③ No more than 3 consecutive empty paragraphs
- [ ] Full-page Table row height uses `rule: "exact"` (never `"atLeast"` for tall tables)
- [ ] No unwanted blank pages (check each section ending)

#### TOC
→ See `references/toc.md` for the complete TOC reference and checklist.
- [ ] If a TOC title exists → the `TableOfContents` element must be present
- [ ] **⚠️ MANDATORY PageBreak after TableOfContents**: a Paragraph containing a PageBreak MUST follow the `TableOfContents` element, with only the TOC Refresh Hint paragraph between them; without it, the TOC and body content render on the same page. This is the #1 TOC formatting failure; never omit it
- [ ] `add_toc_placeholders.py --auto` runs after generation; exit code = 0
- [ ] **TOC MUST be in its own section**: the body section sets `page: { pageNumbers: { start: 1, formatType: NumberFormat.DECIMAL } }` so page numbers start from the first body page, not from the TOC pages
- [ ] **Page number API nesting**: `pageNumbers` MUST be inside `page: {}`, NOT at the properties top level (see toc.md § Page Number API)
- [ ] **3-section page numbering**: Cover (no page number) → Front matter (Roman numerals, start=1) → Body (Arabic 1, 2, 3, start=1)
- [ ] **Post-process footers**: the Roman section's footer instrText must contain `PAGE \* ROMAN \* MERGEFORMAT` (renders I, II, III; `\* roman` renders i, ii, iii); the Arabic section's must contain `PAGE \* arabic \* MERGEFORMAT` (WPS ignores the pgNumType fmt). **⚠️ NEVER use `\* decimal` in instrText**: `decimal` is a docx-js API enum value (`NumberFormat.DECIMAL`), NOT a valid Word field format switch; using it causes page numbers to render as "1decimal", "2decimal". The correct Word field switch for Arabic numerals is `\* arabic`.
- [ ] **Remove empty pgNumType**: post-process to strip `<w:pgNumType/>` from the cover section (docx-js emits an empty element that confuses WPS)
- [ ] **⚠️ TOC Refresh Hint MANDATORY**: between the `TableOfContents` element and the PageBreak, add an italic gray note paragraph telling users to right-click the TOC → "Update Field" to refresh page numbers (see toc.md § TOC Refresh Hint)
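
The two footer-related post-processing items can be scripted directly against the package XML. The sketch below assumes the page-number fields sit in `word/footer*.xml` or `word/document.xml`, and that every empty `<w:pgNumType/>` (which carries no settings) can be dropped, not only the cover's. `fix_xml` and `postprocess_docx` are hypothetical names; the skill's own post-processing may work differently. Always write to a new file rather than over the source.

```python
import re
import zipfile

# "\* decimal" is not a Word field switch; "\* arabic" is.
_DECIMAL_SWITCH = re.compile(r"\\\*\s*decimal", re.IGNORECASE)
# An empty pgNumType element carries no attributes, so removing it changes nothing in Word.
_EMPTY_PGNUMTYPE = re.compile(r"<w:pgNumType\s*/>")


def fix_xml(name: str, xml: str) -> str:
    """Apply the footer-switch and empty-pgNumType fixes to one package part."""
    if name.startswith("word/footer") or name == "word/document.xml":
        xml = _DECIMAL_SWITCH.sub(lambda m: r"\* arabic", xml)
    if name == "word/document.xml":
        xml = _EMPTY_PGNUMTYPE.sub("", xml)
    return xml


def postprocess_docx(src: str, dst: str) -> None:
    """Copy src to dst (must be a different path), fixing XML parts on the way."""
    with zipfile.ZipFile(src) as zin, zipfile.ZipFile(dst, "w", zipfile.ZIP_DEFLATED) as zout:
        for item in zin.infolist():
            data = zin.read(item.filename)
            if item.filename.endswith(".xml"):
                data = fix_xml(item.filename, data.decode("utf-8")).encode("utf-8")
            zout.writestr(item, data)  # keeps the original entry name and settings
```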

#### Table Cross-Page
- [ ] Header rows: `tableHeader: true`
- [ ] All rows: `cantSplit: true`
- [ ] Title paragraph: `keepNext: true`

#### Cover
- [ ] **Cover MUST use a validated recipe (R1–R7)** from `design-system.md`; free-form cover code is forbidden
- [ ] Cover recipe matches the document type (per `selectCoverRecipe()` in `design-system.md`)
- [ ] Cover uses the full-page outer wrapper table (16838 twips, the A4 height) with `allNoBorders` (all recipes provide this)
- [ ] Cover title uses `calcTitleLayout()`; never a hardcoded font size above 40pt
- [ ] Cover spacing uses `calcCoverSpacing()`; never hardcoded large spacing values
- [ ] Cover content does not overflow (total height ≤ 15638 twips, i.e. 16838 minus the 1200-twip safety margin; Table uses `rule: "exact"`)
- [ ] Every TextRun on a dark/colored background has an explicit `color` set (Rule 9: never rely on default black)
- [ ] Cover section has no trailing PageBreak or empty paragraphs
- [ ] Title lines split at semantic boundaries (no mid-word breaks, no single-character orphan lines)
- [ ] No text-character decorative lines (`───`, `━━━`); use paragraph borders only

### Layer 2: Automated Post-Check Script

```bash
python3 "$DOCX_SCRIPTS/postcheck.py" output.docx
```

Automatically checks 14 business rules:
1. Blank pages
2. **Cover overflow (font size/spacing/trailing content)**
3. Line spacing consistency
4. Table margins
5. Table cross-page control (cantSplit/tblHeader)
6. Image overflow
7. Image aspect ratio distortion
8. Font fallback
9. CJK indent
10. Heading hierarchy
11. ShadingType misuse
12. TOC quality
13. Document cleanliness (placeholder text/Markdown/HTML residuals)
14. Report content quality (abstract presence/heading specificity/vague conclusion detection)

⚠️ **After generating any document, MUST run postcheck.py and fix all ❌ errors.**

## Math Formulas

Formula input uses **LaTeX syntax**, which is internally converted to docx-js Math objects.

- **Basic formulas** (fractions, sub/superscripts, roots, summations) → docx-js Math components
- **Complex formulas** (three or more nesting levels, matrices, piecewise functions) → matplotlib PNG fallback
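
The split between docx-js Math components and the PNG fallback can be approximated by a syntactic check. This is a heuristic sketch, not the skill's converter: it assumes brace depth is a fair proxy for nesting level and treats matrix and `cases` environments as complex. `brace_depth` and `needs_png_fallback` are hypothetical names.

```python
import re

# Environments the rule above treats as complex (matrices, piecewise functions).
_COMPLEX_ENVS = re.compile(
    r"\\begin\{(matrix|pmatrix|bmatrix|vmatrix|cases|array|aligned)\}")


def brace_depth(latex: str) -> int:
    """Maximum nesting depth of unescaped { } groups (\\{ and \\} are ignored)."""
    depth = best = 0
    prev = ""
    for ch in latex:
        if ch == "{" and prev != "\\":
            depth += 1
            best = max(best, depth)
        elif ch == "}" and prev != "\\":
            depth -= 1
        prev = ch
    return best


def needs_png_fallback(latex: str, max_depth: int = 2) -> bool:
    """True if the formula should go to the matplotlib PNG fallback."""
    if _COMPLEX_ENVS.search(latex):
        return True
    return brace_depth(latex) > max_depth  # three or more nesting levels
```

Under this heuristic `\frac{a}{b}` stays native (depth 1), while `\frac{\sqrt{\frac{a}{b}}}{c}` (depth 3) or any `\begin{cases}` block routes to the PNG fallback.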

See `references/math-formulas.md`.

## Charts

Default: the **matplotlib template library** generates PNGs for embedding.

6 ready-to-use templates: bar, line, pie, box, radar, heatmap.
Colors are auto-derived from the document's `palette.accent` for style consistency.
Default palette: Morandi low-saturation (see design-system.md).

See `references/chart-templates.md`.

## Dependencies

- **pandoc**: Text extraction
- **docx**: `bun add docx` or `npm install docx` (creating)
- **image-size**: reading image dimensions to preserve aspect ratio when embedding
- **matplotlib**: chart and complex-formula PNG rendering
- **LibreOffice**: PDF conversion, .doc support
- **Poppler**: PDF to image (`pdftoppm`)
- **defusedxml**: Secure XML parsing
- **python-docx**: Simple comment operations

skills/docx/references/chart-templates.md

# Chart Templates — matplotlib Template Library

## Design Philosophy

GLM uses **matplotlib as the primary chart engine**. Advantages:
- High chart quality, print-ready
- Full style control, consistent with the document palette
- Supports complex chart types (heatmap, radar, box plot, etc.)
- Reliable CJK rendering (with the SimHei font configured)

**When to use native Word charts?**
Only when the user explicitly requests "editable charts." The default is always matplotlib PNG embedding.

## Base Configuration

```python
import colorsys
import os

import matplotlib
matplotlib.use("Agg")  # headless backend: render straight to files
import matplotlib.pyplot as plt
import numpy as np
from matplotlib.font_manager import FontProperties

# ── CJK Font ──
_FONT_PATHS = [
    "/System/Library/Fonts/Supplemental/SimHei.ttf",  # macOS
    "/usr/share/fonts/truetype/wqy/wqy-zenhei.ttc",   # Linux
    "/usr/share/fonts/truetype/chinese/SimHei.ttf",   # custom install
    "./SimHei.ttf",                                   # current dir
]

# FontProperties(fname=...) does not check that the file exists, so test it
# explicitly; otherwise the first path is always chosen, even when missing.
ZH_FONT = next((FontProperties(fname=fp) for fp in _FONT_PATHS if os.path.exists(fp)), None)

# Use an ASCII hyphen for minus signs; many CJK fonts lack the U+2212 glyph.
plt.rcParams["axes.unicode_minus"] = False


# ── Palette Adapter ──
def make_chart_palette(accent: str, surface: str = "#F2F4F6") -> dict:
    """Generate a chart palette from the document's palette.accent."""
    return {
        "primary": accent,
        "series": _generate_series_colors(accent, 6),
        "grid": "#E0E0E0",
        "bg": "white",
        "text": "#333333",
        "surface": surface,
    }


def _generate_series_colors(base_hex: str, count: int) -> list:
    """Generate series colors by rotating the hue of the base color."""
    base = tuple(int(base_hex.lstrip("#")[i:i + 2], 16) / 255.0 for i in (0, 2, 4))
    h, s, v = colorsys.rgb_to_hsv(*base)
    colors = []
    for i in range(count):
        hi = (h + i * (1.0 / count)) % 1.0  # evenly spaced hues
        r, g, b = colorsys.hsv_to_rgb(hi, min(s * 0.9, 1.0), min(v * 1.05, 1.0))
        colors.append(f"#{int(r * 255):02x}{int(g * 255):02x}{int(b * 255):02x}")
    return colors


# ── Universal Export ──
def save_chart(fig, path: str, dpi: int = 200):
    """Save a chart at a uniform DPI. Square charts (pie/radar) get wider padding."""
    w, h = fig.get_size_inches()
    pad = 0.3 if abs(w - h) < 0.1 else 0.1
    fig.savefig(path, dpi=dpi, bbox_inches="tight", pad_inches=pad,
                facecolor="white", edgecolor="none")
    plt.close(fig)  # free the figure's memory
    return path
```

## Template 1: Bar Chart

```python
def bar_chart(categories: list, values: list, title: str = "",
              ylabel: str = "", palette: dict = None, output: str = "bar.png"):
    """
    Basic bar chart.
    categories: ["Q1", "Q2", "Q3", "Q4"]
    values: [120, 150, 180, 200]
    """
    p = palette or make_chart_palette("#5B8DB8")
    fig, ax = plt.subplots(figsize=(10, 6))

    # Numeric positions + explicit tick labels (avoids FixedFormatter warnings)
    x = np.arange(len(categories))
    bars = ax.bar(x, values, color=p["primary"], width=0.6, edgecolor="white")

    # Data labels
    for bar, val in zip(bars, values):
        ax.text(bar.get_x() + bar.get_width() / 2, bar.get_height() + max(values) * 0.02,
                str(val), ha="center", va="bottom", fontsize=10,
                fontproperties=ZH_FONT, color=p["text"])

    if title:
        ax.set_title(title, fontproperties=ZH_FONT, fontsize=14, pad=15, color=p["text"])
    if ylabel:
        ax.set_ylabel(ylabel, fontproperties=ZH_FONT, fontsize=11, color=p["text"])

    # Rotate labels 45° when there are more than 6 categories
    rotate = len(categories) > 6
    ax.set_xticks(x)
    ax.set_xticklabels(categories, fontproperties=ZH_FONT, fontsize=10,
                       rotation=45 if rotate else 0, ha="right" if rotate else "center")
    ax.spines[["top", "right"]].set_visible(False)
    ax.grid(axis="y", alpha=0.3, color=p["grid"])

    return save_chart(fig, output)
```

### Grouped Bar Chart

```python
def grouped_bar(categories: list, groups: dict, title: str = "",
                ylabel: str = "", palette: dict = None, output: str = "grouped_bar.png"):
    """
    groups: {"Product A": [10, 20, 30], "Product B": [15, 25, 35]}
    """
    p = palette or make_chart_palette("#5B8DB8")
    fig, ax = plt.subplots(figsize=(10, 6))

    x = np.arange(len(categories))
    n = len(groups)
    width = 0.8 / n  # the n bars of one category share 80% of the slot

    for i, (name, vals) in enumerate(groups.items()):
        offset = (i - n / 2 + 0.5) * width  # center each group on its tick
        ax.bar(x + offset, vals, width, label=name, color=p["series"][i % len(p["series"])])

    ax.set_xticks(x)
    ax.set_xticklabels(categories, fontproperties=ZH_FONT, fontsize=10)
    ax.legend(prop=ZH_FONT, frameon=False)
    if title:
        ax.set_title(title, fontproperties=ZH_FONT, fontsize=14, pad=15)
    ax.spines[["top", "right"]].set_visible(False)
    ax.grid(axis="y", alpha=0.3)

    return save_chart(fig, output)
```

## Template 2: Line Chart

```python
def line_chart(x_data: list, series: dict, title: str = "",
               xlabel: str = "", ylabel: str = "", palette: dict = None,
               output: str = "line.png"):
    """
    series: {"Revenue": [100, 120, 150, 180], "Cost": [80, 90, 100, 110]}
    """
    p = palette or make_chart_palette("#5B8DB8")
    fig, ax = plt.subplots(figsize=(10, 6))

    for i, (name, values) in enumerate(series.items()):
        color = p["series"][i % len(p["series"])]
        ax.plot(x_data, values, marker="o", markersize=5, linewidth=2,
                label=name, color=color)

    if title:
        ax.set_title(title, fontproperties=ZH_FONT, fontsize=14, pad=15)
    if xlabel:
        ax.set_xlabel(xlabel, fontproperties=ZH_FONT, fontsize=11)
    if ylabel:
        ax.set_ylabel(ylabel, fontproperties=ZH_FONT, fontsize=11)

    ax.legend(prop=ZH_FONT, frameon=False, loc="best")
    ax.spines[["top", "right"]].set_visible(False)
    ax.grid(True, alpha=0.3)

    if len(x_data) > 6:
        plt.xticks(rotation=45, ha="right")

    return save_chart(fig, output)
```

## Template 3: Pie Chart

```python
def pie_chart(labels: list, values: list, title: str = "",
              palette: dict = None, output: str = "pie.png"):
    """Pie chart — auto-merges slices below 3% into 'Other'"""
    p = palette or make_chart_palette("#5B8DB8")
    fig, ax = plt.subplots(figsize=(8, 8))

    # Merge slices below 3% into "Other"
    total = sum(values)
    merged_labels, merged_values = [], []
    other = 0
    for lbl, val in zip(labels, values):
        if val / total < 0.03:
            other += val
        else:
            merged_labels.append(lbl)
            merged_values.append(val)
    if other > 0:
        merged_labels.append("Other")
        merged_values.append(other)

    colors = p["series"][:len(merged_labels)]  # pie() cycles the list if slices outnumber colors
    wedges, texts, autotexts = ax.pie(
        merged_values, labels=merged_labels, colors=colors,
        autopct="%1.1f%%", startangle=90, pctdistance=0.75,
        textprops={"fontproperties": ZH_FONT, "fontsize": 11}
    )

    for t in autotexts:
        t.set_fontsize(10)
        t.set_color("white")

    if title:
        ax.set_title(title, fontproperties=ZH_FONT, fontsize=14, pad=20)

    return save_chart(fig, output)
```

## Template 4: Box Plot

```python
def box_plot(data: dict, title: str = "", ylabel: str = "",
             palette: dict = None, output: str = "box.png"):
    """
    data: {"Class A": [78, 82, 91, ...], "Class B": [65, 70, 88, ...]}
    """
    p = palette or make_chart_palette("#5B8DB8")
    fig, ax = plt.subplots(figsize=(10, 6))

    labels = list(data.keys())
    values = list(data.values())

    # Labels are set via set_xticklabels below: boxplot's own `labels=` kwarg
    # was renamed `tick_labels` in matplotlib 3.9, so it is avoided here.
    bp = ax.boxplot(values, patch_artist=True, notch=False,
                    medianprops={"color": "white", "linewidth": 2})

    for i, patch in enumerate(bp["boxes"]):
        patch.set_facecolor(p["series"][i % len(p["series"])])
        patch.set_alpha(0.8)

    ax.set_xticks(range(1, len(labels) + 1))  # boxes sit at x = 1..n
    ax.set_xticklabels(labels, fontproperties=ZH_FONT, fontsize=11)
    if title:
        ax.set_title(title, fontproperties=ZH_FONT, fontsize=14, pad=15)
    if ylabel:
        ax.set_ylabel(ylabel, fontproperties=ZH_FONT, fontsize=11)
    ax.spines[["top", "right"]].set_visible(False)
    ax.grid(axis="y", alpha=0.3)

    return save_chart(fig, output)
```

## Template 5: Radar Chart

```python
def radar_chart(categories: list, series: dict, title: str = "",
                palette: dict = None, output: str = "radar.png"):
    """
    categories: ["Chinese", "Math", "English", "Physics", "Chemistry"]
    series: {"Student A": [85, 92, 78, 90, 88], "Student B": [75, 88, 92, 70, 85]}
    """
    p = palette or make_chart_palette("#5B8DB8")
    fig, ax = plt.subplots(figsize=(8, 8), subplot_kw=dict(polar=True))

    n = len(categories)
    angles = np.linspace(0, 2 * np.pi, n, endpoint=False).tolist()
    angles += angles[:1]  # close the polygon

    for i, (name, values) in enumerate(series.items()):
        vals = list(values) + [values[0]]  # close the polygon (works for lists and arrays)
        color = p["series"][i % len(p["series"])]
        ax.plot(angles, vals, linewidth=2, label=name, color=color)
        ax.fill(angles, vals, alpha=0.15, color=color)

    ax.set_xticks(angles[:-1])
    ax.set_xticklabels(categories, fontproperties=ZH_FONT, fontsize=11)
    ax.legend(prop=ZH_FONT, loc="upper right", bbox_to_anchor=(1.2, 1.1), frameon=False)

    if title:
        ax.set_title(title, fontproperties=ZH_FONT, fontsize=14, pad=25)

    return save_chart(fig, output)
```

## Template 6: Heatmap

```python
def heatmap(data: list, row_labels: list, col_labels: list, title: str = "",
            palette: dict = None, output: str = "heatmap.png"):
    """
    data: 2D array [[1,2,3],[4,5,6]]
    row_labels: ["Row 1", "Row 2"]
    col_labels: ["Col 1", "Col 2", "Col 3"]
    palette: accepted for a uniform signature; the YlOrRd colormap is used
    """
    fig, ax = plt.subplots(figsize=(max(8, len(col_labels) * 1.2), max(6, len(row_labels) * 0.8)))

    arr = np.array(data)
    im = ax.imshow(arr, cmap="YlOrRd", aspect="auto")

    ax.set_xticks(range(len(col_labels)))
    ax.set_yticks(range(len(row_labels)))
    ax.set_xticklabels(col_labels, fontproperties=ZH_FONT, fontsize=10)
    ax.set_yticklabels(row_labels, fontproperties=ZH_FONT, fontsize=10)

    # Value annotations: white on dark cells, black on light cells
    for i in range(len(row_labels)):
        for j in range(len(col_labels)):
            val = arr[i, j]
            color = "white" if val > arr.max() * 0.7 else "black"
            ax.text(j, i, f"{val:.1f}", ha="center", va="center",
                    fontsize=10, color=color)

    fig.colorbar(im, ax=ax, shrink=0.8)
    if title:
        ax.set_title(title, fontproperties=ZH_FONT, fontsize=14, pad=15)

    return save_chart(fig, output)
```

## Embedding in Documents (MANDATORY — Preserve Aspect Ratio)

**⚠️ Core Rule: When embedding any chart image, you MUST read the actual image dimensions to calculate displayHeight. NEVER hardcode both width and height.**

Pie and radar charts are square; a mismatched width/height turns them into ellipses or diamonds.

```js
// ✅ Correct: read actual image dimensions
const fs = require("fs");
const sizeOf = require("image-size"); // image-size v1 API; newer majors may export `imageSize` instead
const { Paragraph, ImageRun, AlignmentType } = require("docx");

const chartBuffer = fs.readFileSync("bar.png");
const dims = sizeOf(chartBuffer);
const displayWidth = 500;
const displayHeight = Math.round(displayWidth * (dims.height / dims.width));

new Paragraph({
  alignment: AlignmentType.CENTER,
  spacing: { before: 200, after: 100 },
  children: [
    new ImageRun({
      data: chartBuffer,
      transformation: { width: displayWidth, height: displayHeight },
      type: "png",
    }),
  ],
});
```

```js
// ❌ Wrong: hardcoded width and height (pie becomes ellipse, radar becomes diamond)
new ImageRun({
  data: chartBuffer,
  transformation: { width: 500, height: 350 }, // wrong ratio!
  type: "png",
});
```

```python
# ✅ Python (ReportLab) correct approach:
from PIL import Image as PILImage
from reportlab.platypus import Image

with PILImage.open("chart.png") as pil_img:
    orig_w, orig_h = pil_img.size
target_width = 400  # pt
scale = target_width / orig_w
img = Image("chart.png", width=target_width, height=orig_h * scale)
```
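
The same rule can be followed in Python without PIL by reading the pixel size straight from the PNG header: an 8-byte signature, then the IHDR chunk, whose first two fields are the width and height as big-endian 32-bit integers. A minimal sketch for PNG input only (the format `save_chart` produces); `png_size` and `display_size` are hypothetical helpers.

```python
import struct

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def png_size(data: bytes) -> tuple:
    """Read (width, height) in pixels from a PNG's IHDR chunk."""
    # Layout: signature (8 bytes), chunk length (4), chunk type "IHDR" (4), width (4), height (4)
    if data[:8] != PNG_SIGNATURE or data[12:16] != b"IHDR":
        raise ValueError("not a PNG file")
    width, height = struct.unpack(">II", data[16:24])
    return width, height


def display_size(data: bytes, display_width: int) -> tuple:
    """Scale to a target width while keeping the image's own aspect ratio."""
    w, h = png_size(data)
    return display_width, round(display_width * h / w)
```

With this, a 2000×1200 px bar chart scaled to a width of 500 gets a height of 300, and a square pie chart stays square; only the width is ever chosen by hand.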

## Chart Selection Guide

| Data Scenario | Recommended Chart | Template Function |
|---------------|-------------------|-------------------|
| Category comparison | Bar chart | `bar_chart()` |
| Multi-group comparison | Grouped bar | `grouped_bar()` |
| Trend over time | Line chart | `line_chart()` |
| Proportion/composition | Pie chart | `pie_chart()` |
| Distribution/spread | Box plot | `box_plot()` |
| Multi-dimensional assessment | Radar chart | `radar_chart()` |
| Matrix correlation | Heatmap | `heatmap()` |

## Quality Standards

1. **DPI**: Uniform 200 DPI (built into `save_chart`)
2. **Colors**: Derived from the document's palette.accent for style consistency
3. **CJK text**: Must configure a CJK font (SimHei, or a fallback from `_FONT_PATHS`); otherwise CJK characters render as boxes
4. **Label overlap prevention**: Auto-rotate 45° when there are more than 6 x-axis labels
5. **Legend**: Move outside the chart (`bbox_to_anchor`) when there are more than 4 series
6. **Grid**: Light gray, semi-transparent grid lines (`alpha=0.3`) for readability
7. **Clean frames**: Remove top/right spines for a modern minimalist look
8. **Aspect ratio (CRITICAL)**: Must use `image-size` (JS) or `PIL` (Python) to read the actual image dimensions and calculate displayHeight proportionally. **Pie and radar charts are square; hardcoding a non-1:1 ratio causes ellipse/diamond distortion.**
9. **Dimensions**: Default 10×6 inches (8×8 for pie and radar charts), which fits well within an A4 page

skills/docx/references/common-rules.md

# Common Rules

Shared rules referenced by all scene files. Scene-specific overrides take precedence.

## Default Page Layout

A4 portrait. Unless the scene specifies otherwise, use:

| Property | Value | Twips |
|----------|-------|-------|
| Page width | 21.0 cm | 11906 |
| Page height | 29.7 cm | 16838 |
| Top margin | 2.54 cm | 1440 |
| Bottom margin | 2.54 cm | 1440 |
| Left margin | 3.0 cm | 1701 |
| Right margin | 2.5 cm | 1417 |

```js
page: {
  size: { width: 11906, height: 16838, orientation: PageOrientation.PORTRAIT },
  margin: { top: 1440, bottom: 1440, left: 1701, right: 1417 },
}
```

**Scene overrides:**
- **Official doc (GB/T 9704 red-header):** top 2098, bottom 1984, left 1588, right 1474
- **Exam:** top/bottom 1134 (2 cm), left/right 1134 (2 cm)
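
Subtracting the margins from the A4 size gives the usable type area, which is the budget for full-width tables and images. A sketch with illustrative names (`LAYOUTS`, `text_area`, `twips_to_mm`); the margin values are the ones listed above.

```python
A4_WIDTH, A4_HEIGHT = 11906, 16838  # twips

# Margins in twips as (top, bottom, left, right), copied from the rules above.
LAYOUTS = {
    "default": (1440, 1440, 1701, 1417),
    "official": (2098, 1984, 1588, 1474),
    "exam": (1134, 1134, 1134, 1134),
}


def text_area(layout: str) -> tuple:
    """Return the (width, height) of the type area in twips for a named layout."""
    top, bottom, left, right = LAYOUTS[layout]
    return A4_WIDTH - left - right, A4_HEIGHT - top - bottom


def twips_to_mm(twips: int) -> float:
    """1 inch = 1440 twips = 25.4 mm."""
    return twips / 1440 * 25.4
```

The default layout leaves 8788 × 13958 twips. The official-document override leaves 8844 × 12756 twips, which converts to 156 mm × 225 mm, the type area GB/T 9704 specifies.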
|
||||
|
||||
## Default Font Specifications
|
||||
|
||||
Two font profiles exist. Each scene declares which profile it uses.
|
||||
|
||||
### Profile A: Formal (report, academic, contract, official-doc, exam)
|
||||
|
||||
| Element | CN Font | EN Font | Size | Notes |
|
||||
|---------|---------|---------|------|-------|
|
||||
| H1 | SimHei | Times New Roman | 16 pt (size: 32) | Bold, centered |
|
||||
| H2 | SimHei | Times New Roman | 15 pt (size: 30) | Bold |
|
||||
| H3 | SimHei | Times New Roman | 14 pt (size: 28) | Bold |
|
||||
| Body | SimSun | Times New Roman | 12 pt (size: 24) | |
|
||||
| Caption | SimSun | Times New Roman | 10.5 pt (size: 21) | |
|
||||
|
||||
- Text color: always **pure black `"000000"`** (never dark-blue-grey)
|
||||
- First-line indent: **480 twips** (2 chars at SimSun 12pt)
|
||||
- Line spacing: **312** (1.3x).
|
||||
- **Color routing for non-report documents**: When the document is a short-form text (essay, evaluation, letter, speech, application, reflection, etc.) rather than a structured report/whitepaper/proposal/consulting deliverable, heading color MUST use pure black `"000000"` instead of `palette.primary`. Colored headings are reserved for documents that need brand/professional identity (reports with covers, whitepapers, proposals, consulting deliverables).
|
||||
|
||||
### Profile B: Visual (resume, copywriting)
|
||||
|
||||
| Element | CN Font | EN Font | Size |
|
||||
|---------|---------|---------|------|
|
||||
| Name/Title | Microsoft YaHei | Calibri | Varies |
|
||||
| Body | Microsoft YaHei | Calibri | 10–11 pt |
|
||||
| Caption | Microsoft YaHei | Calibri | 9 pt |
|
||||
|
||||
- First-line indent: **420 twips** (2 chars at YaHei)
|
||||
- Color: per design-system palette
|
||||
|
||||
### Official-Doc Font Override (GB/T 9704)

When `needsRedHeader() = true`:

| Element | Font | Size |
|---------|------|------|
| Red header org name | STXiaoBiaoSong (or SimSun bold) | 26 pt (size: 52) |
| Title | STXiaoBiaoSong (or SimHei) | 22 pt (size: 44) |
| Body | FangSong | 16 pt (size: 32) |
| Section heading | FangSong_GB2312 bold (or HeiTi) | 16 pt (size: 32) |

- Line spacing: **560** (28 pt fixed, i.e. `lineRule: "exact"`)
- First-line indent: **640 twips** (2 chars at FangSong 16 pt)

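The twip values in the three profiles come from two conversions. First, 1 pt = 20 twips, so a full-width CJK character at N pt is one em (N × 20 twips) wide. Second, `auto` line spacing is counted in 240ths of a line. Below is a minimal sketch of that arithmetic. The helper names are illustrative and are not part of docx-js.

```javascript
// 1 pt = 20 twips; a full-width CJK character is one em wide.
const TWIPS_PER_PT = 20;

// First-line indent for `chars` full-width characters at `fontPt`.
function firstLineIndentTw(fontPt, chars = 2) {
  return Math.round(fontPt * TWIPS_PER_PT * chars);
}

// `auto` line spacing: 240 = single, so a multiple maps to 240 × multiple.
function autoLineSpacing(multiple) {
  return Math.round(240 * multiple);
}

// Fixed (exact) line spacing is given directly in twips.
function exactLineSpacing(pt) {
  return Math.round(pt * TWIPS_PER_PT);
}

console.log(firstLineIndentTw(12));   // 480  (Profile A body, SimSun 12 pt)
console.log(firstLineIndentTw(10.5)); // 420  (Profile B body, YaHei 10.5 pt)
console.log(firstLineIndentTw(16));   // 640  (official-doc body, FangSong 16 pt)
console.log(autoLineSpacing(1.3));    // 312  (Profile A line spacing)
console.log(exactLineSpacing(28));    // 560  (official-doc fixed line spacing)
```

In docx-js, these values would go into `indent: { firstLine: ... }` and `spacing: { line: ..., lineRule: ... }` on a `Paragraph`. The official-doc 28 pt spacing would use `lineRule: "exact"`.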
## Chinese Font Size Reference

| Name | Points | Half-points (size:) |
|------|--------|---------------------|
| Chu Hao (initial) | 42 | 84 |
| Xiao Chu | 36 | 72 |
| Yi Hao (1st) | 26 | 52 |
| Xiao Yi | 24 | 48 |
| Er Hao (2nd) | 22 | 44 |
| Xiao Er | 18 | 36 |
| San Hao (3rd) | 16 | 32 |
| Xiao San | 15 | 30 |
| Si Hao (4th) | 14 | 28 |
| Xiao Si | 12 | 24 |
| Wu Hao (5th) | 10.5 | 21 |
| Xiao Wu | 9 | 18 |
| Liu Hao (6th) | 7.5 | 15 |

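Because docx-js `size` is expressed in half-points, a lookup keyed by the names above saves converting by hand. This is a minimal sketch; `CN_FONT_PT` and `cnSize()` are hypothetical names.

```javascript
// Chinese named font sizes → points (values from the table above).
const CN_FONT_PT = {
  "Chu Hao": 42, "Xiao Chu": 36,
  "Yi Hao": 26, "Xiao Yi": 24,
  "Er Hao": 22, "Xiao Er": 18,
  "San Hao": 16, "Xiao San": 15,
  "Si Hao": 14, "Xiao Si": 12,
  "Wu Hao": 10.5, "Xiao Wu": 9,
  "Liu Hao": 7.5,
};

// Return the docx-js `size` value (half-points) for a named Chinese size.
function cnSize(name) {
  const pt = CN_FONT_PT[name];
  if (pt === undefined) throw new Error(`Unknown Chinese font size: ${name}`);
  return pt * 2;
}

// e.g. official-doc body text (San Hao):
// new TextRun({ text: "...", font: "FangSong", size: cnSize("San Hao") })
console.log(cnSize("San Hao")); // 32
console.log(cnSize("Wu Hao"));  // 21
```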
## Placeholder Convention

When required information is missing, use standardized placeholders so users can locate them with Find & Replace in Word.

**Format:** Always use full-width brackets `【 】`.

| Type | Format | Example |
|------|--------|---------|
| General field | `【field name】` | Name: 【company name】 |
| Monetary amount | `【RMB in words: yuan (lowercase: ¥)】` | Amount: 【RMB in words: yuan (lowercase: ¥)】 |
| Date field | `【____/____/____】` | Signing date: 【____/____/____】 |
| Long text | `【Please fill in: ______】` | Delivery criteria: 【Please fill in: ______】 |
| Attachment ref | `【See Appendix 1: ______】` | |

**Rules:**
1. Placeholder format must be consistent throughout the entire document
2. Each placeholder must state exactly what is needed (never a vague "TBD" or "to be completed")
3. Never hard-code unconfirmed critical facts; use a placeholder instead
4. Never use sloppy expressions such as "to be refined", "omitted", or "user fills in later"

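One way to satisfy rule 1 is to build every placeholder through a single set of helpers instead of typing brackets by hand. This is a minimal sketch: the `ph` object and its method names are hypothetical, and the wording simply mirrors the table above.

```javascript
// Full-width brackets, as required by the convention.
const OPEN = "\u3010";  // 【
const CLOSE = "\u3011"; // 】
const BLANK = "______";

const ph = {
  field: (name) => `${OPEN}${name}${CLOSE}`,
  money: () => `${OPEN}RMB in words: yuan (lowercase: ¥)${CLOSE}`,
  date: () => `${OPEN}____/____/____${CLOSE}`,
  longText: () => `${OPEN}Please fill in: ${BLANK}${CLOSE}`,
  appendix: (n) => `${OPEN}See Appendix ${n}: ${BLANK}${CLOSE}`,
};

console.log(`Name: ${ph.field("company name")}`); // Name: 【company name】
console.log(`Signing date: ${ph.date()}`);        // Signing date: 【____/____/____】
console.log(ph.appendix(1));                      // 【See Appendix 1: ______】
```

Because every placeholder is produced in one place, a document cannot end up mixing `【】` with half-width `[]` brackets. Changing the wording is a one-line edit.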
## Title Orphan Prevention (All Scenes)

Body headings (H1/H2/H3) and cover titles must not leave 1–2 characters alone on the last line. This rule applies to ALL document types.

**For cover titles:** Always use `calcTitleLayout()` + `splitTitleLines()` from `design-system.md`. These handle orphan prevention automatically (a last line of ≤2 characters is merged into the previous line).

**For body headings (H1/H2/H3):** When a heading is long enough to wrap, apply the same `splitTitleLines()` logic. If Word's automatic wrapping would leave a 1–2 character orphan, split the heading manually into several `TextRun` elements separated by soft line breaks (`break: 1` on the `TextRun`) at a semantic boundary.

```js
const { TextRun } = require("docx");
// splitTitleLines() is defined in design-system.md (not shown here).

// Build heading runs, inserting soft line breaks to prevent an orphan last line
function buildHeadingRuns(text, maxCharsPerLine, runProps = {}) {
  // If the text fits on one line, no manual break is needed
  if (text.length <= maxCharsPerLine) {
    return [new TextRun({ ...runProps, text })];
  }
  // Use splitTitleLines to find semantic break points
  const lines = splitTitleLines(text, maxCharsPerLine);
  // `break: 1` emits a soft line break (<w:br/>) before the run's text
  return lines.map((line, i) =>
    new TextRun({ ...runProps, text: line, ...(i > 0 ? { break: 1 } : {}) }));
}
```

**Estimation for maxCharsPerLine:** For centered headings, estimate the available width as page width − left margin − right margin. For SimHei at a given point size, each CJK character is about one em wide, i.e. ≈ pt × 20 twips. Divide the available width by this character width to get `maxCharsPerLine`.

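The same estimate as code. This is a minimal sketch that assumes pure CJK text (one em per character) and an A4 page 11906 twips wide; `maxCharsPerLine()` here is an illustrative helper. For headings that mix in Latin text, it overestimates the width of the Latin characters, so the result errs on the short (safe) side.

```javascript
// A full-width CJK character is ~1 em wide: pt × 20 twips.
function maxCharsPerLine(pageWidthTw, marginLeftTw, marginRightTw, fontPt) {
  const availableTw = pageWidthTw - marginLeftTw - marginRightTw;
  const charWidthTw = fontPt * 20;
  return Math.floor(availableTw / charWidthTw);
}

// A4 portrait (11906 twips), 1440-twip (1 inch) side margins, H1 at 16 pt:
// (11906 - 2880) / 320 = 28.2, so 28 characters per line.
console.log(maxCharsPerLine(11906, 1440, 1440, 16)); // 28
```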
---

## Undefined / Null Value Prevention (Mandatory)

Generated code MUST guard against outputting literal `undefined`, `null`, `NaN`, or empty strings in any visible text field. This is a **hard requirement**: these values are never acceptable in a delivered document.

```js
const { TextRun } = require("docx");

// ✅ MANDATORY: safe text helper. Use it for ALL user-facing text values.
function safeText(value, placeholder) {
  const s = value === undefined || value === null ? "" : String(value);
  // Empty / whitespace-only, or a stringified JS sentinel value
  if (s.trim() === "" || s === "NaN" || s === "undefined" || s === "null") {
    return placeholder || "【Please fill in】";
  }
  return s;
}

// Usage:
new TextRun({ text: safeText(config.contact, "【Contact person】") })
new TextRun({ text: safeText(row.phone, "【Phone number】") })
```

**Rules:**
1. Every `TextRun` displaying user-provided or config-derived data MUST use `safeText()` or an equivalent guard
2. If an optional field is not provided, use the `【Please fill in: field_name】` placeholder (full-width brackets)
3. Table cells with missing data must show `【Please fill in】`; never leave them as an empty string or `undefined`
4. This applies to ALL scenes: contracts, reports, academic papers, exams, etc.

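`safeText()` only protects values that pass through it. Text assembled with template literals (for example `` `Tel: ${row.phone}` ``) can still leak the literal strings. Below is a minimal post-generation sketch. It assumes you collect every visible string (paragraph and cell text) into an array before building the document; `findLeakedValues` is a hypothetical helper. Treat hits as items to review, since a contract may legitimately say "null and void".

```javascript
// Flags visible strings that are empty or contain leaked JS sentinel values.
const LEAK_PATTERN = /\b(undefined|null|NaN)\b/;

function findLeakedValues(texts) {
  const problems = [];
  texts.forEach((text, index) => {
    if (text === undefined || text === null || String(text).trim() === "") {
      problems.push({ index, text, reason: "empty" });
    } else if (LEAK_PATTERN.test(String(text))) {
      problems.push({ index, text, reason: "leaked value" });
    }
  });
  return problems;
}

const row = { name: "Li Hua" }; // phone is missing
const texts = [`Name: ${row.name}`, `Tel: ${row.phone}`, ""];
console.log(findLeakedValues(texts).map((p) => p.reason)); // [ 'leaked value', 'empty' ]
```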
---

## WPS / Office Word Compatibility (Mandatory)

Generated .docx files must render consistently in both Microsoft Office Word and WPS Office. The following OOXML features have known compatibility issues; avoid them or use them carefully.

### Features to AVOID (high incompatibility risk)

| Feature | Issue | Alternative |
|---------|-------|------------|
| **Text-character decorative lines** (e.g., `───`, `━━━`, `═══`, `——————`) | Character-drawn lines depend on font metrics and the rendering engine. They render at different widths in MS Office and WPS and are often truncated or misaligned, and their width cannot be controlled. | **Always use paragraph borders** (`border.top`, `border.bottom`) for horizontal decorative lines. Paragraph borders render consistently across engines and respect indents for precise width control. See recipe R2 for the correct implementation. |
| **Default table borders on cover wrapper tables** (forgetting `allNoBorders`) | docx-js default table borders are `single/auto/sz=4`. On the 16838-twip-high cover wrapper, these borders add ~8 twips of extra height per edge. MS Office includes border thickness in the height calculation, so content overflows by a few twips → **blank page 2**. WPS is more lenient and may absorb the overflow. | **Every cover wrapper table MUST explicitly set `borders: allNoBorders`** (all 6 border positions = NONE). Never rely on defaults. Define the `allNoBorders` constant once and use it consistently. |
| `verticalAlign: "center"` or `"bottom"` in exact-height TableRow | WPS ignores vertical alignment in exact-height rows; content may clip or shift | Use `verticalAlign: "top"` + `spacing.before` to position content. Avoid `margins.top`/`margins.bottom` in exact-height cells; they reduce the available height unpredictably across engines |
| `characterSpacing` (large values) | WPS renders it differently from Word; letter spacing may collapse or expand | Keep `characterSpacing` ≤ 80; for English labels on covers, test both renderers |
| `margins.top`/`margins.bottom` inside exact-height cells | MS Office and WPS calculate the remaining height differently when cell margins are present | Use `spacing.before` on the first paragraph for vertical positioning; only use `margins.left`/`margins.right` |
| Complex nested Tables inside exact-height cells | WPS height calculation differs from Word; content may overflow or clip | Wrap everything in a single 16838-twip outer wrapper cell (R1 architecture). Nested tables inside are acceptable when the outer wrapper provides a safety net |
| Large font without explicit `spacing.line` | The paragraph inherits a small line spacing from the document default (e.g., 560 twips for body text); when the font is taller than the line height, the top of the characters is clipped | Always set `spacing: { line: Math.ceil(fontPt * 23), lineRule: "atLeast" }` on paragraphs whose font size exceeds the body text |
| `ShadingType.SOLID` | WPS shows solid black instead of the intended color | Always use `ShadingType.CLEAR` |
| Raw OOXML for columns (`w:cols`) | WPS column rendering may differ | Use only when explicitly needed (A3 exam papers); test the output |
| `titlePage: true` with complex headers/footers | WPS may not properly suppress the first-page header/footer | Use separate sections instead of the `titlePage` flag |
| Tab stops for alignment | WPS tab width may differ from Word | Use borderless Tables for alignment instead |

### Features that are SAFE (consistent rendering)

| Feature | Notes |
|---------|-------|
| Borderless Tables for layout | Both renderers handle them well |
| `ShadingType.CLEAR` with fill color | Consistent |
| `rule: "exact"` on single-level TableRow | Works in both (avoid with nested Tables) |
| Paragraph borders (left, bottom, etc.) | Consistent |
| `spacing.before` / `spacing.after` | Consistent |
| Standard fonts (SimHei, SimSun, YaHei, TNR, Calibri) | Available on both platforms |
| `PageBreak` inside Paragraph | Consistent |
| Section breaks (`SectionType.NEXT_PAGE`) | Consistent |

### Mandatory Compatibility Checks (Post-Generation)

Add to the quality self-check:

- [ ] No `ShadingType.SOLID` anywhere (search the codebase)
- [ ] No `verticalAlign: "center"` or `"bottom"` in exact-height rows
- [ ] No tab-stop alignment for party info or data alignment (use Tables)
- [ ] Covers use the 16838 outer wrapper architecture (R1 pattern) with `spacing.before` for positioning; no `margins.top`/`margins.bottom` in exact-height cells
- [ ] **Cover section margin = `{ top: 0, bottom: 0, left: 0, right: 0 }`** — non-zero margins make the wrapper shrink away from the page edges
- [ ] **Cover wrapper row has `height: { value: 16838, rule: "exact" }`** — without this, content overflows or leaves whitespace
- [ ] **Cover is in a separate section from body content** — cover and body must not share a section
- [ ] **Cover wrapper table uses explicit `allNoBorders`** — never rely on default table borders (causes a blank page 2 in MS Office)
- [ ] **No text-character decorative lines** (`───`, `━━━`, `═══`, `——————`) — use paragraph borders instead
- [ ] `characterSpacing` values ≤ 80 throughout
- [ ] TOC: follow the `references/toc.md` checklist (heading style, TableOfContents element, PageBreak, post-processing script)
- [ ] All tables use `WidthType.PERCENTAGE` for column widths (WPS tblGrid bug; if DXA is unavoidable, set `columnWidths` explicitly)

```js
const { Table, TableRow, TableCell, Paragraph, WidthType } = require("docx");

// ✅ Correct — percentage widths, WPS-safe
new Table({
  width: { size: 100, type: WidthType.PERCENTAGE },
  rows: [new TableRow({ children: [
    new TableCell({ width: { size: 30, type: WidthType.PERCENTAGE }, children: [new Paragraph("Label")] }),
    new TableCell({ width: { size: 70, type: WidthType.PERCENTAGE }, children: [new Paragraph("Value")] }),
  ]})],
});

// ❌ WRONG — DXA widths cause a WPS tblGrid mismatch (all gridCol=100)
new TableCell({ width: { size: 3000, type: WidthType.DXA }, children: [new Paragraph("Label")] });
```

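Several of these checks can be automated by scanning the generator script's source text before running it. This is a rough sketch: the patterns are regular-expression approximations of the checklist. They can miss dynamically built values or flag commented-out code, so review the hits. `scanGeneratorSource` is a hypothetical helper.

```javascript
// Each rule: a global regex over the generator's source, plus a message.
const COMPAT_RULES = [
  { re: /ShadingType\.SOLID/g, msg: "use ShadingType.CLEAR instead of SOLID" },
  { re: /WidthType\.DXA/g, msg: "prefer WidthType.PERCENTAGE for table/cell widths" },
  { re: /[─━═]{3,}|—{4,}/g, msg: "text-character decorative line; use paragraph borders" },
  { re: /TOTAL_PAGES|NUMPAGES/g, msg: "no 'Page X of Y' footers" },
];

function scanGeneratorSource(source) {
  const findings = [];
  for (const { re, msg } of COMPAT_RULES) {
    for (const m of source.matchAll(re)) findings.push({ at: m.index, match: m[0], msg });
  }
  // characterSpacing must stay ≤ 80
  for (const m of source.matchAll(/characterSpacing:\s*(\d+)/g)) {
    if (Number(m[1]) > 80) findings.push({ at: m.index, match: m[0], msg: "characterSpacing > 80" });
  }
  return findings.sort((a, b) => a.at - b.at); // report in source order
}

const sample = `
  shading: { type: ShadingType.SOLID, fill: "1F3864" },
  new TextRun({ text: "REPORT", characterSpacing: 120 }),
  new TextRun({ text: "Title", characterSpacing: 30 }),
`;
for (const f of scanGeneratorSource(sample)) console.log(f.msg);
```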
---

## Universal Prohibitions

These apply to ALL scenes. Scene files may add scene-specific prohibitions.

1. **No outlines-only** — always produce a complete, finished document
2. **No chat-style output** — the document must not read like a conversation or explanation
3. **No fake TOC / page numbers / headers** — use proper docx-js structures
4. **No excessive blank lines** to pad layout
5. **No dirty formatting** — no stray annotations, template fragments, broken hyperlinks, garbled markers
6. **No sloppy placeholders** — "TBD", "omitted", "略", "to be refined" are forbidden; use proper `【】` placeholders
7. **No fabricated data** — do not invent statistics, citations, legal references, or facts to appear professional
8. **No inconsistent heading/numbering** — one numbering system per document, no level-skipping
9. **No Markdown artifacts** — no `#`, `**`, `-` list markers, `>` blockquotes, and **no Markdown table syntax** (`| col1 | col2 |`, `|---|---|`) in the final docx. Any tabular data MUST be rendered as a proper docx `Table` object — never as plain-text pipe-delimited lines. This applies to ALL scenes including exam paper data tables, report statistics, and academic result tables.
10. **No bullet-list documents** — body text must be proper paragraphs, not endless bullet points

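Prohibitions 6 and 9 can be spot-checked mechanically on the final visible text, for example text extracted from the generated .docx. This is a minimal sketch: the pattern lists are illustrative and deliberately incomplete, and `findProhibitedText` is a hypothetical helper. Review the hits rather than failing blindly, since a phrase like "may be omitted" can be legitimate.

```javascript
// Forbidden placeholder wording (prohibition 6). "略" is matched only when it
// stands alone, so ordinary words such as 战略 / 策略 are not flagged.
const SLOPPY_PATTERNS = [
  { re: /\bTBD\b/, msg: 'sloppy placeholder "TBD"' },
  { re: /\b(omitted|to be refined|user fills in later)\b/i, msg: "sloppy placeholder wording" },
  { re: /(^|[\s:：(（])略($|[\s。.)）])/, msg: 'sloppy placeholder "略"' },
];

// Markdown syntax that must not survive into the docx (prohibition 9).
const MARKDOWN_PATTERNS = [
  { re: /^#{1,6}\s/, msg: "Markdown heading marker" },
  { re: /\*\*[^*]+\*\*/, msg: "Markdown bold markers" },
  { re: /^\s*[-*]\s+/, msg: "Markdown list marker" },
  { re: /^>\s/, msg: "Markdown blockquote" },
  { re: /^\s*\|.*\|\s*$/, msg: "Markdown table row" },
];

// paragraphs: one string per visible paragraph of the generated document
function findProhibitedText(paragraphs) {
  const findings = [];
  paragraphs.forEach((text, i) => {
    for (const { re, msg } of [...SLOPPY_PATTERNS, ...MARKDOWN_PATTERNS]) {
      if (re.test(text)) findings.push({ paragraph: i, msg });
    }
  });
  return findings;
}

const paragraphs = [
  "1. Project Background",
  "| Item | Budget |",
  "Delivery date: TBD",
  "三、发展战略",
  "Signing date: 【____/____/____】",
];
console.log(findProhibitedText(paragraphs));
```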
## Letter / Correspondence Format (Universal)

When generating any letter-style document (invitation letter, thank-you letter, cover letter, recommendation letter, English essay in letter format, etc.), the following layout rules apply regardless of scene:

1. **Complimentary close and sender name MUST be right-aligned**: the closing ("Yours sincerely,", "Best regards,", "Yours,") and the sender name below it must use `alignment: AlignmentType.RIGHT`
2. **Date**: right-aligned, whether placed at the top of the letter or at the bottom with the closing
3. **Salutation** ("Dear Mr. Smith," / "Dear Mike,"): left-aligned, followed by a blank line or `spacing.after`
4. **Body paragraphs**: left-aligned (English) or justified (CJK), with appropriate `spacing.after` between paragraphs

```js
const { Paragraph, TextRun, AlignmentType } = require("docx");

// ✅ Correct — closing and sender right-aligned
const closing = [
  new Paragraph({ alignment: AlignmentType.RIGHT, spacing: { before: 400 },
    children: [new TextRun({ text: "Yours sincerely,", size: 24 })] }),
  new Paragraph({ alignment: AlignmentType.RIGHT,
    children: [new TextRun({ text: "Li Hua", size: 24 })] }),
];

// ❌ WRONG — closing left-aligned (the default)
const wrongClosing = new Paragraph({
  children: [new TextRun({ text: "Yours sincerely," })] });
```

## Quality Self-Check (Universal)

→ See **SKILL.md § Post-Generation — Two-Layer Verification** for the complete checklist.

Scene files add scene-specific checks on top of that universal checklist.

## Execution Priority

When rules conflict, follow this precedence (highest first):

1. **User-provided template or explicit instructions** — always override defaults
2. **Scene-specific rules** — override common rules and design-system defaults
3. **Common rules** (this file) — override design-system aesthetic defaults
4. **Design-system defaults** — baseline aesthetics

## Cover Recipes

See `references/design-system.md` for the 7 validated cover recipes (R1–R7) and 14 color palettes.

Cover recipe selection: `selectCoverRecipe(docType, industry, titleLength)` — defined in `references/design-system.md` (authoritative source).

---

## Cover Title Layout Rules (Mandatory)

These rules apply to ALL cover recipes (R1–R7). They prevent the most common cover quality issues: title overflow, content spilling to page 2, and mid-word line breaks.

### Rule 1: Always use `calcTitleLayout()`

Every cover MUST call `calcTitleLayout(title, availableWidth)` from `design-system.md` to determine:
- **Font size** (dynamically calculated, never hardcoded above 40pt)
- **Line breaks** (semantically split, never mid-word)

**Forbidden:** Passing the full title as a single long TextRun and letting Word auto-wrap. This causes uncontrolled line breaks at arbitrary character positions.

### Rule 2: No single-character orphan lines

If the last line of a title contains only 1–2 characters, merge it into the previous line. The `splitTitleLines()` function handles this automatically.

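The merge step can be sketched as a post-pass over lines that have already been split. This illustrates the rule only and is not the `splitTitleLines()` implementation in `design-system.md`; `mergeOrphanLine` is a hypothetical name.

```javascript
// Merge a 1–2 character last line into the line before it.
function mergeOrphanLine(lines, maxOrphanChars = 2) {
  if (lines.length < 2) return lines.slice();
  const last = lines[lines.length - 1];
  if ([...last.trim()].length > maxOrphanChars) return lines.slice();
  const merged = lines.slice(0, -2);
  merged.push(lines[lines.length - 2] + last);
  return merged;
}

console.log(mergeOrphanLine(["企业数字化转型", "战略规划报告", "书"]));
// [ '企业数字化转型', '战略规划报告书' ]
```

The merged line can exceed the per-line character budget by up to two characters. The budget estimate therefore benefits from a little slack.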
### Rule 3: No mid-word breaks for CJK text

Line breaks must occur at semantic boundaries: after particles and connectors (e.g., 的 de, 与 yu, 和 he, 及 ji, 之 zhi), punctuation, spaces, or underscores. Never split a compound term (e.g., a 4-character term meaning "management specification" must not be split into 3 + 1 characters).

For mixed Chinese + English titles (e.g., "基于Transformer架构的..."), use `estimateTextWidth()` instead of character count to calculate line breaks. At the same font size, Chinese characters are ~2× as wide as English characters.

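Here is a rough stand-in for width-based measurement. It assumes full-width CJK characters take 1 em (pt × 20 twips) and all other characters about half that. This is a hypothetical helper for illustration only; the skill's own `estimateTextWidth()` in `design-system.md` remains the authoritative version.

```javascript
// Full-width ranges: CJK punctuation, kana, CJK ideographs (incl. Ext. A), full-width forms.
const FULL_WIDTH = /[\u3000-\u30ff\u3400-\u4dbf\u4e00-\u9fff\uff01-\uff60]/;

// Approximate rendered width of `text` in twips at `fontPt`.
function estimateWidthTw(text, fontPt) {
  let ems = 0;
  for (const ch of text) ems += FULL_WIDTH.test(ch) ? 1 : 0.5;
  return ems * fontPt * 20;
}

// "基于Transformer架构" = 4 CJK chars (4 em) + 11 Latin chars (5.5 em) = 9.5 em
console.log(estimateWidthTw("基于Transformer架构", 20)); // 3800
```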
### Rule 4: Maximum 3 title lines on cover

Cover titles must not exceed 3 lines. If the title is too long, reduce the font size (down to a minimum of 24pt) before adding more lines. If it still exceeds 3 lines at 24pt, force it into 3 lines with longer line lengths.

### Rule 5: Always use `calcCoverSpacing()` for whitespace

Spacing values (`spacing.before`) in cover elements must be calculated dynamically, not hardcoded. Fixed values like `before: 4500` assume a specific title length and cause overflow with longer titles.

### Rule 6: Cover height budget validation

Before generating, verify that the total content height stays within 15638 twips: the 16838-twip page height minus a 1200-twip safety margin, because MS Office renders large fonts taller than calculated. Each recipe in `design-system.md` includes height budget annotations; verify against them during generation.

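Below is a minimal sketch of the budget check. It approximates each cover element's height as its `spacing.before` + line height × line count + `spacing.after`, all in twips. The element list and `checkCoverBudget` are hypothetical; real recipes should use the annotations in `design-system.md`.

```javascript
const PAGE_H = 16838;           // A4 height in twips
const SAFETY = 1200;            // MS Office renders large fonts taller than calculated
const BUDGET = PAGE_H - SAFETY; // 15638

// elements: [{ name, before, lineH, lines, after }], all values in twips
function checkCoverBudget(elements) {
  const used = elements.reduce(
    (sum, e) => sum + (e.before || 0) + (e.lineH || 0) * (e.lines || 1) + (e.after || 0), 0);
  return { used, budget: BUDGET, ok: used <= BUDGET, overflow: Math.max(used - BUDGET, 0) };
}

const result = checkCoverBudget([
  { name: "top spacer", before: 4500 },
  { name: "title", lineH: 828, lines: 3, after: 300 },
  { name: "subtitle", lineH: 400, after: 1200 },
  { name: "meta", lineH: 360, lines: 4, after: 100 },
]);
console.log(result); // { used: 10424, budget: 15638, ok: true, overflow: 0 }
```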
### Rule 7: R5 meta info table (academic covers)

Academic cover meta info must use a 2-column table with **percentage widths only** (NOT DXA; WPS breaks with DXA widths):
- **Table width:** adaptive 55–75% of the page, calculated by `calcR5MetaLayout()` in `design-system.md`. The table is centered via `alignment: AlignmentType.CENTER`.
- **Label column:** adaptive 25–45% of the table width, **LEFT aligned**, plain-text label + ":". NO full-width space padding, NO right alignment, NO distributed alignment.
- **Value column:** the remaining percentage, **LEFT aligned**, with a `bottom border single sz=4` acting as a fixed-length underline (the same length for every row regardless of value text length).
- **Label column borders:** none (NO bottom border on label cells).
- ⚠️ Do NOT use DXA widths, full-width space padding (`\u3000`), spacer columns, or tab stops; these render inconsistently between MS Office and WPS.

### Rule 8: Large font paragraphs must set explicit line spacing

When a paragraph uses a font size larger than the document body text (e.g., cover titles at 36pt+), it **MUST** set an explicit `spacing.line` to prevent clipping. Without it, the paragraph inherits the document/style default line spacing (often 560 twips for body text), which is smaller than the font height, so the top of the characters gets clipped.

**Formula:** `spacing.line = Math.ceil(fontPt * 23)` with `lineRule: "atLeast"`

**Example:** A 36pt title needs `spacing: { line: 828, lineRule: "atLeast" }`. Without this, the inherited `line=560` clips the top 160 twips of the text (a 36pt em box is 720 twips tall).

This applies to ALL large-font paragraphs (cover titles, chapter headings, decorative text), not just covers.

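The formula can be wrapped as a helper that returns a docx-js-style `spacing` object. The factor 23 equals 1.15 × the 20 twips per point of the em box, leaving headroom above the glyphs. This is a minimal sketch; `largeFontSpacing` and `clippedTwips` are hypothetical names.

```javascript
// spacing for a paragraph whose largest run is `fontPt` points.
function largeFontSpacing(fontPt) {
  return { line: Math.ceil(fontPt * 23), lineRule: "atLeast" };
}

// How much of the em box a fixed inherited line height would clip.
function clippedTwips(fontPt, inheritedLineTw) {
  return Math.max(fontPt * 20 - inheritedLineTw, 0);
}

console.log(largeFontSpacing(36));  // { line: 828, lineRule: 'atLeast' }
console.log(clippedTwips(36, 560)); // 160

// With docx-js:
// new Paragraph({ spacing: largeFontSpacing(36),
//   children: [new TextRun({ text: title, size: 72 })] })
```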
### Rule 9: Every TextRun on a colored background MUST set explicit `color`

⚠️ **CRITICAL:** When a TextRun is inside a cell/area with a dark or colored background (shading), it **MUST** explicitly set the `color` property. Omitting `color` defaults to black (`#000000`), which is invisible on dark backgrounds.

**Common mistake:** Subtitle or meta text on R1/R2/R4 dark cover blocks without `color` → appears as invisible black text on dark bg.

**Rule:** For any TextRun inside a shaded cell:
- Use `P.cover.titleColor` for title text
- Use `P.cover.subtitleColor` for subtitle text
- Use `P.cover.metaColor` for meta info text
- Use `P.cover.footerColor` for footer text
- **NEVER** rely on default color when background is not white

### Rule 10: Page number API nesting and 3-section numbering

⚠️ **CRITICAL:** Page number settings MUST be nested inside `page.pageNumbers`:

```js
// ❌ WRONG — docx-js ignores top-level pageNumberStart/pageNumberFormatType
properties: { pageNumberStart: 1, pageNumberFormatType: NumberFormat.DECIMAL }

// ✅ CORRECT
properties: { page: { pageNumbers: { start: 1, formatType: NumberFormat.DECIMAL } } }
```

**Standard page numbering (5-zone convention):**

All multi-section documents MUST follow this five-zone page numbering scheme unless the user explicitly requests otherwise.

| Zone | Section | pageNumbers | Footer instrText | Notes |
|------|---------|-------------|-----------------|-------|
| 1. Cover | Title page | None (no footer) | — | Always logical page 1, but the number is **hidden** |
| 2. Front matter | Abstract, TOC, Preface | `{ start: 1, formatType: UPPER_ROMAN }` | `PAGE \* ROMAN \* MERGEFORMAT` | Separate uppercase Roman numeral sequence (I, II, III…) |
| 3. Body | Main content | `{ start: 1, formatType: DECIMAL }` | `PAGE \* arabic \* MERGEFORMAT` | **Resets to 1** |
| 4. Appendix | Appendices (A, B, C…) | Continues body (no reset) | Same as body | No section break needed unless different headers are required |
| 5. References | Bibliography | Continues body (no reset) | Same as body | If the body ends on p. 42, references continue from p. 43 |

**Key rules:**

0. **NEVER use the "Page X of Y" denominator format.** The footer must show only the current page number (e.g., `1`, `2`, `III`). Do NOT display the total page count: no `Page 3 of 12`, no `3 / 12`, no `第3页/共12页`. Just the bare number. `PageNumber.TOTAL_PAGES` / `NUMPAGES` is **FORBIDDEN** in footers.
1. **Cover is always page 1 internally**, but its page number is never displayed. Suppress the footer in the cover section.
2. **Front matter uses independent Roman numerals** starting at `I`. This sequence is separate from the body.
3. **Body resets to Arabic 1.** The first page of main content is always page `1`.
4. **Appendix and references continue the body sequence.** No reset between body → appendix → references.
5. **Documents without front matter** skip zone 2 (cover hidden, body starts at Arabic 1).
6. **Documents without a cover** start the body (or front matter) at page 1 directly.
7. **Short documents (≤3 pages):** simple Arabic 1, 2, 3 throughout, no cover/front-matter distinction.
8. **Single-page documents** (certificates, letters): no page numbering at all.

**3-section docx-js implementation (for documents with TOC):**

At minimum, implement zones 1–3 as separate docx sections:

```js
const { Footer, Paragraph, TextRun, AlignmentType, NumberFormat, PageNumber } = require("docx");

// Footer showing only the current page number (never TOTAL_PAGES).
// docx-js writes a bare PAGE field; post-processing adds the \* ROMAN / \* arabic switch.
const pageFooter = () => new Footer({
  children: [new Paragraph({
    alignment: AlignmentType.CENTER,
    children: [new TextRun({ children: [PageNumber.CURRENT] })],
  })],
});

// coverChildren / frontMatterChildren / bodyChildren are your own content arrays.
const sections = [
  // Section 1: Cover — no pageNumbers, no footer
  { properties: { page: { /* no pageNumbers */ } }, children: coverChildren },

  // Section 2: Front matter — Roman numerals (footer: PAGE \* ROMAN \* MERGEFORMAT)
  { properties: { page: { pageNumbers: { start: 1, formatType: NumberFormat.UPPER_ROMAN } } },
    footers: { default: pageFooter() }, children: frontMatterChildren },

  // Section 3: Body — Arabic, reset to 1 (footer: PAGE \* arabic \* MERGEFORMAT)
  // Appendix and references stay in this section so numbering continues;
  // only create a new section if different header/footer content is needed.
  { properties: { page: { pageNumbers: { start: 1, formatType: NumberFormat.DECIMAL } } },
    footers: { default: pageFooter() }, children: bodyChildren },
];
```

**Post-processing required** (WPS compatibility):
1. Remove empty `<w:pgNumType/>` from cover section XML
2. Patch footer instrText: replace bare `PAGE` with format-specific `PAGE \* ROMAN` or `PAGE \* arabic`

See `toc.md` § Page Number API for full details.

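Both post-processing steps are plain string edits on XML parts inside the .docx ZIP: `word/document.xml` for `<w:pgNumType/>` and the `word/footer*.xml` parts for the field code. Below is a minimal sketch of the string transforms only. Reading and rewriting the ZIP is left out, the function names are hypothetical, and `toc.md` remains the reference.

```javascript
// Step 1: drop empty <w:pgNumType/> elements; ones with attributes are kept.
function removeEmptyPgNumType(documentXml) {
  return documentXml.replace(/<w:pgNumType\s*\/>/g, "");
}

// Step 2: add a number-format switch to bare PAGE fields in a footer part.
// format: "ROMAN" (uppercase Roman) or "arabic". Already-patched fields are left alone.
function patchPageField(footerXml, format) {
  return footerXml.replace(
    /(<w:instrText[^>]*>)\s*PAGE\s*(<\/w:instrText>)/g,
    `$1PAGE \\* ${format} \\* MERGEFORMAT$2`,
  );
}

const footer = '<w:r><w:instrText xml:space="preserve">PAGE</w:instrText></w:r>';
console.log(patchPageField(footer, "ROMAN"));
// <w:r><w:instrText xml:space="preserve">PAGE \* ROMAN \* MERGEFORMAT</w:instrText></w:r>
```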
538
skills/docx/references/decorations.md
Executable file
## Geometric Decoration System — Pure docx-js Decorations

### Design Philosophy

Uses only docx-js native capabilities for visual decoration, with no external tools (such as Playwright screenshots). Suitable for covers, chapter separators, and page background enhancement.

**When to fall back to Playwright?**
Only when gradients, complex illustrations, or brand visuals are needed that pure OOXML cannot express. Default: prefer the native solutions below.

### Decoration Element Library

#### 1. Color Strip — Table Simulation

A single-row, single-column borderless table with a background color creates a horizontal color strip.

```js
|
||||
function colorStrip(color, height = 80) {
|
||||
return new Table({
|
||||
width: { size: 100, type: WidthType.PERCENTAGE },
|
||||
borders: { top: NB, bottom: NB, left: NB, right: NB,
|
||||
insideHorizontal: NB, insideVertical: NB },
|
||||
rows: [new TableRow({
|
||||
height: { value: height, rule: "exact" },
|
||||
children: [new TableCell({
|
||||
shading: { type: ShadingType.CLEAR, fill: color.replace("#", "") },
|
||||
borders: { top: NB, bottom: NB, left: NB, right: NB },
|
||||
children: [new Paragraph({ children: [] })],
|
||||
})],
|
||||
})],
|
||||
});
|
||||
}
|
||||
|
||||
// ══════════════════════════════════════════════════════════════
|
||||
// R6 — Editorial Warm (minimal, warm white bg, no decorations)
|
||||
// ══════════════════════════════════════════════════════════════
|
||||
// Suitable for: lesson plans (non-STEM), cultural/creative, newsletters,
|
||||
// event planning, internal reports, light-weight documents
|
||||
// NOT for: formal business, consulting, finance, government, academic
|
||||
// Title constraint: single line only (≤20 chars). Longer titles → route to R1.
|
||||
//
|
||||
// Structure: 2-row wrapper table (no border, warm bg shading)
|
||||
// Row 1 (content): category → title → subtitle → fields
|
||||
// Row 2 (footer): left English title + right label
|
||||
// All spacing via paragraph indent (WPS safe, no cell margins).
|
||||
|
||||
function buildCoverR6(config) {
|
||||
const P = config.palette;
|
||||
const PAD_L = 1300, PAD_R = 1100;
|
||||
const ind = { left: PAD_L, right: PAD_R };
|
||||
const FOOTER_H = 900;
|
||||
const CONTENT_H = 16838 - FOOTER_H;
|
||||
const shading = { fill: P.bg || "F7F7F5", type: ShadingType.CLEAR };
|
||||
|
||||
// ⚠️ R6 uses a simplified title layout: prefer single line, shrink font to fit
|
||||
const availW = 11906 - PAD_L - PAD_R;
|
||||
const { titlePt, titleLines } = calcTitleLayoutR6(config.title, availW, 36, 22);
|
||||
const titleSize = titlePt * 2;
|
||||
const lineH = Math.ceil(titlePt * 23 * 1.3);
|
||||
|
||||
// Dynamic top spacing
|
||||
const titleH = titleLines.length * (titleSize * 10 + 200);
|
||||
const categoryH = 22 * 10 + 900;
|
||||
const subtitleH = config.subtitle ? (28 * 10 + 1200) : 0;
|
||||
const fieldsH = (config.metaLines || []).length * (24 * 10 + 100);
|
||||
const contentH = categoryH + titleH + subtitleH + fieldsH;
|
||||
const remaining = Math.max(CONTENT_H - 1200 - contentH, 400);
|
||||
const topSpacing = Math.floor(remaining * 0.55);
|
||||
|
||||
const children = [];
|
||||
|
||||
// 1. Top spacer (dynamic)
|
||||
children.push(new Paragraph({ indent: ind, spacing: { before: topSpacing } }));
|
||||
|
||||
// 2. Category label (small, wide letter-spacing)
|
||||
if (config.englishLabel) {
|
||||
children.push(new Paragraph({
|
||||
indent: ind, spacing: { after: 900 },
|
||||
children: [new TextRun({
|
||||
text: config.englishLabel, size: 22,
|
||||
color: P.cover.metaColor || "9A9A9A",
|
||||
font: { ascii: "Calibri", eastAsia: "Microsoft YaHei" },
|
||||
characterSpacing: 60,
|
||||
})],
|
||||
}));
|
||||
}
|
||||
|
||||
// 3. Title (single line preferred, dynamic font size)
|
||||
for (let i = 0; i < titleLines.length; i++) {
|
||||
children.push(new Paragraph({
|
||||
indent: ind,
|
||||
spacing: { after: i < titleLines.length - 1 ? 60 : 300, line: lineH, lineRule: "atLeast" },
|
||||
children: [new TextRun({
|
||||
text: titleLines[i], size: titleSize,
|
||||
color: P.cover.titleColor || "2C2C2C",
|
||||
font: { ascii: "Calibri", eastAsia: "Microsoft YaHei" },
|
||||
characterSpacing: 30,
|
||||
})],
|
||||
}));
|
||||
}
|
||||
|
||||
// 4. Subtitle
|
||||
if (config.subtitle) {
|
||||
children.push(new Paragraph({
|
||||
indent: ind, spacing: { after: 1200 },
|
||||
children: [new TextRun({
|
||||
text: config.subtitle, size: 28,
|
||||
color: P.cover.subtitleColor || "6B6B6B",
|
||||
font: { ascii: "Calibri", eastAsia: "Microsoft YaHei" },
|
||||
characterSpacing: 15,
|
||||
})],
|
||||
}));
|
||||
}
|
||||
|
||||
// 5. Meta fields (tab-aligned label + value)
|
||||
for (const line of (config.metaLines || [])) {
|
||||
// Expect "label:value" format or plain text
|
||||
const sep = line.indexOf(":") !== -1 ? ":" : (line.indexOf(":") !== -1 ? ":" : null);
|
||||
const label = sep ? line.split(sep)[0].trim() : line;
|
||||
const value = sep ? line.split(sep).slice(1).join(sep).trim() : "";
|
||||
children.push(new Paragraph({
|
||||
indent: ind, spacing: { after: 100 },
|
||||
tabStops: [{ type: TabStopType.LEFT, position: PAD_L + 1600 }],
|
||||
children: [
|
||||
new TextRun({ text: label, size: 22, color: P.cover.metaColor || "9A9A9A",
|
||||
font: { ascii: "Calibri", eastAsia: "Microsoft YaHei" }, characterSpacing: 20 }),
|
||||
...(value ? [
|
||||
new TextRun({ text: "\t" }),
|
||||
new TextRun({ text: value, size: 24, color: P.cover.subtitleColor || "6B6B6B",
|
||||
font: { ascii: "Calibri", eastAsia: "Microsoft YaHei" }, characterSpacing: 8 }),
|
||||
] : []),
|
||||
],
|
||||
}));
|
||||
}
|
||||
|
||||
// 6. Footer (2-column borderless table)
|
||||
const footerLeft = config.footerLeft || "";
|
||||
const footerRight = config.footerRight || "";
|
||||
// Adaptive font size for long English footer text
|
||||
const flSize = footerLeft.length > 60 ? 14 : (footerLeft.length > 40 ? 16 : 18);
|
||||
const flSpacing = footerLeft.length > 60 ? 5 : (footerLeft.length > 40 ? 10 : 20);
|
||||
|
||||
const footerTable = new Table({
|
||||
width: { size: 100, type: WidthType.PERCENTAGE },
|
||||
layout: TableLayoutType.FIXED, borders: allNoBorders,
|
||||
rows: [new TableRow({
|
||||
children: [
|
||||
new TableCell({
|
||||
width: { size: 70, type: WidthType.PERCENTAGE }, borders: noBorders, shading,
|
||||
children: [new Paragraph({
|
||||
indent: { left: PAD_L },
|
||||
children: [new TextRun({ text: footerLeft, size: flSize,
|
||||
color: P.cover.footerColor || "9A9A9A",
|
||||
font: { ascii: "Calibri" }, characterSpacing: flSpacing })],
|
||||
})],
|
||||
}),
|
||||
new TableCell({
|
||||
width: { size: 30, type: WidthType.PERCENTAGE }, borders: noBorders, shading,
|
||||
children: [new Paragraph({
|
||||
alignment: AlignmentType.RIGHT, indent: { right: PAD_R },
|
||||
children: [new TextRun({ text: footerRight, size: 18,
|
||||
color: P.cover.footerColor || "9A9A9A",
|
||||
font: { ascii: "Calibri" }, characterSpacing: 20 })],
|
||||
})],
|
||||
}),
|
||||
],
|
||||
})],
|
||||
});
|
||||
|
||||
// 7. 2-row wrapper (content + footer)
|
||||
return [new Table({
|
||||
width: { size: 100, type: WidthType.PERCENTAGE },
|
||||
layout: TableLayoutType.FIXED, borders: allNoBorders,
|
||||
rows: [
|
||||
new TableRow({
|
||||
height: { value: CONTENT_H, rule: "exact" },
|
||||
children: [new TableCell({
|
||||
shading, borders: noBorders,
|
||||
margins: { top: 0, bottom: 0, left: 0, right: 0 },
|
||||
verticalAlign: VerticalAlign.TOP,
|
||||
children,
|
||||
})],
|
||||
}),
|
||||
new TableRow({
|
||||
height: { value: FOOTER_H, rule: "exact" },
|
||||
children: [new TableCell({
|
||||
shading, borders: noBorders,
|
||||
margins: { top: 0, bottom: 0, left: 0, right: 0 },
|
||||
verticalAlign: VerticalAlign.CENTER,
|
||||
children: [footerTable],
|
||||
})],
|
||||
}),
|
||||
],
|
||||
})];
|
||||
}
|
||||
|
||||
// R6 title layout: prefer FEWER lines over larger font size (single line best)
|
||||
function calcTitleLayoutR6(title, availableWidthTw, preferredPt, minPt) {
|
||||
const step = 2;
|
||||
// Try to fit in 1 line (shrink font if needed)
|
||||
for (let pt = preferredPt; pt >= minPt; pt -= step) {
|
||||
const charWidthTw = pt * 23 * 0.5; // CJK ~50% em width
|
||||
const charsPerLine = Math.floor(availableWidthTw / charWidthTw);
|
||||
if (title.length <= charsPerLine) return { titlePt: pt, titleLines: [title] };
|
||||
}
|
||||
// Can't fit in 1 line, try 2 lines at largest possible font
|
||||
for (let pt = preferredPt; pt >= minPt; pt -= step) {
|
||||
const charWidthTw = pt * 23 * 0.5;
|
||||
const charsPerLine = Math.floor(availableWidthTw / charWidthTw);
|
||||
const lines = splitTitleLines(title, charsPerLine);
|
||||
if (lines.length <= 2) return { titlePt: pt, titleLines: lines };
|
||||
}
|
||||
// Fallback: minPt, up to 3 lines
|
||||
const charWidthTw = minPt * 23 * 0.5;
|
||||
const charsPerLine = Math.floor(availableWidthTw / charWidthTw);
|
||||
return { titlePt: minPt, titleLines: splitTitleLines(title, charsPerLine) };
|
||||
}
|
||||
|
||||
// Usage: cover top decoration
|
||||
// children: [colorStrip(P.accent, 120), ...]
|
||||
```
|
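The fitting order in `calcTitleLayoutR6` can be exercised without docx-js. It tries one line at the largest size that fits, then two lines, then falls back to the minimum size. This is a minimal sketch that assumes the same `pt * 23 * 0.5` twips-per-character estimate. `fitTitle`, `charsPerLineAt`, and `splitTitleLinesNaive` are hypothetical names. The naive splitter cuts at fixed character counts and ignores any word or punctuation rules the real `splitTitleLines` may apply.

```javascript
// HYPOTHETICAL stand-in for splitTitleLines(): fixed-size chunks only
function splitTitleLinesNaive(title, charsPerLine) {
  const lines = [];
  for (let i = 0; i < title.length; i += charsPerLine) {
    lines.push(title.slice(i, i + charsPerLine));
  }
  return lines;
}

// Same width estimate as calcTitleLayoutR6: pt * 23 * 0.5 twips per character.
// Clamped to at least 1 so the splitter can never loop forever.
function charsPerLineAt(pt, availableWidthTw) {
  return Math.max(1, Math.floor(availableWidthTw / (pt * 23 * 0.5)));
}

function fitTitle(title, availableWidthTw, preferredPt, minPt, step = 2) {
  // Pass 1: largest font size that keeps the title on one line
  for (let pt = preferredPt; pt >= minPt; pt -= step) {
    if (title.length <= charsPerLineAt(pt, availableWidthTw)) {
      return { titlePt: pt, titleLines: [title] };
    }
  }
  // Pass 2: largest font size that fits in at most two lines
  for (let pt = preferredPt; pt >= minPt; pt -= step) {
    const lines = splitTitleLinesNaive(title, charsPerLineAt(pt, availableWidthTw));
    if (lines.length <= 2) return { titlePt: pt, titleLines: lines };
  }
  // Fallback: minimum size, however many lines that takes
  return {
    titlePt: minPt,
    titleLines: splitTitleLinesNaive(title, charsPerLineAt(minPt, availableWidthTw)),
  };
}
```

Consider R7's available width of 11906 - 600 - 600 = 10706 twips. Under this estimate, a 30-character title drops from 36pt to 30pt so that it stays on one line.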
||||
|
||||
#### 2. Side Ribbon
|
||||
|
||||
Uses a thick left paragraph border to create a vertical ribbon effect.
|
||||
|
||||
```js
|
||||
function sideRibbon(content, color, width = 14) {
|
||||
return new Paragraph({
|
||||
border: {
|
||||
left: { style: BorderStyle.SINGLE, size: width, color: color.replace("#", ""), space: 12 },
|
||||
},
|
||||
indent: { left: 240 },
|
||||
spacing: { before: 100, after: 100 },
|
||||
children: content,
|
||||
});
|
||||
}
|
||||
|
||||
// Usage: emphasis quotes, chapter tips
|
||||
// sideRibbon([new TextRun({ text: "Key Insight", bold: true })], P.accent)
|
||||
```
|
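The `width` argument of `sideRibbon` is passed directly to the border `size`. OOXML measures this value in eighths of a point, so the default `14` is a 1.75pt ribbon. The sketch below picks a ribbon thickness in points instead. `ribbonSizeFromPt` is a hypothetical helper. The 2 to 96 clamp reflects the range Word accepts for line borders.

```javascript
// HYPOTHETICAL helper: border thickness in points -> docx border `size`
// (eighths of a point), clamped to the 2..96 range used for line borders.
function ribbonSizeFromPt(pt) {
  const eighths = Math.round(pt * 8);
  return Math.min(96, Math.max(2, eighths));
}
```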
||||
|
||||
#### 3. Border Compositions
|
||||
|
||||
```js
|
||||
// Top thick line + bottom thin line — title area frame
|
||||
function frameTitle(titleRuns) {
|
||||
return new Paragraph({
|
||||
border: {
|
||||
top: { style: BorderStyle.SINGLE, size: 18, color: c(P.accent) },
|
||||
bottom: { style: BorderStyle.SINGLE, size: 4, color: c(P.accent) },
|
||||
},
|
||||
spacing: { before: 400, after: 200 },
|
||||
alignment: AlignmentType.CENTER,
|
||||
children: titleRuns,
|
||||
});
|
||||
}
|
||||
|
||||
// L-shape border — left + bottom
|
||||
function lShapeBorder(content) {
|
||||
return new Paragraph({
|
||||
border: {
|
||||
left: { style: BorderStyle.SINGLE, size: 12, color: c(P.accent), space: 10 },
|
||||
bottom: { style: BorderStyle.SINGLE, size: 12, color: c(P.accent) },
|
||||
},
|
||||
indent: { left: 300 },
|
||||
spacing: { before: 200, after: 300 },
|
||||
children: content,
|
||||
});
|
||||
}
|
||||
|
||||
// Double-line frame — top and bottom double lines
|
||||
function doubleLine(content) {
|
||||
return new Paragraph({
|
||||
border: {
|
||||
top: { style: BorderStyle.DOUBLE, size: 6, color: c(P.accent) },
|
||||
bottom: { style: BorderStyle.DOUBLE, size: 6, color: c(P.accent) },
|
||||
},
|
||||
spacing: { before: 200, after: 200 },
|
||||
alignment: AlignmentType.CENTER,
|
||||
children: content,
|
||||
});
|
||||
}
|
||||
```
|
||||
|
||||
#### 4. Gradient Simulation
|
||||
|
||||
Stacks several narrow, exact-height color strips (table rows) to simulate a gradient.
|
||||
|
||||
```js
|
||||
function gradientStrip(startColor, endColor, steps = 5, totalHeight = 200) {
|
||||
const rows = [];
|
||||
const h = Math.floor(totalHeight / steps);
|
||||
for (let i = 0; i < steps; i++) {
|
||||
const ratio = steps > 1 ? i / (steps - 1) : 0; // avoid 0/0 (NaN) when steps === 1
|
||||
const blended = blendColors(startColor, endColor, ratio);
|
||||
rows.push(new TableRow({
|
||||
height: { value: h, rule: "exact" },
|
||||
children: [new TableCell({
|
||||
shading: { type: ShadingType.CLEAR, fill: blended },
|
||||
borders: { top: NB, bottom: NB, left: NB, right: NB },
|
||||
children: [new Paragraph({ children: [] })],
|
||||
})],
|
||||
}));
|
||||
}
|
||||
return new Table({
|
||||
width: { size: 100, type: WidthType.PERCENTAGE },
|
||||
borders: { top: NB, bottom: NB, left: NB, right: NB,
|
||||
insideHorizontal: NB, insideVertical: NB },
|
||||
rows,
|
||||
});
|
||||
}
|
||||
|
||||
function blendColors(hex1, hex2, ratio) {
|
||||
const r1 = parseInt(hex1.slice(1, 3), 16), g1 = parseInt(hex1.slice(3, 5), 16), b1 = parseInt(hex1.slice(5, 7), 16);
|
||||
const r2 = parseInt(hex2.slice(1, 3), 16), g2 = parseInt(hex2.slice(3, 5), 16), b2 = parseInt(hex2.slice(5, 7), 16);
|
||||
const r = Math.round(r1 + (r2 - r1) * ratio), g = Math.round(g1 + (g2 - g1) * ratio), b = Math.round(b1 + (b2 - b1) * ratio);
|
||||
return `${r.toString(16).padStart(2,"0")}${g.toString(16).padStart(2,"0")}${b.toString(16).padStart(2,"0")}`;
|
||||
}
|
||||
```
|
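`blendColors` assumes both inputs start with `#`, because it slices from index 1. Palette colors elsewhere in this file are sometimes stripped of `#` before use. The sketch below is a more tolerant variant of the same linear RGB interpolation, and it also handles a single strip. `blendHex` and `gradientStops` are hypothetical names. They return uppercase hex, whereas the original returns lowercase.

```javascript
// HYPOTHETICAL variant of blendColors: accepts "RRGGBB" or "#RRGGBB"
function blendHex(hex1, hex2, ratio) {
  const parse = (hex) => {
    const h = hex.replace(/^#/, "");
    return [0, 2, 4].map((i) => parseInt(h.slice(i, i + 2), 16));
  };
  const a = parse(hex1);
  const b = parse(hex2);
  // Linear interpolation per channel, rounded and re-encoded as 2 hex digits
  return a
    .map((v, i) => Math.round(v + (b[i] - v) * ratio).toString(16).padStart(2, "0"))
    .join("")
    .toUpperCase();
}

// One fill color per strip: the first is startColor, the last is endColor
function gradientStops(startColor, endColor, steps) {
  return Array.from({ length: steps }, (_, i) =>
    blendHex(startColor, endColor, steps > 1 ? i / (steps - 1) : 0)
  );
}
```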
||||
|
||||
#### 5. Symbol Ornaments
|
||||
|
||||
```js
|
||||
// Section divider line — for chapter separation
|
||||
function ornamentDivider(symbol = "◆", count = 3) {
|
||||
const ornament = Array(count).fill(symbol).join(" ");
|
||||
return new Paragraph({
|
||||
alignment: AlignmentType.CENTER,
|
||||
spacing: { before: 400, after: 400 },
|
||||
children: [new TextRun({ text: ornament, size: 20, color: c(P.accent) })],
|
||||
});
|
||||
}
|
||||
|
||||
// Common decoration symbols
|
||||
// ◆ ◇ ● ○ ★ ☆ ■ □ ▲ △ ─ ━ ═ ║ ╔ ╗ ╚ ╝
|
||||
// Ornamental: ❧ ❦ ✦ ✧ ✿ ❀ ❁ ※
|
||||
```
|
||||
|
||||
#### 6. Info Card — Table Implementation
|
||||
|
||||
```js
|
||||
function infoCard(title, items, accentColor) {
|
||||
const ac = accentColor.replace("#", "");
|
||||
const headerRow = new TableRow({
|
||||
children: [new TableCell({
|
||||
columnSpan: 2,
|
||||
shading: { type: ShadingType.CLEAR, fill: ac },
|
||||
margins: { top: 80, bottom: 80, left: 160, right: 160 },
|
||||
borders: { top: NB, bottom: NB, left: NB, right: NB },
|
||||
children: [new Paragraph({
|
||||
children: [new TextRun({ text: title, bold: true, size: 24, color: "FFFFFF" })],
|
||||
})],
|
||||
})],
|
||||
});
|
||||
|
||||
const dataRows = items.map(([label, value]) => new TableRow({
|
||||
children: [
|
||||
new TableCell({
|
||||
width: { size: 30, type: WidthType.PERCENTAGE },
|
||||
margins: { top: 60, bottom: 60, left: 160, right: 80 },
|
||||
shading: { type: ShadingType.CLEAR, fill: "F8F9FA" },
|
||||
borders: { bottom: { style: BorderStyle.SINGLE, size: 1, color: "E0E0E0" },
|
||||
top: NB, left: NB, right: NB },
|
||||
children: [new Paragraph({ children: [new TextRun({ text: label, size: 21, color: "666666" })] })],
|
||||
}),
|
||||
new TableCell({
|
||||
margins: { top: 60, bottom: 60, left: 80, right: 160 },
|
||||
borders: { bottom: { style: BorderStyle.SINGLE, size: 1, color: "E0E0E0" },
|
||||
top: NB, left: NB, right: NB },
|
||||
children: [new Paragraph({ children: [new TextRun({ text: value, size: 21 })] })],
|
||||
}),
|
||||
],
|
||||
}));
|
||||
|
||||
return new Table({
|
||||
width: { size: 80, type: WidthType.PERCENTAGE },
|
||||
alignment: AlignmentType.CENTER,
|
||||
borders: { top: NB, bottom: NB, left: NB, right: NB,
|
||||
insideHorizontal: NB, insideVertical: NB },
|
||||
rows: [headerRow, ...dataRows],
|
||||
});
|
||||
}
|
||||
```
|
||||
|
||||
|
||||
```js
|
||||
// R7 — Swiss Tech Minimalist (slate grey bg, Klein blue accent, asymmetric layout)
|
||||
// Suitable for: cultural/creative research, trend reports, brand strategy, design deliverables
|
||||
// Palette: ST-1 (exclusive)
|
||||
// Layout: left-aligned title (upper 20%), right-shifted subtitle with top rule,
|
||||
// right-aligned info block with accent right border, Swiss cross anchor
|
||||
// Key features: ■ square accent dot, open-frame tables, large whitespace
|
||||
//
|
||||
// ⚠️ MANDATORY: All cover non-negotiables apply (margin=0, 16838 exact, allNoBorders)
|
||||
// ⚠️ Title uses calcTitleLayout() with maxPt=36 (not 40 — R7 uses lighter visual weight)
|
||||
|
||||
function buildCoverR7(config) {
|
||||
const P = palettes[config.palette || "ST-1"];
|
||||
const C = P.cover;
|
||||
const padL = 600;
|
||||
|
||||
// Title layout — R7 uses 36pt max (lighter than R1-R4's 40pt)
|
||||
const availW = 11906 - padL - 600;
|
||||
const { titlePt, titleLines } = calcTitleLayout(config.title, availW, 36, 24);
|
||||
const titleSize = titlePt * 2;
|
||||
const lineH = Math.ceil(titlePt * 23);
|
||||
|
||||
// Dynamic spacing based on title lines
|
||||
const topSpacer = titleLines.length <= 2 ? 1200 : 800;
|
||||
const subtitleSpacer = titleLines.length <= 2 ? 1400 : 800;
|
||||
const infoSpacer = titleLines.length <= 2 ? 2200 : 1200;
|
||||
|
||||
const children = [];
|
||||
|
||||
// 1. Swiss cross anchor — top-left decorative element
|
||||
children.push(new Paragraph({
|
||||
spacing: { before: 600 },
|
||||
indent: { left: padL },
|
||||
children: [new TextRun({
|
||||
text: "\uFF0B", // ＋ fullwidth plus sign
|
||||
size: 40, bold: true, color: C.titleColor,
|
||||
font: { ascii: "Arial", eastAsia: "SimHei" },
|
||||
})],
|
||||
}));
|
||||
|
||||
// 2. Top spacer
|
||||
children.push(new Paragraph({ spacing: { before: topSpacer } }));
|
||||
|
||||
// 3. Title lines — left-aligned, last line has accent ■
|
||||
titleLines.forEach((line, i) => {
|
||||
const isLast = i === titleLines.length - 1;
|
||||
const runs = [new TextRun({
|
||||
text: line, size: titleSize, color: C.titleColor,
|
||||
font: { ascii: "Arial", eastAsia: "Noto Sans SC" },
|
||||
})];
|
||||
if (isLast) {
|
||||
runs.push(new TextRun({
|
||||
text: " \u25A0", // ■ black square
|
||||
size: 24, color: P.accent,
|
||||
font: { ascii: "Arial" },
|
||||
}));
|
||||
}
|
||||
children.push(new Paragraph({
|
||||
indent: { left: padL },
|
||||
spacing: { after: isLast ? 200 : 80, line: lineH, lineRule: "atLeast" },
|
||||
children: runs,
|
||||
}));
|
||||
});
|
||||
|
||||
// 4. Subtitle spacer
|
||||
children.push(new Paragraph({ spacing: { before: subtitleSpacer } }));
|
||||
|
||||
// 5. Subtitle — right-shifted, top border rule, wide character spacing
|
||||
if (config.subtitle) {
|
||||
children.push(new Paragraph({
|
||||
indent: { left: 3800, right: 600 },
|
||||
border: { top: { style: BorderStyle.SINGLE, size: 2, color: C.titleColor, space: 14 } },
|
||||
spacing: { after: 200 },
|
||||
children: [new TextRun({
|
||||
text: config.subtitle, size: 26, color: C.subtitleColor,
|
||||
font: { ascii: "Arial", eastAsia: "Noto Sans SC" },
|
||||
characterSpacing: 40,
|
||||
})],
|
||||
}));
|
||||
}
|
||||
|
||||
// 6. Decorative horizontal line
|
||||
children.push(new Paragraph({
|
||||
spacing: { before: 600 },
|
||||
border: { bottom: { style: BorderStyle.SINGLE, size: 1, color: "C8D0DC", space: 0 } },
|
||||
}));
|
||||
|
||||
// 7. Info spacer
|
||||
children.push(new Paragraph({ spacing: { before: infoSpacer } }));
|
||||
|
||||
// 8. Info footer — right-aligned, 4 label+value pairs, accent right border
|
||||
// Standard fields: ORGANIZATION, RESPONSIBILITY, REPORT NUMBER, DATE & EDITION
|
||||
const metaEntries = config.metaEntries || [
|
||||
{ label: "ORGANIZATION", value: config.organization || "" },
|
||||
{ label: "RESPONSIBILITY", value: config.responsibility || "" },
|
||||
{ label: "REPORT NUMBER", value: config.reportNumber || "" },
|
||||
{ label: "DATE & EDITION", value: config.dateEdition || "" },
|
||||
];
|
||||
|
||||
for (const entry of metaEntries) {
|
||||
// Label — 7pt uppercase English
|
||||
children.push(new Paragraph({
|
||||
alignment: AlignmentType.RIGHT,
|
||||
indent: { right: 800 },
|
||||
border: { right: { style: BorderStyle.SINGLE, size: 12, color: P.accent, space: 16 } },
|
||||
spacing: { after: 20 },
|
||||
children: [new TextRun({
|
||||
text: entry.label, size: 14, color: C.metaColor,
|
||||
font: { ascii: "Arial" },
|
||||
characterSpacing: 20,
|
||||
})],
|
||||
}));
|
||||
// Value — 11pt bold
|
||||
children.push(new Paragraph({
|
||||
alignment: AlignmentType.RIGHT,
|
||||
indent: { right: 800 },
|
||||
border: { right: { style: BorderStyle.SINGLE, size: 12, color: P.accent, space: 16 } },
|
||||
spacing: { after: 280 },
|
||||
children: [new TextRun({
|
||||
text: entry.value, size: 22, bold: true, color: C.titleColor,
|
||||
font: { ascii: "Arial", eastAsia: "Noto Sans SC" },
|
||||
})],
|
||||
}));
|
||||
}
|
||||
|
||||
// Wrap in 16838 exact wrapper table
|
||||
return [new Table({
|
||||
width: { size: 100, type: WidthType.PERCENTAGE },
|
||||
layout: TableLayoutType.FIXED,
|
||||
borders: allNoBorders,
|
||||
rows: [new TableRow({
|
||||
height: { value: 16838, rule: "exact" },
|
||||
children: [new TableCell({
|
||||
shading: { type: ShadingType.CLEAR, fill: P.bg },
|
||||
borders: noBorders,
|
||||
verticalAlign: VerticalAlign.TOP,
|
||||
children,
|
||||
})],
|
||||
})],
|
||||
})];
|
||||
}
|
||||
```
|
||||
|
||||
### Decoration Usage Scenarios
|
||||
|
||||
| Scenario | Recommended Decoration | Combination |
|
||||
|------|----------|----------|
|
||||
| Report cover | Color strip + L-frame border | Top strip → Title area → L-frame author info |
|
||||
| Proposal cover | Gradient simulation + double-line frame | Gradient background → Double-line title |
|
||||
| Chapter separator | Symbol ornament + side ribbon | Symbol divider → New chapter title with ribbon |
|
||||
| Summary card | Info card | Standalone card displaying key metrics |
|
||||
| Academic cover | Color strip + info table | Top strip → School name → Title → Info table |
|
||||
|
||||
---
|
||||
|
||||
1797
skills/docx/references/design-system.md
Executable file
File diff suppressed because it is too large
257
skills/docx/references/docx-js-advanced.md
Executable file
@@ -0,0 +1,257 @@
|
||||
# docx-js Advanced Features
|
||||
|
||||
Advanced API for complex document scenarios. Load this when creating documents with TOC, cover pages, footnotes, multi-section layouts, or post-processing needs.
|
||||
|
||||
## Table of Contents (TOC)
|
||||
|
||||
**→ See `references/toc.md` for the complete TOC reference** (3-step process, code examples, page numbering, common bugs, checklist).
|
||||
|
||||
## Cover Page Design (Vertical Centering)
|
||||
|
||||
Use large `spacing.before` to push content down for visual centering:
|
||||
|
||||
```js
|
||||
// Approximate vertical center on A4:
|
||||
// Total printable height ≈ 14000 twips
|
||||
// For title at ~40% from top: before = 5600
|
||||
const coverSection = {
|
||||
properties: {
|
||||
page: { /* standard A4 */ },
|
||||
// No headers/footers on cover page
|
||||
},
|
||||
children: [
|
||||
new Paragraph({ spacing: { before: 5600 } }), // spacer
|
||||
new Paragraph({
|
||||
alignment: AlignmentType.CENTER,
|
||||
children: [new TextRun({
|
||||
text: title,
|
||||
font: { ascii: "Calibri", eastAsia: "SimHei" },
|
||||
size: 52, bold: true, color: palette.primary,
|
||||
})],
|
||||
}),
|
||||
// ... subtitle, author, date
|
||||
],
|
||||
};
|
||||
```
|
||||
|
||||
For multi-section documents, put the cover in its own section so it can have different headers/footers.
|
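The 5600 figure is 40% of roughly 14000 twips of printable A4 height. The sketch below derives the spacer from the page setup rather than hard-coding it. It assumes A4 (16838 twips tall) and the 1417-twip top and bottom margins used in `docx-js-core.md`. `coverSpacerTwips` is a hypothetical name. The result is approximate, because it ignores the height of the spacer paragraph's own line.

```javascript
// HYPOTHETICAL helper: spacing.before (twips) that places the title a given
// fraction of the way down the printable area.
function coverSpacerTwips({ pageHeight = 16838, marginTop = 1417, marginBottom = 1417, fraction = 0.4 } = {}) {
  const printable = pageHeight - marginTop - marginBottom; // A4 with 1417 margins: 14004
  return Math.round(printable * fraction);
}
```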
||||
|
||||
## Footnotes
|
||||
|
||||
```js
|
||||
const { Document, Paragraph, TextRun, FootnoteReferenceRun } = require("docx");
|
||||
|
||||
const doc = new Document({
|
||||
footnotes: {
|
||||
1: { children: [new Paragraph({ children: [new TextRun({ text: "Smith, J. (2024). Research Methods. Academic Press, pp. 45-67.", size: 18 })] })] },
|
||||
2: { children: [new Paragraph({ children: [new TextRun({ text: "Zhang, W. (2023). \u201c数据分析方法研究\u201d. 科学通报, 68(12), 1234-1250.", size: 18 })] })] },
|
||||
},
|
||||
sections: [{
|
||||
children: [
|
||||
new Paragraph({
|
||||
children: [
|
||||
new TextRun({ text: "According to recent studies" }),
|
||||
new FootnoteReferenceRun(1), // superscript footnote mark 1
|
||||
new TextRun({ text: ", data analysis methods have evolved" }),
|
||||
new FootnoteReferenceRun(2), // superscript footnote mark 2
|
||||
new TextRun({ text: "." }),
|
||||
],
|
||||
}),
|
||||
],
|
||||
}],
|
||||
});
|
||||
```
|
||||
|
||||
### Academic Reference Pattern
|
||||
|
||||
For sequential references [1][2][3]..., pre-define all footnotes in the `footnotes` object with numeric keys, then reference them inline with `FootnoteReferenceRun(n)`.
|
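A long reference list can be turned into the numeric-key `footnotes` object in one pass. This is a minimal sketch: `buildFootnotes` is a hypothetical helper, and the `wrap` callback is where docx-js objects would be created.

```javascript
// HYPOTHETICAL helper: ordered references -> { 1: ..., 2: ..., ... }
// `wrap` turns each reference into a footnote body; the default keeps strings.
function buildFootnotes(references, wrap = (text) => text) {
  const footnotes = {};
  references.forEach((text, index) => {
    footnotes[index + 1] = wrap(text); // keys start at 1, matching FootnoteReferenceRun(n)
  });
  return footnotes;
}
```

With docx-js, pass `(text) => ({ children: [new Paragraph({ children: [new TextRun({ text, size: 18 })] })] })` as `wrap`. The n-th reference is then cited inline with `new FootnoteReferenceRun(n)`.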
||||
|
||||
## keepNext — Element Binding
|
||||
|
||||
Prevent page breaks between related elements:
|
||||
|
||||
```js
|
||||
// Heading stays with next paragraph
|
||||
new Paragraph({
|
||||
heading: HeadingLevel.HEADING_2,
|
||||
keepNext: true, // don't break after this
|
||||
children: [new TextRun({ text: "Table 1: Results" })],
|
||||
})
|
||||
// Table immediately follows on same page
|
||||
|
||||
// Caption stays with image
|
||||
new Paragraph({
|
||||
keepNext: true,
|
||||
alignment: AlignmentType.CENTER,
|
||||
children: [new TextRun({ text: "Figure 1: Architecture Diagram", italics: true, size: 20 })],
|
||||
})
|
||||
// ImageRun paragraph follows
|
||||
```
|
||||
|
||||
Use `keepNext: true` for:
|
||||
- Heading → first paragraph of section
|
||||
- Table caption → table
|
||||
- Image → image caption
|
||||
- "Figure X" label → image
|
||||
|
||||
## Page Break Rules
|
||||
|
||||
Follow the document type strategy defined in SOUL.md Rule 1.
|
||||
|
||||
**Structural breaks (always):**
|
||||
- Cover page → TOC
|
||||
- TOC → main content
|
||||
- Main content → back cover
|
||||
|
||||
**Content breaks (by document type):**
|
||||
- Academic / teaching → `new Paragraph({ children: [new PageBreak()] })` before each H1 chapter
|
||||
- Business report → PageBreak before each H1; H2 flows naturally
|
||||
- Resume / contract / letter → No content page breaks
|
||||
- Short article → No content page breaks
|
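The content-break rules above can be encoded as a lookup, so a generator script applies them consistently. This is a sketch under assumptions: the document-type keys and `breakBeforeH1` are hypothetical labels. Structural breaks (cover, TOC, main content, back cover) are still inserted separately, and H2 headings never get a break.

```javascript
// HYPOTHETICAL encoding of the "Content breaks (by document type)" list
const BREAK_BEFORE_H1 = {
  academic: true,
  teaching: true,
  business_report: true,
  resume: false,
  contract: false,
  letter: false,
  short_article: false,
};

function breakBeforeH1(docType) {
  // Own-property check so inherited names like "toString" are rejected
  if (!Object.prototype.hasOwnProperty.call(BREAK_BEFORE_H1, docType)) {
    throw new Error(`Unknown document type: ${docType}`);
  }
  return BREAK_BEFORE_H1[docType];
}
```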
||||
|
||||
**Anti-tear (mandatory):**
|
||||
```js
|
||||
// Heading stays with next paragraph
|
||||
new Paragraph({
|
||||
heading: HeadingLevel.HEADING_1,
|
||||
keepNext: true,
|
||||
children: [new TextRun("Chapter Title")],
|
||||
})
|
||||
|
||||
// Table caption stays with table
|
||||
new Paragraph({
|
||||
keepNext: true,
|
||||
children: [new TextRun({ text: "Table 1: Summary", italics: true })],
|
||||
})
|
||||
|
||||
// Image caption stays with image
|
||||
new Paragraph({
|
||||
keepNext: true,
|
||||
children: [new TextRun({ text: "Figure 1: Architecture", italics: true })],
|
||||
})
|
||||
```
|
||||
|
||||
**Never:**
|
||||
- PageBreak inside tables
|
||||
- PageBreak as standalone element (must be inside Paragraph)
|
||||
- PageBreak at the END of the last section (causes blank page)
|
||||
|
||||
```js
|
||||
// Correct: page break between cover and TOC
|
||||
new Paragraph({ children: [new PageBreak()] })
|
||||
```
|
||||
|
||||
## Quotes Escaping in JS Strings
|
||||
|
||||
**⚠️⚠️⚠️ CRITICAL — #1 MOST COMMON BUG ⚠️⚠️⚠️**
|
||||
|
||||
Chinese curly quotation marks (`“”` `‘’`) typed directly into JS string literals are the most common cause of crashed document generation. They are easily emitted or normalized as straight ASCII `"` / `'`, which **terminate the string literal early and break the syntax**. This happens most often in **Chinese body text**, where curly quotes mark emphasis, proper nouns, event names, or quoted speech, e.g. `“双11”`, `“前低后高”`, `“618”大促`. **Every single occurrence** of `“”‘’` in text content MUST be Unicode-escaped. No exceptions.
|
||||
|
||||
**MANDATORY RULE: Before writing ANY `TextRun`, `para()`, or string containing Chinese text, scan the text for `“”‘’` characters and replace ALL of them with `\u201c \u201d \u2018 \u2019`.**
|
||||
|
||||
| Character | Unicode | Escape method |
|
||||
|-----------|---------|---------------|
|
||||
| `“` `”` | `\u201c` `\u201d` | Unicode escape `\u201c` `\u201d` |
|
||||
| `‘` `’` | `\u2018` `\u2019` | Unicode escape `\u2018` `\u2019` |
|
||||
| `"` | U+0022 | `\"` or wrap string in single quotes / template literal |
|
||||
| `'` | U+0027 | `\'` or wrap string in double quotes / template literal |
|
||||
|
||||
```js
|
||||
// ❌ WRONG — Chinese quotes emitted as straight ASCII " / ' terminate the string early (VERY COMMON MISTAKE)
|
||||
content.push(para("2025年四个季度行业增速呈现"前低后高"的态势。在"618"大促、"双11""双12"活动拉动下增长显著。"));
|
||||
new TextRun({ text: "他说"你好"" })
|
||||
new TextRun({ text: 'It's a test' })
|
||||
|
||||
// ✅ CORRECT — ALL curly quotes replaced with Unicode escapes
|
||||
content.push(para("2025年四个季度行业增速呈现\u201c前低后高\u201d的态势。在\u201c618\u201d大促、\u201c双11\u201d\u201c双12\u201d活动拉动下增长显著。"));
|
||||
new TextRun({ text: "他说\u201c你好\u201d" })
|
||||
new TextRun({ text: "It\u2019s a test" })
|
||||
|
||||
// ✅ CORRECT — straight quotes escaped or use alternate delimiters
|
||||
new TextRun({ text: "He said \"hello\"" })
|
||||
new TextRun({ text: 'He said "hello"' })
|
||||
new TextRun({ text: `He said "hello"` })
|
||||
```
|
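When text is inserted into generated code programmatically, the escaping rule above can be applied mechanically rather than by eye. This is a minimal sketch: `escapeCurlyQuotes`, `toJsLiteral`, and `hasCurlyQuotes` are hypothetical helpers, not part of docx-js.

```javascript
// Map each curly quote to its 6-character escape sequence \uXXXX
const CURLY_ESCAPES = {
  "\u201c": "\\u201c", // left double quotation mark
  "\u201d": "\\u201d", // right double quotation mark
  "\u2018": "\\u2018", // left single quotation mark
  "\u2019": "\\u2019", // right single quotation mark
};

function escapeCurlyQuotes(text) {
  return text.replace(/[\u201c\u201d\u2018\u2019]/g, (ch) => CURLY_ESCAPES[ch]);
}

// JSON.stringify escapes backslashes, straight double quotes and control
// characters; the curly quotes are then escaped on top of that.
function toJsLiteral(text) {
  return escapeCurlyQuotes(JSON.stringify(text));
}

// Quick scan for any curly quote still present
const hasCurlyQuotes = (text) => /[\u201c\u201d\u2018\u2019]/.test(text);
```

For example, `toJsLiteral("他说\u201c你好\u201d")` yields the source text `"他说\u201c你好\u201d"`, which can be pasted into a `TextRun` as-is.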
||||
|
||||
## Multi-Section Documents
|
||||
|
||||
Give each section its own headers/footers and page-number format:
|
||||
|
||||
```js
|
||||
const doc = new Document({
|
||||
sections: [
|
||||
{
|
||||
// Section 1: Cover — no header/footer
|
||||
properties: { page: { /* ... */ } },
|
||||
children: coverChildren,
|
||||
},
|
||||
{
|
||||
// Section 2: Front matter — Roman page numbers
|
||||
properties: {
|
||||
type: SectionType.NEXT_PAGE,
|
||||
page: {
|
||||
/* size, margin... */
|
||||
pageNumbers: { start: 1, formatType: NumberFormat.UPPER_ROMAN },
|
||||
},
|
||||
},
|
||||
headers: { default: new Header({ children: [] }) },
|
||||
footers: {
|
||||
default: new Footer({
|
||||
children: [new Paragraph({
|
||||
alignment: AlignmentType.CENTER,
|
||||
children: [new TextRun({ children: [PageNumber.CURRENT], size: 18 })],
|
||||
})],
|
||||
}),
|
||||
},
|
||||
children: tocAndAbstract,
|
||||
},
|
||||
{
|
||||
// Section 3: Main content — Arabic page numbers
|
||||
properties: {
|
||||
type: SectionType.NEXT_PAGE,
|
||||
page: {
|
||||
/* size, margin... */
|
||||
pageNumbers: { start: 1, formatType: NumberFormat.DECIMAL },
|
||||
},
|
||||
},
|
||||
headers: {
|
||||
default: new Header({
|
||||
children: [new Paragraph({
|
||||
alignment: AlignmentType.CENTER,
|
||||
children: [new TextRun({ text: docTitle, size: 18, color: "888888" })],
|
||||
})],
|
||||
}),
|
||||
},
|
||||
footers: { default: footerWithPageNumbers },
|
||||
children: mainContent,
|
||||
},
|
||||
],
|
||||
});
|
||||
```
|
||||
|
||||
## Converting DOCX to PDF
|
||||
|
||||
```bash
|
||||
# Using LibreOffice (headless)
|
||||
libreoffice --headless --convert-to pdf output.docx
|
||||
|
||||
# ⚠️ TOC Rule: If document has TOC, warn user that:
|
||||
# 1. LibreOffice conversion may show empty TOC
|
||||
# 2. User should open in Word first, update fields (Ctrl+A → F9), save, then convert
|
||||
# 3. Or use Word's "Save as PDF" for best results
|
||||
```
|
||||
|
||||
## Converting DOCX to Images
|
||||
|
||||
```bash
|
||||
# Step 1: Convert to PDF
|
||||
libreoffice --headless --convert-to pdf output.docx
|
||||
|
||||
# Step 2: Convert PDF to images
|
||||
pdftoppm -png -r 200 output.pdf output_page
|
||||
|
||||
# This generates output_page-1.png, output_page-2.png, etc. (zero-padded, e.g. output_page-01.png, when the PDF has 10+ pages)
|
||||
# Use -r 200 for good quality (200 DPI)
|
||||
```
|
||||
|
||||
Useful for generating preview thumbnails, or when the user needs images instead of a document file.
|
||||
333
skills/docx/references/docx-js-core.md
Executable file
@@ -0,0 +1,333 @@
|
||||
# docx-js API Reference
|
||||
|
||||
Core API for creating .docx documents with the `docx` npm package. For advanced features (TOC details, footnotes, PDF conversion), see `docx-js-advanced.md`.
|
||||
|
||||
## Setup
|
||||
|
||||
```js
|
||||
const {
|
||||
Document, Packer, Paragraph, TextRun, Table, TableRow, TableCell,
|
||||
ImageRun, PageBreak, Header, Footer, PageNumber, NumberFormat,
|
||||
AlignmentType, HeadingLevel, WidthType, BorderStyle, ShadingType,
|
||||
PageOrientation, TabStopType, TabStopPosition, ExternalHyperlink,
|
||||
InternalHyperlink, Bookmark, LevelFormat, TableOfContents, UnderlineType, SymbolRun,
|
||||
} = require("docx");
|
||||
const fs = require("fs");
|
||||
```
|
||||
|
||||
## Document Creation + Export
|
||||
|
||||
```js
|
||||
const doc = new Document({
|
||||
styles: { /* see Styles section */ },
|
||||
numbering: { config: [ /* see Lists section */ ] },
|
||||
sections: [{
|
||||
properties: {
|
||||
page: {
|
||||
size: { width: 11906, height: 16838 },
|
||||
margin: { top: 1417, bottom: 1417, left: 1701, right: 1417 },
|
||||
},
|
||||
},
|
||||
headers: { default: new Header({ children: [/* */] }) },
|
||||
footers: { default: new Footer({ children: [/* */] }) },
|
||||
children: [ /* Paragraphs, Tables, etc. */ ],
|
||||
}],
|
||||
});
|
||||
|
||||
// CommonJS scripts cannot use top-level await, so consume the Promise directly
|
||||
Packer.toBuffer(doc).then((buffer) => fs.writeFileSync("output.docx", buffer));
|
||||
```
|
||||
|
||||
## Paragraph + TextRun
|
||||
|
||||
```js
|
||||
new Paragraph({
|
||||
heading: HeadingLevel.HEADING_1, // or HEADING_2, HEADING_3
|
||||
alignment: AlignmentType.JUSTIFIED,
|
||||
spacing: { before: 240, after: 120, line: 312 }, // 1.3x mandatory
|
||||
indent: { firstLine: 480 }, // 2-char CJK indent = 2 × font size (480 at 12pt, 420 at 10.5pt)
|
||||
children: [
|
||||
new TextRun({
|
||||
text: "Hello",
|
||||
bold: true,
|
||||
italics: true,
|
||||
size: 24, // 12pt = Xiao Si
|
||||
font: { ascii: "Calibri", eastAsia: "Microsoft YaHei" },
|
||||
color: "000000", // Pure black for Profile A; for Profile B use palette.body
|
||||
}),
|
||||
],
|
||||
});
|
||||
|
||||
// Additional text formatting options
|
||||
new TextRun({ text: "Underlined", underline: { type: UnderlineType.SINGLE } })
|
||||
new TextRun({ text: "Highlighted", highlight: "yellow" })
|
||||
new TextRun({ text: "Strikethrough", strike: true })
|
||||
new TextRun({ text: "2", superScript: true }) // the "2" in x², placed after a TextRun "x"
|
||||
new TextRun({ text: "2", subScript: true }) // the "2" in H₂O, between TextRuns "H" and "O"
|
||||
new SymbolRun({ char: "F0B7", symbolfont: "Symbol" }) // Bullet • (Symbol font, code F0B7)
|
||||
```
|
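The example above mixes three unit systems. `size` is in half-points, `firstLine`/`before`/`after` are in twips (1pt = 20 twips), and `line` is in 240ths of a line. The sketch below makes these conversions explicit. The helper names are hypothetical. The CJK indent assumes each CJK glyph is about one em wide, which is an approximation.

```javascript
// HYPOTHETICAL unit helpers for docx-js values
const ptToHalfPoints = (pt) => Math.round(pt * 2); // TextRun size: 12pt -> 24
const ptToTwips = (pt) => Math.round(pt * 20); // indent / spacing: 1pt -> 20
const lineMultiple = (multiple) => Math.round(multiple * 240); // spacing.line: 1.3x -> 312

// N-character first-line indent ~= N x font size (CJK glyphs are ~1 em wide)
const cjkFirstLineIndent = (fontPt, chars = 2) => ptToTwips(fontPt * chars);
```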
||||
|
||||
## Table
|
||||
|
||||
**⚠️ CRITICAL**: Always set `margins` on TableCell (or at Table level for global default). Without margins, text touches borders.
|
||||
|
||||
**⚠️ CRITICAL**: Use `ShadingType.CLEAR` — never `ShadingType.SOLID` (causes black cells).
|
||||
|
||||
**⚠️ CRITICAL — Table Cross-Page Control**:
|
||||
- Header row MUST set `tableHeader: true` (auto-repeat header on page break)
|
||||
- All rows MUST set `cantSplit: true` (prevent row content split across pages)
|
||||
- Title paragraph before table MUST set `keepNext: true` (keep title with table)
|
||||
|
||||
```js
|
||||
// ⚠️ Title before table — keepNext keeps title with table
|
||||
new Paragraph({
|
||||
keepNext: true, // ← critical
|
||||
children: [new TextRun({ text: "Table 1 Feature Comparison", bold: true, size: 21 })],
|
||||
}),
|
||||
|
||||
new Table({
|
||||
width: { size: 100, type: WidthType.PERCENTAGE },
|
||||
borders: {
|
||||
top: { style: BorderStyle.SINGLE, size: 2, color: "9AA6B2" },
|
||||
bottom: { style: BorderStyle.SINGLE, size: 2, color: "9AA6B2" },
|
||||
left: { style: BorderStyle.NONE },
|
||||
right: { style: BorderStyle.NONE },
|
||||
insideHorizontal: { style: BorderStyle.SINGLE, size: 1, color: "D0D0D0" },
|
||||
insideVertical: { style: BorderStyle.NONE },
|
||||
},
|
||||
rows: [
|
||||
// ⚠️ Header row — tableHeader + cantSplit
|
||||
new TableRow({
|
||||
tableHeader: true, // auto-repeat on page break
|
||||
cantSplit: true, // prevent row split
|
||||
children: ["Header 1", "Header 2"].map(text =>
|
||||
new TableCell({
|
||||
children: [new Paragraph({ children: [new TextRun({ text, bold: true, size: 21 })] })],
|
||||
shading: { type: ShadingType.CLEAR, fill: "F1F5F9" },
|
||||
margins: { top: 60, bottom: 60, left: 120, right: 120 },
|
||||
width: { size: 50, type: WidthType.PERCENTAGE },
|
||||
})
|
||||
),
|
||||
}),
|
||||
// ⚠️ Data rows — cantSplit
|
||||
new TableRow({
|
||||
cantSplit: true, // prevent row split
|
||||
children: ["Data 1", "Data 2"].map(text =>
|
||||
new TableCell({
|
||||
children: [new Paragraph({ children: [new TextRun({ text, size: 21 })] })],
|
||||
margins: { top: 60, bottom: 60, left: 120, right: 120 },
|
||||
width: { size: 50, type: WidthType.PERCENTAGE },
|
||||
})
|
||||
),
|
||||
}),
|
||||
],
|
||||
});
|
||||
```
|
||||
|
||||
### Column Widths
|
||||
|
||||
```js
|
||||
// Fixed widths (twips)
|
||||
width: { size: 3000, type: WidthType.DXA }
|
||||
// Percentage
|
||||
width: { size: 50, type: WidthType.PERCENTAGE }
|
||||
```
|
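Fixed DXA column widths should add up to the usable text width. With the A4 page and the 1701/1417-twip left/right margins from the Document Creation example, that width is 11906 - 1701 - 1417 = 8788 twips. The sketch below splits a total width by ratios. `columnWidthsDxa` is a hypothetical helper; it gives the rounding remainder to the last column so the sum is exact.

```javascript
// HYPOTHETICAL helper: split totalTwips across columns by ratio
function columnWidthsDxa(totalTwips, ratios) {
  const sum = ratios.reduce((a, b) => a + b, 0);
  const widths = ratios.map((r) => Math.floor((totalTwips * r) / sum));
  const used = widths.reduce((a, b) => a + b, 0);
  widths[widths.length - 1] += totalTwips - used; // absorb rounding remainder
  return widths;
}
```

Each result can then be used as `width: { size: w, type: WidthType.DXA }` on the matching cell.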
||||
|
||||
## ImageRun
|
||||
|
||||
**⚠️ CRITICAL**: Always include `type` parameter. Always preserve aspect ratio.
|
||||
|
||||
```js
|
||||
const imageBuffer = fs.readFileSync("chart.png");
|
||||
// Calculate dimensions preserving aspect ratio
|
||||
const displayWidth = 500;
|
||||
const aspectRatio = originalHeight / originalWidth; // originalWidth/originalHeight: source image size in pixels, read from the file
|
||||
const displayHeight = Math.round(displayWidth * aspectRatio);
|
||||
|
||||
new Paragraph({
|
||||
alignment: AlignmentType.CENTER,
|
||||
children: [
|
||||
new ImageRun({
|
||||
data: imageBuffer,
|
||||
transformation: { width: displayWidth, height: displayHeight },
|
||||
type: "png", // REQUIRED: "png", "jpg", "gif", "bmp"
|
||||
}),
|
||||
],
|
||||
});
|
||||
```
|
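`originalWidth` and `originalHeight` have to come from the image itself. For PNG files they are stored at fixed offsets in the header: width and height are big-endian 32-bit integers at bytes 16 and 20, inside the IHDR chunk. The sketch below assumes the buffer holds a PNG; other formats need a different reader, for example an npm package such as `image-size`. `pngSize` and `fitToWidth` are hypothetical names.

```javascript
const PNG_SIGNATURE = Buffer.from([0x89, 0x50, 0x4e, 0x47, 0x0d, 0x0a, 0x1a, 0x0a]);

// Read pixel dimensions from the IHDR chunk of a PNG buffer
function pngSize(buffer) {
  if (buffer.length < 24 || !buffer.subarray(0, 8).equals(PNG_SIGNATURE)) {
    throw new Error("Not a PNG file");
  }
  return { width: buffer.readUInt32BE(16), height: buffer.readUInt32BE(20) };
}

// Scale to a target display width, preserving the aspect ratio
function fitToWidth({ width, height }, displayWidth) {
  return { width: displayWidth, height: Math.round((displayWidth * height) / width) };
}
```

Usage: `transformation: fitToWidth(pngSize(imageBuffer), 500)` replaces the manual aspect-ratio arithmetic.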
||||
|
||||
## PageBreak
|
||||
|
||||
**⚠️ CRITICAL**: PageBreak MUST be inside a Paragraph. Standalone PageBreak crashes Word.
|
||||
|
||||
**⚠️ Best Practice**: Attach PageBreak to the end of a **paragraph with text content**. Avoid empty paragraph + PageBreak (may cause blank pages). If using multi-section structure, prefer section breaks over PageBreak.
|
||||
|
||||
```js
|
||||
// ✅ Recommended — PageBreak attached to content paragraph
|
||||
new Paragraph({
|
||||
children: [
|
||||
new TextRun({ text: "End of section" }),
|
||||
new PageBreak()
|
||||
]
|
||||
})
|
||||
|
||||
// ✅ Acceptable — but prefer section breaks
|
||||
new Paragraph({ children: [new PageBreak()] })
|
||||
|
||||
// ✅ Best — use section breaks instead of PageBreak
|
||||
// Place content in different sections — auto page break
|
||||
```
|
||||
|
||||
## Headers & Footers + Page Numbers
|
||||
|
||||
```js
|
||||
headers: {
|
||||
default: new Header({
|
||||
children: [
|
||||
new Paragraph({
|
||||
alignment: AlignmentType.CENTER,
|
||||
children: [new TextRun({ text: "Document Title", size: 18, color: "888888" })],
|
||||
}),
|
||||
],
|
||||
}),
|
||||
},
|
||||
footers: {
|
||||
default: new Footer({
|
||||
children: [
|
||||
new Paragraph({
|
||||
alignment: AlignmentType.CENTER,
|
||||
children: [
|
||||
new TextRun({ children: [PageNumber.CURRENT], size: 18 }),
|
||||
],
|
||||
}),
|
||||
],
|
||||
}),
|
||||
},
|
||||
```
|
||||
|
||||
> ⚠️ **Denominator FORBIDDEN** — never use `PageNumber.TOTAL_PAGES` or "X / Y" format. Show only the current page number.
|
||||
|
||||
## Styles Definition
|
||||
|
||||
The example below is for **Chinese documents** (default). For **English documents**, replace `font` with `"Times New Roman"` throughout.
|
||||
|
||||
```js
|
||||
styles: {
|
||||
default: {
|
||||
document: {
|
||||
run: {
|
||||
font: { ascii: "Calibri", eastAsia: "Microsoft YaHei" },
|
||||
size: 24, color: "000000", // Pure black for Profile A; for Profile B use palette.body
|
||||
},
|
||||
paragraph: {
|
||||
spacing: { line: 312 }, // 1.3x mandatory
|
||||
},
|
||||
},
|
||||
heading1: {
|
||||
run: { font: { ascii: "Calibri", eastAsia: "SimHei" }, size: 32, bold: true, color: "0B1220" },
|
||||
paragraph: { spacing: { before: 360, after: 160, line: 312 } },
|
||||
},
|
||||
heading2: {
|
||||
run: { font: { ascii: "Calibri", eastAsia: "SimHei" }, size: 28, bold: true, color: "0B1220" },
|
||||
paragraph: { spacing: { before: 240, after: 120, line: 312 } },
|
||||
},
|
||||
heading3: {
|
||||
run: { font: { ascii: "Calibri", eastAsia: "SimHei" }, size: 24, bold: true, color: "0B1220" },
|
||||
paragraph: { spacing: { before: 200, after: 100, line: 312 } },
|
||||
},
|
||||
},
|
||||
}
|
||||
```
|
||||
|
||||
## Lists
|
||||
|
||||
**⚠️ CRITICAL**: Each separate numbered list MUST use a unique `reference` name. Reusing the same reference causes numbering to continue instead of restarting.
|
||||
|
||||
```js
|
||||
// In Document numbering config
|
||||
numbering: {
|
||||
config: [
|
||||
{
|
||||
reference: "list-features", // unique name!
|
||||
levels: [{
|
||||
level: 0,
|
||||
format: LevelFormat.DECIMAL,
|
||||
text: "%1.",
|
||||
alignment: AlignmentType.LEFT,
|
||||
style: { paragraph: { indent: { left: 720, hanging: 360 } } },
|
||||
}],
|
||||
},
|
||||
{
|
||||
reference: "list-benefits", // different name for second list!
|
||||
levels: [{ /* same config */ }],
|
||||
},
|
||||
],
|
||||
},
|
||||
|
||||
// Usage in paragraphs
|
||||
new Paragraph({
|
||||
numbering: { reference: "list-features", level: 0 },
|
||||
children: [new TextRun({ text: "First item" })],
|
||||
})
|
||||
```
|
||||
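One way to make the unique-reference rule hard to break is to hand out references from a single place. The sketch below is hypothetical (the class name and the `list-<name>-<n>` naming scheme are assumptions) and builds only plain option objects; the caller supplies the `levels` array written with docx-js constants such as `LevelFormat.DECIMAL`, for which plain strings stand in here.

```js
// Hypothetical registry: every create() call returns a fresh reference,
// so two lists can never share numbering by accident.
class NumberedListRegistry {
  constructor(levels) {
    this.levels = levels; // e.g. the levels array used for "list-features" above
    this.config = [];     // pass as numbering: { config: registry.config }
  }

  create(name) {
    // The running index makes the reference unique even if a name is reused.
    const reference = `list-${name}-${this.config.length + 1}`;
    // Shallow-copy each level object so lists do not share the same instances.
    this.config.push({ reference, levels: this.levels.map((lvl) => ({ ...lvl })) });
    return reference;
  }
}

// Plain strings stand in for LevelFormat.DECIMAL / AlignmentType.LEFT here.
const registry = new NumberedListRegistry([
  { level: 0, format: "decimal", text: "%1.", alignment: "left" },
]);
const features = registry.create("features");
const benefits = registry.create("benefits");
console.log(features, benefits); // list-features-1 list-benefits-2
```

Pass `registry.config` as `numbering: { config: registry.config }` and each returned reference as `numbering: { reference, level: 0 }` on that list's paragraphs.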
|
||||
### Bullet Lists
|
||||
|
||||
```js
|
||||
new Paragraph({
|
||||
bullet: { level: 0 },
|
||||
children: [new TextRun({ text: "Bullet item" })],
|
||||
})
|
||||
```
|
||||
|
||||
## Hyperlinks
|
||||
|
||||
### External Link
|
||||
|
||||
```js
|
||||
new ExternalHyperlink({
|
||||
children: [new TextRun({ text: "Click here", style: "Hyperlink" })],
|
||||
link: "https://example.com",
|
||||
})
|
||||
```
|
||||
|
||||
### Internal Link (Bookmark)
|
||||
|
||||
```js
|
||||
// Define bookmark at target
|
||||
new Paragraph({
|
||||
children: [
|
||||
new Bookmark({ id: "section1", children: [new TextRun("Section 1")] }),
|
||||
],
|
||||
})
|
||||
|
||||
// Link to bookmark
|
||||
new InternalHyperlink({
|
||||
children: [new TextRun({ text: "Go to Section 1", style: "Hyperlink" })],
|
||||
anchor: "section1",
|
||||
})
|
||||
```
|
||||
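Because `Bookmark.id` and `InternalHyperlink.anchor` must match exactly, it helps to derive both from one function. A minimal sketch, assuming Word's usual bookmark-name rules (start with a letter; letters, digits and underscores only; at most 40 characters). It keeps only ASCII characters, which is stricter than necessary for CJK headings, and appends an index so identical headings still get distinct names. `makeBookmarkId` and the `bm_` prefix are hypothetical.

```js
// Hypothetical helper: derive a Word-safe bookmark name from heading text.
// Assumes Word's conventions: starts with a letter, [A-Za-z0-9_] only, max 40 chars.
function makeBookmarkId(headingText, index) {
  const slug = headingText
    .replace(/[^A-Za-z0-9]+/g, "_") // runs of spaces, punctuation and non-ASCII become "_"
    .replace(/^_+|_+$/g, "");       // trim leading/trailing underscores
  const id = slug ? `bm_${index}_${slug}` : `bm_${index}`;
  return id.slice(0, 40);
}

// Use the same value for Bookmark.id and InternalHyperlink.anchor.
console.log(makeBookmarkId("Section 1: Intro", 3)); // bm_3_Section_1_Intro
console.log(makeBookmarkId("第一章 概述", 1));        // bm_1
```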
## Table of Contents (TOC)
|
||||
|
||||
**→ See `references/toc.md` for the complete TOC reference.**
|
||||
|
||||
Quick reminder: (1) Add `TableOfContents` element + PageBreak, (2) Run `python3 "$DOCX_SCRIPTS/add_toc_placeholders.py" output.docx --auto`, (3) Check exit code.
|
||||
|
||||
## Tabs
|
||||
|
||||
```js
|
||||
new Paragraph({
|
||||
tabStops: [
|
||||
{ type: TabStopType.RIGHT, position: TabStopPosition.MAX },
|
||||
],
|
||||
children: [new TextRun("Left"), new TextRun("\t"), new TextRun("Right")]
|
||||
})
|
||||
```
|
||||
|
||||
## Constants Quick Reference
|
||||
|
||||
- **Underlines:** `SINGLE`, `DOUBLE`, `WAVE`, `DASH`
|
||||
- **Borders:** `SINGLE`, `DOUBLE`, `DASHED`, `DOTTED`
|
||||
- **Numbering:** `DECIMAL` (1,2,3), `UPPER_ROMAN` (I,II,III), `LOWER_LETTER` (a,b,c)
|
||||
- **Symbols:** `"2022"` (•), `"00A9"` (©), `"00AE"` (®), `"2122"` (™)
|
||||
|
||||
323
skills/docx/references/faq.md
Executable file
@@ -0,0 +1,323 @@
|
||||
# FAQ — Common Bugs and Fixes
|
||||
|
||||
## Bug: Table text touching cell borders
|
||||
|
||||
**Symptom**: Text is cramped against table cell edges, no padding.
|
||||
|
||||
**Fix**: Set `margins` (in twips; 120 twips = 6 pt) at the TableCell level:
|
||||
```js
|
||||
new TableCell({
|
||||
margins: { top: 60, bottom: 60, left: 120, right: 120 },
|
||||
children: [/* ... */],
|
||||
})
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## Bug: Numbered list doesn't restart
|
||||
|
||||
**Symptom**: Second numbered list continues from where the first left off (e.g., starts at 4 instead of 1).
|
||||
|
||||
**Fix**: Each separate numbered list MUST use a unique `reference` name in numbering config:
|
||||
```js
|
||||
numbering: { config: [
|
||||
{ reference: "list-A", levels: [{ level: 0, format: LevelFormat.DECIMAL, text: "%1." }] },
|
||||
{ reference: "list-B", levels: [{ level: 0, format: LevelFormat.DECIMAL, text: "%1." }] },
|
||||
]}
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## Bug: Cover and content on same page
|
||||
|
||||
**Symptom**: Cover page content flows directly into main content without page break.
|
||||
|
||||
**Fix**: Add a PageBreak paragraph at the end of cover content:
|
||||
```js
|
||||
coverChildren.push(new Paragraph({ children: [new PageBreak()] }));
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## Bug: Three-line table shows all borders
|
||||
|
||||
**Symptom**: Table intended to be three-line shows full grid borders.
|
||||
|
||||
**Fix**: At table level, keep only the top and bottom borders and set all others to NONE; then draw the rule under the header row with cell-level borders:
|
||||
```js
|
||||
// Table level: top and bottom rules only; left, right and inside borders NONE
|
||||
borders: { top: { style: BorderStyle.SINGLE, size: 4 }, bottom: { style: BorderStyle.SINGLE, size: 4 },
|
||||
left: { style: BorderStyle.NONE }, right: { style: BorderStyle.NONE },
|
||||
insideHorizontal: { style: BorderStyle.NONE }, insideVertical: { style: BorderStyle.NONE } }
|
||||
// Header cells: bottom border only
|
||||
new TableCell({ borders: { bottom: { style: BorderStyle.SINGLE, size: 2, color: "000000" } }, children: [/* header text */] })
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## Bug: User requests Chinese font size name (e.g. Wu Hao) but output is wrong
|
||||
|
||||
**Symptom**: Font size doesn't match expected Chinese size name.
|
||||
|
||||
**Fix**: Use the correct half-point value. `size` in docx-js is in half-points:
|
||||
- Wu Hao 五号 = 10.5pt → `size: 21`
|
||||
- Xiao Si 小四 = 12pt → `size: 24`
|
||||
- Si Hao 四号 = 14pt → `size: 28`
|
||||
|
||||
See SKILL.md for the complete conversion table.
|
||||
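A lookup table keeps these conversions out of hand-written numbers. This sketch covers only the three sizes listed above (extend it from the full table in SKILL.md); `CN_FONT_SIZE_PT` and `cnSize` are hypothetical names.

```js
// Point sizes for the Chinese size names listed above (extend from SKILL.md).
const CN_FONT_SIZE_PT = {
  "五号": 10.5, // Wu Hao
  "小四": 12,   // Xiao Si
  "四号": 14,   // Si Hao
};

// Returns the docx-js `size` value (half-points) for a Chinese size name.
function cnSize(name) {
  const pt = CN_FONT_SIZE_PT[name];
  if (pt === undefined) throw new Error(`Unknown Chinese font size: ${name}`);
  return Math.round(pt * 2);
}

// e.g. new TextRun({ text: "正文", size: cnSize("五号") }) gets size 21
console.log(cnSize("五号"), cnSize("小四"), cnSize("四号")); // 21 24 28
```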
|
||||
---
|
||||
|
||||
## Bug: Black table cells
|
||||
|
||||
**Symptom**: Table cells appear solid black in Word.
|
||||
|
||||
**Fix**: Use `ShadingType.CLEAR` not `ShadingType.SOLID`:
|
||||
```js
|
||||
// ❌ WRONG
|
||||
shading: { type: ShadingType.SOLID, fill: "F1F5F9" }
|
||||
// ✅ CORRECT
|
||||
shading: { type: ShadingType.CLEAR, fill: "F1F5F9" }
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## Bug: Chinese characters garbled in matplotlib charts
|
||||
|
||||
**Symptom**: Chinese text shows as empty boxes □□□ in generated PNG charts.
|
||||
|
||||
**Fix**: Configure SimHei font before plotting:
|
||||
```python
|
||||
import matplotlib.pyplot as plt
|
||||
from matplotlib.font_manager import FontProperties
|
||||
zh_font = FontProperties(fname="/path/to/SimHei.ttf")
|
||||
plt.title("中文标题", fontproperties=zh_font)
|
||||
plt.rcParams["axes.unicode_minus"] = False
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## Bug: Image stretched/squashed in document
|
||||
|
||||
**Symptom**: Embedded image appears distorted.
|
||||
|
||||
**Fix**: Calculate display height from width using original aspect ratio:
|
||||
```js
|
||||
const aspectRatio = originalHeight / originalWidth;
|
||||
const displayWidth = 500;
|
||||
const displayHeight = Math.round(displayWidth * aspectRatio);
|
||||
new ImageRun({ data: buf, transformation: { width: displayWidth, height: displayHeight }, type: "png" });
|
||||
```
|
||||
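`originalWidth` and `originalHeight` can be read from the image file instead of being hard-coded. The sketch assumes PNG input, whose width and height are stored as big-endian 32-bit integers at byte offsets 16 and 20 (inside the IHDR chunk); `pngSize` and `fitImage` are hypothetical helpers, and `fitImage` additionally caps the height so tall images do not overflow the page.

```js
// Read pixel dimensions from a PNG buffer (IHDR chunk, big-endian).
function pngSize(buf) {
  const signature = [0x89, 0x50, 0x4e, 0x47, 0x0d, 0x0a, 0x1a, 0x0a];
  if (buf.length < 24 || !signature.every((b, i) => buf[i] === b)) {
    throw new Error("Not a PNG file");
  }
  return { width: buf.readUInt32BE(16), height: buf.readUInt32BE(20) };
}

// Scale to maxWidth keeping the aspect ratio; shrink further if taller than maxHeight.
function fitImage({ width, height }, maxWidth, maxHeight = Infinity) {
  const scale = Math.min(maxWidth / width, maxHeight / height);
  return { width: Math.round(width * scale), height: Math.round(height * scale) };
}

// Usage (buf = fs.readFileSync("chart.png")):
// new ImageRun({ data: buf, transformation: fitImage(pngSize(buf), 500, 700), type: "png" });
console.log(fitImage({ width: 1600, height: 900 }, 500)); // { width: 500, height: 281 }
```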
|
||||
---
|
||||
|
||||
## Bug: TOC shows empty in generated document
|
||||
|
||||
→ See `references/toc.md` — "5 Common TOC Bugs" section for diagnosis and fixes.
|
||||
|
||||
---
|
||||
|
||||
## Bug: PageBreak standalone crashes Word
|
||||
|
||||
**Symptom**: Document fails to open or renders incorrectly.
|
||||
|
||||
**Fix**: PageBreak must always be wrapped in a Paragraph:
|
||||
```js
|
||||
// ❌ WRONG — standalone
|
||||
children: [new PageBreak()]
|
||||
// ✅ CORRECT — inside Paragraph
|
||||
children: [new Paragraph({ children: [new PageBreak()] })]
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## Bug: Quotation marks break JavaScript syntax — ⚠️ #1 MOST COMMON BUG
|
||||
|
||||
**This is the single most frequent code generation error.** Chinese text routinely uses curly quotes `“”` for emphasis, proper nouns, and event names (e.g., “双11”, “前低后高”, “618”大促). These MUST be Unicode-escaped: when such text is copied into a `"..."` string literal, the quotes are easily emitted as straight `"`, which ends the string early and breaks JS syntax.
|
||||
|
||||
**Rule: scan ALL Chinese text for `“”‘’` and replace them with `\u201c \u201d \u2018 \u2019` BEFORE writing the string.**
|
||||
|
||||
```js
|
||||
// ❌ WRONG — curly quotes in Chinese text break syntax (extremely common)
|
||||
para("行业增速呈现"前低后高"的态势,在"618"大促拉动下增长。")
|
||||
"他说"你好"" // \u201c \u201d
|
||||
'It's a test' // \u2019
|
||||
|
||||
// ✅ CORRECT — Unicode escapes for ALL curly quotes
|
||||
para("行业增速呈现\u201c前低后高\u201d的态势,在\u201c618\u201d大促拉动下增长。")
|
||||
"他说\u201c你好\u201d"
|
||||
"It\u2019s a test"
|
||||
|
||||
// ✅ Straight quotes: escape or use alternate delimiters
|
||||
"He said \"hello\""
|
||||
'He said "hello"'
|
||||
```
|
||||
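When the JS source is generated programmatically, the escaping can be done mechanically instead of by eye. A minimal sketch, assuming every string is emitted as a double-quoted literal: `JSON.stringify` already escapes straight quotes and backslashes, and the extra `replace` turns the four curly quotes into `\uXXXX` escapes. `toJsStringLiteral` is a hypothetical helper.

```js
// Hypothetical helper: returns a double-quoted JS string literal (as source text)
// in which straight quotes, backslashes, and curly quotes are all escaped.
function toJsStringLiteral(text) {
  return JSON.stringify(text).replace(
    /[\u201c\u201d\u2018\u2019]/g,
    (ch) => "\\u" + ch.charCodeAt(0).toString(16)
  );
}

const source = `para(${toJsStringLiteral("在\u201c618\u201d大促拉动下增长")})`;
console.log(source); // para("在\u201c618\u201d大促拉动下增长")
```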
|
||||
---
|
||||
|
||||
## Bug: Unwanted blank pages in document
|
||||
|
||||
**Common causes:**
|
||||
|
||||
1. **Trailing PageBreak at end of last section** — pagination should use section breaks or be at the start of the next section
|
||||
2. **Empty Paragraph overflow** — empty paragraphs at page bottom push to a new page
|
||||
3. **PageBreak right after Table** — Table already at page bottom, PageBreak creates extra page
|
||||
|
||||
**Fix:**
|
||||
```js
|
||||
// Post-generation check: last section's children should not end with PageBreak
|
||||
function removeTrailingPageBreak(section) {
|
||||
const children = section.children;
|
||||
if (!children.length) return;
|
||||
const last = children[children.length - 1];
|
||||
// If the last element is a Paragraph holding only a PageBreak, remove it (reads docx-js's internal `root` array, which may change between versions)
|
||||
if (last instanceof Paragraph) {
|
||||
const runs = last.root?.filter(c => c instanceof PageBreak);
|
||||
if (runs?.length && !last.root?.some(c => c instanceof TextRun)) {
|
||||
children.pop();
|
||||
}
|
||||
}
|
||||
}
|
||||
```
|
||||
|
||||
**Prevention rules:**
|
||||
- Place PageBreak at the **start of the next section**, not the end of the previous one
|
||||
- Or use separate sections for pagination (no PageBreak needed)
|
||||
- The last section of a document must NEVER end with a PageBreak
|
||||
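`removeTrailingPageBreak` above reads `Paragraph.root`, which is not part of docx-js's documented API. An alternative sketch, assuming section content is collected in a plain array before the `Document` is built: mark page breaks with a sentinel, drop leading, repeated and trailing ones, and only then turn the survivors into PageBreak paragraphs. `PAGE_BREAK` and `finalizeChildren` are hypothetical names.

```js
// Sentinel meaning "start a new page here" while children are still a plain array.
const PAGE_BREAK = Symbol("PAGE_BREAK");

// Drops page breaks at the start, collapses consecutive ones, removes trailing ones,
// then replaces each remaining sentinel with whatever makePageBreak() returns.
function finalizeChildren(items, makePageBreak) {
  const out = [];
  for (const item of items) {
    const prev = out[out.length - 1];
    if (item === PAGE_BREAK && (out.length === 0 || prev === PAGE_BREAK)) continue;
    out.push(item);
  }
  while (out.length > 0 && out[out.length - 1] === PAGE_BREAK) out.pop();
  return out.map((item) => (item === PAGE_BREAK ? makePageBreak() : item));
}

// In real code: makePageBreak = () => new Paragraph({ children: [new PageBreak()] })
const children = finalizeChildren(["intro", PAGE_BREAK, PAGE_BREAK, "body", PAGE_BREAK], () => "<break>");
console.log(children); // [ 'intro', '<break>', 'body' ]
```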
|
||||
---
|
||||
|
||||
## Bug: Different rendering in WPS vs Microsoft Word
|
||||
|
||||
**Symptom**: Document looks correct in Word but renders differently in WPS (or vice versa) — misaligned tables, shifted content, clipped text in cells, black cells, or broken covers.
|
||||
|
||||
**Root causes and fixes:**
|
||||
|
||||
### 1. `ShadingType.SOLID` shows black in WPS
|
||||
```js
|
||||
// ❌ WPS shows solid black
|
||||
shading: { type: ShadingType.SOLID, fill: "F1F5F9" }
|
||||
// ✅ Both renderers show correct color
|
||||
shading: { type: ShadingType.CLEAR, fill: "F1F5F9" }
|
||||
```
|
||||
|
||||
### 2. `verticalAlign: "center"` in exact-height rows shifts content
|
||||
WPS ignores vertical centering in `rule: "exact"` rows — content stays at top, creating visual mismatch.
|
||||
```js
|
||||
// ❌ Inconsistent between Word and WPS
|
||||
new TableRow({ height: { value: 800, rule: "exact" },
|
||||
children: [new TableCell({ verticalAlign: VerticalAlign.CENTER, ... })] })
|
||||
// ✅ Use top alignment + margins/spacing for positioning
|
||||
new TableRow({ height: { value: 800, rule: "exact" },
|
||||
children: [new TableCell({ verticalAlign: VerticalAlign.TOP,
|
||||
margins: { top: 200 }, ... })] })
|
||||
```
|
||||
|
||||
### 3. Tab stops misalign in WPS
|
||||
Tab widths differ between Word and WPS. Never use tabs for alignment.
|
||||
```js
|
||||
// ❌ Tab-based alignment — breaks in WPS
|
||||
new Paragraph({ tabStops: [{ type: TabStopType.RIGHT, position: 8000 }],
|
||||
children: [new TextRun({ text: "Party A:\tCompany Name" })] })
|
||||
// ✅ Borderless table for alignment — consistent everywhere
|
||||
new Table({ borders: allNoBorders, rows: [new TableRow({ children: [
|
||||
new TableCell({ children: [new Paragraph({ children: [new TextRun({ text: "Party A:" })] })] }),
|
||||
new TableCell({ children: [new Paragraph({ children: [new TextRun({ text: "Company Name" })] })] }),
|
||||
] })] })
|
||||
```
|
||||
|
||||
### 4. Nested tables in exact-height cells overflow differently
|
||||
Word calculates nested table heights more accurately than WPS. Use stacked tables instead.
|
||||
```js
|
||||
// ❌ Nested table inside exact-height cell
|
||||
new TableRow({ height: { value: 16838, rule: "exact" },
|
||||
children: [new TableCell({ children: [nestedTable1, nestedTable2] })] })
|
||||
// ✅ Stacked approach — content table + filler table
|
||||
[contentTable, fillerTable] // both at top level, heights sum to 16838
|
||||
```
|
||||
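The filler height in the stacked approach is plain subtraction from the A4 page height. A sketch, assuming the cover section has zero page margins (as the 16838-twip rows in this FAQ imply), every content row uses `rule: "exact"` so its height is known in advance, and both tables set `borders: allNoBorders`; `mmToTwips` and `fillerRowHeight` are hypothetical.

```js
const TWIPS_PER_INCH = 1440;
const mmToTwips = (mm) => Math.round((mm / 25.4) * TWIPS_PER_INCH);
const A4_HEIGHT_TWIPS = mmToTwips(297); // 16838

// Height for the filler table's single exact-height row so that
// content rows + filler row add up to exactly one page.
function fillerRowHeight(contentRowHeights, pageHeight = A4_HEIGHT_TWIPS) {
  const used = contentRowHeights.reduce((sum, h) => sum + h, 0);
  const remaining = pageHeight - used;
  if (remaining < 0) {
    throw new RangeError(`Cover content is ${-remaining} twips taller than the page`);
  }
  return remaining;
}

console.log(A4_HEIGHT_TWIPS, fillerRowHeight([9000, 3000])); // 16838 4838
```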
|
||||
### 5. `characterSpacing` renders differently
|
||||
Large `characterSpacing` values render with inconsistent letter spacing between the two apps. Keep the value ≤ 80 (the unit is twentieths of a point, so 80 = 4 pt).
|
||||
|
||||
### 6. `titlePage: true` header/footer suppression
|
||||
WPS may not correctly hide first-page headers when using `titlePage: true`. Use a separate section for the cover instead.
|
||||
|
||||
---
|
||||
|
||||
## Bug: Cover spills to second page
|
||||
|
||||
**Symptom**: Cover content overflows, with some elements (date, footer, accent strip) appearing on page 2.
|
||||
|
||||
**Root cause**: Total content height exceeds 16838 twips (A4 page height). Common when:
|
||||
- Title is very long (3+ lines at large font size)
|
||||
- Fixed spacing values assume short title
|
||||
- Multiple meta lines + subtitle + English label
|
||||
|
||||
**Fix**: Always use `calcTitleLayout()` + `calcCoverSpacing()` from `design-system.md`. These dynamically adjust font sizes and spacing to fit within the page. See `design-system.md § Cover Content Overflow Prevention` for the complete checklist.
|
||||
|
||||
---
|
||||
|
||||
## Bug: Blank page 2 after cover in MS Office (but not WPS)
|
||||
|
||||
**Symptom**: Cover displays correctly in WPS but produces a blank second page in MS Office Word.
|
||||
|
||||
**Root cause**: The cover wrapper table uses **default docx-js table borders** (`single/auto/sz=4`) instead of explicitly setting `allNoBorders`. Default borders add about 10 twips per edge (`sz` is in eighths of a point, so `sz=4` is 0.5 pt, i.e. 10 twips). MS Office includes border thickness in the exact-height row calculation, which pushes the total height past 16838 twips and overflows to page 2. WPS is more lenient and absorbs the extra height.
|
||||
|
||||
**Fix**: Every cover wrapper table MUST explicitly set `borders: allNoBorders`:
|
||||
```js
|
||||
const NB = { style: BorderStyle.NONE, size: 0, color: "FFFFFF" };
|
||||
const allNoBorders = { top: NB, bottom: NB, left: NB, right: NB,
|
||||
insideHorizontal: NB, insideVertical: NB };
|
||||
|
||||
new Table({
|
||||
borders: allNoBorders, // ← MANDATORY
|
||||
rows: [new TableRow({
|
||||
height: { value: 16838, rule: "exact" },
|
||||
// ...
|
||||
})],
|
||||
});
|
||||
```
|
||||
|
||||
**Prevention**: Add to post-generation check — search for any `new Table` in cover code that does not explicitly set `borders`.
|
||||
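A rough version of that check can run over the generated JS source before it is executed. This is a heuristic sketch, not a parser: it matches `new Table(`, finds the closing parenthesis by counting, and looks for a `borders:` key at the top level of the options object. It ignores brackets inside strings and comments, and it also flags calls that pass a variable instead of an object literal, erring on the side of caution. The function names are hypothetical.

```js
// True if `key:` appears at the top level of the object literal in argsText.
function hasTopLevelKey(argsText, key) {
  let depth = 0;
  for (let i = 0; i < argsText.length; i++) {
    const ch = argsText[i];
    if ("{[(".includes(ch)) depth++;
    else if ("}])".includes(ch)) depth--;
    else if (
      depth === 1 &&
      argsText.startsWith(key, i) &&
      !/[\w$]/.test(argsText[i - 1] ?? "") &&
      /^\s*:/.test(argsText.slice(i + key.length))
    ) {
      return true;
    }
  }
  return false;
}

// Returns 1-based line numbers of `new Table(...)` calls that do not set `borders`.
function findTablesWithoutBorders(source) {
  const missing = [];
  const re = /new Table\s*\(/g; // does not match TableRow( or TableCell(
  let m;
  while ((m = re.exec(source)) !== null) {
    const open = m.index + m[0].length - 1; // index of "("
    let depth = 0;
    let close = open;
    for (; close < source.length; close++) {
      if (source[close] === "(") depth++;
      else if (source[close] === ")" && --depth === 0) break;
    }
    if (!hasTopLevelKey(source.slice(open + 1, close), "borders")) {
      missing.push(source.slice(0, m.index).split("\n").length);
    }
  }
  return missing;
}

const coverCode = [
  "const a = new Table({ borders: allNoBorders, rows: [] });",
  "const b = new Table({ rows: [new TableRow({ children: [new TableCell({ borders: x, children: [] })] })] });",
].join("\n");
console.log(findTablesWithoutBorders(coverCode)); // [ 2 ]
```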
|
||||
---
|
||||
|
||||
## Bug: Cover decorative lines appear truncated or misaligned
|
||||
|
||||
**Symptom**: Horizontal decorative lines on the cover (accent strips, divider rules) display at different widths in MS Office vs WPS, or appear truncated / not spanning the intended width.
|
||||
|
||||
**Root cause**: Lines were implemented using text characters (`───`, `━━━`, `═══`, `——————`) instead of paragraph borders. Character-drawn lines depend on font metrics (character width × count), which vary across rendering engines.
|
||||
|
||||
**Fix**: Always use **paragraph borders** for decorative lines:
|
||||
```js
|
||||
// ✅ Paragraph border — renders consistently in both MS Office and WPS
|
||||
new Paragraph({
|
||||
indent: { left: 1000, right: 1000 },
|
||||
border: { top: { style: BorderStyle.SINGLE, size: 18, color: accentColor, space: 20 } },
|
||||
children: [],
|
||||
})
|
||||
|
||||
// ❌ NEVER use text characters for decorative lines
|
||||
new TextRun({ text: "───────────────" }) // width varies across engines
|
||||
```
|
||||
|
||||
**Note**: This applies to ALL cover recipes (R1–R5). Recipe R2 uses `border.top` and `border.bottom` for its double-rule frame — follow this pattern.
|
||||
|
||||
---
|
||||
|
||||
## Bug: "undefined" appears in document text
|
||||
|
||||
**Symptom**: Fields like "Contact: undefined" or "Location: undefined" in generated documents.
|
||||
|
||||
**Root cause**: Reading a property that doesn't exist on the config object yields `undefined`, and interpolating that value into a template literal or concatenating it with a string produces the literal text `"undefined"`.
|
||||
|
||||
**Fix**: Use `safeText()` helper for ALL user-facing text values:
|
||||
```js
|
||||
function safeText(value, placeholder) {
|
||||
if (value === undefined || value === null || value === "" ||
|
||||
String(value) === "NaN" || String(value) === "undefined") {
|
||||
return placeholder || "【Please fill in】";
|
||||
}
|
||||
return String(value);
|
||||
}
|
||||
// Usage: new TextRun({ text: safeText(config.contact, "【Contact person】") })
|
||||
```
|
||||
276
skills/docx/references/math-formulas.md
Executable file
@@ -0,0 +1,276 @@
|
||||
# Math Formulas — LaTeX → docx-js Mapping
|
||||
|
||||
## Design Philosophy
|
||||
|
||||
GLM uses **LaTeX as the formula input syntax**, internally converting to docx-js Math objects.
|
||||
|
||||
**Why not write OMML directly?**
|
||||
- Models are naturally proficient in LaTeX (abundant in training data)
|
||||
- LaTeX is semantically clear and highly readable
|
||||
- Conversion layer is encapsulated internally, transparent to the user
|
||||
|
||||
## Quick Start
|
||||
|
||||
```js
|
||||
const { Math: OoxmlMath, MathRun, MathFraction, MathSuperScript,
|
||||
MathSubScript, MathRadical, MathSum, MathSubSuperScript } = require("docx");
|
||||
|
||||
// Embed formula in paragraph
|
||||
new Paragraph({
|
||||
alignment: AlignmentType.CENTER,
|
||||
children: [
|
||||
new OoxmlMath({
|
||||
children: [/* Math components */]
|
||||
})
|
||||
]
|
||||
})
|
||||
```
|
||||
|
||||
## LaTeX → docx-js Conversion Table
|
||||
|
||||
### Basic Operations
|
||||
|
||||
| LaTeX | Meaning | docx-js Implementation |
|
||||
|-------|---------|----------------------|
|
||||
| `x + y` | Addition | `new MathRun("x + y")` |
|
||||
| `x - y` | Subtraction | `new MathRun("x − y")` (use Unicode minus `−`) |
|
||||
| `x \times y` | Multiplication | `new MathRun("x × y")` |
|
||||
| `x \div y` | Division | `new MathRun("x ÷ y")` |
|
||||
| `x \pm y` | Plus-minus | `new MathRun("x ± y")` |
|
||||
| `x \neq y` | Not equal | `new MathRun("x ≠ y")` |
|
||||
| `x \leq y` | Less or equal | `new MathRun("x ≤ y")` |
|
||||
| `x \geq y` | Greater or equal | `new MathRun("x ≥ y")` |
|
||||
|
||||
### Fractions
|
||||
|
||||
| LaTeX | docx-js |
|
||||
|-------|---------|
|
||||
| `\frac{a}{b}` | `new MathFraction({ numerator: [new MathRun("a")], denominator: [new MathRun("b")] })` |
|
||||
| `\frac{x+1}{x-1}` | `new MathFraction({ numerator: [new MathRun("x+1")], denominator: [new MathRun("x−1")] })` |
|
||||
|
||||
### Superscripts & Subscripts
|
||||
|
||||
| LaTeX | docx-js |
|
||||
|-------|---------|
|
||||
| `x^2` | `new MathSuperScript({ children: [new MathRun("x")], superScript: [new MathRun("2")] })` |
|
||||
| `x_i` | `new MathSubScript({ children: [new MathRun("x")], subScript: [new MathRun("i")] })` |
|
||||
| `x_i^2` | `new MathSubSuperScript({ children: [new MathRun("x")], subScript: [new MathRun("i")], superScript: [new MathRun("2")] })` |
|
||||
|
||||
### Radicals
|
||||
|
||||
| LaTeX | docx-js |
|
||||
|-------|---------|
|
||||
| `\sqrt{x}` | `new MathRadical({ children: [new MathRun("x")] })` |
|
||||
| `\sqrt[3]{x}` | `new MathRadical({ children: [new MathRun("x")], degree: [new MathRun("3")] })` |
|
||||
|
||||
### Summation & Integrals
|
||||
|
||||
| LaTeX | docx-js |
|
||||
|-------|---------|
|
||||
| `\sum_{i=1}^{n} a_i` | `new MathSum({ subScript: [new MathRun("i=1")], superScript: [new MathRun("n")], children: [new MathRun("aᵢ")] })` |
|
||||
|
||||
### Greek Letters
|
||||
|
||||
Use Unicode characters directly:
|
||||
|
||||
```js
|
||||
// LaTeX → Unicode mapping
|
||||
const GREEK = {
|
||||
"\\alpha": "α", "\\beta": "β", "\\gamma": "γ", "\\delta": "δ",
|
||||
"\\epsilon": "ε", "\\zeta": "ζ", "\\eta": "η", "\\theta": "θ",
|
||||
"\\iota": "ι", "\\kappa": "κ", "\\lambda": "λ", "\\mu": "μ",
|
||||
"\\nu": "ν", "\\xi": "ξ", "\\pi": "π", "\\rho": "ρ",
|
||||
"\\sigma": "σ", "\\tau": "τ", "\\phi": "φ", "\\chi": "χ",
|
||||
"\\psi": "ψ", "\\omega": "ω",
|
||||
"\\Alpha": "Α", "\\Beta": "Β", "\\Gamma": "Γ", "\\Delta": "Δ",
|
||||
"\\Theta": "Θ", "\\Lambda": "Λ", "\\Pi": "Π", "\\Sigma": "Σ",
|
||||
"\\Phi": "Φ", "\\Psi": "Ψ", "\\Omega": "Ω",
|
||||
};
|
||||
```
|
||||
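For the flat parts of a formula (operators and Greek letters, no fractions or scripts), the mappings in the two tables above can be applied as plain string substitution before building `MathRun`s. A minimal sketch; `toMathText` is hypothetical and its symbol table is only a subset of the entries above.

```js
// Subset of the LaTeX -> Unicode mappings above; extend with the full GREEK table.
const LATEX_SYMBOLS = {
  times: "×", div: "÷", pm: "±", neq: "≠", leq: "≤", geq: "≥",
  alpha: "α", beta: "β", theta: "θ", pi: "π", Delta: "Δ", Sigma: "Σ",
};

// Replace known \commands with Unicode and "-" with the Unicode minus sign (U+2212).
// Unknown commands are left untouched so they are easy to spot.
function toMathText(latex) {
  return latex
    .replace(/\\([A-Za-z]+)/g, (whole, name) =>
      Object.prototype.hasOwnProperty.call(LATEX_SYMBOLS, name) ? LATEX_SYMBOLS[name] : whole)
    .replace(/-/g, "\u2212");
}

// new MathRun(toMathText("x \\leq 2\\pi")) gets the text "x ≤ 2π"
console.log(toMathText("x \\times y - 1")); // x × y − 1
```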
|
||||
## Complete Formula Examples
|
||||
|
||||
### Quadratic Formula
|
||||
|
||||
LaTeX: `x = \frac{-b \pm \sqrt{b^2 - 4ac}}{2a}`
|
||||
|
||||
```js
|
||||
new OoxmlMath({
|
||||
children: [
|
||||
new MathRun("x = "),
|
||||
new MathFraction({
|
||||
numerator: [
|
||||
new MathRun("−b ± "),
|
||||
new MathRadical({
|
||||
children: [
|
||||
new MathSuperScript({
|
||||
children: [new MathRun("b")],
|
||||
superScript: [new MathRun("2")],
|
||||
}),
|
||||
new MathRun(" − 4ac"),
|
||||
],
|
||||
}),
|
||||
],
|
||||
denominator: [new MathRun("2a")],
|
||||
}),
|
||||
],
|
||||
})
|
||||
```
|
||||
|
||||
### Pythagorean Theorem
|
||||
|
||||
LaTeX: `a^2 + b^2 = c^2`
|
||||
|
||||
```js
|
||||
new OoxmlMath({
|
||||
children: [
|
||||
new MathSuperScript({ children: [new MathRun("a")], superScript: [new MathRun("2")] }),
|
||||
new MathRun(" + "),
|
||||
new MathSuperScript({ children: [new MathRun("b")], superScript: [new MathRun("2")] }),
|
||||
new MathRun(" = "),
|
||||
new MathSuperScript({ children: [new MathRun("c")], superScript: [new MathRun("2")] }),
|
||||
],
|
||||
})
|
||||
```
|
||||
|
||||
### Trigonometric Identity
|
||||
|
||||
LaTeX: `\sin^2\theta + \cos^2\theta = 1`
|
||||
|
||||
```js
|
||||
new OoxmlMath({
|
||||
children: [
|
||||
new MathSuperScript({ children: [new MathRun("sin")], superScript: [new MathRun("2")] }),
|
||||
new MathRun("θ + "),
|
||||
new MathSuperScript({ children: [new MathRun("cos")], superScript: [new MathRun("2")] }),
|
||||
new MathRun("θ = 1"),
|
||||
],
|
||||
})
|
||||
```
|
||||
|
||||
## Common Exam Formula Templates
|
||||
|
||||
### Middle School Math
|
||||
|
||||
```js
|
||||
// Quadratic discriminant
|
||||
const discriminant = new OoxmlMath({
|
||||
children: [
|
||||
new MathRun("Δ = "),
|
||||
new MathSuperScript({ children: [new MathRun("b")], superScript: [new MathRun("2")] }),
|
||||
new MathRun(" − 4ac"),
|
||||
],
|
||||
});
|
||||
|
||||
// Circle area
|
||||
const circleArea = new OoxmlMath({
|
||||
children: [
|
||||
new MathRun("S = π"),
|
||||
new MathSuperScript({ children: [new MathRun("r")], superScript: [new MathRun("2")] }),
|
||||
],
|
||||
});
|
||||
```
|
||||
|
||||
### High School Math
|
||||
|
||||
```js
|
||||
// Logarithm change of base
|
||||
const logChange = new OoxmlMath({
|
||||
children: [
|
||||
new MathSubScript({ children: [new MathRun("log")], subScript: [new MathRun("a")] }),
|
||||
new MathRun("b = "),
|
||||
new MathFraction({
|
||||
numerator: [new MathRun("ln b")],
|
||||
denominator: [new MathRun("ln a")],
|
||||
}),
|
||||
],
|
||||
});
|
||||
|
||||
// Arithmetic series sum
|
||||
const arithmeticSum = new OoxmlMath({
|
||||
children: [
|
||||
new MathSubScript({ children: [new MathRun("S")], subScript: [new MathRun("n")] }),
|
||||
new MathRun(" = "),
|
||||
new MathFraction({
|
||||
numerator: [
|
||||
new MathRun("n("),
|
||||
new MathSubScript({ children: [new MathRun("a")], subScript: [new MathRun("1")] }),
|
||||
new MathRun(" + "),
|
||||
new MathSubScript({ children: [new MathRun("a")], subScript: [new MathRun("n")] }),
|
||||
new MathRun(")"),
|
||||
],
|
||||
denominator: [new MathRun("2")],
|
||||
}),
|
||||
],
|
||||
});
|
||||
```
|
||||
|
||||
### Physics
|
||||
|
||||
```js
|
||||
// Newton's second law
|
||||
const newton2 = new OoxmlMath({
|
||||
children: [new MathRun("F = ma")],
|
||||
});
|
||||
|
||||
// Kinetic energy
|
||||
const kineticEnergy = new OoxmlMath({
|
||||
children: [
|
||||
new MathSubScript({ children: [new MathRun("E")], subScript: [new MathRun("k")] }),
|
||||
new MathRun(" = "),
|
||||
new MathFraction({
|
||||
numerator: [new MathRun("1")],
|
||||
denominator: [new MathRun("2")],
|
||||
}),
|
||||
new MathRun("m"),
|
||||
new MathSuperScript({ children: [new MathRun("v")], superScript: [new MathRun("2")] }),
|
||||
],
|
||||
});
|
||||
```
|
||||
|
||||
## Complexity Fallback Strategy
|
||||
|
||||
When a formula is too complex for docx-js Math (nesting deeper than 3 levels), **fall back to matplotlib PNG rendering:**
|
||||
|
||||
```python
|
||||
import matplotlib
|
||||
matplotlib.use("Agg")
|
||||
import matplotlib.pyplot as plt
|
||||
|
||||
def latex_to_png(latex_str: str, output_path: str, fontsize: int = 14, dpi: int = 200):
|
||||
    """Render LaTeX formula as PNG image"""
|
||||
    fig, ax = plt.subplots(figsize=(0.1, 0.1))
|
||||
    ax.axis("off")
|
||||
    text = ax.text(0, 0.5, f"${latex_str}$", fontsize=fontsize,
|
||||
                   transform=ax.transAxes, verticalalignment="center")
|
||||
|
||||
    fig.canvas.draw()
|
||||
    bbox = text.get_window_extent(fig.canvas.get_renderer())
|
||||
    # bbox is measured in pixels at the figure's own dpi (fig.dpi), not the save dpi
|
||||
    fig.set_size_inches(bbox.width / fig.dpi + 0.2, bbox.height / fig.dpi + 0.2)
|
||||
|
||||
    plt.savefig(output_path, dpi=dpi, bbox_inches="tight",
|
||||
                pad_inches=0.05, transparent=True)
|
||||
    plt.close(fig)
|
||||
    return output_path
|
||||
```
|
||||
|
||||
Then embed the PNG in the document:
|
||||
|
||||
```js
|
||||
const formulaImg = fs.readFileSync("formula.png");
|
||||
new Paragraph({
|
||||
alignment: AlignmentType.CENTER,
|
||||
children: [new ImageRun({
|
||||
data: formulaImg,
|
||||
transformation: { width: 300, height: 40 }, // adjust based on actual size
|
||||
type: "png",
|
||||
})],
|
||||
})
|
||||
```
|
||||
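Rather than guessing the `transformation` size, it can be derived from the PNG's pixel dimensions and the `dpi` passed to `latex_to_png`. A sketch, assuming docx-js interprets `transformation` values as 96-dpi pixels, so a formula rendered at 200 dpi is scaled by 96/200 to appear at its rendered physical size; `formulaDisplaySize` is a hypothetical helper, and the pixel size itself can be read with PIL or from the PNG header.

```js
// Hypothetical helper: display dimensions for a rendered formula image.
// renderDpi must match the dpi used when saving the PNG (latex_to_png default: 200).
function formulaDisplaySize(pixelWidth, pixelHeight, renderDpi = 200, targetDpi = 96) {
  const k = targetDpi / renderDpi;
  return { width: Math.round(pixelWidth * k), height: Math.round(pixelHeight * k) };
}

// A 600x80 px PNG saved at 200 dpi -> transformation: { width: 288, height: 38 }
console.log(formulaDisplaySize(600, 80)); // { width: 288, height: 38 }
```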
|
||||
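**Fallback rules** (matplotlib's built-in mathtext does not parse `\begin{...}` environments such as `pmatrix` or `cases`; matrices and piecewise functions need `plt.rcParams["text.usetex"] = True`, a local LaTeX installation, and `\usepackage{amsmath}` in `text.latex.preamble`):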
|
||||
- Nested fractions >2 levels → fallback
|
||||
- Matrices/determinants → fallback
|
||||
- Complex integrals (multiple integrals + limits + integrand) → fallback
|
||||
- Piecewise functions → fallback
|
||||
- All other cases → prefer docx-js Math
|
||||
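The fallback rules above can be approximated mechanically from the LaTeX source. This is a heuristic sketch, not the skill's actual decision logic: it uses brace depth as a proxy for nesting (threshold 3, from the sentence introducing the fallback), so it does not distinguish fractions from other groups, and it flags matrix/cases environments and double or triple integrals by pattern. `needsPngFallback` and `maxBraceDepth` are hypothetical.

```js
// Deepest {...} nesting, skipping escaped characters such as \{ and \}.
function maxBraceDepth(latex) {
  let depth = 0;
  let max = 0;
  for (let i = 0; i < latex.length; i++) {
    const ch = latex[i];
    if (ch === "\\") { i++; continue; } // skip the character after a backslash
    if (ch === "{") max = Math.max(max, ++depth);
    else if (ch === "}") depth--;
  }
  return max;
}

// Heuristic version of the fallback rules above.
function needsPngFallback(latex) {
  return (
    maxBraceDepth(latex) > 3 ||                                 // deep nesting
    /\\begin\{(?:[pbvBV]?matrix|cases|array)\}/.test(latex) || // matrices, piecewise
    /\\i{2,3}nt(?![A-Za-z])/.test(latex)                        // \iint, \iiint
  );
}

console.log(needsPngFallback("x = \\frac{-b \\pm \\sqrt{b^2 - 4ac}}{2a}")); // false
console.log(needsPngFallback("\\begin{pmatrix} a & b \\\\ c & d \\end{pmatrix}")); // true
```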
222
skills/docx/references/ooxml.md
Executable file
@@ -0,0 +1,222 @@
|
||||
# OOXML Editing Reference — Document Library API
|
||||
|
||||
**Important: Read this entire document before editing.** This is the primary reference for modifying existing .docx files.
|
||||
|
||||
## Document Library (Python) — Primary API
|
||||
|
||||
Use the `Document` class from `"$DOCX_SCRIPTS/document.py"` for all edits, tracked changes, and comments. It handles infrastructure automatically (people.xml, RSIDs, settings.xml, comments, relationships, content types).
|
||||
|
||||
**Working with Unicode and Entities:**
|
||||
- Both entity notation and Unicode work for search: `contains="&#8220;Company"` ≡ `contains="\u201cCompany"`
|
||||
- Both work for replacement too
|
||||
|
||||
### Setup
|
||||
|
||||
```bash
|
||||
# Find the docx skill root
|
||||
find /mnt/skills -name "document.py" -path "*/docx/scripts/*" 2>/dev/null | head -1
|
||||
# Skill root = parent of scripts/
|
||||
|
||||
# Run with PYTHONPATH
|
||||
PYTHONPATH=/mnt/skills/docx python your_script.py
|
||||
```
|
||||
|
||||
```python
|
||||
from scripts.document import Document, DocxXMLEditor
|
||||
|
||||
# Basic init (auto-creates temp copy, sets up infrastructure)
|
||||
doc = Document('unpacked')
|
||||
|
||||
# Custom author/initials
|
||||
doc = Document('unpacked', author="John Doe", initials="JD")
|
||||
|
||||
# Enable tracked changes
|
||||
doc = Document('unpacked', track_revisions=True)
|
||||
|
||||
# Custom RSID (auto-generated if omitted)
|
||||
doc = Document('unpacked', rsid="07DC5ECB")
|
||||
```
|
||||
|
||||
### Finding Nodes
|
||||
|
||||
```python
|
||||
# By text
|
||||
node = doc["word/document.xml"].get_node(tag="w:p", contains="specific text")
|
||||
|
||||
# By line range
|
||||
para = doc["word/document.xml"].get_node(tag="w:p", line_number=range(100, 150))
|
||||
|
||||
# By attributes
|
||||
node = doc["word/document.xml"].get_node(tag="w:del", attrs={"w:id": "1"})
|
||||
|
||||
# By exact line number
|
||||
para = doc["word/document.xml"].get_node(tag="w:p", line_number=42)
|
||||
|
||||
# Combined filters (disambiguation)
|
||||
node = doc["word/document.xml"].get_node(tag="w:r", contains="Section", line_number=range(2400, 2500))
|
||||
```
|
||||
|
||||
### Tracked Changes
|
||||
|
||||
**CRITICAL**: Only mark text that actually changes. Keep unchanged text outside `<w:del>`/`<w:ins>` tags.
|
||||
|
||||
**Method Selection**:
|
||||
- Regular text → `replace_node()` with `<w:del>`/`<w:ins>`, or `suggest_deletion()` for whole elements
|
||||
- Partially modify another's tracked change → `replace_node()` to nest changes
|
||||
- Reject another's insertion → `revert_insertion()` (NOT `suggest_deletion()`)
|
||||
- Reject another's deletion → `revert_deletion()`
|
||||
|
||||
```python
|
||||
# Change one word: "monthly" → "quarterly"
|
||||
node = doc["word/document.xml"].get_node(tag="w:r", contains="The report is monthly")
|
||||
rpr = tags[0].toxml() if (tags := node.getElementsByTagName("w:rPr")) else ""
|
||||
replacement = f'<w:r w:rsidR="00AB12CD">{rpr}<w:t>The report is </w:t></w:r><w:del><w:r>{rpr}<w:delText>monthly</w:delText></w:r></w:del><w:ins><w:r>{rpr}<w:t>quarterly</w:t></w:r></w:ins>'
|
||||
doc["word/document.xml"].replace_node(node, replacement)
|
||||
|
||||
# Delete entire run
|
||||
node = doc["word/document.xml"].get_node(tag="w:r", contains="text to delete")
|
||||
doc["word/document.xml"].suggest_deletion(node)
|
||||
|
||||
# Delete entire paragraph
|
||||
para = doc["word/document.xml"].get_node(tag="w:p", contains="paragraph to delete")
|
||||
doc["word/document.xml"].suggest_deletion(para)
|
||||
|
||||
# Insert new content after a node
|
||||
node = doc["word/document.xml"].get_node(tag="w:r", contains="existing text")
|
||||
doc["word/document.xml"].insert_after(node, '<w:ins><w:r><w:t>new text</w:t></w:r></w:ins>')
|
||||
|
||||
# Add new numbered list item
|
||||
target_para = doc["word/document.xml"].get_node(tag="w:p", contains="existing list item")
|
||||
pPr = tags[0].toxml() if (tags := target_para.getElementsByTagName("w:pPr")) else ""
|
||||
new_item = f'<w:p>{pPr}<w:r><w:t>New item</w:t></w:r></w:p>'
|
||||
tracked_para = DocxXMLEditor.suggest_paragraph(new_item)
|
||||
doc["word/document.xml"].insert_after(target_para, tracked_para)
|
||||
```
|
||||
|
||||
### Handling Other Authors' Changes
|
||||
|
||||
```python
|
||||
# Partially delete another author's insertion
|
||||
node = doc["word/document.xml"].get_node(tag="w:ins", attrs={"w:id": "5"})
|
||||
replacement = '''<w:ins w:author="Jane Smith" w:date="2025-01-15T10:00:00Z">
|
||||
<w:r><w:t>quarterly </w:t></w:r>
|
||||
<w:del><w:r><w:delText>financial </w:delText></w:r></w:del>
|
||||
<w:r><w:t>report</w:t></w:r>
|
||||
</w:ins>'''
|
||||
doc["word/document.xml"].replace_node(node, replacement)
|
||||
|
||||
# Reject insertion (wraps in deletion)
|
||||
ins = doc["word/document.xml"].get_node(tag="w:ins", attrs={"w:id": "5"})
|
||||
doc["word/document.xml"].revert_insertion(ins)
|
||||
|
||||
# Reject deletion (restores deleted content)
|
||||
del_elem = doc["word/document.xml"].get_node(tag="w:del", attrs={"w:id": "3"})
|
||||
doc["word/document.xml"].revert_deletion(del_elem)
|
||||
```
|
||||
|
||||
### Comments
|
||||
|
||||
```python
|
||||
doc = Document('unpacked', author="Z.ai", initials="Z")
|
||||
|
||||
# Comment on a range
|
||||
start = doc["word/document.xml"].get_node(tag="w:del", attrs={"w:id": "1"})
|
||||
end = doc["word/document.xml"].get_node(tag="w:ins", attrs={"w:id": "2"})
|
||||
doc.add_comment(start=start, end=end, text="Explanation of this change")
|
||||
|
||||
# Comment on paragraph
|
||||
para = doc["word/document.xml"].get_node(tag="w:p", contains="text")
|
||||
doc.add_comment(start=para, end=para, text="Comment here")
|
||||
|
||||
# Comment on newly created tracked change
|
||||
node = doc["word/document.xml"].get_node(tag="w:r", contains="old")
|
||||
new_nodes = doc["word/document.xml"].replace_node(
|
||||
node, '<w:del><w:r><w:delText>old</w:delText></w:r></w:del><w:ins><w:r><w:t>new</w:t></w:r></w:ins>')
|
||||
doc.add_comment(start=new_nodes[0], end=new_nodes[1], text="Changed per requirements")
|
||||
|
||||
# Reply to comment
|
||||
doc.reply_to_comment(parent_comment_id=0, text="I agree")
|
||||
```
|
||||
|
||||
### Images
|
||||
|
||||
```python
|
||||
from PIL import Image
|
||||
import shutil, os
|
||||
|
||||
doc = Document('unpacked')
|
||||
media_dir = os.path.join(doc.unpacked_path, 'word/media')
|
||||
os.makedirs(media_dir, exist_ok=True)
|
||||
shutil.copy('image.png', os.path.join(media_dir, 'image1.png'))
|
||||
|
||||
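with Image.open(os.path.join(media_dir, 'image1.png')) as img:
    px_w, px_h = img.size  # pixel dimensions; the file is closed after this block
width_emus = int(6.5 * 914400)  # 914400 EMU per inch; 6.5" = Letter width minus 1" margins
height_emus = int(width_emus * px_h / px_w)  # keep the aspect ratio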
|
||||
|
||||
# Add relationship
|
||||
rels_editor = doc['word/_rels/document.xml.rels']
|
||||
next_rid = rels_editor.get_next_rid()
|
||||
rels_editor.append_to(rels_editor.dom.documentElement,
|
||||
f'<Relationship Id="{next_rid}" Type="http://schemas.openxmlformats.org/officeDocument/2006/relationships/image" Target="media/image1.png"/>')
|
||||
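# Register the PNG content type only if it is missing; duplicate Default entries are invalid
ct_editor = doc['[Content_Types].xml']
if not any(d.getAttribute("Extension").lower() == "png"
           for d in ct_editor.dom.getElementsByTagName("Default")):
    ct_editor.append_to(ct_editor.dom.documentElement,
                        '<Default Extension="png" ContentType="image/png"/>')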
|
||||
|
||||
# Insert
|
||||
node = doc["word/document.xml"].get_node(tag="w:p", line_number=100)
|
||||
doc["word/document.xml"].insert_after(node, f'''<w:p><w:r><w:drawing>
|
||||
<wp:inline distT="0" distB="0" distL="0" distR="0">
|
||||
<wp:extent cx="{width_emus}" cy="{height_emus}"/>
|
||||
<wp:docPr id="1" name="Picture 1"/>
|
||||
<a:graphic xmlns:a="http://schemas.openxmlformats.org/drawingml/2006/main">
|
||||
<a:graphicData uri="http://schemas.openxmlformats.org/drawingml/2006/picture">
|
||||
<pic:pic xmlns:pic="http://schemas.openxmlformats.org/drawingml/2006/picture">
|
||||
<pic:nvPicPr><pic:cNvPr id="1" name="image1.png"/><pic:cNvPicPr/></pic:nvPicPr>
|
||||
<pic:blipFill><a:blip r:embed="{next_rid}"/><a:stretch><a:fillRect/></a:stretch></pic:blipFill>
|
||||
<pic:spPr><a:xfrm><a:ext cx="{width_emus}" cy="{height_emus}"/></a:xfrm><a:prstGeom prst="rect"><a:avLst/></a:prstGeom></pic:spPr>
|
||||
</pic:pic>
|
||||
</a:graphicData>
|
||||
</a:graphic>
|
||||
</wp:inline>
|
||||
</w:drawing></w:r></w:p>''')
|
||||
```
|
||||
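The EMU arithmetic above can be pulled into a small helper. This is a minimal sketch assuming a 6.5-inch usable width (Letter paper with 1-inch margins) and pixel dimensions you have already read, for example from `Image.open(...).size`; `image_extent_emus` is a hypothetical name, not part of the Document library.

```python
EMU_PER_INCH = 914400  # DrawingML unit: 1 inch = 914400 EMU


def image_extent_emus(px_width: int, px_height: int,
                      max_width_in: float = 6.5) -> tuple[int, int]:
    """Return (cx, cy) in EMU for an image scaled to max_width_in, keeping its aspect ratio."""
    if px_width <= 0 or px_height <= 0:
        raise ValueError("image dimensions must be positive")
    cx = int(max_width_in * EMU_PER_INCH)
    cy = int(cx * px_height / px_width)
    return cx, cy


# A 1000 x 500 px image scaled to the full 6.5" usable width
cx, cy = image_extent_emus(1000, 500)
print(cx, cy)  # 5943600 2971800
```

The same `(cx, cy)` pair goes into both `<wp:extent>` and `<a:ext>`.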
|
||||
### Saving
|
||||
|
||||
```python
|
||||
doc.save() # Validates + copies back to original dir
|
||||
doc.save('modified-unpacked') # Save to different location
|
||||
doc.save(validate=False) # Skip validation (debug only)
|
||||
```
|
||||
|
||||
### Direct DOM Manipulation
|
||||
|
||||
```python
|
||||
editor = doc["word/document.xml"]
|
||||
# Delete a paragraph directly through the DOM (not tracked)
node = editor.get_node(tag="w:p", line_number=5)
|
||||
parent = node.parentNode
|
||||
parent.removeChild(node)
|
||||
|
||||
# General replacement (without tracked changes)
|
||||
old = doc["word/document.xml"].get_node(tag="w:p", contains="original")
|
||||
doc["word/document.xml"].replace_node(old, "<w:p><w:r><w:t>replacement</w:t></w:r></w:p>")
|
||||
|
||||
# Chained insertions
|
||||
node = doc["word/document.xml"].get_node(tag="w:r", line_number=100)
|
||||
nodes = doc["word/document.xml"].insert_after(node, "<w:r><w:t>A</w:t></w:r>")
|
||||
nodes = doc["word/document.xml"].insert_after(nodes[-1], "<w:r><w:t>B</w:t></w:r>")
|
||||
```
|
||||
|
||||
## Schema Compliance Quick Reference
|
||||
|
||||
- **Element ordering in `<w:pPr>`**: `<w:pStyle>` → `<w:numPr>` → `<w:spacing>` → `<w:ind>` → `<w:jc>`
|
||||
- **Whitespace**: Add `xml:space="preserve"` to any `<w:t>` whose text has leading or trailing spaces; without it, those spaces are dropped
|
||||
- **RSIDs**: 8-digit hex only (0-9, A-F)
|
||||
- **trackRevisions**: Add `<w:trackRevisions/>` after `<w:proofState>` in settings.xml
|
||||
- **`<w:del>`/`<w:ins>` placement**: At paragraph level, containing complete `<w:r>` elements. Never nest inside `<w:r>`.
|
||||
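Two of the rules above are easy to enforce mechanically. This sketch uses hypothetical helper names (`is_valid_rsid`, `text_element`) and applies the RSID rule literally, accepting only digits and uppercase A-F. It builds the `<w:t>` markup as a string, which you could then pass to the Document library's insertion methods.

```python
import re
from xml.sax.saxutils import escape

RSID_RE = re.compile(r"[0-9A-F]{8}")


def is_valid_rsid(value: str) -> bool:
    """True if value is exactly 8 hex digits using 0-9 and uppercase A-F."""
    return RSID_RE.fullmatch(value) is not None


def text_element(text: str) -> str:
    """Build a <w:t>, adding xml:space="preserve" only when edge whitespace would be lost."""
    preserve = ' xml:space="preserve"' if text != text.strip() else ""
    return f"<w:t{preserve}>{escape(text)}</w:t>"


print(is_valid_rsid("00A1B2C3"))   # True
print(is_valid_rsid("00a1b2c3"))   # False (lowercase)
print(text_element("quarterly "))  # <w:t xml:space="preserve">quarterly </w:t>
```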
|
||||
## Validation Rules
|
||||
|
||||
The validator reverts GLM's tracked changes and checks that the resulting text matches the original document. To pass:
|
||||
- **Never modify text inside another author's `<w:ins>` or `<w:del>` tags**
|
||||
- **Use nested deletions** to remove another author's insertions
|
||||
- **Every edit must be tracked** with `<w:ins>` or `<w:del>` tags
|
||||
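One way to picture the check: reject GLM's changes and read the remaining text. The function below is a simplified, hypothetical sketch of that idea, not the validator's actual code. It assumes changes are identified by `w:author` and only looks at `<w:t>` and `<w:delText>`. The sample reuses the nested-deletion pattern from the tracked-changes example: rejecting GLM's deletion restores "financial" inside Jane Smith's insertion.

```python
import xml.etree.ElementTree as ET

W = "{http://schemas.openxmlformats.org/wordprocessingml/2006/main}"


def text_before_author(paragraph_xml: str, author: str) -> str:
    """Text of a fragment as it read before `author`'s tracked changes.

    - <w:t> inside the author's own <w:ins> is dropped (it did not exist before).
    - <w:delText> whose nearest <w:del> belongs to the author is restored.
    - Other authors' insertions stay visible; their deletions stay deleted.
    """
    parts = []

    def walk(el, in_own_ins, del_author):
        if el.tag == W + "ins" and el.get(W + "author") == author:
            in_own_ins = True
        elif el.tag == W + "del":
            del_author = el.get(W + "author")
        if not in_own_ins:
            if el.tag == W + "t":
                parts.append(el.text or "")
            elif el.tag == W + "delText" and del_author == author:
                parts.append(el.text or "")
        for child in el:
            walk(child, in_own_ins, del_author)

    walk(ET.fromstring(paragraph_xml), False, None)
    return "".join(parts)


NS = 'xmlns:w="http://schemas.openxmlformats.org/wordprocessingml/2006/main"'
sample = (f'<w:p {NS}><w:ins w:author="Jane Smith"><w:r><w:t>quarterly </w:t></w:r>'
          '<w:del w:author="GLM"><w:r><w:delText>financial </w:delText></w:r></w:del>'
          '<w:r><w:t>report</w:t></w:r></w:ins></w:p>')
print(text_before_author(sample, "GLM"))  # quarterly financial report
```

Editing the text inside Jane Smith's `<w:ins>` directly would make this reconstruction differ from the original, which is why the first rule forbids it.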
264
skills/docx/references/toc.md
Executable file
@@ -0,0 +1,264 @@
|
||||
# Table of Contents (TOC) — Complete Reference
|
||||
|
||||
> **This is the single source of truth for all TOC rules.** Other files should reference this file instead of duplicating TOC instructions.
|
||||
|
||||
## Overview
|
||||
|
||||
DOCX TOC is a **3-step process**: Code → Post-process → User opens Word.
|
||||
|
||||
```
|
||||
Step A: docx-js code generates empty TOC field structure
|
||||
Step B: add_toc_placeholders.py fills it with visible placeholder entries
|
||||
Step C: User opens Word → "Update Field" → real page numbers replace placeholders
|
||||
```
|
||||
|
||||
All 3 steps are **mandatory**. Skipping any step results in a broken or empty TOC.
|
||||
|
||||
## When to Add TOC
|
||||
|
||||
- **Recommended**: Long or complex documents with many headings (reports, theses, papers, manuals)
|
||||
- **Do NOT add**: Resumes, contracts, letters, exam papers, short documents
|
||||
- **postcheck rule**: If the document contains a "目录" ("Contents") title but no `TableOfContents` element, postcheck reports an error
|
||||
|
||||
## Step A: Code Generation (docx-js)
|
||||
|
||||
Insert **4 elements** in sequence:
|
||||
|
||||
```js
|
||||
const { TableOfContents, Paragraph, TextRun, PageBreak, AlignmentType } = require("docx");
|
||||
|
||||
// 1. TOC title — ⛔ DO NOT use HeadingLevel (or TOC will index itself!)
|
||||
new Paragraph({
|
||||
alignment: AlignmentType.CENTER,
|
||||
spacing: { before: 480, after: 360 },
|
||||
children: [new TextRun({
|
||||
text: "目 录", // or "Table of Contents" for English docs
|
||||
bold: true, size: 32,
|
||||
font: { eastAsia: "SimHei", ascii: "Times New Roman" }
|
||||
})],
|
||||
}),
|
||||
|
||||
// 2. TOC field element — ⚠️ first parameter is NOT displayed, it's internal name only
|
||||
new TableOfContents("Table of Contents", {
|
||||
hyperlink: true,
|
||||
headingStyleRange: "1-3", // match HeadingLevel range used in document
|
||||
}),
|
||||
|
||||
// 3. ★ MANDATORY Refresh Hint — tells user how to update page numbers
|
||||
new Paragraph({
|
||||
spacing: { before: 200 },
|
||||
children: [new TextRun({
|
||||
text: "Note: This Table of Contents is generated via field codes. To ensure page number accuracy after editing, please right-click the TOC and select \"Update Field.\"",
|
||||
italics: true, size: 18, color: "888888"
|
||||
})]
|
||||
}),
|
||||
|
||||
// 4. ★ MANDATORY PageBreak after TOC — prevents TOC and body merging on same page
|
||||
new Paragraph({ children: [new PageBreak()] }),
|
||||
```
|
||||
|
||||
### Heading Requirements
|
||||
|
||||
**⚠️ CRITICAL**: TOC only picks up paragraphs with `heading: HeadingLevel.HEADING_X`.
|
||||
|
||||
```js
|
||||
// ✅ Correct — Heading style, TOC can index
|
||||
new Paragraph({
|
||||
heading: HeadingLevel.HEADING_1,
|
||||
children: [new TextRun({ text: "第一章 引言", bold: true, size: 32, color: c(P.primary) })]
|
||||
})
|
||||
|
||||
// ❌ Wrong — manual bold + large font, TOC cannot detect
|
||||
new Paragraph({
|
||||
children: [new TextRun({ text: "第一章 引言", bold: true, size: 32, color: c(P.primary) })]
|
||||
})
|
||||
```
|
||||
|
||||
**Exceptions:**
|
||||
- Cover title: does NOT need Heading style (should not appear in TOC)
|
||||
- "目录" title: **MUST NOT** use Heading style (prevents TOC from indexing itself)
|
||||
|
||||
## Step B: Post-Processing Script
|
||||
|
||||
**MUST** run after generating the DOCX file:
|
||||
|
||||
```bash
|
||||
python3 "$DOCX_SCRIPTS/add_toc_placeholders.py" output.docx --auto
|
||||
```
|
||||
|
||||
### What the script does
|
||||
|
||||
1. Extracts Heading 1-3 from the document as TOC entries
|
||||
2. Fixes docx-js fldChar structure bug (begin+instrText+separate merged in one `<w:r>`)
|
||||
3. Patches `settings.xml` with `updateFields=true` (Word prompts to refresh on open)
|
||||
4. Ensures Heading styles have `outlineLvl` (required for TOC field update)
|
||||
5. Ensures TOC 1/2/3 styles exist in `styles.xml`
|
||||
6. Injects placeholder entries with HYPERLINK + PAGEREF between `separate` and `end` fldChars
|
||||
7. Handles duplicate heading texts (each gets its own bookmark)
|
||||
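Step 3 of the list could look roughly like the following. This is a hedged sketch, not the script's implementation: `enable_update_fields` is a hypothetical name, it edits settings.xml as text, and the list of elements that must come after `<w:updateFields>` reflects the assumed ECMA-376 `CT_Settings` element order, so verify it against the schema before relying on it.

```python
import re

# Elements assumed to follow <w:updateFields> in the CT_Settings sequence.
_FOLLOWERS = (
    "hdrShapeDefaults", "footnotePr", "endnotePr", "compat", "docVars", "rsids",
    "mathPr", "attachedSchema", "themeFontLang", "clrSchemeMapping",
    "doNotIncludeSubdocsInStats", "doNotAutoCompressPictures", "forceUpgrade",
    "captions", "readModeInkLockDown", "smartTagType", "schemaLibrary",
    "shapeDefaults", "doNotEmbedSmartTags", "decimalSymbol", "listSeparator",
)
_FOLLOWER_RE = re.compile(r"<(?:\w+:)?(?:%s)\b" % "|".join(_FOLLOWERS))
_EXISTING_RE = re.compile(r"<w:updateFields\b[^>]*/>")
UPDATE_FIELDS = '<w:updateFields w:val="true"/>'


def enable_update_fields(settings_xml: str) -> str:
    """Return settings.xml text with <w:updateFields w:val="true"/> set exactly once."""
    if _EXISTING_RE.search(settings_xml):
        # Already present: force the value to true
        return _EXISTING_RE.sub(UPDATE_FIELDS, settings_xml, count=1)
    match = _FOLLOWER_RE.search(settings_xml)
    # Insert before the first element that must follow it, else before the closing tag
    pos = match.start() if match else settings_xml.rindex("</w:settings>")
    return settings_xml[:pos] + UPDATE_FIELDS + settings_xml[pos:]


s = '<w:settings xmlns:w="x"><w:zoom w:percent="100"/><w:compat/><w:rsids/></w:settings>'
print(enable_update_fields(s))
```

The function is idempotent, so running the post-processing step twice does not duplicate the element.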
|
||||
### Error handling
|
||||
|
||||
The script **exits with code 1** if:
|
||||
- No TOC field structure found (missing `TableOfContents` element)
|
||||
- TOC field has `begin` but no `separate` fldChar (malformed structure)
|
||||
- Field structure exists but no TOC instrText detected
|
||||
|
||||
**If exit code = 1 → the generated code is wrong. Fix the code and regenerate.**
|
||||
|
||||
### Options
|
||||
|
||||
```bash
|
||||
# Auto mode (recommended — default behavior)
|
||||
python3 "$DOCX_SCRIPTS/add_toc_placeholders.py" output.docx --auto
|
||||
|
||||
# Manual entries
|
||||
python3 "$DOCX_SCRIPTS/add_toc_placeholders.py" output.docx \
|
||||
--entries '[{"level":1,"text":"Chapter 1","page":"1"},{"level":2,"text":"Section 1.1","page":"2"}]'
|
||||
```
|
||||
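Hand-writing the `--entries` JSON inside a shell command is error-prone once headings contain quotes or non-ASCII text. A sketch that builds it with `json.dumps` and treats any nonzero exit as a failure; `toc_entries_arg`, `toc_command`, and `run_toc_script` are hypothetical helpers, and `DOCX_SCRIPTS` falls back to the current directory if unset.

```python
import json
import os
import shlex
import subprocess


def toc_entries_arg(entries):
    """Serialize (level, text, page) tuples into the JSON string passed to --entries."""
    return json.dumps(
        [{"level": level, "text": text, "page": page} for level, text, page in entries],
        ensure_ascii=False,
    )


def toc_command(docx_path, entries):
    """Build the argument list; no shell quoting is needed when passed to subprocess."""
    script = os.path.join(os.environ.get("DOCX_SCRIPTS", "."), "add_toc_placeholders.py")
    return ["python3", script, docx_path, "--entries", toc_entries_arg(entries)]


def run_toc_script(cmd):
    """Run the script and fail loudly on a nonzero exit (exit 1 = missing/malformed TOC field)."""
    if subprocess.run(cmd).returncode != 0:
        raise RuntimeError("add_toc_placeholders.py failed: fix the generator code and regenerate")


cmd = toc_command("output.docx", [(1, "Chapter 1", "1"), (2, "Section 1.1", "2")])
print(shlex.join(cmd))  # a copy-pasteable shell command with the JSON safely quoted
```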
|
||||
## Step C: User Opens in Word/WPS
|
||||
|
||||
- **Word**: Detects `updateFields=true` → prompts "Update field?" → click Yes → real page numbers
|
||||
- **WPS**: May NOT auto-prompt. User must: right-click TOC → "Update Field" → "Update entire table"
|
||||
|
||||
The placeholder entries ensure TOC is **not blank** even without updating — users see heading titles with approximate page numbers.
|
||||
|
||||
## Multi-Section Page Numbering
|
||||
|
||||
When a document has a TOC, the TOC MUST be in its own section so that body page numbering starts from 1. This applies to **all document types with a TOC** (reports, whitepapers, PRDs, academic papers, etc.) — not just academic papers.
|
||||
|
||||
**Mandatory 3-section architecture for documents with cover + TOC:**
|
||||
|
||||
```js
|
||||
sections: [
|
||||
{ /* Section 1: Cover — no page number, no footer */
|
||||
properties: {
|
||||
page: { size: pgSize, margin: pgMargin },
|
||||
// ⚠️ Do NOT set page.pageNumbers here. docx-js can still emit an empty <pgNumType/> in this section, which confuses WPS; strip it in post-processing
|
||||
},
|
||||
},
|
||||
{ /* Section 2: Front matter (abstract, TOC) — Roman numerals */
|
||||
properties: {
|
||||
type: SectionType.NEXT_PAGE,
|
||||
page: {
|
||||
size: pgSize, margin: pgMargin,
|
||||
pageNumbers: { start: 1, formatType: NumberFormat.UPPER_ROMAN }, // I, II, III...
|
||||
},
|
||||
},
|
||||
footers: { default: pageNumFooter() }, // see footer rules below
|
||||
children: [/* abstract + TOC title + TableOfContents + PageBreak */]
|
||||
},
|
||||
{ /* Section 3: Body — Arabic numerals starting from 1 */
|
||||
properties: {
|
||||
type: SectionType.NEXT_PAGE,
|
||||
page: {
|
||||
size: pgSize, margin: pgMargin,
|
||||
pageNumbers: { start: 1, formatType: NumberFormat.DECIMAL }, // 1, 2, 3...
|
||||
},
|
||||
},
|
||||
footers: { default: pageNumFooter() },
|
||||
children: [/* body content */]
|
||||
},
|
||||
]
|
||||
```
|
||||
|
||||
### ⚠️ Page Number API — Correct Nesting (CRITICAL)
|
||||
|
||||
Page number settings MUST be nested inside `page.pageNumbers`, NOT at properties top level:
|
||||
|
||||
```js
|
||||
// ❌ WRONG — docx-js ignores these, pgNumType will be empty
|
||||
properties: {
|
||||
pageNumberStart: 1,
|
||||
pageNumberFormatType: NumberFormat.DECIMAL,
|
||||
}
|
||||
|
||||
// ✅ CORRECT — docx-js writes start= and fmt= attributes
|
||||
properties: {
|
||||
page: {
|
||||
pageNumbers: { start: 1, formatType: NumberFormat.DECIMAL },
|
||||
},
|
||||
}
|
||||
```
|
||||
|
||||
### ⚠️ Footer Field Instruction — WPS Compatibility (CRITICAL)
|
||||
|
||||
WPS may ignore `pgNumType fmt` in the section properties. To ensure correct display, the footer PAGE field **MUST** include an explicit format switch via **post-processing**:
|
||||
|
||||
After generating the docx, unzip and patch each footer XML:
|
||||
- **Roman numeral footer**: replace `PAGE` with `PAGE \* ROMAN \* MERGEFORMAT`
|
||||
- **Arabic numeral footer**: replace `PAGE` with `PAGE \* arabic \* MERGEFORMAT`
|
||||
|
||||
**⚠️ NEVER use `\* decimal` in instrText** — `decimal` is a docx-js API enum value (`NumberFormat.DECIMAL` for `pgNumType` XML attribute), NOT a valid Word field format switch. Using it causes page numbers to render as "1decimal", "2decimal". The correct Word field switch for Arabic numerals is always `\* arabic`.
|
||||
|
||||
```js
|
||||
// Post-process footer XML:
|
||||
footerXml = footerXml.replace(
|
||||
/(<w:instrText[^>]*>)\s*PAGE\s*(<\/w:instrText>)/g,
|
||||
'$1 PAGE \\* ROMAN \\* MERGEFORMAT $2' // use "arabic" instead of "ROMAN" for the body section
|
||||
);
|
||||
```
|
||||
|
||||
Also remove any empty `<w:pgNumType/>` from the cover section (docx-js emits these even when no pageNumbers is set):
|
||||
```js
|
||||
docXml = docXml.replace(/<w:pgNumType\/>/g, "");
|
||||
```
|
||||
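If the pipeline is driven from Python rather than JavaScript, the same two patches can be applied to the finished .docx. A minimal sketch with hypothetical names; it assumes docx-js writes a bare `PAGE` inside a single `<w:instrText>`, and you must supply which `word/footerN.xml` belongs to which section (check the `footerReference` IDs in each `<w:sectPr>` against `word/_rels/document.xml.rels`).

```python
import re
import zipfile

_PAGE_FIELD = re.compile(r"(<w:instrText[^>]*>)\s*PAGE\s*(</w:instrText>)")


def patch_page_field(footer_xml: str, fmt: str) -> str:
    """Add an explicit format switch to bare PAGE fields; fmt is "ROMAN" or "arabic"."""
    if fmt not in ("ROMAN", "arabic"):
        raise ValueError("use 'ROMAN' or 'arabic' (never 'decimal')")
    return _PAGE_FIELD.sub(
        lambda m: f"{m.group(1)} PAGE \\* {fmt} \\* MERGEFORMAT {m.group(2)}", footer_xml)


def strip_empty_pgnumtype(document_xml: str) -> str:
    """Remove empty <w:pgNumType/> elements (e.g. the cover section's)."""
    return document_xml.replace("<w:pgNumType/>", "")


def patch_docx(src: str, dst: str, footer_formats: dict[str, str]) -> None:
    """Copy src to dst, patching the footers named in footer_formats and document.xml."""
    with zipfile.ZipFile(src) as zin, zipfile.ZipFile(dst, "w", zipfile.ZIP_DEFLATED) as zout:
        for item in zin.infolist():
            data = zin.read(item.filename)
            if item.filename in footer_formats:
                fmt = footer_formats[item.filename]
                data = patch_page_field(data.decode("utf-8"), fmt).encode("utf-8")
            elif item.filename == "word/document.xml":
                data = strip_empty_pgnumtype(data.decode("utf-8")).encode("utf-8")
            zout.writestr(item, data)
```

For example, `patch_docx("output.docx", "output-fixed.docx", {"word/footer1.xml": "ROMAN", "word/footer2.xml": "arabic"})`, where the footer file names are illustrative only.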
|
||||
### Page Numbering Rules
|
||||
|
||||
| Section | Content | Format | Start | Footer |
|
||||
|---------|---------|--------|-------|--------|
|
||||
| Cover | Title page | None | — | No footer |
|
||||
| Front matter | Abstract, TOC | Roman (I, II, III) | 1 | `PAGE \* ROMAN` |
|
||||
| Body | Main content | Arabic (1, 2, 3) | 1 | `PAGE \* arabic` |
|
||||
|
||||
⚠️ **The body section MUST set `pageNumbers: { start: 1 }`** — otherwise page numbers continue from the front matter pages, causing TOC page references to be offset. This is the #1 cause of "TOC page numbers are wrong".
|
||||
|
||||
### Common Causes of Incorrect Page Numbers
|
||||
|
||||
| Cause | Fix |
|
||||
|-------|-----|
|
||||
| `pageNumberStart` at properties top level | Move to `page: { pageNumbers: { start: 1 } }` |
|
||||
| Cover section emits empty `<pgNumType/>` | Post-process to remove it |
|
||||
| Footer uses bare `PAGE` without format switch | Post-process to add `\* ROMAN` or `\* arabic` |
|
||||
| Cover and body in same section | Separate cover into its own section |
|
||||
| Multiple sections without pageNumbers.start | Explicitly set on each section needing independent counting |
|
||||
| headingStyleRange doesn't match headings | Ensure `headingStyleRange: "1-3"` covers all HeadingLevel values used |
|
||||
| Cover section has header/footer | Don't set header/footer on cover section |
|
||||
|
||||
## TOC Refresh Hint (MANDATORY)
|
||||
|
||||
**⚠️ When the document contains a TOC, you MUST add the following hint paragraph between the `TableOfContents` element and the PageBreak (so it appears on the TOC page, not the body page).** This ensures users know how to refresh page numbers after editing.
|
||||
|
||||
```js
|
||||
new Paragraph({
|
||||
spacing: { before: 200 },
|
||||
children: [new TextRun({
|
||||
text: "Note: This Table of Contents is generated via field codes. To ensure page number accuracy after editing, please right-click the TOC and select \"Update Field.\"",
|
||||
italics: true, size: 18, color: "888888"
|
||||
})]
|
||||
}),
|
||||
```
|
||||
|
||||
## 5 Common TOC Bugs
|
||||
|
||||
| # | Bug | Symptom | Fix |
|
||||
|---|-----|---------|-----|
|
||||
| 1 | "目录" heading uses `HeadingLevel.HEADING_1` | TOC includes "目录" as an entry | Remove `heading:` from TOC title paragraph |
|
||||
| 2 | No `PageBreak` after `TableOfContents` | TOC and body text on same page | Add `new Paragraph({ children: [new PageBreak()] })` after TOC |
|
||||
| 3 | Missing `TableOfContents` element | Script cannot inject placeholders, TOC is empty | Always include `new TableOfContents(...)` in code |
|
||||
| 4 | Headings use bold+large instead of `HeadingLevel` | TOC is empty even after running script | Change all body headings to `heading: HeadingLevel.HEADING_X` |
|
||||
| 5 | Script not run or exit code ignored | TOC page shows only title + blank space | Always run script; if exit code = 1, fix code and regenerate |
|
||||
|
||||
## Checklist (for self-check during generation)
|
||||
|
||||
- [ ] Document has 3+ H1 → TOC is included
|
||||
- [ ] "目录" heading does NOT use `HeadingLevel` (prevents self-indexing)
|
||||
- [ ] `new TableOfContents(...)` element present (not just plain text)
|
||||
- [ ] `PageBreak` exists after TOC element (prevents merging with body)
|
||||
- [ ] All body chapter headings use `heading: HeadingLevel.HEADING_X`
|
||||
- [ ] `add_toc_placeholders.py --auto` runs after generation
|
||||
- [ ] Script exit code checked — if 1, fix code and regenerate
|
||||
- [ ] TOC page has visible placeholder content (not empty)
|
||||
- [ ] **TOC Refresh Hint present**: italic gray note between the `TableOfContents` element and the TOC PageBreak, telling the user to right-click → "Update Field"
|
||||
- [ ] `outlineLevel: 0` for H1, `1` for H2, etc. (needed for TOC field update)
|
||||
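A few of these checklist items can be tested against `word/document.xml`. This rough sketch is not `postcheck.py`; it assumes docx-js writes `<w:pStyle w:val="Heading1"/>` for level-1 headings and a TOC field whose instruction text starts with `TOC`, and it uses regular expressions rather than a full XML parse. `toc_self_check` is a hypothetical name.

```python
import re

_PARA = re.compile(r"<w:p(?:\s[^>]*)?/>|<w:p(?:\s[^>]*)?>.*?</w:p>", re.S)
_TEXT = re.compile(r"<w:t(?:\s[^>]*)?>([^<]*)</w:t>")
_HEADING = re.compile(r'<w:pStyle w:val="Heading\d"')


def toc_self_check(document_xml: str) -> list[str]:
    """Warnings for a few TOC checklist items (a sketch, not postcheck.py)."""
    warnings = []
    h1_count = document_xml.count('<w:pStyle w:val="Heading1"/>')
    has_toc_field = re.search(r"<w:instrText[^>]*>\s*TOC\b", document_xml) is not None
    if h1_count >= 3 and not has_toc_field:
        warnings.append(f"{h1_count} Heading1 paragraphs but no TOC field")
    if has_toc_field and h1_count == 0:
        warnings.append("TOC field present but no Heading1 paragraphs to index")
    for para in _PARA.findall(document_xml):
        # Join the paragraph's text and ignore ASCII and full-width spaces ("目 录")
        text = "".join(_TEXT.findall(para)).replace(" ", "").replace("\u3000", "")
        if "目录" in text and _HEADING.search(para):
            warnings.append('"目录" title uses a Heading style, so the TOC will index itself')
    return warnings
```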
88
skills/docx/routes/comment.md
Executable file
@@ -0,0 +1,88 @@
|
||||
# Route: Add Comments
|
||||
|
||||
## Method 1: python-docx (Limited Support)
|
||||
|
||||
```python
|
||||
from docx import Document
|
||||
from docx.oxml.ns import qn
|
||||
from docx.oxml import OxmlElement
|
||||
from datetime import datetime
|
||||
|
||||
def add_comment(paragraph, comment_text, author="GLM", initials="G"):
|
||||
"""Add a comment to an entire paragraph."""
|
||||
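    # Comment IDs must be unique integers within word/comments.xml. Don't derive
    # them from hash(): str hashes are randomized per process and can collide.
    # Creating word/comments.xml, its relationship, and its content-type override
    # requires the XML work shown in Method 2, so this stub stops here.
    raise NotImplementedError("Use Method 2 (OOXML direct manipulation) to add comments")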
|
||||
|
||||
# Simpler approach: use python-docx-ng or manipulate XML directly
|
||||
```
|
||||
|
||||
**Note**: python-docx has limited native comment support. For reliable results, use the OOXML method.
|
||||
|
||||
## Method 2: OOXML Direct Manipulation (Reliable)
|
||||
|
||||
### Step 1: Unpack
|
||||
|
||||
```bash
|
||||
mkdir work && cd work && unzip ../input.docx
|
||||
```
|
||||
|
||||
### Step 2: Create/update word/comments.xml
|
||||
|
||||
```xml
|
||||
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
|
||||
<w:comments xmlns:w="http://schemas.openxmlformats.org/wordprocessingml/2006/main"
|
||||
xmlns:r="http://schemas.openxmlformats.org/officeDocument/2006/relationships">
|
||||
<w:comment w:id="1" w:author="Reviewer" w:date="2024-01-15T10:30:00Z" w:initials="R">
|
||||
<w:p>
|
||||
<w:r>
|
||||
<w:t>This section needs more detail.</w:t>
|
||||
</w:r>
|
||||
</w:p>
|
||||
</w:comment>
|
||||
</w:comments>
|
||||
```
|
||||
|
||||
### Step 3: Mark comment range in document.xml
|
||||
|
||||
```xml
|
||||
<w:commentRangeStart w:id="1"/>
|
||||
<w:r><w:t>Text being commented on</w:t></w:r>
|
||||
<w:commentRangeEnd w:id="1"/>
|
||||
<w:r>
|
||||
<w:rPr><w:rStyle w:val="CommentReference"/></w:rPr>
|
||||
<w:commentReference w:id="1"/>
|
||||
</w:r>
|
||||
```
|
||||
|
||||
### Step 4: Update relationships
|
||||
|
||||
In `word/_rels/document.xml.rels`, add:
|
||||
```xml
|
||||
<Relationship Id="rIdComments" Type="http://schemas.openxmlformats.org/officeDocument/2006/relationships/comments" Target="comments.xml"/>
|
||||
```
|
||||
|
||||
### Step 5: Update Content_Types
|
||||
|
||||
In `[Content_Types].xml`, ensure:
|
||||
```xml
|
||||
<Override PartName="/word/comments.xml" ContentType="application/vnd.openxmlformats-officedocument.wordprocessingml.comments+xml"/>
|
||||
```
|
||||
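Steps 2, 4, and 5 can be scripted. This sketch uses hypothetical helpers, overwrites any existing `word/comments.xml`, and assumes the relationship ID `rIdComments` is not already taken. Step 3 (the range markers in `document.xml`) is still done by hand or with the Document library.

```python
from pathlib import Path
from xml.sax.saxutils import escape, quoteattr

W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
COMMENTS_REL_TYPE = "http://schemas.openxmlformats.org/officeDocument/2006/relationships/comments"
COMMENTS_CT = "application/vnd.openxmlformats-officedocument.wordprocessingml.comments+xml"


def comments_xml(comments: list[dict]) -> str:
    """Step 2: build word/comments.xml; each dict has id, author, initials, date, text."""
    body = "".join(
        f'<w:comment w:id="{int(c["id"])}" w:author={quoteattr(c["author"])} '
        f'w:date="{c["date"]}" w:initials={quoteattr(c["initials"])}>'
        f'<w:p><w:r><w:t xml:space="preserve">{escape(c["text"])}</w:t></w:r></w:p></w:comment>'
        for c in comments
    )
    return ('<?xml version="1.0" encoding="UTF-8" standalone="yes"?>\n'
            f'<w:comments xmlns:w="{W_NS}">{body}</w:comments>')


def add_comments_relationship(rels_xml: str, rel_id: str = "rIdComments") -> str:
    """Step 4: add the comments relationship unless one already exists."""
    if COMMENTS_REL_TYPE in rels_xml:
        return rels_xml
    rel = f'<Relationship Id="{rel_id}" Type="{COMMENTS_REL_TYPE}" Target="comments.xml"/>'
    return rels_xml.replace("</Relationships>", rel + "</Relationships>")


def add_comments_override(ct_xml: str) -> str:
    """Step 5: add the content-type override unless it is already present."""
    if 'PartName="/word/comments.xml"' in ct_xml:
        return ct_xml
    override = f'<Override PartName="/word/comments.xml" ContentType="{COMMENTS_CT}"/>'
    return ct_xml.replace("</Types>", override + "</Types>")


def register_comments(work_dir: str, comments: list[dict]) -> None:
    """Write comments.xml and patch the rels and content-types files in an unpacked docx."""
    root = Path(work_dir)
    (root / "word/comments.xml").write_text(comments_xml(comments), encoding="utf-8")
    for rel_path, patch in (("word/_rels/document.xml.rels", add_comments_relationship),
                            ("[Content_Types].xml", add_comments_override)):
        path = root / rel_path
        path.write_text(patch(path.read_text(encoding="utf-8")), encoding="utf-8")
```

The `w:id` values written here must match the `w:id` on the `commentRangeStart`, `commentRangeEnd`, and `commentReference` markers from Step 3.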
|
||||
### Step 6: Pack
|
||||
|
||||
```bash
|
||||
zip -r ../output.docx . -x ".*"
|
||||
```
|
||||
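A Python equivalent of the `zip` command, for environments without `zip`. A sketch with hypothetical names: it drops top-level hidden entries, roughly like `-x ".*"`, and writes `[Content_Types].xml` first by convention. It deliberately does not skip every dot-file, because the required `_rels/.rels` part starts with a dot.

```python
import os
import zipfile


def order_package_names(names: list[str]) -> list[str]:
    """Drop top-level hidden entries and put [Content_Types].xml first.

    Only the start of the path is checked: _rels/.rels must be kept.
    """
    kept = [n for n in names if not n.startswith(".")]
    return sorted(kept, key=lambda n: (n != "[Content_Types].xml", n))


def pack_docx(src_dir: str, out_path: str) -> None:
    """Zip an unpacked .docx directory back into a package."""
    names = []
    for dirpath, _dirs, filenames in os.walk(src_dir):
        for filename in filenames:
            rel = os.path.relpath(os.path.join(dirpath, filename), src_dir)
            names.append(rel.replace(os.sep, "/"))  # zip entries always use "/"
    with zipfile.ZipFile(out_path, "w", zipfile.ZIP_DEFLATED) as zf:
        for name in order_package_names(names):
            zf.write(os.path.join(src_dir, name), arcname=name)
```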
|
||||
## When to Use Each Method
|
||||
|
||||
| Scenario | Method |
|
||||
|----------|--------|
|
||||
| Add 1-2 simple comments | OOXML |
|
||||
| Batch review (many comments) | OOXML with Python script |
|
||||
| Comment on specific words | OOXML (precise range control) |
|
||||
| Quick annotation | python-docx if available |
|
||||
207
skills/docx/routes/create.md
Executable file
@@ -0,0 +1,207 @@
|
||||
# Route: Create New Document
|
||||
|
||||
## Workflow
|
||||
|
||||
```
|
||||
0. Check if user provided a reference template (PDF/docx) → if yes, use Template-Following Mode below
|
||||
1. Load `references/design-system.md` → select palette and cover recipe
|
||||
2. Load `references/common-rules.md` → shared layout, font, placeholder rules
|
||||
3. Check user keywords → load scene file if applicable
|
||||
4. Load `references/docx-js-core.md`
|
||||
5. If complex → also load `references/docx-js-advanced.md`
|
||||
6. Plan document structure (outline)
|
||||
7. Write JS/TS using docx library
|
||||
⚠️ **BEFORE writing any string**: scan ALL Chinese text for curly quotes `""''` and replace with `\u201c \u201d \u2018 \u2019` — bare curly quotes break JS syntax (see docx-js-advanced.md § Quotes Escaping)
|
||||
8. Run with `bun run generate.js` (or `node generate.js`)
|
||||
9. If TOC → run `python3 "$DOCX_SCRIPTS/add_toc_placeholders.py" output.docx --auto`
|
||||
10. Run post-generation checklist (see SKILL.md)
|
||||
```
|
||||
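The quote-escaping rule in step 7 can be automated when strings are emitted from Python. `escape_curly_quotes` and `js_string_literal` are hypothetical helpers; the second also escapes backslashes, straight double quotes, and newlines so the result is a valid double-quoted JS string literal.

```python
# Map curly quotes to JS \u escapes so they cannot be confused with string delimiters
_CURLY_TO_ESCAPE = str.maketrans({
    "\u201c": "\\u201c",  # left double quote
    "\u201d": "\\u201d",  # right double quote
    "\u2018": "\\u2018",  # left single quote
    "\u2019": "\\u2019",  # right single quote
})


def escape_curly_quotes(text: str) -> str:
    return text.translate(_CURLY_TO_ESCAPE)


def js_string_literal(text: str) -> str:
    """Return text as a double-quoted JS string literal."""
    # Backslashes first, so the escapes added afterwards are not doubled
    body = text.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")
    return '"' + escape_curly_quotes(body) + '"'


print(js_string_literal("他说“你好”"))  # "他说\u201c你好\u201d"
```

`json.dumps(text)` is a standard-library alternative: its output is also a valid JS string literal, with all non-ASCII characters written as `\uXXXX` escapes.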
|
||||
## Template-Following Mode
|
||||
|
||||
When the user provides a reference document (PDF/docx) as a **formatting template** (e.g., "generate following this template format", "refer to this sample"), switch to template-following mode instead of the standard recipe-based workflow:
|
||||
|
||||
1. **Extract the template's structure** — cover layout, section order, heading hierarchy, page breaks, special pages (e.g., advisor comments page, approval form)
|
||||
2. **Replicate structure exactly** — every major structural unit becomes a **separate section** (cover, body, appendix/form pages) with appropriate margins and page breaks
|
||||
3. **Fill content** from the user's content source, or generate per user instructions
|
||||
4. **Preserve template-specific elements** — school-specific forms, signature areas, stamp placeholders, advisor comment blocks → reproduce as-is with placeholder text (e.g., "Advisor (signature):")
|
||||
5. **Maintain formatting fidelity** — font choices, table layouts, spacing, and alignment should match the template, not the standard design-system palettes
|
||||
|
||||
⚠️ **Do NOT apply standard cover recipes (R1–R7) when a user-provided template defines its own cover format.** Follow the template's cover layout instead. Standard `common-rules.md` constraints (e.g., `WidthType.PERCENTAGE`, `allNoBorders` for cover wrapper, `Rule 8` line spacing) still apply for cross-engine compatibility.
|
||||
|
||||
⚠️ **Each distinct page type = separate section.** Cover section (margin: 0), body section (standard margins), appendix/form pages (may need different margins or orientation). Never place cover + body + appendix in a single section.
|
||||
|
||||
---
|
||||
|
||||
## Decision Tree
|
||||
|
||||
### Cover Page?
|
||||
- **YES**: Reports, theses, proposals, plans, or 3+ page docs with clear title/author
|
||||
- **NO**: Resumes, contracts, official documents, exam papers, short memos
|
||||
|
||||
### Cover Style Selector — Recipe Router
|
||||
|
||||
Covers use **7 validated layout recipes (R1–R7)**, auto-selected by `selectCoverRecipe()` in `references/design-system.md` (the **authoritative source** — do NOT duplicate the function).
|
||||
|
||||
**Quick Reference:**
|
||||
|
||||
| docType | Recipe | Default Palette |
|
||||
|---------|--------|-----------------|
|
||||
| contract / official / exam / resume | null (no cover) | — |
|
||||
| academic | R5 (Clean White) | ACADEMIC |
|
||||
| proposal_report (thesis proposal) | R5 (Clean White) | ACADEMIC |
|
||||
| lesson_plan (STEM) | R4 (Top Color Block) | DM-1 |
|
||||
| lesson_plan (arts/general) | R6 (Editorial Warm) | ED-1 |
|
||||
| creative / branding / design | R3 (Centered Card Frame) | SN-2 |
|
||||
| cultural / newsletter / internal | R6 (Editorial Warm) | ED-1 |
|
||||
| activity / event | R6 (Editorial Warm) | ED-1 |
|
||||
| trend/research (cultural/creative/brand) | R7 (Swiss Tech) | ST-1 |
|
||||
| whitepaper | R2 (Double-Rule Frame) | IG-1 / CM-2 |
|
||||
| consulting | R2 (Double-Rule Frame) | MIN-1 |
|
||||
| proposal / plan | R4 (Top Color Block) | GO-1 |
|
||||
| report | R1 (Pure Paragraph Left) | by industry |
|
||||
| default | R1 (Pure Paragraph Left) | DS-1 |
|
||||
|
||||
⚠️ **Long title routing:** After selecting recipe, apply `applyLongTitleOverride(result, titleLength)`. Titles >20 chars on R3/R4/R6 → fall back to R1. Titles >30 chars on R2 → fall back to R1. R5 is never overridden.
|
||||
|
||||
⚠️ **Academic thesis cover:** Use `buildAcademicCover()` from `scenes/academic.md`.
|
||||
|
||||
⚠️ **Thesis proposal report (开题报告):** Use `buildProposalCover()` from `scenes/academic.md`. Cover MUST be an independent section. Keywords: "开题报告" (Chinese), "thesis proposal", "research proposal" — NOT the same as business proposals (which use R4).
|
||||
|
||||
### Table of Contents?
|
||||
- **YES**: 3+ major sections (H1 headings)
|
||||
- **NO**: Resumes, exam papers, short docs, contracts (<20 clauses)
|
||||
|
||||
→ See `references/toc.md` for the complete TOC reference (3-step process, code examples, common bugs).
|
||||
|
||||
### Headers/Footers?
|
||||
- **YES** by default (page numbers minimum)
|
||||
- **NO**: cover page section, official docs (special format)
|
||||
|
||||
### Load Math Formulas?
|
||||
When: exam papers, academic papers, physics/math/chemistry → load `references/math-formulas.md`
|
||||
|
||||
### Load Chart Templates?
|
||||
When: data visualization, reports with charts → load `references/chart-templates.md`
|
||||
|
||||
## Outline Rules
|
||||
|
||||
**User provides outline** → Follow EXACTLY. No additions, deletions, or reordering.
|
||||
|
||||
**No outline** → Create from scene template:
|
||||
- **Academic:** Abstract → TOC → Body → References
|
||||
- **Report:** Use `selectReportType()` to determine type, then follow template A–F:
|
||||
- analysis → Template A (Executive Summary → Background → Scope & Method → Findings → Diagnosis → Conclusions)
|
||||
- experiment → Template B (Abstract → Objective & Hypothesis → Environment → Procedure → Results → Error Analysis → Conclusions)
|
||||
- testing → Template C (Overview → Scope & Environment → Test Plan → Results → Defects → Risks → Conclusions)
|
||||
- research → Template D (Summary → Background → Subjects & Method → Sample → Findings → Synthesis → Recommendations)
|
||||
- review → Template E (Overview → Goals → Review → Results → Issues → Lessons → Action Plan)
|
||||
- proposal → Template F (Summary → Status → Goals → Solution → Roadmap → Resources → Risks → Benefits)
|
||||
- **Contract:** Use `selectContractType()` then follow template A–E:
|
||||
- bilateral → Template A (Header → Parties → Recitals → Definitions → Subject → Price → Rights → Delivery → Tax → IP → Breach → Force Majeure → Termination → Notices → Dispute → Miscellaneous → Signature)
|
||||
- transfer → Template B (Header → Recitals → Definitions → Subject → Consideration → Closing → Representations → Tax → Breach → Dispute → Signature)
|
||||
- nda → Template C (Header → Recitals → Definition → Obligations → Use Restrictions → Return/Destroy → Exceptions → Duration → Breach → Dispute → Signature)
|
||||
- framework → Template D (Header → Recitals → Purpose → Scope → Division → Mechanism → Commercial → Confidentiality → Term → Breach → Dispute → Signature)
|
||||
- terms → Template E (Title → Definitions → Services → Rights → Liability → Fees → IP → Termination → Notices → Dispute → Miscellaneous)
|
||||
- **Official:** Use `selectOfficialType()` + `needsRedHeader()`:
|
||||
- notice → Template A ([Red header] → [Doc number] → Title → Addressee → Reason → Items → Requirements → [Attachments] → [Signature] → [Date] → [Colophon])
|
||||
- letter → Template B ([Red header] → [Doc number] → Title → Addressee → Reason → Negotiation/Reply → Closing → [Signature] → [Date])
|
||||
- reply → Template C ([Red header] → [Doc number] → Title → Addressee → Reference → Reply → "This is the reply." → Signature → Date)
|
||||
- minutes → Template D (Title → Meeting Overview → Agreed Items → Responsibilities → [Distribution]) — typically no red header
|
||||
- Present outline to user before generating when possible
|
||||
|
||||
## Scene Completeness
|
||||
|
||||
Include ALL elements a scene specifies:
|
||||
- **Academic thesis:** Cover (`buildAcademicCover()` in its own section), abstract, TOC, references
|
||||
- **Thesis proposal report (thesis proposal / 开题报告):** Cover (`buildProposalCover()` in its own section), body sections per proposal template. Cover MUST be a separate section.
|
||||
- **Report:** Cover, executive summary, conclusions
|
||||
- **Contract:** Party info, recitals, complete clause closure, signature block, uniform `【】` placeholders
|
||||
- **Official:** Correct document type, specific title, closing phrase matching type, proper numbering hierarchy, red header only when requested
|
||||
- **Exam:** Student info area, scoring criteria
|
||||
|
||||
Generate complete, substantive content — not skeletons.
|
||||
|
||||
## Content Guidelines
|
||||
|
||||
- **Length**: "detailed report" = 3000+ words. "brief summary" = 500–1000.
|
||||
- **Data**: Use user's data, or generate realistic placeholders
|
||||
- **Charts**: Use `references/chart-templates.md` matplotlib templates → PNG → embed
|
||||
- **Math**: Use `references/math-formulas.md` LaTeX → docx-js Math mapping
|
||||
- **Tables**: For structured data, not layout
|
||||
- **Numbering**: Figures, tables numbered sequentially with cross-references
|
||||
|
||||
## Code Architecture
|
||||
|
||||
### Heading Style Rule (Mandatory)
|
||||
|
||||
**All body chapter headings MUST use `heading: HeadingLevel.HEADING_X`** — never simulate with bold + large font (TOC cannot detect simulated headings).
|
||||
|
||||
**Exception:** Cover title and TOC title ("目录") heading MUST NOT use Heading style.
|
||||
|
||||
### Blank Page Prevention
|
||||
|
||||
→ See SKILL.md § Post-Generation checklist for the full set of rules.
|
||||
|
||||
Key rules:
|
||||
1. No double page breaks (SectionType.NEXT_PAGE + PageBreak = blank page)
|
||||
2. PageBreak paragraphs should have visible text content
|
||||
3. No more than 3 consecutive empty paragraphs
|
||||
4. Cover section: ≤2 trailing empty paragraphs, no trailing PageBreak
|
||||
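Rule 3 can be checked on the generated `word/document.xml`. A regex-based sketch with a hypothetical name: it treats a paragraph as empty when it has no text, drawing, page break, or equation, and it does not special-case table cells, so empty cell paragraphs also count.

```python
import re

# Self-closing <w:p .../> or a full <w:p ...>...</w:p> (but not <w:pPr>)
_PARA = re.compile(r"<w:p(?:\s[^>]*)?/>|<w:p(?:\s[^>]*)?>.*?</w:p>", re.S)
_VISIBLE = re.compile(r'<w:t(?:\s[^>]*)?>[^<]+</w:t>|<w:drawing\b|<w:br w:type="page"|<m:oMath\b')


def max_empty_paragraph_run(document_xml: str) -> int:
    """Length of the longest run of consecutive empty paragraphs."""
    longest = current = 0
    for para in _PARA.findall(document_xml):
        if _VISIBLE.search(para):
            current = 0
        else:
            current += 1
            longest = max(longest, current)
    return longest
```

A document passes rule 3 when `max_empty_paragraph_run(xml) <= 3`.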
|
||||
### Builder Pattern Example
|
||||
|
||||
```js
|
||||
const { Document, Packer, Paragraph, TextRun, Header, Footer,
|
||||
AlignmentType, HeadingLevel, PageNumber } = require("docx");
|
||||
const fs = require("fs");
|
||||
|
||||
// 1. Palette
|
||||
const P = { primary: "#101820", body: "#182030", secondary: "#506070", accent: "#8090A0" };
|
||||
const c = (hex) => hex.replace("#", "");
|
||||
|
||||
// 2. Component builders
|
||||
function heading(text, level = HeadingLevel.HEADING_1) {
|
||||
return new Paragraph({
|
||||
heading: level,
|
||||
spacing: { before: level === HeadingLevel.HEADING_1 ? 360 : 240, after: 120 },
|
||||
children: [new TextRun({ text, bold: true, color: c(P.primary), font: { ascii: "Calibri", eastAsia: "SimHei" } })]
|
||||
});
|
||||
}
|
||||
|
||||
function body(text) {
|
||||
return new Paragraph({
|
||||
alignment: AlignmentType.JUSTIFIED,
|
||||
indent: { firstLine: 480 },
|
||||
spacing: { line: 312 },
|
||||
children: [new TextRun({ text, size: 24, color: c(P.body) })],
|
||||
});
|
||||
}
|
||||
|
||||
// 3. Assembly — cover + body in separate sections
|
||||
const doc = new Document({
|
||||
styles: { default: { document: {
|
||||
run: { font: { ascii: "Calibri", eastAsia: "Microsoft YaHei" }, size: 24, color: c(P.body) },
|
||||
paragraph: { spacing: { line: 312 } },
|
||||
}}},
|
||||
sections: [
|
||||
{ properties: { page: { margin: { top: 0, bottom: 0, left: 0, right: 0 } } },
|
||||
children: buildCoverR1(config) }, // ← use recipe from design-system.md
|
||||
{ properties: { page: { margin: { top: 1440, bottom: 1440, left: 1701, right: 1417 } } },
|
||||
footers: { default: new Footer({ children: [new Paragraph({ alignment: AlignmentType.CENTER,
|
||||
children: [new TextRun({ children: [PageNumber.CURRENT], size: 18 })] })] }) },
|
||||
children: [heading("Chapter 1"), body("Content...")] },
|
||||
],
|
||||
});
|
||||
|
||||
Packer.toBuffer(doc).then(buf => { fs.writeFileSync("output.docx", buf); });
|
||||
```
|
||||
|
||||
## Post-Generation
|
||||
|
||||
→ See SKILL.md § Post-Generation for the complete two-layer verification checklist.
|
||||
|
||||
```bash
|
||||
python3 "$DOCX_SCRIPTS/postcheck.py" output.docx
|
||||
```
|
||||
⚠️ **Running postcheck.py is MANDATORY.** Fix all ❌ errors before delivering.
|
||||
115
skills/docx/routes/edit.md
Executable file
@@ -0,0 +1,115 @@
|
||||
# Route: Edit Existing Document
|
||||
|
||||
## Workflow Overview
|
||||
|
||||
```
|
||||
1. Receive .docx (or .doc → convert)
|
||||
2. Unpack → working directory
|
||||
3. Analyze structure (document.xml, styles.xml)
|
||||
4. Plan changes → batch by type
|
||||
5. Implement via Document library (Python)
|
||||
6. Pack → output.docx
|
||||
7. Verify (pandoc or visual)
|
||||
```
|
||||
|
||||
## Step 0: Format Conversion
|
||||
|
||||
```bash
|
||||
# .doc → .docx
|
||||
libreoffice --headless --convert-to docx input.doc
|
||||
```
|
||||
|
||||
## Step 1: Unpack
|
||||
|
||||
```bash
|
||||
mkdir -p work_dir && cd work_dir && unzip ../input.docx
|
||||
```
|
||||
|
||||
Key files: `word/document.xml` (content), `word/styles.xml` (styles), `word/numbering.xml` (lists), `word/media/` (images), `[Content_Types].xml`, `word/_rels/document.xml.rels`
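Where `unzip` is unavailable, Python's standard `zipfile` module can do the same unpack. The sketch below is illustrative only. `unpack_docx` and `KEY_PARTS` are hypothetical names, not part of the skill's scripts. It extracts the archive and reports which of the key parts listed above are present.

```python
import zipfile
from pathlib import Path

# Key parts named above (word/media/ is optional and checked separately if needed)
KEY_PARTS = [
    "word/document.xml",
    "word/styles.xml",
    "word/numbering.xml",
    "[Content_Types].xml",
    "word/_rels/document.xml.rels",
]


def unpack_docx(docx_path, work_dir):
    """Extract a .docx into work_dir and return the key parts it contains."""
    work = Path(work_dir)
    work.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(docx_path) as zf:
        zf.extractall(work)
        names = set(zf.namelist())
    return [part for part in KEY_PARTS if part in names]
```

A document without lists, for example, will simply not report `word/numbering.xml`.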
|
||||
|
||||
## Step 2: Plan Changes
|
||||
|
||||
Group changes into batches and process them in this order:
|
||||
|
||||
1. **Structural** — Add/remove sections, reorder paragraphs
|
||||
2. **Style** — Font, size, color modifications
|
||||
3. **Text** — Find/replace, fix typos
|
||||
4. **Table** — Add/remove rows/columns, update data
|
||||
5. **Image** — Replace/add images
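The batching rule above can be sketched as a small grouping step. This helper is hypothetical: `plan_batches` is not part of `scripts.document`. It assumes each change request is tagged with one of the five batch types.

```python
# Batch order from Step 2: structural, style, text, table, image
BATCH_ORDER = ["structural", "style", "text", "table", "image"]


def plan_batches(changes):
    """Group (kind, description) change requests into ordered batches.

    Requests keep their original relative order inside each batch.
    Unknown kinds raise ValueError instead of being silently dropped.
    """
    batches = {kind: [] for kind in BATCH_ORDER}  # dicts keep insertion order
    for kind, description in changes:
        if kind not in batches:
            raise ValueError(f"unknown change type: {kind}")
        batches[kind].append(description)
    return [(kind, items) for kind, items in batches.items() if items]
```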
|
||||
|
||||
## Step 3: Implement
|
||||
|
||||
Load `references/ooxml.md` for the full Document library API. Key patterns:
|
||||
|
||||
```python
|
||||
from scripts.document import Document
|
||||
|
||||
doc = Document('work_dir')
|
||||
|
||||
# Text replacement with tracked changes
|
||||
node = doc["word/document.xml"].get_node(tag="w:r", contains="old text")
|
||||
rpr = tags[0].toxml() if (tags := node.getElementsByTagName("w:rPr")) else ""
|
||||
replacement = f'<w:del><w:r>{rpr}<w:delText>old text</w:delText></w:r></w:del><w:ins><w:r>{rpr}<w:t>new text</w:t></w:r></w:ins>'
|
||||
doc["word/document.xml"].replace_node(node, replacement)
|
||||
|
||||
doc.save()
|
||||
```
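The replacement string above uses bare `<w:del>`/`<w:ins>` tags. In OOXML these elements carry a required `w:id` and `w:author` and an optional `w:date`. If the Document library does not add them for you, a helper along these lines can build the markup.

`tracked_replacement` and its default author are illustrative assumptions. The caller is responsible for choosing ids that are unique within the document.

```python
from datetime import datetime, timezone
from xml.sax.saxutils import escape


def tracked_replacement(old, new, rpr="", author="Editor", first_id=1, date=None):
    """Return <w:del> + <w:ins> markup that replaces `old` with `new`.

    `rpr` is the original run's <w:rPr> XML (or "") so formatting is kept.
    The deletion uses first_id and the insertion uses first_id + 1.
    """
    stamp = date or datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
    # Escape quotes too, because the author name goes into an attribute
    attrs = 'w:author="{}" w:date="{}"'.format(escape(author, {'"': "&quot;"}), stamp)
    deleted = (
        f'<w:del w:id="{first_id}" {attrs}><w:r>{rpr}'
        f'<w:delText xml:space="preserve">{escape(old)}</w:delText></w:r></w:del>'
    )
    inserted = (
        f'<w:ins w:id="{first_id + 1}" {attrs}><w:r>{rpr}'
        f'<w:t xml:space="preserve">{escape(new)}</w:t></w:r></w:ins>'
    )
    return deleted + inserted
```

The result can be passed to `replace_node` in place of the hand-written string.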
|
||||
|
||||
## Step 4: Pack
|
||||
|
||||
```bash
|
||||
cd work_dir && zip -r ../output.docx . -x ".*"
|
||||
```
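A Python alternative to `zip -r`, sketched with the standard library only. `pack_docx` is a hypothetical helper. It skips hidden files and directories at every level, and it writes `[Content_Types].xml` as the first entry as a conservative choice. The output path must lie outside the source directory.

```python
import os
import zipfile


def pack_docx(src_dir, out_path):
    """Zip an unpacked .docx directory and return the archive entry names."""
    entries = []
    for root, dirs, files in os.walk(src_dir):
        dirs[:] = [d for d in dirs if not d.startswith(".")]  # prune hidden dirs
        for name in files:
            if name.startswith("."):
                continue  # skip hidden files such as .DS_Store
            full = os.path.join(root, name)
            arcname = os.path.relpath(full, src_dir).replace(os.sep, "/")
            entries.append((arcname, full))
    # False sorts before True, so the content-types part comes first
    entries.sort(key=lambda e: (e[0] != "[Content_Types].xml", e[0]))
    with zipfile.ZipFile(out_path, "w", zipfile.ZIP_DEFLATED) as zf:
        for arcname, full in entries:
            zf.write(full, arcname)
    return [arcname for arcname, _ in entries]
```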
|
||||
|
||||
## Step 5: Verify
|
||||
|
||||
```bash
|
||||
pandoc output.docx -t plain -o /dev/stdout | head -50
|
||||
# or visual
|
||||
libreoffice --headless --convert-to pdf output.docx
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## Template Matching Workflow
|
||||
|
||||
When user says "use this format" or provides a template:
|
||||
|
||||
1. Unpack template, extract `styles.xml`, `numbering.xml`
|
||||
2. Analyze font/size/spacing/margins
|
||||
3. Copy `styles.xml` into target document
|
||||
4. Match heading hierarchy and spacing
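Step 2 can start from the document defaults in `styles.xml`. The minimal sketch below uses the standard library and assumes the template sets explicit `w:rFonts`/`w:sz` under `w:docDefaults`. Many templates use theme fonts via `w:asciiTheme` instead, which this sketch does not resolve.

`default_run_format` is a hypothetical name. For untrusted files, `defusedxml` (as used in `routes/read.md`) is the safer parser.

```python
import xml.etree.ElementTree as ET

W = "{http://schemas.openxmlformats.org/wordprocessingml/2006/main}"


def default_run_format(styles_xml):
    """Return the document-default fonts and size from styles.xml content."""
    root = ET.fromstring(styles_xml)  # accepts str or bytes
    rpr = root.find(f"{W}docDefaults/{W}rPrDefault/{W}rPr")
    result = {"ascii": None, "eastAsia": None, "size_pt": None}
    if rpr is None:
        return result
    fonts = rpr.find(f"{W}rFonts")
    if fonts is not None:
        result["ascii"] = fonts.get(f"{W}ascii")
        result["eastAsia"] = fonts.get(f"{W}eastAsia")
    sz = rpr.find(f"{W}sz")
    if sz is not None:
        result["size_pt"] = int(sz.get(f"{W}val")) / 2  # w:sz is in half-points
    return result
```

Typical use: `default_run_format(Path("template/word/styles.xml").read_bytes())`.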
|
||||
|
||||
## Multi-File Merge
|
||||
|
||||
1. Use first document as base
|
||||
2. Extract content from additional documents
|
||||
3. Insert with page breaks between sections
|
||||
4. Merge styles (prefer base document's)
|
||||
5. Re-number figures/tables sequentially
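Step 5 can be sketched for caption text as follows. The hypothetical `renumber_captions` only rewrites a leading "图N", "表N", "Figure N" or "Table N". It does not handle chapter-style numbers such as "图1-1", and it does not update cross-references in body text.

```python
import re

# A caption prefix such as "图3", "表2", "Figure 12" or "Table 4" at the start of the text
CAPTION = re.compile(r"^(图|表|Figure |Table )(\d+)")


def renumber_captions(captions):
    """Renumber figure and table captions sequentially, one counter per prefix."""
    counters = {}
    result = []
    for text in captions:
        m = CAPTION.match(text)
        if not m:
            result.append(text)  # not a caption: leave untouched
            continue
        kind = m.group(1)
        counters[kind] = counters.get(kind, 0) + 1
        result.append(f"{kind}{counters[kind]}" + text[m.end():])
    return result
```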
|
||||
|
||||
## Redlining (Tracked Changes) — Default for Revisions
|
||||
|
||||
When user asks for revisions, **default to tracked changes** so they can review:
|
||||
|
||||
```python
|
||||
doc = Document('work_dir', track_revisions=True)
|
||||
# ... make changes using replace_node with <w:del>/<w:ins>
|
||||
doc.save()
|
||||
```
|
||||
|
||||
Ask the user whether they want clean output or tracked changes only when the request is ambiguous.
|
||||
|
||||
## Common Operations Quick Reference
|
||||
|
||||
| Operation | Approach |
|
||||
|-----------|----------|
|
||||
| Replace text | `get_node` + `replace_node` with tracked changes |
|
||||
| Change font | Modify `<w:rFonts>` in run properties |
|
||||
| Add paragraph | `insert_after` with `<w:p>` element |
|
||||
| Delete paragraph | `suggest_deletion` on `<w:p>` |
|
||||
| Add table row | Clone `<w:tr>`, modify cells |
|
||||
| Update header | Edit `word/headerN.xml` |
|
||||
| Change margins | Edit `<w:pgMar>` in `<w:sectPr>` |
|
||||
| Add image | See `references/ooxml.md` image insertion pattern |
|
||||
| Add comment | `doc.add_comment(start, end, text)` |
|
||||
120
skills/docx/routes/format.md
Executable file
@@ -0,0 +1,120 @@
|
||||
# Route: Format / Layout
|
||||
|
||||
## Workflow
|
||||
|
||||
```
|
||||
1. Read current document (pandoc for content, unpack for structure)
|
||||
2. Identify format requirements from user
|
||||
3. Use unit conversion table (see SKILL.md)
|
||||
4. Apply formatting via OOXML manipulation or python-docx
|
||||
5. Pack and verify
|
||||
```
|
||||
|
||||
## Quick Formatting via python-docx
|
||||
|
||||
For simple formatting tasks, python-docx is often faster than raw XML:
|
||||
|
||||
```python
|
||||
from docx import Document as PythonDocument
from docx.enum.text import WD_ALIGN_PARAGRAPH
from docx.oxml.ns import qn
from docx.shared import Pt, Twips

doc = PythonDocument("input.docx")

# Change all body paragraph formatting
for para in doc.paragraphs:
    if para.style.name.startswith("Heading"):
        continue
    # 2-character indent at 12pt: 2 x 12pt = 24pt = 480 twips
    para.paragraph_format.first_line_indent = Twips(480)
    para.paragraph_format.line_spacing = 1.5
    para.paragraph_format.alignment = WD_ALIGN_PARAGRAPH.JUSTIFY
    for run in para.runs:
        run.font.name = "宋体"  # sets only w:ascii / w:hAnsi
        # Chinese characters use w:eastAsia, which font.name does not set
        run._element.rPr.rFonts.set(qn("w:eastAsia"), "宋体")
        run.font.size = Pt(12)  # Xiao Si 小四

doc.save("output.docx")
|
||||
```
|
||||
|
||||
## Common Format Request Patterns
|
||||
|
||||
### University Thesis Formatting
|
||||
|
||||
Typical Chinese university thesis requirements:
|
||||
|
||||
```python
|
||||
from docx.shared import Cm

# Margins (`doc` is an open python-docx Document)
for section in doc.sections:
    section.top_margin = Cm(2.5)
    section.bottom_margin = Cm(2.5)
    section.left_margin = Cm(3.0)
    section.right_margin = Cm(2.5)

# Fonts
# Body: SimSun 宋体 Xiao Si 小四 (12pt)
# H1: SimHei 黑体 San Hao 三号 (16pt) centered
# H2: SimHei 黑体 Si Hao 四号 (14pt)
# H3: SimHei 黑体 Xiao Si 小四 (12pt)
# English: Times New Roman, same sizes
|
||||
```
|
||||
|
||||
### Page Numbers Starting from Specific Page
|
||||
|
||||
Use multi-section approach:
|
||||
```python
|
||||
# Section 1: Front matter (Roman numerals)
|
||||
# Section 2: Main content (Arabic, starting from 1)
|
||||
# This requires OOXML manipulation — see routes/edit.md for unpack/pack workflow
|
||||
```
|
||||
|
||||
In raw XML (`word/document.xml`):
|
||||
```xml
|
||||
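<!-- Section 1 (front matter): its sectPr goes in the pPr of the section's last paragraph -->
<w:p>
  <w:pPr>
    <w:sectPr>
      <w:pgNumType w:fmt="upperRoman" w:start="1"/>
    </w:sectPr>
  </w:pPr>
</w:p>
<!-- Section 2 (main content, last section): its sectPr is the last child of <w:body> -->
<w:sectPr>
  <w:pgNumType w:fmt="decimal" w:start="1"/>
</w:sectPr>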
|
||||
```
|
||||
|
||||
### Different Headers Per Section
|
||||
|
||||
Each section in a .docx can have its own header/footer. See `references/docx-js-advanced.md` for the multi-section approach.
|
||||
|
||||
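For existing documents, split the body in `word/document.xml` into sections by placing a `<w:sectPr>` in the last paragraph of each section. Then create a separate `headerN.xml` for each section, register it in `word/_rels/document.xml.rels` and `[Content_Types].xml`, and reference it from that section's `<w:headerReference>`.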
|
||||
|
||||
### Font Size Conversion
|
||||
|
||||
When user requests a Chinese font size name:
|
||||
|
||||
| Request | Action |
|
||||
|---------|--------|
|
||||
| "Change to Wu Hao (5th) size" | `font.size = Pt(10.5)` or `size: 21` in docx-js |
|
||||
| "Title in San Hao SimHei" | `font.size = Pt(16)`, `font.name = "SimHei"` plus `w:eastAsia="SimHei"` on `rFonts` (`font.name` sets only the Latin fonts) |
|
||||
| "Body in Xiao Si SimSun" | `font.size = Pt(12)`, `font.name = "SimSun"` plus `w:eastAsia="SimSun"` on `rFonts` |
|
||||
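Each Chinese size name corresponds to a fixed point value. The lookup sketch below returns both the python-docx point value and the docx-js half-point `size`. The dictionary and `cn_size` are illustrative, not a skill API.

```python
# Chinese font size names (字号) in points; docx-js `size` is in half-points
CN_FONT_SIZES_PT = {
    "Chu Hao": 42,   # 初号
    "Xiao Chu": 36,  # 小初
    "Yi Hao": 26,    # 一号
    "Xiao Yi": 24,   # 小一
    "Er Hao": 22,    # 二号
    "Xiao Er": 18,   # 小二
    "San Hao": 16,   # 三号
    "Xiao San": 15,  # 小三
    "Si Hao": 14,    # 四号
    "Xiao Si": 12,   # 小四
    "Wu Hao": 10.5,  # 五号
    "Xiao Wu": 9,    # 小五
}


def cn_size(name):
    """Return (points for python-docx Pt(), half-points for docx-js size)."""
    pt = CN_FONT_SIZES_PT[name]
    return pt, int(pt * 2)
```

For example, `Pt(cn_size("Wu Hao")[0])` in python-docx, or `size: cn_size("Wu Hao")[1]` in the docx-js equivalent.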
|
||||
### Line Spacing Adjustment
|
||||
|
||||
```python
|
||||
from docx.enum.text import WD_LINE_SPACING
from docx.shared import Pt

# `para` is a python-docx Paragraph, e.g. one item of doc.paragraphs

# 1.0x spacing
para.paragraph_format.line_spacing_rule = WD_LINE_SPACING.MULTIPLE
para.paragraph_format.line_spacing = 1.0

# 1.3x spacing (our default, w:line="312"); a float value is always a multiple
para.paragraph_format.line_spacing = 1.3

# Fixed spacing (e.g., 28pt)
para.paragraph_format.line_spacing_rule = WD_LINE_SPACING.EXACTLY
para.paragraph_format.line_spacing = Pt(28)
|
||||
```
|
||||
|
||||
## Verification
|
||||
|
||||
After formatting changes:
|
||||
1. Open in LibreOffice or convert to PDF for visual check
|
||||
2. Extract text with pandoc to ensure content unchanged
|
||||
3. Compare file sizes (formatting-only changes shouldn't dramatically change size)
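The second check above can also run without pandoc. This standard-library sketch uses hypothetical `docx_text`/`same_text` helpers to compare the paragraph text of two files while ignoring whitespace differences. Text inside `w:delText` is not read, so pending tracked deletions do not count as content.

```python
import re
import xml.etree.ElementTree as ET
import zipfile

W = "{http://schemas.openxmlformats.org/wordprocessingml/2006/main}"


def docx_text(path):
    """Join the w:t text of each w:p in word/document.xml, one line per paragraph."""
    with zipfile.ZipFile(path) as zf:
        root = ET.fromstring(zf.read("word/document.xml"))
    paragraphs = ("".join(t.text or "" for t in p.iter(f"{W}t")) for p in root.iter(f"{W}p"))
    return "\n".join(paragraphs)


def _normalize(text):
    # Collapse runs of whitespace so run splits and line breaks do not matter
    return re.sub(r"\s+", " ", text).strip()


def same_text(before_path, after_path):
    """True if both documents carry the same text."""
    return _normalize(docx_text(before_path)) == _normalize(docx_text(after_path))
```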
|
||||
114
skills/docx/routes/read.md
Executable file
@@ -0,0 +1,114 @@
|
||||
# Route: Read / Analyze / Extract
|
||||
|
||||
## Method 1: Text Extraction via pandoc (Fastest)
|
||||
|
||||
```bash
|
||||
# Plain text
|
||||
pandoc input.docx -t plain -o output.txt
|
||||
|
||||
# Markdown (preserves structure)
|
||||
pandoc input.docx -t markdown -o output.md
|
||||
|
||||
# Extract with metadata
|
||||
pandoc input.docx -t markdown --standalone -o output.md
|
||||
```
|
||||
|
||||
**Best for**: Quick content reading, text analysis, word count, searching.
|
||||
|
||||
## Method 2: Raw XML Access (Detailed)
|
||||
|
||||
```bash
|
||||
mkdir work && cd work && unzip ../input.docx
|
||||
|
||||
# Read main content
|
||||
cat word/document.xml
|
||||
|
||||
# Read styles
|
||||
cat word/styles.xml
|
||||
|
||||
# List embedded media
|
||||
ls word/media/
|
||||
|
||||
# Read headers/footers
|
||||
cat word/header1.xml
|
||||
cat word/footer1.xml
|
||||
```
|
||||
|
||||
**Best for**: Analyzing formatting, extracting styles, inspecting document structure, debugging layout issues.
|
||||
|
||||
### Quick XML Parsing
|
||||
|
||||
```python
|
||||
import defusedxml.ElementTree as ET

W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
ns = {"w": W_NS}
tree = ET.parse("word/document.xml")

# Extract all text
texts = [t.text for t in tree.iter(f"{{{W_NS}}}t") if t.text]
full_text = "".join(texts)

# Count paragraphs
paras = tree.findall(".//w:p", ns)
print(f"Paragraphs: {len(paras)}")

# Find headings (style IDs such as "Heading1"; localized documents may
# use other IDs, so check styles.xml)
for para in paras:
    pStyle = para.find("w:pPr/w:pStyle", ns)
    if pStyle is None:
        continue
    style_id = pStyle.get(f"{{{W_NS}}}val", "")
    if "Heading" in style_id:
        text = "".join(t.text for t in para.iter(f"{{{W_NS}}}t") if t.text)
        print(f"  {style_id}: {text}")
|
||||
```
|
||||
|
||||
## Method 3: Convert to Images (Visual Analysis)
|
||||
|
||||
```bash
|
||||
# Convert to PDF first
|
||||
libreoffice --headless --convert-to pdf input.docx
|
||||
|
||||
# Then to images
|
||||
pdftoppm -png -r 200 input.pdf page
|
||||
|
||||
# Generates page-1.png, page-2.png, etc. (zero-padded, e.g. page-01.png, when there are 10+ pages)
|
||||
```
|
||||
|
||||
**Best for**: Visual layout analysis, comparing formatting, generating previews, when user asks "what does it look like".
|
||||
|
||||
## Method 4: python-docx Reading
|
||||
|
||||
```python
|
||||
from docx import Document

doc = Document("input.docx")

# Read paragraphs
for para in doc.paragraphs:
    print(f"[{para.style.name}] {para.text}")

# Read tables
for table in doc.tables:
    for row in table.rows:
        print([cell.text for cell in row.cells])

# Document properties
print(f"Sections: {len(doc.sections)}")
print(f"Paragraphs: {len(doc.paragraphs)}")
print(f"Tables: {len(doc.tables)}")
|
||||
```
|
||||
|
||||
## Choosing the Right Method
|
||||
|
||||
| Need | Method |
|
||||
|------|--------|
|
||||
| Quick text content | pandoc |
|
||||
| Document structure/outline | pandoc → markdown |
|
||||
| Formatting details | Raw XML |
|
||||
| Table data extraction | python-docx |
|
||||
| Visual appearance | Convert to images |
|
||||
| Style analysis | Raw XML (styles.xml) |
|
||||
| Word/character count | pandoc → plain → wc |
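`wc -w` splits on whitespace, so a run of Chinese text without spaces counts as a single word. Word's statistics report East Asian characters and non-Asian words separately. The sketch below approximates that split on pandoc's plain-text output. It assumes CJK Unified Ideographs (U+4E00 to U+9FFF) cover the characters of interest and ignores punctuation. `word_stats` is a hypothetical helper.

```python
import re

CJK_CHAR = re.compile(r"[\u4e00-\u9fff]")
# Latin/digit words, allowing internal apostrophes and hyphens ("GPU-based")
NON_CJK_WORD = re.compile(r"[A-Za-z0-9]+(?:['’-][A-Za-z0-9]+)*")


def word_stats(text):
    """Count CJK characters and non-CJK words separately."""
    cjk = len(CJK_CHAR.findall(text))
    words = len(NON_CJK_WORD.findall(text))
    return {"cjk_chars": cjk, "non_cjk_words": words, "total": cjk + words}
```

Example: `pandoc input.docx -t plain -o output.txt`, then `word_stats(open("output.txt", encoding="utf-8").read())`.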
|
||||
783
skills/docx/scenes/academic.md
Executable file
@@ -0,0 +1,783 @@
|
||||
# Scene: Academic / Thesis
|
||||
|
||||
## Palette
|
||||
|
||||
**Academic Dark** (Cool + Heavy + Calm) — Academic papers use **pure black body text**. The palette is reserved for cover decoration and minimal title accents.
|
||||
|
||||
```js
|
||||
const palette = {
|
||||
primary: "#000000", // Title — pure black
|
||||
body: "#000000", // Body — pure black
|
||||
secondary: "#333333", // Header/caption — dark grey
|
||||
accent: "#8B7E5A", // Cover decoration line — cover only
|
||||
surface: "#F5F7FA", // Table header light bg — three-line tables only
|
||||
};
|
||||
```
|
||||
|
||||
⚠️ **Body text color must be pure black `"000000"`**. No decorative dark-blue-grey. Academic papers require print-friendly, black-and-white clarity.
|
||||
|
||||
→ Placeholder convention & universal prohibitions — see `references/common-rules.md`
|
||||
→ **Note:** This scene uses Profile A fonts with academic-specific overrides below.
|
||||
|
||||
---
|
||||
|
||||
## Page Layout
|
||||
|
||||
| Property | Value | Twips |
|
||||
|----------|-------|-------|
|
||||
| Top margin | 2.54 cm | 1440 |
|
||||
| Bottom margin | 2.54 cm | 1440 |
|
||||
| Left margin | 3.00 cm | 1701 |
|
||||
| Right margin | 2.50 cm | 1417 |
|
||||
| Header distance | 1.5 cm | 850 |
|
||||
| Footer distance | 1.75 cm | 992 |
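The twips column follows from 1 inch = 2.54 cm = 1440 twips. The conversion sketch below reproduces the table and the A4 page size used in the `page` object. Its function names are illustrative.

```python
TWIPS_PER_INCH = 1440
CM_PER_INCH = 2.54


def cm_to_twips(cm):
    """Convert centimetres to twips (1/20 pt, 1440 per inch), rounded."""
    return round(cm / CM_PER_INCH * TWIPS_PER_INCH)


def twips_to_cm(twips):
    """Convert twips back to centimetres, rounded to 0.01 cm."""
    return round(twips / TWIPS_PER_INCH * CM_PER_INCH, 2)


# A4 is 21.0 cm x 29.7 cm
A4 = (cm_to_twips(21.0), cm_to_twips(29.7))
```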
|
||||
|
||||
```js
|
||||
page: {
|
||||
size: { width: 11906, height: 16838 },
|
||||
margin: { top: 1440, bottom: 1440, left: 1701, right: 1417, header: 850, footer: 992 },
|
||||
}
|
||||
```
|
||||
|
||||
For a binding margin, add 0.5–1.0 cm to the left margin (i.e., left: 1984–2268).
|
||||
|
||||
---
|
||||
|
||||
## Font Specifications
|
||||
|
||||
| Element | CN Font | EN Font | Size | half-pt | Style |
|
||||
|---------|---------|---------|------|---------|-------|
|
||||
| Thesis title | SimHei | Times New Roman | Xiao Er 18pt | 36 | Bold, centered |
|
||||
| H1 | SimHei | Times New Roman | San Hao 16pt | 32 | Bold, centered |
|
||||
| H2 | SimHei | Times New Roman | Xiao San 15pt | 30 | Bold, left |
|
||||
| H3 | SimHei | Times New Roman | Si Hao 14pt | 28 | Bold, left |
|
||||
| Body | SimSun | Times New Roman | Xiao Si 12pt | 24 | Normal, justified |
|
||||
| Abstract title | SimHei | Times New Roman Bold | San Hao 16pt | 32 | Bold, centered |
|
||||
| Abstract body | SimSun | Times New Roman | Xiao Si 12pt | 24 | Normal, justified |
|
||||
| Keywords label | SimHei | Times New Roman Bold | Xiao Si 12pt | 24 | Bold |
|
||||
| Keywords content | SimSun | Times New Roman | Xiao Si 12pt | 24 | Normal |
|
||||
| Header | SimSun | Times New Roman | Xiao Wu 9pt | 18 | Centered, color 333333 |
|
||||
| Page number | — | Times New Roman | Wu Hao 10.5pt | 21 | Centered |
|
||||
| Footnote | SimSun | Times New Roman | Xiao Wu 9pt | 18 | Normal |
|
||||
| Figure/table caption | SimSun | Times New Roman | Wu Hao 10.5pt | 21 | Centered |
|
||||
|
||||
### Paragraph Format
|
||||
- Body: justified, first-line indent of 2 characters (`firstLine: 480`; 2 × Xiao Si 12pt = 24pt = 480 twips)
|
||||
- Line spacing: 1.5x (`line: 360`); if school requires fixed 22pt, use `line: 440, lineRule: "exact"`
|
||||
- Body paragraph spacing: before/after 0pt; heading spacing per styles below
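The values in these paragraph settings come from two simple rules:

- A first-line indent of N characters is N × font size × 20 twips.
- With the default `auto` line rule, `line` is the multiple × 240.

The sketch below implements both rules, plus the fixed-height case. The helper names are hypothetical.

```python
def first_line_indent_twips(chars, font_pt):
    """N-character first-line indent: N x font size (pt) x 20 twips per pt."""
    return round(chars * font_pt * 20)


def multiple_to_line(multiple):
    """Line-spacing multiple to docx `line` for lineRule "auto" (240 = single)."""
    return round(multiple * 240)


def exact_pt_to_line(points):
    """Fixed line height in points to `line` for lineRule "exact"."""
    return round(points * 20)
```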
|
||||
|
||||
```js
|
||||
styles: {
|
||||
default: {
|
||||
document: {
|
||||
run: { font: { ascii: "Times New Roman", eastAsia: "SimSun" }, size: 24, color: "000000" },
|
||||
paragraph: { spacing: { line: 360 } },
|
||||
},
|
||||
heading1: {
|
||||
run: { font: { ascii: "Times New Roman", eastAsia: "SimHei" }, size: 32, bold: true, color: "000000" },
|
||||
paragraph: { alignment: AlignmentType.CENTER, spacing: { before: 480, after: 360, line: 360 } },
|
||||
},
|
||||
heading2: {
|
||||
run: { font: { ascii: "Times New Roman", eastAsia: "SimHei" }, size: 30, bold: true, color: "000000" },
|
||||
paragraph: { spacing: { before: 360, after: 240, line: 360 } },
|
||||
},
|
||||
heading3: {
|
||||
run: { font: { ascii: "Times New Roman", eastAsia: "SimHei" }, size: 28, bold: true, color: "000000" },
|
||||
paragraph: { spacing: { before: 240, after: 120, line: 360 } },
|
||||
},
|
||||
},
|
||||
}
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## Heading Numbering System (Mandatory)
|
||||
|
||||
### Format
|
||||
|
||||
| Level | Format | Example |
|
||||
|-------|--------|---------|
|
||||
| H1 | Chapter X + title | 第一章 绪论 (Chapter 1 Introduction) |
|
||||
| H2 | X.X + section title | 1.1 Research Background |
|
||||
| H3 | X.X.X + subsection | 1.1.1 Domestic Research Status |
|
||||
|
||||
### Mandatory Rules
|
||||
1. **H1 must use "第X章" format** — not "一、", not "Chapter 1", not "第1章"
|
||||
2. **H2/H3 use Arabic decimal numbering** (1.1, 1.1.1) — no "(一)", "1)"
|
||||
3. **No mixing multiple numbering systems**
|
||||
4. **No level-skipping** (cannot jump from H1 to H3)
|
||||
5. **All body headings must use `heading: HeadingLevel.HEADING_X`** (TOC depends on this)
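Rules 1, 2 and 4 can be checked mechanically before the headings are emitted as `HeadingLevel` paragraphs. The sketch assumes the outline is given as `(level, title)` pairs and has at most 99 chapters. `chinese_numeral` and `number_headings` are hypothetical helpers.

```python
DIGITS = "零一二三四五六七八九"


def chinese_numeral(n):
    """1-99 to Chinese numerals: 1 -> 一, 10 -> 十, 12 -> 十二, 23 -> 二十三."""
    if not 1 <= n <= 99:
        raise ValueError("chapter number outside supported range 1-99")
    tens, ones = divmod(n, 10)
    if tens == 0:
        return DIGITS[ones]
    head = "" if tens == 1 else DIGITS[tens]
    return head + "十" + (DIGITS[ones] if ones else "")


def number_headings(outline):
    """Label (level, title) pairs: H1 as 第X章, H2/H3 as 1.1 / 1.1.1.

    Raises ValueError on level skipping (rule 4) or levels outside 1-3.
    """
    counters = [0, 0, 0]
    previous = 0
    labelled = []
    for level, title in outline:
        if not 1 <= level <= 3 or level > previous + 1:
            raise ValueError(f"level {level} cannot follow level {previous}: {title}")
        counters[level - 1] += 1
        for deeper in range(level, 3):  # reset all lower levels
            counters[deeper] = 0
        if level == 1:
            label = f"第{chinese_numeral(counters[0])}章"
        else:
            label = ".".join(str(c) for c in counters[:level])
        labelled.append(f"{label} {title}")
        previous = level
    return labelled
```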
|
||||
|
||||
```js
|
||||
// ✅ Correct
|
||||
new Paragraph({
|
||||
heading: HeadingLevel.HEADING_1,
|
||||
children: [new TextRun({ text: "第一章 绪论", bold: true, size: 32, font: { eastAsia: "SimHei", ascii: "Times New Roman" } })]
|
||||
})
|
||||
new Paragraph({
|
||||
heading: HeadingLevel.HEADING_2,
|
||||
children: [new TextRun({ text: "1.1 研究背景", bold: true, size: 30, font: { eastAsia: "SimHei", ascii: "Times New Roman" } })]
|
||||
})
|
||||
```
|
||||
|
||||
### Non-Body Headings
|
||||
Abstract, Table of Contents, References, Appendices, Acknowledgments:
|
||||
- Use H1 style (San Hao SimHei centered) for TOC indexing
|
||||
- But **no numbering** (write directly: "摘 要", "参考文献", etc. — these are non-numbered standalone section headings)
|
||||
|
||||
---
|
||||
|
||||
## Document Structure & Multi-Section Architecture
|
||||
|
||||
Theses must use **multi-section structure** for independent page numbering and header/footer per section.
|
||||
|
||||
### Complete Structure
|
||||
|
||||
```
|
||||
Section 1: Cover → No page number, no header/footer
|
||||
Section 2: Chinese Abstract → Roman numerals starting from I
|
||||
Section 3: English Abstract → Roman numerals continued
|
||||
Section 4: Table of Contents → Roman numerals continued
|
||||
Section 5: Body (all chapters) → Arabic numerals from 1
|
||||
Section 6: References → Arabic numerals continued
|
||||
Section 7: Appendices (if any) → Arabic numerals continued
|
||||
Section 8: Acknowledgments (if any) → Arabic numerals continued
|
||||
```
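Simulating the displayed page labels is a quick way to sanity-check a numbering plan like the one above before building the document. The sketch rests on two assumptions:

- A section without a `start` value continues counting from the previous section.
- The unnumbered cover still occupies a page.

`page_labels` and `to_roman` are illustrative names.

```python
def to_roman(n, upper=True):
    """Positive integer to a Roman numeral."""
    values = [(1000, "m"), (900, "cm"), (500, "d"), (400, "cd"), (100, "c"),
              (90, "xc"), (50, "l"), (40, "xl"), (10, "x"), (9, "ix"),
              (5, "v"), (4, "iv"), (1, "i")]
    out = []
    for value, symbol in values:
        count, n = divmod(n, value)
        out.append(symbol * count)
    text = "".join(out)
    return text.upper() if upper else text


def page_labels(sections):
    """sections: (page_count, fmt, start) tuples.

    fmt is None (no number shown), "upperRoman", "lowerRoman" or "decimal";
    start=None continues from the previous section. Returns one label per page.
    """
    labels = []
    current = 0
    for page_count, fmt, start in sections:
        if start is not None:
            current = start - 1
        for _ in range(page_count):
            current += 1
            if fmt is None:
                labels.append(None)
            elif fmt == "decimal":
                labels.append(str(current))
            else:
                labels.append(to_roman(current, upper=(fmt == "upperRoman")))
    return labels
```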
|
||||
|
||||
### Page Number Implementation
|
||||
|
||||
```js
|
||||
const { NumberFormat } = require("docx");
|
||||
|
||||
// Section 1: Cover — no page number
|
||||
{
|
||||
properties: {
|
||||
page: { margin: { top: 0, bottom: 0, left: 0, right: 0 } },
|
||||
titlePage: true,
|
||||
},
|
||||
children: buildCover(...),
|
||||
}
|
||||
|
||||
// Section 2: Abstract — Roman numerals from I
|
||||
{
|
||||
properties: {
|
||||
type: SectionType.NEXT_PAGE,
|
||||
page: {
|
||||
margin: { top: 1440, bottom: 1440, left: 1701, right: 1417, header: 850, footer: 992 },
|
||||
pageNumbers: { start: 1, formatType: NumberFormat.UPPER_ROMAN },
|
||||
},
|
||||
},
|
||||
headers: { default: buildHeader("Thesis Title") },
|
||||
footers: { default: buildPageNumberFooter() },
|
||||
children: buildAbstractCN(...),
|
||||
}
|
||||
|
||||
// Section 3: English Abstract — Roman numerals continued (no reset)
|
||||
{
|
||||
properties: {
|
||||
type: SectionType.NEXT_PAGE,
|
||||
page: {
|
||||
margin: { top: 1440, bottom: 1440, left: 1701, right: 1417, header: 850, footer: 992 },
|
||||
pageNumbers: { formatType: NumberFormat.UPPER_ROMAN }, // no start → continues from previous
|
||||
},
|
||||
},
|
||||
headers: { default: buildHeader("Thesis Title") },
|
||||
footers: { default: buildPageNumberFooter() },
|
||||
children: buildAbstractEN(...),
|
||||
}
|
||||
|
||||
// Section 5: Body — Arabic numerals from 1
|
||||
{
|
||||
properties: {
|
||||
type: SectionType.NEXT_PAGE,
|
||||
page: {
|
||||
margin: { top: 1440, bottom: 1440, left: 1701, right: 1417, header: 850, footer: 992 },
|
||||
pageNumbers: { start: 1, formatType: NumberFormat.DECIMAL },
|
||||
},
|
||||
},
|
||||
headers: { default: buildHeader("Thesis Title") },
|
||||
footers: { default: buildPageNumberFooter() },
|
||||
children: buildMainContent(...),
|
||||
}
|
||||
// Section 6+: References/Appendices/Acknowledgments — Arabic continued
|
||||
```
|
||||
|
||||
### Header & Footer Helpers
|
||||
|
||||
```js
|
||||
function buildHeader(title) {
|
||||
return new Header({ children: [
|
||||
new Paragraph({ alignment: AlignmentType.CENTER,
|
||||
border: { bottom: { style: BorderStyle.SINGLE, size: 1, color: "000000" } },
|
||||
children: [new TextRun({ text: title, size: 18, color: "333333",
|
||||
font: { ascii: "Times New Roman", eastAsia: "SimSun" } })],
|
||||
}),
|
||||
] });
|
||||
}
|
||||
|
||||
function buildPageNumberFooter() {
|
||||
return new Footer({ children: [
|
||||
new Paragraph({ alignment: AlignmentType.CENTER,
|
||||
children: [
|
||||
new TextRun({ text: "- ", size: 21 }),
|
||||
new TextRun({ children: [PageNumber.CURRENT], size: 21 }),
|
||||
new TextRun({ text: " -", size: 21 }),
|
||||
],
|
||||
}),
|
||||
] });
|
||||
}
|
||||
```
|
||||
|
||||
### Page Break Rules
|
||||
- Cover is a separate section (no PageBreak needed)
|
||||
- Chinese abstract, English abstract, TOC each in their own section
|
||||
- All body chapters in **one section** (no forced page breaks between chapters unless user requests)
|
||||
- References, appendices, acknowledgments each in their own section
|
||||
- **Never use blank lines instead of section breaks**
|
||||
|
||||
---
|
||||
|
||||
## Cover
|
||||
|
||||
### Information Fields
|
||||
|
||||
Cover must include (use placeholders for missing info):
|
||||
|
||||
| Field | Format | Placeholder |
|
||||
|-------|--------|-------------|
|
||||
| University name | Er Hao SimHei, centered | ×××University |
|
||||
| Thesis title (CN) | Xiao Er SimHei, centered | (user-provided) |
|
||||
| Thesis title (EN) | San Hao Times New Roman, centered | (translated from CN) |
|
||||
| College | Si Hao SimSun | ×××College |
|
||||
| Major | Si Hao SimSun | ×××Major |
|
||||
| Author | Si Hao SimSun | ××× |
|
||||
| Student ID | Si Hao SimSun | ××××××× |
|
||||
| Advisor | Si Hao SimSun | ×××Professor |
|
||||
| Date | Si Hao SimSun | 2026/XX |
|
||||
|
||||
### Cover Style
|
||||
|
||||
Use Recipe R5 (Clean White) or academic-specific `buildAcademicCover()` — never use commercial-style covers.
|
||||
|
||||
### Cover Layout Order (Mandatory)
|
||||
|
||||
The visual order on academic covers must follow this hierarchy from top to bottom:
|
||||
|
||||
1. School name (top)
|
||||
2. Document type label (e.g., "Undergraduate Thesis", "Thesis Proposal Report")
|
||||
3. **Thesis title** (prominent, centered)
|
||||
4. Thesis English title (if bilingual)
|
||||
5. **Author information table** (college, major, author, student ID, advisor)
|
||||
6. Date (bottom)
|
||||
|
||||
⚠️ **Title MUST appear ABOVE the author info table.** An info table that renders above the title is caused by incorrect element ordering. The `buildAcademicCover()` and `buildProposalCover()` functions below enforce the correct order.
|
||||
|
||||
⚠️ **Layout must be vertically balanced** — use dynamic spacing to distribute whitespace evenly. Do not cram all elements into the top half or let large gaps appear between elements.
|
||||
|
||||
```js
|
||||
function buildAcademicCover(info) {
|
||||
const { school, title, titleEN, college, major, author, studentId, advisor, date } = info;
|
||||
|
||||
// ⚠️ Use safeText() for all values — never output "undefined"
|
||||
const infoRows = [
|
||||
["College", safeText(college, "【College】")],
|
||||
["Major", safeText(major, "【Major】")],
|
||||
["Author", safeText(author, "【Author】")],
|
||||
["Student ID", safeText(studentId, "【Student ID】")],
|
||||
["Advisor", safeText(advisor, "【Advisor】")],
|
||||
];
|
||||
|
||||
const infoTable = new Table({
|
||||
width: { size: 60, type: WidthType.PERCENTAGE },
|
||||
alignment: AlignmentType.CENTER,
|
||||
borders: { top: NB, bottom: NB, left: NB, right: NB, insideHorizontal: NB, insideVertical: NB },
|
||||
rows: infoRows.map(([label, value]) => new TableRow({
|
||||
cantSplit: true,
|
||||
children: [
|
||||
new TableCell({
|
||||
width: { size: 35, type: WidthType.PERCENTAGE },
|
||||
borders: { top: NB, bottom: NB, left: NB, right: NB }, // label column: no borders
|
||||
margins: { top: 60, bottom: 60, left: 120, right: 120 },
|
||||
children: [new Paragraph({
|
||||
alignment: AlignmentType.LEFT,
|
||||
children: [new TextRun({ text: label + ":", size: 28, font: { eastAsia: "SimHei", ascii: "Times New Roman" } })],
|
||||
})],
|
||||
}),
|
||||
new TableCell({
|
||||
borders: { bottom: { style: BorderStyle.SINGLE, size: 4, color: "000000" }, top: NB, left: NB, right: NB }, // fixed-length underline
|
||||
margins: { top: 60, bottom: 60, left: 120, right: 120 },
|
||||
children: [new Paragraph({
|
||||
alignment: AlignmentType.LEFT,
|
||||
children: [new TextRun({ text: value, size: 28, font: { eastAsia: "SimSun", ascii: "Times New Roman" } })],
|
||||
})],
|
||||
}),
|
||||
],
|
||||
})),
|
||||
});
|
||||
|
||||
// ⚠️ Correct order: school → doc type → TITLE → info table → date
|
||||
// ★ Rule 8: All large-font paragraphs must set explicit line spacing
|
||||
return [
|
||||
new Paragraph({ alignment: AlignmentType.CENTER, spacing: { before: 1200, after: 400, line: Math.ceil(22 * 23), lineRule: "atLeast" },
|
||||
children: [new TextRun({ text: safeText(school, "【University Name】"), size: 44, bold: true, font: { eastAsia: "SimHei" } })] }),
|
||||
new Paragraph({ alignment: AlignmentType.CENTER, spacing: { after: 800, line: Math.ceil(18 * 23), lineRule: "atLeast" },
|
||||
children: [new TextRun({ text: "Undergraduate Thesis", size: 36, font: { eastAsia: "SimHei" } })] }),
|
||||
new Paragraph({ alignment: AlignmentType.CENTER, spacing: { after: 200, line: Math.ceil(18 * 23), lineRule: "atLeast" },
|
||||
children: [new TextRun({ text: safeText(title, "【Thesis Title】"), size: 36, bold: true, font: { eastAsia: "SimHei", ascii: "Times New Roman" } })] }),
|
||||
titleEN ? new Paragraph({ alignment: AlignmentType.CENTER, spacing: { after: 1200, line: Math.ceil(16 * 23), lineRule: "atLeast" },
|
||||
children: [new TextRun({ text: titleEN, size: 32, font: { ascii: "Times New Roman" } })] })
|
||||
: new Paragraph({ spacing: { after: 1200 }, children: [] }),
|
||||
infoTable,
|
||||
new Paragraph({ alignment: AlignmentType.CENTER, spacing: { before: 1200, line: Math.ceil(14 * 23), lineRule: "atLeast" },
|
||||
children: [new TextRun({ text: safeText(date, "2026/XX"), size: 28, font: { eastAsia: "SimSun" } })] }),
|
||||
];
|
||||
}
|
||||
```
|
||||
|
||||
### Thesis Proposal Report Cover (开题报告)
|
||||
|
||||
Thesis proposal reports use a similar cover layout with a different document type label. The key layout rule is the same: **title above author info, evenly spaced**.
|
||||
|
||||
⚠️ **CRITICAL — Proposal cover MUST be an independent section:**
|
||||
The proposal cover MUST be placed in its **own section** (with margin: 0 and a 16838 wrapper table), completely separate from the body content. The body content starts in the **next section** (with `SectionType.NEXT_PAGE` or as a separate section entry). **Never place the cover elements and body content in the same section** — this causes them to render on the same page without any page break, which is the #1 proposal report formatting failure.
|
||||
|
||||
```js
|
||||
// ✅ Correct — cover and body in separate sections
|
||||
sections: [
|
||||
{
|
||||
properties: { page: { margin: { top: 0, bottom: 0, left: 0, right: 0 } } },
|
||||
children: buildProposalCover(info), // standalone cover section
|
||||
},
|
||||
{
|
||||
properties: { page: { margin: { top: 1440, bottom: 1440, left: 1701, right: 1417 } } },
|
||||
children: [...bodyContent], // body starts here
|
||||
},
|
||||
]
|
||||
|
||||
// ❌ WRONG — cover and body in same section (no page separation!)
|
||||
sections: [
|
||||
{
|
||||
children: [...coverElements, ...bodyContent], // everything on one continuous flow
|
||||
},
|
||||
]
|
||||
```
|
||||
|
||||
```js
|
||||
function buildProposalCover(info) {
|
||||
const { school, year, title, subtitle, college, major, author, studentId, advisor, date } = info;
|
||||
|
||||
// ⚠️ Use safeText() for all values
|
||||
const infoRows = [
|
||||
["姓名 (Name)", safeText(author, "XXX")],
|
||||
["专业 (Major)", safeText(major, "XXX")],
|
||||
["入学时间 (Enrollment)", safeText(info.enrollment, "XXX")],
|
||||
];
|
||||
|
||||
const infoTable = new Table({
|
||||
width: { size: 60, type: WidthType.PERCENTAGE },
|
||||
alignment: AlignmentType.CENTER,
|
||||
borders: { top: NB, bottom: NB, left: NB, right: NB, insideHorizontal: NB, insideVertical: NB },
|
||||
rows: infoRows.map(([label, value]) => new TableRow({
|
||||
children: [
|
||||
new TableCell({
|
||||
width: { size: 35, type: WidthType.PERCENTAGE },
|
||||
borders: { top: NB, bottom: NB, left: NB, right: NB }, // label column: no borders
|
||||
margins: { top: 60, bottom: 60, left: 120, right: 120 },
|
||||
children: [new Paragraph({
|
||||
alignment: AlignmentType.LEFT,
|
||||
children: [new TextRun({ text: label, size: 28, bold: true, font: { eastAsia: "SimHei", ascii: "Times New Roman" } })],
|
||||
})],
|
||||
}),
|
||||
new TableCell({
|
||||
borders: { bottom: { style: BorderStyle.SINGLE, size: 4, color: "000000" }, top: NB, left: NB, right: NB }, // fixed-length underline
|
||||
margins: { top: 60, bottom: 60, left: 120, right: 120 },
|
||||
children: [new Paragraph({
|
||||
alignment: AlignmentType.LEFT,
|
||||
children: [new TextRun({ text: value, size: 28, font: { eastAsia: "SimSun", ascii: "Times New Roman" } })],
|
||||
})],
|
||||
}),
|
||||
],
|
||||
})),
|
||||
});
|
||||
|
||||
// ⚠️ Correct order: doc type label → info table → "论文题目" label → TITLE → subtitle
|
||||
// Layout balanced: upper 40% for header + info, middle 20% for title, lower 40% for whitespace
|
||||
// ★ Rule 8: All large-font paragraphs must set explicit line spacing
|
||||
return [
|
||||
new Paragraph({ alignment: AlignmentType.CENTER, spacing: { before: 1500, after: 600, line: Math.ceil(18 * 23), lineRule: "atLeast" },
|
||||
children: [new TextRun({ text: safeText(year, "2025") + " 届本科毕业论文开题报告",
|
||||
size: 36, bold: true, font: { eastAsia: "SimHei", ascii: "Times New Roman" } })] }),
|
||||
infoTable,
|
||||
new Paragraph({ spacing: { before: 1200 } }), // Balanced whitespace
|
||||
new Paragraph({ alignment: AlignmentType.CENTER, spacing: { after: 200, line: Math.ceil(14 * 23), lineRule: "atLeast" },
|
||||
children: [new TextRun({ text: "论文题目", size: 28, font: { eastAsia: "SimSun", ascii: "Times New Roman" } })] }),
|
||||
new Paragraph({ alignment: AlignmentType.CENTER, spacing: { after: 200, line: Math.ceil(16 * 23), lineRule: "atLeast" },
|
||||
children: [new TextRun({ text: safeText(title, "【Thesis Title】"), size: 32, bold: true,
|
||||
font: { eastAsia: "SimHei", ascii: "Times New Roman" } })] }),
|
||||
subtitle ? new Paragraph({ alignment: AlignmentType.CENTER, spacing: { after: 800, line: Math.ceil(14 * 23), lineRule: "atLeast" },
|
||||
children: [new TextRun({ text: "——" + subtitle, size: 28,
|
||||
font: { eastAsia: "SimSun", ascii: "Times New Roman" } })] })
|
||||
: new Paragraph({ spacing: { after: 800 }, children: [] }),
|
||||
];
|
||||
}
|
||||
```

### ⚠️ WPS Compatibility Notes for Academic Covers

Both the thesis cover and the proposal cover use info tables. These tables MUST follow these cross-engine rules:

- Table uses **percentage widths** (`WidthType.PERCENTAGE`), NOT DXA, because WPS renders DXA widths differently in nested contexts
- Table width: adaptive 55–75%, centered via `alignment: CENTER` (calculated by `calcR5MetaLayout()`)
- Label column: **LEFT aligned**, plain text + ":", NO full-width space padding, NO borders
- Value column: **LEFT aligned**, with a `bottom: single sz=4` border that acts as a fixed-length underline
- Small cell margins such as `margins.top/bottom: 60` are acceptable; avoid larger values
- Every paragraph whose font size exceeds the 12pt body size must set `spacing: { line: Math.ceil(fontPt * 23), lineRule: "atLeast" }` so the tops of the glyphs are not clipped (Rule 8)
- ⚠️ Do NOT use DXA widths, full-width space padding (`\u3000`), tab stops, or right alignment for meta info

⚠️ **The proposal cover must fit on one page.** Use the same height-budget approach as commercial covers: total content height must stay within 15638 twips (the 16838-twip A4 page height minus a 1200-twip safety margin). If the title is very long, reduce its font size, but no lower than `size: 24` (12pt). The title starts at `size: 32` (16pt).
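
The following is a minimal sketch of that height budget. It treats the 15638-twip figure as the whole budget. The title in the code above is already `size: 32` (16pt), so the sketch reads the minimum of 24 as the docx half-point value `size: 24` (12pt). Both points are interpretations, not confirmed rules.

The function and constant names (`blockHeight`, `totalHeight`, `fitTitleSize`, `TEXT_WIDTH_TWIPS`) are hypothetical, and so is the 9026-twip text width. The block heights in the example are hand estimates, not measured values.

```javascript
// Hypothetical one-page check for the proposal cover.
// Assumptions: sizes are docx half-points; a text block costs
// before + after + lines * Math.ceil(fontPt * 23) twips (the Rule 8 line height);
// a CJK character is about one em (size * 10 twips) wide; non-text blocks such
// as the info table carry a hand-estimated fixedTwips height.
const COVER_BUDGET_TWIPS = 15638; // budget stated above (16838 A4 height - 1200)
const TEXT_WIDTH_TWIPS = 9026;    // assumption: A4 with 1-inch side margins

function blockHeight(block) {
  if (block.fixedTwips !== undefined) return block.fixedTwips;
  const { size, before = 0, after = 0, chars } = block;
  const lineTwips = Math.ceil((size / 2) * 23);
  const lines = chars ? Math.ceil((chars * size * 10) / TEXT_WIDTH_TWIPS) : 1;
  return before + after + lines * lineTwips;
}

function totalHeight(blocks) {
  return blocks.reduce((sum, block) => sum + blockHeight(block), 0);
}

// Shrinks the title one point (2 half-points) at a time until the cover fits.
// Returns the first size that fits, or null if even minSize overflows
// (then shorten the title or move content instead).
function fitTitleSize(blocks, titleIndex, minSize = 24) {
  for (let size = blocks[titleIndex].size; size >= minSize; size -= 2) {
    const trial = blocks.map((b, i) => (i === titleIndex ? { ...b, size } : b));
    if (totalHeight(trial) <= COVER_BUDGET_TWIPS) return size;
  }
  return null;
}

// Example layout mirroring the proposal cover code above (heights are estimates)
const cover = [
  { size: 36, before: 1500, after: 600 }, // "... 届本科毕业论文开题报告" label
  { fixedTwips: 7500 },                    // info table (estimated)
  { fixedTwips: 1500 },                    // spacer paragraph
  { size: 28, after: 200 },                // "论文题目" label
  { size: 32, after: 200, chars: 200 },    // very long title
  { size: 28, after: 800 },                // subtitle
];
// fitTitleSize(cover, 4) returns 28: the title must drop from 16pt to 14pt to fit
```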

---

## Section Content Standards

### Chinese Abstract

**Format:**

- Title: "摘 要" ("Abstract", with a space between the two characters), San Hao (16pt) SimHei, centered, H1 style
- Body: Xiao Si (12pt) SimSun, justified, first-line indent 480 twips (two characters)
- Keywords: label "关键词:" ("Keywords:") in bold SimHei + content in regular SimSun, 3–8 keywords, semicolon-separated

**Content structure (mandatory):**

1. Research background (1–2 sentences)
2. Research problem/purpose (1 sentence)
3. Research method (1–2 sentences)
4. Main results/findings (2–3 sentences)
5. Research significance/value (1 sentence)

⚠️ **The abstract is NOT a TOC summary.** It must not read as "Chapter 1 introduces... Chapter 2 analyzes..."

### English Abstract

- Title: "Abstract", San Hao (16pt) Times New Roman Bold, centered, H1 style
- Body: Xiao Si (12pt) Times New Roman, justified
- Keywords: bold label + regular content, 3–8 keywords, comma-separated
- **Must be consistent with the Chinese abstract**, with no significant shrinkage of content
- Use formal academic English and avoid Chinglish
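
The keyword rules for both abstracts can be captured in a small helper. It validates the 3–8 count and builds plain option objects that can be passed to docx.js `new TextRun(...)`.

The function names are hypothetical. The separators (full-width "；" for Chinese, ", " for English) and the full-width colon in the Chinese label are assumptions; follow the school template where it differs.

```javascript
// Hypothetical helpers for the keyword line of the Chinese and English abstracts.
// Assumption: Chinese keywords are joined with the full-width semicolon "；",
// English keywords with ", ".
function formatKeywords(keywords, lang) {
  const cleaned = keywords.map((k) => k.trim()).filter((k) => k.length > 0);
  if (cleaned.length < 3 || cleaned.length > 8) {
    throw new RangeError(`Expected 3–8 keywords, got ${cleaned.length}`);
  }
  return cleaned.join(lang === "zh" ? "；" : ", ");
}

// Returns [labelRun, contentRun] as plain option objects for docx.js TextRun.
// size is in half-points: 24 = Xiao Si (12pt).
function keywordRuns(keywords, lang) {
  const text = formatKeywords(keywords, lang);
  if (lang === "zh") {
    return [
      // Full-width colon assumed for the Chinese label
      { text: "关键词：", bold: true, size: 24, font: { eastAsia: "SimHei" } },
      { text, size: 24, font: { eastAsia: "SimSun", ascii: "Times New Roman" } },
    ];
  }
  return [
    { text: "Keywords: ", bold: true, size: 24, font: { ascii: "Times New Roman" } },
    { text, size: 24, font: { ascii: "Times New Roman" } },
  ];
}
```

Usage: `new Paragraph({ children: keywordRuns(list, "zh").map((o) => new TextRun(o)) })`.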

### Table of Contents

- Title: "目 录" ("Contents"), San Hao (16pt) SimHei, centered
- Generate it with a `TableOfContents` field that shows at least H1–H2 (H1–H3 recommended)
- Run `"$DOCX_SCRIPTS/add_toc_placeholders.py" --auto` after generation
- Place the TOC on its own page

---

## Body Chapter Structure

### Standard Structure (6-chapter)

```
Chapter 1: Introduction
  1.1 Research Background
  1.2 Research Purpose & Significance
  1.3 Literature Review (Domestic & International)
  1.4 Research Content & Methods
  1.5 Thesis Structure

Chapter 2: Theoretical Framework & Literature Review
  2.1 Core Concept Definitions
  2.2 Theoretical Basis
  2.3 Literature Review
  2.4 Research Gap & Entry Point

Chapter 3: Research Design / Method / Model
  3.1 Research Framework
  3.2 Method Design / System Architecture / Algorithm
  3.3 Variables / Data Sources / Experimental Environment

Chapter 4: Empirical Analysis / Case Study / Results
  4.1 Data Analysis / Case Description / Experiment Process
  4.2 Results Presentation
  4.3 Results Interpretation

Chapter 5: Discussion
  5.1 Key Findings
  5.2 Comparison with Existing Research
  5.3 Limitations

Chapter 6: Conclusions & Outlook
  6.1 Research Conclusions
  6.2 Contributions
  6.3 Limitations
  6.4 Future Research Directions
```

### Chapter Content Requirements

**Chapter 1 (Introduction):** Must state the research background, purpose, significance, methods, content, and thesis structure.

**Chapter 2 (Literature Review):** Must be organized systematically by theme, method, or stage, and **never be a chronological dump of papers**. Must identify existing contributions, gaps, and research opportunities.

**Chapter 3 (Method):** Must explain why this method was chosen. The description must be clear enough for a reader to understand, carry out, and reproduce it.

**Chapter 4 (Results):** Must be specific, not vague, and consistent with the design in Chapter 3.

**Chapter 5 (Discussion):** Must not merely repeat the Chapter 4 results. Must explain what the results mean and which conclusions they support.

**Chapter 6 (Conclusions):** Must summarize concisely, state contributions, acknowledge limitations, and propose future directions. Must end formally, not abruptly.

---

## Discipline-Adaptive Routing

Adjust research methods and chapter emphasis automatically by discipline. **When the user doesn't specify a method, choose the research paradigm best suited to the discipline. Never mechanically apply an "empirical + survey + regression" template.**

### 1. Humanities & Social Sciences (Literature, History, Philosophy, Arts)

**Preferred methods:** Literature analysis, theoretical research, text analysis, comparative studies, historical research

**Adjustments:** Ch.2 focuses on theoretical lineage; Ch.4 becomes text analysis/case argumentation; minimize "variables", "hypotheses", "regression" terminology

### 2. Management / Economics / Public Administration

**Preferred methods:** Case analysis, surveys, model analysis, institutional research, empirical research

**Adjustments:** Ch.3 focuses on hypotheses, variables, framework; Ch.4 on data collection & analysis; Ch.5 adds management implications/policy recommendations

### 3. Computer Science / Engineering / IT

**Preferred methods:** Method design, system architecture, experimental comparison, performance evaluation, algorithm analysis

**Adjustments:** Ch.3 becomes system/algorithm design; Ch.4 becomes experiments (environment, parameters, control experiments, metric comparison); minimize "interviews", "surveys"

### 4. Education / Linguistics / Communication

**Preferred methods:** Teaching experiments, text analysis, survey research, interview research, case studies

**Adjustments:** Ch.3 focuses on subjects, dimensions, samples; Ch.4 on teaching practice/communication case analysis; Ch.5 adds educational implications/communication strategies

### 5. Law / Marxism / Policy Studies

**Preferred methods:** Normative analysis, statutory interpretation, case studies, institutional comparison, theoretical analysis

**Adjustments:** Ch.2 focuses on legal/policy framework; Ch.4 becomes case analysis/institutional comparison; Ch.5 focuses on normative evaluation, reform recommendations
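
The routing above can be expressed as a small lookup, sketched here with hypothetical names (`DISCIPLINE_PROFILES`, `ROUTES`, `selectDisciplineProfile`). The keyword patterns and shortened method lists are illustrative assumptions, not an exhaustive classifier. An ambiguous name such as "Business Law" lands on whichever pattern matches first, so treat the result as a starting point.

```javascript
// Hypothetical sketch: map a discipline name to one of the five profiles above.
const DISCIPLINE_PROFILES = {
  humanities: {
    methods: ["literature analysis", "theoretical research", "text analysis", "comparative study"],
    avoidTerms: ["variables", "hypotheses", "regression"],
  },
  management: {
    methods: ["case analysis", "survey", "model analysis", "empirical research"],
    avoidTerms: [],
  },
  engineering: {
    methods: ["system design", "experimental comparison", "performance evaluation"],
    avoidTerms: ["interviews", "surveys"],
  },
  education: {
    methods: ["teaching experiment", "text analysis", "survey", "interview", "case study"],
    avoidTerms: [],
  },
  law: {
    methods: ["normative analysis", "statutory interpretation", "case study", "institutional comparison"],
    avoidTerms: [],
  },
};

// First matching pattern wins; word boundaries keep "art" from matching "department".
const ROUTES = [
  ["humanities", /\b(literature|history|philosophy|arts?)\b/],
  ["engineering", /\b(computer|software|engineering)\b|information technology/],
  ["management", /\b(management|economics|business|finance|accounting)\b|public administration/],
  ["education", /\b(education|linguistics|communications?|journalism)\b/],
  ["law", /\b(law|marxism|policy)\b/],
];

// Returns a profile key, or null for an unknown discipline so the paradigm is
// chosen deliberately instead of defaulting to "empirical + survey + regression".
function selectDisciplineProfile(discipline) {
  const text = discipline.toLowerCase();
  const match = ROUTES.find(([, pattern]) => pattern.test(text));
  return match ? match[0] : null;
}
```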

---

## Figure/Table/Formula Numbering (By Chapter)

### Numbering Rules

| Type | Format | Example |
|------|--------|---------|
| Figure | Figure X-Y | Figure 3-1, Figure 4-2 |
| Table | Table X-Y | Table 2-1, Table 4-3 |
| Formula | Eq. (X-Y) | Eq. (3-1), Eq. (5-2) |

Where X = chapter number and Y = sequential number within the chapter.
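
The numbering rule needs one counter per kind that resets at every chapter. The sketch below uses a hypothetical class `ChapterNumbering`. It produces the English label forms from the table; a Chinese thesis would substitute its own prefixes.

```javascript
// Hypothetical per-chapter numbering for figures, tables and formulas ("X-Y").
class ChapterNumbering {
  constructor() {
    this.chapter = 0;
    this.counts = { figure: 0, table: 0, formula: 0 };
  }

  // Call at the start of each chapter so Y restarts at 1.
  startChapter(n) {
    this.chapter = n;
    this.counts = { figure: 0, table: 0, formula: 0 };
  }

  next(kind) {
    if (!(kind in this.counts)) throw new Error(`Unknown kind: ${kind}`);
    if (this.chapter < 1) throw new Error("Call startChapter() first");
    this.counts[kind] += 1;
    const id = `${this.chapter}-${this.counts[kind]}`;
    if (kind === "figure") return `Figure ${id}`;
    if (kind === "table") return `Table ${id}`;
    return `Eq. (${id})`;
  }
}
```

Generating every caption through one counter keeps the in-text reference ("as shown in Figure 3-1") and the caption from drifting apart.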

### Figures

- Caption **below** the figure, Wu Hao (10.5pt) SimSun, centered
- Format: "Figure X-Y Description"
- Must be referenced in the text: "as shown in Figure 3-1"

```js
// imgBuf, w and h are assumed to be defined earlier (image buffer and display size)
new Paragraph({
  alignment: AlignmentType.CENTER,
  keepNext: true, // keep the image on the same page as its caption
  children: [new ImageRun({ data: imgBuf, transformation: { width: w, height: h }, type: "png" })],
}),
new Paragraph({
  alignment: AlignmentType.CENTER,
  spacing: { before: 60, after: 200 },
  children: [new TextRun({ text: "图3-1 System Architecture", size: 21,
    font: { eastAsia: "SimSun", ascii: "Times New Roman" } })],
}),
```

### Tables

- Caption **above** the table, Wu Hao (10.5pt) SimSun, centered, `keepNext: true`
- Format: "Table X-Y Description"
- Must use a three-line table (mandatory for academic papers)
- Must be referenced in the text: "as shown in Table 2-1"

### Formulas

- Formula centered, number **right-aligned**
- Place both with a center tab stop and a right tab stop
- Text reference: "from Eq. (3-1)"

```js
new Paragraph({
  // Keep the paragraph left-aligned and let the tab stops do the positioning:
  // with center alignment, Word and WPS may not place the formula and number
  // where the tabs intend.
  tabStops: [
    { type: TabStopType.CENTER, position: 4500 }, // about half the text width
    { type: TabStopType.RIGHT, position: 9000 },  // about the full text width
  ],
  children: [
    new TextRun({ text: "\t" }),
    new TextRun({ text: "E = mc²" }),
    new TextRun({ text: "\t(3-1)" }),
  ],
}),
```
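
The 4500/9000 positions above assume a text width of roughly 9000 twips. A sketch that derives the positions from the page setup instead is shown below. `formulaTabStops` is a hypothetical helper, and the A4 width with 1-inch margins in the example is an assumption, not a thesis default.

```javascript
// Hypothetical helper: center and right tab positions (twips) for a numbered
// formula line, measured from the left text margin.
function formulaTabStops(pageWidth, marginLeft, marginRight) {
  const textWidth = pageWidth - marginLeft - marginRight;
  return {
    center: Math.round(textWidth / 2), // formula centered in the text block
    right: textWidth,                  // equation number flush with the right margin
  };
}

// Example: A4 (11906 twips wide) with 1-inch (1440-twip) side margins
const stops = formulaTabStops(11906, 1440, 1440);
// stops is { center: 4513, right: 9026 }; use them as
// tabStops: [{ type: TabStopType.CENTER, position: stops.center },
//            { type: TabStopType.RIGHT, position: stops.right }]
```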

### Mandatory Rules

1. Figures, tables, and formulas **must be referenced in the text**, never placed without explanation
2. Each must be introduced in the text before it and analyzed after it
3. They must not exceed the page margins
4. Insert them only when they add analytical value, not for decoration

---

## Citation & Reference System

### In-Text Citation (Sequential Numbering)

Default: **GB/T 7714 sequential numbering**, with `[1]`, `[2]` in the text and references listed in order of first appearance.

```js
new TextRun({ text: "[1]", superScript: true, size: 18, font: { ascii: "Times New Roman" } })
```

### Citation Rules

1. In-text numbers must **correspond one-to-one** with the reference list
2. **A source cited again keeps its original number**
3. **Do not mix footnote citations with an end-of-text reference list** (unless the user explicitly requests it)
4. Footnotes are for supplementary notes only, not primary citations
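
Rules 1 and 2 amount to a registry that numbers sources in order of first citation. The sketch below uses a hypothetical class `CitationRegistry` with caller-chosen source keys. The returned string is what goes into the superscript `TextRun` shown earlier.

```javascript
// Hypothetical registry for GB/T 7714 sequential numbering.
class CitationRegistry {
  constructor() {
    this.numbers = new Map(); // source key -> citation number
    this.order = [];          // source keys in order of first appearance
  }

  // Returns "[n]"; a source cited again keeps the number it got the first time.
  cite(key) {
    if (!this.numbers.has(key)) {
      this.order.push(key);
      this.numbers.set(key, this.order.length);
    }
    return `[${this.numbers.get(key)}]`;
  }

  // Keys in the order the reference list must follow.
  referenceOrder() {
    return [...this.order];
  }
}
```

Building the reference list from `referenceOrder()` keeps the list and the in-text numbers one-to-one by construction.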

### Reference Format (GB/T 7714)

```
[1] Author. Title[J]. Journal, Year, Vol(No): Pages.
[2] Author. Book Title[M]. Place: Publisher, Year: Pages.
[3] Author. Title[D]. Location: Institution, Year.
[4] Author. Title[EB/OL]. (Published)[Cited]. URL.
```

Type codes: `[J]` journal article, `[M]` book (monograph), `[D]` thesis/dissertation, `[EB/OL]` online electronic resource.

### Reference Formatting

```js
// Reference title: H1 style
new Paragraph({ heading: HeadingLevel.HEADING_1, alignment: AlignmentType.CENTER,
  children: [new TextRun({ text: "References", bold: true, size: 32, font: { eastAsia: "SimHei" } })] }),

// Each entry: hanging indent, so wrapped lines are indented 420 twips
new Paragraph({
  indent: { left: 420, hanging: 420 },
  spacing: { line: 360 }, // 1.5 lines (240 = single)
  children: [new TextRun({ text: "[1] Author. Title[J]. Journal, 2024, 59(3): 45-62.",
    size: 21, font: { eastAsia: "SimSun", ascii: "Times New Roman" } })],
}),
```

### Reference Count Guidelines

| Thesis Type | Suggested Count |
|------------|----------------|
| Course paper (3000–5000 words) | 10–15 |
| Undergraduate thesis | 15–30 |
| Master's thesis | 40–80 |
| Doctoral dissertation | 80–150 |

If the user specifies APA, MLA, Chicago, or a school-specific format, follow that instead.

---

## Three-Line Table (Mandatory for Academic Papers)

All tables in academic papers **must use three-line tables**. No full-border tables.

```js
// headerCells (string[]) and dataRows (TableRow[]) are assumed to be built elsewhere
const threeLineTable = new Table({
  width: { size: 100, type: WidthType.PERCENTAGE },
  borders: {
    top: { style: BorderStyle.SINGLE, size: 4, color: "000000" },    // top rule
    bottom: { style: BorderStyle.SINGLE, size: 4, color: "000000" }, // bottom rule
    left: { style: BorderStyle.NONE }, right: { style: BorderStyle.NONE },
    insideHorizontal: { style: BorderStyle.NONE }, insideVertical: { style: BorderStyle.NONE },
  },
  rows: [
    new TableRow({
      tableHeader: true, cantSplit: true,
      children: headerCells.map(text => new TableCell({
        // Cell borders override table borders, so the header cells repeat the
        // top rule instead of setting it to NONE (which would erase it)
        borders: {
          top: { style: BorderStyle.SINGLE, size: 4, color: "000000" },
          bottom: { style: BorderStyle.SINGLE, size: 2, color: "000000" }, // header rule
          left: { style: BorderStyle.NONE }, right: { style: BorderStyle.NONE },
        },
        margins: { top: 60, bottom: 60, left: 120, right: 120 },
        children: [new Paragraph({ alignment: AlignmentType.CENTER,
          children: [new TextRun({ text, bold: true, size: 21, font: { eastAsia: "SimSun", ascii: "Times New Roman" } })] })],
      })),
    }),
    // Data cells: top/left/right NONE; leave the bottom unset (or SINGLE size 4
    // on the last row) so the table's bottom rule is not erased
    ...dataRows,
  ],
});
```
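
In Word, a cell's own border settings take precedence over the table-level ones. Each data row therefore needs borders chosen by its position.

The sketch below uses a hypothetical helper, `threeLineCellBorders`, that returns plain border objects for `new TableCell({ borders })`. "single" and "none" are the string values behind `BorderStyle.SINGLE` and `BorderStyle.NONE`. The default sizes mirror the code above.

```javascript
// Hypothetical per-cell border spec for a three-line table.
// Header row: top rule + header rule; last row: bottom rule; everything else: none.
const NO_BORDER = { style: "none", size: 0, color: "auto" };
const rule = (size) => ({ style: "single", size, color: "000000" });

function threeLineCellBorders(rowIndex, rowCount, { outer = 4, inner = 2 } = {}) {
  const isHeader = rowIndex === 0;
  const isLast = rowIndex === rowCount - 1;
  let bottom = NO_BORDER;
  if (isHeader) bottom = rule(inner);
  else if (isLast) bottom = rule(outer);
  return {
    top: isHeader ? rule(outer) : NO_BORDER,
    bottom,
    left: NO_BORDER,
    right: NO_BORDER,
  };
}
```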

---

## Content Quality Constraints (Mandatory)

### Truthfulness & Conservatism

1. **Never fabricate** unverifiable statistics, survey response counts, significance levels, interviewee identities, experimental accuracy figures, or government document numbers
2. **Never invent** non-existent classic theories, opinions of authoritative scholars, regulation names, or core data sources
3. When the user provides no real data → prefer **theoretical analysis, literature research, case studies, comparative analysis** (low-risk methods)
4. If example data must be constructed → keep the scale reasonable and the results conservative; never make high-risk claims such as "significantly superior" or "dramatically improved"
5. Research conclusions must be **restrained**; do not overstate contributions, effects, or applicability
6. Research limitations must be **honestly disclosed**

### Language Style

1. Formal academic register throughout
2. **Forbidden:** subjective expressions such as "I think", "everyone knows", "obviously", "it is well known"
3. **Forbidden:** sloganeering, propaganda, advertising-style expressions
4. At the first occurrence of a technical term, give the English original alongside the Chinese term
5. Chinese/English punctuation, spacing, and number formats must be consistent throughout

### Structural Consistency

1. Abstract, body, and conclusions **must be consistent**, with no self-contradiction
2. Must form a complete loop: "research question → method → analysis → findings → conclusions & outlook"
3. Terminology consistent throughout, with no concept drift
4. All chapters balanced and substantive, with no padding

### Document Cleanliness

1. **No residual** comments, tracked changes, unrendered field codes, or template default text
2. **No** expressions such as "TBD", "omitted", "user modifies", "insert figure here"
3. **No** Markdown syntax, HTML tags, or code blocks wrapping body text
4. **No** consecutive blank lines, abnormal page breaks, or chaotic numbering
5. The final document must be clean, well-formatted, and ready for submission
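
Several of these rules can be screened automatically on the plain text of each body paragraph before export. The sketch below uses a hypothetical helper, `findResidue`. Its patterns are illustrative assumptions that will produce false positives (for example, "omitted" in a legitimate sentence), so treat hits as items to review rather than errors.

```javascript
// Hypothetical pre-export screen for residue and forbidden phrasing.
const RESIDUE_PATTERNS = [
  { name: "placeholder wording", re: /\b(TBD|omitted|insert figure here)\b/i },
  { name: "Markdown emphasis", re: /\*\*[^*]+\*\*/ },
  { name: "Markdown heading", re: /^#{1,6}\s/ },
  { name: "HTML tag", re: /<\/?[a-z][a-z0-9]*(\s[^>]*)?>/i },
  { name: "code fence", re: /^```/ },
  { name: "subjective phrase", re: /\b(I think|obviously|everyone knows|it is well known)\b/i },
];

// paragraphs: array of plain-text paragraph strings in document order
function findResidue(paragraphs) {
  const issues = [];
  paragraphs.forEach((text, index) => {
    for (const { name, re } of RESIDUE_PATTERNS) {
      if (re.test(text)) issues.push({ index, name });
    }
    if (text.trim() === "" && index > 0 && paragraphs[index - 1].trim() === "") {
      issues.push({ index, name: "consecutive blank paragraphs" });
    }
  });
  return issues;
}
```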

---

## School Standard Override Rule

⚠️ **When the user specifies school- or journal-specific format requirements, those requirements OVERRIDE all defaults above.**

Common override items:

- Margins (a 3.5 cm left binding margin is common)
- Body font (some schools require FangSong)
- Line spacing (some schools require a fixed 28pt)
- Cover layout (varies significantly by school)
- Reference format (APA, MLA, etc.)
- Heading numbering (some schools use "1", "2" instead of "Chapter 1", "Chapter 2")
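
School requirements usually arrive in centimetres and points, while docx.js expects twips. The sketch below uses hypothetical helper names and placeholder default values, not this skill's actual defaults. It converts those units and applies overrides so that school values always win.

```javascript
// Hypothetical helpers for applying school-specific format requirements.
function cmToTwips(cm) {
  return Math.round((cm / 2.54) * 1440); // 1 inch = 2.54 cm = 1440 twips
}

// Fixed line spacing, e.g. "fixed 28pt": 1pt = 20 twips, lineRule "exact"
function exactLineSpacing(pt) {
  return { line: pt * 20, lineRule: "exact" };
}

// Recursively merges plain objects; any value the school specifies wins.
function applyOverrides(defaults, overrides) {
  const result = { ...defaults };
  for (const [key, value] of Object.entries(overrides)) {
    const bothObjects =
      value !== null && typeof value === "object" && !Array.isArray(value) &&
      defaults[key] !== null && typeof defaults[key] === "object";
    result[key] = bothObjects ? applyOverrides(defaults[key], value) : value;
  }
  return result;
}
```

For example, a 3.5 cm binding margin becomes `cmToTwips(3.5)`, which is 1984 twips.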

### Common Variants

| Thesis Type | Common Differences |
|------------|-------------------|
| Top universities | Strict GB/T 7714; cover often requires STXiaoBiaoSong |
| Regular undergraduate | More flexible; SimSun/SimHei sufficient |
| Master's thesis | Requires English abstract, longer literature review, innovation statement |
| Doctoral dissertation | Requires innovation statement, publication list, originality declaration |

---

## Scene-Specific Quality Checks

In addition to the universal checks (see `references/common-rules.md`):

### Structure & Content

- [ ] Cover, abstract, English abstract, TOC, body, references all present
- [ ] Cover info complete (school/title/EN title/college/major/name/ID/advisor/date)
- [ ] Abstract contains all 5 elements: background + problem + method + results + significance
- [ ] English abstract consistent with Chinese abstract
- [ ] All chapters balanced and substantive; logical loop complete
- [ ] Literature review is thematic, not a chronological dump
- [ ] Conclusions respond to the research questions

### Format & Layout

- [ ] Heading numbering consistent (Chapter X / X.X / X.X.X), no mixing
- [ ] All body headings use `heading: HeadingLevel.HEADING_X`
- [ ] Body text pure black `"000000"`
- [ ] Three-line tables used consistently (no full-border tables)
- [ ] Figure captions below, table captions above, numbered by chapter
- [ ] Formulas centered, numbers right-aligned
- [ ] In-text citations match reference list one-to-one
- [ ] References use hanging indent, consistent format
- [ ] Page numbers: front matter in Roman numerals, body in Arabic numerals starting from 1
- [ ] Cover has no page number
- [ ] Headers formal and concise
- [ ] No extra blank pages

### Cleanliness

- [ ] No comment/revision residue
- [ ] No "TBD" / "omitted" expressions
- [ ] No Markdown/HTML/code block residue
- [ ] No consecutive blank lines or abnormal page breaks
- [ ] No fabricated high-risk data or exaggerated conclusions

463
skills/docx/scenes/contract.md
Executable file

# Scene: Contract / Agreement

## Goal

Generate a complete, formal, well-structured legal document with clear clauses, rigorous logic, and proper formatting. The document must simultaneously:

- Have a complete structure, clear clauses, formal language, and explicit responsibilities
- Make risk boundaries identifiable and use proper Word formatting
- Be ready for review, revision, circulation, or signing preparation

**Forbidden:** producing only an outline, sample clauses, drafting advice, or a risk summary in place of the document; outputting chat-style explanations or filler phrases.

→ Font profile: **A (Formal)** — see `references/common-rules.md`
→ Default layout: standard margins — see `references/common-rules.md`
→ Placeholder convention & universal prohibitions — see `references/common-rules.md`

---

## Contract Type Routing

```js
// Matches keywords and topic case-insensitively; the first matching pattern wins,
// so the order of the checks matters
function selectContractType(keywords, topic) {
  const text = `${keywords} ${topic}`;
  if (/confidential|\bNDA\b|non-disclosure/i.test(text)) return "nda"; // \b: avoid matching "standard"
  if (/transfer|equity|asset|rights/i.test(text)) return "transfer";
  if (/framework|strategic|cooperation agreement/i.test(text)) return "framework";
  if (/terms|platform rules|user agreement|privacy/i.test(text)) return "terms";
  return "bilateral"; // default: bilateral commercial contract
}
```

### 5 Contract Types

| Type | Use Case | Structure Focus |
|------|----------|----------------|
| bilateral | Service/sale/development/procurement contracts | Subject → Consideration → Performance → Acceptance → Breach → Dispute |
| transfer | Equity/debt/asset/rights transfer | Subject → Consideration → Closing & Registration → Representations → Tax |
| nda | Non-disclosure agreements | Definition of Confidential Info → Obligations → Use Restrictions → Exceptions → Duration |
| framework | Cooperation framework / strategic alliance | Scope → Division of Work → Mechanism → Subsequent Agreements |
| terms | Platform rules / Terms of Service / User agreements | Definitions → Services → Rights & Obligations → Liability Limits → Amendments |

---

## Standard Template Structures

### Template A: Bilateral Commercial Contract

1. Header (title, contract number, date, location)
2. Party Information (Party A, Party B)
3. Recitals ("Whereas" clauses)
4. Definitions & Interpretation
5. Subject Matter & Scope of Services/Delivery
6. Contract Price & Payment Terms
7. Rights & Obligations of Both Parties
8. Timeline, Delivery & Acceptance
9. Invoicing, Tax & Settlement
10. Intellectual Property & Confidentiality
11. Representations & Warranties (if applicable)
12. Liability for Breach
13. Force Majeure
14. Termination & Dissolution
15. Notices & Service
16. Dispute Resolution
17. Miscellaneous
18. Signature Block

### Template B: Rights Transfer Agreement

1. Header & Parties
2. Recitals
3. Definitions & Interpretation
4. Subject of Transfer
5. Consideration & Payment Arrangement
6. Closing & Registration/Transfer
7. Representations & Warranties
8. Tax Allocation
9. Liability for Breach
10. Dispute Resolution
11. Miscellaneous
12. Signature Block

### Template C: Non-Disclosure Agreement (NDA)

1. Header & Parties
2. Recitals
3. Definition of Confidential Information
4. Confidentiality Obligations
5. Use Restrictions
6. Return, Deletion & Destruction of Information
7. Exceptions
8. Confidentiality Period
9. Liability for Breach
10. Dispute Resolution
11. Miscellaneous
12. Signature Block

### Template D: Framework / Cooperation Agreement

1. Header & Parties
2. Recitals
3. Purpose & Principles
4. Scope of Cooperation
5. Division of Work & Responsibilities
6. Project Advancement Mechanism
7. Commercial Arrangements / Subsequent Agreements
8. Confidentiality, IP & Compliance
9. Term, Amendment & Termination
10. Liability for Breach
11. Dispute Resolution
12. Miscellaneous
13. Signature Block

### Template E: Unilateral Terms / Platform Rules

1. Document Title
2. Definitions & Scope
3. Service/Rule Content
4. User Rights & Obligations / Platform Rights & Obligations
5. Liability Limitations & Disclaimers
6. Fees & Payment (if applicable)
7. Intellectual Property
8. Termination, Suspension & Amendment
9. Notices & Service
10. Dispute Resolution
11. Miscellaneous

**Note:** Unilateral/boilerplate terms require special attention to adhesion-clause risks; avoid drafting extremely one-sided documents.

**If the user provides an existing template, historical agreement, or company standard, always follow it first.**

---

## Input Recognition & Completion

### Processing Rules

1. If the user provides a template, historical agreement, or company standard → **always follow it first**
2. If information is incomplete, fill the gaps conservatively; additions must be **restrained, natural, professional, and consistent with the transaction logic**
3. **Never fabricate** unrealistic commercial terms, regulatory requirements, approval conclusions, qualification status, tax treatment results, payment facts, or performance facts
4. If critical information is missing → use standardized placeholders
5. If the user does not specify a jurisdiction → default to PRC commercial drafting conventions, but avoid making specific legal conclusions

---

## Legal Writing Standards

### Register

1. Use the formal register of legal documents
2. Use clear party designations: "Party A", "Party B", "both parties", "either party", "non-breaching party", "breaching party"
3. **Forbidden:** colloquial expressions ("you", "me", "they", "pay up", "cancel the contract", "handle ASAP")
4. Preferred terms: "pay consideration", "perform obligations", "constitute a breach", "terminate the contract", "assume liability for damages", "written notice", "deliver and accept", "representations and warranties"

### Precision

1. Eliminate vague wording: avoid "high-quality", "reasonable", "enormous", "appropriate", "ASAP" unless the vagueness is needed for legal flexibility
2. Each obligation must specify who, when, how, and what
3. Consistent legal phrasing:
   - Mandatory obligation → "shall"
   - Grant of a right → "has the right to"
   - Prohibition → "shall not"
   - Discretionary → "may"
4. Amounts, dates, percentages, deadlines, and business days vs. calendar days must be as specific as possible

### Clear Subjects

1. Every clause must have an explicit responsible party; avoid vague subjects ("relevant parties", "relevant personnel", "when necessary")
2. Joint obligations: explicitly write "both parties agree" or "both parties shall"
3. Unilateral obligations: explicitly write "Party A shall" or "Party B shall"

---

## Transaction Closure & Risk Control

A contract must not only describe the transaction; it must also close logically, so that every obligation has its conditions and consequences. Check the following:

1. If a performance deadline is specified → specify the consequences of delay
2. If payment milestones are specified → specify payment conditions, method, and invoice requirements
3. If a delivery obligation exists → specify delivery standards, method, acceptance rules, and objection period
4. If termination rights exist → specify conditions, notice, effective date, and post-termination settlement
5. If breach liability exists → it must correspond to the main obligations in the preceding clauses
6. If IP, technology, data, or trade secrets are involved → separately address ownership, license scope, and use restrictions
7. If confidentiality obligations exist → define scope, exceptions, duration, and breach consequences
8. If a force majeure clause exists → specify the notice obligation, mitigation duty, and subsequent negotiation mechanism
9. If notice/service arrangements exist → specify address, contact person, email, or other delivery method
10. If the user requests significantly one-sided adhesion/disclaimer clauses → add a note near the clause:
    `[Note: This clause may involve adhesion terms or liability limitations. Manual review recommended for the specific transaction.]`
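
The checklist can be run as a pass over a short summary of the draft. In the sketch below, the helper `closureGaps` and the feature-flag names are hypothetical: they stand for how a caller might describe the contract, and no tool in this skill produces them. Each rule pairs a trigger with the detail it requires, mirroring the items above.

```javascript
// Hypothetical closure check: [trigger flag, required flag, message]
const CLOSURE_RULES = [
  ["performanceDeadline", "delayConsequences", "Deadline without consequences of delay"],
  ["paymentMilestones", "paymentTerms", "Payment milestones without conditions, method, or invoice rules"],
  ["deliveryObligation", "acceptanceRules", "Delivery without standards, acceptance rules, or objection period"],
  ["terminationRight", "terminationMechanics", "Termination right without conditions, notice, effective date, or settlement"],
  ["confidentiality", "confidentialityScope", "Confidentiality without scope, exceptions, duration, or consequences"],
  ["forceMajeure", "forceMajeureProcedure", "Force majeure without notice, mitigation, or negotiation mechanism"],
  ["noticeClause", "noticeAddresses", "Notice clause without address, contact person, or email"],
];

// features: object of booleans describing what the draft contains.
// Returns the messages for every trigger present without its required detail.
function closureGaps(features) {
  return CLOSURE_RULES
    .filter(([trigger, required]) => features[trigger] && !features[required])
    .map(([, , message]) => message);
}
```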

---

## Truthfulness & Legal Caution

1. **Never fabricate** specific statute article numbers, judicial interpretation numbers, or regulatory document numbers
2. Legal bases should use general references, e.g.: "In accordance with the Civil Code of the PRC and relevant laws and regulations..."
3. **Never** pretend to provide formal legal opinions, predictions of litigation outcomes, or definitive conclusions on validity or invalidity
4. **Never** state definitive legality conclusions for high-risk clauses (adhesion terms, penalty clauses, disclaimers, non-compete, exclusivity, unilateral interpretation rights)
5. **Never** claim as fact that regulatory approvals have been obtained, title is unencumbered, tax compliance is assured, or third-party consent is secured
6. When critical information is insufficient → use placeholders; never present it as confirmed fact
7. For high-risk areas (equity, debt, licenses, data compliance, labor, personal information, cross-border matters) → keep the language restrained and do not add rigid commitments without user confirmation

---

## Special Clause Requirements

### Definitions Clause

If the document repeatedly uses specialized terms ("deliverables", "service results", "confidential information", "source code", "project milestones", "acceptance criteria", "trade secrets"), include a "Definitions & Interpretation" clause near the beginning.

### Dispute Resolution

1. Must be explicit
2. Choose either litigation or arbitration; never provide for both
3. Litigation → specify the connecting factor that determines the competent court (e.g. the place where the contract is signed or where a party is domiciled)
4. Arbitration → specify the arbitration institution
5. If the user hasn't specified → use a placeholder for confirmation

### Tax Clause

1. If the transaction involves taxes → specify which party bears them, whether the price includes tax, and the invoice type and conditions
2. Avoid a bare "taxes shall be borne as required by law" without transaction-specific detail

### Breach Liability

1. Must correspond to the main obligations in the preceding clauses
2. Penalty amounts should be restrained; avoid obviously exaggerated or severely imbalanced figures
3. For fundamental breach → consider corresponding termination rights and damages

### Appendices

1. For complex subject matter, pricing, technical requirements, or deliverables → use the "Appendix 1, Appendix 2..." format
2. Explicitly state the relationship between the appendices and the contract (typically: "The appendices form an integral part of this contract")
3. If appendix content is unknown → use a placeholder

---

## Palette

**Legal Wood** (Warm + Heavy + Calm): for decorative elements only; body text must be pure black.

```js
const palette = { primary: "#28201C", body: "#000000", secondary: "#6E6560", accent: "#7A5C3A", surface: "#FBF9F7" };
```

⚠️ **ALL visible text in contracts must be pure black `"000000"`.** This includes:

- Contract title (SimHei, black, NOT the accent color)
- Contract number (black)
- Clause headings (black)
- Body text (black)
- Party information (black)
- Signature block text (black)

**The only exception** is red-header official documents (红头文件), which follow their own GB/T 9704 color rules. For standard contracts, NO colored text is permitted: no red, no accent color, no dark blue-grey.

```js
// ✅ Contract title: always pure black
new Paragraph({
  alignment: AlignmentType.CENTER,
  spacing: { line: Math.ceil(22 * 23), lineRule: "atLeast" }, // ★ Rule 8: prevent clipping
  children: [new TextRun({ text: "Training Cooperation Framework Agreement",
    size: 44, bold: true, color: "000000", // ← MUST be "000000"
    font: { eastAsia: "SimHei", ascii: "Times New Roman" } })],
})

// ❌ FORBIDDEN: accent/palette color on contract text
new TextRun({ text: "Training Cooperation Framework Agreement", color: palette.accent }) // ← WRONG
new TextRun({ text: "Contract No.:", color: palette.primary }) // ← WRONG (palette.primary is "#28201C", not black)
```

---

## Scene-Specific Font Overrides

Beyond Profile A defaults:

| Element | Font | Size | Style |
|---------|------|------|-------|
| Contract title | SimHei | Er Hao 22pt (size: 44) | Bold, centered |
| Contract number | SimSun | Wu Hao 10.5pt (size: 21) | Right-aligned |
| Clause heading | SimHei | Xiao Si 12pt (size: 24) | Bold |
| Monetary amount | SimSun | Xiao Si 12pt (size: 24) | Bold |
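
The Size column pairs each Chinese type size with its docx.js `size` value, which is measured in half-points. The lookup below covers the standard sizes from Chu Hao to Xiao Wu. The pinyin keys follow this document's naming, and the helper name `halfPoints` is hypothetical.

```javascript
// Standard Chinese type sizes (字号) in points
const CN_FONT_SIZES_PT = {
  "Chu Hao": 42,  // 初号
  "Xiao Chu": 36, // 小初
  "Yi Hao": 26,   // 一号
  "Xiao Yi": 24,  // 小一
  "Er Hao": 22,   // 二号
  "Xiao Er": 18,  // 小二
  "San Hao": 16,  // 三号
  "Xiao San": 15, // 小三
  "Si Hao": 14,   // 四号
  "Xiao Si": 12,  // 小四
  "Wu Hao": 10.5, // 五号
  "Xiao Wu": 9,   // 小五
};

// docx.js TextRun `size` is in half-points, so size = pt * 2
function halfPoints(name) {
  const pt = CN_FONT_SIZES_PT[name];
  if (pt === undefined) throw new Error(`Unknown Chinese font size: ${name}`);
  return pt * 2;
}
```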
|
||||
|
||||
---
|
||||
|
||||
## Document Structure
|
||||
|
||||
1. **Title**: "XXX Contract" or "XXX Agreement" — Er Hao SimHei, centered
|
||||
2. **Contract number**: right-aligned, Wu Hao
|
||||
3. **Preamble**: Party information with placeholders
|
||||
4. **Recitals** (summarize transaction background and purpose)
|
||||
5. **Definitions** (if specialized terms recur)
|
||||
6. **Substantive clauses** (per selected template)
|
||||
7. **Signature block**
|
||||
8. **Appendices** (if any)
|
||||
|
||||
---
|
||||
|
||||
## Clause Numbering System
|
||||
|
||||
Use stable, consistent, pure-text numbering suitable for Chinese legal documents.
|
||||
|
||||
```
|
||||
Article 1 Subject Matter
|
||||
1.1 xxxxxxxxxx
|
||||
1.2 xxxxxxxxxx
|
||||
(1) xxxxxxxxxx
|
||||
(2) xxxxxxxxxx
|
||||
① xxxxxxxxxx
|
||||
② xxxxxxxxxx
|
||||
Article 2 Price and Payment
|
||||
2.1 ...
|
||||
```
|
||||
|
||||
**Numbering discipline:**
|
||||
1. No level-skipping: each level sits directly under the one above it (never jump from "Article X" straight to "(1)")
|
||||
2. **Forbidden:** Using Markdown list markers (`-` `*` `1.`) for clause hierarchy
|
||||
3. No switching from "Article X" to `-` or `*` or auto-list mid-document
|
||||
4. Numbering style must be consistent throughout the entire document
|
||||
5. Clause headings should be short, plain noun phrases (e.g., "Subject Matter", "Price and Payment")
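The four-level scheme and the "no level-skipping" rule can be checked mechanically. The sketch below is a hypothetical helper, not part of the skill's tooling: it assumes each clause is identified by a path such as `[1]`, `[1, 2]`, `[1, 2, 1]`, `[1, 2, 1, 2]`, and it produces the plain-text labels shown above (English "Article N" form). It also flags any clause that goes more than one level deeper than the clause before it.

```javascript
// Sketch: plain-text label for a clause path, using the four levels above.
// path = [article], [article, sub], [article, sub, item] or [..., point].
function clauseLabel(path) {
  const n = path[path.length - 1];
  switch (path.length) {
    case 1: return `Article ${n}`;
    case 2: return `${path[0]}.${n}`;
    case 3: return `(${n})`;
    case 4:
      // Circled digits ①..⑳ are consecutive code points U+2460..U+2473.
      if (n < 1 || n > 20) throw new RangeError("circled numbers only cover 1-20");
      return String.fromCharCode(0x2460 + n - 1);
    default:
      throw new RangeError(`unsupported depth: ${path.length}`);
  }
}

// Indexes of clauses that skip a level, i.e. go more than one step deeper
// than the previous clause (e.g. "Article 2" followed directly by "(1)").
function findLevelSkips(paths) {
  const skips = [];
  let prevDepth = 0; // the first clause must be an Article (depth 1)
  paths.forEach((path, i) => {
    if (path.length > prevDepth + 1) skips.push(i);
    prevDepth = path.length;
  });
  return skips;
}
```

Labels are emitted as ordinary `TextRun` text, which keeps them out of Word's auto-numbering and satisfies rules 2 and 3.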
|
||||
|
||||
---
|
||||
|
||||
## Party Information Layout (Table-Based Alignment — Mandatory)
|
||||
|
||||
Party A and Party B information MUST be laid out using a **borderless table** so that labels align vertically. Never use plain paragraphs with indentation — this causes misalignment between parties.
|
||||
|
||||
```js
|
||||
// ✅ Correct — borderless table keeps labels such as "Unified Social Credit Code (统一社会信用代码)", "Address (地址)" and "Legal Representative (法定代表人)" aligned
|
||||
function partyInfoBlock(partyLabel, partyName, fields) {
|
||||
// fields: [["Unified Social Credit Code", value], ["Address", value], ["Legal Representative", value]]
|
||||
const NB = { style: BorderStyle.NONE, size: 0, color: "FFFFFF" };
|
||||
const noBorders = { top: NB, bottom: NB, left: NB, right: NB };
|
||||
|
||||
const headerPara = new Paragraph({ spacing: { before: 200, after: 120 },
|
||||
children: [new TextRun({ text: `${partyLabel}: ${safeText(partyName, "【Company full name】")}`,
|
||||
size: 24, font: { eastAsia: "SimSun", ascii: "Times New Roman" } })]
|
||||
});
|
||||
|
||||
const infoTable = new Table({
|
||||
width: { size: 90, type: WidthType.PERCENTAGE },
|
||||
borders: { top: NB, bottom: NB, left: NB, right: NB, insideHorizontal: NB, insideVertical: NB },
|
||||
rows: fields.map(([label, value]) => new TableRow({
|
||||
children: [
|
||||
new TableCell({
|
||||
width: { size: 35, type: WidthType.PERCENTAGE },
|
||||
borders: noBorders,
|
||||
margins: { top: 40, bottom: 40, left: 420, right: 60 },
|
||||
children: [new Paragraph({
|
||||
children: [new TextRun({ text: `${label}:`, size: 24,
|
||||
font: { eastAsia: "SimSun", ascii: "Times New Roman" } })],
|
||||
})],
|
||||
}),
|
||||
new TableCell({
|
||||
borders: noBorders,
|
||||
margins: { top: 40, bottom: 40, left: 60, right: 120 },
|
||||
children: [new Paragraph({
|
||||
children: [new TextRun({ text: safeText(value, `【Please fill in: ${label}】`), size: 24,
|
||||
font: { eastAsia: "SimSun", ascii: "Times New Roman" } })],
|
||||
})],
|
||||
}),
|
||||
],
|
||||
})),
|
||||
});
|
||||
|
||||
return [headerPara, infoTable];
|
||||
}
|
||||
|
||||
// Usage:
|
||||
const partyAChildren = partyInfoBlock("Party A (甲方)", config.partyA?.name, [
|
||||
["Unified Social Credit Code (统一社会信用代码)", config.partyA?.creditCode],
|
||||
["Address (地址)", config.partyA?.address],
|
||||
["Legal Representative (法定代表人/负责人)", config.partyA?.legalRep],
|
||||
]);
|
||||
```
|
||||
|
||||
**Rules:**
|
||||
1. Party A and Party B info blocks must use the **same table column widths** — labels align across both blocks
|
||||
2. Use `safeText()` for all field values — never output `undefined`
|
||||
3. Label column width should accommodate the longest label (e.g., "统一社会信用代码", Unified Social Credit Code)
|
||||
4. The indent (`margins.left: 420`) simulates sub-level nesting under the party name
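One way to guarantee rules 1 and 2 is to derive both parties' `fields` arrays from a single shared list, so the labels (and therefore the label column) can never drift apart. This is a minimal sketch that assumes the config keys used in the usage example (`creditCode`, `address`, `legalRep`). `PARTY_FIELDS` and `partyFieldRows` are hypothetical names.

```javascript
// Shared label/key list for both parties. The keys are assumed to match the
// config shape in the usage example above (creditCode, address, legalRep).
const PARTY_FIELDS = [
  ["Unified Social Credit Code (统一社会信用代码)", "creditCode"],
  ["Address (地址)", "address"],
  ["Legal Representative (法定代表人/负责人)", "legalRep"],
];

// Returns [label, value] pairs in the shape partyInfoBlock() expects.
// Missing values stay undefined so safeText() can insert a placeholder.
function partyFieldRows(party) {
  return PARTY_FIELDS.map(([label, key]) => [label, party?.[key]]);
}
```

Usage: `partyInfoBlock("Party A (甲方)", config.partyA?.name, partyFieldRows(config.partyA))`, and likewise for Party B with `config.partyB`.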
|
||||
|
||||
---
|
||||
|
||||
## Signature Block
|
||||
|
||||
Lay the block out symmetrically (Party A on the left, Party B on the right) in a structured form that is easy to adjust in Word. Never write it as scattered paragraphs.
|
||||
|
||||
Required fields for each party:
|
||||
- Party name (seal)
|
||||
- Legal representative / Authorized representative
|
||||
- Contact person
|
||||
- Contact information
|
||||
- Signing location
|
||||
- Date: 【____/____/____】
|
||||
|
||||
Use a borderless 2-column table for symmetry. **Every field value must use `safeText()`** — never output `undefined` or empty string. If a field is not provided, use the appropriate `【Please fill in】` placeholder.
|
||||
|
||||
```js
|
||||
// ✅ Correct signature block — safeText for all values
|
||||
function buildSignatureBlock(partyA, partyB) {
|
||||
const fields = ["Party (Seal)", "Legal Rep / Authorized Rep (Signature)", "Contact Person", "Contact Info", "Signing Location", "Date"];
|
||||
const NB = { style: BorderStyle.NONE, size: 0, color: "FFFFFF" };
|
||||
const noBorders = { top: NB, bottom: NB, left: NB, right: NB };
|
||||
|
||||
return new Table({
|
||||
width: { size: 100, type: WidthType.PERCENTAGE },
|
||||
borders: { top: NB, bottom: NB, left: NB, right: NB, insideHorizontal: NB, insideVertical: NB },
|
||||
rows: fields.map((label, i) => {
|
||||
const aVal = i === fields.length - 1 ? "【____/____/____】" : safeText(partyA?.[i], "【Please fill in】"); // partyA: array in `fields` order
|
||||
const bVal = i === fields.length - 1 ? "【____/____/____】" : safeText(partyB?.[i], "【Please fill in】"); // partyB: array in `fields` order
|
||||
const displayA = i === 0 ? `Party A (甲方): ${aVal}` : `${label}: ${aVal}`;
|
||||
const displayB = i === 0 ? `Party B (乙方): ${bVal}` : `${label}: ${bVal}`;
|
||||
return new TableRow({
|
||||
children: [
|
||||
new TableCell({ width: { size: 50, type: WidthType.PERCENTAGE }, borders: noBorders,
|
||||
margins: { top: 80, bottom: 80, left: 120, right: 60 },
|
||||
children: [new Paragraph({ children: [new TextRun({ text: displayA, size: 24, color: "000000" })] })] }),
|
||||
new TableCell({ width: { size: 50, type: WidthType.PERCENTAGE }, borders: noBorders,
|
||||
margins: { top: 80, bottom: 80, left: 60, right: 120 },
|
||||
children: [new Paragraph({ children: [new TextRun({ text: displayB, size: 24, color: "000000" })] })] }),
|
||||
],
|
||||
});
|
||||
}),
|
||||
});
|
||||
}
|
||||
```
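`buildSignatureBlock()` reads each party's values by position, so callers need an array in the same order as its `fields` list. The sketch below builds that array from a party object. The object keys are hypothetical and were chosen for illustration only. The final "Date" field is omitted because the function fills it in itself.

```javascript
// Hypothetical party shape. The order must match buildSignatureBlock()'s
// `fields` array minus the final "Date" entry, which it fills in itself.
const SIGNATURE_KEYS = ["name", "legalRep", "contactPerson", "contactInfo", "signingLocation"];

function signatureValues(party) {
  // Missing keys stay undefined; safeText() inside buildSignatureBlock()
  // handles them.
  return SIGNATURE_KEYS.map(key => party?.[key]);
}
```

Usage: `buildSignatureBlock(signatureValues(config.partyA), signatureValues(config.partyB))`.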
|
||||
|
||||
---
|
||||
|
||||
## Monetary Amount Format
|
||||
|
||||
Contracts must state every amount in **both uppercase Chinese financial numerals (大写) and Arabic numerals**:
|
||||
|
||||
```
|
||||
Contract amount: RMB 壹佰贰拾叁万肆仟伍佰陆拾柒元整 (¥1,234,567.00)
(= RMB One Million Two Hundred Thirty-Four Thousand Five Hundred Sixty-Seven Yuan)
|
||||
```
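Writing 大写 amounts by hand is error-prone, especially around zeros. The sketch below generates both forms from a number. It assumes the common conventions: 壹拾 for a leading ten, a single 零 for any run of zeros, and 整 when there is no 分. House style may differ (for example, some drafters omit 整 after 角), so review the output against the firm's template. All names in the block are hypothetical.

```javascript
// Sketch: uppercase Chinese financial numerals (大写金额) plus the Arabic form.
const DIGITS = "零壹贰叁肆伍陆柒捌玖";
const SMALL_UNITS = ["", "拾", "佰", "仟"]; // positions inside a 4-digit group
const GROUP_UNITS = ["", "万", "亿"];       // 10^0, 10^4, 10^8

// 1..9999, e.g. 1001 -> 壹仟零壹 (inner zeros collapse to one 零).
function groupToUpper(group) {
  let out = "";
  let zeroPending = false;
  for (let pos = 3; pos >= 0; pos--) {
    const d = Math.floor(group / 10 ** pos) % 10;
    if (d === 0) {
      if (out) zeroPending = true;
    } else {
      if (zeroPending) out += "零";
      zeroPending = false;
      out += DIGITS[d] + SMALL_UNITS[pos];
    }
  }
  return out;
}

// Whole yuan, 0 <= n < 10^12.
function integerToUpper(n) {
  if (n === 0) return "零";
  const groups = []; // least-significant group first
  for (let rest = n; rest > 0; rest = Math.floor(rest / 10000)) groups.push(rest % 10000);
  let out = "";
  let zeroPending = false;
  for (let i = groups.length - 1; i >= 0; i--) {
    const g = groups[i];
    if (g === 0) {
      if (out) zeroPending = true;
      continue;
    }
    // 零 after an all-zero group, or before a group with no thousands digit.
    if (out && (zeroPending || g < 1000)) out += "零";
    zeroPending = false;
    out += groupToUpper(g) + GROUP_UNITS[i];
  }
  return out;
}

function amountToChineseUpper(amount) {
  if (!Number.isFinite(amount) || amount < 0) throw new RangeError(`invalid amount: ${amount}`);
  const cents = Math.round(amount * 100);
  if (cents >= 1e14) throw new RangeError("amounts of 10^12 yuan or more are not supported");
  const yuan = Math.floor(cents / 100);
  const jiao = Math.floor(cents / 10) % 10;
  const fen = cents % 10;
  if (jiao === 0 && fen === 0) return integerToUpper(yuan) + "元整";
  let out = yuan > 0 ? integerToUpper(yuan) + "元" : "";
  if (jiao > 0) out += DIGITS[jiao] + "角";
  else if (yuan > 0) out += "零"; // e.g. 壹元零伍分
  out += fen > 0 ? DIGITS[fen] + "分" : "整";
  return out;
}

// Combined form used in the example line above.
function formatContractAmount(amount) {
  const numeric = (Math.round(amount * 100) / 100).toLocaleString("en-US", {
    minimumFractionDigits: 2,
    maximumFractionDigits: 2,
  });
  return `RMB ${amountToChineseUpper(amount)} (¥${numeric})`;
}
```

Render the returned string in a bold SimSun 12pt run (size: 24), per the font override table above.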
|
||||
|
||||
---
|
||||
|
||||
## Style Rules
|
||||
|
||||
- **NO cover page** — title page is the first page (title + contract number at top)
|
||||
- **NO TOC** unless >20 clauses
|
||||
- **NO decorative elements** — contracts must be formal and clean
|
||||
- **Line spacing**: 1.5x (line: 360) — ⚠️ scene override (Profile A default is 1.3x/312; contracts use 1.5x for readability and annotation space)
|
||||
- **Body**: Justified, first-line indent 480 twips
|
||||
- **Color**: pure black "000000" throughout — no colored text
|
||||
|
||||
---
|
||||
|
||||
## Scene-Specific Quality Checks
|
||||
|
||||
In addition to universal checks (see `references/common-rules.md`):
|
||||
|
||||
### Format
|
||||
- [ ] Party information complete (full name / address / legal representative / contact)
|
||||
- [ ] Signature block properly formatted, symmetrical, all fields present
|
||||
- [ ] Monetary amounts shown in both uppercase and numeric format
|
||||
- [ ] Clause numbering sequential with no gaps
|
||||
- [ ] No cover page (title page is first page)
|
||||
- [ ] No Markdown list markers mixed into clause hierarchy
|
||||
|
||||
### Content
|
||||
- [ ] Clause numbering system consistent, no mixing
|
||||
- [ ] Transaction closure complete (subject → consideration → performance → acceptance → breach → dispute)
|
||||
- [ ] Breach liability corresponds to main obligations
|
||||
- [ ] Dispute resolution explicitly stated (or placeholder for confirmation)
|
||||
- [ ] All unconfirmed variables use `【】` placeholders consistently
|
||||
- [ ] Language is formal, restrained, subjects are explicit
|
||||
- [ ] No fabricated statute numbers or overreaching legal conclusions
|
||||
- [ ] High-risk clauses include manual review notes
|
||||
- [ ] Terminology consistent throughout
|
||||
- [ ] Appendix-contract relationship explicitly stated
|
||||
|
||||
### Closure
|
||||
- [ ] Performance deadline → delay consequences specified
|
||||
- [ ] Payment milestones → conditions and invoice requirements specified
|
||||
- [ ] Delivery obligation → acceptance rules and objection period specified
|
||||
- [ ] Termination right → conditions and post-termination handling specified
|
||||
- [ ] Confidentiality obligation → scope, exceptions, duration, breach consequences specified
|
||||
- [ ] Force majeure → notice and mitigation duties specified
|
||||
139
skills/docx/scenes/copywriting.md
Executable file
@@ -0,0 +1,139 @@
|
||||
# Scene: Copywriting / Script
|
||||
|
||||
## Scope
|
||||
|
||||
Broadcast scripts, product promotion copy, livestream scripts, presentation scripts, speeches, hosting scripts, short video scripts — any document where the goal is **spoken delivery**.
|
||||
|
||||
→ Placeholder convention & universal prohibitions — see `references/common-rules.md`
|
||||
→ Font profile: **B (Visual)** — see `references/common-rules.md`
|
||||
|
||||
---
|
||||
|
||||
## 1. Core Principles
|
||||
|
||||
⚠️ **A broadcast script is NOT a report, NOT a spec sheet, NOT an encyclopedia.**
|
||||
|
||||
The goal is for the audience to **understand on first listen, remember key points, and take action.** Therefore:
|
||||
|
||||
1. **Highlight selling points, don't pile specs:** Each paragraph covers only 1–2 core points with relatable scenario descriptions
|
||||
2. **Conversational tone:** Use "you" not "the user"; use natural speech, not corporate jargon
|
||||
3. **Rhythm:** Alternate long and short sentences, insert pause markers, avoid wall-of-text paragraphs
|
||||
4. **Length discipline:** ~250–300 words per minute of speech; a 5-minute script should not exceed 1500 words
|
||||
5. **Information consistency:** All data, model numbers, prices must be consistent throughout — no self-contradiction
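The length rule can be checked before layout. The sketch below estimates spoken length and assumes that each Chinese character counts as one "word" and that each run of Latin letters or digits counts as one word. The source does not define the counting unit, so treat this as an approximation. The 300-per-minute ceiling comes from rule 4 (5 minutes × 300 = 1500). The function names are hypothetical.

```javascript
// Assumption: one CJK character = one spoken "word"; Latin words and numbers
// (e.g. "iPhone", "15", "it's") also count as one word each.
const CJK = /[\u4e00-\u9fff]/g;

function countSpokenUnits(text) {
  const cjkCount = (text.match(CJK) || []).length;
  const latinWords = text.replace(CJK, " ").match(/[A-Za-z0-9]+(?:['’-][A-Za-z0-9]+)*/g) || [];
  return cjkCount + latinWords.length;
}

// Estimated minutes at a given pace (default: middle of the 250-300 range).
function estimateMinutes(text, unitsPerMinute = 275) {
  return countSpokenUnits(text) / unitsPerMinute;
}

// Upper bound from rule 4: at most 300 units per minute of target duration.
function exceedsLength(text, targetMinutes) {
  return countSpokenUnits(text) > targetMinutes * 300;
}
```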
|
||||
|
||||
---
|
||||
|
||||
## 2. Document Structure
|
||||
|
||||
The structure differs completely from that of a report:
|
||||
|
||||
```
|
||||
Title (centered, short and punchy)
|
||||
────────────────────
|
||||
[Opening] ← Grab attention, 1–2 sentences
|
||||
[Core Para 1] ← One selling point/opinion + scenario
|
||||
[Core Para 2] ← One selling point/opinion + scenario
|
||||
[Core Para 3] ← One selling point/opinion + scenario (max 3–5 paras)
|
||||
[Closing] ← Summary + Call to Action (CTA)
|
||||
────────────────────
|
||||
[Notes] ← Supplementary info, data sources (optional, small grey text)
|
||||
```
|
||||
|
||||
### Decisions
|
||||
- **Cover:** ❌ Not needed
|
||||
- **TOC:** ❌ Not needed
|
||||
- **Header/footer:** Optional, minimal
|
||||
- **Sections:** Single section sufficient
|
||||
- **Line spacing:** `line: 400` (slightly larger than standard 1.5x for reading/marking ease)
|
||||
|
||||
---
|
||||
|
||||
## 3. Layout Standards
|
||||
|
||||
### Font Specifications
|
||||
|
||||
| Element | Font | Size | Style |
|
||||
|---------|------|------|-------|
|
||||
| Title | SimHei | 18pt (size:36) | Bold, centered |
|
||||
| Section heading / highlight | SimHei | 14pt (size:28) | Bold |
|
||||
| Body | Microsoft YaHei | 12pt (size:24) | Left-aligned |
|
||||
| Rhythm markers | Microsoft YaHei | 10.5pt (size:21) | Grey 999999, italic |
|
||||
| Notes | Microsoft YaHei | 10pt (size:20) | Grey 666666 |
|
||||
|
||||
### Paragraph Spacing
|
||||
```js
|
||||
// Generous spacing between paragraphs for reading/breathing pauses
|
||||
spacing: { before: 200, after: 200, line: 400 }
|
||||
// Larger gap between core sections
|
||||
sectionGap: { before: 400, after: 200 }
|
||||
```
|
||||
|
||||
### Key Point Highlighting
|
||||
Use **bold** or **accent-colored text** to mark key selling points:
|
||||
```js
|
||||
new TextRun({ text: "Key selling point", bold: true, color: c(P.accent) })
|
||||
```
|
||||
|
||||
### Rhythm Markers (optional)
|
||||
Insert small grey markers where pauses, emphasis, or tone changes are needed:
|
||||
```js
|
||||
new Paragraph({ spacing: { before: 60, after: 60 },
|
||||
children: [new TextRun({ text: "[Pause 2 sec]", size: 21, color: "999999", italics: true })] })
|
||||
// Or inline: new TextRun({ text: " [emphasis] ", size: 21, color: "999999", italics: true })
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## 4. Content Quality Rules
|
||||
|
||||
### Information Density Guide
|
||||
|
||||
| Script Type | Duration | Word Count | Core Paragraphs |
|
||||
|-------------|----------|-----------|----------------|
|
||||
| Short video | 30–60 sec | 150–300 | 1–2 |
|
||||
| Product promotion | 2–3 min | 500–800 | 3–4 |
|
||||
| Presentation / Speech | 5–10 min | 1200–2500 | 5–8 |
|
||||
| Hosting script | Per agenda | Per segment | Per segment |
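The density guide can be applied as a quick pre-delivery check. This sketch copies the ranges from the table above and reports any deviations. Hosting scripts are left out because the table sizes them per agenda segment. The type keys and function name are hypothetical.

```javascript
// Ranges copied from the Information Density Guide table above.
const DENSITY = {
  shortVideo:   { words: [150, 300],   coreParas: [1, 2] },
  promotion:    { words: [500, 800],   coreParas: [3, 4] },
  presentation: { words: [1200, 2500], coreParas: [5, 8] },
};

// Returns a list of problems; an empty array means the script fits the guide.
function checkScriptDensity(type, wordCount, coreParaCount) {
  const spec = DENSITY[type];
  if (!spec) throw new Error(`no fixed range for script type "${type}"`);
  const issues = [];
  const [minWords, maxWords] = spec.words;
  if (wordCount > maxWords) issues.push(`too long: ${wordCount} > ${maxWords} words`);
  if (wordCount < minWords) issues.push(`too short: ${wordCount} < ${minWords} words`);
  const [minParas, maxParas] = spec.coreParas;
  if (coreParaCount < minParas || coreParaCount > maxParas) {
    issues.push(`core paragraphs: ${coreParaCount}, expected ${minParas}-${maxParas}`);
  }
  return issues;
}
```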
|
||||
|
||||
### Scene-Specific Prohibitions
|
||||
|
||||
1. **No spec dumping:** Do not list all product specifications in tables. Select 2–3 most persuasive data points and express them through scenarios
|
||||
2. **No information contradiction:** Model numbers, prices, data appearing multiple times must be perfectly consistent
|
||||
3. **No report tone:** No "in conclusion", "research indicates", "as mentioned above" — this is spoken word
|
||||
4. **No lengthy citations:** Broadcast scripts do not need quotes, footnotes, or references
|
||||
5. **No dense layout:** Paragraphs must have visible spacing — no screen-filling text walls
|
||||
|
||||
### Product Promotion Specific Rules
|
||||
- **Opening:** Lead with pain point / scenario ("Does your washing machine still smell after a cycle?"), not self-introduction
|
||||
- **Product intro:** Compare only 1–2 competitive dimensions at a time — not a full review
|
||||
- **Price anchor:** State original/market price first, then discount price — create contrast
|
||||
- **CTA:** Explicitly state the action ("Click the link below", "Type 1 in comments")
|
||||
|
||||
---
|
||||
|
||||
## 5. Palette
|
||||
|
||||
Broadcast scripts use clean, simple colors — no complex visual design needed:
|
||||
|
||||
```js
|
||||
const P = {
|
||||
primary: "#1A1A1A", // Title
|
||||
body: "#333333", // Body
|
||||
secondary: "#666666", // Notes
|
||||
accent: "#E85D3A", // Key highlight (warm, energetic)
|
||||
surface: "#FFF8F5", // Background (if needed)
|
||||
};
|
||||
```
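The highlighting example calls `c(P.accent)`, but `c()` is not defined in this scene file. Here is a plausible sketch of that helper. It assumes the helper only normalizes the palette's `"#RRGGBB"` strings to the bare six-digit hex form that OOXML `<w:color w:val="..."/>` uses. The real helper, if it exists elsewhere in the skill, may behave differently.

```javascript
// Hypothetical helper: "#E85D3A" -> "E85D3A" (bare RRGGBB, upper-cased).
function c(hex) {
  const bare = hex.startsWith("#") ? hex.slice(1) : hex;
  if (!/^[0-9A-Fa-f]{6}$/.test(bare)) throw new Error(`not a 6-digit hex color: ${hex}`);
  return bare.toUpperCase();
}
```

Usage: `new TextRun({ text: "Key selling point", bold: true, color: c(P.accent) })`.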
|
||||
|
||||
---
|
||||
|
||||
## 6. Scene-Specific Quality Checks
|
||||
|
||||
In addition to universal checks (see `references/common-rules.md`):
|
||||
|
||||
- [ ] Total word count within target range (not exceeding)
|
||||
- [ ] Each core paragraph has only 1–2 selling points (no dumping)
|
||||
- [ ] Conversational tone present (not report/formal style)
|
||||
- [ ] Information consistent throughout (model, price, data — no contradictions)
|
||||
- [ ] Paragraph spacing sufficient (visually not crowded)
|
||||
- [ ] Clear attention-grabbing opening + closing CTA
|
||||
698
skills/docx/scenes/exam.md
Executable file
@@ -0,0 +1,698 @@
|
||||
# Scene: Exam Paper
|
||||
|
||||
## Overview
|
||||
|
||||
Exam papers are among the most critical document types in education. Unlike general documents, they require high precision in layout, print compatibility, and subject-specific formatting. This specification covers the complete workflow from page framework to subject-specific features.
|
||||
|
||||
→ Universal prohibitions — see `references/common-rules.md`
|
||||
→ **Note:** Exam papers use their OWN font/layout specs (not Profile A defaults). All text is pure black/white/grey for photocopy clarity.
|
||||
|
||||
---
|
||||
|
||||
## 1. Page Setup & Framework
|
||||
|
||||
### Paper Specifications
|
||||
|
||||
| Type | Paper | Orientation | Use Case |
|
||||
|------|-------|-------------|----------|
|
||||
| Practice / Unit quiz | A4 | Portrait | Daily practice, homework, quizzes |
|
||||
| Formal exam | A3 | Landscape + 2-column | Midterm / final / standardized (requires OOXML) |
|
||||
| Answer sheet | A4 | Portrait | Standalone answer card |
|
||||
|
||||
### Margins
|
||||
|
||||
```js
|
||||
// A4 portrait — no seal line
|
||||
page: { size: { width: 11906, height: 16838 },
|
||||
margin: { top: 850, bottom: 850, left: 1200, right: 1200 } }
|
||||
|
||||
// A4 portrait — with seal line (left binding area reserved)
|
||||
page: { size: { width: 11906, height: 16838 },
|
||||
margin: { top: 850, bottom: 850, left: 2200, right: 850 } }
|
||||
|
||||
// A3 landscape dual-column (requires OOXML)
|
||||
// ⚠️ A3 dual-column may render slightly differently in WPS vs Word. Test in both before batch printing.
|
||||
page: { size: { width: 23812, height: 16838, orientation: PageOrientation.LANDSCAPE },
|
||||
margin: { top: 850, bottom: 850, left: 2200, right: 850 } }
|
||||
```
|
||||
|
||||
### Section Handling
|
||||
|
||||
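Put different parts in separate sections. Use `SectionType.CONTINUOUS` when only the column layout changes on the same page, and `SectionType.NEXT_PAGE` when a part must start on a new page: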
|
||||
- **Header area (full-width):** Title, instructions, score table (no columns)
|
||||
- **Content area:** Questions (may use columns)
|
||||
- **Composition / answer sheet:** Independent section, independent format
|
||||
- **Attachment pages:** Large maps/diagrams for geography/biology can be separate pages
|
||||
|
||||
```js
|
||||
sections: [
|
||||
{ properties: { /* Header section — no columns */ }, children: [...] },
|
||||
{ properties: { type: SectionType.CONTINUOUS, column: { count: 2, space: 720 } }, children: [...] },
|
||||
{ properties: { type: SectionType.NEXT_PAGE }, children: [...] }, // Composition
|
||||
]
|
||||
```
|
||||
|
||||
### Template-First Principle
|
||||
|
||||
⚠️ **Build framework first, fill content second.** Before writing questions, determine:
|
||||
1. Paper size + margins
|
||||
2. Whether seal line is needed
|
||||
3. Whether columns are used
|
||||
4. Question type structure and point allocation
|
||||
5. Whether composition grid / answer sheet is needed
|
||||
|
||||
---
|
||||
|
||||
## 2. Seal Line & Student Information Area
|
||||
|
||||
### When to Use Seal Line
|
||||
|
||||
| Scenario | Seal Line | Student Info Position |
|
||||
|----------|-----------|---------------------|
|
||||
| Formal standardized exam | ✅ Required | Left vertical info column |
|
||||
| Midterm / Final | ✅ Recommended | Left vertical info column |
|
||||
| Unit quiz | ❌ Optional | Header horizontal info row |
|
||||
| Daily practice | ❌ Skip | Header horizontal info row |
|
||||
|
||||
### Seal Line Implementation
|
||||
|
||||
#### Method 1: Header horizontal prompt (simple)
|
||||
```js
|
||||
headers: { default: new Header({ children: [
|
||||
new Paragraph({ alignment: AlignmentType.CENTER,
|
||||
children: [new TextRun({
|
||||
text: ".............. Seal ...... Line ...... Do ...... Not ...... Answer ...... Inside ..............",
|
||||
size: 16, color: "999999", font: "SimSun" })] })
|
||||
] }) }
|
||||
```
|
||||
|
||||
#### Method 2: Vertical text box (OOXML advanced)
|
||||
```xml
|
||||
<w:txbxContent>
|
||||
<w:p><w:pPr><w:jc w:val="center"/></w:pPr>
|
||||
<w:r><w:rPr><w:sz w:val="18"/><w:color w:val="999999"/></w:rPr>
|
||||
<w:t>Name:________ Class:________ ID:________</w:t></w:r>
|
||||
</w:p>
|
||||
<w:p><w:r><w:rPr><w:sz w:val="16"/><w:color w:val="CCCCCC"/></w:rPr>
|
||||
<w:t>- - - - - - - - - Seal Line - - - - - - - - -</w:t></w:r>
|
||||
</w:p>
|
||||
</w:txbxContent>
|
||||
```
|
||||
|
||||
### Student Info Row
|
||||
|
||||
```js
|
||||
// Horizontal info row (when no seal line) — borderless 3-column table
|
||||
new Table({
|
||||
alignment: AlignmentType.CENTER, columnWidths: [2800, 2800, 2800],
|
||||
rows: [new TableRow({ children: [
|
||||
cell("Name: ______________"),
|
||||
cell("Class: ______________", AlignmentType.CENTER),
|
||||
cell("ID: ______________", AlignmentType.RIGHT),
|
||||
] })]
|
||||
})
|
||||
```
|
||||
|
||||
Fill lines should be moderate length (10–14 underscore chars). Label order: Name → Class → Student ID.
|
||||
|
||||
---
|
||||
|
||||
## 3. Paper Header & Title Area
|
||||
|
||||
### Structure
|
||||
|
||||
```
|
||||
School name (16pt SimHei, centered)
|
||||
Exam title (14pt SimHei, centered) — e.g., "2025–2026 Academic Year Second Semester Midterm"
|
||||
Subject title (14pt SimHei, centered) — e.g., "Grade 7 Mathematics"
|
||||
Student info row
|
||||
Instructions (10pt SimSun, centered, grey)
|
||||
Score table (as needed)
|
||||
```
|
||||
|
||||
### Font Specifications
|
||||
|
||||
| Element | Font | Size | Style |
|
||||
|---------|------|------|-------|
|
||||
| School name | SimHei | 16pt (size:32) | Bold, centered |
|
||||
| Exam title | SimHei | 14pt (size:28) | Bold, centered |
|
||||
| Subject title | SimHei | 14pt (size:28) | Bold, centered |
|
||||
| Instructions | SimSun | 10pt (size:20) | Grey 333333, centered |
|
||||
| Student info | SimSun | 10.5pt (size:21) | Normal |
|
||||
|
||||
### Instructions Content
|
||||
|
||||
Should include: total score, exam duration, answer method, special requirements (e.g., calculator allowed).
|
||||
|
||||
### Score Table
|
||||
- Header row: light grey background F0F0F0, centered
|
||||
- Columns: Question type | Section names... | Total
|
||||
- Rows: Points | Section points... | Total points
|
||||
- Row: Score | blank... | blank
|
||||
- Table centered, 80% page width
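The score table's three rows follow directly from the section list. This is a minimal sketch that builds them as plain strings, which you can then wrap in `TableRow`/`TableCell` with F0F0F0 shading on the header row. The `sections` input shape (`{ name, points }` in paper order) is an assumption.

```javascript
// sections: hypothetical input, [{ name, points }] in paper order.
function scoreTableRows(sections) {
  const total = sections.reduce((sum, s) => sum + s.points, 0);
  return [
    ["Question type", ...sections.map(s => s.name), "Total"],
    ["Points", ...sections.map(s => String(s.points)), String(total)],
    ["Score", ...sections.map(() => ""), ""], // left blank for the grader
  ];
}
```

Computing the total from the sections keeps the score table consistent with the point annotations in the body.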
|
||||
|
||||
⚠️ **Header area should not be too full** — title + info + instructions + score table should not exceed 1/3 of the page.
|
||||
|
||||
---
|
||||
|
||||
## 4. Content Layout Rules
|
||||
|
||||
### Color Palette
|
||||
|
||||
```js
|
||||
// Exam papers use only black/white/grey for clear photocopying
|
||||
const C = {
|
||||
title: "000000", body: "000000", section: "333333",
|
||||
seal: "999999", answerLine: "CCCCCC", headerBg: "F0F0F0", gridLine: "DDDDDD",
|
||||
};
|
||||
```
|
||||
|
||||
### Column Usage
|
||||
|
||||
| Subject / Question Type | Recommendation |
|
||||
|------------------------|----------------|
|
||||
| Math multiple choice + fill-in | ✅ Suitable for columns |
|
||||
| Physics multiple choice | ✅ Suitable for columns |
|
||||
| Chinese reading / composition | ❌ Not suitable |
|
||||
| English cloze / reading | ❌ Not suitable |
|
||||
| History source-based | ❌ Not suitable |
|
||||
| Geography map reading | ❌ Not suitable |
|
||||
|
||||
### Question Numbering
|
||||
|
||||
Entire paper uses consistent three-level numbering:
|
||||
- **Major sections:** I, II, III, IV... (Chinese: 一、二、三、四…)
|
||||
- **Questions:** 1. 2. 3. ... (Arabic + period)
|
||||
- **Sub-questions:** (1) (2) (3) ... (parenthesized)
|
||||
|
||||
⚠️ **No extra symbols before question numbers** (no `•`, `▸`, `▪`, `-`, `*`). The number itself is the only marker. **Never use docx numbering/bullet list styles** for question numbers — must use plain TextRun manual numbering.
|
||||
|
||||
```js
|
||||
// ✅ Correct — plain TextRun manual numbering
|
||||
new Paragraph({ spacing: { before: 120, after: 60, line: 360 },
|
||||
children: [new TextRun({ text: `${i+1}. ${question}`, size: 21, font: { eastAsia: "SimSun" } })] })
|
||||
|
||||
// ❌ Wrong — numbering causes Word to add bullets
|
||||
new Paragraph({ numbering: { reference: "xxx", level: 0 }, // ← Forbidden!
|
||||
children: [new TextRun({ text: question })] })
|
||||
```
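Because numbering must be plain text, the labels for all three levels can come from one small helper. This sketch assumes Chinese major-section numerals only need to cover 1 to 99. The function names are hypothetical.

```javascript
const CN_DIGITS = ["", "一", "二", "三", "四", "五", "六", "七", "八", "九"];

// Chinese numeral for 1-99, e.g. 10 -> 十, 12 -> 十二, 21 -> 二十一.
function chineseNumeral(n) {
  if (!Number.isInteger(n) || n < 1 || n > 99) throw new RangeError(`out of range: ${n}`);
  const tens = Math.floor(n / 10);
  const ones = n % 10;
  if (tens === 0) return CN_DIGITS[ones];
  return (tens === 1 ? "" : CN_DIGITS[tens]) + "十" + CN_DIGITS[ones];
}

// level 0 = major section, 1 = question, 2 = sub-question.
function questionLabel(level, n) {
  if (level === 0) return `${chineseNumeral(n)}、`;
  if (level === 1) return `${n}. `;
  if (level === 2) return `(${n}) `;
  throw new RangeError(`unsupported level: ${level}`);
}
```

Usage: `new TextRun({ text: questionLabel(1, i + 1) + question, size: 21 })`. No `numbering` property is involved.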
|
||||
|
||||
### Question Spacing
|
||||
|
||||
```js
|
||||
sectionTitle: { before: 300, after: 150 } // Major section headers
|
||||
question: { before: 120, after: 80 } // Between questions
|
||||
subQuestion: { before: 60, after: 40 } // Between sub-questions
|
||||
```
|
||||
|
||||
### Page Break Control
|
||||
|
||||
⚠️ Key principles:
|
||||
- **Question stem and answer area must not split** across pages
|
||||
- **Source material and questions on same page**
|
||||
- **Figures adjacent to their questions**
|
||||
- **Avoid orphan lines** — question stem, options, answer area appear as a group
|
||||
|
||||
```js
|
||||
new Paragraph({ keepNext: true, keepLines: true, children: [...] })
|
||||
```
|
||||
|
||||
⚠️ **Answer question page break rule (mandatory):**
|
||||
|
||||
Complete combination (stem + figure + answer lines) must be considered as a unit. If remaining space cannot fit stem + figure + at least 3 answer lines, push entire question to next page.
|
||||
|
||||
Use `keepNext: true` to chain: stem → figure → first 3 answer lines.
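The chain can be set up on plain paragraph-option objects before they are passed to `new Paragraph(...)`. In this sketch (hypothetical helper), every paragraph from the stem through the third answer line gets `keepNext: true` except that last line. This tells Word to keep the group on one page. Any remaining answer lines are free to flow.

```javascript
// Sets keepNext so the stem, the optional figure and the first `minLines`
// answer lines move to the next page together. Inputs are option objects,
// not Paragraph instances; they are copied, not mutated.
function keepQuestionTogether(stemOpts, figureOpts, answerLineOpts, minLines = 3) {
  const head = [stemOpts, ...(figureOpts ? [figureOpts] : [])];
  const block = [...head, ...answerLineOpts];
  // Index of the last paragraph in the chain (the minLines-th answer line,
  // or the last one available); it carries no keepNext, ending the chain.
  const chainEnd = head.length + Math.min(minLines, answerLineOpts.length) - 1;
  return block.map((opts, i) => (i < chainEnd ? { ...opts, keepNext: true } : opts));
}
```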
|
||||
|
||||
---
|
||||
|
||||
## 5. Font & Paragraph Standards
|
||||
|
||||
### Underline Formatting for "Underlined Parts" (Mandatory)
|
||||
|
||||
When a question refers to the "underlined part" (划线部分), the referenced text MUST use actual underline formatting (`underline: { type: UnderlineType.SINGLE }`). **Never** substitute a plain-text note such as "划线部分为 XXX" ("the underlined part is XXX"); the underline must be visually rendered.
|
||||
|
||||
```js
|
||||
// ✅ Correct — actual underline on the referenced text
|
||||
new Paragraph({ children: [
|
||||
new TextRun({ text: "1. It is ", size: 21, font: { ascii: "Times New Roman" } }),
|
||||
new TextRun({ text: "a butterfly", size: 21, font: { ascii: "Times New Roman" },
|
||||
underline: { type: UnderlineType.SINGLE, color: "000000" } }),
|
||||
new TextRun({ text: ". (Ask about the underlined part)", size: 21, font: { ascii: "Times New Roman" } }),
|
||||
]})
|
||||
|
||||
// ❌ Wrong — underlined part described as annotation text
|
||||
new TextRun({ text: "1. It is a butterfly. (对划线部分提问) 注:划线部分为 a butterfly" })
|
||||
```
|
||||
|
||||
### Font Hierarchy
|
||||
|
||||
| Element | Font | Size | Style |
|
||||
|---------|------|------|-------|
|
||||
| Section title | SimHei | 11pt (size:22) | Bold |
|
||||
| Question content | SimSun | 10.5pt (size:21) | Normal |
|
||||
| Points annotation | SimSun | 10pt (size:20) | In parentheses |
|
||||
| Reading material | KaiTi/SimSun | 10.5pt (size:21) | KaiTi to differentiate |
|
||||
| Notes/source | SimSun | 9pt (size:18) | Grey 666666 |
|
||||
| Seal line | SimSun | 8pt (size:16) | Grey 999999 |
|
||||
| Page number | SimSun | 9pt (size:18) | Centered |
|
||||
|
||||
### Line Spacing
|
||||
```js
|
||||
line: 360 // ~1.5x for readability
|
||||
answerLine: 500 // Answer line spacing for writing room
|
||||
```
|
||||
|
||||
### Paragraph Rules
|
||||
- ⚠️ **Never use consecutive returns for whitespace** — use `spacing.before/after`
|
||||
- Chinese questions use Chinese punctuation; English materials use English punctuation
|
||||
- Mixed CN/EN: use Times New Roman or Calibri for English text
|
||||
|
||||
---
|
||||
|
||||
## 6. Multiple Choice Layout
|
||||
|
||||
### Core Rule
|
||||
|
||||
⚠️ **Options must NEVER be aligned with spaces!** Must use borderless tables.
|
||||
|
||||
### Option Layout — Borderless Table
|
||||
|
||||
```js
|
||||
// Short options: 4 columns in 1 row
|
||||
new Table({
|
||||
columnWidths: [2200, 2200, 2200, 2200],
|
||||
rows: [new TableRow({ children: ["A","B","C","D"].map((label, i) =>
|
||||
new TableCell({ borders: NBs, width: { size: 2200, type: WidthType.DXA },
|
||||
margins: { top: 0, bottom: 0, left: 60, right: 60 },
|
||||
children: [new Paragraph({ spacing: { before: 0, after: 0 },
|
||||
children: [new TextRun({ text: `${label}. ${options[i]}`, size: 21, font: "SimSun" })] })]
|
||||
})
|
||||
) })]
|
||||
})
|
||||
// Medium options: 2 columns, 2 rows
|
||||
// Long options: 1 column, 4 rows
|
||||
```
|
||||
|
||||
### Option Length Detection
|
||||
```js
|
||||
function getOptionLayout(options) {
|
||||
const maxLen = Math.max(...options.map(o => o.length));
|
||||
if (maxLen <= 6) return "4col";
|
||||
if (maxLen <= 15) return "2col";
|
||||
return "1col";
|
||||
}
|
||||
```
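The layout result can drive the table shape directly. This sketch (hypothetical `optionRows`) repeats `getOptionLayout()` so it runs on its own, and groups the labelled options into 1×4, 2×2 or 4×1 rows.

```javascript
// getOptionLayout() repeated from above so this sketch is self-contained.
function getOptionLayout(options) {
  const maxLen = Math.max(...options.map(o => o.length));
  if (maxLen <= 6) return "4col";
  if (maxLen <= 15) return "2col";
  return "1col";
}

// Groups labelled options into rows: 4col -> 1x4, 2col -> 2x2, 1col -> 4x1.
function optionRows(options, labels = "ABCDEFGH") {
  const perRow = { "4col": 4, "2col": 2, "1col": 1 }[getOptionLayout(options)];
  const cells = options.map((text, i) => `${labels[i]}. ${text}`);
  const rows = [];
  for (let i = 0; i < cells.length; i += perRow) rows.push(cells.slice(i, i + perRow));
  return rows;
}
```

Each inner array maps to one borderless `TableRow`. A cell width of `8800 / row.length` reproduces the 2200-twip cells of the four-column example.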
|
||||
|
||||
---
|
||||
|
||||
## 7. Fill-in-the-Blank Layout
|
||||
|
||||
```js
|
||||
// Blank line length matches expected answer:
|
||||
// Short answer (number/word): 8 underscores
|
||||
// Medium (phrase): 14 underscores
|
||||
// Long (sentence): 20 underscores
|
||||
new Paragraph({ spacing: { before: 140, after: 80, line: 400 },
|
||||
children: [new TextRun({ text: `${num}. Question text ${"_".repeat(14)}.`, size: 21, font: "SimSun" })] }) // medium blank: 14 underscores
|
||||
```
|
||||
|
||||
⚠️ Fill-in lines must not break across lines — if line is too long, put the blank on the next line.
|
||||
|
||||
---
|
||||
|
||||
## 8. Short Answer / Problem-Solving Layout
|
||||
|
||||
### Question + Points
|
||||
```js
|
||||
new Paragraph({ spacing: { before: 200, after: 60, line: 360 }, keepNext: true,
|
||||
children: [new TextRun({ text: `${num}. (${points} pts) ${question}`, size: 21, font: "SimSun" })] })
|
||||
```
|
||||
|
||||
### Answer Lines
|
||||
```js
|
||||
// Light grey answer lines (CCCCCC), NOT black
|
||||
// ⚠️ Answer lines are ONLY for writing space within each question — never as dividers between questions
|
||||
function answerLines(count) {
|
||||
return Array(count).fill(null).map(() =>
|
||||
new Paragraph({ spacing: { before: 0, after: 0, line: 500 },
|
||||
borders: { bottom: { style: BorderStyle.SINGLE, size: 1, color: "CCCCCC" } },
|
||||
children: [new TextRun({ text: " ", size: 21 })] })
|
||||
);
|
||||
}
|
||||
```
|
||||
|
||||
⚠️ **Separation between questions:**
|
||||
|
||||
Use **only spacing** (`spacing.before: 200`) for visual separation between questions. **Forbidden:**
|
||||
- ❌ Grey horizontal lines (borders)
|
||||
- ❌ Color block dividers (Table-simulated separators)
|
||||
- ❌ Symbol dividers (e.g., `───────`)
|
||||
- ❌ Any visual separator decoration
|
||||
|
||||
### Answer Space vs. Points
|
||||
|
||||
| Points | Suggested Lines | Description |
|
||||
|--------|----------------|-------------|
|
||||
| 2–4 | 3–4 lines | Simple calculation / short answer |
|
||||
| 5–8 | 6–8 lines | Medium problem |
|
||||
| 10–12 | 8–10 lines | Complex problem |
|
||||
| 14–20 | 10–14 lines | Comprehensive / essay question |
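To pass a line count to `answerLines(count)` automatically, the table can be turned into a lookup. This sketch takes the upper end of each row's range. Point values that fall between rows (such as 9 or 13) go to the next-higher row. That choice is an assumption, since the table does not cover those values.

```javascript
// Upper end of each row's line range; values between rows round up a row.
function suggestedAnswerLines(points) {
  if (points <= 4) return 4;
  if (points <= 8) return 8;
  if (points <= 12) return 10;
  return 14;
}
```

Usage: `...answerLines(suggestedAnswerLines(points))` right after the question paragraph.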
|
||||
|
||||
---
|
||||
|
||||
## 9. Source-Based / Reading Question Layout
|
||||
|
||||
### Material vs. Question Separation
|
||||
|
||||
```js
|
||||
// Material area — indented + KaiTi to differentiate
|
||||
new Paragraph({ indent: { left: 420, right: 420 }, spacing: { before: 100, after: 100, line: 380 },
|
||||
children: [new TextRun({ text: materialText, size: 21, font: "KaiTi" })] })
|
||||
// Source attribution
|
||||
new Paragraph({ alignment: AlignmentType.RIGHT, indent: { right: 420 },
|
||||
children: [new TextRun({ text: "— from \"XXX\"", size: 18, color: "666666", font: "SimSun" })] })
|
||||
```
|
||||
|
||||
### Key Principles
|
||||
- Material title, source, body, and notes use different fonts
|
||||
- Long materials: increase line spacing (line: 380–400)
|
||||
- Material and corresponding questions on same page
|
||||
- Sub-question numbers (1)(2)(3) clearly correspond to material
|
||||
- **Data tables in materials MUST use proper docx `Table` objects** — never render tabular data as Markdown plain text (`| col | col |`). This includes statistics tables, climate data tables, comparison tables, and any structured data within question materials. Use bordered tables (see § 13 Table Usage Standards) with appropriate header row styling.
|
||||
|
||||
---
|
||||
|
||||
## 10. Composition / Writing Area
|
||||
|
||||
### Grid Count Calculation
|
||||
|
||||
⚠️ **Grid count must exceed required word count by 20–30%** (for title, paragraph indents, line breaks).
|
||||
|
||||
| Required Words | Min Grid Count | Recommended Layout |
|
||||
|---------------|---------------|-------------------|
|
||||
| 400 | 500 | 25 rows × 20 cols |
|
||||
| 600 | 750 | 38 rows × 20 cols |
|
||||
| 800 | 1000 | 50 rows × 20 cols |
|
||||
| 1000 | 1250 | 63 rows × 20 cols |
|
||||
|
||||
```js
|
||||
function calcGridSize(requiredWords, colsPerRow = 20) {
|
||||
const totalCells = Math.ceil(requiredWords * 1.25);
|
||||
const rows = Math.ceil(totalCells / colsPerRow);
|
||||
return { rows, colsPerRow, totalCells: rows * colsPerRow };
|
||||
}
|
||||
```
|
||||
|
||||
### Chinese Composition Grid
|
||||
|
||||
```js
|
||||
const { Table, TableRow, TableCell, Paragraph, WidthType, HeightRule } = require("docx");

// thinBs() is the bordered-table helper defined in § 14 Table Usage Standards.
function compositionGrid(rows, colsPerRow) {
  // Square cells: about 8800 twips of text width split evenly across the columns
  const cellSize = Math.floor(8800 / colsPerRow);
  return new Table({
    columnWidths: Array(colsPerRow).fill(cellSize),
    rows: Array.from({ length: rows }, () =>
      new TableRow({
        // Exact row height equal to the cell width keeps every cell square
        height: { value: cellSize, rule: HeightRule.EXACT },
        children: Array.from({ length: colsPerRow }, () =>
          new TableCell({
            borders: thinBs("DDDDDD"), // light grey lines do not compete with handwriting
            width: { size: cellSize, type: WidthType.DXA },
            children: [new Paragraph({ children: [] })],
          })
        ),
      })
    ),
  });
}
|
||||
```
|
||||
|
||||
### English Writing Area (Horizontal Lines) — MANDATORY for English Writing Questions
|
||||
|
||||
⚠️ **Every English writing/composition question MUST include ruled horizontal lines.** A blank area without lines is FORBIDDEN — students need lines to write on.
|
||||
|
||||
```js
|
||||
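const { Paragraph, TextRun, BorderStyle } = require("docx");

function writingLines(count) {
  return Array.from({ length: count }, () =>
    new Paragraph({
      spacing: { before: 0, after: 0, line: 560 },
      // The bottom border (paragraph option `border`) of each empty paragraph forms one ruled line.
      // Word can merge identical borders on consecutive paragraphs, so check the rendered output.
      border: { bottom: { style: BorderStyle.SINGLE, size: 1, color: "CCCCCC" } },
      // A single space keeps the otherwise empty paragraph at full height
      children: [new TextRun({ text: " ", size: 21 })],
    })
  );
}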
|
||||
```
|
||||
|
||||
**Line count by word requirement:**
|
||||
| Required Words | Lines |
|
||||
|---------------|-------|
|
||||
| ≤50 | 8 |
|
||||
| 50–80 | 10 |
|
||||
| 80–120 | 12 |
|
||||
| 120+ | 15 |
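
The table can be encoded as a lookup and passed straight to `writingLines`. This is a sketch, and `writingLineCount` is a hypothetical name. The table's ranges share their endpoints (50, 80, 120), so assigning each boundary value to the lower band is an assumption:

```js
// Hypothetical helper mirroring the line-count table.
// Boundary values 50, 80 and 120 are assigned to the lower band (assumption).
function writingLineCount(requiredWords) {
  if (requiredWords <= 50) return 8;
  if (requiredWords <= 80) return 10;
  if (requiredWords <= 120) return 12;
  return 15;
}

// e.g. a 100-word task: writingLines(writingLineCount(100)) -> 12 ruled lines
console.log(writingLineCount(100)); // 12
```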
|
||||
|
||||
**Rules:**
|
||||
1. Lines must appear immediately after the writing prompt paragraph
|
||||
2. Line color: light grey `CCCCCC` (print-friendly, not visually heavy)
|
||||
3. Line spacing: `line: 560` (provides adequate writing room)
|
||||
4. Chinese composition uses grid (`compositionGrid`), English uses lines (`writingLines`) — never mix them up
|
||||
|
||||
|
||||
### Composition Area Requirements
|
||||
- Independent section or clear separation
|
||||
- Title space reserved (for self-chosen topics)
|
||||
- Word count prompt visible ("No fewer than 800 words" / "About 120 words")
|
||||
- Grid/line colors light — must not interfere with writing
|
||||
- Pages continuous, not split
|
||||
|
||||
---
|
||||
|
||||
## 11. Answer Key (参考答案)
|
||||
|
||||
### Output Rules
|
||||
|
||||
1. **Default (user does not request answers in the same file):** Generate the answer key as a **separate .docx file** (e.g., `exam.docx` + `exam_answers.docx`). This prevents students from accidentally seeing answers.
|
||||
2. **User explicitly requests answers in the same file:** Place the answer key on an **independent page** using `SectionType.NEXT_PAGE`. Answer key MUST NOT appear on the same page as any exam question.
|
||||
|
||||
### Separate File Format (Default)
|
||||
|
||||
The answer key file should include:
|
||||
- Title: "《{exam title}》参考答案" (SimHei, 14pt/size:28, bold, centered)
|
||||
- Same question numbering as the exam
|
||||
- Concise answers (letter choices, key words, short solutions)
|
||||
- Font: SimSun 10.5pt (size: 21)
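
Deriving the answer file's name and title from the exam keeps the pair consistent. This is a small sketch: `answerKeyTarget` is a hypothetical helper, and it follows the `exam.docx` → `exam_answers.docx` naming and the title format above:

```js
const path = require("path");

// Hypothetical helper: derive the separate answer-key file and its title from the exam.
function answerKeyTarget(examPath, examTitle) {
  const { dir, name, ext } = path.parse(examPath);
  return {
    file: path.join(dir, `${name}_answers${ext || ".docx"}`),
    title: `《${examTitle}》参考答案`, // rendered SimHei, size 28, bold, centered
  };
}

console.log(answerKeyTarget("exam.docx", "期中考试"));
```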
|
||||
|
||||
### Same File Format (When User Requests)
|
||||
|
||||
```js
|
||||
// Answer key as a separate section — MUST use SectionType.NEXT_PAGE
|
||||
{
|
||||
properties: { type: SectionType.NEXT_PAGE,
|
||||
page: { margin: { top: 850, bottom: 850, left: 1200, right: 1200 } } },
|
||||
children: [
|
||||
new Paragraph({
|
||||
alignment: AlignmentType.CENTER, spacing: { after: 300 },
|
||||
children: [new TextRun({ text: "参考答案", size: 28, bold: true,
|
||||
font: { eastAsia: "SimHei" } })],
|
||||
}),
|
||||
// ... answer content paragraphs
|
||||
],
|
||||
}
|
||||
```
|
||||
|
||||
### Rules
|
||||
1. ⚠️ **Never place answer content directly after the last question without a page/section break**
|
||||
2. Answer content should be concise — no answer lines, no grid, plain text only
|
||||
3. Calculation/proof questions: show key steps, not just final answer
|
||||
4. If the exam has figures, answers may reference "see Figure X" without re-embedding
|
||||
|
||||
---
|
||||
|
||||
## 12. Figures & Illustrations
|
||||
|
||||
### Image Insertion
|
||||
```js
|
||||
// imageBuffer: PNG bytes for this figure, e.g. fs.readFileSync("fig1.png")
const figureBlock = [
  // keepNext keeps the image on the same page as its caption
  new Paragraph({ alignment: AlignmentType.CENTER, keepNext: true, spacing: { before: 100, after: 60 },
    children: [new ImageRun({ data: imageBuffer, transformation: { width: 300, height: 200 }, type: "png" })] }),
  new Paragraph({ alignment: AlignmentType.CENTER, spacing: { after: 100 },
    children: [new TextRun({ text: "(Figure 1)", size: 18, color: "666666", font: "SimSun" })] }),
];
|
||||
```
|
||||
|
||||
### Key Principles
|
||||
- Images set as inline (default) to prevent floating
|
||||
- Resolution sufficient for print clarity
|
||||
- **B&W print compatible:** images must remain distinguishable when printed in grayscale
|
||||
- Figure numbers and captions complete
|
||||
- Figures adjacent to corresponding questions
|
||||
- Maps must have: scale bar, north arrow, legend
|
||||
- Coordinate graphs must have: axis labels, tick marks, units
|
||||
|
||||
### ⚠️ Figure-Text Order (Strictly Enforced)
|
||||
|
||||
**For questions with figures, element order must be:**
|
||||
```
|
||||
1. Question stem (keepNext: true)
|
||||
2. Figure (centered, keepNext: true)
|
||||
3. Answer lines / answer area
|
||||
```
|
||||
|
||||
**Forbidden:** answer lines between stem and figure, or figure after answer lines.
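
The order rule can be checked before the document is built by tagging each block of a question. This is a sketch; the `"stem"` / `"figure"` / `"answer"` tags and `figureOrderOk` are hypothetical:

```js
// Hypothetical validator for one question's block sequence.
function figureOrderOk(blocks) {
  const stem = blocks.indexOf("stem");
  const answer = blocks.indexOf("answer");
  if (stem === -1) return false; // every question needs a stem
  if (answer !== -1 && answer < stem) return false; // answer area never precedes the stem
  // Every figure must sit after the stem and before the first answer block.
  return blocks.every(
    (kind, i) => kind !== "figure" || (i > stem && (answer === -1 || i < answer))
  );
}

console.log(figureOrderOk(["stem", "figure", "answer"])); // true
console.log(figureOrderOk(["stem", "answer", "figure"])); // false
```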
|
||||
|
||||
### Figure Content Matching
|
||||
- **Figures must be semantically consistent with question stem:** if question says "triangle ABC", figure must label vertices A, B, C
|
||||
- Geometry annotations must match described angles, side lengths
|
||||
- Function graphs must mark key points mentioned in the question
|
||||
- Physics experiment diagrams must match described apparatus
|
||||
- Figure width: geometry ≤ 50% page width, data/experiment ≤ 70%
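
The width limits can be turned into `ImageRun` `transformation` sizes. This is a sketch that assumes docx-js interprets those values as pixels at 96 DPI (1 px = 15 twips). `fitFigure` is a hypothetical helper, and the 1200-twip side margins are taken from the answer-key section example:

```js
// Assumption: ImageRun transformation sizes are pixels at 96 DPI,
// so 1 px = 1440 twips per inch / 96 px per inch = 15 twips.
const TWIPS_PER_PX = 15;
// Maximum share of the text width, per the rule above.
const MAX_SHARE = { geometry: 0.5, data: 0.7, experiment: 0.7 };

function fitFigure(srcWidth, srcHeight, contentWidthTwips, kind) {
  if (!(kind in MAX_SHARE)) throw new Error(`Unknown figure kind: ${kind}`);
  const maxPx = Math.floor((contentWidthTwips / TWIPS_PER_PX) * MAX_SHARE[kind]);
  const scale = Math.min(1, maxPx / srcWidth); // shrink wide figures, never enlarge
  return { width: Math.round(srcWidth * scale), height: Math.round(srcHeight * scale) };
}

// A4 (11906 twips wide) minus 1200-twip left/right margins = 9506 twips of text width.
console.log(fitFigure(600, 400, 11906 - 2 * 1200, "geometry")); // { width: 316, height: 211 }
```

The aspect ratio is preserved, which also satisfies "no distortion" for geometry figures.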
|
||||
|
||||
### ⚠️ Figure Diversity Rule (Mandatory)
|
||||
|
||||
**No duplicate figures in the entire paper.** Even if two questions involve the same type (e.g., both triangles), each must have a distinct figure:
|
||||
1. Different labels (different vertex letters, angles, side lengths)
|
||||
2. Different shapes (acute vs. right vs. obtuse triangle)
|
||||
3. Different styling (if applicable)
|
||||
|
||||
If using matplotlib, each call must use **different parameters and data** — never copy the same generation code.
|
||||
|
||||
### Subject-Specific Figure Requirements
|
||||
|
||||
| Subject | Common Types | Special Requirements |
|
||||
|---------|-------------|---------------------|
|
||||
| Math | Geometry, functions, coordinates | No distortion, clear labels |
|
||||
| Physics | Circuits, mechanics, apparatus | Standard symbols, correct arrows |
|
||||
| Chemistry | Apparatus, molecular structures | Reagent names labeled |
|
||||
| Biology | Cell, organ, ecosystem diagrams | Labels not too small |
|
||||
| Geography | Maps, contour lines, statistics | Legend + scale + north arrow |
|
||||
|
||||
---
|
||||
|
||||
## 13. Formulas & Special Symbols
|
||||
|
||||
### Formulas
|
||||
Math/physics/chemistry formulas use **LaTeX → docx-js Math mapping** (see `references/math-formulas.md`):
|
||||
- Basic (fractions, sub/superscript, roots) → docx-js Math components
|
||||
- Complex (3+ nesting, matrices) → matplotlib PNG fallback
|
||||
- Never hand-type Unicode formula approximations
|
||||
|
||||
### Common Unicode Math Symbols
|
||||
```
|
||||
× ÷ ± ∓ ≠ ≈ ≤ ≥ ∞ √ ∑ ∏ ∫ ∂ ∆ ∇
|
||||
α β γ δ ε θ λ μ π σ φ ω
|
||||
⊂ ⊃ ∈ ∉ ∪ ∩ ∅ ∀ ∃
|
||||
→ ← ↑ ↓ ⇒ ⇔ ° ′ ″ ‰ ² ³ ⁴ ⁿ ₁ ₂ ₃
|
||||
```
|
||||
|
||||
### Chemical Formulas
|
||||
Subscripts/superscripts must be correct: H₂O, CO₂, Fe₂O₃, Ca(OH)₂
|
||||
Reaction arrows: → ⇌ ↑ ↓
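
Subscripts can be generated rather than hand-typed. This sketch splits a formula into segments that map onto docx-js `TextRun`s with `subScript: true`. `chemSegments` is hypothetical, and it does not handle ionic charges or other superscripts:

```js
// Hypothetical helper: digit runs that follow an element symbol or ")" become subscripts.
// Leading digits (coefficients such as the 2 in "2H2O") stay normal text.
function chemSegments(formula) {
  const tokens = formula.match(/\d+|\D+/g) || [];
  return tokens.map((text, i) => ({
    text,
    subScript: /^\d+$/.test(text) && i > 0 && /[A-Za-z)\]]$/.test(tokens[i - 1]),
  }));
}

// Usage with docx-js:
// chemSegments("Fe2O3").map((s) => new TextRun({ text: s.text, subScript: s.subScript }))
console.log(chemSegments("Ca(OH)2"));
```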
|
||||
|
||||
---
|
||||
|
||||
## 14. Table Usage Standards
|
||||
|
||||
### Borderless Tables (for alignment)
|
||||
For: option alignment, info rows, question number + points alignment
|
||||
```js
|
||||
const NB = { style: BorderStyle.NONE, size: 0, color: "FFFFFF" };
|
||||
const NBs = { top: NB, bottom: NB, left: NB, right: NB };
|
||||
```
|
||||
|
||||
### Bordered Tables (for data display)
|
||||
For: score tables, data tables, statistics
|
||||
```js
|
||||
const thinB = (c="000000") => ({ style: BorderStyle.SINGLE, size: 1, color: c });
|
||||
const thinBs = (c="000000") => ({ top: thinB(c), bottom: thinB(c), left: thinB(c), right: thinB(c) });
|
||||
```
|
||||
|
||||
### Table Standards
|
||||
- Cell padding moderate (margins in twips: top/bottom 40–60, left/right 60–80)
|
||||
- Consistent border thickness
|
||||
- Header row: light grey F0F0F0 background
|
||||
- Avoid cross-page tables
|
||||
- Tables centered (`alignment: AlignmentType.CENTER`)
|
||||
|
||||
---
|
||||
|
||||
## 15. Headers & Footers
|
||||
|
||||
### Page Numbers
|
||||
```js
|
||||
footers: { default: new Footer({ children: [
|
||||
new Paragraph({ alignment: AlignmentType.CENTER,
|
||||
children: [
|
||||
new TextRun({ children: [PageNumber.CURRENT], size: 18, font: "SimSun" }),
|
||||
] })
|
||||
] }) }
|
||||
```
|
||||
|
||||
⚠️ **Denominator FORBIDDEN** — never use `PageNumber.TOTAL_PAGES` or "Page X of Y". Show only current page number.
|
||||
|
||||
### Headers
|
||||
- May contain seal line prompt or subject name
|
||||
- Small font (8–9pt), grey color (999999)
|
||||
- Should not be visually heavy — must not compete with content
|
||||
|
||||
---
|
||||
|
||||
## 16. Subject-Specific Standards
|
||||
|
||||
### Chinese Language
|
||||
- Reading, classical poetry, composition: **no columns**
|
||||
- Poetry preserves original line breaks
|
||||
- Classical text needs annotation area (smaller font, indented)
|
||||
- Composition grid in independent section, grid count via `calcGridSize` (800 words → 50×20 = 1000 cells)
|
||||
- Dictation questions: horizontal lines, moderate length
|
||||
- Reading materials: use KaiTi to differentiate
|
||||
|
||||
### Mathematics
|
||||
- Multiple choice, fill-in: suitable for neat layout
|
||||
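- Formulas: docx-js Math components (OOXML) as described in § 13; Unicode symbols only for standalone symbols, never as formula approximations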
|
||||
- Geometry/function graphs must be clear, undistorted
|
||||
- Problem-solving: sufficient working space
|
||||
- Coordinate graphs: labeled axes, tick marks
|
||||
|
||||
### English
|
||||
- English font: Times New Roman, moderate character spacing
|
||||
- Cloze: numbers in text, options after passage
|
||||
- Reading comprehension: material + questions as groups
|
||||
- Writing area: horizontal lines, not grid
|
||||
- Listening (if any): numbers aligned with options
|
||||
|
||||
### Physics / Chemistry / Biology
|
||||
- Experiment/apparatus diagrams must be clear and accurate
|
||||
- Unit symbols standardized (m/s, kg, mol/L, etc.)
|
||||
- Chemical formula subscripts correct
|
||||
- Calculation and experiment analysis: sufficient answer space
|
||||
- Biology structure diagrams: labels not too small
|
||||
|
||||
### History / Politics
|
||||
- Source-based questions are lengthy — **no columns**
|
||||
- Dates, figures, events clearly labeled
|
||||
- Essay questions: more whitespace than multiple choice
|
||||
- Historical sources cite provenance
|
||||
- Chart materials in logical order
|
||||
|
||||
### Geography
|
||||
- Maps are the focus — must be clear
|
||||
- Legend, scale bar, north arrow required
|
||||
- Map and question close together — avoid page turns
|
||||
- Map reading questions: balance figure and text space
|
||||
- Contour line values clearly labeled
|
||||
|
||||
---
|
||||
|
||||
## Final Review Checklist
|
||||
|
||||
After generating an exam paper, check every item:
|
||||
|
||||
- [ ] Question numbers sequential, points correct, total correct
|
||||
- [ ] Question stems match options / materials / illustrations one-to-one
|
||||
- [ ] **Figures come after stem, before answer area** (strict order)
|
||||
- [ ] **Figure content matches question semantics** (labels, symbols match)
|
||||
- [ ] **Composition grid count ≥ required words × 1.25** (800 words → at least 1000 cells)
|
||||
- [ ] Options aligned with borderless tables (not spaces)
|
||||
- [ ] No wrong pages, missing pages, **no extra blank pages**
|
||||
- [ ] Images / tables / formulas positioned correctly
|
||||
- [ ] **No Markdown table syntax in document** — all data tables use proper docx Table objects
|
||||
- [ ] Fonts, sizes, line spacing consistent
|
||||
- [ ] Answer space matches difficulty and point value
|
||||
- [ ] Clear when printed in B&W
|
||||
- [ ] Subject-specific layout handled properly
|
||||
- [ ] Seal line / page numbers / headers formatted correctly
|
||||
- [ ] Header info complete (school, subject, duration, total score)
|
||||
- [ ] **No extra PageBreak at end of last section**
|
||||
- [ ] **Answer key is either a separate file (default) or on a separate page (if user requested in same file)** — never on the same page as questions
|
||||
411
skills/docx/scenes/official-doc.md
Executable file
@@ -0,0 +1,411 @@
|
||||
# Scene: Official Document (Government Notice / Letter / Reply / Minutes)
|
||||
|
||||
## Goal
|
||||
|
||||
Generate a complete, formal, properly structured official document ready for Word delivery. Must simultaneously meet:
|
||||
- Correct document type, complete structure, clear elements
|
||||
- Formal government register, stable hierarchy, reliable layout
|
||||
- Ready for approval, circulation, filing, issuance, or formal internal communication
|
||||
|
||||
**Forbidden:** Producing outlines-only / sample paragraphs / writing advice / half-finished drafts; outputting chat-style explanations.
|
||||
|
||||
→ Placeholder convention & universal prohibitions — see `references/common-rules.md`
|
||||
→ **Note:** This scene uses its OWN font and layout specs (not Profile A defaults), because official documents follow GB/T 9704 standards.
|
||||
|
||||
---
|
||||
|
||||
## Scope & Document Type Boundaries
|
||||
|
||||
This scene covers:
|
||||
1. **Notice** — assigning work, communicating requirements, forwarding documents
|
||||
2. **Official Letter** — between non-subordinate organizations: negotiation, inquiry, assistance requests, replies
|
||||
3. **Reply (to Request)** — superior authority answering a subordinate's formal request
|
||||
4. **Meeting Minutes** — recording key outcomes and agreed items
|
||||
|
||||
**Important boundaries:**
|
||||
- "Red header" is a format/layout, not a document type — it typically carries notices, letters, or replies
|
||||
- **Not all official documents need red headers / document numbers / colophons** — only enable when user explicitly requests "red header format", "GB/T 9704 format", or "formal issuance format"
|
||||
- Internal enterprise notices, business letters, meeting minutes often do NOT use full GB/T standard format
|
||||
- This scene does NOT cover: speeches, press releases, promotional materials, papers, summary reports, contracts, or legal opinions
|
||||
|
||||
---
|
||||
|
||||
## Document Type Routing
|
||||
|
||||
```js
|
||||
function selectOfficialType(keywords, purpose) {
  // Case-insensitive so "Meeting Minutes" and "meeting minutes" route alike;
  // `purpose` is accepted but not used by this keyword heuristic.
  if (/minutes|meeting/i.test(keywords)) return "minutes";
  if (/reply|respond to request/i.test(keywords)) return "reply";
  if (/letter|inquiry|negotiation/i.test(keywords)) return "letter";
  return "notice"; // default
}
|
||||
```
|
||||
|
||||
### Red Header Activation
|
||||
|
||||
```js
|
||||
function needsRedHeader(userRequest) {
  // Only activate when explicitly requested (case-insensitive; "GB/T9704" also matches)
  return /red header|GB\/T\s?9704|formal issuance|official format/i.test(userRequest);
}
|
||||
```
|
||||
|
||||
**Rules:**
|
||||
- `needsRedHeader = true` → Enable red header, document number, colophon (full formal elements)
|
||||
- `needsRedHeader = false` → Maintain formal style but no mandatory red header; keep only title + addressee + body + signature
|
||||
|
||||
---
|
||||
|
||||
## Standard Template Structures
|
||||
|
||||
### Template A: Notice
|
||||
1. Red header area (if applicable)
|
||||
2. Document number (if applicable)
|
||||
3. Title
|
||||
4. Addressee
|
||||
5. Reason for issuance
|
||||
6. "The relevant matters are hereby notified as follows:"
|
||||
7. Notice items (expanded by hierarchy)
|
||||
8. Requirements
|
||||
9. Attachment notes (if any)
|
||||
10. Signature (if applicable)
|
||||
11. Date (if applicable)
|
||||
12. Colophon (if applicable)
|
||||
|
||||
**Closing phrase:** "This notice is hereby given." or "Please implement accordingly."
|
||||
|
||||
### Template B: Official Letter
|
||||
1. Red header area (if applicable)
|
||||
2. Document number (if applicable)
|
||||
3. Title
|
||||
4. Addressee
|
||||
5. Reason / reference to incoming letter
|
||||
6. Negotiation / inquiry / reply items
|
||||
7. Closing
|
||||
8. Signature (if applicable)
|
||||
9. Date (if applicable)
|
||||
10. Colophon (if applicable)
|
||||
|
||||
**Closing phrases:** "Please reply by letter." / "This letter is hereby sent." / "This is in reply."
|
||||
|
||||
### Template C: Reply
|
||||
Items 1–11 follow the Notice structure (Template A), with these differences:
|
||||
- Addressee is typically the single requesting organization
|
||||
- Must reference the incoming request document
|
||||
- "After review, the reply is as follows:"
|
||||
- Closing: "This is the reply."
|
||||
|
||||
### Template D: Meeting Minutes
|
||||
1. Title (meeting name + "Minutes")
|
||||
2. Meeting overview (time, place, chair, attendees)
|
||||
3. Agreed items
|
||||
4. Responsibility assignments / follow-up requirements (if applicable)
|
||||
5. Distribution scope (if applicable)
|
||||
|
||||
**Notes:**
|
||||
- Minutes record "agreed items", not a transcript of speeches
|
||||
- Minutes generally do NOT use the standard red header format, unless the user explicitly requests compliance with an organizational template
|
||||
|
||||
---
|
||||
|
||||
## Input Recognition & Completion
|
||||
|
||||
### Processing Rules
|
||||
1. If user provides a template, historical document, or organizational standard → **always follow it first**
|
||||
2. If information is incomplete → fill conservatively, formally, and appropriately for the government context
|
||||
3. **Never fabricate** policy bases, incoming document numbers, leadership directives, meeting decisions, or official organization names
|
||||
4. If critical info is missing → use standardized placeholders
|
||||
5. Never present a draft as if it were already formally issued
|
||||
|
||||
---
|
||||
|
||||
## Title Drafting Rules
|
||||
|
||||
The title is the most critical identifying element — must accurately, concisely reflect the issuing body, subject matter, and document type.
|
||||
|
||||
| Type | Format | Example |
|
||||
|------|--------|---------|
|
||||
| Notice | Issuing body + "regarding" + subject + "notice" | XX Municipal Government Notice on Issuing the XX Management Measures |
|
||||
| Letter | Issuing body + "regarding" + subject + "letter" | XX Company Letter Regarding Land Use for XX Project |
|
||||
| Reply | Issuing body + "regarding" + subject + "reply" | XX Bureau Reply on Approving Establishment of XX Branch |
|
||||
| Minutes | Meeting name + "minutes" | XX Company Third General Manager Meeting Minutes |
|
||||
|
||||
**Rules:**
|
||||
1. Title must specify the subject — no vague titles ("Notice on Relevant Matters")
|
||||
2. Titles generally do not use periods
|
||||
3. Title length should be moderate — avoid excessive length
|
||||
|
||||
---
|
||||
|
||||
## Addressee & CC
|
||||
|
||||
### Addressee
|
||||
1. The primary recipient of the document
|
||||
2. On its own line, between title and body
|
||||
3. Followed by a full-width colon (：)
|
||||
4. Replies typically address only one requesting organization
|
||||
5. Meeting minutes generally do not have a standard addressee
|
||||
|
||||
### CC (Carbon Copy)
|
||||
1. CC recipients are NOT addressees — do not mix them
|
||||
2. CC information typically appears in the colophon area
|
||||
3. Non-red-header documents should not mechanically add "CC:" lines
|
||||
|
||||
---
|
||||
|
||||
## Writing Style & Register
|
||||
|
||||
### Language Style
|
||||
1. Must be **solemn, plain, precise, rigorous, concise**
|
||||
2. **Forbidden:** Literary devices (metaphor, personification, hyperbole, rhetorical questions, exclamations)
|
||||
3. **Forbidden:** Vague expressions ("approximately", "recently", "relevant departments", "as soon as possible") — unless user explicitly requires vague wording
|
||||
4. Time, location, organization, scope, milestones should be as specific as possible
|
||||
5. No sloganeering filler or obvious "AI boilerplate" feel
|
||||
|
||||
### Common Phrase Patterns
|
||||
|
||||
**Purpose phrases:**
|
||||
- "In order to implement..."
|
||||
- "To further standardize..."
|
||||
- "To effectively carry out..."
|
||||
|
||||
**Basis phrases:**
|
||||
- "In accordance with the provisions of..."
|
||||
- "As required by..."
|
||||
- "Pursuant to relevant regulations"
|
||||
|
||||
**Transition phrases:**
|
||||
- Notice: "The relevant matters are hereby notified as follows:"
|
||||
- Letter: "The following is hereby communicated:"
|
||||
- Reply: "After review, the reply is as follows:"
|
||||
- Minutes: "The agreed items of the meeting are recorded as follows:"
|
||||
|
||||
**Closing phrases (must match document type):**
|
||||
- Notice: "This notice is hereby given."
|
||||
- Letter: "Please reply by letter." / "This letter is hereby sent." / "This is in reply."
|
||||
- Reply: "This is the reply."
|
||||
- Minutes: generally no fixed closing phrase
|
||||
|
||||
### Conciseness
|
||||
1. Use "because" not "due to the reason that..."
|
||||
2. Use "to" not "for the purpose of..."
|
||||
3. Name specific entities — not "relevant parties" or "related departments"
|
||||
4. Name responsible units — not "all units should ensure implementation" (vague ending)
|
||||
|
||||
---
|
||||
|
||||
## Body Hierarchy & Numbering
|
||||
|
||||
Official document body must strictly follow the standard Chinese government numbering system:
|
||||
|
||||
```
|
||||
I. General matters
|
||||
(I) Sub-items
|
||||
1. Specific points
|
||||
(1) Detail supplements
|
||||
```
|
||||
|
||||
Original Chinese numbering:
|
||||
```
|
||||
一、General matters
|
||||
(一)Sub-items
|
||||
1. Specific points
|
||||
(1)Detail supplements
|
||||
```
|
||||
|
||||
**Rules:**
|
||||
1. No level-skipping
|
||||
2. **Forbidden:** Markdown list markers (`-` `*`)
|
||||
3. No switching between numbering styles at the same level
|
||||
4. Level 1: major tasks; Level 2: sub-items; Levels 3–4: only when truly necessary
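
The four prefixes and the no-skipping rule can be generated and checked in code. This is a sketch. `headingPrefix` and `noLevelSkipping` are hypothetical, and the full-width parentheses in （一） / （1） are an assumption about house style:

```js
const CN = "〇一二三四五六七八九";

// Chinese numeral for 1-99: 1 -> 一, 10 -> 十, 12 -> 十二, 21 -> 二十一
function cnNum(n) {
  if (n < 10) return CN[n];
  const tens = Math.floor(n / 10);
  const ones = n % 10;
  return (tens === 1 ? "" : CN[tens]) + "十" + (ones ? CN[ones] : "");
}

// Prefix for heading levels 1-4: 一、 / （一） / 1. / （1）
function headingPrefix(level, n) {
  switch (level) {
    case 1: return `${cnNum(n)}、`;
    case 2: return `（${cnNum(n)}）`;
    case 3: return `${n}.`;
    case 4: return `（${n}）`;
    default: throw new RangeError("Official documents use at most four levels");
  }
}

// Rule 1: the first heading is level 1, and each heading goes at most one level deeper.
function noLevelSkipping(levels) {
  return levels.every((lv, i) => lv <= (i === 0 ? 1 : levels[i - 1] + 1));
}

console.log(headingPrefix(1, 3), headingPrefix(2, 12)); // 三、 （十二）
```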
|
||||
|
||||
---
|
||||
|
||||
## Truthfulness & Caution
|
||||
1. **Never fabricate** issuing bodies, incoming organizations, document numbers, leadership directives, meeting decisions, or policy bases
|
||||
2. **Never** write "per the spirit of XX meeting" or "per XX directive" unless user explicitly provides these
|
||||
3. **Never** fabricate titles and numbers of referenced documents in replies or letters
|
||||
4. **Never** present a draft as already formally issued
|
||||
5. When information is insufficient → use placeholders, never pretend elements are complete
|
||||
|
||||
---
|
||||
|
||||
## Attachment Notes
|
||||
1. Placed after body text, before signature
|
||||
2. "Attachment:" followed by attachment name
|
||||
3. Multiple attachments: numbered sequentially (Attachment 1, Attachment 2...)
|
||||
4. Attachment names must be clear and specific — never fabricate unknown attachments
|
||||
|
||||
---
|
||||
|
||||
## Signature & Date
|
||||
|
||||
1. Document types requiring signatures should have issuing body name and date
|
||||
2. Not all types mechanically require signatures (minutes typically do not)
|
||||
3. Formal document dates must use Chinese numeral format with proper "〇" character
|
||||
- Example: March 31, 2026 → 二〇二六年三月三十一日
|
||||
4. Document numbers use tortoiseshell brackets "〔〕" (not square brackets "[]")
|
||||
- Example: X政发〔2026〕1号
|
||||
5. Date format must be consistent throughout
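
Rules 3 and 4 can be implemented as a short sketch; `chineseDate` and `docNumber` are hypothetical helpers. Note that GB/T 9704-2012 itself writes the date of issue in Arabic numerals (2026年3月31日). The helper below implements this scene's Chinese-numeral rule:

```js
const DIGITS = "〇一二三四五六七八九"; // uses U+3007 "〇", not "零" or the letter O

// Year digit by digit: 2026 -> 二〇二六
function yearToChinese(year) {
  return String(year).split("").map((d) => DIGITS[Number(d)]).join("");
}

// Month/day 1-31: 10 -> 十, 11 -> 十一, 20 -> 二十, 31 -> 三十一
function smallToChinese(n) {
  if (n < 10) return DIGITS[n];
  const tens = Math.floor(n / 10);
  const ones = n % 10;
  return (tens === 1 ? "" : DIGITS[tens]) + "十" + (ones === 0 ? "" : DIGITS[ones]);
}

function chineseDate(year, month, day) {
  return `${yearToChinese(year)}年${smallToChinese(month)}月${smallToChinese(day)}日`;
}

// Document number with tortoiseshell brackets; the sequence number has no leading zeros.
function docNumber(prefix, year, seq) {
  return `${prefix}〔${year}〕${seq}号`;
}

console.log(chineseDate(2026, 3, 31)); // 二〇二六年三月三十一日
console.log(docNumber("X政发", 2026, 1)); // X政发〔2026〕1号
```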
|
||||
|
||||
---
|
||||
|
||||
## Palette
|
||||
|
||||
**NO decorative colors.** Pure black text on white background. The only color is red header text.
|
||||
|
||||
```js
|
||||
const palette = { primary:"#000000", body:"#000000", accent:"#000000", surface:"#FFFFFF" };
|
||||
const RED_HEADER = "FF0000"; // Only for red header text
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## Page Layout (GB/T 9704-2012 Standard)
|
||||
|
||||
**Only for formal GB/T red-header documents.** Non-GB/T scenarios may use standard margins.
|
||||
|
||||
| Property | Value | Twips |
|
||||
|----------|-------|-------|
|
||||
| Top margin | 3.7 cm | 2098 |
|
||||
| Bottom margin | 3.5 cm | 1984 |
|
||||
| Left margin | 2.8 cm | 1588 |
|
||||
| Right margin | 2.6 cm | 1474 |
|
||||
|
||||
```js
|
||||
// GB/T red header layout
|
||||
page: { size: { width: 11906, height: 16838 }, margin: { top: 2098, bottom: 1984, left: 1588, right: 1474 } }
|
||||
// Non-GB/T formal documents may use standard margins:
|
||||
// margin: { top: 1440, bottom: 1440, left: 1701, right: 1417 }
|
||||
```
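
The twip values follow from 1 inch = 2.54 cm = 1440 twips. The sketch below derives them and checks the resulting text block against the 156 mm × 225 mm type area specified by GB/T 9704-2012. Exact rounding gives 1587 for the 2.8 cm left margin; the table's 1588 differs by one twip (0.05 pt), which has no visible effect:

```js
// 1 inch = 2.54 cm = 1440 twips, so 1 cm ≈ 566.9 twips.
const cmToTwips = (cm) => Math.round((cm / 2.54) * 1440);
const twipsToMm = (tw) => (tw / 1440) * 25.4;

// GB/T 9704 margins from the table above; A4 is 11906 x 16838 twips.
const margin = { top: cmToTwips(3.7), bottom: cmToTwips(3.5), left: cmToTwips(2.8), right: cmToTwips(2.6) };

// Resulting text block (版心)
const bodyWidthMm = twipsToMm(11906 - margin.left - margin.right);
const bodyHeightMm = twipsToMm(16838 - margin.top - margin.bottom);
console.log(margin, bodyWidthMm.toFixed(1), bodyHeightMm.toFixed(1));
```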
|
||||
|
||||
---
|
||||
|
||||
## Font Specifications (GB/T 9704)
|
||||
|
||||
| Element | Font | Size | Style |
|---------|------|------|-------|
| Red header org name | STXiaoBiaoSong / SimSun Bold | As determined by org | Red (#FF0000), centered |
| Document title | STXiaoBiaoSong / SimSun Bold | Er Hao 22pt (size: 44) | Centered |
| Addressee | FangSong | San Hao 16pt (size: 32) | Left-aligned |
| Body | FangSong | San Hao 16pt (size: 32) | Justified, first-line indent 640 (2 characters) |
| Level 1 heading | SimHei | San Hao 16pt (size: 32) | Bold |
| Level 2 heading | KaiTi | San Hao 16pt (size: 32) | Normal |
| Level 3 heading | FangSong | San Hao 16pt (size: 32) | Bold |
| Attachment notes | FangSong | San Hao 16pt (size: 32) | Left-aligned |
| Signature/date | FangSong | San Hao 16pt (size: 32) | Right-aligned |
| Page number | FangSong | Si Hao 14pt (size: 28) | Centered, "— X —" |

**Font fallback for STXiaoBiaoSong:** This font is not installed by default on all systems; WPS ships FZXiaoBiaoSong-S13 instead. Use this fallback chain:

- Preferred: `STXiaoBiaoSong` (华文小标宋)
- Fallback 1: `FZXiaoBiaoSong-S13` (方正小标宋, available in WPS)
- Fallback 2: `SimSun` with Bold (宋体加粗, universally available)

In code, set the primary font and note the fallback:

```js
font: { eastAsia: "STXiaoBiaoSong" }
// Fallback: FZXiaoBiaoSong-S13, then SimSun Bold. User may need to install STXiaoBiaoSong for exact rendering.
```
|
||||
|
||||
```js
|
||||
styles: {
|
||||
default: {
|
||||
document: {
|
||||
run: { font: { ascii: "Times New Roman", eastAsia: "FangSong" }, size: 32, color: "000000" },
|
||||
paragraph: { spacing: { line: 560, lineRule: "exact" } }, // Exactly 28pt; without lineRule, 560 means 560/240 ≈ 2.33 lines
|
||||
},
|
||||
heading1: {
|
||||
run: { font: { eastAsia: "SimHei" }, size: 32, bold: true, color: "000000" },
|
||||
},
|
||||
heading2: {
|
||||
run: { font: { eastAsia: "KaiTi" }, size: 32, color: "000000" },
|
||||
},
|
||||
},
|
||||
}
|
||||
```
|
||||
|
||||
**Note:** For "formal administrative style" (not strict GB/T), retain the style logic but do not rigidly require every GB/T element.
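
The `size` values in the table are half-points derived from the traditional Chinese size names (字号). The small lookup below reproduces them. `zihaoToHalfPoints` is a hypothetical helper, and the point values are the common Word/WPS equivalents:

```js
// Chinese font size names (字号) in points; docx-js `size` is in half-points.
const ZIHAO_PT = {
  "初号": 42, "小初": 36, "一号": 26, "小一": 24, "二号": 22, "小二": 18,
  "三号": 16, "小三": 15, "四号": 14, "小四": 12, "五号": 10.5, "小五": 9,
};

function zihaoToHalfPoints(name) {
  const pt = ZIHAO_PT[name];
  if (pt === undefined) throw new Error(`Unknown font size name: ${name}`);
  return Math.round(pt * 2);
}

console.log(zihaoToHalfPoints("二号"), zihaoToHalfPoints("三号")); // 44 32
```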
|
||||
|
||||
---
|
||||
|
||||
## Code Examples
|
||||
|
||||
### Red Header (red-header documents only)
|
||||
|
||||
```js
|
||||
const redHeader = [
  // Issuing-organization name; line is at least 26pt x 20 twips x 1.15 = 598 twips
  new Paragraph({ alignment: AlignmentType.CENTER, spacing: { before: 0, after: 200, line: Math.ceil(26 * 23), lineRule: "atLeast" },
    children: [new TextRun({ text: "XX Municipal Government", font: { eastAsia: "SimSun" },
      size: 52, bold: true, color: "FF0000" })] }),
  // Red separator rule under the header (border size is in eighths of a point: 4 = 0.5pt)
  new Paragraph({ border: { bottom: { style: BorderStyle.SINGLE, size: 4, color: "FF0000" } },
    spacing: { after: 40 }, children: [] }),
];
|
||||
```
|
||||
|
||||
### Page Number Footer
|
||||
|
||||
```js
|
||||
footers: { default: new Footer({ children: [new Paragraph({
|
||||
alignment: AlignmentType.CENTER,
|
||||
children: [
|
||||
new TextRun({ text: "\u2014 ", size: 28 }),
|
||||
new TextRun({ children: [PageNumber.CURRENT], size: 28 }),
|
||||
new TextRun({ text: " \u2014", size: 28 }),
|
||||
],
|
||||
})] }) }
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## Style Rules
|
||||
|
||||
1. **Strictly follow official document format — no decorative elements**
|
||||
2. NO cover page
|
||||
3. NO TOC
|
||||
4. NO headers (only page numbers in footer)
|
||||
5. NO colors except red header (red-header documents only)
|
||||
6. NO images or charts (unless integral to document content)
|
||||
7. NO fancy fonts: only FangSong, SimHei, KaiTi, STXiaoBiaoSong (plus its FZXiaoBiaoSong-S13 / SimSun Bold fallbacks)
|
||||
8. Line spacing: fixed 28pt (`line: 560`, `lineRule: "exact"`); **NOT** the default 1.5x
|
||||
|
||||
---
|
||||
|
||||
## Scene-Specific Prohibitions
|
||||
|
||||
In addition to universal prohibitions (see `references/common-rules.md`):
|
||||
|
||||
1. Must not write official documents as chat replies, promotional copy, speeches, or papers
|
||||
2. Must not use Markdown headings/lists/bold/italic for document hierarchy
|
||||
3. Must not apply red header/document number/colophon to all document types indiscriminately
|
||||
4. Must not format meeting minutes as a standard red-header notice
|
||||
5. Must not use literary rhetoric, colloquial expressions, or strongly emotional language
|
||||
6. Must not fabricate incoming documents, policies, document numbers, meeting decisions, or superior directives
|
||||
7. Must not use excessive blank lines to create "formal appearance"
|
||||
8. Must not let the document read like a report, paper, or marketing copy
|
||||
|
||||
---
|
||||
|
||||
## Scene-Specific Quality Checks
|
||||
|
||||
In addition to universal checks (see `references/common-rules.md`):
|
||||
|
||||
### Format
|
||||
- [ ] Red header text is #FF0000 and only red header uses color (red-header scenarios)
|
||||
- [ ] Line spacing fixed at exactly 28pt (`line: 560`, `lineRule: "exact"`)
|
||||
- [ ] FangSong / SimHei / KaiTi correctly applied
|
||||
- [ ] Signature right-aligned, date format correct
|
||||
- [ ] No cover page, no TOC, no header
|
||||
- [ ] Page number format "— X —"
|
||||
- [ ] Red header / document number / colophon only where appropriate
|
||||
|
||||
### Content
|
||||
- [ ] Document type correctly identified, structure matches
|
||||
- [ ] Title is accurate, specific, document type clear (not vague)
|
||||
- [ ] Addressee, attachments, signature, colophon used appropriately
|
||||
- [ ] Closing phrase matches document type
|
||||
- [ ] Body hierarchy strictly follows: 一、(Level 1) →(一)(Level 2) → 1. (Level 3) →(1)(Level 4)
|
||||
- [ ] No Markdown headings/lists/bold/italic mixed in
|
||||
- [ ] Meeting minutes not incorrectly given standard document signature and colophon
|
||||
- [ ] Date uses Chinese numerals with proper "〇" character
|
||||
- [ ] Document number uses tortoiseshell brackets "〔〕"
|
||||
- [ ] No fabricated incoming documents / policy bases / organizational elements
|
||||
- [ ] Register is solemn and plain — no colloquial / literary / promotional tone
|
||||
340
skills/docx/scenes/report.md
Executable file
@@ -0,0 +1,340 @@
|
||||
# Scene: Report / Proposal
|
||||
|
||||
## Goal
|
||||
|
||||
Generate a complete, formal, well-structured report ready for Word delivery. Must simultaneously meet:
|
||||
- Complete structure, clear logic, formal language, definitive conclusions
|
||||
- Objective data presentation, proper Word formatting
|
||||
- Ready for presentation, filing, review, submission, or internal communication
|
||||
|
||||
**Forbidden:** Producing outlines-only / summaries / template annotations / half-finished drafts; outputting chat-style explanations or filler phrases like "here is the report content".
|
||||
|
||||
→ Font profile: **A (Formal)** — see `references/common-rules.md`
|
||||
→ Default layout: standard margins — see `references/common-rules.md`
|
||||
→ Placeholder convention — see `references/common-rules.md`
|
||||
→ Universal prohibitions & quality checks — see `references/common-rules.md`
|
||||
|
||||
---
|
||||
|
||||
## Report Type Routing
|
||||
|
||||
Auto-select structure and expression style based on user intent. If not explicit, infer from topic.
|
||||
|
||||
```js
|
||||
function selectReportType(keywords, topic = "") {
  // Match the explicit keywords and the topic together, case-insensitively,
  // so the type can still be inferred from the topic when keywords are sparse
  const text = `${keywords} ${topic}`;
  if (/analysis|competitor|industry|operations|data/i.test(text)) return "analysis";
  if (/experiment|lab|algorithm|engineering/i.test(text)) return "experiment";
  if (/test|QA|performance|security|compatibility/i.test(text)) return "testing";
  if (/survey|questionnaire|interview|market research/i.test(text)) return "research";
  if (/review|retrospective|post-mortem|summary/i.test(text)) return "review";
  if (/proposal|feasibility|implementation|optimization/i.test(text)) return "proposal";
  return "analysis"; // default
}
|
||||
```
|
||||
|
||||
### 6 Report Types
|
||||
|
||||
| Type | Use Case | Structure Focus | Expression Focus |
|
||||
|------|----------|----------------|-----------------|
|
||||
| analysis | Industry/competitor/operations/data analysis | Background → Dimensions → Findings → Diagnosis → Recommendations | Conclusion-first, clear dimensions, chart-supported, actionable advice |
|
||||
| experiment | Scientific/academic/algorithm/engineering experiments | Objective → Environment → Method → Results → Error → Conclusion | Precise process, clear conditions, objective results, conclusion ties to hypothesis |
|
||||
| testing | Functional/performance/security/compatibility testing | Overview → Scope → Plan → Results → Defects → Risks → Conclusion | Data-driven, traceable, reproducible, supports go/no-go decisions |
|
||||
| research | User/market/survey/interview research | Background → Subjects & Method → Sample → Findings → Synthesis → Recommendations | Clear sample boundaries, layered findings, recommendations match findings |
|
||||
| review | Project/incident retrospective, phase summary | Goals → Review → Results → Issues → Lessons → Actions | Clear facts, restrained attribution, specific action items |
|
||||
| proposal | Project/optimization proposal, feasibility study | Status → Goals → Solution → Roadmap → Resources → Risks → Benefits | Strong argumentation, executable plan, clear boundaries |
|
||||
|
||||
---
|
||||
|
||||
## Standard Template Structures
|
||||
|
||||
### Template A: Analysis Report
|
||||
1. Executive Summary
|
||||
2. Background & Objectives
|
||||
3. Scope, Data Sources & Methodology
|
||||
4. Core Findings
|
||||
5. Problem Diagnosis & Root Cause
|
||||
6. Conclusions & Recommendations
|
||||
7. Appendices (if needed)
|
||||
|
||||
### Template B: Experiment Report
|
||||
1. Abstract
|
||||
2. Objective & Hypothesis
|
||||
3. Environment & Materials
|
||||
4. Procedure & Method
|
||||
5. Data & Results
|
||||
6. Error Analysis & Discussion
|
||||
7. Conclusions
|
||||
8. Appendices (if needed)
|
||||
|
||||
### Template C: Testing Report
|
||||
1. Test Overview
|
||||
2. Test Scope & Environment
|
||||
3. Test Plan & Case Design
|
||||
4. Test Results Summary
|
||||
5. Defect Analysis & Distribution
|
||||
6. Risk Assessment & Outstanding Issues
|
||||
7. Test Conclusions
|
||||
8. Appendices (if needed)
|
||||
|
||||
### Template D: Research Report
|
||||
1. Research Summary
|
||||
2. Background & Objectives
|
||||
3. Subjects & Methodology
|
||||
4. Sample & Data Description
|
||||
5. Core Findings
|
||||
6. Problem Synthesis
|
||||
7. Recommendations & Action Direction
|
||||
8. Appendices (if needed)
|
||||
|
||||
### Template E: Review / Summary Report
|
||||
1. Overview
|
||||
2. Goals & Scope
|
||||
3. Process Review
|
||||
4. Results Summary
|
||||
5. Issues & Root Cause Analysis
|
||||
6. Lessons Learned
|
||||
7. Follow-up Action Plan
|
||||
8. Appendices (if needed)
|
||||
|
||||
### Template F: Proposal / Feasibility Report
|
||||
1. Executive Summary
|
||||
2. Current State & Problem Analysis
|
||||
3. Goals & Expected Outcomes
|
||||
4. Solution Design
|
||||
5. Implementation Roadmap & Milestones
|
||||
6. Resource Requirements & Budget
|
||||
7. Risk Analysis & Mitigation
|
||||
8. Expected Benefits & Evaluation
|
||||
9. Appendices (if needed)
|
||||
|
||||
**If the user provides a company/school/course template or fixed chapter requirements, always follow those first.**
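Templates A to F can be held as plain data so the chapter list is produced mechanically, with appendices added only on request and a user-supplied outline always taking precedence. The following is a minimal sketch; `TEMPLATES` and `buildOutline` are hypothetical names, not part of docx or any other library.

```js
// Hypothetical data form of Templates A-F (appendices are appended separately).
const TEMPLATES = {
  analysis:   ["Executive Summary", "Background & Objectives", "Scope, Data Sources & Methodology",
               "Core Findings", "Problem Diagnosis & Root Cause", "Conclusions & Recommendations"],
  experiment: ["Abstract", "Objective & Hypothesis", "Environment & Materials", "Procedure & Method",
               "Data & Results", "Error Analysis & Discussion", "Conclusions"],
  testing:    ["Test Overview", "Test Scope & Environment", "Test Plan & Case Design",
               "Test Results Summary", "Defect Analysis & Distribution",
               "Risk Assessment & Outstanding Issues", "Test Conclusions"],
  research:   ["Research Summary", "Background & Objectives", "Subjects & Methodology",
               "Sample & Data Description", "Core Findings", "Problem Synthesis",
               "Recommendations & Action Direction"],
  review:     ["Overview", "Goals & Scope", "Process Review", "Results Summary",
               "Issues & Root Cause Analysis", "Lessons Learned", "Follow-up Action Plan"],
  proposal:   ["Executive Summary", "Current State & Problem Analysis", "Goals & Expected Outcomes",
               "Solution Design", "Implementation Roadmap & Milestones",
               "Resource Requirements & Budget", "Risk Analysis & Mitigation",
               "Expected Benefits & Evaluation"],
};

// Returns numbered chapter titles, e.g. "1. Executive Summary".
// A user-provided outline (company/school/course template) is used as-is and always wins.
function buildOutline(type, { userOutline = null, includeAppendices = false } = {}) {
  const chapters = userOutline ? [...userOutline] : [...(TEMPLATES[type] ?? TEMPLATES.analysis)];
  if (includeAppendices && !userOutline) chapters.push("Appendices");
  return chapters.map((title, i) => `${i + 1}. ${title}`);
}
```

For example, `buildOutline("proposal", { includeAppendices: true })` yields nine numbered chapters ending in `"9. Appendices"`, matching Template F.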
|
||||
|
||||
---
|
||||
|
||||
## Input Recognition & Completion
|
||||
|
||||
### User May Provide
|
||||
Report topic, type, use case, audience, industry, length requirements, data sources, structure requirements, output purpose (presentation/filing/audit/review/external submission/coursework), template files, company/department/project/author/date, etc.
|
||||
|
||||
### Processing Rules
|
||||
1. If the user provides a template, existing document, company standard, or format example, **always follow it first**
|
||||
2. If information is incomplete, fill in conservatively — completions must be **restrained, natural, credible, professional**
|
||||
3. **Never fabricate** unrealistic data, conclusions, test results, business metrics, project statuses, policy backgrounds, or customer feedback
|
||||
4. If critical information is missing and cannot be safely inferred, use standardized placeholders
|
||||
5. If no real data is available, prefer low-hallucination approaches: "status description → analysis framework → problem synthesis → recommendations"
|
||||
|
||||
---
|
||||
|
||||
## Content Quality Constraints
|
||||
|
||||
### Logic & Structure
|
||||
1. Report must revolve around a clear topic, objective, audience, and through-line
|
||||
2. Must not just pile up background/concepts/vague statements — must demonstrate analysis, synthesis, judgment, comparison, or review value
|
||||
3. Terminology must be consistent throughout — concepts must not drift
|
||||
4. Abstract, body, conclusions, and recommendations must be consistent — no self-contradiction
|
||||
5. Must form a complete loop: "background → objective → method/basis → process/status → findings/results → problems/judgment → recommendations/conclusions"
|
||||
6. Each major chapter must have a clear core conclusion or topic sentence — no information dump
|
||||
|
||||
### Language Style
|
||||
1. Formal, objective, restrained, professional
|
||||
2. No colloquial expressions, chat tone, hyperbole, emotional language, or propaganda style
|
||||
3. For management/decision-maker audience: conclusion-first, highlight key points, actionable recommendations
|
||||
4. For technical/testing reports: clear basis, reproducible process, verifiable results, stated risks
|
||||
|
||||
### Data Expression
|
||||
1. Never use vague expressions as main conclusions: "significantly improved", "obviously optimized", "performed well", "has certain issues"
|
||||
2. If data exists, express quantitatively (e.g., "average response time under 200 ms" not "fast response")
|
||||
3. First occurrence of a term: write full name with abbreviation, e.g., "Application Programming Interface (API)"
|
||||
4. Without real data backing, never fabricate precise figures
|
||||
5. Statements about facts, data, status, and results must be internally consistent
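A simple lint pass can catch the vague phrasings listed above before a draft is exported. This sketch flags a sentence only when it uses one of those phrases and contains no digit at all. That heuristic, and the names `VAGUE_PHRASES`, `findVaguePhrases`, and `flagUnquantified`, are assumptions for illustration.

```js
// Phrases listed above as unacceptable main conclusions; extend the list as needed.
const VAGUE_PHRASES = ["significantly improved", "obviously optimized", "performed well", "has certain issues"];

// Returns the vague phrases found in one sentence (case-insensitive substring match).
function findVaguePhrases(sentence, phrases = VAGUE_PHRASES) {
  const lower = sentence.toLowerCase();
  return phrases.filter((p) => lower.includes(p));
}

// Heuristic: a sentence is flagged when it relies on a vague phrase and contains no number.
function flagUnquantified(sentences) {
  return sentences
    .map((text) => ({ text, vague: findVaguePhrases(text) }))
    .filter(({ text, vague }) => vague.length > 0 && !/\d/.test(text));
}
```

A sentence such as "Checkout latency significantly improved, from 480 ms to 190 ms." passes because the figures carry the claim; the same sentence without figures is flagged.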
|
||||
|
||||
### Truthfulness & Conservative Generation
|
||||
1. Never fabricate test results, experiment data, growth rates, customer counts, interview conclusions, sample distributions, or launch decisions
|
||||
2. Never present speculation as proven fact
|
||||
3. Never fabricate meeting minutes, regulatory bases, customer feedback, or system logs
|
||||
4. When information is insufficient, use placeholders — never pretend information is complete
|
||||
5. Conclusions must be restrained — do not overstate effects, risks, or value
|
||||
6. Recommendations must be grounded in preceding analysis — no conclusions from thin air
|
||||
|
||||
---
|
||||
|
||||
## Chapter Content Requirements
|
||||
|
||||
### (1) Cover
|
||||
1. Formal reports should have a cover page
|
||||
2. Cover includes: title, subtitle (if any), organization/department, author, date, classification (if requested)
|
||||
3. Cover must be a separate section
|
||||
4. Cover does not display page numbers
|
||||
5. Use `selectCoverRecipe()` for recipe + palette (see design-system.md)
|
||||
6. Common recipes: general report R1, whitepaper R2, consulting R3, proposal R4
|
||||
|
||||
### (2) Executive Summary
|
||||
1. Formal reports **must have** a summary opening — never jump directly into details
|
||||
2. Summary should briefly state: background, objective, key methodology, key findings, main recommendations
|
||||
3. Suitable for quick reading by management — generally 200–400 words
|
||||
4. Must not read like a TOC description or pile of background filler
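The 200–400 word guideline can be checked automatically. The counting rule below is an assumption: whitespace-separated words for Latin-script text and one unit per CJK character, which is the usual convention for Chinese; adjust it if the user specifies a different measure. `summaryLength` and `checkSummaryLength` are hypothetical helpers.

```js
// Assumed measure: each CJK ideograph counts as one unit, each Latin-script word as one unit.
function summaryLength(text) {
  const cjk = (text.match(/[\u4e00-\u9fff]/g) || []).length;
  const latinWords = text
    .replace(/[\u4e00-\u9fff]/g, " ") // remove CJK so it is not double-counted
    .split(/\s+/)
    .filter((w) => /[A-Za-z0-9]/.test(w)).length;
  return cjk + latinWords;
}

// Compares the measured length with the guideline range and explains any gap.
function checkSummaryLength(text, min = 200, max = 400) {
  const n = summaryLength(text);
  if (n < min) return { ok: false, length: n, note: `too short by ${min - n}` };
  if (n > max) return { ok: false, length: n, note: `too long by ${n - max}` };
  return { ok: true, length: n, note: "within range" };
}
```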
|
||||
|
||||
### (3) Table of Contents
|
||||
1. Medium-to-long formal reports should include a TOC
|
||||
2. TOC must be generated from real heading styles (Heading + TOC field) — never write a fake TOC
|
||||
3. TOC page is typically a separate page
|
||||
4. TOC depth: usually 2–3 levels
|
||||
|
||||
### (4) Background & Objectives
|
||||
1. Must explain why this report exists
|
||||
2. Must state what problem/scenario/audience the report serves
|
||||
3. If scope boundaries exist, state what the report does NOT cover
|
||||
4. Must not be vague/grand background — must relate directly to this report's task
|
||||
|
||||
### (5) Methodology / Scope / Basis
|
||||
1. Must state what materials, criteria, methods, and time range the report is based on
|
||||
2. Analysis: data sources, analysis dimensions, criteria definitions
|
||||
3. Experiment: environment, materials, samples, procedure principles
|
||||
4. Testing: scope, version, environment, methods, coverage/rounds
|
||||
5. Research: sample source, sample size, research method, time range
|
||||
6. Reader must understand how conclusions were derived
|
||||
|
||||
### (6) Core Content / Process / Status / Results
|
||||
1. Organized by logical or dimensional order — no chaotic piling
|
||||
2. Each section should lead with its conclusion, then expand with evidence
|
||||
3. Results must be specific — never just "performed well" or "has certain issues"
|
||||
4. Data, metrics, phenomena, and comparisons must be clearly stated
|
||||
5. If charts are needed but cannot be generated, use chart placeholders (see below)
|
||||
|
||||
### (7) Analysis / Discussion / Problem Diagnosis
|
||||
1. Must not merely repeat earlier results
|
||||
2. Must explain what results mean, what patterns they reveal, what problems they expose
|
||||
3. May include: comparative analysis, root cause analysis, mechanism analysis, anomaly explanation, limitations, risk boundaries
|
||||
4. Analysis must be consistent with preceding data and facts
|
||||
|
||||
### (8) Conclusions / Recommendations / Next Steps
|
||||
1. Conclusions must respond to report objectives
|
||||
2. Recommendations must be executable — not just principle slogans
|
||||
3. Recommendations should state: who executes, what to do, when, expected improvement
|
||||
4. Testing/review: clear verdict (pass / conditional pass / fail)
|
||||
5. Retrospective/summary: specific follow-up action items
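Requirement 3 above can be enforced by giving each recommendation explicit fields and rejecting any that leave one blank. The record shape (`owner`, `action`, `deadline`, `expectedOutcome`) and the `auditRecommendations` helper are hypothetical; they only mirror the who / what / when / expected-improvement rule.

```js
// Hypothetical fields for one recommendation: who executes, what to do, when, expected improvement.
const REQUIRED_FIELDS = ["owner", "action", "deadline", "expectedOutcome"];

// Lists the missing or blank fields per recommendation, so slogan-style items
// such as { action: "Strengthen management" } are caught before export.
function auditRecommendations(recs) {
  return recs
    .map((rec, index) => ({
      index,
      missing: REQUIRED_FIELDS.filter((f) => !rec[f] || String(rec[f]).trim() === ""),
    }))
    .filter((r) => r.missing.length > 0);
}
```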
|
||||
|
||||
### (9) Appendices
|
||||
1. Supplementary material valuable to the report but not suitable for the main body
|
||||
2. Includes: raw data excerpts, detailed parameters, supplementary tables, sample screenshots
|
||||
3. Appendices should be on separate pages with proper headings
|
||||
|
||||
---
|
||||
|
||||
## Chart Placeholder Convention
|
||||
|
||||
When charts are needed but cannot be directly generated:
|
||||
|
||||
```
|
||||
[Chart Placeholder: Bar chart; Topic: Q1-Q4 2025 revenue comparison; X-axis: Quarter; Y-axis: Revenue (10K CNY); Style: clean business]
|
||||
```
|
||||
|
||||
**Rules:**
|
||||
- Specify: chart type, topic, axis meanings, key dimensions, optional palette suggestion
|
||||
- Placeholder must be a standalone paragraph — never inline
|
||||
- Never use vague placeholders like "insert chart here"
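Generating placeholders from structured fields keeps them in one consistent format. A minimal sketch, assuming a hypothetical `chartPlaceholder` helper: given the same fields, it reproduces the example above exactly, and its output should be written as its own paragraph.

```js
// Builds a chart placeholder string. Type and topic are required;
// axis and style fields are included only when supplied.
function chartPlaceholder({ type, topic, xAxis, yAxis, style }) {
  if (!type || !topic) throw new Error("chart type and topic are required");
  const parts = [type, `Topic: ${topic}`];
  if (xAxis) parts.push(`X-axis: ${xAxis}`);
  if (yAxis) parts.push(`Y-axis: ${yAxis}`);
  if (style) parts.push(`Style: ${style}`);
  return `[Chart Placeholder: ${parts.join("; ")}]`;
}
```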
|
||||
|
||||
**Prefer direct generation:** Charts that can be produced via matplotlib should be generated as embedded PNGs. Placeholders are a fallback only.
|
||||
|
||||
---
|
||||
|
||||
## Content-to-Word Mapping
|
||||
|
||||
### Heading Levels
|
||||
1. Strict hierarchy — no level-skipping
|
||||
2. Headings must be informative — never "Background", "Content", "Other" (use "Project Background & Report Objectives" instead)
|
||||
3. Do not mix multiple numbering systems
|
||||
4. Normal paragraphs must not masquerade as headings
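The no-level-skipping rule is easy to check once heading levels are collected in document order. This sketch assumes the levels are available as plain numbers (1 for H1, 2 for H2, and so on) and treats the start of the body as level 0, so the first heading must be an H1. `findSkippedLevels` is a hypothetical name.

```js
// Returns every position where a heading goes more than one level deeper than the previous one.
// Moving back up any number of levels (e.g. H3 -> H1) is allowed.
function findSkippedLevels(levels) {
  const problems = [];
  let prev = 0; // start of body acts as level 0
  levels.forEach((level, i) => {
    if (level > prev + 1) problems.push({ index: i, from: prev, to: level });
    prev = level;
  });
  return problems;
}
```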
|
||||
|
||||
### Paragraphs
|
||||
1. Do not use consecutive blank lines for visual spacing
|
||||
2. Each paragraph should be a complete semantic unit — not too long or too fragmented
|
||||
|
||||
### Lists
|
||||
1. Use lists only when genuinely needed — an entire report must not be bullet points
|
||||
2. Nesting depth ≤ 3 levels
|
||||
3. Consistent punctuation within a list (all complete sentences or all fragments)
|
||||
4. Combine "key points" with "analysis paragraphs" — never just list without explaining
|
||||
|
||||
### Tables
|
||||
1. Use tables only for structured data (statistics, comparisons, parameter lists)
|
||||
2. Every table must have a header row — headers must not be blank
|
||||
3. Avoid heavily merged-cell complex nested tables
|
||||
4. Tables must have introductory and explanatory text before/after
|
||||
5. Cell content should be concise — avoid long paragraphs inside cells
|
||||
|
||||
### Emphasis
|
||||
1. Bold only for key conclusions, critical metrics, first occurrence of key terms
|
||||
2. Never bold entire paragraphs
|
||||
3. Avoid italic, strikethrough, and other unstable styles
|
||||
|
||||
---
|
||||
|
||||
## Palette Selection
|
||||
|
||||
| Report Type | Suggested Palette |
|
||||
|-------------|-------------------|
|
||||
| General | Neutral calm (primary: #101820) |
|
||||
| Consulting | Warm terracotta |
|
||||
| Tech | Cool dawn mist |
|
||||
| Environment / Education | Warm sunshine |
|
||||
| Medical | Cool mint |
|
||||
|
||||
See `references/design-system.md` for full palette definitions.
|
||||
|
||||
---
|
||||
|
||||
## Document Structure
|
||||
|
||||
1. **Cover** — via `selectCoverRecipe()` (see design-system.md)
|
||||
- Separate section, page margin typically 0
|
||||
- Common: general R1, whitepaper R2, consulting R3, proposal R4
|
||||
|
||||
2. **Table of Contents** — H1–H3, separate section
|
||||
|
||||
3. **Executive Summary** — 1 page max
|
||||
|
||||
4. **Body** — Chapters per selected template (A–F)
|
||||
|
||||
5. **Conclusions & Recommendations**
|
||||
|
||||
6. **Appendices** — Raw data, detailed tables
|
||||
|
||||
---
|
||||
|
||||
## Professional Elements
|
||||
|
||||
- **Page numbers**: bottom center, size 18 (docx sizes are half-points, so 9 pt), color "808080"
|
||||
- **Header**: abbreviated report title, size 18 (9 pt), color "808080"
|
||||
- **Figure/table numbering**: sequential (Figure 1 / Table 1)
|
||||
- **Cover**: no page number, no header/footer
|
||||
- **TOC**: optional Roman numerals or no page numbers
|
||||
- **Body**: Arabic numerals, continuous
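Sequential numbering is simplest with one counter per caption kind, producing the `Figure X: description` / `Table X: description` form used in the quality checks below. `createCaptioner` is a hypothetical helper that numbers captions at generation time.

```js
// Returns a caption function with independent counters for figures and tables.
function createCaptioner() {
  const counters = { Figure: 0, Table: 0 };
  return function caption(kind, description) {
    if (!(kind in counters)) throw new Error(`unknown caption kind: ${kind}`);
    counters[kind] += 1;
    return `${kind} ${counters[kind]}: ${description}`;
  };
}
```

Usage: create one captioner per document, so figure numbers keep increasing across chapters while tables are counted separately.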
|
||||
|
||||
---
|
||||
|
||||
## Scene-Specific Quality Checks
|
||||
|
||||
In addition to universal checks (see `references/common-rules.md`):
|
||||
|
||||
### Format
|
||||
- [ ] Executive summary ≤ 1 page
|
||||
- [ ] Figures/tables have captions ("Figure X: description" / "Table X: description")
|
||||
- [ ] Cover recipe matches report type
|
||||
- [ ] Data charts use palette accent color
|
||||
|
||||
### Content
|
||||
- [ ] Has executive summary — not starting directly with details
|
||||
- [ ] Heading names are specific and meaningful
|
||||
- [ ] Complete loop: background → basis → content → analysis → conclusions/recommendations
|
||||
- [ ] No fabricated or exaggerated details
|
||||
- [ ] Abstract and conclusions are consistent
|
||||
- [ ] Terminology consistent throughout
|
||||
- [ ] Data expressions are quantified, not vague
|
||||
- [ ] Recommendations are actionable with owners and timeline
|
||||
|
||||
### Structure
|
||||
- [ ] Heading hierarchy has no level-skipping
|
||||
- [ ] List nesting ≤ 3 levels
|
||||
- [ ] Tables have headers with intro/explanation text
|
||||
- [ ] Bold used sparingly for emphasis only
|
||||
534
skills/docx/scenes/resume.md
Executable file
@@ -0,0 +1,534 @@
|
||||
# Scene: Resume / CV
|
||||
|
||||
## Goal
|
||||
|
||||
Generate a complete, authentic, well-structured, position-targeted resume with stable Word formatting. Must simultaneously meet:
|
||||
- Authentic and credible content, clear position targeting
|
||||
- ATS-friendly, stable Word layout
|
||||
- Clean structure, professional visual design, easy to scan
|
||||
|
||||
**Execution priority** (when conflicting): Position relevance > Information readability > ATS compatibility > Visual decoration
|
||||
|
||||
**Forbidden:** Producing advice-only / fragments / half-finished drafts; outputting chat-style explanations.
|
||||
|
||||
→ Font profile: **B (Visual)** — see `references/common-rules.md`
|
||||
→ Placeholder convention & universal prohibitions — see `references/common-rules.md`
|
||||
|
||||
---
|
||||
|
||||
## Scope
|
||||
|
||||
Default: generate a position-oriented general resume. Switch to English resume, academic CV, international format, or design portfolio style only when explicitly requested by the user.
|
||||
|
||||
---
|
||||
|
||||
## Resume Type Routing
|
||||
|
||||
Auto-select module order based on user background and target:
|
||||
|
||||
### General Resume (default)
|
||||
Name & Contact → Target Position → Profile Summary (optional) → Core Skills → Work Experience → Projects → Education → Certifications / Awards
|
||||
|
||||
### New Graduate Resume
|
||||
Name & Contact → Target Position → Education → Internship Experience → Projects → Campus Activities / Competitions / Awards → Skills & Certifications
|
||||
|
||||
### Technical Role Resume
|
||||
Name & Contact → Target Direction → Profile Summary (optional) → Tech Stack / Core Skills → Work Experience → Projects → Education → Open Source / Papers / Patents / Competitions
|
||||
|
||||
### Academic CV
|
||||
Name & Contact → Research Direction / Target → Education → Research Experience → Papers / Patents / Projects / Grants → Teaching / Academic Service → Awards / Skills / Languages
|
||||
|
||||
---
|
||||
|
||||
## Input Processing Rules
|
||||
|
||||
1. If user provides a target position or JD → **must reorganize and rewrite content around position requirements**
|
||||
2. If user provides a raw draft → prioritize restructuring, phrasing refinement, and priority reordering; do not rewrite into an unfamiliar career
|
||||
3. **Never fabricate** companies, positions, degrees, projects, certifications, awards, papers, patents, data results, or achievements
|
||||
4. If critical data is missing → use conservative expressions or placeholder `【Please fill in: ______】`; never fabricate precise numbers
|
||||
5. A single resume should generally serve only one primary career direction
|
||||
|
||||
---
|
||||
|
||||
## Content Quality Constraints
|
||||
|
||||
### Core Principles
|
||||
1. Resume must revolve around the target position — do not spread all experiences equally
|
||||
2. Most relevant experiences, projects, and skills must be **placed first and detailed**
|
||||
3. Terminology, company names, position titles, date formats, and skill names must be consistent
|
||||
4. Must demonstrate: **personal positioning → capability tags → relevant experience → provable results**
|
||||
5. No piling of vague self-praise; no inspirational writing or chronological dumps
|
||||
|
||||
### Experience Writing Standards
|
||||
|
||||
Each experience bullet should demonstrate: **Action + Object/Context + Method + Result/Impact**
|
||||
|
||||
**Recommended verbs:** Led, built, drove, optimized, refactored, designed, delivered, coordinated, improved, reduced, achieved
|
||||
|
||||
**Rules:**
|
||||
- "Responsible for" / "participated in" are not absolutely forbidden, but must include scope and results
|
||||
- Each bullet is concise — one core contribution per bullet
|
||||
- Quantify when possible, but do not force-bold all numbers
|
||||
- Recent experience gets detail; low-relevance/low-value experience gets compressed or removed
|
||||
- Reverse chronological order — most recent and relevant first
|
||||
- Expand the most recent 2 experiences; compress earlier ones
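The ordering rules above can be applied mechanically: sort newest first, then mark the first two entries for full detail. The record shape and the `orderExperiences` name are hypothetical. Dates are assumed to be zero-padded `YYYY.MM` strings (the form `2023.06` used in the templates), which compare correctly as plain strings.

```js
// Hypothetical record: { company, role, start: "YYYY.MM", end, bullets }.
// Sorts newest first without mutating the input and marks how much detail each entry gets.
function orderExperiences(experiences, expandCount = 2) {
  return [...experiences]
    .sort((a, b) => (a.start < b.start ? 1 : a.start > b.start ? -1 : 0))
    .map((exp, i) => ({ ...exp, detail: i < expandCount ? "expanded" : "compressed" }));
}
```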
|
||||
|
||||
### Profile Summary / Self-Assessment
|
||||
1. Not mandatory
|
||||
2. If included, frame as "Profile Summary" — **3–4 lines max**
|
||||
3. Focus on: years of experience, career direction, core capabilities, representative achievements, position fit
|
||||
4. **Forbidden** as main content: "hardworking", "strong sense of responsibility", "team player", "quick learner", "outgoing personality"
|
||||
|
||||
### Truthfulness & Risk Control
|
||||
1. Never fabricate experiences, achievements, education, awards, or certifications
|
||||
2. Never upgrade "participated in" to "led" unless user information supports it
|
||||
3. Never attribute team results entirely to the individual
|
||||
4. Never fabricate revenue, conversion rates, headcount, budgets, or technical metrics
|
||||
5. If no data available, use restrained expressions: "improved delivery efficiency", "shortened processing cycle", "supported core business launch"
|
||||
|
||||
---
|
||||
|
||||
## Length Control
|
||||
|
||||
| Candidate Type | Target Pages |
|
||||
|---------------|-------------|
|
||||
| New graduate / <3 years experience | **1 page** |
|
||||
| 3–10 years experience | 1–2 pages |
|
||||
| Senior manager / researcher / academic CV | May exceed 2 pages, but must maintain information density |
|
||||
|
||||
**Compression rules:**
|
||||
- Experiences >5 years old with low relevance should be compressed
|
||||
- Experiences >10 years old and irrelevant may be omitted
|
||||
- Never pad low-value experiences just to "look comprehensive"
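The page targets and compression rules reduce to two small decisions. Both functions are hypothetical sketches: `ageYears` means years since the experience ended, and the relevance labels `"high"`, `"low"`, and `"none"` are assumed to be assigned by whoever reviews the resume against the target position. The table does not cover non-senior candidates with more than 10 years; the sketch assumes the 2-page cap still applies there.

```js
// Maximum pages from the Length Control table (Infinity = no hard cap; density still matters).
function maxPages({ yearsExperience, seniorOrAcademic = false }) {
  if (seniorOrAcademic) return Infinity;
  return yearsExperience < 3 ? 1 : 2; // <3 years: 1 page; otherwise up to 2 (assumed beyond 10 years too)
}

// Compression rules: older than 10 years and irrelevant -> omit;
// older than 5 years with low or no relevance -> compress; everything else is kept.
function compressionAction({ ageYears, relevance }) {
  if (ageYears > 10 && relevance === "none") return "omit";
  if (ageYears > 5 && (relevance === "low" || relevance === "none")) return "compress";
  return "keep";
}
```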
|
||||
|
||||
---
|
||||
|
||||
## ATS & Structure Constraints
|
||||
|
||||
1. Core information must be plain text — never rely on images, icons, text boxes, or headers/footers for key content
|
||||
2. No embedded charts, objects, SmartArt, or WordArt
|
||||
3. Experience descriptions use consistent bullet symbols — no complex auto-numbering
|
||||
4. Bullets within the same position should be compact — no excess blank lines
|
||||
|
||||
**Table layout vs. ATS balance:** The 3 visual templates (A/B/C) use Table-based layouts for Word visual quality. In strict ATS scenarios (user explicitly says "ATS priority"), prefer Template B (single-column) with reduced table dependency. Default: visual quality first.
|
||||
|
||||
---
|
||||
|
||||
## Module Naming
|
||||
|
||||
Use only standard, universal, recruiter-familiar names:
|
||||
- Personal Info, Target Position, Profile Summary, Core Skills, Work Experience, Projects, Education, Certifications, Awards, Languages
|
||||
|
||||
**Forbidden fancy names:** "My Growth Journey", "Self-Appreciation", "Shining Moments", "Life Motto"
|
||||
|
||||
---
|
||||
|
||||
## Template Disease Prevention
|
||||
|
||||
1. Do not include irrelevant identity tags (political affiliation, hometown, etc.) unless user explicitly requests
|
||||
2. Do not place low-priority modules (hobbies, languages, personality traits) before work experience
|
||||
3. Do not combine cover letter and resume in one document (unless user explicitly requests)
|
||||
4. Do not let the template's decoration overpower the candidate's actual information
|
||||
5. Do not let "self-assessment" occupy the prime space at the top of the page; it belongs after core skills and experience
|
||||
|
||||
---
|
||||
|
||||
## Template Selection
|
||||
|
||||
Three templates are provided, auto-selected based on user needs:
|
||||
|
||||
| Template | Layout | Best For | Color Style |
|
||||
|----------|--------|----------|-------------|
|
||||
| A | Left sidebar + right body | General purpose, tech roles | Dark grey sidebar + blue bar headings |
|
||||
| B | Dark header banner + single column | Content-heavy / senior candidates | Dark blue header + underline headings |
|
||||
| C | Left sidebar + vertical-line headings | International / bilingual / foreign companies | Blue sidebar + left-border headings |
|
||||
|
||||
**Selection logic:**
|
||||
- Default: Template A
|
||||
- Lots of content (expected > 1 page) → Template B (no sidebar, better space utilization)
|
||||
- User explicitly requests bilingual / English → Template C
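The selection logic can be combined with the ATS note in one function. When several conditions hold, this sketch assumes explicit user requests (ATS priority, bilingual/English) win over the inferred content-length rule. `selectResumeTemplate` and its option names are hypothetical.

```js
// Explicit requests first (ATS priority -> B, bilingual/English -> C),
// then the inferred rule (expected > 1 page -> B), otherwise the default A.
function selectResumeTemplate({ atsPriority = false, bilingualOrEnglish = false, expectedPages = 1 } = {}) {
  if (atsPriority) return "B";        // single column, least table dependency
  if (bilingualOrEnglish) return "C";
  if (expectedPages > 1) return "B";  // no sidebar, better space utilization
  return "A";
}
```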
|
||||
|
||||
### Industry Color Suggestions
|
||||
|
||||
| Career Direction | Sidebar BG | Accent Color | Recommended Template |
|
||||
|-----------------|-----------|-------------|---------------------|
|
||||
| Tech / Internet | `#1A1F36` (deep blue-purple) | `#667eea` (amethyst) | A or C |
|
||||
| Finance / Consulting | `#0F2027` (deep sea blue) | `#D4AF37` (gold) | A or B |
|
||||
| Design / Creative | `#2D1B30` (deep purple) | `#f5576c` (coral pink) | A or C |
|
||||
| Education / Training | `#1A3A3A` (dark green) | `#3CB4A0` (mint green) | A |
|
||||
| Medical / Health | `#0E2030` (dark cyan) | `#3888A8` (medical blue) | B |
|
||||
| General / Default | `#303030` (warm dark neutral) | `#B89870` (warm accent) | A |
|
||||
|
||||
When industry is unspecified, use default warm neutral palette. This aligns with the Visual Profile warm-neutral guidance in `design-system.md`.
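The industry table can be kept as a lookup with the default palette as fallback. Colors are written without `#`, matching the palette constants in the templates; the industry keys and `paletteFor` are hypothetical.

```js
// Values copied from the Industry Color Suggestions table.
const INDUSTRY_PALETTES = {
  tech:      { sidebar: "1A1F36", accent: "667eea" },
  finance:   { sidebar: "0F2027", accent: "D4AF37" },
  design:    { sidebar: "2D1B30", accent: "f5576c" },
  education: { sidebar: "1A3A3A", accent: "3CB4A0" },
  medical:   { sidebar: "0E2030", accent: "3888A8" },
  default:   { sidebar: "303030", accent: "B89870" },
};

// Unknown or missing industries fall back to the warm neutral default.
function paletteFor(industry) {
  return Object.prototype.hasOwnProperty.call(INDUSTRY_PALETTES, industry)
    ? INDUSTRY_PALETTES[industry]
    : INDUSTRY_PALETTES.default;
}
```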
|
||||
|
||||
## Key Rules
|
||||
|
||||
- **NO cover page / NO TOC**
|
||||
- **Target: 1 page**; go longer only as Length Control allows (1–2 pages for 3–10 years, more for senior or academic CVs)
|
||||
- **Compact spacing**: `line: 276` (1.15x)
|
||||
- All templates use **bilingual section headings** (e.g., "Work Experience 工作经历")
|
||||
|
||||
---
|
||||
|
||||
## Template A: Left Sidebar + Color Bar Headings
|
||||
|
||||
### Color Palette
|
||||
```js
|
||||
const S = {
|
||||
bg: "3B4F5C", // sidebar background (dark grey-blue)
|
||||
text: "D8E2E8", // sidebar text
|
||||
label: "8BA0AD", // sidebar secondary text
|
||||
accent: "2F97B8", // accent color (blue-cyan)
|
||||
title: "1A2D38", // body heading
|
||||
body: "2C3E4A", // body content
|
||||
sec: "6B8592", // secondary info (dates etc.)
|
||||
};
|
||||
```
|
||||
|
||||
### Layout Structure
|
||||
```
|
||||
┌──────────┬──────────────────────┐
|
||||
│ [Photo] │ ██ Profile ██ │ ← Blue bar heading
|
||||
│ │ Summary text... │
|
||||
│ Name │ │
|
||||
│ Title │ ██ Work Experience ██│
|
||||
│ │ Company Role Date │
|
||||
│ ──────── │ ▸ Achievement... │
|
||||
│ Basic │ ▸ Achievement... │
|
||||
│ Info │ │
|
||||
│ │ ██ Projects ██ │
|
||||
│ ──────── │ ... │
|
||||
│ Contact │ │
|
||||
│ │ ██ Education ██ │
|
||||
│ ──────── │ ... │
|
||||
│ Skills │ │
|
||||
│ Java ●●●●○│ │
|
||||
│ Go ●●●○○│ │
|
||||
│ │ │
|
||||
│ ──────── │ │
|
||||
│ Certs │ │
|
||||
└──────────┴──────────────────────┘
|
||||
30% 70%
|
||||
```
|
||||
|
||||
### Implementation Notes
|
||||
|
||||
**Page setup:**
|
||||
```js
|
||||
page: { margin: { top: 0, bottom: 0, left: 0, right: 0 } }
|
||||
// Use Table to simulate columns: columnWidths: [3400, 8506]
|
||||
// ⚠️ Row height must use "exact" with safety margin to prevent overflow blank pages
|
||||
// Row height: height: { value: 16038, rule: "exact" }
|
||||
// 16038 = 16838(A4 height) - 1200(safety margin for cross-engine compatibility)
|
||||
```
|
||||
|
||||
**Sidebar element order:**
|
||||
1. Photo placeholder (rectangle + border, 2400 DXA wide, 1800 DXA high)
|
||||
2. Name (32pt bold white SimHei) + Title (18pt accent)
|
||||
3. Basic info (DOB / degree / school)
|
||||
4. Contact info (phone / email / address)
|
||||
5. Skill ratings (name + ●○ dot rating, 5 levels each)
|
||||
6. Certificates list
|
||||
|
||||
**Right-side section headings (color bar style):**
|
||||
```js
// Full-width bar background + white Chinese text + lighter English text
new Table({ columnWidths:[7600], rows:[new TableRow({ children:[
  new TableCell({
    shading: { fill: S.accent, type: ShadingType.CLEAR },
    margins: { top:40, bottom:40, left:200, right:100 },
    children: [new Paragraph({ children: [
      new TextRun({ text: "工作经历 ", size:22, bold:true, color:"FFFFFF", font:"SimHei" }),
      new TextRun({ text: "Experience", size:18, color:"C8E8F0", font:"Times New Roman", italics:true }),
    ] })],
  })
] })] });
```
|
||||
|
||||
**Experience entry format:**
|
||||
```js
|
||||
// Line 1: Company(bold) + Title(accent) + Date(right-aligned)
|
||||
new Paragraph({
|
||||
tabStops: [{ type: TabStopType.RIGHT, position: 7200 }],
|
||||
children: [
|
||||
new TextRun({ text: "Company Name", size:22, bold:true, color:S.title }),
|
||||
new TextRun({ text: " Role Title", size:20, color:S.accent }),
|
||||
new TextRun({ text: "\t2023.06 — Present", size:17, color:S.sec }),
|
||||
]
|
||||
});
|
||||
// Line 2+: ▸ bullet points
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## Template B: Dark Header Banner + Single Column
|
||||
|
||||
### Color Palette
|
||||
```js
|
||||
const C = {
|
||||
dark: "1A3352", // header background (dark blue)
|
||||
accent: "2980B9", // accent color
|
||||
title: "1A2636", // heading
|
||||
body: "2C3E50", // body text
|
||||
sec: "6B8599", // secondary info
|
||||
light: "E8EFF5", // light background
|
||||
};
|
||||
```
|
||||
|
||||
### Layout Structure
|
||||
```
|
||||
┌────────────────────────────────┐
|
||||
│ ██████████████████████████████ │ ← Dark blue background banner
|
||||
│ █ Name Title █ │ Contains name / title /
|
||||
│ █ Phone | Email | Location █ │ contact / basic info
|
||||
│ █ DOB | Degree | School █ │
|
||||
│ ██████████████████████████████ │
|
||||
│ │
|
||||
│ Profile │ ← Underline heading
|
||||
│ ───────────────────────────── │
|
||||
│ Summary text... │
|
||||
│ │
|
||||
│ Work Experience │
|
||||
│ ───────────────────────────── │
|
||||
│ Company | Role Date │
|
||||
│ • Achievement... │
|
||||
│ ... │
|
||||
│ │
|
||||
│ Skills │
|
||||
│ ───────────────────────────── │
|
||||
│ Programming ●●●●○ Java/Go/...│ ← Rating + details
|
||||
└────────────────────────────────┘
|
||||
```
|
||||
|
||||
### Implementation Notes
|
||||
|
||||
**Header banner:**
|
||||
```js
// Table single row single column, dark background, height 2400 DXA
new Table({ columnWidths:[11906], rows:[new TableRow({
  height: { value:2400, rule:"exact" },
  children:[new TableCell({
    shading: { fill: C.dark, type: ShadingType.CLEAR },
    margins: { top:300, bottom:200, left:800, right:800 },
    verticalAlign: VerticalAlign.TOP, // Never use CENTER in exact-height rows (WPS incompatible)
    children: [
      // Line 1: Name (24pt white, size:48) + Title
      // Line 2: Phone | Email | Location
      // Line 3: DOB | Degree | School
    ]
  })]
})] });
```
|
||||
|
||||
**Section headings (underline style):**
|
||||
```js
new Paragraph({
  // Paragraph borders use the `border` option (`borders` is the TableCell option)
  border: { bottom: { style: BorderStyle.SINGLE, size: 2, color: C.accent } },
  children: [
    new TextRun({ text: "工作经历", size:24, bold:true, color:C.accent, font:"SimHei" }),
    new TextRun({ text: " Experience", size:18, color:C.sec, italics:true }),
  ]
});
```
|
||||
|
||||
**Skills display (rating + details):**
|
||||
```js
|
||||
// Name(bold) + ●○ rating + specific tools list
|
||||
new Paragraph({ children: [
|
||||
new TextRun({ text: "Programming ", size:19, bold:true, color:C.title }),
|
||||
new TextRun({ text: "●●●●○ ", size:13, color:C.accent }),
|
||||
new TextRun({ text: "Java / Go / Python / TypeScript", size:18, color:C.sec }),
|
||||
] });
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## Template C: Blue Sidebar + Vertical-Line Headings
|
||||
|
||||
### Color Palette
|
||||
```js
|
||||
const C = {
|
||||
side: "4A7C8F", // sidebar background (teal-blue)
|
||||
text: "FFFFFF", // sidebar text
|
||||
label: "A0C4D0", // sidebar secondary text
|
||||
accent: "357A8F", // accent color
|
||||
dot: "2F8FAD", // skill dot fill color
|
||||
dotDim: "B8D4DE", // skill dot empty color
|
||||
title: "1A3040", // body heading
|
||||
body: "2C4050", // body content
|
||||
sec: "6B8A98", // secondary info
|
||||
};
|
||||
```
|
||||
|
||||
### Sidebar-Specific Elements
|
||||
|
||||
**Circular photo placeholder:**
|
||||
```js
|
||||
new Paragraph({ alignment: AlignmentType.CENTER,
|
||||
children: [new TextRun({ text: "◯", size:80, color:C.label })]
|
||||
});
|
||||
```
|
||||
|
||||
**Language proficiency matrix:**
|
||||
```js
|
||||
"English ● ● ● ● ○"
|
||||
"Japanese ● ● ○ ○ ○"
|
||||
```
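To color filled and empty dots differently, as the Skill Rating Rules require, each language line can be split into separate runs. A minimal sketch with a hypothetical `languageRuns` helper; the palette subset mirrors Template C's `text`, `dot`, and `dotDim` keys, and the name size (18) is an assumption, since the font table does not specify it.

```js
// Hypothetical stand-in for Template C's palette (subset).
const C = { text: "FFFFFF", dot: "2F8FAD", dotDim: "B8D4DE" };

// Language name, then filled dots in C.dot and empty dots in C.dotDim,
// spaced as "● ● ● ● ○". Returns TextRun option objects.
function languageRuns(language, level, max = 5) {
  const filled = Math.max(0, Math.min(max, Math.round(level)));
  const runs = [{ text: `${language}  `, size: 18, color: C.text }];
  if (filled > 0) {
    runs.push({ text: Array(filled).fill("●").join(" "), size: 13, color: C.dot });
  }
  if (filled < max) {
    // Leading space keeps the gap between the last filled and first empty dot.
    const empty = Array(max - filled).fill("○").join(" ");
    runs.push({ text: (filled > 0 ? " " : "") + empty, size: 13, color: C.dotDim });
  }
  return runs;
}

console.log(languageRuns("English", 4).map((r) => r.text).join(""));
console.log(languageRuns("Japanese", 2).map((r) => r.text).join(""));
```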
|
||||
|
||||
**Right-side section headings (left-border style):**
|
||||
```js
|
||||
new Paragraph({
|
||||
borders: { left: { style: BorderStyle.SINGLE, size:8, color:C.accent, space:8 } },
|
||||
indent: { left: 120 },
|
||||
children: [
|
||||
new TextRun({ text: "Work Experience", size:24, bold:true, color:C.title, font:"SimHei" }),
|
||||
new TextRun({ text: " Experience", size:18, color:C.sec, italics:true }),
|
||||
]
|
||||
});
|
||||
```
|
||||
|
||||
**Experience entry format (differs from A):**
|
||||
```js
|
||||
// Line 1: Company name (bold)
|
||||
// Line 2: Role (accent color) + Date
|
||||
// Line 3+: ▸ bullet points
|
||||
```
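A minimal sketch of this three-part entry, using a hypothetical `experienceParagraphs` helper and an illustrative entry shape (`company`, `role`, `start`, `end`, `bullets`). It returns one array of `TextRun` option objects per paragraph, sized per the Font Specifications table; the sample values are placeholders, not real data.

```js
// Hypothetical stand-in for Template C's body colors.
const C = { title: "1A3040", accent: "357A8F", body: "2C4050", sec: "6B8A98" };

// One array of TextRun option objects per paragraph. Sizes follow the
// font table: company 22, role 20, date 17, bullets 19 (half-points).
function experienceParagraphs(entry) {
  const paragraphs = [
    // Line 1: company name (bold)
    [{ text: entry.company, size: 22, bold: true, color: C.title }],
    // Line 2: role (accent color) + date range (secondary color)
    [
      { text: entry.role, size: 20, color: C.accent },
      { text: `  ${entry.start} - ${entry.end}`, size: 17, color: C.sec },
    ],
  ];
  // Line 3+: one "▸" bullet per achievement
  for (const bullet of entry.bullets) {
    paragraphs.push([{ text: `▸ ${bullet}`, size: 19, color: C.body }]);
  }
  return paragraphs;
}

// Placeholder entry for illustration only.
const paras = experienceParagraphs({
  company: "Company Name",
  role: "Role Title",
  start: "2021.03",
  end: "Present",
  bullets: ["Action + object + method + result", "Action + object + method + result"],
});
console.log(paras.map((runs) => runs.map((r) => r.text).join("")).join("\n"));
```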
|
||||
|
||||
---
|
||||
|
||||
## Universal Rules
|
||||
|
||||
### Font Specifications
|
||||
| Element | Font | Size | Style |
|
||||
|---------|------|------|-------|
|
||||
| Name (sidebar) | SimHei | 32pt (size:64) | Bold, white |
|
||||
| Name (header) | SimHei | 24pt (size:48) | Bold, white |
|
||||
| Section heading | SimHei | 11pt (size:22) | Bold |
|
||||
| Company / School | Microsoft YaHei | 11pt (size:22) | Bold |
|
||||
| Role title | Microsoft YaHei | 10pt (size:20) | accent color |
|
||||
| Date range | Microsoft YaHei | 8.5pt (size:17) | sec color |
|
||||
| Bullet description | Microsoft YaHei | 9.5pt (size:19) | body color |
|
||||
| Skill dots | Default | 6.5pt (size:13) | accent / dimColor |
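Because `docx` measures `TextRun` sizes in half-points, keeping the table as data avoids hand-converting each value. A sketch under that assumption; `FONT_SPEC`, `halfPoints`, and `runOptions` are hypothetical names, and colors are left to the caller.

```js
// The font table above as data; `pt` is the point size from the table.
const FONT_SPEC = {
  nameSidebar: { font: "SimHei", pt: 32, bold: true },
  nameHeader: { font: "SimHei", pt: 24, bold: true },
  sectionHeading: { font: "SimHei", pt: 11, bold: true },
  company: { font: "Microsoft YaHei", pt: 11, bold: true },
  role: { font: "Microsoft YaHei", pt: 10 },
  date: { font: "Microsoft YaHei", pt: 8.5 },
  bullet: { font: "Microsoft YaHei", pt: 9.5 },
  skillDots: { pt: 6.5 }, // no font -> document default
};

// TextRun `size` is in half-points: pt × 2.
const halfPoints = (pt) => Math.round(pt * 2);

// Merge a table row into TextRun options (text and color supplied by caller).
function runOptions(key, text, color) {
  const { font, pt, bold = false } = FONT_SPEC[key];
  return { text, font, size: halfPoints(pt), bold, color };
}

console.log(runOptions("date", "2021.03 - Present", "6B8A98"));
```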
|
||||
|
||||
### Bullet Symbols
|
||||
- Template A / C: `▸` (small triangle)
|
||||
- Template B: `•` (round dot)
|
||||
|
||||
### Skill Rating Rules
|
||||
- 1–5 levels using filled ● and empty ○ dots
|
||||
- One skill per line, name on the left, dots on the right
|
||||
- Filled dot color: accent (`C.dot` in Template C); empty dot color: dimColor (`C.dotDim` in Template C)
|
||||
|
||||
### JD Matching Logic
|
||||
When the user provides a job description (JD):
|
||||
1. Extract key requirements (skills, experience, education)
|
||||
2. Prioritize matching experience items to the top
|
||||
3. Naturally incorporate JD keywords into descriptions
|
||||
4. Highlight relevant skills
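Step 2 can be approximated with a keyword-overlap score. The sketch below (hypothetical `rankBulletsByJd`) reorders bullets *within* one experience entry, so entries themselves can stay in reverse chronological order as the checklist requires. The case-insensitive substring matching is deliberately naive: it has no word boundaries or synonyms, so "Go" would also match "Google".

```js
// Score each bullet by how many JD keywords it contains, then sort
// by score (descending); ties keep their original order.
function rankBulletsByJd(bullets, jdKeywords) {
  const keywords = jdKeywords.map((k) => k.toLowerCase());
  return bullets
    .map((text, index) => {
      const lower = text.toLowerCase();
      const score = keywords.filter((k) => lower.includes(k)).length;
      return { text, score, index };
    })
    .sort((a, b) => b.score - a.score || a.index - b.index)
    .map((item) => item.text);
}

const ranked = rankBulletsByJd(
  ["Mentored two interns", "Built Kafka pipelines in Go", "Maintained CI scripts"],
  ["Go", "Kafka", "CI"],
);
console.log(ranked);
```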
|
||||
|
||||
### Multi-Page Handling
|
||||
|
||||
- Content fits on 1 page: sidebar templates (A/C) or the single-column template (B)
- Content exceeds 1 page: prefer Template B; if using A/C, switch page 2 to a full-width layout with a name bar at the top (Name | Title)
|
||||
|
||||
⚠️ **Multi-page resumes must use multi-section structure:**
|
||||
|
||||
Page 1 and Page 2 must be **different sections** for independent margin and layout control:
|
||||
|
||||
```js
|
||||
sections: [
|
||||
{
|
||||
// Page 1 section — margin 0 (sidebar layout needs full-page)
|
||||
properties: { page: { margin: { top: 0, bottom: 0, left: 0, right: 0 } } },
|
||||
children: [page1Table],
|
||||
},
|
||||
{
|
||||
// Page 2 section — normal margins with header bar
|
||||
properties: { page: { margin: { top: 800, bottom: 600, left: 800, right: 800 } } },
|
||||
children: [pageHeader(name, title), ...page2Content],
|
||||
},
|
||||
]
|
||||
```
|
||||
|
||||
⚠️ **Template B multi-page handling:**
|
||||
|
||||
Template B simulates its header banner with a Table:
|
||||
1. Banner `columnWidths` must equal **page content area width** (pageWidth - marginLeft - marginRight), not full page width
|
||||
2. If the banner must span the full page width → set the page 1 section margin to 0 and the banner columnWidths to 11906 (A4 width in twips)
3. Page 2+ must be independent sections with margin.top ≥ 800
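The width arithmetic can be checked in code. A sketch assuming A4 portrait (11906 × 16838 twips); `contentWidth` and `columnWidths` are hypothetical helpers, and the last column absorbs rounding so the widths always sum to the content width.

```js
// A4 portrait in twips (1 inch = 1440 twips).
const A4 = { width: 11906, height: 16838 };

// Content-area width = page width minus left and right margins.
function contentWidth(margins, page = A4) {
  return page.width - margins.left - margins.right;
}

// Split a total width into columns by ratio; the last column takes the
// rounding remainder so the sum matches `total` exactly.
function columnWidths(total, ratios) {
  const sum = ratios.reduce((a, b) => a + b, 0);
  const cols = ratios.map((r) => Math.floor((total * r) / sum));
  cols[cols.length - 1] += total - cols.reduce((a, b) => a + b, 0);
  return cols;
}

console.log(contentWidth({ left: 800, right: 800 }));
console.log(columnWidths(contentWidth({ left: 0, right: 0 }), [3, 7]));
```

The result is what a Table's `columnWidths` option would receive; a banner built this way cannot overflow the content area.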
|
||||
|
||||
⚠️ **Page 2+ top spacing rules (mandatory):**
|
||||
|
||||
1. **Page margin.top must be ≥ 800 twips** (~1.4 cm), never 0
|
||||
2. **Page 2+ needs a header info bar:** concise `Name | Title` bar, height ~400–600 twips, separated from body with light background or bottom line
|
||||
3. **200–300 twips spacing between header bar and body content**
|
||||
4. **Forbidden: content touching the very top of page 2**
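These rules can be verified before rendering. A minimal sketch with a hypothetical `checkPage2Layout` validator; it assumes the header-to-body gap is expressed as the first body paragraph's `spacing.before`, and it treats the ~400-600 header height as a hard range.

```js
// Validate a page-2+ layout (all values in twips); returns a list of problems.
function checkPage2Layout({ marginTop, headerBarHeight, headerToBodyGap }) {
  const problems = [];
  // Rules 1 and 4: a real top margin, never 0
  if (!(marginTop >= 800)) problems.push(`margin.top is ${marginTop}; must be >= 800 twips`);
  // Rule 2: a Name | Title header bar of roughly 400-600 twips
  if (headerBarHeight == null) {
    problems.push("missing Name | Title header bar");
  } else if (headerBarHeight < 400 || headerBarHeight > 600) {
    problems.push(`header bar height ${headerBarHeight} outside 400-600 twips`);
  }
  // Rule 3: 200-300 twips between header bar and body
  if (!(headerToBodyGap >= 200 && headerToBodyGap <= 300)) {
    problems.push(`header-to-body gap ${headerToBodyGap} outside 200-300 twips`);
  }
  return problems;
}

console.log(checkPage2Layout({ marginTop: 800, headerBarHeight: 500, headerToBodyGap: 240 }));
console.log(checkPage2Layout({ marginTop: 0, headerBarHeight: null, headerToBodyGap: 0 }));
```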
|
||||
|
||||
```js
|
||||
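// Concise header bar for page 2+
// Assumes NB is a no-border spec defined once at top level, e.g. { style: BorderStyle.NONE, size: 0, color: "FFFFFF" }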
|
||||
function pageHeader(name, title) {
|
||||
return new Table({
|
||||
width: { size: 100, type: WidthType.PERCENTAGE },
|
||||
borders: { top: NB, left: NB, right: NB, insideHorizontal: NB, insideVertical: NB,
|
||||
bottom: { style: BorderStyle.SINGLE, size: 1, color: "D0D0D0" } },
|
||||
rows: [new TableRow({
|
||||
cantSplit: true,
|
||||
height: { value: 500, rule: "atLeast" },
|
||||
children: [new TableCell({
|
||||
margins: { top: 60, bottom: 60, left: 200, right: 200 },
|
||||
borders: { top: NB, left: NB, right: NB, bottom: NB },
|
||||
children: [new Paragraph({
|
||||
children: [
|
||||
new TextRun({ text: name, size: 20, bold: true, color: C.title }),
new TextRun({ text: ` | ${title}`, size: 18, color: C.sec }),
|
||||
]
|
||||
})],
|
||||
})],
|
||||
})],
|
||||
});
|
||||
}
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## Scene-Specific Quality Checks
|
||||
|
||||
In addition to universal checks (see `references/common-rules.md`):
|
||||
|
||||
### Format
|
||||
- [ ] Fits within 1 page (senior ≤ 2 pages)
|
||||
- [ ] **Single-page fill rate ≥ 85%** (bottom whitespace ≤ 15%, ~2500 twips)
|
||||
- [ ] Section headings are bilingual
|
||||
- [ ] Skill rating dots correct (●○)
|
||||
- [ ] Experience in reverse chronological order
|
||||
- [ ] No cover page, no TOC
|
||||
- [ ] Line spacing 1.15x (line: 276)
|
||||
- [ ] No extra blank pages
|
||||
- [ ] **Table row height uses `rule: "exact"` with value ≤ 16038** (prevent overflow blank pages)
|
||||
- [ ] **Multi-page: page 2+ has header info bar + proper top spacing**
|
||||
|
||||
### Content
|
||||
- [ ] Clearly organized around target position
|
||||
- [ ] No vague self-assessments ("hardworking", "responsible", "team player")
|
||||
- [ ] No fabricated data or exaggerated results
|
||||
- [ ] Most relevant experience placed first and detailed
|
||||
- [ ] Each bullet demonstrates action + object + method + result
|
||||
- [ ] No long narrative blocks, overly long sentences, or uneven information density
|
||||
- [ ] Section names use standard titles (e.g. Education, Work Experience, Skills)
|
||||
- [ ] Contact info is plain text, clearly positioned
|
||||
- [ ] Header area forms the visual focal point
- [ ] Work experience and projects form the visual main body
|
||||
- [ ] Page count matches candidate seniority
|
||||
|
||||
### Single-Page Fill Rules
|
||||
|
||||
Single-page resumes must fully utilize page space — **large bottom whitespace is forbidden:**
|
||||
|
||||
1. If content is insufficient → **proactively expand:**
   - Add project details, skill keywords, and achievement data (from the user's actual experience; never invent figures)
   - Add supplementary modules: profile summary, interests, awards
2. Use section spacing (`spacing.before/after`) to **distribute content evenly**
3. Sidebar templates (A/C): sidebar height should approach the full page
   - If sidebar content is sparse, increase element spacing
   - Or add supplementary modules: "Languages", "Interests"
|
||||
4. Verification: after generation, check where the last content element ends; if it is more than 2500 twips above the page bottom, adjust spacing or expand content
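Step 4 reduces to simple arithmetic once the bottom of the last element is known. A sketch, assuming that position is measured from a rendered preview (the `docx` package writes XML and does not compute layout); `fillCheck` is a hypothetical helper using the A4 page height.

```js
const A4_HEIGHT = 16838; // twips (297 mm)

// lastContentBottom: twips from the page top to the bottom of the last
// content element, as measured on the rendered page.
function fillCheck(lastContentBottom, pageHeight = A4_HEIGHT, maxGap = 2500) {
  const gap = pageHeight - lastContentBottom;
  return {
    gap,
    fillRate: lastContentBottom / pageHeight,
    needsAdjustment: gap > maxGap,
  };
}

console.log(fillCheck(14000));
console.log(fillCheck(15000));
```

A 2500-twip gap corresponds to roughly 15% of the A4 height, which is why it pairs with the ≥ 85% fill-rate check.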
|
||||
1
skills/docx/scripts/__init__.py
Executable file
@@ -0,0 +1 @@
|
||||
# Make scripts directory a package for relative imports in tests
|
||||
749
skills/docx/scripts/add_toc_placeholders.py
Executable file
@@ -0,0 +1,749 @@
|
||||
#!/usr/bin/env python3
|
||||
"""
|
||||
Add placeholder entries to Table of Contents in a DOCX file.
|
||||
|
||||
This script adds placeholder TOC entries between the 'separate' and 'end'
|
||||
field characters, so users see some content on first open instead of an empty TOC.
|
||||
The original file is replaced with the modified version.
|
||||
|
||||
Usage:
|
||||
python add_toc_placeholders.py <docx_file> # auto-extract headings (default)
|
||||
python add_toc_placeholders.py <docx_file> --auto # explicit auto mode
|
||||
python add_toc_placeholders.py <docx_file> --entries <entries_json>
|
||||
|
||||
entries_json format: JSON string with array of objects:
|
||||
[
|
||||
{"level": 1, "text": "Chapter 1 Overview", "page": "1"},
|
||||
{"level": 2, "text": "Section 1.1 Details", "page": "1"}
|
||||
]
|
||||
|
||||
Default behavior (no flags): auto-extracts Heading 1-3 from the document.
|
||||
Filters out table/figure captions (e.g. "表 1:xxx" for "Table 1: xxx", "图 2:xxx" for "Figure 2: xxx").
|
||||
|
||||
Example:
|
||||
python add_toc_placeholders.py document.docx
|
||||
python add_toc_placeholders.py document.docx --auto
|
||||
python add_toc_placeholders.py document.docx --entries '[{"level":1,"text":"Introduction","page":"1"}]'
|
||||
"""
|
||||
|
||||
import argparse
|
||||
import html
|
||||
import json
|
||||
import re
|
||||
import shutil
|
||||
import sys
|
||||
import tempfile
|
||||
import zipfile
|
||||
from pathlib import Path
|
||||
|
||||
|
||||
def _extract_headings_from_docx(docx_path: str, max_level: int = 3) -> list:
|
||||
"""Extract headings from a DOCX file for auto-mode TOC generation.
|
||||
|
||||
Args:
|
||||
docx_path: Path to DOCX file
|
||||
max_level: Maximum heading level to include (default 3)
|
||||
|
||||
Returns:
|
||||
List of dicts with 'level', 'text', 'page' keys
|
||||
"""
|
||||
from docx import Document
|
||||
|
||||
doc = Document(docx_path)
|
||||
entries = []
|
||||
page_estimate = 1
|
||||
|
||||
# Pattern to filter out table/figure captions styled as headings
|
||||
caption_pattern = re.compile(r'^[表图]\s*\d')
|
||||
|
||||
for i, para in enumerate(doc.paragraphs):
|
||||
style_name = para.style.name if para.style else ''
|
||||
if not style_name.startswith('Heading'):
|
||||
continue
|
||||
m = re.search(r'(\d+)', style_name)
|
||||
if not m:
|
||||
continue
|
||||
level = int(m.group(1))
|
||||
if level > max_level:
|
||||
continue
|
||||
text = para.text.strip()
|
||||
if not text:
|
||||
continue
|
||||
# Filter table/figure captions
|
||||
if caption_pattern.match(text):
|
||||
continue
|
||||
|
||||
# Rough page estimate: increment every ~8 headings
|
||||
page_estimate = 1 + len(entries) // 8
|
||||
entries.append({"level": level, "text": text, "page": str(page_estimate)})
|
||||
|
||||
return entries
|
||||
|
||||
|
||||
def add_toc_placeholders(docx_path: str, entries: list = None) -> None:
|
||||
"""Add placeholder TOC entries to a DOCX file (in-place replacement).
|
||||
|
||||
Args:
|
||||
docx_path: Path to DOCX file (will be modified in-place)
|
||||
entries: Optional list of placeholder entries. Each entry should be a dict
|
||||
with 'level' (1-3), 'text', and 'page' keys.
|
||||
"""
|
||||
docx_path = Path(docx_path)
|
||||
|
||||
# Create temp directory for extraction
|
||||
with tempfile.TemporaryDirectory() as temp_dir:
|
||||
temp_path = Path(temp_dir)
|
||||
extracted_dir = temp_path / "extracted"
|
||||
temp_output = temp_path / "output.docx"
|
||||
|
||||
# Extract DOCX
|
||||
with zipfile.ZipFile(docx_path, 'r') as zip_ref:
|
||||
zip_ref.extractall(extracted_dir)
|
||||
|
||||
# Ensure TOC styles exist in styles.xml
|
||||
styles_xml_path = extracted_dir / "word" / "styles.xml"
|
||||
toc_style_mapping = _ensure_toc_styles(styles_xml_path)
|
||||
print(f"TOC style mapping: {toc_style_mapping}")
|
||||
|
||||
# Fix settings.xml: ensure updateFields has val="true"
|
||||
settings_xml_path = extracted_dir / "word" / "settings.xml"
|
||||
_fix_update_fields(settings_xml_path)
|
||||
|
||||
# Fix Heading styles: ensure outlineLvl is set (required for TOC field update)
|
||||
_fix_heading_outline_levels(styles_xml_path)
|
||||
|
||||
# Process document.xml
|
||||
document_xml = extracted_dir / "word" / "document.xml"
|
||||
if not document_xml.exists():
|
||||
raise ValueError("document.xml not found in the DOCX file")
|
||||
|
||||
# Read and process XML
|
||||
content = document_xml.read_text(encoding='utf-8')
|
||||
|
||||
# Fix fldChar structure: split merged begin+instrText+separate into separate <w:r> elements
|
||||
content = _fix_fld_char_structure(content)
|
||||
|
||||
# Find TOC structure and add placeholders (uses lxml for robust XML parsing)
|
||||
modified_content = _insert_toc_placeholders(content, entries, toc_style_mapping)
|
||||
|
||||
# Write back
|
||||
document_xml.write_text(modified_content, encoding='utf-8')
|
||||
|
||||
# Repack DOCX to temp file
|
||||
with zipfile.ZipFile(temp_output, 'w', zipfile.ZIP_DEFLATED) as zipf:
|
||||
for file_path in extracted_dir.rglob('*'):
|
||||
if file_path.is_file():
|
||||
arcname = file_path.relative_to(extracted_dir)
|
||||
zipf.write(file_path, arcname)
|
||||
|
||||
# Replace original file with modified version; shutil.move overwrites an existing destination (falling back to copy + delete across devices), so no separate unlink() is needed
|
||||
shutil.move(str(temp_output), str(docx_path))
|
||||
|
||||
|
||||
def _fix_update_fields(settings_xml_path: Path) -> None:
|
||||
"""Fix settings.xml to ensure <w:updateFields w:val="true"/> is present.
|
||||
|
||||
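The docx npm library generates <w:updateFields/> without an explicit val.
The OOXML spec treats a missing val on an on/off element as true, but some
consumers (reportedly WPS) do not, so TOC auto-update on open is only
reliable when val="true" is written explicitly.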
|
||||
"""
|
||||
if not settings_xml_path.exists():
|
||||
return
|
||||
|
||||
content = settings_xml_path.read_text(encoding='utf-8')
|
||||
original = content
|
||||
|
||||
# Case 1: <w:updateFields/> (self-closing, no val) → add val="true"
|
||||
if '<w:updateFields/>' in content:
|
||||
content = content.replace('<w:updateFields/>', '<w:updateFields w:val="true"/>')
|
||||
print('Fixed: <w:updateFields/> → <w:updateFields w:val="true"/>')
|
||||
|
||||
# Case 2: <w:updateFields w:val="false"/> → change to true (match precisely)
|
||||
elif re.search(r'<w:updateFields\s+w:val="false"\s*/>', content):
|
||||
content = re.sub(
|
||||
r'<w:updateFields\s+w:val="false"\s*/>',
|
||||
'<w:updateFields w:val="true"/>',
|
||||
content
|
||||
)
|
||||
print('Fixed: <w:updateFields w:val="false"/> → <w:updateFields w:val="true"/>')
|
||||
|
||||
# Case 3: Not present at all → inject before </w:settings>
|
||||
elif '<w:updateFields' not in content:
|
||||
content = content.replace('</w:settings>', '<w:updateFields w:val="true"/></w:settings>')
|
||||
print('Fixed: added <w:updateFields w:val="true"/> to settings.xml')
|
||||
|
||||
if content != original:
|
||||
settings_xml_path.write_text(content, encoding='utf-8')
|
||||
|
||||
|
||||
def _fix_heading_outline_levels(styles_xml_path: Path) -> None:
|
||||
"""Fix Heading styles to include outlineLvl in pPr.
|
||||
|
||||
The docx npm library creates Heading styles but sometimes doesn't set outlineLvl
|
||||
in the style definition. Without outlineLvl, Word's TOC field update won't find
|
||||
headings even though they display correctly.
|
||||
|
||||
This ensures Heading1 has outlineLvl=0, Heading2 has outlineLvl=1, etc.
|
||||
"""
|
||||
if not styles_xml_path.exists():
|
||||
return
|
||||
|
||||
content = styles_xml_path.read_text(encoding='utf-8')
|
||||
original = content
|
||||
|
||||
W_NS = 'http://schemas.openxmlformats.org/wordprocessingml/2006/main'
|
||||
|
||||
for level in range(1, 7):
|
||||
style_id = f'Heading{level}'
|
||||
outline_val = str(level - 1)
|
||||
|
||||
# Pattern: find <w:style> with w:styleId="HeadingN"
|
||||
style_pattern = (
|
||||
rf'(<w:style[^>]*w:styleId="{style_id}"[^>]*>)'
|
||||
rf'(.*?)'
|
||||
rf'(</w:style>)'
|
||||
)
|
||||
|
||||
match = re.search(style_pattern, content, flags=re.DOTALL)
|
||||
if not match:
|
||||
continue
|
||||
|
||||
style_content = match.group(2)
|
||||
|
||||
# Check if outlineLvl already exists in this style
|
||||
if f'<w:outlineLvl' in style_content:
|
||||
continue
|
||||
|
||||
# Find or create <w:pPr> within this style
|
||||
ppr_match = re.search(r'(<w:pPr[^>]*>)(.*?)(</w:pPr>)', style_content, flags=re.DOTALL)
|
||||
if ppr_match:
|
||||
# Add outlineLvl inside existing pPr
|
||||
new_ppr_content = ppr_match.group(2) + f'<w:outlineLvl w:val="{outline_val}"/>'
|
||||
new_style_content = (
|
||||
style_content[:ppr_match.start()] +
|
||||
ppr_match.group(1) + new_ppr_content + ppr_match.group(3) +
|
||||
style_content[ppr_match.end():]
|
||||
)
|
||||
else:
|
||||
# No pPr exists, create one
|
||||
new_ppr = f'<w:pPr><w:outlineLvl w:val="{outline_val}"/></w:pPr>'
|
||||
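# Insert pPr before the style-level <w:rPr> (CT_Style order: name, basedOn, ..., uiPriority, qFormat, then pPr, then rPr)
rpr_idx = style_content.find('<w:rPr')
new_style_content = (style_content + new_ppr) if rpr_idx == -1 else (style_content[:rpr_idx] + new_ppr + style_content[rpr_idx:])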
|
||||
|
||||
new_style = match.group(1) + new_style_content + match.group(3)
|
||||
content = content[:match.start()] + new_style + content[match.end():]
|
||||
print(f'Fixed: added outlineLvl={outline_val} to {style_id} style')
|
||||
|
||||
if content != original:
|
||||
styles_xml_path.write_text(content, encoding='utf-8')
|
||||
|
||||
|
||||
def _fix_fld_char_structure(xml_content: str) -> str:
|
||||
"""Fix malformed fldChar structure where begin+instrText+separate are in one <w:r>.
|
||||
|
||||
The docx npm library generates:
|
||||
<w:r><w:fldChar begin/><w:instrText>TOC...</w:instrText><w:fldChar separate/></w:r>
|
||||
|
||||
Word/WPS requires the standard structure:
|
||||
<w:r><w:fldChar begin/></w:r>
|
||||
<w:r><w:instrText>TOC...</w:instrText></w:r>
|
||||
<w:r><w:fldChar separate/></w:r>
|
||||
"""
|
||||
# Match a <w:r> that contains both begin fldChar AND instrText AND separate fldChar
|
||||
pattern = (
|
||||
r'<w:r(?:\s[^>]*)?>('
|
||||
r'<w:fldChar[^>]*w:fldCharType="begin"[^>]*/>' # begin
|
||||
r')('
|
||||
r'<w:instrText[^>]*>.*?</w:instrText>' # instrText
|
||||
r')('
|
||||
r'<w:fldChar[^>]*w:fldCharType="separate"[^>]*/>' # separate
|
||||
r')</w:r>'
|
||||
)
|
||||
|
||||
def split_run(match):
|
||||
begin = match.group(1)
|
||||
instr = match.group(2)
|
||||
separate = match.group(3)
|
||||
return f'<w:r>{begin}</w:r><w:r>{instr}</w:r><w:r>{separate}</w:r>'
|
||||
|
||||
modified = re.sub(pattern, split_run, xml_content, flags=re.DOTALL)
|
||||
if modified != xml_content:
|
||||
print("Fixed: split merged fldChar begin+instrText+separate into separate <w:r> elements")
|
||||
|
||||
# Fix TOC instrText: remove \t switch with wrong style names
|
||||
# docx npm lib generates \t "Heading1,1,Heading2,2,..." but Word expects "Heading 1,1,..."
|
||||
# Since we already have \o "1-3" which uses outlineLvl (now fixed), \t is redundant and harmful
|
||||
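# [^"<]* keeps the match inside the quoted \t argument; a greedy [^&]* could run past </w:instrText> into later XML
toc_t_pattern = r'(TOC\s+[^<]*?)\\t\s+"[^"<]*"'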
|
||||
modified2 = re.sub(toc_t_pattern, r'\1', modified)
|
||||
if modified2 != modified:
|
||||
print("Fixed: removed \\t switch from TOC instrText (\\o with outlineLvl is sufficient)")
|
||||
modified = modified2
|
||||
|
||||
return modified
|
||||
|
||||
|
||||
def _detect_toc_styles(styles_xml_path: Path) -> dict:
|
||||
"""Detect TOC style IDs from styles.xml.
|
||||
|
||||
Args:
|
||||
styles_xml_path: Path to styles.xml
|
||||
|
||||
Returns:
|
||||
Dictionary mapping level (1-3) to style ID string
|
||||
"""
|
||||
if not styles_xml_path.exists():
|
||||
return {}
|
||||
|
||||
content = styles_xml_path.read_text(encoding='utf-8')
|
||||
result = {}
|
||||
|
||||
for level in range(1, 4):
|
||||
# Standard TOC style names: "TOC 1", "TOC 2", "TOC 3" (with space)
|
||||
# or "TOC1", "TOC2", "TOC3" (no space) — docx-js uses numeric IDs like "9", "11", "12"
|
||||
patterns = [
|
||||
rf'w:styleId="(TOC{level})"',
|
||||
rf'w:styleId="(TOC {level})"',
|
||||
]
for pattern in patterns:
|
||||
m = re.search(pattern, content)
|
||||
if m:
|
||||
result[level] = m.group(1)
|
||||
break
|
||||
else:
|
||||
# Try matching by w:name (case insensitive toc N)
|
||||
# Find <w:style> blocks with name containing "toc N"
|
||||
name_pattern = rf'<w:style\b[^>]*w:styleId="([^"]*)"[^>]*>(?:(?!</w:style>).)*?<w:name\s+w:val="[Tt][Oo][Cc]\s*{level}"'
|
||||
m = re.search(name_pattern, content, flags=re.DOTALL)
|
||||
if m:
|
||||
result[level] = m.group(1)
|
||||
|
||||
return result
|
||||
|
||||
|
||||
def _ensure_toc_styles(styles_xml_path: Path) -> dict:
|
||||
"""Ensure TOC styles exist in styles.xml, adding them if necessary.
|
||||
|
||||
Returns:
|
||||
Dictionary mapping level (1-3) to style ID string
|
||||
"""
|
||||
if not styles_xml_path.exists():
|
||||
return {1: "9", 2: "11", 3: "12"}
|
||||
|
||||
content = styles_xml_path.read_text(encoding='utf-8')
|
||||
detected = _detect_toc_styles(styles_xml_path)
|
||||
result = dict(detected)
|
||||
|
||||
W_NS = 'http://schemas.openxmlformats.org/wordprocessingml/2006/main'
|
||||
|
||||
# Define TOC styles to add if missing
|
||||
toc_style_defs = {
|
||||
1: {
|
||||
'id': '9',
|
||||
'name': 'toc 1',
|
||||
'xml': f'''<w:style w:type="paragraph" w:styleId="9" xmlns:w="{W_NS}">
|
||||
<w:name w:val="toc 1"/>
|
||||
<w:basedOn w:val="Normal"/>
|
||||
<w:uiPriority w:val="39"/>
|
||||
<w:pPr>
|
||||
<w:tabs><w:tab w:val="right" w:leader="dot" w:pos="9026"/></w:tabs>
|
||||
<w:spacing w:before="120" w:after="60"/>
|
||||
</w:pPr>
|
||||
<w:rPr><w:b/><w:bCs/></w:rPr>
|
||||
</w:style>'''
|
||||
},
|
||||
2: {
|
||||
'id': '11',
|
||||
'name': 'toc 2',
|
||||
'xml': f'''<w:style w:type="paragraph" w:styleId="11" xmlns:w="{W_NS}">
|
||||
<w:name w:val="toc 2"/>
|
||||
<w:basedOn w:val="Normal"/>
|
||||
<w:uiPriority w:val="39"/>
|
||||
<w:pPr>
|
||||
<w:tabs><w:tab w:val="right" w:leader="dot" w:pos="9026"/></w:tabs>
|
||||
<w:ind w:left="360"/>
|
||||
<w:spacing w:before="60" w:after="40"/>
|
||||
</w:pPr>
|
||||
</w:style>'''
|
||||
},
|
||||
3: {
|
||||
'id': '12',
|
||||
'name': 'toc 3',
|
||||
'xml': f'''<w:style w:type="paragraph" w:styleId="12" xmlns:w="{W_NS}">
|
||||
<w:name w:val="toc 3"/>
|
||||
<w:basedOn w:val="Normal"/>
|
||||
<w:uiPriority w:val="39"/>
|
||||
<w:pPr>
|
||||
<w:tabs><w:tab w:val="right" w:leader="dot" w:pos="9026"/></w:tabs>
|
||||
<w:ind w:left="720"/>
|
||||
<w:spacing w:before="40" w:after="20"/>
|
||||
</w:pPr>
|
||||
</w:style>'''
|
||||
},
|
||||
}
|
||||
|
||||
modified = False
|
||||
for level in range(1, 4):
|
||||
if level not in result:
|
||||
style_def = toc_style_defs[level]
|
||||
result[level] = style_def['id']
|
||||
# Add style before </w:styles>
|
||||
insert_point = content.rfind('</w:styles>')
|
||||
if insert_point == -1:
|
||||
print(f"WARNING: Could not find </w:styles> to insert TOC {level} style", file=sys.stderr)
|
||||
continue
|
||||
content = content[:insert_point] + style_def['xml'] + '\n' + content[insert_point:]
|
||||
print(f"Added TOC {level} style (ID: {style_def['id']})")
|
||||
modified = True
|
||||
|
||||
if modified:
|
||||
styles_xml_path.write_text(content, encoding='utf-8')
|
||||
|
||||
# Ensure Hyperlink style exists
|
||||
_ensure_hyperlink_style(styles_xml_path)
|
||||
|
||||
return result
|
||||
|
||||
|
||||
def _ensure_hyperlink_style(styles_xml_path: Path) -> None:
|
||||
"""Ensure Hyperlink character style exists in styles.xml."""
|
||||
if not styles_xml_path.exists():
|
||||
return
|
||||
|
||||
content = styles_xml_path.read_text(encoding='utf-8')
|
||||
if 'w:styleId="Hyperlink"' in content:
|
||||
return
|
||||
|
||||
W_NS = 'http://schemas.openxmlformats.org/wordprocessingml/2006/main'
|
||||
hyperlink_style = f'''<w:style w:type="character" w:styleId="Hyperlink" xmlns:w="{W_NS}">
|
||||
<w:name w:val="Hyperlink"/>
|
||||
<w:uiPriority w:val="99"/>
|
||||
<w:rPr>
|
||||
<w:color w:val="0563C1"/>
|
||||
<w:u w:val="single"/>
|
||||
</w:rPr>
|
||||
</w:style>'''
|
||||
|
||||
insert_point = content.rfind('</w:styles>')
|
||||
if insert_point != -1:
|
||||
content = content[:insert_point] + hyperlink_style + '\n' + content[insert_point:]
|
||||
styles_xml_path.write_text(content, encoding='utf-8')
|
||||
print("Added Hyperlink character style")
|
||||
|
||||
|
||||
def _insert_toc_placeholders(xml_content: str, entries: list = None, toc_style_mapping: dict = None) -> str:
|
||||
"""Insert placeholder TOC entries and heading bookmarks into XML content.
|
||||
|
||||
Uses lxml ElementTree for robust XML manipulation instead of fragile regex.
|
||||
|
||||
This function does TWO things:
|
||||
1. Adds bookmark anchors to each Heading paragraph (so Word can link TOC → heading)
|
||||
2. Replaces TOC placeholder area with proper entries containing HYPERLINK + PAGEREF
|
||||
|
||||
Args:
|
||||
xml_content: The XML content of document.xml
|
||||
entries: List of placeholder entries with 'level', 'text', 'page' keys
|
||||
toc_style_mapping: Dictionary mapping level to style ID
|
||||
|
||||
Returns:
|
||||
Modified XML content with bookmarks and TOC placeholders
|
||||
|
||||
Raises:
|
||||
RuntimeError: If TOC structure cannot be found or is malformed
|
||||
"""
|
||||
from lxml import etree
|
||||
|
||||
if entries is None:
|
||||
entries = [{"level": 1, "text": "Contents", "page": "1"}]
|
||||
|
||||
if toc_style_mapping is None:
|
||||
toc_style_mapping = {1: "9", 2: "11", 3: "12"}
|
||||
|
||||
W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
|
||||
R_NS = "http://schemas.openxmlformats.org/officeDocument/2006/relationships"
|
||||
|
||||
# Parse XML
|
||||
root = etree.fromstring(xml_content.encode('utf-8'))
|
||||
nsmap = {'w': W, 'r': R_NS}
|
||||
|
||||
# ── Step 1: Add bookmarks to Heading paragraphs ──
|
||||
bookmark_id_counter = 100000
|
||||
heading_bookmark_map = {} # text → first bookmark_name (backward compat)
|
||||
heading_bookmark_map_all = {} # text → [list of bookmark_names] for duplicate headings
|
||||
|
||||
for para in root.iter(f'{{{W}}}p'):
|
||||
# Find pStyle
|
||||
ppr = para.find(f'{{{W}}}pPr')
|
||||
if ppr is None:
|
||||
continue
|
||||
pstyle = ppr.find(f'{{{W}}}pStyle')
|
||||
if pstyle is None:
|
||||
continue
|
||||
style_val = pstyle.get(f'{{{W}}}val', '')
|
||||
if not re.match(r'Heading\d$', style_val):
|
||||
continue
|
||||
|
||||
# Extract heading text
|
||||
texts = []
|
||||
for t_elem in para.iter(f'{{{W}}}t'):
|
||||
if t_elem.text:
|
||||
texts.append(t_elem.text)
|
||||
heading_text = ''.join(texts).strip()
|
||||
if not heading_text:
|
||||
continue
|
||||
|
||||
# Skip if already has bookmark
|
||||
if para.find(f'{{{W}}}bookmarkStart') is not None:
|
||||
continue
|
||||
|
||||
# Generate bookmark
|
||||
bm_name = f"_Toc{bookmark_id_counter}"
|
||||
bm_id_str = str(bookmark_id_counter)
|
||||
bookmark_id_counter += 1
|
||||
|
||||
# Store mapping (support duplicate headings)
|
||||
if heading_text not in heading_bookmark_map_all:
|
||||
heading_bookmark_map_all[heading_text] = []
|
||||
heading_bookmark_map_all[heading_text].append(bm_name)
|
||||
if heading_text not in heading_bookmark_map:
|
||||
heading_bookmark_map[heading_text] = bm_name
|
||||
|
||||
# Insert bookmarkStart after pPr
|
||||
bm_start = etree.Element(f'{{{W}}}bookmarkStart')
|
||||
bm_start.set(f'{{{W}}}id', bm_id_str)
|
||||
bm_start.set(f'{{{W}}}name', bm_name)
|
||||
|
||||
bm_end = etree.Element(f'{{{W}}}bookmarkEnd')
|
||||
bm_end.set(f'{{{W}}}id', bm_id_str)
|
||||
|
||||
ppr_index = list(para).index(ppr)
|
||||
para.insert(ppr_index + 1, bm_start)
|
||||
# bookmarkEnd at end of paragraph
|
||||
para.append(bm_end)
|
||||
|
||||
bookmarks_added = sum(len(names) for names in heading_bookmark_map_all.values())
|
||||
if bookmarks_added > 0:
|
||||
print(f"Added {bookmarks_added} bookmarks to Heading paragraphs")
|
||||
|
||||
# ── Step 2: Find TOC field structure (begin → instrText → separate → end) ──
|
||||
toc_separate_para = None
|
||||
toc_end_para = None
|
||||
|
||||
# Track field nesting to handle nested fields correctly
|
||||
field_stack = []
|
||||
toc_field_depth = None
|
||||
|
||||
for fld_char in root.iter(f'{{{W}}}fldChar'):
|
||||
fld_type = fld_char.get(f'{{{W}}}fldCharType')
|
||||
run = fld_char.getparent()
|
||||
|
||||
if fld_type == 'begin':
|
||||
para = run.getparent()
|
||||
instr_text = ''
|
||||
found_run = False
|
||||
for sibling in para:
|
||||
if sibling is run:
|
||||
found_run = True
|
||||
it = sibling.find(f'{{{W}}}instrText')
|
||||
if it is not None and it.text:
|
||||
instr_text += it.text
|
||||
continue
|
||||
if found_run and sibling.tag == f'{{{W}}}r':
|
||||
it = sibling.find(f'{{{W}}}instrText')
|
||||
if it is not None and it.text:
|
||||
instr_text += it.text
|
||||
if sibling.find(f'{{{W}}}fldChar') is not None:
|
||||
break
|
||||
|
||||
field_stack.append(instr_text.strip())
|
||||
if 'TOC' in instr_text and toc_field_depth is None:
|
||||
toc_field_depth = len(field_stack)
|
||||
|
||||
elif fld_type == 'separate':
|
||||
if toc_field_depth is not None and len(field_stack) == toc_field_depth:
|
||||
toc_separate_para = run.getparent()
|
||||
|
||||
elif fld_type == 'end':
|
||||
if toc_field_depth is not None and len(field_stack) == toc_field_depth:
|
||||
toc_end_para = run.getparent()
|
||||
break
|
||||
if field_stack:
|
||||
field_stack.pop()
|
||||
|
||||
if toc_separate_para is None or toc_end_para is None:
|
||||
has_begin = root.find(f'.//{{{W}}}fldChar[@{{{W}}}fldCharType="begin"]') is not None
|
||||
has_separate = root.find(f'.//{{{W}}}fldChar[@{{{W}}}fldCharType="separate"]') is not None
|
||||
if not has_begin:
|
||||
raise RuntimeError(
|
||||
"TOC FAILED: No field structure found in document. "
|
||||
"Ensure the code includes a TableOfContents element."
|
||||
)
|
||||
elif not has_separate:
|
||||
raise RuntimeError(
|
||||
"TOC FAILED: TOC field has 'begin' but no 'separate' fldChar. "
|
||||
"Run _fix_fld_char_structure() first or check the docx-js version."
|
||||
)
|
||||
else:
|
||||
raise RuntimeError(
|
||||
"TOC FAILED: Field structure found but no TOC instrText detected. "
|
||||
"Ensure TableOfContents element generates a TOC \\o field code."
|
||||
)
|
||||
|
||||
# ── Step 3: Remove everything between separate-para and end-para ──
|
||||
# The TOC paragraphs may be direct children of <w:body> or wrapped in <w:sdt><w:sdtContent>
|
||||
toc_container = toc_separate_para.getparent() # could be body or sdtContent
|
||||
container_children = list(toc_container)
|
||||
|
||||
sep_idx = container_children.index(toc_separate_para)
|
||||
end_idx = container_children.index(toc_end_para)
|
||||
|
||||
for elem in container_children[sep_idx + 1:end_idx]:
|
||||
toc_container.remove(elem)
|
||||
|
||||
# ── Step 4: Build and insert placeholder paragraphs ──
|
||||
indent_mapping = {1: 0, 2: 360, 3: 720, 4: 1080, 5: 1440, 6: 1800}
|
||||
heading_occurrence_counter = {}
|
||||
|
||||
insert_pos = list(toc_container).index(toc_end_para)
|
||||
|
||||
for entry in entries:
|
||||
level = entry.get('level', 1)
|
||||
text_raw = entry.get('text', '')
|
||||
page = entry.get('page', '1')
|
||||
|
||||
toc_style = toc_style_mapping.get(level, toc_style_mapping.get(1, "9"))
|
||||
indent = indent_mapping.get(level, 0)
|
||||
|
||||
# Resolve bookmark (handle duplicate headings correctly)
|
||||
bm_name = ''
|
||||
if text_raw in heading_bookmark_map_all:
|
||||
occ = heading_occurrence_counter.get(text_raw, 0)
|
||||
bm_list = heading_bookmark_map_all[text_raw]
|
||||
if occ < len(bm_list):
|
||||
bm_name = bm_list[occ]
|
||||
heading_occurrence_counter[text_raw] = occ + 1
|
||||
|
||||
# Build paragraph element
|
||||
p = etree.Element(f'{{{W}}}p')
|
||||
toc_container.insert(insert_pos, p)
|
||||
insert_pos += 1
|
||||
|
||||
# pPr
|
||||
ppr = etree.SubElement(p, f'{{{W}}}pPr')
|
||||
pstyle = etree.SubElement(ppr, f'{{{W}}}pStyle')
|
||||
pstyle.set(f'{{{W}}}val', str(toc_style))
|
||||
tabs = etree.SubElement(ppr, f'{{{W}}}tabs')
tab = etree.SubElement(tabs, f'{{{W}}}tab')
tab.set(f'{{{W}}}val', 'right')
tab.set(f'{{{W}}}leader', 'dot')
tab.set(f'{{{W}}}pos', '9026')
spacing = etree.SubElement(ppr, f'{{{W}}}spacing')
spacing.set(f'{{{W}}}before', '120')
spacing.set(f'{{{W}}}after', '60')
# CT_PPr child order is pStyle, ..., tabs, ..., spacing, ind, so ind must come last
if indent > 0:
ind = etree.SubElement(ppr, f'{{{W}}}ind')
ind.set(f'{{{W}}}left', str(indent))
|
||||
|
||||
if bm_name:
|
||||
hyperlink = etree.SubElement(p, f'{{{W}}}hyperlink')
|
||||
hyperlink.set(f'{{{W}}}anchor', bm_name)
|
||||
hyperlink.set(f'{{{W}}}history', '1')
|
||||
|
||||
r_text = etree.SubElement(hyperlink, f'{{{W}}}r')
|
||||
rpr = etree.SubElement(r_text, f'{{{W}}}rPr')
|
||||
rstyle = etree.SubElement(rpr, f'{{{W}}}rStyle')
|
||||
rstyle.set(f'{{{W}}}val', 'Hyperlink')
|
||||
t = etree.SubElement(r_text, f'{{{W}}}t')
|
||||
t.text = text_raw
|
||||
|
||||
r_tab = etree.SubElement(hyperlink, f'{{{W}}}r')
|
||||
etree.SubElement(r_tab, f'{{{W}}}tab')
|
||||
|
||||
r_begin = etree.SubElement(hyperlink, f'{{{W}}}r')
|
||||
fc_begin = etree.SubElement(r_begin, f'{{{W}}}fldChar')
|
||||
fc_begin.set(f'{{{W}}}fldCharType', 'begin')
|
||||
|
||||
r_instr = etree.SubElement(hyperlink, f'{{{W}}}r')
|
||||
instr = etree.SubElement(r_instr, f'{{{W}}}instrText')
|
||||
instr.set('{http://www.w3.org/XML/1998/namespace}space', 'preserve')
|
||||
instr.text = f' PAGEREF {bm_name} \\h '
|
||||
|
||||
r_sep = etree.SubElement(hyperlink, f'{{{W}}}r')
|
||||
fc_sep = etree.SubElement(r_sep, f'{{{W}}}fldChar')
|
||||
fc_sep.set(f'{{{W}}}fldCharType', 'separate')
|
||||
|
||||
r_page = etree.SubElement(hyperlink, f'{{{W}}}r')
|
||||
t_page = etree.SubElement(r_page, f'{{{W}}}t')
|
||||
t_page.text = str(page)
|
||||
|
||||
r_end = etree.SubElement(hyperlink, f'{{{W}}}r')
|
||||
fc_end = etree.SubElement(r_end, f'{{{W}}}fldChar')
|
||||
fc_end.set(f'{{{W}}}fldCharType', 'end')
|
||||
else:
|
||||
r_text = etree.SubElement(p, f'{{{W}}}r')
|
||||
t = etree.SubElement(r_text, f'{{{W}}}t')
|
||||
t.text = text_raw
|
||||
|
||||
r_tab = etree.SubElement(p, f'{{{W}}}r')
|
||||
etree.SubElement(r_tab, f'{{{W}}}tab')
|
||||
|
||||
r_page = etree.SubElement(p, f'{{{W}}}r')
|
||||
t_page = etree.SubElement(r_page, f'{{{W}}}t')
|
||||
t_page.text = str(page)
|
||||
|
||||
    placeholders_inserted = len(entries)
    print(f"Inserted {placeholders_inserted} TOC placeholder entries")

    # Serialize back to string
    result = etree.tostring(root, xml_declaration=True, encoding='UTF-8', standalone=True)
    return result.decode('utf-8')


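The Step 4 loop pairs each TOC entry with a bookmark by occurrence count, so two headings with identical text link to their own bookmarks, and then writes a hyperlink wrapping a PAGEREF complex field. Below is a minimal sketch of both pieces using the standard-library `xml.etree.ElementTree` instead of lxml. The names `resolve_bookmarks` and `build_entry` and the `_Toc001`-style bookmark names are hypothetical, and the paragraph properties (style, tabs, spacing) are omitted for brevity.

```python
import xml.etree.ElementTree as ET

W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
XML_SPACE = "{http://www.w3.org/XML/1998/namespace}space"
ET.register_namespace("w", W)


def w(name: str) -> str:
    """Qualify a tag or attribute name with the w: namespace (Clark notation)."""
    return f"{{{W}}}{name}"


def resolve_bookmarks(entries, bookmark_map):
    """Give the n-th TOC entry with a given text the n-th bookmark for that text.

    bookmark_map: heading text -> bookmark names in document order
    (assumed to have the same shape as heading_bookmark_map_all).
    """
    seen = {}
    names = []
    for entry in entries:
        text = entry.get("text", "")
        occ = seen.get(text, 0)
        candidates = bookmark_map.get(text, [])
        names.append(candidates[occ] if occ < len(candidates) else "")
        seen[text] = occ + 1
    return names


def build_entry(text: str, page: str, bookmark: str) -> ET.Element:
    """One linked TOC line: heading text, a tab, then a PAGEREF field cached as `page`."""
    p = ET.Element(w("p"))
    link = ET.SubElement(p, w("hyperlink"), {w("anchor"): bookmark, w("history"): "1"})

    def new_run():
        return ET.SubElement(link, w("r"))

    ET.SubElement(new_run(), w("t")).text = text
    ET.SubElement(new_run(), w("tab"))
    # Complex field: begin, instruction, separate, cached result, end
    ET.SubElement(new_run(), w("fldChar"), {w("fldCharType"): "begin"})
    instr = ET.SubElement(new_run(), w("instrText"), {XML_SPACE: "preserve"})
    instr.text = f" PAGEREF {bookmark} \\h "
    ET.SubElement(new_run(), w("fldChar"), {w("fldCharType"): "separate"})
    ET.SubElement(new_run(), w("t")).text = str(page)
    ET.SubElement(new_run(), w("fldChar"), {w("fldCharType"): "end"})
    return p


entries = [{"text": "Intro", "page": "1"}, {"text": "Intro", "page": "4"}]
bms = resolve_bookmarks(entries, {"Intro": ["_Toc001", "_Toc002"]})
entry_xml = ET.tostring(build_entry("Intro", "4", bms[1]), encoding="unicode")
```

Word shows the cached result (`4` here) until fields are updated. An update then recomputes the page on which the bookmark falls.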
def main():
    parser = argparse.ArgumentParser(
        description='Add placeholder entries to Table of Contents in a DOCX file (in-place)'
    )
    parser.add_argument('docx_file', help='DOCX file to modify (will be replaced)')
    parser.add_argument(
        '--auto', action='store_true',
        help='Auto-extract Heading 1-3 from the DOCX as TOC entries (default when --entries is not given)'
    )
    parser.add_argument(
        '--entries',
        help='JSON string with placeholder entries: [{"level":1,"text":"Chapter 1","page":"1"}]'
    )

    args = parser.parse_args()

    # Determine entries: explicit --entries JSON takes precedence; otherwise headings are
    # auto-extracted from the document, whether or not --auto was passed
    entries = None
    if args.entries:
        try:
            entries = json.loads(args.entries)
        except json.JSONDecodeError as e:
            print(f"Error parsing entries JSON: {e}", file=sys.stderr)
            sys.exit(1)
    else:
        entries = _extract_headings_from_docx(args.docx_file)
        if entries:
            print(f"Auto-extracted {len(entries)} headings from document", file=sys.stderr)
        else:
            print("No headings found in document, using minimal placeholder", file=sys.stderr)
            entries = [{"level": 1, "text": "Contents", "page": "1"}]

    # Add placeholders
    try:
        add_toc_placeholders(args.docx_file, entries)
        print(f"Successfully added TOC placeholders to {args.docx_file}")
    except RuntimeError as e:
        # TOC structure errors: hard fail with exit code 1
        print(f"ERROR: {e}", file=sys.stderr)
        sys.exit(1)
    except Exception as e:
        print(f"Error: {e}", file=sys.stderr)
        sys.exit(1)


if __name__ == '__main__':
    main()
1333
skills/docx/scripts/document.py
Executable file
File diff suppressed because it is too large
807
skills/docx/scripts/postcheck.py
Executable file
@@ -0,0 +1,807 @@
#!/usr/bin/env python3
"""
postcheck.py: document business-rule self-check script

Unlike traditional OpenXML schema validation, this script does not check XML legality.
Instead, it checks the document's "visual quality" and "typesetting correctness", i.e. issues visible to the human eye.

Usage:
    python3 postcheck.py output.docx [--json] [--strict]

Exit codes: 0 = no errors, 1 = warnings under --strict (or file not found), 2 = at least one error.

Checks:
    1. Blank page detection: trailing page break, runs of ≥5 empty paragraphs, double page breaks at section boundaries, empty paragraphs holding only a page break
    2. Cover overflow: oversized cover fonts (>44pt), excessive spacing.before (>5000 twips), trailing empty paragraphs
    3. Line spacing consistency: whether body paragraph line spacing is uniform
    4. Image overflow: whether image width exceeds the page's usable width
    5. Font fallback: whether fonts are used that may be missing on target systems
    6. Heading level continuity: whether headings skip levels (H1→H3 skipping H2)
    7. ShadingType: whether SOLID is misused, causing black cells
    8. TOC quality: whether a TOC field exists, headings use standard Heading styles with outlineLvl, and updateFields is enabled
    9. Image aspect ratio: whether images are stretched/distorted
"""

import zipfile
import sys
import json
import re
from pathlib import Path
from xml.etree import ElementTree as ET

NS = {
    "w": "http://schemas.openxmlformats.org/wordprocessingml/2006/main",
    "r": "http://schemas.openxmlformats.org/officeDocument/2006/relationships",
    "wp": "http://schemas.openxmlformats.org/drawingml/2006/wordprocessingDrawing",
    "a": "http://schemas.openxmlformats.org/drawingml/2006/main",
    "pic": "http://schemas.openxmlformats.org/drawingml/2006/picture",
}


class CheckResult:
    def __init__(self, name: str, passed: bool, message: str, severity: str = "warning"):
        self.name = name
        self.passed = passed
        self.message = message
        self.severity = severity  # "error" | "warning" | "info"

    def to_dict(self):
        return {
            "name": self.name,
            "passed": self.passed,
            "message": self.message,
            "severity": self.severity,
        }

    def __str__(self):
        icon = "✅" if self.passed else ("❌" if self.severity == "error" else "⚠️")
        return f"{icon} [{self.name}] {self.message}"


def read_document_xml(docx_path: str) -> ET.Element:
    """Read word/document.xml and return its root element"""
    with zipfile.ZipFile(docx_path, "r") as z:
        return ET.fromstring(z.read("word/document.xml"))


def get_sections(root: ET.Element) -> list:
    """Extract all sections (located via sectPr)"""
    body = root.find(".//w:body", NS)
    if body is None:
        return []

    sections = []
    current_children = []

    for child in body:
        tag = child.tag.split("}")[-1] if "}" in child.tag else child.tag
        if tag == "sectPr":
            sections.append({"children": current_children, "sectPr": child})
            current_children = []
        else:
            # Check whether the paragraph contains a sectPr (section break inside paragraph pPr)
            ppr_sect = child.find(".//w:pPr/w:sectPr", NS)
            if ppr_sect is not None:
                current_children.append(child)
                sections.append({"children": current_children, "sectPr": ppr_sect})
                current_children = []
            else:
                current_children.append(child)

    # Last section (body-level sectPr)
    body_sect = body.find("w:sectPr", NS)
    if body_sect is not None and current_children:
        sections.append({"children": current_children, "sectPr": body_sect})

    return sections


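In WordprocessingML, a section ends in one of two ways. A `w:sectPr` inside a paragraph's `w:pPr` closes a section, and that paragraph belongs to the section it ends. The final section is closed by the body-level `w:sectPr` instead. The sketch below applies the same grouping to a hand-written two-section body. `split_sections` is a hypothetical name, and it returns plain child lists rather than the dicts used above.

```python
import xml.etree.ElementTree as ET

W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
NS = {"w": W}

DOC = f"""<w:document xmlns:w="{W}"><w:body>
  <w:p><w:r><w:t>Cover title</w:t></w:r></w:p>
  <w:p><w:pPr><w:sectPr/></w:pPr></w:p>
  <w:p><w:r><w:t>Body text</w:t></w:r></w:p>
  <w:sectPr/>
</w:body></w:document>"""


def split_sections(body):
    """Group body children into sections, closing one at every sectPr."""
    sections, current = [], []
    for child in body:
        if child.tag == f"{{{W}}}sectPr":
            # Body-level sectPr closes the final section
            sections.append(current)
            current = []
        elif child.find("w:pPr/w:sectPr", NS) is not None:
            # Paragraph-level sectPr: the paragraph is the last one of its section
            current.append(child)
            sections.append(current)
            current = []
        else:
            current.append(child)
    return sections


body = ET.fromstring(DOC).find("w:body", NS)
sizes = [len(s) for s in split_sections(body)]
```

Here the cover section holds the title paragraph plus the paragraph that carries its `sectPr`. The body section holds the single text paragraph.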
def check_blank_pages(root: ET.Element) -> CheckResult:
    """Detect excess blank pages using several independent patterns"""
    body = root.find(".//w:body", NS)
    paragraphs = body.findall("w:p", NS)
    issues = []

    if not paragraphs:
        return CheckResult("blank-pages", True, "No paragraph content")

    # Check 1: Whether the last paragraph contains only a page break
    last_p = paragraphs[-1]
    runs = last_p.findall(".//w:r", NS)
    has_page_break = False
    has_text = False
    for run in runs:
        br = run.find("w:br", NS)
        if br is not None and br.get(f"{{{NS['w']}}}type") == "page":
            has_page_break = True
        t = run.find("w:t", NS)
        if t is not None and t.text and t.text.strip():
            has_text = True
    if has_page_break and not has_text:
        issues.append("Trailing page break at document end may cause blank page")

    # Check 2: Consecutive empty paragraphs (≥5 in a row may form a visual blank page)
    consecutive_empty = 0
    max_empty = 0
    max_empty_pos = 0
    for idx, p in enumerate(paragraphs):
        texts = p.findall(".//w:t", NS)
        has_any_text = any(t.text and t.text.strip() for t in texts)
        has_br = any(
            br.get(f"{{{NS['w']}}}type") == "page"
            for br in p.findall(".//w:br", NS)
        )
        # Inline and floating (anchored) images both count as content
        has_drawing = (
            p.find(".//wp:inline", NS) is not None
            or p.find(".//wp:anchor", NS) is not None
        )
        if not has_any_text and not has_br and not has_drawing:
            consecutive_empty += 1
            if consecutive_empty > max_empty:
                max_empty = consecutive_empty
                max_empty_pos = idx
        else:
            consecutive_empty = 0

    if max_empty >= 5:
        # max_empty_pos is the 0-based index of the run's last paragraph; report the run start 1-based
        issues.append(f"Found {max_empty} consecutive empty paragraphs (starting around paragraph {max_empty_pos - max_empty + 2}), may form visual blank page")

    # Check 3: Double page break at a section boundary (PageBreak at section end + next section
    # starting on a new page). An omitted w:type means "nextPage" per ECMA-376.
    sections = get_sections(root)
    for i in range(len(sections) - 1):
        sec_children = sections[i]["children"]
        if not sec_children:
            continue
        # Check whether the last paragraph of the section contains a PageBreak
        last_child = sec_children[-1]
        if last_child.tag != f"{{{NS['w']}}}p":
            continue
        ends_with_break = any(
            br.get(f"{{{NS['w']}}}type") == "page"
            for br in last_child.findall(".//w:br", NS)
        )
        if not ends_with_break:
            continue
        # Check whether the next section starts on a new page
        next_sect_pr = sections[i + 1]["sectPr"]
        sect_type = next_sect_pr.find("w:type", NS)
        start_type = "nextPage" if sect_type is None else sect_type.get(f"{{{NS['w']}}}val", "nextPage")
        if start_type == "nextPage":
            issues.append(f"Section {i+1} ends with PageBreak and Section {i+2} is type nextPage, double page break causes blank page")

    # Check 4: Empty paragraph + PageBreak (paragraph has only a PageBreak, no text).
    # Section-ending PageBreaks are excluded because they are normal section separators
    # (e.g., a cover page ending with an empty paragraph + PageBreak before a new section)
    section_last_paras = set()
    for sec in sections:
        children = sec["children"]
        if children:
            section_last_paras.add(id(children[-1]))

    empty_pb_count = 0
    for p in paragraphs[:-1]:  # Last paragraph already handled in Check 1
        if id(p) in section_last_paras:
            continue  # Skip section-ending paragraphs (normal section breaks)
        runs = p.findall(".//w:r", NS)
        p_has_break = False
        p_has_text = False
        for run in runs:
            br = run.find("w:br", NS)
            if br is not None and br.get(f"{{{NS['w']}}}type") == "page":
                p_has_break = True
            t = run.find("w:t", NS)
            if t is not None and t.text and t.text.strip():
                p_has_text = True
        if p_has_break and not p_has_text:
            empty_pb_count += 1

    if empty_pb_count > 0:
        issues.append(f"Found {empty_pb_count} empty paragraphs with PageBreak (suggest attaching PageBreak to content paragraphs)")

    # Separate hard errors from soft warnings
    hard_issues = [i for i in issues if "double page break" in i.lower() or "trailing page break" in i.lower() or "consecutive" in i.lower()]
    soft_issues = [i for i in issues if i not in hard_issues]

    if hard_issues:
        return CheckResult(
            "blank-pages", False,
            "; ".join(hard_issues[:3]),
            "error"
        )
    if soft_issues:
        return CheckResult(
            "blank-pages", False,
            "; ".join(soft_issues[:3]),
            "warning"
        )

    return CheckResult("blank-pages", True, "No blank page issues detected")


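The consecutive-empty check tracks the longest run of empty paragraphs and reports where it starts using `max_empty_pos - max_empty + 2`. The run's last paragraph has the 0-based index `max_empty_pos`, so the run starts at `max_empty_pos - max_empty + 1` (0-based). Adding one more converts that to 1-based numbering. The sketch below shows this arithmetic over precomputed emptiness flags. The function name is hypothetical.

```python
def longest_empty_run(is_empty):
    """Return (run_length, 1-based start paragraph) of the longest run of empty paragraphs."""
    current = best = best_end = 0
    for idx, empty in enumerate(is_empty):
        if empty:
            current += 1
            if current > best:
                best, best_end = current, idx
        else:
            current = 0
    # best_end is the 0-based index of the run's last paragraph
    start = best_end - best + 2 if best else 0
    return best, start


flags = [False, True, True, True, True, True, False]
run, start = longest_empty_run(flags)
```

With the sample flags, the run covers 0-based indices 1 to 5. It is therefore reported as 5 paragraphs starting at paragraph 2, which meets the ≥5 threshold.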
def check_line_spacing(root: ET.Element) -> CheckResult:
    """Check body paragraph line spacing consistency"""
    body = root.find(".//w:body", NS)
    paragraphs = body.findall(".//w:p", NS)

    spacing_values = {}
    body_para_count = 0

    for p in paragraphs:
        ppr = p.find("w:pPr", NS)
        # Skip heading paragraphs
        if ppr is not None:
            style = ppr.find("w:pStyle", NS)
            if style is not None:
                val = style.get(f"{{{NS['w']}}}val", "")
                if val.startswith("Heading") or val == "Title":
                    continue

        spacing = ppr.find("w:spacing", NS) if ppr is not None else None
        line_val = spacing.get(f"{{{NS['w']}}}line") if spacing is not None else None

        # Only count paragraphs with text content
        texts = p.findall(".//w:t", NS)
        if not any(t.text and t.text.strip() for t in texts):
            continue

        body_para_count += 1
        key = line_val or "default"
        spacing_values[key] = spacing_values.get(key, 0) + 1

    if body_para_count == 0:
        return CheckResult("line-spacing", True, "No body paragraphs")

    if len(spacing_values) <= 1:
        dominant = list(spacing_values.keys())[0] if spacing_values else "default"
        return CheckResult("line-spacing", True, f"Line spacing uniform (line={dominant})")

    # Find the most common line spacing
    dominant = max(spacing_values, key=spacing_values.get)
    inconsistent = sum(v for k, v in spacing_values.items() if k != dominant)
    total = sum(spacing_values.values())

    if inconsistent / total > 0.2:
        return CheckResult(
            "line-spacing", False,
            f"Line spacing inconsistent: {dict(spacing_values)}, {inconsistent}/{total} paragraphs differ from dominant spacing {dominant}",
            "warning"
        )

    return CheckResult("line-spacing", True, f"Line spacing mostly uniform (line={dominant}, {inconsistent} exceptions)")


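The line-spacing check passes when at most 20% of text-bearing body paragraphs differ from the most common `w:line` value. A missing value is counted as `"default"`. The sketch below implements the same rule with `collections.Counter`. `spacing_verdict` is a hypothetical helper. As with `max()` over a dict, a tie for the dominant value goes to the first value seen.

```python
from collections import Counter


def spacing_verdict(line_values, tolerance=0.2):
    """Return (passes, dominant, inconsistent) for a list of w:line values (None = default)."""
    counts = Counter(v or "default" for v in line_values)
    if len(counts) <= 1:
        return True, next(iter(counts), "default"), 0
    dominant = counts.most_common(1)[0][0]
    inconsistent = sum(n for key, n in counts.items() if key != dominant)
    return inconsistent / sum(counts.values()) <= tolerance, dominant, inconsistent


ok_case = spacing_verdict(["360"] * 8 + ["240"] * 2)   # 2/10 differ: still within 20%
bad_case = spacing_verdict(["360"] * 7 + [None] * 3)   # 3/10 differ: flagged
```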
def check_image_overflow(root: ET.Element) -> CheckResult:
    """Check whether image width may exceed page bounds"""
    # Get page width and side margins (in twips) from the body-level sectPr
    sect_pr = root.find(".//w:body/w:sectPr", NS)
    page_width = 11906  # A4 default
    margin_left = 1701  # 3.0 cm
    margin_right = 1417  # 2.5 cm

    if sect_pr is not None:
        pg_sz = sect_pr.find("w:pgSz", NS)
        pg_mar = sect_pr.find("w:pgMar", NS)
        if pg_sz is not None:
            page_width = int(pg_sz.get(f"{{{NS['w']}}}w", "11906"))
        if pg_mar is not None:
            margin_left = int(pg_mar.get(f"{{{NS['w']}}}left", "1701"))
            margin_right = int(pg_mar.get(f"{{{NS['w']}}}right", "1417"))

    # twips → EMU: 1 twip = 635 EMU (914400 EMU per inch / 1440 twips per inch)
    usable_width_emu = (page_width - margin_left - margin_right) * 635

    drawings = root.findall(".//wp:inline", NS) + root.findall(".//wp:anchor", NS)
    oversized = 0

    for dwg in drawings:
        extent = dwg.find("wp:extent", NS)
        if extent is not None:
            cx = int(extent.get("cx", "0"))
            if cx > usable_width_emu * 1.05:  # 5% tolerance
                oversized += 1

    if oversized > 0:
        return CheckResult(
            "image-overflow", False,
            f"{oversized} images exceed page usable area",
            "error"
        )

    return CheckResult(
        "image-overflow", True,
        f"All images within page width ({len(drawings)} images)"
    )


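Page geometry in `w:pgSz`/`w:pgMar` is given in twips (1/1440 inch), whereas `wp:extent` is given in EMU (914400 per inch). That difference is where the factor of 635 comes from. The worked sketch below uses the defaults from above: A4 with 1701- and 1417-twip side margins. The function names are hypothetical.

```python
EMU_PER_INCH = 914_400
TWIPS_PER_INCH = 1_440
EMU_PER_TWIP = EMU_PER_INCH // TWIPS_PER_INCH  # 635


def usable_width_emu(page_w=11906, margin_left=1701, margin_right=1417):
    """Usable text width in EMU, from w:pgSz / w:pgMar values given in twips."""
    return (page_w - margin_left - margin_right) * EMU_PER_TWIP


def overflows(image_cx_emu, limit_emu, tolerance=0.05):
    """True if an image's wp:extent cx exceeds the usable width by more than the tolerance."""
    return image_cx_emu > limit_emu * (1 + tolerance)


limit = usable_width_emu()                     # A4 with 3.0 cm / 2.5 cm side margins
six_inch = 6 * EMU_PER_INCH
six_and_a_half_inch = int(6.5 * EMU_PER_INCH)
```

With these defaults the usable width is 8788 twips, about 6.10 inches. A 6-inch image therefore fits, while a 6.5-inch image exceeds the 5% tolerance.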
def check_image_aspect_ratio(docx_path: str, root: ET.Element) -> CheckResult:
    """Check whether images are stretched/distorted (aspect ratio drift).

    Compares the original aspect ratio of embedded images with the display aspect ratio set in wp:extent.
    Drift >10% is considered distortion (pie charts becoming elliptical, radar charts becoming diamond-shaped, etc.).
    """
    import io

    # Pixel dimensions come from Pillow; without it the check cannot run at all
    try:
        from PIL import Image
    except ImportError:
        return CheckResult(
            "image-aspect-ratio", True,
            "Cannot check image aspect ratio (Pillow not installed)",
            "info"
        )

    # Build a mapping rId → image file path inside the zip, from word/_rels/document.xml.rels
    rid_to_path = {}
    try:
        with zipfile.ZipFile(docx_path, 'r') as z:
            names = set(z.namelist())
            rels_path = 'word/_rels/document.xml.rels'
            if rels_path in names:
                rels_root = ET.fromstring(z.read(rels_path))
                rels_ns = 'http://schemas.openxmlformats.org/package/2006/relationships'
                for rel in rels_root.findall(f'{{{rels_ns}}}Relationship'):
                    rid = rel.get('Id', '')
                    target = rel.get('Target', '')
                    rel_type = rel.get('Type', '')
                    if 'image' in rel_type:
                        # Relative targets resolve against the word/ directory
                        if not target.startswith('/'):
                            img_path = 'word/' + target
                        else:
                            img_path = target.lstrip('/')
                        rid_to_path[rid] = img_path

            # Now check each drawing
            drawings = root.findall(".//wp:inline", NS) + root.findall(".//wp:anchor", NS)
            distorted = []

            for dwg in drawings:
                extent = dwg.find("wp:extent", NS)
                if extent is None:
                    continue
                display_cx = int(extent.get("cx", "0"))
                display_cy = int(extent.get("cy", "0"))
                if display_cx == 0 or display_cy == 0:
                    continue

                # Find the blip rId
                blip = dwg.find(".//a:blip", NS)
                if blip is None:
                    continue
                r_embed = blip.get(f"{{{NS['r']}}}embed", "")
                if not r_embed or r_embed not in rid_to_path:
                    continue

                img_zip_path = rid_to_path[r_embed]
                if img_zip_path not in names:
                    continue

                # Read actual pixel dimensions; skip images Pillow cannot open
                try:
                    with Image.open(io.BytesIO(z.read(img_zip_path))) as pil_img:
                        orig_w, orig_h = pil_img.size
                except Exception:
                    continue
                if orig_w == 0 or orig_h == 0:
                    continue

                # Compare aspect ratios (drift relative to the original ratio)
                orig_ratio = orig_w / orig_h
                display_ratio = display_cx / display_cy
                drift = abs(orig_ratio - display_ratio) / orig_ratio

                if drift > 0.10:  # >10% distortion
                    pct = drift * 100
                    distorted.append(
                        f"{img_zip_path.split('/')[-1]}: "
                        f"original {orig_w}×{orig_h} (ratio={orig_ratio:.2f}), "
                        f"display ratio={display_ratio:.2f}, distortion {pct:.0f}%"
                    )

    except Exception:
        return CheckResult(
            "image-aspect-ratio", True,
            "Cannot check image aspect ratio (zip read error)",
            "info"
        )

    if distorted:
        detail = "; ".join(distorted[:3])
        if len(distorted) > 3:
            detail += f" ...and {len(distorted) - 3} more"
        return CheckResult(
            "image-aspect-ratio", False,
            f"{len(distorted)} images have aspect ratio distortion: {detail}",
            "warning"
        )

    return CheckResult(
        "image-aspect-ratio", True,
        f"All images have correct aspect ratio ({len(drawings)} images)"
    )


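Distortion is measured as the relative drift between the pixel aspect ratio and the `wp:extent` ratio, and drift above 10% is flagged. The sketch below shows that formula, followed by the usual remedy: deriving the display height from the width. `aspect_drift` and `height_for_width` are hypothetical names, and the 800×600 image is illustrative.

```python
EMU_PER_INCH = 914_400


def aspect_drift(orig_w, orig_h, display_cx, display_cy):
    """Relative drift between the pixel aspect ratio and the displayed (wp:extent) ratio."""
    orig_ratio = orig_w / orig_h
    display_ratio = display_cx / display_cy
    return abs(orig_ratio - display_ratio) / orig_ratio


def height_for_width(display_cx, orig_w, orig_h):
    """Display height (EMU) that preserves the original aspect ratio for a given width."""
    return round(display_cx * orig_h / orig_w)


# An 800x600 chart shown at 6 x 4.5 in keeps its 4:3 shape ...
kept = aspect_drift(800, 600, 6 * EMU_PER_INCH, int(4.5 * EMU_PER_INCH))
# ... but forcing it into a 6 x 6 in square distorts it by 25%
squashed = aspect_drift(800, 600, 6 * EMU_PER_INCH, 6 * EMU_PER_INCH)
```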
def check_font_fallback(root: ET.Element) -> CheckResult:
    """Check whether potentially missing fonts are used"""
    SAFE_FONTS = {
        # Chinese
        "宋体", "SimSun", "黑体", "SimHei", "微软雅黑", "Microsoft YaHei",
        "仿宋", "FangSong", "FangSong_GB2312", "楷体", "KaiTi",
        # English
        "Times New Roman", "Arial", "Calibri", "Helvetica",
        "Courier New", "Georgia", "Verdana", "Tahoma",
        # Universal
        "Symbol", "Wingdings",
    }

    # Collect every font named in run properties, across all script slots
    fonts_used = set()
    for rpr in root.findall(".//w:rPr", NS):
        rf = rpr.find("w:rFonts", NS)
        if rf is None:
            continue
        for attr in ["ascii", "eastAsia", "hAnsi", "cs"]:
            f = rf.get(f"{{{NS['w']}}}{attr}")
            if f:
                fonts_used.add(f)

    risky = fonts_used - SAFE_FONTS
    if risky:
        return CheckResult(
            "font-fallback", False,
            f"The following fonts may be missing on the target system: {', '.join(sorted(risky))}",
            "info"
        )

    return CheckResult("font-fallback", True, f"All fonts are common system fonts ({len(fonts_used)} fonts)")


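The font check gathers every name from the `ascii`, `eastAsia`, `hAnsi` and `cs` attributes of `w:rFonts` and subtracts an allow-list. The self-contained sketch below runs that logic on a tiny hand-written body. The abbreviated `SAFE` set and the `fonts_in` name are for illustration only.

```python
import xml.etree.ElementTree as ET

W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
SAFE = {"SimSun", "Microsoft YaHei", "Times New Roman", "Arial", "Calibri"}  # abbreviated

XML = f"""<w:body xmlns:w="{W}">
  <w:p><w:r><w:rPr><w:rFonts w:ascii="Arial" w:eastAsia="SimSun"/></w:rPr></w:r></w:p>
  <w:p><w:r><w:rPr><w:rFonts w:ascii="Fira Code" w:hAnsi="Fira Code"/></w:rPr></w:r></w:p>
</w:body>"""


def fonts_in(root):
    """Every font name referenced by any w:rFonts element."""
    used = set()
    for rf in root.iter(f"{{{W}}}rFonts"):
        for attr in ("ascii", "eastAsia", "hAnsi", "cs"):
            name = rf.get(f"{{{W}}}{attr}")
            if name:
                used.add(name)
    return used


used = fonts_in(ET.fromstring(XML))
risky = sorted(used - SAFE)
```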
def check_heading_levels(root: ET.Element) -> CheckResult:
    """Check whether headings skip levels"""
    body = root.find(".//w:body", NS)
    heading_levels = []

    for p in body.findall(".//w:p", NS):
        ppr = p.find("w:pPr", NS)
        if ppr is None:
            continue
        style = ppr.find("w:pStyle", NS)
        if style is None:
            continue
        val = style.get(f"{{{NS['w']}}}val", "")
        m = re.match(r"Heading(\d+)", val)
        if m:
            heading_levels.append(int(m.group(1)))

    if len(heading_levels) < 2:
        return CheckResult("heading-levels", True, "Too few headings, skipping check")

    skips = []
    for i in range(1, len(heading_levels)):
        diff = heading_levels[i] - heading_levels[i - 1]
        if diff > 1:
            skips.append(f"H{heading_levels[i-1]}→H{heading_levels[i]}")

    if skips:
        return CheckResult(
            "heading-levels", False,
            f"Heading level skip: {', '.join(skips[:5])}",
            "warning"
        )

    return CheckResult("heading-levels", True, f"Heading levels continuous ({len(heading_levels)} headings)")


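Only a step deeper by more than one level counts as a skip. Moving back up, for example from H3 to H1, is allowed. The compact sketch below applies the same rule to a list of paragraph style IDs. `heading_skips` is a hypothetical name, and it requires Python 3.8+ for the assignment expression.

```python
import re


def heading_skips(style_ids):
    """Return 'Hn→Hm' jumps where a heading goes more than one level deeper than the previous one."""
    levels = [int(m.group(1)) for s in style_ids if (m := re.match(r"Heading(\d+)", s))]
    return [f"H{a}→H{b}" for a, b in zip(levels, levels[1:]) if b - a > 1]


skips = heading_skips(["Title", "Heading1", "Heading3", "Normal", "Heading2", "Heading1", "Heading2"])
```

Non-heading styles such as `Title` and `Normal` are ignored. This leaves the levels 1, 3, 2, 1, 2, in which only the first transition skips a level.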
# check_cover_separation removed: false positives on complex covers (>15 elements is normal)


def check_shading_type(root: ET.Element) -> CheckResult:
    """Check whether ShadingType.SOLID is misused"""
    # w:shd w:val="solid" paints the area in the pattern color (w:color) instead of w:fill;
    # with color "auto" this typically renders as black cells. Background fills should use "clear".
    shadings = root.findall(".//w:shd", NS)
    solid_count = 0

    for shd in shadings:
        val = shd.get(f"{{{NS['w']}}}val", "")
        if val == "solid":
            solid_count += 1

    if solid_count > 0:
        return CheckResult(
            "shading-type", False,
            f"Found {solid_count} instances of ShadingType.SOLID (should be CLEAR), may cause black cells",
            "error"
        )

    return CheckResult("shading-type", True, "No ShadingType.SOLID misuse found")


def check_toc(root: ET.Element, docx_path: str = "") -> CheckResult:
    """Check TOC quality: field existence, headings presence, outlineLvl, updateFields."""
    body = root.find(".//w:body", NS)
    if body is None:
        return CheckResult("toc", True, "Document body is empty, skipping TOC check", "info")

    paragraphs = list(body)
    w_ns = NS["w"]

    # --- Detect headings and their levels ---
    heading_count = 0
    heading_levels_used = set()  # e.g. {1, 2, 3}
    for p in paragraphs:
        if p.tag != f"{{{w_ns}}}p":
            continue
        ppr = p.find(f"{{{w_ns}}}pPr")
        if ppr is None:
            continue
        ps = ppr.find(f"{{{w_ns}}}pStyle")
        if ps is None:
            continue
        val = ps.get(f"{{{w_ns}}}val", "")
        m = re.match(r"(?i)heading\s*(\d)", val)
        if m:
            heading_count += 1
            heading_levels_used.add(int(m.group(1)))

    # --- Detect TOC field ---
    # Only instructions that *start* with TOC count: a plain substring test would also match
    # the "PAGEREF _Toc..." and "HYPERLINK \l _Toc..." fields inside TOC entries.
    # The ".//" search already descends into SDT-wrapped TOCs.
    toc_instr = re.compile(r"\s*TOC\b", re.IGNORECASE)
    has_toc = any(
        instr.text and toc_instr.match(instr.text)
        for instr in root.findall(f".//{{{w_ns}}}instrText")
    ) or any(
        toc_instr.match(fld.get(f"{{{w_ns}}}instr", ""))
        for fld in root.findall(f".//{{{w_ns}}}fldSimple")
    )

    issues = []

    # Check 1: Document has a "目录" / "目 录" / "Table of Contents" title but no TOC field
    has_toc_title = False
    toc_title_pattern = re.compile(r'^(?:目\s*录|table\s+of\s+contents|contents)$', re.IGNORECASE)
    for p in paragraphs:
        if p.tag != f"{{{w_ns}}}p":
            continue
        texts = p.findall(f".//{{{w_ns}}}t")
        p_text = "".join(t.text or "" for t in texts).strip()
        if toc_title_pattern.match(p_text):
            has_toc_title = True
            break

    if has_toc_title and not has_toc:
        issues.append("TOC_FIELD_MISSING: document has a TOC title but no TOC field element — add TableOfContents in code")

    # Check 2: TOC field exists but no headings in document → TOC will be empty after update
    if has_toc and heading_count == 0:
        issues.append("TOC_NO_HEADINGS: TOC field exists but document has 0 Heading-styled paragraphs — TOC will be empty after update")

    # Check 3 & 4: Read styles.xml and settings.xml from the DOCX (only when a TOC exists)
    if has_toc and docx_path:
        try:
            with zipfile.ZipFile(docx_path, 'r') as zf:
                names = zf.namelist()

                # Check 3: outlineLvl missing in Heading styles
                if 'word/styles.xml' in names:
                    styles_root = ET.fromstring(zf.read('word/styles.xml').decode('utf-8'))
                    missing_outline = []
                    for level in sorted(heading_levels_used):
                        style_id = f"Heading{level}"
                        # Find <w:style w:styleId="HeadingN">
                        for style_elem in styles_root.findall(f".//{{{w_ns}}}style"):
                            if style_elem.get(f"{{{w_ns}}}styleId", "") != style_id:
                                continue
                            # Check whether its pPr has outlineLvl
                            ppr = style_elem.find(f"{{{w_ns}}}pPr")
                            has_outline = ppr is not None and ppr.find(f"{{{w_ns}}}outlineLvl") is not None
                            if not has_outline:
                                missing_outline.append(style_id)
                            break

                    if missing_outline:
                        issues.append(
                            "TOC_OUTLINE_MISSING: %s style(s) missing outlineLvl — "
                            "Word TOC update won't find these headings. "
                            "Run add_toc_placeholders.py to fix" % ", ".join(missing_outline)
                        )

                # Check 4: updateFields not enabled
                if 'word/settings.xml' in names:
                    settings_content = zf.read('word/settings.xml').decode('utf-8')
                    # updateFields is an on/off property: <w:updateFields/> (no w:val) and
                    # w:val="true" / "1" / "on" all enable it
                    update_ok = bool(re.search(
                        r'<w:updateFields(?:\s*/?>|\s+[^>]*w:val\s*=\s*"(?:true|1|on)")',
                        settings_content
                    ))
                    if not update_ok:
                        issues.append(
                            "TOC_UPDATE_DISABLED: settings.xml missing updateFields=true — "
                            "Word won't prompt to update TOC on open. "
                            "Run add_toc_placeholders.py to fix"
                        )
        except Exception as e:
            issues.append(f"TOC_CHECK_ERROR: failed to read styles/settings from DOCX: {e}")

    if not issues:
        if has_toc:
            return CheckResult("toc", True, "TOC field present and update-ready")
        return CheckResult("toc", True, "No TOC needed")

    severity = "error" if any(k in i for i in issues for k in ("FIELD_MISSING", "NO_HEADINGS", "OUTLINE_MISSING")) else "warning"
    return CheckResult("toc", False, "; ".join(issues[:5]), severity)


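Two details make TOC detection brittle. First, TOC entries themselves contain `PAGEREF` and `HYPERLINK` fields that point at `_Toc`-prefixed bookmarks, so a substring test for `TOC` on upper-cased instruction text matches them too. Second, `w:updateFields` is an on/off property, so an omitted `w:val`, `"1"` and `"on"` all mean true as well. The sketch below contrasts a substring test with an anchored regex. It assumes a field instruction begins with the field name after optional spaces, which is how Word usually writes it. The pattern names are hypothetical.

```python
import re

TOC_INSTR = re.compile(r"\s*TOC\b", re.IGNORECASE)
UPDATE_FIELDS = re.compile(r'<w:updateFields(?:\s*/?>|\s+[^>]*w:val\s*=\s*"(?:true|1|on)")')

instrs = [
    ' TOC \\o "1-3" \\h \\z \\u ',   # the TOC field itself
    " PAGEREF _Toc123456 \\h ",     # page number inside a TOC entry
    " HYPERLINK \\l _Toc123456 ",   # link inside a TOC entry
]
is_toc = [bool(TOC_INSTR.match(s)) for s in instrs]
naive = ["TOC" in s.upper() for s in instrs]   # substring test: matches all three
```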
def check_cover_overflow(root: ET.Element) -> CheckResult:
    """Detect cover section issues: oversized fonts, excessive spacing, trailing empty content."""
    sections = get_sections(root)
    if not sections:
        return CheckResult("cover-overflow", True, "No sections found")

    # The first section is treated as the cover
    children = sections[0]["children"]
    issues = []

    # Check 1: Oversized font in cover section (w:sz is in half-points, so > 88 means > 44pt)
    max_font_size = 0
    for child in children:
        for sz in child.findall(".//" + f"{{{NS['w']}}}sz"):
            val = sz.get(f"{{{NS['w']}}}val")
            if val and val.isdigit():
                max_font_size = max(max_font_size, int(val))

    if max_font_size > 88:  # 88 half-points = 44pt
        issues.append(
            f"Cover has font size {max_font_size / 2:g}pt (>44pt max). "
            f"Use calcTitleLayout() for dynamic sizing"
        )

    # Check 2: Excessive spacing.before in cover section (> 5000 twips)
    max_spacing = 0
    for child in children:
        for sp in child.findall(".//" + f"{{{NS['w']}}}spacing"):
            before = sp.get(f"{{{NS['w']}}}before")
            if before and before.isdigit():
                max_spacing = max(max_spacing, int(before))

    if max_spacing > 5000:
        issues.append(
            f"Cover has spacing.before={max_spacing} twips (>5000 max). "
            f"Use calcCoverSpacing() for dynamic spacing"
        )

    # Check 3: Trailing empty paragraphs in cover section
    trailing_empty = 0
    for child in reversed(children):
        tag = child.tag.split("}")[-1] if "}" in child.tag else child.tag
        if tag != "p":
            break
        texts = child.findall(".//" + f"{{{NS['w']}}}t")
        if any(t.text and t.text.strip() for t in texts):
            break
        trailing_empty += 1

    if trailing_empty > 2:
        issues.append(
            f"Cover section ends with {trailing_empty} empty paragraphs (max 2 allowed) — "
            f"excessive empty paragraphs may cause blank page after cover"
        )

    if issues:
        return CheckResult(
            "cover-overflow", False,
            "; ".join(issues),
            "error"
        )

    return CheckResult("cover-overflow", True, "Cover section layout looks OK")


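`w:sz` values are half-points and `w:spacing/@w:before` is in twips. The cover limits above therefore correspond to 44pt text and roughly 8.8 cm of space before a paragraph. A sketch of both conversions follows (the helpers are hypothetical):

```python
def cover_font_issue(sz_values, max_pt=44):
    """Largest cover font in points if it exceeds max_pt, else None (w:sz is in half-points)."""
    sizes = [int(v) for v in sz_values if v and v.isdigit()]
    largest_pt = max(sizes, default=0) / 2
    return largest_pt if largest_pt > max_pt else None


def twips_to_cm(twips):
    """1440 twips = 1 inch = 2.54 cm."""
    return twips / 1440 * 2.54


at_limit = cover_font_issue(["28", "72", "88"])   # 88 half-points = exactly 44pt: allowed
too_big = cover_font_issue(["96", "32"])          # 48pt: flagged
```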
def run_all_checks(docx_path: str) -> list[CheckResult]:
    """Run all checks and return their results"""
    root = read_document_xml(docx_path)

    checks = [
        check_blank_pages,
        check_cover_overflow,
        check_line_spacing,
        check_image_overflow,
        check_font_fallback,
        check_heading_levels,
        check_shading_type,
    ]

    results = []
    for check_fn in checks:
        try:
            results.append(check_fn(root))
        except Exception as e:
            # Report a crashing check under the same kebab-case name it normally uses
            results.append(CheckResult(
                check_fn.__name__.replace("check_", "").replace("_", "-"),
                False,
                f"Check error: {e}",
                "error"
            ))

    # TOC check needs both root and docx_path
    try:
        results.append(check_toc(root, docx_path))
    except Exception as e:
        results.append(CheckResult("toc", False, f"Check error: {e}", "error"))

    # Image aspect ratio check needs both root and docx_path
    try:
        results.append(check_image_aspect_ratio(docx_path, root))
    except Exception as e:
        results.append(CheckResult("image-aspect-ratio", False, f"Check error: {e}", "error"))

    return results


def main():
    import argparse
    parser = argparse.ArgumentParser(description="docx business rule self-check")
    parser.add_argument("docx_path", help="Path to the .docx file to check")
    parser.add_argument("--json", action="store_true", help="Output in JSON format")
    parser.add_argument("--strict", action="store_true", help="Treat warnings as failures")
    args = parser.parse_args()

    if not Path(args.docx_path).exists():
        print(f"❌ File not found: {args.docx_path}")
        sys.exit(1)

    results = run_all_checks(args.docx_path)

    if args.json:
        print(json.dumps([r.to_dict() for r in results], ensure_ascii=False, indent=2))
    else:
        print(f"\n📋 Document self-check report: {args.docx_path}\n")
        for r in results:
            print(f" {r}")

        passed = sum(1 for r in results if r.passed)
        total = len(results)
        errors = sum(1 for r in results if not r.passed and r.severity == "error")
        warnings = sum(1 for r in results if not r.passed and r.severity == "warning")

        print(f"\n {'─' * 50}")
        print(f" Passed {passed}/{total} | ❌ {errors} errors | ⚠️ {warnings} warnings\n")

    # Exit code: 2 = errors, 1 = warnings under --strict, 0 = OK
    has_errors = any(not r.passed and r.severity == "error" for r in results)
    has_warnings = any(not r.passed and r.severity == "warning" for r in results)

    if has_errors:
        sys.exit(2)
    elif args.strict and has_warnings:
        sys.exit(1)
    else:
        sys.exit(0)


if __name__ == "__main__":
    main()
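The trailing-paragraph rule in `check_cover_overflow` can be tried out on its own. Below is a minimal sketch that uses only the standard-library `xml.etree.ElementTree`. The helper name `count_trailing_empty_paragraphs` and the sample body are hypothetical, and the sketch assumes the cover section's child elements are already collected in a list, the way `children` is used in the check.

```python
import xml.etree.ElementTree as ET

# WordprocessingML main namespace, the URI bound to the w: prefix in the templates
W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"


def count_trailing_empty_paragraphs(children: list[ET.Element]) -> int:
    """Count trailing <w:p> elements with no non-blank <w:t> text.

    Walks backwards and stops at the first non-paragraph element or the
    first paragraph that has visible text, like the loop in check_cover_overflow.
    """
    count = 0
    for child in reversed(children):
        # ElementTree tags look like "{namespace}p"; keep only the local name
        if child.tag.split("}")[-1] != "p":
            break
        texts = child.findall(f".//{{{W_NS}}}t")
        if any(t.text and t.text.strip() for t in texts):
            break
        count += 1
    return count


body = ET.fromstring(
    f'<w:body xmlns:w="{W_NS}">'
    "<w:p><w:r><w:t>Report Title</w:t></w:r></w:p>"
    "<w:p/>"
    '<w:p><w:r><w:t xml:space="preserve">   </w:t></w:r></w:p>'
    "<w:p/>"
    "</w:body>"
)
print(count_trailing_empty_paragraphs(list(body)))  # 3
```

A count above 2 is what makes the check report `cover-overflow` as an error. A paragraph that holds only spaces counts as empty here, just as it does in the check.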
3
skills/docx/scripts/templates/comments.xml
Executable file
@@ -0,0 +1,3 @@
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<w:comments xmlns:wpc="http://schemas.microsoft.com/office/word/2010/wordprocessingCanvas" xmlns:cx="http://schemas.microsoft.com/office/drawing/2014/chartex" xmlns:cx1="http://schemas.microsoft.com/office/drawing/2015/9/8/chartex" xmlns:cx2="http://schemas.microsoft.com/office/drawing/2015/10/21/chartex" xmlns:cx3="http://schemas.microsoft.com/office/drawing/2016/5/9/chartex" xmlns:cx4="http://schemas.microsoft.com/office/drawing/2016/5/10/chartex" xmlns:cx5="http://schemas.microsoft.com/office/drawing/2016/5/11/chartex" xmlns:cx6="http://schemas.microsoft.com/office/drawing/2016/5/12/chartex" xmlns:cx7="http://schemas.microsoft.com/office/drawing/2016/5/13/chartex" xmlns:cx8="http://schemas.microsoft.com/office/drawing/2016/5/14/chartex" xmlns:mc="http://schemas.openxmlformats.org/markup-compatibility/2006" xmlns:aink="http://schemas.microsoft.com/office/drawing/2016/ink" xmlns:am3d="http://schemas.microsoft.com/office/drawing/2017/model3d" xmlns:o="urn:schemas-microsoft-com:office:office" xmlns:oel="http://schemas.microsoft.com/office/2019/extlst" xmlns:r="http://schemas.openxmlformats.org/officeDocument/2006/relationships" xmlns:m="http://schemas.openxmlformats.org/officeDocument/2006/math" xmlns:v="urn:schemas-microsoft-com:vml" xmlns:wp14="http://schemas.microsoft.com/office/word/2010/wordprocessingDrawing" xmlns:wp="http://schemas.openxmlformats.org/drawingml/2006/wordprocessingDrawing" xmlns:w10="urn:schemas-microsoft-com:office:word" xmlns:w="http://schemas.openxmlformats.org/wordprocessingml/2006/main" xmlns:w14="http://schemas.microsoft.com/office/word/2010/wordml" xmlns:w15="http://schemas.microsoft.com/office/word/2012/wordml" xmlns:w16cex="http://schemas.microsoft.com/office/word/2018/wordml/cex" xmlns:w16cid="http://schemas.microsoft.com/office/word/2016/wordml/cid" xmlns:w16="http://schemas.microsoft.com/office/word/2018/wordml" xmlns:w16du="http://schemas.microsoft.com/office/word/2023/wordml/word16du" 
xmlns:w16sdtdh="http://schemas.microsoft.com/office/word/2020/wordml/sdtdatahash" xmlns:w16sdtfl="http://schemas.microsoft.com/office/word/2024/wordml/sdtformatlock" xmlns:w16se="http://schemas.microsoft.com/office/word/2015/wordml/symex" xmlns:wpg="http://schemas.microsoft.com/office/word/2010/wordprocessingGroup" xmlns:wpi="http://schemas.microsoft.com/office/word/2010/wordprocessingInk" xmlns:wne="http://schemas.microsoft.com/office/word/2006/wordml" xmlns:wps="http://schemas.microsoft.com/office/word/2010/wordprocessingShape" mc:Ignorable="w14 w15 w16se w16cid w16 w16cex w16sdtdh w16sdtfl w16du wp14">
</w:comments>
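The `comments.xml` template above is an empty `<w:comments>` root. Below is a minimal sketch of appending one comment with the standard library. `add_comment` is a hypothetical helper, and the author, date, and initials logic are placeholders. The sample root declares only the `w:` namespace, not the template's full list.

```python
import xml.etree.ElementTree as ET

W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
# Serialize this namespace as w: rather than ElementTree's default ns0:
ET.register_namespace("w", W_NS)


def w(local: str) -> str:
    """Build a {namespace}local name in the w: namespace."""
    return f"{{{W_NS}}}{local}"


def add_comment(comments: ET.Element, comment_id: int, author: str,
                date: str, text: str) -> ET.Element:
    """Append a <w:comment> containing a single plain-text paragraph."""
    comment = ET.SubElement(comments, w("comment"), {
        w("id"): str(comment_id),
        w("author"): author,
        w("date"): date,
        # Initials derived from the author name (an arbitrary choice for this sketch)
        w("initials"): "".join(part[0] for part in author.split()),
    })
    paragraph = ET.SubElement(comment, w("p"))
    run = ET.SubElement(paragraph, w("r"))
    ET.SubElement(run, w("t")).text = text
    return comment


# Stand-in root: the real template declares many more namespaces
comments = ET.fromstring(f'<w:comments xmlns:w="{W_NS}"/>')
add_comment(comments, 0, "Jane Doe", "2024-01-01T00:00:00Z", "Please cite a source.")
print(ET.tostring(comments, encoding="unicode"))
```

ElementTree writes namespace declarations only for the prefixes that are actually used. Re-serializing the real template this way would therefore drop declarations that `mc:Ignorable` still names. A DOM-based editor such as `XMLEditor` in `utilities.py` keeps the root element's declarations intact. Anchoring the comment in `document.xml` (with `w:commentRangeStart`, `w:commentRangeEnd`, and a `w:commentReference` run) is a separate step that is not shown here.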
3
skills/docx/scripts/templates/commentsExtended.xml
Executable file
@@ -0,0 +1,3 @@
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<w15:commentsEx xmlns:wpc="http://schemas.microsoft.com/office/word/2010/wordprocessingCanvas" xmlns:cx="http://schemas.microsoft.com/office/drawing/2014/chartex" xmlns:cx1="http://schemas.microsoft.com/office/drawing/2015/9/8/chartex" xmlns:cx2="http://schemas.microsoft.com/office/drawing/2015/10/21/chartex" xmlns:cx3="http://schemas.microsoft.com/office/drawing/2016/5/9/chartex" xmlns:cx4="http://schemas.microsoft.com/office/drawing/2016/5/10/chartex" xmlns:cx5="http://schemas.microsoft.com/office/drawing/2016/5/11/chartex" xmlns:cx6="http://schemas.microsoft.com/office/drawing/2016/5/12/chartex" xmlns:cx7="http://schemas.microsoft.com/office/drawing/2016/5/13/chartex" xmlns:cx8="http://schemas.microsoft.com/office/drawing/2016/5/14/chartex" xmlns:mc="http://schemas.openxmlformats.org/markup-compatibility/2006" xmlns:aink="http://schemas.microsoft.com/office/drawing/2016/ink" xmlns:am3d="http://schemas.microsoft.com/office/drawing/2017/model3d" xmlns:o="urn:schemas-microsoft-com:office:office" xmlns:oel="http://schemas.microsoft.com/office/2019/extlst" xmlns:r="http://schemas.openxmlformats.org/officeDocument/2006/relationships" xmlns:m="http://schemas.openxmlformats.org/officeDocument/2006/math" xmlns:v="urn:schemas-microsoft-com:vml" xmlns:wp14="http://schemas.microsoft.com/office/word/2010/wordprocessingDrawing" xmlns:wp="http://schemas.openxmlformats.org/drawingml/2006/wordprocessingDrawing" xmlns:w10="urn:schemas-microsoft-com:office:word" xmlns:w="http://schemas.openxmlformats.org/wordprocessingml/2006/main" xmlns:w14="http://schemas.microsoft.com/office/word/2010/wordml" xmlns:w15="http://schemas.microsoft.com/office/word/2012/wordml" xmlns:w16cex="http://schemas.microsoft.com/office/word/2018/wordml/cex" xmlns:w16cid="http://schemas.microsoft.com/office/word/2016/wordml/cid" xmlns:w16="http://schemas.microsoft.com/office/word/2018/wordml" xmlns:w16du="http://schemas.microsoft.com/office/word/2023/wordml/word16du" 
xmlns:w16sdtdh="http://schemas.microsoft.com/office/word/2020/wordml/sdtdatahash" xmlns:w16sdtfl="http://schemas.microsoft.com/office/word/2024/wordml/sdtformatlock" xmlns:w16se="http://schemas.microsoft.com/office/word/2015/wordml/symex" xmlns:wpg="http://schemas.microsoft.com/office/word/2010/wordprocessingGroup" xmlns:wpi="http://schemas.microsoft.com/office/word/2010/wordprocessingInk" xmlns:wne="http://schemas.microsoft.com/office/word/2006/wordml" xmlns:wps="http://schemas.microsoft.com/office/word/2010/wordprocessingShape" mc:Ignorable="w14 w15 w16se w16cid w16 w16cex w16sdtdh w16sdtfl w16du wp14">
</w15:commentsEx>
3
skills/docx/scripts/templates/commentsExtensible.xml
Executable file
@@ -0,0 +1,3 @@
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<w16cex:commentsExtensible xmlns:wpc="http://schemas.microsoft.com/office/word/2010/wordprocessingCanvas" xmlns:cx="http://schemas.microsoft.com/office/drawing/2014/chartex" xmlns:cx1="http://schemas.microsoft.com/office/drawing/2015/9/8/chartex" xmlns:cx2="http://schemas.microsoft.com/office/drawing/2015/10/21/chartex" xmlns:cx3="http://schemas.microsoft.com/office/drawing/2016/5/9/chartex" xmlns:cx4="http://schemas.microsoft.com/office/drawing/2016/5/10/chartex" xmlns:cx5="http://schemas.microsoft.com/office/drawing/2016/5/11/chartex" xmlns:cx6="http://schemas.microsoft.com/office/drawing/2016/5/12/chartex" xmlns:cx7="http://schemas.microsoft.com/office/drawing/2016/5/13/chartex" xmlns:cx8="http://schemas.microsoft.com/office/drawing/2016/5/14/chartex" xmlns:mc="http://schemas.openxmlformats.org/markup-compatibility/2006" xmlns:aink="http://schemas.microsoft.com/office/drawing/2016/ink" xmlns:am3d="http://schemas.microsoft.com/office/drawing/2017/model3d" xmlns:o="urn:schemas-microsoft-com:office:office" xmlns:oel="http://schemas.microsoft.com/office/2019/extlst" xmlns:r="http://schemas.openxmlformats.org/officeDocument/2006/relationships" xmlns:m="http://schemas.openxmlformats.org/officeDocument/2006/math" xmlns:v="urn:schemas-microsoft-com:vml" xmlns:wp14="http://schemas.microsoft.com/office/word/2010/wordprocessingDrawing" xmlns:wp="http://schemas.openxmlformats.org/drawingml/2006/wordprocessingDrawing" xmlns:w10="urn:schemas-microsoft-com:office:word" xmlns:w="http://schemas.openxmlformats.org/wordprocessingml/2006/main" xmlns:w14="http://schemas.microsoft.com/office/word/2010/wordml" xmlns:w15="http://schemas.microsoft.com/office/word/2012/wordml" xmlns:w16cex="http://schemas.microsoft.com/office/word/2018/wordml/cex" xmlns:w16cid="http://schemas.microsoft.com/office/word/2016/wordml/cid" xmlns:w16="http://schemas.microsoft.com/office/word/2018/wordml" xmlns:w16du="http://schemas.microsoft.com/office/word/2023/wordml/word16du" 
xmlns:w16sdtdh="http://schemas.microsoft.com/office/word/2020/wordml/sdtdatahash" xmlns:w16sdtfl="http://schemas.microsoft.com/office/word/2024/wordml/sdtformatlock" xmlns:w16se="http://schemas.microsoft.com/office/word/2015/wordml/symex" xmlns:wpg="http://schemas.microsoft.com/office/word/2010/wordprocessingGroup" xmlns:wpi="http://schemas.microsoft.com/office/word/2010/wordprocessingInk" xmlns:wne="http://schemas.microsoft.com/office/word/2006/wordml" xmlns:wps="http://schemas.microsoft.com/office/word/2010/wordprocessingShape" xmlns:cr="http://schemas.microsoft.com/office/comments/2020/reactions" mc:Ignorable="w14 w15 w16se w16cid w16 w16cex w16sdtdh w16sdtfl cr w16du wp14">
</w16cex:commentsExtensible>
3
skills/docx/scripts/templates/commentsIds.xml
Executable file
@@ -0,0 +1,3 @@
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<w16cid:commentsIds xmlns:wpc="http://schemas.microsoft.com/office/word/2010/wordprocessingCanvas" xmlns:cx="http://schemas.microsoft.com/office/drawing/2014/chartex" xmlns:cx1="http://schemas.microsoft.com/office/drawing/2015/9/8/chartex" xmlns:cx2="http://schemas.microsoft.com/office/drawing/2015/10/21/chartex" xmlns:cx3="http://schemas.microsoft.com/office/drawing/2016/5/9/chartex" xmlns:cx4="http://schemas.microsoft.com/office/drawing/2016/5/10/chartex" xmlns:cx5="http://schemas.microsoft.com/office/drawing/2016/5/11/chartex" xmlns:cx6="http://schemas.microsoft.com/office/drawing/2016/5/12/chartex" xmlns:cx7="http://schemas.microsoft.com/office/drawing/2016/5/13/chartex" xmlns:cx8="http://schemas.microsoft.com/office/drawing/2016/5/14/chartex" xmlns:mc="http://schemas.openxmlformats.org/markup-compatibility/2006" xmlns:aink="http://schemas.microsoft.com/office/drawing/2016/ink" xmlns:am3d="http://schemas.microsoft.com/office/drawing/2017/model3d" xmlns:o="urn:schemas-microsoft-com:office:office" xmlns:oel="http://schemas.microsoft.com/office/2019/extlst" xmlns:r="http://schemas.openxmlformats.org/officeDocument/2006/relationships" xmlns:m="http://schemas.openxmlformats.org/officeDocument/2006/math" xmlns:v="urn:schemas-microsoft-com:vml" xmlns:wp14="http://schemas.microsoft.com/office/word/2010/wordprocessingDrawing" xmlns:wp="http://schemas.openxmlformats.org/drawingml/2006/wordprocessingDrawing" xmlns:w10="urn:schemas-microsoft-com:office:word" xmlns:w="http://schemas.openxmlformats.org/wordprocessingml/2006/main" xmlns:w14="http://schemas.microsoft.com/office/word/2010/wordml" xmlns:w15="http://schemas.microsoft.com/office/word/2012/wordml" xmlns:w16cex="http://schemas.microsoft.com/office/word/2018/wordml/cex" xmlns:w16cid="http://schemas.microsoft.com/office/word/2016/wordml/cid" xmlns:w16="http://schemas.microsoft.com/office/word/2018/wordml" xmlns:w16du="http://schemas.microsoft.com/office/word/2023/wordml/word16du" 
xmlns:w16sdtdh="http://schemas.microsoft.com/office/word/2020/wordml/sdtdatahash" xmlns:w16sdtfl="http://schemas.microsoft.com/office/word/2024/wordml/sdtformatlock" xmlns:w16se="http://schemas.microsoft.com/office/word/2015/wordml/symex" xmlns:wpg="http://schemas.microsoft.com/office/word/2010/wordprocessingGroup" xmlns:wpi="http://schemas.microsoft.com/office/word/2010/wordprocessingInk" xmlns:wne="http://schemas.microsoft.com/office/word/2006/wordml" xmlns:wps="http://schemas.microsoft.com/office/word/2010/wordprocessingShape" mc:Ignorable="w14 w15 w16se w16cid w16 w16cex w16sdtdh w16sdtfl w16du wp14">
</w16cid:commentsIds>
3
skills/docx/scripts/templates/people.xml
Executable file
@@ -0,0 +1,3 @@
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<w15:people xmlns:w15="http://schemas.microsoft.com/office/word/2012/wordml">
</w15:people>
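`people.xml` declares only the `w15` namespace, so unlike the comment parts it round-trips cleanly through ElementTree. Below is a minimal sketch of registering a comment author. `add_person` is hypothetical, and the `providerId="None"` / `userId` pair is an assumption modeled on what Word typically writes for local (non-directory) accounts.

```python
import xml.etree.ElementTree as ET

W15_NS = "http://schemas.microsoft.com/office/word/2012/wordml"
ET.register_namespace("w15", W15_NS)


def q(local: str) -> str:
    """Build a {namespace}local name in the w15: namespace."""
    return f"{{{W15_NS}}}{local}"


def add_person(people: ET.Element, author: str) -> ET.Element:
    """Append a <w15:person> for author unless one is already listed."""
    for person in people.findall(q("person")):
        if person.get(q("author")) == author:
            return person  # already registered, nothing to add
    person = ET.SubElement(people, q("person"), {q("author"): author})
    # Assumed presence info for a local account (not a directory identity)
    ET.SubElement(person, q("presenceInfo"), {
        q("providerId"): "None",
        q("userId"): author,
    })
    return person


people = ET.fromstring(f'<w15:people xmlns:w15="{W15_NS}"/>')
add_person(people, "Jane Doe")
add_person(people, "Jane Doe")  # second call finds the existing entry
print(ET.tostring(people, encoding="unicode"))
```

Keeping the function idempotent means it can be called once per comment without first checking whether the author already exists.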
374
skills/docx/scripts/utilities.py
Executable file
@@ -0,0 +1,374 @@
#!/usr/bin/env python3
"""
Utilities for editing OOXML documents.

This module provides XMLEditor, a tool for manipulating XML files with support for
line-number-based node finding and DOM manipulation. Each element is automatically
annotated with its original line and column position during parsing.

Example usage:
    editor = XMLEditor("document.xml")

    # Find node by line number or range
    elem = editor.get_node(tag="w:r", line_number=519)
    elem = editor.get_node(tag="w:p", line_number=range(100, 200))

    # Find node by text content
    elem = editor.get_node(tag="w:p", contains="specific text")

    # Find node by attributes
    elem = editor.get_node(tag="w:r", attrs={"w:id": "target"})

    # Combine filters
    elem = editor.get_node(tag="w:p", line_number=range(1, 50), contains="text")

    # Replace, insert, or manipulate (these methods return a list of inserted nodes)
    new_nodes = editor.replace_node(elem, "<w:r><w:t>new text</w:t></w:r>")
    editor.insert_after(new_nodes[0], "<w:r><w:t>more</w:t></w:r>")

    # Save changes
    editor.save()
"""

import html
from pathlib import Path
from typing import Optional, Union

import defusedxml.minidom
import defusedxml.sax


class XMLEditor:
    """
    Editor for manipulating OOXML XML files with line-number-based node finding.

    This class parses XML files and tracks the original line and column position
    of each element. This enables finding nodes by their line number in the original
    file, which is useful when working with Read tool output.

    Attributes:
        xml_path: Path to the XML file being edited
        encoding: Detected encoding of the XML file ('ascii' or 'utf-8')
        dom: Parsed DOM tree with parse_position attributes on elements
    """

    def __init__(self, xml_path):
        """
        Initialize with path to XML file and parse with line number tracking.

        Args:
            xml_path: Path to XML file to edit (str or Path)

        Raises:
            ValueError: If the XML file does not exist
        """
        self.xml_path = Path(xml_path)
        if not self.xml_path.exists():
            raise ValueError(f"XML file not found: {xml_path}")

        with open(self.xml_path, "rb") as f:
            header = f.read(200).decode("utf-8", errors="ignore")
        self.encoding = "ascii" if 'encoding="ascii"' in header else "utf-8"

        parser = _create_line_tracking_parser()
        self.dom = defusedxml.minidom.parse(str(self.xml_path), parser)

    def get_node(
        self,
        tag: str,
        attrs: Optional[dict[str, str]] = None,
        line_number: Optional[Union[int, range]] = None,
        contains: Optional[str] = None,
    ):
        """
        Get a DOM element by tag and identifier.

        Finds an element by its line number in the original file, by matching
        attribute values, and/or by contained text. All given filters must match,
        and exactly one element must be found.

        Args:
            tag: The XML tag name (e.g., "w:del", "w:ins", "w:r")
            attrs: Dictionary of attribute name-value pairs to match (e.g., {"w:id": "1"})
            line_number: Line number (int) or line range (range) in original XML file (1-indexed)
            contains: Text that must appear in the element's concatenated text content
                (whitespace-only text nodes are ignored). Supports both entity notation
                (&#8220;) and Unicode characters (\u201c).

        Returns:
            defusedxml.minidom.Element: The matching DOM element

        Raises:
            ValueError: If node not found or multiple matches found

        Example:
            elem = editor.get_node(tag="w:r", line_number=519)
            elem = editor.get_node(tag="w:r", line_number=range(100, 200))
            elem = editor.get_node(tag="w:del", attrs={"w:id": "1"})
            elem = editor.get_node(tag="w:p", attrs={"w14:paraId": "12345678"})
            elem = editor.get_node(tag="w:commentRangeStart", attrs={"w:id": "0"})
            elem = editor.get_node(tag="w:p", contains="specific text")
            elem = editor.get_node(tag="w:t", contains="&#8220;Agreement")  # Entity notation
            elem = editor.get_node(tag="w:t", contains="\u201cAgreement")  # Unicode character
        """
        matches = []
        for elem in self.dom.getElementsByTagName(tag):
            # Check line_number filter
            if line_number is not None:
                parse_pos = getattr(elem, "parse_position", (None,))
                elem_line = parse_pos[0]

                # Handle both single line number and range
                if isinstance(line_number, range):
                    if elem_line not in line_number:
                        continue
                else:
                    if elem_line != line_number:
                        continue

            # Check attrs filter
            if attrs is not None:
                if not all(
                    elem.getAttribute(attr_name) == attr_value
                    for attr_name, attr_value in attrs.items()
                ):
                    continue

            # Check contains filter
            if contains is not None:
                elem_text = self._get_element_text(elem)
                # Normalize the search string: convert HTML entities to Unicode characters
                # This allows searching for both "&#8220;Rowan" and "“Rowan"
                normalized_contains = html.unescape(contains)
                if normalized_contains not in elem_text:
                    continue

            # If all applicable filters passed, this is a match
            matches.append(elem)

        if not matches:
            # Build descriptive error message
            filters = []
            if line_number is not None:
                line_str = (
                    f"lines {line_number.start}-{line_number.stop - 1}"
                    if isinstance(line_number, range)
                    else f"line {line_number}"
                )
                filters.append(f"at {line_str}")
            if attrs is not None:
                filters.append(f"with attributes {attrs}")
            if contains is not None:
                filters.append(f"containing '{contains}'")

            filter_desc = " ".join(filters) if filters else ""
            base_msg = f"Node not found: <{tag}> {filter_desc}".strip()

            # Add helpful hint based on filters used
            if contains:
                hint = "Text may be split across elements or use different wording."
            elif line_number:
                hint = "Line numbers may have changed if document was modified."
            elif attrs:
                hint = "Verify attribute values are correct."
            else:
                hint = "Try adding filters (attrs, line_number, or contains)."

            raise ValueError(f"{base_msg}. {hint}")
        if len(matches) > 1:
            raise ValueError(
                f"Multiple nodes found: <{tag}>. "
                f"Add more filters (attrs, line_number, or contains) to narrow the search."
            )
        return matches[0]

    def _get_element_text(self, elem):
        """
        Recursively extract all text content from an element.

        Skips text nodes that contain only whitespace (spaces, tabs, newlines),
        which typically represent XML formatting rather than document content.
        This also drops intentional space-only runs such as
        <w:t xml:space="preserve"> </w:t>, so search text should not rely on them.

        Args:
            elem: defusedxml.minidom.Element to extract text from

        Returns:
            str: Concatenated text from all non-whitespace text nodes within the element
        """
        text_parts = []
        for node in elem.childNodes:
            if node.nodeType == node.TEXT_NODE:
                # Skip whitespace-only text nodes (XML formatting)
                if node.data.strip():
                    text_parts.append(node.data)
            elif node.nodeType == node.ELEMENT_NODE:
                text_parts.append(self._get_element_text(node))
        return "".join(text_parts)

    def replace_node(self, elem, new_content):
        """
        Replace a DOM element with new XML content.

        Args:
            elem: defusedxml.minidom.Element to replace
            new_content: String containing XML to replace the node with

        Returns:
            List[defusedxml.minidom.Node]: All inserted nodes

        Example:
            new_nodes = editor.replace_node(old_elem, "<w:r><w:t>text</w:t></w:r>")
        """
        parent = elem.parentNode
        nodes = self._parse_fragment(new_content)
        for node in nodes:
            parent.insertBefore(node, elem)
        parent.removeChild(elem)
        return nodes

    def insert_after(self, elem, xml_content):
        """
        Insert XML content after a DOM element.

        Args:
            elem: defusedxml.minidom.Element to insert after
            xml_content: String containing XML to insert

        Returns:
            List[defusedxml.minidom.Node]: All inserted nodes

        Example:
            new_nodes = editor.insert_after(elem, "<w:r><w:t>text</w:t></w:r>")
        """
        parent = elem.parentNode
        next_sibling = elem.nextSibling
        nodes = self._parse_fragment(xml_content)
        for node in nodes:
            if next_sibling:
                parent.insertBefore(node, next_sibling)
            else:
                parent.appendChild(node)
        return nodes

    def insert_before(self, elem, xml_content):
        """
        Insert XML content before a DOM element.

        Args:
            elem: defusedxml.minidom.Element to insert before
            xml_content: String containing XML to insert

        Returns:
            List[defusedxml.minidom.Node]: All inserted nodes

        Example:
            new_nodes = editor.insert_before(elem, "<w:r><w:t>text</w:t></w:r>")
        """
        parent = elem.parentNode
        nodes = self._parse_fragment(xml_content)
        for node in nodes:
            parent.insertBefore(node, elem)
        return nodes

    def append_to(self, elem, xml_content):
        """
        Append XML content as a child of a DOM element.

        Args:
            elem: defusedxml.minidom.Element to append to
            xml_content: String containing XML to append

        Returns:
            List[defusedxml.minidom.Node]: All inserted nodes

        Example:
            new_nodes = editor.append_to(elem, "<w:r><w:t>text</w:t></w:r>")
        """
        nodes = self._parse_fragment(xml_content)
        for node in nodes:
            elem.appendChild(node)
        return nodes

    def get_next_rid(self):
        """Get the next available rId for relationships files."""
        max_id = 0
        for rel_elem in self.dom.getElementsByTagName("Relationship"):
            rel_id = rel_elem.getAttribute("Id")
            if rel_id.startswith("rId"):
                try:
                    max_id = max(max_id, int(rel_id[3:]))
                except ValueError:
                    pass
        return f"rId{max_id + 1}"

    def save(self):
        """
        Save the edited XML back to the file.

        Serializes the DOM tree and writes it back to the original file path,
        preserving the original encoding (ascii or utf-8).
        """
        content = self.dom.toxml(encoding=self.encoding)
        self.xml_path.write_bytes(content)

    def _parse_fragment(self, xml_content):
        """
        Parse XML fragment and return list of imported nodes.

        Args:
            xml_content: String containing XML fragment

        Returns:
            List of defusedxml.minidom.Node objects imported into this document

        Raises:
            AssertionError: If fragment contains no element nodes
        """
        # Extract namespace declarations from the root document element
        root_elem = self.dom.documentElement
        namespaces = []
        if root_elem and root_elem.attributes:
            for i in range(root_elem.attributes.length):
                attr = root_elem.attributes.item(i)
                if attr.name.startswith("xmlns"):  # type: ignore
                    namespaces.append(f'{attr.name}="{attr.value}"')  # type: ignore

        ns_decl = " ".join(namespaces)
        wrapper = f"<root {ns_decl}>{xml_content}</root>"
        fragment_doc = defusedxml.minidom.parseString(wrapper)
        nodes = [
            self.dom.importNode(child, deep=True)
            for child in fragment_doc.documentElement.childNodes  # type: ignore
        ]
        elements = [n for n in nodes if n.nodeType == n.ELEMENT_NODE]
        assert elements, "Fragment must contain at least one element"
        return nodes


def _create_line_tracking_parser():
    """
    Create a SAX parser that tracks line and column numbers for each element.

    Monkey patches the SAX content handler to store the current line and column
    position from the underlying expat parser onto each element as a parse_position
    attribute (line, column) tuple.

    Returns:
        defusedxml.sax.xmlreader.XMLReader: Configured SAX parser
    """

    def set_content_handler(dom_handler):
        def startElementNS(name, tagName, attrs):
            orig_start_cb(name, tagName, attrs)
            cur_elem = dom_handler.elementStack[-1]
            cur_elem.parse_position = (
                parser._parser.CurrentLineNumber,  # type: ignore
                parser._parser.CurrentColumnNumber,  # type: ignore
            )

        orig_start_cb = dom_handler.startElementNS
        dom_handler.startElementNS = startElementNS
        orig_set_content_handler(dom_handler)

    parser = defusedxml.sax.make_parser()
    orig_set_content_handler = parser.setContentHandler
    parser.setContentHandler = set_content_handler  # type: ignore
    return parser
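The position tracking in `_create_line_tracking_parser` relies on expat exposing `CurrentLineNumber` while a start-tag callback is running. The sketch below shows that mechanism directly with the standard-library `xml.parsers.expat` module. `element_lines` is a hypothetical helper. Unlike the module above, it uses plain expat rather than `defusedxml`, so it should not be fed untrusted input.

```python
import xml.parsers.expat


def element_lines(xml_text: str) -> list[tuple[str, int]]:
    """Return (tag, line) for every start tag, in document order (lines are 1-indexed)."""
    positions: list[tuple[str, int]] = []
    parser = xml.parsers.expat.ParserCreate()

    def start_element(name, attrs):
        # During a callback, expat's counters point at the event being reported
        positions.append((name, parser.CurrentLineNumber))

    parser.StartElementHandler = start_element
    parser.Parse(xml_text, True)
    return positions


sample = (
    '<w:document xmlns:w="urn:example">\n'
    "  <w:body>\n"
    "    <w:p><w:r><w:t>Hello</w:t></w:r></w:p>\n"
    "  </w:body>\n"
    "</w:document>"
)

for tag, line in element_lines(sample):
    print(line, tag)
```

Expat reports the line on which each start tag begins, so several elements on one line (here `w:p`, `w:r`, and `w:t`) share a line number. This is why `get_node` also accepts `attrs` and `contains` filters to disambiguate.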
177
skills/docx/setup.sh
Executable file
@@ -0,0 +1,177 @@
#!/usr/bin/env bash
# ---
# name: docx-setup
# author: Z.AI
# version: "1.0"
# description: Environment setup for the DOCX skill. Checks and installs all required dependencies.
# ---
#
# Installs only dependencies required by the DOCX skill.
set -euo pipefail

RED='\033[0;31m'; GREEN='\033[0;32m'; YELLOW='\033[1;33m'; BLUE='\033[0;34m'; NC='\033[0m'
ok() { echo -e " ${GREEN}✓${NC} $1"; }
fail() { echo -e " ${RED}✗${NC} $1"; }
warn() { echo -e " ${YELLOW}○${NC} $1"; }
info() { echo -e " ${BLUE}→${NC} $1"; }

echo "============================================"
echo " DOCX Skill — Environment Setup"
echo "============================================"
echo ""

OS="$(uname -s)"
ARCH="$(uname -m)"
echo "Platform: $OS $ARCH"
echo ""

# ── 0. macOS: Homebrew ──
if [ "$OS" = "Darwin" ]; then
    echo "--- Homebrew (macOS package manager) ---"
    if command -v brew &>/dev/null; then
        BREW_VER=$(brew --version 2>/dev/null | head -1)
        ok "brew ($BREW_VER)"
    else
        fail "brew not found — Node.js install needs Homebrew on macOS"
        info "Install: /bin/bash -c \"\$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/HEAD/install.sh)\""
    fi
    echo ""
fi

# ── 1. Node.js (docx-js runs on Node) ──
echo "--- Node.js ---"
if command -v node &>/dev/null; then
    NODE_VER=$(node --version)
    ok "node ($NODE_VER)"
else
    fail "node not found (required — docx generation uses docx-js on Node)"
    case "$OS" in
        Darwin) info "Install: brew install node" ;;
        Linux) info "Install: curl -fsSL https://deb.nodesource.com/setup_20.x | sudo -E bash -"
            info " sudo apt install -y nodejs" ;;
        *) info "Install: https://nodejs.org/" ;;
    esac
fi

# ── 2. npm ──
echo ""
echo "--- npm ---"
if command -v npm &>/dev/null; then
    NPM_VER=$(npm --version 2>/dev/null)
    ok "npm ($NPM_VER)"
else
    fail "npm not found"
    case "$OS" in
        Darwin) info "Install: brew install node (includes npm)" ;;
        Linux) info "Install: comes with nodejs" ;;
        *) info "Install: https://nodejs.org/" ;;
    esac
fi

# ── 3. npm package: docx ──
echo ""
echo "--- npm Packages ---"
# require() does not normally see packages installed with `npm install -g`,
# so fall back to asking npm about the global install
if node -e "require('docx')" 2>/dev/null || npm list -g docx &>/dev/null; then
    DOCX_VER=$(node -e "try{console.log(require('docx/package.json').version)}catch(e){console.log('installed')}" 2>/dev/null)
    ok "docx ($DOCX_VER)"
else
    fail "docx not installed"
    info "Install: npm install -g docx"
    echo ""
    if [ -t 0 ]; then
        read -p " Install now? [Y/n] " -n 1 -r REPLY
        echo ""
        REPLY=${REPLY:-Y}
    else
        warn "Non-interactive mode — skipping auto-install."
        REPLY=N
    fi
    if [[ ! $REPLY =~ ^[Nn]$ ]]; then
        npm install -g docx 2>/dev/null && ok "Installed: docx" || fail "npm install failed"
    fi
fi

# ── 4. Python 3 (post-processing scripts) ──
echo ""
echo "--- Python (post-processing) ---"
if command -v python3 &>/dev/null; then
    PY_VER=$(python3 --version 2>&1)
    ok "python3 ($PY_VER)"
    if [ "$OS" = "Darwin" ]; then
        # command -v is a shell builtin, unlike `which`, so it cannot be missing under set -e
        PY_PATH=$(command -v python3)
        if [[ "$PY_PATH" == "/usr/bin/python3" ]]; then
            warn "Using macOS system Python (limited). Recommend: brew install python3"
        fi
    fi
else
    fail "python3 not found"
    case "$OS" in
        Darwin) info "Install: brew install python3" ;;
        Linux) info "Install: sudo apt install python3 python3-pip (Debian/Ubuntu)"
            info " sudo dnf install python3 python3-pip (Fedora/RHEL)" ;;
        *) info "Install: https://www.python.org/downloads/" ;;
    esac
fi

# ── 5. pip ──
echo ""
echo "--- pip ---"
if python3 -m pip --version &>/dev/null; then
    PIP_VER=$(python3 -m pip --version 2>/dev/null | head -1)
    ok "pip ($PIP_VER)"
else
    fail "pip not found"
    case "$OS" in
        Darwin) info "Install: python3 -m ensurepip --upgrade"
            info " or: brew install python3 (includes pip)" ;;
        Linux) info "Install: sudo apt install python3-pip (Debian/Ubuntu)" ;;
        *) info "Install: python3 -m ensurepip --upgrade" ;;
    esac
fi

# ── 6. Python packages ──
echo ""
echo "--- Python Packages ---"
PY_PKGS=(
    "defusedxml:defusedxml"
)

MISSING_PY=()
for entry in "${PY_PKGS[@]}"; do
    mod="${entry%%:*}"
    pkg="${entry##*:}"
    if python3 -c "import $mod" 2>/dev/null; then
        ver=$(python3 -c "import $mod; print(getattr($mod, '__version__', 'installed'))" 2>/dev/null)
        ok "$pkg ($ver)"
    else
        fail "$pkg not installed"
        MISSING_PY+=("$pkg")
    fi
done

if [ ${#MISSING_PY[@]} -gt 0 ]; then
    echo ""
    if [ -t 0 ]; then
        read -p " Install missing Python packages? [Y/n] " -n 1 -r REPLY
        echo ""
        REPLY=${REPLY:-Y}
    else
        warn "Non-interactive mode — skipping auto-install. Run interactively or install manually."
        REPLY=N
    fi
    if [[ ! $REPLY =~ ^[Nn]$ ]]; then
        # Try a normal install, then --user, then --break-system-packages (PEP 668
        # externally managed environments); report success only if one of them worked
        if python3 -m pip install -q "${MISSING_PY[@]}" 2>/dev/null \
            || python3 -m pip install -q --user "${MISSING_PY[@]}" 2>/dev/null \
            || python3 -m pip install -q --break-system-packages "${MISSING_PY[@]}" 2>/dev/null; then
            ok "Installed: ${MISSING_PY[*]}"
        else
            fail "pip install failed. Try manually: pip install ${MISSING_PY[*]}"
        fi
    fi
fi

# ── Summary ──
echo ""
echo "============================================"
echo " Setup complete."
echo " Core: Node.js + docx (npm)"
echo " Post-processing: Python + defusedxml"
echo "============================================"
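A harness may want the same presence checks as `setup.sh` without the prompts or installs. Below is a minimal Python sketch of that. `missing_tools` and `missing_python_packages` are hypothetical helpers, not part of the skill. They roughly mirror `command -v` and the `module:package` convention of `PY_PKGS`, and they only report what is missing.

```python
import importlib.util
import shutil

# Same "module:package" convention as PY_PKGS in setup.sh
PY_PKGS = ["defusedxml:defusedxml"]
TOOLS = ["node", "npm", "python3"]


def missing_python_packages(entries: list[str]) -> list[str]:
    """Return pip package names whose import module cannot be found."""
    missing = []
    for entry in entries:
        module, _, package = entry.partition(":")
        # find_spec checks importability without actually importing the module
        if importlib.util.find_spec(module) is None:
            missing.append(package or module)
    return missing


def missing_tools(names: list[str]) -> list[str]:
    """Return executables that are not found on PATH."""
    return [name for name in names if shutil.which(name) is None]


print("missing tools:", missing_tools(TOOLS))
print("missing packages:", missing_python_packages(PY_PKGS))
```

Reporting only, rather than installing, keeps the sketch safe to run in CI. Installation stays with the interactive script.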