Initial commit

This commit is contained in:
Z User
2026-06-06 05:21:10 +00:00
Unverified
commit 6664758a6d
493 changed files with 135653 additions and 0 deletions

88
skills/docx/routes/comment.md Executable file

@@ -0,0 +1,88 @@
# Route: Add Comments
## Method 1: python-docx (Simple, Limited Support)
```python
from docx import Document
from docx.oxml.ns import qn
from docx.oxml import OxmlElement
from datetime import datetime


def add_comment(paragraph, comment_text, author="GLM", initials="G"):
    """Add a comment to an entire paragraph (stub, not yet implemented)."""
    # Placeholder ID only: hash() of a str changes between interpreter runs
    # and can collide, so a real implementation should use max(existing w:id) + 1.
    comment_id = str(hash(comment_text) % 10000)
    # Add to comments.xml (need to create if not exists)
    # ... complex XML manipulation required
    raise NotImplementedError("comment insertion is not implemented; see Method 2")


# Simpler approach: use python-docx-ng or manipulate XML directly
```
**Note**: python-docx has limited native comment support. For reliable results, use the OOXML method.
## Method 2: OOXML Direct Manipulation (Reliable)
### Step 1: Unpack
```bash
mkdir work && cd work && unzip ../input.docx
```
### Step 2: Create/update word/comments.xml
```xml
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<w:comments xmlns:w="http://schemas.openxmlformats.org/wordprocessingml/2006/main"
            xmlns:r="http://schemas.openxmlformats.org/officeDocument/2006/relationships">
  <w:comment w:id="1" w:author="Reviewer" w:date="2024-01-15T10:30:00Z" w:initials="R">
    <w:p>
      <w:r>
        <w:t>This section needs more detail.</w:t>
      </w:r>
    </w:p>
  </w:comment>
</w:comments>
```
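When several comments are added by script, each `<w:comment>` needs a unique `w:id` and XML-safe text. The following is a minimal sketch under those assumptions; `next_comment_id` and `comment_element` are hypothetical helper names, not part of any library, and the regex assumes the `w:` prefix used in the example above.

```python
import re
from datetime import datetime, timezone
from xml.sax.saxutils import escape, quoteattr


def next_comment_id(comments_xml: str) -> int:
    """Return max(existing w:id) + 1, or 1 if the part has no comments yet."""
    # \b after "comment" keeps the root <w:comments> element from matching.
    ids = [int(m) for m in re.findall(r'<w:comment\b[^>]*\bw:id="(\d+)"', comments_xml)]
    return max(ids) + 1 if ids else 1


def comment_element(comment_id, text, author="Reviewer", initials="R", date=None):
    """Build one <w:comment> element as a string, escaping text and attributes."""
    if date is None:
        date = datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
    return (
        f'<w:comment w:id="{comment_id}" w:author={quoteattr(author)} '
        f'w:date="{date}" w:initials={quoteattr(initials)}>'
        f'<w:p><w:r><w:t xml:space="preserve">{escape(text)}</w:t></w:r></w:p>'
        "</w:comment>"
    )
```

The returned string is meant to be inserted before the closing `</w:comments>` tag; the same ID is then reused for `commentRangeStart`, `commentRangeEnd`, and `commentReference` in `document.xml`.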
### Step 3: Mark comment range in document.xml
```xml
<w:commentRangeStart w:id="1"/>
<w:r><w:t>Text being commented on</w:t></w:r>
<w:commentRangeEnd w:id="1"/>
<w:r>
<w:rPr><w:rStyle w:val="CommentReference"/></w:rPr>
<w:commentReference w:id="1"/>
</w:r>
```
### Step 4: Update relationships
In `word/_rels/document.xml.rels`, add:
```xml
<Relationship Id="rIdComments" Type="http://schemas.openxmlformats.org/officeDocument/2006/relationships/comments" Target="comments.xml"/>
```
### Step 5: Update Content_Types
In `[Content_Types].xml`, ensure:
```xml
<Override PartName="/word/comments.xml" ContentType="application/vnd.openxmlformats-officedocument.wordprocessingml.comments+xml"/>
```
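The word "ensure" matters here: adding the override twice produces an invalid package. A minimal sketch of an idempotent check, assuming the file is handled as a string and uses the default (unprefixed) namespace; `ensure_comments_override` is a hypothetical helper name.

```python
COMMENTS_OVERRIDE = (
    '<Override PartName="/word/comments.xml" '
    'ContentType="application/vnd.openxmlformats-officedocument.wordprocessingml.comments+xml"/>'
)


def ensure_comments_override(content_types_xml: str) -> str:
    """Return the XML with the comments override present exactly once."""
    if 'PartName="/word/comments.xml"' in content_types_xml:
        return content_types_xml  # already registered, nothing to do
    head, closing, tail = content_types_xml.rpartition("</Types>")
    if not closing:
        raise ValueError("no closing </Types> tag found")
    return head + COMMENTS_OVERRIDE + closing + tail
```

The same pattern (check first, then insert before the closing root tag) would apply to the relationship entry in Step 4.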
### Step 6: Pack
```bash
zip -r ../output.docx . -x ".*"
```
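Where the `zip` CLI is unavailable, the pack step can be approximated with Python's standard `zipfile` module. This is a sketch, not a drop-in replacement: `pack_docx` is a hypothetical name, and the dotfile filter is intended to mirror `-x ".*"` by skipping only entries whose top-level name starts with a dot, so `_rels/.rels` is kept.

```python
import zipfile
from pathlib import Path


def pack_docx(work_dir, output_path):
    """Zip the unpacked package in work_dir into output_path."""
    work = Path(work_dir)
    with zipfile.ZipFile(output_path, "w", zipfile.ZIP_DEFLATED) as zf:
        for path in sorted(work.rglob("*")):
            rel = path.relative_to(work)
            # Skip directories and top-level dot entries such as .DS_Store.
            if path.is_dir() or rel.parts[0].startswith("."):
                continue
            zf.write(path, rel.as_posix())
    return output_path
```

As with the shell command, the output path should sit outside the working directory so the archive does not include itself.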
## When to Use Each Method
| Scenario | Method |
|----------|--------|
| Add 1-2 simple comments | OOXML |
| Batch review (many comments) | OOXML with Python script |
| Comment on specific words | OOXML (precise range control) |
| Quick annotation | python-docx if available |

207
skills/docx/routes/create.md Executable file

@@ -0,0 +1,207 @@
# Route: Create New Document
## Workflow
```
0. Check if user provided a reference template (PDF/docx) → if yes, use Template-Following Mode below
1. Load `references/design-system.md` → select palette and cover recipe
2. Load `references/common-rules.md` → shared layout, font, placeholder rules
3. Check user keywords → load scene file if applicable
4. Load `references/docx-js-core.md`
5. If complex → also load `references/docx-js-advanced.md`
6. Plan document structure (outline)
7. Write JS/TS using docx library
⚠️ **BEFORE writing any string**: scan ALL Chinese text for curly quotes `“”‘’` and replace with `\u201c \u201d \u2018 \u2019`; bare curly quotes break JS syntax (see docx-js-advanced.md § Quotes Escaping)
8. Run with `bun run generate.js` (or `node generate.js`)
9. If TOC → run `python3 "$DOCX_SCRIPTS/add_toc_placeholders.py" output.docx --auto`
10. Run post-generation checklist (see SKILL.md)
```
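The quote rule in step 7 can be applied mechanically before text is pasted into the generator script. A minimal sketch in Python, assuming the text is later embedded in a JS string literal; `escape_curly_quotes` is a hypothetical helper, not one of the skill's scripts.

```python
# Map each curly quote to the JS escape sequence named in the workflow.
CURLY_QUOTES = {
    "\u201c": "\\u201c",  # left double quote
    "\u201d": "\\u201d",  # right double quote
    "\u2018": "\\u2018",  # left single quote
    "\u2019": "\\u2019",  # right single quote
}


def escape_curly_quotes(text: str) -> str:
    """Replace curly quotes with literal \\uXXXX escapes; other text is unchanged."""
    for char, escaped in CURLY_QUOTES.items():
        text = text.replace(char, escaped)
    return text
```

The output contains a literal backslash followed by `u201c`, which the JS parser turns back into the curly quote at runtime.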
## Template-Following Mode
When the user provides a reference document (PDF/docx) as a **formatting template** (e.g., "generate following this template format", "refer to this sample"), switch to template-following mode instead of the standard recipe-based workflow:
1. **Extract the template's structure** — cover layout, section order, heading hierarchy, page breaks, special pages (e.g., advisor comments page, approval form)
2. **Replicate structure exactly** — every major structural unit becomes a **separate section** (cover, body, appendix/form pages) with appropriate margins and page breaks
3. **Fill content** from the user's content source, or generate per user instructions
4. **Preserve template-specific elements** — school-specific forms, signature areas, stamp placeholders, advisor comment blocks → reproduce as-is with placeholder text (e.g., "Advisor (signature):")
5. **Maintain formatting fidelity** — font choices, table layouts, spacing, and alignment should match the template, not the standard design-system palettes
⚠️ **Do NOT apply standard cover recipes (R1–R7) when a user-provided template defines its own cover format.** Follow the template's cover layout instead. Standard `common-rules.md` constraints (e.g., `WidthType.PERCENTAGE`, `allNoBorders` for cover wrapper, `Rule 8` line spacing) still apply for cross-engine compatibility.
⚠️ **Each distinct page type = separate section.** Cover section (margin: 0), body section (standard margins), appendix/form pages (may need different margins or orientation). Never place cover + body + appendix in a single section.
---
## Decision Tree
### Cover Page?
- **YES**: Reports, theses, proposals, plans, or 3+ page docs with clear title/author
- **NO**: Resumes, contracts, official documents, exam papers, short memos
### Cover Style Selector — Recipe Router
Covers use **7 validated layout recipes (R1–R7)**, auto-selected by `selectCoverRecipe()` in `references/design-system.md` (the **authoritative source**; do NOT duplicate the function).
**Quick Reference:**
| docType | Recipe | Default Palette |
|---------|--------|-----------------|
| contract / official / exam / resume | null (no cover) | — |
| academic | R5 (Clean White) | ACADEMIC |
| proposal_report (thesis proposal) | R5 (Clean White) | ACADEMIC |
| lesson_plan (STEM) | R4 (Top Color Block) | DM-1 |
| lesson_plan (arts/general) | R6 (Editorial Warm) | ED-1 |
| creative / branding / design | R3 (Centered Card Frame) | SN-2 |
| cultural / newsletter / internal | R6 (Editorial Warm) | ED-1 |
| activity / event | R6 (Editorial Warm) | ED-1 |
| trend/research (cultural/creative/brand) | R7 (Swiss Tech) | ST-1 |
| whitepaper | R2 (Double-Rule Frame) | IG-1 / CM-2 |
| consulting | R2 (Double-Rule Frame) | MIN-1 |
| proposal / plan | R4 (Top Color Block) | GO-1 |
| report | R1 (Pure Paragraph Left) | by industry |
| default | R1 (Pure Paragraph Left) | DS-1 |
⚠️ **Long title routing:** After selecting recipe, apply `applyLongTitleOverride(result, titleLength)`. Titles >20 chars on R3/R4/R6 → fall back to R1. Titles >30 chars on R2 → fall back to R1. R5 is never overridden.
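The long-title thresholds can be read as a small decision function. The sketch below restates them in Python for illustration only; the real `applyLongTitleOverride` lives in `references/design-system.md` and takes a `result` object whose shape is not shown here, so this simplified version works on the recipe name alone.

```python
def apply_long_title_override(recipe, title_length):
    """Illustrative restatement of the long-title rule (not the real JS function)."""
    if recipe in {"R3", "R4", "R6"} and title_length > 20:
        return "R1"
    if recipe == "R2" and title_length > 30:
        return "R1"
    # R5 is never overridden; other recipes and None (no cover) pass through.
    return recipe
```

For example, a 25-character title routed to R4 falls back to R1, while the same title on R2 stays on R2.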
⚠️ **Academic thesis cover:** Use `buildAcademicCover()` from `scenes/academic.md`.
⚠️ **Thesis proposal report (开题报告):** Use `buildProposalCover()` from `scenes/academic.md`. Cover MUST be an independent section. Keywords: "开题报告" (Chinese), "thesis proposal", "research proposal" — NOT the same as business proposals (which use R4).
### Table of Contents?
- **YES**: 3+ major sections (H1 headings)
- **NO**: Resumes, exam papers, short docs, contracts (<20 clauses)
→ See `references/toc.md` for the complete TOC reference (3-step process, code examples, common bugs).
### Headers/Footers?
- **YES** by default (page numbers minimum)
- **NO**: cover page section, official docs (special format)
### Load Math Formulas?
When: exam papers, academic papers, physics/math/chemistry → load `references/math-formulas.md`
### Load Chart Templates?
When: data visualization, reports with charts → load `references/chart-templates.md`
## Outline Rules
**User provides outline** → Follow EXACTLY. No additions, deletions, or reordering.
**No outline** → Create from scene template:
- **Academic:** Abstract → TOC → Body → References
- **Report:** Use `selectReportType()` to determine type, then follow template A–F:
- analysis → Template A (Executive Summary → Background → Scope & Method → Findings → Diagnosis → Conclusions)
- experiment → Template B (Abstract → Objective & Hypothesis → Environment → Procedure → Results → Error Analysis → Conclusions)
- testing → Template C (Overview → Scope & Environment → Test Plan → Results → Defects → Risks → Conclusions)
- research → Template D (Summary → Background → Subjects & Method → Sample → Findings → Synthesis → Recommendations)
- review → Template E (Overview → Goals → Review → Results → Issues → Lessons → Action Plan)
- proposal → Template F (Summary → Status → Goals → Solution → Roadmap → Resources → Risks → Benefits)
- **Contract:** Use `selectContractType()`, then follow template A–E:
- bilateral → Template A (Header → Parties → Recitals → Definitions → Subject → Price → Rights → Delivery → Tax → IP → Breach → Force Majeure → Termination → Notices → Dispute → Miscellaneous → Signature)
- transfer → Template B (Header → Recitals → Definitions → Subject → Consideration → Closing → Representations → Tax → Breach → Dispute → Signature)
- nda → Template C (Header → Recitals → Definition → Obligations → Use Restrictions → Return/Destroy → Exceptions → Duration → Breach → Dispute → Signature)
- framework → Template D (Header → Recitals → Purpose → Scope → Division → Mechanism → Commercial → Confidentiality → Term → Breach → Dispute → Signature)
- terms → Template E (Title → Definitions → Services → Rights → Liability → Fees → IP → Termination → Notices → Dispute → Miscellaneous)
- **Official:** Use `selectOfficialType()` + `needsRedHeader()`:
- notice → Template A ([Red header] → [Doc number] → Title → Addressee → Reason → Items → Requirements → [Attachments] → [Signature] → [Date] → [Colophon])
- letter → Template B ([Red header] → [Doc number] → Title → Addressee → Reason → Negotiation/Reply → Closing → [Signature] → [Date])
- reply → Template C ([Red header] → [Doc number] → Title → Addressee → Reference → Reply → "This is the reply." → Signature → Date)
- minutes → Template D (Title → Meeting Overview → Agreed Items → Responsibilities → [Distribution]) — typically no red header
- Present outline to user before generating when possible
## Scene Completeness
Include ALL elements a scene specifies:
- **Academic thesis:** Cover (`buildAcademicCover()` in its own section), abstract, TOC, references
- **Thesis proposal report (thesis proposal / 开题报告):** Cover (`buildProposalCover()` in its own section), body sections per proposal template. Cover MUST be a separate section.
- **Report:** Cover, executive summary, conclusions
- **Contract:** Party info, recitals, complete clause closure, signature block, uniform `【】` placeholders
- **Official:** Correct document type, specific title, closing phrase matching type, proper numbering hierarchy, red header only when requested
- **Exam:** Student info area, scoring criteria
Generate complete, substantive content — not skeletons.
## Content Guidelines
- **Length**: "detailed report" = 3000+ words. "brief summary" = 500–1000 words.
- **Data**: Use user's data, or generate realistic placeholders
- **Charts**: Use `references/chart-templates.md` matplotlib templates → PNG → embed
- **Math**: Use `references/math-formulas.md` LaTeX → docx-js Math mapping
- **Tables**: For structured data, not layout
- **Numbering**: Figures, tables numbered sequentially with cross-references
## Code Architecture
### Heading Style Rule (Mandatory)
**All body chapter headings MUST use `heading: HeadingLevel.HEADING_X`** — never simulate with bold + large font (TOC cannot detect simulated headings).
**Exception:** Cover title and TOC title ("目录") heading MUST NOT use Heading style.
### Blank Page Prevention
→ See SKILL.md § Post-Generation checklist for the full set of rules.
Key rules:
1. No double page breaks (SectionType.NEXT_PAGE + PageBreak = blank page)
2. PageBreak paragraphs should have visible text content
3. No more than 3 consecutive empty paragraphs
4. Cover section: ≤2 trailing empty paragraphs, no trailing PageBreak
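Rule 3 can be checked on the generated file's `word/document.xml`. The sketch below is an assumption-laden approximation, not the logic of `postcheck.py`: it treats a paragraph as empty when it has no non-blank `w:t` text and no drawing, picture, object, or break, and it only looks at direct children of `w:body`. The helper names are hypothetical.

```python
import xml.etree.ElementTree as ET

W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
NON_TEXT_CONTENT = ("drawing", "pict", "object", "br")


def is_empty_paragraph(p):
    """True if the paragraph has no visible text and no image or break."""
    if any((t.text or "").strip() for t in p.iter(f"{{{W}}}t")):
        return False
    return not any(p.find(f".//{{{W}}}{name}") is not None for name in NON_TEXT_CONTENT)


def longest_empty_run(document_xml):
    """Length of the longest run of consecutive empty body paragraphs."""
    body = ET.fromstring(document_xml).find(f"{{{W}}}body")
    longest = current = 0
    for child in (body if body is not None else []):
        if child.tag == f"{{{W}}}p" and is_empty_paragraph(child):
            current += 1
            longest = max(longest, current)
        else:
            current = 0  # text, tables, and sectPr all break the run
    return longest
```

A result above 3 would indicate a violation of rule 3. For untrusted input, `defusedxml.ElementTree` could be substituted for the standard parser.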
### Builder Pattern Example
```js
const { Document, Packer, Paragraph, TextRun, Header, Footer,
        AlignmentType, HeadingLevel, PageNumber } = require("docx");
const fs = require("fs");

// 1. Palette
const P = { primary: "#101820", body: "#182030", secondary: "#506070", accent: "#8090A0" };
const c = (hex) => hex.replace("#", "");

// 2. Component builders
function heading(text, level = HeadingLevel.HEADING_1) {
  return new Paragraph({
    heading: level,
    spacing: { before: level === HeadingLevel.HEADING_1 ? 360 : 240, after: 120 },
    children: [new TextRun({ text, bold: true, color: c(P.primary), font: { ascii: "Calibri", eastAsia: "SimHei" } })],
  });
}

function body(text) {
  return new Paragraph({
    alignment: AlignmentType.JUSTIFIED,
    indent: { firstLine: 480 },
    spacing: { line: 312 },
    children: [new TextRun({ text, size: 24, color: c(P.body) })],
  });
}

// 3. Assembly: cover + body in separate sections
const doc = new Document({
  styles: { default: { document: {
    run: { font: { ascii: "Calibri", eastAsia: "Microsoft YaHei" }, size: 24, color: c(P.body) },
    paragraph: { spacing: { line: 312 } },
  }}},
  sections: [
    { properties: { page: { margin: { top: 0, bottom: 0, left: 0, right: 0 } } },
      children: buildCoverR1(config) }, // ← use recipe from design-system.md
    { properties: { page: { margin: { top: 1440, bottom: 1440, left: 1701, right: 1417 } } },
      footers: { default: new Footer({ children: [new Paragraph({ alignment: AlignmentType.CENTER,
        children: [new TextRun({ children: [PageNumber.CURRENT], size: 18 })] })] }) },
      children: [heading("Chapter 1"), body("Content...")] },
  ],
});

Packer.toBuffer(doc).then((buf) => { fs.writeFileSync("output.docx", buf); });
```
## Post-Generation
→ See SKILL.md § Post-Generation for the complete two-layer verification checklist.
```bash
python3 "$DOCX_SCRIPTS/postcheck.py" output.docx
```
⚠️ **Running postcheck.py is MANDATORY.** Fix all ❌ errors before delivering.

115
skills/docx/routes/edit.md Executable file

@@ -0,0 +1,115 @@
# Route: Edit Existing Document
## Workflow Overview
```
1. Receive .docx (or .doc → convert)
2. Unpack → working directory
3. Analyze structure (document.xml, styles.xml)
4. Plan changes → batch by type
5. Implement via Document library (Python)
6. Pack → output.docx
7. Verify (pandoc or visual)
```
## Step 0: Format Conversion
```bash
# .doc → .docx
libreoffice --headless --convert-to docx input.doc
```
## Step 1: Unpack
```bash
mkdir -p work_dir && cd work_dir && unzip ../input.docx
```
Key files: `word/document.xml` (content), `word/styles.xml` (styles), `word/numbering.xml` (lists), `word/media/` (images), `[Content_Types].xml`, `word/_rels/document.xml.rels`
## Step 2: Plan Changes
Group changes into batches, process in order:
1. **Structural** — Add/remove sections, reorder paragraphs
2. **Style** — Font, size, color modifications
3. **Text** — Find/replace, fix typos
4. **Table** — Add/remove rows/columns, update data
5. **Image** — Replace/add images
## Step 3: Implement
Load `references/ooxml.md` for the full Document library API. Key patterns:
```python
from scripts.document import Document
doc = Document('work_dir')
# Text replacement with tracked changes
node = doc["word/document.xml"].get_node(tag="w:r", contains="old text")
rpr = tags[0].toxml() if (tags := node.getElementsByTagName("w:rPr")) else ""
replacement = f'<w:del><w:r>{rpr}<w:delText>old text</w:delText></w:r></w:del><w:ins><w:r>{rpr}<w:t>new text</w:t></w:r></w:ins>'
doc["word/document.xml"].replace_node(node, replacement)
doc.save()
```
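The f-string above interpolates the old and new text directly, so a `&` or `<` in either would produce malformed XML. A hedged sketch of a helper that escapes both and preserves leading or trailing spaces; `tracked_replacement` is a hypothetical name, and whether the project's `replace_node` adds revision attributes such as `w:id`, `w:author`, or `w:date` is not shown in this file.

```python
from xml.sax.saxutils import escape


def tracked_replacement(rpr_xml: str, old_text: str, new_text: str) -> str:
    """Build a <w:del>/<w:ins> pair with XML-escaped text.

    rpr_xml is the serialized <w:rPr> of the original run, or "" if none.
    """
    old, new = escape(old_text), escape(new_text)
    return (
        f'<w:del><w:r>{rpr_xml}<w:delText xml:space="preserve">{old}</w:delText></w:r></w:del>'
        f'<w:ins><w:r>{rpr_xml}<w:t xml:space="preserve">{new}</w:t></w:r></w:ins>'
    )
```

The result has the same shape as the `replacement` string in the example, so it could be passed to `replace_node` in its place.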
## Step 4: Pack
```bash
cd work_dir && zip -r ../output.docx . -x ".*"
```
## Step 5: Verify
```bash
pandoc output.docx -t plain -o /dev/stdout | head -50
# or visual
libreoffice --headless --convert-to pdf output.docx
```
---
## Template Matching Workflow
When user says "use this format" or provides a template:
1. Unpack template, extract `styles.xml`, `numbering.xml`
2. Analyze font/size/spacing/margins
3. Copy `styles.xml` into target document
4. Match heading hierarchy and spacing
## Multi-File Merge
1. Use first document as base
2. Extract content from additional documents
3. Insert with page breaks between sections
4. Merge styles (prefer base document's)
5. Re-number figures/tables sequentially
## Redlining (Tracked Changes) — Default for Revisions
When user asks for revisions, **default to tracked changes** so they can review:
```python
doc = Document('work_dir', track_revisions=True)
# ... make changes using replace_node with <w:del>/<w:ins>
doc.save()
```
Ask the user whether they want clean output or tracked changes only when the request is ambiguous.
## Common Operations Quick Reference
| Operation | Approach |
|-----------|----------|
| Replace text | `get_node` + `replace_node` with tracked changes |
| Change font | Modify `<w:rFonts>` in run properties |
| Add paragraph | `insert_after` with `<w:p>` element |
| Delete paragraph | `suggest_deletion` on `<w:p>` |
| Add table row | Clone `<w:tr>`, modify cells |
| Update header | Edit `word/headerN.xml` |
| Change margins | Edit `<w:pgMar>` in `<w:sectPr>` |
| Add image | See `references/ooxml.md` image insertion pattern |
| Add comment | `doc.add_comment(start, end, text)` |

120
skills/docx/routes/format.md Executable file

@@ -0,0 +1,120 @@
# Route: Format / Layout
## Workflow
```
1. Read current document (pandoc for content, unpack for structure)
2. Identify format requirements from user
3. Use unit conversion table (see SKILL.md)
4. Apply formatting via OOXML manipulation or python-docx
5. Pack and verify
```
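The unit conversion table itself is in SKILL.md and is not reproduced here. As a sketch of the arithmetic it relies on, the helpers below use the standard OOXML relationships (1 pt = 20 twips, 1 inch = 1440 twips = 2.54 cm, font sizes in half-points, proportional line spacing in 240ths of a line); the function names are hypothetical.

```python
def pt_to_twips(pt):
    """Points to twips (1 pt = 20 twips)."""
    return round(pt * 20)


def cm_to_twips(cm):
    """Centimetres to twips (1 inch = 2.54 cm = 1440 twips)."""
    return round(cm / 2.54 * 1440)


def pt_to_half_points(pt):
    """Points to half-points, the unit of w:sz and the docx-js `size` option."""
    return round(pt * 2)


def line_multiple_to_240ths(multiple):
    """Line-spacing multiple to the w:spacing w:line value (240 = single)."""
    return round(multiple * 240)
```

These reproduce values used elsewhere in the skill: a 3.0 cm margin is 1701 twips, 2.5 cm is 1417, and 1.3x line spacing is 312.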
## Quick Formatting via python-docx
For simple formatting tasks, python-docx is often faster than raw XML:
```python
from docx import Document as PythonDocument
from docx.enum.text import WD_ALIGN_PARAGRAPH
from docx.oxml.ns import qn
from docx.shared import Pt, Twips

doc = PythonDocument("input.docx")

# Change all body paragraph formatting
for para in doc.paragraphs:
    if para.style.name.startswith("Heading"):
        continue
    para.paragraph_format.first_line_indent = Twips(420)
    para.paragraph_format.line_spacing = 1.5
    para.paragraph_format.alignment = WD_ALIGN_PARAGRAPH.JUSTIFY
    for run in para.runs:
        run.font.name = "宋体"
        # font.name sets only the ASCII/hAnsi font; CJK text also needs w:eastAsia
        run._element.rPr.rFonts.set(qn("w:eastAsia"), "宋体")
        run.font.size = Pt(12)  # Xiao Si 小四

doc.save("output.docx")
```
## Common Format Request Patterns
### University Thesis Formatting
Typical Chinese university thesis requirements:
```python
from docx.shared import Cm, Pt, Twips

# Margins (`doc` is an already-opened python-docx Document)
for section in doc.sections:
    section.top_margin = Cm(2.5)
    section.bottom_margin = Cm(2.5)
    section.left_margin = Cm(3.0)
    section.right_margin = Cm(2.5)

# Fonts
# Body: SimSun 宋体 Xiao Si 小四 (12pt)
# H1: SimHei 黑体 San Hao 三号 (16pt) centered
# H2: SimHei 黑体 Si Hao 四号 (14pt)
# H3: SimHei 黑体 Xiao Si 小四 (12pt)
# English: Times New Roman, same sizes
```
### Page Numbers Starting from Specific Page
Use multi-section approach:
```python
# Section 1: Front matter (Roman numerals)
# Section 2: Main content (Arabic, starting from 1)
# This requires OOXML manipulation — see routes/edit.md for unpack/pack workflow
```
In raw XML (`word/document.xml`):
```xml
<w:sectPr>
<w:pgNumType w:fmt="upperRoman" w:start="1"/>
</w:sectPr>
<!-- New section -->
<w:sectPr>
<w:pgNumType w:fmt="decimal" w:start="1"/>
</w:sectPr>
```
### Different Headers Per Section
Each section in a .docx can have its own header/footer. See `references/docx-js-advanced.md` for the multi-section approach.
For existing documents, modify `word/document.xml` to split `<w:sectPr>` and create separate `headerN.xml` files.
### Font Size Conversion
When user requests a Chinese font size name:
| Request | Action |
|---------|--------|
| "Change to Wu Hao (5th) size" | `font.size = Pt(10.5)` or `size: 21` in docx-js |
| "Title in San Hao SimHei" | `font.size = Pt(16)`, `font.name = "SimHei"` |
| "Body in Xiao Si SimSun" | `font.size = Pt(12)`, `font.name = "SimSun"` |
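The lookups in this table can be kept in one mapping so both python-docx (points) and docx-js (half-points) values come from the same source. A minimal sketch covering the sizes mentioned in this file plus two neighbouring standard sizes (小三 = 15pt, 小五 = 9pt); `CHINESE_FONT_SIZES_PT` and `font_size_for` are hypothetical names.

```python
# Chinese font size names mapped to points.
CHINESE_FONT_SIZES_PT = {
    "三号": 16,    # San Hao
    "小三": 15,    # Xiao San
    "四号": 14,    # Si Hao
    "小四": 12,    # Xiao Si
    "五号": 10.5,  # Wu Hao
    "小五": 9,     # Xiao Wu
}


def font_size_for(name):
    """Return the size in points and in docx-js half-points for a size name."""
    pt = CHINESE_FONT_SIZES_PT[name]
    return {"pt": pt, "docx_js_size": int(pt * 2)}
```

With python-docx the `pt` value would be passed to `Pt(...)`; with docx-js the `docx_js_size` value is used as `size`.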
### Line Spacing Adjustment
```python
from docx.enum.text import WD_LINE_SPACING
from docx.shared import Pt

# 1.0x spacing
para.paragraph_format.line_spacing_rule = WD_LINE_SPACING.MULTIPLE
para.paragraph_format.line_spacing = 1.0

# 1.3x spacing (our default)
para.paragraph_format.line_spacing = 1.3

# Fixed spacing (e.g., 28pt)
para.paragraph_format.line_spacing_rule = WD_LINE_SPACING.EXACTLY
para.paragraph_format.line_spacing = Pt(28)
```
## Verification
After formatting changes:
1. Open in LibreOffice or convert to PDF for visual check
2. Extract text with pandoc to ensure content unchanged
3. Compare file sizes (formatting-only changes shouldn't dramatically change size)
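Check 2 can also be done without pandoc by comparing the concatenated `w:t` text of both files. The sketch below assumes only the main document part matters (headers, footers, and footnotes are ignored) and uses the standard-library parser; `docx_text` and `same_text` are hypothetical names.

```python
import zipfile
import xml.etree.ElementTree as ET

W_T = "{http://schemas.openxmlformats.org/wordprocessingml/2006/main}t"


def docx_text(path):
    """Concatenate all w:t text in word/document.xml."""
    with zipfile.ZipFile(path) as zf:
        root = ET.fromstring(zf.read("word/document.xml"))
    return "".join(t.text or "" for t in root.iter(W_T))


def same_text(before_path, after_path):
    """True if both documents contain identical body text."""
    return docx_text(before_path) == docx_text(after_path)
```

Because runs are concatenated without separators, a formatting change that splits one run into several still compares equal, which is the desired behaviour for a formatting-only edit.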

114
skills/docx/routes/read.md Executable file

@@ -0,0 +1,114 @@
# Route: Read / Analyze / Extract
## Method 1: Text Extraction via pandoc (Fastest)
```bash
# Plain text
pandoc input.docx -t plain -o output.txt
# Markdown (preserves structure)
pandoc input.docx -t markdown -o output.md
# Extract with metadata
pandoc input.docx -t markdown --standalone -o output.md
```
**Best for**: Quick content reading, text analysis, word count, searching.
## Method 2: Raw XML Access (Detailed)
```bash
mkdir work && cd work && unzip ../input.docx
# Read main content
cat word/document.xml
# Read styles
cat word/styles.xml
# List embedded media
ls word/media/
# Read headers/footers
cat word/header1.xml
cat word/footer1.xml
```
**Best for**: Analyzing formatting, extracting styles, inspecting document structure, debugging layout issues.
### Quick XML Parsing
```python
import defusedxml.ElementTree as ET

tree = ET.parse("word/document.xml")
W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
ns = {"w": W}

# Extract all text
texts = []
for t in tree.iter(f"{{{W}}}t"):
    if t.text:
        texts.append(t.text)
full_text = "".join(texts)

# Count paragraphs
paras = tree.findall(".//w:p", ns)
print(f"Paragraphs: {len(paras)}")

# Find headings
for para in paras:
    pPr = para.find("w:pPr", ns)
    if pPr is not None:
        pStyle = pPr.find("w:pStyle", ns)
        if pStyle is not None:
            # Read the style ID once; avoids a backslash inside an f-string expression
            style_id = pStyle.get(f"{{{W}}}val", "")
            if "Heading" in style_id:
                text = "".join(t.text for t in para.iter(f"{{{W}}}t") if t.text)
                print(f"  {style_id}: {text}")
```
## Method 3: Convert to Images (Visual Analysis)
```bash
# Convert to PDF first
libreoffice --headless --convert-to pdf input.docx
# Then to images
pdftoppm -png -r 200 input.pdf page
# Generates page-1.png, page-2.png, etc.
```
**Best for**: Visual layout analysis, comparing formatting, generating previews, when user asks "what does it look like".
## Method 4: python-docx Reading
```python
from docx import Document

doc = Document("input.docx")

# Read paragraphs
for para in doc.paragraphs:
    print(f"[{para.style.name}] {para.text}")

# Read tables
for table in doc.tables:
    for row in table.rows:
        print([cell.text for cell in row.cells])

# Document properties
print(f"Sections: {len(doc.sections)}")
print(f"Paragraphs: {len(doc.paragraphs)}")
print(f"Tables: {len(doc.tables)}")
```
## Choosing the Right Method
| Need | Method |
|------|--------|
| Quick text content | pandoc |
| Document structure/outline | pandoc → markdown |
| Formatting details | Raw XML |
| Table data extraction | python-docx |
| Visual appearance | Convert to images |
| Style analysis | Raw XML (styles.xml) |
| Word/character count | pandoc → plain → wc |