Initial commit

2026-06-06 05:21:10 +00:00
commit 6664758a6d
493 changed files with 135653 additions and 0 deletions
--- a/skills/docx/routes/comment.md
+++ b/skills/docx/routes/comment.md
@@ -0,0 +1,88 @@
+# Route: Add Comments
+
+## Method 1: python-docx (Simple, Whole Paragraphs)
+
+```python
+from docx import Document
+
+def add_comment(document, paragraph, comment_text, author="GLM", initials="G"):
+    """Add a comment anchored to every run of a paragraph.
+
+    Assumes python-docx >= 1.2.0, which added Document.add_comment();
+    older releases have no comment API, so use the OOXML method instead.
+    """
+    if not paragraph.runs:
+        raise ValueError("paragraph has no runs to anchor a comment to")
+    # python-docx allocates the comment id itself. Deriving one from
+    # hash(comment_text) changes between interpreter runs and can collide.
+    return document.add_comment(
+        runs=paragraph.runs, text=comment_text, author=author, initials=initials
+    )
+```
+
+**Note**: Native comment support in python-docx is recent and anchors to whole runs only. For older versions, or to comment on part of a run, use the OOXML method.
+
+## Method 2: OOXML Direct Manipulation (Reliable)
+
+### Step 1: Unpack
+
+```bash
+mkdir work && cd work && unzip ../input.docx
+```
+
+### Step 2: Create/update word/comments.xml
+
+```xml
+<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
+<w:comments xmlns:w="http://schemas.openxmlformats.org/wordprocessingml/2006/main"
+            xmlns:r="http://schemas.openxmlformats.org/officeDocument/2006/relationships">
+  <w:comment w:id="1" w:author="Reviewer" w:date="2024-01-15T10:30:00Z" w:initials="R">
+    <w:p>
+      <w:r>
+        <w:t>This section needs more detail.</w:t>
+      </w:r>
+    </w:p>
+  </w:comment>
+</w:comments>
+```
+
+### Step 3: Mark comment range in document.xml
+
+```xml
+<w:commentRangeStart w:id="1"/>
+<w:r><w:t>Text being commented on</w:t></w:r>
+<w:commentRangeEnd w:id="1"/>
+<w:r>
+  <w:rPr><w:rStyle w:val="CommentReference"/></w:rPr>
+  <w:commentReference w:id="1"/>
+</w:r>
+```
+
+### Step 4: Update relationships
+
+In `word/_rels/document.xml.rels`, add:
+```xml
+<Relationship Id="rIdComments" Type="http://schemas.openxmlformats.org/officeDocument/2006/relationships/comments" Target="comments.xml"/>
+```
+
+### Step 5: Update Content_Types
+
+In `[Content_Types].xml`, ensure:
+```xml
+<Override PartName="/word/comments.xml" ContentType="application/vnd.openxmlformats-officedocument.wordprocessingml.comments+xml"/>
+```
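+
+Steps 4 and 5 are easy to duplicate by accident when a document already has comments. A minimal standard-library sketch of the Step 5 check; `ensure_comments_override` is a hypothetical helper name, and the same find-then-append pattern would apply to the relationship in Step 4:
+
+```python
+import xml.etree.ElementTree as ET
+
+CT_NS = "http://schemas.openxmlformats.org/package/2006/content-types"
+COMMENTS_PART = "/word/comments.xml"
+COMMENTS_TYPE = (
+    "application/vnd.openxmlformats-officedocument."
+    "wordprocessingml.comments+xml"
+)
+
+def ensure_comments_override(content_types_xml: bytes) -> bytes:
+    """Return [Content_Types].xml bytes with the comments Override present once."""
+    ET.register_namespace("", CT_NS)  # keep the default namespace unprefixed
+    root = ET.fromstring(content_types_xml)
+    for override in root.findall(f"{{{CT_NS}}}Override"):
+        if override.get("PartName") == COMMENTS_PART:
+            return content_types_xml  # already declared, leave untouched
+    ET.SubElement(
+        root,
+        f"{{{CT_NS}}}Override",
+        {"PartName": COMMENTS_PART, "ContentType": COMMENTS_TYPE},
+    )
+    return ET.tostring(root, encoding="UTF-8", xml_declaration=True)
+```
+
+Read `[Content_Types].xml` from the unpacked directory in binary mode, pass the bytes through this function, and write the result back before packing.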
+
+### Step 6: Pack
+
+```bash
+zip -r ../output.docx . -x ".*"
+```
+
+## When to Use Each Method
+
+| Scenario | Method |
+|----------|--------|
+| Add 1-2 simple comments | OOXML |
+| Batch review (many comments) | OOXML with Python script |
+| Comment on specific words | OOXML (precise range control) |
+| Quick annotation | python-docx if available |
--- a/skills/docx/routes/create.md
+++ b/skills/docx/routes/create.md
@@ -0,0 +1,207 @@
+# Route: Create New Document
+
+## Workflow
+
+```
+0. Check if user provided a reference template (PDF/docx) → if yes, use Template-Following Mode below
+1. Load `references/design-system.md` → select palette and cover recipe
+2. Load `references/common-rules.md` → shared layout, font, placeholder rules
+3. Check user keywords → load scene file if applicable
+4. Load `references/docx-js-core.md`
+5. If complex → also load `references/docx-js-advanced.md`
+6. Plan document structure (outline)
+7. Write JS/TS using docx library
+   ⚠️ **BEFORE writing any string**: scan ALL Chinese text for curly quotes `""''` and replace with `\u201c \u201d \u2018 \u2019` — bare curly quotes break JS syntax (see docx-js-advanced.md § Quotes Escaping)
+8. Run with `bun run generate.js` (or `node generate.js`)
+9. If TOC → run `python3 "$DOCX_SCRIPTS/add_toc_placeholders.py" output.docx --auto`
+10. Run post-generation checklist (see SKILL.md)
+```
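+
+The curly-quote scan in step 7 can be automated before any text is pasted into the generator script. A minimal Python sketch, assuming the text arrives as a plain string; `escape_curly_quotes` is a hypothetical helper, not one of the skill's scripts:
+
+```python
+# Hypothetical helper: rewrite curly quotes as JS-safe \uXXXX escapes.
+CURLY_QUOTES = {
+    ord("\u201c"): "\\u201c",  # left double quote
+    ord("\u201d"): "\\u201d",  # right double quote
+    ord("\u2018"): "\\u2018",  # left single quote
+    ord("\u2019"): "\\u2019",  # right single quote
+}
+
+def escape_curly_quotes(text: str) -> str:
+    """Return text with each curly quote replaced by its \\uXXXX escape."""
+    return text.translate(CURLY_QUOTES)
+
+print(escape_curly_quotes("他说\u201c你好\u201d"))
+```
+
+Straight quotes and all other characters pass through unchanged, so the function is safe to run on text that has no curly quotes.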
+
+## Template-Following Mode
+
+When the user provides a reference document (PDF/docx) as a **formatting template** (e.g., "generate following this template format", "refer to this sample"), switch to template-following mode instead of the standard recipe-based workflow:
+
+1. **Extract the template's structure** — cover layout, section order, heading hierarchy, page breaks, special pages (e.g., advisor comments page, approval form)
+2. **Replicate structure exactly** — every major structural unit becomes a **separate section** (cover, body, appendix/form pages) with appropriate margins and page breaks
+3. **Fill content** from the user's content source, or generate per user instructions
+4. **Preserve template-specific elements** — school-specific forms, signature areas, stamp placeholders, advisor comment blocks → reproduce as-is with placeholder text (e.g., "Advisor (signature):")
+5. **Maintain formatting fidelity** — font choices, table layouts, spacing, and alignment should match the template, not the standard design-system palettes
+
+⚠️ **Do NOT apply standard cover recipes (R1–R7) when a user-provided template defines its own cover format.** Follow the template's cover layout instead. Standard `common-rules.md` constraints (e.g., `WidthType.PERCENTAGE`, `allNoBorders` for cover wrapper, `Rule 8` line spacing) still apply for cross-engine compatibility.
+
+⚠️ **Each distinct page type = separate section.** Cover section (margin: 0), body section (standard margins), appendix/form pages (may need different margins or orientation). Never place cover + body + appendix in a single section.
+
+---
+
+## Decision Tree
+
+### Cover Page?
+- **YES**: Reports, theses, proposals, plans, or 3+ page docs with clear title/author
+- **NO**: Resumes, contracts, official documents, exam papers, short memos
+
+### Cover Style Selector — Recipe Router
+
+Covers use **7 validated layout recipes (R1–R7)**, auto-selected by `selectCoverRecipe()` in `references/design-system.md` (the **authoritative source** — do NOT duplicate the function).
+
+**Quick Reference:**
+
+| docType | Recipe | Default Palette |
+|---------|--------|-----------------|
+| contract / official / exam / resume | null (no cover) | — |
+| academic | R5 (Clean White) | ACADEMIC |
+| proposal_report (thesis proposal) | R5 (Clean White) | ACADEMIC |
+| lesson_plan (STEM) | R4 (Top Color Block) | DM-1 |
+| lesson_plan (arts/general) | R6 (Editorial Warm) | ED-1 |
+| creative / branding / design | R3 (Centered Card Frame) | SN-2 |
+| cultural / newsletter / internal | R6 (Editorial Warm) | ED-1 |
+| activity / event | R6 (Editorial Warm) | ED-1 |
+| trend/research (cultural/creative/brand) | R7 (Swiss Tech) | ST-1 |
+| whitepaper | R2 (Double-Rule Frame) | IG-1 / CM-2 |
+| consulting | R2 (Double-Rule Frame) | MIN-1 |
+| proposal / plan | R4 (Top Color Block) | GO-1 |
+| report | R1 (Pure Paragraph Left) | by industry |
+| default | R1 (Pure Paragraph Left) | DS-1 |
+
+⚠️ **Long title routing:** After selecting recipe, apply `applyLongTitleOverride(result, titleLength)`. Titles >20 chars on R3/R4/R6 → fall back to R1. Titles >30 chars on R2 → fall back to R1. R5 is never overridden.
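+
+The thresholds above can be read as a small lookup. The sketch below is a simplified Python mirror for illustration only: the authoritative `applyLongTitleOverride()` lives in `references/design-system.md`, `long_title_override` is a hypothetical name, and leaving R7 unchanged is an assumption because the rule above does not mention it.
+
+```python
+# Recipes that fall back to R1 once the title exceeds 20 characters.
+SHORT_LIMIT_RECIPES = {"R3", "R4", "R6"}
+
+def long_title_override(recipe, title_length):
+    """Return the recipe to use after applying the long-title fallback."""
+    if recipe in SHORT_LIMIT_RECIPES and title_length > 20:
+        return "R1"
+    if recipe == "R2" and title_length > 30:
+        return "R1"
+    # R1 and R5 are never overridden; R7 and None pass through (assumption).
+    return recipe
+
+print(long_title_override("R4", 25))
+```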
+
+⚠️ **Academic thesis cover:** Use `buildAcademicCover()` from `scenes/academic.md`.
+
+⚠️ **Thesis proposal report (开题报告):** Use `buildProposalCover()` from `scenes/academic.md`. Cover MUST be an independent section. Keywords: "开题报告" (Chinese), "thesis proposal", "research proposal" — NOT the same as business proposals (which use R4).
+
+### Table of Contents?
+- **YES**: 3+ major sections (H1 headings)
+- **NO**: Resumes, exam papers, short docs, contracts (<20 clauses)
+
+→ See `references/toc.md` for the complete TOC reference (3-step process, code examples, common bugs).
+
+### Headers/Footers?
+- **YES** by default (page numbers minimum)
+- **NO**: cover page section, official docs (special format)
+
+### Load Math Formulas?
+When: exam papers, academic papers, physics/math/chemistry → load `references/math-formulas.md`
+
+### Load Chart Templates?
+When: data visualization, reports with charts → load `references/chart-templates.md`
+
+## Outline Rules
+
+**User provides outline** → Follow EXACTLY. No additions, deletions, or reordering.
+
+**No outline** → Create from scene template:
+- **Academic:** Abstract → TOC → Body → References
+- **Report:** Use `selectReportType()` to determine type, then follow template A–F:
+  - analysis → Template A (Executive Summary → Background → Scope & Method → Findings → Diagnosis → Conclusions)
+  - experiment → Template B (Abstract → Objective & Hypothesis → Environment → Procedure → Results → Error Analysis → Conclusions)
+  - testing → Template C (Overview → Scope & Environment → Test Plan → Results → Defects → Risks → Conclusions)
+  - research → Template D (Summary → Background → Subjects & Method → Sample → Findings → Synthesis → Recommendations)
+  - review → Template E (Overview → Goals → Review → Results → Issues → Lessons → Action Plan)
+  - proposal → Template F (Summary → Status → Goals → Solution → Roadmap → Resources → Risks → Benefits)
+- **Contract:** Use `selectContractType()` then follow template A–E:
+  - bilateral → Template A (Header → Parties → Recitals → Definitions → Subject → Price → Rights → Delivery → Tax → IP → Breach → Force Majeure → Termination → Notices → Dispute → Miscellaneous → Signature)
+  - transfer → Template B (Header → Recitals → Definitions → Subject → Consideration → Closing → Representations → Tax → Breach → Dispute → Signature)
+  - nda → Template C (Header → Recitals → Definition → Obligations → Use Restrictions → Return/Destroy → Exceptions → Duration → Breach → Dispute → Signature)
+  - framework → Template D (Header → Recitals → Purpose → Scope → Division → Mechanism → Commercial → Confidentiality → Term → Breach → Dispute → Signature)
+  - terms → Template E (Title → Definitions → Services → Rights → Liability → Fees → IP → Termination → Notices → Dispute → Miscellaneous)
+- **Official:** Use `selectOfficialType()` + `needsRedHeader()`:
+  - notice → Template A ([Red header] → [Doc number] → Title → Addressee → Reason → Items → Requirements → [Attachments] → [Signature] → [Date] → [Colophon])
+  - letter → Template B ([Red header] → [Doc number] → Title → Addressee → Reason → Negotiation/Reply → Closing → [Signature] → [Date])
+  - reply → Template C ([Red header] → [Doc number] → Title → Addressee → Reference → Reply → "This is the reply." → Signature → Date)
+  - minutes → Template D (Title → Meeting Overview → Agreed Items → Responsibilities → [Distribution]) — typically no red header
+- Present outline to user before generating when possible
+
+## Scene Completeness
+
+Include ALL elements a scene specifies:
+- **Academic thesis:** Cover (`buildAcademicCover()` in its own section), abstract, TOC, references
+- **Thesis proposal report (开题报告):** Cover (`buildProposalCover()` in its own section), body sections per proposal template. Cover MUST be a separate section.
+- **Report:** Cover, executive summary, conclusions
+- **Contract:** Party info, recitals, complete clause closure, signature block, uniform `【】` placeholders
+- **Official:** Correct document type, specific title, closing phrase matching type, proper numbering hierarchy, red header only when requested
+- **Exam:** Student info area, scoring criteria
+
+Generate complete, substantive content — not skeletons.
+
+## Content Guidelines
+
+- **Length**: "detailed report" = 3000+ words. "brief summary" = 500–1000 words.
+- **Data**: Use user's data, or generate realistic placeholders
+- **Charts**: Use `references/chart-templates.md` matplotlib templates → PNG → embed
+- **Math**: Use `references/math-formulas.md` LaTeX → docx-js Math mapping
+- **Tables**: For structured data, not layout
+- **Numbering**: Figures, tables numbered sequentially with cross-references
+
+## Code Architecture
+
+### Heading Style Rule (Mandatory)
+
+**All body chapter headings MUST use `heading: HeadingLevel.HEADING_X`** — never simulate with bold + large font (TOC cannot detect simulated headings).
+
+**Exception:** The cover title and the TOC title ("目录") MUST NOT use a Heading style; otherwise they are picked up as TOC entries.
+
+### Blank Page Prevention
+
+→ See SKILL.md § Post-Generation checklist for the full set of rules.
+
+Key rules:
+1. No double page breaks (SectionType.NEXT_PAGE + PageBreak = blank page)
+2. PageBreak paragraphs should have visible text content
+3. No more than 3 consecutive empty paragraphs
+4. Cover section: ≤2 trailing empty paragraphs, no trailing PageBreak
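+
+Rule 3 can be checked mechanically. A heuristic sketch, assuming paragraph texts have already been extracted as strings (for example with python-docx via `[p.text for p in doc.paragraphs]`); `max_empty_run` is a hypothetical helper, and a paragraph that holds only an image may also report empty text:
+
+```python
+def max_empty_run(paragraph_texts):
+    """Return the longest run of consecutive whitespace-only paragraphs."""
+    longest = current = 0
+    for text in paragraph_texts:
+        if text.strip():
+            current = 0  # visible text ends the run
+        else:
+            current += 1
+            longest = max(longest, current)
+    return longest
+
+texts = ["Title", "", "", "", "", "Body", ""]
+if max_empty_run(texts) > 3:
+    print("More than 3 consecutive empty paragraphs")
+```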
+
+### Builder Pattern Example
+
+```js
+const { Document, Packer, Paragraph, TextRun, Header, Footer,
+        AlignmentType, HeadingLevel, PageNumber } = require("docx");
+const fs = require("fs");
+
+// 1. Palette
+const P = { primary: "#101820", body: "#182030", secondary: "#506070", accent: "#8090A0" };
+const c = (hex) => hex.replace("#", "");
+
+// 2. Component builders
+function heading(text, level = HeadingLevel.HEADING_1) {
+  return new Paragraph({
+    heading: level,
+    spacing: { before: level === HeadingLevel.HEADING_1 ? 360 : 240, after: 120 },
+    children: [new TextRun({ text, bold: true, color: c(P.primary), font: { ascii: "Calibri", eastAsia: "SimHei" } })]
+  });
+}
+
+function body(text) {
+  return new Paragraph({
+    alignment: AlignmentType.JUSTIFIED,
+    indent: { firstLine: 480 },
+    spacing: { line: 312 },
+    children: [new TextRun({ text, size: 24, color: c(P.body) })],
+  });
+}
+
+// 3. Assembly — cover + body in separate sections
+const doc = new Document({
+  styles: { default: { document: {
+    run: { font: { ascii: "Calibri", eastAsia: "Microsoft YaHei" }, size: 24, color: c(P.body) },
+    paragraph: { spacing: { line: 312 } },
+  }}},
+  sections: [
+    { properties: { page: { margin: { top: 0, bottom: 0, left: 0, right: 0 } } },
+      children: buildCoverR1(config) },  // ← buildCoverR1 and config are not defined here; take the recipe from design-system.md
+    { properties: { page: { margin: { top: 1440, bottom: 1440, left: 1701, right: 1417 } } },
+      footers: { default: new Footer({ children: [new Paragraph({ alignment: AlignmentType.CENTER,
+        children: [new TextRun({ children: [PageNumber.CURRENT], size: 18 })] })] }) },
+      children: [heading("Chapter 1"), body("Content...")] },
+  ],
+});
+
+Packer.toBuffer(doc).then(buf => { fs.writeFileSync("output.docx", buf); });
+```
+
+## Post-Generation
+
+→ See SKILL.md § Post-Generation for the complete two-layer verification checklist.
+
+```bash
+python3 "$DOCX_SCRIPTS/postcheck.py" output.docx
+```
+⚠️ **Running postcheck.py is MANDATORY.** Fix all ❌ errors before delivering.
--- a/skills/docx/routes/edit.md
+++ b/skills/docx/routes/edit.md
@@ -0,0 +1,115 @@
+# Route: Edit Existing Document
+
+## Workflow Overview
+
+```
+1. Receive .docx (or .doc → convert)
+2. Unpack → working directory
+3. Analyze structure (document.xml, styles.xml)
+4. Plan changes → batch by type
+5. Implement via Document library (Python)
+6. Pack → output.docx
+7. Verify (pandoc or visual)
+```
+
+## Step 0: Format Conversion
+
+```bash
+# .doc → .docx
+libreoffice --headless --convert-to docx input.doc
+```
+
+## Step 1: Unpack
+
+```bash
+mkdir -p work_dir && cd work_dir && unzip ../input.docx
+```
+
+Key files: `word/document.xml` (content), `word/styles.xml` (styles), `word/numbering.xml` (lists), `word/media/` (images), `[Content_Types].xml`, `word/_rels/document.xml.rels`
+
+## Step 2: Plan Changes
+
+Group changes into batches, process in order:
+
+1. **Structural** — Add/remove sections, reorder paragraphs
+2. **Style** — Font, size, color modifications
+3. **Text** — Find/replace, fix typos
+4. **Table** — Add/remove rows/columns, update data
+5. **Image** — Replace/add images
+
+## Step 3: Implement
+
+Load `references/ooxml.md` for the full Document library API. Key patterns:
+
+```python
+from scripts.document import Document
+
+doc = Document('work_dir')
+
+# Text replacement with tracked changes
+node = doc["word/document.xml"].get_node(tag="w:r", contains="old text")
+rpr = tags[0].toxml() if (tags := node.getElementsByTagName("w:rPr")) else ""
+replacement = f'<w:del><w:r>{rpr}<w:delText>old text</w:delText></w:r></w:del><w:ins><w:r>{rpr}<w:t>new text</w:t></w:r></w:ins>'
+doc["word/document.xml"].replace_node(node, replacement)
+
+doc.save()
+```
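+
+The replacement string above interpolates raw text into XML, which breaks on `&` or `<`. A minimal sketch of a fragment builder that escapes both sides; `tracked_replacement` is a hypothetical helper, and whether the Document library fills in revision attributes such as author and date is not covered here:
+
+```python
+from xml.sax.saxutils import escape
+
+def tracked_replacement(old_text, new_text, rpr_xml=""):
+    """Build the <w:del>/<w:ins> fragment with XML-escaped text."""
+    old, new = escape(old_text), escape(new_text)
+    # xml:space="preserve" keeps leading and trailing spaces in the runs.
+    return (
+        f'<w:del><w:r>{rpr_xml}'
+        f'<w:delText xml:space="preserve">{old}</w:delText></w:r></w:del>'
+        f'<w:ins><w:r>{rpr_xml}'
+        f'<w:t xml:space="preserve">{new}</w:t></w:r></w:ins>'
+    )
+
+print(tracked_replacement("R&D budget", "R&D budget (revised)"))
+```
+
+The result can be passed to `replace_node` in place of the hand-built f-string.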
+
+## Step 4: Pack
+
+```bash
+cd work_dir && zip -r ../output.docx . -x ".*"
+```
+
+## Step 5: Verify
+
+```bash
+pandoc output.docx -t plain -o /dev/stdout | head -50
+# or visual
+libreoffice --headless --convert-to pdf output.docx
+```
+
+---
+
+## Template Matching Workflow
+
+When user says "use this format" or provides a template:
+
+1. Unpack template, extract `styles.xml`, `numbering.xml`
+2. Analyze font/size/spacing/margins
+3. Copy `styles.xml` into target document
+4. Match heading hierarchy and spacing
+
+## Multi-File Merge
+
+1. Use first document as base
+2. Extract content from additional documents
+3. Insert with page breaks between sections
+4. Merge styles (prefer base document's)
+5. Re-number figures/tables sequentially
+
+## Redlining (Tracked Changes) — Default for Revisions
+
+When user asks for revisions, **default to tracked changes** so they can review:
+
+```python
+doc = Document('work_dir', track_revisions=True)
+# ... make changes using replace_node with <w:del>/<w:ins>
+doc.save()
+```
+
+Only when the request is ambiguous, ask the user whether they want clean output or tracked changes.
+
+## Common Operations Quick Reference
+
+| Operation | Approach |
+|-----------|----------|
+| Replace text | `get_node` + `replace_node` with tracked changes |
+| Change font | Modify `<w:rFonts>` in run properties |
+| Add paragraph | `insert_after` with `<w:p>` element |
+| Delete paragraph | `suggest_deletion` on `<w:p>` |
+| Add table row | Clone `<w:tr>`, modify cells |
+| Update header | Edit `word/headerN.xml` |
+| Change margins | Edit `<w:pgMar>` in `<w:sectPr>` |
+| Add image | See `references/ooxml.md` image insertion pattern |
+| Add comment | `doc.add_comment(start, end, text)` |
--- a/skills/docx/routes/format.md
+++ b/skills/docx/routes/format.md
@@ -0,0 +1,120 @@
+# Route: Format / Layout
+
+## Workflow
+
+```
+1. Read current document (pandoc for content, unpack for structure)
+2. Identify format requirements from user
+3. Use unit conversion table (see SKILL.md)
+4. Apply formatting via OOXML manipulation or python-docx
+5. Pack and verify
+```
+
+## Quick Formatting via python-docx
+
+For simple formatting tasks, python-docx is often faster than raw XML:
+
+```python
+from docx import Document as PythonDocument
+from docx.oxml.ns import qn
+from docx.shared import Pt, Twips
+from docx.enum.text import WD_ALIGN_PARAGRAPH
+
+doc = PythonDocument("input.docx")
+
+# Change all body paragraph formatting
+for para in doc.paragraphs:
+    if para.style.name.startswith("Heading"):
+        continue
+    # 2 characters at 12pt, matching firstLine: 480 in routes/create.md
+    para.paragraph_format.first_line_indent = Twips(480)
+    para.paragraph_format.line_spacing = 1.5
+    para.paragraph_format.alignment = WD_ALIGN_PARAGRAPH.JUSTIFY
+    for run in para.runs:
+        run.font.name = "宋体"  # sets w:ascii / w:hAnsi only
+        # East Asian text reads w:eastAsia, which font.name does not set
+        run._element.rPr.rFonts.set(qn("w:eastAsia"), "宋体")
+        run.font.size = Pt(12)  # Xiao Si 小四
+
+doc.save("output.docx")
+```
+
+## Common Format Request Patterns
+
+### University Thesis Formatting
+
+Typical Chinese university thesis requirements:
+
+```python
+from docx.shared import Cm, Pt, Twips
+
+# Margins
+for section in doc.sections:
+    section.top_margin = Cm(2.5)
+    section.bottom_margin = Cm(2.5)
+    section.left_margin = Cm(3.0)
+    section.right_margin = Cm(2.5)
+
+# Fonts
+# Body: SimSun 宋体 Xiao Si 小四 (12pt)
+# H1: SimHei 黑体 San Hao 三号 (16pt) centered
+# H2: SimHei 黑体 Si Hao 四号 (14pt)
+# H3: SimHei 黑体 Xiao Si 小四 (12pt)
+# English: Times New Roman, same sizes
+```
+
+### Page Numbers Starting from Specific Page
+
+Use multi-section approach:
+```python
+# Section 1: Front matter (Roman numerals)
+# Section 2: Main content (Arabic, starting from 1)
+# This requires OOXML manipulation — see routes/edit.md for unpack/pack workflow
+```
+
+In raw XML (`word/document.xml`):
+```xml
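+<!-- Section 1 (front matter): a non-final section's sectPr sits in the pPr of its last paragraph -->
+<w:p>
+  <w:pPr>
+    <w:sectPr>
+      <w:pgNumType w:fmt="upperRoman" w:start="1"/>
+    </w:sectPr>
+  </w:pPr>
+</w:p>
+<!-- Section 2 (main content): the final sectPr is the last child of w:body -->
+<w:sectPr>
+  <w:pgNumType w:fmt="decimal" w:start="1"/>
+</w:sectPr>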
+```
+
+### Different Headers Per Section
+
+Each section in a .docx can have its own header/footer. See `references/docx-js-advanced.md` for the multi-section approach.
+
+For existing documents, modify `word/document.xml` to split `<w:sectPr>` and create separate `headerN.xml` files.
+
+### Font Size Conversion
+
+When user requests a Chinese font size name:
+
+| Request | Action |
+|---------|--------|
+| "Change to Wu Hao (5th) size" | `font.size = Pt(10.5)` or `size: 21` in docx-js |
+| "Title in San Hao SimHei" | `font.size = Pt(16)`, `font.name = "SimHei"` |
+| "Body in Xiao Si SimSun" | `font.size = Pt(12)`, `font.name = "SimSun"` |
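+
+The conversions in this table can be kept in one lookup so python-docx and docx-js values stay in sync. A small sketch using only the sizes quoted in this file; `CHINESE_FONT_SIZES_PT` and `to_half_points` are hypothetical names, and other size names are deliberately left out:
+
+```python
+# Point sizes taken from the tables in this file.
+CHINESE_FONT_SIZES_PT = {
+    "三号": 16.0,  # San Hao
+    "四号": 14.0,  # Si Hao
+    "小四": 12.0,  # Xiao Si
+    "五号": 10.5,  # Wu Hao
+}
+
+def to_half_points(size_name):
+    """Convert a size name to the half-point value docx-js expects for `size`."""
+    return int(CHINESE_FONT_SIZES_PT[size_name] * 2)
+
+print(to_half_points("五号"))
+```
+
+For python-docx, pass the point value straight to `Pt(...)`; the half-point form is only needed for docx-js.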
+
+### Line Spacing Adjustment
+
+```python
+from docx.enum.text import WD_LINE_SPACING
+from docx.shared import Pt
+
+# 1.0x spacing
+para.paragraph_format.line_spacing_rule = WD_LINE_SPACING.MULTIPLE
+para.paragraph_format.line_spacing = 1.0
+
+# 1.3x spacing (our default)
+para.paragraph_format.line_spacing = 1.3
+
+# Fixed spacing (e.g., 28pt)
+para.paragraph_format.line_spacing_rule = WD_LINE_SPACING.EXACTLY
+para.paragraph_format.line_spacing = Pt(28)
+```
+
+## Verification
+
+After formatting changes:
+1. Open in LibreOffice or convert to PDF for visual check
+2. Extract text with pandoc to ensure content unchanged
+3. Compare file sizes (formatting-only changes shouldn't dramatically change size)
--- a/skills/docx/routes/read.md
+++ b/skills/docx/routes/read.md
@@ -0,0 +1,114 @@
+# Route: Read / Analyze / Extract
+
+## Method 1: Text Extraction via pandoc (Fastest)
+
+```bash
+# Plain text
+pandoc input.docx -t plain -o output.txt
+
+# Markdown (preserves structure)
+pandoc input.docx -t markdown -o output.md
+
+# Extract with metadata
+pandoc input.docx -t markdown --standalone -o output.md
+```
+
+**Best for**: Quick content reading, text analysis, word count, searching.
+
+## Method 2: Raw XML Access (Detailed)
+
+```bash
+mkdir work && cd work && unzip ../input.docx
+
+# Read main content
+cat word/document.xml
+
+# Read styles
+cat word/styles.xml
+
+# List embedded media
+ls word/media/
+
+# Read headers/footers
+cat word/header1.xml
+cat word/footer1.xml
+```
+
+**Best for**: Analyzing formatting, extracting styles, inspecting document structure, debugging layout issues.
+
+### Quick XML Parsing
+
+```python
+import defusedxml.ElementTree as ET
+
+tree = ET.parse("word/document.xml")
+ns = {"w": "http://schemas.openxmlformats.org/wordprocessingml/2006/main"}
+
+# Extract all text
+texts = []
+for t in tree.iter("{http://schemas.openxmlformats.org/wordprocessingml/2006/main}t"):
+    if t.text:
+        texts.append(t.text)
+full_text = "".join(texts)
+
+# Count paragraphs
+paras = tree.findall(".//w:p", ns)
+print(f"Paragraphs: {len(paras)}")
+
+# Find headings
+w_val = f"{{{ns['w']}}}val"
+w_t = f"{{{ns['w']}}}t"
+for para in paras:
+    pStyle = para.find("w:pPr/w:pStyle", ns)
+    style_id = pStyle.get(w_val, "") if pStyle is not None else ""
+    if "Heading" in style_id:
+        text = "".join(t.text for t in para.iter(w_t) if t.text)
+        # style_id is built outside the f-string: backslash-escaped quotes
+        # inside an f-string expression are a SyntaxError before Python 3.12
+        print(f"  {style_id}: {text}")
+```
+
+## Method 3: Convert to Images (Visual Analysis)
+
+```bash
+# Convert to PDF first
+libreoffice --headless --convert-to pdf input.docx
+
+# Then to images
+pdftoppm -png -r 200 input.pdf page
+
+# Generates page-1.png, page-2.png, etc. (numbers are zero-padded when the PDF has 10+ pages, e.g. page-01.png)
+```
+
+**Best for**: Visual layout analysis, comparing formatting, generating previews, when user asks "what does it look like".
+
+## Method 4: python-docx Reading
+
+```python
+from docx import Document
+
+doc = Document("input.docx")
+
+# Read paragraphs
+for para in doc.paragraphs:
+    print(f"[{para.style.name}] {para.text}")
+
+# Read tables
+for table in doc.tables:
+    for row in table.rows:
+        print([cell.text for cell in row.cells])
+
+# Document properties
+print(f"Sections: {len(doc.sections)}")
+print(f"Paragraphs: {len(doc.paragraphs)}")
+print(f"Tables: {len(doc.tables)}")
+```
+
+## Choosing the Right Method
+
+| Need | Method |
+|------|--------|
+| Quick text content | pandoc |
+| Document structure/outline | pandoc → markdown |
+| Formatting details | Raw XML |
+| Table data extraction | python-docx |
+| Visual appearance | Convert to images |
+| Style analysis | Raw XML (styles.xml) |
+| Word/character count | pandoc → plain → wc |