3.2 KiB
Executable File
3.2 KiB
Executable File
Route: Edit Existing Document
Workflow Overview
1. Receive .docx (or .doc → convert)
2. Unpack → working directory
3. Analyze structure (document.xml, styles.xml)
4. Plan changes → batch by type
5. Implement via Document library (Python)
6. Pack → output.docx
7. Verify (pandoc or visual)
Step 0: Format Conversion
# .doc → .docx
libreoffice --headless --convert-to docx input.doc
Step 1: Unpack
mkdir -p work_dir && cd work_dir && unzip ../input.docx
Key files: word/document.xml (content), word/styles.xml (styles), word/numbering.xml (lists), word/media/ (images), [Content_Types].xml, word/_rels/document.xml.rels
Step 2: Plan Changes
Group changes into batches, process in order:
- Structural — Add/remove sections, reorder paragraphs
- Style — Font, size, color modifications
- Text — Find/replace, fix typos
- Table — Add/remove rows/columns, update data
- Image — Replace/add images
Step 3: Implement
Load references/ooxml.md for the full Document library API. Key patterns:
from scripts.document import Document
doc = Document('work_dir')
# Text replacement with tracked changes
node = doc["word/document.xml"].get_node(tag="w:r", contains="old text")
rpr = tags[0].toxml() if (tags := node.getElementsByTagName("w:rPr")) else ""
replacement = f'<w:del><w:r>{rpr}<w:delText>old text</w:delText></w:r></w:del><w:ins><w:r>{rpr}<w:t>new text</w:t></w:r></w:ins>'
doc["word/document.xml"].replace_node(node, replacement)
doc.save()
Step 4: Pack
cd work_dir && zip -r ../output.docx . -x ".*"
Step 5: Verify
pandoc output.docx -t plain -o /dev/stdout | head -50
# or visual
libreoffice --headless --convert-to pdf output.docx
Template Matching Workflow
When user says "use this format" or provides a template:
- Unpack template, extract
styles.xml,numbering.xml - Analyze font/size/spacing/margins
- Copy
styles.xmlinto target document - Match heading hierarchy and spacing
Multi-File Merge
- Use first document as base
- Extract content from additional documents
- Insert with page breaks between sections
- Merge styles (prefer base document's)
- Re-number figures/tables sequentially
Redlining (Tracked Changes) — Default for Revisions
When user asks for revisions, default to tracked changes so they can review:
doc = Document('work_dir', track_revisions=True)
# ... make changes using replace_node with <w:del>/<w:ins>
doc.save()
Ask user if they want clean output or tracked changes only if ambiguous.
Common Operations Quick Reference
| Operation | Approach |
|---|---|
| Replace text | get_node + replace_node with tracked changes |
| Change font | Modify <w:rFonts> in run properties |
| Add paragraph | insert_after with <w:p> element |
| Delete paragraph | suggest_deletion on <w:p> |
| Add table row | Clone <w:tr>, modify cells |
| Update header | Edit word/headerN.xml |
| Change margins | Edit <w:pgMar> in <w:sectPr> |
| Add image | See references/ooxml.md image insertion pattern |
| Add comment | doc.add_comment(start, end, text) |