# Route: Edit Existing Document ## Workflow Overview ``` 1. Receive .docx (or .doc → convert) 2. Unpack → working directory 3. Analyze structure (document.xml, styles.xml) 4. Plan changes → batch by type 5. Implement via Document library (Python) 6. Pack → output.docx 7. Verify (pandoc or visual) ``` ## Step 0: Format Conversion ```bash # .doc → .docx libreoffice --headless --convert-to docx input.doc ``` ## Step 1: Unpack ```bash mkdir -p work_dir && cd work_dir && unzip ../input.docx ``` Key files: `word/document.xml` (content), `word/styles.xml` (styles), `word/numbering.xml` (lists), `word/media/` (images), `[Content_Types].xml`, `word/_rels/document.xml.rels` ## Step 2: Plan Changes Group changes into batches, process in order: 1. **Structural** — Add/remove sections, reorder paragraphs 2. **Style** — Font, size, color modifications 3. **Text** — Find/replace, fix typos 4. **Table** — Add/remove rows/columns, update data 5. **Image** — Replace/add images ## Step 3: Implement Load `references/ooxml.md` for the full Document library API. Key patterns: ```python from scripts.document import Document doc = Document('work_dir') # Text replacement with tracked changes node = doc["word/document.xml"].get_node(tag="w:r", contains="old text") rpr = tags[0].toxml() if (tags := node.getElementsByTagName("w:rPr")) else "" replacement = f'{rpr}old text{rpr}new text' doc["word/document.xml"].replace_node(node, replacement) doc.save() ``` ## Step 4: Pack ```bash cd work_dir && zip -r ../output.docx . -x ".*" ``` ## Step 5: Verify ```bash pandoc output.docx -t plain -o /dev/stdout | head -50 # or visual libreoffice --headless --convert-to pdf output.docx ``` --- ## Template Matching Workflow When user says "use this format" or provides a template: 1. Unpack template, extract `styles.xml`, `numbering.xml` 2. Analyze font/size/spacing/margins 3. Copy `styles.xml` into target document 4. Match heading hierarchy and spacing ## Multi-File Merge 1. Use first document as base 2. Extract content from additional documents 3. Insert with page breaks between sections 4. Merge styles (prefer base document's) 5. Re-number figures/tables sequentially ## Redlining (Tracked Changes) — Default for Revisions When user asks for revisions, **default to tracked changes** so they can review: ```python doc = Document('work_dir', track_revisions=True) # ... make changes using replace_node with / doc.save() ``` Ask user if they want clean output or tracked changes only if ambiguous. ## Common Operations Quick Reference | Operation | Approach | |-----------|----------| | Replace text | `get_node` + `replace_node` with tracked changes | | Change font | Modify `` in run properties | | Add paragraph | `insert_after` with `` element | | Delete paragraph | `suggest_deletion` on `` | | Add table row | Clone ``, modify cells | | Update header | Edit `word/headerN.xml` | | Change margins | Edit `` in `` | | Add image | See `references/ooxml.md` image insertion pattern | | Add comment | `doc.add_comment(start, end, text)` |