# OOXML Editing Reference — Document Library API **Important: Read this entire document before editing.** This is the primary reference for modifying existing .docx files. ## Document Library (Python) — Primary API Use the `Document` class from `"$DOCX_SCRIPTS/document.py"` for all edits, tracked changes, and comments. It handles infrastructure automatically (people.xml, RSIDs, settings.xml, comments, relationships, content types). **Working with Unicode and Entities:** - Both entity notation and Unicode work for search: `contains="“Company"` ≡ `contains="\u201cCompany"` - Both work for replacement too ### Setup ```bash # Find the docx skill root find /mnt/skills -name "document.py" -path "*/docx/scripts/*" 2>/dev/null | head -1 # Skill root = parent of scripts/ # Run with PYTHONPATH PYTHONPATH=/mnt/skills/docx python your_script.py ``` ```python from scripts.document import Document, DocxXMLEditor # Basic init (auto-creates temp copy, sets up infrastructure) doc = Document('unpacked') # Custom author/initials doc = Document('unpacked', author="John Doe", initials="JD") # Enable tracked changes doc = Document('unpacked', track_revisions=True) # Custom RSID (auto-generated if omitted) doc = Document('unpacked', rsid="07DC5ECB") ``` ### Finding Nodes ```python # By text node = doc["word/document.xml"].get_node(tag="w:p", contains="specific text") # By line range para = doc["word/document.xml"].get_node(tag="w:p", line_number=range(100, 150)) # By attributes node = doc["word/document.xml"].get_node(tag="w:del", attrs={"w:id": "1"}) # By exact line number para = doc["word/document.xml"].get_node(tag="w:p", line_number=42) # Combined filters (disambiguation) node = doc["word/document.xml"].get_node(tag="w:r", contains="Section", line_number=range(2400, 2500)) ``` ### Tracked Changes **CRITICAL**: Only mark text that actually changes. Keep unchanged text outside ``/`` tags. **Method Selection**: - Regular text → `replace_node()` with ``/``, or `suggest_deletion()` for whole elements - Partially modify another's tracked change → `replace_node()` to nest changes - Reject another's insertion → `revert_insertion()` (NOT `suggest_deletion()`) - Reject another's deletion → `revert_deletion()` ```python # Change one word: "monthly" → "quarterly" node = doc["word/document.xml"].get_node(tag="w:r", contains="The report is monthly") rpr = tags[0].toxml() if (tags := node.getElementsByTagName("w:rPr")) else "" replacement = f'{rpr}The report is {rpr}monthly{rpr}quarterly' doc["word/document.xml"].replace_node(node, replacement) # Delete entire run node = doc["word/document.xml"].get_node(tag="w:r", contains="text to delete") doc["word/document.xml"].suggest_deletion(node) # Delete entire paragraph para = doc["word/document.xml"].get_node(tag="w:p", contains="paragraph to delete") doc["word/document.xml"].suggest_deletion(para) # Insert new content after a node node = doc["word/document.xml"].get_node(tag="w:r", contains="existing text") doc["word/document.xml"].insert_after(node, 'new text') # Add new numbered list item target_para = doc["word/document.xml"].get_node(tag="w:p", contains="existing list item") pPr = tags[0].toxml() if (tags := target_para.getElementsByTagName("w:pPr")) else "" new_item = f'{pPr}New item' tracked_para = DocxXMLEditor.suggest_paragraph(new_item) doc["word/document.xml"].insert_after(target_para, tracked_para) ``` ### Handling Other Authors' Changes ```python # Partially delete another author's insertion node = doc["word/document.xml"].get_node(tag="w:ins", attrs={"w:id": "5"}) replacement = ''' quarterly financial report ''' doc["word/document.xml"].replace_node(node, replacement) # Reject insertion (wraps in deletion) ins = doc["word/document.xml"].get_node(tag="w:ins", attrs={"w:id": "5"}) doc["word/document.xml"].revert_insertion(ins) # Reject deletion (restores deleted content) del_elem = doc["word/document.xml"].get_node(tag="w:del", attrs={"w:id": "3"}) doc["word/document.xml"].revert_deletion(del_elem) ``` ### Comments ```python doc = Document('unpacked', author="Z.ai", initials="Z") # Comment on a range start = doc["word/document.xml"].get_node(tag="w:del", attrs={"w:id": "1"}) end = doc["word/document.xml"].get_node(tag="w:ins", attrs={"w:id": "2"}) doc.add_comment(start=start, end=end, text="Explanation of this change") # Comment on paragraph para = doc["word/document.xml"].get_node(tag="w:p", contains="text") doc.add_comment(start=para, end=para, text="Comment here") # Comment on newly created tracked change node = doc["word/document.xml"].get_node(tag="w:r", contains="old") new_nodes = doc["word/document.xml"].replace_node( node, 'oldnew') doc.add_comment(start=new_nodes[0], end=new_nodes[1], text="Changed per requirements") # Reply to comment doc.reply_to_comment(parent_comment_id=0, text="I agree") ``` ### Images ```python from PIL import Image import shutil, os doc = Document('unpacked') media_dir = os.path.join(doc.unpacked_path, 'word/media') os.makedirs(media_dir, exist_ok=True) shutil.copy('image.png', os.path.join(media_dir, 'image1.png')) img = Image.open(os.path.join(media_dir, 'image1.png')) width_emus = int(6.5 * 914400) # 6.5" usable width height_emus = int(width_emus * img.size[1] / img.size[0]) # Add relationship rels_editor = doc['word/_rels/document.xml.rels'] next_rid = rels_editor.get_next_rid() rels_editor.append_to(rels_editor.dom.documentElement, f'') doc['[Content_Types].xml'].append_to(doc['[Content_Types].xml'].dom.documentElement, '') # Insert node = doc["word/document.xml"].get_node(tag="w:p", line_number=100) doc["word/document.xml"].insert_after(node, f''' ''') ``` ### Saving ```python doc.save() # Validates + copies back to original dir doc.save('modified-unpacked') # Save to different location doc.save(validate=False) # Skip validation (debug only) ``` ### Direct DOM Manipulation ```python editor = doc["word/document.xml"] node = doc["word/document.xml"].get_node(tag="w:p", line_number=5) parent = node.parentNode parent.removeChild(node) # General replacement (without tracked changes) old = doc["word/document.xml"].get_node(tag="w:p", contains="original") doc["word/document.xml"].replace_node(old, "replacement") # Chained insertions node = doc["word/document.xml"].get_node(tag="w:r", line_number=100) nodes = doc["word/document.xml"].insert_after(node, "A") nodes = doc["word/document.xml"].insert_after(nodes[-1], "B") ``` ## Schema Compliance Quick Reference - **Element ordering in ``**: `` → `` → `` → `` → `` - **Whitespace**: `xml:space='preserve'` on `` with leading/trailing spaces - **RSIDs**: 8-digit hex only (0-9, A-F) - **trackRevisions**: Add `` after `` in settings.xml - **``/`` placement**: At paragraph level, containing complete `` elements. Never nest inside ``. ## Validation Rules The validator ensures document text matches the original after reverting GLM's changes: - **Never modify text inside another author's `` or `` tags** - **Use nested deletions** to remove another author's insertions - **Every edit must be tracked** with `` or `` tags