196 lines
6.7 KiB
Markdown
Executable File
196 lines
6.7 KiB
Markdown
Executable File
# Scene: Edit Existing Spreadsheet
|
|
|
|
## When This Applies
|
|
User provides an existing .xlsx/.xlsm file and wants to modify it — fill data, fix formulas, beautify layout, add sheets, restructure.
|
|
|
|
## Core Principle: Preserve First
|
|
|
|
**Study the existing file before making ANY changes.** The original format, style, and conventions take absolute priority over default guidelines.
|
|
|
|
### VBA Preservation Rule
|
|
When opening `.xlsm` files, **always** use `keep_vba=True`:
|
|
```python
|
|
wb = load_workbook('file.xlsm', keep_vba=True)
|
|
# Edit data/formatting as usual
|
|
wb.save('output.xlsm') # VBA modules preserved
|
|
```
|
|
**Never** save a `.xlsm` as `.xlsx` unless the user explicitly requests macro removal. This silently destroys all VBA code.
|
|
|
|
## Workflow
|
|
|
|
```
|
|
1. INSPECT → Read the file, understand structure
|
|
2. PLAN → Identify what to change vs what to preserve
|
|
3. BACKUP → If destructive changes, suggest user keeps original
|
|
4. MODIFY → Make targeted changes
|
|
5. QA → recalc → audit → scan
|
|
6. VALIDATE → validate → deliver
|
|
```
|
|
|
|
## Step 1: Inspect the File
|
|
|
|
### 1a. Structure Survey
|
|
|
|
```python
|
|
from openpyxl import load_workbook
|
|
|
|
# Read with formulas preserved
|
|
wb = load_workbook('input.xlsx')
|
|
|
|
# Survey structure
|
|
for name in wb.sheetnames:
|
|
ws = wb[name]
|
|
print(f"Sheet: {name}, Dimensions: {ws.dimensions}, "
|
|
f"Rows: {ws.max_row}, Cols: {ws.max_column}")
|
|
|
|
# Check for existing styles
|
|
sample = ws['B4']
|
|
print(f"Font: {sample.font.name}, Size: {sample.font.size}, "
|
|
f"Bold: {sample.font.bold}, Fill: {sample.fill.fgColor}")
|
|
```
|
|
|
|
Also run `python3 "$XLSX_SKILL_DIR/xlsx.py" inspect input.xlsx --pretty` for structured overview.
|
|
|
|
### 1b. Semantic Data Sampling (MANDATORY for merge/copy/aggregate operations)
|
|
|
|
**Don't just print headers — print actual data rows to understand column semantics:**
|
|
|
|
```python
|
|
# Sample first 5 data rows from each sheet
|
|
for name in wb.sheetnames:
|
|
ws = wb[name]
|
|
print(f"\n=== {name} ===")
|
|
for row in range(1, min(6, ws.max_row + 1)):
|
|
vals = []
|
|
for col in range(1, ws.max_column + 1):
|
|
v = ws.cell(row=row, column=col).value
|
|
if v is not None:
|
|
vals.append(f"{get_column_letter(col)}={v}")
|
|
if vals:
|
|
print(f" Row {row}: {vals}")
|
|
```
|
|
|
|
### 1c. Cross-Sheet Column Semantic Mapping (MANDATORY before any merge/copy)
|
|
|
|
**⚠️ NEVER copy columns by position index alone when merging sheets.**
|
|
|
|
When two sheets have similar headers (e.g., both have columns A-V), the same column position may hold completely different data. Always:
|
|
|
|
1. Print sample data (not just headers) from both source and target sheets
|
|
2. For each column, identify the data type and value domain
|
|
3. Create an explicit column mapping dict before writing any data
|
|
|
|
```python
|
|
# Example: source sheet E column = amount, target sheet E column = type code
|
|
# → Do NOT copy source.E → target.E. Build semantic mapping first.
|
|
column_mapping = {
|
|
'src_I': 'dst_E', # amount → amount (different positions!)
|
|
'src_E': 'dst_I', # type → type
|
|
}
|
|
```
|
|
|
|
### 1d. Cell Value Normalization
|
|
|
|
Canonical implementation lives in **`templates/base.py → normalize_cell_value()`**.
|
|
Referenced by `edit-patterns.md` and `quality/pipeline.md`.
|
|
|
|
```python
|
|
from base import normalize_cell_value
|
|
# normalize_cell_value(value) → None for blank/NBSP/ZWSP, otherwise original value
|
|
```
|
|
|
|
**Always use this when checking for empty cells** — `\xa0` (NBSP) looks blank but fails `is None`.
|
|
|
|
## Step 2: Match Existing Styles
|
|
|
|
When adding new cells/rows to a styled file, use **`copy_style()` from `templates/base.py`**:
|
|
|
|
```python
|
|
from base import copy_style
|
|
|
|
# copy_style(source_cell, target_cell)
|
|
# → copies font, fill, border, alignment, number_format
|
|
```
|
|
|
|
## Common Edit Operations
|
|
|
|
### Fill / Complete Data
|
|
```python
|
|
# Add data to empty cells while preserving existing formatting
|
|
for row in range(start, end + 1):
|
|
cell = ws.cell(row=row, column=col)
|
|
if cell.value is None:
|
|
cell.value = new_value
|
|
# Copy style from the cell above
|
|
copy_style(ws.cell(row=row-1, column=col), cell)
|
|
```
|
|
|
|
### Insert Rows / Columns
|
|
```python
|
|
# Insert 3 rows at position 10
|
|
ws.insert_rows(10, amount=3)
|
|
# Note: formulas referencing rows below 10 will auto-adjust
|
|
|
|
# Insert column at position D
|
|
ws.insert_cols(4)
|
|
```
|
|
|
|
**Warning**: Inserting/deleting rows can break chart references and named ranges. Verify after insertion.
|
|
|
|
### Restructure Data
|
|
```python
|
|
# Move data from one layout to another
|
|
# Read all data first, then rewrite
|
|
data = []
|
|
for row in ws.iter_rows(min_row=2, values_only=True):
|
|
data.append(row)
|
|
|
|
# Clear and rewrite in new structure
|
|
# ...
|
|
```
|
|
|
|
### Fix Formulas
|
|
```python
|
|
# Find cells with errors (after recalc)
|
|
wb_data = load_workbook('input.xlsx', data_only=True)
|
|
ws_data = wb_data.active
|
|
|
|
wb_formula = load_workbook('input.xlsx')
|
|
ws_formula = wb_formula.active
|
|
|
|
for row in ws_data.iter_rows():
|
|
for cell in row:
|
|
if isinstance(cell.value, str) and cell.value.startswith('#'):
|
|
formula_cell = ws_formula[cell.coordinate]
|
|
print(f"Error at {cell.coordinate}: {cell.value}, Formula: {formula_cell.value}")
|
|
```
|
|
|
|
## Format Beautification
|
|
|
|
When the user asks to "make it look better" or "format nicely":
|
|
|
|
→ **Load `engines/design.md`** and apply its complete styling system (tokens, fonts, layout, colors).
|
|
|
|
**But**: if the file already has a consistent style, enhance it rather than replacing it. Add what's missing (alignment, column widths, alternating fills) without changing existing colors or fonts. Use `copy_style()` (above) to match adjacent cells.
|
|
|
|
## ⚠️ Dangerous Operations
|
|
|
|
| Operation | Risk | Mitigation |
|
|
|-----------|------|-----------|
|
|
| `load_workbook(data_only=True)` then save | Formulas permanently lost | Never save after data_only read |
|
|
| Delete rows/cols with formula dependencies | #REF! errors | Run audit after deletion |
|
|
| Modify pivot table output with openpyxl | Corrupt pivotCache | Never — regenerate via xlsx.py pivot |
|
|
| Overwrite merged cells | Layout breaks | Check `ws.merged_cells.ranges` first |
|
|
| Manual row sort (swap row data) | Formulas still reference old row numbers | **Regenerate formula strings with target row number** (see Common Patterns → Sort with Formula Rewrite) |
|
|
| Write SUM formula → verify with data_only | Get `None` — formula not evaluated | Compute value in Python for verification; write computed value or use recalc |
|
|
|
|
---
|
|
|
|
## Common Patterns
|
|
|
|
For complex edit operations (grouping, sorting, block detection, merging, sequence fill, etc.):
|
|
|
|
→ **Load `scenes/edit-patterns.md`** on demand.
|
|
|
|
Available patterns: Block Detection, Pre-filter Null, Sort with Formula Rewrite, Group-Merge, Group-Max-Keep-Ties, Sequence Fill, Zero-as-Blank, Side-by-Side Table Detection.
|