Z User 6664758a6d Initial commit

2026-06-06 05:21:10 +00:00

9.1 KiB

Executable File

Spreadsheet Integrity Pipeline

Every xlsx deliverable is built and verified through a role-based workflow. Three roles collaborate in sequence: Blueprint Architect, Builder, and Inspector. Each role has explicit responsibilities and handoff criteria.

Tool Reference: xlsx.py

All commands: python3 "$XLSX_SKILL_DIR/xlsx.py" <command> [arguments]

Command	Purpose	Called By
`recalc <file>`	Recalculate formulas via LibreOffice, scan for errors	Builder (self-check)
`audit <file>`	Deep formula error scan + zero-value + implicit array detection	Builder (self-check)
`scan <file>`	Detect out-of-range, header-included, small-aggregate, inconsistent patterns	Builder (self-check)
`inspect <file> --pretty`	Get sheet structure, data ranges, headers (JSON)	Blueprint Architect
`pivot <in> <out> --source --values [--rows --cols --filters --style --chart]`	Create PivotTable	Builder (final step only)
`chart-verify <file>`	Verify embedded charts have data	Builder (self-check)
`validate <file>`	Structural validation (release gate)	Inspector

Role 1: Blueprint Architect

Before any code runs, the Architect produces a build plan:

Decompose the request: separate explicit requirements from implicit business context
Map every sheet: name, column structure, formula dependencies, cross-references
Identify data flow: which sheets feed into which (source → derived → summary)
Flag ambiguity: if the request is unclear, ask — don't guess

The Architect's output is a mental blueprint. No files are created yet.
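The data-flow step above can be sketched as a small dependency graph. This is only an illustration: `SheetPlan` and `build_order` are hypothetical names, not part of `xlsx.py`, and the sheet names are made up. The idea is that the order in which the Builder works falls out of which sheets read from which.

```python
from dataclasses import dataclass, field
from graphlib import TopologicalSorter  # standard library, Python 3.9+


@dataclass
class SheetPlan:
    """One sheet in the blueprint (hypothetical structure, not part of xlsx.py)."""
    name: str
    columns: list[str]
    depends_on: list[str] = field(default_factory=list)  # sheets this one reads from


def build_order(plans: list[SheetPlan]) -> list[str]:
    """Return sheet names so that every sheet comes after the sheets it reads."""
    known = {p.name for p in plans}
    for p in plans:
        missing = set(p.depends_on) - known
        if missing:
            raise ValueError(f"{p.name} references unplanned sheets: {sorted(missing)}")
    graph = {p.name: set(p.depends_on) for p in plans}
    # static_order() raises graphlib.CycleError on circular dependencies.
    return list(TopologicalSorter(graph).static_order())


# Illustrative plan: source -> derived -> summary, listed out of order on purpose.
plans = [
    SheetPlan("Summary", ["Metric", "Value"], depends_on=["Derived"]),
    SheetPlan("Data", ["Date", "Region", "Revenue"]),
    SheetPlan("Derived", ["Region", "Total"], depends_on=["Data"]),
]
order = build_order(plans)
print(order)  # → ['Data', 'Derived', 'Summary']
```

A reference to a sheet that was never planned, or a circular dependency, surfaces here as an exception before any file is written.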

Role 2: Builder

The Builder writes code and produces the workbook. The Builder operates under a strict single-sheet discipline: complete one sheet fully, verify it, then move on.

Build Cycle (per sheet)

┌─────────────────────────────────────────────────┐
│  Write sheet (data, formulas, styling, charts)  │
│                    ↓                            │
│  Save workbook to disk                          │
│                    ↓                            │
│  Self-check chain:                              │
│    recalc → audit → scan                        │
│    + chart-verify (if sheet has charts)         │
│                    ↓                            │
│  All clear? ──Yes──→ Proceed to next sheet      │
│       │                                         │
│      No                                         │
│       ↓                                         │
│  Fix errors → re-save → re-run self-check       │
│  (loop until clean)                             │
└─────────────────────────────────────────────────┘

Builder Constraints

No batch-then-check: you cannot create all sheets first and verify later. Errors in early sheets propagate silently into later sheets.
No error forwarding: a sheet with unresolved errors blocks all subsequent work.
No silent delivery: a file that hasn't passed self-check is not a deliverable — it's a draft.
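The build cycle and the constraints above might be driven from Python roughly as follows. This is a sketch under two assumptions that the document does not state: that each self-check command signals problems through a non-zero exit code, and that each sheet is produced by a caller-supplied callable (the `build` entries are hypothetical). The `run` parameter exists only so the chain can be exercised without the real tool.

```python
import os
import subprocess
import sys

# Assumption: XLSX_SKILL_DIR is set, as in the command line under "Tool Reference".
XLSX_PY = os.path.join(os.environ.get("XLSX_SKILL_DIR", "."), "xlsx.py")


def self_check_commands(workbook: str, has_charts: bool) -> list[list[str]]:
    """Build the recalc -> audit -> scan chain, plus chart-verify when needed."""
    steps = ["recalc", "audit", "scan"]
    if has_charts:
        steps.append("chart-verify")
    return [[sys.executable, XLSX_PY, step, workbook] for step in steps]


def self_check(workbook: str, has_charts: bool, run=subprocess.run) -> list[str]:
    """Run the chain and return the names of the steps that failed.

    Assumption: a non-zero exit code means the step found problems.
    """
    failed = []
    for argv in self_check_commands(workbook, has_charts):
        result = run(argv)
        if result.returncode != 0:
            failed.append(argv[2])  # argv[2] is the command name
    return failed


def build_all(sheet_builders, workbook: str, run=subprocess.run) -> None:
    """Build sheets one at a time and refuse to continue past a failing sheet."""
    for name, build, has_charts in sheet_builders:
        build()  # hypothetical callable: writes the sheet and saves the workbook
        failed = self_check(workbook, has_charts, run=run)
        if failed:
            raise RuntimeError(f"Sheet {name!r} failed self-check: {failed}")
```

Raising on the first failing sheet enforces "no error forwarding": later sheets are never written on top of a sheet that has not come back clean.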

Pivot Tables — Special Sequencing

PivotTables depend on finalized source data. They are always the last data operation:

python3 "$XLSX_SKILL_DIR/xlsx.py" inspect input.xlsx --pretty   # understand structure
python3 "$XLSX_SKILL_DIR/xlsx.py" pivot input.xlsx output.xlsx \
    --source "Sheet!A1:F100" \
    --values "Revenue:sum,Units:count" \
    --rows "Product,Region" \
    --cols "Quarter" \
    --filters "Year" \
    --location "Summary!A3" \
    --style "finance" \
    --chart "bar"

Aggregations: sum, count, average/avg, max, min
Chart types: bar (default), line, pie
Styles: monochrome (default), finance

Never modify pivot output with openpyxl afterward — it corrupts the pivotCache.
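When the pivot step is launched from a build script rather than typed by hand, the command line can be assembled from keyword arguments. The sketch below uses only the flags that appear in the example above; `pivot_argv` is a hypothetical helper, and treating `--source` and `--values` as mandatory is read off the bracket notation in the tool table rather than confirmed against the tool.

```python
import os
import sys

# Assumption: XLSX_SKILL_DIR is set, as in the commands shown above.
XLSX_PY = os.path.join(os.environ.get("XLSX_SKILL_DIR", "."), "xlsx.py")

# Only flags that appear in the example above, in the same order.
ORDERED_FLAGS = ["source", "values", "rows", "cols", "filters", "location", "style", "chart"]


def pivot_argv(src: str, dst: str, **options: str) -> list[str]:
    """Assemble the pivot command line from keyword options."""
    for required in ("source", "values"):
        if required not in options:
            raise ValueError(f"--{required} is required")
    unknown = set(options) - set(ORDERED_FLAGS)
    if unknown:
        raise ValueError(f"unknown options: {sorted(unknown)}")
    argv = [sys.executable, XLSX_PY, "pivot", src, dst]
    for flag in ORDERED_FLAGS:
        if flag in options:
            argv += [f"--{flag}", options[flag]]
    return argv


argv = pivot_argv(
    "input.xlsx", "output.xlsx",
    source="Sheet!A1:F100",
    values="Revenue:sum,Units:count",
    rows="Product,Region",
)
# subprocess.run(argv, check=True) would then run it as the final data operation.
```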

Role 3: Inspector

The Inspector runs after all sheets are built and works at two levels: semantic and structural.

Semantic Inspection (for edit/transform tasks)

When the task involves transforming existing data (not creating from scratch), verify the transformation didn't corrupt meaning:

Check	Method
Row count	Does output have the expected number of rows? (e.g., grouping 15 rows by 5 keys → expect 5 rows)
Column totals	Do numeric sums in the output match the source, or the value the transformation is expected to produce?
Spot-check	Compare 2-3 specific rows between source and output
Formula evaluability	Can formulas be verified in Python? If self-referencing or cross-sheet, verify computed values instead

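# Semantic verification template
# ws_src / ws_out: source and output worksheets; c: 1-based index of the column to compare.
# start..end and out_start..out_end: inclusive data-row ranges in each sheet.
# normalize_cell_value: helper defined elsewhere, assumed to map blank-looking cells to None.
source_total = sum(normalize_cell_value(ws_src.cell(row=r, column=c).value) or 0
                   for r in range(start, end + 1))
output_total = sum(normalize_cell_value(ws_out.cell(row=r, column=c).value) or 0
                   for r in range(out_start, out_end + 1))
assert abs(source_total - output_total) < 0.01, f"Total mismatch: {source_total} vs {output_total}"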
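The row-count and spot-check rows of the table can be combined into one check for group-by style transformations. The following is a minimal sketch on plain Python pairs rather than worksheets; `group_totals` and `check_grouping` are hypothetical names and the data is illustrative. It mirrors the example in the table: 15 source rows over 5 keys should collapse to 5 output rows.

```python
from collections import defaultdict


def group_totals(rows):
    """Sum amounts per key; rows is an iterable of (key, amount) pairs."""
    totals = defaultdict(float)
    for key, amount in rows:
        totals[key] += amount
    return dict(totals)


def check_grouping(source_rows, output_rows, tol=0.01):
    """Return a list of problems; an empty list means the grouping looks intact."""
    expected = group_totals(source_rows)
    actual = dict(output_rows)
    problems = []
    if len(output_rows) != len(expected):
        problems.append(f"row count: expected {len(expected)}, got {len(output_rows)}")
    for key, total in expected.items():
        if key not in actual:
            problems.append(f"missing key: {key}")
        elif abs(actual[key] - total) >= tol:
            problems.append(f"{key}: expected {total}, got {actual[key]}")
    return problems


# Illustrative data: 3 rows per region, amounts 10 + 20 + 30 = 60 per region.
regions = ["North", "South", "East", "West", "Central"]
source_rows = [(region, 10.0 * i) for i in range(1, 4) for region in regions]
output_rows = [(region, 60.0) for region in regions]
print(check_grouping(source_rows, output_rows))  # → []
```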

Structural Inspection (release gate)

python3 "$XLSX_SKILL_DIR/xlsx.py" validate output.xlsx

Exit 0 → file is releasable
Non-zero → Builder must regenerate from scratch with corrected code

Known Traps & Countermeasures

These are recurring failure modes. The Builder must internalize them.

Trap	What Goes Wrong	Countermeasure
`data_only=True` then save	Formulas permanently replaced with cached values	Never save after opening with `data_only=True`
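Column index miscalculation	Column 64 is "BL", not "BK"; hand-computed letters are easily off by one	Always use `openpyxl.utils.get_column_letter()`
Row offset confusion	DataFrame index 5 is Excel row 6, or row 7 when row 1 holds the header	Excel is 1-indexed, pandas is 0-indexed: add 1, plus 1 more per header row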
NaN leaks into formulas	`=A1+nan` → broken formula string	Check `pd.notna()` before referencing
Cross-sheet reference typo	`Sheet1!A1` vs `'Sheet 1'!A1`	Quote sheet names containing spaces
Division by zero	`#DIV/0!` in Excel	Wrap with `IFERROR()` or `IF(denom=0,...)`
Text starting with `=`	`#NAME?` error	Prefix descriptive text with `'`
Implicit array formula	`#N/A` in Excel	Avoid `MATCH(TRUE(),range>0,0)`, use `SUMPRODUCT`
Chart renders blank	Formula cells have no cached values	Run `recalc` before creating charts
Hidden rows → empty chart	Chart skips hidden data	Set `chart.visible_cells_only = False` (the openpyxl chart attribute behind `plotVisOnly`)
Overlapping charts	Multiple charts stacked on same cells	Calculate anchor: ~15 rows per chart + 2 rows gap
Verify newly-written formulas with `data_only=True` → get `None`	openpyxl doesn't evaluate formulas; `data_only=True` only reads Excel's cached values which don't exist for new formulas	Compute expected values in Python and compare directly. For TOTAL rows needing verification, write computed values (see SKILL.md Design Principle #1 Exception)
Manual row sort breaks references	Value-swap sorting doesn't update formula references	After sorting by swapping data, regenerate all formula strings with updated row numbers
NBSP (`\xa0`) treated as non-empty	Cells containing `\xa0` or `\u200b` look blank but fail `is None`	Normalize: `\xa0`, `\u200b`, whitespace-only → `None` before comparison or aggregation
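Several of the countermeasures in the table reduce to small helpers. The sketch below covers blank normalization, row offsets, sheet-name quoting, the NaN guard, and chart anchors, using only the standard library. All function names are hypothetical: `normalize_blank` is one possible shape for a blank-normalizing helper rather than the project's own, and the `math.isnan` test stands in for `pd.notna()` so the example runs without pandas.

```python
import math
import re

_INVISIBLE = {0xA0: None, 0x200B: None}  # NBSP and zero-width space


def normalize_blank(value):
    """Return None for NBSP, zero-width-space and whitespace-only strings."""
    if isinstance(value, str) and value.translate(_INVISIBLE).strip() == "":
        return None
    return value


def excel_row(df_position: int, header_rows: int = 1) -> int:
    """Convert a 0-based DataFrame position to a 1-based Excel row number."""
    return df_position + 1 + header_rows


def quote_sheet(name: str) -> str:
    """Quote a sheet name for a formula unless it is a plain identifier."""
    plain = re.fullmatch(r"[A-Za-z_][A-Za-z0-9_]*", name)
    looks_like_cell = re.fullmatch(r"[A-Za-z]{1,3}[0-9]+", name)
    if plain and not looks_like_cell:
        return name
    return "'" + name.replace("'", "''") + "'"  # embedded quotes are doubled


def add_terms(refs_and_values) -> str:
    """Build '=A1+A2+...' while skipping cells whose value is missing or NaN."""
    usable = [ref for ref, value in refs_and_values
              if value is not None
              and not (isinstance(value, float) and math.isnan(value))]
    return "=" + "+".join(usable) if usable else ""


def chart_anchor(index: int, col: str = "H", rows_per_chart: int = 15, gap: int = 2) -> str:
    """Anchor cell for the index-th chart (0-based), stacked vertically."""
    return f"{col}{1 + index * (rows_per_chart + gap)}"


print(quote_sheet("Sheet 1") + "!A1")   # → 'Sheet 1'!A1
print(add_terms([("A1", 1.0), ("A2", float("nan")), ("A3", 2)]))  # → =A1+A3
print([chart_anchor(i) for i in range(3)])  # → ['H1', 'H18', 'H35']
```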

Cross-Validation Review Sheet

For analysis-heavy deliverables, embed a self-checking Review sheet in the workbook.

When Required

Deliverables with computed metrics or aggregated data
Financial models with cross-sheet references
Data sourced from external APIs or web searches

Structure

review_ws = wb.create_sheet("Review")
review_ws.sheet_properties.tabColor = "FFC000"  # amber tab

checks = [
    ["Check", "Expected", "Actual", "Status"],
    ["Total Revenue", "=SUM(Data!B2:B100)", "=Summary!B10", '=IF(B2=C2,"✓ PASS","✗ FAIL")'],
    ["Row Count", "=COUNTA(Data!A:A)-1", "=Summary!B3", '=IF(B3=C3,"✓ PASS","✗ FAIL")'],
    ["Grand Total Match", "=Detail!F50", "=Dashboard!C5", '=IF(B4=C4,"✓ PASS","✗ FAIL")'],
]
for i, row in enumerate(checks, 1):
    for j, val in enumerate(row, 1):
        review_ws.cell(row=i, column=j, value=val)

Rules

Every Summary/Dashboard metric must have a cross-check formula back to source data
Status column uses live formulas that show ✓ PASS on a match and ✗ FAIL on a mismatch; apply green/red fills through conditional formatting if color is wanted
Review is the last sheet in the workbook, or the second-to-last when a Sources sheet follows it
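Because every metric needs its own cross-check row, the Status formulas are easier to keep correct when the row numbers are generated rather than typed. A minimal sketch, assuming the same four-column layout as the structure above; `review_rows` is a hypothetical helper and the formulas are the illustrative ones from that example.

```python
def review_rows(checks):
    """Build Review-sheet rows from (label, expected_formula, actual_formula) triples."""
    rows = [["Check", "Expected", "Actual", "Status"]]
    for offset, (label, expected, actual) in enumerate(checks):
        r = offset + 2  # row 1 holds the header
        status = f'=IF(B{r}=C{r},"✓ PASS","✗ FAIL")'
        rows.append([label, expected, actual, status])
    return rows


rows = review_rows([
    ("Total Revenue", "=SUM(Data!B2:B100)", "=Summary!B10"),
    ("Row Count", "=COUNTA(Data!A:A)-1", "=Summary!B3"),
])
for row in rows:
    print(row)
```

The resulting list can be written to the sheet with the same nested `enumerate` loop used in the structure example.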

Release Checklist

Before handing the file to the user:

Every sheet passed the Builder's self-check chain
Semantic inspection passed (if applicable)
validate returned exit code 0
All temp files, drafts, and retry artifacts removed
If multiple versions exist from retries, only the latest correct version remains
Every remaining file in the output directory is an expected deliverable
VBA check (if .xlsm): VBA modules preserved, no unintended macro removal
VBA security (if VBA generated): passes security checklist in scenes/vba.md
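The file-hygiene items in the checklist can be checked mechanically. The sketch below lists everything in an output directory that is not on an explicit list of deliverables; `unexpected_files` is a hypothetical helper, and the file names in the usage comment are illustrative.

```python
from pathlib import Path


def unexpected_files(output_dir, expected_names):
    """List files under output_dir (recursively) that are not expected deliverables."""
    root = Path(output_dir)
    expected = set(expected_names)
    found = (p.relative_to(root).as_posix() for p in root.rglob("*") if p.is_file())
    return sorted(name for name in found if name not in expected)


# Usage: an empty list means only expected deliverables remain.
# leftovers = unexpected_files("output", ["report.xlsx"])
# if leftovers:
#     raise SystemExit(f"Remove before delivery: {leftovers}")
```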
