Initial commit

This commit is contained in:
Z User
2026-06-06 05:21:10 +00:00
Unverified
commit 6664758a6d
493 changed files with 135653 additions and 0 deletions

13
skills/xlsx/LICENSE.txt Executable file

@@ -0,0 +1,13 @@
Copyright (c) 2026 Z.ai All rights reserved.
Permission is granted for personal, educational, and non-commercial use only.
Commercial use is strictly prohibited without prior written permission from the author.
Unauthorized copying, modification, or distribution of the software for commercial purposes is prohibited.
The author reserves the right to make the final determination of what constitutes "commercial use".
THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.
IN NO EVENT SHALL THE AUTHOR BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY ARISING FROM THE USE OF THE SOFTWARE.

230
skills/xlsx/SKILL.md Executable file

@@ -0,0 +1,230 @@
---
name: xlsx
metadata:
author: Z.AI
version: "1.0"
description: "Use this skill any time a spreadsheet file is the primary input or output. This means any task where the user wants to: open, read, edit, or fix an existing .xlsx, .xlsm, .csv, or .tsv file; create a new spreadsheet from scratch or from other data sources; analyze data and output results as an Excel file with charts; convert between tabular file formats (CSV/JSON/PDF → XLSX or vice versa); clean, merge, pivot, or transform tabular data. Trigger especially when the user references a spreadsheet file by name or path, says 'make a table/report/model', mentions Excel/CSV/数据分析/报表/汇总, or wants data visualization inside a spreadsheet."
license: Proprietary. LICENSE.txt has complete terms
---
# XLSX — Scene-Driven Spreadsheet Workbench
## Quick Setup
```bash
bash "$XLSX_SKILL_DIR/setup.sh" # Interactive environment check + install
```
## Pre-Flight: Intent Gate
Before touching any code, confirm the user actually needs a spreadsheet:
- Report / analysis summary (述职 "work report", 调研报告 "research report") → **docx skill**
- Presentation (汇报 "briefing", 演示 "demo", pitch deck) → **pptx skill**
- Formal print document (合同 "contract", 证书 "certificate", "PDF") → **pdf skill**
- Charts only, no data table needed → **charts skill**
- User explicitly names a format → respect it
If confirmed xlsx → proceed to Scene Router below.
**Request Decomposition** (do this every time):
- **Explicit needs**: sheets, columns, formulas, metrics the user stated
- **Implicit needs**: business context, downstream use (filter? sort? input?)
- **Multi-part requests**: generate ALL parts — never silently drop a component
**Multi-Intent Detection** — some requests combine multiple scenes:
```
"Create a financial model with charts and export a PDF summary"
→ scenes/finance.md + engines/chart.md + (hand off PDF to pdf skill)
"Analyze this CSV, build a dashboard, and make it look professional"
→ scenes/analyze.md + engines/chart.md + engines/design.md
"Edit this budget file, add a new quarter column, and create a pivot"
→ scenes/edit.md + quality/pipeline.md (pivot command)
"Convert these 5 CSVs into one xlsx with a summary sheet"
→ scenes/convert.md + scenes/create.md (for summary)
```
When multiple intents are detected, load all matching files and execute in logical order: data preparation → analysis → visualization → styling → QA.
---
## Complexity Gate (evaluate BEFORE Scene Router)
Determine task complexity to control file loading depth:
```
User Request
├─ LITE (single aggregation, simple chart, direct conversion, QA-only)
│    → Load: SKILL.md + ONE scene file (lean version)
│    → Skip: engine files (use built-in knowledge for basic styles)
│    → QA: audit + validate only
│    → Target: ≤ 400 lines total context
└─ FULL (multi-dimensional analysis, financial model, dashboard, KANO, etc.)
     → Load: SKILL.md + scene + engines (chart.md / design.md) as needed
     → For code patterns: load recipes/templates files ON DEMAND (not upfront)
     → QA: full pipeline (recalc → audit → scan → chart-verify → validate)
     → Target: load recipes/templates only when stuck on implementation
```
**LITE triggers**: single groupby, one chart, format conversion, inspect/audit/validate, simple pivot
**FULL triggers**: correlation matrix, multi-sheet dashboard, statistical analysis, financial model, KANO/funnel/cohort
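
The gate can be approximated with a keyword check. The following is only a minimal sketch: the function name `classify_complexity` and the keyword tuple are hypothetical, loosely derived from the FULL triggers listed above, and a real gate would also need to weigh the structure of the request.

```python
# Hypothetical keyword list, loosely based on the FULL triggers above.
FULL_KEYWORDS = (
    "correlation", "dashboard", "statistical", "financial model",
    "kano", "funnel", "cohort",
)


def classify_complexity(request: str) -> str:
    """Return "FULL" if any FULL trigger appears in the request, else "LITE"."""
    text = request.lower()
    return "FULL" if any(kw in text for kw in FULL_KEYWORDS) else "LITE"
```

Defaulting to LITE keeps the loaded context small; anything ambiguous can still be escalated to FULL by hand.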
---
## Scene Router
```
User Request
├─ Involves an existing file?
│   ├─ Yes → Modify content or structure?
│   │   ├─ Yes ──────────────────── → scenes/edit.md
│   │   └─ No (read/analyze only) ─ → scenes/analyze.md
│   │
│   └─ Format conversion (CSV↔XLSX, JSON, PDF tables)?
│       └─ Yes ────────────────────────── → scenes/convert.md
├─ Create from scratch?
│   ├─ Financial / budget / forecast / cost tracking?
│   │   ├─ Complex (DCF / LBO / three-statement linkage (三表联动) / sensitivity / IB model)?
│   │   │   └─ Yes ─────────────────────── → scenes/finance.md
│   │   └─ Simple (budget table (预算表) / expense report (费用报表) / revenue vs cost (收支对比) / project cost (项目成本) / personal finance (个人记账))?
│   │       └─ Yes ─────────────────────── → scenes/finance_lite.md
│   └─ General table / report / template
│       └─ ──────────────────────────── → scenes/create.md
├─ Batch processing / large files / protection / validation?
│   └─ Yes ───────────────────────────── → scenes/advanced.md
├─ VBA / macros / automation inside Excel?
│   └─ Yes ───────────────────────────── → scenes/vba.md + engines/vba-templates.md
├─ Needs charts or data visualization?
│   └─ Yes ───────────── append ────────→ engines/chart.md
└─ Needs styling / design system?
    └─ Yes ───────────── append ────────→ engines/design.md
```
**Mixed requests**: load all matching files. Engine files always **append** to a scene.
**Finance detection**:
- **finance.md** (complex): DCF, LBO, P&L, 利润表, 资产负债, valuation, 估值, IRR, 三表联动, sensitivity, scenario
- **finance_lite.md** (simple): 预算, budget, 费用, expense, 收支, 记账, 项目成本, cost tracking, 报销, ROI
**VBA detection**: 宏, macro, VBA, 自动化, automation, .xlsm, 按钮, button, auto-run, 批量处理脚本
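
The keyword detection above could be sketched as follows. This is an illustration, not the skill's actual router: `detect_scenes` is a hypothetical helper, letting complex finance keywords win over simple ones is an assumption, and naive substring matching can produce false positives (for example `irr` inside "irrelevant").

```python
# Keyword lists copied from the detection rules above (lowercased for matching).
FINANCE_COMPLEX = ("dcf", "lbo", "p&l", "利润表", "资产负债", "valuation", "估值",
                   "irr", "三表联动", "sensitivity", "scenario")
FINANCE_LITE = ("预算", "budget", "费用", "expense", "收支", "记账", "项目成本",
                "cost tracking", "报销", "roi")
VBA = ("宏", "macro", "vba", "自动化", "automation", ".xlsm", "按钮", "button",
       "auto-run", "批量处理脚本")


def detect_scenes(request: str) -> list[str]:
    """Return the scene/engine files suggested by finance and VBA keywords."""
    text = request.lower()
    scenes = []
    # Assumption: a complex-finance keyword takes precedence over a simple one.
    if any(k in text for k in FINANCE_COMPLEX):
        scenes.append("scenes/finance.md")
    elif any(k in text for k in FINANCE_LITE):
        scenes.append("scenes/finance_lite.md")
    if any(k in text for k in VBA):
        scenes += ["scenes/vba.md", "engines/vba-templates.md"]
    return scenes
```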
---
## Design Principles
### 1. Live Formula Guarantee
Every derived value SHOULD be an Excel formula so the spreadsheet stays dynamic.
**Exception — Programmatic Verification**: When the output file will be verified by Python (not opened in Excel), TOTAL/SUM rows should write **computed values** instead of formulas, because openpyxl cannot evaluate formulas and `data_only=True` returns `None` for newly-written formulas. Optionally add the formula as a cell comment for reference.
### 2. Zero Error Tolerance
Deliverables must have zero formula errors. Wrap every division in `IFERROR` or `IF(denom=0,...)`, and use absolute references (`$C$42`) for shared denominators.
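
As an illustration of the rule, formula strings can be built through small helpers so that no bare division reaches the sheet. The helper names `safe_ratio` and `absolute` are hypothetical and not part of the skill's toolchain; this is a sketch under that assumption.

```python
def absolute(ref: str) -> str:
    """Turn a plain A1-style reference such as C42 into $C$42."""
    col = "".join(ch for ch in ref if ch.isalpha())
    row = "".join(ch for ch in ref if ch.isdigit())
    return f"${col}${row}"


def safe_ratio(numerator: str, denominator: str, fallback: str = "0") -> str:
    """Build an Excel formula string whose division cannot surface #DIV/0!."""
    return f"=IFERROR({numerator}/{denominator},{fallback})"


# Example: share of a row value against a shared total in C42.
formula = safe_ratio("D5", absolute("C42"))
```

The resulting string can be assigned to a cell value as-is; the fallback is passed as text so that `""` or another formula fragment can be used instead of `0`.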
### 3. Compatibility First
No functions that require Excel 365/2021, including dynamic array functions (`FILTER`, `UNIQUE`, `XLOOKUP`, `SORT`, `SORTBY`, `XMATCH`, `SEQUENCE`, `LET`, `LAMBDA`, `RANDARRAY`). No implicit array formulas: use `SUMPRODUCT` alternatives instead.
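
A formula string can be screened for the listed functions before it is written. The sketch below assumes a hypothetical helper `find_forbidden`; it uses a plain regular expression, so a function name inside a quoted string literal would also be reported.

```python
import re

# Function names taken from the list above.
FORBIDDEN = ("FILTER", "UNIQUE", "XLOOKUP", "SORT", "SORTBY", "XMATCH",
             "SEQUENCE", "LET", "LAMBDA", "RANDARRAY")

# Match a forbidden name only when it starts a function call and is not
# the tail of a longer identifier (e.g. DELETE( must not match LET().
_PATTERN = re.compile(
    r"(?<![A-Z0-9_.])(" + "|".join(FORBIDDEN) + r")\s*\(",
    re.IGNORECASE,
)


def find_forbidden(formula: str) -> list[str]:
    """Return the sorted, de-duplicated forbidden function names in a formula."""
    return sorted({m.group(1).upper() for m in _PATTERN.finditer(formula)})
```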
### 4. Preserve & Match
When editing existing files: study and exactly match the file's format, style, and conventions. Existing patterns always override defaults. Text starting with `=` must be prefixed with `'`.
### 5. Language Mirror
Output language (sheet names, headers, labels) matches user's input language.
### 6. Data Consistency Over Instructions
When user instructions conflict with the actual data patterns in the existing file:
- **First priority**: match the existing data pattern (e.g., if existing data uses `0` for empty, don't switch to `-`)
- **Second priority**: follow user instructions literally
- Always flag the conflict to the user
Example: User says "show hyphen for zero" but existing data and answer key use numeric `0` → Use `0` and notify user of the discrepancy.
---
## Toolchain
### Script Path Setup (MANDATORY before any script call)
All CLI tools live relative to this skill's directory. Before calling any script, resolve the absolute path once:
```bash
XLSX_SKILL_DIR="<skill_directory>" # ← parent directory of this SKILL.md
# Then all commands use absolute paths:
python3 "$XLSX_SKILL_DIR/xlsx.py" inspect data.xlsx --pretty
python3 "$XLSX_SKILL_DIR/xlsx.py" pivot data.xlsx output.xlsx --rows Region --values Revenue
python3 "$XLSX_SKILL_DIR/xlsx.py" validate output.xlsx
```
**For Python imports** (when generation code needs to import skill modules):
```python
import sys, os
XLSX_SKILL_DIR = "<skill_directory>"
for sub in [XLSX_SKILL_DIR, os.path.join(XLSX_SKILL_DIR, "templates")]:
    if sub not in sys.path:
        sys.path.insert(0, sub)
```
**⚠️ NEVER use bare `python3 xlsx.py ...`** — it only works if cwd happens to be the skill directory. Always use the absolute path.
### Tool Reference
| Tool | Use |
|------|-----|
| **openpyxl** | Formulas, formatting, charts, cell-level control |
| **pandas** | Data analysis, bulk operations, CSV/TSV |
| `load_workbook(read_only=True)` | Large file reads |
| `Workbook(write_only=True)` | Large file writes |
| **templates/base.py** | Design tokens, font resolution, style factories, utilities (single source of truth) |
| **xlsx.py** | QA commands (see `quality/pipeline.md`) |
Workbook metadata: `wb.properties.creator = "Z.ai"`
> **All code must import from `templates/base.py`** for colors, fonts, and style helpers. Never hardcode hex values or font names.
---
## Quality Gate
Every deliverable must pass the full integrity pipeline before delivery.
**Load `quality/pipeline.md` for the role-based integrity workflow.**
Quick reference:
```
Blueprint → Build & Self-check (per-sheet) → Inspect → Pivot (if needed) → Release
```
---
## Capability Matrix
| Capability | Supported | Scene/Engine |
|-----------|-----------|-------------|
| Create from scratch | ✅ | scenes/create |
| Edit existing file | ✅ | scenes/edit |
| Data analysis & EDA | ✅ | scenes/analyze |
| Format conversion | ✅ | scenes/convert |
| Financial models (DCF/LBO/P&L) | ✅ | scenes/finance |
| Simple budgets & expenses | ✅ | scenes/finance_lite |
| VBA macros & automation | ✅ | scenes/vba + engines/vba-templates |
| Batch processing | ✅ | scenes/advanced |
| Embedded charts | ✅ | engines/chart |
| Smart chart recommendation | ✅ | engines/chart |
| Design system & styling | ✅ | engines/design |
| PivotTable creation | ✅ | quality/pipeline (pivot cmd) |
| Formula validation | ✅ | quality/pipeline |
| Structural validation | ✅ | quality/pipeline |
| Data provenance tracking | ✅ | scenes/analyze |
| Large file handling | ✅ | scenes/advanced |
| Data protection & locking | ✅ | scenes/advanced |


@@ -0,0 +1,167 @@
# Chart Templates — Implementation Code
> Load on demand when you need specific chart code. Do NOT load upfront.
---
## Native Excel Charts (openpyxl.chart)
### Bar Chart
```python
from openpyxl.chart import BarChart, Reference
from templates.base import make_chart_title
chart = BarChart()
chart.type = "col"
chart.title = make_chart_title("Revenue by Product", 14)
chart.y_axis.title = make_chart_title("Revenue ($)", 10, bold=False, axis=True)
chart.x_axis.title = make_chart_title("Product", 10, bold=False)
data = Reference(ws, min_col=3, min_row=4, max_col=3, max_row=last_row)
cats = Reference(ws, min_col=2, min_row=5, max_row=last_row)
chart.add_data(data, titles_from_data=True)
chart.set_categories(cats)
chart.shape = 4
chart.width = 18
chart.height = 10
ws.add_chart(chart, "J4")
```
### Line Chart
```python
from openpyxl.chart import LineChart, Reference
from templates.base import make_chart_title
chart = LineChart()
chart.title = make_chart_title("Monthly Trend", 14)
chart.y_axis.title = make_chart_title("Amount", 10, bold=False, axis=True)
chart.style = 10
data = Reference(ws, min_col=3, max_col=5, min_row=4, max_row=last_row)
cats = Reference(ws, min_col=2, min_row=5, max_row=last_row)
chart.add_data(data, titles_from_data=True)
chart.set_categories(cats)
for series in chart.series:
    series.smooth = True
ws.add_chart(chart, "J4")
```
### Pie Chart
```python
from openpyxl.chart import PieChart, Reference
from openpyxl.chart.label import DataLabelList
from templates.base import make_chart_title
chart = PieChart()
chart.title = make_chart_title("Market Share", 14)
data = Reference(ws, min_col=3, min_row=4, max_row=last_row)
cats = Reference(ws, min_col=2, min_row=5, max_row=last_row)
chart.add_data(data, titles_from_data=True)
chart.set_categories(cats)
chart.dataLabels = DataLabelList()
chart.dataLabels.dLblPos = 'bestFit'
chart.dataLabels.showLeaderLines = True
chart.dataLabels.showCatName = True
chart.dataLabels.showPercent = True
chart.dataLabels.showVal = False
ws.add_chart(chart, "J4")
```
### Combo Chart (Bar + Line, dual axis)
```python
from openpyxl.chart import BarChart, LineChart, Reference
from templates.base import make_chart_title
bar = BarChart()
bar.add_data(Reference(ws, min_col=2, max_col=2, min_row=1, max_row=10), titles_from_data=True)
bar.title = make_chart_title("Revenue vs Growth", 14)
bar.y_axis.title = make_chart_title("Revenue ($)", 10, bold=False, axis=True)
line = LineChart()
line.add_data(Reference(ws, min_col=3, max_col=3, min_row=1, max_row=10), titles_from_data=True)
line.y_axis.title = make_chart_title("Growth %", 10, bold=False, axis=True)
line.y_axis.axId = 200
line.y_axis.crosses = "max"  # draw the secondary (line) axis on the right side
bar += line
ws.add_chart(bar, "E2")
```
---
## Matplotlib Charts (embedded as images)
### Chinese Font Setup
```python
import os
import matplotlib.pyplot as plt
from matplotlib import font_manager
# Bundled font, resolved relative to this file (requires __file__ to be defined)
_font_path = os.path.join(os.path.dirname(__file__), '..', '..', '..', 'fonts', 'truetype', 'chinese', 'SimHei.ttf')
if not os.path.exists(_font_path):
    # Fallback: try the system font directory
    _font_path = '/usr/share/fonts/truetype/chinese/SimHei.ttf'
if os.path.exists(_font_path):
    font_manager.fontManager.addfont(_font_path)
    plt.rcParams['font.sans-serif'] = ['SimHei']
plt.rcParams['axes.unicode_minus'] = False
```
### Standard Template
```python
fig, ax = plt.subplots(figsize=(10, 6))
# categories and values are supplied by the caller; the label feeds the legend
ax.bar(categories, values, color='#4A90D9', label='Series')
ax.set_title('Chart Title', fontsize=14, fontweight='bold', pad=15)
ax.set_xlabel('X Label', fontsize=11)
ax.set_ylabel('Y Label', fontsize=11)
ax.spines['top'].set_visible(False)
ax.spines['right'].set_visible(False)
ax.tick_params(axis='x', rotation=45)
ax.legend(loc='best', fontsize='small')
fig.tight_layout(pad=2.0)  # after the legend, so the layout accounts for it
fig.savefig('chart.png', dpi=150, bbox_inches='tight', facecolor='white')
plt.close(fig)
```
### Embed in Excel (preserving aspect ratio)
```python
from openpyxl.drawing.image import Image as XlImage
from PIL import Image as PILImage
pil_img = PILImage.open('chart.png')
orig_w, orig_h = pil_img.size
target_w = 600
scale = target_w / orig_w
xl_img = XlImage('chart.png')
xl_img.width = target_w
xl_img.height = int(orig_h * scale)
ws.add_image(xl_img, 'B20')
```
### Smart Chart Recommend Function
```python
import pandas as pd

def recommend_chart(df, x_col, y_cols):
    # Time series on the x-axis → line chart
    if pd.api.types.is_datetime64_any_dtype(df[x_col]):
        return "line"
    n_categories = df[x_col].nunique()
    n_series = len(y_cols)
    if n_series == 1:
        vals = df[y_cols[0]]
        # Values summing to roughly 100 are treated as parts of a whole
        if 95 < vals.sum() < 105:
            return "pie" if n_categories <= 5 else "bar_horizontal"
    if n_categories <= 6:
        return "bar_grouped" if n_series > 1 else "bar"
    elif n_categories <= 15:
        return "bar_horizontal"
    else:
        return "bar_top10"
```

87
skills/xlsx/engines/chart.md Executable file

@@ -0,0 +1,87 @@
# Chart Engine — Selection & Specs
> Load this file when the task needs charts. For code templates, load `engines/chart-templates.md` on demand.
---
## Decision: Native Excel Chart vs Matplotlib Image
| Situation | Use |
|-----------|-----|
| User will interact with chart in Excel (resize, filter, update) | **Native Excel chart** (openpyxl.chart) |
| Publication-quality or complex visualization (heatmap, multi-axis) | **Matplotlib image** → embed in Excel |
| Dashboard with multiple small charts | **Matplotlib** (more layout control) |
| Simple bar/line/pie from sheet data | **Native Excel chart** |
---
## Chart Size & Placement
```python
CHART_SIZES = {
    'small': (12, 7),    # ~400x230px, inline with data
    'medium': (18, 10),  # ~600x330px, standard report chart
    'large': (24, 14),   # ~800x460px, full-width dashboard
}
```
Multiple charts on one sheet: each chart ≈ 15 rows tall + 2 rows gap. Calculate anchor positions to prevent overlap.
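
The anchor arithmetic can be sketched as below. The helper `chart_anchors` and its defaults (column `J`, first row 4) are hypothetical; only the 15-row height and 2-row gap come from the rule above, and they assume every chart is stacked vertically in one column.

```python
def chart_anchors(n_charts: int, column: str = "J", first_row: int = 4,
                  chart_rows: int = 15, gap_rows: int = 2) -> list[str]:
    """Return top-left anchor cells for charts stacked vertically without overlap."""
    step = chart_rows + gap_rows  # rows consumed by one chart plus its gap
    return [f"{column}{first_row + i * step}" for i in range(n_charts)]


anchors = chart_anchors(3)
```

Each returned string can be passed as the second argument of `ws.add_chart(chart, anchor)`.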
---
## Smart Chart Recommendation
When user doesn't specify chart type, auto-select:
| Data Pattern | Best Chart | Avoid |
|-------------|-----------|-------|
| Trend over time | Line | Pie |
| Category comparison (≤6) | Bar (vertical) | Pie |
| Category comparison (7-15) | Horizontal Bar | Vertical bar |
| Category comparison (>15) | Top 10 bar + "Others" | All-in-one |
| Part of whole (≤5 slices) | Pie / Donut | Bar |
| Part of whole (>5) | Horizontal Bar | Pie |
| Distribution | Histogram | Pie |
| Correlation | Scatter | Bar |
| Budget vs Actual | Clustered Bar + variance line | Pie |
| Mixed scales ($ + %) | Combo (bar + line) | Single axis |
### Auto-Detection from Headers
| Header patterns | Suggested chart |
|----------------|-----------------|
| Date, Month, Quarter, Year, 月, 季度, 年 | Line / Area |
| Category, Type, Product, Region, 类别, 产品 | Bar |
| Percentage, Share, %, 占比, 份额 | Pie / Donut |
| Budget + Actual, 预算 + 实际 | Clustered Bar |
| Revenue + Cost + Profit, 收入 + 成本 + 利润 | Stacked Bar / Combo |
| Growth, Change, Δ, 增长, 变化 | Line with markers |
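
One way to apply the header table is a simple ordered lookup. This is a sketch under assumptions: `suggest_chart` and its return labels are hypothetical, the order in which the rules are checked (multi-column patterns first) is a choice made for this example, and substring matching will misfire on headers such as "Update".

```python
def suggest_chart(headers: list[str]):
    """Suggest a chart family from column headers, or None if nothing matches."""
    text = " ".join(h.lower() for h in headers)

    def has(*words: str) -> bool:
        return any(w.lower() in text for w in words)

    # Multi-column patterns are checked first because they are more specific.
    if has("budget", "预算") and has("actual", "实际"):
        return "clustered_bar"
    if has("revenue", "收入") and has("cost", "成本") and has("profit", "利润"):
        return "stacked_bar_or_combo"
    if has("date", "month", "quarter", "year", "月", "季度", "年"):
        return "line_or_area"
    if has("percentage", "share", "%", "占比", "份额"):
        return "pie_or_donut"
    if has("growth", "change", "Δ", "增长", "变化"):
        return "line_with_markers"
    if has("category", "type", "product", "region", "类别", "产品"):
        return "bar"
    return None
```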
---
## Critical Rules
1. **Anti-Overlap**: Always `fig.tight_layout(pad=2.0)` before `savefig()`; use `plt.legend(loc='best')`
2. **`titles_from_data=True`**: First row of data reference MUST contain text headers
3. **Cached Values**: Run `recalc` before adding charts that reference formula cells
4. **Hidden Data**: Set `chart.plot_visible_only = False` when chart data comes from hidden rows
5. **Aspect Ratio**: When embedding matplotlib PNGs, always calculate proportional height from original dimensions
6. **Chinese Font**: Must configure SimHei before any matplotlib plotting
### Color Palette
Chart colors are derived from **`engines/design.md §9`**. Do NOT define independent colors.
```python
# Import from design tokens — single source of truth
# CHART_COLORS = [PRIMARY, ACCENT_POSITIVE, ACCENT_WARNING, ACCENT_NEGATIVE, NEUTRAL_600]
# Single series → PRIMARY only
# Two series → PRIMARY + ACCENT_POSITIVE
# Never exceed 5 colors
```
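The series-count rules in the comments above might be applied as follows. To keep the sketch self-contained, the hex values are copied from the default palette in `engines/design.md`; real code would import them from `templates/base.py`. The helper `colors_for` and its choice to raise an error beyond five series are assumptions.

```python
# Values copied from the default palette for illustration only;
# production code should import these tokens instead of redefining them.
PRIMARY = "1B2A4A"
ACCENT_POSITIVE = "1B7D46"
ACCENT_WARNING = "D4820A"
ACCENT_NEGATIVE = "C0392B"
NEUTRAL_600 = "8C8A84"
CHART_COLORS = [PRIMARY, ACCENT_POSITIVE, ACCENT_WARNING, ACCENT_NEGATIVE, NEUTRAL_600]


def colors_for(n_series: int) -> list[str]:
    """Return the first n palette colors; refuse more than the palette holds."""
    if not 1 <= n_series <= len(CHART_COLORS):
        raise ValueError(f"n_series must be between 1 and {len(CHART_COLORS)}")
    return CHART_COLORS[:n_series]
```

When a dataset has more than five series, grouping the smallest ones into an "Others" series is one way to stay within the limit.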
---
## Code Templates
For specific chart implementation code (bar, line, pie, scatter, combo, matplotlib embed), load `engines/chart-templates.md`.

575
skills/xlsx/engines/design.md Executable file

@@ -0,0 +1,575 @@
# XLSX Design System
**The single authoritative style reference. All table styles must be derived from this file — custom color values are prohibited.**
---
## 1. Design Philosophy
### Borderless-first
Visual elements exist only where data exists; blank areas remain **completely clean** — no borders, no fills, no visual noise.
- **Has content → has style**: Data regions use alternating row fills to distinguish rows
- **No content → no trace**: Cells outside the data range receive no formatting whatsoever
- **Minimal borders**: Only a single line at the header bottom and totals top; everything else relies on whitespace and alternating fills
### Typography First
Establish hierarchy through **font size, font weight, and color value**, not lines and frames.
### Three-Color Discipline
The entire table uses **at most 3 color roles**: primary, secondary, and accent. Everything else uses warm-toned neutrals (black, white, gray).
---
## 2. Color System (Three-Color Rule)
### 2.1 Color Role Definitions
Each table allows only 3 color roles + neutral base:
| Role | Token | Responsibility | Area Ratio |
|------|-------|----------------|------------|
| **Primary** | `primary` | Header background, title text | ~5-8% |
| **Secondary** | `secondary` | Section title background, totals row | ~2-3% |
| **Accent** | `accent` | Status indicators (positive/negative/warning) | ≤2% |
| **Neutral** | `neutral-*` | Body text, background, alternating rows | ~90% |
### 2.2 Default Palette
```python
# === Primary (deep blue family) ===
PRIMARY = "1B2A4A" # Header background, title text color
PRIMARY_LIGHT = "D6E4F0" # Light variant of primary → secondary (section titles, totals row)
# === Secondary (derived from primary) ===
SECONDARY = PRIMARY_LIGHT # Secondary = light version of primary
# === Accent (semantic, used on demand) ===
ACCENT_POSITIVE = "1B7D46" # Positive: growth, completed, on-target (deep green)
ACCENT_NEGATIVE = "C0392B" # Negative: decline, overdue, off-target (deep red)
ACCENT_WARNING = "D4820A" # Warning: approaching, needs attention (deep amber)
# === Neutral palette (warm gray) ===
NEUTRAL_900 = "37352F" # Body text
NEUTRAL_600 = "8C8A84" # Secondary text, annotations
NEUTRAL_200 = "E9E9E8" # Divider lines, header bottom line
NEUTRAL_100 = "F7F7F5" # Alternating row fill (odd rows)
NEUTRAL_50 = "FAFAF9" # Very light base color (optional)
NEUTRAL_0 = "FFFFFF" # White (even rows)
```
### 2.3 Style Palette System (Style-First Palette Engine)
Palettes are implemented via `templates/palettes.py`, **purely style-driven, not bound to domains**.
Domains (finance/education/sales…) only affect data formats and header conventions, not colors.
**12 style palettes:**
All theme headers use PRIMARY background + white text.
| # | Style | Keyword Triggers | PRIMARY | Positioning |
|---|------|-----------|---------|------|
| 01 | **professional** | 正式/商务/汇报/默认 | `1B2A4A` deep blue | Universal default |
| 02 | **warm** | 温暖/活力/热情 | `B85C1E` warm orange | Vibrant and impactful |
| 03 | **elegant** | 极简/简约 | `2C2C2C` charcoal | High-end minimalist |
| 04 | **creative** | 文艺/莫兰迪/设计感 | `6C5B7B` purple-gray | Artistic distinction |
| 05 | **muji** | 无印/呼吸感/素净 | `2C2C2C` warm black | MUJI pencil-on-paper |
| 06 | **aesop** | 沙岩/大地色/护肤 | `3D3229` earth brown | Premium skincare packaging |
| 07 | **kinfolk** | 奶油/刊物/杂志/拿铁 | `5C524C` cocoa | Independent magazine aesthetic |
| 08 | **celine** | 黑白/时装/冷冽/mono | `000000` pure black | Fashion house coldness |
| 09 | **bottega** | 墨绿/深绿/森林/贵气 | `2D4A3E` dark green | Italian luxury restraint |
| 10 | **chanel** | 米金/香奈儿/奶茶/高级 | `1C1917` ink | Champagne gold elegance |
| 11 | **bloomberg** | 终端/深蓝/金融终端/工业/包豪斯 | `0D1B2A` deep space | Financial data aesthetic |
| 12 | **original_blue** | 原始/经典蓝/传统蓝 | `1B2A4A` classic blue | Original blue-black scheme |
**Three-step matching logic (priority from high to low):**
1. **Explicit style keywords** → direct match ("make a warm table" → warm)
2. **Scene keyword inference** → indirect match ("sales monthly report" → warm, "student grades" → muji)
3. **No match** → professional (safe default, no guessing)
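
The three-step priority can be sketched as below. This is not the code in `templates/palettes.py`: the function `match_style`, the reduced keyword tables, and the English keywords `warm`, `muji`, and `elegant` are assumptions made for the example.

```python
# Step 1 table: explicit style keywords (subset of the palette table above).
STYLE_KEYWORDS = {
    "warm": ("warm", "温暖", "活力", "热情"),
    "muji": ("muji", "无印", "呼吸感", "素净"),
    "elegant": ("elegant", "极简", "简约"),
}
# Step 2 table: scene hints taken from the examples in the list above.
SCENE_HINTS = {"sales": "warm", "student grades": "muji"}


def match_style(request: str) -> str:
    """Resolve a palette name: explicit keyword, then scene hint, then default."""
    text = request.lower()
    for style, keywords in STYLE_KEYWORDS.items():
        if any(k in text for k in keywords):
            return style
    for hint, style in SCENE_HINTS.items():
        if hint in text:
            return style
    return "professional"  # safe default, no guessing
```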
**Usage:**
```python
import base
base.use_palette("help me make a warm sales monthly report") # → warm
base.use_palette_explicit("warm") # → warm
base.get_active_style() # → 'warm'
```
Each palette is a complete color set (PRIMARY + SECONDARY + ACCENT × 3 + NEUTRAL × 6 + HEADER_TEXT + CHART_COLORS + CF backgrounds).
When `use_palette` is not called, the default behavior is identical to before (professional = deep blue).
### 2.4 Special Color Rules for Finance Scenarios
Only when the scene is Finance, add the following text color encoding (IB industry convention, overrides default NEUTRAL_900):
| Text Color | Hex | Meaning |
|------------|-----|--------|
| Blue | `0000FF` | Manual input values (user-modifiable) |
| Black | `000000` | Formula/calculated values |
| Green | `008000` | Cross-sheet references |
| Red | `FF0000` | External file references |
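
A possible way to pick the text color from a cell's content is sketched here. The function `finance_font_color` is hypothetical, and the checks are heuristics: `[` is assumed to mark an external workbook reference and `!` a sheet reference, which can misclassify structured table references or string literals.

```python
def finance_font_color(value) -> str:
    """Return the finance-convention hex color for a cell value or formula."""
    if not (isinstance(value, str) and value.startswith("=")):
        return "0000FF"  # blue: manual input
    if "[" in value:
        return "FF0000"  # red: external file reference, e.g. [Book.xlsx]Sheet1!A1
    if "!" in value:
        return "008000"  # green: cross-sheet reference, e.g. Assumptions!B2
    return "000000"      # black: formula within the same sheet
```

The returned hex string can be used as the `color` argument of an openpyxl `Font`.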
### 2.5 Color Prohibitions
- ❌ Do not introduce any new hues outside of `ACCENT_*`
- ❌ Do not use color for decoration (primary color is sufficient for colored headers)
- ❌ No gradient fills
- ❌ Do not mix two different PRIMARY colors in the same table
---
## 3. Font System
### 3.1 Font Hierarchy
| Token | Size | Weight | Color | Usage |
|-------|------|--------|-------|-------|
| `font-title` | 16pt | `HEADER_BOLD`* | `PRIMARY` | Table title (B2) |
| `font-header` | 11pt | `HEADER_BOLD`* | `#FFFFFF` | Column headers (white text on primary background) |
| `font-subheader` | 11pt/12pt | `HEADER_BOLD`* | `PRIMARY` | Section titles, totals row |
| `font-body` | 11pt | Normal | `NEUTRAL_900` | Body data |
| `font-caption` | 9pt | Normal | `NEUTRAL_600` | Annotations, sources, footnotes |
| `font-kpi` | 22pt | `HEADER_BOLD`* | `PRIMARY` | KPI large numbers (analysis scenes only) |
| `font-kpi-label` | 9pt | Normal | `NEUTRAL_600` | KPI labels |
> \* `HEADER_BOLD` is determined at runtime by §3.3. Heavy-stroke fonts (SimHei/YaHei/PingFang, etc.) → False, thin-stroke fonts → True.
### 3.2 Font Selection (Cross-Platform Fallback Chain)
openpyxl's `Font(name=...)` can only specify a single font name and does not support CSS-style fallback chains.
Therefore, **runtime platform detection** is needed to select the first available font from the fallback sequence:
```python
import platform, os
from openpyxl.styles import Font
def _resolve_font(candidates: list[str]) -> str:
    """Return the first font name likely available on this OS."""
    system = platform.system()
    # Quick lookup: match common fonts by platform
    _platform_hints = {
        "Darwin": {"PingFang SC", "Hiragino Sans GB", ".AppleSystemUIFont"},
        "Windows": {"Microsoft YaHei", "SimHei", "SimSun"},
        "Linux": {"Noto Sans CJK SC", "WenQuanYi Micro Hei", "Source Han Sans SC"},
    }
    available_hints = _platform_hints.get(system, set())
    for name in candidates:
        if name in available_hints:
            return name
    # Fallback: return the first in sequence (Excel will fallback on its own when opened)
    return candidates[0]
# === Font fallback sequences ===
# CJK body text (CJK Sans): prefer platform-native sans-serif fonts
CJK_BODY_CHAIN = [
    "PingFang SC",         # macOS native, best rendering
    "Microsoft YaHei",     # Windows native, screen-optimized
    "Noto Sans CJK SC",    # Linux / Android universal
    "Hiragino Sans GB",    # macOS alternative
    "Source Han Sans SC",  # Adobe Source Han Sans, cross-platform
    "SimHei",              # Classic fallback
]
# Latin/numbers: serif (for formal reports)
LATIN_BODY_CHAIN = [
    "Times New Roman",  # Available on virtually all platforms
    "Georgia",
    "serif",
]
# Runtime resolution
FONT_CJK = _resolve_font(CJK_BODY_CHAIN)
FONT_LATIN = _resolve_font(LATIN_BODY_CHAIN)
# openpyxl can only set one name; use the CJK font for Chinese tables (it also covers ASCII characters)
FONT_NAME = FONT_CJK
```
**Rules**:
- Use `FONT_NAME` uniformly across the entire table — do not mix fonts
- All `Font(name=...)` in code must use the `FONT_NAME` variable — **hardcoding font names is prohibited**
- If the user explicitly specifies a font, respect the user's choice
### 3.3 Header Bold Strategy
Not all fonts are suitable for bold. Heavy-stroke fonts (like SimHei, YaHei) become blurry when bolded —
hierarchy should be established through **font size differences or color contrast**, not font weight:
```python
# Determine whether the font is suitable for bold based on font name
_HEAVY_FONTS = {
    "SimHei", "Microsoft YaHei", "PingFang SC",
    "Noto Sans CJK SC", "Source Han Sans SC",
    "Hiragino Sans GB", "WenQuanYi Micro Hei",
}
HEADER_BOLD = FONT_NAME not in _HEAVY_FONTS
# → Heavy fonts (SimHei/YaHei/PingFang, etc.): headers not bolded, rely on background color + white text
# → Thin fonts (SimSun/Times New Roman, etc.): headers bolded
```
**Hierarchy alternatives when `HEADER_BOLD = False`**:
- Headers: no bold, rely on **primary background + white text** for distinction
- Titles: no bold, use **larger font size (16pt vs 11pt)** for hierarchy
- Totals row: no bold, use **secondary background + primary text** for distinction
- Section titles: no bold, use **primary text + slightly larger size (12pt)** for distinction
### 3.4 Alignment Rules
| Data Type | Horizontal Alignment | Notes |
|-----------|---------------------|-------|
| Numbers/amounts/percentages | Right-aligned | Ensures decimal point alignment |
| Dates | Center-aligned | |
| Text | Left-aligned | |
| Headers | Center-aligned | |
| Titles | Left-aligned | ❌ Not centered |
---
## 4. Layout System
### 4.1 Starting Position and Margins
```
A B C D E ...
1 [blank] [blank] [blank] [blank] [blank] ← Top margin
2 [blank] Title ───────────────────────── ← Title row (starts at B2, merged to data width)
3 [blank] [blank] [blank] [blank] [blank] ← Spacing row (optional: subtitle/date)
4 [blank] Header1 Header2 Header3 Header4 ← Header row
5 [blank] Data Data Data Data ← Data area start
```
- **Canvas Origin**: `B2` (left margin Column A + top margin Row 1)
- **Column A width**: 3 (pure whitespace for visual breathing room)
- **Row 1 height**: 15pt (top margin)
### 4.2 Row Height Standards
| Row Type | Height | Notes |
|----------|--------|-------|
| Title row (Row 2) | 32pt | 16pt font + top/bottom breathing room |
| Spacing row (Row 3) | 8pt | Gap between title and header |
| Header row (Row 4) | 28pt | 11pt font + wrap_text space |
| Data rows | 22pt | 11pt font + comfortable reading |
| Totals row | 26pt | Slightly taller than data rows for emphasis |
### 4.3 Column Width Guidelines
```python
COLUMN_WIDTHS = {
    'margin': 3,        # Column A whitespace
    'id_short': 8,      # Serial number, ID
    'name_cn': 16,      # Chinese name (2-4 chars)
    'name_en': 22,      # English name
    'description': 32,  # Long text
    'number': 14,       # Amounts, quantities
    'percentage': 12,   # Percentages
    'date': 14,         # Dates
    'status': 12,       # Status labels
}
# CJK character ≈ 2.5 units, Latin ≈ 1.2 units
# Minimum 8, maximum 40
```
### 4.4 Auto-Fit Column Widths (Recommended)
After populating data, call `auto_fit_columns(ws)` from `templates/base.py` to automatically size columns based on **data content** (not headers). Headers that exceed the computed width get `wrap_text=True` instead of stretching the column.
```python
from templates.base import auto_fit_columns
# After all data is written:
auto_fit_columns(ws, min_width=8, max_width=28, header_row=4, data_start_row=5)
```
**Rules**:
- Column width is determined by the widest **data cell**, not the header
- CJK characters are counted as 1.7x width (via `unicodedata.east_asian_width`)
- Headers wider than the column automatically get `wrap_text=True`
- This prevents the common problem of headers being wider than data content
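
The width rule can be illustrated with the standard library alone. The sketch below is not the `auto_fit_columns` implementation in `templates/base.py`: `display_width`, `fit_width`, and the 2-unit padding are assumptions, and only the 1.7x factor and the use of `unicodedata.east_asian_width` come from the rules above.

```python
import unicodedata


def display_width(text: str, cjk_factor: float = 1.7) -> float:
    """Estimate the rendered width of text, counting wide CJK characters as 1.7."""
    return sum(
        cjk_factor if unicodedata.east_asian_width(ch) in ("W", "F") else 1.0
        for ch in text
    )


def fit_width(values, min_width: float = 8, max_width: float = 28,
              padding: float = 2) -> float:
    """Column width from the widest data value, clamped to [min_width, max_width]."""
    widest = max((display_width(str(v)) for v in values), default=0)
    return max(min_width, min(max_width, widest + padding))
```

Only data cells are passed to `fit_width`; a header longer than the result would be wrapped rather than allowed to widen the column.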
---
## 5. Border System (Borderless-first)
### 5.1 Allowed Borders
| Position | Style | Color | Purpose |
|------|------|------|------|
| Header bottom | `thin` | `NEUTRAL_200` | Separate header from data |
| Totals top | `medium` | `NEUTRAL_200` | Mark summary row |
### 5.2 Prohibited Borders
- ❌ Full grid (all-sides thin border)
- ❌ Colored borders
- ❌ Double-line borders
- ❌ Thick borders (medium/thick) for decoration
### 5.3 Row Separation Alternative
Use **alternating row fills** instead of grid lines:
- Even rows: `NEUTRAL_0` (white)
- Odd rows: `NEUTRAL_100` (warm white `#F7F7F5`)
### 5.4 Finance Scene Exception
Finance scene retains section dividers (`PRIMARY` color), following IB industry convention.
---
## 6. Title Row Design
### 6.1 Title Style
```python
# Title: plain text, no background fill
title_font = Font(name=FONT_NAME, size=16, bold=HEADER_BOLD, color=PRIMARY)
title_align = Alignment(horizontal='left', vertical='center')
# Position: B2, merged to the last data column
ws.merge_cells(start_row=2, start_column=2, end_row=2, end_column=last_col)
ws['B2'].font = title_font
ws['B2'].alignment = title_align
ws.row_dimensions[2].height = 32
```
### 6.2 Header Style
```python
# Header: primary color background + white text
header_fill = PatternFill('solid', fgColor=PRIMARY)
header_font = Font(name=FONT_NAME, size=11, bold=HEADER_BOLD, color="FFFFFF")
header_align = Alignment(horizontal='center', vertical='center', wrap_text=True)
header_border = Border(bottom=Side(style='thin', color=NEUTRAL_200))
for cell in header_row:
    cell.fill = header_fill
    cell.font = header_font
    cell.alignment = header_align
    cell.border = header_border
ws.row_dimensions[header_row_num].height = 28
```
### 6.3 Totals Row
```python
total_fill = PatternFill('solid', fgColor=SECONDARY) # PRIMARY_LIGHT
total_font = Font(name=FONT_NAME, size=11, bold=HEADER_BOLD, color=PRIMARY)
total_border = Border(top=Side(style='medium', color=NEUTRAL_200))
for cell in total_row:
    cell.fill = total_fill
    cell.font = total_font
    cell.border = total_border
```
---
## 7. Data Area Styles
### 7.1 Alternating Row Fill
```python
for i, row in enumerate(ws.iter_rows(min_row=data_start, max_row=data_end)):
    fill_color = NEUTRAL_0 if i % 2 == 0 else NEUTRAL_100
    for cell in row:
        cell.fill = PatternFill('solid', fgColor=fill_color)
        cell.font = Font(name=FONT_NAME, size=11, color=NEUTRAL_900)
        # ❌ No borders
```
### 7.2 Empty Data Area
Cells outside the data range receive **no formatting** — no fill, no borders, no font settings. Keep Excel defaults.
### 7.3 Grid Lines
```python
ws.sheet_view.showGridLines = False # Disable Excel default grid lines
```
---
## 8. Conditional Formatting
### 8.1 When to Use
| ✅ Use | ❌ Don't Use |
|---------|----------|
| Data has comparison/ranking semantics (scores, KPIs, growth rates) | Simple entry forms, reference tables |
| Financial data with positive/negative values (profit/loss, increase/decrease) | Data rows ≤5 |
| User explicitly requests | User requests minimalist style |
### 8.2 Color Rules
Conditional formatting **uses only accent colors**:
```python
# Positive → green background + green text
POSITIVE_FILL = PatternFill(bgColor='E8F5E9')
POSITIVE_FONT = Font(color=ACCENT_POSITIVE) # "1B7D46"
# Negative → red background + red text
NEGATIVE_FILL = PatternFill(bgColor='FDEDEC')
NEGATIVE_FONT = Font(color=ACCENT_NEGATIVE) # "C0392B"
# Warning → amber background + amber text
WARNING_FILL = PatternFill(bgColor='FEF9E7')
WARNING_FONT = Font(color=ACCENT_WARNING) # "D4820A"
```
### 8.3 Color Scale
```python
from openpyxl.formatting.rule import ColorScaleRule
# Red → Yellow → Green (low → mid → high)
ws.conditional_formatting.add('B5:B100',
ColorScaleRule(
start_type='min', start_color='F8696B',
mid_type='percentile', mid_value=50, mid_color='FFEB84',
end_type='max', end_color='63BE7B'))
```
### 8.4 Data Bar
```python
from openpyxl.formatting.rule import DataBarRule
ws.conditional_formatting.add('D5:D100',
DataBarRule(start_type='min', end_type='max',
color=PRIMARY, showValue=True))
# Data Bar color uses primary, maintaining color discipline
```
---
## 9. Chart Colors
Chart colors are **derived from the design system**, not maintained separately:
```python
CHART_COLORS = [
PRIMARY, # 1st data series = primary
ACCENT_POSITIVE, # 2nd series
ACCENT_WARNING, # 3rd series
ACCENT_NEGATIVE, # 4th series
NEUTRAL_600, # 5th series (gray)
]
```
- Single series chart → use only `PRIMARY`
- Two series → `PRIMARY` + `ACCENT_POSITIVE`
- Multiple series → pick colors in order from the table above
- **Never exceed 5 colors**
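
The selection rules above can be sketched as a small helper. This is an illustration only, not part of `templates/base.py`: `pick_chart_colors` is a hypothetical name, and the hex values for `PRIMARY` and `NEUTRAL_600` are placeholders because this section does not state them (the accent values are the ones quoted in §8.2).

```python
# Placeholder tokens for illustration; the real values live in templates/base.py.
PRIMARY = "1F4E79"          # assumption: not specified in this section
ACCENT_POSITIVE = "1B7D46"
ACCENT_WARNING = "D4820A"
ACCENT_NEGATIVE = "C0392B"
NEUTRAL_600 = "787774"      # assumption: not specified in this section

CHART_COLORS = [PRIMARY, ACCENT_POSITIVE, ACCENT_WARNING, ACCENT_NEGATIVE, NEUTRAL_600]


def pick_chart_colors(series_count: int) -> list[str]:
    """Return the first `series_count` palette colors, in palette order."""
    if series_count < 1:
        raise ValueError("a chart needs at least one series")
    if series_count > len(CHART_COLORS):
        # The design system caps charts at five colors; refuse rather than invent one.
        raise ValueError(f"at most {len(CHART_COLORS)} series colors are allowed")
    return CHART_COLORS[:series_count]
```

Raising on a sixth series, instead of cycling the palette, keeps the "never exceed 5 colors" rule visible at the call site.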
---
## 10. Number Formats
### 10.1 General Formats
```python
FORMATS = {
'integer': '#,##0',
'decimal_1': '#,##0.0',
'decimal_2': '#,##0.00',
'percentage': '0.0%',
'currency_cny': '¥#,##0.00',
'currency_usd': '$#,##0.00',
'date': 'YYYY-MM-DD',
}
```
### 10.2 Financial Formats
→ Full financial number format definitions are in **`scenes/finance.md §Number Formatting`**, not repeated here.
In brief: zero values display as `"-"`, negatives appear in parentheses, e.g. `($123)`, and headers state the units, e.g. `"Revenue ($mm)"`.
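
As a rough illustration of the zero and negative rules, an Excel number format can carry separate sections for positive, negative, and zero values, separated by semicolons. The format string and the `preview_finance` helper below are assumptions made for this sketch; the authoritative definitions remain in `scenes/finance.md`.

```python
# Three-section Excel number format: positive ; negative ; zero.
# Illustrative only; not the definition used by scenes/finance.md.
FINANCE_USD = '$#,##0_);($#,##0);"-"_)'


def preview_finance(value: float) -> str:
    """Approximate how Excel renders `value` under FINANCE_USD.

    Padding from the `_)` placeholders is ignored, and rounding may differ
    from Excel on exact halves.
    """
    if value == 0:
        return "-"
    text = f"${abs(value):,.0f}"
    return f"({text})" if value < 0 else text
```

A preview like this is only useful for sanity-checking expectations in Python; the cell itself should still store the raw number with the format applied.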
---
## 11. Code Templates
All design tokens, font resolution, and style factory functions have been extracted into **`templates/base.py`**.
> `templates/base.py` is the single code-level implementation. This file (design.md) is the design specification document; `base.py` is the corresponding executable code.
### Usage
```python
# In all scene/engine code, import base.py directly
import sys, os
sys.path.insert(0, os.path.join(os.path.dirname(__file__), '..', 'templates'))
from base import *
# Then use directly:
from openpyxl import Workbook
wb = Workbook()
ws = wb.active
ws.title = "Sheet1"
headers = ["Column1", "Column2", "Column3"]
last_col = len(headers) + 1 # Starting from B=2
# One call to set up sheet basics + title
setup_sheet(ws, title="Table Title", last_col=last_col)
# Write headers
for col_idx, header in enumerate(headers, start=2):
    ws.cell(row=4, column=col_idx, value=header)
style_header_row(ws, row_num=4, col_start=2, col_end=last_col)
# Write data rows (`data` is the caller's list of row sequences)
for i, row_data in enumerate(data):
    row_num = 5 + i
    for col_idx, value in enumerate(row_data, start=2):
        ws.cell(row=row_num, column=col_idx, value=value)
    style_data_row(ws, row_num=row_num, col_start=2, col_end=last_col, row_index=i)
# Write totals row
total_row_num = 5 + len(data)
style_total_row(ws, row_num=total_row_num, col_start=2, col_end=last_col)
wb.properties.creator = "Z.ai"
wb.save("output.xlsx")
```
### Complete API provided by base.py
| Category | Exports |
|------|------|
| **Constants** | `FONT_NAME`, `HEADER_BOLD`, `PRIMARY`, `PRIMARY_LIGHT`, `SECONDARY`, `ACCENT_*`, `NEUTRAL_*`, `CHART_COLORS`, `COLUMN_WIDTHS`, `FORMATS`, `ROW_HEIGHTS` |
| **Conditional Formatting** | `CF_POSITIVE_FILL/FONT`, `CF_NEGATIVE_FILL/FONT`, `CF_WARNING_FILL/FONT` |
| **Font Factories** | `font_title()`, `font_header()`, `font_subheader()`, `font_body()`, `font_caption()`, `font_kpi()`, `font_kpi_label()` |
| **Fill Factories** | `fill_header()`, `fill_total()`, `fill_data_row(row_index)` |
| **Border Factories** | `border_header()`, `border_total()` |
| **Alignment Factories** | `align_title()`, `align_header()`, `align_number()`, `align_text()`, `align_date()` |
| **Sheet Helpers** | `setup_sheet(ws, title, last_col)`, `style_header_row(...)`, `style_data_row(...)`, `style_total_row(...)` |
| **Utility Functions** | `normalize_cell_value(value)`, `copy_style(source, target)` |
---
## 12. Design Checklist
Verify each item before delivering a table:
- [ ] Colors ≤ 3 hues (primary + accents, excluding neutrals)
- [ ] No formatting outside data area
- [ ] No full-grid borders (only header bottom line + totals top line)
- [ ] Alternating row fill applied
- [ ] Title has no background fill, left-aligned
- [ ] Numbers right-aligned, text left-aligned
- [ ] Grid lines disabled (`showGridLines = False`)
- [ ] Starting position is B2, column A is margin
- [ ] Body text color is `NEUTRAL_900` (`#37352F`), not pure black
- [ ] Chart colors come from Design Tokens, no new colors introduced

@@ -0,0 +1,435 @@
# VBA Code Templates
Ready-to-use VBA templates for common automation tasks. Copy and customize.
Load `scenes/vba.md` first for code standards and injection workflow.
---
## Template 1: Auto-Generate Monthly Report
```vba
Option Explicit
' ============================================================
' Module: ModMonthlyReport
' Purpose: Auto-generate monthly summary from raw data sheet
' ============================================================
Public Sub GenerateMonthlyReport()
On Error GoTo ErrHandler
Application.ScreenUpdating = False
Application.Calculation = xlCalculationManual
Dim wsData As Worksheet
Dim wsSummary As Worksheet
Dim lastRow As Long
Dim reportMonth As String
' Get target month
reportMonth = InputBox("Enter month (YYYY-MM):", "Report Month", Format(Date, "YYYY-MM"))
If reportMonth = "" Then GoTo CleanUp
' Reference sheets
Set wsData = ThisWorkbook.Sheets("Data")
' Create or clear summary sheet
On Error Resume Next
Set wsSummary = ThisWorkbook.Sheets("Summary_" & reportMonth)
On Error GoTo ErrHandler
If wsSummary Is Nothing Then
Set wsSummary = ThisWorkbook.Sheets.Add(After:=ThisWorkbook.Sheets(ThisWorkbook.Sheets.Count))
wsSummary.Name = "Summary_" & reportMonth
Else
wsSummary.Cells.Clear
End If
lastRow = wsData.Cells(wsData.Rows.Count, "A").End(xlUp).Row
' Write headers
wsSummary.Range("A1").Value = "Monthly Report: " & reportMonth
wsSummary.Range("A1").Font.Size = 16
wsSummary.Range("A1").Font.Bold = True
wsSummary.Range("A3").Value = "Category"
wsSummary.Range("B3").Value = "Total Amount"
wsSummary.Range("C3").Value = "Count"
wsSummary.Range("D3").Value = "Average"
' Aggregate by category (using Dictionary)
Dim dict As Object
Set dict = CreateObject("Scripting.Dictionary")
Dim i As Long
Dim cat As String
Dim amt As Double
For i = 2 To lastRow
' Filter by month (assuming date in column A, category in B, amount in C)
If Format(wsData.Cells(i, 1).Value, "YYYY-MM") = reportMonth Then
cat = CStr(wsData.Cells(i, 2).Value)
amt = CDbl(wsData.Cells(i, 3).Value)
If dict.Exists(cat) Then
dict(cat) = Array(dict(cat)(0) + amt, dict(cat)(1) + 1)
Else
dict.Add cat, Array(amt, 1)
End If
End If
Next i
' Write results
Dim outRow As Long
outRow = 4
Dim key As Variant
For Each key In dict.Keys
wsSummary.Cells(outRow, 1).Value = key
wsSummary.Cells(outRow, 2).Value = dict(key)(0)
wsSummary.Cells(outRow, 2).NumberFormat = "#,##0.00"
wsSummary.Cells(outRow, 3).Value = dict(key)(1)
wsSummary.Cells(outRow, 4).Value = dict(key)(0) / dict(key)(1)
wsSummary.Cells(outRow, 4).NumberFormat = "#,##0.00"
outRow = outRow + 1
Next key
' Auto-fit columns
wsSummary.Columns("A:D").AutoFit
MsgBox "Report generated: " & dict.Count & " categories", vbInformation
CleanUp:
Application.ScreenUpdating = True
Application.Calculation = xlCalculationAutomatic
Exit Sub
ErrHandler:
MsgBox "Error: " & Err.Description, vbCritical
Resume CleanUp
End Sub
```
---
## Template 2: Batch Process Multiple Sheets
```vba
Option Explicit
' ============================================================
' Module: ModBatchProcess
' Purpose: Apply same operation to all data sheets
' ============================================================
Public Sub BatchProcessSheets()
On Error GoTo ErrHandler
Application.ScreenUpdating = False
Dim ws As Worksheet
Dim processedCount As Long
For Each ws In ThisWorkbook.Worksheets
' Skip non-data sheets
If Left(ws.Name, 1) <> "_" And ws.Name <> "Summary" And ws.Name <> "Config" Then
Call ProcessSingleSheet(ws)
processedCount = processedCount + 1
End If
Next ws
MsgBox processedCount & " sheets processed.", vbInformation
CleanUp:
Application.ScreenUpdating = True
Exit Sub
ErrHandler:
MsgBox "Error on sheet '" & ws.Name & "': " & Err.Description, vbCritical
Resume CleanUp
End Sub
Private Sub ProcessSingleSheet(ws As Worksheet)
Dim lastRow As Long
lastRow = ws.Cells(ws.Rows.Count, "A").End(xlUp).Row
' Example: Add a "Total" row at the bottom
Dim lastCol As Long
lastCol = ws.Cells(1, ws.Columns.Count).End(xlToLeft).Column
Dim totalRow As Long
totalRow = lastRow + 1
ws.Cells(totalRow, 1).Value = "Total"
ws.Cells(totalRow, 1).Font.Bold = True
Dim col As Long
For col = 2 To lastCol
' Only sum if column contains numbers
If IsNumeric(ws.Cells(2, col).Value) Then
ws.Cells(totalRow, col).Formula = "=SUM(" & _
ws.Cells(2, col).Address & ":" & ws.Cells(lastRow, col).Address & ")"
ws.Cells(totalRow, col).Font.Bold = True
End If
Next col
End Sub
```
---
## Template 3: Data Validation & Cleanup
```vba
Option Explicit
' ============================================================
' Module: ModDataCleanup
' Purpose: Validate and clean data, log issues
' ============================================================
Public Sub ValidateAndClean()
On Error GoTo ErrHandler
Application.ScreenUpdating = False
Dim wsData As Worksheet
Dim wsLog As Worksheet
Dim lastRow As Long
Dim logRow As Long
Dim issueCount As Long
Set wsData = ThisWorkbook.Sheets("Data")
lastRow = wsData.Cells(wsData.Rows.Count, "A").End(xlUp).Row
' Create log sheet
On Error Resume Next
Application.DisplayAlerts = False
ThisWorkbook.Sheets("ValidationLog").Delete
Application.DisplayAlerts = True
On Error GoTo ErrHandler
Set wsLog = ThisWorkbook.Sheets.Add(After:=ThisWorkbook.Sheets(ThisWorkbook.Sheets.Count))
wsLog.Name = "ValidationLog"
wsLog.Range("A1:D1").Value = Array("Row", "Column", "Issue", "Original Value")
logRow = 2
Dim i As Long
For i = 2 To lastRow
' Check: Empty required fields (columns A-C)
Dim col As Long
For col = 1 To 3
If IsEmpty(wsData.Cells(i, col)) Or Trim(CStr(wsData.Cells(i, col).Value)) = "" Then
wsLog.Cells(logRow, 1).Value = i
wsLog.Cells(logRow, 2).Value = wsData.Cells(1, col).Value
wsLog.Cells(logRow, 3).Value = "Empty required field"
logRow = logRow + 1
issueCount = issueCount + 1
End If
Next col
' Check: column D must be numeric and non-negative
If Not IsEmpty(wsData.Cells(i, 4)) Then
If Not IsNumeric(wsData.Cells(i, 4).Value) Then
wsLog.Cells(logRow, 1).Value = i
wsLog.Cells(logRow, 2).Value = wsData.Cells(1, 4).Value
wsLog.Cells(logRow, 3).Value = "Non-numeric value"
wsLog.Cells(logRow, 4).Value = wsData.Cells(i, 4).Value
logRow = logRow + 1
issueCount = issueCount + 1
ElseIf CDbl(wsData.Cells(i, 4).Value) < 0 Then
wsLog.Cells(logRow, 1).Value = i
wsLog.Cells(logRow, 2).Value = wsData.Cells(1, 4).Value
wsLog.Cells(logRow, 3).Value = "Negative value"
wsLog.Cells(logRow, 4).Value = wsData.Cells(i, 4).Value
logRow = logRow + 1
issueCount = issueCount + 1
End If
End If
' Clean: Trim whitespace from text columns
For col = 1 To 3
If Not IsEmpty(wsData.Cells(i, col)) Then
Dim cleaned As String
cleaned = Trim(CStr(wsData.Cells(i, col).Value))
If cleaned <> CStr(wsData.Cells(i, col).Value) Then
wsData.Cells(i, col).Value = cleaned
End If
End If
Next col
Next i
' Format log
wsLog.Columns("A:D").AutoFit
wsLog.Range("A1:D1").Font.Bold = True
If issueCount > 0 Then
wsLog.Activate
MsgBox issueCount & " issues found. See ValidationLog sheet.", vbExclamation
Else
MsgBox "All data validated. No issues found.", vbInformation
End If
CleanUp:
Application.ScreenUpdating = True
Exit Sub
ErrHandler:
MsgBox "Error: " & Err.Description, vbCritical
Resume CleanUp
End Sub
```
---
## Template 4: Multi-File Consolidation
```vba
Option Explicit
' ============================================================
' Module: ModConsolidate
' Purpose: Merge data from multiple Excel files into one sheet
' ============================================================
Public Sub ConsolidateFiles()
On Error GoTo ErrHandler
Application.ScreenUpdating = False
' Let user select files
Dim files As Variant
files = Application.GetOpenFilename( _
FileFilter:="Excel Files (*.xlsx;*.xlsm),*.xlsx;*.xlsm", _
Title:="Select Files to Consolidate", _
MultiSelect:=True)
If Not IsArray(files) Then
MsgBox "No files selected.", vbInformation
GoTo CleanUp
End If
Dim wsDest As Worksheet
Set wsDest = ThisWorkbook.Sheets("Consolidated")
wsDest.Cells.Clear
Dim destRow As Long
destRow = 1
Dim headerWritten As Boolean
Dim fileIndex As Long
For fileIndex = LBound(files) To UBound(files)
Dim wbSource As Workbook
Set wbSource = Workbooks.Open(CStr(files(fileIndex)), ReadOnly:=True)
Dim wsSource As Worksheet
Set wsSource = wbSource.Sheets(1) ' First sheet
Dim srcLastRow As Long
srcLastRow = wsSource.Cells(wsSource.Rows.Count, "A").End(xlUp).Row
Dim srcLastCol As Long
srcLastCol = wsSource.Cells(1, wsSource.Columns.Count).End(xlToLeft).Column
' Copy header from first file only
If Not headerWritten Then
wsSource.Range(wsSource.Cells(1, 1), wsSource.Cells(1, srcLastCol)).Copy _
Destination:=wsDest.Cells(destRow, 1)
' Add "Source File" column
wsDest.Cells(destRow, srcLastCol + 1).Value = "Source File"
destRow = destRow + 1
headerWritten = True
End If
' Copy data rows
If srcLastRow >= 2 Then
wsSource.Range(wsSource.Cells(2, 1), wsSource.Cells(srcLastRow, srcLastCol)).Copy _
Destination:=wsDest.Cells(destRow, 1)
' Tag source file
Dim r As Long
For r = destRow To destRow + srcLastRow - 2
wsDest.Cells(r, srcLastCol + 1).Value = Dir(CStr(files(fileIndex)))
Next r
destRow = destRow + srcLastRow - 1
End If
wbSource.Close SaveChanges:=False
Set wbSource = Nothing ' so ErrHandler does not try to close an already-closed workbook
Next fileIndex
wsDest.Columns.AutoFit
MsgBox "Consolidated " & UBound(files) - LBound(files) + 1 & " files, " & _
destRow - 2 & " data rows.", vbInformation
CleanUp:
Application.ScreenUpdating = True
Exit Sub
ErrHandler:
MsgBox "Error: " & Err.Description, vbCritical
If Not wbSource Is Nothing Then wbSource.Close SaveChanges:=False
Resume CleanUp
End Sub
```
---
## Template 5: Button-Triggered Automation
```vba
' ============================================================
' In ThisWorkbook module — create button on sheet
' ============================================================
' Add button programmatically (run once):
Sub CreateRunButton()
Dim ws As Worksheet
Set ws = ThisWorkbook.Sheets("Dashboard")
Dim btn As Button
Set btn = ws.Buttons.Add(Left:=10, Top:=10, Width:=120, Height:=36)
btn.Caption = "Generate Report"
btn.OnAction = "ModMonthlyReport.GenerateMonthlyReport"
btn.Font.Size = 11
End Sub
```
---
## Template 6: Protected Sheet with Editable Ranges
```vba
Option Explicit
' ============================================================
' Module: ModProtection
' Purpose: Lock sheet but allow editing in specific ranges
' ============================================================
Public Sub SetupProtection()
Dim ws As Worksheet
Set ws = ThisWorkbook.Sheets("Input")
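' Remove existing protection, then lock every cell by default
' NOTE: "admin123" is a sample password; replace it before real use
ws.Unprotect Password:="admin123"
ws.Cells.Locked = True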
' Unlock editable ranges
ws.Range("C5:C20").Locked = False ' Input cells
ws.Range("E5:E20").Locked = False ' Comment cells
' Visual hint: light yellow for editable cells
ws.Range("C5:C20").Interior.Color = RGB(255, 255, 230)
ws.Range("E5:E20").Interior.Color = RGB(255, 255, 230)
' Protect with options
ws.Protect Password:="admin123", _
DrawingObjects:=True, _
Contents:=True, _
Scenarios:=True, _
AllowFormattingCells:=False, _
AllowInsertingRows:=False, _
AllowDeletingRows:=False, _
AllowSorting:=True, _
AllowFiltering:=True, _
AllowUsingPivotTables:=False
MsgBox "Sheet protected. Editable ranges highlighted in yellow.", vbInformation
End Sub
```

196
skills/xlsx/quality/pipeline.md Executable file
@@ -0,0 +1,196 @@
# Spreadsheet Integrity Pipeline
Every xlsx deliverable is built and verified through a role-based workflow. Three roles collaborate in sequence: **Blueprint Architect**, **Builder**, and **Inspector**. Each role has explicit responsibilities and handoff criteria.
---
## Tool Reference: xlsx.py
All commands: `python3 "$XLSX_SKILL_DIR/xlsx.py" <command> [arguments]`
| Command | Purpose | Called By |
|---------|---------|-----------|
| `recalc <file>` | Recalculate formulas via LibreOffice, scan for errors | Builder (self-check) |
| `audit <file>` | Deep formula error scan + zero-value + implicit array detection | Builder (self-check) |
| `scan <file>` | Detect out-of-range, header-included, small-aggregate, inconsistent patterns | Builder (self-check) |
| `inspect <file> --pretty` | Get sheet structure, data ranges, headers (JSON) | Blueprint Architect |
| `pivot <in> <out> --source --values [--rows --cols --filters --style --chart]` | Create PivotTable | Builder (final step only) |
| `chart-verify <file>` | Verify embedded charts have data | Builder (self-check) |
| `validate <file>` | Structural validation (release gate) | Inspector |
---
## Role 1: Blueprint Architect
Before any code runs, the Architect produces a build plan:
- **Decompose the request**: separate explicit requirements from implicit business context
- **Map every sheet**: name, column structure, formula dependencies, cross-references
- **Identify data flow**: which sheets feed into which (source → derived → summary)
- **Flag ambiguity**: if the request is unclear, ask — don't guess
The Architect's output is a mental blueprint. No files are created yet.
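
If it helps to make the blueprint explicit, the sheet map and data flow can be written down as plain data and ordered by dependency. This is only a sketch: `SheetPlan` and `build_order` are hypothetical names, and nothing in the pipeline requires the plan to exist as code.

```python
from dataclasses import dataclass, field
from graphlib import TopologicalSorter


@dataclass
class SheetPlan:
    """One planned sheet: its columns and the sheets its formulas read from."""
    name: str
    columns: list[str]
    depends_on: list[str] = field(default_factory=list)


def build_order(plans: list[SheetPlan]) -> list[str]:
    """Order sheets so that every source sheet is built before its dependents."""
    known = {p.name for p in plans}
    for p in plans:
        missing = set(p.depends_on) - known
        if missing:
            raise ValueError(f"{p.name} depends on unplanned sheets: {sorted(missing)}")
    graph = {p.name: set(p.depends_on) for p in plans}
    # static_order() raises graphlib.CycleError on circular dependencies.
    return list(TopologicalSorter(graph).static_order())
```

For a source → derived → summary chain, the resulting order is the sequence in which the Builder would complete sheets one at a time.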
---
## Role 2: Builder
The Builder writes code and produces the workbook. The Builder operates under a strict **single-sheet discipline**: complete one sheet fully, verify it, then move on.
### Build Cycle (per sheet)
```
┌─────────────────────────────────────────────┐
│ Write sheet (data, formulas, styling, charts) │
│ ↓ │
│ Save workbook to disk │
│ ↓ │
│ Self-check chain: │
│ recalc → audit → scan │
│ + chart-verify (if sheet has charts) │
│ ↓ │
│ All clear? ──Yes──→ Proceed to next sheet │
│ │ │
│ No │
│ ↓ │
│ Fix errors → re-save → re-run self-check │
│ (loop until clean) │
└─────────────────────────────────────────────┘
```
### Builder Constraints
- **No batch-then-check**: you cannot create all sheets first and verify later. Errors in early sheets propagate silently into later sheets.
- **No error forwarding**: a sheet with unresolved errors blocks all subsequent work.
- **No silent delivery**: a file that hasn't passed self-check is not a deliverable — it's a draft.
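
The self-check chain can be sketched as a list of commands plus a pluggable runner. The command names come from the tool reference above; the helper names are hypothetical, and treating any non-zero exit code as a failure is an assumption, since exit-code semantics are stated here only for `validate`.

```python
import subprocess
from typing import Callable, Sequence

Runner = Callable[[Sequence[str]], int]


def self_check_commands(skill_dir: str, workbook: str, has_charts: bool) -> list[list[str]]:
    """Build the self-check chain in order: recalc, audit, scan, then chart-verify if needed."""
    steps = ["recalc", "audit", "scan"]
    if has_charts:
        steps.append("chart-verify")
    tool = f"{skill_dir}/xlsx.py"
    return [["python3", tool, step, workbook] for step in steps]


def run_self_check(commands: list[list[str]], runner: Runner) -> list[str]:
    """Run each command and return the names of the steps that failed.

    Assumption: a non-zero exit code means the step found problems.
    """
    return [cmd[2] for cmd in commands if runner(cmd) != 0]


def subprocess_runner(cmd: Sequence[str]) -> int:
    """Default runner: execute the command and return its exit code."""
    return subprocess.run(list(cmd)).returncode
```

A Builder loop would call `run_self_check(..., subprocess_runner)` after each save and move to the next sheet only when the returned list is empty.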
### Pivot Tables — Special Sequencing
PivotTables depend on finalized source data. They are always the **last data operation**:
```bash
python3 "$XLSX_SKILL_DIR/xlsx.py" inspect input.xlsx --pretty # understand structure
python3 "$XLSX_SKILL_DIR/xlsx.py" pivot input.xlsx output.xlsx \
--source "Sheet!A1:F100" \
--values "Revenue:sum,Units:count" \
--rows "Product,Region" \
--cols "Quarter" \
--filters "Year" \
--location "Summary!A3" \
--style "finance" \
--chart "bar"
```
Aggregations: `sum`, `count`, `average`/`avg`, `max`, `min`
Chart types: `bar` (default), `line`, `pie`
Styles: `monochrome` (default), `finance`
**Never modify pivot output with openpyxl afterward** — it corrupts the pivotCache.
---
## Role 3: Inspector
The Inspector runs after all sheets are built. Two levels of inspection: **Semantic** and **Structural**.
### Semantic Inspection (for edit/transform tasks)
When the task involves transforming existing data (not creating from scratch), verify the transformation didn't corrupt meaning:
| Check | Method |
|-------|--------|
| **Row count** | Does output have the expected number of rows? (e.g., grouping 15 rows by 5 keys → expect 5 rows) |
| **Column totals** | Do numeric sums in output match source? (or expected transformation) |
| **Spot-check** | Compare 2-3 specific rows between source and output |
| **Formula evaluability** | Can formulas be verified in Python? If self-referencing or cross-sheet, verify computed values instead |
```python
# Semantic verification template
source_total = sum(normalize_cell_value(ws_src.cell(row=r, column=c).value) or 0
for r in range(start, end + 1))
output_total = sum(normalize_cell_value(ws_out.cell(row=r, column=c).value) or 0
for r in range(out_start, out_end + 1))
assert abs(source_total - output_total) < 0.01, f"Total mismatch: {source_total} vs {output_total}"
```
### Structural Inspection (release gate)
```bash
python3 "$XLSX_SKILL_DIR/xlsx.py" validate output.xlsx
```
- Exit 0 → file is releasable
- Non-zero → Builder must regenerate from scratch with corrected code
---
## Known Traps & Countermeasures
These are recurring failure modes. The Builder must internalize them.
| Trap | What Goes Wrong | Countermeasure |
|------|----------------|----------------|
| `data_only=True` then save | Formulas permanently replaced with cached values | Never save after opening with `data_only=True` |
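| Column index miscalculation | Hand-computed letters drift, e.g. col 64 is "BL", not "BK" | Always use `openpyxl.utils.get_column_letter()` |
| Row offset confusion | DataFrame index 5 = Excel row 6, or row 7 when row 1 holds headers | Excel is 1-indexed, pandas is 0-indexed; add the header offset |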
| NaN leaks into formulas | `=A1+nan` → broken formula string | Check `pd.notna()` before referencing |
| Cross-sheet reference typo | `Sheet1!A1` vs `'Sheet 1'!A1` | Quote sheet names containing spaces |
| Division by zero | `#DIV/0!` in Excel | Wrap with `IFERROR()` or `IF(denom=0,...)` |
| Text starting with `=` | `#NAME?` error | Prefix descriptive text with `'` |
| Implicit array formula | `#N/A` in Excel | Avoid `MATCH(TRUE(),range>0,0)`, use `SUMPRODUCT` |
| Chart renders blank | Formula cells have no cached values | Run `recalc` before creating charts |
| Hidden rows → empty chart | Chart skips hidden data | Set `chart.visible_cells_only = False` |
| Overlapping charts | Multiple charts stacked on same cells | Calculate anchor: ~15 rows per chart + 2 rows gap |
| Verify newly-written formulas with `data_only=True` → get `None` | openpyxl doesn't evaluate formulas; `data_only=True` only reads Excel's cached values which don't exist for new formulas | Compute expected values in Python and compare directly. For TOTAL rows needing verification, write computed values (see SKILL.md Design Principle #1 Exception) |
| Manual row sort breaks references | Value-swap sorting doesn't update formula references | After sorting by swapping data, regenerate all formula strings with updated row numbers |
| NBSP (`\xa0`) treated as non-empty | Cells containing `\xa0` or `\u200b` look blank but fail `is None` | Normalize: `\xa0`, `\u200b`, whitespace-only → `None` before comparison or aggregation |
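
The blank-cell normalization from the last row can be sketched as follows. `normalize_blank` is a hypothetical stand-in written for this example; it is not the `normalize_cell_value` implementation from `templates/base.py`, whose exact behavior is not shown here.

```python
def normalize_blank(value):
    """Map visually blank cell values to None; leave everything else unchanged."""
    if value is None:
        return None
    if isinstance(value, str):
        # str.strip() already removes NBSP (\xa0) but not the zero-width
        # space (\u200b), so drop that one explicitly first.
        stripped = value.replace("\u200b", "").replace("\xa0", " ").strip()
        if stripped == "":
            return None
    return value
```

Numeric zero is deliberately kept as `0`: it is a real value, and only strings that render as empty are treated as missing.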
---
## Cross-Validation Review Sheet
For analysis-heavy deliverables, embed a self-checking Review sheet in the workbook.
### When Required
- Deliverables with computed metrics or aggregated data
- Financial models with cross-sheet references
- Data sourced from external APIs or web searches
### Structure
```python
review_ws = wb.create_sheet("Review")
review_ws.sheet_properties.tabColor = "FFC000" # amber tab
checks = [
["Check", "Expected", "Actual", "Status"],
["Total Revenue", "=SUM(Data!B2:B100)", "=Summary!B10", '=IF(B2=C2,"✓ PASS","✗ FAIL")'],
["Row Count", "=COUNTA(Data!A:A)-1", "=Summary!B3", '=IF(B3=C3,"✓ PASS","✗ FAIL")'],
["Grand Total Match", "=Detail!F50", "=Dashboard!C5", '=IF(B4=C4,"✓ PASS","✗ FAIL")'],
]
for i, row in enumerate(checks, 1):
for j, val in enumerate(row, 1):
review_ws.cell(row=i, column=j, value=val)
```
### Rules
- Every Summary/Dashboard metric must have a cross-check formula back to source data
- Status column uses live formulas — green if correct, red if mismatch
- Review is the **last sheet** in the workbook, or the second to last when a Sources sheet is present
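
Hard-coding the row numbers inside each status formula is easy to get wrong once checks are added or reordered. A minimal sketch that derives them instead, assuming the same four-column layout as the structure above (`build_review_rows` is a hypothetical helper name):

```python
def build_review_rows(checks):
    """Turn (label, expected_formula, actual_formula) triples into Review sheet rows.

    Row 1 is the header, so the first check lands on row 2.
    """
    rows = [["Check", "Expected", "Actual", "Status"]]
    for offset, (label, expected, actual) in enumerate(checks):
        r = offset + 2
        status = f'=IF(B{r}=C{r},"✓ PASS","✗ FAIL")'
        rows.append([label, expected, actual, status])
    return rows
```

The returned rows can be written with the same nested loop used in the structure example.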
---
## Release Checklist
Before handing the file to the user:
- [ ] Every sheet passed the Builder's self-check chain
- [ ] Semantic inspection passed (if applicable)
- [ ] `validate` returned exit code 0
- [ ] All temp files, drafts, and retry artifacts removed
- [ ] If multiple versions exist from retries, only the latest correct version remains
- [ ] Every remaining file in the output directory is an expected deliverable
- [ ] **VBA check** (if `.xlsm`): VBA modules preserved, no unintended macro removal
- [ ] **VBA security** (if VBA generated): passes security checklist in `scenes/vba.md`

271
skills/xlsx/scenes/advanced.md Executable file
@@ -0,0 +1,271 @@
# Scene: Advanced Operations
## When This Applies
Batch processing multiple files, handling very large datasets, data validation, conditional formatting, sheet protection, or other power-user features.
---
## Large File Handling (>100K rows)
### Read-Only Mode
```python
from openpyxl import load_workbook
# Memory-efficient reading — does NOT load entire file
wb = load_workbook('huge.xlsx', read_only=True)
ws = wb.active
for row in ws.iter_rows(min_row=2, values_only=True):
process(row) # Yields rows one at a time
wb.close() # MUST close read-only workbooks
```
### Write-Only Mode
```python
from openpyxl import Workbook
wb = Workbook(write_only=True)
ws = wb.create_sheet()
# Write rows sequentially — cannot random-access cells
for data_row in large_dataset:
ws.append(data_row)
wb.save('output.xlsx')
```
### Chunked Processing with pandas
```python
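import pandas as pd

# pd.read_excel has no chunksize parameter (only read_csv does),
# so for Excel limit what is loaded: specific columns and a row cap
df = pd.read_excel('huge.xlsx',
    usecols='A,C,E',    # Excel column letters; pass a list to select by header name
    nrows=50000,        # Limit rows
    dtype={'id': str}   # Skip type inference for 'id' (assumes that header is among the selected columns)
)

# For CSV input, stream in chunks instead
for chunk in pd.read_csv('huge.csv', chunksize=10000):
    process(chunk)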
```
---
## Batch Processing Multiple Files
```python
import os
import glob
import pandas as pd
from openpyxl import Workbook
from openpyxl.utils.dataframe import dataframe_to_rows

# Collect all Excel files
files = glob.glob('data/*.xlsx')
# Method 1: Concatenate into one DataFrame
all_data = []
for f in files:
    df = pd.read_excel(f)
    df['source_file'] = os.path.basename(f)
    all_data.append(df)
combined = pd.concat(all_data, ignore_index=True)
combined.to_excel('combined.xlsx', index=False)
# Method 2: One sheet per file
wb = Workbook()
wb.remove(wb.active)  # Remove default sheet
for f in files:
    df = pd.read_excel(f)
    # Sheet titles are limited to 31 characters
    ws = wb.create_sheet(title=os.path.splitext(os.path.basename(f))[0][:31])
    for r in dataframe_to_rows(df, index=False, header=True):
        ws.append(r)
wb.save('all_files.xlsx')
wb.save('all_files.xlsx')
```
---
## Data Validation (Dropdown Lists)
```python
from openpyxl.worksheet.datavalidation import DataValidation
# Dropdown list
dv = DataValidation(
type="list",
formula1='"High,Medium,Low"',
allow_blank=True,
showErrorMessage=True,
errorTitle="Invalid",
error="Please select High, Medium, or Low"
)
ws.add_data_validation(dv)
dv.add('D5:D100') # Apply to range
# Number range validation
dv_num = DataValidation(
    type="whole",
    operator="between",
    formula1=1,
    formula2=100,
    showErrorMessage=True,
    errorTitle="Out of range",
    error="Enter a number between 1 and 100"
)
ws.add_data_validation(dv_num)
dv_num.add('E5:E100')
# Date validation
dv_date = DataValidation(
    type="date",
    operator="greaterThan",
    formula1="DATE(2024,1,1)"  # a bare 2024-01-01 would be evaluated as a subtraction
)
ws.add_data_validation(dv_date)
dv_date.add('F5:F100')
```
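
The inline list in `formula1` is a single quoted, comma-separated string, which makes it easy to build incorrectly from a Python list. The helper below is a sketch under stated assumptions: `list_formula` is a hypothetical name, it rejects items containing commas or double quotes to stay on the safe side, and it enforces the 255-character limit Excel applies to inline lists.

```python
def list_formula(options):
    """Build the quoted inline-list string for a dropdown, e.g. '"High,Medium,Low"'."""
    cleaned = [str(o).strip() for o in options]
    if not cleaned or any(o == "" for o in cleaned):
        raise ValueError("options must be non-empty strings")
    if any("," in o or '"' in o for o in cleaned):
        # The comma is the item delimiter, so it cannot appear inside an item.
        raise ValueError("inline lists cannot contain commas or double quotes; use a cell range")
    joined = ",".join(cleaned)
    if len(joined) > 255:
        raise ValueError("inline list exceeds 255 characters; use a cell range")
    return f'"{joined}"'
```

For longer or comma-containing option sets, pointing `formula1` at a cell range that holds the options avoids both restrictions.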
---
## Conditional Formatting
For full conditional formatting rules, color usage, and code examples → see **`engines/design.md §8`**.
Quick reference for advanced-only patterns (FormulaRule for row-level highlighting):
```python
from openpyxl.formatting.rule import FormulaRule
from openpyxl.styles import PatternFill
# Formula-based: highlight entire row if status = "Overdue"
ws.conditional_formatting.add('B5:H100',
    FormulaRule(formula=['$G5="Overdue"'],
                fill=PatternFill(start_color='FFEBEE', end_color='FFEBEE', fill_type='solid')))
# Note: openpyxl can write icon sets (IconSetRule), but color fills are preferred here
```
---
## Sheet Protection
```python
# Protect sheet (allow sort + filter, prevent edits)
ws.protection.sheet = True
ws.protection.password = 'mypassword'
# These flags mean "locked": False permits the action on a protected sheet
ws.protection.sort = False
ws.protection.autoFilter = False
# Unlock specific cells for user input
from openpyxl.styles import Protection
unlocked = Protection(locked=False)
for row in range(5, 101):
ws.cell(row=row, column=4).protection = unlocked # Column D is editable
# Protect workbook structure (prevent adding/deleting sheets)
wb.security.workbookPassword = 'structpass'
wb.security.lockStructure = True
```
---
## Named Ranges
```python
from openpyxl.workbook.defined_name import DefinedName
# Create named range
ref = f"'Data'!$B$5:$B$100"
defn = DefinedName('SalesData', attr_text=ref)
wb.defined_names.add(defn)
# Use in formulas
ws['H5'] = '=SUM(SalesData)'
```
---
## Auto-Filter & Sort
```python
# Apply auto-filter
ws.auto_filter.ref = 'B4:H100'
# Add filter criteria (for saved state — user can change in Excel)
ws.auto_filter.add_filter_column(0, ['Active', 'Pending'])
# Sort (openpyxl can set sort state, but actual reordering
# must be done in Python before writing)
df = df.sort_values(['Category', 'Revenue'], ascending=[True, False])
```
---
## Merged Cells
```python
# Merge cells
ws.merge_cells('B2:H2') # Title spanning full width
# Write to merged range (write to top-left cell)
ws['B2'] = 'Report Title'
# Check existing merges before editing
for merge_range in ws.merged_cells.ranges:
print(f"Merged: {merge_range}")
# Unmerge if needed
ws.unmerge_cells('B2:H2')
```
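**Warning**: Write only to the top-left cell of a merged range. Writing to any other cell in the range either raises `AttributeError` (recent openpyxl versions expose those cells as read-only `MergedCell` objects) or risks corrupting the merge.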
---
## Performance Tips
| Technique | When | Impact |
|-----------|------|--------|
| `read_only=True` | Reading files >50K rows | ~10x less memory |
| `write_only=True` | Writing files >50K rows | ~5x faster |
| `usecols` parameter | Only need specific columns | Faster read |
| `ws.append()` instead of `ws.cell()` | Writing rows in tight loops | Faster write |
| Batch style application | Apply to ranges, not cell-by-cell | Faster formatting |
| `data_only=True` for analysis | Need values not formulas | Faster read |
---
## VBA Module Inspection
When working with `.xlsm` files, you can check for a VBA project and list the package parts that carry it:
```python
import zipfile

def list_vba_modules(filepath):
    """Report whether a workbook carries a VBA project and list its archive parts.

    Returns parts such as 'xl/vbaProject.bin'. Individual module names live
    inside that binary and need an OLE parser to extract.
    """
    if not filepath.endswith(('.xlsm', '.xlsb')):
        return {"has_vba": False, "modules": []}
    try:
        with zipfile.ZipFile(filepath, 'r') as zf:
            # Only the vbaProject parts belong to VBA; the archive also holds every sheet
            vba_files = [f for f in zf.namelist() if f.startswith('xl/vbaProject')]
    except Exception as e:
        return {"has_vba": False, "error": str(e)}
    return {"has_vba": bool(vba_files), "modules": vba_files}
```
Use this to inspect before editing — know what VBA exists before you touch the file.


@@ -0,0 +1,234 @@
# Analyze Recipes — Code Patterns for Data Analysis
> Load this file ON DEMAND when you need specific code patterns. Do NOT load upfront.
---
## Load & Explore
```python
import pandas as pd
df = pd.read_excel('input.xlsx') # or read_csv, read_json
# Multi-sheet: pd.read_excel('input.xlsx', sheet_name=None) → dict
print(f"Shape: {df.shape}")
print(f"Columns: {list(df.columns)}")
print(f"Dtypes:\n{df.dtypes}")
print(f"Nulls:\n{df.isnull().sum()}")
print(f"Duplicates: {df.duplicated().sum()}")
print(f"\nDescribe:\n{df.describe()}")
```
---
## Aggregation & Grouping
```python
summary = df.groupby('Category').agg(
total=('Revenue', 'sum'),
avg=('Revenue', 'mean'),
count=('Revenue', 'count'),
max_val=('Revenue', 'max')
).round(2)
pivot = df.pivot_table(
values='Amount', index='Category', columns='Quarter',
aggfunc='sum', margins=True
)
```
---
## Time Series
```python
df['date'] = pd.to_datetime(df['date'])
# Note: pandas >= 2.2 deprecates the 'M' alias; use 'ME' (month end) there
monthly = df.resample('M', on='date').agg({'revenue': 'sum', 'orders': 'count'})
monthly['growth'] = monthly['revenue'].pct_change()
monthly['rolling_3m'] = monthly['revenue'].rolling(3).mean()
```
---
## Comparison / Diff
```python
df1 = pd.read_excel('this_month.xlsx')
df2 = pd.read_excel('last_month.xlsx')
merged = df1.merge(df2, on='ID', suffixes=('_new', '_old'))
merged['change'] = merged['value_new'] - merged['value_old']
merged['change_pct'] = (merged['change'] / merged['value_old'] * 100).round(1)
```
---
## Statistical Analysis
```python
stats = df.describe().T
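# numeric_only=True avoids a TypeError on text/date columns in pandas 2.x
stats['median'] = df.median(numeric_only=True)
stats['skew'] = df.skew(numeric_only=True)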
corr = df.select_dtypes(include='number').corr().round(3)
top_10 = df.nlargest(10, 'Revenue')
bottom_10 = df.nsmallest(10, 'Revenue')
```
---
## Data Cleaning
```python
df = df.drop_duplicates()
# Coerce types first; unparseable values become NaN/NaT
df['date'] = pd.to_datetime(df['date'], errors='coerce')
df['amount'] = pd.to_numeric(df['amount'], errors='coerce')
# Then fill whatever is still missing
df['amount'] = df['amount'].fillna(0)
df['name'] = df['name'].fillna('Unknown')
# Remove outliers (IQR)
Q1, Q3 = df['value'].quantile([0.25, 0.75])
IQR = Q3 - Q1
df = df[(df['value'] >= Q1 - 1.5*IQR) & (df['value'] <= Q3 + 1.5*IQR)]
```
---
## Bridge Pattern: pandas → openpyxl
```python
from openpyxl import Workbook
from openpyxl.utils.dataframe import dataframe_to_rows
wb = Workbook()
ws = wb.active
ws.title = "Analysis"
for r_idx, row in enumerate(dataframe_to_rows(summary, index=True, header=True), 1):
    for c_idx, value in enumerate(row, 1):
        ws.cell(row=r_idx + 3, column=c_idx + 1, value=value)
```
---
## KPI Summary Card
```python
kpis = [
('Total Revenue', total_revenue, '$#,##0'),
('Avg Order Value', avg_order, '$#,##0.00'),
('Growth Rate', growth_rate, '0.0%'),
('Total Orders', total_orders, '#,##0'),
]
col = 2
for label, value, fmt in kpis:
    ws.cell(row=3, column=col, value=label)
    ws.cell(row=4, column=col, value=value)
    ws.cell(row=4, column=col).number_format = fmt
    col += 3
```
---
## Cross-Validation Review Sheet
```python
review_ws = wb.create_sheet("Review")
review_ws.sheet_properties.tabColor = "FFC000"
checks = [
["Check", "Expected", "Actual", "Status"],
["Total Revenue", "=SUM(Data!B2:B100)", "=Summary!B10", '=IF(B2=C2,"✓ PASS","✗ FAIL")'],
["Row Count", "=COUNTA(Data!A:A)-1", "=Summary!B3", '=IF(B3=C3,"✓ PASS","✗ FAIL")'],
]
for i, row in enumerate(checks, 1):
    for j, val in enumerate(row, 1):
        review_ws.cell(row=i, column=j, value=val)
```
---
## xlsx.py Pivot Workflow
```bash
python3 "$XLSX_SKILL_DIR/xlsx.py" inspect data.xlsx --pretty
python3 "$XLSX_SKILL_DIR/xlsx.py" pivot data.xlsx output.xlsx \
--source "Data!A1:F500" \
--rows "Product,Region" \
--values "Revenue:sum,Units:count" \
--location "Summary!A3" \
--style "finance" \
--chart "bar"
python3 "$XLSX_SKILL_DIR/xlsx.py" validate output.xlsx
```
### PivotTable Best Practices
- Source data: first row must have unique, non-blank headers
- No merged cells or blank rows in source range
- Place pivot on a dedicated sheet, position at A3 or B2
- Row axis: primary grouping; Column axis: ≤10 distinct values
- Values: numeric measures only
### PivotTable Troubleshooting
| Symptom | Remedy |
|---------|--------|
| "Field not found" | Check header spelling via `inspect` |
| PivotTable empty | Ensure `--source` covers all data rows |
| `validate` reports pivot errors | Critical — must fix |
| `validate` reports `pass_with_warnings` | Safe to deliver |
---
## Alternating Column Structure (Key-Value Pairs)
When odd columns contain identifiers and even columns contain corresponding values (e.g., O=PartNo, P=Qty, Q=PartNo, R=Qty, ...):
**Detection heuristic**:
- Odd columns have repeated values or category codes
- Even columns are numeric
- Headers alternate between descriptive and quantitative names
**Solution**: Use SUMIF across the combined key/value ranges:
```python
# Excel formula: =SUMIF(O2:W2, A2, P2:X2)
# SUMIF matches position-by-position across multi-column ranges
formula = f'=SUMIF(O{row}:W{row},A{row},P{row}:X{row})'
```
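Since openpyxl does not evaluate formulas, a plain-Python cross-check of what the offset SUMIF computes can help confirm the result. The sketch below is illustrative only: `sumif_alternating` and the sample row are hypothetical, and it assumes keys are text, so a key can never equal a quantity.

```python
def sumif_alternating(cells, key):
    """Mimic =SUMIF(O:W, key, P:X) on a single row.

    cells: the row's values from the first key column to the last value
    column, e.g. [PartNo, Qty, PartNo, Qty, ...].
    """
    total = 0.0
    # Pair each cell with its right-hand neighbour, as the offset ranges do.
    for criteria_cell, sum_cell in zip(cells[:-1], cells[1:]):
        if criteria_cell == key and isinstance(sum_cell, (int, float)):
            total += sum_cell
    return total


# Columns O..X of one row: five PartNo/Qty pairs, one pair empty
row = ["P-100", 4, "P-200", 7, "P-100", 2, None, None, "P-300", 1]
print(sumif_alternating(row, "P-100"))
```

Unlike Excel's SUMIF, the `==` comparison here is case-sensitive, so normalize key casing first if the sheet mixes cases.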
---
## FIFO Allocation Formula (Cumulative Deduction)
Scenario: Allocate limited inventory to order lines in sequence — each row gets what's left after previous rows consumed their share.
**Formula template** (row N):
```
=MAX(0, MIN(OrderQty_N,
TotalInventory_for_key - SUM_of_already_allocated_above))
```
**Example** (H column = allocated qty):
```python
# Row 2 (first row): allocate up to available inventory
f'=MIN(G2, SUMIFS(Sheet2!D:D, Sheet2!A:A, A2, Sheet2!B:B, D2))'
# Row 3+ (subsequent): subtract already-allocated from rows above
f'=MAX(0, MIN(G{r}, SUMIFS(Sheet2!D:D, Sheet2!A:A, A{r}, Sheet2!B:B, D{r})'
f' - SUMIFS(H$1:H{r-1}, A$1:A{r-1}, A{r}, D$1:D{r-1}, D{r})))'
```
**Key**: `SUMIFS(H$1:H{r-1}, ...)` creates a running total of already-allocated amounts, achieving row-by-row deduction.
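⚠️ Each formula references its own column in the rows above (a running total, not a circular reference). openpyxl does not evaluate formulas, so it cannot verify the result. Recalculate the workbook and open it in Excel to confirm the calculation.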
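One way to sanity-check the allocation is to reproduce it in plain Python and compare the numbers against the recalculated workbook. A minimal sketch, assuming each order is a `(key, qty)` pair and `inventory` maps a key to its total stock; the function name and sample data are illustrative and not part of the skill:

```python
def fifo_allocate(orders, inventory):
    """Allocate stock to orders in sequence; returns allocated qty per order."""
    allocated_so_far = {}
    result = []
    for key, qty in orders:
        used = allocated_so_far.get(key, 0)
        remaining = inventory.get(key, 0) - used
        # Same shape as =MAX(0, MIN(OrderQty, TotalInventory - AlreadyAllocated))
        alloc = max(0, min(qty, remaining))
        allocated_so_far[key] = used + alloc
        result.append(alloc)
    return result


orders = [("A", 5), ("B", 2), ("A", 4), ("A", 3)]
allocation = fifo_allocate(orders, {"A": 8, "B": 10})
print(allocation)
```

The formulas above match on two columns (A and D), so for that case the key would be a tuple such as `(part_no, warehouse)`.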
### Data Provenance Implementation
```python
from openpyxl.styles import Font, PatternFill

# PRIMARY, FONT_NAME, HEADER_BOLD are design tokens (see engines/design.md)
src_ws = wb.create_sheet("Sources")
src_ws.sheet_properties.tabColor = PRIMARY
headers = ["Data Description", "Source Name", "Source URL", "Access Date"]
for col, h in enumerate(headers, 1):
    cell = src_ws.cell(row=1, column=col, value=h)
    cell.font = Font(name=FONT_NAME, bold=HEADER_BOLD, color="FFFFFF")
    cell.fill = PatternFill(start_color=PRIMARY, end_color=PRIMARY, fill_type="solid")
```

95
skills/xlsx/scenes/analyze.md Executable file

@@ -0,0 +1,95 @@
# Scene: Data Analysis → Excel Output
## When This Applies
User wants to analyze data (statistics, trends, comparisons, pivots, aggregation) and receive results as an Excel file — possibly with charts, summary tables, or dashboards.
This scene bridges **pandas analysis** with **openpyxl output**. The deliverable is always an .xlsx file.
## Workflow
```
1. LOAD → Read input data (CSV/XLSX/JSON/DB)
2. EXPLORE → Understand structure, quality, distributions
3. ANALYZE → Compute metrics, aggregations, statistical tests
4. DESIGN → Plan Excel output (sheets, charts, KPIs)
5. BUILD → Write analysis results to .xlsx with formatting
6. CHART → Add charts (Excel-native or embedded matplotlib)
7. QA → recalc → audit → scan → chart-verify
8. PIVOT → If needed, run xlsx.py pivot as final step
9. VALIDATE → validate → deliver
```
## Analysis Framework
### Phase A: Problem Framing
- What question is the user trying to answer?
- Who will consume this output? (executive summary vs. detailed analysis)
- What decisions will be made based on this data?
### Phase B: Data Quality Assessment
- Missing values: count, pattern (random vs. systematic)
- Outliers: statistical detection (IQR, z-score)
- Data types: numeric vs. categorical, date parsing
- Duplicates: exact and fuzzy
### Phase C: Exploratory Analysis
- Distributions: histograms, box plots for key variables
- Correlations: pairwise for numeric columns
- Segmentation: group-by analysis on categorical dimensions
- Time patterns: trends, seasonality if time-series data
### Phase D: Insight Extraction
- Rank findings by business impact, not statistical significance
- Each insight must be actionable — "so what?" test
- Cross-validate: check the same insight from a different angle
### Phase E: Cross-Validation
- Sanity check totals against known benchmarks
- Verify computed metrics with alternative formulas
- Document any assumptions or limitations in the output
**Industry-specific frameworks:**
- **Finance**: Variance analysis → trend decomposition → ratio analysis → peer comparison
- **Marketing**: Funnel analysis → cohort analysis → attribution → ROI calculation
- **Operations**: Throughput analysis → bottleneck identification → utilization rates → SLA compliance
---
## Multi-Sheet Report Layout
```
Sheet 1: "Dashboard" — KPI cards + summary chart
Sheet 2: "Detail" — Full analysis table with formatting
Sheet 3: "Charts" — Additional visualizations
Sheet 4: "Raw Data" — Original data for reference (tab color: gray)
```
### KPI Summary Card Pattern
Place 4-6 KPI metrics at the top of Dashboard sheet (row 3-4), each spaced 3 columns apart. Include label (small, gray) and value (large, bold, themed) with appropriate number format.
---
## PivotTable Decision
| Situation | Use |
|-----------|-----|
| Need interactive PivotTable in Excel | `"$XLSX_SKILL_DIR/xlsx.py" pivot` |
| Just need a summary table (static) | pandas `pivot_table` → openpyxl |
| Simple aggregation (1 dimension) | pandas `groupby` → openpyxl |
**Trigger phrases**: summarize, aggregate, group by, categorize, breakdown, distribution, tally, totals per, cross-tab, 汇总, 透视, 分类统计, 交叉分析
---
## Data Provenance
When analysis uses external data, create a **"Sources" sheet** (tab color: `PRIMARY`) with columns: Data Description | Source Name | Source URL | Access Date.
Skip when user provides all data directly.
---
## Code Recipes
For specific code patterns (aggregation, time series, comparison, cleaning, bridge pattern), load `scenes/analyze-recipes.md` on demand.

133
skills/xlsx/scenes/convert.md Executable file

@@ -0,0 +1,133 @@
# Scene: Format Conversion
## When This Applies
User wants to convert between tabular file formats: CSV↔XLSX, JSON→XLSX, TSV→XLSX, PDF table→XLSX, or XLSX→CSV/JSON.
## Conversion Matrix
| Conversion | Method | Notes |
|------------|--------|-------|
| CSV/TSV → XLSX | pandas read → openpyxl write with formatting | Most common |
| JSON → XLSX | pandas json_normalize → openpyxl | Flatten nested structures |
| XLSX → CSV | pandas read_excel → to_csv | Simple export |
| XLSX → JSON | pandas read_excel → to_json | With orient parameter |
| PDF table → XLSX | pdfplumber/tabula extract → openpyxl | Needs table detection |
| Image table → XLSX | OCR → pandas → openpyxl | Last resort, error-prone |
## CSV/TSV → XLSX
```python
import pandas as pd
from openpyxl import Workbook
from openpyxl.utils.dataframe import dataframe_to_rows
# Read with an explicit encoding (pandas does not detect it for you)
df = pd.read_csv('input.csv', encoding='utf-8')
# Common encodings: utf-8, gbk, gb2312, latin-1, shift_jis
# Handle messy CSVs
df = pd.read_csv('input.csv',
encoding='utf-8',
sep=',', # or '\t', ';', '|'
skiprows=2, # skip junk header rows
na_values=['N/A', '-', ''],
dtype=str, # read everything as string first, convert later
on_bad_lines='skip' # skip malformed rows
)
# Convert types after reading
df['amount'] = pd.to_numeric(df['amount'], errors='coerce')
df['date'] = pd.to_datetime(df['date'], errors='coerce')
# Write to Excel with formatting
wb = Workbook()
ws = wb.active
# Write data starting at B4 (with theme formatting)
for r_idx, row in enumerate(dataframe_to_rows(df, index=False, header=True), 4):
    for c_idx, value in enumerate(row, 2):
        ws.cell(row=r_idx, column=c_idx, value=value)
# Apply design tokens from engines/design.md
# ...
wb.save('output.xlsx')
```
## JSON → XLSX
```python
import pandas as pd
import json
# Flat JSON
df = pd.read_json('input.json')
# Nested JSON — flatten
with open('input.json') as f:
    data = json.load(f)
# If it's a list of objects
df = pd.json_normalize(data, max_level=2)
# If nested with specific record path
df = pd.json_normalize(data, record_path='items', meta=['id', 'name'])
# Write to Excel...
```
## XLSX → CSV/JSON
```python
# To CSV
df = pd.read_excel('input.xlsx', sheet_name='Data')
df.to_csv('output.csv', index=False, encoding='utf-8-sig') # utf-8-sig for Excel compatibility
# To JSON
df.to_json('output.json', orient='records', force_ascii=False, indent=2)
# Multiple sheets → multiple CSVs
sheets = pd.read_excel('input.xlsx', sheet_name=None)
for name, df in sheets.items():
    df.to_csv(f'output_{name}.csv', index=False, encoding='utf-8-sig')
```
## PDF Table → XLSX
```python
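# Method 1: pdfplumber (preferred for most PDFs)
import pandas as pd
import pdfplumber

rows = []
with pdfplumber.open('input.pdf') as pdf:
    for page in pdf.pages:
        for table in page.extract_tables():
            rows.extend(table)  # each table is a list of rows
# Clean and convert to DataFrame.
# Assumes all tables share the same columns and only the first row is a header;
# drop repeated header rows if the table restarts on every page.
df = pd.DataFrame(rows[1:], columns=rows[0])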
# Method 2: tabula-py (Java-based, good for complex tables)
# import tabula
# dfs = tabula.read_pdf('input.pdf', pages='all', multiple_tables=True)
```
## Encoding Gotchas
| Scenario | Encoding | Tip |
|----------|----------|-----|
| Chinese data from Windows | `gbk` or `gb2312` | Try gbk first |
| Japanese data | `shift_jis` or `cp932` | |
| European data | `latin-1` or `cp1252` | |
| Excel-generated CSV | `utf-8-sig` (has BOM) | pandas handles automatically |
| Output CSV for Excel | Write with `utf-8-sig` | Prevents garbled Chinese in Excel |
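When the source encoding is unknown, one common approach is to try the candidates from the table in order and keep the first that decodes without error. The helper below is a sketch, not part of the skill: `decode_with_fallback` and its default candidate list are assumptions. `latin-1` must come last because it never fails and would otherwise mask the correct encoding.

```python
def decode_with_fallback(raw, encodings=("utf-8-sig", "gbk", "shift_jis", "latin-1")):
    """Return (text, encoding) for the first encoding that decodes cleanly."""
    for enc in encodings:
        try:
            return raw.decode(enc), enc
        except UnicodeDecodeError:
            continue
    raise ValueError("none of the candidate encodings could decode the data")


# Simulate a CSV exported on a Chinese-locale Windows machine
raw = "产品,数量\n螺丝,3\n".encode("gbk")
text, used = decode_with_fallback(raw)
print(used)
```

A clean decode does not prove the guess is right (several legacy encodings accept the same bytes), so still eyeball a few rows before converting.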
## Quality Checks After Conversion
- [ ] Row count matches source
- [ ] No garbled characters (encoding correct)
- [ ] Numeric columns are numbers, not strings
- [ ] Dates are date objects, not text
- [ ] No blank rows/columns from source artifacts
- [ ] Headers are in the correct row
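Part of this checklist can be automated. The sketch below works on plain lists of row values (for example from `ws.iter_rows(values_only=True)`); `conversion_issues` and `looks_numeric` are hypothetical helpers, and the checks are deliberately simple heuristics.

```python
def looks_numeric(text):
    """True if the string parses as a number once thousands commas are removed."""
    try:
        float(text.replace(",", ""))
        return True
    except ValueError:
        return False


def conversion_issues(source_row_count, converted_rows):
    """Return human-readable problems found in the converted rows."""
    issues = []
    if source_row_count != len(converted_rows):
        issues.append(
            f"row count mismatch: source={source_row_count}, "
            f"converted={len(converted_rows)}"
        )
    for r, row in enumerate(converted_rows, start=1):
        if all(v is None or v == "" for v in row):
            issues.append(f"blank row {r}")
        for c, v in enumerate(row, start=1):
            if isinstance(v, str) and "\ufffd" in v:
                issues.append(f"garbled text at row {r}, col {c}")
            elif isinstance(v, str) and looks_numeric(v):
                issues.append(f"number stored as text at row {r}, col {c}")
    return issues


rows = [["Widget", 3], ["Gadget", "1,250"], [None, None]]
found = conversion_issues(3, rows)
print(found)
```

Date and header-row checks depend on the specific file, so they are left to manual review here.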

105
skills/xlsx/scenes/create.md Executable file

@@ -0,0 +1,105 @@
# Scene: Create New Spreadsheet
## When This Applies
User wants to create a new Excel file from scratch — a table, template, schedule, report, or any structured data output.
For financial models, also load `scenes/finance.md`.
## Workflow
```
1. PLAN → Identify all sheets, their structure, formulas, cross-references
2. STYLE → Load engines/design.md, apply default palette
3. BUILD → Create workbook, write data/formulas/formatting per sheet
4. QA → recalc → audit → scan → chart-verify (if charts)
5. PIVOT → If needed, run pivot command LAST
6. VALIDATE → validate → exit 0 = deliver
```
## Layout & Styling
All layout rules (Canvas Origin B2, column widths, row heights, margins) and styling (title/header/data/totals) are defined in **`engines/design.md`** — the single source of truth. Do not duplicate here.
Quick reference for sheet structure:
```
Row 1: [top margin]
Row 2: Title (B2)
Row 3: [spacer]
Row 4: Column headers
Row 5+: Data rows
Last+1: Totals row
Last+3: Notes/sources
```
## Multi-Sheet Workbooks
### Cross-Sheet References
```python
# Reference another sheet
sheet['C5'] = "=Data!B10"
# Sheet names with spaces need quotes
sheet['C5'] = "='Sales Data'!B10"
# Green font for cross-sheet links (Finance theme)
sheet['C5'].font = Font(color="008000")
```
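Quoting can be applied unconditionally, since Excel accepts quoted names even when quotes are not strictly required. A minimal sketch, assuming a hypothetical helper named `sheet_ref`; apostrophes inside a sheet name are escaped by doubling them:

```python
def sheet_ref(sheet_name, cell):
    """Build a cross-sheet formula reference, always quoting the sheet name."""
    escaped = sheet_name.replace("'", "''")  # apostrophes are doubled inside quotes
    return f"='{escaped}'!{cell}"


print(sheet_ref("Sales Data", "B10"))
```

Always quoting avoids having to decide which names (spaces, punctuation, names that look like cell addresses such as `Q1`) need it.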
### Common Multi-Sheet Patterns
- **Data + Summary**: Raw data on Sheet1, formulas/charts on Summary
- **Monthly tabs**: Jan, Feb, Mar... + Annual Summary
- **Input + Output**: Assumptions sheet + Calculations sheet + Dashboard
## Template Patterns
### Simple Data Table
```python
from openpyxl import Workbook
from openpyxl.utils import get_column_letter

wb = Workbook()
ws = wb.active
ws.title = "Data"
# Title + Headers + Data + Totals styling → see engines/design.md §11 Code Templates
# Only show formula logic here:
# Headers at B4
headers = ['Product', 'Q1', 'Q2', 'Q3', 'Q4', 'Total']
for col, h in enumerate(headers, 2):
    cell = ws.cell(row=4, column=col, value=h)
# Data rows starting at row 5
# ...
# Totals row (last_data_row = row number of the final data row written above)
total_row = last_data_row + 1
ws.cell(row=total_row, column=2, value='Total')
for col in range(3, 7):  # Q1-Q4
    letter = get_column_letter(col)
    ws.cell(row=total_row, column=col).value = f'=SUM({letter}5:{letter}{last_data_row})'
# Grand total
ws.cell(row=total_row, column=7).value = f'=SUM(C{total_row}:F{total_row})'
```
### Schedule / Calendar
- Use merged cells for day headers
- Conditional formatting for weekends (light gray fill)
- Freeze panes: `ws.freeze_panes = 'C5'` (freeze header + left labels)
### Checklist / Tracker
- Checkbox column using data validation (`TRUE`/`FALSE`)
- Status column with conditional formatting (green/amber/red)
- Progress bar using data bar conditional formatting
## Freeze Panes & Print
```python
# Freeze headers (row 4) and label column (col B)
ws.freeze_panes = 'C5' # Rows 1-4 and cols A-B stay visible
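# Print setup
ws.page_setup.orientation = 'landscape'
# fitToWidth/fitToHeight take effect only when fit-to-page is enabled
# (PageSetupProperties comes from openpyxl.worksheet.properties)
ws.sheet_properties.pageSetUpPr = PageSetupProperties(fitToPage=True)
ws.page_setup.fitToWidth = 1
ws.page_setup.fitToHeight = 0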
ws.print_area = 'B2:H50'
ws.print_title_rows = '4:4' # Repeat header on each page
```


@@ -0,0 +1,222 @@
# Edit Patterns — Reusable Code for Complex Edit Operations
> Load this file ON DEMAND when you encounter grouping, sorting, block detection, or other complex edit patterns.
> Do NOT load upfront for simple edits.
---
## Pattern: Block Detection
Data is often split into independent blocks separated by blank rows or keyword rows (e.g., TOTAL, Subtotal).
```python
def detect_blocks(ws, col=1, start_row=1, end_row=None,
                  separator='blank', keyword='TOTAL'):
    """
    Detect data block boundaries.
    separator: 'blank' (empty row) or 'keyword' (row containing keyword)
    Returns: list of (start_row, end_row) tuples
    """
    if end_row is None:
        end_row = ws.max_row
    blocks, block_start = [], None
    for row in range(start_row, end_row + 1):
        val = ws.cell(row=row, column=col).value
        is_blank = val is None or (isinstance(val, str) and val.strip() == '')
        # Case-insensitive keyword match
        is_kw = (separator == 'keyword' and
                 isinstance(val, str) and keyword.upper() in val.upper())
        if separator == 'blank':
            if not is_blank and block_start is None:
                block_start = row
            elif is_blank and block_start is not None:
                blocks.append((block_start, row - 1))
                block_start = None
        elif separator == 'keyword':
            if is_kw:
                if block_start is not None:
                    blocks.append((block_start, row))
                    block_start = None
            elif not is_blank and block_start is None:
                block_start = row
    if block_start is not None:
        blocks.append((block_start, end_row))
    return blocks
```
---
## Pattern: Pre-filter Null Rows
Before any groupby/aggregation, filter out rows where key columns are empty.
```python
def pre_filter_rows(ws, key_cols, start_row, end_row):
    """Return row numbers where ALL key columns are non-null."""
    return [row for row in range(start_row, end_row + 1)
            if all(normalize_cell_value(ws.cell(row=row, column=c).value) is not None
                   for c in key_cols)]
```
---
## Pattern: Sort with Formula Rewrite
When sorting rows by swapping data (not using `insert_rows`), formulas must be regenerated with new row numbers.
```python
def sort_block_with_formulas(ws, block_rows, sort_col, formula_templates,
                             descending=True):
    """
    Sort rows within a block, regenerating formulas.
    formula_templates: dict {col_index: '=B{row}+C{row}'}
    """
    # 1. Read all row data + compute sort key
    rows_data = []
    for r in block_rows:
        vals = {c: ws.cell(row=r, column=c).value for c in range(1, ws.max_column + 1)}
        rows_data.append(vals)
    rows_data.sort(key=lambda x: (x.get(sort_col) or 0), reverse=descending)
    # 2. Write back with new row numbers
    for i, rd in enumerate(rows_data):
        target = block_rows[i]
        for col, val in rd.items():
            if col in formula_templates:
                ws.cell(row=target, column=col).value = formula_templates[col].format(row=target)
            else:
                ws.cell(row=target, column=col).value = val
```
---
## Pattern: Group-Merge (Aggregate by Key)
Group rows by a key column. Take first-row values for some columns, sum for others.
```python
from collections import OrderedDict

def group_merge_rows(ws, key_col, start_row, end_row, first_cols, sum_cols):
    """
    Group by key_col, merge rows.
    first_cols: take value from first row in group
    sum_cols: sum values across group
    """
    groups = OrderedDict()
    for row in range(start_row, end_row + 1):
        key = normalize_cell_value(ws.cell(row=row, column=key_col).value)
        if key is None:
            continue
        if key not in groups:
            groups[key] = {
                'first': {c: ws.cell(row=row, column=c).value for c in first_cols},
                'sums': {c: 0.0 for c in sum_cols},
            }
        for c in sum_cols:
            v = normalize_cell_value(ws.cell(row=row, column=c).value)
            if v is not None:
                try:
                    groups[key]['sums'][c] += float(v)
                except (ValueError, TypeError):
                    pass
    return groups
```
---
## Pattern: Group-Max-Keep-Ties
Group by key, find max value per group, keep ALL rows that match the max (not just the first).
```python
from collections import defaultdict

def group_max_keep_ties(rows, key_func, value_func, filter_null=True):
    """
    Keep all rows with the maximum value per group (ties preserved).
    rows: list of row dicts or tuples
    key_func: row → group key
    value_func: row → comparable value (e.g., date)
    """
    groups = defaultdict(list)
    for row in rows:
        val = value_func(row)
        if filter_null and val is None:
            continue
        groups[key_func(row)].append(row)
    kept = []
    for key, group in groups.items():
        max_val = max(value_func(r) for r in group)
        kept.extend(r for r in group if value_func(r) == max_val)
    return kept
```
---
## Pattern: Sequence Fill (Smart Numbering)
Fill blank rows with "parent number + letter suffix" (e.g., 5 → 5a, 5b, ..., 5z, 5aa).
```python
import re

def get_letter_suffix(n):
    """0=a, 25=z, 26=aa, 27=ab... (valid up to n=701, i.e. 'zz')"""
    if n < 26:
        return chr(ord('a') + n)
    return chr(ord('a') + (n // 26) - 1) + chr(ord('a') + (n % 26))

def fill_sequential_labels(ws, col, start_row, end_row):
    last_base, blank_count = None, 0
    for row in range(start_row, end_row + 1):
        val = ws.cell(row=row, column=col).value
        if val is not None:
            m = re.match(r'^(\d+)', str(val))
            if m:
                last_base = m.group(1)
                blank_count = 0
        else:
            if last_base is not None:
                ws.cell(row=row, column=col).value = f"{last_base}{get_letter_suffix(blank_count)}"
                blank_count += 1
```
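The two-letter suffix logic stops being correct once a parent row has more than 702 blank children. If longer runs are possible, a bijective base-26 conversion handles any length. This is a sketch of an alternative, not the skill's own implementation, and `letter_suffix` is a hypothetical name:

```python
def letter_suffix(n):
    """0 -> 'a', 25 -> 'z', 26 -> 'aa', 701 -> 'zz', 702 -> 'aaa', ..."""
    letters = []
    n += 1  # switch to 1-based bijective base 26
    while n > 0:
        n, rem = divmod(n - 1, 26)
        letters.append(chr(ord("a") + rem))
    return "".join(reversed(letters))


print([letter_suffix(i) for i in (0, 25, 26, 27, 701, 702)])
```

It produces the same output as the two-letter version for the first 702 values, so it can be swapped in without changing existing labels.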
---
## Pattern: Zero-as-Blank Output
When merged/aggregated values of 0 should display as empty:
```python
# Method 1: Write None (best for programmatic verification)
cell.value = computed_value if computed_value != 0 else None
# Method 2: Number format (best for Excel viewing)
cell.value = computed_value
cell.number_format = '0.00;-0.00;""' # positive;negative;zero(blank)
```
---
## Pattern: Side-by-Side Table Detection
Some sheets contain multiple independent tables arranged horizontally (separated by empty columns).
```python
def detect_side_by_side_tables(ws):
    """Find column groups separated by all-null columns."""
    tables = []
    current_start = None
    for col in range(1, ws.max_column + 1):
        has_data = any(ws.cell(row=r, column=col).value is not None
                       for r in range(1, ws.max_row + 1))
        if has_data and current_start is None:
            current_start = col
        elif not has_data and current_start is not None:
            tables.append((current_start, col - 1))
            current_start = None
    if current_start:
        tables.append((current_start, ws.max_column))
    return tables  # [(start_col, end_col), ...]
```

195
skills/xlsx/scenes/edit.md Executable file

@@ -0,0 +1,195 @@
# Scene: Edit Existing Spreadsheet
## When This Applies
User provides an existing .xlsx/.xlsm file and wants to modify it — fill data, fix formulas, beautify layout, add sheets, restructure.
## Core Principle: Preserve First
**Study the existing file before making ANY changes.** The original format, style, and conventions take absolute priority over default guidelines.
### VBA Preservation Rule
When opening `.xlsm` files, **always** use `keep_vba=True`:
```python
wb = load_workbook('file.xlsm', keep_vba=True)
# Edit data/formatting as usual
wb.save('output.xlsm') # VBA modules preserved
```
**Never** save a `.xlsm` as `.xlsx` unless the user explicitly requests macro removal. This silently destroys all VBA code.
## Workflow
```
1. INSPECT → Read the file, understand structure
2. PLAN → Identify what to change vs what to preserve
3. BACKUP → If destructive changes, suggest user keeps original
4. MODIFY → Make targeted changes
5. QA → recalc → audit → scan
6. VALIDATE → validate → deliver
```
## Step 1: Inspect the File
### 1a. Structure Survey
```python
from openpyxl import load_workbook
# Read with formulas preserved
wb = load_workbook('input.xlsx')
# Survey structure
for name in wb.sheetnames:
    ws = wb[name]
    print(f"Sheet: {name}, Dimensions: {ws.dimensions}, "
          f"Rows: {ws.max_row}, Cols: {ws.max_column}")
    # Check for existing styles
    sample = ws['B4']
    print(f"Font: {sample.font.name}, Size: {sample.font.size}, "
          f"Bold: {sample.font.bold}, Fill: {sample.fill.fgColor}")
```
Also run `python3 "$XLSX_SKILL_DIR/xlsx.py" inspect input.xlsx --pretty` for structured overview.
### 1b. Semantic Data Sampling (MANDATORY for merge/copy/aggregate operations)
**Don't just print headers — print actual data rows to understand column semantics:**
```python
from openpyxl.utils import get_column_letter

# Sample the first 5 rows from each sheet
for name in wb.sheetnames:
    ws = wb[name]
    print(f"\n=== {name} ===")
    for row in range(1, min(6, ws.max_row + 1)):
        vals = []
        for col in range(1, ws.max_column + 1):
            v = ws.cell(row=row, column=col).value
            if v is not None:
                vals.append(f"{get_column_letter(col)}={v}")
        if vals:
            print(f"  Row {row}: {vals}")
```
### 1c. Cross-Sheet Column Semantic Mapping (MANDATORY before any merge/copy)
**⚠️ NEVER copy columns by position index alone when merging sheets.**
When two sheets have similar headers (e.g., both have columns A-V), the same column position may hold completely different data. Always:
1. Print sample data (not just headers) from both source and target sheets
2. For each column, identify the data type and value domain
3. Create an explicit column mapping dict before writing any data
```python
# Example: source sheet E column = amount, target sheet E column = type code
# → Do NOT copy source.E → target.E. Build semantic mapping first.
column_mapping = {
'src_I': 'dst_E', # amount → amount (different positions!)
'src_E': 'dst_I', # type → type
}
```
### 1d. Cell Value Normalization
Canonical implementation lives in **`templates/base.py → normalize_cell_value()`**.
Referenced by `edit-patterns.md` and `quality/pipeline.md`.
```python
from base import normalize_cell_value
# normalize_cell_value(value) → None for blank/NBSP/ZWSP, otherwise original value
```
**Always use this when checking for empty cells**: `\xa0` (NBSP) looks blank but fails an `is None` check.
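The canonical function lives in `templates/base.py` and is not reproduced here. Purely to illustrate the behavior described above, a rough approximation might look like the following; `normalize_cell_value_sketch` is a hypothetical stand-in, and the exact set of characters the real helper treats as blank is an assumption.

```python
# Characters treated as invisible in this sketch: NBSP, zero-width space, BOM
_INVISIBLE = dict.fromkeys(map(ord, "\u00a0\u200b\ufeff"))


def normalize_cell_value_sketch(value):
    """Return None for blank-looking strings, otherwise the original value."""
    if isinstance(value, str) and value.translate(_INVISIBLE).strip() == "":
        return None
    return value


print(normalize_cell_value_sketch("\u200b \xa0"))
```

Non-string values such as `0` or `False` pass through unchanged, so a legitimate zero is never mistaken for an empty cell.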
## Step 2: Match Existing Styles
When adding new cells/rows to a styled file, use **`copy_style()` from `templates/base.py`**:
```python
from base import copy_style
# copy_style(source_cell, target_cell)
# → copies font, fill, border, alignment, number_format
```
## Common Edit Operations
### Fill / Complete Data
```python
# Add data to empty cells while preserving existing formatting
for row in range(start, end + 1):
    cell = ws.cell(row=row, column=col)
    if cell.value is None:
        cell.value = new_value
        # Copy style from the cell above
        copy_style(ws.cell(row=row-1, column=col), cell)
```
### Insert Rows / Columns
```python
# Insert 3 rows at position 10
ws.insert_rows(10, amount=3)
# Note: openpyxl does NOT adjust formulas, charts, or named ranges that
# reference the shifted rows; rewrite those references yourself
# Insert column at position D
ws.insert_cols(4)
```
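**Warning**: openpyxl does not manage dependencies when rows or columns are inserted or deleted, so formulas, chart references, and named ranges keep pointing at the old addresses. Verify and rewrite them after the change.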
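Formula strings can be patched after an insertion with a small rewrite pass. The regex-based helper below is a rough sketch under stated assumptions: `shift_row_refs` is a hypothetical name, every reference in the formula is assumed to point at the sheet where rows were inserted, and quoted text, sheet names that resemble cell addresses, and whole-column references such as `A:A` are not handled.

```python
import re

# A1-style reference: optional $, 1-3 column letters, optional $, row digits.
# The lookarounds skip function names such as LOG10( and longer identifiers.
_REF = re.compile(r"(?<![A-Za-z0-9_])(\$?[A-Z]{1,3})(\$?)(\d+)(?![\d(])")


def shift_row_refs(formula, insert_at, amount):
    """Shift row numbers >= insert_at by amount, mirroring insert_rows()."""
    def bump(match):
        col, dollar, row = match.group(1), match.group(2), int(match.group(3))
        if row >= insert_at:
            row += amount
        return f"{col}{dollar}{row}"
    return _REF.sub(bump, formula)


# After ws.insert_rows(10, amount=3):
print(shift_row_refs("=SUM(B2:B20)+C5", insert_at=10, amount=3))
```

Apply it to every cell whose value is a string starting with `=`, then recalculate and audit the workbook as usual.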
### Restructure Data
```python
# Move data from one layout to another
# Read all data first, then rewrite
data = []
for row in ws.iter_rows(min_row=2, values_only=True):
    data.append(row)
# Clear and rewrite in new structure
# ...
```
### Fix Formulas
```python
# Find cells with errors (after recalc)
wb_data = load_workbook('input.xlsx', data_only=True)
ws_data = wb_data.active
wb_formula = load_workbook('input.xlsx')
ws_formula = wb_formula.active
for row in ws_data.iter_rows():
    for cell in row:
        if isinstance(cell.value, str) and cell.value.startswith('#'):
            formula_cell = ws_formula[cell.coordinate]
            print(f"Error at {cell.coordinate}: {cell.value}, Formula: {formula_cell.value}")
```
## Format Beautification
When the user asks to "make it look better" or "format nicely":
**Load `engines/design.md`** and apply its complete styling system (tokens, fonts, layout, colors).
**But**: if the file already has a consistent style, enhance it rather than replacing it. Add what's missing (alignment, column widths, alternating fills) without changing existing colors or fonts. Use `copy_style()` (above) to match adjacent cells.
## ⚠️ Dangerous Operations
| Operation | Risk | Mitigation |
|-----------|------|-----------|
| `load_workbook(data_only=True)` then save | Formulas permanently lost | Never save after data_only read |
| Delete rows/cols with formula dependencies | #REF! errors | Run audit after deletion |
| Modify pivot table output with openpyxl | Corrupt pivotCache | Never — regenerate via xlsx.py pivot |
| Overwrite merged cells | Layout breaks | Check `ws.merged_cells.ranges` first |
| Manual row sort (swap row data) | Formulas still reference old row numbers | **Regenerate formula strings with target row number** (see Common Patterns → Sort with Formula Rewrite) |
| Write SUM formula → verify with data_only | Get `None` — formula not evaluated | Compute value in Python for verification; write computed value or use recalc |
---
## Common Patterns
For complex edit operations (grouping, sorting, block detection, merging, sequence fill, etc.):
**Load `scenes/edit-patterns.md`** on demand.
Available patterns: Block Detection, Pre-filter Null, Sort with Formula Rewrite, Group-Merge, Group-Max-Keep-Ties, Sequence Fill, Zero-as-Blank, Side-by-Side Table Detection.

318
skills/xlsx/scenes/finance.md Executable file

@@ -0,0 +1,318 @@
# Financial Model Specialist Guide
Load this reference when the task involves: financial statements, budgets, forecasts, DCF models, LBO, valuation, P&L, balance sheets, cash flow, or any investment banking deliverable.
Also load `engines/design.md` → use **Finance** scene overrides (IB text color rules, section dividers).
---
## Financial Model Architecture
### Standard Sheet Structure
```
Assumptions Sheet:
- All inputs, growth rates, margins, multiples
- Blue font for every changeable number
- Yellow background for key assumptions
- Source citations in adjacent cells or comments
Income Statement / P&L:
- Revenue → COGS → Gross Profit → OpEx → EBIT → Interest → Tax → Net Income
- All values are formulas referencing Assumptions
Balance Sheet:
- Assets = Liabilities + Equity (must balance!)
- Include balance check row: =Assets-Liabilities-Equity (should be 0)
Cash Flow Statement:
- Operating → Investing → Financing → Net Change
- Ending Cash = Beginning Cash + Net Change
Valuation / Output:
- DCF, comparables, or whatever model the user needs
- Green font for values pulled from other sheets
```
### Formula Construction Rules
```python
# ✅ CORRECT: Reference assumptions
sheet['C10'] = '=C9*(1+Assumptions!$B$5)' # Growth rate from assumptions
# ❌ WRONG: Hardcoded magic number
sheet['C10'] = '=C9*1.05'
# ✅ CORRECT: Protected division
sheet['D15'] = '=IF(C15=0,"-",B15/C15)'
# ✅ CORRECT: Consistent formula across periods
# If D10 = '=D9*(1+Assumptions!$B$5)' then E10 must follow the same pattern
```
### Assumptions Sheet Layout
```
B4: "Key Assumptions" (section header, bold)
B6: "Revenue Growth Rate" C6: 0.05 (blue font, yellow bg)
B7: "Gross Margin" C7: 0.65 (blue font, yellow bg)
B8: "OpEx as % Revenue" C8: 0.30 (blue font, yellow bg)
B9: "Tax Rate" C9: 0.21 (blue font, yellow bg)
B10: "Discount Rate (WACC)" C10: 0.10 (blue font, yellow bg)
B11: "Terminal Growth Rate" C11: 0.02 (blue font, yellow bg)
```
### Source Documentation for Hardcodes
Every hardcoded input MUST have a source citation:
```python
from openpyxl.comments import Comment
from openpyxl.styles import Font

# In cell comment
ws['C6'].comment = Comment(
"Source: Company 10-K, FY2024, Page 45, Revenue Growth",
"Z.ai"
)
# Or in adjacent cell (if end of table)
ws['D6'] = "Source: Management guidance, Q3 2024 earnings call"
ws['D6'].font = Font(size=8, italic=True, color="808080")
```
---
## Number Formatting (CRITICAL)
> Finance-specific formats below. For general number formats, see `engines/design.md §10`.
> Finance formats take priority when both apply.
```python
FINANCE_FORMATS = {
# Currency — zeros as dash, negatives in parentheses
'currency': '$#,##0;($#,##0);"-"',
'currency_k': '$#,##0,"K";($#,##0,"K");"-"',
'currency_mm': '$#,##0.0,,"M";($#,##0.0,,"M");"-"',
# Percentages — one decimal
'pct': '0.0%;(0.0%);"-"',
# Multiples — for EV/EBITDA, P/E etc.
'multiple': '0.0"x";(0.0"x");"-"',
# Years — MUST be text, not number (avoids "2,024")
'year': '@',
# Integer with thousands separator
'integer': '#,##0;(#,##0);"-"',
# Two decimal places
'decimal': '#,##0.00;(#,##0.00);"-"',
# Shares (millions)
'shares': '#,##0.0,,"M"',
}
# Apply
cell.number_format = FINANCE_FORMATS['currency_mm']
```
**Always specify units in column headers**: "Revenue ($mm)", "Shares (M)", "Growth (%)"
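Each format string above has up to three `;`-separated sections: positive values, negative values, and zero. Each trailing comma in a section scales the displayed number by 1,000. As a rough illustration of what the reader ends up seeing, the sketch below mimics that display logic in plain Python. `render_finance` is a hypothetical helper, not an Excel or openpyxl API, and it ignores edge cases such as how Excel rounds exact halves.

```python
def render_finance(value: float, scale: float = 1.0, decimals: int = 0,
                   prefix: str = "$", suffix: str = "") -> str:
    """Approximate how a 'positive;(negative);"-"' format displays a value."""
    if value == 0:
        return "-"  # third section: zeros shown as a dash
    body = f"{prefix}{abs(value) / scale:,.{decimals}f}{suffix}"
    # second section: negatives wrapped in parentheses instead of a minus sign
    return f"({body})" if value < 0 else body


print(render_finance(1234))                                          # like 'currency'
print(render_finance(-2_500_000, scale=1e6, decimals=1, suffix="M"))  # like 'currency_mm'
print(render_finance(0))
```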
---
## IB Model Layout Rules
> All colors below use **design tokens from `engines/design.md`**. Do not hardcode hex values.
> Finance-specific overrides (IB text color rules, section dividers) are in `design.md §2.4`.
### Section Headers
```python
# Dark background, white bold text, merged across data width
# Uses PRIMARY from design.md (or Finance palette PRIMARY from design.md)
ws.merge_cells('B10:H10')
ws['B10'] = 'Income Statement'
ws['B10'].fill = PatternFill('solid', fgColor=PRIMARY)
ws['B10'].font = Font(name=FONT_NAME, size=12, bold=HEADER_BOLD, color='FFFFFF')
```
### Data Alignment
- Column labels (years, quarters): **right-aligned**
- Row labels (line items): **left-aligned**
- Submetrics: **indented** (add 2-3 spaces prefix)
```python
# Parent line item
ws['B12'] = 'Revenue'
ws['B12'].font = Font(name=FONT_NAME, bold=HEADER_BOLD)
# Sub line item (indented)
ws['B13'] = ' Product Revenue'
ws['B14'] = ' Service Revenue'
```
### Totals Formatting
```python
# Uses design tokens — see engines/design.md §6.3
total_border = Border(top=Side(style='thin', color=PRIMARY))
for col in range(3, 9):  # C through H
    cell = ws.cell(row=total_row, column=col)
    cell.font = Font(name=FONT_NAME, bold=HEADER_BOLD)
    cell.border = total_border
```
### Grid Lines
```python
ws.sheet_view.showGridLines = False # Standard — defined in design.md §7.3
```
---
## Balance Check Pattern
For any financial model with a balance sheet:
```python
from openpyxl.formatting.rule import CellIsRule
from openpyxl.styles import Font
from openpyxl.utils import get_column_letter

# Balance check row (should always be 0)
check_row = bs_end + 2
ws.cell(row=check_row, column=2, value='Balance Check')
for col in range(3, last_col + 1):
    letter = get_column_letter(col)
    ws.cell(row=check_row, column=col).value = (
        f'={letter}{assets_total_row}-{letter}{liab_total_row}-{letter}{equity_total_row}'
    )
    # Conditional: red if not zero
    ws.conditional_formatting.add(
        f'{letter}{check_row}',
        CellIsRule(operator='notEqual', formula=['0'],
                   font=Font(color='FF0000', bold=True))
    )
```
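One caveat worth hedging against: when the statements contain fractional values, floating-point residue (something like `1E-12`) could make a `notEqual 0` rule fire on a model that actually balances. A possible variant, assuming two-decimal precision is acceptable for the model, wraps the check in `ROUND`. `balance_check_formula` is a hypothetical helper and the row numbers are placeholders.

```python
def balance_check_formula(letter: str, assets_row: int, liab_row: int,
                          equity_row: int, digits: int = 2) -> str:
    """Return a balance-check formula rounded to `digits` decimals."""
    raw = f"{letter}{assets_row}-{letter}{liab_row}-{letter}{equity_row}"
    return f"=ROUND({raw},{digits})"


# Placeholder rows: total assets in 20, liabilities in 30, equity in 38
formula = balance_check_formula("C", 20, 30, 38)
print(formula)
```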
---
## Sensitivity / Scenario Tables
```python
# Two-way data table: vary growth rate (rows) × discount rate (cols)
# Row headers: growth rates
growth_rates = [0.02, 0.03, 0.04, 0.05, 0.06]
# Col headers: discount rates
discount_rates = [0.08, 0.09, 0.10, 0.11, 0.12]
# Write headers
for i, g in enumerate(growth_rates):
    ws.cell(row=start_row + i + 1, column=start_col, value=g)
    ws.cell(row=start_row + i + 1, column=start_col).number_format = '0.0%'
    ws.cell(row=start_row + i + 1, column=start_col).font = Font(color='0000FF')
for j, d in enumerate(discount_rates):
    ws.cell(row=start_row, column=start_col + j + 1, value=d)
    ws.cell(row=start_row, column=start_col + j + 1).number_format = '0.0%'
    ws.cell(row=start_row, column=start_col + j + 1).font = Font(color='0000FF')
# Fill formulas for each combination
# Yellow background for the cell matching base case assumptions
```
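The block above leaves the body of the table open. One way to sanity-check the grid before writing formulas is to compute the values in Python first. The sketch assumes the output metric is a Gordon-growth terminal value, `TV = FCF * (1 + g) / (r - g)`; the source does not say which metric the table should show, and `terminal_value`, `sensitivity_grid`, and the cash-flow figure of 100 are illustrative only.

```python
def terminal_value(fcf: float, growth: float, discount: float) -> float:
    """Gordon-growth terminal value; only defined when discount > growth."""
    if discount <= growth:
        raise ValueError("discount rate must exceed growth rate")
    return fcf * (1 + growth) / (discount - growth)


def sensitivity_grid(fcf: float, growth_rates: list, discount_rates: list) -> dict:
    """Return {(growth, discount): value} for every combination."""
    return {
        (g, d): terminal_value(fcf, g, d)
        for g in growth_rates
        for d in discount_rates
    }


growth_rates = [0.02, 0.03, 0.04, 0.05, 0.06]
discount_rates = [0.08, 0.09, 0.10, 0.11, 0.12]
grid = sensitivity_grid(100.0, growth_rates, discount_rates)
print(round(grid[(0.02, 0.10)], 2))
```

In the delivered workbook the cells should still hold formulas rather than these computed numbers, so the table keeps responding to changes in the assumptions.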
---
## Projection Period Patterns
```python
# Historical + Projected columns
years = ['FY2022', 'FY2023', 'FY2024', 'FY2025E', 'FY2026E', 'FY2027E']
for i, year in enumerate(years):
    col = start_col + i
    cell = ws.cell(row=header_row, column=col, value=year)
    cell.font = Font(name=FONT_NAME, bold=HEADER_BOLD)
    cell.alignment = Alignment(horizontal='center')
    # Visual separator between historical and projected
    # (i > 0 keeps years[i - 1] from wrapping around to the last entry)
    if i > 0 and year.endswith('E') and not years[i - 1].endswith('E'):
        # Add left border to mark transition
        for row in range(header_row, last_row + 1):
            ws.cell(row=row, column=col).border = Border(
                left=Side(style='medium', color=PRIMARY))
```
---
## Additional Model Templates
### Template: P&L (Profit & Loss) Statement
```
Sheet: "P&L"
Row 1: Company Name + Period
Row 3: Headers (Month/Quarter columns)
Revenue Section:
Product Revenue =Assumptions!B5 * (1+Assumptions!C5)
Service Revenue =Assumptions!B6 * (1+Assumptions!C6)
Total Revenue =SUM(above)
COGS Section:
Direct Costs =Total_Revenue * (1 - Assumptions!gross_margin)
Gross Profit =Total_Revenue - Direct_Costs
Gross Margin % =IFERROR(Gross_Profit/Total_Revenue, 0)
OpEx Section:
S&M, R&D, G&A (each from Assumptions)
Total OpEx =SUM(S&M:G&A)
EBITDA =Gross_Profit - Total_OpEx
EBITDA Margin % =IFERROR(EBITDA/Total_Revenue, 0)
Below the Line:
D&A, Interest, Tax
Net Income =EBITDA - D&A - Interest - Tax
```
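As a worked check of the chain above, the same arithmetic can be run in plain Python. This is a sketch under stated assumptions: direct costs are taken as revenue times one minus the gross margin, OpEx is modeled as a single percentage of revenue, and no tax credit is booked on losses. `simple_pnl` and the input figures are hypothetical.

```python
def simple_pnl(revenue: float, gross_margin: float, opex_pct: float,
               tax_rate: float, d_and_a: float = 0.0, interest: float = 0.0) -> dict:
    """Walk revenue down to net income using ratio assumptions."""
    direct_costs = revenue * (1 - gross_margin)
    gross_profit = revenue - direct_costs
    total_opex = revenue * opex_pct
    ebitda = gross_profit - total_opex
    pre_tax = ebitda - d_and_a - interest
    tax = max(pre_tax, 0.0) * tax_rate  # assumption: losses carry no tax credit
    return {
        "gross_profit": gross_profit,
        "ebitda": ebitda,
        "tax": tax,
        "net_income": ebitda - d_and_a - interest - tax,
    }


result = simple_pnl(1000.0, gross_margin=0.65, opex_pct=0.30, tax_rate=0.21,
                    d_and_a=50.0, interest=20.0)
for name, amount in result.items():
    print(f"{name}: {amount:,.1f}")
```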
### Template: Budget vs Actual
```
Sheet: "Budget vs Actual"
Columns: Category | Budget | Actual | Variance | Var %
Key formulas:
Variance = =Actual - Budget
Var % = =IFERROR(Variance/Budget, 0)
Conditional formatting:
Var % > 0 → Green font (favorable)
Var % < -10% → Red font + red fill (unfavorable)
Var % -10~0 → Orange font (watch)
Summary section:
Total Budget =SUM(Budget range)
Total Actual =SUM(Actual range)
Overall Var % =IFERROR((Total_Actual-Total_Budget)/Total_Budget, 0)
```
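The variance formulas and the three conditional-formatting bands can be checked with a few lines of Python. A minimal sketch: `variance_report` is a hypothetical helper, the zero-budget fallback mirrors `IFERROR(..., 0)`, and treating a variance of exactly 0 as "watch" is an assumption, since the template does not assign that case to a band.

```python
def variance_report(budget: float, actual: float) -> dict:
    """Compute variance, variance % and the formatting band for one row."""
    variance = actual - budget
    var_pct = variance / budget if budget else 0.0  # mirrors IFERROR(..., 0)
    if var_pct > 0:
        band = "favorable"      # green font
    elif var_pct < -0.10:
        band = "unfavorable"    # red font + red fill
    else:
        band = "watch"          # orange font (-10% up to and including 0)
    return {"variance": variance, "var_pct": var_pct, "band": band}


for budget, actual in [(100, 120), (100, 95), (100, 85), (0, 50)]:
    print(budget, actual, variance_report(budget, actual))
```

Whether a positive variance is really favorable depends on the line item (revenue versus cost), so the bands may need to be flipped for expense rows.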
### Template: SaaS Metrics Dashboard
```
Sheet: "SaaS Metrics"
KPIs (each with formula, not hardcoded):
MRR =SUMPRODUCT(Users * ARPU)
ARR =MRR * 12
Net Revenue Retention = =IFERROR((Starting_MRR + Expansion - Contraction - Churn) / Starting_MRR, 0)
CAC =IFERROR(Total_S&M / New_Customers, 0)
LTV =IFERROR(ARPU * Gross_Margin / Monthly_Churn_Rate, 0)
LTV:CAC Ratio =IFERROR(LTV / CAC, 0)
Payback Months =IFERROR(CAC / (ARPU * Gross_Margin), 0)
Chart: MRR waterfall (starting → new → expansion → contraction → churn → ending)
Chart: LTV:CAC trend line
```
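To make the unit-economics formulas concrete, here is a small worked example in Python. It is a sketch only: `safe_div` stands in for `IFERROR(..., 0)`, `saas_unit_economics` is a hypothetical helper, and the input figures are invented for illustration.

```python
def safe_div(numerator: float, denominator: float) -> float:
    """Mirror IFERROR(numerator / denominator, 0)."""
    return numerator / denominator if denominator else 0.0


def saas_unit_economics(arpu: float, gross_margin: float, monthly_churn: float,
                        total_sm: float, new_customers: int) -> dict:
    """Compute CAC, LTV, LTV:CAC and payback from the template's formulas."""
    monthly_margin = arpu * gross_margin
    cac = safe_div(total_sm, new_customers)
    ltv = safe_div(monthly_margin, monthly_churn)
    return {
        "cac": cac,
        "ltv": ltv,
        "ltv_to_cac": safe_div(ltv, cac),
        "payback_months": safe_div(cac, monthly_margin),
    }


metrics = saas_unit_economics(arpu=100.0, gross_margin=0.8, monthly_churn=0.02,
                              total_sm=30_000.0, new_customers=30)
for name, value in metrics.items():
    print(f"{name}: {value:,.2f}")
```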
### Template: Project Budget Tracker
```
Sheet: "Project Budget"
Columns: Phase | Task | Planned Cost | Actual Cost | Remaining | % Spent | Status
Key formulas:
Remaining = =Planned - Actual
% Spent = =IFERROR(Actual/Planned, 0)
Status = =IF(% Spent>1, "Over Budget", IF(% Spent>0.9, "At Risk", "On Track"))
Phase subtotals with SUBTOTAL function
Grand total row with project-level health indicator
```


@@ -0,0 +1,192 @@
# Finance Lite — Simple Budget & Expense Guide
Load this reference for: simple budgets, expense reports, fee tracking, cost summaries, revenue/expense comparison, personal finance, project cost tracking — any financial table that does **NOT** need DCF, LBO, three-statement linkage, sensitivity analysis, or IB-grade formatting.
For complex financial models → use `scenes/finance.md` instead.
Also load `engines/design.md` for styling (use **standard** design tokens, NOT IB overrides).
---
## When to Use finance_lite vs finance
| Signal | finance_lite ✅ | finance.md ❌ |
|--------|----------------|--------------|
| 预算表 / budget | ✅ | |
| 费用报表 / expense report | ✅ | |
| 项目成本追踪 / project cost tracking | ✅ | |
| 收支对比 / revenue vs cost | ✅ | |
| 个人记账 / personal finance | ✅ | |
| 简单 ROI 计算 / simple ROI calculation | ✅ | |
| DCF / LBO / 估值模型 (valuation model) | | ✅ |
| 三表联动 (P&L + BS + CF) | | ✅ |
| 敏感性分析 / scenario table | | ✅ |
| IB pitch book level formatting | | ✅ |
---
## Standard Sheet Structure
```
Sheet: "Budget" (or user-specified name)
Row 1: margin (whitespace)
Row 2: Title (merged, styled via setup_sheet())
Row 3: spacer
Row 4: Headers
Row 5+: Data rows
Last row: Totals (if applicable)
```
### Typical Column Patterns
**Budget Table:**
```
Category (类别) | Budget Amount (预算金额) | Actual Amount (实际金额) | Variance (差异) | Variance Rate (差异率) | Notes (备注)
```
**Expense Report:**
```
Date (日期) | Category (类别) | Description (说明) | Amount (金额) | Claimant (报销人) | Status (状态)
```
**Revenue vs Cost:**
```
Month (月份) | Revenue (收入) | Cost (成本) | Gross Profit (毛利) | Gross Margin (毛利率)
```
**Project Cost:**
```
Phase (阶段) | Task (任务) | Budget (预算) | Used (已用) | Remaining (剩余) | Usage Rate (使用率) | Status (状态)
```
---
## Formula Patterns
```python
# r is the current row number; each formula is built as an f-string
# Variance
cell.value = f'=C{r}-B{r}'  # Actual - Budget
# Variance percentage (safe division)
cell.value = f'=IFERROR((C{r}-B{r})/B{r},0)'
# Running total
cell.value = f'=SUM(D$5:D{r})'
# Gross margin
cell.value = f'=IFERROR((B{r}-C{r})/B{r},0)'
# Status formula (simple threshold)
cell.value = f'=IF(F{r}>1,"Over Budget",IF(F{r}>0.9,"At Risk","On Track"))'
# Subtotal (start/end are the first and last row of the group)
cell.value = f'=SUBTOTAL(9,D{start}:D{end})'
# Grand total
cell.value = f'=SUM(D5:D{last_data_row})'
```
---
## Number Formats
Use standard formats from `templates/base.py`:
```python
from templates.base import FORMATS
cell.number_format = FORMATS['currency_cny'] # ¥#,##0.00
cell.number_format = FORMATS['percentage'] # 0.0%
cell.number_format = FORMATS['integer'] # #,##0
cell.number_format = FORMATS['date'] # YYYY-MM-DD
```
For budget-specific formatting (negatives in parentheses):
```python
BUDGET_FORMATS = {
'currency': '¥#,##0.00;(¥#,##0.00);"-"',
'variance': '#,##0.00;(#,##0.00);"-"',
'var_pct': '0.0%;(0.0%);"-"',
}
```
---
## Styling
Use **standard** design tokens (NOT IB overrides):
```python
from templates.base import (
setup_sheet, style_header_row, style_data_row, style_total_row,
FONT_NAME, HEADER_BOLD, PRIMARY, ACCENT_POSITIVE, ACCENT_NEGATIVE, ACCENT_WARNING,
font_body, font_header, fill_header,
)
# Setup
setup_sheet(ws, title="2026年部门预算", last_col=7)
# Headers at row 4
style_header_row(ws, row_num=4, col_start=2, col_end=7)
# Data rows
for i, row_num in enumerate(range(5, last_row + 1)):
    style_data_row(ws, row_num=row_num, col_start=2, col_end=7, row_index=i)
# Totals
style_total_row(ws, row_num=last_row + 1, col_start=2, col_end=7)
```
---
## Conditional Formatting (Simple)
```python
from openpyxl.formatting.rule import CellIsRule
from templates.base import CF_POSITIVE_FONT, CF_POSITIVE_FILL, CF_NEGATIVE_FONT, CF_NEGATIVE_FILL
# Highlight positive variance (green)
ws.conditional_formatting.add(
f'D5:D{last_row}',
CellIsRule(operator='greaterThan', formula=['0'],
font=CF_POSITIVE_FONT, fill=CF_POSITIVE_FILL)
)
# Highlight negative variance (red)
ws.conditional_formatting.add(
f'D5:D{last_row}',
CellIsRule(operator='lessThan', formula=['0'],
font=CF_NEGATIVE_FONT, fill=CF_NEGATIVE_FILL)
)
```
---
## Quick Templates
### Template: Monthly Budget
```python
headers = ["类别", "预算金额", "实际金额", "差异", "差异率", "状态"]
# Variance = Actual - Budget
# Var% = IFERROR((Actual-Budget)/Budget, 0)
# Status = IF(Var%>0.1,"超支"(Over Budget),IF(Var%>0,"注意"(Watch),"正常"(Normal)))
```
### Template: Expense Report
```python
headers = ["日期", "类别", "说明", "金额", "报销人", "状态"]
# Date format: YYYY-MM-DD
# Amount: currency_cny
# Status: dropdown validation ["待审批"(Pending),"已审批"(Approved),"已报销"(Reimbursed),"已拒绝"(Rejected)]
```
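The dropdown mentioned in the Status comment can be built with openpyxl's `DataValidation`. A minimal sketch, assuming the six headers start in column B so that Status lands in column G, and that data occupies rows 5 to 100; both are placeholders to adapt to the real sheet.

```python
from openpyxl import Workbook
from openpyxl.worksheet.datavalidation import DataValidation

wb = Workbook()
ws = wb.active

# Pending, Approved, Reimbursed, Rejected
statuses = ["待审批", "已审批", "已报销", "已拒绝"]

# An inline list is passed as one quoted, comma-separated string
dv = DataValidation(
    type="list",
    formula1='"' + ",".join(statuses) + '"',
    allow_blank=True,
)
ws.add_data_validation(dv)
dv.add("G5:G100")  # placeholder range for the Status column
```

Inline lists are convenient for a handful of short options; for longer lists, pointing `formula1` at a range on a helper sheet is the usual alternative.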
### Template: Project Cost Tracker
```python
headers = ["阶段", "任务", "预算", "已用", "剩余", "使用率", "状态"]
# Remaining = Budget - Used
# Usage% = IFERROR(Used/Budget, 0)
# Status = IF(Usage%>1,"超支"(Over Budget),IF(Usage%>0.9,"预警"(Warning),"正常"(Normal)))
```

298
skills/xlsx/scenes/vba.md Executable file

@@ -0,0 +1,298 @@
# VBA — Macro Generation & Management Guide
Load this reference when the task involves: creating Excel macros, writing VBA code, automating Excel workflows, adding buttons/forms, modifying existing macros, or any `.xlsm` deliverable that needs programmatic automation.
Also load `engines/vba-templates.md` for ready-to-use code templates.
---
## Core Principles
### 1. Safety First
- **Never** generate VBA that deletes files, accesses filesystem outside the workbook, or sends data to external URLs without explicit user request
- **Always** include error handling (`On Error GoTo`)
- **Always** add `Application.ScreenUpdating` toggle for performance
- Generated macros must be **read-audit-friendly**: clear naming, comments, structured layout
### 2. openpyxl VBA Workflow
openpyxl can read/preserve/inject VBA but **cannot execute** it. The workflow:
```python
# READ existing VBA
from openpyxl import load_workbook
wb = load_workbook('file.xlsm', keep_vba=True)
# wb.vba_archive contains all VBA modules
# CREATE new .xlsm with VBA
from openpyxl import Workbook
wb = Workbook()
# ... build sheets ...
# Inject VBA via vbaProject.bin (see Injection section)
wb.save('output.xlsm')
```
### 3. File Format Rules
| Need | Format | Extension |
|------|--------|-----------|
| Data only, no macros | OpenXML | `.xlsx` |
| Contains VBA macros | Macro-Enabled | `.xlsm` |
| Binary with macros | Binary | `.xlsb` |
**Critical**: If user gives `.xlsx` but wants macros → output must be `.xlsm`. Always warn about format change.
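The format rule can be applied mechanically when choosing the output path. A minimal sketch with a hypothetical helper, `output_path_for`, which returns the path to save to plus a warning message whenever the extension had to change:

```python
from pathlib import Path


def output_path_for(source: str, needs_macros: bool):
    """Return (output_path, warning) for a deliverable.

    warning is None when the extension did not have to change.
    """
    path = Path(source)
    if needs_macros and path.suffix.lower() == ".xlsx":
        target = path.with_suffix(".xlsm")
        warning = (f"Format changed: {path.name} -> {target.name} "
                   "(macros require a macro-enabled workbook)")
        return target, warning
    return path, None


target, warning = output_path_for("report.xlsx", needs_macros=True)
print(target, "|", warning)
```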
---
## VBA Code Structure Standard
Every generated VBA module must follow this structure:
```vba
Option Explicit
' ============================================================
' Module: [ModuleName]
' Purpose: [One-line description]
' Author: Z.ai
' Date: [YYYY-MM-DD]
' ============================================================
' --- Constants ---
Private Const MODULE_NAME As String = "[ModuleName]"
' --- Main Entry Point ---
Public Sub Main()
On Error GoTo ErrHandler
Application.ScreenUpdating = False
Application.Calculation = xlCalculationManual
' [Main logic here]
CleanUp:
Application.ScreenUpdating = True
Application.Calculation = xlCalculationAutomatic
Exit Sub
ErrHandler:
MsgBox "Error in " & MODULE_NAME & ": " & Err.Description, _
vbCritical, "Error"
Resume CleanUp
End Sub
```
### Naming Conventions
| Element | Convention | Example |
|---------|-----------|---------|
| Sub/Function | PascalCase | `GenerateMonthlyReport` |
| Variable | camelCase | `lastRow`, `wsData` |
| Constant | UPPER_SNAKE | `MAX_ROWS`, `REPORT_TITLE` |
| Module | PascalCase | `ModReport`, `ModUtils` |
| Worksheet variable | ws + Name | `wsData`, `wsSummary` |
| Range variable | rng + Desc | `rngData`, `rngHeaders` |
### Variable Declaration Rules
```vba
' Always use explicit types
Dim lastRow As Long ' Not Integer (row limit)
Dim ws As Worksheet
Dim rng As Range
Dim cell As Range
Dim i As Long
Dim strValue As String
Dim dblAmount As Double
```
---
## Common Patterns
### Find Last Row/Column (Robust)
```vba
' Last row with data in column A
Dim lastRow As Long
lastRow = ws.Cells(ws.Rows.Count, "A").End(xlUp).Row
' Last column with data in row 1
Dim lastCol As Long
lastCol = ws.Cells(1, ws.Columns.Count).End(xlToLeft).Column
' Used range (less reliable but useful)
Dim usedRows As Long
usedRows = ws.UsedRange.Rows.Count
```
### Loop Through Data
```vba
' Row loop
Dim i As Long
For i = 2 To lastRow ' Skip header
If ws.Cells(i, 1).Value <> "" Then
' Process row
End If
Next i
' For Each (range)
Dim cell As Range
For Each cell In ws.Range("A2:A" & lastRow)
If Not IsEmpty(cell) Then
' Process cell
End If
Next cell
```
### Sheet Operations
```vba
' Reference sheet safely
Dim ws As Worksheet
On Error Resume Next
Set ws = ThisWorkbook.Sheets("Data")
On Error GoTo 0
If ws Is Nothing Then
MsgBox "Sheet 'Data' not found!", vbExclamation
Exit Sub
End If
' Create sheet if not exists
' (iterate Worksheets, not Sheets: a chart sheet cannot be assigned to a Worksheet variable)
Dim wsNew As Worksheet
Dim wsItem As Worksheet
For Each wsItem In ThisWorkbook.Worksheets
    If wsItem.Name = "Summary" Then
        Set wsNew = wsItem   ' keep a reference to the existing sheet
        Exit For
    End If
Next wsItem
If wsNew Is Nothing Then
    Set wsNew = ThisWorkbook.Worksheets.Add(After:=ThisWorkbook.Sheets(ThisWorkbook.Sheets.Count))
    wsNew.Name = "Summary"
End If
```
### User Interaction
```vba
' Simple input
Dim userInput As String
userInput = InputBox("Enter report month (YYYY-MM):", "Month Selection")
If userInput = "" Then Exit Sub
' Confirmation
If MsgBox("Generate report for " & userInput & "?", _
vbYesNo + vbQuestion, "Confirm") = vbNo Then Exit Sub
' File picker
Dim filePath As Variant
filePath = Application.GetOpenFilename( _
FileFilter:="Excel Files (*.xlsx;*.xlsm),*.xlsx;*.xlsm", _
Title:="Select Source File")
If filePath = False Then Exit Sub
```
---
## VBA Injection via openpyxl
### Method 1: Preserve Existing VBA
```python
# Open with VBA preserved
wb = load_workbook('source.xlsm', keep_vba=True)
# Edit data/formatting as usual
wb.save('output.xlsm') # VBA modules intact
```
### Method 2: Copy VBA from Template
```python
# Use a template .xlsm that already has the VBA you need
import shutil
shutil.copy('template_with_macros.xlsm', 'output.xlsm')
wb = load_workbook('output.xlsm', keep_vba=True)
# Modify data
wb.save('output.xlsm')
```
### Method 3: Manual vbaProject.bin Injection
```python
# For advanced use: inject raw vbaProject.bin
# 1. Create your VBA in Excel, save as .xlsm
# 2. Extract vbaProject.bin from the .xlsm (it's a ZIP)
# 3. Inject into new workbook
import zipfile
import shutil
# Create the workbook first
wb = Workbook()
# ... add data ...
wb.save('temp.xlsx')
# Convert to .xlsm by injecting VBA
shutil.copy('temp.xlsx', 'output.xlsm')
with zipfile.ZipFile('output.xlsm', 'a') as zf:
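    zf.write('vbaProject.bin', 'xl/vbaProject.bin')
# The package is not valid yet: [Content_Types].xml must declare the
# macro-enabled workbook type and the vbaProject part, and
# xl/_rels/workbook.xml.rels needs a relationship pointing at vbaProject.bin.
# (This is fragile; Method 1 or 2 preferred)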
```
**Recommendation**: Method 1 (preserve) and Method 2 (template) are robust. Method 3 is fragile and should be a last resort.
---
## Security Checklist
Before delivering any VBA-enabled file:
- [ ] No filesystem access outside workbook (no `Kill`, `FileCopy`, `MkDir` unless requested)
- [ ] No network calls (`XMLHTTP`, `WinHttpRequest`) unless requested
- [ ] No shell execution (`Shell`, `WScript.Shell`) unless requested
- [ ] No registry access (`CreateObject("WScript.Shell").RegWrite`)
- [ ] No auto-execution (`Auto_Open`, `Workbook_Open`) unless explicitly requested
- [ ] Error handling in every Sub/Function
- [ ] `ScreenUpdating` restored in cleanup
- [ ] All variables explicitly declared (`Option Explicit`)
- [ ] Module purpose documented in header comment
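Part of this checklist can be pre-screened automatically. The sketch below is a naive textual scan, not a VBA parser: `scan_vba` and `RISKY_PATTERNS` are hypothetical names, comment stripping is simplistic (an apostrophe inside a string literal will truncate the line), and a clean result does not replace a manual review.

```python
import re

RISKY_PATTERNS = {
    "filesystem": r"\b(Kill|FileCopy|MkDir)\b",
    "network": r"(XMLHTTP|WinHttpRequest)",
    "shell": r"\bShell\b",
    "registry": r"\bRegWrite\b",
    "auto_exec": r"\b(Auto_Open|Workbook_Open)\b",
}


def scan_vba(source: str) -> dict:
    """Return {category: [line numbers]} for lines matching a risky pattern."""
    findings = {}
    for number, line in enumerate(source.splitlines(), start=1):
        code = line.split("'", 1)[0]  # naive: drop everything after a comment marker
        for category, pattern in RISKY_PATTERNS.items():
            if re.search(pattern, code, flags=re.IGNORECASE):
                findings.setdefault(category, []).append(number)
    return findings


sample = "\n".join([
    "Sub Demo()",
    '    Kill "old.txt"',
    "    ' Shell is only mentioned in this comment",
    '    MsgBox "done"',
    "End Sub",
])
print(scan_vba(sample))
```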
---
## Performance Guidelines
```vba
' ALWAYS bracket bulk operations
Application.ScreenUpdating = False
Application.Calculation = xlCalculationManual
Application.EnableEvents = False
' [Bulk operations here]
Application.EnableEvents = True
Application.Calculation = xlCalculationAutomatic
Application.ScreenUpdating = True
```
### Array-Based Processing (for large data)
```vba
' Read range into array — much faster than cell-by-cell
Dim data As Variant
data = ws.Range("A1:Z" & lastRow).Value ' 2D array
' Process in memory
Dim i As Long
For i = LBound(data, 1) To UBound(data, 1)
data(i, 3) = data(i, 1) * data(i, 2) ' Column C = A * B
Next i
' Write back in one shot
ws.Range("A1:Z" & lastRow).Value = data
```
---
## Debugging Support
When user reports VBA errors, include diagnostic code:
```vba
' Debug logging to Immediate Window
Debug.Print "Processing row " & i & ": " & ws.Cells(i, 1).Value
' Verbose error info
ErrHandler:
Debug.Print "ERROR in " & MODULE_NAME
Debug.Print " Number: " & Err.Number
Debug.Print " Description: " & Err.Description
Debug.Print " Source: " & Err.Source
```

136
skills/xlsx/setup.sh Executable file

@@ -0,0 +1,136 @@
#!/usr/bin/env bash
# ---
# name: xlsx-setup
# author: Z.AI
# version: "1.0"
# description: Environment setup for the XLSX skill. Checks and installs all required dependencies.
# ---
#
# Installs only dependencies required by the XLSX skill.
set -euo pipefail
RED='\033[0;31m'; GREEN='\033[0;32m'; YELLOW='\033[1;33m'; BLUE='\033[0;34m'; NC='\033[0m'
ok() { echo -e " ${GREEN}✓${NC} $1"; }
fail() { echo -e " ${RED}✗${NC} $1"; }
warn() { echo -e " ${YELLOW}!${NC} $1"; }
info() { echo -e " ${BLUE}i${NC} $1"; }
echo "============================================"
echo " XLSX Skill — Environment Setup"
echo "============================================"
echo ""
OS="$(uname -s)"
ARCH="$(uname -m)"
echo "Platform: $OS $ARCH"
echo ""
# ── 0. macOS: Homebrew ──
if [ "$OS" = "Darwin" ]; then
echo "--- Homebrew (macOS package manager) ---"
if command -v brew &>/dev/null; then
BREW_VER=$(brew --version 2>/dev/null | head -1)
ok "brew ($BREW_VER)"
else
fail "brew not found — some optional dependencies need Homebrew on macOS"
info "Install: /bin/bash -c \"\$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/HEAD/install.sh)\""
fi
echo ""
fi
# ── 1. Python 3 ──
echo "--- Python ---"
if command -v python3 &>/dev/null; then
PY_VER=$(python3 --version 2>&1)
ok "python3 ($PY_VER)"
if [ "$OS" = "Darwin" ]; then
PY_PATH=$(which python3 2>/dev/null)
if [[ "$PY_PATH" == "/usr/bin/python3" ]]; then
warn "Using macOS system Python (limited). Recommend: brew install python3"
fi
fi
else
fail "python3 not found"
case "$OS" in
Darwin) info "Install: brew install python3" ;;
Linux) info "Install: sudo apt install python3 python3-pip (Debian/Ubuntu)"
info " sudo dnf install python3 python3-pip (Fedora/RHEL)" ;;
*) info "Install: https://www.python.org/downloads/" ;;
esac
fi
# ── 2. pip ──
echo ""
echo "--- pip ---"
if python3 -m pip --version &>/dev/null; then
PIP_VER=$(python3 -m pip --version 2>/dev/null | head -1)
ok "pip ($PIP_VER)"
else
fail "pip not found"
case "$OS" in
Darwin) info "Install: python3 -m ensurepip --upgrade"
info " or: brew install python3 (includes pip)" ;;
Linux) info "Install: sudo apt install python3-pip (Debian/Ubuntu)" ;;
*) info "Install: python3 -m ensurepip --upgrade" ;;
esac
fi
# ── 3. Python packages ──
echo ""
echo "--- Python Packages ---"
PY_PKGS=(
"openpyxl:openpyxl"
"xlsxwriter:XlsxWriter"
)
MISSING_PY=()
for entry in "${PY_PKGS[@]}"; do
mod="${entry%%:*}"
pkg="${entry##*:}"
if python3 -c "import $mod" 2>/dev/null; then
ver=$(python3 -c "import $mod; print(getattr($mod, '__version__', 'installed'))" 2>/dev/null)
ok "$pkg ($ver)"
else
fail "$pkg not installed"
MISSING_PY+=("$pkg")
fi
done
if [ ${#MISSING_PY[@]} -gt 0 ]; then
echo ""
if [ -t 0 ]; then
read -p " Install missing Python packages? [Y/n] " -n 1 -r REPLY
echo ""
REPLY=${REPLY:-Y}
else
warn "Non-interactive mode — skipping auto-install. Run interactively or install manually."
REPLY=N
fi
if [[ ! $REPLY =~ ^[Nn]$ ]]; then
    # Report success only if one of the install attempts actually succeeded
    if python3 -m pip install -q "${MISSING_PY[@]}" 2>/dev/null \
        || python3 -m pip install -q --user "${MISSING_PY[@]}" 2>/dev/null \
        || python3 -m pip install -q --break-system-packages "${MISSING_PY[@]}" 2>/dev/null; then
        ok "Installed: ${MISSING_PY[*]}"
    else
        fail "pip install failed. Try manually: pip install ${MISSING_PY[*]}"
    fi
fi
fi
# ── 4. LibreOffice (optional, for format conversion) ──
echo ""
echo "--- LibreOffice (optional, for CSV/PDF conversion) ---"
if command -v soffice &>/dev/null; then
LO_VER=$(soffice --version 2>/dev/null | head -1)
ok "libreoffice ($LO_VER)"
else
warn "libreoffice not installed (needed only for .xlsx→PDF or .csv→.xlsx conversion)"
case "$OS" in
Darwin) info "Install: brew install --cask libreoffice" ;;
Linux) info "Install: sudo apt install libreoffice-core (Debian/Ubuntu)" ;;
esac
fi
# ── Summary ──
echo ""
echo "============================================"
echo " Setup complete."
echo "============================================"

632
skills/xlsx/templates/base.py Executable file

@@ -0,0 +1,632 @@
"""
xlsx skill — Base Template
===========================
Single source of truth for design tokens, font resolution, and style factories.
All scene/engine code MUST import from here. Never hardcode colors, fonts, or styles.
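Usage:
    from templates.base import *

    # To switch palette based on user prompt (call BEFORE creating styles):
    use_palette("帮我做一个温暖的销售月报")   # Chinese prompt: "make me a warm-toned monthly sales report"
    # → Style factories (font_header(), fill_header(), ...) now use the 'warm' palette.
    # Note: names already copied by `import *` (PRIMARY, ...) keep their
    # import-time values; read tokens as templates.base.PRIMARY after switching.

    # Or manually:
    use_palette_explicit("warm")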
"""
import platform
from openpyxl.styles import PatternFill, Font, Border, Side, Alignment
from copy import copy
# ============================================================
# §1 Font Resolution (cross-platform fallback chain)
# ============================================================
def _resolve_font(candidates: list) -> str:
    """Return the first font name likely available on this OS."""
    system = platform.system()
    _platform_hints = {
        "Darwin": {"PingFang SC", "Hiragino Sans GB", ".AppleSystemUIFont"},
        "Windows": {"Microsoft YaHei", "SimHei", "SimSun"},
        "Linux": {"Noto Sans CJK SC", "WenQuanYi Micro Hei", "Source Han Sans SC"},
    }
    available = _platform_hints.get(system, set())
    for name in candidates:
        if name in available:
            return name
    return candidates[0]
# CJK sans-serif fallback chain
CJK_BODY_CHAIN = [
"PingFang SC", # macOS
"Microsoft YaHei", # Windows
"Noto Sans CJK SC", # Linux / Android
"Hiragino Sans GB", # macOS alt
"Source Han Sans SC", # Adobe cross-platform
"SimHei", # classic fallback
]
# Latin serif (for formal reports)
LATIN_BODY_CHAIN = [
"Times New Roman",
"Georgia",
"serif",
]
FONT_CJK = _resolve_font(CJK_BODY_CHAIN)
FONT_LATIN = _resolve_font(LATIN_BODY_CHAIN)
# Primary font — CJK font covers ASCII too
FONT_NAME = FONT_CJK
# Bold strategy: heavy-stroke fonts should NOT be bolded
_HEAVY_FONTS = {
"SimHei", "Microsoft YaHei", "PingFang SC",
"Noto Sans CJK SC", "Source Han Sans SC",
"Hiragino Sans GB", "WenQuanYi Micro Hei",
}
HEADER_BOLD = FONT_NAME not in _HEAVY_FONTS
# ============================================================
# §2 Color Tokens (Three-Color Rule)
# ============================================================
# --- Primary (deep blue — professional default) ---
PRIMARY = "1B2A4A"
PRIMARY_LIGHT = "D6E4F0"
SECONDARY = PRIMARY_LIGHT # derived from primary
# --- Accent (semantic, on-demand) ---
ACCENT_POSITIVE = "1B7D46" # growth, done, pass (deep green)
ACCENT_NEGATIVE = "C0392B" # decline, overdue (deep red)
ACCENT_WARNING = "D4820A" # at-risk, watch (deep amber)
# --- Neutral (warm gray) ---
NEUTRAL_900 = "37352F" # body text
NEUTRAL_600 = "8C8A84" # caption, secondary text
NEUTRAL_200 = "E9E9E8" # borders, dividers
NEUTRAL_100 = "F7F7F5" # alternating row fill (odd)
NEUTRAL_50 = "FAFAF9" # ultra-light bg (optional)
NEUTRAL_0 = "FFFFFF" # white (even rows)
# --- Header text color (overridable by palette) ---
HEADER_TEXT = "FFFFFF"
# --- Chart palette (max 5 colors) ---
CHART_COLORS = [PRIMARY, ACCENT_POSITIVE, ACCENT_WARNING, ACCENT_NEGATIVE, NEUTRAL_600]
# --- Conditional formatting fills ---
CF_POSITIVE_FILL = PatternFill(bgColor="E8F5E9")
CF_POSITIVE_FONT = Font(color=ACCENT_POSITIVE)
CF_NEGATIVE_FILL = PatternFill(bgColor="FDEDEC")
CF_NEGATIVE_FONT = Font(color=ACCENT_NEGATIVE)
CF_WARNING_FILL = PatternFill(bgColor="FEF9E7")
CF_WARNING_FONT = Font(color=ACCENT_WARNING)
# --- Active style (for debugging/logging) ---
_ACTIVE_STYLE = "professional"
# ============================================================
# §2.1 Palette Integration
# ============================================================
def use_palette(prompt: str):
    """
    Auto-detect style from user prompt and switch all color tokens.
    Call this BEFORE creating any styles/cells.

    Three-step matching:
        1. Explicit style keywords → direct match
        2. Scene/content keywords → infer style
        3. No match → professional (safe default)

    Example:
        use_palette("帮我做一个温暖的销售月报")   # Chinese prompt example
        # → 'warm' palette applied
    """
    from templates.palettes import resolve_palette_with_info
    palette, style = resolve_palette_with_info(prompt)
    _apply(palette, style)


def use_palette_explicit(style: str = "professional"):
    """
    Manually select a palette by style name.

    Available: professional, warm, elegant, creative, muji, aesop,
    kinfolk, celine, bottega, chanel, bloomberg, original_blue

    Example:
        use_palette_explicit("warm")
    """
    from templates.palettes import get_palette
    palette = get_palette(style)
    _apply(palette, style)


def get_active_style() -> str:
    """Return the currently active style name."""
    return _ACTIVE_STYLE


def _apply(palette: dict, style: str):
    """Internal: apply a palette dict to all module-level color tokens."""
    global PRIMARY, PRIMARY_LIGHT, SECONDARY
    global ACCENT_POSITIVE, ACCENT_NEGATIVE, ACCENT_WARNING
    global NEUTRAL_900, NEUTRAL_600, NEUTRAL_200, NEUTRAL_100, NEUTRAL_50, NEUTRAL_0
    global CHART_COLORS, HEADER_TEXT
    global CF_POSITIVE_FILL, CF_POSITIVE_FONT
    global CF_NEGATIVE_FILL, CF_NEGATIVE_FONT
    global CF_WARNING_FILL, CF_WARNING_FONT
    global _ACTIVE_STYLE

    PRIMARY = palette["PRIMARY"]
    PRIMARY_LIGHT = palette["PRIMARY_LIGHT"]
    SECONDARY = palette["SECONDARY"]
    ACCENT_POSITIVE = palette["ACCENT_POSITIVE"]
    ACCENT_NEGATIVE = palette["ACCENT_NEGATIVE"]
    ACCENT_WARNING = palette["ACCENT_WARNING"]
    NEUTRAL_900 = palette["NEUTRAL_900"]
    NEUTRAL_600 = palette["NEUTRAL_600"]
    NEUTRAL_200 = palette["NEUTRAL_200"]
    NEUTRAL_100 = palette["NEUTRAL_100"]
    NEUTRAL_50 = palette["NEUTRAL_50"]
    NEUTRAL_0 = palette["NEUTRAL_0"]
    HEADER_TEXT = palette.get("HEADER_TEXT", "FFFFFF")
    CHART_COLORS = palette["CHART_COLORS"]

    # Rebuild conditional formatting fills/fonts with new accent colors
    CF_POSITIVE_FILL = PatternFill(bgColor=palette.get("CF_POSITIVE_BG", "E8F5E9"))
    CF_POSITIVE_FONT = Font(color=ACCENT_POSITIVE)
    CF_NEGATIVE_FILL = PatternFill(bgColor=palette.get("CF_NEGATIVE_BG", "FDEDEC"))
    CF_NEGATIVE_FONT = Font(color=ACCENT_NEGATIVE)
    CF_WARNING_FILL = PatternFill(bgColor=palette.get("CF_WARNING_BG", "FEF9E7"))
    CF_WARNING_FONT = Font(color=ACCENT_WARNING)
    _ACTIVE_STYLE = style
# ============================================================
# §3 Column Width Map
# ============================================================
COLUMN_WIDTHS = {
"margin": 3, # A col whitespace
"id_short": 8, # #, ID
"name_cn": 16, # Chinese name (2-4 chars)
"name_en": 22, # English name
"description": 32, # long text
"number": 14, # currency, amount
"percentage": 12, # %
"date": 14, # YYYY-MM-DD
"status": 12, # short label
}
# ============================================================
# §4 Number Formats
# ============================================================
FORMATS = {
"integer": "#,##0",
"decimal_1": "#,##0.0",
"decimal_2": "#,##0.00",
"percentage": "0.0%",
"currency_cny": "¥#,##0.00",
"currency_usd": "$#,##0.00",
"date": "YYYY-MM-DD",
}
# ============================================================
# §5 Style Factories
# ============================================================
def font_title():
    """16pt title font — left-aligned, no fill."""
    return Font(name=FONT_NAME, size=16, bold=HEADER_BOLD, color=PRIMARY)


def font_header():
    """11pt header font — text color on primary background."""
    return Font(name=FONT_NAME, size=11, bold=HEADER_BOLD, color=HEADER_TEXT)


def font_subheader():
    """11pt sub-header — primary color text."""
    return Font(name=FONT_NAME, size=11, bold=HEADER_BOLD, color=PRIMARY)


def font_body():
    """11pt body text."""
    return Font(name=FONT_NAME, size=11, color=NEUTRAL_900)


def font_caption():
    """9pt caption / footnote."""
    return Font(name=FONT_NAME, size=9, color=NEUTRAL_600)


def font_kpi():
    """22pt big KPI number."""
    return Font(name=FONT_NAME, size=22, bold=HEADER_BOLD, color=PRIMARY)


def font_kpi_label():
    """9pt KPI label."""
    return Font(name=FONT_NAME, size=9, color=NEUTRAL_600)
def make_chart_title(text, size_pt=12, bold=True, axis=False, max_line_chars=6):
"""
Build a chart Title with font baked into <tx><rich><defRPr>/<rPr>.
Ensures WPS and Office render identical font name and size.
Uses FONT_NAME and HEADER_BOLD from §1 — no hardcoded font names.
Args:
axis: If True, set bodyPr rot=-5400000 (rotate -90°) for Y-axis titles.
max_line_chars: For axis titles, auto-insert line breaks (\n) when text
exceeds this length. Breaks at parentheses boundaries.
The text stays in ONE run inside ONE paragraph — this prevents
WPS/Office from creating separate overlapping text boxes.
Set to 0 or None to disable.
Key insight: Multiple <p> paragraphs in axis titles cause WPS to render
them as stacked overlapping text boxes. Instead, we use a SINGLE <r> run
with \\n line breaks inside the text, which both Office and WPS render
as line breaks within the same text box.
"""
from openpyxl.chart.title import Title
from openpyxl.chart.text import Text, RichText
from openpyxl.drawing.text import (
Paragraph, ParagraphProperties, CharacterProperties,
Font as DrawingFont, RichTextProperties, RegularTextRun,
LineBreak,
)
from copy import deepcopy
import re
rpr = CharacterProperties(
latin=DrawingFont(typeface=FONT_NAME),
ea=DrawingFont(typeface=FONT_NAME),
sz=int(size_pt * 100),
b=bold if HEADER_BOLD else False,
)
def _insert_breaks(text, max_chars):
"""Insert \\n before parentheses when text exceeds max_chars."""
if not max_chars or len(text) <= max_chars:
return text
# Insert \n before the first '(' or full-width '（'
result = re.sub(r'(?=[(（])', '\n', text, count=1)
return result
# For axis titles, insert line breaks to prevent overlap
display_text = text
if axis and max_line_chars:
display_text = _insert_breaks(text, max_line_chars)
# Single paragraph, single run with \n inside the text.
# Both Office and WPS render \n as line breaks within one text box.
# Do NOT use multiple <p> paragraphs — WPS renders them as separate
# overlapping text boxes on axis titles.
run = RegularTextRun(rPr=deepcopy(rpr), t=display_text)
inner_body = RichTextProperties(rot=-5400000) if axis else RichTextProperties()
para = Paragraph(
pPr=ParagraphProperties(defRPr=deepcopy(rpr)),
r=[run],
)
rich = RichText(bodyPr=inner_body, p=[para])
# Outer txPr: Office reads rotation from here for axis titles
if axis:
outer_body = RichTextProperties(rot=-5400000)
txPr = RichText(
bodyPr=outer_body,
p=[Paragraph(pPr=ParagraphProperties(defRPr=deepcopy(rpr)))],
)
return Title(tx=Text(rich=rich), txPr=txPr)
return Title(tx=Text(rich=rich))
def fill_header():
return PatternFill("solid", fgColor=PRIMARY)
def fill_total():
return PatternFill("solid", fgColor=SECONDARY)
def fill_data_row(row_index: int):
"""Alternating row: even=white, odd=warm-white."""
color = NEUTRAL_0 if row_index % 2 == 0 else NEUTRAL_100
return PatternFill("solid", fgColor=color)
def border_header():
"""Thin bottom border under header row."""
return Border(bottom=Side(style="thin", color=NEUTRAL_200))
def border_total():
"""Medium top border above totals row."""
return Border(top=Side(style="medium", color=NEUTRAL_200))
def align_title():
return Alignment(horizontal="left", vertical="center")
def align_header():
return Alignment(horizontal="center", vertical="center", wrap_text=True)
def align_number():
return Alignment(horizontal="right", vertical="center")
def align_text():
return Alignment(horizontal="left", vertical="center")
def align_date():
return Alignment(horizontal="center", vertical="center")
# ============================================================
# §6 Sheet Setup Helpers
# ============================================================
ROW_HEIGHTS = {
"margin": 15, # row 1 top whitespace
"title": 32, # row 2
"spacer": 8, # row 3
"header": 28, # row 4
"data": 22, # data rows
"total": 26, # totals row
}
def setup_sheet(ws, title: str = None, last_col: int = None):
"""
Apply standard sheet setup:
- hide grid lines
- set margin column A width
- set row 1/2/3 heights
- optionally write & style title at B2
"""
ws.sheet_view.showGridLines = False
ws.column_dimensions["A"].width = COLUMN_WIDTHS["margin"]
ws.row_dimensions[1].height = ROW_HEIGHTS["margin"]
ws.row_dimensions[2].height = ROW_HEIGHTS["title"]
ws.row_dimensions[3].height = ROW_HEIGHTS["spacer"]
if title and last_col:
ws.merge_cells(start_row=2, start_column=2, end_row=2, end_column=last_col)
cell = ws.cell(row=2, column=2, value=title)
cell.font = font_title()
cell.alignment = align_title()
def style_header_row(ws, row_num: int, col_start: int, col_end: int):
"""Apply header style to a row range."""
for col in range(col_start, col_end + 1):
cell = ws.cell(row=row_num, column=col)
cell.fill = fill_header()
cell.font = font_header()
cell.alignment = align_header()
cell.border = border_header()
ws.row_dimensions[row_num].height = ROW_HEIGHTS["header"]
def style_data_row(ws, row_num: int, col_start: int, col_end: int, row_index: int):
"""Apply data row style (alternating fill)."""
fill = fill_data_row(row_index)
for col in range(col_start, col_end + 1):
cell = ws.cell(row=row_num, column=col)
cell.fill = fill
cell.font = font_body()
ws.row_dimensions[row_num].height = ROW_HEIGHTS["data"]
def style_total_row(ws, row_num: int, col_start: int, col_end: int):
"""Apply totals row style."""
for col in range(col_start, col_end + 1):
cell = ws.cell(row=row_num, column=col)
cell.fill = fill_total()
cell.font = font_subheader()
cell.border = border_total()
ws.row_dimensions[row_num].height = ROW_HEIGHTS["total"]
# ============================================================
# §6.1 Chart Factory Functions
# ============================================================
def create_bar_chart(chart_type="col", grouping="clustered", gap_width=80,
overlap=100, style=10, width=18, height=10, **kwargs):
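"""
Create a BarChart with sane defaults that prevent the "thin bar" / offset bug.
Key fixes baked in:
- gapWidth=80 (default 150 → bars too thin)
- overlap=100 (bars fill their slot, no empty gap for line series)
Note: with several clustered series, overlap=100 draws the series on top of
each other; pass a lower overlap (e.g. 0) for side-by-side bars.
Extra keyword arguments (**kwargs) are accepted but currently ignored.
Returns an openpyxl BarChart ready for add_data / set_categories.
"""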
from openpyxl.chart import BarChart
chart = BarChart()
chart.type = chart_type
chart.grouping = grouping
chart.gapWidth = gap_width
chart.overlap = overlap
chart.style = style
chart.width = width
chart.height = height
return chart
def create_line_chart(style=10, width=18, height=11, **kwargs):
"""Create a LineChart with standard defaults."""
from openpyxl.chart import LineChart
chart = LineChart()
chart.style = style
chart.width = width
chart.height = height
return chart
def create_pie_chart(style=10, width=14, height=10, **kwargs):
"""Create a PieChart with standard defaults."""
from openpyxl.chart import PieChart
chart = PieChart()
chart.style = style
chart.width = width
chart.height = height
return chart
def setup_chart_titles(chart, title=None, y_title=None, x_title=None,
title_size=12, axis_size=10):
"""
Set chart title and axis titles using make_chart_title() for
cross-platform font consistency (Office + WPS).
This is the ONLY correct way to set chart titles. Never do:
chart.title = "some string" # ← WRONG
chart.y_axis.title = "some string" # ← WRONG
Args:
chart: openpyxl chart object
title: main chart title (optional)
y_title: Y-axis title (optional, auto-rotated -90°)
x_title: X-axis title (optional)
title_size: font size for main title (default 12)
axis_size: font size for axis titles (default 10)
"""
if title is not None:
chart.title = make_chart_title(title, size_pt=title_size, bold=True)
if y_title is not None:
chart.y_axis.title = make_chart_title(y_title, size_pt=axis_size, bold=False, axis=True)
if x_title is not None:
chart.x_axis.title = make_chart_title(x_title, size_pt=axis_size, bold=False)
def apply_chart_colors(chart, colors=None):
"""
Apply palette colors to all series in a chart.
Call AFTER add_data().
Args:
chart: openpyxl chart object (BarChart, LineChart, etc.)
colors: list of hex color strings (default: CHART_COLORS)
"""
if colors is None:
colors = CHART_COLORS
for i, series in enumerate(chart.series):
color_hex = colors[i % len(colors)]
series.graphicalProperties.solidFill = color_hex
# For line charts, also set line color
if hasattr(series.graphicalProperties, 'line') and series.graphicalProperties.line is not None:
series.graphicalProperties.line.solidFill = color_hex
def apply_pie_colors(chart, count, colors=None):
"""
Apply palette colors to pie chart data points.
Call AFTER add_data().
Args:
chart: openpyxl PieChart
count: number of data points (slices)
colors: list of hex color strings (default: CHART_COLORS)
"""
from openpyxl.chart.series import DataPoint
if colors is None:
colors = CHART_COLORS
for idx in range(count):
pt = DataPoint(idx=idx)
pt.graphicalProperties.solidFill = colors[idx % len(colors)]
chart.series[0].data_points.append(pt)
# ============================================================
# §7 Utility Functions
# ============================================================
def normalize_cell_value(value):
"""Return None for None or for strings that are empty once whitespace, non-breaking spaces, and zero-width spaces are removed; return any other value unchanged."""
if value is None:
return None
if isinstance(value, str):
stripped = value.strip().replace("\xa0", "").replace("\u200b", "")
if stripped == "":
return None
return value
def copy_style(source_cell, target_cell):
"""Copy all styling from source to target cell."""
target_cell.font = copy(source_cell.font)
target_cell.fill = copy(source_cell.fill)
target_cell.border = copy(source_cell.border)
target_cell.alignment = copy(source_cell.alignment)
target_cell.number_format = source_cell.number_format
def auto_fit_columns(ws, min_width=8, max_width=28, header_row=None, data_start_row=None):
"""
Auto-fit column widths based on DATA content (not header).
Headers that exceed the computed width get wrap_text=True instead of stretching the column.
Args:
ws: worksheet
min_width: minimum column width (default 8)
max_width: maximum column width (default 28)
header_row: row number of the header (auto-detected if None)
data_start_row: first data row (auto-detected as header_row + 1 if None)
"""
import unicodedata
def _display_width(text):
"""Estimate display width: CJK chars count as ~1.7, others as 1."""
if text is None:
return 0
s = str(text)
w = 0
for ch in s:
if unicodedata.east_asian_width(ch) in ('W', 'F'):
w += 1.7
else:
w += 1
return w
# Auto-detect header row: the first row that has a value in column B
if header_row is None:
for row in range(1, ws.max_row + 1):
val = ws.cell(row=row, column=2).value
if val is not None:
header_row = row
break
if header_row is None:
return
if data_start_row is None:
data_start_row = header_row + 1
for col_cells in ws.iter_cols(min_col=1, max_col=ws.max_column, min_row=data_start_row, max_row=ws.max_row):
if not col_cells:
continue
col_letter = col_cells[0].column_letter
# Skip margin column A
if col_letter == 'A':
continue
# Width based on data content only
max_data_w = max((_display_width(c.value) for c in col_cells), default=0)
width = min(max_width, max(min_width, max_data_w + 2))
ws.column_dimensions[col_letter].width = width
# If header text is wider than computed column width, wrap it
header_cell = ws.cell(row=header_row, column=col_cells[0].column)
header_w = _display_width(header_cell.value)
if header_w > width:
current_align = header_cell.alignment
header_cell.alignment = Alignment(
horizontal=current_align.horizontal or "center",
vertical=current_align.vertical or "center",
wrap_text=True,
)
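The width heuristic used by `auto_fit_columns()` can be tried in isolation. What follows is a minimal standalone sketch rather than the module's own code: `display_width` and `fit_width` are hypothetical names, and the 1.7 weight, the padding of 2, and the 8/28 clamp are simply taken from the defaults shown above.

```python
import unicodedata


def display_width(text):
    """Estimate rendered width: wide/full-width (CJK) chars count 1.7, others 1."""
    if text is None:
        return 0
    return sum(
        1.7 if unicodedata.east_asian_width(ch) in ("W", "F") else 1
        for ch in str(text)
    )


def fit_width(values, min_width=8, max_width=28, padding=2):
    """Clamp the widest value (plus padding) into [min_width, max_width]."""
    widest = max((display_width(v) for v in values), default=0)
    return min(max_width, max(min_width, widest + padding))


if __name__ == "__main__":
    # Short values fall back to the minimum width.
    print(fit_width(["ab", None]))
    # Mid-length values get their width plus padding.
    print(fit_width(["a" * 10]))
    # Very long values are capped at the maximum width.
    print(fit_width(["x" * 40]))
    # CJK text is weighted more heavily than ASCII of the same length.
    print(fit_width(["销售额合计"]))
```

Because the width is derived from data cells only, a long header does not stretch its column; the function above handles that case by wrapping the header instead.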

521
skills/xlsx/templates/palettes.py Executable file

@@ -0,0 +1,521 @@
"""
xlsx skill — Palette System (Style-First Theme Engine)
=======================================================
12 visual styles × scene-based fallback. No domain-color binding.
Themes (12):
professional, warm, elegant, creative,
muji, aesop, kinfolk, celine, bottega, chanel, bloomberg, original_blue
Matching priority:
1. Explicit style keywords in prompt → direct match
2. Scene/content keywords → infer style
3. No match → professional (safe default)
Usage:
from templates.palettes import resolve_palette, get_palette
# Auto-detect from user prompt
palette = resolve_palette("帮我做一个温暖的销售月报") # Chinese prompt: "make me a warm monthly sales report"
# → warm palette
# Manual selection
palette = get_palette("bottega")
"""
from __future__ import annotations
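from typing import Dict, List, Optional, Tuple, Union
# ============================================================
# §1 Palette Data Structure
# ============================================================
# This alias is evaluated at runtime (the __future__ import only defers annotations),
# so Union is used instead of `str | list`, which raises TypeError before Python 3.10.
_Palette = Dict[str, Union[str, List[str]]]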
def _make_palette(
*,
primary: str,
primary_light: str,
accent_positive: str = "1B7D46",
accent_negative: str = "C0392B",
accent_warning: str = "D4820A",
neutral_900: str = "37352F",
neutral_600: str = "8C8A84",
neutral_200: str = "E9E9E8",
neutral_100: str = "F7F7F5",
neutral_50: str = "FAFAF9",
neutral_0: str = "FFFFFF",
header_text: str = "FFFFFF",
cf_positive_bg: str = "E8F5E9",
cf_negative_bg: str = "FDEDEC",
cf_warning_bg: str = "FEF9E7",
) -> _Palette:
return {
"PRIMARY": primary,
"PRIMARY_LIGHT": primary_light,
"SECONDARY": primary_light,
"ACCENT_POSITIVE": accent_positive,
"ACCENT_NEGATIVE": accent_negative,
"ACCENT_WARNING": accent_warning,
"NEUTRAL_900": neutral_900,
"NEUTRAL_600": neutral_600,
"NEUTRAL_200": neutral_200,
"NEUTRAL_100": neutral_100,
"NEUTRAL_50": neutral_50,
"NEUTRAL_0": neutral_0,
"HEADER_TEXT": header_text,
"CF_POSITIVE_BG": cf_positive_bg,
"CF_NEGATIVE_BG": cf_negative_bg,
"CF_WARNING_BG": cf_warning_bg,
"CHART_COLORS": [primary, accent_positive, accent_warning, accent_negative, neutral_600],
}
# ============================================================
# §2 Legacy Palettes (6 original styles)
# ============================================================
# -- Professional: formal business, universal default --
PROFESSIONAL = _make_palette(
primary="1B2A4A",
primary_light="D6E4F0",
)
# -- Warm: warm and vibrant, high impact --
WARM = _make_palette(
primary="B85C1E",
primary_light="F5E6D5",
accent_positive="2E7D32",
accent_negative="C62828",
accent_warning="E65100",
neutral_900="3E2F1F",
neutral_600="9C8B78",
neutral_200="EAE0D5",
neutral_100="F7F2EC",
neutral_50="FBF8F5",
)
# -- Fresh: natural freshness, friendly and light --
FRESH = _make_palette(
primary="0E7C6B",
primary_light="D4F0EB",
accent_positive="2E9E5A",
accent_negative="D94F4F",
accent_warning="E6A023",
neutral_900="2F3735",
neutral_600="7A8C87",
neutral_200="DEE9E6",
neutral_100="F2F8F6",
neutral_50="F8FBFA",
)
# -- Elegant: premium restraint, minimalist black-white --
ELEGANT = _make_palette(
primary="2C2C2C",
primary_light="E5E5E5",
accent_positive="4A4A4A",
accent_negative="8B0000",
accent_warning="6B6B6B",
neutral_900="1A1A1A",
neutral_600="808080",
neutral_200="D4D4D4",
neutral_100="F0F0F0",
neutral_50="F8F8F8",
)
# -- Creative: artistic personality, distinctive --
CREATIVE = _make_palette(
primary="6C5B7B",
primary_light="E4DDE8",
accent_positive="6B9E78",
accent_negative="C06C7E",
accent_warning="C4A46A",
neutral_900="3E3A42",
neutral_600="9590A0",
neutral_200="E0DCE4",
neutral_100="F3F1F5",
neutral_50="F9F8FA",
)
# -- Vibrant: high-saturation multi-color, data display --
VIBRANT = _make_palette(
primary="2563EB",
primary_light="DBEAFE",
accent_positive="16A34A",
accent_negative="DC2626",
accent_warning="EA580C",
neutral_900="1E293B",
neutral_600="64748B",
neutral_200="E2E8F0",
neutral_100="F1F5F9",
neutral_50="F8FAFC",
)
# ============================================================
# §3 Premium Palettes (8 curated themes, "high-end feel" series)
# ============================================================
# -- A · MUJI breathing feel: restrained minimalism, pencil on paper --
MUJI = _make_palette(
primary="2C2C2C",
primary_light="F2F1EE",
accent_positive="5B8C5A",
accent_negative="C25450",
accent_warning="C9A84C",
neutral_900="2C2C2C",
neutral_600="999999",
neutral_200="E8E6E1",
neutral_100="F9F9F7",
neutral_50="FCFCFB",
header_text="FFFFFF",
)
# -- B · Aesop sandstone: earth tones, premium skincare packaging --
AESOP = _make_palette(
primary="3D3229",
primary_light="EDE8E0",
accent_positive="6B8F71",
accent_negative="B85C4A",
accent_warning="C4975A",
neutral_900="4A4038",
neutral_600="8C7B6B",
neutral_200="DDD5C9",
neutral_100="FAF8F5",
neutral_50="FDFCFA",
header_text="FFFFFF",
)
# -- C · Dieter Rams Industrial: Less but better --
DIETER_RAMS = _make_palette(
primary="1A1A1A",
primary_light="F7F7F7",
accent_positive="2D8C6F",
accent_negative="D44D3C",
accent_warning="D4920A",
neutral_900="1A1A1A",
neutral_600="787878",
neutral_200="E5E5E5",
neutral_100="F7F7F7",
neutral_50="FAFAFA",
header_text="FFFFFF",
)
# -- D · Kinfolk cream publication: independent magazine typography, slow-life aesthetic --
KINFOLK = _make_palette(
primary="5C524C",
primary_light="F0ECE7",
accent_positive="8DAA7F",
accent_negative="C9776A",
accent_warning="C9A96A",
neutral_900="5C524C",
neutral_600="BEB5AD",
neutral_200="EAE5DF",
neutral_100="FDFCFA",
neutral_50="FEFDFB",
header_text="FFFFFF",
)
# -- E · Céline pure black-white: monochrome, fashion house coldness --
CELINE = _make_palette(
primary="000000",
primary_light="FAFAFA",
accent_positive="4A7C59",
accent_negative="A63D2F",
accent_warning="8C7A3C",
neutral_900="000000",
neutral_600="ADADAD",
neutral_200="E0E0E0",
neutral_100="FAFAFA",
neutral_50="FDFDFD",
header_text="FFFFFF",
)
# -- F · Bottega dark green: Italian luxury, deep forest green --
BOTTEGA = _make_palette(
primary="2D4A3E",
primary_light="E8F0EB",
accent_positive="5FA67A",
accent_negative="C2694B",
accent_warning="B89B4A",
neutral_900="3B5249",
neutral_600="7A9B8C",
neutral_200="D4E3DB",
neutral_100="F6FAF8",
neutral_50="F9FCFA",
header_text="FFFFFF",
)
# -- G · Chanel champagne gold: Chanel elegance, beige + golden brown --
CHANEL = _make_palette(
primary="1C1917",
primary_light="E7DFD4",
accent_positive="A3845B",
accent_negative="B0413E",
accent_warning="C4975A",
neutral_900="1C1917",
neutral_600="A39888",
neutral_200="E7E0D5",
neutral_100="FDFBF7",
neutral_50="FEFDFB",
header_text="FFFFFF",
)
# -- H · Bloomberg deep blue: financial terminal, high-density data aesthetic --
BLOOMBERG = _make_palette(
primary="0D1B2A",
primary_light="D6E0EB",
accent_positive="10B981",
accent_negative="EF4444",
accent_warning="F59E0B",
neutral_900="0D1B2A",
neutral_600="708DA8",
neutral_200="D6E0EB",
neutral_100="F4F7FA",
neutral_50="F8FAFB",
header_text="FFFFFF",
)
# -- Original Blue/Black: original blue-black color scheme (Round 1 #1/#6 style) --
ORIGINAL_BLUE = _make_palette(
primary="1B2A4A",
primary_light="D6E4F0",
accent_positive="2E8B57",
accent_negative="EB5757",
accent_warning="F2994A",
neutral_900="333333",
neutral_600="666666",
neutral_200="E0E0E0",
neutral_100="F5F5F5",
neutral_50="FAFAFA",
)
# ============================================================
# §4 Registry
# ============================================================
PALETTE_REGISTRY: Dict[str, _Palette] = {
# Legacy (removed: fresh, vibrant)
"professional": PROFESSIONAL,
"warm": WARM,
"elegant": ELEGANT,
"creative": CREATIVE,
# Premium (high-end feel)
"muji": MUJI,
"aesop": AESOP,
# dieter_rams removed — header too dark, poor readability
"kinfolk": KINFOLK,
"celine": CELINE,
"bottega": BOTTEGA,
"chanel": CHANEL,
"bloomberg": BLOOMBERG,
"original_blue": ORIGINAL_BLUE,
}
# Aliases for convenience
PALETTE_REGISTRY["muji_breathing"] = MUJI
PALETTE_REGISTRY["sandstone"] = AESOP
PALETTE_REGISTRY["industrial"] = BLOOMBERG # was dieter_rams, redirected
PALETTE_REGISTRY["cream"] = KINFOLK
PALETTE_REGISTRY["monochrome"] = CELINE
PALETTE_REGISTRY["forest_green"] = BOTTEGA
PALETTE_REGISTRY["champagne"] = CHANEL
PALETTE_REGISTRY["terminal"] = BLOOMBERG
PALETTE_REGISTRY["classic_blue"] = ORIGINAL_BLUE
# ============================================================
# §5 Keyword Matching (three-step)
# ============================================================
# Step 1: Explicit style keywords (highest priority)
_STYLE_KEYWORDS: Dict[str, list[str]] = {
"professional": [
"正式", "商务", "专业", "沉稳", "稳重", "professional", "formal",
"corporate", "business",
],
"warm": [
"温暖", "活力", "热情", "热烈", "暖色", "温馨", "warm", "energetic",
"活跃", "热力",
],
"elegant": [
"极简", "简约", "elegant", "minimal",
"清新", "自然", "清爽", "淡雅", "浅色", "明亮", "fresh",
"natural", "clean", "light", "素雅",
"多彩", "丰富", "鲜艳", "vivid", "colorful", "明快",
"高饱和", "鲜明", "亮色",
],
"creative": [
"文艺", "个性", "紫色", "莫兰迪", "creative", "artistic",
"柔和", "雅致",
],
# Premium themes
"muji": [
"muji", "无印", "呼吸感", "白纸", "铅笔", "素净", "无印良品",
],
"aesop": [
"aesop", "沙岩", "大地色", "护肤", "泥土", "terracotta",
],
"bloomberg": [
"bloomberg", "终端", "深蓝", "terminal", "金融终端", "数据终端",
"rams", "dieter", "工业", "德系", "包豪斯", "bauhaus", "less but better",
"工业风",
],
"kinfolk": [
"kinfolk", "奶油", "刊物", "杂志", "慢生活", "latte", "拿铁",
],
"celine": [
"celine", "黑白", "时装", "冷冽", "mono", "纯黑", "monochrome",
],
"bottega": [
"bottega", "墨绿", "深绿", "森林", "橄榄", "绿色", "forest",
"贵气", "奢牌",
],
"chanel": [
"chanel", "米金", "金棕", "香奈儿", "champagne", "米色", "奶茶",
],
"original_blue": [
"原始", "经典蓝", "classic blue", "original", "传统蓝",
],
}
# Step 2: Scene keywords → infer style (lower priority)
_SCENE_TO_STYLE: Dict[str, str] = {
# Sales / Marketing / Ops → warm
"销售": "warm", "营销": "warm", "运营": "warm", "客户": "warm",
"业绩": "warm", "KPI": "warm", "GMV": "warm", "转化": "warm",
"漏斗": "warm", "签约": "warm", "提成": "warm", "电商": "warm",
"sales": "warm", "marketing": "warm", "campaign": "warm",
# Education / Medical → muji (was fresh, now removed)
"成绩": "muji", "考试": "muji", "学生": "muji", "课程": "muji",
"教育": "muji", "GPA": "muji", "学校": "muji", "班级": "muji",
"医疗": "muji", "健康": "muji", "患者": "muji", "体检": "muji",
"医院": "muji", "科室": "muji", "护理": "muji",
"环保": "muji",
"education": "muji", "medical": "muji", "health": "muji",
# Design / Brand → creative
"设计": "creative", "创意": "creative", "品牌": "creative",
"UI": "creative", "UX": "creative", "作品": "creative",
"视觉": "creative", "素材": "creative",
"design": "creative", "brand": "creative", "portfolio": "creative",
# Formal / Reporting → professional
"汇报": "professional", "提案": "professional", "会议": "professional",
"述职": "professional", "总结": "professional", "报告": "professional",
"年报": "professional", "季报": "professional", "月报": "professional",
"财务": "professional", "财报": "professional", "预算": "professional",
"审计": "professional", "咨询": "professional", "战略": "professional",
"finance": "professional", "budget": "professional", "report": "professional",
# Minimal / Premium → elegant
"premium": "elegant", "luxury": "elegant",
# Finance data → bloomberg
"股票": "bloomberg", "基金": "bloomberg", "投资": "bloomberg",
"交易": "bloomberg", "行情": "bloomberg", "K线": "bloomberg",
"stock": "bloomberg", "trading": "bloomberg", "portfolio_fin": "bloomberg",
# High-end / Luxury brand → chanel
"奢侈": "chanel", "高端": "chanel", "高级": "chanel",
}
def _match_style_keywords(text: str) -> Optional[str]:
"""Step 1: Match explicit style keywords. Returns style name or None."""
text_lower = text.lower()
best_match = None
best_score = 0
for style, keywords in _STYLE_KEYWORDS.items():
# Skip empty keywords: "" is a substring of every string and would always score.
score = sum(1 for kw in keywords if kw and kw.lower() in text_lower)
if score > best_score:
best_score = score
best_match = style
return best_match if best_score > 0 else None
def _infer_from_scene(text: str) -> Optional[str]:
"""Step 2: Infer style from scene/content keywords. Returns style name or None."""
text_lower = text.lower()
votes: Dict[str, int] = {}
for keyword, style in _SCENE_TO_STYLE.items():
if keyword.lower() in text_lower:
votes[style] = votes.get(style, 0) + 1
if not votes:
return None
return max(votes, key=votes.get)
# ============================================================
# §6 Public API
# ============================================================
def get_palette(style: str = "professional") -> _Palette:
"""Get a palette by style name. Falls back to professional."""
return PALETTE_REGISTRY.get(style, PROFESSIONAL)
def resolve_palette(prompt: str) -> _Palette:
"""
Auto-detect style from user prompt (three-step):
1. Explicit style keywords → direct match
2. Scene/content keywords → infer style
3. No match → professional (safe default)
"""
style = detect_style(prompt)
return get_palette(style)
def resolve_palette_with_info(prompt: str) -> Tuple[_Palette, str]:
"""Same as resolve_palette but also returns the detected style name."""
style = detect_style(prompt)
return get_palette(style), style
def detect_style(prompt: str) -> str:
"""
Detect style from prompt. Three-step priority:
1. Explicit style keywords
2. Scene keywords → infer style
3. Default: professional
"""
style = _match_style_keywords(prompt)
if style:
return style
style = _infer_from_scene(prompt)
if style:
return style
return "professional"
def list_available() -> list[str]:
"""Return list of available style names (no aliases)."""
# Return only canonical names, not aliases
canonical = [
"professional", "warm", "elegant", "creative",
"muji", "aesop", "kinfolk", "celine", "bottega",
"chanel", "bloomberg", "original_blue",
]
return canonical
def apply_palette(palette: _Palette, module_globals: dict):
"""
Inject palette tokens into a module's global namespace.
Designed to be called from base.py to override its color constants.
"""
key_map = {
"PRIMARY": "PRIMARY",
"PRIMARY_LIGHT": "PRIMARY_LIGHT",
"SECONDARY": "SECONDARY",
"ACCENT_POSITIVE": "ACCENT_POSITIVE",
"ACCENT_NEGATIVE": "ACCENT_NEGATIVE",
"ACCENT_WARNING": "ACCENT_WARNING",
"NEUTRAL_900": "NEUTRAL_900",
"NEUTRAL_600": "NEUTRAL_600",
"NEUTRAL_200": "NEUTRAL_200",
"NEUTRAL_100": "NEUTRAL_100",
"NEUTRAL_50": "NEUTRAL_50",
"NEUTRAL_0": "NEUTRAL_0",
"CHART_COLORS": "CHART_COLORS",
"HEADER_TEXT": "HEADER_TEXT",
}
for palette_key, global_key in key_map.items():
if palette_key in palette:
module_globals[global_key] = palette[palette_key]
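The three-step lookup performed by `detect_style()` can be illustrated with a reduced, self-contained sketch. The keyword tables below are tiny hypothetical subsets, not the real `_STYLE_KEYWORDS` and `_SCENE_TO_STYLE`, and the function name `detect` is an assumption made for illustration. Note that in Python `"" in text` is always true, so the sketch skips empty keywords to keep one style from matching every prompt.

```python
from typing import Dict, List

# Hypothetical, heavily reduced tables for illustration only.
STYLE_KEYWORDS: Dict[str, List[str]] = {
    "warm": ["warm", "温暖"],
    "bloomberg": ["bloomberg", "terminal"],
}
SCENE_TO_STYLE: Dict[str, str] = {
    "sales": "warm",
    "budget": "professional",
    "stock": "bloomberg",
}


def detect(prompt: str, default: str = "professional") -> str:
    """Explicit style keywords first, then scene keywords, then the default."""
    text = prompt.lower()

    # Step 1: count explicit style keyword hits; empty keywords are ignored.
    style_votes: Dict[str, int] = {}
    for style, keywords in STYLE_KEYWORDS.items():
        hits = sum(1 for kw in keywords if kw and kw.lower() in text)
        if hits:
            style_votes[style] = hits
    if style_votes:
        return max(style_votes, key=style_votes.get)

    # Step 2: every matching scene keyword casts one vote for its style.
    scene_votes: Dict[str, int] = {}
    for keyword, style in SCENE_TO_STYLE.items():
        if keyword.lower() in text:
            scene_votes[style] = scene_votes.get(style, 0) + 1
    if scene_votes:
        return max(scene_votes, key=scene_votes.get)

    # Step 3: nothing matched, fall back to the safe default.
    return default


if __name__ == "__main__":
    for prompt in ("a warm stock dashboard", "Q3 stock overview", "hello"):
        print(prompt, "->", detect(prompt))
```

An explicit style word wins even when scene words point elsewhere, and ties go to whichever style is listed first in the table, which mirrors the strict `>` comparison in `_match_style_keywords()`.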

1299
skills/xlsx/xlsx.py Executable file

File diff suppressed because it is too large