Files
mantle-ai-trader/skills/xlsx/scenes/analyze.md
2026-06-06 05:21:10 +00:00

3.6 KiB
Executable File

Scene: Data Analysis → Excel Output

When This Applies

User wants to analyze data (statistics, trends, comparisons, pivots, aggregation) and receive results as an Excel file — possibly with charts, summary tables, or dashboards.

This scene bridges pandas analysis with openpyxl output. The deliverable is always an .xlsx file.

Workflow

1. LOAD       → Read input data (CSV/XLSX/JSON/DB)
2. EXPLORE    → Understand structure, quality, distributions
3. ANALYZE    → Compute metrics, aggregations, statistical tests
4. DESIGN     → Plan Excel output (sheets, charts, KPIs)
5. BUILD      → Write analysis results to .xlsx with formatting
6. CHART      → Add charts (Excel-native or embedded matplotlib)
7. QA         → recalc → audit → scan → chart-verify
8. PIVOT      → If needed, run xlsx.py pivot as final step
9. VALIDATE   → validate → deliver

Analysis Framework

Phase A: Problem Framing

  • What question is the user trying to answer?
  • Who will consume this output? (executive summary vs. detailed analysis)
  • What decisions will be made based on this data?

Phase B: Data Quality Assessment

  • Missing values: count, pattern (random vs. systematic)
  • Outliers: statistical detection (IQR, z-score)
  • Data types: numeric vs. categorical, date parsing
  • Duplicates: exact and fuzzy

Phase C: Exploratory Analysis

  • Distributions: histograms, box plots for key variables
  • Correlations: pairwise for numeric columns
  • Segmentation: group-by analysis on categorical dimensions
  • Time patterns: trends, seasonality if time-series data

Phase D: Insight Extraction

  • Rank findings by business impact, not statistical significance
  • Each insight must be actionable — "so what?" test
  • Cross-validate: check the same insight from a different angle

Phase E: Cross-Validation

  • Sanity check totals against known benchmarks
  • Verify computed metrics with alternative formulas
  • Document any assumptions or limitations in the output

Industry-specific frameworks:

  • Finance: Variance analysis → trend decomposition → ratio analysis → peer comparison
  • Marketing: Funnel analysis → cohort analysis → attribution → ROI calculation
  • Operations: Throughput analysis → bottleneck identification → utilization rates → SLA compliance

Multi-Sheet Report Layout

Sheet 1: "Dashboard"     — KPI cards + summary chart
Sheet 2: "Detail"        — Full analysis table with formatting
Sheet 3: "Charts"        — Additional visualizations
Sheet 4: "Raw Data"      — Original data for reference (tab color: gray)

KPI Summary Card Pattern

Place 4-6 KPI metrics at the top of Dashboard sheet (row 3-4), each spaced 3 columns apart. Include label (small, gray) and value (large, bold, themed) with appropriate number format.


PivotTable Decision

Situation Use
Need interactive PivotTable in Excel "$XLSX_SKILL_DIR/xlsx.py" pivot
Just need a summary table (static) pandas pivot_table → openpyxl
Simple aggregation (1 dimension) pandas groupby → openpyxl

Trigger phrases: summarize, aggregate, group by, categorize, breakdown, distribution, tally, totals per, cross-tab, 汇总, 透视, 分类统计, 交叉分析


Data Provenance

When analysis uses external data, create a "Sources" sheet (tab color: PRIMARY) with columns: Data Description | Source Name | Source URL | Access Date.

Skip when user provides all data directly.


Code Recipes

For specific code patterns (aggregation, time series, comparison, cleaning, bridge pattern), load scenes/analyze-recipes.md on demand.