skills/xlsx/scenes/analyze.md
# Scene: Data Analysis → Excel Output

## When This Applies

The user wants to analyze data (statistics, trends, comparisons, pivots, aggregation) and receive the results as an Excel file, possibly with charts, summary tables, or dashboards.

This scene bridges **pandas analysis** with **openpyxl output**. The deliverable is always an .xlsx file.
## Workflow

```
1. LOAD → Read input data (CSV/XLSX/JSON/DB)
2. EXPLORE → Understand structure, quality, distributions
3. ANALYZE → Compute metrics, aggregations, statistical tests
4. DESIGN → Plan Excel output (sheets, charts, KPIs)
5. BUILD → Write analysis results to .xlsx with formatting
6. CHART → Add charts (Excel-native or embedded matplotlib)
7. QA → recalc → audit → scan → chart-verify
8. PIVOT → If needed, run xlsx.py pivot as final step
9. VALIDATE → validate → deliver
```
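As a rough illustration of steps 1, 3, 5 and 6 (LOAD, ANALYZE, BUILD, CHART), here is a minimal pandas + openpyxl sketch. It assumes a CSV with hypothetical `region` and `revenue` columns and a hypothetical `build_report` helper; the QA, PIVOT and VALIDATE steps rely on the skill's own `xlsx.py` tooling and are not reproduced here.

```python
import io

import pandas as pd
from openpyxl import Workbook
from openpyxl.chart import BarChart, Reference
from openpyxl.styles import Font


def build_report(df: pd.DataFrame) -> Workbook:
    """ANALYZE -> BUILD -> CHART for a hypothetical region/revenue dataset."""
    # ANALYZE: total revenue per region, largest first
    summary = (
        df.groupby("region", as_index=False)["revenue"]
        .sum()
        .sort_values("revenue", ascending=False)
    )

    # BUILD: bold header row, then one formatted row per region
    wb = Workbook()
    ws = wb.active
    ws.title = "Detail"
    ws.append(["Region", "Revenue"])
    for cell in ws[1]:
        cell.font = Font(bold=True)
    for region, revenue in summary.itertuples(index=False):
        ws.append([region, float(revenue)])
        ws.cell(row=ws.max_row, column=2).number_format = "#,##0"

    # CHART: Excel-native bar chart that references the cells just written
    chart = BarChart()
    chart.title = "Revenue by Region"
    data = Reference(ws, min_col=2, min_row=1, max_row=ws.max_row)
    categories = Reference(ws, min_col=1, min_row=2, max_row=ws.max_row)
    chart.add_data(data, titles_from_data=True)
    chart.set_categories(categories)
    ws.add_chart(chart, "D2")
    return wb


# LOAD: an in-memory CSV stands in for the user's input file
csv_text = "region,revenue\nNorth,120\nSouth,80\nNorth,30\nEast,50\n"
df = pd.read_csv(io.StringIO(csv_text))
wb = build_report(df)

buffer = io.BytesIO()  # or wb.save("analysis.xlsx") to write to disk
wb.save(buffer)
```

Because the chart points at worksheet ranges rather than at an image, it stays Excel-native and follows any later edits to the numbers.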
## Analysis Framework

### Phase A: Problem Framing

- What question is the user trying to answer?
- Who will consume this output? (executive summary vs. detailed analysis)
- What decisions will be made based on this data?
### Phase B: Data Quality Assessment

- Missing values: count, pattern (random vs. systematic)
- Outliers: statistical detection (IQR, z-score)
- Data types: numeric vs. categorical, date parsing
- Duplicates: exact and fuzzy
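The Phase B checks map onto a handful of pandas calls. In this sketch the `quality_report` name is hypothetical, the 1.5 × IQR fence and the |z| > 3 cut-off are conventional defaults assumed here rather than values the scene prescribes, and fuzzy duplicate detection (which needs a string-similarity step) is left out.

```python
import pandas as pd


def quality_report(df: pd.DataFrame, z_thresh: float = 3.0) -> dict:
    """Phase B summary: missing values, outliers, dtypes, exact duplicates."""
    numeric = df.select_dtypes(include="number")

    # IQR rule: flag values outside [Q1 - 1.5*IQR, Q3 + 1.5*IQR]
    q1, q3 = numeric.quantile(0.25), numeric.quantile(0.75)
    iqr = q3 - q1
    iqr_mask = (numeric < q1 - 1.5 * iqr) | (numeric > q3 + 1.5 * iqr)

    # z-score rule: distance from the mean in population standard deviations
    z = (numeric - numeric.mean()) / numeric.std(ddof=0)
    z_mask = z.abs() > z_thresh

    return {
        "missing": df.isna().sum().to_dict(),
        "iqr_outliers": iqr_mask.sum().to_dict(),
        "z_outliers": z_mask.sum().to_dict(),
        "dtypes": df.dtypes.astype(str).to_dict(),
        "exact_duplicates": int(df.duplicated().sum()),
    }


df = pd.DataFrame({
    "amount": [10, 12, 11, 13, 12, 500, None],
    "city": ["A", "B", "A", "C", "B", "A", "A"],
})
report = quality_report(df)
```

On this toy frame the IQR rule flags the 500 while the z-score rule flags nothing: with the population standard deviation, no value in a sample of n can score above √(n - 1), here about 2.24. Running both rules is cheap insurance on small samples.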
### Phase C: Exploratory Analysis

- Distributions: histograms and box plots for key variables
- Correlations: pairwise for numeric columns
- Segmentation: group-by analysis on categorical dimensions
- Time patterns: trends and seasonality (for time-series data)
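A compact sketch of the Phase C computations, assuming a frame with one date column, one categorical segment column and numeric measures (the `explore` helper and the column names are made up). Histograms and box plots are charting work, so this stops at the numbers behind them: `describe()` returns the quartiles a box plot draws.

```python
import pandas as pd


def explore(df: pd.DataFrame, segment: str, date_col: str, value: str):
    """Phase C: correlations, distribution summary, segments, monthly trend."""
    corr = df.select_dtypes(include="number").corr()  # pairwise Pearson correlation
    distribution = df[value].describe()  # count, mean, std, min, quartiles, max
    by_segment = df.groupby(segment)[value].agg(["count", "mean", "median", "sum"])
    monthly = df.set_index(date_col)[value].resample("MS").sum()  # month-start buckets
    return corr, distribution, by_segment, monthly


df = pd.DataFrame({
    "date": pd.to_datetime(["2024-01-05", "2024-01-20", "2024-02-03", "2024-03-15"]),
    "channel": ["web", "store", "web", "web"],
    "units": [1, 2, 3, 4],
    "revenue": [10.0, 20.0, 30.0, 40.0],
})
corr, distribution, by_segment, monthly = explore(df, "channel", "date", "revenue")
```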
### Phase D: Insight Extraction

- Rank findings by business impact, not statistical significance
- Each insight must be actionable: apply the "so what?" test
- Cross-validate: check the same insight from a different angle
### Phase E: Cross-Validation

- Sanity check totals against known benchmarks
- Verify computed metrics with alternative formulas
- Document any assumptions or limitations in the output
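Two of the Phase E checks can be expressed directly, under hypothetical helper names: segment subtotals should add back up to the grand total, and a weighted average computed by hand should agree with `numpy.average` using the same weights.

```python
import math

import numpy as np
import pandas as pd


def totals_reconcile(df: pd.DataFrame, group: str, value: str, rel_tol: float = 1e-9) -> bool:
    """True if the per-group subtotals add back up to the overall total."""
    subtotal_sum = df.groupby(group)[value].sum().sum()
    grand_total = df[value].sum()
    return math.isclose(subtotal_sum, grand_total, rel_tol=rel_tol)


def weighted_avg_two_ways(price: pd.Series, qty: pd.Series) -> tuple[float, float]:
    """Quantity-weighted average price via two independent formulas."""
    direct = float((price * qty).sum() / qty.sum())
    via_numpy = float(np.average(price, weights=qty))
    return direct, via_numpy


df = pd.DataFrame({"region": ["N", "S", None], "sales": [100.0, 50.0, 25.0]})
ok = totals_reconcile(df, "region", "sales")  # False: 150 in subtotals vs. 175 in total
```

The example frame fails the first check on purpose: `groupby` drops rows whose group key is missing (its default is `dropna=True`), a common reason subtotals and totals disagree in a report.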

**Industry-specific frameworks:**

- **Finance**: Variance analysis → trend decomposition → ratio analysis → peer comparison
- **Marketing**: Funnel analysis → cohort analysis → attribution → ROI calculation
- **Operations**: Throughput analysis → bottleneck identification → utilization rates → SLA compliance
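The first step of the Finance and Marketing chains can be sketched in pandas as below. The function names and input numbers are illustrative only, and the later steps (trend decomposition, attribution and so on) are out of scope for this sketch.

```python
import pandas as pd


def variance_table(actual: pd.Series, budget: pd.Series) -> pd.DataFrame:
    """Finance: actual vs. budget with absolute and percentage variance."""
    out = pd.DataFrame({"actual": actual, "budget": budget})
    out["variance"] = out["actual"] - out["budget"]
    out["variance_pct"] = out["variance"] / out["budget"]  # fraction; format as % in Excel
    return out


def funnel_conversion(stage_counts: pd.Series) -> pd.DataFrame:
    """Marketing: step-to-step and cumulative conversion for ordered funnel stages."""
    out = stage_counts.to_frame("count")
    out["step_rate"] = out["count"] / out["count"].shift(1)  # first stage has no prior step
    out["cumulative_rate"] = out["count"] / out["count"].iloc[0]
    return out


variance = variance_table(
    pd.Series([110.0, 90.0], index=["Q1", "Q2"]),
    pd.Series([100.0, 100.0], index=["Q1", "Q2"]),
)
funnel = funnel_conversion(pd.Series([1000, 200, 50], index=["visit", "signup", "purchase"]))
```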

---

## Multi-Sheet Report Layout

```
Sheet 1: "Dashboard" - KPI cards + summary chart
Sheet 2: "Detail" - Full analysis table with formatting
Sheet 3: "Charts" - Additional visualizations
Sheet 4: "Raw Data" - Original data for reference (tab color: gray)
```
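The four-sheet layout could be scaffolded with openpyxl roughly as follows. The `create_report_skeleton` name and the gray hex value are assumptions, not values defined by the skill.

```python
import pandas as pd
from openpyxl import Workbook

RAW_TAB_GRAY = "808080"  # hypothetical gray; swap in the skill's palette value if it has one


def create_report_skeleton(raw: pd.DataFrame) -> Workbook:
    """Create the Dashboard/Detail/Charts/Raw Data layout and copy raw rows in."""
    wb = Workbook()
    wb.active.title = "Dashboard"  # reuse the default first sheet
    wb.create_sheet("Detail")
    wb.create_sheet("Charts")
    raw_ws = wb.create_sheet("Raw Data")
    raw_ws.sheet_properties.tabColor = RAW_TAB_GRAY
    raw_ws.append(list(raw.columns))  # header row
    for row in raw.itertuples(index=False):
        raw_ws.append(list(row))
    return wb


wb = create_report_skeleton(
    pd.DataFrame({"region": ["North", "South"], "revenue": [150, 80]})
)
```

Creating every sheet up front fixes the tab order the layout describes; later steps only fill in Dashboard, Detail and Charts.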
### KPI Summary Card Pattern

Place 4-6 KPI metrics at the top of the Dashboard sheet (rows 3-4), each spaced 3 columns apart. Each card has a label (small, gray) and a value (large, bold, in the theme color) with an appropriate number format.
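One way to lay out the KPI cards with openpyxl, assuming the first card starts in column B (so four cards land in B, E, H and K), a 9 pt gray label and a 20 pt bold value. `add_kpi_cards` is a hypothetical helper, `THEME_ACCENT` is a placeholder for whatever theme color the skill uses, and the KPI values are illustrative.

```python
from openpyxl import Workbook
from openpyxl.styles import Font

LABEL_GRAY = "808080"    # hypothetical label color
THEME_ACCENT = "1F4E79"  # hypothetical stand-in for the skill's theme color


def add_kpi_cards(ws, kpis, start_col=2, label_row=3, value_row=4, step=3):
    """Write (label, value, number_format) triples as KPI cards `step` columns apart."""
    for i, (label, value, number_format) in enumerate(kpis):
        col = start_col + i * step
        label_cell = ws.cell(row=label_row, column=col, value=label)
        label_cell.font = Font(size=9, color=LABEL_GRAY)  # small, gray label
        value_cell = ws.cell(row=value_row, column=col, value=value)
        value_cell.font = Font(size=20, bold=True, color=THEME_ACCENT)  # large, bold value
        value_cell.number_format = number_format


wb = Workbook()
ws = wb.active
ws.title = "Dashboard"
add_kpi_cards(ws, [
    ("Total Revenue", 1250000, "$#,##0"),  # illustrative values only
    ("Gross Margin", 0.342, "0.0%"),
    ("Orders", 8412, "#,##0"),
    ("Avg Order Value", 148.6, "$#,##0.00"),
])
```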

---

## PivotTable Decision
| Situation | Use |
|-----------|-----|
| Need interactive PivotTable in Excel | `"$XLSX_SKILL_DIR/xlsx.py" pivot` |
| Just need a summary table (static) | pandas `pivot_table` → openpyxl |
| Simple aggregation (1 dimension) | pandas `groupby` → openpyxl |

**Trigger phrases**: summarize, aggregate, group by, categorize, breakdown, distribution, tally, totals per, cross-tab, 汇总 (summarize), 透视 (pivot), 分类统计 (statistics by category), 交叉分析 (cross-analysis)
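For the static-summary row of the table, a sketch of the pandas `pivot_table` → openpyxl route follows. It writes plain cells, not an interactive PivotTable (that case goes through `xlsx.py pivot` and is not shown); the `write_static_pivot` helper, the sample data and the `Total` margins are assumptions.

```python
import pandas as pd
from openpyxl import Workbook


def write_static_pivot(wb, df, index, columns, values, title="Summary"):
    """Write a static cross-tab (plain cells, not an interactive PivotTable)."""
    table = pd.pivot_table(
        df, index=index, columns=columns, values=values,
        aggfunc="sum", fill_value=0, margins=True, margins_name="Total",
    )
    ws = wb.create_sheet(title)
    ws.append([index] + [str(c) for c in table.columns])  # header row
    for label, row in table.iterrows():
        ws.append([label] + row.tolist())  # one row per index value, plus Total
    return ws


sales = pd.DataFrame({
    "region": ["North", "North", "South"],
    "quarter": ["Q1", "Q2", "Q1"],
    "revenue": [10, 20, 5],
})
wb = Workbook()
ws = write_static_pivot(wb, sales, index="region", columns="quarter", values="revenue")
```

The one-dimension `groupby` route is the same loop over `df.groupby(...)[value].sum()` instead of a cross-tab.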

---

## Data Provenance
When the analysis uses external data, create a **"Sources" sheet** (tab color: `PRIMARY`) with the columns Data Description | Source Name | Source URL | Access Date.

Skip this sheet when the user provides all the data directly.
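A minimal sketch of the Sources sheet with openpyxl. `add_sources_sheet` is a hypothetical helper, `PRIMARY` here is a made-up hex stand-in for the skill's theme constant, and the example row uses the reserved `example.com` domain rather than a real source.

```python
from datetime import date

from openpyxl import Workbook
from openpyxl.styles import Font

PRIMARY = "1F4E79"  # hypothetical hex standing in for the skill's PRIMARY theme color
SOURCE_COLUMNS = ["Data Description", "Source Name", "Source URL", "Access Date"]


def add_sources_sheet(wb, sources):
    """Append a 'Sources' sheet with one row per (description, name, url, date)."""
    ws = wb.create_sheet("Sources")
    ws.sheet_properties.tabColor = PRIMARY
    ws.append(SOURCE_COLUMNS)
    for cell in ws[1]:
        cell.font = Font(bold=True)
    for description, name, url, accessed in sources:
        ws.append([description, name, url, accessed])
        ws.cell(row=ws.max_row, column=3).hyperlink = url  # make the URL clickable
        ws.cell(row=ws.max_row, column=4).number_format = "yyyy-mm-dd"
    return ws


wb = Workbook()
add_sources_sheet(wb, [
    ("Example indicator", "Example Source", "https://example.com/data", date(2024, 1, 15)),
])
```

Call it only when external sources were actually used, matching the skip rule above.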

---

## Code Recipes
For specific code patterns (aggregation, time series, comparison, cleaning, bridge pattern), load `scenes/analyze-recipes.md` on demand.