v1.4.0: Major Skills Expansion - 75 Total Skills

2026-02-26 12:19:22 +04:00
parent 029bad83b6
commit e2a7099273
392 changed files with 55962 additions and 2 deletions
--- a/skills/superpowers/docs/testing.md
+++ b/skills/superpowers/docs/testing.md
@@ -0,0 +1,303 @@
+# Testing Superpowers Skills
+
+This document describes how to test Superpowers skills, particularly the integration tests for complex skills like `subagent-driven-development`.
+
+## Overview
+
+Testing skills that involve subagents, workflows, and complex interactions requires running actual Claude Code sessions in headless mode and verifying their behavior through session transcripts.
+
+## Test Structure
+
+```
+tests/
+├── claude-code/
+│   ├── test-helpers.sh                    # Shared test utilities
+│   ├── test-subagent-driven-development-integration.sh
+│   ├── analyze-token-usage.py             # Token analysis tool
+│   └── run-skill-tests.sh                 # Test runner (if exists)
+```
+
+## Running Tests
+
+### Integration Tests
+
+Integration tests execute real Claude Code sessions with actual skills:
+
+```bash
+# Run the subagent-driven-development integration test
+cd tests/claude-code
+./test-subagent-driven-development-integration.sh
+```
+
+**Note:** Integration tests can take 10-30 minutes as they execute real implementation plans with multiple subagents.
+
+### Requirements
+
+- Must run from the **superpowers plugin directory** (not from temp directories)
+- Claude Code must be installed and available as the `claude` command
+- Local dev marketplace must be enabled: `"superpowers@superpowers-dev": true` under `enabledPlugins` in `~/.claude/settings.json`
+
+## Integration Test: subagent-driven-development
+
+### What It Tests
+
+The integration test verifies the `subagent-driven-development` skill correctly:
+
+1. **Plan Loading**: Reads the plan once at the beginning
+2. **Full Task Text**: Provides complete task descriptions to subagents (doesn't make them read files)
+3. **Self-Review**: Ensures subagents perform self-review before reporting
+4. **Review Order**: Runs spec compliance review before code quality review
+5. **Review Loops**: Uses review loops when issues are found
+6. **Independent Verification**: Spec reviewer reads the code independently instead of trusting implementer reports
+
+### How It Works
+
+1. **Setup**: Creates a temporary Node.js project with a minimal implementation plan
+2. **Execution**: Runs Claude Code in headless mode with the skill
+3. **Verification**: Parses the session transcript (`.jsonl` file) to verify:
+   - Skill tool was invoked
+   - Subagents were dispatched (Task tool)
+   - TodoWrite was used for tracking
+   - Implementation files were created
+   - Tests pass
+   - Git commits show proper workflow
+4. **Token Analysis**: Shows token usage breakdown by subagent
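
The transcript checks in step 3 can be approximated in a few lines of Python. This is a minimal sketch, not the test script's actual implementation. It assumes that tool calls appear as `tool_use` blocks with a `name` field inside `message.content` of assistant records, which this document does not confirm. The function name `count_tool_uses` and the sample records are hypothetical.

```python
import json
from collections import Counter


def count_tool_uses(lines):
    """Count tool invocations by name across JSONL transcript lines.

    Assumption: each tool call is a {"type": "tool_use", "name": ...} block
    inside message.content of an assistant record.
    """
    counts = Counter()
    for line in lines:
        line = line.strip()
        if not line:
            continue  # tolerate blank lines
        record = json.loads(line)
        if record.get("type") != "assistant":
            continue
        content = record.get("message", {}).get("content", [])
        if not isinstance(content, list):
            continue
        for block in content:
            if isinstance(block, dict) and block.get("type") == "tool_use":
                counts[block.get("name", "unknown")] += 1
    return counts


# Hypothetical two-record transcript, for illustration only.
sample = [
    '{"type": "assistant", "message": {"content": [{"type": "tool_use", "name": "Skill"}]}}',
    '{"type": "assistant", "message": {"content": [{"type": "tool_use", "name": "Task"}, {"type": "tool_use", "name": "Task"}]}}',
]
counts = count_tool_uses(sample)
print(counts["Skill"], counts["Task"], counts["TodoWrite"])
```

Parsing each line as JSON is more robust than a text match because it does not depend on key order or whitespace in the serialized record.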
+
+### Test Output
+
+```
+========================================
+ Integration Test: subagent-driven-development
+========================================
+
+Test project: /tmp/tmp.xyz123
+
+=== Verification Tests ===
+
+Test 1: Skill tool invoked...
+  [PASS] subagent-driven-development skill was invoked
+
+Test 2: Subagents dispatched...
+  [PASS] 7 subagents dispatched
+
+Test 3: Task tracking...
+  [PASS] TodoWrite used 5 time(s)
+
+Test 6: Implementation verification...
+  [PASS] src/math.js created
+  [PASS] add function exists
+  [PASS] multiply function exists
+  [PASS] test/math.test.js created
+  [PASS] Tests pass
+
+Test 7: Git commit history...
+  [PASS] Multiple commits created (3 total)
+
+Test 8: No extra features added...
+  [PASS] No extra features added
+
+=========================================
+ Token Usage Analysis
+=========================================
+
+Usage Breakdown:
+----------------------------------------------------------------------------------------------------
+Agent           Description                          Msgs      Input     Output      Cache     Cost
+----------------------------------------------------------------------------------------------------
+main            Main session (coordinator)             34         27      3,996  1,213,703 $   4.09
+3380c209        implementing Task 1: Create Add Function     1          2        787     24,989 $   0.09
+34b00fde        implementing Task 2: Create Multiply Function     1          4        644     25,114 $   0.09
+3801a732        reviewing whether an implementation matches...   1          5        703     25,742 $   0.09
+4c142934        doing a final code review...                    1          6        854     25,319 $   0.09
+5f017a42        a code reviewer. Review Task 2...               1          6        504     22,949 $   0.08
+a6b7fbe4        a code reviewer. Review Task 1...               1          6        515     22,534 $   0.08
+f15837c0        reviewing whether an implementation matches...   1          6        416     22,485 $   0.07
+----------------------------------------------------------------------------------------------------
+
+TOTALS:
+  Total messages:         41
+  Input tokens:           62
+  Output tokens:          8,419
+  Cache creation tokens:  132,742
+  Cache read tokens:      1,382,835
+
+  Total input (incl cache): 1,515,639
+  Total tokens:             1,524,058
+
+  Estimated cost: $4.67
+  (at $3/$15 per M tokens for input/output)
+
+========================================
+ Test Summary
+========================================
+
+STATUS: PASSED
+```
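
The totals in the sample output can be checked by hand. The sketch below recomputes them from the printed figures. It assumes that cache creation and cache read tokens are charged at the plain input rate, which is the reading that reproduces the printed estimate; the tool's actual formula may differ.

```python
# Figures copied from the TOTALS section of the sample output above.
input_tokens = 62
output_tokens = 8_419
cache_creation_tokens = 132_742
cache_read_tokens = 1_382_835

# Rates from the sample output: $3 / $15 per million input / output tokens.
INPUT_RATE_PER_M = 3.00
OUTPUT_RATE_PER_M = 15.00

total_input = input_tokens + cache_creation_tokens + cache_read_tokens
total_tokens = total_input + output_tokens

# Assumption: cache tokens are billed at the plain input rate.
estimated_cost = (
    total_input * INPUT_RATE_PER_M + output_tokens * OUTPUT_RATE_PER_M
) / 1_000_000

print(f"Total input (incl cache): {total_input:,}")
print(f"Total tokens:             {total_tokens:,}")
print(f"Estimated cost: ${estimated_cost:.2f}")
```

The results match the sample: 1,515,639 total input tokens, 1,524,058 total tokens, and an estimate of $4.67.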
+
+## Token Analysis Tool
+
+### Usage
+
+Analyze token usage from any Claude Code session:
+
+```bash
+python3 tests/claude-code/analyze-token-usage.py ~/.claude/projects/<project-dir>/<session-id>.jsonl
+```
+
+### Finding Session Files
+
+Session transcripts are stored in `~/.claude/projects/` with the working directory path encoded:
+
+```bash
+# Example for /Users/jesse/Documents/GitHub/superpowers/superpowers
+SESSION_DIR="$HOME/.claude/projects/-Users-jesse-Documents-GitHub-superpowers-superpowers"
+
+# Find recent sessions
+ls -lt "$SESSION_DIR"/*.jsonl | head -5
+```
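
The same lookup can be done from Python. This is a sketch under one assumption taken from the example above: the directory name is the working directory path with every `/` replaced by `-`. Other characters may be encoded as well, and that is not handled here. The helper names `session_dir_for` and `recent_sessions` are hypothetical.

```python
from pathlib import Path


def session_dir_for(working_dir, home):
    """Return the transcript directory for a working directory.

    Assumption: the directory name is the absolute path with every "/"
    replaced by "-", as in the example above.
    """
    encoded = str(working_dir).replace("/", "-")
    return Path(home) / ".claude" / "projects" / encoded


def recent_sessions(session_dir, limit=5):
    """List up to `limit` .jsonl transcripts, newest first by modification time."""
    files = sorted(
        Path(session_dir).glob("*.jsonl"),
        key=lambda p: p.stat().st_mtime,
        reverse=True,
    )
    return files[:limit]


example = session_dir_for(
    "/Users/jesse/Documents/GitHub/superpowers/superpowers", "/Users/jesse"
)
print(example.name)
```

Note that the encoded name keeps its leading `-`, matching the `SESSION_DIR` value shown above.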
+
+### What It Shows
+
+- **Main session usage**: Token usage by the coordinator (you or main Claude instance)
+- **Per-subagent breakdown**: Each Task invocation with:
+  - Agent ID
+  - Description (extracted from prompt)
+  - Message count
+  - Input/output tokens
+  - Cache usage
+  - Estimated cost
+- **Totals**: Overall token usage and cost estimate
+
+### Understanding the Output
+
+- **High cache reads**: Good - means prompt caching is working
+- **Large input on main (mostly cache reads)**: Expected - the coordinator carries the full context, so its uncached input stays small while cache reads dominate
+- **Similar costs per subagent**: Expected - each gets similar task complexity
+- **Cost per task**: Typical range is $0.05-$0.15 per subagent depending on task
+
+## Troubleshooting
+
+### Skills Not Loading
+
+**Problem**: Skill not found when running headless tests
+
+**Solutions**:
+1. Ensure you're running FROM the superpowers directory: `cd /path/to/superpowers && tests/...`
+2. Check `~/.claude/settings.json` has `"superpowers@superpowers-dev": true` in `enabledPlugins`
+3. Verify skill exists in `skills/` directory
+
+### Permission Errors
+
+**Problem**: Claude blocked from writing files or accessing directories
+
+**Solutions**:
+1. Use `--permission-mode bypassPermissions` flag
+2. Use `--add-dir /path/to/temp/dir` to grant access to test directories
+3. Check file permissions on test directories
+
+### Test Timeouts
+
+**Problem**: Test takes too long and times out
+
+**Solutions**:
+1. Increase timeout: `timeout 1800 claude ...` (30 minutes)
+2. Check for infinite loops in skill logic
+3. Review subagent task complexity
+
+### Session File Not Found
+
+**Problem**: Can't find session transcript after test run
+
+**Solutions**:
+1. Check the correct project directory in `~/.claude/projects/`
+2. Use `find ~/.claude/projects -name "*.jsonl" -mmin -60` to find recent sessions
+3. Verify test actually ran (check for errors in test output)
+
+## Writing New Integration Tests
+
+### Template
+
+```bash
+#!/usr/bin/env bash
+set -euo pipefail
+
+SCRIPT_DIR="$(cd "$(dirname "$0")" && pwd)"
+source "$SCRIPT_DIR/test-helpers.sh"
+
+# Create test project
+TEST_PROJECT=$(create_test_project)
+trap 'cleanup_test_project "$TEST_PROJECT"' EXIT
+
+# Set up test files...
+cd "$TEST_PROJECT"
+
+# Run Claude with skill
+PROMPT="Your test prompt here"
+cd "$SCRIPT_DIR/../.." && timeout 1800 claude -p "$PROMPT" \
+  --allowed-tools=all \
+  --add-dir "$TEST_PROJECT" \
+  --permission-mode bypassPermissions \
+  2>&1 | tee "$TEST_PROJECT/output.txt"
+
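+# Find and analyze session (resolve ../.. first so the encoded name matches the real path)
+PLUGIN_DIR="$(cd "$SCRIPT_DIR/../.." && pwd)"
+# Replace every "/" with "-"; the leading "-" is kept, as in the SESSION_DIR example earlier
+WORKING_DIR_ESCAPED=$(echo "$PLUGIN_DIR" | sed 's|/|-|g')
+SESSION_DIR="$HOME/.claude/projects/$WORKING_DIR_ESCAPED"
+# Pick the most recently modified transcript, not the last one by filename
+SESSION_FILE=$(find "$SESSION_DIR" -name "*.jsonl" -type f -mmin -60 -exec ls -t {} + | head -1)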
+
+# Verify behavior by parsing session transcript
+if grep -q '"name":"Skill".*"skill":"your-skill-name"' "$SESSION_FILE"; then
+    echo "[PASS] Skill was invoked"
+else
+    echo "[FAIL] Skill was not invoked"
+    exit 1
+fi
+
+# Show token analysis
+python3 "$SCRIPT_DIR/analyze-token-usage.py" "$SESSION_FILE"
+```
+
+### Best Practices
+
+1. **Always cleanup**: Use trap to cleanup temp directories
+2. **Parse transcripts**: Don't grep user-facing output - parse the `.jsonl` session file
+3. **Grant permissions**: Use `--permission-mode bypassPermissions` and `--add-dir`
+4. **Run from plugin dir**: Skills only load when running from the superpowers directory
+5. **Show token usage**: Always include token analysis for cost visibility
+6. **Test real behavior**: Verify actual files created, tests passing, commits made
+
+## Session Transcript Format
+
+Session transcripts are JSONL (JSON Lines) files where each line is a JSON object representing a message or tool result.
+
+### Key Fields
+
+```json
+{
+  "type": "assistant",
+  "message": {
+    "content": [...],
+    "usage": {
+      "input_tokens": 27,
+      "output_tokens": 3996,
+      "cache_read_input_tokens": 1213703
+    }
+  }
+}
+```
+
+### Tool Results
+
+```json
+{
+  "type": "user",
+  "toolUseResult": {
+    "agentId": "3380c209",
+    "usage": {
+      "input_tokens": 2,
+      "output_tokens": 787,
+      "cache_read_input_tokens": 24989
+    },
+    "prompt": "You are implementing Task 1...",
+    "content": [{"type": "text", "text": "..."}]
+  }
+}
+```
+
+The `agentId` field links to subagent sessions, and the `usage` field contains token usage for that specific subagent invocation.
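
A per-subagent breakdown can be built from these two fields alone. The following is a minimal sketch, not the code of `analyze-token-usage.py`: it relies only on `toolUseResult.agentId` and `toolUseResult.usage` as shown above and skips any record that lacks them. The function name `usage_by_agent` is hypothetical.

```python
import json


def usage_by_agent(lines):
    """Sum token usage per subagent from JSONL transcript lines.

    Uses only the fields shown above: toolUseResult.agentId and
    toolUseResult.usage. Records without an agentId are skipped.
    """
    totals = {}
    for line in lines:
        line = line.strip()
        if not line:
            continue
        record = json.loads(line)
        result = record.get("toolUseResult")
        if not isinstance(result, dict) or "agentId" not in result:
            continue
        usage = result.get("usage") or {}
        agent = totals.setdefault(
            result["agentId"], {"input": 0, "output": 0, "cache_read": 0}
        )
        agent["input"] += usage.get("input_tokens", 0)
        agent["output"] += usage.get("output_tokens", 0)
        agent["cache_read"] += usage.get("cache_read_input_tokens", 0)
    return totals


# One record built from the example values above.
sample = [
    json.dumps({
        "type": "user",
        "toolUseResult": {
            "agentId": "3380c209",
            "usage": {
                "input_tokens": 2,
                "output_tokens": 787,
                "cache_read_input_tokens": 24989,
            },
        },
    })
]
print(usage_by_agent(sample))
```

Summing into a dictionary keyed by `agentId` means repeated results for the same subagent accumulate instead of overwriting each other.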