QwenClaw v2.0 - Complete Rebuild with ALL 81+ Skills
This commit is contained in:
158
skills/superpowers/tests/claude-code/README.md
Normal file
158
skills/superpowers/tests/claude-code/README.md
Normal file
@@ -0,0 +1,158 @@
|
||||
# Claude Code Skills Tests
|
||||
|
||||
Automated tests for superpowers skills using Claude Code CLI.
|
||||
|
||||
## Overview
|
||||
|
||||
This test suite verifies that skills are loaded correctly and Claude follows them as expected. Tests invoke Claude Code in headless mode (`claude -p`) and verify the behavior.
|
||||
|
||||
## Requirements
|
||||
|
||||
- Claude Code CLI installed and in PATH (`claude --version` should work)
|
||||
- Local superpowers plugin installed (see main README for installation)
|
||||
|
||||
## Running Tests
|
||||
|
||||
### Run all fast tests (recommended):
|
||||
```bash
|
||||
./run-skill-tests.sh
|
||||
```
|
||||
|
||||
### Run integration tests (slow, 10-30 minutes):
|
||||
```bash
|
||||
./run-skill-tests.sh --integration
|
||||
```
|
||||
|
||||
### Run specific test:
|
||||
```bash
|
||||
./run-skill-tests.sh --test test-subagent-driven-development.sh
|
||||
```
|
||||
|
||||
### Run with verbose output:
|
||||
```bash
|
||||
./run-skill-tests.sh --verbose
|
||||
```
|
||||
|
||||
### Set custom timeout:
|
||||
```bash
|
||||
./run-skill-tests.sh --timeout 1800 # 30 minutes for integration tests
|
||||
```
|
||||
|
||||
## Test Structure
|
||||
|
||||
### test-helpers.sh
|
||||
Common functions for skills testing:
|
||||
- `run_claude "prompt" [timeout]` - Run Claude with prompt
|
||||
- `assert_contains output pattern name` - Verify pattern exists
|
||||
- `assert_not_contains output pattern name` - Verify pattern absent
|
||||
- `assert_count output pattern count name` - Verify exact count
|
||||
- `assert_order output pattern_a pattern_b name` - Verify order
|
||||
- `create_test_project` - Create temp test directory
|
||||
- `create_test_plan project_dir` - Create sample plan file
|
||||
|
||||
### Test Files
|
||||
|
||||
Each test file:
|
||||
1. Sources `test-helpers.sh`
|
||||
2. Runs Claude Code with specific prompts
|
||||
3. Verifies expected behavior using assertions
|
||||
4. Returns 0 on success, non-zero on failure
|
||||
|
||||
## Example Test
|
||||
|
||||
```bash
|
||||
#!/usr/bin/env bash
|
||||
set -euo pipefail
|
||||
|
||||
SCRIPT_DIR="$(cd "$(dirname "$0")" && pwd)"
|
||||
source "$SCRIPT_DIR/test-helpers.sh"
|
||||
|
||||
echo "=== Test: My Skill ==="
|
||||
|
||||
# Ask Claude about the skill
|
||||
output=$(run_claude "What does the my-skill skill do?" 30)
|
||||
|
||||
# Verify response
|
||||
assert_contains "$output" "expected behavior" "Skill describes behavior"
|
||||
|
||||
echo "=== All tests passed ==="
|
||||
```
|
||||
|
||||
## Current Tests
|
||||
|
||||
### Fast Tests (run by default)
|
||||
|
||||
#### test-subagent-driven-development.sh
|
||||
Tests skill content and requirements (~2 minutes):
|
||||
- Skill loading and accessibility
|
||||
- Workflow ordering (spec compliance before code quality)
|
||||
- Self-review requirements documented
|
||||
- Plan reading efficiency documented
|
||||
- Spec compliance reviewer skepticism documented
|
||||
- Review loops documented
|
||||
- Task context provision documented
|
||||
|
||||
### Integration Tests (use --integration flag)
|
||||
|
||||
#### test-subagent-driven-development-integration.sh
|
||||
Full workflow execution test (~10-30 minutes):
|
||||
- Creates real test project with Node.js setup
|
||||
- Creates implementation plan with 2 tasks
|
||||
- Executes plan using subagent-driven-development
|
||||
- Verifies actual behaviors:
|
||||
- Plan read once at start (not per task)
|
||||
- Full task text provided in subagent prompts
|
||||
- Subagents perform self-review before reporting
|
||||
- Spec compliance review happens before code quality
|
||||
- Spec reviewer reads code independently
|
||||
- Working implementation is produced
|
||||
- Tests pass
|
||||
- Proper git commits created
|
||||
|
||||
**What it tests:**
|
||||
- The workflow actually works end-to-end
|
||||
- Our improvements are actually applied
|
||||
- Subagents follow the skill correctly
|
||||
- Final code is functional and tested
|
||||
|
||||
## Adding New Tests
|
||||
|
||||
1. Create new test file: `test-<skill-name>.sh`
|
||||
2. Source test-helpers.sh
|
||||
3. Write tests using `run_claude` and assertions
|
||||
4. Add to test list in `run-skill-tests.sh`
|
||||
5. Make executable: `chmod +x test-<skill-name>.sh`
|
||||
|
||||
## Timeout Considerations
|
||||
|
||||
- Default timeout: 5 minutes per test
|
||||
- Claude Code may take time to respond
|
||||
- Adjust with `--timeout` if needed
|
||||
- Tests should be focused to avoid long runs
|
||||
|
||||
## Debugging Failed Tests
|
||||
|
||||
With `--verbose`, you'll see full Claude output:
|
||||
```bash
|
||||
./run-skill-tests.sh --verbose --test test-subagent-driven-development.sh
|
||||
```
|
||||
|
||||
Without verbose, only failures show output.
|
||||
|
||||
## CI/CD Integration
|
||||
|
||||
To run in CI:
|
||||
```bash
|
||||
# Run with explicit timeout for CI environments
|
||||
./run-skill-tests.sh --timeout 900
|
||||
|
||||
# Exit code 0 = success, non-zero = failure
|
||||
```
|
||||
|
||||
## Notes
|
||||
|
||||
- Tests verify skill *instructions*, not full execution
|
||||
- Full workflow tests would be very slow
|
||||
- Focus on verifying key skill requirements
|
||||
- Tests should be deterministic
|
||||
- Avoid testing implementation details
|
||||
168
skills/superpowers/tests/claude-code/analyze-token-usage.py
Normal file
168
skills/superpowers/tests/claude-code/analyze-token-usage.py
Normal file
@@ -0,0 +1,168 @@
|
||||
#!/usr/bin/env python3
|
||||
"""
|
||||
Analyze token usage from Claude Code session transcripts.
|
||||
Breaks down usage by main session and individual subagents.
|
||||
"""
|
||||
|
||||
import json
|
||||
import sys
|
||||
from pathlib import Path
|
||||
from collections import defaultdict
|
||||
|
||||
def analyze_main_session(filepath):
|
||||
"""Analyze a session file and return token usage broken down by agent."""
|
||||
main_usage = {
|
||||
'input_tokens': 0,
|
||||
'output_tokens': 0,
|
||||
'cache_creation': 0,
|
||||
'cache_read': 0,
|
||||
'messages': 0
|
||||
}
|
||||
|
||||
# Track usage per subagent
|
||||
subagent_usage = defaultdict(lambda: {
|
||||
'input_tokens': 0,
|
||||
'output_tokens': 0,
|
||||
'cache_creation': 0,
|
||||
'cache_read': 0,
|
||||
'messages': 0,
|
||||
'description': None
|
||||
})
|
||||
|
||||
with open(filepath, 'r') as f:
|
||||
for line in f:
|
||||
try:
|
||||
data = json.loads(line)
|
||||
|
||||
# Main session assistant messages
|
||||
if data.get('type') == 'assistant' and 'message' in data:
|
||||
main_usage['messages'] += 1
|
||||
msg_usage = data['message'].get('usage', {})
|
||||
main_usage['input_tokens'] += msg_usage.get('input_tokens', 0)
|
||||
main_usage['output_tokens'] += msg_usage.get('output_tokens', 0)
|
||||
main_usage['cache_creation'] += msg_usage.get('cache_creation_input_tokens', 0)
|
||||
main_usage['cache_read'] += msg_usage.get('cache_read_input_tokens', 0)
|
||||
|
||||
# Subagent tool results
|
||||
if data.get('type') == 'user' and 'toolUseResult' in data:
|
||||
result = data['toolUseResult']
|
||||
if 'usage' in result and 'agentId' in result:
|
||||
agent_id = result['agentId']
|
||||
usage = result['usage']
|
||||
|
||||
# Get description from prompt if available
|
||||
if subagent_usage[agent_id]['description'] is None:
|
||||
prompt = result.get('prompt', '')
|
||||
# Extract first line as description
|
||||
first_line = prompt.split('\n')[0] if prompt else f"agent-{agent_id}"
|
||||
if first_line.startswith('You are '):
|
||||
first_line = first_line[8:] # Remove "You are "
|
||||
subagent_usage[agent_id]['description'] = first_line[:60]
|
||||
|
||||
subagent_usage[agent_id]['messages'] += 1
|
||||
subagent_usage[agent_id]['input_tokens'] += usage.get('input_tokens', 0)
|
||||
subagent_usage[agent_id]['output_tokens'] += usage.get('output_tokens', 0)
|
||||
subagent_usage[agent_id]['cache_creation'] += usage.get('cache_creation_input_tokens', 0)
|
||||
subagent_usage[agent_id]['cache_read'] += usage.get('cache_read_input_tokens', 0)
|
||||
except:
|
||||
pass
|
||||
|
||||
return main_usage, dict(subagent_usage)
|
||||
|
||||
def format_tokens(n):
|
||||
"""Format token count with thousands separators."""
|
||||
return f"{n:,}"
|
||||
|
||||
def calculate_cost(usage, input_cost_per_m=3.0, output_cost_per_m=15.0):
|
||||
"""Calculate estimated cost in dollars."""
|
||||
total_input = usage['input_tokens'] + usage['cache_creation'] + usage['cache_read']
|
||||
input_cost = total_input * input_cost_per_m / 1_000_000
|
||||
output_cost = usage['output_tokens'] * output_cost_per_m / 1_000_000
|
||||
return input_cost + output_cost
|
||||
|
||||
def main():
|
||||
if len(sys.argv) < 2:
|
||||
print("Usage: analyze-token-usage.py <session-file.jsonl>")
|
||||
sys.exit(1)
|
||||
|
||||
main_session_file = sys.argv[1]
|
||||
|
||||
if not Path(main_session_file).exists():
|
||||
print(f"Error: Session file not found: {main_session_file}")
|
||||
sys.exit(1)
|
||||
|
||||
# Analyze the session
|
||||
main_usage, subagent_usage = analyze_main_session(main_session_file)
|
||||
|
||||
print("=" * 100)
|
||||
print("TOKEN USAGE ANALYSIS")
|
||||
print("=" * 100)
|
||||
print()
|
||||
|
||||
# Print breakdown
|
||||
print("Usage Breakdown:")
|
||||
print("-" * 100)
|
||||
print(f"{'Agent':<15} {'Description':<35} {'Msgs':>5} {'Input':>10} {'Output':>10} {'Cache':>10} {'Cost':>8}")
|
||||
print("-" * 100)
|
||||
|
||||
# Main session
|
||||
cost = calculate_cost(main_usage)
|
||||
print(f"{'main':<15} {'Main session (coordinator)':<35} "
|
||||
f"{main_usage['messages']:>5} "
|
||||
f"{format_tokens(main_usage['input_tokens']):>10} "
|
||||
f"{format_tokens(main_usage['output_tokens']):>10} "
|
||||
f"{format_tokens(main_usage['cache_read']):>10} "
|
||||
f"${cost:>7.2f}")
|
||||
|
||||
# Subagents (sorted by agent ID)
|
||||
for agent_id in sorted(subagent_usage.keys()):
|
||||
usage = subagent_usage[agent_id]
|
||||
cost = calculate_cost(usage)
|
||||
desc = usage['description'] or f"agent-{agent_id}"
|
||||
print(f"{agent_id:<15} {desc:<35} "
|
||||
f"{usage['messages']:>5} "
|
||||
f"{format_tokens(usage['input_tokens']):>10} "
|
||||
f"{format_tokens(usage['output_tokens']):>10} "
|
||||
f"{format_tokens(usage['cache_read']):>10} "
|
||||
f"${cost:>7.2f}")
|
||||
|
||||
print("-" * 100)
|
||||
|
||||
# Calculate totals
|
||||
total_usage = {
|
||||
'input_tokens': main_usage['input_tokens'],
|
||||
'output_tokens': main_usage['output_tokens'],
|
||||
'cache_creation': main_usage['cache_creation'],
|
||||
'cache_read': main_usage['cache_read'],
|
||||
'messages': main_usage['messages']
|
||||
}
|
||||
|
||||
for usage in subagent_usage.values():
|
||||
total_usage['input_tokens'] += usage['input_tokens']
|
||||
total_usage['output_tokens'] += usage['output_tokens']
|
||||
total_usage['cache_creation'] += usage['cache_creation']
|
||||
total_usage['cache_read'] += usage['cache_read']
|
||||
total_usage['messages'] += usage['messages']
|
||||
|
||||
total_input = total_usage['input_tokens'] + total_usage['cache_creation'] + total_usage['cache_read']
|
||||
total_tokens = total_input + total_usage['output_tokens']
|
||||
total_cost = calculate_cost(total_usage)
|
||||
|
||||
print()
|
||||
print("TOTALS:")
|
||||
print(f" Total messages: {format_tokens(total_usage['messages'])}")
|
||||
print(f" Input tokens: {format_tokens(total_usage['input_tokens'])}")
|
||||
print(f" Output tokens: {format_tokens(total_usage['output_tokens'])}")
|
||||
print(f" Cache creation tokens: {format_tokens(total_usage['cache_creation'])}")
|
||||
print(f" Cache read tokens: {format_tokens(total_usage['cache_read'])}")
|
||||
print()
|
||||
print(f" Total input (incl cache): {format_tokens(total_input)}")
|
||||
print(f" Total tokens: {format_tokens(total_tokens)}")
|
||||
print()
|
||||
print(f" Estimated cost: ${total_cost:.2f}")
|
||||
print(" (at $3/$15 per M tokens for input/output)")
|
||||
print()
|
||||
print("=" * 100)
|
||||
|
||||
if __name__ == '__main__':
|
||||
main()
|
||||
187
skills/superpowers/tests/claude-code/run-skill-tests.sh
Normal file
187
skills/superpowers/tests/claude-code/run-skill-tests.sh
Normal file
@@ -0,0 +1,187 @@
|
||||
#!/usr/bin/env bash
|
||||
# Test runner for Claude Code skills
|
||||
# Tests skills by invoking Claude Code CLI and verifying behavior
|
||||
set -euo pipefail
|
||||
|
||||
SCRIPT_DIR="$(cd "$(dirname "$0")" && pwd)"
|
||||
cd "$SCRIPT_DIR"
|
||||
|
||||
echo "========================================"
|
||||
echo " Claude Code Skills Test Suite"
|
||||
echo "========================================"
|
||||
echo ""
|
||||
echo "Repository: $(cd ../.. && pwd)"
|
||||
echo "Test time: $(date)"
|
||||
echo "Claude version: $(claude --version 2>/dev/null || echo 'not found')"
|
||||
echo ""
|
||||
|
||||
# Check if Claude Code is available
|
||||
if ! command -v claude &> /dev/null; then
|
||||
echo "ERROR: Claude Code CLI not found"
|
||||
echo "Install Claude Code first: https://code.claude.com"
|
||||
exit 1
|
||||
fi
|
||||
|
||||
# Parse command line arguments
|
||||
VERBOSE=false
|
||||
SPECIFIC_TEST=""
|
||||
TIMEOUT=300 # Default 5 minute timeout per test
|
||||
RUN_INTEGRATION=false
|
||||
|
||||
while [[ $# -gt 0 ]]; do
|
||||
case $1 in
|
||||
--verbose|-v)
|
||||
VERBOSE=true
|
||||
shift
|
||||
;;
|
||||
--test|-t)
|
||||
SPECIFIC_TEST="$2"
|
||||
shift 2
|
||||
;;
|
||||
--timeout)
|
||||
TIMEOUT="$2"
|
||||
shift 2
|
||||
;;
|
||||
--integration|-i)
|
||||
RUN_INTEGRATION=true
|
||||
shift
|
||||
;;
|
||||
--help|-h)
|
||||
echo "Usage: $0 [options]"
|
||||
echo ""
|
||||
echo "Options:"
|
||||
echo " --verbose, -v Show verbose output"
|
||||
echo " --test, -t NAME Run only the specified test"
|
||||
echo " --timeout SECONDS Set timeout per test (default: 300)"
|
||||
echo " --integration, -i Run integration tests (slow, 10-30 min)"
|
||||
echo " --help, -h Show this help"
|
||||
echo ""
|
||||
echo "Tests:"
|
||||
echo " test-subagent-driven-development.sh Test skill loading and requirements"
|
||||
echo ""
|
||||
echo "Integration Tests (use --integration):"
|
||||
echo " test-subagent-driven-development-integration.sh Full workflow execution"
|
||||
exit 0
|
||||
;;
|
||||
*)
|
||||
echo "Unknown option: $1"
|
||||
echo "Use --help for usage information"
|
||||
exit 1
|
||||
;;
|
||||
esac
|
||||
done
|
||||
|
||||
# List of skill tests to run (fast unit tests)
|
||||
tests=(
|
||||
"test-subagent-driven-development.sh"
|
||||
)
|
||||
|
||||
# Integration tests (slow, full execution)
|
||||
integration_tests=(
|
||||
"test-subagent-driven-development-integration.sh"
|
||||
)
|
||||
|
||||
# Add integration tests if requested
|
||||
if [ "$RUN_INTEGRATION" = true ]; then
|
||||
tests+=("${integration_tests[@]}")
|
||||
fi
|
||||
|
||||
# Filter to specific test if requested
|
||||
if [ -n "$SPECIFIC_TEST" ]; then
|
||||
tests=("$SPECIFIC_TEST")
|
||||
fi
|
||||
|
||||
# Track results
|
||||
passed=0
|
||||
failed=0
|
||||
skipped=0
|
||||
|
||||
# Run each test
|
||||
for test in "${tests[@]}"; do
|
||||
echo "----------------------------------------"
|
||||
echo "Running: $test"
|
||||
echo "----------------------------------------"
|
||||
|
||||
test_path="$SCRIPT_DIR/$test"
|
||||
|
||||
if [ ! -f "$test_path" ]; then
|
||||
echo " [SKIP] Test file not found: $test"
|
||||
skipped=$((skipped + 1))
|
||||
continue
|
||||
fi
|
||||
|
||||
if [ ! -x "$test_path" ]; then
|
||||
echo " Making $test executable..."
|
||||
chmod +x "$test_path"
|
||||
fi
|
||||
|
||||
start_time=$(date +%s)
|
||||
|
||||
if [ "$VERBOSE" = true ]; then
|
||||
if timeout "$TIMEOUT" bash "$test_path"; then
|
||||
end_time=$(date +%s)
|
||||
duration=$((end_time - start_time))
|
||||
echo ""
|
||||
echo " [PASS] $test (${duration}s)"
|
||||
passed=$((passed + 1))
|
||||
else
|
||||
exit_code=$?
|
||||
end_time=$(date +%s)
|
||||
duration=$((end_time - start_time))
|
||||
echo ""
|
||||
if [ $exit_code -eq 124 ]; then
|
||||
echo " [FAIL] $test (timeout after ${TIMEOUT}s)"
|
||||
else
|
||||
echo " [FAIL] $test (${duration}s)"
|
||||
fi
|
||||
failed=$((failed + 1))
|
||||
fi
|
||||
else
|
||||
# Capture output for non-verbose mode
|
||||
if output=$(timeout "$TIMEOUT" bash "$test_path" 2>&1); then
|
||||
end_time=$(date +%s)
|
||||
duration=$((end_time - start_time))
|
||||
echo " [PASS] (${duration}s)"
|
||||
passed=$((passed + 1))
|
||||
else
|
||||
exit_code=$?
|
||||
end_time=$(date +%s)
|
||||
duration=$((end_time - start_time))
|
||||
if [ $exit_code -eq 124 ]; then
|
||||
echo " [FAIL] (timeout after ${TIMEOUT}s)"
|
||||
else
|
||||
echo " [FAIL] (${duration}s)"
|
||||
fi
|
||||
echo ""
|
||||
echo " Output:"
|
||||
echo "$output" | sed 's/^/ /'
|
||||
failed=$((failed + 1))
|
||||
fi
|
||||
fi
|
||||
|
||||
echo ""
|
||||
done
|
||||
|
||||
# Print summary
|
||||
echo "========================================"
|
||||
echo " Test Results Summary"
|
||||
echo "========================================"
|
||||
echo ""
|
||||
echo " Passed: $passed"
|
||||
echo " Failed: $failed"
|
||||
echo " Skipped: $skipped"
|
||||
echo ""
|
||||
|
||||
if [ "$RUN_INTEGRATION" = false ] && [ ${#integration_tests[@]} -gt 0 ]; then
|
||||
echo "Note: Integration tests were not run (they take 10-30 minutes)."
|
||||
echo "Use --integration flag to run full workflow execution tests."
|
||||
echo ""
|
||||
fi
|
||||
|
||||
if [ $failed -gt 0 ]; then
|
||||
echo "STATUS: FAILED"
|
||||
exit 1
|
||||
else
|
||||
echo "STATUS: PASSED"
|
||||
exit 0
|
||||
fi
|
||||
202
skills/superpowers/tests/claude-code/test-helpers.sh
Normal file
202
skills/superpowers/tests/claude-code/test-helpers.sh
Normal file
@@ -0,0 +1,202 @@
|
||||
#!/usr/bin/env bash
|
||||
# Helper functions for Claude Code skill tests
|
||||
|
||||
# Run Claude Code with a prompt and capture output
|
||||
# Usage: run_claude "prompt text" [timeout_seconds] [allowed_tools]
|
||||
run_claude() {
|
||||
local prompt="$1"
|
||||
local timeout="${2:-60}"
|
||||
local allowed_tools="${3:-}"
|
||||
local output_file=$(mktemp)
|
||||
|
||||
# Build command
|
||||
local cmd="claude -p \"$prompt\""
|
||||
if [ -n "$allowed_tools" ]; then
|
||||
cmd="$cmd --allowed-tools=$allowed_tools"
|
||||
fi
|
||||
|
||||
# Run Claude in headless mode with timeout
|
||||
if timeout "$timeout" bash -c "$cmd" > "$output_file" 2>&1; then
|
||||
cat "$output_file"
|
||||
rm -f "$output_file"
|
||||
return 0
|
||||
else
|
||||
local exit_code=$?
|
||||
cat "$output_file" >&2
|
||||
rm -f "$output_file"
|
||||
return $exit_code
|
||||
fi
|
||||
}
|
||||
|
||||
# Check if output contains a pattern
|
||||
# Usage: assert_contains "output" "pattern" "test name"
|
||||
assert_contains() {
|
||||
local output="$1"
|
||||
local pattern="$2"
|
||||
local test_name="${3:-test}"
|
||||
|
||||
if echo "$output" | grep -q "$pattern"; then
|
||||
echo " [PASS] $test_name"
|
||||
return 0
|
||||
else
|
||||
echo " [FAIL] $test_name"
|
||||
echo " Expected to find: $pattern"
|
||||
echo " In output:"
|
||||
echo "$output" | sed 's/^/ /'
|
||||
return 1
|
||||
fi
|
||||
}
|
||||
|
||||
# Check if output does NOT contain a pattern
|
||||
# Usage: assert_not_contains "output" "pattern" "test name"
|
||||
assert_not_contains() {
|
||||
local output="$1"
|
||||
local pattern="$2"
|
||||
local test_name="${3:-test}"
|
||||
|
||||
if echo "$output" | grep -q "$pattern"; then
|
||||
echo " [FAIL] $test_name"
|
||||
echo " Did not expect to find: $pattern"
|
||||
echo " In output:"
|
||||
echo "$output" | sed 's/^/ /'
|
||||
return 1
|
||||
else
|
||||
echo " [PASS] $test_name"
|
||||
return 0
|
||||
fi
|
||||
}
|
||||
|
||||
# Check if output matches a count
|
||||
# Usage: assert_count "output" "pattern" expected_count "test name"
|
||||
assert_count() {
|
||||
local output="$1"
|
||||
local pattern="$2"
|
||||
local expected="$3"
|
||||
local test_name="${4:-test}"
|
||||
|
||||
local actual=$(echo "$output" | grep -c "$pattern" || echo "0")
|
||||
|
||||
if [ "$actual" -eq "$expected" ]; then
|
||||
echo " [PASS] $test_name (found $actual instances)"
|
||||
return 0
|
||||
else
|
||||
echo " [FAIL] $test_name"
|
||||
echo " Expected $expected instances of: $pattern"
|
||||
echo " Found $actual instances"
|
||||
echo " In output:"
|
||||
echo "$output" | sed 's/^/ /'
|
||||
return 1
|
||||
fi
|
||||
}
|
||||
|
||||
# Check if pattern A appears before pattern B
|
||||
# Usage: assert_order "output" "pattern_a" "pattern_b" "test name"
|
||||
assert_order() {
|
||||
local output="$1"
|
||||
local pattern_a="$2"
|
||||
local pattern_b="$3"
|
||||
local test_name="${4:-test}"
|
||||
|
||||
# Get line numbers where patterns appear
|
||||
local line_a=$(echo "$output" | grep -n "$pattern_a" | head -1 | cut -d: -f1)
|
||||
local line_b=$(echo "$output" | grep -n "$pattern_b" | head -1 | cut -d: -f1)
|
||||
|
||||
if [ -z "$line_a" ]; then
|
||||
echo " [FAIL] $test_name: pattern A not found: $pattern_a"
|
||||
return 1
|
||||
fi
|
||||
|
||||
if [ -z "$line_b" ]; then
|
||||
echo " [FAIL] $test_name: pattern B not found: $pattern_b"
|
||||
return 1
|
||||
fi
|
||||
|
||||
if [ "$line_a" -lt "$line_b" ]; then
|
||||
echo " [PASS] $test_name (A at line $line_a, B at line $line_b)"
|
||||
return 0
|
||||
else
|
||||
echo " [FAIL] $test_name"
|
||||
echo " Expected '$pattern_a' before '$pattern_b'"
|
||||
echo " But found A at line $line_a, B at line $line_b"
|
||||
return 1
|
||||
fi
|
||||
}
|
||||
|
||||
# Create a temporary test project directory
|
||||
# Usage: test_project=$(create_test_project)
|
||||
create_test_project() {
|
||||
local test_dir=$(mktemp -d)
|
||||
echo "$test_dir"
|
||||
}
|
||||
|
||||
# Cleanup test project
|
||||
# Usage: cleanup_test_project "$test_dir"
|
||||
cleanup_test_project() {
|
||||
local test_dir="$1"
|
||||
if [ -d "$test_dir" ]; then
|
||||
rm -rf "$test_dir"
|
||||
fi
|
||||
}
|
||||
|
||||
# Create a simple plan file for testing
|
||||
# Usage: create_test_plan "$project_dir" "$plan_name"
|
||||
create_test_plan() {
|
||||
local project_dir="$1"
|
||||
local plan_name="${2:-test-plan}"
|
||||
local plan_file="$project_dir/docs/plans/$plan_name.md"
|
||||
|
||||
mkdir -p "$(dirname "$plan_file")"
|
||||
|
||||
cat > "$plan_file" <<'EOF'
|
||||
# Test Implementation Plan
|
||||
|
||||
## Task 1: Create Hello Function
|
||||
|
||||
Create a simple hello function that returns "Hello, World!".
|
||||
|
||||
**File:** `src/hello.js`
|
||||
|
||||
**Implementation:**
|
||||
```javascript
|
||||
export function hello() {
|
||||
return "Hello, World!";
|
||||
}
|
||||
```
|
||||
|
||||
**Tests:** Write a test that verifies the function returns the expected string.
|
||||
|
||||
**Verification:** `npm test`
|
||||
|
||||
## Task 2: Create Goodbye Function
|
||||
|
||||
Create a goodbye function that takes a name and returns a goodbye message.
|
||||
|
||||
**File:** `src/goodbye.js`
|
||||
|
||||
**Implementation:**
|
||||
```javascript
|
||||
export function goodbye(name) {
|
||||
return `Goodbye, ${name}!`;
|
||||
}
|
||||
```
|
||||
|
||||
**Tests:** Write tests for:
|
||||
- Default name
|
||||
- Custom name
|
||||
- Edge cases (empty string, null)
|
||||
|
||||
**Verification:** `npm test`
|
||||
EOF
|
||||
|
||||
echo "$plan_file"
|
||||
}
|
||||
|
||||
# Export functions for use in tests
|
||||
export -f run_claude
|
||||
export -f assert_contains
|
||||
export -f assert_not_contains
|
||||
export -f assert_count
|
||||
export -f assert_order
|
||||
export -f create_test_project
|
||||
export -f cleanup_test_project
|
||||
export -f create_test_plan
|
||||
@@ -0,0 +1,314 @@
|
||||
#!/usr/bin/env bash
|
||||
# Integration Test: subagent-driven-development workflow
|
||||
# Actually executes a plan and verifies the new workflow behaviors
|
||||
set -euo pipefail
|
||||
|
||||
SCRIPT_DIR="$(cd "$(dirname "$0")" && pwd)"
|
||||
source "$SCRIPT_DIR/test-helpers.sh"
|
||||
|
||||
echo "========================================"
|
||||
echo " Integration Test: subagent-driven-development"
|
||||
echo "========================================"
|
||||
echo ""
|
||||
echo "This test executes a real plan using the skill and verifies:"
|
||||
echo " 1. Plan is read once (not per task)"
|
||||
echo " 2. Full task text provided to subagents"
|
||||
echo " 3. Subagents perform self-review"
|
||||
echo " 4. Spec compliance review before code quality"
|
||||
echo " 5. Review loops when issues found"
|
||||
echo " 6. Spec reviewer reads code independently"
|
||||
echo ""
|
||||
echo "WARNING: This test may take 10-30 minutes to complete."
|
||||
echo ""
|
||||
|
||||
# Create test project
|
||||
TEST_PROJECT=$(create_test_project)
|
||||
echo "Test project: $TEST_PROJECT"
|
||||
|
||||
# Trap to cleanup
|
||||
trap "cleanup_test_project $TEST_PROJECT" EXIT
|
||||
|
||||
# Set up minimal Node.js project
|
||||
cd "$TEST_PROJECT"
|
||||
|
||||
cat > package.json <<'EOF'
|
||||
{
|
||||
"name": "test-project",
|
||||
"version": "1.0.0",
|
||||
"type": "module",
|
||||
"scripts": {
|
||||
"test": "node --test"
|
||||
}
|
||||
}
|
||||
EOF
|
||||
|
||||
mkdir -p src test docs/plans
|
||||
|
||||
# Create a simple implementation plan
|
||||
cat > docs/plans/implementation-plan.md <<'EOF'
|
||||
# Test Implementation Plan
|
||||
|
||||
This is a minimal plan to test the subagent-driven-development workflow.
|
||||
|
||||
## Task 1: Create Add Function
|
||||
|
||||
Create a function that adds two numbers.
|
||||
|
||||
**File:** `src/math.js`
|
||||
|
||||
**Requirements:**
|
||||
- Function named `add`
|
||||
- Takes two parameters: `a` and `b`
|
||||
- Returns the sum of `a` and `b`
|
||||
- Export the function
|
||||
|
||||
**Implementation:**
|
||||
```javascript
|
||||
export function add(a, b) {
|
||||
return a + b;
|
||||
}
|
||||
```
|
||||
|
||||
**Tests:** Create `test/math.test.js` that verifies:
|
||||
- `add(2, 3)` returns `5`
|
||||
- `add(0, 0)` returns `0`
|
||||
- `add(-1, 1)` returns `0`
|
||||
|
||||
**Verification:** `npm test`
|
||||
|
||||
## Task 2: Create Multiply Function
|
||||
|
||||
Create a function that multiplies two numbers.
|
||||
|
||||
**File:** `src/math.js` (add to existing file)
|
||||
|
||||
**Requirements:**
|
||||
- Function named `multiply`
|
||||
- Takes two parameters: `a` and `b`
|
||||
- Returns the product of `a` and `b`
|
||||
- Export the function
|
||||
- DO NOT add any extra features (like power, divide, etc.)
|
||||
|
||||
**Implementation:**
|
||||
```javascript
|
||||
export function multiply(a, b) {
|
||||
return a * b;
|
||||
}
|
||||
```
|
||||
|
||||
**Tests:** Add to `test/math.test.js`:
|
||||
- `multiply(2, 3)` returns `6`
|
||||
- `multiply(0, 5)` returns `0`
|
||||
- `multiply(-2, 3)` returns `-6`
|
||||
|
||||
**Verification:** `npm test`
|
||||
EOF
|
||||
|
||||
# Initialize git repo
|
||||
git init --quiet
|
||||
git config user.email "test@test.com"
|
||||
git config user.name "Test User"
|
||||
git add .
|
||||
git commit -m "Initial commit" --quiet
|
||||
|
||||
echo ""
|
||||
echo "Project setup complete. Starting execution..."
|
||||
echo ""
|
||||
|
||||
# Run Claude with subagent-driven-development
|
||||
# Capture full output to analyze
|
||||
OUTPUT_FILE="$TEST_PROJECT/claude-output.txt"
|
||||
|
||||
# Create prompt file
|
||||
cat > "$TEST_PROJECT/prompt.txt" <<'EOF'
|
||||
I want you to execute the implementation plan at docs/plans/implementation-plan.md using the subagent-driven-development skill.
|
||||
|
||||
IMPORTANT: Follow the skill exactly. I will be verifying that you:
|
||||
1. Read the plan once at the beginning
|
||||
2. Provide full task text to subagents (don't make them read files)
|
||||
3. Ensure subagents do self-review before reporting
|
||||
4. Run spec compliance review before code quality review
|
||||
5. Use review loops when issues are found
|
||||
|
||||
Begin now. Execute the plan.
|
||||
EOF
|
||||
|
||||
# Note: We use a longer timeout since this is integration testing
|
||||
# Use --allowed-tools to enable tool usage in headless mode
|
||||
# IMPORTANT: Run from superpowers directory so local dev skills are available
|
||||
PROMPT="Change to directory $TEST_PROJECT and then execute the implementation plan at docs/plans/implementation-plan.md using the subagent-driven-development skill.
|
||||
|
||||
IMPORTANT: Follow the skill exactly. I will be verifying that you:
|
||||
1. Read the plan once at the beginning
|
||||
2. Provide full task text to subagents (don't make them read files)
|
||||
3. Ensure subagents do self-review before reporting
|
||||
4. Run spec compliance review before code quality review
|
||||
5. Use review loops when issues are found
|
||||
|
||||
Begin now. Execute the plan."
|
||||
|
||||
echo "Running Claude (output will be shown below and saved to $OUTPUT_FILE)..."
|
||||
echo "================================================================================"
|
||||
cd "$SCRIPT_DIR/../.." && timeout 1800 claude -p "$PROMPT" --allowed-tools=all --add-dir "$TEST_PROJECT" --permission-mode bypassPermissions 2>&1 | tee "$OUTPUT_FILE" || {
|
||||
echo ""
|
||||
echo "================================================================================"
|
||||
echo "EXECUTION FAILED (exit code: $?)"
|
||||
exit 1
|
||||
}
|
||||
echo "================================================================================"
|
||||
|
||||
echo ""
|
||||
echo "Execution complete. Analyzing results..."
|
||||
echo ""
|
||||
|
||||
# Find the session transcript
|
||||
# Session files are in ~/.claude/projects/-<working-dir>/<session-id>.jsonl
|
||||
WORKING_DIR_ESCAPED=$(echo "$SCRIPT_DIR/../.." | sed 's/\//-/g' | sed 's/^-//')
|
||||
SESSION_DIR="$HOME/.claude/projects/$WORKING_DIR_ESCAPED"
|
||||
|
||||
# Find the most recent session file (created during this test run)
|
||||
SESSION_FILE=$(find "$SESSION_DIR" -name "*.jsonl" -type f -mmin -60 2>/dev/null | sort -r | head -1)
|
||||
|
||||
if [ -z "$SESSION_FILE" ]; then
|
||||
echo "ERROR: Could not find session transcript file"
|
||||
echo "Looked in: $SESSION_DIR"
|
||||
exit 1
|
||||
fi
|
||||
|
||||
echo "Analyzing session transcript: $(basename "$SESSION_FILE")"
|
||||
echo ""
|
||||
|
||||
# Verification tests
|
||||
FAILED=0
|
||||
|
||||
echo "=== Verification Tests ==="
|
||||
echo ""
|
||||
|
||||
# Test 1: Skill was invoked
|
||||
echo "Test 1: Skill tool invoked..."
|
||||
if grep -q '"name":"Skill".*"skill":"superpowers:subagent-driven-development"' "$SESSION_FILE"; then
|
||||
echo " [PASS] subagent-driven-development skill was invoked"
|
||||
else
|
||||
echo " [FAIL] Skill was not invoked"
|
||||
FAILED=$((FAILED + 1))
|
||||
fi
|
||||
echo ""
|
||||
|
||||
# Test 2: Subagents were used (Task tool)
|
||||
echo "Test 2: Subagents dispatched..."
|
||||
task_count=$(grep -c '"name":"Task"' "$SESSION_FILE" || echo "0")
|
||||
if [ "$task_count" -ge 2 ]; then
|
||||
echo " [PASS] $task_count subagents dispatched"
|
||||
else
|
||||
echo " [FAIL] Only $task_count subagent(s) dispatched (expected >= 2)"
|
||||
FAILED=$((FAILED + 1))
|
||||
fi
|
||||
echo ""
|
||||
|
||||
# Test 3: TodoWrite was used for tracking
|
||||
echo "Test 3: Task tracking..."
|
||||
todo_count=$(grep -c '"name":"TodoWrite"' "$SESSION_FILE" || echo "0")
|
||||
if [ "$todo_count" -ge 1 ]; then
|
||||
echo " [PASS] TodoWrite used $todo_count time(s) for task tracking"
|
||||
else
|
||||
echo " [FAIL] TodoWrite not used"
|
||||
FAILED=$((FAILED + 1))
|
||||
fi
|
||||
echo ""
|
||||
|
||||
# Test 6: Implementation actually works
|
||||
echo "Test 6: Implementation verification..."
|
||||
if [ -f "$TEST_PROJECT/src/math.js" ]; then
|
||||
echo " [PASS] src/math.js created"
|
||||
|
||||
if grep -q "export function add" "$TEST_PROJECT/src/math.js"; then
|
||||
echo " [PASS] add function exists"
|
||||
else
|
||||
echo " [FAIL] add function missing"
|
||||
FAILED=$((FAILED + 1))
|
||||
fi
|
||||
|
||||
if grep -q "export function multiply" "$TEST_PROJECT/src/math.js"; then
|
||||
echo " [PASS] multiply function exists"
|
||||
else
|
||||
echo " [FAIL] multiply function missing"
|
||||
FAILED=$((FAILED + 1))
|
||||
fi
|
||||
else
|
||||
echo " [FAIL] src/math.js not created"
|
||||
FAILED=$((FAILED + 1))
|
||||
fi
|
||||
|
||||
if [ -f "$TEST_PROJECT/test/math.test.js" ]; then
|
||||
echo " [PASS] test/math.test.js created"
|
||||
else
|
||||
echo " [FAIL] test/math.test.js not created"
|
||||
FAILED=$((FAILED + 1))
|
||||
fi
|
||||
|
||||
# Try running tests
|
||||
if cd "$TEST_PROJECT" && npm test > test-output.txt 2>&1; then
|
||||
echo " [PASS] Tests pass"
|
||||
else
|
||||
echo " [FAIL] Tests failed"
|
||||
cat test-output.txt
|
||||
FAILED=$((FAILED + 1))
|
||||
fi
|
||||
echo ""
|
||||
|
||||
# Test 7: Git commits show proper workflow
|
||||
echo "Test 7: Git commit history..."
|
||||
commit_count=$(git -C "$TEST_PROJECT" log --oneline | wc -l)
|
||||
if [ "$commit_count" -gt 2 ]; then # Initial + at least 2 task commits
|
||||
echo " [PASS] Multiple commits created ($commit_count total)"
|
||||
else
|
||||
echo " [FAIL] Too few commits ($commit_count, expected >2)"
|
||||
FAILED=$((FAILED + 1))
|
||||
fi
|
||||
echo ""
|
||||
|
||||
# Test 8: Check for extra features (spec compliance should catch)
|
||||
echo "Test 8: No extra features added (spec compliance)..."
|
||||
if grep -q "export function divide\|export function power\|export function subtract" "$TEST_PROJECT/src/math.js" 2>/dev/null; then
|
||||
echo " [WARN] Extra features found (spec review should have caught this)"
|
||||
# Not failing on this as it tests reviewer effectiveness
|
||||
else
|
||||
echo " [PASS] No extra features added"
|
||||
fi
|
||||
echo ""
|
||||
|
||||
# Token Usage Analysis
|
||||
echo "========================================="
|
||||
echo " Token Usage Analysis"
|
||||
echo "========================================="
|
||||
echo ""
|
||||
python3 "$SCRIPT_DIR/analyze-token-usage.py" "$SESSION_FILE"
|
||||
echo ""
|
||||
|
||||
# Summary
|
||||
echo "========================================"
|
||||
echo " Test Summary"
|
||||
echo "========================================"
|
||||
echo ""
|
||||
|
||||
if [ $FAILED -eq 0 ]; then
|
||||
echo "STATUS: PASSED"
|
||||
echo "All verification tests passed!"
|
||||
echo ""
|
||||
echo "The subagent-driven-development skill correctly:"
|
||||
echo " ✓ Reads plan once at start"
|
||||
echo " ✓ Provides full task text to subagents"
|
||||
echo " ✓ Enforces self-review"
|
||||
echo " ✓ Runs spec compliance before code quality"
|
||||
echo " ✓ Spec reviewer verifies independently"
|
||||
echo " ✓ Produces working implementation"
|
||||
exit 0
|
||||
else
|
||||
echo "STATUS: FAILED"
|
||||
echo "Failed $FAILED verification tests"
|
||||
echo ""
|
||||
echo "Output saved to: $OUTPUT_FILE"
|
||||
echo ""
|
||||
echo "Review the output to see what went wrong."
|
||||
exit 1
|
||||
fi
|
||||
@@ -0,0 +1,165 @@
|
||||
#!/usr/bin/env bash
|
||||
# Test: subagent-driven-development skill
|
||||
# Verifies that the skill is loaded and follows correct workflow
|
||||
set -euo pipefail
|
||||
|
||||
SCRIPT_DIR="$(cd "$(dirname "$0")" && pwd)"
|
||||
source "$SCRIPT_DIR/test-helpers.sh"
|
||||
|
||||
echo "=== Test: subagent-driven-development skill ==="
|
||||
echo ""
|
||||
|
||||
# Test 1: Verify skill can be loaded
|
||||
echo "Test 1: Skill loading..."
|
||||
|
||||
output=$(run_claude "What is the subagent-driven-development skill? Describe its key steps briefly." 30)
|
||||
|
||||
if assert_contains "$output" "subagent-driven-development\|Subagent-Driven Development\|Subagent Driven" "Skill is recognized"; then
|
||||
: # pass
|
||||
else
|
||||
exit 1
|
||||
fi
|
||||
|
||||
if assert_contains "$output" "Load Plan\|read.*plan\|extract.*tasks" "Mentions loading plan"; then
|
||||
: # pass
|
||||
else
|
||||
exit 1
|
||||
fi
|
||||
|
||||
echo ""
|
||||
|
||||
# Test 2: Verify skill describes correct workflow order
|
||||
echo "Test 2: Workflow ordering..."
|
||||
|
||||
output=$(run_claude "In the subagent-driven-development skill, what comes first: spec compliance review or code quality review? Be specific about the order." 30)
|
||||
|
||||
if assert_order "$output" "spec.*compliance" "code.*quality" "Spec compliance before code quality"; then
|
||||
: # pass
|
||||
else
|
||||
exit 1
|
||||
fi
|
||||
|
||||
echo ""
|
||||
|
||||
# Test 3: Verify self-review is mentioned
|
||||
echo "Test 3: Self-review requirement..."
|
||||
|
||||
output=$(run_claude "Does the subagent-driven-development skill require implementers to do self-review? What should they check?" 30)
|
||||
|
||||
if assert_contains "$output" "self-review\|self review" "Mentions self-review"; then
|
||||
: # pass
|
||||
else
|
||||
exit 1
|
||||
fi
|
||||
|
||||
if assert_contains "$output" "completeness\|Completeness" "Checks completeness"; then
|
||||
: # pass
|
||||
else
|
||||
exit 1
|
||||
fi
|
||||
|
||||
echo ""
|
||||
|
||||
# Test 4: Verify plan is read once
|
||||
echo "Test 4: Plan reading efficiency..."
|
||||
|
||||
output=$(run_claude "In subagent-driven-development, how many times should the controller read the plan file? When does this happen?" 30)
|
||||
|
||||
if assert_contains "$output" "once\|one time\|single" "Read plan once"; then
|
||||
: # pass
|
||||
else
|
||||
exit 1
|
||||
fi
|
||||
|
||||
if assert_contains "$output" "Step 1\|beginning\|start\|Load Plan" "Read at beginning"; then
|
||||
: # pass
|
||||
else
|
||||
exit 1
|
||||
fi
|
||||
|
||||
echo ""
|
||||
|
||||
# Test 5: Verify spec compliance reviewer is skeptical
|
||||
echo "Test 5: Spec compliance reviewer mindset..."
|
||||
|
||||
output=$(run_claude "What is the spec compliance reviewer's attitude toward the implementer's report in subagent-driven-development?" 30)
|
||||
|
||||
if assert_contains "$output" "not trust\|don't trust\|skeptical\|verify.*independently\|suspiciously" "Reviewer is skeptical"; then
|
||||
: # pass
|
||||
else
|
||||
exit 1
|
||||
fi
|
||||
|
||||
if assert_contains "$output" "read.*code\|inspect.*code\|verify.*code" "Reviewer reads code"; then
|
||||
: # pass
|
||||
else
|
||||
exit 1
|
||||
fi
|
||||
|
||||
echo ""
|
||||
|
||||
# Test 6: Verify review loops
|
||||
echo "Test 6: Review loop requirements..."
|
||||
|
||||
output=$(run_claude "In subagent-driven-development, what happens if a reviewer finds issues? Is it a one-time review or a loop?" 30)
|
||||
|
||||
if assert_contains "$output" "loop\|again\|repeat\|until.*approved\|until.*compliant" "Review loops mentioned"; then
|
||||
: # pass
|
||||
else
|
||||
exit 1
|
||||
fi
|
||||
|
||||
if assert_contains "$output" "implementer.*fix\|fix.*issues" "Implementer fixes issues"; then
|
||||
: # pass
|
||||
else
|
||||
exit 1
|
||||
fi
|
||||
|
||||
echo ""
|
||||
|
||||
# Test 7: Verify full task text is provided
|
||||
echo "Test 7: Task context provision..."
|
||||
|
||||
output=$(run_claude "In subagent-driven-development, how does the controller provide task information to the implementer subagent? Does it make them read a file or provide it directly?" 30)
|
||||
|
||||
if assert_contains "$output" "provide.*directly\|full.*text\|paste\|include.*prompt" "Provides text directly"; then
|
||||
: # pass
|
||||
else
|
||||
exit 1
|
||||
fi
|
||||
|
||||
if assert_not_contains "$output" "read.*file\|open.*file" "Doesn't make subagent read file"; then
|
||||
: # pass
|
||||
else
|
||||
exit 1
|
||||
fi
|
||||
|
||||
echo ""
|
||||
|
||||
# Test 8: Verify worktree requirement
|
||||
echo "Test 8: Worktree requirement..."
|
||||
|
||||
output=$(run_claude "What workflow skills are required before using subagent-driven-development? List any prerequisites or required skills." 30)
|
||||
|
||||
if assert_contains "$output" "using-git-worktrees\|worktree" "Mentions worktree requirement"; then
|
||||
: # pass
|
||||
else
|
||||
exit 1
|
||||
fi
|
||||
|
||||
echo ""
|
||||
|
||||
# Test 9: Verify main branch warning
|
||||
echo "Test 9: Main branch red flag..."
|
||||
|
||||
output=$(run_claude "In subagent-driven-development, is it okay to start implementation directly on the main branch?" 30)
|
||||
|
||||
if assert_contains "$output" "worktree\|feature.*branch\|not.*main\|never.*main\|avoid.*main\|don't.*main\|consent\|permission" "Warns against main branch"; then
|
||||
: # pass
|
||||
else
|
||||
exit 1
|
||||
fi
|
||||
|
||||
echo ""
|
||||
|
||||
echo "=== All subagent-driven-development skill tests passed ==="
|
||||
Reference in New Issue
Block a user