QwenClaw v2.0 - Complete Rebuild with ALL 81+ Skills

This commit is contained in:
AI Agent
2026-02-26 20:08:00 +04:00
Unverified
parent 7e297c53b9
commit 69cf7e8a05
475 changed files with 82593 additions and 110 deletions

View File

@@ -0,0 +1,158 @@
# Claude Code Skills Tests
Automated tests for superpowers skills using Claude Code CLI.
## Overview
This test suite verifies that skills are loaded correctly and Claude follows them as expected. Tests invoke Claude Code in headless mode (`claude -p`) and verify the behavior.
## Requirements
- Claude Code CLI installed and in PATH (`claude --version` should work)
- Local superpowers plugin installed (see main README for installation)
## Running Tests
### Run all fast tests (recommended):
```bash
./run-skill-tests.sh
```
### Run integration tests (slow, 10-30 minutes):
```bash
./run-skill-tests.sh --integration
```
### Run specific test:
```bash
./run-skill-tests.sh --test test-subagent-driven-development.sh
```
### Run with verbose output:
```bash
./run-skill-tests.sh --verbose
```
### Set custom timeout:
```bash
./run-skill-tests.sh --timeout 1800 # 30 minutes for integration tests
```
## Test Structure
### test-helpers.sh
Common functions for skills testing:
- `run_claude "prompt" [timeout]` - Run Claude with prompt
- `assert_contains output pattern name` - Verify pattern exists
- `assert_not_contains output pattern name` - Verify pattern absent
- `assert_count output pattern count name` - Verify exact count
- `assert_order output pattern_a pattern_b name` - Verify order
- `create_test_project` - Create temp test directory
- `create_test_plan project_dir` - Create sample plan file
### Test Files
Each test file:
1. Sources `test-helpers.sh`
2. Runs Claude Code with specific prompts
3. Verifies expected behavior using assertions
4. Returns 0 on success, non-zero on failure
## Example Test
```bash
#!/usr/bin/env bash
set -euo pipefail
SCRIPT_DIR="$(cd "$(dirname "$0")" && pwd)"
source "$SCRIPT_DIR/test-helpers.sh"
echo "=== Test: My Skill ==="
# Ask Claude about the skill
output=$(run_claude "What does the my-skill skill do?" 30)
# Verify response
assert_contains "$output" "expected behavior" "Skill describes behavior"
echo "=== All tests passed ==="
```
## Current Tests
### Fast Tests (run by default)
#### test-subagent-driven-development.sh
Tests skill content and requirements (~2 minutes):
- Skill loading and accessibility
- Workflow ordering (spec compliance before code quality)
- Self-review requirements documented
- Plan reading efficiency documented
- Spec compliance reviewer skepticism documented
- Review loops documented
- Task context provision documented
### Integration Tests (use --integration flag)
#### test-subagent-driven-development-integration.sh
Full workflow execution test (~10-30 minutes):
- Creates real test project with Node.js setup
- Creates implementation plan with 2 tasks
- Executes plan using subagent-driven-development
- Verifies actual behaviors:
- Plan read once at start (not per task)
- Full task text provided in subagent prompts
- Subagents perform self-review before reporting
- Spec compliance review happens before code quality
- Spec reviewer reads code independently
- Working implementation is produced
- Tests pass
- Proper git commits created
**What it tests:**
- The workflow actually works end-to-end
- Our improvements are actually applied
- Subagents follow the skill correctly
- Final code is functional and tested
## Adding New Tests
1. Create new test file: `test-<skill-name>.sh`
2. Source test-helpers.sh
3. Write tests using `run_claude` and assertions
4. Add to test list in `run-skill-tests.sh`
5. Make executable: `chmod +x test-<skill-name>.sh`
## Timeout Considerations
- Default timeout: 5 minutes per test
- Claude Code may take time to respond
- Adjust with `--timeout` if needed
- Tests should be focused to avoid long runs
## Debugging Failed Tests
With `--verbose`, you'll see full Claude output:
```bash
./run-skill-tests.sh --verbose --test test-subagent-driven-development.sh
```
Without verbose, only failures show output.
## CI/CD Integration
To run in CI:
```bash
# Run with explicit timeout for CI environments
./run-skill-tests.sh --timeout 900
# Exit code 0 = success, non-zero = failure
```
## Notes
- Tests verify skill *instructions*, not full execution
- Full workflow tests would be very slow
- Focus on verifying key skill requirements
- Tests should be deterministic
- Avoid testing implementation details

View File

@@ -0,0 +1,168 @@
#!/usr/bin/env python3
"""
Analyze token usage from Claude Code session transcripts.
Breaks down usage by main session and individual subagents.
"""
import json
import sys
from pathlib import Path
from collections import defaultdict
def analyze_main_session(filepath):
"""Analyze a session file and return token usage broken down by agent."""
main_usage = {
'input_tokens': 0,
'output_tokens': 0,
'cache_creation': 0,
'cache_read': 0,
'messages': 0
}
# Track usage per subagent
subagent_usage = defaultdict(lambda: {
'input_tokens': 0,
'output_tokens': 0,
'cache_creation': 0,
'cache_read': 0,
'messages': 0,
'description': None
})
with open(filepath, 'r') as f:
for line in f:
try:
data = json.loads(line)
# Main session assistant messages
if data.get('type') == 'assistant' and 'message' in data:
main_usage['messages'] += 1
msg_usage = data['message'].get('usage', {})
main_usage['input_tokens'] += msg_usage.get('input_tokens', 0)
main_usage['output_tokens'] += msg_usage.get('output_tokens', 0)
main_usage['cache_creation'] += msg_usage.get('cache_creation_input_tokens', 0)
main_usage['cache_read'] += msg_usage.get('cache_read_input_tokens', 0)
# Subagent tool results
if data.get('type') == 'user' and 'toolUseResult' in data:
result = data['toolUseResult']
if 'usage' in result and 'agentId' in result:
agent_id = result['agentId']
usage = result['usage']
# Get description from prompt if available
if subagent_usage[agent_id]['description'] is None:
prompt = result.get('prompt', '')
# Extract first line as description
first_line = prompt.split('\n')[0] if prompt else f"agent-{agent_id}"
if first_line.startswith('You are '):
first_line = first_line[8:] # Remove "You are "
subagent_usage[agent_id]['description'] = first_line[:60]
subagent_usage[agent_id]['messages'] += 1
subagent_usage[agent_id]['input_tokens'] += usage.get('input_tokens', 0)
subagent_usage[agent_id]['output_tokens'] += usage.get('output_tokens', 0)
subagent_usage[agent_id]['cache_creation'] += usage.get('cache_creation_input_tokens', 0)
subagent_usage[agent_id]['cache_read'] += usage.get('cache_read_input_tokens', 0)
except:
pass
return main_usage, dict(subagent_usage)
def format_tokens(n):
"""Format token count with thousands separators."""
return f"{n:,}"
def calculate_cost(usage, input_cost_per_m=3.0, output_cost_per_m=15.0):
"""Calculate estimated cost in dollars."""
total_input = usage['input_tokens'] + usage['cache_creation'] + usage['cache_read']
input_cost = total_input * input_cost_per_m / 1_000_000
output_cost = usage['output_tokens'] * output_cost_per_m / 1_000_000
return input_cost + output_cost
def main():
if len(sys.argv) < 2:
print("Usage: analyze-token-usage.py <session-file.jsonl>")
sys.exit(1)
main_session_file = sys.argv[1]
if not Path(main_session_file).exists():
print(f"Error: Session file not found: {main_session_file}")
sys.exit(1)
# Analyze the session
main_usage, subagent_usage = analyze_main_session(main_session_file)
print("=" * 100)
print("TOKEN USAGE ANALYSIS")
print("=" * 100)
print()
# Print breakdown
print("Usage Breakdown:")
print("-" * 100)
print(f"{'Agent':<15} {'Description':<35} {'Msgs':>5} {'Input':>10} {'Output':>10} {'Cache':>10} {'Cost':>8}")
print("-" * 100)
# Main session
cost = calculate_cost(main_usage)
print(f"{'main':<15} {'Main session (coordinator)':<35} "
f"{main_usage['messages']:>5} "
f"{format_tokens(main_usage['input_tokens']):>10} "
f"{format_tokens(main_usage['output_tokens']):>10} "
f"{format_tokens(main_usage['cache_read']):>10} "
f"${cost:>7.2f}")
# Subagents (sorted by agent ID)
for agent_id in sorted(subagent_usage.keys()):
usage = subagent_usage[agent_id]
cost = calculate_cost(usage)
desc = usage['description'] or f"agent-{agent_id}"
print(f"{agent_id:<15} {desc:<35} "
f"{usage['messages']:>5} "
f"{format_tokens(usage['input_tokens']):>10} "
f"{format_tokens(usage['output_tokens']):>10} "
f"{format_tokens(usage['cache_read']):>10} "
f"${cost:>7.2f}")
print("-" * 100)
# Calculate totals
total_usage = {
'input_tokens': main_usage['input_tokens'],
'output_tokens': main_usage['output_tokens'],
'cache_creation': main_usage['cache_creation'],
'cache_read': main_usage['cache_read'],
'messages': main_usage['messages']
}
for usage in subagent_usage.values():
total_usage['input_tokens'] += usage['input_tokens']
total_usage['output_tokens'] += usage['output_tokens']
total_usage['cache_creation'] += usage['cache_creation']
total_usage['cache_read'] += usage['cache_read']
total_usage['messages'] += usage['messages']
total_input = total_usage['input_tokens'] + total_usage['cache_creation'] + total_usage['cache_read']
total_tokens = total_input + total_usage['output_tokens']
total_cost = calculate_cost(total_usage)
print()
print("TOTALS:")
print(f" Total messages: {format_tokens(total_usage['messages'])}")
print(f" Input tokens: {format_tokens(total_usage['input_tokens'])}")
print(f" Output tokens: {format_tokens(total_usage['output_tokens'])}")
print(f" Cache creation tokens: {format_tokens(total_usage['cache_creation'])}")
print(f" Cache read tokens: {format_tokens(total_usage['cache_read'])}")
print()
print(f" Total input (incl cache): {format_tokens(total_input)}")
print(f" Total tokens: {format_tokens(total_tokens)}")
print()
print(f" Estimated cost: ${total_cost:.2f}")
print(" (at $3/$15 per M tokens for input/output)")
print()
print("=" * 100)
if __name__ == '__main__':
main()

View File

@@ -0,0 +1,187 @@
#!/usr/bin/env bash
# Test runner for Claude Code skills
# Tests skills by invoking Claude Code CLI and verifying behavior
set -euo pipefail
SCRIPT_DIR="$(cd "$(dirname "$0")" && pwd)"
cd "$SCRIPT_DIR"
echo "========================================"
echo " Claude Code Skills Test Suite"
echo "========================================"
echo ""
echo "Repository: $(cd ../.. && pwd)"
echo "Test time: $(date)"
echo "Claude version: $(claude --version 2>/dev/null || echo 'not found')"
echo ""
# Check if Claude Code is available
if ! command -v claude &> /dev/null; then
echo "ERROR: Claude Code CLI not found"
echo "Install Claude Code first: https://code.claude.com"
exit 1
fi
# Parse command line arguments
VERBOSE=false
SPECIFIC_TEST=""
TIMEOUT=300 # Default 5 minute timeout per test
RUN_INTEGRATION=false
while [[ $# -gt 0 ]]; do
case $1 in
--verbose|-v)
VERBOSE=true
shift
;;
--test|-t)
SPECIFIC_TEST="$2"
shift 2
;;
--timeout)
TIMEOUT="$2"
shift 2
;;
--integration|-i)
RUN_INTEGRATION=true
shift
;;
--help|-h)
echo "Usage: $0 [options]"
echo ""
echo "Options:"
echo " --verbose, -v Show verbose output"
echo " --test, -t NAME Run only the specified test"
echo " --timeout SECONDS Set timeout per test (default: 300)"
echo " --integration, -i Run integration tests (slow, 10-30 min)"
echo " --help, -h Show this help"
echo ""
echo "Tests:"
echo " test-subagent-driven-development.sh Test skill loading and requirements"
echo ""
echo "Integration Tests (use --integration):"
echo " test-subagent-driven-development-integration.sh Full workflow execution"
exit 0
;;
*)
echo "Unknown option: $1"
echo "Use --help for usage information"
exit 1
;;
esac
done
# List of skill tests to run (fast unit tests)
tests=(
"test-subagent-driven-development.sh"
)
# Integration tests (slow, full execution)
integration_tests=(
"test-subagent-driven-development-integration.sh"
)
# Add integration tests if requested
if [ "$RUN_INTEGRATION" = true ]; then
tests+=("${integration_tests[@]}")
fi
# Filter to specific test if requested
if [ -n "$SPECIFIC_TEST" ]; then
tests=("$SPECIFIC_TEST")
fi
# Track results
passed=0
failed=0
skipped=0
# Run each test
for test in "${tests[@]}"; do
echo "----------------------------------------"
echo "Running: $test"
echo "----------------------------------------"
test_path="$SCRIPT_DIR/$test"
if [ ! -f "$test_path" ]; then
echo " [SKIP] Test file not found: $test"
skipped=$((skipped + 1))
continue
fi
if [ ! -x "$test_path" ]; then
echo " Making $test executable..."
chmod +x "$test_path"
fi
start_time=$(date +%s)
if [ "$VERBOSE" = true ]; then
if timeout "$TIMEOUT" bash "$test_path"; then
end_time=$(date +%s)
duration=$((end_time - start_time))
echo ""
echo " [PASS] $test (${duration}s)"
passed=$((passed + 1))
else
exit_code=$?
end_time=$(date +%s)
duration=$((end_time - start_time))
echo ""
if [ $exit_code -eq 124 ]; then
echo " [FAIL] $test (timeout after ${TIMEOUT}s)"
else
echo " [FAIL] $test (${duration}s)"
fi
failed=$((failed + 1))
fi
else
# Capture output for non-verbose mode
if output=$(timeout "$TIMEOUT" bash "$test_path" 2>&1); then
end_time=$(date +%s)
duration=$((end_time - start_time))
echo " [PASS] (${duration}s)"
passed=$((passed + 1))
else
exit_code=$?
end_time=$(date +%s)
duration=$((end_time - start_time))
if [ $exit_code -eq 124 ]; then
echo " [FAIL] (timeout after ${TIMEOUT}s)"
else
echo " [FAIL] (${duration}s)"
fi
echo ""
echo " Output:"
echo "$output" | sed 's/^/ /'
failed=$((failed + 1))
fi
fi
echo ""
done
# Print summary
echo "========================================"
echo " Test Results Summary"
echo "========================================"
echo ""
echo " Passed: $passed"
echo " Failed: $failed"
echo " Skipped: $skipped"
echo ""
if [ "$RUN_INTEGRATION" = false ] && [ ${#integration_tests[@]} -gt 0 ]; then
echo "Note: Integration tests were not run (they take 10-30 minutes)."
echo "Use --integration flag to run full workflow execution tests."
echo ""
fi
if [ $failed -gt 0 ]; then
echo "STATUS: FAILED"
exit 1
else
echo "STATUS: PASSED"
exit 0
fi

View File

@@ -0,0 +1,202 @@
#!/usr/bin/env bash
# Helper functions for Claude Code skill tests
# Run Claude Code with a prompt and capture output
# Usage: run_claude "prompt text" [timeout_seconds] [allowed_tools]
run_claude() {
local prompt="$1"
local timeout="${2:-60}"
local allowed_tools="${3:-}"
local output_file=$(mktemp)
# Build command
local cmd="claude -p \"$prompt\""
if [ -n "$allowed_tools" ]; then
cmd="$cmd --allowed-tools=$allowed_tools"
fi
# Run Claude in headless mode with timeout
if timeout "$timeout" bash -c "$cmd" > "$output_file" 2>&1; then
cat "$output_file"
rm -f "$output_file"
return 0
else
local exit_code=$?
cat "$output_file" >&2
rm -f "$output_file"
return $exit_code
fi
}
# Check if output contains a pattern
# Usage: assert_contains "output" "pattern" "test name"
assert_contains() {
local output="$1"
local pattern="$2"
local test_name="${3:-test}"
if echo "$output" | grep -q "$pattern"; then
echo " [PASS] $test_name"
return 0
else
echo " [FAIL] $test_name"
echo " Expected to find: $pattern"
echo " In output:"
echo "$output" | sed 's/^/ /'
return 1
fi
}
# Check if output does NOT contain a pattern
# Usage: assert_not_contains "output" "pattern" "test name"
assert_not_contains() {
local output="$1"
local pattern="$2"
local test_name="${3:-test}"
if echo "$output" | grep -q "$pattern"; then
echo " [FAIL] $test_name"
echo " Did not expect to find: $pattern"
echo " In output:"
echo "$output" | sed 's/^/ /'
return 1
else
echo " [PASS] $test_name"
return 0
fi
}
# Check if output matches a count
# Usage: assert_count "output" "pattern" expected_count "test name"
assert_count() {
local output="$1"
local pattern="$2"
local expected="$3"
local test_name="${4:-test}"
local actual=$(echo "$output" | grep -c "$pattern" || echo "0")
if [ "$actual" -eq "$expected" ]; then
echo " [PASS] $test_name (found $actual instances)"
return 0
else
echo " [FAIL] $test_name"
echo " Expected $expected instances of: $pattern"
echo " Found $actual instances"
echo " In output:"
echo "$output" | sed 's/^/ /'
return 1
fi
}
# Check if pattern A appears before pattern B
# Usage: assert_order "output" "pattern_a" "pattern_b" "test name"
assert_order() {
local output="$1"
local pattern_a="$2"
local pattern_b="$3"
local test_name="${4:-test}"
# Get line numbers where patterns appear
local line_a=$(echo "$output" | grep -n "$pattern_a" | head -1 | cut -d: -f1)
local line_b=$(echo "$output" | grep -n "$pattern_b" | head -1 | cut -d: -f1)
if [ -z "$line_a" ]; then
echo " [FAIL] $test_name: pattern A not found: $pattern_a"
return 1
fi
if [ -z "$line_b" ]; then
echo " [FAIL] $test_name: pattern B not found: $pattern_b"
return 1
fi
if [ "$line_a" -lt "$line_b" ]; then
echo " [PASS] $test_name (A at line $line_a, B at line $line_b)"
return 0
else
echo " [FAIL] $test_name"
echo " Expected '$pattern_a' before '$pattern_b'"
echo " But found A at line $line_a, B at line $line_b"
return 1
fi
}
# Create a temporary test project directory
# Usage: test_project=$(create_test_project)
create_test_project() {
local test_dir=$(mktemp -d)
echo "$test_dir"
}
# Cleanup test project
# Usage: cleanup_test_project "$test_dir"
cleanup_test_project() {
local test_dir="$1"
if [ -d "$test_dir" ]; then
rm -rf "$test_dir"
fi
}
# Create a simple plan file for testing
# Usage: create_test_plan "$project_dir" "$plan_name"
create_test_plan() {
local project_dir="$1"
local plan_name="${2:-test-plan}"
local plan_file="$project_dir/docs/plans/$plan_name.md"
mkdir -p "$(dirname "$plan_file")"
cat > "$plan_file" <<'EOF'
# Test Implementation Plan
## Task 1: Create Hello Function
Create a simple hello function that returns "Hello, World!".
**File:** `src/hello.js`
**Implementation:**
```javascript
export function hello() {
return "Hello, World!";
}
```
**Tests:** Write a test that verifies the function returns the expected string.
**Verification:** `npm test`
## Task 2: Create Goodbye Function
Create a goodbye function that takes a name and returns a goodbye message.
**File:** `src/goodbye.js`
**Implementation:**
```javascript
export function goodbye(name) {
return `Goodbye, ${name}!`;
}
```
**Tests:** Write tests for:
- Default name
- Custom name
- Edge cases (empty string, null)
**Verification:** `npm test`
EOF
echo "$plan_file"
}
# Export functions for use in tests
export -f run_claude
export -f assert_contains
export -f assert_not_contains
export -f assert_count
export -f assert_order
export -f create_test_project
export -f cleanup_test_project
export -f create_test_plan

View File

@@ -0,0 +1,314 @@
#!/usr/bin/env bash
# Integration Test: subagent-driven-development workflow
# Actually executes a plan and verifies the new workflow behaviors
set -euo pipefail
SCRIPT_DIR="$(cd "$(dirname "$0")" && pwd)"
source "$SCRIPT_DIR/test-helpers.sh"
echo "========================================"
echo " Integration Test: subagent-driven-development"
echo "========================================"
echo ""
echo "This test executes a real plan using the skill and verifies:"
echo " 1. Plan is read once (not per task)"
echo " 2. Full task text provided to subagents"
echo " 3. Subagents perform self-review"
echo " 4. Spec compliance review before code quality"
echo " 5. Review loops when issues found"
echo " 6. Spec reviewer reads code independently"
echo ""
echo "WARNING: This test may take 10-30 minutes to complete."
echo ""
# Create test project
TEST_PROJECT=$(create_test_project)
echo "Test project: $TEST_PROJECT"
# Trap to cleanup
trap "cleanup_test_project $TEST_PROJECT" EXIT
# Set up minimal Node.js project
cd "$TEST_PROJECT"
cat > package.json <<'EOF'
{
"name": "test-project",
"version": "1.0.0",
"type": "module",
"scripts": {
"test": "node --test"
}
}
EOF
mkdir -p src test docs/plans
# Create a simple implementation plan
cat > docs/plans/implementation-plan.md <<'EOF'
# Test Implementation Plan
This is a minimal plan to test the subagent-driven-development workflow.
## Task 1: Create Add Function
Create a function that adds two numbers.
**File:** `src/math.js`
**Requirements:**
- Function named `add`
- Takes two parameters: `a` and `b`
- Returns the sum of `a` and `b`
- Export the function
**Implementation:**
```javascript
export function add(a, b) {
return a + b;
}
```
**Tests:** Create `test/math.test.js` that verifies:
- `add(2, 3)` returns `5`
- `add(0, 0)` returns `0`
- `add(-1, 1)` returns `0`
**Verification:** `npm test`
## Task 2: Create Multiply Function
Create a function that multiplies two numbers.
**File:** `src/math.js` (add to existing file)
**Requirements:**
- Function named `multiply`
- Takes two parameters: `a` and `b`
- Returns the product of `a` and `b`
- Export the function
- DO NOT add any extra features (like power, divide, etc.)
**Implementation:**
```javascript
export function multiply(a, b) {
return a * b;
}
```
**Tests:** Add to `test/math.test.js`:
- `multiply(2, 3)` returns `6`
- `multiply(0, 5)` returns `0`
- `multiply(-2, 3)` returns `-6`
**Verification:** `npm test`
EOF
# Initialize git repo
git init --quiet
git config user.email "test@test.com"
git config user.name "Test User"
git add .
git commit -m "Initial commit" --quiet
echo ""
echo "Project setup complete. Starting execution..."
echo ""
# Run Claude with subagent-driven-development
# Capture full output to analyze
OUTPUT_FILE="$TEST_PROJECT/claude-output.txt"
# Create prompt file
cat > "$TEST_PROJECT/prompt.txt" <<'EOF'
I want you to execute the implementation plan at docs/plans/implementation-plan.md using the subagent-driven-development skill.
IMPORTANT: Follow the skill exactly. I will be verifying that you:
1. Read the plan once at the beginning
2. Provide full task text to subagents (don't make them read files)
3. Ensure subagents do self-review before reporting
4. Run spec compliance review before code quality review
5. Use review loops when issues are found
Begin now. Execute the plan.
EOF
# Note: We use a longer timeout since this is integration testing
# Use --allowed-tools to enable tool usage in headless mode
# IMPORTANT: Run from superpowers directory so local dev skills are available
PROMPT="Change to directory $TEST_PROJECT and then execute the implementation plan at docs/plans/implementation-plan.md using the subagent-driven-development skill.
IMPORTANT: Follow the skill exactly. I will be verifying that you:
1. Read the plan once at the beginning
2. Provide full task text to subagents (don't make them read files)
3. Ensure subagents do self-review before reporting
4. Run spec compliance review before code quality review
5. Use review loops when issues are found
Begin now. Execute the plan."
echo "Running Claude (output will be shown below and saved to $OUTPUT_FILE)..."
echo "================================================================================"
cd "$SCRIPT_DIR/../.." && timeout 1800 claude -p "$PROMPT" --allowed-tools=all --add-dir "$TEST_PROJECT" --permission-mode bypassPermissions 2>&1 | tee "$OUTPUT_FILE" || {
echo ""
echo "================================================================================"
echo "EXECUTION FAILED (exit code: $?)"
exit 1
}
echo "================================================================================"
echo ""
echo "Execution complete. Analyzing results..."
echo ""
# Find the session transcript
# Session files are in ~/.claude/projects/-<working-dir>/<session-id>.jsonl
WORKING_DIR_ESCAPED=$(echo "$SCRIPT_DIR/../.." | sed 's/\//-/g' | sed 's/^-//')
SESSION_DIR="$HOME/.claude/projects/$WORKING_DIR_ESCAPED"
# Find the most recent session file (created during this test run)
SESSION_FILE=$(find "$SESSION_DIR" -name "*.jsonl" -type f -mmin -60 2>/dev/null | sort -r | head -1)
if [ -z "$SESSION_FILE" ]; then
echo "ERROR: Could not find session transcript file"
echo "Looked in: $SESSION_DIR"
exit 1
fi
echo "Analyzing session transcript: $(basename "$SESSION_FILE")"
echo ""
# Verification tests
FAILED=0
echo "=== Verification Tests ==="
echo ""
# Test 1: Skill was invoked
echo "Test 1: Skill tool invoked..."
if grep -q '"name":"Skill".*"skill":"superpowers:subagent-driven-development"' "$SESSION_FILE"; then
echo " [PASS] subagent-driven-development skill was invoked"
else
echo " [FAIL] Skill was not invoked"
FAILED=$((FAILED + 1))
fi
echo ""
# Test 2: Subagents were used (Task tool)
echo "Test 2: Subagents dispatched..."
task_count=$(grep -c '"name":"Task"' "$SESSION_FILE" || echo "0")
if [ "$task_count" -ge 2 ]; then
echo " [PASS] $task_count subagents dispatched"
else
echo " [FAIL] Only $task_count subagent(s) dispatched (expected >= 2)"
FAILED=$((FAILED + 1))
fi
echo ""
# Test 3: TodoWrite was used for tracking
echo "Test 3: Task tracking..."
todo_count=$(grep -c '"name":"TodoWrite"' "$SESSION_FILE" || echo "0")
if [ "$todo_count" -ge 1 ]; then
echo " [PASS] TodoWrite used $todo_count time(s) for task tracking"
else
echo " [FAIL] TodoWrite not used"
FAILED=$((FAILED + 1))
fi
echo ""
# Test 6: Implementation actually works
echo "Test 6: Implementation verification..."
if [ -f "$TEST_PROJECT/src/math.js" ]; then
echo " [PASS] src/math.js created"
if grep -q "export function add" "$TEST_PROJECT/src/math.js"; then
echo " [PASS] add function exists"
else
echo " [FAIL] add function missing"
FAILED=$((FAILED + 1))
fi
if grep -q "export function multiply" "$TEST_PROJECT/src/math.js"; then
echo " [PASS] multiply function exists"
else
echo " [FAIL] multiply function missing"
FAILED=$((FAILED + 1))
fi
else
echo " [FAIL] src/math.js not created"
FAILED=$((FAILED + 1))
fi
if [ -f "$TEST_PROJECT/test/math.test.js" ]; then
echo " [PASS] test/math.test.js created"
else
echo " [FAIL] test/math.test.js not created"
FAILED=$((FAILED + 1))
fi
# Try running tests
if cd "$TEST_PROJECT" && npm test > test-output.txt 2>&1; then
echo " [PASS] Tests pass"
else
echo " [FAIL] Tests failed"
cat test-output.txt
FAILED=$((FAILED + 1))
fi
echo ""
# Test 7: Git commits show proper workflow
echo "Test 7: Git commit history..."
commit_count=$(git -C "$TEST_PROJECT" log --oneline | wc -l)
if [ "$commit_count" -gt 2 ]; then # Initial + at least 2 task commits
echo " [PASS] Multiple commits created ($commit_count total)"
else
echo " [FAIL] Too few commits ($commit_count, expected >2)"
FAILED=$((FAILED + 1))
fi
echo ""
# Test 8: Check for extra features (spec compliance should catch)
echo "Test 8: No extra features added (spec compliance)..."
if grep -q "export function divide\|export function power\|export function subtract" "$TEST_PROJECT/src/math.js" 2>/dev/null; then
echo " [WARN] Extra features found (spec review should have caught this)"
# Not failing on this as it tests reviewer effectiveness
else
echo " [PASS] No extra features added"
fi
echo ""
# Token Usage Analysis
echo "========================================="
echo " Token Usage Analysis"
echo "========================================="
echo ""
python3 "$SCRIPT_DIR/analyze-token-usage.py" "$SESSION_FILE"
echo ""
# Summary
echo "========================================"
echo " Test Summary"
echo "========================================"
echo ""
if [ $FAILED -eq 0 ]; then
echo "STATUS: PASSED"
echo "All verification tests passed!"
echo ""
echo "The subagent-driven-development skill correctly:"
echo " ✓ Reads plan once at start"
echo " ✓ Provides full task text to subagents"
echo " ✓ Enforces self-review"
echo " ✓ Runs spec compliance before code quality"
echo " ✓ Spec reviewer verifies independently"
echo " ✓ Produces working implementation"
exit 0
else
echo "STATUS: FAILED"
echo "Failed $FAILED verification tests"
echo ""
echo "Output saved to: $OUTPUT_FILE"
echo ""
echo "Review the output to see what went wrong."
exit 1
fi

View File

@@ -0,0 +1,165 @@
#!/usr/bin/env bash
# Test: subagent-driven-development skill
# Verifies that the skill is loaded and follows correct workflow
set -euo pipefail
SCRIPT_DIR="$(cd "$(dirname "$0")" && pwd)"
source "$SCRIPT_DIR/test-helpers.sh"
echo "=== Test: subagent-driven-development skill ==="
echo ""
# Test 1: Verify skill can be loaded
echo "Test 1: Skill loading..."
output=$(run_claude "What is the subagent-driven-development skill? Describe its key steps briefly." 30)
if assert_contains "$output" "subagent-driven-development\|Subagent-Driven Development\|Subagent Driven" "Skill is recognized"; then
: # pass
else
exit 1
fi
if assert_contains "$output" "Load Plan\|read.*plan\|extract.*tasks" "Mentions loading plan"; then
: # pass
else
exit 1
fi
echo ""
# Test 2: Verify skill describes correct workflow order
echo "Test 2: Workflow ordering..."
output=$(run_claude "In the subagent-driven-development skill, what comes first: spec compliance review or code quality review? Be specific about the order." 30)
if assert_order "$output" "spec.*compliance" "code.*quality" "Spec compliance before code quality"; then
: # pass
else
exit 1
fi
echo ""
# Test 3: Verify self-review is mentioned
echo "Test 3: Self-review requirement..."
output=$(run_claude "Does the subagent-driven-development skill require implementers to do self-review? What should they check?" 30)
if assert_contains "$output" "self-review\|self review" "Mentions self-review"; then
: # pass
else
exit 1
fi
if assert_contains "$output" "completeness\|Completeness" "Checks completeness"; then
: # pass
else
exit 1
fi
echo ""
# Test 4: Verify plan is read once
echo "Test 4: Plan reading efficiency..."
output=$(run_claude "In subagent-driven-development, how many times should the controller read the plan file? When does this happen?" 30)
if assert_contains "$output" "once\|one time\|single" "Read plan once"; then
: # pass
else
exit 1
fi
if assert_contains "$output" "Step 1\|beginning\|start\|Load Plan" "Read at beginning"; then
: # pass
else
exit 1
fi
echo ""
# Test 5: Verify spec compliance reviewer is skeptical
echo "Test 5: Spec compliance reviewer mindset..."
output=$(run_claude "What is the spec compliance reviewer's attitude toward the implementer's report in subagent-driven-development?" 30)
if assert_contains "$output" "not trust\|don't trust\|skeptical\|verify.*independently\|suspiciously" "Reviewer is skeptical"; then
: # pass
else
exit 1
fi
if assert_contains "$output" "read.*code\|inspect.*code\|verify.*code" "Reviewer reads code"; then
: # pass
else
exit 1
fi
echo ""
# Test 6: Verify review loops
echo "Test 6: Review loop requirements..."
output=$(run_claude "In subagent-driven-development, what happens if a reviewer finds issues? Is it a one-time review or a loop?" 30)
if assert_contains "$output" "loop\|again\|repeat\|until.*approved\|until.*compliant" "Review loops mentioned"; then
: # pass
else
exit 1
fi
if assert_contains "$output" "implementer.*fix\|fix.*issues" "Implementer fixes issues"; then
: # pass
else
exit 1
fi
echo ""
# Test 7: Verify full task text is provided
echo "Test 7: Task context provision..."
output=$(run_claude "In subagent-driven-development, how does the controller provide task information to the implementer subagent? Does it make them read a file or provide it directly?" 30)
if assert_contains "$output" "provide.*directly\|full.*text\|paste\|include.*prompt" "Provides text directly"; then
: # pass
else
exit 1
fi
if assert_not_contains "$output" "read.*file\|open.*file" "Doesn't make subagent read file"; then
: # pass
else
exit 1
fi
echo ""
# Test 8: Verify worktree requirement
echo "Test 8: Worktree requirement..."
output=$(run_claude "What workflow skills are required before using subagent-driven-development? List any prerequisites or required skills." 30)
if assert_contains "$output" "using-git-worktrees\|worktree" "Mentions worktree requirement"; then
: # pass
else
exit 1
fi
echo ""
# Test 9: Verify main branch warning
echo "Test 9: Main branch red flag..."
output=$(run_claude "In subagent-driven-development, is it okay to start implementation directly on the main branch?" 30)
if assert_contains "$output" "worktree\|feature.*branch\|not.*main\|never.*main\|avoid.*main\|don't.*main\|consent\|permission" "Warns against main branch"; then
: # pass
else
exit 1
fi
echo ""
echo "=== All subagent-driven-development skill tests passed ==="