QwenClaw v2.0 - Complete Rebuild with ALL 81+ Skills
This commit is contained in:
@@ -0,0 +1,3 @@
|
||||
The plan is done. docs/plans/auth-system.md has everything.
|
||||
|
||||
Do subagent-driven development on this - start with Task 1, dispatch a subagent, then we'll review.
|
||||
@@ -0,0 +1,17 @@
|
||||
Great, the plan is complete. I've saved it to docs/plans/auth-system.md.
|
||||
|
||||
Here's a summary of what we designed:
|
||||
- Task 1: Add User Model with email/password fields
|
||||
- Task 2: Create auth routes for login/register
|
||||
- Task 3: Add JWT middleware for protected routes
|
||||
- Task 4: Write tests for all auth functionality
|
||||
|
||||
Two execution options:
|
||||
1. Subagent-Driven (this session) - dispatch a fresh subagent per task
|
||||
2. Parallel Session (separate) - open new Claude Code session
|
||||
|
||||
Which approach do you want?
|
||||
|
||||
---
|
||||
|
||||
subagent-driven-development, please
|
||||
@@ -0,0 +1,11 @@
|
||||
[Previous assistant message]:
|
||||
Plan complete and saved to docs/plans/auth-system.md.
|
||||
|
||||
Two execution options:
|
||||
1. Subagent-Driven (this session) - I dispatch a fresh subagent per task, review between tasks, fast iteration within this conversation
|
||||
2. Parallel Session (separate) - Open a new Claude Code session with the execute-plan skill, batch execution with review checkpoints
|
||||
|
||||
Which approach do you want to use for implementation?
|
||||
|
||||
[Your response]:
|
||||
subagent-driven-development, please
|
||||
@@ -0,0 +1,8 @@
|
||||
I have my implementation plan ready at docs/plans/auth-system.md.
|
||||
|
||||
I want to use subagent-driven-development to execute it. That means:
|
||||
- Dispatch a fresh subagent for each task in the plan
|
||||
- Review the output between tasks
|
||||
- Keep iteration fast within this conversation
|
||||
|
||||
Let's start - please read the plan and begin dispatching subagents for each task.
|
||||
@@ -0,0 +1,3 @@
|
||||
I have a plan at docs/plans/auth-system.md that's ready to implement.
|
||||
|
||||
subagent-driven-development, please
|
||||
@@ -0,0 +1 @@
|
||||
please use the brainstorming skill to help me think through this feature
|
||||
@@ -0,0 +1,3 @@
|
||||
Plan is at docs/plans/auth-system.md.
|
||||
|
||||
subagent-driven-development, please. Don't waste time - just read the plan and start dispatching subagents immediately.
|
||||
@@ -0,0 +1 @@
|
||||
subagent-driven-development, please
|
||||
@@ -0,0 +1 @@
|
||||
use systematic-debugging to figure out what's wrong
|
||||
70
skills/superpowers/tests/explicit-skill-requests/run-all.sh
Normal file
70
skills/superpowers/tests/explicit-skill-requests/run-all.sh
Normal file
@@ -0,0 +1,70 @@
|
||||
#!/bin/bash
|
||||
# Run all explicit skill request tests
|
||||
# Usage: ./run-all.sh
|
||||
|
||||
set -e
|
||||
|
||||
SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
|
||||
PROMPTS_DIR="$SCRIPT_DIR/prompts"
|
||||
|
||||
echo "=== Running All Explicit Skill Request Tests ==="
|
||||
echo ""
|
||||
|
||||
PASSED=0
|
||||
FAILED=0
|
||||
RESULTS=""
|
||||
|
||||
# Test: subagent-driven-development, please
|
||||
echo ">>> Test 1: subagent-driven-development-please"
|
||||
if "$SCRIPT_DIR/run-test.sh" "subagent-driven-development" "$PROMPTS_DIR/subagent-driven-development-please.txt"; then
|
||||
PASSED=$((PASSED + 1))
|
||||
RESULTS="$RESULTS\nPASS: subagent-driven-development-please"
|
||||
else
|
||||
FAILED=$((FAILED + 1))
|
||||
RESULTS="$RESULTS\nFAIL: subagent-driven-development-please"
|
||||
fi
|
||||
echo ""
|
||||
|
||||
# Test: use systematic-debugging
|
||||
echo ">>> Test 2: use-systematic-debugging"
|
||||
if "$SCRIPT_DIR/run-test.sh" "systematic-debugging" "$PROMPTS_DIR/use-systematic-debugging.txt"; then
|
||||
PASSED=$((PASSED + 1))
|
||||
RESULTS="$RESULTS\nPASS: use-systematic-debugging"
|
||||
else
|
||||
FAILED=$((FAILED + 1))
|
||||
RESULTS="$RESULTS\nFAIL: use-systematic-debugging"
|
||||
fi
|
||||
echo ""
|
||||
|
||||
# Test: please use brainstorming
|
||||
echo ">>> Test 3: please-use-brainstorming"
|
||||
if "$SCRIPT_DIR/run-test.sh" "brainstorming" "$PROMPTS_DIR/please-use-brainstorming.txt"; then
|
||||
PASSED=$((PASSED + 1))
|
||||
RESULTS="$RESULTS\nPASS: please-use-brainstorming"
|
||||
else
|
||||
FAILED=$((FAILED + 1))
|
||||
RESULTS="$RESULTS\nFAIL: please-use-brainstorming"
|
||||
fi
|
||||
echo ""
|
||||
|
||||
# Test: mid-conversation execute plan
|
||||
echo ">>> Test 4: mid-conversation-execute-plan"
|
||||
if "$SCRIPT_DIR/run-test.sh" "subagent-driven-development" "$PROMPTS_DIR/mid-conversation-execute-plan.txt"; then
|
||||
PASSED=$((PASSED + 1))
|
||||
RESULTS="$RESULTS\nPASS: mid-conversation-execute-plan"
|
||||
else
|
||||
FAILED=$((FAILED + 1))
|
||||
RESULTS="$RESULTS\nFAIL: mid-conversation-execute-plan"
|
||||
fi
|
||||
echo ""
|
||||
|
||||
echo "=== Summary ==="
|
||||
echo -e "$RESULTS"
|
||||
echo ""
|
||||
echo "Passed: $PASSED"
|
||||
echo "Failed: $FAILED"
|
||||
echo "Total: $((PASSED + FAILED))"
|
||||
|
||||
if [ "$FAILED" -gt 0 ]; then
|
||||
exit 1
|
||||
fi
|
||||
@@ -0,0 +1,100 @@
|
||||
#!/bin/bash
|
||||
# Test where Claude explicitly describes subagent-driven-development before user requests it
|
||||
# This mimics the original failure scenario
|
||||
|
||||
set -e
|
||||
|
||||
SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
|
||||
PLUGIN_DIR="$(cd "$SCRIPT_DIR/../.." && pwd)"
|
||||
|
||||
TIMESTAMP=$(date +%s)
|
||||
OUTPUT_DIR="/tmp/superpowers-tests/${TIMESTAMP}/explicit-skill-requests/claude-describes"
|
||||
mkdir -p "$OUTPUT_DIR"
|
||||
|
||||
PROJECT_DIR="$OUTPUT_DIR/project"
|
||||
mkdir -p "$PROJECT_DIR/docs/plans"
|
||||
|
||||
echo "=== Test: Claude Describes SDD First ==="
|
||||
echo "Output dir: $OUTPUT_DIR"
|
||||
echo ""
|
||||
|
||||
cd "$PROJECT_DIR"
|
||||
|
||||
# Create a plan
|
||||
cat > "$PROJECT_DIR/docs/plans/auth-system.md" << 'EOF'
|
||||
# Auth System Implementation Plan
|
||||
|
||||
## Task 1: Add User Model
|
||||
Create user model with email and password fields.
|
||||
|
||||
## Task 2: Add Auth Routes
|
||||
Create login and register endpoints.
|
||||
|
||||
## Task 3: Add JWT Middleware
|
||||
Protect routes with JWT validation.
|
||||
EOF
|
||||
|
||||
# Turn 1: Have Claude describe execution options including SDD
|
||||
echo ">>> Turn 1: Ask Claude to describe execution options..."
|
||||
claude -p "I have a plan at docs/plans/auth-system.md. Tell me about my options for executing it, including what subagent-driven-development means and how it works." \
|
||||
--model haiku \
|
||||
--plugin-dir "$PLUGIN_DIR" \
|
||||
--dangerously-skip-permissions \
|
||||
--max-turns 3 \
|
||||
--output-format stream-json \
|
||||
> "$OUTPUT_DIR/turn1.json" 2>&1 || true
|
||||
echo "Done."
|
||||
|
||||
# Turn 2: THE CRITICAL TEST - now that Claude has explained it
|
||||
echo ">>> Turn 2: Request subagent-driven-development..."
|
||||
FINAL_LOG="$OUTPUT_DIR/turn2.json"
|
||||
claude -p "subagent-driven-development, please" \
|
||||
--continue \
|
||||
--model haiku \
|
||||
--plugin-dir "$PLUGIN_DIR" \
|
||||
--dangerously-skip-permissions \
|
||||
--max-turns 2 \
|
||||
--output-format stream-json \
|
||||
> "$FINAL_LOG" 2>&1 || true
|
||||
echo "Done."
|
||||
echo ""
|
||||
|
||||
echo "=== Results ==="
|
||||
|
||||
# Check Turn 1 to see if Claude described SDD
|
||||
echo "Turn 1 - Claude's description of options (excerpt):"
|
||||
grep '"type":"assistant"' "$OUTPUT_DIR/turn1.json" | head -1 | jq -r '.message.content[0].text // .message.content' 2>/dev/null | head -c 800 || echo " (could not extract)"
|
||||
echo ""
|
||||
echo "---"
|
||||
echo ""
|
||||
|
||||
# Check final turn
|
||||
SKILL_PATTERN='"skill":"([^"]*:)?subagent-driven-development"'
|
||||
if grep -q '"name":"Skill"' "$FINAL_LOG" && grep -qE "$SKILL_PATTERN" "$FINAL_LOG"; then
|
||||
echo "PASS: Skill was triggered after Claude described it"
|
||||
TRIGGERED=true
|
||||
else
|
||||
echo "FAIL: Skill was NOT triggered (Claude may have thought it already knew)"
|
||||
TRIGGERED=false
|
||||
|
||||
echo ""
|
||||
echo "Tools invoked in final turn:"
|
||||
grep '"type":"tool_use"' "$FINAL_LOG" | grep -o '"name":"[^"]*"' | sort -u | head -10 || echo " (none)"
|
||||
|
||||
echo ""
|
||||
echo "Final turn response:"
|
||||
grep '"type":"assistant"' "$FINAL_LOG" | head -1 | jq -r '.message.content[0].text // .message.content' 2>/dev/null | head -c 800 || echo " (could not extract)"
|
||||
fi
|
||||
|
||||
echo ""
|
||||
echo "Skills triggered in final turn:"
|
||||
grep -o '"skill":"[^"]*"' "$FINAL_LOG" 2>/dev/null | sort -u || echo " (none)"
|
||||
|
||||
echo ""
|
||||
echo "Logs in: $OUTPUT_DIR"
|
||||
|
||||
if [ "$TRIGGERED" = "true" ]; then
|
||||
exit 0
|
||||
else
|
||||
exit 1
|
||||
fi
|
||||
@@ -0,0 +1,113 @@
|
||||
#!/bin/bash
|
||||
# Extended multi-turn test with more conversation history
|
||||
# This tries to reproduce the failure by building more context
|
||||
|
||||
set -e
|
||||
|
||||
SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
|
||||
PLUGIN_DIR="$(cd "$SCRIPT_DIR/../.." && pwd)"
|
||||
|
||||
TIMESTAMP=$(date +%s)
|
||||
OUTPUT_DIR="/tmp/superpowers-tests/${TIMESTAMP}/explicit-skill-requests/extended-multiturn"
|
||||
mkdir -p "$OUTPUT_DIR"
|
||||
|
||||
PROJECT_DIR="$OUTPUT_DIR/project"
|
||||
mkdir -p "$PROJECT_DIR/docs/plans"
|
||||
|
||||
echo "=== Extended Multi-Turn Test ==="
|
||||
echo "Output dir: $OUTPUT_DIR"
|
||||
echo "Plugin dir: $PLUGIN_DIR"
|
||||
echo ""
|
||||
|
||||
cd "$PROJECT_DIR"
|
||||
|
||||
# Turn 1: Start brainstorming
|
||||
echo ">>> Turn 1: Brainstorming request..."
|
||||
claude -p "I want to add user authentication to my app. Help me think through this." \
|
||||
--plugin-dir "$PLUGIN_DIR" \
|
||||
--dangerously-skip-permissions \
|
||||
--max-turns 3 \
|
||||
--output-format stream-json \
|
||||
> "$OUTPUT_DIR/turn1.json" 2>&1 || true
|
||||
echo "Done."
|
||||
|
||||
# Turn 2: Answer a brainstorming question
|
||||
echo ">>> Turn 2: Answering questions..."
|
||||
claude -p "Let's use JWT tokens with 24-hour expiry. Email/password registration." \
|
||||
--continue \
|
||||
--plugin-dir "$PLUGIN_DIR" \
|
||||
--dangerously-skip-permissions \
|
||||
--max-turns 3 \
|
||||
--output-format stream-json \
|
||||
> "$OUTPUT_DIR/turn2.json" 2>&1 || true
|
||||
echo "Done."
|
||||
|
||||
# Turn 3: Ask to write a plan
|
||||
echo ">>> Turn 3: Requesting plan..."
|
||||
claude -p "Great, write this up as an implementation plan." \
|
||||
--continue \
|
||||
--plugin-dir "$PLUGIN_DIR" \
|
||||
--dangerously-skip-permissions \
|
||||
--max-turns 3 \
|
||||
--output-format stream-json \
|
||||
> "$OUTPUT_DIR/turn3.json" 2>&1 || true
|
||||
echo "Done."
|
||||
|
||||
# Turn 4: Confirm plan looks good
|
||||
echo ">>> Turn 4: Confirming plan..."
|
||||
claude -p "The plan looks good. What are my options for executing it?" \
|
||||
--continue \
|
||||
--plugin-dir "$PLUGIN_DIR" \
|
||||
--dangerously-skip-permissions \
|
||||
--max-turns 2 \
|
||||
--output-format stream-json \
|
||||
> "$OUTPUT_DIR/turn4.json" 2>&1 || true
|
||||
echo "Done."
|
||||
|
||||
# Turn 5: THE CRITICAL TEST
|
||||
echo ">>> Turn 5: Requesting subagent-driven-development..."
|
||||
FINAL_LOG="$OUTPUT_DIR/turn5.json"
|
||||
claude -p "subagent-driven-development, please" \
|
||||
--continue \
|
||||
--plugin-dir "$PLUGIN_DIR" \
|
||||
--dangerously-skip-permissions \
|
||||
--max-turns 2 \
|
||||
--output-format stream-json \
|
||||
> "$FINAL_LOG" 2>&1 || true
|
||||
echo "Done."
|
||||
echo ""
|
||||
|
||||
echo "=== Results ==="
|
||||
|
||||
# Check final turn
|
||||
SKILL_PATTERN='"skill":"([^"]*:)?subagent-driven-development"'
|
||||
if grep -q '"name":"Skill"' "$FINAL_LOG" && grep -qE "$SKILL_PATTERN" "$FINAL_LOG"; then
|
||||
echo "PASS: Skill was triggered"
|
||||
TRIGGERED=true
|
||||
else
|
||||
echo "FAIL: Skill was NOT triggered"
|
||||
TRIGGERED=false
|
||||
|
||||
# Show what was invoked instead
|
||||
echo ""
|
||||
echo "Tools invoked in final turn:"
|
||||
grep '"type":"tool_use"' "$FINAL_LOG" | jq -r '.content[] | select(.type=="tool_use") | .name' 2>/dev/null | head -10 || \
|
||||
grep -o '"name":"[^"]*"' "$FINAL_LOG" | head -10 || echo " (none found)"
|
||||
fi
|
||||
|
||||
echo ""
|
||||
echo "Skills triggered:"
|
||||
grep -o '"skill":"[^"]*"' "$FINAL_LOG" 2>/dev/null | sort -u || echo " (none)"
|
||||
|
||||
echo ""
|
||||
echo "Final turn response (first 500 chars):"
|
||||
grep '"type":"assistant"' "$FINAL_LOG" | head -1 | jq -r '.message.content[0].text // .message.content' 2>/dev/null | head -c 500 || echo " (could not extract)"
|
||||
|
||||
echo ""
|
||||
echo "Logs in: $OUTPUT_DIR"
|
||||
|
||||
if [ "$TRIGGERED" = "true" ]; then
|
||||
exit 0
|
||||
else
|
||||
exit 1
|
||||
fi
|
||||
@@ -0,0 +1,144 @@
|
||||
#!/bin/bash
|
||||
# Test with haiku model and user's CLAUDE.md
|
||||
# This tests whether a cheaper/faster model fails more easily
|
||||
|
||||
set -e
|
||||
|
||||
SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
|
||||
PLUGIN_DIR="$(cd "$SCRIPT_DIR/../.." && pwd)"
|
||||
|
||||
TIMESTAMP=$(date +%s)
|
||||
OUTPUT_DIR="/tmp/superpowers-tests/${TIMESTAMP}/explicit-skill-requests/haiku"
|
||||
mkdir -p "$OUTPUT_DIR"
|
||||
|
||||
PROJECT_DIR="$OUTPUT_DIR/project"
|
||||
mkdir -p "$PROJECT_DIR/docs/plans"
|
||||
mkdir -p "$PROJECT_DIR/.claude"
|
||||
|
||||
echo "=== Haiku Model Test with User CLAUDE.md ==="
|
||||
echo "Output dir: $OUTPUT_DIR"
|
||||
echo "Plugin dir: $PLUGIN_DIR"
|
||||
echo ""
|
||||
|
||||
cd "$PROJECT_DIR"
|
||||
|
||||
# Copy user's CLAUDE.md to simulate real environment
|
||||
if [ -f "$HOME/.claude/CLAUDE.md" ]; then
|
||||
cp "$HOME/.claude/CLAUDE.md" "$PROJECT_DIR/.claude/CLAUDE.md"
|
||||
echo "Copied user CLAUDE.md"
|
||||
else
|
||||
echo "No user CLAUDE.md found, proceeding without"
|
||||
fi
|
||||
|
||||
# Create a dummy plan file
|
||||
cat > "$PROJECT_DIR/docs/plans/auth-system.md" << 'EOF'
|
||||
# Auth System Implementation Plan
|
||||
|
||||
## Task 1: Add User Model
|
||||
Create user model with email and password fields.
|
||||
|
||||
## Task 2: Add Auth Routes
|
||||
Create login and register endpoints.
|
||||
|
||||
## Task 3: Add JWT Middleware
|
||||
Protect routes with JWT validation.
|
||||
|
||||
## Task 4: Write Tests
|
||||
Add comprehensive test coverage.
|
||||
EOF
|
||||
|
||||
echo ""
|
||||
|
||||
# Turn 1: Start brainstorming
|
||||
echo ">>> Turn 1: Brainstorming request..."
|
||||
claude -p "I want to add user authentication to my app. Help me think through this." \
|
||||
--model haiku \
|
||||
--plugin-dir "$PLUGIN_DIR" \
|
||||
--dangerously-skip-permissions \
|
||||
--max-turns 3 \
|
||||
--output-format stream-json \
|
||||
> "$OUTPUT_DIR/turn1.json" 2>&1 || true
|
||||
echo "Done."
|
||||
|
||||
# Turn 2: Answer questions
|
||||
echo ">>> Turn 2: Answering questions..."
|
||||
claude -p "Let's use JWT tokens with 24-hour expiry. Email/password registration." \
|
||||
--continue \
|
||||
--model haiku \
|
||||
--plugin-dir "$PLUGIN_DIR" \
|
||||
--dangerously-skip-permissions \
|
||||
--max-turns 3 \
|
||||
--output-format stream-json \
|
||||
> "$OUTPUT_DIR/turn2.json" 2>&1 || true
|
||||
echo "Done."
|
||||
|
||||
# Turn 3: Ask to write a plan
|
||||
echo ">>> Turn 3: Requesting plan..."
|
||||
claude -p "Great, write this up as an implementation plan." \
|
||||
--continue \
|
||||
--model haiku \
|
||||
--plugin-dir "$PLUGIN_DIR" \
|
||||
--dangerously-skip-permissions \
|
||||
--max-turns 3 \
|
||||
--output-format stream-json \
|
||||
> "$OUTPUT_DIR/turn3.json" 2>&1 || true
|
||||
echo "Done."
|
||||
|
||||
# Turn 4: Confirm plan looks good
|
||||
echo ">>> Turn 4: Confirming plan..."
|
||||
claude -p "The plan looks good. What are my options for executing it?" \
|
||||
--continue \
|
||||
--model haiku \
|
||||
--plugin-dir "$PLUGIN_DIR" \
|
||||
--dangerously-skip-permissions \
|
||||
--max-turns 2 \
|
||||
--output-format stream-json \
|
||||
> "$OUTPUT_DIR/turn4.json" 2>&1 || true
|
||||
echo "Done."
|
||||
|
||||
# Turn 5: THE CRITICAL TEST
|
||||
echo ">>> Turn 5: Requesting subagent-driven-development..."
|
||||
FINAL_LOG="$OUTPUT_DIR/turn5.json"
|
||||
claude -p "subagent-driven-development, please" \
|
||||
--continue \
|
||||
--model haiku \
|
||||
--plugin-dir "$PLUGIN_DIR" \
|
||||
--dangerously-skip-permissions \
|
||||
--max-turns 2 \
|
||||
--output-format stream-json \
|
||||
> "$FINAL_LOG" 2>&1 || true
|
||||
echo "Done."
|
||||
echo ""
|
||||
|
||||
echo "=== Results (Haiku) ==="
|
||||
|
||||
# Check final turn
|
||||
SKILL_PATTERN='"skill":"([^"]*:)?subagent-driven-development"'
|
||||
if grep -q '"name":"Skill"' "$FINAL_LOG" && grep -qE "$SKILL_PATTERN" "$FINAL_LOG"; then
|
||||
echo "PASS: Skill was triggered"
|
||||
TRIGGERED=true
|
||||
else
|
||||
echo "FAIL: Skill was NOT triggered"
|
||||
TRIGGERED=false
|
||||
|
||||
echo ""
|
||||
echo "Tools invoked in final turn:"
|
||||
grep '"type":"tool_use"' "$FINAL_LOG" | grep -o '"name":"[^"]*"' | head -10 || echo " (none)"
|
||||
fi
|
||||
|
||||
echo ""
|
||||
echo "Skills triggered:"
|
||||
grep -o '"skill":"[^"]*"' "$FINAL_LOG" 2>/dev/null | sort -u || echo " (none)"
|
||||
|
||||
echo ""
|
||||
echo "Final turn response (first 500 chars):"
|
||||
grep '"type":"assistant"' "$FINAL_LOG" | head -1 | jq -r '.message.content[0].text // .message.content' 2>/dev/null | head -c 500 || echo " (could not extract)"
|
||||
|
||||
echo ""
|
||||
echo "Logs in: $OUTPUT_DIR"
|
||||
|
||||
if [ "$TRIGGERED" = "true" ]; then
|
||||
exit 0
|
||||
else
|
||||
exit 1
|
||||
fi
|
||||
@@ -0,0 +1,143 @@
|
||||
#!/bin/bash
|
||||
# Test explicit skill requests in multi-turn conversations
|
||||
# Usage: ./run-multiturn-test.sh
|
||||
#
|
||||
# This test builds actual conversation history to reproduce the failure mode
|
||||
# where Claude skips skill invocation after extended conversation
|
||||
|
||||
set -e
|
||||
|
||||
SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
|
||||
PLUGIN_DIR="$(cd "$SCRIPT_DIR/../.." && pwd)"
|
||||
|
||||
TIMESTAMP=$(date +%s)
|
||||
OUTPUT_DIR="/tmp/superpowers-tests/${TIMESTAMP}/explicit-skill-requests/multiturn"
|
||||
mkdir -p "$OUTPUT_DIR"
|
||||
|
||||
# Create project directory (conversation is cwd-based)
|
||||
PROJECT_DIR="$OUTPUT_DIR/project"
|
||||
mkdir -p "$PROJECT_DIR/docs/plans"
|
||||
|
||||
echo "=== Multi-Turn Explicit Skill Request Test ==="
|
||||
echo "Output dir: $OUTPUT_DIR"
|
||||
echo "Project dir: $PROJECT_DIR"
|
||||
echo "Plugin dir: $PLUGIN_DIR"
|
||||
echo ""
|
||||
|
||||
cd "$PROJECT_DIR"
|
||||
|
||||
# Create a dummy plan file
|
||||
cat > "$PROJECT_DIR/docs/plans/auth-system.md" << 'EOF'
|
||||
# Auth System Implementation Plan
|
||||
|
||||
## Task 1: Add User Model
|
||||
Create user model with email and password fields.
|
||||
|
||||
## Task 2: Add Auth Routes
|
||||
Create login and register endpoints.
|
||||
|
||||
## Task 3: Add JWT Middleware
|
||||
Protect routes with JWT validation.
|
||||
|
||||
## Task 4: Write Tests
|
||||
Add comprehensive test coverage.
|
||||
EOF
|
||||
|
||||
# Turn 1: Start a planning conversation
|
||||
echo ">>> Turn 1: Starting planning conversation..."
|
||||
TURN1_LOG="$OUTPUT_DIR/turn1.json"
|
||||
claude -p "I need to implement an authentication system. Let's plan this out. The requirements are: user registration with email/password, JWT tokens, and protected routes." \
|
||||
--plugin-dir "$PLUGIN_DIR" \
|
||||
--dangerously-skip-permissions \
|
||||
--max-turns 2 \
|
||||
--output-format stream-json \
|
||||
> "$TURN1_LOG" 2>&1 || true
|
||||
|
||||
echo "Turn 1 complete."
|
||||
echo ""
|
||||
|
||||
# Turn 2: Continue with more planning detail
|
||||
echo ">>> Turn 2: Continuing planning..."
|
||||
TURN2_LOG="$OUTPUT_DIR/turn2.json"
|
||||
claude -p "Good analysis. I've already written the plan to docs/plans/auth-system.md. Now I'm ready to implement. What are my options for execution?" \
|
||||
--continue \
|
||||
--plugin-dir "$PLUGIN_DIR" \
|
||||
--dangerously-skip-permissions \
|
||||
--max-turns 2 \
|
||||
--output-format stream-json \
|
||||
> "$TURN2_LOG" 2>&1 || true
|
||||
|
||||
echo "Turn 2 complete."
|
||||
echo ""
|
||||
|
||||
# Turn 3: The critical test - ask for subagent-driven-development
|
||||
echo ">>> Turn 3: Requesting subagent-driven-development..."
|
||||
TURN3_LOG="$OUTPUT_DIR/turn3.json"
|
||||
claude -p "subagent-driven-development, please" \
|
||||
--continue \
|
||||
--plugin-dir "$PLUGIN_DIR" \
|
||||
--dangerously-skip-permissions \
|
||||
--max-turns 2 \
|
||||
--output-format stream-json \
|
||||
> "$TURN3_LOG" 2>&1 || true
|
||||
|
||||
echo "Turn 3 complete."
|
||||
echo ""
|
||||
|
||||
echo "=== Results ==="
|
||||
|
||||
# Check if skill was triggered in Turn 3
|
||||
SKILL_PATTERN='"skill":"([^"]*:)?subagent-driven-development"'
|
||||
if grep -q '"name":"Skill"' "$TURN3_LOG" && grep -qE "$SKILL_PATTERN" "$TURN3_LOG"; then
|
||||
echo "PASS: Skill 'subagent-driven-development' was triggered in Turn 3"
|
||||
TRIGGERED=true
|
||||
else
|
||||
echo "FAIL: Skill 'subagent-driven-development' was NOT triggered in Turn 3"
|
||||
TRIGGERED=false
|
||||
fi
|
||||
|
||||
# Show what skills were triggered
|
||||
echo ""
|
||||
echo "Skills triggered in Turn 3:"
|
||||
grep -o '"skill":"[^"]*"' "$TURN3_LOG" 2>/dev/null | sort -u || echo " (none)"
|
||||
|
||||
# Check for premature action in Turn 3
|
||||
echo ""
|
||||
echo "Checking for premature action in Turn 3..."
|
||||
FIRST_SKILL_LINE=$(grep -n '"name":"Skill"' "$TURN3_LOG" | head -1 | cut -d: -f1)
|
||||
if [ -n "$FIRST_SKILL_LINE" ]; then
|
||||
PREMATURE_TOOLS=$(head -n "$FIRST_SKILL_LINE" "$TURN3_LOG" | \
|
||||
grep '"type":"tool_use"' | \
|
||||
grep -v '"name":"Skill"' | \
|
||||
grep -v '"name":"TodoWrite"' || true)
|
||||
if [ -n "$PREMATURE_TOOLS" ]; then
|
||||
echo "WARNING: Tools invoked BEFORE Skill tool in Turn 3:"
|
||||
echo "$PREMATURE_TOOLS" | head -5
|
||||
else
|
||||
echo "OK: No premature tool invocations detected"
|
||||
fi
|
||||
else
|
||||
echo "WARNING: No Skill invocation found in Turn 3"
|
||||
# Show what WAS invoked
|
||||
echo ""
|
||||
echo "Tools invoked in Turn 3:"
|
||||
grep '"type":"tool_use"' "$TURN3_LOG" | grep -o '"name":"[^"]*"' | head -10 || echo " (none)"
|
||||
fi
|
||||
|
||||
# Show Turn 3 assistant response
|
||||
echo ""
|
||||
echo "Turn 3 first assistant response (truncated):"
|
||||
grep '"type":"assistant"' "$TURN3_LOG" | head -1 | jq -r '.message.content[0].text // .message.content' 2>/dev/null | head -c 500 || echo " (could not extract)"
|
||||
|
||||
echo ""
|
||||
echo "Logs:"
|
||||
echo " Turn 1: $TURN1_LOG"
|
||||
echo " Turn 2: $TURN2_LOG"
|
||||
echo " Turn 3: $TURN3_LOG"
|
||||
echo "Timestamp: $TIMESTAMP"
|
||||
|
||||
if [ "$TRIGGERED" = "true" ]; then
|
||||
exit 0
|
||||
else
|
||||
exit 1
|
||||
fi
|
||||
136
skills/superpowers/tests/explicit-skill-requests/run-test.sh
Normal file
136
skills/superpowers/tests/explicit-skill-requests/run-test.sh
Normal file
@@ -0,0 +1,136 @@
|
||||
#!/bin/bash
|
||||
# Test explicit skill requests (user names a skill directly)
|
||||
# Usage: ./run-test.sh <skill-name> <prompt-file>
|
||||
#
|
||||
# Tests whether Claude invokes a skill when the user explicitly requests it by name
|
||||
# (without using the plugin namespace prefix)
|
||||
#
|
||||
# Uses isolated HOME to avoid user context interference
|
||||
|
||||
set -e
|
||||
|
||||
SKILL_NAME="$1"
|
||||
PROMPT_FILE="$2"
|
||||
MAX_TURNS="${3:-3}"
|
||||
|
||||
if [ -z "$SKILL_NAME" ] || [ -z "$PROMPT_FILE" ]; then
|
||||
echo "Usage: $0 <skill-name> <prompt-file> [max-turns]"
|
||||
echo "Example: $0 subagent-driven-development ./prompts/subagent-driven-development-please.txt"
|
||||
exit 1
|
||||
fi
|
||||
|
||||
# Get the directory where this script lives
|
||||
SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
|
||||
# Get the superpowers plugin root (two levels up)
|
||||
PLUGIN_DIR="$(cd "$SCRIPT_DIR/../.." && pwd)"
|
||||
|
||||
TIMESTAMP=$(date +%s)
|
||||
OUTPUT_DIR="/tmp/superpowers-tests/${TIMESTAMP}/explicit-skill-requests/${SKILL_NAME}"
|
||||
mkdir -p "$OUTPUT_DIR"
|
||||
|
||||
# Read prompt from file
|
||||
PROMPT=$(cat "$PROMPT_FILE")
|
||||
|
||||
echo "=== Explicit Skill Request Test ==="
|
||||
echo "Skill: $SKILL_NAME"
|
||||
echo "Prompt file: $PROMPT_FILE"
|
||||
echo "Max turns: $MAX_TURNS"
|
||||
echo "Output dir: $OUTPUT_DIR"
|
||||
echo ""
|
||||
|
||||
# Copy prompt for reference
|
||||
cp "$PROMPT_FILE" "$OUTPUT_DIR/prompt.txt"
|
||||
|
||||
# Create a minimal project directory for the test
|
||||
PROJECT_DIR="$OUTPUT_DIR/project"
|
||||
mkdir -p "$PROJECT_DIR/docs/plans"
|
||||
|
||||
# Create a dummy plan file for mid-conversation tests
|
||||
cat > "$PROJECT_DIR/docs/plans/auth-system.md" << 'EOF'
|
||||
# Auth System Implementation Plan
|
||||
|
||||
## Task 1: Add User Model
|
||||
Create user model with email and password fields.
|
||||
|
||||
## Task 2: Add Auth Routes
|
||||
Create login and register endpoints.
|
||||
|
||||
## Task 3: Add JWT Middleware
|
||||
Protect routes with JWT validation.
|
||||
EOF
|
||||
|
||||
# Run Claude with isolated environment
|
||||
LOG_FILE="$OUTPUT_DIR/claude-output.json"
|
||||
cd "$PROJECT_DIR"
|
||||
|
||||
echo "Plugin dir: $PLUGIN_DIR"
|
||||
echo "Running claude -p with explicit skill request..."
|
||||
echo "Prompt: $PROMPT"
|
||||
echo ""
|
||||
|
||||
timeout 300 claude -p "$PROMPT" \
|
||||
--plugin-dir "$PLUGIN_DIR" \
|
||||
--dangerously-skip-permissions \
|
||||
--max-turns "$MAX_TURNS" \
|
||||
--output-format stream-json \
|
||||
> "$LOG_FILE" 2>&1 || true
|
||||
|
||||
echo ""
|
||||
echo "=== Results ==="
|
||||
|
||||
# Check if skill was triggered (look for Skill tool invocation)
|
||||
# Match either "skill":"skillname" or "skill":"namespace:skillname"
|
||||
SKILL_PATTERN='"skill":"([^"]*:)?'"${SKILL_NAME}"'"'
|
||||
if grep -q '"name":"Skill"' "$LOG_FILE" && grep -qE "$SKILL_PATTERN" "$LOG_FILE"; then
|
||||
echo "PASS: Skill '$SKILL_NAME' was triggered"
|
||||
TRIGGERED=true
|
||||
else
|
||||
echo "FAIL: Skill '$SKILL_NAME' was NOT triggered"
|
||||
TRIGGERED=false
|
||||
fi
|
||||
|
||||
# Show what skills WERE triggered
|
||||
echo ""
|
||||
echo "Skills triggered in this run:"
|
||||
grep -o '"skill":"[^"]*"' "$LOG_FILE" 2>/dev/null | sort -u || echo " (none)"
|
||||
|
||||
# Check if Claude took action BEFORE invoking the skill (the failure mode)
|
||||
echo ""
|
||||
echo "Checking for premature action..."
|
||||
|
||||
# Look for tool invocations before the Skill invocation
|
||||
# This detects the failure mode where Claude starts doing work without loading the skill
|
||||
FIRST_SKILL_LINE=$(grep -n '"name":"Skill"' "$LOG_FILE" | head -1 | cut -d: -f1)
|
||||
if [ -n "$FIRST_SKILL_LINE" ]; then
|
||||
# Check if any non-Skill, non-system tools were invoked before the first Skill invocation
|
||||
# Filter out system messages, TodoWrite (planning is ok), and other non-action tools
|
||||
PREMATURE_TOOLS=$(head -n "$FIRST_SKILL_LINE" "$LOG_FILE" | \
|
||||
grep '"type":"tool_use"' | \
|
||||
grep -v '"name":"Skill"' | \
|
||||
grep -v '"name":"TodoWrite"' || true)
|
||||
if [ -n "$PREMATURE_TOOLS" ]; then
|
||||
echo "WARNING: Tools invoked BEFORE Skill tool:"
|
||||
echo "$PREMATURE_TOOLS" | head -5
|
||||
echo ""
|
||||
echo "This indicates Claude started working before loading the requested skill."
|
||||
else
|
||||
echo "OK: No premature tool invocations detected"
|
||||
fi
|
||||
else
|
||||
echo "WARNING: No Skill invocation found at all"
|
||||
fi
|
||||
|
||||
# Show first assistant message
|
||||
echo ""
|
||||
echo "First assistant response (truncated):"
|
||||
grep '"type":"assistant"' "$LOG_FILE" | head -1 | jq -r '.message.content[0].text // .message.content' 2>/dev/null | head -c 500 || echo " (could not extract)"
|
||||
|
||||
echo ""
|
||||
echo "Full log: $LOG_FILE"
|
||||
echo "Timestamp: $TIMESTAMP"
|
||||
|
||||
if [ "$TRIGGERED" = "true" ]; then
|
||||
exit 0
|
||||
else
|
||||
exit 1
|
||||
fi
|
||||
Reference in New Issue
Block a user