v1.4.0: Major Skills Expansion - 75 Total Skills
This commit is contained in:
@@ -0,0 +1,294 @@
|
||||
# OpenCode Support Design
|
||||
|
||||
**Date:** 2025-11-22
|
||||
**Author:** Bot & Jesse
|
||||
**Status:** Design Complete, Awaiting Implementation
|
||||
|
||||
## Overview
|
||||
|
||||
Add full superpowers support for OpenCode.ai using a native OpenCode plugin architecture that shares core functionality with the existing Codex implementation.
|
||||
|
||||
## Background
|
||||
|
||||
OpenCode.ai is a coding agent similar to Claude Code and Codex. Previous attempts to port superpowers to OpenCode (PR #93, PR #116) used file-copying approaches. This design takes a different approach: building a native OpenCode plugin using their JavaScript/TypeScript plugin system while sharing code with the Codex implementation.
|
||||
|
||||
### Key Differences Between Platforms
|
||||
|
||||
- **Claude Code**: Native Anthropic plugin system + file-based skills
|
||||
- **Codex**: No plugin system → bootstrap markdown + CLI script
|
||||
- **OpenCode**: JavaScript/TypeScript plugins with event hooks and custom tools API
|
||||
|
||||
### OpenCode's Agent System
|
||||
|
||||
- **Primary agents**: Build (default, full access) and Plan (restricted, read-only)
|
||||
- **Subagents**: General (research, searching, multi-step tasks)
|
||||
- **Invocation**: Automatic dispatch by primary agents OR manual `@mention` syntax
|
||||
- **Configuration**: Custom agents in `opencode.json` or `~/.config/opencode/agent/`
|
||||
|
||||
## Architecture
|
||||
|
||||
### High-Level Structure
|
||||
|
||||
1. **Shared Core Module** (`lib/skills-core.js`)
|
||||
- Common skill discovery and parsing logic
|
||||
- Used by both Codex and OpenCode implementations
|
||||
|
||||
2. **Platform-Specific Wrappers**
|
||||
- Codex: CLI script (`.codex/superpowers-codex`)
|
||||
- OpenCode: Plugin module (`.opencode/plugin/superpowers.js`)
|
||||
|
||||
3. **Skill Directories**
|
||||
- Core: `~/.config/opencode/superpowers/skills/` (or installed location)
|
||||
- Personal: `~/.config/opencode/skills/` (shadows core skills)
|
||||
|
||||
### Code Reuse Strategy
|
||||
|
||||
Extract common functionality from `.codex/superpowers-codex` into shared module:
|
||||
|
||||
```javascript
|
||||
// lib/skills-core.js
|
||||
module.exports = {
|
||||
extractFrontmatter(filePath), // Parse name + description from YAML
|
||||
findSkillsInDir(dir, maxDepth), // Recursive SKILL.md discovery
|
||||
findAllSkills(dirs), // Scan multiple directories
|
||||
resolveSkillPath(skillName, dirs), // Handle shadowing (personal > core)
|
||||
checkForUpdates(repoDir) // Git fetch/status check
|
||||
};
|
||||
```
|
||||
|
||||
### Skill Frontmatter Format
|
||||
|
||||
Current format (no `when_to_use` field):
|
||||
|
||||
```yaml
|
||||
---
|
||||
name: skill-name
|
||||
description: Use when [condition] - [what it does]; [additional context]
|
||||
---
|
||||
```
|
||||
|
||||
## OpenCode Plugin Implementation
|
||||
|
||||
### Custom Tools
|
||||
|
||||
**Tool 1: `use_skill`**
|
||||
|
||||
Loads a specific skill's content into the conversation (equivalent to Claude's Skill tool).
|
||||
|
||||
```javascript
|
||||
{
|
||||
name: 'use_skill',
|
||||
description: 'Load and read a specific skill to guide your work',
|
||||
schema: z.object({
|
||||
skill_name: z.string().describe('Name of skill (e.g., "superpowers:brainstorming")')
|
||||
}),
|
||||
execute: async ({ skill_name }) => {
|
||||
const { skillPath, content, frontmatter } = resolveAndReadSkill(skill_name);
|
||||
const skillDir = path.dirname(skillPath);
|
||||
|
||||
return `# ${frontmatter.name}
|
||||
# ${frontmatter.description}
|
||||
# Supporting tools and docs are in ${skillDir}
|
||||
# ============================================
|
||||
|
||||
${content}`;
|
||||
}
|
||||
}
|
||||
```
|
||||
|
||||
**Tool 2: `find_skills`**
|
||||
|
||||
Lists all available skills with metadata.
|
||||
|
||||
```javascript
|
||||
{
|
||||
name: 'find_skills',
|
||||
description: 'List all available skills',
|
||||
schema: z.object({}),
|
||||
execute: async () => {
|
||||
const skills = discoverAllSkills();
|
||||
return skills.map(s =>
|
||||
`${s.namespace}:${s.name}
|
||||
${s.description}
|
||||
Directory: ${s.directory}
|
||||
`).join('\n');
|
||||
}
|
||||
}
|
||||
```
|
||||
|
||||
### Session Startup Hook
|
||||
|
||||
When a new session starts (`session.started` event):
|
||||
|
||||
1. **Inject using-superpowers content**
|
||||
- Full content of the using-superpowers skill
|
||||
- Establishes mandatory workflows
|
||||
|
||||
2. **Run find_skills automatically**
|
||||
- Display full list of available skills upfront
|
||||
- Include skill directories for each
|
||||
|
||||
3. **Inject tool mapping instructions**
|
||||
```markdown
|
||||
**Tool Mapping for OpenCode:**
|
||||
When skills reference tools you don't have, substitute:
|
||||
- `TodoWrite` → `update_plan`
|
||||
- `Task` with subagents → Use OpenCode subagent system (@mention)
|
||||
- `Skill` tool → `use_skill` custom tool
|
||||
- Read, Write, Edit, Bash → Your native equivalents
|
||||
|
||||
**Skill directories contain:**
|
||||
- Supporting scripts (run with bash)
|
||||
- Additional documentation (read with read tool)
|
||||
- Utilities specific to that skill
|
||||
```
|
||||
|
||||
4. **Check for updates** (non-blocking)
|
||||
- Quick git fetch with timeout
|
||||
- Notify if updates available
|
||||
|
||||
### Plugin Structure
|
||||
|
||||
```javascript
|
||||
// .opencode/plugin/superpowers.js
|
||||
const skillsCore = require('../../lib/skills-core');
|
||||
const path = require('path');
|
||||
const fs = require('fs');
|
||||
const { z } = require('zod');
|
||||
|
||||
export const SuperpowersPlugin = async ({ client, directory, $ }) => {
|
||||
const superpowersDir = path.join(process.env.HOME, '.config/opencode/superpowers');
|
||||
const personalDir = path.join(process.env.HOME, '.config/opencode/skills');
|
||||
|
||||
return {
|
||||
'session.started': async () => {
|
||||
const usingSuperpowers = await readSkill('using-superpowers');
|
||||
const skillsList = await findAllSkills();
|
||||
const toolMapping = getToolMappingInstructions();
|
||||
|
||||
return {
|
||||
context: `${usingSuperpowers}\n\n${skillsList}\n\n${toolMapping}`
|
||||
};
|
||||
},
|
||||
|
||||
tools: [
|
||||
{
|
||||
name: 'use_skill',
|
||||
description: 'Load and read a specific skill',
|
||||
schema: z.object({
|
||||
skill_name: z.string()
|
||||
}),
|
||||
execute: async ({ skill_name }) => {
|
||||
// Implementation using skillsCore
|
||||
}
|
||||
},
|
||||
{
|
||||
name: 'find_skills',
|
||||
description: 'List all available skills',
|
||||
schema: z.object({}),
|
||||
execute: async () => {
|
||||
// Implementation using skillsCore
|
||||
}
|
||||
}
|
||||
]
|
||||
};
|
||||
};
|
||||
```
|
||||
|
||||
## File Structure
|
||||
|
||||
```
|
||||
superpowers/
|
||||
├── lib/
|
||||
│ └── skills-core.js # NEW: Shared skill logic
|
||||
├── .codex/
|
||||
│ ├── superpowers-codex # UPDATED: Use skills-core
|
||||
│ ├── superpowers-bootstrap.md
|
||||
│ └── INSTALL.md
|
||||
├── .opencode/
|
||||
│ ├── plugin/
|
||||
│ │ └── superpowers.js # NEW: OpenCode plugin
|
||||
│ └── INSTALL.md # NEW: Installation guide
|
||||
└── skills/ # Unchanged
|
||||
```
|
||||
|
||||
## Implementation Plan
|
||||
|
||||
### Phase 1: Refactor Shared Core
|
||||
|
||||
1. Create `lib/skills-core.js`
|
||||
- Extract frontmatter parsing from `.codex/superpowers-codex`
|
||||
- Extract skill discovery logic
|
||||
- Extract path resolution (with shadowing)
|
||||
- Update to use only `name` and `description` (no `when_to_use`)
|
||||
|
||||
2. Update `.codex/superpowers-codex` to use shared core
|
||||
- Import from `../lib/skills-core.js`
|
||||
- Remove duplicated code
|
||||
- Keep CLI wrapper logic
|
||||
|
||||
3. Test Codex implementation still works
|
||||
- Verify bootstrap command
|
||||
- Verify use-skill command
|
||||
- Verify find-skills command
|
||||
|
||||
### Phase 2: Build OpenCode Plugin
|
||||
|
||||
1. Create `.opencode/plugin/superpowers.js`
|
||||
- Import shared core from `../../lib/skills-core.js`
|
||||
- Implement plugin function
|
||||
- Define custom tools (use_skill, find_skills)
|
||||
- Implement session.started hook
|
||||
|
||||
2. Create `.opencode/INSTALL.md`
|
||||
- Installation instructions
|
||||
- Directory setup
|
||||
- Configuration guidance
|
||||
|
||||
3. Test OpenCode implementation
|
||||
- Verify session startup bootstrap
|
||||
- Verify use_skill tool works
|
||||
- Verify find_skills tool works
|
||||
- Verify skill directories are accessible
|
||||
|
||||
### Phase 3: Documentation & Polish
|
||||
|
||||
1. Update README with OpenCode support
|
||||
2. Add OpenCode installation to main docs
|
||||
3. Update RELEASE-NOTES
|
||||
4. Test both Codex and OpenCode work correctly
|
||||
|
||||
## Next Steps
|
||||
|
||||
1. **Create isolated workspace** (using git worktrees)
|
||||
- Branch: `feature/opencode-support`
|
||||
|
||||
2. **Follow TDD where applicable**
|
||||
- Test shared core functions
|
||||
- Test skill discovery and parsing
|
||||
- Integration tests for both platforms
|
||||
|
||||
3. **Incremental implementation**
|
||||
- Phase 1: Refactor shared core + update Codex
|
||||
- Verify Codex still works before moving on
|
||||
- Phase 2: Build OpenCode plugin
|
||||
- Phase 3: Documentation and polish
|
||||
|
||||
4. **Testing strategy**
|
||||
- Manual testing with real OpenCode installation
|
||||
- Verify skill loading, directories, scripts work
|
||||
- Test both Codex and OpenCode side-by-side
|
||||
- Verify tool mappings work correctly
|
||||
|
||||
5. **PR and merge**
|
||||
- Create PR with complete implementation
|
||||
- Test in clean environment
|
||||
- Merge to main
|
||||
|
||||
## Benefits
|
||||
|
||||
- **Code reuse**: Single source of truth for skill discovery/parsing
|
||||
- **Maintainability**: Bug fixes apply to both platforms
|
||||
- **Extensibility**: Easy to add future platforms (Cursor, Windsurf, etc.)
|
||||
- **Native integration**: Uses OpenCode's plugin system properly
|
||||
- **Consistency**: Same skill experience across all platforms
|
||||
File diff suppressed because it is too large
Load Diff
@@ -0,0 +1,711 @@
|
||||
# Skills Improvements from User Feedback
|
||||
|
||||
**Date:** 2025-11-28
|
||||
**Status:** Draft
|
||||
**Source:** Two Claude instances using superpowers in real development scenarios
|
||||
|
||||
---
|
||||
|
||||
## Executive Summary
|
||||
|
||||
Two Claude instances provided detailed feedback from actual development sessions. Their feedback reveals **systematic gaps** in current skills that allowed preventable bugs to ship despite following the skills.
|
||||
|
||||
**Critical insight:** These are problem reports, not just solution proposals. The problems are real; the solutions need careful evaluation.
|
||||
|
||||
**Key themes:**
|
||||
1. **Verification gaps** - We verify operations succeed but not that they achieve intended outcomes
|
||||
2. **Process hygiene** - Background processes accumulate and interfere across subagents
|
||||
3. **Context optimization** - Subagents get too much irrelevant information
|
||||
4. **Self-reflection missing** - No prompt to critique own work before handoff
|
||||
5. **Mock safety** - Mocks can drift from interfaces without detection
|
||||
6. **Skill activation** - Skills exist but aren't being read/used
|
||||
|
||||
---
|
||||
|
||||
## Problems Identified
|
||||
|
||||
### Problem 1: Configuration Change Verification Gap
|
||||
|
||||
**What happened:**
|
||||
- Subagent tested "OpenAI integration"
|
||||
- Set `OPENAI_API_KEY` env var
|
||||
- Got status 200 responses
|
||||
- Reported "OpenAI integration working"
|
||||
- **BUT** response contained `"model": "claude-sonnet-4-20250514"` - was actually using Anthropic
|
||||
|
||||
**Root cause:**
|
||||
`verification-before-completion` checks operations succeed but not that outcomes reflect intended configuration changes.
|
||||
|
||||
**Impact:** High - False confidence in integration tests, bugs ship to production
|
||||
|
||||
**Example failure pattern:**
|
||||
- Switch LLM provider → verify status 200 but don't check model name
|
||||
- Enable feature flag → verify no errors but don't check feature is active
|
||||
- Change environment → verify deployment succeeds but don't check environment vars
|
||||
|
||||
---
|
||||
|
||||
### Problem 2: Background Process Accumulation
|
||||
|
||||
**What happened:**
|
||||
- Multiple subagents dispatched during session
|
||||
- Each started background server processes
|
||||
- Processes accumulated (4+ servers running)
|
||||
- Stale processes still bound to ports
|
||||
- Later E2E test hit stale server with wrong config
|
||||
- Confusing/incorrect test results
|
||||
|
||||
**Root cause:**
|
||||
Subagents are stateless - don't know about previous subagents' processes. No cleanup protocol.
|
||||
|
||||
**Impact:** Medium-High - Tests hit wrong server, false passes/failures, debugging confusion
|
||||
|
||||
---
|
||||
|
||||
### Problem 3: Context Bloat in Subagent Prompts
|
||||
|
||||
**What happened:**
|
||||
- Standard approach: give subagent full plan file to read
|
||||
- Experiment: give only task + pattern + file + verify command
|
||||
- Result: Faster, more focused, single-attempt completion more common
|
||||
|
||||
**Root cause:**
|
||||
Subagents waste tokens and attention on irrelevant plan sections.
|
||||
|
||||
**Impact:** Medium - Slower execution, more failed attempts
|
||||
|
||||
**What worked:**
|
||||
```
|
||||
You are adding a single E2E test to packnplay's test suite.
|
||||
|
||||
**Your task:** Add `TestE2E_FeaturePrivilegedMode` to `pkg/runner/e2e_test.go`
|
||||
|
||||
**What to test:** A local devcontainer feature that requests `"privileged": true`
|
||||
in its metadata should result in the container running with `--privileged` flag.
|
||||
|
||||
**Follow the exact pattern of TestE2E_FeatureOptionValidation** (at the end of the file)
|
||||
|
||||
**After writing, run:** `go test -v ./pkg/runner -run TestE2E_FeaturePrivilegedMode -timeout 5m`
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
### Problem 4: No Self-Reflection Before Handoff
|
||||
|
||||
**What happened:**
|
||||
- Added self-reflection prompt: "Look at your work with fresh eyes - what could be better?"
|
||||
- Implementer for Task 5 identified failing test was due to implementation bug, not test bug
|
||||
- Traced to line 99: `strings.Join(metadata.Entrypoint, " ")` creating invalid Docker syntax
|
||||
- Without self-reflection, would have just reported "test fails" without root cause
|
||||
|
||||
**Root cause:**
|
||||
Implementers don't naturally step back and critique their own work before reporting completion.
|
||||
|
||||
**Impact:** Medium - Bugs handed off to reviewer that implementer could have caught
|
||||
|
||||
---
|
||||
|
||||
### Problem 5: Mock-Interface Drift
|
||||
|
||||
**What happened:**
|
||||
```typescript
|
||||
// Interface defines close()
|
||||
interface PlatformAdapter {
|
||||
close(): Promise<void>;
|
||||
}
|
||||
|
||||
// Code (BUGGY) calls cleanup()
|
||||
await adapter.cleanup();
|
||||
|
||||
// Mock (MATCHES BUG) defines cleanup()
|
||||
vi.mock('web-adapter', () => ({
|
||||
WebAdapter: vi.fn().mockImplementation(() => ({
|
||||
cleanup: vi.fn().mockResolvedValue(undefined), // Wrong!
|
||||
})),
|
||||
}));
|
||||
```
|
||||
- Tests passed
|
||||
- Runtime crashed: "adapter.cleanup is not a function"
|
||||
|
||||
**Root cause:**
|
||||
Mock derived from what buggy code calls, not from interface definition. TypeScript can't catch inline mocks with wrong method names.
|
||||
|
||||
**Impact:** High - Tests give false confidence, runtime crashes
|
||||
|
||||
**Why testing-anti-patterns didn't prevent this:**
|
||||
The skill covers testing mock behavior and mocking without understanding, but not the specific pattern of "derive mock from interface, not implementation."
|
||||
|
||||
---
|
||||
|
||||
### Problem 6: Code Reviewer File Access
|
||||
|
||||
**What happened:**
|
||||
- Code reviewer subagent dispatched
|
||||
- Couldn't find test file: "The file doesn't appear to exist in the repository"
|
||||
- File actually exists
|
||||
- Reviewer didn't know to explicitly read it first
|
||||
|
||||
**Root cause:**
|
||||
Reviewer prompts don't include explicit file reading instructions.
|
||||
|
||||
**Impact:** Low-Medium - Reviews fail or incomplete
|
||||
|
||||
---
|
||||
|
||||
### Problem 7: Fix Workflow Latency
|
||||
|
||||
**What happened:**
|
||||
- Implementer identifies bug during self-reflection
|
||||
- Implementer knows the fix
|
||||
- Current workflow: report → I dispatch fixer → fixer fixes → I verify
|
||||
- Extra round-trip adds latency without adding value
|
||||
|
||||
**Root cause:**
|
||||
Rigid separation between implementer and fixer roles when implementer has already diagnosed.
|
||||
|
||||
**Impact:** Low - Latency, but no correctness issue
|
||||
|
||||
---
|
||||
|
||||
### Problem 8: Skills Not Being Read
|
||||
|
||||
**What happened:**
|
||||
- `testing-anti-patterns` skill exists
|
||||
- Neither human nor subagents read it before writing tests
|
||||
- Would have prevented some issues (though not all - see Problem 5)
|
||||
|
||||
**Root cause:**
|
||||
No enforcement that subagents read relevant skills. No prompt includes skill reading.
|
||||
|
||||
**Impact:** Medium - Skill investment wasted if not used
|
||||
|
||||
---
|
||||
|
||||
## Proposed Improvements
|
||||
|
||||
### 1. verification-before-completion: Add Configuration Change Verification
|
||||
|
||||
**Add new section:**
|
||||
|
||||
```markdown
|
||||
## Verifying Configuration Changes
|
||||
|
||||
When testing changes to configuration, providers, feature flags, or environment:
|
||||
|
||||
**Don't just verify the operation succeeded. Verify the output reflects the intended change.**
|
||||
|
||||
### Common Failure Pattern
|
||||
|
||||
Operation succeeds because *some* valid config exists, but it's not the config you intended to test.
|
||||
|
||||
### Examples
|
||||
|
||||
| Change | Insufficient | Required |
|
||||
|--------|-------------|----------|
|
||||
| Switch LLM provider | Status 200 | Response contains expected model name |
|
||||
| Enable feature flag | No errors | Feature behavior actually active |
|
||||
| Change environment | Deploy succeeds | Logs/vars reference new environment |
|
||||
| Set credentials | Auth succeeds | Authenticated user/context is correct |
|
||||
|
||||
### Gate Function
|
||||
|
||||
```
|
||||
BEFORE claiming configuration change works:
|
||||
|
||||
1. IDENTIFY: What should be DIFFERENT after this change?
|
||||
2. LOCATE: Where is that difference observable?
|
||||
- Response field (model name, user ID)
|
||||
- Log line (environment, provider)
|
||||
- Behavior (feature active/inactive)
|
||||
3. RUN: Command that shows the observable difference
|
||||
4. VERIFY: Output contains expected difference
|
||||
5. ONLY THEN: Claim configuration change works
|
||||
|
||||
Red flags:
|
||||
- "Request succeeded" without checking content
|
||||
- Checking status code but not response body
|
||||
- Verifying no errors but not positive confirmation
|
||||
```
|
||||
|
||||
**Why this works:**
|
||||
Forces verification of INTENT, not just operation success.
|
||||
|
||||
---
|
||||
|
||||
### 2. subagent-driven-development: Add Process Hygiene for E2E Tests
|
||||
|
||||
**Add new section:**
|
||||
|
||||
```markdown
|
||||
## Process Hygiene for E2E Tests
|
||||
|
||||
When dispatching subagents that start services (servers, databases, message queues):
|
||||
|
||||
### Problem
|
||||
|
||||
Subagents are stateless - they don't know about processes started by previous subagents. Background processes persist and can interfere with later tests.
|
||||
|
||||
### Solution
|
||||
|
||||
**Before dispatching E2E test subagent, include cleanup in prompt:**
|
||||
|
||||
```
|
||||
BEFORE starting any services:
|
||||
1. Kill existing processes: pkill -f "<service-pattern>" 2>/dev/null || true
|
||||
2. Wait for cleanup: sleep 1
|
||||
3. Verify port free: lsof -i :<port> && echo "ERROR: Port still in use" || echo "Port free"
|
||||
|
||||
AFTER tests complete:
|
||||
1. Kill the process you started
|
||||
2. Verify cleanup: pgrep -f "<service-pattern>" || echo "Cleanup successful"
|
||||
```
|
||||
|
||||
### Example
|
||||
|
||||
```
|
||||
Task: Run E2E test of API server
|
||||
|
||||
Prompt includes:
|
||||
"Before starting the server:
|
||||
- Kill any existing servers: pkill -f 'node.*server.js' 2>/dev/null || true
|
||||
- Verify port 3001 is free: lsof -i :3001 && exit 1 || echo 'Port available'
|
||||
|
||||
After tests:
|
||||
- Kill the server you started
|
||||
- Verify: pgrep -f 'node.*server.js' || echo 'Cleanup verified'"
|
||||
```
|
||||
|
||||
### Why This Matters
|
||||
|
||||
- Stale processes serve requests with wrong config
|
||||
- Port conflicts cause silent failures
|
||||
- Process accumulation slows system
|
||||
- Confusing test results (hitting wrong server)
|
||||
```
|
||||
|
||||
**Trade-off analysis:**
|
||||
- Adds boilerplate to prompts
|
||||
- But prevents very confusing debugging
|
||||
- Worth it for E2E test subagents
|
||||
|
||||
---
|
||||
|
||||
### 3. subagent-driven-development: Add Lean Context Option
|
||||
|
||||
**Modify Step 2: Execute Task with Subagent**
|
||||
|
||||
**Before:**
|
||||
```
|
||||
Read that task carefully from [plan-file].
|
||||
```
|
||||
|
||||
**After:**
|
||||
```
|
||||
## Context Approaches
|
||||
|
||||
**Full Plan (default):**
|
||||
Use when tasks are complex or have dependencies:
|
||||
```
|
||||
Read Task N from [plan-file] carefully.
|
||||
```
|
||||
|
||||
**Lean Context (for independent tasks):**
|
||||
Use when task is standalone and pattern-based:
|
||||
```
|
||||
You are implementing: [1-2 sentence task description]
|
||||
|
||||
File to modify: [exact path]
|
||||
Pattern to follow: [reference to existing function/test]
|
||||
What to implement: [specific requirement]
|
||||
Verification: [exact command to run]
|
||||
|
||||
[Do NOT include full plan file]
|
||||
```
|
||||
|
||||
**Use lean context when:**
|
||||
- Task follows existing pattern (add similar test, implement similar feature)
|
||||
- Task is self-contained (doesn't need context from other tasks)
|
||||
- Pattern reference is sufficient (e.g., "follow TestE2E_FeatureOptionValidation")
|
||||
|
||||
**Use full plan when:**
|
||||
- Task has dependencies on other tasks
|
||||
- Requires understanding of overall architecture
|
||||
- Complex logic that needs context
|
||||
```
|
||||
|
||||
**Example:**
|
||||
```
|
||||
Lean context prompt:
|
||||
|
||||
"You are adding a test for privileged mode in devcontainer features.
|
||||
|
||||
File: pkg/runner/e2e_test.go
|
||||
Pattern: Follow TestE2E_FeatureOptionValidation (at end of file)
|
||||
Test: Feature with `"privileged": true` in metadata results in `--privileged` flag
|
||||
Verify: go test -v ./pkg/runner -run TestE2E_FeaturePrivilegedMode -timeout 5m
|
||||
|
||||
Report: Implementation, test results, any issues."
|
||||
```
|
||||
|
||||
**Why this works:**
|
||||
Reduces token usage, increases focus, faster completion when appropriate.
|
||||
|
||||
---
|
||||
|
||||
### 4. subagent-driven-development: Add Self-Reflection Step
|
||||
|
||||
**Modify Step 2: Execute Task with Subagent**
|
||||
|
||||
**Add to prompt template:**
|
||||
|
||||
```
|
||||
When done, BEFORE reporting back:
|
||||
|
||||
Take a step back and review your work with fresh eyes.
|
||||
|
||||
Ask yourself:
|
||||
- Does this actually solve the task as specified?
|
||||
- Are there edge cases I didn't consider?
|
||||
- Did I follow the pattern correctly?
|
||||
- If tests are failing, what's the ROOT CAUSE (implementation bug vs test bug)?
|
||||
- What could be better about this implementation?
|
||||
|
||||
If you identify issues during this reflection, fix them now.
|
||||
|
||||
Then report:
|
||||
- What you implemented
|
||||
- Self-reflection findings (if any)
|
||||
- Test results
|
||||
- Files changed
|
||||
```
|
||||
|
||||
**Why this works:**
|
||||
Catches bugs implementer can find themselves before handoff. Documented case: identified entrypoint bug through self-reflection.
|
||||
|
||||
**Trade-off:**
|
||||
Adds ~30 seconds per task, but catches issues before review.
|
||||
|
||||
---
|
||||
|
||||
### 5. requesting-code-review: Add Explicit File Reading
|
||||
|
||||
**Modify the code-reviewer template:**
|
||||
|
||||
**Add at the beginning:**
|
||||
|
||||
```markdown
|
||||
## Files to Review
|
||||
|
||||
BEFORE analyzing, read these files:
|
||||
|
||||
1. [List specific files that changed in the diff]
|
||||
2. [Files referenced by changes but not modified]
|
||||
|
||||
Use Read tool to load each file.
|
||||
|
||||
If you cannot find a file:
|
||||
- Check exact path from diff
|
||||
- Try alternate locations
|
||||
- Report: "Cannot locate [path] - please verify file exists"
|
||||
|
||||
DO NOT proceed with review until you've read the actual code.
|
||||
```
|
||||
|
||||
**Why this works:**
|
||||
Explicit instruction prevents "file not found" issues.
|
||||
|
||||
---
|
||||
|
||||
### 6. testing-anti-patterns: Add Mock-Interface Drift Anti-Pattern
|
||||
|
||||
**Add new Anti-Pattern 6:**
|
||||
|
||||
```markdown
|
||||
## Anti-Pattern 6: Mocks Derived from Implementation
|
||||
|
||||
**The violation:**
|
||||
```typescript
|
||||
// Code (BUGGY) calls cleanup()
|
||||
await adapter.cleanup();
|
||||
|
||||
// Mock (MATCHES BUG) has cleanup()
|
||||
const mock = {
|
||||
cleanup: vi.fn().mockResolvedValue(undefined)
|
||||
};
|
||||
|
||||
// Interface (CORRECT) defines close()
|
||||
interface PlatformAdapter {
|
||||
close(): Promise<void>;
|
||||
}
|
||||
```
|
||||
|
||||
**Why this is wrong:**
|
||||
- Mock encodes the bug into the test
|
||||
- TypeScript can't catch inline mocks with wrong method names
|
||||
- Test passes because both code and mock are wrong
|
||||
- Runtime crashes when real object is used
|
||||
|
||||
**The fix:**
|
||||
```typescript
|
||||
// ✅ GOOD: Derive mock from interface
|
||||
|
||||
// Step 1: Open interface definition (PlatformAdapter)
|
||||
// Step 2: List methods defined there (close, initialize, etc.)
|
||||
// Step 3: Mock EXACTLY those methods
|
||||
|
||||
const mock = {
|
||||
initialize: vi.fn().mockResolvedValue(undefined),
|
||||
close: vi.fn().mockResolvedValue(undefined), // From interface!
|
||||
};
|
||||
|
||||
// Now test FAILS because code calls cleanup() which doesn't exist
|
||||
// That failure reveals the bug BEFORE runtime
|
||||
```
|
||||
|
||||
### Gate Function
|
||||
|
||||
```
|
||||
BEFORE writing any mock:
|
||||
|
||||
1. STOP - Do NOT look at the code under test yet
|
||||
2. FIND: The interface/type definition for the dependency
|
||||
3. READ: The interface file
|
||||
4. LIST: Methods defined in the interface
|
||||
5. MOCK: ONLY those methods with EXACTLY those names
|
||||
6. DO NOT: Look at what your code calls
|
||||
|
||||
IF your test fails because code calls something not in mock:
|
||||
✅ GOOD - The test found a bug in your code
|
||||
Fix the code to call the correct interface method
|
||||
NOT the mock
|
||||
|
||||
Red flags:
|
||||
- "I'll mock what the code calls"
|
||||
- Copying method names from implementation
|
||||
- Mock written without reading interface
|
||||
- "The test is failing so I'll add this method to the mock"
|
||||
```
|
||||
|
||||
**Detection:**
|
||||
|
||||
When you see runtime error "X is not a function" and tests pass:
|
||||
1. Check if X is mocked
|
||||
2. Compare mock methods to interface methods
|
||||
3. Look for method name mismatches
|
||||
```
|
||||
|
||||
**Why this works:**
|
||||
Directly addresses the failure pattern from feedback.
|
||||
|
||||
---
|
||||
|
||||
### 7. subagent-driven-development: Require Skills Reading for Test Subagents
|
||||
|
||||
**Add to prompt template when task involves testing:**
|
||||
|
||||
```markdown
|
||||
BEFORE writing any tests:
|
||||
|
||||
1. Read testing-anti-patterns skill:
|
||||
Use Skill tool: superpowers:testing-anti-patterns
|
||||
|
||||
2. Apply gate functions from that skill when:
|
||||
- Writing mocks
|
||||
- Adding methods to production classes
|
||||
- Mocking dependencies
|
||||
|
||||
This is NOT optional. Tests that violate anti-patterns will be rejected in review.
|
||||
```
|
||||
|
||||
**Why this works:**
|
||||
Ensures skills are actually used, not just exist.
|
||||
|
||||
**Trade-off:**
|
||||
Adds time to each task, but prevents entire classes of bugs.
|
||||
|
||||
---
|
||||
|
||||
### 8. subagent-driven-development: Allow Implementer to Fix Self-Identified Issues
|
||||
|
||||
**Modify Step 2:**
|
||||
|
||||
**Current:**
|
||||
```
|
||||
Subagent reports back with summary of work.
|
||||
```
|
||||
|
||||
**Proposed:**
|
||||
```
|
||||
Subagent performs self-reflection, then:
|
||||
|
||||
IF self-reflection identifies fixable issues:
|
||||
1. Fix the issues
|
||||
2. Re-run verification
|
||||
3. Report: "Initial implementation + self-reflection fix"
|
||||
|
||||
ELSE:
|
||||
Report: "Implementation complete"
|
||||
|
||||
Include in report:
|
||||
- Self-reflection findings
|
||||
- Whether fixes were applied
|
||||
- Final verification results
|
||||
```
|
||||
|
||||
**Why this works:**
|
||||
Reduces latency when implementer already knows the fix. Documented case: would have saved one round-trip for entrypoint bug.
|
||||
|
||||
**Trade-off:**
|
||||
Slightly more complex prompt, but faster end-to-end.
|
||||
|
||||
---
|
||||
|
||||
## Implementation Plan
|
||||
|
||||
### Phase 1: High-Impact, Low-Risk (Do First)
|
||||
|
||||
1. **verification-before-completion: Configuration change verification**
|
||||
- Clear addition, doesn't change existing content
|
||||
- Addresses high-impact problem (false confidence in tests)
|
||||
- File: `skills/verification-before-completion/SKILL.md`
|
||||
|
||||
2. **testing-anti-patterns: Mock-interface drift**
|
||||
- Adds new anti-pattern, doesn't modify existing
|
||||
- Addresses high-impact problem (runtime crashes)
|
||||
- File: `skills/testing-anti-patterns/SKILL.md`
|
||||
|
||||
3. **requesting-code-review: Explicit file reading**
|
||||
- Simple addition to template
|
||||
- Fixes concrete problem (reviewers can't find files)
|
||||
- File: `skills/requesting-code-review/SKILL.md`
|
||||
|
||||
### Phase 2: Moderate Changes (Test Carefully)
|
||||
|
||||
4. **subagent-driven-development: Process hygiene**
|
||||
- Adds new section, doesn't change workflow
|
||||
- Addresses medium-high impact (test reliability)
|
||||
- File: `skills/subagent-driven-development/SKILL.md`
|
||||
|
||||
5. **subagent-driven-development: Self-reflection**
|
||||
- Changes prompt template (higher risk)
|
||||
- But documented to catch bugs
|
||||
- File: `skills/subagent-driven-development/SKILL.md`
|
||||
|
||||
6. **subagent-driven-development: Skills reading requirement**
|
||||
- Adds prompt overhead
|
||||
- But ensures skills are actually used
|
||||
- File: `skills/subagent-driven-development/SKILL.md`
|
||||
|
||||
### Phase 3: Optimization (Validate First)
|
||||
|
||||
7. **subagent-driven-development: Lean context option**
|
||||
- Adds complexity (two approaches)
|
||||
- Needs validation that it doesn't cause confusion
|
||||
- File: `skills/subagent-driven-development/SKILL.md`
|
||||
|
||||
8. **subagent-driven-development: Allow implementer to fix**
|
||||
- Changes workflow (higher risk)
|
||||
- Optimization, not bug fix
|
||||
- File: `skills/subagent-driven-development/SKILL.md`
|
||||
|
||||
---
|
||||
|
||||
## Open Questions
|
||||
|
||||
1. **Lean context approach:**
|
||||
- Should we make it the default for pattern-based tasks?
|
||||
- How do we decide which approach to use?
|
||||
- Risk of being too lean and missing important context?
|
||||
|
||||
2. **Self-reflection:**
|
||||
- Will this slow down simple tasks significantly?
|
||||
- Should it only apply to complex tasks?
|
||||
- How do we prevent "reflection fatigue" where it becomes rote?
|
||||
|
||||
3. **Process hygiene:**
|
||||
- Should this be in subagent-driven-development or a separate skill?
|
||||
- Does it apply to other workflows beyond E2E tests?
|
||||
- How do we handle cases where process SHOULD persist (dev servers)?
|
||||
|
||||
4. **Skills reading enforcement:**
|
||||
- Should we require ALL subagents to read relevant skills?
|
||||
- How do we keep prompts from becoming too long?
|
||||
- Risk of over-documenting and losing focus?
|
||||
|
||||
---
|
||||
|
||||
## Success Metrics
|
||||
|
||||
How do we know these improvements work?
|
||||
|
||||
1. **Configuration verification:**
|
||||
- Zero instances of "test passed but wrong config was used"
|
||||
- Jesse doesn't say "that's not actually testing what you think"
|
||||
|
||||
2. **Process hygiene:**
|
||||
- Zero instances of "test hit wrong server"
|
||||
- No port conflict errors during E2E test runs
|
||||
|
||||
3. **Mock-interface drift:**
|
||||
- Zero instances of "tests pass but runtime crashes on missing method"
|
||||
- No method name mismatches between mocks and interfaces
|
||||
|
||||
4. **Self-reflection:**
|
||||
- Measurable: Do implementer reports include self-reflection findings?
|
||||
- Qualitative: Do fewer bugs make it to code review?
|
||||
|
||||
5. **Skills reading:**
|
||||
- Subagent reports reference skill gate functions
|
||||
- Fewer anti-pattern violations in code review
|
||||
|
||||
---
|
||||
|
||||
## Risks and Mitigations
|
||||
|
||||
### Risk: Prompt Bloat
|
||||
**Problem:** Adding all these requirements makes prompts overwhelming
|
||||
**Mitigation:**
|
||||
- Phase implementation (don't add everything at once)
|
||||
- Make some additions conditional (E2E hygiene only for E2E tests)
|
||||
- Consider templates for different task types
|
||||
|
||||
### Risk: Analysis Paralysis
|
||||
**Problem:** Too much reflection/verification slows execution
|
||||
**Mitigation:**
|
||||
- Keep gate functions quick (seconds, not minutes)
|
||||
- Make lean context opt-in initially
|
||||
- Monitor task completion times
|
||||
|
||||
### Risk: False Sense of Security
|
||||
**Problem:** Following checklist doesn't guarantee correctness
|
||||
**Mitigation:**
|
||||
- Emphasize gate functions are minimums, not maximums
|
||||
- Keep "use judgment" language in skills
|
||||
- Document that skills catch common failures, not all failures
|
||||
|
||||
### Risk: Skill Divergence
|
||||
**Problem:** Different skills give conflicting advice
|
||||
**Mitigation:**
|
||||
- Review changes across all skills for consistency
|
||||
- Document how skills interact (Integration sections)
|
||||
- Test with real scenarios before deployment
|
||||
|
||||
---
|
||||
|
||||
## Recommendation
|
||||
|
||||
**Proceed with Phase 1 immediately:**
|
||||
- verification-before-completion: Configuration change verification
|
||||
- testing-anti-patterns: Mock-interface drift
|
||||
- requesting-code-review: Explicit file reading
|
||||
|
||||
**Test Phase 2 with Jesse before finalizing:**
|
||||
- Get feedback on self-reflection impact
|
||||
- Validate process hygiene approach
|
||||
- Confirm skills reading requirement is worth overhead
|
||||
|
||||
**Hold Phase 3 pending validation:**
|
||||
- Lean context needs real-world testing
|
||||
- Implementer-fix workflow change needs careful evaluation
|
||||
|
||||
These changes address real problems documented by users while minimizing risk of making skills worse.
|
||||
Reference in New Issue
Block a user