| name | description |
|---|---|
| dyad:deflake-e2e | Identify and fix flaky E2E tests by running them repeatedly and investigating failures. |
Deflake E2E Tests
Identify and fix flaky E2E tests by running them repeatedly and investigating failures.
Arguments
`$ARGUMENTS`: (Optional) Specific E2E test file(s) to deflake (e.g., `main.spec.ts` or `e2e-tests/main.spec.ts`). If not provided, the skill asks the user whether to deflake the entire test suite.
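The argument handling described here, together with the path-normalization notes under "Run tests repeatedly to detect flakiness," can be sketched in TypeScript. `resolveTargets` and `normalizeTestPath` are hypothetical helpers and are not part of the skill or of Playwright. The sketch assumes `$ARGUMENTS` is a whitespace-separated list of file names that contain no spaces.

```typescript
// Hypothetical helpers: turn the raw $ARGUMENTS string into normalized spec
// paths, or signal that the user must confirm a full-suite run first.
type DeflakeTargets =
  | { kind: "files"; files: string[] }
  | { kind: "confirm-full-suite" };

const TEST_DIR = "e2e-tests/";
const SPEC_SUFFIX = ".spec.ts";

// Add the e2e-tests/ prefix and the .spec.ts suffix when they are missing.
function normalizeTestPath(arg: string): string {
  let path = arg.trim();
  if (!path.startsWith(TEST_DIR)) {
    path = TEST_DIR + path;
  }
  if (!path.endsWith(SPEC_SUFFIX)) {
    path = path + SPEC_SUFFIX;
  }
  return path;
}

function resolveTargets(rawArgs: string | undefined): DeflakeTargets {
  // Split on whitespace (assumes no spaces inside file names) and drop empties.
  const args = (rawArgs ?? "").split(/\s+/).filter((a) => a.length > 0);
  if (args.length === 0) {
    // No files given: ask before running every test 10 times.
    return { kind: "confirm-full-suite" };
  }
  return { kind: "files", files: args.map(normalizeTestPath) };
}

console.log(resolveTargets("main chat.spec.ts e2e-tests/settings"));
```

An empty `$ARGUMENTS` maps to the confirmation prompt rather than silently starting a full-suite run, which matches the skill's requirement to wait for the user first.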
Instructions
1. Check if specific tests are provided:

   If `$ARGUMENTS` is empty or not provided, ask the user: "No specific tests provided. Do you want to deflake the entire E2E test suite? This can take a very long time as each test will be run 10 times."

   Wait for user confirmation before proceeding. If they decline, ask them to provide specific test files.
2. Install dependencies:

   `npm install`

3. Build the app binary:

   `npm run build`

   IMPORTANT: This step is required before running E2E tests, because E2E tests run against the built binary. If you make any changes to application code (anything outside of `e2e-tests/`), you MUST re-run `npm run build` before running E2E tests again; otherwise you'll be testing the old version.
4. Run tests repeatedly to detect flakiness:

   For each test file, run it 10 times:

   `PLAYWRIGHT_RETRIES=0 PLAYWRIGHT_HTML_OPEN=never npm run e2e -- e2e-tests/<testfile>.spec.ts --repeat-each=10`

   IMPORTANT: `PLAYWRIGHT_RETRIES=0` is required to disable automatic retries. Without it, CI environments (where `CI=true`) default to 2 retries, so a flaky test can fail, pass on retry, and be incorrectly classified as "not flaky."

   Notes:
   - If `$ARGUMENTS` is provided without the `e2e-tests/` prefix, add it.
   - If `$ARGUMENTS` is provided without the `.spec.ts` suffix, add it.
   - A test is considered flaky if it fails at least once out of 10 runs.
5. For each flaky test, investigate with debug logs:

   Run the failing test with Playwright browser debugging enabled:

   `DEBUG=pw:browser PLAYWRIGHT_RETRIES=0 PLAYWRIGHT_HTML_OPEN=never npm run e2e -- e2e-tests/<testfile>.spec.ts`

   Analyze the debug output for signs of:
   - Timing issues (race conditions, elements not ready)
   - Animation/transition interference
   - Network timing variability
   - State leaking between tests
   - Snapshot comparison differences
6. Fix the flaky test:

   Common fixes following Playwright best practices:
   - Use `await expect(locator).toBeVisible()` before interacting with elements
   - Use `await page.waitForLoadState('networkidle')` for network-dependent tests (Playwright's documentation discourages `networkidle`, so prefer web-first assertions where they suffice)
   - Use stable selectors (`data-testid`, role, text) instead of fragile CSS selectors
   - Add explicit waits for animations: `await page.waitForTimeout(300)` (use sparingly)
   - For visual tests, pass options such as `maxDiffPixelRatio` to `await expect(locator).toHaveScreenshot()`
   - Ensure proper test isolation (clean state before/after tests)

   IMPORTANT: Do NOT change any application code. Assume the application code is correct. Only modify test files and snapshot baselines.
7. Update snapshot baselines if needed:

   If the flakiness is due to legitimate visual differences, regenerate the baselines:

   `PLAYWRIGHT_RETRIES=0 PLAYWRIGHT_HTML_OPEN=never npm run e2e -- e2e-tests/<testfile>.spec.ts --update-snapshots`
8. Verify the fix:

   Re-run the test 10 times to confirm it's no longer flaky:

   `PLAYWRIGHT_RETRIES=0 PLAYWRIGHT_HTML_OPEN=never npm run e2e -- e2e-tests/<testfile>.spec.ts --repeat-each=10`

   The test should pass all 10 runs consistently.
9. Summarize results:

   Report to the user:
   - Which tests were identified as flaky
   - What was causing the flakiness
   - What fixes were applied
   - Verification results (all 10 runs passing)
   - Any tests that could not be fixed and need further investigation
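The rebuild rule under "Build the app binary" can be expressed as a small check. This is a hypothetical TypeScript sketch, not part of the skill. `needsRebuild` assumes it receives repository-relative paths of changed files (for example from `git diff --name-only`). It treats anything outside `e2e-tests/` as application code.

```typescript
// Hypothetical helper: decide whether `npm run build` must be re-run before
// the next E2E run, based on which files were changed.
const E2E_DIR = "e2e-tests/";

function needsRebuild(changedFiles: string[]): boolean {
  return changedFiles
    // Normalize Windows separators and a leading "./" so prefixes compare cleanly.
    .map((f) => f.replace(/\\/g, "/").replace(/^\.\//, ""))
    // Any change outside e2e-tests/ means the built binary is stale.
    .some((f) => !f.startsWith(E2E_DIR));
}
```

For example, `needsRebuild(["e2e-tests/main.spec.ts"])` returns `false`, while `needsRebuild(["src/app.ts", "e2e-tests/main.spec.ts"])` returns `true` (both paths are illustrative).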
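The retry note and the flakiness definition under "Run tests repeatedly to detect flakiness" can be modeled in a few lines. This is a hedged sketch, because the project's actual Playwright config is not shown. `resolveRetries` assumes the config reads `PLAYWRIGHT_RETRIES` and otherwise follows the common `process.env.CI ? 2 : 0` pattern. `reportedAsPassing` and `isFlaky` are hypothetical helpers that take one boolean per attempt or run, where `true` means passed.

```typescript
// Hypothetical model of a retry setting: an explicit PLAYWRIGHT_RETRIES wins,
// otherwise retry twice when CI is set (the common `CI ? 2 : 0` pattern).
type Env = Record<string, string | undefined>;

function resolveRetries(env: Env): number {
  const raw = env.PLAYWRIGHT_RETRIES;
  if (raw !== undefined) {
    const n = Number(raw);
    if (Number.isInteger(n) && n >= 0) return n;
  }
  return env.CI ? 2 : 0;
}

// With retries, a test is reported as passing if ANY of its first
// (retries + 1) attempts passes, which hides an intermittent failure.
function reportedAsPassing(attempts: boolean[], retries: number): boolean {
  return attempts.slice(0, retries + 1).some((passed) => passed);
}

// The skill's definition: flaky if at least one of the repeated runs
// (made with retries disabled) fails.
function isFlaky(runs: boolean[]): boolean {
  return runs.some((passed) => !passed);
}

const ciRetries = resolveRetries({ CI: "true" });         // 2
console.log(reportedAsPassing([false, true], ciRetries));  // true: failure masked
console.log(reportedAsPassing([false, true], 0));          // false: failure visible
```

The last two lines show why `PLAYWRIGHT_RETRIES=0` matters. The same fail-then-pass sequence is reported as a pass under CI retries but surfaces as a failure when retries are disabled.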
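Playwright's web-first assertions, such as `expect(locator).toBeVisible()`, re-check their condition until it passes or a timeout expires (5000 ms by default for `expect`). A fixed `waitForTimeout` instead sleeps for a set duration, which is why "Fix the flaky test" says to use it sparingly. The plain TypeScript sketch below only illustrates that polling idea. `waitFor` is a hypothetical helper, not a Playwright API, and Playwright's own implementation differs.

```typescript
// Hypothetical helper: poll a condition until it holds or the timeout
// expires, rather than sleeping for one fixed duration.
async function waitFor(
  condition: () => boolean,
  timeoutMs = 5000,
  intervalMs = 50,
): Promise<boolean> {
  const deadline = Date.now() + timeoutMs;
  while (Date.now() < deadline) {
    if (condition()) return true;
    await new Promise<void>((resolve) => setTimeout(() => resolve(), intervalMs));
  }
  return condition(); // one final check at the deadline
}

// Simulated element that becomes "visible" after a variable delay (80 ms here).
let visible = false;
setTimeout(() => {
  visible = true;
}, 80);

waitFor(() => visible, 1000).then((ok) => {
  console.log(ok ? "element became visible" : "timed out");
});
```

A fixed 50 ms sleep would fail this simulated case, and a fixed 1000 ms sleep would always waste time. Polling returns as soon as the condition holds and only fails after the full timeout.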
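The `maxDiffPixelRatio` option listed under "Fix the flaky test" lets a screenshot comparison tolerate a bounded fraction (between 0 and 1) of differing pixels. That helps when deciding between a tolerance and a new baseline under "Update snapshot baselines if needed." The sketch below is a simplified, hypothetical model. `diffPixelRatio` and `matchesWithin` are not Playwright APIs, and Playwright's real comparator also applies a per-pixel color `threshold` that this sketch ignores.

```typescript
// Simplified model: count pixels whose RGBA values differ and return the
// fraction of differing pixels.
function diffPixelRatio(expected: Uint8Array, actual: Uint8Array): number {
  if (expected.length !== actual.length || expected.length % 4 !== 0) {
    throw new Error("images must be same-size RGBA buffers");
  }
  const totalPixels = expected.length / 4;
  let differing = 0;
  for (let i = 0; i < expected.length; i += 4) {
    // A pixel counts as different if any of its four channels differs.
    if (
      expected[i] !== actual[i] ||
      expected[i + 1] !== actual[i + 1] ||
      expected[i + 2] !== actual[i + 2] ||
      expected[i + 3] !== actual[i + 3]
    ) {
      differing++;
    }
  }
  return totalPixels === 0 ? 0 : differing / totalPixels;
}

// Passes when the fraction of differing pixels stays within the tolerance.
function matchesWithin(expected: Uint8Array, actual: Uint8Array, maxDiffPixelRatio: number): boolean {
  return diffPixelRatio(expected, actual) <= maxDiffPixelRatio;
}

// 100-pixel images where one pixel's red channel changed slightly.
const baseline = new Uint8Array(100 * 4).fill(255);
const current = baseline.slice();
current[0] = 254;
console.log(diffPixelRatio(baseline, current)); // 0.01
```

With this model, `maxDiffPixelRatio: 0.01` would accept the one-pixel change and `0.005` would reject it. Persistent, large differences are the case where regenerating baselines with `--update-snapshots` is the better fix.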