---
name: dyad:deflake-e2e
description: Identify and fix flaky E2E tests by running them repeatedly and investigating failures.
---

# Deflake E2E Tests

Identify and fix flaky E2E tests by running them repeatedly and investigating failures.

## Arguments

- `$ARGUMENTS`: (Optional) Specific E2E test file(s) to deflake (e.g., `main.spec.ts` or `e2e-tests/main.spec.ts`). If not provided, the user will be prompted to confirm deflaking the entire test suite.

## Instructions

1. Check if specific tests are provided:

   If `$ARGUMENTS` is empty or not provided, ask the user:

   > "No specific tests provided. Do you want to deflake the entire E2E test suite? This can take a very long time, as each test will be run 10 times."

   Wait for user confirmation before proceeding. If they decline, ask them to provide specific test files.
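When the user confirms a full-suite run, every spec file has to be enumerated. A minimal sketch, assuming specs live in `e2e-tests/` and end in `.spec.ts`; the helper name `listSpecFiles` is hypothetical, not part of this skill:

```typescript
// Hypothetical helper (illustration only): list every E2E spec file so the
// whole suite can be deflaked when no $ARGUMENTS were given. Assumes the
// layout described in this skill: specs in e2e-tests/, named *.spec.ts.
import { readdirSync } from "node:fs";
import { join } from "node:path";

function listSpecFiles(dir: string = "e2e-tests"): string[] {
  return readdirSync(dir)
    .filter((name) => name.endsWith(".spec.ts")) // skip helpers, fixtures, etc.
    .sort() // deterministic order for reporting
    .map((name) => join(dir, name));
}
```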

2. Install dependencies:

   ```sh
   npm install
   ```

3. Build the app binary:

   ```sh
   npm run build
   ```

   IMPORTANT: This step is required before running E2E tests, because E2E tests run against the built binary. If you make any changes to application code (anything outside of `e2e-tests/`), you MUST re-run `npm run build` before running E2E tests again; otherwise you'll be testing the old version.

4. Run tests repeatedly to detect flakiness:

   For each test file, run it 10 times:

   ```sh
   PLAYWRIGHT_RETRIES=0 PLAYWRIGHT_HTML_OPEN=never npm run e2e -- e2e-tests/<testfile>.spec.ts --repeat-each=10
   ```

   IMPORTANT: `PLAYWRIGHT_RETRIES=0` is required to disable automatic retries. Without it, CI environments (where `CI=true`) default to 2 retries, causing flaky tests to pass on retry and be incorrectly skipped as "not flaky."

   Notes:

   - If `$ARGUMENTS` is provided without the `e2e-tests/` prefix, add it
   - If `$ARGUMENTS` is provided without the `.spec.ts` suffix, add it
   - A test is considered flaky if it fails at least once out of 10 runs
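The path-normalization notes can be sketched as a small pure function; `normalizeSpecPath` is a hypothetical name used only for illustration:

```typescript
// Hypothetical helper (illustration only) applying the normalization rules
// from this skill: add the .spec.ts suffix and the e2e-tests/ prefix to a
// user-supplied test name when either is missing. Already-complete paths
// pass through unchanged.
function normalizeSpecPath(arg: string): string {
  let path = arg;
  if (!path.endsWith(".spec.ts")) path += ".spec.ts";
  if (!path.startsWith("e2e-tests/")) path = `e2e-tests/${path}`;
  return path;
}
```

The suffix is checked before the prefix so that a bare name like `main` gains both parts in one pass.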
5. For each flaky test, investigate with debug logs:

   Run the failing test with Playwright browser debugging enabled:

   ```sh
   DEBUG=pw:browser PLAYWRIGHT_RETRIES=0 PLAYWRIGHT_HTML_OPEN=never npm run e2e -- e2e-tests/<testfile>.spec.ts
   ```

   Analyze the debug output to understand:

   - Timing issues (race conditions, elements not ready)
   - Animation/transition interference
   - Network timing variability
   - State leaking between tests
   - Snapshot comparison differences
6. Fix the flaky test:

   Common fixes following Playwright best practices:

   - Use `await expect(locator).toBeVisible()` before interacting with elements
   - Use `await page.waitForLoadState('networkidle')` for network-dependent tests
   - Use stable selectors (`data-testid`, role, text) instead of fragile CSS selectors
   - Add explicit waits for animations: `await page.waitForTimeout(300)` (use sparingly)
   - Use `await expect(locator).toHaveScreenshot()` options like `maxDiffPixelRatio` for visual tests
   - Ensure proper test isolation (clean state before/after tests)

   IMPORTANT: Do NOT change any application code. Assume the application code is correct. Only modify test files and snapshot baselines.
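The first fix above (wait for visibility before interacting) can be sketched as a small helper. The `Locator` and `Page` types below are loose stand-ins for Playwright's real types so the sketch stays self-contained, and `clickWhenVisible` is a hypothetical name, not part of this skill:

```typescript
// Minimal stand-ins for the Playwright Page/Locator surface this sketch uses.
type Locator = {
  waitFor: (opts: { state: "visible" }) => Promise<void>;
  click: () => Promise<void>;
};
type Page = { getByTestId: (id: string) => Locator };

// Hypothetical helper (illustration only): click an element only after it is
// confirmed visible, instead of clicking immediately and racing the render.
async function clickWhenVisible(page: Page, testId: string): Promise<void> {
  const locator = page.getByTestId(testId);
  await locator.waitFor({ state: "visible" }); // fails with a timeout if never rendered
  await locator.click();
}
```

Using a `data-testid` lookup here also covers the stable-selector recommendation: the test stays valid when CSS classes or DOM structure change.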

7. Update snapshot baselines if needed:

   If the flakiness is due to legitimate visual differences:

   ```sh
   PLAYWRIGHT_RETRIES=0 PLAYWRIGHT_HTML_OPEN=never npm run e2e -- e2e-tests/<testfile>.spec.ts --update-snapshots
   ```
8. Verify the fix:

   Re-run the test 10 times to confirm it's no longer flaky:

   ```sh
   PLAYWRIGHT_RETRIES=0 PLAYWRIGHT_HTML_OPEN=never npm run e2e -- e2e-tests/<testfile>.spec.ts --repeat-each=10
   ```

   The test should pass all 10 runs consistently.

9. Summarize results:

   Report to the user:

   - Which tests were identified as flaky
   - What was causing the flakiness
   - What fixes were applied
   - Verification results (all 10 runs passing)
   - Any tests that could not be fixed and need further investigation