Reorganize: Move all skills to skills/ folder

- Created skills/ directory - Moved 272 skills to skills/ subfolder - Kept agents/ at root level - Kept installation scripts and docs at root level Repository structure: - skills/ - All 272 skills from skills.sh - agents/ - Agent definitions - *.sh, *.ps1 - Installation scripts - README.md, etc. - Documentation Co-Authored-By: Claude <noreply@anthropic.com>
2026-01-23 18:05:17 +00:00
parent 2b4e974878
commit b723e2bd7d
4083 changed files with 1056 additions and 1098063 deletions
--- a/skills/skill-judge/skill.md
+++ b/skills/skill-judge/skill.md
@@ -0,0 +1,752 @@
+---
+name: skill-judge
+description: Evaluate Agent Skill design quality against official specifications and best practices. Use when reviewing, auditing, or improving SKILL.md files and skill packages. Provides multi-dimensional scoring and actionable improvement suggestions.
+---
+
+# Skill Judge
+
+Evaluate Agent Skills against official specifications and patterns derived from 17+ official examples.
+
+---
+
+## Core Philosophy
+
+### What is a Skill?
+
+A Skill is NOT a tutorial. A Skill is a **knowledge externalization mechanism**.
+
+Traditional AI knowledge is locked in model parameters. To teach new capabilities:
+```
+Traditional: Collect data → GPU cluster → Train → Deploy new version
+Cost: $10,000 - $1,000,000+
+Timeline: Weeks to months
+```
+
+Skills change this:
+```
+Skill: Edit SKILL.md → Save → Takes effect on next invocation
+Cost: $0
+Timeline: Instant
+```
+
+This is the paradigm shift from "training AI" to "educating AI" — like a hot-swappable LoRA adapter that requires no training. You edit a Markdown file in natural language, and the model's behavior changes.
+
+### The Core Formula
+
+> **Good Skill = Expert-only Knowledge − What Claude Already Knows**
+
+A Skill's value is measured by its **knowledge delta** — the gap between what it provides and what the model already knows.
+
+- **Expert-only knowledge**: Decision trees, trade-offs, edge cases, anti-patterns, domain-specific thinking frameworks — things that take years of experience to accumulate
+- **What Claude already knows**: Basic concepts, standard library usage, common programming patterns, general best practices
+
+When a Skill explains "what is PDF" or "how to write a for-loop", it's compressing knowledge Claude already has. This is **token waste** — context window is a public resource shared with system prompts, conversation history, other Skills, and user requests.
+
+### Tool vs Skill
+
+| Concept | Essence | Function | Example |
+|---------|---------|----------|---------|
+| **Tool** | What model CAN do | Execute actions | bash, read_file, write_file, WebSearch |
+| **Skill** | What model KNOWS how to do | Guide decisions | PDF processing, MCP building, frontend design |
+
+Tools define capability boundaries — without bash tool, model can't execute commands.
+Skills inject knowledge — without frontend-design Skill, model produces generic UI.
+
+**The equation**:
+```
+General Agent + Excellent Skill = Domain Expert Agent
+```
+
+Same Claude model, different Skills loaded, becomes different experts.
+
+### Three Types of Knowledge in Skills
+
+When evaluating, categorize each section:
+
+| Type | Definition | Treatment |
+|------|------------|-----------|
+| **Expert** | Claude genuinely doesn't know this | Must keep — this is the Skill's value |
+| **Activation** | Claude knows but may not think of | Keep if brief — serves as reminder |
+| **Redundant** | Claude definitely knows this | Should delete — wastes tokens |
+
+The art of Skill design is maximizing Expert content, using Activation sparingly, and eliminating Redundant ruthlessly.
+
+---
+
+## Evaluation Dimensions (120 points total)
+
+### D1: Knowledge Delta (20 points) — THE CORE DIMENSION
+
+The most important dimension. Does the Skill add genuine expert knowledge?
+
+| Score | Criteria |
+|-------|----------|
+| 0-5 | Explains basics Claude knows (what is X, how to write code, standard library tutorials) |
+| 6-10 | Mixed: some expert knowledge diluted by obvious content |
+| 11-15 | Mostly expert knowledge with minimal redundancy |
+| 16-20 | Pure knowledge delta — every paragraph earns its tokens |
+
+**Red flags** (instant score ≤5):
+- "What is [basic concept]" sections
+- Step-by-step tutorials for standard operations
+- Explaining how to use common libraries
+- Generic best practices ("write clean code", "handle errors")
+- Definitions of industry-standard terms
+
+**Green flags** (indicators of high knowledge delta):
+- Decision trees for non-obvious choices ("when X fails, try Y because Z")
+- Trade-offs only an expert would know ("A is faster but B handles edge case C")
+- Edge cases from real-world experience
+- "NEVER do X because [non-obvious reason]"
+- Domain-specific thinking frameworks
+
+**Evaluation questions**:
+1. For each section, ask: "Does Claude already know this?"
+2. If explaining something, ask: "Is this explaining TO Claude or FOR Claude?"
+3. Count paragraphs that are Expert vs Activation vs Redundant
+
+---
+
+### D2: Mindset + Appropriate Procedures (15 points)
+
+Does the Skill transfer expert **thinking patterns** along with **necessary domain-specific procedures**?
+
+The difference between experts and novices isn't "knowing how to operate" — it's "how to think about the problem." But thinking patterns alone aren't enough when Claude lacks domain-specific procedural knowledge.
+
+**Key distinction**:
+| Type | Example | Value |
+|------|---------|-------|
+| **Thinking patterns** | "Before designing, ask: What makes this memorable?" | High — shapes decision-making |
+| **Domain-specific procedures** | "OOXML workflow: unpack → edit XML → validate → pack" | High — Claude may not know this |
+| **Generic procedures** | "Step 1: Open file, Step 2: Edit, Step 3: Save" | Low — Claude already knows |
+
+| Score | Criteria |
+|-------|----------|
+| 0-3 | Only generic procedures Claude already knows |
+| 4-7 | Has domain procedures but lacks thinking frameworks |
+| 8-11 | Good balance: thinking patterns + domain-specific workflows |
+| 12-15 | Expert-level: shapes thinking AND provides procedures Claude wouldn't know |
+
+**What counts as valuable procedures**:
+- Workflows Claude hasn't been trained on (new tools, proprietary systems)
+- Correct ordering that's non-obvious (e.g., "validate BEFORE packing, not after")
+- Critical steps that are easy to miss (e.g., "MUST recalculate formulas after editing")
+- Domain-specific sequences (e.g., MCP server's 4-phase development process)
+
+**What counts as redundant procedures**:
+- Generic file operations (open, read, write, save)
+- Standard programming patterns (loops, conditionals, error handling)
+- Common library usage that's well-documented
+
+**Expert thinking patterns look like**:
+```markdown
+Before [action], ask yourself:
+- **Purpose**: What problem does this solve? Who uses it?
+- **Constraints**: What are the hidden requirements?
+- **Differentiation**: What makes this solution memorable?
+```
+
+**Valuable domain procedures look like**:
+```markdown
+### Redlining Workflow (Claude wouldn't know this sequence)
+1. Convert to markdown: `pandoc --track-changes=all`
+2. Map text to XML: grep for text in document.xml
+3. Implement changes in batches of 3-10
+4. Pack and verify: check ALL changes were applied
+```
+
+**Redundant generic procedures look like**:
+```markdown
+Step 1: Open the file
+Step 2: Find the section
+Step 3: Make the change
+Step 4: Save and test
+```
+
+**The test**:
+1. Does it tell Claude WHAT to think about? (thinking patterns)
+2. Does it tell Claude HOW to do things it wouldn't know? (domain procedures)
+
+A good Skill provides both when needed.
+
+---
+
+### D3: Anti-Pattern Quality (15 points)
+
+Does the Skill have effective NEVER lists?
+
+**Why this matters**: Half of expert knowledge is knowing what NOT to do. A senior designer sees purple gradient on white background and instinctively cringes — "too AI-generated." This intuition for "what absolutely not to do" comes from stepping on countless landmines.
+
+Claude hasn't stepped on these landmines. It doesn't know Inter font is overused, doesn't know purple gradients are the signature of AI-generated content. Good Skills must explicitly state these "absolute don'ts."
+
+| Score | Criteria |
+|-------|----------|
+| 0-3 | No anti-patterns mentioned |
+| 4-7 | Generic warnings ("avoid errors", "be careful", "consider edge cases") |
+| 8-11 | Specific NEVER list with some reasoning |
+| 12-15 | Expert-grade anti-patterns with WHY — things only experience teaches |
+
+**Expert anti-patterns** (specific + reason):
+```markdown
+NEVER use generic AI-generated aesthetics like:
+- Overused font families (Inter, Roboto, Arial)
+- Cliched color schemes (particularly purple gradients on white backgrounds)
+- Predictable layouts and component patterns
+- Default border-radius on everything
+```
+
+**Weak anti-patterns** (vague, no reasoning):
+```markdown
+Avoid making mistakes.
+Be careful with edge cases.
+Don't write bad code.
+```
+
+**The test**: Would an expert read the anti-pattern list and say "yes, I learned this the hard way"? Or would they say "this is obvious to everyone"?
+
+---
+
+### D4: Specification Compliance — Especially Description (15 points)
+
+Does the Skill follow official format requirements? **Special focus on description quality.**
+
+| Score | Criteria |
+|-------|----------|
+| 0-5 | Missing frontmatter or invalid format |
+| 6-10 | Has frontmatter but description is vague or incomplete |
+| 11-13 | Valid frontmatter, description has WHAT but weak on WHEN |
+| 14-15 | Perfect: comprehensive description with WHAT, WHEN, and trigger keywords |
+
+**Frontmatter requirements**:
+- `name`: lowercase, alphanumeric + hyphens only, ≤64 characters
+- `description`: **THE MOST CRITICAL FIELD** — determines if skill gets used at all
+
+---
+
+**Why description is THE MOST IMPORTANT field**:
+
+```
+┌─────────────────────────────────────────────────────────────────────┐
+│  SKILL ACTIVATION FLOW                                              │
+│                                                                     │
+│  User Request → Agent sees ALL skill descriptions → Decides which  │
+│                 (only descriptions, not bodies!)     to activate    │
+│                                                                     │
+│  If description doesn't match → Skill NEVER gets loaded            │
+│  If description is vague → Skill might not trigger when it should  │
+│  If description lacks keywords → Skill is invisible to the Agent   │
+└─────────────────────────────────────────────────────────────────────┘
+```
+
+**The brutal truth**: A Skill with perfect content but poor description is **useless** — it will never be activated. The description is the **only chance** to tell the Agent "use me in these situations."
+
+---
+
+**Description must answer THREE questions**:
+
+1. **WHAT**: What does this Skill do? (functionality)
+2. **WHEN**: In what situations should it be used? (trigger scenarios)
+3. **KEYWORDS**: What terms should trigger this Skill? (searchable terms)
+
+**Excellent description** (all three elements):
+```yaml
+description: "Comprehensive document creation, editing, and analysis with support
+for tracked changes, comments, formatting preservation, and text extraction.
+When Claude needs to work with professional documents (.docx files) for:
+(1) Creating new documents, (2) Modifying or editing content,
+(3) Working with tracked changes, (4) Adding comments, or any other document tasks"
+```
+
+Analysis:
+- WHAT: creation, editing, analysis, tracked changes, comments
+- WHEN: "When Claude needs to work with... for: (1)... (2)... (3)..."
+- KEYWORDS: .docx files, tracked changes, professional documents
+
+**Poor description** (missing elements):
+```yaml
+description: "处理文档相关功能"
+```
+
+Problems:
+- WHAT: vague ("文档相关功能" — what specifically?)
+- WHEN: missing (when should Agent use this?)
+- KEYWORDS: missing (no ".docx", no specific scenarios)
+
+**Another poor example**:
+```yaml
+description: "A helpful skill for various tasks"
+```
+
+This is useless — Agent has no idea when to activate it.
+
+---
+
+**Description quality checklist**:
+- [ ] Lists specific capabilities (not just "helps with X")
+- [ ] Includes explicit trigger scenarios ("Use when...", "When user asks for...")
+- [ ] Contains searchable keywords (file extensions, domain terms, action verbs)
+- [ ] Specific enough that Agent knows EXACTLY when to use it
+- [ ] Includes scenarios where this skill MUST be used (not just "can be used")
+
+---
+
+### D5: Progressive Disclosure (15 points)
+
+Does the Skill implement proper content layering?
+
+Skill loading has three layers:
+```
+Layer 1: Metadata (always in memory)
+         Only name + description
+         ~100 tokens per skill
+
+Layer 2: SKILL.md Body (loaded after triggering)
+         Detailed guidelines, code examples, decision trees
+         Ideal: < 500 lines
+
+Layer 3: Resources (loaded on demand)
+         scripts/, references/, assets/
+         No limit
+```
+
+| Score | Criteria |
+|-------|----------|
+| 0-5 | Everything dumped in SKILL.md (>500 lines, no structure) |
+| 6-10 | Has references but unclear when to load them |
+| 11-13 | Good layering with MANDATORY triggers present |
+| 14-15 | Perfect: decision trees + explicit triggers + "Do NOT Load" guidance |
+
+**For Skills WITH references directory**, check Loading Trigger Quality:
+
+| Trigger Quality | Characteristics |
+|-----------------|-----------------|
+| Poor | References listed at end, no loading guidance |
+| Mediocre | Some triggers but not embedded in workflow |
+| Good | MANDATORY triggers in workflow steps |
+| Excellent | Scenario detection + conditional triggers + "Do NOT Load" |
+
+**The loading problem**:
+```
+Loading too little ◄─────────────────────────────────► Loading too much
+- References sit unused                    - Wastes context space
+- Agent doesn't know when to load          - Irrelevant info dilutes key content
+- Knowledge is there but never accessed    - Unnecessary token overhead
+```
+
+**Good loading trigger** (embedded in workflow):
+```markdown
+### Creating New Document
+
+**MANDATORY - READ ENTIRE FILE**: Before proceeding, you MUST read
+[`docx-js.md`](docx-js.md) (~500 lines) completely from start to finish.
+**NEVER set any range limits when reading this file.**
+
+**Do NOT load** `ooxml.md` or `redlining.md` for this task.
+```
+
+**Bad loading trigger** (just listed):
+```markdown
+## References
+- docx-js.md - for creating documents
+- ooxml.md - for editing
+- redlining.md - for tracking changes
+```
+
+**For simple Skills** (no references, <100 lines): Score based on conciseness and self-containment.
+
+---
+
+### D6: Freedom Calibration (15 points)
+
+Is the level of specificity appropriate for the task's fragility?
+
+Different tasks need different levels of constraint. This is about matching freedom to fragility.
+
+| Score | Criteria |
+|-------|----------|
+| 0-5 | Severely mismatched (rigid scripts for creative tasks, vague for fragile ops) |
+| 6-10 | Partially appropriate, some mismatches |
+| 11-13 | Good calibration for most scenarios |
+| 14-15 | Perfect freedom calibration throughout |
+
+**The freedom spectrum**:
+
+| Task Type | Should Have | Why | Example Skill |
+|-----------|-------------|-----|---------------|
+| Creative/Design | High freedom | Multiple valid approaches, differentiation is value | frontend-design |
+| Code review | Medium freedom | Principles exist but judgment required | code-review |
+| File format operations | Low freedom | One wrong byte corrupts file, consistency critical | docx, xlsx, pdf |
+
+**High freedom** (text-based instructions):
+```markdown
+Commit to a BOLD aesthetic direction. Pick an extreme: brutally minimal,
+maximalist chaos, retro-futuristic, organic natural...
+```
+
+**Medium freedom** (pseudocode or parameterized):
+```markdown
+Review priority:
+1. Security vulnerabilities (must fix)
+2. Logic errors (must fix)
+3. Performance issues (should fix)
+4. Maintainability (optional)
+```
+
+**Low freedom** (specific scripts, exact steps):
+```markdown
+**MANDATORY**: Use exact script in `scripts/create-doc.py`
+Parameters: --title "X" --author "Y"
+Do NOT modify the script.
+```
+
+**The test**: Ask "if Agent makes a mistake, what's the consequence?"
+- High consequence → Low freedom
+- Low consequence → High freedom
+
+---
+
+### D7: Pattern Recognition (10 points)
+
+Does the Skill follow an established official pattern?
+
+Through analyzing 17 official Skills, we identified 5 main design patterns:
+
+| Pattern | ~Lines | Key Characteristics | Example | When to Use |
+|---------|--------|---------------------|---------|-------------|
+| **Mindset** | ~50 | Thinking > technique, strong NEVER list, high freedom | frontend-design | Creative tasks requiring taste |
+| **Navigation** | ~30 | Minimal SKILL.md, routes to sub-files | internal-comms | Multiple distinct scenarios |
+| **Philosophy** | ~150 | Two-step: Philosophy → Express, emphasizes craft | canvas-design | Art/creation requiring originality |
+| **Process** | ~200 | Phased workflow, checkpoints, medium freedom | mcp-builder | Complex multi-step projects |
+| **Tool** | ~300 | Decision trees, code examples, low freedom | docx, pdf, xlsx | Precise operations on specific formats |
+
+| Score | Criteria |
+|-------|----------|
+| 0-3 | No recognizable pattern, chaotic structure |
+| 4-6 | Partially follows a pattern with significant deviations |
+| 7-8 | Clear pattern with minor deviations |
+| 9-10 | Masterful application of appropriate pattern |
+
+**Pattern selection guide**:
+
+| Your Task Characteristics | Recommended Pattern |
+|---------------------------|---------------------|
+| Needs taste and creativity | Mindset (~50 lines) |
+| Needs originality and craft quality | Philosophy (~150 lines) |
+| Has multiple distinct sub-scenarios | Navigation (~30 lines) |
+| Complex multi-step project | Process (~200 lines) |
+| Precise operations on specific format | Tool (~300 lines) |
+
+---
+
+### D8: Practical Usability (15 points)
+
+Can an Agent actually use this Skill effectively?
+
+| Score | Criteria |
+|-------|----------|
+| 0-5 | Confusing, incomplete, contradictory, or untested guidance |
+| 6-10 | Usable but with noticeable gaps |
+| 11-13 | Clear guidance for common cases |
+| 14-15 | Comprehensive coverage including edge cases and error handling |
+
+**Check for**:
+- **Decision trees**: For multi-path scenarios, is there clear guidance on which path to take?
+- **Code examples**: Do they actually work? Or are they pseudocode that breaks?
+- **Error handling**: What if the main approach fails? Are fallbacks provided?
+- **Edge cases**: Are unusual but realistic scenarios covered?
+- **Actionability**: Can Agent immediately act, or needs to figure things out?
+
+**Good usability** (decision tree + fallback):
+```markdown
+| Task | Primary Tool | Fallback | When to Use Fallback |
+|------|-------------|----------|----------------------|
+| Read text | pdftotext | PyMuPDF | Need layout info |
+| Extract tables | camelot-py | tabula-py | camelot fails |
+
+**Common issues**:
+- Scanned PDF: pdftotext returns blank → Use OCR first
+- Encrypted PDF: Permission error → Use PyMuPDF with password
+```
+
+**Poor usability** (vague):
+```markdown
+Use appropriate tools for PDF processing.
+Handle errors properly.
+Consider edge cases.
+```
+
+---
+
+## NEVER Do When Evaluating
+
+- **NEVER** give high scores just because it "looks professional" or is well-formatted
+- **NEVER** ignore token waste — every redundant paragraph should result in deduction
+- **NEVER** let length impress you — a 43-line Skill can outperform a 500-line Skill
+- **NEVER** skip mentally testing the decision trees — do they actually lead to correct choices?
+- **NEVER** forgive explaining basics with "but it provides helpful context"
+- **NEVER** overlook missing anti-patterns — if there's no NEVER list, that's a significant gap
+- **NEVER** assume all procedures are valuable — distinguish domain-specific from generic
+- **NEVER** undervalue the description field — poor description = skill never gets used
+- **NEVER** put "when to use" info only in the body — Agent only sees description before loading
+
+---
+
+## Evaluation Protocol
+
+### Step 1: First Pass — Knowledge Delta Scan
+
+Read SKILL.md completely and for each section ask:
+> "Does Claude already know this?"
+
+Mark each section as:
+- **[E] Expert**: Claude genuinely doesn't know this — value-add
+- **[A] Activation**: Claude knows but brief reminder is useful — acceptable
+- **[R] Redundant**: Claude definitely knows this — should be deleted
+
+Calculate rough ratio: E:A:R
+- Good Skill: >70% Expert, <20% Activation, <10% Redundant
+- Mediocre Skill: 40-70% Expert, high Activation
+- Bad Skill: <40% Expert, high Redundant
+
+### Step 2: Structure Analysis
+
+```
+[ ] Check frontmatter validity
+[ ] Count total lines in SKILL.md
+[ ] List all reference files and their sizes
+[ ] Identify which pattern the Skill follows
+[ ] Check for loading triggers (if references exist)
+```
+
+### Step 3: Score Each Dimension
+
+For each of the 8 dimensions:
+1. Find specific evidence (quote relevant lines)
+2. Assign score with one-line justification
+3. Note specific improvements if score < max
+
+### Step 4: Calculate Total & Grade
+
+```
+Total = D1 + D2 + D3 + D4 + D5 + D6 + D7 + D8
+Max = 120 points
+```
+
+**Grade Scale** (percentage-based):
+| Grade | Percentage | Meaning |
+|-------|------------|---------|
+| A | 90%+ (108+) | Excellent — production-ready expert Skill |
+| B | 80-89% (96-107) | Good — minor improvements needed |
+| C | 70-79% (84-95) | Adequate — clear improvement path |
+| D | 60-69% (72-83) | Below Average — significant issues |
+| F | <60% (<72) | Poor — needs fundamental redesign |
+
+### Step 5: Generate Report
+
+```markdown
+# Skill Evaluation Report: [Skill Name]
+
+## Summary
+- **Total Score**: X/120 (X%)
+- **Grade**: [A/B/C/D/F]
+- **Pattern**: [Mindset/Navigation/Philosophy/Process/Tool]
+- **Knowledge Ratio**: E:A:R = X:Y:Z
+- **Verdict**: [One sentence assessment]
+
+## Dimension Scores
+
+| Dimension | Score | Max | Notes |
+|-----------|-------|-----|-------|
+| D1: Knowledge Delta | X | 20 | |
+| D2: Mindset vs Mechanics | X | 15 | |
+| D3: Anti-Pattern Quality | X | 15 | |
+| D4: Specification Compliance | X | 15 | |
+| D5: Progressive Disclosure | X | 15 | |
+| D6: Freedom Calibration | X | 15 | |
+| D7: Pattern Recognition | X | 10 | |
+| D8: Practical Usability | X | 15 | |
+
+## Critical Issues
+[List must-fix problems that significantly impact the Skill's effectiveness]
+
+## Top 3 Improvements
+1. [Highest impact improvement with specific guidance]
+2. [Second priority improvement]
+3. [Third priority improvement]
+
+## Detailed Analysis
+[For each dimension scoring below 80%, provide:
+- What's missing or problematic
+- Specific examples from the Skill
+- Concrete suggestions for improvement]
+```
+
+---
+
+## Common Failure Patterns
+
+### Pattern 1: The Tutorial
+```
+Symptom: Explains what PDF is, how Python works, basic library usage
+Root cause: Author assumes Skill should "teach" the model
+Fix: Claude already knows this. Delete all basic explanations.
+     Focus on expert decisions, trade-offs, and anti-patterns.
+```
+
+### Pattern 2: The Dump
+```
+Symptom: SKILL.md is 800+ lines with everything included
+Root cause: No progressive disclosure design
+Fix: Core routing and decision trees in SKILL.md (<300 lines ideal)
+     Detailed content in references/, loaded on-demand
+```
+
+### Pattern 3: The Orphan References
+```
+Symptom: References directory exists but files are never loaded
+Root cause: No explicit loading triggers
+Fix: Add "MANDATORY - READ ENTIRE FILE" at workflow decision points
+     Add "Do NOT Load" to prevent over-loading
+```
+
+### Pattern 4: The Checkbox Procedure
+```
+Symptom: Step 1, Step 2, Step 3... mechanical procedures
+Root cause: Author thinks in procedures, not thinking frameworks
+Fix: Transform into "Before doing X, ask yourself..."
+     Focus on decision principles, not operation sequences
+```
+
+### Pattern 5: The Vague Warning
+```
+Symptom: "Be careful", "avoid errors", "consider edge cases"
+Root cause: Author knows things can go wrong but hasn't articulated specifics
+Fix: Specific NEVER list with concrete examples and non-obvious reasons
+     "NEVER use X because [specific problem that takes experience to learn]"
+```
+
+### Pattern 6: The Invisible Skill
+```
+Symptom: Great content but skill rarely gets activated
+Root cause: Description is vague, missing keywords, or lacks trigger scenarios
+Fix: Description must answer WHAT, WHEN, and include KEYWORDS
+     "Use when..." + specific scenarios + searchable terms
+
+Example fix:
+BAD:  "Helps with document tasks"
+GOOD: "Create, edit, and analyze .docx files. Use when working with
+       Word documents, tracked changes, or professional document formatting."
+```
+
+### Pattern 7: The Wrong Location
+```
+Symptom: "When to use this Skill" section in body, not in description
+Root cause: Misunderstanding of three-layer loading
+Fix: Move all triggering information to description field
+     Body is only loaded AFTER triggering decision is made
+```
+
+### Pattern 8: The Over-Engineered
+```
+Symptom: README.md, CHANGELOG.md, INSTALLATION_GUIDE.md, CONTRIBUTING.md
+Root cause: Treating Skill like a software project
+Fix: Delete all auxiliary files. Only include what Agent needs for the task.
+     No documentation about the Skill itself.
+```
+
+### Pattern 9: The Freedom Mismatch
+```
+Symptom: Rigid scripts for creative tasks, vague guidance for fragile operations
+Root cause: Not considering task fragility
+Fix: High freedom for creative (principles, not steps)
+     Low freedom for fragile (exact scripts, no parameters)
+```
+
+---
+
+## Quick Reference Checklist
+
+```
+┌─────────────────────────────────────────────────────────────────────────┐
+│  SKILL EVALUATION QUICK CHECK                                           │
+├─────────────────────────────────────────────────────────────────────────┤
+│                                                                         │
+│  KNOWLEDGE DELTA (most important):                                      │
+│    [ ] No "What is X" explanations for basic concepts                   │
+│    [ ] No step-by-step tutorials for standard operations                │
+│    [ ] Has decision trees for non-obvious choices                       │
+│    [ ] Has trade-offs only experts would know                           │
+│    [ ] Has edge cases from real-world experience                        │
+│                                                                         │
+│  MINDSET + PROCEDURES:                                                  │
+│    [ ] Transfers thinking patterns (how to think about problems)        │
+│    [ ] Has "Before doing X, ask yourself..." frameworks                 │
+│    [ ] Includes domain-specific procedures Claude wouldn't know         │
+│    [ ] Distinguishes valuable procedures from generic ones              │
+│                                                                         │
+│  ANTI-PATTERNS:                                                         │
+│    [ ] Has explicit NEVER list                                          │
+│    [ ] Anti-patterns are specific, not vague                            │
+│    [ ] Includes WHY (non-obvious reasons)                               │
+│                                                                         │
+│  SPECIFICATION (description is critical!):                              │
+│    [ ] Valid YAML frontmatter                                           │
+│    [ ] name: lowercase, ≤64 chars                                       │
+│    [ ] description answers: WHAT does it do?                            │
+│    [ ] description answers: WHEN should it be used?                     │
+│    [ ] description contains trigger KEYWORDS                            │
+│    [ ] description is specific enough for Agent to know when to use     │
+│                                                                         │
+│  STRUCTURE:                                                             │
+│    [ ] SKILL.md < 500 lines (ideal < 300)                               │
+│    [ ] Heavy content in references/                                     │
+│    [ ] Loading triggers embedded in workflow                            │
+│    [ ] Has "Do NOT Load" for preventing over-loading                    │
+│                                                                         │
+│  FREEDOM:                                                               │
+│    [ ] Creative tasks → High freedom (principles)                       │
+│    [ ] Fragile operations → Low freedom (exact scripts)                 │
+│                                                                         │
+│  USABILITY:                                                             │
+│    [ ] Decision trees for multi-path scenarios                          │
+│    [ ] Working code examples                                            │
+│    [ ] Error handling and fallbacks                                     │
+│    [ ] Edge cases covered                                               │
+│                                                                         │
+└─────────────────────────────────────────────────────────────────────────┘
+```
+
+---
+
+## The Meta-Question
+
+When evaluating any Skill, always return to this fundamental question:
+
+> **"Would an expert in this domain, looking at this Skill, say:**
+> **'Yes, this captures knowledge that took me years to learn'?"**
+
+If the answer is yes → the Skill has genuine value.
+If the answer is no → it's compressing what Claude already knows.
+
+The best Skills are **compressed expert brains** — they take a designer's 10 years of aesthetic accumulation and compress it into 43 lines, or a document expert's operational experience into a 200-line decision tree.
+
+What gets compressed must be things Claude doesn't have. Otherwise, it's garbage compression.
+
+---
+
+## Self-Evaluation Note
+
+This Skill (skill-judge) should itself pass evaluation:
+
+- **Knowledge Delta**: Provides specific evaluation criteria Claude wouldn't generate on its own
+- **Mindset**: Shapes how to think about Skill quality, not just checklist items
+- **Anti-Patterns**: "NEVER Do When Evaluating" section with specific don'ts
+- **Specification**: Valid frontmatter with comprehensive description
+- **Progressive Disclosure**: Self-contained, no external references needed
+- **Freedom**: Medium freedom appropriate for evaluation task
+- **Pattern**: Follows Tool pattern with decision frameworks
+- **Usability**: Clear protocol, report template, quick reference
+
+
+
+Evaluate this Skill against itself as a calibration exercise.