Pony Alpha 2 68453089ee feat: initial Alpha Brain 2 dataset release
Massive training corpus for AI coding models containing:
- 10 JSONL training datasets (641+ examples across coding, reasoning, planning, architecture, communication, debugging, security, workflows, error handling, UI/UX)
- 11 agent behavior specifications (explorer, planner, reviewer, debugger, executor, UI designer, Linux admin, kernel engineer, security architect, automation engineer, API architect)
- 6 skill definition files (coding, API engineering, kernel, Linux server, security architecture, server automation, UI/UX)
- Master README with project origin story and philosophy

Built by Pony Alpha 2 to help AI models learn expert-level coding approaches.
68453089ee · 2026-03-13 16:26:29 +04:00

Code Review and Debugging Dataset

This dataset contains 62 examples of code review and debugging scenarios covering security vulnerabilities, performance issues, error handling, concurrency bugs, and memory leaks across multiple programming languages.

Dataset Format

JSONL format - one JSON object per line with the following structure:

  • type: Either "code_review" or "debugging"
  • input_code: The code being reviewed or debugged
  • analysis: Step-by-step analysis of the code
  • findings: List of issue objects, each with an issue description, a severity level, a location, and a CWE reference
  • fix: The recommended fix for the identified issues
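A minimal sketch of how a file in this format could be read and checked against the five fields listed above. The function names `parse_entries` and `load_dataset` are hypothetical and not part of the dataset; the sketch assumes every entry carries all five fields and that blank lines may be skipped.

```python
import json
from typing import Iterable, Iterator

# Field names and allowed types are taken from the format description above.
REQUIRED_FIELDS = {"type", "input_code", "analysis", "findings", "fix"}
VALID_TYPES = {"code_review", "debugging"}


def parse_entries(lines: Iterable[str]) -> Iterator[dict]:
    """Yield validated entries from an iterable of JSONL lines."""
    for line_no, line in enumerate(lines, start=1):
        line = line.strip()
        if not line:
            continue  # assumption: blank lines are tolerated
        entry = json.loads(line)
        missing = REQUIRED_FIELDS - entry.keys()
        if missing:
            raise ValueError(f"line {line_no}: missing fields {sorted(missing)}")
        if entry["type"] not in VALID_TYPES:
            raise ValueError(f"line {line_no}: unexpected type {entry['type']!r}")
        yield entry


def load_dataset(path: str) -> list[dict]:
    """Read a whole JSONL file into memory as a list of entries."""
    with open(path, encoding="utf-8") as handle:
        return list(parse_entries(handle))
```

Validating while reading keeps malformed lines from surfacing later as confusing `KeyError`s in training or analysis code.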

Coverage

Security Vulnerabilities

  • SQL Injection: Direct string concatenation in queries (CWE-89)
  • Cross-Site Scripting (XSS): Unescaped output in templates (CWE-79)
  • Command Injection: User input in shell commands (CWE-78)
  • Path Traversal: Unvalidated file paths (CWE-22)
  • SSRF: Unvalidated URL parameters (CWE-918)
  • Missing Authentication: No auth checks on endpoints (CWE-306)
  • Insecure Session Management: Unsigned cookies, missing expiration (CWE-613)
  • Weak Cryptography: MD5, missing salts, insecure modes (CWE-327)
  • Code Injection: eval() and similar dangerous functions (CWE-94)
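To illustrate one category from the list above, here is a hedged sketch of a path traversal check (CWE-22). It is not taken from the dataset; `resolve_inside` is a hypothetical helper, and the approach shown (resolve the path, then confirm it still lies under the base directory) is one common mitigation among several. It assumes Python 3.9 or later for `Path.is_relative_to`.

```python
from pathlib import Path


def resolve_inside(base_dir: str, user_path: str) -> Path:
    """Return the resolved path only if it stays inside base_dir."""
    base = Path(base_dir).resolve()
    # Joining with an absolute user_path discards base, so the check
    # below rejects absolute paths as well as "../" sequences.
    candidate = (base / user_path).resolve()
    if not candidate.is_relative_to(base):
        raise ValueError("path escapes the allowed directory")
    return candidate
```

The check runs on the resolved path rather than the raw string, so inputs such as `reports/../../secret` are rejected even though they do not start with `..`.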

Performance Issues

  • String Concatenation: Quadratic time complexity (CWE-407)
  • N+1 Query Problem: Sequential database queries (CWE-1050)
  • Unbounded Growth: Memory leaks in caches, queues, maps (CWE-400)
  • Missing Connection Pooling: Creating new connections (CWE-407)
  • Busy Waiting: Inefficient polling loops (CWE-842)
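The N+1 query problem from the list above can be sketched with an in-memory SQLite database. The `authors` and `books` tables and all function names are invented for this illustration and do not come from the dataset.

```python
import sqlite3


def make_db() -> sqlite3.Connection:
    """Create a small in-memory database with hypothetical tables."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE authors (id INTEGER PRIMARY KEY, name TEXT);
        CREATE TABLE books (id INTEGER PRIMARY KEY, author_id INTEGER, title TEXT);
        INSERT INTO authors VALUES (1, 'Ada'), (2, 'Grace');
        INSERT INTO books VALUES (1, 1, 'Notes'), (2, 1, 'Engines'), (3, 2, 'Compilers');
        """
    )
    return conn


def titles_n_plus_one(conn: sqlite3.Connection) -> dict:
    """Anti-pattern: one query for the authors, then one more per author."""
    result = {}
    authors = conn.execute("SELECT id, name FROM authors ORDER BY id").fetchall()
    for author_id, name in authors:
        rows = conn.execute(
            "SELECT title FROM books WHERE author_id = ? ORDER BY id", (author_id,)
        ).fetchall()
        result[name] = [title for (title,) in rows]
    return result


def titles_single_query(conn: sqlite3.Connection) -> dict:
    """Same result from a single JOIN, regardless of the number of authors."""
    rows = conn.execute(
        """
        SELECT a.name, b.title
        FROM authors a LEFT JOIN books b ON b.author_id = a.id
        ORDER BY a.id, b.id
        """
    )
    result = {}
    for name, title in rows:
        result.setdefault(name, [])
        if title is not None:  # authors without books yield a NULL title
            result[name].append(title)
    return result
```

Both functions return the same mapping; the difference is that the first issues a number of queries that grows with the number of authors, while the second always issues one.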

Error Handling

  • Silent Failures: Broad exception catching (CWE-390)
  • Information Disclosure: Leaking error details (CWE-209)
  • Missing Validation: No input sanitization (CWE-20)
  • Resource Leaks: Unclosed files, connections, threads (CWE-772)
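As a rough illustration of the silent-failure and resource-leak patterns listed above, the sketch below contrasts a broad `except` with narrower handling. The function names and the config-file scenario are assumptions made for this example, not content from the dataset.

```python
import json
import logging

logger = logging.getLogger(__name__)


def read_config_silent(path: str) -> dict:
    """Anti-pattern: every failure, including bugs, collapses into {}."""
    try:
        with open(path, encoding="utf-8") as handle:
            return json.load(handle)
    except Exception:
        return {}


def read_config(path: str) -> dict:
    """Handle only the failures we expect, and keep the cause visible."""
    try:
        # The with-block closes the file even if parsing raises.
        with open(path, encoding="utf-8") as handle:
            return json.load(handle)
    except FileNotFoundError:
        logger.warning("config file not found, using defaults: %s", path)
        return {}
    except json.JSONDecodeError as exc:
        # Re-raise with context instead of hiding a corrupt file.
        raise ValueError("config file is not valid JSON") from exc
```

With the second version, a missing file is logged and tolerated, while a corrupt file stops the caller instead of being mistaken for an empty configuration.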

Concurrency Bugs

  • Race Conditions: Unprotected shared state (CWE-362)
  • TOCTOU Issues: Check-then-act patterns (CWE-367)
  • Deadlocks: Missing timeout handling (CWE-833)
  • Missing Synchronization: No locks on shared data (CWE-820)
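A small sketch of two of the concurrency fixes implied by the list above: guarding shared state with a lock, and acquiring that lock with a timeout so a stuck holder cannot block callers forever. `SafeCounter` and `run_workers` are hypothetical names used only for this illustration.

```python
import threading


class SafeCounter:
    """Counter whose read-modify-write step is guarded by a lock."""

    def __init__(self) -> None:
        self._value = 0
        self._lock = threading.Lock()

    def increment(self) -> None:
        with self._lock:
            self._value += 1

    def try_increment(self, timeout: float = 1.0) -> bool:
        """Give up instead of blocking forever if the lock is unavailable."""
        if not self._lock.acquire(timeout=timeout):
            return False
        try:
            self._value += 1
            return True
        finally:
            self._lock.release()

    @property
    def value(self) -> int:
        with self._lock:
            return self._value


def run_workers(counter: SafeCounter, workers: int = 8, increments: int = 10_000) -> int:
    """Increment the counter from several threads and return the final value."""

    def work() -> None:
        for _ in range(increments):
            counter.increment()

    threads = [threading.Thread(target=work) for _ in range(workers)]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return counter.value
```

Because every increment happens under the lock, the final value equals `workers * increments`; without the lock, that equality is not guaranteed.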

Memory Leaks

  • Unbounded Caches: No size limits or TTL (CWE-401)
  • Unclosed Resources: Files, connections, threads (CWE-772)
  • Growing Lists: No eviction policies (CWE-400)
  • Circular References: Event listeners, callbacks (CWE-459)
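One way to address the unbounded-cache pattern listed above is to cap the number of entries and evict the least recently used one. The `BoundedCache` class below is a hypothetical sketch, not code from the dataset; it covers the size limit only and leaves TTL handling out.

```python
from collections import OrderedDict


class BoundedCache:
    """Cache with a hard size limit and least-recently-used eviction."""

    def __init__(self, max_entries: int = 128) -> None:
        if max_entries < 1:
            raise ValueError("max_entries must be at least 1")
        self._max_entries = max_entries
        self._data: OrderedDict = OrderedDict()

    def get(self, key, default=None):
        if key not in self._data:
            return default
        self._data.move_to_end(key)  # mark as most recently used
        return self._data[key]

    def put(self, key, value) -> None:
        self._data[key] = value
        self._data.move_to_end(key)
        # Evict from the least recently used end until within the limit.
        while len(self._data) > self._max_entries:
            self._data.popitem(last=False)

    def __len__(self) -> int:
        return len(self._data)
```

For caching function results specifically, the standard library's `functools.lru_cache(maxsize=...)` provides the same bound without a custom class.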

Languages Covered

  • Python: 35+ examples
  • JavaScript/TypeScript: 15+ examples
  • Go: 10+ examples

Example Entry

The entry below is pretty-printed for readability; in the JSONL file, each entry occupies a single line.

{
  "type": "code_review",
  "input_code": "def login(username, password):\n    query = \"SELECT * FROM users WHERE username='\" + username + \"' AND password='\" + password + \"'\"\n    cursor.execute(query)\n    return cursor.fetchone()",
  "analysis": "1. The code directly concatenates user input into a SQL query without any sanitization.\n2. This creates a classic SQL injection vulnerability where an attacker can manipulate the query.",
  "findings": [
    {"issue": "SQL Injection Vulnerability", "severity": "CRITICAL", "location": "query construction", "cwe": "CWE-89"},
    {"issue": "Plaintext Password Storage", "severity": "HIGH", "location": "password comparison", "cwe": "CWE-256"}
  ],
  "fix": "def login(username, password):\n    cursor.execute(\"SELECT user_id, password_hash FROM users WHERE username = %s\", (username,))\n    result = cursor.fetchone()\n    if result and verify_password(password, result['password_hash']):\n        return result\n    return None"
}
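The fix in the entry above calls `verify_password`, which the entry does not define. The sketch below shows one possible shape for such a helper, using PBKDF2 from the Python standard library. The storage format (`iterations$salt$digest`), the iteration count, and the companion `hash_password` function are all assumptions made for illustration; a dedicated password-hashing library may be preferable in practice.

```python
import hashlib
import hmac
import os

# Assumption: the iteration count is a tunable work factor, not a value
# taken from the dataset.
ITERATIONS = 600_000


def hash_password(password: str) -> str:
    """Return 'iterations$salt_hex$digest_hex' using a fresh random salt."""
    salt = os.urandom(16)
    digest = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, ITERATIONS)
    return f"{ITERATIONS}${salt.hex()}${digest.hex()}"


def verify_password(password: str, stored: str) -> bool:
    """Recompute the digest with the stored salt and compare in constant time."""
    iterations, salt_hex, digest_hex = stored.split("$")
    candidate = hashlib.pbkdf2_hmac(
        "sha256",
        password.encode("utf-8"),
        bytes.fromhex(salt_hex),
        int(iterations),
    )
    return hmac.compare_digest(candidate, bytes.fromhex(digest_hex))
```

Storing the salt and iteration count next to the digest lets the work factor be raised later without invalidating existing hashes.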

Usage

This dataset is suitable for:

  • Training code review AI models
  • Teaching secure coding practices
  • Developing automated code analysis tools
  • Security awareness training
  • Bug bounty preparation

Statistics

  • Total Examples: 62
  • Code Review: ~32 examples
  • Debugging: ~30 examples
  • File Size: ~75KB
  • Unique CWEs: 25+ vulnerability types
  • Languages: Python, JavaScript, TypeScript, Go
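Figures such as the totals above could be recomputed from the file with a short script. The sketch below is an assumption about how that might look; `summarize` is a hypothetical name, and the script only counts fields that the format description documents (`type` and the `cwe` value of each finding). Per-language counts are not derived here because the format has no language field.

```python
import json
from collections import Counter
from typing import Iterable


def summarize(lines: Iterable[str]) -> dict:
    """Count entries by type and distinct CWE identifiers across findings."""
    type_counts: Counter = Counter()
    cwe_counts: Counter = Counter()
    for line in lines:
        if not line.strip():
            continue  # assumption: blank lines are tolerated
        entry = json.loads(line)
        type_counts[entry["type"]] += 1
        for finding in entry.get("findings", []):
            cwe = finding.get("cwe")
            if cwe:
                cwe_counts[cwe] += 1
    return {
        "total": sum(type_counts.values()),
        "by_type": dict(type_counts),
        "unique_cwes": len(cwe_counts),
    }
```

Passing an open file handle works directly, for example `summarize(open_handle)` inside a `with open(...)` block, since a text file iterates line by line.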

File Location

datasets/06-code-review-debugging/code-review-debugging.jsonl (relative to the repository root)