Pony Alpha 2 68453089ee feat: initial Alpha Brain 2 dataset release
Massive training corpus for AI coding models containing:
- 10 JSONL training datasets (641+ examples across coding, reasoning, planning, architecture, communication, debugging, security, workflows, error handling, UI/UX)
- 11 agent behavior specifications (explorer, planner, reviewer, debugger, executor, UI designer, Linux admin, kernel engineer, security architect, automation engineer, API architect)
- 6 skill definition files (coding, API engineering, kernel, Linux server, security architecture, server automation, UI/UX)
- Master README with project origin story and philosophy

Built by Pony Alpha 2 to help AI models learn expert-level coding approaches.
2026-03-13 16:26:29 +04:00

# Code Review and Debugging Dataset
This dataset contains **62 examples** of code review and debugging scenarios covering security vulnerabilities, performance issues, error handling, concurrency bugs, and memory leaks across multiple programming languages.
## Dataset Format
The dataset is stored as JSONL: one JSON object per line, each with the following fields:
- `type`: Either "code_review" or "debugging"
- `input_code`: The code being reviewed or debugged
- `analysis`: Step-by-step analysis of the code
- `findings`: List of issues with severity levels and CWE references
- `fix`: The recommended fix for the identified issues
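A minimal sketch of how records in this format might be read and checked, assuming every record carries all five fields listed above. The names `parse_record` and `load_dataset` are hypothetical helpers introduced for illustration, not part of the dataset.

```python
import json

# Field names and type values are taken from the format description above.
REQUIRED_FIELDS = ("type", "input_code", "analysis", "findings", "fix")
ALLOWED_TYPES = ("code_review", "debugging")


def parse_record(line):
    """Parse one JSONL line and check it against the documented structure."""
    record = json.loads(line)
    missing = [name for name in REQUIRED_FIELDS if name not in record]
    if missing:
        raise ValueError(f"missing fields: {missing}")
    if record["type"] not in ALLOWED_TYPES:
        raise ValueError(f"unexpected type: {record['type']!r}")
    if not isinstance(record["findings"], list):
        raise ValueError("findings must be a list")
    return record


def load_dataset(path):
    """Yield validated records from a JSONL file, skipping blank lines."""
    with open(path, encoding="utf-8") as handle:
        for line in handle:
            if line.strip():
                yield parse_record(line)
```

Validating on load is a design choice: a malformed line fails loudly at read time rather than surfacing later inside a training loop.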
## Coverage
### Security Vulnerabilities
- **SQL Injection**: Direct string concatenation in queries (CWE-89)
- **Cross-Site Scripting (XSS)**: Unescaped output in templates (CWE-79)
- **Command Injection**: User input in shell commands (CWE-78)
- **Path Traversal**: Unvalidated file paths (CWE-22)
- **SSRF**: Unvalidated URL parameters (CWE-918)
- **Missing Authentication**: No auth checks on endpoints (CWE-306)
- **Insecure Session Management**: Unsigned cookies, missing expiration (CWE-613)
- **Weak Cryptography**: MD5, missing salts, insecure modes (CWE-327)
- **Code Injection**: eval() and similar dangerous functions (CWE-94)
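To make one of these categories concrete, here is a hedged sketch of a path traversal guard (CWE-22). It is not taken from the dataset; `resolve_inside` is a hypothetical helper, and the approach assumes Python 3.9+ for `Path.is_relative_to`.

```python
from pathlib import Path


def resolve_inside(base_dir, user_path):
    """Return the resolved path only if it stays inside base_dir."""
    base = Path(base_dir).resolve()
    # Joining with an absolute user_path discards base, so the check below
    # rejects absolute paths as well as "../" sequences.
    candidate = (base / user_path).resolve()
    if not candidate.is_relative_to(base):  # Python 3.9+
        raise ValueError(f"path escapes base directory: {user_path!r}")
    return candidate
```

For example, `resolve_inside("uploads", "report.txt")` returns a path under `uploads`, while `resolve_inside("uploads", "../secret.txt")` raises `ValueError`.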
### Performance Issues
- **String Concatenation**: Quadratic time complexity (CWE-407)
- **N+1 Query Problem**: Sequential database queries (CWE-1050)
- **Unbounded Growth**: Memory leaks in caches, queues, maps (CWE-400)
- **Missing Connection Pooling**: Creating new connections (CWE-407)
- **Busy Waiting**: Inefficient polling loops (CWE-842)
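As an illustration of the string concatenation item, the sketch below contrasts repeated `+=` with a single `str.join`. It is an illustrative example, not a dataset entry; the function names are hypothetical. Repeated concatenation can copy the accumulated string on each iteration, although CPython sometimes optimizes this in place, so the quadratic cost is a worst case rather than a guarantee.

```python
def render_concat(rows):
    """Build output with +=, which may recopy the growing string each time."""
    out = ""
    for row in rows:
        out += f"{row}\n"
    return out


def render_join(rows):
    """Build output with one join, which allocates the result once."""
    return "".join(f"{row}\n" for row in rows)
```

Both functions return the same string; only the allocation pattern differs.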
### Error Handling
- **Silent Failures**: Broad exception catching (CWE-390)
- **Information Disclosure**: Leaking error details (CWE-209)
- **Missing Validation**: No input sanitization (CWE-20)
- **Resource Leaks**: Unclosed files, connections, threads (CWE-772)
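The sketch below shows one way the error handling issues above are commonly addressed together: narrow `except` clauses instead of a broad catch, a `with` block to close the file, and a generic message for callers while details go to the log. `read_config` is a hypothetical function written for illustration, not an entry from the dataset.

```python
import json
import logging

logger = logging.getLogger(__name__)


def read_config(path):
    """Load a JSON config file, handling only the failures we expect."""
    try:
        # The with-block closes the file even if parsing fails.
        with open(path, encoding="utf-8") as handle:
            return json.load(handle)
    except FileNotFoundError:
        # An expected, recoverable case: log it and fall back to defaults.
        logger.warning("config file not found, using defaults")
        return {}
    except json.JSONDecodeError as exc:
        # Keep parser details in the log; give callers a generic message.
        logger.error("invalid JSON in config: %s", exc)
        raise ValueError("configuration file is not valid JSON") from None
```

Any other exception propagates unchanged, so unexpected failures are not silently swallowed.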
### Concurrency Bugs
- **Race Conditions**: Unprotected shared state (CWE-362)
- **TOCTOU Issues**: Check-then-act patterns (CWE-367)
- **Deadlocks**: Missing timeout handling (CWE-833)
- **Missing Synchronization**: No locks on shared data (CWE-820)
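A small sketch of the fix pattern for unprotected shared state, assuming Python threads and a `threading.Lock`. The class `SafeCounter` and the function `hammer` are hypothetical names used only for this illustration.

```python
import threading


class SafeCounter:
    """A counter whose read-modify-write is guarded by a lock."""

    def __init__(self):
        self._value = 0
        self._lock = threading.Lock()

    def increment(self):
        # Without the lock, two threads could read the same value
        # and one of the updates would be lost.
        with self._lock:
            self._value += 1

    def value(self):
        with self._lock:
            return self._value


def hammer(counter, workers=8, per_worker=10_000):
    """Increment the counter from several threads and return the total."""
    def work():
        for _ in range(per_worker):
            counter.increment()

    threads = [threading.Thread(target=work) for _ in range(workers)]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return counter.value()
```

With the lock in place, `hammer(SafeCounter())` returns `workers * per_worker` on every run.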
### Memory Leaks
- **Unbounded Caches**: No size limits or TTL (CWE-401)
- **Unclosed Resources**: Files, connections, threads (CWE-772)
- **Growing Lists**: No eviction policies (CWE-400)
- **Circular References**: Event listeners, callbacks (CWE-459)
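For the unbounded cache item, one common remedy is a size limit with least-recently-used eviction. The sketch below is a minimal illustration built on `collections.OrderedDict`; `BoundedCache` is a hypothetical class, and TTL-based expiry is deliberately left out.

```python
from collections import OrderedDict


class BoundedCache:
    """A cache that evicts the least recently used entry when full."""

    def __init__(self, max_entries=128):
        if max_entries < 1:
            raise ValueError("max_entries must be at least 1")
        self._max_entries = max_entries
        self._items = OrderedDict()

    def get(self, key, default=None):
        if key not in self._items:
            return default
        # Mark the entry as most recently used.
        self._items.move_to_end(key)
        return self._items[key]

    def put(self, key, value):
        self._items[key] = value
        self._items.move_to_end(key)
        if len(self._items) > self._max_entries:
            # Drop the oldest entry so memory use stays bounded.
            self._items.popitem(last=False)

    def __len__(self):
        return len(self._items)
```

For caching function results specifically, the standard library's `functools.lru_cache(maxsize=...)` provides similar bounding without a custom class.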
## Languages Covered
- **Python**: 35+ examples
- **JavaScript/TypeScript**: 15+ examples
- **Go**: 10+ examples
## Example Entry
```json
{
  "type": "code_review",
  "input_code": "def login(username, password):\n    query = \"SELECT * FROM users WHERE username='\" + username + \"' AND password='\" + password + \"'\"\n    cursor.execute(query)\n    return cursor.fetchone()",
  "analysis": "1. The code directly concatenates user input into a SQL query without any sanitization.\n2. This creates a classic SQL injection vulnerability where an attacker can manipulate the query.\n3. Comparing the password inside the query implies that passwords are stored in plaintext.",
  "findings": [
    {"issue": "SQL Injection Vulnerability", "severity": "CRITICAL", "location": "query construction", "cwe": "CWE-89"},
    {"issue": "Plaintext Password Storage", "severity": "HIGH", "location": "password comparison", "cwe": "CWE-256"}
  ],
  "fix": "def login(username, password):\n    cursor.execute(\"SELECT user_id, password_hash FROM users WHERE username = %s\", (username,))\n    result = cursor.fetchone()\n    if result and verify_password(password, result['password_hash']):\n        return result"
}
```
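The `fix` field in the entry above relies on an external `cursor` and a `verify_password` helper that the entry does not define. The following is a self-contained sketch of the same idea using only the standard library, under several assumptions: an in-memory `sqlite3` database (which uses `?` placeholders rather than `%s`), PBKDF2 for hashing, and a hypothetical `users` schema with `salt` and `password_hash` columns. It is an illustration, not the dataset's reference implementation.

```python
import hashlib
import hmac
import os
import sqlite3


def hash_password(password, salt):
    """Derive a password hash with PBKDF2-HMAC-SHA256 (iteration count is an assumption)."""
    return hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, 100_000)


def login(conn, username, password):
    """Return the user_id on success, or None if the credentials do not match."""
    # The username is bound as a parameter, never concatenated into the SQL.
    row = conn.execute(
        "SELECT user_id, salt, password_hash FROM users WHERE username = ?",
        (username,),
    ).fetchone()
    if row is None:
        return None
    user_id, salt, stored_hash = row
    # compare_digest avoids leaking information through comparison timing.
    if hmac.compare_digest(hash_password(password, salt), stored_hash):
        return user_id
    return None


def make_demo_db():
    """Create an in-memory database with one demo user (hypothetical schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE users ("
        "user_id INTEGER PRIMARY KEY, username TEXT UNIQUE, "
        "salt BLOB, password_hash BLOB)"
    )
    salt = os.urandom(16)
    conn.execute(
        "INSERT INTO users (username, salt, password_hash) VALUES (?, ?, ?)",
        ("alice", salt, hash_password("s3cret", salt)),
    )
    return conn
```

With this setup, a classic injection string such as `' OR '1'='1` passed as the username is treated as a literal value and simply matches no row.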
## Usage
This dataset is suitable for:
- Training code review AI models
- Teaching secure coding practices
- Building and evaluating automated code analysis tools
- Security awareness training
- Bug bounty preparation
## Statistics
- **Total Examples**: 62
- **Code Review**: ~32 examples
- **Debugging**: ~30 examples
- **File Size**: ~75KB
- **Unique CWEs**: 25+ vulnerability types
- **Languages**: Python, JavaScript, TypeScript, Go
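Figures like these could be recomputed from the records themselves. The sketch below assumes each record is a dict with a `type` field and a `findings` list whose items may carry a `cwe` key, as in the example entry; `summarize` is a hypothetical helper, and the numbers above are the README's own, not output of this code.

```python
from collections import Counter


def summarize(records):
    """Count records by type and collect the distinct CWE identifiers."""
    type_counts = Counter()
    cwes = set()
    total = 0
    for record in records:
        total += 1
        type_counts[record["type"]] += 1
        for finding in record.get("findings", []):
            if "cwe" in finding:
                cwes.add(finding["cwe"])
    return {
        "total": total,
        "by_type": dict(type_counts),
        "unique_cwes": len(cwes),
    }
```

The function accepts any iterable of parsed records, so it can be fed from a list in memory or from a generator that reads the JSONL file line by line.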
## File Location
```
/c/Users/admin/Pony-Alpha-2-Dataset-Training/datasets/06-code-review-debugging/code-review-debugging.jsonl
```