feat: initial Alpha Brain 2 dataset release
Massive training corpus for AI coding models containing:

- 10 JSONL training datasets (641+ examples across coding, reasoning, planning, architecture, communication, debugging, security, workflows, error handling, UI/UX)
- 11 agent behavior specifications (explorer, planner, reviewer, debugger, executor, UI designer, Linux admin, kernel engineer, security architect, automation engineer, API architect)
- 6 skill definition files (coding, API engineering, kernel, Linux server, security architecture, server automation, UI/UX)
- Master README with project origin story and philosophy

Built by Pony Alpha 2 to help AI models learn expert-level coding approaches.
datasets/03-planning-decomposition/README.md (new file, 136 lines)

# Planning and Decomposition Dataset

**Created:** March 13, 2026 3:40 PM

## Overview

This dataset contains examples for training AI models to decompose complex tasks into sub-tasks, manage todo lists, and determine execution order and dependencies.

## Dataset Format

JSONL (JSON Lines) - one JSON object per line.

## Schema

Each example contains:

- `task`: The original user request
- `decomposition`: Array of sub-tasks
- `execution_order`: Dependencies between tasks (pairs of `[task_i, task_j]`)
- `todo_list`: Structured todos with `content`, `status`, and `activeForm`
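
To make the schema concrete, here is a hypothetical record in the shape described above. The field values are illustrative, not drawn from the actual dataset:

```python
import json

# Hypothetical record matching the documented schema; the task text and
# sub-task names are illustrative, not taken from the dataset itself.
example = {
    "task": "Add JWT-based user authentication",
    "decomposition": [
        "Design the user model",
        "Implement token issuance",
        "Add login endpoint",
    ],
    # Pairs [task_i, task_j]: task_i must finish before task_j starts.
    "execution_order": [[1, 2], [2, 3]],
    "todo_list": [
        {"content": "Design the user model", "status": "pending",
         "activeForm": "Designing the user model"},
        {"content": "Implement token issuance", "status": "pending",
         "activeForm": "Implementing token issuance"},
        {"content": "Add login endpoint", "status": "pending",
         "activeForm": "Adding login endpoint"},
    ],
}

# Serialize as one JSONL line, the on-disk format of the dataset.
line = json.dumps(example)
record = json.loads(line)
assert set(record) == {"task", "decomposition", "execution_order", "todo_list"}
```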

## Statistics

- **Total Examples:** 43
- **File Size:** 90KB
- **Format:** JSONL

## Coverage

### Todo List Management

- All examples include structured todo lists with `content`, `status`, and `activeForm`
- Demonstrates proper todo item formulation
- Shows task progression from "pending" to completion
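
The progression from "pending" to completion can be sketched as a small state machine over the `status` field. The `advance` helper is ours, for illustration; only the three fields come from the dataset schema:

```python
def advance(todo):
    """Move a todo one step along pending -> in_progress -> completed."""
    order = ["pending", "in_progress", "completed"]
    i = order.index(todo["status"])
    # Clamp at the end so a completed todo stays completed.
    todo["status"] = order[min(i + 1, len(order) - 1)]
    return todo

todo = {"content": "Write migration script", "status": "pending",
        "activeForm": "Writing migration script"}
advance(todo)  # pending -> in_progress
assert todo["status"] == "in_progress"
advance(todo)  # in_progress -> completed
assert todo["status"] == "completed"
```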

### Multi-Step Tasks (3+ steps)

All examples have 8-15 sub-steps, demonstrating:

- Feature implementation (authentication, real-time chat, search)
- Bug investigation (memory leaks, API timeouts)
- Refactoring (monolithic controllers, duplicate code)
- Migration (database, frontend JavaScript to TypeScript)
- CI/CD setup (GitHub Actions pipelines)
- And many more complex scenarios

### ONE Task in Progress at a Time

All `todo_list` items start with `status: "pending"`; during execution, only one task should be marked `in_progress` at any given time.
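
That invariant is easy to check programmatically. A minimal sketch (the function name is ours, not part of the dataset):

```python
def at_most_one_in_progress(todo_list):
    """Return True if no more than one todo is marked in_progress."""
    return sum(1 for t in todo_list if t["status"] == "in_progress") <= 1

todos = [
    {"content": "A", "status": "completed", "activeForm": "Doing A"},
    {"content": "B", "status": "in_progress", "activeForm": "Doing B"},
    {"content": "C", "status": "pending", "activeForm": "Doing C"},
]
assert at_most_one_in_progress(todos)

todos[2]["status"] = "in_progress"  # a second active task breaks the rule
assert not at_most_one_in_progress(todos)
```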

### Sequential vs Parallel Decisions

The `execution_order` field clearly shows dependencies:

- Sequential: `[[1,2], [2,3], [3,4]]` - tasks must complete in order
- Parallel: `[[1,2], [1,3], [2,4], [3,4]]` - some tasks can run simultaneously
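
One way a consumer could act on these pairs is a simple topological layering: group tasks into batches where every task's prerequisites are already done, so tasks within a batch could run in parallel. A sketch, assuming the pair `[i, j]` means "task `i` before task `j`" as described above:

```python
from collections import defaultdict

def parallel_batches(execution_order):
    """Layer tasks into batches; tasks in one batch share no unmet
    dependencies and could run simultaneously."""
    deps = defaultdict(set)  # task -> set of prerequisite tasks
    tasks = set()
    for i, j in execution_order:
        deps[j].add(i)
        tasks.update((i, j))
    done, batches = set(), []
    while tasks - done:
        # A task is ready once all of its prerequisites are done.
        ready = {t for t in tasks - done if deps[t] <= done}
        if not ready:
            raise ValueError("cycle in execution_order")
        batches.append(sorted(ready))
        done |= ready
    return batches

# Sequential chain: one task per batch.
assert parallel_batches([[1, 2], [2, 3], [3, 4]]) == [[1], [2], [3], [4]]
# Diamond: tasks 2 and 3 can run in parallel.
assert parallel_batches([[1, 2], [1, 3], [2, 4], [3, 4]]) == [[1], [2, 3], [4]]
```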

### Replanning When New Info Emerges

Several examples show iterative refinement:

- Debug scenarios where new information changes the approach
- Migration examples with staging before production
- Testing examples where results inform next steps

## Scenarios Covered

### Feature Implementation

- User authentication with JWT
- Real-time chat with WebSocket
- Search functionality with Elasticsearch
- Data export with multiple formats
- Notification systems (email, SMS, push)
- File upload with drag-and-drop
- Content management systems
- Real-time collaboration features
- Data visualization components
- Form validation systems
- And more...

### Bug Investigation

- Memory leak debugging
- API timeout errors
- Performance profiling
- Error tracking systems

### Refactoring

- Monolithic controller to service layer
- Duplicate code to utilities
- Frontend optimization (bundle size, load time)
- Component library creation with Storybook

### Migration

- PostgreSQL to MongoDB
- JavaScript to TypeScript
- Database schema migrations
- API versioning

### CI/CD

- GitHub Actions pipeline setup
- Automated testing strategies
- Build and deployment automation
- Infrastructure monitoring

### Security & Compliance

- Security audits and fixes
- Data encryption
- Permission systems (RBAC)
- Audit logging
- Input validation

### Infrastructure

- Database sharding
- Message queue systems
- Caching with Redis
- API gateway setup
- Distributed tracing
- Session management

### Testing & Quality

- Automated testing strategies
- A/B testing frameworks
- Localization testing
- Accessibility features

## Usage Example

```python
import json

# Read the dataset
with open('planning-decomposition.jsonl', 'r') as f:
    for line in f:
        example = json.loads(line)
        print(f"Task: {example['task']}")
        print(f"Sub-tasks: {len(example['decomposition'])}")
        print(f"Dependencies: {len(example['execution_order'])}")
        print(f"Todo items: {len(example['todo_list'])}")
        print()
```

## File Location

`C:/Users/admin/Pony-Alpha-2-Dataset-Training/datasets/03-planning-decomposition/planning-decomposition.jsonl`

## Notes

- Dataset created March 13, 2026
- All examples follow consistent schema
- Suitable for training planning and task decomposition models
- Covers real-world software engineering scenarios