feat: initial Alpha Brain 2 dataset release
Massive training corpus for AI coding models containing:

- 10 JSONL training datasets (641+ examples across coding, reasoning, planning, architecture, communication, debugging, security, workflows, error handling, UI/UX)
- 11 agent behavior specifications (explorer, planner, reviewer, debugger, executor, UI designer, Linux admin, kernel engineer, security architect, automation engineer, API architect)
- 6 skill definition files (coding, API engineering, kernel, Linux server, security architecture, server automation, UI/UX)
- Master README with project origin story and philosophy

Built by Pony Alpha 2 to help AI models learn expert-level coding approaches.
datasets/03-planning-decomposition/README.md (new file, 136 lines)

# Planning and Decomposition Dataset

**Created:** March 13, 2026 3:40 PM

## Overview

This dataset contains examples for training AI models to decompose complex tasks into sub-tasks, manage todo lists, and determine execution order and dependencies.

## Dataset Format

JSONL (JSON Lines) - one JSON object per line.

## Schema

Each example contains:

- `task`: The original user request
- `decomposition`: Array of sub-tasks
- `execution_order`: Dependencies between tasks (pairs of `[task_i, task_j]`)
- `todo_list`: Structured todos with `content`, `status`, and `activeForm`
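
To make the schema concrete, here is a hypothetical record in the shape described above. The field values are illustrative, not drawn from the actual dataset:

```python
import json

# Hypothetical record matching the documented schema; the task text and
# sub-task names are illustrative, not taken from the dataset itself.
example = {
    "task": "Add JWT-based user authentication",
    "decomposition": [
        "Design the user model",
        "Implement token issuance",
        "Add login endpoint",
    ],
    # Pairs [task_i, task_j]: task_i must finish before task_j starts.
    "execution_order": [[1, 2], [2, 3]],
    "todo_list": [
        {"content": "Design the user model", "status": "pending",
         "activeForm": "Designing the user model"},
        {"content": "Implement token issuance", "status": "pending",
         "activeForm": "Implementing token issuance"},
        {"content": "Add login endpoint", "status": "pending",
         "activeForm": "Adding login endpoint"},
    ],
}

# Serialize as one JSONL line, the on-disk format of the dataset.
line = json.dumps(example)
record = json.loads(line)
assert set(record) == {"task", "decomposition", "execution_order", "todo_list"}
```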

## Statistics

- **Total Examples:** 43
- **File Size:** 90KB
- **Format:** JSONL

## Coverage

### Todo List Management

- All examples include structured todo lists with `content`, `status`, and `activeForm`
- Demonstrates proper todo item formulation
- Shows task progression from "pending" to completion
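
The progression from "pending" to completion can be sketched as a small state machine over the `status` field. The `advance` helper is ours, for illustration; only the three fields come from the dataset schema:

```python
def advance(todo):
    """Move a todo one step along pending -> in_progress -> completed."""
    order = ["pending", "in_progress", "completed"]
    i = order.index(todo["status"])
    # Clamp at the end so a completed todo stays completed.
    todo["status"] = order[min(i + 1, len(order) - 1)]
    return todo

todo = {"content": "Write migration script", "status": "pending",
        "activeForm": "Writing migration script"}
advance(todo)  # pending -> in_progress
assert todo["status"] == "in_progress"
advance(todo)  # in_progress -> completed
assert todo["status"] == "completed"
```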

### Multi-Step Tasks (3+ steps)

All examples have 8-15 sub-steps, demonstrating:

- Feature implementation (authentication, real-time chat, search)
- Bug investigation (memory leaks, API timeouts)
- Refactoring (monolithic controllers, duplicate code)
- Migration (database, frontend JavaScript to TypeScript)
- CI/CD setup (GitHub Actions pipelines)
- And many more complex scenarios

### ONE Task in Progress at a Time

All `todo_list` items start with `status: "pending"`; during execution, only one task should be marked `in_progress` at any given time.
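
That invariant is easy to check programmatically. A minimal sketch (the function name is ours, not part of the dataset):

```python
def at_most_one_in_progress(todo_list):
    """Return True if no more than one todo is marked in_progress."""
    return sum(1 for t in todo_list if t["status"] == "in_progress") <= 1

todos = [
    {"content": "A", "status": "completed", "activeForm": "Doing A"},
    {"content": "B", "status": "in_progress", "activeForm": "Doing B"},
    {"content": "C", "status": "pending", "activeForm": "Doing C"},
]
assert at_most_one_in_progress(todos)

todos[2]["status"] = "in_progress"  # a second active task breaks the rule
assert not at_most_one_in_progress(todos)
```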

### Sequential vs Parallel Decisions

The `execution_order` field clearly shows dependencies:

- Sequential: `[[1,2], [2,3], [3,4]]` - tasks must complete in order
- Parallel: `[[1,2], [1,3], [2,4], [3,4]]` - some tasks can run simultaneously
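
One way a consumer could act on these pairs is a simple topological layering: group tasks into batches where every task's prerequisites are already done, so tasks within a batch could run in parallel. A sketch, assuming the pair `[i, j]` means "task `i` before task `j`" as described above:

```python
from collections import defaultdict

def parallel_batches(execution_order):
    """Layer tasks into batches; tasks in one batch share no unmet
    dependencies and could run simultaneously."""
    deps = defaultdict(set)  # task -> set of prerequisite tasks
    tasks = set()
    for i, j in execution_order:
        deps[j].add(i)
        tasks.update((i, j))
    done, batches = set(), []
    while tasks - done:
        # A task is ready once all of its prerequisites are done.
        ready = {t for t in tasks - done if deps[t] <= done}
        if not ready:
            raise ValueError("cycle in execution_order")
        batches.append(sorted(ready))
        done |= ready
    return batches

# Sequential chain: one task per batch.
assert parallel_batches([[1, 2], [2, 3], [3, 4]]) == [[1], [2], [3], [4]]
# Diamond: tasks 2 and 3 can run in parallel.
assert parallel_batches([[1, 2], [1, 3], [2, 4], [3, 4]]) == [[1], [2, 3], [4]]
```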

### Replanning When New Info Emerges

Several examples show iterative refinement:

- Debug scenarios where new information changes the approach
- Migration examples with staging before production
- Testing examples where results inform next steps

## Scenarios Covered

### Feature Implementation

- User authentication with JWT
- Real-time chat with WebSocket
- Search functionality with Elasticsearch
- Data export with multiple formats
- Notification systems (email, SMS, push)
- File upload with drag-and-drop
- Content management systems
- Real-time collaboration features
- Data visualization components
- Form validation systems
- And more...

### Bug Investigation

- Memory leak debugging
- API timeout errors
- Performance profiling
- Error tracking systems

### Refactoring

- Monolithic controller to service layer
- Duplicate code to utilities
- Frontend optimization (bundle size, load time)
- Component library creation with Storybook

### Migration

- PostgreSQL to MongoDB
- JavaScript to TypeScript
- Database schema migrations
- API versioning

### CI/CD

- GitHub Actions pipeline setup
- Automated testing strategies
- Build and deployment automation
- Infrastructure monitoring

### Security & Compliance

- Security audits and fixes
- Data encryption
- Permission systems (RBAC)
- Audit logging
- Input validation

### Infrastructure

- Database sharding
- Message queue systems
- Caching with Redis
- API gateway setup
- Distributed tracing
- Session management

### Testing & Quality

- Automated testing strategies
- A/B testing frameworks
- Localization testing
- Accessibility features

## Usage Example

```python
import json

# Read the dataset
with open('planning-decomposition.jsonl', 'r') as f:
    for line in f:
        example = json.loads(line)
        print(f"Task: {example['task']}")
        print(f"Sub-tasks: {len(example['decomposition'])}")
        print(f"Dependencies: {len(example['execution_order'])}")
        print(f"Todo items: {len(example['todo_list'])}")
        print()
```

## File Location

`C:/Users/admin/Pony-Alpha-2-Dataset-Training/datasets/03-planning-decomposition/planning-decomposition.jsonl`

## Notes

- Dataset created March 13, 2026
- All examples follow consistent schema
- Suitable for training planning and task decomposition models
- Covers real-world software engineering scenarios