# Planning and Decomposition Dataset
**Created:** March 13, 2026 3:40 PM
## Overview
This dataset contains examples for training AI models to decompose complex tasks into sub-tasks, manage todo lists, and determine execution order and dependencies.
## Dataset Format
JSONL (JSON Lines) - one JSON object per line
## Schema
Each example contains:
- `task`: The original user request
- `decomposition`: Array of sub-tasks
- `execution_order`: Dependencies between sub-tasks, given as pairs `[task_i, task_j]` where `task_i` must complete before `task_j` starts
- `todo_list`: Structured todos with `content`, `status`, and `activeForm`
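A quick structural check can catch malformed records before training. The following is a minimal sketch, not part of the dataset tooling: the function name `validate_example` is hypothetical, and the type expectations (a string `task`, list-valued `decomposition`, two-element `execution_order` pairs, dict-valued todos) are assumptions inferred from the schema above.

```python
from typing import Any

# Field names come from the schema above; the type checks are assumptions.
REQUIRED_KEYS = ("task", "decomposition", "execution_order", "todo_list")
TODO_KEYS = ("content", "status", "activeForm")


def validate_example(example: dict[str, Any]) -> list[str]:
    """Return a list of schema problems; an empty list means the record looks valid."""
    problems = [f"missing field: {key}" for key in REQUIRED_KEYS if key not in example]
    if problems:
        return problems  # no point checking contents of absent fields

    if not isinstance(example["task"], str):
        problems.append("task should be a string")
    if not isinstance(example["decomposition"], list):
        problems.append("decomposition should be a list")

    # Each dependency is assumed to be a two-element list [task_i, task_j].
    for pair in example["execution_order"]:
        if not (isinstance(pair, list) and len(pair) == 2):
            problems.append(f"execution_order entry is not a pair: {pair!r}")

    for i, todo in enumerate(example["todo_list"]):
        if not isinstance(todo, dict):
            problems.append(f"todo_list[{i}] is not an object")
            continue
        for key in TODO_KEYS:
            if key not in todo:
                problems.append(f"todo_list[{i}] missing {key}")
    return problems
```

Returning a list of problems rather than raising on the first one makes it easier to report every issue in a record at once.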
## Statistics
- **Total Examples:** 43
- **File Size:** 90 KB
- **Format:** JSONL
## Coverage
### Todo List Management
- All examples include structured todo lists with `content`, `status`, and `activeForm`
- Demonstrates proper todo item formulation
- Models task progression starting from the `"pending"` status
### Multi-Step Tasks (3+ steps)
All examples have 8-15 sub-steps, demonstrating:
- Feature implementation (authentication, real-time chat, search)
- Bug investigation (memory leaks, API timeouts)
- Refactoring (monolithic controllers, duplicate code)
- Migration (database, frontend JavaScript to TypeScript)
- CI/CD setup (GitHub Actions pipelines)
- And many more complex scenarios
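The 8-15 sub-step range stated above can be verified directly against the file. This is a small sketch; the helper name `substep_range` is hypothetical, and it assumes every record carries a list-valued `decomposition` field as described in the schema.

```python
import json
from typing import Iterable


def substep_range(lines: Iterable[str]) -> tuple[int, int]:
    """Return (min, max) number of sub-tasks across JSONL lines."""
    counts = [
        len(json.loads(line)["decomposition"])
        for line in lines
        if line.strip()  # ignore blank lines
    ]
    if not counts:
        raise ValueError("no examples found")
    return min(counts), max(counts)
```

Because an open file iterates line by line, it can be passed in directly, for example `substep_range(open('planning-decomposition.jsonl', encoding='utf-8'))`.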
### ONE Task in Progress at a Time
Every `todo_list` item in the dataset has `status: "pending"`, so each example captures the plan before execution begins. The intended convention during execution is that at most one item is marked `in_progress` at any given time.
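The one-at-a-time convention can be sketched as a small state transition over a todo list. This is an illustration under assumptions, not code shipped with the dataset: the helper names `advance` and `has_single_active` are hypothetical, and the `"completed"` status value is assumed, since the dataset itself only contains `"pending"` items.

```python
def advance(todo_list: list[dict]) -> list[dict]:
    """Finish the current in_progress item (if any) and start the next pending one."""
    updated = [dict(item) for item in todo_list]  # copy so the input is untouched
    for item in updated:
        if item["status"] == "in_progress":
            item["status"] = "completed"  # assumed status name
    for item in updated:
        if item["status"] == "pending":
            item["status"] = "in_progress"
            break  # only ever activate one item
    return updated


def has_single_active(todo_list: list[dict]) -> bool:
    """True when at most one item is in_progress."""
    return sum(item["status"] == "in_progress" for item in todo_list) <= 1
```

Calling `advance` repeatedly walks the list from all-pending to all-completed while `has_single_active` holds after every step.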
### Sequential vs Parallel Decisions
The `execution_order` field encodes dependencies between sub-tasks:
- Sequential: `[[1,2], [2,3], [3,4]]` - each task must complete before the next one starts
- Parallel: `[[1,2], [1,3], [2,4], [3,4]]` - tasks 2 and 3 both depend only on task 1 and can run simultaneously; task 4 waits for both
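Dependency pairs like these can be turned into an execution schedule with a topological sort. The sketch below uses the standard-library `graphlib` module (Python 3.9+) and assumes that a pair `[i, j]` means task `i` must finish before task `j` starts; the function name `parallel_batches` is hypothetical.

```python
from graphlib import TopologicalSorter


def parallel_batches(execution_order: list[list[int]]) -> list[list[int]]:
    """Group tasks into batches; tasks within a batch have no unmet dependencies."""
    sorter = TopologicalSorter()
    for before, after in execution_order:
        sorter.add(after, before)  # `after` depends on `before` (assumed pair meaning)
    sorter.prepare()  # raises graphlib.CycleError on circular dependencies

    batches = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # everything runnable right now
        batches.append(ready)
        sorter.done(*ready)  # mark the whole batch finished
    return batches


print(parallel_batches([[1, 2], [2, 3], [3, 4]]))          # [[1], [2], [3], [4]]
print(parallel_batches([[1, 2], [1, 3], [2, 4], [3, 4]]))  # [[1], [2, 3], [4]]
```

A batch with more than one task marks a point where work can proceed in parallel; a schedule made only of single-task batches is fully sequential.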
### Replanning When New Info Emerges
Several examples show iterative refinement:
- Debug scenarios where new information changes the approach
- Migration examples with staging before production
- Testing examples where results inform next steps
## Scenarios Covered
### Feature Implementation
- User authentication with JWT
- Real-time chat with WebSocket
- Search functionality with Elasticsearch
- Data export with multiple formats
- Notification systems (email, SMS, push)
- File upload with drag-and-drop
- Content management systems
- Real-time collaboration features
- Data visualization components
- Form validation systems
- And more...
### Bug Investigation
- Memory leak debugging
- API timeout errors
- Performance profiling
- Error tracking systems
### Refactoring
- Monolithic controller to service layer
- Duplicate code to utilities
- Frontend optimization (bundle size, load time)
- Component library creation with Storybook
### Migration
- PostgreSQL to MongoDB
- JavaScript to TypeScript
- Database schema migrations
- API versioning
### CI/CD
- GitHub Actions pipeline setup
- Automated testing strategies
- Build and deployment automation
- Infrastructure monitoring
### Security & Compliance
- Security audits and fixes
- Data encryption
- Permission systems (RBAC)
- Audit logging
- Input validation
### Infrastructure
- Database sharding
- Message queue systems
- Caching with Redis
- API gateway setup
- Distributed tracing
- Session management
### Testing & Quality
- Automated testing strategies
- A/B testing frameworks
- Localization testing
- Accessibility features
## Usage Example
```python
import json

# Read the dataset: one JSON object per line
with open('planning-decomposition.jsonl', 'r', encoding='utf-8') as f:
    for line in f:
        if not line.strip():
            continue  # skip blank lines
        example = json.loads(line)
        print(f"Task: {example['task']}")
        print(f"Sub-tasks: {len(example['decomposition'])}")
        print(f"Dependencies: {len(example['execution_order'])}")
        print(f"Todo items: {len(example['todo_list'])}")
        print()
```
## File Location
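`datasets/03-planning-decomposition/planning-decomposition.jsonl` (relative to the repository root; originally generated at `C:/Users/admin/Pony-Alpha-2-Dataset-Training/datasets/03-planning-decomposition/planning-decomposition.jsonl`)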
## Notes
- Dataset created March 13, 2026
- All examples follow consistent schema
- Suitable for training planning and task decomposition models
- Covers real-world software engineering scenarios