# Planning and Decomposition Dataset

**Created:** March 13, 2026 3:40 PM

## Overview

This dataset contains examples for training AI models to decompose complex tasks into sub-tasks, manage todo lists, and determine execution order and dependencies.

## Dataset Format

JSONL (JSON Lines) - one JSON object per line

## Schema

Each example contains:

- `task`: The original user request
- `decomposition`: Array of sub-tasks
- `execution_order`: Dependencies between tasks (pairs of `[task_i, task_j]`)
- `todo_list`: Structured todos with `content`, `status`, and `activeForm`

## Statistics

- **Total Examples:** 43
- **File Size:** 90KB
- **Format:** JSONL

## Coverage

### Todo List Management

- All examples include structured todo lists with `content`, `status`, and `activeForm`
- Demonstrates proper todo item formulation
- Shows task progression from "pending" to completion

### Multi-Step Tasks (3+ steps)

All examples have 8-15 sub-steps, demonstrating:

- Feature implementation (authentication, real-time chat, search)
- Bug investigation (memory leaks, API timeouts)
- Refactoring (monolithic controllers, duplicate code)
- Migration (database, frontend JavaScript to TypeScript)
- CI/CD setup (GitHub Actions pipelines)
- And many more complex scenarios

### ONE Task in Progress at a Time

All `todo_list` items show `status: "pending"`, demonstrating that only one task should be marked as `in_progress` at any given time during execution.
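The schema above can be illustrated with a hypothetical record. The field values below are invented for this sketch and are not drawn from the dataset itself; only the field names and types follow the schema:

```python
import json

# Hypothetical record matching the dataset schema (values are illustrative).
record = {
    "task": "Add JWT-based user authentication",
    "decomposition": [
        "Design the auth database schema",
        "Implement the token-issuing endpoint",
        "Add middleware to verify tokens",
    ],
    # Pairs [task_i, task_j]: task_i must finish before task_j starts.
    "execution_order": [[1, 2], [2, 3]],
    "todo_list": [
        {
            "content": "Design the auth database schema",
            "status": "pending",
            "activeForm": "Designing the auth database schema",
        },
    ],
}

# Each record is serialized as a single line in the JSONL file.
line = json.dumps(record)
print(line)
```

Because `json.dumps` emits no embedded newlines by default, appending `line + "\n"` to the file keeps the one-object-per-line invariant.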
### Sequential vs Parallel Decisions

The `execution_order` field clearly shows dependencies:

- Sequential: `[[1,2], [2,3], [3,4]]` - tasks must complete in order
- Parallel: `[[1,2], [1,3], [2,4], [3,4]]` - some tasks can run simultaneously

### Replanning When New Info Emerges

Several examples show iterative refinement:

- Debug scenarios where new information changes the approach
- Migration examples with staging before production
- Testing examples where results inform next steps

## Scenarios Covered

### Feature Implementation

- User authentication with JWT
- Real-time chat with WebSocket
- Search functionality with Elasticsearch
- Data export with multiple formats
- Notification systems (email, SMS, push)
- File upload with drag-and-drop
- Content management systems
- Real-time collaboration features
- Data visualization components
- Form validation systems
- And more...

### Bug Investigation

- Memory leak debugging
- API timeout errors
- Performance profiling
- Error tracking systems

### Refactoring

- Monolithic controller to service layer
- Duplicate code to utilities
- Frontend optimization (bundle size, load time)
- Component library creation with Storybook

### Migration

- PostgreSQL to MongoDB
- JavaScript to TypeScript
- Database schema migrations
- API versioning

### CI/CD

- GitHub Actions pipeline setup
- Automated testing strategies
- Build and deployment automation
- Infrastructure monitoring

### Security & Compliance

- Security audits and fixes
- Data encryption
- Permission systems (RBAC)
- Audit logging
- Input validation

### Infrastructure

- Database sharding
- Message queue systems
- Caching with Redis
- API gateway setup
- Distributed tracing
- Session management

### Testing & Quality

- Automated testing strategies
- A/B testing frameworks
- Localization testing
- Accessibility features

## Usage Example

```python
import json

# Read the dataset, one JSON object per line
with open('planning-decomposition.jsonl', 'r') as f:
    for line in f:
        example = json.loads(line)
        print(f"Task: {example['task']}")
        print(f"Sub-tasks: {len(example['decomposition'])}")
        print(f"Dependencies: {len(example['execution_order'])}")
        print(f"Todo items: {len(example['todo_list'])}")
        print()
```

## File Location

`C:/Users/admin/Pony-Alpha-2-Dataset-Training/datasets/03-planning-decomposition/planning-decomposition.jsonl`

## Notes

- Dataset created March 13, 2026
- All examples follow consistent schema
- Suitable for training planning and task decomposition models
- Covers real-world software engineering scenarios
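The sequential-vs-parallel distinction encoded in `execution_order` can be recovered programmatically. A minimal sketch using Kahn-style topological layering; the function name and the four-task inputs are illustrative, not part of the dataset:

```python
from collections import defaultdict

def execution_waves(n_tasks, execution_order):
    """Group tasks 1..n_tasks into waves: tasks within one wave have no
    dependencies on each other and could run in parallel."""
    deps = defaultdict(set)  # task -> set of prerequisite tasks
    for i, j in execution_order:
        deps[j].add(i)

    remaining = set(range(1, n_tasks + 1))
    waves = []
    while remaining:
        # A task is ready once none of its prerequisites remain.
        ready = {t for t in remaining if not deps[t] & remaining}
        if not ready:
            raise ValueError("cycle in execution_order")
        waves.append(sorted(ready))
        remaining -= ready
    return waves

# Parallel example: tasks 2 and 3 both depend only on task 1.
print(execution_waves(4, [[1, 2], [1, 3], [2, 4], [3, 4]]))
# → [[1], [2, 3], [4]]

# Sequential example: a strict chain.
print(execution_waves(4, [[1, 2], [2, 3], [3, 4]]))
# → [[1], [2], [3], [4]]
```

A fully sequential decomposition yields one task per wave, while any wave with more than one task marks an opportunity for parallel execution.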