# Planning and Decomposition Dataset

**Created:** March 13, 2026 3:40 PM

## Overview

This dataset contains examples for training AI models to decompose complex tasks into sub-tasks, manage todo lists, and determine execution order and dependencies.

## Dataset Format

JSONL (JSON Lines) - one JSON object per line

## Schema

Each example contains:

- `task`: The original user request
- `decomposition`: Array of sub-tasks
- `execution_order`: Dependencies between tasks (pairs of `[task_i, task_j]`)
- `todo_list`: Structured todos with `content`, `status`, and `activeForm`

## Statistics

- **Total Examples:** 43
- **File Size:** 90KB
- **Format:** JSONL

## Coverage

### Todo List Management

- All examples include structured todo lists with `content`, `status`, and `activeForm`
- Demonstrates proper todo item formulation
- Shows task progression from "pending" to completion

### Multi-Step Tasks (3+ steps)

All examples have 8-15 sub-steps, demonstrating:

- Feature implementation (authentication, real-time chat, search)
- Bug investigation (memory leaks, API timeouts)
- Refactoring (monolithic controllers, duplicate code)
- Migration (database, frontend JavaScript to TypeScript)
- CI/CD setup (GitHub Actions pipelines)
- And many more complex scenarios

### ONE Task in Progress at a Time

All `todo_list` items show `status: "pending"`, demonstrating that only one task should be marked as `in_progress` at any given time during execution.
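The schema above can be illustrated with a hypothetical record. The field values below are invented for this sketch and are not drawn from the dataset itself; only the field names and types follow the schema:

```python
import json

# Hypothetical record matching the dataset schema (values are illustrative).
record = {
    "task": "Add JWT-based user authentication",
    "decomposition": [
        "Design the auth database schema",
        "Implement the token-issuing endpoint",
        "Add middleware to verify tokens",
    ],
    # Pairs [task_i, task_j]: task_i must finish before task_j starts.
    "execution_order": [[1, 2], [2, 3]],
    "todo_list": [
        {
            "content": "Design the auth database schema",
            "status": "pending",
            "activeForm": "Designing the auth database schema",
        },
    ],
}

# Each record is serialized as a single line in the JSONL file.
line = json.dumps(record)
print(line)
```

Because `json.dumps` emits no embedded newlines by default, appending `line + "\n"` to the file keeps the one-object-per-line invariant.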
### Sequential vs Parallel Decisions

The `execution_order` field clearly shows dependencies:

- Sequential: `[[1,2], [2,3], [3,4]]` - tasks must complete in order
- Parallel: `[[1,2], [1,3], [2,4], [3,4]]` - some tasks can run simultaneously

### Replanning When New Info Emerges

Several examples show iterative refinement:

- Debug scenarios where new information changes the approach
- Migration examples with staging before production
- Testing examples where results inform next steps

## Scenarios Covered

### Feature Implementation

- User authentication with JWT
- Real-time chat with WebSocket
- Search functionality with Elasticsearch
- Data export with multiple formats
- Notification systems (email, SMS, push)
- File upload with drag-and-drop
- Content management systems
- Real-time collaboration features
- Data visualization components
- Form validation systems
- And more...

### Bug Investigation

- Memory leak debugging
- API timeout errors
- Performance profiling
- Error tracking systems

### Refactoring

- Monolithic controller to service layer
- Duplicate code to utilities
- Frontend optimization (bundle size, load time)
- Component library creation with Storybook

### Migration

- PostgreSQL to MongoDB
- JavaScript to TypeScript
- Database schema migrations
- API versioning

### CI/CD

- GitHub Actions pipeline setup
- Automated testing strategies
- Build and deployment automation
- Infrastructure monitoring

### Security & Compliance

- Security audits and fixes
- Data encryption
- Permission systems (RBAC)
- Audit logging
- Input validation

### Infrastructure

- Database sharding
- Message queue systems
- Caching with Redis
- API gateway setup
- Distributed tracing
- Session management

### Testing & Quality

- Automated testing strategies
- A/B testing frameworks
- Localization testing
- Accessibility features

## Usage Example

```python
import json

# Read the dataset, one JSON object per line
with open('planning-decomposition.jsonl', 'r') as f:
    for line in f:
        example = json.loads(line)
        print(f"Task: {example['task']}")
        print(f"Sub-tasks: {len(example['decomposition'])}")
        print(f"Dependencies: {len(example['execution_order'])}")
        print(f"Todo items: {len(example['todo_list'])}")
        print()
```

## File Location

`C:/Users/admin/Pony-Alpha-2-Dataset-Training/datasets/03-planning-decomposition/planning-decomposition.jsonl`

## Notes

- Dataset created March 13, 2026
- All examples follow consistent schema
- Suitable for training planning and task decomposition models
- Covers real-world software engineering scenarios
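The sequential-vs-parallel distinction encoded in `execution_order` can be recovered programmatically. A minimal sketch using Kahn-style topological layering; the function name and the four-task inputs are illustrative, not part of the dataset:

```python
from collections import defaultdict

def execution_waves(n_tasks, execution_order):
    """Group tasks 1..n_tasks into waves: tasks within one wave have no
    dependencies on each other and could run in parallel."""
    deps = defaultdict(set)  # task -> set of prerequisite tasks
    for i, j in execution_order:
        deps[j].add(i)

    remaining = set(range(1, n_tasks + 1))
    waves = []
    while remaining:
        # A task is ready once none of its prerequisites remain.
        ready = {t for t in remaining if not deps[t] & remaining}
        if not ready:
            raise ValueError("cycle in execution_order")
        waves.append(sorted(ready))
        remaining -= ready
    return waves

# Parallel example: tasks 2 and 3 both depend only on task 1.
print(execution_waves(4, [[1, 2], [1, 3], [2, 4], [3, 4]]))
# → [[1], [2, 3], [4]]

# Sequential example: a strict chain.
print(execution_waves(4, [[1, 2], [2, 3], [3, 4]]))
# → [[1], [2], [3], [4]]
```

A fully sequential decomposition yields one task per wave, while any wave with more than one task marks an opportunity for parallel execution.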