# Ralph Multi-Agent Orchestration System

## Architecture Overview

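The Ralph Multi-Agent Orchestration System runs 10+ Claude instances in parallel. It coordinates them through a shared task queue with dependency-aware scheduling, prevents and resolves conflicts with file-level locking, and reports progress on a real-time observability dashboard.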

```
┌─────────────────────────────────────────────┐
│           Meta-Agent Orchestrator           │
│           (ralph-integration.py)            │
│  - Analyzes requirements                    │
│  - Breaks into independent tasks            │
│  - Manages dependencies                     │
│  - Coordinates worker agents                │
└──────────────────────┬──────────────────────┘
                       │ Creates tasks
                       ▼
┌─────────────────────────────────────────────┐
│             Task Queue (Redis)              │
│         Stores and distributes work         │
└────┬───────────┬───────────┬───────────┬────┘
     │           │           │           │
     ▼           ▼           ▼           ▼
┌─────────┐ ┌─────────┐ ┌─────────┐ ┌─────────┐
│ Agent 1 │ │ Agent 2 │ │ Agent 3 │ │ Agent N │
│Frontend │ │ Backend │ │  Tests  │ │  Docs   │
└─────────┘ └─────────┘ └─────────┘ └─────────┘
     │           │           │           │
     └───────────┴─────┬─────┴───────────┘
                       │
                       ▼
              ┌─────────────────┐
              │  Observability  │
              │    Dashboard    │
              │ (Real-time UI)  │
              └─────────────────┘
```

## Core Components

### 1. Meta-Agent Orchestrator

The meta-agent is Ralph running in orchestration mode. Instead of writing code directly, it plans the work and manages the worker agents that write the code.

**Key Responsibilities:**
- Analyze project requirements
- Break down into parallelizable tasks
- Manage task dependencies
- Spawn and coordinate worker agents
- Monitor progress and handle conflicts
- Aggregate results

**Configuration:**
```bash
# Enable multi-agent mode
RALPH_MULTI_AGENT=true
RALPH_MAX_WORKERS=12
RALPH_TASK_QUEUE_HOST=localhost
RALPH_TASK_QUEUE_PORT=6379
RALPH_OBSERVABILITY_PORT=3001
```

### 2. Task Queue System

Uses Redis for reliable task distribution and state management.

**Task Structure:**
```json
{
  "id": "unique-task-id",
  "type": "frontend|backend|testing|docs|refactor|analysis",
  "description": "What needs to be done",
  "dependencies": ["task-id-1", "task-id-2"],
  "files": ["path/to/file1.ts", "path/to/file2.ts"],
  "priority": 5,
  "specialization": "optional-specific-agent-type",
  "timeout": 300,
  "retry_count": 0,
  "max_retries": 3
}
```

`type` takes one of the values listed, `priority` is an integer from 1 to 10, and `timeout` is in seconds.
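A minimal Python sketch of the task record above, assuming the meta-agent serializes tasks as JSON before pushing them to Redis. The `Task` class and its `validate`, `to_json` and `from_json` methods are hypothetical. Only the field names, the task types and the 1-10 priority range come from the structure above.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import List, Optional

# Task types listed in the task structure
TASK_TYPES = {"frontend", "backend", "testing", "docs", "refactor", "analysis"}


@dataclass
class Task:
    """Hypothetical in-memory form of a queued task (not Ralph's actual class)."""
    id: str
    type: str
    description: str
    dependencies: List[str] = field(default_factory=list)
    files: List[str] = field(default_factory=list)
    priority: int = 5
    specialization: Optional[str] = None
    timeout: int = 300          # seconds
    retry_count: int = 0
    max_retries: int = 3

    def validate(self) -> None:
        """Reject records that break the documented constraints."""
        if self.type not in TASK_TYPES:
            raise ValueError(f"unknown task type: {self.type!r}")
        if not 1 <= self.priority <= 10:
            raise ValueError(f"priority must be 1-10, got {self.priority}")
        if self.id in self.dependencies:
            raise ValueError("a task cannot depend on itself")

    def to_json(self) -> str:
        return json.dumps(asdict(self))

    @classmethod
    def from_json(cls, raw: str) -> "Task":
        task = cls(**json.loads(raw))
        task.validate()
        return task
```

Validating on deserialization means that a malformed task is rejected when a worker reads it, before it can stall the queue.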

**Redis Keys:**
- `claude_tasks` (list) - Main task queue
- `claude_tasks:pending` (list) - Tasks waiting for dependencies
- `claude_tasks:complete` (list) - Completed tasks
- `claude_tasks:failed` - Failed tasks awaiting retry
- `lock:{file_path}` - File-level locks (set with `NX`, so only one holder exists)
- `task:{task_id}` (hash) - Task status tracking
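The split between `claude_tasks` and `claude_tasks:pending` implies a routing rule: a task is ready only after all of its dependencies have completed. The following is a minimal in-memory sketch of that rule. Plain Python lists and a set stand in for the Redis keys, and `TaskQueues`, `submit`, `next_task` and `mark_complete` are hypothetical names, not Ralph's API.

```python
from typing import Dict, List, Optional, Set


class TaskQueues:
    """Hypothetical in-memory stand-in for the claude_tasks* Redis keys."""

    def __init__(self) -> None:
        self.ready: List[str] = []        # plays the role of claude_tasks
        self.pending: List[str] = []      # plays the role of claude_tasks:pending
        self.complete: Set[str] = set()   # plays the role of claude_tasks:complete
        self.deps: Dict[str, List[str]] = {}

    def _deps_met(self, task_id: str) -> bool:
        return all(d in self.complete for d in self.deps[task_id])

    def submit(self, task_id: str, dependencies: List[str]) -> None:
        """Queue a task as ready, or park it until its dependencies complete."""
        self.deps[task_id] = list(dependencies)
        (self.ready if self._deps_met(task_id) else self.pending).append(task_id)

    def next_task(self) -> Optional[str]:
        """What a worker would pop from claude_tasks."""
        return self.ready.pop(0) if self.ready else None

    def mark_complete(self, task_id: str) -> List[str]:
        """Record completion and promote pending tasks that are now unblocked."""
        self.complete.add(task_id)
        promoted = [t for t in self.pending if self._deps_met(t)]
        for t in promoted:
            self.pending.remove(t)
            self.ready.append(t)
        return promoted
```

Re-checking the pending list only when a task completes keeps workers simple: they never see a task whose inputs are missing.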

### 3. Specialized Worker Agents

Each worker agent has a specific role and configuration.

**Agent Types:**

| Agent Type | Specialization | Example Tasks |
|------------|----------------|---------------|
| **Frontend** | UI/UX, React, Vue, Svelte | Component refactoring, styling |
| **Backend** | APIs, databases, services | Endpoint creation, data models |
| **Testing** | Unit tests, integration tests | Test writing, coverage improvement |
| **Documentation** | Docs, comments, README | API docs, inline documentation |
| **Refactor** | Code quality, optimization | Performance tuning, code cleanup |
| **Analysis** | Code review, architecture | Dependency analysis, security audit |
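One way a dispatcher could use these specializations is to send each task to an idle agent whose specialization matches the task's `type`, or its explicit `specialization` field when set. This sketch assumes that mapping, including that the `docs` task type maps to a `documentation` agent. `pick_agent` and the agent record shape are hypothetical.

```python
from typing import Dict, List, Optional

# Assumed mapping from task "type" values to agent specializations in the table
TYPE_TO_SPECIALIZATION = {
    "frontend": "frontend",
    "backend": "backend",
    "testing": "testing",
    "docs": "documentation",
    "refactor": "refactor",
    "analysis": "analysis",
}


def pick_agent(task: Dict, agents: List[Dict]) -> Optional[str]:
    """Return the id of an idle agent suited to the task, or None if none is free.

    Hypothetical helper: an explicit task["specialization"] overrides the type mapping.
    """
    wanted = task.get("specialization") or TYPE_TO_SPECIALIZATION[task["type"]]
    candidates = [a for a in agents
                  if a["status"] == "idle" and a["specialization"] == wanted]
    if not candidates:
        return None
    # Spread work: prefer the matching agent that has completed the fewest tasks
    return min(candidates, key=lambda a: a["completed"])["id"]
```

If the function returns `None`, the task simply waits in the queue until a matching agent becomes idle.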

**Worker Configuration:**
```json
{
  "agent_id": "agent-frontend-1",
  "specialization": "frontend",
  "max_concurrent_tasks": 1,
  "file_lock_timeout": 300,
  "heartbeat_interval": 10,
  "log_level": "info"
}
```
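The `heartbeat_interval` setting lets the orchestrator tell a live worker from a lost one. The following is a minimal sketch that loads the configuration above into a dataclass. It assumes each worker's last heartbeat is available as a timestamp and that a worker counts as lost after missing three consecutive intervals. Both the threshold of three and the `is_lost` helper are illustrative assumptions, not documented behavior.

```python
import json
from dataclasses import dataclass


@dataclass
class WorkerConfig:
    agent_id: str
    specialization: str
    max_concurrent_tasks: int = 1
    file_lock_timeout: int = 300   # seconds
    heartbeat_interval: int = 10   # seconds
    log_level: str = "info"

    def is_lost(self, last_heartbeat: float, now: float, missed_allowed: int = 3) -> bool:
        """Hypothetical check: silent for more than `missed_allowed` heartbeat intervals."""
        return now - last_heartbeat > missed_allowed * self.heartbeat_interval


raw = ('{"agent_id": "agent-frontend-1", "specialization": "frontend", '
       '"max_concurrent_tasks": 1, "file_lock_timeout": 300, '
       '"heartbeat_interval": 10, "log_level": "info"}')
cfg = WorkerConfig(**json.loads(raw))
```

Allowing a few missed beats avoids declaring a worker dead because of a single delayed message.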

### 4. File Locking & Conflict Resolution

Prevents multiple agents from modifying the same file simultaneously.

**Lock Acquisition Flow:**
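1. The agent requests locks for every file its task will modify
2. For each file, it sets the `lock:{file_path}` key with the `NX` option, which succeeds only if no other agent holds that key
3. If every lock is acquired, the agent proceeds
4. If any lock is already held, the agent releases the locks it did acquire, waits, and retries (holding partial locks while waiting can deadlock two agents)
5. Locks expire automatically after `RALPH_FILE_LOCK_TIMEOUT` seconds, so a crashed agent cannot block a file indefinitely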
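The acquisition flow can be sketched in Python. `acquire_locks` and `release_locks` are hypothetical helpers. They only call `set(name, value, nx=True, ex=seconds)`, `get(name)` and `delete(name)`, which redis-py's `redis.Redis` client provides. Here they are exercised against a tiny in-memory stand-in, `FakeRedis`, that ignores expiry. Storing a random per-acquisition token as the lock value lets an agent release only the locks it still owns.

```python
import uuid
from typing import Dict, List, Optional


class FakeRedis:
    """Minimal in-memory stand-in for redis.Redis (no expiry, single process)."""

    def __init__(self) -> None:
        self.data: Dict[str, str] = {}

    def set(self, name: str, value: str, nx: bool = False, ex: Optional[int] = None):
        if nx and name in self.data:
            return None          # redis-py also returns None when NX blocks the write
        self.data[name] = value
        return True

    def get(self, name: str) -> Optional[str]:
        return self.data.get(name)

    def delete(self, name: str) -> int:
        return 1 if self.data.pop(name, None) is not None else 0


def release_locks(client, files: List[str], token: str) -> None:
    """Delete only the locks whose value is still this agent's token.

    Get-then-delete is not atomic; a production version would use a Lua script.
    """
    for path in files:
        current = client.get(f"lock:{path}")
        if isinstance(current, bytes):   # redis-py returns bytes unless decode_responses=True
            current = current.decode()
        if current == token:
            client.delete(f"lock:{path}")


def acquire_locks(client, files: List[str], timeout: int = 300) -> Optional[str]:
    """All-or-nothing: return an owner token if every lock was taken, else None."""
    token = uuid.uuid4().hex
    taken: List[str] = []
    for path in sorted(set(files)):       # consistent order: agents collide on the same first file
        if client.set(f"lock:{path}", token, nx=True, ex=timeout):
            taken.append(path)
        else:
            release_locks(client, taken, token)   # back off instead of holding partial locks
            return None
    return token
```

The caller sleeps and retries when it gets `None`. The `ex` expiry covers agents that crash before releasing their locks.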

**Conflict Detection:**
```python
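from collections import defaultdict
from typing import Dict, List, TypedDict


class Conflict(TypedDict):
    file: str
    agents: List[str]


def detect_conflicts(agent_files: Dict[str, List[str]]) -> List[Conflict]:
    """Detect files that more than one agent plans to modify."""
    # Invert the mapping: file path -> agents that will touch it
    file_agents: Dict[str, List[str]] = defaultdict(list)
    for agent_id, files in agent_files.items():
        # dict.fromkeys drops duplicate paths so an agent never conflicts with itself
        for file_path in dict.fromkeys(files):
            file_agents[file_path].append(agent_id)

    # Any file claimed by two or more agents is a conflict
    return [
        Conflict(file=f, agents=agents)
        for f, agents in file_agents.items()
        if len(agents) > 1
    ]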
```

**Resolution Strategies:**
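1. **Dependency-based ordering** - Add dependencies between conflicting tasks so they run one after another
2. **File splitting** - Break tasks into smaller units that touch disjoint files
3. **Agent specialization** - Assign conflicting tasks to the same agent, which runs them one at a time
4. **Merge coordination** - Let the edits proceed and reconcile them with git merge strategies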
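Strategy 1 (dependency-based ordering) is sketched below. For every pair of tasks that touch the same file and are not already ordered, the later-listed task is made to depend on the earlier one. `serialize_shared_files` and `depends_on` are hypothetical helpers that take the `files` and `dependencies` fields from the task structure, keyed by task ID. An edge is added only between two tasks that are not yet ordered, so the result stays acyclic if the input was.

```python
from itertools import combinations
from typing import Dict, List


def depends_on(deps: Dict[str, List[str]], task: str, target: str) -> bool:
    """True if `task` already waits (directly or transitively) for `target`."""
    stack, seen = list(deps.get(task, [])), set()
    while stack:
        current = stack.pop()
        if current == target:
            return True
        if current not in seen:
            seen.add(current)
            stack.extend(deps.get(current, []))
    return False


def serialize_shared_files(task_files: Dict[str, List[str]],
                           deps: Dict[str, List[str]]) -> Dict[str, List[str]]:
    """Return new dependencies in which no two tasks touching the same file can overlap."""
    result = {task: list(d) for task, d in deps.items()}
    file_tasks: Dict[str, List[str]] = {}
    for task, files in task_files.items():
        for path in dict.fromkeys(files):
            file_tasks.setdefault(path, []).append(task)

    for tasks in file_tasks.values():
        for earlier, later in combinations(tasks, 2):
            # Already ordered in either direction: nothing to add
            if depends_on(result, later, earlier) or depends_on(result, earlier, later):
                continue
            # Neither reaches the other, so this new edge cannot create a cycle
            result.setdefault(later, []).append(earlier)
    return result
```

Every pair is checked, not just neighbors, so two tasks can never end up unordered when a third task sits between them.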

### 5. Real-Time Observability Dashboard

WebSocket-based dashboard for monitoring all agents in real time.

**Dashboard Features:**
- Live agent status (active, busy, idle, error)
- Task progress tracking
- File modification visualization
- Conflict alerts and resolution
- Activity stream with timestamps
- Performance metrics

**WebSocket Events:**
```javascript
// Agent update
{
  "type": "agent_update",
  "agent": {
    "id": "agent-frontend-1",
    "status": "active",
    "currentTask": "refactor-buttons",
    "progress": 65,
    "workingFiles": ["components/Button.tsx"],
    "completedCount": 12
  }
}

// Conflict detected
{
  "type": "conflict",
  "conflict": {
    "file": "components/Button.tsx",
    "agents": ["agent-frontend-1", "agent-frontend-2"],
    "timestamp": "2025-08-02T15:30:00Z"
  }
}

// Task completed
{
  "type": "task_complete",
  "taskId": "refactor-buttons",
  "agentId": "agent-frontend-1",
  "duration": 45.2,
  "filesModified": ["components/Button.tsx", "components/Button.test.tsx"]
}
```
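A dashboard client mostly folds these events into local state. The following is a minimal sketch of that reducer. It assumes events arrive already decoded from JSON, with fields matching the examples above. The `DashboardState` shape and `apply_event` are hypothetical, and the WebSocket transport itself is omitted.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class DashboardState:
    """Hypothetical client-side view of the system."""
    agents: Dict[str, dict] = field(default_factory=dict)
    conflicts: List[dict] = field(default_factory=list)
    completed: List[str] = field(default_factory=list)
    activity: List[str] = field(default_factory=list)


def apply_event(state: DashboardState, event: dict) -> None:
    """Update dashboard state in place from one decoded WebSocket event."""
    kind = event.get("type")
    if kind == "agent_update":
        agent = event["agent"]
        state.agents[agent["id"]] = agent        # latest snapshot wins
    elif kind == "conflict":
        state.conflicts.append(event["conflict"])
        state.activity.append(f"conflict on {event['conflict']['file']}")
    elif kind == "task_complete":
        state.completed.append(event["taskId"])
        state.activity.append(f"{event['agentId']} finished {event['taskId']}")
    else:
        state.activity.append(f"ignored unknown event type {kind!r}")
```

Ignoring unknown types, rather than raising, lets older dashboards keep working when the server adds new events.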

## Usage Examples

### Example 1: Frontend Refactor

```bash
# Start multi-agent Ralph for frontend refactor
RALPH_MULTI_AGENT=true \
RALPH_MAX_WORKERS=8 \
/ralph "Refactor all components from class to functional with hooks"
```

**Meta-Agent Breakdown:**
```json
[
  {
    "id": "analyze-1",
    "type": "analysis",
    "description": "Scan all components and create refactoring plan",
    "dependencies": [],
    "files": []
  },
  {
    "id": "refactor-buttons",
    "type": "frontend",
    "description": "Convert all Button components to functional",
    "dependencies": ["analyze-1"],
    "files": ["components/Button/*.tsx"]
  },
  {
    "id": "refactor-forms",
    "type": "frontend",
    "description": "Convert all Form components to functional",
    "dependencies": ["analyze-1"],
    "files": ["components/Form/*.tsx"]
  },
  {
    "id": "update-tests-buttons",
    "type": "testing",
    "description": "Update Button component tests",
    "dependencies": ["refactor-buttons"],
    "files": ["__tests__/Button/*.test.tsx"]
  }
]
```
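In this breakdown, `refactor-buttons` and `refactor-forms` depend only on `analyze-1`, so they can run side by side once the analysis finishes. Python's standard-library `graphlib.TopologicalSorter` (Python 3.9+) makes these parallel "waves" explicit. This is a sketch of how the breakdown could be scheduled, not necessarily how Ralph schedules it. `execution_waves` is a hypothetical helper.

```python
from graphlib import TopologicalSorter
from typing import List


def execution_waves(tasks: List[dict]) -> List[List[str]]:
    """Group task ids into waves; every task in a wave can run in parallel."""
    sorter = TopologicalSorter({t["id"]: t["dependencies"] for t in tasks})
    sorter.prepare()               # raises graphlib.CycleError on circular dependencies
    waves: List[List[str]] = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())
        waves.append(ready)
        sorter.done(*ready)        # simplification: the whole wave finishes together
    return waves


breakdown = [
    {"id": "analyze-1", "dependencies": []},
    {"id": "refactor-buttons", "dependencies": ["analyze-1"]},
    {"id": "refactor-forms", "dependencies": ["analyze-1"]},
    {"id": "update-tests-buttons", "dependencies": ["refactor-buttons"]},
]
print(execution_waves(breakdown))
# [['analyze-1'], ['refactor-buttons', 'refactor-forms'], ['update-tests-buttons']]
```

A live scheduler would instead call `done()` for each task as it finishes, so `update-tests-buttons` could start as soon as `refactor-buttons` completes, without waiting for `refactor-forms`.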

### Example 2: Full-Stack Feature

```bash
# Build feature with parallel frontend/backend
RALPH_MULTI_AGENT=true \
RALPH_MAX_WORKERS=6 \
/ralph "Build user authentication with OAuth, profile management, and email verification"
```

**Parallel Execution:**
- Agent 1 (Frontend): Build login form UI
- Agent 2 (Frontend): Build profile page UI
- Agent 3 (Backend): Implement OAuth endpoints
- Agent 4 (Backend): Implement profile API
- Agent 5 (Testing): Write integration tests
- Agent 6 (Docs): Write API documentation

### Example 3: Codebase Optimization

```bash
# Parallel optimization across codebase
RALPH_MULTI_AGENT=true \
RALPH_MAX_WORKERS=10 \
/ralph "Optimize performance: bundle size, lazy loading, image optimization, caching strategy"
```

## Environment Variables

```bash
# Multi-Agent Configuration
RALPH_MULTI_AGENT=true                    # Enable multi-agent mode
RALPH_MAX_WORKERS=12                      # Maximum worker agents
RALPH_MIN_WORKERS=2                       # Minimum worker agents

# Task Queue (Redis)
RALPH_TASK_QUEUE_HOST=localhost           # Redis host
RALPH_TASK_QUEUE_PORT=6379                # Redis port
RALPH_TASK_QUEUE_DB=0                     # Redis database
RALPH_TASK_QUEUE_PASSWORD=                # Redis password (optional)

# Observability
RALPH_OBSERVABILITY_ENABLED=true          # Enable dashboard
RALPH_OBSERVABILITY_PORT=3001             # WebSocket port
RALPH_OBSERVABILITY_HOST=localhost        # Dashboard host

# Agent Behavior
RALPH_AGENT_TIMEOUT=300                   # Task timeout (seconds)
RALPH_AGENT_HEARTBEAT=10                  # Heartbeat interval (seconds)
RALPH_FILE_LOCK_TIMEOUT=300               # File lock timeout (seconds)
RALPH_MAX_RETRIES=3                       # Task retry count

# Logging
RALPH_VERBOSE=true                        # Verbose logging
RALPH_LOG_LEVEL=info                      # Log level
RALPH_LOG_FILE=.ralph/multi-agent.log     # Log file path
```
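The following sketch reads these variables in Python, using the values listed above as fallbacks. It assumes multi-agent mode is off unless `RALPH_MULTI_AGENT` is set. The `RalphConfig` dataclass and `load_config` function are hypothetical; only the variable names and values come from this document. Accepting a mapping instead of reading `os.environ` directly keeps the function testable.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


def _flag(value: str) -> bool:
    return value.strip().lower() in {"1", "true", "yes", "on"}


@dataclass(frozen=True)
class RalphConfig:
    multi_agent: bool
    max_workers: int
    min_workers: int
    queue_host: str
    queue_port: int
    queue_db: int
    queue_password: Optional[str]
    observability_port: int
    agent_timeout: int
    file_lock_timeout: int
    max_retries: int


def load_config(env: Mapping[str, str] = os.environ) -> RalphConfig:
    """Build a config from RALPH_* variables (hypothetical loader)."""
    cfg = RalphConfig(
        multi_agent=_flag(env.get("RALPH_MULTI_AGENT", "false")),  # assumed off by default
        max_workers=int(env.get("RALPH_MAX_WORKERS", "12")),
        min_workers=int(env.get("RALPH_MIN_WORKERS", "2")),
        queue_host=env.get("RALPH_TASK_QUEUE_HOST", "localhost"),
        queue_port=int(env.get("RALPH_TASK_QUEUE_PORT", "6379")),
        queue_db=int(env.get("RALPH_TASK_QUEUE_DB", "0")),
        queue_password=env.get("RALPH_TASK_QUEUE_PASSWORD") or None,  # empty means no auth
        observability_port=int(env.get("RALPH_OBSERVABILITY_PORT", "3001")),
        agent_timeout=int(env.get("RALPH_AGENT_TIMEOUT", "300")),
        file_lock_timeout=int(env.get("RALPH_FILE_LOCK_TIMEOUT", "300")),
        max_retries=int(env.get("RALPH_MAX_RETRIES", "3")),
    )
    if not 1 <= cfg.min_workers <= cfg.max_workers:
        raise ValueError("need 1 <= RALPH_MIN_WORKERS <= RALPH_MAX_WORKERS")
    return cfg
```

Failing fast on an impossible worker range is cheaper than discovering it once agents are already running.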

## Monitoring & Debugging

### Check Multi-Agent Status

```bash
# View active agents (--scan iterates with SCAN instead of the blocking KEYS command)
redis-cli --scan --pattern "agent:*"

# View task queue
redis-cli lrange claude_tasks 0 10

# View file locks
redis-cli --scan --pattern "lock:*"

# View task status
redis-cli hgetall "task:task-id"

# View completed tasks
redis-cli lrange claude_tasks:complete 0 10
```

### Observability Dashboard

Access dashboard at: `http://localhost:3001`

**Dashboard Sections:**
1. **Mission Status** - Overall progress
2. **Agent Grid** - Individual agent status
3. **Conflict Alerts** - Active file conflicts
4. **Activity Stream** - Real-time event log
5. **Performance Metrics** - Agent efficiency

## Best Practices

### 1. Task Design
- Keep tasks independent when possible
- Minimize cross-task file dependencies
- Use specialization to guide agent assignment
- Set appropriate timeouts

### 2. Dependency Management
- Use topological sort for execution order
- Minimize dependency depth
- Allow parallel execution at every opportunity
- Detect circular dependencies before scheduling and report the tasks involved
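For the last point, a scheduler can reject a plan that contains a cycle and report the loop so the meta-agent can re-plan. The following is a minimal depth-first sketch. `find_cycle` is a hypothetical helper that operates on the `id -> dependencies` mapping from the task structure.

```python
from typing import Dict, List, Optional


def find_cycle(deps: Dict[str, List[str]]) -> Optional[List[str]]:
    """Return one dependency cycle (first id repeated at the end), or None."""
    WHITE, GREY, BLACK = 0, 1, 2          # unvisited, on the current path, finished
    color = {task: WHITE for task in deps}
    path: List[str] = []

    def visit(task: str) -> Optional[List[str]]:
        color[task] = GREY
        path.append(task)
        for dep in deps.get(task, []):
            state = color.get(dep, WHITE)
            if state == GREY:             # back edge: dep is already on the current path
                return path[path.index(dep):] + [dep]
            if state == WHITE:
                found = visit(dep)
                if found:
                    return found
        path.pop()
        color[task] = BLACK
        return None

    for task in deps:
        if color[task] == WHITE:
            found = visit(task)
            if found:
                return found
    return None
```

Returning the cycle itself, rather than a bare "cycle detected" flag, tells the meta-agent exactly which tasks to split or reorder.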

### 3. Conflict Prevention
- Group related file modifications in single task
- Use file-specific agents when conflicts likely
- Implement merge strategies for common conflicts
- Monitor lock acquisition time

### 4. Observability
- Log all agent activities
- Track file modifications in real-time
- Alert on conflicts immediately
- Maintain activity history for debugging

### 5. Error Handling
- Implement retry logic with exponential backoff
- Quarantine failing tasks for analysis
- Provide detailed error context
- Allow manual intervention when needed
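The following sketches retry with exponential backoff, driven by the `retry_count` and `max_retries` task fields. The 2-second base, the 60-second cap and the "full jitter" (a random delay between zero and the exponential bound) are assumptions chosen for illustration. `next_retry_delay` is a hypothetical helper. A task that exhausts its retries would move to `claude_tasks:failed` for analysis.

```python
import random
from typing import Optional


def next_retry_delay(retry_count: int, max_retries: int = 3,
                     base: float = 2.0, cap: float = 60.0,
                     rng: Optional[random.Random] = None) -> Optional[float]:
    """Seconds to wait before the next attempt, or None once retries are exhausted."""
    if retry_count >= max_retries:
        return None                                   # give up: quarantine the task
    bound = min(cap, base * (2 ** retry_count))      # 2, 4, 8, ... seconds, capped
    rng = rng or random.Random()
    return rng.uniform(0, bound)                      # jitter keeps agents from retrying in lockstep
```

Jitter matters here because several agents that lose the same file lock would otherwise retry at the same moment and collide again.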

## Troubleshooting

### Common Issues

**Agents stuck waiting:**
```bash
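# List lock keys
redis-cli --scan --pattern "lock:*"

# Inspect a lock before removing it: GET shows the stored value, TTL the seconds left
# (-1 means the key has no expiry and will never be released automatically)
redis-cli get "lock:path/to/file"
redis-cli ttl "lock:path/to/file"

# Clear a stale lock only once you are sure its holder is gone
redis-cli del "lock:path/to/file"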
```

**Tasks not executing:**
```bash
# Check task queue
redis-cli lrange claude_tasks 0 -1

# Check pending tasks
redis-cli lrange claude_tasks:pending 0 -1
```

**Dashboard not updating:**
```bash
# Check WebSocket server
netstat -an | grep 3001

# Restart observability server
pkill -f ralph-observability
RALPH_OBSERVABILITY_ENABLED=true ralph-observability
```

## Performance Tuning

### Optimize Worker Count
```bash
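# Rule-of-thumb starting points (nproc is Linux; use `sysctl -n hw.ncpu` on macOS)
CPU_CORES=$(nproc)

# General default: 1.5x cores minus one (integer arithmetic)
WORKERS=$(( CPU_CORES * 3 / 2 - 1 ))

# For I/O-bound tasks
WORKERS=$(( CPU_CORES * 2 ))

# For CPU-bound tasks
WORKERS=$CPU_CORES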
```

### Redis Configuration
```bash
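# redis.conf
maxmemory 2gb
# noeviction: return errors when memory is full instead of silently evicting
# queued tasks or lock keys (allkeys-lru could drop either)
maxmemory-policy noeviction
timeout 300
tcp-keepalive 60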
```

### Agent Pool Sizing
```bash
# Dynamic scaling based on queue depth
QUEUE_DEPTH=$(redis-cli llen claude_tasks)
if [ "$QUEUE_DEPTH" -gt 50 ]; then
    SCALE_UP=true
elif [ "$QUEUE_DEPTH" -lt 10 ]; then
    SCALE_DOWN=true
fi
```
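The same thresholds can be expressed as a small Python function that clamps the result to `RALPH_MIN_WORKERS` and `RALPH_MAX_WORKERS`. The thresholds of 50 and 10 come from the script above. Scaling one worker at a time is an assumption, and `target_workers` is a hypothetical helper.

```python
def target_workers(queue_depth: int, current: int,
                   min_workers: int = 2, max_workers: int = 12,
                   high_water: int = 50, low_water: int = 10) -> int:
    """Return the desired worker count for the current queue depth."""
    if queue_depth > high_water:
        desired = current + 1          # backlog growing: add a worker
    elif queue_depth < low_water:
        desired = current - 1          # queue nearly empty: retire a worker
    else:
        desired = current              # within the band: leave the pool alone
    return max(min_workers, min(max_workers, desired))
```

The gap between the two thresholds acts as hysteresis, so the pool does not oscillate when the queue depth hovers around a single cutoff.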

## Security Considerations

1. **File Access Control** - Restrict agent file system access
2. **Redis Authentication** - Use Redis password in production
3. **Network Isolation** - Run agents in isolated network
4. **Resource Limits** - Set CPU/memory limits per agent
5. **Audit Logging** - Log all agent actions for compliance

## Integration with Claude Code

The Ralph Multi-Agent System can be run from inside a Claude Code project:

```bash
# Use with Claude Code projects
export RALPH_AGENT=claude
export RALPH_MULTI_AGENT=true
cd /path/to/claude-code-project
/ralph "Refactor authentication system"
```

**Claude Code Integration Points:**
- Uses Claude Code agent pool
- Respects Claude Code project structure
- Integrates with Claude Code hooks
- Supports Claude Code tool ecosystem