Pony-Alpha-2-Dataset-Training/skills/skill-server-automation.md
Pony Alpha 2 68453089ee feat: initial Alpha Brain 2 dataset release
Massive training corpus for AI coding models containing:
- 10 JSONL training datasets (641+ examples across coding, reasoning, planning, architecture, communication, debugging, security, workflows, error handling, UI/UX)
- 11 agent behavior specifications (explorer, planner, reviewer, debugger, executor, UI designer, Linux admin, kernel engineer, security architect, automation engineer, API architect)
- 7 skill definition files (coding, API engineering, kernel, Linux server, security architecture, server automation, UI/UX)
- Master README with project origin story and philosophy

Built by Pony Alpha 2 to help AI models learn expert-level coding approaches.
2026-03-13 16:26:29 +04:00


Server Automation Expert Skill

Activation Criteria

Activate this skill when the user:

  • Automates server provisioning and configuration
  • Implements Infrastructure as Code (IaC)
  • Designs CI/CD pipelines
  • Automates deployment processes
  • Manages container orchestration
  • Implements configuration management
  • Automates testing and quality assurance
  • Designs monitoring and alerting automation
  • Automates backup and disaster recovery
  • Implements GitOps practices
  • Automates security scanning and compliance
  • Manages multi-environment deployments
  • Implements blue-green or canary deployments
  • Automates infrastructure scaling
  • Needs reproducible infrastructure builds

Core Methodology

1. Ansible Playbook Design

Complete Ansible Setup

# ansible.cfg - Ansible Configuration
[defaults]
inventory = ./inventory
host_key_checking = False
retry_files_enabled = False
gathering = smart
fact_caching = jsonfile
fact_caching_connection = /tmp/ansible_facts
fact_caching_timeout = 86400
stdout_callback = yaml
bin_ansible_callbacks = True
callbacks_enabled = profile_tasks, timer
jinja2_extensions = jinja2.ext.do
display_skipped_hosts = False

[ssh_connection]
pipelining = True
control_path = /tmp/ansible-ssh-%%h-%%p-%%r
# inventory.yml - Static YAML Inventory
---
all:
  children:
    production:
      vars:
        env: production
      hosts:
        prod-web-01:
          ansible_host: 10.0.1.10
          ansible_user: deploy
        prod-web-02:
          ansible_host: 10.0.1.11
          ansible_user: deploy
        prod-db-01:
          ansible_host: 10.0.20.10
          ansible_user: deploy
    staging:
      vars:
        env: staging
      hosts:
        staging-web-01:
          ansible_host: 10.1.1.10
          ansible_user: deploy
    # Functional groups list their members explicitly so group vars apply
    # only to hosts that actually run the service (making these groups
    # children of 'production' would pull the DB host into 'webservers').
    webservers:
      hosts:
        prod-web-01:
        prod-web-02:
        staging-web-01:
      vars:
        nginx_worker_processes: auto
        nginx_worker_connections: 1024
    databases:
      hosts:
        prod-db-01:
      vars:
        postgresql_version: 15
        postgresql_max_connections: 200
  vars:
    ansible_python_interpreter: /usr/bin/python3
# site.yml - Master Playbook
---
- name: Configure web servers
  hosts: webservers
  become: true
  roles:
    - role: base
      tags: ['base']
    - role: nginx
      tags: ['nginx', 'web']
    - role: application
      tags: ['application']
    - role: monitoring
      tags: ['monitoring']

- name: Configure database servers
  hosts: databases
  become: true
  roles:
    - role: base
      tags: ['base']
    - role: postgresql
      tags: ['database', 'postgresql']
    - role: monitoring
      tags: ['monitoring']

Production-Ready Roles

# roles/base/tasks/main.yml
---
- name: Update apt cache
  apt:
    update_cache: true
    cache_valid_time: 3600
  tags: ['apt']

- name: Upgrade all packages
  apt:
    upgrade: dist
    autoremove: true
  tags: ['apt']

- name: Install base packages
  apt:
    name:
      - curl
      - wget
      - git
      - vim
      - htop
      - tmux
      - net-tools
      - tcpdump
      - strace
      - sysstat
      - fail2ban
      - ufw
    state: present
  tags: ['packages']

- name: Configure timezone
  timezone:
    name: UTC
  tags: ['system']

- name: Set hostname
  hostname:
    name: "{{ inventory_hostname }}"
  tags: ['system']

- name: Configure sysctl
  sysctl:
    name: "{{ item.name }}"
    value: "{{ item.value }}"
    state: present
    reload: true
  loop:
    - { name: "net.ipv4.ip_forward", value: "0" }
    - { name: "net.ipv4.conf.all.send_redirects", value: "0" }
    - { name: "net.ipv4.conf.default.send_redirects", value: "0" }
    - { name: "net.ipv4.conf.all.accept_source_route", value: "0" }
    - { name: "net.ipv4.conf.default.accept_source_route", value: "0" }
    - { name: "net.ipv4.conf.all.accept_redirects", value: "0" }
    - { name: "net.ipv4.conf.default.accept_redirects", value: "0" }
    - { name: "net.ipv4.icmp_echo_ignore_broadcasts", value: "1" }
    - { name: "net.ipv4.tcp_syncookies", value: "1" }
    - { name: "net.ipv4.tcp_max_syn_backlog", value: "2048" }
    - { name: "net.core.somaxconn", value: "1024" }
  tags: ['system', 'security']

- name: Configure limits
  pam_limits:
    domain: "*"
    limit_type: "{{ item.type }}"
    limit_item: "{{ item.item }}"
    value: "{{ item.value }}"
  loop:
    - { type: "soft", item: "nofile", value: "65536" }
    - { type: "hard", item: "nofile", value: "65536" }
    - { type: "soft", item: "nproc", value: "65536" }
    - { type: "hard", item: "nproc", value: "65536" }
  tags: ['system']

- name: Configure fail2ban
  copy:
    src: jail.local
    dest: /etc/fail2ban/jail.local
    owner: root
    group: root
    mode: '0644'
  notify: restart fail2ban
  tags: ['security']

- name: Ensure fail2ban is running
  service:
    name: fail2ban
    state: started
    enabled: true
  tags: ['security']

- name: Configure UFW
  ufw:
    state: enabled
    policy: deny
    direction: incoming
  tags: ['firewall']

- name: Allow SSH through UFW
  ufw:
    rule: allow
    port: "22"
    proto: tcp
  tags: ['firewall']

- name: Allow HTTP/HTTPS through UFW
  ufw:
    rule: allow
    port: "{{ item }}"
    proto: tcp
  loop:
    - "80"
    - "443"
  tags: ['firewall']

- name: Create deploy user
  user:
    name: deploy
    shell: /bin/bash
    groups: sudo
    append: true
    state: present
  tags: ['users']

- name: Add SSH key for deploy user
  authorized_key:
    user: deploy
    key: "{{ deploy_ssh_public_key }}"
    state: present
  tags: ['users']
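The `jail.local` file copied by the fail2ban task is referenced but not reproduced in this skill; a minimal sketch protecting SSH (the ban and find times are illustrative defaults, not recommendations):

```ini
# roles/base/files/jail.local (sketch)
[DEFAULT]
bantime  = 1h
findtime = 10m
maxretry = 5

[sshd]
enabled = true
port    = ssh
```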
# roles/nginx/tasks/main.yml
---
- name: Add NGINX repository
  apt_repository:
    repo: "ppa:ondrej/nginx"
    state: present
    update_cache: true
  tags: ['nginx', 'repository']

- name: Install NGINX
  apt:
    name: nginx
    state: present
  tags: ['nginx', 'packages']

- name: Create nginx directories
  file:
    path: "{{ item }}"
    state: directory
    owner: www-data
    group: www-data
    mode: '0755'
  loop:
    - /var/www/html
    - /etc/nginx/sites-available
    - /etc/nginx/sites-enabled
    - /var/log/nginx
  tags: ['nginx', 'config']

- name: Configure nginx main config
  template:
    src: nginx.conf.j2
    dest: /etc/nginx/nginx.conf
    owner: root
    group: root
    mode: '0644'
    validate: 'nginx -t -c %s'
  notify: reload nginx
  tags: ['nginx', 'config']

- name: Remove default nginx site
  file:
    path: /etc/nginx/sites-enabled/default
    state: absent
  notify: reload nginx
  tags: ['nginx', 'config']

- name: Configure nginx site
  template:
    src: site.conf.j2
    dest: "/etc/nginx/sites-available/{{ application_name }}.conf"
    owner: root
    group: root
    mode: '0644'
    # A vhost snippet cannot be validated in isolation ('nginx -t' only
    # checks a full configuration, and Ansible's validate requires %s);
    # the reload handler's nginx -t checks the assembled config instead.
  notify: reload nginx
  tags: ['nginx', 'config']

- name: Enable nginx site
  file:
    src: "/etc/nginx/sites-available/{{ application_name }}.conf"
    dest: "/etc/nginx/sites-enabled/{{ application_name }}.conf"
    state: link
  notify: reload nginx
  tags: ['nginx', 'config']

- name: Ensure nginx is running
  service:
    name: nginx
    state: started
    enabled: true
  tags: ['nginx', 'service']

- name: Configure logrotate for nginx
  copy:
    src: nginx-logrotate
    dest: /etc/logrotate.d/nginx
    owner: root
    group: root
    mode: '0644'
  tags: ['nginx', 'logging']
# roles/nginx/templates/nginx.conf.j2
user www-data;
worker_processes {{ nginx_worker_processes }};
worker_rlimit_nofile 65535;

error_log /var/log/nginx/error.log warn;
pid /var/run/nginx.pid;

events {
    worker_connections {{ nginx_worker_connections }};
    use epoll;
    multi_accept on;
}

http {
    include /etc/nginx/mime.types;
    default_type application/octet-stream;

    log_format main '$remote_addr - $remote_user [$time_local] "$request" '
                    '$status $body_bytes_sent "$http_referer" '
                    '"$http_user_agent" "$http_x_forwarded_for"';

    access_log /var/log/nginx/access.log main;

    sendfile on;
    tcp_nopush on;
    tcp_nodelay on;
    keepalive_timeout 65;
    types_hash_max_size 2048;
    server_tokens off;

    client_body_buffer_size 128k;
    client_max_body_size 100m;
    client_header_buffer_size 1k;
    large_client_header_buffers 4 4k;

    gzip on;
    gzip_vary on;
    gzip_proxied any;
    gzip_comp_level 6;
    gzip_types text/plain text/css text/xml text/javascript
               application/json application/javascript application/xml+rss
               application/rss+xml font/truetype font/opentype
               application/vnd.ms-fontobject image/svg+xml;

    include /etc/nginx/conf.d/*.conf;
    include /etc/nginx/sites-enabled/*;
}
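The `site.conf.j2` template referenced by the role above is not shown; a minimal reverse-proxy sketch, where `server_name` and the upstream port are illustrative assumptions:

```nginx
# roles/nginx/templates/site.conf.j2 (sketch)
upstream {{ application_name }}_backend {
    server 127.0.0.1:8080;
    keepalive 32;
}

server {
    listen 80;
    server_name {{ server_name | default('_') }};

    access_log /var/log/nginx/{{ application_name }}.access.log main;

    location / {
        proxy_pass http://{{ application_name }}_backend;
        proxy_http_version 1.1;
        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
        proxy_set_header X-Forwarded-Proto $scheme;
    }

    location /health {
        access_log off;
        return 200 'ok';
    }
}
```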
# roles/docker/tasks/main.yml
---
- name: Install prerequisites
  apt:
    name:
      - apt-transport-https
      - ca-certificates
      - curl
      - gnupg
      - lsb-release
    state: present
  tags: ['docker', 'prerequisites']

- name: Add Docker GPG key
  apt_key:
    url: https://download.docker.com/linux/{{ ansible_distribution | lower }}/gpg
    state: present
  tags: ['docker', 'repository']

- name: Add Docker repository
  apt_repository:
    repo: "deb https://download.docker.com/linux/{{ ansible_distribution | lower }} {{ ansible_distribution_release }} stable"
    state: present
    update_cache: true
  tags: ['docker', 'repository']

- name: Install Docker
  apt:
    name:
      - docker-ce
      - docker-ce-cli
      - containerd.io
      - docker-compose-plugin
    state: present
  tags: ['docker', 'packages']

- name: Create Docker directory
  file:
    path: /etc/docker
    state: directory
    owner: root
    group: root
    mode: '0755'
  tags: ['docker', 'config']

- name: Configure Docker daemon
  copy:
    src: daemon.json
    dest: /etc/docker/daemon.json
    owner: root
    group: root
    mode: '0644'
  notify: restart docker
  tags: ['docker', 'config']

- name: Ensure deploy user can use Docker
  user:
    name: deploy
    groups: docker
    append: true
  tags: ['docker', 'users']

- name: Ensure Docker is running
  service:
    name: docker
    state: started
    enabled: true
  tags: ['docker', 'service']

- name: Install Python Docker SDK
  pip:
    name: docker
    state: present
  tags: ['docker', 'python']
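The roles above notify `restart fail2ban`, `reload nginx`, and `restart docker`, but the handlers themselves are never shown. A minimal sketch, collected into one listing for brevity (in practice each handler lives in its own role's `handlers/main.yml`):

```yaml
# e.g. roles/nginx/handlers/main.yml -- one handler per owning role
---
- name: restart fail2ban
  service:
    name: fail2ban
    state: restarted

- name: reload nginx
  service:
    name: nginx
    state: reloaded

- name: restart docker
  service:
    name: docker
    state: restarted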

2. Terraform Infrastructure as Code

Production Terraform Configuration

# terraform/terraform.tf
terraform {
  required_version = ">= 1.0"

  required_providers {
    aws = {
      source  = "hashicorp/aws"
      version = "~> 5.0"
    }
  }

  backend "s3" {
    bucket         = "terraform-state-prod"
    key            = "production/terraform.tfstate"
    region         = "us-east-1"
    encrypt        = true
    dynamodb_table = "terraform-locks"
  }
}

# terraform/provider.tf
provider "aws" {
  region = var.aws_region

  default_tags {
    tags = {
      Environment = var.environment
      Project     = var.project_name
      ManagedBy   = "Terraform"
    }
  }
}

# terraform/variables.tf
variable "aws_region" {
  description = "AWS region"
  type        = string
  default     = "us-east-1"
}

variable "environment" {
  description = "Environment name"
  type        = string
  default     = "production"
}

variable "project_name" {
  description = "Project name"
  type        = string
  default     = "myapp"
}

variable "vpc_cidr" {
  description = "VPC CIDR block"
  type        = string
  default     = "10.0.0.0/16"
}

variable "availability_zones" {
  description = "List of availability zones"
  type        = list(string)
  default     = ["us-east-1a", "us-east-1b"]
}

variable "instance_types" {
  description = "Instance types by tier"
  type = map(string)
  default = {
    web  = "t3.medium"
    app  = "t3.large"
    db   = "r6g.large"
  }
}

# terraform/vpc.tf
resource "aws_vpc" "main" {
  cidr_block           = var.vpc_cidr
  enable_dns_support   = true
  enable_dns_hostnames = true

  tags = {
    Name = "${var.project_name}-${var.environment}-vpc"
    Tier = "Network"
  }
}

resource "aws_internet_gateway" "main" {
  vpc_id = aws_vpc.main.id

  tags = {
    Name = "${var.project_name}-${var.environment}-igw"
  }
}

resource "aws_subnet" "public" {
  count                   = length(var.availability_zones)
  vpc_id                  = aws_vpc.main.id
  cidr_block              = cidrsubnet(var.vpc_cidr, 8, count.index)
  availability_zone       = var.availability_zones[count.index]
  map_public_ip_on_launch = true

  tags = {
    Name = "${var.project_name}-${var.environment}-public-${var.availability_zones[count.index]}"
    Tier = "Public"
  }
}

resource "aws_subnet" "private" {
  count                   = length(var.availability_zones)
  vpc_id                  = aws_vpc.main.id
  cidr_block              = cidrsubnet(var.vpc_cidr, 8, count.index + 10)
  availability_zone       = var.availability_zones[count.index]
  map_public_ip_on_launch = false

  tags = {
    Name = "${var.project_name}-${var.environment}-private-${var.availability_zones[count.index]}"
    Tier = "Private"
  }
}

resource "aws_subnet" "database" {
  count                   = length(var.availability_zones)
  vpc_id                  = aws_vpc.main.id
  cidr_block              = cidrsubnet(var.vpc_cidr, 8, count.index + 20)
  availability_zone       = var.availability_zones[count.index]
  map_public_ip_on_launch = false

  tags = {
    Name = "${var.project_name}-${var.environment}-database-${var.availability_zones[count.index]}"
    Tier = "Database"
  }
}

resource "aws_eip" "nat" {
  count  = length(var.availability_zones)
  domain = "vpc"

  tags = {
    Name = "${var.project_name}-${var.environment}-nat-${count.index}"
  }

  depends_on = [aws_internet_gateway.main]
}

resource "aws_nat_gateway" "main" {
  count         = length(var.availability_zones)
  allocation_id = aws_eip.nat[count.index].id
  subnet_id     = aws_subnet.public[count.index].id

  tags = {
    Name = "${var.project_name}-${var.environment}-nat-${count.index}"
  }

  depends_on = [aws_internet_gateway.main]
}

resource "aws_route_table" "public" {
  vpc_id = aws_vpc.main.id

  route {
    cidr_block = "0.0.0.0/0"
    gateway_id = aws_internet_gateway.main.id
  }

  tags = {
    Name = "${var.project_name}-${var.environment}-public-rt"
  }
}

resource "aws_route_table" "private" {
  count  = length(var.availability_zones)
  vpc_id = aws_vpc.main.id

  route {
    cidr_block     = "0.0.0.0/0"
    nat_gateway_id = aws_nat_gateway.main[count.index].id
  }

  tags = {
    Name = "${var.project_name}-${var.environment}-private-rt-${count.index}"
  }
}

resource "aws_route_table_association" "public" {
  count          = length(var.availability_zones)
  subnet_id      = aws_subnet.public[count.index].id
  route_table_id = aws_route_table.public.id
}

resource "aws_route_table_association" "private" {
  count          = length(var.availability_zones)
  subnet_id      = aws_subnet.private[count.index].id
  route_table_id = aws_route_table.private[count.index].id
}

resource "aws_route_table_association" "database" {
  count          = length(var.availability_zones)
  subnet_id      = aws_subnet.database[count.index].id
  route_table_id = aws_route_table.private[count.index].id
}

# terraform/security_groups.tf
resource "aws_security_group" "web" {
  name        = "${var.project_name}-${var.environment}-web"
  description = "Security group for web tier"
  vpc_id      = aws_vpc.main.id

  ingress {
    description = "HTTPS from anywhere"
    from_port   = 443
    to_port     = 443
    protocol    = "tcp"
    cidr_blocks = ["0.0.0.0/0"]
  }

  ingress {
    description = "HTTP from anywhere"
    from_port   = 80
    to_port     = 80
    protocol    = "tcp"
    cidr_blocks = ["0.0.0.0/0"]
  }

  egress {
    description = "All outbound traffic"
    from_port   = 0
    to_port     = 0
    protocol    = "-1"
    cidr_blocks = ["0.0.0.0/0"]
  }

  tags = {
    Name = "${var.project_name}-${var.environment}-web"
    Tier = "Web"
  }
}

resource "aws_security_group" "app" {
  name        = "${var.project_name}-${var.environment}-app"
  description = "Security group for application tier"
  vpc_id      = aws_vpc.main.id

  ingress {
    description     = "Application port from web tier"
    from_port       = 8080
    to_port         = 8080
    protocol        = "tcp"
    security_groups = [aws_security_group.web.id]
  }

  egress {
    description = "All outbound traffic"
    from_port   = 0
    to_port     = 0
    protocol    = "-1"
    cidr_blocks = ["0.0.0.0/0"]
  }

  tags = {
    Name = "${var.project_name}-${var.environment}-app"
    Tier = "Application"
  }
}

resource "aws_security_group" "database" {
  name        = "${var.project_name}-${var.environment}-database"
  description = "Security group for database tier"
  vpc_id      = aws_vpc.main.id

  ingress {
    description     = "PostgreSQL from application tier"
    from_port       = 5432
    to_port         = 5432
    protocol        = "tcp"
    security_groups = [aws_security_group.app.id]
  }

  tags = {
    Name = "${var.project_name}-${var.environment}-database"
    Tier = "Database"
  }
}

# terraform/ec2.tf
resource "aws_launch_template" "web" {
  name_prefix   = "${var.project_name}-${var.environment}-web-"
  image_id      = data.aws_ami.amazon_linux_2.id
  instance_type = var.instance_types.web
  key_name      = aws_key_pair.main.key_name

  network_interfaces {
    associate_public_ip_address = true
    security_groups             = [aws_security_group.web.id]
  }

  user_data = base64encode(templatefile("${path.module}/templates/web_user_data.sh", {
    environment = var.environment
  }))

  tag_specifications {
    resource_type = "instance"
    tags = {
      Name = "${var.project_name}-${var.environment}-web"
      Tier = "Web"
    }
  }

  lifecycle {
    create_before_destroy = true
  }
}

resource "aws_autoscaling_group" "web" {
  desired_capacity    = 2
  max_size            = 4
  min_size            = 2
  vpc_zone_identifier = aws_subnet.public[*].id

  target_group_arns = [aws_lb_target_group.web.arn]

  launch_template {
    id      = aws_launch_template.web.id
    version = "$Latest"
  }

  tag {
    key                 = "Name"
    value               = "${var.project_name}-${var.environment}-web"
    propagate_at_launch = true
  }
}

# terraform/load_balancer.tf
resource "aws_lb" "main" {
  name               = "${var.project_name}-${var.environment}-lb"
  internal           = false
  load_balancer_type = "application"
  security_groups    = [aws_security_group.web.id]
  subnets            = aws_subnet.public[*].id

  enable_deletion_protection = false

  tags = {
    Name = "${var.project_name}-${var.environment}-lb"
  }
}

resource "aws_lb_target_group" "web" {
  name     = "${var.project_name}-${var.environment}-web-tg"
  port     = 80
  protocol = "HTTP"
  vpc_id   = aws_vpc.main.id

  health_check {
    enabled             = true
    healthy_threshold   = 2
    interval            = 30
    matcher             = "200"
    path                = "/health"
    port                = "traffic-port"
    protocol            = "HTTP"
    timeout             = 5
    unhealthy_threshold = 3
  }
}

resource "aws_lb_listener" "https" {
  load_balancer_arn = aws_lb.main.arn
  port              = "443"
  protocol          = "HTTPS"
  ssl_policy        = "ELBSecurityPolicy-TLS-1-2-2017-01"
  certificate_arn   = aws_acm_certificate.main.arn

  default_action {
    type             = "forward"
    target_group_arn = aws_lb_target_group.web.arn
  }
}

resource "aws_lb_listener" "http" {
  load_balancer_arn = aws_lb.main.arn
  port              = "80"
  protocol          = "HTTP"

  default_action {
    type = "redirect"

    redirect {
      port        = "443"
      protocol    = "HTTPS"
      status_code = "HTTP_301"
    }
  }
}
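The launch template and HTTPS listener reference `data.aws_ami.amazon_linux_2`, `aws_key_pair.main`, and `aws_acm_certificate.main`, none of which are defined above. Sketches of the first two follow, plus outputs for downstream tooling; the SSH key path is illustrative, and the ACM certificate is omitted because DNS validation is environment-specific:

```hcl
# terraform/data.tf - AMI lookup referenced by the launch template
data "aws_ami" "amazon_linux_2" {
  most_recent = true
  owners      = ["amazon"]

  filter {
    name   = "name"
    values = ["amzn2-ami-hvm-*-x86_64-gp2"]
  }
}

resource "aws_key_pair" "main" {
  key_name   = "${var.project_name}-${var.environment}"
  public_key = file("~/.ssh/id_ed25519.pub") # illustrative path
}

# terraform/outputs.tf
output "alb_dns_name" {
  description = "DNS name of the application load balancer"
  value       = aws_lb.main.dns_name
}

output "vpc_id" {
  description = "ID of the VPC"
  value       = aws_vpc.main.id
}
```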

3. CI/CD Pipeline Design

GitHub Actions Production Pipeline

# .github/workflows/ci-cd.yml
name: CI/CD Pipeline

on:
  push:
    branches: [main, develop]
  pull_request:
    branches: [main, develop]

env:
  AWS_REGION: us-east-1
  ECR_REPOSITORY: myapp
  ECS_CLUSTER: production
  ECS_SERVICE: myapp-service

jobs:
  # CI Job
  test:
    name: Test
    runs-on: ubuntu-latest
    steps:
      - name: Checkout code
        uses: actions/checkout@v4

      - name: Set up Node.js
        uses: actions/setup-node@v4
        with:
          node-version: '20'
          cache: 'npm'

      - name: Install dependencies
        run: npm ci

      - name: Run linter
        run: npm run lint

      - name: Run tests
        run: npm run test:coverage

      - name: Upload coverage
        uses: codecov/codecov-action@v3
        with:
          files: ./coverage/lcov.info
          flags: unittests
          name: codecov-umbrella

      - name: Run security scan
        run: npm audit --audit-level=moderate

      - name: Run SAST
        uses: aquasecurity/trivy-action@master
        with:
          scan-type: 'fs'
          scan-ref: '.'
          format: 'sarif'
          output: 'trivy-results.sarif'

      - name: Upload Trivy results to GitHub Security tab
        uses: github/codeql-action/upload-sarif@v3
        if: always()
        with:
          sarif_file: 'trivy-results.sarif'

  # Build Job
  build:
    name: Build Docker Image
    needs: test
    runs-on: ubuntu-latest
    outputs:
      image-tag: ${{ steps.meta.outputs.tags }}
      image-digest: ${{ steps.build.outputs.digest }}
    steps:
      - name: Checkout code
        uses: actions/checkout@v4

      - name: Configure AWS credentials
        uses: aws-actions/configure-aws-credentials@v4
        with:
          aws-access-key-id: ${{ secrets.AWS_ACCESS_KEY_ID }}
          aws-secret-access-key: ${{ secrets.AWS_SECRET_ACCESS_KEY }}
          aws-region: ${{ env.AWS_REGION }}

      - name: Login to Amazon ECR
        id: login-ecr
        uses: aws-actions/amazon-ecr-login@v2

      - name: Set up Docker Buildx
        uses: docker/setup-buildx-action@v3

      - name: Extract metadata
        id: meta
        uses: docker/metadata-action@v5
        with:
          images: ${{ steps.login-ecr.outputs.registry }}/${{ env.ECR_REPOSITORY }}
          tags: |
            type=ref,event=branch
            type=ref,event=pr
            type=semver,pattern={{version}}
            type=semver,pattern={{major}}.{{minor}}
            type=sha,prefix={{branch}}-

      - name: Build and push Docker image
        id: build
        uses: docker/build-push-action@v5
        with:
          context: .
          push: true
          tags: ${{ steps.meta.outputs.tags }}
          labels: ${{ steps.meta.outputs.labels }}
          cache-from: type=gha
          cache-to: type=gha,mode=max
          build-args: |
            BUILD_DATE=${{ github.event.repository.updated_at }}
            VCS_REF=${{ github.sha }}
            VERSION=${{ steps.meta.outputs.version }}

      - name: Image vulnerability scan
        uses: aquasecurity/trivy-action@master
        with:
          image-ref: ${{ steps.login-ecr.outputs.registry }}/${{ env.ECR_REPOSITORY }}:${{ github.sha }}
          format: 'sarif'
          output: 'trivy-image-results.sarif'

      - name: Upload Trivy results to GitHub Security
        uses: github/codeql-action/upload-sarif@v3
        if: always()
        with:
          sarif_file: 'trivy-image-results.sarif'

  # Deploy to Staging
  deploy-staging:
    name: Deploy to Staging
    needs: build
    runs-on: ubuntu-latest
    if: github.ref == 'refs/heads/develop'
    environment:
      name: staging
      url: https://staging.example.com
    steps:
      - name: Checkout code
        uses: actions/checkout@v4

      - name: Configure AWS credentials
        uses: aws-actions/configure-aws-credentials@v4
        with:
          aws-access-key-id: ${{ secrets.AWS_ACCESS_KEY_ID }}
          aws-secret-access-key: ${{ secrets.AWS_SECRET_ACCESS_KEY }}
          aws-region: ${{ env.AWS_REGION }}

      - name: Deploy to ECS (Staging)
        run: |
          aws ecs update-service \
            --cluster staging \
            --service myapp-staging-service \
            --force-new-deployment \
            --region ${{ env.AWS_REGION }}

      - name: Wait for deployment
        run: |
          aws ecs wait services-stable \
            --cluster staging \
            --services myapp-staging-service \
            --region ${{ env.AWS_REGION }}

      - name: Run integration tests
        run: |
          npm run test:integration -- --env=staging

  # Deploy to Production
  deploy-production:
    name: Deploy to Production
    # 'deploy-staging' only runs on develop, so production (main) must
    # depend directly on build: a job whose needs-chain includes a
    # skipped job is itself skipped by default.
    needs: build
    runs-on: ubuntu-latest
    if: github.ref == 'refs/heads/main'
    environment:
      name: production
      url: https://example.com
    steps:
      - name: Checkout code
        uses: actions/checkout@v4

      - name: Configure AWS credentials
        uses: aws-actions/configure-aws-credentials@v4
        with:
          aws-access-key-id: ${{ secrets.AWS_ACCESS_KEY_ID }}
          aws-secret-access-key: ${{ secrets.AWS_SECRET_ACCESS_KEY }}
          aws-region: ${{ env.AWS_REGION }}

      - name: Create deployment record
        run: |
          gh release create ${{ github.sha }} \
            --title "Release ${{ github.sha }}" \
            --notes "Deploying to production"

      - name: Blue-green deployment
        run: |
          # Create a task set for the new task definition
          TASK_SET_ARN=$(aws ecs create-task-set \
            --cluster production \
            --service myapp-service \
            --task-definition myapp:${{ github.sha }} \
            --launch-type FARGATE \
            --network-configuration "awsvpcConfiguration={subnets=[${{ env.PRIVATE_SUBNETS }}],securityGroups=[${{ env.SECURITY_GROUP }}],assignPublicIp=DISABLED}" \
            --query 'taskSet.taskSetArn' \
            --output text)

          # Gradual rollout: scale the new task set up in steps,
          # then promote it to primary
          for percentage in 10 25 50 75 100; do
            aws ecs update-task-set \
              --cluster production \
              --service myapp-service \
              --task-set $TASK_SET_ARN \
              --scale "value=${percentage},unit=PERCENT"
            sleep 30
          done

          aws ecs update-service-primary-task-set \
            --cluster production \
            --service myapp-service \
            --primary-task-set $TASK_SET_ARN

      - name: Run smoke tests
        run: |
          npm run test:smoke -- --env=production

      - name: Rollback on failure
        if: failure()
        run: |
          # Rollback to previous task set
          aws ecs update-service \
            --cluster production \
            --service myapp-service \
            --task-definition myapp:previous \
            --force-new-deployment

      - name: Notify team
        if: success()
        uses: 8398a7/action-slack@v3
        with:
          status: ${{ job.status }}
          text: |
            Production deployment successful!
            Commit: ${{ github.sha }}
            Author: ${{ github.actor }}
          webhook_url: ${{ secrets.SLACK_WEBHOOK }}
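One gap worth closing in the workflow above: nothing stops two rapid pushes from deploying concurrently. A workflow-level `concurrency` key (placed alongside `on:`) serializes runs per branch:

```yaml
concurrency:
  group: deploy-${{ github.ref }}
  cancel-in-progress: false   # let an in-flight deploy finish; queue the next
```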

4. Kubernetes Deployment

Helm Chart for Production

# helm/myapp/Chart.yaml
apiVersion: v2
name: myapp
description: A Helm chart for my application
type: application
version: 1.0.0
appVersion: "1.0"

# helm/myapp/values.yaml
replicaCount: 3

image:
  repository: myapp
  pullPolicy: IfNotPresent
  tag: "1.0.0"

imagePullSecrets: []
nameOverride: ""
fullnameOverride: ""

serviceAccount:
  create: true
  annotations: {}
  name: ""

podAnnotations: {}

podSecurityContext:
  runAsNonRoot: true
  runAsUser: 1000
  fsGroup: 1000

securityContext:
  allowPrivilegeEscalation: false
  readOnlyRootFilesystem: true
  runAsUser: 1000
  capabilities:
    drop:
    - ALL

service:
  type: ClusterIP
  port: 80
  annotations: {}

ingress:
  enabled: true
  className: "nginx"
  annotations:
    cert-manager.io/cluster-issuer: "letsencrypt-prod"
    nginx.ingress.kubernetes.io/ssl-redirect: "true"
    nginx.ingress.kubernetes.io/force-ssl-redirect: "true"
  hosts:
    - host: example.com
      paths:
        - path: /
          pathType: Prefix
  tls:
    - secretName: myapp-tls
      hosts:
        - example.com

resources:
  limits:
    cpu: 1000m
    memory: 512Mi
  requests:
    cpu: 100m
    memory: 128Mi

autoscaling:
  enabled: true
  minReplicas: 3
  maxReplicas: 10
  targetCPUUtilizationPercentage: 80
  targetMemoryUtilizationPercentage: 80

nodeSelector: {}

tolerations: []

affinity: {}

# helm/myapp/templates/deployment.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: {{ include "myapp.fullname" . }}
  labels:
    {{- include "myapp.labels" . | nindent 4 }}
spec:
  {{- if not .Values.autoscaling.enabled }}
  replicas: {{ .Values.replicaCount }}
  {{- end }}
  selector:
    matchLabels:
      {{- include "myapp.selectorLabels" . | nindent 6 }}
  template:
    metadata:
      {{- with .Values.podAnnotations }}
      annotations:
        {{- toYaml . | nindent 8 }}
      {{- end }}
      labels:
        {{- include "myapp.selectorLabels" . | nindent 8 }}
    spec:
      {{- with .Values.imagePullSecrets }}
      imagePullSecrets:
        {{- toYaml . | nindent 8 }}
      {{- end }}
      serviceAccountName: {{ include "myapp.serviceAccountName" . }}
      securityContext:
        {{- toYaml .Values.podSecurityContext | nindent 8 }}
      containers:
      - name: {{ .Chart.Name }}
        securityContext:
          {{- toYaml .Values.securityContext | nindent 10 }}
        image: "{{ .Values.image.repository }}:{{ .Values.image.tag | default .Chart.AppVersion }}"
        imagePullPolicy: {{ .Values.image.pullPolicy }}
        ports:
        - name: http
          containerPort: 8080
          protocol: TCP
        livenessProbe:
          httpGet:
            path: /health
            port: http
          initialDelaySeconds: 30
          periodSeconds: 10
          timeoutSeconds: 5
          failureThreshold: 3
        readinessProbe:
          httpGet:
            path: /ready
            port: http
          initialDelaySeconds: 10
          periodSeconds: 5
          timeoutSeconds: 3
          failureThreshold: 3
        resources:
          {{- toYaml .Values.resources | nindent 10 }}
        volumeMounts:
        - name: temp
          mountPath: /tmp
        - name: cache
          mountPath: /app/cache
      volumes:
      - name: temp
        emptyDir: {}
      - name: cache
        emptyDir: {}
      {{- with .Values.nodeSelector }}
      nodeSelector:
        {{- toYaml . | nindent 8 }}
      {{- end }}
      {{- with .Values.affinity }}
      affinity:
        {{- toYaml . | nindent 8 }}
      {{- end }}
      {{- with .Values.tolerations }}
      tolerations:
        {{- toYaml . | nindent 8 }}
      {{- end }}
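values.yaml enables autoscaling, but no HorizontalPodAutoscaler template appears above; a minimal `templates/hpa.yaml` sketch consistent with those values, using the `autoscaling/v2` API:

```yaml
# helm/myapp/templates/hpa.yaml
{{- if .Values.autoscaling.enabled }}
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: {{ include "myapp.fullname" . }}
  labels:
    {{- include "myapp.labels" . | nindent 4 }}
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: {{ include "myapp.fullname" . }}
  minReplicas: {{ .Values.autoscaling.minReplicas }}
  maxReplicas: {{ .Values.autoscaling.maxReplicas }}
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: {{ .Values.autoscaling.targetCPUUtilizationPercentage }}
  - type: Resource
    resource:
      name: memory
      target:
        type: Utilization
        averageUtilization: {{ .Values.autoscaling.targetMemoryUtilizationPercentage }}
{{- end }}
```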

5. Decision Trees

Automation Tool Selection

What to automate?
│
├─ Configuration management → Ansible, Chef, Puppet
├─ Infrastructure provisioning → Terraform, CloudFormation
├─ Container orchestration → Kubernetes, Docker Swarm
├─ CI/CD → Jenkins, GitLab CI, GitHub Actions
├─ Monitoring → Prometheus, Grafana, Datadog
├─ Log management → ELK Stack, Splunk
└─ Security scanning → Trivy, SonarQube

Deployment Strategy Selection

Deployment requirements?
│
├─ Zero downtime → Blue-green, Canary
├─ Quick rollback → Blue-green
├─ Gradual rollout → Canary, Rolling
├─ Simple infrastructure → Rolling
├─ Complex microservices → Canary
└─ Enterprise compliance → Blue-green with approvals

6. Anti-Patterns to Avoid

  1. Hard-coded secrets: Use vaults, never hard-code
  2. No testing: Always test before deploying
  3. Manual deployments: Automate everything
  4. No rollback plan: Always have a rollback strategy
  5. Missing monitoring: You can't manage what you don't measure
  6. Large monoliths: Break into smaller, deployable units
  7. No version control: Everything must be in git
  8. Tight coupling: Design for independence
  9. No documentation: Document your automation
  10. Ignoring security: Security must be built-in
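A hypothetical guard against anti-pattern 1 (hard-coded secrets): a naive scan for credential-looking strings in config files before a deploy. The pattern list is illustrative, not exhaustive; a real pipeline would use a dedicated scanner such as gitleaks or trufflehog.

```shell
#!/usr/bin/env bash
set -euo pipefail

scan_for_secrets() {
  # Returns non-zero if a likely hard-coded secret is found under $1.
  local dir="$1"
  if grep -rEn \
      -e 'AKIA[0-9A-Z]{16}' \
      -e '(password|secret|api_key)[[:space:]]*[:=][[:space:]]*[^[:space:]]' \
      --include='*.yml' --include='*.tf' --include='*.sh' \
      "$dir"; then
    echo "possible hard-coded secret found; move it to a vault" >&2
    return 1
  fi
}
```

Wired in as a CI step or pre-commit hook, a match fails the run before anything reaches a server.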

7. Quality Checklist

Before considering automation production-ready:

  • All infrastructure codified
  • Secrets management implemented
  • Automated testing complete
  • CI/CD pipeline tested
  • Rollback procedure tested
  • Monitoring and alerting configured
  • Security scanning integrated
  • Documentation complete
  • Peer review completed
  • Disaster recovery tested
  • Configuration drift detection active
  • Automation idempotent
  • Error handling implemented
  • Performance testing completed
  • Compliance requirements met
  • Backup automation configured
  • Logging and auditing enabled
  • Team training completed
  • Runbooks documented
  • SLA requirements met
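One way to verify the "Automation idempotent" item: run the playbook twice and require that the second run's play recap reports zero changes. The recap parser below is pure bash/grep, so it can sit in any CI wrapper; the usage lines assume `ansible-playbook` is on PATH.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Returns 0 when an Ansible play-recap line reports no changed tasks, e.g.
#   prod-web-01 : ok=12 changed=0 unreachable=0 failed=0
recap_is_idempotent() {
  ! grep -Eq 'changed=[1-9][0-9]*' <<<"$1"
}

# Typical usage:
#   ansible-playbook site.yml                 # converge
#   recap=$(ansible-playbook site.yml)        # run again
#   recap_is_idempotent "$recap" || { echo "playbook is not idempotent" >&2; exit 1; }
```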

This comprehensive skill definition provides complete guidance for server automation across modern infrastructure.