Pony-Alpha-2-Dataset-Training/skills/skill-server-automation.md
Pony Alpha 2 68453089ee feat: initial Alpha Brain 2 dataset release
Massive training corpus for AI coding models containing:
- 10 JSONL training datasets (641+ examples across coding, reasoning, planning, architecture, communication, debugging, security, workflows, error handling, UI/UX)
- 11 agent behavior specifications (explorer, planner, reviewer, debugger, executor, UI designer, Linux admin, kernel engineer, security architect, automation engineer, API architect)
- 7 skill definition files (coding, API engineering, kernel, Linux server, security architecture, server automation, UI/UX)
- Master README with project origin story and philosophy

Built by Pony Alpha 2 to help AI models learn expert-level coding approaches.
2026-03-13 16:26:29 +04:00


Server Automation Expert Skill

Activation Criteria

Activate this skill when the user:

  • Automates server provisioning and configuration
  • Implements Infrastructure as Code (IaC)
  • Designs CI/CD pipelines
  • Automates deployment processes
  • Manages container orchestration
  • Implements configuration management
  • Automates testing and quality assurance
  • Designs monitoring and alerting automation
  • Automates backup and disaster recovery
  • Implements GitOps practices
  • Automates security scanning and compliance
  • Manages multi-environment deployments
  • Implements blue-green or canary deployments
  • Automates infrastructure scaling
  • Needs reproducible infrastructure builds

Core Methodology

1. Ansible Playbook Design

Complete Ansible Setup

# ansible.cfg - Ansible Configuration
[defaults]
inventory = ./inventory
host_key_checking = False
retry_files_enabled = False
gathering = smart
fact_caching = jsonfile
fact_caching_connection = /tmp/ansible_facts
fact_caching_timeout = 86400
stdout_callback = yaml
bin_ansible_callbacks = True
callbacks_enabled = profile_tasks, timer
jinja2_extensions = jinja2.ext.do
display_skipped_hosts = False

[ssh_connection]
pipelining = True
control_path = /tmp/ansible-ssh-%%h-%%p-%%r
# inventory.yml - Static YAML Inventory
---
all:
  children:
    production:
      vars:
        env: production
      hosts:
        prod-web-01:
          ansible_host: 10.0.1.10
          ansible_user: deploy
        prod-web-02:
          ansible_host: 10.0.1.11
          ansible_user: deploy
        prod-db-01:
          ansible_host: 10.0.20.10
          ansible_user: deploy
    staging:
      vars:
        env: staging
      hosts:
        staging-web-01:
          ansible_host: 10.1.1.10
          ansible_user: deploy
    # Functional groups list their members explicitly so group vars apply
    # only to hosts that actually run the service (making these groups
    # children of 'production' would pull the DB host into 'webservers').
    webservers:
      hosts:
        prod-web-01:
        prod-web-02:
        staging-web-01:
      vars:
        nginx_worker_processes: auto
        nginx_worker_connections: 1024
    databases:
      hosts:
        prod-db-01:
      vars:
        postgresql_version: 15
        postgresql_max_connections: 200
  vars:
    ansible_python_interpreter: /usr/bin/python3
# site.yml - Master Playbook
---
- name: Configure web servers
  hosts: webservers
  become: true
  roles:
    - role: base
      tags: ['base']
    - role: nginx
      tags: ['nginx', 'web']
    - role: application
      tags: ['application']
    - role: monitoring
      tags: ['monitoring']

- name: Configure database servers
  hosts: databases
  become: true
  roles:
    - role: base
      tags: ['base']
    - role: postgresql
      tags: ['database', 'postgresql']
    - role: monitoring
      tags: ['monitoring']

Production-Ready Roles

# roles/base/tasks/main.yml
---
- name: Update apt cache
  apt:
    update_cache: true
    cache_valid_time: 3600
  tags: ['apt']

- name: Upgrade all packages
  apt:
    upgrade: dist
    autoremove: true
  tags: ['apt']

- name: Install base packages
  apt:
    name:
      - curl
      - wget
      - git
      - vim
      - htop
      - tmux
      - net-tools
      - tcpdump
      - strace
      - sysstat
      - fail2ban
      - ufw
    state: present
  tags: ['packages']

- name: Configure timezone
  timezone:
    name: UTC
  tags: ['system']

- name: Set hostname
  hostname:
    name: "{{ inventory_hostname }}"
  tags: ['system']

- name: Configure sysctl
  sysctl:
    name: "{{ item.name }}"
    value: "{{ item.value }}"
    state: present
    reload: true
  loop:
    - { name: "net.ipv4.ip_forward", value: "0" }
    - { name: "net.ipv4.conf.all.send_redirects", value: "0" }
    - { name: "net.ipv4.conf.default.send_redirects", value: "0" }
    - { name: "net.ipv4.conf.all.accept_source_route", value: "0" }
    - { name: "net.ipv4.conf.default.accept_source_route", value: "0" }
    - { name: "net.ipv4.conf.all.accept_redirects", value: "0" }
    - { name: "net.ipv4.conf.default.accept_redirects", value: "0" }
    - { name: "net.ipv4.icmp_echo_ignore_broadcasts", value: "1" }
    - { name: "net.ipv4.tcp_syncookies", value: "1" }
    - { name: "net.ipv4.tcp_max_syn_backlog", value: "2048" }
    - { name: "net.core.somaxconn", value: "1024" }
  tags: ['system', 'security']

- name: Configure limits
  pam_limits:
    domain: "*"
    limit_type: "{{ item.type }}"
    limit_item: "{{ item.item }}"
    value: "{{ item.value }}"
  loop:
    - { type: "soft", item: "nofile", value: "65536" }
    - { type: "hard", item: "nofile", value: "65536" }
    - { type: "soft", item: "nproc", value: "65536" }
    - { type: "hard", item: "nproc", value: "65536" }
  tags: ['system']

- name: Configure fail2ban
  copy:
    src: jail.local
    dest: /etc/fail2ban/jail.local
    owner: root
    group: root
    mode: '0644'
  notify: restart fail2ban
  tags: ['security']

- name: Ensure fail2ban is running
  service:
    name: fail2ban
    state: started
    enabled: true
  tags: ['security']

- name: Configure UFW
  ufw:
    state: enabled
    policy: deny
    direction: incoming
  tags: ['firewall']

- name: Allow SSH through UFW
  ufw:
    rule: allow
    port: "22"
    proto: tcp
  tags: ['firewall']

- name: Allow HTTP/HTTPS through UFW
  ufw:
    rule: allow
    port: "{{ item }}"
    proto: tcp
  loop:
    - "80"
    - "443"
  tags: ['firewall']

- name: Create deploy user
  user:
    name: deploy
    shell: /bin/bash
    groups: sudo
    append: true
    state: present
  tags: ['users']

- name: Add SSH key for deploy user
  authorized_key:
    user: deploy
    key: "{{ deploy_ssh_public_key }}"
    state: present
  tags: ['users']
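The `jail.local` file copied by the fail2ban task is referenced but not reproduced in this skill; a minimal sketch protecting SSH (the ban and find times are illustrative defaults, not recommendations):

```ini
# roles/base/files/jail.local (sketch)
[DEFAULT]
bantime  = 1h
findtime = 10m
maxretry = 5

[sshd]
enabled = true
port    = ssh
```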
# roles/nginx/tasks/main.yml
---
- name: Add NGINX repository
  apt_repository:
    repo: "ppa:ondrej/nginx"
    state: present
    update_cache: true
  tags: ['nginx', 'repository']

- name: Install NGINX
  apt:
    name: nginx
    state: present
  tags: ['nginx', 'packages']

- name: Create nginx directories
  file:
    path: "{{ item }}"
    state: directory
    owner: www-data
    group: www-data
    mode: '0755'
  loop:
    - /var/www/html
    - /etc/nginx/sites-available
    - /etc/nginx/sites-enabled
    - /var/log/nginx
  tags: ['nginx', 'config']

- name: Configure nginx main config
  template:
    src: nginx.conf.j2
    dest: /etc/nginx/nginx.conf
    owner: root
    group: root
    mode: '0644'
    validate: 'nginx -t -c %s'
  notify: reload nginx
  tags: ['nginx', 'config']

- name: Remove default nginx site
  file:
    path: /etc/nginx/sites-enabled/default
    state: absent
  notify: reload nginx
  tags: ['nginx', 'config']

- name: Configure nginx site
  template:
    src: site.conf.j2
    dest: "/etc/nginx/sites-available/{{ application_name }}.conf"
    owner: root
    group: root
    mode: '0644'
    # A vhost snippet cannot be validated in isolation ('nginx -t' only
    # checks a full configuration, and Ansible's validate requires %s);
    # the reload handler's nginx -t checks the assembled config instead.
  notify: reload nginx
  tags: ['nginx', 'config']

- name: Enable nginx site
  file:
    src: "/etc/nginx/sites-available/{{ application_name }}.conf"
    dest: "/etc/nginx/sites-enabled/{{ application_name }}.conf"
    state: link
  notify: reload nginx
  tags: ['nginx', 'config']

- name: Ensure nginx is running
  service:
    name: nginx
    state: started
    enabled: true
  tags: ['nginx', 'service']

- name: Configure logrotate for nginx
  copy:
    src: nginx-logrotate
    dest: /etc/logrotate.d/nginx
    owner: root
    group: root
    mode: '0644'
  tags: ['nginx', 'logging']
# roles/nginx/templates/nginx.conf.j2
user www-data;
worker_processes {{ nginx_worker_processes }};
worker_rlimit_nofile 65535;

error_log /var/log/nginx/error.log warn;
pid /var/run/nginx.pid;

events {
    worker_connections {{ nginx_worker_connections }};
    use epoll;
    multi_accept on;
}

http {
    include /etc/nginx/mime.types;
    default_type application/octet-stream;

    log_format main '$remote_addr - $remote_user [$time_local] "$request" '
                    '$status $body_bytes_sent "$http_referer" '
                    '"$http_user_agent" "$http_x_forwarded_for"';

    access_log /var/log/nginx/access.log main;

    sendfile on;
    tcp_nopush on;
    tcp_nodelay on;
    keepalive_timeout 65;
    types_hash_max_size 2048;
    server_tokens off;

    client_body_buffer_size 128k;
    client_max_body_size 100m;
    client_header_buffer_size 1k;
    large_client_header_buffers 4 4k;

    gzip on;
    gzip_vary on;
    gzip_proxied any;
    gzip_comp_level 6;
    gzip_types text/plain text/css text/xml text/javascript
               application/json application/javascript application/xml+rss
               application/rss+xml font/truetype font/opentype
               application/vnd.ms-fontobject image/svg+xml;

    include /etc/nginx/conf.d/*.conf;
    include /etc/nginx/sites-enabled/*;
}
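The `site.conf.j2` template referenced by the role above is not shown; a minimal reverse-proxy sketch, where `server_name` and the upstream port are illustrative assumptions:

```nginx
# roles/nginx/templates/site.conf.j2 (sketch)
upstream {{ application_name }}_backend {
    server 127.0.0.1:8080;
    keepalive 32;
}

server {
    listen 80;
    server_name {{ server_name | default('_') }};

    access_log /var/log/nginx/{{ application_name }}.access.log main;

    location / {
        proxy_pass http://{{ application_name }}_backend;
        proxy_http_version 1.1;
        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
        proxy_set_header X-Forwarded-Proto $scheme;
    }

    location /health {
        access_log off;
        return 200 'ok';
    }
}
```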
# roles/docker/tasks/main.yml
---
- name: Install prerequisites
  apt:
    name:
      - apt-transport-https
      - ca-certificates
      - curl
      - gnupg
      - lsb-release
    state: present
  tags: ['docker', 'prerequisites']

- name: Add Docker GPG key
  apt_key:
    url: https://download.docker.com/linux/{{ ansible_distribution | lower }}/gpg
    state: present
  tags: ['docker', 'repository']

- name: Add Docker repository
  apt_repository:
    repo: "deb https://download.docker.com/linux/{{ ansible_distribution | lower }} {{ ansible_distribution_release }} stable"
    state: present
    update_cache: true
  tags: ['docker', 'repository']

- name: Install Docker
  apt:
    name:
      - docker-ce
      - docker-ce-cli
      - containerd.io
      - docker-compose-plugin
    state: present
  tags: ['docker', 'packages']

- name: Create Docker directory
  file:
    path: /etc/docker
    state: directory
    owner: root
    group: root
    mode: '0755'
  tags: ['docker', 'config']

- name: Configure Docker daemon
  copy:
    src: daemon.json
    dest: /etc/docker/daemon.json
    owner: root
    group: root
    mode: '0644'
  notify: restart docker
  tags: ['docker', 'config']

- name: Ensure deploy user can use Docker
  user:
    name: deploy
    groups: docker
    append: true
  tags: ['docker', 'users']

- name: Ensure Docker is running
  service:
    name: docker
    state: started
    enabled: true
  tags: ['docker', 'service']

- name: Install Python Docker SDK
  pip:
    name: docker
    state: present
  tags: ['docker', 'python']
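The roles above notify `restart fail2ban`, `reload nginx`, and `restart docker`, but the handlers themselves are never shown. A minimal sketch, collected into one listing for brevity (in practice each handler lives in its own role's `handlers/main.yml`):

```yaml
# e.g. roles/nginx/handlers/main.yml -- one handler per owning role
---
- name: restart fail2ban
  service:
    name: fail2ban
    state: restarted

- name: reload nginx
  service:
    name: nginx
    state: reloaded

- name: restart docker
  service:
    name: docker
    state: restarted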

2. Terraform Infrastructure as Code

Production Terraform Configuration

# terraform/terraform.tf
terraform {
  required_version = ">= 1.0"

  required_providers {
    aws = {
      source  = "hashicorp/aws"
      version = "~> 5.0"
    }
  }

  backend "s3" {
    bucket         = "terraform-state-prod"
    key            = "production/terraform.tfstate"
    region         = "us-east-1"
    encrypt        = true
    dynamodb_table = "terraform-locks"
  }
}

# terraform/provider.tf
provider "aws" {
  region = var.aws_region

  default_tags {
    tags = {
      Environment = var.environment
      Project     = var.project_name
      ManagedBy   = "Terraform"
    }
  }
}

# terraform/variables.tf
variable "aws_region" {
  description = "AWS region"
  type        = string
  default     = "us-east-1"
}

variable "environment" {
  description = "Environment name"
  type        = string
  default     = "production"
}

variable "project_name" {
  description = "Project name"
  type        = string
  default     = "myapp"
}

variable "vpc_cidr" {
  description = "VPC CIDR block"
  type        = string
  default     = "10.0.0.0/16"
}

variable "availability_zones" {
  description = "List of availability zones"
  type        = list(string)
  default     = ["us-east-1a", "us-east-1b"]
}

variable "instance_types" {
  description = "Instance types by tier"
  type = map(string)
  default = {
    web  = "t3.medium"
    app  = "t3.large"
    db   = "r6g.large"
  }
}

# terraform/vpc.tf
resource "aws_vpc" "main" {
  cidr_block           = var.vpc_cidr
  enable_dns_support   = true
  enable_dns_hostnames = true

  tags = {
    Name = "${var.project_name}-${var.environment}-vpc"
    Tier = "Network"
  }
}

resource "aws_internet_gateway" "main" {
  vpc_id = aws_vpc.main.id

  tags = {
    Name = "${var.project_name}-${var.environment}-igw"
  }
}

resource "aws_subnet" "public" {
  count                   = length(var.availability_zones)
  vpc_id                  = aws_vpc.main.id
  cidr_block              = cidrsubnet(var.vpc_cidr, 8, count.index)
  availability_zone       = var.availability_zones[count.index]
  map_public_ip_on_launch = true

  tags = {
    Name = "${var.project_name}-${var.environment}-public-${var.availability_zones[count.index]}"
    Tier = "Public"
  }
}

resource "aws_subnet" "private" {
  count                   = length(var.availability_zones)
  vpc_id                  = aws_vpc.main.id
  cidr_block              = cidrsubnet(var.vpc_cidr, 8, count.index + 10)
  availability_zone       = var.availability_zones[count.index]
  map_public_ip_on_launch = false

  tags = {
    Name = "${var.project_name}-${var.environment}-private-${var.availability_zones[count.index]}"
    Tier = "Private"
  }
}

resource "aws_subnet" "database" {
  count                   = length(var.availability_zones)
  vpc_id                  = aws_vpc.main.id
  cidr_block              = cidrsubnet(var.vpc_cidr, 8, count.index + 20)
  availability_zone       = var.availability_zones[count.index]
  map_public_ip_on_launch = false

  tags = {
    Name = "${var.project_name}-${var.environment}-database-${var.availability_zones[count.index]}"
    Tier = "Database"
  }
}

resource "aws_eip" "nat" {
  count  = length(var.availability_zones)
  domain = "vpc"

  tags = {
    Name = "${var.project_name}-${var.environment}-nat-${count.index}"
  }

  depends_on = [aws_internet_gateway.main]
}

resource "aws_nat_gateway" "main" {
  count         = length(var.availability_zones)
  allocation_id = aws_eip.nat[count.index].id
  subnet_id     = aws_subnet.public[count.index].id

  tags = {
    Name = "${var.project_name}-${var.environment}-nat-${count.index}"
  }

  depends_on = [aws_internet_gateway.main]
}

resource "aws_route_table" "public" {
  vpc_id = aws_vpc.main.id

  route {
    cidr_block = "0.0.0.0/0"
    gateway_id = aws_internet_gateway.main.id
  }

  tags = {
    Name = "${var.project_name}-${var.environment}-public-rt"
  }
}

resource "aws_route_table" "private" {
  count  = length(var.availability_zones)
  vpc_id = aws_vpc.main.id

  route {
    cidr_block     = "0.0.0.0/0"
    nat_gateway_id = aws_nat_gateway.main[count.index].id
  }

  tags = {
    Name = "${var.project_name}-${var.environment}-private-rt-${count.index}"
  }
}

resource "aws_route_table_association" "public" {
  count          = length(var.availability_zones)
  subnet_id      = aws_subnet.public[count.index].id
  route_table_id = aws_route_table.public.id
}

resource "aws_route_table_association" "private" {
  count          = length(var.availability_zones)
  subnet_id      = aws_subnet.private[count.index].id
  route_table_id = aws_route_table.private[count.index].id
}

resource "aws_route_table_association" "database" {
  count          = length(var.availability_zones)
  subnet_id      = aws_subnet.database[count.index].id
  route_table_id = aws_route_table.private[count.index].id
}

# terraform/security_groups.tf
resource "aws_security_group" "web" {
  name        = "${var.project_name}-${var.environment}-web"
  description = "Security group for web tier"
  vpc_id      = aws_vpc.main.id

  ingress {
    description = "HTTPS from anywhere"
    from_port   = 443
    to_port     = 443
    protocol    = "tcp"
    cidr_blocks = ["0.0.0.0/0"]
  }

  ingress {
    description = "HTTP from anywhere"
    from_port   = 80
    to_port     = 80
    protocol    = "tcp"
    cidr_blocks = ["0.0.0.0/0"]
  }

  egress {
    description = "All outbound traffic"
    from_port   = 0
    to_port     = 0
    protocol    = "-1"
    cidr_blocks = ["0.0.0.0/0"]
  }

  tags = {
    Name = "${var.project_name}-${var.environment}-web"
    Tier = "Web"
  }
}

resource "aws_security_group" "app" {
  name        = "${var.project_name}-${var.environment}-app"
  description = "Security group for application tier"
  vpc_id      = aws_vpc.main.id

  ingress {
    description     = "Application port from web tier"
    from_port       = 8080
    to_port         = 8080
    protocol        = "tcp"
    security_groups = [aws_security_group.web.id]
  }

  egress {
    description = "All outbound traffic"
    from_port   = 0
    to_port     = 0
    protocol    = "-1"
    cidr_blocks = ["0.0.0.0/0"]
  }

  tags = {
    Name = "${var.project_name}-${var.environment}-app"
    Tier = "Application"
  }
}

resource "aws_security_group" "database" {
  name        = "${var.project_name}-${var.environment}-database"
  description = "Security group for database tier"
  vpc_id      = aws_vpc.main.id

  ingress {
    description     = "PostgreSQL from application tier"
    from_port       = 5432
    to_port         = 5432
    protocol        = "tcp"
    security_groups = [aws_security_group.app.id]
  }

  tags = {
    Name = "${var.project_name}-${var.environment}-database"
    Tier = "Database"
  }
}

# terraform/ec2.tf
resource "aws_launch_template" "web" {
  name_prefix   = "${var.project_name}-${var.environment}-web-"
  image_id      = data.aws_ami.amazon_linux_2.id
  instance_type = var.instance_types.web
  key_name      = aws_key_pair.main.key_name

  network_interfaces {
    associate_public_ip_address = true
    security_groups             = [aws_security_group.web.id]
  }

  user_data = base64encode(templatefile("${path.module}/templates/web_user_data.sh", {
    environment = var.environment
  }))

  tag_specifications {
    resource_type = "instance"
    tags = {
      Name = "${var.project_name}-${var.environment}-web"
      Tier = "Web"
    }
  }

  lifecycle {
    create_before_destroy = true
  }
}

resource "aws_autoscaling_group" "web" {
  desired_capacity    = 2
  max_size            = 4
  min_size            = 2
  vpc_zone_identifier = aws_subnet.public[*].id

  target_group_arns = [aws_lb_target_group.web.arn]

  launch_template {
    id      = aws_launch_template.web.id
    version = "$Latest"
  }

  tag {
    key                 = "Name"
    value               = "${var.project_name}-${var.environment}-web"
    propagate_at_launch = true
  }
}

# terraform/load_balancer.tf
resource "aws_lb" "main" {
  name               = "${var.project_name}-${var.environment}-lb"
  internal           = false
  load_balancer_type = "application"
  security_groups    = [aws_security_group.web.id]
  subnets            = aws_subnet.public[*].id

  enable_deletion_protection = false

  tags = {
    Name = "${var.project_name}-${var.environment}-lb"
  }
}

resource "aws_lb_target_group" "web" {
  name     = "${var.project_name}-${var.environment}-web-tg"
  port     = 80
  protocol = "HTTP"
  vpc_id   = aws_vpc.main.id

  health_check {
    enabled             = true
    healthy_threshold   = 2
    interval            = 30
    matcher             = "200"
    path                = "/health"
    port                = "traffic-port"
    protocol            = "HTTP"
    timeout             = 5
    unhealthy_threshold = 3
  }
}

resource "aws_lb_listener" "https" {
  load_balancer_arn = aws_lb.main.arn
  port              = "443"
  protocol          = "HTTPS"
  ssl_policy        = "ELBSecurityPolicy-TLS-1-2-2017-01"
  certificate_arn   = aws_acm_certificate.main.arn

  default_action {
    type             = "forward"
    target_group_arn = aws_lb_target_group.web.arn
  }
}

resource "aws_lb_listener" "http" {
  load_balancer_arn = aws_lb.main.arn
  port              = "80"
  protocol          = "HTTP"

  default_action {
    type = "redirect"

    redirect {
      port        = "443"
      protocol    = "HTTPS"
      status_code = "HTTP_301"
    }
  }
}
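The launch template and HTTPS listener reference `data.aws_ami.amazon_linux_2`, `aws_key_pair.main`, and `aws_acm_certificate.main`, none of which are defined above. Sketches of the first two follow, plus outputs for downstream tooling; the SSH key path is illustrative, and the ACM certificate is omitted because DNS validation is environment-specific:

```hcl
# terraform/data.tf - AMI lookup referenced by the launch template
data "aws_ami" "amazon_linux_2" {
  most_recent = true
  owners      = ["amazon"]

  filter {
    name   = "name"
    values = ["amzn2-ami-hvm-*-x86_64-gp2"]
  }
}

resource "aws_key_pair" "main" {
  key_name   = "${var.project_name}-${var.environment}"
  public_key = file("~/.ssh/id_ed25519.pub") # illustrative path
}

# terraform/outputs.tf
output "alb_dns_name" {
  description = "DNS name of the application load balancer"
  value       = aws_lb.main.dns_name
}

output "vpc_id" {
  description = "ID of the VPC"
  value       = aws_vpc.main.id
}
```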

3. CI/CD Pipeline Design

GitHub Actions Production Pipeline

# .github/workflows/ci-cd.yml
name: CI/CD Pipeline

on:
  push:
    branches: [main, develop]
  pull_request:
    branches: [main, develop]

env:
  AWS_REGION: us-east-1
  ECR_REPOSITORY: myapp
  ECS_CLUSTER: production
  ECS_SERVICE: myapp-service

jobs:
  # CI Job
  test:
    name: Test
    runs-on: ubuntu-latest
    steps:
      - name: Checkout code
        uses: actions/checkout@v4

      - name: Set up Node.js
        uses: actions/setup-node@v4
        with:
          node-version: '20'
          cache: 'npm'

      - name: Install dependencies
        run: npm ci

      - name: Run linter
        run: npm run lint

      - name: Run tests
        run: npm run test:coverage

      - name: Upload coverage
        uses: codecov/codecov-action@v3
        with:
          files: ./coverage/lcov.info
          flags: unittests
          name: codecov-umbrella

      - name: Run security scan
        run: npm audit --audit-level=moderate

      - name: Run SAST
        uses: aquasecurity/trivy-action@master
        with:
          scan-type: 'fs'
          scan-ref: '.'
          format: 'sarif'
          output: 'trivy-results.sarif'

      - name: Upload Trivy results to GitHub Security tab
        uses: github/codeql-action/upload-sarif@v3
        if: always()
        with:
          sarif_file: 'trivy-results.sarif'

  # Build Job
  build:
    name: Build Docker Image
    needs: test
    runs-on: ubuntu-latest
    outputs:
      image-tag: ${{ steps.meta.outputs.tags }}
      image-digest: ${{ steps.build.outputs.digest }}
    steps:
      - name: Checkout code
        uses: actions/checkout@v4

      - name: Configure AWS credentials
        uses: aws-actions/configure-aws-credentials@v4
        with:
          aws-access-key-id: ${{ secrets.AWS_ACCESS_KEY_ID }}
          aws-secret-access-key: ${{ secrets.AWS_SECRET_ACCESS_KEY }}
          aws-region: ${{ env.AWS_REGION }}

      - name: Login to Amazon ECR
        id: login-ecr
        uses: aws-actions/amazon-ecr-login@v2

      - name: Set up Docker Buildx
        uses: docker/setup-buildx-action@v3

      - name: Extract metadata
        id: meta
        uses: docker/metadata-action@v5
        with:
          images: ${{ steps.login-ecr.outputs.registry }}/${{ env.ECR_REPOSITORY }}
          tags: |
            type=ref,event=branch
            type=ref,event=pr
            type=semver,pattern={{version}}
            type=semver,pattern={{major}}.{{minor}}
            type=sha,prefix={{branch}}-

      - name: Build and push Docker image
        id: build
        uses: docker/build-push-action@v5
        with:
          context: .
          push: true
          tags: ${{ steps.meta.outputs.tags }}
          labels: ${{ steps.meta.outputs.labels }}
          cache-from: type=gha
          cache-to: type=gha,mode=max
          build-args: |
            BUILD_DATE=${{ github.event.repository.updated_at }}
            VCS_REF=${{ github.sha }}
            VERSION=${{ steps.meta.outputs.version }}

      - name: Image vulnerability scan
        uses: aquasecurity/trivy-action@master
        with:
          image-ref: ${{ steps.login-ecr.outputs.registry }}/${{ env.ECR_REPOSITORY }}:${{ github.sha }}
          format: 'sarif'
          output: 'trivy-image-results.sarif'

      - name: Upload Trivy results to GitHub Security
        uses: github/codeql-action/upload-sarif@v3
        if: always()
        with:
          sarif_file: 'trivy-image-results.sarif'

  # Deploy to Staging
  deploy-staging:
    name: Deploy to Staging
    needs: build
    runs-on: ubuntu-latest
    if: github.ref == 'refs/heads/develop'
    environment:
      name: staging
      url: https://staging.example.com
    steps:
      - name: Checkout code
        uses: actions/checkout@v4

      - name: Configure AWS credentials
        uses: aws-actions/configure-aws-credentials@v4
        with:
          aws-access-key-id: ${{ secrets.AWS_ACCESS_KEY_ID }}
          aws-secret-access-key: ${{ secrets.AWS_SECRET_ACCESS_KEY }}
          aws-region: ${{ env.AWS_REGION }}

      - name: Deploy to ECS (Staging)
        run: |
          aws ecs update-service \
            --cluster staging \
            --service myapp-staging-service \
            --force-new-deployment \
            --region ${{ env.AWS_REGION }}

      - name: Wait for deployment
        run: |
          aws ecs wait services-stable \
            --cluster staging \
            --services myapp-staging-service \
            --region ${{ env.AWS_REGION }}

      - name: Run integration tests
        run: |
          npm run test:integration -- --env=staging

  # Deploy to Production
  deploy-production:
    name: Deploy to Production
    # 'deploy-staging' only runs on develop, so production (main) must
    # depend directly on build: a job whose needs-chain includes a
    # skipped job is itself skipped by default.
    needs: build
    runs-on: ubuntu-latest
    if: github.ref == 'refs/heads/main'
    environment:
      name: production
      url: https://example.com
    steps:
      - name: Checkout code
        uses: actions/checkout@v4

      - name: Configure AWS credentials
        uses: aws-actions/configure-aws-credentials@v4
        with:
          aws-access-key-id: ${{ secrets.AWS_ACCESS_KEY_ID }}
          aws-secret-access-key: ${{ secrets.AWS_SECRET_ACCESS_KEY }}
          aws-region: ${{ env.AWS_REGION }}

      - name: Create deployment record
        run: |
          gh release create ${{ github.sha }} \
            --title "Release ${{ github.sha }}" \
            --notes "Deploying to production"

      - name: Blue-green deployment
        run: |
          # Create a task set for the new task definition
          TASK_SET_ARN=$(aws ecs create-task-set \
            --cluster production \
            --service myapp-service \
            --task-definition myapp:${{ github.sha }} \
            --launch-type FARGATE \
            --network-configuration "awsvpcConfiguration={subnets=[${{ env.PRIVATE_SUBNETS }}],securityGroups=[${{ env.SECURITY_GROUP }}],assignPublicIp=DISABLED}" \
            --query 'taskSet.taskSetArn' \
            --output text)

          # Gradual rollout: scale the new task set up in steps,
          # then promote it to primary
          for percentage in 10 25 50 75 100; do
            aws ecs update-task-set \
              --cluster production \
              --service myapp-service \
              --task-set $TASK_SET_ARN \
              --scale "value=${percentage},unit=PERCENT"
            sleep 30
          done

          aws ecs update-service-primary-task-set \
            --cluster production \
            --service myapp-service \
            --primary-task-set $TASK_SET_ARN

      - name: Run smoke tests
        run: |
          npm run test:smoke -- --env=production

      - name: Rollback on failure
        if: failure()
        run: |
          # Rollback to previous task set
          aws ecs update-service \
            --cluster production \
            --service myapp-service \
            --task-definition myapp:previous \
            --force-new-deployment

      - name: Notify team
        if: success()
        uses: 8398a7/action-slack@v3
        with:
          status: ${{ job.status }}
          text: |
            Production deployment successful!
            Commit: ${{ github.sha }}
            Author: ${{ github.actor }}
          webhook_url: ${{ secrets.SLACK_WEBHOOK }}
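One gap worth closing in the workflow above: nothing stops two rapid pushes from deploying concurrently. A workflow-level `concurrency` key (placed alongside `on:`) serializes runs per branch:

```yaml
concurrency:
  group: deploy-${{ github.ref }}
  cancel-in-progress: false   # let an in-flight deploy finish; queue the next
```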

4. Kubernetes Deployment

Helm Chart for Production

# helm/myapp/Chart.yaml
apiVersion: v2
name: myapp
description: A Helm chart for my application
type: application
version: 1.0.0
appVersion: "1.0"

# helm/myapp/values.yaml
replicaCount: 3

image:
  repository: myapp
  pullPolicy: IfNotPresent
  tag: "1.0.0"

imagePullSecrets: []
nameOverride: ""
fullnameOverride: ""

serviceAccount:
  create: true
  annotations: {}
  name: ""

podAnnotations: {}

podSecurityContext:
  runAsNonRoot: true
  runAsUser: 1000
  fsGroup: 1000

securityContext:
  allowPrivilegeEscalation: false
  readOnlyRootFilesystem: true
  runAsUser: 1000
  capabilities:
    drop:
    - ALL

service:
  type: ClusterIP
  port: 80
  annotations: {}

ingress:
  enabled: true
  className: "nginx"
  annotations:
    cert-manager.io/cluster-issuer: "letsencrypt-prod"
    nginx.ingress.kubernetes.io/ssl-redirect: "true"
    nginx.ingress.kubernetes.io/force-ssl-redirect: "true"
  hosts:
    - host: example.com
      paths:
        - path: /
          pathType: Prefix
  tls:
    - secretName: myapp-tls
      hosts:
        - example.com

resources:
  limits:
    cpu: 1000m
    memory: 512Mi
  requests:
    cpu: 100m
    memory: 128Mi

autoscaling:
  enabled: true
  minReplicas: 3
  maxReplicas: 10
  targetCPUUtilizationPercentage: 80
  targetMemoryUtilizationPercentage: 80

nodeSelector: {}

tolerations: []

affinity: {}

# helm/myapp/templates/deployment.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: {{ include "myapp.fullname" . }}
  labels:
    {{- include "myapp.labels" . | nindent 4 }}
spec:
  {{- if not .Values.autoscaling.enabled }}
  replicas: {{ .Values.replicaCount }}
  {{- end }}
  selector:
    matchLabels:
      {{- include "myapp.selectorLabels" . | nindent 6 }}
  template:
    metadata:
      {{- with .Values.podAnnotations }}
      annotations:
        {{- toYaml . | nindent 8 }}
      {{- end }}
      labels:
        {{- include "myapp.selectorLabels" . | nindent 8 }}
    spec:
      {{- with .Values.imagePullSecrets }}
      imagePullSecrets:
        {{- toYaml . | nindent 8 }}
      {{- end }}
      serviceAccountName: {{ include "myapp.serviceAccountName" . }}
      securityContext:
        {{- toYaml .Values.podSecurityContext | nindent 8 }}
      containers:
      - name: {{ .Chart.Name }}
        securityContext:
          {{- toYaml .Values.securityContext | nindent 10 }}
        image: "{{ .Values.image.repository }}:{{ .Values.image.tag | default .Chart.AppVersion }}"
        imagePullPolicy: {{ .Values.image.pullPolicy }}
        ports:
        - name: http
          containerPort: 8080
          protocol: TCP
        livenessProbe:
          httpGet:
            path: /health
            port: http
          initialDelaySeconds: 30
          periodSeconds: 10
          timeoutSeconds: 5
          failureThreshold: 3
        readinessProbe:
          httpGet:
            path: /ready
            port: http
          initialDelaySeconds: 10
          periodSeconds: 5
          timeoutSeconds: 3
          failureThreshold: 3
        resources:
          {{- toYaml .Values.resources | nindent 10 }}
        volumeMounts:
        - name: temp
          mountPath: /tmp
        - name: cache
          mountPath: /app/cache
      volumes:
      - name: temp
        emptyDir: {}
      - name: cache
        emptyDir: {}
      {{- with .Values.nodeSelector }}
      nodeSelector:
        {{- toYaml . | nindent 8 }}
      {{- end }}
      {{- with .Values.affinity }}
      affinity:
        {{- toYaml . | nindent 8 }}
      {{- end }}
      {{- with .Values.tolerations }}
      tolerations:
        {{- toYaml . | nindent 8 }}
      {{- end }}
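values.yaml enables autoscaling, but no HorizontalPodAutoscaler template appears above; a minimal `templates/hpa.yaml` sketch consistent with those values, using the `autoscaling/v2` API:

```yaml
# helm/myapp/templates/hpa.yaml
{{- if .Values.autoscaling.enabled }}
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: {{ include "myapp.fullname" . }}
  labels:
    {{- include "myapp.labels" . | nindent 4 }}
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: {{ include "myapp.fullname" . }}
  minReplicas: {{ .Values.autoscaling.minReplicas }}
  maxReplicas: {{ .Values.autoscaling.maxReplicas }}
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: {{ .Values.autoscaling.targetCPUUtilizationPercentage }}
  - type: Resource
    resource:
      name: memory
      target:
        type: Utilization
        averageUtilization: {{ .Values.autoscaling.targetMemoryUtilizationPercentage }}
{{- end }}
```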

5. Decision Trees

Automation Tool Selection

What to automate?
│
├─ Configuration management → Ansible, Chef, Puppet
├─ Infrastructure provisioning → Terraform, CloudFormation
├─ Container orchestration → Kubernetes, Docker Swarm
├─ CI/CD → Jenkins, GitLab CI, GitHub Actions
├─ Monitoring → Prometheus, Grafana, Datadog
├─ Log management → ELK Stack, Splunk
└─ Security scanning → Trivy, SonarQube

Deployment Strategy Selection

Deployment requirements?
│
├─ Zero downtime → Blue-green, Canary
├─ Quick rollback → Blue-green
├─ Gradual rollout → Canary, Rolling
├─ Simple infrastructure → Rolling
├─ Complex microservices → Canary
└─ Enterprise compliance → Blue-green with approvals

6. Anti-Patterns to Avoid

  1. Hard-coded secrets: Use vaults, never hard-code
  2. No testing: Always test before deploying
  3. Manual deployments: Automate everything
  4. No rollback plan: Always have a rollback strategy
  5. Missing monitoring: You can't manage what you don't measure
  6. Large monoliths: Break into smaller, deployable units
  7. No version control: Everything must be in git
  8. Tight coupling: Design for independence
  9. No documentation: Document your automation
  10. Ignoring security: Security must be built-in
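A hypothetical guard against anti-pattern 1 (hard-coded secrets): a naive scan for credential-looking strings in config files before a deploy. The pattern list is illustrative, not exhaustive; a real pipeline would use a dedicated scanner such as gitleaks or trufflehog.

```shell
#!/usr/bin/env bash
set -euo pipefail

scan_for_secrets() {
  # Returns non-zero if a likely hard-coded secret is found under $1.
  local dir="$1"
  if grep -rEn \
      -e 'AKIA[0-9A-Z]{16}' \
      -e '(password|secret|api_key)[[:space:]]*[:=][[:space:]]*[^[:space:]]' \
      --include='*.yml' --include='*.tf' --include='*.sh' \
      "$dir"; then
    echo "possible hard-coded secret found; move it to a vault" >&2
    return 1
  fi
}
```

Wired in as a CI step or pre-commit hook, a match fails the run before anything reaches a server.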

7. Quality Checklist

Before considering automation production-ready:

  • All infrastructure codified
  • Secrets management implemented
  • Automated testing complete
  • CI/CD pipeline tested
  • Rollback procedure tested
  • Monitoring and alerting configured
  • Security scanning integrated
  • Documentation complete
  • Peer review completed
  • Disaster recovery tested
  • Configuration drift detection active
  • Automation idempotent
  • Error handling implemented
  • Performance testing completed
  • Compliance requirements met
  • Backup automation configured
  • Logging and auditing enabled
  • Team training completed
  • Runbooks documented
  • SLA requirements met
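One way to verify the "Automation idempotent" item: run the playbook twice and require that the second run's play recap reports zero changes. The recap parser below is pure bash/grep, so it can sit in any CI wrapper; the usage lines assume `ansible-playbook` is on PATH.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Returns 0 when an Ansible play-recap line reports no changed tasks, e.g.
#   prod-web-01 : ok=12 changed=0 unreachable=0 failed=0
recap_is_idempotent() {
  ! grep -Eq 'changed=[1-9][0-9]*' <<<"$1"
}

# Typical usage:
#   ansible-playbook site.yml                 # converge
#   recap=$(ansible-playbook site.yml)        # run again
#   recap_is_idempotent "$recap" || { echo "playbook is not idempotent" >&2; exit 1; }
```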

This comprehensive skill definition provides complete guidance for server automation across modern infrastructure.