Pony-Alpha-2-Dataset-Training/skills/skill-server-automation.md
Pony Alpha 2 68453089ee feat: initial Alpha Brain 2 dataset release
Massive training corpus for AI coding models containing:
- 10 JSONL training datasets (641+ examples across coding, reasoning, planning, architecture, communication, debugging, security, workflows, error handling, UI/UX)
- 11 agent behavior specifications (explorer, planner, reviewer, debugger, executor, UI designer, Linux admin, kernel engineer, security architect, automation engineer, API architect)
- 6 skill definition files (coding, API engineering, kernel, Linux server, security architecture, server automation, UI/UX)
- Master README with project origin story and philosophy

Built by Pony Alpha 2 to help AI models learn expert-level coding approaches.
2026-03-13 16:26:29 +04:00


# Server Automation Expert Skill
## Activation Criteria
Activate this skill when the user:
- Automates server provisioning and configuration
- Implements Infrastructure as Code (IaC)
- Designs CI/CD pipelines
- Automates deployment processes
- Manages container orchestration
- Implements configuration management
- Automates testing and quality assurance
- Designs monitoring and alerting automation
- Automates backup and disaster recovery
- Implements GitOps practices
- Automates security scanning and compliance
- Manages multi-environment deployments
- Implements blue-green or canary deployments
- Automates infrastructure scaling
- Needs reproducible infrastructure builds
## Core Methodology
### 1. Ansible Playbook Design
#### Complete Ansible Setup
```ini
# ansible.cfg - Ansible Configuration
[defaults]
inventory = ./inventory
host_key_checking = False
retry_files_enabled = False
gathering = smart
fact_caching = jsonfile
fact_caching_connection = /tmp/ansible_facts
fact_caching_timeout = 86400
stdout_callback = yaml
bin_ansible_callbacks = True
callbacks_enabled = profile_tasks, timer
jinja2_extensions = jinja2.ext.do
display_skipped_hosts = False
[ssh_connection]
pipelining = True
control_path = /tmp/ansible-ssh-%%h-%%p-%%r
```
```yaml
# inventory.yml - Static YAML inventory
---
all:
  children:
    production:
      hosts:
        prod-web-01:
          ansible_host: 10.0.1.10
          ansible_user: deploy
        prod-web-02:
          ansible_host: 10.0.1.11
          ansible_user: deploy
        prod-db-01:
          ansible_host: 10.0.20.10
          ansible_user: deploy
    staging:
      hosts:
        staging-web-01:
          ansible_host: 10.1.1.10
          ansible_user: deploy
    webservers:
      children:
        production:
      vars:
        nginx_worker_processes: auto
        nginx_worker_connections: 1024
    databases:
      children:
        production:
      vars:
        postgresql_version: 15
        postgresql_max_connections: 200
  vars:
    ansible_python_interpreter: /usr/bin/python3
    env: production
```
```yaml
# site.yml - Master Playbook
---
- name: Configure web servers
  hosts: webservers
  become: true
  roles:
    - role: base
      tags: ['base']
    - role: nginx
      tags: ['nginx', 'web']
    - role: application
      tags: ['application']
    - role: monitoring
      tags: ['monitoring']

- name: Configure database servers
  hosts: databases
  become: true
  roles:
    - role: base
      tags: ['base']
    - role: postgresql
      tags: ['database', 'postgresql']
    - role: monitoring
      tags: ['monitoring']
```
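Day-to-day runs of the master playbook above are usually scoped with `--limit` and `--tags`. The wrapper below is a minimal sketch: `site.yml` and `inventory.yml` match the examples above, while the `deploy` function and its `DRY_RUN` switch are local conventions for previewing the command, not Ansible features.

```shell
#!/usr/bin/env bash
set -u

# deploy -- build and (optionally) run a scoped ansible-playbook command.
# DRY_RUN=1 prints the command instead of executing it against hosts.
deploy() {
  local limit="$1" tags="${2:-all}"
  local cmd=(ansible-playbook site.yml -i inventory.yml
             --limit "$limit" --tags "$tags" --diff)
  echo "Running: ${cmd[*]}"
  [ "${DRY_RUN:-0}" = "1" ] && return 0
  "${cmd[@]}"
}

# Example: preview the nginx role against the staging group only.
DRY_RUN=1 deploy staging nginx
```

Adding `--check` to the command list gives a full dry run on the Ansible side as well, which pairs well with the `--diff` flag already present.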
#### Production-Ready Roles
```yaml
# roles/base/tasks/main.yml
---
- name: Update apt cache
  apt:
    update_cache: true
    cache_valid_time: 3600
  tags: ['apt']

- name: Upgrade all packages
  apt:
    upgrade: dist
    autoremove: true
  tags: ['apt']

- name: Install base packages
  apt:
    name:
      - curl
      - wget
      - git
      - vim
      - htop
      - tmux
      - net-tools
      - tcpdump
      - strace
      - sysstat
      - fail2ban
      - ufw
    state: present
  tags: ['packages']

- name: Configure timezone
  timezone:
    name: UTC
  tags: ['system']

- name: Set hostname
  hostname:
    name: "{{ inventory_hostname }}"
  tags: ['system']

- name: Configure sysctl
  sysctl:
    name: "{{ item.name }}"
    value: "{{ item.value }}"
    state: present
    reload: true
  loop:
    - { name: "net.ipv4.ip_forward", value: "0" }
    - { name: "net.ipv4.conf.all.send_redirects", value: "0" }
    - { name: "net.ipv4.conf.default.send_redirects", value: "0" }
    - { name: "net.ipv4.conf.all.accept_source_route", value: "0" }
    - { name: "net.ipv4.conf.default.accept_source_route", value: "0" }
    - { name: "net.ipv4.conf.all.accept_redirects", value: "0" }
    - { name: "net.ipv4.conf.default.accept_redirects", value: "0" }
    - { name: "net.ipv4.icmp_echo_ignore_broadcasts", value: "1" }
    - { name: "net.ipv4.tcp_syncookies", value: "1" }
    - { name: "net.ipv4.tcp_max_syn_backlog", value: "2048" }
    - { name: "net.core.somaxconn", value: "1024" }
  tags: ['system', 'security']

- name: Configure limits
  pam_limits:
    domain: "*"
    limit_type: "{{ item.type }}"
    limit_item: "{{ item.item }}"
    value: "{{ item.value }}"
  loop:
    - { type: "soft", item: "nofile", value: "65536" }
    - { type: "hard", item: "nofile", value: "65536" }
    - { type: "soft", item: "nproc", value: "65536" }
    - { type: "hard", item: "nproc", value: "65536" }
  tags: ['system']

- name: Configure fail2ban
  copy:
    src: jail.local
    dest: /etc/fail2ban/jail.local
    owner: root
    group: root
    mode: '0644'
  notify: restart fail2ban
  tags: ['security']

- name: Ensure fail2ban is running
  service:
    name: fail2ban
    state: started
    enabled: true
  tags: ['security']

- name: Configure UFW
  ufw:
    state: enabled
    policy: deny
    direction: incoming
  tags: ['firewall']

- name: Allow SSH through UFW
  ufw:
    rule: allow
    port: "22"
    proto: tcp
  tags: ['firewall']

- name: Allow HTTP/HTTPS through UFW
  ufw:
    rule: allow
    port: "{{ item }}"
    proto: tcp
  loop:
    - "80"
    - "443"
  tags: ['firewall']

- name: Create deploy user
  user:
    name: deploy
    shell: /bin/bash
    groups: sudo
    append: true
    state: present
  tags: ['users']

- name: Add SSH key for deploy user
  authorized_key:
    user: deploy
    key: "{{ deploy_ssh_public_key }}"
    state: present
  tags: ['users']
```
```yaml
# roles/nginx/tasks/main.yml
---
- name: Add NGINX repository
  apt_repository:
    repo: "ppa:ondrej/nginx"
    state: present
    update_cache: true
  tags: ['nginx', 'repository']

- name: Install NGINX
  apt:
    name: nginx
    state: present
  tags: ['nginx', 'packages']

- name: Create nginx directories
  file:
    path: "{{ item }}"
    state: directory
    owner: www-data
    group: www-data
    mode: '0755'
  loop:
    - /var/www/html
    - /etc/nginx/sites-available
    - /etc/nginx/sites-enabled
    - /var/log/nginx
  tags: ['nginx', 'config']

- name: Configure nginx main config
  template:
    src: nginx.conf.j2
    dest: /etc/nginx/nginx.conf
    owner: root
    group: root
    mode: '0644'
    validate: 'nginx -t -c %s'
  notify: reload nginx
  tags: ['nginx', 'config']

- name: Remove default nginx site
  file:
    path: /etc/nginx/sites-enabled/default
    state: absent
  notify: reload nginx
  tags: ['nginx', 'config']

# No `validate` here: a site snippet is not a complete nginx config, so it
# cannot be checked standalone; the reload handler validates the full config.
- name: Configure nginx site
  template:
    src: site.conf.j2
    dest: "/etc/nginx/sites-available/{{ application_name }}.conf"
    owner: root
    group: root
    mode: '0644'
  notify: reload nginx
  tags: ['nginx', 'config']

- name: Enable nginx site
  file:
    src: "/etc/nginx/sites-available/{{ application_name }}.conf"
    dest: "/etc/nginx/sites-enabled/{{ application_name }}.conf"
    state: link
  notify: reload nginx
  tags: ['nginx', 'config']

- name: Ensure nginx is running
  service:
    name: nginx
    state: started
    enabled: true
  tags: ['nginx', 'service']

- name: Configure logrotate for nginx
  copy:
    src: nginx-logrotate
    dest: /etc/logrotate.d/nginx
    owner: root
    group: root
    mode: '0644'
  tags: ['nginx', 'logging']
```
```nginx
# roles/nginx/templates/nginx.conf.j2
user www-data;
worker_processes {{ nginx_worker_processes }};
worker_rlimit_nofile 65535;
error_log /var/log/nginx/error.log warn;
pid /var/run/nginx.pid;

events {
    worker_connections {{ nginx_worker_connections }};
    use epoll;
    multi_accept on;
}

http {
    include /etc/nginx/mime.types;
    default_type application/octet-stream;

    log_format main '$remote_addr - $remote_user [$time_local] "$request" '
                    '$status $body_bytes_sent "$http_referer" '
                    '"$http_user_agent" "$http_x_forwarded_for"';
    access_log /var/log/nginx/access.log main;

    sendfile on;
    tcp_nopush on;
    tcp_nodelay on;
    keepalive_timeout 65;
    types_hash_max_size 2048;
    server_tokens off;

    client_body_buffer_size 128k;
    client_max_body_size 100m;
    client_header_buffer_size 1k;
    large_client_header_buffers 4 4k;

    gzip on;
    gzip_vary on;
    gzip_proxied any;
    gzip_comp_level 6;
    gzip_types text/plain text/css text/xml text/javascript
               application/json application/javascript application/xml+rss
               application/rss+xml font/truetype font/opentype
               application/vnd.ms-fontobject image/svg+xml;

    include /etc/nginx/conf.d/*.conf;
    include /etc/nginx/sites-enabled/*;
}
```
```yaml
# roles/docker/tasks/main.yml
---
- name: Install prerequisites
  apt:
    name:
      - apt-transport-https
      - ca-certificates
      - curl
      - gnupg
      - lsb-release
    state: present
  tags: ['docker', 'prerequisites']

- name: Add Docker GPG key
  apt_key:
    url: https://download.docker.com/linux/{{ ansible_distribution | lower }}/gpg
    state: present
  tags: ['docker', 'repository']

- name: Add Docker repository
  apt_repository:
    repo: "deb https://download.docker.com/linux/{{ ansible_distribution | lower }} {{ ansible_distribution_release }} stable"
    state: present
    update_cache: true
  tags: ['docker', 'repository']

- name: Install Docker
  apt:
    name:
      - docker-ce
      - docker-ce-cli
      - containerd.io
      - docker-compose-plugin
    state: present
  tags: ['docker', 'packages']

- name: Create Docker directory
  file:
    path: /etc/docker
    state: directory
    owner: root
    group: root
    mode: '0755'
  tags: ['docker', 'config']

- name: Configure Docker daemon
  copy:
    src: daemon.json
    dest: /etc/docker/daemon.json
    owner: root
    group: root
    mode: '0644'
  notify: restart docker
  tags: ['docker', 'config']

- name: Ensure deploy user can use Docker
  user:
    name: deploy
    groups: docker
    append: true
  tags: ['docker', 'users']

- name: Ensure Docker is running
  service:
    name: docker
    state: started
    enabled: true
  tags: ['docker', 'service']

- name: Install Python Docker SDK
  pip:
    name: docker
    state: present
  tags: ['docker', 'python']
```
### 2. Terraform Infrastructure as Code
#### Production Terraform Configuration
```hcl
# terraform/terraform.tf
terraform {
  required_version = ">= 1.0"

  required_providers {
    aws = {
      source  = "hashicorp/aws"
      version = "~> 5.0"
    }
  }

  backend "s3" {
    bucket         = "terraform-state-prod"
    key            = "production/terraform.tfstate"
    region         = "us-east-1"
    encrypt        = true
    dynamodb_table = "terraform-locks"
  }
}

# terraform/provider.tf
provider "aws" {
  region = var.aws_region

  default_tags {
    tags = {
      Environment = var.environment
      Project     = var.project_name
      ManagedBy   = "Terraform"
    }
  }
}

# terraform/variables.tf
variable "aws_region" {
  description = "AWS region"
  type        = string
  default     = "us-east-1"
}

variable "environment" {
  description = "Environment name"
  type        = string
  default     = "production"
}

variable "project_name" {
  description = "Project name"
  type        = string
  default     = "myapp"
}

variable "vpc_cidr" {
  description = "VPC CIDR block"
  type        = string
  default     = "10.0.0.0/16"
}

variable "availability_zones" {
  description = "List of availability zones"
  type        = list(string)
  default     = ["us-east-1a", "us-east-1b"]
}

variable "instance_types" {
  description = "Instance types by tier"
  type        = map(string)
  default = {
    web = "t3.medium"
    app = "t3.large"
    db  = "r6g.large"
  }
}
# terraform/vpc.tf
resource "aws_vpc" "main" {
  cidr_block           = var.vpc_cidr
  enable_dns_support   = true
  enable_dns_hostnames = true

  tags = {
    Name = "${var.project_name}-${var.environment}-vpc"
    Tier = "Network"
  }
}

resource "aws_internet_gateway" "main" {
  vpc_id = aws_vpc.main.id

  tags = {
    Name = "${var.project_name}-${var.environment}-igw"
  }
}

resource "aws_subnet" "public" {
  count                   = length(var.availability_zones)
  vpc_id                  = aws_vpc.main.id
  cidr_block              = cidrsubnet(var.vpc_cidr, 8, count.index)
  availability_zone       = var.availability_zones[count.index]
  map_public_ip_on_launch = true

  tags = {
    Name = "${var.project_name}-${var.environment}-public-${var.availability_zones[count.index]}"
    Tier = "Public"
  }
}

resource "aws_subnet" "private" {
  count                   = length(var.availability_zones)
  vpc_id                  = aws_vpc.main.id
  cidr_block              = cidrsubnet(var.vpc_cidr, 8, count.index + 10)
  availability_zone       = var.availability_zones[count.index]
  map_public_ip_on_launch = false

  tags = {
    Name = "${var.project_name}-${var.environment}-private-${var.availability_zones[count.index]}"
    Tier = "Private"
  }
}

resource "aws_subnet" "database" {
  count                   = length(var.availability_zones)
  vpc_id                  = aws_vpc.main.id
  cidr_block              = cidrsubnet(var.vpc_cidr, 8, count.index + 20)
  availability_zone       = var.availability_zones[count.index]
  map_public_ip_on_launch = false

  tags = {
    Name = "${var.project_name}-${var.environment}-database-${var.availability_zones[count.index]}"
    Tier = "Database"
  }
}

resource "aws_eip" "nat" {
  count  = length(var.availability_zones)
  domain = "vpc"

  tags = {
    Name = "${var.project_name}-${var.environment}-nat-${count.index}"
  }

  depends_on = [aws_internet_gateway.main]
}

resource "aws_nat_gateway" "main" {
  count         = length(var.availability_zones)
  allocation_id = aws_eip.nat[count.index].id
  subnet_id     = aws_subnet.public[count.index].id

  tags = {
    Name = "${var.project_name}-${var.environment}-nat-${count.index}"
  }

  depends_on = [aws_internet_gateway.main]
}

resource "aws_route_table" "public" {
  vpc_id = aws_vpc.main.id

  route {
    cidr_block = "0.0.0.0/0"
    gateway_id = aws_internet_gateway.main.id
  }

  tags = {
    Name = "${var.project_name}-${var.environment}-public-rt"
  }
}

resource "aws_route_table" "private" {
  count  = length(var.availability_zones)
  vpc_id = aws_vpc.main.id

  route {
    cidr_block     = "0.0.0.0/0"
    nat_gateway_id = aws_nat_gateway.main[count.index].id
  }

  tags = {
    Name = "${var.project_name}-${var.environment}-private-rt-${count.index}"
  }
}

resource "aws_route_table_association" "public" {
  count          = length(var.availability_zones)
  subnet_id      = aws_subnet.public[count.index].id
  route_table_id = aws_route_table.public.id
}

resource "aws_route_table_association" "private" {
  count          = length(var.availability_zones)
  subnet_id      = aws_subnet.private[count.index].id
  route_table_id = aws_route_table.private[count.index].id
}

resource "aws_route_table_association" "database" {
  count          = length(var.availability_zones)
  subnet_id      = aws_subnet.database[count.index].id
  route_table_id = aws_route_table.private[count.index].id
}
# terraform/security_groups.tf
resource "aws_security_group" "web" {
  name        = "${var.project_name}-${var.environment}-web"
  description = "Security group for web tier"
  vpc_id      = aws_vpc.main.id

  ingress {
    description = "HTTPS from anywhere"
    from_port   = 443
    to_port     = 443
    protocol    = "tcp"
    cidr_blocks = ["0.0.0.0/0"]
  }

  ingress {
    description = "HTTP from anywhere"
    from_port   = 80
    to_port     = 80
    protocol    = "tcp"
    cidr_blocks = ["0.0.0.0/0"]
  }

  egress {
    description = "All outbound traffic"
    from_port   = 0
    to_port     = 0
    protocol    = "-1"
    cidr_blocks = ["0.0.0.0/0"]
  }

  tags = {
    Name = "${var.project_name}-${var.environment}-web"
    Tier = "Web"
  }
}

resource "aws_security_group" "app" {
  name        = "${var.project_name}-${var.environment}-app"
  description = "Security group for application tier"
  vpc_id      = aws_vpc.main.id

  ingress {
    description     = "Application port from web tier"
    from_port       = 8080
    to_port         = 8080
    protocol        = "tcp"
    security_groups = [aws_security_group.web.id]
  }

  egress {
    description = "All outbound traffic"
    from_port   = 0
    to_port     = 0
    protocol    = "-1"
    cidr_blocks = ["0.0.0.0/0"]
  }

  tags = {
    Name = "${var.project_name}-${var.environment}-app"
    Tier = "Application"
  }
}

resource "aws_security_group" "database" {
  name        = "${var.project_name}-${var.environment}-database"
  description = "Security group for database tier"
  vpc_id      = aws_vpc.main.id

  ingress {
    description     = "PostgreSQL from application tier"
    from_port       = 5432
    to_port         = 5432
    protocol        = "tcp"
    security_groups = [aws_security_group.app.id]
  }

  tags = {
    Name = "${var.project_name}-${var.environment}-database"
    Tier = "Database"
  }
}
# terraform/ec2.tf
resource "aws_launch_template" "web" {
  name_prefix   = "${var.project_name}-${var.environment}-web-"
  image_id      = data.aws_ami.amazon_linux_2.id
  instance_type = var.instance_types.web
  key_name      = aws_key_pair.main.key_name

  network_interfaces {
    associate_public_ip_address = true
    security_groups             = [aws_security_group.web.id]
  }

  user_data = base64encode(templatefile("${path.module}/templates/web_user_data.sh", {
    environment = var.environment
  }))

  tag_specifications {
    resource_type = "instance"
    tags = {
      Name = "${var.project_name}-${var.environment}-web"
      Tier = "Web"
    }
  }

  lifecycle {
    create_before_destroy = true
  }
}

resource "aws_autoscaling_group" "web" {
  desired_capacity    = 2
  max_size            = 4
  min_size            = 2
  vpc_zone_identifier = aws_subnet.public[*].id
  target_group_arns   = [aws_lb_target_group.web.arn]

  launch_template {
    id      = aws_launch_template.web.id
    version = "$Latest"
  }

  tag {
    key                 = "Name"
    value               = "${var.project_name}-${var.environment}-web"
    propagate_at_launch = true
  }
}

# terraform/load_balancer.tf
resource "aws_lb" "main" {
  name               = "${var.project_name}-${var.environment}-lb"
  internal           = false
  load_balancer_type = "application"
  security_groups    = [aws_security_group.web.id]
  subnets            = aws_subnet.public[*].id

  enable_deletion_protection = false

  tags = {
    Name = "${var.project_name}-${var.environment}-lb"
  }
}

resource "aws_lb_target_group" "web" {
  name     = "${var.project_name}-${var.environment}-web-tg"
  port     = 80
  protocol = "HTTP"
  vpc_id   = aws_vpc.main.id

  health_check {
    enabled             = true
    healthy_threshold   = 2
    interval            = 30
    matcher             = "200"
    path                = "/health"
    port                = "traffic-port"
    protocol            = "HTTP"
    timeout             = 5
    unhealthy_threshold = 3
  }
}

resource "aws_lb_listener" "https" {
  load_balancer_arn = aws_lb.main.arn
  port              = "443"
  protocol          = "HTTPS"
  ssl_policy        = "ELBSecurityPolicy-TLS-1-2-2017-01"
  certificate_arn   = aws_acm_certificate.main.arn

  default_action {
    type             = "forward"
    target_group_arn = aws_lb_target_group.web.arn
  }
}

resource "aws_lb_listener" "http" {
  load_balancer_arn = aws_lb.main.arn
  port              = "80"
  protocol          = "HTTP"

  default_action {
    type = "redirect"

    redirect {
      port        = "443"
      protocol    = "HTTPS"
      status_code = "HTTP_301"
    }
  }
}
```
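A common way to wire the configuration above into automation is `terraform plan -detailed-exitcode`, whose documented exit codes distinguish "no changes" (0), errors (1), and "changes present" (2). The sketch below passes the terraform command in as a parameter so the gating logic can be shown with a stub; `fake_terraform` is a stand-in, not a real binary.

```shell
#!/usr/bin/env bash
set -u

# plan_gate -- run "<cmd> plan -detailed-exitcode" and report whether an
# apply is needed. Terraform's documented exit codes:
#   0 = no changes, 1 = error, 2 = changes present.
plan_gate() {
  "$@" plan -detailed-exitcode -out=tfplan >/dev/null 2>&1
  case $? in
    0) echo "no changes" ;;
    2) echo "changes pending: apply tfplan" ;;
    *) echo "plan failed" >&2; return 1 ;;
  esac
}

# Stub standing in for the real terraform binary in this sketch:
fake_terraform() { return 2; }   # pretend the plan found changes

plan_gate fake_terraform
```

In a real pipeline the call would be `plan_gate terraform`, with "changes pending" typically feeding a manual-approval step before `terraform apply tfplan`.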
### 3. CI/CD Pipeline Design
#### GitHub Actions Production Pipeline
```yaml
# .github/workflows/ci-cd.yml
name: CI/CD Pipeline

on:
  push:
    branches: [main, develop]
  pull_request:
    branches: [main, develop]

env:
  AWS_REGION: us-east-1
  ECR_REPOSITORY: myapp
  ECS_CLUSTER: production
  ECS_SERVICE: myapp-service

jobs:
  # CI Job
  test:
    name: Test
    runs-on: ubuntu-latest
    steps:
      - name: Checkout code
        uses: actions/checkout@v4
      - name: Set up Node.js
        uses: actions/setup-node@v4
        with:
          node-version: '20'
          cache: 'npm'
      - name: Install dependencies
        run: npm ci
      - name: Run linter
        run: npm run lint
      - name: Run tests
        run: npm run test:coverage
      - name: Upload coverage
        uses: codecov/codecov-action@v3
        with:
          files: ./coverage/lcov.info
          flags: unittests
          name: codecov-umbrella
      - name: Run security scan
        run: npm audit --audit-level=moderate
      - name: Run SAST
        uses: aquasecurity/trivy-action@master
        with:
          scan-type: 'fs'
          scan-ref: '.'
          format: 'sarif'
          output: 'trivy-results.sarif'
      - name: Upload Trivy results to GitHub Security tab
        uses: github/codeql-action/upload-sarif@v2
        if: always()
        with:
          sarif_file: 'trivy-results.sarif'

  # Build Job
  build:
    name: Build Docker Image
    needs: test
    runs-on: ubuntu-latest
    outputs:
      image-tag: ${{ steps.meta.outputs.tags }}
      image-digest: ${{ steps.build.outputs.digest }}
    steps:
      - name: Checkout code
        uses: actions/checkout@v4
      - name: Configure AWS credentials
        uses: aws-actions/configure-aws-credentials@v4
        with:
          aws-access-key-id: ${{ secrets.AWS_ACCESS_KEY_ID }}
          aws-secret-access-key: ${{ secrets.AWS_SECRET_ACCESS_KEY }}
          aws-region: ${{ env.AWS_REGION }}
      - name: Login to Amazon ECR
        id: login-ecr
        uses: aws-actions/amazon-ecr-login@v2
      - name: Set up Docker Buildx
        uses: docker/setup-buildx-action@v3
      - name: Extract metadata
        id: meta
        uses: docker/metadata-action@v5
        with:
          images: ${{ steps.login-ecr.outputs.registry }}/${{ env.ECR_REPOSITORY }}
          tags: |
            type=ref,event=branch
            type=ref,event=pr
            type=semver,pattern={{version}}
            type=semver,pattern={{major}}.{{minor}}
            type=sha,prefix={{branch}}-
      - name: Build and push Docker image
        id: build
        uses: docker/build-push-action@v5
        with:
          context: .
          push: true
          tags: ${{ steps.meta.outputs.tags }}
          labels: ${{ steps.meta.outputs.labels }}
          cache-from: type=gha
          cache-to: type=gha,mode=max
          build-args: |
            BUILD_DATE=${{ github.event.repository.updated_at }}
            VCS_REF=${{ github.sha }}
            VERSION=${{ steps.meta.outputs.version }}
      - name: Image vulnerability scan
        uses: aquasecurity/trivy-action@master
        with:
          image-ref: ${{ steps.login-ecr.outputs.registry }}/${{ env.ECR_REPOSITORY }}:${{ github.sha }}
          format: 'sarif'
          output: 'trivy-image-results.sarif'
      - name: Upload Trivy results to GitHub Security
        uses: github/codeql-action/upload-sarif@v2
        if: always()
        with:
          sarif_file: 'trivy-image-results.sarif'

  # Deploy to Staging
  deploy-staging:
    name: Deploy to Staging
    needs: build
    runs-on: ubuntu-latest
    if: github.ref == 'refs/heads/develop'
    environment:
      name: staging
      url: https://staging.example.com
    steps:
      - name: Checkout code
        uses: actions/checkout@v4
      - name: Configure AWS credentials
        uses: aws-actions/configure-aws-credentials@v4
        with:
          aws-access-key-id: ${{ secrets.AWS_ACCESS_KEY_ID }}
          aws-secret-access-key: ${{ secrets.AWS_SECRET_ACCESS_KEY }}
          aws-region: ${{ env.AWS_REGION }}
      - name: Deploy to ECS (Staging)
        run: |
          aws ecs update-service \
            --cluster staging \
            --service myapp-staging-service \
            --force-new-deployment \
            --region ${{ env.AWS_REGION }}
      - name: Wait for deployment
        run: |
          aws ecs wait services-stable \
            --cluster staging \
            --services myapp-staging-service \
            --region ${{ env.AWS_REGION }}
      - name: Run integration tests
        run: |
          npm run test:integration -- --env=staging

  # Deploy to Production
  deploy-production:
    name: Deploy to Production
    # deploy-staging only runs on develop, so a main-branch run must not
    # list it in needs (a skipped dependency would skip this job too).
    needs: build
    runs-on: ubuntu-latest
    if: github.ref == 'refs/heads/main'
    environment:
      name: production
      url: https://example.com
    steps:
      - name: Checkout code
        uses: actions/checkout@v4
      - name: Configure AWS credentials
        uses: aws-actions/configure-aws-credentials@v4
        with:
          aws-access-key-id: ${{ secrets.AWS_ACCESS_KEY_ID }}
          aws-secret-access-key: ${{ secrets.AWS_SECRET_ACCESS_KEY }}
          aws-region: ${{ env.AWS_REGION }}
      - name: Create deployment record
        run: |
          gh release create ${{ github.sha }} \
            --title "Release ${{ github.sha }}" \
            --notes "Deploying to production"
      - name: Blue-green deployment
        run: |
          # Switch traffic to new task set
          TASK_SET_ARN=$(aws ecs create-task-set \
            --cluster production \
            --service myapp-service \
            --task-definition myapp:${{ github.sha }} \
            --launch-type FARGATE \
            --network-configuration "awsvpcConfiguration={subnets=[${{ env.PRIVATE_SUBNETS }}],securityGroups=[${{ env.SECURITY_GROUP }}],assignPublicIp=DISABLED}" \
            --query 'taskSet.taskSetArn' \
            --output text)
          # Gradual rollout
          for percentage in 10 25 50 75 100; do
            aws ecs update-service-primary-task-set \
              --cluster production \
              --service myapp-service \
              --primary-task-set $TASK_SET_ARN
            sleep 30
          done
      - name: Run smoke tests
        run: |
          npm run test:smoke -- --env=production
      - name: Rollback on failure
        if: failure()
        run: |
          # Rollback to previous task set
          aws ecs update-service \
            --cluster production \
            --service myapp-service \
            --task-definition myapp:previous \
            --force-new-deployment
      - name: Notify team
        if: success()
        uses: 8398a7/action-slack@v3
        with:
          status: ${{ job.status }}
          text: |
            Production deployment successful!
            Commit: ${{ github.sha }}
            Author: ${{ github.actor }}
          webhook_url: ${{ secrets.SLACK_WEBHOOK }}
```
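The gradual-rollout step in the production job above is, at its core, a loop with a health gate: shift some traffic, check, continue or abort. The sketch below factors that pattern out; `shift_traffic` and `healthy` are stubs standing in for real traffic-shift and smoke-test commands.

```shell
#!/usr/bin/env bash
set -u

# canary_rollout -- step traffic up in stages, stopping at the first
# failed health check so most traffic never reaches a bad build.
canary_rollout() {
  local shift_cmd="$1" health_cmd="$2" pct
  for pct in 10 25 50 75 100; do
    "$shift_cmd" "$pct"
    if ! "$health_cmd" "$pct"; then
      echo "health gate failed at ${pct}% - rolling back"
      return 1
    fi
  done
  echo "rollout complete"
}

# Stubs in place of the real traffic-shift and smoke-test calls:
shift_traffic() { echo "shifted ${1}% of traffic"; }
healthy() { true; }

canary_rollout shift_traffic healthy
```

Swapping `healthy` for a real smoke test, and `shift_traffic` for a weighted target-group or task-set update, turns this skeleton into the pipeline step.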
### 4. Kubernetes Deployment
#### Helm Chart for Production
```yaml
# helm/myapp/Chart.yaml
apiVersion: v2
name: myapp
description: A Helm chart for my application
type: application
version: 1.0.0
appVersion: "1.0"
```
```yaml
# helm/myapp/values.yaml
replicaCount: 3

image:
  repository: myapp
  pullPolicy: IfNotPresent
  tag: "1.0.0"

imagePullSecrets: []
nameOverride: ""
fullnameOverride: ""

serviceAccount:
  create: true
  annotations: {}
  name: ""

podAnnotations: {}

podSecurityContext:
  runAsNonRoot: true
  runAsUser: 1000
  fsGroup: 1000

securityContext:
  allowPrivilegeEscalation: false
  readOnlyRootFilesystem: true
  runAsUser: 1000
  capabilities:
    drop:
      - ALL

service:
  type: ClusterIP
  port: 80
  annotations: {}

ingress:
  enabled: true
  className: "nginx"
  annotations:
    cert-manager.io/cluster-issuer: "letsencrypt-prod"
    nginx.ingress.kubernetes.io/ssl-redirect: "true"
    nginx.ingress.kubernetes.io/force-ssl-redirect: "true"
  hosts:
    - host: example.com
      paths:
        - path: /
          pathType: Prefix
  tls:
    - secretName: myapp-tls
      hosts:
        - example.com

resources:
  limits:
    cpu: 1000m
    memory: 512Mi
  requests:
    cpu: 100m
    memory: 128Mi

autoscaling:
  enabled: true
  minReplicas: 3
  maxReplicas: 10
  targetCPUUtilizationPercentage: 80
  targetMemoryUtilizationPercentage: 80

nodeSelector: {}
tolerations: []
affinity: {}
```
```yaml
# helm/myapp/templates/deployment.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: {{ include "myapp.fullname" . }}
  labels:
    {{- include "myapp.labels" . | nindent 4 }}
spec:
  {{- if not .Values.autoscaling.enabled }}
  replicas: {{ .Values.replicaCount }}
  {{- end }}
  selector:
    matchLabels:
      {{- include "myapp.selectorLabels" . | nindent 6 }}
  template:
    metadata:
      {{- with .Values.podAnnotations }}
      annotations:
        {{- toYaml . | nindent 8 }}
      {{- end }}
      labels:
        {{- include "myapp.selectorLabels" . | nindent 8 }}
    spec:
      {{- with .Values.imagePullSecrets }}
      imagePullSecrets:
        {{- toYaml . | nindent 8 }}
      {{- end }}
      serviceAccountName: {{ include "myapp.serviceAccountName" . }}
      securityContext:
        {{- toYaml .Values.podSecurityContext | nindent 8 }}
      containers:
      - name: {{ .Chart.Name }}
        securityContext:
          {{- toYaml .Values.securityContext | nindent 10 }}
        image: "{{ .Values.image.repository }}:{{ .Values.image.tag | default .Chart.AppVersion }}"
        imagePullPolicy: {{ .Values.image.pullPolicy }}
        ports:
        - name: http
          containerPort: 8080
          protocol: TCP
        livenessProbe:
          httpGet:
            path: /health
            port: http
          initialDelaySeconds: 30
          periodSeconds: 10
          timeoutSeconds: 5
          failureThreshold: 3
        readinessProbe:
          httpGet:
            path: /ready
            port: http
          initialDelaySeconds: 10
          periodSeconds: 5
          timeoutSeconds: 3
          failureThreshold: 3
        resources:
          {{- toYaml .Values.resources | nindent 10 }}
        volumeMounts:
        - name: temp
          mountPath: /tmp
        - name: cache
          mountPath: /app/cache
      volumes:
      - name: temp
        emptyDir: {}
      - name: cache
        emptyDir: {}
      {{- with .Values.nodeSelector }}
      nodeSelector:
        {{- toYaml . | nindent 8 }}
      {{- end }}
      {{- with .Values.affinity }}
      affinity:
        {{- toYaml . | nindent 8 }}
      {{- end }}
      {{- with .Values.tolerations }}
      tolerations:
        {{- toYaml . | nindent 8 }}
      {{- end }}
```
### 5. Decision Trees
#### Automation Tool Selection
```
What to automate?
├─ Configuration management → Ansible, Chef, Puppet
├─ Infrastructure provisioning → Terraform, CloudFormation
├─ Container orchestration → Kubernetes, Docker Swarm
├─ CI/CD → Jenkins, GitLab CI, GitHub Actions
├─ Monitoring → Prometheus, Grafana, Datadog
├─ Log management → ELK Stack, Splunk
└─ Security scanning → Trivy, SonarQube
```
#### Deployment Strategy Selection
```
Deployment requirements?
├─ Zero downtime → Blue-green, Canary
├─ Quick rollback → Blue-green
├─ Gradual rollout → Canary, Rolling
├─ Simple infrastructure → Rolling
├─ Complex microservices → Canary
└─ Enterprise compliance → Blue-green with approvals
```
### 6. Anti-Patterns to Avoid
1. **Hard-coded secrets**: Use vaults, never hard-code
2. **No testing**: Always test before deploying
3. **Manual deployments**: Automate everything
4. **No rollback plan**: Always have a rollback strategy
5. **Missing monitoring**: You can't manage what you don't measure
6. **Large monoliths**: Break into smaller, deployable units
7. **No version control**: Everything must be in git
8. **Tight coupling**: Design for independence
9. **No documentation**: Document your automation
10. **Ignoring security**: Security must be built-in
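For anti-pattern 1, the minimum viable fix is resolving credentials at runtime and failing fast when they are absent. A bash sketch of that pattern follows; the variable name is illustrative, and in practice the environment would be populated by a vault agent or the CI platform's secret store rather than by `export`.

```shell
#!/usr/bin/env bash
set -u

# require_secret -- read a secret from the environment by name; fail
# loudly (instead of proceeding with an empty value) when it is missing.
require_secret() {
  local name="$1"
  if [ -z "${!name:-}" ]; then
    echo "missing required secret: $name" >&2
    return 1
  fi
  printf '%s' "${!name}"
}

# Example (the export stands in for a vault/CI secret injection):
export DB_PASSWORD="s3cret"
pass=$(require_secret DB_PASSWORD)
echo "got secret of length ${#pass}"
```

The same fail-fast idea applies to Ansible (`ansible-vault` plus mandatory variables) and Terraform (`sensitive` variables with no default).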
### 7. Quality Checklist
Before considering automation production-ready:
- [ ] All infrastructure codified
- [ ] Secrets management implemented
- [ ] Automated testing complete
- [ ] CI/CD pipeline tested
- [ ] Rollback procedure tested
- [ ] Monitoring and alerting configured
- [ ] Security scanning integrated
- [ ] Documentation complete
- [ ] Peer review completed
- [ ] Disaster recovery tested
- [ ] Configuration drift detection active
- [ ] Automation idempotent
- [ ] Error handling implemented
- [ ] Performance testing completed
- [ ] Compliance requirements met
- [ ] Backup automation configured
- [ ] Logging and auditing enabled
- [ ] Team training completed
- [ ] Runbooks documented
- [ ] SLA requirements met
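The "automation idempotent" item has a concrete shape: every action checks state before changing it, so repeated runs converge instead of compounding. A minimal bash sketch (`ensure_line` mirrors what Ansible's `lineinfile` module does):

```shell
#!/usr/bin/env bash
set -u

# ensure_line -- append a line to a file only if it is not already
# present, so running the script twice leaves the file unchanged.
ensure_line() {
  local line="$1" file="$2"
  grep -qxF "$line" "$file" 2>/dev/null || echo "$line" >> "$file"
}

cfg=$(mktemp)
ensure_line "PermitRootLogin no" "$cfg"
ensure_line "PermitRootLogin no" "$cfg"   # second run is a no-op
wc -l < "$cfg"                            # the line appears exactly once
rm -f "$cfg"
```

Auditing each task in a playbook or script against this check-then-act shape is a quick way to verify the checklist item before sign-off.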
This skill definition provides end-to-end guidance for server automation across modern infrastructure: configuration management, infrastructure as code, CI/CD, and orchestration.