Pony-Alpha-2-Dataset-Training/skills/skill-linux-server.md
Pony Alpha 2 68453089ee feat: initial Alpha Brain 2 dataset release
Massive training corpus for AI coding models containing:
- 10 JSONL training datasets (641+ examples across coding, reasoning, planning, architecture, communication, debugging, security, workflows, error handling, UI/UX)
- 11 agent behavior specifications (explorer, planner, reviewer, debugger, executor, UI designer, Linux admin, kernel engineer, security architect, automation engineer, API architect)
- 6 skill definition files (coding, API engineering, kernel, Linux server, security architecture, server automation, UI/UX)
- Master README with project origin story and philosophy

Built by Pony Alpha 2 to help AI models learn expert-level coding approaches.
2026-03-13 16:26:29 +04:00

Linux Server Administration Expert Skill

Activation Criteria

Activate this skill when the user:

  • Requests Linux server setup, configuration, or maintenance
  • Needs service management (systemd, init scripts)
  • Requires networking configuration (firewalls, routing, DNS)
  • Asks for storage management (LVM, RAID, filesystems)
  • Needs monitoring and logging solutions
  • Requires security hardening or compliance
  • Requests performance tuning or optimization
  • Needs container orchestration (Docker, Kubernetes)
  • Asks for backup and disaster recovery strategies
  • Requires incident response or troubleshooting
  • Needs automation (Ansible, bash scripts, cron)
  • Is managing: web servers, database servers, application servers, cloud instances

Core Methodology

1. System Administration Methodology

Server Setup Workflow

#!/bin/bash
# Initial Server Setup Script
# Usage: sudo ./initial-setup.sh [hostname] [timezone] [admin_user] [ssh_port]

set -euo pipefail

# Configuration
SERVER_HOSTNAME="${1:-web-server-01}"
TIMEZONE="${2:-UTC}"
ADMIN_USER="${3:-admin}"
SSH_PORT="${4:-22}"

echo "=== Initial Server Setup ==="
echo "Hostname: $SERVER_HOSTNAME"
echo "Timezone: $TIMEZONE"
echo "Admin User: $ADMIN_USER"
echo "SSH Port: $SSH_PORT"

# 1. Update System
echo "[1/8] Updating system packages..."
apt-get update && apt-get upgrade -y
apt-get install -y \
    curl \
    wget \
    git \
    vim \
    htop \
    tmux \
    net-tools \
    lsof \
    strace \
    tcpdump \
    iotop \
    sysstat \
    fail2ban \
    ufw

# 2. Set Hostname
echo "[2/8] Setting hostname..."
hostnamectl set-hostname "$SERVER_HOSTNAME"
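# Add the hostname once (re-running the script must not append duplicates)
grep -qw "$SERVER_HOSTNAME" /etc/hosts || echo "127.0.1.1 $SERVER_HOSTNAME" >> /etc/hosts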

# 3. Configure Timezone
echo "[3/8] Configuring timezone..."
timedatectl set-timezone "$TIMEZONE"

# 4. Create Admin User
echo "[4/8] Creating admin user..."
if ! id "$ADMIN_USER" &>/dev/null; then
    useradd -m -s /bin/bash "$ADMIN_USER"
    usermod -aG sudo "$ADMIN_USER"
    echo "$ADMIN_USER ALL=(ALL) NOPASSWD:ALL" > "/etc/sudoers.d/$ADMIN_USER"
    chmod 440 "/etc/sudoers.d/$ADMIN_USER"
fi

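# 5. Configure SSH
# NOTE: password logins are disabled below. Install the SSH key for
# $ADMIN_USER before running this step, or keep a console session open.
echo "[5/8] Hardening SSH configuration..."
# Match the directives whether or not they are commented out and whatever their current value
sed -i -E 's/^#?PermitRootLogin .*/PermitRootLogin no/' /etc/ssh/sshd_config
sed -i -E 's/^#?PasswordAuthentication .*/PasswordAuthentication no/' /etc/ssh/sshd_config
sed -i -E "s/^#?Port .*/Port $SSH_PORT/" /etc/ssh/sshd_config
grep -q "^AllowUsers $ADMIN_USER\$" /etc/ssh/sshd_config || echo "AllowUsers $ADMIN_USER" >> /etc/ssh/sshd_config
sshd -t                  # validate the configuration before restarting
systemctl restart ssh    # the unit is "ssh" on Debian/Ubuntu, "sshd" on RHEL-family systems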

# 6. Configure Firewall
echo "[6/8] Configuring firewall..."
ufw default deny incoming
ufw default allow outgoing
ufw allow "$SSH_PORT"/tcp
ufw allow 80/tcp
ufw allow 443/tcp
ufw --force enable

# 7. Configure Fail2Ban
echo "[7/8] Configuring Fail2Ban..."
cat > /etc/fail2ban/jail.local << 'EOF'
[DEFAULT]
bantime = 3600
findtime = 600
maxretry = 5
destemail = admin@example.com
sendername = Fail2Ban
action = %(action_mwl)s

[sshd]
enabled = true
port = ssh
filter = sshd
logpath = /var/log/auth.log
maxretry = 3
EOF

systemctl enable fail2ban
systemctl start fail2ban

# 8. Configure System Monitoring
echo "[8/8] Installing monitoring tools..."
apt-get install -y prometheus-node-exporter
systemctl enable prometheus-node-exporter
systemctl start prometheus-node-exporter

echo "=== Setup Complete ==="
echo "Next steps:"
echo "1. Copy SSH key for $ADMIN_USER"
echo "2. Test SSH connection on port $SSH_PORT"
echo "3. Reboot server"
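
The setup script disables password authentication in step 5, while copying the admin user's SSH key is only listed as a follow-up step, so a run without a key in place can lock the operator out. A minimal sketch of a pre-flight guard follows. The function name `has_authorized_key` is hypothetical, and it assumes keys live in the default `~/.ssh/authorized_keys` location.

```shell
#!/bin/bash
# Hypothetical helper (not part of the original script): succeed only if the
# given home directory holds an authorized_keys file with at least one line
# that is neither blank nor a comment.
has_authorized_key() {
    local home_dir="$1"
    local keys_file="$home_dir/.ssh/authorized_keys"
    # -s: file exists and is non-empty; grep -v finds any line that is not
    # blank and not a comment.
    [ -s "$keys_file" ] && grep -Eqv '^[[:space:]]*(#|$)' "$keys_file"
}
```

Calling `has_authorized_key "/home/$ADMIN_USER" || exit 1` just before the SSH hardening step would stop the script while password login still works.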

2. Service Management

Systemd Unit Files

Web Application Service

# /etc/systemd/system/webapp.service
[Unit]
Description=Web Application Service
After=network.target postgresql.service redis.service
Wants=postgresql.service redis.service
# Start rate limiting is configured in [Unit] on current systemd releases
StartLimitIntervalSec=200
StartLimitBurst=5

[Service]
Type=simple
User=webapp
Group=webapp
WorkingDirectory=/opt/webapp
Environment="NODE_ENV=production"
Environment="PORT=3000"
EnvironmentFile=/opt/webapp/.env

ExecStart=/usr/bin/node /opt/webapp/server.js
ExecReload=/bin/kill -HUP $MAINPID
ExecStop=/bin/kill -SIGTERM $MAINPID

Restart=always
RestartSec=10

# Security
NoNewPrivileges=true
PrivateTmp=true
ProtectSystem=strict
ProtectHome=true
ReadWritePaths=/opt/webapp/logs /opt/webapp/uploads
CapabilityBoundingSet=CAP_NET_BIND_SERVICE
AmbientCapabilities=CAP_NET_BIND_SERVICE

# Logging
StandardOutput=journal
StandardError=journal
SyslogIdentifier=webapp

[Install]
WantedBy=multi-user.target

Database Service (PostgreSQL)

# /etc/systemd/system/postgresql-custom.service
[Unit]
Description=PostgreSQL Database Server
After=network-online.target
Wants=network-online.target

[Service]
Type=notify
User=postgres
Group=postgres

Environment=PGDATA=/var/lib/postgresql/data
Environment=PGPORT=5432

ExecStart=/usr/bin/postgres -D ${PGDATA}
ExecReload=/bin/kill -HUP $MAINPID
ExecStop=/usr/bin/pg_ctl stop -D ${PGDATA} -m fast

TimeoutSec=300
OOMScoreAdjust=-1000

# Performance
LimitNOFILE=65536
LimitMEMLOCK=infinity

# Security
ProtectHome=true
ProtectSystem=strict
ReadWritePaths=/var/lib/postgresql /var/run/postgresql
PrivateDevices=true

[Install]
WantedBy=multi-user.target

Background Worker Service

# /etc/systemd/system/worker.service
[Unit]
Description=Background Job Worker
After=network.target redis.service
Wants=redis.service

[Service]
Type=forking
User=worker
Group=worker
WorkingDirectory=/opt/worker

Environment="QUEUE=default"
Environment="CONCURRENCY=4"
EnvironmentFile=/opt/worker/.env

ExecStart=/usr/bin/python3 -m worker start --daemon
ExecStop=/usr/bin/python3 -m worker stop
ExecReload=/usr/bin/python3 -m worker reload

Restart=on-failure
RestartSec=5

# Resource Limits
MemoryMax=2G
CPUQuota=200%

[Install]
WantedBy=multi-user.target

Service Management Commands

# Service Operations
systemctl start service-name
systemctl stop service-name
systemctl restart service-name
systemctl reload service-name
systemctl status service-name

# Enable/Disable Services
systemctl enable service-name    # Enable at boot
systemctl disable service-name   # Disable at boot
systemctl is-enabled service-name

# View Service Details
systemctl show service-name
journalctl -u service-name -f
journalctl -u service-name --since "1 hour ago"

# List Services
systemctl list-units --type=service
systemctl list-units --type=service --state=running
systemctl list-units --type=service --all

# Service Dependencies
systemctl list-dependencies service-name
systemctl list-dependencies --reverse service-name

# Resource Usage
systemd-cgtop
systemctl show service-name -p CPUUsageNSec,MemoryCurrent
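
The commands above are typically run by hand. In a deployment script it can help to block until a unit actually reports active before moving on. The sketch below is one way to do that. `wait_until` is a hypothetical helper name; it polls any command once per second, so it can wrap `systemctl is-active --quiet`.

```shell
#!/bin/bash
# Hypothetical helper: re-run a check command once per second until it
# succeeds or the timeout (in seconds) is reached.
# Returns 0 on success, 1 on timeout.
wait_until() {
    local timeout="$1"; shift
    local waited=0
    until "$@"; do
        if [ "$waited" -ge "$timeout" ]; then
            return 1
        fi
        sleep 1
        waited=$((waited + 1))
    done
}

# Example (not executed here; "webapp" is the unit defined earlier):
#   systemctl restart webapp
#   wait_until 30 systemctl is-active --quiet webapp || journalctl -u webapp -n 50 --no-pager
```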

3. Network Configuration

Firewall Configuration (UFW)

#!/bin/bash
# Firewall Setup Script

# Reset firewall
ufw --force reset

# Default policies
ufw default deny incoming
ufw default allow outgoing
ufw default deny routed

# Allow loopback
ufw allow in on lo

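# Established/related connections are accepted by ufw's built-in rules
# (/etc/ufw/before.rules), so no explicit rule is needed; "established"
# and "related" are not valid ufw rule arguments.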

# SSH (custom port)
ufw allow 2222/tcp comment 'SSH'

# HTTP/HTTPS
ufw allow 80/tcp comment 'HTTP'
ufw allow 443/tcp comment 'HTTPS'

# Database (internal only)
ufw allow from 10.0.0.0/8 to any port 5432 proto tcp comment 'PostgreSQL internal'

# Monitoring
ufw allow from 10.0.0.0/8 to any port 9100 proto tcp comment 'Node Exporter'

# Rate limiting (updates the plain allow rule for port 2222 added above)
ufw limit 2222/tcp comment 'Rate limit SSH'

# Enable firewall
ufw --force enable

# Show rules
ufw status numbered

iptables Configuration

#!/bin/bash
# Advanced iptables Configuration

# Flush existing rules
iptables -F
iptables -X
iptables -t nat -F
iptables -t nat -X
iptables -t mangle -F
iptables -t mangle -X

# Default policies
iptables -P INPUT DROP
iptables -P FORWARD DROP
iptables -P OUTPUT ACCEPT

# Allow loopback
iptables -A INPUT -i lo -j ACCEPT
iptables -A OUTPUT -o lo -j ACCEPT

# Allow established connections
iptables -A INPUT -m conntrack --ctstate ESTABLISHED,RELATED -j ACCEPT

# Drop invalid packets
iptables -A INPUT -m conntrack --ctstate INVALID -j DROP

# Anti-spoofing (filters must come before the service ACCEPT rules)
iptables -A INPUT -s 127.0.0.0/8 ! -i lo -j DROP
iptables -A INPUT -s 0.0.0.0/8 -j DROP
iptables -A INPUT -s 224.0.0.0/4 -j DROP

# Port scan protection
iptables -A INPUT -p tcp --tcp-flags ALL NONE -j DROP
iptables -A INPUT -p tcp --tcp-flags ALL ALL -j DROP

# SYN flood protection: SYNs within the limit return to INPUT and are
# matched by the per-port rules below; the excess is dropped. Using
# ACCEPT here would open every TCP port.
iptables -N SYN_FLOOD
iptables -A SYN_FLOOD -m limit --limit 10/s --limit-burst 20 -j RETURN
iptables -A SYN_FLOOD -j DROP
iptables -A INPUT -p tcp --syn -j SYN_FLOOD

# ICMP protection
iptables -A INPUT -p icmp --icmp-type echo-request -m limit --limit 1/s -j ACCEPT
iptables -A INPUT -p icmp --icmp-type echo-request -j DROP

# Allow SSH (rate limited)
iptables -A INPUT -p tcp --dport 2222 -m conntrack --ctstate NEW -m limit --limit 3/min --limit-burst 3 -j ACCEPT
iptables -A INPUT -p tcp --dport 2222 -j DROP

# Allow HTTP/HTTPS
iptables -A INPUT -p tcp --dport 80 -j ACCEPT
iptables -A INPUT -p tcp --dport 443 -j ACCEPT

# Log dropped packets
iptables -A INPUT -j LOG --log-prefix "[DROPPED] " --log-level 4

# Save rules (path read by iptables-persistent on Debian/Ubuntu)
mkdir -p /etc/iptables
iptables-save > /etc/iptables/rules.v4

Nginx Configuration

# /etc/nginx/nginx.conf
user www-data;  # Debian/Ubuntu packages; use "nginx" on RHEL-family or nginx.org packages
worker_processes auto;
worker_rlimit_nofile 65535;

error_log /var/log/nginx/error.log warn;
pid /run/nginx.pid;

events {
    worker_connections 4096;
    use epoll;
    multi_accept on;
}

http {
    include /etc/nginx/mime.types;
    default_type application/octet-stream;

    # Logging
    log_format main '$remote_addr - $remote_user [$time_local] "$request" '
                    '$status $body_bytes_sent "$http_referer" '
                    '"$http_user_agent" "$http_x_forwarded_for"';

    access_log /var/log/nginx/access.log main;

    # Performance
    sendfile on;
    tcp_nopush on;
    tcp_nodelay on;
    keepalive_timeout 65;
    types_hash_max_size 2048;
    server_tokens off;

    # Buffers
    client_body_buffer_size 128k;
    client_max_body_size 100m;
    client_header_buffer_size 1k;
    large_client_header_buffers 4 4k;
    output_buffers 1 32k;
    postpone_output 1460;

    # Gzip
    gzip on;
    gzip_vary on;
    gzip_proxied any;
    gzip_comp_level 6;
    gzip_types text/plain text/css text/xml text/javascript
               application/json application/javascript application/xml+rss
               application/rss+xml font/truetype font/opentype
               application/vnd.ms-fontobject image/svg+xml;
    gzip_disable "msie6";

    # Security headers
    add_header X-Frame-Options "SAMEORIGIN" always;
    add_header X-Content-Type-Options "nosniff" always;
    add_header X-XSS-Protection "1; mode=block" always;
    add_header Referrer-Policy "no-referrer-when-downgrade" always;
    add_header Content-Security-Policy "default-src 'self' http: https: data: blob: 'unsafe-inline'" always;

    # Rate limiting
    limit_req_zone $binary_remote_addr zone=api:10m rate=10r/s;
    limit_req_zone $binary_remote_addr zone=general:10m rate=5r/s;
    limit_conn_zone $binary_remote_addr zone=addr:10m;

    # Include site configs
    include /etc/nginx/conf.d/*.conf;
    include /etc/nginx/sites-enabled/*;
}

Application Server Configuration

# /etc/nginx/sites-available/app.conf
upstream app_backend {
    least_conn;
    server 127.0.0.1:3000 max_fails=3 fail_timeout=30s;
    server 127.0.0.1:3001 max_fails=3 fail_timeout=30s;
    server 127.0.0.1:3002 max_fails=3 fail_timeout=30s;
    keepalive 32;
}

# Rate limiting
limit_req_zone $binary_remote_addr zone=api_limit:10m rate=20r/s;

server {
    listen 80;
    listen [::]:80;
    server_name app.example.com;

    # Redirect to HTTPS
    return 301 https://$server_name$request_uri;
}

server {
    listen 443 ssl http2;
    listen [::]:443 ssl http2;
    server_name app.example.com;

    # SSL Configuration
    ssl_certificate /etc/ssl/certs/app.example.com.crt;
    ssl_certificate_key /etc/ssl/private/app.example.com.key;
    ssl_protocols TLSv1.2 TLSv1.3;
    ssl_ciphers 'ECDHE-ECDSA-AES128-GCM-SHA256:ECDHE-RSA-AES128-GCM-SHA256:ECDHE-ECDSA-AES256-GCM-SHA384:ECDHE-RSA-AES256-GCM-SHA384';
    ssl_prefer_server_ciphers on;
    ssl_session_cache shared:SSL:10m;
    ssl_session_timeout 10m;
    ssl_stapling on;
    ssl_stapling_verify on;

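    # Security headers
    # NOTE: add_header directives are inherited from the http level only when
    # this level defines none, so the http-level security headers are repeated
    # here. Locations that set their own add_header need them repeated as well.
    add_header Strict-Transport-Security "max-age=31536000; includeSubDomains" always;
    add_header X-Frame-Options "SAMEORIGIN" always;
    add_header X-Content-Type-Options "nosniff" always;
    add_header Referrer-Policy "no-referrer-when-downgrade" always;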

    # Root
    root /var/www/app;
    index index.html;

    # Client upload size
    client_max_body_size 100M;

    # Logging
    access_log /var/log/nginx/app.access.log;
    error_log /var/log/nginx/app.error.log;

    # Static files
    location /static/ {
        alias /var/www/app/static/;
        expires 1y;
        add_header Cache-Control "public, immutable";
        access_log off;
    }

    # Media files
    location /media/ {
        alias /var/www/app/media/;
        expires 30d;
        add_header Cache-Control "public";
    }

    # API endpoints
    location /api/ {
        limit_req zone=api_limit burst=40 nodelay;

        proxy_pass http://app_backend;
        proxy_http_version 1.1;
        # An empty Connection header is required for the upstream keepalive pool;
        # WebSocket upgrades are handled in the /ws/ location.
        proxy_set_header Connection "";
        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
        proxy_set_header X-Forwarded-Proto $scheme;

        # Timeouts
        proxy_connect_timeout 60s;
        proxy_send_timeout 60s;
        proxy_read_timeout 60s;
    }

    # WebSocket
    location /ws/ {
        proxy_pass http://app_backend;
        proxy_http_version 1.1;
        proxy_set_header Upgrade $http_upgrade;
        proxy_set_header Connection "upgrade";
        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;
        proxy_read_timeout 86400;
    }

    # Health check
    location /health {
        access_log off;
        # default_type sets the Content-Type of the response generated by "return"
        default_type text/plain;
        return 200 "healthy\n";
    }

    # Favicon
    location = /favicon.ico {
        access_log off;
        log_not_found off;
    }

    # Robots.txt
    location = /robots.txt {
        access_log off;
        log_not_found off;
    }

    # Deny access to hidden files
    location ~ /\. {
        deny all;
        access_log off;
        log_not_found off;
    }
}

4. Storage Management

LVM Configuration

#!/bin/bash
# LVM Setup Script

VG_NAME="vg01"
LV_DATA="lv-data"
LV_LOGS="lv-logs"
LV_BACKUP="lv-backup"

# Create physical volume
pvcreate /dev/sdb

# Create volume group
vgcreate $VG_NAME /dev/sdb

# Create logical volumes
lvcreate -L 100G -n $LV_DATA $VG_NAME
lvcreate -L 50G -n $LV_LOGS $VG_NAME
lvcreate -L 200G -n $LV_BACKUP $VG_NAME

# Create filesystems
mkfs.xfs /dev/$VG_NAME/$LV_DATA
mkfs.xfs /dev/$VG_NAME/$LV_LOGS
mkfs.ext4 /dev/$VG_NAME/$LV_BACKUP

# Create mount points
mkdir -p /data /logs /backup

# Update /etc/fstab
cat >> /etc/fstab << EOF
/dev/$VG_NAME/$LV_DATA    /data    xfs    defaults,noatime    0 2
/dev/$VG_NAME/$LV_LOGS    /logs    xfs    defaults,noatime    0 2
/dev/$VG_NAME/$LV_BACKUP  /backup  ext4   defaults,noatime    0 2
EOF

# Mount filesystems
mount -a

# Verify
df -h
vgdisplay
lvdisplay
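
The LVM script requests 350G of logical volumes (100G + 50G + 200G) without checking that the disk can hold them, so `lvcreate` fails partway through on a smaller device. A minimal sketch of a capacity check that could run first follows. `lv_sizes_fit` is a hypothetical helper name and all sizes are whole GiB.

```shell
#!/bin/bash
# Hypothetical helper: return 0 if the requested logical volume sizes
# (whole GiB) fit into the given capacity (whole GiB), 1 otherwise.
lv_sizes_fit() {
    local capacity_gib="$1"; shift
    local total=0 size
    for size in "$@"; do
        total=$((total + size))
    done
    [ "$total" -le "$capacity_gib" ]
}

# Example (not executed here): derive the capacity from the raw device size.
#   capacity=$(( $(blockdev --getsize64 /dev/sdb) / 1024 / 1024 / 1024 ))
#   lv_sizes_fit "$capacity" 100 50 200 || { echo "Disk too small" >&2; exit 1; }
```

LVM metadata takes a small amount of space, so a device that matches the total exactly can still fall short. Leaving some headroom is the safer choice.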

RAID Configuration

#!/bin/bash
# RAID 10 Setup Script

# Install mdadm
apt-get install -y mdadm

# Create RAID 10 array
mdadm --create /dev/md0 \
  --level=10 \
  --raid-devices=4 \
  /dev/sdb /dev/sdc /dev/sdd /dev/sde

# Create filesystem
mkfs.ext4 /dev/md0

# Create mount point
mkdir -p /raid

# Update /etc/fstab
echo "/dev/md0 /raid ext4 defaults,noatime 0 2" >> /etc/fstab

# Mount
mount /raid

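# Save RAID configuration and rebuild the initramfs so the array keeps
# its /dev/md0 name after a reboot
mdadm --detail --scan >> /etc/mdadm/mdadm.conf
update-initramfs -u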

# Verify
cat /proc/mdstat
mdadm --detail /dev/md0

5. Monitoring and Logging

Prometheus Configuration

# /etc/prometheus/prometheus.yml
global:
  scrape_interval: 15s
  evaluation_interval: 15s
  external_labels:
    cluster: 'production'
    replica: '1'

# Alertmanager configuration
alerting:
  alertmanagers:
    - static_configs:
        - targets:
            - localhost:9093

# Rule files
rule_files:
  - '/etc/prometheus/rules/*.yml'

# Scrape configurations
scrape_configs:
  # Prometheus itself
  - job_name: 'prometheus'
    static_configs:
      - targets: ['localhost:9090']

  # Node Exporter
  - job_name: 'node'
    static_configs:
      - targets: ['localhost:9100']
    relabel_configs:
      - source_labels: [__address__]
        target_label: instance
        replacement: 'web-server-01'

  # Nginx Exporter
  - job_name: 'nginx'
    static_configs:
      - targets: ['localhost:9113']

  # PostgreSQL Exporter
  - job_name: 'postgresql'
    static_configs:
      - targets: ['localhost:9187']

  # Application metrics
  - job_name: 'application'
    static_configs:
      - targets: ['localhost:3000']
    metrics_path: '/metrics'

Alert Rules

# /etc/prometheus/rules/alerts.yml
groups:
  - name: system
    interval: 30s
    rules:
      # CPU usage
      - alert: HighCPUUsage
        expr: 100 - (avg by(instance) (rate(node_cpu_seconds_total{mode="idle"}[5m])) * 100) > 80
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "High CPU usage on {{ $labels.instance }}"
          description: "CPU usage is {{ $value }}%"

      # Memory usage
      - alert: HighMemoryUsage
        expr: (1 - (node_memory_MemAvailable_bytes / node_memory_MemTotal_bytes)) * 100 > 90
        for: 5m
        labels:
          severity: critical
        annotations:
          summary: "High memory usage on {{ $labels.instance }}"
          description: "Memory usage is {{ $value }}%"

      # Disk space
      - alert: LowDiskSpace
        expr: (node_filesystem_avail_bytes{mountpoint="/"} / node_filesystem_size_bytes{mountpoint="/"}) * 100 < 10
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "Low disk space on {{ $labels.instance }}"
          description: "Only {{ $value }}% free space on /"

  - name: application
    interval: 30s
    rules:
      # HTTP error rate
      - alert: HighErrorRate
        expr: (sum(rate(http_requests_total{status=~"5.."}[5m])) / sum(rate(http_requests_total[5m]))) * 100 > 5
        for: 5m
        labels:
          severity: critical
        annotations:
          summary: "High HTTP error rate"
          description: "Error rate is {{ $value }}%"

      # Response time
      - alert: SlowResponseTime
        expr: histogram_quantile(0.95, sum by (le) (rate(http_request_duration_seconds_bucket[5m]))) > 1
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "Slow response time"
          description: "P95 response time is {{ $value }}s"

Log Rotation

# /etc/logrotate.d/custom
/var/log/myapp/*.log {
    daily
    missingok
    rotate 14
    compress
    delaycompress
    notifempty
    create 0640 www-data adm
    sharedscripts
    postrotate
        systemctl reload myapp > /dev/null 2>&1 || true
    endscript
}

6. Security Hardening

SSH Hardening

# /etc/ssh/sshd_config

# Basic
Port 2222
HostKey /etc/ssh/ssh_host_ed25519_key
HostKey /etc/ssh/ssh_host_rsa_key

# Logging
SyslogFacility AUTHPRIV
LogLevel VERBOSE

# Authentication
LoginGraceTime 60
PermitRootLogin no
StrictModes yes
MaxAuthTries 3
MaxStartups 10:30:60
PasswordAuthentication no
PermitEmptyPasswords no
KbdInteractiveAuthentication no

# Key-based authentication
PubkeyAuthentication yes
AuthorizedKeysFile .ssh/authorized_keys

# Security
UsePAM yes
X11Forwarding no
AllowTcpForwarding no
AllowAgentForwarding no
PermitTunnel no
GatewayPorts no

# Restrict users
AllowUsers admin deploy
DenyUsers root

# Ciphers
Ciphers chacha20-poly1305@openssh.com,aes256-gcm@openssh.com,aes128-gcm@openssh.com

# MACs
MACs hmac-sha2-512-etm@openssh.com,hmac-sha2-256-etm@openssh.com

# Kex
KexAlgorithms curve25519-sha256@libssh.org,diffie-hellman-group-exchange-sha256

# After editing, validate and apply from a shell (these are commands, not sshd_config directives):
#   sshd -t && systemctl restart ssh

System Hardening Script

#!/bin/bash
# System Security Hardening

set -euo pipefail

echo "=== System Security Hardening ==="

# 1. Disable unused filesystems
echo "[1/10] Disabling unused filesystems..."
cat > /etc/modprobe.d/disable-filesystems.conf << 'EOF'
install cramfs /bin/true
install freevxfs /bin/true
install jffs2 /bin/true
install hfsplus /bin/true
# squashfs is needed by snap packages; drop the next line on hosts that use snapd
install squashfs /bin/true
install udf /bin/true
EOF

# 2. Kernel hardening
echo "[2/10] Configuring kernel parameters..."
cat > /etc/sysctl.d/99-security.conf << 'EOF'
# Network
net.ipv4.ip_forward = 0
net.ipv4.conf.all.send_redirects = 0
net.ipv4.conf.default.send_redirects = 0
net.ipv4.conf.all.accept_source_route = 0
net.ipv4.conf.default.accept_source_route = 0
net.ipv4.conf.all.accept_redirects = 0
net.ipv4.conf.default.accept_redirects = 0
net.ipv4.conf.all.secure_redirects = 0
net.ipv4.conf.default.secure_redirects = 0
net.ipv4.conf.all.rp_filter = 1
net.ipv4.conf.default.rp_filter = 1
net.ipv4.icmp_echo_ignore_broadcasts = 1
net.ipv4.icmp_ignore_bogus_error_responses = 1
net.ipv4.tcp_syncookies = 1
net.ipv4.tcp_max_syn_backlog = 2048
net.ipv4.tcp_synack_retries = 2
net.ipv4.tcp_syn_retries = 5

# IPv6
net.ipv6.conf.all.accept_redirects = 0
net.ipv6.conf.default.accept_redirects = 0

# Kernel
kernel.randomize_va_space = 2
kernel.kptr_restrict = 2
kernel.dmesg_restrict = 1
kernel.perf_event_paranoid = 2

# Core dumps
fs.suid_dumpable = 0
kernel.core_pattern = |/bin/false
EOF

sysctl -p /etc/sysctl.d/99-security.conf

# 3. Restrict core dumps
echo "[3/10] Restricting core dumps..."
cat > /etc/security/limits.d/50-core.conf << 'EOF'
* hard core 0
EOF

# 4. Disable USB storage
echo "[4/10] Disabling USB storage..."
echo "install usb-storage /bin/true" > /etc/modprobe.d/disable-usb.conf

# 5. Configure PAM
# WARNING: this replaces the distribution's login stack. Keep the backup and
# an open root shell while testing; a broken stack blocks console logins.
echo "[5/10] Configuring PAM..."
cp -a /etc/pam.d/login /etc/pam.d/login.bak
cat > /etc/pam.d/login << 'EOF'
auth required pam_unix.so
account required pam_unix.so
password required pam_unix.so sha512 shadow
session required pam_unix.so
session required pam_limits.so
EOF

# 6. Disable unnecessary services
echo "[6/10] Disabling unnecessary services..."
systemctl disable cups 2>/dev/null || true
systemctl disable avahi-daemon 2>/dev/null || true
systemctl disable bluetooth 2>/dev/null || true

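# 7. Install security updates
echo "[7/10] Installing security updates..."
apt-get install -y unattended-upgrades
# Enable the periodic run without the interactive dpkg-reconfigure prompt
cat > /etc/apt/apt.conf.d/20auto-upgrades << 'EOF'
APT::Periodic::Update-Package-Lists "1";
APT::Periodic::Unattended-Upgrade "1";
EOF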

# 8. Configure automatic updates
echo "[8/10] Configuring automatic updates..."
cat > /etc/apt/apt.conf.d/50unattended-upgrades << 'EOF'
Unattended-Upgrade::Allowed-Origins {
    "${distro_id}:${distro_codename}-security";
};
Unattended-Upgrade::AutoFixInterruptedDpkg "true";
Unattended-Upgrade::MinimalSteps "true";
Unattended-Upgrade::Remove-Unused-Kernel-Packages "true";
Unattended-Upgrade::Remove-Unused-Dependencies "true";
Unattended-Upgrade::Automatic-Reboot "false";
Unattended-Upgrade::Automatic-Reboot-Time "02:00";
EOF

# 9. Install auditd
echo "[9/10] Installing audit daemon..."
apt-get install -y auditd audispd-plugins
systemctl enable auditd
systemctl start auditd

# 10. Configure audit rules
echo "[10/10] Configuring audit rules..."
cat > /etc/audit/rules.d/audit.rules << 'EOF'
-w /etc/passwd -p wa -k identity
-w /etc/group -p wa -k identity
-w /etc/shadow -p wa -k identity
-w /etc/sudoers -p wa -k sudoers
-w /var/log/audit/ -p wa -k audit-logs
-w /etc/ssh/sshd_config -p wa -k sshd
-w /var/log/auth.log -p wa -k auth-log
EOF

systemctl restart auditd

echo "=== Hardening Complete ==="

7. Performance Tuning

System Performance Tuning

#!/bin/bash
# Performance Tuning Script

# 1. I/O Scheduler
echo "[1/5] Configuring I/O scheduler..."
# Multi-queue kernels (5.0 and later) name this scheduler "mq-deadline"
echo 'ACTION=="add|change", KERNEL=="sd[a-z]", ATTR{queue/scheduler}="mq-deadline"' \
  > /etc/udev/rules.d/60-scheduler.rules

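# 2. Filesystem optimization
echo "[2/5] Optimizing filesystem..."
# Edit the existing root entry in place. Overwriting /etc/fstab with a
# placeholder UUID would leave the system unbootable.
cp -a /etc/fstab /etc/fstab.bak
# Append noatime (which implies nodiratime) to the ext4 root entry unless already set
sed -i -E '/noatime/! s|^([^#[:space:]]+[[:space:]]+/[[:space:]]+ext4[[:space:]]+)([^[:space:]]+)|\1\2,noatime|' /etc/fstab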

# 3. Network tuning
echo "[3/5] Tuning network parameters..."
cat > /etc/sysctl.d/99-performance.conf << 'EOF'
# TCP
net.core.rmem_max = 16777216
net.core.wmem_max = 16777216
net.ipv4.tcp_rmem = 4096 87380 16777216
net.ipv4.tcp_wmem = 4096 65536 16777216
net.ipv4.tcp_fin_timeout = 30
net.ipv4.tcp_keepalive_time = 120
net.ipv4.tcp_tw_reuse = 1
net.ipv4.tcp_max_syn_backlog = 4096
net.core.somaxconn = 1024
net.ipv4.tcp_max_tw_buckets = 400000
net.ipv4.tcp_no_metrics_save = 1
net.core.netdev_max_backlog = 16384
EOF

sysctl -p /etc/sysctl.d/99-performance.conf

# 4. Process limits
echo "[4/5] Configuring process limits..."
cat > /etc/security/limits.d/50-performance.conf << 'EOF'
* soft nofile 65536
* hard nofile 65536
* soft nproc 65536
* hard nproc 65536
* soft memlock unlimited
* hard memlock unlimited
EOF

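# 5. Swapping
echo "[5/5] Configuring swap..."
# A drop-in file is overwritten on re-runs instead of accumulating duplicate lines
echo "vm.swappiness = 10" > /etc/sysctl.d/99-swappiness.conf
sysctl -p /etc/sysctl.d/99-swappiness.conf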

echo "=== Performance Tuning Complete ==="
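
After applying the tuning it is worth confirming that the kernel actually holds the intended values, since a later drop-in file or an unsupported key can silently override them. The sketch below reads the values straight from `/proc/sys`. `sysctl_matches` is a hypothetical helper name, and the optional third argument (an alternative root directory) exists only to make the function testable.

```shell
#!/bin/bash
# Hypothetical helper: compare a sysctl key with an expected value by reading
# the matching file under /proc/sys (or another root passed as third argument).
# Multi-value keys such as net.ipv4.tcp_rmem are normalized to single spaces.
sysctl_matches() {
    local key="$1" expected="$2" root="${3:-/proc/sys}"
    local path="$root/${key//./\/}"    # vm.swappiness -> vm/swappiness
    local actual
    [ -r "$path" ] || return 1
    actual=$(tr -s '[:space:]' ' ' < "$path" | sed 's/ $//')
    [ "$actual" = "$expected" ]
}

# Example (not executed here):
#   sysctl_matches vm.swappiness 10 || echo "vm.swappiness not applied" >&2
#   sysctl_matches net.ipv4.tcp_rmem "4096 87380 16777216" || echo "tcp_rmem not applied" >&2
```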

8. Container Management

Docker Daemon Configuration

/etc/docker/daemon.json (file path only; JSON has no comment syntax, so keep this line out of the file)
{
  "log-driver": "json-file",
  "log-opts": {
    "max-size": "10m",
    "max-file": "3"
  },
  "storage-driver": "overlay2",
  "live-restore": true,
  "userland-proxy": false,
  "no-new-privileges": true,
  "icc": false,
  "default-ulimits": {
    "nofile": {
      "Name": "nofile",
      "Hard": 64000,
      "Soft": 64000
    }
  },
  "registry-mirrors": [
    "https://mirror.gcr.io"
  ],
  "metrics-addr": "127.0.0.1:9323",
  "experimental": false
}

Docker Compose Production Template

# docker-compose.prod.yml
version: '3.8'

services:
  app:
    image: myapp:${VERSION:-latest}
    restart: always
    deploy:
      resources:
        limits:
          cpus: '2'
          memory: 2G
        reservations:
          cpus: '1'
          memory: 1G
    environment:
      - NODE_ENV=production
      - DATABASE_URL=postgres://db:5432/app
    env_file:
      - .env.prod
    volumes:
      - app-uploads:/app/uploads
      - app-logs:/app/logs
    networks:
      - frontend
      - backend
    healthcheck:
      test: ["CMD", "wget", "-q", "--spider", "http://localhost:3000/health"]
      interval: 30s
      timeout: 10s
      retries: 3
      start_period: 40s
    logging:
      driver: "json-file"
      options:
        max-size: "10m"
        max-file: "3"

  nginx:
    image: nginx:alpine
    restart: always
    ports:
      - "80:80"
      - "443:443"
    volumes:
      - ./nginx.conf:/etc/nginx/nginx.conf:ro
      - ./ssl:/etc/nginx/ssl:ro
      - app-static:/app/static:ro
    networks:
      - frontend
    depends_on:
      - app

  postgres:
    image: postgres:15-alpine
    restart: always
    environment:
      - POSTGRES_DB=app
      - POSTGRES_USER=app
      - POSTGRES_PASSWORD_FILE=/run/secrets/db_password
    secrets:
      - db_password
    volumes:
      - postgres-data:/var/lib/postgresql/data
    networks:
      - backend
    healthcheck:
      test: ["CMD-SHELL", "pg_isready -U app -d app"]
      interval: 10s
      timeout: 5s
      retries: 5

  redis:
    image: redis:7-alpine
    restart: always
    command: redis-server --appendonly yes
    volumes:
      - redis-data:/data
    networks:
      - backend

volumes:
  app-uploads:
  app-logs:
  app-static:
  postgres-data:
  redis-data:

networks:
  frontend:
    driver: bridge
  backend:
    driver: bridge
    internal: true

secrets:
  db_password:
    file: ./secrets/db_password.txt

9. Backup Strategies

Automated Backup Script

#!/bin/bash
# Automated Backup Script

set -euo pipefail

# Configuration
BACKUP_DIR="/backup"
RETENTION_DAYS=30
S3_BUCKET="s3://my-backups"
DATE=$(date +%Y%m%d_%H%M%S)

# Create backup directory
mkdir -p "$BACKUP_DIR/$DATE"

# 1. Database backup
echo "Backing up database..."
pg_dump -U postgres -h localhost mydb | gzip > "$BACKUP_DIR/$DATE/database.sql.gz"

# 2. Files backup
echo "Backing up files..."
tar -czf "$BACKUP_DIR/$DATE/files.tar.gz" /var/www/app /etc/nginx /etc/ssh

# 3. Configuration backup
echo "Backing up configurations..."
tar -czf "$BACKUP_DIR/$DATE/config.tar.gz" /etc /opt

# 4. Upload to S3
echo "Uploading to S3..."
aws s3 sync "$BACKUP_DIR/$DATE" "$S3_BUCKET/$DATE/"

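# 5. Clean old backups
echo "Cleaning old backups..."
# Only top-level backup directories, never $BACKUP_DIR itself
find "$BACKUP_DIR" -mindepth 1 -maxdepth 1 -type d -mtime +"$RETENTION_DAYS" -exec rm -rf {} +
cutoff_date=$(date -d "$RETENTION_DAYS days ago" +%s)
aws s3 ls "$S3_BUCKET/" | while read -r line; do
  # Prefix lines look like "PRE 20260313_162629/"
  dir_name=$(echo "$line" | awk '$1 == "PRE" {print $2}' | tr -d '/')
  [ -n "$dir_name" ] || continue
  # date(1) cannot parse YYYYMMDD_HHMMSS, so use only the date part; skip
  # entries that do not parse rather than treating them as expired
  if ! dir_epoch=$(date -d "${dir_name%%_*}" +%s 2>/dev/null); then
    echo "Skipping unrecognized entry: $dir_name" >&2
    continue
  fi
  if [ "$dir_epoch" -lt "$cutoff_date" ]; then
    aws s3 rm "$S3_BUCKET/$dir_name/" --recursive
  fi
done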

echo "Backup completed: $DATE"
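
The backup script reports success as soon as the archives are written and uploaded, but it never confirms that they can be read back. A minimal sketch of an integrity check follows, assuming all archives in a backup directory end in `.gz` as they do above. `verify_backup_dir` is a hypothetical helper name.

```shell
#!/bin/bash
# Hypothetical helper: test every .gz archive in a backup directory with
# "gzip -t". Returns 1 if the directory holds no archives or any is corrupt.
verify_backup_dir() {
    local dir="$1" file status=0
    for file in "$dir"/*.gz; do
        # Without a match the glob stays literal, so the file does not exist
        [ -e "$file" ] || { echo "No archives found in $dir" >&2; return 1; }
        if ! gzip -t "$file" 2>/dev/null; then
            echo "CORRUPT: $file" >&2
            status=1
        fi
    done
    return "$status"
}

# Example (not executed here): run before the S3 upload step.
#   verify_backup_dir "$BACKUP_DIR/$DATE" || exit 1
```

This only proves the compressed stream is intact. A periodic test restore into a scratch database remains the stronger check.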

10. Incident Response

Incident Response Checklist

#!/bin/bash
# Incident Response Script

echo "=== Incident Response Checklist ==="

# 1. Check system status
echo "[1/10] Checking system status..."
uptime
free -h
df -h

# 2. Check service status
echo "[2/10] Checking service status..."
systemctl list-units --type=service --state=failed

# 3. Check recent logs
echo "[3/10] Checking recent logs..."
journalctl -p err -n 50 --no-pager

# 4. Check network connections
echo "[4/10] Checking network connections..."
ss -tunap
netstat -tunap

# 5. Check CPU usage
echo "[5/10] Checking CPU usage..."
top -bn1 | head -20

# 6. Check memory usage
echo "[6/10] Checking memory usage..."
ps aux --sort=-%mem | head -10

# 7. Check disk I/O
echo "[7/10] Checking disk I/O..."
iotop -b -n 1 | head -20

# 8. Check failed login attempts
echo "[8/10] Checking failed logins..."
grep "Failed password" /var/log/auth.log | tail -20

# 9. Check firewall status
echo "[9/10] Checking firewall status..."
ufw status

# 10. Save system state
echo "[10/10] Saving system state..."
STATE_FILE="/tmp/incident-state-$(date +%Y%m%d_%H%M%S).txt"
{
  echo "=== System State ==="
  uptime
  free -h
  df -h
  echo ""
  echo "=== Failed Services ==="
  systemctl list-units --type=service --state=failed
  echo ""
  echo "=== Recent Errors ==="
  journalctl -p err -n 20 --no-pager
  echo ""
  echo "=== Network Connections ==="
  ss -tunap
} > "$STATE_FILE"

echo "=== Incident Response Complete ==="
echo "State saved to: $STATE_FILE"
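
Step 8 of the checklist prints raw failed-login lines, which is hard to read during a brute-force attempt. The sketch below groups them by source address instead. `top_failed_logins` is a hypothetical helper name, and it assumes the Debian/Ubuntu `auth.log` line format, where the address follows the word `from`.

```shell
#!/bin/bash
# Hypothetical helper: print "count address" for the most frequent sources of
# failed SSH password attempts, most frequent first.
# Arguments: log file (default /var/log/auth.log), number of rows (default 10).
top_failed_logins() {
    local log_file="${1:-/var/log/auth.log}" limit="${2:-10}"
    grep "Failed password" "$log_file" \
        | awk '{ for (i = 1; i < NF; i++) if ($i == "from") print $(i + 1) }' \
        | sort | uniq -c | sort -rn | head -n "$limit" \
        | awk '{ print $1, $2 }'
}

# Example (not executed here):
#   top_failed_logins /var/log/auth.log 5
```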

11. Decision Trees

Monitoring Solution Selection

What to monitor?
│
├─ System metrics → Prometheus + Node Exporter
├─ Application metrics → Prometheus + Custom Exporter
├─ Logs → ELK Stack / Loki
├─ Tracing → Jaeger / Tempo
├─ Uptime → Uptime Robot / Pingdom
└─ Security → Wazuh / OSSEC

Backup Strategy Selection

Data type?
│
├─ Database → Logical backup (pg_dump/mysqldump) + Physical backup
├─ Files → Rsync + Snapshots
├─ Configuration → Git + Version control
├─ Entire server → Full disk image (AMIs, snapshots)
└─ Disaster recovery → Off-site replication + DR drills

12. Anti-Patterns to Avoid

  1. No Monitoring: Never deploy without monitoring and alerting
  2. Weak Authentication: Always use SSH keys, never password auth
  3. No Backups: Always implement 3-2-1 backup strategy
  4. Ignoring Logs: Regular review of logs prevents incidents
  5. No Documentation: Document all configurations and changes
  6. No Testing: Test all changes in staging first
  7. No Rollback Plan: Always have a rollback plan for changes
  8. No Security Updates: Keep system updated with security patches
  9. No Resource Limits: Set limits on all services
  10. No Incident Response: Have a plan before incidents occur

13. Quality Checklist

Before considering server management complete:

  • All services have systemd unit files
  • Firewall configured and enabled
  • SSH hardened (key-based only)
  • Automatic security updates configured
  • Monitoring and alerting set up
  • Log rotation configured
  • Backup strategy implemented and tested
  • Documentation up to date
  • Incident response plan in place
  • Resource limits configured
  • SSL/TLS certificates valid
  • Security hardening applied
  • Performance tuning applied
  • Disaster recovery tested
  • Compliance requirements met
  • Network segmentation implemented
  • Vulnerability scanning performed
  • Access controls implemented
  • Change management process in place
  • Automation implemented for repetitive tasks
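
Several items on this checklist can be verified mechanically. The sketch below shows one possible shape for such a script: a small runner that executes a command per item and counts failures. The `check` function and the `failures` counter are hypothetical names, and the example commands in the comments are illustrations that would need adapting per host.

```shell
#!/bin/bash
# Hypothetical checklist runner: each check is a name plus a command.
# A zero exit status counts as PASS, anything else as FAIL.
failures=0

check() {
    local name="$1"; shift
    if "$@" >/dev/null 2>&1; then
        echo "PASS  $name"
    else
        echo "FAIL  $name"
        failures=$((failures + 1))
    fi
}

# Examples (not executed here):
#   check "Firewall enabled"        sh -c 'ufw status | grep -q "Status: active"'
#   check "Fail2Ban running"        systemctl is-active --quiet fail2ban
#   check "Node exporter running"   systemctl is-active --quiet prometheus-node-exporter
#   check "No failed units"         sh -c '! systemctl --failed --no-legend | grep -q .'
#   [ "$failures" -eq 0 ] || exit 1
```

Items such as documentation, change management, and tested disaster recovery still need a human review; the runner only covers what a command can observe.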

This skill definition covers Linux server administration for production environments, from initial server setup through monitoring, hardening, backups, and incident response.