
Pony Alpha 2 68453089ee feat: initial Alpha Brain 2 dataset release

Massive training corpus for AI coding models containing:
- 10 JSONL training datasets (641+ examples across coding, reasoning, planning, architecture, communication, debugging, security, workflows, error handling, UI/UX)
- 11 agent behavior specifications (explorer, planner, reviewer, debugger, executor, UI designer, Linux admin, kernel engineer, security architect, automation engineer, API architect)
- 6 skill definition files (coding, API engineering, kernel, Linux server, security architecture, server automation, UI/UX)
- Master README with project origin story and philosophy

Built by Pony Alpha 2 to help AI models learn expert-level coding approaches.

2026-03-13 16:26:29 +04:00

Linux Server Administration Expert Skill

Activation Criteria

Activate this skill when the user:

Requests Linux server setup, configuration, or maintenance
Needs service management (systemd, init scripts)
Requires networking configuration (firewalls, routing, DNS)
Asks for storage management (LVM, RAID, filesystems)
Needs monitoring and logging solutions
Requires security hardening or compliance
Requests performance tuning or optimization
Needs container orchestration (Docker, Kubernetes)
Asks for backup and disaster recovery strategies
Requires incident response or troubleshooting
Needs automation (Ansible, bash scripts, cron)
Is managing: web servers, database servers, application servers, cloud instances

Core Methodology

1. System Administration Methodology

Server Setup Workflow

#!/bin/bash
# Initial Server Setup Script
# Usage: sudo ./initial-setup.sh [hostname] [timezone] [admin_user] [ssh_port]

set -euo pipefail

# Configuration
SERVER_HOSTNAME="${1:-web-server-01}"
TIMEZONE="${2:-UTC}"
ADMIN_USER="${3:-admin}"
SSH_PORT="${4:-22}"

echo "=== Initial Server Setup ==="
echo "Hostname: $SERVER_HOSTNAME"
echo "Timezone: $TIMEZONE"
echo "Admin User: $ADMIN_USER"
echo "SSH Port: $SSH_PORT"

# 1. Update System
echo "[1/8] Updating system packages..."
export DEBIAN_FRONTEND=noninteractive
apt-get update && apt-get upgrade -y
apt-get install -y \
    curl \
    wget \
    git \
    vim \
    htop \
    tmux \
    net-tools \
    lsof \
    strace \
    tcpdump \
    iotop \
    sysstat \
    fail2ban \
    ufw

# 2. Set Hostname
echo "[2/8] Setting hostname..."
hostnamectl set-hostname "$SERVER_HOSTNAME"
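# Debian/Ubuntu convention: map the hostname to 127.0.1.1, without adding duplicates on re-runs
if grep -q '^127\.0\.1\.1' /etc/hosts; then
    sed -i "s/^127\.0\.1\.1.*/127.0.1.1 $SERVER_HOSTNAME/" /etc/hosts
else
    echo "127.0.1.1 $SERVER_HOSTNAME" >> /etc/hosts
fi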

# 3. Configure Timezone
echo "[3/8] Configuring timezone..."
timedatectl set-timezone "$TIMEZONE"

# 4. Create Admin User
echo "[4/8] Creating admin user..."
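if ! id "$ADMIN_USER" &>/dev/null; then
    useradd -m -s /bin/bash "$ADMIN_USER"
    usermod -aG sudo "$ADMIN_USER"
    echo "$ADMIN_USER ALL=(ALL) NOPASSWD:ALL" > "/etc/sudoers.d/$ADMIN_USER"
    chmod 440 "/etc/sudoers.d/$ADMIN_USER"
    # A broken sudoers fragment can disable sudo entirely; validate it
    visudo -cf "/etc/sudoers.d/$ADMIN_USER" || { rm -f "/etc/sudoers.d/$ADMIN_USER"; exit 1; }

    # Password logins are disabled in step 5, so give the new user a key now
    # (here: root's authorized_keys, if present) to avoid locking yourself out
    if [ -f /root/.ssh/authorized_keys ]; then
        install -d -m 700 -o "$ADMIN_USER" -g "$ADMIN_USER" "/home/$ADMIN_USER/.ssh"
        install -m 600 -o "$ADMIN_USER" -g "$ADMIN_USER" /root/.ssh/authorized_keys "/home/$ADMIN_USER/.ssh/authorized_keys"
    fi
fi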

# 5. Configure SSH
echo "[5/8] Hardening SSH configuration..."
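# Match commented or active lines whatever their current value
# (e.g. Ubuntu ships "#PermitRootLogin prohibit-password", not "#PermitRootLogin yes").
# If sshd_config starts with "Include /etc/ssh/sshd_config.d/*.conf", values set in
# those files take precedence, because sshd uses the first value it reads.
sed -i -E 's/^#?PermitRootLogin .*/PermitRootLogin no/' /etc/ssh/sshd_config
sed -i -E 's/^#?PasswordAuthentication .*/PasswordAuthentication no/' /etc/ssh/sshd_config
sed -i -E "s/^#?Port .*/Port $SSH_PORT/" /etc/ssh/sshd_config
grep -q '^AllowUsers' /etc/ssh/sshd_config || echo "AllowUsers $ADMIN_USER" >> /etc/ssh/sshd_config
# Validate before restarting so a syntax error cannot take SSH down
sshd -t
systemctl restart sshd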

# 6. Configure Firewall
echo "[6/8] Configuring firewall..."
ufw default deny incoming
ufw default allow outgoing
ufw allow "$SSH_PORT"/tcp
ufw allow 80/tcp
ufw allow 443/tcp
ufw --force enable

# 7. Configure Fail2Ban
echo "[7/8] Configuring Fail2Ban..."
# Unquoted heredoc so $SSH_PORT expands: the jail must watch the port sshd actually uses
cat > /etc/fail2ban/jail.local << EOF
[DEFAULT]
bantime = 3600
findtime = 600
maxretry = 5
destemail = admin@example.com
sendername = Fail2Ban
action = %(action_mwl)s

[sshd]
enabled = true
port = $SSH_PORT
filter = sshd
logpath = /var/log/auth.log
maxretry = 3
EOF

systemctl enable fail2ban
systemctl start fail2ban

# 8. Configure System Monitoring
echo "[8/8] Installing monitoring tools..."
apt-get install -y prometheus-node-exporter
systemctl enable prometheus-node-exporter
systemctl start prometheus-node-exporter

echo "=== Setup Complete ==="
echo "Next steps:"
echo "1. Copy SSH key for $ADMIN_USER"
echo "2. Test SSH connection on port $SSH_PORT"
echo "3. Reboot server"
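The script passes its positional arguments straight into `hostnamectl`, `sshd_config` and UFW. A minimal pre-flight check could reject bad values before anything changes. This is a sketch: the helper names `valid_port` and `valid_hostname` are hypothetical and not part of the original script.

```bash
#!/bin/bash
# Hypothetical pre-flight helpers for initial-setup.sh.

# Return 0 if $1 is an integer TCP port in 1..65535.
valid_port() {
    [[ "$1" =~ ^[0-9]{1,5}$ ]] && (( 10#$1 >= 1 && 10#$1 <= 65535 ))
}

# Return 0 if $1 is a plausible hostname: dot-separated labels of 1-63
# letters, digits or hyphens, not starting or ending with a hyphen.
valid_hostname() {
    local label='[A-Za-z0-9]([A-Za-z0-9-]{0,61}[A-Za-z0-9])?'
    [[ ${#1} -le 253 && "$1" =~ ^${label}(\.${label})*$ ]]
}

# Example: abort before step 1 instead of half-configuring the server.
#   valid_port "$SSH_PORT" || { echo "Invalid SSH port: $SSH_PORT" >&2; exit 1; }
#   valid_hostname "$SERVER_HOSTNAME" || { echo "Invalid hostname" >&2; exit 1; }
```

Running both checks right after the configuration block prevents a typo in the port from leaving the firewall and sshd out of sync.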

2. Service Management

Systemd Unit Files

Web Application Service

# /etc/systemd/system/webapp.service
[Unit]
Description=Web Application Service
After=network.target postgresql.service redis.service
Wants=postgresql.service redis.service
# Start-rate limiting belongs in [Unit] on current systemd versions
StartLimitIntervalSec=200
StartLimitBurst=5

[Service]
Type=simple
User=webapp
Group=webapp
WorkingDirectory=/opt/webapp
Environment="NODE_ENV=production"
Environment="PORT=3000"
EnvironmentFile=/opt/webapp/.env

ExecStart=/usr/bin/node /opt/webapp/server.js
# Only useful if server.js handles SIGHUP; otherwise the signal terminates Node
ExecReload=/bin/kill -HUP $MAINPID
# No ExecStop needed: systemd sends SIGTERM to the main process on stop

Restart=always
RestartSec=10

# Security
NoNewPrivileges=true
PrivateTmp=true
ProtectSystem=strict
ProtectHome=true
ReadWritePaths=/opt/webapp/logs /opt/webapp/uploads
# Only needed to bind ports below 1024; PORT=3000 does not require it
CapabilityBoundingSet=CAP_NET_BIND_SERVICE
AmbientCapabilities=CAP_NET_BIND_SERVICE

# Logging
StandardOutput=journal
StandardError=journal
SyslogIdentifier=webapp

[Install]
WantedBy=multi-user.target

Database Service (PostgreSQL)

# /etc/systemd/system/postgresql-custom.service
[Unit]
Description=PostgreSQL Database Server
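After=network-online.target
Wants=network-online.target

[Service]
Type=notify
User=postgres
Group=postgres

Environment=PGDATA=/var/lib/postgresql/data
Environment=PGPORT=5432
# Protect the postmaster from the OOM killer but let backends be killed normally
Environment=PG_OOM_ADJUST_FILE=/proc/self/oom_score_adj
Environment=PG_OOM_ADJUST_VALUE=0

# Binary path varies by distribution (e.g. /usr/lib/postgresql/<version>/bin on Debian);
# Type=notify requires a server built with systemd support
ExecStart=/usr/bin/postgres -D ${PGDATA}
ExecReload=/bin/kill -HUP $MAINPID
# SIGINT triggers a "fast" shutdown; mixed mode sends it to the postmaster only
KillMode=mixed
KillSignal=SIGINT

TimeoutSec=300
OOMScoreAdjust=-1000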

# Performance
LimitNOFILE=65536
LimitMEMLOCK=infinity

# Security
ProtectHome=true
ProtectSystem=strict
ReadWritePaths=/var/lib/postgresql /var/run/postgresql
PrivateDevices=true

[Install]
WantedBy=multi-user.target

Background Worker Service

# /etc/systemd/system/worker.service
[Unit]
Description=Background Job Worker
After=network.target redis.service
Wants=redis.service

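[Service]
# Type=forking relies on the process daemonizing itself; PIDFile= lets systemd track
# the right process (the path assumes the worker writes its PID there).
# If the worker can run in the foreground, Type=simple without --daemon is simpler.
Type=forking
PIDFile=/run/worker/worker.pid
RuntimeDirectory=worker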
User=worker
Group=worker
WorkingDirectory=/opt/worker

Environment="QUEUE=default"
Environment="CONCURRENCY=4"
EnvironmentFile=/opt/worker/.env

ExecStart=/usr/bin/python3 -m worker start --daemon
ExecStop=/usr/bin/python3 -m worker stop
ExecReload=/usr/bin/python3 -m worker reload

Restart=on-failure
RestartSec=5

# Resource Limits
MemoryMax=2G
CPUQuota=200%

[Install]
WantedBy=multi-user.target
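Limits such as `MemoryMax=` and `CPUQuota=` can also be layered onto an existing unit with a drop-in under `/etc/systemd/system/<unit>.d/`, which is not overwritten when a package updates the unit file. Below is a sketch of a hypothetical helper that only prints the override text. Writing it to disk and running `systemctl daemon-reload` are left to the caller. `systemctl edit <unit>` is the interactive equivalent.

```bash
#!/bin/bash
# Hypothetical helper: print a systemd drop-in that sets resource limits.
# Usage: render_limits_dropin MEMORY_MAX CPU_QUOTA [NOFILE]
render_limits_dropin() {
    local memory_max="$1" cpu_quota="$2" nofile="${3:-65536}"
    cat <<EOF
# Managed override: resource limits
[Service]
MemoryMax=${memory_max}
CPUQuota=${cpu_quota}
LimitNOFILE=${nofile}
EOF
}

# Example (as root):
#   mkdir -p /etc/systemd/system/worker.service.d
#   render_limits_dropin 2G 200% > /etc/systemd/system/worker.service.d/limits.conf
#   systemctl daemon-reload && systemctl restart worker
```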

Service Management Commands

# Service Operations
systemctl start service-name
systemctl stop service-name
systemctl restart service-name
systemctl reload service-name
systemctl status service-name

# Enable/Disable Services
systemctl enable service-name    # Enable at boot
systemctl disable service-name   # Disable at boot
systemctl is-enabled service-name

# View Service Details
systemctl show service-name
journalctl -u service-name -f
journalctl -u service-name --since "1 hour ago"

# List Services
systemctl list-units --type=service
systemctl list-units --type=service --state=running
systemctl list-units --type=service --all

# Service Dependencies
systemctl list-dependencies service-name
systemctl list-dependencies --reverse service-name

# Resource Usage
systemd-cgtop
systemctl show service-name -p CPUUsageNSec,MemoryCurrent
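`systemctl is-active --quiet` covers the common health check. When the sub-state also matters, for example to tell a running daemon from one whose process has exited, you can parse the `Key=Value` lines printed by `systemctl show`. The function name below is hypothetical. It reads those lines from stdin and does not depend on the order of the properties.

```bash
#!/bin/bash
# Hypothetical health check built on `systemctl show` key=value output.
# Returns 0 only when ActiveState=active and SubState=running.
unit_running_from_show() {
    local key value active="" sub=""
    # "|| [[ -n $key ]]" also handles a final line without a trailing newline
    while IFS='=' read -r key value || [[ -n "$key" ]]; do
        case "$key" in
            ActiveState) active="$value" ;;
            SubState)    sub="$value" ;;
        esac
    done
    [[ "$active" == "active" && "$sub" == "running" ]]
}

# Example:
#   systemctl show -p ActiveState -p SubState webapp | unit_running_from_show \
#       || echo "webapp is not running" >&2
```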

3. Network Configuration

Firewall Configuration (UFW)

#!/bin/bash
# Firewall Setup Script

# Reset firewall
ufw --force reset

# Default policies
ufw default deny incoming
ufw default allow outgoing
ufw default deny routed

# Allow loopback
ufw allow in on lo

# Established/related traffic is already accepted by UFW's built-in
# before.rules; "ufw allow established" is not valid UFW syntax.

# SSH (custom port), rate-limited: "ufw limit" both allows and throttles the port.
# Do not also add "ufw allow 2222/tcp": the first matching rule wins, so a plain
# allow placed earlier would bypass the rate limit.
ufw limit 2222/tcp comment 'SSH (rate limited)'

# HTTP/HTTPS
ufw allow 80/tcp comment 'HTTP'
ufw allow 443/tcp comment 'HTTPS'

# Database (internal only)
ufw allow from 10.0.0.0/8 to any port 5432 proto tcp comment 'PostgreSQL internal'

# Monitoring
ufw allow from 10.0.0.0/8 to any port 9100 proto tcp comment 'Node Exporter'

# Enable firewall
ufw --force enable

# Show rules
ufw status numbered
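Firewall scripts like this one are easy to get wrong over SSH. A common safeguard is a dry-run switch that prints each command instead of running it. The sketch assumes a `DRY_RUN` environment variable and hypothetical `run` and `allow_internal` helpers. Because `$*` flattens the original quoting, the printed line is for review, not for pasting back into a shell.

```bash
#!/bin/bash
# Hypothetical dry-run wrapper: with DRY_RUN=1 each command is printed
# instead of executed, so the script can be reviewed before it runs for real.
run() {
    if [[ "${DRY_RUN:-0}" == "1" ]]; then
        printf '+ %s\n' "$*"
    else
        "$@"
    fi
}

# Allow an internal-only TCP service from a private CIDR.
allow_internal() {
    local cidr="$1" port="$2" comment="$3"
    run ufw allow from "$cidr" to any port "$port" proto tcp comment "$comment"
}

# DRY_RUN=1 allow_internal 10.0.0.0/8 5432 'PostgreSQL internal'
# prints: + ufw allow from 10.0.0.0/8 to any port 5432 proto tcp comment PostgreSQL internal
```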

iptables Configuration

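#!/bin/bash
# Advanced iptables Configuration
# Run from a console or with a fallback: a mistake here can cut off SSH.
# Flushing also removes rules created by Docker; restart Docker afterwards if it runs here.

set -euo pipefail

# Accept while flushing so an existing session is not dropped midway
iptables -P INPUT ACCEPT
iptables -F
iptables -X
iptables -t nat -F
iptables -t nat -X
iptables -t mangle -F
iptables -t mangle -X

# Allow loopback
iptables -A INPUT -i lo -j ACCEPT
iptables -A OUTPUT -o lo -j ACCEPT

# Allow established connections
iptables -A INPUT -m conntrack --ctstate ESTABLISHED,RELATED -j ACCEPT

# Drop invalid packets
iptables -A INPUT -m conntrack --ctstate INVALID -j DROP

# Anti-spoofing (must come before any ACCEPT for new traffic)
iptables -A INPUT -s 127.0.0.0/8 ! -i lo -j DROP
iptables -A INPUT -s 0.0.0.0/8 -j DROP
iptables -A INPUT -s 224.0.0.0/4 -j DROP

# Port scan protection: drop packets with no flags or all flags set
iptables -A INPUT -p tcp --tcp-flags ALL NONE -j DROP
iptables -A INPUT -p tcp --tcp-flags ALL ALL -j DROP

# SYN flood protection: excess new connections are dropped; the rest RETURN
# to the per-port rules below (ACCEPT here would open every port)
iptables -N SYN_FLOOD
iptables -A INPUT -p tcp --syn -j SYN_FLOOD
iptables -A SYN_FLOOD -m limit --limit 10/s --limit-burst 20 -j RETURN
iptables -A SYN_FLOOD -j DROP

# Allow SSH (new connections limited to 3/min across all clients)
iptables -A INPUT -p tcp --dport 2222 -m conntrack --ctstate NEW -m limit --limit 3/min --limit-burst 3 -j ACCEPT
iptables -A INPUT -p tcp --dport 2222 -j DROP

# Allow HTTP/HTTPS
iptables -A INPUT -p tcp --dport 80 -j ACCEPT
iptables -A INPUT -p tcp --dport 443 -j ACCEPT

# ICMP protection
iptables -A INPUT -p icmp --icmp-type echo-request -m limit --limit 1/s -j ACCEPT
iptables -A INPUT -p icmp --icmp-type echo-request -j DROP

# Log packets about to be dropped (rate-limited to avoid flooding the log)
iptables -A INPUT -m limit --limit 5/min -j LOG --log-prefix "[DROPPED] " --log-level 4

# Default policies, set last so the rules above are already in place
iptables -P INPUT DROP
iptables -P FORWARD DROP
iptables -P OUTPUT ACCEPT

# Save rules (the path read by iptables-persistent / netfilter-persistent)
mkdir -p /etc/iptables
iptables-save > /etc/iptables/rules.v4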
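Running iptables commands one at a time leaves the host half-configured between commands. `iptables-restore` loads a whole table in one transaction, and `iptables-restore --test` parses a file without committing it. On Debian, `iptables-apply` adds an automatic rollback if the change is not confirmed. The sketch below renders a minimal filter table for a list of TCP ports. The function name and layout are assumptions, and this is not a full translation of the rules above.

```bash
#!/bin/bash
# Hypothetical generator: an iptables-restore ruleset that accepts new TCP
# connections on the given ports and drops all other inbound traffic.
render_filter_table() {
    local port
    printf '%s\n' '*filter' \
        ':INPUT DROP [0:0]' ':FORWARD DROP [0:0]' ':OUTPUT ACCEPT [0:0]' \
        '-A INPUT -i lo -j ACCEPT' \
        '-A INPUT -m conntrack --ctstate ESTABLISHED,RELATED -j ACCEPT' \
        '-A INPUT -m conntrack --ctstate INVALID -j DROP'
    for port in "$@"; do
        printf -- '-A INPUT -p tcp --dport %s -m conntrack --ctstate NEW -j ACCEPT\n' "$port"
    done
    printf '%s\n' 'COMMIT'
}

# Example (as root): check the file, then apply it atomically.
#   render_filter_table 2222 80 443 > /tmp/rules.v4
#   iptables-restore --test /tmp/rules.v4 && iptables-restore /tmp/rules.v4
```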

Nginx Configuration

# /etc/nginx/nginx.conf
user nginx;
worker_processes auto;
worker_rlimit_nofile 65535;

error_log /var/log/nginx/error.log warn;
pid /var/run/nginx.pid;

events {
    worker_connections 4096;
    use epoll;
    multi_accept on;
}

http {
    include /etc/nginx/mime.types;
    default_type application/octet-stream;

    # Logging
    log_format main '$remote_addr - $remote_user [$time_local] "$request" '
                    '$status $body_bytes_sent "$http_referer" '
                    '"$http_user_agent" "$http_x_forwarded_for"';

    access_log /var/log/nginx/access.log main;

    # Performance
    sendfile on;
    tcp_nopush on;
    tcp_nodelay on;
    keepalive_timeout 65;
    types_hash_max_size 2048;
    server_tokens off;

    # Buffers
    client_body_buffer_size 128k;
    client_max_body_size 100m;
    client_header_buffer_size 1k;
    large_client_header_buffers 4 4k;
    output_buffers 1 32k;
    postpone_output 1460;

    # Gzip
    gzip on;
    gzip_vary on;
    gzip_proxied any;
    gzip_comp_level 6;
    gzip_types text/plain text/css text/xml text/javascript
               application/json application/javascript application/xml+rss
               application/rss+xml font/truetype font/opentype
               application/vnd.ms-fontobject image/svg+xml;
    gzip_disable "msie6";

    # Security headers
    add_header X-Frame-Options "SAMEORIGIN" always;
    add_header X-Content-Type-Options "nosniff" always;
    # X-XSS-Protection is obsolete; "0" disables the legacy browser filter
    add_header X-XSS-Protection "0" always;
    add_header Referrer-Policy "no-referrer-when-downgrade" always;
    # Permissive baseline (allows any http/https source); tighten per application
    add_header Content-Security-Policy "default-src 'self' http: https: data: blob: 'unsafe-inline'" always;

    # Rate limiting
    limit_req_zone $binary_remote_addr zone=api:10m rate=10r/s;
    limit_req_zone $binary_remote_addr zone=general:10m rate=5r/s;
    limit_conn_zone $binary_remote_addr zone=addr:10m;

    # Include site configs
    include /etc/nginx/conf.d/*.conf;
    include /etc/nginx/sites-enabled/*;
}
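`worker_processes × worker_connections` roughly caps how many simultaneous connections nginx can hold. Each proxied request uses two of them, one to the client and one to the upstream. Below is a back-of-the-envelope helper, which is hypothetical and ignores other file descriptors such as log files.

```bash
#!/bin/bash
# Hypothetical capacity estimate for the nginx.conf above.
# Usage: nginx_max_clients WORKERS WORKER_CONNECTIONS [proxied: 1|0]
nginx_max_clients() {
    local workers="$1" conns="$2" proxied="${3:-1}"
    local total=$(( workers * conns ))
    if [[ "$proxied" == "1" ]]; then
        # Each proxied client also holds an upstream connection.
        total=$(( total / 2 ))
    fi
    echo "$total"
}

# "worker_processes auto" resolves to the CPU count, e.g. $(nproc):
#   nginx_max_clients "$(nproc)" 4096
#   nginx_max_clients 4 4096      # prints 8192
```

`worker_rlimit_nofile` (65535 here) must stay above `worker_connections` per worker, or workers run out of descriptors before reaching this ceiling.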

Application Server Configuration

# /etc/nginx/sites-available/app.conf
upstream app_backend {
    least_conn;
    server 127.0.0.1:3000 max_fails=3 fail_timeout=30s;
    server 127.0.0.1:3001 max_fails=3 fail_timeout=30s;
    server 127.0.0.1:3002 max_fails=3 fail_timeout=30s;
    keepalive 32;
}

# Rate limiting
limit_req_zone $binary_remote_addr zone=api_limit:10m rate=20r/s;

server {
    listen 80;
    listen [::]:80;
    server_name app.example.com;

    # Redirect to HTTPS
    return 301 https://$server_name$request_uri;
}

server {
    # nginx 1.25.1+ deprecates "http2" on listen in favour of a separate "http2 on;"
    listen 443 ssl http2;
    listen [::]:443 ssl http2;
    server_name app.example.com;

    # SSL Configuration
    ssl_certificate /etc/ssl/certs/app.example.com.crt;
    ssl_certificate_key /etc/ssl/private/app.example.com.key;
    ssl_protocols TLSv1.2 TLSv1.3;
    ssl_ciphers 'ECDHE-ECDSA-AES128-GCM-SHA256:ECDHE-RSA-AES128-GCM-SHA256:ECDHE-ECDSA-AES256-GCM-SHA384:ECDHE-RSA-AES256-GCM-SHA384';
    ssl_prefer_server_ciphers on;
    ssl_session_cache shared:SSL:10m;
    ssl_session_timeout 10m;
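    ssl_stapling on;
    ssl_stapling_verify on;
    # Stapling also needs a "resolver" directive (and ssl_trusted_certificate for verification)

    # Security headers
    # add_header is inherited only when the current level defines none of its own:
    # this line stops the http-level headers from applying here, and locations below
    # that call add_header drop this one. Repeat headers where they are needed.
    add_header Strict-Transport-Security "max-age=31536000; includeSubDomains" always;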

    # Root
    root /var/www/app;
    index index.html;

    # Client upload size
    client_max_body_size 100M;

    # Logging
    access_log /var/log/nginx/app.access.log;
    error_log /var/log/nginx/app.error.log;

    # Static files
    location /static/ {
        alias /var/www/app/static/;
        expires 1y;
        add_header Cache-Control "public, immutable";
        access_log off;
    }

    # Media files
    location /media/ {
        alias /var/www/app/media/;
        expires 30d;
        add_header Cache-Control "public";
    }

    # API endpoints
    location /api/ {
        limit_req zone=api_limit burst=40 nodelay;

        proxy_pass http://app_backend;
        proxy_http_version 1.1;
        # Clear Connection so the upstream keepalive pool is reused;
        # WebSocket upgrade headers belong only in the /ws/ location
        proxy_set_header Connection "";
        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
        proxy_set_header X-Forwarded-Proto $scheme;

        # Timeouts
        proxy_connect_timeout 60s;
        proxy_send_timeout 60s;
        proxy_read_timeout 60s;
    }

    # WebSocket
    location /ws/ {
        proxy_pass http://app_backend;
        proxy_http_version 1.1;
        proxy_set_header Upgrade $http_upgrade;
        proxy_set_header Connection "upgrade";
        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;
        proxy_read_timeout 86400;
    }

    # Health check
    location /health {
        access_log off;
        # default_type sets the response type; add_header would send a second Content-Type
        default_type text/plain;
        return 200 "healthy\n";
    }

    # Favicon
    location = /favicon.ico {
        access_log off;
        log_not_found off;
    }

    # Robots.txt
    location = /robots.txt {
        access_log off;
        log_not_found off;
    }

    # Deny access to hidden files (except /.well-known/, used by ACME HTTP-01 challenges)
    location ~ /\.(?!well-known) {
        deny all;
        access_log off;
        log_not_found off;
    }
}

4. Storage Management

LVM Configuration

#!/bin/bash
# LVM Setup Script
# WARNING: pvcreate and mkfs destroy any existing data on /dev/sdb.

set -euo pipefail

VG_NAME="vg01"
LV_DATA="lv-data"
LV_LOGS="lv-logs"
LV_BACKUP="lv-backup"

# Create physical volume
pvcreate /dev/sdb

# Create volume group
vgcreate $VG_NAME /dev/sdb

# Create logical volumes
lvcreate -L 100G -n $LV_DATA $VG_NAME
lvcreate -L 50G -n $LV_LOGS $VG_NAME
lvcreate -L 200G -n $LV_BACKUP $VG_NAME

# Create filesystems
mkfs.xfs /dev/$VG_NAME/$LV_DATA
mkfs.xfs /dev/$VG_NAME/$LV_LOGS
mkfs.ext4 /dev/$VG_NAME/$LV_BACKUP

# Create mount points
mkdir -p /data /logs /backup

# Update /etc/fstab
cat >> /etc/fstab << EOF
/dev/$VG_NAME/$LV_DATA    /data    xfs    defaults,noatime    0 2
/dev/$VG_NAME/$LV_LOGS    /logs    xfs    defaults,noatime    0 2
/dev/$VG_NAME/$LV_BACKUP  /backup  ext4   defaults,noatime    0 2
EOF

# Mount filesystems
mount -a

# Verify
df -h
vgdisplay
lvdisplay
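`lvcreate` fails on the first volume that does not fit and leaves the earlier ones in place. A hypothetical pre-check can total the requested sizes first. This sketch accepts only whole-GiB sizes such as `100G`, which `lvcreate -L` interprets as binary units.

```bash
#!/bin/bash
# Hypothetical pre-check: sum requested LV sizes (whole GiB, e.g. "100G")
# so the script can stop before lvcreate fails halfway through.
required_gib() {
    local total=0 size
    for size in "$@"; do
        [[ "$size" =~ ^([0-9]+)[Gg]$ ]] || { echo "unsupported size: $size" >&2; return 1; }
        total=$(( total + 10#${BASH_REMATCH[1]} ))
    done
    echo "$total"
}

# Example (as root), comparing with the volume group's free space:
#   need=$(required_gib 100G 50G 200G)      # 350
#   free=$(vgs --noheadings --units g --nosuffix -o vg_free vg01 | awk '{printf "%d", $1}')
#   (( free >= need )) || { echo "vg01 has ${free}G free, need ${need}G" >&2; exit 1; }
```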

RAID Configuration

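#!/bin/bash
# RAID 10 Setup Script
# WARNING: creating the array destroys existing data on /dev/sdb through /dev/sde.

set -euo pipefail

# Install mdadm
apt-get install -y mdadm

# Create RAID 10 array
mdadm --create /dev/md0 \
  --level=10 \
  --raid-devices=4 \
  /dev/sdb /dev/sdc /dev/sdd /dev/sde

# Save RAID configuration and rebuild the initramfs so the array
# is assembled under the same name at boot
mdadm --detail --scan >> /etc/mdadm/mdadm.conf
update-initramfs -u

# Create filesystem
mkfs.ext4 /dev/md0

# Create mount point
mkdir -p /raid

# Update /etc/fstab, referencing the filesystem UUID rather than /dev/md0
echo "UUID=$(blkid -s UUID -o value /dev/md0) /raid ext4 defaults,noatime 0 2" >> /etc/fstab

# Mount
mount /raid

# Verify
cat /proc/mdstat
mdadm --detail /dev/md0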
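mdadm's own monitor (`mdadm --monitor`, configured through `MAILADDR` in mdadm.conf) is the primary alerting path for failed disks. As a lightweight extra check, for example in a cron job or a monitoring hook, a degraded array shows a `_` in the member-status field of `/proc/mdstat` (e.g. `[UU_U]`). The function name is hypothetical.

```bash
#!/bin/bash
# Hypothetical check: read /proc/mdstat-formatted text on stdin and return 0
# if any array shows a missing member ("_" inside a [UU_U]-style field).
md_degraded() {
    grep -Eq '\[[U_]*_[U_]*\]'
}

# Example:
#   if md_degraded < /proc/mdstat; then
#       echo "RAID degraded on $(hostname)" >&2
#   fi
```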

5. Monitoring and Logging

Prometheus Configuration

# /etc/prometheus/prometheus.yml
global:
  scrape_interval: 15s
  evaluation_interval: 15s
  external_labels:
    cluster: 'production'
    replica: '1'

# Alertmanager configuration
alerting:
  alertmanagers:
    - static_configs:
        - targets:
            - localhost:9093

# Rule files
rule_files:
  - '/etc/prometheus/rules/*.yml'

# Scrape configurations
scrape_configs:
  # Prometheus itself
  - job_name: 'prometheus'
    static_configs:
      - targets: ['localhost:9090']

  # Node Exporter
  - job_name: 'node'
    static_configs:
      - targets: ['localhost:9100']
    relabel_configs:
      - source_labels: [__address__]
        target_label: instance
        replacement: 'web-server-01'

  # Nginx Exporter
  - job_name: 'nginx'
    static_configs:
      - targets: ['localhost:9113']

  # PostgreSQL Exporter
  - job_name: 'postgresql'
    static_configs:
      - targets: ['localhost:9187']

  # Application metrics
  - job_name: 'application'
    static_configs:
      - targets: ['localhost:3000']
    metrics_path: '/metrics'

Alert Rules

# /etc/prometheus/rules/alerts.yml
groups:
  - name: system
    interval: 30s
    rules:
      # CPU usage
      - alert: HighCPUUsage
        expr: 100 - (avg by(instance) (irate(node_cpu_seconds_total{mode="idle"}[5m])) * 100) > 80
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "High CPU usage on {{ $labels.instance }}"
          description: "CPU usage is {{ $value }}%"

      # Memory usage
      - alert: HighMemoryUsage
        expr: (1 - (node_memory_MemAvailable_bytes / node_memory_MemTotal_bytes)) * 100 > 90
        for: 5m
        labels:
          severity: critical
        annotations:
          summary: "High memory usage on {{ $labels.instance }}"
          description: "Memory usage is {{ $value }}%"

      # Disk space
      - alert: LowDiskSpace
        expr: (node_filesystem_avail_bytes{mountpoint="/"} / node_filesystem_size_bytes{mountpoint="/"}) * 100 < 10
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "Low disk space on {{ $labels.instance }}"
          description: "Only {{ $value }}% free space on /"

  - name: application
    interval: 30s
    rules:
      # HTTP error rate
      - alert: HighErrorRate
        expr: sum by (instance) (rate(http_requests_total{status=~"5.."}[5m])) / sum by (instance) (rate(http_requests_total[5m])) * 100 > 5
        for: 5m
        labels:
          severity: critical
        annotations:
          summary: "High HTTP error rate"
          description: "Error rate is {{ $value }}%"

      # Response time
      - alert: SlowResponseTime
        expr: histogram_quantile(0.95, sum by (le, instance) (rate(http_request_duration_seconds_bucket[5m]))) > 1
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "Slow response time"
          description: "P95 response time is {{ $value }}s"
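The `HighErrorRate` alert divides the rate of 5xx responses by the rate of all requests. The helper below does the same arithmetic on two samples of each counter, which is handy for checking a threshold by hand. It is hypothetical and, unlike `rate()`, ignores counter resets.

```bash
#!/bin/bash
# Hypothetical helper: error percentage between two counter samples.
# Usage: error_rate_pct ERR_BEFORE ERR_AFTER TOTAL_BEFORE TOTAL_AFTER
error_rate_pct() {
    awk -v e0="$1" -v e1="$2" -v t0="$3" -v t1="$4" 'BEGIN {
        dt = t1 - t0
        if (dt <= 0) { print "0.00"; exit }   # no traffic in the window
        printf "%.2f\n", (e1 - e0) / dt * 100
    }'
}

# 30 new 5xx responses out of 600 new requests:
#   error_rate_pct 100 130 10000 10600    # prints 5.00
```

Because the rule uses `> 5`, exactly 5% does not fire the alert.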

Log Rotation

# /etc/logrotate.d/custom
/var/log/myapp/*.log {
    daily
    missingok
    rotate 14
    compress
    delaycompress
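    # No copytruncate: with it, "create" has no effect and the reload is pointless.
    # Use copytruncate only for apps that cannot reopen their log files.
    notifempty
    create 0640 www-data adm
    sharedscripts
    postrotate
        systemctl reload myapp > /dev/null 2>&1 || true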
    endscript
}

6. Security Hardening

SSH Hardening

# /etc/ssh/sshd_config

# Basic
Port 2222
HostKey /etc/ssh/ssh_host_ed25519_key
HostKey /etc/ssh/ssh_host_rsa_key

# Logging
SyslogFacility AUTHPRIV
LogLevel VERBOSE

# Authentication
LoginGraceTime 60
PermitRootLogin no
StrictModes yes
MaxAuthTries 3
MaxStartups 10:30:60
PasswordAuthentication no
PermitEmptyPasswords no
# Named ChallengeResponseAuthentication before OpenSSH 8.7
KbdInteractiveAuthentication no

# Key-based authentication
PubkeyAuthentication yes
AuthorizedKeysFile .ssh/authorized_keys

# Security
UsePAM yes
X11Forwarding no
AllowTcpForwarding no
AllowAgentForwarding no
PermitTunnel no
GatewayPorts no

# Restrict users
AllowUsers admin deploy
DenyUsers root

# Ciphers
Ciphers chacha20-poly1305@openssh.com,aes256-gcm@openssh.com,aes128-gcm@openssh.com

# MACs
MACs hmac-sha2-512-etm@openssh.com,hmac-sha2-256-etm@openssh.com

# Kex
KexAlgorithms curve25519-sha256@libssh.org,diffie-hellman-group-exchange-sha256

Validate the file, then restart the service. These are shell commands, not sshd_config lines:

sshd -t && systemctl restart sshd
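For scripted changes, a helper that sets one option idempotently avoids both sed patterns that silently match nothing and duplicate appended lines. The sketch is hypothetical. It is case-sensitive, although sshd keywords are not, and it does not understand `Match` blocks, so keep those at the end of the file and review the result. On systems whose sshd_config starts with `Include /etc/ssh/sshd_config.d/*.conf`, a value set in one of those files still wins, because sshd uses the first value it reads.

```bash
#!/bin/bash
# Hypothetical helper: set "Key value" in an sshd_config-style file.
# Commented or active lines for the key are replaced; otherwise the line is appended.
set_sshd_option() {
    local key="$1" value="$2" file="${3:-/etc/ssh/sshd_config}"
    if grep -Eq "^[#[:space:]]*${key}[[:space:]]" "$file"; then
        sed -i -E "s|^[#[:space:]]*${key}[[:space:]].*|${key} ${value}|" "$file"
    else
        printf '%s %s\n' "$key" "$value" >> "$file"
    fi
}

# Example (as root):
#   set_sshd_option PermitRootLogin no
#   set_sshd_option PasswordAuthentication no
#   sshd -t && systemctl restart sshd
```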

System Hardening Script

#!/bin/bash
# System Security Hardening

set -euo pipefail

echo "=== System Security Hardening ==="

# 1. Disable unused filesystems
echo "[1/10] Disabling unused filesystems..."
# Snap packages (Ubuntu) need squashfs; drop that line if snaps are in use
cat > /etc/modprobe.d/disable-filesystems.conf << 'EOF'
install cramfs /bin/true
install freevxfs /bin/true
install jffs2 /bin/true
install hfsplus /bin/true
install squashfs /bin/true
install udf /bin/true
EOF

# 2. Kernel hardening
echo "[2/10] Configuring kernel parameters..."
cat > /etc/sysctl.d/99-security.conf << 'EOF'
# Network
# Docker and Kubernetes hosts need ip_forward = 1
net.ipv4.ip_forward = 0
net.ipv4.conf.all.send_redirects = 0
net.ipv4.conf.default.send_redirects = 0
net.ipv4.conf.all.accept_source_route = 0
net.ipv4.conf.default.accept_source_route = 0
net.ipv4.conf.all.accept_redirects = 0
net.ipv4.conf.default.accept_redirects = 0
net.ipv4.conf.all.secure_redirects = 0
net.ipv4.conf.default.secure_redirects = 0
net.ipv4.conf.all.rp_filter = 1
net.ipv4.conf.default.rp_filter = 1
net.ipv4.icmp_echo_ignore_broadcasts = 1
net.ipv4.icmp_ignore_bogus_error_responses = 1
net.ipv4.tcp_syncookies = 1
net.ipv4.tcp_max_syn_backlog = 2048
net.ipv4.tcp_synack_retries = 2
net.ipv4.tcp_syn_retries = 5

# IPv6
net.ipv6.conf.all.accept_redirects = 0
net.ipv6.conf.default.accept_redirects = 0

# Kernel
kernel.randomize_va_space = 2
kernel.kptr_restrict = 2
kernel.dmesg_restrict = 1
kernel.perf_event_paranoid = 2

# Core dumps
fs.suid_dumpable = 0
kernel.core_pattern = |/bin/false
EOF

sysctl -p /etc/sysctl.d/99-security.conf

# 3. Restrict core dumps
echo "[3/10] Restricting core dumps..."
cat > /etc/security/limits.d/50-core.conf << 'EOF'
* hard core 0
EOF

# 4. Disable USB storage
echo "[4/10] Disabling USB storage..."
echo "install usb-storage /bin/true" > /etc/modprobe.d/disable-usb.conf

# 5. Configure PAM
echo "[5/10] Configuring PAM..."
# Do not overwrite distribution PAM stacks such as /etc/pam.d/login: a stack
# with an unconditional "auth requisite pam_deny.so" blocks every console login.
# Enforce password quality through pam_pwquality's own configuration instead.
apt-get install -y libpam-pwquality
cat > /etc/security/pwquality.conf << 'EOF'
minlen = 14
minclass = 3
EOF

# 6. Disable unnecessary services
echo "[6/10] Disabling unnecessary services..."
systemctl disable cups 2>/dev/null || true
systemctl disable avahi-daemon 2>/dev/null || true
systemctl disable bluetooth 2>/dev/null || true

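# 7. Install security updates
echo "[7/10] Installing security updates..."
apt-get install -y unattended-upgrades
# Enable the periodic apt jobs non-interactively (dpkg-reconfigure would prompt)
cat > /etc/apt/apt.conf.d/20auto-upgrades << 'EOF'
APT::Periodic::Update-Package-Lists "1";
APT::Periodic::Unattended-Upgrade "1";
EOF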

# 8. Configure automatic updates
echo "[8/10] Configuring automatic updates..."
cat > /etc/apt/apt.conf.d/50unattended-upgrades << 'EOF'
Unattended-Upgrade::Allowed-Origins {
    "${distro_id}:${distro_codename}-security";
};
Unattended-Upgrade::AutoFixInterruptedDpkg "true";
Unattended-Upgrade::MinimalSteps "true";
Unattended-Upgrade::Remove-Unused-Kernel-Packages "true";
Unattended-Upgrade::Remove-Unused-Dependencies "true";
Unattended-Upgrade::Automatic-Reboot "false";
Unattended-Upgrade::Automatic-Reboot-Time "02:00";
EOF

# 9. Install auditd
echo "[9/10] Installing audit daemon..."
apt-get install -y auditd audispd-plugins
systemctl enable auditd
systemctl start auditd

# 10. Configure audit rules
echo "[10/10] Configuring audit rules..."
cat > /etc/audit/rules.d/audit.rules << 'EOF'
-w /etc/passwd -p wa -k identity
-w /etc/group -p wa -k identity
-w /etc/shadow -p wa -k identity
-w /etc/sudoers -p wa -k sudoers
-w /etc/sudoers.d/ -p wa -k sudoers
-w /var/log/audit/ -p wa -k audit-logs
-w /etc/ssh/sshd_config -p wa -k sshd
-w /var/log/auth.log -p wa -k auth-log
EOF

# Compile rules.d into the active rule set and load it
augenrules --load

echo "=== Hardening Complete ==="
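Hardening settings drift when packages or other sysctl.d files override them. Below is a sketch of a hypothetical comparison between the desired file and a snapshot of live values, such as `sysctl -a` output. It prints one line per differing key and returns non-zero if anything differs. Values are compared after collapsing whitespace, because `sysctl -a` separates multi-value settings with tabs.

```bash
#!/bin/bash
# Hypothetical audit helper: compare a sysctl.d-style file ("key = value")
# with a snapshot of live values and report drift.
# Usage: sysctl_drift DESIRED_FILE ACTUAL_FILE
sysctl_drift() {
    awk '
        function trim(s) { gsub(/[ \t]+/, " ", s); sub(/^ /, "", s); sub(/ $/, "", s); return s }
        /^[ \t]*$/    { next }                  # blank lines
        /^[ \t]*[#;]/ { next }                  # comments
        {
            eq = index($0, "=")
            if (eq == 0) next
            key = trim(substr($0, 1, eq - 1)); val = trim(substr($0, eq + 1))
            if (FNR == NR) want[key] = val; else have[key] = val
        }
        END {
            drift = 0
            for (k in want) {
                if (!(k in have)) { printf "%s: missing (want %s)\n", k, want[k]; drift = 1 }
                else if (have[k] != want[k]) { printf "%s: have %s, want %s\n", k, have[k], want[k]; drift = 1 }
            }
            exit drift
        }' "$1" "$2"
}

# Example:
#   sysctl -a 2>/dev/null > /tmp/sysctl-live.txt
#   sysctl_drift /etc/sysctl.d/99-security.conf /tmp/sysctl-live.txt || echo "drift detected"
```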

7. Performance Tuning

System Performance Tuning

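#!/bin/bash
# Performance Tuning Script

set -euo pipefail

# 1. I/O Scheduler
echo "[1/5] Configuring I/O scheduler..."
# On blk-mq kernels (5.0+) the scheduler is named "mq-deadline"; NVMe devices usually run "none"
echo 'ACTION=="add|change", KERNEL=="sd[a-z]", ATTR{queue/scheduler}="mq-deadline"' \
  > /etc/udev/rules.d/60-scheduler.rules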

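# 2. Filesystem optimization
echo "[2/5] Optimizing filesystem..."
# Never overwrite /etc/fstab: replacing it with a single placeholder line breaks boot.
# Apply noatime to the running root filesystem (noatime already implies nodiratime),
# then add "noatime" to the root entry's options in /etc/fstab by hand to persist it.
mount -o remount,noatime /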

# 3. Network tuning
echo "[3/5] Tuning network parameters..."
cat > /etc/sysctl.d/99-performance.conf << 'EOF'
# TCP
net.core.rmem_max = 16777216
net.core.wmem_max = 16777216
net.ipv4.tcp_rmem = 4096 87380 16777216
net.ipv4.tcp_wmem = 4096 65536 16777216
net.ipv4.tcp_fin_timeout = 30
net.ipv4.tcp_keepalive_time = 120
net.ipv4.tcp_tw_reuse = 1
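# 99-security.conf sets 2048 and, loading later alphabetically, wins at boot
net.ipv4.tcp_max_syn_backlog = 4096
# Kernels 5.4+ already default to 4096; do not lower it
net.core.somaxconn = 4096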
net.ipv4.tcp_max_tw_buckets = 400000
net.ipv4.tcp_no_metrics_save = 1
net.core.netdev_max_backlog = 16384
EOF

sysctl -p /etc/sysctl.d/99-performance.conf

# 4. Process limits
echo "[4/5] Configuring process limits..."
# pam_limits applies to login sessions; systemd services take LimitNOFILE= etc. from their units
cat > /etc/security/limits.d/50-performance.conf << 'EOF'
* soft nofile 65536
* hard nofile 65536
* soft nproc 65536
* hard nproc 65536
* soft memlock unlimited
* hard memlock unlimited
EOF

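# 5. Swapping
echo "[5/5] Configuring swap..."
# A dedicated file avoids appending duplicate lines to /etc/sysctl.conf on re-runs
echo "vm.swappiness = 10" > /etc/sysctl.d/99-swappiness.conf
sysctl -p /etc/sysctl.d/99-swappiness.conf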

echo "=== Performance Tuning Complete ==="
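The 16777216-byte (16 MiB) maximums for `tcp_rmem` and `tcp_wmem` only matter if a single connection needs that much window. That need is the bandwidth-delay product: bandwidth in bit/s ÷ 8 × RTT in seconds, which simplifies to Mbit/s × RTT(ms) × 125 bytes. The helper name is hypothetical. Linux also reserves part of each buffer for overhead, so treat the result as a lower bound.

```bash
#!/bin/bash
# Bandwidth-delay product in bytes: Mbit/s * 1e6 / 8 * RTT(ms) / 1000 = Mbit/s * RTT(ms) * 125
bdp_bytes() {
    local mbps="$1" rtt_ms="$2"
    echo $(( mbps * rtt_ms * 125 ))
}

#   bdp_bytes 1000 100     # prints 12500000  (1 Gbit/s at 100 ms fits under 16 MiB)
#   bdp_bytes 10000 50     # prints 62500000  (10 Gbit/s at 50 ms would need larger maximums)
```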

8. Container Management

Docker Daemon Configuration

Save as /etc/docker/daemon.json. JSON does not allow comments, so do not copy this line into the file.
{
  "log-driver": "json-file",
  "log-opts": {
    "max-size": "10m",
    "max-file": "3"
  },
  "storage-driver": "overlay2",
  "live-restore": true,
  "userland-proxy": false,
  "no-new-privileges": true,
  "icc": false,
  "default-ulimits": {
    "nofile": {
      "Name": "nofile",
      "Hard": 64000,
      "Soft": 64000
    }
  },
  "registry-mirrors": [
    "https://mirror.gcr.io"
  ],
  "metrics-addr": "127.0.0.1:9323",
  "experimental": false
}
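dockerd refuses to start if daemon.json is invalid, and that includes `//` comments. Checking the file before restarting avoids taking every container down over a typo. The sketch below assumes `python3` is installed and uses its standard `json.tool` module. Recent Docker Engine releases also provide `dockerd --validate --config-file /etc/docker/daemon.json`, which additionally checks option names.

```bash
#!/bin/bash
# Hypothetical pre-restart check: is the file syntactically valid JSON?
validate_json() {
    python3 -m json.tool "$1" > /dev/null 2>&1
}

# Example (as root):
#   validate_json /etc/docker/daemon.json && systemctl restart docker
```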

Docker Compose Production Template

# docker-compose.prod.yml
# The top-level "version" key is obsolete in Compose v2 and is omitted here.

services:
  app:
    image: myapp:${VERSION:-latest}
    restart: always
    deploy:
      resources:
        limits:
          cpus: '2'
          memory: 2G
        reservations:
          cpus: '1'
          memory: 1G
    environment:
      - NODE_ENV=production
      - DATABASE_URL=postgres://postgres:5432/app  # host is the "postgres" service name
    env_file:
      - .env.prod
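    volumes:
      - app-uploads:/app/uploads
      - app-logs:/app/logs
      # Shared with nginx, which serves it read-only (path inside the image assumed)
      - app-static:/app/static
    networks:
      - frontend
      - backend
    depends_on:
      postgres:
        condition: service_healthy
      redis:
        condition: service_started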
    healthcheck:
      test: ["CMD", "wget", "-q", "--spider", "http://localhost:3000/health"]
      interval: 30s
      timeout: 10s
      retries: 3
      start_period: 40s
    logging:
      driver: "json-file"
      options:
        max-size: "10m"
        max-file: "3"

  nginx:
    image: nginx:alpine
    restart: always
    ports:
      - "80:80"
      - "443:443"
    volumes:
      - ./nginx.conf:/etc/nginx/nginx.conf:ro
      - ./ssl:/etc/nginx/ssl:ro
      - app-static:/app/static:ro
    networks:
      - frontend
    depends_on:
      - app

  postgres:
    image: postgres:15-alpine
    restart: always
    environment:
      - POSTGRES_DB=app
      - POSTGRES_USER=app
      - POSTGRES_PASSWORD_FILE=/run/secrets/db_password
    secrets:
      - db_password
    volumes:
      - postgres-data:/var/lib/postgresql/data
    networks:
      - backend
    healthcheck:
      test: ["CMD-SHELL", "pg_isready -U app -d app"]
      interval: 10s
      timeout: 5s
      retries: 5

  redis:
    image: redis:7-alpine
    restart: always
    command: redis-server --appendonly yes
    volumes:
      - redis-data:/data
    networks:
      - backend

volumes:
  app-uploads:
  app-logs:
  app-static:
  postgres-data:
  redis-data:

networks:
  frontend:
    driver: bridge
  backend:
    driver: bridge
    internal: true

secrets:
  db_password:
    file: ./secrets/db_password.txt
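The `db_password` secret reads from `./secrets/db_password.txt`. A hypothetical helper that creates that file with a random value could look like this. It uses hex so the password can also be placed in a connection URL without escaping, and it restricts the file to its owner. Check that the container user can still read the mounted file in your setup.

```bash
#!/bin/bash
# Hypothetical helper: create a secret file with 32 random bytes as hex,
# readable only by its owner. Refuses to overwrite an existing secret.
create_secret_file() {
    local path="$1"
    [[ -e "$path" ]] && { echo "refusing to overwrite $path" >&2; return 1; }
    mkdir -p "$(dirname "$path")"
    # umask in a subshell so the caller's umask is untouched
    ( umask 077; head -c 32 /dev/urandom | od -An -tx1 | tr -d ' \n' > "$path" )
}

# Example:
#   create_secret_file ./secrets/db_password.txt
#   echo "secrets/" >> .gitignore
```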

9. Backup Strategies

Automated Backup Script

#!/bin/bash
# Automated Backup Script

set -euo pipefail

# Configuration
BACKUP_DIR="/backup"
RETENTION_DAYS=30
S3_BUCKET="s3://my-backups"
DATE=$(date +%Y%m%d_%H%M%S)

# Create backup directory
mkdir -p "$BACKUP_DIR/$DATE"

# 1. Database backup
echo "Backing up database..."
pg_dump -U postgres -h localhost mydb | gzip > "$BACKUP_DIR/$DATE/database.sql.gz"

# 2. Files backup
echo "Backing up files..."
tar -czf "$BACKUP_DIR/$DATE/files.tar.gz" /var/www/app /etc/nginx /etc/ssh

# 3. Configuration backup
echo "Backing up configurations..."
tar -czf "$BACKUP_DIR/$DATE/config.tar.gz" /etc /opt

# 4. Upload to S3
echo "Uploading to S3..."
aws s3 sync "$BACKUP_DIR/$DATE" "$S3_BUCKET/$DATE/"

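# 5. Clean old backups
echo "Cleaning old backups..."
# Depth 1 only: the dated directories, never $BACKUP_DIR itself or anything nested;
# lost+found (present on the ext4 /backup volume) is kept
find "$BACKUP_DIR" -mindepth 1 -maxdepth 1 -type d ! -name lost+found \
  -mtime +"$RETENTION_DAYS" -exec rm -rf {} +
cutoff_date=$(date -d "$RETENTION_DAYS days ago" +%s)
aws s3 ls "$S3_BUCKET/" | while read -r line; do
  dir_date=$(echo "$line" | awk '{print $2}' | tr -d '/')
  # Prefixes look like 20240101_120000, which date -d cannot parse whole; use the
  # YYYYMMDD part and skip anything unparseable rather than treating it as epoch 0
  day="${dir_date%%_*}"
  if [[ "$day" =~ ^[0-9]{8}$ ]]; then
    file_date=$(date -d "$day" +%s 2>/dev/null) || continue
    if [ "$file_date" -lt "$cutoff_date" ]; then
      aws s3 rm "$S3_BUCKET/$dir_date" --recursive
    fi
  fi
done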

echo "Backup completed: $DATE"
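The retention decision is easier to test on its own than inside the S3 loop. Below is a sketch of a hypothetical helper that decides whether a backup named `YYYYMMDD_HHMMSS` is past retention and keeps anything it cannot parse. It relies on GNU `date -d`. An S3 lifecycle rule on the bucket is an alternative to scripted remote cleanup.

```bash
#!/bin/bash
# Hypothetical helper: is a backup named YYYYMMDD_HHMMSS older than the retention window?
# Returns 0 = expired, 1 = keep, 2 = unparseable (keep).
# Usage: backup_expired NAME RETENTION_DAYS [NOW_EPOCH]
backup_expired() {
    local name="${1%/}" days="$2" now="${3:-$(date +%s)}"
    [[ "$name" =~ ^([0-9]{8})_[0-9]{6}$ ]] || return 2
    local stamp
    stamp=$(date -d "${BASH_REMATCH[1]}" +%s 2>/dev/null) || return 2
    (( stamp < now - days * 86400 ))
}

# Example inside the S3 cleanup loop:
#   backup_expired "$dir_date" "$RETENTION_DAYS" && aws s3 rm "$S3_BUCKET/$dir_date" --recursive
```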

10. Incident Response

Incident Response Checklist

#!/bin/bash
# Incident Response Script

echo "=== Incident Response Checklist ==="

# 1. Check system status
echo "[1/10] Checking system status..."
uptime
free -h
df -h

# 2. Check service status
echo "[2/10] Checking service status..."
systemctl list-units --type=service --state=failed

# 3. Check recent logs
echo "[3/10] Checking recent logs..."
journalctl -p err -n 50 --no-pager

# 4. Check network connections
echo "[4/10] Checking network connections..."
ss -tunap

# 5. Check CPU usage
echo "[5/10] Checking CPU usage..."
top -bn1 | head -20

# 6. Check memory usage
echo "[6/10] Checking memory usage..."
ps aux --sort=-%mem | head -10

# 7. Check disk I/O
echo "[7/10] Checking disk I/O..."
iotop -b -n 1 | head -20

# 8. Check failed login attempts
echo "[8/10] Checking failed logins..."
grep "Failed password" /var/log/auth.log | tail -20

# 9. Check firewall status
echo "[9/10] Checking firewall status..."
ufw status

# 10. Save system state
echo "[10/10] Saving system state..."
# /var/tmp survives reboots, unlike a tmpfs-backed /tmp
STATE_FILE="/var/tmp/incident-state-$(date +%Y%m%d_%H%M%S).txt"
{
  echo "=== System State ==="
  uptime
  free -h
  df -h
  echo ""
  echo "=== Failed Services ==="
  systemctl list-units --type=service --state=failed
  echo ""
  echo "=== Recent Errors ==="
  journalctl -p err -n 20 --no-pager
  echo ""
  echo "=== Network Connections ==="
  ss -tunap
} > "$STATE_FILE"

echo "=== Incident Response Complete ==="
echo "State saved to: $STATE_FILE"
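Step 8 lists raw failed-login lines. During triage it often helps to see which sources are responsible. The helper below is hypothetical. It counts `Failed password` attempts per source address in auth.log-style text on stdin, most frequent first.

```bash
#!/bin/bash
# Hypothetical triage helper: count "Failed password" attempts per source IP.
failed_logins_by_ip() {
    grep 'Failed password' \
        | grep -oE 'from [0-9a-fA-F:.]+' \
        | awk '{ count[$2]++ } END { for (ip in count) print count[ip], ip }' \
        | sort -k1,1nr -k2,2
}

# Examples:
#   failed_logins_by_ip < /var/log/auth.log | head -10
#   journalctl -u ssh --no-pager | failed_logins_by_ip | head -10   # unit is "sshd" on some distros
```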

11. Decision Trees

Monitoring Solution Selection

What to monitor?
│
├─ System metrics → Prometheus + Node Exporter
├─ Application metrics → Prometheus + Custom Exporter
├─ Logs → ELK Stack / Loki
├─ Tracing → Jaeger / Tempo
├─ Uptime → Uptime Robot / Pingdom
└─ Security → Wazuh / OSSEC

Backup Strategy Selection

Data type?
│
├─ Database → Logical backup (pg_dump/mysqldump) + Physical backup
├─ Files → Rsync + Snapshots
├─ Configuration → Git (version control)
├─ Entire server → Full disk image (AMIs, snapshots)
└─ Disaster recovery → Off-site replication + DR drills

12. Anti-Patterns to Avoid

No Monitoring: Never deploy without monitoring and alerting
Weak Authentication: Always use SSH keys; never allow password authentication
No Backups: Always implement a 3-2-1 backup strategy (three copies, on two different media, one off-site)
Ignoring Logs: Review logs regularly to catch problems before they become incidents
No Documentation: Document all configurations and changes
No Testing: Test all changes in staging first
No Rollback Plan: Always have a rollback plan for changes
No Security Updates: Keep system updated with security patches
No Resource Limits: Set CPU, memory, and file-descriptor limits on every service
No Incident Response: Have a plan before incidents occur

13. Quality Checklist

Before considering server management complete:

All services have systemd unit files
Firewall configured and enabled
SSH hardened (key-based only)
Automatic security updates configured
Monitoring and alerting set up
Log rotation configured
Backup strategy implemented and tested
Documentation up to date
Incident response plan in place
Resource limits configured
SSL/TLS certificates valid
Security hardening applied
Performance tuning applied
Disaster recovery tested
Compliance requirements met
Network segmentation implemented
Vulnerability scanning performed
Access controls implemented
Change management process in place
Automation implemented for repetitive tasks

This skill definition provides guidance for Linux server administration in production environments.
