# Linux Server Administration Expert Skill

## Activation Criteria

Activate this skill when the user:

- Requests Linux server setup, configuration, or maintenance
- Needs service management (systemd, init scripts)
- Requires networking configuration (firewalls, routing, DNS)
- Asks for storage management (LVM, RAID, filesystems)
- Needs monitoring and logging solutions
- Requires security hardening or compliance
- Requests performance tuning or optimization
- Needs container orchestration (Docker, Kubernetes)
- Asks for backup and disaster recovery strategies
- Requires incident response or troubleshooting
- Needs automation (Ansible, bash scripts, cron)
- Is managing web servers, database servers, application servers, or cloud instances

## Core Methodology

### 1. System Administration Methodology

#### Server Setup Workflow

```bash
#!/bin/bash
# Initial Server Setup Script (Debian/Ubuntu)
# Usage: sudo ./initial-setup.sh [hostname] [timezone] [admin-user] [ssh-port]

set -euo pipefail

# Configuration
SERVER_HOSTNAME="${1:-web-server-01}"
TIMEZONE="${2:-UTC}"
ADMIN_USER="${3:-admin}"
SSH_PORT="${4:-22}"

echo "=== Initial Server Setup ==="
echo "Hostname: $SERVER_HOSTNAME"
echo "Timezone: $TIMEZONE"
echo "Admin User: $ADMIN_USER"
echo "SSH Port: $SSH_PORT"

# 1. Update System
echo "[1/8] Updating system packages..."
apt-get update && apt-get upgrade -y
apt-get install -y \
    curl \
    wget \
    git \
    vim \
    htop \
    tmux \
    net-tools \
    lsof \
    strace \
    tcpdump \
    iotop \
    sysstat \
    fail2ban \
    ufw

# 2. Set Hostname
echo "[2/8] Setting hostname..."
hostnamectl set-hostname "$SERVER_HOSTNAME"
# Map the hostname to 127.0.1.1 (Debian convention) rather than adding a second localhost line
grep -qw "$SERVER_HOSTNAME" /etc/hosts || echo "127.0.1.1 $SERVER_HOSTNAME" >> /etc/hosts

# 3. Configure Timezone
echo "[3/8] Configuring timezone..."
timedatectl set-timezone "$TIMEZONE"

# 4. Create Admin User
echo "[4/8] Creating admin user..."
if ! id "$ADMIN_USER" &>/dev/null; then
    useradd -m -s /bin/bash "$ADMIN_USER"
    usermod -aG sudo "$ADMIN_USER"
    echo "$ADMIN_USER ALL=(ALL) NOPASSWD:ALL" > "/etc/sudoers.d/$ADMIN_USER"
    chmod 440 "/etc/sudoers.d/$ADMIN_USER"
fi

# 5. Configure SSH
# WARNING: install the admin user's public key in ~/.ssh/authorized_keys before
# this step. Once password and root logins are disabled, a missing key locks you out.
echo "[5/8] Hardening SSH configuration..."
# Match the directive whether or not it is commented out, and whatever its current value
sed -i -E 's/^#?PermitRootLogin .*/PermitRootLogin no/' /etc/ssh/sshd_config
sed -i -E 's/^#?PasswordAuthentication .*/PasswordAuthentication no/' /etc/ssh/sshd_config
sed -i -E "s/^#?Port .*/Port $SSH_PORT/" /etc/ssh/sshd_config
echo "AllowUsers $ADMIN_USER" >> /etc/ssh/sshd_config
sshd -t                  # validate the config before restarting
systemctl restart ssh    # unit is "ssh" on Debian/Ubuntu, "sshd" on RHEL-family

# 6. Configure Firewall
echo "[6/8] Configuring firewall..."
ufw default deny incoming
ufw default allow outgoing
ufw allow "$SSH_PORT"/tcp
ufw allow 80/tcp
ufw allow 443/tcp
ufw --force enable

# 7. Configure Fail2Ban
echo "[7/8] Configuring Fail2Ban..."
# Unquoted heredoc so $SSH_PORT expands; action_mwl needs a working mail setup
cat > /etc/fail2ban/jail.local << EOF
[DEFAULT]
bantime = 3600
findtime = 600
maxretry = 5
destemail = admin@example.com
sendername = Fail2Ban
action = %(action_mwl)s

[sshd]
enabled = true
port = $SSH_PORT
filter = sshd
logpath = /var/log/auth.log
maxretry = 3
EOF

systemctl enable fail2ban
systemctl start fail2ban

# 8. Configure System Monitoring
echo "[8/8] Installing monitoring tools..."
apt-get install -y prometheus-node-exporter
systemctl enable prometheus-node-exporter
systemctl start prometheus-node-exporter

echo "=== Setup Complete ==="
echo "Next steps:"
echo "1. Keep this session open and confirm the SSH key for $ADMIN_USER is installed"
echo "2. Test SSH connection on port $SSH_PORT from a second terminal"
echo "3. Reboot server"
```

### 2. Service Management

#### Systemd Unit Files

**Web Application Service**

```ini
# /etc/systemd/system/webapp.service
[Unit]
Description=Web Application Service
After=network.target postgresql.service redis.service
Wants=postgresql.service redis.service
# Restart rate limiting belongs in [Unit] on current systemd
StartLimitIntervalSec=200
StartLimitBurst=5

[Service]
Type=simple
User=webapp
Group=webapp
WorkingDirectory=/opt/webapp
Environment="NODE_ENV=production"
Environment="PORT=3000"
EnvironmentFile=/opt/webapp/.env
ExecStart=/usr/bin/node /opt/webapp/server.js
ExecReload=/bin/kill -HUP $MAINPID
ExecStop=/bin/kill -SIGTERM $MAINPID
Restart=always
RestartSec=10

# Security
NoNewPrivileges=true
PrivateTmp=true
ProtectSystem=strict
ProtectHome=true
ReadWritePaths=/opt/webapp/logs /opt/webapp/uploads
CapabilityBoundingSet=CAP_NET_BIND_SERVICE
AmbientCapabilities=CAP_NET_BIND_SERVICE

# Logging
StandardOutput=journal
StandardError=journal
SyslogIdentifier=webapp

[Install]
WantedBy=multi-user.target
```

**Database Service (PostgreSQL)**

```ini
# /etc/systemd/system/postgresql-custom.service
[Unit]
Description=PostgreSQL Database Server
After=network-online.target
Wants=network-online.target

[Service]
Type=notify
User=postgres
Group=postgres
Environment=PGDATA=/var/lib/postgresql/data
Environment=PGPORT=5432
ExecStart=/usr/bin/postgres -D ${PGDATA}
ExecReload=/bin/kill -HUP $MAINPID
ExecStop=/usr/bin/pg_ctl stop -D ${PGDATA} -m fast
TimeoutSec=300
OOMScoreAdjust=-1000

# Performance
LimitNOFILE=65536
LimitMEMLOCK=infinity

# Security
ProtectHome=true
ProtectSystem=strict
ReadWritePaths=/var/lib/postgresql /var/run/postgresql
PrivateDevices=true

[Install]
WantedBy=multi-user.target
```

**Background Worker Service**

```ini
# /etc/systemd/system/worker.service
[Unit]
Description=Background Job Worker
After=network.target redis.service
Wants=redis.service

[Service]
# Type=forking is most reliable when paired with PIDFile= so systemd can
# identify the main process; add one if the worker writes a PID file.
Type=forking
User=worker
Group=worker
WorkingDirectory=/opt/worker
Environment="QUEUE=default"
Environment="CONCURRENCY=4"
EnvironmentFile=/opt/worker/.env
ExecStart=/usr/bin/python3 -m worker start --daemon
ExecStop=/usr/bin/python3 -m worker stop
ExecReload=/usr/bin/python3 -m worker reload
Restart=on-failure
RestartSec=5

# Resource Limits
MemoryMax=2G
CPUQuota=200%

[Install]
WantedBy=multi-user.target
```

#### Service Management Commands

```bash
# Service Operations
systemctl start service-name
systemctl stop service-name
systemctl restart service-name
systemctl reload service-name
systemctl status service-name

# Enable/Disable Services
systemctl enable service-name      # Enable at boot
systemctl disable service-name     # Disable at boot
systemctl is-enabled service-name

# View Service Details
systemctl show service-name
journalctl -u service-name -f
journalctl -u service-name --since "1 hour ago"

# List Services
systemctl list-units --type=service
systemctl list-units --type=service --state=running
systemctl list-units --type=service --all

# Service Dependencies
systemctl list-dependencies service-name
systemctl list-dependencies --reverse service-name

# Resource Usage
systemd-cgtop
systemctl show service-name -p CPUUsageNSec,MemoryCurrent
```

### 3. Network Configuration

#### Firewall Configuration (UFW)

```bash
#!/bin/bash
# Firewall Setup Script

# Reset firewall
ufw --force reset

# Default policies
ufw default deny incoming
ufw default allow outgoing
ufw default deny routed

# Allow loopback
ufw allow in on lo

# Established/related connections need no rule here: UFW's built-in
# before.rules already accept them, and "ufw allow established" is not valid syntax.

# SSH (custom port), rate limited
ufw limit 2222/tcp comment 'SSH (rate limited)'

# HTTP/HTTPS
ufw allow 80/tcp comment 'HTTP'
ufw allow 443/tcp comment 'HTTPS'

# Database (internal only)
ufw allow from 10.0.0.0/8 to any port 5432 proto tcp comment 'PostgreSQL internal'

# Monitoring
ufw allow from 10.0.0.0/8 to any port 9100 proto tcp comment 'Node Exporter'

# Enable firewall
ufw --force enable

# Show rules
ufw status numbered
```

#### iptables Configuration

```bash
#!/bin/bash
# Advanced iptables Configuration

# Flush existing rules
iptables -F
iptables -X
iptables -t nat -F
iptables -t nat -X
iptables -t mangle -F
iptables -t mangle -X

# Default policies
iptables -P INPUT DROP
iptables -P FORWARD DROP
iptables -P OUTPUT ACCEPT

# Allow loopback
iptables -A INPUT -i lo -j ACCEPT
iptables -A OUTPUT -o lo -j ACCEPT

# Allow established connections
iptables -A INPUT -m conntrack --ctstate ESTABLISHED,RELATED -j ACCEPT

# Drop invalid packets
iptables -A INPUT -m conntrack --ctstate INVALID -j DROP

# Anti-spoofing
iptables -A INPUT -s 127.0.0.0/8 ! -i lo -j DROP
iptables -A INPUT -s 0.0.0.0/8 -j DROP
iptables -A INPUT -s 224.0.0.0/4 -j DROP

# Port scan protection
iptables -A INPUT -p tcp --tcp-flags ALL NONE -j DROP
iptables -A INPUT -p tcp --tcp-flags ALL ALL -j DROP

# SYN flood protection: rate-limit new connections in a dedicated chain.
# RETURN (not ACCEPT) hands packets under the limit back to the port rules
# below, so this check never opens a port by itself.
iptables -N SYN_FLOOD
iptables -A SYN_FLOOD -m limit --limit 10/s --limit-burst 20 -j RETURN
iptables -A SYN_FLOOD -j DROP
iptables -A INPUT -p tcp --syn -j SYN_FLOOD

# ICMP protection
iptables -A INPUT -p icmp --icmp-type echo-request -m limit --limit 1/s -j ACCEPT
iptables -A INPUT -p icmp --icmp-type echo-request -j DROP

# Allow SSH
iptables -A INPUT -p tcp --dport 2222 -m conntrack --ctstate NEW -m limit --limit 3/min --limit-burst 3 -j ACCEPT
iptables -A INPUT -p tcp --dport 2222 -j DROP

# Allow HTTP/HTTPS
iptables -A INPUT -p tcp --dport 80 -j ACCEPT
iptables -A INPUT -p tcp --dport 443 -j ACCEPT

# Log dropped packets
iptables -A INPUT -j LOG --log-prefix "[DROPPED] " --log-level 4

# Save rules
mkdir -p /etc/iptables
iptables-save > /etc/iptables/rules.v4
```

#### Nginx Configuration

```nginx
# /etc/nginx/nginx.conf
user nginx;    # www-data on Debian/Ubuntu distribution packages
worker_processes auto;
worker_rlimit_nofile 65535;
error_log /var/log/nginx/error.log warn;
pid /var/run/nginx.pid;

events {
    worker_connections 4096;
    use epoll;
    multi_accept on;
}

http {
    include /etc/nginx/mime.types;
    default_type application/octet-stream;

    # Logging
    log_format main '$remote_addr - $remote_user [$time_local] "$request" '
                    '$status $body_bytes_sent "$http_referer" '
                    '"$http_user_agent" "$http_x_forwarded_for"';
    access_log /var/log/nginx/access.log main;

    # Performance
    sendfile on;
    tcp_nopush on;
    tcp_nodelay on;
    keepalive_timeout 65;
    types_hash_max_size 2048;
    server_tokens off;

    # Buffers
    client_body_buffer_size 128k;
    client_max_body_size 100m;
    client_header_buffer_size 1k;
    large_client_header_buffers 4 4k;
    output_buffers 1 32k;
    postpone_output 1460;

    # Gzip
    gzip on;
    gzip_vary on;
    gzip_proxied any;
    gzip_comp_level 6;
    gzip_types text/plain text/css text/xml text/javascript
               application/json application/javascript
               application/xml+rss application/rss+xml
               font/truetype font/opentype
               application/vnd.ms-fontobject image/svg+xml;
    gzip_disable "msie6";

    # Security headers
    add_header X-Frame-Options "SAMEORIGIN" always;
    add_header X-Content-Type-Options "nosniff" always;
    add_header X-XSS-Protection "1; mode=block" always;
    add_header Referrer-Policy "no-referrer-when-downgrade" always;
    add_header Content-Security-Policy "default-src 'self' http: https: data: blob: 'unsafe-inline'" always;

    # Rate limiting
    limit_req_zone $binary_remote_addr zone=api:10m rate=10r/s;
    limit_req_zone $binary_remote_addr zone=general:10m rate=5r/s;
    limit_conn_zone $binary_remote_addr zone=addr:10m;

    # Include site configs
    include /etc/nginx/conf.d/*.conf;
    include /etc/nginx/sites-enabled/*;
}
```

**Application Server Configuration**

```nginx
# /etc/nginx/sites-available/app.conf
upstream app_backend {
    least_conn;
    server 127.0.0.1:3000 max_fails=3 fail_timeout=30s;
    server 127.0.0.1:3001 max_fails=3 fail_timeout=30s;
    server 127.0.0.1:3002 max_fails=3 fail_timeout=30s;
    keepalive 32;
}

# Rate limiting
limit_req_zone $binary_remote_addr zone=api_limit:10m rate=20r/s;

server {
    listen 80;
    listen [::]:80;
    server_name app.example.com;

    # Redirect to HTTPS
    return 301 https://$server_name$request_uri;
}

server {
    listen 443 ssl http2;
    listen [::]:443 ssl http2;
    server_name app.example.com;

    # SSL Configuration
    ssl_certificate /etc/ssl/certs/app.example.com.crt;
    ssl_certificate_key /etc/ssl/private/app.example.com.key;
    ssl_protocols TLSv1.2 TLSv1.3;
    ssl_ciphers 'ECDHE-ECDSA-AES128-GCM-SHA256:ECDHE-RSA-AES128-GCM-SHA256:ECDHE-ECDSA-AES256-GCM-SHA384:ECDHE-RSA-AES256-GCM-SHA384';
    ssl_prefer_server_ciphers on;
    ssl_session_cache shared:SSL:10m;
    ssl_session_timeout 10m;
    ssl_stapling on;
    ssl_stapling_verify on;

    # Security headers
    add_header Strict-Transport-Security "max-age=31536000; includeSubDomains" always;

    # Root
    root /var/www/app;
    index index.html;

    # Client upload size
    client_max_body_size 100M;

    # Logging
    access_log /var/log/nginx/app.access.log;
    error_log /var/log/nginx/app.error.log;

    # Static files
    location /static/ {
        alias /var/www/app/static/;
        expires 1y;
        add_header Cache-Control "public, immutable";
        access_log off;
    }

    # Media files
    location /media/ {
        alias /var/www/app/media/;
        expires 30d;
        add_header Cache-Control "public";
    }

    # API endpoints
    location /api/ {
        limit_req zone=api_limit burst=40 nodelay;
        proxy_pass http://app_backend;
        proxy_http_version 1.1;
        # An empty Connection header lets the upstream keepalive pool be reused
        proxy_set_header Connection "";
        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
        proxy_set_header X-Forwarded-Proto $scheme;

        # Timeouts
        proxy_connect_timeout 60s;
        proxy_send_timeout 60s;
        proxy_read_timeout 60s;
    }

    # WebSocket
    location /ws/ {
        proxy_pass http://app_backend;
        proxy_http_version 1.1;
        proxy_set_header Upgrade $http_upgrade;
        proxy_set_header Connection "upgrade";
        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;
        proxy_read_timeout 86400;
    }

    # Health check
    location /health {
        access_log off;
        # default_type sets the response type; add_header would emit a second Content-Type
        default_type text/plain;
        return 200 "healthy\n";
    }

    # Favicon
    location = /favicon.ico {
        access_log off;
        log_not_found off;
    }

    # Robots.txt
    location = /robots.txt {
        access_log off;
        log_not_found off;
    }

    # Deny access to hidden files
    location ~ /\. {
        deny all;
        access_log off;
        log_not_found off;
    }
}
```

### 4. Storage Management

#### LVM Configuration

```bash
#!/bin/bash
# LVM Setup Script
# Assumes /dev/sdb is an empty disk; confirm with lsblk before running.

VG_NAME="vg01"
LV_DATA="lv-data"
LV_LOGS="lv-logs"
LV_BACKUP="lv-backup"

# Create physical volume
pvcreate /dev/sdb

# Create volume group
vgcreate "$VG_NAME" /dev/sdb

# Create logical volumes
lvcreate -L 100G -n "$LV_DATA" "$VG_NAME"
lvcreate -L 50G -n "$LV_LOGS" "$VG_NAME"
lvcreate -L 200G -n "$LV_BACKUP" "$VG_NAME"

# Create filesystems
mkfs.xfs "/dev/$VG_NAME/$LV_DATA"
mkfs.xfs "/dev/$VG_NAME/$LV_LOGS"
mkfs.ext4 "/dev/$VG_NAME/$LV_BACKUP"

# Create mount points
mkdir -p /data /logs /backup

# Update /etc/fstab
cat >> /etc/fstab << EOF
/dev/$VG_NAME/$LV_DATA /data xfs defaults,noatime 0 2
/dev/$VG_NAME/$LV_LOGS /logs xfs defaults,noatime 0 2
/dev/$VG_NAME/$LV_BACKUP /backup ext4 defaults,noatime 0 2
EOF

# Mount filesystems
mount -a

# Verify
df -h
vgdisplay
lvdisplay
```

#### RAID Configuration

```bash
#!/bin/bash
# RAID 10 Setup Script

# Install mdadm
apt-get install -y mdadm

# Create RAID 10 array
mdadm --create /dev/md0 \
    --level=10 \
    --raid-devices=4 \
    /dev/sdb /dev/sdc /dev/sdd /dev/sde

# Create filesystem
mkfs.ext4 /dev/md0

# Create mount point
mkdir -p /raid

# Update /etc/fstab
echo "/dev/md0 /raid ext4 defaults,noatime 0 2" >> /etc/fstab

# Mount
mount /raid

# Save RAID configuration and rebuild the initramfs so the array assembles at boot
mdadm --detail --scan >> /etc/mdadm/mdadm.conf
update-initramfs -u

# Verify
cat /proc/mdstat
mdadm --detail /dev/md0
```

### 5. Monitoring and Logging

#### Prometheus Configuration

```yaml
# /etc/prometheus/prometheus.yml
global:
  scrape_interval: 15s
  evaluation_interval: 15s
  external_labels:
    cluster: 'production'
    replica: '1'

# Alertmanager configuration
alerting:
  alertmanagers:
    - static_configs:
        - targets:
            - localhost:9093

# Rule files
rule_files:
  - '/etc/prometheus/rules/*.yml'

# Scrape configurations
scrape_configs:
  # Prometheus itself
  - job_name: 'prometheus'
    static_configs:
      - targets: ['localhost:9090']

  # Node Exporter
  - job_name: 'node'
    static_configs:
      - targets: ['localhost:9100']
    relabel_configs:
      - source_labels: [__address__]
        target_label: instance
        replacement: 'web-server-01'

  # Nginx Exporter
  - job_name: 'nginx'
    static_configs:
      - targets: ['localhost:9113']

  # PostgreSQL Exporter
  - job_name: 'postgresql'
    static_configs:
      - targets: ['localhost:9187']

  # Application metrics
  - job_name: 'application'
    static_configs:
      - targets: ['localhost:3000']
    metrics_path: '/metrics'
```

**Alert Rules**

```yaml
# /etc/prometheus/rules/alerts.yml
groups:
  - name: system
    interval: 30s
    rules:
      # CPU usage (rate is steadier than irate for alerting)
      - alert: HighCPUUsage
        expr: 100 - (avg by(instance) (rate(node_cpu_seconds_total{mode="idle"}[5m])) * 100) > 80
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "High CPU usage on {{ $labels.instance }}"
          description: "CPU usage is {{ $value }}%"

      # Memory usage
      - alert: HighMemoryUsage
        expr: (1 - (node_memory_MemAvailable_bytes / node_memory_MemTotal_bytes)) * 100 > 90
        for: 5m
        labels:
          severity: critical
        annotations:
          summary: "High memory usage on {{ $labels.instance }}"
          description: "Memory usage is {{ $value }}%"

      # Disk space
      - alert: LowDiskSpace
        expr: (node_filesystem_avail_bytes{mountpoint="/"} / node_filesystem_size_bytes{mountpoint="/"}) * 100 < 10
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "Low disk space on {{ $labels.instance }}"
          description: "Only {{ $value }}% free space on /"

  - name: application
    interval: 30s
    rules:
      # HTTP error rate (sum both sides so 5xx series are divided by all requests,
      # not matched label-for-label against themselves)
      - alert: HighErrorRate
        expr: (sum(rate(http_requests_total{status=~"5.."}[5m])) / sum(rate(http_requests_total[5m]))) * 100 > 5
        for: 5m
        labels:
          severity: critical
        annotations:
          summary: "High HTTP error rate"
          description: "Error rate is {{ $value }}%"

      # Response time
      - alert: SlowResponseTime
        expr: histogram_quantile(0.95, rate(http_request_duration_seconds_bucket[5m])) > 1
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "Slow response time"
          description: "P95 response time is {{ $value }}s"
```

#### Log Rotation

```bash
# /etc/logrotate.d/custom
# copytruncate is omitted: it makes "create" a no-op, and the reload below
# already has the application reopen its log file.
/var/log/myapp/*.log {
    daily
    missingok
    rotate 14
    compress
    delaycompress
    notifempty
    create 0640 www-data adm
    sharedscripts
    postrotate
        systemctl reload myapp > /dev/null 2>&1 || true
    endscript
}
```

### 6. Security Hardening

#### SSH Hardening

```bash
# /etc/ssh/sshd_config
# After editing, validate and restart from a shell:
#   sshd -t && systemctl restart sshd

# Basic
Port 2222
HostKey /etc/ssh/ssh_host_ed25519_key
HostKey /etc/ssh/ssh_host_rsa_key

# Logging
SyslogFacility AUTHPRIV
LogLevel VERBOSE

# Authentication
LoginGraceTime 60
PermitRootLogin no
StrictModes yes
MaxAuthTries 3
MaxStartups 10:30:60
PasswordAuthentication no
PermitEmptyPasswords no
ChallengeResponseAuthentication no

# Key-based authentication
PubkeyAuthentication yes
AuthorizedKeysFile .ssh/authorized_keys

# Security
UsePAM yes
X11Forwarding no
AllowTcpForwarding no
AllowAgentForwarding no
PermitTunnel no
GatewayPorts no

# Restrict users
AllowUsers admin deploy
DenyUsers root

# Ciphers
Ciphers chacha20-poly1305@openssh.com,aes256-gcm@openssh.com,aes128-gcm@openssh.com

# MACs
MACs hmac-sha2-512-etm@openssh.com,hmac-sha2-256-etm@openssh.com

# Kex
KexAlgorithms curve25519-sha256@libssh.org,diffie-hellman-group-exchange-sha256
```

#### System Hardening Script

```bash
#!/bin/bash
# System Security Hardening

set -euo pipefail

echo "=== System Security Hardening ==="

# 1. Disable unused filesystems
# Note: snap packages depend on squashfs; drop that line on hosts that use snaps.
echo "[1/10] Disabling unused filesystems..."
cat > /etc/modprobe.d/disable-filesystems.conf << 'EOF'
install cramfs /bin/true
install freevxfs /bin/true
install jffs2 /bin/true
install hfsplus /bin/true
install squashfs /bin/true
install udf /bin/true
EOF

# 2. Kernel hardening
echo "[2/10] Configuring kernel parameters..."
cat > /etc/sysctl.d/99-security.conf << 'EOF'
# Network
net.ipv4.ip_forward = 0
net.ipv4.conf.all.send_redirects = 0
net.ipv4.conf.default.send_redirects = 0
net.ipv4.conf.all.accept_source_route = 0
net.ipv4.conf.default.accept_source_route = 0
net.ipv4.conf.all.accept_redirects = 0
net.ipv4.conf.default.accept_redirects = 0
net.ipv4.conf.all.secure_redirects = 0
net.ipv4.conf.default.secure_redirects = 0
net.ipv4.conf.all.rp_filter = 1
net.ipv4.conf.default.rp_filter = 1
net.ipv4.icmp_echo_ignore_broadcasts = 1
net.ipv4.icmp_ignore_bogus_error_responses = 1
net.ipv4.tcp_syncookies = 1
net.ipv4.tcp_max_syn_backlog = 2048
net.ipv4.tcp_synack_retries = 2
net.ipv4.tcp_syn_retries = 5

# IPv6
net.ipv6.conf.all.accept_redirects = 0
net.ipv6.conf.default.accept_redirects = 0

# Kernel
kernel.randomize_va_space = 2
kernel.kptr_restrict = 2
kernel.dmesg_restrict = 1
kernel.perf_event_paranoid = 2

# Core dumps
fs.suid_dumpable = 0
kernel.core_pattern = |/bin/false
EOF

sysctl -p /etc/sysctl.d/99-security.conf

# 3. Restrict core dumps
echo "[3/10] Restricting core dumps..."
cat > /etc/security/limits.d/50-core.conf << 'EOF'
* hard core 0
EOF

# 4. Disable USB storage
echo "[4/10] Disabling USB storage..."
echo "install usb-storage /bin/true" > /etc/modprobe.d/disable-usb.conf

# 5. Configure PAM
# The stack must not contain an unconditional pam_deny.so (it rejects every
# login) or pam_rootok.so (login runs as root, so it would skip the password).
echo "[5/10] Configuring PAM..."
cp -a /etc/pam.d/login /etc/pam.d/login.bak
cat > /etc/pam.d/login << 'EOF'
auth required pam_unix.so
account required pam_unix.so
password required pam_unix.so sha512 shadow
session required pam_unix.so
session required pam_limits.so
EOF

# 6. Disable unnecessary services
echo "[6/10] Disabling unnecessary services..."
systemctl disable cups 2>/dev/null || true
systemctl disable avahi-daemon 2>/dev/null || true
systemctl disable bluetooth 2>/dev/null || true

# 7. Install security updates
echo "[7/10] Installing security updates..."
apt-get install -y unattended-upgrades
dpkg-reconfigure -plow unattended-upgrades

# 8. Configure automatic updates
echo "[8/10] Configuring automatic updates..."
cat > /etc/apt/apt.conf.d/50unattended-upgrades << 'EOF'
Unattended-Upgrade::Allowed-Origins {
    "${distro_id}:${distro_codename}-security";
};
Unattended-Upgrade::AutoFixInterruptedDpkg "true";
Unattended-Upgrade::MinimalSteps "true";
Unattended-Upgrade::Remove-Unused-Kernel-Packages "true";
Unattended-Upgrade::Remove-Unused-Dependencies "true";
Unattended-Upgrade::Automatic-Reboot "false";
Unattended-Upgrade::Automatic-Reboot-Time "02:00";
EOF

# 9. Install auditd
echo "[9/10] Installing audit daemon..."
apt-get install -y auditd audispd-plugins
systemctl enable auditd
systemctl start auditd

# 10. Configure audit rules
echo "[10/10] Configuring audit rules..."
cat > /etc/audit/rules.d/audit.rules << 'EOF'
-w /etc/passwd -p wa -k identity
-w /etc/group -p wa -k identity
-w /etc/shadow -p wa -k identity
-w /etc/sudoers -p wa -k sudoers
-w /var/log/audit/ -p wa -k audit-logs
-w /etc/ssh/sshd_config -p wa -k sshd
-w /var/log/auth.log -p wa -k auth-log
EOF

systemctl restart auditd

echo "=== Hardening Complete ==="
```

### 7. Performance Tuning

#### System Performance Tuning

```bash
#!/bin/bash
# Performance Tuning Script

# 1. I/O Scheduler
# Current multi-queue kernels name this scheduler "mq-deadline"
echo "[1/5] Configuring I/O scheduler..."
echo 'ACTION=="add|change", KERNEL=="sd[a-z]", ATTR{queue/scheduler}="mq-deadline"' \
    > /etc/udev/rules.d/60-scheduler.rules

# 2. Filesystem optimization
echo "[2/5] Optimizing filesystem..."
# Never overwrite /etc/fstab. Edit the existing root entry so its options
# include noatime (which already implies nodiratime), for example:
#   UUID=xxx / ext4 defaults,noatime 0 1
# Then apply the option without a reboot:
mount -o remount,noatime /

# 3. Network tuning
echo "[3/5] Tuning network parameters..."
cat > /etc/sysctl.d/99-performance.conf << 'EOF'
# TCP
net.core.rmem_max = 16777216
net.core.wmem_max = 16777216
net.ipv4.tcp_rmem = 4096 87380 16777216
net.ipv4.tcp_wmem = 4096 65536 16777216
net.ipv4.tcp_fin_timeout = 30
net.ipv4.tcp_keepalive_time = 120
net.ipv4.tcp_tw_reuse = 1
net.ipv4.tcp_max_syn_backlog = 4096
net.core.somaxconn = 1024
net.ipv4.tcp_max_tw_buckets = 400000
net.ipv4.tcp_no_metrics_save = 1
net.core.netdev_max_backlog = 16384
EOF

sysctl -p /etc/sysctl.d/99-performance.conf

# 4. Process limits
echo "[4/5] Configuring process limits..."
cat > /etc/security/limits.d/50-performance.conf << 'EOF'
* soft nofile 65536
* hard nofile 65536
* soft nproc 65536
* hard nproc 65536
* soft memlock unlimited
* hard memlock unlimited
EOF

# 5. Swapping
echo "[5/5] Configuring swap..."
sysctl vm.swappiness=10
echo "vm.swappiness=10" >> /etc/sysctl.conf

echo "=== Performance Tuning Complete ==="
```

### 8. Container Management

#### Docker Daemon Configuration

Save the following as `/etc/docker/daemon.json`. JSON does not allow comments, so the path is noted here rather than inside the file.

```json
{
  "log-driver": "json-file",
  "log-opts": {
    "max-size": "10m",
    "max-file": "3"
  },
  "storage-driver": "overlay2",
  "live-restore": true,
  "userland-proxy": false,
  "no-new-privileges": true,
  "icc": false,
  "default-ulimits": {
    "nofile": {
      "Name": "nofile",
      "Hard": 64000,
      "Soft": 64000
    }
  },
  "registry-mirrors": [
    "https://mirror.gcr.io"
  ],
  "metrics-addr": "127.0.0.1:9323",
  "experimental": false
}
```

#### Docker Compose Production Template

```yaml
# docker-compose.prod.yml
version: '3.8'

services:
  app:
    image: myapp:${VERSION:-latest}
    restart: always
    deploy:
      resources:
        limits:
          cpus: '2'
          memory: 2G
        reservations:
          cpus: '1'
          memory: 1G
    environment:
      - NODE_ENV=production
      # The host must match the database service name defined below
      - DATABASE_URL=postgres://postgres:5432/app
    env_file:
      - .env.prod
    volumes:
      - app-uploads:/app/uploads
      - app-logs:/app/logs
    networks:
      - frontend
      - backend
    healthcheck:
      test: ["CMD", "wget", "-q", "--spider", "http://localhost:3000/health"]
      interval: 30s
      timeout: 10s
      retries: 3
      start_period: 40s
    logging:
      driver: "json-file"
      options:
        max-size: "10m"
        max-file: "3"

  nginx:
    image: nginx:alpine
    restart: always
    ports:
      - "80:80"
      - "443:443"
    volumes:
      - ./nginx.conf:/etc/nginx/nginx.conf:ro
      - ./ssl:/etc/nginx/ssl:ro
      - app-static:/app/static:ro
    networks:
      - frontend
    depends_on:
      - app

  postgres:
    image: postgres:15-alpine
    restart: always
    environment:
      - POSTGRES_DB=app
      - POSTGRES_USER=app
      - POSTGRES_PASSWORD_FILE=/run/secrets/db_password
    secrets:
      - db_password
    volumes:
      - postgres-data:/var/lib/postgresql/data
    networks:
      - backend
    healthcheck:
      test: ["CMD-SHELL", "pg_isready -U app -d app"]
      interval: 10s
      timeout: 5s
      retries: 5

  redis:
    image: redis:7-alpine
    restart: always
    command: redis-server --appendonly yes
    volumes:
      - redis-data:/data
    networks:
      - backend

volumes:
  app-uploads:
  app-logs:
  app-static:
  postgres-data:
  redis-data:

networks:
  frontend:
    driver: bridge
  backend:
    driver: bridge
    internal: true

secrets:
  db_password:
    file: ./secrets/db_password.txt
```

### 9. Backup Strategies

#### Automated Backup Script

```bash
#!/bin/bash
# Automated Backup Script

set -euo pipefail

# Configuration
BACKUP_DIR="/backup"
RETENTION_DAYS=30
S3_BUCKET="s3://my-backups"
DATE=$(date +%Y%m%d_%H%M%S)

# Create backup directory
mkdir -p "$BACKUP_DIR/$DATE"

# 1. Database backup
echo "Backing up database..."
pg_dump -U postgres -h localhost mydb | gzip > "$BACKUP_DIR/$DATE/database.sql.gz"

# 2. Files backup
echo "Backing up files..."
tar -czf "$BACKUP_DIR/$DATE/files.tar.gz" /var/www/app /etc/nginx /etc/ssh

# 3. Configuration backup
echo "Backing up configurations..."
tar -czf "$BACKUP_DIR/$DATE/config.tar.gz" /etc /opt

# 4. Upload to S3
echo "Uploading to S3..."
aws s3 sync "$BACKUP_DIR/$DATE" "$S3_BUCKET/$DATE/"

# 5. Clean old backups
echo "Cleaning old backups..."
# Only consider the dated directories directly under $BACKUP_DIR
find "$BACKUP_DIR" -mindepth 1 -maxdepth 1 -type d -mtime +"$RETENTION_DAYS" -exec rm -rf {} +

cutoff_date=$(date -d "$RETENTION_DAYS days ago" +%s)
aws s3 ls "$S3_BUCKET/" | while read -r line; do
    dir_name=$(echo "$line" | awk '{print $2}' | tr -d '/')
    [ -n "$dir_name" ] || continue
    # Names look like YYYYMMDD_HHMMSS; parse only the date part and skip
    # anything unparseable instead of treating it as infinitely old.
    dir_date=$(date -d "${dir_name%%_*}" +%s 2>/dev/null) || continue
    if [ "$dir_date" -lt "$cutoff_date" ]; then
        aws s3 rm "$S3_BUCKET/$dir_name/" --recursive
    fi
done

echo "Backup completed: $DATE"
```

### 10. Incident Response

#### Incident Response Checklist

```bash
#!/bin/bash
# Incident Response Script

echo "=== Incident Response Checklist ==="

# 1. Check system status
echo "[1/10] Checking system status..."
uptime
free -h
df -h

# 2. Check service status
echo "[2/10] Checking service status..."
systemctl list-units --type=service --state=failed

# 3. Check recent logs
echo "[3/10] Checking recent logs..."
journalctl -p err -n 50 --no-pager

# 4. Check network connections
echo "[4/10] Checking network connections..."
ss -tunap

# 5. Check CPU usage
echo "[5/10] Checking CPU usage..."
top -bn1 | head -20

# 6. Check memory usage
echo "[6/10] Checking memory usage..."
ps aux --sort=-%mem | head -10

# 7. Check disk I/O
echo "[7/10] Checking disk I/O..."
iotop -b -n 1 | head -20

# 8. Check failed login attempts
echo "[8/10] Checking failed logins..."
grep "Failed password" /var/log/auth.log | tail -20

# 9. Check firewall status
echo "[9/10] Checking firewall status..."
ufw status

# 10. Save system state
echo "[10/10] Saving system state..."
STATE_FILE="/tmp/incident-state-$(date +%Y%m%d_%H%M%S).txt"
{
    echo "=== System State ==="
    uptime
    free -h
    df -h
    echo ""
    echo "=== Failed Services ==="
    systemctl list-units --type=service --state=failed
    echo ""
    echo "=== Recent Errors ==="
    journalctl -p err -n 20 --no-pager
    echo ""
    echo "=== Network Connections ==="
    ss -tunap
} > "$STATE_FILE"

echo "=== Incident Response Complete ==="
echo "State saved to: $STATE_FILE"
```

### 11. Decision Trees

#### Monitoring Solution Selection

```
What to monitor?
│
├─ System metrics → Prometheus + Node Exporter
├─ Application metrics → Prometheus + Custom Exporter
├─ Logs → ELK Stack / Loki
├─ Tracing → Jaeger / Tempo
├─ Uptime → Uptime Robot / Pingdom
└─ Security → Wazuh / OSSEC
```

#### Backup Strategy Selection

```
Data type?
│
├─ Database → Logical backup (pg_dump/mysqldump) + Physical backup
├─ Files → Rsync + Snapshots
├─ Configuration → Git + Version control
├─ Entire server → Full disk image (AMIs, snapshots)
└─ Disaster recovery → Off-site replication + DR drills
```

### 12. Anti-Patterns to Avoid

1. **No Monitoring**: Never deploy without monitoring and alerting
2. **Weak Authentication**: Always use SSH keys, never password authentication
3. **No Backups**: Always implement a 3-2-1 backup strategy
4. **Ignoring Logs**: Review logs regularly to catch problems before they become incidents
5. **No Documentation**: Document all configurations and changes
6. **No Testing**: Test all changes in staging first
7. **No Rollback Plan**: Always have a rollback plan for changes
8. **No Security Updates**: Keep the system updated with security patches
9. **No Resource Limits**: Set limits on all services
10. **No Incident Response**: Have a plan before incidents occur

### 13. Quality Checklist

Before considering server management complete:

- [ ] All services have systemd unit files
- [ ] Firewall configured and enabled
- [ ] SSH hardened (key-based only)
- [ ] Automatic security updates configured
- [ ] Monitoring and alerting set up
- [ ] Log rotation configured
- [ ] Backup strategy implemented and tested
- [ ] Documentation up to date
- [ ] Incident response plan in place
- [ ] Resource limits configured
- [ ] SSL/TLS certificates valid
- [ ] Security hardening applied
- [ ] Performance tuning applied
- [ ] Disaster recovery tested
- [ ] Compliance requirements met
- [ ] Network segmentation implemented
- [ ] Vulnerability scanning performed
- [ ] Access controls implemented
- [ ] Change management process in place
- [ ] Automation implemented for repetitive tasks

This skill definition covers Linux server administration for production environments.
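
Checklist items such as "SSH hardened (key-based only)" are easier to keep true when they are checked by a script rather than by eye. The sketch below is one possible way to do that, not part of the skill itself: the function names `sshd_setting` and `check_ssh_hardening` are hypothetical, and the parsing assumes a single flat `sshd_config` file with no `Include` or `Match` blocks. On a live host, `sshd -T` prints the effective configuration and is the more reliable source.

```shell
#!/usr/bin/env bash
# Sketch: verify two SSH hardening settings in an sshd_config-style file.
# Function names are illustrative assumptions, not an existing tool.

# Print the value of a directive, lowercased. The first uncommented
# occurrence wins, which mirrors how sshd treats repeated keywords.
sshd_setting() {
    local file="$1" key="$2"
    awk -v key="$key" '
        tolower($1) == tolower(key) { print tolower($2); exit }
    ' "$file"
}

# Return 0 only when password and root logins are both explicitly disabled.
check_ssh_hardening() {
    local file="${1:-/etc/ssh/sshd_config}"
    local status=0 key value
    for key in PasswordAuthentication PermitRootLogin; do
        value="$(sshd_setting "$file" "$key")"
        if [ "$value" = "no" ]; then
            echo "PASS $key=no"
        else
            echo "FAIL $key=${value:-unset}"
            status=1
        fi
    done
    return "$status"
}
```

After sourcing the file, `check_ssh_hardening /etc/ssh/sshd_config` prints one PASS or FAIL line per directive and returns a non-zero status if either check fails, so it can gate a deployment step or a cron-driven audit. A commented-out directive is reported as unset on purpose, since relying on a compiled-in default is harder to audit than an explicit setting.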