Pony-Alpha-2-Dataset-Training/skills/skill-linux-server.md
Pony Alpha 2 68453089ee feat: initial Alpha Brain 2 dataset release
Massive training corpus for AI coding models containing:
- 10 JSONL training datasets (641+ examples across coding, reasoning, planning, architecture, communication, debugging, security, workflows, error handling, UI/UX)
- 11 agent behavior specifications (explorer, planner, reviewer, debugger, executor, UI designer, Linux admin, kernel engineer, security architect, automation engineer, API architect)
- 6 skill definition files (coding, API engineering, kernel, Linux server, security architecture, server automation, UI/UX)
- Master README with project origin story and philosophy

Built by Pony Alpha 2 to help AI models learn expert-level coding approaches.
2026-03-13 16:26:29 +04:00

# Linux Server Administration Expert Skill
## Activation Criteria
Activate this skill when the user:
- Requests Linux server setup, configuration, or maintenance
- Needs service management (systemd, init scripts)
- Requires networking configuration (firewalls, routing, DNS)
- Asks for storage management (LVM, RAID, filesystems)
- Needs monitoring and logging solutions
- Requires security hardening or compliance
- Requests performance tuning or optimization
- Needs container orchestration (Docker, Kubernetes)
- Asks for backup and disaster recovery strategies
- Requires incident response or troubleshooting
- Needs automation (Ansible, bash scripts, cron)
- Is managing: web servers, database servers, application servers, cloud instances
## Core Methodology
### 1. System Administration Methodology
#### Server Setup Workflow
```bash
#!/bin/bash
# Initial Server Setup Script
# Usage: sudo ./initial-setup.sh
set -euo pipefail
# Configuration
SERVER_HOSTNAME="${1:-web-server-01}"
TIMEZONE="${2:-UTC}"
ADMIN_USER="${3:-admin}"
SSH_PORT="${4:-22}"
echo "=== Initial Server Setup ==="
echo "Hostname: $SERVER_HOSTNAME"
echo "Timezone: $TIMEZONE"
echo "Admin User: $ADMIN_USER"
echo "SSH Port: $SSH_PORT"
# 1. Update System
echo "[1/8] Updating system packages..."
apt-get update && apt-get upgrade -y
apt-get install -y \
curl \
wget \
git \
vim \
htop \
tmux \
net-tools \
lsof \
strace \
tcpdump \
iotop \
sysstat \
fail2ban \
ufw
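# 2. Set Hostname
echo "[2/8] Setting hostname..."
hostnamectl set-hostname "$SERVER_HOSTNAME"
# Map the hostname to 127.0.1.1 (Debian convention) once, instead of appending a second localhost line on every run
grep -qE "[[:space:]]${SERVER_HOSTNAME}([[:space:]]|\$)" /etc/hosts || echo "127.0.1.1 $SERVER_HOSTNAME" >> /etc/hosts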
# 3. Configure Timezone
echo "[3/8] Configuring timezone..."
timedatectl set-timezone "$TIMEZONE"
# 4. Create Admin User
echo "[4/8] Creating admin user..."
if ! id "$ADMIN_USER" &>/dev/null; then
useradd -m -s /bin/bash "$ADMIN_USER"
usermod -aG sudo "$ADMIN_USER"
echo "$ADMIN_USER ALL=(ALL) NOPASSWD:ALL" > "/etc/sudoers.d/$ADMIN_USER"
chmod 440 "/etc/sudoers.d/$ADMIN_USER"
fi
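# 5. Configure SSH
echo "[5/8] Hardening SSH configuration..."
# WARNING: install a public key for $ADMIN_USER before running this step;
# password logins are disabled below and a key is then the only way in.
# The patterns match each directive whether it is commented out or already set.
sed -i -E 's/^#?PermitRootLogin .*/PermitRootLogin no/' /etc/ssh/sshd_config
sed -i -E 's/^#?PasswordAuthentication .*/PasswordAuthentication no/' /etc/ssh/sshd_config
sed -i -E "s/^#?Port .*/Port $SSH_PORT/" /etc/ssh/sshd_config
grep -q "^AllowUsers " /etc/ssh/sshd_config || echo "AllowUsers $ADMIN_USER" >> /etc/ssh/sshd_config
# Validate the file before restarting so a typo cannot take sshd down
sshd -t
systemctl restart sshd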
# 6. Configure Firewall
echo "[6/8] Configuring firewall..."
ufw default deny incoming
ufw default allow outgoing
ufw allow "$SSH_PORT"/tcp
ufw allow 80/tcp
ufw allow 443/tcp
ufw --force enable
# 7. Configure Fail2Ban
echo "[7/8] Configuring Fail2Ban..."
cat > /etc/fail2ban/jail.local << 'EOF'
[DEFAULT]
bantime = 3600
findtime = 600
maxretry = 5
destemail = admin@example.com
sendername = Fail2Ban
action = %(action_mwl)s
[sshd]
enabled = true
port = ssh
filter = sshd
logpath = /var/log/auth.log
maxretry = 3
EOF
systemctl enable fail2ban
systemctl start fail2ban
# 8. Configure System Monitoring
echo "[8/8] Installing monitoring tools..."
apt-get install -y prometheus-node-exporter
systemctl enable prometheus-node-exporter
systemctl start prometheus-node-exporter
echo "=== Setup Complete ==="
echo "Next steps:"
echo "1. Copy SSH key for $ADMIN_USER"
echo "2. Test SSH connection on port $SSH_PORT"
echo "3. Reboot server"
```
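Step 5 of the script edits `sshd_config` in place. The sketch below shows one way to make such an edit idempotent: a directive is rewritten if it exists (commented out or not) and appended only if it is missing. `set_option` is a hypothetical helper introduced for illustration, not part of OpenSSH, and it assumes the file has no `Match` blocks (an appended line would land inside the last one).

```bash
#!/usr/bin/env bash
set -euo pipefail

# set_option FILE KEY VALUE
# Hypothetical helper: ensure "KEY VALUE" appears exactly once in FILE.
set_option() {
    local file="$1" key="$2" value="$3"
    if grep -qE "^#?${key}[[:space:]]" "$file"; then
        # Replace the first matching line and drop any later duplicates
        awk -v k="$key" -v v="$value" '
            $0 ~ ("^#?" k "[ \t]") {
                if (!done) { print k " " v; done = 1 }
                next
            }
            { print }
        ' "$file" > "$file.tmp"
        mv "$file.tmp" "$file"
    else
        # Directive not present at all: append it
        printf '%s %s\n' "$key" "$value" >> "$file"
    fi
}
```

A call such as `set_option /etc/ssh/sshd_config PermitRootLogin no` can then be repeated safely; running `sshd -t` afterwards checks the result before any restart.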
### 2. Service Management
#### Systemd Unit Files
**Web Application Service**
```ini
# /etc/systemd/system/webapp.service
[Unit]
Description=Web Application Service
After=network.target postgresql.service redis.service
Wants=postgresql.service redis.service
# Start rate limiting belongs in [Unit]; StartLimitInterval= in [Service] is the legacy spelling
StartLimitIntervalSec=200
StartLimitBurst=5
[Service]
Type=simple
User=webapp
Group=webapp
WorkingDirectory=/opt/webapp
Environment="NODE_ENV=production"
Environment="PORT=3000"
EnvironmentFile=/opt/webapp/.env
ExecStart=/usr/bin/node /opt/webapp/server.js
ExecReload=/bin/kill -HUP $MAINPID
ExecStop=/bin/kill -SIGTERM $MAINPID
Restart=always
RestartSec=10
# Security
NoNewPrivileges=true
PrivateTmp=true
ProtectSystem=strict
ProtectHome=true
ReadWritePaths=/opt/webapp/logs /opt/webapp/uploads
CapabilityBoundingSet=CAP_NET_BIND_SERVICE
AmbientCapabilities=CAP_NET_BIND_SERVICE
# Logging
StandardOutput=journal
StandardError=journal
SyslogIdentifier=webapp
[Install]
WantedBy=multi-user.target
```
**Database Service (PostgreSQL)**
```ini
# /etc/systemd/system/postgresql-custom.service
[Unit]
Description=PostgreSQL Database Server
# Order after network-online.target; Wants= pulls the target in but does not wait for it
After=network-online.target
Wants=network-online.target
[Service]
Type=notify
User=postgres
Group=postgres
Environment=PGDATA=/var/lib/postgresql/data
Environment=PGPORT=5432
ExecStart=/usr/bin/postgres -D ${PGDATA}
ExecReload=/bin/kill -HUP $MAINPID
ExecStop=/usr/bin/pg_ctl stop -D ${PGDATA} -m fast
TimeoutSec=300
OOMScoreAdjust=-1000
# Performance
LimitNOFILE=65536
LimitMEMLOCK=infinity
# Security
ProtectHome=true
ProtectSystem=strict
ReadWritePaths=/var/lib/postgresql /var/run/postgresql
PrivateDevices=true
[Install]
WantedBy=multi-user.target
```
**Background Worker Service**
```ini
# /etc/systemd/system/worker.service
[Unit]
Description=Background Job Worker
After=network.target redis.service
Wants=redis.service
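[Service]
# Type=forking expects the process to daemonize; add PIDFile= pointing at the file the worker writes so systemd tracks the right main PID
Type=forking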
User=worker
Group=worker
WorkingDirectory=/opt/worker
Environment="QUEUE=default"
Environment="CONCURRENCY=4"
EnvironmentFile=/opt/worker/.env
ExecStart=/usr/bin/python3 -m worker start --daemon
ExecStop=/usr/bin/python3 -m worker stop
ExecReload=/usr/bin/python3 -m worker reload
Restart=on-failure
RestartSec=5
# Resource Limits
MemoryMax=2G
CPUQuota=200%
[Install]
WantedBy=multi-user.target
```
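The web application unit above carries a set of sandboxing directives that the worker unit lacks. A minimal sketch of a lint check for that follows; `check_unit_hardening` and its list of directives are assumptions chosen for illustration, taken from the `# Security` section of `webapp.service`.

```bash
#!/usr/bin/env bash
set -euo pipefail

# check_unit_hardening UNIT_FILE
# Hypothetical helper: print each expected hardening directive that the
# unit file does not set, and return 1 if any is missing.
check_unit_hardening() {
    local unit_file="$1" missing=0 directive
    for directive in NoNewPrivileges PrivateTmp ProtectSystem ProtectHome; do
        if ! grep -qE "^${directive}=" "$unit_file"; then
            echo "missing: ${directive}"
            missing=1
        fi
    done
    return "$missing"
}
```

For a fuller assessment of a loaded unit, `systemd-analyze security webapp.service` reports an exposure score per directive; the grep-based check is only a quick pre-commit guard for unit files kept in a repository.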
#### Service Management Commands
```bash
# Service Operations
systemctl start service-name
systemctl stop service-name
systemctl restart service-name
systemctl reload service-name
systemctl status service-name
# Enable/Disable Services
systemctl enable service-name # Enable at boot
systemctl disable service-name # Disable at boot
systemctl is-enabled service-name
# View Service Details
systemctl show service-name
journalctl -u service-name -f
journalctl -u service-name --since "1 hour ago"
# List Services
systemctl list-units --type=service
systemctl list-units --type=service --state=running
systemctl list-units --type=service --all
# Service Dependencies
systemctl list-dependencies service-name
systemctl list-dependencies --reverse service-name
# Resource Usage
systemd-cgtop
systemctl show service-name -p CPUUsageNSec,MemoryCurrent
```
### 3. Network Configuration
#### Firewall Configuration (UFW)
```bash
#!/bin/bash
# Firewall Setup Script
# Reset firewall
ufw --force reset
# Default policies
ufw default deny incoming
ufw default allow outgoing
ufw default deny routed
# Allow loopback
ufw allow in on lo
# Established/related connections: ufw already accepts these in its default
# before.rules; "ufw allow established" is not a valid rule and would fail
# SSH (custom port)
ufw allow 2222/tcp comment 'SSH'
# HTTP/HTTPS
ufw allow 80/tcp comment 'HTTP'
ufw allow 443/tcp comment 'HTTPS'
# Database (internal only)
ufw allow from 10.0.0.0/8 to any port 5432 proto tcp comment 'PostgreSQL internal'
# Monitoring
ufw allow from 10.0.0.0/8 to any port 9100 proto tcp comment 'Node Exporter'
# Rate limiting
ufw limit 2222/tcp comment 'Rate limit SSH'
# Enable firewall
ufw --force enable
# Show rules
ufw status numbered
```
#### iptables Configuration
```bash
#!/bin/bash
# Advanced iptables Configuration
# Flush existing rules
iptables -F
iptables -X
iptables -t nat -F
iptables -t nat -X
iptables -t mangle -F
iptables -t mangle -X
# Default policies
iptables -P INPUT DROP
iptables -P FORWARD DROP
iptables -P OUTPUT ACCEPT
# Allow loopback
iptables -A INPUT -i lo -j ACCEPT
iptables -A OUTPUT -o lo -j ACCEPT
# Allow established connections
iptables -A INPUT -m conntrack --ctstate ESTABLISHED,RELATED -j ACCEPT
# Drop invalid packets
iptables -A INPUT -m conntrack --ctstate INVALID -j DROP
# Anti-spoofing
iptables -A INPUT -s 127.0.0.0/8 ! -i lo -j DROP
iptables -A INPUT -s 0.0.0.0/8 -j DROP
iptables -A INPUT -s 224.0.0.0/4 -j DROP
# Port scan protection
iptables -A INPUT -p tcp --tcp-flags ALL NONE -j DROP
iptables -A INPUT -p tcp --tcp-flags ALL ALL -j DROP
# SYN flood protection: must run before the per-port ACCEPT rules and must only
# drop the excess. An ACCEPT here would open every TCP port to new connections.
iptables -N SYN_FLOOD
iptables -A SYN_FLOOD -m limit --limit 10/s --limit-burst 20 -j RETURN
iptables -A SYN_FLOOD -j DROP
iptables -A INPUT -p tcp --syn -j SYN_FLOOD
# Allow SSH
iptables -A INPUT -p tcp --dport 2222 -m conntrack --ctstate NEW -m limit --limit 3/min --limit-burst 3 -j ACCEPT
iptables -A INPUT -p tcp --dport 2222 -j DROP
# Allow HTTP/HTTPS
iptables -A INPUT -p tcp --dport 80 -j ACCEPT
iptables -A INPUT -p tcp --dport 443 -j ACCEPT
# ICMP protection
iptables -A INPUT -p icmp --icmp-type echo-request -m limit --limit 1/s -j ACCEPT
iptables -A INPUT -p icmp --icmp-type echo-request -j DROP
# Log dropped packets
iptables -A INPUT -j LOG --log-prefix "[DROPPED] " --log-level 4
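# Save rules (iptables-persistent normally creates this directory; create it if the package is absent)
mkdir -p /etc/iptables
iptables-save > /etc/iptables/rules.v4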
```
#### Nginx Configuration
```nginx
# /etc/nginx/nginx.conf
user nginx;
worker_processes auto;
worker_rlimit_nofile 65535;
error_log /var/log/nginx/error.log warn;
pid /var/run/nginx.pid;
events {
worker_connections 4096;
use epoll;
multi_accept on;
}
http {
include /etc/nginx/mime.types;
default_type application/octet-stream;
# Logging
log_format main '$remote_addr - $remote_user [$time_local] "$request" '
'$status $body_bytes_sent "$http_referer" '
'"$http_user_agent" "$http_x_forwarded_for"';
access_log /var/log/nginx/access.log main;
# Performance
sendfile on;
tcp_nopush on;
tcp_nodelay on;
keepalive_timeout 65;
types_hash_max_size 2048;
server_tokens off;
# Buffers
client_body_buffer_size 128k;
client_max_body_size 100m;
client_header_buffer_size 1k;
large_client_header_buffers 4 4k;
output_buffers 1 32k;
postpone_output 1460;
# Gzip
gzip on;
gzip_vary on;
gzip_proxied any;
gzip_comp_level 6;
gzip_types text/plain text/css text/xml text/javascript
application/json application/javascript application/xml+rss
application/rss+xml font/truetype font/opentype
application/vnd.ms-fontobject image/svg+xml;
gzip_disable "msie6";
# Security headers
add_header X-Frame-Options "SAMEORIGIN" always;
add_header X-Content-Type-Options "nosniff" always;
add_header X-XSS-Protection "1; mode=block" always;
add_header Referrer-Policy "no-referrer-when-downgrade" always;
add_header Content-Security-Policy "default-src 'self' http: https: data: blob: 'unsafe-inline'" always;
# Rate limiting
limit_req_zone $binary_remote_addr zone=api:10m rate=10r/s;
limit_req_zone $binary_remote_addr zone=general:10m rate=5r/s;
limit_conn_zone $binary_remote_addr zone=addr:10m;
# Include site configs
include /etc/nginx/conf.d/*.conf;
include /etc/nginx/sites-enabled/*;
}
```
**Application Server Configuration**
```nginx
# /etc/nginx/sites-available/app.conf
upstream app_backend {
least_conn;
server 127.0.0.1:3000 max_fails=3 fail_timeout=30s;
server 127.0.0.1:3001 max_fails=3 fail_timeout=30s;
server 127.0.0.1:3002 max_fails=3 fail_timeout=30s;
keepalive 32;
}
# Rate limiting
limit_req_zone $binary_remote_addr zone=api_limit:10m rate=20r/s;
server {
listen 80;
listen [::]:80;
server_name app.example.com;
# Redirect to HTTPS
return 301 https://$server_name$request_uri;
}
server {
listen 443 ssl http2;
listen [::]:443 ssl http2;
server_name app.example.com;
# SSL Configuration
ssl_certificate /etc/ssl/certs/app.example.com.crt;
ssl_certificate_key /etc/ssl/private/app.example.com.key;
ssl_protocols TLSv1.2 TLSv1.3;
ssl_ciphers 'ECDHE-ECDSA-AES128-GCM-SHA256:ECDHE-RSA-AES128-GCM-SHA256:ECDHE-ECDSA-AES256-GCM-SHA384:ECDHE-RSA-AES256-GCM-SHA384';
ssl_prefer_server_ciphers on;
ssl_session_cache shared:SSL:10m;
ssl_session_timeout 10m;
ssl_stapling on;
ssl_stapling_verify on;
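# Security headers
# add_header directives are inherited from the http block only when the current
# level defines none, so the headers from nginx.conf are repeated here next to HSTS.
add_header Strict-Transport-Security "max-age=31536000; includeSubDomains" always;
add_header X-Frame-Options "SAMEORIGIN" always;
add_header X-Content-Type-Options "nosniff" always;
add_header Referrer-Policy "no-referrer-when-downgrade" always;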
# Root
root /var/www/app;
index index.html;
# Client upload size
client_max_body_size 100M;
# Logging
access_log /var/log/nginx/app.access.log;
error_log /var/log/nginx/app.error.log;
# Static files
location /static/ {
alias /var/www/app/static/;
expires 1y;
add_header Cache-Control "public, immutable";
access_log off;
}
# Media files
location /media/ {
alias /var/www/app/media/;
expires 30d;
add_header Cache-Control "public";
}
# API endpoints
location /api/ {
limit_req zone=api_limit burst=40 nodelay;
proxy_pass http://app_backend;
proxy_http_version 1.1;
# Clear Connection so the upstream keepalive pool is used; WebSocket upgrades are handled under /ws/
proxy_set_header Connection "";
proxy_set_header Host $host;
proxy_set_header X-Real-IP $remote_addr;
proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
proxy_set_header X-Forwarded-Proto $scheme;
# Timeouts
proxy_connect_timeout 60s;
proxy_send_timeout 60s;
proxy_read_timeout 60s;
}
# WebSocket
location /ws/ {
proxy_pass http://app_backend;
proxy_http_version 1.1;
proxy_set_header Upgrade $http_upgrade;
proxy_set_header Connection "upgrade";
proxy_set_header Host $host;
proxy_set_header X-Real-IP $remote_addr;
proxy_read_timeout 86400;
}
# Health check
location /health {
access_log off;
# default_type sets the Content-Type of the returned body; add_header would emit a second Content-Type header
default_type text/plain;
return 200 "healthy\n";
}
# Favicon
location = /favicon.ico {
access_log off;
log_not_found off;
}
# Robots.txt
location = /robots.txt {
access_log off;
log_not_found off;
}
# Deny access to hidden files
location ~ /\. {
deny all;
access_log off;
log_not_found off;
}
}
```
### 4. Storage Management
#### LVM Configuration
```bash
#!/bin/bash
# LVM Setup Script
# Destructive: initializes /dev/sdb. Stop at the first failed step.
set -euo pipefail
VG_NAME="vg01"
LV_DATA="lv-data"
LV_LOGS="lv-logs"
LV_BACKUP="lv-backup"
# Create physical volume
pvcreate /dev/sdb
# Create volume group
vgcreate $VG_NAME /dev/sdb
# Create logical volumes
lvcreate -L 100G -n $LV_DATA $VG_NAME
lvcreate -L 50G -n $LV_LOGS $VG_NAME
lvcreate -L 200G -n $LV_BACKUP $VG_NAME
# Create filesystems
mkfs.xfs /dev/$VG_NAME/$LV_DATA
mkfs.xfs /dev/$VG_NAME/$LV_LOGS
mkfs.ext4 /dev/$VG_NAME/$LV_BACKUP
# Create mount points
mkdir -p /data /logs /backup
# Update /etc/fstab
cat >> /etc/fstab << EOF
/dev/$VG_NAME/$LV_DATA /data xfs defaults,noatime 0 2
/dev/$VG_NAME/$LV_LOGS /logs xfs defaults,noatime 0 2
/dev/$VG_NAME/$LV_BACKUP /backup ext4 defaults,noatime 0 2
EOF
# Mount filesystems
mount -a
# Verify
df -h
vgdisplay
lvdisplay
```
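The script above requests 100G + 50G + 200G from a single disk, so it fails partway through if `/dev/sdb` is smaller than 350G. A minimal sketch of a pre-flight check follows; `lvs_fit` is a hypothetical helper, sizes are whole GiB, and LVM metadata overhead and extent rounding are ignored.

```bash
#!/usr/bin/env bash
set -euo pipefail

# lvs_fit CAPACITY_GIB SIZE_GIB...
# Hypothetical helper: sum the requested logical volume sizes and succeed
# only if the total fits in the given capacity.
lvs_fit() {
    local capacity_gib="$1"
    shift
    local total=0 size
    for size in "$@"; do
        total=$(( total + size ))
    done
    echo "requested=${total}G capacity=${capacity_gib}G"
    (( total <= capacity_gib ))
}
```

One possible use before `pvcreate`: `lvs_fit "$(( $(blockdev --getsize64 /dev/sdb) / 1024**3 ))" 100 50 200 || exit 1`. Leaving some headroom below the raw device size is advisable because the volume group exposes slightly less than the disk capacity.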
#### RAID Configuration
```bash
#!/bin/bash
# RAID 10 Setup Script
# Install mdadm
apt-get install -y mdadm
# Create RAID 10 array
mdadm --create /dev/md0 \
--level=10 \
--raid-devices=4 \
/dev/sdb /dev/sdc /dev/sdd /dev/sde
# Create filesystem
mkfs.ext4 /dev/md0
# Create mount point
mkdir -p /raid
# Update /etc/fstab
echo "/dev/md0 /raid ext4 defaults,noatime 0 2" >> /etc/fstab
# Mount
mount /raid
# Save RAID configuration and rebuild the initramfs so the array keeps the name md0 after reboot
mdadm --detail --scan >> /etc/mdadm/mdadm.conf
update-initramfs -u
# Verify
cat /proc/mdstat
mdadm --detail /dev/md0
```
### 5. Monitoring and Logging
#### Prometheus Configuration
```yaml
# /etc/prometheus/prometheus.yml
global:
  scrape_interval: 15s
  evaluation_interval: 15s
  external_labels:
    cluster: 'production'
    replica: '1'
# Alertmanager configuration
alerting:
  alertmanagers:
    - static_configs:
        - targets:
            - localhost:9093
# Rule files
rule_files:
  - '/etc/prometheus/rules/*.yml'
# Scrape configurations
scrape_configs:
  # Prometheus itself
  - job_name: 'prometheus'
    static_configs:
      - targets: ['localhost:9090']
  # Node Exporter
  - job_name: 'node'
    static_configs:
      - targets: ['localhost:9100']
    relabel_configs:
      - source_labels: [__address__]
        target_label: instance
        replacement: 'web-server-01'
  # Nginx Exporter
  - job_name: 'nginx'
    static_configs:
      - targets: ['localhost:9113']
  # PostgreSQL Exporter
  - job_name: 'postgresql'
    static_configs:
      - targets: ['localhost:9187']
  # Application metrics
  - job_name: 'application'
    static_configs:
      - targets: ['localhost:3000']
    metrics_path: '/metrics'
```
**Alert Rules**
```yaml
# /etc/prometheus/rules/alerts.yml
groups:
  - name: system
    interval: 30s
    rules:
      # CPU usage (rate, not irate: irate only uses the last two samples and makes alerts flap)
      - alert: HighCPUUsage
        expr: 100 - (avg by(instance) (rate(node_cpu_seconds_total{mode="idle"}[5m])) * 100) > 80
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "High CPU usage on {{ $labels.instance }}"
          description: "CPU usage is {{ $value }}%"
      # Memory usage
      - alert: HighMemoryUsage
        expr: (1 - (node_memory_MemAvailable_bytes / node_memory_MemTotal_bytes)) * 100 > 90
        for: 5m
        labels:
          severity: critical
        annotations:
          summary: "High memory usage on {{ $labels.instance }}"
          description: "Memory usage is {{ $value }}%"
      # Disk space
      - alert: LowDiskSpace
        expr: (node_filesystem_avail_bytes{mountpoint="/"} / node_filesystem_size_bytes{mountpoint="/"}) * 100 < 10
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "Low disk space on {{ $labels.instance }}"
          description: "Only {{ $value }}% free space on /"
  - name: application
    interval: 30s
    rules:
      # HTTP error rate (sum both sides: without aggregation the division only
      # matches series with identical labels, so 5xx is divided by itself)
      - alert: HighErrorRate
        expr: (sum(rate(http_requests_total{status=~"5.."}[5m])) / sum(rate(http_requests_total[5m]))) * 100 > 5
        for: 5m
        labels:
          severity: critical
        annotations:
          summary: "High HTTP error rate"
          description: "Error rate is {{ $value }}%"
      # Response time (aggregate buckets by le before taking the quantile)
      - alert: SlowResponseTime
        expr: histogram_quantile(0.95, sum by (le) (rate(http_request_duration_seconds_bucket[5m]))) > 1
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "Slow response time"
          description: "P95 response time is {{ $value }}s"
```
#### Log Rotation
```bash
# /etc/logrotate.d/custom
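/var/log/myapp/*.log {
    daily
    missingok
    rotate 14
    compress
    delaycompress
    notifempty
    # create plus the postrotate reload makes the app reopen its log file.
    # copytruncate is left out: logrotate ignores create when it is set.
    create 0640 www-data adm
    sharedscripts
    postrotate
        systemctl reload myapp > /dev/null 2>&1 || true
    endscript
}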
```
### 6. Security Hardening
#### SSH Hardening
```bash
# /etc/ssh/sshd_config
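# Basic
Port 2222
# "Protocol 2" is omitted: sshd in OpenSSH 7.4 and later supports only protocol 2 and treats the directive as deprecated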
HostKey /etc/ssh/ssh_host_ed25519_key
HostKey /etc/ssh/ssh_host_rsa_key
# Logging
SyslogFacility AUTHPRIV
LogLevel VERBOSE
# Authentication
LoginGraceTime 60
PermitRootLogin no
StrictModes yes
MaxAuthTries 3
MaxStartups 10:30:60
PasswordAuthentication no
PermitEmptyPasswords no
ChallengeResponseAuthentication no
# Key-based authentication
PubkeyAuthentication yes
AuthorizedKeysFile .ssh/authorized_keys
# Security
UsePAM yes
X11Forwarding no
AllowTcpForwarding no
AllowAgentForwarding no
PermitTunnel no
GatewayPorts no
# Restrict users
AllowUsers admin deploy
DenyUsers root
# Ciphers
Ciphers chacha20-poly1305@openssh.com,aes256-gcm@openssh.com,aes128-gcm@openssh.com
# MACs
MACs hmac-sha2-512-etm@openssh.com,hmac-sha2-256-etm@openssh.com
# Kex
KexAlgorithms curve25519-sha256@libssh.org,diffie-hellman-group-exchange-sha256
# After saving, validate and restart from a shell (this is not part of sshd_config):
#   sshd -t && systemctl restart sshd
```
#### System Hardening Script
```bash
#!/bin/bash
# System Security Hardening
set -euo pipefail
echo "=== System Security Hardening ==="
# 1. Disable unused filesystems
echo "[1/10] Disabling unused filesystems..."
cat > /etc/modprobe.d/disable-filesystems.conf << 'EOF'
install cramfs /bin/true
install freevxfs /bin/true
install jffs2 /bin/true
install hfsplus /bin/true
# squashfs is required by snap packages; drop the next line on hosts that use snapd
install squashfs /bin/true
install udf /bin/true
EOF
# 2. Kernel hardening
echo "[2/10] Configuring kernel parameters..."
cat > /etc/sysctl.d/99-security.conf << 'EOF'
# Network
net.ipv4.ip_forward = 0
net.ipv4.conf.all.send_redirects = 0
net.ipv4.conf.default.send_redirects = 0
net.ipv4.conf.all.accept_source_route = 0
net.ipv4.conf.default.accept_source_route = 0
net.ipv4.conf.all.accept_redirects = 0
net.ipv4.conf.default.accept_redirects = 0
net.ipv4.conf.all.secure_redirects = 0
net.ipv4.conf.default.secure_redirects = 0
net.ipv4.conf.all.rp_filter = 1
net.ipv4.conf.default.rp_filter = 1
net.ipv4.icmp_echo_ignore_broadcasts = 1
net.ipv4.icmp_ignore_bogus_error_responses = 1
net.ipv4.tcp_syncookies = 1
net.ipv4.tcp_max_syn_backlog = 2048
net.ipv4.tcp_synack_retries = 2
net.ipv4.tcp_syn_retries = 5
# IPv6
net.ipv6.conf.all.accept_redirects = 0
net.ipv6.conf.default.accept_redirects = 0
# Kernel
kernel.randomize_va_space = 2
kernel.kptr_restrict = 2
kernel.dmesg_restrict = 1
kernel.perf_event_paranoid = 2
# Core dumps
fs.suid_dumpable = 0
kernel.core_pattern = |/bin/false
EOF
sysctl -p /etc/sysctl.d/99-security.conf
# 3. Restrict core dumps
echo "[3/10] Restricting core dumps..."
cat > /etc/security/limits.d/50-core.conf << 'EOF'
* hard core 0
EOF
# 4. Disable USB storage
echo "[4/10] Disabling USB storage..."
echo "install usb-storage /bin/true" > /etc/modprobe.d/disable-usb.conf
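# 5. Configure PAM
echo "[5/10] Configuring PAM..."
# Keep a copy of the distribution file: a broken login stack locks out console access
cp -a /etc/pam.d/login /etc/pam.d/login.bak
cat > /etc/pam.d/login << 'EOF'
# No "requisite pam_deny.so" after pam_unix.so (it fails every login attempt)
# and no pam_rootok.so (login runs as root, so it would skip authentication)
auth required pam_unix.so
account required pam_unix.so
password required pam_unix.so sha512 shadow
session required pam_unix.so
session required pam_limits.so
EOF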
# 6. Disable unnecessary services
echo "[6/10] Disabling unnecessary services..."
systemctl disable cups 2>/dev/null || true
systemctl disable avahi-daemon 2>/dev/null || true
systemctl disable bluetooth 2>/dev/null || true
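# 7. Install security updates
echo "[7/10] Installing security updates..."
apt-get install -y unattended-upgrades
# Enable the periodic run without a prompt (dpkg-reconfigure -plow is interactive)
cat > /etc/apt/apt.conf.d/20auto-upgrades << 'EOF'
APT::Periodic::Update-Package-Lists "1";
APT::Periodic::Unattended-Upgrade "1";
EOF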
# 8. Configure automatic updates
echo "[8/10] Configuring automatic updates..."
cat > /etc/apt/apt.conf.d/50unattended-upgrades << 'EOF'
Unattended-Upgrade::Allowed-Origins {
"${distro_id}:${distro_codename}-security";
};
Unattended-Upgrade::AutoFixInterruptedDpkg "true";
Unattended-Upgrade::MinimalSteps "true";
Unattended-Upgrade::Remove-Unused-Kernel-Packages "true";
Unattended-Upgrade::Remove-Unused-Dependencies "true";
Unattended-Upgrade::Automatic-Reboot "false";
Unattended-Upgrade::Automatic-Reboot-Time "02:00";
EOF
# 9. Install auditd
echo "[9/10] Installing audit daemon..."
apt-get install -y auditd audispd-plugins
systemctl enable auditd
systemctl start auditd
# 10. Configure audit rules
echo "[10/10] Configuring audit rules..."
cat > /etc/audit/rules.d/audit.rules << 'EOF'
-w /etc/passwd -p wa -k identity
-w /etc/group -p wa -k identity
-w /etc/shadow -p wa -k identity
-w /etc/sudoers -p wa -k sudoers
-w /var/log/audit/ -p wa -k audit-logs
-w /etc/ssh/sshd_config -p wa -k sshd
-w /var/log/auth.log -p wa -k auth-log
EOF
systemctl restart auditd
echo "=== Hardening Complete ==="
```
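The hardening script writes its kernel parameters to `/etc/sysctl.d/99-security.conf`, but a later drop-in file or a container runtime can override them. A minimal sketch of a drift check follows; `verify_sysctl` is a hypothetical helper, and the optional second argument (a command that prints the live value for a key) exists only so the function can be exercised without root.

```bash
#!/usr/bin/env bash
set -euo pipefail

# verify_sysctl CONF [GETTER]
# Hypothetical helper: compare every "key = value" line in CONF with the
# live value reported by GETTER (default: sysctl -n). Returns 1 on drift.
verify_sysctl() {
    local conf="$1" getter="${2:-sysctl -n}" key want have rc=0
    while IFS='=' read -r key want || [[ -n "$key" ]]; do
        key="${key//[[:space:]]/}"
        # Skip blank lines and comments
        if [[ -z "$key" || "$key" == \#* ]]; then
            continue
        fi
        # Normalize whitespace on both sides (sysctl prints tabs between fields)
        want="$(awk '{$1=$1; print}' <<< "$want")"
        have="$($getter "$key" 2>/dev/null | awk '{$1=$1; print}')" || have=""
        if [[ "$have" != "$want" ]]; then
            echo "MISMATCH $key: want '$want', have '$have'"
            rc=1
        fi
    done < "$conf"
    return "$rc"
}
```

Running `verify_sysctl /etc/sysctl.d/99-security.conf` after a reboot is one way to confirm that the settings survived; a non-zero exit status can feed a monitoring check.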
### 7. Performance Tuning
#### System Performance Tuning
```bash
#!/bin/bash
# Performance Tuning Script
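# 1. I/O Scheduler
echo "[1/5] Configuring I/O scheduler..."
# Multi-queue kernels (5.0 and later) name this scheduler mq-deadline; plain "deadline" no longer exists there
echo 'ACTION=="add|change", KERNEL=="sd[a-z]", ATTR{queue/scheduler}="mq-deadline"' \
    > /etc/udev/rules.d/60-scheduler.rules
# 2. Filesystem optimization
echo "[2/5] Optimizing filesystem..."
# Edit the existing root entry in place; overwriting /etc/fstab with a placeholder UUID would leave the system unbootable.
# noatime already implies nodiratime.
cp -a /etc/fstab /etc/fstab.bak
sed -i -E '/^[^#[:space:]]\S*\s+\/\s+ext4\s/ { /noatime/! s/^(\S+\s+\/\s+ext4\s+)(\S+)/\1\2,noatime/; }' /etc/fstab
mount -o remount /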
# 3. Network tuning
echo "[3/5] Tuning network parameters..."
cat > /etc/sysctl.d/99-performance.conf << 'EOF'
# TCP
net.core.rmem_max = 16777216
net.core.wmem_max = 16777216
net.ipv4.tcp_rmem = 4096 87380 16777216
net.ipv4.tcp_wmem = 4096 65536 16777216
net.ipv4.tcp_fin_timeout = 30
net.ipv4.tcp_keepalive_time = 120
net.ipv4.tcp_tw_reuse = 1
net.ipv4.tcp_max_syn_backlog = 4096
net.core.somaxconn = 1024
net.ipv4.tcp_max_tw_buckets = 400000
net.ipv4.tcp_no_metrics_save = 1
net.core.netdev_max_backlog = 16384
EOF
sysctl -p /etc/sysctl.d/99-performance.conf
# 4. Process limits
echo "[4/5] Configuring process limits..."
cat > /etc/security/limits.d/50-performance.conf << 'EOF'
* soft nofile 65536
* hard nofile 65536
* soft nproc 65536
* hard nproc 65536
* soft memlock unlimited
* hard memlock unlimited
EOF
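# 5. Swapping
echo "[5/5] Configuring swap..."
# A drop-in file keeps the setting idempotent; appending to /etc/sysctl.conf adds a duplicate line on every run
echo "vm.swappiness = 10" > /etc/sysctl.d/99-swappiness.conf
sysctl -p /etc/sysctl.d/99-swappiness.conf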
echo "=== Performance Tuning Complete ==="
```
### 8. Container Management
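#### Docker Daemon Configuration
Save as `/etc/docker/daemon.json`. The path is given here rather than inside the file because JSON has no comment syntax and `dockerd` refuses to start on a file it cannot parse.
```json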
{
"log-driver": "json-file",
"log-opts": {
"max-size": "10m",
"max-file": "3"
},
"storage-driver": "overlay2",
"live-restore": true,
"userland-proxy": false,
"no-new-privileges": true,
"icc": false,
"default-ulimits": {
"nofile": {
"Name": "nofile",
"Hard": 64000,
"Soft": 64000
}
},
"registry-mirrors": [
"https://mirror.gcr.io"
],
"metrics-addr": "0.0.0.0:9323",
"experimental": false
}
```
#### Docker Compose Production Template
```yaml
# docker-compose.prod.yml
version: '3.8'
services:
  app:
    image: myapp:${VERSION:-latest}
    restart: always
    deploy:
      resources:
        limits:
          cpus: '2'
          memory: 2G
        reservations:
          cpus: '1'
          memory: 1G
    environment:
      - NODE_ENV=production
      # The host part must match the database service name defined below (postgres)
      - DATABASE_URL=postgres://postgres:5432/app
    env_file:
      - .env.prod
    volumes:
      - app-uploads:/app/uploads
      - app-logs:/app/logs
    networks:
      - frontend
      - backend
    healthcheck:
      test: ["CMD", "wget", "-q", "--spider", "http://localhost:3000/health"]
      interval: 30s
      timeout: 10s
      retries: 3
      start_period: 40s
    logging:
      driver: "json-file"
      options:
        max-size: "10m"
        max-file: "3"
  nginx:
    image: nginx:alpine
    restart: always
    ports:
      - "80:80"
      - "443:443"
    volumes:
      - ./nginx.conf:/etc/nginx/nginx.conf:ro
      - ./ssl:/etc/nginx/ssl:ro
      - app-static:/app/static:ro
    networks:
      - frontend
    depends_on:
      - app
  postgres:
    image: postgres:15-alpine
    restart: always
    environment:
      - POSTGRES_DB=app
      - POSTGRES_USER=app
      - POSTGRES_PASSWORD_FILE=/run/secrets/db_password
    secrets:
      - db_password
    volumes:
      - postgres-data:/var/lib/postgresql/data
    networks:
      - backend
    healthcheck:
      test: ["CMD-SHELL", "pg_isready -U app -d app"]
      interval: 10s
      timeout: 5s
      retries: 5
  redis:
    image: redis:7-alpine
    restart: always
    command: redis-server --appendonly yes
    volumes:
      - redis-data:/data
    networks:
      - backend
volumes:
  app-uploads:
  app-logs:
  app-static:
  postgres-data:
  redis-data:
networks:
  frontend:
    driver: bridge
  backend:
    driver: bridge
    internal: true
secrets:
  db_password:
    file: ./secrets/db_password.txt
```
### 9. Backup Strategies
#### Automated Backup Script
```bash
#!/bin/bash
# Automated Backup Script
set -euo pipefail
# Configuration
BACKUP_DIR="/backup"
RETENTION_DAYS=30
S3_BUCKET="s3://my-backups"
DATE=$(date +%Y%m%d_%H%M%S)
# Create backup directory
mkdir -p "$BACKUP_DIR/$DATE"
# 1. Database backup
echo "Backing up database..."
pg_dump -U postgres -h localhost mydb | gzip > "$BACKUP_DIR/$DATE/database.sql.gz"
# 2. Files backup
echo "Backing up files..."
tar -czf "$BACKUP_DIR/$DATE/files.tar.gz" /var/www/app /etc/nginx /etc/ssh
# 3. Configuration backup
echo "Backing up configurations..."
tar -czf "$BACKUP_DIR/$DATE/config.tar.gz" /etc /opt
# 4. Upload to S3
echo "Uploading to S3..."
aws s3 sync "$BACKUP_DIR/$DATE" "$S3_BUCKET/$DATE/"
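# 5. Clean old backups
echo "Cleaning old backups..."
# Only top-level timestamped directories; -mindepth/-maxdepth keep find from
# descending into directories it has just removed (which would fail under set -e)
find "$BACKUP_DIR" -mindepth 1 -maxdepth 1 -type d -name '20*' -mtime +"$RETENTION_DAYS" -exec rm -rf {} +
cutoff_date=$(date -d "$RETENTION_DAYS days ago" +%s)
aws s3 ls "$S3_BUCKET/" | while read -r line; do
    dir_name=$(echo "$line" | awk '$1 == "PRE" {print $2}' | tr -d '/')
    # Skip anything that is not a YYYYMMDD_HHMMSS prefix
    [[ "$dir_name" =~ ^[0-9]{8}_[0-9]{6}$ ]] || continue
    # date -d cannot parse the full stamp, so compare on the YYYYMMDD part and
    # skip unparseable names instead of treating them as expired
    dir_epoch=$(date -d "${dir_name%%_*}" +%s) || continue
    if [ "$dir_epoch" -lt "$cutoff_date" ]; then
        aws s3 rm "$S3_BUCKET/$dir_name/" --recursive
    fi
done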
echo "Backup completed: $DATE"
```
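The backup directories are named with `date +%Y%m%d_%H%M%S`, a format that GNU `date -d` does not parse directly, so the retention decision is easy to get wrong. A minimal sketch of that decision as a separate, testable function follows; `is_expired` is a hypothetical helper, it assumes GNU `date`, and it compares whole days in UTC.

```bash
#!/usr/bin/env bash
set -euo pipefail

# is_expired STAMP RETENTION_DAYS [NOW_EPOCH]
# Hypothetical helper. Exit status: 0 = older than the retention window,
# 1 = still within it, 2 = STAMP is not a valid YYYYMMDD_HHMMSS name.
is_expired() {
    local stamp="$1" retention_days="$2" now_epoch="${3:-$(date +%s)}"
    [[ "$stamp" =~ ^([0-9]{8})_[0-9]{6}$ ]] || return 2
    local stamp_epoch
    # Parse only the YYYYMMDD part, which GNU date accepts as a date
    stamp_epoch=$(date -u -d "${BASH_REMATCH[1]}" +%s 2>/dev/null) || return 2
    (( stamp_epoch < now_epoch - retention_days * 86400 ))
}
```

Returning a distinct status for unparseable names matters here: a cleanup loop can then skip them rather than delete them, for example `if is_expired "$name" 30; then ...; fi`.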
### 10. Incident Response
#### Incident Response Checklist
```bash
#!/bin/bash
# Incident Response Script
echo "=== Incident Response Checklist ==="
# 1. Check system status
echo "[1/10] Checking system status..."
uptime
free -h
df -h
# 2. Check service status
echo "[2/10] Checking service status..."
systemctl list-units --type=service --state=failed
# 3. Check recent logs
echo "[3/10] Checking recent logs..."
journalctl -p err -n 50 --no-pager
# 4. Check network connections
echo "[4/10] Checking network connections..."
ss -tunap
netstat -tunap
# 5. Check CPU usage
echo "[5/10] Checking CPU usage..."
top -bn1 | head -20
# 6. Check memory usage
echo "[6/10] Checking memory usage..."
ps aux --sort=-%mem | head -10
# 7. Check disk I/O
echo "[7/10] Checking disk I/O..."
iotop -b -n 1 | head -20
# 8. Check failed login attempts
echo "[8/10] Checking failed logins..."
grep "Failed password" /var/log/auth.log | tail -20
# 9. Check firewall status
echo "[9/10] Checking firewall status..."
ufw status
# 10. Save system state
echo "[10/10] Saving system state..."
STATE_FILE="/tmp/incident-state-$(date +%Y%m%d_%H%M%S).txt"
{
echo "=== System State ==="
uptime
free -h
df -h
echo ""
echo "=== Failed Services ==="
systemctl list-units --type=service --state=failed
echo ""
echo "=== Recent Errors ==="
journalctl -p err -n 20 --no-pager
echo ""
echo "=== Network Connections ==="
ss -tunap
} > "$STATE_FILE"
echo "=== Incident Response Complete ==="
echo "State saved to: $STATE_FILE"
```
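Step 8 of the checklist prints raw `Failed password` lines, which is hard to read during an active brute-force attempt. The sketch below summarizes them per source address instead; `top_failed_ips` is a hypothetical helper, and it assumes the usual OpenSSH log line shape (`... Failed password for USER from ADDRESS port N ssh2`).

```bash
#!/usr/bin/env bash
set -euo pipefail

# top_failed_ips LOG_FILE [LIMIT]
# Hypothetical helper: print "COUNT ADDRESS" for the addresses with the most
# failed password attempts, highest count first.
top_failed_ips() {
    local log_file="$1" limit="${2:-10}"
    awk '/Failed password/ {
            # Use the last "from" so a user literally named "from" is not miscounted
            for (i = NF - 1; i >= 1; i--) {
                if ($i == "from") { count[$(i + 1)]++; break }
            }
         }
         END { for (ip in count) print count[ip], ip }' "$log_file" \
        | sort -k1,1nr -k2,2 \
        | awk -v n="$limit" 'NR <= n'
}
```

A typical call during an incident would be `top_failed_ips /var/log/auth.log 20`; the resulting addresses can be cross-checked against `fail2ban-client status sshd`.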
### 11. Decision Trees
#### Monitoring Solution Selection
```
What to monitor?
├─ System metrics → Prometheus + Node Exporter
├─ Application metrics → Prometheus + Custom Exporter
├─ Logs → ELK Stack / Loki
├─ Tracing → Jaeger / Tempo
├─ Uptime → Uptime Robot / Pingdom
└─ Security → Wazuh / OSSEC
```
#### Backup Strategy Selection
```
Data type?
├─ Database → Logical backup (pg_dump/mysqldump) + Physical backup
├─ Files → Rsync + Snapshots
├─ Configuration → Git + Version control
├─ Entire server → Full disk image (AMIs, snapshots)
└─ Disaster recovery → Off-site replication + DR drills
```
### 12. Anti-Patterns to Avoid
1. **No Monitoring**: Never deploy without monitoring and alerting
2. **Weak Authentication**: Always use SSH keys, never password auth
3. **No Backups**: Always implement 3-2-1 backup strategy
4. **Ignoring Logs**: Regular review of logs prevents incidents
5. **No Documentation**: Document all configurations and changes
6. **No Testing**: Test all changes in staging first
7. **No Rollback Plan**: Always have a rollback plan for changes
8. **No Security Updates**: Keep system updated with security patches
9. **No Resource Limits**: Set limits on all services
10. **No Incident Response**: Have a plan before incidents occur
### 13. Quality Checklist
Before considering server management complete:
- [ ] All services have systemd unit files
- [ ] Firewall configured and enabled
- [ ] SSH hardened (key-based only)
- [ ] Automatic security updates configured
- [ ] Monitoring and alerting setup
- [ ] Log rotation configured
- [ ] Backup strategy implemented and tested
- [ ] Documentation up to date
- [ ] Incident response plan in place
- [ ] Resource limits configured
- [ ] SSL/TLS certificates valid
- [ ] Security hardening applied
- [ ] Performance tuning applied
- [ ] Disaster recovery tested
- [ ] Compliance requirements met
- [ ] Network segmentation implemented
- [ ] Vulnerability scanning performed
- [ ] Access controls implemented
- [ ] Change management process in place
- [ ] Automation implemented for repetitive tasks
This skill definition covers server setup, service management, networking, storage, monitoring, security hardening, performance tuning, containers, backups, and incident response for production Linux servers.