Massive training corpus for AI coding models containing:
- 10 JSONL training datasets (641+ examples across coding, reasoning, planning, architecture, communication, debugging, security, workflows, error handling, UI/UX)
- 11 agent behavior specifications (explorer, planner, reviewer, debugger, executor, UI designer, Linux admin, kernel engineer, security architect, automation engineer, API architect)
- 6 skill definition files (coding, API engineering, kernel, Linux server, security architecture, server automation, UI/UX)
- Master README with project origin story and philosophy
Built by Pony Alpha 2 to help AI models learn expert-level coding approaches.
Linux Server Admin Agent
Agent Purpose
The Linux Server Admin Agent specializes in comprehensive Linux system administration, from routine maintenance to complex troubleshooting and security hardening. This agent manages servers across various distributions, ensuring optimal performance, security, and reliability.
Activation Criteria:
- System administration tasks (user management, service configuration, system updates)
- Performance issues and troubleshooting (slow servers, resource exhaustion)
- Security hardening and compliance (CIS benchmarks, security audits)
- Server setup and configuration (new deployments, migrations)
- Monitoring and alerting setup (Prometheus, Grafana, Nagios)
- Network configuration and troubleshooting
- Container and virtualization management
- Backup and disaster recovery planning
Core Capabilities
1. System Diagnostics & Troubleshooting
Diagnostic Framework:
#!/bin/bash
# comprehensive-diagnostics.sh
# System Health Assessment Script (run as root for complete smartctl, ss -p and log output)
echo "=== Linux Server Diagnostic Report ==="
echo "Generated: $(date)"
echo "Hostname: $(hostname)"
echo "Kernel: $(uname -r)"
echo "Uptime: $(uptime -p)"
echo ""
# 1. CPU Status
echo "=== CPU Status ==="
echo "Load Average (1m, 5m, 15m): $(uptime | awk -F'load average:' '{print $2}')"
echo "CPU Core Count: $(nproc)"
top -bn1 | grep "Cpu(s)" | sed "s/.*, *\([0-9.]*\)%* id.*/\1/" | awk '{print "CPU Usage: " 100 - $1 "%"}'
echo ""
# 2. Memory Status
echo "=== Memory Status ==="
free -h
echo "Memory Usage Breakdown:"
free -m | awk 'NR==2{printf "Used: %sMB (%.2f%%)\nFree: %sMB (%.2f%%)\nBuff/Cache: %sMB\nAvailable: %sMB\n", $3,$3*100/$2,$4,$4*100/$2,$6,$7}'
echo ""
# 3. Disk Status
echo "=== Disk Status ==="
df -h
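echo "Disk I/O (requires the sysstat package):"
if command -v iostat &> /dev/null; then
# The first iostat report averages since boot; print only the second, 1-second sample
iostat -dx 1 2 | awk '/^Device/ {n++} n == 2'
fi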
echo ""
# 4. Network Status
echo "=== Network Status ==="
echo "Active Interfaces:"
ip -br addr show
echo ""
echo "Network Connections:"
ss -s
echo ""
echo "Listening Ports:"
ss -tulnp
echo ""
# 5. Process Status
echo "=== Top Processes by CPU ==="
ps aux --sort=-%cpu | head -10
echo ""
echo "=== Top Processes by Memory ==="
ps aux --sort=-%mem | head -10
echo ""
# 6. Service Status
echo "=== Failed Services ==="
systemctl list-units --state=failed --no-pager
echo ""
# 7. Recent System Logs
echo "=== Recent Error Logs ==="
journalctl -p err -n 20 --no-pager
echo ""
# 8. Hardware Issues
echo "=== Hardware Status ==="
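if command -v smartctl &> /dev/null; then
echo "Disk Health:"
# -d: whole disks only, -n: no header, -e 7: exclude loop devices
for disk in $(lsblk -dn -e 7 -o NAME); do
echo "/dev/$disk: $(smartctl -H "/dev/$disk" 2>/dev/null | grep -E "SMART overall-health|SMART Health Status" || echo "health status unavailable")"
done
fi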
echo ""
# 9. Security Status
echo "=== Security Summary ==="
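echo "Failed SSH Password Attempts (last 24h):"
# The journal covers both Debian (auth.log) and RHEL (secure) systems and supports a real 24-hour window
journalctl --since "24 hours ago" --no-pager 2>/dev/null | grep -c "Failed password"
echo "Logged-in Sessions:"
who -u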
echo ""
# 10. Backup Status
echo "=== Backup Status ==="
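# A backup script's modification time says nothing about when it last ran,
# so report the newest file in the backup destination instead
BACKUP_DIR="${BACKUP_DIR:-/backup}" # assumption: adjust to your backup destination
if [ -d "$BACKUP_DIR" ]; then
echo "Newest file in $BACKUP_DIR:"
find "$BACKUP_DIR" -type f -printf '%TY-%Tm-%Td %TH:%TM  %p\n' 2>/dev/null | sort -r | head -n 1
else
echo "Backup directory $BACKUP_DIR not found"
fi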
echo ""
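The report prints the load average but leaves its interpretation to the reader. A minimal sketch of a pass/fail check, assuming the rule of thumb used in the Daily Health Checklist (a 1-minute load below the number of cores is acceptable). `check_load` is a hypothetical helper name, and its two optional arguments exist only so the logic can be exercised against a saved sample:

```shell
# Hypothetical helper: compare the 1-minute load average with the core count.
# Usage: check_load [loadavg_file] [cores]
# Defaults: /proc/loadavg and the output of nproc.
check_load() {
    local loadavg_file=${1:-/proc/loadavg}
    local cores=${2:-$(nproc)}
    local load1
    # The first field of /proc/loadavg is the 1-minute load average
    read -r load1 _ < "$loadavg_file" || return 2
    # Bash integers cannot hold "0.50", so awk does the numeric comparison
    if awk -v l="$load1" -v c="$cores" 'BEGIN { exit !((l + 0) < (c + 0)) }'; then
        echo "OK: load $load1 < $cores cores"
    else
        echo "WARN: load $load1 >= $cores cores"
        return 1
    fi
}
```

Returning a non-zero status on WARN lets the check be chained in cron or a monitoring hook, e.g. `check_load || mail -s "high load" admin`.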
Troubleshooting Decision Tree:
Server Issue Detected
│
├─ Performance Problem?
│ ├─ High CPU?
│ │ ├─ Check: top, htop, ps
│ │ ├─ Identify: runaway process, cron job, cryptomining malware
│ │ └─ Action: nice/renice, kill, process optimization
│ │
│ ├─ High Memory?
│ │ ├─ Check: free, vmstat, ps
│ │ ├─ Identify: memory leak, oversized application (page cache is reclaimable, so judge by MemAvailable, not "free")
│ │ └─ Action: restart or fix the leaking service, set memory limits, add swap as a stopgap
│ │
│ └─ High I/O?
│   ├─ Check: iostat, iotop, dstat
│   ├─ Identify: database writes, log file growth, backup job
│   └─ Action: optimize queries, log rotation, SSD migration
│
├─ Network Problem?
│ ├─ Connectivity Issues?
│ │ ├─ Check: ping, traceroute, mtr
│ │ ├─ Test: DNS resolution (nslookup, dig)
│ │ └─ Action: fix routing, update DNS, check firewall
│ │
│ └─ Service Unreachable?
│ ├─ Check: ss, netstat, firewall rules
│ ├─ Test: telnet, nc from external
│ └─ Action: open ports, start services, update ACLs
│
├─ Service Failure?
│ ├─ Check Service Status
│ │ ├─ systemctl status <service>
│ │ ├─ journalctl -u <service> -n 50
│ │ └─ Check unit file: systemd-analyze verify <unit>; app config: its own test (nginx -t, sshd -t)
│ │
│ └─ Common Causes
│ ├─ Configuration errors (syntax, typos)
│ ├─ Missing dependencies
│ ├─ Port conflicts
│ ├─ Permission issues
│ └─ Resource exhaustion
│
└─ Security Incident?
├─ Compromise Indicators
│ ├─ Unauthorized logins
│ ├─ New user accounts
│ ├─ Modified system files
│ └─ Suspicious processes
│
└─ Immediate Actions
├─ Isolate affected system
├─ Preserve forensic evidence
├─ Change all credentials
└─ Initiate incident response
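For the "High Memory?" branch, `MemAvailable` in /proc/meminfo (the kernel's estimate of memory that new workloads can use without swapping) is a better signal than `MemFree`, because reclaimable page cache counts against "free". A minimal sketch; `mem_available_pct` is a hypothetical name, and the optional argument exists only for testing:

```shell
# Hypothetical helper: print MemAvailable as an integer percentage of MemTotal.
# Usage: mem_available_pct [meminfo_file]   (defaults to /proc/meminfo)
mem_available_pct() {
    local meminfo=${1:-/proc/meminfo}
    awk '/^MemTotal:/     { total = $2 }
         /^MemAvailable:/ { avail = $2 }
         END {
             # Fail if a field is missing (MemAvailable needs kernel 3.14 or later)
             if (total > 0 && avail != "") printf "%d\n", avail * 100 / total
             else exit 1
         }' "$meminfo"
}
```

For example, `[ "$(mem_available_pct)" -lt 10 ]` is true on a host with less than 10% of memory available.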
2. Service Management
Service Operations:
# Comprehensive Service Management
manage_service() {
local service=$1
local action=$2
case $action in
start)
# enable --now starts the unit and enables it at boot in one step
systemctl enable --now "$service" && echo "Service $service started and enabled"
;;
stop)
systemctl disable --now "$service" && echo "Service $service stopped and disabled"
;;
restart)
systemctl restart "$service" && echo "Service $service restarted"
;;
reload)
# Fall back to a full restart for units that do not support reload
if systemctl reload "$service" 2>/dev/null; then
echo "Service $service reloaded"
else
systemctl restart "$service" && echo "Service $service does not support reload; restarted instead"
fi
;;
status)
systemctl status "$service" -l --no-pager
journalctl -u "$service" -n 50 --no-pager
;;
mask)
systemctl mask "$service" && echo "Service $service masked (prevented from starting)"
;;
unmask)
systemctl unmask "$service" && echo "Service $service unmasked"
;;
*)
echo "Usage: manage_service <service> {start|stop|restart|reload|status|mask|unmask}"
return 1
;;
esac
}
# Service Dependency Analysis
analyze_service_dependencies() {
local service=$1
echo "=== Dependency Analysis for $service ==="
echo ""
echo "Required By / Wanted By (units that pull in $service):"
systemctl show "$service" -p RequiredBy -p WantedBy
echo ""
echo "Requires:"
systemctl show "$service" -p Requires --value
echo ""
echo "Wants:"
systemctl show "$service" -p Wants --value
echo ""
echo "After:"
systemctl show "$service" -p After --value
echo ""
echo "Before:"
systemctl show "$service" -p Before --value
}
Critical Services Management:
# SSH Service Configuration
sshd_service:
config_file: /etc/ssh/sshd_config
critical_settings:
- PermitRootLogin no
- PasswordAuthentication no # if using keys
- PubkeyAuthentication yes
- Protocol 2 # obsolete: OpenSSH 7.6+ supports only protocol 2 and ignores this keyword
- MaxAuthTries 3
- ClientAliveInterval 300
- ClientAliveCountMax 2
- X11Forwarding no
- AllowUsers specific_user
- AllowTcpForwarding no
management_commands:
restart: systemctl restart sshd # the unit is named ssh on Debian/Ubuntu
test_config: sshd -t
check_status: systemctl status sshd -l
view_logs: journalctl -u sshd -f
# Web Server (Nginx)
nginx_service:
config_file: /etc/nginx/nginx.conf
sites_available: /etc/nginx/sites-available/
sites_enabled: /etc/nginx/sites-enabled/
management_commands:
restart: systemctl restart nginx
reload: systemctl reload nginx # graceful, no downtime
test_config: nginx -t
check_status: systemctl status nginx -l
view_logs: journalctl -u nginx -f
# Database (PostgreSQL)
postgresql_service:
config_file: /etc/postgresql/*/main/postgresql.conf
data_directory: /var/lib/postgresql/*/main/
management_commands:
restart: systemctl restart postgresql
reload: systemctl reload postgresql
check_status: systemctl status postgresql -l
connect: sudo -u postgres psql
backup: sudo -u postgres pg_dumpall > backup.sql
performance_tuning:
- shared_buffers: 25% of RAM
- effective_cache_size: 50-75% of RAM
- maintenance_work_mem: 10% of RAM
- checkpoint_completion_target: 0.9
- wal_buffers: 16MB
- default_statistics_target: 100
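Each service entry above pairs a config test (`sshd -t`, `nginx -t`) with a restart or reload. A small wrapper can enforce that order so a configuration typo never takes a running daemon down. This is a sketch: `safe_reload` is a hypothetical name, and the test command is passed in because every daemon has its own:

```shell
# Hypothetical wrapper: run a config test, and reload the unit only if it passes.
# Usage: safe_reload <unit> <config test command...>
#   safe_reload nginx nginx -t
#   safe_reload ssh /usr/sbin/sshd -t    # the unit is "ssh" on Debian/Ubuntu
safe_reload() {
    local unit=$1
    shift
    if [ $# -eq 0 ]; then
        echo "Usage: safe_reload <unit> <test command...>" >&2
        return 2
    fi
    # Run the supplied test command exactly as given
    if ! "$@"; then
        echo "Config test failed for $unit; leaving the running service untouched" >&2
        return 1
    fi
    # Prefer a graceful reload; fall back to restart for units without reload support
    systemctl reload "$unit" || systemctl restart "$unit"
}
```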
3. User & Access Management
User Lifecycle Management:
#!/bin/bash
# User Management System
# Create User with Standard Configuration
create_user() {
local username=$1
local full_name=$2
local email=$3
local ssh_key=$4 # Optional: public SSH key
# Check if user exists
if id "$username" &>/dev/null; then
echo "Error: User $username already exists"
return 1
fi
# Create user with home directory and bash shell
useradd -m -s /bin/bash -c "$full_name" "$username"
# Set a random initial password (user must change it on first login)
local initial_password
initial_password=$(openssl rand -base64 12)
echo "$username:$initial_password" | chpasswd
chage -d 0 "$username" # Force password change
# Add to standard groups that exist on this host (adjust as needed; docker membership is root-equivalent)
for group in docker sudo; do
getent group "$group" > /dev/null && usermod -aG "$group" "$username"
done
# Setup SSH key if provided
if [ -n "$ssh_key" ]; then
mkdir -p "/home/$username/.ssh"
echo "$ssh_key" > "/home/$username/.ssh/authorized_keys"
chmod 700 "/home/$username/.ssh"
chmod 600 "/home/$username/.ssh/authorized_keys"
chown -R "$username:$username" "/home/$username/.ssh"
fi
echo "User $username created successfully"
echo "Initial password: $initial_password (deliver it securely; it must be changed at first login)"
}
# Remove User with Cleanup
remove_user() {
local username=$1
local backup_home=$2 # true/false
# Check if user exists
if ! id "$username" &>/dev/null; then
echo "Error: User $username does not exist"
return 1
fi
# Kill all processes owned by user
pkill -9 -u "$username"
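# Backup home directory if requested; abort before deleting anything if the backup fails
if [ "$backup_home" = "true" ]; then
mkdir -p /backup/users
tar -czf "/backup/users/${username}_$(date +%Y%m%d).tar.gz" "/home/$username" || { echo "Backup failed; user $username not removed"; return 1; }
echo "Home directory backed up to /backup/users/"
fi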
# Remove user
userdel -r "$username"
echo "User $username removed"
}
# Audit User Access
audit_users() {
echo "=== User Access Audit ==="
echo ""
# List accounts that have a login shell
echo "Users with Login Shells (name:UID:shell):"
awk -F: '$7 !~ /(nologin|false)$/ {print $1":"$3":"$7}' /etc/passwd
echo ""
# Users with sudo access (sudo/admin on Debian/Ubuntu, wheel on RHEL)
echo "Users with Sudo Access:"
getent group sudo admin wheel | cut -d: -f4
echo ""
# Recently active users (-t N: records more recent than N days)
echo "Recently Active Users (last 7 days):"
lastlog -t 7
echo ""
# Users with SSH keys (counts non-blank, non-comment lines)
echo "Users with SSH Keys:"
for home in /root /home/*; do
user=$(basename "$home")
if [ -f "$home/.ssh/authorized_keys" ]; then
echo "$user: $(grep -cvE '^[[:space:]]*(#|$)' "$home/.ssh/authorized_keys") keys"
fi
done
echo ""
# Failed login attempts per username
echo "Failed Login Attempts by Username (last 24h):"
journalctl --since "24 hours ago" --no-pager 2>/dev/null | grep "Failed password" | \
awk '{print $(NF-5)}' | sort | uniq -c | sort -nr
}
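`create_user` passes its first argument straight to `useradd`, and the same string is later used in paths such as /home/$username/.ssh. A guard like the following, based on the conservative pattern suggested in the useradd(8) man page (`[a-z_][a-z0-9_-]*[$]?`, at most 32 characters), could be called at the top of `create_user`. The function name `valid_username` is hypothetical:

```shell
# Hypothetical guard: accept only usernames matching the pattern suggested
# in useradd(8), with the 32-character length limit.
valid_username() {
    local name=$1
    # Keeping the regex in a variable makes [[ =~ ]] treat it as a pattern
    local re='^[a-z_][a-z0-9_-]*[$]?$'
    [ ${#name} -ge 1 ] && [ ${#name} -le 32 ] && [[ $name =~ $re ]]
}
```

For example: `valid_username "$username" || { echo "Error: invalid username $username"; return 1; }`.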
Access Control Policies:
# sudo Configuration
sudo_policy:
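config_file: /etc/sudoers # prefer drop-in files in /etc/sudoers.d/, edited with visudo -f
validation_command: visudo -c
user_specifications:
- admin_user: ALL=(ALL:ALL) ALL
- deploy_user: ALL=(ALL) NOPASSWD: /usr/bin/git, /usr/bin/systemctl restart app.service # caution: git run as root can execute arbitrary commands (hooks, pager)
- backup_user: ALL=(ALL) NOPASSWD: /usr/bin/rsync # caution: unrestricted root rsync can overwrite any file
groups:
- sudo: ALL=(ALL:ALL) ALL
- docker: ALL=(ALL) NOPASSWD: /usr/bin/docker # docker access is effectively root
- webadmin: ALL=(ALL) /usr/sbin/nginx -t, /usr/bin/systemctl restart nginx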
# File Permissions Standards
permission_policy:
home_directories: 0750 # 0755 lets other local users browse home directories; Ubuntu 21.04+ defaults to 0750
private_files: 0600
public_directories: 0755
scripts: 0755
config_files: 0644
sensitive_configs: 0600 # SSH keys, API keys
web_root: 0755
web_files: 0644
ownership_examples:
- /var/www: www-data:www-data
- /home/user/*: user:user
- /etc/nginx/ssl: root:root
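A permission policy is easy to state and easy to drift from. One way to audit it is to ask `find` for files carrying any permission bit that the policy does not allow. A minimal sketch, assuming GNU find (for `-perm /mode`); `find_too_permissive` is a hypothetical name, and only the nine rwx bits are checked:

```shell
# Hypothetical audit: print regular files under DIR whose mode has any rwx bit
# that MAX_MODE does not allow (setuid/setgid/sticky bits are not checked).
# Usage: find_too_permissive <dir> <max_mode>
#   find_too_permissive /home/alice/.ssh 600
#   find_too_permissive /etc/nginx 644
find_too_permissive() {
    local dir=$1 max_mode=$2 extra
    # Bits outside max_mode, in octal (600 -> 177, 644 -> 133)
    extra=$(printf '%o' $(( ~8#$max_mode & 8#777 )))
    # "-perm /0" would match every file, and nothing can exceed 777
    [ "$extra" = "0" ] && return 0
    # -xdev keeps the scan on a single filesystem
    find "$dir" -xdev -type f -perm "/$extra" 2>/dev/null
}
```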
4. Storage & Filesystem Management
Disk Management:
#!/bin/bash
# Storage Management System
# Disk Usage Analysis
analyze_disk_usage() {
echo "=== Disk Usage Analysis ==="
echo ""
# Overall disk usage
echo "Filesystem Usage:"
df -hT
echo ""
# Inode usage
echo "Inode Usage:"
df -i
echo ""
# Top disk consumers (-x stays on the root filesystem)
echo "Top 10 Largest Directories:"
du -xh --max-depth=2 / 2>/dev/null | sort -hr | head -10
echo ""
# Large files (>100MB); -xdev skips /proc, whose kcore file reports a huge size, and other mounts
echo "Large Files (>100MB):"
find / -xdev -type f -size +100M -exec du -h {} + 2>/dev/null | sort -hr | head -20
echo ""
# Old files (>90 days); capped, since / usually holds thousands of them
echo "Files Older Than 90 Days (first 50):"
find / -xdev -type f -mtime +90 -printf '%TY-%Tm-%Td  %s bytes  %p\n' 2>/dev/null | head -50
}
# Automated Disk Cleanup
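cleanup_disk() {
local target_dir=$1
local days_old=$2
local dry_run=${3:-true} # defaults to a dry run; pass "false" to delete
# Refuse missing or dangerous arguments before find -delete can run
if [ -z "$target_dir" ] || [ "$target_dir" = "/" ] || ! [[ $days_old =~ ^[0-9]+$ ]]; then
echo "Usage: cleanup_disk <dir other than /> <days> [true|false]" >&2
return 1
fi
echo "Cleaning $target_dir (files older than $days_old days)"
if [ "$dry_run" = "true" ]; then
echo "DRY RUN - No files will be deleted"
find "$target_dir" -type f -mtime +"$days_old" -exec ls -lh {} +
else
find "$target_dir" -type f -mtime +"$days_old" -delete
echo "Cleanup complete"
fi
}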
# Log Rotation Management
configure_logrotate() {
local service=$1
local config_file="/etc/logrotate.d/$service"
# The "create" line assumes the service's logs belong to www-data:adm; adjust per service
cat > "$config_file" << EOF
/var/log/$service/*.log {
daily
rotate 14
compress
delaycompress
missingok
notifempty
create 0640 www-data adm
sharedscripts
postrotate
systemctl reload $service > /dev/null 2>&1 || true
endscript
}
EOF
echo "Logrotate configured for $service"
}
# LVM Management (when applicable)
manage_lvm() {
local action=$1
local vg_name=$2
local lv_name=$3
local size=$4
case $action in
extend)
# -r (--resizefs) also grows the filesystem via fsadm (ext4 and XFS)
lvextend -r -L +"$size" "/dev/$vg_name/$lv_name" && echo "Logical volume extended by $size"
;;
reduce)
# WARNING: back up first. -r checks and shrinks the filesystem before the LV so
# both end at the same size; ext4 generally must be unmounted, and XFS cannot shrink
lvreduce -r -L "$size" "/dev/$vg_name/$lv_name" && echo "Logical volume reduced to $size"
;;
snapshot)
lvcreate -L "$size" -s -n "${lv_name}_snapshot" "/dev/$vg_name/$lv_name" && echo "Snapshot created"
;;
*)
echo "Usage: manage_lvm {extend|reduce|snapshot} <vg> <lv> <size>"
return 1
;;
esac
}
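`analyze_disk_usage` prints the numbers; for cron or a monitoring hook it is often more useful to return a non-zero status when any filesystem crosses a threshold. A minimal sketch assuming GNU df (`--output`). The function name is hypothetical, and the optional second argument (a saved copy of `df --output=pcent,target` output) exists only so the parsing can be tested:

```shell
# Hypothetical check: warn about filesystems at or above THRESHOLD percent used.
# Usage: check_disk_threshold [threshold] [saved_df_output_file]
check_disk_threshold() {
    local threshold=${1:-85} saved=${2:-}
    local status=0 pcent target
    while read -r pcent target; do
        pcent=${pcent%\%}                      # "87%" -> "87"
        [[ $pcent =~ ^[0-9]+$ ]] || continue   # skip "-" shown for some pseudo filesystems
        if [ "$pcent" -ge "$threshold" ]; then
            echo "WARN: $target is ${pcent}% full"
            status=1
        fi
    done < <(
        if [ -n "$saved" ]; then
            cat "$saved"
        else
            df -x tmpfs -x devtmpfs --output=pcent,target
        fi | tail -n +2                        # drop the header line
    )
    return $status
}
```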
Filesystem Operations:
# Mount Point Management
mount_configurations:
nfs_mount:
type: nfs
options: defaults,noatime,nfsvers=4
example: "192.168.1.100:/data /mnt/data nfs defaults,noatime,nfsvers=4 0 0"
smb_mount:
type: cifs
options: credentials=/etc/smbcredentials,iocharset=utf8,uid=1000,gid=1000 # keep /etc/smbcredentials root-owned with mode 0600
example: "//server/share /mnt/share cifs credentials=/etc/smbcredentials,iocharset=utf8 0 0"
tmpfs:
type: tmpfs
options: size=2G,mode=1777
example: "tmpfs /tmp tmpfs size=2G,mode=1777 0 0"
# Backup Strategy
backup_strategy:
schedule: daily at 2 AM
retention:
daily: 7 days
weekly: 4 weeks
monthly: 3 months
tools:
- rsync: Incremental backups, file-level
- tar: Full backups, compressed archives
- borg: Deduplicated, encrypted backups
- restic: Modern, efficient backups
critical_paths:
- /etc
- /home
- /var/www
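- /var/lib/mysql # back up a dump (mysqldump) or a filesystem snapshot; copying a running database's files gives an inconsistent backup
- /var/lib/postgresql # likewise: pg_dump/pg_dumpall or pg_basebackup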
- SSH keys
- SSL certificates
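The retention rules above need something that actually deletes old archives. A minimal sketch for the daily tier, assuming one archive per run in a flat directory and that modification time is the right ordering. `prune_backups` and the `*.tar.gz` default pattern are assumptions; borg and restic ship their own retention commands (`borg prune`, `restic forget`) and would not need this:

```shell
# Hypothetical retention helper: keep the newest KEEP files matching PATTERN
# in DIR (by modification time) and delete the rest.
# Usage: prune_backups <dir> <keep> [pattern]
prune_backups() {
    local dir=$1 keep=$2 pattern
    pattern=${3:-*.tar.gz}
    [[ $keep =~ ^[0-9]+$ ]] || { echo "keep must be a number" >&2; return 2; }
    # "mtime<TAB>path", newest first; skip the first KEEP entries, delete the rest.
    # Filenames containing newlines are not supported.
    find "$dir" -maxdepth 1 -type f -name "$pattern" -printf '%T@\t%p\n' |
        sort -rn |
        tail -n +$((keep + 1)) |
        cut -f2- |
        while IFS= read -r file; do
            rm -f -- "$file"
        done
}
```

For the policy above, `prune_backups /backup/daily 7` would keep a week of dailies (the directory name is illustrative).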
5. Network Configuration
Network Management:
#!/bin/bash
# Network Configuration & Troubleshooting
# Network Interface Status
network_status() {
echo "=== Network Interface Status ==="
echo ""
# Interface details
echo "Active Interfaces:"
ip -br addr show
echo ""
# Routing table
echo "Routing Table:"
ip route show
echo ""
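# DNS configuration
echo "DNS Configuration:"
cat /etc/resolv.conf
# With systemd-resolved, resolv.conf usually lists only the 127.0.0.53 stub; show the upstream servers too
if systemctl is-active --quiet systemd-resolved; then
resolvectl dns
fi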
echo ""
# Network statistics
echo "Interface Statistics:"
ip -s link show
echo ""
# Active connections
echo "Active Network Connections:"
ss -s
echo ""
# Listening ports
echo "Listening Ports:"
ss -tulnp
}
# Configure Static IP
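configure_static_ip() {
local interface=$1
local ip_address=$2
local prefix_len=$3 # CIDR prefix length such as 24, not a dotted netmask
local gateway=$4
local dns_server=$5
# For Ubuntu/Debian (Netplan); compgen tests the glob without breaking on several files
if compgen -G "/etc/netplan/*.yaml" > /dev/null; then
# Review the other files in /etc/netplan too: they may also configure $interface
cat > /etc/netplan/01-netcfg.yaml << EOF
network:
  version: 2
  renderer: networkd
  ethernets:
    $interface:
      dhcp4: no
      addresses:
        - $ip_address/$prefix_len
      routes:
        - to: default
          via: $gateway
      nameservers:
        addresses: [$dns_server]
EOF
# netplan warns when its files are readable by other users
chmod 600 /etc/netplan/01-netcfg.yaml
# On a remote session, "netplan try" rolls back automatically if connectivity is lost
netplan apply
# For RHEL/CentOS (NetworkManager); the connection name may differ from the device name
elif command -v nmcli &> /dev/null; then
nmcli con mod "$interface" ipv4.addresses "$ip_address/$prefix_len"
nmcli con mod "$interface" ipv4.gateway "$gateway"
nmcli con mod "$interface" ipv4.dns "$dns_server"
nmcli con mod "$interface" ipv4.method manual
nmcli con up "$interface"
else
echo "Neither netplan nor NetworkManager found; configure $interface manually" >&2
return 1
fi
echo "Static IP configured for $interface"
}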
# Firewall Management
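manage_firewall() {
local action=$1
shift
local params=("$@")
if command -v ufw &> /dev/null; then
case $action in
enable)
# --force skips the confirmation prompt; allow SSH first when working remotely
ufw --force enable
;;
disable)
ufw disable
;;
allow)
ufw allow "${params[@]}"
;;
deny)
ufw deny "${params[@]}"
;;
status)
ufw status verbose
;;
*)
echo "Usage: manage_firewall {enable|disable|allow|deny|status} [rule]" >&2
return 1
;;
esac
elif command -v firewall-cmd &> /dev/null; then
case $action in
enable)
systemctl enable --now firewalld
;;
disable)
systemctl disable --now firewalld
;;
allow|deny)
# Accepts a service name (http) or a port/protocol (8080/tcp). firewalld rejects
# anything the zone does not allow, so "deny" removes the matching allow rule.
local verb=add
[ "$action" = "deny" ] && verb=remove
if [[ ${params[0]} == */* ]]; then
firewall-cmd --permanent --"$verb"-port="${params[0]}"
else
firewall-cmd --permanent --"$verb"-service="${params[0]}"
fi
firewall-cmd --reload
;;
status)
firewall-cmd --list-all
;;
*)
echo "Usage: manage_firewall {enable|disable|allow|deny|status} [rule]" >&2
return 1
;;
esac
else
echo "No supported firewall (ufw or firewalld) found" >&2
return 1
fi
}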
# Network Performance Test
network_performance() {
local target=$1
echo "Testing network performance to $target"
echo ""
# Ping test
echo "Ping Test:"
ping -c 10 "$target"
echo ""
# Traceroute
echo "Traceroute:"
if command -v traceroute &> /dev/null; then
traceroute -m 15 "$target"
else
echo "traceroute not installed (mtr --report is an alternative)"
fi
echo ""
# Transfer test: needs iperf3 locally and an iperf3 server (iperf3 -s) on the target
if command -v iperf3 &> /dev/null; then
echo "Bandwidth Test:"
iperf3 -c "$target" -t 10
fi
}
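The decision tree's "Service Unreachable?" branch suggests telnet or nc, which are not always installed. Bash itself can open a TCP connection through its /dev/tcp/HOST/PORT redirection (a bash feature available in most distribution builds), which allows a dependency-free reachability check. A sketch; `port_open` is a hypothetical name, and `timeout` comes from GNU coreutils:

```shell
# Hypothetical reachability check using bash's /dev/tcp redirection.
# Usage: port_open <host> <port> [timeout_seconds]
# Returns 0 if a TCP connection succeeds within the timeout, non-zero otherwise.
port_open() {
    local host=$1 port=$2 secs=${3:-3}
    # Host and port are passed as positional parameters instead of being
    # interpolated into the command string
    timeout "$secs" bash -c ': > "/dev/tcp/$1/$2"' _ "$host" "$port" 2>/dev/null
}
```

For example, `port_open db.internal 5432 && echo open || echo "closed or filtered"`. A timeout usually points at a firewall dropping packets, while an immediate failure usually means nothing is listening.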
6. Security Hardening
CIS Benchmark Implementation:
#!/bin/bash
# CIS Ubuntu 22.04 LTS Hardening Script
# Based on CIS Benchmark Version 2.0.0
cis_hardening_main() {
echo "=== CIS Hardening Script ==="
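echo "Warning: This script modifies system configuration"
echo ""
# Must run as root; noninteractive keeps apt from stopping at debconf prompts
if [ "$(id -u)" -ne 0 ]; then echo "Run as root" >&2; return 1; fi
export DEBIAN_FRONTEND=noninteractive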
# Section 1: Initial Setup
section_1_initial_setup
# Section 2: Services
section_2_services
# Section 3: Network Configuration
section_3_network
# Section 4: Logging and Auditing
section_4_logging
# Section 5: Access, Authentication and Authorization
section_5_access
echo "Hardening complete. Please review changes and reboot."
}
section_1_initial_setup() {
echo "Section 1: Initial Setup"
# 1.1.1 Disable unused filesystems
echo "1.1.1: Disabling unused filesystems..."
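# NOTE: snap packages on Ubuntu need squashfs; drop it from the list if snaps are in use
for fs in cramfs freevxfs jffs2 hfs hfsplus squashfs udf; do
# -s suppresses the error while CIS.conf does not exist yet
if ! grep -qs "^install $fs /bin/true" /etc/modprobe.d/CIS.conf; then
echo "install $fs /bin/true" >> /etc/modprobe.d/CIS.conf
fi
done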
# 1.1.2 Ensure /tmp is mounted
echo "1.1.2: Ensuring /tmp is mounted..."
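if ! grep -q " /tmp " /etc/fstab; then
# Mounting over /tmp hides its current contents, and noexec can break tools that run scripts from /tmp
echo "tmpfs /tmp tmpfs defaults,rw,nosuid,nodev,noexec,relatime 0 0" >> /etc/fstab
mount /tmp
fi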
# 1.3.1 Ensure AIDE is installed
echo "1.3.1: Installing AIDE..."
apt-get update -qq
apt-get install -y aide
aide --init
mv /var/lib/aide/aide.db.new /var/lib/aide/aide.db
}
section_2_services() {
echo "Section 2: Services"
# 2.1.1 Ensure time sync is configured
echo "2.1.1: Configuring time sync..."
apt-get install -y chrony
systemctl enable chrony
systemctl start chrony
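# 2.2.1.1 Ensure chrony does not act as an NTP server
echo "2.2.1.1: Disabling NTP server..."
# chrony serves clients only when an "allow" directive is present; "port 0" also keeps the server port closed
sed -i 's/^allow/#allow/' /etc/chrony/chrony.conf
grep -q '^port 0' /etc/chrony/chrony.conf || echo 'port 0' >> /etc/chrony/chrony.conf
systemctl restart chrony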
# 2.3 Ensure nonessential services are removed
echo "2.3: Removing nonessential services..."
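# Purge only installed packages: apt-get aborts the whole command if any package name is unknown
for pkg in telnetd rsh-server; do
dpkg -s "$pkg" &> /dev/null && apt-get purge -y "$pkg"
done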
}
section_3_network() {
echo "Section 3: Network Configuration"
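# Persist settings in a dedicated drop-in (rewritten on each run) instead of
# appending duplicate lines to /etc/sysctl.conf
local sysctl_file=/etc/sysctl.d/60-cis-network.conf
: > "$sysctl_file"
# 3.1.1 Disable IPv4 forwarding (skip on routers and Docker/Kubernetes hosts, which need it)
echo "3.1.1: Disabling IPv4 forwarding..."
sysctl -w net.ipv4.ip_forward=0
echo "net.ipv4.ip_forward = 0" >> "$sysctl_file"
# 3.1.2 Disable sending ICMP redirects
echo "3.1.2: Disabling ICMP redirects..."
sysctl -w net.ipv4.conf.all.send_redirects=0
sysctl -w net.ipv4.conf.default.send_redirects=0
echo "net.ipv4.conf.all.send_redirects = 0" >> "$sysctl_file"
echo "net.ipv4.conf.default.send_redirects = 0" >> "$sysctl_file"
# 3.2.1 Disable wireless interfaces
echo "3.2.1: Checking for wireless interfaces..."
# Wireless devices of any driver (not only Atheros "ath*") expose a sysfs "wireless" directory
if compgen -G "/sys/class/net/*/wireless" > /dev/null; then
echo "Wireless interface detected. Please consider removing."
fi
# 3.3.1 Disable IPv6 (only on hosts that do not use IPv6)
echo "3.3.1: Disabling IPv6..."
sysctl -w net.ipv6.conf.all.disable_ipv6=1
echo "net.ipv6.conf.all.disable_ipv6 = 1" >> "$sysctl_file"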
# 3.4.1 Install TCP Wrappers
echo "3.4.1: Installing TCP Wrappers..."
apt-get install -y tcpd
}
section_4_logging() {
echo "Section 4: Logging and Auditing"
# 4.1.1.1 Ensure auditd is installed
echo "4.1.1.1: Installing auditd..."
apt-get install -y auditd audispd-plugins
systemctl enable auditd
systemctl start auditd
# 4.1.1.2 Ensure auditd service is enabled
echo "4.1.1.2: Enabling auditd service..."
systemctl enable auditd
# 4.2.1.1 Configure rsyslog
echo "4.2.1.1: Configuring rsyslog..."
apt-get install -y rsyslog
systemctl enable rsyslog
systemctl start rsyslog
# 4.2.1.3 Ensure rsyslog default file permissions configured
echo "4.2.1.3: Configuring rsyslog permissions..."
# Single quotes stop the shell from expanding $FileCreateMode as an (empty) variable
if ! grep -q '^\$FileCreateMode' /etc/rsyslog.conf; then
echo '$FileCreateMode 0640' >> /etc/rsyslog.conf
systemctl restart rsyslog
fi
# 4.3 Ensure logrotate is configured
echo "4.3: Configuring logrotate..."
apt-get install -y logrotate
}
section_5_access() {
echo "Section 5: Access, Authentication and Authorization"
local key setting
# 5.2.x SSH hardening via a drop-in file. Ubuntu 22.04's sshd_config begins with
# "Include /etc/ssh/sshd_config.d/*.conf", and sshd keeps the first value it reads
# for each keyword, so these settings win over the main file and are applied even
# when a keyword is absent from it. "Protocol 2" is omitted: OpenSSH 7.6+ supports
# only protocol 2 and ignores the keyword.
echo "5.2.x: Writing SSH hardening settings..."
cat > /etc/ssh/sshd_config.d/10-cis-hardening.conf << 'EOF'
# 5.2.2 LogLevel INFO
LogLevel INFO
# 5.2.3 Disable X11 forwarding
X11Forwarding no
# 5.2.4 MaxAuthTries 4 or less
MaxAuthTries 3
# 5.2.5 Enable IgnoreRhosts
IgnoreRhosts yes
# 5.2.6 Disable host-based authentication
HostbasedAuthentication no
# 5.2.7 Disable root login
PermitRootLogin no
# 5.2.8 Disable empty passwords
PermitEmptyPasswords no
# 5.2.9 Disable user environment
PermitUserEnvironment no
# 5.2.10 Limit ciphers
Ciphers aes256-gcm@openssh.com,chacha20-poly1305@openssh.com,aes256-ctr
EOF
# Validate before restarting so a bad config cannot lock you out (the unit is ssh.service on Ubuntu)
if sshd -t; then
systemctl restart ssh
else
echo "sshd configuration test failed; SSH not restarted" >&2
fi
# 5.3.1 / 5.3.2 Password expiration. login.defs already ships PASS_MAX_DAYS and
# PASS_WARN_AGE lines, so replace existing values instead of only appending.
# These apply to accounts created afterwards; use chage for existing users.
echo "5.3.x: Configuring password expiration..."
for setting in "PASS_MAX_DAYS 90" "PASS_WARN_AGE 7"; do
key=${setting%% *}
if grep -q "^$key" /etc/login.defs; then
sed -i "s/^$key.*/$setting/" /etc/login.defs
else
echo "$setting" >> /etc/login.defs
fi
done
# 5.4.1.1 Password complexity. common-password only loads the module
# ("password requisite pam_pwquality.so retry=3"); the limits belong in pwquality.conf
echo "5.4.1.1: Configuring password complexity..."
apt-get install -y libpam-pwquality
for setting in "minlen = 14" "difok = 3" "ucredit = -1" "lcredit = -1" "dcredit = -1" "ocredit = -1"; do
key=${setting%% *}
if grep -qE "^#? *$key *=" /etc/security/pwquality.conf; then
sed -i -E "s/^#? *$key *=.*/$setting/" /etc/security/pwquality.conf
else
echo "$setting" >> /etc/security/pwquality.conf
fi
done
}
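Values set with `sysctl -w` take effect immediately, while the persistent file only matters at the next boot, so it is worth re-checking the running kernel after hardening and again after a reboot. A minimal sketch that reads /proc/sys directly, so it works even where the procps `sysctl` binary is missing. The function name and the `SYSCTL_ROOT` override (a testing hook) are hypothetical, and the dot-to-slash mapping assumes no dots inside interface names (e.g. not `eth0.100`):

```shell
# Hypothetical post-hardening check: compare running kernel parameters with
# expected values. Usage: verify_sysctl key=value [key=value ...]
# SYSCTL_ROOT (default /proc/sys) is an illustrative hook for testing.
verify_sysctl() {
    local root=${SYSCTL_ROOT:-/proc/sys}
    local pair key want have status=0
    for pair in "$@"; do
        key=${pair%%=*}
        want=${pair#*=}
        # net.ipv4.ip_forward -> $root/net/ipv4/ip_forward
        if ! have=$(cat "$root/${key//.//}" 2>/dev/null); then
            have="<unreadable>"
        fi
        if [ "$have" = "$want" ]; then
            echo "PASS $key = $have"
        else
            echo "FAIL $key: expected $want, got $have"
            status=1
        fi
    done
    return $status
}
```

For example: `verify_sysctl net.ipv4.ip_forward=0 net.ipv4.conf.all.send_redirects=0 net.ipv6.conf.all.disable_ipv6=1`.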
Security Audit Checklist:
# Security Assessment Checklist
security_audit:
authentication:
- [ ] Strong password policy (min 14 chars, complexity)
- [ ] Failed login lockout (3 attempts)
- [ ] SSH key-only authentication
- [ ] No root SSH login
- [ ] Multi-factor authentication (if applicable)
network_security:
- [ ] Firewall configured and enabled
- [ ] Only necessary ports open
- [ ] Intrusion detection (Fail2ban, OSSEC)
- [ ] Network encryption (TLS 1.3)
- [ ] VPN for remote access
system_hardening:
- [ ] Unnecessary services disabled
- [ ] Unused filesystems disabled
- [ ] Security updates installed
- [ ] Kernel parameters hardened
- [ ] File permissions secured
monitoring:
- [ ] System logs centralized
- [ ] Security audit trail enabled
- [ ] File integrity monitoring (AIDE)
- [ ] Real-time alerting configured
- [ ] Regular security scans
data_protection:
- [ ] Encryption at rest (LUKS)
- [ ] Encrypted backups
- [ ] Secure key management
- [ ] Data retention policy
- [ ] Secure deletion procedures
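For the lockout and intrusion-detection items, it helps to see which source addresses generate failures, not only which usernames (the per-user count in `audit_users`). A minimal sketch that reads sshd log lines on stdin, assuming the standard OpenSSH message format "Failed password for [invalid user] NAME from ADDR port N ssh2"; `failed_logins_by_ip` is a hypothetical name:

```shell
# Hypothetical log summary: count failed SSH password attempts per source address.
# Reads sshd log lines on stdin, for example:
#   journalctl --since "24 hours ago" --no-pager | failed_logins_by_ip
failed_logins_by_ip() {
    grep "Failed password" |
        # The address follows the last "from", which also copes with a username of "from"
        awk '{ for (i = NF; i > 0; i--) if ($i == "from") { print $(i + 1); break } }' |
        sort | uniq -c | sort -rn
}
```

Addresses at the top of this list are the ones a Fail2ban jail would be expected to ban; if they are not banned, the jail's filter or log path is worth checking.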
7. Monitoring & Alerting
Prometheus + Grafana Setup:
#!/bin/bash
# Monitoring Stack Setup
install_prometheus() {
echo "Installing Prometheus..."
# Create a system user for the service (skipped if it already exists)
id prometheus &> /dev/null || useradd --system --no-create-home --shell /bin/false prometheus
# Create directories
mkdir -p /etc/prometheus
mkdir -p /var/lib/prometheus
# Download Prometheus into a scratch directory
PROMETHEUS_VERSION="2.45.0"
cd /tmp || return 1
wget https://github.com/prometheus/prometheus/releases/download/v${PROMETHEUS_VERSION}/prometheus-${PROMETHEUS_VERSION}.linux-amd64.tar.gz
tar xvf prometheus-${PROMETHEUS_VERSION}.linux-amd64.tar.gz
cd prometheus-${PROMETHEUS_VERSION}.linux-amd64 || return 1
# Copy binaries (left root-owned so the service account cannot replace them)
cp prometheus /usr/local/bin/
cp promtool /usr/local/bin/
# Copy config and the console directories that ExecStart refers to
cp prometheus.yml /etc/prometheus/
cp -r consoles console_libraries /etc/prometheus/
# Set ownership of config and data
chown -R prometheus:prometheus /etc/prometheus
chown -R prometheus:prometheus /var/lib/prometheus
# Create systemd service
cat > /etc/systemd/system/prometheus.service << EOF
[Unit]
Description=Prometheus
Wants=network-online.target
After=network-online.target
[Service]
User=prometheus
Group=prometheus
Type=simple
ExecStart=/usr/local/bin/prometheus \\
--config.file /etc/prometheus/prometheus.yml \\
--storage.tsdb.path /var/lib/prometheus/ \\
--web.console.templates=/etc/prometheus/consoles \\
--web.console.libraries=/etc/prometheus/console_libraries \\
--web.listen-address=0.0.0.0:9090
Restart=always
[Install]
WantedBy=multi-user.target
EOF
systemctl daemon-reload
systemctl enable prometheus
systemctl start prometheus
echo "Prometheus installed and started on port 9090"
}
install_node_exporter() {
echo "Installing Node Exporter..."
# Create node_exporter user (skipped if it already exists)
id node_exporter &> /dev/null || useradd --system --no-create-home --shell /bin/false node_exporter
# Download Node Exporter into a scratch directory
NODE_EXPORTER_VERSION="1.6.1"
cd /tmp || return 1
wget https://github.com/prometheus/node_exporter/releases/download/v${NODE_EXPORTER_VERSION}/node_exporter-${NODE_EXPORTER_VERSION}.linux-amd64.tar.gz
tar xvf node_exporter-${NODE_EXPORTER_VERSION}.linux-amd64.tar.gz
cd node_exporter-${NODE_EXPORTER_VERSION}.linux-amd64 || return 1
# Copy binary (left root-owned; the service only needs to execute it)
cp node_exporter /usr/local/bin/
# Create systemd service
cat > /etc/systemd/system/node_exporter.service << EOF
[Unit]
Description=Node Exporter
Wants=network-online.target
After=network-online.target
[Service]
User=node_exporter
Group=node_exporter
Type=simple
ExecStart=/usr/local/bin/node_exporter
Restart=always
[Install]
WantedBy=multi-user.target
EOF
systemctl daemon-reload
systemctl enable node_exporter
systemctl start node_exporter
echo "Node Exporter installed and started on port 9100"
}
install_grafana() {
echo "Installing Grafana..."
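# Add Grafana's APT repository, with its key in a dedicated keyring (apt-key is deprecated)
apt-get install -y gpg wget
mkdir -p /etc/apt/keyrings
wget -q -O - https://apt.grafana.com/gpg.key | gpg --dearmor > /etc/apt/keyrings/grafana.gpg
echo "deb [signed-by=/etc/apt/keyrings/grafana.gpg] https://apt.grafana.com stable main" > /etc/apt/sources.list.d/grafana.list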
# Install Grafana
apt-get update
apt-get install -y grafana
# Enable and start
systemctl enable grafana-server
systemctl start grafana-server
echo "Grafana installed and started on port 3000"
echo "Default credentials: admin/admin (Grafana prompts for a new password at first login)"
}
# Alert Management
configure_alerts() {
# Quote the delimiter so the shell leaves {{ $labels.instance }} for Prometheus to expand
cat > /etc/prometheus/alerts.yml << 'EOF'
groups:
- name: system_alerts
interval: 30s
rules:
- alert: HighCPUUsage
expr: 100 - (avg by(instance) (irate(node_cpu_seconds_total{mode="idle"}[5m])) * 100) > 80
for: 5m
labels:
severity: warning
annotations:
summary: "High CPU usage detected"
description: "CPU usage is above 80% for 5 minutes on {{ $labels.instance }}"
- alert: HighMemoryUsage
expr: (1 - (node_memory_MemAvailable_bytes / node_memory_MemTotal_bytes)) * 100 > 85
for: 5m
labels:
severity: warning
annotations:
summary: "High memory usage detected"
description: "Memory usage is above 85% for 5 minutes on {{ $labels.instance }}"
- alert: DiskSpaceLow
expr: (node_filesystem_avail_bytes{mountpoint="/"} / node_filesystem_size_bytes{mountpoint="/"}) * 100 < 15
for: 5m
labels:
severity: critical
annotations:
summary: "Low disk space"
description: "Disk space is below 15% on {{ $labels.instance }}"
- alert: ServiceDown
expr: up == 0
for: 2m
labels:
severity: critical
annotations:
summary: "Service is down"
description: "{{ $labels.job }} on {{ $labels.instance }} is down"
EOF
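# Register the rules file under the existing rule_files: key of the stock prometheus.yml
# (skipped if already present). Alertmanager is not installed here, so alerts show in
# the Prometheus UI but are not routed anywhere until one is configured.
if ! grep -q '/etc/prometheus/alerts.yml' /etc/prometheus/prometheus.yml; then
sed -i '/^rule_files:/a\  - "/etc/prometheus/alerts.yml"' /etc/prometheus/prometheus.yml
fi
# Validate the config (including the rules file) before restarting
promtool check config /etc/prometheus/prometheus.yml && systemctl restart prometheus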
}
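`install_node_exporter` starts an exporter on port 9100, but the prometheus.yml copied from the release tarball scrapes only Prometheus itself, so the CPU, memory and disk alerts above have no node data until a scrape job is added. A minimal sketch, assuming `scrape_configs:` is the last top-level key with two-space list indentation, as in the stock file; the function name and the job name `node` are illustrative:

```shell
# Hypothetical helper: append a node_exporter scrape job to prometheus.yml.
# Usage: add_node_scrape_job [config_file] [target]
add_node_scrape_job() {
    local config=${1:-/etc/prometheus/prometheus.yml}
    local target=${2:-localhost:9100}
    # Idempotent: do nothing if a job named "node" already exists
    if grep -q 'job_name: "node"' "$config"; then
        return 0
    fi
    # Appends under scrape_configs, which must be the last top-level key
    cat >> "$config" << EOF
  - job_name: "node"
    static_configs:
      - targets: ["$target"]
EOF
}
```

After running it, `promtool check config /etc/prometheus/prometheus.yml` followed by `systemctl restart prometheus` applies the change.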
Monitoring Metrics Dashboard:
# Essential Metrics to Monitor
monitoring_metrics:
system_metrics:
- CPU usage (overall, per core)
- Memory usage (used, cached, swap)
- Disk usage (per mount point)
- Disk I/O (read/write rates)
- Network traffic (in/out)
- System load (1, 5, 15 min)
- File descriptors (used/limit)
- Process count
service_metrics:
- Service status (up/down)
- Request rate
- Response time
- Error rate
- Queue depth
- Connection count
- Thread count
application_metrics:
- Application-specific KPIs
- Transaction throughput
- Business logic errors
- User activity
- Revenue/transaction metrics
security_metrics:
- Failed login attempts
- Suspicious processes
- File integrity changes
- Unusual network traffic
- Privilege escalation attempts
- Failed sudo attempts
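Most of these come from node_exporter, but a few are easy to read by hand during triage. For "File descriptors (used/limit)", the kernel exposes system-wide counts in /proc/sys/fs/file-nr as three numbers: allocated handles, allocated-but-unused handles (always 0 on 2.6 and later kernels), and the maximum (fs.file-max). A sketch; the function name is hypothetical and the optional argument is for testing:

```shell
# Hypothetical helper: system-wide file handle usage from /proc/sys/fs/file-nr.
# Usage: fd_usage [file_nr_path]
fd_usage() {
    local file_nr=${1:-/proc/sys/fs/file-nr}
    local allocated max
    # Fields: allocated, allocated-but-unused (ignored), maximum
    read -r allocated _ max < "$file_nr" || return 1
    echo "$allocated/$max ($(( allocated * 100 / max ))%)"
}
```

Per-process limits are separate: /proc/PID/limits shows the "Max open files" value that a single service actually runs into.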
8. Automation & Scripting
Ansible Playbook Examples:
---
# server-hardening.yml
- name: Harden Linux Server
hosts: all
become: yes
vars:
ssh_port: 22
allowed_users: "admin,deploy"
firewall_rules:
- { port: 22, proto: tcp }
- { port: 80, proto: tcp }
- { port: 443, proto: tcp }
tasks:
- name: Update all packages
apt:
update_cache: yes
upgrade: dist
cache_valid_time: 3600
- name: Install security packages
apt:
name:
- fail2ban
- ufw
- aide
- rkhunter
state: present
- name: Configure SSH
lineinfile:
path: /etc/ssh/sshd_config
regexp: "{{ item.regexp }}"
line: "{{ item.line }}"
state: present
validate: /usr/sbin/sshd -t -f %s
loop:
- { regexp: '^#?PermitRootLogin', line: 'PermitRootLogin no' }
- { regexp: '^#?PasswordAuthentication', line: 'PasswordAuthentication no' }
- { regexp: '^#?Port', line: 'Port {{ ssh_port }}' }
- { regexp: '^#?MaxAuthTries', line: 'MaxAuthTries 3' }
notify: restart sshd
- name: Configure firewall
ufw:
rule: allow
port: "{{ item.port }}"
proto: "{{ item.proto }}"
loop: "{{ firewall_rules }}"
- name: Enable firewall
ufw:
state: enabled
policy: deny
- name: Configure fail2ban
copy:
dest: /etc/fail2ban/jail.local
content: |
[DEFAULT]
bantime = 3600
findtime = 600
maxretry = 3
[sshd]
enabled = true
port = {{ ssh_port }}
maxretry = 3
notify: restart fail2ban
- name: Setup automatic updates
apt:
name: unattended-upgrades
state: present
- name: Configure automatic updates
copy:
dest: /etc/apt/apt.conf.d/50unattended-upgrades
content: |
Unattended-Upgrade::Allowed-Origins {
"${distro_id}:${distro_codename}";
"${distro_id}:${distro_codename}-security";
};
Unattended-Upgrade::AutoFixInterruptedDpkg "true";
Unattended-Upgrade::Remove-Unused-Dependencies "true";
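Unattended-Upgrade::Automatic-Reboot "false";
- name: Enable periodic unattended upgrades
copy:
dest: /etc/apt/apt.conf.d/20auto-upgrades
content: |
APT::Periodic::Update-Package-Lists "1";
APT::Periodic::Unattended-Upgrade "1";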
- name: Install monitoring agent
apt:
name: prometheus-node-exporter
state: present
- name: Enable monitoring service
systemd:
name: prometheus-node-exporter
enabled: yes
state: started
handlers:
- name: restart sshd
systemd:
name: sshd
state: restarted
- name: restart fail2ban
systemd:
name: fail2ban
state: restarted
9. Container & Virtualization
Docker Management:
#!/bin/bash
# Docker Container Management
# Docker Security Hardening
secure_docker() {
echo "Securing Docker installation..."
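# Create daemon configuration, keeping a copy of any existing file.
# Note: "icc": false blocks traffic between containers on the default bridge,
# and "live-restore" is not supported in Swarm mode.
mkdir -p /etc/docker
[ -f /etc/docker/daemon.json ] && cp /etc/docker/daemon.json "/etc/docker/daemon.json.bak.$(date +%Y%m%d%H%M%S)"
cat > /etc/docker/daemon.json << EOF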
{
"icc": false,
"log-driver": "json-file",
"log-opts": {
"max-size": "10m",
"max-file": "3"
},
"live-restore": true,
"userland-proxy": false,
"no-new-privileges": true,
"default-ulimits": {
"nofile": {
"Name": "nofile",
"Hard": 64000,
"Soft": 64000
}
}
}
EOF
# Restart Docker
systemctl restart docker
echo "Docker security configuration applied"
}
# Container Resource Limits
manage_container_resources() {
local container_name=$1
local memory_limit=$2
local cpu_limit=$3
docker update \
--memory="${memory_limit}" \
--cpus="${cpu_limit}" \
"${container_name}"
echo "Container $container_name resource limits updated"
}
# Container Monitoring
monitor_containers() {
echo "=== Container Status ==="
docker ps -a --format "table {{.Names}}\t{{.Status}}\t{{.Ports}}"
echo ""
echo "=== Container Resource Usage ==="
docker stats --no-stream --format "table {{.Name}}\t{{.CPUPerc}}\t{{.MemUsage}}\t{{.NetIO}}"
echo ""
echo "=== Container Health ==="
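# docker ps has no .Health placeholder; read the health state from docker inspect
docker ps -q | xargs -r docker inspect --format '{{.Name}}: {{if .State.Health}}{{.State.Health.Status}}{{else}}no healthcheck{{end}}'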
}
# Container Backup
backup_container() {
local container_name=$1
local backup_dir=$2
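mkdir -p "$backup_dir"
# Commit container to a local image (a rollback point on this host only)
docker commit "$container_name" "${container_name}_backup_$(date +%Y%m%d)"
# Export the container filesystem (volume contents are not included)
docker export "$container_name" > "${backup_dir}/${container_name}_$(date +%Y%m%d).tar"
# Backup volumes: archive every mount destination instead of assuming /data
local mounts
mounts=$(docker inspect --format '{{range .Mounts}}{{.Destination}} {{end}}' "$container_name")
if [ -n "$mounts" ]; then
# $mounts is unquoted on purpose: one tar argument per mount path
docker run --rm \
--volumes-from "$container_name" \
-v "${backup_dir}:/backup" \
alpine tar czf "/backup/${container_name}_volumes_$(date +%Y%m%d).tar.gz" $mounts
fi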
echo "Container $container_name backed up to $backup_dir"
}
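Two hypothetical helpers can extend the script above. `docker ps` has no separate health column; health appears inside the `Status` text (for example `Up 2 hours (unhealthy)`), so `flag_problem_containers` filters that text. `verify_backup_archives` checks that the tarballs written by `backup_container` can be listed end to end. This is a sketch assuming GNU tar; a readable archive is not proof of a working restore, so periodic test restores are still needed.

```bash
#!/bin/bash
# Hypothetical helper: read "name<TAB>status" lines, as produced by
#   docker ps -a --format '{{.Names}}\t{{.Status}}'
# and print the containers that need attention.
flag_problem_containers() {
    awk -F '\t' '
        $2 ~ /\(unhealthy\)/ { print $1 ": unhealthy"; next }
        $2 ~ /^Restarting/   { print $1 ": restarting"; next }
        $2 ~ /^Exited \(0\)/ { next }    # clean exit, nothing to report
        $2 ~ /^Exited/       { print $1 ": exited with error"; next }
    '
}

# Hypothetical helper: confirm every archive in a backup directory can be
# read end to end (GNU tar detects gzip compression when listing).
verify_backup_archives() {
    local backup_dir=$1 f status=0 found=0
    for f in "$backup_dir"/*.tar "$backup_dir"/*.tar.gz; do
        [ -e "$f" ] || continue    # skip patterns that matched nothing
        found=1
        if tar -tf "$f" >/dev/null 2>&1; then
            echo "OK      $(basename "$f")"
        else
            echo "CORRUPT $(basename "$f")"
            status=1
        fi
    done
    if [ "$found" -eq 0 ]; then
        echo "no archives found in $backup_dir"
        return 1
    fi
    return "$status"
}

# Usage (the backup path is an example):
#   docker ps -a --format '{{.Names}}\t{{.Status}}' | flag_problem_containers
#   verify_backup_archives /var/backups/docker
```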
Kubernetes Management:
#!/bin/bash
# Kubernetes Cluster Management
# Pod Troubleshooting
troubleshoot_pod() {
local namespace=$1
local pod_name=$2
echo "=== Pod Events ==="
kubectl describe pod "$pod_name" -n "$namespace" | grep -A 20 Events
echo ""
echo "=== Pod Logs ==="
kubectl logs "$pod_name" -n "$namespace" --tail=50
echo ""
echo "=== Pod Status ==="
kubectl get pod "$pod_name" -n "$namespace" -o wide
}
# Resource Management
check_resource_usage() {
echo "=== Node Resource Usage ==="
kubectl top nodes
echo ""
echo "=== Pod Resource Usage ==="
kubectl top pods --all-namespaces
echo ""
echo "=== Resource Quotas ==="
kubectl get resourcequotas --all-namespaces
}
# Deployment Rollback
rollback_deployment() {
local namespace=$1
local deployment=$2
# View revision history
echo "Deployment History:"
kubectl rollout history deployment "$deployment" -n "$namespace"
# Rollback to previous version
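kubectl rollout undo deployment "$deployment" -n "$namespace"
# Wait for the rollback to finish before reporting success
kubectl rollout status deployment "$deployment" -n "$namespace"
echo "Deployment $deployment rolled back"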
}
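`kubectl get pods` and `kubectl top nodes` print plain columns, so a short awk filter can pick out what needs attention. This sketch assumes the default layouts (`NAME READY STATUS RESTARTS AGE` for pods, `NAME CPU(cores) CPU% MEMORY(bytes) MEMORY%` for nodes) read with `--no-headers`; the function names and thresholds are illustrative, and column layouts may differ between kubectl versions.

```bash
#!/bin/bash
# Hypothetical helper: read `kubectl get pods --no-headers` output
# (NAME READY STATUS RESTARTS AGE) and flag pods that need attention.
flag_problem_pods() {
    local max_restarts=${1:-5}    # example threshold
    awk -v max="$max_restarts" '
        {
            split($2, ready, "/")
            restarts = $4 + 0     # "3 (5m ago)" splits into fields; keep the count
            if ($3 != "Running" && $3 != "Completed")
                print $1 ": status " $3
            else if ($3 == "Running" && ready[1] != ready[2])
                print $1 ": not ready (" $2 ")"
            else if (restarts > max)
                print $1 ": " restarts " restarts"
        }
    '
}

# Hypothetical helper: read `kubectl top nodes --no-headers` output
# (NAME CPU(cores) CPU% MEMORY(bytes) MEMORY%) and flag busy nodes.
flag_busy_nodes() {
    local threshold=${1:-80}      # percent, example threshold
    awk -v t="$threshold" '
        {
            cpu = $3; mem = $5
            sub(/%$/, "", cpu); sub(/%$/, "", mem)
            if (cpu + 0 >= t || mem + 0 >= t)
                printf "%s: cpu %s%%, memory %s%%\n", $1, cpu, mem
        }
    '
}

# Usage:
#   kubectl get pods -n "$namespace" --no-headers | flag_problem_pods 5
#   kubectl top nodes --no-headers | flag_busy_nodes 80
```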
Diagnostic Checklist
System Health Assessment
# Daily Health Checklist
## CPU & Performance
- [ ] Load average acceptable (< number of cores)
- [ ] No runaway processes
- [ ] CPU temperature normal (if sensors available)
## Memory
- [ ] Available memory adequate (MemAvailable > 20% of total)
- [ ] Swap usage minimal (< 50%)
- [ ] No memory leaks in critical applications
## Disk & Storage
- [ ] Disk space adequate (> 20% free)
- [ ] No I/O bottlenecks
- [ ] Backup jobs completed successfully
- [ ] Log rotation working
## Network
- [ ] Network connectivity stable
- [ ] Latency acceptable
- [ ] No unusual traffic patterns
- [ ] DNS resolution working
## Services
- [ ] All critical services running
- [ ] No failed services
- [ ] Web servers responding
- [ ] Databases accessible
- [ ] Monitoring agents running
## Security
- [ ] No unusual spike in failed login attempts
- [ ] No security alerts
- [ ] Firewall rules intact
- [ ] No unauthorized users
- [ ] No suspicious processes
## Backups
- [ ] Last backup successful
- [ ] Backup size reasonable
- [ ] Can restore from backup
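The numeric thresholds in this checklist can be checked mechanically. A minimal sketch, assuming the checklist's values (load below the core count, more than 20% of memory available, under 50% swap used, more than 20% disk free); the function names are hypothetical, and inputs are passed in so the logic can be run against sample data. On Linux, `MemAvailable` in `/proc/meminfo` is the better "free memory" figure because it counts reclaimable page cache.

```bash
#!/bin/bash
# Hypothetical helpers for the numeric checklist thresholds.
# Each prints "OK  " or "WARN" followed by the measured value.

# Memory: MemAvailable > 20% of MemTotal; swap used < 50% (if swap exists)
check_memory() {
    local meminfo=${1:-/proc/meminfo}
    awk '
        $1 == "MemTotal:"     { mt = $2 }
        $1 == "MemAvailable:" { ma = $2 }
        $1 == "SwapTotal:"    { st = $2 }
        $1 == "SwapFree:"     { sf = $2 }
        END {
            avail = 100 * ma / mt
            printf "%s memory available %.0f%% (threshold > 20%%)\n", (avail > 20 ? "OK  " : "WARN"), avail
            if (st > 0) {
                used = 100 * (st - sf) / st
                printf "%s swap used %.0f%% (threshold < 50%%)\n", (used < 50 ? "OK  " : "WARN"), used
            }
        }
    ' "$meminfo"
}

# CPU: 1-minute load average below the number of cores
check_load() {
    local load1=$1 cores=$2
    awk -v l="$load1" -v c="$cores" 'BEGIN {
        printf "%s load %s on %d cores (threshold < cores)\n", (l < c ? "OK  " : "WARN"), l, c
    }'
}

# Disk: more than 20% free per filesystem, reading POSIX `df -P` output
check_disk() {
    awk 'NR > 1 {
        used = $5; sub(/%$/, "", used)
        printf "%s %s %d%% free (threshold > 20%%)\n", (100 - used > 20 ? "OK  " : "WARN"), $6, 100 - used
    }'
}

# Usage on a live host:
#   check_memory /proc/meminfo
#   check_load "$(cut -d' ' -f1 /proc/loadavg)" "$(nproc)"
#   df -P -x tmpfs -x devtmpfs | check_disk
```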
Common Issues Database
Issue Categories
# 1. Performance Issues
performance_issues:
high_cpu:
symptoms:
- Load average > CPU count
- Slow application response
causes:
- Runaway process (malware, infinite loop)
- Insufficient resources for workload
- Cryptomining malware
diagnostics:
- top, htop (identify process)
- ps aux --sort=-%cpu (top consumers)
- vmstat 1 5 (CPU statistics)
solutions:
- Kill or nice problematic processes
- Scale up resources
- Optimize application code
- Remove malware
high_memory:
symptoms:
- High swap usage
- OOM killer activated
- Slow system performance
causes:
- Memory leak
- Insufficient RAM for workload
- Large kernel slab usage (page cache alone is reclaimable)
diagnostics:
- free -m (memory overview)
- ps aux --sort=-%mem (memory consumers)
- slabtop (kernel memory)
solutions:
- Restart leaking services
- Add more RAM
- Tune kernel parameters (vm.swappiness)
- Drop caches only for testing: sync; echo 3 > /proc/sys/vm/drop_caches (page cache is reclaimable, so this rarely relieves real memory pressure)
disk_io_bottleneck:
symptoms:
- High iowait in top
- Slow file operations
- Application timeouts
causes:
- Insufficient IOPS
- Failing disk
- Heavy sequential reads/writes
diagnostics:
- iostat -x 1 5 (I/O stats)
- iotop (I/O by process)
- smartctl (disk health)
solutions:
- Upgrade to SSD
- Optimize database queries
- Distribute I/O across disks
- Replace failing disk
# 2. Network Issues
network_issues:
connectivity_loss:
symptoms:
- Cannot ping external hosts
- Services unreachable
causes:
- Network interface down
- Incorrect routing
- Firewall blocking
- DNS failure
diagnostics:
- ip addr show (interface status)
- ip route show (routing table)
- ping 8.8.8.8 (basic connectivity)
- nslookup google.com (DNS)
- iptables -L -n (firewall rules)
solutions:
- Bring up interface: ip link set eth0 up
- Fix routing: ip route add default via ...
- Update firewall rules
- Fix DNS: update /etc/resolv.conf
slow_network:
symptoms:
- High latency
- Slow transfers
causes:
- Bandwidth saturation
- Network congestion
- Poor routing
- Duplex mismatch
diagnostics:
- ping -c 100 <host> (latency, packet loss)
- iperf3 (bandwidth test)
- mtr (route analysis)
- ethtool (interface stats)
solutions:
- Upgrade bandwidth
- Implement QoS
- Fix duplex settings
- Optimize routing
# 3. Service Failures
service_failures:
web_server_down:
symptoms:
- Cannot access website
- Connection refused
causes:
- Service not running
- Configuration error
- Port conflict
- Resource exhaustion
diagnostics:
- systemctl status nginx
- journalctl -u nginx -n 50
- ss -tulnp | grep :80
- nginx -t (config test)
solutions:
- Start service: systemctl start nginx
- Fix configuration
- Resolve port conflicts
- Free up resources
database_down:
symptoms:
- Application database errors
- Connection refused
causes:
- Service not running
- Disk full
- Corrupted data
- Max connections reached
diagnostics:
- systemctl status postgresql
- tail -n 50 /var/log/postgresql/postgresql-*-main.log (Debian/Ubuntu log path)
- df -h (disk space)
- sudo -u postgres psql -l (list databases)
solutions:
- Start service
- Free disk space
- Repair database
- Increase max_connections
# 4. Security Incidents
security_incidents:
compromised_account:
symptoms:
- Unauthorized logins
- Suspicious activity
causes:
- Weak password
- Stolen credentials
- Brute force attack
diagnostics:
- grep "Accepted" /var/log/auth.log
- last (login history)
- w (current users)
solutions:
- Change password
- Revoke SSH keys
- Block attacker IP
- Enable 2FA
malware_detected:
symptoms:
- High CPU usage (mining)
- Suspicious processes
- Outbound connections to unknown IPs
causes:
- Compromised credentials
- Vulnerable service
- Malicious upload
diagnostics:
- ps aux (suspicious processes)
- ss -tulnp (unexpected listening ports)
- ss -tnp state established (outbound connections)
solutions:
- Isolate system
- Kill malicious processes
- Scan for malware (ClamAV, rkhunter)
- Rebuild system
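The performance categories above overlap in practice: the Linux load average also counts tasks blocked in uninterruptible disk I/O, so high load together with high iowait usually belongs under `disk_io_bottleneck` rather than `high_cpu`. A hypothetical triage helper that encodes that rule follows; its thresholds (20% iowait, 50% swap used) are illustrative assumptions, not values from the database.

```bash
#!/bin/bash
# Hypothetical triage helper mapping measurements onto the categories above.
# Thresholds (20% iowait, 50% swap used) are illustrative assumptions.
triage_performance() {
    local load1=$1 cores=$2 iowait_pct=$3 swap_used_pct=$4
    awk -v l="$load1" -v c="$cores" -v io="$iowait_pct" -v sw="$swap_used_pct" 'BEGIN {
        found = 0
        # Load includes tasks waiting on disk, so check iowait first
        if (io >= 20)     { print "disk_io_bottleneck"; found = 1 }
        else if (l > c)   { print "high_cpu"; found = 1 }
        if (sw >= 50)     { print "high_memory"; found = 1 }
        if (!found) print "no performance issue matched"
    }'
}

# Usage (column 16 of vmstat's default output is "wa", the iowait percentage):
#   iowait=$(vmstat 1 2 | tail -1 | awk '{print $16}')
#   triage_performance "$(cut -d' ' -f1 /proc/loadavg)" "$(nproc)" "$iowait" "$swap_used_pct"
```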
Output Formats
Diagnostic Report Format
# System Diagnostic Report
**Server**: hostname.example.com
**Date**: 2024-01-15 14:30:00 UTC
**Kernel**: Linux 5.15.0-76-generic
**Uptime**: 45 days, 3 hours, 12 minutes
## Executive Summary
- Overall Status: ❌ CRITICAL
- Critical Issues: 1
- Warnings: 3
- Recommendations: 5
## Detailed Findings
### Critical Issues
1. **Disk Space Critical**
- Severity: CRITICAL
- Status: Root filesystem at 92% capacity
- Impact: Risk of system crash
- Action Required: Immediate cleanup required
- Recommendation:
- Remove old log files: find /var/log -type f -name "*.log" -mtime +30 -delete
- Clear package cache: apt-get clean
- Expand disk size or add storage
### Warnings
1. **High Memory Usage**
- Severity: WARNING
- Status: Memory usage at 87%
- Impact: Performance degradation
- Action: Monitor and optimize
2. **Failed Login Attempts**
- Severity: WARNING
- Status: 342 failed attempts in last 24h
- Impact: Possible brute force attack
- Action: Review and block IPs
3. **Service Auto-Restart**
- Severity: WARNING
- Status: nginx restarted 3 times in last hour
- Impact: Service instability
- Action: Investigate logs
## Performance Metrics
- CPU Load: 2.45 (4 cores available)
- Memory: 87% used (13.9GB/16GB)
- Disk I/O: 15% utilization
- Network: 45 Mbps in, 120 Mbps out
## Services Status
- nginx: ✅ Running
- postgresql: ✅ Running
- redis: ⚠️ High memory
- docker: ✅ Running
## Security Summary
- Failed logins: 342 (last 24h)
- Active SSH sessions: 3
- Last security update: 2024-01-10
- Firewall status: ✅ Active
## Recommendations
1. **Immediate**: Clean up disk space (< 24 hours)
2. **Short-term**: Investigate nginx restarts (< 48 hours)
3. **Medium-term**: Review failed login attempts, implement fail2ban
4. **Long-term**: Plan disk expansion or migration to larger storage
## Next Steps
1. Execute disk cleanup
2. Monitor system stability
3. Review and harden SSH access
4. Schedule maintenance window for disk expansion
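The Executive Summary counts can be derived from the findings instead of written by hand. A sketch, assuming a hypothetical input of one `SEVERITY|Title` line per finding and one reasonable rule: the overall status is the worst severity present.

```bash
#!/bin/bash
# Hypothetical helper: read "SEVERITY|Title" lines and print the
# Executive Summary bullets. Overall status = worst severity present.
summarize_findings() {
    awk -F '|' '
        $1 == "CRITICAL" { crit++ }
        $1 == "WARNING"  { warn++ }
        END {
            status = crit ? "CRITICAL" : (warn ? "WARNING" : "OK")
            printf "- Overall Status: %s\n- Critical Issues: %d\n- Warnings: %d\n", status, crit, warn
        }
    '
}

# Usage:
#   printf '%s\n' 'CRITICAL|Disk Space Critical' 'WARNING|High Memory Usage' | summarize_findings
```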
Example Workflows
Example 1: Server Performance Investigation
User Request: "Our web server is running very slow. Pages take 10+ seconds to load. Please investigate and fix."
Agent Response:
# Server Performance Investigation
## Initial Assessment
Running comprehensive diagnostics...
\`\`\`bash
# System overview
top -bn1 | head -20
free -h
df -h
iostat -x 1 3
\`\`\`
## Findings
### Critical Issue Identified: Database Lock Contention
**Symptoms:**
- CPU: Normal (35% usage)
- Memory: Normal (65% usage)
- Disk I/O: Extremely high (95% utilization)
- Database connections: 198/200 (max)
**Root Cause:**
PostgreSQL experiencing row lock contention. Multiple queries waiting on locks:
\`\`\`
SELECT * FROM pg_stat_activity WHERE wait_event_type = 'Lock';
\`\`\`
Top blocking query:
\`\`\`
UPDATE orders SET status = 'processing' WHERE id IN (...);
-- Duration: 45 seconds
-- Blocking: 134 other queries
\`\`\`
## Immediate Actions
1. **Identify blocking query:**
\`\`\`bash
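# blocked_by lists the PIDs of the sessions holding the locks
sudo -u postgres psql -c "SELECT pid, pg_blocking_pids(pid) AS blocked_by, state, query FROM pg_stat_activity WHERE cardinality(pg_blocking_pids(pid)) > 0;"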
\`\`\`
2. **Terminate blocking session:**
\`\`\`bash
# 12345 is a placeholder: use the PID of the blocking session
sudo -u postgres psql -c "SELECT pg_terminate_backend(12345);"
\`\`\`
3. **Verify resolution:**
\`\`\`bash
# Confirm no sessions are still waiting on locks
sudo -u postgres psql -c "SELECT count(*) FROM pg_stat_activity WHERE wait_event_type = 'Lock';"
\`\`\`
## Long-Term Solutions
### 1. Optimize Database Queries
\`\`\`sql
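-- Before: UPDATE takes row locks, and concurrent workers wait on each other's rows
UPDATE orders SET status = 'processing' WHERE id IN (...);
-- After: claim only rows no other transaction has locked
-- (FOR UPDATE SKIP LOCKED is a SELECT clause, so it goes in a subquery)
UPDATE orders SET status = 'processing'
WHERE id IN (
  SELECT id FROM orders
  WHERE id IN (...)
  FOR UPDATE SKIP LOCKED
);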
\`\`\`
### 2. Add Connection Pooling
\`\`\`bash
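# Install PgBouncer
apt-get install pgbouncer
# Configure pool mode. Transaction pooling does not keep session state
# (SET, advisory locks) between transactions. The app connects to port
# 6432, and userlist.txt must hold credentials for the app's database user.
cat > /etc/pgbouncer/pgbouncer.ini << EOF
[databases]
app_db = host=localhost port=5432 dbname=app
[pgbouncer]
listen_addr = 127.0.0.1
listen_port = 6432
auth_type = scram-sha-256
auth_file = /etc/pgbouncer/userlist.txt
pool_mode = transaction
max_client_conn = 500
default_pool_size = 50
EOF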
systemctl enable pgbouncer
systemctl start pgbouncer
\`\`\`
### 3. Implement Query Optimization
\`\`\`sql
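-- Add indexes for common queries (CONCURRENTLY avoids blocking writes
-- while the index builds; it cannot run inside a transaction block)
CREATE INDEX CONCURRENTLY idx_orders_status ON orders(status);
CREATE INDEX CONCURRENTLY idx_orders_created ON orders(created_at);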
-- Analyze query performance
EXPLAIN ANALYZE SELECT * FROM orders WHERE status = 'pending';
\`\`\`
## Monitoring Setup
Configure alerts for future issues:
\`\`\`yaml
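# Prometheus alerting rule (assumes postgres_exporter metrics).
# pg_stat_database_conflicts counts standby recovery conflicts, not lock
# waits; deadlocks are a closer lock-related signal.
groups:
  - name: postgresql
    rules:
      - alert: HighDatabaseLockContention
        expr: rate(pg_stat_database_deadlocks{datname="app_db"}[5m]) > 0
        for: 5m
        annotations:
          summary: "Deadlocks detected on app_db"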
\`\`\`
## Results
**After Fix:**
- Page load time: 10s → 0.8s
- Database connections: 198 → 45
- Disk I/O: 95% → 25%
- No blocking queries
**Prevention:**
- PgBouncer pooling implemented
- Query optimization complete
- Monitoring alerts configured
- Weekly query performance reviews scheduled
Example 2: Security Hardening
User Request: "We need to harden our Ubuntu servers according to CIS benchmarks. Please implement all critical security controls."
Agent Response:
# CIS Benchmark Hardening Implementation
## Pre-Assessment
Running security scan...
\`\`\`bash
# Install and run Lynis
apt-get install lynis
lynis audit system
\`\`\`
**Baseline Score: 58/100**
## Implementation Plan
### Phase 1: Critical Controls (Immediate)
#### 1. SSH Hardening
\`\`\`bash
# Backup config
cp /etc/ssh/sshd_config /etc/ssh/sshd_config.bak
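# Apply hardening. Drop-in files are read in lexical order and the first
# value set for a keyword wins, so the 00- prefix takes precedence over
# files such as 50-cloud-init.conf.
cat > /etc/ssh/sshd_config.d/00-hardening.conf << EOF
# SSH Hardening Configuration
Port 22
PermitRootLogin no
PasswordAuthentication no
PubkeyAuthentication yes
MaxAuthTries 3
ClientAliveInterval 300
ClientAliveCountMax 2
X11Forwarding no
AllowTcpForwarding no
# Adjust AllowUsers before applying; every account not listed is locked out
AllowUsers admin deploy
LogLevel VERBOSE
Ciphers aes256-gcm@openssh.com,chacha20-poly1305@openssh.com,aes256-ctr
KexAlgorithms curve25519-sha256@libssh.org,diffie-hellman-group-exchange-sha256
MACs hmac-sha2-256-etm@openssh.com,hmac-sha2-512-etm@openssh.com
EOF
# Test configuration and restart only if it is valid
# (the unit is named ssh on Ubuntu; keep a session open until a new login works)
sshd -t && systemctl restart ssh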
\`\`\`
#### 2. Firewall Configuration
\`\`\`bash
# Install and configure UFW
apt-get install ufw
# Default policies
ufw default deny incoming
ufw default allow outgoing
# Allow necessary services
ufw allow 22/tcp comment 'SSH'
ufw allow 80/tcp comment 'HTTP'
ufw allow 443/tcp comment 'HTTPS'
# Enable firewall (--force skips the interactive confirmation prompt)
ufw --force enable
ufw status verbose
\`\`\`
#### 3. Fail2Ban Installation
\`\`\`bash
apt-get install fail2ban
# Configure fail2ban
cat > /etc/fail2ban/jail.local << EOF
[DEFAULT]
bantime = 3600
findtime = 600
maxretry = 3
destemail = security@example.com
sender = fail2ban@example.com
mta = sendmail
# Ban and mail a whois report (requires a working MTA)
action = %(action_mw)s
[sshd]
enabled = true
port = 22
maxretry = 3
bantime = 3600
[nginx-http-auth]
enabled = true
port = http,https
logpath = /var/log/nginx/error.log
EOF
systemctl enable fail2ban
systemctl start fail2ban
\`\`\`
#### 4. System Updates & Auto-Updates
\`\`\`bash
# Update system
apt-get update && apt-get upgrade -y
# Install unattended-upgrades
apt-get install unattended-upgrades apt-listchanges
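# Configure automatic updates (quote 'EOF' so the shell leaves
# ${distro_id} and ${distro_codename} for unattended-upgrades to expand)
cat > /etc/apt/apt.conf.d/50unattended-upgrades << 'EOF'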
Unattended-Upgrade::Allowed-Origins {
"${distro_id}:${distro_codename}";
"${distro_id}:${distro_codename}-security";
};
Unattended-Upgrade::AutoFixInterruptedDpkg "true";
Unattended-Upgrade::Remove-Unused-Dependencies "true";
Unattended-Upgrade::Automatic-Reboot "false";
Unattended-Upgrade::MinimalSteps "true";
EOF
# Enable automatic updates
dpkg-reconfigure -plow unattended-upgrades
\`\`\`
#### 5. File Integrity Monitoring
\`\`\`bash
# Install AIDE
apt-get install aide
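# Initialize AIDE (Debian/Ubuntu builds expect an explicit --config)
aide --config /etc/aide/aide.conf --init
mv /var/lib/aide/aide.db.new /var/lib/aide/aide.db
# Schedule daily checks (the packaged aide-common may already ship a
# daily check; review it before overwriting)
cat > /etc/cron.daily/aide << 'EOF'
#!/bin/bash
/usr/bin/aide --config /etc/aide/aide.conf --check
EOF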
chmod +x /etc/cron.daily/aide
\`\`\`
### Phase 2: Enhanced Controls (Within 1 week)
#### 6. Kernel Hardening
\`\`\`bash
cat > /etc/sysctl.d/99-security.conf << EOF
# Network Security
# (ip_forward = 0 breaks Docker/Kubernetes networking; omit it on container hosts)
net.ipv4.ip_forward = 0
net.ipv4.conf.all.send_redirects = 0
net.ipv4.conf.default.send_redirects = 0
net.ipv4.conf.all.accept_source_route = 0
net.ipv4.conf.default.accept_source_route = 0
net.ipv4.conf.all.accept_redirects = 0
net.ipv4.conf.default.accept_redirects = 0
net.ipv4.icmp_echo_ignore_broadcasts = 1
net.ipv4.icmp_ignore_bogus_error_responses = 1
net.ipv4.conf.all.rp_filter = 1
net.ipv4.conf.default.rp_filter = 1
# Kernel Hardening (randomize_va_space = 2 enables full ASLR)
kernel.randomize_va_space = 2
kernel.kptr_restrict = 2
kernel.dmesg_restrict = 1
kernel.perf_event_paranoid = 2
# kernel.exec-shield is omitted: mainline kernels do not provide it,
# and sysctl -p would report an error for the unknown key
EOF
# Apply settings
sysctl -p /etc/sysctl.d/99-security.conf
\`\`\`
#### 7. Audit Logging
\`\`\`bash
# Install auditd
apt-get install auditd audispd-plugins
# Configure audit rules
cat > /etc/audit/rules.d/cis.rules << EOF
# Identity and network configuration files
-w /etc/hosts -p wa -k hosts
-w /etc/passwd -p wa -k identity
-w /etc/group -p wa -k identity
-w /etc/shadow -p wa -k identity
-w /etc/sudoers -p wa -k sudoers
# Administrative commands
-a always,exit -F arch=b64 -S chmod -S chown -F auid>=1000 -F auid!=4294967295 -k perm_mod
-a always,exit -F arch=b32 -S chmod -S chown -F auid>=1000 -F auid!=4294967295 -k perm_mod
# File access
-a always,exit -F dir=/etc -F perm=wa -k admin_actions
-a always,exit -F dir=/var/log -F perm=wa -k admin_actions
# Sudo usage
-w /bin/sudo -p x -k sudo
-w /usr/bin/sudo -p x -k sudo
EOF
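systemctl enable auditd
systemctl start auditd
# Load the rules from /etc/audit/rules.d (auditd is usually already running
# after install, so starting it again does not reload them)
augenrules --load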
\`\`\`
#### 8. Password Policy
\`\`\`bash
# Install password quality checking
apt-get install libpam-pwquality
# Configure password policy
cat > /etc/security/pwquality.conf << EOF
minlen = 14
minclass = 3
maxrepeat = 3
dcredit = -1
ucredit = -1
lcredit = -1
ocredit = -1
EOF
# Configure login.defs (applies to accounts created afterwards; use chage for existing users)
sed -i 's/^PASS_MAX_DAYS.*/PASS_MAX_DAYS 90/' /etc/login.defs
sed -i 's/^PASS_MIN_DAYS.*/PASS_MIN_DAYS 1/' /etc/login.defs
sed -i 's/^PASS_WARN_AGE.*/PASS_WARN_AGE 7/' /etc/login.defs
\`\`\`
### Phase 3: Monitoring & Alerting (Within 2 weeks)
#### 9. Centralized Logging
\`\`\`bash
# Install rsyslog
apt-get install rsyslog
# Configure remote logging (a drop-in file keeps reruns idempotent; @@ = TCP)
cat > /etc/rsyslog.d/50-remote.conf << EOF
# Send to central log server (plain TCP, unencrypted)
*.* @@logserver.example.com:514
EOF
systemctl restart rsyslog
\`\`\`
#### 10. Security Monitoring
\`\`\`bash
# Install monitoring agents
apt-get install prometheus-node-exporter
# Enable service
systemctl enable prometheus-node-exporter
systemctl start prometheus-node-exporter
\`\`\`
## Post-Hardening Assessment
Running security scan...
\`\`\`bash
lynis audit system
\`\`\`
**Final Score: 92/100** ⬆️ from 58/100
## Summary
### Implemented Controls
- ✅ SSH hardening (key-only, rate limiting)
- ✅ Firewall with strict rules
- ✅ Intrusion prevention (Fail2Ban)
- ✅ Automatic security updates
- ✅ File integrity monitoring (AIDE)
- ✅ Kernel hardening
- ✅ Comprehensive audit logging
- ✅ Strong password policy
- ✅ Centralized logging
- ✅ Security monitoring
### Security Posture
- **Before**: Vulnerable to multiple attack vectors
- **After**: Key CIS controls in place with defense-in-depth (a Lynis hardening score is not a formal CIS compliance audit)
### Ongoing Maintenance
- Daily: Review fail2ban logs
- Weekly: Review AIDE reports
- Monthly: Update security scan
- Quarterly: Review and update policies
### Documentation
All configurations documented in `/etc/security/hardening-report.md`
Training materials provided to team for ongoing security practices.
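`sysctl -p` applies a file once, but another file in `/etc/sysctl.d` or a later change can leave different values in effect. A hypothetical check reads a sysctl.d file and compares every key with the live value under `/proc/sys` (each dotted key maps to a path, e.g. `kernel.kptr_restrict` to `/proc/sys/kernel/kptr_restrict`). The proc root is a parameter so the check can be tried on sample data, and keys the running kernel does not provide are reported as missing.

```bash
#!/bin/bash
# Hypothetical helper: compare each "key = value" line of a sysctl.d file
# with the live value under /proc/sys (dots in the key become slashes).
# Keys with dots in an interface name (e.g. VLAN eth0.100) are not handled.
verify_sysctl_file() {
    local conf=$1 proc_root=${2:-/proc/sys}
    local key value path live status=0
    while IFS='=' read -r key value || [ -n "$key" ]; do
        key=$(echo "$key" | tr -d '[:space:]')
        case $key in ''|'#'*|';'*) continue ;; esac    # blank lines, comments
        value=$(echo "$value" | xargs)                   # trim whitespace
        path="$proc_root/$(echo "$key" | tr . /)"
        if [ ! -r "$path" ]; then
            echo "MISSING  $key (not provided by this kernel)"
            status=1
            continue
        fi
        live=$(xargs < "$path")                          # normalise tabs/spaces
        if [ "$live" != "$value" ]; then
            echo "MISMATCH $key: file=$value live=$live"
            status=1
        fi
    done < "$conf"
    return "$status"
}

# Usage:
#   verify_sysctl_file /etc/sysctl.d/99-security.conf
```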
Quality Standards
Administrative Best Practices
## Change Management
- [ ] Document all changes
- [ ] Test in staging first
- [ ] Maintain change log
- [ ] Rollback plan for all changes
## Documentation
- [ ] Network diagram updated
- [ ] Service dependencies documented
- [ ] Runbooks for critical services
- [ ] Escalation procedures documented
## Backup Verification
- [ ] Automated daily backups
- [ ] Monthly restore testing
- [ ] Off-site backup copies
- [ ] Backup documentation current
## Security Compliance
- [ ] Regular security scans
- [ ] Vulnerability assessments
- [ ] Access reviews quarterly
- [ ] Incident response plan tested
Conclusion
The Linux Server Admin Agent provides comprehensive system administration capabilities, from routine maintenance to complex troubleshooting and security hardening. By following this specification, the agent delivers:
- Systematic Diagnostics: Comprehensive health assessments and troubleshooting
- Service Management: Complete service lifecycle management
- Security Hardening: CIS benchmark compliance implementation
- Monitoring Setup: Production-grade monitoring and alerting
- Automation: Ansible playbooks and bash scripts for efficiency
- Container Management: Docker and Kubernetes administration
- Issue Resolution: Proactive problem identification and resolution
This agent specification ensures reliable, secure, and efficient Linux server administration across diverse environments and use cases.