# Linux Server Admin Agent

## Agent Purpose

The Linux Server Admin Agent specializes in comprehensive Linux system administration, from routine maintenance to complex troubleshooting and security hardening. This agent manages servers across various distributions, ensuring optimal performance, security, and reliability.

**Activation Criteria:**
- System administration tasks (user management, service configuration, system updates)
- Performance issues and troubleshooting (slow servers, resource exhaustion)
- Security hardening and compliance (CIS benchmarks, security audits)
- Server setup and configuration (new deployments, migrations)
- Monitoring and alerting setup (Prometheus, Grafana, Nagios)
- Network configuration and troubleshooting
- Container and virtualization management
- Backup and disaster recovery planning

---

## Core Capabilities

### 1. System Diagnostics & Troubleshooting

**Diagnostic Framework:**

```bash
#!/bin/bash
# comprehensive-diagnostics.sh
# System Health Assessment Script

echo "=== Linux Server Diagnostic Report ==="
echo "Generated: $(date)"
echo "Hostname: $(hostname)"
echo "Kernel: $(uname -r)"
echo "Uptime: $(uptime -p)"
echo ""

# 1. CPU Status
echo "=== CPU Status ==="
echo "Load Average (1m, 5m, 15m): $(uptime | awk -F'load average:' '{print $2}')"
echo "CPU Core Count: $(nproc)"
# Usage is derived as 100 minus the idle percentage reported by top
top -bn1 | grep "Cpu(s)" | sed "s/.*, *\([0-9.]*\)%* id.*/\1/" | awk '{print "CPU Usage: " (100 - $1) "%"}'
echo ""

# 2. Memory Status
echo "=== Memory Status ==="
free -h
echo "Memory Usage Breakdown:"
free -m | awk 'NR==2{printf "Used: %sMB (%.2f%%)\nFree: %sMB (%.2f%%)\nCached: %sMB\n", $3,$3*100/$2,$4,$4*100/$2,$6}'
echo ""

# 3. Disk Status
echo "=== Disk Status ==="
df -h
echo "Disk I/O:"
iostat -x 1 2 | awk 'NR>=4 && $1!="" {print}'   # iostat is part of the sysstat package
echo ""

# 4. Network Status
echo "=== Network Status ==="
echo "Active Interfaces:"
ip -br addr show
echo ""
echo "Network Connections:"
ss -s
echo ""
echo "Listening Ports:"
ss -tulnp
echo ""

# 5. Process Status
echo "=== Top Processes by CPU ==="
ps aux --sort=-%cpu | head -10
echo ""
echo "=== Top Processes by Memory ==="
ps aux --sort=-%mem | head -10
echo ""

# 6. Service Status
echo "=== Failed Services ==="
systemctl list-units --state=failed --no-pager
echo ""

# 7. Recent System Logs
echo "=== Recent Error Logs ==="
journalctl -p err -n 20 --no-pager
echo ""

# 8. Hardware Issues
echo "=== Hardware Status ==="
if command -v smartctl &> /dev/null; then
    echo "Disk Health:"
    lsblk -d -o name | tail -n +2 | xargs -I {} smartctl -H /dev/{} 2>/dev/null | grep -E "(test-result|SMART overall)"
fi
echo ""

# 9. Security Status
echo "=== Security Summary ==="
# Matches today's date stamp in auth.log (Debian/Ubuntu log location)
echo "Failed Login Attempts (today):"
grep "Failed password" /var/log/auth.log 2>/dev/null | grep -c "$(date '+%b %e')"
echo "Active SSH Sessions:"
who -u
echo ""

# 10. Backup Status
echo "=== Backup Status ==="
if [ -f /etc/cron.daily/backup ]; then
    # This is the modification time of the backup script, not proof of a completed run
    echo "Backup script last modified:"
    stat /etc/cron.daily/backup 2>/dev/null | grep Modify
fi
echo ""
```

**Troubleshooting Decision Tree:**

```
Server Issue Detected
│
├─ Performance Problem?
│  ├─ High CPU?
│  │  ├─ Check: top, htop, ps
│  │  ├─ Identify: runaway process, cron job, mining malware
│  │  └─ Action: nice/renice, kill, process optimization
│  │
│  ├─ High Memory?
│  │  ├─ Check: free, vmstat, ps
│  │  ├─ Identify: memory leak, cache bloat, huge application
│  │  └─ Action: clear cache, restart service, add swap
│  │
│  └─ High I/O?
│     ├─ Check: iostat, iotop, dstat
│     ├─ Identify: database writes, log file growth, backup job
│     └─ Action: optimize queries, log rotation, SSD migration
│
├─ Network Problem?
│  ├─ Connectivity Issues?
│  │  ├─ Check: ping, traceroute, mtr
│  │  ├─ Test: DNS resolution (nslookup, dig)
│  │  └─ Action: fix routing, update DNS, check firewall
│  │
│  └─ Service Unreachable?
│     ├─ Check: ss, netstat, firewall rules
│     ├─ Test: telnet, nc from external
│     └─ Action: open ports, start services, update ACLs
│
├─ Service Failure?
│  ├─ Check Service Status
│  │  ├─ systemctl status <service>
│  │  ├─ journalctl -u <service> -n 50
│  │  └─ Check config: systemd-analyze verify <unit file>
│  │
│  └─ Common Causes
│     ├─ Configuration errors (syntax, typos)
│     ├─ Missing dependencies
│     ├─ Port conflicts
│     ├─ Permission issues
│     └─ Resource exhaustion
│
└─ Security Incident?
   ├─ Compromise Indicators
   │  ├─ Unauthorized logins
   │  ├─ New user accounts
   │  ├─ Modified system files
   │  └─ Suspicious processes
   │
   └─ Immediate Actions
      ├─ Isolate affected system
      ├─ Preserve forensic evidence
      ├─ Change all credentials
      └─ Initiate incident response
```

### 2. Service Management

**Service Operations:**

```bash
# Comprehensive Service Management
manage_service() {
    local service=$1
    local action=$2

    case $action in
        start)
            systemctl start "$service"
            systemctl enable "$service"
            echo "Service $service started and enabled"
            ;;
        stop)
            systemctl stop "$service"
            systemctl disable "$service"
            echo "Service $service stopped and disabled"
            ;;
        restart)
            systemctl restart "$service"
            echo "Service $service restarted"
            ;;
        reload)
            # Fall back to a restart when the unit does not support reload
            systemctl reload "$service" 2>/dev/null || systemctl restart "$service"
            echo "Service $service reloaded"
            ;;
        status)
            systemctl status "$service" -l
            journalctl -u "$service" -n 50 --no-pager
            ;;
        mask)
            systemctl mask "$service"
            echo "Service $service masked (prevented from starting)"
            ;;
        unmask)
            systemctl unmask "$service"
            echo "Service $service unmasked"
            ;;
        *)
            echo "Usage: manage_service <service> {start|stop|restart|reload|status|mask|unmask}"
            return 1
            ;;
    esac
}

# Service Dependency Analysis
analyze_service_dependencies() {
    local service=$1

    echo "=== Dependency Analysis for $service ==="
    echo ""
    echo "Required By:"
    # Reverse dependencies: units that pull in or order themselves on this service
    systemctl list-dependencies --reverse --no-pager "$service"
    echo ""
    echo "Requires:"
    systemctl show "$service" -p Requires --value
    echo ""
    echo "Wants:"
    systemctl show "$service" -p Wants --value
    echo ""
    echo "After:"
    systemctl show "$service" -p After --value
    echo ""
    echo "Before:"
    systemctl show "$service" -p Before --value
}
```

**Critical Services Management:**

```yaml
# SSH Service Configuration
sshd_service:
  config_file: /etc/ssh/sshd_config
  critical_settings:
    - PermitRootLogin no
    - PasswordAuthentication no  # if using keys
    - PubkeyAuthentication yes
    - Protocol 2  # deprecated since OpenSSH 7.4; omit on current releases
    - MaxAuthTries 3
    - ClientAliveInterval 300
    - ClientAliveCountMax 2
    - X11Forwarding no
    - AllowUsers specific_user
    - AllowTcpForwarding no
  management_commands:
    restart: systemctl restart sshd  # the unit is named "ssh" on Debian/Ubuntu
    test_config: sshd -t
    check_status: systemctl status sshd -l
    view_logs: journalctl -u sshd -f

# Web Server (Nginx)
nginx_service:
  config_file: /etc/nginx/nginx.conf
  sites_available: /etc/nginx/sites-available/
  sites_enabled: /etc/nginx/sites-enabled/
  management_commands:
    restart: systemctl restart nginx
    reload: systemctl reload nginx  # graceful, no downtime
    test_config: nginx -t
    check_status: systemctl status nginx -l
    view_logs: journalctl -u nginx -f

# Database (PostgreSQL)
postgresql_service:
  config_file: /etc/postgresql/*/main/postgresql.conf
  data_directory: /var/lib/postgresql/*/main/
  management_commands:
    restart: systemctl restart postgresql
    reload: systemctl reload postgresql
    check_status: systemctl status postgresql -l
    connect: sudo -u postgres psql
    backup: sudo -u postgres pg_dumpall > backup.sql
  performance_tuning:
    - shared_buffers: 25% of RAM
    - effective_cache_size: 50-75% of RAM
    - maintenance_work_mem: 10% of RAM
    - checkpoint_completion_target: 0.9
    - wal_buffers: 16MB
    - default_statistics_target: 100
```

### 3. User & Access Management

**User Lifecycle Management:**

```bash
#!/bin/bash
# User Management System

# Create User with Standard Configuration
create_user() {
    local username=$1
    local full_name=$2
    local email=$3
    local ssh_key=$4  # Optional: public SSH key
    local initial_password

    # Check if user exists
    if id "$username" &>/dev/null; then
        echo "Error: User $username already exists"
        return 1
    fi

    # Create user with home directory and bash shell
    useradd -m -s /bin/bash -c "$full_name" "$username"

    # Set initial password (user must change on first login)
    initial_password=$(openssl rand -base64 12)
    echo "$username:$initial_password" | chpasswd
    chage -d 0 "$username"  # Force password change

    # Add to standard groups
    # Caution: sudo and docker membership are both root-equivalent; adjust as needed
    usermod -aG docker,sudo "$username"

    # Setup SSH key if provided
    if [ -n "$ssh_key" ]; then
        mkdir -p "/home/$username/.ssh"
        echo "$ssh_key" > "/home/$username/.ssh/authorized_keys"
        chmod 700 "/home/$username/.ssh"
        chmod 600 "/home/$username/.ssh/authorized_keys"
        chown -R "$username:$username" "/home/$username/.ssh"
    fi

    echo "User $username created successfully"
    echo "Initial password: $initial_password (must change on first login)"
}

# Remove User with Cleanup
remove_user() {
    local username=$1
    local backup_home=$2  # true/false

    # Check if user exists
    if ! id "$username" &>/dev/null; then
        echo "Error: User $username does not exist"
        return 1
    fi

    # Terminate all processes owned by user (SIGTERM first, then SIGKILL)
    pkill -u "$username"
    sleep 2
    pkill -9 -u "$username"

    # Backup home directory if requested
    if [ "$backup_home" = "true" ]; then
        mkdir -p /backup/users
        tar -czf "/backup/users/${username}_$(date +%Y%m%d).tar.gz" "/home/$username"
        echo "Home directory backed up to /backup/users/"
    fi

    # Remove user
    userdel -r "$username"

    echo "User $username removed"
}

# Audit User Access
audit_users() {
    echo "=== User Access Audit ==="
    echo ""

    # List all users with a login shell
    echo "All System Users:"
    awk -F: '{print $1":"$3":"$7}' /etc/passwd | grep -v "nologin\|false"
    echo ""

    # Users with sudo access
    echo "Users with Sudo Access:"
    grep -E "^(sudo|admin):" /etc/group | cut -d: -f4
    echo ""

    # Recently active users (-t selects records newer than N days)
    echo "Recently Active Users (last 7 days):"
    lastlog -t 7 | grep -v "Never"
    echo ""

    # Users with SSH keys
    echo "Users with SSH Keys:"
    for home in /home/*; do
        user=$(basename "$home")
        if [ -f "$home/.ssh/authorized_keys" ]; then
            echo "$user: $(wc -l < "$home/.ssh/authorized_keys") keys"
        fi
    done
    echo ""

    # Failed login attempts, counted per target username
    echo "Failed Login Attempts (today):"
    grep "Failed password" /var/log/auth.log 2>/dev/null | grep "$(date '+%b %e')" | \
        awk '{print $(NF-5)}' | sort | uniq -c | sort -nr
}
```

**Access Control Policies:**

```yaml
# sudo Configuration
sudo_policy:
  config_file: /etc/sudoers
  validation_command: visudo -c

  user_specifications:
    - admin_user: "ALL=(ALL:ALL) ALL"
    - deploy_user: "ALL=(ALL) NOPASSWD: /usr/bin/git, /usr/bin/systemctl restart app.service"
    - backup_user: "ALL=(ALL) NOPASSWD: /usr/bin/rsync"

  # In /etc/sudoers itself, group entries take a % prefix (e.g. %sudo)
  groups:
    - sudo: "ALL=(ALL:ALL) ALL"
    - docker: "ALL=(ALL) NOPASSWD: /usr/bin/docker"
    - webadmin: "ALL=(ALL) /usr/sbin/nginx, /usr/bin/systemctl restart nginx"

# File Permissions Standards
permission_policy:
  home_directories: 0755
  private_files: 0600
  public_directories: 0755
  scripts: 0755
  config_files: 0644
  sensitive_configs: 0600  # SSH keys, API keys
  web_root: 0755
  web_files: 0644

  ownership_examples:
    - /var/www: www-data:www-data
    - /home/user/*: user:user
    - /etc/nginx/ssl: root:root
```

### 4. Storage & Filesystem Management

**Disk Management:**

```bash
#!/bin/bash
# Storage Management System

# Disk Usage Analysis
analyze_disk_usage() {
    echo "=== Disk Usage Analysis ==="
    echo ""

    # Overall disk usage
    echo "Filesystem Usage:"
    df -hT
    echo ""

    # Inode usage
    echo "Inode Usage:"
    df -i
    echo ""

    # Top disk consumers
    echo "Top 10 Largest Directories:"
    du -h --max-depth=2 / 2>/dev/null | sort -hr | head -10
    echo ""

    # Large files (>100MB)
    echo "Large Files (>100MB):"
    find / -type f -size +100M -exec ls -lh {} \; 2>/dev/null | awk '{print $5, $9}'
    echo ""

    # Old files (>90 days); expect long output on a full root filesystem
    echo "Files Older Than 90 Days:"
    find / -type f -mtime +90 -exec ls -lh {} \; 2>/dev/null | awk '{print $5, $6, $7, $9}'
}

# Automated Disk Cleanup
cleanup_disk() {
    local target_dir=$1
    local days_old=$2
    local dry_run=$3

    # Refuse to run without an existing target directory
    if [ -z "$target_dir" ] || [ ! -d "$target_dir" ]; then
        echo "Error: target directory '$target_dir' not found"
        return 1
    fi

    echo "Cleaning $target_dir (files older than $days_old days)"

    if [ "$dry_run" = "true" ]; then
        echo "DRY RUN - No files will be deleted"
        find "$target_dir" -type f -mtime +"$days_old" -exec ls -lh {} \;
    else
        find "$target_dir" -type f -mtime +"$days_old" -delete
        echo "Cleanup complete"
    fi
}

# Log Rotation Management
configure_logrotate() {
    local service=$1
    local config_file="/etc/logrotate.d/$service"

    cat > "$config_file" << EOF
/var/log/$service/*.log {
    daily
    rotate 14
    compress
    delaycompress
    missingok
    notifempty
    create 0640 www-data adm
    sharedscripts
    postrotate
        systemctl reload $service > /dev/null 2>&1 || true
    endscript
}
EOF

    echo "Logrotate configured for $service"
}

# LVM Management (when applicable)
manage_lvm() {
    local action=$1
    local vg_name=$2
    local lv_name=$3
    local size=$4

    case $action in
        extend)
            lvextend -L +"$size" "/dev/$vg_name/$lv_name"
            resize2fs "/dev/$vg_name/$lv_name"  # For ext4
            # xfs_growfs /dev/$vg_name/$lv_name  # For XFS
            echo "Logical volume extended by $size"
            ;;
        reduce)
            # WARNING: Reducing filesystems is risky; take a backup first.
            # --resizefs shrinks the filesystem before the volume. XFS cannot be shrunk.
            lvreduce --resizefs -L "$size" "/dev/$vg_name/$lv_name"
            echo "Logical volume reduced to $size"
            ;;
        snapshot)
            lvcreate -L "$size" -s -n "${lv_name}_snapshot" "/dev/$vg_name/$lv_name"
            echo "Snapshot created"
            ;;
        *)
            echo "Usage: manage_lvm {extend|reduce|snapshot} <vg_name> <lv_name> <size>"
            ;;
    esac
}
```

**Filesystem Operations:**

```yaml
# Mount Point Management
mount_configurations:
  nfs_mount:
    type: nfs
    options: defaults,noatime,nfsvers=4
    example: "192.168.1.100:/data /mnt/data nfs defaults,noatime,nfsvers=4 0 0"

  smb_mount:
    type: cifs
    options: credentials=/etc/smbcredentials,iocharset=utf8,uid=1000,gid=1000
    example: "//server/share /mnt/share cifs credentials=/etc/smbcredentials,iocharset=utf8 0 0"

  tmpfs:
    type: tmpfs
    options: size=2G,mode=1777
    example: "tmpfs /tmp tmpfs size=2G,mode=1777 0 0"

# Backup Strategy
backup_strategy:
  schedule: daily at 2 AM
  retention:
    daily: 7 days
    weekly: 4 weeks
    monthly: 3 months

  tools:
    - rsync: Incremental backups, file-level
    - tar: Full backups, compressed archives
    - borg: Deduplicated, encrypted backups
    - restic: Modern, efficient backups

  critical_paths:
    - /etc
    - /home
    - /var/www
    - /var/lib/mysql
    - /var/lib/postgresql
    - SSH keys
    - SSL certificates
```

### 5. Network Configuration

**Network Management:**

```bash
#!/bin/bash
# Network Configuration & Troubleshooting

# Network Interface Status
network_status() {
    echo "=== Network Interface Status ==="
    echo ""

    # Interface details
    echo "Active Interfaces:"
    ip -br addr show
    echo ""

    # Routing table
    echo "Routing Table:"
    ip route show
    echo ""

    # DNS configuration
    echo "DNS Configuration:"
    cat /etc/resolv.conf
    echo ""

    # Network statistics
    echo "Interface Statistics:"
    ip -s link show
    echo ""

    # Active connections
    echo "Active Network Connections:"
    ss -s
    echo ""

    # Listening ports
    echo "Listening Ports:"
    ss -tulnp
}

# Configure Static IP
configure_static_ip() {
    local interface=$1
    local ip_address=$2
    local netmask=$3  # CIDR prefix length (e.g. 24), not a dotted mask
    local gateway=$4
    local dns_server=$5

    # For Ubuntu/Debian (Netplan); compgen -G tolerates several matching files
    if compgen -G "/etc/netplan/*.yaml" > /dev/null; then
        cat > /etc/netplan/01-netcfg.yaml << EOF
network:
  version: 2
  renderer: networkd
  ethernets:
    $interface:
      dhcp4: no
      addresses:
        - $ip_address/$netmask
      routes:
        - to: default
          via: $gateway
      nameservers:
        addresses: [$dns_server]
EOF
        chmod 600 /etc/netplan/01-netcfg.yaml
        netplan apply
    # For RHEL/CentOS (NetworkManager); assumes the connection is named after the interface
    elif command -v nmcli &> /dev/null; then
        nmcli con mod "$interface" ipv4.addresses "$ip_address/$netmask"
        nmcli con mod "$interface" ipv4.gateway "$gateway"
        nmcli con mod "$interface" ipv4.dns "$dns_server"
        nmcli con mod "$interface" ipv4.method manual
        nmcli con up "$interface"
    fi

    echo "Static IP configured for $interface"
}

# Firewall Management
manage_firewall() {
    local action=$1
    shift
    local params=("$@")

    if command -v ufw &> /dev/null; then
        case $action in
            enable)  ufw enable ;;
            disable) ufw disable ;;
            allow)   ufw allow "${params[@]}" ;;
            deny)    ufw deny "${params[@]}" ;;
            status)  ufw status verbose ;;
        esac
    elif command -v firewall-cmd &> /dev/null; then
        # firewalld works with service names (e.g. ssh, http) rather than ufw-style rules
        case $action in
            enable)  systemctl enable --now firewalld ;;
            disable) systemctl disable --now firewalld ;;
            allow)
                firewall-cmd --permanent --add-service="${params[0]}"
                firewall-cmd --reload
                ;;
            deny)
                firewall-cmd --permanent --remove-service="${params[0]}"
                firewall-cmd --reload
                ;;
            status)  firewall-cmd --list-all ;;
        esac
    fi
}

# Network Performance Test
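# Hypothetical helper, not part of the original script: a minimal sketch for
# pulling the average round-trip time (in ms) out of the summary line that
# ping prints, e.g. "rtt min/avg/max/mdev = 0.045/0.061/0.080/0.012 ms".
# Assumes the iputils summary format (BSD ping labels the line "round-trip").
ping_avg_rtt() {
    # Split on "/": the fifth field is the avg value in both formats
    awk -F'/' '/^(rtt|round-trip) / {print $5}'
}
# Possible usage: ping -c 10 "$target" | ping_avg_rtt
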
network_performance() {
    local target=$1

    echo "Testing network performance to $target"
    echo ""

    # Ping test
    echo "Ping Test:"
    ping -c 10 "$target"
    echo ""

    # Traceroute
    echo "Traceroute:"
    traceroute -m 15 "$target"
    echo ""

    # Transfer test (if iperf3 available; needs an iperf3 server on the target)
    if command -v iperf3 &> /dev/null; then
        echo "Bandwidth Test:"
        iperf3 -c "$target" -t 10
    fi
}
```

### 6. Security Hardening

**CIS Benchmark Implementation:**

```bash
#!/bin/bash
# CIS Ubuntu 22.04 LTS Hardening Script
# Based on CIS Benchmark Version 2.0.0

cis_hardening_main() {
    echo "=== CIS Hardening Script ==="
    echo "Warning: This script modifies system configuration"
    echo ""

    # Section 1: Initial Setup
    section_1_initial_setup

    # Section 2: Services
    section_2_services

    # Section 3: Network Configuration
    section_3_network

    # Section 4: Logging and Auditing
    section_4_logging

    # Section 5: Access, Authentication and Authorization
    section_5_access

    echo "Hardening complete. Please review changes and reboot."
}

section_1_initial_setup() {
    echo "Section 1: Initial Setup"

    # 1.1.1 Disable unused filesystems
    # Note: snapd depends on squashfs; drop it from the list on hosts that use snaps
    echo "1.1.1: Disabling unused filesystems..."
    for fs in cramfs freevxfs jffs2 hfs hfsplus squashfs udf; do
        if ! grep -q "^install $fs /bin/true" /etc/modprobe.d/CIS.conf 2>/dev/null; then
            echo "install $fs /bin/true" >> /etc/modprobe.d/CIS.conf
        fi
    done

    # 1.1.2 Ensure /tmp is mounted
    echo "1.1.2: Ensuring /tmp is mounted..."
    if ! grep -q " /tmp " /etc/fstab; then
        echo "tmpfs /tmp tmpfs defaults,rw,nosuid,nodev,noexec,relatime 0 0" >> /etc/fstab
        mount /tmp
    fi

    # 1.3.1 Ensure AIDE is installed
    echo "1.3.1: Installing AIDE..."
    apt-get update -qq
    apt-get install -y aide aide-common
    aideinit  # Debian/Ubuntu wrapper that builds the initial database
    mv /var/lib/aide/aide.db.new /var/lib/aide/aide.db
}

section_2_services() {
    echo "Section 2: Services"

    # 2.1.1 Ensure time sync is configured
    echo "2.1.1: Configuring time sync..."
    apt-get install -y chrony
    systemctl enable chrony
    systemctl start chrony

    # 2.2.1.1 Ensure NTP Server is not enabled
    echo "2.2.1.1: Disabling NTP server..."
    sed -i 's/^port 123/#port 123/' /etc/chrony/chrony.conf
    systemctl restart chrony

    # 2.3 Ensure nonessential services are removed
    echo "2.3: Removing nonessential services..."
    apt-get purge -y telnetd rsh-server
}

section_3_network() {
    echo "Section 3: Network Configuration"

    # 3.1.1 Disable IPv4 forwarding
    echo "3.1.1: Disabling IPv4 forwarding..."
    sysctl -w net.ipv4.ip_forward=0
    echo "net.ipv4.ip_forward = 0" >> /etc/sysctl.conf

    # 3.1.2 Disable packet redirect sending
    echo "3.1.2: Disabling packet redirect sending..."
    sysctl -w net.ipv4.conf.all.send_redirects=0
    echo "net.ipv4.conf.all.send_redirects = 0" >> /etc/sysctl.conf

    # 3.2.1 Disable wireless interfaces
    # Only detects Atheros (ath*) kernel modules
    echo "3.2.1: Checking for wireless interfaces..."
    if lsmod | grep -q "^ath"; then
        echo "Wireless interface detected. Please consider removing."
    fi

    # 3.3.1 Disable IPv6
    echo "3.3.1: Disabling IPv6..."
    sysctl -w net.ipv6.conf.all.disable_ipv6=1
    echo "net.ipv6.conf.all.disable_ipv6 = 1" >> /etc/sysctl.conf

    # 3.4.1 Install TCP Wrappers
    echo "3.4.1: Installing TCP Wrappers..."
    apt-get install -y tcpd
}

section_4_logging() {
    echo "Section 4: Logging and Auditing"

    # 4.1.1.1 Ensure auditd is installed
    echo "4.1.1.1: Installing auditd..."
    apt-get install -y auditd audispd-plugins
    systemctl enable auditd
    systemctl start auditd

    # 4.1.1.2 Ensure auditd service is enabled
    echo "4.1.1.2: Enabling auditd service..."
    systemctl enable auditd

    # 4.2.1.1 Configure rsyslog
    echo "4.2.1.1: Configuring rsyslog..."
    apt-get install -y rsyslog
    systemctl enable rsyslog
    systemctl start rsyslog

    # 4.2.1.3 Ensure rsyslog default file permissions configured
    # Single quotes keep the shell from expanding $FileCreateMode as a variable
    echo "4.2.1.3: Configuring rsyslog permissions..."
    if ! grep -q '^\$FileCreateMode' /etc/rsyslog.conf; then
        echo '$FileCreateMode 0640' >> /etc/rsyslog.conf
    fi

    # 4.3 Ensure logrotate is configured
    echo "4.3: Configuring logrotate..."
    apt-get install -y logrotate
}

section_5_access() {
    echo "Section 5: Access, Authentication and Authorization"

    # 5.2.1 Ensure SSH Protocol is set to 2
    # (the Protocol keyword is deprecated since OpenSSH 7.4; current servers only speak protocol 2)
    echo "5.2.1: Setting SSH protocol to 2..."
    sed -i 's/^#*Protocol.*/Protocol 2/' /etc/ssh/sshd_config

    # 5.2.2 Ensure SSH LogLevel is set to INFO
    echo "5.2.2: Setting SSH log level..."
    sed -i 's/^#*LogLevel.*/LogLevel INFO/' /etc/ssh/sshd_config

    # 5.2.3 Ensure SSH X11 forwarding is disabled
    echo "5.2.3: Disabling X11 forwarding..."
    sed -i 's/^#*X11Forwarding.*/X11Forwarding no/' /etc/ssh/sshd_config

    # 5.2.4 Ensure SSH MaxAuthTries is set to 4 or less
    echo "5.2.4: Setting MaxAuthTries..."
    sed -i 's/^#*MaxAuthTries.*/MaxAuthTries 3/' /etc/ssh/sshd_config

    # 5.2.5 Ensure SSH IgnoreRhosts is enabled
    echo "5.2.5: Enabling IgnoreRhosts..."
    sed -i 's/^#*IgnoreRhosts.*/IgnoreRhosts yes/' /etc/ssh/sshd_config

    # 5.2.6 Ensure SSH HostbasedAuthentication is disabled
    echo "5.2.6: Disabling HostbasedAuthentication..."
    sed -i 's/^#*HostbasedAuthentication.*/HostbasedAuthentication no/' /etc/ssh/sshd_config

    # 5.2.7 Ensure SSH root login is disabled
    echo "5.2.7: Disabling root login..."
    sed -i 's/^#*PermitRootLogin.*/PermitRootLogin no/' /etc/ssh/sshd_config

    # 5.2.8 Ensure SSH PermitEmptyPasswords is disabled
    echo "5.2.8: Disabling empty passwords..."
    sed -i 's/^#*PermitEmptyPasswords.*/PermitEmptyPasswords no/' /etc/ssh/sshd_config

    # 5.2.9 Ensure SSH PermitUserEnvironment is disabled
    echo "5.2.9: Disabling user environment..."
    sed -i 's/^#*PermitUserEnvironment.*/PermitUserEnvironment no/' /etc/ssh/sshd_config

    # 5.2.10 Ensure SSH Ciphers are limited
    echo "5.2.10: Limiting SSH ciphers..."
    sed -i 's/^#*Ciphers.*/Ciphers aes256-gcm@openssh.com,chacha20-poly1305@openssh.com,aes256-ctr/' /etc/ssh/sshd_config

    # Validate the configuration, then restart SSH (the unit is "ssh" on Debian/Ubuntu)
    sshd -t && systemctl restart ssh

    # 5.3.1 Ensure password expiration is configured
    # login.defs normally ships a PASS_MAX_DAYS line, so rewrite it before falling back to append
    echo "5.3.1: Configuring password expiration..."
    sed -i 's/^PASS_MAX_DAYS.*/PASS_MAX_DAYS 90/' /etc/login.defs
    if ! grep -q "^PASS_MAX_DAYS" /etc/login.defs; then
        echo "PASS_MAX_DAYS 90" >> /etc/login.defs
    fi

    # 5.3.2 Ensure password expiration warning days is configured
    echo "5.3.2: Configuring password warning..."
    sed -i 's/^PASS_WARN_AGE.*/PASS_WARN_AGE 7/' /etc/login.defs
    if ! grep -q "^PASS_WARN_AGE" /etc/login.defs; then
        echo "PASS_WARN_AGE 7" >> /etc/login.defs
    fi

    # 5.4.1.1 Ensure PAM password complexity is configured
    # The module name sits mid-line in common-password, so match it there and replace its options
    echo "5.4.1.1: Installing password complexity tools..."
    apt-get install -y libpam-pwquality
    sed -i -E 's/(pam_pwquality\.so).*/\1 retry=3 minlen=14 difok=3 ucredit=-1 lcredit=-1 dcredit=-1 ocredit=-1/' /etc/pam.d/common-password
}
```

**Security Audit Checklist:**

```yaml
# Security Assessment Checklist
security_audit:
  authentication:
    - [ ] Strong password policy (min 14 chars, complexity)
    - [ ] Failed login lockout (3 attempts)
    - [ ] SSH key-only authentication
    - [ ] No root SSH login
    - [ ] Multi-factor authentication (if applicable)

  network_security:
    - [ ] Firewall configured and enabled
    - [ ] Only necessary ports open
    - [ ] Intrusion detection (Fail2ban, OSSEC)
    - [ ] Network encryption (TLS 1.3)
    - [ ] VPN for remote access

  system_hardening:
    - [ ] Unnecessary services disabled
    - [ ] Unused filesystems disabled
    - [ ] Security updates installed
    - [ ] Kernel parameters hardened
    - [ ] File permissions secured

  monitoring:
    - [ ] System logs centralized
    - [ ] Security audit trail enabled
    - [ ] File integrity monitoring (AIDE)
    - [ ] Real-time alerting configured
    - [ ] Regular security scans

  data_protection:
    - [ ] Encryption at rest (LUKS)
    - [ ] Encrypted backups
    - [ ] Secure key management
    - [ ] Data retention policy
    - [ ] Secure deletion procedures
```

### 7. Monitoring & Alerting

**Prometheus + Grafana Setup:**

```bash
#!/bin/bash
# Monitoring Stack Setup

install_prometheus() {
    echo "Installing Prometheus..."
    # Create prometheus user
    useradd --no-create-home --shell /bin/false prometheus

    # Create directories
    mkdir -p /etc/prometheus
    mkdir -p /var/lib/prometheus

    # Download Prometheus
    PROMETHEUS_VERSION="2.45.0"
    cd /tmp || return 1
    wget https://github.com/prometheus/prometheus/releases/download/v${PROMETHEUS_VERSION}/prometheus-${PROMETHEUS_VERSION}.linux-amd64.tar.gz
    tar xvf prometheus-${PROMETHEUS_VERSION}.linux-amd64.tar.gz
    cd prometheus-${PROMETHEUS_VERSION}.linux-amd64 || return 1

    # Copy binaries
    cp prometheus /usr/local/bin/
    cp promtool /usr/local/bin/

    # Copy config and the console files referenced by the unit below
    cp prometheus.yml /etc/prometheus/
    cp -r consoles console_libraries /etc/prometheus/

    # Set ownership
    chown -R prometheus:prometheus /etc/prometheus
    chown -R prometheus:prometheus /var/lib/prometheus
    chown prometheus:prometheus /usr/local/bin/prometheus
    chown prometheus:prometheus /usr/local/bin/promtool

    # Create systemd service
    cat > /etc/systemd/system/prometheus.service << EOF
[Unit]
Description=Prometheus
Wants=network-online.target
After=network-online.target

[Service]
User=prometheus
Group=prometheus
Type=simple
ExecStart=/usr/local/bin/prometheus \\
    --config.file=/etc/prometheus/prometheus.yml \\
    --storage.tsdb.path=/var/lib/prometheus/ \\
    --web.console.templates=/etc/prometheus/consoles \\
    --web.console.libraries=/etc/prometheus/console_libraries \\
    --web.listen-address=0.0.0.0:9090
Restart=always

[Install]
WantedBy=multi-user.target
EOF

    systemctl daemon-reload
    systemctl enable prometheus
    systemctl start prometheus

    echo "Prometheus installed and started on port 9090"
}

install_node_exporter() {
    echo "Installing Node Exporter..."
    # Create node_exporter user
    useradd --no-create-home --shell /bin/false node_exporter

    # Download Node Exporter
    NODE_EXPORTER_VERSION="1.6.1"
    wget https://github.com/prometheus/node_exporter/releases/download/v${NODE_EXPORTER_VERSION}/node_exporter-${NODE_EXPORTER_VERSION}.linux-amd64.tar.gz
    tar xvf node_exporter-${NODE_EXPORTER_VERSION}.linux-amd64.tar.gz
    cd node_exporter-${NODE_EXPORTER_VERSION}.linux-amd64

    # Copy binary
    cp node_exporter /usr/local/bin
    chown node_exporter:node_exporter /usr/local/bin/node_exporter

    # Create systemd service
    cat > /etc/systemd/system/node_exporter.service << EOF
[Unit]
Description=Node Exporter
Wants=network-online.target
After=network-online.target

[Service]
User=node_exporter
Group=node_exporter
Type=simple
ExecStart=/usr/local/bin/node_exporter
Restart=always

[Install]
WantedBy=multi-user.target
EOF

    systemctl daemon-reload
    systemctl enable node_exporter
    systemctl start node_exporter

    echo "Node Exporter installed and started on port 9100"
}

install_grafana() {
    echo "Installing Grafana..."
# Add Grafana repository wget -q -O - https://packages.grafana.com/gpg.key | apt-key add - echo "deb https://packages.grafana.com/oss/deb stable main" > /etc/apt/sources.list.d/grafana.list # Install Grafana apt-get update apt-get install -y grafana # Enable and start systemctl enable grafana-server systemctl start grafana-server echo "Grafana installed and started on port 3000" echo "Default credentials: admin/admin" } # Alert Management configure_alerts() { cat > /etc/prometheus/alerts.yml << EOF groups: - name: system_alerts interval: 30s rules: - alert: HighCPUUsage expr: 100 - (avg by(instance) (irate(node_cpu_seconds_total{mode="idle"}[5m])) * 100) > 80 for: 5m labels: severity: warning annotations: summary: "High CPU usage detected" description: "CPU usage is above 80% for 5 minutes on {{ $labels.instance }}" - alert: HighMemoryUsage expr: (1 - (node_memory_MemAvailable_bytes / node_memory_MemTotal_bytes)) * 100 > 85 for: 5m labels: severity: warning annotations: summary: "High memory usage detected" description: "Memory usage is above 85% for 5 minutes on {{ $labels.instance }}" - alert: DiskSpaceLow expr: (node_filesystem_avail_bytes{mountpoint="/"} / node_filesystem_size_bytes{mountpoint="/"}) * 100 < 15 for: 5m labels: severity: critical annotations: summary: "Low disk space" description: "Disk space is below 15% on {{ $labels.instance }}" - alert: ServiceDown expr: up == 0 for: 2m labels: severity: critical annotations: summary: "Service is down" description: "{{ $labels.job }} on {{ $labels.instance }} is down" EOF # Update Prometheus config to use alerts sed -i '/^scrape_config:/i alerting:\n alertmanagers:\n - static_configs:\n - targets:\n - localhost:9093\n\nrule_files:\n - "/etc/prometheus/alerts.yml"\n' /etc/prometheus/prometheus.yml systemctl restart prometheus } ``` **Monitoring Metrics Dashboard:** ```yaml # Essential Metrics to Monitor monitoring_metrics: system_metrics: - CPU usage (overall, per core) - Memory usage (used, cached, swap) - 
Disk usage (per mount point) - Disk I/O (read/write rates) - Network traffic (in/out) - System load (1, 5, 15 min) - File descriptors (used/limit) - Process count service_metrics: - Service status (up/down) - Request rate - Response time - Error rate - Queue depth - Connection count - Thread count application_metrics: - Application-specific KPIs - Transaction throughput - Business logic errors - User activity - Revenue/transaction metrics security_metrics: - Failed login attempts - Suspicious processes - File integrity changes - Unusual network traffic - Privilege escalation attempts - Failed sudo attempts ``` ### 8. Automation & Scripting **Ansible Playbook Examples:** ```yaml --- # server-hardening.yml - name: Harden Linux Server hosts: all become: yes vars: ssh_port: 22 allowed_users: "admin,deploy" firewall_rules: - { port: 22, proto: tcp } - { port: 80, proto: tcp } - { port: 443, proto: tcp } tasks: - name: Update all packages apt: update_cache: yes upgrade: dist cache_valid_time: 3600 - name: Install security packages apt: name: - fail2ban - ufw - aide - rkhunter state: present - name: Configure SSH lineinfile: path: /etc/ssh/sshd_config regexp: "{{ item.regexp }}" line: "{{ item.line }}" state: present loop: - { regexp: '^#?PermitRootLogin', line: 'PermitRootLogin no' } - { regexp: '^#?PasswordAuthentication', line: 'PasswordAuthentication no' } - { regexp: '^#?Port', line: 'Port {{ ssh_port }}' } - { regexp: '^#?MaxAuthTries', line: 'MaxAuthTries 3' } notify: restart sshd - name: Configure firewall ufw: rule: allow port: "{{ item.port }}" proto: "{{ item.proto }}" loop: "{{ firewall_rules }}" - name: Enable firewall ufw: state: enabled policy: deny - name: Configure fail2ban copy: dest: /etc/fail2ban/jail.local content: | [DEFAULT] bantime = 3600 findtime = 600 maxretry = 3 [sshd] enabled = true port = {{ ssh_port }} maxretry = 3 notify: restart fail2ban - name: Setup automatic updates apt: name: unattended-upgrades state: present - name: Configure 
automatic updates copy: dest: /etc/apt/apt.conf.d/50unattended-upgrades content: | Unattended-Upgrade::Allowed-Origins { "${distro_id}:${distro_codename}"; "${distro_id}:${distro_codename}-security"; }; Unattended-Upgrade::AutoFixInterruptedDpkg "true"; Unattended-Upgrade::Remove-Unused-Dependencies "true"; Unattended-Upgrade::Automatic-Reboot "false"; - name: Install monitoring agent apt: name: prometheus-node-exporter state: present - name: Enable monitoring service systemd: name: prometheus-node-exporter enabled: yes state: started handlers: - name: restart sshd systemd: name: sshd state: restarted - name: restart fail2ban systemd: name: fail2ban state: restarted ``` ### 9. Container & Virtualization **Docker Management:** ```bash #!/bin/bash # Docker Container Management # Docker Security Hardening secure_docker() { echo "Securing Docker installation..." # Create daemon configuration cat > /etc/docker/daemon.json << EOF { "icc": false, "log-driver": "json-file", "log-opts": { "max-size": "10m", "max-file": "3" }, "live-restore": true, "userland-proxy": false, "no-new-privileges": true, "default-ulimits": { "nofile": { "Name": "nofile", "Hard": 64000, "Soft": 64000 } } } EOF # Restart Docker systemctl restart docker echo "Docker security configuration applied" } # Container Resource Limits manage_container_resources() { local container_name=$1 local memory_limit=$2 local cpu_limit=$3 docker update \ --memory="${memory_limit}" \ --cpus="${cpu_limit}" \ "${container_name}" echo "Container $container_name resource limits updated" } # Container Monitoring monitor_containers() { echo "=== Container Status ===" docker ps -a --format "table {{.Names}}\t{{.Status}}\t{{.Ports}}" echo "" echo "=== Container Resource Usage ===" docker stats --no-stream --format "table {{.Name}}\t{{.CPUPerc}}\t{{.MemUsage}}\t{{.NetIO}}" echo "" echo "=== Container Health ===" docker ps --format "{{.Names}}: {{.Health}}" | grep -v "empty" } # Container Backup backup_container() { local 
container_name=$1 local backup_dir=$2 # Commit container docker commit "$container_name" "${container_name}_backup_$(date +%Y%m%d)" # Export container docker export "$container_name" > "${backup_dir}/${container_name}_$(date +%Y%m%d).tar" # Backup volumes docker run --rm \ --volumes-from "$container_name" \ -v "${backup_dir}:/backup" \ alpine tar czf "/backup/${container_name}_volumes_$(date +%Y%m%d).tar.gz" /data echo "Container $container_name backed up to $backup_dir" } ``` **Kubernetes Management:** ```bash #!/bin/bash # Kubernetes Cluster Management # Pod Troubleshooting troubleshoot_pod() { local namespace=$1 local pod_name=$2 echo "=== Pod Events ===" kubectl describe pod "$pod_name" -n "$namespace" | grep -A 20 Events echo "" echo "=== Pod Logs ===" kubectl logs "$pod_name" -n "$namespace" --tail=50 echo "" echo "=== Pod Status ===" kubectl get pod "$pod_name" -n "$namespace" -o wide } # Resource Management check_resource_usage() { echo "=== Node Resource Usage ===" kubectl top nodes echo "" echo "=== Pod Resource Usage ===" kubectl top pods --all-namespaces echo "" echo "=== Resource Quotas ===" kubectl get resourcequotas --all-namespaces } # Deployment Rollback rollback_deployment() { local namespace=$1 local deployment=$2 # View revision history echo "Deployment History:" kubectl rollout history deployment "$deployment" -n "$namespace" # Rollback to previous version kubectl rollout undo deployment "$deployment" -n "$namespace" echo "Deployment $deployment rolled back" } ``` --- ## Diagnostic Checklist ### System Health Assessment ```markdown # Daily Health Checklist ## CPU & Performance - [ ] Load average acceptable (< number of cores) - [ ] No runaway processes - [ ] CPU temperature normal (if sensors available) ## Memory - [ ] Free memory adequate (> 20%) - [ ] Swap usage minimal (< 50%) - [ ] No memory leaks in critical applications ## Disk & Storage - [ ] Disk space adequate (> 20% free) - [ ] No I/O bottlenecks - [ ] Backup jobs completed 
successfully
- [ ] Log rotation working

## Network
- [ ] Network connectivity stable
- [ ] Latency acceptable
- [ ] No unusual traffic patterns
- [ ] DNS resolution working

## Services
- [ ] All critical services running
- [ ] No failed services
- [ ] Web servers responding
- [ ] Databases accessible
- [ ] Monitoring agents running

## Security
- [ ] No unusual spike in failed login attempts
- [ ] No security alerts
- [ ] Firewall rules intact
- [ ] No unauthorized users
- [ ] No suspicious processes

## Backups
- [ ] Last backup successful
- [ ] Backup size reasonable
- [ ] Can restore from backup
```

---

## Common Issues Database

### Issue Categories

```yaml
# 1. Performance Issues
performance_issues:
  high_cpu:
    symptoms:
      - Load average > CPU count
      - Slow application response
    causes:
      - Runaway process (infinite loop, stuck job)
      - Insufficient resources for workload
      - Cryptomining malware
    diagnostics:
      - top, htop (identify process)
      - ps aux --sort=-%cpu (top consumers)
      - vmstat 1 5 (CPU statistics)
    solutions:
      - Kill or nice problematic processes
      - Scale up resources
      - Optimize application code
      - Remove malware

  high_memory:
    symptoms:
      - High swap usage
      - OOM killer activated
      - Slow system performance
    causes:
      - Memory leak
      - Insufficient RAM for workload
      - Large cache/buffer
    diagnostics:
      - free -m (memory overview)
      - ps aux --sort=-%mem (memory consumers)
      - slabtop (kernel memory)
    solutions:
      - Restart leaking services
      - Add more RAM
      - Tune kernel parameters (vm.swappiness)
      - "Clear caches: sync; echo 3 > /proc/sys/vm/drop_caches"

  disk_io_bottleneck:
    symptoms:
      - High iowait in top
      - Slow file operations
      - Application timeouts
    causes:
      - Insufficient IOPS
      - Failing disk
      - Heavy sequential reads/writes
    diagnostics:
      - iostat -x 1 5 (I/O stats)
      - iotop (I/O by process)
      - smartctl (disk health)
    solutions:
      - Upgrade to SSD
      - Optimize database queries
      - Distribute I/O across disks
      - Replace failing disk

# 2.
# Network Issues
network_issues:
  connectivity_loss:
    symptoms:
      - Cannot ping external hosts
      - Services unreachable
    causes:
      - Network interface down
      - Incorrect routing
      - Firewall blocking
      - DNS failure
    diagnostics:
      - ip addr show (interface status)
      - ip route show (routing table)
      - ping 8.8.8.8 (basic connectivity)
      - nslookup google.com (DNS)
      - iptables -L -n (firewall rules)
    solutions:
      - "Bring up interface: ip link set eth0 up"
      - "Fix routing: ip route add default via ..."
      - Update firewall rules
      - "Fix DNS: update /etc/resolv.conf"

  slow_network:
    symptoms:
      - High latency
      - Slow transfers
    causes:
      - Bandwidth saturation
      - Network congestion
      - Poor routing
      - Duplex mismatch
    diagnostics:
      - ping -c 100 (latency)
      - iperf3 (bandwidth test)
      - mtr (route analysis)
      - ethtool (interface stats)
    solutions:
      - Upgrade bandwidth
      - Implement QoS
      - Fix duplex settings
      - Optimize routing

# 3. Service Failures
service_failures:
  web_server_down:
    symptoms:
      - Cannot access website
      - Connection refused
    causes:
      - Service not running
      - Configuration error
      - Port conflict
      - Resource exhaustion
    diagnostics:
      - systemctl status nginx
      - journalctl -u nginx -n 50
      - "ss -tulnp | grep :80"
      - nginx -t (config test)
    solutions:
      - "Start service: systemctl start nginx"
      - Fix configuration
      - Resolve port conflicts
      - Free up resources

  database_down:
    symptoms:
      - Application database errors
      - Connection refused
    causes:
      - Service not running
      - Disk full
      - Corrupted data
      - Max connections reached
    diagnostics:
      - systemctl status postgresql
      - tail /var/log/postgresql/postgresql-*.log (file name varies by version)
      - df -h (disk space)
      - psql -l (list databases)
    solutions:
      - Start service
      - Free disk space
      - Repair database
      - Increase max_connections

# 4.
# Security Incidents
security_incidents:
  compromised_account:
    symptoms:
      - Unauthorized logins
      - Suspicious activity
    causes:
      - Weak password
      - Stolen credentials
      - Brute force attack
    diagnostics:
      - grep "Accepted" /var/log/auth.log
      - last (login history)
      - w (current users)
    solutions:
      - Change password
      - Revoke SSH keys
      - Block attacker IP
      - Enable 2FA

  malware_detected:
    symptoms:
      - High CPU usage (mining)
      - Suspicious processes
      - Outbound connections to unknown IPs
    causes:
      - Compromised credentials
      - Vulnerable service
      - Malicious upload
    diagnostics:
      - ps aux (suspicious processes)
      - ss -tulnp (unexpected listening ports)
      - netstat -antp (outbound connections)
    solutions:
      - Isolate system
      - Kill malicious processes
      - Scan for malware (ClamAV, rkhunter)
      - Rebuild system
```

---

## Output Formats

### Diagnostic Report Format

```markdown
# System Diagnostic Report

**Server**: hostname.example.com
**Date**: 2024-01-15 14:30:00 UTC
**Kernel**: Linux 5.15.0-76-generic
**Uptime**: 45 days, 3 hours, 12 minutes

## Executive Summary
- Overall Status: ⚠️ WARNING
- Critical Issues: 1
- Warnings: 3
- Recommendations: 4

## Detailed Findings

### Critical Issues
1. **Disk Space Critical**
   - Severity: CRITICAL
   - Status: Root filesystem at 92% capacity
   - Impact: Risk of system crash
   - Action Required: Immediate cleanup required
   - Recommendation:
     - Remove old log files: find /var/log -name "*.log" -mtime +30 -delete
     - Clear package cache: apt-get clean
     - Expand disk size or add storage

### Warnings
1. **High Memory Usage**
   - Severity: WARNING
   - Status: Memory usage at 87%
   - Impact: Performance degradation
   - Action: Monitor and optimize

2. **Failed Login Attempts**
   - Severity: WARNING
   - Status: 342 failed attempts in last 24h
   - Impact: Possible brute force attack
   - Action: Review and block IPs

3.
**Service Auto-Restart**
   - Severity: WARNING
   - Status: nginx restarted 3 times in last hour
   - Impact: Service instability
   - Action: Investigate logs

## Performance Metrics
- CPU Load: 2.45 (4 cores available)
- Memory: 87% used (13.9GB/16GB)
- Disk I/O: 15% utilization
- Network: 45 Mbps in, 120 Mbps out

## Services Status
- nginx: ✅ Running
- postgresql: ✅ Running
- redis: ⚠️ High memory
- docker: ✅ Running

## Security Summary
- Failed logins: 342 (last 24h)
- Active SSH sessions: 3
- Last security update: 2024-01-10
- Firewall status: ✅ Active

## Recommendations
1. **Immediate**: Clean up disk space (< 24 hours)
2. **Short-term**: Investigate nginx restarts (< 48 hours)
3. **Medium-term**: Review failed login attempts, implement fail2ban
4. **Long-term**: Plan disk expansion or migration to larger storage

## Next Steps
1. Execute disk cleanup
2. Monitor system stability
3. Review and harden SSH access
4. Schedule maintenance window for disk expansion
```

---

## Example Workflows

### Example 1: Server Performance Investigation

**User Request:**
"Our web server is running very slow. Pages take 10+ seconds to load. Please investigate and fix."

**Agent Response:**

```markdown
# Server Performance Investigation

## Initial Assessment

Running comprehensive diagnostics...

\`\`\`bash
# System overview
top -bn1 | head -20
free -h
df -h
iostat -x 1 3
\`\`\`

## Findings

### Critical Issue Identified: Database Lock Contention

**Symptoms:**
- CPU: Normal (35% usage)
- Memory: Normal (65% usage)
- Disk I/O: Extremely high (95% utilization)
- Database connections: 198/200 (max)

**Root Cause:**
PostgreSQL is experiencing lock contention on the `orders` table. Multiple queries are waiting on locks:

\`\`\`
SELECT * FROM pg_stat_activity WHERE wait_event_type = 'Lock';
\`\`\`

Top blocking query:
\`\`\`
UPDATE orders SET status = 'processing' WHERE id IN (...);
-- Duration: 45 seconds
-- Blocking: 134 other queries
\`\`\`

## Immediate Actions

1.
**Identify blocking query:**
\`\`\`bash
# List waiting sessions together with the PIDs that block them
sudo -u postgres psql -c "SELECT pid, pg_blocking_pids(pid) AS blocked_by, state, query FROM pg_stat_activity WHERE wait_event_type = 'Lock';"
\`\`\`

2. **Terminate blocking session:**
\`\`\`bash
# 12345 is a placeholder; use a PID from the blocked_by column above
sudo -u postgres psql -c "SELECT pg_terminate_backend(12345);"
\`\`\`

3. **Verify resolution:**
\`\`\`bash
# Confirm that no sessions are still waiting on locks
sudo -u postgres psql -c "SELECT count(*) FROM pg_stat_activity WHERE wait_event_type = 'Lock';"
\`\`\`

## Long-Term Solutions

### 1. Optimize Database Queries

\`\`\`sql
-- Before: row locks are held until commit, so concurrent
-- updates of the same rows queue behind this statement
UPDATE orders SET status = 'processing' WHERE id IN (...);

-- After: lock only rows that are currently free and skip the rest
UPDATE orders SET status = 'processing'
WHERE id IN (
    SELECT id FROM orders
    WHERE id IN (...)
    FOR UPDATE SKIP LOCKED
);
\`\`\`

### 2. Add Connection Pooling

\`\`\`bash
# Install PgBouncer
apt-get install pgbouncer

# Configure pool mode
# (this overwrites the packaged config; also set listen_addr,
# auth_type, and auth_file for your environment)
cat > /etc/pgbouncer/pgbouncer.ini << EOF
[databases]
app_db = host=localhost port=5432 dbname=app

[pgbouncer]
pool_mode = transaction
max_client_conn = 500
default_pool_size = 50
EOF

systemctl enable pgbouncer
systemctl start pgbouncer
\`\`\`

### 3. Implement Query Optimization

\`\`\`sql
-- Add indexes for common queries
CREATE INDEX idx_orders_status ON orders(status);
CREATE INDEX idx_orders_created ON orders(created_at);

-- Analyze query performance
EXPLAIN ANALYZE SELECT * FROM orders WHERE status = 'pending';
\`\`\`

## Monitoring Setup

Configure alerts for future issues:

\`\`\`yaml
# prometheus alerts
# Note: pg_stat_database_conflicts counts queries cancelled by recovery
# conflicts; substitute the lock-wait metric your exporter exposes.
- alert: HighDatabaseLockContention
  expr: pg_stat_database_conflicts{datname="app_db"} > 10
  for: 5m
  annotations:
    summary: "Database lock contention detected"
\`\`\`

## Results

**After Fix:**
- Page load time: 10s → 0.8s
- Database connections: 198 → 45
- Disk I/O: 95% → 25%
- No blocking queries

**Prevention:**
- PgBouncer pooling implemented
- Query optimization complete
- Monitoring alerts configured
- Weekly query performance reviews scheduled
```

### Example 2: Security Hardening

**User Request:**
"We need to harden our Ubuntu servers according to CIS benchmarks.
Please implement all critical security controls."

**Agent Response:**

```markdown
# CIS Benchmark Hardening Implementation

## Pre-Assessment

Running security scan...

\`\`\`bash
# Install and run Lynis
apt-get install lynis
lynis audit system
\`\`\`

**Baseline Score: 58/100**

## Implementation Plan

### Phase 1: Critical Controls (Immediate)

#### 1. SSH Hardening

\`\`\`bash
# Backup config
cp /etc/ssh/sshd_config /etc/ssh/sshd_config.bak

# Apply hardening (requires the default "Include /etc/ssh/sshd_config.d/*.conf"
# line in sshd_config; replace the AllowUsers names with your own accounts)
cat > /etc/ssh/sshd_config.d/hardening.conf << EOF
# SSH Hardening Configuration
Port 22
PermitRootLogin no
PasswordAuthentication no
PubkeyAuthentication yes
MaxAuthTries 3
ClientAliveInterval 300
ClientAliveCountMax 2
X11Forwarding no
AllowTcpForwarding no
AllowUsers admin deploy
LogLevel VERBOSE
Ciphers aes256-gcm@openssh.com,chacha20-poly1305@openssh.com,aes256-ctr
KexAlgorithms curve25519-sha256@libssh.org,diffie-hellman-group-exchange-sha256
MACs hmac-sha2-256-etm@openssh.com,hmac-sha2-512-etm@openssh.com
EOF

# Test configuration and restart SSH only if it is valid
# (the unit is named "ssh" on Ubuntu)
sshd -t && systemctl restart ssh
\`\`\`

#### 2. Firewall Configuration

\`\`\`bash
# Install and configure UFW
apt-get install ufw

# Default policies
ufw default deny incoming
ufw default allow outgoing

# Allow necessary services
ufw allow 22/tcp comment 'SSH'
ufw allow 80/tcp comment 'HTTP'
ufw allow 443/tcp comment 'HTTPS'

# Enable firewall (--force skips the interactive confirmation)
ufw --force enable
ufw status verbose
\`\`\`

#### 3. Fail2Ban Installation

\`\`\`bash
apt-get install fail2ban

# Configure fail2ban
cat > /etc/fail2ban/jail.local << EOF
[DEFAULT]
bantime = 3600
findtime = 600
maxretry = 3
destemail = security@example.com
sender = fail2ban@example.com
mta = sendmail

[sshd]
enabled = true
port = 22
maxretry = 3
bantime = 3600

[nginx-http-auth]
enabled = true
port = http,https
logpath = /var/log/nginx/error.log
EOF

systemctl enable fail2ban
systemctl start fail2ban
\`\`\`

#### 4.
System Updates & Auto-Updates

\`\`\`bash
# Update system
apt-get update && apt-get upgrade -y

# Install unattended-upgrades
apt-get install unattended-upgrades apt-listchanges

# Configure automatic updates
# (quoted 'EOF' keeps the shell from expanding ${distro_id} and ${distro_codename})
cat > /etc/apt/apt.conf.d/50unattended-upgrades << 'EOF'
Unattended-Upgrade::Allowed-Origins {
    "${distro_id}:${distro_codename}";
    "${distro_id}:${distro_codename}-security";
};
Unattended-Upgrade::AutoFixInterruptedDpkg "true";
Unattended-Upgrade::Remove-Unused-Dependencies "true";
Unattended-Upgrade::Automatic-Reboot "false";
Unattended-Upgrade::MinimalSteps "true";
EOF

# Enable automatic updates
dpkg-reconfigure -plow unattended-upgrades
\`\`\`

#### 5. File Integrity Monitoring

\`\`\`bash
# Install AIDE
apt-get install aide

# Initialize AIDE (aideinit is the Debian/Ubuntu wrapper around "aide --init")
aideinit
mv /var/lib/aide/aide.db.new /var/lib/aide/aide.db

# Schedule daily checks
cat > /etc/cron.daily/aide << 'EOF'
#!/bin/bash
/usr/bin/aide --config /etc/aide/aide.conf --check
EOF

chmod +x /etc/cron.daily/aide
\`\`\`

### Phase 2: Enhanced Controls (Within 1 week)

#### 6. Kernel Hardening

\`\`\`bash
cat > /etc/sysctl.d/99-security.conf << EOF
# Network Security
net.ipv4.ip_forward = 0
net.ipv4.conf.all.send_redirects = 0
net.ipv4.conf.default.send_redirects = 0
net.ipv4.conf.all.accept_source_route = 0
net.ipv4.conf.default.accept_source_route = 0
net.ipv4.conf.all.accept_redirects = 0
net.ipv4.conf.default.accept_redirects = 0
net.ipv4.icmp_echo_ignore_broadcasts = 1
net.ipv4.icmp_ignore_bogus_error_responses = 1
net.ipv4.conf.all.rp_filter = 1
net.ipv4.conf.default.rp_filter = 1

# Kernel Hardening (randomize_va_space = 2 enables full ASLR)
kernel.randomize_va_space = 2
kernel.kptr_restrict = 2
kernel.dmesg_restrict = 1
kernel.perf_event_paranoid = 2
EOF

# Apply settings
sysctl -p /etc/sysctl.d/99-security.conf
\`\`\`

#### 7.
Audit Logging

\`\`\`bash
# Install auditd
apt-get install auditd audispd-plugins

# Configure audit rules
cat > /etc/audit/rules.d/cis.rules << EOF
# System logs
-w /etc/hosts -p wa -k hosts
-w /etc/passwd -p wa -k identity
-w /etc/group -p wa -k identity
-w /etc/shadow -p wa -k identity
-w /etc/sudoers -p wa -k sudoers

# Administrative commands
-a always,exit -F arch=b64 -S chmod -S chown -F auid>=1000 -F auid!=4294967295 -k perm_mod
-a always,exit -F arch=b32 -S chmod -S chown -F auid>=1000 -F auid!=4294967295 -k perm_mod

# File access
-a always,exit -F dir=/etc -F perm=wa -k admin_actions
-a always,exit -F dir=/var/log -F perm=wa -k admin_actions

# Sudo usage
-w /bin/sudo -p x -k sudo
-w /usr/bin/sudo -p x -k sudo
EOF

systemctl enable auditd
systemctl start auditd

# Load the new rules into the running audit daemon
augenrules --load
\`\`\`

#### 8. Password Policy

\`\`\`bash
# Install password quality checking
apt-get install libpam-pwquality

# Configure password policy
cat > /etc/security/pwquality.conf << EOF
minlen = 14
minclass = 3
maxrepeat = 3
dcredit = -1
ucredit = -1
lcredit = -1
ocredit = -1
EOF

# Configure login.defs
sed -i 's/^PASS_MAX_DAYS.*/PASS_MAX_DAYS 90/' /etc/login.defs
sed -i 's/^PASS_MIN_DAYS.*/PASS_MIN_DAYS 1/' /etc/login.defs
sed -i 's/^PASS_WARN_AGE.*/PASS_WARN_AGE 7/' /etc/login.defs
\`\`\`

### Phase 3: Monitoring & Alerting (Within 2 weeks)

#### 9. Centralized Logging

\`\`\`bash
# Install rsyslog
apt-get install rsyslog

# Configure remote logging (@@ forwards over TCP)
cat >> /etc/rsyslog.conf << EOF
# Send to central log server
*.* @@logserver.example.com:514
EOF

systemctl restart rsyslog
\`\`\`

#### 10. Security Monitoring

\`\`\`bash
# Install monitoring agents
apt-get install prometheus-node-exporter

# Enable service
systemctl enable prometheus-node-exporter
systemctl start prometheus-node-exporter
\`\`\`

## Post-Hardening Assessment

Running security scan...
\`\`\`bash
lynis audit system
\`\`\`

**Final Score: 92/100** ⬆️ from 58/100

## Summary

### Implemented Controls
- ✅ SSH hardening (key-only, limited authentication attempts)
- ✅ Firewall with strict rules
- ✅ Intrusion prevention (Fail2Ban)
- ✅ Automatic security updates
- ✅ File integrity monitoring (AIDE)
- ✅ Kernel hardening
- ✅ Comprehensive audit logging
- ✅ Strong password policy
- ✅ Centralized logging
- ✅ Security monitoring

### Security Posture
- **Before**: Vulnerable to multiple attack vectors
- **After**: Critical CIS controls applied, with layered defense-in-depth

### Ongoing Maintenance
- Daily: Review fail2ban logs
- Weekly: Review AIDE reports
- Monthly: Update security scan
- Quarterly: Review and update policies

### Documentation
All configurations documented in `/etc/security/hardening-report.md`

Training materials provided to team for ongoing security practices.
```

---

## Quality Standards

### Administrative Best Practices

```markdown
## Change Management
- [ ] Document all changes
- [ ] Test in staging first
- [ ] Maintain change log
- [ ] Rollback plan for all changes

## Documentation
- [ ] Network diagram updated
- [ ] Service dependencies documented
- [ ] Runbooks for critical services
- [ ] Escalation procedures documented

## Backup Verification
- [ ] Automated daily backups
- [ ] Monthly restore testing
- [ ] Off-site backup copies
- [ ] Backup documentation current

## Security Compliance
- [ ] Regular security scans
- [ ] Vulnerability assessments
- [ ] Access reviews quarterly
- [ ] Incident response plan tested
```

---

## Conclusion

The Linux Server Admin Agent provides comprehensive system administration capabilities, from routine maintenance to complex troubleshooting and security hardening. By following this specification, the agent delivers:

1. **Systematic Diagnostics**: Comprehensive health assessments and troubleshooting
2. **Service Management**: Complete service lifecycle management
3. **Security Hardening**: CIS benchmark compliance implementation
4.
**Monitoring Setup**: Production-grade monitoring and alerting
5. **Automation**: Ansible playbooks and bash scripts for efficiency
6. **Container Management**: Docker and Kubernetes administration
7. **Issue Resolution**: Proactive problem identification and resolution

This agent specification ensures reliable, secure, and efficient Linux server administration across diverse environments and use cases.
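
To make the "Systematic Diagnostics" point concrete, the numeric thresholds in the Daily Health Checklist (load average below the core count, more than 20% memory and disk free, swap usage under 50%) could be checked by a small script. The following is a minimal sketch rather than part of the specification: the function names `check_threshold` and `run_health_checks` and the PASS/FAIL output format are assumptions chosen for illustration, and `MemAvailable` is used as the "free memory" figure.

```bash
#!/usr/bin/env bash
# Hypothetical helper that automates a few Daily Health Checklist thresholds.
# check_threshold and run_health_checks are illustrative names, not part of
# any existing tool.

# Compare VALUE against LIMIT with OP ("lt" or "gt") and print PASS or FAIL.
# awk performs the comparison so fractional load averages work.
check_threshold() {
    local label=$1 value=$2 op=$3 limit=$4
    if awk -v v="$value" -v l="$limit" -v op="$op" 'BEGIN {
            if (op == "lt") exit !(v < l)
            if (op == "gt") exit !(v > l)
            exit 1   # unknown operator counts as a failure
        }'; then
        echo "PASS ${label}: ${value} (${op} ${limit})"
    else
        echo "FAIL ${label}: ${value} (want ${op} ${limit})"
        return 1
    fi
}

# Collect live values from /proc and df, then apply the checklist thresholds.
run_health_checks() {
    local cores load1 mem_avail_pct swap_used_pct root_free_pct failures=0

    cores=$(nproc)
    load1=$(cut -d' ' -f1 /proc/loadavg)
    mem_avail_pct=$(awk '/^MemTotal:/ {t=$2} /^MemAvailable:/ {a=$2}
        END {printf "%d", a * 100 / t}' /proc/meminfo)
    swap_used_pct=$(awk '/^SwapTotal:/ {t=$2} /^SwapFree:/ {f=$2}
        END {printf "%d", (t > 0) ? (t - f) * 100 / t : 0}' /proc/meminfo)
    root_free_pct=$(df -P / | awk 'NR==2 {sub(/%/, "", $5); print 100 - $5}')

    check_threshold "load average (1m)" "$load1" lt "$cores" || failures=$((failures + 1))
    check_threshold "memory available %" "$mem_avail_pct" gt 20 || failures=$((failures + 1))
    check_threshold "swap used %" "$swap_used_pct" lt 50 || failures=$((failures + 1))
    check_threshold "root disk free %" "$root_free_pct" gt 20 || failures=$((failures + 1))

    # Succeed only if every check passed
    [ "$failures" -eq 0 ]
}
```

Calling `run_health_checks` prints one PASS or FAIL line per check and returns non-zero if any check fails, so it can be wired into cron or a monitoring wrapper. Keeping the comparison in `check_threshold` separate from data collection means the thresholds can be exercised with fixed values before the script is pointed at a live host.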