Linux Refresher
Comprehensive quick-reference for Linux — filesystem, shell scripting, process management, networking, systemd, SSH, and production debugging
1. Setup & Environment
Linux Practice on macOS
macOS is POSIX-compliant but differs from Linux in important ways. For accurate Linux practice, run a real Linux environment locally.
Option A: Docker (Fastest)
# Interactive Ubuntu shell — disposable, no setup
docker run --rm -it ubuntu:24.04 bash
# With your current directory mounted
docker run --rm -it -v "$(pwd)":/work -w /work ubuntu:24.04 bash
# Persistent container you can stop/start
docker run -it --name linux-lab ubuntu:24.04 bash
docker start -ai linux-lab
# Debian base image; install the common tools it ships without
docker run --rm -it debian:bookworm bash
apt-get update && apt-get install -y procps iproute2 net-tools curl vim
Option B: Multipass (Ubuntu VMs, native performance)
# Install
brew install multipass
# Launch Ubuntu 24.04 LTS VM (2 CPU, 4GB RAM, 20GB disk)
multipass launch --name lab --cpus 2 --memory 4G --disk 20G 24.04
# Shell into it
multipass shell lab
# Mount local directory
multipass mount ~/projects lab:/projects
# List / stop / delete
multipass list
multipass stop lab
multipass delete lab && multipass purge
When to use each
Docker is fine for filesystem, text processing, and scripting practice. Use Multipass when you need a full init system (systemd), real network interfaces, or kernel-level features like perf and iptables — Docker containers share the host kernel and lack systemd by default.
macOS vs Linux Differences That Bite
| Area | macOS (BSD-based) | Linux (GNU) |
|---|---|---|
| sed -i | Requires empty string: sed -i '' 's/a/b/' f | sed -i 's/a/b/' f |
| date | BSD date, no -d flag | GNU date: date -d '2 days ago' |
| ls --color | Not supported (use -G) | ls --color=auto |
| grep -P | No Perl regex by default | Supported natively |
| readlink -f | Not available (use realpath) | Supported |
| xargs -r | Not supported | Skips exec if stdin is empty |
| /proc | Does not exist | Virtual filesystem exposing kernel state |
| Package manager | Homebrew (3rd party) | apt / dnf / pacman |
| Default shell | zsh (since Catalina) | bash (most distros) |
Install GNU tools on macOS
brew install coreutils findutils gnu-sed gawk grep — then use gdate, gsed, ggrep, etc., or prepend the gnubin paths to PATH to shadow the BSD versions transparently.
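A sketch of the PATH-shadowing approach; the Homebrew prefix below assumes Apple silicon (/opt/homebrew), so run brew --prefix to confirm yours:

```shell
# Prepend each package's gnubin directory so GNU tools shadow the BSD ones
BREW_PREFIX="${BREW_PREFIX:-/opt/homebrew}"   # assumed prefix; Intel Macs use /usr/local
for pkg in coreutils findutils gnu-sed grep; do
  PATH="$BREW_PREFIX/opt/$pkg/libexec/gnubin:$PATH"
done
export PATH
```

Put this in ~/.zshrc and plain sed, date, and grep behave like their Linux counterparts in every shell.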
2. Filesystem Hierarchy
FHS — Key Directories
| Directory | Purpose |
|---|---|
| /bin, /sbin | Essential binaries (now usually symlinks to /usr/bin on modern distros) |
| /usr/bin | User commands — grep, python3, git |
| /usr/local/bin | Locally installed software (Homebrew on macOS, manual installs on Linux) |
| /etc | System-wide configuration (text files, human-editable) |
| /var | Variable data — logs (/var/log), spool, caches, databases |
| /tmp | Temporary files; cleared on reboot (often a tmpfs in RAM) |
| /home | User home directories (/root for root) |
| /proc | Virtual FS exposing kernel and process state as files |
| /sys | Virtual FS for kernel objects — devices, drivers, power management |
| /dev | Device files — block (sda), char (tty), pseudo (null, zero, urandom) |
| /run | Runtime data since last boot — PID files, sockets (tmpfs) |
| /opt | Self-contained optional packages (e.g., /opt/google/chrome) |
| /lib, /lib64 | Shared libraries needed by /bin and /sbin |
| /boot | Kernel image, initrd, GRUB bootloader |
| /mnt, /media | Mount points for temporary and removable filesystems |
Everything Is a File
The Unix philosophy treats hardware, processes, and kernel state as files — making them composable with standard text tools.
# CPU info from kernel
cat /proc/cpuinfo | grep -m1 "model name"
nproc # Number of logical CPUs
# Memory info
cat /proc/meminfo | head -5
# Process info — every PID has a /proc/$PID directory
ls /proc/$$ # $$ = current shell PID
cat /proc/$$/cmdline | tr '\0' ' '
cat /proc/$$/status | grep -E "^(Name|Pid|VmRSS)"
# Kernel tunable parameters
cat /proc/sys/net/ipv4/ip_forward
sysctl net.ipv4.ip_forward
# Hardware via /sys
cat /sys/class/net/eth0/speed # NIC speed in Mbps
ls /sys/block/ # Block devices
# Pseudo-devices
dd if=/dev/zero bs=1M count=100 of=/dev/null # Throughput benchmark
dd if=/dev/urandom bs=16 count=1 | xxd # 16 random bytes
echo "discard this" > /dev/null # Suppress output
/proc is live kernel state
Reading /proc/meminfo does not read from disk — the kernel generates the content on-demand each time you open the file. Changes to /proc/sys/* take effect immediately but are not persistent across reboots. Use sysctl -w + /etc/sysctl.conf (or a file in /etc/sysctl.d/) for persistence.
Inodes and Links
# Every file has an inode — metadata record (permissions, timestamps, block pointers)
# A directory entry is just a name-to-inode mapping
stat myfile.txt # Full inode metadata
ls -li /etc/passwd # -i shows inode number
# Hard link: another directory entry pointing to the same inode
ln /etc/passwd /tmp/passwd-hard
# Both names share the same inode; deleting one leaves the other intact
# Hard links cannot cross filesystem boundaries; cannot link directories
# Soft (symbolic) link: a file whose content is a path
ln -s /etc/nginx/nginx.conf nginx.conf
ls -la nginx.conf # Shows -> /etc/nginx/nginx.conf
readlink -f nginx.conf # Resolve all symlinks to absolute path
# Find files by inode (useful when filename has unprintable chars)
find / -inum 12345 2>/dev/null
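The inode mechanics are easy to verify in a scratch directory (GNU stat assumed for the -c flag):

```shell
cd "$(mktemp -d)"
echo data > original
ln original hard                # second name for the same inode
ln -s original soft             # a file whose content is the path "original"
stat -c '%i %h' original hard   # same inode twice; hard-link count is now 2
rm original
cat hard                        # prints "data": the inode lives while any name remains
cat soft 2>/dev/null || echo "dangling symlink"   # the path it stored is gone
```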
3. File Operations
ls & Navigation
ls -lah # Long format, all files, human-readable sizes
ls -lt # Sort by modification time (newest first)
ls -ltr # Oldest first (useful for log directories)
ls -lS # Sort by size (largest first)
ls -d */ # List only directories
ls -1 # One file per line (for scripting)
# Better alternatives
tree -L 2 # Directory tree, 2 levels deep
tree -L 2 -I 'node_modules|.git' # Exclude patterns
# Navigate
cd - # Jump to previous directory
pushd /tmp # Push to directory stack
popd # Pop back
dirs -v # Show directory stack
cp, mv, rm
# cp — preserve timestamps, ownership, permissions
cp -a src/ dst/ # Archive mode: recursive + preserve all metadata
cp -r src/ dst/ # Recursive (no metadata preservation)
cp -u src dst # Copy only if src is newer than dst
cp --backup=numbered f dst/ # Keep numbered backups on overwrite
# mv
mv oldname newname # Rename or move
mv -n src dst # No clobber — never overwrite existing
mv -v *.log /archive/ # Verbose
# rm — no trash, no undo
rm -rf dir/ # Delete recursively, no prompt
rm -i *.tmp # Interactive prompt per file
rm -- -weird-filename # -- treats args as filenames, not flags
rm -rf has no undo
There is no Recycle Bin on Linux. Guard variables in scripts: with an unset $dir, rm -rf "$dir"/* expands to rm -rf /*, which is a very bad day. Write rm -rf "${dir:?}" so the shell aborts the command (and the script) instead of expanding to nothing. GNU rm's --preserve-root (on by default) blocks a literal rm -rf / but does not help against rm -rf /*.
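A defensive sketch: ${var:?} makes the expansion itself fail when the variable is unset or empty, so rm never starts. The subshell only exists to keep the demo alive after the abort; in a real script the whole script stops, which is exactly what you want.

```shell
dir=""                                      # simulate the bug: variable never got set
( rm -rf "${dir:?dir is unset, refusing to rm}" ) 2>/dev/null \
  || echo "rm never ran"
```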
find
# Syntax: find [path] [expression]
find . -name "*.log" # By name (case-sensitive)
find . -iname "*.Log" # Case-insensitive
find . -type f # Regular files only
find . -type d # Directories only
find . -type l # Symlinks only
# By size
find /var/log -size +100M # Larger than 100 MB
find . -size -1k # Smaller than 1 KB
find . -empty # Empty files and dirs
# By time (n = days, +n = older than n days, -n = newer)
find /tmp -mtime +7 # Modified more than 7 days ago
find . -mmin -60 # Modified in the last 60 minutes
find . -newer reference.txt # Newer than reference.txt
# By permissions
find / -perm -4000 2>/dev/null # SUID files (security audit)
find . -perm /022 # World- or group-writable
# Execute actions
find /tmp -mtime +7 -delete # Delete old files
find . -name "*.py" -exec wc -l {} \; # Count lines in each .py file
find . -name "*.py" -exec grep -l "TODO" {} \; # Files containing TODO
# xargs — more efficient than -exec for many files
find . -name "*.log" | xargs rm -f
find . -name "*.py" | xargs grep -l "import os"
find . -name "*.txt" -print0 | xargs -0 wc -l # -print0/-0 handles spaces in names
# Prune (skip) directories
find . -name node_modules -prune -o -name "*.js" -print
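Why -print0/-0 matters, demonstrated with a filename containing a space:

```shell
d=$(mktemp -d)
touch "$d/has space.txt" "$d/plain.txt"
# Newline pipe: xargs splits on any whitespace, so "has space.txt" becomes two args
find "$d" -name '*.txt' | xargs -n1 echo | wc -l              # 3  (wrong)
# NUL-delimited: filenames pass through intact
find "$d" -name '*.txt' -print0 | xargs -0 -n1 echo | wc -l   # 2  (correct)
```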
Globbing & Brace Expansion
# Standard globs (shell expands these before passing to command)
*.log # Any file ending in .log
file?.txt # file1.txt, fileA.txt (exactly one character)
[abc].txt # a.txt, b.txt, c.txt
[0-9]*.sh # Scripts starting with a digit
[!a]*.log # Files NOT starting with 'a'
# Extended globs (enable with: shopt -s extglob)
!(*.log) # Everything except .log files
+(*.tar|*.gz) # One or more .tar or .gz files
# Brace expansion — no filesystem lookup, purely syntactic
echo file{1,2,3}.txt # file1.txt file2.txt file3.txt
echo {a..z} # a b c ... z
mkdir -p project/{src,tests,docs,scripts}
cp config.yaml config.yaml.{bak,$(date +%Y%m%d)}
# Globstar — recursive glob (enable with: shopt -s globstar)
ls **/*.py # All Python files in any subdirectory
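The key distinction in one scratch session: brace expansion is pure text generation, while globs consult the filesystem:

```shell
cd "$(mktemp -d)"       # empty directory
echo {a,b}.txt          # a.txt b.txt   (generated; no files needed)
touch a.txt
echo *.txt              # a.txt         (glob matches only what exists)
echo z*.txt             # z*.txt        (unmatched glob stays literal by default)
```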
4. File Permissions & Ownership
Permission Model
# ls -l output: -rwxr-xr-x 1 owner group size date name
# ^ file type (- = regular, d = dir, l = symlink, b = block dev, c = char dev)
# ^rwx = owner (user) permissions
# ^r-x = group permissions
# ^r-x = other (world) permissions
# Permission bits: r=4, w=2, x=1
# rwx=7, r-x=5, r--=4, ---=0
# Show octal permissions
stat -c "%a %n" /usr/bin/sudo # e.g., 4755 /usr/bin/sudo
# Directory permissions behave differently from files:
# r = can list directory contents (ls)
# w = can create, delete, rename files WITHIN the directory
# x = can traverse (cd into it, or access files by name)
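A counter-intuitive consequence worth seeing once: deleting a file requires write on the directory, not on the file itself:

```shell
d=$(mktemp -d)
touch "$d/locked.txt"
chmod 444 "$d/locked.txt"   # the file itself is read-only
rm -f "$d/locked.txt"       # succeeds anyway: we hold w+x on the directory
ls "$d"                     # empty
```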
chmod & chown
# chmod — symbolic mode
chmod u+x script.sh # Add execute for owner
chmod go-w file.txt # Remove write from group and other
chmod a+r public.html # Add read for all (a = ugo)
chmod u=rwx,g=rx,o= script.sh # Set exact permissions (no access for other)
# chmod — octal mode
chmod 755 script.sh # rwxr-xr-x (typical executable)
chmod 644 config.txt # rw-r--r-- (typical data file)
chmod 600 ~/.ssh/id_rsa # rw------- (private key — SSH will reject if looser)
chmod 700 ~/.ssh # rwx------ (SSH directory)
chmod -R 755 /var/www/html # Recursive
# chown — change owner and/or group
chown alice file.txt
chown alice:developers file.txt
chown -R www-data:www-data /var/www
chown :docker /var/run/docker.sock # Change group only (owner unchanged)
umask
# umask defines permission bits to REMOVE from newly created files/dirs
# New file default: 666 (rw-rw-rw-) New dir default: 777 (rwxrwxrwx)
# umask 022 removes: ---w--w- (removes group+other write)
# Result files=644, dirs=755
umask # Show current umask (e.g., 0022)
umask 027 # Files=640, dirs=750 (group read, no other access)
umask 077 # Files=600, dirs=700 (owner only — useful for secrets)
# Verify effect
umask 022; touch testfile; stat -c "%a" testfile; rm testfile # Should print 644
SUID, SGID, Sticky Bit
# SUID (Set User ID) — bit 4 on executable
# The process runs as the FILE OWNER, not the calling user
ls -l /usr/bin/passwd # -rwsr-xr-x (s = SUID set, execute bit also set)
chmod u+s /usr/bin/myapp # Set SUID
chmod 4755 /usr/bin/myapp # Octal: 4=SUID + 755
# SGID (Set Group ID) — bit 2
# On executable: runs as file GROUP
# On directory: new files/dirs inherit the directory's group (great for shared dirs)
chmod g+s /shared/ # Set SGID on directory
chmod 2775 /shared/ # Octal: 2=SGID + 775
ls -ld /shared/ # drwxrwsr-x (s in group execute position)
# Sticky bit — bit 1
# On directory: users can only delete their OWN files, even if they have write on dir
ls -ld /tmp # drwxrwxrwt (t = sticky bit set)
chmod +t /shared/uploads/
chmod 1777 /tmp/ # Octal: 1=sticky + 777
SUID on shell scripts is silently ignored
Linux ignores the SUID bit on interpreted scripts (bash, python, etc.) for security reasons — only compiled binaries honour it. Grant specific script privileges via sudo rules in /etc/sudoers instead.
ACLs (Access Control Lists)
# ACLs allow per-user/per-group permissions beyond the owner/group/other triplet
# Requires filesystem mounted with 'acl' option (default on ext4/xfs on modern distros)
getfacl /etc/myapp/config.yaml # View ACLs
setfacl -m u:alice:rw /etc/myapp/config.yaml # Grant alice read+write
setfacl -m g:devops:rx /usr/local/bin/deploy.sh # Grant devops group r-x
setfacl -d -m g:developers:rw /var/www/html/ # Default ACL (inherited by new files)
setfacl -x u:alice /etc/myapp/config.yaml # Remove alice's ACL entry
setfacl -b /etc/myapp/config.yaml # Remove ALL ACLs
# A '+' in ls -l output indicates ACLs are present:
ls -l /etc/myapp/config.yaml # -rw-rw-r--+
5. Text Processing
grep
# Basic
grep "error" /var/log/syslog
grep -i "error" file.txt # Case-insensitive
grep -v "debug" file.log # Invert match (exclude)
grep -r "TODO" src/ # Recursive
grep -rl "TODO" src/ # Print matching filenames only
# Context around matches
grep -A 3 "Exception" app.log # 3 lines After
grep -B 2 "Exception" app.log # 2 lines Before
grep -C 5 "Exception" app.log # 5 lines Context (before + after)
# Counts and line numbers
grep -c "ERROR" app.log # Count of matching lines
grep -n "ERROR" app.log # Show line numbers
grep -m 10 "ERROR" app.log # Stop after 10 matches
# Regex
grep -E "error|warn|crit" app.log # Extended regex (ERE)
grep -P "\d{3}-\d{4}" phones.txt # Perl-compatible regex (GNU grep)
grep "^ERROR" app.log # Lines starting with ERROR
grep "\.py$" filelist.txt # Lines ending with .py
# Multiple patterns
grep -e "error" -e "warn" app.log
grep -f patterns.txt app.log # Patterns from a file
# Practical combos
grep -rn "deprecated" --include="*.py" src/
grep -rn "password" --exclude-dir=".git" .
zgrep "ERROR" /var/log/app.log.gz # Search inside gzip files
sed
# Substitution: s/pattern/replacement/flags
sed 's/foo/bar/' file.txt # Replace first occurrence per line
sed 's/foo/bar/g' file.txt # Replace all (global)
sed 's/foo/bar/gi' file.txt # Global + case-insensitive (GNU sed)
sed -i 's/foo/bar/g' file.txt # In-place edit
sed -i.bak 's/foo/bar/g' file.txt # In-place with .bak backup
# Address ranges
sed '3s/foo/bar/' file.txt # Only line 3
sed '2,5s/foo/bar/g' file.txt # Lines 2-5
sed '/^#/d' file.txt # Delete comment lines
sed '/start/,/end/d' file.txt # Delete from 'start' to 'end' pattern
sed -n '10,20p' file.txt # Print only lines 10-20
# Delete and print
sed '/^$/d' file.txt # Delete blank lines
sed '1d' file.txt # Delete first line (skip CSV header)
sed '$d' file.txt # Delete last line
# Practical extractions
sed -n 's/.*error: \(.*\)/\1/p' app.log # Extract text after "error: "
sed 's/[[:space:]]*$//' file.txt # Strip trailing whitespace
sed 's/^\s*//;s/\s*$//' file.txt # Strip leading and trailing whitespace
awk
# awk processes line by line; splits each line into fields
# Built-ins: $0=whole line, $1..$NF=fields, NF=field count, NR=record/line number
# Print specific columns
awk -F: '{print $1, $3}' /etc/passwd # Username and UID (colon-delimited)
awk -F: '{print $1}' /etc/passwd # Usernames only
# Conditions
awk -F: '$3 >= 1000 {print $1}' /etc/passwd # Regular users (UIDs start at 1000 on most distros)
awk -F: '$7 == "/bin/bash" {print $1}' /etc/passwd # Users with bash shell
awk '/ERROR/ {print NR, $0}' app.log # Line number + line for errors
# Arithmetic and aggregation
awk '{sum += $1} END {print "Total:", sum}' numbers.txt
awk -F, '{sum += $5} END {printf "Revenue: $%.2f\n", sum}' sales.csv
# BEGIN and END blocks
awk 'BEGIN {print "=== Report ==="} {print NR, $0} END {print "Lines:", NR}' file.txt
# Field manipulation
awk '{$2="REDACTED"; print}' log.txt # Replace field 2
awk -F, -v OFS=, '{$3=$3*1.1; print}' prices.csv # Increase column 3 by 10%
# Parse nginx access log — top 20 requested URLs
awk '{print $7}' /var/log/nginx/access.log | sort | uniq -c | sort -rn | head -20
# Compute 95th percentile latency from log file (latency in ms, field 10)
awk '{print $10}' access.log | sort -n | awk 'BEGIN{c=0} {lines[c++]=$0} END{print lines[int(c*0.95)]}'
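A quick sanity check of the percentile idea on known data, using 1-indexed NR so line N of the sorted input lands in a[N] (percentile conventions vary by a line either way):

```shell
seq 1 100 | sort -n | awk '{a[NR]=$0} END {print a[int(NR*0.95)]}'   # 95
```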
cut, sort, uniq, tr, wc
# cut — extract columns
cut -d: -f1,3 /etc/passwd # Fields 1 and 3, colon-delimited
cut -d, -f2-5 data.csv # Fields 2 through 5
cut -c1-10 file.txt # First 10 characters per line
# sort
sort file.txt # Lexicographic ascending
sort -r file.txt # Reverse
sort -n numbers.txt # Numeric sort (essential — "10" > "9" numerically)
sort -rn numbers.txt # Numeric descending
sort -t: -k3 -n /etc/passwd # Sort by 3rd field (UID), numeric
sort -k2,2 -k1,1 file.txt # Sort by field 2, then field 1
sort -u file.txt # Sort + deduplicate in one pass
sort --parallel=4 huge.txt # Parallel sort for large files
# uniq (input must be sorted first)
sort file.txt | uniq # Remove consecutive duplicates
sort file.txt | uniq -c # Count occurrences (most useful)
sort file.txt | uniq -d # Show only lines that appear more than once
sort file.txt | uniq -u # Show only lines that are truly unique
# tr — translate or delete characters
tr 'a-z' 'A-Z' <<< "hello world" # Uppercase
tr -d '\r' < windows.txt # Remove Windows carriage returns (CRLF -> LF)
tr -s ' ' <<< "hello world" # Squeeze repeated spaces to one
tr -dc '[:alnum:]' < /dev/urandom | head -c 32 # 32-char random alphanumeric string
# wc
wc -l file.txt # Line count
wc -w file.txt # Word count
wc -c file.txt # Byte count
find . -name "*.py" | xargs wc -l | tail -1 # Total lines in all Python files
diff & jq
# diff
diff file1.txt file2.txt # Default output
diff -u file1.txt file2.txt # Unified format (standard for patches)
diff -r dir1/ dir2/ # Recursive directory diff
diff --color=always old new | less -R # Colorized pager
# Apply a patch
diff -u original.py modified.py > changes.patch
patch original.py < changes.patch
# jq — JSON processor (apt install jq)
echo '{"name":"alice","age":30}' | jq '.name' # "alice"
echo '{"items":[1,2,3]}' | jq '.items[]' # Each element
cat data.json | jq '.users[] | select(.active == true)' # Filter array
cat data.json | jq '[.users[] | {name, email}]' # Reshape objects
cat data.json | jq '.users | length' # Array length
# jq flags
jq -r '.name' data.json # Raw output — no surrounding quotes
jq -c '.' data.json # Compact single-line output
jq '.' data.json # Pretty-print (format) JSON
jq --arg key "value" '.[$key]' data.json # Pass shell variable as jq variable
# Real-world: extract latest GitHub release tag
curl -s https://api.github.com/repos/cli/cli/releases/latest \
| jq -r '.tag_name'
6. I/O Redirection & Pipes
Standard Streams
# File descriptors: 0=stdin 1=stdout 2=stderr
# Redirect stdout
command > file.txt # Overwrite
command >> file.txt # Append
# Redirect stderr
command 2> errors.txt # stderr to file
command 2>&1 # Merge stderr into stdout
command > out.txt 2>&1 # Both to file (stdout first, then merge stderr)
command &> out.txt # Shorthand (bash only)
command 2>/dev/null # Discard errors silently
command >/dev/null 2>&1 # Discard all output
# Redirect stdin
command < input.txt
command < input.txt > output.txt
# tee — write to file AND stdout simultaneously
command | tee output.txt # stdout to terminal + file
command | tee -a output.txt # Append mode
command 2>&1 | tee all.log # Capture everything
Pipes & Process Substitution
# Pipes connect stdout of left command to stdin of right command
ps aux | grep nginx | grep -v grep | awk '{print $2}' # nginx PIDs
# Merge stderr into pipeline
command 2>&1 | grep ERROR
# Process substitution — treat command output as a file argument
diff <(sort file1.txt) <(sort file2.txt) # Compare sorted without temp files
comm <(sort a.txt) <(sort b.txt) # Lines in a only / b only / both
cat <(head -5 file1.txt) <(tail -5 file2.txt) # Concatenate two command outputs
# Named pipes (FIFOs)
mkfifo /tmp/mypipe
command1 > /tmp/mypipe & # Write in background
command2 < /tmp/mypipe # Read from pipe; blocks until writer is done
rm /tmp/mypipe
# pipefail — exit if any command in pipeline fails
set -o pipefail
# Without it, only the last command's exit code is checked:
false | true; echo $? # Prints 0 (hides the failure!)
set -o pipefail
false | true; echo $? # Prints 1 (correct)
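Process substitution pairs naturally with comm, which compares two sorted streams in three columns (only-in-first, only-in-second, in-both):

```shell
comm <(printf 'a\nb\nc\n') <(printf 'b\nc\nd\n')       # all three columns
comm -12 <(printf 'a\nb\nc\n') <(printf 'b\nc\nd\n')   # suppress 1+2: prints b and c
```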
Here Documents & Here Strings
# Here document — multi-line stdin
cat <<EOF
Line one with $HOME expanded
Line two
EOF
# Quoted heredoc — disable all expansion
cat <<'EOF'
The variable $HOME will NOT be expanded here.
EOF
# Indented heredoc — strip leading tabs (use real TAB chars, not spaces)
cat <<-EOF
Indented line one
Indented line two
EOF
# Redirect heredoc to a file
cat > /etc/myapp/config.yaml <<EOF
database:
  host: ${DB_HOST}
  port: ${DB_PORT:-5432}
EOF
# Here string — single value as stdin
grep "^root" <<< "$(cat /etc/passwd)"
base64 <<< "hello world"
# Send commands to interactive program
mysql -u root <<EOF
USE mydb;
SELECT COUNT(*) FROM users WHERE active = 1;
EOF
7. Shell Scripting (Bash)
Script Header & Safety Flags
#!/usr/bin/env bash
# Use /usr/bin/env bash for portability (bash may not be at /bin/bash)
set -euo pipefail
# -e exit immediately on any command returning non-zero
# -u treat unset variables as errors (prevents silent empty-string bugs)
# -o pipefail make pipeline fail if any stage fails (not just the last)
# Safer IFS for word splitting
IFS=$'\n\t'
# Cleanup trap — runs on exit (including error exit)
TMPDIR_WORK=$(mktemp -d)
cleanup() {
rm -rf "$TMPDIR_WORK"
}
trap cleanup EXIT
# Error trap with line number
trap 'echo "Error on line $LINENO" >&2' ERR
# Script directory (works when called from any working directory)
SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
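The trap pattern in action: an EXIT trap runs even when the script dies early, and the original exit code is preserved:

```shell
bash -c '
  tmp=$(mktemp)
  trap "rm -f \"$tmp\"; echo cleaned up" EXIT
  exit 3                    # simulate a mid-script failure
'
echo "exit code: $?"        # exit code: 3  (trap ran first)
```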
Variables & Parameter Expansion
# Assignment — NO spaces around =
name="Alice"
count=0
readonly CONFIG="/etc/app/config.yaml" # Immutable variable
# Always double-quote expansions to prevent word splitting and glob expansion
echo "$name" # Correct
echo $name # Risky — splits on whitespace, expands globs
# Parameter expansion
echo "${name}" # Unambiguous (required before letters/digits)
echo "${name:-default}" # Use 'default' if name is unset or empty
echo "${name:=default}" # Assign 'default' if unset, then expand
echo "${name:?Error: required}" # Exit with error if name is unset/empty
echo "${name:+set}" # Expand to 'set' if name is non-empty; else empty
# String operations
file="/path/to/archive.tar.gz"
echo "${file##*/}" # archive.tar.gz (basename — strip up to last /)
echo "${file%/*}" # /path/to (dirname — strip from last /)
echo "${file%%.*}" # /path/to/archive (strip from first .)
echo "${file#*.}" # tar.gz (strip up to first .)
echo "${#file}" # 24 (string length)
echo "${file/tar/TAR}" # Replace first match
echo "${file//a/A}" # Replace all matches
echo "${name^^}" # UPPERCASE (bash 4+)
echo "${name,,}" # lowercase (bash 4+)
# Arrays
fruits=("apple" "banana" "cherry")
echo "${fruits[0]}" # apple
echo "${fruits[@]}" # All elements (space-separated)
echo "${#fruits[@]}" # 3 (length)
fruits+=("date") # Append element
for f in "${fruits[@]}"; do echo "$f"; done
# Associative arrays (bash 4+)
declare -A config
config[host]="localhost"
config[port]="5432"
for key in "${!config[@]}"; do echo "$key = ${config[$key]}"; done
Conditionals
# [[ ]] preferred in bash — no word splitting, no glob expansion, supports regex
if [[ "$name" == "Alice" ]]; then
echo "Hello Alice"
elif [[ "$name" =~ ^Bob ]]; then # =~ is regex match; capture in BASH_REMATCH
echo "Hello ${BASH_REMATCH[0]}"
else
echo "Hello stranger"
fi
# Common test operators
[[ -f "$file" ]] # Regular file exists
[[ -d "$dir" ]] # Directory exists
[[ -e "$path" ]] # Any path exists
[[ -s "$file" ]] # File exists and is non-empty
[[ -r "$file" ]] # File is readable
[[ -w "$file" ]] # File is writable
[[ -x "$file" ]] # File is executable
[[ -L "$path" ]] # Path is a symlink
[[ -z "$var" ]] # String is empty
[[ -n "$var" ]] # String is non-empty
[[ "$a" == "$b" ]] # String equality
[[ "$n" -eq 0 ]] # Numeric equal
[[ "$n" -gt 0 ]] # Numeric greater than
[[ "$n" -lt 10 ]] # Numeric less than
# Compound conditions
[[ -f "$f" && -r "$f" ]] # File exists AND is readable
[[ "$a" == "x" || "$b" == "y" ]] # Either condition
# Short-circuit for guard clauses
[[ -d /tmp/work ]] || mkdir -p /tmp/work
[[ -n "$DB_URL" ]] || { echo "DB_URL required" >&2; exit 1; }
# case statement
case "$os" in
ubuntu|debian) pkg_mgr="apt" ;;
centos|rhel) pkg_mgr="yum" ;;
fedora) pkg_mgr="dnf" ;;
*) echo "Unknown OS: $os" >&2; exit 1 ;;
esac
Loops
# for — list iteration
for host in web1 web2 web3; do
ssh "$host" 'systemctl restart nginx'
done
# for — array iteration
files=(/var/log/*.log)
for file in "${files[@]}"; do
gzip "$file"
done
# Read lines from file safely (handles spaces, backslashes)
while IFS= read -r line; do
echo "Processing: $line"
done < input.txt
# Read lines from command output
while IFS= read -r pid; do
kill -TERM "$pid"
done < <(pgrep stale-worker)
# C-style for loop
for ((i=1; i<=10; i++)); do
echo "$i"
done
# while loop with counter
count=0
while [[ $count -lt 5 ]]; do
echo "Attempt $count"
((count++))
done
# Retry loop with backoff
max_attempts=5
attempt=0
until curl -sf https://api.example.com/health >/dev/null; do
((attempt++))
[[ $attempt -ge $max_attempts ]] && { echo "Service unreachable"; exit 1; }
echo "Waiting... (attempt $attempt/$max_attempts)"
sleep $((2 ** attempt)) # Exponential backoff: 2, 4, 8, 16 seconds
done
Functions
# Logging helper
log() {
local level="$1"; shift
echo "[$(date '+%Y-%m-%d %H:%M:%S')] [$level] $*" >&2
}
log "INFO" "Starting deployment"
log "ERROR" "Database unreachable"
# Functions return exit codes (0-255)
# Return VALUES via echo + command substitution
get_timestamp() {
date '+%Y%m%d_%H%M%S'
}
ts=$(get_timestamp)
# Validate and guard
require_env() {
local var="$1"
[[ -n "${!var}" ]] || { log "ERROR" "Required env var not set: $var"; exit 1; }
}
require_env "DATABASE_URL"
require_env "SECRET_KEY"
# Variadic functions — "$@" expands to all args, individually quoted
sum() {
local total=0
for n in "$@"; do ((total += n)); done
echo "$total"
}
sum 1 2 3 4 5 # 15
# local is essential — without it, variables leak into caller scope
process_file() {
local file="$1" # function-scoped
local line_count
line_count=$(wc -l < "$file")
echo "$file has $line_count lines"
}
Always use local for function variables
Without local, bash variables are global. A function that sets count=0 will silently zero out a count variable in the calling scope. This is one of the most common sources of subtle shell script bugs.
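The leak takes four lines to reproduce:

```shell
count=10
bump()  { count=0; }        # no 'local': assigns the CALLER's count
bump
echo "$count"               # 0  (silently clobbered)

count=10
fixed() { local count=0; }  # shadowed copy; caller's value survives
fixed
echo "$count"               # 10
```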
Complete production-quality backup script
#!/usr/bin/env bash
# backup.sh — Back up a directory to S3 with retention and error alerting
set -euo pipefail
IFS=$'\n\t'
# --- Configuration (override via environment) ---
readonly BACKUP_SRC="${BACKUP_SRC:-/var/lib/myapp}"
readonly BACKUP_BUCKET="${BACKUP_BUCKET:?BACKUP_BUCKET env var is required}"
readonly RETENTION_DAYS="${RETENTION_DAYS:-30}"
readonly TIMESTAMP=$(date '+%Y%m%d_%H%M%S')
readonly BACKUP_FILE="/tmp/backup-${TIMESTAMP}-$$.tar.gz"
readonly LOG_FILE="/var/log/myapp-backup.log"
# --- Logging ---
log() {
local level="$1"; shift
echo "[$(date '+%Y-%m-%d %H:%M:%S')] [$level] $*" | tee -a "$LOG_FILE"
}
# --- Cleanup on exit ---
cleanup() {
local exit_code=$?
rm -f "$BACKUP_FILE"
if [[ $exit_code -ne 0 ]]; then
log "ERROR" "Backup FAILED with exit code $exit_code (line $LINENO)"
# Uncomment to alert via Slack:
# curl -s -X POST "$SLACK_WEBHOOK" \
# -H 'Content-type: application/json' \
# -d "{\"text\":\"Backup failed on $(hostname) at $(date)\"}"
fi
}
trap cleanup EXIT
# --- Validate dependencies ---
for cmd in tar aws; do
command -v "$cmd" >/dev/null 2>&1 \
|| { log "ERROR" "Required command not found: $cmd"; exit 1; }
done
# --- Validate source ---
[[ -d "$BACKUP_SRC" ]] \
|| { log "ERROR" "Source directory not found: $BACKUP_SRC"; exit 1; }
# --- Create archive ---
log "INFO" "Backing up $BACKUP_SRC"
tar -czf "$BACKUP_FILE" -C "$(dirname "$BACKUP_SRC")" "$(basename "$BACKUP_SRC")"
size=$(du -sh "$BACKUP_FILE" | cut -f1)
log "INFO" "Archive created: $BACKUP_FILE ($size)"
# --- Upload ---
s3_path="s3://${BACKUP_BUCKET}/backups/${TIMESTAMP}.tar.gz"
aws s3 cp "$BACKUP_FILE" "$s3_path" --storage-class STANDARD_IA
log "INFO" "Uploaded to $s3_path"
# --- Rotate old backups ---
log "INFO" "Rotating backups older than ${RETENTION_DAYS} days"
cutoff=$(date -d "${RETENTION_DAYS} days ago" '+%Y%m%d' 2>/dev/null \
|| date -v-${RETENTION_DAYS}d '+%Y%m%d') # Linux / macOS compat
aws s3 ls "s3://${BACKUP_BUCKET}/backups/" | awk '{print $4}' \
| while IFS= read -r key; do
file_date="${key:0:8}"
if [[ "$file_date" < "$cutoff" ]]; then
aws s3 rm "s3://${BACKUP_BUCKET}/backups/$key"
log "INFO" "Deleted old backup: $key"
fi
done
log "INFO" "Backup complete"
8. Process Management
ps & top/htop
# ps — process snapshot
ps aux # All processes, user-oriented format
ps -ef # All processes, full format (shows PPID)
ps -ejH # Process tree (forest view)
ps aux --sort=-%cpu | head -10 # Top 10 CPU consumers
ps aux --sort=-%mem | head -10 # Top 10 memory consumers
# ps output columns (aux format):
# USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND
# VSZ = virtual size (includes shared libs, mmap'd files — often misleading)
# RSS = resident set size = actual physical RAM in use
# STAT: R=running, S=interruptible sleep, D=uninterruptible sleep (I/O wait),
# Z=zombie (exited, parent hasn't wait()ed), T=stopped, <=high priority
# + = foreground process group, s = session leader
# Get PIDs
pgrep nginx # PIDs matching process name
pgrep -a nginx # With full command line
pgrep -u www-data # All processes owned by www-data
pidof sshd # PIDs of exact binary name
# Process details via /proc
cat /proc/$(pgrep -n nginx)/status | grep -E "^(Name|Pid|VmRSS|Threads)"
ls /proc/$(pgrep -n nginx)/fd | wc -l # Count open file descriptors
# top interactive keys:
# k = kill a PID, r = renice, f = field selector
# 1 = per-CPU breakdown, M = sort by memory, P = sort by CPU
# u = filter by user, q = quit
top -b -n 1 | head -20 # Batch mode (scriptable)
Signals & kill
# Common signals and their conventional meanings
# SIGTERM (15) — polite termination; process should clean up and exit
# SIGKILL (9) — force kill; cannot be caught, blocked, or ignored
# SIGHUP (1) — hangup; convention: reload config (nginx, sshd, etc.)
# SIGINT (2) — keyboard interrupt (Ctrl+C)
# SIGQUIT (3) — quit with core dump (Ctrl+\)
# SIGSTOP (19) — pause process (cannot be caught)
# SIGCONT (18) — continue stopped process
# SIGUSR1/2 — user-defined; application-specific
kill -TERM 1234 # Polite terminate (default signal)
kill -HUP 1234 # Reload config
kill -KILL 1234 # Force kill (always works, last resort)
kill -9 1234 # Same as -KILL
# Kill by name
pkill nginx # SIGTERM to all matching
pkill -HUP nginx # Reload all nginx workers
pkill -u alice # Kill all of alice's processes (use with caution)
killall -9 python3 # SIGKILL to all python3 processes
# Kill process using a port
fuser -k 8080/tcp # Kill process bound to TCP port 8080
lsof -ti :8080 | xargs kill # Same using lsof
# Trap signals in scripts
cleanup() { echo "Interrupted, cleaning up..." >&2; rm -f /tmp/lockfile; exit 1; }
trap cleanup SIGTERM SIGINT
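The catchable/uncatchable distinction, demonstrated on throwaway shells:

```shell
# SIGTERM can be trapped: the handler runs and chooses the exit path
bash -c 'trap "echo caught SIGTERM; exit 0" TERM; kill -TERM $$'
# SIGKILL cannot: the shell dies instantly and the parent sees 128+9 = 137
bash -c 'kill -KILL $$'; echo "exit code: $?"    # exit code: 137
```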
Job Control
# Background / foreground
command & # Start immediately in background
Ctrl+Z # Suspend the current foreground job
bg # Resume most recent suspended job in background
bg %2 # Resume job #2 in background
fg # Bring most recent background job to foreground
fg %2 # Bring job #2 to foreground
jobs # List background jobs in current shell
jobs -l # With PIDs
# Detach from shell — process survives shell exit
command &
disown %1 # Remove from jobs table (immune to SIGHUP on shell exit)
# nohup — immune to SIGHUP, stdout/stderr go to nohup.out
nohup ./long-running-script.sh > /var/log/myscript.log 2>&1 &
# tmux — persistent multiplexed sessions (far better than nohup for interactive work)
tmux new -s deploy # New named session
tmux attach -t deploy # Reattach (survives SSH disconnect)
tmux ls # List sessions
# Inside tmux: Ctrl+B then d to detach without killing
9. Users & Groups
User Database Files
# /etc/passwd — one line per user: name:x:UID:GID:GECOS:home:shell
# The 'x' means password hash is in /etc/shadow
cat /etc/passwd | grep -v "nologin\|false" # Users with real login shells
# /etc/shadow — hashed passwords (root-readable only)
# Format: name:hash:last_change:min:max:warn:inactive:expire
sudo cat /etc/shadow | head -3
# /etc/group — group database: group:x:GID:member1,member2
grep docker /etc/group # Who is in the docker group
# Current user info
id # uid, gid, all supplementary groups
whoami # Username only
id alice # Info for a specific user
groups alice # List alice's groups
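The field layout of /etc/passwd makes it easy to query with awk. A small sketch listing human accounts; the UID >= 1000 threshold is the usual Debian/RHEL convention for non-system users:

```shell
# List human accounts: UID >= 1000 with a real login shell,
# using the name:x:UID:GID:GECOS:home:shell field layout
awk -F: '$3 >= 1000 && $7 !~ /(nologin|false)$/ \
  {printf "%-12s uid=%s shell=%s\n", $1, $3, $7}' /etc/passwd
```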
User Management
# Create user
useradd -m -s /bin/bash -G sudo alice # -m=create home, -s=shell, -G=supplementary groups
useradd -r -s /usr/sbin/nologin myservice # System user (no interactive login, UID < 1000)
# Modify user
usermod -aG docker alice # Add to docker group (-a = append, required to not lose existing groups)
usermod -s /bin/zsh alice # Change login shell
usermod -L alice # Lock account (prepend ! to password hash)
usermod -U alice # Unlock account
# Delete user
userdel alice # Delete user, keep home directory
userdel -r alice # Delete user AND home directory
# Passwords
passwd alice # Set/change password interactively
passwd -e alice # Expire password (forces change on next login)
passwd -l alice # Lock account
chage -l alice # Show password aging policy
chage -M 90 alice # Set max password age to 90 days
# Groups
groupadd developers
groupdel developers
gpasswd -a alice developers # Add alice to group
gpasswd -d alice developers # Remove alice from group
sudo
# Run as root
sudo command
sudo -i # Interactive root shell with root's environment
sudo -u alice command # Run as a specific user
sudo -l # List what current user can sudo
# /etc/sudoers — ALWAYS edit with visudo (validates syntax before saving)
sudo visudo
# Common sudoers patterns
# alice ALL=(ALL:ALL) ALL # Full sudo
# %sudo ALL=(ALL:ALL) ALL # All members of sudo group
# alice ALL=(ALL) NOPASSWD: /bin/systemctl restart nginx # Specific passwordless cmd
# deploy ALL=(ALL) NOPASSWD: /usr/local/bin/deploy.sh
# Drop-in files (preferred over editing /etc/sudoers directly)
echo "deploy ALL=(ALL) NOPASSWD: /usr/bin/systemctl" \
| sudo tee /etc/sudoers.d/deploy
sudo chmod 0440 /etc/sudoers.d/deploy # Must not be world-readable
Never edit /etc/sudoers directly
A syntax error locks out ALL sudo access on the system. You may need physical console access to recover. Always use visudo which validates before saving. Use drop-in files in /etc/sudoers.d/ for application-specific rules.
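visudo can also check files non-interactively, which is handy for validating drop-ins from CI or provisioning scripts. A sketch assuming the /etc/sudoers.d/deploy file created above:

```shell
# Syntax-check a sudoers drop-in without opening an editor
sudo visudo -cf /etc/sudoers.d/deploy   # -c = check only, -f = file; non-zero exit on error
# Check the whole sudoers configuration, includes and all
sudo visudo -c
```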
10. Package Management
apt (Debian / Ubuntu)
# Update package index first
sudo apt update
# Install / remove
sudo apt install nginx postgresql-16 build-essential
sudo apt remove nginx # Remove but keep config files
sudo apt purge nginx # Remove including config files
sudo apt autoremove # Remove orphaned dependencies
# Upgrade
sudo apt upgrade # Upgrade installed packages
sudo apt full-upgrade # Upgrade + handle dependency changes
# Inspect
apt search "web server"
apt show nginx # Dependencies, description
dpkg -l | grep nginx # Installed packages matching name
dpkg -L nginx # Files installed by the nginx package
dpkg -S /usr/sbin/nginx # Which package owns this file
apt-cache policy nginx # Installed vs available version
# Hold a package version
sudo apt-mark hold nginx
sudo apt-mark unhold nginx
# Add third-party repository (modern way with signed keyring)
curl -fsSL https://repo.example.com/gpg.key \
| sudo gpg --dearmor -o /usr/share/keyrings/example.gpg
echo "deb [signed-by=/usr/share/keyrings/example.gpg] https://repo.example.com stable main" \
| sudo tee /etc/apt/sources.list.d/example.list
sudo apt update
yum / dnf (RHEL / CentOS / Fedora)
# dnf is the modern replacement for yum
sudo dnf install nginx
sudo dnf remove nginx
sudo dnf update # Update all packages
sudo dnf update nginx # Update specific package
sudo dnf search "web server"
sudo dnf info nginx
sudo dnf list installed | grep nginx
sudo dnf provides /usr/sbin/nginx # Which package owns this path
# RHEL extras
sudo dnf install epel-release # Extra Packages for Enterprise Linux
sudo dnf config-manager --set-enabled crb # CodeReady Linux Builder (RHEL 9)
Package Manager Quick-Reference
| Task | apt (Debian/Ubuntu) | dnf (RHEL/Fedora) |
|---|---|---|
| Update index | apt update | (auto or dnf check-update) |
| Install | apt install pkg | dnf install pkg |
| Remove + config | apt purge pkg | dnf remove pkg |
| Upgrade all | apt upgrade | dnf update |
| Search | apt search term | dnf search term |
| File to package | dpkg -S /path | dnf provides /path |
| Package files | dpkg -L pkg | rpm -ql pkg |
| Package info | apt show pkg | dnf info pkg |
11. Networking
ip & ss
# ip — modern replacement for ifconfig/route (iproute2 package)
# Addresses
ip addr # All interfaces and IPs
ip addr show eth0 # Specific interface
ip addr add 192.168.1.10/24 dev eth0 # Add IP (not persistent)
ip addr del 192.168.1.10/24 dev eth0
# Links
ip link show
ip link set eth0 up
ip link set eth0 down
# Routes
ip route # Routing table
ip route get 8.8.8.8 # Which route handles destination
ip route add default via 192.168.1.1
ip route add 10.0.0.0/8 via 10.1.0.1
# ss — socket statistics (replaces netstat)
ss -tlnp # TCP listening sockets with PIDs
ss -ulnp # UDP listening
ss -tnp # Established TCP connections with PIDs
ss -s # Socket summary statistics
ss -tnp dst :443 # Connections to port 443
ss state established # Only established connections
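Building on ss, a one-liner that tallies TCP sockets by state is often the fastest health signal on a busy box:

```shell
# Count TCP sockets per state (ESTAB, TIME-WAIT, CLOSE-WAIT, LISTEN, ...)
# A pile of CLOSE-WAIT usually means an application is not closing sockets.
ss -tan | awk 'NR > 1 {print $1}' | sort | uniq -c | sort -rn
```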
curl & dig
# curl
curl https://api.example.com/health
curl -s https://api.example.com # Silent (no progress bar)
curl -I https://example.com # HEAD only
curl -L https://example.com # Follow redirects
curl -o /tmp/file.zip https://example.com/f.zip
# POST with JSON body
curl -X POST https://api.example.com/users \
-H "Content-Type: application/json" \
-H "Authorization: Bearer $TOKEN" \
-d '{"name":"alice","email":"[email protected]"}'
# Check response code + latency
curl -w "\nHTTP %{http_code} %{time_total}s\n" -o /dev/null -s https://example.com
# Retry on failure
curl --retry 3 --retry-delay 2 --connect-timeout 5 --max-time 30 https://api.example.com
# dig — DNS queries
dig example.com # A record (IPv4)
dig example.com AAAA # IPv6
dig example.com MX # Mail exchangers
dig example.com TXT # TXT records (SPF, DKIM)
dig @8.8.8.8 example.com # Force specific DNS server
dig +short example.com # Answer only (no extra output)
dig +trace example.com # Full delegation chain from root
dig -x 93.184.216.34 # Reverse DNS lookup (PTR)
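The curl flags above combine naturally into a readiness check for deploy scripts. A hedged sketch; `wait_for_http`, the URL, and the retry counts are illustrative, not a standard tool:

```shell
# Sketch: block until an endpoint reports healthy, or give up.
wait_for_http() {
  local url=$1 tries=${2:-10} code
  for ((i = 1; i <= tries; i++)); do
    # -w prints just the status code; "000" on connection failure
    code=$(curl -s -o /dev/null -w '%{http_code}' --max-time 5 "$url" || true)
    [ "$code" = "200" ] && return 0
    sleep 1
  done
  return 1
}
# wait_for_http https://api.example.com/health 30 || { echo "unhealthy" >&2; exit 1; }
```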
iptables / UFW
# iptables
sudo iptables -L -n -v # List filter table
sudo iptables -t nat -L -n -v # nat table
# Common INPUT rules
sudo iptables -A INPUT -p tcp --dport 22 -j ACCEPT
sudo iptables -A INPUT -p tcp --dport 80 -j ACCEPT
sudo iptables -A INPUT -p tcp --dport 443 -j ACCEPT
sudo iptables -A INPUT -m state --state ESTABLISHED,RELATED -j ACCEPT
sudo iptables -A INPUT -i lo -j ACCEPT
sudo iptables -P INPUT DROP # Default deny — set LAST
# Save / restore
sudo iptables-save > /etc/iptables/rules.v4
sudo iptables-restore < /etc/iptables/rules.v4
# UFW — simplified frontend (Ubuntu)
sudo ufw allow 22/tcp
sudo ufw allow from 10.0.0.0/8 to any port 5432 # Postgres from private net
sudo ufw deny 8080/tcp
sudo ufw enable
sudo ufw status verbose
tcpdump
sudo tcpdump -i eth0 # All traffic
sudo tcpdump -i any port 443 # HTTPS on all interfaces
sudo tcpdump -i eth0 host 10.0.0.5 # Traffic to/from specific host
sudo tcpdump -i eth0 -A port 80 # Print ASCII payload (HTTP debug)
sudo tcpdump -i eth0 -w capture.pcap # Save for Wireshark
sudo tcpdump -r capture.pcap # Read saved file
sudo tcpdump -i any -n udp port 53 # Watch DNS queries
# Watch TCP SYN packets (connection attempts)
sudo tcpdump -i eth0 'tcp[tcpflags] & tcp-syn != 0'
12. systemd
Service Control
sudo systemctl start nginx
sudo systemctl stop nginx
sudo systemctl restart nginx # Stop + start (brief downtime)
sudo systemctl reload nginx # Send SIGHUP — reload config in-place
sudo systemctl try-reload-or-restart nginx # Reload if supported, else restart (only if already running)
# Enable / disable at boot
sudo systemctl enable nginx
sudo systemctl disable nginx
sudo systemctl enable --now nginx # Enable AND start immediately
# Status
systemctl status nginx # Status, recent log lines, enabled state
systemctl is-active nginx # Exits 0 if active
systemctl is-enabled nginx # "enabled" / "disabled" / "static"
systemctl is-failed nginx
# List units
systemctl list-units --type=service
systemctl list-units --state=failed
systemctl list-unit-files --type=service
# Dependency graph
systemctl list-dependencies nginx
systemctl list-dependencies --reverse nginx # Who depends on nginx
journalctl
journalctl -u nginx # All nginx logs
journalctl -u nginx -n 100 -f # Last 100 lines + follow
journalctl -u nginx --since "1 hour ago"
journalctl -u nginx --since "2024-01-15 10:00" --until "2024-01-15 11:00"
# Priority filter
journalctl -p err # Error and above
journalctl -p warning -u nginx
# Priorities: emerg alert crit err warning notice info debug
# Boot logs
journalctl -b # Current boot
journalctl -b -1 # Previous boot
journalctl --list-boots
# Kernel messages
journalctl -k # Kernel ring buffer
journalctl -k --since "5 min ago"
# JSON output for parsing
journalctl -u nginx -o json | jq '.MESSAGE'
# Disk management
journalctl --disk-usage
sudo journalctl --vacuum-size=500M # Trim to 500MB
sudo journalctl --vacuum-time=30d # Trim to 30 days
Writing Unit Files
Example: production service unit file
# /etc/systemd/system/myapp.service
# After creating or editing: sudo systemctl daemon-reload
[Unit]
Description=My Application Server
Documentation=https://github.com/myorg/myapp
After=network.target postgresql.service
Wants=postgresql.service
[Service]
Type=simple
User=myapp
Group=myapp
WorkingDirectory=/opt/myapp
EnvironmentFile=/etc/myapp/env # Load KEY=VALUE pairs from file
ExecStart=/opt/myapp/bin/server --port 8080
ExecReload=/bin/kill -HUP $MAINPID
Restart=on-failure # Restart if exits non-zero or crashes
RestartSec=5
StartLimitIntervalSec=60
StartLimitBurst=3 # Max 3 restarts in 60 seconds
# Security hardening
NoNewPrivileges=true
PrivateTmp=true # Isolated /tmp namespace
ProtectSystem=strict # Read-only /usr and /etc
ProtectHome=true
ReadWritePaths=/var/lib/myapp /var/log/myapp
# Resource limits
LimitNOFILE=65536
MemoryMax=2G
CPUQuota=200% # Max 2 CPU cores
[Install]
WantedBy=multi-user.target
systemd Timers
# Timer unit triggers a matching .service unit on a schedule
# Advantages over cron: journald logging, Persistent=true, resource limits, RandomizedDelay
# /etc/systemd/system/backup.timer
# [Unit]
# Description=Daily backup timer
# [Timer]
# OnCalendar=daily # Every day at midnight
# OnCalendar=*-*-* 02:30:00 # Every day at 02:30
# OnCalendar=Mon,Thu *-*-* 04:00 # Mon and Thu at 04:00
# RandomizedDelaySec=1800 # Random delay up to 30 minutes (spread load)
# Persistent=true # Run missed jobs on next boot
# [Install]
# WantedBy=timers.target
sudo systemctl enable --now backup.timer
systemctl list-timers --all # All timers with next trigger time
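A timer fires the service unit of the same name, so backup.timer needs a backup.service beside it. A minimal companion sketch; the script path and user are illustrative:

```ini
# /etc/systemd/system/backup.service
[Unit]
Description=Daily backup job

[Service]
Type=oneshot                         # Runs to completion each trigger; no daemon
ExecStart=/usr/local/bin/backup.sh
User=backup
Nice=10                              # Keep backups off the fast path

# No [Install] section needed: the timer activates it, not boot targets
```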
13. Disk & Storage
df & du
# df — mounted filesystem space
df -h # Human-readable
df -hT # Include filesystem type
df -i # Inode usage (can run out before disk space)
# du — file/directory usage
du -sh /var/log # Total size
du -sh /var/log/* # Per-item summary
du -h --max-depth=2 /var # Tree to 2 levels
du -ah /var | sort -rh | head -20 # Top 20 consumers under /var
# Find large files
find /var -type f -size +100M -exec ls -lh {} \;
# Interactive explorer
ncdu /var # apt install ncdu
lsblk, mount & fstab
# List block devices
lsblk # Tree view: disks, partitions, LVM, loop
lsblk -f # Include filesystem type, UUID, mountpoint
# Format
sudo mkfs.ext4 /dev/sdb1
sudo mkfs.xfs /dev/sdb1 # XFS preferred for large files and databases
# Mount
sudo mount /dev/sdb1 /mnt/data
sudo mount -o ro /dev/sdb1 /mnt/data # Read-only mount
sudo umount /mnt/data
sudo umount -l /mnt/data # Lazy — detach now, clean up when idle
# Show mounts
mount | column -t
findmnt # Tree view
# /etc/fstab — persistent mounts (loaded at boot)
# Format: device mountpoint type options dump fsck-order
# UUID=abc123 /mnt/data ext4 defaults,noatime 0 2
# Get UUID
sudo blkid /dev/sdb1
# Test fstab without rebooting
sudo mount -a
# Common mount options:
# noatime = skip access time updates (significant I/O reduction)
# noexec = prevent binary execution (security hardening for /tmp, user uploads)
# nosuid = ignore SUID/SGID bits (security hardening)
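Besides mount -a, util-linux (2.25+) can lint fstab without touching any mounts:

```shell
# Dry-run validation of /etc/fstab entries
sudo findmnt --verify            # parse every entry, report problems
sudo findmnt --verify --verbose  # also show per-entry success
```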
LVM
# Inspect
pvdisplay && vgdisplay && lvdisplay
lvs # Compact summary
# Extend an LV online (no unmount needed with ext4 or xfs)
sudo lvextend -L +10G /dev/vg0/data # Add 10GB
sudo lvextend -l +100%FREE /dev/vg0/data # Use all VG free space
sudo resize2fs /dev/vg0/data # Grow ext4 to fill LV
sudo xfs_growfs /mnt/data # Grow XFS (pass mount point)
# Snapshot for backup
sudo lvcreate -L 5G -s -n mydata-snap /dev/vg0/mydata
sudo mount -o ro /dev/vg0/mydata-snap /mnt/snap
# ... run backup ...
sudo umount /mnt/snap
sudo lvremove /dev/vg0/mydata-snap
14. SSH
Keys & Config
# Generate key — Ed25519 preferred (faster, smaller, more secure than RSA 4096)
ssh-keygen -t ed25519 -C "alice@work" -f ~/.ssh/id_ed25519_work
# Deploy public key to server
ssh-copy-id -i ~/.ssh/id_ed25519_work.pub [email protected]
# ~/.ssh/config — per-host connection settings
# Host web1
# HostName 10.0.1.50
# User alice
# IdentityFile ~/.ssh/id_ed25519_work
# Port 2222
#
# Host bastion
# HostName bastion.example.com
# User deploy
# ForwardAgent yes # Forward local SSH agent (keys usable on bastion)
#
# Host prod-* # Wildcard — matches prod-web1, prod-db1, etc.
# ProxyJump bastion # Auto-jump through bastion host
# User deploy
ssh web1 # Connects using config above
ssh prod-db1 # Jumps through bastion automatically
# SSH agent — cache passphrase in memory
eval "$(ssh-agent -s)"
ssh-add ~/.ssh/id_ed25519_work
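To confirm what the config above actually resolves to for a given host alias, ssh can print its merged settings without connecting:

```shell
# -G prints the effective config for a host (defaults + matching Host blocks)
ssh -G web1 | grep -Ei '^(hostname|user|identityfile|proxyjump|port) '
```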
Tunneling
# Local port forwarding — bring a remote service to a local port
ssh -L 5433:localhost:5432 alice@db-server
# psql -h 127.0.0.1 -p 5433 now reaches db-server's local Postgres
# Non-interactive background tunnel
ssh -fNL 5433:localhost:5432 alice@db-server # -f=background, -N=no shell
# Tunnel to a third host reachable from the SSH server
ssh -L 8080:internal-app.internal:80 alice@bastion
# Remote port forwarding — expose a local service on a remote port
ssh -R 8080:localhost:3000 alice@server # server:8080 -> local:3000
# Dynamic SOCKS proxy — proxy any TCP traffic through SSH
ssh -D 1080 -fN alice@server # SOCKS5 on localhost:1080
# ProxyJump — multi-hop SSH
ssh -J bastion.example.com alice@internal-server
# Agent forwarding — use your local private keys while on the remote server
ssh -A alice@bastion # Only for fully trusted hosts
scp & rsync
# scp — simple copy (no delta, no resume)
scp file.txt alice@server:/tmp/
scp alice@server:/var/log/app.log /tmp/
scp -r ./project alice@server:/opt/
scp -P 2222 file.txt alice@server:/tmp/ # Non-default port
# rsync — efficient sync with delta transfer
rsync -avz ./src/ alice@server:/opt/myapp/
# -a = archive (recursive + preserve permissions, timestamps, symlinks)
# -v = verbose, -z = compress during transfer
rsync -avz --delete ./src/ server:/opt/myapp/ # Mirror (delete remote extras)
rsync -avz --exclude='*.log' --exclude='.git' ./src/ server:/opt/
rsync -avz -e "ssh -p 2222" ./src/ server:/opt/ # Custom port
rsync -avz --progress big.tar.gz server:/tmp/ # Show per-file progress
# Dry run first
rsync -avzn --delete ./src/ server:/opt/myapp/ # -n = simulate only
sshd Hardening
# /etc/ssh/sshd_config — key settings
# Test config BEFORE reloading: sudo sshd -t
# PermitRootLogin no # Never allow direct root login
# PasswordAuthentication no # Keys only — eliminates brute-force risk
# PubkeyAuthentication yes
# AllowUsers alice bob deploy # Explicit whitelist
# MaxAuthTries 3
# ClientAliveInterval 300 # Disconnect idle after 5 min
# ClientAliveCountMax 2
# X11Forwarding no
# AllowTcpForwarding yes # Required for tunnels; set no if unneeded
# Reload without locking yourself out
sudo sshd -t && sudo systemctl reload sshd # Unit is named "ssh" on Debian/Ubuntu
# Verify effective settings
sudo sshd -T | grep -E "permitrootlogin|passwordauth|allowusers"
15. Cron & Scheduling
crontab Syntax
# Format: minute hour day-of-month month day-of-week command
# Range: 0-59 0-23 1-31 1-12 0-7 (0 and 7 both mean Sunday)
# Wildcards: * = any, */5 = every 5, 1,3,5 = list, 1-5 = range
crontab -e # Edit current user's crontab
crontab -l # List
crontab -r # Remove ALL (no confirmation prompt!)
sudo crontab -l -u alice # List another user's crontab (as root)
# Common schedule patterns
# 0 2 * * * Daily backup at 02:00
# */5 * * * * Every 5 minutes
# 0 9-17 * * 1-5 Top of every business hour, Mon-Fri
# 0 0 1 * * First day of each month at midnight
# @reboot Once on system boot
# @daily Midnight every day (alias)
# @hourly Every hour at :00 (alias)
# System crontabs (include user field)
# /etc/crontab and /etc/cron.d/myapp:
# minute hour day month weekday USER command
# Drop scripts directly into these directories (no crontab format):
# /etc/cron.daily/ /etc/cron.weekly/ /etc/cron.monthly/
Cron runs with a minimal PATH
Cron's PATH is usually just /usr/bin:/bin, so use absolute paths in every cron command. Always redirect output to a log file; otherwise cron tries to email it, and with no MTA configured the output vanishes silently: */5 * * * * /usr/local/bin/check.sh >>/var/log/check.log 2>&1
# Set variables at top of crontab for correct environment
# SHELL=/bin/bash
# PATH=/usr/local/sbin:/usr/local/bin:/sbin:/bin:/usr/sbin:/usr/bin
# MAILTO="" # Suppress email output
# Test in cron's restricted environment
env -i HOME=/root SHELL=/bin/bash PATH=/usr/local/bin:/usr/bin:/bin \
/usr/local/bin/myscript.sh
# Prevent overlapping runs with flock
*/5 * * * * /usr/bin/flock -n /var/lock/myjob.lock /usr/local/bin/myjob.sh
Cron vs systemd Timers
| Feature | cron | systemd timers |
|---|---|---|
| Logging | Email or redirect manually | Automatic journald integration |
| Missed runs | Silently skipped | Run on next boot with Persistent=true |
| Dependencies | None | Full unit dependency support |
| Random delay | Manual sleep in script | RandomizedDelaySec built-in |
| Resource limits | None | CPUQuota, MemoryMax, etc. |
| Status check | Grep syslog/email | systemctl status job.timer |
| Complexity | Single crontab line | Two unit files required |
16. Performance & Monitoring
vmstat, iostat, sar
# vmstat — VM stats at a glance
vmstat 1 10 # 1-second samples, 10 times
# Key columns:
# r = run queue (tasks waiting for CPU; high = CPU-bound)
# b = blocked on uninterruptible I/O
# si/so = swap in/out KB/s (nonzero = memory pressure)
# wa = iowait % (high >20% = I/O bottleneck)
# iostat — per-device disk I/O (sysstat package)
iostat -xz 1 # Extended stats; skip idle devices
# Key columns:
# r/s w/s = reads/writes per second
# rkB/s wkB/s = throughput KB/s
# await = avg wait time ms (high = disk bottleneck)
# %util = percent time busy (100% = saturated)
# sar — historical activity (collected every 10 min by sadc)
sar -u 1 10 # CPU utilization
sar -r 1 10 # Memory utilization
sar -b 1 10 # Block I/O stats
sar -n DEV 1 5 # Network interface throughput
sar -q 1 10 # Load average and run queue
# Today's CPU history from sadc
sar -u -f /var/log/sa/sa$(date +%d)
strace & perf
# strace — trace system calls made by a process
strace command # Trace new process
strace -p 1234 # Attach to running PID
strace -e trace=network command # Only network syscalls
strace -e openat,read,write -p 1234 # File I/O syscalls
strace -c command # Summary: time spent per syscall
strace -T -p 1234 # Time each individual syscall
# Diagnose a hung process
strace -p 1234 2>&1 | head -5
# futex(WAIT) = blocked on mutex/lock
# epoll_wait = event loop idle (normal)
# read on socket = waiting for network data
# perf — hardware performance counters (requires kernel perf support)
sudo perf stat command # HW event counts: cycles, cache misses, branches
sudo perf top # Live per-function CPU profiling
sudo perf record -F 99 -g -p 1234 -- sleep 30 # 30-second CPU profile
sudo perf report # Interactive TUI to browse recorded samples by function
sysctl & System Tuning
# Read / write kernel parameters
sysctl -a # All parameters
sysctl net.ipv4.tcp_max_syn_backlog # Read specific param
sudo sysctl -w net.ipv4.ip_forward=1 # Write immediately (not persistent)
# Persist in a drop-in file
sudo tee /etc/sysctl.d/99-production.conf <<'EOF'
fs.file-max = 2097152
net.core.somaxconn = 65535
net.ipv4.tcp_max_syn_backlog = 8192
net.ipv4.tcp_fin_timeout = 30
net.ipv4.tcp_tw_reuse = 1
vm.swappiness = 10
vm.overcommit_memory = 1
EOF
sudo sysctl -p /etc/sysctl.d/99-production.conf
# Per-process fd limits
ulimit -n # Current shell limit
ulimit -n 65536 # Raise for current session
# Persistent limits: /etc/security/limits.conf
# myapp soft nofile 65536
# myapp hard nofile 131072
Load Average & Memory
# Load average interpretation
uptime
# load average: 1.23, 0.87, 0.72 (1/5/15 minute)
# load / nCPU: <0.7 = healthy, 1.0 = at capacity, >1.0 = overloaded
nproc # Number of logical CPU cores
# Memory summary
free -h
# "available" is what matters (MemFree + reclaimable cache)
grep -E "MemTotal|MemAvailable|SwapTotal|SwapFree|Cached" /proc/meminfo
# Memory hogs
ps aux --sort=-%mem | head -10
# OOM events in kernel log
dmesg | grep -i "oom\|killed process\|out of memory"
# OOM score — 0-1000; higher = more likely to be killed
cat /proc/$(pgrep -n myapp)/oom_score
echo -300 | sudo tee /proc/$(pgrep -n myapp)/oom_score_adj # Protect it
17. Common Interview Scenarios
Debugging a Slow Server
# Step 1: 30-second high-level snapshot
uptime # Load — is CPU saturated?
free -h # Memory exhausted? Swap active?
df -h # Any filesystem full?
iostat -xz 1 3 # Any disk %util near 100%?
# Step 2: Find the hot processes
ps aux --sort=-%cpu | head -10
ps aux --sort=-%mem | head -10
# Step 3: If iowait is high — who is doing I/O?
iotop -o # apt install iotop
iostat -xz 1 | grep -v "^$"
# Step 4: Network issues?
ss -tnp # Connection states — lots of CLOSE_WAIT or TIME_WAIT?
netstat -s | grep -E "retransmit|failed"
# Step 5: Application layer
journalctl -u myapp --since "10 min ago" -p warning --no-pager
tail -f /var/log/myapp/app.log | grep -E "SLOW|timeout|error|WARN"
Disk Full Recovery
# Identify the full filesystem
df -h
# Find the culprits
du -ah /var | sort -rh | head -20
find /var/log -type f -size +100M -exec ls -lh {} \;
# Quick wins — safe to clean
sudo apt clean # Cached .deb packages
sudo dnf clean all # Cached .rpm packages
sudo journalctl --vacuum-size=100M # Trim journal to 100MB
gzip /var/log/*.log.{1,2} # Compress old rotated logs
find /tmp -mtime +1 -delete # Clear old /tmp files
# Truncate a log file without restarting the writing process
> /var/log/app.log # Truncate in-place (writer's FD stays open)
sudo truncate -s 0 /var/log/app.log # Same effect when root is needed (plain > doesn't survive sudo)
# Find deleted-but-open files (space held until process restarts)
lsof | grep "(deleted)" | awk '{print $1, $2, $7}'
# Fix: restart the holding process
# Docker cleanup (if applicable)
docker system prune -a --volumes # CAUTION: removes unused images and volumes
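If the process holding a deleted file cannot be restarted, the space can often be reclaimed by truncating the file through its /proc fd entry. A sketch; the PID and fd number are illustrative and come from the lsof output:

```shell
# Find open-but-deleted files (+L1 = link count 0) under a path
sudo lsof +L1 /var/log
# Confirm which fd of the holder points at the deleted file
ls -l /proc/1234/fd | grep deleted
# Truncate through /proc -- frees the space; the process keeps its fd
sudo sh -c ': > /proc/1234/fd/5'
```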
OOM Debugging
# Confirm OOM happened
dmesg | grep -i "out of memory\|oom_kill\|killed process"
grep "Out of memory" /var/log/kern.log | tail -5
# OOM log shows:
# - Process name + PID that was killed
# - Memory state at time of kill (active, inactive, free, slab pages)
# - oom_score of all processes at the time
# Current memory consumers
ps aux --sort=-%mem | head -10
# Check swap
free -h
swapon --show
# Reduce OOM kill risk for a critical process
echo -500 | sudo tee /proc/$(pgrep -n postgres)/oom_score_adj # Less likely to die
# Add swap on a RAM-constrained server (EC2, VMs)
sudo fallocate -l 4G /swapfile
sudo chmod 600 /swapfile
sudo mkswap /swapfile
sudo swapon /swapfile
# Make persistent: echo '/swapfile none swap sw 0 0' | sudo tee -a /etc/fstab
Network Connectivity Debugging
# Layer-by-layer approach
# L3 — Is network configured?
ip addr # Do we have an IP address?
ip route # Do we have a default gateway?
ping -c 3 $(ip route | awk '/default/{print $3}') # Can we reach the gateway?
# L3 — DNS working?
dig +short google.com # Does DNS resolve?
dig @8.8.8.8 +short google.com # Test against Google's DNS (bypass local resolver)
cat /etc/resolv.conf # What DNS servers are configured?
# L4 — Remote port reachable?
nc -zv api.example.com 443 # TCP handshake test
curl -sv --connect-timeout 5 https://api.example.com 2>&1 | head -20
# L4 — Is our service actually listening?
ss -tlnp | grep :8080 # Is the port bound?
# Firewall blocking?
sudo iptables -L INPUT -n -v --line-numbers
sudo ufw status verbose
# Path analysis
traceroute api.example.com
mtr --report api.example.com # Combined ping + traceroute (apt install mtr)
# Capture to confirm packets
sudo tcpdump -i eth0 -n host api.example.com and port 443 -c 20
Runaway / High-CPU Process
# Find it
ps aux --sort=-%cpu | head -5
top -b -n 1 | head -15
# Gather info before acting
PID=12345
cat /proc/$PID/cmdline | tr '\0' ' ' # Full command line
cat /proc/$PID/status | grep -E "Name|Uid|VmRSS|Threads"
ls -la /proc/$PID/exe # Binary path
ls /proc/$PID/fd | wc -l # Open file descriptor count
# Profile it briefly before killing
timeout -s INT 5 strace -c -p $PID # 5-second syscall summary (SIGINT makes strace print totals)
# Graceful termination first
kill -TERM $PID
sleep 10
# Force kill only if still alive
kill -0 $PID 2>/dev/null && kill -KILL $PID
# Reduce impact without killing (while investigating)
renice +15 -p $PID # Lower scheduling priority
cpulimit -p $PID -l 25 # Limit to 25% CPU (apt install cpulimit)
Production incident triage script
#!/usr/bin/env bash
# incident-triage.sh — Collect system snapshot during a production incident
set -euo pipefail
OUTDIR="/tmp/incident-$(hostname)-$(date +%Y%m%d_%H%M%S)"
mkdir -p "$OUTDIR"
collect() {
local name="$1"; shift
echo " Collecting $name..."
"$@" > "$OUTDIR/${name}.txt" 2>&1 || true
}
echo "Collecting system state to $OUTDIR ..."
collect "01-uptime" uptime
collect "02-date-uname" bash -c 'date; uname -a'
collect "03-df" df -h
collect "04-free" free -h
collect "05-vmstat" vmstat 1 5
collect "06-iostat" iostat -xz 1 5
collect "07-ps-cpu" ps aux --sort=-%cpu
collect "08-ps-mem" ps aux --sort=-%mem
collect "09-top" top -b -n 1
collect "10-ss-listen" ss -tlnp
collect "11-ss-conns" ss -tnp state established
collect "12-ip-addr" ip addr
collect "13-ip-route" ip route
collect "14-dmesg" dmesg --time-format iso | tail -200
collect "15-journal-errors" journalctl -p err --since "2 hours ago" --no-pager
collect "16-file-nr" cat /proc/sys/fs/file-nr
collect "17-lsof-count" bash -c 'lsof 2>/dev/null | wc -l'
collect "18-sar-history" sar -q -f /var/log/sa/sa$(date +%d) 2>/dev/null || true
tar -czf "${OUTDIR}.tar.gz" -C /tmp "$(basename "$OUTDIR")"
rm -rf "$OUTDIR"
echo "Done: ${OUTDIR}.tar.gz"
echo "Share this archive with the on-call team for postmortem analysis."