1. Setup & Environment

Linux Practice on macOS

macOS is POSIX-compliant but differs from Linux in important ways. For accurate Linux practice, run a real Linux environment locally.

Option A: Docker (Fastest)

# Interactive Ubuntu shell — disposable, no setup
docker run --rm -it ubuntu:24.04 bash

# With your current directory mounted
docker run --rm -it -v "$(pwd)":/work -w /work ubuntu:24.04 bash

# Persistent container you can stop/start
docker run -it --name linux-lab ubuntu:24.04 bash
docker start -ai linux-lab

# Debian with common tools pre-installed
docker run --rm -it debian:bookworm bash
apt-get update && apt-get install -y procps iproute2 net-tools curl vim

Option B: Multipass (Ubuntu VMs, native performance)

# Install
brew install multipass

# Launch Ubuntu 24.04 LTS VM (2 CPU, 4GB RAM, 20GB disk)
multipass launch --name lab --cpus 2 --memory 4G --disk 20G 24.04

# Shell into it
multipass shell lab

# Mount local directory
multipass mount ~/projects lab:/projects

# List / stop / delete
multipass list
multipass stop lab
multipass delete lab && multipass purge
When to use each
Docker is fine for filesystem, text processing, and scripting practice. Use Multipass when you need a full init system (systemd), real network interfaces, or kernel-level features like perf and iptables — Docker containers share the host kernel and lack systemd by default.
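A quick way to tell which kind of environment you are in (a sketch): look at PID 1. A Multipass VM runs systemd as init; a stock container usually shows bash or sh.

```shell
# Print the name of PID 1 — "systemd" means a full init system is present
ps -p 1 -o comm=
```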

macOS vs Linux Differences That Bite

Area              macOS (BSD-based)                            Linux (GNU)
sed -i            Requires empty string: sed -i '' 's/a/b/' f  sed -i 's/a/b/' f
date              BSD date, no -d flag                         GNU date: date -d '2 days ago'
ls --color        Not supported (use -G)                       ls --color=auto
grep -P           No Perl regex by default                     Supported natively
readlink -f       Not available (use realpath)                 Supported
xargs -r          Not supported                                Skips exec if stdin is empty
/proc             Does not exist                               Virtual filesystem exposing kernel state
Package manager   Homebrew (3rd party)                         apt / dnf / pacman
Default shell     zsh (since Catalina)                         bash (most distros)
Install GNU tools on macOS
brew install coreutils findutils gnu-sed gawk grep — then use gdate, gsed, ggrep, etc., or prepend the gnubin paths to PATH to shadow the BSD versions transparently.
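A hedged sketch of the PATH approach — the package names are the Homebrew formulas above, and the loop is a harmless no-op on machines without brew or without a given gnubin directory:

```shell
# Prepend each formula's gnubin dir so plain sed/date/grep resolve to the GNU versions
if command -v brew >/dev/null 2>&1; then
  for pkg in coreutils findutils gnu-sed grep; do
    gnubin="$(brew --prefix)/opt/$pkg/libexec/gnubin"
    [ -d "$gnubin" ] && PATH="$gnubin:$PATH"
  done
  export PATH
fi
```

Put this in ~/.zshrc (or ~/.bashrc) so the shadowing persists across shells.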

2. Filesystem Hierarchy

FHS — Key Directories

Directory         Purpose
/bin, /sbin       Essential binaries (now usually symlinks to /usr/bin on modern distros)
/usr/bin          User commands — grep, python3, git
/usr/local/bin    Locally installed software (Homebrew on macOS, manual installs on Linux)
/etc              System-wide configuration (text files, human-editable)
/var              Variable data — logs (/var/log), spool, caches, databases
/tmp              Temporary files; cleared on reboot (usually tmpfs in RAM)
/home             User home directories (/root for root)
/proc             Virtual FS exposing kernel and process state as files
/sys              Virtual FS for kernel objects — devices, drivers, power management
/dev              Device files — block (sda), char (tty), pseudo (null, zero, urandom)
/run              Runtime data since last boot — PID files, sockets (tmpfs)
/opt              Self-contained optional packages (e.g., /opt/google/chrome)
/lib, /lib64      Shared libraries needed by /bin and /sbin
/boot             Kernel image, initrd, GRUB bootloader
/mnt, /media      Mount points for temporary and removable filesystems

Everything Is a File

The Unix philosophy treats hardware, processes, and kernel state as files — making them composable with standard text tools.

# CPU info from kernel
cat /proc/cpuinfo | grep -m1 "model name"
nproc                          # Number of logical CPUs

# Memory info
cat /proc/meminfo | head -5

# Process info — every PID has a /proc/$PID directory
ls /proc/$$                    # $$ = current shell PID
cat /proc/$$/cmdline | tr '\0' ' '
cat /proc/$$/status | grep -E "^(Name|Pid|VmRSS)"

# Kernel tunable parameters
cat /proc/sys/net/ipv4/ip_forward
sysctl net.ipv4.ip_forward

# Hardware via /sys
cat /sys/class/net/eth0/speed  # NIC speed in Mbps
ls /sys/block/                 # Block devices

# Pseudo-devices
dd if=/dev/zero bs=1M count=100 of=/dev/null   # Throughput benchmark
dd if=/dev/urandom bs=16 count=1 | xxd         # 16 random bytes
echo "discard this" > /dev/null                # Suppress output
/proc is live kernel state
Reading /proc/meminfo does not read from disk — the kernel generates the content on-demand each time you open the file. Changes to /proc/sys/* take effect immediately but are not persistent across reboots. Use sysctl -w + /etc/sysctl.conf (or a file in /etc/sysctl.d/) for persistence.
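You can observe the on-demand generation directly — two reads of the same /proc file a second apart return different content (Linux only; a small sketch):

```shell
# /proc/uptime holds seconds since boot; the kernel regenerates it on every open()
first=$(cut -d' ' -f1 /proc/uptime)
sleep 1
second=$(cut -d' ' -f1 /proc/uptime)
echo "$first -> $second"       # Second value is larger — nothing was cached
```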

Inodes and Links

# Every file has an inode — metadata record (permissions, timestamps, block pointers)
# A directory entry is just a name-to-inode mapping

stat myfile.txt                # Full inode metadata
ls -li /etc/passwd             # -i shows inode number

# Hard link: another directory entry pointing to the same inode
ln /etc/passwd /tmp/passwd-hard
# Both names share the same inode; deleting one leaves the other intact
# Hard links cannot cross filesystem boundaries; cannot link directories

# Soft (symbolic) link: a file whose content is a path
ln -s /etc/nginx/nginx.conf nginx.conf
ls -la nginx.conf              # Shows -> /etc/nginx/nginx.conf
readlink -f nginx.conf         # Resolve all symlinks to absolute path

# Find files by inode (useful when filename has unprintable chars)
find / -inum 12345 2>/dev/null
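The hard-link semantics above are easy to verify in a scratch directory (a sketch using mktemp):

```shell
# Hard links share one inode: the link count rises to 2, and the data
# survives deletion of the original name
cd "$(mktemp -d)"
echo "data" > original.txt
ln original.txt hardlink.txt
stat -c '%h' original.txt      # Link count: 2
rm original.txt
cat hardlink.txt               # Still prints: data
```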

3. File Operations

ls & Navigation

ls -lah                        # Long format, all files, human-readable sizes
ls -lt                         # Sort by modification time (newest first)
ls -ltr                        # Oldest first (useful for log directories)
ls -lS                         # Sort by size (largest first)
ls -d */                       # List only directories
ls -1                          # One file per line (for scripting)

# Better alternatives
tree -L 2                      # Directory tree, 2 levels deep
tree -L 2 -I 'node_modules|.git'  # Exclude patterns

# Navigate
cd -                           # Jump to previous directory
pushd /tmp                     # Push to directory stack
popd                           # Pop back
dirs -v                        # Show directory stack

cp, mv, rm

# cp — preserve timestamps, ownership, permissions
cp -a src/ dst/                # Archive mode: recursive + preserve all metadata
cp -r src/ dst/                # Recursive (no metadata preservation)
cp -u src dst                  # Copy only if src is newer than dst
cp --backup=numbered f dst/    # Keep numbered backups on overwrite

# mv
mv oldname newname             # Rename or move
mv -n src dst                  # No clobber — never overwrite existing
mv -v *.log /archive/          # Verbose

# rm — no trash, no undo
rm -rf dir/                    # Delete recursively, no prompt
rm -i *.tmp                    # Interactive prompt per file
rm -- -weird-filename          # -- treats args as filenames, not flags
rm -rf has no undo
There is no Recycle Bin on Linux. Always double-quote variables in scripts (rm -rf "$dir") to prevent word splitting — but quoting alone won't save you from an unset variable: rm -rf "$dir"/ with an empty $dir becomes rm -rf /, which is a very bad day. Use "${dir:?}" to abort instead of expanding to nothing. GNU rm has --preserve-root (default) to block rm -rf /.
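The ${var:?} guard in action — a sketch (dir is a stand-in name): the expansion errors out before rm ever runs.

```shell
unset dir
# ${dir:?msg} makes the shell abort the expansion instead of yielding ""
( rm -rf "${dir:?refusing to delete: dir is unset}" ) 2>/dev/null || echo "blocked"
```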

find

# Syntax: find [path] [expression]
find . -name "*.log"                      # By name (case-sensitive)
find . -iname "*.Log"                     # Case-insensitive
find . -type f                            # Regular files only
find . -type d                            # Directories only
find . -type l                            # Symlinks only

# By size
find /var/log -size +100M                 # Larger than 100 MB
find . -size -1k                          # Smaller than 1 KB
find . -empty                             # Empty files and dirs

# By time (n = days, +n = older than n days, -n = newer)
find /tmp -mtime +7                       # Modified more than 7 days ago
find . -mmin -60                          # Modified in the last 60 minutes
find . -newer reference.txt               # Newer than reference.txt

# By permissions
find / -perm -4000 2>/dev/null            # SUID files (security audit)
find . -perm /022                         # World- or group-writable

# Execute actions
find /tmp -mtime +7 -delete               # Delete old files
find . -name "*.py" -exec wc -l {} \;    # Count lines in each .py file
find . -name "*.py" -exec grep -l "TODO" {} \;  # Files containing TODO

# xargs — more efficient than -exec for many files
find . -name "*.log" | xargs rm -f
find . -name "*.py" | xargs grep -l "import os"
find . -name "*.txt" -print0 | xargs -0 wc -l  # -print0/-0 handles spaces in names

# Prune (skip) directories
find . -name node_modules -prune -o -name "*.js" -print

Globbing & Brace Expansion

# Standard globs (shell expands these before passing to command)
*.log           # Any file ending in .log
file?.txt       # file1.txt, fileA.txt (exactly one character)
[abc].txt       # a.txt, b.txt, c.txt
[0-9]*.sh       # Scripts starting with a digit
[!a]*.log       # Files NOT starting with 'a'

# Extended globs (enable with: shopt -s extglob)
!(*.log)        # Everything except .log files
+(*.tar|*.gz)   # One or more .tar or .gz files

# Brace expansion — no filesystem lookup, purely syntactic
echo file{1,2,3}.txt           # file1.txt file2.txt file3.txt
echo {a..z}                    # a b c ... z
mkdir -p project/{src,tests,docs,scripts}
cp config.yaml{,.bak}              # Expands to: cp config.yaml config.yaml.bak
cp config.yaml{,.$(date +%Y%m%d)}  # Dated backup copy

# Globstar — recursive glob (enable with: shopt -s globstar)
ls **/*.py                     # All Python files in any subdirectory

4. File Permissions & Ownership

Permission Model

# ls -l output: -rwxr-xr-x  1  owner  group  size  date  name
#               ^ file type (- = regular, d = dir, l = symlink, b = block dev, c = char dev)
#                ^rwx  = owner (user) permissions
#                   ^r-x = group permissions
#                      ^r-x = other (world) permissions

# Permission bits: r=4, w=2, x=1
# rwx=7, r-x=5, r--=4, ---=0

# Show octal permissions
stat -c "%a %n" /usr/bin/sudo  # e.g., 4755 /usr/bin/sudo

# Directory permissions behave differently from files:
# r = can list directory contents (ls)
# w = can create, delete, rename files WITHIN the directory
# x = can traverse (cd into it, or access files by name)

chmod & chown

# chmod — symbolic mode
chmod u+x script.sh            # Add execute for owner
chmod go-w file.txt            # Remove write from group and other
chmod a+r public.html          # Add read for all (a = ugo)
chmod u=rwx,g=rx,o= script.sh  # Set exact permissions (no access for other)

# chmod — octal mode
chmod 755 script.sh            # rwxr-xr-x (typical executable)
chmod 644 config.txt           # rw-r--r-- (typical data file)
chmod 600 ~/.ssh/id_rsa        # rw------- (private key — SSH will reject if looser)
chmod 700 ~/.ssh               # rwx------ (SSH directory)
chmod -R 755 /var/www/html     # Recursive

# chown — change owner and/or group
chown alice file.txt
chown alice:developers file.txt
chown -R www-data:www-data /var/www
chown :docker /var/run/docker.sock   # Change group only (owner unchanged)

umask

# umask defines permission bits to REMOVE from newly created files/dirs
# New file default: 666 (rw-rw-rw-)   New dir default: 777 (rwxrwxrwx)
# umask 022 removes: ----w--w-  (group and other write)
# Result: files=644, dirs=755

umask                          # Show current umask (e.g., 0022)
umask 027                      # Files=640, dirs=750 (group read, no other access)
umask 077                      # Files=600, dirs=700 (owner only — useful for secrets)

# Verify effect
umask 022; touch testfile; stat -c "%a" testfile; rm testfile  # Should print 644

SUID, SGID, Sticky Bit

# SUID (Set User ID) — bit 4 on executable
# The process runs as the FILE OWNER, not the calling user
ls -l /usr/bin/passwd          # -rwsr-xr-x  (s = SUID set, execute bit also set)
chmod u+s /usr/bin/myapp       # Set SUID
chmod 4755 /usr/bin/myapp      # Octal: 4=SUID + 755

# SGID (Set Group ID) — bit 2
# On executable: runs as file GROUP
# On directory: new files/dirs inherit the directory's group (great for shared dirs)
chmod g+s /shared/             # Set SGID on directory
chmod 2775 /shared/            # Octal: 2=SGID + 775
ls -ld /shared/                # drwxrwsr-x  (s in group execute position)

# Sticky bit — bit 1
# On directory: users can only delete their OWN files, even if they have write on dir
ls -ld /tmp                    # drwxrwxrwt  (t = sticky bit set)
chmod +t /shared/uploads/
chmod 1777 /tmp/               # Octal: 1=sticky + 777
SUID on shell scripts is silently ignored
Linux ignores the SUID bit on interpreted scripts (bash, python, etc.) for security reasons — only compiled binaries honour it. Grant specific script privileges via sudo rules in /etc/sudoers instead.
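A sudoers rule can grant a single script root privileges instead — a hypothetical sketch (the deploy user and script path are illustrative); always validate such files with visudo before installing them:

```
# /etc/sudoers.d/deploy   (check syntax with: visudo -cf /etc/sudoers.d/deploy)
# Lets user "deploy" run exactly one script as root, with no password, and nothing else
deploy ALL=(root) NOPASSWD: /usr/local/bin/deploy.sh
```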

ACLs (Access Control Lists)

# ACLs allow per-user/per-group permissions beyond the owner/group/other triplet
# Requires filesystem mounted with 'acl' option (default on ext4/xfs on modern distros)

getfacl /etc/myapp/config.yaml           # View ACLs

setfacl -m u:alice:rw /etc/myapp/config.yaml       # Grant alice read+write
setfacl -m g:devops:rx /usr/local/bin/deploy.sh    # Grant devops group r-x
setfacl -d -m g:developers:rw /var/www/html/       # Default ACL (inherited by new files)

setfacl -x u:alice /etc/myapp/config.yaml          # Remove alice's ACL entry
setfacl -b /etc/myapp/config.yaml                  # Remove ALL ACLs

# A '+' in ls -l output indicates ACLs are present:
ls -l /etc/myapp/config.yaml   # -rw-rw-r--+

5. Text Processing

grep

# Basic
grep "error" /var/log/syslog
grep -i "error" file.txt       # Case-insensitive
grep -v "debug" file.log       # Invert match (exclude)
grep -r "TODO" src/            # Recursive
grep -rl "TODO" src/           # Print matching filenames only

# Context around matches
grep -A 3 "Exception" app.log  # 3 lines After
grep -B 2 "Exception" app.log  # 2 lines Before
grep -C 5 "Exception" app.log  # 5 lines Context (before + after)

# Counts and line numbers
grep -c "ERROR" app.log        # Count of matching lines
grep -n "ERROR" app.log        # Show line numbers
grep -m 10 "ERROR" app.log     # Stop after 10 matches

# Regex
grep -E "error|warn|crit" app.log   # Extended regex (ERE)
grep -P "\d{3}-\d{4}" phones.txt    # Perl-compatible regex (GNU grep)
grep "^ERROR" app.log               # Lines starting with ERROR
grep "\.py$" filelist.txt           # Lines ending with .py

# Multiple patterns
grep -e "error" -e "warn" app.log
grep -f patterns.txt app.log        # Patterns from a file

# Practical combos
grep -rn "deprecated" --include="*.py" src/
grep -rn "password" --exclude-dir=".git" .
zgrep "ERROR" /var/log/app.log.gz   # Search inside gzip files

sed

# Substitution: s/pattern/replacement/flags
sed 's/foo/bar/' file.txt             # Replace first occurrence per line
sed 's/foo/bar/g' file.txt            # Replace all (global)
sed 's/foo/bar/gi' file.txt           # Global + case-insensitive (GNU sed)
sed -i 's/foo/bar/g' file.txt         # In-place edit
sed -i.bak 's/foo/bar/g' file.txt     # In-place with .bak backup

# Address ranges
sed '3s/foo/bar/' file.txt            # Only line 3
sed '2,5s/foo/bar/g' file.txt         # Lines 2-5
sed '/^#/d' file.txt                  # Delete comment lines
sed '/start/,/end/d' file.txt         # Delete from 'start' to 'end' pattern
sed -n '10,20p' file.txt              # Print only lines 10-20

# Delete and print
sed '/^$/d' file.txt                  # Delete blank lines
sed '1d' file.txt                     # Delete first line (skip CSV header)
sed '$d' file.txt                     # Delete last line

# Practical extractions
sed -n 's/.*error: \(.*\)/\1/p' app.log    # Extract text after "error: "
sed 's/[[:space:]]*$//' file.txt           # Strip trailing whitespace
sed 's/^\s*//;s/\s*$//' file.txt           # Strip leading and trailing whitespace

awk

# awk processes line by line; splits each line into fields
# Built-ins: $0=whole line, $1..$NF=fields, NF=field count, NR=record/line number

# Print specific columns
awk -F: '{print $1, $3}' /etc/passwd      # Username and UID (colon-delimited)
awk -F: '{print $1}' /etc/passwd          # Usernames only

# Conditions
awk -F: '$3 > 1000 {print $1}' /etc/passwd       # Regular users (UID > 1000)
awk -F: '$7 == "/bin/bash" {print $1}' /etc/passwd  # Users with bash shell
awk '/ERROR/ {print NR, $0}' app.log               # Line number + line for errors

# Arithmetic and aggregation
awk '{sum += $1} END {print "Total:", sum}' numbers.txt
awk -F, '{sum += $5} END {printf "Revenue: $%.2f\n", sum}' sales.csv

# BEGIN and END blocks
awk 'BEGIN {print "=== Report ==="} {print NR, $0} END {print "Lines:", NR}' file.txt

# Field manipulation
awk '{$2="REDACTED"; print}' log.txt              # Replace field 2
awk -F, 'OFS="," {$3=$3*1.1; print}' prices.csv  # Increase column 3 by 10%

# Parse nginx access log — top 20 requested URLs
awk '{print $7}' /var/log/nginx/access.log | sort | uniq -c | sort -rn | head -20

# Compute 95th percentile latency from log file (latency in ms, field 10)
awk '{print $10}' access.log | sort -n | awk 'BEGIN{c=0} {lines[c++]=$0} END{print lines[int(c*0.95)]}'

cut, sort, uniq, tr, wc

# cut — extract columns
cut -d: -f1,3 /etc/passwd         # Fields 1 and 3, colon-delimited
cut -d, -f2-5 data.csv            # Fields 2 through 5
cut -c1-10 file.txt               # First 10 characters per line

# sort
sort file.txt                      # Lexicographic ascending
sort -r file.txt                   # Reverse
sort -n numbers.txt                # Numeric sort (essential — "10" > "9" numerically)
sort -rn numbers.txt               # Numeric descending
sort -t: -k3 -n /etc/passwd        # Sort by 3rd field (UID), numeric
sort -k2,2 -k1,1 file.txt          # Sort by field 2, then field 1
sort -u file.txt                   # Sort + deduplicate in one pass
sort --parallel=4 huge.txt         # Parallel sort for large files

# uniq (input must be sorted first)
sort file.txt | uniq               # Remove consecutive duplicates
sort file.txt | uniq -c            # Count occurrences (most useful)
sort file.txt | uniq -d            # Show only lines that appear more than once
sort file.txt | uniq -u            # Show only lines that are truly unique

# tr — translate or delete characters
tr 'a-z' 'A-Z' <<< "hello world"   # Uppercase
tr -d '\r' < windows.txt            # Remove Windows carriage returns (CRLF -> LF)
tr -s ' ' <<< "hello   world"       # Squeeze repeated spaces to one
tr -dc '[:alnum:]' < /dev/urandom | head -c 32  # 32-char random alphanumeric string

# wc
wc -l file.txt                     # Line count
wc -w file.txt                     # Word count
wc -c file.txt                     # Byte count
find . -name "*.py" | xargs wc -l | tail -1  # Total lines in all Python files

diff & jq

# diff
diff file1.txt file2.txt           # Default output
diff -u file1.txt file2.txt        # Unified format (standard for patches)
diff -r dir1/ dir2/                # Recursive directory diff
diff --color=always old new | less -R  # Colorized pager

# Apply a patch
diff -u original.py modified.py > changes.patch
patch original.py < changes.patch

# jq — JSON processor (apt install jq)
echo '{"name":"alice","age":30}' | jq '.name'              # "alice"
echo '{"items":[1,2,3]}' | jq '.items[]'                   # Each element
cat data.json | jq '.users[] | select(.active == true)'    # Filter array
cat data.json | jq '[.users[] | {name, email}]'            # Reshape objects
cat data.json | jq '.users | length'                       # Array length

# jq flags
jq -r '.name' data.json            # Raw output — no surrounding quotes
jq -c '.' data.json                # Compact single-line output
jq '.' data.json                   # Pretty-print (format) JSON
jq --arg key "value" '.[$key]' data.json  # Pass shell variable as jq variable

# Real-world: extract latest GitHub release tag
curl -s https://api.github.com/repos/cli/cli/releases/latest \
  | jq -r '.tag_name'

6. I/O Redirection & Pipes

Standard Streams

# File descriptors: 0=stdin  1=stdout  2=stderr

# Redirect stdout
command > file.txt             # Overwrite
command >> file.txt            # Append

# Redirect stderr
command 2> errors.txt          # stderr to file
command 2>&1                   # Merge stderr into stdout
command > out.txt 2>&1         # Both to file (stdout first, then merge stderr)
command &> out.txt              # Shorthand (bash only)
command 2>/dev/null             # Discard errors silently
command >/dev/null 2>&1        # Discard all output

# Redirect stdin
command < input.txt
command < input.txt > output.txt

# tee — write to file AND stdout simultaneously
command | tee output.txt       # stdout to terminal + file
command | tee -a output.txt    # Append mode
command 2>&1 | tee all.log    # Capture everything

Pipes & Process Substitution

# Pipes connect stdout of left command to stdin of right command
ps aux | grep nginx | grep -v grep | awk '{print $2}'   # nginx PIDs

# Merge stderr into pipeline
command 2>&1 | grep ERROR

# Process substitution — treat command output as a file argument
diff <(sort file1.txt) <(sort file2.txt)       # Compare sorted without temp files
comm <(sort a.txt) <(sort b.txt)               # Lines in a only / b only / both
cat <(head -5 file1.txt) <(tail -5 file2.txt)  # Concatenate two command outputs

# Named pipes (FIFOs)
mkfifo /tmp/mypipe
command1 > /tmp/mypipe &       # Write in background
command2 < /tmp/mypipe         # Read from pipe; blocks until writer is done
rm /tmp/mypipe

# pipefail — exit if any command in pipeline fails
set -o pipefail
# Without it, only the last command's exit code is checked:
false | true; echo $?          # Prints 0 (hides the failure!)
set -o pipefail
false | true; echo $?          # Prints 1 (correct)

Here Documents & Here Strings

# Here document — multi-line stdin
cat <<EOF
Line one with $HOME expanded
Line two
EOF

# Quoted heredoc — disable all expansion
cat <<'EOF'
The variable $HOME will NOT be expanded here.
EOF

# Indented heredoc — strip leading tabs (use real TAB chars, not spaces)
cat <<-EOF
	Indented line one
	Indented line two
EOF

# Redirect heredoc to a file
cat > /etc/myapp/config.yaml <<EOF
database:
  host: ${DB_HOST}
  port: ${DB_PORT:-5432}
EOF

# Here string — single value as stdin
grep "^root" <<< "$(cat /etc/passwd)"
base64 <<< "hello world"

# Send commands to interactive program
mysql -u root <<EOF
USE mydb;
SELECT COUNT(*) FROM users WHERE active = 1;
EOF

7. Shell Scripting (Bash)

Script Header & Safety Flags

#!/usr/bin/env bash
# Use /usr/bin/env bash for portability (bash may not be at /bin/bash)

set -euo pipefail
# -e  exit immediately on any command returning non-zero
# -u  treat unset variables as errors (prevents silent empty-string bugs)
# -o pipefail  make pipeline fail if any stage fails (not just the last)

# Safer IFS for word splitting
IFS=$'\n\t'

# Cleanup trap — runs on exit (including error exit)
TMPDIR_WORK=$(mktemp -d)
cleanup() {
  rm -rf "$TMPDIR_WORK"
}
trap cleanup EXIT

# Error trap with line number
trap 'echo "Error on line $LINENO" >&2' ERR

# Script directory (works when called from any working directory)
SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"

Variables & Parameter Expansion

# Assignment — NO spaces around =
name="Alice"
count=0
readonly CONFIG="/etc/app/config.yaml"   # Immutable variable

# Always double-quote expansions to prevent word splitting and glob expansion
echo "$name"                    # Correct
echo $name                      # Risky — splits on whitespace, expands globs

# Parameter expansion
echo "${name}"                  # Unambiguous (required before letters/digits)
echo "${name:-default}"         # Use 'default' if name is unset or empty
echo "${name:=default}"         # Assign 'default' if unset, then expand
echo "${name:?Error: required}" # Exit with error if name is unset/empty
echo "${name:+set}"             # Expand to 'set' if name is non-empty; else empty

# String operations
file="/path/to/archive.tar.gz"
echo "${file##*/}"              # archive.tar.gz  (basename — strip up to last /)
echo "${file%/*}"               # /path/to        (dirname — strip from last /)
echo "${file%%.*}"              # /path/to/archive (strip from first .)
echo "${file#*.}"               # tar.gz          (strip up to first .)
echo "${#file}"                 # 24              (string length)
echo "${file/tar/TAR}"          # Replace first match
echo "${file//a/A}"             # Replace all matches
echo "${name^^}"                # UPPERCASE (bash 4+)
echo "${name,,}"                # lowercase (bash 4+)

# Arrays
fruits=("apple" "banana" "cherry")
echo "${fruits[0]}"             # apple
echo "${fruits[@]}"             # All elements (space-separated)
echo "${#fruits[@]}"            # 3 (length)
fruits+=("date")                # Append element
for f in "${fruits[@]}"; do echo "$f"; done

# Associative arrays (bash 4+)
declare -A config
config[host]="localhost"
config[port]="5432"
for key in "${!config[@]}"; do echo "$key = ${config[$key]}"; done

Conditionals

# [[ ]] preferred in bash — no word splitting, no glob expansion, supports regex
if [[ "$name" == "Alice" ]]; then
  echo "Hello Alice"
elif [[ "$name" =~ ^Bob ]]; then   # =~ is regex match; capture in BASH_REMATCH
  echo "Hello ${BASH_REMATCH[0]}"
else
  echo "Hello stranger"
fi

# Common test operators
[[ -f "$file" ]]       # Regular file exists
[[ -d "$dir" ]]        # Directory exists
[[ -e "$path" ]]       # Any path exists
[[ -s "$file" ]]       # File exists and is non-empty
[[ -r "$file" ]]       # File is readable
[[ -w "$file" ]]       # File is writable
[[ -x "$file" ]]       # File is executable
[[ -L "$path" ]]       # Path is a symlink
[[ -z "$var" ]]        # String is empty
[[ -n "$var" ]]        # String is non-empty
[[ "$a" == "$b" ]]     # String equality
[[ "$n" -eq 0 ]]       # Numeric equal
[[ "$n" -gt 0 ]]       # Numeric greater than
[[ "$n" -lt 10 ]]      # Numeric less than

# Compound conditions
[[ -f "$f" && -r "$f" ]]          # File exists AND is readable
[[ "$a" == "x" || "$b" == "y" ]]   # Either condition

# Short-circuit for guard clauses
[[ -d /tmp/work ]] || mkdir -p /tmp/work
[[ -n "$DB_URL" ]] || { echo "DB_URL required" >&2; exit 1; }

# case statement
case "$os" in
  ubuntu|debian)  pkg_mgr="apt"  ;;
  centos|rhel)    pkg_mgr="yum"  ;;
  fedora)         pkg_mgr="dnf"  ;;
  *)              echo "Unknown OS: $os" >&2; exit 1 ;;
esac

Loops

# for — list iteration
for host in web1 web2 web3; do
  ssh "$host" 'systemctl restart nginx'
done

# for — array iteration
files=(/var/log/*.log)
for file in "${files[@]}"; do
  gzip "$file"
done

# Read lines from file safely (handles spaces, backslashes)
while IFS= read -r line; do
  echo "Processing: $line"
done < input.txt

# Read lines from command output
while IFS= read -r pid; do
  kill -TERM "$pid"
done < <(pgrep stale-worker)

# C-style for loop
for ((i=1; i<=10; i++)); do
  echo "$i"
done

# while loop with counter
count=0
while [[ $count -lt 5 ]]; do
  echo "Attempt $count"
  ((count++))
done

# Retry loop with backoff
max_attempts=5
attempt=0
until curl -sf https://api.example.com/health >/dev/null; do
  ((attempt++))
  [[ $attempt -ge $max_attempts ]] && { echo "Service unreachable"; exit 1; }
  echo "Waiting... (attempt $attempt/$max_attempts)"
  sleep $((2 ** attempt))    # Exponential backoff: 2, 4, 8, 16 seconds
done

Functions

# Logging helper
log() {
  local level="$1"; shift
  echo "[$(date '+%Y-%m-%d %H:%M:%S')] [$level] $*" >&2
}
log "INFO" "Starting deployment"
log "ERROR" "Database unreachable"

# Functions return exit codes (0-255)
# Return VALUES via echo + command substitution
get_timestamp() {
  date '+%Y%m%d_%H%M%S'
}
ts=$(get_timestamp)

# Validate and guard
require_env() {
  local var="$1"
  [[ -n "${!var}" ]] || { log "ERROR" "Required env var not set: $var"; exit 1; }
}
require_env "DATABASE_URL"
require_env "SECRET_KEY"

# Variadic functions — "$@" expands to all args, individually quoted
sum() {
  local total=0
  for n in "$@"; do ((total += n)); done
  echo "$total"
}
sum 1 2 3 4 5   # 15

# local is essential — without it, variables leak into caller scope
process_file() {
  local file="$1"           # function-scoped
  local line_count
  line_count=$(wc -l < "$file")
  echo "$file has $line_count lines"
}
Always use local for function variables
Without local, bash variables are global. A function that sets count=0 will silently zero out a count variable in the calling scope. This is one of the most common sources of subtle shell script bugs.
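A minimal demonstration of the leak described above:

```shell
count=42
bump()      { count=0; }          # No 'local' — writes straight to the global
bump_safe() { local count=0; }    # Shadowed copy; caller unaffected

bump;      echo "$count"          # 0  — global was clobbered
count=42
bump_safe; echo "$count"          # 42 — global survived
```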
Complete production-quality backup script
#!/usr/bin/env bash
# backup.sh — Back up a directory to S3 with retention and error alerting
set -euo pipefail
IFS=$'\n\t'

# --- Configuration (override via environment) ---
readonly BACKUP_SRC="${BACKUP_SRC:-/var/lib/myapp}"
readonly BACKUP_BUCKET="${BACKUP_BUCKET:?BACKUP_BUCKET env var is required}"
readonly RETENTION_DAYS="${RETENTION_DAYS:-30}"
readonly TIMESTAMP=$(date '+%Y%m%d_%H%M%S')
readonly BACKUP_FILE="/tmp/backup-${TIMESTAMP}-$$.tar.gz"
readonly LOG_FILE="/var/log/myapp-backup.log"

# --- Logging ---
log() {
  local level="$1"; shift
  echo "[$(date '+%Y-%m-%d %H:%M:%S')] [$level] $*" | tee -a "$LOG_FILE"
}

# --- Cleanup on exit ---
cleanup() {
  local exit_code=$?
  rm -f "$BACKUP_FILE"
  if [[ $exit_code -ne 0 ]]; then
    log "ERROR" "Backup FAILED with exit code $exit_code (line $LINENO)"
    # Uncomment to alert via Slack:
    # curl -s -X POST "$SLACK_WEBHOOK" \
    #   -H 'Content-type: application/json' \
    #   -d "{\"text\":\"Backup failed on $(hostname) at $(date)\"}"
  fi
}
trap cleanup EXIT

# --- Validate dependencies ---
for cmd in tar aws; do
  command -v "$cmd" >/dev/null 2>&1 \
    || { log "ERROR" "Required command not found: $cmd"; exit 1; }
done

# --- Validate source ---
[[ -d "$BACKUP_SRC" ]] \
  || { log "ERROR" "Source directory not found: $BACKUP_SRC"; exit 1; }

# --- Create archive ---
log "INFO" "Backing up $BACKUP_SRC"
tar -czf "$BACKUP_FILE" -C "$(dirname "$BACKUP_SRC")" "$(basename "$BACKUP_SRC")"
size=$(du -sh "$BACKUP_FILE" | cut -f1)
log "INFO" "Archive created: $BACKUP_FILE ($size)"

# --- Upload ---
s3_path="s3://${BACKUP_BUCKET}/backups/${TIMESTAMP}.tar.gz"
aws s3 cp "$BACKUP_FILE" "$s3_path" --storage-class STANDARD_IA
log "INFO" "Uploaded to $s3_path"

# --- Rotate old backups ---
log "INFO" "Rotating backups older than ${RETENTION_DAYS} days"
cutoff=$(date -d "${RETENTION_DAYS} days ago" '+%Y%m%d' 2>/dev/null \
         || date -v-${RETENTION_DAYS}d '+%Y%m%d')  # Linux / macOS compat
aws s3 ls "s3://${BACKUP_BUCKET}/backups/" | awk '{print $4}' \
  | while IFS= read -r key; do
      file_date="${key:0:8}"
      if [[ "$file_date" < "$cutoff" ]]; then
        aws s3 rm "s3://${BACKUP_BUCKET}/backups/$key"
        log "INFO" "Deleted old backup: $key"
      fi
    done

log "INFO" "Backup complete"

8. Process Management

ps & top/htop

# ps — process snapshot
ps aux                         # All processes, user-oriented format
ps -ef                         # All processes, full format (shows PPID)
ps -ejH                        # Process tree (forest view)
ps aux --sort=-%cpu | head -10 # Top 10 CPU consumers
ps aux --sort=-%mem | head -10 # Top 10 memory consumers

# ps output columns (aux format):
# USER  PID  %CPU  %MEM  VSZ    RSS    TTY  STAT  START  TIME  COMMAND
# VSZ = virtual size (includes shared libs, mmap'd files — often misleading)
# RSS = resident set size = actual physical RAM in use
# STAT: R=running, S=interruptible sleep, D=uninterruptible sleep (I/O wait),
#       Z=zombie (exited, parent hasn't wait()ed), T=stopped, <=high priority
#       + = foreground process group, s = session leader

# Get PIDs
pgrep nginx                    # PIDs matching process name
pgrep -a nginx                 # With full command line
pgrep -u www-data              # All processes owned by www-data
pidof sshd                     # PIDs of exact binary name

# Process details via /proc
cat /proc/$(pgrep -n nginx)/status | grep -E "^(Name|Pid|VmRSS|Threads)"
ls /proc/$(pgrep -n nginx)/fd | wc -l   # Count open file descriptors

# top interactive keys:
# k = kill a PID, r = renice, f = field selector
# 1 = per-CPU breakdown, M = sort by memory, P = sort by CPU
# u = filter by user, q = quit
top -b -n 1 | head -20         # Batch mode (scriptable)
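The STAT codes above lend themselves to one-liners; for example, listing zombies together with the parent PID that should be reaping them (a sketch using ps's `-eo` format selection):

```shell
# List zombie processes alongside the parent responsible for reaping
# them — signal or fix the parent, not the zombie itself
ps -eo pid,ppid,stat,comm | awk '$3 ~ /^Z/'
```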

Signals & kill

# Common signals and their conventional meanings
# SIGTERM (15) — polite termination; process should clean up and exit
# SIGKILL  (9) — force kill; cannot be caught, blocked, or ignored
# SIGHUP   (1) — hangup; convention: reload config (nginx, sshd, etc.)
# SIGINT   (2) — keyboard interrupt (Ctrl+C)
# SIGQUIT  (3) — quit with core dump (Ctrl+\)
# SIGSTOP (19) — pause process (cannot be caught)
# SIGCONT (18) — continue stopped process
# SIGUSR1/2    — user-defined; application-specific

kill -TERM 1234                # Polite terminate (default signal)
kill -HUP 1234                 # Reload config
kill -KILL 1234                # Force kill (always works, last resort)
kill -9 1234                   # Same as -KILL

# Kill by name
pkill nginx                    # SIGTERM to all matching
pkill -HUP nginx               # Reload all nginx workers
pkill -u alice                 # Kill all of alice's processes (use with caution)
killall -9 python3             # SIGKILL to all python3 processes

# Kill process using a port
fuser -k 8080/tcp              # Kill process bound to TCP port 8080
lsof -ti :8080 | xargs -r kill # Same using lsof (-r: no-op if nothing matched)

# Trap signals in scripts
cleanup() { echo "Interrupted, cleaning up..." >&2; rm -f /tmp/lockfile; exit 1; }
trap cleanup SIGTERM SIGINT
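A related idiom is trapping EXIT, which fires on every exit path — normal completion, `set -e` failures, and caught signals alike — making it the usual home for temp-file cleanup (a minimal sketch):

```shell
#!/usr/bin/env bash
set -euo pipefail

# The EXIT trap runs no matter how the script ends, so the temp
# file can never be leaked
tmpfile=$(mktemp)
trap 'rm -f "$tmpfile"' EXIT

echo "working data" > "$tmpfile"
wc -l < "$tmpfile"              # prints 1 (line count)
```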

Job Control

# Background / foreground
command &                      # Start immediately in background
Ctrl+Z                         # Suspend the current foreground job
bg                             # Resume most recent suspended job in background
bg %2                          # Resume job #2 in background
fg                             # Bring most recent background job to foreground
fg %2                          # Bring job #2 to foreground

jobs                           # List background jobs in current shell
jobs -l                        # With PIDs

# Detach from shell — process survives shell exit
command &
disown %1                      # Remove from jobs table (immune to SIGHUP on shell exit)

# nohup — immune to SIGHUP, stdout/stderr go to nohup.out
nohup ./long-running-script.sh > /var/log/myscript.log 2>&1 &

# tmux — persistent multiplexed sessions (far better than nohup for interactive work)
tmux new -s deploy             # New named session
tmux attach -t deploy          # Reattach (survives SSH disconnect)
tmux ls                        # List sessions
# Inside tmux: Ctrl+B then d to detach without killing

9. Users & Groups

User Database Files

# /etc/passwd — one line per user: name:x:UID:GID:GECOS:home:shell
# The 'x' means password hash is in /etc/shadow
cat /etc/passwd | grep -v "nologin\|false"  # Users with real login shells

# /etc/shadow — hashed passwords (root-readable only)
# Format: name:hash:last_change:min:max:warn:inactive:expire
sudo cat /etc/shadow | head -3

# /etc/group — group database: group:x:GID:member1,member2
cat /etc/group | grep docker    # Who is in the docker group

# Current user info
id                              # uid, gid, all supplementary groups
whoami                          # Username only
id alice                        # Info for a specific user
groups alice                    # List alice's groups
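Since /etc/passwd is plain colon-delimited text, awk handles ad-hoc queries against it directly — for example, regular accounts (UID >= 1000 on most distros) and their shells. A sketch:

```shell
# Print name, UID, and shell for regular (non-system) accounts
awk -F: '$3 >= 1000 && $1 != "nobody" {print $1, $3, $7}' /etc/passwd
```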

User Management

# Create user
useradd -m -s /bin/bash -G sudo alice      # -m=create home, -s=shell, -G=supplementary groups
useradd -r -s /usr/sbin/nologin myservice  # System user (no interactive login, UID < 1000)

# Modify user
usermod -aG docker alice        # Add to docker group (-a = append, required to not lose existing groups)
usermod -s /bin/zsh alice       # Change login shell
usermod -L alice                # Lock account (prepend ! to password hash)
usermod -U alice                # Unlock account

# Delete user
userdel alice                   # Delete user, keep home directory
userdel -r alice                # Delete user AND home directory

# Passwords
passwd alice                    # Set/change password interactively
passwd -e alice                 # Expire password (forces change on next login)
passwd -l alice                 # Lock account
chage -l alice                  # Show password aging policy
chage -M 90 alice               # Set max password age to 90 days

# Groups
groupadd developers
groupdel developers
gpasswd -a alice developers     # Add alice to group
gpasswd -d alice developers     # Remove alice from group

sudo

# Run as root
sudo command
sudo -i                         # Interactive root shell with root's environment
sudo -u alice command           # Run as a specific user
sudo -l                         # List what current user can sudo

# /etc/sudoers — ALWAYS edit with visudo (validates syntax before saving)
sudo visudo

# Common sudoers patterns
# alice ALL=(ALL:ALL) ALL                     # Full sudo
# %sudo ALL=(ALL:ALL) ALL                     # All members of sudo group
# alice ALL=(ALL) NOPASSWD: /bin/systemctl restart nginx  # Specific passwordless cmd
# deploy ALL=(ALL) NOPASSWD: /usr/local/bin/deploy.sh

# Drop-in files (preferred over editing /etc/sudoers directly)
echo "deploy ALL=(ALL) NOPASSWD: /usr/bin/systemctl" \
  | sudo tee /etc/sudoers.d/deploy
sudo chmod 0440 /etc/sudoers.d/deploy   # Must not be world-readable
Never edit /etc/sudoers directly
A syntax error locks out ALL sudo access on the system. You may need physical console access to recover. Always use visudo which validates before saving. Use drop-in files in /etc/sudoers.d/ for application-specific rules.

10. Package Management

apt (Debian / Ubuntu)

# Update package index first
sudo apt update

# Install / remove
sudo apt install nginx postgresql-16 build-essential
sudo apt remove nginx                  # Remove but keep config files
sudo apt purge nginx                   # Remove including config files
sudo apt autoremove                    # Remove orphaned dependencies

# Upgrade
sudo apt upgrade                       # Upgrade installed packages
sudo apt full-upgrade                  # Upgrade + handle dependency changes

# Inspect
apt search "web server"
apt show nginx                         # Dependencies, description
dpkg -l | grep nginx                   # Installed packages matching name
dpkg -L nginx                          # Files installed by the nginx package
dpkg -S /usr/sbin/nginx                # Which package owns this file
apt-cache policy nginx                 # Installed vs available version

# Hold a package version
sudo apt-mark hold nginx
sudo apt-mark unhold nginx

# Add third-party repository (modern way with signed keyring)
curl -fsSL https://repo.example.com/gpg.key \
  | sudo gpg --dearmor -o /usr/share/keyrings/example.gpg
echo "deb [signed-by=/usr/share/keyrings/example.gpg] https://repo.example.com stable main" \
  | sudo tee /etc/apt/sources.list.d/example.list
sudo apt update

yum / dnf (RHEL / CentOS / Fedora)

# dnf is the modern replacement for yum
sudo dnf install nginx
sudo dnf remove nginx
sudo dnf update                        # Update all packages
sudo dnf update nginx                  # Update specific package
sudo dnf search "web server"
sudo dnf info nginx
sudo dnf list installed | grep nginx
sudo dnf provides /usr/sbin/nginx      # Which package owns this path

# RHEL extras
sudo dnf install epel-release          # Extra Packages for Enterprise Linux
sudo dnf config-manager --set-enabled crb  # CodeReady Linux Builder (RHEL 9)

Package Manager Quick-Reference

Task            | apt (Debian/Ubuntu) | dnf (RHEL/Fedora)
----------------|---------------------|----------------------------------
Update index    | apt update          | (automatic, or dnf check-update)
Install         | apt install pkg     | dnf install pkg
Remove + config | apt purge pkg       | dnf remove pkg
Upgrade all     | apt upgrade         | dnf update
Search          | apt search term     | dnf search term
File to package | dpkg -S /path       | dnf provides /path
Package files   | dpkg -L pkg         | rpm -ql pkg
Package info    | apt show pkg        | dnf info pkg

11. Networking

ip & ss

# ip — modern replacement for ifconfig/route (iproute2 package)

# Addresses
ip addr                                    # All interfaces and IPs
ip addr show eth0                          # Specific interface
ip addr add 192.168.1.10/24 dev eth0       # Add IP (not persistent)
ip addr del 192.168.1.10/24 dev eth0

# Links
ip link show
ip link set eth0 up
ip link set eth0 down

# Routes
ip route                                   # Routing table
ip route get 8.8.8.8                       # Which route handles destination
ip route add default via 192.168.1.1
ip route add 10.0.0.0/8 via 10.1.0.1

# ss — socket statistics (replaces netstat)
ss -tlnp                        # TCP listening sockets with PIDs
ss -ulnp                        # UDP listening
ss -tnp                         # Established TCP connections with PIDs
ss -s                           # Socket summary statistics
ss -tnp dst :443                # Connections to port 443
ss state established            # Only established connections
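One ss pattern worth memorizing: summarizing connections by state, which surfaces a CLOSE_WAIT pile-up (an application not closing its sockets) at a glance. A sketch; ss is Linux-only:

```shell
# Count TCP sockets per state, busiest first
ss -tan | awk 'NR > 1 {count[$1]++} END {for (s in count) print count[s], s}' | sort -rn
```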

curl & dig

# curl
curl https://api.example.com/health
curl -s https://api.example.com              # Silent (no progress bar)
curl -I https://example.com                  # HEAD only
curl -L https://example.com                  # Follow redirects
curl -o /tmp/file.zip https://example.com/f.zip

# POST with JSON body
curl -X POST https://api.example.com/users \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $TOKEN" \
  -d '{"name":"alice","email":"[email protected]"}'

# Check response code + latency
curl -w "\nHTTP %{http_code}  %{time_total}s\n" -o /dev/null -s https://example.com

# Retry on failure
curl --retry 3 --retry-delay 2 --connect-timeout 5 --max-time 30 https://api.example.com

# dig — DNS queries
dig example.com                # A record (IPv4)
dig example.com AAAA           # IPv6
dig example.com MX             # Mail exchangers
dig example.com TXT            # TXT records (SPF, DKIM)
dig @8.8.8.8 example.com       # Force specific DNS server
dig +short example.com         # Answer only (no extra output)
dig +trace example.com         # Full delegation chain from root
dig -x 93.184.216.34           # Reverse DNS lookup (PTR)

iptables / UFW

# iptables
sudo iptables -L -n -v             # List filter table
sudo iptables -t nat -L -n -v      # nat table

# Common INPUT rules
sudo iptables -A INPUT -p tcp --dport 22 -j ACCEPT
sudo iptables -A INPUT -p tcp --dport 80 -j ACCEPT
sudo iptables -A INPUT -p tcp --dport 443 -j ACCEPT
sudo iptables -A INPUT -m state --state ESTABLISHED,RELATED -j ACCEPT
sudo iptables -A INPUT -i lo -j ACCEPT
sudo iptables -P INPUT DROP        # Default deny — set LAST

# Save / restore
sudo iptables-save > /etc/iptables/rules.v4
sudo iptables-restore < /etc/iptables/rules.v4

# UFW — simplified frontend (Ubuntu)
sudo ufw allow 22/tcp
sudo ufw allow from 10.0.0.0/8 to any port 5432  # Postgres from private net
sudo ufw deny 8080/tcp
sudo ufw enable
sudo ufw status verbose

tcpdump

sudo tcpdump -i eth0                         # All traffic
sudo tcpdump -i any port 443                 # HTTPS on all interfaces
sudo tcpdump -i eth0 host 10.0.0.5          # Traffic to/from specific host
sudo tcpdump -i eth0 -A port 80             # Print ASCII payload (HTTP debug)
sudo tcpdump -i eth0 -w capture.pcap        # Save for Wireshark
sudo tcpdump -r capture.pcap                # Read saved file
sudo tcpdump -i any -n udp port 53          # Watch DNS queries
# Watch TCP SYN packets (connection attempts)
sudo tcpdump -i eth0 'tcp[tcpflags] & tcp-syn != 0'

12. systemd

Service Control

sudo systemctl start nginx
sudo systemctl stop nginx
sudo systemctl restart nginx           # Stop + start (brief downtime)
sudo systemctl reload nginx            # Send SIGHUP — reload config in-place
sudo systemctl reload-or-restart nginx       # Reload if supported, else restart
sudo systemctl try-reload-or-restart nginx   # Same, but only if already running

# Enable / disable at boot
sudo systemctl enable nginx
sudo systemctl disable nginx
sudo systemctl enable --now nginx      # Enable AND start immediately

# Status
systemctl status nginx                 # Status, recent log lines, enabled state
systemctl is-active nginx              # Exits 0 if active
systemctl is-enabled nginx             # "enabled" / "disabled" / "static"
systemctl is-failed nginx

# List units
systemctl list-units --type=service
systemctl list-units --state=failed
systemctl list-unit-files --type=service

# Dependency graph
systemctl list-dependencies nginx
systemctl list-dependencies --reverse nginx  # Who depends on nginx

journalctl

journalctl -u nginx                     # All nginx logs
journalctl -u nginx -n 100 -f          # Last 100 lines + follow
journalctl -u nginx --since "1 hour ago"
journalctl -u nginx --since "2024-01-15 10:00" --until "2024-01-15 11:00"

# Priority filter
journalctl -p err                       # Error and above
journalctl -p warning -u nginx
# Priorities: emerg alert crit err warning notice info debug

# Boot logs
journalctl -b                           # Current boot
journalctl -b -1                        # Previous boot
journalctl --list-boots

# Kernel messages
journalctl -k                           # Kernel ring buffer
journalctl -k --since "5 min ago"

# JSON output for parsing
journalctl -u nginx -o json | jq '.MESSAGE'

# Disk management
journalctl --disk-usage
sudo journalctl --vacuum-size=500M      # Trim to 500MB
sudo journalctl --vacuum-time=30d       # Trim to 30 days

Writing Unit Files

Example: production service unit file
# /etc/systemd/system/myapp.service
# After creating or editing: sudo systemctl daemon-reload

[Unit]
Description=My Application Server
Documentation=https://github.com/myorg/myapp
After=network.target postgresql.service
Wants=postgresql.service

[Service]
# Note: systemd allows comments only on their own lines — an inline "#"
# after a value becomes part of the value
Type=simple
User=myapp
Group=myapp
WorkingDirectory=/opt/myapp
# Load KEY=VALUE pairs from this file into the environment
EnvironmentFile=/etc/myapp/env
ExecStart=/opt/myapp/bin/server --port 8080
ExecReload=/bin/kill -HUP $MAINPID
# Restart if the process exits non-zero or crashes
Restart=on-failure
RestartSec=5
# Allow at most 3 restarts in any 60-second window
StartLimitIntervalSec=60
StartLimitBurst=3

# Security hardening
NoNewPrivileges=true
# Isolated /tmp namespace
PrivateTmp=true
# Read-only /usr and /etc; writable paths must be whitelisted
ProtectSystem=strict
ProtectHome=true
ReadWritePaths=/var/lib/myapp /var/log/myapp

# Resource limits
LimitNOFILE=65536
MemoryMax=2G
# 200% = up to 2 full CPU cores
CPUQuota=200%

[Install]
WantedBy=multi-user.target

systemd Timers

# Timer unit triggers a matching .service unit on a schedule
# Advantages over cron: journald logging, Persistent=true, resource limits, RandomizedDelay

# /etc/systemd/system/backup.timer
# [Unit]
# Description=Daily backup timer
# [Timer]
# OnCalendar=daily                   # Every day at midnight
# OnCalendar=*-*-* 02:30:00          # Every day at 02:30
# OnCalendar=Mon,Thu *-*-* 04:00     # Mon and Thu at 04:00
# RandomizedDelaySec=1800            # Random delay up to 30 minutes (spread load)
# Persistent=true                    # Run missed jobs on next boot
# [Install]
# WantedBy=timers.target

sudo systemctl enable --now backup.timer
systemctl list-timers --all            # All timers with next trigger time
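OnCalendar expressions are easy to get subtly wrong; systemd-analyze will parse one and preview its next elapse times before you install the timer (a sketch):

```shell
# Validate an OnCalendar expression and show when it would next fire
systemd-analyze calendar "Mon,Thu *-*-* 04:00"
```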

13. Disk & Storage

df & du

# df — mounted filesystem space
df -h                           # Human-readable
df -hT                          # Include filesystem type
df -i                           # Inode usage (can run out before disk space)

# du — file/directory usage
du -sh /var/log                 # Total size
du -sh /var/log/*               # Per-item summary
du -h --max-depth=2 /var        # Tree to 2 levels
du -ah /var | sort -rh | head -20  # Top 20 consumers under /var

# Find large files
find /var -type f -size +100M -exec ls -lh {} \;

# Interactive explorer
ncdu /var                       # apt install ncdu

lsblk, mount & fstab

# List block devices
lsblk                           # Tree view: disks, partitions, LVM, loop
lsblk -f                        # Include filesystem type, UUID, mountpoint

# Format
sudo mkfs.ext4 /dev/sdb1
sudo mkfs.xfs /dev/sdb1         # XFS preferred for large files and databases

# Mount
sudo mount /dev/sdb1 /mnt/data
sudo mount -o ro /dev/sdb1 /mnt/data   # Read-only mount
sudo umount /mnt/data
sudo umount -l /mnt/data               # Lazy — detach now, clean up when idle

# Show mounts
mount | column -t
findmnt                         # Tree view

# /etc/fstab — persistent mounts (loaded at boot)
# Format: device  mountpoint  type  options  dump  fsck-order
# UUID=abc123  /mnt/data  ext4  defaults,noatime  0  2

# Get UUID
sudo blkid /dev/sdb1

# Test fstab without rebooting
sudo mount -a

# Common mount options:
# noatime = skip access time updates (significant I/O reduction)
# noexec  = prevent binary execution (security hardening for /tmp, user uploads)
# nosuid  = ignore SUID/SGID bits (security hardening)

LVM

# Inspect
pvdisplay && vgdisplay && lvdisplay
lvs                             # Compact summary

# Extend an LV online (no unmount needed with ext4 or xfs)
sudo lvextend -L +10G /dev/vg0/data           # Add 10GB
sudo lvextend -l +100%FREE /dev/vg0/data      # Use all VG free space
sudo resize2fs /dev/vg0/data                  # Grow ext4 to fill LV
sudo xfs_growfs /mnt/data                     # Grow XFS (pass mount point)

# Snapshot for backup
sudo lvcreate -L 5G -s -n mydata-snap /dev/vg0/mydata
sudo mount -o ro /dev/vg0/mydata-snap /mnt/snap
# ... run backup ...
sudo umount /mnt/snap
sudo lvremove /dev/vg0/mydata-snap

14. SSH

Keys & Config

# Generate key — Ed25519 preferred (faster, smaller, more secure than RSA 4096)
ssh-keygen -t ed25519 -C "alice@work" -f ~/.ssh/id_ed25519_work

# Deploy public key to server
ssh-copy-id -i ~/.ssh/id_ed25519_work.pub [email protected]

# ~/.ssh/config — per-host connection settings
# Host web1
#   HostName 10.0.1.50
#   User alice
#   IdentityFile ~/.ssh/id_ed25519_work
#   Port 2222
#
# Host bastion
#   HostName bastion.example.com
#   User deploy
#   ForwardAgent yes              # Forward local SSH agent (keys usable on bastion)
#
# Host prod-*                     # Wildcard — matches prod-web1, prod-db1, etc.
#   ProxyJump bastion             # Auto-jump through bastion host
#   User deploy

ssh web1                        # Connects using config above
ssh prod-db1                    # Jumps through bastion automatically

# SSH agent — cache passphrase in memory
eval "$(ssh-agent -s)"
ssh-add ~/.ssh/id_ed25519_work

Tunneling

# Local port forwarding — bring a remote service to a local port
ssh -L 5433:localhost:5432 alice@db-server
# psql -h 127.0.0.1 -p 5433 now reaches db-server's local Postgres

# Non-interactive background tunnel
ssh -fNL 5433:localhost:5432 alice@db-server   # -f=background, -N=no shell

# Tunnel to a third host reachable from the SSH server
ssh -L 8080:internal-app.internal:80 alice@bastion

# Remote port forwarding — expose a local service on a remote port
ssh -R 8080:localhost:3000 alice@server         # server:8080 -> local:3000

# Dynamic SOCKS proxy — proxy any TCP traffic through SSH
ssh -D 1080 -fN alice@server                    # SOCKS5 on localhost:1080

# ProxyJump — multi-hop SSH
ssh -J bastion.example.com alice@internal-server

# Agent forwarding — use your local private keys while on the remote server
ssh -A alice@bastion                            # Only for fully trusted hosts

scp & rsync

# scp — simple copy (no delta, no resume)
scp file.txt alice@server:/tmp/
scp alice@server:/var/log/app.log /tmp/
scp -r ./project alice@server:/opt/
scp -P 2222 file.txt alice@server:/tmp/         # Non-default port

# rsync — efficient sync with delta transfer
rsync -avz ./src/ alice@server:/opt/myapp/
# -a = archive (recursive + preserve permissions, timestamps, symlinks)
# -v = verbose, -z = compress during transfer

rsync -avz --delete ./src/ server:/opt/myapp/   # Mirror (delete remote extras)
rsync -avz --exclude='*.log' --exclude='.git' ./src/ server:/opt/
rsync -avz -e "ssh -p 2222" ./src/ server:/opt/ # Custom port
rsync -avz --progress big.tar.gz server:/tmp/   # Show per-file progress

# Dry run first
rsync -avzn --delete ./src/ server:/opt/myapp/  # -n = simulate only
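rsync also works between two local paths, which makes a destructive flag like --delete easy to rehearse against throwaway directories before aiming it at a server (a sketch):

```shell
# Rehearse --delete locally: stale.txt exists only in the destination,
# so mirroring removes it
src=$(mktemp -d); dst=$(mktemp -d)
echo keep  > "$src/keep.txt"
echo stale > "$dst/stale.txt"
rsync -a --delete "$src"/ "$dst"/
ls "$dst"                        # keep.txt only — stale.txt is gone
rm -rf "$src" "$dst"
```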

sshd Hardening

# /etc/ssh/sshd_config — key settings
# Test config BEFORE reloading: sudo sshd -t

# PermitRootLogin no               # Never allow direct root login
# PasswordAuthentication no        # Keys only — eliminates brute-force risk
# PubkeyAuthentication yes
# AllowUsers alice bob deploy      # Explicit whitelist
# MaxAuthTries 3
# ClientAliveInterval 300          # Disconnect idle after 5 min
# ClientAliveCountMax 2
# X11Forwarding no
# AllowTcpForwarding yes           # Required for tunnels; set no if unneeded

# Reload without locking yourself out (the unit is named "ssh" on Debian/Ubuntu)
sudo sshd -t && sudo systemctl reload sshd

# Verify effective settings
sudo sshd -T | grep -E "permitrootlogin|passwordauth|allowusers"

15. Cron & Scheduling

crontab Syntax

# Format: minute  hour  day-of-month  month  day-of-week  command
# Range:    0-59   0-23     1-31       1-12      0-7 (0 and 7 = Sunday)
# Wildcards: * = any,  */5 = every 5,  1,3,5 = list,  1-5 = range

crontab -e                      # Edit current user's crontab
crontab -l                      # List
crontab -r                      # Remove ALL (no confirmation prompt!)
sudo crontab -l -u alice        # List another user's crontab (as root)

# Common schedule patterns
# 0 2 * * *          Daily backup at 02:00
# */5 * * * *        Every 5 minutes
# 0 9-17 * * 1-5     Top of every business hour, Mon-Fri
# 0 0 1 * *          First day of each month at midnight
# @reboot            Once on system boot
# @daily             Midnight every day (alias)
# @hourly            Every hour at :00 (alias)

# System crontabs (include user field)
# /etc/crontab and /etc/cron.d/myapp:
# minute hour day month weekday USER command

# Drop scripts directly into these directories (no crontab format):
# /etc/cron.daily/  /etc/cron.weekly/  /etc/cron.monthly/
Cron runs with a minimal PATH
Cron's PATH is usually only /usr/bin:/bin. Always use absolute paths in cron commands. Always redirect output to avoid silent email failures:
*/5 * * * * /usr/local/bin/check.sh >>/var/log/check.log 2>&1
# Set variables at top of crontab for correct environment
# SHELL=/bin/bash
# PATH=/usr/local/sbin:/usr/local/bin:/sbin:/bin:/usr/sbin:/usr/bin
# MAILTO=""   # Suppress email output

# Test in cron's restricted environment
env -i HOME=/root SHELL=/bin/bash PATH=/usr/local/bin:/usr/bin:/bin \
  /usr/local/bin/myscript.sh

# Prevent overlapping runs with flock
*/5 * * * * /usr/bin/flock -n /var/lock/myjob.lock /usr/local/bin/myjob.sh
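The flock guard is easy to verify by hand: hold the lock in one process and watch a second non-blocking attempt fail immediately (a sketch; flock ships with util-linux, so it is Linux-only):

```shell
# First flock holds the lock for 2 seconds; the second -n attempt
# fails instead of starting an overlapping run
lock=/tmp/demo.lock
flock -n "$lock" sleep 2 &
sleep 0.2
flock -n "$lock" true || echo "lock held — overlap prevented"
wait
```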

Cron vs systemd Timers

Feature         | cron                       | systemd timers
----------------|----------------------------|--------------------------------------
Logging         | Email or redirect manually | Automatic journald integration
Missed runs     | Silently skipped           | Run on next boot with Persistent=true
Dependencies    | None                       | Full unit dependency support
Random delay    | Manual sleep in script     | RandomizedDelaySec built-in
Resource limits | None                       | CPUQuota, MemoryMax, etc.
Status check    | Grep syslog/email          | systemctl status job.timer
Complexity      | Single crontab line        | Two unit files required

16. Performance & Monitoring

vmstat, iostat, sar

# vmstat — VM stats at a glance
vmstat 1 10                     # 1-second samples, 10 times
# Key columns:
# r   = run queue (tasks waiting for CPU; high = CPU-bound)
# b   = blocked on uninterruptible I/O
# si/so = swap in/out KB/s (nonzero = memory pressure)
# wa  = iowait % (high >20% = I/O bottleneck)

# iostat — per-device disk I/O (sysstat package)
iostat -xz 1                    # Extended stats; skip idle devices
# Key columns:
# r/s  w/s      = reads/writes per second
# rkB/s wkB/s   = throughput KB/s
# await         = avg wait time ms (high = disk bottleneck)
# %util         = percent time busy (100% = saturated)

# sar — historical activity (collected every 10 min by sadc)
sar -u 1 10                     # CPU utilization
sar -r 1 10                     # Memory utilization
sar -b 1 10                     # Block I/O stats
sar -n DEV 1 5                  # Network interface throughput
sar -q 1 10                     # Load average and run queue

# Today's CPU history from sadc
sar -u -f /var/log/sa/sa$(date +%d)

strace & perf

# strace — trace system calls made by a process
strace command                  # Trace new process
strace -p 1234                  # Attach to running PID
strace -e trace=network command # Only network syscalls
strace -e openat,read,write -p 1234  # File I/O syscalls
strace -c command               # Summary: time spent per syscall
strace -T -p 1234               # Time each individual syscall

# Diagnose a hung process
strace -p 1234 2>&1 | head -5
# futex(WAIT)     = blocked on mutex/lock
# epoll_wait      = event loop idle (normal)
# read on socket  = waiting for network data

# perf — hardware performance counters (requires kernel perf support)
sudo perf stat command          # HW event counts: cycles, cache misses, branches
sudo perf top                   # Live per-function CPU profiling
sudo perf record -F 99 -g -p 1234 -- sleep 30   # 30-second CPU profile
sudo perf report                # Interactive TUI over recorded samples (perf.data)

sysctl & System Tuning

# Read / write kernel parameters
sysctl -a                              # All parameters
sysctl net.ipv4.tcp_max_syn_backlog    # Read specific param
sudo sysctl -w net.ipv4.ip_forward=1  # Write immediately (not persistent)

# Persist in a drop-in file
sudo tee /etc/sysctl.d/99-production.conf <<'EOF'
fs.file-max = 2097152
net.core.somaxconn = 65535
net.ipv4.tcp_max_syn_backlog = 8192
net.ipv4.tcp_fin_timeout = 30
net.ipv4.tcp_tw_reuse = 1
vm.swappiness = 10
vm.overcommit_memory = 1
EOF
sudo sysctl -p /etc/sysctl.d/99-production.conf

# Per-process fd limits
ulimit -n                              # Current shell limit
ulimit -n 65536                        # Raise for current session

# Persistent limits: /etc/security/limits.conf
# myapp   soft  nofile  65536
# myapp   hard  nofile  131072
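The dotted parameter names map one-to-one onto paths under /proc/sys (dots become slashes), so every sysctl value can also be read or written as a plain file (a sketch):

```shell
# net.ipv4.ip_forward  <->  /proc/sys/net/ipv4/ip_forward
cat /proc/sys/net/ipv4/ip_forward            # reads 0 or 1
# Equivalent to sysctl -w (root required, not persistent):
# echo 1 > /proc/sys/net/ipv4/ip_forward
```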

Load Average & Memory

# Load average interpretation
uptime
# load average: 1.23, 0.87, 0.72  (1/5/15 minute)
# load / nCPU: <0.7 = healthy,  1.0 = at capacity,  >1.0 = overloaded
nproc                           # Number of logical CPU cores

# Memory summary
free -h
# "available" is what matters (MemFree + reclaimable cache)
grep -E "MemTotal|MemAvailable|SwapTotal|SwapFree|Cached" /proc/meminfo

# Memory hogs
ps aux --sort=-%mem | head -10

# OOM events in kernel log
dmesg | grep -i "oom\|killed process\|out of memory"

# OOM score — 0-1000; higher = more likely to be killed
cat /proc/$(pgrep -n myapp)/oom_score
echo -300 | sudo tee /proc/$(pgrep -n myapp)/oom_score_adj  # Protect it
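The load-per-CPU ratio at the top of this section can be computed rather than eyeballed; /proc/loadavg plus nproc gives it in one line (a sketch; both are Linux-specific):

```shell
# 1-minute load average divided by core count — compare against the
# 0.7 / 1.0 thresholds above
awk -v ncpu="$(nproc)" '{printf "%.2f\n", $1 / ncpu}' /proc/loadavg
```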

17. Common Interview Scenarios

Debugging a Slow Server

# Step 1: 30-second high-level snapshot
uptime                          # Load — is CPU saturated?
free -h                         # Memory exhausted? Swap active?
df -h                           # Any filesystem full?
iostat -xz 1 3                  # Any disk %util near 100%?

# Step 2: Find the hot processes
ps aux --sort=-%cpu | head -10
ps aux --sort=-%mem | head -10

# Step 3: If iowait is high — who is doing I/O?
iotop -o                        # apt install iotop
iostat -xz 1 | grep -v "^$"

# Step 4: Network issues?
ss -tnp                         # Connection states — lots of CLOSE_WAIT or TIME_WAIT?
netstat -s | grep -E "retransmit|failed"

# Step 5: Application layer
journalctl -u myapp --since "10 min ago" -p warning --no-pager
tail -f /var/log/myapp/app.log | grep -E "SLOW|timeout|error|WARN"

Disk Full Recovery

# Identify the full filesystem
df -h

# Find the culprits
du -ah /var | sort -rh | head -20
find /var/log -type f -size +100M -exec ls -lh {} \;

# Quick wins — safe to clean
sudo apt clean                         # Cached .deb packages
sudo dnf clean all                     # Cached .rpm packages
sudo journalctl --vacuum-size=100M     # Trim journal to 100MB
gzip /var/log/*.log.{1,2}              # Compress old rotated logs
find /tmp -mtime +1 -delete            # Clear old /tmp files

# Truncate a log file without restarting the writing process
# (the redirection must run in a root shell — "sudo > file" won't redirect as root)
: > /var/log/app.log                   # Truncate in-place (FD stays open)
sudo truncate -s 0 /var/log/app.log    # Same effect when root is required

# Find deleted-but-open files (space held until process restarts)
lsof | grep "(deleted)" | awk '{print $1, $2, $7}'
# Fix: restart the holding process

# Docker cleanup (if applicable)
docker system prune -a --volumes       # CAUTION: removes unused images and volumes
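The deleted-but-open case above is easy to reproduce in a shell, which also shows why the space survives the rm (a sketch; the /proc check is Linux-only):

```shell
# Hold a deleted file open on fd 3 — the inode and its blocks live on
# until the descriptor closes, even though the name is gone
tmpf=$(mktemp)
exec 3< "$tmpf"
rm "$tmpf"
ls -l "/proc/$$/fd/3"     # on Linux, the symlink target ends in "(deleted)"
exec 3<&-                 # close fd 3 — only now is the space released
```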

OOM Debugging

# Confirm OOM happened
dmesg | grep -i "out of memory\|oom_kill\|killed process"
grep "Out of memory" /var/log/kern.log | tail -5

# OOM log shows:
# - Process name + PID that was killed
# - Memory state at time of kill (active, inactive, free, slab pages)
# - oom_score of all processes at the time

# Current memory consumers
ps aux --sort=-%mem | head -10

# Check swap
free -h
swapon --show

# Reduce OOM kill risk for a critical process
echo -500 | sudo tee /proc/$(pgrep -n postgres)/oom_score_adj  # Less likely to die

# Add swap on a RAM-constrained server (EC2, VMs)
sudo fallocate -l 4G /swapfile
sudo chmod 600 /swapfile
sudo mkswap /swapfile
sudo swapon /swapfile
# Make persistent: echo '/swapfile none swap sw 0 0' | sudo tee -a /etc/fstab

Network Connectivity Debugging

# Layer-by-layer approach

# L3 — Is network configured?
ip addr                         # Do we have an IP address?
ip route                        # Do we have a default gateway?
ping -c 3 $(ip route | awk '/default/{print $3}')  # Can we reach the gateway?

# L3 — DNS working?
dig +short google.com           # Does DNS resolve?
dig @8.8.8.8 +short google.com  # Test against Google's DNS (bypass local resolver)
cat /etc/resolv.conf            # What DNS servers are configured?

# L4 — Remote port reachable?
nc -zv api.example.com 443      # TCP handshake test
curl -sv --connect-timeout 5 https://api.example.com 2>&1 | head -20

# L4 — Is our service actually listening?
ss -tlnp | grep :8080           # Is the port bound?

# Firewall blocking?
sudo iptables -L INPUT -n -v --line-numbers
sudo ufw status verbose

# Path analysis
traceroute api.example.com
mtr --report api.example.com    # Combined ping + traceroute (apt install mtr)

# Capture to confirm packets
sudo tcpdump -i eth0 -n host api.example.com and port 443 -c 20

Runaway / High-CPU Process

# Find it
ps aux --sort=-%cpu | head -5
top -b -n 1 | head -15

# Gather info before acting
PID=12345
cat /proc/$PID/cmdline | tr '\0' ' '        # Full command line
cat /proc/$PID/status | grep -E "Name|Uid|VmRSS|Threads"
ls -la /proc/$PID/exe                       # Binary path
ls /proc/$PID/fd | wc -l                   # Open file descriptor count

# Profile it briefly before killing
# Profile it briefly before acting — -c prints a per-syscall summary when strace exits
sudo timeout 10 strace -c -p "$PID"

# Graceful termination first
kill -TERM $PID
sleep 10
# Force kill only if still alive
kill -0 $PID 2>/dev/null && kill -KILL $PID

# Reduce impact without killing (while investigating)
renice +15 -p $PID              # Lower scheduling priority
cpulimit -p $PID -l 25          # Limit to 25% CPU (apt install cpulimit)
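The TERM-wait-KILL sequence above is common enough to wrap in a helper; term_then_kill is a hypothetical name, not a standard command (a sketch):

```shell
# Send SIGTERM, poll up to $2 seconds for exit, SIGKILL as a last resort
term_then_kill() {
  local pid=$1 grace=${2:-10}
  kill -TERM "$pid" 2>/dev/null || return 0   # already gone
  for _ in $(seq "$grace"); do
    kill -0 "$pid" 2>/dev/null || return 0    # exited on its own
    sleep 1
  done
  kill -KILL "$pid" 2>/dev/null
}

# Usage: term_then_kill "$PID" 10
```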
Production incident triage script
#!/usr/bin/env bash
# incident-triage.sh — Collect system snapshot during a production incident
set -euo pipefail

OUTDIR="/tmp/incident-$(hostname)-$(date +%Y%m%d_%H%M%S)"
mkdir -p "$OUTDIR"

collect() {
  local name="$1"; shift
  echo "  Collecting $name..."
  "$@" > "$OUTDIR/${name}.txt" 2>&1 || true
}

echo "Collecting system state to $OUTDIR ..."

collect "01-uptime"          uptime
collect "02-date-uname"      bash -c 'date; uname -a'
collect "03-df"              df -h
collect "04-free"            free -h
collect "05-vmstat"          vmstat 1 5
collect "06-iostat"          iostat -xz 1 5
collect "07-ps-cpu"          ps aux --sort=-%cpu
collect "08-ps-mem"          ps aux --sort=-%mem
collect "09-top"             top -b -n 1
collect "10-ss-listen"       ss -tlnp
collect "11-ss-conns"        ss -tnp state established
collect "12-ip-addr"         ip addr
collect "13-ip-route"        ip route
collect "14-dmesg"           dmesg --time-format iso | tail -200
collect "15-journal-errors"  journalctl -p err --since "2 hours ago" --no-pager
collect "16-file-nr"         cat /proc/sys/fs/file-nr
collect "17-lsof-count"      bash -c 'lsof 2>/dev/null | wc -l'
collect "18-sar-history"     sar -q -f "/var/log/sa/sa$(date +%d)"

tar -czf "${OUTDIR}.tar.gz" -C /tmp "$(basename "$OUTDIR")"
rm -rf "$OUTDIR"
echo "Done: ${OUTDIR}.tar.gz"
echo "Share this archive with the on-call team for postmortem analysis."