0. Setup & Environment

Get a local Kubernetes cluster running on macOS so every example in this guide works immediately.

Prerequisites

Install Docker Desktop — it ships with a built-in single-node cluster:

brew install --cask docker

Enable Kubernetes in Docker Desktop (simplest)

Docker Desktop → Settings → Kubernetes → Enable Kubernetes → Apply & Restart. Wait for the Kubernetes status indicator in the bottom-left to turn green, then verify:

kubectl cluster-info
kubectl get nodes
Single-node cluster: Docker Desktop runs one node that acts as both control plane and worker. That is sufficient for learning every Kubernetes concept covered in this guide.

Alternative: minikube (more control)

minikube gives you a closer-to-production setup and lets you tweak resources, add-ons, and CNI plugins:

brew install minikube
minikube start --driver=docker
minikube status

Install kubectl (if not already available)

Docker Desktop includes kubectl, but installing via Homebrew keeps it updated independently:

brew install kubectl
kubectl version --client
Tip: Run which kubectl to confirm which binary is on your $PATH. If you have both the Docker Desktop and Homebrew versions installed, whichever appears first in $PATH wins.
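To see every candidate binary in precedence order (output depends on your local setup):

```shell
# All kubectl binaries on $PATH, first match wins
which -a kubectl
```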

Essential kubectl Config

# See which cluster kubectl is currently talking to
kubectl config current-context

# Switch context (useful when you have multiple clusters)
kubectl config use-context docker-desktop

# Set a default namespace so you don't need -n on every command
kubectl config set-context --current --namespace=default

Helpful Tools

k9s: Think of it as htop for Kubernetes — a real-time dashboard of pods, deployments, services, and events. Press ? inside k9s for keybindings.
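k9s installs cleanly via Homebrew (formula name taken from the upstream project):

```shell
brew install k9s
k9s                   # whole-cluster view
k9s -n kube-system    # scoped to a single namespace
```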

Quick Verify

Deploy a pod, inspect it, and clean up — confirms the full local stack is working:

# Run a test pod
kubectl run hello --image=nginx --port=80

# Confirm it reaches Running status
kubectl get pods

# Forward a local port to the pod
kubectl port-forward pod/hello 8080:80
# Visit http://localhost:8080 — should see the nginx welcome page

# Cleanup
kubectl delete pod hello

Architecture Overview

Kubernetes is a distributed system for automating deployment, scaling, and management of containerized workloads. The cluster is split into the control plane (brain) and worker nodes (muscle).

Control Plane Components

kube-apiserver: The single entry point for all cluster operations. Validates and processes REST requests, persists state to etcd. Horizontally scalable.
etcd: Distributed key-value store. The only stateful component; all cluster state lives here. Run 3 or 5 members for HA (odd number for quorum).
kube-scheduler: Watches for unscheduled pods and assigns them to nodes, considering resource requests, affinity/taints, and topology. Pluggable.
kube-controller-manager: Runs the control loops (node controller, replication controller, endpoints controller, service account controller, etc.) in one process.
cloud-controller-manager: Cloud-specific control loops (load balancer provisioning, node lifecycle, routes). Separates cloud logic from core k8s.

Worker Node Components

kubelet: Agent on every node. Watches PodSpecs assigned to its node, ensures containers are running and healthy, and reports node/pod status to the API server.
kube-proxy: Maintains network rules (iptables or IPVS) for the Service abstraction and routes traffic to the correct pod. Can be replaced by CNI-level proxying (e.g., Cilium).
Container runtime: Executes containers, e.g. containerd (the common default) or CRI-O. Direct Docker Engine support (dockershim) was removed in k8s 1.24; Docker itself is built on containerd, so Docker-built images run unchanged.

Cluster Networking Model

Kubernetes mandates a flat network model:

- Every pod gets its own cluster-wide IP address.
- Every pod can reach every other pod without NAT.
- Agents on a node (kubelet, for example) can reach all pods on that node.

How traffic actually flows is delegated to a CNI plugin (Calico, Cilium, flannel, etc.).
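You can observe the flat model on any running cluster; each pod shows a routable, cluster-wide IP:

```shell
# -o wide adds the pod IP and node columns
kubectl get pods -o wide
```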

API Request Flow

# What happens when you run: kubectl apply -f deployment.yaml

# 1. kubectl reads kubeconfig (~/.kube/config), finds server URL + credentials
# 2. kubectl serializes the manifest to JSON, sends HTTP PATCH/POST to API server
# 3. API server authenticates (cert, token, webhook) → authorizes (RBAC) → admission controllers
# 4. API server validates the object schema
# 5. API server writes to etcd (two-phase: propose, commit)
# 6. Controllers watch for changes via informers (long-poll on /watch)
# 7. Deployment controller creates/updates ReplicaSet
# 8. ReplicaSet controller creates Pod objects
# 9. Scheduler watches unbound pods, selects node, writes nodeName to Pod spec
# 10. kubelet on chosen node watches for pods assigned to it
# 11. kubelet calls CRI (containerd) to pull image and start container
# 12. kubelet updates Pod status (Running, IP address, etc.)
Declarative vs Imperative
Kubernetes is declarative: you describe desired state, controllers reconcile actual state to match. This is the "control loop" pattern: observe → diff → act. Never mutate running resources directly in production — always update manifests and re-apply.
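In practice the declarative loop looks like this (assuming deployment.yaml holds your desired state):

```shell
# Preview how live state differs from the manifest, then reconcile
kubectl diff -f deployment.yaml
kubectl apply -f deployment.yaml

# Avoid imperative mutation in production, e.g.:
# kubectl edit deployment/myapp
```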

Core Concepts

Pods

A pod is the smallest deployable unit — a group of one or more containers sharing network (same IP, port space) and storage. Containers in a pod communicate via localhost.

# Minimal pod — rarely created directly in production
apiVersion: v1
kind: Pod
metadata:
  name: myapp
  namespace: default
  labels:
    app: myapp
    version: "1.0"
spec:
  containers:
  - name: main
    image: nginx:1.25
    ports:
    - containerPort: 80
    resources:
      requests:
        cpu: "100m"      # 0.1 CPU cores
        memory: "128Mi"
      limits:
        cpu: "500m"
        memory: "256Mi"

---
# Multi-container pod: main app + sidecar
apiVersion: v1
kind: Pod
metadata:
  name: app-with-sidecar
spec:
  # Init containers run to completion before main containers start
  initContainers:
  - name: db-migration
    image: myapp:migrate
    command: ["./migrate", "--up"]
    envFrom:
    - secretRef:
        name: db-credentials

  containers:
  - name: app
    image: myapp:1.0
    ports:
    - containerPort: 8080

  - name: log-shipper             # Sidecar: shares filesystem with main
    image: fluentbit:2.2
    volumeMounts:
    - name: app-logs
      mountPath: /var/log/app

  volumes:
  - name: app-logs
    emptyDir: {}

Pod Lifecycle Phases

Pending: Pod accepted, but containers not yet running. Scheduling or image pull in progress.
Running: At least one container is running, starting, or restarting.
Succeeded: All containers exited with status 0 and won't restart (typical for Job pods).
Failed: All containers terminated, at least one with a non-zero exit code.
Unknown: Pod state can't be determined (node communication lost).

Labels, Selectors, and Annotations

metadata:
  labels:
    app: myapp              # Identifies the application
    env: production         # Environment
    version: "2.1.0"        # Release version
    tier: frontend          # Logical tier
    team: payments          # Owning team
  annotations:
    # Annotations: non-identifying metadata, no selector support
    # Can hold larger values (URLs, JSON, multi-line strings)
    kubernetes.io/change-cause: "Bumped image to fix CVE-2024-1234"
    prometheus.io/scrape: "true"
    prometheus.io/port: "8080"
    deployment.kubernetes.io/revision: "3"
# Label selector syntax
kubectl get pods -l app=myapp
kubectl get pods -l 'env in (production,staging)'
kubectl get pods -l 'version notin (1.0,1.1)'
kubectl get pods -l '!canary'              # Does NOT have 'canary' label
kubectl get pods -l app=myapp,env=prod     # AND logic

# Set-based selectors (used in Job, Deployment, etc.)
# matchLabels: { app: myapp }              # equality
# matchExpressions:
#   - { key: env, operator: In, values: [production, staging] }
#   - { key: canary, operator: DoesNotExist }

Namespaces

# Built-in namespaces
# default       — resources without a namespace
# kube-system   — k8s system components
# kube-public   — publicly readable, used for cluster info
# kube-node-lease — node heartbeat leases (performance)

kubectl create namespace production
kubectl get namespaces

# Set default namespace for current context
kubectl config set-context --current --namespace=production

# Cross-namespace DNS: <service>.<namespace>.svc.cluster.local
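A quick way to test resolution from inside the cluster (pod and service names here are examples):

```shell
kubectl run -it --rm dns-test --image=busybox:1.36 --restart=Never -- \
  nslookup myapp.production.svc.cluster.local
```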

Resource Quotas and LimitRanges

# ResourceQuota: caps total resource consumption in a namespace
apiVersion: v1
kind: ResourceQuota
metadata:
  name: production-quota
  namespace: production
spec:
  hard:
    pods: "50"
    requests.cpu: "20"
    requests.memory: "40Gi"
    limits.cpu: "40"
    limits.memory: "80Gi"
    persistentvolumeclaims: "20"
    services.loadbalancers: "5"

---
# LimitRange: sets default/min/max per pod or container
apiVersion: v1
kind: LimitRange
metadata:
  name: default-limits
  namespace: production
spec:
  limits:
  - type: Container
    default:           # Applied if container doesn't specify limits
      cpu: "500m"
      memory: "256Mi"
    defaultRequest:    # Applied if container doesn't specify requests
      cpu: "100m"
      memory: "128Mi"
    max:
      cpu: "4"
      memory: "8Gi"
    min:
      cpu: "50m"
      memory: "64Mi"

Workloads

Deployments

Deployments manage stateless applications. They own a ReplicaSet, which owns pods. Rolling updates replace pods one batch at a time.

apiVersion: apps/v1
kind: Deployment
metadata:
  name: myapp
  namespace: production
  annotations:
    kubernetes.io/change-cause: "Release v2.1 — add payment retry"
spec:
  replicas: 3
  selector:
    matchLabels:
      app: myapp          # Must match pod template labels
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxSurge: 1         # Extra pods above desired during update
      maxUnavailable: 0   # No downtime: always maintain 3 pods
  template:
    metadata:
      labels:
        app: myapp
    spec:
      terminationGracePeriodSeconds: 60
      containers:
      - name: app
        image: myapp:2.1
        ports:
        - containerPort: 8080
        resources:
          requests:
            cpu: "200m"
            memory: "256Mi"
          limits:
            cpu: "1"
            memory: "512Mi"
        readinessProbe:
          httpGet:
            path: /healthz
            port: 8080
          initialDelaySeconds: 5
          periodSeconds: 10
        livenessProbe:
          httpGet:
            path: /healthz
            port: 8080
          initialDelaySeconds: 30
          periodSeconds: 15
# Rollout commands
kubectl rollout status deployment/myapp
kubectl rollout history deployment/myapp
kubectl rollout history deployment/myapp --revision=3

# Rollback to previous revision
kubectl rollout undo deployment/myapp
kubectl rollout undo deployment/myapp --to-revision=2

# Pause/resume (useful for canary-style manual gates)
kubectl rollout pause deployment/myapp
kubectl rollout resume deployment/myapp

# Scale
kubectl scale deployment myapp --replicas=5

StatefulSets

StatefulSets provide stable network identities (pod-0, pod-1, ...), stable persistent storage (each pod keeps its PVC on reschedule), and ordered deployment/scaling. Use for databases, message queues, distributed caches.

apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: postgres
spec:
  serviceName: "postgres"   # Headless service name — required for DNS
  replicas: 3
  selector:
    matchLabels:
      app: postgres
  template:
    metadata:
      labels:
        app: postgres
    spec:
      containers:
      - name: postgres
        image: postgres:16
        ports:
        - containerPort: 5432
          name: postgres
        env:
        - name: POSTGRES_PASSWORD
          valueFrom:
            secretKeyRef:
              name: postgres-secret
              key: password
        volumeMounts:
        - name: data
          mountPath: /var/lib/postgresql/data
  volumeClaimTemplates:       # Each pod gets its own PVC
  - metadata:
      name: data
    spec:
      accessModes: ["ReadWriteOnce"]
      storageClassName: "fast-ssd"
      resources:
        requests:
          storage: 20Gi
StatefulSet DNS
Each pod gets a stable DNS entry: pod-name.service-name.namespace.svc.cluster.local. For a StatefulSet named postgres with headless service postgres in namespace default: postgres-0.postgres.default.svc.cluster.local.
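Because each replica has its own DNS name, clients can target a specific instance, such as the primary. A sketch, assuming the StatefulSet above and a throwaway psql client pod:

```shell
kubectl run -it --rm pg-client --image=postgres:16 --restart=Never -- \
  psql -h postgres-0.postgres.default.svc.cluster.local -U postgres
```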

DaemonSets

DaemonSets ensure one pod per node (or per selected nodes). Used for log collectors, monitoring agents, CNI plugins, node-level security tooling.

apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: fluentbit
  namespace: logging
spec:
  selector:
    matchLabels:
      app: fluentbit
  updateStrategy:
    type: RollingUpdate          # or OnDelete (manual)
    rollingUpdate:
      maxUnavailable: 1
  template:
    metadata:
      labels:
        app: fluentbit
    spec:
      tolerations:
      - key: node-role.kubernetes.io/control-plane
        operator: Exists
        effect: NoSchedule        # Run on control plane nodes too
      containers:
      - name: fluentbit
        image: fluent/fluent-bit:2.2
        volumeMounts:
        - name: varlog
          mountPath: /var/log
          readOnly: true
        - name: docker-containers
          mountPath: /var/lib/docker/containers
          readOnly: true
      volumes:
      - name: varlog
        hostPath:
          path: /var/log
      - name: docker-containers
        hostPath:
          path: /var/lib/docker/containers

Jobs and CronJobs

# Job: run to completion
apiVersion: batch/v1
kind: Job
metadata:
  name: db-migration
spec:
  completions: 1          # Number of successful pod completions needed
  parallelism: 1          # Number of pods to run in parallel
  backoffLimit: 3         # Retry up to 3 times on failure
  activeDeadlineSeconds: 600   # Kill job if not done in 10 min
  ttlSecondsAfterFinished: 3600  # Auto-clean up 1h after finish
  template:
    spec:
      restartPolicy: Never      # OnFailure or Never for Jobs (not Always)
      containers:
      - name: migrate
        image: myapp:2.1
        command: ["./migrate", "--up"]
        envFrom:
        - secretRef:
            name: db-credentials

---
# CronJob: scheduled jobs
apiVersion: batch/v1
kind: CronJob
metadata:
  name: report-generator
spec:
  schedule: "0 2 * * *"        # At 02:00 every day (cron syntax)
  timeZone: "America/New_York"  # k8s 1.27+
  concurrencyPolicy: Forbid     # Allow, Forbid, or Replace
  successfulJobsHistoryLimit: 3
  failedJobsHistoryLimit: 1
  startingDeadlineSeconds: 300  # Skip if missed by 5 minutes
  jobTemplate:
    spec:
      template:
        spec:
          restartPolicy: OnFailure
          containers:
          - name: reporter
            image: reporter:1.0

Pod Disruption Budgets

# PDB: limits voluntary disruptions (node drain, cluster upgrades)
# Ensures minimum availability during maintenance
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: myapp-pdb
spec:
  minAvailable: 2           # At least 2 pods must be available
  # OR: maxUnavailable: 1   # At most 1 pod unavailable at once
  selector:
    matchLabels:
      app: myapp
PDB Gotcha
PDBs only protect against voluntary disruptions (drains, evictions). Hardware failures are involuntary and bypass PDBs. Also: if minAvailable equals replicas, node drains will block forever.
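During maintenance you can check the budget's headroom before draining (node name is an example):

```shell
kubectl get pdb myapp-pdb        # ALLOWED DISRUPTIONS column shows headroom
kubectl drain node1 --ignore-daemonsets --delete-emptydir-data
```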

Services & Networking

Service Types

ClusterIP (cluster-internal only): Default. Pod-to-pod communication within the cluster.
NodePort (external via node IP:port, 30000–32767): Development, on-prem without a load balancer. Exposes the port on every node.
LoadBalancer (external via cloud LB): Production cloud deployments. Creates a cloud load balancer automatically.
ExternalName (DNS CNAME alias): Maps a service name to an external DNS name (e.g., an external database FQDN).

# ClusterIP (default)
apiVersion: v1
kind: Service
metadata:
  name: myapp
spec:
  type: ClusterIP         # Omit = ClusterIP
  selector:
    app: myapp            # Routes to pods with this label
  ports:
  - name: http
    port: 80              # Service port (what clients connect to)
    targetPort: 8080      # Container port (can be named: targetPort: http)
    protocol: TCP

---
# LoadBalancer with annotations (AWS EKS example)
apiVersion: v1
kind: Service
metadata:
  name: myapp-lb
  annotations:
    service.beta.kubernetes.io/aws-load-balancer-type: "nlb"
    service.beta.kubernetes.io/aws-load-balancer-scheme: "internet-facing"
spec:
  type: LoadBalancer
  selector:
    app: myapp
  ports:
  - port: 443
    targetPort: 8080

---
# ExternalName: routes to external service
apiVersion: v1
kind: Service
metadata:
  name: prod-database
spec:
  type: ExternalName
  externalName: db.prod.example.com

Headless Services

# Headless: clusterIP: None — no VIP, DNS returns pod IPs directly
# Required for StatefulSets; enables direct pod addressing
apiVersion: v1
kind: Service
metadata:
  name: postgres-headless
spec:
  clusterIP: None         # This makes it headless
  selector:
    app: postgres
  ports:
  - port: 5432
    targetPort: 5432

Ingress

apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: myapp-ingress
  annotations:
    nginx.ingress.kubernetes.io/rewrite-target: /
    nginx.ingress.kubernetes.io/ssl-redirect: "true"
    cert-manager.io/cluster-issuer: "letsencrypt-prod"
spec:
  ingressClassName: nginx     # Which IngressClass to use
  tls:
  - hosts:
    - api.example.com
    secretName: api-tls-cert  # cert-manager populates this
  rules:
  - host: api.example.com
    http:
      paths:
      - path: /v1
        pathType: Prefix
        backend:
          service:
            name: api-v1
            port:
              number: 80
      - path: /v2
        pathType: Prefix
        backend:
          service:
            name: api-v2
            port:
              number: 80
  - host: admin.example.com
    http:
      paths:
      - path: /
        pathType: Prefix
        backend:
          service:
            name: admin-ui
            port:
              number: 80

Gateway API

Gateway API, GA since its v1.0 release (late 2023), is the successor to Ingress. It separates infrastructure concerns (GatewayClass, Gateway) from routing concerns (HTTPRoute).

apiVersion: gateway.networking.k8s.io/v1
kind: HTTPRoute
metadata:
  name: myapp-route
spec:
  parentRefs:
  - name: prod-gateway     # Reference to a Gateway object
    namespace: infra
  hostnames:
  - "api.example.com"
  rules:
  - matches:
    - path:
        type: PathPrefix
        value: /api
    backendRefs:
    - name: myapp
      port: 80
      weight: 90           # Canary: 90% to stable
    - name: myapp-canary
      port: 80
      weight: 10
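For completeness, a sketch of the Gateway that the route's parentRefs points at. The gatewayClassName is a placeholder; it depends on which controller you run (Istio, Envoy Gateway, NGINX Gateway Fabric, ...):

```yaml
apiVersion: gateway.networking.k8s.io/v1
kind: Gateway
metadata:
  name: prod-gateway
  namespace: infra
spec:
  gatewayClassName: example-class   # provided by your gateway controller
  listeners:
  - name: https
    protocol: HTTPS
    port: 443
    hostname: "*.example.com"
    tls:
      mode: Terminate
      certificateRefs:
      - name: example-com-tls
    allowedRoutes:
      namespaces:
        from: All        # let HTTPRoutes in other namespaces attach
```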

Network Policies

Network policies control ingress/egress traffic between pods. Default: all traffic allowed — policies are additive whitelists. Requires a CNI that enforces policies (Calico, Cilium, Weave).

# Default-deny all ingress in a namespace, then allow selectively
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: default-deny-ingress
  namespace: production
spec:
  podSelector: {}           # Matches all pods in namespace
  policyTypes:
  - Ingress

---
# Allow ingress only from pods with specific labels, and from a namespace
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-api-ingress
  namespace: production
spec:
  podSelector:
    matchLabels:
      app: database
  policyTypes:
  - Ingress
  - Egress
  ingress:
  - from:
    - podSelector:
        matchLabels:
          app: api           # From pods labeled app=api
    - namespaceSelector:
        matchLabels:
          kubernetes.io/metadata.name: monitoring  # From monitoring NS
    ports:
    - protocol: TCP
      port: 5432
  egress:
  - to:
    - ipBlock:
        cidr: 0.0.0.0/0
        except:
        - 169.254.169.254/32  # Block AWS metadata service
    ports:
    - protocol: TCP
      port: 443
Network Policy Pitfall
A podSelector: {} matches ALL pods in the namespace. If a direction is listed in policyTypes but the policy defines no rules for it, all traffic in that direction is denied for the selected pods. Conversely, an ingress/egress rule with an empty or omitted from/to matches everything. A direction not listed in policyTypes is left unrestricted by the policy.
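The deny-all vs allow-all distinction is easy to get backwards, so a side-by-side sketch:

```yaml
# Deny all ingress to all pods: Ingress is listed, no rules given
spec:
  podSelector: {}
  policyTypes: [Ingress]

---
# Allow all ingress: a single empty rule matches every source
spec:
  podSelector: {}
  policyTypes: [Ingress]
  ingress:
  - {}
```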

Configuration

ConfigMaps

# Create from literal values
kubectl create configmap app-config \
  --from-literal=LOG_LEVEL=info \
  --from-literal=MAX_CONNECTIONS=100

# Create from a file (key = filename)
kubectl create configmap nginx-config --from-file=nginx.conf

# Create from env file (dotenv format)
kubectl create configmap app-env --from-env-file=.env.production
apiVersion: v1
kind: ConfigMap
metadata:
  name: app-config
data:
  LOG_LEVEL: "info"
  MAX_CONNECTIONS: "100"
  # Multi-line values: pipe preserves newlines
  config.yaml: |
    server:
      port: 8080
      timeout: 30s
    database:
      pool_size: 10

---
# Using ConfigMap in a pod
spec:
  containers:
  - name: app
    image: myapp:1.0
    # Option 1: Inject all keys as env vars
    envFrom:
    - configMapRef:
        name: app-config

    # Option 2: Inject specific keys
    env:
    - name: LOG_LEVEL
      valueFrom:
        configMapKeyRef:
          name: app-config
          key: LOG_LEVEL

    # Option 3: Mount as a volume (files)
    volumeMounts:
    - name: config-volume
      mountPath: /etc/config
      readOnly: true

  volumes:
  - name: config-volume
    configMap:
      name: app-config

Secrets

Secrets Are Not Encrypted by Default
Secrets are base64-encoded in etcd — not encrypted. Enable encryption at rest via EncryptionConfiguration, or use external secret managers (Vault, AWS Secrets Manager via External Secrets Operator).
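A minimal sketch of an EncryptionConfiguration (the key value is a placeholder; the file is passed to the API server via its --encryption-provider-config flag):

```yaml
apiVersion: apiserver.config.k8s.io/v1
kind: EncryptionConfiguration
resources:
- resources: ["secrets"]
  providers:
  - aescbc:
      keys:
      - name: key1
        secret: "<base64-encoded 32-byte key>"
  - identity: {}     # fallback so still-unencrypted data stays readable
```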
# Create generic secret
kubectl create secret generic db-creds \
  --from-literal=username=admin \
  --from-literal=password='s3cur3P@ss'

# Create TLS secret
kubectl create secret tls my-tls \
  --cert=tls.crt \
  --key=tls.key

# Create docker-registry secret
kubectl create secret docker-registry regcred \
  --docker-server=registry.example.com \
  --docker-username=myuser \
  --docker-password=mypass
apiVersion: v1
kind: Secret
metadata:
  name: db-credentials
type: Opaque            # Generic; also: kubernetes.io/tls, kubernetes.io/dockerconfigjson
data:
  username: YWRtaW4=   # base64 encoded "admin"
  password: czNjdXIzUEBzcw==
# Use stringData for human-readable (auto-encoded on apply)
stringData:
  connection-string: "postgresql://admin:s3cur3P@ss@db:5432/mydb"

---
# Mount secret as volume (files are updated automatically when secret changes)
spec:
  containers:
  - name: app
    volumeMounts:
    - name: db-secret
      mountPath: /etc/secrets
      readOnly: true
  volumes:
  - name: db-secret
    secret:
      secretName: db-credentials
      defaultMode: 0400    # File permissions
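The values under data: are plain base64, producible with standard tools (the -n matters: without it, a trailing newline gets encoded too):

```shell
echo -n 'admin' | base64             # prints YWRtaW4=
echo 'YWRtaW4=' | base64 --decode    # prints admin
```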

Immutable ConfigMaps and Secrets

# Immutable = cannot be changed after creation (must delete + recreate)
# Improves performance: kubelet doesn't need to watch for changes
apiVersion: v1
kind: ConfigMap
metadata:
  name: app-config-v2
immutable: true
data:
  VERSION: "2.0"

Storage

Volumes

emptyDir (pod lifetime): Scratch space, inter-container file sharing. Survives container restarts but is lost when the pod is removed.
hostPath (node lifetime): Access the node filesystem. Avoid in production: not portable, security risk.
configMap (until the ConfigMap is deleted): Mount config files into pods.
secret (until the Secret is deleted): Mount credentials as files.
projected (mixed): Combine multiple sources (secret + configMap + serviceAccountToken) into one mount.
persistentVolumeClaim (claim lifetime): Durable storage that survives pod restarts and rescheduling.
nfs (external): NFS server. ReadWriteMany access for shared data.

spec:
  volumes:
  - name: tmp
    emptyDir:
      medium: Memory        # RAM-backed tmpfs; "" = disk
      sizeLimit: 512Mi

  - name: projected-vol
    projected:
      sources:
      - secret:
          name: db-credentials
      - configMap:
          name: app-config
      - serviceAccountToken:
          path: token
          expirationSeconds: 3600
          audience: my-service

PersistentVolumes and PersistentVolumeClaims

# PersistentVolume: cluster-scoped, provisioned by admin or dynamically
apiVersion: v1
kind: PersistentVolume
metadata:
  name: pv-example
spec:
  capacity:
    storage: 100Gi
  volumeMode: Filesystem
  accessModes:
  - ReadWriteOnce           # RWO: single node read-write
  # - ReadOnlyMany          # ROX: multiple nodes read-only
  # - ReadWriteMany         # RWX: multiple nodes read-write
  # - ReadWriteOncePod      # RWOP: k8s 1.22+, single pod only
  persistentVolumeReclaimPolicy: Retain  # Retain, Recycle (deprecated), Delete
  storageClassName: fast-ssd
  csi:
    driver: ebs.csi.aws.com
    volumeHandle: vol-0a1b2c3d4e5f

---
# PersistentVolumeClaim: namespace-scoped, user requests storage
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: database-storage
  namespace: production
spec:
  accessModes:
  - ReadWriteOnce
  storageClassName: fast-ssd   # Match a StorageClass for dynamic provisioning
  resources:
    requests:
      storage: 50Gi

---
# Use PVC in a pod
spec:
  volumes:
  - name: data
    persistentVolumeClaim:
      claimName: database-storage
  containers:
  - name: app
    volumeMounts:
    - name: data
      mountPath: /data

StorageClasses

# StorageClass: defines how to dynamically provision storage
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: fast-ssd
  annotations:
    storageclass.kubernetes.io/is-default-class: "true"  # Default SC
provisioner: ebs.csi.aws.com    # CSI driver
parameters:
  type: gp3
  iops: "3000"
  throughput: "125"
  encrypted: "true"
  kmsKeyId: "arn:aws:kms:..."
reclaimPolicy: Delete           # Delete PV when PVC deleted
volumeBindingMode: WaitForFirstConsumer  # Delay binding until pod scheduled
allowVolumeExpansion: true
WaitForFirstConsumer
Use volumeBindingMode: WaitForFirstConsumer for zonal storage (EBS, GCP PD). This ensures the PV is created in the same AZ as the pod, preventing unschedulable pods due to AZ mismatch.

Scheduling

Node Selectors and Node Affinity

spec:
  # Simple: nodeSelector (equality only)
  nodeSelector:
    kubernetes.io/os: linux
    node.kubernetes.io/instance-type: m5.xlarge

  affinity:
    nodeAffinity:
      # requiredDuringSchedulingIgnoredDuringExecution: HARD rule
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
        - matchExpressions:
          - key: topology.kubernetes.io/zone
            operator: In
            values: [us-east-1a, us-east-1b]
          - key: node.kubernetes.io/instance-type
            operator: NotIn
            values: [t3.nano, t3.micro]

      # preferredDuringSchedulingIgnoredDuringExecution: SOFT rule
      preferredDuringSchedulingIgnoredDuringExecution:
      - weight: 100          # Higher weight = stronger preference
        preference:
          matchExpressions:
          - key: cloud.google.com/gke-nodepool
            operator: In
            values: [high-memory]

Pod Affinity and Anti-Affinity

affinity:
  # Co-locate pods on the same node as pods with label app=cache
  podAffinity:
    preferredDuringSchedulingIgnoredDuringExecution:
    - weight: 50
      podAffinityTerm:
        labelSelector:
          matchLabels:
            app: cache
        topologyKey: kubernetes.io/hostname

  # Spread replicas across different nodes (anti-affinity for HA)
  podAntiAffinity:
    requiredDuringSchedulingIgnoredDuringExecution:
    - labelSelector:
        matchLabels:
          app: myapp
      topologyKey: kubernetes.io/hostname   # One replica per node
    preferredDuringSchedulingIgnoredDuringExecution:
    - weight: 100
      podAffinityTerm:
        labelSelector:
          matchLabels:
            app: myapp
        topologyKey: topology.kubernetes.io/zone  # Prefer different AZs

Taints and Tolerations

Taints repel pods from nodes. Tolerations allow pods to be scheduled on tainted nodes.

# Taint a node: key=value:effect
kubectl taint nodes node1 dedicated=gpu:NoSchedule
kubectl taint nodes node1 maintenance=true:NoExecute  # Evicts running pods too

# Effects:
# NoSchedule:       Don't schedule new pods without toleration
# PreferNoSchedule: Soft NoSchedule
# NoExecute:        Don't schedule + evict existing pods without toleration

# Remove a taint (append -)
kubectl taint nodes node1 dedicated:NoSchedule-
spec:
  tolerations:
  - key: "dedicated"
    operator: "Equal"
    value: "gpu"
    effect: "NoSchedule"

  # Tolerate any taint with this key regardless of value
  - key: "node.kubernetes.io/not-ready"
    operator: "Exists"
    effect: "NoExecute"
    tolerationSeconds: 300   # Evict after 5 min if node stays not-ready

Topology Spread Constraints

# Distribute pods evenly across zones and nodes
spec:
  topologySpreadConstraints:
  - maxSkew: 1                      # Max difference between zones
    topologyKey: topology.kubernetes.io/zone
    whenUnsatisfiable: DoNotSchedule  # or ScheduleAnyway
    labelSelector:
      matchLabels:
        app: myapp
  - maxSkew: 1
    topologyKey: kubernetes.io/hostname
    whenUnsatisfiable: DoNotSchedule
    labelSelector:
      matchLabels:
        app: myapp

Priority Classes

# PriorityClass: higher value = higher priority, can preempt lower-priority pods
apiVersion: scheduling.k8s.io/v1
kind: PriorityClass
metadata:
  name: high-priority
value: 1000000
globalDefault: false
preemptionPolicy: PreemptLowerPriority  # or Never
description: "Critical production services"

---
# Use in pod spec
spec:
  priorityClassName: high-priority

Scaling

Horizontal Pod Autoscaler (HPA)

apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: myapp-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: myapp
  minReplicas: 2
  maxReplicas: 20
  metrics:
  # CPU: scale when avg CPU utilization exceeds 70% of requests
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 70

  # Memory
  - type: Resource
    resource:
      name: memory
      target:
        type: AverageValue
        averageValue: 400Mi

  # Custom metric from Prometheus Adapter
  - type: Pods
    pods:
      metric:
        name: http_requests_per_second
      target:
        type: AverageValue
        averageValue: "100"

  behavior:
    scaleDown:
      stabilizationWindowSeconds: 300    # Wait 5 min before scaling down
      policies:
      - type: Percent
        value: 25
        periodSeconds: 60
    scaleUp:
      stabilizationWindowSeconds: 0     # Scale up immediately
      policies:
      - type: Percent
        value: 100
        periodSeconds: 15
HPA Requirements
HPA requires Metrics Server for CPU/memory. For custom metrics, deploy Prometheus Adapter or KEDA. CPU-based HPA only works if pods have resources.requests.cpu set — without requests, utilization % can't be calculated.
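The scaling decision itself follows the standard HPA formula: desiredReplicas = ceil(currentReplicas * currentMetric / targetMetric). A quick integer-arithmetic check of one case:

```shell
# 4 replicas averaging 90% CPU against a 70% target:
# ceil(4 * 90 / 70) = ceil(5.14) = 6
echo $(( (4 * 90 + 70 - 1) / 70 ))   # prints 6
```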

Vertical Pod Autoscaler (VPA)

apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: myapp-vpa
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: myapp
  updatePolicy:
    updateMode: "Auto"    # Off | Initial | Recreate | Auto
    # Auto: evicts pods to apply recommendations (downtime!)
    # Initial: only set on new pods
    # Off: only recommend, never apply
  resourcePolicy:
    containerPolicies:
    - containerName: app
      minAllowed:
        cpu: 50m
        memory: 64Mi
      maxAllowed:
        cpu: "4"
        memory: 8Gi
VPA + HPA Conflict
Do not use VPA (Auto mode) and HPA (CPU/memory) on the same deployment. They conflict. Use VPA for right-sizing, or HPA for horizontal scaling — not both on the same resource metric.

KEDA (Event-Driven Autoscaling)

KEDA extends HPA with 50+ scalers: Kafka lag, SQS queue depth, Redis lists, Prometheus queries, cron schedules, and more. It can also scale to zero.

apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: worker-scaler
spec:
  scaleTargetRef:
    name: worker-deployment
  minReplicaCount: 0      # Scale to zero when idle
  maxReplicaCount: 50
  triggers:
  - type: kafka
    metadata:
      bootstrapServers: kafka:9092
      consumerGroup: my-consumer-group
      topic: events
      lagThreshold: "100"   # 1 replica per 100 messages of lag

Health Checks & Lifecycle

Probes

livenessProbe (on failure: restart the container): Detects deadlock. Set initialDelaySeconds generously.
readinessProbe (on failure: remove the pod from Service endpoints): Detects when the app is ready to receive traffic. Expected to fail during startup and brief overload.
startupProbe (on failure: restart the container): For slow-starting apps. Suppresses liveness/readiness checks until it first succeeds, then the normal probes take over.

spec:
  containers:
  - name: app
    image: myapp:1.0

    # Startup probe: allow up to 5 min (30 * 10s) to start
    startupProbe:
      httpGet:
        path: /healthz
        port: 8080
        httpHeaders:
        - name: Custom-Header
          value: startup-check
      failureThreshold: 30
      periodSeconds: 10

    # Liveness: restart if unhealthy
    livenessProbe:
      httpGet:
        path: /healthz
        port: 8080
      initialDelaySeconds: 0   # Startup probe handles the delay
      periodSeconds: 15
      timeoutSeconds: 5
      failureThreshold: 3      # Restart after 3 consecutive failures
      successThreshold: 1

    # Readiness: only receive traffic when ready
    readinessProbe:
      httpGet:
        path: /ready
        port: 8080
      initialDelaySeconds: 0
      periodSeconds: 5
      timeoutSeconds: 3
      failureThreshold: 2
      successThreshold: 1

    # Exec probe: run command inside container
    # livenessProbe:
    #   exec:
    #     command: ["redis-cli", "ping"]

    # TCP probe: check port is open
    # livenessProbe:
    #   tcpSocket:
    #     port: 5432

    # gRPC probe (k8s 1.24+)
    # livenessProbe:
    #   grpc:
    #     port: 50051
    #     service: "grpc.health.v1.Health"

Lifecycle Hooks and Graceful Shutdown

spec:
  terminationGracePeriodSeconds: 60   # Time allowed for graceful shutdown

  containers:
  - name: app
    lifecycle:
      postStart:
        # Runs immediately after container starts (async, no guarantee before traffic)
        exec:
          command: ["/bin/sh", "-c", "echo started > /tmp/started"]

      preStop:
        # Runs before SIGTERM — use to delay shutdown or drain connections
        # Critical: add a sleep to allow Service endpoint removal propagation
        exec:
          command: ["/bin/sh", "-c", "sleep 5 && nginx -s quit"]
        # OR httpGet preStop
        # httpGet:
        #   path: /shutdown
        #   port: 8080
Graceful Shutdown Pattern
Shutdown sequence: (1) Pod marked for deletion; (2) Removed from Service endpoints — this is async and takes a few seconds; (3) preStop hook runs; (4) SIGTERM sent; (5) Grace period; (6) SIGKILL. Add a sleep 5 in preStop to allow endpoint propagation before your app stops accepting new connections. Without this, you get request errors during rolling updates.
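On the application side, the process must actually handle SIGTERM rather than exit immediately. A minimal entrypoint sketch (the kill below simulates the kubelet so the example is self-contained; a real entrypoint would simply run the server in the foreground):

```shell
#!/bin/sh
# Entrypoint sketch: trap SIGTERM and drain before exiting
drain() {
  echo "SIGTERM received: draining in-flight requests"
  exit 0
}
trap drain TERM
echo "serving"
kill -TERM $$   # stand-in for the kubelet starting termination
sleep 5         # the pending trap fires here, before the wait completes
```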

RBAC & Security

ServiceAccounts

# ServiceAccount: identity for pod processes to call the k8s API
apiVersion: v1
kind: ServiceAccount
metadata:
  name: myapp-sa
  namespace: production
  annotations:
    # EKS: IRSA — map to IAM role
    eks.amazonaws.com/role-arn: "arn:aws:iam::123456789:role/myapp-role"
    # GKE: Workload Identity
    # iam.gke.io/gcp-service-account: "[email protected]"
automountServiceAccountToken: false  # Disable auto-mount if not needed

---
spec:
  serviceAccountName: myapp-sa
  automountServiceAccountToken: false  # Also overridable at pod level

Roles and ClusterRoles

# Role: namespace-scoped permissions
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: pod-reader
  namespace: production
rules:
- apiGroups: [""]              # "" = core API group
  resources: ["pods", "pods/log"]
  verbs: ["get", "list", "watch"]
- apiGroups: ["apps"]
  resources: ["deployments"]
  verbs: ["get", "list", "watch", "update", "patch"]

---
# ClusterRole: cluster-wide OR reusable across namespaces
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: secret-reader
rules:
- apiGroups: [""]
  resources: ["secrets"]
  verbs: ["get", "list"]
  # Restrict to specific resource names:
  # resourceNames: ["allowed-secret-name"]

---
# RoleBinding: bind Role or ClusterRole to subjects in a namespace
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: read-pods
  namespace: production
subjects:
- kind: ServiceAccount
  name: myapp-sa
  namespace: production
- kind: User
  name: "[email protected]"
  apiGroup: rbac.authorization.k8s.io
- kind: Group
  name: "dev-team"
  apiGroup: rbac.authorization.k8s.io
roleRef:
  kind: Role               # ClusterRole can also be referenced here
  name: pod-reader
  apiGroup: rbac.authorization.k8s.io

---
# ClusterRoleBinding: cluster-wide binding
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: cluster-admin-binding
subjects:
- kind: User
  name: [email protected]
  apiGroup: rbac.authorization.k8s.io
roleRef:
  kind: ClusterRole
  name: cluster-admin
  apiGroup: rbac.authorization.k8s.io
# Test RBAC permissions
kubectl auth can-i create deployments --namespace production
kubectl auth can-i create deployments --as system:serviceaccount:production:myapp-sa
kubectl auth can-i --list --namespace production  # List all permissions

Security Contexts

spec:
  # Pod-level security context
  securityContext:
    runAsNonRoot: true
    runAsUser: 1000
    runAsGroup: 3000
    fsGroup: 2000           # Volume files owned by this GID
    seccompProfile:
      type: RuntimeDefault  # Apply seccomp filter

  containers:
  - name: app
    # Container-level overrides pod-level
    securityContext:
      allowPrivilegeEscalation: false
      readOnlyRootFilesystem: true  # Immutable container filesystem
      capabilities:
        drop: ["ALL"]         # Drop all Linux capabilities
        add: ["NET_BIND_SERVICE"]  # Add only what's needed

Pod Security Standards (replaces PodSecurityPolicy)

| Level | Restrictions | Use Case |
| --- | --- | --- |
| privileged | Unrestricted | System/infrastructure pods only |
| baseline | Minimal restrictions; prevents known privilege escalations | General workloads |
| restricted | Heavily restricted; follows pod hardening best practices | Security-sensitive workloads |
# Apply Pod Security Standards to a namespace via label
kubectl label namespace production \
  pod-security.kubernetes.io/enforce=restricted \
  pod-security.kubernetes.io/warn=restricted \
  pod-security.kubernetes.io/audit=restricted

kubectl Essentials

Core Commands

# Get resources
kubectl get pods                          # All pods in current namespace
kubectl get pods -n kube-system           # Specific namespace
kubectl get pods -A                       # All namespaces
kubectl get pods -o wide                  # Show node, IP
kubectl get pods -o yaml                  # Full YAML output
kubectl get pods --watch                  # Stream changes

# Describe: human-readable detail + events
kubectl describe pod myapp-7d9f8b-xyz
kubectl describe node worker-1

# Logs
kubectl logs myapp-7d9f8b-xyz
kubectl logs myapp-7d9f8b-xyz -c sidecar  # Specific container
kubectl logs myapp-7d9f8b-xyz --previous  # Previous container (after crash)
kubectl logs -l app=myapp --all-containers=true  # All pods with label
kubectl logs myapp-7d9f8b-xyz --tail=100 -f      # Follow last 100 lines

# Exec into container
kubectl exec -it myapp-7d9f8b-xyz -- /bin/bash
kubectl exec -it myapp-7d9f8b-xyz -c sidecar -- sh

# Port forwarding (local debugging without exposing service)
kubectl port-forward pod/myapp-7d9f8b-xyz 8080:8080
kubectl port-forward svc/myapp 8080:80
kubectl port-forward deployment/myapp 8080:8080

Apply vs Create

# apply: declarative, idempotent, tracks changes via annotation
# Use for ongoing management of resources
kubectl apply -f deployment.yaml
kubectl apply -f ./k8s/                    # All files in directory
kubectl apply -k ./overlays/production/   # Kustomize directory

# create: imperative, fails if resource exists
# Use for one-time creation
kubectl create -f deployment.yaml

# diff: preview changes before applying
kubectl diff -f deployment.yaml

# dry-run: validate without applying
kubectl apply -f deployment.yaml --dry-run=client   # Local validation only
kubectl apply -f deployment.yaml --dry-run=server   # Sends to API server (full validation)

# delete
kubectl delete -f deployment.yaml
kubectl delete pod myapp-7d9f8b-xyz
kubectl delete pod myapp-7d9f8b-xyz --grace-period=0 --force  # Emergency only

Output Formatting

# Custom columns
kubectl get pods -o custom-columns=\
  NAME:.metadata.name,\
  STATUS:.status.phase,\
  NODE:.spec.nodeName,\
  IP:.status.podIP

# JSONPath: extract specific fields
kubectl get pods -o jsonpath='{.items[*].metadata.name}'
kubectl get pods -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.status.phase}{"\n"}{end}'

# Get all container images in cluster
kubectl get pods -A -o jsonpath='{range .items[*]}{range .spec.containers[*]}{.image}{"\n"}{end}{end}' | sort -u

# Resource usage (requires metrics-server)
kubectl top pods
kubectl top pods --sort-by=cpu
kubectl top nodes

Context and Namespace Management

# Contexts
kubectl config get-contexts
kubectl config use-context my-cluster-prod
kubectl config current-context

# Set namespace for current context
kubectl config set-context --current --namespace=production

# Useful tools (install separately)
# kubectx — fast context switching: kubectx prod
# kubens  — fast namespace switching: kubens production

# Useful aliases
alias k='kubectl'
alias kgp='kubectl get pods'
alias kgs='kubectl get svc'
alias kgd='kubectl get deployments'
alias kns='kubectl config set-context --current --namespace'

# Shell completion
source <(kubectl completion zsh)  # or bash

Editing and Patching

# Edit live resource (opens $EDITOR)
kubectl edit deployment myapp

# Patch: targeted update without full manifest
# JSON merge patch
kubectl patch deployment myapp -p '{"spec":{"replicas":5}}'

# Strategic merge patch (for arrays: lists by name key)
kubectl patch deployment myapp --type=strategic -p \
  '{"spec":{"template":{"spec":{"containers":[{"name":"app","image":"myapp:2.0"}]}}}}'

# JSON patch (RFC 6902: explicit operations)
kubectl patch deployment myapp --type=json -p \
  '[{"op":"replace","path":"/spec/replicas","value":5}]'

# Label and annotate
kubectl label pod myapp-xyz tier=frontend
kubectl annotate deployment myapp description="Main API service"
kubectl label node worker-1 node-type=gpu  # Add label to node

Debugging & Troubleshooting

Common Pod Problems

| State | Cause | Fix |
| --- | --- | --- |
| ImagePullBackOff | Image not found, wrong tag, no registry credentials | Check image name/tag. Create imagePullSecret. Verify registry access. |
| CrashLoopBackOff | Container exits (crash or OOM); k8s keeps restarting with exponential backoff | kubectl logs --previous. Check OOMKilled in events. |
| Pending | No node satisfies scheduling constraints (resources, taints, affinity) | kubectl describe pod → Events. Check node capacity. |
| OOMKilled | Container exceeded memory limit | Increase limits.memory. Check for memory leak. Use VPA recommendations. |
| ContainerCreating | Volume mount pending (PVC not bound, secret not found) | Check PVC status. Verify secret names. |
| Terminating (stuck) | preStop hook hanging, finalizer not cleared | Force delete: --grace-period=0 --force. Check finalizers. |
# Diagnostic workflow
kubectl get pods                                  # See state
kubectl describe pod myapp-7d9f8b-xyz             # Events + conditions
kubectl logs myapp-7d9f8b-xyz --previous          # Logs from crashed container
kubectl get events --sort-by=.lastTimestamp       # Cluster-wide event stream
kubectl get events -n production --field-selector reason=OOMKilling

# Check resource pressure on nodes
kubectl describe node worker-1 | grep -A5 "Conditions:"
kubectl describe node worker-1 | grep -A20 "Allocated resources:"

# DNS debugging: run a debug pod with network tools
kubectl run debug --image=nicolaka/netshoot -it --rm -- bash
# Inside: nslookup myapp.production.svc.cluster.local
# Inside: curl http://myapp.production.svc.cluster.local/healthz
# Inside: dig @10.96.0.10 myapp.production.svc.cluster.local  # CoreDNS IP

# Ephemeral debug container (k8s 1.23+, doesn't modify running pod)
kubectl debug -it myapp-7d9f8b-xyz \
  --image=busybox \
  --target=app           # Share process namespace with 'app' container

# Copy a pod spec for debugging (adds debug container, changes command)
kubectl debug myapp-7d9f8b-xyz -it --copy-to=debug-pod --image=busybox

Network Debugging

# Check service endpoints (are pods being selected?)
kubectl get endpoints myapp
kubectl describe svc myapp        # Check selector matches pod labels

# Test connectivity from within cluster
kubectl run curl-test --image=curlimages/curl -it --rm -- \
  curl http://myapp.default.svc.cluster.local/healthz

# Check DNS resolution
kubectl run dns-test --image=busybox -it --rm -- \
  nslookup kubernetes.default.svc.cluster.local

# List all network policies in namespace
kubectl get networkpolicies -n production

# Check kube-proxy iptables rules (on node)
iptables -t nat -L KUBE-SERVICES | grep myapp

Resource Exhaustion

# Find resource-hungry pods
kubectl top pods -A --sort-by=cpu | head -20
kubectl top pods -A --sort-by=memory | head -20

# Check for OOMKilled pods
kubectl get pods -A -o json | \
  jq '.items[] | select(.status.containerStatuses[]?.lastState.terminated.reason=="OOMKilled") | .metadata.name'

# Find pods without resource requests (scheduling/QoS risk)
kubectl get pods -A -o json | \
  jq '.items[] | select(.spec.containers[].resources.requests == null) | .metadata.name'

# Check PVC status
kubectl get pvc -A
kubectl describe pvc database-storage  # If Pending: check StorageClass, provisioner

Helm

Helm is the package manager for Kubernetes. A chart is a package of pre-configured Kubernetes resources. A release is a running instance of a chart in a cluster.

Chart Structure

mychart/
├── Chart.yaml          # Chart metadata (name, version, dependencies)
├── values.yaml         # Default configuration values
├── values.schema.json  # Optional JSON schema for values validation
├── charts/             # Chart dependencies (subcharts)
├── templates/          # Kubernetes manifest templates
│   ├── deployment.yaml
│   ├── service.yaml
│   ├── ingress.yaml
│   ├── configmap.yaml
│   ├── _helpers.tpl    # Named templates (partials), not rendered directly
│   ├── NOTES.txt       # Post-install instructions
│   └── tests/
│       └── test-connection.yaml
└── .helmignore
# Chart.yaml
apiVersion: v2
name: myapp
description: My application Helm chart
type: application       # or library
version: 1.2.3          # Chart version (semver)
appVersion: "2.1.0"     # App version (informational)
dependencies:
- name: postgresql
  version: "12.x.x"
  repository: "https://charts.bitnami.com/bitnami"
  condition: postgresql.enabled

Core Commands

# Repository management
helm repo add bitnami https://charts.bitnami.com/bitnami
helm repo update
helm search repo postgresql
helm search hub redis           # Search Artifact Hub

# Install / Upgrade
helm install myapp ./mychart --namespace production --create-namespace
helm install myapp ./mychart -f values.yaml -f values.production.yaml
helm install myapp bitnami/postgresql --set primary.persistence.size=20Gi

# upgrade --install: idempotent (install or upgrade)
helm upgrade --install myapp ./mychart \
  --namespace production \
  --values values.yaml \
  --set image.tag="2.1.0" \
  --atomic \
  --timeout 5m      # --atomic rolls back automatically on failure

# Dry run
helm upgrade --install myapp ./mychart --dry-run

# Rollback
helm history myapp -n production
helm rollback myapp 3 -n production      # Roll back to revision 3

# Status and debugging
helm list -n production
helm status myapp -n production
helm get values myapp -n production      # Show applied values
helm get manifest myapp -n production    # Show rendered manifests

# Uninstall
helm uninstall myapp -n production
helm uninstall myapp -n production --keep-history   # Keep release history

Templates

# templates/deployment.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: {{ include "mychart.fullname" . }}
  labels:
    {{- include "mychart.labels" . | nindent 4 }}
spec:
  replicas: {{ .Values.replicaCount }}
  selector:
    matchLabels:
      {{- include "mychart.selectorLabels" . | nindent 6 }}
  template:
    metadata:
      labels:
        {{- include "mychart.selectorLabels" . | nindent 8 }}
    spec:
      containers:
      - name: {{ .Chart.Name }}
        image: "{{ .Values.image.repository }}:{{ .Values.image.tag | default .Chart.AppVersion }}"
        {{- if .Values.resources }}
        resources:
          {{- toYaml .Values.resources | nindent 10 }}
        {{- end }}
        env:
        {{- range .Values.env }}
        - name: {{ .name }}
          value: {{ .value | quote }}
        {{- end }}
        {{- with .Values.nodeSelector }}
        nodeSelector:
          {{- toYaml . | nindent 8 }}
        {{- end }}
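The template above pulls everything from values.yaml. A matching sketch of the values it expects (all values illustrative):

```yaml
# values.yaml — illustrative defaults consumed by the template above
replicaCount: 3
image:
  repository: myregistry/myapp
  tag: ""             # empty: template falls back to .Chart.AppVersion
resources:
  requests:
    cpu: 100m
    memory: 128Mi
  limits:
    memory: 256Mi
env:
- name: LOG_LEVEL
  value: info
nodeSelector: {}
```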
# templates/_helpers.tpl
{{/*
Expand the name of the chart.
*/}}
{{- define "mychart.name" -}}
{{- default .Chart.Name .Values.nameOverride | trunc 63 | trimSuffix "-" }}
{{- end }}

{{/*
Common labels
*/}}
{{- define "mychart.labels" -}}
helm.sh/chart: {{ include "mychart.chart" . }}
{{ include "mychart.selectorLabels" . }}
app.kubernetes.io/version: {{ .Chart.AppVersion | quote }}
app.kubernetes.io/managed-by: {{ .Release.Service }}
{{- end }}
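The templates also call mychart.fullname, mychart.chart, and mychart.selectorLabels, which are not shown above. The standard helm create scaffold defines them roughly as follows (simplified sketch; the real scaffold adds fullnameOverride handling):

```yaml
{{- define "mychart.fullname" -}}
{{- printf "%s-%s" .Release.Name (include "mychart.name" .) | trunc 63 | trimSuffix "-" }}
{{- end }}

{{- define "mychart.chart" -}}
{{- printf "%s-%s" .Chart.Name .Chart.Version | replace "+" "_" | trunc 63 | trimSuffix "-" }}
{{- end }}

{{- define "mychart.selectorLabels" -}}
app.kubernetes.io/name: {{ include "mychart.name" . }}
app.kubernetes.io/instance: {{ .Release.Name }}
{{- end }}
```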

Hooks

# Hook: run at specific lifecycle event
apiVersion: batch/v1
kind: Job
metadata:
  name: "{{ .Release.Name }}-migrate"
  annotations:
    "helm.sh/hook": pre-upgrade,pre-install   # When to run
    "helm.sh/hook-weight": "-5"               # Order (lower runs first)
    "helm.sh/hook-delete-policy": hook-succeeded  # Cleanup after
spec:
  template:
    spec:
      restartPolicy: Never
      containers:
      - name: migrate
        image: {{ .Values.image.repository }}:{{ .Values.image.tag }}
        command: ["./migrate", "--up"]

# Hook annotations:
# helm.sh/hook: pre-install, post-install, pre-delete, post-delete,
#               pre-upgrade, post-upgrade, pre-rollback, post-rollback, test

Deployment Strategies

Rolling Update (Default)

spec:
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxSurge: 1         # Can temporarily have 1 extra pod
      maxUnavailable: 0   # Zero-downtime: never go below desired count
      # maxSurge and maxUnavailable can also be percentages: "25%"

Blue-Green Deployment

Run two identical environments. Switch traffic instantaneously. Enables instant rollback.

# Deploy "green" alongside existing "blue"
# Blue deployment (current production)
apiVersion: apps/v1
kind: Deployment
metadata:
  name: myapp-blue
spec:
  replicas: 3
  selector:
    matchLabels:
      app: myapp
      slot: blue
  template:
    metadata:
      labels:
        app: myapp
        slot: blue
    spec:
      containers:
      - name: app
        image: myapp:1.0

---
# Green deployment (new version)
apiVersion: apps/v1
kind: Deployment
metadata:
  name: myapp-green
spec:
  replicas: 3
  selector:
    matchLabels:
      app: myapp
      slot: green
  template:
    metadata:
      labels:
        app: myapp
        slot: green
    spec:
      containers:
      - name: app
        image: myapp:2.0

---
# Service: switch by patching selector
apiVersion: v1
kind: Service
metadata:
  name: myapp
spec:
  selector:
    app: myapp
    slot: blue    # Change to "green" to cut over
# Switch traffic from blue to green
kubectl patch svc myapp -p '{"spec":{"selector":{"app":"myapp","slot":"green"}}}'

# Rollback: switch back to blue
kubectl patch svc myapp -p '{"spec":{"selector":{"app":"myapp","slot":"blue"}}}'

Canary Deployment

# Canary: route small % of traffic to new version
# Stable: 9 replicas, Canary: 1 replica = 10% traffic to canary

# Stable deployment
apiVersion: apps/v1
kind: Deployment
metadata:
  name: myapp-stable
spec:
  replicas: 9
  selector:
    matchLabels:
      app: myapp
      track: stable
  template:
    metadata:
      labels:
        app: myapp
        track: stable
    spec:
      containers:
      - name: app
        image: myapp:1.0

---
# Canary deployment
apiVersion: apps/v1
kind: Deployment
metadata:
  name: myapp-canary
spec:
  replicas: 1
  selector:
    matchLabels:
      app: myapp
      track: canary
  template:
    metadata:
      labels:
        app: myapp
        track: canary
    spec:
      containers:
      - name: app
        image: myapp:2.0

---
# Service selects BOTH (only on shared label)
apiVersion: v1
kind: Service
metadata:
  name: myapp
spec:
  selector:
    app: myapp    # Matches both stable and canary pods
Header-Based Canary with Argo Rollouts or Istio
For more precise canary control (header-based routing, weighted traffic without replica count dependency), use Argo Rollouts or Istio VirtualService with weight fields. This decouples traffic percentage from pod count.
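For example, an Istio VirtualService can split traffic by weight regardless of replica counts (a sketch, assuming the mesh is installed and separate myapp-stable/myapp-canary Services exist):

```yaml
# Hypothetical 90/10 weighted split, independent of pod counts
apiVersion: networking.istio.io/v1beta1
kind: VirtualService
metadata:
  name: myapp
spec:
  hosts:
  - myapp
  http:
  - route:
    - destination:
        host: myapp-stable
      weight: 90
    - destination:
        host: myapp-canary
      weight: 10
```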

GitOps

ArgoCD and Flux implement GitOps: the desired state lives in Git, and the operator continuously reconciles the cluster to match. No manual kubectl apply in production.

# ArgoCD Application
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: myapp
  namespace: argocd
spec:
  project: default
  source:
    repoURL: https://github.com/myorg/gitops-repo
    targetRevision: HEAD
    path: apps/myapp/overlays/production
  destination:
    server: https://kubernetes.default.svc
    namespace: production
  syncPolicy:
    automated:
      prune: true       # Delete resources removed from Git
      selfHeal: true    # Fix manual changes (drift correction)
    syncOptions:
    - CreateNamespace=true
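Flux expresses the same idea with a GitRepository source plus a Kustomization that reconciles it (a sketch; the URL and path are illustrative):

```yaml
apiVersion: source.toolkit.fluxcd.io/v1
kind: GitRepository
metadata:
  name: gitops-repo
  namespace: flux-system
spec:
  interval: 1m
  url: https://github.com/myorg/gitops-repo
  ref:
    branch: main
---
apiVersion: kustomize.toolkit.fluxcd.io/v1
kind: Kustomization
metadata:
  name: myapp
  namespace: flux-system
spec:
  interval: 5m
  sourceRef:
    kind: GitRepository
    name: gitops-repo
  path: ./apps/myapp/overlays/production
  prune: true      # same prune/drift-correction semantics as the ArgoCD example
```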

Monitoring & Observability

QoS Classes

Kubernetes assigns a QoS class based on resource specifications. This determines eviction priority when a node is under memory pressure.

| QoS Class | Condition | Eviction Priority |
| --- | --- | --- |
| Guaranteed | Every container sets requests == limits for both CPU and memory | Last to evict |
| Burstable | At least one container has requests or limits set, but the pod doesn't meet the Guaranteed criteria | Middle priority |
| BestEffort | No container has any requests or limits | First to evict |
Guaranteed QoS for Critical Pods
Set requests == limits for CPU and memory on critical pods (databases, payment services). This makes scheduling deterministic and prevents OOM eviction. For batch workloads, BestEffort or Burstable is fine.
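A pod qualifies for Guaranteed only when every container's requests and limits match exactly:

```yaml
# Guaranteed QoS: requests and limits identical for every container
spec:
  containers:
  - name: app
    image: myapp:1.0
    resources:
      requests:
        cpu: 500m
        memory: 512Mi
      limits:
        cpu: 500m
        memory: 512Mi
```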

Prometheus + Grafana Stack

# Expose metrics for Prometheus scraping
apiVersion: v1
kind: Service
metadata:
  name: myapp
  annotations:
    prometheus.io/scrape: "true"
    prometheus.io/path: "/metrics"
    prometheus.io/port: "8080"

---
# PodMonitor (kube-prometheus-stack / Prometheus Operator)
apiVersion: monitoring.coreos.com/v1
kind: PodMonitor
metadata:
  name: myapp
  namespace: monitoring
spec:
  selector:
    matchLabels:
      app: myapp
  namespaceSelector:
    matchNames:
    - production
  podMetricsEndpoints:
  - port: metrics
    path: /metrics
    interval: 30s
# Install kube-prometheus-stack via Helm (Prometheus + Grafana + AlertManager)
helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
helm upgrade --install kube-prometheus-stack \
  prometheus-community/kube-prometheus-stack \
  --namespace monitoring --create-namespace \
  --set grafana.adminPassword=changeme

Logging Architecture

Kubernetes containers log to stdout/stderr. The container runtime writes those streams to files under /var/log/pods/ on each node (symlinked from /var/log/containers/), and the kubelet serves them to kubectl logs. Three collection patterns: (1) a node-level agent — typically a DaemonSet such as Fluent Bit or Fluentd — tails the node's log files and ships them to a backend; (2) a sidecar container in each pod collects or reshapes the app's logs; (3) the application pushes logs directly to the logging backend.

# Access logs via kubectl (kubelet's log endpoint)
kubectl logs myapp-xyz --tail=200 -f
kubectl logs myapp-xyz --since=1h
kubectl logs myapp-xyz --since-time="2024-01-15T10:00:00Z"

# Stern: multi-pod log tailing (install separately)
stern myapp --namespace production --tail=50

# kubetail: tail multiple pods by label
kubetail -l app=myapp -n production

Operators & CRDs

Custom Resource Definitions

CRDs extend the Kubernetes API with domain-specific resource types. Once installed, you can kubectl get, apply, and watch them like built-in resources.

apiVersion: apiextensions.k8s.io/v1
kind: CustomResourceDefinition
metadata:
  name: databases.db.example.com    # plural.group
spec:
  group: db.example.com
  versions:
  - name: v1
    served: true
    storage: true          # Only one version can be the storage version
    schema:
      openAPIV3Schema:
        type: object
        properties:
          spec:
            type: object
            required: [engine, version, storage]
            properties:
              engine:
                type: string
                enum: [postgres, mysql]
              version:
                type: string
              storage:
                type: string
                pattern: '^[0-9]+Gi$'
    subresources:
      status: {}           # Enables .status subresource
    additionalPrinterColumns:
    - name: Engine
      type: string
      jsonPath: .spec.engine
    - name: Age
      type: date
      jsonPath: .metadata.creationTimestamp
  scope: Namespaced        # or Cluster
  names:
    plural: databases
    singular: database
    kind: Database
    shortNames: [db]
# Using the custom resource
apiVersion: db.example.com/v1
kind: Database
metadata:
  name: production-db
spec:
  engine: postgres
  version: "16"
  storage: 100Gi

Operator Pattern

An operator is a controller that watches custom resources and reconciles cluster state. It encodes operational knowledge (provisioning, scaling, backup, failover) in code.
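The core of every operator is the reconcile loop: observe actual state, compare against desired state, act to close the gap, repeat. Caricatured in shell (purely illustrative — no real API calls):

```shell
# Reconcile-loop caricature: converge "current" replicas toward "desired"
desired=3
current=1
while [ "$current" -lt "$desired" ]; do
  echo "observed=$current desired=$desired -> creating one replica"
  current=$((current + 1))
done
echo "reconciled at $current replicas"
```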

# Popular operators in production
# cert-manager:       Automates TLS certificate issuance/renewal (Let's Encrypt, ACME)
# external-dns:       Sync Kubernetes Services/Ingresses to DNS providers (Route53, Cloudflare)
# prometheus-operator: Manage Prometheus/Alertmanager instances via CRDs
# external-secrets:   Sync secrets from Vault, AWS SSM, GCP Secret Manager
# crossplane:         Provision cloud resources (RDS, S3, etc.) via Kubernetes CRDs

# Install cert-manager
helm repo add jetstack https://charts.jetstack.io
helm upgrade --install cert-manager jetstack/cert-manager \
  --namespace cert-manager --create-namespace \
  --set installCRDs=true
# cert-manager: automatic TLS for Ingress
apiVersion: cert-manager.io/v1
kind: ClusterIssuer
metadata:
  name: letsencrypt-prod
spec:
  acme:
    server: https://acme-v02.api.letsencrypt.org/directory
    email: [email protected]
    privateKeySecretRef:
      name: letsencrypt-prod
    solvers:
    - http01:
        ingress:
          class: nginx
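An Ingress then opts in with a single annotation; cert-manager creates and renews the certificate into the named Secret (hostname below is illustrative):

```yaml
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: myapp
  annotations:
    cert-manager.io/cluster-issuer: letsencrypt-prod
spec:
  ingressClassName: nginx
  tls:
  - hosts:
    - myapp.example.com
    secretName: myapp-tls    # cert-manager writes the certificate here
  rules:
  - host: myapp.example.com
    http:
      paths:
      - path: /
        pathType: Prefix
        backend:
          service:
            name: myapp
            port:
              number: 80
```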

Best Practices

Resource Management
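At the namespace level, a ResourceQuota caps total consumption and a LimitRange supplies defaults for containers that omit requests (all values illustrative):

```yaml
apiVersion: v1
kind: ResourceQuota
metadata:
  name: team-quota
spec:
  hard:
    requests.cpu: "20"
    requests.memory: 40Gi
    limits.memory: 60Gi
    pods: "50"
---
apiVersion: v1
kind: LimitRange
metadata:
  name: defaults
spec:
  limits:
  - type: Container
    defaultRequest:        # applied when a container sets no requests
      cpu: 100m
      memory: 128Mi
    default:               # applied when a container sets no limits
      memory: 256Mi
```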

High Availability Patterns

spec:
  replicas: 3              # Minimum 2 for HA, ideally odd for quorum-aware apps

  # Spread across nodes — hard requirement
  affinity:
    podAntiAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
      - labelSelector:
          matchLabels:
            app: myapp
        topologyKey: kubernetes.io/hostname

  # Spread across AZs — soft preference
  topologySpreadConstraints:
  - maxSkew: 1
    topologyKey: topology.kubernetes.io/zone
    whenUnsatisfiable: ScheduleAnyway
    labelSelector:
      matchLabels:
        app: myapp
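Pair the spread rules with a PodDisruptionBudget so voluntary disruptions (node drains, cluster upgrades) can never take down too many replicas at once:

```yaml
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: myapp-pdb
spec:
  minAvailable: 2        # or use maxUnavailable: 1
  selector:
    matchLabels:
      app: myapp
```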

Image Best Practices
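A few settings worth defaulting to (registry, tag, and secret names below are illustrative):

```yaml
spec:
  containers:
  - name: app
    # Pin an exact tag (or better, a sha256 digest) — never :latest,
    # which makes rollbacks and debugging non-reproducible
    image: registry.example.com/myapp:2.1.0
    imagePullPolicy: IfNotPresent
  imagePullSecrets:
  - name: registry-creds     # credentials for private registries
```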

Namespace Organization

| Pattern | Namespaces | Best For |
| --- | --- | --- |
| Per environment | dev, staging, production | Small teams, simple topology |
| Per team | payments, auth, platform | Multi-team clusters with isolation needs |
| Per environment + team | payments-prod, payments-staging | Large orgs with hard isolation requirements |

Label Conventions

# Recommended Kubernetes labels (app.kubernetes.io/)
metadata:
  labels:
    app.kubernetes.io/name: myapp            # App name
    app.kubernetes.io/instance: myapp-prod   # Unique instance
    app.kubernetes.io/version: "2.1.0"       # Current version
    app.kubernetes.io/component: api         # Component role
    app.kubernetes.io/part-of: platform      # Larger system
    app.kubernetes.io/managed-by: helm       # What manages this
    app.kubernetes.io/created-by: ci-system  # Who created it

Security Hardening Checklist

Common Pitfalls & Gotchas

OOMKilled — Memory Limits Too Low

# Diagnose OOMKilled
kubectl describe pod myapp-xyz | grep -A3 "Last State"
# Last State: Terminated
#   Reason: OOMKilled
#   Exit Code: 137

# Check actual memory usage vs limits
kubectl top pod myapp-xyz --containers
kubectl get events --field-selector reason=OOMKilling

# Fix: increase limits, or find the memory leak
# kubectl set resources deployment/myapp --limits=memory=1Gi

Pending Pods

# Check why pod is Pending
kubectl describe pod myapp-xyz
# Common reasons in Events:
# "Insufficient cpu" / "Insufficient memory" — no node has capacity
# "didn't match Pod's node affinity/selector" — no matching node
# "had taint {key:value} that the pod didn't tolerate"
# "pod has unbound immediate PersistentVolumeClaims"

# Check node capacity
kubectl describe nodes | grep -A10 "Allocated resources"

# Add more nodes or reduce requests
# For cluster autoscaler: check if it's scaling (look at CA events)
kubectl -n kube-system logs -l app=cluster-autoscaler --tail=50

DNS Resolution Issues

CoreDNS Overload
In large clusters, DNS queries can overwhelm CoreDNS. Symptoms: intermittent DNS failures, high CoreDNS CPU. Fixes: (1) NodeLocal DNSCache — run a DNS cache on every node; (2) increase CoreDNS replicas; (3) lower ndots from the default 5 in the pod's dnsConfig to reduce search-domain lookups.
# Reduce unnecessary DNS queries: set ndots to 2
spec:
  dnsConfig:
    options:
    - name: ndots
      value: "2"     # Default is 5 — causes 5 searches before absolute lookup
    - name: single-request-reopen  # Avoid race condition in some resolvers

PVC Stuck in Pending

# Check PVC status
kubectl describe pvc database-storage
# Causes:
# "no persistent volumes available" — no matching PV, no StorageClass provisioner
# "waiting for first consumer to be created" — WaitForFirstConsumer binding mode
# StorageClass not found
# CSI driver not running (check pods in kube-system)

# Check storageclass
kubectl get storageclass
kubectl describe storageclass fast-ssd | grep Provisioner

# Check CSI driver pods
kubectl get pods -n kube-system | grep csi

Ingress Not Routing

# Common causes:
# 1. Ingress controller not installed
kubectl get pods -n ingress-nginx | grep controller

# 2. Wrong ingressClassName
kubectl get ingressclass

# 3. Service/port mismatch in backend spec
kubectl describe ingress myapp-ingress

# 4. TLS cert not ready
kubectl describe certificate myapp-tls -n production  # cert-manager

# 5. Check ingress controller logs
kubectl logs -n ingress-nginx deploy/ingress-nginx-controller | tail -50

Secret Management Anti-Patterns
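The most common anti-pattern is hard-coding credentials in manifests that live in Git. A contrast sketch (names and values illustrative):

```yaml
# Anti-pattern: plaintext credential in the manifest (and therefore in Git)
# env:
# - name: DB_PASSWORD
#   value: "hunter2"

# Better: reference a Secret, ideally synced from a vault by external-secrets
env:
- name: DB_PASSWORD
  valueFrom:
    secretKeyRef:
      name: db-credentials
      key: password
```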

Resource Leaks and etcd Performance

# Orphaned resources accumulate over time and slow etcd
# Find completed/failed jobs (clean up regularly)
kubectl get jobs -A --field-selector status.successful=1
kubectl delete jobs -A --field-selector status.successful=1

# Find evicted pods
kubectl get pods -A | grep Evicted
kubectl get pods -A --field-selector status.phase=Failed

# Delete all evicted pods in a namespace
kubectl get pods -n production --field-selector status.phase=Failed \
  -o name | xargs kubectl delete -n production

# Set TTL on jobs to auto-clean
# spec.ttlSecondsAfterFinished: 3600  # In Job spec

# Monitor stored object count (>100k objects degrades etcd performance)
kubectl get --raw /metrics | grep apiserver_storage_objects  # named etcd_object_counts before k8s 1.22

Rolling Update Gotchas

maxUnavailable: 0 with 1 Replica
If replicas: 1 and maxUnavailable: 0, rolling updates require an extra pod slot (maxSurge must be > 0). If the node has no capacity for the surge pod, the deployment stalls. For single-replica services, you either need spare capacity for the surge pod or must accept brief downtime.
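For a single-replica Deployment, the safe combination is surge-based:

```yaml
spec:
  replicas: 1
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxSurge: 1          # new pod starts first...
      maxUnavailable: 0    # ...old pod stays until the new one is Ready
```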
# Check rollout status (detects stalled deploys)
kubectl rollout status deployment/myapp --timeout=5m

# Watch pod transitions during rollout
kubectl get pods -l app=myapp -w

# Check if HPA is fighting the rollout (scaling down your new pods)
kubectl get hpa myapp
# Pause HPA during rollout if needed (set min/max to desired count)

kubectl apply vs replace

Never Use kubectl replace in Production
kubectl replace does a full object replacement — it deletes fields not in your manifest, including server-set fields. Use kubectl apply (strategic merge) or kubectl patch. replace can delete status, finalizers, and controller-managed fields.
# Safe update pattern
kubectl diff -f deployment.yaml     # Always review before applying
kubectl apply -f deployment.yaml    # Apply changes
kubectl rollout status deployment/myapp  # Verify success

# If you must force-replace (e.g., immutable fields changed):
kubectl apply -f deployment.yaml --force  # Delete + recreate — causes downtime
# Or: delete + apply separately with controlled timing