Setup & Environment

These tools cover the entire debugging surface from DNS resolution to packet-level inspection. Most are available on macOS out of the box.

Built-in macOS Tools

# DNS lookup (both tools available by default)
dig google.com          # full DNS response with query time and server
nslookup google.com     # simpler output, good for quick checks

# Trace the network path (hop-by-hop)
traceroute google.com   # ICMP/UDP based, 30 hops max

# Network connections and listening ports
netstat -an | grep LISTEN      # all listening sockets
netstat -rn                    # routing table

# HTTP — already on macOS
curl -I https://httpbin.org/get     # HEAD request, show response headers
curl -v https://httpbin.org/get     # verbose: shows TLS handshake + headers

Install Extras

brew install httpie mtr nmap wireshark

# httpie — human-friendly HTTP client
http GET https://httpbin.org/get
http POST https://httpbin.org/post name=alice age:=30

# mtr — combines traceroute + ping, live updating
mtr google.com

# nmap — port scanner and service fingerprinter
nmap -sT localhost               # TCP connect scan, localhost
nmap -p 80,443,8080 example.com  # specific ports
nmap -sV example.com             # version detection

# wireshark — GUI packet capture (use tcpdump for CLI)
sudo tcpdump -i en0 port 443 -w capture.pcap
wireshark capture.pcap

Docker for Local Practice

# Create an isolated bridge network to simulate microservice networking
docker network create test-net

# Spin up two containers on the same network
docker run -d --name service-a --network test-net nginx
docker run -d --name service-b --network test-net nginx

# service-a can reach service-b by DNS name (Docker's embedded DNS)
docker exec service-a curl http://service-b

# Inspect the network (see IP assignments, subnet)
docker network inspect test-net

# Cleanup
docker rm -f service-a service-b
docker network rm test-net

httpie vs curl
httpie (http command) is a developer-friendly alternative to curl. It auto-formats JSON responses, adds syntax highlighting, and has a cleaner CLI syntax for setting headers and bodies. Use curl in scripts (it is always available), and http for interactive exploration.

OSI & TCP/IP Model

The OSI model provides a conceptual framework; TCP/IP is what actually runs on the internet. Knowing both is essential for mapping symptoms to layers during debugging.

Layer Mapping

OSI Layer | TCP/IP Layer | Protocols & Technologies | Unit
7 Application | Application | HTTP, HTTPS, gRPC, WebSocket, DNS, SMTP, FTP, SSH | Data / Message
6 Presentation | (Application) | TLS/SSL, encoding (JSON, Protobuf), compression | Data
5 Session | (Application) | Session management, RPC, WebSocket sessions | Data
4 Transport | Transport | TCP, UDP, QUIC | Segment / Datagram
3 Network | Internet | IP (IPv4/IPv6), ICMP, BGP, OSPF | Packet
2 Data Link | Network Access | Ethernet, Wi-Fi (802.11), ARP, VLANs | Frame
1 Physical | (Network Access) | Copper, fiber, radio waves, hubs | Bits

Layer-Based Debugging

When something is broken, think in layers from the bottom up:

  • L1-L2: Is the interface up at all? (ifconfig / ip link)
  • L3: Can you reach the host? (ping, traceroute)
  • L4: Is the port open and accepting connections? (nc, lsof, curl -v)
  • L7: Does the application respond correctly? (curl for HTTP, dig for DNS)

Interview Tip
When asked "what happens when you type a URL", walk the layers: DNS resolution (L7) → TCP connection (L4) → IP routing (L3) → TLS handshake (L6/L5) → HTTP request (L7). This shows you understand the full stack.
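The layer walk above can be expressed in code. A minimal sketch using only the standard library — `fetch_url_steps` is an illustrative name, not part of this guide's tooling:

```python
import socket
import ssl

def fetch_url_steps(host: str, port: int = 443, use_tls: bool = True) -> str:
    """Walk the layers for one request; returns the HTTP status line."""
    # 1. DNS resolution (application-layer query via the OS resolver)
    family, socktype, proto, _, addr = socket.getaddrinfo(
        host, port, type=socket.SOCK_STREAM)[0]

    # 2. TCP connection (the L4 three-way handshake happens inside connect)
    sock = socket.socket(family, socktype, proto)
    sock.settimeout(5)
    sock.connect(addr)

    # 3. TLS handshake (L6/L5) — skipped for plain-HTTP targets
    if use_tls:
        ctx = ssl.create_default_context()
        sock = ctx.wrap_socket(sock, server_hostname=host)

    # 4. HTTP request (L7)
    request = f"GET / HTTP/1.1\r\nHost: {host}\r\nConnection: close\r\n\r\n"
    sock.sendall(request.encode())
    status_line = sock.recv(1024).split(b"\r\n")[0].decode()
    sock.close()
    return status_line
```

Against a live HTTPS host, `fetch_url_steps("example.com")` should return a status line such as `HTTP/1.1 200 OK`.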

TCP vs UDP

TCP Three-Way Handshake

Client                              Server
  |                                   |
  |--- SYN (seq=x) ------------------>|   Client initiates connection
  |                                   |
  |<-- SYN-ACK (seq=y, ack=x+1) ------|   Server acknowledges + sends its seq
  |                                   |
  |--- ACK (ack=y+1) ---------------->|   Client confirms
  |                                   |
  |====== DATA TRANSFER ==============|   Connection established
  |                                   |
  |--- FIN -------------------------->|   Client initiates teardown
  |<-- ACK ---------------------------|
  |<-- FIN ---------------------------|   Server initiates its half-close
  |--- ACK -------------------------->|   Four-way teardown complete

TCP Flags

Flag | Meaning | Common Scenario
SYN | Synchronize sequence numbers | Connection initiation
ACK | Acknowledgment field valid | Every packet after handshake
FIN | No more data from sender | Graceful connection close
RST | Reset the connection immediately | Port unreachable, connection aborted
PSH | Push buffered data to application | Interactive sessions (SSH)
URG | Urgent pointer field significant | Rare in practice

Flow Control & Congestion Control

Flow control prevents the sender from overwhelming the receiver. The receiver advertises a receive window (rwnd) — the number of bytes it can buffer. The sender never sends more than rwnd unacknowledged bytes.

Congestion control prevents the sender from overwhelming the network:

  • Slow start: the congestion window (cwnd) starts small and doubles each RTT
  • Congestion avoidance: past a threshold, cwnd grows linearly (AIMD)
  • Fast retransmit/recovery: 3 duplicate ACKs trigger a resend without waiting for a timeout
  • On packet loss, cwnd shrinks — loss is treated as a congestion signal

UDP: Fire and Forget

UDP has no connection establishment, no ordering guarantees, and no retransmission. The header is only 8 bytes (vs TCP's 20+). Use it when latency matters more than delivery guarantees — DNS queries, live video, game state — or when the application layer handles reliability itself (as QUIC does).

Property | TCP | UDP
Connection | Connection-oriented (3-way handshake) | Connectionless
Reliability | Guaranteed delivery, ordering, dedup | Best-effort, no guarantees
Overhead | 20+ byte header, RTT for setup | 8 byte header, no setup
Flow control | Yes (sliding window) | No
Use cases | HTTP, SSH, databases, file transfer | DNS, video, gaming, QUIC
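The contrast above is visible with raw sockets. A hedged local sketch: the UDP sender below performs no handshake and receives no delivery confirmation — if the datagram were dropped, it would never know.

```python
import socket

# UDP needs no handshake: bind a receiver, then send a datagram immediately
receiver = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
receiver.bind(("127.0.0.1", 0))              # port 0 → OS picks a free port
receiver.settimeout(2)
port = receiver.getsockname()[1]

sender = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
sender.sendto(b"ping", ("127.0.0.1", port))  # fire and forget — no connect(), no ACK

data, addr = receiver.recvfrom(1024)
# Had this datagram been lost, the sender would never know — that is UDP's contract
sender.close()
receiver.close()
```

A TCP version of the same exchange would require `listen()`/`accept()` on the receiver and `connect()` on the sender — the three-way handshake — before any byte moves.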

DNS

Resolution Flow

Browser cache miss → OS cache miss → Recursive resolver (ISP or 8.8.8.8)
                                          |
                  ┌───────────────────────┼───────────────────────┐
                  v                       v                       v
            Root NS ( . )           TLD NS (.com)       Auth NS (example.com)
          "Who owns .com?"      "Try ns1.example.com"      "93.184.216.34"
                                          |
                       Recursive resolver caches result (TTL)
                                          |
                              Returns A record to OS
                                          |
                        OS caches in /etc/hosts-like store
                                          |
                             Browser connects to IP

Record Types

Record | Purpose | Example
A | IPv4 address | api.example.com → 93.184.216.34
AAAA | IPv6 address | api.example.com → 2606:2800::1
CNAME | Alias to another name | www → example.com
MX | Mail server with priority | 10 mail.example.com
TXT | Arbitrary text (SPF, DKIM, verification) | "v=spf1 include:..."
SRV | Service location (host + port) | _http._tcp → 0 5 80 api.example.com
NS | Authoritative nameservers for zone | ns1.cloudflare.com
PTR | Reverse DNS (IP → name) | Used by mail servers for spam checks

dig & nslookup Examples

# Basic A record lookup
dig google.com

# Specific record type
dig google.com MX
dig google.com TXT
dig google.com NS

# Query a specific resolver (bypass system resolver)
dig @8.8.8.8 google.com

# Short output (just the answer)
dig +short google.com

# Trace the full resolution path
dig +trace google.com

# Reverse lookup (IP to hostname)
dig -x 8.8.8.8

# Watch TTL count down (repeat the query against a caching resolver)
dig +noall +answer google.com

# nslookup equivalents (run with no arguments for interactive mode)
nslookup google.com
nslookup -type=MX google.com
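For programmatic lookups, the standard library covers A/AAAA resolution via the same OS resolver path nslookup uses by default. A small sketch — `resolve_a_records` is an illustrative helper; for other record types (MX, TXT, NS) you would reach for a resolver library such as dnspython:

```python
import socket

def resolve_a_records(hostname: str) -> list[str]:
    # Ask the OS resolver for IPv4 addresses (roughly `dig +short`)
    infos = socket.getaddrinfo(hostname, None,
                               family=socket.AF_INET,
                               type=socket.SOCK_STREAM)
    # Deduplicate: getaddrinfo may return one entry per socket type/protocol
    return sorted({info[4][0] for info in infos})
```

Note this consults the OS cache and /etc/hosts first, just like a browser would — it does not bypass local caching the way `dig @8.8.8.8` does.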

Internal DNS & Service Discovery

In Kubernetes, every Service gets a DNS entry automatically:

# Format: {service}.{namespace}.svc.cluster.local
# From within a pod:
curl http://payment-service.billing.svc.cluster.local:8080/charge

# Short form (within same namespace):
curl http://payment-service:8080/charge

# Kubernetes DNS server: CoreDNS (runs as pods in kube-system)
kubectl get pods -n kube-system | grep coredns

# Headless service (no cluster IP) — returns pod IPs directly
# Useful for StatefulSets (kafka, cassandra): each pod gets a stable DNS name
# kafka-0.kafka.default.svc.cluster.local

TTL and Caching Pitfalls
DNS changes don't propagate instantly. Old IPs can be cached at multiple levels: browser (often ignores TTL, caches 60s), OS resolver, corporate DNS caches. When doing a failover or IP change, lower TTL to 60s at least 24h in advance. After the change, wait for old TTL to expire before raising it back.

HTTP/1.1, HTTP/2, HTTP/3

Request & Response Anatomy

# HTTP/1.1 request (text-based, human-readable)
GET /api/users/42 HTTP/1.1
Host: api.example.com
Accept: application/json
Authorization: Bearer eyJhbGc...
User-Agent: MyApp/1.0

# HTTP response
HTTP/1.1 200 OK
Content-Type: application/json
Content-Length: 142
Cache-Control: max-age=60

{"id": 42, "name": "Alice"}
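The stdlib `http.client` module emits exactly the wire format shown above (request line, headers, blank line) and parses the response into status, headers, and body. A brief sketch — `get_with_headers` is a hypothetical helper name:

```python
import http.client

def get_with_headers(host: str, port: int, path: str = "/"):
    """Send a GET and return (status, headers, body)."""
    conn = http.client.HTTPConnection(host, port, timeout=5)
    # This writes: "GET {path} HTTP/1.1", "Host: ...", "Accept: ...", blank line
    conn.request("GET", path, headers={"Accept": "application/json"})
    resp = conn.getresponse()
    status, headers, body = resp.status, dict(resp.getheaders()), resp.read()
    conn.close()
    return status, headers, body
```

Running it against a real API and printing `headers` shows the same response anatomy as the example above (Content-Type, Content-Length, Cache-Control, ...).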

Version Comparison

Feature | HTTP/1.1 | HTTP/2 | HTTP/3
Transport | TCP | TCP | QUIC (UDP)
Multiplexing | No (one request at a time per connection) | Yes (streams on one conn) | Yes (independent streams)
HOL blocking | Yes (request level) | Yes (TCP level) | No (per-stream)
Header compression | None (repeated verbatim) | HPACK (Huffman + table) | QPACK
Server push | No | Yes (rarely used) | Yes
Connection setup | 1 RTT + TLS | 1 RTT + TLS (or 0-RTT) | 0-RTT or 1-RTT
Binary | No (text) | Yes (framing layer) | Yes

HTTP/2 Key Features

  • Multiplexing: many concurrent streams over one TCP connection — no more domain sharding
  • HPACK header compression: repeated headers sent once, then referenced by table index
  • Binary framing: requests/responses split into typed frames (HEADERS, DATA)
  • Stream prioritization: clients can hint which resources matter most

HTTP/3 & QUIC

QUIC is a transport protocol built on UDP that reimplements TCP's reliability and TLS 1.3's security, but with independent stream delivery. Key advantages:

  • No TCP head-of-line blocking: a lost packet stalls only its own stream
  • Faster setup: transport and crypto handshakes are combined (0-RTT or 1-RTT)
  • Connection migration: connections survive IP changes (Wi-Fi → cellular) via connection IDs

Key Status Codes

Code | Meaning | Notes
200 OK | Success | GET, POST, PUT responses
201 Created | Resource created | POST that created a resource; include Location header
204 No Content | Success, no body | DELETE, PUT with no response body
301 Moved Permanently | Redirect (cached) | Browser caches indefinitely; hard to undo
302 Found | Redirect (temporary) | Not cached; use for auth redirects
304 Not Modified | Use cached version | ETag/If-None-Match or Last-Modified
400 Bad Request | Client error | Malformed JSON, missing required field
401 Unauthorized | Not authenticated | Missing or invalid token; include WWW-Authenticate
403 Forbidden | Not authorized | Authenticated but lacks permission
404 Not Found | Resource absent | Never leak whether resource exists to unauthorized callers
409 Conflict | State conflict | Duplicate create, optimistic lock failure
422 Unprocessable Entity | Validation error | Syntactically valid but semantically wrong
429 Too Many Requests | Rate limited | Include Retry-After header
500 Internal Server Error | Server crashed | Never expose stack traces
502 Bad Gateway | Upstream error | Reverse proxy got bad response from backend
503 Service Unavailable | Overloaded/down | Include Retry-After; used during deployments
504 Gateway Timeout | Upstream timeout | Backend took too long; check for slow queries

TLS & HTTPS

TLS 1.3 Handshake (1-RTT)

Client                                      Server
  |                                           |
  |--- ClientHello (supported ciphers, ------>|
  |    key_share, random)                     |
  |                                           |
  |<-- ServerHello (chosen cipher, -----------|
  |    key_share, Certificate,                |
  |    CertificateVerify, Finished)           |
  |                                           |
  |--- Finished (client auth if mTLS) ------->|
  |                                           |
  |===== Encrypted Application Data ==========|   (1 RTT total)

TLS 1.2 required 2 RTTs; TLS 1.3 reduces this to 1 RTT.
0-RTT (early data): the client can send data in its first flight using a session
ticket from a prior connection. Risk: replay attacks.

Certificates & CA Chain

A TLS certificate proves: "this public key belongs to this domain". The chain of trust works like this:

# Inspect a live TLS certificate
openssl s_client -connect google.com:443 -servername google.com < /dev/null \
  | openssl x509 -noout -text | grep -E "Subject:|Issuer:|Not After|DNS:"

# Check certificate chain
openssl s_client -connect google.com:443 -showcerts < /dev/null

# Verify expiration date
echo | openssl s_client -servername example.com -connect example.com:443 2>/dev/null \
  | openssl x509 -noout -dates

# Decode a local cert
openssl x509 -in cert.pem -noout -text
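Expiry checks can also be scripted: Python's `ssl` module parses the fixed-format date strings that `getpeercert()` returns for `notBefore`/`notAfter`. A sketch — the date value below is made up for illustration:

```python
import ssl
from datetime import datetime, timezone

# getpeercert()["notAfter"] strings use this fixed GMT format; value is hypothetical
not_after = "Jun  1 12:00:00 2031 GMT"

expiry_ts = ssl.cert_time_to_seconds(not_after)           # epoch seconds (input is GMT)
expires = datetime.fromtimestamp(expiry_ts, tz=timezone.utc)
days_left = (expires - datetime.now(tz=timezone.utc)).days
# Alert well before expiry — e.g., warn when days_left < 30
```

In practice you would pull `not_after` from `ssl.SSLSocket.getpeercert()` after a live handshake, which is the programmatic equivalent of the `openssl x509 -noout -dates` check above.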

Let's Encrypt & ACME

# certbot (ACME client) issues free 90-day certificates
# HTTP-01 challenge: LE places a file at /.well-known/acme-challenge/{token}
# DNS-01 challenge: LE adds a TXT record to your DNS zone (needed for wildcards)

# Issue cert for a domain (nginx)
certbot --nginx -d example.com -d www.example.com

# Standalone (no web server running)
certbot certonly --standalone -d example.com

# Auto-renew (run in cron or systemd timer)
certbot renew --quiet

# In Kubernetes: cert-manager handles ACME automatically
# annotations on Ingress trigger Certificate object creation

mTLS: Mutual TLS for Service-to-Service

In standard TLS, only the server presents a certificate. In mTLS, both client and server present certificates. This provides cryptographic proof of identity on both sides — no passwords or API keys needed.

# Generate a private CA for internal services
openssl genrsa -out ca.key 4096
openssl req -new -x509 -days 3650 -key ca.key -out ca.crt -subj "/CN=MyInternalCA"

# Issue a cert for service-a
openssl genrsa -out service-a.key 2048
openssl req -new -key service-a.key -out service-a.csr -subj "/CN=service-a"
openssl x509 -req -days 365 -in service-a.csr -CA ca.crt -CAkey ca.key \
  -CAcreateserial -out service-a.crt
# Note: modern clients validate subjectAltName, not CN — for real use, pass
# an -extfile containing "subjectAltName=DNS:service-a" when signing

# curl with mTLS
curl --cert service-a.crt --key service-a.key --cacert ca.crt \
  https://service-b.internal/api/data

Service Meshes Automate mTLS
Managing per-service certificates manually is operationally painful. Tools like Istio and Linkerd automatically provision and rotate mTLS certificates for every pod using SPIFFE/SPIRE identity standards. Certificate rotation happens transparently without restarting services.

REST API Design

HTTP Methods

Method | Semantics | Idempotent | Body
GET | Retrieve resource(s) | Yes | No
POST | Create resource or trigger action | No | Yes
PUT | Replace resource entirely | Yes | Yes
PATCH | Partial update | No (unless designed carefully) | Yes
DELETE | Remove resource | Yes | Optional
HEAD | Same as GET but no body | Yes | No
OPTIONS | What methods does this endpoint support? | Yes | No

URI Design Principles

# Resources are nouns, not verbs
GET  /users/42              # Good: noun
GET  /getUser/42            # Bad: verb

# Nested resources for relationships
GET  /users/42/orders       # Orders belonging to user 42
POST /users/42/orders       # Create an order for user 42
GET  /users/42/orders/7     # Specific order

# Actions that don't fit CRUD: use sub-resources or POST
POST /orders/7/cancel       # Cancel an order
POST /users/42/password-reset

# Collections vs singletons
GET  /users                 # List all users
GET  /users/42              # Specific user
DELETE /users/42            # Delete user 42

# Filtering, sorting, pagination as query params (never in path)
GET /users?role=admin&sort=created_at&order=desc&limit=20&cursor=abc123

Versioning Strategies

Strategy | Example | Pros | Cons
URL path | /v1/users | Visible, easy to test in browser | Breaks REST purity (same resource, different URL)
Accept header | Accept: application/vnd.myapi.v2+json | RESTfully correct | Hard to test, invisible in browser
Custom header | API-Version: 2 | Clean URLs | Non-standard; CDN caches must Vary on it
Query param | /users?version=2 | Simple | Pollutes query string

Recommendation: URL path versioning (/v1/, /v2/) is the most pragmatic for public APIs. It is explicit, observable in logs, and easy for consumers to manage.

Pagination

// Offset-based (simple but inefficient at scale)
GET /users?offset=20&limit=10
{
  "data": [...],
  "total": 1542,
  "offset": 20,
  "limit": 10
}

// Cursor-based (preferred: stable, works with real-time data)
GET /users?cursor=eyJ1c2VyX2lkIjogNDJ9&limit=10
{
  "data": [...],
  "next_cursor": "eyJ1c2VyX2lkIjogNTJ9",
  "has_more": true
}
// Cursor is typically an opaque base64-encoded bookmark
// (e.g., encoded {id: 42, created_at: "2024-01-15T..."})
// Stable even when records are inserted/deleted between pages
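Encoding and decoding such an opaque cursor takes only a few lines. A sketch, assuming the bookmark is plain JSON — the helper names are illustrative:

```python
import base64
import json

def encode_cursor(bookmark: dict) -> str:
    # Opaque to clients: base64url-encoded JSON, padding stripped for URL cleanliness
    raw = json.dumps(bookmark, separators=(",", ":")).encode()
    return base64.urlsafe_b64encode(raw).decode().rstrip("=")

def decode_cursor(cursor: str) -> dict:
    padded = cursor + "=" * (-len(cursor) % 4)    # restore the stripped padding
    return json.loads(base64.urlsafe_b64decode(padded))
```

Server-side, the decoded bookmark becomes a WHERE clause (e.g., `WHERE (created_at, id) > (:created_at, :id)`), which stays stable even as rows are inserted or deleted between pages.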

gRPC

gRPC is a high-performance RPC framework developed by Google. It uses Protocol Buffers for serialization and HTTP/2 for transport. It is the dominant choice for internal service-to-service communication in polyglot microservice environments.

Protocol Buffers

# Define service in .proto file
# payment.proto
syntax = "proto3";
package payment;

service PaymentService {
  rpc Charge(ChargeRequest) returns (ChargeResponse);
  rpc StreamTransactions(TransactionFilter)
      returns (stream Transaction);       // server streaming
  rpc BatchCharge(stream ChargeRequest)
      returns (BatchResult);             // client streaming
  rpc Chat(stream Message)
      returns (stream Message);          // bidirectional streaming
}

message ChargeRequest {
  string user_id    = 1;
  int64  amount_cents = 2;
  string currency   = 3;
}

message ChargeResponse {
  string transaction_id = 1;
  string status         = 2;
}

# Generate client/server code
protoc --go_out=. --go-grpc_out=. payment.proto   # Go
python -m grpc_tools.protoc -I. --python_out=. \
  --grpc_python_out=. payment.proto               # Python

gRPC vs REST Comparison

Dimension | gRPC | REST/JSON
Protocol | HTTP/2 (binary) | HTTP/1.1 or HTTP/2 (text)
Serialization | Protocol Buffers (~5x smaller, ~7x faster) | JSON (human-readable, widely supported)
Streaming | Built-in (4 modes) | SSE or WebSockets (not REST)
Code generation | Strong: type-safe clients from .proto | Optional (OpenAPI/Swagger)
Browser support | Requires gRPC-Web proxy | Native
Debugging | Binary (need grpcurl/BloomRPC) | curl/Postman readable
Best for | Internal service calls, low-latency, streaming | External/public APIs, browser clients

# grpcurl — curl for gRPC
# List services
grpcurl -plaintext localhost:50051 list

# Describe a service
grpcurl -plaintext localhost:50051 describe payment.PaymentService

# Call an RPC
grpcurl -plaintext -d '{"user_id": "u42", "amount_cents": 1000, "currency": "USD"}' \
  localhost:50051 payment.PaymentService/Charge

gRPC vs REST Decision Rule
Use gRPC for internal service calls where you control both client and server — the type safety and performance are worth it. Use REST/JSON for anything consumed by external developers, mobile apps, or browsers — the tooling and discoverability are far better.

Exposing APIs: Internal Services

Key Section

Internal APIs are service-to-service calls within your infrastructure. The challenges are: service discovery (how does service A find service B?), load balancing, mutual authentication, and resilience.

Full Request Flow: Service A Calls Service B

Service A (Pod)
    |
    | 1. DNS lookup: "payment-service.billing.svc.cluster.local"
    v
CoreDNS (K8s)
    |
    | Returns: ClusterIP 10.96.45.12
    v
K8s Service (ClusterIP: 10.96.45.12:8080)
    |
    | 2. kube-proxy rewrites destination to a healthy pod
    v
iptables / IPVS (node kernel)
    |
    | Selects: pod 10.244.2.7:8080 (round-robin or random)
    v
Service B Pod (payment-service)
    |
    | 3. (Optional) Sidecar proxy intercepts (Envoy/Istio)
    |    - mTLS termination
    |    - Circuit breaker check
    |    - Retry logic
    |    - Telemetry
    v
Application container
    |
    v
Response travels back the same path

Service Mesh (Sidecar Proxy Pattern)

A service mesh injects a proxy sidecar (typically Envoy) into every pod. All traffic passes through the sidecar, giving you observability, security, and reliability without changing application code.

# Istio: enable mTLS for an entire namespace
apiVersion: security.istio.io/v1beta1
kind: PeerAuthentication
metadata:
  name: default
  namespace: billing
spec:
  mtls:
    mode: STRICT   # Reject plaintext connections

---
# Istio: circuit breaker via DestinationRule
apiVersion: networking.istio.io/v1alpha3
kind: DestinationRule
metadata:
  name: payment-service
spec:
  host: payment-service
  trafficPolicy:
    outlierDetection:
      consecutive5xxErrors: 5      # open circuit after 5 errors
      interval: 30s                # check window
      baseEjectionTime: 30s        # eject for 30s
      maxEjectionPercent: 50       # never eject more than 50% of pods

Service Discovery

How does service A find service B's address? Common approaches:

  • DNS-based: Kubernetes Services + CoreDNS (shown above) — simplest, built in
  • Registry-based: services register themselves in Consul/etcd/Eureka; clients query the registry
  • Client-side LB: the client picks an instance from the registry (e.g., gRPC client-side load balancing)
  • Server-side LB: the client calls a stable virtual IP and a proxy picks the instance (ClusterIP model)

Circuit Breaker Pattern

┌──────────────────────────────────────┐
│        Circuit Breaker States        │
└──────────────────────────────────────┘

CLOSED (healthy)                  OPEN (failing)
─────────────────                 ──────────────
Requests pass through ──────────> Requests fail fast
                                  (no call to downstream)
Count failures                    Wait for timeout (e.g. 30s)
If failures > threshold           Then move to HALF-OPEN
                                         │
                                         v
                              HALF-OPEN (probing)
                              ─────────────────────
                              Let 1 request through
                              If succeeds → CLOSED
                              If fails → OPEN again
# Python retries with tenacity (exponential backoff)
import httpx
from tenacity import retry, stop_after_attempt, wait_exponential

@retry(
    stop=stop_after_attempt(3),
    wait=wait_exponential(multiplier=1, min=1, max=10)
)
async def call_payment_service(payload: dict) -> dict:
    async with httpx.AsyncClient() as client:
        response = await client.post(
            "http://payment-service:8080/charge",
            json=payload,
            timeout=5.0
        )
        response.raise_for_status()
        return response.json()
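The retry example handles transient failures; the circuit breaker state machine in the diagram can be sketched as a small class. This is a minimal illustration with made-up defaults — production systems usually rely on a maintained library or mesh-level breaking (as in the Istio DestinationRule above):

```python
import time

class CircuitBreaker:
    """Minimal sketch of the CLOSED → OPEN → HALF-OPEN machine from the diagram."""
    def __init__(self, failure_threshold: int = 5, reset_timeout: float = 30.0):
        self.failure_threshold = failure_threshold
        self.reset_timeout = reset_timeout
        self.failures = 0
        self.state = "CLOSED"
        self.opened_at = 0.0

    def call(self, func, *args, **kwargs):
        if self.state == "OPEN":
            if time.monotonic() - self.opened_at >= self.reset_timeout:
                self.state = "HALF_OPEN"       # timeout elapsed: allow one probe
            else:
                raise RuntimeError("circuit open: failing fast")
        try:
            result = func(*args, **kwargs)
        except Exception:
            self.failures += 1
            if self.state == "HALF_OPEN" or self.failures >= self.failure_threshold:
                self.state = "OPEN"            # trip (or re-trip after failed probe)
                self.opened_at = time.monotonic()
            raise
        self.failures = 0                      # success resets the count
        self.state = "CLOSED"
        return result
```

Failing fast while OPEN is the whole point: the caller gets an immediate error instead of tying up threads and timeouts on a downstream that is known to be unhealthy.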

Rate Limiting Between Services

Internal rate limiting protects downstream services from being overwhelmed by a misbehaving upstream caller. Implement at the proxy layer (Envoy/Istio) rather than in application code:

# Envoy rate limit filter configuration
http_filters:
  - name: envoy.filters.http.ratelimit
    typed_config:
      "@type": type.googleapis.com/envoy.extensions.filters.http.ratelimit.v3.RateLimit
      domain: billing-service
      rate_limit_service:
        grpc_service:
          envoy_grpc:
            cluster_name: rate_limit_service

Exposing APIs: External Users / Frontend

Key Section

External APIs are consumed by mobile apps, browsers, third-party developers, and webhooks. The concerns shift to authentication, rate limiting, CORS, TLS termination, and protection from malicious traffic.

Full Request Flow: Mobile App to Backend

Mobile App / Browser
    |
    | HTTPS request to api.example.com
    v
DNS resolution → Cloudflare/AWS (CDN edge)
    |
    | Static assets served from edge cache (CDN HIT)
    | Dynamic API requests forwarded to origin
    v
AWS ALB / API Gateway / Nginx (TLS termination here)
    |
    | Strips TLS, forwards plain HTTP internally (or re-encrypts)
    v
API Gateway (Kong / AWS API Gateway)
    |
    | 1. Authentication middleware (validate JWT/API key)
    | 2. Rate limiting (token bucket per user/IP)
    | 3. Request logging + tracing (add X-Request-ID)
    | 4. Route to upstream service
    v
Backend Service (your API)
    |
    v
Response ← same path in reverse

Authentication Patterns

Method | How It Works | Best For | Pitfall
API Key | Static secret in header (X-API-Key: ...) | Server-to-server, developer APIs | Long-lived; rotate carefully
JWT (Bearer) | Signed token in Authorization: Bearer ... | User sessions, microservices | Can't revoke without blacklist
OAuth2 | Delegated authorization (access token + refresh token) | Third-party app access | Complex flow; use a library
Session Cookie | Server-side session, cookie with session ID | Browser-only web apps | CSRF vulnerability; need SameSite
mTLS | Client certificate presented during TLS | B2B, high-security APIs | Certificate management overhead

# JWT anatomy
# Header.Payload.Signature (base64url encoded, dot-separated)
# Header: {"alg": "RS256", "typ": "JWT"}
# Payload: {"sub": "user42", "iat": 1706000000, "exp": 1706003600, "roles": ["admin"]}
# Signature: RS256(base64(header) + "." + base64(payload), private_key)

# Decode a JWT (never send sensitive data in payload — it's not encrypted)
jwt_token="eyJhbGciOiJSUzI1NiIsInR5cCI6IkpXVCJ9..."
echo $jwt_token | cut -d. -f2 | base64 -d 2>/dev/null | python3 -m json.tool

# Validate with public key (do this server-side, never client-side)
# Verify: signature, expiration (exp), issuer (iss), audience (aud)
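The shell decode above can choke on base64url padding; a small Python equivalent handles it cleanly. For inspection only — `decode_jwt_unverified` is an illustrative helper and deliberately skips signature verification:

```python
import base64
import json

def decode_jwt_unverified(token: str) -> dict:
    """Decode header and payload WITHOUT checking the signature.
    For debugging only — never trust claims from an unverified token."""
    def b64url(part: str) -> bytes:
        # JWTs use base64url with padding stripped; restore it before decoding
        return base64.urlsafe_b64decode(part + "=" * (-len(part) % 4))
    header_b64, payload_b64, _signature = token.split(".")
    return {"header": json.loads(b64url(header_b64)),
            "payload": json.loads(b64url(payload_b64))}
```

Server-side validation should instead use a JWT library that verifies the signature and the exp/iss/aud claims before reading anything else.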

Rate Limiting Algorithms

Algorithm | How It Works | Pros | Cons
Token Bucket | Bucket holds N tokens. Each request consumes 1 token. Tokens refill at fixed rate. | Allows bursts up to bucket size | Burst can overwhelm if bucket is large
Leaky Bucket | Requests enter a queue (bucket). Processed at a fixed rate. Excess dropped. | Smooth output rate | Strict queue; high latency under load
Fixed Window | Count requests in fixed time window (e.g., 100/min). Reset at boundary. | Simple to implement | Burst at window boundary (2x rate)
Sliding Window | Rolling count over last N seconds using timestamps or Redis sorted sets. | No boundary burst | More memory/compute
Sliding Log | Store timestamp of each request. Count in window. Expire old ones. | Most accurate | High memory at scale

# Sliding window rate limit with Redis (sorted set of request timestamps)
import time
import uuid

import redis

r = redis.Redis()

def is_rate_limited(user_id: str, limit: int, window_seconds: int) -> bool:
    key = f"rate:{user_id}"
    now = time.time()
    window_start = now - window_seconds

    pipe = r.pipeline()
    pipe.zremrangebyscore(key, 0, window_start)    # drop entries outside the window
    member = f"{now}:{uuid.uuid4().hex}"           # unique member avoids collisions
    pipe.zadd(key, {member: now})                  # add current request
    pipe.zcard(key)                                # count requests in window
    pipe.expire(key, window_seconds)               # auto-expire idle keys
    results = pipe.execute()

    request_count = results[2]
    return request_count > limit
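For contrast, the token bucket from the table fits in a few lines when per-process state is enough (no Redis needed). A sketch with illustrative parameters:

```python
import time

class TokenBucket:
    """In-process token bucket: refills at `rate` tokens/sec up to `capacity`."""
    def __init__(self, rate: float, capacity: float):
        self.rate = rate
        self.capacity = capacity
        self.tokens = capacity             # start full — permits an initial burst
        self.last = time.monotonic()

    def allow(self) -> bool:
        now = time.monotonic()
        # Refill proportionally to elapsed time, capped at bucket capacity
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False
```

This is per-process: behind a load balancer each instance has its own bucket, which is why distributed limits (like the Redis version above) are enforced centrally.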

CORS (Cross-Origin Resource Sharing)

Browsers enforce the same-origin policy: JavaScript on https://app.example.com cannot call https://api.example.com unless the API explicitly allows it. CORS headers tell the browser which cross-origin requests are permitted.

# Simple CORS request (GET/POST with simple headers)
# Browser automatically adds:
Origin: https://app.example.com

# Server must respond with:
Access-Control-Allow-Origin: https://app.example.com  # or * (never for credentialed)
Access-Control-Allow-Credentials: true                 # if sending cookies/auth

# Preflight (OPTIONS) — triggered by:
# - Non-simple methods (PUT, DELETE, PATCH)
# - Non-simple headers (Authorization, Content-Type: application/json)
OPTIONS /api/users HTTP/1.1
Origin: https://app.example.com
Access-Control-Request-Method: DELETE
Access-Control-Request-Headers: Authorization

# Server must respond to OPTIONS with:
HTTP/1.1 204 No Content
Access-Control-Allow-Origin: https://app.example.com
Access-Control-Allow-Methods: GET, POST, PUT, DELETE, OPTIONS
Access-Control-Allow-Headers: Authorization, Content-Type
Access-Control-Max-Age: 86400          # Cache preflight for 24h
Access-Control-Allow-Credentials: true

CORS Pitfall: Wildcard + Credentials
Access-Control-Allow-Origin: * (wildcard) cannot be used with Access-Control-Allow-Credentials: true. If you need to send cookies or Authorization headers cross-origin, you must specify the exact origin. Never use * for authenticated APIs.

TLS Termination

TLS termination is where the encrypted HTTPS connection is decrypted. Two approaches:

  • Terminate at the edge (LB/CDN): traffic inside your network is plain HTTP — simple and fast, but assumes a trusted internal network
  • End-to-end encryption: the edge re-encrypts to the backend (or passes the TCP stream through untouched) — needed for compliance and zero-trust architectures

CDN & API Caching

# Control what CDNs (and browsers) cache via Cache-Control header
# Public GET endpoints
Cache-Control: public, max-age=300, stale-while-revalidate=60

# Private user data — never cache at CDN
Cache-Control: private, no-cache

# Immutable assets (content-hashed filenames)
Cache-Control: public, max-age=31536000, immutable

# CDN cache purge (Cloudflare API)
curl -X POST "https://api.cloudflare.com/client/v4/zones/{zone_id}/purge_cache" \
  -H "Authorization: Bearer {token}" \
  -H "Content-Type: application/json" \
  -d '{"purge_everything": true}'

# Vary header: CDN must cache separate responses per Accept-Encoding
Vary: Accept-Encoding

Webhook Patterns

Webhooks are outbound API calls — your server pushes events to a customer's endpoint when something happens. Design considerations:

  • Retries with exponential backoff — the receiver may be down
  • HMAC signatures so receivers can verify authenticity
  • Idempotency: include an event ID so duplicate deliveries are harmless
  • Timeouts: never let a slow receiver block your delivery workers

# Webhook delivery: retry with exponential backoff
import hashlib
import hmac
import json
import time

import httpx
def deliver_webhook(url: str, payload: dict, secret: str) -> bool:
    """
    Deliver webhook with HMAC signature for verification.
    Receiver should validate: hmac.compare_digest(expected_sig, received_sig)
    """
    body = json.dumps(payload, separators=(',', ':')).encode()

    # Signature so receivers can verify authenticity
    signature = hmac.new(
        secret.encode(),
        body,
        hashlib.sha256
    ).hexdigest()

    headers = {
        "Content-Type": "application/json",
        "X-Webhook-Signature": f"sha256={signature}",
        "X-Webhook-Timestamp": str(int(time.time())),
    }

    for attempt in range(5):
        try:
            response = httpx.post(url, content=body, headers=headers, timeout=10)
            if response.status_code < 300:
                return True              # 2xx: delivered
            if response.status_code < 500:
                return False             # 4xx: client error — don't retry
        except httpx.RequestError:
            pass
        time.sleep(2 ** attempt)         # 1s, 2s, 4s, 8s, 16s
    return False
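The receiving side of this scheme verifies the signature before trusting the payload. A sketch pairing with the delivery code above — the timestamp check is an extra replay guard; note that production schemes (e.g., Stripe's) typically sign the timestamp together with the body:

```python
import hashlib
import hmac
import time

def verify_webhook(body: bytes, signature_header: str, timestamp_header: str,
                   secret: str, max_age_seconds: int = 300) -> bool:
    # Reject stale deliveries to limit replay of captured requests
    if abs(time.time() - int(timestamp_header)) > max_age_seconds:
        return False
    expected = "sha256=" + hmac.new(secret.encode(), body, hashlib.sha256).hexdigest()
    # Constant-time comparison defends against timing attacks
    return hmac.compare_digest(expected, signature_header)
```

Always compare with `hmac.compare_digest`, never `==` — a naive string comparison leaks how many leading bytes matched via timing.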

Load Balancing

L4 vs L7 Load Balancers

Layer | Sees | Can Route By | Examples | Use Case
L4 (Transport) | TCP/UDP packets | IP, port | AWS NLB, HAProxy (TCP mode) | Low latency, TLS passthrough, non-HTTP
L7 (Application) | HTTP headers, URL, body | Path, host, header, cookie | AWS ALB, Nginx, Envoy, Caddy | HTTP routing, SSL termination, A/B testing

                        Internet
                           |
               ┌───────────────────────┐
               │   L7 Load Balancer    │  (Nginx / ALB)
               │  Path-based routing   │
               └───────────────────────┘
           /api/*     |    /static/*   |   Host-based
              ↓       ↓                ↓
          API Pods   API Pods       S3 / CDN
         (backend)  (v2 canary)     (assets)

    ┌───────────────────────────────────┐
    │   L4 Load Balancer (AWS NLB)      │
    │   TCP passthrough, ultra-low lat  │
    └───────────────────────────────────┘
            ↓                  ↓
      gRPC Service       Database Proxy
                          (PgBouncer)

Load Balancing Algorithms

Algorithm | How It Works | Best For
Round Robin | Cycle through servers in order | Homogeneous servers, short-lived requests
Weighted Round Robin | Servers get proportional share (e.g., 70/30) | Heterogeneous capacity, canary deployments
Least Connections | Send to server with fewest active connections | Long-lived connections (WebSockets, gRPC streams)
IP Hash | Hash client IP to always route to same server | Simple session affinity (no shared session store)
Consistent Hashing | Hash to a ring; minimize remapping on server add/remove | Cache clusters, distributed storage
Random (power of 2) | Pick 2 random servers, choose less loaded | Large fleets, avoids round-robin thundering herd
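Consistent hashing is worth seeing concretely: keys and servers hash onto the same ring, and each key belongs to the first server clockwise from it. A minimal sketch (class and parameter names are illustrative; real systems often use better hash functions than MD5):

```python
import bisect
import hashlib

class ConsistentHashRing:
    """Hash ring with virtual nodes; removing a server remaps only its own keys."""
    def __init__(self, servers, vnodes: int = 100):
        self.ring = []                        # sorted list of (hash, server)
        for server in servers:
            for i in range(vnodes):           # virtual nodes smooth the distribution
                h = self._hash(f"{server}#{i}")
                self.ring.append((h, server))
        self.ring.sort()

    @staticmethod
    def _hash(key: str) -> int:
        return int(hashlib.md5(key.encode()).hexdigest(), 16)

    def get(self, key: str) -> str:
        # Walk clockwise to the first vnode at or after the key's hash
        idx = bisect.bisect_left(self.ring, (self._hash(key),)) % len(self.ring)
        return self.ring[idx][1]
```

The payoff vs a naive `hash(key) % len(servers)`: dropping one server out of N remaps roughly 1/N of the keys instead of nearly all of them — exactly what cache clusters need to avoid a stampede of misses.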

Health Checks & Session Affinity

# Nginx upstream with health checks and sticky sessions
upstream api_backend {
    least_conn;                          # algorithm

    server api-1.internal:8080;
    server api-2.internal:8080;
    server api-3.internal:8080;

    # Sticky sessions via cookie (nginx plus / commercial)
    sticky cookie srv_id expires=1h domain=.example.com path=/;

    keepalive 32;                        # reuse connections to backend
}

server {
    location /api/ {
        proxy_pass http://api_backend;
        health_check interval=5s fails=3 passes=2;   # nginx plus
    }
}

# AWS ALB target group health check equivalent
# In Terraform:
resource "aws_lb_target_group" "api" {
  health_check {
    path                = "/healthz"
    interval            = 30
    healthy_threshold   = 2
    unhealthy_threshold = 3
    timeout             = 5
  }
}

WebSockets & Server-Sent Events

WebSocket Upgrade Handshake

WebSockets start as an HTTP request and are "upgraded" to a persistent bidirectional TCP connection. The HTTP connection is reused — no new TCP connection is needed.

# WebSocket upgrade request
GET /ws/feed HTTP/1.1
Host: api.example.com
Upgrade: websocket
Connection: Upgrade
Sec-WebSocket-Key: dGhlIHNhbXBsZSBub25jZQ==   # random base64
Sec-WebSocket-Version: 13

# Server accepts upgrade
HTTP/1.1 101 Switching Protocols
Upgrade: websocket
Connection: Upgrade
Sec-WebSocket-Accept: s3pPLMBiTxaQ9kYGzzhZRbK+xOo=  # SHA1 of key + GUID

# After upgrade: full-duplex binary framing
# Ping/pong frames for keepalive
# Close frame for graceful shutdown
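The Sec-WebSocket-Accept value above is deterministic: SHA-1 of the client key concatenated with a fixed GUID, base64-encoded. A few lines reproduce it (the sample key/accept pair shown in the handshake is the well-known RFC 6455 example):

```python
import base64
import hashlib

# Fixed GUID from RFC 6455 — every WebSocket server appends this exact string
WS_GUID = "258EAFA5-E914-47DA-95CA-C5AB0DC85B11"

def websocket_accept(sec_websocket_key: str) -> str:
    """Compute the Sec-WebSocket-Accept header for a given Sec-WebSocket-Key."""
    digest = hashlib.sha1((sec_websocket_key + WS_GUID).encode()).digest()
    return base64.b64encode(digest).decode()
```

This round-trip proves to the client that the server actually understood the WebSocket upgrade rather than blindly echoing headers.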

Server-Sent Events (SSE)

SSE is one-way server push over a single HTTP connection. Simpler than WebSockets when you only need server → client streaming.

# FastAPI SSE endpoint
import asyncio
import json

from fastapi import FastAPI
from fastapi.responses import StreamingResponse

app = FastAPI()

async def event_generator():
    while True:
        # get_next_event() is app-specific (e.g., read from a queue or pub/sub)
        data = await get_next_event()
        yield f"id: {data['id']}\n"
        yield f"event: {data['type']}\n"
        yield f"data: {json.dumps(data['payload'])}\n\n"
        await asyncio.sleep(0.1)

@app.get("/events")
async def stream_events():
    return StreamingResponse(
        event_generator(),
        media_type="text/event-stream",
        headers={
            "Cache-Control": "no-cache",
            "X-Accel-Buffering": "no"   # disable Nginx buffering
        }
    )

Comparison: Real-Time Options

Method | Direction | Protocol | Best For | Notes
WebSocket | Bidirectional | WS (over TCP) | Chat, games, collaborative editing | Needs sticky sessions at LB
SSE | Server → Client only | HTTP | Live feeds, notifications, logs | Auto-reconnect built-in; simpler
Long Polling | Server → Client | HTTP | Fallback, fire-and-forget push | High overhead; legacy approach
HTTP/2 Server Push | Server → Client | HTTP/2 | Preloading assets | Deprecated/removed in Chrome; avoid

Load Balancing WebSockets

WebSockets maintain a persistent connection to a specific backend instance. This creates challenges for load balancers:

  • Sticky sessions: reconnects should land on a backend that knows the session, or session state must be externalized (e.g., Redis pub/sub)
  • Connection draining: deploys must wait for, or gracefully close, long-lived connections
  • Idle timeouts: LBs kill idle connections (AWS ALB defaults to 60s) — send ping/pong keepalives
  • Uneven load: least-connections beats round-robin when connections are long-lived

Network Debugging

curl Advanced Usage

# Verbose: show TLS handshake, request/response headers
curl -v https://api.example.com/users

# Show only response headers
curl -I https://api.example.com/users

# Custom headers
curl -H "Authorization: Bearer eyJhbGc..." \
     -H "Content-Type: application/json" \
     https://api.example.com/users

# POST with JSON body
curl -X POST https://api.example.com/users \
     -H "Content-Type: application/json" \
     -d '{"name": "Alice", "email": "[email protected]"}'

# Override DNS (test a specific IP without changing /etc/hosts)
curl --resolve api.example.com:443:93.184.216.34 https://api.example.com/users

# Follow redirects, show final URL
curl -L -w "%{url_effective}\n" https://short.url/abc

# Measure timing breakdown
curl -w "\nDNS: %{time_namelookup}s\nConnect: %{time_connect}s\nTLS: %{time_appconnect}s\nFirst byte: %{time_starttransfer}s\nTotal: %{time_total}s\n" \
     -o /dev/null -s https://api.example.com/users

# Set timeout
curl --connect-timeout 5 --max-time 30 https://api.example.com/users

tcpdump Basics

# Capture HTTP traffic on port 80 to a file
sudo tcpdump -i en0 port 80 -w capture.pcap

# Live capture: show HTTP requests (ASCII)
sudo tcpdump -i en0 -A port 80

# Filter by host and port
sudo tcpdump -i en0 host api.example.com and port 443

# Capture DNS queries
sudo tcpdump -i en0 port 53

# Read captured file in Wireshark
wireshark capture.pcap

# Note: TLS traffic is encrypted in captures.
# Decrypt with SSLKEYLOGFILE env var (Chrome/Firefox/curl support it):
SSLKEYLOGFILE=~/ssl-keys.log curl https://api.example.com
# Then load ssl-keys.log in Wireshark under TLS preferences

Other Diagnostic Tools

# mtr: continuous traceroute (shows packet loss at each hop)
mtr --report --report-cycles 10 google.com

# ss: socket statistics (modern netstat replacement)
ss -tlnp              # TCP listening sockets with process names
ss -tnp               # established TCP connections
ss -s                 # summary statistics

# lsof: what process owns a port
lsof -i :8080         # what's on port 8080
lsof -i TCP:443       # all TCP connections on 443
lsof -i -n -P | grep LISTEN   # all listening ports

# Check if a port is open (no nmap needed)
timeout 3 bash -c 'cat < /dev/null > /dev/tcp/api.example.com/443'
echo $?  # 0 = open; 1 = closed/refused; 124 = timed out (timeout's exit code)

Common Debugging Scenarios

Connection refused (ECONNREFUSED)

The target host is reachable but actively rejected the connection (TCP RST): nothing is listening on that port. Check:

  • lsof -i :PORT — is anything listening on that port?
  • docker ps — is the container running? Did it crash?
  • Is the service binding to 127.0.0.1 (loopback only) instead of 0.0.0.0?
  • Is a firewall (iptables, security group) blocking the port?
lsof -i :8080
# If nothing shows, the service isn't running or crashed at startup
# Check service logs: docker logs container-name, journalctl -u service-name
Connection timeout (no response)

Unlike connection refused (an active RST reply), a timeout means packets are being dropped silently with no reply at all:

  • Firewall is blocking and dropping (not rejecting) — check security group rules
  • Host is unreachable — check routing table, VPC peering, VPN
  • Wrong IP — the DNS resolved to the wrong address
traceroute api.example.com    # Where does the path stop?
dig api.example.com            # Is DNS resolving to the expected IP?
nmap -p 443 api.example.com   # Is the port responding?
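The refused-vs-timeout distinction maps directly onto socket errors: a closed port (or firewall REJECT) raises `ConnectionRefusedError` almost instantly, while a silent DROP surfaces only as a timeout. A sketch using just the standard library:

```python
import socket

def probe(host: str, port: int, timeout: float = 3.0) -> str:
    """Classify a TCP connect attempt:
    'open'    — handshake completed
    'refused' — RST received: port closed or firewall REJECT
    'timeout' — no reply at all: packets dropped silently (firewall DROP,
                black-holed route, host down)"""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return "open"
    except ConnectionRefusedError:
        return "refused"
    except (socket.timeout, TimeoutError):
        return "timeout"
    except OSError as e:
        return f"error: {e}"   # e.g. network unreachable, DNS failure
```

Usage: `probe("api.example.com", 443)`. A fast "refused" points at the service or its bind address; a slow "timeout" points at the network path or a firewall.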
DNS failure (NXDOMAIN or SERVFAIL)
# Test with a public resolver (bypass local cache)
dig @8.8.8.8 api.example.com
dig @1.1.1.1 api.example.com

# Check if it's a local cache issue
sudo killall -HUP mDNSResponder   # flush macOS DNS cache
# or: sudo dscacheutil -flushcache

# SERVFAIL: upstream resolver error — try a different resolver
# NXDOMAIN: the name truly doesn't exist — check DNS zone, check for typos
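The same resolution check can be scripted: `socket.getaddrinfo` raises `socket.gaierror` for both NXDOMAIN and temporary resolver failures (the error code distinguishes them on most platforms). A sketch:

```python
import socket

def resolve(name: str) -> list[str]:
    """Return the unique IP addresses a name resolves to, or [] on failure."""
    try:
        infos = socket.getaddrinfo(name, None)
    except socket.gaierror as e:
        # EAI_NONAME ~ NXDOMAIN; EAI_AGAIN ~ temporary failure (SERVFAIL-ish)
        print(f"resolution failed: {e}")
        return []
    return sorted({info[4][0] for info in infos})

print(resolve("localhost"))   # loopback addresses, e.g. ['127.0.0.1', '::1']
```

Note this uses the system resolver (including /etc/hosts and any local cache), so it tests what your applications actually see — unlike `dig @8.8.8.8`, which bypasses it.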
Certificate errors (SSL handshake failed)
# Check certificate details
openssl s_client -connect api.example.com:443 -servername api.example.com

# Common errors:
# "certificate has expired"
openssl s_client -connect api.example.com:443 2>/dev/null | openssl x509 -noout -dates

# "hostname mismatch" — cert's CN/SANs don't match the hostname
openssl s_client -connect api.example.com:443 2>/dev/null | openssl x509 -noout -text | grep DNS

# "certificate signed by unknown authority" — custom CA not in trust store
curl --cacert /path/to/custom-ca.crt https://internal.example.com
# or add to system trust store:
# macOS: sudo security add-trusted-cert -d -r trustRoot -k /Library/Keychains/System.keychain custom-ca.crt
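The `notAfter` date printed by the `-dates` check above can be turned into a days-to-expiry number for monitoring; `ssl.cert_time_to_seconds` in the standard library parses exactly the format OpenSSL prints:

```python
import ssl
import time

def days_until_expiry(not_after: str) -> float:
    """not_after: the notAfter= value from `openssl x509 -noout -dates`,
    e.g. 'Jun  1 12:00:00 2030 GMT'. Negative result = already expired."""
    expiry_epoch = ssl.cert_time_to_seconds(not_after)
    return (expiry_epoch - time.time()) / 86400

# Example: alert when a cert is inside a 30-day renewal window
if days_until_expiry("Jun  1 12:00:00 2030 GMT") < 30:
    print("certificate expiring soon!")
```

Wiring this to a cron job that shells out to the openssl one-liner above is a common lightweight expiry monitor.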

Network Security

Firewalls & Security Groups

Firewalls filter traffic based on source/destination IP, port, and protocol. In cloud environments, security groups are stateful firewalls at the instance/ENI level.

# iptables: Linux kernel firewall (legacy, still common)
# Allow established connections (stateful)
iptables -A INPUT -m state --state ESTABLISHED,RELATED -j ACCEPT

# Allow incoming on port 443
iptables -A INPUT -p tcp --dport 443 -j ACCEPT

# Drop everything else
iptables -A INPUT -j DROP

# List current rules
iptables -L -n -v

# nftables: modern replacement for iptables
nft list ruleset

# AWS Security Group (Terraform)
resource "aws_security_group" "api" {
  ingress {
    from_port   = 443
    to_port     = 443
    protocol    = "tcp"
    cidr_blocks = ["0.0.0.0/0"]     # public HTTPS
  }
  ingress {
    from_port       = 8080
    to_port         = 8080
    protocol        = "tcp"
    security_groups = [aws_security_group.lb.id]  # only from LB
  }
  egress {
    from_port   = 0
    to_port     = 0
    protocol    = "-1"
    cidr_blocks = ["0.0.0.0/0"]
  }
}

Network Segmentation (VPC)

VPC (10.0.0.0/16)
┌─────────────────────────────────────────────────────────┐
│                                                         │
│  Public Subnets (10.0.1.0/24, 10.0.2.0/24)              │
│  ┌──────────────┐      ┌──────────────┐                 │
│  │  ALB / NAT   │      │   Bastion    │ ← only SSH      │
│  │  Gateway     │      │   Host       │   entry         │
│  └──────────────┘      └──────────────┘                 │
│         │                      │                        │
│         ▼                      ▼ (SSH jump)             │
│  Private Subnets (10.0.3.0/24, 10.0.4.0/24)             │
│  ┌──────────────┐      ┌──────────────┐                 │
│  │ App Servers  │      │  Databases   │ ← no internet   │
│  │ (EKS nodes)  │      │ (RDS, Elast.)│   access        │
│  └──────────────┘      └──────────────┘                 │
│                                                         │
│  No direct internet route to private subnets            │
│  Outbound: private subnet → NAT Gateway → Internet      │
└─────────────────────────────────────────────────────────┘
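The CIDR layout above can be sanity-checked with Python's ipaddress module — for instance, that each /24 subnet actually sits inside the /16 VPC range, and how many hosts it holds:

```python
import ipaddress

vpc = ipaddress.ip_network("10.0.0.0/16")
subnets = [ipaddress.ip_network(c) for c in
           ("10.0.1.0/24", "10.0.2.0/24", "10.0.3.0/24", "10.0.4.0/24")]

for s in subnets:
    assert s.subnet_of(vpc)   # every subnet must fit inside the VPC range
    # A /24 spans 256 addresses; AWS reserves 5 per subnet, leaving 251 usable
    print(s, "usable hosts (AWS):", s.num_addresses - 5)

# Check which subnet a concrete instance IP lands in:
ip = ipaddress.ip_address("10.0.3.17")
print([str(s) for s in subnets if ip in s])   # ['10.0.3.0/24']
```

This kind of check is handy in IaC tests to catch overlapping or out-of-range subnets before `terraform apply` does.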

Common Attacks & Mitigations

| Attack | How It Works | Mitigation |
|---|---|---|
| SYN Flood (DDoS) | Flood server with SYN packets, exhaust connection table (half-open connections) | SYN cookies (stateless), rate limit SYN at edge, Cloudflare/AWS Shield |
| DNS Spoofing / Cache Poisoning | Inject false DNS records into resolver cache, redirect traffic to attacker | DNSSEC (signed records), randomize source port + query ID |
| MITM (Man-in-the-Middle) | Attacker intercepts traffic between client and server | TLS everywhere, certificate pinning, HSTS |
| BGP Hijacking | Malicious AS announces prefixes it doesn't own, reroutes internet traffic | RPKI (Route Origin Authorization), BGP filtering |
| Amplification DDoS | Use UDP services (DNS, NTP) to amplify small requests into large floods | BCP38 ingress filtering, rate-limit UDP reflection, disable open resolvers |
| SSL Stripping | Downgrade HTTPS to HTTP in transit (requires MITM position) | HSTS + preload list (browser refuses to connect over HTTP) |

Zero-Trust Architecture

Traditional perimeter security assumes: "if you're inside the network, you can be trusted." Zero-trust assumes: "never trust, always verify" — regardless of network location.

# Kubernetes NetworkPolicy: restrict what can talk to the database
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: database-allow-api-only
  namespace: production
spec:
  podSelector:
    matchLabels:
      role: database
  policyTypes:
    - Ingress
  ingress:
    - from:
        - podSelector:
            matchLabels:
              role: api      # only pods labeled role=api can connect
      ports:
        - protocol: TCP
          port: 5432
# All other ingress to database pods is dropped by default

HSTS & Certificate Pinning

# HSTS: tell browsers to always use HTTPS (never downgrade)
Strict-Transport-Security: max-age=31536000; includeSubDomains; preload

# max-age=31536000 = 1 year
# includeSubDomains: applies to all subdomains
# preload: submit to browser HSTS preload list (hardcoded in Chrome/Firefox)
# WARNING: preload is nearly irreversible — only add if you're committed to HTTPS

# Certificate pinning (mobile apps)
# Pin the public key hash of your certificate or CA
# If the cert changes without updating the pin, all requests fail
# Risk: if the pinned key is lost or must be rotated, old app versions can
# no longer connect at all; always pin a backup key alongside the active one
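A small parser for the HSTS directives discussed above makes the header's structure explicit (`parse_hsts` is a hypothetical helper written for illustration):

```python
def parse_hsts(header: str) -> dict:
    """Parse a Strict-Transport-Security header value into its directives."""
    result = {"max_age": None, "include_subdomains": False, "preload": False}
    for directive in header.split(";"):
        directive = directive.strip()
        if directive.lower().startswith("max-age="):
            result["max_age"] = int(directive.split("=", 1)[1])
        elif directive.lower() == "includesubdomains":
            result["include_subdomains"] = True
        elif directive.lower() == "preload":
            result["preload"] = True
    return result

hsts = parse_hsts("max-age=31536000; includeSubDomains; preload")
print(hsts["max_age"] / 86400)   # 365.0 — max-age expressed in days
```

A parser like this is useful in a security-headers audit script that fetches your endpoints and asserts max-age is at least a year.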