Networking Refresher
TCP/IP, DNS, HTTP, TLS, load balancing, and API exposure patterns for internal and external services
Table of Contents
Setup & Environment
These tools cover the entire debugging surface from DNS resolution to packet-level inspection. Most are available on macOS out of the box.
Built-in macOS Tools
# DNS lookup (both tools available by default)
dig google.com # full DNS response with query time and server
nslookup google.com # simpler output, good for quick checks
# Trace the network path (hop-by-hop)
traceroute google.com # ICMP/UDP based, 30 hops max
# Network connections and listening ports
netstat -an | grep LISTEN # all listening sockets
netstat -rn # routing table
# HTTP — already on macOS
curl -I https://httpbin.org/get # HEAD request, show response headers
curl -v https://httpbin.org/get # verbose: shows TLS handshake + headers
Install Extras
brew install httpie mtr nmap wireshark
# httpie — human-friendly HTTP client
http GET https://httpbin.org/get
http POST https://httpbin.org/post name=alice age:=30
# mtr — combines traceroute + ping, live updating
mtr google.com
# nmap — port scanner and service fingerprinter
nmap -sT localhost # TCP connect scan, localhost
nmap -p 80,443,8080 example.com # specific ports
nmap -sV example.com # version detection
# wireshark — GUI packet capture (use tcpdump for CLI)
sudo tcpdump -i en0 port 443 -w capture.pcap
wireshark capture.pcap
Docker for Local Practice
# Create an isolated bridge network to simulate microservice networking
docker network create test-net
# Spin up two containers on the same network
docker run -d --name service-a --network test-net nginx
docker run -d --name service-b --network test-net nginx
# service-a can reach service-b by DNS name (Docker's embedded DNS)
docker exec service-a curl http://service-b
# Inspect the network (see IP assignments, subnet)
docker network inspect test-net
# Cleanup
docker rm -f service-a service-b
docker network rm test-net
httpie (http command) is a developer-friendly alternative to curl. It auto-formats JSON responses, adds syntax highlighting, and has a cleaner CLI syntax for setting headers and bodies. Use curl in scripts (it is always available), and http for interactive exploration.
OSI & TCP/IP Model
The OSI model provides a conceptual framework; TCP/IP is what actually runs on the internet. Knowing both is essential for mapping symptoms to layers during debugging.
Layer Mapping
| OSI Layer | TCP/IP Layer | Protocols & Technologies | Unit |
|---|---|---|---|
| 7 Application | Application | HTTP, HTTPS, gRPC, WebSocket, DNS, SMTP, FTP, SSH | Data / Message |
| 6 Presentation | Application | TLS/SSL, encoding (JSON, Protobuf), compression | Data |
| 5 Session | Application | Session management, RPC, WebSocket sessions | Data |
| 4 Transport | Transport | TCP, UDP, QUIC | Segment / Datagram |
| 3 Network | Internet | IP (IPv4/IPv6), ICMP, BGP, OSPF | Packet |
| 2 Data Link | Network Access | Ethernet, Wi-Fi (802.11), ARP, VLANs | Frame |
| 1 Physical | Network Access | Copper, fiber, radio waves, hubs | Bits |
Layer-Based Debugging
When something is broken, think in layers from bottom up:
- L1/L2 (Physical/Link): Can you ping the gateway? Is the NIC up? (`ifconfig`, `ip link`)
- L3 (Network): Can you reach the IP? (`ping`, `traceroute`)
- L4 (Transport): Is the port open? Is the service listening? (`telnet host port`, `nmap`)
- L7 (Application): Is the HTTP/gRPC response correct? (`curl -v`, `http`)
TCP vs UDP
TCP Three-Way Handshake
TCP Flags
| Flag | Meaning | Common Scenario |
|---|---|---|
SYN | Synchronize sequence numbers | Connection initiation |
ACK | Acknowledgment field valid | Every packet after handshake |
FIN | No more data from sender | Graceful connection close |
RST | Reset the connection immediately | Port unreachable, connection aborted |
PSH | Push buffered data to application | Interactive sessions (SSH) |
URG | Urgent pointer field significant | Rare in practice |
Flow Control & Congestion Control
Flow control prevents the sender from overwhelming the receiver. The receiver advertises a receive window (rwnd) — the number of bytes it can buffer. The sender never sends more than rwnd unacknowledged bytes.
Congestion control prevents the sender from overwhelming the network:
- Slow start: Begin with cwnd=1 MSS. Double cwnd each RTT until reaching ssthresh. Exponential growth.
- Congestion avoidance: Once cwnd >= ssthresh, grow linearly (+1 MSS per RTT). This is AIMD (Additive Increase, Multiplicative Decrease).
- Loss detected by timeout: ssthresh = cwnd/2, cwnd = 1 MSS. Restart slow start.
- Fast retransmit / fast recovery: 3 duplicate ACKs → retransmit immediately without waiting for the timeout; ssthresh = cwnd/2 and cwnd resumes from ssthresh instead of dropping to 1.
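The interaction of slow start, congestion avoidance, and loss can be sketched as a toy model (one cwnd value per RTT; real stacks track far more state, and the loss reaction shown is the timeout case, not fast recovery):

```python
def cwnd_evolution(rtts: int, ssthresh: int = 16, loss_at: int = 8) -> list[int]:
    """Toy model of TCP congestion control, one step per RTT."""
    cwnd, history = 1, []
    for rtt in range(rtts):
        history.append(cwnd)
        if rtt == loss_at:                  # loss detected by timeout
            ssthresh = max(cwnd // 2, 2)    # multiplicative decrease
            cwnd = 1                        # restart slow start
        elif cwnd < ssthresh:
            cwnd *= 2                       # slow start: exponential growth
        else:
            cwnd += 1                       # congestion avoidance: +1 MSS/RTT
    return history

# cwnd doubles to ssthresh, grows linearly, collapses at the loss, repeats
print(cwnd_evolution(12))
```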
UDP: Fire and Forget
UDP has no connection establishment, no ordering guarantees, and no retransmission. The header is only 8 bytes (vs TCP's 20+). Use it when:
- Latency matters more than reliability: video streaming, online gaming, VoIP
- The application handles reliability: QUIC (HTTP/3) implements its own reliable delivery on top of UDP
- Request-response is short: DNS (single UDP datagram per query/response, retries handled by the resolver)
- Broadcast/multicast: service discovery on a LAN
| Property | TCP | UDP |
|---|---|---|
| Connection | Connection-oriented (3-way handshake) | Connectionless |
| Reliability | Guaranteed delivery, ordering, dedup | Best-effort, no guarantees |
| Overhead | 20+ byte header, RTT for setup | 8 byte header, no setup |
| Flow control | Yes (sliding window) | No |
| Use cases | HTTP, SSH, databases, file transfer | DNS, video, gaming, QUIC |
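A loopback sketch in Python's stdlib shows the fire-and-forget model: no handshake, no connection object, just datagrams.

```python
import socket

# Minimal UDP exchange on loopback: no setup RTT, no connection state.
server = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
server.bind(("127.0.0.1", 0))                  # OS picks a free port
port = server.getsockname()[1]

client = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
client.sendto(b"ping", ("127.0.0.1", port))    # one datagram, fire and forget

data, addr = server.recvfrom(1024)             # 8-byte UDP header + payload
server.sendto(b"pong", addr)
reply, _ = client.recvfrom(1024)

print(data, reply)
server.close()
client.close()
```

On a real network either datagram could silently vanish; any retry or ordering logic is the application's job, which is exactly the trade-off the table above describes.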
DNS
Resolution Flow
Record Types
| Record | Purpose | Example |
|---|---|---|
| A | IPv4 address | api.example.com → 93.184.216.34 |
| AAAA | IPv6 address | api.example.com → 2606:2800::1 |
| CNAME | Alias to another name | www → example.com |
| MX | Mail server with priority | 10 mail.example.com |
| TXT | Arbitrary text (SPF, DKIM, verification) | "v=spf1 include:..." |
| SRV | Service location (host + port) | _http._tcp → 0 5 80 api.example.com |
| NS | Authoritative nameservers for zone | ns1.cloudflare.com |
| PTR | Reverse DNS (IP → name) | Used by mail servers for spam checks |
dig & nslookup Examples
# Basic A record lookup
dig google.com
# Specific record type
dig google.com MX
dig google.com TXT
dig google.com NS
# Query a specific resolver (bypass system resolver)
dig @8.8.8.8 google.com
# Short output (just the answer)
dig +short google.com
# Trace the full resolution path
dig +trace google.com
# Reverse lookup (IP to hostname)
dig -x 8.8.8.8
# Check TTL remaining
dig +ttlid google.com
# nslookup interactive
nslookup google.com
nslookup -type=MX google.com
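The same system resolver can be queried programmatically; stdlib `socket` covers A/AAAA and reverse (PTR-style) lookups, while record types like MX or TXT need a library such as dnspython (not shown). A sketch using localhost so it works offline:

```python
import socket

# Forward resolution through the system resolver (what most clients do)
infos = socket.getaddrinfo("localhost", 443, proto=socket.IPPROTO_TCP)
for family, _, _, _, sockaddr in infos:
    print(family.name, sockaddr[0])    # AF_INET 127.0.0.1, AF_INET6 ::1, ...

# Reverse lookup for an address
host, aliases, addrs = socket.gethostbyaddr("127.0.0.1")
print(host)
```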
Internal DNS & Service Discovery
In Kubernetes, every Service gets a DNS entry automatically:
# Format: {service}.{namespace}.svc.cluster.local
# From within a pod:
curl http://payment-service.billing.svc.cluster.local:8080/charge
# Short form (within same namespace):
curl http://payment-service:8080/charge
# Kubernetes DNS server: CoreDNS (runs as pods in kube-system)
kubectl get pods -n kube-system | grep coredns
# Headless service (no cluster IP) — returns pod IPs directly
# Useful for StatefulSets (kafka, cassandra): each pod gets a stable DNS name
# kafka-0.kafka.default.svc.cluster.local
HTTP/1.1, HTTP/2, HTTP/3
Request & Response Anatomy
# HTTP/1.1 request (text-based, human-readable)
GET /api/users/42 HTTP/1.1
Host: api.example.com
Accept: application/json
Authorization: Bearer eyJhbGc...
User-Agent: MyApp/1.0
# HTTP response
HTTP/1.1 200 OK
Content-Type: application/json
Content-Length: 142
Cache-Control: max-age=60
{"id": 42, "name": "Alice"}
Version Comparison
| Feature | HTTP/1.1 | HTTP/2 | HTTP/3 |
|---|---|---|---|
| Transport | TCP | TCP | QUIC (UDP) |
| Multiplexing | No (one request at a time per connection) | Yes (streams on one conn) | Yes (independent streams) |
| HOL blocking | Yes (request level) | Yes (TCP level) | No (per-stream) |
| Header compression | None (repeated verbatim) | HPACK (huffman + table) | QPACK |
| Server push | No | Yes (rarely used) | Yes |
| Connection setup | 1 RTT + TLS | 1 RTT + TLS (or 0-RTT) | 0-RTT or 1-RTT |
| Binary | No (text) | Yes (framing layer) | Yes |
HTTP/2 Key Features
- Multiplexing: Multiple requests/responses interleaved on one TCP connection as independent streams. Eliminates the 6-connection-per-domain workaround of HTTP/1.1.
- HPACK compression: Headers are compressed using a static table (common headers pre-indexed) and a dynamic table (connection-specific). `Content-Type: application/json` becomes a single byte after first use.
- Binary framing: Data is split into frames (DATA, HEADERS, SETTINGS, PING). Enables precise flow control per stream.
- TCP HOL blocking still exists: A single lost TCP packet stalls all streams on that connection. HTTP/3 solves this by moving to QUIC.
HTTP/3 & QUIC
QUIC is a transport protocol built on UDP that reimplements TCP's reliability and TLS's security, but with independent stream delivery. Key advantages:
- 0-RTT connection resumption: Returning clients can send data in the first packet (session tickets). New connections take 1 RTT vs TCP+TLS which takes 2-3 RTT.
- No HOL blocking: Each stream is independently reliable. A lost packet for stream 3 doesn't block stream 5.
- Connection migration: Connection ID is not tied to the source IP:port tuple. Switching from Wi-Fi to cellular doesn't drop the connection.
- Encrypted by default: QUIC integrates TLS 1.3. There is no unencrypted QUIC.
Key Status Codes
| Code | Meaning | Notes |
|---|---|---|
| 200 OK | Success | GET, POST, PUT responses |
| 201 Created | Resource created | POST that created a resource; include Location header |
| 204 No Content | Success, no body | DELETE, PUT with no response body |
| 301 Moved Permanently | Redirect (cached) | Browser caches indefinitely; hard to undo |
| 302 Found | Redirect (temporary) | Not cached; use for auth redirects |
| 304 Not Modified | Use cached version | ETag/If-None-Match or Last-Modified |
| 400 Bad Request | Client error | Malformed JSON, missing required field |
| 401 Unauthorized | Not authenticated | Missing or invalid token; include WWW-Authenticate |
| 403 Forbidden | Not authorized | Authenticated but lacks permission |
| 404 Not Found | Resource absent | Never leak whether resource exists to unauthorized callers |
| 409 Conflict | State conflict | Duplicate create, optimistic lock failure |
| 422 Unprocessable Entity | Validation error | Syntactically valid but semantically wrong |
| 429 Too Many Requests | Rate limited | Include Retry-After header |
| 500 Internal Server Error | Server crashed | Never expose stack traces |
| 502 Bad Gateway | Upstream error | Reverse proxy got bad response from backend |
| 503 Service Unavailable | Overloaded/down | Include Retry-After; used during deployments |
| 504 Gateway Timeout | Upstream timeout | Backend took too long; check for slow queries |
TLS & HTTPS
TLS 1.3 Handshake (1-RTT)
Certificates & CA Chain
A TLS certificate proves: "this public key belongs to this domain". The chain of trust works like this:
- Root CA: Self-signed, pre-installed in OS/browser trust stores (DigiCert, Let's Encrypt ISRG Root X1, etc.)
- Intermediate CA: Signed by Root CA. Root CAs stay offline for security; intermediates sign leaf certs.
- Leaf certificate: Your domain's cert, signed by an intermediate. Contains the public key and Subject Alternative Names (SANs).
# Inspect a live TLS certificate
openssl s_client -connect google.com:443 -servername google.com </dev/null \
| openssl x509 -noout -text | grep -E "Subject:|Issuer:|Not After|DNS:"
# Check certificate chain
openssl s_client -connect google.com:443 -showcerts </dev/null
# Verify expiration date
echo | openssl s_client -servername example.com -connect example.com:443 2>/dev/null \
| openssl x509 -noout -dates
# Decode a local cert
openssl x509 -in cert.pem -noout -text
Let's Encrypt & ACME
# certbot (ACME client) issues free 90-day certificates
# HTTP-01 challenge: LE places a file at /.well-known/acme-challenge/{token}
# DNS-01 challenge: LE adds a TXT record to your DNS zone (needed for wildcards)
# Issue cert for a domain (nginx)
certbot --nginx -d example.com -d www.example.com
# Standalone (no web server running)
certbot certonly --standalone -d example.com
# Auto-renew (run in cron or systemd timer)
certbot renew --quiet
# In Kubernetes: cert-manager handles ACME automatically
# annotations on Ingress trigger Certificate object creation
mTLS: Mutual TLS for Service-to-Service
In standard TLS, only the server presents a certificate. In mTLS, both client and server present certificates. This provides cryptographic proof of identity on both sides — no passwords or API keys needed.
# Generate a private CA for internal services
openssl genrsa -out ca.key 4096
openssl req -new -x509 -days 3650 -key ca.key -out ca.crt -subj "/CN=MyInternalCA"
# Issue a cert for service-a
openssl genrsa -out service-a.key 2048
openssl req -new -key service-a.key -out service-a.csr -subj "/CN=service-a"
openssl x509 -req -days 365 -in service-a.csr -CA ca.crt -CAkey ca.key \
-CAcreateserial -out service-a.crt
# curl with mTLS
curl --cert service-a.crt --key service-a.key --cacert ca.crt \
https://service-b.internal/api/data
REST API Design
HTTP Methods
| Method | Semantics | Idempotent | Body |
|---|---|---|---|
| GET | Retrieve resource(s) | Yes | No |
| POST | Create resource or trigger action | No | Yes |
| PUT | Replace resource entirely | Yes | Yes |
| PATCH | Partial update | No (unless careful) | Yes |
| DELETE | Remove resource | Yes | Optional |
| HEAD | Same as GET but no body | Yes | No |
| OPTIONS | What methods does this endpoint support? | Yes | No |
URI Design Principles
# Resources are nouns, not verbs
GET /users/42 # Good: noun
GET /getUser/42 # Bad: verb
# Nested resources for relationships
GET /users/42/orders # Orders belonging to user 42
POST /users/42/orders # Create an order for user 42
GET /users/42/orders/7 # Specific order
# Actions that don't fit CRUD: use sub-resources or POST
POST /orders/7/cancel # Cancel an order
POST /users/42/password-reset
# Collections vs singletons
GET /users # List all users
GET /users/42 # Specific user
DELETE /users/42 # Delete user 42
# Filtering, sorting, pagination as query params (never in path)
GET /users?role=admin&sort=created_at&order=desc&limit=20&cursor=abc123
Versioning Strategies
| Strategy | Example | Pros | Cons |
|---|---|---|---|
| URL path | /v1/users | Visible, easy to test in browser | Breaks REST purity (same resource, different URL) |
| Accept header | Accept: application/vnd.myapi.v2+json | RESTfully correct | Hard to test, invisible in browser |
| Custom header | API-Version: 2 | Clean URLs | Non-standard, not cacheable by CDN |
| Query param | /users?version=2 | Simple | Pollutes query string |
Recommendation: URL path versioning (/v1/, /v2/) is the most pragmatic for public APIs. It is explicit, observable in logs, and easy for consumers to manage.
Pagination
// Offset-based (simple but inefficient at scale)
GET /users?offset=20&limit=10
{
"data": [...],
"total": 1542,
"offset": 20,
"limit": 10
}
// Cursor-based (preferred: stable, works with real-time data)
GET /users?cursor=eyJ1c2VyX2lkIjogNDJ9&limit=10
{
"data": [...],
"next_cursor": "eyJ1c2VyX2lkIjogNTJ9",
"has_more": true
}
// Cursor is typically an opaque base64-encoded bookmark
// (e.g., encoded {id: 42, created_at: "2024-01-15T..."})
// Stable even when records are inserted/deleted between pages
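A minimal sketch of the opaque-cursor encoding described above, using stdlib base64 (the bookmark field names are illustrative):

```python
import base64
import json

def encode_cursor(bookmark: dict) -> str:
    """Opaque, URL-safe cursor: base64url(JSON bookmark)."""
    raw = json.dumps(bookmark, separators=(",", ":")).encode()
    return base64.urlsafe_b64encode(raw).decode()

def decode_cursor(cursor: str) -> dict:
    """Server-side: recover the bookmark to resume the query after it."""
    return json.loads(base64.urlsafe_b64decode(cursor.encode()))

cur = encode_cursor({"id": 42, "created_at": "2024-01-15T00:00:00Z"})
print(cur)                   # opaque token handed to the client
print(decode_cursor(cur))    # bookmark used in the WHERE clause server-side
```

Keeping the cursor opaque means clients cannot construct or tamper with page positions, and the server is free to change the bookmark fields later.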
gRPC
gRPC is a high-performance RPC framework developed by Google. It uses Protocol Buffers for serialization and HTTP/2 for transport. It is the dominant choice for internal service-to-service communication in polyglot microservice environments.
Protocol Buffers
// Define service in a .proto file
// payment.proto
syntax = "proto3";
package payment;
service PaymentService {
rpc Charge(ChargeRequest) returns (ChargeResponse);
rpc StreamTransactions(TransactionFilter)
returns (stream Transaction); // server streaming
rpc BatchCharge(stream ChargeRequest)
returns (BatchResult); // client streaming
rpc Chat(stream Message)
returns (stream Message); // bidirectional streaming
}
message ChargeRequest {
string user_id = 1;
int64 amount_cents = 2;
string currency = 3;
}
message ChargeResponse {
string transaction_id = 1;
string status = 2;
}
# Generate client/server code
protoc --go_out=. --go-grpc_out=. payment.proto # Go
python -m grpc_tools.protoc -I. --python_out=. \
--grpc_python_out=. payment.proto # Python
gRPC vs REST Comparison
| Dimension | gRPC | REST/JSON |
|---|---|---|
| Protocol | HTTP/2 (binary) | HTTP/1.1 or HTTP/2 (text) |
| Serialization | Protocol Buffers (~5x smaller, ~7x faster) | JSON (human-readable, widely supported) |
| Streaming | Built-in (4 modes) | SSE or WebSockets (not REST) |
| Code generation | Strong: type-safe clients from .proto | Optional (OpenAPI/Swagger) |
| Browser support | Requires gRPC-Web proxy | Native |
| Debugging | Binary (need grpcurl/BloomRPC) | curl/Postman readable |
| Best for | Internal service calls, low-latency, streaming | External/public APIs, browser clients |
# grpcurl — curl for gRPC
# List services
grpcurl -plaintext localhost:50051 list
# Describe a service
grpcurl -plaintext localhost:50051 describe payment.PaymentService
# Call an RPC
grpcurl -plaintext -d '{"user_id": "u42", "amount_cents": 1000, "currency": "USD"}' \
localhost:50051 payment.PaymentService/Charge
Exposing APIs: Internal Services
Internal APIs are service-to-service calls within your infrastructure. The challenges are: service discovery (how does service A find service B?), load balancing, mutual authentication, and resilience.
Full Request Flow: Service A Calls Service B
Service Mesh (Sidecar Proxy Pattern)
A service mesh injects a proxy sidecar (typically Envoy) into every pod. All traffic passes through the sidecar, giving you observability, security, and reliability without changing application code.
# Istio: enable mTLS for an entire namespace
apiVersion: security.istio.io/v1beta1
kind: PeerAuthentication
metadata:
name: default
namespace: billing
spec:
mtls:
mode: STRICT # Reject plaintext connections
---
# Istio: circuit breaker via DestinationRule
apiVersion: networking.istio.io/v1alpha3
kind: DestinationRule
metadata:
name: payment-service
spec:
host: payment-service
trafficPolicy:
outlierDetection:
consecutive5xxErrors: 5 # open circuit after 5 errors
interval: 30s # check window
baseEjectionTime: 30s # eject for 30s
maxEjectionPercent: 50 # never eject more than 50% of pods
Service Discovery
- Kubernetes DNS (default): Every Service automatically gets a DNS entry. CoreDNS watches the K8s API and updates records. Zero configuration required.
- Consul: Service registry with health checks, supports multi-datacenter, works for non-K8s services (VMs, bare metal).
- Client-side discovery with gRPC: gRPC has a pluggable name resolver. Pass a `dns:///payment-service:8080` address and the client resolves A/SRV records periodically and load-balances across the returned IPs.
Circuit Breaker Pattern
# Python retry with exponential backoff (tenacity); circuit breaking itself
# is better handled by the mesh (outlier detection above) or a dedicated library
import httpx
from tenacity import retry, stop_after_attempt, wait_exponential
@retry(
stop=stop_after_attempt(3),
wait=wait_exponential(multiplier=1, min=1, max=10)
)
async def call_payment_service(payload: dict) -> dict:
async with httpx.AsyncClient() as client:
response = await client.post(
"http://payment-service:8080/charge",
json=payload,
timeout=5.0
)
response.raise_for_status()
return response.json()
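The retry decorator above keeps calling a failing dependency; a circuit breaker additionally fails fast while the dependency is down, which is the behavior this section is named after. A minimal in-process sketch (production code would use a maintained library such as pybreaker, or leave this to the mesh):

```python
import time

class CircuitBreaker:
    """Opens after `max_failures` consecutive errors; while open, calls fail
    fast without touching the downstream service. After `reset_timeout`
    seconds, one trial call is allowed (half-open state)."""

    def __init__(self, max_failures: int = 5, reset_timeout: float = 30.0):
        self.max_failures = max_failures
        self.reset_timeout = reset_timeout
        self.failures = 0
        self.opened_at = None

    def call(self, fn, *args, **kwargs):
        if self.opened_at is not None:
            if time.time() - self.opened_at < self.reset_timeout:
                raise RuntimeError("circuit open: failing fast")
            self.opened_at = None          # half-open: allow one trial call
        try:
            result = fn(*args, **kwargs)
        except Exception:
            self.failures += 1
            if self.failures >= self.max_failures:
                self.opened_at = time.time()   # trip the breaker
            raise
        self.failures = 0                  # success closes the circuit
        return result
```

Usage: wrap the outbound call, e.g. `breaker.call(requests_fn, payload)`; callers catch the fast-fail `RuntimeError` and degrade gracefully instead of queueing on a dead service.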
Rate Limiting Between Services
Internal rate limiting protects downstream services from being overwhelmed by a misbehaving upstream caller. Implement at the proxy layer (Envoy/Istio) rather than in application code:
# Envoy rate limit filter configuration
http_filters:
- name: envoy.filters.http.ratelimit
typed_config:
"@type": type.googleapis.com/envoy.extensions.filters.http.ratelimit.v3.RateLimit
domain: billing-service
rate_limit_service:
grpc_service:
envoy_grpc:
cluster_name: rate_limit_service
Exposing APIs: External Users / Frontend
External APIs are consumed by mobile apps, browsers, third-party developers, and webhooks. The concerns shift to authentication, rate limiting, CORS, TLS termination, and protection from malicious traffic.
Full Request Flow: Mobile App to Backend
Authentication Patterns
| Method | How It Works | Best For | Pitfall |
|---|---|---|---|
| API Key | Static secret in header (X-API-Key: ...) | Server-to-server, developer APIs | Long-lived; rotate carefully |
| JWT (Bearer) | Signed token in Authorization: Bearer ... | User sessions, microservices | Can't revoke without blacklist |
| OAuth2 | Delegated authorization (access token + refresh token) | Third-party app access | Complex flow; use a library |
| Session Cookie | Server-side session, cookie with session ID | Browser-only web apps | CSRF vulnerability; need SameSite |
| mTLS | Client certificate presented during TLS | B2B, high-security APIs | Certificate management overhead |
# JWT anatomy
# Header.Payload.Signature (base64url encoded, dot-separated)
# Header: {"alg": "RS256", "typ": "JWT"}
# Payload: {"sub": "user42", "iat": 1706000000, "exp": 1706003600, "roles": ["admin"]}
# Signature: RS256(base64(header) + "." + base64(payload), private_key)
# Decode a JWT (never send sensitive data in payload — it's not encrypted)
jwt_token="eyJhbGciOiJSUzI1NiIsInR5cCI6IkpXVCJ9..."
echo $jwt_token | cut -d. -f2 | base64 -d 2>/dev/null | python3 -m json.tool
# Validate with public key (do this server-side, never client-side)
# Verify: signature, expiration (exp), issuer (iss), audience (aud)
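The shell decode can trip over base64url details (the `-_` alphabet, segments without `=` padding); a stdlib sketch that handles both, decoding only — claims in the payload must still be verified against the signature server-side. The demo token here is fabricated for illustration:

```python
import base64
import json

def jwt_payload(token: str) -> dict:
    """Decode the (unverified!) payload segment of a JWT.
    base64url data often arrives without '=' padding; re-add it."""
    seg = token.split(".")[1]
    seg += "=" * (-len(seg) % 4)          # restore stripped padding
    return json.loads(base64.urlsafe_b64decode(seg))

# Build a demo token: header.payload.signature (signature fake here)
payload = base64.urlsafe_b64encode(b'{"sub":"user42","admin":true}').rstrip(b"=")
token = "eyJhbGciOiJSUzI1NiJ9." + payload.decode() + ".sig"
print(jwt_payload(token))   # {'sub': 'user42', 'admin': True}
```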
Rate Limiting Algorithms
| Algorithm | How It Works | Pros | Cons |
|---|---|---|---|
| Token Bucket | Bucket holds N tokens. Each request consumes 1 token. Tokens refill at fixed rate. | Allows bursts up to bucket size | Burst can overwhelm if bucket is large |
| Leaky Bucket | Requests enter a queue (bucket). Processed at a fixed rate. Excess dropped. | Smooth output rate | Strict queue; high latency under load |
| Fixed Window | Count requests in fixed time window (e.g., 100/min). Reset at boundary. | Simple to implement | Burst at window boundary (2x rate) |
| Sliding Window | Rolling count over last N seconds using timestamps or Redis sorted sets. | No boundary burst | More memory/compute |
| Sliding Log | Store timestamp of each request. Count in window. Expire old ones. | Most accurate | High memory at scale |
# Sliding window rate limit with Redis
import redis
import time
r = redis.Redis()
def is_rate_limited(user_id: str, limit: int, window_seconds: int) -> bool:
key = f"rate:{user_id}"
now = time.time()
window_start = now - window_seconds
pipe = r.pipeline()
pipe.zremrangebyscore(key, 0, window_start) # remove old entries
pipe.zadd(key, {str(now): now}) # add current request
pipe.zcard(key) # count in window
pipe.expire(key, window_seconds) # auto-expire key
results = pipe.execute()
request_count = results[2]
return request_count > limit
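For comparison with the sliding window above, the token bucket row of the table can be sketched in-process in a few lines (a Redis-backed version uses the same refill arithmetic, just stored centrally):

```python
import time

class TokenBucket:
    """Holds up to `capacity` tokens, refilled at `refill_rate` tokens/sec;
    each request costs one token. Allows bursts up to the bucket size."""

    def __init__(self, capacity: int, refill_rate: float):
        self.capacity = capacity
        self.refill_rate = refill_rate
        self.tokens = float(capacity)
        self.updated = time.monotonic()

    def allow(self) -> bool:
        now = time.monotonic()
        # Refill proportionally to elapsed time, capped at capacity
        elapsed = now - self.updated
        self.tokens = min(self.capacity, self.tokens + elapsed * self.refill_rate)
        self.updated = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False

bucket = TokenBucket(capacity=3, refill_rate=1.0)
print([bucket.allow() for _ in range(5)])   # burst of 3 allowed, then denied
```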
CORS (Cross-Origin Resource Sharing)
Browsers enforce the same-origin policy: JavaScript on https://app.example.com cannot call https://api.example.com unless the API explicitly allows it. CORS headers tell the browser which cross-origin requests are permitted.
# Simple CORS request (GET/POST with simple headers)
# Browser automatically adds:
Origin: https://app.example.com
# Server must respond with:
Access-Control-Allow-Origin: https://app.example.com # or * (never for credentialed)
Access-Control-Allow-Credentials: true # if sending cookies/auth
# Preflight (OPTIONS) — triggered by:
# - Non-simple methods (PUT, DELETE, PATCH)
# - Non-simple headers (Authorization, Content-Type: application/json)
OPTIONS /api/users HTTP/1.1
Origin: https://app.example.com
Access-Control-Request-Method: DELETE
Access-Control-Request-Headers: Authorization
# Server must respond to OPTIONS with:
HTTP/1.1 204 No Content
Access-Control-Allow-Origin: https://app.example.com
Access-Control-Allow-Methods: GET, POST, PUT, DELETE, OPTIONS
Access-Control-Allow-Headers: Authorization, Content-Type
Access-Control-Max-Age: 86400 # Cache preflight for 24h
Access-Control-Allow-Credentials: true
Access-Control-Allow-Origin: * (wildcard) cannot be used with Access-Control-Allow-Credentials: true. If you need to send cookies or Authorization headers cross-origin, you must specify the exact origin. Never use * for authenticated APIs.
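That rule can be encoded directly. A framework-free sketch of the server-side decision (the allowlist and function name are illustrative; real apps use their framework's CORS middleware):

```python
ALLOWED_ORIGINS = {"https://app.example.com", "https://admin.example.com"}

def cors_headers(origin, allow_credentials: bool = True) -> dict:
    """Echo back an exact allowed origin; never '*' with credentials."""
    if origin not in ALLOWED_ORIGINS:
        return {}                          # no CORS headers: browser blocks it
    headers = {
        "Access-Control-Allow-Origin": origin,   # exact origin, never '*'
        "Vary": "Origin",                  # caches must key on Origin
    }
    if allow_credentials:
        headers["Access-Control-Allow-Credentials"] = "true"
    return headers

print(cors_headers("https://app.example.com"))
print(cors_headers("https://evil.example"))      # {} -> request blocked
```

Echoing the validated `Origin` back (rather than hardcoding one value) is how servers support multiple allowed origins while still satisfying the no-wildcard-with-credentials rule.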
TLS Termination
TLS termination is where the encrypted HTTPS connection is decrypted. Two approaches:
- Terminate at load balancer (most common): ALB/Nginx decrypts HTTPS, forwards plain HTTP internally. Certificate management is centralized. Internal traffic is unencrypted — acceptable in a trusted VPC, but not zero-trust.
- Terminate at application (end-to-end TLS): The load balancer passes through the encrypted TCP stream (L4 passthrough). Each application instance manages its own certificate. More secure, but operationally complex.
- Re-encrypt (hybrid): Terminate at load balancer, re-encrypt to backend with internal certificates. Used with service meshes (mTLS).
CDN & API Caching
# Control what CDNs (and browsers) cache via Cache-Control header
# Public GET endpoints
Cache-Control: public, max-age=300, stale-while-revalidate=60
# Private user data — never cache at CDN
Cache-Control: private, no-cache
# Immutable assets (content-hashed filenames)
Cache-Control: public, max-age=31536000, immutable
# CDN cache purge (Cloudflare API)
curl -X POST "https://api.cloudflare.com/client/v4/zones/{zone_id}/purge_cache" \
-H "Authorization: Bearer {token}" \
-H "Content-Type: application/json" \
-d '{"purge_everything": true}'
# Vary header: CDN must cache separate responses per Accept-Encoding
Vary: Accept-Encoding
Webhook Patterns
Webhooks are outbound API calls — your server pushes events to a customer's endpoint when something happens. Design considerations:
# Webhook delivery: retry with exponential backoff
import hashlib
import hmac
import json
import time

import httpx
def deliver_webhook(url: str, payload: dict, secret: str) -> bool:
"""
Deliver webhook with HMAC signature for verification.
Receiver should validate: hmac.compare_digest(expected_sig, received_sig)
"""
body = json.dumps(payload, separators=(',', ':')).encode()
# Signature so receivers can verify authenticity
signature = hmac.new(
secret.encode(),
body,
hashlib.sha256
).hexdigest()
headers = {
"Content-Type": "application/json",
"X-Webhook-Signature": f"sha256={signature}",
"X-Webhook-Timestamp": str(int(time.time())),
}
for attempt in range(5):
try:
response = httpx.post(url, content=body, headers=headers, timeout=10)
if response.status_code < 500:
return True # 2xx success or 4xx (don't retry client errors)
except httpx.RequestError:
pass
time.sleep(2 ** attempt) # 1s, 2s, 4s, 8s, 16s
return False
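The receiver side, referenced in the docstring above, recomputes the HMAC over the raw body and compares in constant time; a timestamp freshness check limits replay (header names follow the sender sketch above):

```python
import hashlib
import hmac
import time

def verify_webhook(body: bytes, signature_header: str, timestamp: str,
                   secret: str, max_skew: int = 300) -> bool:
    """Validate X-Webhook-Signature ('sha256=<hex>') and freshness."""
    if abs(time.time() - int(timestamp)) > max_skew:
        return False                       # stale: possible replay
    expected = hmac.new(secret.encode(), body, hashlib.sha256).hexdigest()
    received = signature_header.removeprefix("sha256=")
    # Constant-time comparison prevents timing attacks
    return hmac.compare_digest(expected, received)
```

Note the verification runs over the raw request bytes, not a re-serialized parse of them; any re-encoding (key order, whitespace) would change the digest.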
Load Balancing
L4 vs L7 Load Balancers
| Layer | Sees | Can Route By | Examples | Use Case |
|---|---|---|---|---|
| L4 (Transport) | TCP/UDP packets | IP, port | AWS NLB, HAProxy (TCP mode) | Low latency, TLS passthrough, non-HTTP |
| L7 (Application) | HTTP headers, URL, body | Path, host, header, cookie | AWS ALB, Nginx, Envoy, Caddy | HTTP routing, SSL termination, A/B testing |
Load Balancing Algorithms
| Algorithm | How It Works | Best For |
|---|---|---|
| Round Robin | Cycle through servers in order | Homogeneous servers, short-lived requests |
| Weighted Round Robin | Servers get proportional share (e.g., 70/30) | Heterogeneous capacity, canary deployments |
| Least Connections | Send to server with fewest active connections | Long-lived connections (WebSockets, gRPC streams) |
| IP Hash | Hash client IP to always route to same server | Simple session affinity (no shared session store) |
| Consistent Hashing | Hash to a ring; minimize remapping on server add/remove | Cache clusters, distributed storage |
| Random (power of 2) | Pick 2 random servers, choose less loaded | Large fleets, avoids round-robin thundering herd |
Health Checks & Session Affinity
# Nginx upstream with health checks and sticky sessions
upstream api_backend {
least_conn; # algorithm
server api-1.internal:8080;
server api-2.internal:8080;
server api-3.internal:8080;
# Sticky sessions via cookie (nginx plus / commercial)
sticky cookie srv_id expires=1h domain=.example.com path=/;
keepalive 32; # reuse connections to backend
}
server {
location /api/ {
proxy_pass http://api_backend;
health_check interval=5s fails=3 passes=2; # nginx plus
}
}
# AWS ALB target group health check equivalent
# In Terraform:
resource "aws_lb_target_group" "api" {
health_check {
path = "/healthz"
interval = 30
healthy_threshold = 2
unhealthy_threshold = 3
timeout = 5
}
}
WebSockets & Server-Sent Events
WebSocket Upgrade Handshake
WebSockets start as an HTTP request and are "upgraded" to a persistent bidirectional TCP connection. The HTTP connection is reused — no new TCP connection is needed.
# WebSocket upgrade request
GET /ws/feed HTTP/1.1
Host: api.example.com
Upgrade: websocket
Connection: Upgrade
Sec-WebSocket-Key: dGhlIHNhbXBsZSBub25jZQ== # random base64
Sec-WebSocket-Version: 13
# Server accepts upgrade
HTTP/1.1 101 Switching Protocols
Upgrade: websocket
Connection: Upgrade
Sec-WebSocket-Accept: s3pPLMBiTxaQ9kYGzzhZRbK+xOo= # SHA1 of key + GUID
# After upgrade: full-duplex binary framing
# Ping/pong frames for keepalive
# Close frame for graceful shutdown
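The Sec-WebSocket-Accept value above is deterministic: SHA-1 of the client key concatenated with a GUID fixed by RFC 6455, then base64. It can be reproduced in a few lines:

```python
import base64
import hashlib

WS_GUID = "258EAFA5-E914-47DA-95CA-C5AB0DC85B11"   # fixed by RFC 6455

def websocket_accept(sec_websocket_key: str) -> str:
    """Compute the Sec-WebSocket-Accept header for an upgrade response."""
    digest = hashlib.sha1((sec_websocket_key + WS_GUID).encode()).digest()
    return base64.b64encode(digest).decode()

print(websocket_accept("dGhlIHNhbXBsZSBub25jZQ=="))
# s3pPLMBiTxaQ9kYGzzhZRbK+xOo=  (matches the handshake above)
```

The derivation proves to the client that the server actually speaks WebSocket rather than being a plain HTTP server that happened to return 101.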
Server-Sent Events (SSE)
SSE is one-way server push over a single HTTP connection. Simpler than WebSockets when you only need server → client streaming.
# FastAPI SSE endpoint (get_next_event is an application-specific
# event source, not shown here)
import asyncio
import json

from fastapi import FastAPI
from fastapi.responses import StreamingResponse

app = FastAPI()
async def event_generator():
while True:
# yield events in SSE format
data = await get_next_event()
yield f"id: {data['id']}\n"
yield f"event: {data['type']}\n"
yield f"data: {json.dumps(data['payload'])}\n\n"
await asyncio.sleep(0.1)
@app.get("/events")
async def stream_events():
return StreamingResponse(
event_generator(),
media_type="text/event-stream",
headers={
"Cache-Control": "no-cache",
"X-Accel-Buffering": "no" # disable Nginx buffering
}
)
Comparison: Real-Time Options
| Method | Direction | Protocol | Best For | Notes |
|---|---|---|---|---|
| WebSocket | Bidirectional | WS (over TCP) | Chat, games, collaborative editing | Needs sticky sessions at LB |
| SSE | Server → Client only | HTTP | Live feeds, notifications, logs | Auto-reconnect built-in; simpler |
| Long Polling | Server → Client | HTTP | Fallback, fire-and-forget push | High overhead; legacy approach |
| HTTP/2 Server Push | Server → Client | HTTP/2 | Preloading assets | Deprecated/removed in Chrome; avoid |
Load Balancing WebSockets
WebSockets maintain a persistent connection to a specific backend instance. This creates challenges for load balancers:
- Sticky sessions required: The LB must route all frames from one client to the same backend. Use IP hash or cookie-based affinity.
- Timeout configuration: Default LB idle timeouts (60s for AWS ALB) will kill long-lived WebSocket connections. Increase to 3600s or use keepalive pings.
- Horizontal scaling is harder: State is local to the connection. To scale out, use a pub/sub layer (Redis Pub/Sub, NATS) so any backend can receive and forward messages to the appropriate WebSocket connection.
Network Debugging
curl Advanced Usage
# Verbose: show TLS handshake, request/response headers
curl -v https://api.example.com/users
# Show only response headers
curl -I https://api.example.com/users
# Custom headers
curl -H "Authorization: Bearer eyJhbGc..." \
-H "Content-Type: application/json" \
https://api.example.com/users
# POST with JSON body
curl -X POST https://api.example.com/users \
-H "Content-Type: application/json" \
-d '{"name": "Alice", "email": "[email protected]"}'
# Override DNS (test a specific IP without changing /etc/hosts)
curl --resolve api.example.com:443:93.184.216.34 https://api.example.com/users
# Follow redirects, show final URL
curl -L -w "%{url_effective}\n" https://short.url/abc
# Measure timing breakdown
curl -w "\nDNS: %{time_namelookup}s\nConnect: %{time_connect}s\nTLS: %{time_appconnect}s\nFirst byte: %{time_starttransfer}s\nTotal: %{time_total}s\n" \
-o /dev/null -s https://api.example.com/users
# Set timeout
curl --connect-timeout 5 --max-time 30 https://api.example.com/users
tcpdump Basics
# Capture HTTP traffic on port 80 to a file
sudo tcpdump -i en0 port 80 -w capture.pcap
# Live capture: show HTTP requests (ASCII)
sudo tcpdump -i en0 -A port 80
# Filter by host and port
sudo tcpdump -i en0 host api.example.com and port 443
# Capture DNS queries
sudo tcpdump -i en0 port 53
# Read captured file in Wireshark
wireshark capture.pcap
# Note: TLS traffic is encrypted in captures.
# Decrypt with SSLKEYLOGFILE env var (Chrome/Firefox/curl support it):
SSLKEYLOGFILE=~/ssl-keys.log curl https://api.example.com
# Then load ssl-keys.log in Wireshark under TLS preferences
Other Diagnostic Tools
# mtr: continuous traceroute (shows packet loss at each hop)
mtr --report --report-cycles 10 google.com
# ss: socket statistics (modern netstat replacement)
ss -tlnp # TCP listening sockets with process names
ss -tnp # established TCP connections
ss -s # summary statistics
# lsof: what process owns a port
lsof -i :8080 # what's on port 8080
lsof -i TCP:443 # all TCP connections on 443
lsof -i -n -P | grep LISTEN # all listening ports
# Check if a port is open (no nmap needed)
timeout 3 bash -c 'cat < /dev/null > /dev/tcp/api.example.com/443'
echo $? # 0 = open, non-zero = closed/refused (124 = timed out)
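The same reachability check in Python's stdlib, for when bash's /dev/tcp trick isn't available (host and port below are placeholders):

```python
import socket

def port_open(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # covers refused, timed out, unreachable, DNS failure
        return False

print(port_open("localhost", 8080))  # True only if something is listening
```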
Common Debugging Scenarios
Connection refused (ECONNREFUSED)
The port is not open on the target host. Check:
- lsof -i :PORT — is anything listening on that port?
- docker ps — is the container running? Did it crash?
- Is the service binding to 127.0.0.1 (loopback only) instead of 0.0.0.0?
- Is a firewall (iptables, security group) blocking the port?
lsof -i :8080
# If nothing shows, the service isn't running or crashed at startup
# Check service logs: docker logs container-name, journalctl -u service-name
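The loopback-vs-all-interfaces pitfall from the checklist can be shown directly with Python's stdlib — a socket bound to 127.0.0.1 is invisible to containers and remote hosts:

```python
import socket

# Binding to 127.0.0.1 means ONLY loopback clients can connect;
# a container or remote host gets "connection refused".
loopback_only = socket.socket()
loopback_only.bind(("127.0.0.1", 0))  # port 0 = pick any free port

# Binding to 0.0.0.0 listens on every interface.
all_interfaces = socket.socket()
all_interfaces.bind(("0.0.0.0", 0))

print(loopback_only.getsockname()[0])   # 127.0.0.1
print(all_interfaces.getsockname()[0])  # 0.0.0.0
```

This is the classic Docker gotcha: a service inside a container that binds 127.0.0.1 is unreachable even with the port published.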
Connection timeout (no response)
Unlike connection refused (an active reject), a timeout means packets are being dropped silently and no reply ever comes back:
- Firewall is blocking and dropping (not rejecting) — check security group rules
- Host is unreachable — check routing table, VPC peering, VPN
- Wrong IP — the DNS resolved to the wrong address
traceroute api.example.com # Where does the path stop?
dig api.example.com # Is DNS resolving to the expected IP?
nmap -p 443 api.example.com # Is the port responding?
DNS failure (NXDOMAIN or SERVFAIL)
# Test with a public resolver (bypass local cache)
dig @8.8.8.8 api.example.com
dig @1.1.1.1 api.example.com
# Check if it's a local cache issue
sudo killall -HUP mDNSResponder # flush macOS DNS cache
# or: sudo dscacheutil -flushcache
# SERVFAIL: upstream resolver error — try a different resolver
# NXDOMAIN: the name truly doesn't exist — check DNS zone, check for typos
Certificate errors (SSL handshake failed)
# Check certificate details
openssl s_client -connect api.example.com:443 -servername api.example.com
# Common errors:
# "certificate has expired"
openssl s_client -connect api.example.com:443 </dev/null 2>/dev/null | openssl x509 -noout -dates
# "hostname mismatch" — cert's CN/SANs don't match the hostname
openssl s_client -connect api.example.com:443 </dev/null 2>/dev/null | openssl x509 -noout -text | grep DNS
# "certificate signed by unknown authority" — custom CA not in trust store
curl --cacert /path/to/custom-ca.crt https://internal.example.com
# or add to system trust store:
# macOS: sudo security add-trusted-cert -d -r trustRoot -k /Library/Keychains/System.keychain custom-ca.crt
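The expiry check can also be scripted: ssl.cert_time_to_seconds in Python's stdlib parses the notAfter format that getpeercert() returns. The cert dict below is hand-built for illustration, not a live fetch:

```python
import ssl
import time

def days_until_expiry(cert: dict) -> float:
    """cert is the dict returned by ssl.SSLSocket.getpeercert()."""
    expires = ssl.cert_time_to_seconds(cert["notAfter"])
    return (expires - time.time()) / 86400  # seconds -> days

# Illustrative cert dict — a real one comes from getpeercert() after
# connecting with ssl.create_default_context().wrap_socket(...)
cert = {"notAfter": "Jun  1 12:00:00 2030 GMT"}
print(round(days_until_expiry(cert)))  # days remaining until June 2030
```

Handy in a cron job: alert when the result drops below, say, 30 days.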
Network Security
Firewalls & Security Groups
Firewalls filter traffic based on source/destination IP, port, and protocol. In cloud environments, security groups are stateful firewalls at the instance/ENI level.
# iptables: Linux kernel firewall (legacy, still common)
# Allow established connections (stateful)
iptables -A INPUT -m state --state ESTABLISHED,RELATED -j ACCEPT
# Allow incoming on port 443
iptables -A INPUT -p tcp --dport 443 -j ACCEPT
# Drop everything else
iptables -A INPUT -j DROP
# List current rules
iptables -L -n -v
# nftables: modern replacement for iptables
nft list ruleset
# AWS Security Group (Terraform)
resource "aws_security_group" "api" {
ingress {
from_port = 443
to_port = 443
protocol = "tcp"
cidr_blocks = ["0.0.0.0/0"] # public HTTPS
}
ingress {
from_port = 8080
to_port = 8080
protocol = "tcp"
security_groups = [aws_security_group.lb.id] # only from LB
}
egress {
from_port = 0
to_port = 0
protocol = "-1"
cidr_blocks = ["0.0.0.0/0"]
}
}
Network Segmentation (VPC)
Split the network into tiers so a compromised host can't reach everything:
- Public subnets: only load balancers and bastion/NAT hosts have a route to the internet gateway.
- Private subnets: application servers egress through NAT but accept no inbound traffic from the internet.
- Isolated subnets: databases have no internet route at all and are reachable only from the app tier.
- Layered controls: security groups (stateful, per-instance) on top of network ACLs (stateless, per-subnet).
Common Attacks & Mitigations
| Attack | How It Works | Mitigation |
|---|---|---|
| SYN Flood (DDoS) | Flood server with SYN packets, exhaust connection table (half-open connections) | SYN cookies (stateless), rate limit SYN at edge, Cloudflare/AWS Shield |
| DNS Spoofing / Cache Poisoning | Inject false DNS records into resolver cache, redirect traffic to attacker | DNSSEC (signed records), randomize source port + query ID |
| MITM (Man-in-the-Middle) | Attacker intercepts traffic between client and server | TLS everywhere, certificate pinning, HSTS |
| BGP Hijacking | Malicious AS announces prefixes it doesn't own, reroutes internet traffic | RPKI (Route Origin Authorization), BGP filtering |
| Amplification DDoS | Use UDP services (DNS, NTP) to amplify small requests into large floods | BCP38 ingress filtering, rate-limit UDP reflection, disable open resolvers |
| SSL Stripping | Downgrade HTTPS to HTTP in transit (requires MITM position) | HSTS, HSTS Preload list — browser refuses to connect over HTTP |
Zero-Trust Architecture
Traditional perimeter security assumes: "if you're inside the network, you can be trusted." Zero-trust assumes: "never trust, always verify" — regardless of network location.
- Identity-based access: Every service has a cryptographic identity (SPIFFE/SPIRE). Access is granted per-identity, not per-IP.
- mTLS everywhere: All service-to-service calls are mutually authenticated. No implicit trust just because traffic is internal.
- Least privilege: Services only get network access to what they need. A web server can reach the database; a batch worker cannot reach the payment service.
- No implicit VPN trust: VPN access doesn't grant blanket access. Each resource requires separate authorization.
- Continuous verification: Auth tokens are short-lived; access is re-verified on each request.
# Kubernetes NetworkPolicy: restrict what can talk to the database
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
name: database-allow-api-only
namespace: production
spec:
podSelector:
matchLabels:
role: database
policyTypes:
- Ingress
ingress:
- from:
- podSelector:
matchLabels:
role: api # only pods labeled role=api can connect
ports:
- protocol: TCP
port: 5432
# All other ingress to database pods is dropped by default
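The "continuous verification" principle can be sketched schematically — credentials carry an expiry and are re-checked on every request. This is a toy stand-in for signed JWTs or SPIFFE SVIDs, not a real token scheme:

```python
import time

def issue_token(subject: str, ttl_seconds: int = 300) -> dict:
    """Short-lived credential: expiry forces frequent re-authentication."""
    return {"sub": subject, "exp": time.time() + ttl_seconds}

def verify(token: dict) -> bool:
    # Checked on EVERY request — no long-lived implicit trust
    return time.time() < token["exp"]

t = issue_token("batch-worker", ttl_seconds=60)
print(verify(t))  # True while within the 60s window
```

Real systems also verify a cryptographic signature and the caller's identity claims; the point here is only the short TTL plus per-request re-verification.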
HSTS & Certificate Pinning
# HSTS: tell browsers to always use HTTPS (never downgrade)
Strict-Transport-Security: max-age=31536000; includeSubDomains; preload
# max-age=31536000 = 1 year
# includeSubDomains: applies to all subdomains
# preload: submit to browser HSTS preload list (hardcoded in Chrome/Firefox)
# WARNING: preload is nearly irreversible — only add if you're committed to HTTPS
# Certificate pinning (mobile apps)
# Pin the public key hash of your certificate or CA
# If the cert changes without updating the pin, all requests fail
# Risk: if you lose the private key, your app is permanently broken for old versions
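The pin check itself is just a hash comparison. A minimal sketch (function names and demo bytes are illustrative; extracting the real SubjectPublicKeyInfo DER from a certificate needs a parser such as the cryptography package):

```python
import base64
import hashlib

def spki_pin(der_bytes: bytes) -> str:
    """HPKP-style pin: base64(sha256(public key DER))."""
    return base64.b64encode(hashlib.sha256(der_bytes).digest()).decode()

def pin_matches(der_bytes: bytes, expected_pins: set) -> bool:
    # Clients typically pin 2+ keys (live key plus an offline backup)
    # so a key rotation doesn't brick old app versions.
    return spki_pin(der_bytes) in expected_pins

demo_key = b"-----demo public key bytes-----"  # placeholder, not real DER
pins = {spki_pin(demo_key)}
print(pin_matches(demo_key, pins))        # True — key matches a pin
print(pin_matches(b"rotated key", pins))  # False — rotation without pin update
```

The second call is exactly the failure mode the warning above describes: rotate the key without shipping a new pin set and every pinned client starts rejecting the connection.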