SQL Refresher
Comprehensive quick-reference for SQL — queries, joins, window functions, CTEs, optimization, and dialect differences across PostgreSQL, MySQL, and SQLite
0. Setup & Environment
Prerequisites
Install Docker Desktop and the psql client — you do not need a full Postgres server on the host.
# Docker Desktop (Mac)
brew install --cask docker
# psql client only (no server)
brew install libpq && brew link --force libpq
Start Postgres in Docker
docker run -d --name pg-refresher \
-e POSTGRES_USER=dev \
-e POSTGRES_PASSWORD=dev \
-e POSTGRES_DB=refresher \
-p 5432:5432 \
postgres:16
Connect via psql
PGPASSWORD=dev psql -h 127.0.0.1 -p 5432 -U dev -d refresher
Use 127.0.0.1 instead of localhost. On macOS, localhost may resolve to IPv6 (::1) while the Docker container binds IPv4 only, causing a connection refused error.
Load Sample Data
A minimal employees table is used throughout this refresher.
CREATE TABLE employees (
id SERIAL PRIMARY KEY,
name TEXT NOT NULL,
department TEXT,
salary NUMERIC(10,2),
hired_at DATE DEFAULT CURRENT_DATE
);
INSERT INTO employees (name, department, salary) VALUES
('Alice', 'Engineering', 130000),
('Bob', 'Engineering', 125000),
('Carol', 'Product', 115000),
('Dave', 'Sales', 95000),
('Eve', 'Engineering', 140000);
Useful psql Commands
| Command | What it does |
|---|---|
| \dt | List tables in the current database |
| \d tablename | Describe a table (columns, types, constraints) |
| \x | Toggle expanded (vertical) display for wide rows |
| \timing | Show query execution time after each statement |
| \q | Quit psql |
\ commands are psql meta-commands — they are not SQL and only work inside the psql shell.
Cleanup
docker stop pg-refresher && docker rm pg-refresher
1. Fundamentals
Core SELECT Syntax
-- Basic anatomy
SELECT column1, column2, expression AS alias
FROM table_name
WHERE condition
ORDER BY column1 ASC, column2 DESC
LIMIT 10 OFFSET 20; -- skip 20 rows, return next 10
-- DISTINCT eliminates duplicate rows (entire row, not per column)
SELECT DISTINCT department, location FROM employees;
-- Column and table aliases
SELECT e.first_name || ' ' || e.last_name AS full_name,
d.name AS dept_name
FROM employees e
JOIN departments d ON e.dept_id = d.id;
Logical Execution Order
SQL is written in one order but executed in another. This governs what aliases are visible where.
-- Written order:
-- SELECT → FROM → JOIN → WHERE → GROUP BY → HAVING → ORDER BY → LIMIT
-- Execution order:
-- 1. FROM / JOIN — identify source rows and perform joins
-- 2. WHERE — filter individual rows (no aggregates yet)
-- 3. GROUP BY — form groups
-- 4. HAVING — filter groups (aggregates allowed here)
-- 5. SELECT — compute output columns, assign aliases
-- 6. DISTINCT — deduplicate if specified
-- 7. ORDER BY — sort (SELECT aliases visible here)
-- 8. LIMIT/OFFSET — truncate result set
-- Consequence: WHERE cannot reference SELECT aliases
-- This fails:
SELECT salary * 1.1 AS new_salary FROM employees WHERE new_salary > 50000;
-- This works (use a subquery / CTE):
SELECT * FROM (
SELECT salary * 1.1 AS new_salary FROM employees
) sub
WHERE new_salary > 50000;
You cannot reference a SELECT alias in WHERE or GROUP BY (except in MySQL/BigQuery, which relax this). You can use it in ORDER BY. Aggregates cannot appear in WHERE — use HAVING instead.
NULL Handling
-- NULL is NOT equal to anything — even itself
SELECT NULL = NULL; -- returns NULL (not TRUE)
SELECT NULL != NULL; -- returns NULL (not TRUE)
-- Correct NULL checks
SELECT * FROM orders WHERE cancelled_at IS NULL;
SELECT * FROM orders WHERE cancelled_at IS NOT NULL;
-- COALESCE: first non-NULL value in list
SELECT COALESCE(phone, mobile, 'N/A') AS contact FROM users;
-- NULLIF: returns NULL if two values are equal (avoids divide-by-zero)
SELECT revenue / NULLIF(units_sold, 0) AS revenue_per_unit FROM sales;
-- Three-valued logic: TRUE, FALSE, NULL
-- NULL AND TRUE = NULL
-- NULL AND FALSE = FALSE (short-circuit)
-- NULL OR TRUE = TRUE (short-circuit)
-- NULL OR FALSE = NULL
-- NULLs sort LAST in ASC, FIRST in DESC (Postgres default; MySQL sorts NULLs FIRST in ASC)
-- Control explicitly (Postgres syntax; MySQL lacks NULLS FIRST/LAST, use ORDER BY col IS NULL instead):
SELECT name FROM products ORDER BY discontinued_at NULLS LAST;
SELECT name FROM products ORDER BY discontinued_at DESC NULLS FIRST;
WHERE col = NULL never matches any row — it evaluates to NULL, not FALSE. Always use IS NULL / IS NOT NULL. This is a frequent source of silent data loss.
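The silent-mismatch behavior is easy to reproduce with Python's bundled sqlite3, which follows the same three-valued logic as Postgres; the rows here are made up:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER, cancelled_at TEXT)")
conn.executemany("INSERT INTO orders VALUES (?, ?)",
                 [(1, None), (2, "2026-01-05"), (3, None)])

# WHERE col = NULL evaluates to NULL for every row, so it matches nothing
eq_null = conn.execute(
    "SELECT COUNT(*) FROM orders WHERE cancelled_at = NULL").fetchone()[0]

# IS NULL is the correct predicate
is_null = conn.execute(
    "SELECT COUNT(*) FROM orders WHERE cancelled_at IS NULL").fetchone()[0]

print(eq_null, is_null)  # 0 2
```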
2. Data Types
Numeric Types
| Type | Storage | Range / Precision | Use when |
|---|---|---|---|
| SMALLINT | 2 bytes | -32,768 to 32,767 | Flags, small enums |
| INTEGER / INT | 4 bytes | -2.1B to 2.1B | Most IDs, counts |
| BIGINT | 8 bytes | -9.2e18 to 9.2e18 | Large IDs, timestamps as millis |
| DECIMAL(p,s) / NUMERIC(p,s) | variable | exact; p digits, s after decimal | Money, exact arithmetic |
| REAL / FLOAT4 | 4 bytes | ~6 sig. digits | Scientific data |
| DOUBLE PRECISION / FLOAT8 | 8 bytes | ~15 sig. digits | ML features, coordinates |
0.1 + 0.2 in IEEE 754 floating point is 0.30000000000000004. Use DECIMAL(19, 4) for currency. Even better: store amounts in integer cents.
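The float pitfall is language-independent; a small Python sketch contrasting IEEE 754 floats, exact decimals (the model behind SQL NUMERIC/DECIMAL), and integer cents (the 8.25% tax rate is invented):

```python
from decimal import Decimal

# IEEE 754 binary floats cannot represent 0.1 or 0.2 exactly
print(0.1 + 0.2)             # 0.30000000000000004
print(0.1 + 0.2 == 0.3)      # False

# Exact decimal arithmetic, like SQL NUMERIC/DECIMAL
print(Decimal("0.1") + Decimal("0.2") == Decimal("0.3"))  # True

# Integer cents sidestep representation error entirely
price_cents = 1999
tax_cents = round(price_cents * 0.0825)  # hypothetical tax rate
total_cents = price_cents + tax_cents
print(total_cents)           # 2164
```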
String Types
-- CHAR(n): fixed-length, right-padded with spaces. Rarely needed.
-- VARCHAR(n): variable-length with max n characters.
-- TEXT: unlimited length (Postgres). In MySQL, TEXT is a BLOB family type.
-- Postgres: TEXT and VARCHAR(n) have identical performance.
-- To cap the length of a TEXT column, use a CHECK constraint rather than the type itself.
CREATE TABLE users (
username VARCHAR(50) NOT NULL, -- enforces max length
bio TEXT, -- unbounded
country CHAR(2) -- ISO country code, always 2 chars
);
Date and Time Types
-- DATE: calendar date only (no time)
-- TIME: time of day only (no date)
-- TIMESTAMP: date + time, no timezone
-- TIMESTAMPTZ (Postgres): timezone-aware, stored normalized to UTC
-- MySQL: TIMESTAMP converts to/from UTC; DATETIME stores wall-clock time with no timezone
-- Postgres examples
SELECT
CURRENT_DATE, -- today's date
CURRENT_TIME, -- current time of day (timetz, no date component)
NOW(), -- current timestamp with tz
CURRENT_TIMESTAMP, -- same as NOW()
NOW() - INTERVAL '7 days', -- 7 days ago
DATE_TRUNC('month', NOW()), -- first moment of current month
EXTRACT(DOW FROM NOW()), -- day of week (0=Sun)
NOW()::date; -- cast to date
-- Arithmetic
SELECT created_at + INTERVAL '30 days' AS expires_at FROM subscriptions;
SELECT AGE(expires_at, NOW()) AS time_remaining FROM subscriptions;
-- MySQL date functions
SELECT
CURDATE(),
NOW(),
DATE_FORMAT(NOW(), '%Y-%m-%d'),
DATEDIFF('2026-12-31', '2026-01-01'), -- days between
DATE_ADD(NOW(), INTERVAL 7 DAY);
Boolean, JSON, Arrays, UUID
-- Boolean (Postgres native; MySQL uses TINYINT(1))
SELECT * FROM features WHERE is_enabled = TRUE;
SELECT * FROM features WHERE NOT is_enabled;
-- UUID (Postgres 13+ built-in, no extension needed)
SELECT gen_random_uuid();
-- JSONB (Postgres) — binary JSON, indexed, preferred over JSON
CREATE TABLE events (
id BIGSERIAL PRIMARY KEY,
data JSONB NOT NULL
);
-- Arrays (Postgres)
CREATE TABLE tags (
post_id INT,
keywords TEXT[]
);
INSERT INTO tags VALUES (1, ARRAY['sql', 'database', 'postgres']);
SELECT * FROM tags WHERE 'sql' = ANY(keywords);
Type Casting
-- Two equivalent forms in Postgres, plus the MySQL spelling:
SELECT CAST('42' AS INTEGER);
SELECT '42'::INTEGER; -- Postgres-specific shorthand
SELECT CONVERT('42', UNSIGNED INTEGER); -- MySQL syntax
-- Common casts
SELECT '2026-01-15'::DATE;
SELECT 3.14::NUMERIC(5,2);
SELECT NOW()::DATE; -- strip time component
SELECT id::TEXT FROM users; -- integer to string for concatenation
-- Implicit coercion pitfall (MySQL)
-- '5abc' + 1 = 6 in MySQL (silently truncates string)
-- Postgres raises an error — prefer explicit casts
Dialect Type Comparison
| Concept | PostgreSQL | MySQL | SQLite |
|---|---|---|---|
| Auto-increment PK | SERIAL / BIGSERIAL / GENERATED ALWAYS AS IDENTITY | AUTO_INCREMENT | INTEGER PRIMARY KEY (implicit) |
| Boolean | BOOLEAN (true type) | TINYINT(1) | No boolean type; use 0/1 |
| Unlimited text | TEXT | TEXT / LONGTEXT | TEXT |
| Binary JSON | JSONB | JSON (no binary form) | Stored as TEXT |
| Arrays | Native TEXT[], INT[] | None (use JSON) | None |
| UUID | UUID type | Store as CHAR(36) or BINARY(16) | Store as TEXT |
| Interval/duration | INTERVAL | No native type | No native type |
3. Filtering & Operators
Comparison and Logical Operators
-- Comparison
SELECT * FROM products WHERE price > 100 AND stock >= 5;
SELECT * FROM products WHERE price <> 0; -- != also works in most dialects
SELECT * FROM products WHERE category IS DISTINCT FROM 'archived'; -- NULL-safe <>
-- BETWEEN (inclusive on both ends)
SELECT * FROM orders WHERE created_at BETWEEN '2026-01-01' AND '2026-01-31';
-- Equivalent to: created_at >= '2026-01-01' AND created_at <= '2026-01-31'
-- IN (membership test)
SELECT * FROM users WHERE role IN ('admin', 'moderator');
SELECT * FROM users WHERE id NOT IN (1, 2, 3);
-- IN with NULL pitfall: NOT IN returns no rows if list contains NULL
-- This returns 0 rows even when you expect results:
SELECT * FROM users WHERE id NOT IN (1, NULL);
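The NOT IN pitfall can be demonstrated directly with Python's sqlite3, which applies the same NULL semantics; sample ids are made up:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER)")
conn.executemany("INSERT INTO users VALUES (?)", [(1,), (2,), (3,), (4,)])

# NOT IN with a NULL in the list: every comparison yields NULL -> 0 rows
rows_with_null = conn.execute(
    "SELECT COUNT(*) FROM users WHERE id NOT IN (1, NULL)").fetchone()[0]

# Without the NULL, the filter behaves as expected
rows_clean = conn.execute(
    "SELECT COUNT(*) FROM users WHERE id NOT IN (1)").fetchone()[0]

print(rows_with_null, rows_clean)  # 0 3
```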
Pattern Matching
-- LIKE: % matches any sequence, _ matches exactly one character
SELECT * FROM products WHERE name LIKE 'Apple%'; -- starts with Apple
SELECT * FROM products WHERE sku LIKE 'A_-___'; -- matches 6-char SKU: literal A, any char, dash, 3 chars
-- ILIKE (Postgres only): case-insensitive LIKE
SELECT * FROM products WHERE name ILIKE '%phone%';
-- SIMILAR TO (Postgres): SQL regex flavor
SELECT * FROM users WHERE phone SIMILAR TO '[+]1[0-9]{10}';
-- Postgres regex operators
SELECT * FROM users WHERE email ~ '^[a-z]'; -- case-sensitive match
SELECT * FROM users WHERE email ~* '^[a-z]'; -- case-insensitive match
SELECT * FROM users WHERE email !~ 'gmail'; -- does not match
-- MySQL REGEXP
SELECT * FROM users WHERE email REGEXP '^[a-z]';
ANY, ALL, EXISTS
-- ANY: true if condition holds for at least one value in set
SELECT * FROM products WHERE price > ANY(SELECT price FROM products WHERE category = 'budget');
-- ALL: true if condition holds for every value in set
SELECT * FROM products WHERE price > ALL(SELECT price FROM products WHERE category = 'budget');
-- EXISTS: true if subquery returns at least one row (stops at first match — efficient)
SELECT * FROM customers c
WHERE EXISTS (
SELECT 1 FROM orders o WHERE o.customer_id = c.id AND o.total > 1000
);
-- NOT EXISTS: anti-join pattern (often faster than NOT IN when NULLs present)
SELECT * FROM customers c
WHERE NOT EXISTS (
SELECT 1 FROM orders o WHERE o.customer_id = c.id
);
CASE Expressions
-- Searched CASE (most general form)
SELECT
name,
CASE
WHEN salary < 50000 THEN 'junior'
WHEN salary < 100000 THEN 'mid'
WHEN salary < 200000 THEN 'senior'
ELSE 'executive'
END AS level
FROM employees;
-- Simple CASE (equality checks only)
SELECT
name,
CASE status
WHEN 'A' THEN 'Active'
WHEN 'I' THEN 'Inactive'
WHEN 'P' THEN 'Pending'
ELSE 'Unknown'
END AS status_label
FROM accounts;
-- CASE in aggregation
SELECT
COUNT(CASE WHEN status = 'active' THEN 1 END) AS active_count,
COUNT(CASE WHEN status = 'inactive' THEN 1 END) AS inactive_count
FROM users;
-- IIF / IF shorthand (not ANSI, but common)
-- MySQL: IF(condition, true_val, false_val)
-- SQLite: IIF(condition, true_val, false_val) — SQLite 3.32+
-- Postgres: use CASE WHEN or a custom function
The NULL Comparison Trap
How many rows does this return?
Table schema:
CREATE TABLE users (id INT, name TEXT, email TEXT);
INSERT INTO users VALUES (1, 'Alice', '[email protected]');
INSERT INTO users VALUES (2, 'Bob', NULL);
INSERT INTO users VALUES (3, 'Carol', '[email protected]');
INSERT INTO users VALUES (4, 'Dave', NULL);
SELECT count(*) FROM users WHERE email != '[email protected]';
1 (only Carol). NULL comparisons with != return NULL (not TRUE), so Bob and Dave are excluded. Use IS NOT NULL or COALESCE to include NULLs.
4. JOINs
Join Types at a Glance
/*
JOIN TYPE          RESULT
----------------   ----------------------------
INNER JOIN         rows where A.key = B.key
LEFT OUTER JOIN    all A rows + matched B (NULL if no match)
RIGHT OUTER JOIN   all B rows + matched A (NULL if no match)
FULL OUTER JOIN    all rows from both (NULL where no match)
CROSS JOIN         cartesian product (n*m rows)
*/
-- INNER JOIN: only matching rows
SELECT o.id, c.name, o.total
FROM orders o
INNER JOIN customers c ON o.customer_id = c.id;
-- LEFT JOIN: all orders, customer info if available
SELECT o.id, c.name, o.total
FROM orders o
LEFT JOIN customers c ON o.customer_id = c.id;
-- FULL OUTER JOIN: all orders and all customers (nulls on either side)
SELECT o.id, c.name
FROM orders o
FULL OUTER JOIN customers c ON o.customer_id = c.id;
Self Join
-- Find employee and their manager (same table, different aliases)
SELECT e.name AS employee, m.name AS manager
FROM employees e
LEFT JOIN employees m ON e.manager_id = m.id;
Multiple Join Conditions
-- Join on composite key
SELECT *
FROM order_items oi
JOIN inventory inv ON oi.product_id = inv.product_id
AND oi.warehouse_id = inv.warehouse_id;
-- Non-equi join (range join)
SELECT e.name, b.band_name
FROM employees e
JOIN salary_bands b ON e.salary BETWEEN b.min_salary AND b.max_salary;
Anti-Join and Semi-Join Patterns
-- Semi-join: customers who have placed at least one order
-- Using EXISTS (most readable, stops at first match)
SELECT c.* FROM customers c
WHERE EXISTS (SELECT 1 FROM orders o WHERE o.customer_id = c.id);
-- Same with IN (works, but behaves differently with NULLs)
SELECT * FROM customers WHERE id IN (SELECT customer_id FROM orders);
-- Anti-join: customers with NO orders
-- Using NOT EXISTS (NULL-safe, preferred)
SELECT c.* FROM customers c
WHERE NOT EXISTS (SELECT 1 FROM orders o WHERE o.customer_id = c.id);
-- Using LEFT JOIN + IS NULL (equivalent, sometimes faster)
SELECT c.*
FROM customers c
LEFT JOIN orders o ON c.id = o.customer_id
WHERE o.id IS NULL;
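Both anti-join forms can be sanity-checked side by side in SQLite; the sample rows are hypothetical:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customers (id INTEGER, name TEXT);
CREATE TABLE orders (id INTEGER, customer_id INTEGER);
INSERT INTO customers VALUES (1,'Alice'),(2,'Bob'),(3,'Carol');
INSERT INTO orders VALUES (10,1),(11,1),(12,3);
""")

# Anti-join via NOT EXISTS
not_exists = conn.execute("""
    SELECT c.name FROM customers c
    WHERE NOT EXISTS (SELECT 1 FROM orders o WHERE o.customer_id = c.id)
""").fetchall()

# Anti-join via LEFT JOIN + IS NULL
left_join = conn.execute("""
    SELECT c.name FROM customers c
    LEFT JOIN orders o ON c.id = o.customer_id
    WHERE o.id IS NULL
""").fetchall()

print(not_exists, left_join)  # both find only Bob, who has no orders
```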
LATERAL Join (Postgres) / CROSS APPLY (SQL Server)
-- LATERAL lets a subquery reference columns from preceding FROM items
-- Use case: top-N per group
SELECT u.name, recent.order_date, recent.total
FROM users u
CROSS JOIN LATERAL (
SELECT order_date, total
FROM orders o
WHERE o.user_id = u.id
ORDER BY order_date DESC
LIMIT 3
) recent;
-- Without LATERAL, you'd need a window function + outer query
Join performance checklist:
- Join columns should be indexed (especially the foreign-key side of the join)
- Ensure data types match exactly — implicit casts prevent index use
- Filter before joining when possible (push predicates into subqueries)
- CROSS JOIN on large tables produces n×m rows — always intentional
WHERE vs ON in LEFT JOIN
How many rows does each query return?
Table schema:
CREATE TABLE departments (id INT, name TEXT);
INSERT INTO departments VALUES (1, 'Eng'), (2, 'Sales'), (3, 'HR');
CREATE TABLE employees (id INT, name TEXT, dept_id INT);
INSERT INTO employees VALUES (1, 'Alice', 1), (2, 'Bob', 1), (3, 'Carol', 2);
-- Query A: filter in ON
SELECT d.name, e.name
FROM departments d LEFT JOIN employees e ON d.id = e.dept_id AND e.name = 'Alice';
-- Query B: filter in WHERE
SELECT d.name, e.name
FROM departments d LEFT JOIN employees e ON d.id = e.dept_id
WHERE e.name = 'Alice';
Query A returns 3 rows (all departments preserved, only Alice matched). Query B returns 1 row (WHERE filters after the join, eliminating NULLs). Key insight: ON conditions filter before the join; WHERE filters after.
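The answer can be verified by running both queries against an in-memory SQLite database, whose LEFT JOIN semantics match Postgres here:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE departments (id INT, name TEXT);
INSERT INTO departments VALUES (1,'Eng'),(2,'Sales'),(3,'HR');
CREATE TABLE employees (id INT, name TEXT, dept_id INT);
INSERT INTO employees VALUES (1,'Alice',1),(2,'Bob',1),(3,'Carol',2);
""")

# Filter in ON: applied during the join; LEFT JOIN still preserves all departments
query_a = conn.execute("""
    SELECT d.name, e.name FROM departments d
    LEFT JOIN employees e ON d.id = e.dept_id AND e.name = 'Alice'
""").fetchall()

# Filter in WHERE: applied after the join; NULL-extended rows are eliminated
query_b = conn.execute("""
    SELECT d.name, e.name FROM departments d
    LEFT JOIN employees e ON d.id = e.dept_id
    WHERE e.name = 'Alice'
""").fetchall()

print(len(query_a), len(query_b))  # 3 1
```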
5. Aggregation
Core Aggregate Functions
SELECT
COUNT(*) AS total_rows, -- counts all rows including NULLs
COUNT(email) AS non_null_emails, -- skips NULLs
COUNT(DISTINCT country) AS unique_countries,
SUM(amount) AS revenue,
AVG(amount) AS avg_order,
MIN(created_at) AS first_order,
MAX(created_at) AS last_order,
STDDEV(amount) AS std_deviation, -- Postgres/MySQL
VARIANCE(amount) AS variance
FROM orders;
GROUP BY
-- Every non-aggregate SELECT column must appear in GROUP BY
SELECT department, job_title, COUNT(*) AS headcount
FROM employees
GROUP BY department, job_title
ORDER BY headcount DESC;
-- GROUP BY ordinal position (avoid — fragile)
SELECT department, COUNT(*) FROM employees GROUP BY 1;
-- HAVING filters after grouping (aggregates allowed)
SELECT department, AVG(salary) AS avg_sal
FROM employees
GROUP BY department
HAVING AVG(salary) > 80000;
-- WHERE vs HAVING: WHERE filters rows, HAVING filters groups
SELECT department, COUNT(*) AS cnt
FROM employees
WHERE is_active = TRUE -- filter rows first
GROUP BY department
HAVING COUNT(*) >= 5; -- then filter groups
GROUPING SETS, ROLLUP, CUBE
-- GROUPING SETS: explicit list of groupings in one query
SELECT region, product, SUM(sales) AS total
FROM sales_data
GROUP BY GROUPING SETS (
(region, product), -- subtotal by region+product
(region), -- subtotal by region
(product), -- subtotal by product
() -- grand total
);
-- ROLLUP: hierarchical subtotals (year > quarter > month)
SELECT year, quarter, month, SUM(revenue)
FROM financials
GROUP BY ROLLUP (year, quarter, month);
-- Equivalent to: GROUPING SETS((year,quarter,month),(year,quarter),(year),())
-- CUBE: all combinations of the dimensions
SELECT region, product, channel, SUM(sales)
FROM sales_data
GROUP BY CUBE (region, product, channel);
-- Produces 2^3 = 8 groupings
-- GROUPING() returns 1 when a column is aggregated away (its NULL comes from ROLLUP), else 0
SELECT
COALESCE(region, 'ALL') AS region,
COALESCE(product, 'ALL') AS product,
SUM(sales),
GROUPING(region) AS is_region_total
FROM sales_data
GROUP BY ROLLUP (region, product);
FILTER Clause and String Aggregation
-- FILTER (Postgres 9.4+): conditional aggregation without CASE
SELECT
COUNT(*) FILTER (WHERE status = 'active') AS active,
COUNT(*) FILTER (WHERE status = 'inactive') AS inactive,
AVG(salary) FILTER (WHERE dept = 'eng') AS eng_avg_salary
FROM employees;
-- STRING_AGG (Postgres / SQL Server): aggregate into comma-separated string
SELECT department, STRING_AGG(name, ', ' ORDER BY name) AS members
FROM employees
GROUP BY department;
-- GROUP_CONCAT (MySQL / SQLite)
SELECT department, GROUP_CONCAT(name ORDER BY name SEPARATOR ', ') AS members
FROM employees
GROUP BY department;
-- ARRAY_AGG (Postgres): aggregate into array
SELECT department, ARRAY_AGG(name ORDER BY name) AS members
FROM employees
GROUP BY department;
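The GROUP_CONCAT form can be tried locally with Python's sqlite3. Note SQLite takes the separator as a second argument, and ORDER BY inside the aggregate requires SQLite 3.44+, so this sketch orders rows in a subquery instead (SQLite preserves that order in practice, though the standard does not guarantee it); the rows are invented:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE employees (name TEXT, department TEXT);
INSERT INTO employees VALUES
  ('Carol','Product'),('Bob','Engineering'),('Alice','Engineering');
""")

# Pre-order in a subquery; GROUP_CONCAT(name, ', ') sets the separator
rows = conn.execute("""
    SELECT department, GROUP_CONCAT(name, ', ') AS members
    FROM (SELECT * FROM employees ORDER BY name)
    GROUP BY department
    ORDER BY department
""").fetchall()

print(rows)
```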
- COUNT(*) — counts every row, including rows that are all NULLs
- COUNT(col) — counts rows where col is NOT NULL
- COUNT(DISTINCT col) — counts unique non-NULL values; slower (requires dedup)
Departments with High Average Salary
Write a query to find departments where the average salary exceeds 80,000. Show the department name and average salary.
Table schema:
CREATE TABLE emp (id INT, name TEXT, dept TEXT, salary INT);
INSERT INTO emp VALUES (1,'Alice','Eng',95000),(2,'Bob','Eng',85000),(3,'Carol','Sales',70000),(4,'Dave','Sales',72000),(5,'Eve','Eng',90000),(6,'Frank','HR',65000);
-- YOUR QUERY HERE
SELECT
Hint: GROUP BY dept with HAVING AVG(salary) > 80000. Remember: WHERE filters rows before grouping, HAVING filters groups after.
Solution:
SELECT dept, AVG(salary) AS avg_salary
FROM emp
GROUP BY dept
HAVING AVG(salary) > 80000;
6. Subqueries
Scalar Subqueries
-- Returns exactly one row and one column; usable anywhere a value is expected
SELECT
name,
salary,
salary - (SELECT AVG(salary) FROM employees) AS diff_from_avg
FROM employees;
-- Scalar subquery in WHERE
SELECT * FROM products
WHERE price = (SELECT MAX(price) FROM products);
Correlated vs Non-Correlated
-- Non-correlated: inner query runs once, result is reused
SELECT * FROM departments
WHERE id IN (SELECT DISTINCT dept_id FROM employees);
-- Correlated: inner query runs once per outer row — can be slow on large tables
-- "Find employees who earn more than the average for their department"
SELECT name, salary, dept_id
FROM employees e
WHERE salary > (
SELECT AVG(salary)
FROM employees
WHERE dept_id = e.dept_id -- references outer query's row
);
-- The correlated version above is often better rewritten as a window function:
SELECT name, salary, dept_id,
AVG(salary) OVER (PARTITION BY dept_id) AS dept_avg
FROM employees;
Derived Tables (Subquery in FROM)
-- Subquery in FROM must be aliased
SELECT dept_stats.dept_id, dept_stats.avg_sal, e.name
FROM (
SELECT dept_id, AVG(salary) AS avg_sal
FROM employees
GROUP BY dept_id
) dept_stats
JOIN employees e ON e.dept_id = dept_stats.dept_id
WHERE e.salary > dept_stats.avg_sal;
EXISTS vs IN: Performance
-- IN: materializes the full subquery result set
-- Good when inner result is small and cached
SELECT * FROM orders WHERE customer_id IN (
SELECT id FROM customers WHERE country = 'US'
);
-- EXISTS: short-circuits on first match — better for large inner result
SELECT * FROM orders o WHERE EXISTS (
SELECT 1 FROM customers c
WHERE c.id = o.customer_id AND c.country = 'US'
);
-- NOT IN danger: if subquery contains ANY NULL, entire result is empty
-- Because: id NOT IN (..., NULL) evaluates to NULL for every row
SELECT * FROM a WHERE id NOT IN (SELECT b_id FROM b); -- breaks if b_id has NULLs
-- Safe alternative: NOT EXISTS or LEFT JOIN + IS NULL
SELECT * FROM a WHERE NOT EXISTS (SELECT 1 FROM b WHERE b.a_id = a.id);
If the NOT IN subquery can return NULLs, the result is always an empty set. Use NOT EXISTS or add WHERE b_id IS NOT NULL to the subquery.
Above Department Average
Find employees earning more than their department's average salary. Show name, dept, and salary.
Table schema:
CREATE TABLE emp (id INT, name TEXT, dept TEXT, salary INT);
INSERT INTO emp VALUES (1,'Alice','Eng',95000),(2,'Bob','Eng',85000),(3,'Carol','Sales',70000),(4,'Dave','Sales',72000),(5,'Eve','Eng',90000),(6,'Frank','HR',65000);
-- YOUR QUERY HERE
SELECT
Hint: WHERE salary > (SELECT AVG(salary) FROM emp e2 WHERE e2.dept = e1.dept). The inner query references the outer query's row.
Solution:
SELECT e1.name, e1.dept, e1.salary
FROM emp e1
WHERE e1.salary > (
SELECT AVG(e2.salary) FROM emp e2 WHERE e2.dept = e1.dept
);
7. Common Table Expressions (CTEs)
Basic CTE Syntax
-- Named, reusable subquery defined before the main SELECT
WITH active_users AS (
SELECT id, name, email
FROM users
WHERE is_active = TRUE
),
recent_orders AS (
SELECT user_id, COUNT(*) AS order_count, SUM(total) AS total_spent
FROM orders
WHERE created_at > NOW() - INTERVAL '90 days'
GROUP BY user_id
)
SELECT
u.name,
u.email,
COALESCE(o.order_count, 0) AS orders_90d,
COALESCE(o.total_spent, 0) AS spent_90d
FROM active_users u
LEFT JOIN recent_orders o ON u.id = o.user_id
ORDER BY o.total_spent DESC NULLS LAST;
Recursive CTEs
-- Hierarchical data: org chart traversal
WITH RECURSIVE org_tree AS (
-- Anchor: start with the CEO (no manager)
SELECT id, name, manager_id, 0 AS depth
FROM employees
WHERE manager_id IS NULL
UNION ALL
-- Recursive: join employees to already-found rows
SELECT e.id, e.name, e.manager_id, ot.depth + 1
FROM employees e
JOIN org_tree ot ON e.manager_id = ot.id
)
SELECT depth, name FROM org_tree ORDER BY depth, name;
-- Generate a date series (Postgres also has generate_series)
WITH RECURSIVE dates AS (
SELECT '2026-01-01'::DATE AS d
UNION ALL
SELECT d + 1 FROM dates WHERE d < '2026-01-31'
)
SELECT d FROM dates;
-- Postgres built-in series generator (simpler):
SELECT generate_series('2026-01-01'::DATE, '2026-01-31'::DATE, '1 day'::INTERVAL)::DATE;
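The org-chart traversal runs unchanged in SQLite (recursive CTEs since 3.8.3); a minimal sketch with a four-person hypothetical hierarchy:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE employees (id INT, name TEXT, manager_id INT);
INSERT INTO employees VALUES
  (1,'Ada',NULL),(2,'Ben',1),(3,'Cai',1),(4,'Dee',2);
""")

rows = conn.execute("""
    WITH RECURSIVE org_tree AS (
        -- Anchor: the root (no manager)
        SELECT id, name, manager_id, 0 AS depth
        FROM employees WHERE manager_id IS NULL
        UNION ALL
        -- Recursive step: attach direct reports of already-found rows
        SELECT e.id, e.name, e.manager_id, ot.depth + 1
        FROM employees e JOIN org_tree ot ON e.manager_id = ot.id
    )
    SELECT depth, name FROM org_tree ORDER BY depth, name
""").fetchall()

for depth, name in rows:
    print("  " * depth + name)
```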
Materialized vs Non-Materialized CTEs (Postgres)
-- Before Postgres 12: CTEs were always materialized (optimization fences)
-- Postgres 12+: optimizer may inline simple CTEs. Use MATERIALIZED to force old behavior.
-- Force materialization (prevents optimizer from re-evaluating):
WITH expensive_query AS MATERIALIZED (
SELECT * FROM large_table WHERE complex_condition = TRUE
)
SELECT * FROM expensive_query WHERE other_filter = 'x';
-- Force inlining (let optimizer push predicates through):
WITH simple_cte AS NOT MATERIALIZED (
SELECT * FROM users
)
SELECT * FROM simple_cte WHERE id = 42;
-- Optimizer can rewrite this as: SELECT * FROM users WHERE id = 42 (index used)
CTE vs Subquery vs Temp Table
| Approach | Scope | Materialized | Best for |
|---|---|---|---|
| Subquery | Expression-level | Optimizer decides | One-off, inline logic |
| CTE | Single statement | Optimizer decides (Postgres 12+) | Named clauses, readability, recursion |
| Temp Table | Session / transaction | Yes (on disk/memory) | Multi-query reuse, large intermediate results with indexes |
| View | Database-wide | No (re-executed each time) | Shared, reusable logic across queries |
| Materialized View | Database-wide | Yes (manual refresh) | Expensive aggregations, reporting |
Generate a Number Series
Write a recursive CTE that generates numbers 1 through 10. SQLite doesn't have generate_series(), so this is the standard workaround.
-- Generate numbers 1 to 10 using a recursive CTE
-- YOUR QUERY HERE
Solution:
WITH RECURSIVE nums(n) AS (
SELECT 1
UNION ALL
SELECT n + 1 FROM nums WHERE n < 10
)
SELECT n FROM nums;
8. Window Functions
Window functions compute a value across a set of rows related to the current row without collapsing them into one; the OVER() clause defines the "window" of rows to consider.
OVER Clause Anatomy
function_name(expr)
OVER (
[PARTITION BY col1, col2] -- divide into independent groups
[ORDER BY col3 ASC] -- order within partition
[ROWS|RANGE|GROUPS BETWEEN frame_start AND frame_end]
)
-- Without PARTITION BY: entire result set is one partition
-- Without ORDER BY: frame = entire partition (for aggregates)
-- With ORDER BY: default frame = RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
Ranking Functions
SELECT
name,
department,
salary,
ROW_NUMBER() OVER (PARTITION BY department ORDER BY salary DESC) AS row_num,
RANK() OVER (PARTITION BY department ORDER BY salary DESC) AS rnk,
DENSE_RANK() OVER (PARTITION BY department ORDER BY salary DESC) AS dense_rnk,
NTILE(4) OVER (PARTITION BY department ORDER BY salary DESC) AS quartile
FROM employees;
/*
salary: 100, 100, 90, 80
ROW_NUMBER: 1, 2, 3, 4 -- unique, arbitrary tiebreaker
RANK: 1, 1, 3, 4 -- gap after ties
DENSE_RANK: 1, 1, 2, 3 -- no gap after ties
NTILE(4): 1, 2, 3, 4 -- assigns rows to N equal buckets
*/
-- Top-N per group using ROW_NUMBER (most common pattern)
SELECT * FROM (
SELECT *,
ROW_NUMBER() OVER (PARTITION BY department ORDER BY salary DESC) AS rn
FROM employees
) ranked
WHERE rn <= 3; -- top 3 per department
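The top-N-per-group pattern can be exercised in SQLite (window functions require 3.25+, which ships with Python 3.7+ on most platforms); the employee rows are illustrative:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE employees (name TEXT, department TEXT, salary INT);
INSERT INTO employees VALUES
  ('Alice','Eng',130000),('Bob','Eng',125000),('Eve','Eng',140000),
  ('Carol','Product',115000),('Dave','Sales',95000);
""")

# Top 2 earners per department via ROW_NUMBER in a derived table
rows = conn.execute("""
    SELECT name, department, salary FROM (
        SELECT *, ROW_NUMBER() OVER (
            PARTITION BY department ORDER BY salary DESC) AS rn
        FROM employees
    ) ranked
    WHERE rn <= 2
    ORDER BY department, salary DESC
""").fetchall()

print(rows)
```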
LAG, LEAD, FIRST_VALUE, LAST_VALUE
SELECT
date,
revenue,
LAG(revenue) OVER (ORDER BY date) AS prev_day_revenue,
LEAD(revenue) OVER (ORDER BY date) AS next_day_revenue,
LAG(revenue, 7) OVER (ORDER BY date) AS revenue_7d_ago,
FIRST_VALUE(revenue) OVER (ORDER BY date) AS first_day_revenue,
-- LAST_VALUE needs explicit frame to include all rows up to end:
LAST_VALUE(revenue) OVER (
ORDER BY date
ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING
) AS last_day_revenue,
NTH_VALUE(revenue, 3) OVER (ORDER BY date) AS third_day_revenue
FROM daily_revenue;
-- LAG with default value (avoid NULLs on first row)
SELECT date,
revenue,
LAG(revenue, 1, 0) OVER (ORDER BY date) AS prev_revenue
FROM daily_revenue;
Running Totals and Moving Averages
SELECT
date,
revenue,
-- Cumulative (running) total
SUM(revenue) OVER (ORDER BY date
ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
) AS cumulative_revenue,
-- 7-day moving average
AVG(revenue) OVER (ORDER BY date
ROWS BETWEEN 6 PRECEDING AND CURRENT ROW
) AS moving_avg_7d,
-- Running count of distinct users per month partition
COUNT(*) OVER (
PARTITION BY DATE_TRUNC('month', date)
ORDER BY date
ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
) AS daily_cumulative_count
FROM daily_revenue;
Frame Types: ROWS vs RANGE vs GROUPS
| Frame Type | Unit | Tie handling | Use when |
|---|---|---|---|
| ROWS | Physical rows | Each row counted individually | Running totals, most window uses |
| RANGE | Logical range of values | Tied ORDER BY values treated as one group | Default with ORDER BY; date ranges where ties share the same cumulative |
| GROUPS | Groups of tied rows | All tied rows move together | Sparse data where distinct value groups matter |
Named Windows
-- Reuse the same window definition with WINDOW clause
SELECT
name,
salary,
AVG(salary) OVER dept_window AS dept_avg,
MIN(salary) OVER dept_window AS dept_min,
MAX(salary) OVER dept_window AS dept_max,
RANK() OVER (dept_window ORDER BY salary DESC) AS dept_rank
FROM employees
WINDOW dept_window AS (PARTITION BY department);
Running Total
Write a query that shows each sale with a running total of amount, ordered by date.
Table schema:
CREATE TABLE sales (id INT, sale_date TEXT, amount INT);
INSERT INTO sales VALUES (1,'2026-01-01',100),(2,'2026-01-02',250),(3,'2026-01-03',175),(4,'2026-01-04',300);
-- Show: sale_date, amount, running_total
SELECT
Hint: SUM(amount) OVER (ORDER BY sale_date). With ORDER BY, the default frame is RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW, which gives a running total.
Solution:
SELECT sale_date, amount,
SUM(amount) OVER (ORDER BY sale_date) as running_total
FROM sales
ORDER BY sale_date;
RANK vs DENSE_RANK with Ties
What rank values does each function produce? Focus on Dave's values.
View table schema
CREATE TABLE scores (name TEXT, score INT);
INSERT INTO scores VALUES ('Alice',95),('Bob',90),('Carol',90),('Dave',85);
SELECT name, score,
RANK() OVER (ORDER BY score DESC) as rnk,
DENSE_RANK() OVER (ORDER BY score DESC) as dense_rnk
FROM scores;
Alice: 1/1, Bob: 2/2, Carol: 2/2, Dave: 4/3. RANK skips numbers after ties (1,2,2,4). DENSE_RANK doesn't skip (1,2,2,3).
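Running the quiz query through sqlite3 confirms the tie behavior (window functions need SQLite 3.25+):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE scores (name TEXT, score INT);
INSERT INTO scores VALUES ('Alice',95),('Bob',90),('Carol',90),('Dave',85);
""")

rows = conn.execute("""
    SELECT name,
           RANK() OVER (ORDER BY score DESC) AS rnk,
           DENSE_RANK() OVER (ORDER BY score DESC) AS dense_rnk
    FROM scores ORDER BY score DESC, name
""").fetchall()

# Dave gets RANK 4 (gap after the tie) but DENSE_RANK 3 (no gap)
print(rows)
```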
9. Set Operations
-- UNION ALL: all rows from both queries (keeps duplicates, fastest)
SELECT id, name FROM customers_us
UNION ALL
SELECT id, name FROM customers_eu;
-- UNION: all rows, deduplicates (implicit DISTINCT — sorts internally, slower)
SELECT email FROM newsletter_list
UNION
SELECT email FROM purchase_list;
-- INTERSECT: rows present in both result sets
SELECT user_id FROM newsletter_subscribers
INTERSECT
SELECT user_id FROM paying_customers;
-- EXCEPT (Postgres/SQLite/SQL Server) / MINUS (Oracle):
-- rows in first set but NOT in second
SELECT email FROM all_users
EXCEPT
SELECT email FROM unsubscribed_users;
-- Rules:
-- 1. Same number of columns in all queries
-- 2. Compatible data types (column for column)
-- 3. Result column names come from the FIRST query
-- ORDER BY applies to the final combined result, not individual queries
SELECT id, name FROM table_a
UNION ALL
SELECT id, name FROM table_b
ORDER BY name; -- sorts entire combined result
UNION ALL skips the deduplication sort pass and is significantly faster. Only use UNION when you explicitly need deduplication — and even then, consider whether a DISTINCT on the outer query is more readable.
UNION vs UNION ALL
How many rows does each query return?
-- Query A
SELECT 'a' UNION ALL SELECT 'b' UNION ALL SELECT 'a';
-- Query B
SELECT 'a' UNION SELECT 'b' UNION SELECT 'a';
Query A: 3 rows (keeps duplicates). Query B: 2 rows (deduplicates). UNION = UNION ALL + DISTINCT. Always prefer UNION ALL when you know there are no duplicates — it avoids the sort/dedup cost.
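Both quiz queries run as-is in SQLite:

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# UNION ALL keeps the duplicate 'a'
union_all = conn.execute(
    "SELECT 'a' UNION ALL SELECT 'b' UNION ALL SELECT 'a'").fetchall()

# UNION deduplicates the combined result
union = conn.execute(
    "SELECT 'a' UNION SELECT 'b' UNION SELECT 'a'").fetchall()

print(len(union_all), len(union))  # 3 2
```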
10. Data Modification (DML)
INSERT
-- Single row
INSERT INTO users (name, email, created_at)
VALUES ('Alice', '[email protected]', NOW());
-- Multi-row (one statement — much faster than N separate INSERTs)
INSERT INTO users (name, email) VALUES
('Bob', '[email protected]'),
('Carol', '[email protected]'),
('Dave', '[email protected]');
-- INSERT INTO ... SELECT (bulk copy)
INSERT INTO archived_orders (id, user_id, total, created_at)
SELECT id, user_id, total, created_at
FROM orders
WHERE created_at < NOW() - INTERVAL '2 years';
-- RETURNING (Postgres): get inserted row data without a second query
INSERT INTO users (name, email) VALUES ('Eve', '[email protected]')
RETURNING id, created_at;
UPDATE
-- Basic update
UPDATE products SET price = price * 1.10 WHERE category = 'premium';
-- UPDATE with JOIN (Postgres: FROM clause)
UPDATE employees e
SET salary = salary * 1.05
FROM departments d
WHERE e.dept_id = d.id AND d.name = 'Engineering';
-- UPDATE with subquery
UPDATE orders
SET status = 'fulfilled'
WHERE id IN (
SELECT order_id FROM shipments WHERE delivered_at IS NOT NULL
);
-- UPDATE with CTE (Postgres)
WITH price_adjustments AS (
SELECT product_id, AVG(sale_price) * 0.9 AS new_price
FROM order_items
GROUP BY product_id
)
UPDATE products p
SET price = pa.new_price
FROM price_adjustments pa
WHERE p.id = pa.product_id;
-- RETURNING after UPDATE
UPDATE accounts SET balance = balance - 100 WHERE id = 42
RETURNING id, balance;
DELETE vs TRUNCATE
-- DELETE: removes rows, fires triggers, can WHERE-filter, logs every row, rollback OK
DELETE FROM sessions WHERE expires_at < NOW();
-- DELETE with JOIN (Postgres)
DELETE FROM order_items oi
USING orders o
WHERE oi.order_id = o.id AND o.status = 'cancelled';
-- TRUNCATE: removes ALL rows fast (transactional in Postgres, avoids per-row overhead)
-- Does NOT fire row-level triggers
-- Resets sequences (with RESTART IDENTITY)
TRUNCATE TABLE sessions;
TRUNCATE TABLE sessions RESTART IDENTITY CASCADE; -- Postgres
UPSERT (INSERT ... ON CONFLICT)
-- Postgres: INSERT ON CONFLICT
INSERT INTO user_preferences (user_id, theme, font_size)
VALUES (42, 'dark', 14)
ON CONFLICT (user_id) DO UPDATE
SET theme = EXCLUDED.theme,
font_size = EXCLUDED.font_size,
updated_at = NOW();
-- ON CONFLICT DO NOTHING (ignore duplicate)
INSERT INTO tags (name) VALUES ('python')
ON CONFLICT (name) DO NOTHING;
-- MySQL: INSERT ... ON DUPLICATE KEY UPDATE
INSERT INTO user_preferences (user_id, theme)
VALUES (42, 'dark')
ON DUPLICATE KEY UPDATE theme = VALUES(theme);
-- VALUES() in this position is deprecated since MySQL 8.0.20; prefer a row alias:
-- INSERT INTO user_preferences (user_id, theme) VALUES (42, 'dark') AS new
--   ON DUPLICATE KEY UPDATE theme = new.theme;
-- MERGE (SQL standard, supported in Postgres 15+ (BY SOURCE: PG17+), SQL Server, Oracle)
MERGE INTO target t
USING source s ON t.id = s.id
WHEN MATCHED THEN
UPDATE SET t.value = s.value
WHEN NOT MATCHED THEN
INSERT (id, value) VALUES (s.id, s.value)
-- WHEN NOT MATCHED BY SOURCE requires Postgres 17+
WHEN NOT MATCHED BY SOURCE THEN
DELETE;
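SQLite (3.24+) shares the Postgres-style ON CONFLICT syntax, so the counter-upsert idiom can be exercised end to end; note how unqualified `value` means the existing row while `excluded.value` means the row that failed to insert:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE counters (key TEXT PRIMARY KEY, value INT)")
# Three upserts of the same key accumulate into one row
for _ in range(3):
    conn.execute("""
        INSERT INTO counters (key, value) VALUES ('clicks', 1)
        ON CONFLICT(key) DO UPDATE SET value = value + excluded.value
    """)
value = conn.execute("SELECT value FROM counters WHERE key = 'clicks'").fetchone()[0]
print(value)  # 3
```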
11. Schema (DDL)
CREATE TABLE
CREATE TABLE orders (
id BIGSERIAL PRIMARY KEY,
user_id BIGINT NOT NULL REFERENCES users(id) ON DELETE RESTRICT,
status VARCHAR(20) NOT NULL DEFAULT 'pending',
total NUMERIC(12, 2) NOT NULL CHECK (total >= 0),
notes TEXT,
created_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),
updated_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),
-- Table-level constraints
CONSTRAINT orders_status_check CHECK (
status IN ('pending', 'processing', 'shipped', 'delivered', 'cancelled')
)
);
-- Partial uniqueness (e.g. at most one pending order per user) cannot be
-- declared as a table-level UNIQUE constraint; use a partial unique index:
CREATE UNIQUE INDEX orders_one_pending_per_user
ON orders (user_id) WHERE status = 'pending';
ALTER TABLE
-- Add column (non-null requires a default or backfill)
ALTER TABLE users ADD COLUMN last_login TIMESTAMPTZ;
ALTER TABLE users ADD COLUMN tier VARCHAR(20) NOT NULL DEFAULT 'free';
-- Drop column
ALTER TABLE users DROP COLUMN IF EXISTS legacy_field;
ALTER TABLE users DROP COLUMN profile_json CASCADE; -- drops dependent views/indexes
-- Rename column
ALTER TABLE users RENAME COLUMN fname TO first_name;
-- Change column type (Postgres)
ALTER TABLE orders ALTER COLUMN notes TYPE TEXT;
ALTER TABLE products ALTER COLUMN price TYPE NUMERIC(14,2)
USING price::NUMERIC(14,2); -- explicit cast for type change
-- Add constraint
ALTER TABLE orders ADD CONSTRAINT orders_positive_total CHECK (total > 0);
-- Drop constraint
ALTER TABLE orders DROP CONSTRAINT orders_positive_total;
-- Rename table
ALTER TABLE order_lines RENAME TO order_items;
Indexes
-- B-tree (default): equality, ranges, ORDER BY
CREATE INDEX idx_orders_user_id ON orders(user_id);
-- Multi-column: order matters — leftmost prefix rule
CREATE INDEX idx_orders_user_status ON orders(user_id, status);
-- Partial index: index a subset of rows (smaller, faster)
CREATE INDEX idx_orders_pending ON orders(created_at)
WHERE status = 'pending';
-- Covering index (Postgres INCLUDE): store extra columns in leaf pages
-- Allows index-only scans for SELECT id, status FROM orders WHERE user_id = ?
CREATE INDEX idx_orders_user_covering ON orders(user_id) INCLUDE (status, total);
-- Expression index
CREATE INDEX idx_users_lower_email ON users(LOWER(email));
-- Now: WHERE LOWER(email) = '[email protected]' uses the index
-- GIN index: multi-valued types (arrays, JSONB, full-text search)
CREATE INDEX idx_posts_tags ON posts USING GIN(tags);
CREATE INDEX idx_events_data ON events USING GIN(data); -- JSONB
-- GiST index: geometric, full-text, range types
CREATE INDEX idx_locations_geo ON locations USING GIST(coordinates);
-- Hash index: equality only, smaller than B-tree for large keys
CREATE INDEX idx_users_email_hash ON users USING HASH(email);
-- Unique index
CREATE UNIQUE INDEX idx_users_email_unique ON users(email);
-- Concurrent index build (Postgres): doesn't block writes
-- (cannot run inside a transaction block)
CREATE INDEX CONCURRENTLY idx_orders_created ON orders(created_at);
-- Drop index
DROP INDEX IF EXISTS idx_orders_user_id;
Views and Materialized Views
-- Regular view: stored query, re-executed on each access
CREATE OR REPLACE VIEW active_users AS
SELECT id, name, email, created_at
FROM users
WHERE is_active = TRUE AND deleted_at IS NULL;
-- Updatable views (simple views on single table without aggregates are updatable)
UPDATE active_users SET name = 'Alice B.' WHERE id = 42;
-- Materialized view (Postgres): cached result, needs manual refresh
CREATE MATERIALIZED VIEW monthly_revenue AS
SELECT
DATE_TRUNC('month', created_at) AS month,
SUM(total) AS revenue,
COUNT(*) AS order_count
FROM orders
WHERE status = 'delivered'
GROUP BY 1;
-- Refresh the materialized view (blocks reads until done)
REFRESH MATERIALIZED VIEW monthly_revenue;
-- Non-blocking refresh (requires unique index on the view)
CREATE UNIQUE INDEX ON monthly_revenue(month);
REFRESH MATERIALIZED VIEW CONCURRENTLY monthly_revenue;
Sequences and Auto-Increment
-- Postgres: SERIAL (legacy shorthand)
id BIGSERIAL PRIMARY KEY
-- Equivalent to:
CREATE SEQUENCE orders_id_seq;
id BIGINT NOT NULL DEFAULT nextval('orders_id_seq') PRIMARY KEY;
-- Postgres: SQL standard IDENTITY (preferred in Postgres 10+)
id BIGINT GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
-- GENERATED BY DEFAULT: allows manual inserts (useful for migrations)
id BIGINT GENERATED BY DEFAULT AS IDENTITY PRIMARY KEY,
-- MySQL: AUTO_INCREMENT
id BIGINT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
-- Get current sequence value
SELECT currval('orders_id_seq'); -- last value in current session
SELECT nextval('orders_id_seq'); -- advance and return
SELECT lastval(); -- last value from any sequence this session
12. Transactions & Concurrency
Transaction Basics
-- Explicit transaction
BEGIN; -- or: START TRANSACTION;
UPDATE accounts SET balance = balance - 500 WHERE id = 1;
UPDATE accounts SET balance = balance + 500 WHERE id = 2;
COMMIT; -- persist changes
-- or ROLLBACK; -- undo everything since BEGIN
-- Savepoints: partial rollback within a transaction
BEGIN;
INSERT INTO orders (user_id, total) VALUES (1, 100);
SAVEPOINT after_order;
INSERT INTO order_items (order_id, product_id) VALUES (1, 99); -- this might fail
-- If it fails:
ROLLBACK TO SAVEPOINT after_order;
-- Continue from the savepoint...
COMMIT;
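The savepoint flow above can be reproduced with sqlite3 in manual-transaction mode (a CHECK violation stands in for "this might fail"): the failed insert is rolled back to the savepoint, but the work before it survives the COMMIT.

```python
import sqlite3

conn = sqlite3.connect(":memory:", isolation_level=None)  # manual transaction control
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, total INT CHECK (total > 0))")
conn.execute("BEGIN")
conn.execute("INSERT INTO orders (total) VALUES (100)")
conn.execute("SAVEPOINT after_order")
try:
    conn.execute("INSERT INTO orders (total) VALUES (-5)")  # violates CHECK
except sqlite3.IntegrityError:
    conn.execute("ROLLBACK TO SAVEPOINT after_order")       # undo only the failed part
conn.execute("COMMIT")
count = conn.execute("SELECT COUNT(*) FROM orders").fetchone()[0]
print(count)  # 1 — the first insert survived
```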
ACID Properties
| Property | Guarantee | Mechanism |
|---|---|---|
| Atomicity | All or nothing — transaction completes fully or not at all | Transaction log / UNDO log |
| Consistency | DB moves from one valid state to another; constraints hold | Constraints, triggers, app logic |
| Isolation | Concurrent transactions appear sequential | Locks, MVCC |
| Durability | Committed data survives crashes | WAL / redo log, fsync |
Isolation Levels
| Isolation Level | Dirty Read | Non-Repeatable Read | Phantom Read |
|---|---|---|---|
| READ UNCOMMITTED | Possible | Possible | Possible |
| READ COMMITTED (default in Postgres) | Prevented | Possible | Possible |
| REPEATABLE READ (default in MySQL InnoDB) | Prevented | Prevented | Possible per the standard (InnoDB largely prevents phantoms via next-key locking; Postgres prevents them too) |
| SERIALIZABLE | Prevented | Prevented | Prevented |
-- Set isolation level for current transaction
BEGIN;
SET TRANSACTION ISOLATION LEVEL REPEATABLE READ;
-- ... queries ...
COMMIT;
-- Postgres default: READ COMMITTED
-- MySQL InnoDB default: REPEATABLE READ
-- Anomaly definitions:
-- Dirty Read: reading uncommitted changes from another transaction
-- Non-Repeatable Read: reading same row twice in same txn, getting different values
-- Phantom Read: a new row appears in a range query within the same transaction
Locking
-- Explicit row-level locks (pessimistic locking)
-- FOR UPDATE: locks selected rows for exclusive update
SELECT * FROM accounts WHERE id = 1 FOR UPDATE;
-- FOR SHARE (Postgres) / LOCK IN SHARE MODE (MySQL):
-- allows other readers but blocks writers
SELECT * FROM products WHERE id = 99 FOR SHARE;
-- SKIP LOCKED: skip rows locked by other transactions (queue processing)
SELECT * FROM jobs WHERE status = 'pending'
ORDER BY created_at
LIMIT 10
FOR UPDATE SKIP LOCKED;
-- NOWAIT: fail immediately if lock unavailable
SELECT * FROM accounts WHERE id = 1 FOR UPDATE NOWAIT;
-- Table-level lock (Postgres)
LOCK TABLE orders IN ACCESS EXCLUSIVE MODE;
-- Advisory locks (Postgres): application-defined locks
SELECT pg_advisory_lock(42); -- session-level, blocks
SELECT pg_try_advisory_lock(42); -- returns false instead of blocking
SELECT pg_advisory_unlock(42);
MVCC Overview
- Readers never block writers; writers never block readers
- Old row versions accumulate (Postgres: need VACUUM to reclaim space)
- Long-running transactions prevent VACUUM from reclaiming old versions — monitor pg_stat_activity
Deadlock Prevention
-- Deadlock: T1 holds lock on A, waits for B; T2 holds lock on B, waits for A
-- Prevention: always acquire locks in the same order
-- T1 and T2 both lock accounts in ascending ID order:
-- Bad (different orders between transactions leads to deadlock risk):
-- T1: LOCK account 1, then account 2
-- T2: LOCK account 2, then account 1
-- Good (consistent ordering):
BEGIN;
SELECT * FROM accounts WHERE id IN (1, 2) ORDER BY id FOR UPDATE;
-- Now update both — no deadlock possible
COMMIT;
13. Query Optimization
EXPLAIN / EXPLAIN ANALYZE
-- Postgres: show plan without executing
EXPLAIN SELECT * FROM orders WHERE user_id = 42;
-- Postgres: execute and show actual timings
EXPLAIN (ANALYZE, BUFFERS, FORMAT TEXT) SELECT * FROM orders WHERE user_id = 42;
-- MySQL
EXPLAIN SELECT * FROM orders WHERE user_id = 42;
EXPLAIN FORMAT=JSON SELECT * FROM orders WHERE user_id = 42;
-- Reading a Postgres EXPLAIN output:
-- Seq Scan: full table scan — no usable index
-- Index Scan: uses index, then fetches heap for each row
-- Index Only Scan: all needed columns in index — no heap access
-- Bitmap Index Scan + Bitmap Heap Scan: batch fetch for many rows
-- Hash Join: build hash table from smaller table, probe with larger
-- Nested Loop: for each outer row, scan inner — good for small result sets
-- Merge Join: requires both inputs sorted on join key
-- Key numbers to look at:
-- rows=N: estimated rows (compare with actual=N to spot planner mistakes)
-- cost=N.N..N.N: startup cost..total cost (in abstract units)
-- actual time=N..N ms: wall clock time per loop
-- loops=N: how many times this node executed
-- Buffers: shared hit=N (from cache) vs read=N (from disk)
Index Strategy
-- Check index usage
SELECT schemaname, tablename, indexname, idx_scan, idx_tup_read, idx_tup_fetch
FROM pg_stat_user_indexes
ORDER BY idx_scan DESC;
-- Find unused indexes (candidates for removal)
SELECT indexrelname, idx_scan
FROM pg_stat_user_indexes
WHERE idx_scan = 0;
-- Find missing indexes (sequential scans on large tables)
SELECT relname, seq_scan, idx_scan
FROM pg_stat_user_tables
WHERE seq_scan > idx_scan AND n_live_tup > 10000
ORDER BY seq_scan DESC;
Common Anti-Patterns
-- 1. SELECT *: fetches unused columns, prevents index-only scans
-- BAD:
SELECT * FROM orders WHERE user_id = 1;
-- GOOD:
SELECT id, status, total FROM orders WHERE user_id = 1;
-- 2. Function on indexed column: prevents index use
-- BAD: function on indexed column prevents index use
SELECT * FROM orders WHERE YEAR(created_at) = 2026; -- MySQL
SELECT * FROM orders WHERE EXTRACT(YEAR FROM created_at) = 2026; -- Postgres (also bad)
-- GOOD: range condition uses index
SELECT * FROM orders WHERE created_at >= '2026-01-01' AND created_at < '2027-01-01';
-- 3. Implicit type cast: prevents index use
-- BAD: user_id is INT, but '42' is VARCHAR — implicit cast prevents index
SELECT * FROM orders WHERE user_id = '42';
-- GOOD:
SELECT * FROM orders WHERE user_id = 42;
-- 4. Leading wildcard: full table scan (no B-tree prefix)
-- BAD:
SELECT * FROM products WHERE name LIKE '%phone%';
-- GOOD for prefix: uses index
SELECT * FROM products WHERE name LIKE 'Smart%';
-- For contains: use full-text search or pg_trgm index
-- 5. NOT IN with NULLs (covered in Subqueries section)
-- 6. OFFSET pagination on large tables: scans and discards rows
-- BAD (OFFSET 20000 scans and discards 20,000 rows just to return 20):
SELECT * FROM posts ORDER BY id LIMIT 20 OFFSET 20000;
-- GOOD: keyset/cursor pagination
SELECT * FROM posts WHERE id > 20000 ORDER BY id LIMIT 20;
Statistics and Planner Hints
-- Update table statistics (runs automatically via autovacuum, but run manually after bulk load)
ANALYZE orders;
ANALYZE; -- all tables
-- Increase the statistics target for a column with a skewed value distribution
ALTER TABLE orders ALTER COLUMN status SET STATISTICS 500; -- default is 100
-- Postgres planner configuration (session-level)
SET enable_seqscan = OFF; -- discourage seq scans (testing only)
SET work_mem = '256MB'; -- more memory for sorts/hashes
SET enable_hashjoin = OFF; -- force nested loop join
-- Query hints (MySQL)
SELECT /*+ INDEX(orders idx_orders_user_id) */ * FROM orders WHERE user_id = 42;
SELECT * FROM orders USE INDEX (idx_orders_user_id) WHERE user_id = 42;
Use EXPLAIN ANALYZE to confirm the bottleneck before adding indexes. An unnecessary index adds write overhead and storage. The query planner is usually right — check whether statistics are stale (run ANALYZE) before forcing a plan with hints.
14. Advanced Features
JSON / JSONB Operations (Postgres)
-- Arrow operators
SELECT data->'user' AS user_obj, -- returns JSON
data->>'user' AS user_text, -- returns text
data->'user'->>'name' AS user_name -- nested extraction
FROM events;
-- Path operator (Postgres 9.3+)
SELECT data #> '{user,address,city}' AS city_json,
data #>> '{user,address,city}' AS city_text
FROM events;
-- Containment: does the JSONB contain this subset?
SELECT * FROM events WHERE data @> '{"type": "click"}';
-- Key existence
SELECT * FROM events WHERE data ? 'error_code'; -- has key?
SELECT * FROM events WHERE data ?| ARRAY['a', 'b']; -- any key?
SELECT * FROM events WHERE data ?& ARRAY['a', 'b']; -- all keys?
-- Update JSONB fields
UPDATE users SET profile = profile || '{"verified": true}';
UPDATE users SET profile = profile - 'legacy_field'; -- remove key
-- Expand JSONB array to rows
SELECT id, jsonb_array_elements(data->'tags') AS tag FROM posts;
-- Aggregate rows into JSON
SELECT jsonb_agg(jsonb_build_object('id', id, 'name', name)) AS users
FROM users WHERE is_active = TRUE;
JSON in MySQL
-- MySQL JSON_EXTRACT
SELECT JSON_EXTRACT(data, '$.user.name') AS name FROM events;
SELECT data->>'$.user.name' AS name FROM events; -- shorthand
-- Update
UPDATE events SET data = JSON_SET(data, '$.status', 'processed');
UPDATE events SET data = JSON_REMOVE(data, '$.legacy');
-- Array
SELECT JSON_ARRAYAGG(name) AS names FROM users WHERE is_active = 1;
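SQLite's JSON1 functions (built in on modern versions) mirror the MySQL function style, so path extraction and in-place updates can be tried locally; event payload below is illustrative:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (data TEXT)")
conn.execute(
    """INSERT INTO events VALUES ('{"user": {"name": "Alice"}, "type": "click"}')""")
# Extract a nested field with a JSON path
name = conn.execute(
    "SELECT json_extract(data, '$.user.name') FROM events").fetchone()[0]
# Update a field in place
conn.execute("UPDATE events SET data = json_set(data, '$.status', 'processed')")
status = conn.execute(
    "SELECT json_extract(data, '$.status') FROM events").fetchone()[0]
print(name, status)  # Alice processed
```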
Array Operations (Postgres)
-- Array construction
SELECT ARRAY[1, 2, 3];
SELECT ARRAY(SELECT id FROM users LIMIT 10);
-- Membership
SELECT * FROM posts WHERE 'sql' = ANY(tags); -- contains element
SELECT * FROM posts WHERE tags @> ARRAY['sql']; -- contains all elements
-- Unnest (expand array to rows)
SELECT id, UNNEST(tags) AS tag FROM posts;
-- Array aggregation
SELECT ARRAY_AGG(name ORDER BY name) FROM users GROUP BY dept_id;
-- Array operations
SELECT tags || ARRAY['new_tag'] FROM posts WHERE id = 1; -- append
SELECT ARRAY_REMOVE(tags, 'old_tag') FROM posts; -- remove element
SELECT ARRAY_LENGTH(tags, 1) FROM posts; -- length of dim 1
SELECT ARRAY_POSITIONS(tags, 'sql') FROM posts; -- indexes of element
Full-Text Search (Postgres)
-- tsvector: preprocessed document; tsquery: search query
SELECT to_tsvector('english', 'The quick brown fox jumps over the lazy dog');
-- 'brown':3 'dog':9 'fox':4 'jump':5 'lazi':8 'quick':2  (the stemmer reduces 'lazy' to 'lazi')
SELECT to_tsquery('english', 'quick & fox');
-- Search
SELECT title FROM articles
WHERE to_tsvector('english', title || ' ' || body) @@ to_tsquery('english', 'sql & index');
-- Index for FTS
CREATE INDEX idx_articles_fts ON articles
USING GIN(to_tsvector('english', title || ' ' || body));
-- Rank results by relevance
SELECT title,
ts_rank(to_tsvector('english', body), to_tsquery('sql')) AS rank
FROM articles
WHERE to_tsvector('english', body) @@ to_tsquery('sql')
ORDER BY rank DESC;
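SQLite offers an analogous feature via FTS5 virtual tables (available in most Python builds, though it is a compile-time extension); a minimal sketch with hypothetical article rows:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# FTS5 virtual table: all columns are indexed for full-text search
conn.execute("CREATE VIRTUAL TABLE articles USING fts5(title, body)")
conn.execute(
    "INSERT INTO articles VALUES ('SQL indexing', 'How a B-tree index speeds up queries')")
conn.execute("INSERT INTO articles VALUES ('Cooking', 'A recipe for soup')")
# MATCH searches tokenized content across indexed columns
titles = conn.execute(
    "SELECT title FROM articles WHERE articles MATCH 'index'").fetchall()
print(titles)  # [('SQL indexing',)]
```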
Generated Columns
-- Postgres 12+ / MySQL 5.7+
-- STORED: computed and saved to disk
-- VIRTUAL: computed on read (Postgres supports only STORED)
CREATE TABLE rectangles (
width NUMERIC(10, 2) NOT NULL,
height NUMERIC(10, 2) NOT NULL,
area NUMERIC(20, 4) GENERATED ALWAYS AS (width * height) STORED
);
-- MySQL: VIRTUAL (computed on read, not stored)
CREATE TABLE rectangles (
width DECIMAL(10,2),
height DECIMAL(10,2),
area DECIMAL(20,4) AS (width * height) VIRTUAL
);
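SQLite also supports generated columns (3.31+, defaulting to VIRTUAL), so the rectangles example can be run as-is through sqlite3:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
CREATE TABLE rectangles (
    width  REAL NOT NULL,
    height REAL NOT NULL,
    area   REAL GENERATED ALWAYS AS (width * height) VIRTUAL
)""")
# Only the base columns are inserted; area is computed on read
conn.execute("INSERT INTO rectangles (width, height) VALUES (3, 4)")
area = conn.execute("SELECT area FROM rectangles").fetchone()[0]
print(area)  # 12.0
```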
Table Partitioning (Postgres)
-- Declarative partitioning (Postgres 10+)
CREATE TABLE orders (
id BIGSERIAL,
created_at TIMESTAMPTZ NOT NULL,
total NUMERIC(12,2)
) PARTITION BY RANGE (created_at);
-- Create partitions
CREATE TABLE orders_2025 PARTITION OF orders
FOR VALUES FROM ('2025-01-01') TO ('2026-01-01');
CREATE TABLE orders_2026 PARTITION OF orders
FOR VALUES FROM ('2026-01-01') TO ('2027-01-01');
-- List partitioning
CREATE TABLE customers (id BIGSERIAL, region TEXT)
PARTITION BY LIST (region);
CREATE TABLE customers_us PARTITION OF customers FOR VALUES IN ('US', 'CA');
CREATE TABLE customers_eu PARTITION OF customers FOR VALUES IN ('DE', 'FR', 'GB');
-- Hash partitioning (distribute evenly)
CREATE TABLE events (id BIGSERIAL, user_id BIGINT)
PARTITION BY HASH (user_id);
CREATE TABLE events_0 PARTITION OF events FOR VALUES WITH (MODULUS 4, REMAINDER 0);
CREATE TABLE events_1 PARTITION OF events FOR VALUES WITH (MODULUS 4, REMAINDER 1);
Temporary Tables
-- Temporary table: session-scoped, dropped at end of session
CREATE TEMP TABLE staging (
id INT,
value TEXT
);
-- Populate and use
INSERT INTO staging SELECT id, raw_data FROM import_buffer;
CREATE INDEX ON staging(id); -- can index temp tables
-- ON COMMIT DROP: drop the temp table automatically at transaction end
CREATE TEMP TABLE session_data (x INT)
ON COMMIT DROP;
15. Stored Procedures & Functions
Functions in Postgres (PL/pgSQL)
-- Returns a scalar value
CREATE OR REPLACE FUNCTION get_user_total_spent(p_user_id BIGINT)
RETURNS NUMERIC AS $$
DECLARE
v_total NUMERIC;
BEGIN
SELECT COALESCE(SUM(total), 0)
INTO v_total
FROM orders
WHERE user_id = p_user_id AND status = 'delivered';
RETURN v_total;
END;
$$ LANGUAGE plpgsql
STABLE; -- STABLE: won't modify DB, same args => same result within transaction
-- Call it
SELECT get_user_total_spent(42);
SELECT * FROM users WHERE get_user_total_spent(id) > 1000;
-- IMMUTABLE: same args always same result (pure function, can be indexed)
-- STABLE: won't modify DB, same result within a single table scan (may see changes between scans)
-- VOLATILE (default): can return different results every call
-- Returns a set of rows (table function)
CREATE OR REPLACE FUNCTION top_customers(p_limit INT DEFAULT 10)
RETURNS TABLE (id BIGINT, name TEXT, total_spent NUMERIC) AS $$
BEGIN
RETURN QUERY
SELECT u.id, u.name, COALESCE(SUM(o.total), 0) AS total_spent
FROM users u
LEFT JOIN orders o ON u.id = o.user_id AND o.status = 'delivered'
GROUP BY u.id, u.name
ORDER BY total_spent DESC
LIMIT p_limit;
END;
$$ LANGUAGE plpgsql STABLE;
-- Use in FROM
SELECT * FROM top_customers(5);
Stored Procedures (Postgres 11+)
-- Procedures can COMMIT/ROLLBACK (functions cannot)
CREATE OR REPLACE PROCEDURE process_batch(p_batch_size INT)
LANGUAGE plpgsql AS $$
DECLARE
v_count INT := 0;
BEGIN
LOOP
UPDATE jobs SET status = 'processing'
WHERE id IN (
SELECT id FROM jobs WHERE status = 'pending' LIMIT p_batch_size FOR UPDATE SKIP LOCKED
);
GET DIAGNOSTICS v_count = ROW_COUNT;
EXIT WHEN v_count = 0;
-- Do work here...
COMMIT; -- commit each batch
END LOOP;
END;
$$;
-- Call with CALL
CALL process_batch(100);
Triggers
-- 1. Create the trigger function
CREATE OR REPLACE FUNCTION set_updated_at()
RETURNS TRIGGER AS $$
BEGIN
NEW.updated_at = NOW();
RETURN NEW; -- BEFORE triggers: return NEW (possibly modified); return NULL to skip the row operation
END;
$$ LANGUAGE plpgsql;
-- 2. Create the trigger
CREATE TRIGGER trg_orders_updated_at
BEFORE UPDATE ON orders
FOR EACH ROW
EXECUTE FUNCTION set_updated_at();
-- Audit trigger: log changes to a separate table
CREATE OR REPLACE FUNCTION audit_changes()
RETURNS TRIGGER AS $$
BEGIN
INSERT INTO audit_log (table_name, operation, old_row, new_row, changed_at)
VALUES (TG_TABLE_NAME, TG_OP, row_to_json(OLD), row_to_json(NEW), NOW());
RETURN NEW;
END;
$$ LANGUAGE plpgsql;
CREATE TRIGGER trg_orders_audit
AFTER INSERT OR UPDATE OR DELETE ON orders
FOR EACH ROW EXECUTE FUNCTION audit_changes();
-- Drop trigger
DROP TRIGGER IF EXISTS trg_orders_updated_at ON orders;
- Functions: return a value, called in expressions (SELECT func()), cannot COMMIT/ROLLBACK
- Procedures: called with CALL, can COMMIT/ROLLBACK mid-execution, no return value (use OUT or INOUT parameters)
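SQLite triggers use inline trigger bodies rather than separate trigger functions, but the audit pattern translates directly; a minimal sketch with an illustrative schema:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE orders (id INTEGER PRIMARY KEY, status TEXT);
CREATE TABLE audit_log (operation TEXT, order_id INT);
-- Trigger body is inline SQL; NEW/OLD work as in Postgres
CREATE TRIGGER trg_orders_audit AFTER UPDATE ON orders
BEGIN
    INSERT INTO audit_log VALUES ('UPDATE', NEW.id);
END;
INSERT INTO orders (status) VALUES ('pending');
UPDATE orders SET status = 'shipped' WHERE id = 1;
""")
log = conn.execute("SELECT * FROM audit_log").fetchall()
print(log)  # [('UPDATE', 1)]
```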
16. Security & Permissions
GRANT and REVOKE
-- Grant specific privileges
GRANT SELECT, INSERT, UPDATE ON orders TO app_user;
GRANT SELECT ON ALL TABLES IN SCHEMA public TO readonly_user;
GRANT USAGE, SELECT ON SEQUENCE orders_id_seq TO app_user;
GRANT EXECUTE ON FUNCTION get_user_total_spent(BIGINT) TO app_user;
-- Grant with ability to pass it on
GRANT SELECT ON orders TO reporting_user WITH GRANT OPTION;
-- Revoke
REVOKE INSERT, UPDATE ON orders FROM app_user;
REVOKE ALL PRIVILEGES ON ALL TABLES IN SCHEMA public FROM legacy_user;
-- Roles (Postgres): groups of privileges
CREATE ROLE readonly_role;
GRANT SELECT ON ALL TABLES IN SCHEMA public TO readonly_role;
GRANT readonly_role TO alice, bob; -- alice and bob inherit the role's privileges
-- Default privileges for future tables
ALTER DEFAULT PRIVILEGES IN SCHEMA public
GRANT SELECT ON TABLES TO readonly_role;
-- Schema privileges
GRANT USAGE ON SCHEMA analytics TO reporting_user;
GRANT CREATE ON SCHEMA analytics TO developer;
Row-Level Security (Postgres)
-- Enable RLS on a table
ALTER TABLE orders ENABLE ROW LEVEL SECURITY;
-- Create policies (without a policy, no rows are accessible)
-- Users can only see their own orders
CREATE POLICY user_own_orders ON orders
FOR ALL
TO app_user
USING (user_id = current_setting('app.current_user_id')::BIGINT);
-- Admins see everything
CREATE POLICY admin_all_orders ON orders
FOR ALL
TO admin_role
USING (TRUE);
-- Force RLS even for table owner (security hardening)
ALTER TABLE orders FORCE ROW LEVEL SECURITY;
-- Disable RLS (e.g. for migrations run as superuser)
SET row_security = OFF;
SQL Injection Prevention
-- NEVER concatenate user input into SQL strings
-- Vulnerable (Python example — do not do this):
-- query = "SELECT * FROM users WHERE name = '" + user_input + "'"
-- Input: ' OR '1'='1 => returns all users
-- SAFE: use parameterized queries (the driver handles escaping)
-- Python psycopg2:
cursor.execute("SELECT * FROM users WHERE name = %s", (user_input,))
-- Python asyncpg:
await conn.fetch("SELECT * FROM users WHERE name = $1", user_input)
-- If you must build dynamic SQL in PL/pgSQL, use EXECUTE with USING:
CREATE OR REPLACE FUNCTION search_table(p_table TEXT, p_value TEXT)
RETURNS SETOF RECORD AS $$
BEGIN
RETURN QUERY EXECUTE
format('SELECT * FROM %I WHERE name = $1', p_table)
USING p_value;
-- %I: identifier quoting (prevents injection via table name)
-- $1 with USING: parameter binding (prevents injection via value)
END;
$$ LANGUAGE plpgsql;
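The classic `' OR '1'='1` payload can be demonstrated safely against an in-memory SQLite database; string concatenation lets the payload rewrite the WHERE clause, while parameter binding treats it as a plain value:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (name TEXT)")
conn.execute("INSERT INTO users VALUES ('Alice')")
malicious = "' OR '1'='1"
# VULNERABLE: concatenation turns the input into SQL, matching every row
unsafe = conn.execute(
    "SELECT * FROM users WHERE name = '" + malicious + "'").fetchall()
# SAFE: the driver binds the input as a literal string
safe = conn.execute("SELECT * FROM users WHERE name = ?", (malicious,)).fetchall()
print(len(unsafe), len(safe))  # 1 0
```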
17. Dialect Differences
Syntax Comparison Table
| Feature | PostgreSQL | MySQL | SQLite | SQL Server |
|---|---|---|---|---|
| String concat | a || b | CONCAT(a, b) | a || b | a + b |
| Limit rows | LIMIT n OFFSET m | LIMIT n OFFSET m | LIMIT n OFFSET m | FETCH FIRST n ROWS ONLY or TOP n |
| Auto-increment | BIGSERIAL / GENERATED ALWAYS AS IDENTITY | AUTO_INCREMENT | INTEGER PRIMARY KEY (implicit) | IDENTITY(1,1) |
| Case-insensitive LIKE | ILIKE | LIKE (default) | LIKE (ASCII only) | LIKE (depends on collation) |
| Regex match | ~ / ~* | REGEXP | REGEXP (no default implementation); GLOB for glob patterns | LIKE patterns only (no regex) |
| Upsert | ON CONFLICT DO UPDATE | ON DUPLICATE KEY UPDATE | ON CONFLICT DO UPDATE | MERGE |
| Returning data | RETURNING * | Not supported | RETURNING * (3.35+) | OUTPUT INSERTED.* |
| Full outer join | Yes | No (emulate with UNION) | No (emulate with UNION) | Yes |
| Lateral join | LATERAL | LATERAL (8.0+) | No | CROSS APPLY |
| Boolean type | Native BOOLEAN | TINYINT(1) | INTEGER 0/1 | BIT |
| JSON support | JSON / JSONB (indexable) | JSON (5.7+) | TEXT + JSON1 functions | JSON functions (2016+) |
Date Function Comparison
-- Current timestamp
-- PostgreSQL: NOW() or CURRENT_TIMESTAMP
-- MySQL: NOW() or CURRENT_TIMESTAMP
-- SQLite: datetime('now')
-- SQL Server: GETDATE() or SYSDATETIME()
-- Truncate to month
-- PostgreSQL: DATE_TRUNC('month', ts)
-- MySQL: DATE_FORMAT(ts, '%Y-%m-01')
-- SQLite: strftime('%Y-%m-01', ts)
-- SQL Server: DATETRUNC(month, ts) -- 2022+; else: CAST(FORMAT(ts,'yyyy-MM-01') AS DATE)
-- Extract year
-- PostgreSQL: EXTRACT(YEAR FROM ts) or DATE_PART('year', ts)
-- MySQL: YEAR(ts)
-- SQLite: strftime('%Y', ts)
-- SQL Server: YEAR(ts) or DATEPART(year, ts)
-- Add interval
-- PostgreSQL: ts + INTERVAL '7 days'
-- MySQL: DATE_ADD(ts, INTERVAL 7 DAY)
-- SQLite: datetime(ts, '+7 days')
-- SQL Server: DATEADD(day, 7, ts)
Upsert Syntax Comparison
-- PostgreSQL (9.5+)
INSERT INTO counters (key, value) VALUES ('clicks', 1)
ON CONFLICT (key) DO UPDATE SET value = counters.value + EXCLUDED.value;
-- MySQL
INSERT INTO counters (key, value) VALUES ('clicks', 1)
ON DUPLICATE KEY UPDATE value = value + VALUES(value);
-- SQLite (3.24+)
INSERT INTO counters (key, value) VALUES ('clicks', 1)
ON CONFLICT(key) DO UPDATE SET value = value + excluded.value;
-- SQL Server (MERGE)
MERGE INTO counters AS target
USING (VALUES ('clicks', 1)) AS source(key, value)
ON target.key = source.key
WHEN MATCHED THEN UPDATE SET target.value = target.value + source.value
WHEN NOT MATCHED THEN INSERT (key, value) VALUES (source.key, source.value);
18. Common Pitfalls & Gotchas
NULL Pitfalls
-- 1. NULL in aggregates: NULL is silently ignored
SELECT AVG(rating) FROM products; -- NULL ratings excluded
-- If all values are NULL, AVG returns NULL (not 0)
-- 2. NULL in NOT IN (covered earlier — returns empty set)
-- 3. NULL in CASE: NULL input to simple CASE falls through to ELSE
SELECT CASE NULL WHEN NULL THEN 'match' ELSE 'no match' END;
-- Returns: 'no match' (use IS NULL test instead)
-- 4. GROUP BY treats NULLs as equal (they group together)
SELECT category, COUNT(*) FROM products GROUP BY category;
-- NULL category forms its own group
-- 5. Outer join NULLs vs. true NULLs
-- After LEFT JOIN, NULLs in right-side columns may be either:
-- a) The stored value IS NULL
-- b) No matching row existed (join produced NULL)
-- Can't distinguish without IS NOT NULL on a NOT NULL column
GROUP BY Gotchas
-- 1. Non-aggregated columns not in GROUP BY
-- ANSI SQL: error. MySQL (with ONLY_FULL_GROUP_BY disabled): silently picks arbitrary value
-- Enable ONLY_FULL_GROUP_BY in MySQL (default since 5.7.5; append rather than overwrite the mode):
SET SESSION sql_mode = CONCAT(@@SESSION.sql_mode, ',ONLY_FULL_GROUP_BY');
-- 2. Filtering on aggregate — use HAVING, not WHERE
-- WRONG:
SELECT dept_id, COUNT(*) FROM employees WHERE COUNT(*) > 5 GROUP BY dept_id;
-- RIGHT:
SELECT dept_id, COUNT(*) FROM employees GROUP BY dept_id HAVING COUNT(*) > 5;
-- 3. GROUP BY column position (ordinal reference) is fragile
-- If you reorder SELECT columns, the GROUP BY breaks silently
SELECT name, COUNT(*) FROM t GROUP BY 1; -- avoid
SELECT name, COUNT(*) FROM t GROUP BY name; -- explicit
OFFSET Pagination at Scale
-- OFFSET N scans and discards N rows — O(N) cost
-- Page 1 (fast): LIMIT 20 OFFSET 0 — reads 20 rows
-- Page 100 (slow): LIMIT 20 OFFSET 1980 — reads 2000 rows, discards 1980
-- Keyset / cursor pagination: always O(1) index seek
-- Requires a stable sort key (usually monotonic ID or timestamp)
-- Page 1
SELECT id, title, created_at FROM posts ORDER BY id LIMIT 20;
-- Page 2: pass last seen ID as cursor
SELECT id, title, created_at FROM posts
WHERE id > :last_id -- use the last id from previous page
ORDER BY id LIMIT 20;
-- Compound cursor (sort on non-unique column)
SELECT id, title, score FROM posts
WHERE (score, id) < (:last_score, :last_id) -- row value comparison
ORDER BY score DESC, id DESC
LIMIT 20;
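The keyset loop is easy to sketch with sqlite3: each page query is a pure index seek from the last-seen ID, regardless of how deep into the result set you are (table and page size are illustrative):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE posts (id INTEGER PRIMARY KEY, title TEXT)")
conn.executemany("INSERT INTO posts (id, title) VALUES (?, ?)",
                 [(i, f"post {i}") for i in range(1, 101)])
page_size, last_id, pages = 3, 0, []
for _ in range(2):
    # Seek past the cursor instead of OFFSET-scanning from the start
    page = conn.execute(
        "SELECT id FROM posts WHERE id > ? ORDER BY id LIMIT ?",
        (last_id, page_size)).fetchall()
    pages.append([row[0] for row in page])
    last_id = page[-1][0]  # cursor for the next page
print(pages)  # [[1, 2, 3], [4, 5, 6]]
```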
N+1 Query Problem
-- N+1 (bad): application loops over users and issues one query per user
-- SELECT * FROM users LIMIT 100;
-- Then for each user: SELECT * FROM orders WHERE user_id = {id}
-- = 101 queries
-- Fixed: one JOIN query
SELECT u.id, u.name, o.id AS order_id, o.total
FROM users u
LEFT JOIN orders o ON o.user_id = u.id
WHERE u.id IN (/* your user IDs */)
ORDER BY u.id, o.created_at;
Other Common Gotchas
-- 1. Implicit type coercion (MySQL is especially permissive)
SELECT '5' + 3; -- MySQL: 8 (coerces string to int)
SELECT '5abc' + 3; -- MySQL: 8 (parses leading digits; warning, not error!)
SELECT '5abc' + 3; -- Postgres: error ("invalid input syntax for type integer")
-- Note: Postgres accepts '5' + 3 because a bare quoted literal is initially untyped;
-- adding a TEXT column to a number requires an explicit cast
-- 2. Division truncation with integers
SELECT 7 / 2; -- Postgres/SQLite/SQL Server: 3 (integer division); MySQL: 3.5000
SELECT 7.0 / 2; -- returns 3.5
SELECT 7 / 2.0; -- returns 3.5
SELECT 7::FLOAT / 2; -- Postgres: returns 3.5
-- 3. Trailing spaces in CHAR(n)
-- 'abc' and 'abc ' are equal in CHAR comparisons in most databases
-- 4. Sequence gaps are normal
-- SERIAL/IDENTITY sequences skip numbers on rollbacks or failed inserts
-- Never assume IDs are contiguous
-- 5. Time zone traps
-- TIMESTAMP stores no timezone info — same value interpreted differently in different locales
-- TIMESTAMPTZ (Postgres) stores UTC internally, converts on display to session timezone
-- Store timestamps in UTC; convert at presentation layer
-- 6. ORDER BY in subqueries is not guaranteed
-- WRONG assumption: subquery preserves ORDER BY in outer query
SELECT * FROM (SELECT * FROM orders ORDER BY created_at) sub;
-- The outer query may ignore the inner ORDER BY — always add ORDER BY to the outer query
-- 7. Index on expression requires matching expression
CREATE INDEX idx_lower_email ON users(LOWER(email));
-- This uses the index:
SELECT * FROM users WHERE LOWER(email) = '[email protected]';
-- This does NOT use it (different expression):
SELECT * FROM users WHERE email = '[email protected]';
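The integer-division gotcha (#2 above) is trivial to verify in SQLite, which follows the Postgres behavior:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
intdiv = conn.execute("SELECT 7 / 2").fetchone()[0]    # both operands integer
realdiv = conn.execute("SELECT 7.0 / 2").fetchone()[0]  # one operand real
print(intdiv, realdiv)  # 3 3.5
```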
Index Bloat and Maintenance
-- Postgres: dead tuples accumulate until VACUUM clears them
-- Check table bloat
SELECT relname, n_dead_tup, n_live_tup,
ROUND(100.0 * n_dead_tup / NULLIF(n_live_tup + n_dead_tup, 0), 1) AS dead_pct
FROM pg_stat_user_tables
ORDER BY n_dead_tup DESC;
-- Manual vacuum (autovacuum normally handles this)
VACUUM ANALYZE orders;
-- VACUUM FULL: rewrites table, reclaims disk space, requires exclusive lock
-- Prefer pg_repack or CLUSTER for zero-downtime alternatives
VACUUM FULL orders;
-- Rebuild a bloated index
REINDEX INDEX CONCURRENTLY idx_orders_user_id; -- Postgres 12+
-- Check index bloat
SELECT indexrelname,
pg_size_pretty(pg_relation_size(indexrelid)) AS index_size
FROM pg_stat_user_indexes
ORDER BY pg_relation_size(indexrelid) DESC;
Character Set and Collation
-- Collation affects: sorting, comparison, case folding, uniqueness
-- Postgres: set per column or expression
SELECT * FROM products ORDER BY name COLLATE "en_US";
CREATE TABLE items (name TEXT COLLATE "en_US");
-- Case-insensitive unique constraint (Postgres)
CREATE UNIQUE INDEX idx_users_email_ci ON users(LOWER(email));
-- MySQL: check collation
SHOW CREATE TABLE users; -- shows column collations
-- utf8mb4_unicode_ci: case-insensitive, accent-insensitive
-- utf8mb4_bin: case-sensitive, binary comparison
-- Mixing collations in JOINs causes errors or falls back to table scan
-- Ensure joined columns use the same collation
MySQL before 5.7.5 shipped with ONLY_FULL_GROUP_BY disabled by default, which silently allows non-deterministic GROUP BY results; some hosted configurations still disable it. Verify it is enabled in production. Postgres always enforces standard GROUP BY semantics.