SQL

SQL in the real world

A practitioner's PostgreSQL workshop: every query starts from a real symptom — an empty report, a slow join, a phantom index — and shows the exact query alongside its annotated psql output. Dense with real gotchas and their fixes (NOT IN/NULL, non-sargable predicates, deep OFFSET) plus the diagnostic queries against the system catalogs that DBAs genuinely keep within reach.

19 featured snippets

FIRST_VALUE / LAST_VALUE: the default-frame trapLAST_VALUE returns the current row unless you change the default frame (up to CURRENT ROW). The fix: extend the frame to UNBOUNDED FOLLOWING.
Detecting cycles in a graph: the CYCLE clauseA loop in the data (A is B's parent, B is A's parent) makes a recursive CTE run forever. The CYCLE clause (PostgreSQL 14+) stops the recursion and flags the offending rows.
Merging overlapping date rangesCollapsing overlapping periods into continuous ranges: a new group is detected when a start exceeds the running max of the previous ends, then you aggregate per group.
Indexing jsonb: GIN + the @> containment operatorA jsonb_path_ops GIN index dramatically speeds up @> containment queries over large document volumes. Verify it's used with EXPLAIN.
Finding foreign keys without an indexPostgreSQL doesn't index FKs automatically: every DELETE on the parent then triggers a seq scan of the child. This catalog query lists FKs whose first column isn't indexed anywhere.
Anti-pattern: NOT IN against NULLsA single NULL in the subquery and NOT IN silently returns zero rows (SQL three-valued logic). NOT EXISTS is NULL-safe and usually planned better.
Anti-pattern: a function on an indexed column (non-sargable)DATE_TRUNC or EXTRACT applied to the column prevents the index from being used. Rewriting the filter as a value range makes it sargable, and the B-tree index becomes usable again.
Anti-pattern: deep OFFSET — switch to keysetOFFSET 20000 reads and discards 20,000 rows on every page. Keyset (seek) pagination resumes from the last item seen: constant cost regardless of depth.
DISTINCT ON: last row per key in a single passPostgreSQL-specific: DISTINCT ON keeps the first row of each group per the ORDER BY — often faster than ROW_NUMBER with the right index.
Concurrent job queue: FOR UPDATE SKIP LOCKEDEach worker claims a batch of jobs by skipping rows already locked by the others: a reliable work queue in pure SQL, no external broker.
Conversion funnel: per-step rates in one queryCount the distinct users at each step with FILTER, then unfold the funnel into rows via LATERAL VALUES to compute both the overall rate and the step-by-step rate.
Sequences close to overflow (int4)Measures how much of each sequence is consumed relative to its max_value: an int4 sequence that overflows blocks every insert overnight.
NOT VALID constraints and disabled triggers: the dormant guardrailsAn inventory of CHECK/FK constraints created NOT VALID (never re-validated) and of disabled triggers: that many business rules the database no longer enforces.
MRR waterfall: new, expansion, contraction, churn in one reportMonthly breakdown of recurring revenue into four movements via a FULL JOIN of each customer's MRR with their previous month — every SaaS's standard report, in pure SQL.
Full RFM scoring: recency, frequency, monetary in quintilesA two-CTE pipeline: aggregate the three RFM axes per customer, score them in quintiles with NTILE, then produce readable segmentation (Champion, VIP to win back, Lost) ready for the CRM.
Fraud detection: closely spaced multi-country transactions on the same cardA LAG window per card compares each transaction to the previous one: a country change in under an hour is physically impossible — a classic signal of a compromised card.
Spotting the N+1 pattern in pg_stat_statementsQueries with negligible per-call time but called hundreds of thousands of times betray an ORM stuck in an N+1 loop: individually invisible, collectively dominant.
Physical column correlation: BRIN and CLUSTER candidatesRead pg_stats.correlation for indexed columns: close to 1, a tiny BRIN is enough; close to 0 on a frequently sorted column, a CLUSTER reorders the table and cuts I/O.
Real B-tree index bloat with pgstatindexAn exact (not estimated) measure of each index's leaf-page density: below 50% density, the index has doubled in size for nothing and deserves a REINDEX CONCURRENTLY.

← Back to the Data Lab