NULL Handling in SQL — The 5 Mistakes That Defeat Most Analysts
NULL is the most-misunderstood part of SQL — and the most expensive part to misunderstand. Every analyst has shipped a "wrong number" report at some point because of NULL semantics: a count that quietly excluded rows, a NOT IN that silently dropped everything, a divide-by-zero hidden behind COALESCE in the wrong place. This guide walks the five mistakes that defeat most analysts, with runnable examples and the rules that actually protect you.
Contents
- What NULL actually means
- Mistake 1:
col = NULLnever matches - Mistake 2: NOT IN with a nullable subquery
- Mistake 3: COUNT(*) vs COUNT(col)
- Mistake 4: NULL contagion in arithmetic
- Mistake 5: NULL in LEFT JOIN's WHERE
- COALESCE — the everyday workhorse
- NULLIF — converting sentinels
- Three-valued logic in one paragraph
- The NULL checklist
- Practice with real challenges
What NULL actually means
NULL is not zero, not empty string, not false. NULL means "unknown." A row's email column being NULL says "we don't have that person's email" — not "their email is empty." This sounds pedantic but it's the foundation of every NULL bug: the engine treats NULL as a third logical value, distinct from true and false, called unknown.
Three rules follow from that:
- Any comparison with NULL produces unknown. Not true, not false. The WHERE clause keeps rows only when the predicate is true, so any predicate that involves NULL silently excludes rows.
- Any arithmetic with NULL produces NULL.
5 + NULL = NULL.'hello' || NULL = NULL. NULL is contagious. - The only operators that handle NULL specially are
IS NULL,IS NOT NULL,COALESCE,NULLIF, and the rarely-usedIS DISTINCT FROM. Everything else propagates NULL or treats it as unknown.
Mistake 1: col = NULL never matches
This is the canonical NULL bug. Every SQL beginner writes it once, gets confused by an empty result, then learns the lesson the hard way.
SELECT name FROM users WHERE email = NULL;
Why: email = NULL evaluates to unknown for every row, including ones where email is genuinely NULL. Unknown is not true, so WHERE excludes them. The fix:
SELECT name FROM users WHERE email IS NULL;
SELECT name FROM users WHERE email IS NOT NULL;
The same trap applies to !=, <, >, BETWEEN, IN, and pattern-match operators like LIKE. They all propagate NULL, all silently exclude.
Spotting the bug in code review. Anywhere you see = NULL or != NULL, it's a bug. Always. Even if the column has no NULLs today, it might tomorrow — and the query will start dropping rows without changing.
Mistake 2: NOT IN with a nullable subquery
This bug is more subtle than = NULL because the query looks right. It only fails when the subquery happens to contain a NULL.
-- "Find customers who have never ordered"
SELECT name FROM customers
WHERE id NOT IN (SELECT customer_id FROM orders);
Why: NOT IN (...) logically expands to != a AND != b AND != c AND .... If any value in that list is NULL, that comparison becomes unknown. Unknown ANDed with anything stays unknown. The whole WHERE clause becomes unknown for every customer, and every row drops.
The bug is invisible until production data has its first NULL. Two safe alternatives:
SELECT name FROM customers c
WHERE NOT EXISTS (
SELECT 1 FROM orders o WHERE o.customer_id = c.id
);
SELECT name FROM customers
WHERE id NOT IN (
SELECT customer_id FROM orders WHERE customer_id IS NOT NULL
);
NOT EXISTS is generally safer and faster — most query optimizers can short-circuit NOT EXISTS as soon as they find any matching row, while NOT IN often forces a full materialization of the subquery. Default to NOT EXISTS unless you have a specific reason for NOT IN.
Mistake 3: COUNT(*) vs COUNT(col)
These look similar, behave differently, and the difference matters for correctness.
SELECT
COUNT(*) AS total_rows, -- every row, including NULL emails
COUNT(email) AS with_email, -- non-NULL emails only
COUNT(DISTINCT email) AS unique_emails
FROM users;
Three different numbers are possible. If you have 100 users, 80 with emails (one duplicate), the result is 100, 80, 79. Pick the wrong COUNT variant and you ship a wrong report.
Same principle for other aggregates:
- SUM, AVG, MIN, MAX ignore NULL.
SUM(amount)andSUM(CASE WHEN amount IS NOT NULL THEN amount ELSE 0 END)produce the same result. - AVG divides the sum of non-NULL values by the count of non-NULL values, not the total row count. So
AVG(score) WHERE score IS NULL FOR HALF THE ROWSis the average of the other half — not "half the average." - AVG of an all-NULL column returns NULL, not 0. Wrap in COALESCE if your downstream code expects a number.
Mistake 4: NULL contagion in arithmetic
Arithmetic operations involving NULL produce NULL. 5 + NULL is NULL. price * quantity with quantity NULL is NULL. String concatenation: 'hello ' || NULL is NULL in standard SQL (some engines treat it as 'hello' instead — Oracle, for example).
SELECT name, price - discount AS net_price
FROM orders;
If discount is sometimes NULL (no discount applied), price - discount returns NULL — not the original price. Fix with COALESCE:
SELECT name, price - COALESCE(discount, 0) AS net_price
FROM orders;
The same fix applies to string concatenation, division, multiplication, modulo — anywhere a NULL operand might silently turn the whole expression to NULL.
Mistake 5: NULL in LEFT JOIN's WHERE
This is the same NULL trap, hidden inside a JOIN. We covered it in detail in the JOINs guide, but it deserves a slot here because the underlying cause is purely NULL semantics.
SELECT c.name, o.amount
FROM customers c
LEFT JOIN orders o ON o.customer_id = c.id
WHERE o.status = 'paid';
For customers with no matching order, the LEFT JOIN produces a row with o.status = NULL. Then NULL = 'paid' evaluates to unknown, and WHERE drops the row. The LEFT JOIN's whole point — keeping unmatched left rows — is silently defeated.
SELECT c.name, o.amount
FROM customers c
LEFT JOIN orders o
ON o.customer_id = c.id AND o.status = 'paid';
COALESCE — the everyday workhorse
COALESCE returns the first non-NULL argument. It's the everyday tool for providing defaults, filling gaps, and stopping NULL from contaminating downstream math.
-- Default name when nickname is missing
COALESCE(nickname, first_name, 'Unknown')
-- Treat missing discount as zero
COALESCE(discount, 0)
-- Fill NULLs from a LEFT JOIN with sensible defaults
COALESCE(o.amount, 0) AS revenue,
COALESCE(o.created_at, date('now')) AS last_seen
COALESCE accepts any number of arguments. It returns NULL only if all of them are NULL. It's also smart about types — most engines promote arguments to a common type so COALESCE(int_col, 0.0) returns a real number cleanly.
COALESCE vs ISNULL. Both exist in SQL Server, only COALESCE is standard. Use COALESCE — it's portable, accepts more arguments, and handles types better. ISNULL is two-argument-only and Microsoft-specific.
NULLIF — converting sentinels
NULLIF is COALESCE's mirror. It returns NULL if the two arguments are equal, else returns the first. The classic use is avoiding divide-by-zero by turning 0 into NULL — division by NULL returns NULL, not an error.
-- Errors when total_orders is 0:
SELECT revenue / total_orders AS avg_order_value FROM ...
-- Returns NULL when total_orders is 0:
SELECT revenue / NULLIF(total_orders, 0) AS avg_order_value FROM ...
The reason this works: most engines raise a runtime error on division by zero, but division by NULL is well-defined as NULL. The pattern x / NULLIF(y, 0) turns potential errors into NULLs that downstream code can handle (often combined with COALESCE: COALESCE(x / NULLIF(y, 0), 0) for "ratio, defaulting to zero when undefined").
Other NULLIF uses:
- Sentinel value cleanup. A column stores empty strings for "no value." Replace with NULL:
NULLIF(name, ''). - Sentinel numeric values. A legacy column uses -1 for "unknown." Replace with NULL:
NULLIF(temperature, -1). - Detecting unchanged values.
NULLIF(after_value, before_value)returns NULL when nothing changed and the new value when it did — useful for change-data-capture style queries.
Three-valued logic in one paragraph
Standard programming languages have two-valued logic: true and false. SQL has three-valued logic: true, false, and unknown. NULL comparisons return unknown. The WHERE clause keeps a row only if its predicate is true; unknown rows are dropped. NOT (unknown) is unknown. true OR unknown is true (because true ORed with anything is true). false AND unknown is false (because false ANDed with anything is false). Everything else involving unknown is unknown. That's the entire system. Nearly every NULL bug traces back to one of these rules.
The NOT (unknown) trap. If you have a NOT condition wrapping a comparison that might be NULL, the result is still unknown — not true. WHERE NOT (status = 'closed') excludes rows where status is NULL, even though "NULL is not closed" feels like it should be true. Always combine with explicit IS NULL: WHERE status != 'closed' OR status IS NULL.
The NULL checklist
Before shipping any query that touches a nullable column, walk this checklist:
- Have I used
=or!=against NULL? If yes, replace withIS NULL/IS NOT NULL. - Have I used NOT IN with a subquery? If yes, switch to NOT EXISTS or filter NULL out of the subquery.
- Am I doing arithmetic on a column that might be NULL? If yes, wrap with COALESCE or be sure the NULL output is what you want.
- Am I dividing by a column that might be zero or NULL? If yes, use NULLIF on the divisor.
- Am I filtering on a right-side column in a LEFT JOIN's WHERE? If yes, move it into ON.
- Does my COUNT need every row or just non-NULL ones? COUNT(*) for every row, COUNT(col) for non-NULL.
- Does my AVG need to treat NULL as zero? If yes, use AVG(COALESCE(col, 0)) instead.
Practice with real challenges
NULL bugs are the hardest to catch in code review because the queries don't error — they silently return wrong data. The only real cure is encountering them on practice problems where the grader actually checks NULL handling.
Catch NULL bugs before production does
Every challenge runs against real data with NULL values mixed in. The diagnostic engine flags NULL handling specifically — not just "incorrect" but "you're missing the NULL rows."
Practice NULL Handling — Free →