What is the difference between INNER JOIN and LEFT JOIN?

INNER JOIN returns only rows where the join condition matches on both sides — unmatched rows from either side are dropped. LEFT JOIN keeps every row from the left table whether or not it matches; columns from the right table are filled with NULL for unmatched left rows. The choice depends on intent: use INNER when you only want matched data, LEFT when you want a complete list of left-side records and any matching right-side data.

Why does my LEFT JOIN behave like an INNER JOIN?

Because you filtered the right table in WHERE instead of in the ON clause. WHERE runs after the join completes, so any predicate on a right-table column rejects the NULL-padded unmatched rows that LEFT JOIN intentionally kept. Move the right-table filter into the ON clause: LEFT JOIN orders o ON o.customer_id = c.id AND o.status = 'paid'. This is the single most common SQL bug in production analytics code.

What is an anti-join?

An anti-join finds rows in one table that have no match in another. The pattern: LEFT JOIN the second table, then filter where the right side is NULL. Example: customers who never ordered = customers LEFT JOIN orders ON ... WHERE orders.id IS NULL. Anti-joins are also expressible with NOT EXISTS, which is generally faster on large tables and safer if the join column has NULLs.

Should I use SELECT * with JOINs?

No. SELECT * with multi-table joins returns duplicate columns when both tables have a column with the same name (almost always 'id' or 'created_at'), and your downstream code may break invisibly when an upstream column gets renamed or dropped. Always name columns explicitly and prefix them with the table alias — c.name, o.amount — so the intent is unambiguous.

Is RIGHT JOIN ever the right choice?

Almost never in modern code. Most teams convert RIGHT JOINs to LEFT JOINs by reversing the table order, which keeps the dominant table on the left consistently. RIGHT JOIN is logically valid but harder to read in a chain of joins. The exception: when refactoring an existing query where reversing the order would touch many lines, leave the RIGHT JOIN and add a comment.

How do I handle the case where my JOIN multiplies rows?

Row multiplication is real, not a bug. If a customer has 3 orders, joining customers to orders produces 3 rows for that customer. The fix is not to slap DISTINCT on the result — that hides the cardinality without explaining it. Instead: trace which side is the 'one' and which is the 'many', then decide whether you actually want one row per order (so the multiplication is correct) or one row per customer with an aggregated order count (so you should GROUP BY customer or use a subquery to pre-aggregate).

SQL JOINs Explained — Every Type, With Real Examples (2026)

Published May 2026 · 14 min read · Practice 30+ JOIN challenges →

JOINs are where most analysts freeze on the job. The first SQL course covers INNER JOIN in 10 minutes and moves on. Then you sit at work, three tables in front of you, and the chain doesn't form. This guide walks every JOIN type with runnable examples, the one bug that defeats most LEFT JOINs in production, and the patterns that interview screens are actually looking for.

Contents

Why JOINs trip people up
The sample data we'll use
INNER JOIN — only matches
LEFT JOIN — keep all the left
RIGHT and FULL OUTER
CROSS JOIN — every combination
SELF JOIN — table joined to itself
The WHERE vs ON bug (critical)
Anti-joins — finding "X without Y"
Multi-table chains
Performance and ordering
Five common JOIN mistakes
Practice with real challenges

Why JOINs trip people up

Real databases store data across multiple tables on purpose. Customers in one table, orders in another, products in a third. JOIN is how you reassemble the picture. The mental hurdle for most learners isn't the syntax — it's the cardinality math. When you join customers to orders, you don't get one row per customer. You get one row per order, with the customer's columns repeated. Skip past that conceptual flip and the rest is mechanical.

JOINs also have an outsized effect on interview screens. Roughly 50–60% of mid-level data-analyst interview questions involve at least two tables. Window functions get the spotlight, but you can't reach the window-function part of the answer if your join is wrong.

The sample data we'll use

SQL Quest's challenges run on a Titanic dataset in the browser — a real historical record of passengers, what class they travelled, what they paid, and whether they survived. We'll add a small tickets table for the JOIN examples. You can paste any of these queries into the app and see them run.

passengers

CREATE TABLE passengers (
  id     INTEGER PRIMARY KEY,
  name   TEXT,
  class  TEXT,    -- '1st', '2nd', '3rd'
  age    INTEGER,
  fare   REAL,
  survived INTEGER  -- 0 or 1
);

tickets

CREATE TABLE tickets (
  id            INTEGER PRIMARY KEY,
  passenger_id  INTEGER,
  ticket_number TEXT,
  embarked_port TEXT    -- 'S', 'C', 'Q'
);

Some passengers have a ticket record, some don't. That asymmetry is what makes the different JOIN types matter.

INNER JOIN — only matches

INNER JOIN returns rows where the join condition is true on both sides. Unmatched rows from either table are dropped. It's the default — when you write just JOIN without a qualifier, the engine treats it as INNER.

List passengers who have a ticket record

SELECT p.name, p.class, t.embarked_port
FROM passengers p
INNER JOIN tickets t ON t.passenger_id = p.id;

Two patterns to internalize from this query:

Aliases (p, t) are required practice. Without them, ambiguous column names — id, name — produce errors. With them, every column reference is unambiguous and the query reads cleanly.
The ON clause is the join condition. Logically: "match a passenger row with a ticket row where the ticket's passenger_id equals the passenger's id." Most joins use equality on a primary-key / foreign-key relationship. Non-equality joins (range joins, fuzzy joins) exist but are rare.

If a passenger has no ticket record, they don't appear in the result. If a ticket points at a passenger ID that doesn't exist, that ticket also doesn't appear. INNER returns the intersection.

LEFT JOIN — keep all the left

LEFT JOIN (also written LEFT OUTER JOIN) keeps every row from the left table, whether or not it matches the right. Unmatched rows still appear, with right-table columns filled in as NULL.

List every passenger and their ticket port if available

SELECT p.name, p.class, t.embarked_port
FROM passengers p
LEFT JOIN tickets t ON t.passenger_id = p.id;

The output looks identical to INNER JOIN for matched rows. The difference appears for passengers without a ticket: they still show up, with embarked_port as NULL.

name	class	embarked_port
Cardeza, Mr	1st	S
Ward, Miss	1st	C
Baxter, Mrs	1st	NULL
...	...	...

Use LEFT JOIN whenever you want a complete list of left-side records and any matching right-side data. Common cases: every customer with their order count (some customers have zero orders), every employee with their manager's name (the CEO has no manager), every product with its first-purchase date (some products never sold).

RIGHT and FULL OUTER

RIGHT JOIN is the mirror of LEFT — keep every row from the right table, fill the left with NULL when unmatched. In modern code, almost no one writes RIGHT JOIN. The convention is to put the dominant table first and use LEFT JOIN, which keeps the join order consistent across a multi-step query. The query above could equivalently be written as:

RIGHT JOIN equivalent — but harder to read

SELECT p.name, p.class, t.embarked_port
FROM tickets t
RIGHT JOIN passengers p ON t.passenger_id = p.id;

FULL OUTER JOIN keeps every row from both sides. Unmatched rows on either side get NULLs from the other. Useful when comparing two sources that should match (a sales table from system A vs a sales table from system B) and you want to see every discrepancy in either direction.

Note: SQLite (the engine SQL Quest uses) didn't support FULL OUTER JOIN natively until version 3.39. PostgreSQL, SQL Server, and Oracle support it directly. MySQL still lacks it — the workaround is two LEFT JOINs combined with UNION.

CROSS JOIN — every combination

CROSS JOIN produces every row of A paired with every row of B. No ON clause. The output size is rows(A) × rows(B), which can explode quickly — a 10,000 × 10,000 cross join is 100M rows. Use it carefully.

All sizes paired with all colors

SELECT sizes.size, colors.color
FROM sizes
CROSS JOIN colors;

Three legitimate use cases:

Generating combinations — every size × every color, every customer × every product (for a "did this customer ever buy this?" matrix), every salesperson × every region.
Calendar dimension joining — generate every (date, customer_id) pair, then LEFT JOIN actual sales to fill missing days with zero. This is how you build a continuous time series even when no sales happened on certain dates.
Materializing parameter grids — a hyperparameter sweep with every (learning_rate, batch_size) combo.

SELF JOIN — table joined to itself

A self-join is just a regular join where both sides happen to be the same table — different aliases, treated as separate references. Self-joins answer questions like "for every row, what's the row before it?" or "find pairs of records that share property X."

Find passengers in the same class who paid wildly different fares

SELECT a.name AS name_a, a.fare AS fare_a,
       b.name AS name_b, b.fare AS fare_b
FROM passengers a
JOIN passengers b
  ON a.class = b.class
  AND a.id < b.id            -- avoid duplicates & self-pairs
WHERE ABS(a.fare - b.fare) > 100;

The a.id < b.id trick is essential. Without it you'd see each pair twice (once as A,B and once as B,A) and every row paired with itself. The strict less-than picks one canonical ordering per pair.

Other classic self-join patterns:

Hierarchies — employees joined to themselves on manager_id = id to get the manager's name alongside each employee.
Consecutive events — sessions joined to themselves to find the gap between a user's current and previous login. (Modern SQL prefers LAG(), but self-joins still work.)
Pairs and graph edges — friend connections, follow relationships, anything where rows of the same type relate to each other.

The WHERE vs ON bug (critical)

This is the single most common SQL bug in production analytics code. It's worth reading slowly.

Suppose you want every customer plus their paid orders. Customers without paid orders should still appear (with NULL order columns). The intuitive query is:

⚠ This silently turns LEFT JOIN into INNER JOIN

SELECT c.name, o.amount
FROM customers c
LEFT JOIN orders o ON o.customer_id = c.id
WHERE o.status = 'paid';

Looks fine. Runs without error. Gives wrong results. Customers without any paid orders do not appear, even though the query is structurally a LEFT JOIN.

Why: the WHERE clause runs after the join completes. For a customer with no paid orders, the LEFT JOIN produces a row where o.status is NULL. NULL = 'paid' evaluates to unknown, which the WHERE clause treats as not-true and excludes. Every NULL-padded row gets filtered out — exactly the rows the LEFT JOIN was supposed to keep.

The fix: move the right-table predicate into the ON clause.

✅ Correct — filter inside ON, not in WHERE

SELECT c.name, o.amount
FROM customers c
LEFT JOIN orders o
  ON o.customer_id = c.id AND o.status = 'paid';

Now the join still produces a row for every customer; for customers without a paid order the amount column is NULL, and that NULL row survives because it's never tested against the failed predicate.

Rule of thumb: in a LEFT JOIN, predicates that filter the right table belong in ON. Predicates that filter the left table can go in WHERE. If you're not sure which side a predicate is filtering, run the query with and without it — if rows you expected to keep disappear, it's filtering the right table and needs to move into ON.

Anti-joins — finding "X without Y"

An anti-join finds rows in one table that have no match in another. Two common patterns work; one is faster and safer.

Pattern 1: LEFT JOIN + IS NULL

Customers who never ordered

SELECT c.name
FROM customers c
LEFT JOIN orders o ON o.customer_id = c.id
WHERE o.id IS NULL;

Standard LEFT JOIN keeps every customer with order columns NULL when no match. Filtering on o.id IS NULL narrows to customers whose order side stayed NULL — i.e., the unmatched ones. Note that filtering for IS NULL on a right-table column is what you want here, unlike the bug above; you're explicitly asking for the unmatched rows.

Pattern 2: NOT EXISTS

Same result, often faster on large tables

SELECT c.name
FROM customers c
WHERE NOT EXISTS (
  SELECT 1 FROM orders o WHERE o.customer_id = c.id
);

Most query optimizers can short-circuit NOT EXISTS as soon as they find any matching row, so it tends to outperform the LEFT JOIN + IS NULL pattern when the right table is large and most customers do have orders.

Don't use NOT IN with a nullable column. WHERE id NOT IN (SELECT customer_id FROM orders) silently returns nothing if even one row in the subquery has customer_id = NULL. The reason is the same as the WHERE-vs-ON bug: NULL comparisons return unknown, the NOT IN logic chains through with AND, and the whole clause becomes unknown. NOT EXISTS doesn't have this trap.

Multi-table chains

Real questions usually need three or more tables. The mental shortcut: build the chain one join at a time, name what each join contributes, and never mix INNER and LEFT carelessly.

Total revenue per customer in the last 30 days, including customers with zero

SELECT c.name,
       COALESCE(SUM(oi.quantity * p.price), 0) AS revenue_30d
FROM customers c
LEFT JOIN orders o
  ON o.customer_id = c.id
  AND o.created_at >= date('now', '-30 days')
LEFT JOIN order_items oi ON oi.order_id = o.id
LEFT JOIN products p ON p.id = oi.product_id
GROUP BY c.id, c.name;

What's happening:

The customer table is anchor. Everything LEFT JOINs from it, so every customer is guaranteed to appear in the output.
The 30-day filter lives in ON, not WHERE. Putting it in WHERE would re-introduce the bug — customers without recent orders would vanish.
COALESCE turns NULL revenue into 0. When all the LEFT JOINs unmatched, SUM returns NULL, and your downstream code probably expects a number.

Performance and ordering

Two practical tips. They're not always universal, but they're a good default.

Indexed join keys are non-negotiable

Every column you use in ON should be indexed (or be the primary key, which is indexed automatically). Without an index, the engine falls back to a nested-loop scan, which scales as O(n × m) — fine for 1000 × 1000 (a million ops, instant), painful for 1M × 1M (a trillion ops, hours).

Filter as early as possible

The query planner usually does this automatically: predicates that don't depend on the join are pushed down to the source tables, so the join only sees rows that survive the filter. But you can help by making predicates explicit and avoiding patterns that block pushdown — like wrapping a column in a function (WHERE LOWER(name) = 'alice' can prevent index use; WHERE name = 'Alice' uses the index).

Five common JOIN mistakes

1. Slapping DISTINCT on a row-multiplied result

If your query returns more rows than expected, the fix isn't DISTINCT. The fix is identifying which join is producing the multiplication. DISTINCT scans every row to deduplicate — slow on large tables — and hides the cardinality issue without explaining it. Trace the join graph: which side is "one" and which is "many"? Is the multiplication actually correct (one row per order) or do you want one row per customer (then GROUP BY customer or pre-aggregate)?

2. Forgetting table aliases in multi-table queries

If two tables both have an id column, SELECT id is ambiguous. Always alias: SELECT c.id, o.id. Most engines reject ambiguous unqualified columns; the ones that don't pick arbitrarily, which is worse — you get a result that looks fine but is wrong.

3. JOIN on a non-unique key without realizing

If you join customers to addresses and a customer can have multiple addresses, the result has one row per (customer, address) pair, not one row per customer. Fine if that's what you wanted; not fine if you assumed customers are still uniquely represented. Check both sides' uniqueness before assuming the row count.

4. Implicit JOIN syntax (don't)

SELECT * FROM customers, orders WHERE customers.id = orders.customer_id is the old comma-separated implicit-join syntax. It works, but it's harder to read and easier to forget the join condition (which silently produces a CROSS JOIN of every customer with every order). Use explicit INNER JOIN ... ON.

5. The same WHERE-vs-ON bug, in `AND` chains

A subtle variant: LEFT JOIN orders o ON o.customer_id = c.id AND o.created_at >= '2026-01-01' AND o.status = 'paid'. This is fine — both predicates filter the right table, both are inside ON. But if you also need a left-table filter, put it in WHERE: WHERE c.country = 'US'. Mixing left- and right-table filters in ON causes the optimizer to do extra work and obscures intent.

Practice with real challenges

Reading about JOINs is the easy part. The patterns only stick once your fingers know them — which means writing 30+ queries from cold memory until the chain forms automatically. SQL Quest has a curated bank of JOIN challenges across difficulty levels:

The fastest way to remember JOINs is to use them

Every challenge runs in your browser against a real SQLite engine. When your query is wrong, the AI Coach explains the specific gap — wrong row count, missing JOIN condition, NULL filtering — instead of just "incorrect."

Start Practicing — Free →

SQL JOINs Explained — Every Type, With Real Examples (2026)

Why JOINs trip people up

The sample data we'll use

INNER JOIN — only matches

LEFT JOIN — keep all the left

RIGHT and FULL OUTER

CROSS JOIN — every combination

SELF JOIN — table joined to itself

The WHERE vs ON bug (critical)

Anti-joins — finding "X without Y"

Pattern 1: LEFT JOIN + IS NULL

Pattern 2: NOT EXISTS

Multi-table chains

Performance and ordering

Indexed join keys are non-negotiable

Filter as early as possible

Five common JOIN mistakes

1. Slapping DISTINCT on a row-multiplied result

2. Forgetting table aliases in multi-table queries

3. JOIN on a non-unique key without realizing

4. Implicit JOIN syntax (don't)

5. The same WHERE-vs-ON bug, in AND chains

Practice with real challenges

Recommended next reads & drills

The fastest way to remember JOINs is to use them

5. The same WHERE-vs-ON bug, in `AND` chains