SQL Server, from the ground up Lesson 9 / 40

JOINs: the only diagram you need

INNER, LEFT, RIGHT, FULL, CROSS. Venn-diagram heresy. Semi-joins, anti-joins, and the difference between a JOIN predicate and a WHERE filter.

If there’s one topic in SQL that separates “someone who knows a little SQL” from “someone who can actually use a relational database,” it’s joins. Joins are the reason the R in RDBMS exists. They’re how you combine Sales.Orders with Sales.Customer to see which customer placed each order, how you link Inventory.Stock to Catalog.Product to see what’s actually in the warehouse, how you tie a Support.Ticket back to the Shipping.Shipment that caused someone to email.

Runehold’s business runs on joins. Marketing wants “top-spending customers in Italy.” Operations wants “orders shipped late this week.” Finance wants “revenue per VAT rate per country last quarter.” None of those are single-table questions. All of them are joins.

The Venn-diagram heresy

You’ve seen the Venn diagrams. Two overlapping circles. “INNER JOIN is the intersection, LEFT JOIN is the left circle plus the overlap.” The diagrams are widely shared, largely wrong, and I’m going to explain the real mental model because it makes harder joins trivial later.

The truth: joins produce a row for every valid combination of rows from the input tables. Sometimes that’s the intersection of sets; sometimes it’s a multiplication. Venn diagrams imply set-based thinking when joins are actually row-combinatoric.

Keep the Venn picture as a rough mnemonic if it helps. But when something weird happens with joins, reach for the row-combinations model instead. I’ll show examples.

INNER JOIN: rows that match on both sides

INNER JOIN (or just JOIN — same thing) returns rows that have a match in both tables based on the join condition.

-- Orders with the customer's name and country
SELECT o.OrderId,
       o.Total,
       c.Name,
       c.Country
FROM Sales.Orders   AS o
INNER JOIN Sales.Customer AS c
    ON c.CustomerId = o.CustomerId;

If an order has a CustomerId but that customer doesn’t exist in Sales.Customer (rare, because of the foreign key), the order is excluded. If a customer has no orders, they’re excluded. Only pairs where both sides match show up.

ON <condition> is how you tell the engine what “match” means. It’s usually an equality between keys but can be any boolean expression. ON c.CustomerId = o.CustomerId is the standard “link on the foreign key” pattern.

LEFT JOIN: everything from the left, plus any matches from the right

Often written as LEFT OUTER JOIN, but OUTER is optional and everyone skips it.

-- Every customer, with their orders if they have any
SELECT c.CustomerId,
       c.Name,
       o.OrderId,
       o.Total
FROM Sales.Customer AS c
LEFT JOIN Sales.Orders AS o
    ON o.CustomerId = c.CustomerId;

Customers with no orders still appear in the result. The columns from o are NULL for those rows. This is crucial for questions like “which customers have never ordered?” and “which products have never sold?”:

-- Customers who have never ordered
SELECT c.CustomerId, c.Name
FROM Sales.Customer AS c
LEFT JOIN Sales.Orders AS o
    ON o.CustomerId = c.CustomerId
WHERE o.OrderId IS NULL;

The filter o.OrderId IS NULL keeps only the rows where the LEFT JOIN failed to find a match. This pattern is called an anti-join, and it’s how you answer every “X without Y” question.

RIGHT JOIN: nobody writes this

RIGHT JOIN is a mirror of LEFT JOIN — everything from the right side, plus matches from the left. It exists for completeness.

In practice, you never write RIGHT JOIN in new code. Every time you’re tempted, just flip the table order and use LEFT JOIN — way easier to read:

-- These are equivalent:
SELECT ...
FROM A LEFT JOIN B ON ...

SELECT ...
FROM B RIGHT JOIN A ON ...

Consistency wins. Stick with LEFT.

FULL OUTER JOIN: both sides, plus mismatches

Returns everything from both sides. Rows that match on both sides are combined. Rows that don’t match on one side have NULLs for the missing side.

-- Products with sales this month, plus products that exist but didn't sell,
-- plus sales that reference products we no longer carry.
SELECT p.Sku,
       p.Name,
       s.OrderId,
       s.Quantity
FROM Catalog.Product AS p
FULL OUTER JOIN Sales.OrderLine AS s
    ON s.ProductId = p.ProductId
WHERE s.OrderDate >= '2026-04-01' OR s.OrderDate IS NULL;

FULL OUTER is genuinely useful when you’re looking for “mismatches on either side.” I use it about once a month for reconciliation reports.

CROSS JOIN: everything times everything

SELECT c.Name, p.Name
FROM Sales.Customer AS c
CROSS JOIN Catalog.Product AS p;

Every customer paired with every product. If you have 50 customers and 8,000 products, you get 400,000 rows. Useful for:

  • Filling in a date dimension (every date × every country, for reporting grids with no gaps).
  • Building matrices (every employee × every required training).
  • Generating test data.

Used rarely in production queries. Used constantly in report generation.

There’s no ON clause. Any filter you want goes in WHERE.

Join predicates vs WHERE filters

Subtle and important. There’s a difference between these two:

-- A: filter in the ON clause
SELECT c.Name, o.Total
FROM Sales.Customer AS c
LEFT JOIN Sales.Orders AS o
    ON o.CustomerId = c.CustomerId
   AND o.CountryCode = 'IT';

-- B: filter in the WHERE clause
SELECT c.Name, o.Total
FROM Sales.Customer AS c
LEFT JOIN Sales.Orders AS o
    ON o.CustomerId = c.CustomerId
WHERE o.CountryCode = 'IT';

For an INNER JOIN, these are equivalent. For a LEFT JOIN — which is what’s above — they’re not.

Query A: “every customer, with their IT orders if any; customers with no IT orders still appear, with NULL columns from o.”

Query B: “every customer, with all their orders, then filter out rows where o.CountryCode isn’t 'IT'.” The WHERE o.CountryCode = 'IT' eliminates rows where o.CountryCode is NULL (from the outer-join null-fill), so customers with no IT orders disappear entirely. You’ve effectively turned the LEFT JOIN back into an INNER JOIN.

The rule: filters on the right-side table go in the ON clause of a LEFT JOIN. Filters on the left-side table go in the WHERE clause. Filters that apply to both tables or that don’t care about the join type go in WHERE.

This is the single most common join bug in real code. Get the mental model right once and it stops.

Multi-table joins

You can chain joins across many tables. Runehold’s “shipment report” query:

SELECT o.OrderId,
       c.Name                   AS CustomerName,
       c.Country                 AS CustomerCountry,
       s.TrackingNumber,
       s.ShippedAt,
       carr.Name                 AS Carrier
FROM Sales.Orders           AS o
JOIN Sales.Customer         AS c   ON c.CustomerId = o.CustomerId
JOIN Shipping.Shipment      AS s   ON s.OrderId    = o.OrderId
JOIN Shipping.Carrier       AS carr ON carr.CarrierId = s.CarrierId
WHERE o.OrderDate >= '2026-04-01'
  AND o.OrderDate  < '2026-04-08'
ORDER BY s.ShippedAt DESC;

Every new JOIN adds more filtering (for INNER) or more optionality (for OUTER). Aliases are mandatory here — without them the query is unreadable. Pick short, memorable aliases. Don’t use t1, t2, t3. Use o, c, s, carr.

Self-joins

A table joined to itself — common for hierarchy queries and “compare a row to another row of the same table” patterns.

Runehold’s employees have managers who are also employees:

SELECT e.FullName    AS Employee,
       e.Team        AS Team,
       m.FullName    AS Manager
FROM HR.Employee AS e
LEFT JOIN HR.Employee AS m
    ON m.EmployeeId = e.ManagerId;

The same table, aliased as e (employee) and m (manager), joined on the ManagerId foreign key. This is self-joining, and it’s fine. Any row-based pattern — “find customers who have another customer in the same city with higher LTV” — is a self-join.

EXISTS and NOT EXISTS: semi-joins and anti-joins

Sometimes you don’t want the joined columns, just the membership info: “does at least one order exist for this customer?”

-- Customers with at least one order
SELECT c.CustomerId, c.Name
FROM Sales.Customer AS c
WHERE EXISTS (
    SELECT 1 FROM Sales.Orders o WHERE o.CustomerId = c.CustomerId
);

-- Customers with NO orders
SELECT c.CustomerId, c.Name
FROM Sales.Customer AS c
WHERE NOT EXISTS (
    SELECT 1 FROM Sales.Orders o WHERE o.CustomerId = c.CustomerId
);

EXISTS is a semi-join: it keeps rows from the left where a match exists on the right, but doesn’t duplicate rows if multiple matches exist. Compared to JOIN:

  • JOIN can multiply rows: one customer with five orders → five rows.
  • EXISTS keeps one row per left-side match.

NOT EXISTS is the anti-join — keep rows with no match. Safer than NOT IN when NULL is in play (we covered that in lesson 7).

Rule: use EXISTS / NOT EXISTS when you only need membership, not data from the other side. Cleaner, usually faster, and not susceptible to the “accidentally multiplied my rows” bug.

Run this on your own machine

USE Runehold;
GO

-- Extend our data with a fresh Customer having zero orders
INSERT INTO Sales.Customer (Name, Country)
VALUES ('Zero-order Zoe', 'ES');

-- Query 1: INNER JOIN — orders + customer info, only matching rows
SELECT o.OrderId,
       c.Name,
       o.Total,
       o.CountryCode
FROM Sales.Orders  AS o
JOIN Sales.Customer AS c
    ON c.CustomerId = o.CustomerId
ORDER BY o.OrderId;

-- Query 2: LEFT JOIN — every customer, with orders or NULL
SELECT c.CustomerId,
       c.Name,
       c.Country,
       o.OrderId,
       o.Total
FROM Sales.Customer AS c
LEFT JOIN Sales.Orders AS o
    ON o.CustomerId = c.CustomerId
ORDER BY c.CustomerId;

-- Query 3: the "customers with no orders" anti-join
SELECT c.CustomerId, c.Name, c.Country
FROM Sales.Customer AS c
LEFT JOIN Sales.Orders AS o
    ON o.CustomerId = c.CustomerId
WHERE o.OrderId IS NULL;

-- Query 4: same, but with NOT EXISTS (equivalent, often clearer)
SELECT c.CustomerId, c.Name, c.Country
FROM Sales.Customer AS c
WHERE NOT EXISTS (
    SELECT 1 FROM Sales.Orders o WHERE o.CustomerId = c.CustomerId
);

-- Query 5: ON vs WHERE demo — spot the difference
-- "A" keeps customers without Italian orders, with NULLs
SELECT c.Name, o.OrderId, o.CountryCode
FROM Sales.Customer AS c
LEFT JOIN Sales.Orders AS o
    ON o.CustomerId = c.CustomerId
   AND o.CountryCode = 'IT'
ORDER BY c.CustomerId;

-- "B" effectively becomes an INNER JOIN, filter drops non-matches
SELECT c.Name, o.OrderId, o.CountryCode
FROM Sales.Customer AS c
LEFT JOIN Sales.Orders AS o
    ON o.CustomerId = c.CustomerId
WHERE o.CountryCode = 'IT'
ORDER BY c.CustomerId;

Run queries 5A and 5B. Count the rows. Convince yourself why they differ. This is the exact trap you’ll hit in real code.

Next lesson: GROUP BY, HAVING, and aggregates — building on this lesson’s joins to finally write real business queries like “total revenue per country” and “average order value per customer tier.”

Search