Query Order, NULLs, Joins

SQL is written top-down but logically processed in a different order.

Logical Order

Conceptually:

FROM
JOIN and ON
WHERE
GROUP BY
aggregates
HAVING
window functions
SELECT
DISTINCT
ORDER BY
LIMIT / fetch

Optimizers can reorder physical operations when semantics allow it, but this logical order explains visibility and NULL behavior.

WHERE vs HAVING

WHERE filters rows before aggregation.

SELECT customer_id, COUNT(*) AS paid_orders
FROM orders
WHERE status = 'paid'
GROUP BY customer_id;

HAVING filters groups after aggregation.

SELECT customer_id, COUNT(*) AS paid_orders
FROM orders
WHERE status = 'paid'
GROUP BY customer_id
HAVING COUNT(*) >= 3;

Three-Valued Logic

Comparisons can be TRUE, FALSE, or UNKNOWN. WHERE keeps only TRUE.

-- Does not return rows where country is NULL.
SELECT *
FROM customers
WHERE country <> 'US';

If you want non-US including missing:

SELECT *
FROM customers
WHERE country <> 'US' OR country IS NULL;

NOT IN Trap

NOT IN behaves badly when the subquery can return NULL.

-- Risky if banned_users.user_id contains NULL.
SELECT user_id
FROM customers
WHERE user_id NOT IN (SELECT user_id FROM banned_users);

Prefer NOT EXISTS.

SELECT c.user_id
FROM customers c
WHERE NOT EXISTS (
	SELECT 1
	FROM banned_users b
	WHERE b.user_id = c.user_id
);

LEFT JOIN Filter Trap

A WHERE condition on the right table can turn a left join into an inner join.

-- Drops customers with no orders.
SELECT c.customer_id, o.order_id
FROM customers c
LEFT JOIN orders o
	ON o.customer_id = c.customer_id
WHERE o.status = 'paid';

Move the condition into ON if you want to preserve all customers.

SELECT c.customer_id, o.order_id
FROM customers c
LEFT JOIN orders o
	ON o.customer_id = c.customer_id
	AND o.status = 'paid';

Semi and Anti Joins

Use EXISTS when you only need existence.

SELECT c.customer_id
FROM customers c
WHERE EXISTS (
	SELECT 1
	FROM orders o
	WHERE o.customer_id = c.customer_id
		AND o.status = 'paid'
);

This avoids row multiplication and often enables efficient semijoin plans.

Practice

1. Return customers with no paid orders.

Use an anti join.

SELECT c.customer_id
FROM customers c
WHERE NOT EXISTS (
	SELECT 1
	FROM orders o
	WHERE o.customer_id = c.customer_id
		AND o.status = 'paid'
);

This handles duplicates in orders and avoids NOT IN/NULL issues.

2. Why does COUNT(*) differ from COUNT(column)?

COUNT(*) counts rows. COUNT(column) counts non-NULL values in that column.

SELECT
	COUNT(*) AS rows,
	COUNT(country) AS rows_with_country
FROM customers;

3. Preserve all customers and count their paid orders, including zero.

Filter paid orders before or inside the join, then count right-side keys.

SELECT
	c.customer_id,
	COUNT(o.order_id) AS paid_orders
FROM customers c
LEFT JOIN orders o
	ON o.customer_id = c.customer_id
	AND o.status = 'paid'
GROUP BY c.customer_id;

4. Why is WHERE o.status = 'paid' wrong after a LEFT JOIN when counting zero-order customers?

Customers without matching orders have NULL in o.status. The WHERE predicate removes those rows, so they disappear instead of showing count zero.