Skip to content

Grouping and Windowing

Grouping collapses rows. Windowing annotates rows without collapsing them.

SELECT customer_id, COUNT(*) AS order_count
FROM orders
GROUP BY customer_id;

Output grain: one row per customer.

Every selected non-aggregate expression must be functionally determined by the group keys. If your database allows otherwise, treat it as unsafe unless you can prove determinism.

SELECT
	customer_id,
	COUNT(*) AS total_orders,
	SUM(CASE WHEN status = 'paid' THEN 1 ELSE 0 END) AS paid_orders,
	SUM(CASE WHEN status = 'paid' THEN total_amount ELSE 0 END) AS paid_revenue
FROM orders
GROUP BY customer_id;

This is portable and often clearer than multiple joins.

SELECT
	order_id,
	customer_id,
	order_ts,
	ROW_NUMBER() OVER (
		PARTITION BY customer_id
		ORDER BY order_ts DESC, order_id DESC
	) AS order_recency_rank
FROM orders;

Output grain: one row per order.

FunctionTiesGaps
ROW_NUMBERbreaks ties arbitrarily unless order is deterministicno gaps
RANKsame rank for tiesgaps
DENSE_RANKsame rank for tiesno gaps

Always include deterministic tie breakers when you need reproducible output.

SELECT
	customer_id,
	order_id,
	order_ts,
	total_amount,
	SUM(total_amount) OVER (
		PARTITION BY customer_id
		ORDER BY order_ts, order_id
		ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
	) AS lifetime_revenue_so_far
FROM orders
WHERE status = 'paid';

Use explicit window frames. Defaults vary and can surprise you, especially with duplicate order keys.

WITH ranked AS (
	SELECT
		o.*,
		ROW_NUMBER() OVER (
			PARTITION BY customer_id
			ORDER BY order_ts DESC, order_id DESC
		) AS rn
	FROM orders o
)
SELECT *
FROM ranked
WHERE rn = 1;
1. Find each customer's first paid order.
WITH ranked AS (
	SELECT
		o.*,
		ROW_NUMBER() OVER (
			PARTITION BY customer_id
			ORDER BY order_ts ASC, order_id ASC
		) AS rn
	FROM orders o
	WHERE status = 'paid'
)
SELECT customer_id, order_id, order_ts, total_amount
FROM ranked
WHERE rn = 1;

Tie breaker order_id makes the result deterministic.

2. Compute daily revenue and 7-day rolling revenue.

Date truncation syntax varies by engine; assume order_date is already present or derive it with your engine’s date function.

WITH daily AS (
	SELECT
		order_date,
		SUM(total_amount) AS revenue
	FROM orders
	WHERE status = 'paid'
	GROUP BY order_date
)
SELECT
	order_date,
	revenue,
	SUM(revenue) OVER (
		ORDER BY order_date
		ROWS BETWEEN 6 PRECEDING AND CURRENT ROW
	) AS revenue_7d
FROM daily;

This is seven rows, not necessarily seven calendar days if dates are missing. For strict calendar windows, join to a date spine first.

3. What is the difference between ROWS and RANGE window frames?

ROWS counts physical rows relative to the current row. RANGE groups peers by order-key value and uses value ranges. With duplicate timestamps or prices, RANGE can include more rows than expected. Use explicit ROWS when you need row-count behavior.

4. Top 3 products by revenue per category.
WITH product_revenue AS (
	SELECT
		p.category,
		oi.product_id,
		SUM(oi.quantity * oi.unit_price) AS revenue
	FROM order_items oi
	JOIN products p
		ON p.product_id = oi.product_id
	GROUP BY p.category, oi.product_id
),
ranked AS (
	SELECT
		*,
		DENSE_RANK() OVER (
			PARTITION BY category
			ORDER BY revenue DESC
		) AS revenue_rank
	FROM product_revenue
)
SELECT category, product_id, revenue
FROM ranked
WHERE revenue_rank <= 3;

Use DENSE_RANK if tied products should all be included. Use ROW_NUMBER if exactly three rows per category are required.