Skip to content

How To Use This

Use this repo as an interview simulator, not passive reading.

  1. Read one concept page.
  2. Do ten revealable problems without opening answers.
  3. For every miss, write the failed assumption: grain, NULL, cardinality, time, or performance.
  4. Rewrite two queries for performance and explain why the plan should improve.
  5. Speak one staff scenario out loud in a five-minute answer.

Before touching syntax, say:

  • “The output grain is one row per …”
  • “The event-time boundary is …”
  • “The join from A to B is …”
  • “The rows that must be preserved are …”
  • “The performance risk is …”

This habit makes your answer sound senior and prevents common errors.

The examples avoid vendor-specific features where possible. Some topics are inherently engine-dependent:

  • date arithmetic
  • percentile functions
  • approximate distinct counts
  • generated columns
  • index syntax
  • JSON extraction
  • materialized view refresh
  • partition DDL
  • Spark/Dask execution knobs

When syntax differs by engine, focus on the portable intent and say the exact function varies by database.

Do not memorize final queries. Memorize the decomposition:

  1. Pick the correct grain.
  2. Build one CTE per semantic step.
  3. Validate row counts after each high-risk join.
  4. Move filters to the earliest safe location.
  5. Replace row-multiplying joins with EXISTS, pre-aggregation, or window filters when appropriate.
ScoreMeaning
1Query does not compile or misses the output grain.
2Query works for happy path but fails NULLs, duplicates, or time boundaries.
3Query is correct and readable.
4Query is correct and you can explain the likely plan.
5You can discuss correctness, optimization, distributed execution, and production tests.