Beyond SELECT *: Master the SQL Queries That Land Jobs
You're in the final round. The interviewer shares their screen, opens a schema with a few familiar tables, and says, “Show me how you'd find the answer.” You know SQL, but that isn't the same as performing under pressure. In that moment, interview questions on sql queries stop being trivia and start becoming a test of how you reason through messy data, ambiguous requirements, and performance trade-offs.
That's why memorized snippets usually fail. A decent interviewer isn't only checking whether you know JOIN, GROUP BY, or ROW_NUMBER(). They're checking whether you notice duplicate rows, whether you ask how ties should be handled, whether you understand why WHERE and HAVING aren't interchangeable, and whether your query would survive outside a toy dataset.
SQL screening is still a standard part of hiring across major tech companies, and the format is fairly consistent. Phone screens often include a few conceptual questions, while on-site or take-home rounds tend to focus on several hands-on query problems. According to Coursera's overview of SQL interview formats, foundational topics like joins, filtering, aggregation, sorting, rolling averages, and year-over-year analysis are common across Facebook, Amazon, Google, Microsoft, and Apple in data and backend hiring.
This guide moves fast and stays practical. Each question includes the interviewer’s real intent, multiple valid solutions, what works better at scale, and the mistakes that sink strong candidates. If you're also evaluating your broader stack and workflow, it helps to compare database administration platforms so your practice environment resembles production reality.

This question looks basic. It isn't. Interviewers use it to see whether you understand duplicates, NULL, ranking semantics, and whether you can adapt when they change “second highest” to “Nth highest” halfway through.
A common version uses a subquery:
SELECT MAX(salary)
FROM employees
WHERE salary < (
SELECT MAX(salary)
FROM employees
);
That reads well and works if you want the second distinct salary. It also handles duplicate top salaries correctly because it asks for the highest value below the maximum.
Window functions usually make your thinking clearer:
SELECT salary
FROM (
SELECT
salary,
DENSE_RANK() OVER (ORDER BY salary DESC) AS salary_rank
FROM employees
WHERE salary IS NOT NULL
) ranked
WHERE salary_rank = 2;
DENSE_RANK() matters here. If two employees tie for the top salary, ROW_NUMBER() would assign one of them 1 and the other 2, which is wrong for “second highest salary” if the interviewer means second distinct value. DENSE_RANK() preserves the business meaning.
You can also use sorting plus offset:
SELECT DISTINCT salary
FROM employees
WHERE salary IS NOT NULL
ORDER BY salary DESC
LIMIT 1 OFFSET 1;
That’s concise, but I wouldn't lead with it in an interview. It can look like you reached for syntax before clarifying semantics.
Practical rule: Ask whether the interviewer wants the second highest row or the second distinct value. That single question often separates careful engineers from people who memorize patterns.
Two edge cases are worth naming out loud:
- ROW_NUMBER() answers a different question: which row sorts second, not which value ranks second.
- NULL: some databases sort NULL in ways that create confusing results, so filter it out explicitly.

If you want to sound senior, mention performance briefly. On large datasets, sorting is the expensive part in most formulations. The best answer isn't “window functions are always faster.” The best answer is “I’d inspect the execution plan and choose the clearest query that matches the indexing and database dialect.”
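As a sanity check, the DENSE_RANK() version can be exercised against a tiny in-memory SQLite database. This is a sketch with made-up names and salaries, and it assumes the SQLite bundled with your Python supports window functions (version 3.25+):

```python
import sqlite3

# Made-up data: a duplicated top salary, so "second highest" must mean
# the second *distinct* value, plus a NULL to show why we filter it.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employees (name TEXT, salary INTEGER)")
conn.executemany(
    "INSERT INTO employees VALUES (?, ?)",
    [("Ana", 90000), ("Ben", 90000), ("Cy", 80000), ("Di", None)],
)

# DENSE_RANK gives both 90000 rows rank 1, so rank 2 is the 80000 row.
second = conn.execute(
    """
    SELECT salary FROM (
        SELECT salary,
               DENSE_RANK() OVER (ORDER BY salary DESC) AS salary_rank
        FROM employees
        WHERE salary IS NOT NULL
    ) WHERE salary_rank = 2
    """
).fetchone()[0]
print(second)  # 80000
```

With ROW_NUMBER() instead, one of the tied 90000 rows would come back as “second highest,” which is exactly the tie-handling question worth asking the interviewer.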
Three-table joins expose weak habits fast. The interviewer isn't just checking whether you know INNER JOIN and LEFT JOIN. They're checking whether you understand row multiplication, filter placement, and what happens when one related table has missing records.
Start with a plain example:
SELECT
e.name,
d.department_name
FROM employees e
INNER JOIN departments d
ON e.dept_id = d.id
WHERE d.budget > 50000;
That works when you only want employees who belong to departments that meet the filter. But many candidates lose points when they add optional relationships and accidentally convert an outer join into an inner join.
Consider this version:
SELECT
e.name,
p.project_name
FROM employees e
LEFT JOIN projects p
ON e.id = p.employee_id
WHERE p.status = 'Active';
That WHERE clause removes rows where p is NULL, so your LEFT JOIN no longer behaves like one. If the requirement is “show all employees and their active project if they have one,” move the filter into the join condition:
SELECT
e.name,
p.project_name
FROM employees e
LEFT JOIN projects p
ON e.id = p.employee_id
AND p.status = 'Active';
That’s one of the most common interview traps in all interview questions on sql queries.
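The difference between filtering in WHERE and filtering in the join condition is easy to demonstrate end to end. This is a minimal sketch with invented tables and two employees, only one of whom has an active project:

```python
import sqlite3

# Hypothetical data: Ana has an active project, Ben's project is closed.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE employees (id INTEGER, name TEXT);
CREATE TABLE projects (employee_id INTEGER, project_name TEXT, status TEXT);
INSERT INTO employees VALUES (1, 'Ana'), (2, 'Ben');
INSERT INTO projects VALUES (1, 'Apollo', 'Active'), (2, 'Borealis', 'Closed');
""")

# Filter in WHERE: NULL-extended rows are removed, so the LEFT JOIN
# silently behaves like an inner join and Ben disappears.
where_rows = conn.execute("""
    SELECT e.name, p.project_name
    FROM employees e LEFT JOIN projects p ON e.id = p.employee_id
    WHERE p.status = 'Active'
""").fetchall()

# Filter in ON: every employee survives; Ben gets NULL for the project.
on_rows = conn.execute("""
    SELECT e.name, p.project_name
    FROM employees e LEFT JOIN projects p
      ON e.id = p.employee_id AND p.status = 'Active'
""").fetchall()

print(len(where_rows), len(on_rows))  # 1 2
```

If the requirement says “all employees,” the second shape is the only correct one.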
When several tables are involved, name the relationship before writing code. Example: employees belong to departments, employees may have zero or more projects. Then write the join in that order.
SELECT
e.name,
d.name AS department_name,
p.name AS project_name
FROM employees e
INNER JOIN departments d
ON e.dept_id = d.id
LEFT JOIN projects p
ON e.id = p.employee_id;
Use aliases, but keep them readable. e, d, and p are fine because the table names are obvious.
If your result count suddenly explodes after adding a join, assume row multiplication first and syntax problems second.
For production-scale thinking, mention selectivity. On large tables, early filtering on selective predicates usually helps. So does avoiding SELECT *, especially when a join already produces wide rows. If the interviewer asks about realism, note that data engineering roles often emphasize SQL more heavily than general backend roles because the work depends on reliable joins and complete datasets, including cases like zero-interaction months in time-series reporting as discussed in the EngageSmart-style SQL interview breakdown on DataLemur.
A common interview scenario goes like this. You get asked for “departments with more than five employees,” and the interviewer watches for one decision first. Do you know which filters belong before aggregation and which belong after it?
A clean answer starts here:
SELECT
department,
COUNT(*) AS employee_count
FROM employees
GROUP BY department
HAVING COUNT(*) > 5;
This query answers a group-level question. The interviewer is not only checking whether you remember HAVING. They want to see whether you understand SQL’s execution model well enough to avoid writing reporting queries that look right but answer the wrong question.
The practical rule is simple. WHERE filters individual rows before groups are formed. HAVING filters groups after aggregate values exist.
That distinction matters in real systems.
SELECT
project_id,
COUNT(*) AS team_size,
AVG(salary) AS avg_salary
FROM employees
WHERE salary IS NOT NULL
GROUP BY project_id
HAVING AVG(salary) > 60000;
Here, WHERE salary IS NOT NULL fixes input quality before aggregation. HAVING AVG(salary) > 60000 applies a business rule to the finished groups. Candidates who can explain that difference usually write more reliable analytics SQL under pressure.
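The execution-order point can be made concrete with a small in-memory example. The data is invented; an ORDER BY is added only so the result order is deterministic:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE employees (project_id INTEGER, salary INTEGER);
INSERT INTO employees VALUES
 (1, 70000), (1, 80000),   -- avg 75000: passes HAVING
 (2, 50000), (2, 55000),   -- avg 52500: filtered out by HAVING
 (3, 90000), (3, NULL);    -- NULL row removed by WHERE before grouping
""")
rows = conn.execute("""
    SELECT project_id, COUNT(*) AS team_size, AVG(salary) AS avg_salary
    FROM employees
    WHERE salary IS NOT NULL          -- row-level filter, before grouping
    GROUP BY project_id
    HAVING AVG(salary) > 60000        -- group-level filter, after aggregation
    ORDER BY project_id
""").fetchall()
print(rows)  # [(1, 2, 75000.0), (3, 1, 90000.0)]
```

Note that project 3 survives with a team_size of 1: the NULL row was removed before the group was ever formed, which is exactly the WHERE-before-HAVING distinction.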
Interviewers also use this topic to test whether you can spot hidden correctness issues:
- Every non-aggregated column in the SELECT list must appear in the GROUP BY, unless the database permits nonstandard behavior.
- COUNT(*) counts rows. COUNT(column) counts only non-NULL values in that column.

Date-based aggregation is where many answers get sloppy:
SELECT
DATE(hire_date) AS hire_day,
COUNT(*) AS new_hires
FROM employees
GROUP BY DATE(hire_date)
ORDER BY new_hires DESC;
This works, but a strong interview answer goes one step further and explains the trade-off. Applying DATE() in both SELECT and GROUP BY is readable, but on large tables it can force the database to compute that expression for many rows. In production, I would check whether the engine can still optimize it well, whether a generated column or date dimension would help, and whether the reporting grain belongs at the day level.
That kind of thinking matters beyond syntax. Aggregated query results often feed charts, dashboards, and executive summaries, so the grouping logic needs to match the visual story. Teams that publish reporting should also follow data visualization best practices for aggregated metrics so a correct query does not turn into a misleading chart.
A strong candidate treats GROUP BY questions as more than “write the query.” Explain what the business question is, decide whether the filter is row-level or group-level, and call out the performance cost of grouping expressions, high-cardinality columns, and unnecessary scans. That turns a basic SQL question into evidence that you can build reporting queries people can trust.

Interview performance often separates competent SQL users from people who think analytically about data. Window functions let you calculate across related rows without collapsing them into one row per group. That’s a huge mental shift.
A ranking example is the standard entry point:
SELECT
department,
name,
salary,
RANK() OVER (
PARTITION BY department
ORDER BY salary DESC
) AS dept_rank
FROM employees;
The interviewer isn't only checking syntax. They're checking whether you know why RANK() and DENSE_RANK() produce different outputs when ties exist.
A running total stays readable with a window function:
SELECT
sale_date,
revenue,
SUM(revenue) OVER (
ORDER BY sale_date
) AS cumulative_revenue
FROM sales;
And comparing each row to the previous one becomes much cleaner with LAG():
SELECT
sale_date,
revenue,
LAG(revenue) OVER (ORDER BY sale_date) AS prev_revenue,
revenue - LAG(revenue) OVER (ORDER BY sale_date) AS revenue_change
FROM sales;
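Both analytical patterns can be combined and verified in one pass. This sketch uses three invented daily revenue figures and assumes window-function support in the bundled SQLite (3.25+):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (sale_date TEXT, revenue INTEGER)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?)",
    [("2024-01-01", 100), ("2024-01-02", 150), ("2024-01-03", 120)],
)

rows = conn.execute("""
    SELECT sale_date,
           revenue,
           SUM(revenue) OVER (ORDER BY sale_date) AS cumulative_revenue,
           revenue - LAG(revenue) OVER (ORDER BY sale_date) AS revenue_change
    FROM sales
    ORDER BY sale_date
""").fetchall()
for r in rows:
    print(r)
```

The first row's revenue_change is NULL because LAG() has no previous row to look at, which is itself a useful talking point: window functions make missing-neighbor cases explicit instead of hiding them in join logic.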
This kind of pattern shows up in real reporting work. If you build dashboards or analytics outputs, the database query design affects what the chart can say. That’s the same discipline behind good data visualization best practices. Clean analytical queries produce clean visual narratives.
Window functions aren't magic. They often require sorting within each partition or across the full result set. On large datasets, that can be expensive. But they often replace more awkward self-joins or correlated subqueries, which can be even worse for clarity and execution.
Candidates who practice these patterns repeatedly tend to perform better because these aren't one-off syntax tricks. They mirror real production problems like rankings, duplicate detection, and trend analysis. DataCamp’s SQL prep coverage highlights real-world interview questions across joins, subqueries, indexing, and window-function-heavy scenarios from companies like Google, Amazon, and Oracle in its SQL interview question guide.
Subqueries are where many candidates either overcomplicate everything or refuse to use them at all. Neither extreme is good. A subquery is fine when it expresses the logic cleanly. It becomes a problem when it hides expensive repeated work.
Start with a readable IN example:
SELECT name
FROM employees
WHERE id IN (
SELECT employee_id
FROM projects
WHERE status = 'Active'
);
That’s easy to follow. If the subquery returns a large result set or if NULL behavior matters, EXISTS may be the safer expression.
A classic interview prompt asks for employees who earn more than their department average:
SELECT e.name
FROM employees e
WHERE salary > (
SELECT AVG(salary)
FROM employees
WHERE department = e.department
);
That’s a correlated subquery because the inner query depends on the current outer row. It’s elegant. It can also be expensive if the optimizer can't rewrite it efficiently.
A join or window-function alternative may be better:
SELECT name
FROM (
SELECT
name,
salary,
AVG(salary) OVER (PARTITION BY department) AS dept_avg
FROM employees
) x
WHERE salary > dept_avg;
That version often makes your intent clearer.
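A quick way to build confidence in a rewrite is to run both forms against the same data and confirm they agree. This sketch uses invented departments and salaries:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE employees (name TEXT, department TEXT, salary INTEGER);
INSERT INTO employees VALUES
 ('Ana', 'Eng', 90000), ('Ben', 'Eng', 70000),
 ('Cy', 'Ops', 60000), ('Di', 'Ops', 50000);
""")

# Correlated subquery: inner query depends on the outer row's department.
correlated = conn.execute("""
    SELECT e.name FROM employees e
    WHERE salary > (SELECT AVG(salary) FROM employees
                    WHERE department = e.department)
""").fetchall()

# Window-function rewrite: compute every department average in one pass.
windowed = conn.execute("""
    SELECT name FROM (
        SELECT name, salary,
               AVG(salary) OVER (PARTITION BY department) AS dept_avg
        FROM employees
    ) x WHERE salary > dept_avg
""").fetchall()

print(sorted(correlated), sorted(correlated) == sorted(windowed))
```

Mentioning this kind of equivalence check in an interview shows you treat rewrites as something to verify, not assume.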
Two guidelines keep subqueries defensible:
- Prefer EXISTS when existence matters more than returned values.
- Watch the interaction between IN and NULL: a single NULL in the subquery's result can make NOT IN return no rows at all.

A good SQL answer isn't the one with the fewest lines. It's the one whose semantics are obvious and whose execution characteristics you can defend.
A lot of interviewers also use subquery prompts to see whether you understand nested logic placement: subqueries in WHERE, in FROM, and inside SELECT. That's useful because real systems mix all three. If you can explain why your chosen form is easier to maintain, you're already answering at a more senior level.
Candidates often answer this one too fast: “UNION removes duplicates, UNION ALL doesn't.” True, but incomplete. A deeper test is whether you understand when deduplication is necessary and what it costs.
A basic example:
SELECT name, 'Employee' AS person_type
FROM employees
UNION
SELECT name, 'Contractor' AS person_type
FROM contractors;
This combines rows and removes duplicates across the final result. That duplicate elimination usually requires extra work, typically sorting or hashing depending on the engine.
UNION ALL is usually the default starting point. If duplicates are valid or expected, use UNION ALL:
SELECT revenue
FROM sales_2023
UNION ALL
SELECT revenue
FROM sales_2024;
That avoids unnecessary deduplication. In interviews, say this clearly: choose UNION ALL unless the business rule requires duplicate removal. That's a practical default.
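The cost difference is visible even in a toy example. This sketch uses invented yearly revenue tables with one overlapping value:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE sales_2023 (revenue INTEGER);
CREATE TABLE sales_2024 (revenue INTEGER);
INSERT INTO sales_2023 VALUES (100), (200);
INSERT INTO sales_2024 VALUES (200), (300);
""")

union_rows = conn.execute(
    "SELECT revenue FROM sales_2023 UNION SELECT revenue FROM sales_2024"
).fetchall()
union_all_rows = conn.execute(
    "SELECT revenue FROM sales_2023 UNION ALL SELECT revenue FROM sales_2024"
).fetchall()

# UNION deduplicated the shared 200; UNION ALL kept all four rows.
print(len(union_rows), len(union_all_rows))  # 3 4
```

If those 200s were two genuinely separate sales, UNION just silently dropped revenue, which is why the business rule, not habit, should decide between the two.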
A common reporting pattern:
SELECT department, COUNT(*) AS row_count
FROM employees
GROUP BY department
UNION ALL
SELECT 'Total', COUNT(*)
FROM employees;
This is readable and often easier than forcing the total row into the same aggregation logic with more complex grouping syntax.
Two details trip candidates up:
- ORDER BY placement: order the final combined set, not each branch individually unless wrapped intentionally.
- Reaching for UNION out of habit: that creates hidden work.

When candidates know both operators but never mention cost, they sound theoretical. When they say, “I’d start with UNION ALL and only pay for deduplication if the output contract requires it,” they sound like someone who has handled real data volumes.
This question reveals whether you can encode business rules inside SQL without turning the query into unreadable sludge. CASE is powerful, but many candidates either misuse it in WHERE clauses or bury logic to a point where it becomes unmaintainable.
A straightforward classification query:
SELECT
name,
salary,
CASE
WHEN salary > 100000 THEN 'Senior'
WHEN salary > 70000 THEN 'Mid'
ELSE 'Junior'
END AS level
FROM employees;
This is fine because the rule is visible. The interviewer will often push further and ask for conditions that vary by department, tenure, or missing values.
A better showcase is using CASE inside aggregates:
SELECT
department,
COUNT(CASE WHEN experience > 5 THEN 1 END) AS senior_count,
COUNT(*) AS total_count
FROM employees
GROUP BY department;
That pattern appears constantly in analytics work because it lets you calculate multiple categorized metrics in one pass.
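The trick behind that pattern is worth spelling out: a CASE with no ELSE yields NULL for non-matching rows, and COUNT(expr) ignores NULL. A small sketch with invented experience data makes it verifiable:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE employees (department TEXT, experience INTEGER);
INSERT INTO employees VALUES
 ('Eng', 8), ('Eng', 3), ('Eng', 6), ('Ops', 2);
""")

# CASE without ELSE returns NULL for non-seniors, and COUNT skips NULLs,
# so senior_count only counts rows where experience > 5.
rows = conn.execute("""
    SELECT department,
           COUNT(CASE WHEN experience > 5 THEN 1 END) AS senior_count,
           COUNT(*) AS total_count
    FROM employees
    GROUP BY department
    ORDER BY department
""").fetchall()
print(rows)  # [('Eng', 2, 3), ('Ops', 0, 1)]
```

Notice Ops reports 0 rather than disappearing, which is exactly why conditional aggregation beats separate filtered queries for side-by-side metrics.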
Handling missing values also matters:
SELECT
name,
COALESCE(phone, email, 'No contact info') AS contact
FROM customers;
COALESCE is simple, readable, and usually preferable to a longer CASE for fallback logic.
Two habits keep CASE logic maintainable:
- Include an ELSE so unexpected values don't vanish into NULL.
- Don't pile every business rule into CASE unless the query belongs in the database layer.

A senior answer acknowledges readability as a technical constraint. If someone else can't debug the rule under pressure, the query isn't production-grade, even if it returns the right rows.
Many SQL mistakes aren't syntax mistakes. They're NULL mistakes. Interviewers know that. That's why they keep asking about missing managers, absent commission values, optional foreign keys, and incomplete records.
A classic example:
SELECT name
FROM employees
WHERE manager_id IS NULL;
That’s basic, but it matters because = NULL doesn't work the way many people still expect. SQL uses three-valued logic, and NULL means unknown, not zero, empty string, or false.
NULL changes aggregates and filters. These two counts do different things:
SELECT
COUNT(*) AS total_rows,
COUNT(commission) AS rows_with_commission
FROM sales;
COUNT(*) includes every row. COUNT(commission) ignores NULL. That distinction affects reporting, compensation dashboards, and data quality checks.
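The gap between the two counts is easy to demonstrate. This sketch uses three invented sales rows, one with a missing commission:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (id INTEGER, commission REAL)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?)",
    [(1, 50.0), (2, None), (3, 75.0)],
)

# COUNT(*) counts rows; COUNT(commission) skips the NULL.
total, with_commission = conn.execute(
    "SELECT COUNT(*), COUNT(commission) FROM sales"
).fetchone()
print(total, with_commission)  # 3 2
```

The difference between the two numbers is itself a data quality signal: total - with_commission is exactly how many rows are missing the value.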
Fallback values are often necessary:
SELECT
name,
CASE
WHEN bonus_amount IS NULL THEN 0
ELSE bonus_amount
END AS bonus_amount
FROM employees;
Or more compactly:
SELECT
name,
COALESCE(bonus_amount, 0) AS bonus_amount
FROM employees;
NULL doesn't behave like an ordinary value. Every comparison rule you rely on should be rechecked when missing data enters the picture.
A few problem areas show up repeatedly:
- Filters: price > 100 silently excludes rows where price is NULL.
- Sorting and grouping: the engine decides where to place NULL values unless you specify it.

The strongest answers treat NULL as a data contract issue, not just a syntax quirk. If a column can be missing, say what that absence means. Unknown? Not applicable? Not yet populated? That interpretation affects the query every bit as much as the schema does.
You write a query in two minutes. Production spends the next six months paying for it.
That is why interviewers ask optimization questions. They are not checking whether you can recite CREATE INDEX. They want to see whether you understand how the optimizer reads a query, where indexes help, and when a rewrite changes readability more than runtime.
A basic index example is easy:
CREATE INDEX idx_employee_dept
ON employees(department_id);
The stronger answer connects that index to a specific access pattern. If queries filter by department_id, join on it, or group related rows through that key, the index may cut I/O. If the table is small, or the predicate is not selective, the optimizer may still choose a scan. That trade-off matters more than naming index types from memory.
Start with a query that often appears in interviews:
SELECT *
FROM orders
WHERE customer_id IN (
SELECT id
FROM customers
WHERE country = 'USA'
)
AND order_date > '2024-01-01';
A common rewrite is:
SELECT o.*
FROM orders o
INNER JOIN customers c
ON o.customer_id = c.id
WHERE c.country = 'USA'
AND o.order_date > '2024-01-01';
The rewrite can make intent clearer, but clarity is not the same as speed. Many modern optimizers transform IN, EXISTS, and joins into similar plans. The right answer in an interview is, "I would check the execution plan before claiming the join is faster." That shows judgment.
Then get specific about what you would inspect. Are both tables using index seeks or falling back to scans? Is the join cardinality estimate reasonable? Is order_date filtered early, or is the engine touching far more rows than necessary? Those are the questions that separate query tuning from query styling.
Composite indexes come up for the same reason:
CREATE INDEX idx_sales_date_product
ON sales(sale_date, product_id);
Column order is the whole point. An index on (sale_date, product_id) helps queries that start with sale_date far more than queries that filter only on product_id. Candidates often know composite indexes exist but cannot explain left-prefix behavior, sort support, or the extra write cost each index adds on inserts and updates.
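Left-prefix behavior can be checked directly with EXPLAIN QUERY PLAN. This sketch uses SQLite for convenience (plan text and optimizer behavior differ by engine, and no ANALYZE statistics exist here, so treat it as an illustration rather than a general rule):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE sales (sale_date TEXT, product_id INTEGER, revenue INTEGER);
CREATE INDEX idx_sales_date_product ON sales(sale_date, product_id);
""")

def plan(sql):
    # EXPLAIN QUERY PLAN rows end with a text description of the access path.
    return " ".join(row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + sql))

# Leading column: the optimizer can seek on the composite index.
p1 = plan("SELECT revenue FROM sales WHERE sale_date = '2024-01-01'")
print(p1)

# Trailing column alone: no left prefix, so the engine falls back to a scan.
p2 = plan("SELECT revenue FROM sales WHERE product_id = 7")
print(p2)
```

Being able to narrate output like this, rather than reciting “composite indexes exist,” is what the interviewer is actually probing for.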
A practical optimization answer usually includes these points:
- Read the execution plan before and after any change, rather than assuming a rewrite is faster.
- Put indexes where the most selective predicates and join keys actually are.
- Paginate carefully: large OFFSET values often force the database to skip many rows first.

Interviewers also like to probe mistakes. SELECT * can block covering-index opportunities. Wrapping order_date in DATE(order_date) can turn an index-friendly predicate into a scan. Adding five overlapping indexes can improve one benchmark and slow every write path in the system. Good candidates mention those costs without being prompted.
If you work across data stores, it also helps to compare indexing strategy with document databases. Teams hiring for broader backend roles may pair SQL tuning questions with MongoDB interview questions for backend and data roles to see whether you can reason about query patterns across different engines.
For more hands-on examples of plan analysis and tuning workflow, Pratt Solutions technical consulting for SQL is a useful reference.

Recursive SQL questions don't show up in every interview, but when they do, they expose whether you understand iterative thinking inside a declarative language. Organizational charts, category trees, folder structures, and dependency graphs all lead here.
A standard recursive CTE for an employee hierarchy looks like this:
WITH RECURSIVE emp_hierarchy AS (
SELECT
id,
name,
manager_id,
0 AS level
FROM employees
WHERE manager_id IS NULL
UNION ALL
SELECT
e.id,
e.name,
e.manager_id,
eh.level + 1
FROM employees e
INNER JOIN emp_hierarchy eh
ON e.manager_id = eh.id
)
SELECT *
FROM emp_hierarchy;
The anchor query finds root rows. The recursive member walks downward. If you can't explain those two roles clearly, the syntax won't save you.
Recursive queries need a natural stopping point. Usually that's when no more child rows match. But interviewers may ask about cycles, maximum depth, or accidental infinite recursion.
That’s where the stronger answer goes beyond the first working query. Mention depth tracking, cycle detection if the dialect supports it, and careful testing on small samples before running against production-scale trees.
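The anchor-plus-recursive-member structure can be walked end to end on a tiny tree. This sketch uses a three-level invented org chart and assumes recursive CTE support in the bundled SQLite (3.8.3+):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE employees (id INTEGER, name TEXT, manager_id INTEGER);
INSERT INTO employees VALUES
 (1, 'CEO', NULL), (2, 'VP', 1), (3, 'IC', 2);
""")

rows = conn.execute("""
    WITH RECURSIVE emp_hierarchy AS (
        SELECT id, name, manager_id, 0 AS level
        FROM employees
        WHERE manager_id IS NULL                       -- anchor: root rows
        UNION ALL
        SELECT e.id, e.name, e.manager_id, eh.level + 1
        FROM employees e
        JOIN emp_hierarchy eh ON e.manager_id = eh.id  -- recursive member
    )
    SELECT name, level FROM emp_hierarchy ORDER BY level
""").fetchall()
print(rows)  # [('CEO', 0), ('VP', 1), ('IC', 2)]
```

The level column doubles as a depth guard: adding AND eh.level < 10 to the recursive member is a cheap defense against accidental cycles in dirty data.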
A path-building variation is also useful:
WITH RECURSIVE paths AS (
SELECT
id,
parent_id,
CAST(id AS VARCHAR(255)) AS path
FROM nodes
WHERE parent_id IS NULL
UNION ALL
SELECT
n.id,
n.parent_id,
CONCAT(p.path, ',', n.id)
FROM nodes n
INNER JOIN paths p
ON n.parent_id = p.id
)
SELECT *
FROM paths;
Recursive SQL forces you to think about data shape, not just syntax. Trees are common in relational systems, and sometimes the better design isn't recursive SQL at all. Materialized paths or another model may be easier to query depending on the workload.
That trade-off becomes even more interesting if your team works across non-relational stores too. If you want to compare how hierarchy questions shift outside SQL, the MongoDB interview questions guide is a useful contrast because document databases reward different modeling choices.
| Topic | Implementation complexity | Resource requirements | Expected outcomes | Ideal use cases | Key advantages |
|---|---|---|---|---|---|
| Finding the Second Highest Salary | Low–Medium (subquery or window functions) | Low; requires window-function support for advanced solutions | Retrieve correct 2nd-highest salary handling ties/NULLs | Interview questions, payroll queries, simple reports | Tests core SQL, aggregates, and edge-case handling |
| Joining Multiple Tables and Filtering Results | Medium | Medium; benefits from proper indexes and join-friendly schema | Combined records across tables with accurate filters | Normalized schemas, API backends, multi-table reports | Validates relational design and JOIN logic |
| Aggregating Data with GROUP BY and HAVING | Medium | Medium to high for large groups; indexing on group columns helps | Group-level summaries and filtered aggregates | Business reports, dashboards, analytics queries | Essential for summarization and reporting |
| Window Functions and Analytics Queries | High | Medium–High; DB must support window functions; can be memory‑intensive | Rankings, running totals, row-wise analytics without collapsing rows | Time-series analysis, advanced analytics, leaderboards | Powerful, concise analytics; often more performant than subqueries |
| Subqueries and Correlated Subqueries | Medium–High | Medium–High; correlated subqueries can be costly per row | Row-relative comparisons and nested filtering | Complex conditional logic, correlated calculations | Expressive for problems hard to model with JOINs |
| UNION and UNION ALL Operations | Low | Low–Medium; UNION adds dedup overhead | Combined result sets across queries, optionally deduplicated | Data consolidation, cross-source reporting, simple unions | Simple, readable set operations for consolidating results |
| Complex Filtering with CASE and CONDITIONAL Logic | Medium | Low–Medium; CPU cost depends on logic complexity | Categorized or transformed columns based on business rules | Embedding business rules, conditional reporting | Keeps logic in SQL, reduces app-side post-processing |
| Handling NULLs and Data Quality Issues | Low–Medium | Low; requires awareness of DB NULL semantics | Robust queries that correctly handle missing data | Real-world datasets, ETL, data validation | Prevents subtle bugs; improves data correctness |
| Query Optimization and Indexing Strategies | High | High; requires indexes, monitoring tools, realistic data volumes | Faster query execution and reduced DB load | Scaling systems, high-volume applications, performance tuning | Dramatically improves performance and scalability |
| Recursive Queries and Hierarchical Data | High | Medium–High; needs CTE/recursive support and careful resource limits | Hierarchy traversal, path generation, depth calculations | Org charts, category trees, dependency resolution | Elegant hierarchical queries; clearer than self-joins |
Mastering interview questions on sql queries isn't just about surviving a technical screen. It's about proving that you can reason about data under constraints, choose the right pattern for the job, and explain your trade-offs like someone who will be trusted with production systems.
That distinction matters more than most candidates realize. Plenty of engineers can produce a query that returns the right rows on a tiny sample table. Fewer can explain why a LEFT JOIN became an accidental inner join, why DENSE_RANK() fits a salary question better than ROW_NUMBER(), why HAVING changes the meaning of a report, or why a correlated subquery may be elegant but risky at scale. Interviewers notice that difference quickly.
The pattern across these ten questions is simple. The SQL syntax is only the surface. Underneath, every prompt is testing one or more of these deeper habits:
- Correctness under imperfect data: would duplicates, ties, or NULL alter your result?
- Intent: did your aggregation answer the question posed?
- Judgment: can you explain why you chose this pattern over the alternatives?

That’s what turns a candidate into a contributor. A strong interview answer doesn't sound like a snippet repository. It sounds like someone making deliberate choices.
You also shouldn't expect real interviews to stay confined to beginner syntax. SQL hiring has broadened beyond simple query writing into execution plans, indexing strategy, and production-minded tuning. At the same time, there are still major gaps in common prep material. Distributed query optimization across sharded or multi-cloud systems remains undercovered, and security-first query design is still treated too lightly in many interview repositories, even though both are increasingly relevant in serious engineering environments as noted in Dataquest’s broader SQL interview coverage gap and in Devinterview-io’s discussion of underrepresented security concerns in SQL interviewing. If you're targeting senior roles, think beyond textbook joins and aggregations.
Practice the way you'll perform. Don't just memorize final answers. Rewrite each query three ways. Add bad data. Add duplicates. Remove related records. Change the requirement from “top salary” to “top salary per department.” Run EXPLAIN. Ask yourself what breaks first. That kind of practice builds the judgment interviewers want.
If you're looking for roles where that judgment matters, HireDevelopers.com is built for exactly that level of technical screening. The platform helps companies hire rigorously vetted engineers across data, backend, DevOps, AI, and full-stack roles, with flexible hiring models across regions and time zones. HireDevelopers.com states that candidates go through multi-stage vetting and that clients receive a customized shortlist within 24 hours, with developers averaging 7+ years of experience and coverage across 800+ validated technologies. It also describes itself as connecting the top 1% of talent and highlights meaningful cost savings versus traditional hiring for many clients, all within its publisher overview on HireDevelopers.com.
Prepare with depth, not just speed. The best SQL interview answers show that you don't just know how to query data. You know how to think like the person who will own it.