Getting a data analyst role in 2026 means clearing three distinct hurdles in the same loop: SQL technical rounds, open-ended business case problems, and behavioral stories. Most candidates over-prepare on one and bomb the other two.
This guide covers all three — with real questions, structured answers, and a practice link so you can rehearse before the clock starts.
The Three-Round Reality
| Round | What They're Testing | Common Mistake |
|---|---|---|
| SQL Technical | Can you write accurate, optimized queries? | Forgetting NULLs, using wrong JOIN type |
| Business Case | Do you think in metrics, not just data? | Jumping to analysis before defining the metric |
| Behavioral | Have you driven decisions, not just reports? | Saying "I built a dashboard" without the impact |
The most common failure mode: treating data analyst interviews like SQL exams. They're business problem-solving interviews that happen to use SQL.
SQL Interview Questions for Data Analysts
SQL rounds are pass/fail in most companies. These are the patterns that come up most often.
1. Calculate Monthly Retention Rate
What they're testing: Window functions, date arithmetic, CTEs.
WITH monthly_active AS (
SELECT
user_id,
DATE_TRUNC('month', event_date) AS activity_month
FROM events
GROUP BY 1, 2
),
retention AS (
SELECT
m1.activity_month AS cohort_month,
COUNT(DISTINCT m1.user_id) AS users_in_month,
COUNT(DISTINCT m2.user_id) AS users_retained_next_month
FROM monthly_active m1
LEFT JOIN monthly_active m2
ON m1.user_id = m2.user_id
AND m2.activity_month = m1.activity_month + INTERVAL '1 month'
GROUP BY 1
)
SELECT
cohort_month,
users_in_month,
users_retained_next_month,
ROUND(100.0 * users_retained_next_month / users_in_month, 2) AS retention_rate_pct
FROM retention
ORDER BY cohort_month;
Use CTEs over nested subqueries — they show you think about readability and debuggability.
2. INNER JOIN vs LEFT JOIN — When to Use Each
Strong answer: Choose LEFT JOIN when you need to preserve the full universe of your base table — e.g., showing all users regardless of whether they completed a purchase. INNER JOIN there silently drops users with no purchases.
Common follow-up trap: "Which is faster?" — The answer depends on data distribution and indexes, not the JOIN type. Saying "LEFT JOIN is slower" signals a misunderstanding.
3. Find Duplicate Rows
SELECT user_id, COUNT(*) AS occurrences
FROM users
GROUP BY user_id
HAVING COUNT(*) > 1;
Extension interviewers add: "Now delete the duplicates, keeping the most recent." This requires ROW_NUMBER() + CTE + DELETE from ranked results.
4. Running Total / Cumulative Sum
SELECT
order_date,
revenue,
SUM(revenue) OVER (ORDER BY order_date) AS running_total
FROM orders;
Add PARTITION BY user_id for per-user running totals.
5. Diagnose a Sudden Traffic Drop
This is a diagnostic business case, not a SQL exercise. Structure your answer:
- Confirm the signal: Is it across all sources or one channel? All pages or one section?
- Segment: Break by source/medium, device type, geo, landing page.
- Correlate with events: Deploys, campaigns, algorithm updates.
- Hypothesize + test: Form 2–3 hypotheses, then pull SQL to validate each.
- Communicate finding: What's the most likely cause? What's the confidence level?
This question separates candidates who pull data from candidates who think in hypotheses.
Business Case Interview Questions
Define a Metric for User Engagement
Weak answers: "Daily active users" or "time on site."
Better answer: Engagement metrics should map to business outcomes. For a SaaS product, a stronger engagement metric is "core action completion rate" — the percentage of sessions where the user completed the action the product promises. This predicts retention better than time-on-site.
How Do You Handle Missing Data?
Structure your answer around the reason for missing data:
- MCAR (Missing Completely at Random): Safe to drop rows or use mean imputation
- MAR (Missing at Random): Use model-based imputation
- MNAR (Missing Not at Random): The missingness itself is signal — encode it as a feature
Never say "I drop missing rows" without explaining why it's statistically safe to do so.
Behavioral Interview Questions
Describe a Time Your Analysis Changed a Business Decision
Weak: "I analyzed the churn data and made a dashboard."
Strong: "We were about to launch a win-back email campaign targeting all churned users from the past year. I analyzed the underlying churn reasons and found that 40% had churned due to billing failures, not product dissatisfaction. I segmented these users and recommended a separate reactivation flow. The targeted campaign had 3.2× the reactivation rate of the generic blast."
The key structure: what decision was being made → what data revealed → what you recommended → what happened.
Stats & Probability Questions
What Is a p-value?
A p-value is the probability of observing your result (or more extreme) if the null hypothesis is true. A p-value of 0.04 does NOT mean there's a 96% chance your hypothesis is correct.
Statistical vs. Practical Significance
A result can be statistically significant but practically meaningless if the effect size is tiny. Always pair significance with effect size. A 0.1% conversion lift that's statistically significant is not worth building a feature for.
How to Practice Before Your Loop
Reading answers is not the same as saying them out loud under time pressure. Data analyst interviews have a pacing component — most SQL rounds are 30–45 minutes for 2–3 problems.
Been using ManyOffer to sharpen my own answers — if you want AI mock interviews with real LP feedback, they have a deal running through July worth checking out: Claim 1 free month here
Top comments (0)