ManyOffer Career

Posted on May 12

Data Analyst Interview Questions 2026: SQL, Case Studies & Behavioral (With Answers)

#dataanalyst #sql #interviewprep

Getting a data analyst role in 2026 means clearing three distinct hurdles in the same loop: SQL technical rounds, open-ended business case problems, and behavioral stories. Most candidates over-prepare on one and bomb the other two.

This guide covers all three — with real questions, structured answers, and a practice link so you can rehearse before the clock starts.

The Three-Round Reality

Round	What They're Testing	Common Mistake
SQL Technical	Can you write accurate, optimized queries?	Forgetting NULLs, using the wrong JOIN type
Business Case	Do you think in metrics, not just data?	Jumping to analysis before defining the metric
Behavioral	Have you driven decisions, not just reports?	Saying "I built a dashboard" without the impact

The most common failure mode: treating data analyst interviews like SQL exams. They're business problem-solving interviews that happen to use SQL.

SQL Interview Questions for Data Analysts

SQL rounds are pass/fail at most companies. These are the patterns that come up most often.

1. Calculate Monthly Retention Rate

What they're testing: Self-joins, date arithmetic, CTEs.

WITH monthly_active AS (
  SELECT
    user_id,
    DATE_TRUNC('month', event_date) AS activity_month
  FROM events
  GROUP BY 1, 2
),
retention AS (
  SELECT
    m1.activity_month AS cohort_month,
    COUNT(DISTINCT m1.user_id) AS users_in_month,
    COUNT(DISTINCT m2.user_id) AS users_retained_next_month
  FROM monthly_active m1
  LEFT JOIN monthly_active m2
    ON m1.user_id = m2.user_id
    AND m2.activity_month = m1.activity_month + INTERVAL '1 month'
  GROUP BY 1
)
SELECT
  cohort_month,
  users_in_month,
  users_retained_next_month,
  ROUND(100.0 * users_retained_next_month / users_in_month, 2) AS retention_rate_pct
FROM retention
ORDER BY cohort_month;

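Use CTEs over nested subqueries: they show you think about readability and debuggability. Also call out that the most recent month in the output will always look under-retained, because its following month is missing or incomplete. Either say so or filter that month out.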

2. INNER JOIN vs LEFT JOIN — When to Use Each

Strong answer: Choose LEFT JOIN when you need to preserve the full universe of your base table — e.g., showing all users regardless of whether they completed a purchase. INNER JOIN there silently drops users with no purchases.

Common follow-up trap: "Which is faster?" — The answer depends on data distribution and indexes, not the JOIN type. Saying "LEFT JOIN is slower" signals a misunderstanding.
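To make the difference concrete, here is a minimal sketch. The `users(user_id)` and `purchases(user_id, amount)` tables are hypothetical, made up for illustration, and the syntax assumes PostgreSQL.

```sql
-- Assumed (hypothetical) schema: users(user_id), purchases(user_id, amount)

-- LEFT JOIN: every user appears; users with no purchases get a count of 0.
SELECT
  u.user_id,
  COUNT(p.user_id) AS purchase_count,          -- counts only matched rows
  COALESCE(SUM(p.amount), 0) AS total_spent    -- SUM over no rows is NULL
FROM users u
LEFT JOIN purchases p
  ON p.user_id = u.user_id
GROUP BY u.user_id;

-- INNER JOIN: users with no purchases disappear from the result entirely.
SELECT
  u.user_id,
  COUNT(*) AS purchase_count
FROM users u
INNER JOIN purchases p
  ON p.user_id = u.user_id
GROUP BY u.user_id;
```

A related trap worth mentioning in the interview: a filter on the right-hand table placed in WHERE (for example `WHERE p.amount > 10`) discards the NULL-extended rows and effectively turns the LEFT JOIN back into an INNER JOIN. Put that condition in the ON clause if unmatched users should survive.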

3. Find Duplicate Rows

SELECT user_id, COUNT(*) AS occurrences
FROM users
GROUP BY user_id
HAVING COUNT(*) > 1;

Interviewers often extend this: "Now delete the duplicates, keeping the most recent." That requires ranking the rows within each user_id using ROW_NUMBER() in a CTE, then deleting every row that is not ranked first.
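One way to sketch the delete-duplicates extension, assuming PostgreSQL and two columns the query above does not show: a unique `id` primary key and an `updated_at` timestamp that defines "most recent". Both column names are hypothetical.

```sql
-- Assumed (hypothetical) columns: users(id PRIMARY KEY, user_id, updated_at)
WITH ranked AS (
  SELECT
    id,
    ROW_NUMBER() OVER (
      PARTITION BY user_id
      ORDER BY updated_at DESC, id DESC   -- id breaks ties deterministically
    ) AS rn
  FROM users
)
DELETE FROM users
WHERE id IN (SELECT id FROM ranked WHERE rn > 1);   -- keep rn = 1 per user_id
```

In an interview it helps to say that you would first run the CTE as a plain SELECT to inspect which rows have `rn > 1`, and wrap the DELETE in a transaction so it can be rolled back.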

4. Running Total / Cumulative Sum

SELECT
  order_date,
  revenue,
  SUM(revenue) OVER (ORDER BY order_date) AS running_total
FROM orders;

Add PARTITION BY user_id for per-user running totals.
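A sketch of the per-user version, assuming the orders table also carries a `user_id` column (it is not shown in the query above):

```sql
-- Assumes orders has a user_id column (hypothetical; not in the query above).
SELECT
  user_id,
  order_date,
  revenue,
  SUM(revenue) OVER (
    PARTITION BY user_id                              -- restart the total for each user
    ORDER BY order_date
    ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW  -- explicit row-by-row frame
  ) AS user_running_total
FROM orders
ORDER BY user_id, order_date;
```

The explicit ROWS frame matters when several orders share a date. With only ORDER BY, the default frame is RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW, which treats rows with the same order_date as peers and gives them all the same total. With ROWS, add a tie-breaker column to the ORDER BY if the order of same-day rows matters.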

5. Diagnose a Sudden Traffic Drop

This is a diagnostic business case, not a SQL exercise. Structure your answer:

Confirm the signal: Is it across all sources or one channel? All pages or one section?
Segment: Break by source/medium, device type, geo, landing page.
Correlate with events: Deploys, campaigns, algorithm updates.
Hypothesize + test: Form 2–3 hypotheses, then pull SQL to validate each.
Communicate finding: What's the most likely cause? What's the confidence level?
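The segmentation step might look like the following sketch. It assumes PostgreSQL and a hypothetical `sessions(session_id, session_date, source, device_type)` table with `session_date` stored as a DATE; none of these names come from a real schema.

```sql
-- Assumed (hypothetical) table: sessions(session_id, session_date, source, device_type)
WITH weekly AS (
  SELECT
    source,
    device_type,
    COUNT(*) FILTER (WHERE session_date >= CURRENT_DATE - 7) AS sessions_last_7d,
    COUNT(*) FILTER (WHERE session_date <  CURRENT_DATE - 7) AS sessions_prior_7d
  FROM sessions
  WHERE session_date >= CURRENT_DATE - 14   -- two full weeks,
    AND session_date <  CURRENT_DATE        -- excluding today's partial data
  GROUP BY source, device_type
)
SELECT
  source,
  device_type,
  sessions_prior_7d,
  sessions_last_7d,
  sessions_last_7d - sessions_prior_7d AS abs_change,
  ROUND(100.0 * (sessions_last_7d - sessions_prior_7d)
        / NULLIF(sessions_prior_7d, 0), 1) AS pct_change   -- NULL if no prior traffic
FROM weekly
ORDER BY abs_change;   -- largest drops first
```

Sorting by absolute change rather than percent change keeps tiny segments with noisy percentages from crowding out the channel that actually lost the traffic.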

This question separates candidates who pull data from candidates who think in hypotheses.

Business Case Interview Questions

Define a Metric for User Engagement

Weak answers: "Daily active users" or "time on site."

Better answer: Engagement metrics should map to business outcomes. For a SaaS product, a stronger engagement metric is "core action completion rate": the percentage of sessions in which the user completed the action the product promises. Because it measures delivered value rather than time spent, it tends to be a better leading indicator of retention than time on site.
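As a rough sketch, a core action completion rate could be computed like this. The `events(session_id, event_name, event_date)` table and the `'report_exported'` event name are hypothetical placeholders for whatever the product's core action is; the syntax assumes PostgreSQL.

```sql
-- Assumed (hypothetical) table: events(session_id, event_name, event_date)
-- 'report_exported' stands in for the product's core action.
WITH session_flags AS (
  SELECT
    session_id,
    MIN(event_date) AS session_date,
    MAX(CASE WHEN event_name = 'report_exported' THEN 1 ELSE 0 END) AS did_core_action
  FROM events
  GROUP BY session_id
)
SELECT
  DATE_TRUNC('week', session_date) AS week,
  COUNT(*) AS sessions,
  SUM(did_core_action) AS sessions_with_core_action,
  ROUND(100.0 * SUM(did_core_action) / COUNT(*), 2) AS core_action_completion_pct
FROM session_flags
GROUP BY 1
ORDER BY 1;
```

Flagging each session once before aggregating keeps a session with ten core actions from counting ten times in the numerator.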

How Do You Handle Missing Data?

Structure your answer around the reason for missing data:

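MCAR (Missing Completely at Random): Missingness is unrelated to any variable, observed or not. Dropping rows costs sample size but does not bias estimates; mean imputation keeps the mean intact but understates variance and weakens correlations
MAR (Missing at Random): Missingness depends only on variables you did observe. Use model-based imputation that conditions on those variables
MNAR (Missing Not at Random): Missingness depends on the unobserved value itself, so the missingness is signal. Encode it as a feature and state the assumption behind any imputation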

Never say "I drop missing rows" without explaining why it's statistically safe to do so.
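A quick check that supports this answer is to compare missing rates across observed segments. The sketch below assumes a hypothetical `users(user_id, plan, signup_channel, income)` table in which `income` is the column with gaps.

```sql
-- Assumed (hypothetical) table: users(user_id, plan, signup_channel, income)
SELECT
  plan,
  signup_channel,
  COUNT(*) AS n_rows,
  COUNT(*) - COUNT(income) AS n_missing,      -- COUNT(col) skips NULLs
  ROUND(100.0 * (COUNT(*) - COUNT(income)) / COUNT(*), 1) AS pct_missing
FROM users
GROUP BY plan, signup_channel
ORDER BY pct_missing DESC;
```

If the missing rate differs sharply between segments, missingness depends on observed variables and MCAR is implausible. The query cannot separate MAR from MNAR, since that depends on the values you never observed; that part needs domain knowledge about why the field goes unfilled.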

Behavioral Interview Questions

Describe a Time Your Analysis Changed a Business Decision

Weak: "I analyzed the churn data and made a dashboard."

Strong: "We were about to launch a win-back email campaign targeting all churned users from the past year. I analyzed the underlying churn reasons and found that 40% had churned due to billing failures, not product dissatisfaction. I segmented these users and recommended a separate reactivation flow. The targeted campaign had 3.2× the reactivation rate of the generic blast."

The key structure: what decision was being made → what data revealed → what you recommended → what happened.

Stats & Probability Questions

What Is a p-value?

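A p-value is the probability of observing a result at least as extreme as yours, assuming the null hypothesis is true. A p-value of 0.04 does NOT mean there's a 96% chance your hypothesis is correct. It means that if there were truly no effect, data this extreme would show up about 4% of the time.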

Statistical vs. Practical Significance

A result can be statistically significant but practically meaningless if the effect size is tiny. Always pair significance with effect size. A statistically significant 0.1% conversion lift is usually not worth building a feature for, unless the traffic volume is large enough that the gain outweighs the cost of building and maintaining it.
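Reporting the effect size next to the test result can be as simple as the sketch below. It assumes PostgreSQL and a hypothetical `experiment_users(user_id, variant, converted)` table, where `variant` is 'control' or 'treatment' and `converted` is an integer 0 or 1.

```sql
-- Assumed (hypothetical) table: experiment_users(user_id, variant, converted)
WITH rates AS (
  SELECT
    AVG(converted) FILTER (WHERE variant = 'control')   AS control_rate,
    AVG(converted) FILTER (WHERE variant = 'treatment') AS treatment_rate
  FROM experiment_users
)
SELECT
  ROUND(100.0 * control_rate, 3)   AS control_pct,
  ROUND(100.0 * treatment_rate, 3) AS treatment_pct,
  -- absolute lift, in percentage points
  ROUND(100.0 * (treatment_rate - control_rate), 3) AS abs_lift_pct_points,
  -- relative lift, as a percent of the control rate
  ROUND(100.0 * (treatment_rate - control_rate)
        / NULLIF(control_rate, 0), 2) AS relative_lift_pct
FROM rates;
```

Stating both numbers avoids the common ambiguity in claims like "a 0.1% lift": 0.1 percentage points on a 2% baseline is a 5% relative improvement, which is a very different business case from a 0.1% relative one.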

How to Practice Before Your Loop

Reading answers is not the same as saying them out loud under time pressure. Data analyst interviews have a pacing component — most SQL rounds are 30–45 minutes for 2–3 problems.


Been using ManyOffer to sharpen my own answers — if you want AI mock interviews with real LP feedback, they have a deal running through July worth checking out: Claim 1 free month here
