DEV Community

Vivek Kumar
Don't Trust AI-Generated SQL Blindly: A Developer's Validation Checklist

AI coding assistants can knock out a SQL query in seconds. You describe what you want in plain English, and out comes a SELECT statement that looks completely reasonable. So you copy it, paste it into your app or dashboard query runner, and hit execute.

This is where things go wrong.

Unlike a syntax error that crashes loudly, a semantically wrong SQL query runs successfully and returns a result. The result looks plausible. You ship the dashboard. A week later, someone notices the revenue numbers are 15% off because the AI joined on the wrong foreign key. The silent failure is the dangerous one.

This post is a practical checklist for validating AI-generated SQL before it touches your production database — whether you're building a reporting feature, an embedded dashboard, or using a text-to-SQL tool internally.


Why AI Gets SQL Wrong

Large language models generate SQL by predicting likely token sequences based on training data. They don't execute the query or check your actual schema; they produce a query that merely looks plausible. The four failure patterns that show up most often are:

1. Schema hallucination — the model invents column names that don't exist, or references the right column from the wrong table.

2. Faulty joins — missing ON clauses (producing accidental Cartesian products), wrong join keys, or incorrect join types (LEFT vs INNER).

3. Aggregation mistakes — applying COUNT or SUM to the wrong column, forgetting GROUP BY columns, or confusing WHERE (row-level) with HAVING (group-level).

4. Missing filters — dropping a WHERE clause entirely, or missing a tenant/user scoping condition that you absolutely need for correctness and security.

A 2026 benchmark study found that even top models only achieve ~78% execution accuracy on zero-shot text-to-SQL tasks. That means roughly 1 in 5 queries is wrong — often in ways that won't throw an error.


The Validation Checklist

✅ 1. Check Every Column Against Your Actual Schema

Before anything else, look up every column name in the generated query against your real schema. Don't assume — AI models frequently confuse similar column names like user_id vs account_id, or reference created_at on a table that uses created_date.

-- AI generated this
SELECT u.name, o.total_amount
FROM users u
JOIN orders o ON u.user_id = o.user_id
WHERE o.status = 'completed';

-- Your schema actually uses:
-- users.id (not users.user_id)
-- orders.customer_id (not orders.user_id)
-- orders.amount (not orders.total_amount)

Result if run as-is: an error, if you're lucky (the invented columns don't exist anywhere), or silently wrong results, if a similarly named column happens to exist on another table. The fix is one minute of schema cross-referencing.
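If your database is reachable from a script, this check is easy to automate. Here's a minimal sketch using Python's sqlite3, with an in-memory copy of the example schema above; on Postgres you'd query information_schema.columns instead of PRAGMA table_info:

```python
import sqlite3

# In-memory stand-in for the real schema from the example above
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users  (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER,
                         amount REAL, status TEXT);
""")

def table_columns(conn, table):
    # PRAGMA table_info yields one row per column; field 1 is the column name
    return {row[1] for row in conn.execute(f"PRAGMA table_info({table})")}

def missing_columns(conn, table, referenced):
    """Columns the query references that don't exist on the table."""
    return sorted(set(referenced) - table_columns(conn, table))

# Columns the AI-generated query referenced, per table
print(missing_columns(conn, "users",  ["user_id", "name"]))  # ['user_id']
print(missing_columns(conn, "orders", ["user_id", "total_amount", "status"]))
# ['total_amount', 'user_id']: each flagged name needs a manual fix
```

Anything this flags is either a hallucinated column or a naming mismatch; both need a human to pick the real column.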


✅ 2. Trace Every JOIN Condition

For each JOIN, ask yourself:

  • Is the ON condition using the actual primary key → foreign key relationship?
  • Is the join type (INNER, LEFT, RIGHT) what you actually want?
  • Could this produce duplicate rows?
-- Potentially dangerous: AI used INNER JOIN, dropping users without orders
SELECT u.name, COUNT(o.id) AS order_count
FROM users u
INNER JOIN orders o ON u.id = o.customer_id
GROUP BY u.name;

-- Correct if you want ALL users, including those with zero orders:
SELECT u.name, COUNT(o.id) AS order_count
FROM users u
LEFT JOIN orders o ON u.id = o.customer_id
GROUP BY u.name;

The INNER JOIN version silently excludes users with no orders. If you're building a "user activity" report, that's a significant data gap.
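You can see the gap directly on a toy dataset. A sketch in Python's sqlite3 (the names are invented for the demo): one user has no orders, and the INNER JOIN version makes her vanish from the report:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users  (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER);
    INSERT INTO users  VALUES (1, 'Ada'), (2, 'Grace');  -- Grace: zero orders
    INSERT INTO orders VALUES (10, 1), (11, 1);
""")

inner = conn.execute("""
    SELECT u.name, COUNT(o.id) FROM users u
    INNER JOIN orders o ON u.id = o.customer_id
    GROUP BY u.name ORDER BY u.name
""").fetchall()

left = conn.execute("""
    SELECT u.name, COUNT(o.id) FROM users u
    LEFT JOIN orders o ON u.id = o.customer_id
    GROUP BY u.name ORDER BY u.name
""").fetchall()

print(inner)  # [('Ada', 2)]  (Grace silently dropped)
print(left)   # [('Ada', 2), ('Grace', 0)]
```

Note that COUNT(o.id) rather than COUNT(*) is what makes the LEFT JOIN version report 0 instead of 1 for users with no orders.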


✅ 3. Verify Aggregation Logic

Check that:

  • GROUP BY includes all non-aggregated columns in SELECT
  • SUM and COUNT are applied to the right columns
  • HAVING is used for filtering aggregated results (not WHERE)
-- Bug: filtering on an aggregate in WHERE (will error in most databases)
SELECT customer_id, SUM(amount) AS total
FROM orders
WHERE SUM(amount) > 1000
GROUP BY customer_id;

-- Correct:
SELECT customer_id, SUM(amount) AS total
FROM orders
GROUP BY customer_id
HAVING SUM(amount) > 1000;

AI tools make this exact mistake surprisingly often, especially when you phrase the question as "show me customers who spent more than $1000."
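Conveniently, this is one of the loud failures: most engines reject an aggregate in WHERE outright. A quick sqlite3 sketch (toy data) showing the error and the HAVING fix:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (customer_id INTEGER, amount REAL)")
conn.executemany("INSERT INTO orders VALUES (?, ?)",
                 [(1, 600), (1, 700), (2, 100)])

err = None
try:
    # Buggy form: aggregate inside WHERE
    conn.execute("SELECT customer_id, SUM(amount) FROM orders "
                 "WHERE SUM(amount) > 1000 GROUP BY customer_id")
except sqlite3.OperationalError as exc:
    err = exc  # e.g. a "misuse of aggregate" error

# Correct form: filter the groups with HAVING
rows = conn.execute("SELECT customer_id, SUM(amount) FROM orders "
                    "GROUP BY customer_id HAVING SUM(amount) > 1000").fetchall()
print(rows)  # [(1, 1300.0)]
```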


✅ 4. Check for Missing Tenant/User Scoping

If you're building a multi-tenant SaaS app or any system where different users should only see their own data, this is the most critical check. AI tools have no idea your events table contains data from 500 different customers.

-- AI generates this when asked "show me top pages by view count"
SELECT page_url, COUNT(*) AS views
FROM page_events
GROUP BY page_url
ORDER BY views DESC
LIMIT 10;

-- What you actually need in a multi-tenant app:
SELECT page_url, COUNT(*) AS views
FROM page_events
WHERE account_id = :current_account_id  -- ← THIS is what the AI forgot
GROUP BY page_url
ORDER BY views DESC
LIMIT 10;

Running the AI version against production would expose every customer's data to every other customer. Always add tenant scoping filters as a mandatory review step.
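One way to make the scoping hard to forget is to never put the tenant id in the query text at all: it comes from the session and is bound as a parameter. A sqlite3 sketch with two invented tenants showing what leaks without the filter:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE page_events (account_id INTEGER, page_url TEXT)")
conn.executemany("INSERT INTO page_events VALUES (?, ?)", [
    (1, '/pricing'), (1, '/pricing'), (1, '/docs'),
    (2, '/admin'),   (2, '/admin'),   (2, '/admin'),   # another tenant's data
])

# Unscoped (what the AI generated): tenant 2's /admin tops tenant 1's report
unscoped = conn.execute("""
    SELECT page_url, COUNT(*) AS views FROM page_events
    GROUP BY page_url ORDER BY views DESC LIMIT 10
""").fetchall()

# Scoped: the account id is bound as a parameter taken from the session
scoped = conn.execute("""
    SELECT page_url, COUNT(*) AS views FROM page_events
    WHERE account_id = ? GROUP BY page_url ORDER BY views DESC LIMIT 10
""", (1,)).fetchall()

print(unscoped)  # [('/admin', 3), ('/pricing', 2), ('/docs', 1)]
print(scoped)    # [('/pricing', 2), ('/docs', 1)]
```

If your ORM or query layer supports it, a default tenant filter applied to every query (or row-level security in the database) is a stronger guarantee than reviewing each statement by hand.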


✅ 5. Run Against a Staging Database First

Never run an untested AI-generated query directly on production — especially anything that writes data. For SELECT queries, use a staging copy of your database and compare:

  • Row counts (are they in the expected ballpark?)
  • Known values (does the result match what you'd expect for a test account?)
  • Edge cases (what happens with NULL values? What about accounts with no data?)
-- Spot-check: does this query return the right count for a known account?
SELECT COUNT(*) FROM orders WHERE customer_id = 42;  -- you know this customer has 7 orders

If the result is 7, you're on the right track. If it's 0 or 700, something is wrong with the query logic.


✅ 6. Watch Out for NULL Handling

AI-generated queries often miss NULL edge cases. Comparing NULL with = or != yields UNKNOWN rather than true, so those rows are silently excluded from the result.

-- This will miss rows where discount_code IS NULL
SELECT * FROM orders WHERE discount_code != 'PROMO10';

-- Correct if you want all orders without that specific code, including nulls:
SELECT * FROM orders
WHERE discount_code != 'PROMO10' OR discount_code IS NULL;

This is especially common in WHERE filters on optional fields like coupon codes, referral sources, or plan tiers.
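The exclusion is easy to demonstrate. A sqlite3 sketch with toy data where one order has a NULL discount_code:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER, discount_code TEXT)")
conn.executemany("INSERT INTO orders VALUES (?, ?)",
                 [(1, 'PROMO10'), (2, 'WELCOME5'), (3, None)])

# != compares NULL to 'PROMO10' as UNKNOWN, so order 3 is silently dropped
naive = conn.execute("SELECT id FROM orders "
                     "WHERE discount_code != 'PROMO10' ORDER BY id").fetchall()

# The explicit IS NULL branch keeps order 3
fixed = conn.execute("SELECT id FROM orders "
                     "WHERE discount_code != 'PROMO10' "
                     "OR discount_code IS NULL ORDER BY id").fetchall()

print(naive)  # [(2,)]
print(fixed)  # [(2,), (3,)]
```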


✅ 7. For Write Operations: Always Dry-Run First

If the AI generates an UPDATE, INSERT, or DELETE, wrap it in a transaction and inspect the affected rows before committing:

BEGIN;

-- First, check what would be affected:
SELECT COUNT(*) FROM subscriptions
WHERE trial_ends_at < NOW() AND status = 'trialing';

UPDATE subscriptions
SET status = 'cancelled'
WHERE trial_ends_at < NOW() AND status = 'trialing';

-- Inspect the updated rows, then COMMIT if everything looks right
ROLLBACK;  -- Don't commit until you're sure

This gives you a preview of the blast radius before any data changes are permanent.
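The same preview works from application code, since most drivers report the affected row count before you commit. A sqlite3 sketch (a simplified subscriptions table, with the trial_ends_at condition dropped for brevity):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE subscriptions (id INTEGER, status TEXT)")
conn.executemany("INSERT INTO subscriptions VALUES (?, ?)",
                 [(1, 'trialing'), (2, 'trialing'), (3, 'active')])
conn.commit()  # baseline data is committed; the dry run starts here

cur = conn.execute("UPDATE subscriptions SET status = 'cancelled' "
                   "WHERE status = 'trialing'")
print(cur.rowcount)  # 2: the blast radius, visible before anything is permanent

conn.rollback()  # happy with the count? call conn.commit() instead

still_trialing = conn.execute("SELECT COUNT(*) FROM subscriptions "
                              "WHERE status = 'trialing'").fetchone()[0]
print(still_trialing)  # 2: the rollback restored the original rows
```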


A Simple Pre-Run Mental Model

Before executing any AI-generated query, ask yourself:

  • Do all columns exist? → Cross-reference against schema
  • Are all JOINs correct? → Verify keys and join types
  • Is aggregation right? → Check GROUP BY, HAVING vs WHERE
  • Is tenant data scoped? → Look for account_id / user_id filters
  • Are NULLs handled? → Check optional column filters
  • Is this a write operation? → Use BEGIN/ROLLBACK to preview

Common Mistakes to Watch For

  • Trusting output that "looks right" — plausible-looking results are the most dangerous kind of wrong
  • Skipping staging for read-only queries — even SELECTs can cause problems (performance, data leakage) if they're wrong
  • Not providing schema context to the AI — the better your prompt (including table definitions and sample values), the more accurate the output
  • Running AI queries against production without row limits — add LIMIT 100 while testing to avoid accidentally returning or processing millions of rows

Key Takeaways

AI tools are genuinely useful for writing SQL faster — especially for repetitive patterns and boilerplate joins. But they're drafting assistants, not a replacement for your understanding of the data model. Every AI-generated query deserves at least a 30-second review against the checklist above.

The silent failure — a query that runs cleanly but returns wrong data — is what makes SQL validation non-negotiable. A 15% revenue discrepancy discovered a week after shipping is a bad day. A 30-second schema check is not.


Do you have a validation workflow you follow for AI-generated SQL? Share it in the comments — I'd love to hear how other teams handle this, especially in multi-tenant or high-stakes reporting contexts.
