DEV Community

spkibe
spkibe

Posted on

The Silent Bug That Exposed All Tenant Data in Databricks Unity Catalog

We were building a multi-tenant data platform on Databricks. Multiple organisations sharing the same physical tables — each one should see only their own rows. Standard stuff.
We implemented it using Unity Catalog's row-level security and column masking. The functions compiled. The filter showed as applied in DESCRIBE EXTENDED. Every test from the admin account looked perfect.
Then we logged in as a real tenant user.
They could see every tenant's data.

What Row-Level Security and Column Masking Actually Do
Before getting to the bug, a quick primer on how Unity Catalog security works — because understanding the mechanism is what makes the bug obvious in hindsight.

Row-Level Security — Row Filters
A row filter is a SQL function you attach to a table. Unity Catalog calls it automatically on every query, passing the value of a specified column from each row. If the function returns TRUE, the row is shown. If it returns FALSE, the row is completely hidden — not counted, not visible, not even hinted at.

-- Attach a row filter to a table
ALTER TABLE my_catalog.my_schema.my_table
  SET ROW FILTER my_catalog.governance.filter_by_tenant
  ON (TENANT_KEY);
Enter fullscreen mode Exit fullscreen mode

The user never writes a WHERE clause for this. They cannot remove it. It fires invisibly on every query from every tool — SQL editor, notebook, BI connection, API call.

Column-Level Masking — Column Masks
A column mask is a SQL function attached to a specific column. Instead of hiding rows, it transforms values at query time. The row is visible but sensitive fields are replaced, generalized, or redacted based on who is asking.

-- Attach a column mask
ALTER TABLE my_catalog.my_schema.my_table
  ALTER COLUMN FIRST_NAME
  SET MASK my_catalog.governance.mask_name;
Enter fullscreen mode Exit fullscreen mode

The same SELECT returns different values depending on the user's group membership:

One table. One query. Different results per role. Platform-enforced.

Why This Matters
The old approach — dynamic views, one per tenant per role — requires you to trust that every developer always queries the right view, that views stay in sync with schema changes, and that no one ever accidentally gets direct table access. Unity Catalog removes all of that trust dependency. Security lives at the storage engine layer, not the SQL layer.

The Bug
Here is the row filter function we wrote:

CREATE OR REPLACE FUNCTION
my_catalog.governance.filter_by_tenant(tenant_key BIGINT)
RETURNS BOOLEAN
RETURN
  IS_ACCOUNT_GROUP_MEMBER('admin_group')
  OR
  EXISTS (
    SELECT 1
    FROM my_catalog.governance.tenant_group_mapping tgm
    WHERE IS_ACCOUNT_GROUP_MEMBER(tgm.group_name)
      AND CAST(tgm.tenant_key AS BIGINT) = tenant_key
  );
Enter fullscreen mode Exit fullscreen mode

Read it carefully.
The function parameter is named tenant_key.
The mapping table column is also named tenant_key.
In the WHERE clause:

AND CAST(tgm.tenant_key AS BIGINT) = tenant_key

SQL sees two references to tenant_key. It resolves both as the table column tgm.tenant_key. The function parameter is completely ignored.

The comparison becomes:

tgm.tenant_key = tgm.tenant_key

Why It Was So Hard to Spot

  1. No error was thrown. The function compiled without warnings. Unity Catalog reported it as valid SQL.
  2. DESCRIBE EXTENDED showed the filter was applied. Row Filter: my_catalog.governance.filter_by_tenant(TENANT_KEY)

Everything looked correct at the metadata level. The filter was attached. The problem was invisible in the schema description.

  1. Admin tests passed. Our initial testing was done from an admin account. The admin bypass (IS_ACCOUNT_GROUP_MEMBER('admin_group')) fires before the EXISTS check, so it returned TRUE for the correct reason. We never noticed the EXISTS was broken.
  2. The function fails open, not closed. When Unity Catalog cannot properly evaluate a row filter, it fails open — showing rows rather than blocking them. This is the safer choice for uptime but the dangerous choice for security. A broken filter that silently shows everything is much harder to detect than a broken filter that throws an error.

The Diagnosis
The key test was running the filter function directly as the tenant user:

-- Run as the tenant user, not the admin
SELECT
  my_catalog.governance.filter_by_tenant(1) AS can_see_tenant_1,
  my_catalog.governance.filter_by_tenant(2) AS can_see_tenant_2,
  my_catalog.governance.filter_by_tenant(3) AS can_see_tenant_3;
Enter fullscreen mode Exit fullscreen mode

Result:

can_see_tenant_1 = true
can_see_tenant_2 = true
can_see_tenant_3 = true
Enter fullscreen mode Exit fullscreen mode

A user who should only see tenant 3 could see all three. The function was returning true everywhere regardless of tenant key. That confirmed the EXISTS logic was broken — and pointed directly to the parameter name collision.

The Fix — Rename the Parameter

CREATE OR REPLACE FUNCTION
my_catalog.governance.filter_by_tenant(p_tenant_key BIGINT)
RETURNS BOOLEAN
RETURN
  CASE
    -- Null tenant keys are always hidden
    WHEN p_tenant_key IS NULL THEN FALSE

    -- Admin bypass
    WHEN IS_ACCOUNT_GROUP_MEMBER('admin_group') THEN TRUE

    -- Tenant check — p_tenant_key is the parameter
    -- tgm.tenant_key is the table column
    -- SQL can now distinguish between them
    WHEN EXISTS (
      SELECT 1
      FROM my_catalog.governance.tenant_group_mapping tgm
      WHERE IS_ACCOUNT_GROUP_MEMBER(tgm.group_name)
        AND CAST(tgm.tenant_key AS BIGINT) = p_tenant_key
    ) THEN TRUE

    -- Explicit deny — everything else sees zero rows
    ELSE FALSE
  END;
Enter fullscreen mode Exit fullscreen mode

Two changes:

Parameter renamed from tenant_key to p_tenant_key — eliminates the name collision
CASE structure with explicit ELSE FALSE — makes the deny-by-default behaviour visible and intentional

After recreating the function and reapplying the row filter, the same test returned:

can_see_tenant_1 = false
can_see_tenant_2 = false
can_see_tenant_3 = true

Enter fullscreen mode Exit fullscreen mode

Drop and Reapply After Fixing
Updating the function is not enough on its own. You also need to drop and reapply the row filter so the table picks up the new function definition:

ALTER TABLE my_catalog.my_schema.my_table
  DROP ROW FILTER;

ALTER TABLE my_catalog.my_schema.my_table
  SET ROW FILTER my_catalog.governance.filter_by_tenant
  ON (TENANT_KEY);
Enter fullscreen mode Exit fullscreen mode

The Column Masking Side
For completeness — column masking uses the same pattern and has the same naming risk. Here is what a safe masking function looks like with the p_ prefix convention applied:

CREATE OR REPLACE FUNCTION
my_catalog.governance.mask_name(p_name STRING)
RETURNS STRING
RETURN CASE
  WHEN IS_ACCOUNT_GROUP_MEMBER('full_access_group') THEN p_name
  WHEN IS_ACCOUNT_GROUP_MEMBER('admin_group')       THEN p_name
  WHEN IS_ACCOUNT_GROUP_MEMBER('partial_access_group')
    THEN CONCAT(LEFT(p_name, 1), '***')
  ELSE '#### MASKED ####'
END;
Enter fullscreen mode Exit fullscreen mode

Apply it inline at table creation to avoid broken dependencies later:

CREATE TABLE IF NOT EXISTS my_catalog.my_schema.members
(
    MEMBER_KEY   BIGINT  NOT NULL,
    TENANT_KEY   BIGINT  NOT NULL,
    FIRST_NAME   STRING  MASK my_catalog.governance.mask_name,
    LAST_NAME    STRING  MASK my_catalog.governance.mask_name,
    DATE_OF_BIRTH DATE   MASK my_catalog.governance.mask_dob
)
USING DELTA;

-- Row filter applied separately
ALTER TABLE my_catalog.my_schema.members
  SET ROW FILTER my_catalog.governance.filter_by_tenant
  ON (TENANT_KEY);
Enter fullscreen mode Exit fullscreen mode

Declaring masks inline means they survive DROP TABLE / CREATE TABLE cycles. The row filter does not — always reapply it after recreating a table.

The Rule

Never name a row filter function parameter the same as a column in any table the function queries.

Prefix all function parameters with p_. It is one character. It prevents this entire class of silent security failure.

filter_by_tenant(tenant_key BIGINT)   ← dangerous
filter_by_tenant(p_tenant_key BIGINT) ← safe
Enter fullscreen mode Exit fullscreen mode

Full Verification Checklist
Run these in order before trusting any row filter in production:

-- 1. Confirm groups are account-level (not workspace-level)
--    Run as the target user:
SELECT IS_ACCOUNT_GROUP_MEMBER('your_tenant_group');
-- Expected: true

-- 2. Confirm filter function returns correct values per tenant
SELECT
  my_catalog.governance.filter_by_tenant(1) AS t1,
  my_catalog.governance.filter_by_tenant(2) AS t2,
  my_catalog.governance.filter_by_tenant(3) AS t3;
-- Expected: false, false, true (for a tenant 3 user)

-- 3. Confirm filter is attached to the table
DESCRIBE EXTENDED my_catalog.my_schema.my_table;
-- Look for: Row Filter: my_catalog.governance.filter_by_tenant(TENANT_KEY)

-- 4. Confirm mapping table has correct data
SELECT * FROM my_catalog.governance.tenant_group_mapping;

-- 5. Confirm the EXISTS subquery works correctly
SELECT EXISTS (
  SELECT 1
  FROM my_catalog.governance.tenant_group_mapping tgm
  WHERE IS_ACCOUNT_GROUP_MEMBER(tgm.group_name)
    AND tgm.tenant_key = 3
) AS exists_result;
-- Expected: true (for tenant 3 user)

-- 6. Run query as target user and confirm only their rows appear
SELECT COUNT(*), TENANT_KEY
FROM my_catalog.my_schema.my_table
GROUP BY TENANT_KEY;
-- Expected: only their tenant_key in results
Enter fullscreen mode Exit fullscreen mode

Other Gotchas We Hit Along the Way
While we are here — these are the other issues that burned us during the same implementation:
Workspace groups vs account groups. IS_ACCOUNT_GROUP_MEMBER() only recognises account-level groups created in the Databricks Account Console, not workspace-level groups. A workspace group always returns false. This one caused hours of confusion.
Cluster identity. Notebooks attached to a cluster run queries as the cluster owner's identity, not the logged-in user. IS_ACCOUNT_GROUP_MEMBER() _checks the cluster owner's groups. Switch to a SQL Warehouse — it always evaluates per the logged-in user.
Broken dependencies after catalog deletion. Column masks hold references to functions by their fully-qualified path. Delete the catalog containing a masking function without first dropping the masks, and every table with that mask becomes unqueryable with _UC_DEPENDENCY_DOES_NOT_EXIST
. Always drop masks before dropping catalogs.
Row filter lost after DROP TABLE. When you drop and recreate a table, inline column masks are preserved in the CREATE TABLE statement. Row filters are not. Always reapply ALTER TABLE SET ROW FILTER after recreating any filtered table.

Summary
Unity Catalog row-level security and column masking are genuinely powerful. One filter function and one masking function replace hundreds of views, a duplicate encrypted schema, and developer-discipline-as-security-policy.
But the parameter name collision bug is subtle enough that it will catch you if you are not looking for it. The function looks right. It compiles cleanly. It attaches without errors. And it silently hands every user a complete view of every tenant's data.
Prefix your parameters. Always.

Top comments (0)