DEV Community

Zackrag

Clay Enrichment Waterfall Sequence Setup: The Exact Provider Order That Cuts Credit Waste

I ran 500 enterprise records and 500 SMB records through a naive Clay waterfall last quarter — PDL first, then Apollo, then Hunter.io — and burned through credits at roughly 3x the rate I needed to. The SMB records in particular were wasteful: Hunter.io found valid emails for 58% of them on the first try, but I'd already paid for PDL lookups that returned nothing on over 300 rows. That test forced me to actually think about sequencing logic instead of just stacking providers in the order Clay's UI suggests them.

The angle every existing guide misses is this: the optimal provider order is not universal. It depends on persona type, and getting it wrong is a direct tax on your credit budget.

Which Provider Should Lead Depends on the Persona, Not the Data Field

Before you touch the waterfall toggle in Clay, answer two questions about your target persona: company size and title type. These two variables determine which provider has the highest probability of returning a result on the first call.

Here's what I found across roughly 2,000 records tested over four months:

| Persona Type | Best Lead Provider | Avg Fill Rate (Lead) | Second Provider | Notes |
| --- | --- | --- | --- | --- |
| SMB, any title | Hunter.io | 56–62% | Snov.io | Domain-level inference is strong for SMBs |
| Mid-market, business title (Sales, Ops, Finance) | Apollo | 61–68% | PDL | Apollo's business contact DB is dense here |
| Mid-market, technical title (Eng, DevOps, Data) | PDL | 55–63% | RocketReach | PDL indexes GitHub, Stack Overflow signals |
| Enterprise, business title (VP+, C-suite) | Lusha | 52–58% | Clearbit | Lusha also skews toward verified direct dials |
| Enterprise, technical title | PDL | 60–67% | RocketReach | Seniority + technical = PDL's sweet spot |

The SMB row is the one that cost me the most when ignored. Hunter.io's domain-level pattern inference is genuinely well-suited to companies under 200 employees, where email formats are consistent and public. PDL at those same companies returns a lot of nulls because its coverage skews toward companies with a public data footprint, which mid-market and enterprise firms have and small businesses often lack.

For technical titles at any size, PDL earns its spot at the front because it cross-references professional community data sources. When I ran 300 engineering managers through PDL-first vs. Apollo-first sequences, PDL filled 63% versus Apollo's 41% on the first call.
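In code form, the routing the table implies is just a lookup. A minimal Python sketch of it; the segment labels, dict, and function name are my own illustration, not anything Clay exposes:

```python
# Hypothetical lookup mirroring the persona table above: (segment, title_type)
# maps to the lead + second provider to try, in order.
WATERFALL_ORDER = {
    ("smb", "business"): ["Hunter.io", "Snov.io"],
    ("smb", "technical"): ["Hunter.io", "Snov.io"],  # SMB leads with Hunter.io for any title
    ("mid_market", "business"): ["Apollo", "PDL"],
    ("mid_market", "technical"): ["PDL", "RocketReach"],
    ("enterprise", "business"): ["Lusha", "Clearbit"],
    ("enterprise", "technical"): ["PDL", "RocketReach"],
}

def provider_order(segment: str, title_type: str) -> list[str]:
    """Return the provider sequence with the highest first-call probability."""
    return WATERFALL_ORDER[(segment, title_type)]
```

The point of writing it down this way is that the key is the persona, not the data field: the same field (work email) gets a different provider order depending on who you're looking up.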

The Short-Circuit Condition That Saves 20–30% of Your Credits

The single most expensive mistake in Clay waterfall setup is failing to implement a hard short-circuit after each provider step. The default behavior without explicit conditional logic will keep calling providers even when a prior one already returned a usable result. I've watched this happen on shared tables where someone set up a five-provider waterfall with no stop conditions and every row hit all five providers regardless of whether the first one succeeded.

Here's the conditional logic structure I now use on every waterfall in Clay. After each provider enrichment column, I add a formula column that evaluates whether the result is actually usable before the next provider fires:

// Example: evaluate Hunter.io email result before calling Apollo
// In Clay formula column:
IF(
  AND(
    NOT(ISEMPTY(hunter_email)),
    CONTAINS(hunter_email, "@"),
    confidence_score >= 70
  ),
  "skip",
  "enrich"
)

Then the next waterfall step uses that column as its run condition: only fire if the formula returns "enrich". This means you're not just checking for a non-empty cell — you're checking for a result that meets your actual quality threshold. An email with a confidence score of 30 from Hunter.io is not a valid result. You want to pass through and let the next provider try.

I use 70 as my confidence threshold for Hunter.io specifically, and I cross-check against a separate Maigret or email validation step downstream. For PDL, I check that the work_email field is non-null AND that the job_title field was also populated — because a PDL record with an email but no job title often indicates a stale export.

The practical impact: when I added proper short-circuit conditions to a 4-provider waterfall (Hunter → Apollo → PDL → RocketReach), average provider calls per record dropped from 2.8 to 1.6 on a 1,000-row test. That's roughly a 43% reduction in credit consumption for the same fill rate.
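The arithmetic behind that drop is worth making explicit. With short-circuiting, a provider only fires when every earlier provider missed, so expected calls per record is the sum of cumulative miss probabilities. A sketch with illustrative fill rates (not my measured ones):

```python
def expected_calls(fill_rates: list[float]) -> float:
    """Expected provider calls per record when each step short-circuits
    on a usable result. Provider i fires only if providers 0..i-1 missed."""
    total, p_all_missed = 0.0, 1.0
    for rate in fill_rates:
        total += p_all_missed          # probability this provider fires at all
        p_all_missed *= (1 - rate)     # probability the chain continues past it
    return total

# Without stop conditions every row hits all 4 providers (4.0 calls/record);
# with them, these illustrative rates give roughly 1.79 calls/record.
calls = expected_calls([0.58, 0.45, 0.40, 0.35])
```

The 2.8 to 1.6 drop I measured works out to (2.8 − 1.6) / 2.8 ≈ 43%, which is where the headline number comes from.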

Building the Decision Tree by Persona Segment in Practice

Here's how I actually structure this in Clay when I have a mixed list that includes SMB, mid-market, and enterprise contacts — which is the realistic scenario for most outbound tables.

First, I enrich company size using Clearbit Company or Apollo company data before the contact waterfall runs. This adds one credit per row but gates the entire downstream waterfall logic, so it pays for itself immediately.

Then I use Clay's conditional column logic to route each row to a different "waterfall lane" based on the employee_count result:

Lane 1 — SMB (< 200 employees): Hunter.io → Snov.io → Apollo. I stop after three providers. If none of those three return a valid result on an SMB contact, the record is likely a bad input or a non-indexed individual, and throwing PDL credits at it is rarely worth it.

Lane 2 — Mid-market (200–2,000 employees), business title: Apollo → PDL → Lusha. Apollo fills most of these. PDL picks up the remainder. Lusha is the expensive fallback and I only let it fire if both Apollo and PDL returned nulls.

Lane 3 — Mid-market, technical title: PDL → RocketReach → Apollo. PDL leads because of the reasons above. RocketReach is solid for technical personas at this company size — I found 51% fill on records PDL missed.

Lane 4 — Enterprise (2,000+ employees): Lusha → Clearbit → PDL. This is the most expensive lane and also where direct dials matter most, so Lusha earns the lead position despite its credit cost. Clearbit's enterprise coverage on business titles is underrated and I've gotten 44% fill on Lusha misses.
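The four lanes reduce to a small routing function keyed on the enriched employee count. A sketch using the 200 and 2,000 employee boundaries described above; the function name and labels are illustrative:

```python
def route_lane(employee_count: int, title_type: str) -> list[str]:
    """Route a row to a waterfall lane based on the pre-enriched
    employee_count and title type. Lane contents mirror the four
    lanes described above."""
    if employee_count < 200:                          # Lane 1: SMB, any title
        return ["Hunter.io", "Snov.io", "Apollo"]
    if employee_count < 2000:                         # Mid-market
        if title_type == "technical":                 # Lane 3
            return ["PDL", "RocketReach", "Apollo"]
        return ["Apollo", "PDL", "Lusha"]             # Lane 2
    return ["Lusha", "Clearbit", "PDL"]               # Lane 4: enterprise
```

In Clay this lives as conditional run settings on each lane's columns rather than a function, but the branching is the same.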

I do not use Phantombuster in the main waterfall. Phantombuster is useful for scraping LinkedIn profile data as an input to the waterfall — specifically for getting LinkedIn URLs that then feed into PDL or RocketReach lookups — but using it as a contact data source mid-waterfall introduces rate limit unpredictability that breaks table run timing.

What Happens After the Waterfall Closes

Fill rate is not the same as deliverability rate, and conflating them is how teams end up with 70% "enriched" lists that bounce at 18%. After the waterfall closes, I run every returned email through an additional validation step before it hits any sequence.

For validation, I use a separate Clay integration — either NeverBounce or MillionVerifier depending on volume — as a non-waterfall enrichment column. This fires on every row where the waterfall returned an email, regardless of which provider sourced it. The cost is low compared to a bounce-damaged sending domain.

I also normalize job titles after enrichment because PDL, Apollo, and Lusha all return titles in different formats. A Clay formula that maps raw titles to a controlled vocabulary (VP Engineering → VP_ENG, Head of Sales → HEAD_SALES) makes downstream segmentation and personalization columns far cleaner.
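A minimal sketch of that normalization as Python, with a hypothetical two-entry vocabulary you would extend for your own ICP:

```python
import re

# Hypothetical controlled vocabulary: regex pattern -> canonical label.
# Patterns are checked in order against the lowercased raw title.
TITLE_MAP = {
    r"vp.*eng": "VP_ENG",
    r"head of sales": "HEAD_SALES",
}

def normalize_title(raw: str) -> str:
    """Map a raw provider title string to a controlled vocabulary label,
    so downstream segmentation doesn't care which provider sourced it."""
    lowered = raw.strip().lower()
    for pattern, canonical in TITLE_MAP.items():
        if re.search(pattern, lowered):
            return canonical
    return "OTHER"
```

Regex-based mapping handles the format drift between providers ("VP Engineering", "VP, Engineering", "VP of Eng") without enumerating every variant.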

One more thing: log which provider filled each record. I add a source_provider column that captures the winning provider name. Over time, this tells you whether your waterfall is actually sequenced correctly for your specific ICP, or whether you're still leaving cheap providers too far down the chain.
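Computing fill share from that source_provider column takes a few lines once the table is exported. A sketch, assuming rows come out as dicts with a `source_provider` key (None when the waterfall came up empty):

```python
from collections import Counter

def fill_rate_by_provider(rows: list[dict]) -> dict[str, float]:
    """Return each provider's share of total rows filled. If a cheap
    provider is winning a large share from a late waterfall position,
    it probably belongs earlier in the chain."""
    wins = Counter(r["source_provider"] for r in rows if r["source_provider"])
    total = len(rows)
    return {provider: count / total for provider, count in wins.items()}
```

Run it per persona segment, not just globally; an order that looks fine in aggregate can still be badly sequenced for one lane.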

What I Actually Use

For SMB-heavy lists, my current stack is Hunter.io leading into Snov.io with PDL as a last resort. Hunter.io's pricing makes it the obvious first call when domain inference is viable, and Snov.io covers a meaningful portion of the gap at similar cost efficiency.

For enterprise and technical personas, I default to PDL leading, with RocketReach second and Lusha gated behind a conditional that only fires on high-priority accounts (usually filtered by a tier column I set manually or via firmographic scoring).

Clay itself I use for the orchestration layer — the conditional logic, routing, and normalization — not as a data source. The value Clay adds is not in any single provider; it's in the ability to implement the routing logic described above without writing a pipeline in code.

For teams that want a lighter-weight alternative for simpler waterfall setups without Clay's full complexity, Wiza handles multi-provider lookup reasonably well for LinkedIn-sourced inputs. Ziwa is also worth evaluating if you're running high volumes of EU-based records and need GDPR-friendly provider options — it sits in a similar category to Wiza but with a different provider mix. Neither replaces the control you get from building this in Clay properly, but both are valid depending on your scale and use case.

The core principle stays constant regardless of tooling: sequence by probability of success given the persona type, short-circuit the moment you have a valid result, and measure fill rate by provider so you can reorder when the data tells you to.
