
Most Python developers pick Pydantic (or msgspec) and never benchmark it seriously. Most benchmarks only test the happy path — valid data going in, valid data coming out.
But at real API boundaries, webhooks, and data pipelines, a meaningful percentage of incoming requests are malformed, incomplete, or outright malicious. In those cases, how fast you can reject bad data matters a lot.
I ran a benchmark focused on this reality.
## TL;DR
- On valid data, Pydantic v2 and msgspec perform well.
- On invalid data, many libraries become significantly slower due to full error collection and type coercion.
- validatedata's `validator()` fast path (with early exit) is dramatically faster at rejection, in some cases even beating hand-written checks.
## Test Data Used
### Scalars
```python
email_val = "test@example.com"
int_val = 10
bad_int = "10"  # fails strict int check
```
### Dicts

```python
dict_data = {
    'name': 'John',
    'age': 30,
    'email': 'john@example.com',
    'active': True
}

bad_dict = {
    'name': 'Jo',          # too short
    'age': 10,             # below minimum
    'email': 'bad',        # not an email
    'active': "yes"        # wrong type
}

bad_dict_extended = {
    'name': 'Al',            # too short
    'age': 15,               # below minimum
    'email': 'bademail',     # not an email
    'active': "yes",         # wrong type
    'address': '',           # empty
    'phone': 'not-a-phone',  # invalid
    'roles': 'admin'         # should be a list
}
```
## Benchmark Results (1 million iterations)
| Test | manual | msgspec | validatedata | fastjsonschema | pydantic v2 | beartype |
|---|---|---|---|---|---|---|
| Scalar: type (int) | 0.0842s | 0.0793s | 0.1109s | 0.1478s | 0.4254s | 0.3594s |
| Scalar: type + range | 0.1286s | 0.1353s | 0.1508s | 0.1493s | 0.1314s | 0.3841s |
| Dict (valid) | 1.1996s | 1.2350s | 1.9438s | 2.8658s | 1.8246s | 3.8948s |
| Dict (invalid) | 0.5856s | 1.1895s | 0.2644s | 2.7938s | 2.1661s | 2.0818s |
**Key takeaway:** on invalid dicts, validatedata finished in 0.26 seconds, more than 8× faster than Pydantic v2.
## Why the Big Difference?
Most validation libraries run a full pipeline on every input:
- Type coercion
- Running all validators
- Collecting all errors
- Building rich result objects
This is perfect for user forms, but expensive when you just need to reject bad data quickly.
validatedata uses early-exit optimization in its fast path: it stops at the first failure.
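The two strategies can be illustrated with toy validators (a sketch of the general idea, not any library's actual internals; `validate_collect_all` and `validate_early_exit` are illustrative names):

```python
def validate_collect_all(data, rules):
    """Full pipeline: run every rule and gather every error."""
    errors = []
    for field, check, message in rules:
        if not check(data.get(field)):
            errors.append(f"{field}: {message}")
    return errors  # empty list means valid

def validate_early_exit(data, rules):
    """Fast path: stop at the first failing rule."""
    for field, check, _message in rules:
        if not check(data.get(field)):
            return False
    return True

rules = [
    ('name',   lambda v: isinstance(v, str) and len(v) >= 3, 'too short'),
    ('age',    lambda v: isinstance(v, int) and v >= 18,     'below minimum'),
    ('active', lambda v: isinstance(v, bool),                'wrong type'),
]

bad = {'name': 'Jo', 'age': 10, 'active': 'yes'}
print(validate_collect_all(bad, rules))  # reports all three failures
print(validate_early_exit(bad, rules))   # False after the very first check
```

On badly malformed input, the early-exit version does a fraction of the work: one failed check instead of every rule plus error-object construction.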
## Two Tools for Two Different Jobs
### 1. `validator()`: Fast Boolean Path (Early Exit)
```python
from validatedata import validator

is_valid_user = validator({
    'username': 'str|min:3|max:32',
    'email': 'email',
    'age': 'int|min:18'
})

if is_valid_user(data):  # True or False, very fast on failure
    do_xyz()
```
**Best for:** API gatekeeping, webhooks, data pipelines, bot rejection, rate-limited endpoints.
### 2. `validate_data()`: Full-Featured Validation
Returns detailed errors when needed.
## Recommendation Matrix
| Use Case | Best Choice | Reason |
|---|---|---|
| Rich models, IDE, serialization | Pydantic | Mature ecosystem |
| Ultra-fast JSON + structs | msgspec | Great happy path |
| Fast rejection of bad data | `validatedata.validator()` | Early exit |
| Simple rules + good errors, no classes | `validatedata.validate_data()` | Low boilerplate |
| Maximum control | Manual checks | Simplicity |
## Final Thoughts
Pydantic is still excellent for many applications. However, if you work with public APIs, webhooks, or high-volume data where bad input is common, the performance difference on invalid data can be significant.
validatedata offers a compelling combination: clean pipe syntax and a genuinely fast rejection path.
Try it:

```shell
pip install validatedata
```