Your microservice calls an external API. It fails.
You add Resilience4j. Now you have 10 transitive dependencies, a 500KB JAR, 3 config classes, and a @Bean method just to say "try 3 times, wait 1 second".
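To make that concrete, here is roughly the kind of wiring that means with Resilience4j under Spring; a sketch from memory with illustrative class and bean names, not a drop-in config:

```java
import io.github.resilience4j.retry.Retry;
import io.github.resilience4j.retry.RetryConfig;
import io.github.resilience4j.retry.RetryRegistry;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;

import java.time.Duration;

@Configuration
class ExternalApiRetryConfig {

    @Bean
    RetryRegistry retryRegistry() {
        // "try 3 times, wait 1 second" as a registry-wide default
        return RetryRegistry.of(RetryConfig.custom()
                .maxAttempts(3)
                .waitDuration(Duration.ofSeconds(1))
                .build());
    }

    @Bean
    Retry externalApiRetry(RetryRegistry registry) {
        return registry.retry("external-api");
    }
}
```

And that bean still has to be applied at every call site, via `Retry.decorateSupplier(...)` or the annotation module, which is where the extra config classes tend to creep in.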
I hit that wall one too many times. So I built RetryKit — a zero-dependency Java 17 retry & circuit breaker library. No Spring required. No transitive pulls. JAR under 50KB.
```xml
<dependency>
    <groupId>io.github.caninaam</groupId>
    <artifactId>retrykit</artifactId>
    <version>1.0.1</version>
</dependency>
```
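Or with Gradle, using the same coordinates:

```groovy
implementation("io.github.caninaam:retrykit:1.0.1")
```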
Fixed delay retry with fallback — the most common case:
```java
String result = RetryKit.<String>retry()
        .maxAttempts(3)
        .waitDuration(Duration.ofSeconds(1))
        .fallback(ctx -> "default")
        .call(() -> myService.call());
```
Exponential backoff with jitter — for avoiding a thundering herd under load. With the settings below, waits start at 500 ms, double on each attempt, are capped at 10 seconds, and each one is randomized by the 0.2 jitter factor:
```java
RetryKit.<String>retry()
        .maxAttempts(4)
        .exponentialBackoff(Duration.ofMillis(500), 2.0, Duration.ofSeconds(10))
        .withJitter(0.2)
        .call(() -> myService.call());
```
Retry only on specific exceptions — don't retry a 400 Bad Request:
```java
RetryKit.<String>retry()
        .maxAttempts(3)
        .retryOn(IOException.class, HttpServerErrorException.class)
        .call(() -> myService.call());
```
Most retry libraries make you choose: retry OR circuit breaker OR timeout. In real production systems you need all three, composed in the right order. RetryKit lets you express that as a single string:
```java
RetryKit.<String>retry()
        .pipeline("TIMEOUT(3s) > RETRY(3) > CB(50%)")
        .call(() -> myService.call());
```
Read it left to right — outermost wrapper first:

- `TIMEOUT(3s)` — each attempt must complete within 3 seconds.
- `RETRY(3)` — retry up to 3 times on failure.
- `CB(50%)` — open the circuit if 50%+ of calls fail.
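Since `pipeline(...)` and `fallback(...)` live on the same builder as the earlier examples, combining them should look roughly like this; a sketch, so verify the exact composition against the RetryKit docs:

```java
String result = RetryKit.<String>retry()
        .pipeline("TIMEOUT(3s) > RETRY(3) > CB(50%)")
        .fallback(ctx -> "cached-default")  // assumed to kick in once the whole pipeline gives up
        .call(() -> myService.call());
```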
The full DSL syntax:
```text
TIMEOUT(5s)
TIMEOUT(500ms)
RETRY(3)
RETRY(maxAttempts:5, waitDuration:500ms)
RETRY(maxAttempts:4, waitDuration:1s, backoff:2.0, maxWait:10s, jitter:0.2)
CB(50%)
CB(failureRate:50%, minCalls:5, wait:1m, halfOpen:2, timeout:2s)
```
A realistic production pipeline in one line:
.pipeline("TIMEOUT(2s) > RETRY(maxAttempts:4, waitDuration:500ms, backoff:2.0, jitter:0.2) > CB(failureRate:60%, minCalls:10, wait:30s)")
This used to take 40+ lines of config. Now it's one.
For teams that want to tune retry behavior without redeploying:
```yaml
production:
  mode: RETRY_FIRST
  maxAttempts: 3
  waitDuration: PT1S
  circuitBreaker:
    failureRateThreshold: 50
    minimumNumberOfCalls: 5
    waitDurationInOpenState: PT1M

aggressive:
  mode: PIPELINE
  pipeline: "TIMEOUT(2s) > RETRY(3) > CB(50%)"
```
Load it in your app:
```java
RetryKit.<String>fromYaml("/etc/myapp/retrykit.yaml")
        .profile("production")
        .<String>as()
        .withHotReload(Duration.ofSeconds(10))
        .fallback(ctx -> "fallback")
        .build();
```
Change maxAttempts or failureRateThreshold in the file — the running service picks it up in 10 seconds. No restart, no redeploy.
Two workflow modes, depending on your use case:

- `RETRY_FIRST` — the retry exhausts its attempts, and the circuit breaker accumulates failures across those retries.
- `CB_FIRST` — the circuit breaker is checked before any retry attempt; if it is OPEN, the call fails immediately and no retries are wasted.

Use `CB_FIRST` when the downstream service is known to be down; a hypothetical profile for that case is sketched below.
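A hypothetical `CB_FIRST` profile in the same YAML format as above (the profile name and values are illustrative; the field names are copied from the `production` profile):

```yaml
flaky-downstream:
  mode: CB_FIRST
  maxAttempts: 3
  waitDuration: PT0.5S
  circuitBreaker:
    failureRateThreshold: 50
    minimumNumberOfCalls: 5
    waitDurationInOpenState: PT30S
```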
Two distinct exceptions so you always know what happened:
```java
try {
    kit.call(() -> myService.call());
} catch (RetryException e) {
    // service WAS called — all e.attempts() failed
} catch (CircuitBreakerOpenException e) {
    // service was NOT called — CB is open, we already know it's down
}
```
How does RetryKit compare?
| | RetryKit | Resilience4j |
|---|---|---|
| Dependencies | 0 | ~10 |
| JAR size | < 50 KB | ~500 KB |
| Pipeline DSL | yes | no |
| YAML hot reload | yes | no |
| Java version | 17+ | 8+ |
| Setup time | 5 min | 30 min+ |
Most enterprise retry libraries are built for every possible use case — you pay that cost even when you need none of it. RetryKit does one thing well.
GitHub: https://github.com/caninaam/retry-kit
Maven Central: https://central.sonatype.com/artifact/io.github.caninaam/retrykit
Next post — how the circuit breaker state machine works under the hood, and why compareAndSet matters in concurrent systems.
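As a tiny preview of why that matters, here is the general shape of a lock-free state flip; a conceptual sketch, not RetryKit's actual code:

```java
import java.util.concurrent.atomic.AtomicReference;

class BreakerStateSketch {
    enum State { CLOSED, OPEN, HALF_OPEN }

    private final AtomicReference<State> state = new AtomicReference<>(State.CLOSED);

    void trip() {
        // compareAndSet makes the CLOSED -> OPEN transition atomic: when many
        // failing calls race at once, exactly one thread wins the flip and can
        // schedule the open-state timer; the losers simply see OPEN already set.
        if (state.compareAndSet(State.CLOSED, State.OPEN)) {
            // winner-only work goes here
        }
    }
}
```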
Top comments (1)
Dep weight is one of those things that never shows up in code review but quietly tells you what the team actually values. We made the same call in our payments stack two years ago — retry + circuit breaker as a ~200-line utility instead of a framework — and the unexpected win wasn't bundle size, it was that every engineer could explain what happened on failure. Frameworks are great until an incident at 2am when you need to know what "exhausted" means to the retry logic. Nice ship.