Jangwook Kim

Posted on • Originally published at jangwook.net

Anthropic SDK vs OpenAI SDK: Developer Experience Compared — Type Safety, Error Handling, and Streaming Patterns

pip install anthropic openai — then I started pulling them apart side by side. anthropic 0.100.0, openai 2.36.0. The version numbers alone tell a story: Anthropic is still in 0.x territory while OpenAI is already on a 2.x branch. The numbers matter less than what they signal about design philosophy.

I ran both SDKs in a temporary sandbox, inspecting type hierarchies, error classes, streaming source code, and tool call formats directly. Here's what I found.

First Impressions: What Version Numbers Actually Tell You

anthropic 0.100.0 is its 100th minor release without ever shipping 1.0. That's either deliberate caution about declaring the API stable or a marketing choice. openai 2.36.0 has already gone through one major version bump.

Both SDKs are built on httpx internally, both stream using SSE (Server-Sent Events). Looking at top-level client initialization reveals the philosophical differences.

import anthropic
import openai
from anthropic import NOT_GIVEN

# anthropic.Anthropic() exclusive params
client = anthropic.Anthropic(
    api_key=None,
    auth_token=None,
    credentials=None,   # enterprise credential objects
    config=None,        # profile-based configuration
    profile=None,       # named profile
    webhook_key=None,
    _token_cache=NOT_GIVEN,
)

# openai.OpenAI() exclusive params
client = openai.OpenAI(
    api_key=None,
    admin_api_key=None,
    workload_identity=None,   # IAM-based auth
    organization=None,
    project=None,
    webhook_secret=None,
    websocket_base_url=None,  # for Realtime API
    _enforce_credentials=True,
)

Anthropic leans into enterprise credential management — multiple accounts via config files and named profiles. OpenAI focuses on team-level billing and IAM-based authentication through organization, project, and workload_identity.

Shared params include max_retries=2, timeout, default_headers, http_client — identical defaults on both.
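
Those shared knobs mean one transport config can serve both clients. A minimal sketch (the values are illustrative, not recommendations):

import httpx
import anthropic
import openai

# One transport config for both clients; the parameter names match.
shared = dict(
    max_retries=5,                            # both SDKs default to 2
    timeout=httpx.Timeout(60.0, connect=5.0),
)

claude = anthropic.Anthropic(**shared)
gpt = openai.OpenAI(**shared)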

Type System: 408 vs 230

This was the most surprising number from the sandbox.

anthropic.types exported types: 408
openai.types exported types: 230
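
If you want to reproduce the count, both types modules define __all__ in recent releases; the exact numbers will drift with versions:

import anthropic.types
import openai.types

# Counts are version-dependent; expect drift from the numbers above.
print("anthropic.types:", len(anthropic.types.__all__))
print("openai.types:", len(openai.types.__all__))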

The gap is significant. Claude's API returns richer response structures — ToolUseBlock, ThinkingBlock, CitationContentBlockLocation, BashCodeExecutionOutputBlock are all valid response content types, each with a matching TypedDict param. OpenAI centers around ChatCompletion with a simpler, more uniform response format.

Anthropic-exclusive types worth knowing:

Type                                  Feature
ThinkingBlock / ThinkingConfigParam   Extended Thinking
CacheControlEphemeralParam            Prompt caching (TTL: '5m' / '1h')
CitationContentBlockLocation          Citation location tracking
BashCodeExecutionOutputBlock          Code execution tool results
MemoryTool20250818Param               Agent memory tool
ServerToolCaller20260120Param         Server-side tool executor
AnthropicBetaParam                    Beta feature header control

OpenAI-exclusive:

Feature                 Description
AssistantEventHandler   Assistants API event streaming
Realtime API            WebSocket-based bidirectional streaming
Fine-tuning             Built-in fine_tuning module
OAuthError              Subclass of AuthenticationError

More types isn't strictly better — having BashCodeExecutionToolResultBlockParam and BashCodeExecutionToolResultErrorParam as separate types gives you precise autocomplete but steepens the learning curve. OpenAI's simpler shape makes onboarding faster.

Tool Call Format: input_schema vs function.parameters

The biggest API design divergence between the two.

# Anthropic tool definition
anthropic_tool = {
    "name": "get_weather",
    "description": "Get current weather",
    "input_schema": {              # JSON Schema is the root
        "type": "object",
        "properties": {
            "location": {"type": "string"}
        },
        "required": ["location"]
    }
}

# OpenAI tool definition
openai_tool = {
    "type": "function",            # extra wrapper layer
    "function": {
        "name": "get_weather",
        "description": "Get current weather",
        "parameters": {            # nested inside function
            "type": "object",
            "properties": {
                "location": {"type": "string"}
            },
            "required": ["location"]
        }
    }
}

Tool results also differ:

# Anthropic: tool result as content block in a user message
messages.append({
    "role": "user",
    "content": [
        {
            "type": "tool_result",
            "tool_use_id": "toolu_01A...",
            "content": "15°C sunny"
        }
    ]
})

# OpenAI: tool result as a distinct role message
messages.append({
    "role": "tool",
    "tool_call_id": "call_abc123",
    "content": "15°C sunny"
})

Anthropic unifies everything under content blocks — messages are just arrays of typed blocks. OpenAI uses a dedicated tool role. Anthropic's approach is arguably more consistent; OpenAI's feels more familiar to developers who've used chat APIs with role-based message history.

If you're building a multi-model router that needs to handle both, this format difference is where most of the adapter complexity lands.
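
A minimal sketch of that adapter, built only from the formats shown above. These helper names are hypothetical, not part of either SDK:

# Hypothetical adapter helpers; not part of either SDK.
def anthropic_tool_to_openai(tool: dict) -> dict:
    """Wrap an Anthropic tool definition in OpenAI's function envelope."""
    return {
        "type": "function",
        "function": {
            "name": tool["name"],
            "description": tool.get("description", ""),
            "parameters": tool["input_schema"],  # same JSON Schema, different key
        },
    }

def tool_result_message(provider: str, call_id: str, content: str) -> dict:
    """Produce the provider-appropriate tool-result message."""
    if provider == "anthropic":
        return {
            "role": "user",
            "content": [
                {"type": "tool_result", "tool_use_id": call_id, "content": content}
            ],
        }
    return {"role": "tool", "tool_call_id": call_id, "content": content}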

Error Handling Architecture: What's Shared and What Isn't

I printed both SDK error hierarchies from the sandbox directly.


# Anthropic error hierarchy (0.100.0)
APIError
├─ APIResponseValidationError
├─ APIWebhookValidationError
├─ APIStatusError
│   ├─ BadRequestError
│   ├─ AuthenticationError
│   ├─ PermissionDeniedError
│   ├─ NotFoundError
│   ├─ ConflictError
│   ├─ RequestTooLargeError     ← Anthropic-only
│   ├─ RateLimitError
│   ├─ ServiceUnavailableError  ← Anthropic-only
│   ├─ OverloadedError          ← Anthropic-only (HTTP 529)
│   ├─ DeadlineExceededError    ← Anthropic-only
│   └─ InternalServerError
└─ APIConnectionError
    └─ APITimeoutError

# OpenAI error hierarchy (2.36.0)
APIError
├─ APIResponseValidationError
├─ APIStatusError
│   ├─ BadRequestError
│   ├─ AuthenticationError
│   │   └─ OAuthError           ← OpenAI-only
│   ├─ PermissionDeniedError
│   ├─ NotFoundError
│   ├─ ConflictError
│   ├─ UnprocessableEntityError
│   ├─ RateLimitError
│   └─ InternalServerError
└─ APIConnectionError
    └─ APITimeoutError

The Anthropic-exclusive errors matter in production. OverloadedError is HTTP 529 — Claude server traffic overload. Catching it separately from RateLimitError lets you apply different backoff strategies. DeadlineExceededError is distinct from APITimeoutError — it's a processing-time issue on the server, not a connection timeout, so the fix is different. RequestTooLargeError signals an oversized request payload, which is not the same failure as exceeding the model's context window.

from anthropic import OverloadedError, RateLimitError, DeadlineExceededError
import time

def safe_call(client, **kwargs):
    try:
        return client.messages.create(**kwargs)
    except OverloadedError:
        # Server overloaded — back off longer
        time.sleep(10)
        return client.messages.create(**kwargs)
    except RateLimitError as e:
        wait = int(e.response.headers.get('retry-after', 60))
        time.sleep(wait)
        return client.messages.create(**kwargs)
    except DeadlineExceededError:
        # Processing time exceeded — split into smaller requests
        raise

Both SDKs default to max_retries=2 and auto-retry on 429 and 5xx errors.
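
You can also raise the budget per call instead of globally; both SDKs expose with_options for exactly this (model name reused from the examples in this post):

# with_options returns a client copy with overridden settings for this call.
response = client.with_options(max_retries=5, timeout=120.0).messages.create(
    model="claude-opus-4-7",
    max_tokens=1024,
    messages=[{"role": "user", "content": "Hello"}],
)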

Streaming Pattern: Identical Core, Different Surface

I read both streaming implementations from source files in the sandbox. The Stream.__iter__ implementations are nearly identical.

# anthropic._streaming.Stream (actual source)
class Stream(Generic[_T], metaclass=_SyncStreamMeta):
    response: httpx.Response
    _decoder: SSEBytesDecoder

    def __iter__(self) -> Iterator[_T]:
        for item in self._iterator:
            yield item

    def _iter_events(self) -> Iterator[ServerSentEvent]:
        yield from self._decoder.iter_bytes(self.response.iter_bytes())

# openai._streaming.Stream (nearly identical)
class Stream(Generic[_T]):  # no metaclass
    response: httpx.Response
    _decoder: SSEBytesDecoder

    def __iter__(self) -> Iterator[_T]:
        for item in self._iterator:
            yield item

    def _iter_events(self) -> Iterator[ServerSentEvent]:
        yield from self._decoder.iter_bytes(self.response.iter_bytes())

The only structural difference is that Anthropic uses a _SyncStreamMeta metaclass. The usage APIs differ more meaningfully:

# Anthropic streaming
with client.messages.stream(
    model="claude-opus-4-7",
    max_tokens=1024,
    messages=[{"role": "user", "content": "Hello"}]
) as stream:
    for text in stream.text_stream:   # clean text extraction
        print(text, end="", flush=True)
    final = stream.get_final_message()

# OpenAI streaming (the .stream() helper yields typed events, not raw chunks)
with client.chat.completions.stream(
    model="gpt-5",
    messages=[{"role": "user", "content": "Hello"}]
) as stream:
    for event in stream:
        if event.type == "content.delta":  # filter event types for text
            print(event.delta, end="", flush=True)
    final = stream.get_final_completion()

stream.text_stream on the Anthropic side is ergonomically nicer for pure text output; on the OpenAI side you either filter typed events or, with the raw create(stream=True) path, walk chunk.choices[0].delta yourself. The Vercel AI SDK approach to Claude streaming agents builds on a similar pattern and is worth reading if you're taking streaming to production.
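
The same ergonomics carry over to the async client. A sketch with AsyncAnthropic:

import asyncio
import anthropic

async def main() -> None:
    client = anthropic.AsyncAnthropic()
    async with client.messages.stream(
        model="claude-opus-4-7",
        max_tokens=1024,
        messages=[{"role": "user", "content": "Hello"}],
    ) as stream:
        async for text in stream.text_stream:  # async variant of text_stream
            print(text, end="", flush=True)

asyncio.run(main())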

Anthropic-Exclusive: Prompt Caching, Extended Thinking, Citations

Prompt Caching

CacheControlEphemeralParam has a ttl field — '5m' or '1h'. This was new to me. Previously the short-lived ephemeral cache was the only option; now you can extend the expiry to an hour.

client.messages.create(
    model="claude-opus-4-7",
    max_tokens=1024,   # required by the Messages API
    system=[
        {
            "type": "text",
            "text": "Very long system document... (tens of thousands of tokens)",
            "cache_control": {"type": "ephemeral", "ttl": "1h"}
        }
    ],
    messages=[{"role": "user", "content": "Summarize"}]
)

The practical impact is real — repeated API calls against the same large document see significant token cost reductions on cached portions. The Claude API Prompt Caching guide covers the four patterns for applying this in production.
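
To confirm the cache is actually engaging, check the usage accounting on the response. The two cache fields below appear on the Messages API usage object when caching is active; treat them as a sketch to verify against your SDK version:

# Re-issue the cached request from above, capturing the response this time.
response = client.messages.create(
    model="claude-opus-4-7",
    max_tokens=1024,
    system=[{
        "type": "text",
        "text": "Very long system document... (tens of thousands of tokens)",
        "cache_control": {"type": "ephemeral", "ttl": "1h"}
    }],
    messages=[{"role": "user", "content": "Summarize"}]
)

usage = response.usage
print("written to cache:", usage.cache_creation_input_tokens)  # first call
print("read from cache:", usage.cache_read_input_tokens)       # repeat calls
print("uncached input tokens:", usage.input_tokens)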

Extended Thinking

ThinkingConfigParam and ThinkingBlock let Claude expose its reasoning chain as structured output. OpenAI has no equivalent.

response = client.messages.create(
    model="claude-opus-4-7",
    max_tokens=16000,  # must exceed the thinking budget
    thinking={"type": "enabled", "budget_tokens": 10000},
    messages=[{"role": "user", "content": "Complex math problem"}]
)

for block in response.content:
    if block.type == "thinking":
        print("Reasoning:", block.thinking)  # structured, typed
    elif block.type == "text":
        print("Answer:", block.text)

Citations

CitationContentBlockLocation tracks which document fragment Claude drew from. Useful for RAG systems where you need to surface source attribution alongside answers.
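
A sketch of what consuming citations looks like. The document block shape follows Anthropic's public citations docs, but verify the exact field names against your SDK version:

response = client.messages.create(
    model="claude-opus-4-7",
    max_tokens=1024,
    messages=[{
        "role": "user",
        "content": [
            {
                "type": "document",
                "source": {
                    "type": "text",
                    "media_type": "text/plain",
                    "data": "The sky appears blue because of Rayleigh scattering.",
                },
                "citations": {"enabled": True},
            },
            {"type": "text", "text": "Why is the sky blue?"},
        ],
    }],
)

for block in response.content:
    if block.type == "text" and block.citations:
        for citation in block.citations:
            print("source fragment:", citation.cited_text)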

OpenAI-Exclusive: Assistants API, Realtime, Fine-Tuning

Assistants API — AssistantEventHandler enables stateful agents combining file search, code interpreter, and custom functions. Anthropic has no SDK-level abstraction equivalent.

Realtime API — that websocket_base_url parameter exists for a reason. WebSocket-based bidirectional streaming is supported directly by the SDK. Useful for voice agents and live interactive experiences.

Fine-tuning — OpenAI's fine_tuning module lets you manage tuning jobs from within the SDK. Anthropic's fine-tuning is a separate enterprise contract path with no public SDK interface.

OAuthError — a subtype of AuthenticationError, useful when your auth flow uses OAuth and you need to distinguish OAuth failures from standard key-based auth failures.
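
Since OAuthError subclasses AuthenticationError, except-clause ordering lets you branch on the failure mode. The import path here is assumed from the hierarchy printed earlier:

from openai import AuthenticationError, OAuthError

try:
    client.chat.completions.create(
        model="gpt-5",
        messages=[{"role": "user", "content": "Hello"}],
    )
except OAuthError:
    # OAuth token expired or revoked: trigger a re-auth flow
    raise
except AuthenticationError:
    # plain API-key failure: rotate or fix the key
    raise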

Which SDK to Pick

The honest answer: it depends on which model you're using, and which features you actually need.

                     Anthropic SDK 0.100.0   OpenAI SDK 2.36.0
Exported types       408                     230
Error classes        16 (includes 529)       13
Default max_retries  2                       2
Streaming core       httpx + SSE             httpx + SSE
Prompt caching       ✓ SDK-level             ✗
Extended thinking    ✓                       ✗
Realtime API         ✗                       ✓
Assistants API       ✗                       ✓
Fine-tuning          ✗                       ✓
Citations            ✓                       ✗
Tool result format   content block           tool role message

Pick Anthropic SDK when: your workload is long-context document processing (use prompt caching to cut costs), you need reasoning transparency (Extended Thinking), you're building document QA with citation tracking, or your team prioritizes type safety.

Pick OpenAI SDK when: you need voice interfaces or live interaction (Realtime API), you want the Assistants API's file search + code interpreter combination, you need org/project-level billing separation, or you're fine-tuning and managing custom models.

Running both? Put an abstraction layer in front. PydanticAI handles multi-provider routing at the agent level, which lets you avoid propagating tool format differences throughout your codebase.
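
If you'd rather roll your own layer than adopt a framework, the interface can be a single Protocol. Everything below is hypothetical glue code, not from either SDK:

from typing import Protocol

class ChatBackend(Protocol):
    def complete(self, prompt: str) -> str: ...

class ClaudeBackend:
    def __init__(self, client):  # anthropic.Anthropic
        self.client = client

    def complete(self, prompt: str) -> str:
        response = self.client.messages.create(
            model="claude-opus-4-7",
            max_tokens=1024,
            messages=[{"role": "user", "content": prompt}],
        )
        # Assumes the first block is text; real code should check block.type.
        return response.content[0].text

class GPTBackend:
    def __init__(self, client):  # openai.OpenAI
        self.client = client

    def complete(self, prompt: str) -> str:
        response = self.client.chat.completions.create(
            model="gpt-5",
            messages=[{"role": "user", "content": prompt}],
        )
        return response.choices[0].message.content

# Business logic depends on ChatBackend, never on a vendor SDK.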

What This Comparison Actually Reveals

An SDK isn't just a wrapper around an HTTP API. It's a design document that shows what the company thinks developers will do with the model.

Anthropic's 408 types tell you: we expect you to care about caching tokens, tracking citations, and introspecting reasoning chains. OpenAI's Realtime API and Assistants tell you: we expect you to build live user-facing experiences and stateful conversational systems.

I use the Anthropic SDK for Claude-based projects and pull in OpenAI only where I specifically need Realtime or fine-tuned models. When I need both in the same codebase, I isolate the SDK calls behind an interface layer so the tool format differences don't leak into business logic.

The SDK you pick should follow from the model choice, not lead it.
