Preecha

How to Use the Claude Opus 4.7 API?

TL;DR

Claude Opus 4.7 (claude-opus-4-7) is Anthropic’s most capable GA model. It supports a 1M token context window, 128K max output, adaptive thinking, a new xhigh effort level, task budgets, high-res vision up to 3.75 MP, and tool use. This guide shows how to set up the API and implement the main capabilities in Python, TypeScript, and cURL.


Introduction

Anthropic released Claude Opus 4.7 on April 16, 2026. It is the most powerful model in the Claude family and is designed for complex reasoning, autonomous agents, and vision-heavy workflows.

If you already use the Claude API, the Messages API will look familiar. The main code changes are:

  • Extended thinking budgets (budget_tokens) are no longer supported.
  • Sampling parameters (temperature, top_p, and top_k) are no longer supported.
  • Thinking is adaptive-only and off by default.
  • display: "summarized" is required if you want thinking content returned.
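Taken together, the migration amounts to a payload change like the following. Both request bodies are illustrative; the old-style thinking and sampling fields are shown only for contrast with what Opus 4.7 accepts.

```python
# Opus 4.6-style request body (fields marked "removed" are rejected by Opus 4.7):
old_payload = {
    "model": "claude-opus-4-6",
    "max_tokens": 4096,
    "temperature": 0.7,                                       # removed in 4.7
    "thinking": {"type": "enabled", "budget_tokens": 8000},   # removed in 4.7
    "messages": [{"role": "user", "content": "..."}],
}

# Opus 4.7-style equivalent:
new_payload = {
    "model": "claude-opus-4-7",
    "max_tokens": 4096,
    "thinking": {"type": "adaptive", "display": "summarized"},
    "messages": [{"role": "user", "content": "..."}],
}
```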

This guide walks through API setup, authentication, basic requests, adaptive thinking, high-resolution images, tool use, task budgets, streaming, prompt caching, and multi-turn conversations. It also shows how to test these payloads with Apidog.

Getting Started

1. Get your API key

Create an API key from Anthropic Console:

  • Sign up at console.anthropic.com
  • Open API Keys
  • Click Create Key
  • Copy the key

Store it as an environment variable:

export ANTHROPIC_API_KEY="sk-ant-your-key-here"

2. Install the SDK

Python:

pip install anthropic

TypeScript / Node.js:

npm install @anthropic-ai/sdk

3. Use the Messages API endpoint

All requests go to:

POST https://api.anthropic.com/v1/messages

Required headers:

x-api-key: YOUR_API_KEY
anthropic-version: 2023-06-01
content-type: application/json

Basic Text Request

Use this as your smoke test before adding tools, images, streaming, or thinking.

Python

import anthropic

client = anthropic.Anthropic()

message = client.messages.create(
    model="claude-opus-4-7",
    max_tokens=1024,
    messages=[
        {
            "role": "user",
            "content": "Explain how HTTP/2 server push works in three sentences."
        }
    ]
)

print(message.content[0].text)

TypeScript

import Anthropic from "@anthropic-ai/sdk";

const client = new Anthropic();

const message = await client.messages.create({
  model: "claude-opus-4-7",
  max_tokens: 1024,
  messages: [
    {
      role: "user",
      content: "Explain how HTTP/2 server push works in three sentences.",
    },
  ],
});

console.log(message.content[0].text);

cURL

curl https://api.anthropic.com/v1/messages \
  -H "x-api-key: $ANTHROPIC_API_KEY" \
  -H "anthropic-version: 2023-06-01" \
  -H "content-type: application/json" \
  -d '{
    "model": "claude-opus-4-7",
    "max_tokens": 1024,
    "messages": [
      {
        "role": "user",
        "content": "Explain how HTTP/2 server push works in three sentences."
      }
    ]
  }'

Adaptive Thinking

Adaptive thinking lets Claude allocate reasoning tokens dynamically based on task complexity.

It is not enabled by default. Add a thinking object to the request:

message = client.messages.create(
    model="claude-opus-4-7",
    max_tokens=16384,
    thinking={
        "type": "adaptive",
        "display": "summarized"
    },
    messages=[
        {
            "role": "user",
            "content": """Analyze this algorithm's time complexity and suggest optimizations:

def find_pairs(arr, target):
    result = []
    for i in range(len(arr)):
        for j in range(i+1, len(arr)):
            if arr[i] + arr[j] == target:
                result.append((arr[i], arr[j]))
    return result"""
        }
    ]
)

for block in message.content:
    if block.type == "thinking":
        print("Thinking:", block.thinking)
    elif block.type == "text":
        print("Response:", block.text)

Key implementation notes:

  • Use thinking={"type": "adaptive"} to enable adaptive thinking.
  • Do not set budget_tokens; it returns a 400 error.
  • Use display: "summarized" if you want thinking content in the response.
  • If display is omitted, thinking is not returned.
  • Use output_config.effort to influence reasoning depth.

Control reasoning depth with effort

message = client.messages.create(
    model="claude-opus-4-7",
    max_tokens=16384,
    thinking={"type": "adaptive"},
    output_config={"effort": "xhigh"},
    messages=[
        {
            "role": "user",
            "content": "Review this pull request for security vulnerabilities..."
        }
    ]
)

Supported effort levels:

| Level | Best for |
|--------|----------|
| xhigh | Coding, agentic tasks, complex reasoning |
| high | Most intelligence-sensitive work |
| medium | Balanced speed vs. quality |
| low | Simple tasks and fast responses |
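One way to apply the table in code is a small helper that picks an effort level from a broad task category. The category names here are this guide's own labels for illustration, not an API concept:

```python
# Map task categories to effort levels, following the table above.
# The category names are illustrative labels, not API parameters.
EFFORT_BY_TASK = {
    "coding": "xhigh",
    "agentic": "xhigh",
    "analysis": "high",
    "summarization": "medium",
    "classification": "low",
}

def pick_effort(task_type: str) -> str:
    """Return an effort level for output_config, defaulting to 'high'."""
    return EFFORT_BY_TASK.get(task_type, "high")
```

You would then pass the result as `output_config={"effort": pick_effort("coding")}`.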

High-Resolution Vision

Opus 4.7 accepts images up to 2,576 pixels on the long edge, or 3.75 megapixels. Coordinates map 1:1 to actual pixels.

Analyze an image from a URL

message = client.messages.create(
    model="claude-opus-4-7",
    max_tokens=4096,
    messages=[
        {
            "role": "user",
            "content": [
                {
                    "type": "image",
                    "source": {
                        "type": "url",
                        "url": "https://example.com/architecture-diagram.png"
                    }
                },
                {
                    "type": "text",
                    "text": "Describe this architecture diagram. List every service and the connections between them."
                }
            ]
        }
    ]
)

print(message.content[0].text)

Analyze a local image with base64

import base64
import anthropic

client = anthropic.Anthropic()

with open("screenshot.png", "rb") as f:
    image_data = base64.standard_b64encode(f.read()).decode("utf-8")

message = client.messages.create(
    model="claude-opus-4-7",
    max_tokens=4096,
    messages=[
        {
            "role": "user",
            "content": [
                {
                    "type": "image",
                    "source": {
                        "type": "base64",
                        "media_type": "image/png",
                        "data": image_data
                    }
                },
                {
                    "type": "text",
                    "text": "What UI bugs do you see in this screenshot?"
                }
            ]
        }
    ]
)

print(message.content[0].text)

Higher-resolution images consume more tokens. Resize images before sending them if you do not need full visual fidelity.
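If you want to stay under the limits mentioned above without guessing, you can compute the downscale target first. This is pure arithmetic; the 2,576 px long-edge and 3.75 MP figures come from this guide, and you would apply the result with an image library such as Pillow before base64-encoding:

```python
import math

MAX_LONG_EDGE = 2576        # pixels on the longest side
MAX_PIXELS = 3_750_000      # 3.75 megapixels

def target_size(width: int, height: int) -> tuple[int, int]:
    """Return (width, height) scaled to fit both limits; unchanged if already within them."""
    scale = min(
        1.0,
        MAX_LONG_EDGE / max(width, height),       # long-edge constraint
        math.sqrt(MAX_PIXELS / (width * height)), # total-pixel constraint
    )
    return (int(width * scale), int(height * scale))
```

For example, with Pillow: `img.resize(target_size(*img.size))`.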

Tool Use

Tool use lets Claude call functions you define. Opus 4.7 tends to use fewer tool calls by default and may prefer reasoning. Increase effort when you want stronger tool-use behavior.

Define a tool

tools = [
    {
        "name": "get_weather",
        "description": "Get current weather for a city. Returns temperature, conditions, and humidity.",
        "input_schema": {
            "type": "object",
            "properties": {
                "city": {
                    "type": "string",
                    "description": "City name, e.g. 'San Francisco'"
                },
                "units": {
                    "type": "string",
                    "enum": ["celsius", "fahrenheit"],
                    "description": "Temperature unit"
                }
            },
            "required": ["city"]
        }
    }
]

Run a tool-use request

import json
import anthropic

client = anthropic.Anthropic()

tools = [
    {
        "name": "get_weather",
        "description": "Get current weather for a city. Returns temperature, conditions, and humidity.",
        "input_schema": {
            "type": "object",
            "properties": {
                "city": {
                    "type": "string",
                    "description": "City name, e.g. 'San Francisco'"
                },
                "units": {
                    "type": "string",
                    "enum": ["celsius", "fahrenheit"],
                    "description": "Temperature unit"
                }
            },
            "required": ["city"]
        }
    }
]

messages = [
    {
        "role": "user",
        "content": "What's the weather like in Tokyo right now?"
    }
]

# First call: Claude requests a tool
response = client.messages.create(
    model="claude-opus-4-7",
    max_tokens=1024,
    tools=tools,
    messages=messages,
)

if response.stop_reason == "tool_use":
    messages.append({
        "role": "assistant",
        "content": response.content
    })

    tool_results = []

    for block in response.content:
        if block.type == "tool_use":
            # Execute your real function here.
            result = {
                "temperature": 22,
                "conditions": "Partly cloudy",
                "humidity": 65
            }

            tool_results.append({
                "type": "tool_result",
                "tool_use_id": block.id,
                "content": json.dumps(result)
            })

    messages.append({
        "role": "user",
        "content": tool_results
    })

    # Second call: Claude uses the tool result
    final_response = client.messages.create(
        model="claude-opus-4-7",
        max_tokens=1024,
        tools=tools,
        messages=messages,
    )

    print(final_response.content[0].text)

Agentic Loop Pattern

For autonomous agents, keep calling the model until it stops requesting tools.

def run_agent(system_prompt: str, tools: list, user_message: str) -> str:
    messages = [
        {
            "role": "user",
            "content": user_message
        }
    ]

    while True:
        response = client.messages.create(
            model="claude-opus-4-7",
            max_tokens=16384,
            system=system_prompt,
            tools=tools,
            thinking={"type": "adaptive"},
            output_config={"effort": "xhigh"},
            messages=messages,
        )

        messages.append({
            "role": "assistant",
            "content": response.content
        })

        if response.stop_reason != "tool_use":
            return "".join(
                block.text
                for block in response.content
                if hasattr(block, "text")
            )

        tool_results = []

        for block in response.content:
            if block.type == "tool_use":
                result = execute_tool(block.name, block.input)

                tool_results.append({
                    "type": "tool_result",
                    "tool_use_id": block.id,
                    "content": result,
                })

        messages.append({
            "role": "user",
            "content": tool_results
        })
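The loop above assumes an `execute_tool` helper. A minimal dispatch-table sketch follows; the `get_weather` handler returns canned data purely for illustration, and in practice each handler would call your real service:

```python
import json

def get_weather(city: str, units: str = "celsius") -> dict:
    # Placeholder: call your real weather service here.
    return {"city": city, "temperature": 22, "conditions": "Partly cloudy", "units": units}

# Map tool names (as defined in your tools list) to Python handlers.
TOOL_HANDLERS = {
    "get_weather": get_weather,
}

def execute_tool(name: str, tool_input: dict) -> str:
    """Dispatch a tool_use block to its handler; tool_result content is a string."""
    handler = TOOL_HANDLERS.get(name)
    if handler is None:
        return json.dumps({"error": f"unknown tool: {name}"})
    return json.dumps(handler(**tool_input))
```

Returning an error payload for unknown tools (rather than raising) lets the model see the failure and recover within the loop.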

Task Budgets (Beta)

Task budgets give Claude a token allowance for an entire agentic loop. The model sees a running countdown and can wrap up work as the budget is consumed.

response = client.beta.messages.create(
    model="claude-opus-4-7",
    max_tokens=128000,
    output_config={
        "effort": "high",
        "task_budget": {
            "type": "tokens",
            "total": 128000
        },
    },
    messages=[
        {
            "role": "user",
            "content": "Review the codebase and propose a refactor plan."
        }
    ],
    betas=["task-budgets-2026-03-13"],
)

Important constraints:

  • Minimum budget: 20,000 tokens
  • Advisory, not a hard cap: Claude may overshoot it
  • Different from max_tokens, which is a hard ceiling the model cannot see
  • Requires the task-budgets-2026-03-13 beta (the betas parameter in the SDK, or the anthropic-beta header)

Streaming Responses

Use streaming for chat UIs, CLIs, and long-running responses.

Python

with client.messages.stream(
    model="claude-opus-4-7",
    max_tokens=4096,
    messages=[
        {
            "role": "user",
            "content": "Write a Python function to parse CSV files with error handling."
        }
    ]
) as stream:
    for text in stream.text_stream:
        print(text, end="", flush=True)

TypeScript

const stream = await client.messages.stream({
  model: "claude-opus-4-7",
  max_tokens: 4096,
  messages: [
    {
      role: "user",
      content: "Write a Python function to parse CSV files with error handling.",
    },
  ],
});

for await (const event of stream) {
  if (
    event.type === "content_block_delta" &&
    event.delta.type === "text_delta"
  ) {
    process.stdout.write(event.delta.text);
  }
}

If adaptive thinking is enabled with display: "summarized", thinking blocks stream before the final text response.

If display is omitted, users may see a pause while the model reasons, followed by the text response.

Prompt Caching

Use prompt caching for repeated context, such as long system prompts, codebase summaries, or documents.

response = client.messages.create(
    model="claude-opus-4-7",
    max_tokens=1024,
    system=[
        {
            "type": "text",
            "text": "You are a senior code reviewer. Review code for security vulnerabilities, performance issues, and best practices violations...",
            "cache_control": {
                "type": "ephemeral"
            }
        }
    ],
    messages=[
        {
            "role": "user",
            "content": """Review this function:

def process_user_input(data):
    return eval(data)"""
        }
    ]
)

Cache pricing for Opus 4.7:

| Operation | Cost |
|-----------|------|
| 5-minute cache write | $6.25 / MTok (1.25x base) |
| 1-hour cache write | $10 / MTok (2x base) |
| Cache read (hit) | $0.50 / MTok (0.1x base) |

The economics favor caching quickly: each cache hit saves $4.50/MTok versus fresh input, so a single hit more than covers the $1.25/MTok premium of a 5-minute write, and two hits cover the $5/MTok premium of a 1-hour write.
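Spelled out, the break-even arithmetic looks like this (prices per million tokens from the table, with savings measured against the $5/MTok base input rate):

```python
BASE_INPUT = 5.00       # $/MTok, fresh input
CACHE_READ = 0.50       # $/MTok, cache hit
WRITE_5MIN = 6.25       # $/MTok, 5-minute cache write
WRITE_1HR = 10.00       # $/MTok, 1-hour cache write

saving_per_read = BASE_INPUT - CACHE_READ    # 4.50 saved per cached read
premium_5min = WRITE_5MIN - BASE_INPUT       # 1.25 extra paid at write time
premium_1hr = WRITE_1HR - BASE_INPUT         # 5.00 extra paid at write time

# One read recoups the 5-minute premium; two reads recoup the 1-hour premium.
assert saving_per_read > premium_5min
assert 2 * saving_per_read > premium_1hr > saving_per_read
```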

Multi-Turn Conversations

Maintain conversation state by appending each user and assistant turn to the messages array.

messages = []

# Turn 1
messages.append({
    "role": "user",
    "content": "I need to build a REST API for a todo app."
})

response = client.messages.create(
    model="claude-opus-4-7",
    max_tokens=4096,
    messages=messages,
)

messages.append({
    "role": "assistant",
    "content": response.content
})

# Turn 2
messages.append({
    "role": "user",
    "content": "Add authentication with JWT tokens."
})

response = client.messages.create(
    model="claude-opus-4-7",
    max_tokens=4096,
    messages=messages,
)

print(response.content[0].text)
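For long sessions, the messages array grows with every turn, and so does your input-token bill. A simple trimming helper keeps the history bounded; the `keep_turns` cutoff is an arbitrary illustration, and a summarization step would preserve more context:

```python
def trim_history(messages: list[dict], keep_turns: int = 10) -> list[dict]:
    """Keep only the most recent conversation turns.

    One turn = one user message plus one assistant reply, so we keep
    the last 2 * keep_turns entries. The history must start with a
    user message, so drop a leading assistant entry if trimming
    created one.
    """
    trimmed = messages[-2 * keep_turns:]
    if trimmed and trimmed[0]["role"] == "assistant":
        trimmed = trimmed[1:]
    return trimmed
```

Call it before each `messages.create` once conversations get long.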

Testing Your API Calls with Apidog

Building a Claude API integration usually involves complex payloads: multi-turn messages, tool definitions, tool results, base64 images, beta headers, and streaming responses. Apidog can help you inspect and debug those requests visually.


Set up a Claude API request in Apidog:

  1. Create a new project in Apidog.
  2. Add the Claude Messages API endpoint.
  3. Store ANTHROPIC_API_KEY as an environment variable.
  4. Add the required headers:
    • x-api-key
    • anthropic-version
    • content-type
  5. Save reusable request bodies for basic text, vision, tool use, and streaming scenarios.

Test tool-use flows

Tool use usually requires at least two API calls:

  1. Send the initial user message.
  2. Inspect Claude’s tool_use block.
  3. Execute your function outside the model.
  4. Send a tool_result block back.
  5. Read Claude’s final answer.

Apidog lets you chain these requests so you can simulate the full loop and inspect each payload.

Compare models

Run the same request against claude-opus-4-6 and claude-opus-4-7 to compare:

  • Token counts
  • Response quality
  • Latency
  • Tool-use behavior

Apidog’s test runner makes these comparisons repeatable.

Validate schemas

Define JSON schemas for expected response formats and validate responses automatically. This helps catch regressions when you change prompts, tools, or model versions.

Common Errors and Fixes

| Error | Cause | Fix |
|-------|-------|-----|
| 400: thinking.budget_tokens not supported | Using extended thinking syntax | Switch to thinking: {"type": "adaptive"} |
| 400: temperature not supported | Setting unsupported sampling parameters | Remove temperature, top_p, and top_k |
| 400: max_tokens exceeded | New tokenizer produces more tokens | Increase max_tokens, up to 128,000 |
| 429: Rate limited | Too many requests | Implement exponential backoff and check your tier limits |
| Blank thinking blocks | Thinking display defaults to omitted | Add display: "summarized" to the thinking config |
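For the 429 case, a minimal exponential-backoff wrapper looks like this. The retry counts and delay bounds are illustrative; in practice you would pass the SDK's rate-limit exception (e.g. `anthropic.RateLimitError`) as the retryable type:

```python
import random
import time

def backoff_delays(retries: int = 5, base: float = 1.0, cap: float = 30.0):
    """Yield exponentially growing delays with full jitter, capped at `cap` seconds."""
    for attempt in range(retries):
        yield random.uniform(0, min(cap, base * 2 ** attempt))

def with_retries(fn, retryable=(Exception,), retries=5):
    """Call fn(), retrying on the given exception types with backoff."""
    last_exc = None
    for delay in backoff_delays(retries):
        try:
            return fn()
        except retryable as exc:
            last_exc = exc
            time.sleep(delay)
    raise last_exc
```

Usage would be `with_retries(lambda: client.messages.create(...), retryable=(anthropic.RateLimitError,))`.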

Pricing Reference

| Usage | Cost |
|-------|------|
| Input tokens | $5 / MTok |
| Output tokens | $25 / MTok |
| Batch input | $2.50 / MTok |
| Batch output | $12.50 / MTok |
| Cache reads | $0.50 / MTok |
| 5-minute cache writes | $6.25 / MTok |
| 1-hour cache writes | $10 / MTok |

Opus 4.7’s new tokenizer may use up to 35% more tokens for the same text compared to Opus 4.6. Use the /v1/messages/count_tokens endpoint to estimate costs before production deployment.
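A rough worst-case estimate that accounts for the tokenizer change can be computed directly; the 35% figure is the upper bound quoted above, and count_tokens remains the source of truth for real numbers:

```python
PRICE_INPUT = 5.00 / 1_000_000    # $ per input token
PRICE_OUTPUT = 25.00 / 1_000_000  # $ per output token
TOKENIZER_INFLATION = 1.35        # worst-case token growth vs. Opus 4.6

def estimate_cost(opus46_input_tokens: int, expected_output_tokens: int) -> float:
    """Worst-case dollar estimate for a request migrated from Opus 4.6."""
    input_tokens = opus46_input_tokens * TOKENIZER_INFLATION
    return input_tokens * PRICE_INPUT + expected_output_tokens * PRICE_OUTPUT
```

A prompt that was 1M input tokens on Opus 4.6 could cost up to $6.75 in input on Opus 4.7 rather than $5.00.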

Conclusion

Claude Opus 4.7 keeps the familiar Messages API shape but changes how reasoning is configured. Remove extended thinking budgets and unsupported sampling parameters, then use adaptive thinking, effort, task budgets, high-resolution vision, and tool use where they fit your workflow.

A practical implementation path:

  1. Start with a basic text request.
  2. Add adaptive thinking for complex reasoning.
  3. Add tool use for external actions and data retrieval.
  4. Use task budgets for long-running agentic loops.
  5. Stream responses for better UX.
  6. Use prompt caching for repeated context.
  7. Test requests, tool loops, and schemas with Apidog before shipping.
