You know the OSINT workflow. Open a terminal. Run holehe against an email. Copy a username you found. Switch tools. Run sherlock. Open a browser. Check HaveIBeenPwned manually. Pull up a WHOIS tab. Take notes. Repeat.
Every tool is a silo. Every pivot is manual. The investigation logic lives entirely in your head.
I wanted to fix that.
What I built
OpenOSINT is an open-source Python framework with an AI agent at its core. You describe a target in natural language — an email address, a username, a domain, an IP, a phone number — and the agent decides which tools to run, chains them based on what it finds, executes everything against the real binaries, and compiles a structured Markdown report.
Three interfaces:
- Interactive AI REPL (default) — type natural language, agent chains the tools autonomously
- Direct CLI — run individual tools directly, no AI, perfect for scripting
- MCP Server — expose all 9 tools to Claude Code or Claude Desktop
The demo
Here's a real session. No mocking. The agent receives an email, runs discovery, checks breaches, extracts a username, pivots to search it across 300+ platforms, and saves a report, all chained automatically:
$ openosint
openosint ❯ investigate target@example.com
→ generate_dorks('target@example.com')
→ search_email('target@example.com')
✓ Found: Spotify, WordPress, Gravatar, Office365
→ search_breach('target@example.com')
✓ Found in 2 breaches: LinkedIn (2016), Adobe (2013)
→ search_username('target_handle')
✓ Found on: GitHub, Reddit, HackerNews, Twitter
╭──────────────── Report ────────────────╮
│ ## Summary │
│ Single target — high confidence. │
│ │
│ ## Online Presence │
│ Spotify · WordPress · Gravatar │
│ │
│ ## Data Breaches │
│ LinkedIn (2016) · Adobe (2013) │
╰────────────────────────────────────────╯
✓ Report saved → reports/2026-05-11_report.md
The agent went email → dorks → accounts → breach check → username pivot → cross-platform search. No human orchestration.
The architecture
The codebase has three layers with a hard no-upward-import rule:
| Layer | Path | Responsibility |
|---|---|---|
| Core tools | openosint/tools/ | Async wrappers around binaries and APIs. Stateless. No AI. |
| AI agent | openosint/agent.py | Anthropic tool use loop. Per-session conversation history. |
| Interfaces | repl.py, mcp_server.py, cli.py | REPL, MCP server, direct CLI. |
The AI layer is optional. The core tools run fine without it — the CLI and MCP server both bypass the agent entirely.
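A no-upward-import rule like this can be checked mechanically. Below is a minimal sketch of such a check using the standard library's ast module. The layer ranks and the module name openosint.tools.email are assumptions made for illustration, derived from the table above; this is not how the project itself enforces the rule.

```python
import ast
from typing import List, Optional

# Hypothetical layer ranks based on the table above.
# A module may import from its own rank or lower, never from a higher one.
LAYER_RANK = {
    "openosint.tools": 0,
    "openosint.agent": 1,
    "openosint.repl": 2,
    "openosint.mcp_server": 2,
    "openosint.cli": 2,
}


def layer_of(module: str) -> Optional[int]:
    """Return the rank of the layer a dotted module name belongs to, if any."""
    for prefix, rank in LAYER_RANK.items():
        if module == prefix or module.startswith(prefix + "."):
            return rank
    return None


def upward_imports(module: str, source: str) -> List[str]:
    """List absolute imports in `source` that reach into a higher layer than `module`."""
    own_rank = layer_of(module)
    if own_rank is None:
        return []
    violations = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            targets = [alias.name for alias in node.names]
        elif isinstance(node, ast.ImportFrom) and node.module and node.level == 0:
            # Expand "from pkg import name" to "pkg.name" so both spellings are caught.
            targets = [f"{node.module}.{alias.name}" for alias in node.names]
        else:
            continue  # not an import, or a relative import (ignored in this sketch)
        for target in targets:
            rank = layer_of(target)
            if rank is not None and rank > own_rank:
                violations.append(target)
    return violations


if __name__ == "__main__":
    # A core tool importing the agent breaks the rule and is reported.
    print(upward_imports("openosint.tools.email", "from openosint.agent import Agent\n"))
```

Running a check like this over every file in CI would turn the rule from a convention into a failing build. Relative imports are skipped here to keep the sketch short.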
Why hallucination in tool results is structurally ruled out
The AI layer uses the Anthropic native tool use API. Here's the exact flow:
- Agent receives your prompt
- Model decides which tool to call → issues a hard stop
- Real binary executes (holehe, sherlock, etc.)
- Real output goes back into the context as a tool_result
- Model reads actual output, decides next step
The model never infers or synthesizes what a tool would return. It only ever sees real output. If sherlock finds 47 profiles, that exact number and those exact URLs go back into the context. The agent can't make up tool results because it never generates them; the only model-written text is the final report that summarizes them.
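To make the "real output goes back" step concrete, here is a minimal sketch of a wrapper that runs an executable and wraps its stdout verbatim in a tool_result block. The helper names run_binary and as_tool_result, the 60-second default timeout, and the toolu_example id are assumptions for illustration, not the project's actual code. The Python interpreter stands in for a real tool such as sherlock so the example runs anywhere.

```python
import asyncio
import sys
from typing import List


async def run_binary(argv: List[str], timeout: float = 60.0) -> str:
    """Run a real executable and return its stdout verbatim."""
    proc = await asyncio.create_subprocess_exec(
        *argv,
        stdout=asyncio.subprocess.PIPE,
        stderr=asyncio.subprocess.PIPE,
    )
    try:
        stdout, stderr = await asyncio.wait_for(proc.communicate(), timeout)
    except asyncio.TimeoutError:
        proc.kill()
        await proc.wait()
        return f"error: {argv[0]} timed out after {timeout}s"
    text = stdout.decode(errors="replace")
    if proc.returncode != 0:
        # Surface failures to the model instead of hiding them.
        text += f"\n[exit code {proc.returncode}] {stderr.decode(errors='replace')}"
    return text


def as_tool_result(tool_use_id: str, output: str) -> dict:
    """Wrap raw output in the tool_result block shape used by the Messages API."""
    return {"type": "tool_result", "tool_use_id": tool_use_id, "content": output}


async def demo() -> dict:
    # Stand-in for a real tool: the interpreter prints a fixed line.
    output = await run_binary([sys.executable, "-c", "print('47 profiles found')"])
    return as_tool_result("toolu_example", output)


if __name__ == "__main__":
    print(asyncio.run(demo()))
```

The point of the design is that the string in content is produced by the subprocess, not by the model, so whatever the binary printed is what the model reads on the next turn.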
9 tools
| Tool | Backend | What it finds |
|---|---|---|
| search_email | holehe | Social accounts linked to an email |
| search_username | sherlock | Accounts across 300+ platforms |
| search_breach | HaveIBeenPwned v3 | Breach exposure and leaked data types |
| search_whois | python-whois | Registrant, registrar, creation/expiry dates |
| search_ip | ipinfo.io | Geolocation, ASN, hostname, org |
| search_domain | sublist3r | Subdomain enumeration |
| generate_dorks | built-in | 12 targeted Google dork URLs (no network calls) |
| search_paste | psbdmp.ws | Pastebin dump mentions |
| search_phone | phoneinfoga | Carrier, country, line type |
If a dependency is missing, that tool returns a descriptive error and the rest keeps running.
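One way to get this kind of graceful degradation is to look up each binary on PATH before running it and return an error payload instead of raising. The sketch below assumes a hypothetical REQUIRED_BINARIES mapping and a made-up result shape; the real project may structure its errors differently.

```python
import shutil
from typing import Callable, Optional

# Hypothetical mapping of tool names to the executables they shell out to.
REQUIRED_BINARIES = {
    "search_email": "holehe",
    "search_username": "sherlock",
    "search_phone": "phoneinfoga",
}


def tool_status(
    tool_name: str,
    which: Callable[[str], Optional[str]] = shutil.which,
) -> dict:
    """Report whether a tool's backing binary is installed, without raising."""
    binary = REQUIRED_BINARIES[tool_name]
    path = which(binary)
    if path is None:
        return {
            "tool": tool_name,
            "ok": False,
            "error": f"'{binary}' not found on PATH; install it to enable {tool_name}",
        }
    return {"tool": tool_name, "ok": True, "path": path}


if __name__ == "__main__":
    for name in REQUIRED_BINARIES:
        print(tool_status(name))
```

Because the failure is returned as data, an agent loop can pass the message to the model as a tool result and carry on with the tools that are available.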
Installation
git clone https://github.com/OpenOSINT/OpenOSINT.git
cd OpenOSINT
pip install -e .
export ANTHROPIC_API_KEY=sk-ant-...
External deps (via pip):
pip install holehe sherlock-project sublist3r
phoneinfoga is a standalone binary — download from GitHub releases.
Optional env vars:
export HIBP_API_KEY=your_key # HaveIBeenPwned v3
export IPINFO_TOKEN=your_token # higher rate limits on ipinfo.io
Requires Python 3.10+.
Using it
Interactive REPL
$ openosint
openosint ❯ investigate target@example.com
openosint ❯ find all accounts for johndoe99
openosint ❯ what subdomains does example.com have?
openosint ❯ check if +14155552671 is a mobile number
After every investigation, a Markdown report containing the structured findings is auto-saved to reports/.
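The demo output earlier shows a file name of the form reports/2026-05-11_report.md. A minimal sketch of a save step that produces that naming is below. The function name save_report and its parameters are hypothetical, and it assumes one file per day, so a second report on the same day would overwrite the first; the project may well handle that case differently.

```python
from datetime import date
from pathlib import Path
from typing import Optional


def save_report(
    markdown: str,
    reports_dir: Path = Path("reports"),
    today: Optional[date] = None,
) -> Path:
    """Write a Markdown report to <reports_dir>/<YYYY-MM-DD>_report.md and return its path."""
    today = today or date.today()
    reports_dir.mkdir(parents=True, exist_ok=True)
    path = reports_dir / f"{today.isoformat()}_report.md"
    path.write_text(markdown, encoding="utf-8")
    return path


if __name__ == "__main__":
    saved = save_report("## Summary\nSingle target.\n")
    print(f"Report saved -> {saved}")
```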
REPL commands:
clear Reset conversation memory
save Save last report manually
tools Show available tools and status
config Show current configuration
help All commands
exit Quit
Direct CLI (no AI)
openosint email target@example.com -t 60
openosint username johndoe99
openosint -v email target@example.com # verbose
MCP Server
All 9 tools are exposed as an MCP server. Register in Claude Code:
claude mcp add openosint python /absolute/path/to/OpenOSINT/openosint/mcp_server.py
claude mcp list # verify
Claude Desktop (~/Library/Application Support/Claude/claude_desktop_config.json):
{
  "mcpServers": {
    "openosint": {
      "command": "python",
      "args": ["/absolute/path/to/OpenOSINT/openosint/mcp_server.py"]
    }
  }
}
Then from Claude Code:
> Investigate target@example.com. If you find a linked username,
trace it across other platforms and compile a full report.
Claude Code then chains the tools much as the built-in agent does in the REPL, but the orchestration is driven by Claude Code's own context rather than by openosint/agent.py.
How the agent loop works (for the curious)
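# Simplified version of openosint/agent.py
# TOOL_SCHEMAS and execute_tool are defined elsewhere in the module.
import anthropic

client = anthropic.AsyncAnthropic()  # reads ANTHROPIC_API_KEY from the environment

# run_agent is an illustrative wrapper name; await needs an async function
async def run_agent(user_input: str) -> list:
    messages = [{"role": "user", "content": user_input}]
    while True:
        response = await client.messages.create(
            model="claude-opus-4-5",
            max_tokens=4096,     # required by the API; this value is an assumption
            tools=TOOL_SCHEMAS,  # all 9 tools as JSON schemas
            messages=messages,
        )
        # Keep the assistant turn (text and tool_use blocks) in the history
        messages.append({"role": "assistant", "content": response.content})
        if response.stop_reason != "tool_use":
            break  # "end_turn": agent is done (also stops on e.g. "max_tokens")
        tool_results = []
        for block in response.content:
            if block.type == "tool_use":
                # Execute the REAL binary
                result = await execute_tool(block.name, block.input)
                tool_results.append({
                    "type": "tool_result",
                    "tool_use_id": block.id,
                    "content": result,  # real output, no inference
                })
        # Feed real results back into context
        messages.append({"role": "user", "content": tool_results})
    return messages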
The loop runs until stop_reason == "end_turn". The agent decides when it has enough information to write the report.
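The loop relies on two names it does not define: TOOL_SCHEMAS and execute_tool. As a rough sketch of what they could look like, the block below defines one schema in the name / description / input_schema shape that Anthropic tool use expects, plus a dispatcher that maps tool names to async handlers. The handler body is a stub and the TOOL_HANDLERS registry is a hypothetical name; the project's real implementation is likely different.

```python
import asyncio

# One entry per tool; only search_email is sketched here.
TOOL_SCHEMAS = [
    {
        "name": "search_email",
        "description": "Find social accounts linked to an email address.",
        "input_schema": {
            "type": "object",
            "properties": {
                "email": {"type": "string", "description": "Target email address"},
            },
            "required": ["email"],
        },
    },
]


async def search_email(email: str) -> str:
    # Stub for illustration only: a real handler would run holehe
    # and return its output unchanged.
    return f"[stub] holehe was not run for {email}"


# Hypothetical registry mapping tool names to async handlers.
TOOL_HANDLERS = {"search_email": search_email}


async def execute_tool(name: str, tool_input: dict) -> str:
    """Dispatch a tool_use request to its handler; return errors as text."""
    handler = TOOL_HANDLERS.get(name)
    if handler is None:
        return f"error: unknown tool '{name}'"
    try:
        return await handler(**tool_input)
    except Exception as exc:  # keep the agent loop alive on tool failure
        return f"error: {name} failed: {exc}"


if __name__ == "__main__":
    print(asyncio.run(execute_tool("search_email", {"email": "target@example.com"})))
```

Returning errors as strings rather than raising means a broken or misused tool shows up in the model's context as a tool result, so the model can decide whether to retry or move on.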
What's next
- Shodan and Censys integration
- Support for additional LLM providers (Ollama, GPT-4)
- JSON and PDF export formats
- Docker image for zero-setup deployment
- Async parallel tool execution for multi-target investigations
Legal
OpenOSINT is for authorized security research, penetration testing, and investigative journalism only. Users are solely responsible for compliance with applicable law including GDPR, CCPA, and the CFAA. See DISCLAIMER.md.
GitHub: github.com/OpenOSINT/OpenOSINT
Docs: openosint.tech
Stars and issues welcome. If you build something with it, drop a comment — curious what use cases people find.




