Jangwook Kim

Posted on • Originally published at effloow.com

Microsoft Agent Governance Toolkit: Developer Setup Guide

Autonomous agents are moving into production faster than governance frameworks can keep up. Agents call external APIs, write files, delegate sub-tasks to other agents, and run for hours — all without a human watching every step. That creates real exposure: prompt injection, privilege escalation, unintended data exfiltration, and no audit trail when things go wrong.

Microsoft released the Agent Governance Toolkit on April 2, 2026 as an MIT-licensed open-source project that sits between your agent framework and the actions agents take. It is the first toolkit to cover all 10 OWASP Agentic AI Top 10 risks with deterministic, sub-millisecond policy enforcement. This guide walks through what it does, how to install it, and how to wire it into frameworks you already use.

Effloow Lab verified the package structure, PyPI availability, and integration API surface from the official GitHub repository and Microsoft documentation. See the lab-run note for details and limitations.

Why Governance Is the Missing Layer

Most agent frameworks — LangChain, CrewAI, Google ADK, AutoGen — focus on orchestration: how agents reason, delegate, and use tools. None of them define what agents are allowed to do at runtime.

The gap is real. An agent that can call send_email has no built-in constraint on which emails it can send. An agent that can write files has no built-in constraint on which paths are off-limits. Policies like these have traditionally been ad hoc — hardcoded conditions scattered across tool definitions, or left out entirely because they seemed like an edge case.

Two regulatory deadlines are changing that calculation fast:

  • The EU AI Act's high-risk AI obligations take effect in August 2026, requiring risk management records, logging, and human oversight evidence for systems that make or influence decisions affecting people.
  • The Colorado AI Act enforcement begins in June 2026, requiring risk-management programs, impact assessments, and consumer notification records.

The Agent Governance Toolkit gives developers a structured way to meet these requirements without building the compliance layer from scratch. It records every policy decision as a trace attribute, maps those records to EU AI Act and Colorado AI Act evidence requirements, and provides compliance grading reports you can include in audits.

The Seven-Package Architecture

The toolkit is a monorepo with seven independently installable packages. Each addresses a distinct governance layer.

  • agent-os-kernel — Stateless policy engine that intercepts every action before execution; sub-millisecond latency (<0.1ms p99). Install: pip install agent-os-kernel
  • agentmesh-platform — Zero-trust identity: decentralized identifiers (DID) with Ed25519 signing so agents can verify each other. Install: pip install agentmesh-platform
  • agent-runtime — Execution sandboxing: privilege rings modeled on CPU rings, saga orchestration for multi-step rollback, and an emergency kill switch. Included in [full]
  • agent-compliance — Regulatory grading: maps runtime evidence to EU AI Act, HIPAA, SOC2, and OWASP Agentic Top 10. Install: pip install agent-compliance
  • agent-marketplace — Plugin lifecycle management: Ed25519-signed plugins, trust-tiered capability gating, supply-chain security. Included in [full]
  • agent-lightning — RL training governance: policy-enforced training runners, reward shaping constraints. Included in [full]
  • agent-sre — Reliability engineering: circuit breakers, chaos testing, SLO enforcement for agent uptime. Install: pip install agent-sre

For most teams starting with governance, the essential packages are agent-os-kernel (policy enforcement) and agent-compliance (regulatory grading). The other five packages become relevant as your agent surface grows.

Installation

Install the full toolkit with:

pip install "agent-governance-toolkit[full]"

Requirements: Python 3.10+. Node.js 18+ or .NET 8.0+ is needed only if you use the TypeScript or C# SDKs; the Python SDK works standalone.

If you prefer minimal dependencies, install only what you need:

# Minimal: policy enforcement only
pip install agent-os-kernel

# Add compliance reporting
pip install agent-os-kernel agent-compliance

# Add reliability engineering
pip install agent-sre

Verify the install:

python -c "from agent_governance import PolicyEngine; print(PolicyEngine.__module__)"
# Expected: agent_governance.engine.policy

Writing Your First Policy

The PolicyEngine loads rules from three supported formats: YAML (most teams start here), OPA Rego (for complex conditional logic), and Cedar (for attribute-based access control). You can mix formats in a single policy file.

A minimal YAML policy blocks PII exfiltration and requires human approval before writing to privileged paths:

# governance-policies.yaml
version: "1.0"
policies:
  - id: block-pii-exfiltration
    action: tool_call
    condition: "tool.name in ['send_email', 'post_slack'] and 'SSN' in args.body"
    effect: DENY
    reason: "PII exfiltration blocked — EU AI Act Article 10"

  - id: require-human-approval-privileged-writes
    action: file_write
    condition: "file.path.startswith('/etc/') or file.path.startswith('/var/')"
    effect: HUMAN_APPROVAL_REQUIRED
    timeout_seconds: 300

  - id: rate-limit-external-calls
    action: http_request
    condition: "request.rate > 100"
    effect: THROTTLE
    window_seconds: 60

The effect field has four values: ALLOW, DENY, HUMAN_APPROVAL_REQUIRED, and THROTTLE. Every decision — including ALLOW decisions — is recorded as a span attribute on the agent's trace, which becomes the audit trail for compliance.
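The post does not show the toolkit's internal span schema, so the sketch below illustrates the general idea in plain Python: every decision, including ALLOW, is appended to a JSONL audit log. The field names are assumptions for illustration, not the toolkit's actual trace format.

```python
import json
import time

def record_decision(log_path, policy_id, action, effect, reason=None):
    """Append one policy decision to a JSONL audit log.
    Field names here are illustrative, not the toolkit's real schema."""
    entry = {
        "timestamp": time.time(),
        "policy_id": policy_id,
        "action": action,
        "effect": effect,   # ALLOW / DENY / HUMAN_APPROVAL_REQUIRED / THROTTLE
        "reason": reason,
    }
    with open(log_path, "a", encoding="utf-8") as f:
        f.write(json.dumps(entry) + "\n")
    return entry

# ALLOW decisions are logged too -- that is what makes the trail auditable.
record_decision("audit.jsonl", "block-pii-exfiltration", "tool_call", "ALLOW")
```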

Load and test the engine directly:

from agent_governance import PolicyEngine

engine = PolicyEngine.from_yaml("governance-policies.yaml")

# Simulate a policy check (no agent needed)
result = engine.evaluate(
    action="tool_call",
    context={
        "tool": {"name": "send_email"},
        "args": {"to": "external@example.com", "body": "SSN: 123-45-6789"}
    }
)
print(result.effect)   # DENY
print(result.reason)   # "PII exfiltration blocked — EU AI Act Article 10"

Framework Integrations

The toolkit hooks into existing framework extension points: you keep your existing agent code and add governance as middleware, with no rewrite required.

LangChain

LangChain's callback system is the integration point. Every tool call and LLM invocation fires a callback, and the GovernanceCallbackHandler evaluates each one against the policy engine before allowing execution:

from langchain.agents import create_react_agent
from agent_governance import PolicyEngine, GovernanceCallbackHandler

policy_engine = PolicyEngine.from_yaml("governance-policies.yaml")

agent = create_react_agent(
    llm=llm,
    tools=tools,
    callbacks=[GovernanceCallbackHandler(policy_engine)]
)

# All tool calls are now policy-gated automatically
result = agent.invoke({"input": "Send a summary email to the customer list"})

If a tool call is denied, the agent receives a structured refusal with the policy ID and reason, not a raw exception. That lets the agent handle the denial gracefully rather than crashing.
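The exact shape of that structured refusal is not documented in this post, so the following is a sketch of how agent code might branch on it, using an assumed decision object with effect, policy_id, and reason fields:

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Decision:
    """Assumed shape of the policy engine's response; field names are illustrative."""
    effect: str
    policy_id: Optional[str] = None
    reason: Optional[str] = None

def handle_decision(decision: Decision, run_tool):
    """Turn a DENY into a structured message the agent can reason about,
    instead of letting it surface as a raw exception."""
    if decision.effect == "DENY":
        return {
            "status": "refused",
            "policy_id": decision.policy_id,
            "message": f"Action blocked by policy: {decision.reason}",
        }
    return {"status": "ok", "result": run_tool()}

refusal = handle_decision(
    Decision("DENY", "block-pii-exfiltration", "PII exfiltration blocked"),
    run_tool=lambda: "sent",
)
```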

CrewAI

CrewAI uses task decorators as the extension point. The GovernanceTaskDecorator wraps individual tasks:

from agent_governance import PolicyEngine
from agent_governance.integrations.crewai import GovernanceTaskDecorator

policy_engine = PolicyEngine.from_yaml("governance-policies.yaml")

@GovernanceTaskDecorator(policy_engine)
def research_task(agent, context):
    return agent.execute(context)

For multi-agent delegation scenarios — where a CrewAI crew delegates to sub-agents — the agentmesh-platform package adds identity verification so each agent cryptographically proves its identity before receiving a delegated task. This addresses OWASP ASI-03 (Insecure Agent Delegation).

Other Supported Frameworks

The toolkit ships integrations for:

  • OpenAI Agents SDK — middleware pipeline integration
  • LangGraph — node-level governance hooks
  • Google ADK — plugin system integration
  • Microsoft Agent Framework — native middleware pipeline
  • AutoGen — pre/post-tool-call hooks
  • LlamaIndex — TrustedAgentWorker wrapper
  • Haystack — component-level policy checks
  • PydanticAI — function tool decorator
  • Dify — governance plugin available in Dify's marketplace

For frameworks not on this list, the agent-os-kernel package exposes a framework-agnostic evaluate(action, context) method you can call from any hook point.
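For a custom framework, one plausible pattern is a decorator that runs the evaluate check before each tool call. The evaluate(action, context) signature follows the post's earlier example; everything else below is a sketch, shown against a stub engine so it runs standalone:

```python
import functools

class StubEngine:
    """Stand-in for the real policy engine, for illustration only:
    denies send_email, allows everything else."""
    def evaluate(self, action, context):
        tool = context.get("tool", {}).get("name")
        effect = "DENY" if tool == "send_email" else "ALLOW"
        return type("Result", (), {"effect": effect, "reason": "sketch"})()

def governed(engine, tool_name):
    """Decorator: consult the policy engine before running the tool."""
    def wrap(fn):
        @functools.wraps(fn)
        def inner(*args, **kwargs):
            result = engine.evaluate(
                action="tool_call",
                context={"tool": {"name": tool_name}, "args": kwargs},
            )
            if result.effect != "ALLOW":
                raise PermissionError(f"{tool_name}: {result.effect} ({result.reason})")
            return fn(*args, **kwargs)
        return inner
    return wrap

engine = StubEngine()

@governed(engine, "search_web")
def search_web(query):
    return f"results for {query}"
```

Swapping StubEngine for a PolicyEngine instance gives the same hook point with real rules.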

Compliance Reporting with agent-compliance

The agent-compliance package takes the audit trail produced by the policy engine and generates a graded report against regulatory frameworks. Run it as a CLI command after a test run:

# Generate EU AI Act compliance report
python -m agent_compliance report \
  --trace-file traces/agent-run-2026-05-06.jsonl \
  --framework eu-ai-act \
  --output report-eu-ai-act.html

# Or generate an OWASP Agentic AI Top 10 evidence report
python -m agent_compliance report \
  --trace-file traces/agent-run-2026-05-06.jsonl \
  --framework owasp-agentic-top-10 \
  --output report-owasp.html

The report includes:

  • A compliance grade (A through F) per framework section
  • Evidence collected at runtime mapped to specific regulatory articles
  • Gaps flagged with the policy changes needed to close them

For EU AI Act purposes, the evidence maps to Article 9 (risk management), Article 10 (data governance), Article 12 (logging), Article 14 (human oversight), and Articles 15/17 (accuracy and cybersecurity). For Colorado AI Act, it maps to risk-management programs (Section 6-1-1702) and impact assessment documentation.

Deploying on Azure App Service

The toolkit has a purpose-built integration for Azure App Service that wires governance directly into the App Service middleware pipeline. The Microsoft Tech Community blog documents the full deployment pattern, but the essentials are:

# Add to requirements.txt
agent-governance-toolkit[full]>=1.0.0

# Set as app setting in Azure Portal or via CLI
az webapp config appsettings set \
  --name your-agent-app \
  --resource-group your-rg \
  --settings GOVERNANCE_POLICY_PATH=governance-policies.yaml

The App Service integration adds governance events to Application Insights automatically, so policy decisions appear in Azure Monitor alongside standard app telemetry. This is the path for teams deploying to Azure who need the compliance audit trail stored in a managed service rather than their own logging infrastructure.

Common Mistakes

Treating DENY as an error condition. When the policy engine returns DENY, that is the system working correctly. Wire denied actions to a structured agent response, not to an exception handler. Agents should be able to explain why they could not complete a task.

Installing the full toolkit before you need it. The [full] install is seven packages including RL governance and marketplace management that most teams will not use immediately. Start with agent-os-kernel and agent-compliance, then add packages as the need arises.

Writing policies only for known bad behavior. Default-deny is safer than default-allow for agents with broad tool access. Start with a policy that allows only explicitly listed tool categories, then expand. An agent that can do anything not explicitly denied is far harder to audit.
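A default-deny baseline in the YAML format shown earlier might look like the sketch below. Whether the engine supports a wildcard catch-all rule like this is an assumption, so check the toolkit's policy reference before relying on it:

```yaml
version: "1.0"
policies:
  # Explicitly allow the tool categories you have reviewed
  - id: allow-read-only-tools
    action: tool_call
    condition: "tool.name in ['search_web', 'read_file', 'summarize']"
    effect: ALLOW

  # Everything else is denied (wildcard action syntax is an assumption)
  - id: default-deny
    action: "*"
    condition: "true"
    effect: DENY
    reason: "Not on the reviewed allowlist"
```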

Skipping identity verification for multi-agent delegation. If your agents spawn sub-agents, each sub-agent runs with the full capability set of the parent unless you add identity-based capability gating via agentmesh-platform. This is the most common gap Effloow Lab sees in agent governance setups: single-agent policies are tight, multi-agent delegation is open.

Not exporting traces before running the compliance report. The agent-compliance CLI reads from a trace file in JSONL format. Confirm your OpenTelemetry or agent SDK is writing traces before you deploy to production — running a compliance report against an empty trace file returns a failing grade for every category.
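A quick pre-flight check before invoking the report CLI catches the empty-trace case. This is generic JSONL validation, not part of the toolkit:

```python
import json
from pathlib import Path

def check_trace_file(path):
    """Return the number of valid JSONL records; raise if the file is empty
    or contains unparseable lines, so a compliance report never runs on junk."""
    lines = [ln for ln in Path(path).read_text(encoding="utf-8").splitlines() if ln.strip()]
    if not lines:
        raise ValueError(f"{path}: trace file is empty; export traces first")
    for i, line in enumerate(lines, 1):
        try:
            json.loads(line)
        except json.JSONDecodeError as exc:
            raise ValueError(f"{path}:{i}: invalid JSON record") from exc
    return len(lines)
```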

Q: Does the Agent Governance Toolkit work with agents that are not Python-based?

Yes. The toolkit provides SDKs for Python, TypeScript, Rust, Go, and .NET. The policy engine itself is a stateless HTTP service, so agents in any language can call POST /evaluate to check a proposed action against the policy. The Python and TypeScript SDKs wrap this into idiomatic callback and middleware patterns for each framework.
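Calling the engine over HTTP amounts to posting the action and context as JSON. The /evaluate endpoint name comes from the answer above; the body's field names are assumptions mirroring the Python SDK's evaluate call. A sketch that builds the payload (the actual request needs a running engine):

```python
import json

def build_evaluate_payload(action, context):
    """JSON body for POST /evaluate; field names are assumed, not documented."""
    return json.dumps({"action": action, "context": context})

payload = build_evaluate_payload(
    "tool_call",
    {"tool": {"name": "send_email"}, "args": {"to": "x@example.com"}},
)

# With a running engine (sketch only; host and port are assumptions):
# import urllib.request
# req = urllib.request.Request(
#     "http://localhost:8080/evaluate",
#     data=payload.encode(),
#     headers={"Content-Type": "application/json"},
# )
# decision = json.load(urllib.request.urlopen(req))
```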

Q: What happens when the policy engine is unavailable?

The default behavior is fail-open: if the policy engine cannot be reached within the configured timeout (default 50ms), the action proceeds. You can change this to fail-closed by setting on_timeout: DENY in your policy file. For production systems handling sensitive data, fail-closed is the correct default.
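Based on the setting named above, a fail-closed configuration would add one line to the policy file; its exact placement within the file is an assumption:

```yaml
version: "1.0"
on_timeout: DENY   # fail closed: if the engine is unreachable, block the action
policies:
  # ... your rules ...
```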

Q: Is there a cost to the regulatory compliance mapping?

No. The compliance mapping is built into the agent-compliance package and runs entirely locally from the trace file. There is no SaaS component, no data leaves your environment, and no fee beyond the open-source MIT license.

Q: How does this relate to the Microsoft Agent 365 governance layer?

Microsoft Agent 365 (generally available May 1, 2026 at $15/user) is an enterprise control plane for managing which agents are deployed across a Microsoft 365 tenant, with Copilot policy settings and approval workflows for Copilot-authored code. The Agent Governance Toolkit is a developer-level runtime security library — orthogonal and complementary. Agent 365 governs which agents can run; the Agent Governance Toolkit governs what agents can do while they run. Teams using both get organizational governance plus per-action policy enforcement. See our Microsoft Agent 365 guide for the enterprise control plane details.

Q: Can I use OPA Rego policies without learning OPA?

YAML policies cover the majority of use cases. The official documentation suggests using YAML for blocking specific tools or requiring human approval on categories of actions, and reserving OPA Rego for scenarios that require complex joins — for example, a policy that checks both the requesting agent's identity and the target resource's classification level simultaneously. If you have not used OPA before, start with YAML.

Q: Does the toolkit support the A2A (Agent2Agent) protocol?

The agentmesh-platform package adds DID-based identity for A2A communication, but it is not a full A2A protocol implementation. For multi-agent systems using the A2A protocol (donated to the Linux Foundation at Google Cloud Next 2026), the toolkit provides identity verification and delegation governance. See our A2A protocol guide for the protocol fundamentals.

Key Takeaways

The Microsoft Agent Governance Toolkit solves a problem that agent framework documentation ignores: what happens between the moment an agent decides to take an action and the moment that action executes. The gap between "agent wants to send email" and "email is sent" is where most agent security incidents happen, and it has been handled ad hoc until now.

The practical value for developers in 2026 is timing. EU AI Act obligations land in August 2026. The Agent Governance Toolkit provides ready-made compliance evidence collection aligned to those obligations, reducing the amount of custom audit-logging code teams need to write before that deadline.

Bottom Line

The Agent Governance Toolkit is the first open-source project that makes AI agent compliance tractable for production teams. Start with agent-os-kernel and a YAML policy file — you will have sub-millisecond policy enforcement and an EU AI Act-aligned audit trail in under an hour. Add agent-compliance before August 2026 if you need regulatory reports.
