Building a security proxy for MCP

Model Context Protocol (MCP) has become the default way to connect AI agents — Cursor, Claude Code, and others — to external tools: filesystems, databases, APIs. The protocol is well-designed. One thing it deliberately leaves out is authentication and authorization.

That’s a reasonable choice for a protocol spec. It’s a problem in practice.

The gap

An MCP server exposes tools. Any client that can reach the server can call any tool. There’s no concept of per-client permissions, no rate limiting, no audit trail. If you run a filesystem server that can write files and two agents connect — one trusted, one not — they both get the same access.

The standard answer is “put it behind a firewall.” That works for a server you control end-to-end. It doesn’t work when you want to give different agents different levels of access to the same server, or when you want to know after the fact what each agent actually called.

There’s also a subtler problem: AI agents are prompt-injectable. A malicious tool response can instruct an agent to exfiltrate data or call a tool it shouldn’t. A proxy that sits in the path of every response can catch this before it reaches the agent — but only if response filtering is built in from the start.

What I built

Arbitus is a security proxy that sits between AI agents and MCP servers. It enforces per-agent policies before any tool call reaches the upstream.

Agent (Cursor, Claude, etc.)
        │  JSON-RPC
        ▼
    arbitus     ← auth, rate limit, HITL, payload filter, audit
        │
        ▼
  MCP Server (filesystem, database, APIs...)

A policy looks like this:

agents:
  cursor:
    allowed_tools: [read_*]   # glob wildcards supported
    rate_limit: 30
    timeout_secs: 10
  claude-code:
    denied_tools: [write_file, delete_file]
    rate_limit: 60

default_policy:               # fallback for unlisted agents
  denied_tools: [delete_*]
  rate_limit: 10

Each agent gets an allowlist or denylist with glob wildcards. The tools/list response is filtered before it reaches the agent — it only sees tools it’s allowed to call. Agents not in the config fall back to default_policy, or are blocked entirely if none is set.
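To make the matching rules concrete, here is a minimal sketch of glob matching plus policy resolution using only the standard library. The `Policy` struct, the function names, and the rule that a denylist match wins over an allowlist match are assumptions made for illustration, not Arbitus's actual types or semantics.

```rust
use std::collections::HashMap;

/// Hypothetical in-memory form of one policy entry from the YAML above.
#[derive(Default)]
pub struct Policy {
    pub allowed_tools: Vec<String>, // empty means "no allowlist configured"
    pub denied_tools: Vec<String>,
}

/// Minimal glob: `*` matches any run of characters, everything else is literal.
pub fn glob_match(pattern: &str, name: &str) -> bool {
    let (p, n) = (pattern.as_bytes(), name.as_bytes());
    let (mut pi, mut ni) = (0, 0);
    let mut backtrack: Option<(usize, usize)> = None;
    while ni < n.len() {
        if pi < p.len() && p[pi] == b'*' {
            backtrack = Some((pi, ni));
            pi += 1;
        } else if pi < p.len() && p[pi] == n[ni] {
            pi += 1;
            ni += 1;
        } else if let Some((star_pi, star_ni)) = backtrack {
            // Let the last `*` absorb one more character and retry.
            backtrack = Some((star_pi, star_ni + 1));
            pi = star_pi + 1;
            ni = star_ni + 1;
        } else {
            return false;
        }
    }
    // Only trailing `*` may remain in the pattern.
    p[pi..].iter().all(|&c| c == b'*')
}

/// Assumption: a denylist match always wins; an empty allowlist allows the rest.
pub fn is_allowed(policy: &Policy, tool: &str) -> bool {
    if policy.denied_tools.iter().any(|p| glob_match(p, tool)) {
        return false;
    }
    policy.allowed_tools.is_empty()
        || policy.allowed_tools.iter().any(|p| glob_match(p, tool))
}

/// Listed agents get their own policy; everyone else gets the default, if any.
pub fn resolve<'a>(
    agents: &'a HashMap<String, Policy>,
    default_policy: Option<&'a Policy>,
    agent: &str,
) -> Option<&'a Policy> {
    agents.get(agent).or(default_policy)
}

/// Filters a tools/list result; with no policy at all, nothing is visible.
pub fn filter_tools<'a>(policy: Option<&Policy>, tools: &[&'a str]) -> Vec<&'a str> {
    match policy {
        Some(p) => tools.iter().copied().filter(|t| is_allowed(p, t)).collect(),
        None => Vec::new(),
    }
}
```

With the example config, `cursor` would see only `read_file` out of `read_file`, `write_file`, and `delete_file`, while an unlisted agent under `default_policy` would see everything except `delete_file`.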

The pipeline applies in order: rate limit → auth → HITL → schema validation → payload filter → upstream.

Why Rust

The proxy sits in the hot path of every tool call. It needs to be fast and correct.

Rust’s type system made the architecture natural. Each layer of the pipeline is a trait object — transport, upstream, and audit backend are all traits. You can swap SQLite for a webhook, or HTTP transport for stdio, without touching the core logic. The policy map is an Arc<HashMap> shared across all middleware without copying.
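As a rough illustration of that shape, the sketch below puts an audit backend behind a trait and shares a policy map through an `Arc`. The names `AuditBackend`, `MemoryAudit`, `FanOut`, and `Gateway` are hypothetical, and the in-memory backend merely stands in for SQLite or a webhook.

```rust
use std::collections::HashMap;
use std::sync::{Arc, Mutex};

pub struct AuditEvent {
    pub agent: String,
    pub tool: String,
    pub allowed: bool,
}

/// Any audit sink implements this; the core logic only sees the trait.
pub trait AuditBackend: Send + Sync {
    fn record(&self, event: &AuditEvent);
}

/// In-memory backend standing in for SQLite or a webhook.
#[derive(Default)]
pub struct MemoryAudit {
    pub lines: Mutex<Vec<String>>,
}

impl AuditBackend for MemoryAudit {
    fn record(&self, event: &AuditEvent) {
        let line = format!("{} {} allowed={}", event.agent, event.tool, event.allowed);
        self.lines.lock().unwrap().push(line);
    }
}

/// Fan-out is just another backend that forwards to several others.
pub struct FanOut {
    pub backends: Vec<Arc<dyn AuditBackend>>,
}

impl AuditBackend for FanOut {
    fn record(&self, event: &AuditEvent) {
        for backend in &self.backends {
            backend.record(event);
        }
    }
}

pub struct Gateway {
    /// Shared, read-only map of agent name to rate limit; cloning the Arc copies no data.
    pub policies: Arc<HashMap<String, u32>>,
    pub audit: Arc<dyn AuditBackend>,
}

impl Gateway {
    /// Toy decision: an agent is allowed if it has a policy entry.
    pub fn handle(&self, agent: &str, tool: &str) -> bool {
        let allowed = self.policies.contains_key(agent);
        self.audit.record(&AuditEvent { agent: agent.into(), tool: tool.into(), allowed });
        allowed
    }
}
```

Swapping the audit sink means constructing `Gateway` with a different `Arc<dyn AuditBackend>`; `handle` does not change.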

The other reason was cross-compilation. MCP servers run everywhere — Linux, macOS, Windows, ARM boards. Rust compiles to all of them cleanly, especially once you drop OpenSSL for rustls (pure Rust TLS, no system dependency).

The tricky parts

stdio mode. MCP has two transports: HTTP+SSE and stdio. In stdio mode, the agent and server communicate via stdin/stdout — there’s no HTTP request to intercept. Arbitus spawns the upstream server as a child process, reads from its stdout, writes to its stdin, and applies policies on each message in both directions, including responses.
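The core of that relay can be sketched with the standard library alone. MCP's stdio transport sends newline-delimited JSON-RPC messages, so a line-oriented loop is enough for illustration. `spawn_upstream` and `relay` are hypothetical names, and the `allow` predicate is a placeholder for the real policy pipeline.

```rust
use std::io::{self, BufRead, Write};
use std::process::{Child, Command, Stdio};

/// Starts the upstream server with piped stdin/stdout so the proxy owns both ends.
pub fn spawn_upstream(program: &str, args: &[&str]) -> io::Result<Child> {
    Command::new(program)
        .args(args)
        .stdin(Stdio::piped())
        .stdout(Stdio::piped())
        .spawn()
}

/// Copies newline-delimited messages from `input` to `output`, dropping those
/// the `allow` predicate rejects. Returns (forwarded, blocked) counts.
pub fn relay<R: BufRead, W: Write>(
    input: R,
    mut output: W,
    allow: impl Fn(&str) -> bool,
) -> io::Result<(usize, usize)> {
    let (mut forwarded, mut blocked) = (0, 0);
    for line in input.lines() {
        let line = line?;
        if allow(&line) {
            writeln!(output, "{line}")?;
            forwarded += 1;
        } else {
            blocked += 1;
        }
    }
    output.flush()?;
    Ok((forwarded, blocked))
}
```

One `relay` loop would run per direction: agent stdin to child stdin, and child stdout back to the agent. A real proxy would answer a blocked request with a JSON-RPC error instead of silently dropping it, so the agent is not left waiting.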

Encoding-aware filtering. The first version matched block patterns against raw request strings. That’s bypassable: a payload that’s Base64-encoded, percent-encoded, or Unicode-normalized differently won’t match a literal regex. The solution was to run every candidate string through a chain of decoding passes (standard Base64, URL-safe Base64, percent-encoding, double-encoding, Unicode NFC normalization with Bidi-control stripping) before applying patterns. Every pass is attempted on every candidate, and a pass that fails to decode is skipped rather than ending the chain, so the pipeline also catches multi-layer encoding.
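A simplified sketch of the decode-then-match idea, using hand-written percent and Base64 decoders so it needs only the standard library. It is an illustration under assumptions rather than the real filter: it matches a plain substring instead of a regex, decodes whole strings rather than extracted tokens, and leaves out Unicode normalization, which would need an external crate. All function names are hypothetical.

```rust
use std::collections::HashSet;

/// Decodes %XX escapes. None if nothing was decoded or the result is not UTF-8.
fn percent_decode(s: &str) -> Option<String> {
    let hex = |c: u8| (c as char).to_digit(16);
    let b = s.as_bytes();
    let mut out = Vec::with_capacity(b.len());
    let (mut i, mut changed) = (0, false);
    while i < b.len() {
        if b[i] == b'%' && i + 2 < b.len() {
            if let (Some(h), Some(l)) = (hex(b[i + 1]), hex(b[i + 2])) {
                out.push((h * 16 + l) as u8);
                i += 3;
                changed = true;
                continue;
            }
        }
        out.push(b[i]);
        i += 1;
    }
    if changed { String::from_utf8(out).ok() } else { None }
}

/// Accepts both the standard and URL-safe alphabets, with or without padding.
fn base64_decode(s: &str) -> Option<String> {
    let trimmed = s.trim_end_matches('=');
    if trimmed.len() < 4 {
        return None;
    }
    let mut out = Vec::new();
    let (mut acc, mut bits) = (0u32, 0u32);
    for c in trimmed.bytes() {
        let v = match c {
            b'A'..=b'Z' => c - b'A',
            b'a'..=b'z' => c - b'a' + 26,
            b'0'..=b'9' => c - b'0' + 52,
            b'+' | b'-' => 62,
            b'/' | b'_' => 63,
            _ => return None,
        };
        acc = (acc << 6) | v as u32;
        bits += 6;
        if bits >= 8 {
            bits -= 8;
            out.push((acc >> bits) as u8);
            acc &= (1 << bits) - 1;
        }
    }
    String::from_utf8(out).ok()
}

/// Every decoder is tried on every candidate; failures are simply skipped.
pub fn candidates(input: &str, max_depth: usize) -> HashSet<String> {
    let mut seen = HashSet::from([input.to_string()]);
    let mut frontier = vec![input.to_string()];
    for _ in 0..max_depth {
        let mut next = Vec::new();
        for s in &frontier {
            for decoded in percent_decode(s).into_iter().chain(base64_decode(s)) {
                if seen.insert(decoded.clone()) {
                    next.push(decoded);
                }
            }
        }
        frontier = next;
    }
    seen
}

/// Blocks if the raw input or any decoded form contains the needle.
pub fn is_blocked(input: &str, needle: &str) -> bool {
    candidates(input, 3).iter().any(|c| c.contains(needle))
}
```

Bounding the depth matters: without `max_depth`, a crafted payload could force an unbounded number of decode rounds on every request.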

Schema validation in the hot path. Each tools/call gets validated against the inputSchema fetched from tools/list. The schema cache is LRU-bounded to prevent memory exhaustion when agents probe tool lists aggressively. The tricky part was deciding what happens on validation failure: return a structured JSON-RPC error with the exact constraint that failed, without leaking anything about the upstream’s internal schema structure.
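The bounded cache is the easiest piece to show. Below is a deliberately simple LRU sketch keyed by tool name; `LruCache` is a hypothetical type written for this post, and its O(n) recency update is a simplification that a production cache would likely replace with a linked structure or an existing crate.

```rust
use std::collections::{HashMap, VecDeque};

pub struct LruCache<V> {
    capacity: usize,
    map: HashMap<String, V>,
    order: VecDeque<String>, // front = least recently used
}

impl<V> LruCache<V> {
    pub fn new(capacity: usize) -> Self {
        assert!(capacity > 0, "capacity must be positive");
        Self { capacity, map: HashMap::new(), order: VecDeque::new() }
    }

    /// Moves `key` to the most-recently-used end.
    fn touch(&mut self, key: &str) {
        if let Some(pos) = self.order.iter().position(|k| k == key) {
            let k = self.order.remove(pos).unwrap();
            self.order.push_back(k);
        }
    }

    pub fn get(&mut self, key: &str) -> Option<&V> {
        if self.map.contains_key(key) {
            self.touch(key);
        }
        self.map.get(key)
    }

    /// Inserts or replaces; evicts the least recently used entry when full.
    pub fn put(&mut self, key: &str, value: V) {
        if self.map.insert(key.to_string(), value).is_some() {
            self.touch(key);
            return;
        }
        self.order.push_back(key.to_string());
        if self.order.len() > self.capacity {
            if let Some(evicted) = self.order.pop_front() {
                self.map.remove(&evicted);
            }
        }
    }

    pub fn len(&self) -> usize {
        self.map.len()
    }
}
```

However many distinct tool names an agent probes, memory stays bounded by `capacity` entries, and the schemas used most recently are the ones that survive.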

Human-in-the-Loop. The HITL middleware suspends any tool call matching approval_required patterns, stores it with a generated ID, and holds the HTTP response open until an operator calls POST /approvals/{id}/approve or POST /approvals/{id}/reject. If no decision arrives within hitl_timeout_secs, it auto-rejects. The hard part was the connection model: the suspended request is a live async task holding the pending response, while the approval comes in on a different HTTP path. A per-request oneshot channel bridges them.
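The bridging pattern can be sketched without an async runtime. The version below uses a thread-safe registry and `std::sync::mpsc` as a stand-in for an async oneshot channel; `Approvals`, `suspend`, `decide`, and `wait` are hypothetical names, and a real async implementation would await the receiver instead of blocking a thread.

```rust
use std::collections::HashMap;
use std::sync::mpsc::{channel, Receiver, Sender};
use std::sync::Mutex;
use std::time::Duration;

#[derive(Debug, Clone, Copy, PartialEq)]
pub enum Decision {
    Approved,
    Rejected,
}

#[derive(Default)]
pub struct Approvals {
    next_id: Mutex<u64>,
    pending: Mutex<HashMap<u64, Sender<Decision>>>,
}

impl Approvals {
    /// Called by the suspended request: registers itself and gets a receiver to wait on.
    pub fn suspend(&self) -> (u64, Receiver<Decision>) {
        let mut next = self.next_id.lock().unwrap();
        *next += 1;
        let id = *next;
        let (tx, rx) = channel();
        self.pending.lock().unwrap().insert(id, tx);
        (id, rx)
    }

    /// Called by the approve/reject endpoint. Returns false if the id is
    /// unknown, already decided, or the waiting request has gone away.
    pub fn decide(&self, id: u64, decision: Decision) -> bool {
        let sender = self.pending.lock().unwrap().remove(&id);
        match sender {
            Some(tx) => tx.send(decision).is_ok(),
            None => false,
        }
    }

    /// Waits for a decision; a timeout or a dropped sender counts as a rejection.
    pub fn wait(&self, id: u64, rx: Receiver<Decision>, timeout: Duration) -> Decision {
        let decision = rx.recv_timeout(timeout).unwrap_or(Decision::Rejected);
        // Clean up so a late approval cannot be delivered to nobody.
        self.pending.lock().unwrap().remove(&id);
        decision
    }
}
```

Removing the sender from the map on both paths is what makes each approval single-use: a second `decide` for the same ID, or one arriving after the timeout, finds nothing to send to.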

Immutable audit. The SQLite audit backend now uses database-level enforcement: INSERT triggers compute a SHA-256 hash over each row’s fields chained to the previous row’s hash, and a BEFORE UPDATE trigger rejects any modification. No application code can alter the audit trail after the fact — the database itself refuses. This required careful ordering of trigger creation and a genesis hash for the first row.
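The chaining logic itself is small. The sketch below models it in memory rather than in SQLite triggers, and it uses FNV-1a as a non-cryptographic stand-in for SHA-256 so that it needs no external crate. The `Row` layout, the `GENESIS` value, and the field separator are assumptions made for illustration.

```rust
/// FNV-1a: a NON-cryptographic stand-in for SHA-256, used only to keep this std-only.
fn fnv1a(data: &[u8]) -> u64 {
    let mut h: u64 = 0xcbf29ce484222325;
    for &b in data {
        h ^= b as u64;
        h = h.wrapping_mul(0x100000001b3);
    }
    h
}

pub struct Row {
    pub agent: String,
    pub tool: String,
    pub prev_hash: u64,
    pub hash: u64,
}

/// Hash that the first row chains to (an arbitrary choice for this sketch).
pub const GENESIS: u64 = 0;

fn row_hash(prev: u64, agent: &str, tool: &str) -> u64 {
    fnv1a(format!("{prev:016x}|{agent}|{tool}").as_bytes())
}

/// Appends a row whose hash covers its own fields plus the previous row's hash.
pub fn append(log: &mut Vec<Row>, agent: &str, tool: &str) {
    let prev_hash = log.last().map_or(GENESIS, |r| r.hash);
    let hash = row_hash(prev_hash, agent, tool);
    log.push(Row { agent: agent.into(), tool: tool.into(), prev_hash, hash });
}

/// Returns the index of the first row that breaks the chain, if any.
pub fn first_broken(log: &[Row]) -> Option<usize> {
    let mut prev = GENESIS;
    for (i, r) in log.iter().enumerate() {
        if r.prev_hash != prev || r.hash != row_hash(prev, &r.agent, &r.tool) {
            return Some(i);
        }
        prev = r.hash;
    }
    None
}
```

Because each hash depends on the one before it, rewriting a row would require recomputing every later row as well, which is exactly what a verifier walking the chain from the genesis hash detects.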

OPA/Rego integration. Every tools/call can be evaluated against a Rego policy file. The implementation uses regorus (a pure-Rust Rego evaluator) instead of the OPA binary or WASM runtime, which avoids an external process dependency. The policy input exposes agent ID, tool name, arguments, and client IP. The tricky part was deciding what a policy silence means: no matching rule defaults to deny, because a policy that allows everything by omission isn’t a policy.

mTLS agent identity. When mutual TLS is enabled, the client certificate’s Common Name is extracted and mapped to an agent policy entry via mtls_identity. Authentication priority is JWT Bearer → mTLS CN → API key → clientInfo.name. The CN extraction happens in the TLS handshake handler, before any JSON-RPC parsing, so a client that can’t authenticate never reaches the middleware pipeline.
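The priority order reduces to a chain of fallbacks. In this sketch the `Credentials` and `Identity` types are hypothetical, and each field is assumed to hold a value that has already been verified by its own mechanism (signature check, TLS handshake, key lookup).

```rust
/// What the transport layer managed to extract for one connection or request.
#[derive(Default)]
pub struct Credentials<'a> {
    pub jwt_subject: Option<&'a str>,      // from an already-verified JWT
    pub mtls_cn: Option<&'a str>,          // Common Name of the client certificate
    pub api_key_agent: Option<&'a str>,    // agent resolved from a valid API key
    pub client_info_name: Option<&'a str>, // self-reported, weakest signal
}

#[derive(Debug, PartialEq)]
pub enum Identity<'a> {
    Jwt(&'a str),
    Mtls(&'a str),
    ApiKey(&'a str),
    ClientInfo(&'a str),
}

/// First available source wins: JWT, then mTLS CN, then API key, then clientInfo.name.
pub fn resolve_identity<'a>(c: &Credentials<'a>) -> Option<Identity<'a>> {
    c.jwt_subject
        .map(Identity::Jwt)
        .or(c.mtls_cn.map(Identity::Mtls))
        .or(c.api_key_agent.map(Identity::ApiKey))
        .or(c.client_info_name.map(Identity::ClientInfo))
}
```

Keeping the source in the returned value lets later stages treat a self-reported `clientInfo.name` with less trust than a certificate or token.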

SSRF guard on OIDC discovery. OIDC provider config is fetched from {issuer}/.well-known/openid-configuration. An issuer URL from untrusted config could point to an internal service. The guard validates that the resolved IP is not in RFC 1918 or link-local ranges before making the fetch. This runs at startup, not per-request, but the check still needs to be airtight — a bypass here is a full SSRF.
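A minimal sketch of such a guard using `std::net`. Beyond the RFC 1918 and link-local ranges named above, it also rejects loopback, unspecified, IPv6 unique-local, and IPv4-mapped IPv6 addresses; those extra checks are assumptions about what an airtight guard would want, not a description of Arbitus's code. `is_forbidden` and `check_host` are hypothetical names.

```rust
use std::net::{IpAddr, ToSocketAddrs};

/// True if the address points somewhere an issuer URL should never reach.
pub fn is_forbidden(ip: IpAddr) -> bool {
    match ip {
        IpAddr::V4(v4) => {
            v4.is_private() || v4.is_link_local() || v4.is_loopback() || v4.is_unspecified()
        }
        IpAddr::V6(v6) => {
            // ::ffff:a.b.c.d must be judged by its embedded IPv4 address.
            if let Some(v4) = v6.to_ipv4_mapped() {
                return is_forbidden(IpAddr::V4(v4));
            }
            let first = v6.segments()[0];
            v6.is_loopback()
                || v6.is_unspecified()
                || (first & 0xffc0) == 0xfe80 // fe80::/10 link-local
                || (first & 0xfe00) == 0xfc00 // fc00::/7 unique local
        }
    }
}

/// Resolves `host` and fails if ANY resolved address is forbidden.
pub fn check_host(host: &str, port: u16) -> Result<Vec<IpAddr>, String> {
    let addrs: Vec<IpAddr> = (host, port)
        .to_socket_addrs()
        .map_err(|e| format!("resolve failed: {e}"))?
        .map(|a| a.ip())
        .collect();
    if addrs.is_empty() {
        return Err(format!("{host} resolved to no addresses"));
    }
    if let Some(ip) = addrs.iter().find(|ip| is_forbidden(**ip)) {
        return Err(format!("{host} resolves to forbidden address {ip}"));
    }
    Ok(addrs)
}
```

Returning the vetted addresses is deliberate: the fetch should connect to one of them rather than resolving the hostname a second time, otherwise a DNS answer that changes between check and fetch would bypass the guard.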

MCP Sampling security. MCP’s sampling messages flow in the opposite direction, from server to agent. The middleware pipeline was originally designed for tool calls flowing from agent to upstream; applying policies to server-initiated messages required inverting the pipeline traversal order and adding a separate sampling policy config. Immutable audit triggers now cover both directions.

What it does now

Access control

Per-agent allowlist / denylist for tools, resources, and prompts — with glob wildcards
tools/list, resources/list, and prompts/list filtered before reaching the agent
Default policy fallback for unlisted agents
Human-in-the-Loop: suspend tool calls pending operator approval via REST API
Shadow mode: intercept and log without forwarding; dry-run risky operations

Auth

JWT / OIDC — single provider or a list; Google, GitHub Actions, Auth0, Okta presets; JWKS caching
OAuth 2.1 with PKCE
API key per agent
mTLS with CN-to-agent mapping
Auth priority: JWT Bearer → mTLS CN → API key → clientInfo.name

Rate limiting

Sliding window per agent, per tool, and per IP
GCRA algorithm via governor — lock-free, O(1) per check
Standard X-RateLimit-* headers on every response
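For readers unfamiliar with GCRA, the idea fits in a few lines. This is a single-threaded sketch of the algorithm itself, not the `governor` crate's API and not its lock-free implementation. The `Gcra` type is hypothetical, and treating the limit as requests per minute is an assumption, since the config example does not state the unit of `rate_limit`.

```rust
use std::time::{Duration, Instant};

pub struct Gcra {
    emission_interval: Duration, // time "cost" of one request
    tolerance: Duration,         // how far ahead of schedule a burst may run
    tat: Option<Instant>,        // theoretical arrival time of the next request
}

impl Gcra {
    pub fn new(per_minute: u32, burst: u32) -> Self {
        assert!(per_minute > 0 && burst > 0, "limits must be positive");
        let emission_interval = Duration::from_secs(60) / per_minute;
        Self {
            emission_interval,
            tolerance: emission_interval * (burst - 1),
            tat: None,
        }
    }

    /// O(1) per check: one comparison and one addition, no per-request history.
    pub fn check(&mut self, now: Instant) -> bool {
        // An idle limiter never "banks" more than the configured burst.
        let tat = self.tat.map_or(now, |t| t.max(now));
        if tat.duration_since(now) > self.tolerance {
            return false; // too far ahead of schedule: reject, state unchanged
        }
        self.tat = Some(tat + self.emission_interval);
        true
    }
}
```

With `Gcra::new(30, 3)`, three back-to-back calls pass, the fourth is rejected, and one more slot opens every two seconds. The only state per key is a single timestamp, which is what makes the approach cheap to keep per agent, per tool, and per IP.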

Filtering

Payload filtering: block or redact with user-defined regexes
Encoding-aware: Base64, percent-encoding, double-encoding, Unicode normalization decoded before matching
Response filtering: sensitive patterns scrubbed from upstream responses before reaching the agent
Prompt injection detection: seven built-in patterns covering instruction-override, role reassignment, DAN-style jailbreaks, and prompt exfiltration
Schema validation: tools/call arguments validated against inputSchema from tools/list
OPA/Rego policy engine: arbitrary policy-as-code evaluated per tool call

Audit and observability

SQLite audit log with immutable hash chain (database-level triggers)
Webhook backend with plain JSON or CNCF CloudEvents 1.0
OpenLineage integration (SIEM/data lineage pipelines)
Fan-out to multiple backends simultaneously
X-Request-Id on every response, recorded in audit and OTel spans
Prometheus-compatible /metrics with cost/token estimation per agent
OpenTelemetry tracing via OTLP
/dashboard audit viewer with per-agent filtering
arbitus audit CLI with filtering by agent, outcome, time range

Resilience

Circuit breaker per upstream: automatic isolation and half-open recovery
Config hot-reload: SIGUSR1, automatic 30s polling, or Kubernetes ConfigMap watcher
Graceful shutdown: drain connections, close child processes, flush audit backends
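
The circuit breaker in the list above follows the usual three-state pattern. Here is a minimal sketch of that state machine; the `Breaker` type, the consecutive-failure threshold, and the fixed cooldown are assumptions for illustration, not Arbitus's actual configuration surface.

```rust
use std::time::{Duration, Instant};

#[derive(Debug, PartialEq, Clone, Copy)]
pub enum State {
    Closed,   // normal operation
    Open,     // upstream isolated, calls fail fast
    HalfOpen, // cooldown elapsed, probing with real calls
}

pub struct Breaker {
    state: State,
    failures: u32,
    threshold: u32,
    cooldown: Duration,
    opened_at: Option<Instant>,
}

impl Breaker {
    pub fn new(threshold: u32, cooldown: Duration) -> Self {
        Self { state: State::Closed, failures: 0, threshold, cooldown, opened_at: None }
    }

    /// Whether a call may go upstream right now.
    pub fn allow(&mut self, now: Instant) -> bool {
        if self.state == State::Open {
            match self.opened_at {
                Some(t) if now.duration_since(t) >= self.cooldown => {
                    self.state = State::HalfOpen;
                }
                _ => return false,
            }
        }
        true
    }

    /// Feeds the outcome of an upstream call back into the breaker.
    pub fn record(&mut self, success: bool, now: Instant) {
        if success {
            self.state = State::Closed;
            self.failures = 0;
        } else {
            self.failures += 1;
            // A failed probe reopens immediately; otherwise wait for the threshold.
            if self.state == State::HalfOpen || self.failures >= self.threshold {
                self.state = State::Open;
                self.opened_at = Some(now);
            }
        }
    }

    pub fn state(&self) -> State {
        self.state
    }
}
```

One breaker per upstream keeps a failing server from consuming request timeouts for every agent while the others continue to be served.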

Supply chain

Verify upstream MCP server binaries via SHA-256 hash pinning
Cosign/Sigstore bundle verification before spawning

Infrastructure

HTTP+SSE and stdio transports; TLS and mTLS
Tool federation: aggregate tools from multiple upstreams into a single agent view
OpenAI Tools Bridge: /openai/v1/tools and /openai/v1/execute for OpenAI function-calling clients
Multi-arch Docker image (ghcr.io/arbitusgateway/arbitus)
Helm chart with sidecar pattern, HPA, PDB, NetworkPolicy, PVC
Kubernetes ConfigMap watcher for zero-touch config updates
Secrets-safe config: ${VAR} interpolation and ARBITUS_* env var overrides

The landscape

agentgateway (Linux Foundation) is the most mature project in this space — Rust core, WASM plugin system, multi-protocol. It’s the right choice for platform-level infrastructure that needs to handle arbitrary agent protocols.

Arbitus bets on a different scope: deep MCP-specific security, not a generalized gateway. The features that matter for that bet — HITL, shadow mode, OPA policy evaluation, encoding-aware filtering, mTLS agent identity, supply-chain verification, immutable audit — don’t make sense as generic gateway plugins. They’re built around the semantics of MCP tool calls specifically.

The install path is still cargo install arbitus with a single YAML file. The Kubernetes path is a Helm chart with a sidecar pattern. No commercial tier, no managed service requirement.

Full source at github.com/arbitusgateway/arbitus. Documentation at arbitus-gateway.xyz.
