Security Architecture

Netclaw gives an LLM shell access, file access, and the ability to talk to external services. Without strong guardrails, that’s a loaded weapon pointed at your infrastructure. The security architecture exists to make the agent useful without making it dangerous.

All of this is enforced automatically from daemon startup — no additional setup required for the defaults to protect you.

The core bet: explicit grants are safer than implicit restrictions. Rather than trying to enumerate everything dangerous and blocking it, netclaw starts from zero permissions and requires you to opt in to each capability. This inverts the typical “sandbox” approach — instead of poking holes in a wall, you’re building up from nothing.

Most tools start permissive and add restrictions. Netclaw does the opposite. A freshly initialized daemon with no config file binds to loopback, disables shell access, and limits tools to basic file operations in a temporary session directory that’s wiped on session end. You build up from there.

Misconfiguration fails safe. Forget to add a tool grant? The tool is invisible. Typo in a channel ID? That channel gets nothing. Config file corrupted? Daemon won’t start.
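The fail-safe behavior falls out of treating every lookup as deny-by-default. A minimal sketch, assuming a hypothetical grant table (names are illustrative, not netclaw's real API): a missing entry, a typo'd key, and an explicit deny are all indistinguishable.

```python
# Fresh daemon: no grants at all. A typo'd audience key or an absent
# tool entry simply yields an empty set -- i.e., invisible.
DEFAULT_GRANTS: dict[str, set[str]] = {}

def tool_visible(audience: str, tool: str,
                 grants: dict[str, set[str]] = DEFAULT_GRANTS) -> bool:
    # Nothing here can "fail open": the only way to return True is an
    # explicit entry for exactly this audience and tool.
    return tool in grants.get(audience, set())
```

Granting is then always an explicit, positive act: `tool_visible("personal", "shell", {"personal": {"shell"}})` is the only shape that passes.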

Rather than per-user ACLs (which require identity infrastructure most self-hosted deployments don’t have), netclaw classifies trust by channel type. A message from the TUI (terminal interface, launched by netclaw chat) is from the operator sitting at the machine — high trust. A message from Slack could be anyone in the workspace — lower trust. An unknown channel gets the lowest trust.

Three audiences, ordered by trust:

Personal > Team > Public

Posture selection during netclaw init — the choice that establishes the baseline trust tier for your deployment.

This maps to how people actually deploy: you trust yourself on your own machine, partially trust your coworkers in Slack, and don’t trust unknown sources at all. The exact channel-to-audience mapping and per-audience permission tables are in the Security Model reference.
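The classification can be pictured as an ordered enum with an unknown-falls-lowest rule. This is a sketch under assumed names (the real channel-to-audience table is in the Security Model reference):

```python
from enum import IntEnum

class Audience(IntEnum):
    # Ordered so that a higher value means higher trust.
    PUBLIC = 0
    TEAM = 1
    PERSONAL = 2

# Illustrative mapping only -- two examples from the prose above.
CHANNEL_AUDIENCE = {
    "tui": Audience.PERSONAL,   # operator at the local terminal
    "slack": Audience.TEAM,     # could be anyone in the workspace
}

def classify(channel_type: str) -> Audience:
    # Unknown channel types fall to the lowest trust tier.
    return CHANNEL_AUDIENCE.get(channel_type, Audience.PUBLIC)
```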

A single permission check is a single point of failure. Netclaw stacks four independent defense-in-depth layers:

┌─────────────────────────────────┐
│ 1. Operation Hard Deny │ ← unconditional, not overridable
├─────────────────────────────────┤
│ 2. Resource Hard Deny │ ← path-based, symlink-aware
├─────────────────────────────────┤
│ 3. Tool Access Grant │ ← audience-scoped allowlist
├─────────────────────────────────┤
│ 4. Approval Gate │ ← human in the loop
└─────────────────────────────────┘

Each layer can only deny — none can override a denial from a layer above it. Layer 1 blocks rm -rf / regardless of audience or approval status. Layer 2 blocks credential access even for Personal. Layer 3 controls what the model can see. Layer 4 adds human oversight for high-risk operations.
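The deny-only property can be sketched as a short-circuiting pipeline, with heavily simplified stand-ins for each layer (these lambdas are illustrations, not netclaw's actual rules):

```python
def check(request: dict, layers) -> bool:
    # all() short-circuits: the first layer to deny ends evaluation,
    # and no later layer gets a chance to override that denial.
    return all(layer(request) for layer in layers)

LAYERS = [
    lambda r: r["command"] != "rm -rf /",               # 1. operation hard deny
    lambda r: "/.ssh/" not in r["path"],                # 2. resource hard deny (simplified)
    lambda r: r["tool"] in r["grants"],                 # 3. audience-scoped tool grant
    lambda r: r.get("approved", True),                  # 4. approval gate
]
```

Note what the structure forbids: approval (layer 4) is only ever consulted after layers 1-3 have passed, so a human "yes" cannot un-deny a hard-denied operation.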

This is a conceptual model — the actual enforcement runs across both the gateway (for inbound message ACLs) and the session layer (for tool execution policy). What matters: no tool executes without passing all four checks, regardless of where in the code they run. See the reference page for exact deny lists and configuration syntax.

Inbound message ACL checks run in the gateway boundary, before messages reach the session layer. Tool execution policy (shell denies, path checks, approval gates) runs in the session’s tool executor, synchronously before any tool fires. The three-boundary design (gateway → session → subscriber) enforces this structurally — the gateway is the only ingress point for messages, and the tool executor is the only execution path for tools.

Human-in-the-loop as a layer, not a crutch

Approval gates sit at Layer 4 — after hard denies and access grants have already filtered. The human only sees requests that are structurally permitted but operationally risky. You’re not approving every ls — you’re approving git push to a remote.

The approval system extracts verb-chain patterns from shell commands. A verb-chain is the leading command tokens (git push, docker compose up) without paths or flags — so approving git push covers git push origin main and git push --force-with-lease. Compound commands (cmd1 && cmd2) are split and each segment is checked independently.
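A sketch of that matching logic, under assumptions (function names and the exact flag/segment handling are illustrative, not netclaw's implementation): drop flag tokens, split compound commands, then test whether an approved verb-chain is a prefix of each segment.

```python
import re
import shlex

def segments(command: str) -> list[list[str]]:
    # Split compound commands on &&, ||, ; and tokenize each piece,
    # dropping flag tokens (anything starting with "-").
    return [
        [t for t in shlex.split(seg) if not t.startswith("-")]
        for seg in re.split(r"&&|\|\||;", command)
        if seg.strip()
    ]

def covered(command: str, approved: set[str]) -> bool:
    # A segment is covered when some approved verb-chain is a prefix of
    # its tokens; every segment must be covered independently.
    def seg_ok(tokens: list[str]) -> bool:
        return any(tokens[: len(p.split())] == p.split() for p in approved)
    return all(seg_ok(s) for s in segments(command))
```

With `{"git push"}` approved, `git push origin main` and `git push --force-with-lease` both pass, while `git push && rm -rf /tmp/x` fails because the second segment has no approved verb-chain.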

Channels that don’t support interactive prompts (scheduled reminders, webhooks, headless automation) auto-deny tools that require approval — unless those tools have been persistently pre-approved or appear on the safe-list for non-interactive execution.
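That decision reduces to a three-way outcome. A sketch with illustrative names (not netclaw's real signature):

```python
def resolve_approval(tool: str, interactive: bool,
                     pre_approved: set[str], safe_list: set[str]) -> str:
    if interactive:
        return "prompt"     # a human is present: ask them
    if tool in pre_approved or tool in safe_list:
        return "allow"      # persistently pre-approved or safe-listed
    return "deny"           # headless channels fail closed
```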

Every inbound message carries a trust context assembled from:

| Signal | Source | Purpose |
| --- | --- | --- |
| Principal classification | Channel adapter | Who is speaking? (operator, team member, external, automation) |
| Transport authenticity | Connection type | Can we verify the sender? (local process, verified, unverified) |
| Payload taint | Content origin | Where did this data come from? (trusted, community, public) |
| Audience | Channel type | What trust level does this channel get? |

The audience resolves the final permission set. The other signals flow through to audit logs for post-hoc review.
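As a rough shape (field names here are assumptions, not netclaw's actual types), the context is an immutable record attached to each message:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class TrustContext:
    principal: str   # operator | team_member | external | automation
    transport: str   # local_process | verified | unverified
    taint: str       # trusted | community | public
    audience: str    # the one field that resolves permissions

# Only `audience` gates tool access; the other three fields are carried
# through to the audit log for post-hoc review.
```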

External tool servers connecting via the Model Context Protocol (MCP) are doubly gated:

  1. Server allowlist — the MCP server must be explicitly permitted for the audience
  2. Tool allowlist — if per-tool grants are configured for that server, only listed tools are available

When no per-tool grants are configured for a server, all tools on that server are accessible (subject to the server-level gate). Configure McpServerToolGrants to restrict individual tools — see netclaw mcp permissions for the operational details.
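The two gates compose like this; a sketch with an assumed signature (the real configuration lives in McpServerToolGrants and `netclaw mcp permissions`):

```python
def mcp_tool_allowed(server: str, tool: str, audience: str,
                     server_allow: dict[str, set[str]],
                     tool_grants: dict[str, set[str]]) -> bool:
    # Gate 1: the server must be allowlisted for this audience.
    if server not in server_allow.get(audience, set()):
        return False
    # Gate 2: if per-tool grants exist for the server, the tool must be
    # listed; with no per-tool grants configured, all tools pass gate 2.
    grants = tool_grants.get(server)
    return grants is None or tool in grants
```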

Netclaw doesn’t load MCP tools into the model’s context until explicitly searched via search_tools. Granted-but-unused tools don’t consume context window or influence behavior — the model doesn’t know they exist until it asks.

The agent can modify its own personality, instructions, scheduled tasks, and project registry — but security policy is off-limits. ACL rules, exposure mode, tool grants, and audience mappings are read-only from the agent’s perspective.

This is enforced at two levels: the config_write tool category (one of seven grant categories in the ACL policy) requires explicit grants, and the resource hard-deny layer blocks direct file access to config paths regardless of tool grants.

What the security model protects against:

  • The LLM deciding to run destructive commands
  • Prompt injection via tool output or file contents attempting privilege escalation
  • Overly broad tool access from misconfiguration
  • Credential leakage through command output
  • Unauthorized network exposure

What it explicitly does not protect against:

  • A malicious operator with direct machine access (they can edit config files)
  • Side-channel attacks against the LLM provider’s API
  • Denial of service against the daemon process itself
  • Vulnerabilities in third-party MCP servers (netclaw gates access, not behavior)
  • Novel prompt-injection phrasings: detection is regex-based, not semantic, so unrecognized wordings can evade it
  • Gated tools on headless channels: scheduled jobs, reminders, and webhooks with no connected user auto-deny gated tools unless persistently pre-approved
  • Per-user identity within a channel: all Slack users in an allowed channel get the same audience
  • Secrets in unrecognized formats: redaction catches known patterns only, so custom formats need custom path deny rules
  • Kernel-level sandbox escapes: shell execution has no seccomp or namespace isolation; defense is policy-based, not kernel-based