
Health Checks

The daemon exposes two HTTP health endpoints for container orchestration, load balancers, and monitoring systems.

| Endpoint | Auth | Purpose |
| --- | --- | --- |
| GET /api/health/ready | None | Liveness/readiness probe |
| GET /api/health/status | Required | Full subsystem status |

Both listen on the daemon’s configured port (default 5199).

curl -sf http://127.0.0.1:5199/api/health/ready

Returns 200 OK with the plain-text body healthy when the daemon is accepting requests. No authentication, no JSON, just a string. Use it for Docker HEALTHCHECK, Kubernetes liveness probes, and load balancer health checks.

Any non-200 response (or connection refused) means the daemon isn’t ready.
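For startup scripts that need to block until the daemon is up, the readiness check can be wrapped in a polling loop. A minimal sketch; the function name, default attempt count, and sleep interval are illustrative, not part of the daemon:

```shell
#!/bin/sh
# wait_ready: poll the readiness endpoint until it answers 200 OK or the
# attempt budget is exhausted. Returns 0 on success, 1 on timeout.
wait_ready() {
  url=${1:-http://127.0.0.1:5199/api/health/ready}
  attempts=${2:-30}
  i=0
  while [ "$i" -lt "$attempts" ]; do
    if curl -sf "$url" >/dev/null 2>&1; then
      return 0   # daemon answered 200 OK
    fi
    i=$((i + 1))
    sleep 1
  done
  return 1       # never became ready
}
```

Because curl's `-f` flag turns any non-2xx status into a nonzero exit code, the same loop covers both "connection refused" and "daemon up but unhealthy".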

The official container image uses this:

HEALTHCHECK --interval=15s --timeout=5s --start-period=30s --retries=3 \
  CMD curl -sf http://127.0.0.1:5199/api/health/ready || exit 1

The 30-second start period gives the daemon time to initialize providers and connect channels before the first probe fires.

livenessProbe:
  httpGet:
    path: /api/health/ready
    port: 5199
  initialDelaySeconds: 30
  periodSeconds: 15
  timeoutSeconds: 5
  failureThreshold: 3

For systemd-managed daemons, use an ExecStartPost check or a watchdog timer:

[Service]
ExecStartPost=/bin/sh -c 'until curl -sf http://127.0.0.1:5199/api/health/ready; do sleep 2; done'
curl -s http://127.0.0.1:5199/api/health/status \
  -H "Authorization: Bearer $(netclaw auth token)"

Returns a JSON object with the state of every subsystem. This is the same data that netclaw status displays — the CLI just formats it as a table.

Requires authentication (bearer token or loopback origin). Returns 401 without valid credentials.
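A script can distinguish "daemon up but token rejected" (401) from "daemon unreachable" by inspecting the HTTP status code rather than curl's exit code. A sketch; the `TOKEN` variable is a stand-in for however you store the bearer token:

```shell
#!/bin/sh
# Print the HTTP status code only; curl emits 000 when the connection fails.
code=$(curl -s -o /dev/null -w '%{http_code}' \
  -H "Authorization: Bearer ${TOKEN:-}" \
  http://127.0.0.1:5199/api/health/status)
case "$code" in
  200) echo "authenticated" ;;
  401) echo "token rejected" ;;
  *)   echo "daemon unreachable (code=$code)" ;;
esac
```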

{
  "overall": "healthy",
  "build": {
    "version": "0.4.2",
    "commitHash": "a1b2c3d",
    "buildTimestamp": "2026-05-01T12:00:00Z"
  },
  "process": {
    "pid": 1234,
    "startedAtUtc": "2026-05-05T08:00:00Z",
    "uptimeSeconds": 3600
  },
  "connectors": [
    {
      "key": "slack",
      "displayName": "Slack",
      "enabled": true,
      "status": "healthy",
      "message": null
    },
    {
      "key": "discord",
      "displayName": "Discord",
      "enabled": true,
      "status": "disconnected",
      "message": "Gateway timeout"
    },
    {
      "key": "mcp:github",
      "displayName": "GitHub MCP",
      "enabled": true,
      "status": "healthy",
      "message": null
    }
  ],
  "model": {
    "modelId": "anthropic/claude-sonnet-4",
    "displayName": "Claude Sonnet 4",
    "provider": "openrouter",
    "inputModalities": ["text", "image"],
    "outputModalities": ["text"],
    "contextWindow": 200000
  },
  "persistence": {
    "provider": "sqlite"
  },
  "memory": {
    "provider": "sqlite",
    "status": "healthy",
    "databasePath": "/root/.netclaw/memory/netclaw-memory.db",
    "pendingCheckpoints": 0
  },
  "reminders": {
    "scheduledCount": 3,
    "activeExecutions": 0,
    "failedCount": 0
  },
  "telemetry": {
    "enabled": true,
    "otlpEndpoint": "http://localhost:4317"
  },
  "update": {
    "state": "up-to-date",
    "available": false,
    "currentVersion": "0.4.2",
    "latestVersion": "0.4.2"
  }
}

The overall field is computed from connector states:

| Overall | Condition |
| --- | --- |
| healthy | All enabled connectors are healthy |
| degraded | Any enabled connector is disconnected, degraded, auth-required, or auth-failed |

The HTTP status code is always 200 — parse the overall field to determine health. This avoids false-positive container restarts when a single channel has a transient disconnect.
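Since the HTTP code carries no signal, a monitoring script has to parse the body. A minimal sketch using only POSIX sed; the sample payload is illustrative, and in practice you would pipe the live curl output in (jq, if available, is more robust than sed for this):

```shell
#!/bin/sh
# Extract the "overall" field from a status response and react to it.
response='{"overall":"degraded","connectors":[]}'
overall=$(printf '%s' "$response" \
  | sed -n 's/.*"overall"[[:space:]]*:[[:space:]]*"\([^"]*\)".*/\1/p')
if [ "$overall" = "healthy" ]; then
  echo "all subsystems healthy"
else
  echo "attention needed: overall=$overall"
fi
```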

Each connector (Slack, Discord, MCP servers) reports one of:

| Status | Meaning |
| --- | --- |
| healthy | Connected and operational |
| degraded | Partially functional (e.g., reconnecting) |
| disconnected | Connection lost |
| auth-required | Needs OAuth flow (MCP servers) |
| auth-failed | Credentials rejected |
| disabled | Turned off in config |

The memory subsystem reports one of:

| Status | Meaning |
| --- | --- |
| healthy | Database accessible, checkpoint backlog ≤ 25 |
| degraded | Checkpoint backlog growing (> 25 pending) |
| unavailable | Database unreachable |
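The backlog threshold above can be checked directly from the `memory` object in the status payload. A sketch with a hard-coded sample value; a real check would substitute the live response:

```shell
#!/bin/sh
# Pull pendingCheckpoints out of the memory object and compare against the
# documented backlog threshold of 25.
mem='{"provider":"sqlite","status":"healthy","pendingCheckpoints":30}'
pending=$(printf '%s' "$mem" \
  | sed -n 's/.*"pendingCheckpoints"[[:space:]]*:[[:space:]]*\([0-9][0-9]*\).*/\1/p')
if [ "${pending:-0}" -gt 25 ]; then
  echo "memory degraded: $pending checkpoints pending"
fi
```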

Poll /api/health/status on an interval and extract metrics:

# Simple availability check (no auth needed)
curl -sf http://127.0.0.1:5199/api/health/ready && echo 1 || echo 0
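The same 1/0 signal can be emitted in a scrape-friendly form, for example for the Prometheus node_exporter textfile collector. A sketch; the metric name is illustrative, not something the daemon defines:

```shell
#!/bin/sh
# Turn the readiness probe into a gauge line a textfile collector can scrape.
if curl -sf http://127.0.0.1:5199/api/health/ready >/dev/null 2>&1; then
  up=1
else
  up=0
fi
echo "netclaw_ready $up"
```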

For richer metrics, use the OpenTelemetry integration, which exports to OTLP directly.

When a connector transitions to disconnected or auth-failed, the daemon fires an operational alert to configured webhook targets. You don’t need to poll the status endpoint to detect failures — alerts push to you.