# OpenTelemetry

Netclaw can push metrics and structured logs to any OpenTelemetry Protocol (OTLP) compatible collector. Flip it on, point it at your collector, and your existing backend (Grafana, Datadog, Honeycomb, whatever speaks OTLP) gets per-channel message flow metrics, token consumption counters, and full daemon logs.

## Configuration

Merge a `Telemetry` block into your existing `~/.netclaw/config/netclaw.json`:

```json
{
  "Telemetry": {
    "Enabled": true,
    "Otlp": {
      "Endpoint": "http://127.0.0.1:4317"
    }
  }
}
```

| Field | Type | Default | Description |
| --- | --- | --- | --- |
| `Enabled` | bool | `false` | Turns on the OTLP export pipeline |
| `Otlp:Endpoint` | string | `http://127.0.0.1:4317` | OTLP collector endpoint (gRPC) |

Netclaw uses gRPC OTLP on port 4317, not HTTP/Protobuf (4318). If your collector only accepts HTTP OTLP, you’ll get silent failures.
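
If you run the collector yourself, it's worth confirming the gRPC receiver is actually enabled. A minimal receiver block (standard collector config, nothing Netclaw-specific):

```yaml
# Collector-side: enable the OTLP gRPC receiver on the port Netclaw dials.
receivers:
  otlp:
    protocols:
      grpc:
        endpoint: 0.0.0.0:4317
```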

Environment variable overrides follow the .NET double-underscore convention:

```sh
export NETCLAW_Telemetry__Enabled="true"
export NETCLAW_Telemetry__Otlp__Endpoint="http://127.0.0.1:4317"
```

Telemetry config changes need a daemon restart. Run `netclaw doctor` first to catch config errors before you bounce the daemon:

```sh
netclaw doctor
netclaw daemon stop && netclaw daemon start
```

If `Otlp:Endpoint` isn't a valid absolute URI, the daemon refuses to start, even with telemetry disabled:

```
Telemetry:Otlp:Endpoint must be an absolute URI.
```

`netclaw doctor` catches this before you hit it at startup. It validates the endpoint format and warns when telemetry is on but no explicit endpoint is set.

The OTel resource service name is `netclawd` (hardcoded, not configurable).
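
Since the service name can't be changed, one way to tell deployments apart is to tag the data in the collector instead. A sketch using the collector's resource processor (the `deployment.environment` key and `prod` value are just examples):

```yaml
# Collector-side sketch: stamp an environment attribute onto everything
# arriving from netclawd, since service.name itself is fixed.
processors:
  resource:
    attributes:
      - key: deployment.environment
        value: prod
        action: upsert
```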

## What gets exported

Every log line the daemon produces goes to your collector, with full formatting and scope data (`IncludeFormattedMessage`, `IncludeScopes`, and `ParseStateValues` are all enabled). Two meters cover metrics: one for session-level token usage, one for per-channel message flow. Full reference below.

Distributed tracing is off for now. The cross-actor model produces disconnected spans with no meaningful causality chain, so it’s more noise than signal.

## Session metrics

Token consumption and turn tracking across all sessions.

| Metric | Type | Description |
| --- | --- | --- |
| `netclaw.session.tokens.input` | Counter | Input tokens consumed |
| `netclaw.session.tokens.output` | Counter | Output tokens consumed |
| `netclaw.session.turns.completed` | Counter | Conversation turns completed |

These are aggregate totals across all models and providers, with no per-model or per-provider attribute breakdowns.
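
If the collector forwards metrics to Prometheus, a total token burn rate looks roughly like the sketch below. The metric names assume the Prometheus exporter's usual translation (dots to underscores, `_total` suffix on counters); check what your exporter actually emits.

```promql
# Combined input + output token rate across all sessions (hypothetical
# Prometheus-translated names for netclaw.session.tokens.input/output).
sum(rate(netclaw_session_tokens_input_total[5m]))
  + sum(rate(netclaw_session_tokens_output_total[5m]))
```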

## Channel metrics

Per-channel message pipeline metrics. Each metric name is prefixed with `netclaw.channel.{channel-type}`, where the channel type is one of `slack`, `tui`, `headless`, `signalr`, `reminder`, `webhook`, or `discord` (so, for example, `netclaw.channel.slack.events.received`).

| Metric suffix | Type | Attributes | Description |
| --- | --- | --- | --- |
| `.events.received` | Counter | `kind` | Inbound events received |
| `.events.dropped` | Counter | `reason` | Events dropped before processing |
| `.events.filtered` | Counter | `reason` | Events filtered by policy |
| `.events.routed` | Counter | `kind` | Events that reached conversation routing |
| `.messages.enqueued` | Counter | | Messages accepted into the session queue |
| `.replies.posted` | Counter | | Successful replies sent |
| `.replies.rejected` | Counter | `error_code` | Rejected reply attempts |
| `.replies.failed` | Counter | | Failed reply attempts |
| `.reply.duration.ms` | Histogram | | Reply post latency in milliseconds (includes both successful and failed attempts) |

`Netclaw.Webhooks` (the meter behind `netclaw stats`) is not wired into OTLP. The `netclaw.channel.webhook.*` metrics above cover message flow through the webhook channel; per-route delivery stats only show up in `netclaw stats`.

## Diagnostic patterns

A few symptom-to-metric mappings worth watching:

| Symptom | What to check |
| --- | --- |
| No replies | `events.received` > 0 but `replies.posted` = 0: events arrive, nothing comes back |
| Looping agent | `turns.completed` climbing without corresponding `replies.posted` |
| Policy dropping messages | `events.dropped` with the `reason` attribute showing the cause |
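
As a concrete starting point, the no-replies check could be expressed in PromQL along these lines (same naming caveats as the session-metrics sketch above; `slack` is just an example channel):

```promql
# Events arriving on the slack channel but no replies going out over 15m.
sum(rate(netclaw_channel_slack_events_received_total[15m])) > 0
  and sum(rate(netclaw_channel_slack_replies_posted_total[15m])) == 0
```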

## Collector setup

Point the endpoint at any OpenTelemetry Collector that accepts gRPC OTLP. A minimal local setup with Docker:

```sh
docker run -d --name otel-collector \
  -p 4317:4317 \
  otel/opentelemetry-collector-contrib:latest
```

Route to your backend (Prometheus, Grafana Cloud, Datadog, etc.) via the collector’s exporter configuration.
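
Here's what a usable local config might look like, with the `debug` exporter standing in for a real backend (swap in your backend's exporter; names and options vary by collector distribution):

```yaml
# otel-collector-config.yaml: sketch that receives gRPC OTLP from Netclaw
# and dumps everything to the collector's own log for inspection.
receivers:
  otlp:
    protocols:
      grpc:
        endpoint: 0.0.0.0:4317
exporters:
  debug:
    verbosity: detailed
service:
  pipelines:
    metrics:
      receivers: [otlp]
      exporters: [debug]
    logs:
      receivers: [otlp]
      exporters: [debug]
```

Mount it into the container from the `docker run` above with `-v $(pwd)/otel-collector-config.yaml:/etc/otelcol-contrib/config.yaml`.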

For a quick local stack, Grafana OTel-LGTM bundles the collector, Prometheus, Loki, and Grafana in one container. Good for kicking the tires before committing to a production backend.
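
If you go that route, the one-liner is roughly (double-check Grafana's docs for current ports and tags):

```sh
# Grafana's all-in-one OTel backend: collector + Prometheus + Loki + Grafana.
docker run -d --name lgtm \
  -p 3000:3000 -p 4317:4317 -p 4318:4318 \
  grafana/otel-lgtm
```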

## Verifying

After restarting the daemon with telemetry enabled:

1. `netclaw status` should show the telemetry row as enabled with your endpoint.
2. `netclaw doctor` validates the endpoint URI format.
3. Send a test message through any channel and look for `netclaw.channel.*` metrics in your collector (if nothing appears, try the connectivity check below).
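
If no metrics show up, rule out basic connectivity before digging into config. For a local collector:

```sh
# Is anything listening on the OTLP gRPC port?
nc -vz 127.0.0.1 4317
```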

Once data is flowing, build dashboards around the diagnostic queries above. Operational Alerts covers Netclaw's built-in webhook notifications, which work alongside OTel-based alerting.