Self-Hosted Providers

Self-hosted providers run inference on your own hardware. No API keys, no data leaving your network. Netclaw has two provider types for this: native Ollama integration and an OpenAI-compatible mode that works with any server exposing a /v1/chat/completions endpoint — llama.cpp, vLLM, Lemonade, or anything else OpenAI-compatible.

Config goes in ~/.netclaw/config/netclaw.json. Self-hosted providers don’t need credentials unless you’re running an authenticated endpoint. Environment variables with the NETCLAW_ prefix override file-based config.

netclaw init handles provider setup interactively if you’re starting fresh. This page covers manual configuration.

For cloud-hosted providers (OpenRouter, Anthropic, OpenAI), see Managed Providers.

Prerequisites:

  • Ollama or another inference server installed and running
  • netclaw init completed (or you’re configuring manually for the first time)
  • The netclaw daemon running (netclaw daemon start)
Provider types:

| Type | Display Name | Default Endpoint | Auth | Use Case |
| --- | --- | --- | --- | --- |
| ollama | Ollama | http://localhost:11434 | None | Ollama servers (auto-detects model capabilities) |
| openai-compatible | llama.cpp / vLLM | http://localhost:11434 | Optional Bearer token | Anything exposing /v1/chat/completions |

Provider configuration fields:

| Field | Type | Default | Description |
| --- | --- | --- | --- |
| Type | string | "ollama" | Provider SDK: ollama or openai-compatible |
| Endpoint | string | varies by type | Base URL of the inference server |
| ApiKey | string? | null | Optional Bearer token for authenticated endpoints |

Ollama is the default provider. Netclaw discovers models via /api/tags and detects per-model capabilities through /api/show.

Ollama must have at least one model pulled before netclaw can use it — an empty model list is treated as a probe failure. Pull a model first:

ollama pull qwen3:30b

Browse available models at the Ollama model library.
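After pulling, you can confirm the model is visible on the same endpoint netclaw probes for discovery (assuming Ollama on the default port):

# List pulled models; netclaw reads this same response during discovery
curl http://localhost:11434/api/tags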

{
  "Providers": {
    "local": {
      "Type": "ollama",
      "Endpoint": "http://localhost:11434"
    }
  },
  "Models": {
    "Main": { "Provider": "local", "ModelId": "qwen3:30b" }
  }
}

If Ollama is running locally, no other config is needed.

To use a remote Ollama instance, point the endpoint at any machine on your network:

{
  "Providers": {
    "gpu-box": {
      "Type": "ollama",
      "Endpoint": "http://192.168.1.50:11434"
    }
  }
}

Netclaw queries /api/show for each model and inspects the architecture metadata:

| Capability | Detection Method | Example Models |
| --- | --- | --- |
| Context window | {arch}.context_length field | All models |
| Vision | {arch}.vision.block_count field | llava, llama3.2-vision |
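You can inspect the same metadata by hand. A minimal check, assuming a vision model such as llava is already pulled; newer Ollama releases accept a model field in the request body, older ones expect name:

# Dump model metadata; model_info contains the {arch}-prefixed fields above
curl http://localhost:11434/api/show -d '{"model": "llava"}'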

Models without native tool calling still work — netclaw falls back to structured prompting for tool use.

(Screenshot: Ollama provider setup during the init wizard)

The init wizard auto-discovers models pulled in your local Ollama instance and lets you assign them to roles.

OpenAI-Compatible (llama.cpp, vLLM, Lemonade)


For any inference server with a /v1/chat/completions endpoint — llama.cpp, vLLM, Lemonade, or anything else OpenAI-compatible. Netclaw discovers models via /v1/models and streams completions with tool calling.
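To verify a server speaks the expected protocol before wiring it in, query the discovery endpoint directly (assuming llama-server on its default port 8080):

# List advertised models; netclaw uses this same endpoint for discovery
curl http://localhost:8080/v1/models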

{
  "Providers": {
    "llama-server": {
      "Type": "openai-compatible",
      "Endpoint": "http://localhost:8080"
    }
  },
  "Models": {
    "Main": { "Provider": "llama-server", "ModelId": "my-model" }
  }
}

Note: the netclaw default endpoint for openai-compatible is http://localhost:11434, but llama-server defaults to port 8080 — specify the endpoint explicitly when using llama.cpp.

Some deployments protect the API with a Bearer token. Store tokens in secrets.json to keep credentials out of version-controlled config:

{
  "Providers": {
    "vllm-cluster": { "ApiKey": "your-token-here" }
  }
}

The main config in netclaw.json just references the provider without the key:

{
  "Providers": {
    "vllm-cluster": {
      "Type": "openai-compatible",
      "Endpoint": "https://inference.internal:8443"
    }
  }
}
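To confirm the token is accepted before restarting the daemon, you can probe the endpoint with the same Bearer header netclaw will send (host and token taken from the examples above):

# A 200 response with a model list means the token works; 401 means it was rejected
curl -H "Authorization: Bearer your-token-here" https://inference.internal:8443/v1/models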

Start llama-server, then point netclaw at it:

# Start llama-server (defaults to port 8080)
llama-server -m ./models/qwen3-30b-q4_k_m.gguf --port 8080
# Configure netclaw
netclaw provider add llama-local openai-compatible --endpoint http://localhost:8080
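Before assigning models, a quick end-to-end smoke test against the completions endpoint can save a debugging round trip. This is a plain OpenAI-style request, assuming llama-server is up on port 8080 (llama-server serves a single model, so the model field can typically be omitted):

# One-shot chat completion; any coherent reply means the server is ready
curl http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"messages": [{"role": "user", "content": "Say hello"}]}'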

Mix provider types freely. Here’s Ollama handling Main with a llama.cpp instance as Fallback:

{
  "Providers": {
    "ollama-local": {
      "Type": "ollama",
      "Endpoint": "http://localhost:11434"
    },
    "llama-gpu": {
      "Type": "openai-compatible",
      "Endpoint": "http://localhost:8080"
    }
  },
  "Models": {
    "Main": { "Provider": "ollama-local", "ModelId": "qwen3:30b" },
    "Fallback": { "Provider": "llama-gpu", "ModelId": "qwen3:14b" }
  }
}

Assign models to roles with netclaw model set.
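For example, to point the Fallback role at the llama.cpp instance from the config above (the argument order here is an assumption; check netclaw model set --help for the exact syntax):

# Assumed syntax: role, provider name, model ID
netclaw model set Fallback llama-gpu qwen3:14b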

# Point at a remote Ollama instance
export NETCLAW_Providers__local__Type="ollama"
export NETCLAW_Providers__local__Endpoint="http://gpu-server:11434"
# Override model assignment
export NETCLAW_Models__Main__Provider="local"
export NETCLAW_Models__Main__ModelId="qwen3:30b"

Environment variables follow the .NET configuration convention — double underscores separate path segments.

Netclaw probes each provider on startup — /api/tags for Ollama, /v1/models for openai-compatible. Each probe times out after 10 seconds.
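You can reproduce a probe by hand with the same timeout to tell a slow server from an unreachable one (assuming the default Ollama endpoint):

# Mirror the startup probe: give up after 10 seconds, as netclaw does
curl --max-time 10 http://localhost:11434/api/tags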

Common issues:

| Symptom | Cause | Fix |
| --- | --- | --- |
| Connection refused | Server not running | Start Ollama (ollama serve) or llama-server |
| Empty model list | No models pulled | Run ollama pull <model> |
| Timeout | Server overloaded or wrong port | Check endpoint URL and server logs |
| 401 Unauthorized | Token required | Add ApiKey to provider config |

Run netclaw doctor for a full connectivity diagnostic, or open netclaw provider to see live health status.

After editing netclaw.json, restart the daemon for changes to take effect:

netclaw daemon restart

Verify your provider is healthy:

netclaw provider # check health indicators in the TUI
netclaw doctor # full diagnostic including provider probes

(Screenshot: Provider Manager TUI showing configured providers with health status)

Self-hosted entries show a healthy status when the server is reachable and models are discovered, and an error status when the server is down or returns errors. Select a provider to manage endpoints, re-probe, or remove it.

  • Changing providers requires a daemon restart.
  • Tool calling quality varies between models. Qwen3 30B+ and Llama 3.1 70B+ handle it well; smaller models often choke on complex tool schemas.
  • The openai-compatible provider sends the standard OpenAI tool-calling format. For servers that don’t implement tool calling, netclaw falls back to structured prompting.