Self-Hosted Providers

Self-hosted providers run inference on your own hardware. No API keys, no data leaving your network. Netclaw has two provider types for this: native Ollama integration and an OpenAI-compatible mode that works with any server exposing a /v1/chat/completions endpoint — llama.cpp, vLLM, Lemonade, or anything else OpenAI-compatible.

Config goes in ~/.netclaw/config/netclaw.json. Self-hosted providers don’t need credentials unless you’re running an authenticated endpoint. Environment variables with the NETCLAW_ prefix override file-based config.

netclaw init handles provider setup interactively if you’re starting fresh. This page covers manual configuration.

For cloud-hosted providers (OpenRouter, Anthropic, OpenAI), see Managed Providers.

Prerequisites:

  • Ollama or another inference server installed and running
  • netclaw init completed (or you’re configuring manually for the first time)
  • The netclaw daemon running (netclaw daemon start)
Provider types:

| Type | Display Name | Default Endpoint | Auth | Use Case |
| --- | --- | --- | --- | --- |
| ollama | Ollama | http://localhost:11434 | None | Ollama servers (auto-detects model capabilities) |
| openai-compatible | llama.cpp / vLLM | http://localhost:11434 | Optional Bearer token | Anything exposing /v1/chat/completions |

Provider configuration fields:

| Field | Type | Default | Description |
| --- | --- | --- | --- |
| Type | string | "ollama" | Provider SDK: ollama or openai-compatible |
| Endpoint | string | varies by type | Base URL of the inference server |
| ApiKey | string? | null | Optional Bearer token for authenticated endpoints |

Ollama is the default provider. Netclaw discovers models via /api/tags and detects per-model capabilities through /api/show.

Ollama must have at least one model pulled before netclaw can use it — an empty model list is treated as a probe failure. Pull a model first:

ollama pull qwen3:30b

Browse available models at the Ollama model library.
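After pulling, you can confirm the model is visible on the same endpoint netclaw probes for discovery (assuming Ollama on the default port):

# List pulled models; netclaw reads this same response during discovery
curl http://localhost:11434/api/tags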

{
  "Providers": {
    "local": {
      "Type": "ollama",
      "Endpoint": "http://localhost:11434"
    }
  },
  "Models": {
    "Main": { "Provider": "local", "ModelId": "qwen3:30b" }
  }
}

If Ollama is running locally, no other config is needed.

To use a remote Ollama instance, point the endpoint at any machine on your network:

{
  "Providers": {
    "gpu-box": {
      "Type": "ollama",
      "Endpoint": "http://192.168.1.50:11434"
    }
  }
}

Netclaw queries /api/show for each model and inspects the architecture metadata:

| Capability | Detection Method | Example Models |
| --- | --- | --- |
| Context window | {arch}.context_length field | All models |
| Vision | {arch}.vision.block_count field | llava, llama3.2-vision |
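You can inspect the same metadata by hand. A minimal check, assuming a vision model such as llava is already pulled; newer Ollama releases accept a model field in the request body, older ones expect name:

# Dump model metadata; model_info contains the {arch}-prefixed fields above
curl http://localhost:11434/api/show -d '{"model": "llava"}'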

Models without native tool calling still work — netclaw falls back to structured prompting for tool use.

(Screenshot: Ollama provider setup during the init wizard)

The init wizard auto-discovers models pulled in your local Ollama instance and lets you assign them to roles.

OpenAI-Compatible (llama.cpp, vLLM, Lemonade)


For any inference server with a /v1/chat/completions endpoint — llama.cpp, vLLM, Lemonade, or anything else OpenAI-compatible. Netclaw discovers models via /v1/models and streams completions with tool calling.
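To verify a server speaks the expected protocol before wiring it in, query the discovery endpoint directly (assuming llama-server on its default port 8080):

# List advertised models; netclaw uses this same endpoint for discovery
curl http://localhost:8080/v1/models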

{
  "Providers": {
    "llama-server": {
      "Type": "openai-compatible",
      "Endpoint": "http://localhost:8080"
    }
  },
  "Models": {
    "Main": { "Provider": "llama-server", "ModelId": "my-model" }
  }
}

Note: the netclaw default endpoint for openai-compatible is http://localhost:11434, but llama-server defaults to port 8080 — specify the endpoint explicitly when using llama.cpp.

Some deployments protect the API with a Bearer token. Store tokens in secrets.json to keep credentials out of version-controlled config:

{
  "Providers": {
    "vllm-cluster": { "ApiKey": "your-token-here" }
  }
}

The main config in netclaw.json just references the provider without the key:

{
  "Providers": {
    "vllm-cluster": {
      "Type": "openai-compatible",
      "Endpoint": "https://inference.internal:8443"
    }
  }
}
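To confirm the token is accepted before restarting the daemon, you can probe the endpoint with the same Bearer header netclaw will send (host and token taken from the examples above):

# A 200 response with a model list means the token works; 401 means it was rejected
curl -H "Authorization: Bearer your-token-here" https://inference.internal:8443/v1/models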

Start llama-server, then point netclaw at it:

# Start llama-server (defaults to port 8080)
llama-server -m ./models/qwen3-30b-q4_k_m.gguf --port 8080
# Configure netclaw
netclaw provider add llama-local openai-compatible --endpoint http://localhost:8080
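Before assigning models, a quick end-to-end smoke test against the completions endpoint can save a debugging round trip. This is a plain OpenAI-style request, assuming llama-server is up on port 8080 (llama-server serves a single model, so the model field can typically be omitted):

# One-shot chat completion; any coherent reply means the server is ready
curl http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"messages": [{"role": "user", "content": "Say hello"}]}'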

Mix provider types freely. Here’s Ollama handling Main with a llama.cpp instance as Fallback:

{
  "Providers": {
    "ollama-local": {
      "Type": "ollama",
      "Endpoint": "http://localhost:11434"
    },
    "llama-gpu": {
      "Type": "openai-compatible",
      "Endpoint": "http://localhost:8080"
    }
  },
  "Models": {
    "Main": { "Provider": "ollama-local", "ModelId": "qwen3:30b" },
    "Fallback": { "Provider": "llama-gpu", "ModelId": "qwen3:14b" }
  }
}

Assign models to roles with netclaw model set.
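For example, to point the Fallback role at the llama.cpp instance from the config above (the argument order here is an assumption; check netclaw model set --help for the exact syntax):

# Assumed syntax: role, provider name, model ID
netclaw model set Fallback llama-gpu qwen3:14b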

# Point at a remote Ollama instance
export NETCLAW_Providers__local__Type="ollama"
export NETCLAW_Providers__local__Endpoint="http://gpu-server:11434"
# Override model assignment
export NETCLAW_Models__Main__Provider="local"
export NETCLAW_Models__Main__ModelId="qwen3:30b"

Environment variables follow the .NET configuration convention — double underscores separate path segments.

Netclaw probes each provider on startup — /api/tags for Ollama, /v1/models for openai-compatible. Each probe times out after 10 seconds.
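You can reproduce a probe by hand with the same timeout to tell a slow server from an unreachable one (assuming the default Ollama endpoint):

# Mirror the startup probe: give up after 10 seconds, as netclaw does
curl --max-time 10 http://localhost:11434/api/tags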

Common issues:

| Symptom | Cause | Fix |
| --- | --- | --- |
| Connection refused | Server not running | Start Ollama (ollama serve) or llama-server |
| Empty model list | No models pulled | Run ollama pull <model> |
| Timeout | Server overloaded or wrong port | Check endpoint URL and server logs |
| 401 Unauthorized | Token required | Add ApiKey to provider config |

Run netclaw doctor for a full connectivity diagnostic, or open netclaw provider to see live health status.

After editing netclaw.json, restart the daemon for changes to take effect:

netclaw daemon restart

Verify your provider is healthy:

netclaw provider # check health indicators in the TUI
netclaw doctor # full diagnostic including provider probes

(Screenshot: Provider Manager TUI showing configured providers with health status)

Self-hosted entries show a healthy status when the server is reachable and models are discovered, and an error status when the server is down or returns errors. Select a provider to manage endpoints, re-probe, or remove it.

  • Changing providers requires a daemon restart.
  • Tool calling quality varies between models. Qwen3 30B+ and Llama 3.1 70B+ handle it well; smaller models often choke on complex tool schemas.
  • The openai-compatible provider sends the standard OpenAI tool-calling format. For servers that don’t implement tool calling, netclaw falls back to structured prompting.