Images, Audio, and Other File Types

When the agent reads a file, netclaw classifies it upfront instead of assuming everything is text. A text file comes back as text; an image gets handed to the model as something it can actually see; a PDF or zip comes back with a note on how to extract it. One classifier drives this for both file_read and channel attachments.

Netclaw identifies a file by its extension and its magic bytes (the signature in the first few KB), then sorts it into one of these types:

| Type | Examples |
| --- | --- |
| Text | .txt, .md, .csv, .json, .xml, .yaml, plus any UTF-8 text file (source code included), detected by content |
| Image | PNG, JPEG, GIF, WebP, BMP, TIFF |
| PDF | application/pdf |
| Document | DOCX, XLSX, PPTX, ODT, RTF |
| Archive | ZIP, 7z, gzip, bzip2, xz |
| Audio | MP3, M4A, WAV, OGG |
| Video | MP4, MOV, WebM, MKV, AVI |

Extension and signature are reconciled, so a .md file declared as text/plain is still treated as markdown, and a file with the wrong extension is caught by its actual bytes.
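To make the reconciliation concrete, here is a minimal Python sketch of one way extension and signature could be combined. It is not netclaw's actual classifier: the function name `classify`, the lowercase category strings, the "other" fallback, and the rule that the signature wins while the extension only refines a ZIP container are all assumptions for illustration. The magic bytes themselves are the standard published signatures for each format, and only a handful are shown.

```python
import codecs
from pathlib import PurePath

# Standard file signatures mapped to a coarse category (a small subset).
SIGNATURES = [
    (b"\x89PNG\r\n\x1a\n", "image"),
    (b"\xff\xd8\xff", "image"),          # JPEG
    (b"GIF87a", "image"),
    (b"GIF89a", "image"),
    (b"%PDF-", "pdf"),
    (b"PK\x03\x04", "archive"),          # ZIP container (DOCX/XLSX/PPTX/ODT too)
    (b"\x1f\x8b", "archive"),            # gzip
    (b"7z\xbc\xaf\x27\x1c", "archive"),  # 7z
]

# Assumption: these extensions turn a ZIP container into an office document.
ZIP_DOCUMENT_EXTS = {".docx", ".xlsx", ".pptx", ".odt"}


def classify(filename: str, head: bytes) -> str:
    """Hypothetical classifier: category from the name and the leading bytes."""
    ext = PurePath(filename).suffix.lower()
    for magic, category in SIGNATURES:
        if head.startswith(magic):
            # The bytes decide the category; the extension only refines it.
            if magic.startswith(b"PK") and ext in ZIP_DOCUMENT_EXTS:
                return "document"
            return category
    # No known signature: treat the file as text if the head is valid UTF-8.
    # final=False tolerates a multi-byte character cut off at the end of head.
    decoder = codecs.getincrementaldecoder("utf-8")()
    try:
        decoder.decode(head, final=False)
        return "text"
    except UnicodeDecodeError:
        return "other"
```

Under these assumptions, a PNG saved as `photo.txt` still classifies as an image, while `report.docx` and `bundle.zip` share the same ZIP signature and are told apart by extension.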

What file_read returns depends on the category:

  • Text — the content, with Offset/Limit for paging and truncation past a size cap. Same as you’d expect.
  • Image, on a vision-capable model — netclaw attaches the image to the next model call and returns a note: “Image loaded for model-visible inspection on the next LLM call.” The model sees the actual picture on its next turn. Inlined formats are PNG, JPEG, GIF, and WebP; BMP and TIFF are recognized but passed by path only. Files are capped at 25 MB.
  • Image, on a model with no vision — a note that the current model has no image modality. The file is on disk; the model just can’t look at it.
  • PDF, document, archive, audio, video — metadata plus a pointer to the right extraction tool. file_read never dumps raw bytes into the conversation; use shell_execute (for example, pdftotext for a PDF) to pull text out.
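The branching above can be summarized in a short Python sketch. This is an illustration of the decision order only, not netclaw's code: `ReadResult`, `describe_read`, the parameter names, and every note string except the quoted image note are hypothetical. Reading 25 MB as 25 × 1024 × 1024 bytes and falling back to path-only for an oversized image are also assumptions, since the text states only the cap.

```python
from dataclasses import dataclass

MAX_IMAGE_BYTES = 25 * 1024 * 1024                  # assumption: 25 MB in binary units
INLINE_IMAGE_FORMATS = {"png", "jpeg", "gif", "webp"}


@dataclass
class ReadResult:
    note: str
    attach_image: bool = False   # True when the image rides along on the next model call


def describe_read(category: str, fmt: str, size: int, model_has_vision: bool) -> ReadResult:
    """Hypothetical dispatch: what a file_read-style tool reports per category."""
    if category == "text":
        return ReadResult("text content, paged and truncated as needed")
    if category == "image":
        if not model_has_vision:
            return ReadResult("The current model has no image modality.")
        if fmt in INLINE_IMAGE_FORMATS and size <= MAX_IMAGE_BYTES:
            return ReadResult(
                "Image loaded for model-visible inspection on the next LLM call.",
                attach_image=True,
            )
        # BMP and TIFF; oversized images land here too (assumption).
        return ReadResult("Image recognized; passed by path only.")
    # PDF, document, archive, audio, video: metadata and a pointer, never raw bytes.
    return ReadResult(f"{category} file, {size} bytes; extract it with shell_execute.")
```

For a PDF, the pointer leads to a command such as `pdftotext report.pdf -` run through shell_execute, which writes the extracted text to standard output.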

Files shared in Slack, Discord, or Mattermost flow through the same classifier. An image dropped in a channel reaches a vision model the same way a file_read image does, and the same per-format and size rules apply.

What’s allowed per channel is governed by the audience’s attachment policy: which categories are accepted, a max file size (25 MB by default), and a max number of files per message (10 by default). The policy’s categories are coarser than the types above — Image, PDF, Document, Archive, Media, and Other — and audio and video both fall under a single Media category, so you can’t allow one without the other. Lock a channel down by narrowing its allowed categories.
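As a rough illustration of how such a policy might be evaluated, the Python sketch below maps file types onto the coarser policy categories and checks a message against the three limits. The names `AttachmentPolicy`, `policy_category`, and `check_message` are hypothetical, as are the field names and error strings; mapping everything unlisted (including text) to Other is an assumption. Only the category names and the defaults of 25 MB and 10 files come from the text.

```python
from dataclasses import dataclass

# File types from the classifier mapped to the coarser policy categories.
POLICY_CATEGORY = {
    "image": "Image",
    "pdf": "PDF",
    "document": "Document",
    "archive": "Archive",
    "audio": "Media",   # audio and video share one category
    "video": "Media",
}


@dataclass
class AttachmentPolicy:
    allowed: frozenset                       # policy categories this audience accepts
    max_file_bytes: int = 25 * 1024 * 1024   # assumption: 25 MB in binary units
    max_files: int = 10


def policy_category(file_type: str) -> str:
    """Anything without an explicit mapping falls under Other (assumption)."""
    return POLICY_CATEGORY.get(file_type, "Other")


def check_message(policy: AttachmentPolicy, files: list) -> list:
    """Return the reasons a message is rejected; an empty list means accepted.

    Each entry in files is a (name, file_type, size_in_bytes) tuple.
    """
    problems = []
    if len(files) > policy.max_files:
        problems.append(f"too many files: {len(files)} > {policy.max_files}")
    for name, file_type, size in files:
        category = policy_category(file_type)
        if category not in policy.allowed:
            problems.append(f"{name}: category {category} not allowed")
        elif size > policy.max_file_bytes:
            problems.append(f"{name}: exceeds size limit")
    return problems
```

A locked-down channel would then be expressed as `AttachmentPolicy(allowed=frozenset({"Image", "PDF"}))`, which rejects an MP3 and an MP4 alike because both resolve to Media.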