Send images, audio, video, and documents to models, and receive binary content from tools.
Vibes supports four input modalities - images, audio, video, and documents - through a set of message helper functions. Each helper wraps the raw binary or URL data into a UserModelMessage that the agent loop passes directly to the model. On the output side, tools can return BinaryContent (raw bytes with a MIME type) that flows back to the model as part of the conversation.
```ts
import { imageMessage, audioMessage, fileMessage } from "jsr:@vibesjs/sdk";
```
The image parameter accepts a URL string, a base64-encoded string, or raw bytes. The optional text parameter adds a text prompt alongside the image. mediaType is optional — when omitted, no MIME type hint is sent and the provider infers the format from the content.
```ts
import { imageMessage } from "jsr:@vibesjs/sdk";

const msg = imageMessage(
  "https://example.com/photo.jpg",
  "Describe what you see in this image.",
);
```
When passing a base64 string, include only the encoded payload - not the data:image/jpeg;base64, prefix. Vibes sends the data part directly to the model provider.
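If your data arrives as a full data URL, a small helper can strip the prefix before the payload reaches imageMessage. This helper is hypothetical, not part of the SDK:

```typescript
// Hypothetical helper (not an SDK function): remove a data-URL prefix
// such as "data:image/jpeg;base64," and keep only the base64 payload.
// Strings without a prefix pass through unchanged.
function stripDataUrlPrefix(input: string): string {
  const match = input.match(/^data:[^,]*;base64,(.*)$/s);
  return match ? match[1] : input;
}
```

The result can then be passed straight to imageMessage along with an explicit mediaType.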
Unlike imageMessage, the mediaType argument is required for audio - the runtime has no way to infer the codec from raw bytes.
```ts
import { audioMessage } from "jsr:@vibesjs/sdk";
import fs from "node:fs"; // Node-compatible fs module (also available in Deno)

const audioBytes = fs.readFileSync("./recording.mp3");
const base64Audio = audioBytes.toString("base64");
const msg = audioMessage(base64Audio, "audio/mpeg", "Transcribe this recording.");
```
Audio modality support depends on the model provider. Check your provider’s documentation to confirm which audio formats and codecs are accepted before sending audio messages.
Video follows the same shape as audio: use fileMessage with a video/* MIME type. Provider support varies.
```ts
import { fileMessage } from "jsr:@vibesjs/sdk";
import fs from "node:fs"; // Node-compatible fs module (also available in Deno)

const videoBytes = fs.readFileSync("./clip.mp4");
const base64Video = videoBytes.toString("base64");
const msg = fileMessage(base64Video, "video/mp4", "Describe the action in this clip.");
```
Video support is experimental for most providers. Verify that your chosen model accepts video/mp4 (or the specific video/* MIME type) before sending video in production.
When a provider has already stored a file server-side (for example, via the Anthropic Files API), you reference it with an UploadedFile object rather than re-uploading the bytes.
```ts
import type { UploadedFile } from "jsr:@vibesjs/sdk";

const file: UploadedFile = {
  type: "uploaded_file", // underscore - not a hyphen
  fileId: "file_abc123",
  mimeType: "application/pdf",
  filename: "contract.pdf",
};
```
The type discriminant is "uploaded_file" with an underscore. Using "uploaded-file" (hyphen) will fail the type check and the provider will reject the request.
Use uploadedFileSchema when an agent tool needs to accept an UploadedFile as a parameter:
```ts
import { uploadedFileSchema, tool } from "jsr:@vibesjs/sdk";
import { z } from "zod";

const analyzeTool = tool({
  name: "analyze_file",
  description: "Analyze an already-uploaded file",
  parameters: z.object({
    file: uploadedFileSchema,
    question: z.string(),
  }),
  execute: async (_ctx, { file, question }) => {
    // file.fileId, file.mimeType, file.filename available here
    return `Analyzing ${file.filename ?? file.fileId} for: ${question}`;
  },
});
```
Tools can return BinaryContent - raw bytes with a MIME type - which the agent loop forwards to the model as part of the next turn. This lets a tool produce an image, audio clip, or document that the model can reason about.
```ts
import { tool } from "jsr:@vibesjs/sdk";
import type { BinaryContent } from "jsr:@vibesjs/sdk";
import { z } from "zod";

const screenshotTool = tool({
  name: "take_screenshot",
  description: "Capture a screenshot of a URL and return it as an image",
  parameters: z.object({
    url: z.string().url(),
  }),
  execute: async (_ctx, { url }): Promise<BinaryContent> => {
    const imageBytes = await captureScreenshot(url); // your implementation
    return {
      type: "binary",
      data: imageBytes,
      mimeType: "image/png",
    };
  },
});
```
The table below summarises which content types are accepted by major providers. Always verify your provider’s current documentation before sending multimodal content in production — provider capabilities change frequently.
| Modality | Function | Anthropic (Claude) | OpenAI (GPT-4o) | Google (Gemini) |
| --- | --- | --- | --- | --- |
| Image (URL) | imageMessage | Yes | Yes | Yes |
| Image (bytes / base64) | imageMessage | Yes | Yes | Yes |
| Audio | audioMessage | No | Yes (Whisper/GPT-4o-audio) | Yes |
| Video | fileMessage with video/* | No | No | Yes |
| PDF / document | fileMessage with application/pdf | Yes (Claude 3+ with PDF support) | No | Yes |
| Binary tool result (image) | BinaryContent from execute | Yes | Yes | Yes |
| Uploaded file reference | UploadedFile | Yes (Files API) | Yes (Files API) | No |
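For a quick programmatic guard, the support matrix can be mirrored in a plain lookup. This is an illustrative sketch, not an SDK API: the provider keys and modality names are assumptions for this example, and the data is a snapshot that will drift as providers evolve.

```typescript
// Illustrative support matrix mirroring the table above (not an SDK API).
// Provider keys and modality names are assumptions for this sketch.
const supportMatrix: Record<string, Set<string>> = {
  anthropic: new Set(["image", "pdf", "binary-tool-result", "uploaded-file"]),
  openai: new Set(["image", "audio", "binary-tool-result", "uploaded-file"]),
  google: new Set(["image", "audio", "video", "pdf", "binary-tool-result"]),
};

// Returns false for unknown providers rather than throwing.
function supportsModality(provider: string, modality: string): boolean {
  return supportMatrix[provider]?.has(modality) ?? false;
}
```

A guard like this catches obvious mismatches early, but it is no substitute for checking the provider's current documentation.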
Audio content is encoded as a FilePart (type "file") because the AI SDK UserContent union does not include a separate audio part. Providers that support audio consume it via the file part with the appropriate mediaType (e.g. "audio/wav", "audio/mpeg").
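Under that design, the user message an audio helper builds would look roughly like the sketch below. The field names are assumed from the AI SDK's FilePart shape; verify them against your SDK version.

```typescript
// Assumed message shape (illustrative): the audio travels as a "file"
// part carrying a mediaType, alongside an optional text part.
const audioPart = {
  type: "file",
  data: "<base64 audio payload>",
  mediaType: "audio/mpeg",
};

const msg = {
  role: "user",
  content: [{ type: "text", text: "Transcribe this recording." }, audioPart],
};
```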