Send images, audio, video, and documents to models, and receive binary content from tools.
Vibes supports four input modalities - images, audio, video, and documents - through a set of message helper functions. Each helper wraps the raw binary or URL data into a UserModelMessage that the agent loop passes directly to the model. On the output side, tools can return BinaryContent (raw bytes with a MIME type) that flows back to the model as part of the conversation.
```ts
import { imageMessage, audioMessage, fileMessage } from "jsr:@vibesjs/sdk";
```
The image parameter accepts a URL string, a base64-encoded string, or raw bytes. The optional text parameter adds a text prompt alongside the image. mediaType is optional — when omitted, no MIME type hint is sent and the provider infers the format from the content.
```ts
import { imageMessage } from "jsr:@vibesjs/sdk";

const msg = imageMessage(
  "https://example.com/photo.jpg",
  "Describe what you see in this image.",
);
```
When passing a base64 string, include only the encoded payload - not the data:image/jpeg;base64, prefix. Vibes sends the data part directly to the model provider.
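If your data arrives as a full data URL, a small helper can strip the prefix before the payload reaches imageMessage. This helper is hypothetical, not part of the SDK:

```typescript
// Hypothetical helper (not an SDK function): remove a data-URL prefix
// such as "data:image/jpeg;base64," and keep only the base64 payload.
// Strings without a prefix pass through unchanged.
function stripDataUrlPrefix(input: string): string {
  const match = input.match(/^data:[^,]*;base64,(.*)$/s);
  return match ? match[1] : input;
}
```

The result can then be passed straight to imageMessage along with an explicit mediaType.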
Unlike imageMessage, the mediaType argument is required for audio - the runtime has no way to infer the codec from raw bytes.
```ts
import { audioMessage } from "jsr:@vibesjs/sdk";
import fs from "node:fs"; // Node-compatible fs module (also available in Deno)

const audioBytes = fs.readFileSync("./recording.mp3");
const base64Audio = audioBytes.toString("base64");
const msg = audioMessage(base64Audio, "audio/mpeg", "Transcribe this recording.");
```
Audio modality support depends on the model provider. Check your provider’s documentation to confirm which audio formats and codecs are accepted before sending audio messages.
Video follows the same shape as audio: use fileMessage with a video/* MIME type. Provider support varies.
```ts
import { fileMessage } from "jsr:@vibesjs/sdk";
import fs from "node:fs"; // Node-compatible fs module (also available in Deno)

const videoBytes = fs.readFileSync("./clip.mp4");
const base64Video = videoBytes.toString("base64");
const msg = fileMessage(base64Video, "video/mp4", "Describe the action in this clip.");
```
Video support is experimental for most providers. Verify that your chosen model accepts video/mp4 (or the specific video/* MIME type) before sending video in production.
When a provider has already stored a file server-side (for example, via the Anthropic Files API), you reference it with an UploadedFile object rather than re-uploading the bytes.
```ts
import type { UploadedFile } from "jsr:@vibesjs/sdk";

const file: UploadedFile = {
  type: "uploaded_file", // underscore - not a hyphen
  fileId: "file_abc123",
  mimeType: "application/pdf",
  filename: "contract.pdf",
};
```
The type discriminant is "uploaded_file" with an underscore. Using "uploaded-file" (hyphen) will fail the type check and the provider will reject the request.
Use uploadedFileSchema when an agent tool needs to accept an UploadedFile as a parameter:
```ts
import { uploadedFileSchema, tool } from "jsr:@vibesjs/sdk";
import { z } from "zod";

const analyzeTool = tool({
  name: "analyze_file",
  description: "Analyze an already-uploaded file",
  parameters: z.object({
    file: uploadedFileSchema,
    question: z.string(),
  }),
  execute: async (_ctx, { file, question }) => {
    // file.fileId, file.mimeType, file.filename available here
    return `Analyzing ${file.filename ?? file.fileId} for: ${question}`;
  },
});
```
Tools can return BinaryContent - raw bytes with a MIME type - which the agent loop forwards to the model as part of the next turn. This lets a tool produce an image, audio clip, or document that the model can reason about.
```ts
import { tool } from "jsr:@vibesjs/sdk";
import type { BinaryContent } from "jsr:@vibesjs/sdk";
import { z } from "zod";

const screenshotTool = tool({
  name: "take_screenshot",
  description: "Capture a screenshot of a URL and return it as an image",
  parameters: z.object({
    url: z.string().url(),
  }),
  execute: async (_ctx, { url }): Promise<BinaryContent> => {
    const imageBytes = await captureScreenshot(url); // your implementation
    return {
      type: "binary",
      data: imageBytes,
      mimeType: "image/png",
    };
  },
});
```
The table below summarises which content types are accepted by major providers. Always verify your provider’s current documentation before sending multimodal content in production — provider capabilities change frequently.
| Modality | Function | Anthropic (Claude) | OpenAI (GPT-4o) | Google (Gemini) |
| --- | --- | --- | --- | --- |
| Image (URL) | imageMessage | Yes | Yes | Yes |
| Image (bytes / base64) | imageMessage | Yes | Yes | Yes |
| Audio | audioMessage | No | Yes (Whisper/GPT-4o-audio) | Yes |
| Video | fileMessage with video/* | No | No | Yes |
| PDF / document | fileMessage with application/pdf | Yes (Claude 3+ with PDF support) | No | Yes |
| Binary tool result (image) | BinaryContent from execute | Yes | Yes | Yes |
| Uploaded file reference | UploadedFile | Yes (Files API) | Yes (Files API) | No |
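For a quick programmatic guard, the support matrix can be mirrored in a plain lookup. This is an illustrative sketch, not an SDK API: the provider keys and modality names are assumptions for this example, and the data is a snapshot that will drift as providers evolve.

```typescript
// Illustrative support matrix mirroring the table above (not an SDK API).
// Provider keys and modality names are assumptions for this sketch.
const supportMatrix: Record<string, Set<string>> = {
  anthropic: new Set(["image", "pdf", "binary-tool-result", "uploaded-file"]),
  openai: new Set(["image", "audio", "binary-tool-result", "uploaded-file"]),
  google: new Set(["image", "audio", "video", "pdf", "binary-tool-result"]),
};

// Returns false for unknown providers rather than throwing.
function supportsModality(provider: string, modality: string): boolean {
  return supportMatrix[provider]?.has(modality) ?? false;
}
```

A guard like this catches obvious mismatches early, but it is no substitute for checking the provider's current documentation.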
Audio content is encoded as a FilePart (type "file") because the AI SDK UserContent union does not include a separate audio part. Providers that support audio consume it via the file part with the appropriate mediaType (e.g. "audio/wav", "audio/mpeg").
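Under that design, the user message an audio helper builds would look roughly like the sketch below. The field names are assumed from the AI SDK's FilePart shape; verify them against your SDK version.

```typescript
// Assumed message shape (illustrative): the audio travels as a "file"
// part carrying a mediaType, alongside an optional text part.
const audioPart = {
  type: "file",
  data: "<base64 audio payload>",
  mediaType: "audio/mpeg",
};

const msg = {
  role: "user",
  content: [{ type: "text", text: "Transcribe this recording." }, audioPart],
};
```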