Send images, audio, video, and documents to models, and receive binary content from tools.
Vibes supports four input modalities - images, audio, video, and documents - through a set of message helper functions. Each helper wraps the raw binary or URL data into a UserModelMessage that the agent loop passes directly to the model. On the output side, tools can return BinaryContent (raw bytes with a MIME type) that flows back to the model as part of the conversation.
```typescript
import { imageMessage, audioMessage, fileMessage } from "@vibesjs/sdk";
```
The image parameter accepts a URL string, a base64-encoded string, or raw bytes. The optional text parameter adds a text prompt alongside the image. mediaType defaults to "image/jpeg" when omitted and the runtime cannot infer it.
```typescript
import { imageMessage } from "@vibesjs/sdk";

const msg = imageMessage(
  "https://example.com/photo.jpg",
  "Describe what you see in this image."
);
```
When passing a base64 string, include only the encoded payload - not the data:image/jpeg;base64, prefix. Vibes sends the data part directly to the model provider.
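If your pipeline produces full data URLs, a small helper can strip the prefix before the payload reaches imageMessage. This helper is illustrative and not part of the SDK:

```typescript
// Hypothetical helper (not an SDK API): strip a data-URL prefix so only
// the base64 payload is passed on to the model provider.
function stripDataUrlPrefix(input: string): string {
  const match = input.match(/^data:[^;]+;base64,(.*)$/s);
  // If there is no data-URL prefix, return the input unchanged.
  return match ? match[1] : input;
}

stripDataUrlPrefix("data:image/jpeg;base64,AAAA"); // "AAAA"
stripDataUrlPrefix("AAAA"); // "AAAA" (already bare)
```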
Unlike imageMessage, the mediaType argument is required for audio - the runtime has no way to infer the codec from raw bytes.
```typescript
import fs from "node:fs";
import { audioMessage } from "@vibesjs/sdk";

const audioBytes = fs.readFileSync("./recording.mp3");
const base64Audio = audioBytes.toString("base64");

const msg = audioMessage(base64Audio, "audio/mpeg", "Transcribe this recording.");
```
Audio modality support depends on the model provider. Check your provider’s documentation to confirm which audio formats and codecs are accepted before sending audio messages.
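Because mediaType is required for audio, it can help to derive it from the file name rather than hard-coding it at each call site. The mapping below is an illustrative sketch (not an SDK API), covering a few common formats:

```typescript
// Hypothetical helper: map common audio file extensions to the MIME
// types expected by audioMessage's required mediaType argument.
const AUDIO_MIME_BY_EXT: Record<string, string> = {
  mp3: "audio/mpeg",
  wav: "audio/wav",
  ogg: "audio/ogg",
  flac: "audio/flac",
};

function audioMediaType(filename: string): string {
  const ext = filename.split(".").pop()?.toLowerCase() ?? "";
  const mime = AUDIO_MIME_BY_EXT[ext];
  // Fail loudly rather than guess a codec the provider may reject.
  if (!mime) throw new Error(`Unknown audio extension: .${ext}`);
  return mime;
}

audioMediaType("recording.mp3"); // "audio/mpeg"
```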
Video follows the same shape as audio: use fileMessage with a video/* MIME type. Provider support varies.
```typescript
import fs from "node:fs";
import { fileMessage } from "@vibesjs/sdk";

const videoBytes = fs.readFileSync("./clip.mp4");
const base64Video = videoBytes.toString("base64");

const msg = fileMessage(base64Video, "video/mp4", "Describe the action in this clip.");
```
Video support is experimental for most providers. Verify that your chosen model accepts video/mp4 (or the specific video/* MIME type) before sending video in production.
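One way to verify this is a pre-flight allow-list check before sending, so an unsupported MIME type fails in your code rather than at the provider. The set below is an example, not an authoritative list; consult your provider's documentation for the real one:

```typescript
// Illustrative guard (not an SDK API): confirm the video MIME type is
// one your chosen provider is documented to accept.
const SUPPORTED_VIDEO_TYPES = new Set(["video/mp4", "video/webm"]); // example list

function isSupportedVideo(mimeType: string): boolean {
  return SUPPORTED_VIDEO_TYPES.has(mimeType);
}

isSupportedVideo("video/mp4"); // true for this example list
```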
When a provider has already stored a file server-side (for example, via the Anthropic Files API), you reference it with an UploadedFile object rather than re-uploading the bytes.
```typescript
import type { UploadedFile } from "@vibesjs/sdk";

const file: UploadedFile = {
  type: "uploaded_file", // underscore - not a hyphen
  fileId: "file_abc123",
  mimeType: "application/pdf",
  filename: "contract.pdf",
};
```
The type discriminant is "uploaded_file" with an underscore. Using "uploaded-file" (hyphen) will fail the type check and the provider will reject the request.
Use uploadedFileSchema when an agent tool needs to accept an UploadedFile as a parameter:
```typescript
import { tool, uploadedFileSchema } from "@vibesjs/sdk";
import { z } from "zod";

const analyzeTool = tool({
  description: "Analyze an already-uploaded file",
  parameters: z.object({
    file: uploadedFileSchema,
    question: z.string(),
  }),
  execute: async ({ file, question }) => {
    // file.fileId, file.mimeType, file.filename available here
    return `Analyzing ${file.filename ?? file.fileId} for: ${question}`;
  },
});
```
Tools can return BinaryContent - raw bytes with a MIME type - which the agent loop forwards to the model as part of the next turn. This lets a tool produce an image, audio clip, or document that the model can reason about.
```typescript
import { tool } from "@vibesjs/sdk";
import type { BinaryContent } from "@vibesjs/sdk";
import { z } from "zod";

const screenshotTool = tool({
  description: "Capture a screenshot of a URL and return it as an image",
  parameters: z.object({
    url: z.string().url(),
  }),
  execute: async ({ url }): Promise<BinaryContent> => {
    const imageBytes = await captureScreenshot(url); // your implementation
    return {
      type: "binary",
      data: imageBytes,
      mimeType: "image/png",
    };
  },
});
```