Generate a chat completion.

Streaming (stream: true): Returns an SSE stream. The LLM call runs in a background task that is decoupled from the HTTP connection: if the client disconnects mid-stream (tab close, navigation, network drop), the model continues generating, and the assistant message and artifacts are persisted once it finishes. Poll GET /v1/chats/{id} to check completion status.

Non-streaming (stream: false): Returns a single JSON response after the model finishes generating.

Extended thinking: Models that support reasoning (e.g. AI models-opus-4-6) emit reasoning-start / reasoning-delta / reasoning-end SSE events before the text response. These events can take 60-120 seconds; keep the connection open and display a thinking indicator while they arrive.
JWT token from Stytch B2B authentication (magic link, SSO, or M2M)
Stream format for SSE responses. Set to ai-sdk to enable Vercel AI SDK UI Message Stream Protocol compatibility. When enabled, response includes x-vercel-ai-ui-message-stream: v1 header.
ai-sdk

Request schema for chat completions.
Messages to send to the model
Existing chat ID to continue conversation
"chat_abc123"
Identifier of the AI model for this request. If omitted and chatId is provided, the chat's existing model is used. Must match a model 'id' from the /v1/models API.
System-level instructions for the AI model.
Create a temporary chat for testing. Ephemeral chats are automatically deleted.
false
Enable streaming response
true
Maximum number of tokens to generate for this response
1 <= x <= 32000

128
Structured output format specification (AI models-compatible json_schema format). Enforces the AI response to match the specified JSON schema.
{
"json_schema": {
"name": "response",
"schema": {
"additionalProperties": false,
"properties": {
"summary": { "type": "string" },
"confidence": {
"maximum": 1,
"minimum": 0,
"type": "number"
}
},
"required": ["summary"],
"type": "object"
},
"strict": true
},
"type": "json_schema"
}

Enable reasoning/thinking tokens for supported models. When enabled, the model will expose its thinking process in the response. Supported by: AI models (extended thinking), AI models o1/o3, AI models thinking models. Note: reasoning tokens are billed separately and may significantly increase costs.
true
false
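The structured-output specification shown above can be assembled programmatically rather than hand-written. A minimal sketch, assuming the field is sent as a top-level `response_format` object (the field name is an assumption; the nested shape mirrors the example):

```python
def structured_output(name: str, schema: dict) -> dict:
    """Build a json_schema structured-output payload in the shape shown
    in the example above. The top-level field name (response_format) is
    an assumption based on common provider conventions."""
    return {
        "type": "json_schema",
        "json_schema": {"name": name, "schema": schema, "strict": True},
    }

fmt = structured_output("response", {
    "type": "object",
    "additionalProperties": False,
    "properties": {
        "summary": {"type": "string"},
        "confidence": {"type": "number", "minimum": 0, "maximum": 1},
    },
    "required": ["summary"],
})
```

With strict=True the model's output is constrained to validate against the schema, so clients can json-parse the response without defensive checks for missing required keys.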
Maximum tokens for reasoning/thinking (only used when enableReasoning=true). Higher budgets allow deeper reasoning but increase latency and cost. Default: 10000 for AI models, varies by provider.
1000 <= x <= 128000

10000
List of tools the model may call. Pass-through to AI provider. Tools are defined and executed by the client, not the server.
[
{
"function": {
"description": "Get current weather for a location",
"name": "get_weather",
"parameters": {
"properties": {
"location": { "type": "string" },
"unit": {
"enum": ["celsius", "fahrenheit"],
"type": "string"
}
},
"required": ["location"],
"type": "object"
}
},
"type": "function"
}
]

Controls tool selection. Options: 'auto' (model decides), 'none' (no tools), 'required' (must use a tool), or a specific tool object.
"auto"
Whether to allow parallel tool calls (AI models-specific, default true)
Enable artifact creation. When enabled, the AI can create standalone documents (reports, code files, HTML pages, diagrams) that appear in a side panel. Requires stream=true. Artifact tools are injected server-side.
false
true
When enabled, URLs in user messages are automatically fetched and their content is provided to the AI as context. Enables 'summarize this article' or 'what does this page say' use cases. If a URL cannot be fetched (e.g., too large, blocked, or inaccessible), the error is communicated so the AI can inform the user.
true
false
File IDs to associate with this chat before processing. Use when the frontend uploads files first then creates/sends a chat message. Files are associated with the chat (creating vectors if needed) and included in RAG context. If vectors are not ready yet, extracted text is used as fallback context.
["7aa0316a-0a56-4c11-8d37-45f5cf40febd"]Control whether original images are fetched from storage and attached to the LLM prompt for image_ocr RAG chunks. null (default) — server decides: enabled when the model supports vision and RAG_IMAGE_ATTACHMENT_ENABLED is true. true — force on for this request (model must support vision). false — disable for this request regardless of server defaults. Image tokens are billed as part of the normal input token charge.
null
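The tri-state semantics above (null / true / false) can be expressed as a small resolution function. A sketch of the decision logic as described, with illustrative parameter names:

```python
def resolve_image_attachment(request_value, model_supports_vision: bool,
                             server_enabled: bool) -> bool:
    """Resolve the tri-state image-attachment flag: None defers to server
    defaults, True forces attachment on (the model must support vision),
    False forces it off regardless of server defaults."""
    if request_value is False:
        return False
    if request_value is True:
        if not model_supports_vision:
            raise ValueError("model does not support vision")
        return True
    # None: server decides based on model capability and config.
    return model_supports_vision and server_enabled
```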
Server-Sent Events stream of chat completion chunks (when stream=true). Each SSE event has the format: data: <json>\n\n where <json> matches the StreamingChatResponse schema. The final event has finished: true and includes usage stats. A data: [DONE]\n\n event is sent after the last chunk.
Chat ID
Text content delta
Reasoning/thinking content delta
Tool call deltas (accumulated by client)
RAG sources used for this response (sent once at start)
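The stream format described above (one `data: <json>` event per chunk, terminated by `data: [DONE]`) can be parsed with a few lines of code. A sketch, assuming the raw SSE body is available as a string; the `content` delta field name is illustrative, while `finished` and the `[DONE]` sentinel are from the format description:

```python
import json

def parse_sse_stream(raw: str) -> list[dict]:
    """Split a raw SSE body into JSON chunks. Events are separated by
    blank lines; parsing stops at the [DONE] sentinel."""
    chunks = []
    for event in raw.split("\n\n"):
        event = event.strip()
        if not event.startswith("data: "):
            continue
        payload = event[len("data: "):]
        if payload == "[DONE]":
            break
        chunks.append(json.loads(payload))
    return chunks

chunks = parse_sse_stream(
    'data: {"content": "Hi", "finished": false}\n\n'
    'data: {"finished": true}\n\n'
    'data: [DONE]\n\n'
)
```

In practice a client would read the response incrementally and feed complete events to the parser as they arrive, accumulating text, reasoning, and tool-call deltas per the fields listed above.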