Generate a chat completion.

Streaming (stream: true): Returns an SSE stream. The LLM call runs in a background task that is decoupled from the HTTP connection: if the client disconnects mid-stream (tab close, navigation, network drop), the model continues generating, and the assistant message and artifacts are persisted once it finishes. Poll GET /v1/chats/{id} to check completion status.

Non-streaming (stream: false): Returns a single JSON response after the model finishes generating.

Extended thinking: Models that support reasoning (e.g. AI models-opus-4-6) emit reasoning-start / reasoning-delta / reasoning-end SSE events before the text response. These events can take 60-120 seconds; keep the connection open and display a thinking indicator while they arrive.
JWT token from Stytch B2B authentication (magic link, SSO, or M2M)
Stream format for SSE responses. Set to ai-sdk to enable Vercel AI SDK UI Message Stream Protocol compatibility. When enabled, response includes x-vercel-ai-ui-message-stream: v1 header.
ai-sdk

Request schema for chat completions.
Messages to send to the model
Existing chat ID to continue conversation
"chat_abc123"
Identifier of the AI model for this request. If omitted and chatId is provided, the chat's existing model is used. Must match a model 'id' from the /v1/models API.
System-level instructions for the AI model.
Create a temporary chat for testing. Ephemeral chats are automatically deleted.
false
Enable streaming response
true
Maximum number of tokens to generate for this response
1 <= x <= 32000

128
Structured output format specification (AI models-compatible json_schema format). Enforces the AI response to match the specified JSON schema.
{
"json_schema": {
"name": "response",
"schema": {
"additionalProperties": false,
"properties": {
"summary": { "type": "string" },
"confidence": {
"maximum": 1,
"minimum": 0,
"type": "number"
}
},
"required": ["summary"],
"type": "object"
},
"strict": true
},
"type": "json_schema"
}

Enable reasoning/thinking tokens for supported models. When enabled, the model will expose its thinking process in the response. Supported by: AI models (extended thinking), AI models o1/o3, AI models thinking models. Note: reasoning tokens are billed separately and may significantly increase costs.
true
false
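The structured-output specification shown above can be assembled programmatically rather than hand-written. A minimal sketch, assuming the field is sent as a top-level `response_format` object (the field name is an assumption; the nested shape mirrors the example):

```python
def structured_output(name: str, schema: dict) -> dict:
    """Build a json_schema structured-output payload in the shape shown
    in the example above. The top-level field name (response_format) is
    an assumption based on common provider conventions."""
    return {
        "type": "json_schema",
        "json_schema": {"name": name, "schema": schema, "strict": True},
    }

fmt = structured_output("response", {
    "type": "object",
    "additionalProperties": False,
    "properties": {
        "summary": {"type": "string"},
        "confidence": {"type": "number", "minimum": 0, "maximum": 1},
    },
    "required": ["summary"],
})
```

With strict=True the model's output is constrained to validate against the schema, so clients can json-parse the response without defensive checks for missing required keys.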
Maximum tokens for reasoning/thinking (only used when enableReasoning=true). Higher budgets allow deeper reasoning but increase latency and cost. Default: 10000 for AI models, varies by provider.
1000 <= x <= 128000

10000
List of tools the model may call. Pass-through to AI provider. Tools are defined and executed by the client, not the server.
[
{
"function": {
"description": "Get current weather for a location",
"name": "get_weather",
"parameters": {
"properties": {
"location": { "type": "string" },
"unit": {
"enum": ["celsius", "fahrenheit"],
"type": "string"
}
},
"required": ["location"],
"type": "object"
}
},
"type": "function"
}
]

Controls tool selection. Options: 'auto' (model decides), 'none' (no tools), 'required' (must use a tool), or a specific tool object.
"auto"
Whether to allow parallel tool calls (AI models-specific, default true)
Enable artifact creation. When enabled, the AI can create standalone documents (reports, code files, HTML pages, diagrams) that appear in a side panel. Requires stream=true. Artifact tools are injected server-side.
false
true
When enabled, URLs in user messages are automatically fetched and their content is provided to the AI as context. Enables 'summarize this article' or 'what does this page say' use cases. If a URL cannot be fetched (e.g., too large, blocked, or inaccessible), the error is communicated so the AI can inform the user.
true
false
File IDs to associate with this chat before processing. Use when the frontend uploads files first then creates/sends a chat message. Files are associated with the chat (creating vectors if needed) and included in RAG context. If vectors are not ready yet, extracted text is used as fallback context.
["7aa0316a-0a56-4c11-8d37-45f5cf40febd"]Control whether original images are fetched from storage and attached to the LLM prompt for image_ocr RAG chunks. null (default) — server decides: enabled when the model supports vision and RAG_IMAGE_ATTACHMENT_ENABLED is true. true — force on for this request (model must support vision). false — disable for this request regardless of server defaults. Image tokens are billed as part of the normal input token charge.
null
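The tri-state semantics above (null / true / false) can be expressed as a small resolution function. A sketch of the decision logic as described, with illustrative parameter names:

```python
def resolve_image_attachment(request_value, model_supports_vision: bool,
                             server_enabled: bool) -> bool:
    """Resolve the tri-state image-attachment flag: None defers to server
    defaults, True forces attachment on (the model must support vision),
    False forces it off regardless of server defaults."""
    if request_value is False:
        return False
    if request_value is True:
        if not model_supports_vision:
            raise ValueError("model does not support vision")
        return True
    # None: server decides based on model capability and config.
    return model_supports_vision and server_enabled
```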
Server-Sent Events stream of chat completion chunks (when stream=true). Each SSE event has the format: data: <json>\n\n where <json> matches the StreamingChatResponse schema. The final event has finished: true and includes usage stats. A data: [DONE]\n\n event is sent after the last chunk.
Chat ID
Text content delta
Reasoning/thinking content delta
Tool call deltas (accumulated by client)
RAG sources used for this response (sent once at start)
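The stream format described above (one `data: <json>` event per chunk, terminated by `data: [DONE]`) can be parsed with a few lines of code. A sketch, assuming the raw SSE body is available as a string; the `content` delta field name is illustrative, while `finished` and the `[DONE]` sentinel are from the format description:

```python
import json

def parse_sse_stream(raw: str) -> list[dict]:
    """Split a raw SSE body into JSON chunks. Events are separated by
    blank lines; parsing stops at the [DONE] sentinel."""
    chunks = []
    for event in raw.split("\n\n"):
        event = event.strip()
        if not event.startswith("data: "):
            continue
        payload = event[len("data: "):]
        if payload == "[DONE]":
            break
        chunks.append(json.loads(payload))
    return chunks

chunks = parse_sse_stream(
    'data: {"content": "Hi", "finished": false}\n\n'
    'data: {"finished": true}\n\n'
    'data: [DONE]\n\n'
)
```

In practice a client would read the response incrementally and feed complete events to the parser as they arrive, accumulating text, reasoning, and tool-call deltas per the fields listed above.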