POST /v1/chats
Basic request
curl -X POST "https://api.cuadra.ai/v1/chats" \
  -H "Authorization: Bearer $TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "modelId": "84d6f2f1-27a5-4b5c-8a53-e2f7f1f5b0a3",
    "messages": [ { "role": "user", "content": "Hello!" } ]
  }'
{
  "id": "<string>",
  "delta": "",
  "reasoning": "",
  "toolCalls": [
    {
      "index": 1,
      "id": "<string>",
      "type": "function",
      "function": {}
    }
  ],
  "sources": [
    {
      "sourceId": "<string>",
      "filename": "<string>",
      "score": 0.5,
      "chunkId": "<string>",
      "datasetId": "<string>",
      "contentType": "<string>",
      "sourceUrl": "<string>"
    }
  ],
  "finished": false,
  "usage": {
    "inputTokens": 1,
    "outputTokens": 1,
    "totalTokens": 1,
    "cost": "<string>"
  }
}

Authorizations

Authorization
string
header
required

JWT token from Stytch B2B authentication (magic link, SSO, or M2M)

Headers

X-Stream-Format
enum<string>

Stream format for SSE responses. Set to ai-sdk to enable Vercel AI SDK UI Message Stream Protocol compatibility. When enabled, response includes x-vercel-ai-ui-message-stream: v1 header.

Available options:
ai-sdk

Body

application/json

Request schema for chat completions.

messages
MessageCreate · object[]
required

Messages to send to the model

Minimum array length: 1
chatId
string | null

Existing chat ID to continue conversation

Example:

"chat_abc123"

modelId
string | null

Identifier of the AI model for this request. If omitted and chatId is provided, the chat's existing model is used. Must match a model 'id' from the /v1/models API.

system_prompt
string | null

System-level instructions for the AI model.

ephemeral
boolean
default:false

Create a temporary chat for testing. Ephemeral chats are automatically deleted.

Example:

false

stream
boolean
default:false

Enable streaming response

Example:

true

maxTokens
integer | null

Maximum number of tokens to generate for this response

Required range: 1 <= x <= 32000
Example:

128

temperature
any
responseFormat
Responseformat · object

Structured output format specification (AI models-compatible json_schema format). Enforces the AI response to match the specified JSON schema.

Example:
{
  "json_schema": {
    "name": "response",
    "schema": {
      "additionalProperties": false,
      "properties": {
        "summary": { "type": "string" },
        "confidence": {
          "maximum": 1,
          "minimum": 0,
          "type": "number"
        }
      },
      "required": ["summary"],
      "type": "object"
    },
    "strict": true
  },
  "type": "json_schema"
}
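The responseFormat example above can be assembled into a full request body in code. The sketch below (Python, using only field names from this reference; the modelId is a placeholder) builds the body and adds a minimal client-side check that a structured reply actually honors the schema — useful even with strict mode, since the client still has to parse the JSON text it receives.

```python
import json

# Request body using responseFormat; field names are taken from this
# reference. The modelId is a placeholder, not a real model.
body = {
    "modelId": "00000000-0000-0000-0000-000000000000",
    "messages": [{"role": "user", "content": "Summarize this release."}],
    "responseFormat": {
        "type": "json_schema",
        "json_schema": {
            "name": "response",
            "strict": True,
            "schema": {
                "type": "object",
                "additionalProperties": False,
                "required": ["summary"],
                "properties": {
                    "summary": {"type": "string"},
                    "confidence": {"type": "number", "minimum": 0, "maximum": 1},
                },
            },
        },
    },
}

def check_structured_reply(text: str) -> dict:
    """Minimal client-side check that a reply matches the schema above."""
    data = json.loads(text)
    assert isinstance(data.get("summary"), str)
    if "confidence" in data:
        assert 0 <= data["confidence"] <= 1
    return data
```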
enableReasoning
boolean
default:false

Enable reasoning/thinking tokens for supported models. When enabled, the model will expose its thinking process in the response. Supported by: AI models (extended thinking), AI models o1/o3, AI models thinking models. Note: Reasoning tokens are billed separately and may significantly increase costs.

Examples:

true

false

reasoningBudget
integer | null

Maximum tokens for reasoning/thinking (only used when enableReasoning=true). Higher budgets allow deeper reasoning but increase latency and cost. Default: 10000 for AI models, varies by provider.

Required range: 1000 <= x <= 128000
Example:

10000

tools
ToolSchema · object[] | null

List of tools the model may call. Pass-through to AI provider. Tools are defined and executed by the client, not the server.

Example:
[
  {
    "function": {
      "description": "Get current weather for a location",
      "name": "get_weather",
      "parameters": {
        "properties": {
          "location": { "type": "string" },
          "unit": {
            "enum": ["celsius", "fahrenheit"],
            "type": "string"
          }
        },
        "required": ["location"],
        "type": "object"
      }
    },
    "type": "function"
  }
]
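Since tools are defined and executed by the client, the client must dispatch each returned tool call to its own implementation and send the result back as a follow-up message. The sketch below assumes a local implementation of the get_weather tool from the example; the {"role": "tool", "toolCallId": ...} message shape is an assumption, as this reference does not specify the tool-result message format.

```python
import json

# Hypothetical local implementation of the get_weather tool from the
# example above.
def get_weather(location: str, unit: str = "celsius") -> dict:
    return {"location": location, "unit": unit, "temperature": 21}

LOCAL_TOOLS = {"get_weather": get_weather}

def run_tool_calls(tool_calls: list) -> list:
    """Execute each returned tool call locally and build follow-up messages.

    The role/toolCallId fields below are assumptions: this reference only
    states that tools are executed by the client, not the reply format.
    """
    results = []
    for call in tool_calls:
        fn = call["function"]
        raw = fn.get("arguments", {})
        args = json.loads(raw) if isinstance(raw, str) else raw
        output = LOCAL_TOOLS[fn["name"]](**args)
        results.append({
            "role": "tool",                # assumed role name
            "toolCallId": call["id"],      # assumed field name
            "content": json.dumps(output),
        })
    return results
```

The follow-up messages would then be appended to messages on the next request, along with the same chatId.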
toolChoice

Controls tool selection. Options: 'auto' (model decides), 'none' (no tools), 'required' (must use a tool), or specific tool object.

Example:

"auto"

parallelToolCalls
boolean | null

Whether to allow parallel tool calls (AI models-specific, default true)

enableArtifacts
boolean
default:false

Enable artifact creation. When enabled, the AI can create standalone documents (reports, code files, HTML pages, diagrams) that appear in a side panel. Requires stream=true. Artifact tools are injected server-side.

Examples:

false

true

fetchUrls
boolean
default:true

When enabled, URLs in user messages are automatically fetched and their content is provided to the AI as context. Enables 'summarize this article' or 'what does this page say' use cases. If a URL cannot be fetched (e.g., too large, blocked, or inaccessible), the error is communicated so the AI can inform the user.

Examples:

true

false

fileIds
string[] | null

File IDs to associate with this chat before processing. Use when the frontend uploads files first then creates/sends a chat message. Files are associated with the chat (creating vectors if needed) and included in RAG context. If vectors are not ready yet, extracted text is used as fallback context.

Example:
["7aa0316a-0a56-4c11-8d37-45f5cf40febd"]
useVlmRag
boolean | null

Control whether original images are fetched from storage and attached to the LLM prompt for image_ocr RAG chunks.

null (default) — server decides: enabled when the model supports vision and RAG_IMAGE_ATTACHMENT_ENABLED is true.
true — force on for this request (the model must support vision).
false — disable for this request regardless of server defaults.

Image tokens are billed as part of the normal input token charge.

Example:

null

Response

Server-Sent Events stream of chat completion chunks (when stream=true). Each SSE event has the format: data: <json>\n\n where <json> matches the StreamingChatResponse schema. The final event has finished: true and includes usage stats. A data: [DONE]\n\n event is sent after the last chunk.
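The event framing above (data: <json> lines terminated by a data: [DONE] sentinel) can be consumed with a small parser. A minimal sketch, assuming lines is any iterable of decoded text lines from the HTTP response body:

```python
import json

def iter_sse_chunks(lines):
    """Yield parsed StreamingChatResponse chunks from raw SSE lines.

    Skips blank separator lines between events and stops at the
    data: [DONE] sentinel described above.
    """
    for line in lines:
        line = line.strip()
        if not line.startswith("data:"):
            continue  # blank line between events, or a comment/keep-alive
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            return
        yield json.loads(payload)
```

Clients typically concatenate the delta fields as chunks arrive and read usage from the chunk where finished is true.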

id
string
required

Chat ID

delta
string
default:""

Text content delta

reasoning
string
default:""

Reasoning/thinking content delta

toolCalls
object[] | null

Tool call deltas (accumulated by client)

sources
SourceOut · object[] | null

RAG sources used for this response (sent once at start)

finished
boolean
default:false
usage
object
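Since toolCalls arrive as deltas to be accumulated by the client, a streaming client needs to merge fragments by index until the stream finishes. A sketch of that accumulation, assuming id and function.name arrive once per call while function.arguments arrives as string fragments (the usual delta convention; this reference does not spell it out):

```python
def accumulate_tool_calls(chunks):
    """Merge per-chunk toolCalls deltas into complete calls, keyed by index.

    Assumes id/name appear once per call and arguments arrive as string
    fragments to be concatenated, which this sketch takes on faith.
    """
    calls = {}
    for chunk in chunks:
        for delta in chunk.get("toolCalls") or []:
            call = calls.setdefault(delta["index"], {
                "id": "",
                "type": "function",
                "function": {"name": "", "arguments": ""},
            })
            if delta.get("id"):
                call["id"] = delta["id"]
            fn = delta.get("function") or {}
            if fn.get("name"):
                call["function"]["name"] = fn["name"]
            call["function"]["arguments"] += fn.get("arguments", "")
    return [calls[i] for i in sorted(calls)]
```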