Quick Start
curl -X POST https://api.cuadra.ai/v1/chats \
  -H "Authorization: Bearer YOUR_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "modelId": "model_abc123",
    "messages": [{"role": "user", "content": "Hello!"}]
  }'
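The same request from TypeScript, for reference. A minimal sketch assuming Node 18+ (global fetch) with the token in a CUADRA_TOKEN environment variable; the endpoint and body fields come from the curl example above:

// Send a chat request to the /v1/chats endpoint shown above.
const res = await fetch("https://api.cuadra.ai/v1/chats", {
  method: "POST",
  headers: {
    Authorization: `Bearer ${process.env.CUADRA_TOKEN}`,
    "Content-Type": "application/json",
  },
  body: JSON.stringify({
    modelId: "model_abc123",
    messages: [{ role: "user", content: "Hello!" }],
  }),
});
const chat = await res.json();
console.log(chat);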
Streaming Responses
Enable stream: true for real-time responses via Server-Sent Events:
curl -X POST https://api.cuadra.ai/v1/chats \
  -H "Authorization: Bearer YOUR_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{"modelId": "model_abc", "messages": [...], "stream": true}'
Stream format:
data: {"id":"chat_xyz","delta":"Once","finished":false}
data: {"id":"chat_xyz","delta":" upon","finished":false}
data: {"id":"chat_xyz","delta":"","finished":true,"usage":{...}}
data: [DONE]
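A minimal streaming consumer in TypeScript, built directly from the format above. It assumes Node 18+ (global fetch and ReadableStream) and buffers partial lines between chunks:

async function streamChat(token: string): Promise<void> {
  const res = await fetch("https://api.cuadra.ai/v1/chats", {
    method: "POST",
    headers: {
      Authorization: `Bearer ${token}`,
      "Content-Type": "application/json",
    },
    body: JSON.stringify({
      modelId: "model_abc",
      messages: [{ role: "user", content: "Tell me a story" }],
      stream: true,
    }),
  });

  const reader = res.body!.getReader();
  const decoder = new TextDecoder();
  let buffer = "";

  while (true) {
    const { done, value } = await reader.read();
    if (done) break;
    buffer += decoder.decode(value, { stream: true });

    // SSE events are newline-delimited; keep the last partial line buffered.
    const lines = buffer.split("\n");
    buffer = lines.pop() ?? "";

    for (const raw of lines) {
      const line = raw.trim();
      if (!line.startsWith("data: ")) continue;
      const payload = line.slice("data: ".length);
      if (payload === "[DONE]") return;
      const event = JSON.parse(payload);
      process.stdout.write(event.delta); // print new text as it arrives
      if (event.finished) console.log("\nusage:", event.usage);
    }
  }
}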
For Vercel AI SDK compatibility, add the SDK compatibility header to the request. The stream then emits these event types:
Events: start, text-delta, source-document, reasoning-delta, tool-input-delta, finish
Reasoning (Extended Thinking)
Enable enableReasoning: true to see the model’s thinking process. Supported by models with extended thinking capabilities — check the model catalog for availability.
{
  "modelId": "model_claude",
  "messages": [...],
  "enableReasoning": true,
  "reasoningBudget": 10000
}
Reasoning tokens are billed separately. Use reasoningBudget to cap costs.
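A budgeted request sketch in TypeScript. The request fields mirror the JSON above; where usage appears in a non-streaming response is not shown in this guide, so that assumption (top level, matching the final stream event) is flagged in the code:

const res = await fetch("https://api.cuadra.ai/v1/chats", {
  method: "POST",
  headers: {
    Authorization: `Bearer ${process.env.CUADRA_TOKEN}`,
    "Content-Type": "application/json",
  },
  body: JSON.stringify({
    modelId: "model_claude",
    messages: [{ role: "user", content: "Plan a 3-step experiment" }],
    enableReasoning: true,
    reasoningBudget: 10000, // caps reasoning tokens, which are billed separately
  }),
});

// Assumption: usage sits at the top level of the response,
// as it does in the final stream event.
const { usage } = await res.json();
console.log(usage); // verify reasoning-token spend stays under the budget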
Structured Outputs (JSON Mode)
Force JSON schema compliance with responseFormat:
{
  "modelId": "model_abc",
  "messages": [{"role": "user", "content": "Extract: iPhone 15 Pro costs $999"}],
  "responseFormat": {
    "type": "json_schema",
    "json_schema": {
      "name": "product",
      "strict": true,
      "schema": {
        "type": "object",
        "properties": {
          "name": {"type": "string"},
          "price": {"type": "number"}
        },
        "required": ["name", "price"]
      }
    }
  }
}
Response content will be valid JSON: {"name": "iPhone 15 Pro", "price": 999}
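Because strict mode guarantees schema compliance, the content can be parsed without validation fallbacks. A TypeScript sketch; the message.content accessor path is an assumption, since the full response shape is not shown here:

interface Product {
  name: string;
  price: number;
}

// Pull the schema-constrained JSON out of a /v1/chats response.
// Assumption: the assistant text lives at response.message.content.
function parseProduct(response: { message?: { content?: string } }): Product {
  const content = response.message?.content;
  if (!content) throw new Error("no assistant content in response");
  return JSON.parse(content) as Product; // strict mode guarantees valid JSON
}

// Usage, given `res` from sending the request above:
//   const product = parseProduct(await res.json());
//   console.log(`${product.name} costs $${product.price}`);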
Tool Calling
Define tools the model can invoke:
{
  "modelId": "model_abc",
  "messages": [{"role": "user", "content": "Weather in Paris?"}],
  "tools": [{
    "type": "function",
    "function": {
      "name": "get_weather",
      "description": "Get current weather",
      "parameters": {
        "type": "object",
        "properties": {"location": {"type": "string"}},
        "required": ["location"]
      }
    }
  }]
}
When the model calls a tool, respond with tool results:
{
  "chatId": "chat_xyz",
  "messages": [
    {"role": "user", "content": "Weather in Paris?"},
    {"role": "assistant", "toolCalls": [{"id": "call_1", "function": {"name": "get_weather", "arguments": "{\"location\": \"Paris\"}"}}]},
    {"role": "tool", "toolCallId": "call_1", "content": "{\"temp\": 18, \"conditions\": \"sunny\"}"}
  ]
}
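The full round trip in TypeScript. The request and follow-up bodies mirror the JSON above; the assumption that tool calls surface at message.toolCalls on the response is flagged in the code:

type ToolCall = { id: string; function: { name: string; arguments: string } };

const headers = {
  Authorization: `Bearer ${process.env.CUADRA_TOKEN}`,
  "Content-Type": "application/json",
};

// 1. Ask the question with the tool definition attached.
const first = await fetch("https://api.cuadra.ai/v1/chats", {
  method: "POST",
  headers,
  body: JSON.stringify({
    modelId: "model_abc",
    messages: [{ role: "user", content: "Weather in Paris?" }],
    tools: [{
      type: "function",
      function: {
        name: "get_weather",
        description: "Get current weather",
        parameters: {
          type: "object",
          properties: { location: { type: "string" } },
          required: ["location"],
        },
      },
    }],
  }),
}).then((r) => r.json());

// 2. If the model requested the tool, run it and send the result back.
//    Assumption: tool calls appear at first.message.toolCalls.
const call: ToolCall | undefined = first.message?.toolCalls?.[0];
if (call?.function.name === "get_weather") {
  const { location } = JSON.parse(call.function.arguments) as { location: string };
  console.log(`model asked for weather in ${location}`);
  const result = JSON.stringify({ temp: 18, conditions: "sunny" }); // stand-in for a real lookup

  const second = await fetch("https://api.cuadra.ai/v1/chats", {
    method: "POST",
    headers,
    body: JSON.stringify({
      chatId: first.id, // continue the same chat
      messages: [
        { role: "user", content: "Weather in Paris?" },
        { role: "assistant", toolCalls: [call] },
        { role: "tool", toolCallId: call.id, content: result },
      ],
    }),
  }).then((r) => r.json());
  console.log(second);
}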
Continuing Conversations
Use chatId to continue an existing chat:
{
  "chatId": "chat_xyz789",
  "messages": [{"role": "user", "content": "Tell me more"}]
}
Previous messages are automatically included in context.
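In TypeScript, that means capturing the chat id from the first response and passing it back on later turns. A sketch; it assumes the response exposes the id at the top level, matching the id field in the stream events above:

// Two turns against the same chat.
async function send(body: object) {
  const res = await fetch("https://api.cuadra.ai/v1/chats", {
    method: "POST",
    headers: {
      Authorization: `Bearer ${process.env.CUADRA_TOKEN}`,
      "Content-Type": "application/json",
    },
    body: JSON.stringify(body),
  });
  return res.json();
}

const first = await send({
  modelId: "model_abc",
  messages: [{ role: "user", content: "Summarize SSE in one line" }],
});

// Later turns only need the new message; prior context is included server-side.
const second = await send({
  chatId: first.id, // assumption: chat id is returned at the top level
  messages: [{ role: "user", content: "Tell me more" }],
});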
FAQ
How does streaming work?
The API sends Server-Sent Events (SSE) with incremental content. Each data: line contains a JSON object with delta (new text) and finished (boolean). Parse events as they arrive for real-time display.
What’s the max conversation length?
Limited by the model’s context window. The API automatically truncates old messages if needed. Check the model catalog for context window sizes.
Are responses cached?
No. Each request generates a fresh completion. For idempotent retries, send the same Idempotency-Key header on each attempt.
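A retry-safe request sketch in TypeScript; the Idempotency-Key header name comes from the answer above, while the key scheme (one UUID per logical request) is just one reasonable choice:

// Generate one key per logical request and reuse it on every retry.
const idempotencyKey = crypto.randomUUID();

const res = await fetch("https://api.cuadra.ai/v1/chats", {
  method: "POST",
  headers: {
    Authorization: `Bearer ${process.env.CUADRA_TOKEN}`,
    "Content-Type": "application/json",
    "Idempotency-Key": idempotencyKey,
  },
  body: JSON.stringify({
    modelId: "model_abc",
    messages: [{ role: "user", content: "Hello!" }],
  }),
});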
How do I count tokens before sending?
The response includes actual token counts in usage. For pre-request estimates, use a tokenizer library compatible with your model’s provider.
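For example, with an OpenAI-style tokenizer via the js-tiktoken package (an assumption that fits OpenAI-family models; other providers ship their own tokenizers, so treat the count as an estimate):

import { getEncoding } from "js-tiktoken";

// cl100k_base is the encoding used by many OpenAI-family models;
// pick the encoding that matches your model's provider.
const enc = getEncoding("cl100k_base");

const prompt = "Extract: iPhone 15 Pro costs $999";
const tokenCount = enc.encode(prompt).length;
console.log(`~${tokenCount} tokens before sending`);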