Quick Start
curl -X POST https://api.cuadra.ai/v1/chats \
  -H "Authorization: Bearer YOUR_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "modelId": "model_abc123",
    "messages": [{"role": "user", "content": "Hello!"}]
  }'
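The same request from TypeScript, for reference. A minimal sketch assuming Node 18+ (global fetch) with the token in a CUADRA_TOKEN environment variable; the endpoint and body fields come from the curl example above:

// Send a chat request to the /v1/chats endpoint shown above.
const res = await fetch("https://api.cuadra.ai/v1/chats", {
  method: "POST",
  headers: {
    Authorization: `Bearer ${process.env.CUADRA_TOKEN}`,
    "Content-Type": "application/json",
  },
  body: JSON.stringify({
    modelId: "model_abc123",
    messages: [{ role: "user", content: "Hello!" }],
  }),
});
const chat = await res.json();
console.log(chat);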
Streaming Responses
Enable stream: true for real-time responses via Server-Sent Events:
curl -X POST https://api.cuadra.ai/v1/chats \
  -H "Authorization: Bearer YOUR_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{"modelId": "model_abc", "messages": [...], "stream": true}'
Stream format:
data: {"id":"chat_xyz","delta":"Once","finished":false}
data: {"id":"chat_xyz","delta":" upon","finished":false}
data: {"id":"chat_xyz","delta":"","finished":true,"usage":{...}}
data: [DONE]
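A minimal streaming consumer in TypeScript, built directly from the format above. It assumes Node 18+ (global fetch and ReadableStream) and buffers partial lines between chunks:

async function streamChat(token: string): Promise<void> {
  const res = await fetch("https://api.cuadra.ai/v1/chats", {
    method: "POST",
    headers: {
      Authorization: `Bearer ${token}`,
      "Content-Type": "application/json",
    },
    body: JSON.stringify({
      modelId: "model_abc",
      messages: [{ role: "user", content: "Tell me a story" }],
      stream: true,
    }),
  });

  const reader = res.body!.getReader();
  const decoder = new TextDecoder();
  let buffer = "";

  while (true) {
    const { done, value } = await reader.read();
    if (done) break;
    buffer += decoder.decode(value, { stream: true });

    // SSE events are newline-delimited; keep the last partial line buffered.
    const lines = buffer.split("\n");
    buffer = lines.pop() ?? "";

    for (const raw of lines) {
      const line = raw.trim();
      if (!line.startsWith("data: ")) continue;
      const payload = line.slice("data: ".length);
      if (payload === "[DONE]") return;
      const event = JSON.parse(payload);
      process.stdout.write(event.delta); // print new text as it arrives
      if (event.finished) console.log("\nusage:", event.usage);
    }
  }
}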
For Vercel AI SDK compatibility, add the SDK compatibility header to the request. The stream then emits these event types:
Events: start, text-delta, source-document, reasoning-delta, tool-input-delta, finish
Reasoning (Extended Thinking)
Enable enableReasoning: true to see the model’s thinking process. Supported by models with extended thinking capabilities — check the model catalog for availability.
{
  "modelId": "model_claude",
  "messages": [...],
  "enableReasoning": true,
  "reasoningBudget": 10000
}
Reasoning tokens are billed separately. Use reasoningBudget to cap costs.
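A budgeted request sketch in TypeScript. The request fields mirror the JSON above; where usage appears in a non-streaming response is not shown in this guide, so that assumption (top level, matching the final stream event) is flagged in the code:

const res = await fetch("https://api.cuadra.ai/v1/chats", {
  method: "POST",
  headers: {
    Authorization: `Bearer ${process.env.CUADRA_TOKEN}`,
    "Content-Type": "application/json",
  },
  body: JSON.stringify({
    modelId: "model_claude",
    messages: [{ role: "user", content: "Plan a 3-step experiment" }],
    enableReasoning: true,
    reasoningBudget: 10000, // caps reasoning tokens, which are billed separately
  }),
});

// Assumption: usage sits at the top level of the response,
// as it does in the final stream event.
const { usage } = await res.json();
console.log(usage); // verify reasoning-token spend stays under the budget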
Structured Outputs (JSON Mode)
Force JSON schema compliance with responseFormat:
{
  "modelId": "model_abc",
  "messages": [{"role": "user", "content": "Extract: iPhone 15 Pro costs $999"}],
  "responseFormat": {
    "type": "json_schema",
    "json_schema": {
      "name": "product",
      "strict": true,
      "schema": {
        "type": "object",
        "properties": {
          "name": {"type": "string"},
          "price": {"type": "number"}
        },
        "required": ["name", "price"]
      }
    }
  }
}
Response content will be valid JSON: {"name": "iPhone 15 Pro", "price": 999}
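Because strict mode guarantees schema compliance, the content can be parsed without validation fallbacks. A TypeScript sketch; the message.content accessor path is an assumption, since the full response shape is not shown here:

interface Product {
  name: string;
  price: number;
}

// Pull the schema-constrained JSON out of a /v1/chats response.
// Assumption: the assistant text lives at response.message.content.
function parseProduct(response: { message?: { content?: string } }): Product {
  const content = response.message?.content;
  if (!content) throw new Error("no assistant content in response");
  return JSON.parse(content) as Product; // strict mode guarantees valid JSON
}

// Usage, given `res` from sending the request above:
//   const product = parseProduct(await res.json());
//   console.log(`${product.name} costs $${product.price}`);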
Tool Calling
Define tools the model can invoke:
{
  "modelId": "model_abc",
  "messages": [{"role": "user", "content": "Weather in Paris?"}],
  "tools": [{
    "type": "function",
    "function": {
      "name": "get_weather",
      "description": "Get current weather",
      "parameters": {
        "type": "object",
        "properties": {"location": {"type": "string"}},
        "required": ["location"]
      }
    }
  }]
}
When the model calls a tool, respond with tool results:
{
  "chatId": "chat_xyz",
  "messages": [
    {"role": "user", "content": "Weather in Paris?"},
    {"role": "assistant", "toolCalls": [{"id": "call_1", "function": {"name": "get_weather", "arguments": "{\"location\": \"Paris\"}"}}]},
    {"role": "tool", "toolCallId": "call_1", "content": "{\"temp\": 18, \"conditions\": \"sunny\"}"}
  ]
}
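The full round trip in TypeScript. The request and follow-up bodies mirror the JSON above; the assumption that tool calls surface at message.toolCalls on the response is flagged in the code:

type ToolCall = { id: string; function: { name: string; arguments: string } };

const headers = {
  Authorization: `Bearer ${process.env.CUADRA_TOKEN}`,
  "Content-Type": "application/json",
};

// 1. Ask the question with the tool definition attached.
const first = await fetch("https://api.cuadra.ai/v1/chats", {
  method: "POST",
  headers,
  body: JSON.stringify({
    modelId: "model_abc",
    messages: [{ role: "user", content: "Weather in Paris?" }],
    tools: [{
      type: "function",
      function: {
        name: "get_weather",
        description: "Get current weather",
        parameters: {
          type: "object",
          properties: { location: { type: "string" } },
          required: ["location"],
        },
      },
    }],
  }),
}).then((r) => r.json());

// 2. If the model requested the tool, run it and send the result back.
//    Assumption: tool calls appear at first.message.toolCalls.
const call: ToolCall | undefined = first.message?.toolCalls?.[0];
if (call?.function.name === "get_weather") {
  const { location } = JSON.parse(call.function.arguments) as { location: string };
  console.log(`model asked for weather in ${location}`);
  const result = JSON.stringify({ temp: 18, conditions: "sunny" }); // stand-in for a real lookup

  const second = await fetch("https://api.cuadra.ai/v1/chats", {
    method: "POST",
    headers,
    body: JSON.stringify({
      chatId: first.id, // continue the same chat
      messages: [
        { role: "user", content: "Weather in Paris?" },
        { role: "assistant", toolCalls: [call] },
        { role: "tool", toolCallId: call.id, content: result },
      ],
    }),
  }).then((r) => r.json());
  console.log(second);
}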
Continuing Conversations
Use chatId to continue an existing chat:
{
  "chatId": "chat_xyz789",
  "messages": [{"role": "user", "content": "Tell me more"}]
}
Previous messages are automatically included in context.
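In TypeScript, that means capturing the chat id from the first response and passing it back on later turns. A sketch; it assumes the response exposes the id at the top level, matching the id field in the stream events above:

// Two turns against the same chat.
async function send(body: object) {
  const res = await fetch("https://api.cuadra.ai/v1/chats", {
    method: "POST",
    headers: {
      Authorization: `Bearer ${process.env.CUADRA_TOKEN}`,
      "Content-Type": "application/json",
    },
    body: JSON.stringify(body),
  });
  return res.json();
}

const first = await send({
  modelId: "model_abc",
  messages: [{ role: "user", content: "Summarize SSE in one line" }],
});

// Later turns only need the new message; prior context is included server-side.
const second = await send({
  chatId: first.id, // assumption: chat id is returned at the top level
  messages: [{ role: "user", content: "Tell me more" }],
});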
FAQ
How does streaming work?
The API sends Server-Sent Events (SSE) with incremental content. Each data: line contains a JSON object with delta (new text) and finished (boolean). Parse events as they arrive for real-time display.
What’s the max conversation length?
Limited by the model’s context window. The API automatically truncates old messages if needed. Check the model catalog for context window sizes.
Are responses cached?
No. Each request generates a fresh completion. For idempotent retries, send the same Idempotency-Key header on each attempt.
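A retry-safe request sketch in TypeScript; the Idempotency-Key header name comes from the answer above, while the key scheme (one UUID per logical request) is just one reasonable choice:

// Generate one key per logical request and reuse it on every retry.
const idempotencyKey = crypto.randomUUID();

const res = await fetch("https://api.cuadra.ai/v1/chats", {
  method: "POST",
  headers: {
    Authorization: `Bearer ${process.env.CUADRA_TOKEN}`,
    "Content-Type": "application/json",
    "Idempotency-Key": idempotencyKey,
  },
  body: JSON.stringify({
    modelId: "model_abc",
    messages: [{ role: "user", content: "Hello!" }],
  }),
});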
How do I count tokens before sending?
The response includes actual token counts in usage. For pre-request estimates, use a tokenizer library compatible with your model’s provider.
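For example, with an OpenAI-style tokenizer via the js-tiktoken package (an assumption that fits OpenAI-family models; other providers ship their own tokenizers, so treat the count as an estimate):

import { getEncoding } from "js-tiktoken";

// cl100k_base is the encoding used by many OpenAI-family models;
// pick the encoding that matches your model's provider.
const enc = getEncoding("cl100k_base");

const prompt = "Extract: iPhone 15 Pro costs $999";
const tokenCount = enc.encode(prompt).length;
console.log(`~${tokenCount} tokens before sending`);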