
# Chat API

> Create chat completions with the Cuadra AI Chat API. Supports streaming, reasoning tokens, RAG sources, tool calling, and structured JSON outputs.

## Quick Start

<CodeGroup>
  ```bash curl theme={null}
  curl -X POST https://api.cuadra.ai/v1/chats \
    -H "Authorization: Bearer YOUR_TOKEN" \
    -H "Content-Type: application/json" \
    -d '{
      "modelId": "model_abc123",
      "messages": [{"role": "user", "content": "Hello!"}]
    }'
  ```

  ```python Python theme={null}
  import httpx

  response = httpx.post(
      "https://api.cuadra.ai/v1/chats",
      headers={"Authorization": "Bearer YOUR_TOKEN"},
      json={
          "modelId": "model_abc123",
          "messages": [{"role": "user", "content": "Hello!"}]
      }
  )
  print(response.json()["message"]["content"])
  ```

  ```typescript Node.js theme={null}
  const response = await fetch('https://api.cuadra.ai/v1/chats', {
    method: 'POST',
    headers: {
      'Authorization': 'Bearer YOUR_TOKEN',
      'Content-Type': 'application/json'
    },
    body: JSON.stringify({
      modelId: 'model_abc123',
      messages: [{ role: 'user', content: 'Hello!' }]
    })
  });
  const { message } = await response.json();
  console.log(message.content);
  ```
</CodeGroup>

***

## Streaming Responses

Set `stream: true` to receive the response incrementally as Server-Sent Events (SSE):

<CodeGroup>
  ```bash curl theme={null}
  curl -X POST https://api.cuadra.ai/v1/chats \
    -H "Authorization: Bearer YOUR_TOKEN" \
    -H "Content-Type: application/json" \
    -d '{"modelId": "model_abc", "messages": [...], "stream": true}'
  ```

  ```python Python theme={null}
  from httpx_sse import aconnect_sse
  import httpx, asyncio

  async def stream():
      async with httpx.AsyncClient() as client:
          async with aconnect_sse(client, "POST", "https://api.cuadra.ai/v1/chats",
              headers={"Authorization": "Bearer YOUR_TOKEN"},
              json={"modelId": "model_abc", "messages": [...], "stream": True}
          ) as sse:
              async for event in sse.aiter_sse():
                  if event.data != "[DONE]":
                      print(event.data)

  asyncio.run(stream())
  ```

  ```typescript Node.js theme={null}
  const response = await fetch('https://api.cuadra.ai/v1/chats', {
    method: 'POST',
    headers: { 'Authorization': 'Bearer YOUR_TOKEN', 'Content-Type': 'application/json' },
    body: JSON.stringify({ modelId: 'model_abc', messages: [...], stream: true })
  });

  const reader = response.body.getReader();
  const decoder = new TextDecoder();
  while (true) {
    const { done, value } = await reader.read();
    if (done) break;
    // stream: true keeps multi-byte characters intact when they span two chunks
    console.log(decoder.decode(value, { stream: true }));
  }
  ```
</CodeGroup>

**Stream format:**

```
data: {"id":"chat_xyz","delta":"Once","finished":false}
data: {"id":"chat_xyz","delta":" upon","finished":false}
data: {"id":"chat_xyz","delta":"","finished":true,"usage":{...}}
data: [DONE]
```
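The chunks above can be reassembled into the full reply by concatenating each `delta` until the `[DONE]` sentinel arrives. The following is a minimal sketch, assuming each event is delivered as a single `data:` line as shown; `collect_stream` is a hypothetical helper name, not part of any Cuadra SDK, and the sample `usage` object is left empty because its fields are not documented on this page.

```python
import json

def collect_stream(lines):
    """Accumulate text from an iterable of raw SSE lines.

    Assumes the chunk shape shown above: `delta`, `finished`, optional `usage`.
    """
    text_parts = []
    usage = None
    for line in lines:
        line = line.strip()
        if not line.startswith("data:"):
            continue  # skip blank separators and SSE comment lines
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            break  # end-of-stream sentinel
        chunk = json.loads(payload)
        text_parts.append(chunk.get("delta", ""))
        if chunk.get("finished"):
            usage = chunk.get("usage")  # only present on the final chunk
    return "".join(text_parts), usage

sample = [
    'data: {"id":"chat_xyz","delta":"Once","finished":false}',
    'data: {"id":"chat_xyz","delta":" upon","finished":false}',
    'data: {"id":"chat_xyz","delta":"","finished":true,"usage":{}}',
    'data: [DONE]',
]
text, usage = collect_stream(sample)
print(text)  # prints: Once upon
```

In a real client, the same function could be fed the lines yielded by an HTTP streaming response instead of the hard-coded `sample` list.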

### AI SDK Format

For Vercel AI SDK compatibility, add the header:

```http theme={null}
X-Stream-Format: ai-sdk
```

Events: `start`, `text-delta`, `source-document`, `reasoning-delta`, `tool-input-delta`, `finish`

***

## Reasoning (Extended Thinking)

Set `enableReasoning: true` to see the model's thinking process. Only models with extended thinking capabilities support this option; check the model catalog for availability.

```json theme={null}
{
  "modelId": "model_claude",
  "messages": [...],
  "enableReasoning": true,
  "reasoningBudget": 10000
}
```

<Warning>
  Reasoning tokens are billed separately. Use `reasoningBudget` to cap costs.
</Warning>

***

## Structured Outputs (JSON Mode)

Constrain the response to a JSON schema with `responseFormat`:

```json theme={null}
{
  "modelId": "model_abc",
  "messages": [{"role": "user", "content": "Extract: iPhone 15 Pro costs $999"}],
  "responseFormat": {
    "type": "json_schema",
    "json_schema": {
      "name": "product",
      "strict": true,
      "schema": {
        "type": "object",
        "properties": {
          "name": {"type": "string"},
          "price": {"type": "number"}
        },
        "required": ["name", "price"]
      }
    }
  }
}
```

Response content will be valid JSON: `{"name": "iPhone 15 Pro", "price": 999}`
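Because the content arrives as a JSON string, the client still has to decode it before use. A minimal sketch of decoding and sanity-checking the `product` example above; `parse_product` is a hypothetical helper, and the checks simply mirror the `required` and `type` entries of the schema rather than performing full JSON Schema validation.

```python
import json

def parse_product(content):
    """Decode message content produced under the `product` schema above."""
    data = json.loads(content)
    missing = [key for key in ("name", "price") if key not in data]
    if missing:
        raise ValueError(f"missing required fields: {missing}")
    if not isinstance(data["name"], str):
        raise ValueError("name must be a string")
    # bool is a subclass of int in Python, so exclude it explicitly
    if isinstance(data["price"], bool) or not isinstance(data["price"], (int, float)):
        raise ValueError("price must be a number")
    return data

product = parse_product('{"name": "iPhone 15 Pro", "price": 999}')
print(product["name"], product["price"])
```

Keeping a client-side check like this is a defensive choice: it catches schema drift between the request and the code that consumes the result.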

***

## Tool Calling (Function Calling)

Define tools the model can invoke:

```json theme={null}
{
  "modelId": "model_abc",
  "messages": [{"role": "user", "content": "Weather in Paris?"}],
  "tools": [{
    "type": "function",
    "function": {
      "name": "get_weather",
      "description": "Get current weather",
      "parameters": {
        "type": "object",
        "properties": {"location": {"type": "string"}},
        "required": ["location"]
      }
    }
  }]
}
```

When the model calls a tool, run it and send the result back as a `tool` message whose `toolCallId` matches the `id` of the call:

```json theme={null}
{
  "chatId": "chat_xyz",
  "messages": [
    {"role": "user", "content": "Weather in Paris?"},
    {"role": "assistant", "toolCalls": [{"id": "call_1", "function": {"name": "get_weather", "arguments": "{\"location\":\"Paris\"}"}}]},
    {"role": "tool", "toolCallId": "call_1", "content": "{\"temp\": 18, \"conditions\": \"sunny\"}"}
  ]
}
```
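The round trip above can be sketched as a small dispatcher that maps each tool call to a local function and builds the matching `tool` message. This is an illustrative sketch only: `TOOLS`, `run_tool_calls`, and the stubbed `get_weather` body are hypothetical, and the message shapes are taken from the example request above rather than from a full API specification.

```python
import json

def get_weather(location):
    # Stand-in implementation; a real tool would query a weather service.
    return {"temp": 18, "conditions": "sunny"}

# Registry mapping tool names (as declared in `tools`) to local callables.
TOOLS = {"get_weather": get_weather}

def run_tool_calls(assistant_message):
    """Turn an assistant message's toolCalls into `tool` role messages."""
    results = []
    for call in assistant_message.get("toolCalls", []):
        fn = TOOLS[call["function"]["name"]]
        args = json.loads(call["function"]["arguments"])  # arguments is a JSON string
        results.append({
            "role": "tool",
            "toolCallId": call["id"],  # must match the id of the call being answered
            "content": json.dumps(fn(**args)),
        })
    return results

assistant_message = {
    "role": "assistant",
    "toolCalls": [{
        "id": "call_1",
        "function": {"name": "get_weather", "arguments": "{\"location\":\"Paris\"}"},
    }],
}
tool_messages = run_tool_calls(assistant_message)
print(tool_messages)
```

The returned messages would then be appended to `messages` in the follow-up request, after the assistant message that contained the calls.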

***

## Continuing Conversations

Use `chatId` to continue an existing chat:

```json theme={null}
{
  "chatId": "chat_xyz789",
  "messages": [{"role": "user", "content": "Tell me more"}]
}
```

Previous messages are automatically included in context.

***

## FAQ

### How does streaming work?

The API sends Server-Sent Events (SSE) with incremental content. Each `data:` line contains a JSON object with `delta` (the new text) and `finished` (a boolean), and the stream ends with `data: [DONE]`. Parse events as they arrive for real-time display.

### What's the max conversation length?

Conversation length is limited by the model's context window. The API automatically truncates older messages when a conversation exceeds it. Check the model catalog for context window sizes.

### Are responses cached?

No. Each request generates a fresh completion. To make retries idempotent, send the same `Idempotency-Key` header value on every retry of a request.
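One way to reuse an `Idempotency-Key` across retries is to generate it once per logical request and pass it back in on each retry. A minimal sketch, assuming a UUID is an acceptable key format (this page does not specify one); `build_headers` is a hypothetical helper.

```python
import uuid

def build_headers(token, idempotency_key=None):
    """Build request headers, generating an Idempotency-Key if none is given."""
    return {
        "Authorization": f"Bearer {token}",
        "Content-Type": "application/json",
        # Assumption: any unique string works as a key; a UUID4 is used here.
        "Idempotency-Key": idempotency_key or str(uuid.uuid4()),
    }

first_attempt = build_headers("YOUR_TOKEN")
# On retry, pass the original key back in so both attempts share it.
retry = build_headers("YOUR_TOKEN", first_attempt["Idempotency-Key"])
print(first_attempt["Idempotency-Key"] == retry["Idempotency-Key"])
```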

### How do I count tokens before sending?

The response includes actual token counts in `usage`. For pre-request estimates, use a tokenizer library compatible with your model's provider.

***

## Related

<CardGroup cols={2}>
  <Card title="Models API" icon="cube" href="/api-reference/models">
    Create and configure models
  </Card>

  <Card title="Knowledge Bases" icon="database" href="/guides/knowledge-bases">
    Add documents for RAG
  </Card>
</CardGroup>