# Knowledge Bases

> Create datasets, upload documents, and power RAG-based AI assistants in Cuadra AI.

## How RAG Works

| Step            | Process                                    |
| --------------- | ------------------------------------------ |
| **1. Upload**   | Add files to the platform                  |
| **2. Chunk**    | Documents split into \~250-token segments  |
| **3. Embed**    | Vector embeddings generated for each chunk |
| **4. Search**   | User query matched against embeddings      |
| **5. Retrieve** | Top chunks injected into LLM context       |
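
The five steps above can be sketched end to end in a few lines. This is a toy illustration only, not Cuadra AI's implementation: whitespace-separated words stand in for tokens, a bag-of-words count vector stands in for a learned embedding, and all function names are hypothetical.

```python
import math
import re
from collections import Counter

CHUNK_TOKENS = 250  # mirrors the ~250-token segments in the table above


def tokenize(text: str) -> list[str]:
    """Crude stand-in for a real tokenizer: lowercase alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def chunk(text: str, size: int = CHUNK_TOKENS) -> list[str]:
    """Step 2: split a document into segments of at most `size` words."""
    words = text.split()
    return [" ".join(words[i:i + size]) for i in range(0, len(words), size)]


def embed(text: str) -> Counter:
    """Step 3: toy 'embedding' (assumption): a bag-of-words count vector."""
    return Counter(tokenize(text))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def retrieve(query: str, chunks: list[str], top_k: int = 2) -> list[str]:
    """Steps 4-5: rank chunks by similarity to the query and keep the best ones."""
    q = embed(query)
    ranked = sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)
    return ranked[:top_k]


# Step 1: the "uploaded" documents.
docs = [
    "Refunds are issued within 14 days of purchase.",
    "API keys can be rotated from the dashboard settings page.",
    "Our office is closed on public holidays.",
]
question = "How do I rotate my API key?"
chunks = [c for d in docs for c in chunk(d)]
context = retrieve(question, chunks, top_k=1)

# The retrieved chunks are prepended to the question before it reaches the LLM.
prompt = "Context:\n" + "\n".join(context) + "\n\nQuestion: " + question
```

A production system would swap `embed` for a real embedding model and `retrieve` for a vector index, but the data flow stays the same.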

***

## Create Dataset

<CodeGroup>
  ```bash curl theme={null}
  curl -X POST https://api.cuadra.ai/v1/datasets \
    -H "Authorization: Bearer YOUR_TOKEN" \
    -H "Content-Type: application/json" \
    -H "Idempotency-Key: create-ds-001" \
    -d '{"name": "Product Docs", "description": "API guides"}'
  ```

  ```python Python theme={null}
  import httpx

  response = httpx.post(
      "https://api.cuadra.ai/v1/datasets",
      headers={"Authorization": "Bearer YOUR_TOKEN", "Idempotency-Key": "create-ds-001"},
      json={"name": "Product Docs", "description": "API guides"}
  )
  response.raise_for_status()  # raise on 4xx/5xx instead of parsing an error body
  dataset = response.json()
  ```

  ```typescript Node.js theme={null}
  const response = await fetch('https://api.cuadra.ai/v1/datasets', {
    method: 'POST',
    headers: {
      'Authorization': 'Bearer YOUR_TOKEN',
      'Content-Type': 'application/json',
      'Idempotency-Key': 'create-ds-001'
    },
    body: JSON.stringify({ name: 'Product Docs', description: 'API guides' })
  });
  const dataset = await response.json();
  ```
</CodeGroup>
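
The examples above send a fixed `Idempotency-Key`. This page does not document the header's semantics, so the sketch below assumes the common convention: a retry that reuses the same key is treated as the same operation rather than creating a duplicate. Under that assumption, a client generates one key per logical operation and reuses it across retries. `post_with_retries` and the `send` callable are hypothetical names for illustration.

```python
import uuid


def build_headers(token: str, idempotency_key: str) -> dict[str, str]:
    """Headers used by the create and upload requests on this page."""
    return {
        "Authorization": f"Bearer {token}",
        "Idempotency-Key": idempotency_key,
    }


def post_with_retries(send, url, payload, token, attempts=3, retry_on=(ConnectionError,)):
    """Call `send(url, headers, payload)` until it succeeds or attempts run out.

    `send` is a hypothetical callable that performs the POST. The same
    Idempotency-Key is sent on every attempt (assumed server-side dedup).
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    key = str(uuid.uuid4())  # one key per logical operation, not per attempt
    headers = build_headers(token, key)
    last_error = None
    for _ in range(attempts):
        try:
            return send(url, headers, payload)
        except retry_on as exc:
            last_error = exc
    raise last_error
```

With `httpx`, `send` could be `lambda url, headers, payload: httpx.post(url, headers=headers, json=payload)` with `retry_on=(httpx.TransportError,)`, since `httpx` raises its own exception types for network failures.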

***

## Upload Documents

Adding documents is a two-step process: upload the file, then associate it with a dataset.

### Step 1: Upload File

<CodeGroup>
  ```bash curl theme={null}
  curl -X POST https://api.cuadra.ai/v1/files \
    -H "Authorization: Bearer YOUR_TOKEN" \
    -H "Idempotency-Key: upload-001" \
    -F "file=@product-guide.pdf"
  ```

  ```python Python theme={null}
  import httpx

  with open("product-guide.pdf", "rb") as f:
      response = httpx.post(
          "https://api.cuadra.ai/v1/files",
          headers={"Authorization": "Bearer YOUR_TOKEN", "Idempotency-Key": "upload-001"},
          files={"file": f}
      )
  response.raise_for_status()  # raise on 4xx/5xx instead of parsing an error body
  file = response.json()
  file_id = file["id"]
  ```

  ```typescript Node.js theme={null}
  // fileBlob: a Blob or File holding the PDF bytes (for example, from a file input)
  const formData = new FormData();
  formData.append('file', fileBlob, 'product-guide.pdf');

  const response = await fetch('https://api.cuadra.ai/v1/files', {
    method: 'POST',
    headers: { 'Authorization': 'Bearer YOUR_TOKEN', 'Idempotency-Key': 'upload-001' },
    body: formData
  });
  const file = await response.json();
  const fileId = file.id;
  ```
</CodeGroup>

### Step 2: Associate with Dataset

<CodeGroup>
  ```bash curl theme={null}
  curl -X POST https://api.cuadra.ai/v1/files/file_abc123/associations \
    -H "Authorization: Bearer YOUR_TOKEN" \
    -H "Content-Type: application/json" \
    -d '{"datasetId": "ds_xyz789"}'
  ```

  ```python Python theme={null}
  import httpx

  response = httpx.post(
      f"https://api.cuadra.ai/v1/files/{file_id}/associations",
      headers={"Authorization": "Bearer YOUR_TOKEN"},
      json={"datasetId": "ds_xyz789"}
  )
  response.raise_for_status()  # only report success if the request succeeded
  print("File added to dataset")
  ```

  ```typescript Node.js theme={null}
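  const response = await fetch(`https://api.cuadra.ai/v1/files/${fileId}/associations`, {
    method: 'POST',
    headers: {
      'Authorization': 'Bearer YOUR_TOKEN',
      'Content-Type': 'application/json'
    },
    body: JSON.stringify({ datasetId: 'ds_xyz789' })
  });
  // fetch does not reject on HTTP errors, so check the status explicitly
  if (!response.ok) throw new Error(`Association failed: ${response.status}`);
  console.log('File added to dataset');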
  ```
</CodeGroup>

### Supported Formats

| Format | Extensions  | Max Size |
| ------ | ----------- | -------- |
| PDF    | .pdf        | 50MB     |
| Word   | .docx       | 50MB     |
| Text   | .txt, .md   | 50MB     |
| Data   | .csv, .json | 50MB     |
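
A client-side check against the table above can catch unsupported files before an upload is attempted. The sketch below is a minimal example; `check_upload` is a hypothetical helper, and treating 50MB as 50 × 1024 × 1024 bytes is an assumption, since the page does not say whether the limit is decimal or binary.

```python
from pathlib import Path

MAX_BYTES = 50 * 1024 * 1024  # 50MB limit from the table; binary megabytes assumed
SUPPORTED_EXTENSIONS = {".pdf", ".docx", ".txt", ".md", ".csv", ".json"}


def check_upload(filename: str, size_bytes: int) -> list[str]:
    """Return a list of problems; an empty list means the file looks uploadable."""
    problems = []
    ext = Path(filename).suffix.lower()
    if ext not in SUPPORTED_EXTENSIONS:
        problems.append(f"unsupported extension: {ext or '(none)'}")
    if size_bytes > MAX_BYTES:
        problems.append(f"file is {size_bytes} bytes; limit is {MAX_BYTES}")
    return problems
```

For a file on disk, call it as `check_upload(path.name, path.stat().st_size)` with a `pathlib.Path`.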

***

## Link to Model

Link a dataset to a model to enable RAG for that model:

<CodeGroup>
  ```bash curl theme={null}
  curl -X POST https://api.cuadra.ai/v1/models/model_abc/datasets \
    -H "Authorization: Bearer YOUR_TOKEN" \
    -H "Content-Type: application/json" \
    -d '{"datasetId": "ds_xyz", "usageType": "rag"}'
  ```

  ```python Python theme={null}
  import httpx

  response = httpx.post(
      "https://api.cuadra.ai/v1/models/model_abc/datasets",
      headers={"Authorization": "Bearer YOUR_TOKEN"},
      json={"datasetId": "ds_xyz", "usageType": "rag"}
  )
  response.raise_for_status()  # only report success if the request succeeded
  print("Dataset linked to model")
  ```

  ```typescript Node.js theme={null}
  const response = await fetch('https://api.cuadra.ai/v1/models/model_abc/datasets', {
    method: 'POST',
    headers: {
      'Authorization': 'Bearer YOUR_TOKEN',
      'Content-Type': 'application/json'
    },
    body: JSON.stringify({ datasetId: 'ds_xyz', usageType: 'rag' })
  });
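  // fetch does not reject on HTTP errors, so check the status explicitly
  if (!response.ok) throw new Error(`Link failed: ${response.status}`);
  console.log('Dataset linked to model');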
  ```
</CodeGroup>
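
The three calls on this page (upload, associate, link) can be chained into one helper. The sketch below uses only the endpoints and fields shown above; `add_file_to_model` is a hypothetical name, and `client` is assumed to be any object with an `httpx`-style `post(url, **kwargs)` method that already carries the `Authorization` header. The `Idempotency-Key` header is omitted for brevity.

```python
BASE_URL = "https://api.cuadra.ai/v1"


def add_file_to_model(client, path, dataset_id, model_id):
    """Upload `path`, attach it to `dataset_id`, then link the dataset to `model_id`.

    Returns the ID of the uploaded file. Each step raises on an HTTP error,
    so later steps never run against a failed earlier one.
    """
    # Step 1: upload the file and read its ID from the response.
    with open(path, "rb") as f:
        upload = client.post(f"{BASE_URL}/files", files={"file": f})
    upload.raise_for_status()
    file_id = upload.json()["id"]

    # Step 2: associate the uploaded file with the dataset.
    assoc = client.post(
        f"{BASE_URL}/files/{file_id}/associations",
        json={"datasetId": dataset_id},
    )
    assoc.raise_for_status()

    # Step 3: link the dataset to the model for RAG.
    link = client.post(
        f"{BASE_URL}/models/{model_id}/datasets",
        json={"datasetId": dataset_id, "usageType": "rag"},
    )
    link.raise_for_status()
    return file_id
```

With `httpx`, a suitable client is `httpx.Client(headers={"Authorization": "Bearer YOUR_TOKEN"})`, for example `add_file_to_model(client, "product-guide.pdf", "ds_xyz", "model_abc")`.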

***

## Best Practices

| Do                             | Avoid                          |
| ------------------------------ | ------------------------------ |
| Clean formatting before upload | Scanned images without OCR     |
| Use descriptive filenames      | Duplicate content across files |
| Split large docs into sections | Mixing unrelated topics        |
| Group related content          | PII or sensitive data          |

***

## Specifications

| Spec           | Value        |
| -------------- | ------------ |
| Max file size  | 50MB         |
| Chunk size     | \~250 tokens |
| Search latency | 40-120ms     |

***

## FAQ

### What file formats work best?

Markdown and plain text yield the best results. PDFs work well if they're text-based (not scanned images). Use OCR preprocessing for scanned documents.

### How often is content re-indexed?

Files are indexed once, at upload time. To refresh the content, re-upload the file.

### Can I preview what chunks were created?

Not via the API at present. To inspect chunks, go to Dashboard → Datasets → View.

### How do I improve retrieval quality?

1. Use specific, descriptive filenames
2. Add summaries at the start of documents
3. Remove boilerplate and headers that repeat across pages
4. Split very long documents into logical sections
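
Tips 3 and 4 can be applied as a preprocessing pass before upload. The sketch below is one possible approach, not a Cuadra AI feature: both helper names are hypothetical, the 60% threshold for "repeated" is an arbitrary assumption, and the splitter assumes Markdown input with `#` or `##` headings marking sections.

```python
import math
import re
from collections import Counter


def strip_repeated_lines(pages: list[str], min_share: float = 0.6) -> list[str]:
    """Drop lines (such as running headers) that appear on at least `min_share` of pages."""
    counts = Counter()
    for page in pages:
        # Count each distinct non-empty line once per page.
        counts.update({line.strip() for line in page.splitlines() if line.strip()})
    threshold = max(2, math.ceil(min_share * len(pages)))
    repeated = {line for line, n in counts.items() if n >= threshold}
    return [
        "\n".join(l for l in page.splitlines() if l.strip() not in repeated)
        for page in pages
    ]


def split_by_heading(markdown: str) -> list[str]:
    """Split a Markdown document into sections that each start at a # or ## heading."""
    parts = re.split(r"(?m)^(?=#{1,2} )", markdown)
    return [p.strip() for p in parts if p.strip()]
```

Each section returned by `split_by_heading` can then be saved and uploaded as its own file with a descriptive filename.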

### What happens if I delete a document?

The document and its chunks are removed. Only new chats are affected; existing chat histories retain their context.

***

## Related

<CardGroup cols="2">
  <Card title="Chat API" icon="comments" href="/api-reference/chat">
    Use RAG in chat completions
  </Card>

  <Card title="Models" icon="cube" href="/api-reference/models">
    Link datasets to models
  </Card>
</CardGroup>