The platform exposes OpenAI-compatible endpoints, allowing you to use existing OpenAI SDKs and tools by simply changing the base URL and API key. All endpoints follow the OpenAI API specification, so any client library that supports a custom base URL will work.
To migrate, replace the OpenAI base URL `https://api.openai.com/v1` with your platform host's `/v1` base URL. Authenticate by sending your API key in the `Authorization` header using the Bearer scheme, the same way you would with the OpenAI API: `Authorization: Bearer YOUR_API_KEY`. API keys can be created and rotated in Settings → API Keys.
## /v1/models

List all available models. Returns model IDs you can use in chat completions and embeddings requests.
Response

```json
{
  "object": "list",
  "data": [
    {
      "id": "gpt-4o",
      "object": "model",
      "created": 1700000000,
      "owned_by": "organization"
    }
  ]
}
```

Examples
```shell
curl https://your-api-host.com/v1/models \
  -H "Authorization: Bearer YOUR_API_KEY"
```

## /v1/chat/completions

Create a chat completion. Supports both streaming and non-streaming modes.
Request body

```json
{
  "model": "gpt-4o",
  "messages": [
    { "role": "system", "content": "You are a helpful assistant." },
    { "role": "user", "content": "Hello!" }
  ],
  "stream": false,
  "temperature": 0.7,
  "max_tokens": 1024
}
```

Response
```json
{
  "id": "chatcmpl-abc123",
  "object": "chat.completion",
  "created": 1700000000,
  "model": "gpt-4o",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "Hello! How can I help you today?"
      },
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 20,
    "completion_tokens": 10,
    "total_tokens": 30
  }
}
```

Examples
```shell
curl https://your-api-host.com/v1/chat/completions \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-4o",
    "messages": [
      {"role": "user", "content": "Hello!"}
    ]
  }'
```

Set `stream: true` to receive Server-Sent Events. The streaming format is identical to the OpenAI API.
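When streaming, each event is a `chat.completion.chunk` whose `choices[0].delta.content` fragments concatenate into the full reply. A minimal sketch of assembling those deltas, using hard-coded placeholder chunks in place of a live connection:

```python
def assemble_stream(chunks):
    """Concatenate the content deltas of a streamed chat completion."""
    parts = []
    for chunk in chunks:
        delta = chunk["choices"][0]["delta"]
        content = delta.get("content")
        if content is not None:
            parts.append(content)
    return "".join(parts)

# Simulated chunks shaped like the OpenAI streaming format: the first
# carries only the role, the last only a finish_reason.
chunks = [
    {"choices": [{"delta": {"role": "assistant"}}]},
    {"choices": [{"delta": {"content": "Hello"}}]},
    {"choices": [{"delta": {"content": "!"}}]},
    {"choices": [{"delta": {}, "finish_reason": "stop"}]},
]
print(assemble_stream(chunks))  # prints "Hello!"
```

With the OpenAI SDK the same loop runs over `client.chat.completions.create(..., stream=True)`, reading `chunk.choices[0].delta.content` from each event.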
## /v1/embeddings

Create embeddings for the given input text. Returns vector representations that can be used for search, clustering, and similarity comparisons.
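For similarity comparisons, embedding vectors are commonly scored with cosine similarity. A self-contained sketch, using toy 3-dimensional vectors in place of real embedding output (actual embeddings have hundreds of dimensions):

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Toy vectors standing in for two embeddings of similar texts.
v1 = [0.0023, -0.0091, 0.0152]
v2 = [0.0021, -0.0089, 0.0149]
print(cosine_similarity(v1, v2))  # close to 1.0 for similar inputs
```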
Request body

```json
{
  "model": "text-embedding-3-small",
  "input": "The quick brown fox jumps over the lazy dog"
}
```

Response
```json
{
  "object": "list",
  "data": [
    {
      "object": "embedding",
      "index": 0,
      "embedding": [0.0023, -0.0091, 0.0152, ...]
    }
  ],
  "model": "text-embedding-3-small",
  "usage": {
    "prompt_tokens": 9,
    "total_tokens": 9
  }
}
```

Examples
```shell
curl https://your-api-host.com/v1/embeddings \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "text-embedding-3-small",
    "input": "Hello world"
  }'
```

## /v1/responses

Create a response using the Responses API. This is an alternative to chat completions that supports richer input types and tool use.
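Because the `output` array can hold several item types, clients usually walk it to collect the text parts. A sketch of that walk over a placeholder dict shaped like the documented response (no live API call):

```python
def collect_output_text(response):
    """Join all output_text parts from a Responses API-style payload."""
    texts = []
    for item in response.get("output", []):
        if item.get("type") != "message":
            continue  # skip non-message items such as tool calls
        for part in item.get("content", []):
            if part.get("type") == "output_text":
                texts.append(part["text"])
    return "".join(texts)

# Placeholder payload mirroring the documented response shape.
response = {
    "output": [
        {
            "type": "message",
            "role": "assistant",
            "content": [
                {"type": "output_text", "text": "Quantum computing uses..."}
            ],
        }
    ]
}
print(collect_output_text(response))  # prints "Quantum computing uses..."
```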
Request body

```json
{
  "model": "gpt-4o",
  "input": "Explain quantum computing in simple terms."
}
```

Response
```json
{
  "id": "resp-abc123",
  "object": "response",
  "created_at": 1700000000,
  "model": "gpt-4o",
  "output": [
    {
      "type": "message",
      "role": "assistant",
      "content": [
        {
          "type": "output_text",
          "text": "Quantum computing uses..."
        }
      ]
    }
  ]
}
```

Examples
```shell
curl https://your-api-host.com/v1/responses \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-4o",
    "input": "Explain quantum computing."
  }'
```

## /v1/realtime

Establish a WebSocket connection for realtime, bidirectional communication. Used for voice and low-latency interactive sessions.
Examples
```javascript
const ws = new WebSocket(
  "wss://your-api-host.com/v1/realtime?model=gpt-4o-realtime",
  ["realtime", "openai-insecure-api-key.YOUR_API_KEY"],
);

ws.onopen = () => {
  ws.send(JSON.stringify({
    type: "session.update",
    session: { modalities: ["text"] },
  }));
};

ws.onmessage = (event) => {
  const data = JSON.parse(event.data);
  // Handle the incoming event based on data.type
};
```

This is a WebSocket endpoint, so the URL uses the `wss://` scheme (the WebSocket constructor rejects `https://`). Use the `model` query parameter to select the realtime model. Authentication is passed via the WebSocket subprotocol header.
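Incoming realtime events are distinguished by their `type` field, so clients typically route them through a small dispatcher. A Python sketch with hypothetical handlers and placeholder events rather than a live socket (the event names here are illustrative assumptions):

```python
def dispatch(event, handlers):
    """Route a realtime-style event to a handler keyed on its type."""
    handler = handlers.get(event.get("type"))
    if handler is None:
        return None  # silently ignore event types we don't handle
    return handler(event)

# Hypothetical handlers for two event types.
handlers = {
    "session.created": lambda e: f"session {e['session']['id']} ready",
    "response.output_text.delta": lambda e: e["delta"],
}

events = [
    {"type": "session.created", "session": {"id": "sess_123"}},
    {"type": "response.output_text.delta", "delta": "Hello"},
    {"type": "rate_limits.updated"},  # no handler registered
]
results = [dispatch(e, handlers) for e in events]
print(results)  # prints ['session sess_123 ready', 'Hello', None]
```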
## Getting started

The fastest way to get started is to install the OpenAI SDK and point it at your platform base URL.
```shell
pip install openai
```

```python
from openai import OpenAI

client = OpenAI(
    base_url="https://your-api-host.com/v1",
    api_key="YOUR_API_KEY",  # from Settings > API Keys
)

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Hello!"}],
)
print(response.choices[0].message.content)
```