AI Gateway API Reference

Complete API reference for the AICR AI Gateway.

Base URL

Production: https://api.aicoderally.com/v1
Development: http://localhost:3000/api/v1

Authentication

All requests require authentication via a Bearer token:

curl -H "Authorization: Bearer $AICR_API_KEY" ...

Optional tenant context:

curl -H "X-Tenant-ID: tenant-123" ...

Endpoints

Chat Completion

Create a chat completion.

POST /chat

Request Body

| Field | Type | Required | Description |
|---|---|---|---|
| model | string | Yes | Model identifier |
| messages | array | Yes | Conversation messages |
| temperature | number | No | Sampling temperature (0-2) |
| max_tokens | number | No | Maximum tokens to generate |
| stream | boolean | No | Enable streaming response |
| skip_cache | boolean | No | Bypass response cache |

Example Request

curl -X POST https://api.aicoderally.com/v1/chat \
  -H "Authorization: Bearer $AICR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-4o-mini",
    "messages": [
      {"role": "system", "content": "You are a helpful assistant."},
      {"role": "user", "content": "What is SPM?"}
    ],
    "temperature": 0.7,
    "max_tokens": 1024
  }'

Example Response

{
  "id": "chat-abc123",
  "model": "gpt-4o-mini",
  "created": 1706380800,
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "SPM (Sales Performance Management) is..."
      },
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 25,
    "completion_tokens": 150,
    "total_tokens": 175
  }
}
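The request and response shapes above can be handled with a small client-side sketch. The helper names here are illustrative, not part of any official SDK; only the field names (`model`, `messages`, `choices`, `usage`) come from the examples above:

```python
import json

def build_chat_request(model, messages, temperature=None, max_tokens=None):
    """Assemble a /chat request body; optional fields are omitted when unset."""
    body = {"model": model, "messages": messages}
    if temperature is not None:
        body["temperature"] = temperature
    if max_tokens is not None:
        body["max_tokens"] = max_tokens
    return body

def parse_chat_response(raw):
    """Pull the assistant text and total token count out of a /chat response."""
    data = json.loads(raw) if isinstance(raw, str) else raw
    choice = data["choices"][0]
    return choice["message"]["content"], data["usage"]["total_tokens"]
```

Omitting unset optional fields keeps the request minimal and lets the gateway apply its own defaults.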

Embeddings

Generate embeddings for text.

POST /embeddings

Request Body

| Field | Type | Required | Description |
|---|---|---|---|
| model | string | Yes | Embedding model |
| input | string or array | Yes | Text to embed |

Example Request

curl -X POST https://api.aicoderally.com/v1/embeddings \
  -H "Authorization: Bearer $AICR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "nomic-embed-text",
    "input": "Sales performance management is critical for..."
  }'

Example Response

{
  "model": "nomic-embed-text",
  "data": [
    {
      "index": 0,
      "embedding": [0.0023, -0.0142, ...],
      "dimensions": 768
    }
  ],
  "usage": {
    "prompt_tokens": 8,
    "total_tokens": 8
  }
}
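A typical use of the returned vectors is similarity search. A minimal sketch of comparing two embedding vectors with cosine similarity (standard library only; nothing here is gateway-specific):

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two embedding vectors, e.g. from /embeddings."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm
```

Values close to 1.0 indicate semantically similar texts; orthogonal vectors score 0.0.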

Models

List available models.

GET /models

Example Response

{
  "models": [
    {
      "id": "gpt-4o",
      "provider": "openai",
      "type": "chat",
      "available": true
    },
    {
      "id": "gpt-4o-mini",
      "provider": "openai",
      "type": "chat",
      "available": true
    },
    {
      "id": "claude-3-sonnet",
      "provider": "anthropic",
      "type": "chat",
      "available": true
    },
    {
      "id": "llama3",
      "provider": "ollama",
      "type": "chat",
      "available": true
    },
    {
      "id": "nomic-embed-text",
      "provider": "ollama",
      "type": "embedding",
      "available": true,
      "dimensions": 768
    }
  ]
}
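The `/models` payload can drive simple client-side model selection. A hedged sketch (the helper name is illustrative) that picks the first available model matching a type and, optionally, a provider:

```python
def first_available(models, type_="chat", provider=None):
    """Return the id of the first available model matching the filters, else None."""
    for m in models:
        if not m["available"] or m["type"] != type_:
            continue
        if provider is not None and m["provider"] != provider:
            continue
        return m["id"]
    return None
```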

Usage

Get usage statistics for the current billing period.

GET /usage

Query Parameters

| Parameter | Type | Description |
|---|---|---|
| start_date | string | Start date (ISO 8601) |
| end_date | string | End date (ISO 8601) |
| group_by | string | Group by: day, model, user |

Example Response

{
  "period": {
    "start": "2026-01-01T00:00:00Z",
    "end": "2026-01-31T23:59:59Z"
  },
  "usage": {
    "total_requests": 1523,
    "total_tokens": 2450000,
    "total_cost_usd": 12.45,
    "by_model": {
      "gpt-4o-mini": {
        "requests": 1200,
        "tokens": 1800000,
        "cost_usd": 1.08
      },
      "gpt-4o": {
        "requests": 323,
        "tokens": 650000,
        "cost_usd": 11.37
      }
    }
  },
  "budget": {
    "limit_usd": 50.00,
    "used_usd": 12.45,
    "remaining_usd": 37.55
  }
}
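The `budget` object pairs naturally with the budget.warning (80%) and budget.exceeded webhook events described later. A sketch of the same check done client-side (the function name and 0.8 default mirror those events; they are not a documented API):

```python
def budget_status(budget, warn_at=0.8):
    """Classify a /usage budget object as 'ok', 'warning', or 'exceeded'."""
    used_fraction = budget["used_usd"] / budget["limit_usd"]
    if used_fraction >= 1.0:
        return "exceeded"
    if used_fraction >= warn_at:
        return "warning"
    return "ok"
```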

Health

Check gateway health status.

GET /health

Example Response

{
  "status": "healthy",
  "providers": {
    "openai": { "status": "up", "latency_ms": 45 },
    "anthropic": { "status": "up", "latency_ms": 62 },
    "ollama": { "status": "up", "latency_ms": 12 }
  },
  "cache": { "status": "up", "hit_rate": 0.34 },
  "database": { "status": "up", "latency_ms": 8 }
}
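For monitoring, the per-component statuses can be rolled up into a single verdict. A minimal sketch, assuming the `/health` shape shown above (the rollup logic itself is illustrative, not gateway behavior):

```python
def overall_status(health):
    """'healthy' only if every provider, the cache, and the database are up."""
    components = list(health["providers"].values())
    components.append(health["cache"])
    components.append(health["database"])
    return "healthy" if all(c["status"] == "up" for c in components) else "degraded"
```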

Error Responses

All errors return a consistent format:

{
  "error": {
    "code": "RATE_LIMIT_EXCEEDED",
    "message": "Rate limit exceeded. Retry after 60 seconds.",
    "details": {
      "limit": 60,
      "window": "1m",
      "retry_after": 60
    }
  }
}
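Because the envelope is consistent, clients can branch on `error.code` to decide whether a retry makes sense. A hedged sketch, where the set of retryable codes is a reasonable default rather than official guidance:

```python
# Codes that usually indicate a transient condition worth retrying.
RETRYABLE = {"RATE_LIMIT_EXCEEDED", "PROVIDER_ERROR", "INTERNAL_ERROR"}

def should_retry(payload):
    """Return (retry?, seconds_to_wait) from an error envelope."""
    err = payload["error"]
    if err["code"] not in RETRYABLE:
        return False, 0
    wait = err.get("details", {}).get("retry_after", 1)
    return True, wait
```

Non-retryable codes such as UNAUTHORIZED or BUDGET_EXCEEDED require operator action, not retries.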

Error Codes

| Code | HTTP Status | Description |
|---|---|---|
| UNAUTHORIZED | 401 | Invalid or missing API key |
| FORBIDDEN | 403 | Tenant not authorized |
| NOT_FOUND | 404 | Resource not found |
| RATE_LIMIT_EXCEEDED | 429 | Rate limit exceeded |
| BUDGET_EXCEEDED | 402 | Monthly budget exhausted |
| PROVIDER_ERROR | 502 | Upstream provider error |
| INTERNAL_ERROR | 500 | Internal server error |

Rate Limits

| Tier | Requests/min | Tokens/min |
|---|---|---|
| Free | 10 | 10,000 |
| Starter | 60 | 100,000 |
| Pro | 300 | 500,000 |
| Enterprise | Custom | Custom |

Rate limit headers are included in responses:

X-RateLimit-Limit: 60
X-RateLimit-Remaining: 45
X-RateLimit-Reset: 1706380860
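`X-RateLimit-Reset` is a Unix timestamp, so a client can compute how long to pause before the window resets. A minimal sketch using only the headers above (the function name is illustrative):

```python
def seconds_until_reset(headers, now):
    """Seconds to wait before sending more requests, given rate-limit headers
    and the current Unix timestamp. Returns 0 when budget remains."""
    if int(headers.get("X-RateLimit-Remaining", "1")) > 0:
        return 0
    return max(0, int(headers["X-RateLimit-Reset"]) - now)
```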

SDKs

TypeScript/JavaScript

import { AIGateway } from '@aicr/ai-router';
 
const gateway = new AIGateway({
  apiKey: process.env.AICR_API_KEY,
});
 
const response = await gateway.chat({
  model: 'gpt-4o-mini',
  messages: [{ role: 'user', content: 'Hello' }],
});

Python

from aicr import AIGateway
 
gateway = AIGateway(api_key=os.environ['AICR_API_KEY'])
 
response = gateway.chat(
    model='gpt-4o-mini',
    messages=[{'role': 'user', 'content': 'Hello'}]
)

Webhooks

Configure webhooks for async events:

POST /webhooks

Event Types

| Event | Description |
|---|---|
| chat.complete | Chat completion finished |
| budget.warning | Budget at 80% |
| budget.exceeded | Budget exhausted |
| error.provider | Provider error occurred |
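A webhook receiver typically routes each delivery by its event type. A hedged sketch assuming the payload carries a `type` field matching the event names above (the exact delivery schema is not specified here):

```python
def dispatch(event, handlers):
    """Route a webhook event to the handler registered for its type.

    `handlers` maps event-type strings (e.g. 'budget.warning') to callables.
    Returns the handler's result, or None for unregistered types.
    """
    handler = handlers.get(event.get("type"))
    if handler is None:
        return None
    return handler(event)
```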