AI Gateway API Reference

Complete API reference for the AICR AI Gateway.

Base URL

Production: https://api.aicoderally.com/v1
Development: http://localhost:3000/api/v1

Authentication

All requests require authentication via a Bearer token:

curl -H "Authorization: Bearer $AICR_API_KEY" ...

Optional tenant context:

curl -H "X-Tenant-ID: tenant-123" ...

Endpoints

Chat Completion

Create a chat completion.

POST /chat

Request Body

| Field | Type | Required | Description |
|---|---|---|---|
| model | string | Yes | Model identifier |
| messages | array | Yes | Conversation messages |
| temperature | number | No | Sampling temperature (0-2) |
| max_tokens | number | No | Maximum tokens to generate |
| stream | boolean | No | Enable streaming response |
| skip_cache | boolean | No | Bypass response cache |

Example Request

curl -X POST https://api.aicoderally.com/v1/chat \
  -H "Authorization: Bearer $AICR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-4o-mini",
    "messages": [
      {"role": "system", "content": "You are a helpful assistant."},
      {"role": "user", "content": "What is SPM?"}
    ],
    "temperature": 0.7,
    "max_tokens": 1024
  }'

Example Response

{
  "id": "chat-abc123",
  "model": "gpt-4o-mini",
  "created": 1706380800,
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "SPM (Sales Performance Management) is..."
      },
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 25,
    "completion_tokens": 150,
    "total_tokens": 175
  }
}
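The request and response shapes above can be handled with a small client-side sketch. The helper names here are illustrative, not part of any official SDK; only the field names (`model`, `messages`, `choices`, `usage`) come from the examples above:

```python
import json

def build_chat_request(model, messages, temperature=None, max_tokens=None):
    """Assemble a /chat request body; optional fields are omitted when unset."""
    body = {"model": model, "messages": messages}
    if temperature is not None:
        body["temperature"] = temperature
    if max_tokens is not None:
        body["max_tokens"] = max_tokens
    return body

def parse_chat_response(raw):
    """Pull the assistant text and total token count out of a /chat response."""
    data = json.loads(raw) if isinstance(raw, str) else raw
    choice = data["choices"][0]
    return choice["message"]["content"], data["usage"]["total_tokens"]
```

Omitting unset optional fields keeps the request minimal and lets the gateway apply its own defaults.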

Embeddings

Generate embeddings for text.

POST /embeddings

Request Body

| Field | Type | Required | Description |
|---|---|---|---|
| model | string | Yes | Embedding model |
| input | string or array | Yes | Text to embed |

Example Request

curl -X POST https://api.aicoderally.com/v1/embeddings \
  -H "Authorization: Bearer $AICR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "nomic-embed-text",
    "input": "Sales performance management is critical for..."
  }'

Example Response

{
  "model": "nomic-embed-text",
  "data": [
    {
      "index": 0,
      "embedding": [0.0023, -0.0142, ...],
      "dimensions": 768
    }
  ],
  "usage": {
    "prompt_tokens": 8,
    "total_tokens": 8
  }
}
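A typical use of the returned vectors is similarity search. A minimal sketch of comparing two embedding vectors with cosine similarity (standard library only; nothing here is gateway-specific):

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two embedding vectors, e.g. from /embeddings."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm
```

Values close to 1.0 indicate semantically similar texts; orthogonal vectors score 0.0.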

Models

List available models.

GET /models

Example Response

{
  "models": [
    {
      "id": "gpt-4o",
      "provider": "openai",
      "type": "chat",
      "available": true
    },
    {
      "id": "gpt-4o-mini",
      "provider": "openai",
      "type": "chat",
      "available": true
    },
    {
      "id": "claude-3-sonnet",
      "provider": "anthropic",
      "type": "chat",
      "available": true
    },
    {
      "id": "llama3",
      "provider": "ollama",
      "type": "chat",
      "available": true
    },
    {
      "id": "nomic-embed-text",
      "provider": "ollama",
      "type": "embedding",
      "available": true,
      "dimensions": 768
    }
  ]
}
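The `/models` payload can drive simple client-side model selection. A hedged sketch (the helper name is illustrative) that picks the first available model matching a type and, optionally, a provider:

```python
def first_available(models, type_="chat", provider=None):
    """Return the id of the first available model matching the filters, else None."""
    for m in models:
        if not m["available"] or m["type"] != type_:
            continue
        if provider is not None and m["provider"] != provider:
            continue
        return m["id"]
    return None
```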

Usage

Get usage statistics for the current billing period.

GET /usage

Query Parameters

| Parameter | Type | Description |
|---|---|---|
| start_date | string | Start date (ISO 8601) |
| end_date | string | End date (ISO 8601) |
| group_by | string | Group by: day, model, user |

Example Response

{
  "period": {
    "start": "2026-01-01T00:00:00Z",
    "end": "2026-01-31T23:59:59Z"
  },
  "usage": {
    "total_requests": 1523,
    "total_tokens": 2450000,
    "total_cost_usd": 12.45,
    "by_model": {
      "gpt-4o-mini": {
        "requests": 1200,
        "tokens": 1800000,
        "cost_usd": 1.08
      },
      "gpt-4o": {
        "requests": 323,
        "tokens": 650000,
        "cost_usd": 11.37
      }
    }
  },
  "budget": {
    "limit_usd": 50.00,
    "used_usd": 12.45,
    "remaining_usd": 37.55
  }
}
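The `budget` object pairs naturally with the budget.warning (80%) and budget.exceeded webhook events described later. A sketch of the same check done client-side (the function name and 0.8 default mirror those events; they are not a documented API):

```python
def budget_status(budget, warn_at=0.8):
    """Classify a /usage budget object as 'ok', 'warning', or 'exceeded'."""
    used_fraction = budget["used_usd"] / budget["limit_usd"]
    if used_fraction >= 1.0:
        return "exceeded"
    if used_fraction >= warn_at:
        return "warning"
    return "ok"
```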

Health

Check gateway health status.

GET /health

Example Response

{
  "status": "healthy",
  "providers": {
    "openai": { "status": "up", "latency_ms": 45 },
    "anthropic": { "status": "up", "latency_ms": 62 },
    "ollama": { "status": "up", "latency_ms": 12 }
  },
  "cache": { "status": "up", "hit_rate": 0.34 },
  "database": { "status": "up", "latency_ms": 8 }
}
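For monitoring, the per-component statuses can be rolled up into a single verdict. A minimal sketch, assuming the `/health` shape shown above (the rollup logic itself is illustrative, not gateway behavior):

```python
def overall_status(health):
    """'healthy' only if every provider, the cache, and the database are up."""
    components = list(health["providers"].values())
    components.append(health["cache"])
    components.append(health["database"])
    return "healthy" if all(c["status"] == "up" for c in components) else "degraded"
```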

Error Responses

All errors return a consistent format:

{
  "error": {
    "code": "RATE_LIMIT_EXCEEDED",
    "message": "Rate limit exceeded. Retry after 60 seconds.",
    "details": {
      "limit": 60,
      "window": "1m",
      "retry_after": 60
    }
  }
}
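Because the envelope is consistent, clients can branch on `error.code` to decide whether a retry makes sense. A hedged sketch, where the set of retryable codes is a reasonable default rather than official guidance:

```python
# Codes that usually indicate a transient condition worth retrying.
RETRYABLE = {"RATE_LIMIT_EXCEEDED", "PROVIDER_ERROR", "INTERNAL_ERROR"}

def should_retry(payload):
    """Return (retry?, seconds_to_wait) from an error envelope."""
    err = payload["error"]
    if err["code"] not in RETRYABLE:
        return False, 0
    wait = err.get("details", {}).get("retry_after", 1)
    return True, wait
```

Non-retryable codes such as UNAUTHORIZED or BUDGET_EXCEEDED require operator action, not retries.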

Error Codes

| Code | HTTP Status | Description |
|---|---|---|
| UNAUTHORIZED | 401 | Invalid or missing API key |
| FORBIDDEN | 403 | Tenant not authorized |
| NOT_FOUND | 404 | Resource not found |
| RATE_LIMIT_EXCEEDED | 429 | Rate limit exceeded |
| BUDGET_EXCEEDED | 402 | Monthly budget exhausted |
| PROVIDER_ERROR | 502 | Upstream provider error |
| INTERNAL_ERROR | 500 | Internal server error |

Rate Limits

| Tier | Requests/min | Tokens/min |
|---|---|---|
| Free | 10 | 10,000 |
| Starter | 60 | 100,000 |
| Pro | 300 | 500,000 |
| Enterprise | Custom | Custom |

Rate limit headers are included in responses:

X-RateLimit-Limit: 60
X-RateLimit-Remaining: 45
X-RateLimit-Reset: 1706380860
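`X-RateLimit-Reset` is a Unix timestamp, so a client can compute how long to pause before the window resets. A minimal sketch using only the headers above (the function name is illustrative):

```python
def seconds_until_reset(headers, now):
    """Seconds to wait before sending more requests, given rate-limit headers
    and the current Unix timestamp. Returns 0 when budget remains."""
    if int(headers.get("X-RateLimit-Remaining", "1")) > 0:
        return 0
    return max(0, int(headers["X-RateLimit-Reset"]) - now)
```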

SDKs

TypeScript/JavaScript

import { AIGateway } from '@aicr/ai-router';
 
const gateway = new AIGateway({
  apiKey: process.env.AICR_API_KEY,
});
 
const response = await gateway.chat({
  model: 'gpt-4o-mini',
  messages: [{ role: 'user', content: 'Hello' }],
});

Python

from aicr import AIGateway
 
gateway = AIGateway(api_key=os.environ['AICR_API_KEY'])
 
response = gateway.chat(
    model='gpt-4o-mini',
    messages=[{'role': 'user', 'content': 'Hello'}]
)

Webhooks

Configure webhooks for async events:

POST /webhooks

Event Types

| Event | Description |
|---|---|
| chat.complete | Chat completion finished |
| budget.warning | Budget at 80% |
| budget.exceeded | Budget exhausted |
| error.provider | Provider error occurred |
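A webhook receiver typically routes each delivery by its event type. A hedged sketch assuming the payload carries a `type` field matching the event names above (the exact delivery schema is not specified here):

```python
def dispatch(event, handlers):
    """Route a webhook event to the handler registered for its type.

    `handlers` maps event-type strings (e.g. 'budget.warning') to callables.
    Returns the handler's result, or None for unregistered types.
    """
    handler = handlers.get(event.get("type"))
    if handler is None:
        return None
    return handler(event)
```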