# AI Gateway API Reference
Complete API reference for the AICR AI Gateway.
## Base URL

Production: `https://api.aicoderally.com/v1`
Development: `http://localhost:3000/api/v1`

## Authentication
All requests require authentication via a Bearer token:

```bash
curl -H "Authorization: Bearer $AICR_API_KEY" ...
```

Optional tenant context:

```bash
curl -H "X-Tenant-ID: tenant-123" ...
```

## Endpoints
### Chat Completion

Create a chat completion.

`POST /chat`

#### Request Body
| Field | Type | Required | Description |
|---|---|---|---|
| model | string | Yes | Model identifier |
| messages | array | Yes | Conversation messages |
| temperature | number | No | Sampling temperature (0-2) |
| max_tokens | number | No | Maximum tokens to generate |
| stream | boolean | No | Enable streaming response |
| skip_cache | boolean | No | Bypass response cache |
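The request body can be assembled client-side from the fields above. A minimal sketch in Python (the helper name and validation logic are illustrative, not part of the gateway; the temperature bound follows the 0-2 range documented in the table):

```python
def build_chat_request(model, messages, temperature=None, max_tokens=None,
                       stream=None, skip_cache=None):
    """Assemble a /chat request body, including only the optional
    fields that were explicitly set."""
    if not model or not messages:
        raise ValueError("model and messages are required")
    payload = {"model": model, "messages": messages}
    if temperature is not None:
        if not 0 <= temperature <= 2:
            raise ValueError("temperature must be between 0 and 2")
        payload["temperature"] = temperature
    if max_tokens is not None:
        payload["max_tokens"] = max_tokens
    if stream is not None:
        payload["stream"] = stream
    if skip_cache is not None:
        payload["skip_cache"] = skip_cache
    return payload
```

Omitting unset optional fields keeps the payload minimal and lets the gateway apply its own defaults.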
#### Example Request

```bash
curl -X POST https://api.aicoderally.com/v1/chat \
  -H "Authorization: Bearer $AICR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-4o-mini",
    "messages": [
      {"role": "system", "content": "You are a helpful assistant."},
      {"role": "user", "content": "What is SPM?"}
    ],
    "temperature": 0.7,
    "max_tokens": 1024
  }'
```

#### Example Response
```json
{
  "id": "chat-abc123",
  "model": "gpt-4o-mini",
  "created": 1706380800,
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "SPM (Sales Performance Management) is..."
      },
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 25,
    "completion_tokens": 150,
    "total_tokens": 175
  }
}
```

### Embeddings

Generate embeddings for text.

`POST /embeddings`

#### Request Body
| Field | Type | Required | Description |
|---|---|---|---|
| model | string | Yes | Embedding model |
| input | string or array | Yes | Text to embed |
#### Example Request

```bash
curl -X POST https://api.aicoderally.com/v1/embeddings \
  -H "Authorization: Bearer $AICR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "nomic-embed-text",
    "input": "Sales performance management is critical for..."
  }'
```

#### Example Response
```json
{
  "model": "nomic-embed-text",
  "data": [
    {
      "index": 0,
      "embedding": [0.0023, -0.0142, ...],
      "dimensions": 768
    }
  ],
  "usage": {
    "prompt_tokens": 8,
    "total_tokens": 8
  }
}
```

### Models

List available models.

`GET /models`

#### Example Response
```json
{
  "models": [
    {
      "id": "gpt-4o",
      "provider": "openai",
      "type": "chat",
      "available": true
    },
    {
      "id": "gpt-4o-mini",
      "provider": "openai",
      "type": "chat",
      "available": true
    },
    {
      "id": "claude-3-sonnet",
      "provider": "anthropic",
      "type": "chat",
      "available": true
    },
    {
      "id": "llama3",
      "provider": "ollama",
      "type": "chat",
      "available": true
    },
    {
      "id": "nomic-embed-text",
      "provider": "ollama",
      "type": "embedding",
      "available": true,
      "dimensions": 768
    }
  ]
}
```

### Usage

Get usage statistics for the current billing period.

`GET /usage`

#### Query Parameters
| Parameter | Type | Description |
|---|---|---|
| start_date | string | Start date (ISO 8601) |
| end_date | string | End date (ISO 8601) |
| group_by | string | Group by: day, model, user |
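All three query parameters are optional, so the request URL can be built from whichever are set. A small sketch using the Python standard library (the helper name is illustrative):

```python
from urllib.parse import urlencode

def usage_url(base="https://api.aicoderally.com/v1", start_date=None,
              end_date=None, group_by=None):
    """Build a GET /usage URL, including only the query parameters
    that were provided. Values are percent-encoded by urlencode."""
    params = {k: v for k, v in
              [("start_date", start_date), ("end_date", end_date),
               ("group_by", group_by)] if v is not None}
    url = f"{base}/usage"
    return f"{url}?{urlencode(params)}" if params else url
```

Note that the colons in ISO 8601 timestamps are percent-encoded (`%3A`) in the resulting query string.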
#### Example Response

```json
{
  "period": {
    "start": "2026-01-01T00:00:00Z",
    "end": "2026-01-31T23:59:59Z"
  },
  "usage": {
    "total_requests": 1523,
    "total_tokens": 2450000,
    "total_cost_usd": 12.45,
    "by_model": {
      "gpt-4o-mini": {
        "requests": 1200,
        "tokens": 1800000,
        "cost_usd": 1.08
      },
      "gpt-4o": {
        "requests": 323,
        "tokens": 650000,
        "cost_usd": 11.37
      }
    }
  },
  "budget": {
    "limit_usd": 50.00,
    "used_usd": 12.45,
    "remaining_usd": 37.55
  }
}
```

### Health

Check gateway health status.

`GET /health`

#### Example Response
```json
{
  "status": "healthy",
  "providers": {
    "openai": { "status": "up", "latency_ms": 45 },
    "anthropic": { "status": "up", "latency_ms": 62 },
    "ollama": { "status": "up", "latency_ms": 12 }
  },
  "cache": { "status": "up", "hit_rate": 0.34 },
  "database": { "status": "up", "latency_ms": 8 }
}
```

## Error Responses

All errors return a consistent format:
```json
{
  "error": {
    "code": "RATE_LIMIT_EXCEEDED",
    "message": "Rate limit exceeded. Retry after 60 seconds.",
    "details": {
      "limit": 60,
      "window": "1m",
      "retry_after": 60
    }
  }
}
```

### Error Codes
| Code | HTTP Status | Description |
|---|---|---|
| UNAUTHORIZED | 401 | Invalid or missing API key |
| FORBIDDEN | 403 | Tenant not authorized |
| NOT_FOUND | 404 | Resource not found |
| RATE_LIMIT_EXCEEDED | 429 | Rate limit exceeded |
| BUDGET_EXCEEDED | 402 | Monthly budget exhausted |
| PROVIDER_ERROR | 502 | Upstream provider error |
| INTERNAL_ERROR | 500 | Internal server error |
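Clients can use the error envelope to decide whether to retry. A hedged sketch: which codes are safe to retry is an assumption here (transient limits and upstream failures), not something the gateway prescribes, and the helper name is illustrative:

```python
# Assumed-retryable codes: transient conditions only. Client errors
# (UNAUTHORIZED, FORBIDDEN, NOT_FOUND, BUDGET_EXCEEDED) should not be retried.
RETRYABLE = {"RATE_LIMIT_EXCEEDED", "PROVIDER_ERROR", "INTERNAL_ERROR"}

def retry_delay(error_body, default=1.0):
    """Return seconds to wait before retrying, or None if the error
    is not retryable. Honors details.retry_after when present."""
    err = error_body.get("error", {})
    if err.get("code") not in RETRYABLE:
        return None
    return float(err.get("details", {}).get("retry_after", default))
```

For `RATE_LIMIT_EXCEEDED`, the `retry_after` detail gives the exact wait; for other retryable codes a client would typically fall back to exponential backoff.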
## Rate Limits
| Tier | Requests/min | Tokens/min |
|---|---|---|
| Free | 10 | 10,000 |
| Starter | 60 | 100,000 |
| Pro | 300 | 500,000 |
| Enterprise | Custom | Custom |
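One way to stay under the per-minute request limits is to pace requests client-side. A minimal sketch using the tier values from the table above (the pacing approach is an illustrative client technique, not part of the gateway API):

```python
# Requests/min limits per tier, taken from the table above.
TIERS = {
    "free": {"requests_per_min": 10, "tokens_per_min": 10_000},
    "starter": {"requests_per_min": 60, "tokens_per_min": 100_000},
    "pro": {"requests_per_min": 300, "tokens_per_min": 500_000},
}

def min_interval_seconds(tier):
    """Smallest spacing between requests that stays under the tier's
    requests/min limit when sending at a steady rate."""
    return 60.0 / TIERS[tier]["requests_per_min"]
```

Steady pacing avoids bursts that would trip the limit even when the average rate is compliant; the token/min limit still needs separate accounting per request.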
Rate limit headers are included in responses:

```
X-RateLimit-Limit: 60
X-RateLimit-Remaining: 45
X-RateLimit-Reset: 1706380860
```

## SDKs
### TypeScript/JavaScript

```typescript
import { AIGateway } from '@aicr/ai-router';

const gateway = new AIGateway({
  apiKey: process.env.AICR_API_KEY,
});

const response = await gateway.chat({
  model: 'gpt-4o-mini',
  messages: [{ role: 'user', content: 'Hello' }],
});
```

### Python
```python
import os

from aicr import AIGateway

gateway = AIGateway(api_key=os.environ['AICR_API_KEY'])

response = gateway.chat(
    model='gpt-4o-mini',
    messages=[{'role': 'user', 'content': 'Hello'}]
)
```

## Webhooks

Configure webhooks for async events:

`POST /webhooks`

### Event Types
| Event | Description |
|---|---|
| chat.complete | Chat completion finished |
| budget.warning | Budget at 80% |
| budget.exceeded | Budget exhausted |
| error.provider | Provider error occurred |
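A webhook receiver typically routes each delivery by event type. A hedged sketch: the payload shape assumed here (a `type` field carrying the event name) is an assumption, since the delivery format is not specified above:

```python
def dispatch_event(event, handlers):
    """Route a webhook delivery to the handler registered for its
    event type. Returns False for unknown types so the receiver can
    acknowledge and ignore them rather than fail the delivery."""
    handler = handlers.get(event.get("type"))
    if handler is None:
        return False
    handler(event)
    return True
```

Acknowledging unknown event types (rather than returning an error) keeps the receiver forward-compatible if new event types are added later.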