Chat Completions

POST /v1/chat/completions
| Field | Type | Default | Description |
| --- | --- | --- | --- |
| `model` | string | `"auto"` | Model identifier. Use `"auto"` for health-aware round-robin across all providers, or specify a model from `GET /v1/models`. |
| `messages` | array | required | Conversation history. Each item has `role`, `content`, and an optional `name`. |
| `stream` | boolean | `false` | When `true`, responses are streamed as server-sent events. |
| `temperature` | number | (none) | Sampling temperature (0–2). Higher values produce more varied output. |
| `max_tokens` | number | (none) | Maximum number of tokens to generate. |
| `reasoning_effort` | string | (none) | Reasoning budget for supported models. One of `"auto"`, `"low"`, `"medium"`, `"high"`. |
| `project_id` | string | (none) | Optional tag for grouping requests in analytics. |
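The documented defaults can be applied client-side when assembling a request body. A minimal Python sketch (the `build_chat_request` helper is hypothetical, not part of the gateway; it only validates against the fields in the table above):

```python
from typing import Any

def build_chat_request(
    messages: list[dict[str, str]],
    model: str = "auto",   # documented default: health-aware round-robin
    stream: bool = False,  # documented default
    **optional: Any,       # temperature, max_tokens, reasoning_effort, project_id
) -> dict[str, Any]:
    """Assemble a /v1/chat/completions request body using the documented fields."""
    if not messages:
        raise ValueError("messages is required and must be non-empty")
    body: dict[str, Any] = {"model": model, "messages": messages, "stream": stream}
    # Only include optional fields the caller actually set.
    allowed = {"temperature", "max_tokens", "reasoning_effort", "project_id"}
    for key, value in optional.items():
        if key not in allowed:
            raise ValueError(f"unknown field: {key}")
        body[key] = value
    return body

body = build_chat_request(
    [{"role": "user", "content": "What is the capital of France?"}],
    temperature=0.7,
    max_tokens=256,
)
```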
Each message object has the form:

```json
{
  "role": "user",
  "content": "What is the capital of France?",
  "name": "alice"
}
```

| Field | Type | Description |
| --- | --- | --- |
| `role` | string | One of `"system"`, `"user"`, `"assistant"`. |
| `content` | string | Message text. |
| `name` | string | Optional display name for the message author. |

A standard OpenAI-compatible response object is returned. The gateway adds extra diagnostic headers:

| Header | Description |
| --- | --- |
| `x-gateway-provider` | The backend provider that served the request (e.g. `groq`, `gemini`). |
| `x-gateway-model` | The exact model used by the provider. |
| `x-gateway-attempts` | Number of provider attempts before a successful response. |
| `x-gateway-request-id` | Unique request identifier for support and log correlation. |
| `x-gateway-reasoning-effort` | The reasoning effort level applied to the request. |
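These headers are useful for logging and support tickets. A minimal Python sketch of collecting them (the `gateway_diagnostics` helper is illustrative; the plain dict stands in for whatever headers object your HTTP client returns):

```python
def gateway_diagnostics(headers: dict[str, str]) -> dict[str, str]:
    """Collect the gateway's x-gateway-* diagnostic headers, case-insensitively."""
    wanted = {
        "x-gateway-provider",
        "x-gateway-model",
        "x-gateway-attempts",
        "x-gateway-request-id",
        "x-gateway-reasoning-effort",
    }
    return {k.lower(): v for k, v in headers.items() if k.lower() in wanted}

info = gateway_diagnostics({
    "Content-Type": "application/json",
    "X-Gateway-Provider": "groq",
    "X-Gateway-Attempts": "2",
    "X-Gateway-Request-Id": "req_abc123",
})
# e.g. log info["x-gateway-request-id"] for support and log correlation
```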
```json
{
  "id": "chatcmpl-abc123",
  "object": "chat.completion",
  "created": 1714000000,
  "model": "llama-3.3-70b-versatile",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "The capital of France is Paris."
      },
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 15,
    "completion_tokens": 9,
    "total_tokens": 24
  }
}
```
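Extracting the generated text is a matter of indexing into `choices`. A short Python sketch using the example response above (the `first_choice` helper is illustrative, not part of any SDK):

```python
# The documented example response, as a Python dict.
response = {
    "id": "chatcmpl-abc123",
    "object": "chat.completion",
    "created": 1714000000,
    "model": "llama-3.3-70b-versatile",
    "choices": [
        {
            "index": 0,
            "message": {"role": "assistant", "content": "The capital of France is Paris."},
            "finish_reason": "stop",
        }
    ],
    "usage": {"prompt_tokens": 15, "completion_tokens": 9, "total_tokens": 24},
}

def first_choice(resp: dict) -> tuple[str, str]:
    """Return (content, finish_reason) of the first choice."""
    choice = resp["choices"][0]
    return choice["message"]["content"], choice["finish_reason"]

content, reason = first_choice(response)
print(content)  # -> The capital of France is Paris.
```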

When `stream: true` is set, the response is a stream of `data:` lines in server-sent events (SSE) format, each containing a JSON delta object. The stream is terminated by `data: [DONE]`.

```text
data: {"id":"chatcmpl-abc123","object":"chat.completion.chunk","created":1714000000,"model":"llama-3.3-70b-versatile","choices":[{"index":0,"delta":{"role":"assistant","content":"The"},"finish_reason":null}]}

data: {"id":"chatcmpl-abc123","object":"chat.completion.chunk","created":1714000000,"model":"llama-3.3-70b-versatile","choices":[{"index":0,"delta":{"content":" capital"},"finish_reason":null}]}

data: [DONE]
```
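A client consumes the stream by parsing each `data:` payload and concatenating the `delta.content` fragments until `[DONE]`. A minimal Python sketch (the `collect_sse_content` helper is illustrative and assumes the stream has already been split into lines):

```python
import json

def collect_sse_content(lines) -> str:
    """Accumulate delta content from chat.completion.chunk SSE lines until [DONE]."""
    parts = []
    for line in lines:
        line = line.strip()
        if not line.startswith("data:"):
            continue  # skip blank separators and SSE comments
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            break  # documented stream terminator
        chunk = json.loads(payload)
        delta = chunk["choices"][0]["delta"]
        if "content" in delta:
            parts.append(delta["content"])
    return "".join(parts)

stream = [
    'data: {"choices":[{"index":0,"delta":{"role":"assistant","content":"The"},"finish_reason":null}]}',
    'data: {"choices":[{"index":0,"delta":{"content":" capital"},"finish_reason":null}]}',
    "data: [DONE]",
]
print(collect_sse_content(stream))  # -> The capital
```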
Basic request:

```sh
curl https://your-gateway.workers.dev/v1/chat/completions \
  -H "Authorization: Bearer <GATEWAY_API_KEY>" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "auto",
    "messages": [
      { "role": "system", "content": "You are a helpful assistant." },
      { "role": "user", "content": "What is the capital of France?" }
    ],
    "temperature": 0.7,
    "max_tokens": 256
  }'
```
Streaming request:

```sh
curl https://your-gateway.workers.dev/v1/chat/completions \
  -H "Authorization: Bearer <GATEWAY_API_KEY>" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "auto",
    "messages": [{ "role": "user", "content": "Tell me a short story." }],
    "stream": true
  }'
```
Request with a high reasoning budget:

```sh
curl https://your-gateway.workers.dev/v1/chat/completions \
  -H "Authorization: Bearer <GATEWAY_API_KEY>" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "auto",
    "messages": [{ "role": "user", "content": "Solve: x^2 - 5x + 6 = 0" }],
    "reasoning_effort": "high"
  }'
```