
Responses API

POST /v1/responses

The Responses API is an alternative to Chat Completions that uses the OpenAI Responses format. It accepts a flexible input field instead of a structured messages array, making it convenient for single-turn prompts and tool-augmented workflows.

| Field | Type | Default | Description |
| --- | --- | --- | --- |
| `model` | string | `"auto"` | Model identifier. Use `"auto"` for health-aware round-robin, or specify a model from `GET /v1/models`. |
| `input` | string, array, or object | required | The prompt input. Can be a plain string, an array of message objects, or a structured input object. |
| `stream` | boolean | `false` | When `true`, responses are streamed as server-sent events. |
| `temperature` | number | (none) | Sampling temperature (0–2). |
| `max_output_tokens` | number | (none) | Maximum number of tokens to generate. |
| `reasoning_effort` | string | (none) | Top-level reasoning budget shorthand. One of `"auto"`, `"low"`, `"medium"`, `"high"`. |
| `reasoning` | object | (none) | Structured reasoning config. Use `reasoning.effort` to set the budget (same values as `reasoning_effort`). |
| `project_id` | string | (none) | Optional tag for grouping requests in analytics. |
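As an illustration of how these fields combine into a request body, here is a small client-side helper. The function name and its defaults are hypothetical (not part of the gateway); only fields you set explicitly are sent:

```python
# Sketch of a request-body builder for POST /v1/responses.
# build_responses_payload is an illustrative helper, not gateway code.

def build_responses_payload(input, model="auto", stream=False, **options):
    """Assemble a /v1/responses request body. `input` may be a plain
    string, a list of message objects, or a structured input object."""
    payload = {"model": model, "input": input}
    if stream:
        payload["stream"] = True
    # Optional fields are included only when explicitly provided.
    for key in ("temperature", "max_output_tokens",
                "reasoning_effort", "reasoning", "project_id"):
        if options.get(key) is not None:
            payload[key] = options[key]
    return payload

payload = build_responses_payload("What is the speed of light?", temperature=0.5)
```

Because unset optional fields are omitted, the gateway's own defaults (e.g. `stream: false`) apply server-side.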

String — simplest form, treated as a user message:

```json
{
  "model": "auto",
  "input": "What is the speed of light?"
}
```

Array — list of message objects with role and content:

```json
{
  "model": "auto",
  "input": [
    { "role": "system", "content": "You are a physics tutor." },
    { "role": "user", "content": "What is the speed of light?" }
  ]
}
```

Object with reasoning — use the reasoning field for fine-grained control:

```json
{
  "model": "auto",
  "input": "Explain quantum entanglement simply.",
  "reasoning": { "effort": "medium" }
}
```
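Conceptually, the string form is shorthand for a one-element message array with role `user`. A tiny normalizer (illustrative only, not gateway code) makes the equivalence explicit:

```python
def normalize_input(input):
    """Treat a plain string as a single user message; pass message
    arrays (or structured objects) through unchanged. Illustrative only."""
    if isinstance(input, str):
        return [{"role": "user", "content": input}]
    return input

normalize_input("What is the speed of light?")
# → [{"role": "user", "content": "What is the speed of light?"}]
```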

The response follows the OpenAI Responses format:

```json
{
  "id": "resp_abc123",
  "object": "response",
  "created_at": 1714000000,
  "model": "llama-3.3-70b-versatile",
  "output": [
    {
      "type": "message",
      "role": "assistant",
      "content": [
        {
          "type": "output_text",
          "text": "The speed of light in a vacuum is approximately 299,792,458 metres per second."
        }
      ]
    }
  ],
  "usage": {
    "input_tokens": 12,
    "output_tokens": 18,
    "total_tokens": 30
  }
}
```
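To pull the generated text out of a response, walk the `output` array and collect the `output_text` parts. A minimal extractor, assuming the shape shown above:

```python
def extract_text(response):
    """Concatenate all output_text parts from a /v1/responses body."""
    parts = []
    for item in response.get("output", []):
        if item.get("type") == "message":
            for part in item.get("content", []):
                if part.get("type") == "output_text":
                    parts.append(part["text"])
    return "".join(parts)

# The response body from the example above, abridged to the fields used here.
response = {
    "output": [
        {
            "type": "message",
            "role": "assistant",
            "content": [
                {
                    "type": "output_text",
                    "text": "The speed of light in a vacuum is approximately 299,792,458 metres per second.",
                }
            ],
        }
    ],
    "usage": {"input_tokens": 12, "output_tokens": 18, "total_tokens": 30},
}
text = extract_text(response)
```

Iterating rather than indexing `output[0]` keeps the extractor robust if the gateway ever returns multiple output items.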

The same `x-gateway-*` headers present on Chat Completions responses are also returned here (see Chat Completions).

Basic request:

```sh
curl https://your-gateway.workers.dev/v1/responses \
  -H "Authorization: Bearer <GATEWAY_API_KEY>" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "auto",
    "input": "Summarise the French Revolution in three sentences.",
    "temperature": 0.5,
    "max_output_tokens": 300
  }'
```
Message array:

```sh
curl https://your-gateway.workers.dev/v1/responses \
  -H "Authorization: Bearer <GATEWAY_API_KEY>" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "auto",
    "input": [
      { "role": "system", "content": "You are a concise assistant." },
      { "role": "user", "content": "What is 17 * 43?" }
    ]
  }'
```
Streaming:

```sh
curl https://your-gateway.workers.dev/v1/responses \
  -H "Authorization: Bearer <GATEWAY_API_KEY>" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "auto",
    "input": "Write a haiku about the ocean.",
    "stream": true
  }'
```
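With `"stream": true`, the body arrives as server-sent events. The exact event payloads depend on the gateway; the sketch below only shows the SSE framing (`data:` lines, with the conventional `[DONE]` sentinel), and the delta event shape in the example is made up for illustration:

```python
import json

def parse_sse_lines(lines):
    """Yield parsed JSON payloads from `data:` lines of an SSE stream,
    stopping at the conventional [DONE] sentinel if present."""
    for line in lines:
        line = line.strip()
        if not line.startswith("data:"):
            continue  # skip comments, event-name lines, and blank keep-alives
        data = line[len("data:"):].strip()
        if data == "[DONE]":
            break
        yield json.loads(data)

# Hypothetical delta events; real event types depend on the gateway.
events = list(parse_sse_lines([
    'data: {"type": "response.output_text.delta", "delta": "Wav"}',
    'data: {"type": "response.output_text.delta", "delta": "es"}',
    "data: [DONE]",
]))
```

In practice you would feed this the decoded lines of the HTTP response body as they arrive.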
Reasoning effort:

```sh
curl https://your-gateway.workers.dev/v1/responses \
  -H "Authorization: Bearer <GATEWAY_API_KEY>" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "auto",
    "input": "Prove that the square root of 2 is irrational.",
    "reasoning": { "effort": "high" }
  }'
```