LLM Inference
The public LLM endpoint is POST /api/inference/llm. It is the app-facing OpenAI-compatible gateway.
POST /api/inference/llm
Requires a valid app bearer token in the Authorization header.
Request body
| Field | Type | Description |
|---|---|---|
| model (required) | string | LLM model ID from Models. |
| messages (required) | array | OpenAI-style message array. |
| tools (optional) | array | OpenAI-compatible function tools. |
| response_format (optional) | object | Supports json_object and json_schema. |
| stream (optional) | boolean | Streams by default. Set to false for a single JSON response. |
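
As a rough illustration of how these fields fit together, the sketch below assembles a request body in Python. The helper name `build_llm_request` is hypothetical, and the `{"type": "json_object"}` shape for `response_format` is assumed from the OpenAI convention rather than confirmed by this page. Because the endpoint streams by default, the helper always sends `stream` explicitly.

```python
import json
from typing import Any, Optional


def build_llm_request(
    model: str,
    messages: list[dict[str, Any]],
    *,
    stream: bool = False,
    response_format: Optional[dict[str, Any]] = None,
    tools: Optional[list[dict[str, Any]]] = None,
) -> dict[str, Any]:
    """Assemble a request body using only the fields listed in the table."""
    if not model:
        raise ValueError("model is required")
    if not messages:
        raise ValueError("messages is required")
    # stream is always sent explicitly because the endpoint streams by default.
    body: dict[str, Any] = {"model": model, "messages": messages, "stream": stream}
    if tools is not None:
        body["tools"] = tools
    if response_format is not None:
        # Assumed OpenAI-style shape, e.g. {"type": "json_object"}.
        body["response_format"] = response_format
    return body


body = build_llm_request(
    "gemma-4-31b",
    [{"role": "user", "content": "Return a JSON object with a 'summary' key."}],
    response_format={"type": "json_object"},
)
print(json.dumps(body, indent=2))
```

The resulting dictionary can be serialized and sent as the POST body with any HTTP client, together with the Authorization header described above.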
Example request
```bash
curl https://serechat.com/api/inference/llm \
  -X POST \
  -H "Authorization: Bearer <access_token>" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gemma-4-31b",
    "messages": [
      { "role": "user", "content": "Summarize the difference between REST and SSE." }
    ]
  }'
```
Tool calling
After receiving tool calls, execute them client-side, append the tool result message, then call POST /api/inference/llm again with the updated conversation.
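
The client-side step can be sketched as follows. This assumes the response follows the OpenAI tool-call shape (an assistant message with a `tool_calls` list, answered by `role: "tool"` messages keyed by `tool_call_id`), which this page implies but does not spell out. `get_weather`, `LOCAL_TOOLS`, and `append_tool_results` are hypothetical names used only for illustration, and the sample assistant turn is hand-written, not real output.

```python
import json
from typing import Any, Callable


def get_weather(city: str) -> dict[str, Any]:
    """Hypothetical local tool; returns canned data for illustration."""
    return {"city": city, "forecast": "sunny"}


# Maps tool names the model may call to local Python callables.
LOCAL_TOOLS: dict[str, Callable[..., Any]] = {"get_weather": get_weather}


def append_tool_results(
    messages: list[dict[str, Any]], assistant_message: dict[str, Any]
) -> list[dict[str, Any]]:
    """Return a new conversation with the assistant turn and one tool message per call."""
    updated = messages + [assistant_message]
    for call in assistant_message.get("tool_calls") or []:
        name = call["function"]["name"]
        # In the OpenAI format, arguments arrive as a JSON-encoded string.
        args = json.loads(call["function"]["arguments"] or "{}")
        handler = LOCAL_TOOLS.get(name)
        if handler is None:
            result = {"error": f"unknown tool: {name}"}
        else:
            result = handler(**args)
        updated.append(
            {"role": "tool", "tool_call_id": call["id"], "content": json.dumps(result)}
        )
    return updated


# Sample assistant turn in the assumed OpenAI shape (not captured output).
assistant_turn = {
    "role": "assistant",
    "content": None,
    "tool_calls": [
        {
            "id": "call_1",
            "type": "function",
            "function": {"name": "get_weather", "arguments": '{"city": "Oslo"}'},
        }
    ],
}
history = [{"role": "user", "content": "What is the weather in Oslo?"}]
next_messages = append_tool_results(history, assistant_turn)
```

`next_messages` would then be sent as the `messages` field of the follow-up POST /api/inference/llm request, along with the same `tools` array.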
Streaming format
This endpoint is OpenAI-compatible. When stream is enabled, it returns standard OpenAI streaming chunks and ends with data: [DONE].
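
A minimal sketch of consuming such a stream, assuming each event is a `data: <json>` line whose text lives at `choices[].delta.content`, as in OpenAI chat completion chunks. The function name `iter_content_deltas` is hypothetical, and the sample lines are hand-written to match that shape rather than captured from this endpoint.

```python
import json
from typing import Iterable, Iterator


def iter_content_deltas(lines: Iterable[str]) -> Iterator[str]:
    """Yield text fragments from OpenAI-style SSE lines until data: [DONE]."""
    for raw in lines:
        line = raw.strip()
        # Skip blank separators and any non-data lines such as SSE comments.
        if not line.startswith("data:"):
            continue
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            break
        chunk = json.loads(payload)
        for choice in chunk.get("choices", []):
            text = (choice.get("delta") or {}).get("content")
            if text:
                yield text


# Hand-written sample lines in the assumed chunk shape (not captured output).
sample = [
    'data: {"choices": [{"index": 0, "delta": {"role": "assistant"}}]}',
    "",
    'data: {"choices": [{"index": 0, "delta": {"content": "Hello"}}]}',
    "",
    'data: {"choices": [{"index": 0, "delta": {"content": ", world"}}]}',
    "",
    "data: [DONE]",
]
text = "".join(iter_content_deltas(sample))
print(text)
```

In practice the `lines` argument would be the decoded line iterator of a streaming HTTP response. Tool-call deltas, if present, arrive in the same chunks and would need separate accumulation that this sketch omits.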
Models endpoint
Discover live LLM models at GET /api/inference/models?type=llm. If type is omitted, it defaults to llm.
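
The lookup could be scripted along these lines. The base URL is taken from the example request, the helper names `models_url` and `fetch_models` are hypothetical, and the response schema is not documented on this page, so the sketch returns the parsed JSON without interpreting it. Whether this endpoint requires the bearer token is also an assumption; the token is sent only if provided.

```python
import json
import urllib.request
from typing import Any, Optional
from urllib.parse import urlencode

BASE_URL = "https://serechat.com"  # host taken from the example request


def models_url(model_type: Optional[str] = None, base_url: str = BASE_URL) -> str:
    """Build the models URL; omitting model_type relies on the server default (llm)."""
    url = f"{base_url}/api/inference/models"
    if model_type is not None:
        url += "?" + urlencode({"type": model_type})
    return url


def fetch_models(model_type: Optional[str] = None, token: Optional[str] = None) -> Any:
    """GET the models list and return the parsed JSON as-is (schema not assumed)."""
    request = urllib.request.Request(models_url(model_type))
    if token:
        # Assumption: the same app bearer token is accepted here.
        request.add_header("Authorization", f"Bearer {token}")
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.load(response)
```

Calling `fetch_models("llm")` and `fetch_models()` should be equivalent, given the documented default for `type`.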