LLM Inference

The public LLM endpoint is POST /api/inference/llm. It is the app-facing OpenAI-compatible gateway.

POST /api/inference/llm

Requires a valid app bearer token in the Authorization header.

Request body

Field	Type	Description
model (required)	string	LLM model ID from Models.
messages (required)	array	OpenAI-style message array.
tools (optional)	array	OpenAI-compatible function tools.
response_format (optional)	object	Supports `json_object` and `json_schema`.
stream (optional)	boolean	Defaults to streaming. Set to `false` for a single JSON response.
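
As a rough illustration of how these fields fit together, the sketch below assembles a request body in Python. The helper name `build_llm_request` is hypothetical, and the `{"type": "json_object"}` shape for `response_format` is assumed from the OpenAI convention rather than confirmed by this page.

```python
import json


def build_llm_request(model, messages, *, tools=None, response_format=None, stream=None):
    """Assemble a request body from the fields in the table (hypothetical helper)."""
    if not model:
        raise ValueError("model is required")
    if not messages:
        raise ValueError("messages is required")
    body = {"model": model, "messages": messages}
    # Optional fields are only included when the caller sets them,
    # so the server-side defaults (such as streaming) stay in effect otherwise.
    if tools is not None:
        body["tools"] = tools
    if response_format is not None:
        body["response_format"] = response_format
    if stream is not None:
        body["stream"] = stream
    return body


body = build_llm_request(
    "gemma-4-31b",
    [{"role": "user", "content": "Reply with a JSON object containing a 'summary' key."}],
    # Assumption: OpenAI-style response_format object.
    response_format={"type": "json_object"},
    stream=False,
)
print(json.dumps(body, indent=2))
```

Leaving `stream` unset keeps the default streaming behavior; passing `stream=False` asks for a single JSON response.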

Example request

bash

curl https://serechat.com/api/inference/llm \
  -X POST \
  -H "Authorization: Bearer <access_token>" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gemma-4-31b",
    "messages": [
      { "role": "user", "content": "Summarize the difference between REST and SSE." }
    ]
  }'
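
The same call can be sketched in Python with only the standard library. The function names `make_request` and `complete` are hypothetical. This version sets `stream` to `false` so the whole reply arrives as one JSON document, and it returns that document as-is because the exact response fields are not listed on this page.

```python
import json
import urllib.request

API_URL = "https://serechat.com/api/inference/llm"


def make_request(access_token, body):
    """Build the POST request with the bearer token and JSON body (hypothetical helper)."""
    return urllib.request.Request(
        API_URL,
        data=json.dumps(body).encode("utf-8"),
        method="POST",
        headers={
            "Authorization": f"Bearer {access_token}",
            "Content-Type": "application/json",
        },
    )


def complete(access_token, body):
    """Send a non-streaming request and return the parsed JSON response."""
    req = make_request(access_token, {**body, "stream": False})
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)


if __name__ == "__main__":
    reply = complete(
        "<access_token>",  # replace with a real app bearer token
        {
            "model": "gemma-4-31b",
            "messages": [
                {"role": "user", "content": "Summarize the difference between REST and SSE."}
            ],
        },
    )
    print(json.dumps(reply, indent=2))
```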

Tool calling

When a response contains tool calls, execute them client-side, append a tool result message for each call, then call POST /api/inference/llm again with the updated conversation.
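
The loop described above might look like the following sketch. It assumes the non-streaming response follows the OpenAI chat-completion shape (`choices[0].message`, with `tool_calls` entries carrying `id`, `function.name`, and JSON-encoded `function.arguments`), which this page implies but does not spell out. `run_tool_loop`, `send`, and `tool_impls` are hypothetical names; `send` is any callable that posts a body to the endpoint and returns the parsed JSON.

```python
import json


def run_tool_loop(send, body, tool_impls, max_rounds=5):
    """Call the endpoint, run requested tools locally, and repeat until a plain reply.

    send:       callable(body) -> parsed response dict (OpenAI shape assumed)
    tool_impls: dict mapping tool name -> Python function implementing it
    """
    messages = list(body["messages"])
    for _ in range(max_rounds):
        response = send({**body, "messages": messages, "stream": False})
        message = response["choices"][0]["message"]
        # Keep the assistant turn in the history, including its tool calls.
        messages.append(message)
        tool_calls = message.get("tool_calls") or []
        if not tool_calls:
            return message, messages
        for call in tool_calls:
            fn = call["function"]
            args = json.loads(fn["arguments"] or "{}")
            result = tool_impls[fn["name"]](**args)
            # One tool result message per call, linked by tool_call_id.
            messages.append({
                "role": "tool",
                "tool_call_id": call["id"],
                "content": json.dumps(result),
            })
    raise RuntimeError("tool loop did not finish within max_rounds")
```

The `max_rounds` cap is a design choice of this sketch, not a documented limit; it stops a model that keeps requesting tools from looping forever.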

Streaming format

When `stream` is enabled (the default), the endpoint returns standard OpenAI streaming chunks as server-sent events and ends the stream with `data: [DONE]`.
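
A minimal sketch of reading such a stream is shown below. It assumes each event is a `data:` line holding a JSON chunk whose text lives at `choices[].delta.content`, as in the OpenAI streaming format. The function name `iter_stream_text` is hypothetical, and the sample lines are made up for illustration rather than captured from the service.

```python
import json


def iter_stream_text(lines):
    """Yield text fragments from an iterable of SSE lines (str or bytes)."""
    for raw in lines:
        line = raw.decode("utf-8") if isinstance(raw, bytes) else raw
        line = line.strip()
        if not line.startswith("data:"):
            continue  # skip blank separators and non-data lines
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            break  # end-of-stream sentinel
        chunk = json.loads(payload)
        for choice in chunk.get("choices", []):
            text = (choice.get("delta") or {}).get("content")
            if text:
                yield text


# Illustrative input only; real chunks carry more fields.
sample = [
    b'data: {"choices": [{"delta": {"content": "Hel"}}]}\n',
    b"\n",
    b'data: {"choices": [{"delta": {"content": "lo"}}]}\n',
    b"\n",
    b"data: [DONE]\n",
]
print("".join(iter_stream_text(sample)))  # prints "Hello"
```

An HTTP response object opened with `urllib.request.urlopen` can be passed in place of `sample`, since iterating over it yields lines as bytes.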

Models endpoint

Discover live LLM models at GET /api/inference/models?type=llm. If type is omitted, it defaults to llm.
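
A small sketch of querying this endpoint follows. It assumes the models endpoint lives on the same host as the curl example above and accepts the same bearer token; neither is stated on this page. `models_url` and `list_models` are hypothetical names, and the response is returned unparsed beyond JSON decoding because its shape is not documented here.

```python
import json
import urllib.parse
import urllib.request

# Assumption: same host as the inference example.
MODELS_URL = "https://serechat.com/api/inference/models"


def models_url(model_type=None):
    """Build the models URL; omitting the type relies on the server default (llm)."""
    if model_type is None:
        return MODELS_URL
    return MODELS_URL + "?" + urllib.parse.urlencode({"type": model_type})


def list_models(access_token=None, model_type="llm"):
    """Fetch the model list and return the parsed JSON as-is."""
    req = urllib.request.Request(models_url(model_type))
    if access_token:
        # Assumption: the endpoint accepts the app bearer token.
        req.add_header("Authorization", f"Bearer {access_token}")
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)


print(models_url("llm"))
```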