LLM Inference

The public LLM endpoint is POST /api/inference/llm. It is the app-facing OpenAI-compatible gateway.

POST /api/inference/llm

Requires a valid app bearer token in the Authorization header.

Request body

Field                       Type     Description
model (required)            string   LLM model ID from Models.
messages (required)         array    OpenAI-style message array.
tools (optional)            array    OpenAI-compatible function tools.
response_format (optional)  object   Supports json_object and json_schema.
stream (optional)           boolean  Streams by default. Set to false for a single JSON response.

Example request

bash
curl https://serechat.com/api/inference/llm \
  -X POST \
  -H "Authorization: Bearer <access_token>" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gemma-4-31b",
    "messages": [
      { "role": "user", "content": "Summarize the difference between REST and SSE." }
    ]
  }'
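
For apps that call the endpoint from code rather than the shell, the same request can be sketched in Python with only the standard library. This is a minimal sketch, not an official client: build_request and complete are hypothetical helper names, and the request sets stream to false so the reply arrives as a single JSON document. The exact response fields are not specified on this page, so the parsed JSON is returned as-is.

```python
import json
import urllib.request

# URL taken from the curl example above.
API_URL = "https://serechat.com/api/inference/llm"


def build_request(access_token: str, model: str, prompt: str) -> urllib.request.Request:
    """Build a non-streaming POST request (hypothetical helper)."""
    body = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        # The endpoint streams by default; ask for one JSON response instead.
        "stream": False,
    }
    return urllib.request.Request(
        API_URL,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {access_token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def complete(access_token: str, model: str, prompt: str) -> dict:
    """Send the request and return the parsed JSON body unchanged."""
    with urllib.request.urlopen(build_request(access_token, model, prompt)) as resp:
        return json.load(resp)
```

Calling complete("<access_token>", "gemma-4-31b", "Summarize the difference between REST and SSE.") would issue the same request as the curl command, apart from the added stream flag.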

Tool calling

When a response contains tool calls, execute them client-side, append the tool result message to the conversation, then call POST /api/inference/llm again with the updated messages array.
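
The round trip described above can be sketched as a small loop. This is an illustration under assumptions, not the documented behavior: it assumes the assistant message follows the usual OpenAI shape (a tool_calls list whose entries carry id, function.name, and function.arguments as a JSON string) and that tool results go back as messages with role "tool" and a matching tool_call_id. run_tool_loop, send, and handlers are hypothetical names; send stands in for whatever function posts the messages to POST /api/inference/llm and returns the assistant message.

```python
import json
from typing import Any, Callable, Dict, List

Message = Dict[str, Any]


def run_tool_loop(
    send: Callable[[List[Message]], Message],
    messages: List[Message],
    handlers: Dict[str, Callable[..., Any]],
    max_rounds: int = 5,
) -> List[Message]:
    """Call the model, run any requested tools locally, and repeat.

    Returns the full conversation once the model replies without tool calls.
    """
    conversation = list(messages)  # do not mutate the caller's list
    for _ in range(max_rounds):
        assistant = send(conversation)
        conversation.append(assistant)
        tool_calls = assistant.get("tool_calls") or []
        if not tool_calls:
            return conversation
        for call in tool_calls:
            function = call["function"]
            # Assumed OpenAI convention: arguments arrive as a JSON string.
            args = json.loads(function.get("arguments") or "{}")
            result = handlers[function["name"]](**args)
            conversation.append(
                {
                    "role": "tool",
                    "tool_call_id": call["id"],
                    "content": json.dumps(result),
                }
            )
    raise RuntimeError("model kept requesting tools after max_rounds")
```

The max_rounds cap is a design choice of this sketch, so that a model that keeps asking for tools cannot loop forever.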

Streaming format

This endpoint is OpenAI-compatible. When stream is enabled, it returns standard OpenAI streaming chunks and ends with data: [DONE].
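
As an illustration of consuming that stream, the sketch below reads already-decoded text lines and yields the text fragments. It assumes the conventional OpenAI chunk layout, where each data: line holds a JSON object with choices[].delta.content; that layout is inferred from the OpenAI-compatibility statement rather than spelled out on this page. iter_content is a hypothetical helper name.

```python
import json
from typing import Iterable, Iterator


def iter_content(lines: Iterable[str]) -> Iterator[str]:
    """Yield text fragments from OpenAI-style streaming lines.

    Stops at the terminating "data: [DONE]" line.
    """
    for raw in lines:
        line = raw.strip()
        if not line.startswith("data:"):
            continue  # skip blank separator lines and anything that is not data
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            return
        chunk = json.loads(payload)
        for choice in chunk.get("choices", []):
            # Assumed chunk shape: the incremental text sits in delta.content.
            text = (choice.get("delta") or {}).get("content")
            if text:
                yield text
```

With an HTTP response object opened via urllib, the lines could be supplied as (raw.decode("utf-8") for raw in response), and the fragments joined or printed as they arrive.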

Models endpoint

Discover live LLM models at GET /api/inference/models?type=llm. If type is omitted, it defaults to llm.
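
A minimal sketch of the discovery call follows. It assumes the models endpoint lives on the same host as the curl example (https://serechat.com); this page does not say whether a bearer token is required here or what the response looks like, so the token is optional and the parsed JSON is returned without interpretation. models_url and fetch_models are hypothetical helper names.

```python
import json
import urllib.request
from typing import Any, Optional
from urllib.parse import urlencode

# Assumed host, taken from the curl example for the LLM endpoint.
MODELS_URL = "https://serechat.com/api/inference/models"


def models_url(model_type: Optional[str] = None) -> str:
    """Build the models URL; omitting the type lets the server default to llm."""
    if model_type is None:
        return MODELS_URL
    return f"{MODELS_URL}?{urlencode({'type': model_type})}"


def fetch_models(model_type: Optional[str] = "llm", access_token: Optional[str] = None) -> Any:
    """GET the model list and return the parsed JSON body unchanged."""
    headers = {}
    if access_token:
        # Assumption: if auth is needed, it uses the same bearer scheme.
        headers["Authorization"] = f"Bearer {access_token}"
    request = urllib.request.Request(models_url(model_type), headers=headers, method="GET")
    with urllib.request.urlopen(request) as resp:
        return json.load(resp)
```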