Quickstart

From zero to your first streaming inference response in under 5 minutes. This example uses Python, but the flow is the same in any language.

Prerequisites

You need the Python requests library (pip install requests) and a SereChat account with a positive balance. The user who authorizes your app pays for inference.

Step 1 — Start an authorization request

Call POST /api/auth/app/request with your app's display name. No credentials required.

python

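import requests

# Start the authorization request; this endpoint needs no credentials.
res = requests.post(
    "https://serechat.com/api/auth/app/request",
    json={"app_name": "My Rust Agent"},
)
res.raise_for_status()  # stop here on an HTTP error status
request_id = res.json()["request_id"]
print(f"request_id: {request_id}")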

Step 2 — Ask the user to authorize

Direct the user to the SereChat authorization page. They log in (if not already), approve your app, and receive a 6-digit code.

python

print("Open this URL in your browser:\n")
print(f"  https://serechat.com/authorize-app?request_id={request_id}\n")
code = input("Enter the 6-digit code: ").strip()
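The prompt above can be hardened slightly with two small helpers. This is a minimal sketch: the names authorization_url and is_valid_code are hypothetical and not part of the SereChat API, and the check assumes the code is exactly six ASCII digits, as "6-digit code" suggests.

```python
import re
from urllib.parse import urlencode

# Authorization page URL taken from the step above.
AUTHORIZE_URL = "https://serechat.com/authorize-app"


def authorization_url(request_id: str) -> str:
    """Build the authorization page URL, percent-encoding the request ID."""
    return f"{AUTHORIZE_URL}?{urlencode({'request_id': request_id})}"


def is_valid_code(code: str) -> bool:
    """Return True if the code is exactly six ASCII digits (assumed format)."""
    return re.fullmatch(r"[0-9]{6}", code.strip()) is not None
```

Checking the code locally lets you re-prompt the user after a typo before calling the exchange endpoint.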

Step 3 — Exchange the code for a token

python

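# Exchange the request ID and the user's code for an access token.
res = requests.post(
    "https://serechat.com/api/auth/app/exchange",
    json={"request_id": request_id, "code": code},
)
res.raise_for_status()  # stop here on an HTTP error status
token = res.json()["access_token"]
print("Authorized! Token received.")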

Note: Store the token securely. It is valid for one year and grants inference access on behalf of the user.
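One simple way to keep the token between runs is a file that only the current user can read. The sketch below is an illustration, not a SereChat recommendation: save_token and load_token are hypothetical helper names, the file location is up to you, and the permission bits are only enforced on POSIX systems.

```python
import os
from pathlib import Path
from typing import Optional


def save_token(token: str, path: Path) -> None:
    """Write the token to path, readable and writable by the owner only."""
    # Passing mode 0o600 to os.open applies it when the file is created.
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
    with os.fdopen(fd, "w", encoding="utf-8") as f:
        f.write(token)
    # If the file already existed, tighten its permissions as well.
    os.chmod(path, 0o600)


def load_token(path: Path) -> Optional[str]:
    """Return the saved token, or None if no token file exists."""
    try:
        return path.read_text(encoding="utf-8").strip()
    except FileNotFoundError:
        return None
```

On startup, call load_token first and only run the authorization flow when it returns None.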

Step 4 — Run LLM inference

Call POST /api/inference/llm with your token. Responses stream as Server-Sent Events.

python

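import json

with requests.post(
    "https://serechat.com/api/inference/llm",
    headers={"Authorization": f"Bearer {token}"},
    json={
        "model": "gemma-4-31b",
        "messages": [{"role": "user", "content": "Hello!"}],
    },
    stream=True,
) as resp:
    resp.raise_for_status()  # stop here on an HTTP error status
    # SSE streams are UTF-8; set it explicitly so requests does not guess another charset.
    resp.encoding = "utf-8"
    event = None
    for line in resp.iter_lines(decode_unicode=True):
        if not line:
            # A blank line ends the current SSE event.
            event = None
            continue
        if line.startswith("event: "):
            event = line[7:]
            continue
        # Print text fragments from "content" events as they arrive.
        if event == "content" and line.startswith("data: "):
            payload = json.loads(line[6:])
            print(payload["content"], end="", flush=True)
print()  # end the streamed output with a newline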
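The parsing logic in the loop above can be separated from the network call so it can be exercised offline. The following is a sketch under assumptions: iter_content_text is a hypothetical helper name, the sample lines are invented to mirror the shape the loop expects ("event: content" followed by a JSON "data: " line with a "content" key), and the "done" event name is made up purely to show that other events are skipped.

```python
import json
from typing import Iterable, Iterator


def iter_content_text(lines: Iterable[str]) -> Iterator[str]:
    """Yield the "content" text of each content event in an SSE line stream."""
    event = None
    for line in lines:
        if not line:
            event = None  # a blank line terminates the current event
        elif line.startswith("event: "):
            event = line[len("event: "):]
        elif event == "content" and line.startswith("data: "):
            yield json.loads(line[len("data: "):])["content"]


# Invented sample stream; real event names and payloads may differ.
sample = [
    "event: content",
    'data: {"content": "Hel"}',
    "",
    "event: content",
    'data: {"content": "lo!"}',
    "",
    "event: done",  # hypothetical event name, ignored by the parser
    "data: {}",
    "",
]
print("".join(iter_content_text(sample)))  # prints "Hello!"
```

With a live response, the same generator can wrap the line iterator: for text in iter_content_text(resp.iter_lines(decode_unicode=True)): print(text, end="", flush=True).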

What's next

Device Code Flow →

Full auth endpoint reference with all parameters and error codes.

LLM Inference →

Tools, structured output, image inputs, and the exact SSE event format.

Media Generation →

Async image, video, audio, and SVG generation: job creation, polling, and result retrieval.