Quickstart
From zero to your first streaming inference response in under 5 minutes. This example uses Python, but the flow is the same in any language.
Prerequisites
You need the Python requests library (pip install requests) and a SereChat account with a positive balance. The user who authorizes your app pays for inference.
Step 1 — Start an authorization request
Call POST /api/auth/app/request with your app's display name. No credentials required.
python
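import requests

# Start an authorization request; no credentials are needed for this call.
res = requests.post(
    "https://serechat.com/api/auth/app/request",
    json={"app_name": "My Rust Agent"},
)
res.raise_for_status()  # stop early on an HTTP error status
request_id = res.json()["request_id"]
print(f"request_id: {request_id}")
Step 2 — Ask the user to authorize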
Direct the user to the SereChat authorization page. They log in (if not already), approve your app, and receive a 6-digit code.
python
print("Open this URL in your browser:\n")
print(f"  https://serechat.com/authorize-app?request_id={request_id}\n")
code = input("Enter the 6-digit code: ").strip()
Step 3 — Exchange the code for a token
python
# Exchange the request ID and the user's 6-digit code for an access token.
res = requests.post(
    "https://serechat.com/api/auth/app/exchange",
    json={"request_id": request_id, "code": code},
)
res.raise_for_status()  # stop early on an HTTP error status
token = res.json()["access_token"]
print("Authorized! Token stored.")
Note: Store the token securely. It is valid for one year and grants inference access on behalf of the user.
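One way to keep the token out of source code and shell history is to write it to a file that only the current user can read. The sketch below is an illustration, not part of the SereChat API: the helper names save_token and load_token are hypothetical, and the permission handling assumes a POSIX system.

```python
import os
from pathlib import Path
from typing import Optional


def save_token(token: str, path: Path) -> None:
    """Write the token to `path` with owner-only read/write permissions."""
    path.parent.mkdir(parents=True, exist_ok=True)
    # Request mode 0o600 at creation time so a new file is never world-readable.
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
    with os.fdopen(fd, "w", encoding="utf-8") as f:
        f.write(token)
    # Tighten permissions as well in case the file already existed.
    os.chmod(path, 0o600)


def load_token(path: Path) -> Optional[str]:
    """Return the saved token, or None if no token file exists yet."""
    try:
        return path.read_text(encoding="utf-8").strip()
    except FileNotFoundError:
        return None
```

After Step 3 you could call save_token(token, some_path) once, and on later runs call load_token(some_path) first, repeating Steps 1 to 3 only when it returns None. The file location is up to you; an OS keychain is a stronger option where one is available.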
Step 4 — Run LLM inference
Call POST /api/inference/llm with your token. Responses stream as Server-Sent Events.
python
import json

with requests.post(
    "https://serechat.com/api/inference/llm",
    headers={"Authorization": f"Bearer {token}"},
    json={
        "model": "gemma-4-31b",
        "messages": [{"role": "user", "content": "Hello!"}],
    },
    stream=True,
) as resp:
    resp.raise_for_status()
    # requests falls back to ISO-8859-1 for text/* responses that do not
    # declare a charset; force UTF-8 so non-ASCII output decodes correctly.
    resp.encoding = "utf-8"
    event = None
    for line in resp.iter_lines(decode_unicode=True):
        if not line:
            # A blank line ends the current SSE event.
            event = None
            continue
        if line.startswith("event: "):
            event = line[7:]
            continue
        if event == "content" and line.startswith("data: "):
            payload = json.loads(line[6:])
            print(payload["content"], end="", flush=True)
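If you want to reuse or unit-test the parsing logic, the loop above can be factored into a small generator that works on any iterable of decoded lines. This is a minimal sketch under stated assumptions: iter_sse_events and collect_content are hypothetical helper names, the sample lines (including the "done" event) are made up for illustration, and only the "content" event with a "content" field is taken from the example above. Field handling follows the general Server-Sent Events format, where an event with no name defaults to "message" and an event not terminated by a blank line is discarded.

```python
import json
from typing import Iterable, Iterator, List, Optional, Tuple


def iter_sse_events(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (event_name, data) pairs from decoded Server-Sent Events lines."""
    event: Optional[str] = None
    data_parts: List[str] = []
    for line in lines:
        if not line:
            # Blank line: dispatch the accumulated event, then reset state.
            if data_parts:
                yield (event or "message", "\n".join(data_parts))
            event, data_parts = None, []
            continue
        if line.startswith(":"):
            continue  # comment line, often used as a keep-alive
        field, _, value = line.partition(":")
        if value.startswith(" "):
            value = value[1:]  # drop the single optional space after the colon
        if field == "event":
            event = value
        elif field == "data":
            data_parts.append(value)


def collect_content(lines: Iterable[str]) -> str:
    """Join the "content" field of every "content" event into one string."""
    return "".join(
        json.loads(data)["content"]
        for name, data in iter_sse_events(lines)
        if name == "content"
    )


# Made-up sample stream, for illustration only.
sample = [
    "event: content",
    'data: {"content": "Hel"}',
    "",
    ": keep-alive",
    "event: done",
    "data: {}",
    "",
    "event: content",
    'data: {"content": "lo!"}',
    "",
]
print(collect_content(sample))  # prints "Hello!"
```

With a live response, the same generator can consume resp.iter_lines(decode_unicode=True) directly, so text can still be printed as each event arrives.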