
Chat & streaming

The crown jewel of the bridge. This category is how a native client shows a live assistant response — tokens appearing as they generate, tool calls surfacing in real time, and a final settled transcript. If you implement one thing well, make it this. Read the Streaming primer first for the transport model; this page is the exact contract.

The loop is always: POST /api/chat/start (get a stream_id) → GET /api/chat/stream (open the SSE stream) → render frames → the socket closes on a terminal frame.


POST /api/chat/start — begin a turn

Body

{ "session_id": "…", "message": "…",
  "workspace": "?", "model": "?", "model_provider": "?", "profile": "?",
  "explicit_model_pick": true, "attachments": [] }
session_id and a non-empty message are required; attachments is capped at 20 entries.

Response { "stream_id": "<hex>", "session_id": "…", "pending_started_at": <float>, "effective_model": "…" }

Status | Meaning
400 | missing session_id/message
404 | session not found
403 | read-only / foreign session
409 | { "error": "session already has an active stream", "active_stream_id": "…" }

A session can have only one active stream. Register the returned stream_id, then immediately open the stream below.
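
As a rough illustration of the contract above, the following Python sketch validates a start body and interprets the response. The helper names build_start_body and interpret_start are hypothetical. The sketch assumes that optional fields may simply be omitted, that success is reported as HTTP 200, and that attaching to the active_stream_id from a 409 is a reasonable client reaction; none of these is stated on this page.

```python
import json

MAX_ATTACHMENTS = 20  # cap stated in the body description above


def build_start_body(session_id, message, attachments=None, **optional):
    """Validate and serialize a /api/chat/start body (hypothetical helper)."""
    if not session_id:
        raise ValueError("session_id is required")
    if not message:
        raise ValueError("message must be non-empty")
    attachments = list(attachments or [])
    if len(attachments) > MAX_ATTACHMENTS:
        raise ValueError("at most 20 attachments")
    body = {"session_id": session_id, "message": message, "attachments": attachments}
    # Assumption: optional fields can be left out entirely when unset.
    allowed = {"workspace", "model", "model_provider", "profile", "explicit_model_pick"}
    for key, value in optional.items():
        if key not in allowed:
            raise ValueError(f"unknown field: {key}")
        if value is not None:
            body[key] = value
    return json.dumps(body)


def interpret_start(status, payload):
    """Map a start response to an (action, stream_id) pair."""
    if status == 200:  # assumption: success is a plain 200
        return ("open_stream", payload["stream_id"])
    if status == 409:
        # The session already has a live stream; the 409 body names it.
        return ("attach_existing", payload.get("active_stream_id"))
    return ("fail", None)
```

Keeping validation on the client side avoids a round trip for the 400 case, while the 409 branch hands back the stream that is already running.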

GET /api/chat/stream — the live SSE stream

GET /api/chat/stream?stream_id=<id>
Accept: text/event-stream

Optional replay for reconnect: &replay=1&after_seq=<n> (or after_event_id). Response headers: Content-Type: text/event-stream, Cache-Control: no-cache, X-Accel-Buffering: no. If the stream_id is unknown and no run-journal replay exists → 404 {"error":"stream not found"}.
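
The query string for a first connection and for a replaying reconnect could be assembled as in this small sketch. The function name stream_url and the base_url parameter are illustrative, and only the after_seq variant of replay is shown.

```python
from urllib.parse import urlencode


def stream_url(base_url, stream_id, after_seq=None):
    """Build the stream URL; pass after_seq to request replay on reconnect."""
    params = {"stream_id": stream_id}
    if after_seq is not None:
        params["replay"] = 1
        params["after_seq"] = after_seq
    return f"{base_url}/api/chat/stream?{urlencode(params)}"
```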

Each frame is event: <name> + data: <json>. Idle heartbeats arrive as SSE comments (: heartbeat) every ~5s — ignore them.
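
A client without a built-in EventSource has to split the byte stream into frames itself. The sketch below is a minimal parser for the framing described above, assuming the lines have already been decoded to text and that every data payload is JSON; the name parse_sse is hypothetical, and the id and retry fields are deliberately ignored.

```python
import json


def parse_sse(lines):
    """Yield (event_name, payload) pairs from an iterable of decoded SSE lines."""
    event, data = None, []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line.startswith(":"):
            continue  # comment line, e.g. ": heartbeat"
        if line == "":
            # A blank line dispatches the frame collected so far.
            if data:
                yield (event or "message", json.loads("\n".join(data)))
            event, data = None, []
            continue
        field, _, value = line.partition(":")
        if value.startswith(" "):
            value = value[1:]  # SSE strips one leading space after the colon
        if field == "event":
            event = value
        elif field == "data":
            data.append(value)
        # Other fields (id, retry) are ignored in this sketch.
```

For example, feeding it a heartbeat comment followed by a token frame yields only the token frame.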

Event frames

event: | data payload | Client action
token | { "text": "<delta>" } | Append delta to the live assistant message.
reasoning | { "text": "<delta>" } | Append to the reasoning/thinking trace.
interim_assistant | { "text": "<full>", "already_streamed": bool } | Replace/seed the assistant text with this reconciled full text; skip if already_streamed.
tool | { "event_type": "tool.started", "name", "preview", "args", "id"/"tool_call_id"/"tool_use_id" } | Add a live tool-call card; key it by the stable id.
tool_complete | same + { "duration", "is_error" } | Mark that card done (match on id).
title | { "session_id", "title" } | Update the session title in the sidebar.
done | { "session": { …full transcript+messages }, "usage": {…}, "terminal_state"? } | Reconcile the final transcript from session.messages; usage → context-window indicator. Not the closing frame; stream_end follows.
stream_end | { "session_id" } | Closes the stream. Finalize the turn.
cancel | { "type": "cancelled", "message", "session"? } | Closes the stream. Settle as cancelled.
error | { "error", "message"? } | Closes the stream. Surface the error.
apperror | { "error", "type", "session", "terminal_state"? } | Terminal failure. ⚠️ The reference iOS client does not handle this frame; a new native client should treat it like error and reconcile the attached session.
pending_steer_leftover | { "text": "…" } | Steer text that wasn't consumed; restore it into the composer.

Non-rendering extras a client may ignore: metering (live TPS), compressing/compressed, warning, context_status, todo_state, goal/goal_continue. Unknown event types must be ignored, never errored — the protocol grows over time.
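
One way to honor both the "key it by the stable id" rule and the "ignore unknown events" rule is sketched below. The function names are hypothetical, and the order in which the three id fields are probed is an assumption; the page only says that one of them carries the id.

```python
def tool_key(payload):
    """Return the stable id of a tool frame, whichever field carries it."""
    # Assumption: probe the three documented field names in this order.
    for field in ("id", "tool_call_id", "tool_use_id"):
        if payload.get(field):
            return payload[field]
    return None


def apply_tool_frame(cards, event, payload):
    """Update a dict of tool cards in place; ignore every other event type."""
    key = tool_key(payload)
    if event == "tool" and key is not None:
        cards[key] = {
            "name": payload.get("name"),
            "preview": payload.get("preview"),
            "done": False,
        }
    elif event == "tool_complete" and key in cards:
        cards[key].update(
            done=True,
            duration=payload.get("duration"),
            is_error=payload.get("is_error", False),
        )
    # Anything else (metering, warning, future events) falls through untouched.
    return cards
```

Because unrecognized events simply fall through, a frame type added to the protocol later cannot break this handler.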

Terminal-frame semantics

done is not what closes the socket. A successful turn ends with done then stream_end. The socket-terminating frames are stream_end, cancel, and error (and you should treat apperror as terminal too). Close your EventSource on those.
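
The terminal rule can be captured as a small set membership test. In this sketch, consume is a hypothetical helper that takes any iterable of (event, payload) pairs and stops at the first terminal frame; a real client would close its connection at the marked point.

```python
# Frames that end the stream; apperror is included as recommended above.
TERMINAL_EVENTS = frozenset({"stream_end", "cancel", "error", "apperror"})


def consume(frames):
    """Collect frames up to and including the first terminal one."""
    seen = []
    for event, payload in frames:
        seen.append((event, payload))
        if event in TERMINAL_EVENTS:
            break  # a real client would close its connection here
    return seen
```

Note that done is intentionally absent from the set, so the loop keeps reading until stream_end arrives.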

Reconstructing the message

Build the message as frames arrive:

  • Concatenate token deltas into the live bubble (the web UI paces them word-by-word).
  • An interim_assistant full text replaces the accumulated text when it arrives.
  • reasoning deltas build the collapsible reasoning card.
  • tool/tool_complete maintain live tool cards keyed by stable id.
  • The done frame's session.messages is the authoritative final transcript; reconcile your streamed text against it.

On a dropped connection, reconnect with replay=1&after_seq=<lastEventId>.
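
The reconciliation rules above amount to a small state machine. The following sketch tracks only the text side of a turn; TurnState is a hypothetical class, and because the shape of the entries in session.messages is not specified here, the sketch stores them untouched instead of interpreting them.

```python
class TurnState:
    """Accumulates the text of one assistant turn (illustrative only)."""

    def __init__(self):
        self.text = ""
        self.reasoning = ""
        self.final_messages = None

    def apply(self, event, payload):
        if event == "token":
            self.text += payload["text"]
        elif event == "reasoning":
            self.reasoning += payload["text"]
        elif event == "interim_assistant":
            # Full reconciled text replaces the deltas, unless already streamed.
            if not payload.get("already_streamed"):
                self.text = payload["text"]
        elif event == "done":
            # Authoritative transcript; kept as-is for the caller to reconcile.
            self.final_messages = payload["session"]["messages"]
        return self
```

Events the class does not know are ignored, so it can be fed every frame from the stream without filtering.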

GET /api/chat/stream/status — liveness / replay

?stream_id=… → { "active": bool, "stream_id", "replay_available": bool, "journal": { "terminal": bool, "terminal_state" }? }. active = the stream is still live in-process. A client returning to the foreground calls this to decide finalize vs reconnect.
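
A foreground-resume decision might look like the sketch below. The function name next_step is hypothetical, and the middle branch (replaying a finished stream's journal to catch up on missed frames) is an assumption about how replay_available is meant to be used, not something this page states.

```python
def next_step(status):
    """Decide what to do after a /api/chat/stream/status response."""
    if status.get("active"):
        return "reconnect"  # still live: reopen the SSE stream
    if status.get("replay_available"):
        # Assumption: a finished stream can still be replayed to catch up.
        return "replay"
    return "finalize"  # nothing left to stream
```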

GET /api/chat/cancel — cancel the turn

?stream_id=… → { "ok": true, "cancelled": bool, "stream_id" }. Sets the cancel flag and eagerly frees the stream so a new /api/chat/start succeeds immediately; the worker then emits a terminal cancel frame.

POST /api/chat/steer — inject a mid-turn nudge (non-interrupting)

Body { "session_id", "text" } → { "accepted": bool, "fallback": <reason|null>, "stream_id": <id|null> }. The text is applied at the next tool-result boundary without interrupting the stream. Fallback reasons: no_cached_agent, agent_lacks_steer, not_running, stream_dead, steer_error. Unapplied steer text later surfaces as a pending_steer_leftover frame.

POST /api/goal — goal control

Body { "session_id", "args": "<verb or text>", model?, workspace? }. args semantics: ""/status → status; pause/resume; clear/stop/done → clear; anything else → set the goal (and kick off a turn if none is active). When a turn kicks off, the response merges goal state with the chat-start fields (stream_id, …) — open /api/chat/stream next. 409 { "error": "agent_running" } if a turn is already live.
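
A client that wants to predict whether a goal request may start a turn (and therefore whether to be ready to open the stream) can mirror the args rules locally. This sketch uses exact string matching; whether the server trims whitespace or ignores case is not stated here, so the sketch does neither. The name classify_goal_args is hypothetical.

```python
def classify_goal_args(args):
    """Mirror the args rules above: 'status', 'pause', 'resume', 'clear' or 'set'."""
    if args in ("", "status"):
        return "status"
    if args in ("pause", "resume"):
        return args
    if args in ("clear", "stop", "done"):
        return "clear"
    return "set"  # any other text sets the goal and may kick off a turn
```

Only the "set" result can lead to a response that carries a stream_id.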

POST /api/btw — ephemeral side-question

Body { "session_id", "question" } → { "stream_id", "session_id": "<hidden>", "parent_session_id" }. Runs a throwaway turn that borrows the parent's context, then is discarded. Open /api/chat/stream?stream_id=…; the terminal done frame carries "ephemeral": true and an "answer". The parent session is left unmodified.
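
Pulling the answer out of a side-question stream could be as simple as the following sketch, which scans already-parsed (event, payload) pairs for the ephemeral done frame. The name btw_answer is hypothetical.

```python
def btw_answer(frames):
    """Return the answer from the ephemeral done frame, or None if absent."""
    for event, payload in frames:
        if event == "done" and payload.get("ephemeral"):
            return payload.get("answer")
    return None
```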

Background tasks

  • POST /api/background { "session_id", "prompt" } → { "task_id", "stream_id", "session_id": "<hidden>" }. Spawns a parallel background agent.
  • GET /api/background/status ?session_id=<parent> → { "results": [ { "task_id", "prompt", "answer", "completed_at" } ] }. Background results are retrieved by polling this endpoint, not via an SSE stream.
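
Since results are polled, a client will usually see the same entries more than once. The sketch below filters each poll response down to results not seen before, assuming task_id is unique per task; the helper name new_results is hypothetical.

```python
def new_results(seen_task_ids, response):
    """Return results not seen before and record their task ids."""
    fresh = [
        r for r in response.get("results", [])
        if r["task_id"] not in seen_task_ids
    ]
    seen_task_ids.update(r["task_id"] for r in fresh)
    return fresh
```

The caller keeps one set per parent session and passes it in on every poll.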

Waiting-on-user prompts (tool approvals, clarifying questions) arrive on their own SSE streams — see Approvals & clarify.