Chat & streaming

The crown jewel of the bridge. This category is how a native client shows a live assistant response — tokens appearing as they generate, tool calls surfacing in real time, and a final settled transcript. If you implement one thing well, make it this. Read the Streaming primer first for the transport model; this page is the exact contract.

The loop is always: POST /api/chat/start (get a stream_id) → GET /api/chat/stream (open the SSE stream) → render frames → the socket closes on a terminal frame.

`POST /api/chat/start`: begin a turn

Body

{ "session_id": "…", "message": "…",
  "workspace": "?", "model": "?", "model_provider": "?", "profile": "?",
  "explicit_model_pick": true, "attachments": [] }

session_id and a non-empty message are required. A "?" value above marks an optional field. attachments is capped at 20 entries.

Response { "stream_id": "<hex>", "session_id": "…", "pending_started_at": <float>, "effective_model": "…" }

| Status | Meaning |
| --- | --- |
| `400` | missing `session_id`/`message` |
| `404` | session not found |
| `403` | read-only / foreign session |
| `409` | `{ "error": "session already has an active stream", "active_stream_id": "…" }` |

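A session can have only one active stream; a second start on the same session returns the 409 above with the active_stream_id. Store the returned stream_id, then immediately open the stream described in the next section.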
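The request rules and status codes above can be sketched as two small helpers. This is an illustration, not the reference client: `build_start_body` and `interpret_start_response` are hypothetical names, the 2xx success check is an assumption (the contract does not state the success code), and the `already_active` action is just one way a client might react to a 409.

```python
MAX_ATTACHMENTS = 20  # cap stated in the contract
OPTIONAL_FIELDS = ("workspace", "model", "model_provider", "profile", "explicit_model_pick")


def build_start_body(session_id, message, attachments=None, **optional):
    """Build the JSON body for POST /api/chat/start, enforcing the documented rules."""
    if not session_id:
        raise ValueError("session_id is required")
    if not message:
        raise ValueError("message must be non-empty")
    attachments = list(attachments or [])
    if len(attachments) > MAX_ATTACHMENTS:
        raise ValueError(f"at most {MAX_ATTACHMENTS} attachments are allowed")
    body = {"session_id": session_id, "message": message, "attachments": attachments}
    for key, value in optional.items():
        if key not in OPTIONAL_FIELDS:
            raise ValueError(f"unknown field: {key}")
        if value is not None:
            body[key] = value  # optional fields are sent only when supplied
    return body


def interpret_start_response(status, payload):
    """Map a /api/chat/start reply to an (action, stream_id) pair."""
    if 200 <= status < 300:  # assumption: success is a 2xx status
        return ("open_stream", payload["stream_id"])
    if status == 409:
        # The session already has a live stream; the server names it.
        return ("already_active", payload.get("active_stream_id"))
    return ("fail", None)  # 400, 403, 404 and anything unexpected
```

Validating on the client avoids a round trip for the 400 case; the server remains the authority on everything else.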

`GET /api/chat/stream`: the live SSE stream

GET /api/chat/stream?stream_id=<id>
Accept: text/event-stream

Optional replay for reconnect: &replay=1&after_seq=<n> (or after_event_id). Response headers: Content-Type: text/event-stream, Cache-Control: no-cache, X-Accel-Buffering: no. If the stream_id is unknown and no run-journal replay exists → 404 {"error":"stream not found"}.

Each frame is event: <name> + data: <json>. Idle heartbeats arrive as SSE comments (: heartbeat) every ~5s — ignore them.
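A minimal sketch of the framing just described, for clients without a built-in SSE parser. It assumes standard SSE framing (a blank line ends a frame) and that lines arrive already decoded; `parse_sse` is a hypothetical helper name. The `id` and `retry` fields are skipped here for brevity.

```python
import json


def parse_sse(lines):
    """Yield (event_name, payload) pairs from an iterable of decoded SSE lines."""
    event, data = None, []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line.startswith(":"):
            continue  # SSE comment, e.g. ": heartbeat"
        if line == "":
            # A blank line ends the frame; emit it only if it carried data.
            if data:
                yield (event or "message", json.loads("\n".join(data)))
            event, data = None, []
            continue
        field, _, value = line.partition(":")
        if value.startswith(" "):
            value = value[1:]  # one optional space after the colon is not content
        if field == "event":
            event = value
        elif field == "data":
            data.append(value)
        # Other fields (id, retry) are ignored in this sketch.
```

Feed it the response body line by line; heartbeat comments never reach the caller.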

Event frames

| `event:` | `data` payload | Client action |
| --- | --- | --- |
| `token` | `{ "text": "<delta>" }` | Append delta to the live assistant message. |
| `reasoning` | `{ "text": "<delta>" }` | Append to the reasoning/thinking trace. |
| `interim_assistant` | `{ "text": "<full>", "already_streamed": bool }` | Replace/seed the assistant text with this reconciled full text; skip if `already_streamed`. |
| `tool` | `{ "event_type": "tool.started", "name", "preview", "args", "id"/"tool_call_id"/"tool_use_id" }` | Add a live tool-call card; key it by the stable id. |
| `tool_complete` | same + `{ "duration", "is_error" }` | Mark that card done (match on id). |
| `title` | `{ "session_id", "title" }` | Update the session title in the sidebar. |
| `done` | `{ "session": { …full transcript+messages }, "usage": {…}, "terminal_state"? }` | Reconcile the final transcript from `session.messages`; `usage` → context-window indicator. Not the closing frame: `stream_end` follows. |
| `stream_end` | `{ "session_id" }` | Closes the stream. Finalize the turn. |
| `cancel` | `{ "type": "cancelled", "message", "session"? }` | Closes the stream. Settle as cancelled. |
| `error` | `{ "error", "message"? }` | Closes the stream. Surface the error. |
| `apperror` | `{ "error", "type", "session", "terminal_state"? }` | Terminal failure. ⚠️ The reference iOS client does not handle this frame, but a new native client should: treat it like `error` and reconcile the attached `session`. |
| `pending_steer_leftover` | `{ "text": "…" }` | Steer text that wasn't consumed. Restore it into the composer. |

Non-rendering extras a client may ignore: metering (live TPS), compressing/compressed, warning, context_status, todo_state, goal/goal_continue. Unknown event types must be ignored, never errored — the protocol grows over time.
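As an illustration of the tool rows and the ignore-unknown rule, here is a sketch of tool-card tracking. It reads `"id"/"tool_call_id"/"tool_use_id"` as "the stable id arrives under one of these keys", which is an interpretation; the lookup order, the card shape, and the names `tool_id` and `apply_tool_frame` are assumptions.

```python
def tool_id(payload):
    """Return the stable id of a tool frame, whichever key carries it."""
    for key in ("id", "tool_call_id", "tool_use_id"):
        if payload.get(key):
            return payload[key]
    return None


def apply_tool_frame(cards, event, payload):
    """Update the dict of tool cards in place and return it."""
    if event not in ("tool", "tool_complete"):
        return cards  # unrelated or unknown events are ignored, never raised
    key = tool_id(payload)
    if key is None:
        return cards  # no stable id to key the card by
    if event == "tool":
        cards[key] = {
            "name": payload.get("name"),
            "preview": payload.get("preview"),
            "args": payload.get("args"),
            "done": False,
        }
    else:
        # Tolerate a completion whose start frame was missed (e.g. after a reconnect).
        card = cards.setdefault(key, {"name": payload.get("name"), "done": False})
        card.update(
            done=True,
            duration=payload.get("duration"),
            is_error=bool(payload.get("is_error")),
        )
    return cards
```

Returning early for anything unrecognized keeps the client forward-compatible as new event types appear.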

Terminal-frame semantics

done is not what closes the socket. A successful turn ends with done then stream_end. The socket-terminating frames are stream_end, cancel, and error (and you should treat apperror as terminal too). Close your EventSource on those.
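The rule can be sketched as a read loop that stops on the first terminal frame and treats `done` as ordinary. `consume_until_terminal` is a hypothetical helper; it takes any iterable of `(event, payload)` pairs.

```python
# Frames that end the stream; apperror is included as recommended above.
TERMINAL_EVENTS = frozenset({"stream_end", "cancel", "error", "apperror"})


def consume_until_terminal(frames):
    """Collect frames up to and including the first terminal one.

    Returns (seen_frames, terminal_event). terminal_event is None when the
    input ran out first, i.e. the connection dropped mid-turn.
    """
    seen = []
    for event, payload in frames:
        seen.append((event, payload))
        if event in TERMINAL_EVENTS:
            return seen, event
    return seen, None
```

A `None` outcome is the cue to check stream status or reconnect with replay rather than finalize.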

Reconstructing the message

Concatenate token deltas into the live bubble (the web UI paces them word-by-word). When an interim_assistant frame arrives, its full text replaces the accumulated text, unless already_streamed is set, in which case skip it. reasoning deltas build the collapsible reasoning card, and tool/tool_complete frames maintain live tool cards keyed by their stable id. The done frame's session.messages is the authoritative final transcript, so reconcile your streamed text against it. On a dropped connection, reconnect with replay=1&after_seq=<lastEventId>.
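A minimal sketch of that accumulation, assuming frames arrive as `(event, payload)` pairs. `LiveMessage` is a hypothetical class. Because the shape of the entries in `session.messages` is not specified on this page, the sketch only stores the authoritative list and leaves the actual reconciliation to the caller.

```python
class LiveMessage:
    """Accumulates one assistant turn from streamed frames."""

    def __init__(self):
        self.text = ""              # live assistant bubble
        self.reasoning = ""         # collapsible reasoning trace
        self.final_messages = None  # authoritative transcript, set by `done`

    def apply(self, event, payload):
        if event == "token":
            self.text += payload.get("text", "")
        elif event == "reasoning":
            self.reasoning += payload.get("text", "")
        elif event == "interim_assistant":
            # The full text replaces what was accumulated, unless the server
            # says those characters were already streamed as tokens.
            if not payload.get("already_streamed"):
                self.text = payload.get("text", "")
        elif event == "done":
            self.final_messages = payload.get("session", {}).get("messages")
        # Every other event is left to other handlers or ignored.
```

Once `final_messages` is set, prefer it over `text` when rendering the settled turn.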

`GET /api/chat/stream/status`: liveness / replay

?stream_id=… → { "active": bool, "stream_id", "replay_available": bool, "journal": { "terminal": bool, "terminal_state" }? }. active = the stream is still live in-process. A client returning to the foreground calls this to decide finalize vs reconnect.
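One possible foreground policy, sketched under assumptions: the page says the status call decides "finalize vs reconnect" but does not spell out the rule, so the three-way split in `next_step` is a guess at a sensible client behaviour, not the server's contract. `stream_url` and `next_step` are hypothetical helpers; the query parameters are the ones documented for `/api/chat/stream`.

```python
from urllib.parse import urlencode


def stream_url(stream_id, after_seq=None):
    """Build the stream path, adding replay parameters when resuming."""
    params = {"stream_id": stream_id}
    if after_seq is not None:
        params.update(replay=1, after_seq=after_seq)
    return "/api/chat/stream?" + urlencode(params)


def next_step(status):
    """Pick what a foregrounded client does with a stream/status reply."""
    if status.get("active"):
        return "reconnect"  # stream still live in-process: resume it
    if status.get("replay_available"):
        return "replay"     # stream is gone, but missed frames can be replayed
    return "finalize"       # nothing left to read: settle the turn locally
```

For both `reconnect` and `replay`, open `stream_url(stream_id, after_seq=<last seq seen>)` so the server resends only what was missed.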

`GET /api/chat/cancel`: cancel the turn

?stream_id=… → { "ok": true, "cancelled": bool, "stream_id" }. Sets the cancel flag and eagerly frees the stream so a new /api/chat/start succeeds immediately; the worker then emits a terminal cancel frame.

`POST /api/chat/steer`: inject a mid-turn nudge (non-interrupting)

Body { "session_id", "text" } → { "accepted": bool, "fallback": <reason|null>, "stream_id": <id|null> }. The text is applied at the next tool-result boundary without interrupting the stream. Fallback reasons: no_cached_agent, agent_lacks_steer, not_running, stream_dead, steer_error. Unapplied steer text later surfaces as a pending_steer_leftover frame.
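A sketch of composer handling around steer. Restoring text on `pending_steer_leftover` follows the event table; keeping the text in the composer when `accepted` is false is an assumption about good client behaviour, since this page does not say what a fallback implies for the text. `Composer` is a hypothetical class.

```python
class Composer:
    """Tracks the text field the user types steer messages into."""

    def __init__(self):
        self.text = ""
        self.last_fallback = None  # reason string from the last rejected steer

    def on_steer_response(self, sent_text, response):
        """Handle the /api/chat/steer reply; return True if the steer was accepted."""
        if response.get("accepted"):
            self.last_fallback = None
            return True
        self.last_fallback = response.get("fallback")
        self.text = sent_text  # assumption: give the text back to the user
        return False

    def on_frame(self, event, payload):
        """Restore unconsumed steer text when the stream reports a leftover."""
        if event != "pending_steer_leftover":
            return
        leftover = payload.get("text", "")
        # Keep anything typed in the meantime; put the leftover first.
        self.text = leftover + "\n" + self.text if self.text else leftover
```

`last_fallback` holds one of the documented reason strings, which a UI can show as is or map to its own wording.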

`POST /api/goal`: goal control

Body { "session_id", "args": "<verb or text>", model?, workspace? }. args semantics: ""/status → status; pause/resume; clear/stop/done → clear; anything else → set the goal (and kick off a turn if none is active). When a turn kicks off, the response merges goal state with the chat-start fields (stream_id, …) — open /api/chat/stream next. 409 { "error": "agent_running" } if a turn is already live.
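The `args` mapping can be mirrored on the client to predict whether a call may start a turn. This is a sketch: the server is the authority, and the trimming and lowercasing in `classify_goal_args` are assumptions about how it normalizes input. Both function names are hypothetical.

```python
def classify_goal_args(args):
    """Predict which goal action an args string selects."""
    verb = args.strip().lower()  # assumption: the server normalizes similarly
    if verb in ("", "status"):
        return "status"
    if verb in ("pause", "resume"):
        return verb
    if verb in ("clear", "stop", "done"):
        return "clear"
    return "set"  # any other text sets the goal and may kick off a turn


def goal_stream_id(response):
    """Return the stream_id if the goal call kicked off a turn, else None."""
    return response.get("stream_id")
```

When `goal_stream_id` returns a value, open `/api/chat/stream` with it exactly as after `/api/chat/start`.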

`POST /api/btw`: ephemeral side-question

Body { "session_id", "question" } → { "stream_id", "session_id": "<hidden>", "parent_session_id" }. Runs a throwaway turn that borrows the parent's context, then is discarded. Open /api/chat/stream?stream_id=…; the terminal done frame carries "ephemeral": true and an "answer". The parent session is left unmodified.
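Reading the answer off the ephemeral stream can be sketched as follows, assuming frames arrive as `(event, payload)` pairs. `btw_answer` is a hypothetical helper; returning `None` on a failed or cancelled side-question is a client-side choice.

```python
def btw_answer(frames):
    """Return the answer from an ephemeral /api/btw stream, or None if it failed."""
    for event, payload in frames:
        if event == "done" and payload.get("ephemeral"):
            return payload.get("answer")
        if event in ("cancel", "error", "apperror"):
            return None  # the side-question ended without an answer
    return None  # stream ended before a done frame arrived
```

Nothing here touches the parent session's transcript, matching the throwaway nature of the turn.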

Background tasks

POST /api/background { "session_id", "prompt" } → { "task_id", "stream_id", "session_id": "<hidden>" } — spawns a parallel background agent.
GET /api/background/status ?session_id=<parent> → { "results": [ { "task_id", "prompt", "answer", "completed_at" } ] }. Background results are retrieved by polling this, not via an SSE stream.
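Since results are polled, a client needs to tell new results from ones it has already shown. A minimal sketch, assuming completed results can reappear across polls (not stated above; deduplicating is harmless if they do not). `new_results` is a hypothetical helper.

```python
def new_results(seen_task_ids, response):
    """Return results not seen before, recording their task_ids in the given set."""
    fresh = []
    for result in response.get("results", []):
        task_id = result["task_id"]
        if task_id not in seen_task_ids:
            seen_task_ids.add(task_id)
            fresh.append(result)
    return fresh
```

Call it with the same set after each poll of `/api/background/status` and surface only what it returns.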

Waiting-on-user prompts (tool approvals, clarifying questions) arrive on their own SSE streams — see Approvals & clarify.
