Replay and Resume

Workflows are closures. Every invocation runs the handler from the top. Replay short-circuits past completed work by reading the event log.

The log

Append-only. Optimistic-CAS on expectedNextIndex. Stored via RunStore.appendEvent(runId, index, event).

Checkpoint events — replay reads these to skip work:

STEP_FINISHED / STEP_FAILED
SIGNAL_RESOLVED / APPROVAL_RESOLVED
NOW_RECORDED / UUID_RECORDED
RUN_FINISHED / RUN_ERRORED

Coordination events — persisted so hosts and resume calls can identify the pending wait:

SIGNAL_AWAITED
APPROVAL_REQUESTED

Observability-only events — emit-only, not persisted:

RUN_STARTED, STEP_STARTED
STATE_DELTA
CUSTOM (from ctx.emit)

How replay works

For each ctx.step('id', fn):

Walk the log for STEP_FINISHED / STEP_FAILED with this id.
Found → return the recorded result (or rethrow the recorded error). fn is NOT called.
Not found → run fn, append STEP_FINISHED, return.

Same algorithm for waitForEvent, approve, now, and uuid: replay first matches the operation's durable stepId. ctx.step(id, ...) uses its first argument. Other primitives accept id in their options and otherwise use a generated positional internal ID.

Determinism contract

The handler must reach the same primitives in the same order on every replay:

// Determinism violations:
const t = Date.now()                 // use ctx.now()
const id = Math.random()             // use ctx.uuid()
if (await fetchFlag()) { ... }       // wrap the fetch in ctx.step()

// Safe:
const t = await ctx.now({ id: 'started-at' })
const id = await ctx.uuid({ id: 'correlation-id' })
const flag = await ctx.step('flag', fetchFlag)
if (flag) { ... }

// Determinism violations:
const t = Date.now()                 // use ctx.now()
const id = Math.random()             // use ctx.uuid()
if (await fetchFlag()) { ... }       // wrap the fetch in ctx.step()

// Safe:
const t = await ctx.now({ id: 'started-at' })
const id = await ctx.uuid({ id: 'correlation-id' })
const flag = await ctx.step('flag', fetchFlag)
if (flag) { ... }

State mutations re-run on replay. They're reapplied deterministically because they depend only on replayed step results.

Pause and resume

Run pauses when the handler reaches:

ctx.approve with no APPROVAL_RESOLVED in the log
ctx.waitForEvent(name) with no matching SIGNAL_RESOLVED
ctx.sleep / ctx.sleepUntil (internally a signal-wait on __timer)

The engine writes RunState.status = 'paused' with awaiting[] populated, also filling the current waitingFor / pendingApproval projection fields, ends the event stream, and returns.

Resume:

runWorkflow({
  workflow,
  runId,
  runStore,
  // pick one:
  approval:        { approvalId, approved, feedback? },
  signalDelivery:  { signalId, stepId?, name, payload },
})

runWorkflow({
  workflow,
  runId,
  runStore,
  // pick one:
  approval:        { approvalId, approved, feedback? },
  signalDelivery:  { signalId, stepId?, name, payload },
})

The engine appends APPROVAL_RESOLVED or SIGNAL_RESOLVED to the log, re-runs the handler from the top, and replay carries through to the next primitive after the pause.

Idempotency and lost races

Every signal delivery carries a signalId. If the waited operation has an explicit ID, pass it back as signalDelivery.stepId. Two deliveries for the same waiting operation:

Same signalId → idempotent. The engine no-ops and returns success.
Different signalId → the loser sees RUN_ERRORED { code: 'signal_lost' }. The winner's payload is what the workflow sees.

Use this for safe webhook retries: pick a stable signalId per webhook event.

Version routing

When workflow code changes, declare a version and keep old code reachable:

const v2 = createWorkflow({ id: 'pipeline', version: 'v2' })
  .previousVersions([v1])        // v1 stays callable for in-flight v1 runs
  .handler(async (ctx) => { /* v2 body */ })

const v2 = createWorkflow({ id: 'pipeline', version: 'v2' })
  .previousVersions([v1])        // v1 stays callable for in-flight v1 runs
  .handler(async (ctx) => { /* v2 body */ })

On resume the engine reads RunState.workflowVersion and routes to the matching definition. Drop a version from previousVersions only after all runs at that version have terminated.

Mismatched version with nothing in previousVersions → RUN_ERRORED { code: 'workflow_version_mismatch' }.

A second subscriber (browser refresh, mobile reconnect) reads current state without driving the run forward:

runWorkflow({ workflow, runId, runStore, attach: true })

runWorkflow({ workflow, runId, runStore, attach: true })

Engine emits: RUN_STARTED → replay of the log → terminal event (RUN_FINISHED, RUN_ERRORED, or pause info), then ends.

Webhook execution

For Durable-Streams-style stateless invocations:

import { handleWorkflowWebhook } from '@tanstack/workflow-core'

await handleWorkflowWebhook({
  workflow,
  runStore,
  payload: { runId, signalDelivery, approval },
})

import { handleWorkflowWebhook } from '@tanstack/workflow-core'

await handleWorkflowWebhook({
  workflow,
  runStore,
  payload: { runId, signalDelivery, approval },
})

Same engine. One invocation drives the run to its next pause or completion. The HTTP handler returns; the durable stream / queue handles wake-ups.

Cleanup

Terminal runs remain in the store so attach calls and webhook retries can read the final log. Stores decide their retention policy; the in-memory store expires non-paused runs after its TTL (1h default). Hosts can still call RunStore.deleteRun(runId, reason) when they want immediate cleanup.

What the log contains, end to end

plaintext

[
  // RUN_STARTED — emit only, not in the persisted log
  STEP_FINISHED   { stepId: 'fetch-user', result: { id: 'u-1', tier: 'pro' } },
  NOW_RECORDED    { stepId: '__now-0', value: 1737499200000 },
  SIGNAL_AWAITED  { stepId: 'payment-webhook', name: 'payment', deadline: ... },
  SIGNAL_RESOLVED { stepId: 'payment-webhook', name: 'payment', signalId: 'evt-1', payload: { ... } },
  APPROVAL_REQUESTED { stepId: 'review', approvalId: 'a-1', title: 'Continue?' },
  APPROVAL_RESOLVED  { stepId: 'review', approvalId: 'a-1', approved: true },
  STEP_FINISHED   { stepId: 'finalize', result: { ok: true } },
  RUN_FINISHED    { runId, output: { ok: true } },
]

[
  // RUN_STARTED — emit only, not in the persisted log
  STEP_FINISHED   { stepId: 'fetch-user', result: { id: 'u-1', tier: 'pro' } },
  NOW_RECORDED    { stepId: '__now-0', value: 1737499200000 },
  SIGNAL_AWAITED  { stepId: 'payment-webhook', name: 'payment', deadline: ... },
  SIGNAL_RESOLVED { stepId: 'payment-webhook', name: 'payment', signalId: 'evt-1', payload: { ... } },
  APPROVAL_REQUESTED { stepId: 'review', approvalId: 'a-1', title: 'Continue?' },
  APPROVAL_RESOLVED  { stepId: 'review', approvalId: 'a-1', approved: true },
  STEP_FINISHED   { stepId: 'finalize', result: { ok: true } },
  RUN_FINISHED    { runId, output: { ok: true } },
]

Replay walks this; observers tail it.

The log#

How replay works#

Determinism contract#

Pause and resume#

Idempotency and lost races#

Version routing#

Attach (read-only subscribe)#

Webhook execution#

Cleanup#

What the log contains, end to end#