
Built-in Middleware

TanStack AI ships ready-made middleware so you don't have to hand-roll the common cases. Each one is an ordinary ChatMiddleware — drop it into the middleware array of any chat() call. This page documents every built-in.

| Middleware | Import | What it does |
| --- | --- | --- |
| toolCacheMiddleware | @tanstack/ai/middlewares | Cache tool-call results by name + arguments |
| contentGuardMiddleware | @tanstack/ai/middlewares | Redact / transform / block streamed text content |
| otelMiddleware | @tanstack/ai/middlewares/otel | Emit OpenTelemetry spans + GenAI metrics |

toolCacheMiddleware and contentGuardMiddleware are exported from the main @tanstack/ai/middlewares barrel. otelMiddleware lives on its own subpath (@tanstack/ai/middlewares/otel) so that importing the barrel never eagerly pulls in @opentelemetry/api (an optional peer dependency).

toolCacheMiddleware

Caches tool call results based on tool name and arguments. When a tool is called with the same name and arguments as a previous call, the cached result is returned immediately without re-executing the tool.

typescript
import { chat } from "@tanstack/ai";
import { openaiText } from "@tanstack/ai-openai";
import { toolCacheMiddleware } from "@tanstack/ai/middlewares";

const stream = chat({
  adapter: openaiText("gpt-5.5"),
  messages,
  tools: [weatherTool, stockTool],
  middleware: [
    toolCacheMiddleware({
      ttl: 60_000, // Cache entries expire after 60 seconds
      maxSize: 50, // Keep at most 50 entries (LRU eviction)
      toolNames: ["getWeather"], // Only cache specific tools
    }),
  ],
});

Options:

| Option | Type | Default | Description |
| --- | --- | --- | --- |
| maxSize | number | 100 | Maximum cache entries. Oldest evicted first (LRU). Only applies to the default in-memory storage. |
| ttl | number | Infinity | Time-to-live in milliseconds. Expired entries are not served. |
| toolNames | string[] | All tools | Only cache these tools. Others pass through. |
| keyFn | (toolName, args) => string | JSON.stringify([toolName, args]) | Custom cache key derivation. |
| storage | ToolCacheStorage | In-memory Map | Custom storage backend. When provided, maxSize is ignored; the storage manages its own capacity. |

Behaviors:

  • Only successful tool calls are cached — errors are never stored
  • On a cache hit, onBeforeToolCall returns { type: 'skip', result }, so the tool itself is not re-executed
  • LRU eviction: when maxSize is reached, the least-recently-used entry is removed (default storage only)
  • Cache hits refresh the entry's LRU position (moved to most-recently-used)
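
To make the eviction and expiry rules above concrete, here is a minimal sketch of an LRU + TTL cache built on a plain Map. It is not TanStack AI's implementation; `createLruTtlCache`, `CacheEntry`, and the injectable `now` clock are hypothetical names chosen for illustration.

```typescript
// Illustrative sketch only, not the library's source.
type CacheEntry = { result: unknown; timestamp: number };

function createLruTtlCache(
  maxSize: number,
  ttl: number,
  now: () => number = Date.now,
) {
  // A Map iterates in insertion order, so its first key is the
  // least-recently-used entry as long as hits re-insert their key.
  const entries = new Map<string, CacheEntry>();

  return {
    get(key: string): CacheEntry | undefined {
      const entry = entries.get(key);
      if (entry === undefined) return undefined;
      if (now() - entry.timestamp > ttl) {
        entries.delete(key); // expired entries are never served
        return undefined;
      }
      // Refresh the LRU position: move the entry to the end of the Map.
      entries.delete(key);
      entries.set(key, entry);
      return entry;
    },
    set(key: string, result: unknown): void {
      entries.delete(key);
      if (entries.size >= maxSize) {
        // Evict the least-recently-used entry (the first key).
        const lru = entries.keys().next().value;
        if (lru !== undefined) entries.delete(lru);
      }
      entries.set(key, { result, timestamp: now() });
    },
  };
}
```

With `maxSize: 2`, storing `a` and `b`, reading `a`, and then storing `c` evicts `b` rather than `a`, because the read moved `a` to the most-recently-used position.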

Custom key function — useful when you want to ignore certain arguments:

typescript
function isRecord(value: unknown): value is Record<string, unknown> {
  return typeof value === "object" && value !== null;
}

toolCacheMiddleware({
  keyFn: (toolName, args) => {
    // Ignore pagination, cache by query only. `args` is `unknown`, so
    // narrow it with a type guard before destructuring.
    if (!isRecord(args)) return JSON.stringify([toolName, args]);
    const { page, ...rest } = args;
    return JSON.stringify([toolName, rest]);
  },
});
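
One caveat with the default key: JSON.stringify serializes object keys in insertion order, so `{ q: "x", page: 1 }` and `{ page: 1, q: "x" }` produce different cache keys. The sketch below shows one way to derive an order-independent key. `stableStringify` and `stableKeyFn` are hypothetical helpers, not part of the library, and the sketch assumes arguments are plain JSON-like data.

```typescript
// Illustrative sketch: serialize with object keys sorted so that
// argument order does not affect the cache key.
function stableStringify(value: unknown): string {
  if (value === undefined) return "null";
  if (Array.isArray(value)) {
    return `[${value.map(stableStringify).join(",")}]`;
  }
  if (typeof value === "object" && value !== null) {
    const body = Object.entries(value as Record<string, unknown>)
      .sort(([a], [b]) => (a < b ? -1 : a > b ? 1 : 0))
      .map(([k, v]) => `${JSON.stringify(k)}:${stableStringify(v)}`)
      .join(",");
    return `{${body}}`;
  }
  // Strings, numbers, booleans, and null.
  return JSON.stringify(value);
}

// Same shape as the keyFn option: (toolName, args) => string.
const stableKeyFn = (toolName: string, args: unknown): string =>
  stableStringify([toolName, args]);
```

A function like this could be passed as `keyFn` so that two calls differing only in argument order hit the same cache entry.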

Custom Storage

By default the cache lives in-memory and is scoped to a single toolCacheMiddleware() instance. Pass a storage option to use an external backend like Redis, localStorage, or a database. This also enables sharing a cache across multiple chat() calls.

The storage interface:

typescript
// Implement this interface (exported from `@tanstack/ai/middlewares`):
interface ToolCacheStorage {
  getItem: (key: string) => ToolCacheEntry | undefined | Promise<ToolCacheEntry | undefined>;
  setItem: (key: string, value: ToolCacheEntry) => void | Promise<void>;
  deleteItem: (key: string) => void | Promise<void>;
}

// ToolCacheEntry is { result: unknown; timestamp: number }

All methods may return a Promise for async backends. The middleware handles TTL checking — your storage just needs to store and retrieve entries.
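
As a rough illustration of that division of labour, the sketch below shows how a caller could read from either a synchronous or an asynchronous storage and apply the TTL check itself. It is an assumption about the general approach, not the middleware's actual code. The types are re-declared locally from the interface shown above, and `readFresh` and `createMemoryStorage` are hypothetical names.

```typescript
// Local copies of the documented shapes, so the sketch is self-contained.
type ToolCacheEntry = { result: unknown; timestamp: number };

interface ToolCacheStorage {
  getItem: (key: string) => ToolCacheEntry | undefined | Promise<ToolCacheEntry | undefined>;
  setItem: (key: string, value: ToolCacheEntry) => void | Promise<void>;
  deleteItem: (key: string) => void | Promise<void>;
}

// Hypothetical helper: return the entry only if it is still within its TTL.
async function readFresh(
  storage: ToolCacheStorage,
  key: string,
  ttl: number,
  now: number = Date.now(),
): Promise<ToolCacheEntry | undefined> {
  // `await` works for plain values and Promises alike, which is why a
  // storage backend may be sync or async.
  const entry = await storage.getItem(key);
  if (entry === undefined) return undefined;
  if (now - entry.timestamp > ttl) {
    await storage.deleteItem(key); // drop the stale entry
    return undefined;
  }
  return entry;
}

// A storage that only stores and retrieves, with no TTL logic of its own.
function createMemoryStorage(): ToolCacheStorage {
  const map = new Map<string, ToolCacheEntry>();
  return {
    getItem: (key) => map.get(key),
    setItem: (key, value) => { map.set(key, value); },
    deleteItem: (key) => { map.delete(key); },
  };
}
```

The storage never inspects `timestamp`; expiry is decided entirely by the caller.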

Redis example:

typescript
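import { createClient } from "redis";
import { chat } from "@tanstack/ai";
import { toolCacheMiddleware, type ToolCacheStorage } from "@tanstack/ai/middlewares";

// node-redis clients must be connected before issuing commands.
const redis = createClient();
await redis.connect();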

const redisStorage: ToolCacheStorage = {
  getItem: async (key) => {
    const raw = await redis.get(`tool-cache:${key}`);
    return raw ? JSON.parse(raw) : undefined;
  },
  setItem: async (key, value) => {
    await redis.set(`tool-cache:${key}`, JSON.stringify(value));
  },
  deleteItem: async (key) => {
    await redis.del(`tool-cache:${key}`);
  },
};

const stream = chat({
  adapter,
  messages,
  tools: [weatherTool],
  middleware: [toolCacheMiddleware({ storage: redisStorage, ttl: 60_000 })],
});

Sharing a cache across requests:

typescript
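// Create the backing Map and the storage once, at module scope, and reuse
// them across chat() calls. Entries have the ToolCacheEntry shape.
const globalCache = new Map<string, { result: unknown; timestamp: number }>();

const sharedStorage: ToolCacheStorage = {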
  getItem: (key) => globalCache.get(key),
  setItem: (key, value) => { globalCache.set(key, value); },
  deleteItem: (key) => { globalCache.delete(key); },
};

// Both requests share the same cache
app.post("/api/chat", async (req) => {
  const stream = chat({
    adapter,
    messages: req.body.messages,
    tools: [weatherTool],
    middleware: [toolCacheMiddleware({ storage: sharedStorage })],
  });
  return toServerSentEventsResponse(stream);
});

contentGuardMiddleware

Filters or transforms streamed text content as it flows through onChunk. Use it to redact sensitive data (SSNs, emails, API keys), enforce a profanity filter, or rewrite text on the fly. Rules are applied to TEXT_MESSAGE_CONTENT chunks; all other chunk types pass through untouched.

typescript
import { chat } from "@tanstack/ai";
import { openaiText } from "@tanstack/ai-openai";
import { contentGuardMiddleware } from "@tanstack/ai/middlewares";

const stream = chat({
  adapter: openaiText("gpt-5.5"),
  messages,
  middleware: [
    contentGuardMiddleware({
      rules: [
        // Regex + replacement
        { pattern: /\b\d{3}-\d{2}-\d{4}\b/g, replacement: "[SSN REDACTED]" },
        // Custom transform function
        { fn: (text) => text.replaceAll("badword", "****") },
      ],
      strategy: "buffered",
    }),
  ],
});

Options:

| Option | Type | Default | Description |
| --- | --- | --- | --- |
| rules | ContentGuardRule[] |  | Required. Applied in order; each rule receives the previous rule's output. A rule is either { pattern: RegExp; replacement: string } or { fn: (text: string) => string }. |
| strategy | 'delta' \| 'buffered' | 'buffered' | How content is matched. See below. |
| bufferSize | number | 50 | (Buffered only) Characters held back before emitting, so patterns spanning chunk boundaries still match. Set it ≥ the longest pattern you expect. Flushed at stream end. |
| blockOnMatch | boolean | false | When true, drop the entire chunk if any rule changes the content (instead of emitting the filtered version). |
| onFiltered | (info: ContentFilteredInfo) => void |  | Callback fired whenever a rule changes content. Receives { messageId, original, filtered, strategy }. |

Matching strategies:

  • 'buffered' (default) — Accumulates content and applies rules to the settled portion, holding back a bufferSize look-behind window so a pattern split across two chunks ("...123-45" then "-6789...") is still caught. The buffer is flushed when the message or run ends. Use this for anything that can span deltas — which is most redaction.
  • 'delta' — Applies rules to each delta in isolation as it arrives. Fastest and lowest-latency, but a pattern split across a chunk boundary may slip through. Use only when your patterns are guaranteed to fit within a single delta.
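
The difference between the two strategies can be sketched in a few lines. This is a simplified model of the idea, not the middleware's implementation; `Rule`, `applyRules`, `redactPerDelta`, and `redactBuffered` are hypothetical names, and the sketch works on an array of string deltas rather than real stream chunks.

```typescript
type Rule = { pattern: RegExp; replacement: string };

function applyRules(text: string, rules: Rule[]): string {
  return rules.reduce((acc, r) => acc.replace(r.pattern, r.replacement), text);
}

// 'delta' style: each delta is filtered in isolation.
function redactPerDelta(deltas: string[], rules: Rule[]): string {
  return deltas.map((d) => applyRules(d, rules)).join("");
}

// 'buffered' style: filter the accumulated text, but hold back the last
// `bufferSize` characters until more content (or the end) arrives.
function redactBuffered(deltas: string[], rules: Rule[], bufferSize: number): string {
  let pending = "";
  let emitted = "";
  for (const delta of deltas) {
    pending = applyRules(pending + delta, rules);
    const settled = Math.max(0, pending.length - bufferSize);
    emitted += pending.slice(0, settled); // safe to emit
    pending = pending.slice(settled); // look-behind window
  }
  // Flush whatever is still held back when the stream ends.
  return emitted + applyRules(pending, rules);
}

const ssnRule: Rule = { pattern: /\b\d{3}-\d{2}-\d{4}\b/g, replacement: "[SSN REDACTED]" };
const deltas = ["My SSN is 123-45", "-6789, thanks"];

const perDelta = redactPerDelta(deltas, [ssnRule]);
const buffered = redactBuffered(deltas, [ssnRule], 50);
```

Here `perDelta` is `"My SSN is 123-45-6789, thanks"` because neither delta contains the full pattern, while `buffered` is `"My SSN is [SSN REDACTED], thanks"`.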

Behaviors:

  • Only TEXT_MESSAGE_CONTENT chunks are inspected; every other chunk type passes through.
  • A rule that doesn't change the text is a no-op — the chunk passes through unchanged.
  • With blockOnMatch: true, a matched chunk is dropped entirely (returns null from onChunk) rather than emitting the redacted text.
  • The onFiltered callback is for observability/audit — it fires with the before/after text but does not alter what is emitted.
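
The behaviors above amount to a small decision procedure per piece of text. The sketch below models it under stated assumptions: `guardText`, `GuardRule`, and `GuardOptions` are hypothetical names, the callback receives only `original` and `filtered` (the real callback also receives `messageId` and `strategy`), and `null` stands in for a dropped chunk.

```typescript
type GuardRule =
  | { pattern: RegExp; replacement: string }
  | { fn: (text: string) => string };

type GuardOptions = {
  blockOnMatch?: boolean;
  onFiltered?: (info: { original: string; filtered: string }) => void;
};

function guardText(
  text: string,
  rules: GuardRule[],
  options: GuardOptions = {},
): string | null {
  // Rules run in order; each one receives the previous rule's output.
  const filtered = rules.reduce(
    (acc, rule) =>
      "fn" in rule ? rule.fn(acc) : acc.replace(rule.pattern, rule.replacement),
    text,
  );

  // No rule changed anything: pass the text through untouched.
  if (filtered === text) return text;

  // Observability only; the callback does not alter what is emitted.
  options.onFiltered?.({ original: text, filtered });

  // Either drop the text entirely or emit the filtered version.
  return options.blockOnMatch ? null : filtered;
}
```

With `blockOnMatch` unset, text containing a match comes back redacted; with `blockOnMatch: true`, the same text yields `null`, and text no rule touches is returned as-is in both cases.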

otelMiddleware

Emits vendor-neutral OpenTelemetry traces and metrics for every chat() call — a root span per call, a child span per agent-loop iteration, and a grandchild span per tool execution, all tagged with GenAI semantic-convention attributes.

typescript
import { chat } from "@tanstack/ai";
import { openaiText } from "@tanstack/ai-openai";
import { otelMiddleware } from "@tanstack/ai/middlewares/otel";
import { trace, metrics } from "@opentelemetry/api";

const otel = otelMiddleware({
  tracer: trace.getTracer("my-app"),
  meter: metrics.getMeter("my-app"), // optional — enables GenAI histograms
});

const result = await chat({
  adapter: openaiText("gpt-5.5"),
  messages,
  middleware: [otel],
});

otelMiddleware has its own configuration surface (content capture, redaction, span-name formatting, attribute enrichment, lifecycle callbacks) and requires the optional @opentelemetry/api peer dependency. See the dedicated OpenTelemetry guide for full setup, the span/metric catalogue, and all options.

Writing your own

These built-ins are just ChatMiddleware objects — nothing about them is privileged. To build your own, see the Middleware guide for the full hook reference, the context object, and composition rules.
