Production Patterns
Streaming LLM endpoints have failure and cost characteristics that ordinary JSON
endpoints do not. These patterns keep an @AiStream route safe and affordable in
production. All of them are application-owned — the package gives you the hooks;
your app sets the policy.
Cancel On Disconnect (cost control)
The single most important pattern: forward the signal injected by @AiAbortSignal() into every AI SDK
call. Otherwise, a client that closes the tab leaves the model generating tokens
you still pay for.
```ts
@Post()
@AiStream()
chat(@Body() body: ChatDto, @AiAbortSignal() signal: AbortSignal) {
  return streamText({ model, prompt: body.prompt, abortSignal: signal });
}
```
See @AiAbortSignal.
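The cost difference is easy to see in isolation. The sketch below uses only the standard AbortController API, with no AI SDK and no Nest. fakeModel and tokensBilled are hypothetical names: fakeModel stands in for a provider call that honors its abort signal, which is what passing abortSignal lets the AI SDK do.

```ts
// Hypothetical stand-in for a provider call: produces up to `limit` tokens and,
// like a cancellable SDK call, checks the abort signal before each one.
function fakeModel(signal: AbortSignal, onToken: (token: string) => void, limit = 1000): number {
  let produced = 0;
  while (produced < limit && !signal.aborted) {
    onToken(`token-${produced}`);
    produced++;
  }
  return produced; // tokens generated, i.e. tokens you would be billed for
}

// Simulates a client that closes the tab after reading `readBeforeClose` tokens.
// `forwardSignal` toggles whether the client's signal reaches the model call.
function tokensBilled(readBeforeClose: number, forwardSignal: boolean): number {
  const client = new AbortController(); // aborted when the client disconnects
  const signal = forwardSignal ? client.signal : new AbortController().signal; // never aborted
  let received = 0;
  return fakeModel(signal, () => {
    received++;
    if (received === readBeforeClose) client.abort();
  });
}

console.log(tokensBilled(3, true));  // 3
console.log(tokensBilled(3, false)); // 1000
```

In the route above, the signal injected by @AiAbortSignal() plays the role of client.signal. Leaving abortSignal out of streamText corresponds to the forwardSignal = false case.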
Cap Output Length
Bound the model's output so a single request cannot generate, and bill, an unbounded number of tokens. The AI SDK accepts this limit on the call itself as maxOutputTokens (named maxTokens in AI SDK 4.x):
```ts
return streamText({
  model,
  prompt: body.prompt,
  abortSignal: signal,
  maxOutputTokens: 1024,
});
```
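When several routes call the model, it is easy for one call to miss the cap. One option is a small app-level helper. withOutputCap and MAX_OUTPUT_TOKENS below are hypothetical names and are not part of this package or the AI SDK. The helper fills in a default maxOutputTokens and clamps any larger value a route asks for.

```ts
// Hypothetical app-level ceiling; tune per model and budget.
const MAX_OUTPUT_TOKENS = 1024;

interface CallOptions {
  maxOutputTokens?: number;
  [key: string]: unknown;
}

// Returns a copy of `options` whose maxOutputTokens is always set and never
// exceeds the ceiling, so no call can omit or raise the cap by accident.
function withOutputCap<T extends CallOptions>(
  options: T,
  ceiling: number = MAX_OUTPUT_TOKENS,
): T & { maxOutputTokens: number } {
  const requested = options.maxOutputTokens ?? ceiling;
  return { ...options, maxOutputTokens: Math.min(requested, ceiling) };
}

console.log(withOutputCap({ prompt: 'hi' }).maxOutputTokens);                        // 1024
console.log(withOutputCap({ prompt: 'hi', maxOutputTokens: 256 }).maxOutputTokens);  // 256
console.log(withOutputCap({ prompt: 'hi', maxOutputTokens: 8000 }).maxOutputTokens); // 1024
```

A route would then call streamText(withOutputCap({ model, prompt: body.prompt, abortSignal: signal })). Clamping instead of rejecting means a bad per-route value still produces a response, and the ceiling is still enforced.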
Rate Limiting
Apply rate limiting as a guard so a rejection is a pre-stream HTTP 429,
never a stream frame. @nestjs/throttler works unchanged because guards run
before the stream opens:
```ts
@Post()
@AiStream()
@UseGuards(ThrottlerGuard)
chat(@Body() body: ChatDto, @AiAbortSignal() signal: AbortSignal) {
  return streamText({ model, prompt: body.prompt, abortSignal: signal });
}
```
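Guard placement matters because the response status is sent along with the first stream byte. After that point, a refusal can no longer be a clean HTTP status. The sketch below models this ordering with a hypothetical fixed-window counter. It is not how @nestjs/throttler is implemented; in a real app, the throttler guard does this job. FixedWindowLimiter, HandlerResult and handle are illustrative names.

```ts
// Hypothetical fixed-window limiter: at most `limit` requests per key per window.
// Illustration only; @nestjs/throttler is the real tool for this.
class FixedWindowLimiter {
  private readonly buckets = new Map<string, { start: number; count: number }>();
  private readonly limit: number;
  private readonly windowMs: number;

  constructor(limit: number, windowMs: number) {
    this.limit = limit;
    this.windowMs = windowMs;
  }

  // Counts an allowed request; returns false (without counting) once `key`
  // has used up its limit in the current window.
  allow(key: string, now: number): boolean {
    const bucket = this.buckets.get(key);
    if (bucket === undefined || now - bucket.start >= this.windowMs) {
      this.buckets.set(key, { start: now, count: 1 }); // start a fresh window
      return true;
    }
    if (bucket.count >= this.limit) return false;
    bucket.count++;
    return true;
  }
}

type HandlerResult = { status: 429 } | { status: 200; body: string[] };

// Guard-then-stream ordering: the limiter runs first, so an over-limit request
// gets a plain 429 and the stream is never opened (no tokens, no cost).
function handle(
  limiter: FixedWindowLimiter,
  clientId: string,
  now: number,
  openStream: () => string[],
): HandlerResult {
  if (!limiter.allow(clientId, now)) return { status: 429 };
  return { status: 200, body: openStream() };
}
```

With a limit of 2 per minute, a third request inside the window returns 429 without openStream ever running. A request after the window resets is served again.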
Input Validation
Validate the request body with a pipe so malformed input is a clean pre-stream
HTTP 400. Both class-validator DTOs and Zod schemas are supported. See
Enhancer Pipeline.
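The pipe's contribution is a check that runs before the handler, so a bad body never reaches the model. The library-free sketch below shows that contract. ChatDto is assumed to contain only the prompt field used by the examples on this page, and validateChatBody and MAX_PROMPT_CHARS are hypothetical. In practice, a class-validator DTO or a Zod schema expresses the same rules declaratively.

```ts
// Assumed shape: the examples on this page only show that ChatDto has `prompt`.
interface ChatDto {
  prompt: string;
}

type Validation =
  | { ok: true; value: ChatDto }
  | { ok: false; status: 400; reason: string };

// Hypothetical limit: capping output does not bound input-token cost.
const MAX_PROMPT_CHARS = 4000;

const fail = (reason: string): Validation => ({ ok: false, status: 400, reason });

// Plays the role of a validation pipe: inspect the raw body before the handler
// runs, and either return a typed DTO or a 400 with a reason.
function validateChatBody(raw: unknown): Validation {
  if (typeof raw !== 'object' || raw === null) {
    return fail('body must be a JSON object');
  }
  const prompt = (raw as Record<string, unknown>).prompt;
  if (typeof prompt !== 'string' || prompt.trim() === '') {
    return fail('prompt must be a non-empty string');
  }
  if (prompt.length > MAX_PROMPT_CHARS) {
    return fail(`prompt must be at most ${MAX_PROMPT_CHARS} characters`);
  }
  return { ok: true, value: { prompt } };
}
```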
Observability
Pre-stream interceptors run before the first byte, so use them for logging,
metrics, and tracing. A response-transform interceptor is incompatible with
streaming — keep it off @AiStream routes (see
Enhancer Pipeline). For in-stream telemetry, instrument inside the AI SDK call
(for example, streamText's onChunk and onFinish callbacks or its
experimental_telemetry option) rather than trying to wrap the response.
In-Stream Error Hygiene
Set a module-wide onError mapper so mid-stream failures emit a stable,
non-sensitive message rather than the AI SDK's generic default. Never surface raw
provider errors. See Error Mapping.
```ts
AiModule.forRoot({
  onError: () => 'The assistant is temporarily unavailable.',
});
```
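A mapper can also separate what you log from what you emit. The sketch below assumes the onError mapper is called with the value thrown mid-stream. The example above takes no arguments, so check Error Mapping for the actual signature. UserFacingError and mapStreamError are hypothetical names. Errors your app throws deliberately with user-safe text pass through, and everything else, including raw provider errors, is replaced by the fixed fallback.

```ts
const FALLBACK_MESSAGE = 'The assistant is temporarily unavailable.';

// Hypothetical marker for errors whose message was written to be shown to users.
class UserFacingError extends Error {
  constructor(message: string) {
    super(message);
    this.name = 'UserFacingError';
    Object.setPrototypeOf(this, new.target.prototype); // keep instanceof working on ES5 targets
  }
}

// Assumes the mapper receives whatever was thrown mid-stream.
// Full detail goes to server logs; only vetted text goes into the stream.
function mapStreamError(error: unknown, log: (e: unknown) => void = console.error): string {
  log(error);
  if (error instanceof UserFacingError) return error.message;
  return FALLBACK_MESSAGE;
}
```

If your version passes the error in, the wiring might look like onError: (error) => mapStreamError(error).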
Checklist
- Forward @AiAbortSignal() into every AI SDK call.
- Set maxOutputTokens (or the provider equivalent) on every call.
- Rate-limit with a guard so over-limit requests get a pre-stream 429.
- Validate input with a pipe so bad input gets a pre-stream 400.
- Set a module-wide onError that only emits vetted messages.
- Keep response-transform interceptors off streaming routes.
See Security for the auth and secret-leakage angle.