Streaming Responses Guide
This guide covers best practices for handling streaming responses across different platforms and languages.
Why stream?
- Better UX: Users see tokens appear in real time instead of waiting 2-5 seconds for a full response
- Lower perceived latency: Users see the first token as soon as the model starts generating (latency varies by model and region)
- Memory efficient: Process tokens as they arrive instead of buffering the full response
Python (OpenAI SDK)
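from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

stream = client.chat.completions.create(
    model="...",
    messages=[{"role": "user", "content": "Explain transformers"}],
    stream=True,
)

full_response = ""
for chunk in stream:
    if not chunk.choices:
        continue  # some chunks (e.g. usage-only) carry no choices
    content = chunk.choices[0].delta.content or ""
    full_response += content
    print(content, end="", flush=True)

Python (async)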
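import asyncio

from openai import AsyncOpenAI

async_client = AsyncOpenAI()  # reads OPENAI_API_KEY from the environment

async def stream_response():
    stream = await async_client.chat.completions.create(
        model="...",
        messages=[{"role": "user", "content": "Hello"}],
        stream=True,
    )
    async for chunk in stream:
        if not chunk.choices:
            continue  # skip chunks that carry no choices
        content = chunk.choices[0].delta.content or ""
        print(content, end="", flush=True)

asyncio.run(stream_response())

Node.js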
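import OpenAI from 'openai';

const client = new OpenAI(); // reads OPENAI_API_KEY from the environment

// Top-level await requires an ES module (e.g. a .mjs file)
const stream = await client.chat.completions.create({
  model: '...',
  messages: [{ role: 'user', content: 'Hello' }],
  stream: true,
});

for await (const chunk of stream) {
  const content = chunk.choices[0]?.delta?.content || '';
  process.stdout.write(content);
}

React (Next.js)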
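'use client'

import { useState } from 'react'

export function Chat() {
  const [output, setOutput] = useState('')
  const [loading, setLoading] = useState(false)

  async function handleSubmit(prompt: string) {
    setLoading(true)
    setOutput('')
    try {
      const response = await fetch('/api/chat', {
        method: 'POST',
        headers: { 'Content-Type': 'application/json' },
        body: JSON.stringify({ prompt }),
      })
      if (!response.ok || !response.body) {
        throw new Error(`Request failed: ${response.status}`)
      }
      const reader = response.body.getReader()
      const decoder = new TextDecoder()
      while (true) {
        const { done, value } = await reader.read()
        if (done) break
        // stream: true keeps multi-byte characters split across chunks intact
        const text = decoder.decode(value, { stream: true })
        setOutput(prev => prev + text)
      }
    } finally {
      setLoading(false) // clear the cursor even if the stream fails
    }
  }

  // handleSubmit still needs to be wired to a form or button
  return <div>{output}{loading && <span className="animate-pulse">▊</span>}</div>
}

Error handling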
Always handle stream interruptions:

try:
    for chunk in stream:
        content = chunk.choices[0].delta.content or ""
        process(content)  # process() is a placeholder for your own handler
except Exception as e:
    print(f"Stream interrupted: {e}")
    # Optionally retry or fall back to non-streaming

Best practices
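
Before the checklist, one practice worth sketching is the retry-or-fall-back option mentioned in the error-handling snippet. The helper below is only an illustration under stated assumptions: `stream_with_fallback`, `start_stream`, and `fetch_full` are hypothetical names, not part of any SDK. `start_stream` stands in for whatever opens your stream, and `fetch_full` for a non-streaming request. The sketch retries only while nothing has been shown yet, because a restarted generation may differ from text already on screen.

```python
import time
from typing import Callable, Iterable, Iterator


def stream_with_fallback(
    start_stream: Callable[[], Iterable[str]],  # hypothetical: opens a new stream
    fetch_full: Callable[[], str],              # hypothetical: non-streaming request
    max_attempts: int = 2,
    backoff_seconds: float = 0.5,
) -> Iterator[str]:
    """Yield streamed text; retry early failures, then fall back to a full response."""
    for attempt in range(max_attempts):
        yielded_any = False
        try:
            for piece in start_stream():
                yielded_any = True
                yield piece
            return  # the stream finished cleanly
        except Exception:
            if yielded_any:
                # Partial output is already visible; restarting could diverge from it.
                raise
            time.sleep(backoff_seconds * (2 ** attempt))  # exponential backoff
    # Every streaming attempt failed before the first token arrived.
    yield fetch_full()


# Demo with a stand-in stream that fails once before producing any text.
calls = {"n": 0}


def flaky_stream() -> Iterator[str]:
    calls["n"] += 1
    if calls["n"] == 1:
        raise ConnectionError("dropped before first token")
    yield from ["Hello", ", ", "world"]


result = "".join(
    stream_with_fallback(flaky_stream, lambda: "full response", backoff_seconds=0.01)
)
print(result)
```

Whether to retry after partial output is a product decision; re-raising keeps the sketch simple and leaves that choice to the caller.
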
- Always flush output: use flush=True in Python; in Node, process.stdout.write emits each chunk without appending a newline
- Show a cursor: display a blinking cursor while streaming for better UX
- Handle [DONE]: the raw SSE stream ends with data: [DONE]; a hand-written parser must treat it as a terminator rather than JSON (the SDK iterators above handle it for you)
- Set timeouts: if no token arrives within 30 seconds, treat the connection as possibly stale
- Buffer by word: for display, buffer until a space character for smoother text rendering
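
The [DONE] and buffer-by-word items can be sketched together as two small generators. This is a minimal illustration, not a full SSE parser: it assumes each event arrives as a single `data:` line whose JSON has the same `choices[0].delta.content` shape used in the examples above. The function names and the sample `raw_lines` are hypothetical.

```python
import json
from typing import Iterable, Iterator


def iter_sse_content(lines: Iterable[str]) -> Iterator[str]:
    """Yield content deltas from raw SSE lines, stopping at the [DONE] sentinel."""
    for line in lines:
        line = line.strip()
        if not line.startswith("data:"):
            continue  # skip blank separators, comments, and other SSE fields
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            return  # the sentinel is not JSON, so stop instead of parsing it
        chunk = json.loads(payload)
        choices = chunk.get("choices") or []
        if not choices:
            continue
        content = (choices[0].get("delta") or {}).get("content")
        if content:
            yield content


def buffer_by_word(tokens: Iterable[str]) -> Iterator[str]:
    """Regroup token fragments so each emitted piece ends at a space."""
    pending = ""
    for token in tokens:
        pending += token
        cut = pending.rfind(" ")
        if cut != -1:
            yield pending[: cut + 1]
            pending = pending[cut + 1:]
    if pending:
        yield pending  # flush the remainder when the stream ends


# Hypothetical wire data; a real stream would come from the HTTP response body.
raw_lines = [
    'data: {"choices": [{"delta": {"content": "Hel"}}]}',
    "",
    'data: {"choices": [{"delta": {"content": "lo wor"}}]}',
    "",
    'data: {"choices": [{"delta": {"content": "ld"}}]}',
    "",
    "data: [DONE]",
]

pieces = list(buffer_by_word(iter_sse_content(raw_lines)))
print(pieces)  # ['Hello ', 'world']
```

Buffering to the last space trades a little latency for steadier rendering; the final flush ensures the last word is not lost when the stream ends without a trailing space.
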