Spanlens SDK
Thin wrappers around the official OpenAI / Anthropic / Gemini SDKs that route traffic through Spanlens and add agent tracing primitives. Zero lock-in — response types and method signatures match the upstream SDKs 1:1. Available for TypeScript and Python.
⚡ Tip: use streaming for long responses
For requests with large max_tokens, slower models, or big JSON outputs, enable streaming: the first byte arrives in ~200ms and the stream can run for as long as the generation takes. If you still want a single object back, accumulate chunks server-side and return the merged result from your route handler. See the streaming example below.
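As a rough sketch of that server-side accumulation pattern (the handler name and shape here are illustrative, using the createOpenAI() helper described below):

```ts
import { createOpenAI } from '@spanlens/sdk/openai'

const openai = createOpenAI()

// Illustrative handler: stream from the model, but still return one merged object.
export async function handleChat(userMessage: string): Promise<{ text: string }> {
  const stream = await openai.chat.completions.create({
    model: 'gpt-4o-mini',
    messages: [{ role: 'user', content: userMessage }],
    stream: true, // first tokens arrive quickly instead of waiting for the full response
  })

  let text = ''
  for await (const chunk of stream) {
    text += chunk.choices[0]?.delta?.content ?? ''
  }
  return { text } // caller still receives a single object
}
```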
Install
npm install @spanlens/sdk
# or
pnpm add @spanlens/sdk
Provider SDKs are installed on demand. For TypeScript, install openai, @anthropic-ai/sdk, or @google/generative-ai alongside Spanlens. For Python, install the matching extras for the spanlens package.
createOpenAI() — proxy mode
Constructs the official provider client with base_url pointed at the Spanlens proxy and api_key set to your Spanlens key. Your real OpenAI key never leaves the Spanlens server.
import { createOpenAI } from '@spanlens/sdk/openai'
const openai = createOpenAI({
apiKey: process.env.SPANLENS_API_KEY, // optional — defaults to env
project: 'my-app', // optional — project scope
})
const res = await openai.chat.completions.create({
model: 'gpt-4o-mini',
messages: [{ role: 'user', content: 'Hi' }],
})
Options
| Option | Type | Default | Description |
|---|---|---|---|
| apiKey / api_key | string | SPANLENS_API_KEY env var | Your Spanlens API key (not your OpenAI key) |
| baseURL / base_url | string | Spanlens cloud proxy | Override for self-hosting |
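For self-hosted deployments, a minimal sketch of the baseURL override (the URL below is a placeholder, not a real endpoint):

```ts
import { createOpenAI } from '@spanlens/sdk/openai'

// Point the client at a self-hosted Spanlens proxy instead of the cloud default.
const openai = createOpenAI({
  apiKey: process.env.SPANLENS_API_KEY,
  baseURL: 'https://spanlens.internal.example.com', // placeholder self-hosted proxy URL
})
```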
createAnthropic()
import { createAnthropic } from '@spanlens/sdk/anthropic'
const anthropic = createAnthropic()
const msg = await anthropic.messages.create({
model: 'claude-3-5-sonnet-20241022',
max_tokens: 1024,
messages: [{ role: 'user', content: 'Hi' }],
})
createGemini()
Gemini doesn’t expose a per-instance base_url the way OpenAI and Anthropic do. In TypeScript we wrap GoogleGenerativeAI with a proxy. In Python the helper returns a pre-configured httpx.Client for raw REST calls; for the official Python SDK, use configure_gemini() instead.
import { createGemini } from '@spanlens/sdk/gemini'
const genAI = createGemini()
const model = genAI.getGenerativeModel({ model: 'gemini-1.5-flash' })
const result = await model.generateContent('Hi')
withPromptVersion() — tag a request with a prompt version
Link a logged request to a specific Prompts version so it appears in the A/B comparison table. Pass the helper as the second argument (TS) or unpack into kwargs (Python):
import { createOpenAI, withPromptVersion } from '@spanlens/sdk/openai'
const openai = createOpenAI()
const res = await openai.chat.completions.create(
{
model: 'gpt-4o-mini',
messages: [{ role: 'system', content: systemPromptV3 }, { role: 'user', content: userMsg }],
},
withPromptVersion('chatbot-system@3'),
)
Accepted formats:
- <name>@<version> — e.g. chatbot-system@3
- <name>@latest — auto-resolves server-side on every call
- Raw prompt_versions.id UUID
The same helper exists on the Anthropic integration. For Gemini and any non-SDK transport, set the header directly: x-spanlens-prompt-version: <id>.
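For example, a hedged sketch for a non-SDK transport; only the header name comes from above, while the proxy URL and request body are illustrative placeholders:

```ts
// Raw HTTP call through the proxy with a prompt-version tag.
// The URL and body shape are placeholders; only the header name is defined above.
const res = await fetch('https://proxy.spanlens.example/v1/chat/completions', {
  method: 'POST',
  headers: {
    'Content-Type': 'application/json',
    Authorization: `Bearer ${process.env.SPANLENS_API_KEY}`,
    'x-spanlens-prompt-version': 'chatbot-system@3', // or a raw prompt_versions.id UUID
  },
  body: JSON.stringify({
    model: 'gpt-4o-mini',
    messages: [{ role: 'user', content: 'Hi' }],
  }),
})
```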
observe() — agent tracing
Wrap any function to turn it into a span in an agent trace. The callback’s return value is automatically captured as the span’s output — no extra code needed. Pass input in the span options to record the inputs too.
import { SpanlensClient, observe } from '@spanlens/sdk'
const client = new SpanlensClient()
const trace = client.startTrace('answer-question')
const docs = await observe(
trace,
{ name: 'retrieve', spanType: 'retrieval', input: { query } },
async () => vectorDb.search(query), // return value → auto-saved as output
)
const response = await observe(
trace,
{ name: 'generate', spanType: 'llm' },
async () => openai.chat.completions.create({ /* ... */ }),
)
await trace.end()
Each observe() call creates a row in the spans table with timing, input/output, and a link to the parent trace. Inspect traces in /traces.
Streaming inside observe()
With stream: true you control the chunk loop, so pass the final token counts to span.end() once the stream is exhausted. The accumulated text you return is auto-captured as output.
Proxy users: output is automatic
If you route through the Spanlens proxy via createOpenAI(), createAnthropic(), or createGemini(), the proxy captures the completed response server-side and writes it to your span automatically — no extra code needed. Returning the accumulated text, as shown below, is the fallback for direct (non-proxy) calls.
const text = await observe(
trace,
{
name: 'gpt-4o-mini · analysis',
spanType: 'llm',
input: messages, // captured at span creation
},
async (span) => {
const stream = await openai.chat.completions.create({
model: 'gpt-4o-mini',
messages,
stream: true,
stream_options: { include_usage: true },
}, { headers: span.traceHeaders() })
let accumulated = ''
let usage: { prompt_tokens: number; completion_tokens: number; total_tokens: number } | null = null
for await (const chunk of stream) {
accumulated += chunk.choices[0]?.delta?.content ?? ''
if (chunk.usage) usage = chunk.usage
}
// Pass token counts manually — the SDK can't read streaming chunks
if (usage) {
await span.end({
status: 'completed',
promptTokens: usage.prompt_tokens,
completionTokens: usage.completion_tokens,
totalTokens: usage.total_tokens,
})
}
return accumulated // ← auto-saved as output; no need to pass output: here
},
)
observeOpenAI() — span + auto-parsed usage
Shorthand that wraps a single LLM call as a span, injects the trace headers so the proxy log can be linked to the span, and auto-parses usage from the response. Pass promptVersion in one shot:
import { observeOpenAI } from '@spanlens/sdk'
const res = await observeOpenAI(trace, 'greeting', (headers) =>
openai.chat.completions.create(
{ model: 'gpt-4o-mini', messages: [{ role: 'user', content: 'Hi' }] },
{ headers, ...withPromptVersion('greeter@latest') },
),
)
The same pattern works with observeAnthropic() / observe_anthropic() and observeGemini() / observe_gemini().
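For instance, a sketch of the Anthropic variant, assuming observeAnthropic() shares observeOpenAI()'s signature and import path:

```ts
import { observeAnthropic } from '@spanlens/sdk'

// Same shape as observeOpenAI: one span named 'greeting', trace headers injected,
// usage parsed from the response. Signature assumed to mirror observeOpenAI.
const msg = await observeAnthropic(trace, 'greeting', (headers) =>
  anthropic.messages.create(
    {
      model: 'claude-3-5-sonnet-20241022',
      max_tokens: 1024,
      messages: [{ role: 'user', content: 'Hi' }],
    },
    { headers },
  ),
)
```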
Low-level: trace + span handles
For complex flows (parallel spans, manual timing) use the handle-based API directly. Spans end automatically on context-exit in Python; in TypeScript call span.end() explicitly.
import { SpanlensClient } from '@spanlens/sdk'
const client = new SpanlensClient()
const trace = client.startTrace('multi-agent-workflow')
const spanA = trace.startSpan('agent-a')
const spanB = trace.startSpan('agent-b')
const [resA, resB] = await Promise.all([
runAgentA().then((r) => { spanA.end({ output: r }); return r }),
runAgentB().then((r) => { spanB.end({ output: r }); return r }),
])
await trace.end()
Non-blocking by design
Both SDKs make the actual ingest HTTP calls in the background: the TypeScript SDK uses the runtime’s native promise queue, while Python uses a small daemon thread pool. Either way, your hot path (the LLM call itself) is never delayed by Spanlens, and a slow or unreachable Spanlens server never crashes your app. Failures are swallowed by default; pass silent: false (TS) or silent=False (Python) plus an onError hook to surface them.
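A minimal sketch of surfacing those failures, assuming silent and onError are accepted as SpanlensClient constructor options (exact placement may differ by SDK version):

```ts
import { SpanlensClient } from '@spanlens/sdk'

// Surface background ingest failures instead of swallowing them silently.
const client = new SpanlensClient({
  silent: false, // assumed constructor option
  onError: (err: unknown) => {
    console.error('[spanlens] ingest failed:', err) // e.g. log or report to your error tracker
  },
})
```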
TypeScript & Python compatibility
- TypeScript SDK: Node 18+, Deno, Bun, Vercel Edge / Cloudflare Workers
- Python SDK: 3.9, 3.10, 3.11, 3.12, 3.13
Next: direct proxy for languages without an SDK, or self-hosting.
# Quick links
# • TypeScript: https://www.npmjs.com/package/@spanlens/sdk
# • Python: https://pypi.org/project/spanlens/