Spanlens SDK

Thin wrappers around the official OpenAI / Anthropic / Gemini SDKs that route traffic through Spanlens and add agent tracing primitives. Zero lock-in — response types and method signatures match the upstream SDKs 1:1. Available for TypeScript and Python.

⚡ Tip: use streaming for long responses

For requests with large max_tokens, slower models, or big JSON outputs, enable streaming: the first byte typically arrives in ~200 ms, and the total duration is unbounded, so long generations are not cut off by response timeouts. If you still want a single object back, accumulate the chunks server-side and return the merged result from your route handler. See the streaming example below.

Install

npm install @spanlens/sdk
# or
pnpm add @spanlens/sdk

Provider SDKs are not bundled; install only the ones you need. For TypeScript, install openai, @anthropic-ai/sdk, or @google/generative-ai alongside Spanlens. For Python, install the spanlens package with the matching provider extra.

createOpenAI() — proxy mode

Constructs the official provider client with base_url pointed at the Spanlens proxy and api_key set to your Spanlens key. Your real OpenAI key never leaves the Spanlens server.

import { createOpenAI } from '@spanlens/sdk/openai'

const openai = createOpenAI({
  apiKey: process.env.SPANLENS_API_KEY,   // optional — defaults to env
  project: 'my-app',                      // optional — project scope
})

const res = await openai.chat.completions.create({
  model: 'gpt-4o-mini',
  messages: [{ role: 'user', content: 'Hi' }],
})

Options

  • apiKey / api_key (string; default: the SPANLENS_API_KEY env var): your Spanlens API key, not your OpenAI key
  • baseURL / base_url (string; default: the Spanlens cloud proxy): override for self-hosting
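
For example, a self-hosted deployment overrides the base URL. A minimal sketch; the hostname below is a placeholder:

import { createOpenAI } from '@spanlens/sdk/openai'

// Self-hosted Spanlens: point the wrapper at your own proxy instead of the
// cloud default. The hostname is a placeholder for illustration.
const openai = createOpenAI({
  apiKey: process.env.SPANLENS_API_KEY,
  baseURL: 'https://spanlens.internal.example.com',
})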

createAnthropic()

import { createAnthropic } from '@spanlens/sdk/anthropic'

const anthropic = createAnthropic()

const msg = await anthropic.messages.create({
  model: 'claude-3-5-sonnet-20241022',
  max_tokens: 1024,
  messages: [{ role: 'user', content: 'Hi' }],
})

createGemini()

Gemini doesn’t expose a per-instance base_url the way OpenAI/Anthropic do. On TypeScript we wrap GoogleGenerativeAI with a proxy. On Python the helper returns a pre-configured httpx.Client for raw REST calls; for the official Python SDK use configure_gemini() instead.

import { createGemini } from '@spanlens/sdk/gemini'

const genAI = createGemini()
const model = genAI.getGenerativeModel({ model: 'gemini-1.5-flash' })

const result = await model.generateContent('Hi')

withPromptVersion() — tag a request with a prompt version

Link a logged request to a specific Prompts version so it appears in the A/B comparison table. Pass the helper as the second argument (TS) or unpack into kwargs (Python):

import { createOpenAI, withPromptVersion } from '@spanlens/sdk/openai'

const openai = createOpenAI()

const res = await openai.chat.completions.create(
  {
    model: 'gpt-4o-mini',
    messages: [{ role: 'system', content: systemPromptV3 }, { role: 'user', content: userMsg }],
  },
  withPromptVersion('chatbot-system@3'),
)

Accepted formats:

  • <name>@<version> — e.g. chatbot-system@3
  • <name>@latest — auto-resolves server-side on every call
  • Raw prompt_versions.id UUID

The same helper exists on the Anthropic integration. For Gemini and any non-SDK transport, set the header directly: x-spanlens-prompt-version: <id>.
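
For a non-SDK call, a minimal sketch: the SPANLENS_PROXY_URL variable and the Bearer-style auth below are assumptions for illustration, not documented endpoints.

// Non-SDK transport: set the prompt-version header on the request yourself.
// SPANLENS_PROXY_URL is a placeholder for your proxy endpoint; the auth scheme
// shown here is an assumption.
const res = await fetch(`${process.env.SPANLENS_PROXY_URL}/v1/chat/completions`, {
  method: 'POST',
  headers: {
    Authorization: `Bearer ${process.env.SPANLENS_API_KEY}`,
    'Content-Type': 'application/json',
    'x-spanlens-prompt-version': 'chatbot-system@3',  // or '<name>@latest', or a raw UUID
  },
  body: JSON.stringify({
    model: 'gpt-4o-mini',
    messages: [{ role: 'user', content: 'Hi' }],
  }),
})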

observe() — agent tracing

Wrap any function to turn it into a span in an agent trace. The callback’s return value is automatically captured as the span’s output — no extra code needed. Pass input in the span options to record the inputs too.

import { SpanlensClient, observe } from '@spanlens/sdk'

const client = new SpanlensClient()
const trace = client.startTrace('answer-question')

const docs = await observe(
  trace,
  { name: 'retrieve', spanType: 'retrieval', input: { query } },
  async () => vectorDb.search(query),   // return value → auto-saved as output
)

const response = await observe(
  trace,
  { name: 'generate', spanType: 'llm' },
  async () => openai.chat.completions.create({ /* ... */ }),
)

await trace.end()

Each observe() call creates a row in the spans table with timing, input/output, and a link to the parent trace. Inspect traces in /traces.

Streaming inside observe()

With stream: true you control the chunk loop, so pass the final token counts to span.end() once the stream is exhausted. The accumulated text you return is auto-captured as output.

Proxy users: output is automatic

If you route through the Spanlens proxy via createOpenAI(), createAnthropic(), or createGemini(), the proxy captures the completed response server-side and writes it to your span automatically, with no extra code needed. The accumulate-and-return pattern below is the fallback for direct (non-proxy) calls.

const text = await observe(
  trace,
  {
    name: 'gpt-4o-mini · analysis',
    spanType: 'llm',
    input: messages,           // captured at span creation
  },
  async (span) => {
    const stream = await openai.chat.completions.create({
      model: 'gpt-4o-mini',
      messages,
      stream: true,
      stream_options: { include_usage: true },
    }, { headers: span.traceHeaders() })

    let accumulated = ''
    let usage: { prompt_tokens: number; completion_tokens: number; total_tokens: number } | null = null

    for await (const chunk of stream) {
      accumulated += chunk.choices[0]?.delta?.content ?? ''
      if (chunk.usage) usage = chunk.usage
    }

    // Pass token counts manually — the SDK can't read streaming chunks
    if (usage) {
      await span.end({
        status: 'completed',
        promptTokens: usage.prompt_tokens,
        completionTokens: usage.completion_tokens,
        totalTokens: usage.total_tokens,
      })
    }

    return accumulated   // ← auto-saved as output; no need to pass output: here
  },
)

observeOpenAI() — span + auto-parsed usage

Shorthand that wraps a single LLM call as a span, injects the trace headers so the proxy log can be linked to the span, and auto-parses usage from the response. Attach a prompt version in the same call if you want one:

import { observeOpenAI } from '@spanlens/sdk'
import { withPromptVersion } from '@spanlens/sdk/openai'

const res = await observeOpenAI(trace, 'greeting', (headers) =>
  openai.chat.completions.create(
    { model: 'gpt-4o-mini', messages: [{ role: 'user', content: 'Hi' }] },
    { headers, ...withPromptVersion('greeter@latest') },
  ),
)

Same pattern works with observeAnthropic() / observe_anthropic() and observeGemini() / observe_gemini().
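
For instance, a sketch assuming observeAnthropic() takes the same (trace, name, callback) arguments as observeOpenAI() above:

import { observeAnthropic } from '@spanlens/sdk'

// Sketch: assumes the same signature as observeOpenAI() shown above, with the
// anthropic client and trace created as in the earlier examples.
const msg = await observeAnthropic(trace, 'greeting', (headers) =>
  anthropic.messages.create(
    {
      model: 'claude-3-5-sonnet-20241022',
      max_tokens: 1024,
      messages: [{ role: 'user', content: 'Hi' }],
    },
    { headers },
  ),
)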

Low-level: trace + span handles

For complex flows (parallel spans, manual timing) use the handle-based API directly. Spans end automatically on context-exit in Python; in TypeScript call span.end() explicitly.

import { SpanlensClient } from '@spanlens/sdk'

const client = new SpanlensClient()
const trace = client.startTrace('multi-agent-workflow')

const spanA = trace.startSpan('agent-a')
const spanB = trace.startSpan('agent-b')

const [resA, resB] = await Promise.all([
  runAgentA().then((r) => { spanA.end({ output: r }); return r }),
  runAgentB().then((r) => { spanB.end({ output: r }); return r }),
])

await trace.end()

Non-blocking by design

Both SDKs perform the actual ingest HTTP calls in the background: the TypeScript SDK uses the runtime’s native promise queue, while Python uses a small daemon thread pool. Either way, your hot path (the LLM call itself) is never delayed by Spanlens, and a slow or unreachable Spanlens server never crashes your app. Failures are swallowed by default; pass silent: false (TS) or silent=False (Python) plus an onError hook to surface them.
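
A minimal sketch, assuming silent and onError are accepted as client constructor options:

import { SpanlensClient } from '@spanlens/sdk'

// Surface ingest failures instead of swallowing them.
// Assumption: silent / onError are passed when constructing the client.
const client = new SpanlensClient({
  silent: false,
  onError: (err: Error) => console.error('[spanlens] ingest failed:', err),
})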

TypeScript & Python compatibility

  • TypeScript SDK: Node 18+, Deno, Bun, Vercel Edge / Cloudflare Workers
  • Python SDK: 3.9, 3.10, 3.11, 3.12, 3.13

Next: direct proxy for languages without an SDK, or self-hosting.

Quick links

  • TypeScript: https://www.npmjs.com/package/@spanlens/sdk
  • Python: https://pypi.org/project/spanlens/