LLM Providers

Tepa is LLM-agnostic. The LLMProvider interface has just two methods, complete() and getModels(), and abstracts away every provider-specific SDK, API shape, and authentication flow. Tepa ships with three built-in providers (Anthropic, OpenAI, Gemini), and you can add any other by extending BaseLLMProvider.

This section covers the provider interface, the three built-in providers and their options, native tool use, the provider logging system, and how to build a custom provider. For how providers fit into the broader package architecture, see How Tepa Works — Package Architecture.


Provider Interface

All provider types live in @tepa/types. The core interface is intentionally minimal:

LLMProvider

interface LLMProvider {
  complete(messages: LLMMessage[], options: LLMRequestOptions): Promise<LLMResponse>;
  getModels(): ModelInfo[];
}

Two methods. complete() is the LLM call. getModels() returns the provider's model catalog — the set of models it supports, each with metadata the Planner uses to make intelligent model assignments for individual steps. The pipeline never touches provider SDKs directly — it only talks through this interface.

ModelInfo

interface ModelInfo {
  id: string;
  description: string;
  tier: "fast" | "balanced" | "advanced";
  capabilities?: string[];
}

| Field | Description |
| --- | --- |
| id | Model identifier as passed to the provider API (e.g. "claude-sonnet-4-6") |
| description | Human-readable description rendered in the Planner's system prompt |
| tier | Capability tier: helps the Planner pick fast models for simple tasks and advanced models for complex reasoning |
| capabilities | Optional list (e.g. ["tool_use", "vision"]) for future programmatic filtering |

LLMMessage

interface LLMMessage {
  role: "user" | "assistant";
  content: string;
}

A simple role/content pair. System prompts are passed separately through LLMRequestOptions, not as messages.
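
If your application stores conversations in a chat format that includes system turns, they have to be separated out before calling complete(). The following is a minimal sketch of that split. ChatTurn and splitSystemPrompt are hypothetical names introduced for illustration and are not part of Tepa; the LLMMessage shape is copied from the interface above.

```typescript
// Local copy of the LLMMessage shape shown above, so the sketch is self-contained.
interface LLMMessage {
  role: "user" | "assistant";
  content: string;
}

// Hypothetical chat-style turn that may include system messages (not a Tepa type).
interface ChatTurn {
  role: "system" | "user" | "assistant";
  content: string;
}

// System turns become the systemPrompt option; everything else stays a message.
function splitSystemPrompt(turns: ChatTurn[]): {
  messages: LLMMessage[];
  systemPrompt?: string;
} {
  const systemParts: string[] = [];
  const messages: LLMMessage[] = [];
  for (const turn of turns) {
    if (turn.role === "system") {
      systemParts.push(turn.content);
    } else {
      messages.push({ role: turn.role, content: turn.content });
    }
  }
  return systemParts.length > 0
    ? { messages, systemPrompt: systemParts.join("\n\n") }
    : { messages };
}

const split = splitSystemPrompt([
  { role: "system", content: "You are a terse assistant." },
  { role: "user", content: "Summarise the README." },
]);
console.log(split.systemPrompt, split.messages.length);
```

The resulting messages array and systemPrompt string can then be passed as the first argument and as part of the options argument of complete(), respectively.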

LLMRequestOptions

interface LLMRequestOptions {
  model: string;
  maxTokens?: number;
  temperature?: number;
  systemPrompt?: string;
  tools?: ToolSchema[];
}

The tools field is how the pipeline passes tool schemas for native tool use. When present, the provider converts these schemas into its SDK's native format and includes them in the API call.

LLMResponse

interface LLMResponse {
  text: string;
  tokensUsed: {
    input: number;
    output: number;
  };
  finishReason: "end_turn" | "max_tokens" | "stop_sequence" | "tool_use";
  toolUse?: LLMToolUseBlock[];
}

Every provider maps its SDK-specific finish reasons to this standard enum. When finishReason is "tool_use", the toolUse array contains the parsed tool calls.
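
Because the finish reason is a closed set of four values, calling code can branch on it exhaustively. The sketch below shows one way to do that. NextAction and nextAction are hypothetical helpers written for this example, and the response types are local copies of the interfaces in this section; it is not how Tepa's pipeline is implemented.

```typescript
// Local copies of the response types documented in this section.
interface LLMToolUseBlock {
  id: string;
  name: string;
  input: Record<string, unknown>;
}

interface LLMResponse {
  text: string;
  tokensUsed: { input: number; output: number };
  finishReason: "end_turn" | "max_tokens" | "stop_sequence" | "tool_use";
  toolUse?: LLMToolUseBlock[];
}

// Hypothetical result type describing what the caller should do next.
type NextAction =
  | { kind: "done"; text: string }
  | { kind: "truncated"; text: string }
  | { kind: "call_tools"; calls: LLMToolUseBlock[] };

function nextAction(response: LLMResponse): NextAction {
  switch (response.finishReason) {
    case "tool_use":
      // toolUse is optional in the type, so fall back to an empty list defensively.
      return { kind: "call_tools", calls: response.toolUse ?? [] };
    case "max_tokens":
      // The model ran out of tokens; the text may be cut off mid-sentence.
      return { kind: "truncated", text: response.text };
    case "end_turn":
    case "stop_sequence":
      return { kind: "done", text: response.text };
  }
}

const action = nextAction({
  text: "",
  tokensUsed: { input: 12, output: 8 },
  finishReason: "tool_use",
  toolUse: [{ id: "call-1", name: "file_read", input: { path: "README.md" } }],
});
console.log(action.kind);
```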

LLMToolUseBlock

interface LLMToolUseBlock {
  id: string;
  name: string;
  input: Record<string, unknown>;
}

| Field | Description |
| --- | --- |
| id | Provider-assigned ID for correlating tool calls with results. |
| name | Name of the tool the LLM wants to call. |
| input | Parsed input parameters: already an object, not a JSON string. |

The input field is pre-parsed by the provider. The Executor passes it directly to tool.execute() without any JSON parsing step.


Built-in Providers

Anthropic

Package: @tepa/provider-anthropic
SDK: @anthropic-ai/sdk
Default model: claude-haiku-4-5

npm install @tepa/provider-anthropic

import { AnthropicProvider, AnthropicModels } from "@tepa/provider-anthropic";

const provider = new AnthropicProvider({
  apiKey: process.env.ANTHROPIC_API_KEY, // omit to read from env automatically
});

Model catalog: Claude_Haiku_4_5 (fast), Claude_Sonnet_4_6 (balanced), Claude_Opus_4_6 (advanced). Use AnthropicModels.* constants for type-safe config references.

Options:

| Option | Type | Default | Description |
| --- | --- | --- | --- |
| apiKey | string | ANTHROPIC_API_KEY env var | API key for authentication. |
| maxRetries | number | 3 | Max retries on transient or rate-limit errors. |
| retryBaseDelayMs | number | 1000 | Base delay in ms for exponential backoff. |
| defaultLog | boolean | true | Enable automatic JSONL file logging. |
| logDir | string | ".tepa/logs" | Directory for log files. |
| includeContent | boolean | false | Include full message content in logs. |

Retryable errors: Rate limit (429), internal server error (500), connection errors, overloaded (529).

Finish reason mapping:

| Anthropic | Tepa |
| --- | --- |
| "max_tokens" | "max_tokens" |
| "stop_sequence" | "stop_sequence" |
| "tool_use" | "tool_use" |
| "end_turn" / other | "end_turn" |

OpenAI

Package: @tepa/provider-openai
SDK: openai
API: Responses API
Default model: gpt-5-mini

npm install @tepa/provider-openai

import { OpenAIProvider, OpenAIModels } from "@tepa/provider-openai";

const provider = new OpenAIProvider({
  apiKey: process.env.OPENAI_API_KEY,
});

Model catalog: GPT_5_Mini (fast), GPT_5 (advanced). Use OpenAIModels.* constants for type-safe config references.

Options:

| Option | Type | Default | Description |
| --- | --- | --- | --- |
| apiKey | string | OPENAI_API_KEY env var | API key for authentication. |
| maxRetries | number | 3 | Max retries on transient or rate-limit errors. |
| retryBaseDelayMs | number | 1000 | Base delay in ms for exponential backoff. |
| defaultLog | boolean | true | Enable automatic JSONL file logging. |
| logDir | string | ".tepa/logs" | Directory for log files. |
| includeContent | boolean | false | Include full message content in logs. |

The OpenAI provider uses the Responses API (client.responses.create()), not the legacy Chat Completions API. System prompts are passed as a system-role input item, and tool calls are extracted from the function_call items in the response output.

Retryable errors: Rate limit (429), internal server error (500), connection errors.

Finish reason mapping:

| OpenAI | Tepa |
| --- | --- |
| "incomplete" | "max_tokens" |
| Tool calls in output | "tool_use" |
| Other / null | "end_turn" |

Gemini

Package: @tepa/provider-gemini
SDK: @google/genai
Default model: gemini-3-flash-preview

npm install @tepa/provider-gemini

import { GeminiProvider, GeminiModels } from "@tepa/provider-gemini";

const provider = new GeminiProvider({
  apiKey: process.env.GEMINI_API_KEY, // also reads GOOGLE_API_KEY
});

Model catalog: Gemini_3_Flash_Preview (fast), Gemini_3_Pro_Preview (advanced). Use GeminiModels.* constants for type-safe config references.

Options:

| Option | Type | Default | Description |
| --- | --- | --- | --- |
| apiKey | string | GEMINI_API_KEY or GOOGLE_API_KEY env var | API key for authentication. |
| maxRetries | number | 3 | Max retries on transient or rate-limit errors. |
| retryBaseDelayMs | number | 1000 | Base delay in ms for exponential backoff. |
| defaultLog | boolean | true | Enable automatic JSONL file logging. |
| logDir | string | ".tepa/logs" | Directory for log files. |
| includeContent | boolean | false | Include full message content in logs. |

Gemini maps "assistant" roles to "model" and passes system prompts via the SDK's systemInstruction config field. Tool calls are extracted from functionCall parts in the response, with synthetic IDs (gemini-call-0, gemini-call-1, ...) since the Gemini API doesn't assign call IDs.
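
The role mapping and system-prompt placement described above can be sketched as a pure conversion function. This is an illustration only, not the provider's actual code: buildGeminiRequest and the GeminiContent type are names made up for this example, and the request object is a plain-data approximation of what gets handed to the SDK.

```typescript
// Local copy of the LLMMessage shape documented earlier.
interface LLMMessage {
  role: "user" | "assistant";
  content: string;
}

// Approximate shape of one Gemini conversation entry: a role plus text parts.
interface GeminiContent {
  role: "user" | "model";
  parts: { text: string }[];
}

// Hypothetical helper: build the request data from Tepa-style inputs.
function buildGeminiRequest(model: string, messages: LLMMessage[], systemPrompt?: string) {
  const contents = messages.map(
    (m): GeminiContent => ({
      // Gemini calls the assistant side of the conversation "model".
      role: m.role === "assistant" ? "model" : "user",
      parts: [{ text: m.content }],
    }),
  );
  // The system prompt travels in the config, not in the contents array.
  const config = systemPrompt !== undefined ? { systemInstruction: systemPrompt } : {};
  return { model, contents, config };
}

const geminiRequest = buildGeminiRequest(
  "gemini-3-flash-preview",
  [
    { role: "user", content: "List the files." },
    { role: "assistant", content: "Which directory?" },
  ],
  "You are a careful assistant.",
);
console.log(JSON.stringify(geminiRequest, null, 2));
```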

Retryable errors: Rate limit (429), server errors (5xx), connection errors. Non-retryable: 400, 401, 403, 404.

Finish reason mapping:

| Gemini | Tepa |
| --- | --- |
| "MAX_TOKENS" | "max_tokens" |
| Function calls in response | "tool_use" |
| "STOP" / other | "end_turn" |

Native Tool Use

All three providers use native tool use — the LLM's built-in function calling capability — rather than embedding tool descriptions in the prompt and parsing JSON from the response.

How It Works

When a plan step declares tools, the Executor:

  1. Builds tool schemas from the tool registry and passes them in LLMRequestOptions.tools
  2. The provider converts ToolSchema[] to its SDK's native format
  3. The LLM responds with structured tool call blocks instead of free-form text
  4. The provider extracts tool calls into LLMToolUseBlock[] with pre-parsed parameters
  5. The Executor invokes the tool directly with the parsed input object — no JSON.parse needed
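
The last step of the flow above can be sketched as a small dispatch loop. The Tool and ToolResult shapes and the runToolCalls function are assumptions made for this example; Tepa's real tool interface and Executor may differ. The point it illustrates is that the input object goes to the tool as-is, with no JSON.parse in between.

```typescript
// Local copy of the LLMToolUseBlock shape documented earlier.
interface LLMToolUseBlock {
  id: string;
  name: string;
  input: Record<string, unknown>;
}

// Hypothetical tool shape; Tepa's actual tool interface may differ.
interface Tool {
  name: string;
  execute(input: Record<string, unknown>): Promise<unknown>;
}

// Hypothetical result record, keyed by the provider-assigned call ID.
interface ToolResult {
  id: string;
  name: string;
  output?: unknown;
  error?: string;
}

async function runToolCalls(
  calls: LLMToolUseBlock[],
  registry: Map<string, Tool>,
): Promise<ToolResult[]> {
  const results: ToolResult[] = [];
  for (const call of calls) {
    const tool = registry.get(call.name);
    if (!tool) {
      results.push({ id: call.id, name: call.name, error: `Unknown tool: ${call.name}` });
      continue;
    }
    try {
      // call.input is already a parsed object, so it is passed straight through.
      const output = await tool.execute(call.input);
      results.push({ id: call.id, name: call.name, output });
    } catch (err) {
      const message = err instanceof Error ? err.message : String(err);
      results.push({ id: call.id, name: call.name, error: message });
    }
  }
  return results;
}

const echoTool: Tool = { name: "echo", execute: async (input) => input["text"] };
const toolRegistry = new Map<string, Tool>();
toolRegistry.set(echoTool.name, echoTool);

const pendingResults = runToolCalls(
  [
    { id: "call-1", name: "echo", input: { text: "hello" } },
    { id: "call-2", name: "missing_tool", input: {} },
  ],
  toolRegistry,
);
pendingResults.then((results) => console.log(results));
```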

Why It Matters

Text-based tool calling requires the LLM to produce valid JSON inside its response, which is fragile:

  • Escaping errors — large file contents with quotes, newlines, or special characters break JSON parsing
  • Format drift — the LLM might wrap the JSON in markdown code fences or add commentary
  • Partial output — token limits can truncate the JSON mid-object

Native tool use eliminates all of these. The provider SDK handles serialisation and the parameters arrive as a ready-to-use object. Every built-in provider uses this approach — there is no fallback to text parsing.

Schema Conversion by Provider

Each provider converts ToolSchema to its SDK's expected format internally. You pass a single ToolSchema[] and the provider does the rest:

Anthropic uses input_schema with a JSON Schema object:

{ "name": "file_read", "description": "...", "input_schema": { "type": "object", "properties": { ... }, "required": [...] } }

OpenAI uses a function type with a parameters object:

{ "type": "function", "name": "file_read", "description": "...", "parameters": { "type": "object", "properties": { ... }, "required": [...] } }

Gemini uses a functionDeclarations array with uppercase types:

{ "functionDeclarations": [{ "name": "file_read", "description": "...", "parameters": { "type": "OBJECT", "properties": { ... }, "required": [...] } }] }
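
To make the three target shapes concrete, here is a rough sketch of the conversions. The ToolSchema shape below is an assumption (this page does not define it), the three function names are made up for the example, and only flat, non-nested parameters are handled. The built-in providers' real conversion code may differ.

```typescript
// ASSUMED shapes: the real ToolSchema in @tepa/types may look different.
interface JsonSchemaProperty {
  type: string;
  description?: string;
}

interface JsonSchemaObject {
  type: "object";
  properties: Record<string, JsonSchemaProperty>;
  required?: string[];
}

interface ToolSchema {
  name: string;
  description: string;
  parameters: JsonSchemaObject;
}

// Anthropic: the JSON Schema goes under input_schema.
function toAnthropicTool(tool: ToolSchema) {
  return { name: tool.name, description: tool.description, input_schema: tool.parameters };
}

// OpenAI: a "function" tool with the schema under parameters.
function toOpenAITool(tool: ToolSchema) {
  return {
    type: "function" as const,
    name: tool.name,
    description: tool.description,
    parameters: tool.parameters,
  };
}

// Gemini: one functionDeclarations array, with type names upper-cased.
function toGeminiTools(tools: ToolSchema[]) {
  return {
    functionDeclarations: tools.map((tool) => {
      const properties: Record<string, JsonSchemaProperty> = {};
      for (const key of Object.keys(tool.parameters.properties)) {
        const prop = tool.parameters.properties[key];
        if (prop) properties[key] = { ...prop, type: prop.type.toUpperCase() };
      }
      return {
        name: tool.name,
        description: tool.description,
        parameters: { type: "OBJECT", properties, required: tool.parameters.required ?? [] },
      };
    }),
  };
}

const fileRead: ToolSchema = {
  name: "file_read",
  description: "Read a file from disk.",
  parameters: {
    type: "object",
    properties: { path: { type: "string", description: "Path to the file." } },
    required: ["path"],
  },
};

const anthropicTool = toAnthropicTool(fileRead);
const openaiTool = toOpenAITool(fileRead);
const geminiTools = toGeminiTools([fileRead]);
console.log(JSON.stringify(geminiTools));
```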

Provider Logging System

Every provider built on BaseLLMProvider — including all three built-ins — automatically logs every LLM call to a JSONL file and optionally to custom listeners. This is one of Tepa's most useful operational features: a complete, structured audit trail of every request and response, available out of the box with zero configuration.

Default File Logging

Each provider instance writes a JSONL log file at .tepa/logs/llm-{timestamp}.jsonl, with one LLMLogEntry per line. File logging is on by default; disable it with defaultLog: false or move it with logDir:

import { AnthropicProvider } from "@tepa/provider-anthropic";

// The three constructions below are alternatives; pick the one you need.

// Default: logs to .tepa/logs/llm-{timestamp}.jsonl
const defaultProvider = new AnthropicProvider({ apiKey: "..." });

// Disable file logging entirely
const silentProvider = new AnthropicProvider({ apiKey: "...", defaultLog: false });

// Custom log directory
const customDirProvider = new AnthropicProvider({ apiKey: "...", logDir: "./my-logs" });

LLMLogEntry

Every entry captures the full context of an LLM call:

interface LLMLogEntry {
  timestamp: string;
  provider: string; // "anthropic", "openai", "gemini"
  status: "success" | "error" | "retry";
  durationMs: number;
  attempt: number; // 0-based attempt number
  request: {
    model: string;
    messageCount: number;
    totalCharLength: number;
    promptPreview: string; // First 120 chars of the last message
    maxTokens?: number;
    temperature?: number;
    hasSystemPrompt: boolean;
    hasTools?: boolean;
    messages?: LLMMessage[]; // Only if includeContent: true
    systemPrompt?: string; // Only if includeContent: true
  };
  response?: {
    // Present on "success"
    text: string;
    tokensUsed: { input: number; output: number };
    finishReason: string;
    toolUseCount?: number;
  };
  error?: {
    // Present on "error" and "retry"
    message: string;
    retryable: boolean;
  };
}

A "retry" entry indicates the call failed but will be retried. A "success" entry includes the full response. An "error" entry indicates the final failure after all retries are exhausted.
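
Since the log file holds one JSON object per line, it can be analysed offline with a few lines of code. The sketch below parses JSONL text and tallies entries by status. LogEntryLite, parseJsonl, and countByStatus are names invented for this example, and only the fields the sketch reads are declared.

```typescript
type LogStatus = "success" | "error" | "retry";

// Only the fields this sketch reads; the full LLMLogEntry is documented above.
interface LogEntryLite {
  status: LogStatus;
  durationMs: number;
  attempt: number;
}

// Parse JSONL text: one JSON object per non-empty line.
function parseJsonl(text: string): LogEntryLite[] {
  return text
    .split("\n")
    .map((line) => line.trim())
    .filter((line) => line.length > 0)
    .map((line) => JSON.parse(line) as LogEntryLite);
}

// Tally entries by status.
function countByStatus(entries: LogEntryLite[]): Record<LogStatus, number> {
  const counts: Record<LogStatus, number> = { success: 0, error: 0, retry: 0 };
  for (const entry of entries) {
    counts[entry.status] += 1;
  }
  return counts;
}

// Made-up sample data standing in for the contents of a log file.
const sampleLog = [
  JSON.stringify({ status: "retry", durationMs: 812, attempt: 0 }),
  JSON.stringify({ status: "success", durationMs: 1540, attempt: 1 }),
  JSON.stringify({ status: "success", durationMs: 990, attempt: 0 }),
].join("\n");

const statusCounts = countByStatus(parseJsonl(sampleLog));
console.log(statusCounts);
```

In a real script, the text would come from reading the log file on disk rather than from an inline sample.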

Accessing Logs After a Run

Providers accumulate entries in memory throughout a run. Access them via the provider instance after tepa.run() completes:

const result = await tepa.run(prompt);

const entries = provider.getLogEntries();
console.log(`Total LLM calls: ${entries.length}`);
console.log(`Retries: ${entries.filter((e) => e.status === "retry").length}`);
console.log(`Failed: ${entries.filter((e) => e.status === "error").length}`);

// Path to the JSONL file on disk
const logPath = provider.getLogFilePath();
console.log(`Full logs at: ${logPath}`);

Custom Log Listeners

Register custom callbacks with onLog() to process entries in real time — useful for streaming metrics to monitoring platforms or triggering alerts on errors:

const provider = new AnthropicProvider({ apiKey: "..." });

// Alert on errors
provider.onLog((entry) => {
  if (entry.status === "error") {
    alertOncall(`LLM error: ${entry.error?.message}`);
  }
});

// Prometheus-style metrics
provider.onLog((entry) => {
  llmCallsTotal.inc({ provider: entry.provider, status: entry.status });
  llmDurationMs.observe({ provider: entry.provider }, entry.durationMs);

  if (entry.response) {
    llmTokensTotal.inc(
      { provider: entry.provider, direction: "input" },
      entry.response.tokensUsed.input,
    );
    llmTokensTotal.inc(
      { provider: entry.provider, direction: "output" },
      entry.response.tokensUsed.output,
    );
  }
});

Multiple listeners can be registered. Each receives every log entry.

Built-in Log Callbacks

@tepa/provider-core exports two ready-made handlers:

consoleLogCallback — Formats entries for console output with timing and preview:

import { consoleLogCallback } from "@tepa/provider-core";

provider.onLog(consoleLogCallback);
// [2026-03-15T10:30:00.000Z] anthropic success (1234ms) model=claude-haiku-4-5 tokens=150+200

createFileLogWriter — Creates a JSONL writer for a custom path:

import { createFileLogWriter } from "@tepa/provider-core";

const writer = createFileLogWriter("./custom-logs/anthropic.jsonl");
provider.onLog(writer.callback);
writer.close(); // Close when done

Privacy Controls

By default, log entries do not include full message content or system prompts — only metadata: message count, character length, and a 120-character preview. Set includeContent: true to include full content for debugging:

const provider = new AnthropicProvider({
  apiKey: "...",
  includeContent: true, // Not recommended in production
});

When includeContent is true, the request object includes the full messages array and systemPrompt string. When false (the default), these fields are omitted.

Token Usage & Cost

Every provider extracts the token counts the underlying SDK reports, including prompt-cache fields when present:

interface LLMTokensUsed {
  input: number;
  output: number;
  cacheRead?: number; // Anthropic cache hits, OpenAI cached prompt tokens, Gemini cached content tokens
  cacheWrite?: number; // Anthropic only (cache_creation_input_tokens)
}

cacheRead / cacheWrite appear on both LLMResponse.tokensUsed and LLMLogEntry.response.tokensUsed, so they flow into custom onLog handlers automatically.

ModelInfo also carries an optional cost: ModelPricing field (the ModelInfo listing earlier in this section omits it). Provider packages may ship best-effort pricing for their built-in models, and you can attach pricing to any custom model when you register it:

interface ModelPricing {
  inputPer1M: number;
  outputPer1M: number;
  cacheReadPer1M?: number;
  cacheWritePer1M?: number;
  currency?: string; // defaults to "USD"
}

Pricing data goes stale; treat shipped values as a starting point and override per-instance for production billing (see the bridge pricing option below).
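
As a worked example of the per-million arithmetic, the sketch below turns one call's token counts into an estimated cost. estimateCost is a hypothetical helper, not a Tepa export. It assumes that input does not already include the cached tokens and that cache tokens fall back to the input rate when no cache price is given; check both assumptions against your provider's billing rules before relying on the numbers.

```typescript
// Local copies of the two interfaces documented in this section.
interface LLMTokensUsed {
  input: number;
  output: number;
  cacheRead?: number;
  cacheWrite?: number;
}

interface ModelPricing {
  inputPer1M: number;
  outputPer1M: number;
  cacheReadPer1M?: number;
  cacheWritePer1M?: number;
  currency?: string;
}

// Hypothetical helper: cost = tokens * price-per-million / 1,000,000, summed per bucket.
function estimateCost(
  tokens: LLMTokensUsed,
  pricing: ModelPricing,
): { total: number; currency: string } {
  const per1M = (count: number, price: number): number => (count * price) / 1e6;
  const total =
    per1M(tokens.input, pricing.inputPer1M) +
    per1M(tokens.output, pricing.outputPer1M) +
    // ASSUMPTION: without an explicit cache price, bill cache tokens at the input rate.
    per1M(tokens.cacheRead ?? 0, pricing.cacheReadPer1M ?? pricing.inputPer1M) +
    per1M(tokens.cacheWrite ?? 0, pricing.cacheWritePer1M ?? pricing.inputPer1M);
  return { total, currency: pricing.currency ?? "USD" };
}

// Illustrative numbers only, not real prices.
const estimate = estimateCost(
  { input: 500000, output: 100000 },
  { inputPer1M: 3, outputPer1M: 15 },
);
console.log(`${estimate.total.toFixed(2)} ${estimate.currency}`);
```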

Pairing with llmvantage for Cost & Cross-SDK Observability

Tepa's provider logs are pipeline-aware — they capture concepts like retry status, attempt number, normalized finish reasons, and tool-use counts that only exist above the HTTP layer. They are not, however, the right place for raw token-cost accounting across every LLM call your process makes (including any non-Tepa calls in the same app).

For that, llmvantage is a good fit. It patches global fetch and captures the underlying request/response for Anthropic, OpenAI, and Gemini SDKs — which is exactly what Tepa's providers call under the hood. The two layers compose without any glue code:

// 1. Foundation: cost & raw-HTTP observability for any LLM traffic in the process.
import "llmvantage";
import { observer } from "llmvantage";
import { consoleSink } from "llmvantage/sinks/console";

observer.pipe(consoleSink);

// 2. Tepa layer: pipeline-aware structured logs (retries, attempts, tool use).
import { AnthropicProvider } from "@tepa/provider-anthropic";

const provider = new AnthropicProvider({
  apiKey: process.env.ANTHROPIC_API_KEY!,
  defaultLog: false, // avoid double-writing to disk if llmvantage already has a file sink
});

provider.onLog((entry) => {
  if (entry.status === "retry") {
    console.warn(`retry #${entry.attempt}: ${entry.error?.message}`);
  }
});

Which layer captures what:

| Concern | Use llmvantage | Use Tepa onLog |
| --- | --- | --- |
| Token totals & cost rollups across all LLM calls | ✓ | |
| Raw request/response bodies for replay | ✓ | |
| PII redaction at the HTTP boundary | ✓ | |
| Retry status, attempt number | | ✓ |
| Normalized finish reasons across providers | | ✓ |
| Tool-use counts per call | | ✓ |
| Per-provider model catalog correlation | | ✓ |

One HTTP attempt corresponds to one llmvantage event; one Tepa complete() may emit multiple log entries (one per retry plus a terminal success/error). The mapping is intentionally not 1-to-1 — each layer reflects what's true at its layer. If both layers write to disk, set defaultLog: false on the provider (or skip the llmvantage file sink) to avoid duplicate JSONL output.
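
To line the two layers up, it can help to regroup Tepa's log entries into one bucket per complete() call. The sketch below does this under a stated assumption: entries arrive in order and calls on the same provider are not interleaved, so every "success" or "error" entry closes the current call. groupByCall and LogEntryLite are hypothetical names for this example.

```typescript
// Only the fields this sketch reads from LLMLogEntry.
interface LogEntryLite {
  status: "success" | "error" | "retry";
  attempt: number;
}

// Group entries into calls: zero or more "retry" entries followed by one terminal entry.
// ASSUMPTION: entries are in order and calls are not interleaved.
function groupByCall(entries: LogEntryLite[]): LogEntryLite[][] {
  const calls: LogEntryLite[][] = [];
  let current: LogEntryLite[] = [];
  for (const entry of entries) {
    current.push(entry);
    if (entry.status !== "retry") {
      calls.push(current);
      current = [];
    }
  }
  // Trailing retries without a terminal entry belong to a call still in flight.
  if (current.length > 0) calls.push(current);
  return calls;
}

const groupedCalls = groupByCall([
  { status: "retry", attempt: 0 },
  { status: "retry", attempt: 1 },
  { status: "success", attempt: 2 },
  { status: "success", attempt: 0 },
]);
console.log(groupedCalls.map((call) => call.length));
```

Under that assumption, each inner array corresponds to one complete() call, and its length is the number of HTTP attempts you would expect to see as separate llmvantage events.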

@tepa/observability-llmvantage

For tighter integration without coupling the core packages, install the optional adapter:

npm install @tepa/observability-llmvantage llmvantage

It exposes two pieces:

1. createLlmvantageBridge — cost rollups from Tepa's onLog. Wire it into any provider and call summary() after the run for per-provider, per-model totals:

import { createLlmvantageBridge, defaultPricing } from "@tepa/observability-llmvantage";
import { AnthropicProvider } from "@tepa/provider-anthropic";

const bridge = createLlmvantageBridge({
  pricing: {
    ...defaultPricing,
    anthropic: {
      ...defaultPricing.anthropic,
      // Override stale defaults, or add a model the provider package doesn't ship yet
      "claude-sonnet-4-6": {
        inputPer1M: 3,
        outputPer1M: 15,
        cacheReadPer1M: 0.3,
        cacheWritePer1M: 3.75,
      },
    },
  },
});

const provider = new AnthropicProvider({ apiKey: process.env.ANTHROPIC_API_KEY! });
provider.onLog(bridge.callback);

await tepa.run(prompt);

const summary = bridge.summary();
// {
//   calls, retries, errors,
//   tokens: { input, output, cacheRead, cacheWrite },
//   cost: { total: 0.0234, currency: "USD" },
//   byModel: { "anthropic:claude-sonnet-4-6": { calls, tokens, cost } },
//   byProvider: { anthropic: { calls, tokens, cost } },
//   pricingMissing: []  // provider:model pairs with no pricing entry
// }

Pricing resolution order (highest to lowest): BridgeOptions.pricing, then the defaultPricing shipped by the adapter. ModelInfo.cost on each provider's model catalog is reserved for v2; for now, supply overrides explicitly via pricing. Use ignoreDefaultPricing: true to bypass the shipped snapshot entirely.

2. tagCost — llmvantage plugin that enriches raw fetch events. Useful if you want sinks (file, HTTP shipper, console) to receive cost and normalized tokens per HTTP call:

import "llmvantage";
import { observer } from "llmvantage";
import { consoleSink } from "llmvantage/sinks/console";
import { tagCost } from "@tepa/observability-llmvantage";

observer
  .use(
    tagCost({
      pricing: {
        /* same overrides */
      },
    }),
  )
  .pipe(consoleSink);

The plugin parses Anthropic / OpenAI / Gemini response bodies to extract tokens (including cache fields) and attaches { cost: { value, currency, pricingKnown }, tokens } to each event before downstream sinks see it. Use the bridge for Tepa-aware aggregation; use the plugin when you want cost visible inside the llmvantage pipeline itself.


Creating a Custom Provider

Adding a new LLM provider means extending BaseLLMProvider from @tepa/provider-core and implementing four methods plus a model catalog. By extending rather than implementing LLMProvider directly, your provider gets retry logic, exponential backoff, rate limit handling, the full logging system, and getModels() for free.

The Required Members

import { BaseLLMProvider, type BaseLLMProviderOptions } from "@tepa/provider-core";
import type { LLMMessage, LLMRequestOptions, LLMResponse, ModelInfo } from "@tepa/types";

class MyProvider extends BaseLLMProvider {
  protected readonly providerName = "my-provider";

  // Required: declare the models this provider supports
  protected readonly models: ModelInfo[] = [
    { id: "my-model-fast", tier: "fast", description: "Fast and cheap for simple tasks." },
    { id: "my-model-pro", tier: "advanced", description: "Most capable for complex reasoning." },
  ];

  constructor(options: { apiKey: string } & BaseLLMProviderOptions) {
    super(options);
    // Initialize your SDK client
  }

  // Required: make the API call, return a normalised LLMResponse
  protected async doComplete(
    messages: LLMMessage[],
    options: LLMRequestOptions,
  ): Promise<LLMResponse> {
    // Convert messages and options to your SDK's format
    // Make the API call
    // Map finish reasons to the standard enum
    // Extract tool use blocks if present
    // Return LLMResponse
    throw new Error("doComplete() is not implemented yet"); // placeholder so the skeleton compiles
  }

  // Required: true for transient errors that should be retried (500s, network errors)
  protected isRetryable(error: unknown): boolean {
    return false; // placeholder: inspect your SDK's error type here
  }

  // Required: true specifically for rate limit errors (gets 30x longer backoff)
  protected isRateLimitError(error: unknown): boolean {
    return false; // placeholder: typically a check for HTTP 429
  }

  // Required: extract Retry-After header value in ms, or return null
  protected getRetryAfterMs(error: unknown): number | null {
    return null; // placeholder: parse the header if your SDK exposes it
  }
}

BaseLLMProvider wraps doComplete() in the retry loop and exposes getModels() from your models array automatically — you implement the API call and catalog, the framework handles the rest.
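
The three error-classification methods are usually short. The sketch below shows one plausible implementation as standalone functions so it can run without any SDK. The HttpLikeError shape (status, code, lower-cased headers) is an assumption made for the example; real SDK errors expose this information differently, so adapt the field access to your client. Whether isRetryable() must also return true for rate-limit errors is not stated on this page; the sketch returns true for 429 to match the retryable lists of the built-in providers.

```typescript
// ASSUMED error shape for illustration; real SDK error classes differ.
interface HttpLikeError {
  status?: number;
  code?: string;
  headers?: Record<string, string>;
}

function asHttpLikeError(error: unknown): HttpLikeError | null {
  return typeof error === "object" && error !== null ? (error as HttpLikeError) : null;
}

// Rate limits get the longer backoff, so they are detected separately.
function isRateLimitError(error: unknown): boolean {
  return asHttpLikeError(error)?.status === 429;
}

// Transient failures: rate limits, server errors, and common network error codes.
function isRetryable(error: unknown): boolean {
  const e = asHttpLikeError(error);
  if (!e) return false;
  if (e.status === undefined) {
    return e.code === "ECONNRESET" || e.code === "ETIMEDOUT";
  }
  return e.status === 429 || e.status >= 500;
}

// Retry-After given in seconds is converted to ms; anything else yields null.
// (The HTTP-date form of Retry-After is deliberately not handled in this sketch.)
function getRetryAfterMs(error: unknown): number | null {
  const raw = asHttpLikeError(error)?.headers?.["retry-after"];
  if (raw === undefined || raw.trim() === "") return null;
  const seconds = Number(raw);
  return Number.isFinite(seconds) && seconds >= 0 ? Math.round(seconds * 1000) : null;
}

const rateLimited = { status: 429, headers: { "retry-after": "2" } };
console.log(isRetryable(rateLimited), isRateLimitError(rateLimited), getRetryAfterMs(rateLimited));
```

Inside a provider class, the same logic would live in the protected methods of the same names.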

BaseLLMProviderOptions

interface BaseLLMProviderOptions {
  maxRetries?: number; // Default: 3
  retryBaseDelayMs?: number; // Default: 1000
  defaultLog?: boolean; // Default: true
  logDir?: string; // Default: ".tepa/logs"
  includeContent?: boolean; // Default: false
}

Retry and Backoff Behaviour

The retry loop runs from attempt 0 through maxRetries inclusive — so maxRetries: 3 means up to 4 total attempts. Backoff delay depends on error type:

| Error type | Delay formula |
| --- | --- |
| Transient error | retryBaseDelayMs × 2^attempt |
| Rate limit error | retryBaseDelayMs × 30 × 2^attempt |

If the API returns a Retry-After header (via getRetryAfterMs()), that value takes precedence over the calculated delay.

Example with defaults (retryBaseDelayMs: 1000):

| Attempt | Transient delay | Rate limit delay |
| --- | --- | --- |
| 0 | 1s | 30s |
| 1 | 2s | 60s |
| 2 | 4s | 120s |
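
The delay rules above fit in a few lines. The function below is a sketch that reproduces the documented formulas and the Retry-After precedence; backoffDelayMs is a name chosen for this example and is not necessarily how BaseLLMProvider structures its code.

```typescript
// Delay before the next attempt, following the documented formulas:
//   transient:  base × 2^attempt
//   rate limit: base × 30 × 2^attempt
// A Retry-After value, when present, overrides the calculated delay.
function backoffDelayMs(
  attempt: number,
  baseDelayMs: number,
  isRateLimit: boolean,
  retryAfterMs: number | null = null,
): number {
  if (retryAfterMs !== null) return retryAfterMs;
  const multiplier = isRateLimit ? 30 : 1;
  return baseDelayMs * multiplier * Math.pow(2, attempt);
}

// Print the delays for the first three attempts with the default base delay.
for (let attempt = 0; attempt < 3; attempt++) {
  const transient = backoffDelayMs(attempt, 1000, false);
  const rateLimit = backoffDelayMs(attempt, 1000, true);
  console.log(`attempt ${attempt}: transient ${transient} ms, rate limit ${rateLimit} ms`);
}
```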

Key Implementation Notes

  • Tool schemas — if your LLM supports native function calling, convert ToolSchema[] to the SDK's format in doComplete(). See Native Tool Use above for the conversion patterns used by the built-in providers.
  • Finish reasons — map your SDK's stop reasons to the four standard values: "end_turn", "max_tokens", "stop_sequence", "tool_use". Some SDKs don't set a dedicated tool-use finish reason — detect tool calls in the response and override the reason accordingly.
  • Synthetic IDs — if the API doesn't assign IDs to tool calls (like Gemini), generate them: my-provider-call-0, my-provider-call-1, etc.
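
The finish-reason and synthetic-ID notes above can be combined into one normalisation step. The sketch below assumes a made-up RawResponse shape with a stopReason string and ID-less tool calls, and made-up raw reason values such as "length"; substitute whatever your SDK actually returns.

```typescript
type FinishReason = "end_turn" | "max_tokens" | "stop_sequence" | "tool_use";

// Local copy of the LLMToolUseBlock shape documented earlier.
interface LLMToolUseBlock {
  id: string;
  name: string;
  input: Record<string, unknown>;
}

// HYPOTHETICAL raw SDK response: tool calls carry no IDs.
interface RawResponse {
  stopReason: string | null;
  toolCalls: { name: string; args: Record<string, unknown> }[];
}

function normalise(
  raw: RawResponse,
  providerName: string,
): { finishReason: FinishReason; toolUse?: LLMToolUseBlock[] } {
  // Generate synthetic IDs of the form <provider>-call-<index>.
  const toolUse = raw.toolCalls.map((call, index) => ({
    id: `${providerName}-call-${index}`,
    name: call.name,
    input: call.args,
  }));
  // Tool calls override whatever stop reason the SDK reported.
  if (toolUse.length > 0) return { finishReason: "tool_use", toolUse };
  switch (raw.stopReason) {
    case "length": // hypothetical raw value meaning "ran out of tokens"
      return { finishReason: "max_tokens" };
    case "stop_sequence":
      return { finishReason: "stop_sequence" };
    default:
      return { finishReason: "end_turn" };
  }
}

const normalised = normalise(
  {
    stopReason: "stop",
    toolCalls: [
      { name: "file_read", args: { path: "README.md" } },
      { name: "file_write", args: { path: "out.txt", content: "done" } },
    ],
  },
  "my-provider",
);
console.log(normalised.finishReason, normalised.toolUse?.map((t) => t.id));
```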

Minimal Provider (Without BaseLLMProvider)

If you don't need retry logic or logging, implement LLMProvider directly:

import type {
  LLMProvider,
  LLMMessage,
  LLMRequestOptions,
  LLMResponse,
  ModelInfo,
} from "@tepa/types";

const myProvider: LLMProvider = {
  async complete(messages, options): Promise<LLMResponse> {
    // Make the API call and return an LLMResponse
    throw new Error("complete() is not implemented yet"); // placeholder so the stub compiles
  },
  getModels(): ModelInfo[] {
    return [{ id: "my-model", tier: "balanced", description: "My custom model." }];
  },
};

Useful for testing, mocking, or wrapping a provider you've already built with its own retry logic.
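
For the testing case, a provider that replays canned replies is often enough. The sketch below is one way to write it. createMockProvider and its calls array are inventions for this example, and the interfaces are trimmed local copies of the ones documented in this section (tools is typed loosely because ToolSchema is not defined on this page).

```typescript
// Trimmed local copies of the documented interfaces.
interface LLMMessage { role: "user" | "assistant"; content: string; }
interface LLMRequestOptions {
  model: string;
  maxTokens?: number;
  temperature?: number;
  systemPrompt?: string;
  tools?: unknown[]; // ToolSchema[] in Tepa; left loose in this sketch
}
interface LLMResponse {
  text: string;
  tokensUsed: { input: number; output: number };
  finishReason: "end_turn" | "max_tokens" | "stop_sequence" | "tool_use";
}
interface ModelInfo { id: string; description: string; tier: "fast" | "balanced" | "advanced"; }
interface LLMProvider {
  complete(messages: LLMMessage[], options: LLMRequestOptions): Promise<LLMResponse>;
  getModels(): ModelInfo[];
}

// Hypothetical test double: returns the canned replies in order and records every call.
function createMockProvider(replies: string[]): LLMProvider & { calls: LLMMessage[][] } {
  const calls: LLMMessage[][] = [];
  let next = 0;
  return {
    calls,
    async complete(messages: LLMMessage[]): Promise<LLMResponse> {
      calls.push(messages);
      const text = replies[next] ?? ""; // empty text once the replies run out
      next += 1;
      const response: LLMResponse = {
        text,
        tokensUsed: { input: 0, output: 0 },
        finishReason: "end_turn",
      };
      return response;
    },
    getModels(): ModelInfo[] {
      return [{ id: "mock-model", tier: "fast", description: "Canned replies for tests." }];
    },
  };
}

const mock = createMockProvider(["first reply", "second reply"]);
const firstReply = mock.complete([{ role: "user", content: "hello" }], { model: "mock-model" });
firstReply.then((response) => console.log(response.text, mock.calls.length));
```

Recording the messages in calls lets a test assert on what the pipeline actually sent, not just on what came back.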

Publishing as an npm Package

To share a provider with the community, publish it as a standalone package. Only @tepa/types and @tepa/provider-core are needed as dependencies — no dependency on @tepa/core or @tepa/tools:

mkdir tepa-provider-myllm
cd tepa-provider-myllm
npm init -y
npm install @tepa/types @tepa/provider-core
npm install -D typescript tsup

For the complete scaffolding walkthrough — recommended project structure, formatting.ts conversion helpers, factory function pattern, test setup, and publish steps — see the Contributing Guide.


What's Next

  • Examples and Demos — See providers in action across different use cases: autonomous code generation, data pipelines, and human-in-the-loop interaction.
  • Contributing — Full scaffolding guide for publishing providers and tools as community packages.
  • API Reference — Complete interface definitions for LLMProvider, BaseLLMProvider, and all related types.