LLM Providers¶
Tepa is LLM-agnostic. The LLMProvider interface has just two methods, complete() and getModels(), and abstracts away every provider-specific SDK, API shape, and authentication flow. Tepa ships with three built-in providers (Anthropic, OpenAI, Gemini), and you can add any other by extending BaseLLMProvider.
This section covers the provider interface, the three built-in providers and their options, native tool use, the provider logging system, and how to build a custom provider. For how providers fit into the broader package architecture, see How Tepa Works — Package Architecture.
Provider Interface¶
All provider types live in @tepa/types. The core interface is intentionally minimal:
LLMProvider¶
interface LLMProvider {
complete(messages: LLMMessage[], options: LLMRequestOptions): Promise<LLMResponse>;
getModels(): ModelInfo[];
}
Two methods. complete() is the LLM call. getModels() returns the provider's model catalog — the set of models it supports, each with metadata the Planner uses to make intelligent model assignments for individual steps. The pipeline never touches provider SDKs directly — it only talks through this interface.
ModelInfo¶
interface ModelInfo {
id: string;
description: string;
tier: "fast" | "balanced" | "advanced";
capabilities?: string[];
}
| Field | Description |
|---|---|
| id | Model identifier as passed to the provider API (e.g. "claude-sonnet-4-6") |
| description | Human-readable description rendered in the Planner's system prompt |
| tier | Capability tier; helps the Planner pick fast models for simple tasks and advanced models for complex reasoning |
| capabilities | Optional list (e.g. ["tool_use", "vision"]) for future programmatic filtering |
LLMMessage¶
A simple role/content pair. System prompts are passed separately through LLMRequestOptions, not as messages.
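The LLMMessage definition itself is not reproduced on this page. The following is a minimal sketch of the shape the description implies, assuming the roles are "user" and "assistant" (the Gemini section mentions an "assistant" role) and that content is a plain string. Check @tepa/types for the actual definition.

```typescript
// Assumed shape: the real definition lives in @tepa/types and may differ.
interface LLMMessage {
  role: "user" | "assistant";
  content: string;
}

// Conversation turns only. There is no "system" message in the array.
const messages: LLMMessage[] = [
  { role: "user", content: "List the files in src/." },
  { role: "assistant", content: "I will call the directory listing tool." },
];

// The system prompt travels separately, in the request options.
const options = {
  model: "claude-haiku-4-5",
  systemPrompt: "You are a careful coding assistant.",
};
```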
LLMRequestOptions¶
interface LLMRequestOptions {
model: string;
maxTokens?: number;
temperature?: number;
systemPrompt?: string;
tools?: ToolSchema[];
}
The tools field is how the pipeline passes tool schemas for native tool use. When present, the provider converts these schemas into its SDK's native format and includes them in the API call.
LLMResponse¶
interface LLMResponse {
text: string;
tokensUsed: {
input: number;
output: number;
};
finishReason: "end_turn" | "max_tokens" | "stop_sequence" | "tool_use";
toolUse?: LLMToolUseBlock[];
}
Every provider maps its SDK-specific finish reasons to this standard enum. When finishReason is "tool_use", the toolUse array contains the parsed tool calls.
LLMToolUseBlock¶
| Field | Description |
|---|---|
| id | Provider-assigned ID for correlating tool calls with results. |
| name | Name of the tool the LLM wants to call. |
| input | Parsed input parameters: already an object, not a JSON string. |
The input field is pre-parsed by the provider. The Executor passes it directly to tool.execute() without any JSON parsing step.
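To make the hand-off concrete, here is a rough sketch of how a caller might dispatch tool calls from a response. The types are local copies of the interfaces above; the Tool and ToolResult shapes, the Record type used for input, and the runToolCalls helper are hypothetical and are not Tepa's Executor.

```typescript
// Local copies of the types described above, so the sketch is self-contained.
interface LLMToolUseBlock {
  id: string;
  name: string;
  input: Record<string, unknown>; // assumed object type for parsed parameters
}

interface LLMResponse {
  text: string;
  tokensUsed: { input: number; output: number };
  finishReason: "end_turn" | "max_tokens" | "stop_sequence" | "tool_use";
  toolUse?: LLMToolUseBlock[];
}

// Hypothetical tool shape: only execute() is mentioned in the text.
interface Tool {
  execute(input: Record<string, unknown>): unknown;
}

interface ToolResult {
  id: string;
  output: unknown;
}

// Run every requested tool call and pair each result with its call ID.
function runToolCalls(response: LLMResponse, tools: Map<string, Tool>): ToolResult[] {
  if (response.finishReason !== "tool_use" || !response.toolUse) {
    return []; // plain text answer, nothing to execute
  }
  return response.toolUse.map((call) => {
    const tool = tools.get(call.name);
    if (!tool) {
      throw new Error(`Unknown tool: ${call.name}`);
    }
    // call.input is already an object, so it is passed through unchanged.
    return { id: call.id, output: tool.execute(call.input) };
  });
}

const tools = new Map<string, Tool>([
  ["add", { execute: (input) => Number(input.a) + Number(input.b) }],
]);

const results = runToolCalls(
  {
    text: "",
    tokensUsed: { input: 12, output: 8 },
    finishReason: "tool_use",
    toolUse: [{ id: "call-1", name: "add", input: { a: 2, b: 3 } }],
  },
  tools,
);
```

Keeping the call ID next to each output is what lets a follow-up request correlate results with the calls that produced them.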
Built-in Providers¶
Anthropic¶
Package: @tepa/provider-anthropic
SDK: @anthropic-ai/sdk
Default model: claude-haiku-4-5
import { AnthropicProvider, AnthropicModels } from "@tepa/provider-anthropic";
const provider = new AnthropicProvider({
apiKey: process.env.ANTHROPIC_API_KEY, // omit to read from env automatically
});
Model catalog: Claude_Haiku_4_5 (fast), Claude_Sonnet_4_6 (balanced), Claude_Opus_4_6 (advanced). Use AnthropicModels.* constants for type-safe config references.
Options:
| Option | Type | Default | Description |
|---|---|---|---|
| apiKey | string | ANTHROPIC_API_KEY env var | API key for authentication. |
| maxRetries | number | 3 | Max retries on transient or rate-limit errors. |
| retryBaseDelayMs | number | 1000 | Base delay in ms for exponential backoff. |
| defaultLog | boolean | true | Enable automatic JSONL file logging. |
| logDir | string | ".tepa/logs" | Directory for log files. |
| includeContent | boolean | false | Include full message content in logs. |
Retryable errors: Rate limit (429), internal server error (500), connection errors, overloaded (529).
Finish reason mapping:
| Anthropic | Tepa |
|---|---|
| "max_tokens" | "max_tokens" |
| "stop_sequence" | "stop_sequence" |
| "tool_use" | "tool_use" |
| "end_turn" / other | "end_turn" |
OpenAI¶
Package: @tepa/provider-openai
SDK: openai
API: Responses API
Default model: gpt-5-mini
import { OpenAIProvider, OpenAIModels } from "@tepa/provider-openai";
const provider = new OpenAIProvider({
apiKey: process.env.OPENAI_API_KEY,
});
Model catalog: GPT_5_Mini (fast), GPT_5 (advanced). Use OpenAIModels.* constants for type-safe config references.
Options:
| Option | Type | Default | Description |
|---|---|---|---|
| apiKey | string | OPENAI_API_KEY env var | API key for authentication. |
| maxRetries | number | 3 | Max retries on transient or rate-limit errors. |
| retryBaseDelayMs | number | 1000 | Base delay in ms for exponential backoff. |
| defaultLog | boolean | true | Enable automatic JSONL file logging. |
| logDir | string | ".tepa/logs" | Directory for log files. |
| includeContent | boolean | false | Include full message content in logs. |
The OpenAI provider uses the Responses API (client.responses.create()), not the legacy Chat Completions API. System prompts are passed as a system-role input item, and tool calls are extracted from function_call items in the response output.
Retryable errors: Rate limit (429), internal server error (500), connection errors.
Finish reason mapping:
| OpenAI | Tepa |
|---|---|
| "incomplete" | "max_tokens" |
| Tool calls in output | "tool_use" |
| Other / null | "end_turn" |
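The mapping in the table can be sketched as a small function. This is an illustration, not the provider's source: ResponseLike is a reduced, hypothetical view of a Responses API result, the "function_call" item type is how the Responses API is understood to mark tool calls in its output, and the order of the checks (truncation before tool calls) is an assumption.

```typescript
type FinishReason = "end_turn" | "max_tokens" | "stop_sequence" | "tool_use";

// Reduced, hypothetical view of a Responses API result: only what the mapping needs.
interface ResponseLike {
  status: string | null;
  output: Array<{ type: string }>;
}

function mapOpenAIFinishReason(response: ResponseLike): FinishReason {
  // Assumption: truncation is checked first, so a cut-off response
  // is never reported as a complete tool call.
  if (response.status === "incomplete") {
    return "max_tokens";
  }
  // No dedicated tool-use status exists, so tool calls are detected in the output.
  if (response.output.some((item) => item.type === "function_call")) {
    return "tool_use";
  }
  // Everything else, including a null status, counts as a normal end of turn.
  return "end_turn";
}
```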
Gemini¶
Package: @tepa/provider-gemini
SDK: @google/genai
Default model: gemini-3-flash-preview
import { GeminiProvider, GeminiModels } from "@tepa/provider-gemini";
const provider = new GeminiProvider({
apiKey: process.env.GEMINI_API_KEY, // also reads GOOGLE_API_KEY
});
Model catalog: Gemini_3_Flash_Preview (fast), Gemini_3_Pro_Preview (advanced). Use GeminiModels.* constants for type-safe config references.
Options:
| Option | Type | Default | Description |
|---|---|---|---|
| apiKey | string | GEMINI_API_KEY or GOOGLE_API_KEY env var | API key for authentication. |
| maxRetries | number | 3 | Max retries on transient or rate-limit errors. |
| retryBaseDelayMs | number | 1000 | Base delay in ms for exponential backoff. |
| defaultLog | boolean | true | Enable automatic JSONL file logging. |
| logDir | string | ".tepa/logs" | Directory for log files. |
| includeContent | boolean | false | Include full message content in logs. |
Gemini maps "assistant" roles to "model" and passes system prompts via the SDK's systemInstruction config field. Tool calls are extracted from functionCall parts in the response, with synthetic IDs (gemini-call-0, gemini-call-1, ...) since the Gemini API doesn't assign call IDs.
Retryable errors: Rate limit (429), server errors (5xx), connection errors. Non-retryable: 400, 401, 403, 404.
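The role mapping and synthetic IDs described above might look roughly like this. The GeminiContent and GeminiPart interfaces are reduced, hypothetical views of the SDK's types, the LLMMessage shape is assumed, and numbering IDs by call order (rather than by part position) is a guess at the provider's behaviour.

```typescript
// Assumed LLMMessage shape; see @tepa/types for the real one.
interface LLMMessage {
  role: "user" | "assistant";
  content: string;
}

interface LLMToolUseBlock {
  id: string;
  name: string;
  input: Record<string, unknown>;
}

// Reduced, hypothetical views of the Gemini request and response shapes.
interface GeminiContent {
  role: "user" | "model";
  parts: Array<{ text: string }>;
}

interface GeminiPart {
  text?: string;
  functionCall?: { name: string; args: Record<string, unknown> };
}

// Gemini calls the assistant side of the conversation "model".
function toGeminiContents(messages: LLMMessage[]): GeminiContent[] {
  return messages.map((m): GeminiContent => ({
    role: m.role === "assistant" ? "model" : "user",
    parts: [{ text: m.content }],
  }));
}

// Collect functionCall parts and give each one a synthetic, sequential ID.
function extractToolCalls(parts: GeminiPart[]): LLMToolUseBlock[] {
  const calls: LLMToolUseBlock[] = [];
  for (const part of parts) {
    if (part.functionCall) {
      calls.push({
        id: `gemini-call-${calls.length}`,
        name: part.functionCall.name,
        input: part.functionCall.args,
      });
    }
  }
  return calls;
}
```

The synthetic IDs are only unique within one response, which is enough to pair each call with its result in the following request.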
Finish reason mapping:
| Gemini | Tepa |
|---|---|
| "MAX_TOKENS" | "max_tokens" |
| Function calls in response | "tool_use" |
| "STOP" / other | "end_turn" |
Native Tool Use¶
All three providers use native tool use — the LLM's built-in function calling capability — rather than embedding tool descriptions in the prompt and parsing JSON from the response.
How It Works¶
When a plan step declares tools, the Executor:
- Builds tool schemas from the tool registry and passes them in LLMRequestOptions.tools
- The provider converts ToolSchema[] to its SDK's native format
- The LLM responds with structured tool call blocks instead of free-form text
- The provider extracts tool calls into LLMToolUseBlock[] with pre-parsed parameters
- The Executor invokes the tool directly with the parsed input object; no JSON.parse needed
Why It Matters¶
Text-based tool calling requires the LLM to produce valid JSON inside its response, which is fragile:
- Escaping errors — large file contents with quotes, newlines, or special characters break JSON parsing
- Format drift — the LLM might wrap the JSON in markdown code fences or add commentary
- Partial output — token limits can truncate the JSON mid-object
Native tool use eliminates all of these. The provider SDK handles serialisation and the parameters arrive as a ready-to-use object. Every built-in provider uses this approach — there is no fallback to text parsing.
Schema Conversion by Provider¶
Each provider converts ToolSchema to its SDK's expected format internally. You pass a single ToolSchema[] and the provider does the rest:
Anthropic — input_schema with JSON Schema object:
{ "name": "file_read", "description": "...", "input_schema": { "type": "object", "properties": { ... }, "required": [...] } }
OpenAI — function type with parameters object:
{ "type": "function", "name": "file_read", "description": "...", "parameters": { "type": "object", "properties": { ... }, "required": [...] } }
Gemini — functionDeclarations array with uppercase types:
{ "functionDeclarations": [{ "name": "file_read", "description": "...", "parameters": { "type": "OBJECT", "properties": { ... }, "required": [...] } }] }
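As an illustration of these three target formats, the sketch below converts one schema into each. The ToolSchema shape used here (a flat map of named parameters) is hypothetical, since this page does not show the real type from @tepa/types, and the helper names are invented for the example.

```typescript
// Hypothetical ToolSchema shape; the real type lives in @tepa/types and may differ.
interface ToolParameter {
  type: string; // JSON Schema type name such as "string" or "number"
  description: string;
  required?: boolean;
}

interface ToolSchema {
  name: string;
  description: string;
  parameters: Record<string, ToolParameter>;
}

// Build the shared JSON Schema object; Gemini expects upper-case type names.
function toJsonSchema(schema: ToolSchema, upperCaseTypes = false) {
  const properties: Record<string, { type: string; description: string }> = {};
  const required: string[] = [];
  for (const [name, param] of Object.entries(schema.parameters)) {
    properties[name] = {
      type: upperCaseTypes ? param.type.toUpperCase() : param.type,
      description: param.description,
    };
    if (param.required) required.push(name);
  }
  return { type: upperCaseTypes ? "OBJECT" : "object", properties, required };
}

function toAnthropicTool(schema: ToolSchema) {
  return {
    name: schema.name,
    description: schema.description,
    input_schema: toJsonSchema(schema),
  };
}

function toOpenAITool(schema: ToolSchema) {
  return {
    type: "function",
    name: schema.name,
    description: schema.description,
    parameters: toJsonSchema(schema),
  };
}

// Gemini groups all declarations under a single functionDeclarations array.
function toGeminiTools(schemas: ToolSchema[]) {
  return {
    functionDeclarations: schemas.map((schema) => ({
      name: schema.name,
      description: schema.description,
      parameters: toJsonSchema(schema, true),
    })),
  };
}

const fileRead: ToolSchema = {
  name: "file_read",
  description: "Read a file from disk.",
  parameters: {
    path: { type: "string", description: "Path to the file.", required: true },
  },
};
```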
Provider Logging System¶
Every provider built on BaseLLMProvider — including all three built-ins — automatically logs every LLM call to a JSONL file and optionally to custom listeners. This is one of Tepa's most useful operational features: a complete, structured audit trail of every request and response, available out of the box with zero configuration.
Default File Logging¶
By default, each provider instance writes a JSONL log file at .tepa/logs/llm-{timestamp}.jsonl, with one LLMLogEntry per line. Disable it with defaultLog: false, or change the directory with logDir:
import { AnthropicProvider } from "@tepa/provider-anthropic";
// Default: logs to .tepa/logs/llm-{timestamp}.jsonl
const defaultProvider = new AnthropicProvider({ apiKey: "..." });
// Disable file logging entirely
const silentProvider = new AnthropicProvider({ apiKey: "...", defaultLog: false });
// Custom log directory
const customDirProvider = new AnthropicProvider({ apiKey: "...", logDir: "./my-logs" });
LLMLogEntry¶
Each entry describes one attempt at an LLM call:
interface LLMLogEntry {
timestamp: string;
provider: string; // "anthropic", "openai", "gemini"
status: "success" | "error" | "retry";
durationMs: number;
attempt: number; // 0-based attempt number
request: {
model: string;
messageCount: number;
totalCharLength: number;
promptPreview: string; // First 120 chars of the last message
maxTokens?: number;
temperature?: number;
hasSystemPrompt: boolean;
hasTools?: boolean;
messages?: LLMMessage[]; // Only if includeContent: true
systemPrompt?: string; // Only if includeContent: true
};
response?: {
// Present on "success"
text: string;
tokensUsed: { input: number; output: number };
finishReason: string;
toolUseCount?: number;
};
error?: {
// Present on "error" and "retry"
message: string;
retryable: boolean;
};
}
A "retry" entry indicates the call failed but will be retried. A "success" entry includes the full response. An "error" entry indicates the final failure after all retries are exhausted.
Accessing Logs After a Run¶
Providers accumulate entries in memory throughout a run. Access them via the provider instance after tepa.run() completes:
const result = await tepa.run(prompt);
const entries = provider.getLogEntries();
console.log(`Total LLM calls: ${entries.length}`);
console.log(`Retries: ${entries.filter((e) => e.status === "retry").length}`);
console.log(`Failed: ${entries.filter((e) => e.status === "error").length}`);
// Path to the JSONL file on disk
const logPath = provider.getLogFilePath();
console.log(`Full logs at: ${logPath}`);
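Because the file on disk is plain JSONL, it can also be analysed after the process has exited. The sketch below parses JSONL text and totals tokens and retries. LogEntryLike is a reduced view of LLMLogEntry, the helper names are made up for this example, and the sample lines stand in for real file contents.

```typescript
// Reduced view of LLMLogEntry: only the fields this summary reads.
interface LogEntryLike {
  status: "success" | "error" | "retry";
  durationMs: number;
  response?: { tokensUsed: { input: number; output: number } };
}

// One JSON document per line; blank lines are skipped.
function parseJsonl(text: string): LogEntryLike[] {
  return text
    .split("\n")
    .filter((line) => line.trim() !== "")
    .map((line) => JSON.parse(line) as LogEntryLike);
}

function summarize(entries: LogEntryLike[]) {
  let inputTokens = 0;
  let outputTokens = 0;
  let retries = 0;
  for (const entry of entries) {
    if (entry.status === "retry") retries += 1;
    // Only "success" entries carry a response with token counts.
    if (entry.response) {
      inputTokens += entry.response.tokensUsed.input;
      outputTokens += entry.response.tokensUsed.output;
    }
  }
  return { entries: entries.length, retries, inputTokens, outputTokens };
}

// Two sample lines standing in for the contents of a log file.
const sample = [
  JSON.stringify({
    status: "retry",
    durationMs: 410,
    error: { message: "overloaded", retryable: true },
  }),
  JSON.stringify({
    status: "success",
    durationMs: 1234,
    response: { tokensUsed: { input: 150, output: 200 } },
  }),
].join("\n");

const logSummary = summarize(parseJsonl(sample));
```

In a real script the text would come from the file returned by provider.getLogFilePath(), for example via Node's fs.readFileSync(path, "utf8").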
Custom Log Listeners¶
Register custom callbacks with onLog() to process entries in real time — useful for streaming metrics to monitoring platforms or triggering alerts on errors:
const provider = new AnthropicProvider({ apiKey: "..." });
// Alert on errors
provider.onLog((entry) => {
if (entry.status === "error") {
alertOncall(`LLM error: ${entry.error?.message}`);
}
});
// Prometheus-style metrics
provider.onLog((entry) => {
llmCallsTotal.inc({ provider: entry.provider, status: entry.status });
llmDurationMs.observe({ provider: entry.provider }, entry.durationMs);
if (entry.response) {
llmTokensTotal.inc(
{ provider: entry.provider, direction: "input" },
entry.response.tokensUsed.input,
);
llmTokensTotal.inc(
{ provider: entry.provider, direction: "output" },
entry.response.tokensUsed.output,
);
}
});
Multiple listeners can be registered. Each receives every log entry.
Built-in Log Callbacks¶
@tepa/provider-core exports two ready-made handlers:
consoleLogCallback — Formats entries for console output with timing and preview:
import { consoleLogCallback } from "@tepa/provider-core";
provider.onLog(consoleLogCallback);
// [2026-03-15T10:30:00.000Z] anthropic success (1234ms) model=claude-haiku-4-5 tokens=150+200
createFileLogWriter — Creates a JSONL writer for a custom path:
import { createFileLogWriter } from "@tepa/provider-core";
const writer = createFileLogWriter("./custom-logs/anthropic.jsonl");
provider.onLog(writer.callback);
writer.close(); // Close when done
Privacy Controls¶
By default, log entries do not include full message content or system prompts — only metadata: message count, character length, and a 120-character preview. Set includeContent: true to include full content for debugging:
const provider = new AnthropicProvider({
apiKey: "...",
includeContent: true, // Not recommended in production
});
When includeContent is true, the request object includes the full messages array and systemPrompt string. When false (the default), these fields are omitted.
Token Usage & Cost¶
Every provider extracts the token counts the underlying SDK reports, including prompt-cache fields when present:
interface LLMTokensUsed {
input: number;
output: number;
cacheRead?: number; // Anthropic cache hits, OpenAI cached prompt tokens, Gemini cached content tokens
cacheWrite?: number; // Anthropic only (cache_creation_input_tokens)
}
cacheRead / cacheWrite appear on both LLMResponse.tokensUsed and LLMLogEntry.response.tokensUsed, so they flow into custom onLog handlers automatically.
ModelInfo also carries an optional cost: ModelPricing field. Provider packages may ship best-effort pricing for their built-in models, and you can attach pricing to any custom model when you register it:
interface ModelPricing {
inputPer1M: number;
outputPer1M: number;
cacheReadPer1M?: number;
cacheWritePer1M?: number;
currency?: string; // defaults to "USD"
}
Pricing data goes stale; treat shipped values as a starting point and override per-instance for production billing (see the bridge pricing option below).
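For a sense of how the two interfaces combine, here is a minimal cost estimate. It is a sketch under two assumptions that this page does not confirm: that input, cacheRead, and cacheWrite are disjoint counts billed at their own rates, and that a missing cache rate means those tokens cost nothing. The estimateCost name is hypothetical.

```typescript
interface LLMTokensUsed {
  input: number;
  output: number;
  cacheRead?: number;
  cacheWrite?: number;
}

interface ModelPricing {
  inputPer1M: number;
  outputPer1M: number;
  cacheReadPer1M?: number;
  cacheWritePer1M?: number;
  currency?: string;
}

// Assumption: the four token counts are disjoint and each has its own rate.
function estimateCost(tokens: LLMTokensUsed, pricing: ModelPricing): number {
  // Convert a per-million rate to a per-token rate; missing rates count as 0.
  const perToken = (ratePer1M: number | undefined) => (ratePer1M ?? 0) / 1_000_000;
  return (
    tokens.input * perToken(pricing.inputPer1M) +
    tokens.output * perToken(pricing.outputPer1M) +
    (tokens.cacheRead ?? 0) * perToken(pricing.cacheReadPer1M) +
    (tokens.cacheWrite ?? 0) * perToken(pricing.cacheWritePer1M)
  );
}

// Illustrative numbers only; do not treat these rates as current pricing.
const estimated = estimateCost(
  { input: 1_000_000, output: 200_000, cacheRead: 500_000 },
  { inputPer1M: 3, outputPer1M: 15, cacheReadPer1M: 0.3, cacheWritePer1M: 3.75 },
);
```

With these sample numbers the estimate comes to roughly 6.15 in the pricing currency: 3 for input, 3 for output, and 0.15 for cache reads.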
Pairing with llmvantage for Cost & Cross-SDK Observability¶
Tepa's provider logs are pipeline-aware — they capture concepts like retry status, attempt number, normalized finish reasons, and tool-use counts that only exist above the HTTP layer. They are not, however, the right place for raw token-cost accounting across every LLM call your process makes (including any non-Tepa calls in the same app).
For that, llmvantage is a good fit. It patches global fetch and captures the underlying request/response for Anthropic, OpenAI, and Gemini SDKs — which is exactly what Tepa's providers call under the hood. The two layers compose without any glue code:
// 1. Foundation: cost & raw-HTTP observability for any LLM traffic in the process.
import "llmvantage";
import { observer } from "llmvantage";
import { consoleSink } from "llmvantage/sinks/console";
observer.pipe(consoleSink);
// 2. Tepa layer: pipeline-aware structured logs (retries, attempts, tool use).
import { AnthropicProvider } from "@tepa/provider-anthropic";
const provider = new AnthropicProvider({
apiKey: process.env.ANTHROPIC_API_KEY!,
defaultLog: false, // avoid double-writing to disk if llmvantage already has a file sink
});
provider.onLog((entry) => {
if (entry.status === "retry") {
console.warn(`retry #${entry.attempt}: ${entry.error?.message}`);
}
});
Which layer captures what:
| Concern | Use llmvantage | Use Tepa onLog |
|---|---|---|
| Token totals & cost rollups across all LLM calls | ✓ | |
| Raw request/response bodies for replay | ✓ | |
| PII redaction at the HTTP boundary | ✓ | |
| Retry status, attempt number | | ✓ |
| Normalized finish reasons across providers | | ✓ |
| Tool-use counts per call | | ✓ |
| Per-provider model catalog correlation | | ✓ |
One HTTP attempt corresponds to one llmvantage event; one Tepa complete() may emit multiple log entries (one per retry plus a terminal success/error). The mapping is intentionally not 1-to-1 — each layer reflects what's true at its layer. If both layers write to disk, set defaultLog: false on the provider (or skip the llmvantage file sink) to avoid duplicate JSONL output.
@tepa/observability-llmvantage¶
For tighter integration without coupling the core packages, install the optional adapter package, @tepa/observability-llmvantage.
It exposes two pieces:
1. createLlmvantageBridge — cost rollups from Tepa's onLog. Wire it into any provider and call summary() after the run for per-provider, per-model totals:
import { createLlmvantageBridge, defaultPricing } from "@tepa/observability-llmvantage";
import { AnthropicProvider } from "@tepa/provider-anthropic";
const bridge = createLlmvantageBridge({
pricing: {
...defaultPricing,
anthropic: {
...defaultPricing.anthropic,
// Override stale defaults, or add a model the provider package doesn't ship yet
"claude-sonnet-4-6": {
inputPer1M: 3,
outputPer1M: 15,
cacheReadPer1M: 0.3,
cacheWritePer1M: 3.75,
},
},
},
});
const provider = new AnthropicProvider({ apiKey: process.env.ANTHROPIC_API_KEY! });
provider.onLog(bridge.callback);
await tepa.run(prompt);
const summary = bridge.summary();
// {
// calls, retries, errors,
// tokens: { input, output, cacheRead, cacheWrite },
// cost: { total: 0.0234, currency: "USD" },
// byModel: { "anthropic:claude-sonnet-4-6": { calls, tokens, cost } },
// byProvider: { anthropic: { calls, tokens, cost } },
// pricingMissing: [] // provider:model pairs with no pricing entry
// }
Pricing resolution order (highest to lowest): BridgeOptions.pricing → defaultPricing shipped by the adapter. ModelInfo.cost on each provider's model catalog is reserved for v2; for now, supply overrides explicitly via pricing. Use ignoreDefaultPricing: true to bypass the shipped snapshot entirely.
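The resolution order can be pictured as a simple lookup. This is a sketch of the described behaviour, not the adapter's code: the provider-to-model table layout is inferred from the pricing example above, and resolvePricing is a hypothetical helper.

```typescript
interface ModelPricing {
  inputPer1M: number;
  outputPer1M: number;
}

// provider name -> model id -> pricing, as in the bridge example above
type PricingTable = Record<string, Record<string, ModelPricing>>;

interface ResolveOptions {
  pricing?: PricingTable;
  ignoreDefaultPricing?: boolean;
}

function resolvePricing(
  provider: string,
  model: string,
  defaults: PricingTable,
  options: ResolveOptions = {},
): ModelPricing | undefined {
  // 1. Explicit overrides always win.
  const override = options.pricing?.[provider]?.[model];
  if (override) return override;
  // 2. The shipped snapshot is consulted unless it is switched off.
  if (options.ignoreDefaultPricing) return undefined;
  return defaults[provider]?.[model];
}

// Made-up snapshot and override values for illustration.
const shipped: PricingTable = {
  anthropic: { "claude-sonnet-4-6": { inputPer1M: 3, outputPer1M: 15 } },
};
const overrides: PricingTable = {
  anthropic: { "claude-sonnet-4-6": { inputPer1M: 2.5, outputPer1M: 12 } },
};

const fromOverride = resolvePricing("anthropic", "claude-sonnet-4-6", shipped, { pricing: overrides });
const fromDefault = resolvePricing("anthropic", "claude-sonnet-4-6", shipped);
const missing = resolvePricing("anthropic", "claude-sonnet-4-6", shipped, { ignoreDefaultPricing: true });
```

An undefined result corresponds to a provider:model pair that would be reported under pricingMissing in the summary.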
2. tagCost — llmvantage plugin that enriches raw fetch events. Useful if you want sinks (file, HTTP shipper, console) to receive cost and normalized tokens per HTTP call:
import "llmvantage";
import { observer } from "llmvantage";
import { consoleSink } from "llmvantage/sinks/console";
import { tagCost } from "@tepa/observability-llmvantage";
observer
.use(
tagCost({
pricing: {
/* same overrides */
},
}),
)
.pipe(consoleSink);
The plugin parses Anthropic / OpenAI / Gemini response bodies to extract tokens (including cache fields) and attaches { cost: { value, currency, pricingKnown }, tokens } to each event before downstream sinks see it. Use the bridge for tepa-aware aggregation; use the plugin when you want cost visible inside the llmvantage pipeline itself.
Creating a Custom Provider¶
Adding a new LLM provider means extending BaseLLMProvider from @tepa/provider-core and implementing four methods plus a model catalog. By extending rather than implementing LLMProvider directly, your provider gets retry logic, exponential backoff, rate limit handling, the full logging system, and getModels() for free.
The Required Members¶
import { BaseLLMProvider, type BaseLLMProviderOptions } from "@tepa/provider-core";
import type { LLMMessage, LLMRequestOptions, LLMResponse, ModelInfo } from "@tepa/types";
class MyProvider extends BaseLLMProvider {
protected readonly providerName = "my-provider";
// Required: declare the models this provider supports
protected readonly models: ModelInfo[] = [
{ id: "my-model-fast", tier: "fast", description: "Fast and cheap for simple tasks." },
{ id: "my-model-pro", tier: "advanced", description: "Most capable for complex reasoning." },
];
constructor(options: { apiKey: string } & BaseLLMProviderOptions) {
super(options);
// Initialize your SDK client
}
// Required: make the API call, return a normalised LLMResponse
protected async doComplete(
messages: LLMMessage[],
options: LLMRequestOptions,
): Promise<LLMResponse> {
// Convert messages and options to your SDK's format
// Make the API call
// Map finish reasons to the standard enum
// Extract tool use blocks if present
// Return LLMResponse
}
// Required: true for transient errors that should be retried (500s, network errors)
protected isRetryable(error: unknown): boolean { ... }
// Required: true specifically for rate limit errors (gets 30x longer backoff)
protected isRateLimitError(error: unknown): boolean { ... }
// Required: extract Retry-After header value in ms, or return null
protected getRetryAfterMs(error: unknown): number | null { ... }
}
BaseLLMProvider wraps doComplete() in the retry loop and exposes getModels() from your models array automatically — you implement the API call and catalog, the framework handles the rest.
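The three error-classification methods are left as placeholders above. One possible shape for them, written as standalone functions so the sketch runs on its own, is shown below. HttpLikeError is a hypothetical error shape; real SDKs expose their own error classes, so adapt the field access accordingly. Treating 429 as retryable follows the built-in providers' retryable error lists.

```typescript
// Hypothetical error shape thrown by an HTTP-based SDK.
interface HttpLikeError {
  status?: number;
  code?: string; // e.g. "ECONNRESET" for network failures
  headers?: Record<string, string>;
}

function asHttpError(error: unknown): HttpLikeError {
  return typeof error === "object" && error !== null ? (error as HttpLikeError) : {};
}

// Rate limits are singled out because they get the longer backoff.
function isRateLimitError(error: unknown): boolean {
  return asHttpError(error).status === 429;
}

function isRetryable(error: unknown): boolean {
  const { status, code } = asHttpError(error);
  if (status === undefined) {
    // No HTTP status: treat common network error codes as transient.
    return code === "ECONNRESET" || code === "ETIMEDOUT";
  }
  // Rate limits and server errors are retried; other 4xx errors are not.
  return status === 429 || status >= 500;
}

function getRetryAfterMs(error: unknown): number | null {
  const raw = asHttpError(error).headers?.["retry-after"];
  if (raw === undefined || raw.trim() === "") return null;
  const seconds = Number(raw);
  // Retry-After may also be an HTTP date; this sketch handles only the seconds form.
  return Number.isFinite(seconds) && seconds >= 0 ? seconds * 1000 : null;
}
```

Inside a BaseLLMProvider subclass these would become the protected methods of the same names.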
BaseLLMProviderOptions¶
interface BaseLLMProviderOptions {
maxRetries?: number; // Default: 3
retryBaseDelayMs?: number; // Default: 1000
defaultLog?: boolean; // Default: true
logDir?: string; // Default: ".tepa/logs"
includeContent?: boolean; // Default: false
}
Retry and Backoff Behaviour¶
The retry loop runs from attempt 0 through maxRetries inclusive — so maxRetries: 3 means up to 4 total attempts. Backoff delay depends on error type:
| Error type | Delay formula |
|---|---|
| Transient error | retryBaseDelayMs × 2^attempt |
| Rate limit error | retryBaseDelayMs × 30 × 2^attempt |
If the API returns a Retry-After header (via getRetryAfterMs()), that value takes precedence over the calculated delay.
Example with defaults (retryBaseDelayMs: 1000):
| Attempt | Transient delay | Rate limit delay |
|---|---|---|
| 0 | 1s | 30s |
| 1 | 2s | 60s |
| 2 | 4s | 120s |
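The delay formulas and the Retry-After rule can be written down directly. This is a sketch with hypothetical function names, not the code inside BaseLLMProvider. It also assumes that no delay follows the final attempt, which is consistent with the three-row table for maxRetries: 3.

```typescript
// Delay before the next attempt, following the formulas in the table above.
function computeDelayMs(
  attempt: number,
  isRateLimit: boolean,
  retryAfterMs: number | null,
  retryBaseDelayMs = 1000,
): number {
  // A server-provided Retry-After value takes precedence over the formula.
  if (retryAfterMs !== null) return retryAfterMs;
  const multiplier = isRateLimit ? 30 : 1;
  return retryBaseDelayMs * multiplier * 2 ** attempt;
}

// Delays between attempts for a call that keeps failing.
function retrySchedule(
  maxRetries: number,
  isRateLimit: boolean,
  retryBaseDelayMs = 1000,
): number[] {
  const delays: number[] = [];
  // Attempts run 0..maxRetries; a delay follows every failed attempt except the last.
  for (let attempt = 0; attempt < maxRetries; attempt++) {
    delays.push(computeDelayMs(attempt, isRateLimit, null, retryBaseDelayMs));
  }
  return delays;
}

const transientDelays = retrySchedule(3, false); // [1000, 2000, 4000]
const rateLimitDelays = retrySchedule(3, true); // [30000, 60000, 120000]
```

With the defaults, a call that fails every time on transient errors waits 7 seconds in total across its 4 attempts, while one that keeps hitting rate limits waits 210 seconds.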
Key Implementation Notes¶
- Tool schemas: if your LLM supports native function calling, convert ToolSchema[] to the SDK's format in doComplete(). See Native Tool Use above for the conversion patterns used by the built-in providers.
- Finish reasons: map your SDK's stop reasons to the four standard values ("end_turn", "max_tokens", "stop_sequence", "tool_use"). Some SDKs don't set a dedicated tool-use finish reason; detect tool calls in the response and override the reason accordingly.
- Synthetic IDs: if the API doesn't assign IDs to tool calls (like Gemini), generate them: my-provider-call-0, my-provider-call-1, etc.
Minimal Provider (Without BaseLLMProvider)¶
If you don't need retry logic or logging, implement LLMProvider directly:
import type {
LLMProvider,
LLMMessage,
LLMRequestOptions,
LLMResponse,
ModelInfo,
} from "@tepa/types";
const myProvider: LLMProvider = {
async complete(messages, options): Promise<LLMResponse> {
// Make the API call and return an LLMResponse
},
getModels(): ModelInfo[] {
return [{ id: "my-model", tier: "balanced", description: "My custom model." }];
},
};
Useful for testing, mocking, or wrapping a provider you've already built with its own retry logic.
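For the testing case, a mock that replays canned replies is often enough. The sketch below defines local copies of the types so it runs without any @tepa packages. The LLMMessage shape is assumed, and createMockProvider with its calls array is a hypothetical helper, not part of Tepa.

```typescript
// Local copies of the interfaces; LLMMessage's shape is an assumption.
interface LLMMessage {
  role: "user" | "assistant";
  content: string;
}
interface LLMRequestOptions {
  model: string;
  maxTokens?: number;
  temperature?: number;
  systemPrompt?: string;
}
interface LLMResponse {
  text: string;
  tokensUsed: { input: number; output: number };
  finishReason: "end_turn" | "max_tokens" | "stop_sequence" | "tool_use";
}
interface ModelInfo {
  id: string;
  description: string;
  tier: "fast" | "balanced" | "advanced";
}
interface LLMProvider {
  complete(messages: LLMMessage[], options: LLMRequestOptions): Promise<LLMResponse>;
  getModels(): ModelInfo[];
}

// Returns canned replies in order and records every call for later assertions.
function createMockProvider(replies: string[]): LLMProvider & { calls: LLMMessage[][] } {
  const calls: LLMMessage[][] = [];
  let next = 0;
  return {
    calls,
    async complete(messages: LLMMessage[], _options: LLMRequestOptions): Promise<LLMResponse> {
      calls.push(messages);
      // Once the replies run out, keep repeating the last one.
      const text = replies[Math.min(next, replies.length - 1)] ?? "";
      next += 1;
      return { text, tokensUsed: { input: 0, output: 0 }, finishReason: "end_turn" };
    },
    getModels(): ModelInfo[] {
      return [{ id: "mock-model", tier: "fast", description: "Canned replies for tests." }];
    },
  };
}

const mock = createMockProvider(["first reply", "second reply"]);
```

Because the mock satisfies the same two-method interface, it can stand in wherever a real provider is expected, and the recorded calls show exactly what the pipeline sent.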
Publishing as an npm Package¶
To share a provider with the community, publish it as a standalone package. Only @tepa/types and @tepa/provider-core are needed as dependencies — no dependency on @tepa/core or @tepa/tools:
mkdir tepa-provider-myllm
cd tepa-provider-myllm
npm init -y
npm install @tepa/types @tepa/provider-core
npm install -D typescript tsup
For the complete scaffolding walkthrough — recommended project structure, formatting.ts conversion helpers, factory function pattern, test setup, and publish steps — see the Contributing Guide.
What's Next¶
- Examples and Demos: See providers in action across different use cases: autonomous code generation, data pipelines, and human-in-the-loop interaction.
- Contributing: Full scaffolding guide for publishing providers and tools as community packages.
- API Reference: Complete interface definitions for LLMProvider, BaseLLMProvider, and all related types.