This guide shows you how to implement cost tracking in your application using ABV’s SDKs. Choose your preferred language and integration method below.
Before you start: Decide whether to ingest or infer usage data. See the Cost Tracking Reference to understand the differences.
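In short, ingestion means you report the token counts your provider returns, while inference means ABV estimates usage and cost from its model definitions when you don't. A minimal Python sketch of the difference, using the update_current_generation call introduced later in this guide (the abv, response, and messages objects are placeholders here, and the inference behavior is described in the Cost Tracking Reference):

# Ingestion: report the provider's token counts explicitly (exact values)
abv.update_current_generation(
    model="gpt-4o",
    usage_details={
        "input": response.usage.prompt_tokens,
        "output": response.usage.completion_tokens,
    },
)

# Inference: report only model, input, and output; ABV estimates usage and cost
# from its model definitions (only works for models in the definitions list)
abv.update_current_generation(
    model="gpt-4o",
    input=messages,
    output=response.choices[0].message.content,
)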
Python SDK Implementation
Installation
Install the required packages:
pip install abvdev
# Also install your LLM provider SDK
pip install anthropic # for Anthropic
pip install openai # for OpenAI
Method 1: Using the Decorator (Recommended)
The decorator approach is clean and integrates seamlessly with your existing functions.
from abvdev import ABV, observe
import anthropic

# ABV client initialization
abv = ABV(
    api_key="sk-abv-...",
    host="https://app.abv.dev",  # or "https://eu.app.abv.dev" for EU
)

# Anthropic client initialization
anthropic_client = anthropic.Anthropic(api_key="sk-ant-...")

@observe(as_type="generation")
def anthropic_completion(**kwargs):
    # Extract fields from kwargs
    kwargs_clone = kwargs.copy()
    input = kwargs_clone.pop('messages', None)
    model = kwargs_clone.pop('model', None)

    # Set initial generation data
    abv.update_current_generation(
        input=input,
        model=model,
        metadata=kwargs_clone
    )

    # Make the LLM call
    response = anthropic_client.messages.create(**kwargs)

    # Update with usage and cost details
    abv.update_current_generation(
        usage_details={
            "input": response.usage.input_tokens,
            "output": response.usage.output_tokens,
            "cache_read_input_tokens": response.usage.cache_read_input_tokens,
            # "total" is automatically derived if not set
        },
        # Optional: Ingest cost if you calculate it yourself.
        # Otherwise, ABV will infer it from model definitions.
        cost_details={
            "input": 1.0,
            "cache_read_input_tokens": 0.5,
            "output": 1.0,
            # "total" is automatically derived if not set
        }
    )

    return response.content[0].text

@observe()
def main():
    return anthropic_completion(
        model="claude-sonnet-4-5",
        max_tokens=1024,
        messages=[
            {"role": "user", "content": "Hello, Claude"}
        ]
    )

main()
The same approach works with the OpenAI SDK:

from abvdev import ABV, observe
from openai import OpenAI

# ABV client initialization
abv = ABV(
    api_key="sk-abv-...",
    host="https://app.abv.dev",
)

openai_client = OpenAI(api_key="sk-proj-...")

@observe(as_type="generation")
def openai_completion(**kwargs):
    kwargs_clone = kwargs.copy()
    messages = kwargs_clone.pop('messages', None)
    model = kwargs_clone.pop('model', None)

    abv.update_current_generation(
        input=messages,
        model=model,
        metadata=kwargs_clone
    )

    response = openai_client.chat.completions.create(**kwargs)

    # Use OpenAI-style usage schema (automatically mapped by ABV)
    abv.update_current_generation(
        usage_details={
            "prompt_tokens": response.usage.prompt_tokens,
            "completion_tokens": response.usage.completion_tokens,
            "total_tokens": response.usage.total_tokens,
            "prompt_tokens_details": {
                "cached_tokens": response.usage.prompt_tokens_details.cached_tokens,
                "audio_tokens": response.usage.prompt_tokens_details.audio_tokens,
            },
            "completion_tokens_details": {
                "reasoning_tokens": response.usage.completion_tokens_details.reasoning_tokens,
            },
        }
    )

    return response.choices[0].message.content

@observe()
def main():
    return openai_completion(
        model="gpt-4o",
        messages=[{"role": "user", "content": "Hello, OpenAI"}]
    )

main()
OpenAI Usage Schema: When using OpenAI-style usage details, ABV automatically maps:
prompt_tokens → input
completion_tokens → output
total_tokens → total
prompt_tokens_details.* → input_*
completion_tokens_details.* → output_*
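For instance, the OpenAI-style payload from the example above is equivalent to ingesting the generic keys directly. The sketch below derives the flattened detail-key names from the mapping rules listed above; they are illustrative assumptions, not output copied from ABV:

# Equivalent generic-schema payload (sketch, derived from the mapping rules above)
abv.update_current_generation(
    usage_details={
        "input": response.usage.prompt_tokens,        # prompt_tokens -> input
        "output": response.usage.completion_tokens,   # completion_tokens -> output
        "total": response.usage.total_tokens,         # total_tokens -> total
        # prompt_tokens_details.* -> input_*
        "input_cached_tokens": response.usage.prompt_tokens_details.cached_tokens,
        # completion_tokens_details.* -> output_*
        "output_reasoning_tokens": response.usage.completion_tokens_details.reasoning_tokens,
    }
)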
Method 2: Manual Context Manager
For more control, use the context manager approach:
from abvdev import ABV
import anthropic

abv = ABV(
    api_key="sk-abv-...",
    host="https://app.abv.dev",
)

anthropic_client = anthropic.Anthropic(api_key="sk-ant-...")

with abv.start_as_current_observation(
    as_type='generation',
    name="anthropic-completion",
    model="claude-haiku-4-5",
    input=[{"role": "user", "content": "Hello, Claude"}]
) as generation:
    # Make the LLM call
    response = anthropic_client.messages.create(
        model="claude-haiku-4-5",
        max_tokens=1024,
        messages=[{"role": "user", "content": "Hello, Claude"}]
    )

    # Update with response and usage
    generation.update(
        output=response.content[0].text,
        usage_details={
            "input": response.usage.input_tokens,
            "output": response.usage.output_tokens,
            "cache_read_input_tokens": response.usage.cache_read_input_tokens,
        },
        # Optional: Add cost details
        cost_details={
            "input": 1.0,
            "cache_read_input_tokens": 0.5,
            "output": 1.0,
        }
    )
JavaScript/TypeScript SDK Implementation
Installation
Install the required packages:
npm install @abvdev/tracing @abvdev/otel @opentelemetry/sdk-node dotenv openai
Setup
1. Add credentials to .env:
ABV_API_KEY="sk-abv-..."
ABV_BASE_URL="https://app.abv.dev" # or "https://eu.app.abv.dev" for EU
OPENAI_API_KEY="sk-proj-..."
2. Create instrumentation.ts:
import dotenv from "dotenv";
dotenv.config();

import { NodeSDK } from "@opentelemetry/sdk-node";
import { ABVSpanProcessor } from "@abvdev/otel";

const sdk = new NodeSDK({
  spanProcessors: [
    new ABVSpanProcessor({
      apiKey: process.env.ABV_API_KEY,
      baseUrl: process.env.ABV_BASE_URL,
      exportMode: "immediate",
      flushAt: 1,
      flushInterval: 1,
      additionalHeaders: {
        "Content-Type": "application/json",
        "Accept": "application/json",
      },
    }),
  ],
});

sdk.start();
Important: Import instrumentation.ts as the first import in your application to ensure proper initialization.
Method 1: Context Manager (Recommended)
import "./instrumentation" ; // Must be first!
import {
startActiveObservation ,
startObservation ,
} from "@abvdev/tracing" ;
import OpenAI from "openai" ;
import dotenv from "dotenv" ;
dotenv . config ();
const openai = new OpenAI ({
apiKey: process . env . OPENAI_API_KEY ,
});
async function main () {
await startActiveObservation ( "user-request" , async ( span ) => {
span . update ({
input: { query: "What is the capital of France?" },
});
const model = "gpt-4o" ;
const input = [{ role: "user" , content: "What is the capital of France?" }];
// Create a generation observation
const generation = startObservation (
"llm-call" ,
{
model: model ,
input: input ,
},
{ asType: "generation" }
);
// Make the LLM call
const response = await openai . chat . completions . create ({
messages: input ,
model: model ,
});
const llmOutput = response . choices [ 0 ]. message . content ;
// Update with usage details
generation . update ({
usageDetails: {
prompt_tokens: response . usage . prompt_tokens ,
completion_tokens: response . usage . completion_tokens ,
total_tokens: response . usage . total_tokens ,
prompt_tokens_details: {
cached_tokens: response . usage . prompt_tokens_details . cached_tokens ,
audio_tokens: response . usage . prompt_tokens_details . audio_tokens ,
},
completion_tokens_details: {
reasoning_tokens: response . usage . completion_tokens_details . reasoning_tokens ,
},
},
output: { content: llmOutput },
});
generation . end ();
});
}
main ();
Method 2: Observe Wrapper
Wrap existing functions to trace them automatically:
import "./instrumentation" ; // Must be first!
import { observe , updateActiveObservation } from "@abvdev/tracing" ;
import OpenAI from "openai" ;
import dotenv from "dotenv" ;
dotenv . config ();
const openai = new OpenAI ({
apiKey: process . env . OPENAI_API_KEY ,
});
// Existing function
async function fetchData ( source : string ) {
const model = "gpt-4o" ;
const input = [{ role: "user" , content: "What is the capital of France?" }];
const response = await openai . chat . completions . create ({
messages: input ,
model: model ,
});
const llmOutput = response . choices [ 0 ]. message . content ;
// Update the active observation with usage details
updateActiveObservation (
"generation" ,
{
usageDetails: {
prompt_tokens: response . usage . prompt_tokens ,
completion_tokens: response . usage . completion_tokens ,
total_tokens: response . usage . total_tokens ,
prompt_tokens_details: {
cached_tokens: response . usage . prompt_tokens_details . cached_tokens ,
audio_tokens: response . usage . prompt_tokens_details . audio_tokens ,
},
completion_tokens_details: {
reasoning_tokens: response . usage . completion_tokens_details . reasoning_tokens ,
},
},
output: { content: llmOutput },
}
);
return { data: llmOutput };
}
// Wrap the function to trace it
const tracedFetchData = observe ( fetchData , {
name: "fetch-data" ,
asType: "generation" ,
});
async function main () {
const result = await tracedFetchData ( "API" );
}
main ();
Method 3: Manual Span Creation
For maximum control, create spans manually:
import "./instrumentation" ; // Must be first!
import { startObservation } from "@abvdev/tracing" ;
import OpenAI from "openai" ;
import dotenv from "dotenv" ;
dotenv . config ();
const openai = new OpenAI ({
apiKey: process . env . OPENAI_API_KEY ,
});
async function main () {
const span = startObservation ( "manual-observation" , {
input: { query: "What is the capital of France?" },
});
const model = "gpt-4o" ;
const input = [{ role: "user" , content: "What is the capital of France?" }];
const response = await openai . chat . completions . create ({
messages: input ,
model: model ,
});
const llmOutput = response . choices [ 0 ]. message . content ;
// Create a child generation observation
const generation = span . startObservation (
"llm-call" ,
{
model: model ,
input: input ,
},
{ asType: "generation" }
);
// Update with usage details
generation . update ({
usageDetails: {
prompt_tokens: response . usage . prompt_tokens ,
completion_tokens: response . usage . completion_tokens ,
total_tokens: response . usage . total_tokens ,
prompt_tokens_details: {
cached_tokens: response . usage . prompt_tokens_details . cached_tokens ,
audio_tokens: response . usage . prompt_tokens_details . audio_tokens ,
},
completion_tokens_details: {
reasoning_tokens: response . usage . completion_tokens_details . reasoning_tokens ,
},
},
output: { content: llmOutput },
});
generation . end ();
span . update ({ output: "Successfully answered user request." }). end ();
}
main ();
Automatic Usage Capture via Integrations
Many ABV integrations automatically capture usage and cost from LLM responses. If you’re using an integration and usage data isn’t appearing as expected, contact support.
Ingesting Only Usage (Let ABV Calculate Cost)
You don’t need to calculate costs yourself. Simply ingest usage details, and ABV will calculate costs based on model definitions:
# Python example - only usage, no cost
generation.update(
    usage_details={
        "input": response.usage.input_tokens,
        "output": response.usage.output_tokens,
    }
    # No cost_details - ABV will infer cost from model pricing
)
// TypeScript example - only usage, no cost
generation.update({
  usageDetails: {
    prompt_tokens: response.usage.prompt_tokens,
    completion_tokens: response.usage.completion_tokens,
  },
  // No costDetails - ABV will infer cost from model pricing
});
Common Issues
Usage and cost aren't showing up
Check these:
Verify you’re passing usage_details in the update call
Ensure the model name matches a model definition in ABV
For inference: Check if your model is supported in the model definitions list
Look for errors in your application logs
Possible causes:
Using inference instead of ingestion (switch to ingesting actual usage)
Model pricing may have changed (update model definitions)
Custom models need pricing configured
For historical data: Model definitions aren’t applied retroactively
Missing usage for reasoning models
Solution:
Reasoning models (like OpenAI o1) require ingested usage. ABV cannot infer usage because reasoning tokens are generated internally. Always ingest reasoning_tokens from the API response for these models. Learn more about reasoning models.
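A minimal sketch of ingesting reasoning usage with the OpenAI-style schema shown earlier in this guide (assuming the response exposes completion_tokens_details.reasoning_tokens, as current OpenAI reasoning models do):

# Sketch: explicitly ingest reasoning-token usage for a reasoning model
abv.update_current_generation(
    usage_details={
        "prompt_tokens": response.usage.prompt_tokens,
        "completion_tokens": response.usage.completion_tokens,
        "total_tokens": response.usage.total_tokens,
        "completion_tokens_details": {
            # Reasoning tokens are generated internally and cannot be inferred,
            # so they must come from the provider's usage report
            "reasoning_tokens": response.usage.completion_tokens_details.reasoning_tokens,
        },
    }
)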