How Metrics Work
Understanding the metrics pipeline helps you configure tracking effectively.

Trace and evaluation data ingestion
As your LLM application runs, ABV captures observability traces containing all the context about each request: inputs, outputs, model parameters, timing, costs, user information, and custom metadata. Evaluation results (scores) are associated with these traces.

This raw data forms the foundation for all metrics calculations.
Automatic metric derivation
ABV automatically calculates metrics from your traces:
- Quality: Aggregates user feedback, model-based scores, human annotations, and custom scores
- Cost: Sums token usage and calculates costs based on model pricing
- Latency: Measures request duration including time-to-first-token and generation speed
- Volume: Tracks trace counts and token consumption over time
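As a minimal sketch of the cost derivation described above: per-trace cost is token usage multiplied by per-model rates. The model names and prices below are illustrative placeholders, not ABV's actual pricing data.

```python
# Hypothetical pricing table: USD per 1M tokens as (input_rate, output_rate).
# These figures are made up for illustration.
PRICING = {
    "gpt-4o": (2.50, 10.00),
    "gpt-4o-mini": (0.15, 0.60),
}

def trace_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Cost = input tokens * input rate + output tokens * output rate."""
    in_rate, out_rate = PRICING[model]
    return (input_tokens * in_rate + output_tokens * out_rate) / 1_000_000

print(round(trace_cost("gpt-4o", 1200, 300), 6))  # 0.006
```

Summing this quantity over all traces in a time window yields the aggregate cost metric; the same per-trace records feed the latency and volume aggregations.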
Dimensional slicing
Metrics become actionable when you can analyze them across dimensions:
- By user: Which users consume the most resources or have quality issues?
- By feature: Which parts of your application need optimization?
- By model: How do different models compare on cost, quality, and speed?
- By version: Did your latest deployment improve performance?
- Over time: Are trends moving in the right direction?
Visualization and analysis
Access your metrics through:
- Custom dashboards: Build visualizations tailored to your needs with flexible chart types and filters
- Metrics API: Query metrics programmatically for custom workflows and integrations
- Pre-built views: Start with ABV’s default dashboards showing key metrics out of the box
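For the programmatic path, a metrics query is typically a structured payload specifying the metric, aggregation, grouping, and time range. The field names below are assumptions for illustration only; consult ABV's API reference for the actual request schema.

```python
import json

# Hypothetical query payload: sum cost, grouped by model, over May 2024.
# Field names are illustrative, not ABV's documented schema.
query = {
    "metric": "cost",
    "aggregation": "sum",
    "groupBy": ["model"],
    "fromTimestamp": "2024-05-01T00:00:00Z",
    "toTimestamp": "2024-05-31T23:59:59Z",
}
print(json.dumps(query, indent=2))
```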
Core Metrics
ABV tracks three categories of metrics derived from your observability data:
Quality Metrics
Quality measures how well your LLM application serves user needs. ABV aggregates quality signals from multiple sources:
- User feedback: Thumbs up/down, ratings, or custom feedback mechanisms you instrument
- Model-based scoring: Automated evaluation using LLMs to assess quality dimensions like relevance, correctness, and safety
- Human-in-the-loop scoring: Expert annotations on sampled traces for ground-truth quality assessment
- Custom scores: Application-specific quality metrics you define and track via SDK or API

Track quality over time, across prompt versions, between different LLMs, and segmented by user cohorts to identify what drives successful outcomes.
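Aggregating these signals can be sketched as grouping score records by source and averaging. The score records below are hypothetical; in practice they would come from ABV's evaluation data.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical score records from different quality sources.
scores = [
    {"source": "user-feedback", "value": 1.0},   # thumbs up
    {"source": "user-feedback", "value": 0.0},   # thumbs down
    {"source": "llm-judge", "value": 0.8},
    {"source": "human-annotation", "value": 0.9},
]

# Group values by source, then report the per-source average.
by_source = defaultdict(list)
for s in scores:
    by_source[s["source"]].append(s["value"])
for source, values in sorted(by_source.items()):
    print(source, round(mean(values), 2))
```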
Cost and Latency
Cost tracking: ABV accurately calculates LLM API costs based on token consumption and model pricing. Break down costs by user (who’s driving usage?), session (which conversations are expensive?), geography (regional patterns?), feature (which parts of the app cost most?), model (which provider is cheapest?), and prompt version (did optimization reduce costs?).

Latency measurement: Track request duration from start to finish, including time-to-first-token for streaming responses. Identify slow requests, analyze latency percentiles (p50, p95, p99), and correlate latency with other dimensions like model choice or request complexity.

These metrics help you optimize the cost-performance tradeoff in production.
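The percentile analysis mentioned above can be sketched in a few lines using a nearest-rank estimate over recorded request durations. The sample durations are made up for illustration; note how a single slow outlier dominates the tail percentiles while p50 stays unaffected.

```python
def percentile(values, p):
    """Nearest-rank percentile: the value at position p% through the sorted sample."""
    ordered = sorted(values)
    k = max(0, min(len(ordered) - 1, round(p / 100 * (len(ordered) - 1))))
    return ordered[k]

# Hypothetical request durations in seconds; two slow outliers.
durations = [0.8, 1.1, 0.9, 4.2, 1.0, 1.3, 0.7, 9.5, 1.2, 1.0]
for p in (50, 95, 99):
    print(f"p{p}: {percentile(durations, p):.1f}s")
```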
Volume Metrics
Volume metrics track usage patterns:
- Trace volume: Count of requests over time, showing usage trends and identifying spikes or drops
- Token consumption: Total tokens (input + output) processed, indicating computational load
- User activity: Unique users, sessions, and engagement patterns
- Feature adoption: Which parts of your application see the most usage?

Volume metrics help with capacity planning, identifying growth opportunities, and understanding user behavior.
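Charting trace volume over time amounts to bucketing trace timestamps, e.g. by hour. The trace records below are a hypothetical shape, not ABV's actual export format.

```python
from collections import Counter
from datetime import datetime

# Hypothetical trace records with a timestamp and total token count each.
traces = [
    {"ts": "2024-05-01T09:12:00", "tokens": 540},
    {"ts": "2024-05-01T09:48:00", "tokens": 1210},
    {"ts": "2024-05-01T10:05:00", "tokens": 330},
]

# Bucket requests into hourly bins and sum total token consumption.
volume = Counter(
    datetime.fromisoformat(t["ts"]).strftime("%Y-%m-%d %H:00") for t in traces
)
total_tokens = sum(t["tokens"] for t in traces)
print(dict(volume))   # {'2024-05-01 09:00': 2, '2024-05-01 10:00': 1}
print(total_tokens)   # 2080
```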
Dimensions for Analysis
Metrics become actionable when analyzed across dimensions. Add these fields to your traces to enable rich analysis:
Trace Name
Set a name field on your traces to differentiate between use cases, features, or workflows. Examples: “document-summarization”, “code-generation”, “customer-support-chat”.

This lets you compare quality, cost, and performance across different parts of your application. Learn more about trace naming →
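With a name field in place, per-feature comparisons reduce to a group-by over traces. The records below are hypothetical, not ABV's actual export format.

```python
from collections import defaultdict

# Hypothetical traces carrying a name field and a computed cost.
traces = [
    {"name": "document-summarization", "cost": 0.004},
    {"name": "code-generation", "cost": 0.011},
    {"name": "document-summarization", "cost": 0.006},
]

# Aggregate cost per trace name to compare features.
cost_by_name = defaultdict(float)
for t in traces:
    cost_by_name[t["name"]] += t["cost"]
for name, cost in sorted(cost_by_name.items()):
    print(name, round(cost, 3))
```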
User Tracking
Add a userId to your traces to analyze metrics per user. Identify power users consuming disproportionate resources, users with quality issues, or cohorts with different usage patterns.

Essential for per-user billing, user-level analytics, and personalization insights. Learn more about user tracking →
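Finding power users from userId-tagged traces can be sketched the same way: sum resource usage per user and look at the top of the distribution. The records below are hypothetical.

```python
from collections import defaultdict

# Hypothetical traces tagged with a userId and a computed cost.
traces = [
    {"userId": "alice", "cost": 0.012},
    {"userId": "bob", "cost": 0.004},
    {"userId": "alice", "cost": 0.020},
]

# Sum cost per user, then identify the heaviest consumer.
per_user = defaultdict(float)
for t in traces:
    per_user[t["userId"]] += t["cost"]
top_user = max(per_user, key=per_user.get)
print(top_user, round(per_user[top_user], 3))  # alice 0.032
```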
Tags

Apply tags to your traces to filter and segment metrics by arbitrary labels, such as environment, experiment, or customer tier.

Release and Version
Track release numbers and version identifiers to measure how changes affect your metrics. Did the new prompt version improve quality? Did the model switch reduce costs? Did the latest deployment increase latency?

Version tracking enables data-driven decision-making about how your LLM application evolves.
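The "did the new version help?" question is answered by comparing the same metric across version labels. The scores below are hypothetical evaluation results for two prompt versions.

```python
from statistics import mean

# Hypothetical quality scores grouped by prompt version.
scores = {
    "v1": [0.72, 0.68, 0.75, 0.70],
    "v2": [0.81, 0.78, 0.84, 0.79],
}

# Positive delta: the new version improved average quality.
delta = mean(scores["v2"]) - mean(scores["v1"])
print(f"quality change: {delta:+.3f}")
```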
Features
ABV provides two primary interfaces for working with metrics:

Custom Dashboards
Build visualizations tailored to your needs with flexible chart types, filters, and time ranges. Start with pre-built dashboards showing key metrics, then customize to match your workflow. Learn more about custom dashboards →

Metrics API
Query metrics programmatically with flexible filtering, aggregation, and time bucketing. Build custom workflows, integrate with external systems, or automate reporting. Learn more about the metrics API →

Next Steps
Custom Dashboards
Build and configure dashboards for your metrics analysis needs
Metrics API
Query metrics programmatically for custom workflows and integrations
Observability & Tracing
Learn how to instrument your application to capture rich trace data
Evaluations
Set up evaluation workflows to generate quality scores for your metrics