Understanding how your LLM application performs in production requires more than raw traces—you need actionable metrics derived from your observability data. ABV’s metrics platform transforms trace and evaluation data into insights you can act on.

How Metrics Work

Understanding the metrics pipeline helps you configure tracking effectively:

Trace and evaluation data ingestion

As your LLM application runs, ABV captures observability traces containing all the context about each request: inputs, outputs, model parameters, timing, costs, user information, and custom metadata. Evaluation results (scores) get associated with these traces. This raw data forms the foundation for all metrics calculations.

Automatic metric derivation

ABV automatically calculates metrics from your traces:
  • Quality: Aggregates user feedback, model-based scores, human annotations, and custom scores
  • Cost: Sums token usage and calculates costs based on model pricing
  • Latency: Measures request duration including time-to-first-token and generation speed
  • Volume: Tracks trace counts and token consumption over time
These calculations happen automatically without additional instrumentation.
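To make the derivation step concrete, here is a minimal sketch of how cost and latency metrics can fall out of a captured trace. The trace shape and the per-1K-token prices are illustrative assumptions, not ABV's actual data model or pricing table:

```python
# Hypothetical pricing table (USD per 1K tokens) -- for illustration only.
PRICING_PER_1K = {"example-model": {"input": 0.0005, "output": 0.0015}}

# An assumed trace record; real traces carry more fields.
trace = {
    "model": "example-model",
    "usage": {"input_tokens": 1200, "output_tokens": 300},
    "start": 0.0, "first_token": 0.42, "end": 1.87,  # seconds
}

def derive_metrics(trace: dict) -> dict:
    """Derive cost, latency, and volume figures from one trace record."""
    price = PRICING_PER_1K[trace["model"]]
    usage = trace["usage"]
    cost = (usage["input_tokens"] / 1000 * price["input"]
            + usage["output_tokens"] / 1000 * price["output"])
    return {
        "cost_usd": round(cost, 6),
        "latency_s": trace["end"] - trace["start"],
        "time_to_first_token_s": trace["first_token"] - trace["start"],
        "total_tokens": usage["input_tokens"] + usage["output_tokens"],
    }

metrics = derive_metrics(trace)
```

Because every metric is a pure function of trace data, no extra instrumentation is needed beyond emitting the trace itself.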

Dimensional slicing

Metrics become actionable when you can analyze them across dimensions:
  • By user: Which users consume the most resources or have quality issues?
  • By feature: Which parts of your application need optimization?
  • By model: How do different models compare on cost, quality, and speed?
  • By version: Did your latest deployment improve performance?
  • Over time: Are trends moving in the right direction?
Add trace names, user IDs, tags, and version numbers to your traces to enable this slicing.
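A sketch of what attaching those dimensions to a trace might look like. The field names (`name`, `user_id`, `tags`, `version`, `metadata`) follow a plausible payload shape, not necessarily ABV's exact SDK parameters; check the SDK reference for the real ones:

```python
def build_trace_payload(name, user_id, tags=None, version=None, metadata=None):
    """Assemble the dimensional fields that enable per-user, per-feature,
    and per-version slicing in the metrics views. Field names are assumed."""
    payload = {"name": name, "user_id": user_id}
    if tags:
        payload["tags"] = list(tags)
    if version:
        payload["version"] = version
    if metadata:
        payload["metadata"] = dict(metadata)
    return payload

trace = build_trace_payload(
    name="customer-support-chat",
    user_id="user-1234",
    tags=["enterprise", "production"],
    version="v2.3.1",
)
```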

Visualization and analysis

Access your metrics through:
  • Custom dashboards: Build visualizations tailored to your needs with flexible chart types and filters
  • Metrics API: Query metrics programmatically for custom workflows and integrations
  • Pre-built views: Start with ABV’s default dashboards showing key metrics out of the box
Choose the analysis approach that fits your workflow.

Core Metrics

ABV tracks three categories of metrics derived from your observability data:
Quality measures how well your LLM application serves user needs. ABV aggregates quality signals from multiple sources:
  • User feedback: Thumbs up/down, ratings, or custom feedback mechanisms you instrument
  • Model-based scoring: Automated evaluation using LLMs to assess quality dimensions like relevance, correctness, and safety
  • Human-in-the-loop scoring: Expert annotations on sampled traces for ground-truth quality assessment
  • Custom scores: Application-specific quality metrics you define and track via the SDK or API
Track quality over time, across prompt versions, between different LLMs, and segmented by user cohorts to identify what drives successful outcomes.
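The custom-scores path can be sketched as a small payload builder. The field names (`trace_id`, `name`, `value`, `comment`) follow a common scoring-API shape and are assumptions; consult ABV's SDK/API reference for the actual contract:

```python
def build_score(trace_id: str, name: str, value: float, comment: str = "") -> dict:
    """Validate and package an application-specific quality score.
    The [0, 1] range is a convention chosen for this sketch."""
    if not 0.0 <= value <= 1.0:
        raise ValueError("score value expected in [0, 1] for this sketch")
    score = {"trace_id": trace_id, "name": name, "value": value}
    if comment:
        score["comment"] = comment
    return score

# e.g. a retrieval-relevance score computed by your own evaluator
score = build_score("trace-abc123", "retrieval_relevance", 0.87, "top-3 chunks on-topic")
```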
Cost tracking: ABV accurately calculates LLM API costs based on token consumption and model pricing. Break down costs by:
  • User: who’s driving usage?
  • Session: which conversations are expensive?
  • Geography: are there regional patterns?
  • Feature: which parts of the app cost most?
  • Model: which provider is cheapest?
  • Prompt version: did optimization reduce costs?
Latency measurement: Track request duration from start to finish, including time-to-first-token for streaming responses. Identify slow requests, analyze latency percentiles (p50, p95, p99), and correlate latency with other dimensions like model choice or request complexity. These metrics help you optimize the cost-performance tradeoff in production.
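The percentile figures mentioned above can be illustrated with a pure-Python sketch using linear interpolation over sorted request durations. ABV computes these server-side; this only shows the math:

```python
def percentile(durations, p):
    """p-th percentile (0-100) with linear interpolation between ranks."""
    s = sorted(durations)
    k = (len(s) - 1) * p / 100.0
    lo, hi = int(k), min(int(k) + 1, len(s) - 1)
    return s[lo] + (k - lo) * (s[hi] - s[lo])

latencies_ms = list(range(100, 1100, 10))  # 100 request durations: 100..1090 ms
p50 = percentile(latencies_ms, 50)
p95 = percentile(latencies_ms, 95)
p99 = percentile(latencies_ms, 99)
```

p95 and p99 react to tail latency that an average would hide, which is why they are the usual targets for latency budgets.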
Volume metrics track usage patterns:
  • Trace volume: Count of requests over time, showing usage trends and identifying spikes or drops
  • Token consumption: Total tokens (input + output) processed, indicating computational load
  • User activity: Unique users, sessions, and engagement patterns
  • Feature adoption: Which parts of your application see the most usage?
Volume metrics help with capacity planning, identifying growth opportunities, and understanding user behavior.
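The volume roll-up amounts to bucketing trace timestamps (hourly here) and accumulating request counts and token totals. The input rows are illustrative, not a real export format:

```python
from collections import defaultdict
from datetime import datetime

# Illustrative trace rows -- real exports carry many more fields.
traces = [
    {"ts": "2024-06-01T09:12:00", "tokens": 950},
    {"ts": "2024-06-01T09:47:00", "tokens": 1200},
    {"ts": "2024-06-01T10:03:00", "tokens": 400},
]

def hourly_volume(rows):
    """Group traces into hourly buckets, summing counts and tokens."""
    buckets = defaultdict(lambda: {"traces": 0, "tokens": 0})
    for row in rows:
        hour = datetime.fromisoformat(row["ts"]).strftime("%Y-%m-%dT%H:00")
        buckets[hour]["traces"] += 1
        buckets[hour]["tokens"] += row["tokens"]
    return dict(buckets)

volume = hourly_volume(traces)
```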

Dimensions for Analysis

Metrics become actionable when analyzed across dimensions. Add these fields to your traces to enable rich analysis:
Set a name field on your traces to differentiate between use cases, features, or workflows. Examples: “document-summarization”, “code-generation”, “customer-support-chat”. This lets you compare quality, cost, and performance across different parts of your application. Learn more about trace naming →
Add a userId to your traces to analyze metrics per user. Identify power users consuming disproportionate resources, users with quality issues, or cohorts with different usage patterns. This is essential for per-user billing, user-level analytics, and personalization insights. Learn more about user tracking →
Add tags to traces for flexible filtering and grouping. Tag by customer type (“enterprise”, “free-tier”), environment (“production”, “staging”), feature flags, experiment variants, or any custom dimension. Tags provide the most flexible slicing mechanism for ad-hoc analysis. Learn more about tags →
Track release numbers and version identifiers to measure how changes affect your metrics. Did the new prompt version improve quality? Did the model switch reduce costs? Did the latest deployment increase latency? Version tracking enables data-driven decisions about how your LLM application evolves.
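A version-over-version comparison reduces to grouping a quality score by the trace's version field and averaging. The data rows are illustrative:

```python
from collections import defaultdict

# Illustrative scored traces from two releases.
scored_traces = [
    {"version": "v1", "quality": 0.62}, {"version": "v1", "quality": 0.78},
    {"version": "v2", "quality": 0.81}, {"version": "v2", "quality": 0.89},
]

def mean_quality_by_version(rows):
    """Average the quality score per version to compare releases."""
    sums = defaultdict(lambda: [0.0, 0])
    for row in rows:
        acc = sums[row["version"]]
        acc[0] += row["quality"]
        acc[1] += 1
    return {v: round(total / n, 4) for v, (total, n) in sums.items()}

by_version = mean_quality_by_version(scored_traces)
```

With enough traces per version, a comparison like this answers “did the new release improve quality?” directly from production data.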

Features

ABV provides two primary interfaces for working with metrics:

Custom Dashboards

Build visualizations tailored to your needs with flexible chart types, filters, and time ranges. Start with pre-built dashboards showing key metrics, then customize to match your workflow. Learn more about custom dashboards →

Metrics API

Query metrics programmatically with flexible filtering, aggregation, and time bucketing. Build custom workflows, integrate with external systems, or automate reporting. Learn more about the metrics API →
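A sketch of assembling such a query. The endpoint path and parameter names (`metric`, `groupBy`, `fromTimestamp`, `toTimestamp`) are assumptions for illustration; check ABV's API reference for the actual contract before relying on them:

```python
from urllib.parse import urlencode

BASE_URL = "https://abv.example.com/api/public/metrics"  # hypothetical host

def build_metrics_query(metric, group_by, start, end):
    """Build a metrics query URL; fetch it with any HTTP client plus your API key."""
    params = {
        "metric": metric,          # e.g. cost, latency, count -- assumed values
        "groupBy": group_by,       # dimension to slice on
        "fromTimestamp": start,
        "toTimestamp": end,
    }
    return f"{BASE_URL}?{urlencode(params)}"

url = build_metrics_query("cost", "model", "2024-06-01T00:00:00Z", "2024-06-08T00:00:00Z")
```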

Next Steps