How to Use This FAQ

This guide is organized by topic to help you quickly find answers. Each question includes working code examples for both the Python and JavaScript/TypeScript SDKs.

Getting Started

ABV provides comprehensive prompt management through the UI, SDKs, and API.
Creating Prompts:
Via UI:
  1. Sign in to ABV
  2. Navigate to Prompts section
  3. Click “Create Prompt”
  4. Enter prompt content with {{variables}}
  5. Add configuration (model, temperature, etc.)
  6. Assign labels for deployment
Via Python SDK:
from abvdev import ABV

abv = ABV(api_key="sk-abv-...", host="https://app.abv.dev")

# Create text prompt
abv.create_prompt(
    name="movie-critic",
    type="text",
    prompt="As a {{criticlevel}} movie critic, do you like {{movie}}?",
    labels=["production"],
    config={
        "model": "gpt-4o",
        "temperature": 0.7,
        "max_tokens": 1000
    }
)

# Create chat prompt
abv.create_prompt(
    name="chat-assistant",
    type="chat",
    prompt=[
        {"role": "system", "content": "You are a {{persona}} assistant"},
        {"role": "user", "content": "{{user_query}}"}
    ],
    labels=["production"]
)
Via JavaScript/TypeScript SDK:
import { ABVClient } from "@abvdev/client";

const abv = new ABVClient();

await abv.prompt.create({
  name: "movie-critic",
  type: "text",
  prompt: "As a {{criticlevel}} critic, do you like {{movie}}?",
  labels: ["production"],
  config: {
    model: "gpt-4o",
    temperature: 0.7
  }
});
Fetching Prompts:
# Python
prompt = abv.get_prompt("movie-critic")  # Gets production version
prompt = abv.get_prompt("movie-critic", version=1)  # Specific version
prompt = abv.get_prompt("movie-critic", label="staging")  # Specific label
prompt = abv.get_prompt("movie-critic", label="latest")  # Latest version

# Compile with variables
compiled = prompt.compile(criticlevel="expert", movie="Dune 2")
// JavaScript/TypeScript
const prompt = await abv.prompt.get("movie-critic");
const prompt2 = await abv.prompt.get("movie-critic", { version: 1 });
const prompt3 = await abv.prompt.get("movie-critic", { label: "staging" });

// Compile with variables
const compiled = prompt.compile({ criticlevel: "expert", movie: "Dune 2" });
Updating Labels:
# Python
abv.update_prompt(
    name="movie-critic",
    version=2,
    new_labels=["production", "experiment-a"]
)
// JavaScript/TypeScript
await abv.prompt.update({
  name: "movie-critic",
  version: 2,
  newLabels: ["production", "experiment-a"]
});
Key Features:
  • Version control with automatic versioning
  • Labels for deployment management (production, staging, etc.)
  • Config versioning alongside prompts
  • Diff view to see changes between versions
  • Protected labels for production safety
  • Rollback capability with one click or API call
  • Variables with {{mustache}} syntax for dynamic content
Prompt engineering is the practice of designing and optimizing text prompts to get better outputs from Large Language Models (LLMs).
Why it matters:
  • Better prompt = better LLM output quality
  • Can significantly impact accuracy, relevance, and usefulness
  • More cost-effective than fine-tuning models
  • Faster iteration cycle than model training
Key Techniques:
1. Clear Instructions: Be specific about what you want, provide context and constraints, and define the output format.
Bad: "Write about dogs"

Good: "Write a 3-paragraph informative article about Golden Retrievers,
      including their history, temperament, and care requirements.
      Use a friendly tone suitable for first-time dog owners."
2. Few-Shot Examples: Show the model examples of desired output to establish patterns and format (the sketch after these techniques shows how such a prompt can be stored in ABV).
Classify sentiment:

Text: "I love this product!"
Sentiment: Positive

Text: "This is terrible"
Sentiment: Negative

Text: "{{user_input}}"
Sentiment:
3. Role/Persona: Define who the LLM should act as, which influences tone and expertise level.
You are an expert software architect with 15 years of experience
in distributed systems. Analyze the following system design...
4. Chain of Thought: Ask the model to think step-by-step to improve reasoning and accuracy.
Solve this problem step by step, showing your reasoning:
{{problem}}
5. Constraints and Format: Specify output format (JSON, markdown, etc.), set length limits, and define what to avoid.
Respond in JSON format with keys: "summary", "key_points", "confidence_score".
Keep summary under 100 words.
Do not include personal opinions.
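For example, the few-shot sentiment classifier shown above can be stored and versioned in ABV with the same create_prompt call from the Getting Started section (a minimal sketch; the prompt name, labels, and config values are illustrative):
from abvdev import ABV

abv = ABV(api_key="sk-abv-...", host="https://app.abv.dev")

# Version the few-shot classifier as a text prompt with a {{user_input}} variable
abv.create_prompt(
    name="sentiment-classifier",  # illustrative name
    type="text",
    prompt=(
        "Classify sentiment:\n\n"
        'Text: "I love this product!"\nSentiment: Positive\n\n'
        'Text: "This is terrible"\nSentiment: Negative\n\n'
        'Text: "{{user_input}}"\nSentiment:'
    ),
    labels=["staging"],  # promote to production after review
    config={"model": "gpt-4o", "temperature": 0.0}
)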
ABV’s Role in Prompt Engineering:
ABV helps you iterate on prompts systematically:
  • Version control to track changes and compare iterations
  • A/B testing to compare variants with statistical rigor
  • Metrics tracking to measure improvements objectively
  • Tracing to see prompts in context with real user interactions
  • Team collaboration via UI for cross-functional input
  • Quick rollbacks when changes don’t work as expected
Best Practices:
  1. Start simple, then iterate based on results
  2. Test with diverse inputs representing edge cases
  3. Measure performance metrics (latency, cost, quality)
  4. Use version control to track what works
  5. A/B test significant changes in production
  6. Document what works and why for team knowledge
  7. Keep prompts maintainable and readable for future iterations

Configuration & Setup

ABV prompts are cached client-side by default, so network-related issues are minimized after the first fetch. However, you can configure network behavior for initial requests.
Caching Configuration:
The default cache TTL is 60 seconds. You can customize it to reduce network calls.
Python SDK:
from abvdev import ABV

abv = ABV(api_key="sk-abv-...", host="https://app.abv.dev")

# Increase cache duration to reduce network calls
prompt = abv.get_prompt("my-prompt", cache_ttl_seconds=300)  # 5 minutes

# Disable caching for development (see all changes immediately)
prompt = abv.get_prompt("my-prompt", cache_ttl_seconds=0)
JavaScript/TypeScript SDK:
import { ABVClient } from "@abvdev/client";

const abv = new ABVClient();

// Increase cache duration
const prompt = await abv.prompt.get("my-prompt", {
  cacheTtlSeconds: 300  // 5 minutes
});

// Disable caching for development
const prompt = await abv.prompt.get("my-prompt", {
  cacheTtlSeconds: 0
});
Guaranteed Availability:
For critical applications requiring 100% availability, use these strategies:
1. Pre-fetch prompts on startup to populate the cache:
from abvdev import ABV
import sys

abv = ABV(api_key="sk-abv-...", host="https://app.abv.dev")

# Pre-fetch during application startup
startup_prompts = ["critical-prompt-1", "critical-prompt-2"]

try:
    for prompt_name in startup_prompts:
        abv.get_prompt(prompt_name)
    print("All critical prompts cached successfully")
except Exception as e:
    print(f"CRITICAL: Failed to fetch prompts: {e}")
    sys.exit(1)  # Fail fast if prompts unavailable
2. Provide fallback prompts for when the API is unavailable:
from abvdev import ABV

abv = ABV(api_key="sk-abv-...", host="https://app.abv.dev")

# Use with fallback
try:
    prompt = abv.get_prompt("my-prompt")
    prompt_text = prompt.compile(input="user query")
except Exception as e:
    # Log error and use fallback
    print(f"Warning: Using fallback prompt due to: {e}")
    prompt_text = "As a helpful assistant, respond to: {{input}}"
How caching works:
  • Cache hit: Prompt returned immediately from memory (no network call)
  • Stale cache: Old prompt returned immediately while revalidating in background (stale-while-revalidate pattern)
  • Cache miss: Prompt fetched from the API (ABV uses a Redis cache server-side for low latency, ~15-50ms median)
See also: Guaranteed Availability Guide for comprehensive strategies.

Performance & Reliability

ABV prompts are automatically cached client-side in the SDKs with intelligent background revalidation, ensuring minimal latency impact.
How Caching Works (a sketch of this pattern follows the list):
  1. Cache Hit - Prompt in cache and fresh → returned immediately (0ms network overhead)
  2. Stale Cache - Prompt in cache but expired → returned immediately, revalidated in background
  3. Cache Miss - First request → fetched from API (low latency, Redis-backed on ABV side)
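Conceptually, the cache behaves like the following sketch (illustrative only, not the SDK’s actual implementation; _cache, _refresh, and _fetch_from_api are hypothetical names):
import time
import threading

_cache = {}  # prompt name -> (prompt, fetched_at)
CACHE_TTL_SECONDS = 60  # default TTL

def _fetch_from_api(name):
    # Placeholder for the real network call to ABV
    return f"<prompt '{name}'>"

def _refresh(name):
    prompt = _fetch_from_api(name)
    _cache[name] = (prompt, time.time())
    return prompt

def get_prompt_cached(name):
    entry = _cache.get(name)
    if entry is None:
        return _refresh(name)  # 3. cache miss: blocking fetch from the API
    prompt, fetched_at = entry
    if time.time() - fetched_at < CACHE_TTL_SECONDS:
        return prompt  # 1. cache hit: fresh, returned immediately
    # 2. stale: return the old prompt immediately, revalidate in the background
    threading.Thread(target=_refresh, args=(name,), daemon=True).start()
    return prompt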
Default Behavior:
# Python - Default 60-second cache
from abvdev import ABV

abv = ABV(api_key="sk-abv-...", host="https://app.abv.dev")

# First call - fetches from API and caches
prompt = abv.get_prompt("my-prompt")

# Subsequent calls within 60s - instant from cache
prompt = abv.get_prompt("my-prompt")  # No network call
Custom Cache Duration:
Python SDK:
# Cache for 5 minutes
prompt = abv.get_prompt("my-prompt", cache_ttl_seconds=300)

# Cache for 1 hour (for stable production prompts)
prompt = abv.get_prompt("my-prompt", cache_ttl_seconds=3600)

# Disable caching (development/testing)
prompt = abv.get_prompt("my-prompt", cache_ttl_seconds=0)

# Common pattern: no cache + latest version in development
prompt = abv.get_prompt(
    "my-prompt",
    cache_ttl_seconds=0,
    label="latest"
)
JavaScript/TypeScript SDK:
import { ABVClient } from "@abvdev/client";

const abv = new ABVClient();

// Cache for 5 minutes
const prompt1 = await abv.prompt.get("my-prompt", {
  cacheTtlSeconds: 300
});

// Cache for 1 hour
const prompt2 = await abv.prompt.get("my-prompt", {
  cacheTtlSeconds: 3600
});

// Disable caching
const prompt3 = await abv.prompt.get("my-prompt", {
  cacheTtlSeconds: 0
});

// Development pattern
const devPrompt = await abv.prompt.get("my-prompt", {
  cacheTtlSeconds: 0,
  label: "latest"
});
Pre-fetching for Zero Latency:
Load prompts during application startup to eliminate runtime latency:
# Python
from abvdev import ABV

abv = ABV(api_key="sk-abv-...", host="https://app.abv.dev")

# Pre-fetch during startup
critical_prompts = [
    "user-greeting",
    "error-handler",
    "main-assistant"
]

for prompt_name in critical_prompts:
    abv.get_prompt(prompt_name)  # Populates cache

# Now runtime requests are instant (0ms)
// JavaScript/TypeScript
const abv = new ABVClient();

// Pre-fetch during startup
const criticalPrompts = [
  "user-greeting",
  "error-handler",
  "main-assistant"
];

await Promise.all(
  criticalPrompts.map(name => abv.prompt.get(name))
);

// Now runtime requests are instant
Fallback for 100% Availability:
# Python
from abvdev import ABV

abv = ABV(api_key="sk-abv-...", host="https://app.abv.dev")

def get_prompt_with_fallback(name: str, fallback: str):
    try:
        return abv.get_prompt(name)
    except Exception as e:
        # Log error for monitoring
        print(f"WARNING: Failed to fetch prompt '{name}': {e}")
        # Return a lightweight stand-in exposing .prompt and .compile() like an ABV prompt
        return type('Prompt', (), {'prompt': fallback, 'compile': lambda self, **kwargs: fallback})()

# Usage
prompt = get_prompt_with_fallback(
    "my-prompt",
    fallback="You are a helpful assistant. {{user_input}}"
)
Performance Benchmarks:
From ABV’s testing (1000 sequential requests; a reproduction sketch follows the lists below):
Without caching (cache_ttl_seconds=0):
  • Median latency: ~50ms
  • 95th percentile: ~100ms
  • 99th percentile: ~150ms
With caching enabled (default):
  • Cached requests: 0ms (instant, in-memory)
  • Stale-while-revalidate: 0ms (instant return, background update)
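You can reproduce the uncached numbers yourself with a simple timing loop (a sketch; it assumes a prompt named "my-prompt" already exists and uses the get_prompt call documented above):
import statistics
import time
from abvdev import ABV

abv = ABV(api_key="sk-abv-...", host="https://app.abv.dev")

latencies_ms = []
for _ in range(1000):
    start = time.perf_counter()
    abv.get_prompt("my-prompt", cache_ttl_seconds=0)  # force a network call each time
    latencies_ms.append((time.perf_counter() - start) * 1000)

latencies_ms.sort()
print(f"median: {statistics.median(latencies_ms):.1f} ms")
print(f"p95:    {latencies_ms[int(0.95 * len(latencies_ms)) - 1]:.1f} ms")
print(f"p99:    {latencies_ms[int(0.99 * len(latencies_ms)) - 1]:.1f} ms")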
Best Practices:
  1. Production: Use default 60s cache or longer (5-10 minutes) for stable prompts
  2. Development: Disable cache to see changes immediately
  3. Critical paths: Pre-fetch prompts on application startup
  4. High availability: Implement fallback prompts for mission-critical flows
  5. Staging: Use moderate cache (30-60s) for balance between freshness and performance
  6. Monitor: Check ABV status page (status.abv.dev) for API availability
When to Adjust Cache TTL:
  • Increase TTL: Stable production prompts, reduce API calls, improve performance
  • Decrease TTL: Frequently updated prompts, need faster updates
  • Disable (0s): Local development, testing prompt changes in real-time
  • Pre-fetch: Startup-critical prompts, serverless cold start optimization
See also: Client-Side Caching Guide for technical implementation details.

Advanced Features

ABV provides built-in version control for all prompts with automatic versioning and label-based deployment.
Automatic Versioning:
Every time you create or update a prompt, ABV automatically assigns an incrementing version number:
# Python
from abvdev import ABV

abv = ABV(api_key="sk-abv-...", host="https://app.abv.dev")

# First creation - becomes version 1
abv.create_prompt(
    name="movie-critic",
    prompt="Do you like {{movie}}?",
    labels=["production"]
)

# Update (create new version) - becomes version 2
abv.create_prompt(
    name="movie-critic",
    prompt="As a critic, do you like {{movie}}?",
    labels=["staging"]
)
Labels for Deployment:
Use labels to manage which version is deployed to different environments:
# Assign labels to versions
abv.update_prompt(
    name="movie-critic",
    version=1,
    new_labels=["production"]
)

abv.update_prompt(
    name="movie-critic",
    version=2,
    new_labels=["staging", "experiment-a"]
)
Fetching Specific Versions:
# Python
# Get production version (default behavior)
prod_prompt = abv.get_prompt("movie-critic")
prod_prompt = abv.get_prompt("movie-critic", label="production")

# Get staging version
staging_prompt = abv.get_prompt("movie-critic", label="staging")

# Get specific version number
v1_prompt = abv.get_prompt("movie-critic", version=1)

# Get latest version (most recent, regardless of labels)
latest_prompt = abv.get_prompt("movie-critic", label="latest")
// JavaScript/TypeScript
const prodPrompt = await abv.prompt.get("movie-critic");
const stagingPrompt = await abv.prompt.get("movie-critic", { label: "staging" });
const v1Prompt = await abv.prompt.get("movie-critic", { version: 1 });
const latestPrompt = await abv.prompt.get("movie-critic", { label: "latest" });
Version Comparison:
The ABV UI provides a diff view to compare prompt versions (a local diff sketch follows this list):
  • See exactly what changed between versions (text diff)
  • Track who made changes and when (audit trail)
  • Review config changes alongside prompt changes
  • View commit messages explaining why changes were made
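The diff view lives in the UI, but for a quick local comparison you can also fetch two versions and diff them yourself (a sketch; it assumes the returned prompt object exposes the raw template on .prompt, as used elsewhere in this FAQ):
import difflib
from abvdev import ABV

abv = ABV(api_key="sk-abv-...", host="https://app.abv.dev")

# Fetch two specific versions of the same prompt
v1 = abv.get_prompt("movie-critic", version=1)
v2 = abv.get_prompt("movie-critic", version=2)

# Print a unified diff of the raw templates
diff = difflib.unified_diff(
    v1.prompt.splitlines(),
    v2.prompt.splitlines(),
    fromfile="movie-critic v1",
    tofile="movie-critic v2",
    lineterm="",
)
print("\n".join(diff))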
Rollback:
To roll back to a previous version, simply reassign the production label:
# Rollback: make version 1 the production version again
abv.update_prompt(
    name="movie-critic",
    version=1,
    new_labels=["production"]
)
Or perform the rollback in the UI with one click.
Protected Labels:
For additional production safety, admins can mark labels as “protected”:
  • Only admins/owners can modify protected labels
  • Prevents accidental changes to production prompts
  • Enforces change management process
  • Configure in project settings
Best Practices:
  1. Always use production label for deployed versions
  2. Use staging for testing before promoting to production
  3. Use descriptive labels for experiments (e.g., experiment-longer-context, variant-a)
  4. The latest label is automatically maintained by ABV (always points to newest version)
  5. Never delete old versions - keep history for debugging and rollback
  6. Use commit messages to document why changes were made
  7. Review diffs before promoting to production to catch unintended changes
Common Workflow (a condensed code sketch follows this list):
  1. Develop prompt changes locally (use label="latest" and cache_ttl_seconds=0)
  2. Deploy to staging (labels=["staging"])
  3. Test in staging environment
  4. Review metrics and validate quality
  5. Promote to production by reassigning production label
  6. Monitor production metrics
  7. Rollback if issues detected (reassign production to previous version)
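In SDK terms, the workflow maps onto the calls already shown above (a condensed sketch; the prompt name and version numbers are illustrative):
from abvdev import ABV

abv = ABV(api_key="sk-abv-...", host="https://app.abv.dev")

# 1. Develop locally: always fetch the newest version, bypass the cache
dev_prompt = abv.get_prompt("movie-critic", label="latest", cache_ttl_seconds=0)

# 2. Deploy a new version to staging
abv.create_prompt(
    name="movie-critic",
    prompt="As a {{criticlevel}} critic, do you like {{movie}}?",
    labels=["staging"]
)

# 3-5. After testing and review, promote that version to production
abv.update_prompt(name="movie-critic", version=3, new_labels=["production"])

# 7. Rollback if issues are detected: point production back at the previous version
abv.update_prompt(name="movie-critic", version=2, new_labels=["production"])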
See also: Version Control Guide for deployment workflows.
ABV enables A/B testing by using labels to identify different prompt variants, then randomly selecting between them in your application.
Step 1: Create Prompt Variants
Create multiple versions and label them for your test:
from abvdev import ABV

abv = ABV(api_key="sk-abv-...", host="https://app.abv.dev")

# Variant A - shorter prompt
abv.create_prompt(
    name="movie-critic",
    prompt="Do you like {{movie}}?",
    labels=["prod-a"]
)

# Variant B - more detailed prompt
abv.create_prompt(
    name="movie-critic",
    prompt="As an expert film critic, provide your opinion on {{movie}}. Include analysis of the plot, acting, and cinematography.",
    labels=["prod-b"]
)
Step 2: Implement Random Selection
Python SDK:
from abvdev import ABV
from openai import OpenAI
import random

abv = ABV(api_key="sk-abv-...", host="https://app.abv.dev")
openai_client = OpenAI(api_key="sk-proj-...")

# Fetch both variants (cached after first request)
prompt_a = abv.get_prompt("movie-critic", label="prod-a")
prompt_b = abv.get_prompt("movie-critic", label="prod-b")

# Randomly select (50/50 split)
selected_prompt = random.choice([prompt_a, prompt_b])

# Use in LLM call with tracing (crucial for metrics by variant)
with abv.start_as_current_observation(
    as_type="generation",
    name="movie-review",
    model="gpt-4o",
    prompt=selected_prompt  # Links to specific variant for metrics tracking
) as generation:
    compiled = selected_prompt.compile(movie="Dune 2")

    response = openai_client.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user", "content": compiled}]
    )

    generation.update(
        output=response.choices[0].message.content,
        usage_details={
            "input": response.usage.input_tokens,
            "output": response.usage.output_tokens
        }
    )

abv.flush()
JavaScript/TypeScript SDK:
import { ABVClient } from "@abvdev/client";
import { startObservation } from "@abvdev/tracing";
import OpenAI from "openai";

const abv = new ABVClient();
const openai = new OpenAI();

async function runABTest() {
  // Fetch both variants
  const promptA = await abv.prompt.get("movie-critic", { label: "prod-a" });
  const promptB = await abv.prompt.get("movie-critic", { label: "prod-b" });

  // Randomly select (50/50 split)
  const selectedPrompt = Math.random() < 0.5 ? promptA : promptB;

  // Use in LLM call with tracing
  const generation = startObservation(
    "movie-review",
    {
      model: "gpt-4o",
      prompt: selectedPrompt  // Links to specific variant
    },
    { asType: "generation" }
  );

  const compiled = selectedPrompt.compile({ movie: "Dune 2" });

  const response = await openai.chat.completions.create({
    model: "gpt-4o",
    messages: [{ role: "user", content: compiled }]
  });

  generation.update({
    output: response.choices[0].message.content,
    usageDetails: {
      prompt_tokens: response.usage.prompt_tokens,
      completion_tokens: response.usage.completion_tokens
    }
  });

  generation.end();
}
Step 3: Analyze Results
Navigate to your prompt in the ABV UI and view the Metrics tab.
Compare Metrics by Variant:
  • Response latency (median, p95, p99)
  • Token usage (input tokens, output tokens)
  • Cost per request
  • Quality scores (if you’re scoring responses via evaluations)
  • Volume/distribution between variants
Statistical Significance:
  • Run tests long enough to gather sufficient data (minimum 100-200 requests per variant)
  • Use statistical tests (t-test, Mann-Whitney U) to determine significance (see the sketch after this list)
  • Consider using staged rollout (90/10 split initially) for safety
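A minimal significance check, assuming you have exported per-variant samples (latency, cost, or scores) from ABV into two lists; SciPy's ttest_ind and mannwhitneyu do the work, and the numbers below are illustrative:
from scipy import stats

# Per-request latencies in ms for each variant (illustrative data)
latencies_a = [412, 389, 455, 401, 420, 398, 433, 415]
latencies_b = [371, 360, 402, 355, 388, 367, 380, 376]

# Welch's t-test (does not assume equal variances)
t_stat, t_p = stats.ttest_ind(latencies_a, latencies_b, equal_var=False)

# Mann-Whitney U: non-parametric alternative, better for skewed latency distributions
u_stat, u_p = stats.mannwhitneyu(latencies_a, latencies_b, alternative="two-sided")

print(f"t-test p-value:       {t_p:.4f}")
print(f"Mann-Whitney p-value: {u_p:.4f}")
# Treat p < 0.05 as significant only after collecting enough samples per variant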
See also: A/B Testing Guide for statistical rigor and best practices.
Advanced: Weighted Distribution
import random

# 90% variant A, 10% variant B (canary deployment)
selected_prompt = prompt_a if random.random() < 0.9 else prompt_b

# 80/20 split
selected_prompt = prompt_a if random.random() < 0.8 else prompt_b
Best Practices:
  1. Start with canary deployment (90/10 or 95/5) to limit blast radius
  2. Monitor error rates and user feedback closely during initial rollout
  3. Use A/B testing for significant changes (major rewrites, different approaches)
  4. Run tests long enough for statistical significance (don’t stop early)
  5. Consider user segments (test on subset of users first)
  6. Have rollback plan ready (can immediately switch back to variant A)
  7. Track multiple metrics (not just one - latency, cost, quality, user satisfaction)
  8. Document test hypotheses and results for organizational learning
When to Use A/B Testing:
  • Testing prompt improvements in production with real users
  • Validating changes before full rollout
  • When evaluation datasets don’t capture real usage patterns
  • For consumer apps where some variation is acceptable
  • After thorough testing on evaluation datasets (A/B test is final validation)

Integration & Tracing

ABV provides comprehensive metrics when you link prompts to traces, enabling performance tracking by prompt version.
Step 1: Link prompts to generations
Python SDK:
from abvdev import ABV, observe
from openai import OpenAI

abv = ABV(api_key="sk-abv-...", host="https://app.abv.dev")
openai_client = OpenAI()

# With decorator
@observe(as_type="generation")
def call_llm():
    prompt = abv.get_prompt("my-prompt")

    abv.update_current_generation(
        prompt=prompt,
        model="gpt-4o"
    )

    # Your LLM call here
    response = openai_client.chat.completions.create(...)

    abv.update_current_generation(
        output=response.choices[0].message.content,
        usage_details={
            "input": response.usage.input_tokens,
            "output": response.usage.output_tokens,
        }
    )

# With context manager
prompt = abv.get_prompt("my-prompt")

with abv.start_as_current_observation(
    as_type='generation',
    name="llm-call",
    model="gpt-4o",
    prompt=prompt  # Link prompt for metrics
) as generation:
    # Your LLM call
    generation.update(output="response")
JavaScript/TypeScript SDK:
import { ABVClient } from "@abvdev/client";
import { startObservation } from "@abvdev/tracing";

const abv = new ABVClient();
const prompt = await abv.prompt.get("my-prompt");

const generation = startObservation(
  "llm-call",
  {
    model: "gpt-4o",
    input: prompt.prompt,
    prompt: prompt  // Link the prompt for metrics
  },
  { asType: "generation" }
);

// ... LLM call ...

generation.update({
  output: "response",
  usageDetails: { /* token counts */ }
}).end();
Step 2: View metrics in the ABV UI
Navigate to your prompt in the ABV UI and click the Metrics tab to see:
Available Metrics:
  • Median generation latency - How long generations take
  • Median input tokens - Token count for prompts sent to LLM
  • Median output tokens - Token count for LLM responses
  • Median generation costs - Cost per generation (based on model pricing)
  • Generation count - Total number of generations using this prompt
  • Median score values - From evaluations or custom scores
  • First and last generation timestamps - When prompt was first/last used
Compare versions:
  • Use the UI to compare metrics across different prompt versions
  • A/B test variants to see which performs better
  • Track improvements over time as you iterate on prompts
Custom metrics: Add custom scores via the Scores API to track domain-specific metrics:
  • Accuracy (for tasks with right/wrong answers)
  • Relevance (how well response addresses the query)
  • User satisfaction (thumbs up/down, star ratings)
  • Hallucination rate (factual correctness)
  • Tone appropriateness (for customer-facing apps)
Example: Adding custom scores
from abvdev import ABV

abv = ABV(api_key="sk-abv-...", host="https://app.abv.dev")

# After generation completes, add custom score
abv.score(
    trace_id=trace_id,  # ID of the trace the generation belongs to
    name="relevance",
    value=0.85,  # 0-1 scale
    comment="Response addressed all parts of the query"
)

abv.score(
    trace_id=trace_id,
    name="user_satisfaction",
    value=1.0,  # 1 = thumbs up, 0 = thumbs down
    comment="User clicked helpful button"
)
Best Practices:
  1. Always link prompts to generations for metrics tracking
  2. Track multiple metrics (latency, cost, quality) not just one
  3. Use custom scores for domain-specific quality measures
  4. Compare versions systematically using A/B tests
  5. Monitor trends over time to catch regressions
  6. Set up alerts for anomalies (cost spikes, latency increases)
See also:

Next Steps