How to Use This FAQ

This guide is organized by topic to help you quickly find answers. Each question includes working code examples for both the Python and JavaScript/TypeScript SDKs.

Getting Started

ABV provides comprehensive prompt management through the UI, SDKs, and API.
Creating Prompts:
Via UI:
  1. Sign in to ABV
  2. Navigate to Prompts section
  3. Click “Create Prompt”
  4. Enter prompt content with {{variables}}
  5. Add configuration (model, temperature, etc.)
  6. Assign labels for deployment
Via Python SDK:
from abvdev import ABV

abv = ABV(api_key="sk-abv-...", host="https://app.abv.dev")

# Create text prompt
abv.create_prompt(
    name="movie-critic",
    type="text",
    prompt="As a {{criticlevel}} movie critic, do you like {{movie}}?",
    labels=["production"],
    config={
        "model": "gpt-4o",
        "temperature": 0.7,
        "max_tokens": 1000
    }
)

# Create chat prompt
abv.create_prompt(
    name="chat-assistant",
    type="chat",
    prompt=[
        {"role": "system", "content": "You are a {{persona}} assistant"},
        {"role": "user", "content": "{{user_query}}"}
    ],
    labels=["production"]
)
Via JavaScript/TypeScript SDK:
import { ABVClient } from "@abvdev/client";

const abv = new ABVClient();

await abv.prompt.create({
  name: "movie-critic",
  type: "text",
  prompt: "As a {{criticlevel}} critic, do you like {{movie}}?",
  labels: ["production"],
  config: {
    model: "gpt-4o",
    temperature: 0.7
  }
});
Fetching Prompts:
# Python
prompt = abv.get_prompt("movie-critic")  # Gets production version
prompt = abv.get_prompt("movie-critic", version=1)  # Specific version
prompt = abv.get_prompt("movie-critic", label="staging")  # Specific label
prompt = abv.get_prompt("movie-critic", label="latest")  # Latest version

# Compile with variables
compiled = prompt.compile(criticlevel="expert", movie="Dune 2")
// JavaScript/TypeScript
const prompt = await abv.prompt.get("movie-critic");
const prompt2 = await abv.prompt.get("movie-critic", { version: 1 });
const prompt3 = await abv.prompt.get("movie-critic", { label: "staging" });

// Compile with variables
const compiled = prompt.compile({ criticlevel: "expert", movie: "Dune 2" });
Updating Labels:
# Python
abv.update_prompt(
    name="movie-critic",
    version=2,
    new_labels=["production", "experiment-a"]
)
// JavaScript/TypeScript
await abv.prompt.update({
  name: "movie-critic",
  version: 2,
  newLabels: ["production", "experiment-a"]
});
Key Features:
  • Version control with automatic versioning
  • Labels for deployment management (production, staging, etc.)
  • Config versioning alongside prompts
  • Diff view to see changes between versions
  • Protected labels for production safety
  • Rollback capability with one click or API call
  • Variables with {{mustache}} syntax for dynamic content
Prompt engineering is the practice of designing and optimizing text prompts to get better outputs from Large Language Models (LLMs).
Why it matters:
  • Better prompt = better LLM output quality
  • Can significantly impact accuracy, relevance, and usefulness
  • More cost-effective than fine-tuning models
  • Faster iteration cycle than model training
Key Techniques:
1. Clear Instructions: Be specific about what you want, provide context and constraints, and define the output format.
Bad: "Write about dogs"

Good: "Write a 3-paragraph informative article about Golden Retrievers,
      including their history, temperament, and care requirements.
      Use a friendly tone suitable for first-time dog owners."
2. Few-Shot Examples: Show the model examples of desired output to establish patterns and format (the sketch after these techniques shows how such a prompt can be stored in ABV).
Classify sentiment:

Text: "I love this product!"
Sentiment: Positive

Text: "This is terrible"
Sentiment: Negative

Text: "{{user_input}}"
Sentiment:
3. Role/Persona: Define who the LLM should act as, which influences tone and expertise level.
You are an expert software architect with 15 years of experience
in distributed systems. Analyze the following system design...
4. Chain of Thought: Ask the model to think step-by-step to improve reasoning and accuracy.
Solve this problem step by step, showing your reasoning:
{{problem}}
5. Constraints and Format: Specify output format (JSON, markdown, etc.), set length limits, and define what to avoid.
Respond in JSON format with keys: "summary", "key_points", "confidence_score".
Keep summary under 100 words.
Do not include personal opinions.
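For example, the few-shot sentiment classifier shown above can be stored and versioned in ABV with the same create_prompt call from the Getting Started section (a minimal sketch; the prompt name, labels, and config values are illustrative):
from abvdev import ABV

abv = ABV(api_key="sk-abv-...", host="https://app.abv.dev")

# Version the few-shot classifier as a text prompt with a {{user_input}} variable
abv.create_prompt(
    name="sentiment-classifier",  # illustrative name
    type="text",
    prompt=(
        "Classify sentiment:\n\n"
        'Text: "I love this product!"\nSentiment: Positive\n\n'
        'Text: "This is terrible"\nSentiment: Negative\n\n'
        'Text: "{{user_input}}"\nSentiment:'
    ),
    labels=["staging"],  # promote to production after review
    config={"model": "gpt-4o", "temperature": 0.0}
)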
ABV’s Role in Prompt Engineering:
ABV helps you iterate on prompts systematically:
  • Version control to track changes and compare iterations
  • A/B testing to compare variants with statistical rigor
  • Metrics tracking to measure improvements objectively
  • Tracing to see prompts in context with real user interactions
  • Team collaboration via UI for cross-functional input
  • Quick rollbacks when changes don’t work as expected
Best Practices:
  1. Start simple, then iterate based on results
  2. Test with diverse inputs representing edge cases
  3. Measure performance metrics (latency, cost, quality)
  4. Use version control to track what works
  5. A/B test significant changes in production
  6. Document what works and why for team knowledge
  7. Keep prompts maintainable and readable for future iterations

Configuration & Setup

ABV prompts are cached client-side by default, so network-related issues are minimized after the first fetch. However, you can configure network behavior for initial requests.
Caching Configuration:
The default cache TTL is 60 seconds. You can customize it to reduce network calls.
Python SDK:
from abvdev import ABV

abv = ABV(api_key="sk-abv-...", host="https://app.abv.dev")

# Increase cache duration to reduce network calls
prompt = abv.get_prompt("my-prompt", cache_ttl_seconds=300)  # 5 minutes

# Disable caching for development (see all changes immediately)
prompt = abv.get_prompt("my-prompt", cache_ttl_seconds=0)
JavaScript/TypeScript SDK:
import { ABVClient } from "@abvdev/client";

const abv = new ABVClient();

// Increase cache duration
const prompt = await abv.prompt.get("my-prompt", {
  cacheTtlSeconds: 300  // 5 minutes
});

// Disable caching for development
const prompt = await abv.prompt.get("my-prompt", {
  cacheTtlSeconds: 0
});
Guaranteed Availability:
For critical applications requiring 100% availability, use these strategies:
1. Pre-fetch prompts on startup to populate the cache:
from abvdev import ABV
import sys

abv = ABV(api_key="sk-abv-...", host="https://app.abv.dev")

# Pre-fetch during application startup
startup_prompts = ["critical-prompt-1", "critical-prompt-2"]

try:
    for prompt_name in startup_prompts:
        abv.get_prompt(prompt_name)
    print("All critical prompts cached successfully")
except Exception as e:
    print(f"CRITICAL: Failed to fetch prompts: {e}")
    sys.exit(1)  # Fail fast if prompts unavailable
2. Provide fallback prompts for when the API is unavailable:
from abvdev import ABV

abv = ABV(api_key="sk-abv-...", host="https://app.abv.dev")

# Use with fallback
try:
    prompt = abv.get_prompt("my-prompt")
    prompt_text = prompt.compile(input="user query")
except Exception as e:
    # Log error and use fallback
    print(f"Warning: Using fallback prompt due to: {e}")
    prompt_text = "As a helpful assistant, respond to: {{input}}"
How caching works:
  • Cache hit: Prompt returned immediately from memory (no network call)
  • Stale cache: Old prompt returned immediately while revalidating in background (stale-while-revalidate pattern)
  • Cache miss: Prompt fetched from the API (ABV uses a Redis cache server-side for low latency, ~15-50ms median)
See also: Guaranteed Availability Guide for comprehensive strategies.

Performance & Reliability

ABV prompts are automatically cached client-side in the SDKs with intelligent background revalidation, ensuring minimal latency impact.
How Caching Works (a sketch of this pattern follows the list):
  1. Cache Hit - Prompt in cache and fresh → returned immediately (0ms network overhead)
  2. Stale Cache - Prompt in cache but expired → returned immediately, revalidated in background
  3. Cache Miss - First request → fetched from API (low latency, Redis-backed on ABV side)
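Conceptually, the cache behaves like the following sketch (illustrative only, not the SDK’s actual implementation; _cache, _refresh, and _fetch_from_api are hypothetical names):
import time
import threading

_cache = {}  # prompt name -> (prompt, fetched_at)
CACHE_TTL_SECONDS = 60  # default TTL

def _fetch_from_api(name):
    # Placeholder for the real network call to ABV
    return f"<prompt '{name}'>"

def _refresh(name):
    prompt = _fetch_from_api(name)
    _cache[name] = (prompt, time.time())
    return prompt

def get_prompt_cached(name):
    entry = _cache.get(name)
    if entry is None:
        return _refresh(name)  # 3. cache miss: blocking fetch from the API
    prompt, fetched_at = entry
    if time.time() - fetched_at < CACHE_TTL_SECONDS:
        return prompt  # 1. cache hit: fresh, returned immediately
    # 2. stale: return the old prompt immediately, revalidate in the background
    threading.Thread(target=_refresh, args=(name,), daemon=True).start()
    return prompt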
Default Behavior:
# Python - Default 60-second cache
from abvdev import ABV

abv = ABV(api_key="sk-abv-...", host="https://app.abv.dev")

# First call - fetches from API and caches
prompt = abv.get_prompt("my-prompt")

# Subsequent calls within 60s - instant from cache
prompt = abv.get_prompt("my-prompt")  # No network call
Custom Cache Duration:
Python SDK:
# Cache for 5 minutes
prompt = abv.get_prompt("my-prompt", cache_ttl_seconds=300)

# Cache for 1 hour (for stable production prompts)
prompt = abv.get_prompt("my-prompt", cache_ttl_seconds=3600)

# Disable caching (development/testing)
prompt = abv.get_prompt("my-prompt", cache_ttl_seconds=0)

# Common pattern: no cache + latest version in development
prompt = abv.get_prompt(
    "my-prompt",
    cache_ttl_seconds=0,
    label="latest"
)
JavaScript/TypeScript SDK:
import { ABVClient } from "@abvdev/client";

const abv = new ABVClient();

// Cache for 5 minutes
const prompt1 = await abv.prompt.get("my-prompt", {
  cacheTtlSeconds: 300
});

// Cache for 1 hour
const prompt2 = await abv.prompt.get("my-prompt", {
  cacheTtlSeconds: 3600
});

// Disable caching
const prompt3 = await abv.prompt.get("my-prompt", {
  cacheTtlSeconds: 0
});

// Development pattern
const devPrompt = await abv.prompt.get("my-prompt", {
  cacheTtlSeconds: 0,
  label: "latest"
});
Pre-fetching for Zero Latency:
Load prompts during application startup to eliminate runtime latency:
# Python
from abvdev import ABV

abv = ABV(api_key="sk-abv-...", host="https://app.abv.dev")

# Pre-fetch during startup
critical_prompts = [
    "user-greeting",
    "error-handler",
    "main-assistant"
]

for prompt_name in critical_prompts:
    abv.get_prompt(prompt_name)  # Populates cache

# Now runtime requests are instant (0ms)
// JavaScript/TypeScript
const abv = new ABVClient();

// Pre-fetch during startup
const criticalPrompts = [
  "user-greeting",
  "error-handler",
  "main-assistant"
];

await Promise.all(
  criticalPrompts.map(name => abv.prompt.get(name))
);

// Now runtime requests are instant
Fallback for 100% Availability:
# Python
from abvdev import ABV

abv = ABV(api_key="sk-abv-...", host="https://app.abv.dev")

def get_prompt_with_fallback(name: str, fallback: str):
    try:
        return abv.get_prompt(name)
    except Exception as e:
        # Log error for monitoring
        print(f"WARNING: Failed to fetch prompt '{name}': {e}")
        # Return a lightweight stand-in exposing .prompt and .compile() like an ABV prompt
        return type('Prompt', (), {'prompt': fallback, 'compile': lambda self, **kwargs: fallback})()

# Usage
prompt = get_prompt_with_fallback(
    "my-prompt",
    fallback="You are a helpful assistant. {{user_input}}"
)
Performance Benchmarks:
From ABV’s testing (1000 sequential requests; a reproduction sketch follows the lists below):
Without caching (cache_ttl_seconds=0):
  • Median latency: ~50ms
  • 95th percentile: ~100ms
  • 99th percentile: ~150ms
With caching enabled (default):
  • Cached requests: 0ms (instant, in-memory)
  • Stale-while-revalidate: 0ms (instant return, background update)
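You can reproduce the uncached numbers yourself with a simple timing loop (a sketch; it assumes a prompt named "my-prompt" already exists and uses the get_prompt call documented above):
import statistics
import time
from abvdev import ABV

abv = ABV(api_key="sk-abv-...", host="https://app.abv.dev")

latencies_ms = []
for _ in range(1000):
    start = time.perf_counter()
    abv.get_prompt("my-prompt", cache_ttl_seconds=0)  # force a network call each time
    latencies_ms.append((time.perf_counter() - start) * 1000)

latencies_ms.sort()
print(f"median: {statistics.median(latencies_ms):.1f} ms")
print(f"p95:    {latencies_ms[int(0.95 * len(latencies_ms)) - 1]:.1f} ms")
print(f"p99:    {latencies_ms[int(0.99 * len(latencies_ms)) - 1]:.1f} ms")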
Best Practices:
  1. Production: Use default 60s cache or longer (5-10 minutes) for stable prompts
  2. Development: Disable cache to see changes immediately
  3. Critical paths: Pre-fetch prompts on application startup
  4. High availability: Implement fallback prompts for mission-critical flows
  5. Staging: Use moderate cache (30-60s) for balance between freshness and performance
  6. Monitor: Check ABV status page (status.abv.dev) for API availability
When to Adjust Cache TTL:
  • Increase TTL: Stable production prompts, reduce API calls, improve performance
  • Decrease TTL: Frequently updated prompts, need faster updates
  • Disable (0s): Local development, testing prompt changes in real-time
  • Pre-fetch: Startup-critical prompts, serverless cold start optimization
See also: Client-Side Caching Guide for technical implementation details.

Advanced Features

ABV provides built-in version control for all prompts with automatic versioning and label-based deployment.
Automatic Versioning:
Every time you create or update a prompt, ABV automatically assigns an incrementing version number:
# Python
from abvdev import ABV

abv = ABV(api_key="sk-abv-...", host="https://app.abv.dev")

# First creation - becomes version 1
abv.create_prompt(
    name="movie-critic",
    prompt="Do you like {{movie}}?",
    labels=["production"]
)

# Update (create new version) - becomes version 2
abv.create_prompt(
    name="movie-critic",
    prompt="As a critic, do you like {{movie}}?",
    labels=["staging"]
)
Labels for Deployment:
Use labels to manage which version is deployed to different environments:
# Assign labels to versions
abv.update_prompt(
    name="movie-critic",
    version=1,
    new_labels=["production"]
)

abv.update_prompt(
    name="movie-critic",
    version=2,
    new_labels=["staging", "experiment-a"]
)
Fetching Specific Versions:
# Python
# Get production version (default behavior)
prod_prompt = abv.get_prompt("movie-critic")
prod_prompt = abv.get_prompt("movie-critic", label="production")

# Get staging version
staging_prompt = abv.get_prompt("movie-critic", label="staging")

# Get specific version number
v1_prompt = abv.get_prompt("movie-critic", version=1)

# Get latest version (most recent, regardless of labels)
latest_prompt = abv.get_prompt("movie-critic", label="latest")
// JavaScript/TypeScript
const prodPrompt = await abv.prompt.get("movie-critic");
const stagingPrompt = await abv.prompt.get("movie-critic", { label: "staging" });
const v1Prompt = await abv.prompt.get("movie-critic", { version: 1 });
const latestPrompt = await abv.prompt.get("movie-critic", { label: "latest" });
Version Comparison:
The ABV UI provides a diff view to compare prompt versions (a local diff sketch follows this list):
  • See exactly what changed between versions (text diff)
  • Track who made changes and when (audit trail)
  • Review config changes alongside prompt changes
  • View commit messages explaining why changes were made
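The diff view lives in the UI, but for a quick local comparison you can also fetch two versions and diff them yourself (a sketch; it assumes the returned prompt object exposes the raw template on .prompt, as used elsewhere in this FAQ):
import difflib
from abvdev import ABV

abv = ABV(api_key="sk-abv-...", host="https://app.abv.dev")

# Fetch two specific versions of the same prompt
v1 = abv.get_prompt("movie-critic", version=1)
v2 = abv.get_prompt("movie-critic", version=2)

# Print a unified diff of the raw templates
diff = difflib.unified_diff(
    v1.prompt.splitlines(),
    v2.prompt.splitlines(),
    fromfile="movie-critic v1",
    tofile="movie-critic v2",
    lineterm="",
)
print("\n".join(diff))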
Rollback:
To roll back to a previous version, simply reassign the production label:
# Rollback: make version 1 the production version again
abv.update_prompt(
    name="movie-critic",
    version=1,
    new_labels=["production"]
)
Or perform the rollback in the UI with one click.
Protected Labels:
For additional production safety, admins can mark labels as “protected”:
  • Only admins/owners can modify protected labels
  • Prevents accidental changes to production prompts
  • Enforces change management process
  • Configure in project settings
Best Practices:
  1. Always use production label for deployed versions
  2. Use staging for testing before promoting to production
  3. Use descriptive labels for experiments (e.g., experiment-longer-context, variant-a)
  4. The latest label is automatically maintained by ABV (always points to newest version)
  5. Never delete old versions - keep history for debugging and rollback
  6. Use commit messages to document why changes were made
  7. Review diffs before promoting to production to catch unintended changes
Common Workflow (a condensed code sketch follows this list):
  1. Develop prompt changes locally (use label="latest" and cache_ttl_seconds=0)
  2. Deploy to staging (labels=["staging"])
  3. Test in staging environment
  4. Review metrics and validate quality
  5. Promote to production by reassigning production label
  6. Monitor production metrics
  7. Rollback if issues detected (reassign production to previous version)
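In SDK terms, the workflow maps onto the calls already shown above (a condensed sketch; the prompt name and version numbers are illustrative):
from abvdev import ABV

abv = ABV(api_key="sk-abv-...", host="https://app.abv.dev")

# 1. Develop locally: always fetch the newest version, bypass the cache
dev_prompt = abv.get_prompt("movie-critic", label="latest", cache_ttl_seconds=0)

# 2. Deploy a new version to staging
abv.create_prompt(
    name="movie-critic",
    prompt="As a {{criticlevel}} critic, do you like {{movie}}?",
    labels=["staging"]
)

# 3-5. After testing and review, promote that version to production
abv.update_prompt(name="movie-critic", version=3, new_labels=["production"])

# 7. Rollback if issues are detected: point production back at the previous version
abv.update_prompt(name="movie-critic", version=2, new_labels=["production"])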
See also: Version Control Guide for deployment workflows.
ABV enables A/B testing by using labels to identify different prompt variants, then randomly selecting between them in your application.
Step 1: Create Prompt Variants
Create multiple versions and label them for your test:
from abvdev import ABV

abv = ABV(api_key="sk-abv-...", host="https://app.abv.dev")

# Variant A - shorter prompt
abv.create_prompt(
    name="movie-critic",
    prompt="Do you like {{movie}}?",
    labels=["prod-a"]
)

# Variant B - more detailed prompt
abv.create_prompt(
    name="movie-critic",
    prompt="As an expert film critic, provide your opinion on {{movie}}. Include analysis of the plot, acting, and cinematography.",
    labels=["prod-b"]
)
Step 2: Implement Random Selection
Python SDK:
from abvdev import ABV
from openai import OpenAI
import random

abv = ABV(api_key="sk-abv-...", host="https://app.abv.dev")
openai_client = OpenAI(api_key="sk-proj-...")

# Fetch both variants (cached after first request)
prompt_a = abv.get_prompt("movie-critic", label="prod-a")
prompt_b = abv.get_prompt("movie-critic", label="prod-b")

# Randomly select (50/50 split)
selected_prompt = random.choice([prompt_a, prompt_b])

# Use in LLM call with tracing (crucial for metrics by variant)
with abv.start_as_current_observation(
    as_type="generation",
    name="movie-review",
    model="gpt-4o",
    prompt=selected_prompt  # Links to specific variant for metrics tracking
) as generation:
    compiled = selected_prompt.compile(movie="Dune 2")

    response = openai_client.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user", "content": compiled}]
    )

    generation.update(
        output=response.choices[0].message.content,
        usage_details={
            "input": response.usage.input_tokens,
            "output": response.usage.output_tokens
        }
    )

abv.flush()
JavaScript/TypeScript SDK:
import { ABVClient } from "@abvdev/client";
import { startObservation } from "@abvdev/tracing";
import OpenAI from "openai";

const abv = new ABVClient();
const openai = new OpenAI();

async function runABTest() {
  // Fetch both variants
  const promptA = await abv.prompt.get("movie-critic", { label: "prod-a" });
  const promptB = await abv.prompt.get("movie-critic", { label: "prod-b" });

  // Randomly select (50/50 split)
  const selectedPrompt = Math.random() < 0.5 ? promptA : promptB;

  // Use in LLM call with tracing
  const generation = startObservation(
    "movie-review",
    {
      model: "gpt-4o",
      prompt: selectedPrompt  // Links to specific variant
    },
    { asType: "generation" }
  );

  const compiled = selectedPrompt.compile({ movie: "Dune 2" });

  const response = await openai.chat.completions.create({
    model: "gpt-4o",
    messages: [{ role: "user", content: compiled }]
  });

  generation.update({
    output: response.choices[0].message.content,
    usageDetails: {
      prompt_tokens: response.usage.prompt_tokens,
      completion_tokens: response.usage.completion_tokens
    }
  });

  generation.end();
}
Step 3: Analyze Results
Navigate to your prompt in the ABV UI and view the Metrics tab.
Compare Metrics by Variant:
  • Response latency (median, p95, p99)
  • Token usage (input tokens, output tokens)
  • Cost per request
  • Quality scores (if you’re scoring responses via evaluations)
  • Volume/distribution between variants
Statistical Significance:
  • Run tests long enough to gather sufficient data (minimum 100-200 requests per variant)
  • Use statistical tests (t-test, Mann-Whitney U) to determine significance (see the sketch after this list)
  • Consider using staged rollout (90/10 split initially) for safety
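A minimal significance check, assuming you have exported per-variant samples (latency, cost, or scores) from ABV into two lists; SciPy's ttest_ind and mannwhitneyu do the work, and the numbers below are illustrative:
from scipy import stats

# Per-request latencies in ms for each variant (illustrative data)
latencies_a = [412, 389, 455, 401, 420, 398, 433, 415]
latencies_b = [371, 360, 402, 355, 388, 367, 380, 376]

# Welch's t-test (does not assume equal variances)
t_stat, t_p = stats.ttest_ind(latencies_a, latencies_b, equal_var=False)

# Mann-Whitney U: non-parametric alternative, better for skewed latency distributions
u_stat, u_p = stats.mannwhitneyu(latencies_a, latencies_b, alternative="two-sided")

print(f"t-test p-value:       {t_p:.4f}")
print(f"Mann-Whitney p-value: {u_p:.4f}")
# Treat p < 0.05 as significant only after collecting enough samples per variant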
See also: A/B Testing Guide for statistical rigor and best practices.
Advanced: Weighted Distribution
import random

# 90% variant A, 10% variant B (canary deployment)
selected_prompt = prompt_a if random.random() < 0.9 else prompt_b

# 80/20 split
selected_prompt = prompt_a if random.random() < 0.8 else prompt_b
Best Practices:
  1. Start with canary deployment (90/10 or 95/5) to limit blast radius
  2. Monitor error rates and user feedback closely during initial rollout
  3. Use A/B testing for significant changes (major rewrites, different approaches)
  4. Run tests long enough for statistical significance (don’t stop early)
  5. Consider user segments (test on subset of users first)
  6. Have rollback plan ready (can immediately switch back to variant A)
  7. Track multiple metrics (not just one - latency, cost, quality, user satisfaction)
  8. Document test hypotheses and results for organizational learning
When to Use A/B Testing:
  • Testing prompt improvements in production with real users
  • Validating changes before full rollout
  • When evaluation datasets don’t capture real usage patterns
  • For consumer apps where some variation is acceptable
  • After thorough testing on evaluation datasets (A/B test is final validation)

Integration & Tracing

ABV provides comprehensive metrics when you link prompts to traces, enabling performance tracking by prompt version.
Step 1: Link prompts to generations
Python SDK:
from abvdev import ABV, observe
from openai import OpenAI

abv = ABV(api_key="sk-abv-...", host="https://app.abv.dev")
openai_client = OpenAI()

# With decorator
@observe(as_type="generation")
def call_llm():
    prompt = abv.get_prompt("my-prompt")

    abv.update_current_generation(
        prompt=prompt,
        model="gpt-4o"
    )

    # Your LLM call here
    response = openai_client.chat.completions.create(...)

    abv.update_current_generation(
        output=response.choices[0].message.content,
        usage_details={
            "input": response.usage.input_tokens,
            "output": response.usage.output_tokens,
        }
    )

# With context manager
prompt = abv.get_prompt("my-prompt")

with abv.start_as_current_observation(
    as_type='generation',
    name="llm-call",
    model="gpt-4o",
    prompt=prompt  # Link prompt for metrics
) as generation:
    # Your LLM call
    generation.update(output="response")
JavaScript/TypeScript SDK:
import { ABVClient } from "@abvdev/client";
import { startObservation } from "@abvdev/tracing";

const abv = new ABVClient();
const prompt = await abv.prompt.get("my-prompt");

const generation = startObservation(
  "llm-call",
  {
    model: "gpt-4o",
    input: prompt.prompt,
    prompt: prompt  // Link the prompt for metrics
  },
  { asType: "generation" }
);

// ... LLM call ...

generation.update({
  output: "response",
  usageDetails: { /* token counts */ }
}).end();
Step 2: View metrics in the ABV UI
Navigate to your prompt in the ABV UI and click the Metrics tab to see:
Available Metrics:
  • Median generation latency - How long generations take
  • Median input tokens - Token count for prompts sent to LLM
  • Median output tokens - Token count for LLM responses
  • Median generation costs - Cost per generation (based on model pricing)
  • Generation count - Total number of generations using this prompt
  • Median score values - From evaluations or custom scores
  • First and last generation timestamps - When prompt was first/last used
Compare versions:
  • Use the UI to compare metrics across different prompt versions
  • A/B test variants to see which performs better
  • Track improvements over time as you iterate on prompts
Custom metrics: Add custom scores via the Scores API to track domain-specific metrics:
  • Accuracy (for tasks with right/wrong answers)
  • Relevance (how well response addresses the query)
  • User satisfaction (thumbs up/down, star ratings)
  • Hallucination rate (factual correctness)
  • Tone appropriateness (for customer-facing apps)
Example: Adding custom scores
from abvdev import ABV

abv = ABV(api_key="sk-abv-...", host="https://app.abv.dev")

# After generation completes, add custom score
abv.score(
    trace_id=trace_id,  # ID of the trace the generation belongs to
    name="relevance",
    value=0.85,  # 0-1 scale
    comment="Response addressed all parts of the query"
)

abv.score(
    trace_id=trace_id,
    name="user_satisfaction",
    value=1.0,  # 1 = thumbs up, 0 = thumbs down
    comment="User clicked helpful button"
)
Best Practices:
  1. Always link prompts to generations for metrics tracking
  2. Track multiple metrics (latency, cost, quality) not just one
  3. Use custom scores for domain-specific quality measures
  4. Compare versions systematically using A/B tests
  5. Monitor trends over time to catch regressions
  6. Set up alerts for anomalies (cost spikes, latency increases)
See also:

Next Steps