How Client-Side Caching Works
Understanding the caching architecture and request flow:

Cache hit: Instant return from memory
When the SDK cache contains a fresh prompt (within TTL), it’s returned immediately without any network requests.
Performance:
- Latency: Sub-millisecond (memory read)
- Network requests: Zero
- ABV API load: None
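The cache-hit path can be sketched as a TTL check against an in-memory store. This is an illustrative toy model only; names like `PromptCache` are assumptions, not part of the ABV SDK:

```python
import time

class PromptCache:
    """Toy model of the client-side prompt cache (not the actual SDK code)."""

    def __init__(self, ttl_seconds=60):
        self.ttl = ttl_seconds
        self._entries = {}  # prompt name -> (prompt, fetched_at)

    def get(self, name):
        entry = self._entries.get(name)
        if entry is None:
            return None  # cache miss: caller must fetch from the API
        prompt, fetched_at = entry
        if time.monotonic() - fetched_at < self.ttl:
            return prompt  # cache hit: returned from memory, no network request
        return None  # entry expired

    def put(self, name, prompt):
        self._entries[name] = (prompt, time.monotonic())

cache = PromptCache(ttl_seconds=60)
cache.put("movie-critic", "You are a movie critic.")  # hypothetical prompt name
assert cache.get("movie-critic") == "You are a movie critic."  # fresh: memory read
assert cache.get("unknown-prompt") is None                     # miss
```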
Prompts are served from memory for every subsequent get_prompt() call after the first fetch, as long as the TTL hasn't expired.

Background revalidation: Stale-while-revalidate
When the cache TTL has expired, stale prompts are served immediately while revalidating in the background.
Process:
- SDK detects the cache entry has expired (past TTL)
- Immediately return the stale cached prompt (zero latency)
- Asynchronously fetch updated prompt from ABV API in the background
- Update cache with fresh prompt for next request
- Next request gets updated prompt (still from cache, still zero latency)
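The steps above can be sketched as follows. This is an illustrative model of stale-while-revalidate, not the actual SDK implementation; the class and method names are assumptions:

```python
import threading
import time

class SWRCache:
    """Toy model of stale-while-revalidate (not the actual ABV SDK code)."""

    def __init__(self, fetch, ttl_seconds=60):
        self._fetch = fetch        # callable name -> fresh prompt (network call)
        self._ttl = ttl_seconds
        self._entries = {}         # name -> (prompt, fetched_at)
        self._lock = threading.Lock()

    def get(self, name):
        with self._lock:
            entry = self._entries.get(name)
        if entry is None:
            prompt = self._fetch(name)  # cache miss: block on the network once
            with self._lock:
                self._entries[name] = (prompt, time.monotonic())
            return prompt
        prompt, fetched_at = entry
        if time.monotonic() - fetched_at >= self._ttl:
            # Stale: serve the old prompt instantly, refresh in the background.
            threading.Thread(target=self._revalidate, args=(name,), daemon=True).start()
        return prompt

    def _revalidate(self, name):
        fresh = self._fetch(name)
        with self._lock:
            self._entries[name] = (fresh, time.monotonic())

# Demo: ttl_seconds=0 makes every cached entry immediately stale.
calls = {"n": 0}
def fetch(name):
    calls["n"] += 1
    return "v3" if calls["n"] == 1 else "v4"

cache = SWRCache(fetch, ttl_seconds=0)
assert cache.get("greeting") == "v3"   # miss: fetched synchronously
assert cache.get("greeting") == "v3"   # stale served instantly; refresh starts
time.sleep(0.3)                        # let background revalidation finish
assert cache.get("greeting") == "v4"   # next request sees the new version
```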
Cache miss: Fetch from multi-layer API cache
When no cached prompt exists (e.g., first application startup), the prompt is fetched from the API. The API itself caches prompts in Redis to ensure low latency.
Multi-layer caching in ABV API:
- Redis cache (primary): Single-digit millisecond latency
- Database fallback: Tens of milliseconds if Redis is unavailable

Cache misses occur in these scenarios:
- Application startup (cold start)
- First use of a new prompt name
- After explicitly disabling the cache (cache_ttl_seconds=0)
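Server-side, the Redis-first lookup with database fallback is a standard read-through pattern. The sketch below is an assumption about how such a lookup could work, not the ABV API's actual implementation; `redis` here is any object with `get`/`set` methods:

```python
def lookup_prompt(name, redis, database):
    """Read-through lookup: Redis first (single-digit ms in practice),
    database fallback (tens of ms), repopulating Redis on the way out.
    Illustrative sketch only, not the ABV API's real code."""
    try:
        cached = redis.get(name)
        if cached is not None:
            return cached
    except ConnectionError:
        pass  # Redis unavailable: fall through to the database
    prompt = database[name]
    try:
        redis.set(name, prompt)  # warm Redis for the next request
    except ConnectionError:
        pass
    return prompt

class FakeRedis:
    """In-memory stand-in so the sketch is runnable without a Redis server."""
    def __init__(self):
        self.store = {}
    def get(self, key):
        return self.store.get(key)
    def set(self, key, value):
        self.store[key] = value

redis = FakeRedis()
database = {"welcome": "Hi there!"}       # hypothetical prompt
assert lookup_prompt("welcome", redis, database) == "Hi there!"  # DB fallback
assert redis.store["welcome"] == "Hi there!"                     # Redis warmed
```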
Label changes propagate automatically
When you reassign labels (e.g., change production from version 3 to version 4), the cache updates automatically on the next revalidation.

Deployment timeline:
- Reassign the production label to the new version in the ABV UI
- SDK caches continue serving version 3 (within TTL)
- After TTL expiry, background revalidation fetches version 4
- Next request serves version 4 (the new production version)
Cache Configuration
Customize caching behavior based on your requirements.

Default Caching (Recommended)

By default, prompts are cached for 60 seconds with background revalidation.

Default behavior:
- TTL: 60 seconds
- Strategy: Stale-while-revalidate
- Freshness guarantee: Prompts update within 60 seconds of changes
- Performance: Sub-millisecond after first fetch

When to use: Most production scenarios. 60-second freshness is sufficient for typical prompt iteration cycles, and performance is optimal.
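The default behavior can be modeled as follows. This is a toy sketch, not the SDK's implementation; only the `get_prompt` name and the 60-second default come from this doc:

```python
import time

fetches = []  # records every simulated network request

def fetch_from_api(name):
    fetches.append(name)
    return f"prompt body for {name}"

_cache = {}

def get_prompt(name, cache_ttl_seconds=60):
    """Toy model: serve from memory within the TTL, otherwise hit the API."""
    entry = _cache.get(name)
    now = time.monotonic()
    if entry is not None and now - entry[1] < cache_ttl_seconds:
        return entry[0]               # cache hit: no network request
    prompt = fetch_from_api(name)     # cache miss or expired entry
    _cache[name] = (prompt, now)
    return prompt

get_prompt("summarizer")              # first call: one network fetch
get_prompt("summarizer")              # within 60s: served from memory
assert len(fetches) == 1
```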
Custom Cache TTL

Adjust the cache duration if you need a different freshness/performance tradeoff.

Tradeoffs:
- Longer TTL (300s): Better performance (fewer API requests), slower prompt updates
- Shorter TTL (10s): Faster prompt updates, slightly more API requests (still minimal)

Recommended values:
- Stable production prompts: 300-600 seconds (5-10 minutes)
- Rapid iteration during development: 10-30 seconds
- Default (60s): Good balance for most scenarios
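The tradeoff can be made concrete with a little arithmetic over call timestamps. This is a toy model of the `cache_ttl_seconds` behavior, not SDK code:

```python
def fetches_needed(call_times, ttl_seconds):
    """Count network fetches for get_prompt() calls at the given timestamps
    under a given cache TTL (toy model, not the SDK implementation)."""
    fetches = 0
    fetched_at = None
    for t in call_times:
        if fetched_at is None or t - fetched_at >= ttl_seconds:
            fetches += 1       # miss or expired entry: one API request
            fetched_at = t
    return fetches

calls = range(0, 600, 30)      # one call every 30s over 10 minutes (20 calls)
assert fetches_needed(calls, ttl_seconds=300) == 2    # long TTL: few API requests
assert fetches_needed(calls, ttl_seconds=10) == 20    # short TTL: refetch each call
assert fetches_needed(calls, ttl_seconds=60) == 10    # default: balanced
```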
Disable Caching (Development)

Disable caching entirely in development environments where every fetch should return the latest prompt immediately.

When to use:
- Local development: Iterate on prompts in ABV UI, see changes immediately in your application
- Testing environments: Ensure tests always use the exact prompt version expected
- Debugging: Eliminate caching as a variable when troubleshooting
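With caching disabled, every call goes to the API. The `cache_ttl_seconds=0` flag is named in this doc; the rest of this sketch is a toy model, not the SDK implementation:

```python
import time

fetch_log = []

def fetch_from_api(name):
    fetch_log.append(name)
    return "latest prompt text"

_cache = {}

def get_prompt(name, cache_ttl_seconds=60):
    """Toy model: cache_ttl_seconds=0 bypasses the cache entirely."""
    if cache_ttl_seconds > 0:
        entry = _cache.get(name)
        if entry is not None and time.monotonic() - entry[1] < cache_ttl_seconds:
            return entry[0]
    prompt = fetch_from_api(name)          # always fetch when caching is off
    if cache_ttl_seconds > 0:
        _cache[name] = (prompt, time.monotonic())
    return prompt

for _ in range(3):
    get_prompt("dev-prompt", cache_ttl_seconds=0)  # hypothetical prompt name
assert len(fetch_log) == 3  # every call fetched the latest version
```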
Optional: Pre-fetching on Application Startup
Pre-fetch prompts during application initialization to ensure the cache is populated before serving requests.

When to Pre-fetch
Consider pre-fetching if:
- Your application is latency-sensitive and cannot tolerate even a single 20ms first fetch
- You want to fail fast at startup if ABV is unreachable
- You're deploying to edge environments where cold starts are common

Skip pre-fetching if:
- Your application has natural warmup time (background workers, long-lived servers)
- A single 20ms fetch on first use is acceptable
- You fetch many prompts (pre-fetching all of them adds startup time)
Pre-fetching Implementation
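A minimal pre-fetch helper might look like the sketch below. The helper name and `fail_fast` flag are assumptions for illustration, not the ABV SDK API:

```python
def prefetch_prompts(get_prompt, names, fail_fast=True):
    """Warm the SDK cache during application startup.

    With fail_fast=True, startup aborts immediately if ABV is unreachable,
    surfacing the outage at deploy time rather than mid-request.
    """
    for name in names:
        try:
            get_prompt(name)   # each call populates the client-side cache
        except Exception:
            if fail_fast:
                raise

# Usage at application init (hypothetical prompt names; a list-append
# stands in for the real get_prompt call):
warmed = []
prefetch_prompts(warmed.append, ["summarizer", "classifier"])
assert warmed == ["summarizer", "classifier"]
```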
Optional: Fallback Prompts for Maximum Availability
Provide fallback prompts to ensure 100% availability even if ABV's API is completely unreachable.

When Fallback Prompts Are Necessary
Fallback prompts are needed only when both of the following hold:
- The local cache is empty (e.g., fresh application startup)
- AND the ABV API is unavailable (network issues, ABV outage)

In practice this combination is rare:
- ABV's prompts API is highly available (see the status page)
- SDK-level caching means your application continues working during brief outages
- Even if ABV goes down, cached prompts remain available for hours

Consider fallback prompts if:
- Your application is absolutely mission-critical with zero-downtime requirements
- You cannot tolerate even a startup failure if ABV is unreachable
- Regulatory or contractual obligations require offline operation
Implementing Fallback Prompts

Fallback prompts are typically hardcoded strings used as a last resort when ABV is unreachable.

Important: Fallback prompts are never linked to traces (no metrics tracking when using fallbacks).

Learn more about guaranteed availability →
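The last-resort pattern can be sketched as a wrapper. The wrapper and parameter names here are illustrative assumptions; check the SDK reference for the actual fallback API:

```python
# Hardcoded last-resort prompt, kept in source control alongside the app.
FALLBACK = "You are a helpful assistant. Answer the user's question concisely."

def get_prompt_with_fallback(fetch, name, fallback):
    """Return the managed prompt, or the hardcoded fallback if ABV is
    unreachable. Fallbacks are not linked to traces (no metrics tracking)."""
    try:
        return fetch(name)
    except Exception:
        return fallback  # last resort: app keeps serving requests

def unreachable(name):
    raise ConnectionError("ABV API unavailable")

# Hypothetical prompt name; `fetch` stands in for the SDK's network call.
assert get_prompt_with_fallback(unreachable, "support-bot", FALLBACK) == FALLBACK
assert get_prompt_with_fallback(lambda n: "managed prompt", "support-bot", FALLBACK) == "managed prompt"
```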
Performance Benchmarks
Real-world performance measurements of prompt fetching with caching disabled.

First Fetch Performance

We measured the execution time of an uncached prompt fetch over 1,000 sequential executions (worst-case scenario, caching fully disabled).
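The original measurement code isn't reproduced here; a minimal version of such a benchmark might look like the sketch below, with a no-op standing in for the uncached `get_prompt()` call:

```python
import statistics
import time

def benchmark(fetch, n=1000):
    """Time n sequential fetches and report percentile latencies in ms."""
    samples = []
    for _ in range(n):
        start = time.perf_counter()
        fetch()
        samples.append((time.perf_counter() - start) * 1000)
    samples.sort()
    return {
        "median_ms": statistics.median(samples),
        "p95_ms": samples[int(0.95 * n) - 1],
        "p99_ms": samples[int(0.99 * n) - 1],
    }

# Stand-in workload; replace the lambda with an uncached get_prompt() call.
stats = benchmark(lambda: None, n=100)
assert stats["median_ms"] <= stats["p95_ms"] <= stats["p99_ms"]
```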
Key findings:
- Median latency: ~15-20ms (includes network round-trip)
- p95 latency: ~30-40ms (accounting for network variability)
- p99 latency: ~50-60ms (rare slow requests)

With caching enabled:
- First fetch: 15-20ms (fetches from API, stores in cache)
- All subsequent fetches: <1ms (from cache)

Even with caching fully disabled, prompt fetching is fast (<50ms p99). With caching, it's essentially free (<1ms) for all requests after the first.

Run the benchmark yourself: Jupyter notebook