Client-side caching eliminates network latency when fetching prompts, ensuring your application remains fast even when making frequent prompt requests. ABV SDKs implement smart caching strategies that balance freshness with performance—serving prompts instantly from local cache while keeping the cache synchronized in the background.

How Client-Side Caching Works

Understanding the caching architecture and request flow:

Cache hit: Instant return from memory

When the SDK cache contains a fresh prompt (within TTL), it's returned immediately without any network requests.

[Cache hit diagram]

Performance:
  • Latency: Sub-millisecond (memory read)
  • Network requests: Zero
  • ABV API load: None
When this occurs: Every get_prompt() call after the first fetch, as long as the TTL hasn't expired.

Example:
# First call: Fetches from API, stores in cache
prompt = abv.get_prompt("movie-critic")  # ~20ms (network)

# Subsequent calls within TTL: Return from cache
prompt = abv.get_prompt("movie-critic")  # <1ms (memory)
prompt = abv.get_prompt("movie-critic")  # <1ms (memory)
# ... thousands more calls, all <1ms

Background revalidation: Stale-while-revalidate

When the cache TTL has expired, the stale prompt is served immediately while the SDK revalidates it in the background.

[Background revalidation diagram]

Process:
  1. SDK detects cache entry has expired (past TTL)
  2. Immediately return the stale cached prompt (zero latency)
  3. Asynchronously fetch updated prompt from ABV API in the background
  4. Update cache with fresh prompt for next request
  5. Next request gets updated prompt (still from cache, still zero latency)
User experience: No latency impact. Users always get instant responses.

Prompt freshness: New prompts appear after background revalidation completes (typically within 1-2 seconds of expiry).

Why this matters: You get the best of both worlds: zero latency on every request, plus automatic updates when prompts change in ABV.
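To make the flow concrete, here is a simplified, illustrative model of stale-while-revalidate. The cache structure, helper names, and threading approach are assumptions for the sketch, not the SDK's actual internals:

import time
import threading

# Illustrative in-memory cache: name -> (prompt, fetched_at)
_cache: dict[str, tuple[str, float]] = {}

def _fetch_from_api(name: str) -> str:
    # Stand-in for the real network fetch to the ABV API (hypothetical)
    return f"<prompt body for {name}>"

def _revalidate(name: str) -> None:
    # Refresh the cache entry; the next request will see the updated prompt
    _cache[name] = (_fetch_from_api(name), time.time())

def get_prompt_swr(name: str, ttl_seconds: float = 60.0) -> str:
    now = time.time()
    entry = _cache.get(name)

    if entry is None:
        # Cache miss: the only path that pays network latency
        prompt = _fetch_from_api(name)
        _cache[name] = (prompt, now)
        return prompt

    prompt, fetched_at = entry
    if now - fetched_at > ttl_seconds:
        # Stale: return the cached prompt immediately, refresh in the background
        threading.Thread(target=_revalidate, args=(name,), daemon=True).start()
    return prompt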

Cache miss: Fetch from multi-layer API cache

When no cached prompt exists (e.g., on first application startup), the prompt is fetched from the API. The API itself caches prompts in Redis to ensure low latency.

[Cache miss diagram]

Multi-layer caching in ABV API:
  1. Redis cache (primary): Single-digit millisecond latency
  2. Database fallback: Tens of milliseconds if Redis unavailable
Resilience: Multiple fallback layers ensure availability even during infrastructure issues (a minimal sketch of this lookup order follows after the list below).

Performance: First fetch typically completes in 10-30ms depending on your distance from ABV's servers.

When this occurs:
  • Application startup (cold start)
  • First use of a new prompt name
  • After explicitly disabling the cache (cache_ttl_seconds=0)
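
As promised above, a minimal sketch of the server-side lookup order. The redis client usage, key naming, and db_get_prompt helper are illustrative assumptions, not ABV's actual implementation:

import json
import redis  # third-party client, assumed available

r = redis.Redis()

def db_get_prompt(name: str) -> dict:
    # Stand-in for the database query (hypothetical)
    return {"name": name, "prompt": "..."}

def api_get_prompt(name: str) -> dict:
    key = f"prompt:{name}"
    try:
        cached = r.get(key)  # layer 1: Redis, single-digit ms
        if cached is not None:
            return json.loads(cached)
    except redis.RedisError:
        pass  # Redis unavailable: fall through to the database

    prompt = db_get_prompt(name)  # layer 2: database, tens of ms
    try:
        r.set(key, json.dumps(prompt), ex=60)  # repopulate the cache
    except redis.RedisError:
        pass
    return prompt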

Label changes propagate automatically

When you reassign labels (e.g., change production from version 3 to version 4), the cache updates automatically on the next revalidation.

Deployment timeline:
  1. Reassign production label to new version in ABV UI
  2. SDK caches continue serving version 3 (within TTL)
  3. After TTL expiry, background revalidation fetches version 4
  4. Next request serves version 4 (new production version)
Propagation time: At most one TTL duration (default 60 seconds). Within 60 seconds, all application instances will have the new version cached.

No code changes or restarts required: Your application automatically picks up the new prompt version.
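
In code, nothing changes on the client side. The same get_prompt call used throughout this page transparently picks up the new version once the TTL elapses; the production label here matches the deployment timeline above:

# Reassign the production label in the ABV UI at t=0 (default TTL = 60s):
#   t <  60s -> instances may still serve version 3 (cache entry within TTL)
#   t >= 60s -> background revalidation has fetched version 4
prompt = abv.get_prompt("movie-critic", label="production")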

Cache Configuration

Customize caching behavior based on your requirements:
Adjust the cache duration if you need a different freshness/performance tradeoff.

Python SDK:
# Cache for 5 minutes (reduce API load for very stable prompts)
prompt = abv.get_prompt("movie-critic", cache_ttl_seconds=300)

# Cache for 10 seconds (faster propagation of prompt changes)
prompt = abv.get_prompt("movie-critic", cache_ttl_seconds=10)
JavaScript/TypeScript SDK:
// Cache for 5 minutes
const prompt1 = await abv.prompt.get("movie-critic", {
  cacheTtlSeconds: 300,
});

// Cache for 10 seconds
const prompt2 = await abv.prompt.get("movie-critic", {
  cacheTtlSeconds: 10,
});
Tradeoffs:
  • Longer TTL (300s): Better performance (fewer API requests), slower prompt updates
  • Shorter TTL (10s): Faster prompt updates, slightly more API requests (still minimal)
Recommendation:
  • Stable production prompts: 300-600 seconds (5-10 minutes)
  • Rapid iteration during development: 10-30 seconds
  • Default (60s): Good balance for most scenarios
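
One common pattern is to derive the TTL from the runtime environment. A minimal sketch, assuming an APP_ENV environment variable (the variable name and the mapping are illustrative, not an SDK feature):

import os

# TTLs follow the recommendations above; tune them to your own tradeoffs
CACHE_TTL_BY_ENV = {
    "production": 300,   # stable prompts: fewer API requests
    "development": 10,   # rapid iteration: faster propagation
}

ttl = CACHE_TTL_BY_ENV.get(os.getenv("APP_ENV", ""), 60)  # fall back to the 60s default
prompt = abv.get_prompt("movie-critic", cache_ttl_seconds=ttl)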
Disable caching entirely for development environments where you want every fetch to return the latest prompt immediately.

Python SDK:
# Fetch from API on every call (no caching)
prompt = abv.get_prompt("movie-critic", cache_ttl_seconds=0)

# Common in development: no cache + latest version
prompt = abv.get_prompt("movie-critic", cache_ttl_seconds=0, label="latest")
JavaScript/TypeScript SDK:
// Fetch from API on every call
const prompt1 = await abv.prompt.get("movie-critic", {
  cacheTtlSeconds: 0,
});

// Common in development: no cache + latest version
const prompt2 = await abv.prompt.get("movie-critic", {
  cacheTtlSeconds: 0,
  label: "latest",
});
When to use:
  • Local development: Iterate on prompts in ABV UI, see changes immediately in your application
  • Testing environments: Ensure tests always use the exact prompt version expected
  • Debugging: Eliminate caching as a variable when troubleshooting
Not recommended for production: Adds 10-30ms latency to every fetch and increases API load. Use default caching in production.

Optional: Pre-fetching on Application Startup

Pre-fetch prompts during application initialization to ensure the cache is populated before serving requests:
Consider pre-fetching if:
  • Your application is latency-sensitive and cannot tolerate even a single 20ms first fetch
  • You want to fail fast at startup if ABV is unreachable
  • You’re deploying to edge environments where cold starts are common
Skip pre-fetching if:
  • Your application has natural warmup time (background workers, long-lived servers)
  • A single 20ms fetch on first use is acceptable
  • You fetch many prompts (pre-fetching all of them adds startup time)
Typical scenario: Most applications don’t need pre-fetching. The minimal latency of the first fetch (10-30ms) is acceptable.
[Pre-fetch diagram]

Python SDK:
from abvdev import ABV

abv = ABV(api_key="sk-abv-...", host="https://app.abv.dev")

# Pre-fetch prompts during application initialization
def initialize_app():
    # Fetch and cache all prompts used by the application
    prompts = {
        "movie-critic": abv.get_prompt("movie-critic"),
        "summarizer": abv.get_prompt("summarizer"),
        "translator": abv.get_prompt("translator"),
    }

    print("Prompts pre-fetched and cached")
    return prompts

# Call during application startup
cached_prompts = initialize_app()

# Later during request handling: prompts return instantly from cache
prompt = abv.get_prompt("movie-critic")  # <1ms (cached)
JavaScript/TypeScript SDK:
import { ABVClient } from "@abvdev/client";

const abv = new ABVClient();

// Pre-fetch prompts during application initialization
async function initializeApp() {
  // Fetch and cache all prompts used by the application
  const prompts = {
    movieCritic: await abv.prompt.get("movie-critic"),
    summarizer: await abv.prompt.get("summarizer"),
    translator: await abv.prompt.get("translator"),
  };

  console.log("Prompts pre-fetched and cached");
  return prompts;
}

// Call during application startup
await initializeApp();

// Later during request handling: prompts return instantly from cache
const prompt = await abv.prompt.get("movie-critic");  // <1ms (cached)
Benefit: First user request has zero network latency for prompt fetching.

Cost: Application startup is slightly slower (one network fetch per prompt).

Optional: Fallback Prompts for Maximum Availability

Provide fallback prompts to ensure 100% availability even if ABV’s API is completely unreachable:
[Fallback diagram]

Fallback prompts are only needed when both of the following hold:
  • The local cache is empty (e.g., fresh application startup)
  • The ABV API is unavailable (network issues, ABV outage)
In practice, this is extremely rare because:
  • ABV’s prompts API is highly available (status page)
  • SDK-level caching means your application continues working during brief outages
  • Even if ABV goes down, cached prompts remain available for hours
Consider fallback prompts only if:
  • Your application is absolutely mission-critical with zero-downtime requirements
  • You cannot tolerate even a startup failure if ABV is unreachable
  • Regulatory or contractual obligations require offline operation
For most applications: Fallback prompts are unnecessary. The SDK cache provides sufficient resilience.
Fallback prompts are typically hardcoded strings used as a last resort when ABV is unreachable.

Python SDK:
from abvdev import ABV

abv = ABV(api_key="sk-abv-...", host="https://app.abv.dev")

FALLBACK_PROMPTS = {
    "movie-critic": "As a movie critic, provide your review of {{movie}}.",
    "summarizer": "Summarize the following text: {{text}}",
}

class FallbackPrompt:
    """Minimal stand-in exposing the same compile() interface as a fetched prompt."""

    def __init__(self, template: str):
        self.template = template

    def compile(self, **variables):
        # Simple {{variable}} substitution
        result = self.template
        for key, value in variables.items():
            result = result.replace(f"{{{{{key}}}}}", str(value))
        return result

def get_prompt_with_fallback(name: str):
    try:
        # Try to fetch from ABV (with caching)
        return abv.get_prompt(name)
    except Exception as e:
        print(f"ABV unavailable, using fallback for {name}: {e}")
        # Return the hardcoded fallback prompt
        return FallbackPrompt(FALLBACK_PROMPTS[name])

# Usage
prompt = get_prompt_with_fallback("movie-critic")
compiled = prompt.compile(movie="Dune 2")
JavaScript/TypeScript SDK:
import { ABVClient } from "@abvdev/client";

const abv = new ABVClient();

const FALLBACK_PROMPTS: Record<string, string> = {
  "movie-critic": "As a movie critic, provide your review of {{movie}}.",
  "summarizer": "Summarize the following text: {{text}}",
};

class FallbackPrompt {
  constructor(private template: string) {}

  compile(variables: Record<string, string>): string {
    // Simple {{variable}} substitution (replaceAll handles repeated variables)
    let result = this.template;
    for (const [key, value] of Object.entries(variables)) {
      result = result.replaceAll(`{{${key}}}`, value);
    }
    return result;
  }
}

async function getPromptWithFallback(name: string) {
  try {
    // Try to fetch from ABV (with caching)
    return await abv.prompt.get(name);
  } catch (error) {
    console.warn(`ABV unavailable, using fallback for ${name}:`, error);
    // Return the hardcoded fallback prompt
    return new FallbackPrompt(FALLBACK_PROMPTS[name]);
  }
}

// Usage
const prompt = await getPromptWithFallback("movie-critic");
const compiled = prompt.compile({ movie: "Dune 2" });
Important: Fallback prompts are never linked to traces (no metrics tracking when using fallbacks).

Learn more about guaranteed availability →

Performance Benchmarks

Real-world performance measurements of prompt fetching with caching fully disabled (the worst-case scenario). We measured the execution time of the following code:
prompt = abv.get_prompt("perf-test", cache_ttl_seconds=0)
prompt.compile(input="test")
Results from 1,000 sequential executions:

[Performance benchmark chart]

Key findings:
  • Median latency: ~15-20ms (includes network round-trip)
  • p95 latency: ~30-40ms (accounting for network variability)
  • p99 latency: ~50-60ms (rare slow requests)
With caching enabled (default):
  • First fetch: 15-20ms (fetches from API, stores in cache)
  • All subsequent fetches: <1ms (from cache)
Conclusion: Even without caching, prompt fetching is fast (50-60ms at p99). With caching, it's essentially free (<1ms) for every request after the first.

Run the benchmark yourself: Jupyter notebook
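
If you want a quick local approximation without the notebook, a harness along these lines works (the timing loop and percentile math are our sketch, not the notebook's exact code):

import time
import statistics

durations_ms = []
for _ in range(1000):
    start = time.perf_counter()
    prompt = abv.get_prompt("perf-test", cache_ttl_seconds=0)
    prompt.compile(input="test")
    durations_ms.append((time.perf_counter() - start) * 1000)

durations_ms.sort()
print(f"median: {statistics.median(durations_ms):.1f} ms")
print(f"p95:    {durations_ms[int(0.95 * len(durations_ms))]:.1f} ms")
print(f"p99:    {durations_ms[int(0.99 * len(durations_ms))]:.1f} ms")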

Next Steps