How Client-Side Caching Works
Understanding the caching architecture and request flow:

Cache hit: Instant return from memory
When the SDK cache contains a fresh prompt (within TTL), it’s returned immediately without any network requests.
Performance:
- Latency: Sub-millisecond (memory read)
- Network requests: Zero
- ABV API load: None
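The cache-hit path can be sketched as a TTL check against an in-memory store. This is an illustrative toy model only; names like `PromptCache` are assumptions, not part of the ABV SDK:

```python
import time

class PromptCache:
    """Toy model of the client-side prompt cache (not the actual SDK code)."""

    def __init__(self, ttl_seconds=60):
        self.ttl = ttl_seconds
        self._entries = {}  # prompt name -> (prompt, fetched_at)

    def get(self, name):
        entry = self._entries.get(name)
        if entry is None:
            return None  # cache miss: caller must fetch from the API
        prompt, fetched_at = entry
        if time.monotonic() - fetched_at < self.ttl:
            return prompt  # cache hit: returned from memory, no network request
        return None  # entry expired

    def put(self, name, prompt):
        self._entries[name] = (prompt, time.monotonic())

cache = PromptCache(ttl_seconds=60)
cache.put("movie-critic", "You are a movie critic.")  # hypothetical prompt name
assert cache.get("movie-critic") == "You are a movie critic."  # fresh: memory read
assert cache.get("unknown-prompt") is None                     # miss
```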
Prompts are served from memory for every subsequent get_prompt() call after the first fetch, as long as the TTL hasn't expired.

Background revalidation: Stale-while-revalidate
When the cache TTL has expired, stale prompts are served immediately while revalidating in the background.
Process:
- SDK detects the cache entry has expired (past TTL)
- Immediately return the stale cached prompt (zero latency)
- Asynchronously fetch updated prompt from ABV API in the background
- Update cache with fresh prompt for next request
- Next request gets updated prompt (still from cache, still zero latency)
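The steps above can be sketched as follows. This is an illustrative model of stale-while-revalidate, not the actual SDK implementation; the class and method names are assumptions:

```python
import threading
import time

class SWRCache:
    """Toy model of stale-while-revalidate (not the actual ABV SDK code)."""

    def __init__(self, fetch, ttl_seconds=60):
        self._fetch = fetch        # callable name -> fresh prompt (network call)
        self._ttl = ttl_seconds
        self._entries = {}         # name -> (prompt, fetched_at)
        self._lock = threading.Lock()

    def get(self, name):
        with self._lock:
            entry = self._entries.get(name)
        if entry is None:
            prompt = self._fetch(name)  # cache miss: block on the network once
            with self._lock:
                self._entries[name] = (prompt, time.monotonic())
            return prompt
        prompt, fetched_at = entry
        if time.monotonic() - fetched_at >= self._ttl:
            # Stale: serve the old prompt instantly, refresh in the background.
            threading.Thread(target=self._revalidate, args=(name,), daemon=True).start()
        return prompt

    def _revalidate(self, name):
        fresh = self._fetch(name)
        with self._lock:
            self._entries[name] = (fresh, time.monotonic())

# Demo: ttl_seconds=0 makes every cached entry immediately stale.
calls = {"n": 0}
def fetch(name):
    calls["n"] += 1
    return "v3" if calls["n"] == 1 else "v4"

cache = SWRCache(fetch, ttl_seconds=0)
assert cache.get("greeting") == "v3"   # miss: fetched synchronously
assert cache.get("greeting") == "v3"   # stale served instantly; refresh starts
time.sleep(0.3)                        # let background revalidation finish
assert cache.get("greeting") == "v4"   # next request sees the new version
```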
Cache miss: Fetch from multi-layer API cache
When no cached prompt exists (e.g., first application startup), the prompt is fetched from the API. The API itself caches prompts in Redis to ensure low latency.
Multi-layer caching in ABV API:
- Redis cache (primary): Single-digit millisecond latency
- Database fallback: Tens of milliseconds if Redis is unavailable

Cache misses occur in these scenarios:
- Application startup (cold start)
- First use of a new prompt name
- After explicitly disabling the cache (cache_ttl_seconds=0)
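Server-side, the Redis-first lookup with database fallback is a standard read-through pattern. The sketch below is an assumption about how such a lookup could work, not the ABV API's actual implementation; `redis` here is any object with `get`/`set` methods:

```python
def lookup_prompt(name, redis, database):
    """Read-through lookup: Redis first (single-digit ms in practice),
    database fallback (tens of ms), repopulating Redis on the way out.
    Illustrative sketch only, not the ABV API's real code."""
    try:
        cached = redis.get(name)
        if cached is not None:
            return cached
    except ConnectionError:
        pass  # Redis unavailable: fall through to the database
    prompt = database[name]
    try:
        redis.set(name, prompt)  # warm Redis for the next request
    except ConnectionError:
        pass
    return prompt

class FakeRedis:
    """In-memory stand-in so the sketch is runnable without a Redis server."""
    def __init__(self):
        self.store = {}
    def get(self, key):
        return self.store.get(key)
    def set(self, key, value):
        self.store[key] = value

redis = FakeRedis()
database = {"welcome": "Hi there!"}       # hypothetical prompt
assert lookup_prompt("welcome", redis, database) == "Hi there!"  # DB fallback
assert redis.store["welcome"] == "Hi there!"                     # Redis warmed
```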
Label changes propagate automatically
When you reassign labels (e.g., change production from version 3 to version 4), the cache updates automatically on the next revalidation.

Deployment timeline:
- Reassign the production label to the new version in the ABV UI
- SDK caches continue serving version 3 (within TTL)
- After TTL expiry, background revalidation fetches version 4
- Next request serves version 4 (the new production version)
Cache Configuration
Customize caching behavior based on your requirements.

Default Caching (Recommended)

By default, prompts are cached for 60 seconds with background revalidation.

Default behavior:
- TTL: 60 seconds
- Strategy: Stale-while-revalidate
- Freshness guarantee: Prompts update within 60 seconds of changes
- Performance: Sub-millisecond after first fetch

When to use: Most production scenarios. 60-second freshness is sufficient for typical prompt iteration cycles, and performance is optimal.
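The default behavior can be modeled as follows. This is a toy sketch, not the SDK's implementation; only the `get_prompt` name and the 60-second default come from this doc:

```python
import time

fetches = []  # records every simulated network request

def fetch_from_api(name):
    fetches.append(name)
    return f"prompt body for {name}"

_cache = {}

def get_prompt(name, cache_ttl_seconds=60):
    """Toy model: serve from memory within the TTL, otherwise hit the API."""
    entry = _cache.get(name)
    now = time.monotonic()
    if entry is not None and now - entry[1] < cache_ttl_seconds:
        return entry[0]               # cache hit: no network request
    prompt = fetch_from_api(name)     # cache miss or expired entry
    _cache[name] = (prompt, now)
    return prompt

get_prompt("summarizer")              # first call: one network fetch
get_prompt("summarizer")              # within 60s: served from memory
assert len(fetches) == 1
```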
Custom Cache TTL

Adjust the cache duration if you need a different freshness/performance tradeoff.

Tradeoffs:
- Longer TTL (300s): Better performance (fewer API requests), slower prompt updates
- Shorter TTL (10s): Faster prompt updates, slightly more API requests (still minimal)

Recommended values:
- Stable production prompts: 300-600 seconds (5-10 minutes)
- Rapid iteration during development: 10-30 seconds
- Default (60s): Good balance for most scenarios
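The tradeoff can be made concrete with a little arithmetic over call timestamps. This is a toy model of the `cache_ttl_seconds` behavior, not SDK code:

```python
def fetches_needed(call_times, ttl_seconds):
    """Count network fetches for get_prompt() calls at the given timestamps
    under a given cache TTL (toy model, not the SDK implementation)."""
    fetches = 0
    fetched_at = None
    for t in call_times:
        if fetched_at is None or t - fetched_at >= ttl_seconds:
            fetches += 1       # miss or expired entry: one API request
            fetched_at = t
    return fetches

calls = range(0, 600, 30)      # one call every 30s over 10 minutes (20 calls)
assert fetches_needed(calls, ttl_seconds=300) == 2    # long TTL: few API requests
assert fetches_needed(calls, ttl_seconds=10) == 20    # short TTL: refetch each call
assert fetches_needed(calls, ttl_seconds=60) == 10    # default: balanced
```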
Disable Caching (Development)

Disable caching entirely in development environments where every fetch should return the latest prompt immediately.

When to use:
- Local development: Iterate on prompts in ABV UI, see changes immediately in your application
- Testing environments: Ensure tests always use the exact prompt version expected
- Debugging: Eliminate caching as a variable when troubleshooting
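With caching disabled, every call goes to the API. The `cache_ttl_seconds=0` flag is named in this doc; the rest of this sketch is a toy model, not the SDK implementation:

```python
import time

fetch_log = []

def fetch_from_api(name):
    fetch_log.append(name)
    return "latest prompt text"

_cache = {}

def get_prompt(name, cache_ttl_seconds=60):
    """Toy model: cache_ttl_seconds=0 bypasses the cache entirely."""
    if cache_ttl_seconds > 0:
        entry = _cache.get(name)
        if entry is not None and time.monotonic() - entry[1] < cache_ttl_seconds:
            return entry[0]
    prompt = fetch_from_api(name)          # always fetch when caching is off
    if cache_ttl_seconds > 0:
        _cache[name] = (prompt, time.monotonic())
    return prompt

for _ in range(3):
    get_prompt("dev-prompt", cache_ttl_seconds=0)  # hypothetical prompt name
assert len(fetch_log) == 3  # every call fetched the latest version
```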
Optional: Pre-fetching on Application Startup
Pre-fetch prompts during application initialization to ensure the cache is populated before serving requests.

When to Pre-fetch
Consider pre-fetching if:
- Your application is latency-sensitive and cannot tolerate even a single 20ms first fetch
- You want to fail fast at startup if ABV is unreachable
- You're deploying to edge environments where cold starts are common

Skip pre-fetching if:
- Your application has natural warmup time (background workers, long-lived servers)
- A single 20ms fetch on first use is acceptable
- You fetch many prompts (pre-fetching all of them adds startup time)
Pre-fetching Implementation
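A minimal pre-fetch helper might look like the sketch below. The helper name and `fail_fast` flag are assumptions for illustration, not the ABV SDK API:

```python
def prefetch_prompts(get_prompt, names, fail_fast=True):
    """Warm the SDK cache during application startup.

    With fail_fast=True, startup aborts immediately if ABV is unreachable,
    surfacing the outage at deploy time rather than mid-request.
    """
    for name in names:
        try:
            get_prompt(name)   # each call populates the client-side cache
        except Exception:
            if fail_fast:
                raise

# Usage at application init (hypothetical prompt names; a list-append
# stands in for the real get_prompt call):
warmed = []
prefetch_prompts(warmed.append, ["summarizer", "classifier"])
assert warmed == ["summarizer", "classifier"]
```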
Optional: Fallback Prompts for Maximum Availability
Provide fallback prompts to ensure 100% availability even if ABV's API is completely unreachable.

When Fallback Prompts Are Necessary
Fallback prompts are needed only when both of the following hold:
- The local cache is empty (e.g., fresh application startup)
- AND the ABV API is unavailable (network issues, ABV outage)

In practice this combination is rare:
- ABV's prompts API is highly available (see the status page)
- SDK-level caching means your application continues working during brief outages
- Even if ABV goes down, cached prompts remain available for hours

Consider fallback prompts if:
- Your application is absolutely mission-critical with zero-downtime requirements
- You cannot tolerate even a startup failure if ABV is unreachable
- Regulatory or contractual obligations require offline operation
Implementing Fallback Prompts

Fallback prompts are typically hardcoded strings used as a last resort when ABV is unreachable.

Important: Fallback prompts are never linked to traces (no metrics tracking when using fallbacks).

Learn more about guaranteed availability →
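The last-resort pattern can be sketched as a wrapper. The wrapper and parameter names here are illustrative assumptions; check the SDK reference for the actual fallback API:

```python
# Hardcoded last-resort prompt, kept in source control alongside the app.
FALLBACK = "You are a helpful assistant. Answer the user's question concisely."

def get_prompt_with_fallback(fetch, name, fallback):
    """Return the managed prompt, or the hardcoded fallback if ABV is
    unreachable. Fallbacks are not linked to traces (no metrics tracking)."""
    try:
        return fetch(name)
    except Exception:
        return fallback  # last resort: app keeps serving requests

def unreachable(name):
    raise ConnectionError("ABV API unavailable")

# Hypothetical prompt name; `fetch` stands in for the SDK's network call.
assert get_prompt_with_fallback(unreachable, "support-bot", FALLBACK) == FALLBACK
assert get_prompt_with_fallback(lambda n: "managed prompt", "support-bot", FALLBACK) == "managed prompt"
```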
Performance Benchmarks
Real-world performance measurements of prompt fetching with caching disabled.

First Fetch Performance

We measured the execution time of an uncached prompt fetch over 1,000 sequential executions (worst-case scenario, caching fully disabled).
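The original measurement code isn't reproduced here; a minimal version of such a benchmark might look like the sketch below, with a no-op standing in for the uncached `get_prompt()` call:

```python
import statistics
import time

def benchmark(fetch, n=1000):
    """Time n sequential fetches and report percentile latencies in ms."""
    samples = []
    for _ in range(n):
        start = time.perf_counter()
        fetch()
        samples.append((time.perf_counter() - start) * 1000)
    samples.sort()
    return {
        "median_ms": statistics.median(samples),
        "p95_ms": samples[int(0.95 * n) - 1],
        "p99_ms": samples[int(0.99 * n) - 1],
    }

# Stand-in workload; replace the lambda with an uncached get_prompt() call.
stats = benchmark(lambda: None, n=100)
assert stats["median_ms"] <= stats["p95_ms"] <= stats["p99_ms"]
```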
Key findings:
- Median latency: ~15-20ms (includes network round-trip)
- p95 latency: ~30-40ms (accounting for network variability)
- p99 latency: ~50-60ms (rare slow requests)

With caching enabled:
- First fetch: 15-20ms (fetches from API, stores in cache)
- All subsequent fetches: <1ms (from cache)

Even with caching fully disabled, prompt fetching is fast (<50ms p99). With caching, it's essentially free (<1ms) for all requests after the first.

Run the benchmark yourself: Jupyter notebook