Best Practices
This guide covers strategies for using guardrails effectively in production applications. You'll learn how to optimize performance and costs, implement robust error handling, maintain security, and monitor your validation pipeline.

Choosing Between LLM-Powered and Rule-Based Guardrails
The first and most important decision in building your validation pipeline is when to use LLM-powered guardrails versus rule-based ones. This choice fundamentally affects your application's performance, cost, and accuracy.

Rule-based guardrails like contains string and valid JSON use deterministic logic. They check for exact patterns or validate against fixed rules. This makes them extremely fast, responding in under ten milliseconds, and completely free, since they run locally without making any API calls. The tradeoff is that they can only catch what you explicitly define. If you're checking whether text contains the word "password", they will catch that exact word but won't catch "my pw" or "my login credentials" unless you've added those to your list.

LLM-powered guardrails like toxic language and biased language use language models to understand context and meaning. They can recognize that "people like you are the problem" is hostile even though it contains no profanity. They understand sarcasm, coded language, and subtle implications. This power comes at a cost of roughly one to three seconds of latency per check, plus token consumption for the API call.

The key insight for building efficient systems is to use rule-based guardrails whenever your requirements can be expressed as explicit patterns, and to save LLM-powered guardrails for cases where you need semantic understanding. In many scenarios you can use both in sequence, with the fast rule-based check acting as a filter before the expensive LLM check.
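To make the rule-based side concrete, here is a minimal sketch of a contains string check and a valid JSON check written as plain local TypeScript. The term list and function names are illustrative, not part of any SDK; the sequential combination with an LLM-powered check is shown in the next section.

```typescript
// Minimal sketch of deterministic, rule-based checks. They run locally,
// which is why they are effectively free and return in well under ten milliseconds.

const FORBIDDEN_TERMS = ["password", "api key", "secret token"]; // example list

function containsForbiddenTerm(text: string): boolean {
  const lowered = text.toLowerCase();
  return FORBIDDEN_TERMS.some((term) => lowered.includes(term));
}

function isValidJson(text: string): boolean {
  try {
    JSON.parse(text);
    return true;
  } catch {
    return false;
  }
}

// The exact pattern is caught, but a paraphrase slips through
// unless you add it to the list.
console.log(containsForbiddenTerm("my password is hunter2")); // true
console.log(containsForbiddenTerm("my login credentials are hunter2")); // false
console.log(isValidJson('{"ok": true}')); // true
```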
Building Efficient Validation Pipelines

Understanding how to sequence and combine guardrails is essential for building performant applications. The pattern that works best in most cases is to run fast checks first to catch obvious issues, then run expensive checks only when necessary.

Consider a scenario where you want to validate user input for both forbidden terms and toxic content. Your intuition might be to run both checks in parallel to minimize latency. However, if the content contains an explicitly forbidden term, there's no point in spending one to three seconds and the token cost to analyze its tone. Instead, check for forbidden terms first using contains string, which takes under ten milliseconds. Only if that passes do you run the toxic language check.

This sequential pattern creates a validation pipeline where each stage filters out content before more expensive stages run. The earlier stages should be faster and cheaper, acting as gatekeepers. Here's what this looks like in practice:
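The sketch below assumes a hypothetical checkGuardrail(name, text) helper that wraps whatever client your SDK provides and resolves to a result with a status field; substitute the real call for your setup.

```typescript
// Hypothetical result shape -- your SDK's field names may differ.
interface GuardrailResult {
  status: "pass" | "fail" | "unsure";
  confidence: number;
  reason: string;
}

// Placeholder for the real guardrail client call (assumed, not a real API).
async function checkGuardrail(name: string, text: string): Promise<GuardrailResult> {
  throw new Error("replace with your guardrail SDK/API call");
}

const FORBIDDEN_TERMS = ["password", "credit card"]; // example list

async function validateUserInput(text: string): Promise<{ allowed: boolean; stage: string }> {
  // Stage 1: fast, free, rule-based gatekeeper (local, under ten milliseconds).
  const lowered = text.toLowerCase();
  if (FORBIDDEN_TERMS.some((term) => lowered.includes(term))) {
    return { allowed: false, stage: "contains string" };
  }

  // Stage 2: slower, paid, LLM-powered semantic check (roughly one to three seconds).
  // Only runs once the cheap stage has already passed.
  const toxicity = await checkGuardrail("toxic language", text);
  if (toxicity.status !== "pass") {
    // Blocking both fail and unsure here; adjust to your own policy.
    return { allowed: false, stage: "toxic language" };
  }

  return { allowed: true, stage: "all checks passed" };
}
```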
Selecting the Right Sensitivity Level
Sensitivity level is one of the most important configuration choices for LLM-powered guardrails, and it requires careful thought about your application's context and risk tolerance. The right sensitivity balances safety against false positives.

Consider starting with medium sensitivity in development and adjusting based on your observations. Medium is a balanced default that catches clear violations while allowing professional disagreement. As you review your guardrail observations in the ABV dashboard, you'll see patterns that help you tune your sensitivity.

If you're seeing too many false positives, where acceptable content gets blocked, you're probably using a sensitivity that's too high for your context. For example, if you're building a technical forum where people debate code quality and technical critiques are getting flagged, consider lowering your sensitivity. Technical discussions naturally include strong opinions about approaches and implementations.

If you're seeing problematic content slip through, your sensitivity is probably too low. This is particularly important to watch for in consumer-facing applications or applications used by children. Any hostile content that reaches users damages trust and potentially exposes you to liability.

You can also implement context-aware sensitivity, where different parts of your application or different user types get different sensitivity levels. A children's section gets high sensitivity automatically. Content from authenticated users with good history might get medium sensitivity while anonymous users get high sensitivity. Customer service queries might get lower sensitivity, since you expect frustrated customers to express strong emotions.
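One way to wire this up is a small lookup that resolves the sensitivity level from the request context before the guardrail call is made. The context fields and level names below are illustrative, assuming a low/medium/high scheme:

```typescript
// Illustrative mapping from application context to a sensitivity level.
type Sensitivity = "low" | "medium" | "high";

interface RequestContext {
  surface: "kids_section" | "technical_forum" | "customer_service" | "general";
  authenticated: boolean;
  userInGoodStanding: boolean;
}

function resolveSensitivity(ctx: RequestContext): Sensitivity {
  // A children's section always gets the strictest setting.
  if (ctx.surface === "kids_section") return "high";

  // Technical forums tolerate blunt critique; frustrated customers vent.
  if (ctx.surface === "technical_forum") return "low";
  if (ctx.surface === "customer_service") return "low";

  // Anonymous or unproven users get stricter treatment than trusted ones.
  if (!ctx.authenticated || !ctx.userInGoodStanding) return "high";
  return "medium";
}
```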
Implementing Sophisticated Decision Logic

The simplest decision logic uses only the status field. If status is pass you allow the content, if it's fail you block it, and if it's unsure you choose whether to be conservative and block or permissive and allow. This works for many applications, but you can build more sophisticated logic using confidence scores and multiple guardrails together.

Consider implementing tiered responses based on confidence. When a guardrail returns fail with high confidence above 0.9, you can automatically block the content without human review. When it returns fail with moderate confidence between 0.6 and 0.9, you might flag it for human review rather than blocking automatically. When it returns fail with low confidence below 0.6, or returns unsure, you definitely want human eyes on it before making a final decision.

This approach recognizes that LLM-powered guardrails are probabilistic systems. They're very good at catching clear violations but less certain about edge cases. By using confidence scores, you can automate the obvious cases while routing ambiguous cases to humans, which gives you both efficiency and accuracy. Here's what tiered decision logic looks like:
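This sketch assumes a result object carrying the status, confidence, and reason fields described above; adjust the field names to match your SDK.

```typescript
// Hypothetical result shape -- mirrors the status, confidence, and reason
// fields discussed in this guide; your SDK's names may differ.
interface GuardrailResult {
  status: "pass" | "fail" | "unsure";
  confidence: number; // 0 to 1
  reason: string;
}

type Decision = "allow" | "block" | "human_review";

function decide(result: GuardrailResult): Decision {
  if (result.status === "pass") return "allow";

  // unsure always gets human eyes before a final call is made.
  if (result.status === "unsure") return "human_review";

  // status is "fail": tier the response by confidence.
  if (result.confidence > 0.9) return "block"; // clear violation, block automatically
  if (result.confidence >= 0.6) return "human_review"; // moderate confidence, flag for review
  return "human_review"; // low confidence, route to a human rather than auto-blocking
}
```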
Security and Privacy Considerations
Guardrails are part of your security posture, and you need to think carefully about how you use them to avoid creating vulnerabilities.

The most important security principle is to never expose the detailed reason field to end users. When a guardrail blocks content, the reason explains exactly why, and this information helps attackers understand your validation logic so they can evade it. Instead of showing the detailed reason, return generic error messages to users while logging the detailed reason internally for monitoring and debugging. This prevents adversarial learning, where users iterate on their inputs to find ways around your guardrails.

Another security consideration is validating in both directions. You should validate user inputs before sending them to your LLM to prevent prompt injection and context poisoning. You should also validate LLM outputs before showing them to users to maintain brand safety and compliance. Don't assume that because your LLM is generally helpful it will never generate problematic content. LLMs can be manipulated through cleverly crafted prompts, and even without manipulation they sometimes generate content you wouldn't want to show users.

Rate limiting is another security measure worth implementing. Guardrail checks have a cost, and malicious actors could potentially run up your bill by making many validation requests. Implement rate limits on how many validations a single user or IP address can request in a given time window. This protects both your costs and your service availability.

Finally, consider implementing monitoring and alerting for unusual patterns. If you suddenly see a spike in failed validations from a particular user or for a particular type of content, that might indicate an attack or a system misconfiguration. The ABV dashboard gives you observability into these patterns, and you should review it regularly for anomalies.
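As a sketch of the "generic message out, detailed reason in" pattern, and assuming the same hypothetical result shape as above, the handler below logs the detailed reason server-side and returns only a generic message; the logging call stands in for whatever observability tooling you use.

```typescript
// Hypothetical result shape -- the detailed reason field is for internal use only.
interface GuardrailResult {
  status: "pass" | "fail" | "unsure";
  confidence: number;
  reason: string;
}

function respondToBlockedContent(userId: string, result: GuardrailResult): { message: string } {
  // Log the detailed reason internally for monitoring and debugging.
  console.error("guardrail_blocked", {
    userId,
    status: result.status,
    confidence: result.confidence,
    reason: result.reason, // never leaves the server
  });

  // Return only a generic message so users cannot learn how to evade the check.
  return { message: "Your message could not be posted. Please revise it and try again." };
}
```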
Error Handling and Resilience

Guardrails make network requests to external services, which means they can fail in ways beyond your control. Network issues, service outages, and rate limits can all cause validation requests to fail. You need to decide how to handle these failure modes based on your application's risk tolerance.

The two fundamental strategies are failing open and failing closed. Failing open means that when validation fails due to an error, you allow the content through. This prioritizes availability over security. Use it when blocking legitimate content during a service outage would be worse than temporarily allowing some potentially problematic content through. Failing closed means that when validation fails, you block the content. This prioritizes security over availability. Use it when allowing problematic content through could cause serious harm or legal liability.

Most applications should fail closed for user-facing content and fail open for internal content. If an end user is posting content that will be seen by other users, fail closed to protect your community. If an employee is using an internal tool and validation fails, fail open to avoid disrupting their work.

Implement retry logic with exponential backoff for transient failures. Network blips and temporary service issues often resolve themselves quickly. If a validation request fails, wait a moment and try again. If it fails again, wait longer before the next attempt. After a few attempts, accept the failure and fall back to your fail-open or fail-closed strategy. Here's what robust error handling looks like:
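The sketch below again assumes a hypothetical checkGuardrail helper in place of your real client. It retries transient failures with exponential backoff and then falls back to fail open or fail closed.

```typescript
// Hypothetical result shape and client call -- substitute your actual SDK.
interface GuardrailResult {
  status: "pass" | "fail" | "unsure";
  confidence: number;
  reason: string;
}

async function checkGuardrail(name: string, text: string): Promise<GuardrailResult> {
  throw new Error("replace with your guardrail SDK/API call");
}

const sleep = (ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms));

// Retries transient failures with exponential backoff, then falls back to the
// configured strategy: fail closed (block) or fail open (allow).
async function validateWithFallback(
  guardrail: string,
  text: string,
  failOpen: boolean,
  maxAttempts = 3
): Promise<{ allowed: boolean }> {
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    try {
      const result = await checkGuardrail(guardrail, text);
      // Treating unsure as a block here; adjust to your own policy.
      return { allowed: result.status === "pass" };
    } catch (error) {
      if (attempt === maxAttempts) break;
      // Wait one second, then two, then four... before the next attempt.
      await sleep(1000 * 2 ** (attempt - 1));
    }
  }
  // Every attempt failed: apply the fail-open or fail-closed strategy.
  return { allowed: failOpen };
}
```

For user-facing content you would call this with failOpen set to false, failing closed to protect your community; for an internal tool you would pass true, failing open so a validation outage does not disrupt employees' work.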