Understanding a few fundamental concepts will help you use guardrails effectively. This guide explains how guardrails work, what their results mean, and how to make decisions based on those results.

How Guardrails Make Decisions

Guardrails fall into two categories based on how they analyze content:
LLM-powered guardrails use language models to understand context, nuance, and intent. When you send text to an LLM-powered guardrail like toxic language or biased language, it asks a language model to analyze the content and make a judgment. This takes about 1-3 seconds but gives you sophisticated analysis that understands sarcasm, coded language, and cultural context. Example: “People like you are the problem” → the LLM recognizes this as hostile even without explicit profanity.
Rule-based guardrails apply deterministic logic such as string matching or JSON validation. They run locally in milliseconds, cost nothing, and always return a binary pass/fail result.
Key difference: LLM-powered guardrails understand meaning while rule-based guardrails match patterns. If someone writes “people like you are the problem,” an LLM-powered guardrail recognizes this as hostile even though it doesn’t contain explicit profanity. A rule-based guardrail would only catch it if you explicitly listed that exact phrase.
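To make the difference concrete, here is a minimal sketch that runs the same phrase through both kinds of checks, using the abv.guardrails calls that appear later in this guide. The listed strings and the outputs in the comments are illustrative assumptions, not guaranteed results.
// The same phrase through a rule-based check and an LLM-powered check.
// Assumes an initialized `abv` client, as in the examples later in this guide.
const content = "People like you are the problem";

// Rule-based: only matches the exact strings you list, so this passes.
const ruleCheck = await abv.guardrails.containsString.validate(content, {
  strings: ["idiot", "hate you"],
  mode: "none"
});
console.log(ruleCheck.status);     // "pass" - no listed string appears

// LLM-powered: understands intent, so this is likely to fail despite no profanity.
const llmCheck = await abv.guardrails.toxicLanguage.validate(content);
console.log(llmCheck.status);      // e.g. "fail"
console.log(llmCheck.confidence);  // variable, e.g. 0.7-0.9
console.log(llmCheck.reason);      // human-readable explanation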

Understanding Results

Every guardrail returns a result with three essential pieces of information:
The status field tells you the outcome of validation:
  • pass - Content meets your validation criteria and is safe to use
  • fail - Content violates your criteria and should be blocked or regenerated
  • unsure - The guardrail cannot make a confident determination (LLM-powered only)
The unsure status only appears with LLM-powered guardrails since rule-based guardrails make binary decisions. You’ll see unsure when content is genuinely ambiguous or sits right on the boundary between acceptable and unacceptable.
Examples of unsure cases:
  • Mild sarcasm that’s hard to judge definitively
  • Comments that could be critical feedback or personal attacks
  • Context-dependent language without enough context
The confidence field indicates how certain the guardrail is about its decision.
Rule-based guardrails always return 1.0 because their logic is deterministic: there is no uncertainty when checking whether a string contains another string or whether JSON is valid.
LLM-powered guardrails return variable confidence:
  • 0.9-1.0 (Very high) - Clear, unambiguous indicators
  • 0.7-0.9 (High) - Strong evidence with minor ambiguity
  • 0.5-0.7 (Moderate) - Notable ambiguity in the case
  • < 0.5 (Low) - Borderline, difficult to judge definitively
Factors affecting confidence:
  • Content with obvious violations or clear acceptability → High confidence
  • Ambiguous phrasing, sarcasm, context-dependent meaning → Lower confidence
  • Very short text with little context → Lower confidence
  • Cultural or linguistic nuances → May reduce confidence
The reason field provides a human-readable explanation of why the guardrail made its decision.
Use cases:
  • Internal logging and debugging
  • Analyzing patterns in your dashboard
  • Understanding edge cases
  • Tuning your configuration
Security note: Never expose the detailed reason to end users. This information helps attackers understand your validation logic so they can evade it. Use generic error messages for users while logging detailed reasons internally.
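Putting the three fields together, here is a minimal sketch of handling a result while keeping the detailed reason internal. The result shape follows the fields described above; logger is a placeholder for whatever internal logging you use.
// Log full details internally, return only a generic message to the user.
// `logger` is a placeholder for your own internal logging.
function respondToUser(result, content) {
  logger.info("guardrail result", {
    status: result.status,
    confidence: result.confidence,
    reason: result.reason   // never sent to the end user
  });

  if (result.status === "pass") {
    return { content };
  }

  // Generic message only; the detailed reason would help attackers probe
  // your validation logic.
  return { error: "Your message could not be processed." };
}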

How Guardrails Process Content

Key differences:
  • Rule-based guardrails always return binary results (PASS/FAIL) with confidence 1.0
  • LLM-powered guardrails can return UNSURE status with variable confidence scores
  • Your application decides how to handle each result based on status and confidence

Common Usage Patterns

Understanding common patterns helps you apply guardrails effectively in your application:
Run guardrails before sending user content to your LLM. This protects your LLM from toxic prompts, prevents prompt injection attacks, and ensures you only process valid requests. Common pattern: check for forbidden strings first (instant), then run LLM-powered toxicity detection only if the content passes the initial screening.
Run guardrails after your LLM generates content but before showing it to users. This maintains brand safety, ensures compliance with regulations, and catches cases where the LLM generates unexpected content. Essential for customer-facing applications and regulated industries.
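A minimal sketch of both patterns together, assuming an initialized abv client and a hypothetical generateReply function standing in for your own LLM call:
// Validate user input before the LLM call and the LLM output after it.
// `generateReply` is a hypothetical stand-in for your own LLM call.
async function answerUser(userMessage) {
  // Input validation: block toxic or injected prompts before spending tokens.
  const inputCheck = await abv.guardrails.toxicLanguage.validate(userMessage);
  if (inputCheck.status !== "pass") {
    return { error: "Your message could not be processed." };
  }

  const reply = await generateReply(userMessage);

  // Output validation: keep unexpected model output away from users.
  const outputCheck = await abv.guardrails.toxicLanguage.validate(reply);
  if (outputCheck.status !== "pass") {
    return { error: "We could not generate a response. Please try again." };
  }

  return { reply };
}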
Use multiple guardrails in sequence for comprehensive protection. Start with fast rule-based checks to catch obvious problems, then run expensive LLM-powered checks only if content passes initial screening. This pattern minimizes cost while maintaining thorough validation.
Run multiple independent guardrails simultaneously to minimize latency. For example, checking for toxic language and biased language are independent analyses that can happen in parallel. Total time equals the slowest check, not the sum of all checks.

Sensitivity Levels

LLM-powered guardrails support sensitivity settings that control validation strictness. Choosing the appropriate level depends on your system’s risk classification under regulatory frameworks and the potential harm from content failures.
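For example, a permissive configuration might look like the sketch below. The sensitivity option name is an assumption based on the configuration fields captured in observations, not a documented signature, so check your SDK reference for the exact parameter.
// Sketch only: `sensitivity` is an assumed option name, not a confirmed API.
const result = await abv.guardrails.toxicLanguage.validate(content, {
  sensitivity: "low"  // permissive: only severe violations trigger failures
});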

Regulatory Framework Mapping

Permissive validation: only severe violations trigger failures.
Regulatory Context: Maps to Minimal/No-Risk AI Systems under the EU AI Act with no mandatory compliance obligations.
EU AI Act Alignment:
  • Internal productivity tools
  • Non-critical recommendation systems
  • Entertainment and gaming applications
  • General-purpose utilities (spam filters, search)
ISO 42001 Requirements:
  • Voluntary codes of conduct
  • Basic ethical AI principles
  • Standard software development practices
  • No specific AI governance mandates
NIST AI RMF Considerations:
  • Minimal potential for harm to people
  • Low organizational impact
  • Limited ecosystem effects
  • Easily reversible outcomes
Validation Behavior:
  • Flags explicit threats and hate speech
  • Detects clear, unambiguous violations
  • Identifies severe discriminatory language
  • Allows robust debate and strong opinions
Compliance Note: Sensitivity level selection affects your ability to demonstrate compliance with regulatory requirements. High-risk LLM systems under the EU AI Act require stricter content controls and comprehensive audit trails. ABV automatically captures all validation results for compliance documentation.

Making Decisions with Results

Your decision logic determines how your application responds to validation results:

Simple Decision Strategy

// Simple approach using only status
function handleResult(result, content) {
  if (result.status === "pass") {
    return { action: "allow", content };
  }

  if (result.status === "fail") {
    return { action: "block", message: "Content violates guidelines" };
  }

  // You choose how to handle "unsure"
  // Conservative: treat as fail
  // Permissive: treat as pass
  // Balanced: flag for review
  return { action: "review", content };
}

Sophisticated Decision Strategy

Use confidence scores for tiered responses:
// Tiered approach using status AND confidence
async function handleResult(result, content) {
  if (result.status === "pass") {
    return { action: "allow", content };
  }

  if (result.status === "fail" && result.confidence > 0.8) {
    // High-confidence failure: auto-block
    await logRejection(content, result.reason, "auto");
    return { action: "block", message: "Content violates guidelines" };
  }

  if (result.status === "fail" && result.confidence > 0.6) {
    // Medium-confidence failure: flag for review
    await queueForReview(content, result);
    return { action: "pending", message: "Content under review" };
  }

  // Low confidence or unsure: always review
  await queueForReview(content, result);
  return { action: "pending", message: "Content under review" };
}

Response Times and Costs

Understanding performance characteristics helps you build efficient validation pipelines. Rule-based checks are effectively free and instant, while LLM-powered checks take roughly 1-3 seconds and add model inference cost, which is why the patterns above run them only after fast checks pass.
Rule-based guardrails:
Performance:
  • Response time: < 10 milliseconds
  • Cost: $0 (runs locally)
  • Predictable: Always same speed
Best for:
  • Pre-filtering before expensive checks
  • High-volume validation
  • Real-time validation
  • Patterns you can enumerate
Examples:
  • Contains String: Check for forbidden terms
  • Valid JSON: Validate structured outputs

Observations and Monitoring

Every guardrail execution automatically creates an observation in your ABV dashboard.
What’s captured:
  • Input text (the content you validated)
  • Result (status, confidence, reason)
  • Configuration (sensitivity, mode, schema)
  • Performance (timing, token usage)
  • Context (user, session, trace)
How to use observations:
  • Track failure rates by guardrail type
  • See which content types cause the most failures
  • Identify trends in user behavior
  • Spot unusual spikes in violations
  • See where ambiguity occurs in your validation
  • Identify categories that need better rules
  • Understand when human review is needed most
  • Tune confidence thresholds for decisions
  • Too many false positives? Lower sensitivity
  • Harmful content slipping through? Raise sensitivity
  • Different sensitivities for different contexts
  • A/B test different sensitivity levels
  • Examine full context of a validation
  • Understand why specific content failed/passed
  • Reproduce issues for investigation
  • Improve your prompts based on patterns

Combining Multiple Guardrails

Most applications use multiple guardrails together:

Independent Guardrails (Run in Parallel)

Guardrails checking different criteria can run simultaneously:
// These checks are independent - run in parallel
const [toxicCheck, biasCheck] = await Promise.all([
  abv.guardrails.toxicLanguage.validate(content),
  abv.guardrails.biasedLanguage.validate(content)
]);

// Total time = slowest check (not sum of both)

Dependent Guardrails (Run Sequentially)

Create validation pipelines where fast checks filter before expensive checks:
// Sequential pipeline: fast check filters before expensive check
const quickCheck = await abv.guardrails.containsString.validate(content, {
  strings: ["forbidden", "banned", "prohibited"],
  mode: "none"
});

if (quickCheck.status === "fail") {
  return { valid: false };  // Failed quick check, skip expensive check
}

// Only run expensive LLM check if quick check passed
const deepCheck = await abv.guardrails.toxicLanguage.validate(content);
return { valid: deepCheck.status === "pass" };

Next Steps