How Guardrails Make Decisions
Guardrails fall into two categories based on how they analyze content:
- LLM-Powered
- Rule-Based
LLM-powered guardrails use language models to understand context, nuance, and intent. When you send text to an LLM-powered guardrail such as toxic language or biased language, it asks a language model to analyze the content and make a judgment. This takes about 1-3 seconds but gives you sophisticated analysis that understands sarcasm, coded language, and cultural context.
Example: “People like you are the problem” → the LLM recognizes this as hostile even without explicit profanity
Key difference: LLM-powered guardrails understand meaning while rule-based guardrails match patterns. If someone writes “people like you are the problem,” an LLM-powered guardrail recognizes this as hostile even though it doesn’t contain explicit profanity. A rule-based guardrail would only catch it if you explicitly listed that exact phrase.
Understanding Results
Every guardrail returns a result with three essential pieces of information:
Status: pass, fail, or unsure
The status field tells you the outcome of validation:
- pass - Content meets your validation criteria and is safe to use
- fail - Content violates your criteria and should be blocked or regenerated
- unsure - The guardrail cannot make a confident determination (LLM-powered only)
The unsure status only appears with LLM-powered guardrails since rule-based guardrails make binary decisions. You’ll see unsure when content is genuinely ambiguous or sits right on the boundary between acceptable and unacceptable.
Examples of unsure cases:
- Mild sarcasm that’s hard to judge definitively
- Comments that could be critical feedback or personal attacks
- Context-dependent language without enough context
Confidence: 0.0 to 1.0
The confidence field indicates how certain the guardrail is about its decision. Rule-based guardrails always return 1.0 because their logic is deterministic—there’s no uncertainty when checking if a string contains another string or if JSON is valid. LLM-powered guardrails return variable confidence:
- 0.9-1.0 (Very high) - Clear, unambiguous indicators
- 0.7-0.9 (High) - Strong evidence with minor ambiguity
- 0.5-0.7 (Moderate) - Notable ambiguity in the case
- < 0.5 (Low) - Borderline, difficult to judge definitively
Several factors affect confidence:
- Content with obvious violations or clear acceptability → High confidence
- Ambiguous phrasing, sarcasm, context-dependent meaning → Lower confidence
- Very short text with little context → Lower confidence
- Cultural or linguistic nuances → May reduce confidence
Reason: Human-readable explanation
The reason field provides a human-readable explanation of why the guardrail made its decision.
Use cases:
- Internal logging and debugging
- Analyzing patterns in your dashboard
- Understanding edge cases
- Tuning your configuration
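As a rough sketch, the result can be modeled like this in TypeScript. The field names follow the descriptions above; the exact SDK types may differ:

```typescript
// Hypothetical model of a guardrail result, based on the fields described above.
// The actual ABV SDK types may differ.
type GuardrailStatus = "pass" | "fail" | "unsure";

interface GuardrailResult {
  status: GuardrailStatus; // "unsure" appears only for LLM-powered guardrails
  confidence: number;      // 0.0 to 1.0; rule-based checks always return 1.0
  reason: string;          // human-readable explanation of the decision
}
```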
How Guardrails Process Content
Key differences:
- Rule-based guardrails always return binary results (PASS/FAIL) with confidence 1.0
- LLM-powered guardrails can return UNSURE status with variable confidence scores
- Your application decides how to handle each result based on status and confidence
Common Usage Patterns
Understanding common patterns helps you apply guardrails effectively in your application:
Input Validation
Run guardrails before sending user content to your LLM. This protects your LLM from toxic prompts, prevents prompt injection attacks, and ensures you only process valid requests. Common pattern: check for forbidden strings first (instant), then run LLM-powered toxicity detection only if the content passes the initial screening.
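A minimal sketch of that two-step pattern, assuming a hypothetical guardrails.check(name, text, options) helper that resolves to the result shape sketched earlier. The guardrail names are illustrative, not exact identifiers:

```typescript
import type { GuardrailResult } from "./guardrail-types"; // hypothetical module for the result type above

// Hypothetical client; the real ABV SDK call may look different.
declare const guardrails: {
  check(name: string, text: string, options?: Record<string, unknown>): Promise<GuardrailResult>;
};

async function validateUserInput(userText: string): Promise<boolean> {
  // 1. Instant rule-based screening first (no LLM cost).
  const stringCheck = await guardrails.check("contains_string", userText, {
    forbidden: ["ignore previous instructions"],
  });
  if (stringCheck.status === "fail") return false;

  // 2. Run the slower LLM-powered toxicity check only if screening passed.
  const toxicity = await guardrails.check("toxic_language", userText);
  return toxicity.status === "pass";
}
```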
Output Validation
Run guardrails after your LLM generates content but before showing it to users. This maintains brand safety, ensures compliance with regulations, and catches cases where the LLM generates unexpected content. Essential for customer-facing applications and regulated industries.
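A sketch of the same idea on the output side, reusing the hypothetical guardrails.check helper from the previous example; generateReply stands in for your own LLM call:

```typescript
declare function generateReply(prompt: string): Promise<string>; // placeholder for your LLM call

// Output validation: check generated text before it reaches the user.
async function safeReply(prompt: string): Promise<string> {
  const draft = await generateReply(prompt);

  const result = await guardrails.check("toxic_language", draft);
  if (result.status !== "pass") {
    // Regenerate, or fall back to a safe canned response.
    return "Sorry, I can't help with that right now.";
  }
  return draft;
}
```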
Layered Validation
Use multiple guardrails in sequence for comprehensive protection. Start with fast rule-based checks to catch obvious problems, then run expensive LLM-powered checks only if content passes initial screening. This pattern minimizes cost while maintaining thorough validation.
Parallel Validation
Run multiple independent guardrails simultaneously to minimize latency. For example, checking for toxic language and biased language are independent analyses that can happen in parallel. Total time equals the slowest check, not the sum of all checks.
Sensitivity Levels
LLM-powered guardrails support sensitivity settings that control validation strictness. Choosing the appropriate level depends on your system’s risk classification under regulatory frameworks and the potential harm from content failures.
Regulatory Framework Mapping
- Low Sensitivity
- Medium Sensitivity
- High Sensitivity
Low Sensitivity applies permissive validation—only severe violations trigger failures.
Regulatory Context: Maps to Minimal/No-Risk AI Systems under the EU AI Act, with no mandatory compliance obligations.
Typical use cases:
- Internal productivity tools
- Non-critical recommendation systems
- Entertainment and gaming applications
- General-purpose utilities (spam filters, search)
EU AI Act Alignment:
- Voluntary codes of conduct
- Basic ethical AI principles
- Standard software development practices
- No specific AI governance mandates
Risk profile:
- Minimal potential for harm to people
- Low organizational impact
- Limited ecosystem effects
- Easily reversible outcomes
Detection behavior at low sensitivity:
- Flags explicit threats and hate speech
- Detects clear, unambiguous violations
- Identifies severe discriminatory language
- Allows robust debate and strong opinions
Compliance Note: Sensitivity level selection affects your ability to demonstrate compliance with regulatory requirements. High-risk LLM systems under the EU AI Act require stricter content controls and comprehensive audit trails. ABV automatically captures all validation results for compliance documentation.
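As an illustrative sketch only (the option name and accepted values below are assumptions, not documented SDK parameters), a sensitivity level might be supplied alongside the check:

```typescript
// Hypothetical: pass a sensitivity level with the check; stricter levels fail more borderline content.
const result = await guardrails.check("toxic_language", userText, {
  sensitivity: "high", // assumed values: "low" | "medium" | "high"
});
```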
Making Decisions with Results
Your decision logic determines how your application responds to validation results:
Simple Decision Strategy
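For example, a status-only strategy might look like this, using the hypothetical guardrails.check helper from earlier; processMessage, rejectMessage, and queueForHumanReview stand in for your own handlers:

```typescript
// Simple strategy: branch on status alone and ignore confidence.
const result = await guardrails.check("toxic_language", userText);

switch (result.status) {
  case "pass":
    await processMessage(userText);      // safe to continue
    break;
  case "fail":
    rejectMessage(result.reason);        // block, and log the explanation internally
    break;
  case "unsure":
    await queueForHumanReview(userText); // let a person decide
    break;
}
```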
Sophisticated Decision Strategy
Use confidence scores for tiered responses:
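A sketch of a tiered strategy with the same hypothetical helper and handlers; the thresholds are illustrative, not recommended values, and logForAudit is another placeholder:

```typescript
// Tiered strategy: combine status with confidence to decide how aggressively to act.
const result = await guardrails.check("toxic_language", userText);

if (result.status === "fail" && result.confidence >= 0.9) {
  rejectMessage(result.reason);        // clear violation: block outright
} else if (result.status === "fail" || result.status === "unsure") {
  await queueForHumanReview(userText); // ambiguous: defer to a person
} else if (result.confidence < 0.7) {
  logForAudit(userText, result);       // passed with low confidence: keep a record
  await processMessage(userText);
} else {
  await processMessage(userText);      // confident pass
}
```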
Response Times and Costs
Understanding performance characteristics helps you build efficient validation pipelines:
- Rule-Based Guardrails
- LLM-Powered Guardrails
Rule-based guardrail performance:
- Response time: < 10 milliseconds
- Cost: $0 (runs locally)
- Predictable: Always same speed
Best for:
- Pre-filtering before expensive checks
- High-volume validation
- Real-time validation
- Patterns you can enumerate
Available rule-based guardrails:
- Contains String: Check for forbidden terms
- Valid JSON: Validate structured outputs
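For intuition, the core logic of these two rule-based checks is roughly this simple. This is a local sketch reusing the GuardrailResult shape from earlier, not the ABV implementation:

```typescript
// Deterministic checks: there is no ambiguity, so confidence is always 1.0.
function containsForbiddenString(text: string, forbidden: string[]): GuardrailResult {
  const hit = forbidden.find((term) => text.toLowerCase().includes(term.toLowerCase()));
  return hit
    ? { status: "fail", confidence: 1.0, reason: `Found forbidden term: "${hit}"` }
    : { status: "pass", confidence: 1.0, reason: "No forbidden terms found" };
}

function isValidJson(text: string): GuardrailResult {
  try {
    JSON.parse(text);
    return { status: "pass", confidence: 1.0, reason: "Parsed as valid JSON" };
  } catch (err) {
    return { status: "fail", confidence: 1.0, reason: `Invalid JSON: ${(err as Error).message}` };
  }
}
```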
Observations and Monitoring
Every guardrail execution automatically creates an observation in your ABV dashboard. What’s captured:
- Input text (the content you validated)
- Result (status, confidence, reason)
- Configuration (sensitivity, mode, schema)
- Performance (timing, token usage)
- Context (user, session, trace)
Monitor Patterns Over Time
- Track failure rates by guardrail type
- See which content types cause the most failures
- Identify trends in user behavior
- Spot unusual spikes in violations
Analyze Confidence Distributions
- See where ambiguity occurs in your validation
- Identify categories that need better rules
- Understand when human review is needed most
- Tune confidence thresholds for decisions
Tune Sensitivity Settings
- Too many false positives? Lower sensitivity
- Harmful content slipping through? Raise sensitivity
- Different sensitivities for different contexts
- A/B test different sensitivity levels
Debug Unexpected Results
- Examine full context of a validation
- Understand why specific content failed/passed
- Reproduce issues for investigation
- Improve your prompts based on patterns
Combining Multiple Guardrails
Most applications use multiple guardrails together:
Independent Guardrails (Run in Parallel)
Guardrails checking different criteria can run simultaneously:
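A sketch with Promise.all and the hypothetical guardrails.check helper; the guardrail names are illustrative:

```typescript
// Independent checks run concurrently; total latency is roughly the slowest single check.
const [toxicity, bias] = await Promise.all([
  guardrails.check("toxic_language", text),
  guardrails.check("biased_language", text),
]);

const allPassed = [toxicity, bias].every((r) => r.status === "pass");
```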
Dependent Guardrails (Run Sequentially)
Create validation pipelines where fast checks filter before expensive checks:
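A sketch of a short-circuiting pipeline with the same hypothetical helper; ordering cheap checks first keeps most requests from ever reaching the LLM-powered ones:

```typescript
// Sequential pipeline: stop at the first failure so expensive checks only run when needed.
async function runPipeline(text: string): Promise<GuardrailResult> {
  const steps: Array<() => Promise<GuardrailResult>> = [
    () => guardrails.check("contains_string", text, { forbidden: ["confidential"] }), // ~10 ms, free
    () => guardrails.check("toxic_language", text),                                   // ~1-3 s, LLM call
    () => guardrails.check("biased_language", text),                                  // ~1-3 s, LLM call
  ];

  for (const step of steps) {
    const result = await step();
    if (result.status !== "pass") return result; // short-circuit on the first fail/unsure
  }
  return { status: "pass", confidence: 1.0, reason: "All checks passed" };
}
```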
Next Steps
Best Practices
Learn optimal patterns for performance, cost management, and error handling
Toxic Language
Explore sensitivity levels in detail with specific examples
Biased Language
Understand different categories of bias detection
Contains String
Master patterns for efficient string matching
Valid JSON
Learn schema validation and strict mode
Quickstart
Get hands-on with your first guardrail validation