How Guardrails Work
Send content for validation
This could be a user message before it reaches your LLM, or LLM-generated content before it reaches your users.
Guardrail analyzes the content
This can take up to 3 seconds.
Receive validation result
The guardrail returns a status (pass, fail, or unsure), confidence score, and reason for the decision.
Make decisions based on results
Based on the result, you decide whether to allow the content, flag it for human review, or regenerate with different parameters.
Monitor in your dashboard
Every validation automatically creates a complete observation in ABV, capturing the input, result, configuration, timing, and token usage.
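The flow above can be sketched end to end. The `GuardrailResult` shape below mirrors the status, confidence, and reason fields described above, and `handle` is a hypothetical decision helper for step four, not part of the SDK:

```python
from dataclasses import dataclass

# Hypothetical result shape -- mirrors the status / confidence / reason
# fields a guardrail returns; the SDK's actual types may differ.
@dataclass
class GuardrailResult:
    status: str       # "pass", "fail", or "unsure"
    confidence: float
    reason: str

def handle(result: GuardrailResult) -> str:
    # Decide what to do with the content based on the validation result.
    if result.status == "pass":
        return "allow"       # deliver the content as-is
    if result.status == "unsure" or result.confidence < 0.8:
        return "review"      # flag for human review
    return "regenerate"      # confident failure: retry with different parameters

example = GuardrailResult(status="fail", confidence=0.95, reason="toxic language")
print(handle(example))  # -> regenerate
```

The 0.8 confidence threshold is illustrative; tuning it against real traffic is exactly what the dashboard data in step five is for.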
Available Guardrails
Guardrails come in two categories based on how they work. LLM-Powered Guardrails can understand context and nuance, but take about 1-3 seconds per check:
Toxic Language
Detects hate speech, threats, insults, and harmful content that could damage your community or poison your LLM's context. Use this when you need to understand intent and tone, not just keywords. Learn more →
Biased Language
Identifies stereotypes, discriminatory assumptions, and coded language that could harm your brand or violate compliance requirements. Essential for HR content, job postings, and customer-facing communications. Learn more →
Rule-based guardrails run deterministic checks without calling an LLM:
Contains String
Validates the presence or absence of specific text, perfect for ensuring legal disclaimers appear in LLM outputs or for blocking sensitive information like passwords and credit card numbers in user inputs. Learn more →
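The core of such a check can be sketched in plain Python; this is an illustration of the logic, not the guardrail's actual implementation:

```python
def contains_string(text: str, targets: list[str], require: bool = True) -> bool:
    # require=True: pass only if every target appears (e.g. a legal disclaimer).
    # require=False: pass only if no target appears (e.g. "password").
    hits = [t for t in targets if t.lower() in text.lower()]
    return len(hits) == len(targets) if require else not hits

llm_output = "Results may vary. This is not financial advice."
print(contains_string(llm_output, ["not financial advice"]))  # True: disclaimer present
print(contains_string("my password is hunter2", ["password"], require=False))  # False: blocked term found
```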
Valid JSON
Ensures your LLM's structured outputs are properly formatted and match your expected schema, preventing application errors from malformed data. Learn more →
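A minimal sketch of the two things such a check validates, parseability and schema shape, using only the standard library (the guardrail's real schema support is likely richer):

```python
import json

def valid_json(raw: str, required_keys: set[str]) -> tuple[bool, str]:
    # First check the output parses at all, then that it matches the
    # expected top-level schema (all required keys present).
    try:
        data = json.loads(raw)
    except json.JSONDecodeError as e:
        return False, f"malformed JSON: {e.msg}"
    if not isinstance(data, dict):
        return False, "expected a JSON object"
    missing = required_keys - data.keys()
    if missing:
        return False, f"missing keys: {sorted(missing)}"
    return True, "ok"

ok, reason = valid_json('{"sentiment": "positive", "score": 0.9}', {"sentiment", "score"})
print(ok, reason)  # True ok
```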
Integrations
Guardrails integrate with ABV's core features:
Automatic Observability
Every guardrail check creates an observation in your traces. Monitor patterns, analyze confidence distributions, tune sensitivity settings based on real data.
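Tuning sensitivity usually starts with a look at how check confidences are distributed. A rough sketch over exported observation data; the record fields here are illustrative, not the trace export's actual schema:

```python
from collections import Counter

# Illustrative observation records as they might be exported from traces.
observations = [
    {"guardrail": "toxic_language", "status": "fail", "confidence": 0.62},
    {"guardrail": "toxic_language", "status": "fail", "confidence": 0.97},
    {"guardrail": "toxic_language", "status": "pass", "confidence": 0.99},
]

# Bucket failure confidences to one decimal place: a pile-up of
# low-confidence fails suggests the sensitivity setting is too aggressive.
buckets = Counter(
    round(o["confidence"], 1)
    for o in observations
    if o["status"] == "fail"
)
print(sorted(buckets.items()))  # [(0.6, 1), (1.0, 1)]
```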
SDK Integration
Guardrails ship with both the Python SDK and the JS/TS SDK. Native async support, type safety, and framework integration are included.
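Async support matters because LLM-powered checks each take seconds; running them concurrently hides that latency. A sketch of the pattern, where `GuardrailsClient` and its `check` method are hypothetical stand-ins rather than the SDK's actual names:

```python
import asyncio

class GuardrailsClient:
    # Hypothetical stand-in for the SDK client; real class and
    # method names may differ.
    async def check(self, guardrail: str, content: str) -> dict:
        await asyncio.sleep(0)  # placeholder for the network round trip
        return {"status": "pass", "confidence": 0.99, "reason": "no issues found"}

async def main() -> None:
    client = GuardrailsClient()
    # Run several checks concurrently instead of paying 1-3 s each in sequence.
    results = await asyncio.gather(
        client.check("toxic_language", "Hello!"),
        client.check("biased_language", "Hello!"),
    )
    print([r["status"] for r in results])  # ['pass', 'pass']

asyncio.run(main())
```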
Evaluation Tracking
Track guardrail performance using Evaluations. Measure false positive rates, analyze failure patterns, and optimize your validation strategy using the wizard.
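Measuring a false positive rate only requires ground-truth labels alongside guardrail verdicts; a minimal sketch with made-up data:

```python
# Each pair is (guardrail_verdict, actually_harmful): "fail" means the
# guardrail blocked the content, True means a human judged it harmful.
labeled = [
    ("fail", True), ("fail", False), ("pass", False),
    ("pass", False), ("fail", True), ("pass", True),
]

# False positive rate = FP / (FP + TN): of the genuinely harmless
# content, how much did the guardrail wrongly block?
false_positives = sum(1 for v, harmful in labeled if v == "fail" and not harmful)
true_negatives = sum(1 for v, harmful in labeled if v == "pass" and not harmful)
fp_rate = false_positives / (false_positives + true_negatives)
print(f"false positive rate: {fp_rate:.0%}")  # false positive rate: 33%
```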
Metrics Dashboard
Monitor guardrail status in the Metrics Dashboard. See failure rates by type, track costs, analyze latency, and identify optimization opportunities.
Next Steps
Ready to implement guardrails in your application?
Quickstart
Get up and running with your first guardrail in under 5 minutes
Core Concepts
Understand sensitivity levels, confidence scores, and validation strategies
Best Practices
Learn optimal patterns for layering and combining guardrails
Security Guardrails
Explore security-focused guardrails for protecting against attacks