Best Practices
This guide covers strategies for using guardrails effectively in production applications. You'll learn how to optimize performance and costs, implement robust error handling, maintain security, and monitor your validation pipeline.

Choosing Between LLM-Powered and Rule-Based Guardrails
The first and most important decision in building your validation pipeline is when to use LLM-powered guardrails versus rule-based ones. This choice fundamentally affects your application's performance, cost, and accuracy.

Rule-based guardrails like contains string and valid JSON use deterministic logic. They check for exact patterns or validate against fixed rules. This makes them extremely fast, responding in under ten milliseconds, and completely free, since they run locally without making any API calls. The tradeoff is that they can only catch what you explicitly define. If you're checking whether text contains the word "password", they will catch that exact word but won't catch "my pw" or "my login credentials" unless you've added those to your list.

LLM-powered guardrails like toxic language and biased language use language models to understand context and meaning. They can recognize that "people like you are the problem" is hostile even though it contains no profanity. They understand sarcasm, coded language, and subtle implications. This power comes at a cost of roughly one to three seconds of latency per check, plus token consumption for the API call.

The key insight for building efficient systems is to use rule-based guardrails whenever your requirements can be expressed as explicit patterns, and to save LLM-powered guardrails for cases where you need semantic understanding. In many scenarios you can use both in sequence, with the fast rule-based check acting as a filter before the expensive LLM check.
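To make the rule-based side concrete, here is a minimal sketch of a contains string check and a valid JSON check written as plain local TypeScript. The term list and function names are illustrative, not part of any SDK; the sequential combination with an LLM-powered check is shown in the next section.

```typescript
// Minimal sketch of deterministic, rule-based checks. They run locally,
// which is why they are effectively free and return in well under ten milliseconds.

const FORBIDDEN_TERMS = ["password", "api key", "secret token"]; // example list

function containsForbiddenTerm(text: string): boolean {
  const lowered = text.toLowerCase();
  return FORBIDDEN_TERMS.some((term) => lowered.includes(term));
}

function isValidJson(text: string): boolean {
  try {
    JSON.parse(text);
    return true;
  } catch {
    return false;
  }
}

// The exact pattern is caught, but a paraphrase slips through
// unless you add it to the list.
console.log(containsForbiddenTerm("my password is hunter2")); // true
console.log(containsForbiddenTerm("my login credentials are hunter2")); // false
console.log(isValidJson('{"ok": true}')); // true
```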
Building Efficient Validation Pipelines

Understanding how to sequence and combine guardrails is essential for building performant applications. The pattern that works best in most cases is to run fast checks first to catch obvious issues, then run expensive checks only when necessary.

Consider a scenario where you want to validate user input for both forbidden terms and toxic content. Your intuition might be to run both checks in parallel to minimize latency. However, if the content contains an explicitly forbidden term, there's no point in spending one to three seconds and the token cost to analyze its tone. Instead, check for forbidden terms first using contains string, which takes under ten milliseconds. Only if that passes do you run the toxic language check.

This sequential pattern creates a validation pipeline where each stage filters out content before more expensive stages run. The earlier stages should be faster and cheaper, acting as gatekeepers. Here's what this looks like in practice:
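The sketch below assumes a hypothetical checkGuardrail(name, text) helper that wraps whatever client your SDK provides and resolves to a result with a status field; substitute the real call for your setup.

```typescript
// Hypothetical result shape -- your SDK's field names may differ.
interface GuardrailResult {
  status: "pass" | "fail" | "unsure";
  confidence: number;
  reason: string;
}

// Placeholder for the real guardrail client call (assumed, not a real API).
async function checkGuardrail(name: string, text: string): Promise<GuardrailResult> {
  throw new Error("replace with your guardrail SDK/API call");
}

const FORBIDDEN_TERMS = ["password", "credit card"]; // example list

async function validateUserInput(text: string): Promise<{ allowed: boolean; stage: string }> {
  // Stage 1: fast, free, rule-based gatekeeper (local, under ten milliseconds).
  const lowered = text.toLowerCase();
  if (FORBIDDEN_TERMS.some((term) => lowered.includes(term))) {
    return { allowed: false, stage: "contains string" };
  }

  // Stage 2: slower, paid, LLM-powered semantic check (roughly one to three seconds).
  // Only runs once the cheap stage has already passed.
  const toxicity = await checkGuardrail("toxic language", text);
  if (toxicity.status !== "pass") {
    // Blocking both fail and unsure here; adjust to your own policy.
    return { allowed: false, stage: "toxic language" };
  }

  return { allowed: true, stage: "all checks passed" };
}
```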
Selecting the Right Sensitivity Level
Sensitivity level is one of the most important configuration choices for LLM-powered guardrails, and it requires careful thought about your application's context and risk tolerance. The right sensitivity balances safety against false positives.

Consider starting with medium sensitivity in development and adjusting based on your observations. Medium is a balanced default that catches clear violations while allowing professional disagreement. As you review your guardrail observations in the ABV dashboard, you'll see patterns that help you tune your sensitivity.

If you're seeing too many false positives, where acceptable content gets blocked, you're probably using a sensitivity that's too high for your context. For example, if you're building a technical forum where people debate code quality and technical critiques are getting flagged, consider lowering your sensitivity. Technical discussions naturally include strong opinions about approaches and implementations.

If you're seeing problematic content slip through, your sensitivity is probably too low. This is particularly important to watch for in consumer-facing applications or applications used by children. Any hostile content that reaches users damages trust and potentially exposes you to liability.

You can also implement context-aware sensitivity, where different parts of your application or different user types get different sensitivity levels. A children's section gets high sensitivity automatically. Content from authenticated users with good history might get medium sensitivity while anonymous users get high sensitivity. Customer service queries might get lower sensitivity, since you expect frustrated customers to express strong emotions.
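One way to wire this up is a small lookup that resolves the sensitivity level from the request context before the guardrail call is made. The context fields and level names below are illustrative, assuming a low/medium/high scheme:

```typescript
// Illustrative mapping from application context to a sensitivity level.
type Sensitivity = "low" | "medium" | "high";

interface RequestContext {
  surface: "kids_section" | "technical_forum" | "customer_service" | "general";
  authenticated: boolean;
  userInGoodStanding: boolean;
}

function resolveSensitivity(ctx: RequestContext): Sensitivity {
  // A children's section always gets the strictest setting.
  if (ctx.surface === "kids_section") return "high";

  // Technical forums tolerate blunt critique; frustrated customers vent.
  if (ctx.surface === "technical_forum") return "low";
  if (ctx.surface === "customer_service") return "low";

  // Anonymous or unproven users get stricter treatment than trusted ones.
  if (!ctx.authenticated || !ctx.userInGoodStanding) return "high";
  return "medium";
}
```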
Implementing Sophisticated Decision Logic

The simplest decision logic uses only the status field. If status is pass you allow the content, if it's fail you block it, and if it's unsure you choose whether to be conservative and block or permissive and allow. This works for many applications, but you can build more sophisticated logic using confidence scores and multiple guardrails together.

Consider implementing tiered responses based on confidence. When a guardrail returns fail with high confidence above 0.9, you can automatically block the content without human review. When it returns fail with moderate confidence between 0.6 and 0.9, you might flag it for human review rather than blocking automatically. When it returns fail with low confidence below 0.6, or returns unsure, you definitely want human eyes on it before making a final decision.

This approach recognizes that LLM-powered guardrails are probabilistic systems. They're very good at catching clear violations but less certain about edge cases. By using confidence scores, you can automate the obvious cases while routing ambiguous cases to humans, which gives you both efficiency and accuracy. Here's what tiered decision logic looks like:
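This sketch assumes a result object carrying the status, confidence, and reason fields described above; adjust the field names to match your SDK.

```typescript
// Hypothetical result shape -- mirrors the status, confidence, and reason
// fields discussed in this guide; your SDK's names may differ.
interface GuardrailResult {
  status: "pass" | "fail" | "unsure";
  confidence: number; // 0 to 1
  reason: string;
}

type Decision = "allow" | "block" | "human_review";

function decide(result: GuardrailResult): Decision {
  if (result.status === "pass") return "allow";

  // unsure always gets human eyes before a final call is made.
  if (result.status === "unsure") return "human_review";

  // status is "fail": tier the response by confidence.
  if (result.confidence > 0.9) return "block"; // clear violation, block automatically
  if (result.confidence >= 0.6) return "human_review"; // moderate confidence, flag for review
  return "human_review"; // low confidence, route to a human rather than auto-blocking
}
```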
Security and Privacy Considerations
Guardrails are part of your security posture, and you need to think carefully about how you use them to avoid creating vulnerabilities.

The most important security principle is to never expose the detailed reason field to end users. When a guardrail blocks content, the reason explains exactly why, and this information helps attackers understand your validation logic so they can evade it. Instead of showing the detailed reason, return generic error messages to users while logging the detailed reason internally for monitoring and debugging. This prevents adversarial learning, where users iterate on their inputs to find ways around your guardrails.

Another security consideration is validating in both directions. You should validate user inputs before sending them to your LLM to prevent prompt injection and context poisoning. You should also validate LLM outputs before showing them to users to maintain brand safety and compliance. Don't assume that because your LLM is generally helpful it will never generate problematic content. LLMs can be manipulated through cleverly crafted prompts, and even without manipulation they sometimes generate content you wouldn't want to show users.

Rate limiting is another security measure worth implementing. Guardrail checks have a cost, and malicious actors could potentially run up your bill by making many validation requests. Implement rate limits on how many validations a single user or IP address can request in a given time window. This protects both your costs and your service availability.

Finally, consider implementing monitoring and alerting for unusual patterns. If you suddenly see a spike in failed validations from a particular user or for a particular type of content, that might indicate an attack or a system misconfiguration. The ABV dashboard gives you observability into these patterns, and you should review it regularly for anomalies.
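As a sketch of the "generic message out, detailed reason in" pattern, and assuming the same hypothetical result shape as above, the handler below logs the detailed reason server-side and returns only a generic message; the logging call stands in for whatever observability tooling you use.

```typescript
// Hypothetical result shape -- the detailed reason field is for internal use only.
interface GuardrailResult {
  status: "pass" | "fail" | "unsure";
  confidence: number;
  reason: string;
}

function respondToBlockedContent(userId: string, result: GuardrailResult): { message: string } {
  // Log the detailed reason internally for monitoring and debugging.
  console.error("guardrail_blocked", {
    userId,
    status: result.status,
    confidence: result.confidence,
    reason: result.reason, // never leaves the server
  });

  // Return only a generic message so users cannot learn how to evade the check.
  return { message: "Your message could not be posted. Please revise it and try again." };
}
```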
Error Handling and Resilience

Guardrails make network requests to external services, which means they can fail in ways beyond your control. Network issues, service outages, and rate limits can all cause validation requests to fail. You need to decide how to handle these failure modes based on your application's risk tolerance.

The two fundamental strategies are failing open and failing closed. Failing open means that when validation fails due to an error, you allow the content through. This prioritizes availability over security. Use it when blocking legitimate content during a service outage would be worse than temporarily allowing some potentially problematic content through. Failing closed means that when validation fails, you block the content. This prioritizes security over availability. Use it when allowing problematic content through could cause serious harm or legal liability.

Most applications should fail closed for user-facing content and fail open for internal content. If an end user is posting content that will be seen by other users, fail closed to protect your community. If an employee is using an internal tool and validation fails, fail open to avoid disrupting their work.

Implement retry logic with exponential backoff for transient failures. Network blips and temporary service issues often resolve themselves quickly. If a validation request fails, wait a moment and try again. If it fails again, wait longer before the next attempt. After a few attempts, accept the failure and fall back to your fail-open or fail-closed strategy. Here's what robust error handling looks like:
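The sketch below again assumes a hypothetical checkGuardrail helper in place of your real client. It retries transient failures with exponential backoff and then falls back to fail open or fail closed.

```typescript
// Hypothetical result shape and client call -- substitute your actual SDK.
interface GuardrailResult {
  status: "pass" | "fail" | "unsure";
  confidence: number;
  reason: string;
}

async function checkGuardrail(name: string, text: string): Promise<GuardrailResult> {
  throw new Error("replace with your guardrail SDK/API call");
}

const sleep = (ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms));

// Retries transient failures with exponential backoff, then falls back to the
// configured strategy: fail closed (block) or fail open (allow).
async function validateWithFallback(
  guardrail: string,
  text: string,
  failOpen: boolean,
  maxAttempts = 3
): Promise<{ allowed: boolean }> {
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    try {
      const result = await checkGuardrail(guardrail, text);
      // Treating unsure as a block here; adjust to your own policy.
      return { allowed: result.status === "pass" };
    } catch (error) {
      if (attempt === maxAttempts) break;
      // Wait one second, then two, then four... before the next attempt.
      await sleep(1000 * 2 ** (attempt - 1));
    }
  }
  // Every attempt failed: apply the fail-open or fail-closed strategy.
  return { allowed: failOpen };
}
```

For user-facing content you would call this with failOpen set to false, failing closed to protect your community; for an internal tool you would pass true, failing open so a validation outage does not disrupt employees' work.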