How do I create and manage Score Configs?
Score Configs ensure scores follow a specific schema and standardize scoring across your team.

Create a Score Config:
- Navigate to your project in the ABV UI
- Go to Evaluations → Score Configs
- Click Create Score Config
- Configure:
  - Name: e.g., `user_feedback`, `hallucination_eval`
  - Data Type: `NUMERIC`, `CATEGORICAL`, or `BOOLEAN`
  - Constraints: Min/Max for numeric, custom categories for categorical
Manage Configs:

- Configs are immutable but can be archived
- Archived configs can be restored anytime
- Link scores to configs using `config_id` to ensure schema compliance
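For illustration, the fields you set in the UI correspond to config definitions along these lines. This is a minimal sketch; the field names are assumptions, not a confirmed schema:

```python
# Illustrative only: Score Config fields expressed as plain dicts.
# Field names are assumptions based on the steps above.
categorical_config = {
    "name": "user_feedback",
    "data_type": "CATEGORICAL",
    "categories": ["positive", "neutral", "negative"],  # custom categories
}

numeric_config = {
    "name": "hallucination_eval",
    "data_type": "NUMERIC",
    "min_value": 0.0,  # lowest accepted score
    "max_value": 1.0,  # highest accepted score
}
```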
I don't see traces in the dashboard. How do I troubleshoot?
Common causes and solutions:
- Events not flushed (short-lived apps):
  - Python: Call `abv.flush()` before exit
  - JS/TS: Call `await abvSpanProcessor.forceFlush()` before exit
- Incorrect API credentials:
  - Verify your API key is correct
  - Check region (US: https://app.abv.dev, EU: https://eu.app.abv.dev)
  - Python: Use `abv.auth_check()` to verify credentials
- Instrumentation not loaded:
  - JS/TS: Ensure `import "./instrumentation"` is the FIRST import
  - Python: Initialize with `get_client()` or `ABV()`
- Network/firewall issues:
  - Verify your application can reach the ABV API
  - Check for proxy/firewall blocking requests
- Sampling too aggressive:
  - Check if sampling is filtering out traces
  - Temporarily set sample rate to 1.0 (100%) to test
- Wrong project:
  - Verify you’re viewing the correct project in the ABV UI
  - Confirm API key belongs to the project you’re viewing
- For JS/TS with @vercel/otel:
  - Use manual OpenTelemetry setup via `NodeTracerProvider`
  - The @vercel/otel package doesn’t support OpenTelemetry JS SDK v2
- Enable debug logging:
  - Python: Set log level in code
  - JS/TS: Set `ABV_LOG_LEVEL="DEBUG"` in environment variables
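Putting the Python-side checks together, a minimal triage sketch. The import path and logger setup are assumptions; `get_client()`, `auth_check()`, and `flush()` are the calls named above:

```python
# Quick triage script for missing traces.
import logging

from abv import get_client  # assumed import path for the Python SDK

# Enable debug logging so the SDK reports export attempts and failures.
logging.basicConfig(level=logging.DEBUG)

abv = get_client()

# 1. Confirm credentials and region (US vs EU host) before anything else.
if not abv.auth_check():
    raise RuntimeError("ABV auth failed: check API key and base URL/region")

# ... run your instrumented application code here ...

# 2. Flush buffered events before exit (critical for short-lived scripts,
#    CLIs, and serverless functions).
abv.flush()
```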
How do I capture user feedback for evaluation?
Capture user feedback as scores to evaluate LLM application quality.

Method 1: Frontend Collection (Browser SDK)

Method 2: Backend Collection (Python SDK) (see the sketch after the list below)

Method 3: Human Annotation UI

Use Annotation Queues for structured team reviews:
- Create Score Configs for feedback dimensions
- Create an Annotation Queue
- Assign team members to review traces
- Annotate traces directly in the ABV UI
- Link scores to Score Configs for consistent schema
- Use `trace_id` to associate feedback with specific interactions
- Scores can be ingested before the trace is created (linked automatically)
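A minimal sketch of Method 2, assuming a Python client exposing `get_client()` and a `create_score` method (the method name and fields are illustrative, not confirmed SDK API):

```python
# Hedged sketch: backend collection of user feedback as a score.
from abv import get_client

client = get_client()

def record_user_feedback(trace_id, thumbs_up, comment=None):
    """Attach end-user feedback to the trace that produced the response."""
    client.create_score(
        trace_id=trace_id,            # associates feedback with the interaction
        name="user_feedback",
        value=1 if thumbs_up else 0,  # 1 = positive, 0 = negative
        data_type="BOOLEAN",
        comment=comment,
    )

# Example: called from your feedback endpoint after the user clicks thumbs up.
record_user_feedback(trace_id="trace-id-from-frontend", thumbs_up=True)
```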
How do Score Configs ensure data consistency?
Score Configs enforce schema validation across your evaluation workflows.

Benefits:
- Standardized scoring: All team members use the same criteria
- Data validation: Automatic validation of score values
- Type safety: Ensures numeric/categorical/boolean consistency
- Schema evolution: Archive old configs, create new versions
When you create a score with a `config_id`, ABV validates that `string_value` matches one of the categories defined by that config.

Example: Numeric Score Config with Constraints

Scores outside the 0-1 range will be rejected. See Scores Data Model for configuration options.
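A hedged sketch of the numeric example above; the `create_score` method and config ID are assumptions for illustration:

```python
# Scoring against a numeric Score Config constrained to the 0-1 range.
from abv import get_client

client = get_client()

# Accepted: 0.8 falls inside the config's 0-1 range.
client.create_score(
    trace_id="your-trace-id",
    name="answer_quality",
    value=0.8,
    config_id="numeric-config-id",  # hypothetical Score Config ID
)

# Rejected at ingestion: 1.5 violates the config's max constraint.
client.create_score(
    trace_id="your-trace-id",
    name="answer_quality",
    value=1.5,
    config_id="numeric-config-id",
)
```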
Can I use scores without Score Configs?
Yes, Score Configs are optional but recommended.

Without Score Configs:

- Manually specify `data_type` for each score
- No automatic validation of value ranges
- Less consistency across team members

With Score Configs:

- Reference `config_id` to automatically set `data_type`
- Automatic value validation
- Standardized across all scores with that name

Recommendation: Use Score Configs for production evaluation workflows.
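A hedged side-by-side of the two approaches, using the same assumed `create_score` call as the earlier sketches:

```python
from abv import get_client

client = get_client()

# Without a Score Config: set data_type manually, no range validation.
client.create_score(
    trace_id="your-trace-id",
    name="relevance",
    value=0.7,
    data_type="NUMERIC",
)

# With a Score Config: reference config_id; data_type is inferred and the
# value is validated against the config's constraints.
client.create_score(
    trace_id="your-trace-id",
    name="relevance",
    value=0.7,
    config_id="relevance-config-id",  # hypothetical Score Config ID
)
```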
How do I link scores to traces, observations, or sessions?
Scores can be attached to different levels of your application data:

- Trace-level (most common)
- Observation-level (specific LLM call)
- Session-level (multi-turn conversation)
- Dataset Run-level (experiment performance)

Note: Each score references exactly one of these objects. See Scores Data Model for use cases.
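A hedged sketch of the four levels; the `create_score` method and ID parameter names are assumptions, and each call references exactly one object, per the note above:

```python
from abv import get_client

client = get_client()

# Trace-level (most common)
client.create_score(trace_id="trace-id", name="user_feedback", value=1)

# Observation-level (a specific LLM call)
client.create_score(observation_id="observation-id", name="hallucination_eval", value=0.1)

# Session-level (multi-turn conversation)
client.create_score(session_id="session-id", name="conversation_quality", value=0.8)

# Dataset Run-level (experiment performance)
client.create_score(dataset_run_id="run-id", name="exact_match", value=0.92)
```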
What's the difference between API, EVAL, and ANNOTATION scores?
The `source` field automatically categorizes how scores were created:

| Source | Description | Example Use Case |
|---|---|---|
| API | Scores created via SDK or API | User feedback, runtime metrics, custom evaluations |
| EVAL | Scores from LLM-as-a-Judge evaluations | Automated quality checks, hallucination detection |
| ANNOTATION | Scores from Human Annotation UI | Manual reviews, annotation queues, team collaboration |
Automatic Assignment:

- SDK/API calls → `source="API"`
- LLM-as-a-Judge runs → `source="EVAL"`
- UI annotations → `source="ANNOTATION"`
- View scores by source in the ABV UI
- Query via API: `abv.get_scores(source="EVAL")`
- Useful for comparing human vs automated evaluations
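For example, to compare human and automated evaluations with the query call above (a sketch: the import path and the `.value` attribute on returned scores are assumptions):

```python
from abv import get_client  # assumed import path

abv = get_client()

def average(scores):
    """Mean of numeric score values, ignoring scores without a value."""
    values = [s.value for s in scores if s.value is not None]
    return sum(values) / len(values) if values else None

# get_scores(source=...) is the call referenced above.
print("LLM-as-a-Judge avg:", average(abv.get_scores(source="EVAL")))
print("Human annotation avg:", average(abv.get_scores(source="ANNOTATION")))
```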