Score Configs ensure scores follow a specific schema and standardize scoring across your team.
Create a Score Config:
  1. Navigate to your project in the ABV UI
  2. Go to Evaluations > Score Configs
  3. Click Create Score Config
  4. Configure:
    • Name: e.g., user_feedback, hallucination_eval
    • Data Type: NUMERIC, CATEGORICAL, or BOOLEAN
    • Constraints: Min/Max for numeric, custom categories for categorical
Via API:
from abvdev import ABV

abv = ABV(api_key="sk-abv-...")

abv.create_score_config(
    name="correctness",
    data_type="NUMERIC",
    min_value=0,
    max_value=1,
    description="Measures factual accuracy"
)
Manage Configs:
  • Configs are immutable but can be archived
  • Archived configs can be restored anytime
  • Link scores to configs using config_id to ensure schema compliance
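For example, a new score can reference an existing config so its value is validated against that schema. This sketch reuses the abv.create_score call shown elsewhere in this FAQ; the trace and config IDs below are placeholders:
from abvdev import ABV

abv = ABV(api_key="sk-abv-...")

# Link the score to a Score Config; data_type and constraints come from the config.
# Both IDs below are placeholders.
abv.create_score(
    name="correctness",
    value=0.9,
    trace_id="trace_id_here",
    config_id="config_id_here"
)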
See Scores Data Model for complete details.
Common causes and solutions:
  1. Events not flushed (short-lived apps):
    • Python: Call abv.flush() before exit
    • JS/TS: Call await abvSpanProcessor.forceFlush() before exit
  2. Incorrect API credentials:
    • Verify your API key is correct
    • Check region (US: https://app.abv.dev, EU: https://eu.app.abv.dev)
    • Python: Use abv.auth_check() to verify credentials
  3. Instrumentation not loaded:
    • JS/TS: Ensure import "./instrumentation" is the FIRST import
    • Python: Initialize with get_client() or ABV()
  4. Network/firewall issues:
    • Verify your application can reach the ABV API
    • Check for proxy/firewall blocking requests
  5. Sampling too aggressive:
    • Check if sampling is filtering out traces
    • Temporarily set sample rate to 1.0 (100%) to test
  6. Wrong project:
    • Verify you’re viewing the correct project in the ABV UI
    • Confirm API key belongs to the project you’re viewing
  7. For JS/TS with @vercel/otel:
    • Use manual OpenTelemetry setup via NodeTracerProvider
    • The @vercel/otel package doesn’t support OpenTelemetry JS SDK v2
  8. Enable debug logging:
    • Python: Set the log level to DEBUG in code (see the sketch after this list)
    • JS/TS: Set ABV_LOG_LEVEL="DEBUG" in environment variables
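The sketch below ties together items 1, 2, and 8 for a short-lived Python app. abv.flush() and abv.auth_check() come from the steps above; routing SDK logs through Python's standard logging module under the "abvdev" logger name is an assumption and may differ in your SDK version:
import logging

from abvdev import ABV

# Assumption: the SDK emits logs through Python's standard logging module
# under the "abvdev" logger name; adjust if your SDK documents otherwise.
logging.basicConfig()
logging.getLogger("abvdev").setLevel(logging.DEBUG)

abv = ABV(api_key="sk-abv-...")

# Item 2: verify credentials and region before sending events.
# Treating the return value as a boolean is an assumption.
if not abv.auth_check():
    raise RuntimeError("ABV credentials rejected; check API key and region")

# ... run your instrumented application code here ...

# Item 1: flush buffered events before the process exits.
abv.flush()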
See the Troubleshooting FAQ for more general guidance.
Capture user feedback as scores to evaluate LLM application quality.
Method 1: Frontend Collection (Browser SDK)
import { ABVClient } from '@abvdev/client';

const abv = new ABVClient({ apiKey: 'sk-abv-...' });

// Capture thumbs up/down
abv.createScore({
  name: 'user_feedback',
  value: 1,  // 1 for positive, 0 for negative
  traceId: traceId,
  dataType: 'BOOLEAN',
  comment: 'User found this helpful'
});
Method 2: Backend Collection (Python SDK)
from abvdev import ABV

abv = ABV(api_key="sk-abv-...")

# Categorical feedback
abv.create_score(
    name="user_rating",
    string_value="excellent",  # or "good", "poor"
    trace_id="trace_id_here",
    data_type="CATEGORICAL",
    comment="User provided detailed feedback"
)
Method 3: Human Annotation UI
Use Annotation Queues for structured team reviews:
  1. Create Score Configs for feedback dimensions
  2. Create an Annotation Queue
  3. Assign team members to review traces
  4. Annotate traces directly in the ABV UI
Best Practices:
  • Link scores to Score Configs for consistent schema
  • Use trace_id to associate feedback with specific interactions
  • Scores can be ingested before the trace is created (linked automatically)
See Custom Scores and Human Annotation for details.
Score Configs enforce schema validation across your evaluation workflows.
Benefits:
  • Standardized scoring: All team members use the same criteria
  • Data validation: Automatic validation of score values
  • Type safety: Ensures numeric/categorical/boolean consistency
  • Schema evolution: Archive old configs, create new versions
Example: Categorical Score Config
abv.create_score_config(
    name="sentiment",
    data_type="CATEGORICAL",
    categories=[
        {"label": "positive", "value": 1},
        {"label": "neutral", "value": 0},
        {"label": "negative", "value": -1}
    ]
)
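A score recorded against this config could then look like the following sketch (the trace and config IDs are placeholders):
abv.create_score(
    name="sentiment",
    string_value="positive",  # must match one of the categories defined above
    trace_id="trace_id_here",
    config_id="sentiment_config_id_here"
)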
When you create a score with this config_id, ABV validates that string_value matches one of the defined categories.
Example: Numeric Score Config with Constraints
abv.create_score_config(
    name="accuracy",
    data_type="NUMERIC",
    min_value=0.0,
    max_value=1.0
)
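For instance, a score like this sketch would fail validation because its value falls outside the configured range (IDs are placeholders):
abv.create_score(
    name="accuracy",
    value=1.5,  # outside the 0.0-1.0 range defined above, so it is rejected
    trace_id="trace_id_here",
    config_id="accuracy_config_id_here"
)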
Scores outside the 0-1 range will be rejected.
See Scores Data Model for configuration options.
Yes, Score Configs are optional but recommended.
Without Score Configs:
  • Manually specify data_type for each score
  • No automatic validation of value ranges
  • Less consistency across team members
Example:
abv.create_score(
    name="custom_metric",
    value=42,
    trace_id="trace_id",
    data_type="NUMERIC"  # Must specify manually
)
With Score Configs:
  • Reference config_id to automatically set data_type
  • Automatic value validation
  • Standardized across all scores with that name
abv.create_score(
    name="custom_metric",
    value=42,
    trace_id="trace_id",
    config_id="config_id_here"  # data_type set automatically
)
Recommendation: Use Score Configs for production evaluation workflows.
The source field automatically categorizes how scores were created:
Source     | Description                             | Example Use Case
API        | Scores created via SDK or API           | User feedback, runtime metrics, custom evaluations
EVAL       | Scores from LLM-as-a-Judge evaluations  | Automated quality checks, hallucination detection
ANNOTATION | Scores from the Human Annotation UI     | Manual reviews, annotation queues, team collaboration
Automatic Assignment:
  • SDK/API calls → source="API"
  • LLM-as-a-Judge runs → source="EVAL"
  • UI annotations → source="ANNOTATION"
Filter by source:
  • View scores by source in the ABV UI
  • Query via API: abv.get_scores(source="EVAL")
  • Useful for comparing human vs automated evaluations
This helps track evaluation provenance and compare different evaluation methods.
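As a rough sketch of that comparison, assuming abv.get_scores() returns score objects exposing a numeric value attribute (the exact return shape may differ):
from abvdev import ABV

abv = ABV(api_key="sk-abv-...")

# Assumption: each returned score exposes a numeric `value` attribute.
eval_scores = abv.get_scores(source="EVAL")
human_scores = abv.get_scores(source="ANNOTATION")

def mean_value(scores):
    values = [s.value for s in scores if s.value is not None]
    return sum(values) / len(values) if values else None

print("LLM-as-a-Judge mean:", mean_value(eval_scores))
print("Human annotation mean:", mean_value(human_scores))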
