Traces, Observations, Sessions, or DatasetRuns via the Score object (see Scores Data Model).
This is achieved by ingesting scores via the ABV SDKs or API.
Common Use Cases
- Collecting user feedback: Capture in-app feedback from users on application quality or performance via the Browser SDK.
- Custom evaluation data pipeline: Continuously monitor quality by fetching traces from ABV, running custom evaluations, and ingesting scores back.
- Custom internal workflow tooling: Build custom internal tooling that helps you manage human-in-the-loop workflows. Ingest scores back into ABV, optionally following your custom schema by referencing a config.
- Custom run-time evaluations: For example, track whether the generated SQL code actually ran, or whether the structured output was valid JSON.
Ingesting Scores via API/SDKs
You can add scores via the ABV SDKs or API. Scores can take one of three data types: Numeric, Categorical, or Boolean. If a score is ingested manually using a trace_id to link the score to a trace, it is not necessary to wait until the trace has been created. The score will show up in the scores table and will be linked to the trace once the trace with the same trace_id is created.
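The three data types can be illustrated without any SDK. The sketch below builds a score payload as a plain Python dictionary and checks the value against the rules stated on this page (float for numeric, string for categorical, 0 or 1 for boolean). The function name `build_score` and the dictionary keys are assumptions made for illustration; they are not the ABV SDK or API schema.

```python
from typing import Any, Dict, Optional, Union


def _is_number(value: Any) -> bool:
    # bool is a subclass of int in Python, so exclude it explicitly.
    return isinstance(value, (int, float)) and not isinstance(value, bool)


def build_score(
    name: str,
    value: Union[float, str],
    data_type: str,
    trace_id: str,
    score_id: Optional[str] = None,
) -> Dict[str, Any]:
    """Build a score payload as a plain dict (hypothetical field names)."""
    if data_type == "NUMERIC":
        if not _is_number(value):
            raise TypeError("numeric score values must be provided as a float")
        value = float(value)
    elif data_type == "CATEGORICAL":
        if not isinstance(value, str):
            raise TypeError("categorical score values must be provided as strings")
    elif data_type == "BOOLEAN":
        if not _is_number(value) or value not in (0, 1):
            raise ValueError("boolean scores must be provided as 0 or 1")
        value = float(value)
    else:
        raise ValueError(f"unknown data type: {data_type}")

    payload: Dict[str, Any] = {
        "trace_id": trace_id,
        "name": name,
        "value": value,
        "data_type": data_type,
    }
    if score_id is not None:
        payload["id"] = score_id
    return payload


numeric = build_score("accuracy", 0.9, "NUMERIC", trace_id="trace-1")
categorical = build_score("correctness", "correct", "CATEGORICAL", trace_id="trace-1")
boolean = build_score("helpfulness", 1, "BOOLEAN", trace_id="trace-1")
```

Because the score only carries a trace_id, a payload like this can be prepared before the trace itself exists.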
Here are examples by score data type.
Python SDK
Install package
Categorical
Categorical score values must be provided as strings.
Boolean
Boolean scores must be provided as a float. The value's string equivalent will be automatically populated and is accessible on read.
JS/TS SDK
.env
Numeric
Numeric score values must be provided as float.
Categorical
Categorical score values must be provided as strings.
Boolean
Boolean scores must be provided as a float. The value's string equivalent will be automatically populated and is accessible on read. See API reference for more details on POST/GET scores endpoints.
Preventing Duplicate Scores
By default, ABV allows multiple scores with the same name on the same trace. This is useful if you'd like to track the evolution of a score over time, or if, for example, you've received multiple user feedback scores on the same trace.
In some cases, you want to prevent this behavior or update an existing score. You can do so by creating an idempotency key for the score and passing it as the id when creating the score, e.g. <trace_id>-<score_name>.
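As a rough sketch of the idempotency-key idea, the code below derives an id from the trace id and score name and uses an in-memory dictionary as a stand-in for the score store. The helper names and the dictionary are hypothetical; they only illustrate why reusing the same id results in an update rather than a second score.

```python
from typing import Any, Dict


def score_idempotency_key(trace_id: str, score_name: str) -> str:
    """Derive a deterministic score id following the <trace_id>-<score_name> pattern."""
    return f"{trace_id}-{score_name}"


def upsert_score(
    store: Dict[str, Dict[str, Any]], trace_id: str, name: str, value: Any
) -> str:
    """Write a score under its idempotency key; an existing entry is overwritten."""
    key = score_idempotency_key(trace_id, name)
    store[key] = {"id": key, "trace_id": trace_id, "name": name, "value": value}
    return key


# In-memory stand-in for the score table (assumption, not the ABV backend).
scores: Dict[str, Dict[str, Any]] = {}
upsert_score(scores, "trace-1", "user_feedback", 0.0)
upsert_score(scores, "trace-1", "user_feedback", 1.0)  # same key: updates, no duplicate
```

If you want to keep every feedback event instead, omit the id so each score is stored separately.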
Enforcing a Score Config
Score configs are helpful when you want to standardize your scores for future analysis.
To enforce a score config, provide a configId when creating a score to reference a ScoreConfig that was previously created. Score configs can be defined in the ABV UI or via our API.
Whenever you provide a ScoreConfig, the score data will be validated against the config. The following rules apply:
- Score Name: Must equal the config's name
- Score Data Type: When provided, must match the config's data type
- Score Value when Type is numeric: Value must be within the min and max values defined in the config (min and max are optional; if not provided, they are assumed to be -∞ and +∞ respectively)
- Score Value when Type is categorical: Value must map to one of the categories defined in the config
- Score Value when Type is boolean: Value must equal 0 or 1
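The rules above can be sketched as a small validation function. This is an illustrative model only: the `ScoreConfig` dataclass, its field names, and the representation of categories as a label-to-number mapping are assumptions, and the actual validation happens on the ABV side.

```python
import math
from dataclasses import dataclass
from typing import Any, Dict, List, Optional


@dataclass
class ScoreConfig:
    """Hypothetical in-memory model of a score config."""
    name: str
    data_type: str  # "NUMERIC", "CATEGORICAL", or "BOOLEAN"
    min_value: Optional[float] = None
    max_value: Optional[float] = None
    categories: Optional[Dict[str, float]] = None  # label -> numeric mapping


def validate_against_config(
    name: str, value: Any, config: ScoreConfig, data_type: Optional[str] = None
) -> List[str]:
    """Return a list of rule violations; an empty list means the score is valid."""
    errors: List[str] = []
    if name != config.name:
        errors.append("score name must equal the config's name")
    if data_type is not None and data_type != config.data_type:
        errors.append("score data type must match the config's data type")

    is_number = isinstance(value, (int, float)) and not isinstance(value, bool)
    if config.data_type == "NUMERIC":
        # Missing bounds are treated as -inf and +inf.
        low = -math.inf if config.min_value is None else config.min_value
        high = math.inf if config.max_value is None else config.max_value
        if not is_number:
            errors.append("numeric value expected")
        elif not low <= value <= high:
            errors.append("value is outside the config's min/max range")
    elif config.data_type == "CATEGORICAL":
        if value not in (config.categories or {}):
            errors.append("value must map to one of the config's categories")
    elif config.data_type == "BOOLEAN":
        if not is_number or value not in (0, 1):
            errors.append("boolean value must equal 0 or 1")
    return errors


accuracy = ScoreConfig(name="accuracy", data_type="NUMERIC", min_value=0.0, max_value=1.0)
in_range = validate_against_config("accuracy", 0.9, accuracy)
out_of_range = validate_against_config("accuracy", 1.5, accuracy)
```

Returning a list of violations rather than raising on the first one makes it easier to report every problem with a score at once.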
Python SDK
Numeric Scores
When ingesting numeric scores, you can provide the value as a float. If you provide a configId, the score value will be validated against the config's numeric range, which might be defined by a minimum and/or maximum value.
JS/TS SDK
Numeric Scores
When ingesting numeric scores, you can provide the value as a float. If you provide a configId, the score value will be validated against the config's numeric range, which might be defined by a minimum and/or maximum value.
Inferred Score Properties
Certain score properties might be inferred based on your input:
- If you don't provide a score data type, it will always be inferred. See the tables below for details.
- For boolean and categorical scores, we will provide the score value in both numerical and string format where possible. The score value format that is not provided as input, i.e. the translated value, is referred to as the inferred value in the tables below.
- On read, both numerical and string representations of the score value will be returned for boolean scores, e.g. both 1 and True.
- For categorical scores, the string representation is always provided, and a numerical mapping of the category will be produced only if a ScoreConfig was provided.
Numeric Scores
For example, let's assume you'd like to ingest a numeric score to measure accuracy. We have included a table of possible score ingestion scenarios below.
| Value | Data Type | Config Id | Description | Inferred Data Type | Valid |
|---|---|---|---|---|---|
| 0.9 | Null | Null | Data type is inferred | NUMERIC | Yes |
| 0.9 | NUMERIC | Null | No properties inferred | | Yes |
| depth | NUMERIC | Null | Error: data type of value does not match provided data type | | No |
| 0.9 | NUMERIC | 78545 | No properties inferred | | Conditional on config validation |
| 0.9 | Null | 78545 | Data type inferred | NUMERIC | Conditional on config validation |
| depth | NUMERIC | 78545 | Error: data type of value does not match provided data type | | No |
Categorical Scores
For example, let's assume you'd like to ingest a categorical score to measure correctness. We have included a table of possible score ingestion scenarios below.
| Value | Data Type | Config Id | Description | Inferred Data Type | Inferred Value representation | Valid |
|---|---|---|---|---|---|---|
| correct | Null | Null | Data type is inferred | CATEGORICAL | | Yes |
| correct | CATEGORICAL | Null | No properties inferred | | | Yes |
| 1 | CATEGORICAL | Null | Error: data type of value does not match provided data type | | | No |
| correct | CATEGORICAL | 12345 | Numeric value inferred | | 4 numeric config category mapping | Conditional on config validation |
| correct | Null | 12345 | Data type inferred | CATEGORICAL | | Conditional on config validation |
| 1 | CATEGORICAL | 12345 | Error: data type of value does not match provided data type | | | No |
Boolean Scores
For example, let's assume you'd like to ingest a boolean score to measure helpfulness. We have included a table of possible score ingestion scenarios below.
| Value | Data Type | Config Id | Description | Inferred Data Type | Inferred Value representation | Valid |
|---|---|---|---|---|---|---|
| 1 | BOOLEAN | Null | Value's string equivalent inferred | | True | Yes |
| true | BOOLEAN | Null | Error: data type of value does not match provided data type | | | No |
| 3 | BOOLEAN | Null | Error: boolean data type expects 0 or 1 as input value | | | No |
| 0.9 | Null | 93547 | Data type and value's string equivalent inferred | BOOLEAN | True | Conditional on config validation |
| depth | BOOLEAN | 93547 | Error: data type of value does not match provided data type | | | No |
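The rows without a Config Id in the tables above can be modeled with a short function. This is a minimal sketch under stated assumptions: the function name is hypothetical, config-based inference is deliberately left out, and "False" as the string equivalent of 0 is an assumption (the page only shows 1 mapping to True).

```python
from typing import Any, Optional, Tuple

MISMATCH = "data type of value does not match provided data type"


def infer_without_config(value: Any, data_type: Optional[str] = None) -> Tuple[str, Optional[str]]:
    """Return (data type, inferred string value) for a score with no config id."""
    # bool is a subclass of int in Python, so exclude it explicitly.
    is_number = isinstance(value, (int, float)) and not isinstance(value, bool)
    if not is_number and not isinstance(value, str):
        raise TypeError("score value must be a number or a string")

    if data_type is None:
        # No data type given: numbers are NUMERIC, strings are CATEGORICAL.
        data_type = "NUMERIC" if is_number else "CATEGORICAL"

    if data_type == "NUMERIC":
        if not is_number:
            raise ValueError(MISMATCH)
        return data_type, None
    if data_type == "CATEGORICAL":
        if not isinstance(value, str):
            raise ValueError(MISMATCH)
        return data_type, None
    if data_type == "BOOLEAN":
        if not is_number:
            raise ValueError(MISMATCH)
        if value not in (0, 1):
            raise ValueError("boolean data type expects 0 or 1 as input value")
        # String equivalent of the numeric input (assumed "False" for 0).
        return data_type, "True" if value == 1 else "False"
    raise ValueError(f"unknown data type: {data_type}")


numeric_result = infer_without_config(0.9)
categorical_result = infer_without_config("correct")
boolean_result = infer_without_config(1, "BOOLEAN")
```

Rows that reference a config additionally depend on the config's data type, range, and categories, which is why their validity is listed as conditional.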
Update Existing Scores via API/SDKs
When creating a score, you can provide an optional id parameter. If a score with that id already exists within your project, it will be updated.
If you want to update a score without needing to fetch the list of existing scores from ABV, you can set your own id parameter as an idempotency key when initially creating the score.