Datasets
A dataset is a collection of inputs and expected outputs and is used to test your application. Before executing your first dataset run, you need to create a dataset.

Why use datasets?

- Datasets are a prerequisite for dataset runs; they serve as the data input of dataset runs.
- Create test cases for your application with real production traces.
- Collaboratively create and collect dataset items with your team.
- Have a single source of truth for your test data.

Get started

1) Creating a dataset

Datasets have a name which is unique within a project.

ABV UI

Navigate to your project > Datasets and click + New dataset to create a new dataset.

Python SDK

    abv.create_dataset(
        name="<dataset_name>",
        # optional description
        description="My first dataset",
        # optional metadata
        metadata={
            "author": "Alice",
            "date": "2025-01-01",
            "type": "benchmark"
        }
    )

See the Python SDK docs (https://docs.abv.dev/python-sdk) for details on how to initialize the Python client.

JS/TS SDK

    import { ABVClient } from "@abvdev/client";

    const abv = new ABVClient();

    await abv.api.datasets.create({
      name: "<dataset_name>",
      // optional description
      description: "My first dataset",
      // optional metadata
      metadata: {
        author: "Alice",
        date: "2025-01-01",
        type: "benchmark",
      },
    });

See the TypeScript SDK overview docs for details on how to initialize the JS/TS client.

2) Create new dataset items

Dataset items can be added to a dataset by providing the input and, optionally, the expected output.

ABV UI

- Add item: add an item manually via the UI.
- Import CSV: import items from a CSV file.
- Add from trace: add an item from the trace view.

Python SDK

    abv.create_dataset_item(
        dataset_name="<dataset_name>",
        # any Python object or value, optional
        input={
            "text": "hello world"
        },
        # any Python object or value, optional
        expected_output={
            "text": "hello world"
        },
        # metadata, optional
        metadata={
            "model": "llama3",
        }
    )

See the Python SDK docs (https://docs.abv.dev/python-sdk) for details on how to initialize the Python client.

JS/TS SDK

    import { ABVClient } from "@abvdev/client";

    const abv = new ABVClient();

    await abv.api.datasetItems.create({
      datasetName: "<dataset_name>",
      // any JS object or value
      input: {
        text: "hello world",
      },
      // any JS object or value, optional
      expectedOutput: {
        text: "hello world",
      },
      // metadata, optional
      metadata: {
        model: "llama3",
      },
    });

See the TypeScript SDK overview docs for details on how to initialize the JS/TS client.

Create synthetic datasets

Frequently, you want to create synthetic examples to test your application and bootstrap your dataset. LLMs are good at generating these when prompted for common questions or tasks. To get started, have a look at this cookbook for examples of how to generate synthetic datasets.

Create items from production data

A common workflow is to select production traces where the application did not perform as expected, then have an expert add the expected output so that you can test new versions of your application on the same data.

ABV UI

In the UI, use + Add to dataset on any observation (span, event, generation) of a production trace.

Python SDK

    abv.create_dataset_item(
        dataset_name="<dataset_name>",
        input={"text": "hello world"},
        expected_output={"text": "hello world"},
        # link to a trace
        source_trace_id="<trace_id>",
        # optional: link to a specific span, event, or generation
        source_observation_id="<observation_id>"
    )

JS/TS SDK

    import { ABVClient } from "@abvdev/client";

    const abv = new ABVClient();

    await abv.api.datasetItems.create({
      datasetName: "<dataset_name>",
      input: { text: "hello world" },
      expectedOutput: { text: "hello world" },
      // link to a trace
      sourceTraceId: "<trace_id>",
      // optional: link to a specific span, event, or generation
      sourceObservationId: "<observation_id>",
    });

Edit/archive dataset items

You can edit or archive dataset items. Archived items are removed from future experiment runs.

ABV UI

In the UI, you can edit an item by clicking on its item ID. To archive or delete an item, click the dots next to it and select Archive or Delete.

Python SDK

You can upsert items by providing the ID of the item you want to update.

    abv.create_dataset_item(
        dataset_name="<dataset_name>",
        id="<item_id>",
        # example: update status to "archived"
        status="archived"
    )

JS/TS SDK

You can upsert items by providing the ID of the item you want to update.

    import { ABVClient } from "@abvdev/client";

    const abv = new ABVClient();

    await abv.api.datasetItems.create({
      datasetName: "<dataset_name>",
      id: "<item_id>",
      // example: update status to "archived"
      status: "archived",
    });

Dataset runs

Once you have created a dataset, you can test and evaluate your application against it:

- Native dataset runs (prompt experiments): https://docs.abv.dev/native-dataset-runs
- Remote dataset runs: https://docs.abv.dev/remote-dataset-runs
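Conceptually, a dataset run calls your application on every active (non-archived) item and compares the result with the item's expected output. The following is a minimal local sketch of that loop, not ABV's implementation: `DatasetItem` and `run_dataset` are hypothetical names, and exact-match scoring is simply the most basic evaluator one could plug in.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List, Optional


@dataclass
class DatasetItem:
    # Hypothetical local stand-in for an ABV dataset item
    id: str
    input: Any
    expected_output: Any = None
    metadata: Dict[str, Any] = field(default_factory=dict)
    status: str = "active"  # assumption: archived items carry status "archived"


def run_dataset(
    items: List[DatasetItem], app: Callable[[Any], Any]
) -> Dict[str, Optional[float]]:
    """Run `app` on each non-archived item and score it by exact match.

    Items without an expected output get None, since there is nothing to compare.
    """
    scores: Dict[str, Optional[float]] = {}
    for item in items:
        if item.status == "archived":
            continue  # archived items are skipped in future runs
        output = app(item.input)
        if item.expected_output is None:
            scores[item.id] = None
        else:
            scores[item.id] = 1.0 if output == item.expected_output else 0.0
    return scores


# Usage: an "application" that simply echoes its input
items = [
    DatasetItem("a", {"text": "hello world"}, {"text": "hello world"}),
    DatasetItem("b", {"text": "hi"}, {"text": "HI"}),
    DatasetItem("c", {"text": "old"}, {"text": "old"}, status="archived"),
    DatasetItem("d", {"text": "unlabeled"}),
]
scores = run_dataset(items, lambda x: x)
print(scores)  # {'a': 1.0, 'b': 0.0, 'd': None}
```

Swapping exact match for a fuzzy or model-based evaluator would only change the scoring line; the iteration over active items stays the same.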
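The Import CSV option in the UI maps CSV columns to item fields. If you would rather import through the SDK, a sketch like the following converts CSV rows into keyword arguments shaped like the `create_dataset_item` call in step 2. The column names `input` and `expected_output`, the `{"text": ...}` wrapping, and the helper `csv_to_item_kwargs` are assumptions for illustration; any leftover columns become item metadata.

```python
import csv
import io
from typing import Any, Dict, List


def csv_to_item_kwargs(
    csv_text: str,
    dataset_name: str,
    input_col: str = "input",  # hypothetical column name
    expected_col: str = "expected_output",  # hypothetical column name
) -> List[Dict[str, Any]]:
    """Convert CSV rows into kwargs for create_dataset_item (hypothetical helper)."""
    items = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        kwargs: Dict[str, Any] = {
            "dataset_name": dataset_name,
            "input": {"text": row.pop(input_col)},
        }
        expected = row.pop(expected_col, None)
        if expected:  # an empty cell means "no expected output yet"
            kwargs["expected_output"] = {"text": expected}
        if row:  # any remaining columns are kept as metadata
            kwargs["metadata"] = dict(row)
        items.append(kwargs)
    return items


sample = "input,expected_output,model\nhello world,hello world,llama3\nhi,,llama3\n"
for kwargs in csv_to_item_kwargs(sample, "<dataset_name>"):
    print(kwargs)
    # with an initialized client: abv.create_dataset_item(**kwargs)
```

Omitting `expected_output` for empty cells, instead of sending an empty string, keeps such rows clearly marked as still awaiting a reference answer.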
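For synthetic datasets, one common pattern is to ask an LLM to return test cases as JSON and convert each one into an item. The sketch below covers only the conversion step and assumes a hypothetical response format: a JSON array of objects with a `question` key and an optional `answer` key. `parse_synthetic_items` is a hypothetical helper, the LLM call is omitted because it depends on your model provider, and generated answers are worth a human review before they serve as expected outputs.

```python
import json
from typing import Any, Dict, List


def parse_synthetic_items(llm_response: str, dataset_name: str) -> List[Dict[str, Any]]:
    """Turn an LLM's JSON test cases into create_dataset_item kwargs.

    Entries without a question are skipped; an answer, if present, becomes
    the expected output.
    """
    items = []
    for case in json.loads(llm_response):
        if not isinstance(case, dict) or not case.get("question"):
            continue  # skip malformed entries instead of failing the whole batch
        kwargs: Dict[str, Any] = {
            "dataset_name": dataset_name,
            "input": {"text": case["question"]},
            "metadata": {"source": "synthetic"},  # tag for later filtering
        }
        if case.get("answer"):
            kwargs["expected_output"] = {"text": case["answer"]}
        items.append(kwargs)
    return items


response = (
    '[{"question": "What is a dataset?", '
    '"answer": "A collection of inputs and expected outputs."}, '
    '{"answer": "no question"}, '
    '{"question": "Why archive items?"}]'
)
for kwargs in parse_synthetic_items(response, "<dataset_name>"):
    print(kwargs)
```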
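Selecting production traces where the application did not perform as expected can also be scripted. This sketch assumes you already have trace records with a numeric quality score; `TraceRecord`, its fields, `items_from_failed_traces`, and the 0.5 threshold are all hypothetical. It only builds keyword arguments mirroring the `source_trace_id` and `source_observation_id` example from the production-data section, and it leaves the expected output out so that an expert can add it.

```python
from dataclasses import dataclass
from typing import Any, Dict, List, Optional


@dataclass
class TraceRecord:
    # Hypothetical shape of an exported production trace or observation
    trace_id: str
    observation_id: Optional[str]
    input: Any
    score: float  # assumption: 0.0 (bad) to 1.0 (good)


def items_from_failed_traces(
    traces: List[TraceRecord], dataset_name: str, threshold: float = 0.5
) -> List[Dict[str, Any]]:
    """Build create_dataset_item kwargs for traces that scored below `threshold`.

    expected_output is deliberately omitted so an expert can add it later.
    """
    items = []
    for trace in traces:
        if trace.score >= threshold:
            continue  # performed as expected; not a test-case candidate
        kwargs: Dict[str, Any] = {
            "dataset_name": dataset_name,
            "input": trace.input,
            "source_trace_id": trace.trace_id,  # link back to the production trace
        }
        if trace.observation_id is not None:
            kwargs["source_observation_id"] = trace.observation_id
        items.append(kwargs)
    return items


traces = [
    TraceRecord("t1", "o1", {"text": "a"}, 0.2),
    TraceRecord("t2", None, {"text": "b"}, 0.9),
    TraceRecord("t3", None, {"text": "c"}, 0.1),
]
for kwargs in items_from_failed_traces(traces, "<dataset_name>"):
    print(kwargs)
```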
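To make the upsert behavior from the edit/archive section concrete, here is a toy in-memory model. Creating an item with an existing ID updates that item rather than adding a new one, and archived items drop out of the set used for runs. `LocalDataset` and its merge rule (only the fields you pass are overwritten) are assumptions for illustration, not ABV's storage logic.

```python
import uuid
from typing import Any, Dict, List, Optional


class LocalDataset:
    """Hypothetical in-memory dataset with upsert-by-ID semantics."""

    def __init__(self) -> None:
        self.items: Dict[str, Dict[str, Any]] = {}

    def create_item(self, id: Optional[str] = None, **fields: Any) -> str:
        item_id = id or str(uuid.uuid4())  # a new ID when none is given
        existing = self.items.setdefault(item_id, {"status": "active"})
        existing.update(fields)  # assumption: only the passed fields are overwritten
        return item_id

    def active_items(self) -> List[Dict[str, Any]]:
        # archived items are excluded, mirroring their removal from future runs
        return [item for item in self.items.values() if item["status"] != "archived"]


ds = LocalDataset()
item_id = ds.create_item(input={"text": "hello world"}, expected_output={"text": "hello world"})
ds.create_item(id=item_id, status="archived")  # upsert: same ID, new status
print(len(ds.items), len(ds.active_items()))  # 1 0
```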