In TensorZero, datasets are collections of data that can be used for workflows like evaluations and optimization recipes. You can create and manage datasets using the TensorZero UI or programmatically using the TensorZero Gateway. A dataset is a named collection of datapoints. Each datapoint belongs to a function, with fields that depend on the function’s type. Broadly speaking, each datapoint largely mirrors the structure of an inference, with an input, an optional output, and other associated metadata (e.g. tags).Documentation Index
Fetch the complete documentation index at: https://www.tensorzero.com/docs/llms.txt
Use this file to discover all available pages before exploring further.
Endpoints & Methods
Get datapoints by ID
This endpoint retrieves specific datapoints by their IDs.- Gateway Endpoint:
POST /v1/datasets/{dataset_name}/get_datapoints - Client Method:
get_datapoints - Parameters:
dataset_name(string)ids(list of UUIDs, required)
datapoints, a list of datapoint objects.
Stale (soft-deleted) datapoints are included in the response when fetched by ID.
List datapoints
This endpoint returns a list of datapoints in the dataset. Each datapoint is an object that includes all the relevant fields (e.g. input, output, tags).- Gateway Endpoint:
POST /v1/datasets/{dataset_name}/list_datapoints - Client Method:
list_datapoints - Parameters:
dataset_name(string)function_name(string, optional) - only return datapoints for this functionlimit(int, optional, defaults to 20)offset(int, optional, defaults to 0)filter(object, optional) - filter by tags, time, or logical combinations (AND/OR/NOT)order_by(list of objects, optional) - ordering criteria (e.g. bytimestamporsearch_relevance)
datapoints, a list of datapoint objects.
Create datapoints
This endpoint adds a list of datapoints to a dataset. If the dataset does not exist, it will be created with the given name.- Gateway Endpoint:
POST /v1/datasets/{dataset_name}/datapoints - Client Method:
create_datapoints - Parameters:
dataset_name(string)datapoints(list of objects, see below)
type field ("chat" or "json") and a function_name field.
For chat datapoints, the following fields are available:
function_name(string, required)input(object, required, identical to an inference’sinput)output(a list of objects, optional, each object must be a content block like in an inference’s output)episode_id(UUID, optional)allowed_tools(list of strings, optional, identical to an inference’sallowed_tools)tool_choice(string, optional, identical to an inference’stool_choice)parallel_tool_calls(boolean, optional, defaults tofalse)tags(map of string to string, optional)name(string, optional)
json datapoints, the following fields are available:
function_name(string, required)input(object, required, identical to an inference’sinput)output(object, optional, an object that matches theoutput_schemaof the function)output_schema(object, optional, a dynamic JSON schema that overrides the output schema of the function)episode_id(UUID, optional)tags(map of string to string, optional)name(string, optional)
ids, a list of IDs (strings, UUIDv7) of the newly created datapoints.
Create datapoints from inferences
This endpoint creates datapoints from existing inferences. You can specify either a list of inference IDs or a query to find matching inferences. If the dataset does not exist, it will be created with the given name.- Gateway Endpoint:
POST /v1/datasets/{dataset_name}/from_inferences - Client Method:
create_datapoints_from_inferences - Parameters:
dataset_name(string)type(string, either"inference_ids"or"inference_query")
type is "inference_ids":
inference_ids(list of UUIDs, required) - the inference IDs to create datapoints fromoutput_source(string, optional, defaults to"inference") - the source of the output for the datapoint ("inference","demonstration", or"none")
type is "inference_query", the request body accepts the same parameters as the List Inferences endpoint (e.g. function_name, variant_name, output_source, filters, etc.).
The endpoint returns an object with ids, a list of IDs (strings, UUIDv7) of the newly created datapoints.
Update datapoints
This endpoint updates one or more datapoints in a dataset by creating new versions. The original datapoint is marked as stale (i.e. a soft deletion), and a new datapoint is created with the updated values and a new ID. The response returns the newly created IDs.- Gateway Endpoint:
PATCH /v1/datasets/{dataset_name}/datapoints - Client Method:
update_datapoints - Parameters:
dataset_name(string)datapoints(list of objects, see below)
id (string, UUIDv7) and type ("chat" or "json").
The following fields are optional.
If provided, they will update the corresponding fields in the datapoint.
If omitted, the fields will remain unchanged.
If set to null, the fields will be cleared (as long as they are nullable).
For chat datapoints, you can update the following fields:
input(object) - replaces the datapoint’s inputoutput(list of content blocks) - replaces the datapoint’s outputallowed_tools(list of strings or null) - replaces the allowed toolstool_choice(string or null) - replaces the tool choiceparallel_tool_calls(boolean or null) - replaces the parallel tool calls settingtags(map of string to string) - replaces all tagsname(string or null) - replaces the name (can be set tonullto clear)
json datapoints, you can update the following fields:
input(object) - replaces the datapoint’s inputoutput(object or null) - replaces the output (validated against the output schema; can be set tonullto clear)output_schema(object) - replaces the output schematags(map of string to string) - replaces all tagsname(string or null) - replaces the name (can be set tonullto clear)
ids, a list of IDs (strings, UUIDv7) of the updated datapoints.
Update datapoint metadata
This endpoint updates metadata fields for one or more datapoints in a dataset. Unlike updating the full datapoint, this operation updates the datapoint in-place without creating a new version.- Gateway Endpoint:
PATCH /v1/datasets/{dataset_name}/datapoints/metadata - Client Method:
update_datapoints_metadata - Parameters:
dataset_name(string)datapoints(list of objects, see below)
datapoints field must contain a list of objects.
Each object must have the field id (string, UUIDv7).
The following field is optional:
name(string or null) - replaces the name (can be set tonullto clear)
name is omitted, no changes will be made to the datapoint.
The endpoint returns an object with ids, a list of IDs (strings, UUIDv7) of the updated datapoints.
These IDs are the same as the input IDs since the datapoints are updated in-place.
Delete datapoints
This endpoint performs a soft deletion of one or more datapoints: the datapoints are marked as stale and will be disregarded by the system in the future (e.g. when listing datapoints or running evaluations), but the data remains in the database.- Gateway Endpoint:
DELETE /v1/datasets/{dataset_name}/datapoints - Client Method:
delete_datapoints - Parameters:
dataset_name(string)ids(list of UUIDs, required)
num_deleted_datapoints, the number of datapoints that were deleted.
Delete a dataset
This endpoint performs a soft deletion of an entire dataset: all datapoints in the dataset are marked as stale.- Gateway Endpoint:
DELETE /v1/datasets/{dataset_name} - Client Method:
delete_dataset - Parameters:
dataset_name(string)
num_deleted_datapoints, the number of datapoints that were deleted.