TensorZero Autopilot is an automated AI engineer that analyzes LLM observability data, optimizes prompts and models, sets up evals, and runs A/B tests. Schedule a demo →
Learn how to use the TensorZero Evaluations to build principled LLM-powered applications.
TensorZero offers two types of evaluations:Inference Evaluations focus on evaluating the performance of a TensorZero variant (i.e. a choice of prompt, model, inference strategy, etc.) on a given dataset.Workflow Evaluations focus on evaluating complex workflows that might include multiple TensorZero inference calls, arbitrary application logic, and more.As a vague analogy, inference evaluations are like unit tests for individual inference calls, and workflow evaluations are like integration tests for complex workflows.