When should you use GEPA?
GEPA is particularly useful if you have high-quality inference evaluations to optimize against.

| Criterion | Impact | Details |
|---|---|---|
| Complexity | Moderate | Requires an inference evaluation and prompt templates |
| Data Efficiency | High | Achieves good results with limited data |
| Optimization Ceiling | Moderate | Limited to static prompt improvements |
| Optimization Cost | Moderate | Requires many evaluation runs |
| Inference Cost | Low | Generated prompt templates tend to be longer than the original |
| Inference Latency | Low | Generated prompt templates tend to be longer than the original |
Optimize your prompt templates with GEPA
Configure your LLM application
Define a function and variant for your application.
The variant must have at least one prompt template (e.g. the LLM system instructions).
tensorzero.toml
Example: Data Extraction (Named Entity Recognition) — Configuration
system_template.minijinja
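As a concrete illustration, a minimal configuration for the extract_entities function might look like the following sketch. The model name, function type, and schema filename are assumptions for illustration; adapt them to your application.

```toml
# tensorzero.toml — minimal sketch (model, function type, and schema path
# are illustrative assumptions)

[functions.extract_entities]
type = "json"
output_schema = "output_schema.json"

[functions.extract_entities.variants.baseline]
type = "chat_completion"
model = "openai::gpt-5.2"
system_template = "system_template.minijinja"
```

The system_template.minijinja file referenced here holds the system instructions that GEPA will iteratively rewrite.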
Collect your optimization data
After deploying the TensorZero Gateway with Postgres, build a dataset for the
extract_entities function you configured.
You can create datapoints from historical inferences or external/synthetic datasets.

When you launch GEPA with a single dataset_name, the dataset is automatically split 50/50 into training and validation sets.
You can also provide separate train_dataset_name and val_dataset_name for explicit control over the split.

Configure an evaluation
GEPA is guided by evaluator scores, so let’s define an Inference Evaluation in your TensorZero configuration.
To demonstrate that GEPA works even with noisy evaluators, we don’t provide demonstrations (labels), only an LLM judge.
GEPA supports evaluations with any number of evaluators and any evaluator type (e.g. exact match, LLM judges).
Example: Data Extraction (Named Entity Recognition) — Evaluation
tensorzero.toml
system_instructions.txt
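The evaluation configuration for this example could be sketched roughly as follows. The evaluator field names and judge variant settings are assumptions about TensorZero's inference evaluations; verify them against the evaluations documentation before use.

```toml
# tensorzero.toml — illustrative sketch; evaluator fields are assumptions

[evaluations.extract_entities_evaluation]
type = "static"
function_name = "extract_entities"

[evaluations.extract_entities_evaluation.evaluators.entity_judge]
type = "llm_judge"
output_type = "boolean"
optimize = "max"

[evaluations.extract_entities_evaluation.evaluators.entity_judge.variants.baseline]
type = "chat_completion"
model = "openai::gpt-5.2"
system_instructions = "system_instructions.txt"
```

The system_instructions.txt file holds the judge's instructions, i.e. the rubric it applies when scoring extractions.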
Launch GEPA
Launch GEPA by specifying the name of your function, dataset, and evaluation.
You are also free to choose the models used to analyze inferences and generate new templates.

The
analysis_model reflects on individual inferences, reports on whether they are optimal, need improvement, or are erroneous, and provides suggestions for prompt template improvement.
The mutation_model generates new templates based on the collected analysis reports.
We recommend using strong models for these tasks.
The GEPA API requires the gateway to be configured with Postgres for durable task execution.
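A minimal Python sketch of launching a GEPA task over HTTP, using only the standard library. The gateway URL and the request field names are assumptions inferred from the API reference on this page; verify them against your gateway version.

```python
import json
import urllib.request

GATEWAY_URL = "http://localhost:3000"  # assumed local gateway address

# Request body for POST /v1/optimization/gepa. Field names are assumptions
# inferred from the API reference below; values are illustrative.
body = {
    "function_name": "extract_entities",
    "dataset_name": "extract_entities_dataset",
    "evaluation_name": "extract_entities_evaluation",
    "analysis_model": "openai::gpt-5.2",
    "mutation_model": "openai::gpt-5.2",
}

def launch_gepa(gateway_url: str, body: dict) -> str:
    """Launch a GEPA optimization task and return its task_id."""
    req = urllib.request.Request(
        f"{gateway_url}/v1/optimization/gepa",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["task_id"]

# task_id = launch_gepa(GATEWAY_URL, body)  # requires a running gateway
```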
Poll for results
The launch endpoint returns a task_id that you can use to poll for results.
The response will have one of three statuses: pending, completed, or error.
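The polling loop can be sketched in Python with the standard library alone. The polling interval is an arbitrary choice here, not a documented default.

```python
import json
import time
import urllib.request

def poll_gepa(gateway_url: str, task_id: str, interval_s: float = 5.0) -> dict:
    """Poll GET /v1/optimization/gepa/{task_id} until the task leaves 'pending'."""
    while True:
        url = f"{gateway_url}/v1/optimization/gepa/{task_id}"
        with urllib.request.urlopen(url) as resp:
            result = json.loads(resp.read())
        if result["status"] in ("completed", "error"):
            return result
        time.sleep(interval_s)  # status is "pending"; wait and retry
```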
Update your configuration
Review the generated templates and write them to your config directory.

Finally, add the new variant to your configuration.
That’s it!
You are now ready to deploy your GEPA-optimized LLM application!
Example: Data Extraction (Named Entity Recognition) — Optimized Variant
tensorzero.toml
gepa-iter-9-gepa-iter-6-gepa-iter-4-baseline/system_template.minijinja
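Registering the generated variant typically amounts to a new variant block pointing at the written-out template. A minimal sketch, where the model name is an illustrative assumption:

```toml
# tensorzero.toml — sketch of registering the GEPA-generated variant
# (model name is an illustrative assumption)

[functions.extract_entities.variants.gepa-iter-9-gepa-iter-6-gepa-iter-4-baseline]
type = "chat_completion"
model = "openai::gpt-5.2"
system_template = "gepa-iter-9-gepa-iter-6-gepa-iter-4-baseline/system_template.minijinja"
```

You can then route traffic to the new variant alongside the baseline to compare them in production.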
API Reference
POST /v1/optimization/gepa
Launch a GEPA optimization task. Returns a task ID for polling.
Request
Name of the TensorZero function to optimize.
Model used to analyze inference results (e.g. "openai::gpt-5.2").
Model used to generate prompt mutations (e.g. "openai::gpt-5.2").
Maximum number of optimization iterations.
Single dataset name. The dataset is automatically split 50/50 into training and validation sets. Mutually exclusive with train_dataset_name/val_dataset_name.
Training dataset name. Must be paired with val_dataset_name. Mutually exclusive with dataset_name.
Validation dataset name. Must be paired with train_dataset_name. Mutually exclusive with dataset_name.
Name of a configured evaluation to use for scoring.
List of variant names to initialize GEPA with. If not specified, uses all
variants defined for the function.
Prefix for naming newly generated variants.
Number of training samples to analyze per iteration. Default: 5.
Random seed for reproducibility.
Maximum number of concurrent inference calls. Default: 10.
Maximum number of datapoints to use from the dataset. Default: 1000.
Whether to include inference input/output in the analysis passed to the
mutation model. Useful for few-shot examples but can cause context overflow
with long conversations or outputs. Default: true.
Response
GET /v1/optimization/gepa/{task_id}
Poll the status of a GEPA optimization task.
Request
The request URL should include the task_id you received when launching the GEPA workflow.
Response
The response is a tagged union on the status field:
Pending