## When should you use GEPA?
GEPA is particularly useful if you have high-quality inference evaluations to optimize against.

| Criterion | Impact | Details |
|---|---|---|
| Complexity | Moderate | Requires inference evaluation and prompt templates |
| Data Efficiency | High | Achieves good results with limited data |
| Optimization Ceiling | Moderate | Limited to static prompt improvements |
| Optimization Cost | Moderate | Requires many evaluation runs |
| Inference Cost | Low | Generated prompt templates tend to be longer than the original |
| Inference Latency | Low | Generated prompt templates tend to be longer than the original |
## Optimize your prompt templates with GEPA
### Configure your LLM application
Define a function and variant for your application. The variant must have at least one prompt template (e.g. the LLM system instructions), as sketched in the example below.
**Example: Data Extraction (Named Entity Recognition) — Configuration**

`tensorzero.toml`
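A minimal sketch of what this configuration might look like; the model, output schema path, and variant name are illustrative assumptions:

```toml
# Sketch: a JSON function for named entity recognition with one baseline variant.
# The model, output schema path, and variant name are illustrative assumptions.
[functions.extract_entities]
type = "json"
output_schema = "functions/extract_entities/output_schema.json"

[functions.extract_entities.variants.baseline]
type = "chat_completion"
model = "openai::gpt-4o-mini"
system_template = "functions/extract_entities/baseline/system_template.minijinja"
```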
`system_template.minijinja`
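And a sketch of the system template; the wording is illustrative:

```minijinja
You are an expert at named entity recognition.

Extract the named entities (e.g. people, organizations, locations) from the
text provided by the user, and respond with a JSON object containing the
extracted entities.
```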
### Collect your optimization data
After deploying the TensorZero Gateway with ClickHouse, build a dataset for the `extract_entities` function you configured. You can create datapoints from historical inferences or from external or synthetic datasets.
### Configure an evaluation

GEPA template refinement is guided by evaluator scores.
Define an Inference Evaluation in your TensorZero configuration.
To demonstrate that GEPA works even with noisy evaluators, we don’t provide demonstrations (labels), only an LLM judge.
GEPA supports evaluations with any number of evaluators and any evaluator type (e.g. exact match, LLM judges).
**Example: Data Extraction (Named Entity Recognition) — Evaluation**

`tensorzero.toml`
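A minimal sketch of what this evaluation might look like; the evaluation name, evaluator settings, judge model, and judge variant are illustrative assumptions:

```toml
# Sketch: a static evaluation with a single LLM judge evaluator.
# The evaluation name, judge model, and settings are illustrative assumptions.
[evaluations.extract_entities_eval]
type = "static"
function_name = "extract_entities"

[evaluations.extract_entities_eval.evaluators.entity_extraction_judge]
type = "llm_judge"
output_type = "boolean"  # assumption: the judge returns a pass/fail verdict
optimize = "max"         # assumption: higher scores are better

[evaluations.extract_entities_eval.evaluators.entity_extraction_judge.variants.judge_baseline]
type = "chat_completion"
model = "openai::gpt-4o"  # assumption: use a strong judge model
system_instructions = "evaluations/extract_entities_eval/entity_extraction_judge/system_instructions.txt"
```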
`system_instructions.txt`
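And a sketch of the judge's instructions; the wording is illustrative:

```text
You are evaluating the output of a named entity recognition system.
Given the original text and the extracted entities, judge whether the
extraction is accurate and complete. Answer true if it is, and false
otherwise.
```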
### Configure GEPA
Configure GEPA by specifying the name of your function and evaluation.
You are also free to choose the models used to analyze inferences and generate new templates. The `analysis_model` reflects on individual inferences, reports whether each is optimal, needs improvement, or is erroneous, and suggests prompt template improvements. The `mutation_model` generates new templates based on the collected analysis reports.
We recommend using strong models for these tasks.
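For example, here is a minimal sketch in Python. The import path and the keyword names for the function and evaluation are assumptions; `analysis_model` and `mutation_model` are described above, and the full parameter reference appears in the `GEPAConfig` section below.

```python
# Sketch: constructing a GEPA optimization config in Python.
# The import path and the keyword names for the function and evaluation
# are assumptions; analysis_model and mutation_model are described above.
from tensorzero import GEPAConfig  # assumption: exported by the `tensorzero` package

gepa_config = GEPAConfig(
    function_name="extract_entities",         # assumption: keyword name
    evaluation_name="extract_entities_eval",  # assumption: keyword name
    analysis_model="anthropic::claude-sonnet-4-5",  # reflects on individual inferences
    mutation_model="anthropic::claude-sonnet-4-5",  # generates new templates
    max_tokens=16_384,  # assumption: keyword name; required for Anthropic models
)
```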
### Update your configuration

Review the generated templates and write them to your config directory. Finally, add the new variant to your configuration, as in the example below.
That’s it!
You are now ready to deploy your GEPA-optimized LLM application!
**Example: Data Extraction (Named Entity Recognition) — Optimized Variant**

`tensorzero.toml`
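A minimal sketch of the updated configuration, reusing the generated template path shown below; the variant name and model are illustrative assumptions:

```toml
# Sketch: registering the GEPA-generated template as a new variant.
# The variant name and model are illustrative assumptions; the template path
# is the generated file shown below (assumption: resolved relative to the
# function's config directory).
[functions.extract_entities.variants.gepa_optimized]
type = "chat_completion"
model = "openai::gpt-4o-mini"
system_template = "gepa-iter-9-gepa-iter-6-gepa-iter-4-baseline/system_template.minijinja"
```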
`gepa-iter-9-gepa-iter-6-gepa-iter-4-baseline/system_template.minijinja`
## GEPAConfig
Configure GEPA optimization by creating a `GEPAConfig` object with the following parameters:
- `analysis_model`: Model used to analyze inference results (e.g. `"anthropic::claude-sonnet-4-5"`).
- Name of the evaluation used to score candidate variants.
- Name of the TensorZero function to optimize.
- `mutation_model`: Model used to generate prompt mutations (e.g. `"anthropic::claude-sonnet-4-5"`).
- Number of training samples to analyze per iteration.
- Whether to include the inference input/output in the analysis passed to the mutation model. Useful for few-shot examples, but can cause context overflow with long conversations or outputs.
- List of variant names to initialize GEPA with. If not specified, GEPA uses all variants defined for the function.
- Maximum number of concurrent inference calls.
- Maximum number of optimization iterations.
- Maximum tokens for analysis and mutation model calls. Required for Anthropic models.
- Retry configuration for inference calls during optimization.
- Random seed for reproducibility.
- Client timeout in seconds for TensorZero gateway operations.
- Prefix for naming newly generated variants.