  • A function represents a task or agent in your application (e.g. “write a product description” or “answer a customer question”).
  • A variant is a specific way to accomplish it: a choice of model, prompt, inference parameters, etc.
You can call models directly when getting started, but functions and variants unlock powerful capabilities as your application matures: decoupling prompts from application code, experimenting with different models and prompts, and setting up fallbacks for reliability.

Configure functions & variants

TensorZero supports two function types:
  • chat is the typical chat interface used by most LLMs. It returns unstructured text responses.
  • json is for structured outputs. It returns responses that conform to a JSON schema.
The skeleton of a function configuration looks like this:
tensorzero.toml
[functions.my_function_name]
type = "..." # "chat" or "json"
# ... other fields depend on the function type ...
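For instance, a json function would typically also reference the schema its outputs must conform to. The function name and schema path below are illustrative, and the output_schema field is an assumption here — check the configuration reference for the exact fields:
tensorzero.toml
[functions.extract_order_details]
type = "json"
# Hypothetical path to a JSON Schema file describing the expected output
output_schema = "functions/extract_order_details/output_schema.json"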
A variant is a particular implementation of a function. It specifies the model to use, prompt templates, decoding strategy, hyperparameters, and other settings. The skeleton of a variant configuration looks like this:
tensorzero.toml
[functions.my_function_name.variants.my_variant_name]
type = "..." # e.g. "chat_completion"
model = "..." # e.g. "openai::gpt-5" or "my_gpt_5"
# ... other fields (e.g. prompt templates, inference parameters) ...
The simplest variant type is chat_completion, which is the typical chat completion format used by OpenAI and many other LLM providers. TensorZero supports other variant types that implement inference-time optimizations.

You can define prompt templates in your variant configuration rather than sending prompts directly in your inference requests. This decouples prompts from application code and enables easier experimentation and optimization. See Create a prompt template for more details.

If you define multiple variants, TensorZero will randomly sample one of them at inference time. You can define more advanced experimentation strategies (e.g. Run adaptive A/B tests), fallback-only variants (e.g. Retries & Fallbacks), and more.
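As a rough sketch, a chat_completion variant that uses a prompt template and custom inference parameters might look like the following. The field names system_template, temperature, and max_tokens, as well as the template path, are assumptions here — see Create a prompt template and the configuration reference for the authoritative syntax:
tensorzero.toml
[functions.my_function_name.variants.my_variant_name]
type = "chat_completion"
model = "openai::gpt-5"
# Hypothetical template path and inference parameters
system_template = "functions/my_function_name/my_variant_name/system_template.minijinja"
temperature = 0.7
max_tokens = 500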

Example

Let’s create a function called answer_customer with two variants: GPT-5 and Claude Sonnet 4.5.
tensorzero.toml
[functions.answer_customer]
type = "chat"

[functions.answer_customer.variants.gpt_5_baseline]
type = "chat_completion"
model = "openai::gpt-5"

[functions.answer_customer.variants.claude_sonnet_4_5]
type = "chat_completion"
model = "anthropic::claude-sonnet-4-5"
You can now call the answer_customer function and TensorZero will randomly select one of the two variants for each request.

Make inference requests

Once you’ve configured a function and its variants, you can make inference requests to the TensorZero Gateway.
The example below uses the TensorZero Python client. You can also call the gateway with the OpenAI SDK (Python or Node) or over plain HTTP.
# `t0` is a configured TensorZero client — see Call any LLM for setup
result = t0.inference(
    function_name="answer_customer",
    input={
        "messages": [
            {"role": "user", "content": "What is your return policy?"},
        ],
    },
)
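During development, it can be handy to pin a specific variant rather than let TensorZero sample one at random. Here is a sketch reusing the `t0` client from above, assuming the inference API accepts a variant_name parameter (check the inference API reference for the exact name):
# Pin a specific variant instead of letting TensorZero sample one at random
result = t0.inference(
    function_name="answer_customer",
    variant_name="claude_sonnet_4_5",  # assumed parameter for pinning a variant
    input={
        "messages": [
            {"role": "user", "content": "What is your return policy?"},
        ],
    },
)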
See Call any LLM for complete examples including setup and sample responses.