How to call any LLM

This page shows how to:

Call any LLM with the same API. TensorZero unifies every major LLM API (e.g. OpenAI) and inference server (e.g. Ollama).
Get started with a few lines of code. Later, you can optionally add observability, automatic fallbacks, A/B testing, and much more.
Use any programming language. You can use TensorZero with its Python SDK, any OpenAI SDK (Python, Node, Go, etc.), or its HTTP API.

We provide complete code examples on GitHub.

Python

The TensorZero Python SDK provides a unified API for calling any LLM.

Set up the credentials for your LLM provider

For example, if you’re using OpenAI, you can set the OPENAI_API_KEY environment variable with your API key.

export OPENAI_API_KEY="sk-..."

See the Integrations page to learn how to set up credentials for other LLM providers.
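If the credential is missing, the problem may only surface once a request fails, so it can help to check for it up front. Here is a minimal sketch, assuming OpenAI as the provider. `require_env` is a hypothetical helper written for this example, not part of TensorZero.

```python
import os


def require_env(name: str) -> str:
    """Return the value of environment variable `name`.

    Raises RuntimeError if the variable is unset or empty.
    (Hypothetical helper for illustration; not a TensorZero API.)
    """
    value = os.environ.get(name, "")
    if not value:
        raise RuntimeError(f"{name} is not set; export it before making requests.")
    return value
```

You could call `require_env("OPENAI_API_KEY")` at startup, before initializing the gateway, so a missing key fails fast with a clear message.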

Install the TensorZero Python SDK

You can install the TensorZero SDK with a Python package manager like pip.

pip install tensorzero

Initialize the TensorZero Gateway

Let’s initialize the TensorZero Gateway. For simplicity, we’ll use an embedded gateway without observability or custom configuration.

from tensorzero import TensorZeroGateway

t0 = TensorZeroGateway.build_embedded()

The TensorZero Python SDK includes a synchronous TensorZeroGateway client and an asynchronous AsyncTensorZeroGateway client. Both clients can run the gateway embedded in your application (build_embedded) or connect to a standalone gateway (build_http). See Clients for more details.

Call the LLM

response = t0.inference(
    model_name="openai::gpt-5-mini",
    # or: model_name="anthropic::claude-sonnet-4-20250514"
    # or: Google, AWS, Azure, xAI, vLLM, Ollama, and many more
    input={
        "messages": [
            {
                "role": "user",
                "content": "Tell me a fun fact.",
            }
        ]
    },
)
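Because only the model_name string changes between providers, the call above is easy to wrap in a small helper. The sketch below is one possible way to do that. `build_input` and `ask` are hypothetical names introduced here, and the only assumption about the gateway is that it exposes the `inference(model_name=..., input=...)` call shown above.

```python
from typing import Any


def build_input(prompt: str) -> dict[str, Any]:
    """Wrap a single user prompt in the `input` structure used in the call above."""
    return {"messages": [{"role": "user", "content": prompt}]}


def ask(gateway: Any, model_name: str, prompt: str) -> Any:
    """Send one user message through `gateway.inference` and return the raw response.

    `gateway` is any object with an `inference(model_name=..., input=...)` method,
    such as the `t0` client created earlier. (Hypothetical helper for illustration.)
    """
    return gateway.inference(model_name=model_name, input=build_input(prompt))
```

With the client from the previous step, `ask(t0, "openai::gpt-5-mini", "Tell me a fun fact.")` would make the same request as the call above. Passing a different `provider::model` string targets another provider without changing the rest of the code.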

Sample Response

ChatInferenceResponse(
    inference_id=UUID('0198d339-be77-74e0-b522-e08ec12d3831'),
    episode_id=UUID('0198d339-be77-74e0-b522-e09f578f34d0'),
    variant_name='openai::gpt-5-mini',
    content=[
        Text(
            text='Fun fact: Botanically, bananas are berries but strawberries are not. \n\nA true berry develops from a single ovary and has seeds embedded in the flesh—bananas fit that definition. Strawberries are "aggregate accessory fruits": the tiny seeds on the outside are each from a separate ovary.',
            arguments=None,
            type='text'
        )
    ],
    usage=Usage(input_tokens=12, output_tokens=261),
    finish_reason=FinishReason.STOP,
    original_response=None
)
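In most cases you only need the generated text rather than the whole response object. The sketch below shows one way to pull it out, assuming the response has the shape of the sample above: a `content` list whose text blocks carry `type='text'` and a `text` field. `response_text` is a hypothetical helper, not part of the SDK.

```python
from typing import Any


def response_text(response: Any) -> str:
    """Concatenate the `text` of every content block whose `type` is 'text'.

    Blocks of any other type, and text blocks whose `text` is None, are skipped.
    (Hypothetical helper for illustration; not a TensorZero API.)
    """
    return "".join(
        block.text
        for block in response.content
        if getattr(block, "type", None) == "text" and block.text is not None
    )
```

For the sample response, `response_text(response)` would return the banana fun fact as a single string. Token counts are available separately under `response.usage`.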

See the Inference API Reference for more details on the request and response formats.

To set up multiple providers with routing and fallbacks, see Configure models and providers. To manage your LLM logic with experimentation and observability, see Configure functions and variants.
