TensorZero supports two function types:

- chat: the default choice for most LLM chat completion use cases
- json: a specialized function type for generating structured outputs
As a rule of thumb, use JSON functions if you have a single, well-defined output schema.
If you need more flexibility (e.g. letting the model pick between multiple tools, or decide whether to use a tool at all), then Chat Functions with tool use might be a better fit.
1. Create a configuration file

Create a configuration file that defines your JSON function with the output schema and JSON mode.
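For example, a minimal `tensorzero.toml` could look like the following. This is a sketch: the variant name and model are illustrative, and the file paths assume the layout used in the rest of this guide.

```toml
[functions.extract_data]
type = "json"
output_schema = "functions/extract_data/output_schema.json"

[functions.extract_data.variants.baseline]
type = "chat_completion"
model = "openai::gpt-4o-mini"
system_template = "functions/extract_data/system_template.minijinja"
json_mode = "strict"
```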
If you don’t specify an output_schema, the gateway will default to accepting any valid JSON output.
The field json_mode can be one of the following: off, on, strict, or tool.
The tool strategy is a custom TensorZero implementation that leverages tool use under the hood for generating JSON.
See Configuration Reference for details.
Use "strict" mode for providers that support it (e.g. OpenAI) or "tool" for others.
2. Configure your output schema
If you choose to specify a schema, place it in the file referenced by output_schema in your configuration:
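For the extraction task in this guide, the schema could look like the following (a sketch based on the fields used in the prompt template below; `required` and `additionalProperties` are choices you may want to adjust):

output_schema.json

```json
{
  "type": "object",
  "properties": {
    "name": { "type": ["string", "null"] },
    "email": { "type": ["string", "null"] }
  },
  "required": ["name", "email"],
  "additionalProperties": false
}
```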
3. Create a prompt template

Create a template that instructs the model to extract the information you need.
system_template.minijinja
```jinja
You are a helpful AI assistant that extracts customer information from messages.

Extract the customer’s name and email address if present. Use null for any fields that are not found.

Your output should be a JSON object with the following schema:

{
  "name": string or null,
  "email": string or null
}

---

Examples:

User: Hi, I'm Sarah Johnson and you can reach me at [email protected]
Assistant: {"name": "Sarah Johnson", "email": "[email protected]"}

User: My email is [email protected]
Assistant: {"name": null, "email": "[email protected]"}

User: This is John Doe reaching out
Assistant: {"name": "John Doe", "email": null}
```
Including examples in your prompt helps the model understand the expected output format and improves accuracy.
4. Call the function
Python
When using the TensorZero SDK, the response will include raw and parsed values.
The parsed field contains the validated JSON object.
If the output doesn’t match the schema or isn’t valid JSON, parsed will be None and you can fall back to the raw string output.
```python
from tensorzero import TensorZeroGateway

t0 = TensorZeroGateway.build_http(gateway_url="http://localhost:3000")

response = t0.inference(
    function_name="extract_data",
    input={
        "messages": [
            {
                "role": "user",
                "content": "Hi, I'm Sarah Johnson and you can reach me at [email protected]",
            }
        ]
    },
)
```
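For example, you might use the validated object when it’s available and fall back to the raw string otherwise (a sketch based on the raw and parsed fields described above):

```python
if response.output.parsed is not None:
    # The output was valid JSON and matched the output schema
    print(response.output.parsed["name"], response.output.parsed["email"])
else:
    # The model produced invalid JSON or violated the schema
    print(response.output.raw)
```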
Python (OpenAI SDK)

When using the OpenAI SDK, the response content is the JSON string generated by the model.
TensorZero does not return a validated object.
```python
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:3000/openai/v1",
    api_key="unused",
)

response = client.chat.completions.create(
    model="tensorzero::function_name::extract_data",
    messages=[
        {
            "role": "user",
            "content": "Hi, I'm Sarah Johnson and you can reach me at [email protected]",
        }
    ],
)
```
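Because the content is a plain JSON string, you’ll typically parse (and optionally validate) it yourself. A minimal sketch:

```python
import json

content = response.choices[0].message.content
try:
    data = json.loads(content)
except json.JSONDecodeError:
    data = None  # fall back to handling the raw string in `content`
```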
Node (OpenAI SDK)

When using the OpenAI SDK, the response content is the JSON string generated by the model.
TensorZero does not return a validated object.
```typescript
import OpenAI from "openai";

const client = new OpenAI({
  baseURL: "http://localhost:3000/openai/v1",
  apiKey: "unused",
});

const response = await client.chat.completions.create({
  model: "tensorzero::function_name::extract_data",
  messages: [
    {
      role: "user",
      content: "Hi, I'm Sarah Johnson and you can reach me at [email protected]",
    },
  ],
});
```
HTTP

When using the TensorZero Inference API, the response will include raw and parsed values.
The parsed field contains the validated JSON object.
If the output doesn’t match the schema or isn’t valid JSON, parsed will be null and you can fall back to the raw string output.
```bash
curl -X POST "http://localhost:3000/inference" \
  -H "Content-Type: application/json" \
  -d '{
    "function_name": "extract_data",
    "input": {
      "messages": [
        {
          "role": "user",
          "content": "Hi, I'\''m Sarah Johnson and you can reach me at [email protected]"
        }
      ]
    }
  }'
```
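An abridged, illustrative response (IDs, token counts, and field values are placeholders; the email is elided):

```json
{
  "inference_id": "00000000-0000-0000-0000-000000000000",
  "episode_id": "00000000-0000-0000-0000-000000000000",
  "variant_name": "baseline",
  "output": {
    "raw": "{\"name\": \"Sarah Johnson\", \"email\": \"...\"}",
    "parsed": {
      "name": "Sarah Johnson",
      "email": "..."
    }
  },
  "usage": {
    "input_tokens": 100,
    "output_tokens": 10
  }
}
```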
While we recommend specifying a fixed schema in the configuration whenever possible, you can provide the output schema dynamically at inference time if your use case demands it.
See output_schema in the Inference API Reference or response_format in the Inference (OpenAI) API Reference.
You can also override json_mode at inference time if necessary.
Dynamic inference parameters like json_mode apply to specific variant types.
Unless you’re using an advanced variant type, the variant type will be chat_completion.
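For example, with the Python SDK (reusing the t0 client from above), both overrides might look like this. This is a sketch; see the API references above for the authoritative parameter shapes.

```python
response = t0.inference(
    function_name="extract_data",
    input={
        "messages": [
            {"role": "user", "content": "Hi, I'm Sarah Johnson"}
        ]
    },
    # Override the configured output schema for this inference only
    output_schema={
        "type": "object",
        "properties": {"name": {"type": ["string", "null"]}},
        "required": ["name"],
        "additionalProperties": False,
    },
    # Dynamic parameters are nested under the variant type (here, chat_completion)
    params={"chat_completion": {"json_mode": "strict"}},
)
```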
For the direct Anthropic provider, json_mode = "strict" automatically uses Anthropic’s structured outputs feature for guaranteed schema compliance.

AWS Bedrock and GCP Vertex AI do not support Anthropic’s structured outputs, so json_mode = "strict" falls back to prompt-based JSON mode. Use json_mode = "tool" for more reliable schema compliance on these providers.
For Anthropic’s extended thinking models, only json_mode = "strict" (direct Anthropic) or json_mode = "off" are compatible. Other modes use prefill or forced tool use, which conflict with thinking.
GCP Vertex AI Gemini and Google AI Studio support structured outputs, but only support a subset of the JSON Schema specification.
TensorZero automatically handles some known limitations, but certain output schemas will still be rejected by the model provider.
Refer to the Google documentation for details on supported JSON Schema features.
Some model providers (e.g. OpenAI, Google) support strictly enforcing output schemas natively, but others (e.g. AWS Bedrock) do not.
For providers without native support, you can still generate structured outputs with json_mode = "tool".
TensorZero converts your output schema into a tool call, then transforms the tool response back into JSON output.
You can set json_mode = "tool" in your configuration file or at inference time.
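For example, in the variant configuration from the earlier sketch, only the json_mode field changes:

```toml
[functions.extract_data.variants.baseline]
# ...other variant fields as in the earlier example...
json_mode = "tool"
```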