Usage
The TensorZero Gateway supports the following cache modes:
- write_only (default): Only write to cache but don’t serve cached responses.
- read_only: Only read from cache but don’t write new entries.
- on: Both read from and write to cache.
- off: Disable caching completely.
Example
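The four modes differ only in whether the gateway may read from the cache and whether it may write to it. The sketch below models just that decision. The mode strings come from the list above. The names CacheMode, can_read, can_write, and DEFAULT_MODE are hypothetical illustrations, not TensorZero APIs.

```python
from enum import Enum


class CacheMode(Enum):
    """Hypothetical enum mirroring the documented cache mode values."""
    WRITE_ONLY = "write_only"  # the documented default
    READ_ONLY = "read_only"
    ON = "on"
    OFF = "off"


def can_read(mode: CacheMode) -> bool:
    """True if a cached response may be served under this mode."""
    return mode in (CacheMode.READ_ONLY, CacheMode.ON)


def can_write(mode: CacheMode) -> bool:
    """True if a fresh model response may be stored under this mode."""
    return mode in (CacheMode.WRITE_ONLY, CacheMode.ON)


DEFAULT_MODE = CacheMode.WRITE_ONLY

# Print the read/write behavior of every mode as a small table.
for mode in CacheMode:
    print(f"{mode.value:<10} read={can_read(mode)!s:<5} write={can_write(mode)}")
```

Treating the modes as two independent booleans, read and write, shows why the default is write_only. It populates the cache without changing any responses until a caller explicitly opts into reading.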
Technical Notes
- The cache applies to individual model requests, not inference requests. This means that the following will be cached separately: multiple variants of the same function; multiple calls to the same function with different parameters; individual model requests for inference-time optimizations; and so on.
- The max_age_s parameter applies to the retrieval of cached responses: an entry older than max_age_s seconds is not served. The cache does not automatically delete old entries (i.e., it is not a TTL).
- When the gateway serves a cached response, the usage fields are set to zero.
- The cache data is stored in ClickHouse.
- For batch inference, the gateway only writes to the cache but does not serve cached responses.
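The notes above can be modeled with a small in-memory cache. This is a minimal sketch under stated assumptions, not the gateway's implementation. The real cache lives in ClickHouse, and the hashing scheme in ToyModelCache.key is a hypothetical stand-in. All class and function names here are illustrative. The sketch assumes the "on" mode, keys each *model* request separately, applies max_age_s only when reading, never evicts entries, zeroes usage on a hit, and skips the read step for batch requests.

```python
import hashlib
import json
import time
from dataclasses import dataclass, replace
from typing import Callable, Optional


@dataclass(frozen=True)
class Usage:
    input_tokens: int
    output_tokens: int


@dataclass(frozen=True)
class ModelResponse:
    output: str
    usage: Usage
    cached: bool = False


class ToyModelCache:
    """Hypothetical in-memory stand-in for the gateway's ClickHouse-backed cache."""

    def __init__(self) -> None:
        # key -> (created_at, response); entries are never evicted (no TTL).
        self._entries: dict[str, tuple[float, ModelResponse]] = {}

    @staticmethod
    def key(model_request: dict) -> str:
        # Hypothetical key: a hash of a single model request, so different
        # variants or parameters produce different cache entries.
        blob = json.dumps(model_request, sort_keys=True).encode()
        return hashlib.sha256(blob).hexdigest()

    def write(self, model_request: dict, response: ModelResponse, now: float) -> None:
        self._entries[self.key(model_request)] = (now, response)

    def read(self, model_request: dict, max_age_s: Optional[float], now: float) -> Optional[ModelResponse]:
        entry = self._entries.get(self.key(model_request))
        if entry is None:
            return None
        created_at, response = entry
        # max_age_s only filters what is served; the stale entry stays stored.
        if max_age_s is not None and now - created_at > max_age_s:
            return None
        # A cache hit reports zero usage.
        return replace(response, usage=Usage(0, 0), cached=True)


def serve(cache: ToyModelCache, model_request: dict,
          call_model: Callable[[dict], ModelResponse], *,
          max_age_s: Optional[float] = None, batch: bool = False,
          now: Optional[float] = None) -> ModelResponse:
    """Hypothetical request path, assuming the "on" cache mode."""
    now = time.time() if now is None else now
    if not batch:  # batch requests never read from the cache
        hit = cache.read(model_request, max_age_s, now)
        if hit is not None:
            return hit
    response = call_model(model_request)
    cache.write(model_request, response, now)
    return response
```

A request issued again within max_age_s returns the stored output with zero usage. A request with a smaller max_age_s misses, even though the older entry is still stored.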