# Batching Backend

> **Coming Soon:** Guide on batching LLM requests for optimal throughput.

## Preview

Batching combines multiple requests to reduce overhead:

```kotlin
// Coming in v0.5.0
val batchedBackend = OpenAIBackend().batched(
    maxBatchSize = 10,
    maxWaitTime = 100.milliseconds
)
```

## See Also

- Performance Overview
- Tool Caching
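Until the guide lands, here is a minimal, hedged sketch of the general idea behind such a backend: incoming requests are queued and flushed to the underlying backend either when `maxBatchSize` requests have accumulated or when `maxWaitTime` has elapsed, whichever comes first. The `LlmBackend`, `Prompt`, `Completion`, and `BatchingBackend` names below are illustrative placeholders, not part of the published API.

```kotlin
import kotlinx.coroutines.*
import kotlinx.coroutines.channels.Channel
import kotlin.time.Duration
import kotlin.time.Duration.Companion.milliseconds

// Hypothetical placeholder types, for illustration only.
data class Prompt(val text: String)
data class Completion(val text: String)

// Hypothetical backend interface: one upstream call handles a whole batch of prompts.
interface LlmBackend {
    suspend fun complete(prompts: List<Prompt>): List<Completion>
}

// Pairs a prompt with the deferred result its caller awaits.
private class Pending(val prompt: Prompt, val reply: CompletableDeferred<Completion>)

class BatchingBackend(
    private val delegate: LlmBackend,
    private val maxBatchSize: Int = 10,
    private val maxWaitTime: Duration = 100.milliseconds,
    scope: CoroutineScope,
) {
    private val queue = Channel<Pending>(Channel.UNLIMITED)

    init {
        // Background worker: flush when the batch is full or the wait window expires.
        // Cancellation edge cases are intentionally glossed over in this sketch.
        scope.launch {
            while (isActive) {
                val first = queue.receive()          // wait until at least one request arrives
                val batch = mutableListOf(first)
                withTimeoutOrNull(maxWaitTime) {
                    while (batch.size < maxBatchSize) {
                        batch += queue.receive()     // keep filling until the size or time limit hits
                    }
                }
                try {
                    // One upstream call for the whole batch; fan results back out to callers.
                    val results = delegate.complete(batch.map { it.prompt })
                    batch.zip(results).forEach { (pending, completion) ->
                        pending.reply.complete(completion)
                    }
                } catch (e: Exception) {
                    batch.forEach { it.reply.completeExceptionally(e) }
                }
            }
        }
    }

    // Callers keep a single-request API; batching happens behind the scenes.
    suspend fun complete(prompt: Prompt): Completion {
        val pending = Pending(prompt, CompletableDeferred())
        queue.send(pending)
        return pending.reply.await()
    }
}
```

The size/time pair mirrors the preview parameters above: `maxBatchSize` caps per-call overhead amortization, while `maxWaitTime` bounds the extra latency any single request can pay while waiting for the batch to fill.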