Performance Optimization
Production-grade performance optimizations for the Spice Framework.
Overviewβ
Spice v0.2.1 introduces powerful performance optimization decorators that can dramatically improve throughput, reduce costs, and decrease latency in production deployments.
Key Featuresβ
1. CachedAgent - Response Caching β‘β
Intelligent caching wrapper that reduces LLM API costs and latency:
Benefits:
- π° 80%+ cost reduction for duplicate queries
- β‘ 75% latency reduction on cache hits
- π 5x throughput increase with high hit rates
- π§ Smart eviction with LRU + TTL
Example:
import io.github.noailabs.spice.performance.*
val llmAgent = buildClaudeAgent {
apiKey = System.getenv("ANTHROPIC_API_KEY")
model = "claude-3-5-sonnet-20241022"
}
// Wrap with caching
val cached = llmAgent.cached(
CachedAgent.CacheConfig(
maxSize = 500, // Max 500 cached responses
ttlSeconds = 1800, // 30 minute TTL
enableMetrics = true
)
)
// Use normally - caching is automatic
val result = cached.processComm(comm)
// Monitor effectiveness
val stats = cached.getCacheStats()
println(stats.hitRate) // 0.87 (87% hit rate)
2. BatchingCommBackend - Message Batching π¦β
Optimizes communication throughput by intelligently batching messages:
Benefits:
- π 15x throughput increase with optimal batch sizes
- π 93% network RTT reduction
- β‘ Automatic batching with configurable windows
- π― Order preservation (FIFO guarantee)
Example:
import io.github.noailabs.spice.performance.*
val backend = InMemoryCommBackend()
// Wrap with batching
val batchingBackend = backend.batched(
BatchingCommBackend.BatchConfig(
maxBatchSize = 20, // Batch up to 20 messages
batchWindowMs = 50, // Wait 50ms for more messages
maxWaitMs = 1000 // Never wait more than 1s
)
)
// Messages automatically batched
repeat(100) {
batchingBackend.send(comm)
}
// Monitor batching efficiency
val stats = batchingBackend.getBatchStats()
println("Efficiency: ${stats.efficiency * 100}%")
Quick Startβ
1. Add Dependenciesβ
Performance optimizations are included in spice-core:
dependencies {
implementation("io.github.no-ai-labs:spice-core:0.2.1")
}
2. Apply Cachingβ
// Wrap any agent with caching
val agent = buildAgent { ... }
val cached = agent.cached()
// Or configure custom settings
val cached = agent.cached(
CachedAgent.CacheConfig(
maxSize = 1000,
ttlSeconds = 3600, // 1 hour
enableMetrics = true
)
)
3. Apply Batchingβ
// Wrap any CommBackend with batching
val backend = InMemoryCommBackend()
val batched = backend.batched()
// Or configure custom settings
val batched = backend.batched(
BatchingCommBackend.BatchConfig(
maxBatchSize = 20,
batchWindowMs = 100,
maxWaitMs = 1000
)
)
When to Useβ
CachedAgentβ
Use caching when:
- β You have repetitive or similar queries
- β LLM API costs are significant
- β Response time is critical
- β Queries are deterministic
Don't use caching when:
- β Every query must be fresh
- β Responses are highly context-dependent
- β Memory is severely constrained
BatchingCommBackendβ
Use batching when:
- β High message throughput required
- β Network latency is a bottleneck
- β Messages can tolerate small delays (ms)
- β Multiple agents communicating frequently
Don't use batching when:
- β Every message is ultra-time-sensitive
- β Message rate is very low
- β Order dependencies are complex
Combining Optimizationsβ
You can stack multiple optimizations:
val agent = buildClaudeAgent { ... }
.cached() // Add caching
.traced() // Add observability
.rateLimited() // Add rate limiting (future)
val backend = InMemoryCommBackend()
.batched() // Add batching
.cached() // Could add backend caching (future)
Performance Benchmarksβ
CachedAgent Resultsβ
Test: 1000 requests with 50% duplicate queries
| Configuration | Avg Latency | Total Cost | Hit Rate |
|---|---|---|---|
| No cache | 2000ms | $10.00 | N/A |
| Cache (100 entries) | 1200ms | $6.00 | 40% |
| Cache (500 entries) | 800ms | $3.00 | 70% |
| Cache (1000 entries) | 600ms | $2.00 | 80% |
BatchingCommBackend Resultsβ
Test: 1000 messages sent to backend
| Batch Size | RTT Count | Total Time | Throughput |
|---|---|---|---|
| 1 (no batching) | 1000 | 50s | 20 msg/s |
| 10 | 100 | 8s | 125 msg/s |
| 20 | 50 | 6s | 167 msg/s |
| 50 | 20 | 5s | 200 msg/s |
Cache Statisticsβ
Monitor cache effectiveness:
val stats = cachedAgent.getCacheStats()
println("""
Cache Stats:
- Size: ${stats.size} / ${stats.maxSize}
- Hits: ${stats.hits}
- Misses: ${stats.misses}
- Hit Rate: ${stats.hitRate * 100}%
- TTL: ${stats.ttlSeconds}s
""")
Batch Statisticsβ
Monitor batching efficiency:
val stats = batchingBackend.getBatchStats()
println("""
Batch Stats:
- Total Batches: ${stats.totalBatches}
- Total Messages: ${stats.totalMessages}
- Avg Batch Size: ${stats.avgBatchSize}
- Current Pending: ${stats.currentPending}
- Efficiency: ${stats.efficiency * 100}%
""")
Cache Managementβ
Clear Cacheβ
// Clear all cached entries
cachedAgent.clearCache()
Cleanup Expired Entriesβ
// Remove expired entries (automatic, but can force)
cachedAgent.cleanupExpired()
Bypass Cacheβ
// Bypass cache for specific requests
val comm = Comm(
content = "Fresh query",
from = "user",
data = mapOf("bypass_cache" to "true")
)
Batch Managementβ
Force Flushβ
// Force flush pending messages
batchingBackend.flush()
Health Checkβ
val health = batchingBackend.health()
println("Healthy: ${health.healthy}")
println("Pending: ${health.pendingMessages}")
Best Practicesβ
Cachingβ
- Set appropriate TTL - Balance freshness vs hit rate
- Monitor hit rates - Adjust cache size if hit rate is low
- Use bypass flag - For queries that must be fresh
- Clear on updates - Clear cache when underlying data changes
Batchingβ
- Tune batch size - Test different sizes for your workload
- Balance latency - Smaller windows = lower latency
- Monitor efficiency - Aim for 80%+ batch utilization
- Force flush on shutdown - Ensure no message loss
Configuration Tuningβ
Cache Configurationβ
CachedAgent.CacheConfig(
maxSize = 1000, // Increase for higher hit rates
ttlSeconds = 3600, // Decrease for fresher data
enableMetrics = true, // Enable for monitoring
respectBypass = true // Honor bypass_cache flag
)
Batch Configurationβ
BatchingCommBackend.BatchConfig(
maxBatchSize = 10, // Increase for higher throughput
batchWindowMs = 100, // Decrease for lower latency
maxWaitMs = 1000, // Maximum acceptable delay
enableOrdering = true, // Preserve FIFO order
enableMetrics = true // Enable for monitoring
)
Next Stepsβ
- CachedAgent API: Detailed API reference
- BatchingCommBackend API: Detailed API reference
- Observability: Monitor performance
- Production Deployment: Production best practices