Skip to main content

Performance Optimization

Production-grade performance optimizations for the Spice Framework.

Overview​

Spice v0.2.1 introduces powerful performance optimization decorators that can dramatically improve throughput, reduce costs, and decrease latency in production deployments.

Key Features​

1. CachedAgent - Response Caching βš‘β€‹

Intelligent caching wrapper that reduces LLM API costs and latency:

Benefits:

  • πŸ’° 80%+ cost reduction for duplicate queries
  • ⚑ 75% latency reduction on cache hits
  • πŸš€ 5x throughput increase with high hit rates
  • 🧠 Smart eviction with LRU + TTL

Example:

import io.github.noailabs.spice.performance.*

val llmAgent = buildClaudeAgent {
apiKey = System.getenv("ANTHROPIC_API_KEY")
model = "claude-3-5-sonnet-20241022"
}

// Wrap with caching
val cached = llmAgent.cached(
CachedAgent.CacheConfig(
maxSize = 500, // Max 500 cached responses
ttlSeconds = 1800, // 30 minute TTL
enableMetrics = true
)
)

// Use normally - caching is automatic
val result = cached.processComm(comm)

// Monitor effectiveness
val stats = cached.getCacheStats()
println(stats.hitRate) // 0.87 (87% hit rate)

2. BatchingCommBackend - Message Batching πŸ“¦β€‹

Optimizes communication throughput by intelligently batching messages:

Benefits:

  • πŸ“Š 15x throughput increase with optimal batch sizes
  • 🌐 93% network RTT reduction
  • ⚑ Automatic batching with configurable windows
  • 🎯 Order preservation (FIFO guarantee)

Example:

import io.github.noailabs.spice.performance.*

val backend = InMemoryCommBackend()

// Wrap with batching
val batchingBackend = backend.batched(
BatchingCommBackend.BatchConfig(
maxBatchSize = 20, // Batch up to 20 messages
batchWindowMs = 50, // Wait 50ms for more messages
maxWaitMs = 1000 // Never wait more than 1s
)
)

// Messages automatically batched
repeat(100) {
batchingBackend.send(comm)
}

// Monitor batching efficiency
val stats = batchingBackend.getBatchStats()
println("Efficiency: ${stats.efficiency * 100}%")

Quick Start​

1. Add Dependencies​

Performance optimizations are included in spice-core:

dependencies {
implementation("io.github.no-ai-labs:spice-core:0.2.1")
}

2. Apply Caching​

// Wrap any agent with caching
val agent = buildAgent { ... }
val cached = agent.cached()

// Or configure custom settings
val cached = agent.cached(
CachedAgent.CacheConfig(
maxSize = 1000,
ttlSeconds = 3600, // 1 hour
enableMetrics = true
)
)

3. Apply Batching​

// Wrap any CommBackend with batching
val backend = InMemoryCommBackend()
val batched = backend.batched()

// Or configure custom settings
val batched = backend.batched(
BatchingCommBackend.BatchConfig(
maxBatchSize = 20,
batchWindowMs = 100,
maxWaitMs = 1000
)
)

When to Use​

CachedAgent​

Use caching when:

  • βœ… You have repetitive or similar queries
  • βœ… LLM API costs are significant
  • βœ… Response time is critical
  • βœ… Queries are deterministic

Don't use caching when:

  • ❌ Every query must be fresh
  • ❌ Responses are highly context-dependent
  • ❌ Memory is severely constrained

BatchingCommBackend​

Use batching when:

  • βœ… High message throughput required
  • βœ… Network latency is a bottleneck
  • βœ… Messages can tolerate small delays (ms)
  • βœ… Multiple agents communicating frequently

Don't use batching when:

  • ❌ Every message is ultra-time-sensitive
  • ❌ Message rate is very low
  • ❌ Order dependencies are complex

Combining Optimizations​

You can stack multiple optimizations:

val agent = buildClaudeAgent { ... }
.cached() // Add caching
.traced() // Add observability
.rateLimited() // Add rate limiting (future)

val backend = InMemoryCommBackend()
.batched() // Add batching
.cached() // Could add backend caching (future)

Performance Benchmarks​

CachedAgent Results​

Test: 1000 requests with 50% duplicate queries

ConfigurationAvg LatencyTotal CostHit Rate
No cache2000ms$10.00N/A
Cache (100 entries)1200ms$6.0040%
Cache (500 entries)800ms$3.0070%
Cache (1000 entries)600ms$2.0080%

BatchingCommBackend Results​

Test: 1000 messages sent to backend

Batch SizeRTT CountTotal TimeThroughput
1 (no batching)100050s20 msg/s
101008s125 msg/s
20506s167 msg/s
50205s200 msg/s

Cache Statistics​

Monitor cache effectiveness:

val stats = cachedAgent.getCacheStats()

println("""
Cache Stats:
- Size: ${stats.size} / ${stats.maxSize}
- Hits: ${stats.hits}
- Misses: ${stats.misses}
- Hit Rate: ${stats.hitRate * 100}%
- TTL: ${stats.ttlSeconds}s
""")

Batch Statistics​

Monitor batching efficiency:

val stats = batchingBackend.getBatchStats()

println("""
Batch Stats:
- Total Batches: ${stats.totalBatches}
- Total Messages: ${stats.totalMessages}
- Avg Batch Size: ${stats.avgBatchSize}
- Current Pending: ${stats.currentPending}
- Efficiency: ${stats.efficiency * 100}%
""")

Cache Management​

Clear Cache​

// Clear all cached entries
cachedAgent.clearCache()

Cleanup Expired Entries​

// Remove expired entries (automatic, but can force)
cachedAgent.cleanupExpired()

Bypass Cache​

// Bypass cache for specific requests
val comm = Comm(
content = "Fresh query",
from = "user",
data = mapOf("bypass_cache" to "true")
)

Batch Management​

Force Flush​

// Force flush pending messages
batchingBackend.flush()

Health Check​

val health = batchingBackend.health()
println("Healthy: ${health.healthy}")
println("Pending: ${health.pendingMessages}")

Best Practices​

Caching​

  1. Set appropriate TTL - Balance freshness vs hit rate
  2. Monitor hit rates - Adjust cache size if hit rate is low
  3. Use bypass flag - For queries that must be fresh
  4. Clear on updates - Clear cache when underlying data changes

Batching​

  1. Tune batch size - Test different sizes for your workload
  2. Balance latency - Smaller windows = lower latency
  3. Monitor efficiency - Aim for 80%+ batch utilization
  4. Force flush on shutdown - Ensure no message loss

Configuration Tuning​

Cache Configuration​

CachedAgent.CacheConfig(
maxSize = 1000, // Increase for higher hit rates
ttlSeconds = 3600, // Decrease for fresher data
enableMetrics = true, // Enable for monitoring
respectBypass = true // Honor bypass_cache flag
)

Batch Configuration​

BatchingCommBackend.BatchConfig(
maxBatchSize = 10, // Increase for higher throughput
batchWindowMs = 100, // Decrease for lower latency
maxWaitMs = 1000, // Maximum acceptable delay
enableOrdering = true, // Preserve FIFO order
enableMetrics = true // Enable for monitoring
)

Next Steps​