Caching Architecture¶

The AI Assistant System includes a sophisticated multi-layer caching architecture designed to improve performance and reduce API costs.

Overview¶

The caching system is built around several key components: - Cache Layers: Multiple storage backends (memory, Redis) - Compression: Reduces memory usage for cached content - Batching: Groups similar operations for efficiency - Integration: Seamless integration with agents and tools

Cache Layers¶

Memory Cache¶

The fastest cache layer, storing data in application memory.

from app.core.caching.layers.memory import MemoryCache

cache = MemoryCache(max_size=1000, ttl=3600)

Redis Cache¶

Distributed cache layer for multi-instance deployments.

from app.core.caching.layers.redis_cache import RedisCache

cache = RedisCache(redis_url="redis://localhost:6379/0")

Compression¶

The caching system includes intelligent compression to reduce memory usage:

from app.core.caching.compression.compressor import CacheCompressor

compressor = CacheCompressor()
compressed_data = compressor.compress(large_data)

Batching¶

Batch processing groups similar operations to reduce overhead:

from app.core.caching.batching.batch_processor import BatchProcessor

processor = BatchProcessor(batch_size=10, timeout=5.0)

Integration with Agents¶

The agent system includes automatic caching of responses:

from app.core.caching.integration.agent_cache import AgentCache

agent_cache = AgentCache(cache_layer=redis_cache)

Integration with Tools¶

Tool responses are automatically cached when enabled:

from app.core.caching.integration.tool_cache import ToolCache

tool_cache = ToolCache(cache_layer=memory_cache)

Configuration¶

Configure caching in your environment:

CACHE_ENABLED=true
CACHE_TTL=3600
REDIS_URL=redis://localhost:6379/0
CACHE_COMPRESSION=true

Monitoring¶

Monitor cache performance using the built-in metrics:

from app.core.caching.monitoring.metrics import CacheMetrics

metrics = CacheMetrics()
hit_rate = metrics.get_hit_rate()

Best Practices¶

Choose appropriate TTL: Set expiration times based on data volatility
Use compression: Enable compression for large cached objects
Monitor hit rates: Track cache effectiveness
Layer appropriately: Use memory cache for hot data, Redis for shared data