SearX Integration¶
SearX is a privacy-respecting, hackable metasearch engine that allows the AI assistant to perform real-time web searches. This integration provides up-to-date information beyond the LLM's training data cutoff.
Overview¶
The SearX integration enables the AI assistant to: - Search the web for current information - Access real-time data and news - Bypass model knowledge limitations - Provide citations and sources for information
Configuration¶
Environment Variables¶
Add the following to your .env
file:
# SearX instance URL (self-hosted or public instance)
SEARXNG_URL=http://localhost:8080
# Optional: API key if using a secured instance
SEARX_API_KEY=your_api_key_here
# Search settings
SEARX_TIMEOUT=30
SEARX_MAX_RESULTS=10
Supported SearX Instances¶
You can use: - Self-hosted: Run your own SearX instance for maximum privacy - Public instances: Use reliable public SearX instances - Cloud deployment: Deploy to cloud providers for scalability
Implementation Status¶
Current Status: Planned Feature
This integration is part of the roadmap and will be implemented in future releases.
Planned Features¶
Basic Search Functionality¶
# Planned implementation
async def search_web(query: str, categories: List[str] = None) -> List[SearchResult]:
"""
Perform web search using SearX
Args:
query: Search query
categories: Optional search categories (news, images, etc.)
Returns:
List of search results with titles, URLs, and snippets
"""
pass
Advanced Search Options¶
- Category filtering: News, images, videos, maps
- Language targeting: Search in specific languages
- Time filtering: Recent results only
- Region targeting: Country-specific results
Result Processing¶
- Relevance scoring: Rank results by relevance
- Content extraction: Extract key information from pages
- Citation generation: Create proper citations for responses
- Duplicate detection: Filter duplicate results
Usage Examples¶
Basic Search Integration¶
# Planned usage in tool calling
@tool
async def web_search(query: str) -> str:
"""
Search the web for current information
Use this tool when you need up-to-date information that
may not be in the model's training data.
"""
results = await searx_client.search(query)
return format_search_results(results)
Integration with AI Agent¶
The SearX tool will be automatically available to the AI assistant when needed. The agent will determine when real-time information is required and use the search tool accordingly.
Setup Instructions¶
Option 1: Self-Hosted SearX (Recommended)¶
-
Install SearX:
# Using Docker (easiest method) docker run -d -p 8080:8080 --name searx searx/searx
-
Verify installation:
curl http://localhost:8080
-
Configure the AI assistant:
echo "SEARXNG_URL=http://localhost:8080" >> .env
Option 2: Public Instance¶
- Find a reliable public instance from searx.space
- Update configuration:
echo "SEARXNG_URL=https://public.searx.instance" >> .env
Security Considerations¶
Privacy Protection¶
- SearX doesn't track users or store search history
- Requests are anonymized through the SearX instance
- No personal data is shared with search engines
Rate Limiting¶
- Implement appropriate rate limiting to avoid abuse
- Respect search engine terms of service
- Use caching to reduce duplicate requests
Content Safety¶
- Implement result filtering for safe content
- Validate URLs before accessing content
- Use HTTPS for secure connections
Error Handling¶
Common Issues¶
- Instance unavailable: Fallback to cached results or disable feature
- Rate limiting: Implement exponential backoff
- Network errors: Retry with circuit breaker pattern
Planned Error Handling¶
class SearXClient:
async def search_with_fallback(self, query: str) -> SearchResults:
try:
return await self.search(query)
except SearXError as e:
logger.warning(f"SearX search failed: {e}")
return await self.get_cached_results(query)
Performance Optimization¶
Caching Strategy¶
- Query result caching: Cache search results for common queries
- Content caching: Cache extracted content from pages
- TTL settings: Set appropriate cache expiration times
Async Operations¶
- Use async/await for non-blocking operations
- Implement connection pooling for HTTP requests
- Use background tasks for content processing
Testing¶
Unit Tests¶
@pytest.mark.asyncio
async def test_searx_search():
"""Test basic search functionality"""
client = SearXClient("http://localhost:8080")
results = await client.search("test query")
assert len(results) > 0
Integration Tests¶
@pytest.mark.integration
async def test_searx_integration():
"""Test full integration with AI agent"""
# Test that agent can use search tool effectively
pass
Roadmap¶
Phase 1: Basic Integration¶
- [ ] Implement basic search functionality
- [ ] Add error handling and retry logic
- [ ] Create unit tests
Phase 2: Advanced Features¶
- [ ] Add result filtering and ranking
- [ ] Implement content extraction
- [ ] Add citation generation
Phase 3: Production Ready¶
- [ ] Performance optimization
- [ ] Comprehensive error handling
- [ ] Security hardening
Related Documentation¶
Support¶
For issues with SearX integration: 1. Check the SearX instance is running and accessible 2. Verify network connectivity and firewall settings 3. Review logs for detailed error information 4. Consult the troubleshooting guide