Deployment¶
This guide covers the settings and precautions required to run iscc-search reliably in production.
Single worker only¶
Data corruption risk
The usearch backend has no multi-process coordination. Running multiple workers against the same data directory will corrupt your indexes. This is not recoverable without a full re-index.
Always run with exactly one worker process. FastAPI's async/await handles concurrent connections within that single process.
Do not set ISCC_SEARCH_WORKERS to a value greater than 1 when using the usearch backend.
Docker quick start¶
services:
  iscc-search:
    image: ghcr.io/iscc/iscc-search:latest
    container_name: iscc-search-api
    ports:
      - 8000:8000
    volumes:
      - iscc-data:/data
    environment:
      - ISCC_SEARCH_INDEX_URI=usearch:///data
      # - ISCC_SEARCH_API_SECRET=your-secret-key
      # - ISCC_SEARCH_CORS_ORIGINS=https://example.com
    restart: unless-stopped
    stop_grace_period: 300s
    deploy:
      resources:
        limits:
          cpus: '2.0'
          memory: 4G
        reservations:
          cpus: '1.0'
          memory: 2G
volumes:
  iscc-data:
    driver: local
Start it by running docker compose up -d in the directory that contains the compose file.
Graceful shutdown¶
On shutdown, iscc-search flushes all dirty HNSW indexes to disk. Large indexes can take a few minutes to save. You must give the process enough time to complete the drain phase plus the flush phase.
uvicorn's shutdown is strictly sequential:
- uvicorn stops accepting new connections (immediate).
- uvicorn waits for in-flight requests to complete, bounded by --timeout-graceful-shutdown.
- uvicorn runs the FastAPI lifespan handler, which calls index.close() to flush HNSW shards. This step has no timeout in uvicorn; only Docker's stop_grace_period can stop it.
- Docker sends SIGKILL once stop_grace_period elapses since the initial SIGTERM.
This means:
If stop_grace_period equals timeout_graceful_shutdown, a slow request can consume the entire grace
window and Docker SIGKILLs the process the moment the lifespan flush tries to start, losing all dirty
HNSW state. This is a real failure mode, not a theoretical one.
Defaults:
- Dockerfile: --timeout-graceful-shutdown 60 (drain timeout)
- compose.yaml: stop_grace_period: 300s (60s drain + 240s flush headroom)
For very large indexes (10M+ vectors), raise stop_grace_period to 600s or more — the drain timeout
stays at 60s because it bounds request latency, not flush latency.
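The sequence above reduces to simple arithmetic. The following sketch is a model for reasoning about the timings, not part of iscc-search; the function name and its parameters are hypothetical. It checks whether the lifespan flush can finish before Docker sends SIGKILL, given how long the slowest in-flight request runs.

```python
def flush_survives(stop_grace_period: float,
                   drain_timeout: float,
                   slowest_request: float,
                   flush_duration: float) -> bool:
    """Return True if the flush finishes before SIGKILL (all values in seconds)."""
    # The drain phase ends when the slowest request completes
    # or when the drain timeout fires, whichever comes first.
    drain_used = min(slowest_request, drain_timeout)
    # The flush starts only after the drain phase; SIGKILL arrives
    # stop_grace_period seconds after the initial SIGTERM.
    return drain_used + flush_duration <= stop_grace_period


# Defaults from this guide: 60s drain timeout, 300s grace period.
print(flush_survives(300, 60, slowest_request=60, flush_duration=200))  # True
# Grace period equal to the drain timeout: no time is left for the flush.
print(flush_survives(60, 60, slowest_request=60, flush_duration=30))    # False
```

Plugging in a measured flush duration shows how much headroom a given stop_grace_period leaves.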
Environment variables¶
| Variable | Default | Production recommendation |
|---|---|---|
| ISCC_SEARCH_INDEX_URI | usearch:// + platform dir | usearch:///data (explicit path) |
| ISCC_SEARCH_API_SECRET | None (public) | Set a strong secret |
| ISCC_SEARCH_CORS_ORIGINS | * | Restrict to your domains |
| ISCC_SEARCH_FLUSH_INTERVAL | 100000 | Keep at default, or raise for higher write throughput |
| ISCC_SEARCH_LOG_LEVEL | info | info or warning |
| ISCC_SEARCH_SENTRY_DSN | None | Set for error tracking |
| ISCC_SEARCH_HOST | 0.0.0.0 | 0.0.0.0 |
| ISCC_SEARCH_PORT | 8000 | 8000 |
Flush interval
The default FLUSH_INTERVAL=100000 auto-flushes derived HNSW indexes every 100,000 mutations, capping
data loss on hard crashes (OOM, SIGKILL, power loss). Setting it to 0 disables auto-flush and means
indexes are only saved on graceful shutdown — faster ingestion but unbounded loss on crash. Raise the
value for slightly higher write throughput at the cost of a larger loss window.
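To make the trade-off concrete, here is a toy model of count-based auto-flush. It is not the iscc-search implementation; the class and attribute names are hypothetical, and it only shows how the interval bounds the number of unsaved mutations at any moment.

```python
class AutoFlushCounter:
    """Toy model: count mutations and flush every `flush_interval` of them."""

    def __init__(self, flush_interval: int) -> None:
        self.flush_interval = flush_interval  # 0 disables auto-flush
        self.dirty = 0      # mutations not yet saved to disk
        self.flushes = 0    # number of auto-flushes performed

    def record_mutation(self) -> None:
        self.dirty += 1
        if self.flush_interval and self.dirty >= self.flush_interval:
            self.flushes += 1  # a real index would save to disk here
            self.dirty = 0


counter = AutoFlushCounter(flush_interval=100_000)
for _ in range(250_000):
    counter.record_mutation()
# Two flushes happened; a hard crash at this point would leave
# the last 50,000 mutations unsaved in the derived index.
print(counter.flushes, counter.dirty)  # 2 50000
```

With flush_interval=0 the dirty count in this model grows without bound, which mirrors the unbounded loss window described above.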
Sizing profiles¶
| Profile | Assets | RAM | CPU | Notes |
|---|---|---|---|---|
| Sandbox | up to 100K | 4 GB | 2 cores | Development and testing |
| Validation | up to 500K | 64 GB | 4 cores | Pre-production validation |
| Launch | up to 1M | 128 GB | 8 cores | Initial production deployment |
| Growth | 5M+ | 256 GB+ | 16+ cores | Large-scale production |
HNSW indexes are memory-mapped. RAM requirements grow with index size. Monitor RSS and adjust limits.
Health probes¶
Use the built-in health endpoints with your orchestrator.
Liveness (/healthz): always returns 200 if the process responds. Use it for restart decisions.
Readiness (/readyz): returns 200 only when the index is initialized and list_indexes() succeeds.
Use it for traffic routing.
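Outside an orchestrator, both endpoints can be checked with a few lines of Python. This is a minimal sketch using only the standard library; the probe helper and the localhost:8000 address are assumptions for illustration, not part of iscc-search.

```python
import urllib.error
import urllib.request


def probe(base_url: str, path: str, timeout: float = 2.0) -> bool:
    """Return True if GET base_url + path answers with HTTP 200."""
    try:
        with urllib.request.urlopen(base_url + path, timeout=timeout) as resp:
            return resp.status == 200
    except (urllib.error.URLError, OSError):
        # Connection refused, timeouts, and HTTP error statuses count as failure.
        return False


if __name__ == "__main__":
    base = "http://localhost:8000"  # assumed address, matching the compose example
    print("live:", probe(base, "/healthz"))
    print("ready:", probe(base, "/readyz"))
```

A process that is live but not ready should be kept running and left out of the load balancer pool until /readyz returns 200.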
Horizontal scaling¶
Each iscc-search instance must have its own data volume. Shared volumes between instances will corrupt data.
Run independent instances behind a load balancer. Each instance holds a separate copy of the index (or a partition of the data).
upstream iscc_search {
    server iscc-search-1:8000;
    server iscc-search-2:8000;
    server iscc-search-3:8000;
}
server {
    listen 443 ssl;
    location / {
        proxy_pass http://iscc_search;
    }
}
Feed the same data to each instance, or shard by content type. Writes must be routed to the correct instance.
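When sharding by content type, write routing can be as simple as a lookup table in the ingestion client. The sketch below is illustrative only: the content type names and their assignment to the three instances from the nginx example are hypothetical.

```python
# Hypothetical mapping from content type to the instance that owns it.
SHARDS = {
    "text": "http://iscc-search-1:8000",
    "image": "http://iscc-search-2:8000",
    "audio": "http://iscc-search-3:8000",
}


def write_target(content_type: str) -> str:
    """Return the base URL of the instance that must receive this write."""
    try:
        return SHARDS[content_type]
    except KeyError:
        # Fail loudly instead of guessing a default instance.
        raise ValueError(f"no shard configured for content type {content_type!r}") from None


print(write_target("image"))  # http://iscc-search-2:8000
```

Raising on an unknown content type keeps a misconfigured client from writing to an arbitrary instance.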
Production checklist¶
Before going live, verify the following:
- ISCC_SEARCH_INDEX_URI points to a persistent volume
- Worker count is 1 (or unset)
- ISCC_SEARCH_FLUSH_INTERVAL is non-zero (default 100000)
- ISCC_SEARCH_API_SECRET is set
- ISCC_SEARCH_CORS_ORIGINS is restricted to your domains
- stop_grace_period is >= timeout_graceful_shutdown + expected_flush_duration (default 300s covers ~60s drain + ~240s flush; raise for indexes over 10M vectors)
- Health probes are configured in your orchestrator
- Each instance has its own data volume (no sharing)
- Resource limits (CPU, memory) match your sizing profile
- Sentry DSN is configured for error tracking
- Backups are scheduled for the data volume
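The environment-variable items in this checklist can be verified automatically before deployment. The following is a sketch under stated assumptions: the preflight function is hypothetical, it only inspects the variables named in this guide, and it cannot verify volumes, probes, or backups.

```python
import os


def preflight(env: dict[str, str]) -> list[str]:
    """Return a list of checklist violations found in the given environment."""
    problems = []
    if not env.get("ISCC_SEARCH_INDEX_URI"):
        problems.append("ISCC_SEARCH_INDEX_URI is not set explicitly")
    if env.get("ISCC_SEARCH_WORKERS", "1").strip() != "1":
        problems.append("ISCC_SEARCH_WORKERS must be 1 or unset")
    if env.get("ISCC_SEARCH_FLUSH_INTERVAL", "100000").strip() == "0":
        problems.append("ISCC_SEARCH_FLUSH_INTERVAL is 0 (auto-flush disabled)")
    if not env.get("ISCC_SEARCH_API_SECRET"):
        problems.append("ISCC_SEARCH_API_SECRET is not set")
    if env.get("ISCC_SEARCH_CORS_ORIGINS", "*").strip() == "*":
        problems.append("ISCC_SEARCH_CORS_ORIGINS is unrestricted")
    return problems


# Check the current process environment and report each violation.
for problem in preflight(dict(os.environ)):
    print("FAIL:", problem)
```

Running this in the same environment as the container (for example in a CI step) catches the most common misconfigurations before the first write reaches the index.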
Troubleshooting¶
Derived indexes out of sync¶
Symptom: Boot logs show ShardedNphdIndex 'X' out of sync: expected N vectors, found M or
UsearchSimprintIndex 'X' out of sync: expected N, found M. Search results are stale or empty for affected
unit/simprint types. Asset counts in /indexes reflect LMDB and may be larger than what searches return.
Cause: The process was killed (SIGKILL, OOM, host crash, stop_grace_period too short) before the
lifespan handler could flush dirty HNSW shards to disk. LMDB is the source of truth and survives unclean
exits; derived HNSW shards do not unless they were saved by flush_interval rotation, shard-size rotation,
or graceful close().
Fix: Stop the server, then run an explicit rebuild from the intact LMDB data. Auto-rebuild on startup is intentionally disabled because rebuilding large indexes can OOM the container.
# One-shot rebuild from a Python REPL or script (until the CLI command lands)
from iscc_search.indexes.usearch.index import UsearchIndex
idx = UsearchIndex("/path/to/index-dir")
for unit_type in ("META_NONE_V0", "DATA_NONE_V0", "CONTENT_TEXT_V0", "SEMANTIC_TEXT_V0"):
    idx._rebuild_nphd_index(unit_type)
for sp_type in ("CONTENT_TEXT_V0", "SEMANTIC_TEXT_V0"):
    idx._rebuild_simprint_index(sp_type)
idx.close()
Restart the server. To prevent recurrence, ensure ISCC_SEARCH_FLUSH_INTERVAL is set to a non-zero value
(default 100000) and size stop_grace_period as
timeout_graceful_shutdown + expected_flush_duration + buffer (the default 300s allows 60s for drain and 240s for flush plus buffer).
LMDB corruption¶
Symptom: Server crashes on startup with lmdb.Error reading index.lmdb, or asset retrieval returns
malformed data.
Cause: Disk corruption, a process kill mid-write at the LMDB layer (very rare, because LMDB uses MVCC and is crash-safe by design), or a downgrade to an LMDB version with an incompatible on-disk format.
Fix: Restore index.lmdb from backup, then run the rebuild procedure above to regenerate derived
indexes. Re-ingest if no backup is available.
Slow shutdown¶
Symptom: Container takes a long time to stop or is killed by Docker after the grace period.
Cause: Large HNSW indexes need time to flush to disk. The grace period is too short.
Fix: Raise stop_grace_period in your compose file. Keep --timeout-graceful-shutdown at the default
60s (it bounds request drain, not flush). Use the formula
stop_grace_period = 60s + measured_flush_duration + 60s buffer. Monitor shutdown logs
(Saved ShardedNphdIndex, Saved UsearchSimprintIndex) to measure actual flush duration.
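The formula translates directly into a helper that produces a value for the compose file. This is a minimal sketch; the function name is hypothetical, and the 60s defaults for drain and buffer are taken from the formula above.

```python
import math


def recommended_stop_grace_period(measured_flush_s: float,
                                  drain_timeout_s: int = 60,
                                  buffer_s: int = 60) -> str:
    """Apply drain + measured flush + buffer and format it as a compose duration."""
    # Round the measured flush duration up so the result never undershoots.
    total = drain_timeout_s + math.ceil(measured_flush_s) + buffer_s
    return f"{total}s"


# A flush measured at 412.3 seconds from the shutdown logs:
print(recommended_stop_grace_period(412.3))  # 533s
```

Re-measure after significant index growth, since flush duration increases with index size.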
Out of memory¶
Symptom: Process is killed by the OOM killer. Container restarts repeatedly.
Cause: HNSW indexes are memory-mapped. Index size exceeds available RAM.
Fix: Increase the memory limit in your container configuration. Refer to the sizing profiles table above. Consider sharding data across multiple instances.
Memory budget under container limits¶
When running with a container memory limit (e.g., mem_limit: 2g), three components compete for the budget:
| Component | What it is | Grows with |
|---|---|---|
| Python heap | Process memory (buffers, active shards, Python objects) | Concurrent writes, batch size |
| LMDB mmap | Memory-mapped pages of index.lmdb | Total assets stored |
| Shard mmaps | Memory-mapped views of sealed .usearch shard files | Number and size of shards |
All three count against the cgroup memory limit. The kernel reclaims file-backed pages (LMDB and shard mmaps) under pressure, but reclaimed pages cause additional read I/O when accessed again. Under tight limits this creates a feedback loop: reclaim page, re-read from disk, reclaim again.
Stress testing at a 2 GB limit with ~9K assets showed:
- Python heap: ~910 MB
- LMDB mmap (file-backed): ~470 MB
- Kernel overhead: ~22 MB
- 3,616 page reclamation events, zero OOM kills
The server survived at the absolute ceiling, but I/O increased significantly from re-reading evicted pages.
Recommendations:
- Set memory limits to at least 2x the expected LMDB size plus headroom for Python heap
- Monitor memory.stat (anonymous vs file-backed) to understand where memory goes
- With many small shards (e.g., low SHARD_SIZE_* settings), each sealed shard is mmap'd on load; hundreds of shard files increase file-backed memory pressure and slow startup on IOPS-limited storage
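The anonymous vs file-backed split can be read from inside the container. The sketch below assumes cgroup v2, where memory.stat lives at /sys/fs/cgroup/memory.stat and reports anon and file in bytes; cgroup v1 uses different key names and paths. The helper names are hypothetical.

```python
from pathlib import Path


def parse_memory_stat(text: str) -> dict[str, int]:
    """Parse the 'key value' lines of a cgroup memory.stat file into a dict."""
    stats = {}
    for line in text.splitlines():
        key, _, value = line.partition(" ")
        if value.strip().isdigit():
            stats[key] = int(value)
    return stats


def anon_vs_file_mib(stats: dict[str, int]) -> tuple[float, float]:
    """Return (anonymous, file-backed) memory in MiB; assumes cgroup v2 key names."""
    mib = 1024 * 1024
    return stats.get("anon", 0) / mib, stats.get("file", 0) / mib


# Assumed cgroup v2 location inside the container; adjust for your runtime.
stat_path = Path("/sys/fs/cgroup/memory.stat")
if stat_path.exists():
    anon, file_backed = anon_vs_file_mib(parse_memory_stat(stat_path.read_text()))
    print(f"anon: {anon:.0f} MiB, file-backed: {file_backed:.0f} MiB")
```

A high anonymous share points at the Python heap, while a high file-backed share points at the LMDB and shard mmaps, which the kernel can reclaim under pressure.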