Getting started

By the end of this tutorial you will have a working iscc-search installation, an index with ISCC codes, and search results showing similar content.

Prerequisites

  • Python 3.10 or later
  • uv or pip for package installation

Install the package with uv:

uv add iscc-search

Or with pip:

pip install iscc-search

Verify the installation:

iscc-search version

What is ISCC?

ISCC (International Standard Content Code, ISO 24138) is a content fingerprinting system for digital media. It generates short codes from content - text, images, audio, video - that preserve similarity. Two documents with overlapping content produce ISCC codes that are close in Hamming distance. iscc-search exploits this property to find similar content across large collections.

For a deeper explanation, see the ISCC primer.
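
To make "close in Hamming distance" concrete, here is a minimal sketch that counts differing bits between two fingerprints and maps the count to a 0.0 to 1.0 value. The two byte strings are made-up sample values, not real ISCC units, and the `similarity` formula is an illustrative assumption rather than the scoring that iscc-search applies.

```python
def hamming_distance(a: bytes, b: bytes) -> int:
    """Count the bit positions at which two equal-length byte strings differ."""
    if len(a) != len(b):
        raise ValueError("inputs must have the same length")
    return sum(bin(x ^ y).count("1") for x, y in zip(a, b))


def similarity(a: bytes, b: bytes) -> float:
    """Map Hamming distance to a 0.0-1.0 value (1.0 means identical bits)."""
    if not a:
        raise ValueError("inputs must not be empty")
    return 1.0 - hamming_distance(a, b) / (len(a) * 8)


# Two made-up 64-bit fingerprints that differ only in the last byte.
fp_a = bytes.fromhex("f0f0f0f0f0f0f0f0")
fp_b = bytes.fromhex("f0f0f0f0f0f0f0f7")

print(hamming_distance(fp_a, fp_b))  # 3 differing bits (0xf0 XOR 0xf7 = 0x07)
print(similarity(fp_a, fp_b))        # 1 - 3/64 = 0.953125
```

A small distance relative to the fingerprint length means most bits agree, which is the property a similarity index relies on.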

Create an index

An index stores ISCC codes and enables similarity search. Start with the memory:// backend - it keeps everything in RAM and requires no setup.

import os

os.environ["ISCC_SEARCH_INDEX_URI"] = "memory://"

from iscc_search.options import get_index
from iscc_search.schema import IsccIndex

index = get_index()
index.create_index(IsccIndex(name="myindex"))

With the CLI:

iscc-search index add myindex --local
iscc-search index use myindex

With the REST API, start the server first:

ISCC_SEARCH_INDEX_URI=memory:// iscc-search serve --dev

Then create an index:

curl -X POST http://localhost:8000/indexes \
    -H "Content-Type: application/json" \
    -d '{"name": "myindex"}'
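
If you prefer to issue the same request from Python, the sketch below builds an equivalent POST with the standard library only. It assumes the dev server is listening on http://localhost:8000 as in the curl example; the helper name `build_create_index_request` is hypothetical and not part of iscc-search. The actual send is left commented out because it needs a running server.

```python
import json
import urllib.request

BASE_URL = "http://localhost:8000"  # dev server address used in this tutorial


def build_create_index_request(name: str) -> urllib.request.Request:
    """Build (but do not send) the POST /indexes request shown in the curl example."""
    payload = json.dumps({"name": name}).encode("utf-8")
    return urllib.request.Request(
        f"{BASE_URL}/indexes",
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_create_index_request("myindex")
print(request.get_method(), request.full_url)

# Sending requires the server from the previous step to be running:
# with urllib.request.urlopen(request) as response:
#     print(response.status, response.read().decode("utf-8"))
```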

Add ISCC codes

Each asset you add contains an ISCC-CODE - a composite fingerprint that encodes multiple similarity dimensions (content, data, instance). The index decomposes the code into individual units and indexes each one for search.

from iscc_search.schema import IsccEntry

assets = [
    IsccEntry(iscc_code="ISCC:KACYPXW445FTYNJ3CYSXHAFJMA2HUWULUNRFE3BLHRSCXYH2M5AEGQY"),
    IsccEntry(iscc_code="ISCC:KEC6CAS5WCRSL4AE"),
]
results = index.add_assets("myindex", assets)

for r in results:
    print(f"{r.iscc_id}  status={r.status}")

With the CLI, create a JSON file asset.json:

{
  "iscc_code": "ISCC:KACYPXW445FTYNJ3CYSXHAFJMA2HUWULUNRFE3BLHRSCXYH2M5AEGQY"
}
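
If you would rather generate the file than type it by hand, a short script can write it. This is only a convenience sketch: it writes asset.json into the current working directory and reads it back to confirm that it parses as JSON.

```python
import json
from pathlib import Path

# The same single-field asset as in the JSON example above.
asset = {"iscc_code": "ISCC:KACYPXW445FTYNJ3CYSXHAFJMA2HUWULUNRFE3BLHRSCXYH2M5AEGQY"}

path = Path("asset.json")
path.write_text(json.dumps(asset, indent=2) + "\n", encoding="utf-8")

# Read the file back to confirm it holds valid JSON.
loaded = json.loads(path.read_text(encoding="utf-8"))
print(loaded["iscc_code"])
```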

Add it to the active index:

iscc-search add asset.json

With the REST API:

curl -X POST http://localhost:8000/indexes/myindex/assets \
    -H "Content-Type: application/json" \
    -d '[{"iscc_code": "ISCC:KACYPXW445FTYNJ3CYSXHAFJMA2HUWULUNRFE3BLHRSCXYH2M5AEGQY"}]'

Each asset gets an auto-generated ISCC-ID (a unique identifier) if you do not provide one. The status field in the result tells you whether the asset was created or updated.

Index a ready-made dataset

To experiment with real data without preparing JSON files, index one of the published ISCC datasets from the HuggingFace Hub. List them with iscc-search datasets and pull one in:

iscc-search datasets
iscc-search hub iscc/iscc-flickr30k --limit 1000

The hub command auto-registers a local index named after the dataset.

Search for similar content

Pass an ISCC-CODE as a query. The engine compares it against all indexed codes and returns ranked matches.

from iscc_search.schema import IsccQuery

query = IsccQuery(iscc_code="ISCC:KACYPXW445FTYNJ3CYSXHAFJMA2HUWULUNRFE3BLHRSCXYH2M5AEGQY")
results = index.search_assets("myindex", query, limit=10)

for match in results.global_matches:
    print(f"{match.iscc_id}  score={match.score}")

index.close()

With the CLI:

iscc-search search "ISCC:KACYPXW445FTYNJ3CYSXHAFJMA2HUWULUNRFE3BLHRSCXYH2M5AEGQY" --limit 10

With the REST API:

curl -X POST http://localhost:8000/indexes/myindex/search \
    -H "Content-Type: application/json" \
    -d '{"iscc_code": "ISCC:KACYPXW445FTYNJ3CYSXHAFJMA2HUWULUNRFE3BLHRSCXYH2M5AEGQY"}'

The score field ranges from 0.0 to 1.0. A score of 1.0 means the codes are identical. Scores above 0.75 (the default threshold) indicate strong similarity.
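
As an illustration of how the threshold can be applied on the client side, the sketch below keeps only matches that score above 0.75 and sorts them best first. The `Match` dataclass and the sample values are hypothetical stand-ins for search results, not the library's own classes or real output.

```python
from dataclasses import dataclass

DEFAULT_THRESHOLD = 0.75  # default threshold quoted in this tutorial


@dataclass
class Match:
    """Hypothetical stand-in for one search result; not the library's class."""

    iscc_id: str
    score: float


def strong_matches(matches, threshold=DEFAULT_THRESHOLD):
    """Keep matches scoring above the threshold, highest score first."""
    kept = [m for m in matches if m.score > threshold]
    return sorted(kept, key=lambda m: m.score, reverse=True)


# Made-up sample results: one weak, one identical, one strong.
sample = [Match("id-a", 0.62), Match("id-b", 1.0), Match("id-c", 0.81)]

for m in strong_matches(sample):
    print(f"{m.iscc_id}  score={m.score}")
```

With these sample values, only id-b and id-c pass the filter.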

Tip

You can also search using a GET request with a query parameter: GET /indexes/myindex/search?iscc_code=ISCC:KACYPXW445FTYNJ3CYSXHAFJMA2HUWULUNRFE3BLHRSCXYH2M5AEGQY
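
When building the GET URL in code, it is safer to let a library encode the query string than to concatenate it by hand. The sketch below uses the standard library for this. It assumes the server address from the earlier examples, and the helper name `build_search_url` is hypothetical.

```python
from urllib.parse import urlencode

BASE_URL = "http://localhost:8000"  # dev server address used in this tutorial


def build_search_url(index_name: str, iscc_code: str) -> str:
    """Return the GET search URL with the ISCC code as an encoded query parameter."""
    query = urlencode({"iscc_code": iscc_code})
    return f"{BASE_URL}/indexes/{index_name}/search?{query}"


url = build_search_url(
    "myindex",
    "ISCC:KACYPXW445FTYNJ3CYSXHAFJMA2HUWULUNRFE3BLHRSCXYH2M5AEGQY",
)
print(url)
```

`urlencode` percent-encodes the colon after `ISCC` as `%3A`; a standard query-string parser decodes it back to the original value.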

Try a persistent backend

The memory:// backend loses its data when the process exits. For persistent storage, use lmdb://, which stores indexes on disk using LMDB (Lightning Memory-Mapped Database).

import os

os.environ["ISCC_SEARCH_INDEX_URI"] = "lmdb:///tmp/iscc-data"

from iscc_search.options import get_index
from iscc_search.schema import IsccIndex

index = get_index()
index.create_index(IsccIndex(name="persistent"))

# Add assets and search as before...
# Data survives restarts.

index.close()

With the CLI:

iscc-search index add persistent --local --path /tmp/iscc-data
iscc-search index use persistent

Note

For production workloads with large collections, use the usearch:// backend. It adds HNSW (Hierarchical Navigable Small World) graph indexing for fast approximate nearest neighbor search. See the index backends guide.

Next steps