Substreams
- Name: Substreams
- URL: https://docs.substreams.dev/
- Category: Blockchain data extraction / streaming-first indexing / parallel data-transformation infrastructure
- Summary: Substreams is a streaming-first transformation layer built on top of Firehose. Rust/WASM modules, package reuse, parallel execution, and sink choice are the real story; much of an indexing pipeline's power is decided at this layer, before anyone sees a clean query surface.
- What it does:
- Lets developers define Rust/WASM modules that extract, filter, aggregate, and transform blockchain data into reusable streams
- Runs on top of StreamingFast Firehose, inheriting file-based block storage, cursor-based reorg handling, and high-throughput historical replay
- Supports package-based reuse so downstream developers can compose existing Substreams packages instead of rebuilding transformations from raw chain data
- Sends transformed outputs into external sinks such as PostgreSQL, MongoDB, Kafka, ClickHouse, BigQuery, files, and other consumers
- Serves as a lower-layer data engine for high-performance indexing, especially where historical backfills and deterministic parallel processing matter
- Key claims:
- The official docs describe Substreams as a powerful indexing technology that can extract data from multiple chains, apply custom transformations, and send the results to destinations of the developer’s choice
- The raw README says Substreams was developed for The Graph Network and achieves extremely high-performance indexing through parallelization in a streaming-first architecture
- Firehose overview docs say Firehose is a files-based, streaming-first blockchain data stack and explicitly position Substreams as the parallel transformation engine that works alongside it
- Firehose docs also claim Substreams can reuse the same underlying block storage and data sources while adding custom Rust/WASM transformation logic and dozens of downstream data sinks
- The Firehose overview positions the stack as a response to brittle or slow JSON-RPC extraction patterns, which makes Substreams especially useful as a comparison point when reasoning about where data-control power sits below classic hosted indexers
- Whitepaper: No classic whitepaper or litepaper was found during this pass. The strongest primary materials were the official Substreams docs, StreamingFast Firehose docs, and the public GitHub repository; see ../whitepapers/substreams-primary-sources-2026-05-09.md.
- Sources:
- https://docs.substreams.dev/
- https://raw.githubusercontent.com/streamingfast/substreams/develop/README.md
- https://firehose.streamingfast.io/
- https://firehose.streamingfast.io/introduction/firehose-overview
- https://firehose.streamingfast.io/firehose/architecture/components
- https://github.com/streamingfast/substreams
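The module model and the parallel-execution claim above can be made concrete with a small sketch. It uses only the Rust standard library and does not call the real Substreams crate: `Block`, `Transfer`, `map_large_transfers`, `accumulate`, `run_sequential`, and `run_parallel` are hypothetical names invented for illustration, and the merge step is plain addition chosen because it is associative. The point it shows is narrow: if the per-block transformation is a pure function and the aggregation can be merged, a block range can be split into segments, processed on separate threads, and combined into the same result a sequential replay would give.

```rust
use std::collections::BTreeMap;
use std::thread;

/// Hypothetical, simplified block model (not the Firehose block schema).
#[derive(Clone, Debug)]
pub struct Block {
    pub number: u64,
    pub transfers: Vec<Transfer>,
}

#[derive(Clone, Debug, PartialEq)]
pub struct Transfer {
    pub to: String,
    pub amount: u64,
}

/// Map-style stage: a pure function of a single block.
/// Keeps only transfers of at least `min_amount`.
pub fn map_large_transfers(block: &Block, min_amount: u64) -> Vec<Transfer> {
    block
        .transfers
        .iter()
        .filter(|t| t.amount >= min_amount)
        .cloned()
        .collect()
}

/// Store-style stage: folds mapped output into per-account totals.
pub fn accumulate(totals: &mut BTreeMap<String, u64>, transfers: &[Transfer]) {
    for t in transfers {
        *totals.entry(t.to.clone()).or_insert(0) += t.amount;
    }
}

/// Baseline: replay the whole range in order on one thread.
pub fn run_sequential(blocks: &[Block], min_amount: u64) -> BTreeMap<String, u64> {
    let mut totals = BTreeMap::new();
    for b in blocks {
        accumulate(&mut totals, &map_large_transfers(b, min_amount));
    }
    totals
}

/// Split the range into fixed-size segments, process each segment on its own
/// scoped thread, then merge the partial totals. Addition is associative and
/// commutative, so the merged result equals the sequential one.
pub fn run_parallel(
    blocks: &[Block],
    min_amount: u64,
    segment_size: usize,
) -> BTreeMap<String, u64> {
    let partials: Vec<BTreeMap<String, u64>> = thread::scope(|s| {
        let handles: Vec<_> = blocks
            .chunks(segment_size.max(1))
            .map(|segment| s.spawn(move || run_sequential(segment, min_amount)))
            .collect();
        handles
            .into_iter()
            .map(|h| h.join().expect("segment worker panicked"))
            .collect()
    });

    let mut totals = BTreeMap::new();
    for partial in partials {
        for (account, amount) in partial {
            *totals.entry(account).or_insert(0) += amount;
        }
    }
    totals
}
```

Calling `run_sequential(&blocks, 20)` and `run_parallel(&blocks, 20, 3)` on the same input should return equal maps. Aggregations that depend on block order would need a different merge strategy than the one assumed here.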
Internal linkages
- Keep this note linked to the strongest adjacent reads: firehose, the-graph, and sqd.
- Useful cut: Substreams matters because package authorship, module reuse, and sink choice decide a lot of the real power before anyone sees a polished query API.
Control surface
- The leverage sits in module authorship, package reuse, Firehose dependency, replay behavior, and sink configuration.
- That makes Substreams a transformation control plane, not a query market and not the chain itself.
- Keep the comparison cut clean: this sits below The Graph and closer to the extraction-and-packaging layer shared with Firehose.
- Last reviewed: 2026-05-31 UTC
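To illustrate why replay behavior and sink configuration count as control points, here is a minimal sketch of a sink that tracks a cursor and reverts state when a fork is signalled. It is an assumption-laden model, not the Substreams wire format or any official sink: `StreamMessage`, `Sink`, and their fields are hypothetical names, and the "state" is a single running total so the revert logic stays visible.

```rust
/// Hypothetical stream messages; names are illustrative only and are not
/// the actual Substreams protobuf messages.
#[derive(Debug, Clone)]
pub enum StreamMessage {
    /// Output for a new block, plus an opaque cursor to resume from.
    BlockData { block_number: u64, cursor: String, delta: i64 },
    /// A fork was detected: discard everything above `last_valid_block`.
    Undo { last_valid_block: u64, last_valid_cursor: String },
}

/// Hypothetical sink holding one running total as its materialized state.
#[derive(Debug, Default)]
pub struct Sink {
    /// Deltas applied per block, kept so they can be reverted on undo.
    applied: Vec<(u64, i64)>,
    /// The sink's materialized state.
    pub total: i64,
    /// Last cursor recorded together with the state.
    pub cursor: Option<String>,
}

impl Sink {
    pub fn handle(&mut self, msg: StreamMessage) {
        match msg {
            StreamMessage::BlockData { block_number, cursor, delta } => {
                // Apply the block's effect and record the cursor with it,
                // so a restart can resume exactly after this block.
                self.applied.push((block_number, delta));
                self.total += delta;
                self.cursor = Some(cursor);
            }
            StreamMessage::Undo { last_valid_block, last_valid_cursor } => {
                // Revert every applied block above the last valid one,
                // newest first.
                while let Some(&(number, delta)) = self.applied.last() {
                    if number <= last_valid_block {
                        break;
                    }
                    self.total -= delta;
                    self.applied.pop();
                }
                self.cursor = Some(last_valid_cursor);
            }
        }
    }
}
```

In this model the sink owner decides how much per-block history to retain and where the cursor is persisted, which is the kind of decision the bullets above place at this layer rather than at the query surface.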