Substreams

  • Name: Substreams
  • URL: https://docs.substreams.dev/
  • Category: Blockchain data extraction / streaming-first indexing / parallel data-transformation infrastructure
  • Summary: Substreams is a streaming-first transformation layer built on top of Firehose. Its substance lies in Rust/WASM modules, package reuse, parallel execution, and sink choice: much of an indexing pipeline's power is decided at this layer, before any clean query surface exists.
  • What it does:
    • Lets developers define Rust/WASM modules that extract, filter, aggregate, and transform blockchain data into reusable streams
    • Runs on top of StreamingFast Firehose, inheriting file-based block storage, cursor-based reorg handling, and high-throughput historical replay
    • Supports package-based reuse so downstream developers can compose existing Substreams packages instead of rebuilding transformations from raw chain data
    • Sends transformed outputs into external sinks such as PostgreSQL, MongoDB, Kafka, ClickHouse, BigQuery, files, and other consumers
    • Serves as a lower-layer data engine for high-performance indexing, especially where historical backfills and deterministic parallel processing matter
  • Key claims:
    • The official docs describe Substreams as a powerful indexing technology that can extract data from multiple chains, apply custom transformations, and send the results to destinations of the developer’s choice
    • The repository README says Substreams was developed for The Graph Network and achieves extremely high-performance indexing through parallelization in a streaming-first architecture
    • Firehose overview docs say Firehose is a files-based, streaming-first blockchain data stack and explicitly position Substreams as the parallel transformation engine that works alongside it
    • Firehose docs also claim Substreams can reuse the same underlying block storage and data sources while adding custom Rust/WASM transformation logic and dozens of downstream data sinks
    • The Firehose overview positions the stack as a response to brittle or slow JSON-RPC extraction patterns. That makes Substreams a useful comparison point when reasoning about where control over data sits below classic hosted indexers
  • Whitepaper: No classic whitepaper or litepaper was found during this pass. The strongest primary materials were the official Substreams docs, StreamingFast Firehose docs, and the public GitHub repository; see ../whitepapers/substreams-primary-sources-2026-05-09.md.
  • Sources:
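
Illustrative sketch for the "What it does" bullets above: the map-then-sink flow, plus a reorg undo, can be modeled in a few lines of plain Rust. This is a conceptual toy under stated assumptions, not the Substreams API. Every name here (Transfer, Block, map_large_transfers, MemorySink, run_pipeline, sample_blocks) is hypothetical, the block shape is invented for illustration, and nothing is compiled to WASM or connected to Firehose. The integer last_block field merely stands in for the cursor a real sink would persist.

```rust
use std::collections::BTreeMap;

// Hypothetical, simplified data shapes for illustration only.
// Real Firehose blocks are chain-specific messages, not this struct.
#[derive(Debug, Clone, PartialEq)]
pub struct Transfer {
    pub token: String,
    pub amount: u64,
}

#[derive(Debug, Clone)]
pub struct Block {
    pub number: u64,
    pub transfers: Vec<Transfer>,
}

// Map-style step: a pure function of a single block. Because it keeps no
// state between blocks, separate block ranges could be processed independently.
pub fn map_large_transfers(block: &Block, min_amount: u64) -> Vec<Transfer> {
    block
        .transfers
        .iter()
        .filter(|t| t.amount >= min_amount)
        .cloned()
        .collect()
}

// Toy sink: stores rows tagged with their block number so that rows from
// reorged blocks can be dropped later.
#[derive(Debug, Default)]
pub struct MemorySink {
    pub rows: Vec<(u64, Transfer)>,
    pub last_block: Option<u64>, // stand-in for a persisted cursor
}

impl MemorySink {
    // Record the output of one block and advance the stand-in cursor.
    pub fn apply(&mut self, block_number: u64, transfers: Vec<Transfer>) {
        self.rows.extend(transfers.into_iter().map(|t| (block_number, t)));
        self.last_block = Some(block_number);
    }

    // Reorg handling: discard everything written after the last valid block.
    pub fn undo_to(&mut self, last_valid_block: u64) {
        self.rows.retain(|(number, _)| *number <= last_valid_block);
        self.last_block = Some(last_valid_block);
    }

    // Aggregation over the rows currently held by the sink.
    pub fn volume_by_token(&self) -> BTreeMap<String, u64> {
        let mut totals = BTreeMap::new();
        for (_, t) in &self.rows {
            *totals.entry(t.token.clone()).or_insert(0u64) += t.amount;
        }
        totals
    }
}

// Feed blocks in order through the map step and into the sink.
pub fn run_pipeline(blocks: &[Block], min_amount: u64) -> MemorySink {
    let mut sink = MemorySink::default();
    for block in blocks {
        sink.apply(block.number, map_large_transfers(block, min_amount));
    }
    sink
}

// Made-up sample data; token names and amounts carry no real meaning.
pub fn sample_blocks() -> Vec<Block> {
    let t = |token: &str, amount: u64| Transfer { token: token.to_string(), amount };
    vec![
        Block { number: 1, transfers: vec![t("USDC", 500), t("USDC", 5)] },
        Block { number: 2, transfers: vec![t("WETH", 40)] },
        Block { number: 3, transfers: vec![t("USDC", 100), t("WETH", 2)] },
    ]
}
```

Calling run_pipeline(&sample_blocks(), 10) and then undo_to(2) on the returned sink drops the block 3 rows while leaving earlier rows untouched. The design point the sketch tries to surface is that the filter threshold, the aggregation, and the undo policy are all chosen by whoever writes the module and configures the sink, which is where this note locates the control.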

Internal linkages

  • Keep this note linked to its strongest adjacent reads: firehose, the-graph, and sqd.
  • Useful cut: Substreams matters because package authorship, module reuse, and sink choice decide a lot of the real power before anyone sees a polished query API.

Control surface

  • The leverage sits in module authorship, package reuse, Firehose dependency, replay behavior, and sink configuration.

  • That makes Substreams a transformation control plane, not a query market and not the chain itself.

  • Keep the comparison cut clean: this sits below The Graph and closer to the extraction-and-packaging layer shared with Firehose.

  • Last reviewed: 2026-05-31 UTC