Blockchain Data Streaming Service Development Case Study - Blum

Indexing Requirements
Blum set out to build an indexing solution that could support their platform’s scale and performance requirements across EVM chains, Solana, and TON. The system needed to:
- Scale horizontally to handle over 10,000 requests per second
- Keep added latency below 10 milliseconds
On TON, the indexer had to:
- Get events for specific Jettons
- Get events for specific accounts
- Retrieve a list of all Jettons
- Fetch TON balances for sets of addresses
On EVM chains and Solana, the indexing requirements were similar — with the ability to scale horizontally by adding nodes or index RPCs as demand increased.
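For illustration only, a query surface matching these TON requirements might look like the following Go interface; every name here (TONIndexer, JettonEvent, and the method signatures) is hypothetical and not Blum's actual API.

```go
package indexer

import "context"

// JettonEvent is a hypothetical record for a Jetton transfer, mint, or burn event.
type JettonEvent struct {
	Jetton  string // Jetton master address
	Account string // account involved in the event
	Kind    string // e.g. "transfer", "mint", "burn"
	Amount  string // amount as a decimal string to avoid precision loss
	LT      uint64 // logical time of the transaction
}

// TONIndexer captures the TON-side query surface described above.
// All names are illustrative, not Blum's production API.
type TONIndexer interface {
	// EventsByJetton returns events emitted for a specific Jetton master.
	EventsByJetton(ctx context.Context, jettonMaster string, fromLT uint64) ([]JettonEvent, error)
	// EventsByAccount returns events involving a specific account.
	EventsByAccount(ctx context.Context, account string, fromLT uint64) ([]JettonEvent, error)
	// ListJettons returns all Jetton masters known to the indexer.
	ListJettons(ctx context.Context) ([]string, error)
	// TONBalances returns native TON balances for a set of addresses.
	TONBalances(ctx context.Context, addresses []string) (map[string]uint64, error)
}
```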
How Substreams Fit the Picture
Traditional blockchain indexing solutions often rely on polling nodes — repeatedly querying them for new data. This approach is inherently slow, inefficient, and difficult to scale, especially for high-throughput multi-chain platforms like Blum’s trading and analytics system. Polling creates latency bottlenecks and puts significant load on nodes, leading to potential data gaps and delays.
To meet Blum’s needs, we shifted to a data push model: instead of constantly asking nodes for data, nodes push new block information as it’s produced. However, this required modifying standard blockchain nodes to support streaming data efficiently and reliably.
The architecture also had to address common blockchain challenges: handling chain reorgs, managing data deduplication across multiple sources, and providing seamless support for both real-time streaming and historical data queries.
Our goal was to build a system that supports:
- Real-time data streaming
- Cursor-based streaming
- Historical data streaming
- Unlimited parallelization of historical data processing
- A caching system to reduce redundant computation and speed up repeated queries
Architecture Components Explained

Blocks Source
To enable the data push model, we utilized custom blockchain nodes for each supported chain:
- Ethereum: A custom validator node streams new blocks over a UNIX pipe rather than being polled.
- Solana: A Geyser plugin captures blocks as they're produced and streams them efficiently.
These modifications let nodes proactively push block data, lowering both latency and node load.
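As a rough sketch of the push model, the snippet below reads length-prefixed block payloads from a named pipe, the way an instrumented node could hand blocks to a reader without being polled; the pipe path and the 4-byte length-prefix framing are assumptions, not the actual node protocol.

```go
package main

import (
	"encoding/binary"
	"fmt"
	"io"
	"log"
	"os"
)

func main() {
	// Hypothetical named pipe where an instrumented node writes blocks.
	pipe, err := os.Open("/var/run/node/blocks.fifo")
	if err != nil {
		log.Fatalf("open pipe: %v", err)
	}
	defer pipe.Close()

	for {
		// Assumed framing: 4-byte big-endian length prefix, then the raw block payload.
		var size uint32
		if err := binary.Read(pipe, binary.BigEndian, &size); err != nil {
			if err == io.EOF {
				return // node closed the pipe
			}
			log.Fatalf("read length: %v", err)
		}
		payload := make([]byte, size)
		if _, err := io.ReadFull(pipe, payload); err != nil {
			log.Fatalf("read block: %v", err)
		}
		// A real reader would decode and forward the block; here we just report it.
		fmt.Printf("received block payload: %d bytes\n", len(payload))
	}
}
```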
Reader Services
For each chain, we run multiple reader services that connect to modified nodes and external RPC providers as fallback sources. Readers:
- Stream raw block data in binary format via gRPC, enabling language-agnostic, efficient communication.
- Operate redundantly—if one reader fails, another takes over without interrupting the data flow.
- Aggregate block data and push it into storage buckets, forming the foundational data feed for the system.
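The failover behavior can be sketched with a hypothetical BlockSource interface standing in for the gRPC connections to instrumented nodes and fallback RPC providers; a reader keeps the feed alive by rotating to the next source whenever the current one fails.

```go
package reader

import (
	"context"
	"log"
	"time"
)

// Block is a minimal stand-in for a decoded block.
type Block struct {
	Number uint64
	Hash   string
	Raw    []byte
}

// BlockSource abstracts an instrumented node or a fallback RPC provider.
// Stream should block, pushing blocks to out until the source fails.
type BlockSource interface {
	Name() string
	Stream(ctx context.Context, out chan<- Block) error
}

// Run keeps the block feed alive by falling back to the next source whenever
// the current one errors out, so downstream consumers never see a gap in service.
func Run(ctx context.Context, sources []BlockSource, out chan<- Block) {
	if len(sources) == 0 {
		return
	}
	for i := 0; ; i = (i + 1) % len(sources) {
		src := sources[i]
		if err := src.Stream(ctx, out); err != nil {
			log.Printf("source %s failed: %v; switching to the next source", src.Name(), err)
		}
		select {
		case <-ctx.Done():
			return
		case <-time.After(time.Second): // brief backoff before switching sources
		}
	}
}
```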
Blocks Storage
Readers write raw block data into S3 buckets to separate concerns and optimize data handling:
- One Blocks Bucket: Stores raw single blocks streamed directly from readers.
- Merged Blocks Bucket: Holds deduplicated and bundled blocks (typically batches of 100) created by the Merger service.
- Forked Blocks Bucket: Maintains blocks from chain reorganizations separately to support rollback and reorg handling.
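To make the layout concrete, here is one way such buckets could be keyed; the zero-padded naming scheme below is purely illustrative, not the production object schema.

```go
package storage

import "fmt"

// OneBlockKey builds an object key for a single raw block in the one-blocks bucket.
// Zero-padding keeps keys lexicographically ordered by block number.
func OneBlockKey(num uint64, hash string) string {
	return fmt.Sprintf("one-blocks/%012d-%s.bin", num, hash)
}

// MergedBundleKey builds an object key for a deduplicated bundle of 100 blocks,
// named by the first block number in the bundle.
func MergedBundleKey(firstBlock uint64) string {
	return fmt.Sprintf("merged-blocks/%012d.bundle", firstBlock)
}

// ForkedBlockKey stores blocks that were later reorged out, kept separately
// so the system can roll back state when a reorg is detected.
func ForkedBlockKey(num uint64, hash string) string {
	return fmt.Sprintf("forked-blocks/%012d-%s.bin", num, hash)
}
```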
Merger
The Merger service plays a vital role in cleaning and organizing block data:
- Deduplicates finalized blocks coming from multiple readers, ensuring a consistent data source.
- Bundles blocks into manageable sets for efficient storage and faster retrieval.
- Segregates forked blocks into a dedicated bucket to maintain data integrity during chain reorganizations.
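In spirit, the merger's work reduces to the sketch below: drop duplicate copies of blocks delivered by multiple readers, bundle canonical blocks in groups of 100, and set reorged-out blocks aside for the forked-blocks bucket. The types and function are simplified stand-ins for the real service.

```go
package merger

// Block is a minimal stand-in for a block read from the one-blocks bucket.
type Block struct {
	Number uint64
	Hash   string
	Forked bool // true if the block was reorged out of the canonical chain
}

const bundleSize = 100 // merged bundles hold batches of 100 blocks

// Merge deduplicates blocks coming from multiple readers, groups canonical
// blocks into fixed-size bundles, and returns forked blocks separately so they
// can be written to the forked-blocks bucket.
func Merge(blocks []Block) (bundles [][]Block, forked []Block) {
	seen := make(map[string]bool)
	var current []Block
	for _, b := range blocks {
		if seen[b.Hash] {
			continue // duplicate copy another reader already delivered
		}
		seen[b.Hash] = true
		if b.Forked {
			forked = append(forked, b)
			continue
		}
		current = append(current, b)
		if len(current) == bundleSize {
			bundles = append(bundles, current)
			current = nil
		}
	}
	if len(current) > 0 {
		bundles = append(bundles, current) // partial trailing bundle
	}
	return bundles, forked
}
```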
Relayer
With the introduction of Substreams, the deduplication responsibility was extracted from Firehose into a standalone Relayer service. The Relayer:
- Receives raw block streams from readers.
- Performs deduplication once at the source, so Firehose and Substreams consume clean, unified block data.
- Simplifies client-side data consumption by guaranteeing deduplicated and ordered streams.
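Conceptually, the relayer fans in the live streams from several readers and forwards each block exactly once, roughly as in this sketch (the types are hypothetical and the real service works over gRPC):

```go
package relayer

// Block is a minimal stand-in for a live block pushed by a reader.
type Block struct {
	Number uint64
	Hash   string
}

// Relay merges live block streams from multiple readers into a single
// deduplicated stream, so Firehose and Substreams each see every block once.
func Relay(readers []<-chan Block, out chan<- Block) {
	merged := make(chan Block)
	for _, r := range readers {
		go func(in <-chan Block) {
			for b := range in {
				merged <- b
			}
		}(r)
	}

	seen := make(map[string]bool) // block hashes already forwarded
	for b := range merged {
		if seen[b.Hash] {
			continue // another reader already delivered this block
		}
		seen[b.Hash] = true // a real relayer would prune hashes below finality
		out <- b
	}
}
```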
Firehose
The Firehose service is the primary interface for clients to consume block data:
- Streams low-latency, ordered block data from the relayer or directly from historical storage.
- Automatically switches between live streaming and historical backfills, abstracting complexity from clients.
- Handles high throughput, supporting over 10,000 requests per second with minimal latency.
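From a client's point of view, consuming Firehose is a cursor-driven loop: stream from the last known cursor, record the cursor carried by each block, and resume from it after any disconnect. The sketch below uses a hypothetical BlockStream interface rather than the real Firehose gRPC API.

```go
package client

import (
	"context"
	"log"
	"time"
)

// BlockResponse is a hypothetical stand-in for a Firehose stream message:
// a block plus the cursor that lets the client resume after that block.
type BlockResponse struct {
	BlockNumber uint64
	Cursor      string
}

// BlockStream abstracts the Firehose streaming endpoint for this sketch.
type BlockStream interface {
	// Stream delivers blocks starting just after cursor ("" means from the
	// requested start point) and returns when the connection drops.
	Stream(ctx context.Context, cursor string, handle func(BlockResponse)) error
}

// Consume runs the reconnect loop: every delivered block updates the cursor,
// and every reconnection resumes exactly where the previous session stopped.
func Consume(ctx context.Context, fh BlockStream) {
	cursor := "" // in production this would be loaded from durable storage
	for ctx.Err() == nil {
		err := fh.Stream(ctx, cursor, func(resp BlockResponse) {
			// Process the block, then persist the cursor so no block is
			// missed or double-processed across restarts.
			cursor = resp.Cursor
		})
		if err != nil {
			log.Printf("stream dropped: %v; resuming from cursor %q", err, cursor)
			time.Sleep(time.Second)
		}
	}
}
```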
Substreams
Substreams enable developers to define exactly what blockchain data they want — reducing overfetching and improving efficiency.

The Substreams system includes:
- Substreams Front Tier: Handles incoming requests, splits large block ranges into smaller chunks, and manages the overall data flow.
- Linkerd2 Proxy: Distributes requests evenly and securely among worker nodes.
- Tier 2 (Workers): A pool of worker services that process block data chunks in parallel, running user-defined WebAssembly (WASM) modules that filter and transform blockchain data.
- Substreams Store Buckets: Cache processed data from workers, enabling reuse and reducing computation costs for repeated queries.
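One part of the front tier that is easy to show in isolation is range splitting: a large historical request is cut into fixed-size chunks that Tier 2 workers can process in parallel. The chunk size and types below are illustrative.

```go
package fronttier

// BlockRange is a half-open range [Start, End) of blocks to process.
type BlockRange struct {
	Start, End uint64
}

// SplitRange cuts a large block range into chunks of at most chunkSize blocks,
// so each chunk can be dispatched to a separate Tier 2 worker and processed
// in parallel by its WASM module.
func SplitRange(r BlockRange, chunkSize uint64) []BlockRange {
	var chunks []BlockRange
	for start := r.Start; start < r.End; start += chunkSize {
		end := start + chunkSize
		if end > r.End {
			end = r.End
		}
		chunks = append(chunks, BlockRange{Start: start, End: end})
	}
	return chunks
}
```

For example, SplitRange(BlockRange{Start: 0, End: 1_000_000}, 10_000) yields 100 chunks that can be dispatched to as many workers as are available.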
Cache & Reuse Data
If a module (for example, a Uniswap V3 event filter) has already produced filtered output for a given block range, Substreams caches that output in the Substreams Store Bucket. When another request arrives with the same module binary (WASM) and block range, the cached result is served instead of being recomputed. This significantly reduces processing time and cost for users, since cached results are delivered instantly when available, and it improves overall responsiveness, especially for popular or repeated queries.
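The lookup itself boils down to keying stored output by the module's identity (for instance, a hash of its WASM binary) and the block range it covers. The sketch below illustrates that idea with an in-memory map standing in for the S3-backed Substreams Store Buckets.

```go
package cache

import "fmt"

// Store caches module output keyed by module hash and block range.
// In production this lives in the Substreams store buckets; here it is an
// in-memory map purely for illustration.
type Store struct {
	entries map[string][]byte
}

func NewStore() *Store {
	return &Store{entries: make(map[string][]byte)}
}

// key identifies one unit of cached work: which module produced it and
// for which block range.
func key(moduleHash string, start, end uint64) string {
	return fmt.Sprintf("%s/%d-%d", moduleHash, start, end)
}

// GetOrCompute returns cached output when the same module has already been run
// over the same range, and only invokes compute (the expensive WASM pass) on a miss.
func (s *Store) GetOrCompute(moduleHash string, start, end uint64, compute func() []byte) []byte {
	k := key(moduleHash, start, end)
	if out, ok := s.entries[k]; ok {
		return out // cache hit: no recomputation needed
	}
	out := compute()
	s.entries[k] = out
	return out
}
```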
SQL Sink
To give Blum structured, queryable data, we built a custom SQL Sink that translates Substreams output into standard SQL operations (insert, upsert, update, delete), plus custom logic to support increments. The sink logs every operation so it can be safely rolled back during chain reorganizations. It's also extensible, allowing the Blum team to define custom logic and adapt it to other storage backends as needed.

In summary, the service:
- Translates Substreams output into SQL operations (insert, upsert, update, delete, increment).
- Logs all operations to enable clean rollbacks in case of chain reorganizations.
- Is extensible, allowing developers to add custom operations or build alternative sinks for different storage backends.
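A minimal sketch of the sink's core loop, assuming a Postgres-style database and a hypothetical sink_ops journal table (the actual schema and increment handling differ): each block's operations are applied in one transaction and journaled so they can be undone if the block is later reorged out.

```go
package sqlsink

import (
	"database/sql"
	"fmt"
)

// Op is one operation emitted for a block: insert, upsert, update, delete,
// or the custom increment used for counters.
type Op struct {
	BlockNum uint64
	Kind     string // "insert" | "upsert" | "update" | "delete" | "increment"
	Query    string // pre-rendered SQL statement for this operation
	Args     []any
}

// Apply executes a block's operations in one transaction and journals them in
// a hypothetical sink_ops table, so a rollback routine can later undo every
// operation recorded above a reorged block.
func Apply(db *sql.DB, ops []Op) error {
	tx, err := db.Begin()
	if err != nil {
		return err
	}
	defer tx.Rollback() // no-op once Commit succeeds

	for _, op := range ops {
		if _, err := tx.Exec(op.Query, op.Args...); err != nil {
			return fmt.Errorf("apply %s at block %d: %w", op.Kind, op.BlockNum, err)
		}
		// Journal the operation alongside the data it changed.
		if _, err := tx.Exec(
			`INSERT INTO sink_ops (block_num, kind, query) VALUES ($1, $2, $3)`,
			op.BlockNum, op.Kind, op.Query,
		); err != nil {
			return err
		}
	}
	return tx.Commit()
}
```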
Results
- Scalable, Reliable Indexing: Blum’s platform now handles over 10,000 requests per second across EVM, Solana, and TON without significant latency increases, ensuring smooth user experience even during peak loads.
- Real-Time and Historical Data Streaming: The architecture supports seamless switching between live block streaming and historical data queries, allowing fast backfills and reliable access to complete blockchain histories.
- Robust Chain Reorg Handling: The system efficiently manages chain reorganizations and notifies clients when one occurs.
- Custom Data Filtering at Scale: Substreams empowered Blum’s developers to create tailored data filters in WASM, reducing bandwidth and processing overhead by streaming only relevant blockchain events.
- Unlimited Parallel Processing: The distributed worker tiers enable near-unlimited horizontal scaling for historical data processing, allowing the platform to quickly catch up on large data ranges when needed.
- Cost and Performance Optimizations: Smart caching of processed data cut down redundant computation and sped up data delivery, lowering infrastructure costs while improving response times.
- Improved Availability: Redundant node readers and fallback mechanisms ensure high uptime and uninterrupted data flow, even if some nodes go offline.
Overall, Blum received a future-proof, scalable indexing system that meets their performance needs, scales with their growing user base, and gives their development teams the flexibility to build on top of reliable, precise blockchain data.