Posted to user@storm.apache.org by Itai Frenkel <It...@forter.com> on 2014/07/25 05:12:50 UTC

Design Patterns for Interactive Stream Processing using Storm

Hi,

I would like to discuss possible Storm design patterns for the following requirement:

Given a Storm topology that runs in production for automatic real-time stream processing, the user interface requires a REST API to manually run a subset of the topology on demand and display the interim results.

For simplicity let's assume the following topology:

QueueSpout -> (multiple parallel) ProcessingBolt(s) -> Join -> ReduceBolt -> PersistenceBolt

The user interface requires each of the ProcessingBolts to be exposed as a separate REST API.

Design 1:

Deploy a separate DRPCTopology for each ProcessingBolt.

The REST server acts as a reverse proxy that forwards requests to the DRPC server.
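
To make Design 1 concrete, here is a minimal sketch of the routing the REST server would do. It deliberately does not depend on Storm: DrpcInvoker is a hypothetical stand-in for whatever DRPC client call is used (in practice probably a thin wrapper around Storm's DRPCClient, with any checked exceptions wrapped), and the REST paths and DRPC function names are made up for illustration.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical stand-in for a DRPC client call; not a Storm type.
interface DrpcInvoker {
    String execute(String function, String args);
}

// Maps one REST path to one DRPC function (one DRPC topology per ProcessingBolt)
// and forwards the request body unchanged.
class DrpcReverseProxy {
    private final Map<String, String> routes = new LinkedHashMap<>();
    private final DrpcInvoker invoker;

    DrpcReverseProxy(DrpcInvoker invoker) {
        this.invoker = invoker;
    }

    // Registers a REST path and the DRPC function it forwards to.
    DrpcReverseProxy route(String restPath, String drpcFunction) {
        routes.put(restPath, drpcFunction);
        return this;
    }

    // Forwards the request to the mapped DRPC function, or fails if none is mapped.
    String handle(String restPath, String body) {
        String function = routes.get(restPath);
        if (function == null) {
            throw new IllegalArgumentException("No DRPC topology mapped to " + restPath);
        }
        return invoker.execute(function, body);
    }
}
```

The point of the sketch is that the REST layer stays a dumb lookup table: adding a ProcessingBolt to the UI means deploying one more DRPC topology and adding one more route.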

Design 2:

The REST server puts a message in a priority queue with low priority and subscribes to the result in Redis.
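
A rough sketch of the REST server side of this, using in-memory stand-ins rather than a real queue or Redis. Everything here is an assumption for illustration: the QueuedRequest fields, the priority values, and the use of a correlation id to match a published result back to the waiting HTTP request. In a real deployment onResult would be driven by the Redis subscription.

```java
import java.util.Comparator;
import java.util.Map;
import java.util.UUID;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.PriorityBlockingQueue;

// Hypothetical queue message; a lower number is served first.
final class QueuedRequest {
    static final int PRODUCTION = 0;
    static final int INTERACTIVE = 9;

    final String correlationId;
    final int priority;
    final String payload;

    QueuedRequest(String correlationId, int priority, String payload) {
        this.correlationId = correlationId;
        this.priority = priority;
        this.payload = payload;
    }
}

// In-memory stand-in for the REST server's view of the queue and the result channel.
final class InteractiveGateway {
    final PriorityBlockingQueue<QueuedRequest> queue = new PriorityBlockingQueue<>(
            16, Comparator.comparingInt((QueuedRequest r) -> r.priority));
    private final Map<String, CompletableFuture<String>> pending = new ConcurrentHashMap<>();

    // Called by the REST handler: register interest in the result first,
    // then enqueue the request at low priority.
    CompletableFuture<String> submit(String payload) {
        String id = UUID.randomUUID().toString();
        CompletableFuture<String> result = new CompletableFuture<>();
        pending.put(id, result);
        queue.put(new QueuedRequest(id, QueuedRequest.INTERACTIVE, payload));
        return result;
    }

    // Called when a result message arrives on the subscription.
    void onResult(String correlationId, String value) {
        CompletableFuture<String> result = pending.remove(correlationId);
        if (result != null) {
            result.complete(value);
        }
    }
}
```

Registering the pending future before enqueueing avoids a race where the result is published before the REST server has started listening for it.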

Use OOP to make all ProcessingBolts aware of toggles carried in the tuple. In effect, the tuple contains toggles that disable all ProcessingBolts but one.

Another toggle forwards interim results to a (Redis) Publish Bolt instead of the ReduceBolt.
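
The toggle idea could look roughly like the sketch below. It is plain Java without Storm types so that it stays self-contained; Toggles, Emission, ToggleAwareProcessor and the stream names "reduce" and "publish" are all hypothetical. In a real bolt, handle() would sit inside execute(), a disabled bolt would simply ack without emitting, and the stream name would select between the ReduceBolt and the Publish Bolt.

```java
import java.util.Collections;
import java.util.Optional;
import java.util.Set;

// Hypothetical toggle payload carried inside each tuple.
final class Toggles {
    private final Set<String> enabledBolts; // empty set means "all bolts enabled"
    private final boolean publishInterim;   // true routes output to the publish stream

    private Toggles(Set<String> enabledBolts, boolean publishInterim) {
        this.enabledBolts = enabledBolts;
        this.publishInterim = publishInterim;
    }

    // Normal production traffic: every bolt runs, output goes on to the reduce step.
    static Toggles production() {
        return new Toggles(Collections.<String>emptySet(), false);
    }

    // Interactive request: only one bolt runs, and its interim result is published.
    static Toggles interactive(String boltId) {
        return new Toggles(Collections.singleton(boltId), true);
    }

    boolean isEnabled(String boltId) {
        return enabledBolts.isEmpty() || enabledBolts.contains(boltId);
    }

    String targetStream() {
        return publishInterim ? "publish" : "reduce";
    }
}

// What a processor would emit: a value and the stream it goes out on.
final class Emission {
    final String stream;
    final String value;

    Emission(String stream, String value) {
        this.stream = stream;
        this.value = value;
    }
}

// Base class that the ProcessingBolts could share so the toggle logic lives in one place.
abstract class ToggleAwareProcessor {
    private final String boltId;

    protected ToggleAwareProcessor(String boltId) {
        this.boltId = boltId;
    }

    // The bolt's actual business logic.
    protected abstract String process(String input);

    // Returns empty when this bolt is disabled for the tuple, otherwise the routed result.
    final Optional<Emission> handle(Toggles toggles, String input) {
        if (!toggles.isEnabled(boltId)) {
            return Optional.empty();
        }
        return Optional.of(new Emission(toggles.targetStream(), process(input)));
    }
}
```

Production tuples carry the default toggles and flow through unchanged, so the extra code path is only exercised by interactive requests.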

Design 1 Pros:

1. Follows the principle of an immutable stream processing graph.
2. Follows the principle of preferring N simpler systems over 1 complex system.

Design 2 Pros:

1. Makes life easier for operations: one system to monitor and upgrade.
2. Enables a fine-grained monitoring probe that continuously checks the real-time system (one subsystem at a time).
3. Enables customer-specific stream processing (instead of a topology per tenant).

Thoughts?

Itai