Posted to user@storm.apache.org by Andrew Xor <an...@gmail.com> on 2014/07/31 04:14:13 UTC

Batch Process tuples emitted by different streams

Hi,

 I have a scenario where a single bolt receives the output of
multiple spouts (each spout is a live stream of emitted sensor values).
From my understanding, the bolt task assigned to process them will
receive each tuple on its own, separately for each stream.

The thing is that I want the bolt to process the values of all the
streams in the same tick. One method (if the processing bolt has only one
thread) is to wait for a short time period or for some ticks: for example,
store the tuples received from each stream in a map, then process and emit
once every x received tuples. That way the bolt collects tuples from each
stream and processes them as a batch.
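
To make the idea concrete, here is a minimal sketch of the buffering logic
in plain Java, with no Storm classes involved. The class name StreamBatcher,
the method offer and the stream ids are all hypothetical; in a real bolt this
would presumably be called from execute() with the tuple's source stream id.
It assumes a single thread and releases a batch as soon as every stream has
contributed at least one value, rather than after a fixed count x.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical helper: buffers values per stream, releases a batch once all streams have data. */
class StreamBatcher {
    private final List<String> streamIds;
    private final Map<String, List<Double>> pending = new HashMap<>();

    StreamBatcher(List<String> streamIds) {
        this.streamIds = new ArrayList<>(streamIds);
        for (String id : streamIds) {
            pending.put(id, new ArrayList<>());
        }
    }

    /** Stores one value; returns a full batch (stream id -> values) or null while still waiting. */
    Map<String, List<Double>> offer(String streamId, double value) {
        List<Double> buffer = pending.get(streamId);
        if (buffer == null) {
            throw new IllegalArgumentException("unknown stream: " + streamId);
        }
        buffer.add(value);

        // Keep waiting until every stream has contributed at least one value.
        for (String id : streamIds) {
            if (pending.get(id).isEmpty()) {
                return null;
            }
        }

        // Hand over the buffered values and start fresh buffers for the next batch.
        Map<String, List<Double>> batch = new HashMap<>();
        for (String id : streamIds) {
            batch.put(id, pending.get(id));
            pending.put(id, new ArrayList<>());
        }
        return batch;
    }

    public static void main(String[] args) {
        StreamBatcher batcher = new StreamBatcher(List.of("sensorA", "sensorB"));
        System.out.println(batcher.offer("sensorA", 1.0)); // still waiting for sensorB
        System.out.println(batcher.offer("sensorB", 2.0)); // both streams present
    }
}
```

Note that a stream that emits faster than the others simply contributes more
values to the batch, and a stream that stops emitting blocks the batch
forever, so a real version would probably also need a timeout.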

Would that be a sound approach in a non-transactional topology, or should I
use Trident in order to ensure ordering? Also, I could not find in Storm's
documentation whether chronological ordering is enforced in any way. For
example, let's say that we have two spouts that each emit two tuples:

 Spout1: (Tuple1, t1), (Tuple2, t2)
 Spout2: (Tuple3, t1), (Tuple4, t2)

In which order will the bolt receive the tuples? Will the chronological
order be preserved in a Trident topology?
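
If no ordering across spouts can be relied on, one fallback would be to
align the tuples by their timestamp inside the bolt. The sketch below is
plain Java with no Storm classes, and the names TimestampAligner, offer and
pendingCount are hypothetical. It assumes every spout emits exactly one
tuple per timestamp and makes no assumption about arrival order.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical helper: groups payloads by timestamp until every source has reported. */
class TimestampAligner {
    private final int sourceCount;
    // timestamp -> (source id -> payload) for timestamps that are still incomplete
    private final Map<Long, Map<String, String>> byTime = new HashMap<>();

    TimestampAligner(int sourceCount) {
        this.sourceCount = sourceCount;
    }

    /** Stores one payload; returns all payloads for that timestamp once complete, else null. */
    Map<String, String> offer(String source, long timestamp, String payload) {
        Map<String, String> slot = byTime.computeIfAbsent(timestamp, k -> new HashMap<>());
        slot.put(source, payload);
        if (slot.size() < sourceCount) {
            return null; // at least one source has not reported for this timestamp yet
        }
        byTime.remove(timestamp);
        return slot;
    }

    /** Number of timestamps still waiting for at least one source. */
    int pendingCount() {
        return byTime.size();
    }

    public static void main(String[] args) {
        TimestampAligner aligner = new TimestampAligner(2);
        // Arrival order is deliberately shuffled relative to emission order.
        aligner.offer("Spout2", 1L, "Tuple3");
        aligner.offer("Spout1", 2L, "Tuple2");
        System.out.println(aligner.offer("Spout1", 1L, "Tuple1")); // t1 is now complete
        System.out.println(aligner.offer("Spout2", 2L, "Tuple4")); // t2 is now complete
    }
}
```

The map grows without bound if one spout never reports for some timestamp,
so incomplete entries would need to be expired in practice.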

Thanks...!