Posted to user@storm.apache.org by Manikandan <ma...@gmail.com> on 2014/01/17 20:29:38 UTC

Using the results of the aggregation - grouping operation

Hi

My requirement is to process a set of input files, do some
transformations, and then generate an output file.

The topology is something like this:
Step 1 (read data) -> Step 2 (calculate the average of a column) -> Step 3
(filter based on the average) -> Step 4 (write to file)
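For reference, this is roughly how I am wiring it up as a regular topology
(the spout/bolt class names are just placeholders for my own components):

import backtype.storm.Config;
import backtype.storm.LocalCluster;
import backtype.storm.topology.TopologyBuilder;

public class AverageFilterTopology {
    public static void main(String[] args) throws Exception {
        TopologyBuilder builder = new TopologyBuilder();

        // Step 1: read the input files and emit one tuple per record.
        builder.setSpout("read", new FileReaderSpout());

        // Step 2: a single bolt instance sees every record and computes the average.
        builder.setBolt("average", new AverageBolt(), 1).globalGrouping("read");

        // Step 3: filter the records against the average emitted by step 2.
        builder.setBolt("filter", new FilterBolt()).globalGrouping("average");

        // Step 4: write the surviving records to the output file.
        builder.setBolt("write", new FileWriterBolt(), 1).globalGrouping("filter");

        LocalCluster cluster = new LocalCluster();
        cluster.submitTopology("average-filter", new Config(), builder.createTopology());
    }
}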

Step 1 emits the input records to step 2 for the average calculation.
Step 2 has to wait until all the input records have been consumed, and only
then pass the result on to step 3 along with the input records, because
step 3 has to filter the input based on the average.
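To make it concrete, here is roughly what I imagine step 2 looking like as a
plain bolt. The "eos" end-of-stream marker and the field names are just
assumptions I am making to show the buffering:

import java.util.ArrayList;
import java.util.List;
import java.util.Map;

import backtype.storm.task.OutputCollector;
import backtype.storm.task.TopologyContext;
import backtype.storm.topology.OutputFieldsDeclarer;
import backtype.storm.topology.base.BaseRichBolt;
import backtype.storm.tuple.Fields;
import backtype.storm.tuple.Tuple;
import backtype.storm.tuple.Values;

public class AverageBolt extends BaseRichBolt {
    private OutputCollector collector;
    // The whole input has to be buffered until the stream ends, because the
    // average is only known after the last record has been seen.
    private final List<String> records = new ArrayList<String>();
    private double sum = 0;

    @Override
    public void prepare(Map conf, TopologyContext context, OutputCollector collector) {
        this.collector = collector;
    }

    @Override
    public void execute(Tuple tuple) {
        // "eos" is a hypothetical end-of-stream marker stream emitted by the
        // spout after the last record of the input files has been read.
        if ("eos".equals(tuple.getSourceStreamId())) {
            double avg = records.isEmpty() ? 0 : sum / records.size();
            // Re-emit every buffered record together with the average so the
            // downstream filter bolt can compare them.
            for (String record : records) {
                collector.emit(new Values(record, avg));
            }
            records.clear();
        } else {
            records.add(tuple.getStringByField("record"));
            sum += tuple.getDoubleByField("value");
        }
        collector.ack(tuple);
    }

    @Override
    public void declareOutputFields(OutputFieldsDeclarer declarer) {
        declarer.declare(new Fields("record", "avg"));
    }
}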

So step 2 has to maintain all the input records in memory. Even if step 2
passes records on to step 3 as it receives them (keeping only the column
needed for the calculation), step 3 still has to wait for the result from
step 2 before it can filter anything, which is again a memory bottleneck.

How should this situation be handled in each of the following cases?
DRPC topology
Trident (see the sketch below)
Regular topology
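For the Trident case, this is the kind of average aggregator I had in mind
for step 2 (a rough sketch, not working code from my project; it assumes the
value to average is the first field of the tuple):

import java.io.Serializable;

import storm.trident.operation.CombinerAggregator;
import storm.trident.tuple.TridentTuple;

public class Avg implements CombinerAggregator<Avg.SumCount> {

    // Running (sum, count) pair combined across tuples; the average itself
    // is derived from the pair once the aggregation is complete.
    public static class SumCount implements Serializable {
        final double sum;
        final long count;

        SumCount(double sum, long count) {
            this.sum = sum;
            this.count = count;
        }

        public double average() {
            return count == 0 ? 0.0 : sum / count;
        }
    }

    @Override
    public SumCount init(TridentTuple tuple) {
        return new SumCount(tuple.getDouble(0), 1);
    }

    @Override
    public SumCount combine(SumCount a, SumCount b) {
        return new SumCount(a.sum + b.sum, a.count + b.count);
    }

    @Override
    public SumCount zero() {
        return new SumCount(0.0, 0);
    }
}

But even with such an aggregator, I do not see how to join the resulting
average back onto the original records without buffering them somewhere,
which is really what I am asking about.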

Thanks & Regards
Manikandan