Posted to user@storm.apache.org by Behzad Pirvali <bp...@gmail.com> on 2015/01/30 23:33:30 UTC

Storm Trident doing many aggregations.

I am new to Storm Trident. I need to compute around 100 different
aggregations on a stream.
I would like to distribute the work across 4 different workers/machines using
a partitionBy("key-x") clause.

On each node, aggregates keyed on "key-x" could be computed entirely through
partitionAggregate(...). Aggregates on any other key have to go through a
partitionAggregate followed by a global aggregator/reducer.
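
To make the two-stage idea concrete, here is a minimal sketch in plain Java of the data flow I have in mind. It is only a simulation under my own assumptions and does not use Storm or Trident at all: the class TwoStageAggregationSketch, the Tweet type, the helper methods and the sample data are all made up for illustration. It hashes tuples into partitions by city, counts per partition, and then sums the partial counts.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.function.Function;

// Plain-Java simulation only: none of these names come from Storm or Trident.
final class TwoStageAggregationSketch {

    /** Hypothetical input tuple with just the two grouping fields. */
    public static final class Tweet {
        public final String city;
        public final String country;

        public Tweet(String city, String country) {
            this.city = city;
            this.country = country;
        }
    }

    /** Routes each tuple to a partition by hashing the partitioning key (city). */
    public static List<List<Tweet>> partitionByCity(List<Tweet> tweets, int numPartitions) {
        List<List<Tweet>> partitions = new ArrayList<>();
        for (int i = 0; i < numPartitions; i++) {
            partitions.add(new ArrayList<>());
        }
        for (Tweet t : tweets) {
            // floorMod keeps the index non-negative even for negative hash codes.
            partitions.get(Math.floorMod(t.city.hashCode(), numPartitions)).add(t);
        }
        return partitions;
    }

    /** Stage 1: counts tuples per key inside a single partition. */
    public static Map<String, Long> localCounts(List<Tweet> partition,
                                                Function<Tweet, String> key) {
        Map<String, Long> counts = new TreeMap<>();
        for (Tweet t : partition) {
            counts.merge(key.apply(t), 1L, Long::sum);
        }
        return counts;
    }

    /** Stage 2: sums the per-partition partial counts into one global result. */
    public static Map<String, Long> mergeCounts(List<Map<String, Long>> partials) {
        Map<String, Long> total = new TreeMap<>();
        for (Map<String, Long> partial : partials) {
            partial.forEach((k, v) -> total.merge(k, v, Long::sum));
        }
        return total;
    }

    /** Runs both stages for the given key over numPartitions simulated partitions. */
    public static Map<String, Long> aggregate(List<Tweet> tweets, int numPartitions,
                                              Function<Tweet, String> key) {
        List<Map<String, Long>> partials = new ArrayList<>();
        for (List<Tweet> partition : partitionByCity(tweets, numPartitions)) {
            partials.add(localCounts(partition, key));
        }
        return mergeCounts(partials);
    }

    public static void main(String[] args) {
        List<Tweet> tweets = Arrays.asList(
                new Tweet("Paris", "France"), new Tweet("Lyon", "France"),
                new Tweet("Berlin", "Germany"), new Tweet("Paris", "France"));
        // City is the partitioning key, so each city's local count is already final.
        System.out.println("city counts:    " + aggregate(tweets, 4, t -> t.city));
        // Country is not, so its partial counts must be summed across partitions.
        System.out.println("country counts: " + aggregate(tweets, 4, t -> t.country));
    }
}
```

Because every tuple for a given city hashes to the same partition, the merge step never has to add two city counts together, whereas a country can be spread over several partitions and does need the final sum. The sketch only models the data flow; it says nothing about how Trident actually schedules the work.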

So far, I have come up with the following approach:

    topology.newStream("faketweetspout", spout).
     each(spout.getOutputFields(), new Debug()).
     //shuffle().
     partitionBy(new Fields("City")).
     chainedAgg().
     partitionAggregate(new Fields("Country"), new CityAggregator(),
         new Fields("country_partition_counts")).
     partitionAggregate(new Fields("City"), new CityAggregator(),
         new Fields("city_counts")).
     chainEnd().
     parallelismHint(4).
     each(new Fields("country_partition_counts", "city_counts"),
         new TridentUtility.Print()).
     aggregate(new Fields("country_partition_counts"), new CountryAggregator(),
         new Fields("country_counts")).
     each(new Fields("country_counts"), new TridentUtility.Print());

Am I going in the right direction?
I am uneasy about Trident: with plain Storm I control the number of
spouts/bolts and how tuples are distributed, so I can make sure the topology
is efficient, whereas Trident does not give me that kind of transparency. I
want to make sure I use the nodes and the network bandwidth efficiently.

Another approach would be to use the each clause:

    Stream mainStream = topology.newStream("my-stream", new MySpout());
    Stream avgAggStream = mainStream.each(new Fields("field1"),
        new FilterAvg());
    Stream sumAggStream = mainStream.each(new Fields("field1"),
        new FilterSum());

Does using the each(...) clause result in more spouts, which might cause more
network-bandwidth usage?

I feel that when Trident compiles a topology into spouts and bolts, it should
print a summary of how many spouts/bolts it creates and how they are
interconnected!


Thanks,
Behzad