Posted to user@storm.apache.org by Behzad Pirvali <bp...@gmail.com> on 2015/01/30 23:33:30 UTC
Storm Trident doing many aggregations.
I am new to Storm Trident. I need to compute around 100 different
aggregations on a stream.
I would like to distribute the work across 4 workers/machines using a
partitionBy("key-x") clause.
On each node, the "key-x" aggregates can be computed with
partitionAggregate(...) alone, since all tuples with the same "key-x" land
in the same partition. Aggregates on any other key need a
partitionAggregate followed by a global aggregator/reducer.
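To make the intended data flow concrete, here is a minimal plain-Java sketch of the two-phase idea, with no Storm dependency. It is only an illustration under my own assumptions: the class TwoPhaseCount and its methods are hypothetical names, not Trident APIs, and tuples are assumed to be {city, country} pairs. Phase 1 stands in for partitionAggregate on each partition, and phase 2 stands in for the global aggregate that merges the partial results.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Plain-Java illustration only; none of these names are Storm/Trident APIs.
final class TwoPhaseCount {

    // Stand-in for partitionBy(new Fields("City")): equal cities map to the same partition.
    static int partitionOf(String city, int numPartitions) {
        return Math.floorMod(city.hashCode(), numPartitions);
    }

    // Phase 1: inside one partition, count tuples per key (purely local work).
    static Map<String, Long> partialCounts(List<String[]> tuples, int keyIndex) {
        Map<String, Long> counts = new HashMap<>();
        for (String[] t : tuples) {
            counts.merge(t[keyIndex], 1L, Long::sum);
        }
        return counts;
    }

    // Phase 2: merge the per-partition partial counts into global totals.
    static Map<String, Long> mergeCounts(List<Map<String, Long>> partials) {
        Map<String, Long> total = new HashMap<>();
        for (Map<String, Long> p : partials) {
            p.forEach((k, v) -> total.merge(k, v, Long::sum));
        }
        return total;
    }

    // Runs both phases over tuples of the assumed form {city, country}.
    static Map<String, Long> countryCounts(List<String[]> tuples, int numPartitions) {
        List<List<String[]>> partitions = new ArrayList<>();
        for (int i = 0; i < numPartitions; i++) {
            partitions.add(new ArrayList<>());
        }
        for (String[] t : tuples) {
            partitions.get(partitionOf(t[0], numPartitions)).add(t);
        }
        List<Map<String, Long>> partials = new ArrayList<>();
        for (List<String[]> p : partitions) {
            partials.add(partialCounts(p, 1)); // index 1 = country
        }
        return mergeCounts(partials);
    }

    public static void main(String[] args) {
        List<String[]> tuples = Arrays.asList(
            new String[] {"Paris", "France"},
            new String[] {"Lyon", "France"},
            new String[] {"Berlin", "Germany"});
        System.out.println(countryCounts(tuples, 4));
    }
}
```

In this sketch only the small per-partition maps cross partition boundaries in phase 2, which is the bandwidth saving I am hoping to get from the partial aggregation.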
So far, I have come up with the following approach:
topology.newStream("faketweetspout", spout)
    .each(spout.getOutputFields(), new Debug())
    // .shuffle()
    .partitionBy(new Fields("City"))
    .chainedAgg()
    .partitionAggregate(new Fields("Country"), new CityAggregator(),
        new Fields("country_partition_counts"))
    .partitionAggregate(new Fields("City"), new CityAggregator(),
        new Fields("city_counts"))
    .chainEnd()
    .parallelismHint(4)
    .each(new Fields("country_partition_counts", "city_counts"),
        new TridentUtility.Print())
    .aggregate(new Fields("country_partition_counts"),
        new CountryAggregator(), new Fields("country_counts"))
    .each(new Fields("country_counts"), new TridentUtility.Print());
Am I going in the right direction?
I have a bad feeling about Trident. With plain Storm I control the number
of spouts and bolts and how tuples are distributed, so I can make sure the
topology is efficient, whereas with Trident I do not have that kind of
transparency. I want to make sure I use the nodes and the network
bandwidth efficiently.
Another approach would be to use the each clause:
Stream mainStream = topology.newStream("my-stream", new MySpout());
Stream avgAggStream = mainStream.each(new Fields("field1"), new FilterAvg());
Stream sumAggStream = mainStream.each(new Fields("field1"), new FilterSum());
Does using each(...) this way result in more spouts, which might cause
more network bandwidth usage?
I feel that when Trident compiles a topology into spouts and bolts, it
should print a summary of how many spouts and bolts it creates and how
they are connected!
Thanks,
Behzad