You are viewing a plain text version of this content. The canonical link for it is here.
Posted to user@storm.apache.org by Eugene <ed...@gmail.com> on 2014/08/07 19:16:15 UTC

Understanding Trident parallelism

I am running Trident topology and looking for some guidance in topology
parallelism and how it gets broken down across cluster.

Here is my topology with some parallelism settings and number of workers is
3.

TridentTopology topology = new TridentTopology();
TridentKafkaConfig spoutConfig = new TridentKafkaConfig(
        brokerHosts,
        "batch_processing_events2",
        "batch_processing_event_client2");

spoutConfig.scheme = new SchemeAsMultiScheme(new EventStringScheme());
OpaqueTridentKafkaSpout spout = new OpaqueTridentKafkaSpout(spoutConfig);

Stream inputStream=topology.newStream("offlineEvents", spout)
inputStream.each(new Fields("event"),new getProfile(),new
Fields("profile")).parallelismHint(8)
.each(new Fields("event"),new getTargetList(),new
Fields("targetList")).parallelismHint(8)
.each(new Fields("event"),new partitionRules(),new
Fields("ruleId")).parallelismHint(3)
.each(new Fields("event","profile","targetList","ruleId"), new
executeRule(), new Fields("tacticId","tactic")).parallelismHint(*30*)
.project(new Fields("tacticId","tactic","event"))
.each(new Fields("tactic"),new filterAssignedTactic()).parallelismHint(8)
.partitionBy(new Fields("tacticId"))
.partitionPersist(mongoDBStateFactory, new Fields("tactic","event"), new
MongoDBStateUpdater())

Config config = new Config()
config.setNumWorkers(*3*)

My observation is that only last value (or bigger value) of 30 applied to
combined trident bolt.
Secondly, all the work went to one node, capacity close to 1 , all other
nodes did not get much work. Why it was not evenly balanced across the
nodes?  Because I don't have any partitioning operations and everything
went to the same node? I checked in Graphite than execute method was
running only on one node.


[image: Inline image 1]


Second, how can I learn what functions and filter those b1 and b0 contains.
It was quite clear with Storm topology, but not so with Trident.
[image: Inline image 2]

So question is how parallelism settings work in cluster, should I use
partition operation to break into bolts and across machines, and how to
increase parallelism of specific function.


Thanks
Eugene