Posted to user@storm.apache.org by Martin Illecker <mi...@apache.org> on 2015/01/21 23:12:19 UTC

Fwd: Immutable Objects - ConcurrentModificationException - Parallelism hint

Hello,

I'm executing a topology of four bolts (A - B - C - D) and one spout (S).
Three of the four bolts (A, B and D) have an execute latency of
approximately 0.5 ms, but bolt C needs up to 10 ms.
If I decrease the sleep time in the spout to 10 ms, I get a
ConcurrentModificationException.

The troubleshooting page [1] says that this happens when a bolt (possibly my
slow bolt C) modifies an object while it is being serialized to be sent over
the network.
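To make the failure mode concrete, here is a single-threaded simulation in plain Java (no Storm classes). It is only an analogy, assuming the serializer walks a collection with an iterator: the hypothetical "serializer" iterator is part-way through a list when the emitting bolt modifies that same list, and the next read fails.

```java
import java.util.ArrayList;
import java.util.ConcurrentModificationException;
import java.util.Iterator;
import java.util.List;

public class CmeDemo {
    // Simulates, in one thread, the race described in the troubleshooting page:
    // a hypothetical "serializer" is iterating over an emitted list when the
    // bolt that emitted it mutates the same list.
    static boolean simulateRace() {
        List<String> payload = new ArrayList<>(List.of("a", "b", "c"));
        Iterator<String> serializer = payload.iterator(); // serialization in progress
        serializer.next();                                // first element written
        payload.add("d");                                 // bolt mutates after emit
        try {
            serializer.next();                            // serializer resumes
            return false;
        } catch (ConcurrentModificationException e) {
            return true;                                  // fail-fast iterator detected it
        }
    }

    public static void main(String[] args) {
        System.out.println("ConcurrentModificationException thrown: " + simulateRace());
    }
}
```

In a real topology the two sides run on different threads, so the exception appears only intermittently, which is consistent with it showing up once the spout emits faster.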

I would like to have one object which passes through the pipeline and is
updated by each bolt, because at the end of the stream, in bolt D, I would
like to have all the accumulated data.
Since emitting mutable objects (and modifying them after the emit) is not
safe in Storm, I tried to solve this with an immutable wrapper class that
includes a deep clone.
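One alternative to a deep clone is a copy-on-write immutable value: each bolt never touches the instance it received and instead emits a new instance with its own result appended. This is a swapped-in technique, not the wrapper described above, and the class `Accumulated` and its `with` method are hypothetical. Because the entries are immutable Strings, only the list itself is copied (a shallow copy), not a deep clone of every field.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical immutable accumulator: bolts create a NEW instance instead of
// mutating the received one, so a tuple still being serialized never changes.
public final class Accumulated {
    private final List<String> results;

    private Accumulated(List<String> results) {
        // Wrap a list that no other code holds a reference to.
        this.results = Collections.unmodifiableList(results);
    }

    public static Accumulated empty() {
        return new Accumulated(new ArrayList<>());
    }

    // Copy-on-write: shallow-copies the existing entries and appends one more.
    public Accumulated with(String result) {
        List<String> next = new ArrayList<>(results);
        next.add(result);
        return new Accumulated(next);
    }

    public List<String> results() {
        return results;
    }

    public static void main(String[] args) {
        Accumulated fromA = Accumulated.empty().with("A");
        Accumulated fromB = fromA.with("B");
        // fromA is unchanged after B added its entry.
        System.out.println(fromA.results() + " " + fromB.results()); // [A] [A, B]
    }
}
```

The cost per bolt is proportional to the number of accumulated entries, which for a four-bolt pipeline stays small; a deep clone only becomes necessary if the entries themselves are mutable.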

But I think a deep clone might be a bottleneck. Is there a better solution
for one object passing through the pipeline in Storm?

And how can I achieve an optimal parallelism for this topology?
Assuming that bolt C is 10 times slower than every other bolt, should its
parallelism hint be 10 times higher too?
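A rough way to reason about this, assuming steady state, evenly distributed load, and ignoring queueing, serialization and network overhead, is Little's law: the number of busy executors a bolt needs is about its incoming rate times its execute latency. So for a given rate, yes, the executor count grows in proportion to latency. The method name and the 1000 tuples/s target below are illustrative assumptions.

```java
public class ParallelismEstimate {
    // Rough lower bound (Little's law): busy executors = arrival rate * service time.
    // Assumes steady state and even load; ignores queueing, serialization and
    // network overhead.
    static int requiredExecutors(double tuplesPerSecond, double executeLatencyMs) {
        return (int) Math.ceil(tuplesPerSecond * executeLatencyMs / 1000.0);
    }

    public static void main(String[] args) {
        double rate = 1000.0; // hypothetical target throughput in tuples/s
        System.out.println("A, B, D: " + requiredExecutors(rate, 0.5));  // 1
        System.out.println("C:       " + requiredExecutors(rate, 10.0)); // 10
    }
}
```

Two caveats: rounding up means the fast bolts always get at least one executor, so the ratio of hints need not equal the latency ratio; and with the measured latencies above (0.5 ms versus 10 ms) the factor is 20 rather than 10.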

For example, if I used 8-core *m3.2xlarge* nodes on Amazon EC2, then
according to [2] I should use 8 executors per machine for my topology.
If I want a parallelism hint of 10 for each of the fast bolts A, B and D
and 100 for the slow bolt C, how can I realize that?

topologyBuilder.setBolt("A", new A(), 10);
topologyBuilder.setBolt("B", new B(), 10);
topologyBuilder.setBolt("C", new C(), 100);
topologyBuilder.setBolt("D", new D(), 10);
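To see what those hints imply for cluster size, the arithmetic below adds up the executors and divides by 8 per machine, taking the one-executor-per-core reading of [2] as an assumption. The spout parallelism of 1 is hypothetical, since the original does not state it, and the class and method names are illustrative only.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class ClusterSizing {
    // Sum of all parallelism hints (each hint = initial number of executors).
    static int totalExecutors(Map<String, Integer> hints) {
        int total = 0;
        for (int h : hints.values()) {
            total += h;
        }
        return total;
    }

    // Integer ceiling division: machines needed at a fixed executors-per-machine.
    static int machinesNeeded(int executors, int executorsPerMachine) {
        return (executors + executorsPerMachine - 1) / executorsPerMachine;
    }

    public static void main(String[] args) {
        Map<String, Integer> hints = new LinkedHashMap<>();
        hints.put("S", 1);   // spout parallelism: hypothetical
        hints.put("A", 10);
        hints.put("B", 10);
        hints.put("C", 100);
        hints.put("D", 10);
        int executors = totalExecutors(hints);
        // Assumes 8 executors per 8-core node, as read from the FAQ.
        System.out.println(executors + " executors -> "
                + machinesNeeded(executors, 8) + " machines"); // 131 executors -> 17 machines
    }
}
```

Under that assumption the proposed hints would need 17 nodes; the number of worker processes across those nodes is configured separately from the hints (for example via `Config.setNumWorkers`).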


Thanks for your help!
Best regards
Martin

[1] https://storm.apache.org/documentation/Troubleshooting.html
[2] https://storm.apache.org/documentation/FAQ.html