Posted to user@storm.apache.org by Martin Burian <ma...@gmail.com> on 2015/08/18 12:04:15 UTC

Topology execution synchronization

Good noon to everyone,
I have encountered problems with synchronization of component execution.

When the topology starts, the spout starts emitting tuples before the other
components are prepared (the bolts recover their state from Redis, which
takes a considerable amount of time). The emitted tuples that cannot be
processed are buffered in memory, fill the heap, and cause constant GC and
a slow, painful death for the worker process. Is it meant to be like this?

In the second case, a worker running part of the topology (not the spout)
dies. The spout keeps working even though it knows the other worker is
dead, and drops messages to it instead of waiting for it to come back up.

We have solved the first one by making the spout wait until all the bolts
come up, but the second problem would require a noticeable amount of work.
Are these behaviors intended? Can they be changed?
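A rough sketch of that workaround, in plain Java with made-up names: each
bolt reports readiness once its slow state recovery is done, and the spout
emits nothing from nextTuple() until every bolt has reported. In the real
topology the counter has to live in shared storage such as Redis or
ZooKeeper, since spout and bolts run in different JVMs.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Illustrative only; class and method names are invented for this sketch.
class ReadinessGate {
    private final int expectedBolts;
    private final AtomicInteger readyBolts = new AtomicInteger(0);

    ReadinessGate(int expectedBolts) { this.expectedBolts = expectedBolts; }

    // Called by a bolt at the end of its prepare()/state recovery.
    void markReady() { readyBolts.incrementAndGet(); }

    // Polled by the spout at the top of nextTuple(); while this is
    // false, nextTuple() returns without emitting anything.
    boolean allReady() { return readyBolts.get() >= expectedBolts; }
}

public class GateDemo {
    public static void main(String[] args) {
        ReadinessGate gate = new ReadinessGate(2);
        System.out.println(gate.allReady()); // false: no bolt ready yet
        gate.markReady();
        gate.markReady();
        System.out.println(gate.allReady()); // true: spout may start emitting
    }
}
```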

Thanks for replies in advance,
Martin B

Re: Topology execution synchronization

Posted by Kishore Senji <ks...@gmail.com>.
Hi Martin,

Use the "topology.max.spout.pending" config. It will solve both of the
problems. Storm does not have back pressure, but this setting acts as a
throttling valve on the spout: it caps how many emitted tuples may be
un-acked at once. A reasonable value is up to twice the maximum throughput
you expect out of your system.
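For reference, the setting can go in storm.yaml or on the topology's
Config object; the topology name, builder, and the value 5000 below are
just placeholders (Config lives in backtype.storm in classic Storm,
org.apache.storm in later releases):

```java
// Cap un-acked tuples per spout task at 5000 (example value only).
Config conf = new Config();
conf.setMaxSpoutPending(5000);  // same as topology.max.spout.pending
StormSubmitter.submitTopology("my-topology", conf, builder.createTopology());
```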

On Tue, Aug 18, 2015 at 8:14 AM Pasquini, Reuben <re...@hp.com>
wrote:


RE: Topology execution synchronization

Posted by "Pasquini, Reuben" <re...@hp.com>.
Hi Martin,

Initialize your bolts in the ‘prepare’ method – the topology will not go live until all the bolts have completed their preparation.

I believe the topology will automatically throttle calls to a spout’s ‘nextTuple’ method if the rest of the topology does not ‘ACK’ the tuples – be sure that your bolts are not blindly ‘acking’ tuples that have not completed processing in the bolt.
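The throttling described above (nextTuple is not called while too many
tuples are un-acked) can be modelled outside Storm as a counting
semaphore: one permit per in-flight tuple, returned only from ack() or
fail(). The names below are illustrative, not Storm's API; the sketch also
shows why blindly acking defeats the mechanism, since permits would come
back before the work is actually done.

```java
import java.util.concurrent.Semaphore;

// Plain-Java model of topology.max.spout.pending: at most maxPending
// tuples may be un-acked at any moment.
class PendingThrottle {
    private final Semaphore permits;

    PendingThrottle(int maxPending) { this.permits = new Semaphore(maxPending); }

    // Called before emitting from nextTuple(); false means "emit
    // nothing this round", which is how the spout gets throttled.
    boolean tryEmit() { return permits.tryAcquire(); }

    // Called from the spout's ack()/fail() callbacks: the permit comes
    // back only once downstream bolts finished (or failed) the tuple.
    void onAckOrFail() { permits.release(); }
}

public class ThrottleDemo {
    public static void main(String[] args) {
        PendingThrottle t = new PendingThrottle(2);
        System.out.println(t.tryEmit()); // true
        System.out.println(t.tryEmit()); // true
        System.out.println(t.tryEmit()); // false: cap reached, spout idles
        t.onAckOrFail();                 // one tuple acked downstream
        System.out.println(t.tryEmit()); // true again
    }
}
```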



From: Martin Burian [mailto:martin.burianjr@gmail.com]
Sent: Tuesday, August 18, 2015 5:04 AM
To: user@storm.apache.org
Subject: Topology execution synchronization
