
Posted to user@storm.apache.org by preethini v <pr...@gmail.com> on 2017/08/10 13:06:36 UTC

Storm latency with fail() | acker tasks die

Hi,

I have a situation where the bolts ack, but the acker tasks fail (which is
expected as per my logic).

I am measuring the latency of the topology using timestamps in ack() and
fail() methods.

*Observations:*
*-------------------------------------*
*ack() - latency ~ 100ms*

*fail() - latency ~ 15000ms*

I have set *topology.message.timeout.secs to 10*.

That means there is a 10s timeout before fail() is called. But that still
leaves 15000 - 10000 = 5000ms unaccounted for, which is a large value.


*1. What are the reasons for such high latency before fail() is called?*

*2. What other time factors contribute to latency apart from the timeout? Any
ideas?*

Thanks,
Preethini

Re: Storm latency with fail() | acker tasks die

Posted by preethini v <pr...@gmail.com>.

Hi,

Which newer version do you mean? I am working with Storm version 1.1.0.


Thanks,
Preethini

On Thu, Aug 10, 2017 at 3:36 PM, Bobby Evans <ev...@yahoo-inc.com> wrote:

> What version of Storm are you using? In older versions of Storm the
> timeout check was done once every topology.message.timeout.secs. That
> means nothing will time out sooner than topology.message.timeout.secs,
> but in the worst case it could take almost 2x that. If I remember
> correctly, in newer versions of Storm we have adjusted it to check more
> frequently, but I don't know the JIRA off the top of my head.
>
> - Bobby

Re: Storm latency with fail() | acker tasks die

Posted by Stig Rohde Døssing <sr...@apache.org>.

I think the spout still checks once every topology.message.timeout.secs,
which means a tuple will time out between topology.message.timeout.secs and
2*topology.message.timeout.secs after being emitted.

The spout times out tuples by putting emitted message IDs in a rotating map
with 2 buckets. A newly emitted message ID that has not been acked is failed
by the spout once the map has rotated twice. See
https://github.com/apache/storm/blob/90ca7fa0c8e73a1884c70e2d3da3388b24d13db0/storm-client/src/jvm/org/apache/storm/executor/spout/SpoutExecutor.java#L97
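To make the two-bucket mechanism concrete, here is a minimal Python sketch of a rotating pending map. It is a simplified, hypothetical analogue written for illustration, not a port of Storm's RotatingMap class: the class name `SimpleRotatingMap` and its methods are assumptions. Acked message IDs are removed from whichever bucket holds them, and each `rotate()` call (one tick) drops the oldest bucket and returns its entries as the tuples to fail.

```python
from collections import deque


class SimpleRotatingMap:
    """Hypothetical, simplified pending-tuple map with a fixed number of buckets.

    buckets[0] is the newest bucket; buckets[-1] is the oldest.
    """

    def __init__(self, num_buckets: int = 2):
        self.buckets = deque({} for _ in range(num_buckets))

    def put(self, msg_id, info):
        # Newly emitted message IDs always go into the newest bucket.
        self.buckets[0][msg_id] = info

    def remove(self, msg_id):
        # Called when a tuple tree is fully acked: the ID no longer needs a timeout.
        for bucket in self.buckets:
            if msg_id in bucket:
                return bucket.pop(msg_id)
        return None

    def rotate(self):
        # Called once per tick: drop the oldest bucket, add a fresh newest one,
        # and return the dropped entries so the caller can fail() them.
        expired = self.buckets.pop()
        self.buckets.appendleft({})
        return expired


pending = SimpleRotatingMap()
pending.put("msg-1", "tuple info")
print(pending.rotate())  # {}  (first tick: msg-1 now sits in the oldest bucket)
print(pending.rotate())  # {'msg-1': 'tuple info'}  (second tick: msg-1 expires)
```

With two buckets, an unacked ID always survives exactly one rotation and expires on the next, which is why the timeout is measured in whole tick periods rather than from the moment of emission.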

The map rotates when the spout receives a tick tuple, which happens once every
topology.message.timeout.secs. See
https://github.com/apache/storm/blob/4c8a986f519cdf3e63bed47e9c4f723e4867267a/storm-client/src/jvm/org/apache/storm/daemon/StormCommon.java#L297
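The timing consequence of rotating only on ticks can be checked with a small simulation. This is a sketch under idealized assumptions: ticks arrive exactly at multiples of topology.message.timeout.secs, there is no queueing or processing delay, and emits are spread uniformly over time. The function names `fail_latency` and `simulate` are hypothetical. A never-acked tuple waits for the next tick to be moved into the oldest bucket, then one more full period before it expires.

```python
import math
import random


def fail_latency(emit_time: float, timeout_secs: float) -> float:
    """Seconds from emit until a never-acked tuple is failed by the spout.

    Idealized model (assumption): tick tuples arrive exactly at
    t = k * timeout_secs and each tick rotates the two-bucket pending map.
    """
    # First rotation: the next tick moves the tuple into the oldest bucket.
    next_tick = (math.floor(emit_time / timeout_secs) + 1) * timeout_secs
    # Second rotation, one full period later, expires it and triggers fail().
    return next_tick + timeout_secs - emit_time


def simulate(timeout_secs: float = 10.0, n: int = 100_000, seed: int = 42):
    """Return (min, max, mean) fail latency for uniformly spread emit times."""
    rng = random.Random(seed)
    latencies = [
        fail_latency(rng.uniform(0.0, 1000 * timeout_secs), timeout_secs)
        for _ in range(n)
    ]
    return min(latencies), max(latencies), sum(latencies) / n


if __name__ == "__main__":
    lo, hi, mean = simulate()
    # Expect lo just above 10 s, hi at most 20 s, and mean close to 15 s.
    print(f"min={lo:.2f}s max={hi:.2f}s mean={mean:.2f}s")
```

A tuple emitted right after a tick waits almost 2x the timeout, while one emitted just before a tick waits only slightly more than 1x, so individual fail() latencies are spread across that whole range.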

I think it is expected that you will see an average latency of 1.5x
topology.message.timeout.secs. If tuples are added to the pending map evenly
over time, a new tuple lands in the newest bucket on average 0.5x
topology.message.timeout.secs after that bucket was created. It stays there
for the remaining 0.5x until the first rotation, and then the bucket survives
another 1x topology.message.timeout.secs until the second rotation fails it.
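The 1.5x average can also be written as a short calculation. Assume (idealized) that ticks arrive exactly every T = topology.message.timeout.secs and that a tuple is emitted U seconds after the newest bucket was created, with U uniform on [0, T). The tuple waits T - U for the first rotation and T more for the second:

```latex
L = (T - U) + T = 2T - U, \qquad U \sim \mathrm{Uniform}[0, T)
\quad\Rightarrow\quad T < L \le 2T, \qquad
\mathbb{E}[L] = 2T - \mathbb{E}[U] = 2T - \tfrac{T}{2} = 1.5\,T
```

With T = 10 s this gives an expected 15 s, consistent with the ~15000ms fail() latency in the original post.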

2017-08-10 15:36 GMT+02:00 Bobby Evans <ev...@yahoo-inc.com>:

> What version of Storm are you using? In older versions of Storm the
> timeout check was done once every topology.message.timeout.secs. That
> means nothing will time out sooner than topology.message.timeout.secs,
> but in the worst case it could take almost 2x that. If I remember
> correctly, in newer versions of Storm we have adjusted it to check more
> frequently, but I don't know the JIRA off the top of my head.
>
> - Bobby

Re: Storm latency with fail() | acker tasks die

Posted by Bobby Evans <ev...@yahoo-inc.com>.

What version of Storm are you using? In older versions of Storm the timeout check was done once every topology.message.timeout.secs. That means nothing will time out sooner than topology.message.timeout.secs, but in the worst case it could take almost 2x that. If I remember correctly, in newer versions of Storm we have adjusted it to check more frequently, but I don't know the JIRA off the top of my head.

- Bobby


On Thursday, August 10, 2017, 8:06:51 AM CDT, preethini v <pr...@gmail.com> wrote:

> Hi,
>
> I have a situation where the bolts ack, but the acker tasks fail (which is
> expected as per my logic).
>
> I am measuring the latency of the topology using timestamps in ack() and
> fail() methods.
>
> Observations:
> -------------------------------------
> ack() - latency ~ 100ms
> fail() - latency ~ 15000ms
>
> I have set topology.message.timeout.secs to 10.
>
> That means there is a 10s timeout before fail() is called. But that still
> leaves 15000 - 10000 = 5000ms unaccounted for, which is a large value.
>
> 1. What are the reasons for such high latency before fail() is called?
> 2. What other time factors contribute to latency apart from the timeout?
> Any ideas?
>
> Thanks,
> Preethini