Posted to issues@storm.apache.org by "Jungtaek Lim (JIRA)" <ji...@apache.org> on 2017/10/23 13:50:00 UTC

[jira] [Commented] (STORM-2786) Ackers leak tracking info on failure and lots of other cases.

    [ https://issues.apache.org/jira/browse/STORM-2786?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16215158#comment-16215158 ] 

Jungtaek Lim commented on STORM-2786:
-------------------------------------

Great finding!

There are generally two kinds of failed tuples: explicitly failed and timed out. Explicitly failed tuples are removed from the map via https://github.com/apache/storm/blob/124acb92dff04a57b530ab4d95a698abc8ff46d9/storm-client/src/jvm/org/apache/storm/daemon/Acker.java#L120-L122 but timed-out tuples are still leaked because of what you observed.
The spout handles timed-out tuples independently, so the only symptom is a memory leak in the acker, which is why users are unlikely to notice it.
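
To make the mechanism concrete, here is a minimal Java sketch of a tick-rotated pending map. It is not the actual Acker or RotatingMap code: the class RotatingPendingSketch and its methods are made up for illustration, and the bucket layout is an assumption. It only shows that if rotate() is never called (no tick tuples reach the acker), entries that are never explicitly removed stay in the map forever.

```java
import java.util.HashMap;
import java.util.LinkedList;
import java.util.Map;

/**
 * Hypothetical, simplified stand-in for the acker's pending map.
 * Entries expire only when rotate() is called, which is assumed to happen on tick tuples.
 */
class RotatingPendingSketch {
    // Newest bucket first, oldest bucket last.
    private final LinkedList<Map<Long, Long>> buckets = new LinkedList<>();

    RotatingPendingSketch(int numBuckets) {
        for (int i = 0; i < numBuckets; i++) {
            buckets.add(new HashMap<>());
        }
    }

    /** Start (or refresh) tracking of a tuple tree in the newest bucket. */
    void track(long rootId, long ackVal) {
        for (Map<Long, Long> bucket : buckets) {
            bucket.remove(rootId);
        }
        buckets.getFirst().put(rootId, ackVal);
    }

    /** Explicit cleanup, as done for a fully acked or explicitly failed tuple. */
    void remove(long rootId) {
        for (Map<Long, Long> bucket : buckets) {
            bucket.remove(rootId);
        }
    }

    /** Called on a tick: drops the oldest bucket and returns how many entries expired. */
    int rotate() {
        Map<Long, Long> expired = buckets.removeLast();
        buckets.addFirst(new HashMap<>());
        return expired.size();
    }

    /** Total number of tuple trees still being tracked. */
    int size() {
        int total = 0;
        for (Map<Long, Long> bucket : buckets) {
            total += bucket.size();
        }
        return total;
    }

    public static void main(String[] args) {
        // No ticks: timed-out tuples are never expired, so the map only grows.
        RotatingPendingSketch noTicks = new RotatingPendingSketch(2);
        for (long id = 0; id < 1000; id++) {
            noTicks.track(id, id);
        }
        noTicks.remove(0L); // only explicit fail/ack removes an entry
        System.out.println("without ticks: " + noTicks.size());

        // With ticks: after numBuckets rotations the stale entries are gone.
        RotatingPendingSketch withTicks = new RotatingPendingSketch(2);
        for (long id = 0; id < 1000; id++) {
            withTicks.track(id, id);
        }
        withTicks.rotate();
        withTicks.rotate();
        System.out.println("with ticks: " + withTicks.size());
    }
}
```

In this sketch the explicit remove() path corresponds to the explicit-fail case above, and the missing rotate() calls correspond to the timed-out case.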

> Ackers leak tracking info on failure and lots of other cases.
> -------------------------------------------------------------
>
>                 Key: STORM-2786
>                 URL: https://issues.apache.org/jira/browse/STORM-2786
>             Project: Apache Storm
>          Issue Type: Bug
>          Components: storm-client, storm-core
>    Affects Versions: 0.9.1-incubating, 0.10.0, 1.0.0, 2.0.0
>            Reporter: Robert Joseph Evans
>            Assignee: Robert Joseph Evans
>            Priority: Critical
>
> Over the weekend we had an incident where ackers were running out of memory at a really scary rate.  It turns out that they were having a lot of failures, for an unrelated reason, but each of the failures was resulting in tuple tracking being lost because... 
> We don't send ticks to any system components ever...
> https://github.com/apache/storm/blob/124acb92dff04a57b530ab4d95a698abc8ff46d9/storm-client/src/jvm/org/apache/storm/executor/Executor.java#L384
> and ackers are system components.
> So the tracking map was never rotated and all failed tuples
> https://github.com/apache/storm/blob/124acb92dff04a57b530ab4d95a698abc8ff46d9/storm-client/src/jvm/org/apache/storm/daemon/Acker.java#L97-L103
> were never deleted from the map.
> This leak eventually made the ackers crash, and when they came back up the other components kept blasting them with messages that would never be fully acked which also leaked because of the tick problem.
> Looking back this has been in every release since 0.9.1-incubating.  It appears to have been introduced by https://github.com/apache/storm/commit/483ce454a3b2cd31b5d1c34e9365346459b358a8
> So every Apache release has this problem (which is the only reason I have not marked this as a blocker, because apparently it is not so bad that anyone has noticed in the past 4 years).



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)