Posted to issues@storm.apache.org by "ASF GitHub Bot (JIRA)" <ji...@apache.org> on 2017/10/23 18:19:00 UTC

[jira] [Updated] (STORM-2786) Ackers leak tracking info on failure and lots of other cases.

     [ https://issues.apache.org/jira/browse/STORM-2786?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

ASF GitHub Bot updated STORM-2786:
----------------------------------
    Labels: pull-request-available  (was: )

> Ackers leak tracking info on failure and lots of other cases.
> -------------------------------------------------------------
>
>                 Key: STORM-2786
>                 URL: https://issues.apache.org/jira/browse/STORM-2786
>             Project: Apache Storm
>          Issue Type: Bug
>          Components: storm-client, storm-core
>    Affects Versions: 0.9.1-incubating, 0.10.0, 1.0.0, 2.0.0
>            Reporter: Robert Joseph Evans
>            Assignee: Robert Joseph Evans
>            Priority: Critical
>              Labels: pull-request-available
>
> Over the weekend we had an incident where ackers were running out of memory at a really scary rate. It turns out that they were seeing a lot of failures, for an unrelated reason, but each of those failures was leaking its tuple-tracking state because...
> We never send ticks to any system component...
> https://github.com/apache/storm/blob/124acb92dff04a57b530ab4d95a698abc8ff46d9/storm-client/src/jvm/org/apache/storm/executor/Executor.java#L384
> and ackers are system components.
> So the acker's tracking map was never rotated, and all failed tuples
> https://github.com/apache/storm/blob/124acb92dff04a57b530ab4d95a698abc8ff46d9/storm-client/src/jvm/org/apache/storm/daemon/Acker.java#L97-L103
> were never deleted from the map.
> This leak eventually crashed the ackers. When they came back up, the other components kept flooding them with messages for tuple trees that would never be fully acked, and those entries leaked too because of the same tick problem.
> Looking back, this bug has been present in every release since 0.9.1-incubating. It appears to have been introduced by https://github.com/apache/storm/commit/483ce454a3b2cd31b5d1c34e9365346459b358a8
> So every Apache release has this problem. That is the only reason I have not marked this as a blocker: apparently it is not bad enough that anyone has noticed in the past 4 years.
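The first cause in the report is that tick tuples are never sent to system components, and the acker is one of them. The sketch below models that decision as a simplified function. It is a hypothetical model, not Storm's actual `Executor` code. The class `TickPolicy` and its methods are illustrative names, and the sketch assumes two facts about component ids: system component ids start with `__`, and the acker's id is `__acker`. The "fixed" variant shows one possible way to close the gap. The patch actually merged for this issue may differ.

```java
import java.util.List;

// Hypothetical, simplified model of the executor-side tick decision described
// in the report; NOT Storm's actual Executor code.
class TickPolicy {
    // Assumption for this sketch: the acker's component id.
    static final String ACKER_ID = "__acker";

    // Assumption for this sketch: system component ids start with "__".
    static boolean isSystemId(String componentId) {
        return componentId.startsWith("__");
    }

    // Behavior described in the report: no system component ever gets ticks,
    // so the acker (itself a system component) is skipped too.
    static boolean sendsTicksBuggy(String componentId) {
        return !isSystemId(componentId);
    }

    // One possible fix: exempt the acker from the system-component rule.
    static boolean sendsTicksFixed(String componentId) {
        return ACKER_ID.equals(componentId) || !isSystemId(componentId);
    }

    public static void main(String[] args) {
        for (String id : List.of("word-count-bolt", "__acker", "__system")) {
            System.out.printf("%-16s buggy=%b fixed=%b%n",
                    id, sendsTicksBuggy(id), sendsTicksFixed(id));
        }
    }
}
```

Keeping the exemption narrow means the other system components still behave as before. Only the acker, whose pending map depends on ticks for cleanup, starts receiving them.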
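The second part of the explanation is that the acker's pending map only forgets failed entries when it is rotated, and ticks drive that rotation. The sketch below models this with two pieces, both hypothetical and written for illustration. The first is `BucketedMap`, which is loosely inspired by the idea behind Storm's `RotatingMap` but does not reproduce its real API. The second is a toy acker loop, `AckerLeakDemo.pendingAfter`, which records failed tuple trees and never removes them explicitly, as the report describes for the real acker.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.Map;

// Hypothetical bucketed map loosely modeled on the idea behind Storm's
// RotatingMap; class and method names are assumptions for this sketch.
class BucketedMap<K, V> {
    private final Deque<Map<K, V>> buckets = new ArrayDeque<>();

    BucketedMap(int numBuckets) {
        if (numBuckets < 2) {
            throw new IllegalArgumentException("need at least 2 buckets");
        }
        for (int i = 0; i < numBuckets; i++) {
            buckets.addFirst(new HashMap<>());
        }
    }

    // Inserts (or refreshes) an entry so it lives only in the newest bucket.
    void put(K key, V value) {
        for (Map<K, V> bucket : buckets) {
            bucket.remove(key);
        }
        buckets.peekFirst().put(key, value);
    }

    // Drops the oldest bucket (expiring its entries) and opens a fresh one.
    Map<K, V> rotate() {
        Map<K, V> expired = buckets.removeLast();
        buckets.addFirst(new HashMap<>());
        return expired;
    }

    int size() {
        int total = 0;
        for (Map<K, V> bucket : buckets) {
            total += bucket.size();
        }
        return total;
    }
}

class AckerLeakDemo {
    // Records `failures` failed tuple trees in a toy acker map. Failed entries
    // are never removed explicitly, so only rotate() (driven by ticks) expires
    // them. tickEvery <= 0 models an acker that never receives ticks.
    static int pendingAfter(int failures, int tickEvery) {
        BucketedMap<Long, Boolean> pending = new BucketedMap<>(2);
        for (long rootId = 0; rootId < failures; rootId++) {
            pending.put(rootId, Boolean.TRUE); // TRUE marks the tree as failed
            if (tickEvery > 0 && (rootId + 1) % tickEvery == 0) {
                pending.rotate();
            }
        }
        return pending.size();
    }

    public static void main(String[] args) {
        System.out.println("no ticks:   " + pendingAfter(1000, 0));
        System.out.println("with ticks: " + pendingAfter(1000, 100));
    }
}
```

Under these assumptions, `pendingAfter(1000, 0)` returns 1000 because every failed entry stays in memory. By contrast, `pendingAfter(1000, 100)` returns 100. With two buckets, each entry is expired by the second rotation after it is inserted, so the map never holds more than about two tick intervals' worth of entries.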



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)