Posted to dev@storm.apache.org by "Robert Joseph Evans (JIRA)" <ji...@apache.org> on 2014/06/16 19:44:01 UTC

[jira] [Commented] (STORM-339) Severe memory leak to OOM when ackers disabled

    [ https://issues.apache.org/jira/browse/STORM-339?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14032672#comment-14032672 ] 

Robert Joseph Evans commented on STORM-339:
-------------------------------------------

This is not specific to the Netty transport, although Netty does not help because it buffers a lot more than the zeromq code did.  If you don't have acking enabled there is no flow control in Storm, so if you have not properly sized your components, tuples will pile up in memory and the worker will eventually OOM or be shot by the supervisor because GC took too long and the heartbeats stopped coming.

I'm not sure there is a really good way to fix this completely without acking.
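
As a point of reference, here is a minimal sketch of the acking-based workaround, assuming Storm 0.9.x ({{backtype.storm}} packages) and illustrative values: turn ackers back on and cap in-flight tuples with {{topology.max.spout.pending}}, which only throttles tuples emitted with a message ID.

{code}
import backtype.storm.Config;

Config conf = new Config();
// 0 ackers disables acking entirely; at least 1 is needed for flow control.
conf.setNumAckers(1);
// With acking on, at most 1000 un-acked tuples per spout task are in flight,
// so a slow bolt back-pressures the spout instead of filling transport buffers.
conf.setMaxSpoutPending(1000);
{code}

Note that {{max.spout.pending}} has no effect on tuples emitted without a message ID, since those are never tracked by the ackers.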

> Severe memory leak to OOM when ackers disabled
> ----------------------------------------------
>
>                 Key: STORM-339
>                 URL: https://issues.apache.org/jira/browse/STORM-339
>             Project: Apache Storm (Incubating)
>          Issue Type: Bug
>    Affects Versions: 0.9.2-incubating
>            Reporter: Jiahong Li
>            Priority: Critical
>
> Without any ackers enabled, a fast component will continuously leak memory, causing OOM problems when the target component is slow. The OOM problem can be reproduced by running this fast-slow-topology:
> https://github.com/Gvain/storm-perf-test/tree/fast-slow-topology
> with command:
> {code}
> $ storm jar storm_perf_test-1.0.0-SNAPSHOT-jar-with-dependencies.jar com.yahoo.storm.perftest.Main --spout 1 --bolt 1 --workers 2 --testTime 600 --messageSize 6400
> {code}
> The worker childopts were set to {{-Xms2g -Xmx2g -Xmn512m ...}}.
> At the same time, the executed count of the target component falls far behind the emitted count of the source component. I guess the Netty client is buffering too many messages in its message_queue because the target component sends back OK/Failure responses too slowly.
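
For illustration, a hypothetical sketch of the fast-spout/slow-bolt shape described above, assuming Storm 0.9.x ({{backtype.storm}}) APIs; the linked fast-slow-topology branch is the actual reproducer. The spout emits unanchored tuples (no message ID), so nothing is ever acked and nothing limits how far it can run ahead of the bolt:

{code}
import java.util.Map;
import backtype.storm.spout.SpoutOutputCollector;
import backtype.storm.task.OutputCollector;
import backtype.storm.task.TopologyContext;
import backtype.storm.topology.OutputFieldsDeclarer;
import backtype.storm.topology.base.BaseRichBolt;
import backtype.storm.topology.base.BaseRichSpout;
import backtype.storm.tuple.Fields;
import backtype.storm.tuple.Tuple;
import backtype.storm.tuple.Values;

// FastSpout.java: emits a ~6400-byte payload every time nextTuple() is
// called, with no message ID, so the tuple is untracked and cannot be throttled.
public class FastSpout extends BaseRichSpout {
    private SpoutOutputCollector collector;
    public void open(Map conf, TopologyContext ctx, SpoutOutputCollector c) { collector = c; }
    public void nextTuple() { collector.emit(new Values(new byte[6400])); }
    public void declareOutputFields(OutputFieldsDeclarer d) { d.declare(new Fields("payload")); }
}

// SlowBolt.java: sleeps on every tuple; its receive queue (and the sender's
// Netty buffers) grow without bound because nothing pushes back on the spout.
public class SlowBolt extends BaseRichBolt {
    private OutputCollector collector;
    public void prepare(Map conf, TopologyContext ctx, OutputCollector c) { collector = c; }
    public void execute(Tuple t) {
        try { Thread.sleep(10); } catch (InterruptedException e) { Thread.currentThread().interrupt(); }
        collector.ack(t);  // a no-op when ackers are disabled
    }
    public void declareOutputFields(OutputFieldsDeclarer d) { }
}
{code}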


