Posted to user@storm.apache.org by Tim Fendt <Ti...@virginpulse.com> on 2017/05/01 18:55:58 UTC

Disruptor Queue Filling Memory

We have been having an issue where, after about a week of running, the old gen on the JVM has trouble freeing space. I generated a heap dump during the last incident and found it to be filled with DisruptorQueue objects. Is there a memory leak in the disruptor queue, or is there some configuration we are missing? We are running Storm version 1.0.2.

org.apache.storm.utils.DisruptorQueue$ThreadLocalBatcher and org.apache.storm.utils.DisruptorQueue classes fill the memory.
https://puu.sh/vCkQE/cda1f319ad.png
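For anyone reproducing this kind of analysis, a minimal hedged sketch of capturing a heap dump from a running worker with standard JDK tools (the PID, output path, and grep pattern are placeholders; the worker main class name can vary by Storm version):

# list JVMs and their main classes to find the worker process
jps -lm | grep -i worker

# dump the live heap of that worker (PID and output file are placeholders)
jmap -dump:live,format=b,file=/tmp/worker-heap.hprof <worker-pid>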

This is our config for the supervisors:
storm.local.dir: "/var/storm-local"
storm.zookeeper.servers:
    - "10.0.0.5"
storm.zookeeper.port: 2181

nimbus.seeds: ["10.0.0.6"]

supervisor.slots.ports:
    - 6700

worker.childopts: "-Xms3072m -Xmx3072m"


Thanks,

--
Tim

Confidentiality Notice: The information contained in this e-mail, including any attachment(s), is intended solely for use by the designated recipient(s). Unauthorized use, dissemination, distribution, or reproduction of this message by anyone other than the intended recipient(s), or a person designated as responsible for delivering such messages to the intended recipient, is strictly prohibited and may be unlawful. This e-mail may contain proprietary, confidential or privileged information. Any views or opinions expressed are solely those of the author and do not necessarily represent those of Virgin Pulse, Inc. If you have received this message in error, or are not the named recipient(s), please immediately notify the sender and delete this e-mail message.

Re: Disruptor Queue Filling Memory

Posted by Tim Fendt <Ti...@virginpulse.com>.
When we build a spout we set the max spout pending:

builder.setSpout(spoutConfig.getId(), spoutConfig.getSpout(), spoutConfig.getParallelismHint()).setMaxSpoutPending(spoutConfig.getMaxSpoutPending());

Here is our full topology builder code for reference:

TopologyBuilder builder = new TopologyBuilder();

for (TopologySetup topologySetup : topologySetupList) {
    if (topologySetup.getSpoutConfiguration() != null) {
        SpoutConfiguration spoutConfig = topologySetup.getSpoutConfiguration();
        builder.setSpout(spoutConfig.getId(), spoutConfig.getSpout(), spoutConfig.getParallelismHint()).setMaxSpoutPending(spoutConfig.getMaxSpoutPending());
    }

    BoltConfiguration boltConfig = topologySetup.getBoltConfiguration();

    if (boltConfig.isShuffleGrouping()) {
        builder.setBolt(boltConfig.getId(), boltConfig.getBolt(), boltConfig.getParallelismHint()).shuffleGrouping(boltConfig.getReadTuplesFrom()).setNumTasks(boltConfig.getTasks());
    } else if (boltConfig.isFieldGrouping()) {
        builder.setBolt(boltConfig.getId(), boltConfig.getBolt(), boltConfig.getParallelismHint()).fieldsGrouping(boltConfig.getReadTuplesFrom(), boltConfig.getFields()).setNumTasks(boltConfig.getTasks());
    }
}

The setMaxSpoutPending call comes from SpoutDeclarer, which inherits it from ComponentConfigurationDeclarer.

Are you saying this actually doesn’t do anything?
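For reference, here is a minimal hedged sketch (standard org.apache.storm Config/StormSubmitter API; the class and topology names are placeholders) of the topology-wide way to set the same limits at submit time, which is what the reply quoted below suggests doing with -c flags:

import org.apache.storm.Config;
import org.apache.storm.StormSubmitter;
import org.apache.storm.topology.TopologyBuilder;

public class SubmitWithLimits {
    public static void main(String[] args) throws Exception {
        TopologyBuilder builder = new TopologyBuilder();
        // ... setSpout()/setBolt() calls as in the snippet above ...

        Config conf = new Config();
        conf.setNumAckers(1);            // topology.acker.executors
        conf.setMaxSpoutPending(10000);  // topology.max.spout.pending (topology-level cap)

        // "memory-issue-topology" is a placeholder name
        StormSubmitter.submitTopology("memory-issue-topology", conf, builder.createTopology());
    }
}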

Thanks,

--
Tim


From: Roshan Naik <ro...@hortonworks.com>
Reply-To: "user@storm.apache.org" <us...@storm.apache.org>
Date: Tuesday, May 2, 2017 at 4:54 PM
To: "user@storm.apache.org" <us...@storm.apache.org>
Subject: Re: Disruptor Queue Filling Memory

Tim,
Your max spout pending is disabled too. I think max spout pending is a topology-wide setting, not a per-spout setting. When submitting the topology via the 'storm jar' cmd, you can provide custom settings using -c. Ex:
  storm jar …..  -c topology.acker.executors=1 -c topology.max.spout.pending=10000

In your case, without ACKers, enabling back-pressure might be the only quick fix. But you will have to live with the occasional stalls… and restart the topos as needed… like Alexandre is doing.
You can try to keep backpressure situations (and consequently the stalling issues) from triggering by identifying which bolt is the bottleneck and seeing whether increasing the parallelism on that bolt helps.

It would be better to enable ACKing, after ensuring your bolts/spouts handle ACKs properly… and then enable topology.max.spout.pending.
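As an illustration of "handling ACKs properly" in a bolt, here is a minimal hedged sketch (class and field names are hypothetical) that anchors its emits to the input tuple and then acks or fails it:

import java.util.Map;
import org.apache.storm.task.OutputCollector;
import org.apache.storm.task.TopologyContext;
import org.apache.storm.topology.OutputFieldsDeclarer;
import org.apache.storm.topology.base.BaseRichBolt;
import org.apache.storm.tuple.Fields;
import org.apache.storm.tuple.Tuple;
import org.apache.storm.tuple.Values;

public class AckingBolt extends BaseRichBolt {
    private OutputCollector collector;

    @Override
    public void prepare(Map conf, TopologyContext context, OutputCollector collector) {
        this.collector = collector;
    }

    @Override
    public void execute(Tuple input) {
        try {
            // anchor the emitted tuple to the input so the ACK tree is tracked
            collector.emit(input, new Values(input.getString(0).toUpperCase()));
            collector.ack(input);
        } catch (Exception e) {
            // fail the tuple so the spout can replay it after the message timeout
            collector.fail(input);
        }
    }

    @Override
    public void declareOutputFields(OutputFieldsDeclarer declarer) {
        declarer.declare(new Fields("word"));
    }
}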

In 2.0 we are planning for a different backpressure model  (https://issues.apache.org/jira/browse/STORM-2310)

I suspect the new model will not make it into 1.x anytime soon. So it would be good to get some movement on STORM-1949 and see if it fixes the stall issue. But I am not in a position to spend much time on it for a couple of weeks.

-roshan


From: Tim Fendt <Ti...@virginpulse.com>
Reply-To: "user@storm.apache.org" <us...@storm.apache.org>
Date: Tuesday, May 2, 2017 at 6:34 AM
To: "user@storm.apache.org" <us...@storm.apache.org>
Subject: Re: Disruptor Queue Filling Memory

Hey Roshan,

Here are our settings:

topology.max.spout.pending: null
topology.acker.executors: null
topology.worker.max.heap.size.mb: 768
worker.heap.memory.mb: 768
topology.backpressure.enable: false
topology.message.timeout.secs: 30
worker.childopts: "-Xmx%HEAP-MEM%m -XX:+PrintGCDetails -Xloggc:artifacts/gc.log -XX:+PrintGCDateStamps -XX:+PrintGCTimeStamps -XX:+UseGCLogFileRotation -XX:NumberOfGCLogFiles=10 -XX:GCLogFileSize=1M -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=artifacts/heapdump"


What is interesting is that worker.childopts is listed incorrectly in the UI. In my yml file I have the following defined for worker.childopts: -Xms3072m -Xmx3072m -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=/home/ubuntu -Dcom.sun.management.jmxremote -Dcom.sun.management.jmxremote.port=5555 -Dcom.sun.management.jmxremote.rmi.port=5555 -Dcom.sun.management.jmxremote.local.only=false -Dcom.sun.management.jmxremote.authenticate=false -Dcom.sun.management.jmxremote.ssl=false -javaagent:/opt/newrelic-java/newrelic/newrelic.jar

I can confirm with other tools that the worker opts defined in the yml file are being applied and the ones listed in the UI are not.
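One quick hedged way to double-check which flags a worker JVM actually received, independent of what the UI shows (PID and grep pattern are placeholders):

# show the full command line of a specific worker process
ps -o args= -p <worker-pid>

# or list all worker JVMs together with the arguments they were launched with
jps -v | grep -i worker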

Also, we set the max spout pending for each spout in code. Do we also have to set it for the topology as a whole? And, as you mentioned, we do not have acking turned on, so does it even matter? We have 9-10 spouts per supervisor; do they all share one disruptor queue, as the heap dump seems to suggest?

Thanks,

--
Tim


From: Alexandre Vermeerbergen <av...@gmail.com>
Reply-To: "user@storm.apache.org" <us...@storm.apache.org>
Date: Tuesday, May 2, 2017 at 7:27 AM
To: "user@storm.apache.org" <us...@storm.apache.org>
Subject: Re: Disruptor Queue Filling Memory

Hello Roshan,
Thanks for the hint.

Regarding the back pressure fix: it looks like the last activity on the associated JIRA (https://issues.apache.org/jira/browse/STORM-1949) was the 1st of September 2016, when Zhuo Liu asked you (and also Alessandro Bellina) to perform some tests on the 2.0 branch... and the JIRA has never been updated since.
It would be great to have some follow-up on this backpressure issue.

In the meantime, I have to make a quick decision about our use of Storm 1.0.3 in production: we have re-enabled backpressure, and so far it's behaving like it did with 1.0.1 (though we have not yet observed workers blocking).

So, between seeing our workers accumulate too much lag and using backpressure that can sometimes block our workers (but we have our self-healing for that), I'll use backpressure with Storm 1.0.3 for the short term.
Our next target is based on Storm 1.1.0, so we will take more time to weigh the alternatives (i.e. keep backpressure, or spend more time searching for bottlenecks & tuning).
Thanks,
Alexandre Vermeerbergen



2017-05-02 11:19 GMT+02:00 Roshan Naik <ro...@hortonworks.com>:
Like I suspected… your topology.max.spout.pending is disabled.
Set it to something like 10k or 50k, assuming your message sizes are in the KB range or smaller.

The worker stall/blocked issue may have been due to the backpressure subsystem. I remember reporting that bug; I'm not sure if it got addressed fully. That's why we disabled it by default.

-roshan

From: Alexandre Vermeerbergen <av...@gmail.com>
Reply-To: "user@storm.apache.org" <us...@storm.apache.org>
Date: Tuesday, May 2, 2017 at 2:11 AM

To: "user@storm.apache.org" <us...@storm.apache.org>
Subject: Re: Disruptor Queue Filling Memory

Hi Roshan,
Thank you very much for your answers.

For your information: with Storm 1.0.1 our topologies work with back-pressure enabled by default; we sometimes hit the blocked-worker issue, which we have mitigated by writing our own "fail-over" system that detects such situations and automatically restarts the impacted topologies.
With Storm 1.0.3, we no longer have blocked workers, but our lag sometimes gets crazy, CPU load spikes, and we see a huge accumulation of memory in the disruptor queue.
To answer your questions about our topologies' settings, here's what we currently have:
Required information               Property name (if not the same)          Property value
topology.acker.executors           -                                        1
topology.worker.max.heap.size.mb   -                                        768
worker heap size                   worker.heap.memory.mb                    768
max spout pending                  topology.max.spout.pending               Null
back pressure settings             backpressure.disruptor.high.watermark    0.9
                                   backpressure.disruptor.low.watermark     0.4
                                   task.backpressure.poll.secs              30
                                   topology.backpressure.enable             false
topology.message.timeout.secs      -                                        30


We're going to study metrics with your suggested approach
Best regards,
Alexandre


2017-05-02 9:52 GMT+02:00 Roshan Naik <ro...@hortonworks.com>:
That ConcurrentLinkedQueue is the overflow list that I was referring to earlier. It is part of org.apache.storm.utils.DisruptorQueue.
This DisruptorQueue class is Storm's wrapper around the LMAX Disruptor queue.

When a spout/bolt instance cannot emit() to its downstream bolt (within the same worker process) because the inbound disruptor queue of the destination bolt is full, the messages are stashed away in the overflow linked list associated with that queue. As the disruptor queue gradually drains, the messages from the overflow are moved into the available space in the disruptor.

In cases like this, max spout pending, if enabled, should kick in to prevent excessive accumulation of un-acked messages in the topology.
I assume you are using ACKers in your topo? Otherwise this won't help.
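On the spout side, max spout pending and the ACKers only take effect when tuples are emitted with a message ID. A minimal hedged sketch (class name and the in-memory replay queue are hypothetical stand-ins for a real source):

import java.util.Map;
import java.util.UUID;
import java.util.concurrent.ConcurrentLinkedQueue;
import org.apache.storm.spout.SpoutOutputCollector;
import org.apache.storm.task.TopologyContext;
import org.apache.storm.topology.OutputFieldsDeclarer;
import org.apache.storm.topology.base.BaseRichSpout;
import org.apache.storm.tuple.Fields;
import org.apache.storm.tuple.Values;

public class ReliableSpout extends BaseRichSpout {
    private SpoutOutputCollector collector;
    private final ConcurrentLinkedQueue<String> replay = new ConcurrentLinkedQueue<>();

    @Override
    public void open(Map conf, TopologyContext context, SpoutOutputCollector collector) {
        this.collector = collector;
    }

    @Override
    public void nextTuple() {
        String msg = replay.poll();
        if (msg == null) {
            msg = "message-" + UUID.randomUUID();   // stand-in for reading a real source
        }
        // emitting with a message ID is what makes the tuple "pending"
        // and lets topology.max.spout.pending throttle this spout
        collector.emit(new Values(msg), msg);
    }

    @Override
    public void ack(Object msgId) {
        // fully processed downstream; nothing further to do in this sketch
    }

    @Override
    public void fail(Object msgId) {
        replay.offer((String) msgId);               // replay on failure or timeout
    }

    @Override
    public void declareOutputFields(OutputFieldsDeclarer declarer) {
        declarer.declare(new Fields("msg"));
    }
}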

Can you share the values of the below settings … as shown by the topology settings search box in the topology UI page …
- topology.acker.executors
- topology.worker.max.heap.size.mb
- worker heap size
- max spout pending
- back pressure settings
- topology.message.timeout.secs


Also, on the topology metrics table, you may be able to identify which spout->bolt or bolt->bolt connection is congested by looking at the 'transferred'/emits metrics of each spout and bolt. Also examine the ack counts.

It looks like back pressure is still disabled by default:
https://github.com/apache/storm/blob/v1.0.3/conf/defaults.yaml
I am not sure how stable it is at the moment, so I won't be able to recommend turning it on.
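For reference, a hedged sketch of the relevant entries (the values mirror the settings table Alexandre posted; setting topology.backpressure.enable to true in storm.yaml or in the topology's Config is how it gets re-enabled):

# backpressure settings roughly as they appear in Storm 1.0.x defaults.yaml
topology.backpressure.enable: false
backpressure.disruptor.high.watermark: 0.9
backpressure.disruptor.low.watermark: 0.4
task.backpressure.poll.secs: 30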

-roshan


From: Alexandre Vermeerbergen <av...@gmail.com>
Reply-To: "user@storm.apache.org" <us...@storm.apache.org>
Date: Monday, May 1, 2017 at 2:50 PM
To: "user@storm.apache.org" <us...@storm.apache.org>

Subject: Re: Disruptor Queue Filling Memory

Hello,
I think I am experiencing the same kind of issue as Tim with Storm 1.0.3: I have big instability in my Storm cluster whenever I add a certain topology, leading to very high CPU load on the VM that hosts the worker process assigned this topology.
I made a heap dump, opened it with Eclipse MAT, and bingo: it gives me "org.apache.storm.utils.DisruptorQueue" as the leaks / problem suspect 1.
More detail on Eclipse MAT's output:

One instance of "org.apache.storm.utils.DisruptorQueue" loaded by "sun.misc.Launcher$AppClassLoader @ 0x80013d40" occupies 766 807 504 (46,64%) bytes. The memory is accumulated in one instance of "java.util.concurrent.ConcurrentLinkedQueue$Node" loaded by "<system class loader>".

Keywords
org.apache.storm.utils.DisruptorQueue
sun.misc.Launcher$AppClassLoader @ 0x80013d40
java.util.concurrent.ConcurrentLinkedQueue$Node
The same set of topologies never "eats" this much CPU & memory with Storm 1.0.1, so I guess that, given https://issues.apache.org/jira/browse/STORM-1956, the main difference between our full set of topologies running on Storm 1.0.1 versus 1.0.3 is that we no longer have backpressure with Storm 1.0.3.
I have a few questions which consolidate Tim's:
1. Is backpressure enabled again by default in Storm 1.1.0?
2. Are there guidelines to re-enable backpressure and tune it correctly?
Best regards,
Alexandre Vermeerbergen

2017-05-01 21:52 GMT+02:00 Tim Fendt <Ti...@virginpulse.com>:
We have max spout pending enabled and set to 1000, and we have the back pressure system turned off. We did see increased latency for the processor, which contributed to the queueing. Given what you are saying, I assume those 1000 pending messages are just too large to fit in the memory we have assigned? Should we look at turning on back pressure and reducing max spout pending?

Thanks,

--
Tim


From: Roshan Naik <ro...@hortonworks.com>
Reply-To: "user@storm.apache.org" <us...@storm.apache.org>
Date: Monday, May 1, 2017 at 2:26 PM
To: "user@storm.apache.org" <us...@storm.apache.org>, "user@storm.apache.org" <us...@storm.apache.org>
Subject: Re: Disruptor Queue Filling Memory

You are most likely experiencing back pressure, and your max spout pending is not enabled. That is causing the overflow (unbounded) linked list inside Storm's disruptor wrapper to swallow all the memory. You can try using max spout pending to throttle the spouts under such scenarios.

Get Outlook for iOS<https://aka.ms/o0ukef>






Re: Disruptor Queue Filling Memory

Posted by Roshan Naik <ro...@hortonworks.com>.
Tim,
You max spout pending is disabled too. I think the max spout pending is a topology wide setting and not a per spout setting. When submitting the topology via ‘storm jar‘ cmd, you can provide custom settings using  -c. Ex:
  storm jar …..  -c topology.acker.executors=1 –c topology.max.spout.pending=10000

In your case, without ACKers, enabling back-pressure might be the only quick fix. But you will have to live with the occasional stalls… and restart the topos as needed …like Alexandre is doing.
You can try to mitigate the backpressure situations from triggering (and consequently the stalling issues) by identifying which bolt is the bottleneck is and see if increasing the parallelism on that bolt helps.

Better if you can enable ACKing after ensuring your bolts/spouts are handling ACKs properly… and then enable topology.max.spout.pending

In 2.0 we are planning for a different backpressure model  (https://issues.apache.org/jira/browse/STORM-2310)

I suspect the new model will not make it into 1.x, anytime soon due. So would be good to get some movement on STORM-1949 and see if it fixes the stall issue. But I am not in a position to spend much time on it for about a couple weeks.

-roshan


From: Tim Fendt <Ti...@virginpulse.com>
Reply-To: "user@storm.apache.org" <us...@storm.apache.org>
Date: Tuesday, May 2, 2017 at 6:34 AM
To: "user@storm.apache.org" <us...@storm.apache.org>
Subject: Re: Disruptor Queue Filling Memory

Hey Roshan,

Here are our settings:

Topology.max.spout.pending: null
topology.acker.executors: null
topology.worker.max.heap.size.mb: 768
worker.heap.memory.mb: 768
topology.backpressure.enable: false
topology.message.timeout.secs: 30
worker.childopts: “-Xmx%HEAP-MEM%m -XX:+PrintGCDetails -Xloggc:artifacts/gc.log -XX:+PrintGCDateStamps -XX:+PrintGCTimeStamps -XX:+UseGCLogFileRotation -XX:NumberOfGCLogFiles=10 -XX:GCLogFileSize=1M -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=artifacts/heapdump


What is interesting is the worker.childops is listed incorrectly on the UI. In my yml file I have the following defined for worker childops: -Xms3072m -Xmx3072m -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=/home/ubuntu -Dcom.sun.management.jmxremote -Dcom.sun.management.jmxremote.port=5555 -Dcom.sun.management.jmxremote.rmi.port=5555 -Dcom.sun.management.jmxremote.local.only=false -Dcom.sun.management.jmxremote.authenticate=false -Dcom.sun.management.jmxremote.ssl=false -javaagent:/opt/newrelic-java/newrelic/newrelic.jar”

I can confirm with other tools that my worker ops defined in the yml file are being applied and the ones listed in the UI are not.

Also, we set the max spout pending for each spout in code. Do we also have to set it for the topology as a whole? And as you mentioned we do not have ack turned on so does it even matter? We have 9-10 spouts per supervisor do they all share one disruptor queue like the heapdump seems to suggest?

Thanks,

--
Tim


From: Alexandre Vermeerbergen <av...@gmail.com>
Reply-To: "user@storm.apache.org" <us...@storm.apache.org>
Date: Tuesday, May 2, 2017 at 7:27 AM
To: "user@storm.apache.org" <us...@storm.apache.org>
Subject: Re: Disruptor Queue Filling Memory

Hello Roshan,
Thanks for the hint.

Regarding back pressure fix: it looks like the last activity on the associated JIRA (https://issues.apache.org/jira/browse/STORM-1949) was 1st of September 2016, and that Zhuo Liu was asking you (and also to Alessandro Bellina) to perform some tests in 2.0 branch... and this JIRA never got updated anymore.
It would be great to have some follow-up on this backpressure issue.

In the meantime, I have to make a quick decision about our use of Storm 1.0.3 in production : we have re-enabled backpressure, and so far it's behaving like we had with 1.0.1 (yet we have not yet observed workers blocking).

So between seeing our workers accumulating to much lag versus using a backpressure which sometimes can block our workers - but we have our self-healing, I'll use backpressure with Storm 1.0.3 for the short term.
Our next target is based on Storm 1.1.0, so we will take more time to weight the alternative (ie: keep backpressure or spend more time on searching for bottlenecks & tuning)
Thanks,
Alexandre Vermeerbergen



2017-05-02 11:19 GMT+02:00 Roshan Naik <ro...@hortonworks.com>>:
Like I suspected …your topology.max.spout.pending is disabled.
Set it to something like 10k or 50k  .. assuming your message sizes are in kb or less.

The worker stall/blocked issue may have been due to the backpressure subsystem. I remember reporting that bug, not sure if it got addressed fully. That’s why we disabled it by default.

-roshan

From: Alexandre Vermeerbergen <av...@gmail.com>>
Reply-To: "user@storm.apache.org<ma...@storm.apache.org>" <us...@storm.apache.org>>
Date: Tuesday, May 2, 2017 at 2:11 AM

To: "user@storm.apache.org<ma...@storm.apache.org>" <us...@storm.apache.org>>
Subject: Re: Disruptor Queue Filling Memory

Hi Rohan,
Thank you very much for your answers.

For your information, with Storm 1.0.1 our topologies work with the by-default enabled back-pressure, we sometimes have the blocked worker issue which we have mitigated by writing our own "fail-over" system which detects such situation and automatically restart impacted topologies.
With Storm 1.0.3, we no longer have blocked workers, but our lag sometimes gets crazy, CPU load bumps and we have a huge accumulation of memory with disruptor queue.
To answer your questions about our topologies' settings, here's what we currently have:
Required information

Property name (if not the same)

Property value

topology.acker.executors

-

1

topology.worker.max.heap.size.mb

-

768

worker heap size

worker.heap.memory.mb

768

max spout pending

topology.max.spout.pending

Null

back pressure settings

backpressure.disruptor.high.watermark
backpressure.disruptor.low.watermark
task.backpressure.poll.secs
topology.backpressure.enable

0.9
0.4
30
false

topology.message.timeout.secs

-

30


We're going to study metrics with your suggested approach
Best regards,
Alexandre


2017-05-02 9:52 GMT+02:00 Roshan Naik <ro...@hortonworks.com>>:
That ConcurrentLinkedQueue  is the overflow list that I was referring to earlier. It is part of org.apache.storm.utils.DisruptorQueue.
This DisruptorQueue class is Storm’s wrapper around the lmax disruptor q.

When a spout/bolt instance cannot emit() to its downstream bolt (within the same worker process), because the inbound DisruptorQ of the destination bolt is full… the messages are stashed away in the overflow linked list associated with that DisruptorQ . As the disruptor q gets gradually drained a bit, the messages from the overflow are drained into the available space in the Disruptor.

In cases like this the max spout pending, if enabled, should kick in to prevent excessive accumulation of un-acked messages in the topology.
I assume you are using ACKers in your topo ? Otherwise this won’t help.

Can you share the values of the below settings … as shown by the topology settings search box in the topology UI page …
- topology.acker.executors
- topology.worker.max.heap.size.mb:
- worker heap size
- max spout pending
- back pressure settings
 - topology.message.timeout.secs


Also on the topology metrics table, you may be able to identify which spout->bolt or bolt->bolt  connection is congested by looking at the ‘transferred’/emits metrics of each spout and bolt. Also examine the ack counts.

It looks like Back pressure is still disabled by default.
https://github.com/apache/storm/blob/v1.0.3/conf/defaults.yaml
I am not sure how stable it is at the moment so wont be able to recommend on turning it on.

-roshan


From: Alexandre Vermeerbergen <av...@gmail.com>>
Reply-To: "user@storm.apache.org<ma...@storm.apache.org>" <us...@storm.apache.org>>
Date: Monday, May 1, 2017 at 2:50 PM
To: "user@storm.apache.org<ma...@storm.apache.org>" <us...@storm.apache.org>>

Subject: Re: Disruptor Queue Filling Memory

Hello,
I think that I am experiencing the same kind of issue as Tim with Storm 1.0.3 : I have a big instability in my storm cluster whenever I add a certain topology, leading to very high CPU load on the VM which hosts the worker process getting this topology.
I made a heap dump, opened it with Eclipse MAT, and bingo: it gives me "org.apache.storm.utils.DisruptorQueue" as the leaks / problem suspect 1.
More detail on Eclipse MAT's output:

One instance of "org.apache.storm.utils.DisruptorQueue" loaded by "sun.misc.Launcher$AppClassLoader @ 0x80013d40" occupies 766 807 504 (46,64%) bytes. The memory is accumulated in one instance of "java.util.concurrent.ConcurrentLinkedQueue$Node" loaded by "<system class loader>".

Keywords
org.apache.storm.utils.DisruptorQueue
sun.misc.Launcher$AppClassLoader @ 0x80013d40
java.util.concurrent.ConcurrentLinkedQueue$Node
The same set of topologies never "eats" that much CPU & memory with Storm 1.0.1, so I guess that with https://issues.apache.org/jira/browse/STORM-1956 the main difference between our full set of topologies working with Storm 1.0.1 vers 1.0.3 is that we no longer have backpressure with Storm 1.0.3.
I have a few questions which consolidate Tim's:
1. Is backpressure enabled again by default with Storm 1.1.0 ?
2. Are there guidelines to re-enable backpressure and correctly tune it ?
Best regards,
Alexandre Vermeerbergen

2017-05-01 21:52 GMT+02:00 Tim Fendt <Ti...@virginpulse.com>>:
We have max spout pending enabled and it is set to 1000 and we have the back pressure system turned off. We did see increased latency for the processor which contributed to the queueing. Given what you are saying I assume that 1000 messages are just too large to fit in memory we have assigned? Should we look at turning on back pressure and reducing max spout mending?

Thanks,

--
Tim


From: Roshan Naik <ro...@hortonworks.com>>
Reply-To: "user@storm.apache.org<ma...@storm.apache.org>" <us...@storm.apache.org>>
Date: Monday, May 1, 2017 at 2:26 PM
To: "user@storm.apache.org<ma...@storm.apache.org>" <us...@storm.apache.org>>, "user@storm.apache.org<ma...@storm.apache.org>" <us...@storm.apache.org>>
Subject: Re: Disruptor Queue Filling Memory

You are most likely experiencing back pressure and your max spout pending is not enabled. That is causing the overflow (unbounded) linked list inside stom's disruptor wrapper to swallow all the memory. You can try using max spout pending to throttle the spouts under such scenarios.

Get Outlook for iOS<https://aka.ms/o0ukef>


On Mon, May 1, 2017 at 11:56 AM -0700, "Tim Fendt" <Ti...@virginpulse.com>> wrote:
We have been having an issue where after about a week of running our old gen on the JVM has troubles freeing space. I generated a heapdump during the last issue and found it to be filled with DisruptorQueue objects. Is there a memory leak with the disruptor queue or is there some configuration we are missing? We are running Storm version 1.0.2.

org.apache.storm.utils.DisruptorQueue$ThreadLocalBatcher and org.apache.storm.utils.DisruptorQueue classes fill the memory.
https://puu.sh/vCkQE/cda1f319ad.png

This is our config for the supervisors:
storm.local.dir: "/var/storm-local"
storm.zookeeper.servers:
    - “10.0.0.5”
storm.zookeeper.port: 2181

nimbus.seeds: ["10.0.0.6"]

supervisor.slots.ports:
    - 6700

worker.childopts: "-Xms3072m -Xmx3072m"


Thanks,

--
Tim

Confidentiality Notice: The information contained in this e-mail, including any attachment(s), is intended solely for use by the designated recipient(s). Unauthorized use, dissemination, distribution, or reproduction of this message by anyone other than the intended recipient(s), or a person designated as responsible for delivering such messages to the intended recipient, is strictly prohibited and may be unlawful. This e-mail may contain proprietary, confidential or privileged information. Any views or opinions expressed are solely those of the author and do not necessarily represent those of Virgin Pulse, Inc. If you have received this message in error, or are not the named recipient(s), please immediately notify the sender and delete this e-mail message.




Re: Disruptor Queue Filling Memory

Posted by Tim Fendt <Ti...@virginpulse.com>.
Hey Roshan,

Here are our settings:

Topology.max.spout.pending: null
topology.acker.executors: null
topology.worker.max.heap.size.mb: 768
worker.heap.memory.mb: 768
topology.backpressure.enable: false
topology.message.timeout.secs: 30
worker.childopts: “-Xmx%HEAP-MEM%m -XX:+PrintGCDetails -Xloggc:artifacts/gc.log -XX:+PrintGCDateStamps -XX:+PrintGCTimeStamps -XX:+UseGCLogFileRotation -XX:NumberOfGCLogFiles=10 -XX:GCLogFileSize=1M -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=artifacts/heapdump


What is interesting is the worker.childops is listed incorrectly on the UI. In my yml file I have the following defined for worker childops: -Xms3072m -Xmx3072m -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=/home/ubuntu -Dcom.sun.management.jmxremote -Dcom.sun.management.jmxremote.port=5555 -Dcom.sun.management.jmxremote.rmi.port=5555 -Dcom.sun.management.jmxremote.local.only=false -Dcom.sun.management.jmxremote.authenticate=false -Dcom.sun.management.jmxremote.ssl=false -javaagent:/opt/newrelic-java/newrelic/newrelic.jar”

I can confirm with other tools that my worker ops defined in the yml file are being applied and the ones listed in the UI are not.

Also, we set the max spout pending for each spout in code. Do we also have to set it for the topology as a whole? And as you mentioned we do not have ack turned on so does it even matter? We have 9-10 spouts per supervisor do they all share one disruptor queue like the heapdump seems to suggest?

Thanks,

--
Tim


From: Alexandre Vermeerbergen <av...@gmail.com>
Reply-To: "user@storm.apache.org" <us...@storm.apache.org>
Date: Tuesday, May 2, 2017 at 7:27 AM
To: "user@storm.apache.org" <us...@storm.apache.org>
Subject: Re: Disruptor Queue Filling Memory

Hello Roshan,
Thanks for the hint.

Regarding back pressure fix: it looks like the last activity on the associated JIRA (https://issues.apache.org/jira/browse/STORM-1949) was 1st of September 2016, and that Zhuo Liu was asking you (and also to Alessandro Bellina) to perform some tests in 2.0 branch... and this JIRA never got updated anymore.
It would be great to have some follow-up on this backpressure issue.

In the meantime, I have to make a quick decision about our use of Storm 1.0.3 in production : we have re-enabled backpressure, and so far it's behaving like we had with 1.0.1 (yet we have not yet observed workers blocking).

So between seeing our workers accumulating to much lag versus using a backpressure which sometimes can block our workers - but we have our self-healing, I'll use backpressure with Storm 1.0.3 for the short term.
Our next target is based on Storm 1.1.0, so we will take more time to weight the alternative (ie: keep backpressure or spend more time on searching for bottlenecks & tuning)
Thanks,
Alexandre Vermeerbergen



2017-05-02 11:19 GMT+02:00 Roshan Naik <ro...@hortonworks.com>>:
Like I suspected …your topology.max.spout.pending is disabled.
Set it to something like 10k or 50k  .. assuming your message sizes are in kb or less.

The worker stall/blocked issue may have been due to the backpressure subsystem. I remember reporting that bug, not sure if it got addressed fully. That’s why we disabled it by default.

-roshan

From: Alexandre Vermeerbergen <av...@gmail.com>>
Reply-To: "user@storm.apache.org<ma...@storm.apache.org>" <us...@storm.apache.org>>
Date: Tuesday, May 2, 2017 at 2:11 AM

To: "user@storm.apache.org<ma...@storm.apache.org>" <us...@storm.apache.org>>
Subject: Re: Disruptor Queue Filling Memory

Hi Rohan,
Thank you very much for your answers.

For your information, with Storm 1.0.1 our topologies work with the by-default enabled back-pressure, we sometimes have the blocked worker issue which we have mitigated by writing our own "fail-over" system which detects such situation and automatically restart impacted topologies.
With Storm 1.0.3, we no longer have blocked workers, but our lag sometimes gets crazy, CPU load bumps and we have a huge accumulation of memory with disruptor queue.
To answer your questions about our topologies' settings, here's what we currently have:
Required information

Property name (if not the same)

Property value

topology.acker.executors

-

1

topology.worker.max.heap.size.mb

-

768

worker heap size

worker.heap.memory.mb

768

max spout pending

topology.max.spout.pending

Null

back pressure settings

backpressure.disruptor.high.watermark
backpressure.disruptor.low.watermark
task.backpressure.poll.secs
topology.backpressure.enable

0.9
0.4
30
false

topology.message.timeout.secs

-

30


We're going to study metrics with your suggested approach
Best regards,
Alexandre


2017-05-02 9:52 GMT+02:00 Roshan Naik <ro...@hortonworks.com>>:
That ConcurrentLinkedQueue  is the overflow list that I was referring to earlier. It is part of org.apache.storm.utils.DisruptorQueue.
This DisruptorQueue class is Storm’s wrapper around the lmax disruptor q.

When a spout/bolt instance cannot emit() to its downstream bolt (within the same worker process), because the inbound DisruptorQ of the destination bolt is full… the messages are stashed away in the overflow linked list associated with that DisruptorQ . As the disruptor q gets gradually drained a bit, the messages from the overflow are drained into the available space in the Disruptor.

In cases like this the max spout pending, if enabled, should kick in to prevent excessive accumulation of un-acked messages in the topology.
I assume you are using ACKers in your topo ? Otherwise this won’t help.

Can you share the values of the below settings … as shown by the topology settings search box in the topology UI page …
- topology.acker.executors
- topology.worker.max.heap.size.mb:
- worker heap size
- max spout pending
- back pressure settings
 - topology.message.timeout.secs


Also on the topology metrics table, you may be able to identify which spout->bolt or bolt->bolt  connection is congested by looking at the ‘transferred’/emits metrics of each spout and bolt. Also examine the ack counts.

It looks like Back pressure is still disabled by default.
https://github.com/apache/storm/blob/v1.0.3/conf/defaults.yaml
I am not sure how stable it is at the moment so wont be able to recommend on turning it on.

-roshan


From: Alexandre Vermeerbergen <av...@gmail.com>>
Reply-To: "user@storm.apache.org<ma...@storm.apache.org>" <us...@storm.apache.org>>
Date: Monday, May 1, 2017 at 2:50 PM
To: "user@storm.apache.org<ma...@storm.apache.org>" <us...@storm.apache.org>>

Subject: Re: Disruptor Queue Filling Memory

Hello,
I think that I am experiencing the same kind of issue as Tim with Storm 1.0.3 : I have a big instability in my storm cluster whenever I add a certain topology, leading to very high CPU load on the VM which hosts the worker process getting this topology.
I made a heap dump, opened it with Eclipse MAT, and bingo: it gives me "org.apache.storm.utils.DisruptorQueue" as the leaks / problem suspect 1.
More detail on Eclipse MAT's output:

One instance of "org.apache.storm.utils.DisruptorQueue" loaded by "sun.misc.Launcher$AppClassLoader @ 0x80013d40" occupies 766 807 504 (46,64%) bytes. The memory is accumulated in one instance of "java.util.concurrent.ConcurrentLinkedQueue$Node" loaded by "<system class loader>".

Keywords
org.apache.storm.utils.DisruptorQueue
sun.misc.Launcher$AppClassLoader @ 0x80013d40
java.util.concurrent.ConcurrentLinkedQueue$Node
The same set of topologies never "eats" that much CPU & memory with Storm 1.0.1, so I guess that with https://issues.apache.org/jira/browse/STORM-1956 the main difference between our full set of topologies working with Storm 1.0.1 vers 1.0.3 is that we no longer have backpressure with Storm 1.0.3.
I have a few questions which consolidate Tim's:
1. Is backpressure enabled again by default with Storm 1.1.0 ?
2. Are there guidelines to re-enable backpressure and correctly tune it ?
Best regards,
Alexandre Vermeerbergen

2017-05-01 21:52 GMT+02:00 Tim Fendt <Ti...@virginpulse.com>>:
We have max spout pending enabled and it is set to 1000 and we have the back pressure system turned off. We did see increased latency for the processor which contributed to the queueing. Given what you are saying I assume that 1000 messages are just too large to fit in memory we have assigned? Should we look at turning on back pressure and reducing max spout mending?

Thanks,

--
Tim


From: Roshan Naik <ro...@hortonworks.com>>
Reply-To: "user@storm.apache.org<ma...@storm.apache.org>" <us...@storm.apache.org>>
Date: Monday, May 1, 2017 at 2:26 PM
To: "user@storm.apache.org<ma...@storm.apache.org>" <us...@storm.apache.org>>, "user@storm.apache.org<ma...@storm.apache.org>" <us...@storm.apache.org>>
Subject: Re: Disruptor Queue Filling Memory

You are most likely experiencing back pressure and your max spout pending is not enabled. That is causing the overflow (unbounded) linked list inside stom's disruptor wrapper to swallow all the memory. You can try using max spout pending to throttle the spouts under such scenarios.

Get Outlook for iOS<https://aka.ms/o0ukef>


On Mon, May 1, 2017 at 11:56 AM -0700, "Tim Fendt" <Ti...@virginpulse.com>> wrote:
We have been having an issue where after about a week of running our old gen on the JVM has troubles freeing space. I generated a heapdump during the last issue and found it to be filled with DisruptorQueue objects. Is there a memory leak with the disruptor queue or is there some configuration we are missing? We are running Storm version 1.0.2.

org.apache.storm.utils.DisruptorQueue$ThreadLocalBatcher and org.apache.storm.utils.DisruptorQueue classes fill the memory.
https://puu.sh/vCkQE/cda1f319ad.png

This is our config for the supervisors:
storm.local.dir: "/var/storm-local"
storm.zookeeper.servers:
    - “10.0.0.5”
storm.zookeeper.port: 2181

nimbus.seeds: ["10.0.0.6"]

supervisor.slots.ports:
    - 6700

worker.childopts: "-Xms3072m -Xmx3072m"


Thanks,

--
Tim

Confidentiality Notice: The information contained in this e-mail, including any attachment(s), is intended solely for use by the designated recipient(s). Unauthorized use, dissemination, distribution, or reproduction of this message by anyone other than the intended recipient(s), or a person designated as responsible for delivering such messages to the intended recipient, is strictly prohibited and may be unlawful. This e-mail may contain proprietary, confidential or privileged information. Any views or opinions expressed are solely those of the author and do not necessarily represent those of Virgin Pulse, Inc. If you have received this message in error, or are not the named recipient(s), please immediately notify the sender and delete this e-mail message.




Re: Disruptor Queue Filling Memory

Posted by Alexandre Vermeerbergen <av...@gmail.com>.
Hello Roshan,

Thanks for the hint.

Regarding back pressure fix: it looks like the last activity on the
associated JIRA (https://issues.apache.org/jira/browse/STORM-1949) was 1st
of September 2016, and that Zhuo Liu was asking you (and also to Alessandro
Bellina) to perform some tests in 2.0 branch... and this JIRA never got
updated anymore.

It would be great to have some follow-up on this backpressure issue.

In the meantime, I have to make a quick decision about our use of Storm
1.0.3 in production : we have re-enabled backpressure, and so far it's
behaving like we had with 1.0.1 (yet we have not yet observed workers
blocking).

So between seeing our workers accumulating to much lag versus using a
backpressure which sometimes can block our workers - but we have our
self-healing, I'll use backpressure with Storm 1.0.3 for the short term.

Our next target is based on Storm 1.1.0, so we will take more time to
weight the alternative (ie: keep backpressure or spend more time on
searching for bottlenecks & tuning)

Thanks,
Alexandre Vermeerbergen




2017-05-02 11:19 GMT+02:00 Roshan Naik <ro...@hortonworks.com>:

> Like I suspected …your topology.max.spout.pending is disabled.
>
> Set it to something like 10k or 50k  .. assuming your message sizes are in
> kb or less.
>
>
>
> The worker stall/blocked issue may have been due to the backpressure
> subsystem. I remember reporting that bug, not sure if it got addressed
> fully. That’s why we disabled it by default.
>
>
>
> -roshan
>
>
>
> *From: *Alexandre Vermeerbergen <av...@gmail.com>
> *Reply-To: *"user@storm.apache.org" <us...@storm.apache.org>
> *Date: *Tuesday, May 2, 2017 at 2:11 AM
>
> *To: *"user@storm.apache.org" <us...@storm.apache.org>
> *Subject: *Re: Disruptor Queue Filling Memory
>
>
>
> Hi Rohan,
>
> Thank you very much for your answers.
>
>
> For your information, with Storm 1.0.1 our topologies work with the
> by-default enabled back-pressure, we sometimes have the blocked worker
> issue which we have mitigated by writing our own "fail-over" system which
> detects such situation and automatically restart impacted topologies.
>
> With Storm 1.0.3, we no longer have blocked workers, but our lag sometimes
> gets crazy, CPU load bumps and we have a huge accumulation of memory with
> disruptor queue.
>
> To answer your questions about our topologies' settings, here's what we
> currently have:
>
> *Required information*
>
> *Property name (if not the same)*
>
> *Property value*
>
> topology.acker.executors
>
> -
>
> 1
>
> topology.worker.max.heap.size.mb
>
> -
>
> 768
>
> worker heap size
>
> worker.heap.memory.mb
>
> 768
>
> max spout pending
>
> topology.max.spout.pending
>
> Null
>
> back pressure settings
>
> backpressure.disruptor.high.watermark
>
> backpressure.disruptor.low.watermark
>
> task.backpressure.poll.secs
>
> topology.backpressure.enable
>
> 0.9
>
> 0.4
>
> 30
>
> false
>
> topology.message.timeout.secs
>
> -
>
> 30
>
>
>
> We're going to study metrics with your suggested approach
>
> Best regards,
>
> Alexandre
>
>
>
>
>
> 2017-05-02 9:52 GMT+02:00 Roshan Naik <ro...@hortonworks.com>:
>
> That *ConcurrentLinkedQueue*  is the overflow list that I was referring
> to earlier. It is part of *org.apache.storm.utils.DisruptorQueue.*
>
> This DisruptorQueue class is Storm’s wrapper around the lmax disruptor q.
>
>
>
> When a spout/bolt instance cannot emit() to its downstream bolt (within
> the same worker process), because the inbound DisruptorQ of the destination
> bolt is full… the messages are stashed away in the overflow linked list
> associated with that DisruptorQ . As the disruptor q gets gradually drained
> a bit, the messages from the overflow are drained into the available space
> in the Disruptor.
>
>
>
> In cases like this the max spout pending, if enabled, should kick in to
> prevent excessive accumulation of un-acked messages in the topology.
>
> I assume you are using ACKers in your topo ? Otherwise this won’t help.
>
>
>
> Can you share the values of the below settings … as shown by the topology
> settings search box in the topology UI page …
>
> - topology.acker.executors
>
> - topology.worker.max.heap.size.mb:
>
> - worker heap size
>
> - max spout pending
>
> - back pressure settings
>
>  - topology.message.timeout.secs
>
>
>
>
>
> Also on the topology metrics table, you may be able to identify which
> spout->bolt or bolt->bolt  connection is congested by looking at the
> ‘transferred’/emits metrics of each spout and bolt. Also examine the ack
> counts.
>
>
>
> It looks like Back pressure is still disabled by default.
>
> https://github.com/apache/storm/blob/v1.0.3/conf/defaults.yaml
>
> I am not sure how stable it is at the moment so wont be able to recommend
> on turning it on.
>
>
>
> -roshan
>
>
>
>
>
> *From: *Alexandre Vermeerbergen <av...@gmail.com>
> *Reply-To: *"user@storm.apache.org" <us...@storm.apache.org>
> *Date: *Monday, May 1, 2017 at 2:50 PM
> *To: *"user@storm.apache.org" <us...@storm.apache.org>
>
>
> *Subject: *Re: Disruptor Queue Filling Memory
>
>
>
> Hello,
>
> I think that I am experiencing the same kind of issue as Tim with Storm
> 1.0.3 : I have a big instability in my storm cluster whenever I add a
> certain topology, leading to very high CPU load on the VM which hosts the
> worker process getting this topology.
>
> I made a heap dump, opened it with Eclipse MAT, and bingo: it gives me
> "org.apache.storm.utils.DisruptorQueue" as the leaks / problem suspect 1.
>
> More detail on Eclipse MAT's output:
>
> One instance of *"org.apache.storm.utils.DisruptorQueue"* loaded by *"sun.misc.Launcher$AppClassLoader
> @ 0x80013d40"* occupies *766 807 504 (46,64%)* bytes. The memory is
> accumulated in one instance of *
> "java.util.concurrent.ConcurrentLinkedQueue$Node"* loaded by *"<system
> class loader>"*.
>
> *Keywords*
> org.apache.storm.utils.DisruptorQueue
> sun.misc.Launcher$AppClassLoader @ 0x80013d40
> java.util.concurrent.ConcurrentLinkedQueue$Node
>
> The same set of topologies never "eats" that much CPU & memory with Storm
> 1.0.1, so I guess that with https://issues.apache.org/
> jira/browse/STORM-1956 the main difference between our full set of
> topologies working with Storm 1.0.1 vers 1.0.3 is that we no longer have
> backpressure with Storm 1.0.3.
>
> I have a few questions which consolidate Tim's:
>
> 1. Is backpressure enabled again by default with Storm 1.1.0 ?
>
> 2. Are there guidelines to re-enable backpressure and correctly tune it ?
>
> Best regards,
>
> Alexandre Vermeerbergen
>
>
>
> 2017-05-01 21:52 GMT+02:00 Tim Fendt <Ti...@virginpulse.com>:
>
> We have max spout pending enabled and it is set to 1000 and we have the
> back pressure system turned off. We did see increased latency for the
> processor which contributed to the queueing. Given what you are saying I
> assume that 1000 messages are just too large to fit in memory we have
> assigned? Should we look at turning on back pressure and reducing max spout
> mending?
>
>
>
> Thanks,
>
>
>
> --
>
> Tim
>
>
>
>
>
> *From: *Roshan Naik <ro...@hortonworks.com>
> *Reply-To: *"user@storm.apache.org" <us...@storm.apache.org>
> *Date: *Monday, May 1, 2017 at 2:26 PM
> *To: *"user@storm.apache.org" <us...@storm.apache.org>, "
> user@storm.apache.org" <us...@storm.apache.org>
> *Subject: *Re: Disruptor Queue Filling Memory
>
>
>
> You are most likely experiencing back pressure and your max spout pending
> is not enabled. That is causing the overflow (unbounded) linked list inside
> stom's disruptor wrapper to swallow all the memory. You can try using max
> spout pending to throttle the spouts under such scenarios.
>
>
>
> Get Outlook for iOS <https://aka.ms/o0ukef>
>
>
>
>
>
> On Mon, May 1, 2017 at 11:56 AM -0700, "Tim Fendt" <
> Tim.Fendt@virginpulse.com> wrote:
>
> We have been having an issue where after about a week of running our old
> gen on the JVM has troubles freeing space. I generated a heapdump during
> the last issue and found it to be filled with DisruptorQueue objects. Is
> there a memory leak with the disruptor queue or is there some configuration
> we are missing? We are running Storm version 1.0.2.
>
>
>
> org.apache.storm.utils.DisruptorQueue$ThreadLocalBatcher and
> org.apache.storm.utils.DisruptorQueue classes fill the memory.
>
> https://puu.sh/vCkQE/cda1f319ad.png
>
>
>
> This is our config for the supervisors:
>
> storm.local.dir: "/var/storm-local"
> storm.zookeeper.servers:
>     - “10.0.0.5”
> storm.zookeeper.port: 2181
>
> nimbus.seeds: ["10.0.0.6"]
>
> supervisor.slots.ports:
>     - 6700
>
> worker.childopts: "-Xms3072m -Xmx3072m"
>
>
>
>
>
> Thanks,
>
>
>
> --
>
> Tim
>
>
>
> Confidentiality Notice: The information contained in this e-mail,
> including any attachment(s), is intended solely for use by the designated
> recipient(s). Unauthorized use, dissemination, distribution, or
> reproduction of this message by anyone other than the intended
> recipient(s), or a person designated as responsible for delivering such
> messages to the intended recipient, is strictly prohibited and may be
> unlawful. This e-mail may contain proprietary, confidential or privileged
> information. Any views or opinions expressed are solely those of the author
> and do not necessarily represent those of Virgin Pulse, Inc. If you have
> received this message in error, or are not the named recipient(s), please
> immediately notify the sender and delete this e-mail message.
>
>
>
>
>

Re: Disruptor Queue Filling Memory

Posted by Roshan Naik <ro...@hortonworks.com>.
Like I suspected …your topology.max.spout.pending is disabled.
Set it to something like 10k or 50k  .. assuming your message sizes are in kb or less.

The worker stall/blocked issue may have been due to the backpressure subsystem. I remember reporting that bug, not sure if it got addressed fully. That’s why we disabled it by default.

-roshan

From: Alexandre Vermeerbergen <av...@gmail.com>
Reply-To: "user@storm.apache.org" <us...@storm.apache.org>
Date: Tuesday, May 2, 2017 at 2:11 AM
To: "user@storm.apache.org" <us...@storm.apache.org>
Subject: Re: Disruptor Queue Filling Memory

Hi Rohan,
Thank you very much for your answers.

For your information, with Storm 1.0.1 our topologies work with the by-default enabled back-pressure, we sometimes have the blocked worker issue which we have mitigated by writing our own "fail-over" system which detects such situation and automatically restart impacted topologies.
With Storm 1.0.3, we no longer have blocked workers, but our lag sometimes gets crazy, CPU load bumps and we have a huge accumulation of memory with disruptor queue.
To answer your questions about our topologies' settings, here's what we currently have:
Required information

Property name (if not the same)

Property value

topology.acker.executors

-

1

topology.worker.max.heap.size.mb

-

768

worker heap size

worker.heap.memory.mb

768

max spout pending

topology.max.spout.pending

Null

back pressure settings

backpressure.disruptor.high.watermark
backpressure.disruptor.low.watermark
task.backpressure.poll.secs
topology.backpressure.enable

0.9
0.4
30
false

topology.message.timeout.secs

-

30


We're going to study metrics with your suggested approach
Best regards,
Alexandre


2017-05-02 9:52 GMT+02:00 Roshan Naik <ro...@hortonworks.com>>:
That ConcurrentLinkedQueue  is the overflow list that I was referring to earlier. It is part of org.apache.storm.utils.DisruptorQueue.
This DisruptorQueue class is Storm’s wrapper around the lmax disruptor q.

When a spout/bolt instance cannot emit() to its downstream bolt (within the same worker process), because the inbound DisruptorQ of the destination bolt is full… the messages are stashed away in the overflow linked list associated with that DisruptorQ . As the disruptor q gets gradually drained a bit, the messages from the overflow are drained into the available space in the Disruptor.

In cases like this the max spout pending, if enabled, should kick in to prevent excessive accumulation of un-acked messages in the topology.
I assume you are using ACKers in your topo ? Otherwise this won’t help.

Can you share the values of the below settings … as shown by the topology settings search box in the topology UI page …
- topology.acker.executors
- topology.worker.max.heap.size.mb:
- worker heap size
- max spout pending
- back pressure settings
 - topology.message.timeout.secs


Also on the topology metrics table, you may be able to identify which spout->bolt or bolt->bolt  connection is congested by looking at the ‘transferred’/emits metrics of each spout and bolt. Also examine the ack counts.

It looks like Back pressure is still disabled by default.
https://github.com/apache/storm/blob/v1.0.3/conf/defaults.yaml
I am not sure how stable it is at the moment so wont be able to recommend on turning it on.

-roshan


From: Alexandre Vermeerbergen <av...@gmail.com>>
Reply-To: "user@storm.apache.org<ma...@storm.apache.org>" <us...@storm.apache.org>>
Date: Monday, May 1, 2017 at 2:50 PM
To: "user@storm.apache.org<ma...@storm.apache.org>" <us...@storm.apache.org>>

Subject: Re: Disruptor Queue Filling Memory

Hello,
I think that I am experiencing the same kind of issue as Tim with Storm 1.0.3 : I have a big instability in my storm cluster whenever I add a certain topology, leading to very high CPU load on the VM which hosts the worker process getting this topology.
I made a heap dump, opened it with Eclipse MAT, and bingo: it gives me "org.apache.storm.utils.DisruptorQueue" as the leaks / problem suspect 1.
More detail on Eclipse MAT's output:

One instance of "org.apache.storm.utils.DisruptorQueue" loaded by "sun.misc.Launcher$AppClassLoader @ 0x80013d40" occupies 766 807 504 (46,64%) bytes. The memory is accumulated in one instance of "java.util.concurrent.ConcurrentLinkedQueue$Node" loaded by "<system class loader>".

Keywords
org.apache.storm.utils.DisruptorQueue
sun.misc.Launcher$AppClassLoader @ 0x80013d40
java.util.concurrent.ConcurrentLinkedQueue$Node
The same set of topologies never "eats" that much CPU & memory with Storm 1.0.1, so I guess that with https://issues.apache.org/jira/browse/STORM-1956 the main difference between our full set of topologies working with Storm 1.0.1 vers 1.0.3 is that we no longer have backpressure with Storm 1.0.3.
I have a few questions which consolidate Tim's:
1. Is backpressure enabled again by default with Storm 1.1.0 ?
2. Are there guidelines to re-enable backpressure and correctly tune it ?
Best regards,
Alexandre Vermeerbergen

2017-05-01 21:52 GMT+02:00 Tim Fendt <Ti...@virginpulse.com>>:
We have max spout pending enabled and it is set to 1000 and we have the back pressure system turned off. We did see increased latency for the processor which contributed to the queueing. Given what you are saying I assume that 1000 messages are just too large to fit in memory we have assigned? Should we look at turning on back pressure and reducing max spout mending?

Thanks,

--
Tim


From: Roshan Naik <ro...@hortonworks.com>>
Reply-To: "user@storm.apache.org<ma...@storm.apache.org>" <us...@storm.apache.org>>
Date: Monday, May 1, 2017 at 2:26 PM
To: "user@storm.apache.org<ma...@storm.apache.org>" <us...@storm.apache.org>>, "user@storm.apache.org<ma...@storm.apache.org>" <us...@storm.apache.org>>
Subject: Re: Disruptor Queue Filling Memory

You are most likely experiencing back pressure and your max spout pending is not enabled. That is causing the overflow (unbounded) linked list inside stom's disruptor wrapper to swallow all the memory. You can try using max spout pending to throttle the spouts under such scenarios.

Get Outlook for iOS<https://aka.ms/o0ukef>


On Mon, May 1, 2017 at 11:56 AM -0700, "Tim Fendt" <Ti...@virginpulse.com>> wrote:
We have been having an issue where after about a week of running our old gen on the JVM has troubles freeing space. I generated a heapdump during the last issue and found it to be filled with DisruptorQueue objects. Is there a memory leak with the disruptor queue or is there some configuration we are missing? We are running Storm version 1.0.2.

org.apache.storm.utils.DisruptorQueue$ThreadLocalBatcher and org.apache.storm.utils.DisruptorQueue classes fill the memory.
https://puu.sh/vCkQE/cda1f319ad.png

This is our config for the supervisors:
storm.local.dir: "/var/storm-local"
storm.zookeeper.servers:
    - “10.0.0.5”
storm.zookeeper.port: 2181

nimbus.seeds: ["10.0.0.6"]

supervisor.slots.ports:
    - 6700

worker.childopts: "-Xms3072m -Xmx3072m"


Thanks,

--
Tim

Confidentiality Notice: The information contained in this e-mail, including any attachment(s), is intended solely for use by the designated recipient(s). Unauthorized use, dissemination, distribution, or reproduction of this message by anyone other than the intended recipient(s), or a person designated as responsible for delivering such messages to the intended recipient, is strictly prohibited and may be unlawful. This e-mail may contain proprietary, confidential or privileged information. Any views or opinions expressed are solely those of the author and do not necessarily represent those of Virgin Pulse, Inc. If you have received this message in error, or are not the named recipient(s), please immediately notify the sender and delete this e-mail message.



Re: Disruptor Queue Filling Memory

Posted by Alexandre Vermeerbergen <av...@gmail.com>.
Hi Rohan,

Thank you very much for your answers.

For your information, with Storm 1.0.1 our topologies work with the
by-default enabled back-pressure, we sometimes have the blocked worker
issue which we have mitigated by writing our own "fail-over" system which
detects such situation and automatically restart impacted topologies.

With Storm 1.0.3, we no longer have blocked workers, but our lag sometimes
gets crazy, CPU load bumps and we have a huge accumulation of memory with
disruptor queue.

To answer your questions about our topologies' settings, here's what we
currently have:

Required information                Property name (if not the same)          Property value
topology.acker.executors            -                                        1
topology.worker.max.heap.size.mb    -                                        768
worker heap size                    worker.heap.memory.mb                    768
max spout pending                   topology.max.spout.pending               Null
back pressure settings              backpressure.disruptor.high.watermark    0.9
                                    backpressure.disruptor.low.watermark     0.4
                                    task.backpressure.poll.secs              30
                                    topology.backpressure.enable             false
topology.message.timeout.secs       -                                        30

We're going to study the metrics using your suggested approach.

Best regards,
Alexandre



2017-05-02 9:52 GMT+02:00 Roshan Naik <ro...@hortonworks.com>:

> That *ConcurrentLinkedQueue*  is the overflow list that I was referring
> to earlier. It is part of *org.apache.storm.utils.DisruptorQueue.*
>
> This DisruptorQueue class is Storm’s wrapper around the lmax disruptor q.
>
>
>
> When a spout/bolt instance cannot emit() to its downstream bolt (within
> the same worker process), because the inbound DisruptorQ of the destination
> bolt is full… the messages are stashed away in the overflow linked list
> associated with that DisruptorQ . As the disruptor q gets gradually drained
> a bit, the messages from the overflow are drained into the available space
> in the Disruptor.
>
>
>
> In cases like this the max spout pending, if enabled, should kick in to
> prevent excessive accumulation of un-acked messages in the topology.
>
> I assume you are using ACKers in your topo ? Otherwise this won’t help.
>
>
>
> Can you share the values of the below settings … as shown by the topology
> settings search box in the topology UI page …
>
> - topology.acker.executors
>
> - topology.worker.max.heap.size.mb:
>
> - worker heap size
>
> - max spout pending
>
> - back pressure settings
>
>  - topology.message.timeout.secs
>
>
>
>
>
> Also on the topology metrics table, you may be able to identify which
> spout->bolt or bolt->bolt  connection is congested by looking at the
> ‘transferred’/emits metrics of each spout and bolt. Also examine the ack
> counts.
>
>
>
> It looks like Back pressure is still disabled by default.
>
> https://github.com/apache/storm/blob/v1.0.3/conf/defaults.yaml
>
> I am not sure how stable it is at the moment so won't be able to recommend
> on turning it on.
>
>
>
> -roshan
>
>
>
>
>
> *From: *Alexandre Vermeerbergen <av...@gmail.com>
> *Reply-To: *"user@storm.apache.org" <us...@storm.apache.org>
> *Date: *Monday, May 1, 2017 at 2:50 PM
> *To: *"user@storm.apache.org" <us...@storm.apache.org>
>
> *Subject: *Re: Disruptor Queue Filling Memory
>
>
>
> Hello,
>
> I think that I am experiencing the same kind of issue as Tim with Storm
> 1.0.3 : I have a big instability in my storm cluster whenever I add a
> certain topology, leading to very high CPU load on the VM which hosts the
> worker process getting this topology.
>
> I made a heap dump, opened it with Eclipse MAT, and bingo: it gives me
> "org.apache.storm.utils.DisruptorQueue" as the leaks / problem suspect 1.
>
> More detail on Eclipse MAT's output:
>
> One instance of *"org.apache.storm.utils.DisruptorQueue"* loaded by *"sun.misc.Launcher$AppClassLoader
> @ 0x80013d40"* occupies *766 807 504 (46,64%)* bytes. The memory is
> accumulated in one instance of *
> "java.util.concurrent.ConcurrentLinkedQueue$Node"* loaded by *"<system
> class loader>"*.
>
> *Keywords*
> org.apache.storm.utils.DisruptorQueue
> sun.misc.Launcher$AppClassLoader @ 0x80013d40
> java.util.concurrent.ConcurrentLinkedQueue$Node
>
> The same set of topologies never "eats" that much CPU & memory with Storm
> 1.0.1, so I guess that with https://issues.apache.org/
> jira/browse/STORM-1956 the main difference between our full set of
> topologies working with Storm 1.0.1 versus 1.0.3 is that we no longer have
> backpressure with Storm 1.0.3.
>
> I have a few questions which consolidate Tim's:
>
> 1. Is backpressure enabled again by default with Storm 1.1.0 ?
>
> 2. Are there guidelines to re-enable backpressure and correctly tune it ?
>
> Best regards,
>
> Alexandre Vermeerbergen
>
>
>
> 2017-05-01 21:52 GMT+02:00 Tim Fendt <Ti...@virginpulse.com>:
>
> We have max spout pending enabled and it is set to 1000 and we have the
> back pressure system turned off. We did see increased latency for the
> processor which contributed to the queueing. Given what you are saying I
> assume that 1000 messages are just too large to fit in memory we have
> assigned? Should we look at turning on back pressure and reducing max spout
> pending?
>
>
>
> Thanks,
>
>
>
> --
>
> Tim
>
>
>
>
>
> *From: *Roshan Naik <ro...@hortonworks.com>
> *Reply-To: *"user@storm.apache.org" <us...@storm.apache.org>
> *Date: *Monday, May 1, 2017 at 2:26 PM
> *To: *"user@storm.apache.org" <us...@storm.apache.org>, "
> user@storm.apache.org" <us...@storm.apache.org>
> *Subject: *Re: Disruptor Queue Filling Memory
>
>
>
> You are most likely experiencing back pressure and your max spout pending
> is not enabled. That is causing the overflow (unbounded) linked list inside
> Storm's disruptor wrapper to swallow all the memory. You can try using max
> spout pending to throttle the spouts under such scenarios.
>
>
>
> Get Outlook for iOS <https://aka.ms/o0ukef>
>
>
>
>
>
> On Mon, May 1, 2017 at 11:56 AM -0700, "Tim Fendt" <
> Tim.Fendt@virginpulse.com> wrote:
>
> We have been having an issue where after about a week of running our old
> gen on the JVM has troubles freeing space. I generated a heapdump during
> the last issue and found it to be filled with DisruptorQueue objects. Is
> there a memory leak with the disruptor queue or is there some configuration
> we are missing? We are running Storm version 1.0.2.
>
>
>
> org.apache.storm.utils.DisruptorQueue$ThreadLocalBatcher and
> org.apache.storm.utils.DisruptorQueue classes fill the memory.
>
> https://puu.sh/vCkQE/cda1f319ad.png
>
>
>
> This is our config for the supervisors:
>
> storm.local.dir: "/var/storm-local"
> storm.zookeeper.servers:
>     - “10.0.0.5”
> storm.zookeeper.port: 2181
>
> nimbus.seeds: ["10.0.0.6"]
>
> supervisor.slots.ports:
>     - 6700
>
> worker.childopts: "-Xms3072m -Xmx3072m"
>
>
>
>
>
> Thanks,
>
>
>
> --
>
> Tim
>
>
>
>
>
>

Re: Disruptor Queue Filling Memory

Posted by Roshan Naik <ro...@hortonworks.com>.
That ConcurrentLinkedQueue  is the overflow list that I was referring to earlier. It is part of org.apache.storm.utils.DisruptorQueue.
This DisruptorQueue class is Storm’s wrapper around the lmax disruptor q.

When a spout/bolt instance cannot emit() to its downstream bolt (within the same worker process), because the inbound DisruptorQ of the destination bolt is full… the messages are stashed away in the overflow linked list associated with that DisruptorQ . As the disruptor q gets gradually drained a bit, the messages from the overflow are drained into the available space in the Disruptor.
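
To make that concrete, here is a highly simplified sketch of the pattern (illustrative only, not Storm's actual DisruptorQueue code): a bounded queue standing in for the disruptor ring buffer, plus an unbounded overflow list that absorbs publishes when the ring is full.

import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.ConcurrentLinkedQueue;

// Illustrative only; not Storm's implementation.
class OverflowingQueue<T> {
    private final ArrayBlockingQueue<T> ring;                                        // bounded "ring buffer"
    private final ConcurrentLinkedQueue<T> overflow = new ConcurrentLinkedQueue<>(); // unbounded overflow list

    OverflowingQueue(int capacity) {
        this.ring = new ArrayBlockingQueue<>(capacity);
    }

    void publish(T msg) {
        // When the bounded queue is full, the message is stashed in the overflow list
        // instead of being dropped or blocking the producer.
        if (!ring.offer(msg)) {
            overflow.add(msg);
        }
    }

    T consume() {
        // As space frees up, overflow entries are drained back into the bounded queue
        // before the next message is handed out.
        T pending;
        while ((pending = overflow.peek()) != null && ring.offer(pending)) {
            overflow.poll();
        }
        return ring.poll();
    }
}

The key point is that nothing bounds the overflow list itself, so some external throttle (such as max spout pending) has to limit how many tuples can be in flight.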

In cases like this the max spout pending, if enabled, should kick in to prevent excessive accumulation of un-acked messages in the topology.
I assume you are using ACKers in your topo? Otherwise this won't help.
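
For example, a minimal sketch of enabling acking plus a topology-wide max spout pending at submit time might look like this (the topology name and the numbers are placeholders, not recommendations):

import org.apache.storm.Config;
import org.apache.storm.StormSubmitter;
import org.apache.storm.topology.TopologyBuilder;

public class ThrottledSubmitExample {
    public static void main(String[] args) throws Exception {
        TopologyBuilder builder = new TopologyBuilder();
        // ... set spouts and bolts here as usual ...

        Config conf = new Config();
        conf.setNumAckers(1);            // max spout pending only throttles when acking is enabled
        conf.setMaxSpoutPending(500);    // placeholder value: caps un-acked tuples per spout task
        conf.setMessageTimeoutSecs(30);  // un-acked tuples are replayed after this timeout

        StormSubmitter.submitTopology("example-topology", conf, builder.createTopology());
    }
}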

Can you share the values of the below settings … as shown by the topology settings search box in the topology UI page …
- topology.acker.executors
- topology.worker.max.heap.size.mb:
- worker heap size
- max spout pending
- back pressure settings
 - topology.message.timeout.secs


Also on the topology metrics table, you may be able to identify which spout->bolt or bolt->bolt  connection is congested by looking at the ‘transferred’/emits metrics of each spout and bolt. Also examine the ack counts.

It looks like back pressure is still disabled by default:
https://github.com/apache/storm/blob/v1.0.3/conf/defaults.yaml
I am not sure how stable it is at the moment, so I won't be able to recommend turning it on.

-roshan


From: Alexandre Vermeerbergen <av...@gmail.com>
Reply-To: "user@storm.apache.org" <us...@storm.apache.org>
Date: Monday, May 1, 2017 at 2:50 PM
To: "user@storm.apache.org" <us...@storm.apache.org>
Subject: Re: Disruptor Queue Filling Memory

Hello,
I think that I am experiencing the same kind of issue as Tim with Storm 1.0.3 : I have a big instability in my storm cluster whenever I add a certain topology, leading to very high CPU load on the VM which hosts the worker process getting this topology.
I made a heap dump, opened it with Eclipse MAT, and bingo: it gives me "org.apache.storm.utils.DisruptorQueue" as the leaks / problem suspect 1.
More detail on Eclipse MAT's output:

One instance of "org.apache.storm.utils.DisruptorQueue" loaded by "sun.misc.Launcher$AppClassLoader @ 0x80013d40" occupies 766 807 504 (46,64%) bytes. The memory is accumulated in one instance of "java.util.concurrent.ConcurrentLinkedQueue$Node" loaded by "<system class loader>".

Keywords
org.apache.storm.utils.DisruptorQueue
sun.misc.Launcher$AppClassLoader @ 0x80013d40
java.util.concurrent.ConcurrentLinkedQueue$Node
The same set of topologies never "eats" that much CPU & memory with Storm 1.0.1, so I guess that with https://issues.apache.org/jira/browse/STORM-1956 the main difference between our full set of topologies working with Storm 1.0.1 versus 1.0.3 is that we no longer have backpressure with Storm 1.0.3.
I have a few questions which consolidate Tim's:
1. Is backpressure enabled again by default with Storm 1.1.0 ?
2. Are there guidelines to re-enable backpressure and correctly tune it ?
Best regards,
Alexandre Vermeerbergen

2017-05-01 21:52 GMT+02:00 Tim Fendt <Ti...@virginpulse.com>>:
We have max spout pending enabled and it is set to 1000 and we have the back pressure system turned off. We did see increased latency for the processor which contributed to the queueing. Given what you are saying I assume that 1000 messages are just too large to fit in memory we have assigned? Should we look at turning on back pressure and reducing max spout pending?

Thanks,

--
Tim


From: Roshan Naik <ro...@hortonworks.com>>
Reply-To: "user@storm.apache.org<ma...@storm.apache.org>" <us...@storm.apache.org>>
Date: Monday, May 1, 2017 at 2:26 PM
To: "user@storm.apache.org<ma...@storm.apache.org>" <us...@storm.apache.org>>, "user@storm.apache.org<ma...@storm.apache.org>" <us...@storm.apache.org>>
Subject: Re: Disruptor Queue Filling Memory

You are most likely experiencing back pressure and your max spout pending is not enabled. That is causing the overflow (unbounded) linked list inside Storm's disruptor wrapper to swallow all the memory. You can try using max spout pending to throttle the spouts under such scenarios.

Get Outlook for iOS<https://aka.ms/o0ukef>


On Mon, May 1, 2017 at 11:56 AM -0700, "Tim Fendt" <Ti...@virginpulse.com>> wrote:
We have been having an issue where after about a week of running our old gen on the JVM has troubles freeing space. I generated a heapdump during the last issue and found it to be filled with DisruptorQueue objects. Is there a memory leak with the disruptor queue or is there some configuration we are missing? We are running Storm version 1.0.2.

org.apache.storm.utils.DisruptorQueue$ThreadLocalBatcher and org.apache.storm.utils.DisruptorQueue classes fill the memory.
https://puu.sh/vCkQE/cda1f319ad.png

This is our config for the supervisors:
storm.local.dir: "/var/storm-local"
storm.zookeeper.servers:
    - “10.0.0.5”
storm.zookeeper.port: 2181

nimbus.seeds: ["10.0.0.6"]

supervisor.slots.ports:
    - 6700

worker.childopts: "-Xms3072m -Xmx3072m"


Thanks,

--
Tim



Re: Disruptor Queue Filling Memory

Posted by Alexandre Vermeerbergen <av...@gmail.com>.
Hello,

I think I am experiencing the same kind of issue as Tim with Storm 1.0.3: I
have significant instability in my Storm cluster whenever I add a certain
topology, leading to very high CPU load on the VM that hosts the worker
process running this topology.

I made a heap dump, opened it with Eclipse MAT, and bingo: it gives me
"org.apache.storm.utils.DisruptorQueue" as the leaks / problem suspect 1.

More detail on Eclipse MAT's output:

One instance of *"org.apache.storm.utils.DisruptorQueue"* loaded by
*"sun.misc.Launcher$AppClassLoader
@ 0x80013d40"* occupies *766 807 504 (46,64%)* bytes. The memory is
accumulated in one instance of
*"java.util.concurrent.ConcurrentLinkedQueue$Node"* loaded by *"<system
class loader>"*.

*Keywords*
org.apache.storm.utils.DisruptorQueue
sun.misc.Launcher$AppClassLoader @ 0x80013d40
java.util.concurrent.ConcurrentLinkedQueue$Node

The same set of topologies never "eats" that much CPU and memory with Storm
1.0.1, so I guess that, given https://issues.apache.org/jira/browse/STORM-1956,
the main difference between our topologies running on Storm 1.0.1 versus 1.0.3
is that we no longer have backpressure with Storm 1.0.3.

I have a few questions which consolidate Tim's:
1. Is backpressure enabled again by default with Storm 1.1.0?
2. Are there guidelines to re-enable backpressure and correctly tune it? (A rough sketch of what re-enabling it could look like follows below.)
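
For what it's worth, a minimal sketch of re-enabling it per topology, using the property names from our settings above (the values shown are just our current defaults, not a tuning recommendation):

import org.apache.storm.Config;

public class BackpressureConfigSketch {
    public static Config withBackpressure() {
        Config conf = new Config();
        conf.put("topology.backpressure.enable", true);
        conf.put("backpressure.disruptor.high.watermark", 0.9);  // start throttling when a queue is ~90% full
        conf.put("backpressure.disruptor.low.watermark", 0.4);   // stop throttling once it drains below ~40%
        conf.put("task.backpressure.poll.secs", 30);             // how often the back pressure state is polled
        return conf;
    }
}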

Best regards,
Alexandre Vermeerbergen


2017-05-01 21:52 GMT+02:00 Tim Fendt <Ti...@virginpulse.com>:

> We have max spout pending enabled and it is set to 1000 and we have the
> back pressure system turned off. We did see increased latency for the
> processor which contributed to the queueing. Given what you are saying I
> assume that 1000 messages are just too large to fit in memory we have
> assigned? Should we look at turning on back pressure and reducing max spout
> pending?
>
>
>
> Thanks,
>
>
>
> --
>
> Tim
>
>
>
>
>
> *From: *Roshan Naik <ro...@hortonworks.com>
> *Reply-To: *"user@storm.apache.org" <us...@storm.apache.org>
> *Date: *Monday, May 1, 2017 at 2:26 PM
> *To: *"user@storm.apache.org" <us...@storm.apache.org>, "
> user@storm.apache.org" <us...@storm.apache.org>
> *Subject: *Re: Disruptor Queue Filling Memory
>
>
>
> You are most likely experiencing back pressure and your max spout pending
> is not enabled. That is causing the overflow (unbounded) linked list inside
> Storm's disruptor wrapper to swallow all the memory. You can try using max
> spout pending to throttle the spouts under such scenarios.
>
>
>
> Get Outlook for iOS <https://aka.ms/o0ukef>
>
>
>
>
>
> On Mon, May 1, 2017 at 11:56 AM -0700, "Tim Fendt" <
> Tim.Fendt@virginpulse.com> wrote:
>
> We have been having an issue where after about a week of running our old
> gen on the JVM has troubles freeing space. I generated a heapdump during
> the last issue and found it to be filled with DisruptorQueue objects. Is
> there a memory leak with the disruptor queue or is there some configuration
> we are missing? We are running Storm version 1.0.2.
>
>
>
> org.apache.storm.utils.DisruptorQueue$ThreadLocalBatcher and
> org.apache.storm.utils.DisruptorQueue classes fill the memory.
>
> https://puu.sh/vCkQE/cda1f319ad.png
>
>
>
> This is our config for the supervisors:
>
> storm.local.dir: "/var/storm-local"
> storm.zookeeper.servers:
>     - “10.0.0.5”
> storm.zookeeper.port: 2181
>
> nimbus.seeds: ["10.0.0.6"]
>
> supervisor.slots.ports:
>     - 6700
>
> worker.childopts: "-Xms3072m -Xmx3072m"
>
>
>
>
>
> Thanks,
>
>
>
> --
>
> Tim
>
>
>
>
>

Re: Disruptor Queue Filling Memory

Posted by Tim Fendt <Ti...@virginpulse.com>.
We have max spout pending enabled and it is set to 1000, and we have the back pressure system turned off. We did see increased latency for the processor, which contributed to the queueing. Given what you are saying, I assume 1000 pending messages are simply too large to fit in the memory we have assigned? Should we look at turning on back pressure and reducing max spout pending?

Thanks,

--
Tim


From: Roshan Naik <ro...@hortonworks.com>
Reply-To: "user@storm.apache.org" <us...@storm.apache.org>
Date: Monday, May 1, 2017 at 2:26 PM
To: "user@storm.apache.org" <us...@storm.apache.org>, "user@storm.apache.org" <us...@storm.apache.org>
Subject: Re: Disruptor Queue Filling Memory

You are most likely experiencing back pressure and your max spout pending is not enabled. That is causing the overflow (unbounded) linked list inside Storm's disruptor wrapper to swallow all the memory. You can try using max spout pending to throttle the spouts under such scenarios.

Get Outlook for iOS<https://aka.ms/o0ukef>



On Mon, May 1, 2017 at 11:56 AM -0700, "Tim Fendt" <Ti...@virginpulse.com>> wrote:
We have been having an issue where after about a week of running our old gen on the JVM has troubles freeing space. I generated a heapdump during the last issue and found it to be filled with DisruptorQueue objects. Is there a memory leak with the disruptor queue or is there some configuration we are missing? We are running Storm version 1.0.2.

org.apache.storm.utils.DisruptorQueue$ThreadLocalBatcher and org.apache.storm.utils.DisruptorQueue classes fill the memory.
https://puu.sh/vCkQE/cda1f319ad.png

This is our config for the supervisors:
storm.local.dir: "/var/storm-local"
storm.zookeeper.servers:
    - “10.0.0.5”
storm.zookeeper.port: 2181

nimbus.seeds: ["10.0.0.6"]

supervisor.slots.ports:
    - 6700

worker.childopts: "-Xms3072m -Xmx3072m"


Thanks,

--
Tim


Re: Disruptor Queue Filling Memory

Posted by Roshan Naik <ro...@hortonworks.com>.
You are most likely experiencing back pressure and your max spout pending is not enabled. That is causing the overflow (unbounded) linked list inside Storm's disruptor wrapper to swallow all the memory. You can try using max spout pending to throttle the spouts under such scenarios.


Get Outlook for iOS<https://aka.ms/o0ukef>




On Mon, May 1, 2017 at 11:56 AM -0700, "Tim Fendt" <Ti...@virginpulse.com>> wrote:

We have been having an issue where after about a week of running our old gen on the JVM has troubles freeing space. I generated a heapdump during the last issue and found it to be filled with DisruptorQueue objects. Is there a memory leak with the disruptor queue or is there some configuration we are missing? We are running Storm version 1.0.2.

org.apache.storm.utils.DisruptorQueue$ThreadLocalBatcher and org.apache.storm.utils.DisruptorQueue classes fill the memory.
https://puu.sh/vCkQE/cda1f319ad.png

This is our config for the supervisors:
storm.local.dir: "/var/storm-local"
storm.zookeeper.servers:
    - “10.0.0.5”
storm.zookeeper.port: 2181

nimbus.seeds: ["10.0.0.6"]

supervisor.slots.ports:
    - 6700

worker.childopts: "-Xms3072m -Xmx3072m"


Thanks,

--
Tim
