Posted to user@storm.apache.org by "Mitchell Rathbun (BLOOMBERG/ 731 LEX)" <mr...@bloomberg.net> on 2018/06/06 03:43:02 UTC

Nimbus repeatedly crashing to issue with disk/ZooKeeper resources

From: Mitchell Rathbun (BLOOMBERG/ 731 LEX) At: 06/05/18 23:42:02
To: Mitchell Rathbun (BLOOMBERG/ 731 LEX)
Subject: Nimbus repeatedly crashing to issue with disk/ZooKeeper resources
Recently, our Nimbus crashed with a stack overflow error, and we are having some difficulty determining what the initial cause was. I have attached the stack trace to help with the debugging. This same stack trace occurred every time I ran Nimbus. I then deleted everything in the directory specified by storm.local.dir and removed everything in ZooKeeper under the storm.zookeeper.root path. I was then able to successfully run Nimbus. So this points to there being an issue with the data/state that Nimbus keeps. Has this issue been seen before, and how could the state reach a point that would prevent Nimbus from running at all? Is it possible that there was not enough disk/zk space, even though the logs don't really point to this being the issue?

Re: Nimbus repeatedly crashing to issue with disk/ZooKeeper resources

Posted by Bobby Evans <bo...@apache.org>.

The issue is that interleave-all is a recursive function.

https://github.com/apache/storm/blob/e40d213de7067f7d3aa4d4992b81890d8ed6ff31/storm-core/src/clj/org/apache/storm/util.clj#L776-L784

So the depth of the stack is the number of slots you want to schedule on,
times 3 because of how the recursion happens.

Sadly, the latest code has the same problem. It is in Java now, so the
depth is no longer multiplied by 3, but it is still bad.

https://github.com/apache/storm/blob/3e098f12e2b09d4954eeeaaf807e4ff6006a6929/storm-server/src/main/java/org/apache/storm/utils/ServerUtils.java#L113-L130

So if you want to file a JIRA for us to fix this, that would be great.
Even better if you could look at making interleaveAll no longer recursive.
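
As a rough idea of what a non-recursive version might look like, here is a
minimal sketch. It is not the actual Storm code: the class name
InterleaveSketch is hypothetical, the signature is only assumed to resemble
the one in ServerUtils, and it returns an empty list for null or empty input,
which may differ from what the existing method does. It takes the i-th
element of every list in turn, so the stack depth stays constant no matter
how many slots there are.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical holder class for illustration; not part of Storm.
final class InterleaveSketch {

    // Interleaves the given lists round-robin without recursion:
    // first every list's element 0, then every list's element 1, and so on.
    // Null input and null inner lists are treated as empty (an assumption).
    static <T> List<T> interleaveAll(List<List<T>> lists) {
        List<T> result = new ArrayList<>();
        if (lists == null) {
            return result;
        }

        // Find the longest inner list so we know how many rounds to run.
        int longest = 0;
        for (List<T> list : lists) {
            if (list != null) {
                longest = Math.max(longest, list.size());
            }
        }

        // One round per index; lists that have run out are skipped.
        for (int i = 0; i < longest; i++) {
            for (List<T> list : lists) {
                if (list != null && i < list.size()) {
                    result.add(list.get(i));
                }
            }
        }
        return result;
    }
}
```

For example, interleaving [1, 2, 3], [4] and [5, 6] this way gives
[1, 4, 5, 2, 6, 3]. The get(i) calls assume the inner lists support cheap
random access (e.g. ArrayList); for linked lists, iterators would be a
better fit.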

Thanks,

Bobby
