Posted to dev@storm.apache.org by "James Xu (JIRA)" <ji...@apache.org> on 2013/12/14 07:25:06 UTC

[jira] [Created] (STORM-75) Dead lock between ShellBolt and ShellProcess

James Xu created STORM-75:
-----------------------------

             Summary: Dead lock between ShellBolt and ShellProcess
                 Key: STORM-75
                 URL: https://issues.apache.org/jira/browse/STORM-75
             Project: Apache Storm (Incubating)
          Issue Type: Bug
            Reporter: James Xu
            Priority: Minor


https://github.com/nathanmarz/storm/issues/423

ShellBolt creates a shell process and reads data from its output stream and error stream. The current implementation only reads the error stream once the output stream is closed, so anything the shell process writes to the error stream accumulates in that stream's buffer. When the buffer is full, the shell process blocks on its next write to the error stream, waiting for buffer space to become available. Meanwhile, ShellBolt blocks waiting for data on the output stream from the shell process. Neither side can make progress, so it is a deadlock.
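One common way to avoid this kind of deadlock is to drain the error stream on its own thread while the main thread reads the output stream. The following is a minimal sketch of that idea, not the actual ShellBolt or ShellProcess code; the class name StderrDrainSketch, its Result type, and the run method are hypothetical and exist only for illustration.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical illustration; not part of Storm.
public class StderrDrainSketch {

    /** Stdout lines, number of stderr bytes drained, and the child's exit code. */
    public static final class Result {
        public final List<String> stdoutLines;
        public final long stderrBytes;
        public final int exitCode;

        Result(List<String> stdoutLines, long stderrBytes, int exitCode) {
            this.stdoutLines = stdoutLines;
            this.stderrBytes = stderrBytes;
            this.exitCode = exitCode;
        }
    }

    public static Result run(List<String> command) throws IOException, InterruptedException {
        Process process = new ProcessBuilder(command).start();
        final AtomicLong stderrBytes = new AtomicLong();
        final InputStream err = process.getErrorStream();

        // Drain stderr concurrently so the child never blocks on a full stderr buffer.
        Thread drainer = new Thread(() -> {
            byte[] buf = new byte[8192];
            try {
                int n;
                while ((n = err.read(buf)) != -1) {
                    stderrBytes.addAndGet(n); // a real caller would probably log these bytes
                }
            } catch (IOException ignored) {
                // The stream was closed; nothing more to drain.
            }
        }, "stderr-drainer");
        drainer.setDaemon(true);
        drainer.start();

        // The main thread is free to block on stdout, as ShellBolt is described as doing.
        List<String> lines = new ArrayList<>();
        try (BufferedReader out = new BufferedReader(
                new InputStreamReader(process.getInputStream(), StandardCharsets.UTF_8))) {
            String line;
            while ((line = out.readLine()) != null) {
                lines.add(line);
            }
        }

        int exitCode = process.waitFor();
        drainer.join();
        return new Result(lines, stderrBytes.get(), exitCode);
    }
}
```

With this arrangement a child that writes a large amount to its error stream before writing to its output stream can still finish, because the error stream is consumed as it is produced rather than only after the output stream closes.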

This behavior is dangerous because the issue stays hidden and is unlikely to show up in normal tests. Normally the error output is not large enough to fill the error stream buffer, but after the system has been running in production for a while, the error output can accumulate until the buffer is full, and then the deadlock happens. Nothing is written to the log, so it is hard to debug.

Here at Yahoo we use many native libraries that were built a long time ago and that sometimes write to the error stream when an error occurs. It is impossible for us to inspect all the direct and indirect native library dependencies and rebuild them all to remove every write to the error stream.

For now we use a workaround: we redirect the error stream to /dev/null at the beginning of our shell process. But I think in the long term this should be fixed in ShellBolt and ShellProcess.
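The workaround described here is applied inside the shell process itself. For comparison, a launcher written in Java could get a similar effect by redirecting the child's error stream to /dev/null when it starts the process. This is only a sketch under that assumption, not how ShellProcess is implemented; the class name DiscardStderrSketch and the method runDiscardingStderr are hypothetical, and the /dev/null path assumes a Unix-like system.

```java
import java.io.BufferedReader;
import java.io.File;
import java.io.IOException;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration; not part of Storm.
public class DiscardStderrSketch {

    public static List<String> runDiscardingStderr(List<String> command)
            throws IOException, InterruptedException {
        ProcessBuilder builder = new ProcessBuilder(command);
        // Send the child's stderr straight to /dev/null so no pipe buffer can fill up.
        builder.redirectError(ProcessBuilder.Redirect.to(new File("/dev/null")));
        Process process = builder.start();

        // Only stdout is left to read, so blocking on it cannot deadlock against stderr.
        List<String> lines = new ArrayList<>();
        try (BufferedReader out = new BufferedReader(
                new InputStreamReader(process.getInputStream(), StandardCharsets.UTF_8))) {
            String line;
            while ((line = out.readLine()) != null) {
                lines.add(line);
            }
        }
        process.waitFor();
        return lines;
    }
}
```

The trade-off is the same as with the in-process redirect: the error output is lost entirely, which is why it is a workaround rather than a fix.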



--
This message was sent by Atlassian JIRA
(v6.1.4#6159)