You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@storm.apache.org by "Rick Kellogg (JIRA)" <ji...@apache.org> on 2015/09/30 03:02:04 UTC

[jira] [Updated] (STORM-1021) HeartbeatExecutorService issue in Shell Spout\Bolt

     [ https://issues.apache.org/jira/browse/STORM-1021?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Rick Kellogg updated STORM-1021:
--------------------------------
    Component/s: storm-multilang

> HeartbeatExecutorService issue in Shell Spout\Bolt
> --------------------------------------------------
>
>                 Key: STORM-1021
>                 URL: https://issues.apache.org/jira/browse/STORM-1021
>             Project: Apache Storm
>          Issue Type: Bug
>          Components: storm-multilang
>    Affects Versions: 0.10.0, 0.9.5
>         Environment: Alpine Linux 3.2, openjdk-7
>            Reporter: Oleh Hordiichuk
>              Labels: easyfix
>             Fix For: 0.10.0
>
>
> ShellSpout class (and it seems that this touches ShellBolt as well) doesn't restart when hearbeat timeout occurs. To reproduce this bug you should do the following:
> 1. Set supervisor.worker.timeout.secs property to e.g. 1;
> 2. Create a shell spout (as a standalone application, not Java class) that hangs for more than 1 second and doesn't respond on heartbeat messages, e.g. Thread.Sleep(5000);
> 3. After timeout Storm will try to kill the shell spout process with calling die function:
> https://github.com/apache/storm/blob/v0.10.0-beta/storm-core/src/jvm/backtype/storm/spout/ShellSpout.java#L237
> 4. The "die" function will call heartBeatExecutorService.shutdownNow() function that raises InterruptedException, which is not caughted by the calling thread. In a result topology stops working properly, however you may see it in ./storm list.
> I'm not Java developer and thus I'm not sure whether code below is valid, however it seems to fix the problem:
>     private void die(Throwable exception) {
>         heartBeatExecutorService.shutdownNow();
>         try {
>             heartBeatExecutorService.awaitTermination(5, TimeUnit.SECONDS);
>         } catch (InterruptedException e) {
>             LOG.error("await catch ", e);
>         }
>         _collector.reportError(exception);
>         _process.destroy();
>         System.exit(11);
>     }



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)