Posted to dev@storm.apache.org by "Jungtaek Lim (JIRA)" <ji...@apache.org> on 2016/07/08 07:25:10 UTC

[jira] [Resolved] (STORM-1946) ShellBolt.java - On busy system BoltHeartbeatTimerTask fires before setHeartbeat() is executed

     [ https://issues.apache.org/jira/browse/STORM-1946?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jungtaek Lim resolved STORM-1946.
---------------------------------
       Resolution: Fixed
         Assignee: Slava Andreyev
    Fix Version/s: 1.1.0
                   1.0.2
                   2.0.0

Thanks [~slava92], I merged this into the master, 1.x, and 1.0.x branches.

> ShellBolt.java - On busy system BoltHeartbeatTimerTask fires before setHeartbeat() is executed
> ----------------------------------------------------------------------------------------------
>
>                 Key: STORM-1946
>                 URL: https://issues.apache.org/jira/browse/STORM-1946
>             Project: Apache Storm
>          Issue Type: Bug
>          Components: storm-core
>            Reporter: Slava Andreyev
>            Assignee: Slava Andreyev
>              Labels: patch
>             Fix For: 2.0.0, 1.0.2, 1.1.0
>
>         Attachments: ShellBolt.java.patch
>
>
> When Storm starts a large number of ShellBolts whose initialization consumes a lot of CPU time, the processes contend heavily for CPU. As a result, [BoltHeartbeatTimerTask|https://github.com/apache/storm/blob/master/storm-core/src/jvm/org/apache/storm/task/ShellBolt.java#L142] can fire after its 1 second delay _before_ [setHeartbeat()|https://github.com/apache/storm/blob/master/storm-core/src/jvm/org/apache/storm/task/ShellBolt.java#L145] assigns the initial value to the [lastHeartbeatTimestamp|https://github.com/apache/storm/blob/master/storm-core/src/jvm/org/apache/storm/task/ShellBolt.java#L91] variable.
> Consequently, when {{BoltHeartbeatTimerTask}} fires for the first time, [getLastHeartbeat()|https://github.com/apache/storm/blob/master/storm-core/src/jvm/org/apache/storm/task/ShellBolt.java#L316] returns *0*. This in turn causes the bolt to die with a ["subprocess heartbeat timeout"|https://github.com/apache/storm/blob/master/storm-core/src/jvm/org/apache/storm/task/ShellBolt.java#L322] message.
> The fix is to call {{setHeartbeat()}} _before_ {{BoltHeartbeatTimerTask}} is created. A patch for this is attached.
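
The ordering problem described in the quoted report can be sketched in isolation. The class below is not the ShellBolt code or the attached patch; it is a minimal illustration that assumes the timer task compares the current time against the last heartbeat using a millisecond timeout. The class name HeartbeatOrderingSketch, the method heartbeatTimedOut(), and the 30 second timeout are hypothetical and chosen only for this example.

```java
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical stand-in for the heartbeat bookkeeping in ShellBolt.
public class HeartbeatOrderingSketch {
    // Defaults to 0 until setHeartbeat() is called, like an unset timestamp.
    private final AtomicLong lastHeartbeatTimestamp = new AtomicLong();
    private final long timeoutMillis;

    public HeartbeatOrderingSketch(long timeoutMillis) {
        this.timeoutMillis = timeoutMillis;
    }

    public void setHeartbeat(long nowMillis) {
        lastHeartbeatTimestamp.set(nowMillis);
    }

    public long getLastHeartbeat() {
        return lastHeartbeatTimestamp.get();
    }

    // Assumed shape of the check a heartbeat timer task would run.
    public boolean heartbeatTimedOut(long nowMillis) {
        return nowMillis - getLastHeartbeat() > timeoutMillis;
    }

    public static void main(String[] args) {
        long now = System.currentTimeMillis();
        long timeout = 30_000L; // illustrative value, not taken from Storm's config

        // Timer fires before the initial heartbeat is set: the timestamp is
        // still 0, so the elapsed time looks like decades.
        HeartbeatOrderingSketch unset = new HeartbeatOrderingSketch(timeout);
        System.out.println("timer before setHeartbeat: timedOut="
                + unset.heartbeatTimedOut(now));

        // Heartbeat set first, timer fires one second later: no timeout.
        HeartbeatOrderingSketch primed = new HeartbeatOrderingSketch(timeout);
        primed.setHeartbeat(now);
        System.out.println("setHeartbeat before timer: timedOut="
                + primed.heartbeatTimedOut(now + 1_000L));
    }
}
```

In this sketch the first check reports a timeout and the second does not, which mirrors why initializing the timestamp before the timer task exists removes the race regardless of how long scheduling is delayed.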



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)