Posted to dev@storm.apache.org by "ASF GitHub Bot (JIRA)" <ji...@apache.org> on 2016/05/03 10:06:12 UTC

[jira] [Commented] (STORM-1750) Report-error-and-die may not kill the worker

    [ https://issues.apache.org/jira/browse/STORM-1750?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15268330#comment-15268330 ] 

ASF GitHub Bot commented on STORM-1750:
---------------------------------------

Github user HeartSaVioR commented on the pull request:

    https://github.com/apache/storm/pull/1384#issuecomment-216463598
  
    +1 
    @srdo Nice find. Could you also apply this fix to 1.x-branch and 0.10.x-branch? Thanks in advance!


> Report-error-and-die may not kill the worker
> --------------------------------------------
>
>                 Key: STORM-1750
>                 URL: https://issues.apache.org/jira/browse/STORM-1750
>             Project: Apache Storm
>          Issue Type: Bug
>          Components: storm-core
>    Affects Versions: 0.10.0, 1.0.0, 2.0.0
>            Reporter: Stig Rohde Døssing
>            Assignee: Stig Rohde Døssing
>            Priority: Critical
>
> The report-error-and-die function in executor.clj calls report-error, which can throw an exception if Curator runs into any kind of trouble while registering the error. I suspect this may happen on network errors, but it can also happen if two executors for the same component throw exceptions at the same time and no errors have previously been registered for the component. In that case both calls to report-error-and-die update the lastErrorPath, and ZkStateStorage set_data doesn't catch the NodeExistsException that the create call may throw.
> If report-error throws, the suicide-fn is never called, and the worker keeps running without the crashed executor.
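
The failure mode described above can be illustrated with a small, self-contained Java sketch. It is not the code from the pull request and does not use Storm's or Curator's APIs; the names ReportErrorAndDieSketch, ErrorReporter, reportErrorAndDieUnsafe and reportErrorAndDie are hypothetical stand-ins for report-error, suicide-fn and report-error-and-die. It shows one possible guard, assuming the intent is that the worker must die even when error reporting itself fails: call the suicide function from a finally block.

```java
import java.util.concurrent.atomic.AtomicBoolean;

class ReportErrorAndDieSketch {

    /** Hypothetical stand-in for report-error; it may fail, e.g. on a ZooKeeper problem. */
    interface ErrorReporter {
        void report(Throwable error) throws Exception;
    }

    /** Shape of the bug: if report() throws, suicideFn is never reached. */
    static void reportErrorAndDieUnsafe(ErrorReporter reporter, Runnable suicideFn,
                                        Throwable error) throws Exception {
        reporter.report(error);
        suicideFn.run();
    }

    /** Guarded variant: suicideFn runs whether or not reporting succeeds. */
    static void reportErrorAndDie(ErrorReporter reporter, Runnable suicideFn,
                                  Throwable error) {
        try {
            reporter.report(error);
        } catch (Exception reportFailure) {
            // Reporting is best effort; log the failure and fall through to finally.
            System.err.println("Could not report error: " + reportFailure);
        } finally {
            suicideFn.run();
        }
    }

    public static void main(String[] args) {
        // Simulated reporter that always fails, standing in for a node-exists race.
        ErrorReporter failing = e -> {
            throw new IllegalStateException("simulated node-exists failure");
        };
        Throwable crash = new RuntimeException("executor crashed");

        AtomicBoolean diedUnsafe = new AtomicBoolean(false);
        try {
            reportErrorAndDieUnsafe(failing, () -> diedUnsafe.set(true), crash);
        } catch (Exception expected) {
            // The reporting failure escapes and the suicide function was skipped.
        }

        AtomicBoolean diedGuarded = new AtomicBoolean(false);
        reportErrorAndDie(failing, () -> diedGuarded.set(true), crash);

        System.out.println("unsafe variant killed worker:  " + diedUnsafe.get());
        System.out.println("guarded variant killed worker: " + diedGuarded.get());
    }
}
```

Using finally rather than only a catch block means the suicide function also runs if the reporter throws something other than an Exception. A real worker would replace the Runnable with whatever actually terminates the process.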



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)