Posted to dev@storm.apache.org by "Stig Rohde Døssing (JIRA)" <ji...@apache.org> on 2016/06/02 11:32:59 UTC

[jira] [Comment Edited] (STORM-1879) Supervisor may not shut down workers cleanly

    [ https://issues.apache.org/jira/browse/STORM-1879?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15312147#comment-15312147 ] 

Stig Rohde Døssing edited comment on STORM-1879 at 6/2/16 11:32 AM:
--------------------------------------------------------------------

Attached are the nimbus and supervisor logs for the period. The undead worker is running ZendeskTicketTopology-127-1464780171. The initial attempt at shutting it down is at 2016-06-01 13:44:39.955 in the supervisor log.


was (Author: srdo):
The nimbus and supervisor logs for the period. The undead worker is running ZendeskTicketTopology-127-1464780171.

> Supervisor may not shut down workers cleanly
> --------------------------------------------
>
>                 Key: STORM-1879
>                 URL: https://issues.apache.org/jira/browse/STORM-1879
>             Project: Apache Storm
>          Issue Type: Bug
>          Components: storm-core
>    Affects Versions: 1.0.1
>            Reporter: Stig Rohde Døssing
>         Attachments: nimbus-supervisor.zip
>
>
> We've run into a strange issue with a zombie worker process. It looks like the worker pid file somehow got deleted without the worker process shutting down. As a result, the supervisor repeatedly tries and fails to kill the worker, and multiple workers may end up assigned to the same port. The worker root folder sticks around because the worker is still heartbeating to it.
> It may or may not be related, but we've also seen Nimbus occasionally enter an infinite loop, printing log lines similar to those below.
> {code}
> 2016-05-19 14:55:14.196 o.a.s.b.BlobStoreUtils [ERROR] Could not update the blob with keyZendeskTicketTopology-5-1463647641-stormconf.ser
> 2016-05-19 14:55:14.210 o.a.s.b.BlobStoreUtils [ERROR] Could not update the blob with keyZendeskTicketTopology-5-1463647641-stormcode.ser
> 2016-05-19 14:55:14.218 o.a.s.b.BlobStoreUtils [ERROR] Could not update the blob with keyZendeskTicketTopology-5-1463647641-stormconf.ser
> 2016-05-19 14:55:14.256 o.a.s.b.BlobStoreUtils [ERROR] Could not update the blob with keyZendeskTicketTopology-5-1463647641-stormcode.ser
> 2016-05-19 14:55:14.273 o.a.s.b.BlobStoreUtils [ERROR] Could not update the blob with keyZendeskTicketTopology-5-1463647641-stormcode.ser
> 2016-05-19 14:55:14.316 o.a.s.b.BlobStoreUtils [ERROR] Could not update the blob with keyZendeskTicketTopology-5-1463647641-stormconf.ser
> {code}
> This continues until Nimbus is rebooted. We also see repeating blocks similar to the logs below.
> {code}
> 2016-06-02 07:45:03.656 o.a.s.d.nimbus [INFO] Cleaning up ZendeskTicketTopology-127-1464780171
> 2016-06-02 07:45:04.132 o.a.s.d.nimbus [INFO] ExceptionKeyNotFoundException(msg:ZendeskTicketTopology-127-1464780171-stormjar.jar)
> 2016-06-02 07:45:04.144 o.a.s.d.nimbus [INFO] ExceptionKeyNotFoundException(msg:ZendeskTicketTopology-127-1464780171-stormconf.ser)
> 2016-06-02 07:45:04.155 o.a.s.d.nimbus [INFO] ExceptionKeyNotFoundException(msg:ZendeskTicketTopology-127-1464780171-stormcode.ser)
> {code}
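
To gauge how often Nimbus repeats the BlobStoreUtils error from the quoted description, the log lines can be tallied per blob key. The following is only a rough sketch, assuming the log line format shown in the description; the function name count_blob_update_errors and the regular expression are illustrative and are not part of Storm.

```python
import re
from collections import Counter

# Matches the BlobStoreUtils error lines quoted in the issue description.
# Assumption: the logged message has no space between "key" and the blob key,
# as in the quoted logs. An optional leading "> " (mail quoting) is tolerated.
BLOB_ERROR = re.compile(
    r"^(?:>\s*)?(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d{3}) "
    r"o\.a\.s\.b\.BlobStoreUtils \[ERROR\] "
    r"Could not update the blob with key(?P<key>\S+)$"
)


def count_blob_update_errors(lines):
    """Return a Counter mapping blob key -> number of failed update attempts."""
    counts = Counter()
    for line in lines:
        match = BLOB_ERROR.match(line.strip())
        if match:
            counts[match.group("key")] += 1
    return counts


if __name__ == "__main__":
    # Two lines copied from the quoted nimbus log; a real run would read nimbus.log.
    sample = [
        "2016-05-19 14:55:14.196 o.a.s.b.BlobStoreUtils [ERROR] Could not update the blob with keyZendeskTicketTopology-5-1463647641-stormconf.ser",
        "2016-05-19 14:55:14.210 o.a.s.b.BlobStoreUtils [ERROR] Could not update the blob with keyZendeskTicketTopology-5-1463647641-stormcode.ser",
    ]
    for key, n in count_blob_update_errors(sample).most_common():
        print(n, key)
```

A key whose count keeps growing between runs would suggest the loop is still active for that topology.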



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)