Posted to dev@zookeeper.apache.org by "Laxman (JIRA)" <ji...@apache.org> on 2011/06/24 06:48:55 UTC

[jira] [Created] (ZOOKEEPER-1109) Zookeeper service is down when SyncRequestProcessor meets any exception.

Zookeeper service is down when SyncRequestProcessor meets any exception.
------------------------------------------------------------------------

                 Key: ZOOKEEPER-1109
                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1109
             Project: ZooKeeper
          Issue Type: Bug
          Components: quorum
    Affects Versions: 3.3.3, 3.3.2, 3.3.1, 3.3.0
            Reporter: Laxman


*Problem* ZooKeeper does not shut down completely when the dataDir disk is full, and the ZK cluster goes into an unserviceable state.

*Scenario*
If the leader's disk fills up, ZooKeeper tries to shut down, but it waits indefinitely while shutting down the SyncRequestProcessor thread.

*Root Cause*
this.join() waits on the same thread that triggered System.exit(11).

When the disk fills up, the SyncRequestProcessor thread gets a 'No space left on device' exception and invokes System.exit(11) (the following logs show the same). Before the JVM exits, it runs the ShutdownHook of QuorumPeerMain, and the flow reaches SyncRequestProcessor.shutdown(). There, this.join() waits on the very thread that invoked System.exit(11). Because the thread calling System.exit() blocks until all shutdown hooks have finished, the hook waits for the SyncRequestProcessor thread while that thread waits for the hook, and neither ever completes.
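The cycle can be reproduced without ZooKeeper. The following is a minimal, hypothetical Java sketch (class and method names are illustrative, not from the ZooKeeper code base): a worker thread stands in for SyncRequestProcessor and blocks waiting for a hook thread, the way the caller of System.exit(11) blocks until shutdown hooks finish, while the hook joins the worker, as SyncRequestProcessor.shutdown() does. Both threads are daemons and the caller uses a timed join, so the demo itself cannot hang.

```java
public class ExitJoinDeadlockDemo {

    /** Returns true if the worker is still blocked after the timeout, i.e. the cycle deadlocked. */
    public static boolean workerStillBlocked(long timeoutMillis) {
        final Thread[] worker = new Thread[1];

        // Stands in for the QuorumPeerMain shutdown hook: it waits for the worker
        // to finish, as SyncRequestProcessor.shutdown() does with this.join().
        Thread hook = new Thread(() -> {
            try {
                worker[0].join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });

        // Stands in for the SyncRequestProcessor thread calling System.exit(11):
        // it starts the hook and then blocks until the hook completes.
        worker[0] = new Thread(() -> {
            hook.start();
            try {
                hook.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });

        // Daemon threads so the stuck pair does not keep the JVM alive after the demo.
        hook.setDaemon(true);
        worker[0].setDaemon(true);
        worker[0].start();

        try {
            worker[0].join(timeoutMillis);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return worker[0].isAlive();
    }

    public static void main(String[] args) {
        // prints "deadlocked: true"
        System.out.println("deadlocked: " + workerStillBlocked(500));
    }
}
```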


--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (ZOOKEEPER-1109) Zookeeper service is down when SyncRequestProcessor meets any exception.

Posted by "Hadoop QA (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/ZOOKEEPER-1109?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13057755#comment-13057755 ] 

Hadoop QA commented on ZOOKEEPER-1109:
--------------------------------------

-1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12484757/ZOOKEEPER-1109.patch
  against trunk revision 1140017.

    +1 @author.  The patch does not contain any @author tags.

    -1 tests included.  The patch doesn't appear to include any new or modified tests.
                        Please justify why no new tests are needed for this patch.
                        Also please list what manual steps were performed to verify this patch.

    +1 javadoc.  The javadoc tool did not generate any warning messages.

    +1 javac.  The applied patch does not increase the total number of javac compiler warnings.

    +1 findbugs.  The patch does not introduce any new Findbugs (version 1.3.9) warnings.

    +1 release audit.  The applied patch does not increase the total number of release audit warnings.

    -1 core tests.  The patch failed core unit tests.

    +1 contrib tests.  The patch passed contrib unit tests.

Test results: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/360//testReport/
Findbugs warnings: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/360//artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Console output: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/360//console

This message is automatically generated.

> Zookeeper service is down when SyncRequestProcessor meets any exception.
> ------------------------------------------------------------------------
>
>                 Key: ZOOKEEPER-1109
>                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1109
>             Project: ZooKeeper
>          Issue Type: Bug
>          Components: quorum
>    Affects Versions: 3.3.0, 3.3.1, 3.3.2, 3.3.3
>            Reporter: Laxman
>         Attachments: ZOOKEEPER-1109.patch
>
>   Original Estimate: 72h
>  Remaining Estimate: 72h
>

[jira] [Issue Comment Edited] (ZOOKEEPER-1109) Zookeeper service is down when SyncRequestProcessor meets any exception.

Posted by "Mahadev konar (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/ZOOKEEPER-1109?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13070753#comment-13070753 ] 

Mahadev konar edited comment on ZOOKEEPER-1109 at 7/25/11 9:00 PM:
-------------------------------------------------------------------

I am removing the fix version for 3.3.4. The patch doesn't apply to the 3.3 branch. I'll let the 3.3.4 release manager decide if they want to backport this. 



      was (Author: mahadev):
    I am changing the fix version to 3.4 only. The patch doesn't apply to the 3.3 branch. I'll let the 3.3.4 release manager decide if they want to backport this. 


  
> Zookeeper service is down when SyncRequestProcessor meets any exception.
> ------------------------------------------------------------------------
>
>                 Key: ZOOKEEPER-1109
>                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1109
>             Project: ZooKeeper
>          Issue Type: Bug
>          Components: quorum
>    Affects Versions: 3.3.0, 3.3.1, 3.3.2, 3.3.3
>            Reporter: Laxman
>            Assignee: Laxman
>            Priority: Critical
>             Fix For: 3.4.0
>
>         Attachments: ZOOKEEPER-1109.1.patch, ZOOKEEPER-1109.patch
>
>   Original Estimate: 72h
>  Remaining Estimate: 72h
>

[jira] [Issue Comment Edited] (ZOOKEEPER-1109) Zookeeper service is down when SyncRequestProcessor meets any exception.

Posted by "Mahadev konar (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/ZOOKEEPER-1109?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13061783#comment-13061783 ] 

Mahadev konar edited comment on ZOOKEEPER-1109 at 7/8/11 6:00 AM:
------------------------------------------------------------------

Laxman,
 I think we should probably make the boolean running volatile. Other than that, it looks good.


      was (Author: mahadev):
    Laxman,
 I think we should probably make the boolean running volatile.

  
> Zookeeper service is down when SyncRequestProcessor meets any exception.
> ------------------------------------------------------------------------
>
>                 Key: ZOOKEEPER-1109
>                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1109
>             Project: ZooKeeper
>          Issue Type: Bug
>          Components: quorum
>    Affects Versions: 3.3.0, 3.3.1, 3.3.2, 3.3.3
>            Reporter: Laxman
>         Attachments: ZOOKEEPER-1109.patch
>
>   Original Estimate: 72h
>  Remaining Estimate: 72h
>

[jira] [Commented] (ZOOKEEPER-1109) Zookeeper service is down when SyncRequestProcessor meets any exception.

Posted by "Laxman (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/ZOOKEEPER-1109?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13058989#comment-13058989 ] 

Laxman commented on ZOOKEEPER-1109:
-----------------------------------

The test failure reported here doesn't seem to have been introduced by this patch.
All tests were verified locally and all of them pass.

> Zookeeper service is down when SyncRequestProcessor meets any exception.
> ------------------------------------------------------------------------
>
>                 Key: ZOOKEEPER-1109
>                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1109
>             Project: ZooKeeper
>          Issue Type: Bug
>          Components: quorum
>    Affects Versions: 3.3.0, 3.3.1, 3.3.2, 3.3.3
>            Reporter: Laxman
>         Attachments: ZOOKEEPER-1109.patch
>
>   Original Estimate: 72h
>  Remaining Estimate: 72h
>

[jira] [Commented] (ZOOKEEPER-1109) Zookeeper service is down when SyncRequestProcessor meets any exception.

Posted by "Laxman (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/ZOOKEEPER-1109?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13069507#comment-13069507 ] 

Laxman commented on ZOOKEEPER-1109:
-----------------------------------

Hi Mahadev, any suggestions on the reworked patch?

> Zookeeper service is down when SyncRequestProcessor meets any exception.
> ------------------------------------------------------------------------
>
>                 Key: ZOOKEEPER-1109
>                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1109
>             Project: ZooKeeper
>          Issue Type: Bug
>          Components: quorum
>    Affects Versions: 3.3.0, 3.3.1, 3.3.2, 3.3.3
>            Reporter: Laxman
>            Assignee: Laxman
>            Priority: Critical
>             Fix For: 3.3.4, 3.4.0
>
>         Attachments: ZOOKEEPER-1109.1.patch, ZOOKEEPER-1109.patch
>
>   Original Estimate: 72h
>  Remaining Estimate: 72h
>

[jira] [Updated] (ZOOKEEPER-1109) Zookeeper service is down when SyncRequestProcessor meets any exception.

Posted by "Patrick Hunt (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/ZOOKEEPER-1109?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Patrick Hunt updated ZOOKEEPER-1109:
------------------------------------

         Priority: Critical  (was: Major)
    Fix Version/s: 3.4.0
                   3.3.4

> Zookeeper service is down when SyncRequestProcessor meets any exception.
> ------------------------------------------------------------------------
>
>                 Key: ZOOKEEPER-1109
>                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1109
>             Project: ZooKeeper
>          Issue Type: Bug
>          Components: quorum
>    Affects Versions: 3.3.0, 3.3.1, 3.3.2, 3.3.3
>            Reporter: Laxman
>            Assignee: Laxman
>            Priority: Critical
>             Fix For: 3.3.4, 3.4.0
>
>         Attachments: ZOOKEEPER-1109.patch
>
>   Original Estimate: 72h
>  Remaining Estimate: 72h
>

[jira] [Commented] (ZOOKEEPER-1109) Zookeeper service is down when SyncRequestProcessor meets any exception.

Posted by "Mahadev konar (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/ZOOKEEPER-1109?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13061783#comment-13061783 ] 

Mahadev konar commented on ZOOKEEPER-1109:
------------------------------------------

Laxman,
 I think we should probably make the boolean running volatile.
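For context on the volatile suggestion: a flag written by one thread and read by another needs volatile (or other synchronization) for the write to be guaranteed visible; without it, the Java Memory Model allows the reader to keep seeing a stale value. Below is a minimal sketch with a hypothetical running flag and class name, not taken from the patch.

```java
public class VolatileStopFlag {
    // volatile guarantees that stop()'s write becomes visible to the spinning worker thread.
    private volatile boolean running = true;

    public Thread startWorker() {
        Thread t = new Thread(() -> {
            while (running) {
                // busy loop; the volatile field is re-read on every iteration
            }
        });
        t.setDaemon(true);
        t.start();
        return t;
    }

    public void stop() {
        running = false;
    }

    /** Returns true if the worker observed the flag change and exited within the timeout. */
    public static boolean workerStops(long timeoutMillis) {
        VolatileStopFlag flag = new VolatileStopFlag();
        Thread worker = flag.startWorker();
        flag.stop();
        try {
            worker.join(timeoutMillis);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return !worker.isAlive();
    }

    public static void main(String[] args) {
        System.out.println("worker stopped: " + workerStops(2000));
    }
}
```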


> Zookeeper service is down when SyncRequestProcessor meets any exception.
> ------------------------------------------------------------------------
>
>                 Key: ZOOKEEPER-1109
>                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1109
>             Project: ZooKeeper
>          Issue Type: Bug
>          Components: quorum
>    Affects Versions: 3.3.0, 3.3.1, 3.3.2, 3.3.3
>            Reporter: Laxman
>         Attachments: ZOOKEEPER-1109.patch
>
>   Original Estimate: 72h
>  Remaining Estimate: 72h
>

[jira] [Commented] (ZOOKEEPER-1109) Zookeeper service is down when SyncRequestProcessor meets any exception.

Posted by "Mahadev konar (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/ZOOKEEPER-1109?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13070729#comment-13070729 ] 

Mahadev konar commented on ZOOKEEPER-1109:
------------------------------------------

+1, patch looks good. I'll go ahead and commit. Thanks Laxman! Sorry for my late response.

> Zookeeper service is down when SyncRequestProcessor meets any exception.
> ------------------------------------------------------------------------
>
>                 Key: ZOOKEEPER-1109
>                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1109
>             Project: ZooKeeper
>          Issue Type: Bug
>          Components: quorum
>    Affects Versions: 3.3.0, 3.3.1, 3.3.2, 3.3.3
>            Reporter: Laxman
>            Assignee: Laxman
>            Priority: Critical
>             Fix For: 3.3.4, 3.4.0
>
>         Attachments: ZOOKEEPER-1109.1.patch, ZOOKEEPER-1109.patch
>
>   Original Estimate: 72h
>  Remaining Estimate: 72h
>

[jira] [Assigned] (ZOOKEEPER-1109) Zookeeper service is down when SyncRequestProcessor meets any exception.

Posted by "Patrick Hunt (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/ZOOKEEPER-1109?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Patrick Hunt reassigned ZOOKEEPER-1109:
---------------------------------------

    Assignee: Laxman

> Zookeeper service is down when SyncRequestProcessor meets any exception.
> ------------------------------------------------------------------------
>
>                 Key: ZOOKEEPER-1109
>                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1109
>             Project: ZooKeeper
>          Issue Type: Bug
>          Components: quorum
>    Affects Versions: 3.3.0, 3.3.1, 3.3.2, 3.3.3
>            Reporter: Laxman
>            Assignee: Laxman
>             Fix For: 3.3.4, 3.4.0
>
>         Attachments: ZOOKEEPER-1109.patch
>
>   Original Estimate: 72h
>  Remaining Estimate: 72h
>

[jira] [Updated] (ZOOKEEPER-1109) Zookeeper service is down when SyncRequestProcessor meets any exception.

Posted by "Mahadev konar (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/ZOOKEEPER-1109?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Mahadev konar updated ZOOKEEPER-1109:
-------------------------------------

    Fix Version/s:     (was: 3.3.4)

I am changing the fix version to 3.4 only. The patch doesn't apply to the 3.3 branch. I'll let the 3.3.4 release manager decide if they want to backport this. 



> Zookeeper service is down when SyncRequestProcessor meets any exception.
> ------------------------------------------------------------------------
>
>                 Key: ZOOKEEPER-1109
>                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1109
>             Project: ZooKeeper
>          Issue Type: Bug
>          Components: quorum
>    Affects Versions: 3.3.0, 3.3.1, 3.3.2, 3.3.3
>            Reporter: Laxman
>            Assignee: Laxman
>            Priority: Critical
>             Fix For: 3.4.0
>
>         Attachments: ZOOKEEPER-1109.1.patch, ZOOKEEPER-1109.patch
>
>   Original Estimate: 72h
>  Remaining Estimate: 72h
>

[jira] [Updated] (ZOOKEEPER-1109) Zookeeper service is down when SyncRequestProcessor meets any exception.

Posted by "Laxman (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/ZOOKEEPER-1109?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Laxman updated ZOOKEEPER-1109:
------------------------------

    Attachment: ZOOKEEPER-1109.1.patch

> Zookeeper service is down when SyncRequestProcessor meets any exception.
> ------------------------------------------------------------------------
>
>                 Key: ZOOKEEPER-1109
>                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1109
>             Project: ZooKeeper
>          Issue Type: Bug
>          Components: quorum
>    Affects Versions: 3.3.0, 3.3.1, 3.3.2, 3.3.3
>            Reporter: Laxman
>            Assignee: Laxman
>            Priority: Critical
>             Fix For: 3.3.4, 3.4.0
>
>         Attachments: ZOOKEEPER-1109.1.patch, ZOOKEEPER-1109.patch
>
>   Original Estimate: 72h
>  Remaining Estimate: 72h
>

[jira] [Updated] (ZOOKEEPER-1109) Zookeeper service is down when SyncRequestProcessor meets any exception.

Posted by "Laxman (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/ZOOKEEPER-1109?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Laxman updated ZOOKEEPER-1109:
------------------------------

    Attachment: ZOOKEEPER-1109.patch

Tested the patch with debug points.
Unable to add a test case, because this scenario ends in System.exit().

*Patch*
If the shutdown has been triggered by this thread, we don't call this.join().
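Below is one way such a guard could work, consistent with the volatile boolean running discussed in review. It is a hypothetical simplification, not the committed SyncRequestProcessor code: the failing thread clears a volatile running flag before exiting, and shutdown() joins only while the flag is still set. A Runnable stands in for System.exit(11) so the sketch can run. It starts a "hook" thread that calls shutdown() and waits for it, mirroring how the JVM runs shutdown hooks.

```java
import java.io.IOException;

public class GuardedProcessor extends Thread {
    // Cleared by this thread, read by the shutdown-hook thread, hence volatile.
    private volatile boolean running = true;
    private final Runnable fatalExit; // hypothetical stand-in for System.exit(11)

    public GuardedProcessor(Runnable fatalExit) {
        this.fatalExit = fatalExit;
    }

    @Override
    public void run() {
        try {
            // Simulated log write that fails on a full disk.
            throw new IOException("No space left on device");
        } catch (IOException e) {
            running = false;   // tell shutdown() not to wait for this thread
            fatalExit.run();   // blocks until the "hook" finishes, like System.exit
        }
    }

    public void shutdown() {
        try {
            if (running) {
                join();        // skipped when this thread is the one exiting
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        // A real processor would shut down the next processor in the chain here.
    }

    /** Returns true if the simulated exit completes instead of deadlocking. */
    public static boolean exitCompletes(long timeoutMillis) {
        final GuardedProcessor[] p = new GuardedProcessor[1];
        p[0] = new GuardedProcessor(() -> {
            Thread hook = new Thread(() -> p[0].shutdown());
            hook.start();
            try {
                hook.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        p[0].setDaemon(true);
        p[0].start();
        try {
            p[0].join(timeoutMillis);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return !p[0].isAlive();
    }

    public static void main(String[] args) {
        System.out.println("exit completed: " + exitCompletes(2000));
    }
}
```

Using a flag rather than comparing Thread.currentThread() with this matters here: shutdown() runs on the hook thread, not on the thread that called System.exit, so a current-thread check alone would not detect the cycle.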


> Zookeeper service is down when SyncRequestProcessor meets any exception.
> ------------------------------------------------------------------------
>
>                 Key: ZOOKEEPER-1109
>                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1109
>             Project: ZooKeeper
>          Issue Type: Bug
>          Components: quorum
>    Affects Versions: 3.3.0, 3.3.1, 3.3.2, 3.3.3
>            Reporter: Laxman
>         Attachments: ZOOKEEPER-1109.patch
>
>   Original Estimate: 72h
>  Remaining Estimate: 72h
>

[jira] [Commented] (ZOOKEEPER-1109) Zookeeper service is down when SyncRequestProcessor meets any exception.

Posted by "Hudson (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/ZOOKEEPER-1109?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13071033#comment-13071033 ] 

Hudson commented on ZOOKEEPER-1109:
-----------------------------------

Integrated in ZooKeeper-trunk #1255 (See [https://builds.apache.org/job/ZooKeeper-trunk/1255/])
    ZOOKEEPER-1109. Zookeeper service is down when SyncRequestProcessor meets any exception. (Laxman via mahadev)

mahadev : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1150903
Files : 
* /zookeeper/trunk/CHANGES.txt
* /zookeeper/trunk/src/java/main/org/apache/zookeeper/server/SyncRequestProcessor.java


> Zookeeper service is down when SyncRequestProcessor meets any exception.
> ------------------------------------------------------------------------
>
>                 Key: ZOOKEEPER-1109
>                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1109
>             Project: ZooKeeper
>          Issue Type: Bug
>          Components: quorum
>    Affects Versions: 3.3.0, 3.3.1, 3.3.2, 3.3.3
>            Reporter: Laxman
>            Assignee: Laxman
>            Priority: Critical
>             Fix For: 3.4.0
>
>         Attachments: ZOOKEEPER-1109.1.patch, ZOOKEEPER-1109.patch
>
>   Original Estimate: 72h
>  Remaining Estimate: 72h
>

[jira] [Commented] (ZOOKEEPER-1109) Zookeeper service is down when SyncRequestProcessor meets any exception.

Posted by "Hadoop QA (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/ZOOKEEPER-1109?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13065030#comment-13065030 ] 

Hadoop QA commented on ZOOKEEPER-1109:
--------------------------------------

-1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12486399/ZOOKEEPER-1109.1.patch
  against trunk revision 1146025.

    +1 @author.  The patch does not contain any @author tags.

    -1 tests included.  The patch doesn't appear to include any new or modified tests.
                        Please justify why no new tests are needed for this patch.
                        Also please list what manual steps were performed to verify this patch.

    +1 javadoc.  The javadoc tool did not generate any warning messages.

    +1 javac.  The applied patch does not increase the total number of javac compiler warnings.

    +1 findbugs.  The patch does not introduce any new Findbugs (version 1.3.9) warnings.

    +1 release audit.  The applied patch does not increase the total number of release audit warnings.

    +1 core tests.  The patch passed core unit tests.

    +1 contrib tests.  The patch passed contrib unit tests.

Test results: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/393//testReport/
Findbugs warnings: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/393//artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Console output: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/393//console

This message is automatically generated.

> Zookeeper service is down when SyncRequestProcessor meets any exception.
> ------------------------------------------------------------------------
>
>                 Key: ZOOKEEPER-1109
>                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1109
>             Project: ZooKeeper
>          Issue Type: Bug
>          Components: quorum
>    Affects Versions: 3.3.0, 3.3.1, 3.3.2, 3.3.3
>            Reporter: Laxman
>            Assignee: Laxman
>            Priority: Critical
>             Fix For: 3.3.4, 3.4.0
>
>         Attachments: ZOOKEEPER-1109.1.patch, ZOOKEEPER-1109.patch
>
>   Original Estimate: 72h
>  Remaining Estimate: 72h

[jira] [Commented] (ZOOKEEPER-1109) Zookeeper service is down when SyncRequestProcessor meets any exception.

Posted by "Laxman (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/ZOOKEEPER-1109?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13054239#comment-13054239 ] 

Laxman commented on ZOOKEEPER-1109:
-----------------------------------

Reposting the comments and analysis.

I've also gone through Ted's earlier response on the disk-full scenario:
http://mail-archives.apache.org/mod_mbox/zookeeper-user/201106.mbox/%3CBANLkTimzQjXZvDKnP6xQLF9jHfsaz6JstA@mail.gmail.com%3E

We feel that even when the disk of one cluster member is full, the service provided by the entire cluster should not be interrupted.

*Thread dumps*

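The following thread dumps show the QuorumPeerMain shutdown-hook thread ("Thread-2") waiting indefinitely in SyncRequestProcessor.shutdown() for "SyncThread:2" to finish, while "SyncThread:2", which called System.exit(11), is itself waiting for that shutdown hook to finish.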

{noformat}
"Thread-2" prio=10 tid=0x0810a400 nid=0x1695 in Object.wait() [0xac783000] 
   java.lang.Thread.State: WAITING (on object monitor) 
        at java.lang.Object.wait(Native Method) 
        - waiting on <0xb804f5e8> (a org.apache.zookeeper.server.SyncRequestProcessor) 
        at java.lang.Thread.join(Thread.java:1143) 
        - locked <0xb804f5e8> (a org.apache.zookeeper.server.SyncRequestProcessor) 
        at java.lang.Thread.join(Thread.java:1196) 
        at org.apache.zookeeper.server.SyncRequestProcessor.shutdown(SyncRequestProcessor.java:171) 
        at org.apache.zookeeper.server.quorum.ProposalRequestProcessor.shutdown(ProposalRequestProcessor.java:79) 
        at org.apache.zookeeper.server.PrepRequestProcessor.shutdown(PrepRequestProcessor.java:513) 
        at org.apache.zookeeper.server.ZooKeeperServer.shutdown(ZooKeeperServer.java:413) 
        at org.apache.zookeeper.server.quorum.Leader.shutdown(Leader.java:411) 
        at org.apache.zookeeper.server.quorum.QuorumPeer.shutdown(QuorumPeer.java:694) 
        at org.apache.zookeeper.server.quorum.QuorumPeerMain$1.run(QuorumPeerMain.java:126) 

"SyncThread:2" prio=10 tid=0xad7fd800 nid=0x4acb in Object.wait() [0xac9ba000] 
   java.lang.Thread.State: WAITING (on object monitor) 
        at java.lang.Object.wait(Native Method) 
        - waiting on <0xb8030d00> (a org.apache.zookeeper.server.quorum.QuorumPeerMain$1) 
        at java.lang.Thread.join(Thread.java:1143) 
        - locked <0xb8030d00> (a org.apache.zookeeper.server.quorum.QuorumPeerMain$1) 
        at java.lang.Thread.join(Thread.java:1196) 
        at java.lang.ApplicationShutdownHooks.runHooks(ApplicationShutdownHooks.java:79) 
        at java.lang.ApplicationShutdownHooks$1.run(ApplicationShutdownHooks.java:24) 
        at java.lang.Shutdown.runHooks(Shutdown.java:79) 
        at java.lang.Shutdown.sequence(Shutdown.java:123) 
        at java.lang.Shutdown.exit(Shutdown.java:168) 
        - locked <0xf01ff3b0> (a java.lang.Class for java.lang.Shutdown) 
        at java.lang.Runtime.exit(Runtime.java:90) 
        at java.lang.System.exit(System.java:904) 
        at org.apache.zookeeper.server.SyncRequestProcessor.run(SyncRequestProcessor.java:149)
{noformat}
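To make the mutual wait in these dumps easier to see, here is a small self-contained Java model of it. This is only a sketch under stated assumptions, not ZooKeeper code and not the patch attached to this issue: every name in it (ExitJoinDeadlockSketch, Worker, simulateSystemExit, guardJoin, completes) is hypothetical. Instead of calling System.exit(11), the worker starts a "hook" thread and join()s it, which mirrors what the "SyncThread:2" stack shows inside ApplicationShutdownHooks.runHooks(). The hook then calls shutdown(), which join()s the worker, as "Thread-2" does in SyncRequestProcessor.shutdown(). The guard shown (a volatile running flag cleared before the exit, so that shutdown() skips join()) is one possible way to break the cycle, assumed here for illustration.

```java
import java.io.IOException;

/**
 * Hypothetical model of the ZOOKEEPER-1109 hang (not ZooKeeper code).
 * Worker stands in for SyncRequestProcessor; the "hook" thread stands in
 * for the QuorumPeerMain shutdown hook.
 */
class ExitJoinDeadlockSketch {

    static class Worker extends Thread {
        // Assumed guard: cleared before the simulated exit so shutdown() can skip join().
        private volatile boolean running = true;
        private final boolean guardJoin;
        volatile boolean shutdownReturned = false;

        Worker(boolean guardJoin) {
            super("SyncThread-sketch");
            this.guardJoin = guardJoin;
            setDaemon(true); // a hung sketch must not keep the JVM alive
        }

        @Override
        public void run() {
            try {
                // Simulated disk-full failure while flushing the transaction log.
                throw new IOException("No space left on device");
            } catch (IOException e) {
                running = false;
                simulateSystemExit();
            }
        }

        // Stand-in for System.exit(11): the exiting thread starts the hook
        // and waits for it, as ApplicationShutdownHooks.runHooks() does.
        private void simulateSystemExit() {
            Thread hook = new Thread(this::shutdown, "hook-sketch");
            hook.setDaemon(true);
            hook.start();
            try {
                hook.join();
            } catch (InterruptedException ie) {
                Thread.currentThread().interrupt();
            }
        }

        // Stand-in for SyncRequestProcessor.shutdown(), run on the hook thread.
        void shutdown() {
            try {
                if (!guardJoin || running) {
                    // Unguarded: waits for the worker, which is waiting for this hook.
                    join();
                }
            } catch (InterruptedException ie) {
                Thread.currentThread().interrupt();
            }
            shutdownReturned = true;
        }
    }

    /** Returns true if the simulated exit sequence finishes within timeoutMs. */
    static boolean completes(boolean guardJoin, long timeoutMs) {
        Worker worker = new Worker(guardJoin);
        worker.start();
        try {
            worker.join(timeoutMs);
        } catch (InterruptedException ie) {
            Thread.currentThread().interrupt();
            return false;
        }
        return !worker.isAlive() && worker.shutdownReturned;
    }

    public static void main(String[] args) {
        System.out.println("guarded join completes:   " + completes(true, 2000));
        System.out.println("unguarded join completes: " + completes(false, 500));
    }
}
```

completes(true, 2000) returns true because the hook skips join(); completes(false, 500) returns false because each thread waits for the other until the timeout expires, which is why both sketch threads are marked as daemons.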


*Logs*

{noformat}
2011-06-21 10:09:59,730 - FATAL [SyncThread:2:SyncRequestProcessor@148] - Severe unrecoverable error, exiting 
java.io.IOException: No space left on device 
        at java.io.FileOutputStream.writeBytes(Native Method) 
        at java.io.FileOutputStream.write(FileOutputStream.java:260) 
        at java.io.BufferedOutputStream.flushBuffer(BufferedOutputStream.java:65) 
        at java.io.BufferedOutputStream.flush(BufferedOutputStream.java:123) 
        at org.apache.zookeeper.server.persistence.FileTxnLog.commit(FileTxnLog.java:305) 
        at org.apache.zookeeper.server.persistence.FileTxnSnapLog.commit(FileTxnSnapLog.java:324) 
        at org.apache.zookeeper.server.ZKDatabase.commit(ZKDatabase.java:484) 
        at org.apache.zookeeper.server.SyncRequestProcessor.flush(SyncRequestProcessor.java:158) 
        at org.apache.zookeeper.server.SyncRequestProcessor.run(SyncRequestProcessor.java:98) 
2011-06-21 10:09:59,732 - INFO  [Thread-2:QuorumPeer@691] - The Quorum server is going for shutdown 
2011-06-21 10:09:59,732 - INFO  [Thread-2:Leader@393] - Shutdown called 
java.lang.Exception: shutdown Leader! reason: quorum Peer shutdown 
        at org.apache.zookeeper.server.quorum.Leader.shutdown(Leader.java:393) 
        at org.apache.zookeeper.server.quorum.QuorumPeer.shutdown(QuorumPeer.java:694) 
        at org.apache.zookeeper.server.quorum.QuorumPeerMain$1.run(QuorumPeerMain.java:126) 
2011-06-21 10:09:59,733 - INFO  [Thread-6:Leader$LearnerCnxAcceptor@243] - exception while shutting down acceptor: java.net.SocketException: Socket closed 
2011-06-21 10:09:59,758 - INFO  [ProcessThread:-1:PrepRequestProcessor@120] - PrepRequestProcessor exited loop! 
2011-06-21 10:09:59,758 - INFO  [CommitProcessor:2:CommitProcessor@150] - CommitProcessor exited loop! 
2011-06-21 10:09:59,759 - INFO  [Thread-2:FinalRequestProcessor@379] - shutdown of request processor complete 
2011-06-21 10:10:00,000 - INFO  [SessionTracker:SessionTrackerImpl@165] - SessionTrackerImpl exited loop! 
{noformat}


> Zookeeper service is down when SyncRequestProcessor meets any exception.
> ------------------------------------------------------------------------
>
>                 Key: ZOOKEEPER-1109
>                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1109
>             Project: ZooKeeper
>          Issue Type: Bug
>          Components: quorum
>    Affects Versions: 3.3.0, 3.3.1, 3.3.2, 3.3.3
>            Reporter: Laxman
>   Original Estimate: 72h
>  Remaining Estimate: 72h

[jira] [Commented] (ZOOKEEPER-1109) Zookeeper service is down when SyncRequestProcessor meets any exception.

Posted by "Mahadev konar (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/ZOOKEEPER-1109?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13063576#comment-13063576 ] 

Mahadev konar commented on ZOOKEEPER-1109:
------------------------------------------

Laxman,
Any update? Are you planning to update the patch? If not, please let me know.

> Zookeeper service is down when SyncRequestProcessor meets any exception.
> ------------------------------------------------------------------------
>
>                 Key: ZOOKEEPER-1109
>                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1109
>             Project: ZooKeeper
>          Issue Type: Bug
>          Components: quorum
>    Affects Versions: 3.3.0, 3.3.1, 3.3.2, 3.3.3
>            Reporter: Laxman
>            Assignee: Laxman
>            Priority: Critical
>             Fix For: 3.3.4, 3.4.0
>
>         Attachments: ZOOKEEPER-1109.patch
>
>   Original Estimate: 72h
>  Remaining Estimate: 72h
