Posted to dev@zookeeper.apache.org by "Patrick Hunt (JIRA)" <ji...@apache.org> on 2009/07/23 23:47:14 UTC

[jira] Created: (ZOOKEEPER-485) need ops documentation that details supervision of ZK server processes

need ops documentation that details supervision of ZK server processes
----------------------------------------------------------------------

                 Key: ZOOKEEPER-485
                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-485
             Project: Zookeeper
          Issue Type: Bug
          Components: documentation, server
            Reporter: Patrick Hunt
             Fix For: 3.2.1, 3.3.0


We need ops documentation detailing what to do if the ZK server VM fails. By "fail" I mean that the JVM process
exits, dies, crashes, etc.

In general, a supervisor process should be used to start, stop, and restart the ZK server JVM.

Something like daemontools (http://cr.yp.to/daemontools.html) could be used or, more simply, a wrapper script
that monitors the status of the pid and restarts the JVM if it fails. It's up to the operator: if this is not done
automatically, it will have to be done manually, by an operator restarting the ZK server JVM.
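
As a rough illustration of the wrapper-script idea above, here is a minimal Python sketch. It is not part of the proposed documentation or of ZooKeeper itself: the function name `supervise`, its parameters, and the policy of treating exit code 0 as an intentional stop are all assumptions made for the example. A production setup would more likely rely on an existing supervisor such as daemontools.

```python
import subprocess
import time
from typing import List, Sequence


def supervise(cmd: Sequence[str], max_restarts: int = 5, delay_s: float = 2.0) -> List[int]:
    """Run cmd and restart it whenever it exits abnormally.

    Hypothetical helper: gives up after max_restarts restarts and
    returns the exit codes observed, in order.
    """
    exit_codes: List[int] = []
    for attempt in range(max_restarts + 1):
        proc = subprocess.Popen(list(cmd))
        code = proc.wait()  # blocks until the child process exits
        exit_codes.append(code)
        if code == 0:
            # Assumption: a clean exit means the operator stopped the server.
            break
        if attempt < max_restarts:
            time.sleep(delay_s)  # short pause so a crash loop does not spin
    return exit_codes


# Hypothetical invocation; the classpath and config path are placeholders:
# supervise(["java", "-cp", "<classpath>",
#            "org.apache.zookeeper.server.quorum.QuorumPeerMain", "conf/zoo.cfg"])
```

The sketch waits on the child process directly rather than polling a pid file, which avoids acting on a stale pid. The restart cap is there so that a server that cannot start at all is eventually surfaced to the operator instead of being restarted forever.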

The inherent behavior of ZK with respect to failures, i.e. that it automatically recovers as long as quorum is
maintained, fits into this nicely.
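
To make "as long as quorum is maintained" concrete, the following small Python sketch works through the arithmetic, assuming the default majority quorum (more than half of the ensemble must be up). The function names are hypothetical and exist only for this example.

```python
def quorum_size(ensemble_size: int) -> int:
    """Smallest number of servers that forms a strict majority."""
    return ensemble_size // 2 + 1


def tolerated_failures(ensemble_size: int) -> int:
    """How many servers can be down while a majority is still up."""
    return ensemble_size - quorum_size(ensemble_size)


for n in (3, 4, 5):
    print(n, quorum_size(n), tolerated_failures(n))
```

Under this assumption a 3-server ensemble keeps serving with one server down and a 5-server ensemble with two down, so a supervisor that restarts a crashed JVM promptly restores that margin before a second failure can cost the ensemble its quorum.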

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (ZOOKEEPER-485) need ops documentation that details supervision of ZK server processes

Posted by "Hadoop QA (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/ZOOKEEPER-485?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12836099#action_12836099 ] 

Hadoop QA commented on ZOOKEEPER-485:
-------------------------------------

+1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12436408/ZOOKEEPER-485.patch
  against trunk revision 912052.

    +1 @author.  The patch does not contain any @author tags.

    +0 tests included.  The patch appears to be a documentation patch that doesn't require tests.

    +1 javadoc.  The javadoc tool did not generate any warning messages.

    +1 javac.  The applied patch does not increase the total number of javac compiler warnings.

    +1 findbugs.  The patch does not introduce any new Findbugs warnings.

    +1 release audit.  The applied patch does not increase the total number of release audit warnings.

    +1 core tests.  The patch passed core unit tests.

    +1 contrib tests.  The patch passed contrib unit tests.

Test results: http://hudson.zones.apache.org/hudson/job/Zookeeper-Patch-h7.grid.sp2.yahoo.net/74/testReport/
Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Zookeeper-Patch-h7.grid.sp2.yahoo.net/74/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Console output: http://hudson.zones.apache.org/hudson/job/Zookeeper-Patch-h7.grid.sp2.yahoo.net/74/console

This message is automatically generated.




[jira] Updated: (ZOOKEEPER-485) need ops documentation that details supervision of ZK server processes

Posted by "Patrick Hunt (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/ZOOKEEPER-485?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Patrick Hunt updated ZOOKEEPER-485:
-----------------------------------

    Assignee: Patrick Hunt
      Status: Patch Available  (was: Open)




[jira] Updated: (ZOOKEEPER-485) need ops documentation that details supervision of ZK server processes

Posted by "Patrick Hunt (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/ZOOKEEPER-485?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Patrick Hunt updated ZOOKEEPER-485:
-----------------------------------

    Fix Version/s:     (was: 3.2.1)




[jira] Updated: (ZOOKEEPER-485) need ops documentation that details supervision of ZK server processes

Posted by "Patrick Hunt (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/ZOOKEEPER-485?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Patrick Hunt updated ZOOKEEPER-485:
-----------------------------------

    Attachment: ZOOKEEPER-485.patch

This patch details having a supervisory process (and also fills out the monitoring section).




[jira] Commented: (ZOOKEEPER-485) need ops documentation that details supervision of ZK server processes

Posted by "Hudson (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/ZOOKEEPER-485?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12838330#action_12838330 ] 

Hudson commented on ZOOKEEPER-485:
----------------------------------

Integrated in ZooKeeper-trunk #706 (See [http://hudson.zones.apache.org/hudson/job/ZooKeeper-trunk/706/])
    ZOOKEEPER-485. Need ops documentation that details supervision of ZK server processes. (phunt via mahadev)





[jira] Commented: (ZOOKEEPER-485) need ops documentation that details supervision of ZK server processes

Posted by "Brett Eisenberg (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/ZOOKEEPER-485?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12734911#action_12734911 ] 

Brett Eisenberg commented on ZOOKEEPER-485:
-------------------------------------------

FWIW, ZooKeeper works great under SMF (http://en.wikipedia.org/wiki/Service_Management_Facility).




[jira] Commented: (ZOOKEEPER-485) need ops documentation that details supervision of ZK server processes

Posted by "Patrick Hunt (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/ZOOKEEPER-485?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12734915#action_12734915 ] 

Patrick Hunt commented on ZOOKEEPER-485:
----------------------------------------

Brett -- nice, we'll try to include that in the writeup. Do you have anything we could use as an example of how to run ZK under SMF?




[jira] Updated: (ZOOKEEPER-485) need ops documentation that details supervision of ZK server processes

Posted by "Mahadev konar (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/ZOOKEEPER-485?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Mahadev konar updated ZOOKEEPER-485:
------------------------------------

      Resolution: Fixed
    Hadoop Flags: [Reviewed]
          Status: Resolved  (was: Patch Available)

+1, the patch looks good.

I just committed this. Thanks, Pat.

