Posted to dev@zookeeper.apache.org by "Patrick Hunt (JIRA)" <ji...@apache.org> on 2010/02/20 01:32:27 UTC

[jira] Updated: (ZOOKEEPER-485) need ops documentation that details supervision of ZK server processes

     [ https://issues.apache.org/jira/browse/ZOOKEEPER-485?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Patrick Hunt updated ZOOKEEPER-485:
-----------------------------------

    Attachment: ZOOKEEPER-485.patch

This patch documents running the ZK server under a supervisory process; it also fills out the monitoring section.

> need ops documentation that details supervision of ZK server processes
> ----------------------------------------------------------------------
>
>                 Key: ZOOKEEPER-485
>                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-485
>             Project: Zookeeper
>          Issue Type: Bug
>          Components: documentation, server
>            Reporter: Patrick Hunt
>             Fix For: 3.3.0
>
>         Attachments: ZOOKEEPER-485.patch
>
>
> We need ops documentation detailing what to do if the ZK server VM fails. By "fails" I mean the JVM process
> exits, dies, crashes, etc.
> In general, a supervisor process should be used to start, stop, and restart the ZK server JVM.
> Something like daemontools (http://cr.yp.to/daemontools.html) could be used, or, more simply, a wrapper script
> could monitor the status of the pid and restart the JVM if it fails. This is up to the operator: if restarts are not
> automated, the operator will have to restart the ZK server JVM manually.
> The inherent behavior of ZK with respect to failures (it recovers automatically as long as quorum is maintained)
> fits into this nicely.
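
For illustration only, the "wrapper script" option described above might look roughly like the sketch below. It is not taken from the attached patch. The function name `supervise`, its parameters, and the backoff value are all hypothetical, and it assumes the ZK server can be launched as a foreground child process so that the wrapper can wait on it directly.

```python
import subprocess
import sys
import time


def supervise(cmd, max_restarts=None, backoff_seconds=5.0, sleep=time.sleep):
    """Run cmd as a child process and restart it whenever it exits.

    Hypothetical helper, not part of ZooKeeper. Returns the list of exit
    codes seen once max_restarts restarts have been used up; with
    max_restarts=None it restarts forever and never returns.
    """
    exit_codes = []
    restarts = 0
    while True:
        child = subprocess.Popen(cmd)
        # Block until the child (the JVM, in the ZK case) exits or dies.
        exit_codes.append(child.wait())
        print(f"pid {child.pid} exited with status {exit_codes[-1]}",
              file=sys.stderr)
        if max_restarts is not None and restarts >= max_restarts:
            return exit_codes
        restarts += 1
        # Pause before restarting to avoid a tight crash loop.
        sleep(backoff_seconds)


# Example call (assumed command line; substitute whatever starts the
# ZK server JVM in the foreground on your installation):
# supervise(["java", "-cp", "...", "<server main class>", "zoo.cfg"])
```

A dedicated supervisor such as daemontools covers the same ground with more care (logging, signal handling, clean shutdown), so a sketch like this is only a starting point. Either way, a restarted server rejoins the ensemble on its own as long as quorum was maintained.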

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.