Posted to dev@zookeeper.apache.org by "Will Johnson (JIRA)" <ji...@apache.org> on 2012/07/05 19:34:36 UTC

[jira] [Created] (ZOOKEEPER-1502) Prevent multiple zookeeper servers from using the same data directory

Will Johnson created ZOOKEEPER-1502:
---------------------------------------

             Summary: Prevent multiple zookeeper servers from using the same data directory
                 Key: ZOOKEEPER-1502
                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1502
             Project: ZooKeeper
          Issue Type: Improvement
          Components: server
    Affects Versions: 3.4.3
            Reporter: Will Johnson


We recently ran into an issue where two ZooKeeper servers that were part of two separate quorums were configured to use the same data directory.  Interestingly, the servers did not complain and both seemed to work fine until one of them was restarted.  Once that happened, all sorts of chaos ensued.  I understand that this is a misconfiguration, but should ZooKeeper complain about it, or do users need to protect themselves in some external fashion?  Is a simple file lock enough, or are there other things I should take into consideration if it’s up to me to handle?

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

       

[jira] [Commented] (ZOOKEEPER-1502) Prevent multiple zookeeper servers from using the same data directory

Posted by "Patrick Hunt (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/ZOOKEEPER-1502?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13426244#comment-13426244 ] 

Patrick Hunt commented on ZOOKEEPER-1502:
-----------------------------------------

Thanks, Jacob. So Java's FileLock would suffice here, then?
                

[jira] [Commented] (ZOOKEEPER-1502) Prevent multiple zookeeper servers from using the same data directory

Posted by "Patrick Hunt (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/ZOOKEEPER-1502?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13426190#comment-13426190 ] 

Patrick Hunt commented on ZOOKEEPER-1502:
-----------------------------------------

Sounds to me like a lock file is the way to go. However, handling the corner cases this brings out might be tricky. In particular, we should log this issue (dir already locked) very clearly and exit the JVM. Honestly, it seems to me that this will raise more issues than it will address: I've never heard of anyone experiencing this issue before, whereas failing to remove the lock file, and then not being able to start the server, seems like it will be much more common. I'd look at how other systems handle this.
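
As a rough illustration of the plain lock-file approach and its corner case, here is a minimal Java sketch. The class name `SimpleLockFile` and the file name `zookeeper.lock` are assumptions made up for this example; nothing here is taken from the ZooKeeper code base.

```java
import java.io.IOException;
import java.nio.file.FileAlreadyExistsException;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical helper for illustration only; not part of ZooKeeper.
final class SimpleLockFile {
    private static final String LOCK_NAME = "zookeeper.lock"; // assumed name

    // Atomically creates the lock file. Returns false if it already exists,
    // which means either another server is running or a stale file was left behind.
    static boolean tryCreate(Path dataDir) throws IOException {
        try {
            Files.createFile(dataDir.resolve(LOCK_NAME));
            return true;
        } catch (FileAlreadyExistsException e) {
            return false;
        }
    }

    // Removes the lock file on clean shutdown. A crash skips this step.
    static void release(Path dataDir) throws IOException {
        Files.deleteIfExists(dataDir.resolve(LOCK_NAME));
    }
}
```

A caller would log a clear error and call `System.exit(1)` when `tryCreate` returns false. Because the file survives a crash or a `kill -9`, the next start would be refused until someone deletes it by hand, which is the failure mode described above.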
                

[jira] [Commented] (ZOOKEEPER-1502) Prevent multiple zookeeper servers from using the same data directory

Posted by "Ted Dunning (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/ZOOKEEPER-1502?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13426248#comment-13426248 ] 

Ted Dunning commented on ZOOKEEPER-1502:
----------------------------------------

Another, more robust option is to put the PID of the ZK process into the lock file.  If that process doesn't exist or isn't a ZK process, then the lock is an orphan and can be removed.  Touching the file every minute or so also makes identification of an orphan very easy.

Other systems that use a similar approach include MySQL, Solr, and MongoDB.
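
The PID check could be sketched in Java roughly as follows. This assumes Java 11 or later (for `ProcessHandle` and `Files.readString`), and the class name `PidLockFile` is hypothetical. The sketch only tests whether the recorded process is alive; it does not verify that the process is actually a ZK server, so a reused PID would be mistaken for a live owner.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.NoSuchFileException;
import java.nio.file.Path;

// Hypothetical helper for illustration only; not part of ZooKeeper.
final class PidLockFile {
    // Records the PID of the current JVM in the lock file.
    static void write(Path lockFile) throws IOException {
        Files.writeString(lockFile, Long.toString(ProcessHandle.current().pid()));
    }

    // Returns true when the lock file names a process that is no longer running.
    static boolean isOrphan(Path lockFile) throws IOException {
        String text;
        try {
            text = Files.readString(lockFile).trim();
        } catch (NoSuchFileException e) {
            return false; // no lock file, so nothing to clean up
        }
        long pid;
        try {
            pid = Long.parseLong(text);
        } catch (NumberFormatException e) {
            // Assumption: unreadable content is left for an operator to inspect
            // rather than being treated as stale.
            return false;
        }
        // An empty Optional means no such process exists.
        return ProcessHandle.of(pid).map(h -> !h.isAlive()).orElse(true);
    }
}
```

A server following this scheme would call `isOrphan` at startup, delete the file if it returns true, and then write its own PID. The periodic touch mentioned above is not shown.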
                

[jira] [Commented] (ZOOKEEPER-1502) Prevent multiple zookeeper servers from using the same data directory

Posted by "Jacob Mandelson (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/ZOOKEEPER-1502?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13426219#comment-13426219 ] 

Jacob Mandelson commented on ZOOKEEPER-1502:
--------------------------------------------

On systems which use lockfiles, it's common to lockf the lockfile, so that an orphaned lockfile can be easily identified by being lockf-able.  Though this can break on remote filesystems, it's really robust on local disk.
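
In Java, the closest analogue to lockf-ing a lockfile is probably `java.nio.channels.FileLock`. The sketch below is one way this might look; the class name `DataDirLock` and the file name `zookeeper.lock` are assumptions for illustration, not ZooKeeper code. Because the operating system drops the lock when the owning process dies, a leftover file is harmless: it can simply be locked again.

```java
import java.io.IOException;
import java.nio.channels.FileChannel;
import java.nio.channels.FileLock;
import java.nio.channels.OverlappingFileLockException;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Hypothetical helper for illustration only; not part of ZooKeeper.
final class DataDirLock implements AutoCloseable {
    private final FileChannel channel;
    private final FileLock lock;

    private DataDirLock(FileChannel channel, FileLock lock) {
        this.channel = channel;
        this.lock = lock;
    }

    // Returns a held lock, or null if the directory is already locked.
    static DataDirLock tryAcquire(Path dataDir) throws IOException {
        Path lockFile = dataDir.resolve("zookeeper.lock"); // assumed name
        FileChannel ch = FileChannel.open(lockFile,
                StandardOpenOption.CREATE, StandardOpenOption.WRITE);
        try {
            FileLock l = ch.tryLock(); // null when another process holds the lock
            if (l == null) {
                ch.close();
                return null;
            }
            return new DataDirLock(ch, l);
        } catch (OverlappingFileLockException e) {
            // This JVM already holds the lock. Caveat from the FileLock Javadoc:
            // on some systems closing any channel on the file releases all of the
            // JVM's locks on it, so real code should use one channel per file.
            ch.close();
            return null;
        } catch (IOException e) {
            ch.close();
            throw e;
        }
    }

    // Releases the lock; the lock file itself is left in place.
    @Override
    public void close() throws IOException {
        lock.release();
        channel.close();
    }
}
```

A server would call `tryAcquire` once at startup, keep the returned object for its whole lifetime, and log an error and exit if the result is null. As noted above, this kind of lock may not behave reliably on remote filesystems.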

                