Posted to dev@zookeeper.apache.org by "Patrick Hunt (JIRA)" <ji...@apache.org> on 2016/08/24 21:59:22 UTC

[jira] [Updated] (ZOOKEEPER-2528) ZooKeeper cluster can become unavailable due to power failures

     [ https://issues.apache.org/jira/browse/ZOOKEEPER-2528?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Patrick Hunt updated ZOOKEEPER-2528:
------------------------------------
    Priority: Critical  (was: Major)

> ZooKeeper cluster can become unavailable due to power failures
> --------------------------------------------------------------
>
>                 Key: ZOOKEEPER-2528
>                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2528
>             Project: ZooKeeper
>          Issue Type: Bug
>          Components: server
>    Affects Versions: 3.4.8
>         Environment: A normal ZooKeeper cluster of 3 nodes running on 3 Linux machines. 
>            Reporter: Ramnatthan Alagappan
>            Priority: Critical
>
> A ZooKeeper cluster can become unavailable if power failures happen at specific points in time.
> Details:
> I am running a three-node ZooKeeper cluster and perform a simple update from a client machine.
> When I update a value, ZooKeeper may create a new log file (for example, when the current log is full). It first creates the file and then appends header information to the newly created log. The system call sequence looks like this:
> creat(log.200000001)
> append(log.200000001, offset=0,  count=16)
> If a power failure happens just after the creat of the log file but before the append of the header information, the node crashes with an EOF exception. If the same problem occurs on two or more nodes of my three-node cluster, the entire cluster becomes unavailable, because a majority of the servers have crashed.
> Simultaneous power failures across multiple nodes are possible in single-data-center or single-rack deployments.
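
The failure mode described above can be illustrated with a minimal Java sketch. This is not ZooKeeper's actual code: the class `LogHeaderCheck`, its method names, and the temporary file names are hypothetical, and the 16-byte header size is assumed only because the report shows an append with count=16. The sketch shows that a strict read of the header from a zero-length file (creat without the following append) raises an EOFException, and how a tolerant check could detect such a file instead of failing.

```java
import java.io.DataInputStream;
import java.io.EOFException;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

public class LogHeaderCheck {
    // Assumption: the header is 16 bytes, matching append(..., count=16) in the report.
    static final int HEADER_BYTES = 16;

    /** Strict read: throws EOFException if the file is shorter than the header. */
    static byte[] readHeaderStrict(Path log) throws IOException {
        try (DataInputStream in = new DataInputStream(Files.newInputStream(log))) {
            byte[] header = new byte[HEADER_BYTES];
            in.readFully(header); // fails with EOFException on a zero-length file
            return header;
        }
    }

    /** Tolerant check: reports a missing or truncated header instead of failing. */
    static boolean hasCompleteHeader(Path log) throws IOException {
        try {
            readHeaderStrict(log);
            return true;
        } catch (EOFException e) {
            return false;
        }
    }

    public static void main(String[] args) throws IOException {
        // Simulates creat() followed by a power failure before the header append.
        Path empty = Files.createTempFile("log.", ".empty");
        // Simulates a log file whose header was fully written.
        Path full = Files.createTempFile("log.", ".full");
        Files.write(full, new byte[HEADER_BYTES]);

        System.out.println(hasCompleteHeader(empty)); // false
        System.out.println(hasCompleteHeader(full));  // true

        Files.delete(empty);
        Files.delete(full);
    }
}
```

Whether a server should skip, delete, or refuse to start on such a file is a design decision this sketch does not address; it only shows how the header-less file can be told apart from a valid one.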



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)