Posted to dev@zookeeper.apache.org by "Samer Al-Kiswany (JIRA)" <ji...@apache.org> on 2014/08/23 03:45:11 UTC

[jira] [Commented] (ZOOKEEPER-2018) ZooKeeper node fails to boot if writes are reordered

    [ https://issues.apache.org/jira/browse/ZOOKEEPER-2018?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14107794#comment-14107794 ] 

Samer Al-Kiswany commented on ZOOKEEPER-2018:
---------------------------------------------

On a related note, for your information: the first write in the trace (#1) must be atomic; otherwise ZooKeeper will not boot. This should not be an issue on current systems, as the first write is only 12-124 bytes long.

> ZooKeeper node fails to boot if writes are reordered
> ----------------------------------------------------
>
>                 Key: ZOOKEEPER-2018
>                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2018
>             Project: ZooKeeper
>          Issue Type: Bug
>    Affects Versions: 3.4.6
>            Reporter: Samer Al-Kiswany
>
> After studying the steps ZooKeeper takes to update its logs, we found the following bug. It may manifest on file systems with writeback buffering.
> If you run the ZooKeeper client script (zkCli.sh) with the following commands:
> VALUE="8KB value"  # 8KB in size
> create /dir1 $VALUE
> create /dir1/dir2 $VALUE
> the strace generated at the ZooKeeper node is:
> mkdir(v)
> create(v/log)
> append(v/log)
> trunk(v/log)
> …
> fdatasync(v/log)
> write(v/log)    ... #1
> write(v/log)    ... #2
> write(v/log)    ... #3
> fdatasync(v/log)
> The last four calls belong to the second create, the one for /dir1/dir2.
> If the last write (#3) reaches the disk before the second write (#2) and the system crashes before #2 reaches the disk, the ZooKeeper node will not boot.
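
The reordering scenario in the description can be pictured with a small toy model. This is only a sketch: the 4-byte records, the 12-byte "file", and the helper names apply_writes and has_hole are assumptions made up for illustration, and they do not reflect ZooKeeper's actual transaction log format or recovery code. It shows how a crash after in-order persistence leaves a clean prefix, while a crash after #3 is persisted ahead of #2 leaves a gap in the middle of the log.

```python
# Toy model only: record sizes, offsets and helper names are hypothetical
# and are not taken from ZooKeeper's transaction log implementation.

def apply_writes(size, writes, persisted):
    """Return the on-disk image seen after a crash.

    writes:    list of (offset, data) in the order the application issued them.
    persisted: indices of the writes that reached the disk before the crash.
    """
    disk = bytearray(size)  # regions never written read back as zeros
    for i in persisted:
        offset, data = writes[i]
        disk[offset:offset + len(data)] = data
    return bytes(disk)


def has_hole(disk):
    """True if zero bytes are followed by later data, i.e. the log has a gap.

    Assumes, for this toy model, that record payloads contain no zero bytes.
    """
    return b"\x00" in disk.rstrip(b"\x00")


# Three back-to-back writes standing in for #1, #2 and #3 in the trace.
writes = [(0, b"AAAA"), (4, b"BBBB"), (8, b"CCCC")]

# In-order persistence: #1 and #2 reach the disk, then the system crashes.
in_order = apply_writes(12, writes, persisted=[0, 1])

# Reordered persistence: #1 and #3 reach the disk, #2 does not.
reordered = apply_writes(12, writes, persisted=[0, 2])

print(in_order, has_hole(in_order))
print(reordered, has_hole(reordered))
```

In the in-order case a reader scanning from the start sees valid data followed by an empty tail. In the reordered case it hits an empty region and then valid-looking data after it, which is the kind of state the report says prevents the node from booting.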


