You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@zookeeper.apache.org by "Todd Lipcon (Created) (JIRA)" <ji...@apache.org> on 2012/03/19 22:27:38 UTC

[jira] [Created] (ZOOKEEPER-1427) Writing to local files is done non-atomically

Writing to local files is done non-atomically
---------------------------------------------

                 Key: ZOOKEEPER-1427
                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1427
             Project: ZooKeeper
          Issue Type: Bug
          Components: server
    Affects Versions: 3.4.3
            Reporter: Todd Lipcon


Currently, the writeLongToFile() function opens the file for truncate, writes the new data, syncs, and then closes. If the process crashes after opening the file but before writing the new data, the file may be left empty, causing ZK to "forget" an earlier promise. Instead, it should use RandomAccessFile to avoid truncating.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (ZOOKEEPER-1427) Writing to local files is done non-atomically

Posted by "Rakesh R (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/ZOOKEEPER-1427?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13404488#comment-13404488 ] 

Rakesh R commented on ZOOKEEPER-1427:
-------------------------------------

Its great. Thanks Todd, Pat

I also come across with the same situation and currentEpoch is coming as empty.

{noformat}
2012-06-28 14:23:13,307 [myid:2] - ERROR [main:QuorumPeer@453] - Unable to load database on disk
java.io.IOException: Found null in ../dataDir/version-2/currentEpoch
	at org.apache.zookeeper.server.quorum.QuorumPeer.readLongFromFile(QuorumPeer.java:1080)
{noformat}

After analysing found the reason for empty file is due to 'no disk space'.

The patch is really good and proven logic from hdfs. But I'm having one small clarification. Is there any case in zookeeper to look at the '.tmp' file while reading back?

Say, (Current/accepted)epoch is 3 and the new epoch 4 has updated into '.tmp' file. Unfortunately after updating the '.tmp' file, zk got killed. During next startup whether zk server needs to look at the '.tmp' file as it is having the highest number than the actual epoch?

-Rakesh
                
> Writing to local files is done non-atomically
> ---------------------------------------------
>
>                 Key: ZOOKEEPER-1427
>                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1427
>             Project: ZooKeeper
>          Issue Type: Bug
>          Components: server
>    Affects Versions: 3.4.3
>            Reporter: Todd Lipcon
>            Assignee: Patrick Hunt
>            Priority: Critical
>             Fix For: 3.4.4, 3.5.0
>
>         Attachments: ZOOKEEPER-1427.patch, ZOOKEEPER-1427.patch, ZOOKEEPER-1427_br34.patch, ZOOKEEPER-1427_br34.patch
>
>
> Currently, the writeLongToFile() function opens the file for truncate, writes the new data, syncs, and then closes. If the process crashes after opening the file but before writing the new data, the file may be left empty, causing ZK to "forget" an earlier promise. Instead, it should use RandomAccessFile to avoid truncating.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (ZOOKEEPER-1427) Writing to local files is done non-atomically

Posted by "Benjamin Reed (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/ZOOKEEPER-1427?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13409579#comment-13409579 ] 

Benjamin Reed commented on ZOOKEEPER-1427:
------------------------------------------

+1 great work. one thing that would be nice, but not needed for correctness and for a later jira, is to use a shutdown hook to cleanup the tmp files.
                
> Writing to local files is done non-atomically
> ---------------------------------------------
>
>                 Key: ZOOKEEPER-1427
>                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1427
>             Project: ZooKeeper
>          Issue Type: Bug
>          Components: server
>    Affects Versions: 3.4.3
>            Reporter: Todd Lipcon
>            Assignee: Patrick Hunt
>            Priority: Critical
>             Fix For: 3.4.4, 3.5.0
>
>         Attachments: ZOOKEEPER-1427.patch, ZOOKEEPER-1427.patch, ZOOKEEPER-1427.patch, ZOOKEEPER-1427_br34.patch, ZOOKEEPER-1427_br34.patch, ZOOKEEPER-1427_br34.patch
>
>
> Currently, the writeLongToFile() function opens the file for truncate, writes the new data, syncs, and then closes. If the process crashes after opening the file but before writing the new data, the file may be left empty, causing ZK to "forget" an earlier promise. Instead, it should use RandomAccessFile to avoid truncating.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (ZOOKEEPER-1427) Writing to local files is done non-atomically

Posted by "Patrick Hunt (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/ZOOKEEPER-1427?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13404108#comment-13404108 ] 

Patrick Hunt commented on ZOOKEEPER-1427:
-----------------------------------------

Comment from the ML that I believe is related to this issue:

{noformat}
I am currently facing an issue with zookeeper restart.  Version used is
3.4.3

Following is the scenario.

While setting the currentEpoch or acceptedEpoch to file, the zookeeper
process was abruptly killed.

This lead to an empty file being left under the dataDir (acceptedEpoch or
currentEpoch).

On zookeeper restart it fails indicating that null was found in the file.
{noformat}

                
> Writing to local files is done non-atomically
> ---------------------------------------------
>
>                 Key: ZOOKEEPER-1427
>                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1427
>             Project: ZooKeeper
>          Issue Type: Bug
>          Components: server
>    Affects Versions: 3.4.3
>            Reporter: Todd Lipcon
>            Priority: Critical
>             Fix For: 3.4.4, 3.5.0
>
>
> Currently, the writeLongToFile() function opens the file for truncate, writes the new data, syncs, and then closes. If the process crashes after opening the file but before writing the new data, the file may be left empty, causing ZK to "forget" an earlier promise. Instead, it should use RandomAccessFile to avoid truncating.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Updated] (ZOOKEEPER-1427) Writing to local files is done non-atomically

Posted by "Patrick Hunt (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/ZOOKEEPER-1427?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Patrick Hunt updated ZOOKEEPER-1427:
------------------------------------

    Attachment: ZOOKEEPER-1427.patch
    
> Writing to local files is done non-atomically
> ---------------------------------------------
>
>                 Key: ZOOKEEPER-1427
>                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1427
>             Project: ZooKeeper
>          Issue Type: Bug
>          Components: server
>    Affects Versions: 3.4.3
>            Reporter: Todd Lipcon
>            Assignee: Patrick Hunt
>            Priority: Critical
>             Fix For: 3.4.4, 3.5.0
>
>         Attachments: ZOOKEEPER-1427.patch, ZOOKEEPER-1427_br34.patch
>
>
> Currently, the writeLongToFile() function opens the file for truncate, writes the new data, syncs, and then closes. If the process crashes after opening the file but before writing the new data, the file may be left empty, causing ZK to "forget" an earlier promise. Instead, it should use RandomAccessFile to avoid truncating.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (ZOOKEEPER-1427) Writing to local files is done non-atomically

Posted by "Hudson (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/ZOOKEEPER-1427?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13416994#comment-13416994 ] 

Hudson commented on ZOOKEEPER-1427:
-----------------------------------

Integrated in ZooKeeper-trunk #1617 (See [https://builds.apache.org/job/ZooKeeper-trunk/1617/])
    ZOOKEEPER-1427. Writing to local files is done non-atomically (phunt) (Revision 1362656)

     Result = SUCCESS
phunt : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1362656
Files : 
* /zookeeper/trunk/CHANGES.txt
* /zookeeper/trunk/src/java/main/org/apache/zookeeper/common/AtomicFileOutputStream.java
* /zookeeper/trunk/src/java/main/org/apache/zookeeper/common/IOUtils.java
* /zookeeper/trunk/src/java/main/org/apache/zookeeper/server/quorum/QuorumPeer.java
* /zookeeper/trunk/src/java/test/config/findbugsExcludeFile.xml
* /zookeeper/trunk/src/java/test/org/apache/zookeeper/test/AtomicFileOutputStreamTest.java
* /zookeeper/trunk/src/java/test/org/apache/zookeeper/test/ClientBase.java

                
> Writing to local files is done non-atomically
> ---------------------------------------------
>
>                 Key: ZOOKEEPER-1427
>                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1427
>             Project: ZooKeeper
>          Issue Type: Bug
>          Components: server
>    Affects Versions: 3.4.3
>            Reporter: Todd Lipcon
>            Assignee: Patrick Hunt
>            Priority: Critical
>             Fix For: 3.4.4, 3.5.0
>
>         Attachments: ZOOKEEPER-1427.patch, ZOOKEEPER-1427.patch, ZOOKEEPER-1427.patch, ZOOKEEPER-1427_br34.patch, ZOOKEEPER-1427_br34.patch, ZOOKEEPER-1427_br34.patch
>
>
> Currently, the writeLongToFile() function opens the file for truncate, writes the new data, syncs, and then closes. If the process crashes after opening the file but before writing the new data, the file may be left empty, causing ZK to "forget" an earlier promise. Instead, it should use RandomAccessFile to avoid truncating.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Updated] (ZOOKEEPER-1427) Writing to local files is done non-atomically

Posted by "Patrick Hunt (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/ZOOKEEPER-1427?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Patrick Hunt updated ZOOKEEPER-1427:
------------------------------------

    Attachment: ZOOKEEPER-1427_br34.patch
    
> Writing to local files is done non-atomically
> ---------------------------------------------
>
>                 Key: ZOOKEEPER-1427
>                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1427
>             Project: ZooKeeper
>          Issue Type: Bug
>          Components: server
>    Affects Versions: 3.4.3
>            Reporter: Todd Lipcon
>            Assignee: Patrick Hunt
>            Priority: Critical
>             Fix For: 3.4.4, 3.5.0
>
>         Attachments: ZOOKEEPER-1427.patch, ZOOKEEPER-1427_br34.patch
>
>
> Currently, the writeLongToFile() function opens the file for truncate, writes the new data, syncs, and then closes. If the process crashes after opening the file but before writing the new data, the file may be left empty, causing ZK to "forget" an earlier promise. Instead, it should use RandomAccessFile to avoid truncating.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (ZOOKEEPER-1427) Writing to local files is done non-atomically

Posted by "Flavio Junqueira (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/ZOOKEEPER-1427?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13408136#comment-13408136 ] 

Flavio Junqueira commented on ZOOKEEPER-1427:
---------------------------------------------

Just to add a few more comments of top of Pat's. We assume that a server logs to disk before it acks, so if an entry is written partially, then we can safely discard it. We use crcs to check the integrity of entries. 

The same doesn't hold for the epochs. If the files storing them get corrupted, then we might violate a promise of the server that's critical to guarantee the correctness of the replication protocol. 
                
> Writing to local files is done non-atomically
> ---------------------------------------------
>
>                 Key: ZOOKEEPER-1427
>                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1427
>             Project: ZooKeeper
>          Issue Type: Bug
>          Components: server
>    Affects Versions: 3.4.3
>            Reporter: Todd Lipcon
>            Assignee: Patrick Hunt
>            Priority: Critical
>             Fix For: 3.4.4, 3.5.0
>
>         Attachments: ZOOKEEPER-1427.patch, ZOOKEEPER-1427.patch, ZOOKEEPER-1427.patch, ZOOKEEPER-1427_br34.patch, ZOOKEEPER-1427_br34.patch, ZOOKEEPER-1427_br34.patch
>
>
> Currently, the writeLongToFile() function opens the file for truncate, writes the new data, syncs, and then closes. If the process crashes after opening the file but before writing the new data, the file may be left empty, causing ZK to "forget" an earlier promise. Instead, it should use RandomAccessFile to avoid truncating.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (ZOOKEEPER-1427) Writing to local files is done non-atomically

Posted by "Patrick Hunt (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/ZOOKEEPER-1427?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13404131#comment-13404131 ] 

Patrick Hunt commented on ZOOKEEPER-1427:
-----------------------------------------

FYI, I was writing my most recent comment while yours was in-flight. :-)

Seems we are thinking the same thing.
                
> Writing to local files is done non-atomically
> ---------------------------------------------
>
>                 Key: ZOOKEEPER-1427
>                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1427
>             Project: ZooKeeper
>          Issue Type: Bug
>          Components: server
>    Affects Versions: 3.4.3
>            Reporter: Todd Lipcon
>            Assignee: Patrick Hunt
>            Priority: Critical
>             Fix For: 3.4.4, 3.5.0
>
>
> Currently, the writeLongToFile() function opens the file for truncate, writes the new data, syncs, and then closes. If the process crashes after opening the file but before writing the new data, the file may be left empty, causing ZK to "forget" an earlier promise. Instead, it should use RandomAccessFile to avoid truncating.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (ZOOKEEPER-1427) Writing to local files is done non-atomically

Posted by "Patrick Hunt (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/ZOOKEEPER-1427?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13406177#comment-13406177 ] 

Patrick Hunt commented on ZOOKEEPER-1427:
-----------------------------------------

Thanks for the feedback Rakesh/Todd/Flavio.

bq. it might be a good idea to make sure that the server doesn't make further progress if the write disk fail

Using Eclipse I backtracked the calls to the atomic operation. It branches quickly, but from what I can see any exceptions thrown when writing the file are handled properly.

bq. Is there any case in zookeeper to look at the '.tmp' file while reading back?

I think the comments clarified this. But basically the server never tries to read the .tmp file, it will only write that file and eventually do the rename. If the rename occurs we're done, if the rename fails we attempt to remove the .tmp file, regardless we ignore that file outside of the atomicoutputstream named file creation process. (so I think we're good)

bq. You may also want to raise the level of the log message to error in the catch block.

I'll submit an updated patch for this.


                
> Writing to local files is done non-atomically
> ---------------------------------------------
>
>                 Key: ZOOKEEPER-1427
>                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1427
>             Project: ZooKeeper
>          Issue Type: Bug
>          Components: server
>    Affects Versions: 3.4.3
>            Reporter: Todd Lipcon
>            Assignee: Patrick Hunt
>            Priority: Critical
>             Fix For: 3.4.4, 3.5.0
>
>         Attachments: ZOOKEEPER-1427.patch, ZOOKEEPER-1427.patch, ZOOKEEPER-1427_br34.patch, ZOOKEEPER-1427_br34.patch
>
>
> Currently, the writeLongToFile() function opens the file for truncate, writes the new data, syncs, and then closes. If the process crashes after opening the file but before writing the new data, the file may be left empty, causing ZK to "forget" an earlier promise. Instead, it should use RandomAccessFile to avoid truncating.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (ZOOKEEPER-1427) Writing to local files is done non-atomically

Posted by "Flavio Junqueira (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/ZOOKEEPER-1427?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13404428#comment-13404428 ] 

Flavio Junqueira commented on ZOOKEEPER-1427:
---------------------------------------------

Great catch, Todd. For the patch Pat is proposing, it might be a good idea to make sure that the server doesn't make further progress if the write disk fail (exception is thrown in writeLongToFile). As was pointed out in this jira, if the server can't fulfill its promise, it shouldn't proceed. You may also want to raise the level of the log message to error in the catch block. 
                
> Writing to local files is done non-atomically
> ---------------------------------------------
>
>                 Key: ZOOKEEPER-1427
>                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1427
>             Project: ZooKeeper
>          Issue Type: Bug
>          Components: server
>    Affects Versions: 3.4.3
>            Reporter: Todd Lipcon
>            Assignee: Patrick Hunt
>            Priority: Critical
>             Fix For: 3.4.4, 3.5.0
>
>         Attachments: ZOOKEEPER-1427.patch, ZOOKEEPER-1427.patch, ZOOKEEPER-1427_br34.patch, ZOOKEEPER-1427_br34.patch
>
>
> Currently, the writeLongToFile() function opens the file for truncate, writes the new data, syncs, and then closes. If the process crashes after opening the file but before writing the new data, the file may be left empty, causing ZK to "forget" an earlier promise. Instead, it should use RandomAccessFile to avoid truncating.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (ZOOKEEPER-1427) Writing to local files is done non-atomically

Posted by "Todd Lipcon (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/ZOOKEEPER-1427?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13405154#comment-13405154 ] 

Todd Lipcon commented on ZOOKEEPER-1427:
----------------------------------------

The patch looks good to me. Thanks for the new tests, I'll try to pull them back into HDFS soon.

bq. Say, (Current/accepted)epoch is 3 and the new epoch 4 has updated into '.tmp' file. Unfortunately after updating the '.tmp' file, zk got killed. During next startup whether zk server needs to look at the '.tmp' file as it is having the highest number than the actual epoch?

I think it's fine -- if it is killed before renaming tmp to the final location, then it also will not have responded to any other process with the "promise". So it's OK to forget the promise -- ie it wasn't successfully made.
                
> Writing to local files is done non-atomically
> ---------------------------------------------
>
>                 Key: ZOOKEEPER-1427
>                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1427
>             Project: ZooKeeper
>          Issue Type: Bug
>          Components: server
>    Affects Versions: 3.4.3
>            Reporter: Todd Lipcon
>            Assignee: Patrick Hunt
>            Priority: Critical
>             Fix For: 3.4.4, 3.5.0
>
>         Attachments: ZOOKEEPER-1427.patch, ZOOKEEPER-1427.patch, ZOOKEEPER-1427_br34.patch, ZOOKEEPER-1427_br34.patch
>
>
> Currently, the writeLongToFile() function opens the file for truncate, writes the new data, syncs, and then closes. If the process crashes after opening the file but before writing the new data, the file may be left empty, causing ZK to "forget" an earlier promise. Instead, it should use RandomAccessFile to avoid truncating.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (ZOOKEEPER-1427) Writing to local files is done non-atomically

Posted by "Patrick Hunt (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/ZOOKEEPER-1427?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13408118#comment-13408118 ] 

Patrick Hunt commented on ZOOKEEPER-1427:
-----------------------------------------

Hi Marshal. That's not the case as both logs and snapshot files are handled in such a way that this mechanism is not needed (and in the case of logs not possible/wanted).
                
> Writing to local files is done non-atomically
> ---------------------------------------------
>
>                 Key: ZOOKEEPER-1427
>                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1427
>             Project: ZooKeeper
>          Issue Type: Bug
>          Components: server
>    Affects Versions: 3.4.3
>            Reporter: Todd Lipcon
>            Assignee: Patrick Hunt
>            Priority: Critical
>             Fix For: 3.4.4, 3.5.0
>
>         Attachments: ZOOKEEPER-1427.patch, ZOOKEEPER-1427.patch, ZOOKEEPER-1427.patch, ZOOKEEPER-1427_br34.patch, ZOOKEEPER-1427_br34.patch, ZOOKEEPER-1427_br34.patch
>
>
> Currently, the writeLongToFile() function opens the file for truncate, writes the new data, syncs, and then closes. If the process crashes after opening the file but before writing the new data, the file may be left empty, causing ZK to "forget" an earlier promise. Instead, it should use RandomAccessFile to avoid truncating.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Updated] (ZOOKEEPER-1427) Writing to local files is done non-atomically

Posted by "Patrick Hunt (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/ZOOKEEPER-1427?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Patrick Hunt updated ZOOKEEPER-1427:
------------------------------------

    Attachment: ZOOKEEPER-1427_br34.patch
    
> Writing to local files is done non-atomically
> ---------------------------------------------
>
>                 Key: ZOOKEEPER-1427
>                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1427
>             Project: ZooKeeper
>          Issue Type: Bug
>          Components: server
>    Affects Versions: 3.4.3
>            Reporter: Todd Lipcon
>            Assignee: Patrick Hunt
>            Priority: Critical
>             Fix For: 3.4.4, 3.5.0
>
>         Attachments: ZOOKEEPER-1427.patch, ZOOKEEPER-1427.patch, ZOOKEEPER-1427_br34.patch, ZOOKEEPER-1427_br34.patch
>
>
> Currently, the writeLongToFile() function opens the file for truncate, writes the new data, syncs, and then closes. If the process crashes after opening the file but before writing the new data, the file may be left empty, causing ZK to "forget" an earlier promise. Instead, it should use RandomAccessFile to avoid truncating.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Updated] (ZOOKEEPER-1427) Writing to local files is done non-atomically

Posted by "Patrick Hunt (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/ZOOKEEPER-1427?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Patrick Hunt updated ZOOKEEPER-1427:
------------------------------------

    Attachment: ZOOKEEPER-1427.patch
    
> Writing to local files is done non-atomically
> ---------------------------------------------
>
>                 Key: ZOOKEEPER-1427
>                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1427
>             Project: ZooKeeper
>          Issue Type: Bug
>          Components: server
>    Affects Versions: 3.4.3
>            Reporter: Todd Lipcon
>            Assignee: Patrick Hunt
>            Priority: Critical
>             Fix For: 3.4.4, 3.5.0
>
>         Attachments: ZOOKEEPER-1427.patch, ZOOKEEPER-1427.patch, ZOOKEEPER-1427.patch, ZOOKEEPER-1427_br34.patch, ZOOKEEPER-1427_br34.patch, ZOOKEEPER-1427_br34.patch
>
>
> Currently, the writeLongToFile() function opens the file for truncate, writes the new data, syncs, and then closes. If the process crashes after opening the file but before writing the new data, the file may be left empty, causing ZK to "forget" an earlier promise. Instead, it should use RandomAccessFile to avoid truncating.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (ZOOKEEPER-1427) Writing to local files is done non-atomically

Posted by "Hadoop QA (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/ZOOKEEPER-1427?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13404329#comment-13404329 ] 

Hadoop QA commented on ZOOKEEPER-1427:
--------------------------------------

-1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12534071/ZOOKEEPER-1427.patch
  against trunk revision 1355541.

    +1 @author.  The patch does not contain any @author tags.

    +1 tests included.  The patch appears to include 5 new or modified tests.

    +1 javadoc.  The javadoc tool did not generate any warning messages.

    +1 javac.  The applied patch does not increase the total number of javac compiler warnings.

    -1 findbugs.  The patch appears to introduce 1 new Findbugs (version 1.3.9) warnings.

    +1 release audit.  The applied patch does not increase the total number of release audit warnings.

    +1 core tests.  The patch passed core unit tests.

    +1 contrib tests.  The patch passed contrib unit tests.

Test results: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/1115//testReport/
Findbugs warnings: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/1115//artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Console output: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/1115//console

This message is automatically generated.
                
> Writing to local files is done non-atomically
> ---------------------------------------------
>
>                 Key: ZOOKEEPER-1427
>                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1427
>             Project: ZooKeeper
>          Issue Type: Bug
>          Components: server
>    Affects Versions: 3.4.3
>            Reporter: Todd Lipcon
>            Assignee: Patrick Hunt
>            Priority: Critical
>             Fix For: 3.4.4, 3.5.0
>
>         Attachments: ZOOKEEPER-1427.patch, ZOOKEEPER-1427_br34.patch
>
>
> Currently, the writeLongToFile() function opens the file for truncate, writes the new data, syncs, and then closes. If the process crashes after opening the file but before writing the new data, the file may be left empty, causing ZK to "forget" an earlier promise. Instead, it should use RandomAccessFile to avoid truncating.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Updated] (ZOOKEEPER-1427) Writing to local files is done non-atomically

Posted by "Patrick Hunt (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/ZOOKEEPER-1427?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Patrick Hunt updated ZOOKEEPER-1427:
------------------------------------

         Priority: Critical  (was: Major)
    Fix Version/s: 3.4.4
    
> Writing to local files is done non-atomically
> ---------------------------------------------
>
>                 Key: ZOOKEEPER-1427
>                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1427
>             Project: ZooKeeper
>          Issue Type: Bug
>          Components: server
>    Affects Versions: 3.4.3
>            Reporter: Todd Lipcon
>            Priority: Critical
>             Fix For: 3.4.4
>
>
> Currently, the writeLongToFile() function opens the file for truncate, writes the new data, syncs, and then closes. If the process crashes after opening the file but before writing the new data, the file may be left empty, causing ZK to "forget" an earlier promise. Instead, it should use RandomAccessFile to avoid truncating.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Assigned] (ZOOKEEPER-1427) Writing to local files is done non-atomically

Posted by "Patrick Hunt (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/ZOOKEEPER-1427?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Patrick Hunt reassigned ZOOKEEPER-1427:
---------------------------------------

    Assignee: Patrick Hunt
    
> Writing to local files is done non-atomically
> ---------------------------------------------
>
>                 Key: ZOOKEEPER-1427
>                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1427
>             Project: ZooKeeper
>          Issue Type: Bug
>          Components: server
>    Affects Versions: 3.4.3
>            Reporter: Todd Lipcon
>            Assignee: Patrick Hunt
>            Priority: Critical
>             Fix For: 3.4.4, 3.5.0
>
>
> Currently, the writeLongToFile() function opens the file for truncate, writes the new data, syncs, and then closes. If the process crashes after opening the file but before writing the new data, the file may be left empty, causing ZK to "forget" an earlier promise. Instead, it should use RandomAccessFile to avoid truncating.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (ZOOKEEPER-1427) Writing to local files is done non-atomically

Posted by "Todd Lipcon (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/ZOOKEEPER-1427?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13404169#comment-13404169 ] 

Todd Lipcon commented on ZOOKEEPER-1427:
----------------------------------------

Yea, I didn't know whether the data was fixed-length (and <512bytes). Write, fsync, and rename is best. In theory we should fsync the directory, too, but I don't know of any way to do it from Java :(
                
> Writing to local files is done non-atomically
> ---------------------------------------------
>
>                 Key: ZOOKEEPER-1427
>                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1427
>             Project: ZooKeeper
>          Issue Type: Bug
>          Components: server
>    Affects Versions: 3.4.3
>            Reporter: Todd Lipcon
>            Assignee: Patrick Hunt
>            Priority: Critical
>             Fix For: 3.4.4, 3.5.0
>
>
> Currently, the writeLongToFile() function opens the file for truncate, writes the new data, syncs, and then closes. If the process crashes after opening the file but before writing the new data, the file may be left empty, causing ZK to "forget" an earlier promise. Instead, it should use RandomAccessFile to avoid truncating.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (ZOOKEEPER-1427) Writing to local files is done non-atomically

Posted by "Camille Fournier (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/ZOOKEEPER-1427?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13408936#comment-13408936 ] 

Camille Fournier commented on ZOOKEEPER-1427:
---------------------------------------------

This patch looks ok to me, +1. Pat do you want to check it in or should I?
                
> Writing to local files is done non-atomically
> ---------------------------------------------
>
>                 Key: ZOOKEEPER-1427
>                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1427
>             Project: ZooKeeper
>          Issue Type: Bug
>          Components: server
>    Affects Versions: 3.4.3
>            Reporter: Todd Lipcon
>            Assignee: Patrick Hunt
>            Priority: Critical
>             Fix For: 3.4.4, 3.5.0
>
>         Attachments: ZOOKEEPER-1427.patch, ZOOKEEPER-1427.patch, ZOOKEEPER-1427.patch, ZOOKEEPER-1427_br34.patch, ZOOKEEPER-1427_br34.patch, ZOOKEEPER-1427_br34.patch
>
>
> Currently, the writeLongToFile() function opens the file for truncate, writes the new data, syncs, and then closes. If the process crashes after opening the file but before writing the new data, the file may be left empty, causing ZK to "forget" an earlier promise. Instead, it should use RandomAccessFile to avoid truncating.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (ZOOKEEPER-1427) Writing to local files is done non-atomically

Posted by "Hadoop QA (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/ZOOKEEPER-1427?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13406738#comment-13406738 ] 

Hadoop QA commented on ZOOKEEPER-1427:
--------------------------------------

+1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12535016/ZOOKEEPER-1427.patch
  against trunk revision 1355657.

    +1 @author.  The patch does not contain any @author tags.

    +1 tests included.  The patch appears to include 8 new or modified tests.

    +1 javadoc.  The javadoc tool did not generate any warning messages.

    +1 javac.  The applied patch does not increase the total number of javac compiler warnings.

    +1 findbugs.  The patch does not introduce any new Findbugs (version 1.3.9) warnings.

    +1 release audit.  The applied patch does not increase the total number of release audit warnings.

    +1 core tests.  The patch passed core unit tests.

    +1 contrib tests.  The patch passed contrib unit tests.

Test results: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/1122//testReport/
Findbugs warnings: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/1122//artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Console output: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/1122//console

This message is automatically generated.
                
> Writing to local files is done non-atomically
> ---------------------------------------------
>
>                 Key: ZOOKEEPER-1427
>                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1427
>             Project: ZooKeeper
>          Issue Type: Bug
>          Components: server
>    Affects Versions: 3.4.3
>            Reporter: Todd Lipcon
>            Assignee: Patrick Hunt
>            Priority: Critical
>             Fix For: 3.4.4, 3.5.0
>
>         Attachments: ZOOKEEPER-1427.patch, ZOOKEEPER-1427.patch, ZOOKEEPER-1427.patch, ZOOKEEPER-1427_br34.patch, ZOOKEEPER-1427_br34.patch, ZOOKEEPER-1427_br34.patch
>
>
> Currently, the writeLongToFile() function opens the file for truncate, writes the new data, syncs, and then closes. If the process crashes after opening the file but before writing the new data, the file may be left empty, causing ZK to "forget" an earlier promise. Instead, it should use RandomAccessFile to avoid truncating.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Updated] (ZOOKEEPER-1427) Writing to local files is done non-atomically

Posted by "Patrick Hunt (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/ZOOKEEPER-1427?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Patrick Hunt updated ZOOKEEPER-1427:
------------------------------------

    Fix Version/s: 3.5.0
    
> Writing to local files is done non-atomically
> ---------------------------------------------
>
>                 Key: ZOOKEEPER-1427
>                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1427
>             Project: ZooKeeper
>          Issue Type: Bug
>          Components: server
>    Affects Versions: 3.4.3
>            Reporter: Todd Lipcon
>            Priority: Critical
>             Fix For: 3.4.4, 3.5.0
>
>
> Currently, the writeLongToFile() function opens the file for truncate, writes the new data, syncs, and then closes. If the process crashes after opening the file but before writing the new data, the file may be left empty, causing ZK to "forget" an earlier promise. Instead, it should use RandomAccessFile to avoid truncating.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Updated] (ZOOKEEPER-1427) Writing to local files is done non-atomically

Posted by "Patrick Hunt (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/ZOOKEEPER-1427?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Patrick Hunt updated ZOOKEEPER-1427:
------------------------------------

    Attachment: ZOOKEEPER-1427_br34.patch
    
> Writing to local files is done non-atomically
> ---------------------------------------------
>
>                 Key: ZOOKEEPER-1427
>                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1427
>             Project: ZooKeeper
>          Issue Type: Bug
>          Components: server
>    Affects Versions: 3.4.3
>            Reporter: Todd Lipcon
>            Assignee: Patrick Hunt
>            Priority: Critical
>             Fix For: 3.4.4, 3.5.0
>
>         Attachments: ZOOKEEPER-1427.patch, ZOOKEEPER-1427.patch, ZOOKEEPER-1427.patch, ZOOKEEPER-1427_br34.patch, ZOOKEEPER-1427_br34.patch, ZOOKEEPER-1427_br34.patch
>
>
> Currently, the writeLongToFile() function opens the file for truncate, writes the new data, syncs, and then closes. If the process crashes after opening the file but before writing the new data, the file may be left empty, causing ZK to "forget" an earlier promise. Instead, it should use RandomAccessFile to avoid truncating.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (ZOOKEEPER-1427) Writing to local files is done non-atomically

Posted by "Flavio Junqueira (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/ZOOKEEPER-1427?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13405283#comment-13405283 ] 

Flavio Junqueira commented on ZOOKEEPER-1427:
---------------------------------------------

bq. Say, (Current/accepted)epoch is 3 and the new epoch 4 has updated into '.tmp' file. Unfortunately after updating the '.tmp' file, zk got killed. During next startup whether zk server needs to look at the '.tmp' file as it is having the highest number than the actual epoch?

Servers are not required to complete the recovery phase of the protocol, they can crash before completing. There is a correctness issue only if a server goes back to an epoch that it promised not to go back to. It is ok to accept an epoch and not participating in it. 
                
> Writing to local files is done non-atomically
> ---------------------------------------------
>
>                 Key: ZOOKEEPER-1427
>                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1427
>             Project: ZooKeeper
>          Issue Type: Bug
>          Components: server
>    Affects Versions: 3.4.3
>            Reporter: Todd Lipcon
>            Assignee: Patrick Hunt
>            Priority: Critical
>             Fix For: 3.4.4, 3.5.0
>
>         Attachments: ZOOKEEPER-1427.patch, ZOOKEEPER-1427.patch, ZOOKEEPER-1427_br34.patch, ZOOKEEPER-1427_br34.patch
>
>
> Currently, the writeLongToFile() function opens the file for truncate, writes the new data, syncs, and then closes. If the process crashes after opening the file but before writing the new data, the file may be left empty, causing ZK to "forget" an earlier promise. Instead, it should use RandomAccessFile to avoid truncating.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (ZOOKEEPER-1427) Writing to local files is done non-atomically

Posted by "Patrick Hunt (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/ZOOKEEPER-1427?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13404109#comment-13404109 ] 

Patrick Hunt commented on ZOOKEEPER-1427:
-----------------------------------------

FYI: Simple workaround is to remove the acceptedEpoch/currentEpoch files and restart the server in question.
                
> Writing to local files is done non-atomically
> ---------------------------------------------
>
>                 Key: ZOOKEEPER-1427
>                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1427
>             Project: ZooKeeper
>          Issue Type: Bug
>          Components: server
>    Affects Versions: 3.4.3
>            Reporter: Todd Lipcon
>            Priority: Critical
>             Fix For: 3.4.4, 3.5.0
>
>
> Currently, the writeLongToFile() function opens the file for truncate, writes the new data, syncs, and then closes. If the process crashes after opening the file but before writing the new data, the file may be left empty, causing ZK to "forget" an earlier promise. Instead, it should use RandomAccessFile to avoid truncating.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Updated] (ZOOKEEPER-1427) Writing to local files is done non-atomically

Posted by "Patrick Hunt (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/ZOOKEEPER-1427?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Patrick Hunt updated ZOOKEEPER-1427:
------------------------------------

    Attachment: ZOOKEEPER-1427.patch
    
> Writing to local files is done non-atomically
> ---------------------------------------------
>
>                 Key: ZOOKEEPER-1427
>                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1427
>             Project: ZooKeeper
>          Issue Type: Bug
>          Components: server
>    Affects Versions: 3.4.3
>            Reporter: Todd Lipcon
>            Assignee: Patrick Hunt
>            Priority: Critical
>             Fix For: 3.4.4, 3.5.0
>
>         Attachments: ZOOKEEPER-1427.patch, ZOOKEEPER-1427.patch, ZOOKEEPER-1427_br34.patch, ZOOKEEPER-1427_br34.patch
>
>
> Currently, the writeLongToFile() function opens the file for truncate, writes the new data, syncs, and then closes. If the process crashes after opening the file but before writing the new data, the file may be left empty, causing ZK to "forget" an earlier promise. Instead, it should use RandomAccessFile to avoid truncating.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (ZOOKEEPER-1427) Writing to local files is done non-atomically

Posted by "Marshall McMullen (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/ZOOKEEPER-1427?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13408102#comment-13408102 ] 

Marshall McMullen commented on ZOOKEEPER-1427:
----------------------------------------------

It seems this only provide atomic write of acceptedEpoch and currentEpoch files. What about the other local files like the log and snapshot files? Seems like they should be written atomically as well.
                
> Writing to local files is done non-atomically
> ---------------------------------------------
>
>                 Key: ZOOKEEPER-1427
>                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1427
>             Project: ZooKeeper
>          Issue Type: Bug
>          Components: server
>    Affects Versions: 3.4.3
>            Reporter: Todd Lipcon
>            Assignee: Patrick Hunt
>            Priority: Critical
>             Fix For: 3.4.4, 3.5.0
>
>         Attachments: ZOOKEEPER-1427.patch, ZOOKEEPER-1427.patch, ZOOKEEPER-1427.patch, ZOOKEEPER-1427_br34.patch, ZOOKEEPER-1427_br34.patch, ZOOKEEPER-1427_br34.patch
>
>
> Currently, the writeLongToFile() function opens the file for truncate, writes the new data, syncs, and then closes. If the process crashes after opening the file but before writing the new data, the file may be left empty, causing ZK to "forget" an earlier promise. Instead, it should use RandomAccessFile to avoid truncating.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (ZOOKEEPER-1427) Writing to local files is done non-atomically

Posted by "Hadoop QA (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/ZOOKEEPER-1427?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13404406#comment-13404406 ] 

Hadoop QA commented on ZOOKEEPER-1427:
--------------------------------------

+1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12534090/ZOOKEEPER-1427.patch
  against trunk revision 1355541.

    +1 @author.  The patch does not contain any @author tags.

    +1 tests included.  The patch appears to include 8 new or modified tests.

    +1 javadoc.  The javadoc tool did not generate any warning messages.

    +1 javac.  The applied patch does not increase the total number of javac compiler warnings.

    +1 findbugs.  The patch does not introduce any new Findbugs (version 1.3.9) warnings.

    +1 release audit.  The applied patch does not increase the total number of release audit warnings.

    +1 core tests.  The patch passed core unit tests.

    +1 contrib tests.  The patch passed contrib unit tests.

Test results: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/1116//testReport/
Findbugs warnings: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/1116//artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Console output: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/1116//console

This message is automatically generated.
                
> Writing to local files is done non-atomically
> ---------------------------------------------
>
>                 Key: ZOOKEEPER-1427
>                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1427
>             Project: ZooKeeper
>          Issue Type: Bug
>          Components: server
>    Affects Versions: 3.4.3
>            Reporter: Todd Lipcon
>            Assignee: Patrick Hunt
>            Priority: Critical
>             Fix For: 3.4.4, 3.5.0
>
>         Attachments: ZOOKEEPER-1427.patch, ZOOKEEPER-1427.patch, ZOOKEEPER-1427_br34.patch, ZOOKEEPER-1427_br34.patch
>
>
> Currently, the writeLongToFile() function opens the file for truncate, writes the new data, syncs, and then closes. If the process crashes after opening the file but before writing the new data, the file may be left empty, causing ZK to "forget" an earlier promise. Instead, it should use RandomAccessFile to avoid truncating.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (ZOOKEEPER-1427) Writing to local files is done non-atomically

Posted by "Patrick Hunt (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/ZOOKEEPER-1427?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13404128#comment-13404128 ] 

Patrick Hunt commented on ZOOKEEPER-1427:
-----------------------------------------

Hi Todd, I don't think the original suggestion will work (use RAF) - the length of the file may change, in which case you would end up with the possibility of "1000" and then "999" resulting in "9990" (in the error case, ie process failure during update).

I think we need to use rename for this, that sound right to you? Create the new file as "foo.tmp" and then rename to "foo" once the file is fully created.
                
> Writing to local files is done non-atomically
> ---------------------------------------------
>
>                 Key: ZOOKEEPER-1427
>                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1427
>             Project: ZooKeeper
>          Issue Type: Bug
>          Components: server
>    Affects Versions: 3.4.3
>            Reporter: Todd Lipcon
>            Assignee: Patrick Hunt
>            Priority: Critical
>             Fix For: 3.4.4, 3.5.0
>
>
> Currently, the writeLongToFile() function opens the file for truncate, writes the new data, syncs, and then closes. If the process crashes after opening the file but before writing the new data, the file may be left empty, causing ZK to "forget" an earlier promise. Instead, it should use RandomAccessFile to avoid truncating.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (ZOOKEEPER-1427) Writing to local files is done non-atomically

Posted by "Patrick Hunt (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/ZOOKEEPER-1427?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13408152#comment-13408152 ] 

Patrick Hunt commented on ZOOKEEPER-1427:
-----------------------------------------

Flavio you bring up a good point. There is no checksum (etc...) on the epoch files themselves. That's really unfortunate/shortsighted. We need to do a better job on this in future (when adding new file/types).

Seems we need to add a new jira for this. In particular I notice that while we are catching the numberformatexception we're not logging it. Might be useful to do that in the read method directly for debugging purposes. Also someone should verify all the call paths are handling the thrown IOException (NFE is converted to IOE). See ZOOKEEPER-1507

                
> Writing to local files is done non-atomically
> ---------------------------------------------
>
>                 Key: ZOOKEEPER-1427
>                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1427
>             Project: ZooKeeper
>          Issue Type: Bug
>          Components: server
>    Affects Versions: 3.4.3
>            Reporter: Todd Lipcon
>            Assignee: Patrick Hunt
>            Priority: Critical
>             Fix For: 3.4.4, 3.5.0
>
>         Attachments: ZOOKEEPER-1427.patch, ZOOKEEPER-1427.patch, ZOOKEEPER-1427.patch, ZOOKEEPER-1427_br34.patch, ZOOKEEPER-1427_br34.patch, ZOOKEEPER-1427_br34.patch
>
>
> Currently, the writeLongToFile() function opens the file for truncate, writes the new data, syncs, and then closes. If the process crashes after opening the file but before writing the new data, the file may be left empty, causing ZK to "forget" an earlier promise. Instead, it should use RandomAccessFile to avoid truncating.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (ZOOKEEPER-1427) Writing to local files is done non-atomically

Posted by "Todd Lipcon (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/ZOOKEEPER-1427?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13404111#comment-13404111 ] 

Todd Lipcon commented on ZOOKEEPER-1427:
----------------------------------------

Alternatively, https://github.com/toddlipcon/hadoop-common/blob/qjm-patchseries/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/util/AtomicFileOutputStream.java might be useful for you.
                
> Writing to local files is done non-atomically
> ---------------------------------------------
>
>                 Key: ZOOKEEPER-1427
>                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1427
>             Project: ZooKeeper
>          Issue Type: Bug
>          Components: server
>    Affects Versions: 3.4.3
>            Reporter: Todd Lipcon
>            Assignee: Patrick Hunt
>            Priority: Critical
>             Fix For: 3.4.4, 3.5.0
>
>
> Currently, the writeLongToFile() function opens the file for truncate, writes the new data, syncs, and then closes. If the process crashes after opening the file but before writing the new data, the file may be left empty, causing ZK to "forget" an earlier promise. Instead, it should use RandomAccessFile to avoid truncating.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira