You are viewing a plain text version of this content. The canonical link for it is here.
Posted to notifications@zookeeper.apache.org by GitBox <gi...@apache.org> on 2020/01/22 22:01:58 UTC

[GitHub] [zookeeper] anmolnar opened a new pull request #1233: ZOOKEEPER-3701 Split brain on log disk full

anmolnar opened a new pull request #1233: ZOOKEEPER-3701 Split brain on log disk full
URL: https://github.com/apache/zookeeper/pull/1233
 
 
   Issue described here:
   https://issues.apache.org/jira/browse/ZOOKEEPER-3701
   
   Proposing a fix with catching `IOException` within the truncate method to properly return with `false` if truncate fails.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

[GitHub] [zookeeper] nkalmar commented on issue #1233: ZOOKEEPER-3701 Split brain on log disk full

Posted by GitBox <gi...@apache.org>.
nkalmar commented on issue #1233: ZOOKEEPER-3701 Split brain on log disk full
URL: https://github.com/apache/zookeeper/pull/1233#issuecomment-579240262
 
 
   retest this please

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

[GitHub] [zookeeper] eolivelli commented on a change in pull request #1233: ZOOKEEPER-3701 Split brain on log disk full

Posted by GitBox <gi...@apache.org>.
eolivelli commented on a change in pull request #1233: ZOOKEEPER-3701 Split brain on log disk full
URL: https://github.com/apache/zookeeper/pull/1233#discussion_r370082101
 
 

 ##########
 File path: zookeeper-server/src/main/java/org/apache/zookeeper/server/persistence/FileSnap.java
 ##########
 @@ -260,6 +260,8 @@ public synchronized void serialize(
                     Util.getZxidFromName(snapShot.getName(), SNAPSHOT_FILE_PREFIX),
                     snapShot.lastModified() / 1000);
             }
+        } else {
+            throw new IllegalStateException("FileSnap has already been closed");
 
 Review comment:
   will existing code handle this RuntimeException properly ?
   what about throwing a IOException ?

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

[GitHub] [zookeeper] eolivelli commented on issue #1233: ZOOKEEPER-3701 Split brain on log disk full

Posted by GitBox <gi...@apache.org>.
eolivelli commented on issue #1233: ZOOKEEPER-3701 Split brain on log disk full
URL: https://github.com/apache/zookeeper/pull/1233#issuecomment-578086478
 
 
   @anmolnar you called for a double check from @hanm @lvfangmin @enixon 
   
   I feel we can pack 3.6.0 and 3.5.7 releases meanwhile, if they have strong concerns we can roll out a new RC.
   
   Or do you think we should wait for them ?
   
   cc @nkalmar 
   

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

[GitHub] [zookeeper] nkalmar commented on issue #1233: ZOOKEEPER-3701 Split brain on log disk full

Posted by GitBox <gi...@apache.org>.
nkalmar commented on issue #1233: ZOOKEEPER-3701 Split brain on log disk full
URL: https://github.com/apache/zookeeper/pull/1233#issuecomment-578704379
 
 
   I don't think there is anything planned for 3.4 branch.
   I can try to push this patch to 3.4 as well, but I'm afraid it will not cherry-pick easily. Especially the tests. So probably better to do a seperate PR, but I haven't tried it yet.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

[GitHub] [zookeeper] anmolnar commented on a change in pull request #1233: ZOOKEEPER-3701 Split brain on log disk full

Posted by GitBox <gi...@apache.org>.
anmolnar commented on a change in pull request #1233: ZOOKEEPER-3701 Split brain on log disk full
URL: https://github.com/apache/zookeeper/pull/1233#discussion_r370159576
 
 

 ##########
 File path: zookeeper-server/src/main/java/org/apache/zookeeper/server/persistence/FileSnap.java
 ##########
 @@ -260,6 +260,8 @@ public synchronized void serialize(
                     Util.getZxidFromName(snapShot.getName(), SNAPSHOT_FILE_PREFIX),
                     snapShot.lastModified() / 1000);
             }
+        } else {
+            throw new IllegalStateException("FileSnap has already been closed");
 
 Review comment:
   @ivankelly @eolivelli This is the serialize() method of FileSnap. Call chain is:
   FileSnap.serialize() <- FileTxnSnapLog.save() <- ZooKeeperServer.takeSnapshot()
   
   `takeSnapshot()` catches IOException and call system exit with error code of `TXNLOG_ERROR_TAKING_SNAPSHOT`. Sounds like this should be the expected behavior.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

[GitHub] [zookeeper] eolivelli commented on issue #1233: ZOOKEEPER-3701 Split brain on log disk full

Posted by GitBox <gi...@apache.org>.
eolivelli commented on issue #1233: ZOOKEEPER-3701 Split brain on log disk full
URL: https://github.com/apache/zookeeper/pull/1233#issuecomment-578697590
 
 
   @ivankelly there is currently no interest in working on 3.4 and we are trying to push users to 3.5
   
   As soon as 3.6 will be out (we are at 2nd rc) 3.4 will be sent to EOL (I hope)
   
   If you have time for one last 3.4 release you can do it I think there is no showstopper

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

[GitHub] [zookeeper] anmolnar commented on a change in pull request #1233: ZOOKEEPER-3701 Split brain on log disk full

Posted by GitBox <gi...@apache.org>.
anmolnar commented on a change in pull request #1233: ZOOKEEPER-3701 Split brain on log disk full
URL: https://github.com/apache/zookeeper/pull/1233#discussion_r370154563
 
 

 ##########
 File path: zookeeper-server/src/main/java/org/apache/zookeeper/server/persistence/FileTxnSnapLog.java
 ##########
 @@ -595,7 +600,9 @@ public void rollLog() throws IOException {
      */
     public void close() throws IOException {
         txnLog.close();
+        txnLog = null;
         snapLog.close();
+        snapLog = null;
 
 Review comment:
   Done.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

[GitHub] [zookeeper] nkalmar commented on issue #1233: ZOOKEEPER-3701 Split brain on log disk full

Posted by GitBox <gi...@apache.org>.
nkalmar commented on issue #1233: ZOOKEEPER-3701 Split brain on log disk full
URL: https://github.com/apache/zookeeper/pull/1233#issuecomment-578684537
 
 
   This is the last blocker for 3.5.7, I'll commit this today. (Both to master, 3.6 and 3.5)

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

[GitHub] [zookeeper] anmolnar commented on issue #1233: ZOOKEEPER-3701 Split brain on log disk full

Posted by GitBox <gi...@apache.org>.
anmolnar commented on issue #1233: ZOOKEEPER-3701 Split brain on log disk full
URL: https://github.com/apache/zookeeper/pull/1233#issuecomment-578746032
 
 
   @nkalmar Please don't push this patch to 3.4 yet. This is not a security fix and I'm not sure if we want to change the behavior around this logic. People should upgrade to 3.5 instead.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

[GitHub] [zookeeper] ivankelly commented on a change in pull request #1233: ZOOKEEPER-3701 Split brain on log disk full

Posted by GitBox <gi...@apache.org>.
ivankelly commented on a change in pull request #1233: ZOOKEEPER-3701 Split brain on log disk full
URL: https://github.com/apache/zookeeper/pull/1233#discussion_r369971700
 
 

 ##########
 File path: zookeeper-server/src/main/java/org/apache/zookeeper/server/persistence/FileTxnSnapLog.java
 ##########
 @@ -499,23 +499,28 @@ public void save(
      * @return true if able to truncate the log, false if not
      * @throws IOException
      */
-    public boolean truncateLog(long zxid) throws IOException {
-        // close the existing txnLog and snapLog
-        close();
-
-        // truncate it
-        FileTxnLog truncLog = new FileTxnLog(dataDir);
-        boolean truncated = truncLog.truncate(zxid);
-        truncLog.close();
-
-        // re-open the txnLog and snapLog
-        // I'd rather just close/reopen this object itself, however that
-        // would have a big impact outside ZKDatabase as there are other
-        // objects holding a reference to this object.
-        txnLog = new FileTxnLog(dataDir);
-        snapLog = new FileSnap(snapDir);
-
-        return truncated;
+    public boolean truncateLog(long zxid) {
+        try {
+            // close the existing txnLog and snapLog
+            close();
+
+            // truncate it
+            FileTxnLog truncLog = new FileTxnLog(dataDir);
+            boolean truncated = truncLog.truncate(zxid);
+            truncLog.close();
+
+            // re-open the txnLog and snapLog
+            // I'd rather just close/reopen this object itself, however that
+            // would have a big impact outside ZKDatabase as there are other
+            // objects holding a reference to this object.
+            txnLog = new FileTxnLog(dataDir);
+            snapLog = new FileSnap(snapDir);
+
+            return truncated;
+        } catch (IOException e) {
 
 Review comment:
   I'm wondering if IOException is sufficient. If you look at 
   https://github.com/apache/zookeeper/blob/c234848caef0cc920f19befd1b0b14251b17da92/zookeeper-server/src/main/java/org/apache/zookeeper/server/quorum/Follower.java#L128
   It's catching an Exception, so basically any exception other than IOException occurring here, could kick us back into the same situation. In fact, even the caller of Follower.followLeader() catches Exception. They're all over QuorumPeer, with "Unexpected exception" log warnings.
   
   I think as part of the same fix, these "Unexpected exception" errors should also take down the process. wdyt?

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

[GitHub] [zookeeper] nkalmar commented on issue #1233: ZOOKEEPER-3701 Split brain on log disk full

Posted by GitBox <gi...@apache.org>.
nkalmar commented on issue #1233: ZOOKEEPER-3701 Split brain on log disk full
URL: https://github.com/apache/zookeeper/pull/1233#issuecomment-579248322
 
 
   @ztzg you have a -1, but your remark (thanks again!) has been fixed, so I'll commit this today to move forward with the 3.5.7 release. If you get the chance please +1, I take it you had no other issues. 

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

[GitHub] [zookeeper] ivankelly commented on a change in pull request #1233: ZOOKEEPER-3701 Split brain on log disk full

Posted by GitBox <gi...@apache.org>.
ivankelly commented on a change in pull request #1233: ZOOKEEPER-3701 Split brain on log disk full
URL: https://github.com/apache/zookeeper/pull/1233#discussion_r369968658
 
 

 ##########
 File path: zookeeper-server/src/test/java/org/apache/zookeeper/server/quorum/LearnerTest.java
 ##########
 @@ -300,4 +307,52 @@ public void syncTest() throws Exception {
         }
     }
 
+    @Test
+    public void truncFailTest() throws Exception {
+        final boolean[] exitProcCalled = {false};
+
+        ServiceUtils.setSystemExitProcedure(new Consumer<Integer>() {
+            @Override
+            public void accept(Integer exitCode) {
+                exitProcCalled[0] = true;
+                assertThat(exitCode, equalTo(ExitCode.QUORUM_PACKET_ERROR.getValue()));
 
 Review comment:
   This assertion will not happen in the main junit thread, so if it asserts false, the behaviour will be unpredictable (the test will probably ignore it). 
   
   Instead of this, I'd replace exitProcCalled with a ```CompletableFuture<Integer>``` and at the end of the  test method do
   ```
   assertThat(exitCode.get(10, TimeUnit.SECONDS), equalTo(ExitCode.QUORUM_PACKET_ERROR.getValue()));
   ```

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

[GitHub] [zookeeper] nkalmar commented on a change in pull request #1233: ZOOKEEPER-3701 Split brain on log disk full

Posted by GitBox <gi...@apache.org>.
nkalmar commented on a change in pull request #1233: ZOOKEEPER-3701 Split brain on log disk full
URL: https://github.com/apache/zookeeper/pull/1233#discussion_r371156681
 
 

 ##########
 File path: zookeeper-server/src/main/java/org/apache/zookeeper/server/persistence/FileTxnSnapLog.java
 ##########
 @@ -499,23 +499,28 @@ public void save(
      * @return true if able to truncate the log, false if not
      * @throws IOException
      */
-    public boolean truncateLog(long zxid) throws IOException {
-        // close the existing txnLog and snapLog
-        close();
-
-        // truncate it
-        FileTxnLog truncLog = new FileTxnLog(dataDir);
-        boolean truncated = truncLog.truncate(zxid);
-        truncLog.close();
-
-        // re-open the txnLog and snapLog
-        // I'd rather just close/reopen this object itself, however that
-        // would have a big impact outside ZKDatabase as there are other
-        // objects holding a reference to this object.
-        txnLog = new FileTxnLog(dataDir);
-        snapLog = new FileSnap(snapDir);
-
-        return truncated;
+    public boolean truncateLog(long zxid) {
+        try {
+            // close the existing txnLog and snapLog
+            close();
+
+            // truncate it
+            FileTxnLog truncLog = new FileTxnLog(dataDir);
+            boolean truncated = truncLog.truncate(zxid);
+            truncLog.close();
+
+            // re-open the txnLog and snapLog
+            // I'd rather just close/reopen this object itself, however that
+            // would have a big impact outside ZKDatabase as there are other
+            // objects holding a reference to this object.
+            txnLog = new FileTxnLog(dataDir);
+            snapLog = new FileSnap(snapDir);
+
+            return truncated;
+        } catch (IOException e) {
 
 Review comment:
   I also think this should be another ticket/improvement to fix

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

[GitHub] [zookeeper] anmolnar commented on a change in pull request #1233: ZOOKEEPER-3701 Split brain on log disk full

Posted by GitBox <gi...@apache.org>.
anmolnar commented on a change in pull request #1233: ZOOKEEPER-3701 Split brain on log disk full
URL: https://github.com/apache/zookeeper/pull/1233#discussion_r370037970
 
 

 ##########
 File path: zookeeper-server/src/main/java/org/apache/zookeeper/server/persistence/FileTxnSnapLog.java
 ##########
 @@ -499,23 +499,28 @@ public void save(
      * @return true if able to truncate the log, false if not
      * @throws IOException
      */
-    public boolean truncateLog(long zxid) throws IOException {
-        // close the existing txnLog and snapLog
-        close();
-
-        // truncate it
-        FileTxnLog truncLog = new FileTxnLog(dataDir);
-        boolean truncated = truncLog.truncate(zxid);
-        truncLog.close();
-
-        // re-open the txnLog and snapLog
-        // I'd rather just close/reopen this object itself, however that
-        // would have a big impact outside ZKDatabase as there are other
-        // objects holding a reference to this object.
-        txnLog = new FileTxnLog(dataDir);
-        snapLog = new FileSnap(snapDir);
-
-        return truncated;
+    public boolean truncateLog(long zxid) {
+        try {
+            // close the existing txnLog and snapLog
+            close();
+
+            // truncate it
+            FileTxnLog truncLog = new FileTxnLog(dataDir);
+            boolean truncated = truncLog.truncate(zxid);
+            truncLog.close();
+
+            // re-open the txnLog and snapLog
+            // I'd rather just close/reopen this object itself, however that
+            // would have a big impact outside ZKDatabase as there are other
+            // objects holding a reference to this object.
+            txnLog = new FileTxnLog(dataDir);
+            snapLog = new FileSnap(snapDir);
+
+            return truncated;
+        } catch (IOException e) {
 
 Review comment:
   I'll check it, but currently I don't see any reason why not change it to `Exception`.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

[GitHub] [zookeeper] anmolnar commented on issue #1233: ZOOKEEPER-3701 Split brain on log disk full

Posted by GitBox <gi...@apache.org>.
anmolnar commented on issue #1233: ZOOKEEPER-3701 Split brain on log disk full
URL: https://github.com/apache/zookeeper/pull/1233#issuecomment-579165712
 
 
   @nkalmar Thanks. I restored the binary file with original content.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

[GitHub] [zookeeper] anmolnar commented on a change in pull request #1233: ZOOKEEPER-3701 Split brain on log disk full

Posted by GitBox <gi...@apache.org>.
anmolnar commented on a change in pull request #1233: ZOOKEEPER-3701 Split brain on log disk full
URL: https://github.com/apache/zookeeper/pull/1233#discussion_r370156303
 
 

 ##########
 File path: zookeeper-server/src/main/java/org/apache/zookeeper/server/persistence/FileTxnSnapLog.java
 ##########
 @@ -499,23 +499,28 @@ public void save(
      * @return true if able to truncate the log, false if not
      * @throws IOException
      */
-    public boolean truncateLog(long zxid) throws IOException {
-        // close the existing txnLog and snapLog
-        close();
-
-        // truncate it
-        FileTxnLog truncLog = new FileTxnLog(dataDir);
-        boolean truncated = truncLog.truncate(zxid);
-        truncLog.close();
-
-        // re-open the txnLog and snapLog
-        // I'd rather just close/reopen this object itself, however that
-        // would have a big impact outside ZKDatabase as there are other
-        // objects holding a reference to this object.
-        txnLog = new FileTxnLog(dataDir);
-        snapLog = new FileSnap(snapDir);
-
-        return truncated;
+    public boolean truncateLog(long zxid) {
+        try {
+            // close the existing txnLog and snapLog
+            close();
+
+            // truncate it
+            FileTxnLog truncLog = new FileTxnLog(dataDir);
+            boolean truncated = truncLog.truncate(zxid);
+            truncLog.close();
 
 Review comment:
   this is done slightly differently. please check.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

[GitHub] [zookeeper] nkalmar commented on issue #1233: ZOOKEEPER-3701 Split brain on log disk full

Posted by GitBox <gi...@apache.org>.
nkalmar commented on issue #1233: ZOOKEEPER-3701 Split brain on log disk full
URL: https://github.com/apache/zookeeper/pull/1233#issuecomment-579712105
 
 
   No worries, thanks @ztzg , 3.5 backport available here thanks to Andor:
   https://github.com/apache/zookeeper/pull/1237

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

[GitHub] [zookeeper] eolivelli commented on a change in pull request #1233: ZOOKEEPER-3701 Split brain on log disk full

Posted by GitBox <gi...@apache.org>.
eolivelli commented on a change in pull request #1233: ZOOKEEPER-3701 Split brain on log disk full
URL: https://github.com/apache/zookeeper/pull/1233#discussion_r370081578
 
 

 ##########
 File path: zookeeper-server/src/main/java/org/apache/zookeeper/server/persistence/FileTxnSnapLog.java
 ##########
 @@ -499,23 +499,28 @@ public void save(
      * @return true if able to truncate the log, false if not
      * @throws IOException
      */
-    public boolean truncateLog(long zxid) throws IOException {
-        // close the existing txnLog and snapLog
-        close();
-
-        // truncate it
-        FileTxnLog truncLog = new FileTxnLog(dataDir);
-        boolean truncated = truncLog.truncate(zxid);
-        truncLog.close();
-
-        // re-open the txnLog and snapLog
-        // I'd rather just close/reopen this object itself, however that
-        // would have a big impact outside ZKDatabase as there are other
-        // objects holding a reference to this object.
-        txnLog = new FileTxnLog(dataDir);
-        snapLog = new FileSnap(snapDir);
-
-        return truncated;
+    public boolean truncateLog(long zxid) {
+        try {
+            // close the existing txnLog and snapLog
+            close();
+
+            // truncate it
+            FileTxnLog truncLog = new FileTxnLog(dataDir);
+            boolean truncated = truncLog.truncate(zxid);
+            truncLog.close();
 
 Review comment:
   what about calling this **close** in a finally block ?
   
   ```
   try { 
       FileTxnLog truncLog = new FileTxnLog(dataDir);
               boolean truncated = truncLog.truncate(zxid);
     } finally {
               truncLog.close();
   }
   ```

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

[GitHub] [zookeeper] ivankelly commented on a change in pull request #1233: ZOOKEEPER-3701 Split brain on log disk full

Posted by GitBox <gi...@apache.org>.
ivankelly commented on a change in pull request #1233: ZOOKEEPER-3701 Split brain on log disk full
URL: https://github.com/apache/zookeeper/pull/1233#discussion_r370162944
 
 

 ##########
 File path: zookeeper-server/src/main/java/org/apache/zookeeper/server/persistence/FileSnap.java
 ##########
 @@ -260,6 +260,8 @@ public synchronized void serialize(
                     Util.getZxidFromName(snapShot.getName(), SNAPSHOT_FILE_PREFIX),
                     snapShot.lastModified() / 1000);
             }
+        } else {
+            throw new IllegalStateException("FileSnap has already been closed");
 
 Review comment:
   Ok, IOException works in that case. 

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

[GitHub] [zookeeper] ivankelly commented on a change in pull request #1233: ZOOKEEPER-3701 Split brain on log disk full

Posted by GitBox <gi...@apache.org>.
ivankelly commented on a change in pull request #1233: ZOOKEEPER-3701 Split brain on log disk full
URL: https://github.com/apache/zookeeper/pull/1233#discussion_r370042831
 
 

 ##########
 File path: zookeeper-server/src/main/java/org/apache/zookeeper/server/persistence/FileTxnSnapLog.java
 ##########
 @@ -499,23 +499,28 @@ public void save(
      * @return true if able to truncate the log, false if not
      * @throws IOException
      */
-    public boolean truncateLog(long zxid) throws IOException {
-        // close the existing txnLog and snapLog
-        close();
-
-        // truncate it
-        FileTxnLog truncLog = new FileTxnLog(dataDir);
-        boolean truncated = truncLog.truncate(zxid);
-        truncLog.close();
-
-        // re-open the txnLog and snapLog
-        // I'd rather just close/reopen this object itself, however that
-        // would have a big impact outside ZKDatabase as there are other
-        // objects holding a reference to this object.
-        txnLog = new FileTxnLog(dataDir);
-        snapLog = new FileSnap(snapDir);
-
-        return truncated;
+    public boolean truncateLog(long zxid) {
+        try {
+            // close the existing txnLog and snapLog
+            close();
+
+            // truncate it
+            FileTxnLog truncLog = new FileTxnLog(dataDir);
+            boolean truncated = truncLog.truncate(zxid);
+            truncLog.close();
+
+            // re-open the txnLog and snapLog
+            // I'd rather just close/reopen this object itself, however that
+            // would have a big impact outside ZKDatabase as there are other
+            // objects holding a reference to this object.
+            txnLog = new FileTxnLog(dataDir);
+            snapLog = new FileSnap(snapDir);
+
+            return truncated;
+        } catch (IOException e) {
 
 Review comment:
   I think it would be better to change the behaviour in QuorumPeer to requestSystemExit any time it gets an unexpected exception. QuorumPeer runs in a loop starting Learners and Leaders etc. There are objects shared between iterations of these loops, and as far I can see little checking that the objects are in a sane state if an exception occurred. Of course, I this is a bigger change, so perhaps it can be done later.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

[GitHub] [zookeeper] nkalmar commented on issue #1233: ZOOKEEPER-3701 Split brain on log disk full

Posted by GitBox <gi...@apache.org>.
nkalmar commented on issue #1233: ZOOKEEPER-3701 Split brain on log disk full
URL: https://github.com/apache/zookeeper/pull/1233#issuecomment-579338714
 
 
   Thanks @anmolnar . Merged to master and 3.6, did not pick to branch-3.5. The conflict was not trivial on the tests, so I aborted the cherry-pick. Needs a seperate PR. I'll take a look tomorrow, or you can also create one for 3.5 Andor.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

[GitHub] [zookeeper] ivankelly commented on a change in pull request #1233: ZOOKEEPER-3701 Split brain on log disk full

Posted by GitBox <gi...@apache.org>.
ivankelly commented on a change in pull request #1233: ZOOKEEPER-3701 Split brain on log disk full
URL: https://github.com/apache/zookeeper/pull/1233#discussion_r370041702
 
 

 ##########
 File path: zookeeper-server/src/test/java/org/apache/zookeeper/server/quorum/LearnerTest.java
 ##########
 @@ -300,4 +307,52 @@ public void syncTest() throws Exception {
         }
     }
 
+    @Test
+    public void truncFailTest() throws Exception {
+        final boolean[] exitProcCalled = {false};
+
+        ServiceUtils.setSystemExitProcedure(new Consumer<Integer>() {
+            @Override
+            public void accept(Integer exitCode) {
+                exitProcCalled[0] = true;
+                assertThat(exitCode, equalTo(ExitCode.QUORUM_PACKET_ERROR.getValue()));
 
 Review comment:
   Ok. I see that now.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

[GitHub] [zookeeper] ivankelly commented on issue #1233: ZOOKEEPER-3701 Split brain on log disk full

Posted by GitBox <gi...@apache.org>.
ivankelly commented on issue #1233: ZOOKEEPER-3701 Split brain on log disk full
URL: https://github.com/apache/zookeeper/pull/1233#issuecomment-578685529
 
 
   Is 3.4 still being maintained? The customer that experienced the issue is on 3.4.13. If there's no plans for another 3.4 release, I'll manually backport internally.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

[GitHub] [zookeeper] asfgit closed pull request #1233: ZOOKEEPER-3701 Split brain on log disk full

Posted by GitBox <gi...@apache.org>.
asfgit closed pull request #1233: ZOOKEEPER-3701 Split brain on log disk full
URL: https://github.com/apache/zookeeper/pull/1233
 
 
   

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

[GitHub] [zookeeper] anmolnar commented on issue #1233: ZOOKEEPER-3701 Split brain on log disk full

Posted by GitBox <gi...@apache.org>.
anmolnar commented on issue #1233: ZOOKEEPER-3701 Split brain on log disk full
URL: https://github.com/apache/zookeeper/pull/1233#issuecomment-577621142
 
 
   @ivankelly I'm concerned about the huge number of tests that we broke with this patch. I'll check and try to fix them in the most harmless way, but I'm afraid we will change the original behavior significantly.
   
   If that's the case we still need to fix this issue, but not sure it can fit in a maintenance release of 3.5.x. We have to keep that branch stable.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

[GitHub] [zookeeper] anmolnar commented on issue #1233: ZOOKEEPER-3701 Split brain on log disk full

Posted by GitBox <gi...@apache.org>.
anmolnar commented on issue #1233: ZOOKEEPER-3701 Split brain on log disk full
URL: https://github.com/apache/zookeeper/pull/1233#issuecomment-577710976
 
 
   A lot better now: only 2 tests failed in the latest build. I'll take a look soon.
   @hanm @lvfangmin @enixon I suspect we need your brains here. PTAL when you get a chance.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

[GitHub] [zookeeper] anmolnar commented on issue #1233: ZOOKEEPER-3701 Split brain on log disk full

Posted by GitBox <gi...@apache.org>.
anmolnar commented on issue #1233: ZOOKEEPER-3701 Split brain on log disk full
URL: https://github.com/apache/zookeeper/pull/1233#issuecomment-578150109
 
 
   @eolivelli I think you're right, let's go forward with this and start the releases.
   I don't want to commit my own patch, would you mind?

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

[GitHub] [zookeeper] nkalmar edited a comment on issue #1233: ZOOKEEPER-3701 Split brain on log disk full

Posted by GitBox <gi...@apache.org>.
nkalmar edited a comment on issue #1233: ZOOKEEPER-3701 Split brain on log disk full
URL: https://github.com/apache/zookeeper/pull/1233#issuecomment-579240262
 
 
   retest this please
   - Was not needed, oh well, we'll get another mvn check :)

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

[GitHub] [zookeeper] eolivelli commented on a change in pull request #1233: ZOOKEEPER-3701 Split brain on log disk full

Posted by GitBox <gi...@apache.org>.
eolivelli commented on a change in pull request #1233: ZOOKEEPER-3701 Split brain on log disk full
URL: https://github.com/apache/zookeeper/pull/1233#discussion_r370082976
 
 

 ##########
 File path: zookeeper-server/src/main/java/org/apache/zookeeper/server/persistence/FileTxnSnapLog.java
 ##########
 @@ -595,7 +600,9 @@ public void rollLog() throws IOException {
      */
     public void close() throws IOException {
         txnLog.close();
+        txnLog = null;
 
 Review comment:
   if you set this variable to null then calling twice this method will result in an ugly NullPointerException
   what about
   ```
   try {
   if (txnLog != null) {
         txnLog.close();
   }
   } finally {
   txnLog = null;
   }
   
   ```

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

[GitHub] [zookeeper] eolivelli commented on a change in pull request #1233: ZOOKEEPER-3701 Split brain on log disk full

Posted by GitBox <gi...@apache.org>.
eolivelli commented on a change in pull request #1233: ZOOKEEPER-3701 Split brain on log disk full
URL: https://github.com/apache/zookeeper/pull/1233#discussion_r370083030
 
 

 ##########
 File path: zookeeper-server/src/main/java/org/apache/zookeeper/server/persistence/FileTxnSnapLog.java
 ##########
 @@ -595,7 +600,9 @@ public void rollLog() throws IOException {
      */
     public void close() throws IOException {
         txnLog.close();
+        txnLog = null;
         snapLog.close();
+        snapLog = null;
 
 Review comment:
   the same here

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

[GitHub] [zookeeper] anmolnar commented on a change in pull request #1233: ZOOKEEPER-3701 Split brain on log disk full

Posted by GitBox <gi...@apache.org>.
anmolnar commented on a change in pull request #1233: ZOOKEEPER-3701 Split brain on log disk full
URL: https://github.com/apache/zookeeper/pull/1233#discussion_r370037499
 
 

 ##########
 File path: zookeeper-server/src/test/java/org/apache/zookeeper/server/quorum/LearnerTest.java
 ##########
 @@ -300,4 +307,52 @@ public void syncTest() throws Exception {
         }
     }
 
+    @Test
+    public void truncFailTest() throws Exception {
+        final boolean[] exitProcCalled = {false};
+
+        ServiceUtils.setSystemExitProcedure(new Consumer<Integer>() {
+            @Override
+            public void accept(Integer exitCode) {
+                exitProcCalled[0] = true;
+                assertThat(exitCode, equalTo(ExitCode.QUORUM_PACKET_ERROR.getValue()));
 
 Review comment:
   I double checked it. Consumer's accept() method is called synchronously, no threading involved. In other words it will be called on junit's main thread.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

[GitHub] [zookeeper] ivankelly commented on issue #1233: ZOOKEEPER-3701 Split brain on log disk full

Posted by GitBox <gi...@apache.org>.
ivankelly commented on issue #1233: ZOOKEEPER-3701 Split brain on log disk full
URL: https://github.com/apache/zookeeper/pull/1233#issuecomment-578719003
 
 
   I think it'll be easier to do it internally. The fix itself is pretty simple. 

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

[GitHub] [zookeeper] anmolnar commented on a change in pull request #1233: ZOOKEEPER-3701 Split brain on log disk full

Posted by GitBox <gi...@apache.org>.
anmolnar commented on a change in pull request #1233: ZOOKEEPER-3701 Split brain on log disk full
URL: https://github.com/apache/zookeeper/pull/1233#discussion_r370154653
 
 

 ##########
 File path: zookeeper-server/src/main/java/org/apache/zookeeper/server/persistence/FileTxnSnapLog.java
 ##########
 @@ -595,7 +600,9 @@ public void rollLog() throws IOException {
      */
     public void close() throws IOException {
         txnLog.close();
+        txnLog = null;
 
 Review comment:
   Done

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

[GitHub] [zookeeper] ivankelly commented on a change in pull request #1233: ZOOKEEPER-3701 Split brain on log disk full

Posted by GitBox <gi...@apache.org>.
ivankelly commented on a change in pull request #1233: ZOOKEEPER-3701 Split brain on log disk full
URL: https://github.com/apache/zookeeper/pull/1233#discussion_r370120580
 
 

 ##########
 File path: zookeeper-server/src/main/java/org/apache/zookeeper/server/persistence/FileSnap.java
 ##########
 @@ -260,6 +260,8 @@ public synchronized void serialize(
                     Util.getZxidFromName(snapShot.getName(), SNAPSHOT_FILE_PREFIX),
                     snapShot.lastModified() / 1000);
             }
+        } else {
+            throw new IllegalStateException("FileSnap has already been closed");
 
 Review comment:
   I dunno if I suggested IllegalStateException originally, but it's what I had in mind when I suggested this should throw an exception. 
   
   If you call close() on FileSnap, and it's already closed, that's a programmer's mistake. There *should not* be any handling for this, because the object is in a state which it should never be. The best thing in this case is for the process to crash (this isn't necessarily the case though, due to the catchall Exception handlers in QuorumPeer).

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services