Posted to dev@zookeeper.apache.org by "Vishal Kathuria (JIRA)" <ji...@apache.org> on 2011/08/18 19:48:29 UTC

[jira] [Created] (ZOOKEEPER-1156) Log truncation truncating log too much - can cause data loss

Log truncation truncating log too much - can cause data loss
------------------------------------------------------------

                 Key: ZOOKEEPER-1156
                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1156
             Project: ZooKeeper
          Issue Type: Bug
          Components: quorum, server
    Affects Versions: 3.3.3
            Reporter: Vishal Kathuria
            Priority: Blocker
             Fix For: 3.3.4


Log truncation relies on calculating the file position of a particular zxid to determine the new size of the log file. A bug in the PositionInputStream implementation makes it skip counting bytes in the log whose value is 0, so the calculated position can be smaller than the actual one. Log records that should be kept can then be truncated, leading to data loss on the participant that executes the TRUNC.

Clients can see different values depending on whether they connect to the node on which the TRUNC was executed.
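
The undercount described above can be reproduced with a small stand-in class. This is only a sketch based on the description in this report, not the actual ZooKeeper source: the class name ZeroSkippingCounter and the helper countedPosition are hypothetical, and the only behaviour taken from the report is that bytes with value 0 are not counted.

```java
import java.io.ByteArrayInputStream;
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;

// Hypothetical stand-in for the counting stream described in the report;
// not the actual ZooKeeper class.
class ZeroSkippingCounter extends FilterInputStream {
    long position = 0;

    ZeroSkippingCounter(InputStream in) {
        super(in);
    }

    @Override
    public int read() throws IOException {
        int rc = super.read();
        // Bug: rc is the byte value (0..255), or -1 at end of stream,
        // so a byte whose value is 0 is returned but never counted.
        if (rc > 0) {
            position++;
        }
        return rc;
    }

    // Drains the data one byte at a time and returns the reported position.
    static long countedPosition(byte[] data) {
        try (ZeroSkippingCounter in =
                 new ZeroSkippingCounter(new ByteArrayInputStream(data))) {
            while (in.read() != -1) {
                // keep reading until end of stream
            }
            return in.position;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

For a six-byte input such as {0, 0, 0, 5, 1, 0}, countedPosition reports 2 rather than 6, because the four zero bytes are skipped. Serialized integers and longs tend to contain many zero bytes, which is why the error can add up in a transaction log.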

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (ZOOKEEPER-1156) Log truncation truncating log too much - can cause data loss

Posted by "Vishal Kathuria (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/ZOOKEEPER-1156?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13087266#comment-13087266 ] 

Vishal Kathuria commented on ZOOKEEPER-1156:
--------------------------------------------

I forgot to mention testing for this fix. The test I am writing for ZOOKEEPER-1154 covers this scenario as well.

> Log truncation truncating log too much - can cause data loss
> ------------------------------------------------------------
>
>                 Key: ZOOKEEPER-1156
>                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1156
>             Project: ZooKeeper
>          Issue Type: Bug
>          Components: quorum, server
>    Affects Versions: 3.3.3
>            Reporter: Vishal Kathuria
>            Priority: Blocker
>             Fix For: 3.3.4
>
>         Attachments: ZOOKEEPER-1156.patch
>
>   Original Estimate: 24h
>  Remaining Estimate: 24h
>
> The log truncation relies on position calculation for a particular zxid to figure out the new size of the log file. There is a bug in PositionInputStream implementation which skips counting the bytes in the log which have value 0. This can lead to underestimating the actual log size. The log records which should be there can get truncated, leading to data loss on the participant which is executing the trunc.
> Clients can see different values depending on whether they connect to the node on which trunc was executed. 


        

[jira] [Commented] (ZOOKEEPER-1156) Log truncation truncating log too much - can cause data loss

Posted by "Hudson (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/ZOOKEEPER-1156?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13096631#comment-13096631 ] 

Hudson commented on ZOOKEEPER-1156:
-----------------------------------

Integrated in ZooKeeper-trunk #1293 (See [https://builds.apache.org/job/ZooKeeper-trunk/1293/])
    ZOOKEEPER-1154, ZOOKEEPER-1156: 
Data inconsistency when the node(s) with the highest zxid is not present at the time of leader election
Log truncation truncating log too much - can cause data loss

Vishal Kathuria via camille

camille : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1164758
Files : 
* /zookeeper/trunk/CHANGES.txt
* /zookeeper/trunk/src/java/main/org/apache/zookeeper/server/persistence/FileTxnLog.java
* /zookeeper/trunk/src/java/main/org/apache/zookeeper/server/quorum/LearnerHandler.java
* /zookeeper/trunk/src/java/test/org/apache/zookeeper/server/quorum/QuorumPeerMainTest.java


> Log truncation truncating log too much - can cause data loss
> ------------------------------------------------------------
>
>                 Key: ZOOKEEPER-1156
>                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1156
>             Project: ZooKeeper
>          Issue Type: Bug
>          Components: quorum, server
>    Affects Versions: 3.3.3
>            Reporter: Vishal Kathuria
>            Assignee: Vishal Kathuria
>            Priority: Blocker
>             Fix For: 3.3.4, 3.4.0
>
>         Attachments: ZOOKEEPER-1156.patch
>
>   Original Estimate: 24h
>  Remaining Estimate: 24h
>
> The log truncation relies on position calculation for a particular zxid to figure out the new size of the log file. There is a bug in PositionInputStream implementation which skips counting the bytes in the log which have value 0. This can lead to underestimating the actual log size. The log records which should be there can get truncated, leading to data loss on the participant which is executing the trunc.
> Clients can see different values depending on whether they connect to the node on which trunc was executed. 


        

[jira] [Commented] (ZOOKEEPER-1156) Log truncation truncating log too much - can cause data loss

Posted by "Vishal Kathuria (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/ZOOKEEPER-1156?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13087178#comment-13087178 ] 

Vishal Kathuria commented on ZOOKEEPER-1156:
--------------------------------------------

Here is the scenario.

Let's say the current leader A is at zxid 80.
A participant B with zxid 81 joins and receives a TRUNC,80 message from the leader.

B then calculates the length of the log up to zxid 80. The actual length is, say, 450, but because of the bug the value it calculates is 420. B truncates the log to size 420.

When loadDatabase is called again, the log is replayed only up to zxid 79, because the log record for zxid 80 is incomplete.

Node B now lacks the change with zxid 80, and the leader will not send change 80 to B either.

In my manual repro, the change with zxid 80 was a create. I could see the created node when connected to A but not when connected to B.
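
The effect of the short truncation on replay can be sketched as follows. The record boundaries here are invented for illustration. Only the lengths 450 and 420 and the zxids 79 to 81 come from the scenario above, and TruncReplaySketch is a hypothetical name, not ZooKeeper code.

```java
// Toy model of replaying a truncated transaction log.
class TruncReplaySketch {
    // Assumed layout: END_OFFSET[i] is the byte offset at which the record
    // for ZXID[i] ends. The offsets are made up for this example.
    static final long[] ZXID = {78, 79, 80, 81};
    static final long[] END_OFFSET = {330, 390, 450, 510};

    // Returns the highest zxid whose record lies entirely within the first
    // fileLength bytes of the log, or -1 if no record is complete.
    static long lastReplayableZxid(long fileLength) {
        long last = -1;
        for (int i = 0; i < ZXID.length; i++) {
            if (END_OFFSET[i] <= fileLength) {
                last = ZXID[i];
            }
        }
        return last;
    }
}
```

With these assumed offsets, lastReplayableZxid(450) returns 80, which is the intended result of TRUNC,80, while lastReplayableZxid(420) returns 79: the record for zxid 80 is cut off partway through, so replay stops one transaction early.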



        

[jira] [Assigned] (ZOOKEEPER-1156) Log truncation truncating log too much - can cause data loss

Posted by "Patrick Hunt (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/ZOOKEEPER-1156?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Patrick Hunt reassigned ZOOKEEPER-1156:
---------------------------------------

    Assignee: Vishal Kathuria


        

[jira] [Commented] (ZOOKEEPER-1156) Log truncation truncating log too much - can cause data loss

Posted by "Hadoop QA (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/ZOOKEEPER-1156?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13087255#comment-13087255 ] 

Hadoop QA commented on ZOOKEEPER-1156:
--------------------------------------

-1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12490846/ZOOKEEPER-1156.patch
  against trunk revision 1157698.

    +1 @author.  The patch does not contain any @author tags.

    -1 tests included.  The patch doesn't appear to include any new or modified tests.
                        Please justify why no new tests are needed for this patch.
                        Also please list what manual steps were performed to verify this patch.

    +1 javadoc.  The javadoc tool did not generate any warning messages.

    +1 javac.  The applied patch does not increase the total number of javac compiler warnings.

    +1 findbugs.  The patch does not introduce any new Findbugs (version 1.3.9) warnings.

    +1 release audit.  The applied patch does not increase the total number of release audit warnings.

    +1 core tests.  The patch passed core unit tests.

    +1 contrib tests.  The patch passed contrib unit tests.

Test results: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/465//testReport/
Findbugs warnings: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/465//artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Console output: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/465//console

This message is automatically generated.


        

[jira] [Updated] (ZOOKEEPER-1156) Log truncation truncating log too much - can cause data loss

Posted by "Patrick Hunt (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/ZOOKEEPER-1156?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Patrick Hunt updated ZOOKEEPER-1156:
------------------------------------

    Fix Version/s: 3.4.0


        

[jira] [Commented] (ZOOKEEPER-1156) Log truncation truncating log too much - can cause data loss

Posted by "Mahadev konar (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/ZOOKEEPER-1156?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13087152#comment-13087152 ] 

Mahadev konar commented on ZOOKEEPER-1156:
------------------------------------------

Vishal,
 I am a little confused. Why is there a correctness issue here? I can see a performance issue but don't see a correctness issue. Can you explain the scenario?




        

[jira] [Updated] (ZOOKEEPER-1156) Log truncation truncating log too much - can cause data loss

Posted by "Vishal Kathuria (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/ZOOKEEPER-1156?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Vishal Kathuria updated ZOOKEEPER-1156:
---------------------------------------

    Attachment: ZOOKEEPER-1156.patch

This patch includes the fix. It makes the following changes to PositionInputStream:

1. In read, use rc > -1 instead of rc > 0 so that bytes with value 0 are not skipped.
2. Override some functions related to marking. The idea is that if the user of this stream calls these functions, we shouldn't silently return an incorrect position.
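
The two changes listed above might look roughly like the sketch below. It is not the attached patch. The class name CountingInputStream, the extra skip override, and the choice to throw UnsupportedOperationException from mark and reset are assumptions made for illustration.

```java
import java.io.ByteArrayInputStream;
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;

// Hypothetical position-tracking stream; a sketch, not the ZooKeeper class.
class CountingInputStream extends FilterInputStream {
    private long position = 0;

    CountingInputStream(InputStream in) {
        super(in);
    }

    long getPosition() {
        return position;
    }

    @Override
    public int read() throws IOException {
        int rc = super.read();
        // Only -1 signals end of stream; 0 is an ordinary data byte.
        if (rc > -1) {
            position++;
        }
        return rc;
    }

    @Override
    public int read(byte[] b, int off, int len) throws IOException {
        int rc = super.read(b, off, len);
        // Here rc is a byte count, so only positive values advance the position.
        if (rc > 0) {
            position += rc;
        }
        return rc;
    }

    @Override
    public long skip(long n) throws IOException {
        long skipped = super.skip(n);
        if (skipped > 0) {
            position += skipped;
        }
        return skipped;
    }

    // Marking would let a caller rewind without the position following,
    // so this sketch refuses it instead of reporting a wrong position.
    @Override
    public boolean markSupported() {
        return false;
    }

    @Override
    public void mark(int readLimit) {
        throw new UnsupportedOperationException("mark");
    }

    @Override
    public void reset() {
        throw new UnsupportedOperationException("reset");
    }

    // Drains the data one byte at a time and returns the reported position.
    static long countedPosition(byte[] data) {
        try (CountingInputStream in =
                 new CountingInputStream(new ByteArrayInputStream(data))) {
            while (in.read() != -1) {
                // keep reading until end of stream
            }
            return in.getPosition();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Note the different meaning of rc in the two read methods: the single-byte read returns a byte value, where 0 is valid data, while the array read returns a count, where comparing against 0 is correct.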


