Posted to dev@zookeeper.apache.org by "Thawan Kooburat (JIRA)" <ji...@apache.org> on 2012/10/01 02:58:07 UTC

[jira] [Created] (ZOOKEEPER-1551) Observer ignore txns that comes after snapshot and UPTODATE

Thawan Kooburat created ZOOKEEPER-1551:
------------------------------------------

             Summary: Observer ignore txns that comes after snapshot and UPTODATE 
                 Key: ZOOKEEPER-1551
                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1551
             Project: ZooKeeper
          Issue Type: Bug
          Components: server
    Affects Versions: 3.4.3
            Reporter: Thawan Kooburat
            Assignee: Thawan Kooburat
            Priority: Critical
             Fix For: 3.5.0


In Learner.java, txns that arrive after the learner has taken the snapshot (after the NEWLEADER packet) are stored in packetsNotCommitted. The follower has special logic to apply these txns at the end of the syncWithLeader() method. However, the observer ignores these txns completely, causing data inconsistency.
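
The behaviour described above can be illustrated with a small toy model. This is only a sketch under stated assumptions, not the actual Learner.java code: the class SyncSketch, its methods, and the drainPending flag are hypothetical names chosen for illustration. Only packetsNotCommitted, NEWLEADER, and UPTODATE come from the report.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model of the sync phase; hypothetical, not the real Learner.java.
final class SyncSketch {
    /** Zxids applied to the in-memory database, in arrival order. */
    final List<Long> applied = new ArrayList<>();
    /** Zxids received after the snapshot; stands in for packetsNotCommitted. */
    final List<Long> packetsNotCommitted = new ArrayList<>();
    boolean snapshotTaken = false;

    /** NEWLEADER: the learner takes its snapshot at this point. */
    void onNewLeader() {
        snapshotTaken = true;
    }

    /** A txn received during sync: applied directly before the snapshot, queued after it. */
    void onTxn(long zxid) {
        if (!snapshotTaken) {
            applied.add(zxid);
        } else {
            packetsNotCommitted.add(zxid);
        }
    }

    /**
     * UPTODATE: end of sync. With drainPending == true the queued txns are
     * applied (the follower behaviour described in the report). With
     * drainPending == false they are dropped (the reported observer behaviour).
     */
    void onUpToDate(boolean drainPending) {
        if (drainPending) {
            applied.addAll(packetsNotCommitted);
        }
        packetsNotCommitted.clear();
    }
}
```

In this model, a learner that receives txn 1, then NEWLEADER, then txn 2, then UPTODATE ends with both txns applied when the queue is drained, and with only txn 1 when it is not.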

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (ZOOKEEPER-1551) Observer ignore txns that comes after snapshot and UPTODATE

Posted by "Thawan Kooburat (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/ZOOKEEPER-1551?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13471706#comment-13471706 ] 

Thawan Kooburat commented on ZOOKEEPER-1551:
--------------------------------------------

I think this patch is critical, and we need to fix this as soon as possible, since data inconsistencies occur quite frequently in our environment because of this bug.

The current patch doesn't break compatibility, so we can commit it. We can then revisit or rewrite this logic as needed to fix ZOOKEEPER-1559.

[jira] [Commented] (ZOOKEEPER-1551) Observer ignore txns that comes after snapshot and UPTODATE

Posted by "Flavio Junqueira (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/ZOOKEEPER-1551?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13467213#comment-13467213 ] 

Flavio Junqueira commented on ZOOKEEPER-1551:
---------------------------------------------

I'd like to ask for a couple of quick clarifications about this patch, if you don't mind:

# Given that in Learner both follower and observer need to commit transactions after the snapshot, do we really need different code for follower and observer? 
# Observers also need to receive outstanding proposals, since commits do not have the proposals themselves, so I was wondering if the last optimization introduced in Leader is actually correct.

Thanks! 
                

[jira] [Updated] (ZOOKEEPER-1551) Observer ignore txns that comes after snapshot and UPTODATE

Posted by "Thawan Kooburat (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/ZOOKEEPER-1551?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Thawan Kooburat updated ZOOKEEPER-1551:
---------------------------------------

    Attachment: ZOOKEEPER-1551.patch

This patch was created using the --ignore-all-space flag, since my editor is configured to trim all whitespace, so the indentation is incorrect in one place.

We found this issue recently, after we enabled the sync request processor in the observer (ZOOKEEPER-1552), and saw that the snapshot was invalid on some machines.

[jira] [Commented] (ZOOKEEPER-1551) Observer ignore txns that comes after snapshot and UPTODATE

Posted by "Flavio Junqueira (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/ZOOKEEPER-1551?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13468819#comment-13468819 ] 

Flavio Junqueira commented on ZOOKEEPER-1551:
---------------------------------------------

Thanks for the clarifications. I actually forgot that we send the transaction with INFORM. 

I was wondering about this part of the code:

{noformat}
+                    if (!snapshotTaken) {
+                        // Apply to db directly if we haven't taken the snapshot
+                        zk.processTxn(packet.hdr, packet.rec);
+                    } else {
+                        packetsNotCommitted.add(packet);
+                        packetsCommitted.add(qp.getZxid());
+                    }
{noformat}

INFORM is supposed to inform an observer of a committed transaction, so why keep them as not committed? I'm getting the impression that this is related to the issue of ZOOKEEPER-1549.
                

[jira] [Commented] (ZOOKEEPER-1551) Observer ignore txns that comes after snapshot and UPTODATE

Posted by "Hadoop QA (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/ZOOKEEPER-1551?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13466613#comment-13466613 ] 

Hadoop QA commented on ZOOKEEPER-1551:
--------------------------------------

+1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12547189/ZOOKEEPER-1551.patch
  against trunk revision 1391526.

    +1 @author.  The patch does not contain any @author tags.

    +1 tests included.  The patch appears to include 3 new or modified tests.

    +1 javadoc.  The javadoc tool did not generate any warning messages.

    +1 javac.  The applied patch does not increase the total number of javac compiler warnings.

    +1 findbugs.  The patch does not introduce any new Findbugs (version 1.3.9) warnings.

    +1 release audit.  The applied patch does not increase the total number of release audit warnings.

    +1 core tests.  The patch passed core unit tests.

    +1 contrib tests.  The patch passed contrib unit tests.

Test results: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/1198//testReport/
Findbugs warnings: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/1198//artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Console output: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/1198//console

This message is automatically generated.
                

[jira] [Commented] (ZOOKEEPER-1551) Observer ignore txns that comes after snapshot and UPTODATE

Posted by "Thawan Kooburat (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/ZOOKEEPER-1551?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13468843#comment-13468843 ] 

Thawan Kooburat commented on ZOOKEEPER-1551:
--------------------------------------------

This is to make ZOOKEEPER-1552 work correctly. Previously, the observer only created a snapshot after sync-up and otherwise never wrote to disk at all.

We need to log INFORM packets that arrive after the snapshot is taken to disk, as the followers do, so that these txns won't be missing on the next restart (if the observer is restarted before the next snapshot).
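
The restart scenario above can be sketched as follows. This is a simplified illustration, not ZooKeeper's actual recovery code: RestartSketch and recover() are hypothetical names, and the state is reduced to a list of zxids. It assumes that recovery loads the latest snapshot and then replays any logged txns newer than that snapshot.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model of restart recovery; hypothetical, not ZooKeeper's real code.
final class RestartSketch {
    /**
     * Rebuilds the state after a restart: start from the zxids contained in
     * the snapshot, then replay every logged zxid that is newer than the
     * last zxid in the snapshot.
     */
    static List<Long> recover(List<Long> snapshot, List<Long> txnLog) {
        List<Long> state = new ArrayList<>(snapshot);
        long lastZxid = snapshot.isEmpty() ? 0L : snapshot.get(snapshot.size() - 1);
        for (long zxid : txnLog) {
            if (zxid > lastZxid) {
                state.add(zxid);
                lastZxid = zxid;
            }
        }
        return state;
    }
}
```

With a snapshot holding txns 1 and 2 and a log holding txns 3 and 4, recovery yields all four txns. With an empty log (an observer that never logged the later INFORM packets), txns 3 and 4 are missing after the restart.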

The patch has 5 lines commented out in the unit test (since they require ZOOKEEPER-1552) that will cover this case.

[jira] [Commented] (ZOOKEEPER-1551) Observer ignore txns that comes after snapshot and UPTODATE

Posted by "Flavio Junqueira (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/ZOOKEEPER-1551?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13471021#comment-13471021 ] 

Flavio Junqueira commented on ZOOKEEPER-1551:
---------------------------------------------

I understand that this patch reflects the follower behavior. However, as per ZOOKEEPER-1549, this behavior is not correct because a snapshot at that point might contain uncommitted state. If we commit this patch, then we will need to fix it in ZOOKEEPER-1559. What do you think?

 
                

[jira] [Commented] (ZOOKEEPER-1551) Observer ignore txns that comes after snapshot and UPTODATE

Posted by "Flavio Junqueira (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/ZOOKEEPER-1551?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13471831#comment-13471831 ] 

Flavio Junqueira commented on ZOOKEEPER-1551:
---------------------------------------------

Sounds fine. I'd like to have a closer look at the tests, but otherwise it looks good to me.

[jira] [Commented] (ZOOKEEPER-1551) Observer ignore txns that comes after snapshot and UPTODATE

Posted by "Thawan Kooburat (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/ZOOKEEPER-1551?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13466611#comment-13466611 ] 

Thawan Kooburat commented on ZOOKEEPER-1551:
--------------------------------------------

The change to Leader.java is just an optimization. I don't think it breaks compatibility.

[jira] [Commented] (ZOOKEEPER-1551) Observer ignore txns that comes after snapshot and UPTODATE

Posted by "Thawan Kooburat (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/ZOOKEEPER-1551?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13467403#comment-13467403 ] 

Thawan Kooburat commented on ZOOKEEPER-1551:
--------------------------------------------

1. If we do a bit of refactoring on ObserverZooKeeperServer, then it should be able to use the same logic. However, this will also require us to keep track of pendingTxns like FollowerZooKeeperServer, which has separate methods for logRequest() and commit().

2. During normal operation, observers receive in-flight txns via INFORM packets, which include the request itself. So the requests from outstanding proposals will eventually reach the observer as INFORM packets after startForwarding() is called. There is no need for the observer to receive these proposals.

I intended to have this patch on top of ZOOKEEPER-1552.
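
The difference between the two message flows in point 2 can be sketched as below. This is a toy model under stated assumptions, not the real server classes: InformSketch, FollowerModel, ObserverModel, and their methods are hypothetical names, and txns are reduced to strings. It only reflects what is stated in this thread: a COMMIT does not carry the proposal, while an INFORM carries the request itself.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy model of the two message flows; hypothetical, not the real server classes.
final class InformSketch {
    /** Follower side: PROPOSAL carries the txn, COMMIT carries only the zxid. */
    static final class FollowerModel {
        final Map<Long, String> pendingTxns = new HashMap<>();
        final List<String> applied = new ArrayList<>();

        void onProposal(long zxid, String txn) {
            pendingTxns.put(zxid, txn);
        }

        void onCommit(long zxid) {
            // The txn body must already be known from an earlier PROPOSAL.
            String txn = pendingTxns.remove(zxid);
            if (txn == null) {
                throw new IllegalStateException("COMMIT without PROPOSAL for zxid " + zxid);
            }
            applied.add(txn);
        }
    }

    /** Observer side: INFORM carries the committed txn itself. */
    static final class ObserverModel {
        final List<String> applied = new ArrayList<>();

        void onInform(long zxid, String txn) {
            // No pending map is needed: the packet is self-contained.
            applied.add(txn);
        }
    }
}
```

In this model the observer reaches the same state as the follower without ever seeing a PROPOSAL, which is the reason given above for not sending outstanding proposals to observers.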