You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@zookeeper.apache.org by Thawan Kooburat <th...@fb.com> on 2013/05/20 07:40:01 UTC

Review Request: ZOOKEEPER-1413: Use on-disk transaction log for learner sync up

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/11231/
-----------------------------------------------------------

Review request for zookeeper.


Description
-------

ZOOKEEPER-1413: Use on-disk transaction log for learner sync up

- Use txnlog for learner synchronization if learner fall too far behind
- Refactoring LearnerHandler to deal with different cases of handling learner synchronization  


This addresses bug https://issues.apache.org/jira/browse/ZOOKEEPER-1413.
    https://issues.apache.org/jira/browse/https://issues.apache.org/jira/browse/ZOOKEEPER-1413


Diffs
-----

  /src/java/main/org/apache/zookeeper/server/TxnLogProposalIterator.java PRE-CREATION 
  /src/java/main/org/apache/zookeeper/server/ZKDatabase.java 1483440 
  /src/java/main/org/apache/zookeeper/server/persistence/FileTxnLog.java 1483440 
  /src/java/main/org/apache/zookeeper/server/persistence/FileTxnSnapLog.java 1483440 
  /src/java/main/org/apache/zookeeper/server/persistence/TxnLog.java 1483440 
  /src/java/main/org/apache/zookeeper/server/quorum/LearnerHandler.java 1483440 
  /src/java/test/org/apache/zookeeper/server/quorum/LearnerHandlerTest.java PRE-CREATION 
  /src/java/test/org/apache/zookeeper/test/FollowerResyncConcurrencyTest.java 1483440 
  /src/java/test/org/apache/zookeeper/test/GetProposalFromTxnTest.java PRE-CREATION 
  /src/java/test/org/apache/zookeeper/test/LoadFromLogTest.java 1483440 

Diff: https://reviews.apache.org/r/11231/diff/


Testing
-------

- unit tests
- ran in prod for more than half a year


Thanks,

Thawan Kooburat


Re: Review Request: ZOOKEEPER-1413: Use on-disk transaction log for learner sync up

Posted by Thawan Kooburat <th...@fb.com>.

> On May 24, 2013, 12:19 a.m., fpj wrote:
> > /src/java/main/org/apache/zookeeper/server/TxnLogProposalIterator.java, line 67
> > <https://reviews.apache.org/r/11231/diff/1/?file=293955#file293955line67>
> >
> >     Ideally we propagate the exception up, but it is incompatible with the signature of the method unless we throw NoSuchElementException. What if we return null in the case of an exception?
> 
> Thawan Kooburat wrote:
>     IOException can be thrown from  
>     
>     hasNext = itr.next();
>     
>     which for prefetching the next element. 
>     
>     I think we only need to set hasNext to false when we see exception, so the client won't try to call next() again
>
> 
> Raul Gutierrez Segales wrote:
>     Record#serialize() can also throw IOException. Are callers of this method expected to deal with a Proposal that isn't properly initialized (i.e.: won't have the QuorumPacket member won't be initialized so might have wrong zxid, type, etc)?
> 
> Thawan Kooburat wrote:
>     itr.next() is the only place that actually perform IO-related operation. 
>     
>     Record#serialize() can thrown IO exception if the output stream is writing to network or disk. However, in this case, we are writing it to memory because we are using BinaryOutputArchive/ByteArrayOutputStream.  I looked at the code, these implementation cannot throw IOException but they have to inherit the method signature of output stream.
> 
> fpj wrote:
>     My concern here is that we simply log the error and keep making progress regularly, that's why I suggested we could try to propagate an exception up, but it is not completely straightforward because of the reasons I stated. If a server acks a txn but later cannot read it, then we might end up in an inconsistent state.

>From synchronization perspective, if the leader encounter a possible hole or corrupted txnlog, it will have to abort by clear the txn queue which is going to be sent to the learner. Then, it can fall back to sending a snapshot.

The txnlog facility need to be able to guarantee there properties but i don't think we have that yet.  I think we only do per-entry checksum and throws IOexception because of disk read, checksum or deserialization failure.   


- Thawan


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/11231/#review20941
-----------------------------------------------------------


On May 20, 2013, 5:40 a.m., Thawan Kooburat wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/11231/
> -----------------------------------------------------------
> 
> (Updated May 20, 2013, 5:40 a.m.)
> 
> 
> Review request for zookeeper.
> 
> 
> Description
> -------
> 
> ZOOKEEPER-1413: Use on-disk transaction log for learner sync up
> 
> - Use txnlog for learner synchronization if learner fall too far behind
> - Refactoring LearnerHandler to deal with different cases of handling learner synchronization  
> 
> 
> This addresses bug https://issues.apache.org/jira/browse/ZOOKEEPER-1413.
>     https://issues.apache.org/jira/browse/https://issues.apache.org/jira/browse/ZOOKEEPER-1413
> 
> 
> Diffs
> -----
> 
>   /src/java/main/org/apache/zookeeper/server/TxnLogProposalIterator.java PRE-CREATION 
>   /src/java/main/org/apache/zookeeper/server/ZKDatabase.java 1483440 
>   /src/java/main/org/apache/zookeeper/server/persistence/FileTxnLog.java 1483440 
>   /src/java/main/org/apache/zookeeper/server/persistence/FileTxnSnapLog.java 1483440 
>   /src/java/main/org/apache/zookeeper/server/persistence/TxnLog.java 1483440 
>   /src/java/main/org/apache/zookeeper/server/quorum/LearnerHandler.java 1483440 
>   /src/java/test/org/apache/zookeeper/server/quorum/LearnerHandlerTest.java PRE-CREATION 
>   /src/java/test/org/apache/zookeeper/test/FollowerResyncConcurrencyTest.java 1483440 
>   /src/java/test/org/apache/zookeeper/test/GetProposalFromTxnTest.java PRE-CREATION 
>   /src/java/test/org/apache/zookeeper/test/LoadFromLogTest.java 1483440 
> 
> Diff: https://reviews.apache.org/r/11231/diff/
> 
> 
> Testing
> -------
> 
> - unit tests
> - ran in prod for more than half a year
> 
> 
> Thanks,
> 
> Thawan Kooburat
> 
>


Re: Review Request: ZOOKEEPER-1413: Use on-disk transaction log for learner sync up

Posted by fp...@apache.org.

> On May 24, 2013, 12:19 a.m., fpj wrote:
> > Awesome job, Thawan. I have some comments and questions below, most of them are minor.

If you produce a new patch with the changes we agreed upon, then I'll +1 it and we can commit it.


> On May 24, 2013, 12:19 a.m., fpj wrote:
> > /src/java/main/org/apache/zookeeper/server/TxnLogProposalIterator.java, line 67
> > <https://reviews.apache.org/r/11231/diff/1/?file=293955#file293955line67>
> >
> >     Ideally we propagate the exception up, but it is incompatible with the signature of the method unless we throw NoSuchElementException. What if we return null in the case of an exception?
> 
> Thawan Kooburat wrote:
>     IOException can be thrown from  
>     
>     hasNext = itr.next();
>     
>     which for prefetching the next element. 
>     
>     I think we only need to set hasNext to false when we see exception, so the client won't try to call next() again
>
> 
> Raul Gutierrez Segales wrote:
>     Record#serialize() can also throw IOException. Are callers of this method expected to deal with a Proposal that isn't properly initialized (i.e.: won't have the QuorumPacket member won't be initialized so might have wrong zxid, type, etc)?
> 
> Thawan Kooburat wrote:
>     itr.next() is the only place that actually perform IO-related operation. 
>     
>     Record#serialize() can thrown IO exception if the output stream is writing to network or disk. However, in this case, we are writing it to memory because we are using BinaryOutputArchive/ByteArrayOutputStream.  I looked at the code, these implementation cannot throw IOException but they have to inherit the method signature of output stream.
> 
> fpj wrote:
>     My concern here is that we simply log the error and keep making progress regularly, that's why I suggested we could try to propagate an exception up, but it is not completely straightforward because of the reasons I stated. If a server acks a txn but later cannot read it, then we might end up in an inconsistent state.
> 
> Thawan Kooburat wrote:
>     From synchronization perspective, if the leader encounter a possible hole or corrupted txnlog, it will have to abort by clear the txn queue which is going to be sent to the learner. Then, it can fall back to sending a snapshot.
>     
>     The txnlog facility need to be able to guarantee there properties but i don't think we have that yet.  I think we only do per-entry checksum and throws IOexception because of disk read, checksum or deserialization failure.
> 
> fpj wrote:
>     It is a bit worse than what you're saying. If we miss txns because the log is corrupt, then we might end up in a situation in which we don't have a majority of servers containing a given txn for an operation that has been replied. In such a case, we may end up starting an epoch missing some txns that we have replied to. 
>     
>     If you think that the problem is deeper (and it might be) and we need to do more, then perhaps we need to discuss it in another jira. From your response, you prefer not to propagate an exception up the call chain.
> 
> Thawan Kooburat wrote:
>     This depends on how we decide to handle the error.  With the current code, it will be easier but slower to verify the integrity of the entire txnlog file during db.getProposalsFromTxnLog() call,  so the leader won't enter try to use txnlog at all and fallback to use snapshot. If we decide to load txnlog on-demand, we need a way to propagate the exception and abort the DIFF attempt.
>     
>     Right now, I am not sure if we have a way to verify the integrity of txnlog file.  For example, we should have a notion that a file is already closed and it should contains txn [1..10].  If we get IOException before reading txn 10, its means that txnlog is corrupted. If a file is not close, it means that this is the latest file so failing to reading txn 10 may a result of a partial write, so it is ok to ignore.
>     
>     I think there are 2 main problems that can cause txnlog to fail integrity check needed by this feature: the file is partially corrupt or the operator didn't copied all the txnlog file. We may have to have a separate JIRA to deal with these problem.

Ok, let's wrap up this jira with the changes we agreed upon and discuss txn log integrity separately. Could you create a jira for this discussion, please?


- fpj


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/11231/#review20941
-----------------------------------------------------------


On May 20, 2013, 5:40 a.m., Thawan Kooburat wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/11231/
> -----------------------------------------------------------
> 
> (Updated May 20, 2013, 5:40 a.m.)
> 
> 
> Review request for zookeeper.
> 
> 
> Description
> -------
> 
> ZOOKEEPER-1413: Use on-disk transaction log for learner sync up
> 
> - Use txnlog for learner synchronization if learner fall too far behind
> - Refactoring LearnerHandler to deal with different cases of handling learner synchronization  
> 
> 
> This addresses bug https://issues.apache.org/jira/browse/ZOOKEEPER-1413.
>     https://issues.apache.org/jira/browse/https://issues.apache.org/jira/browse/ZOOKEEPER-1413
> 
> 
> Diffs
> -----
> 
>   /src/java/main/org/apache/zookeeper/server/TxnLogProposalIterator.java PRE-CREATION 
>   /src/java/main/org/apache/zookeeper/server/ZKDatabase.java 1483440 
>   /src/java/main/org/apache/zookeeper/server/persistence/FileTxnLog.java 1483440 
>   /src/java/main/org/apache/zookeeper/server/persistence/FileTxnSnapLog.java 1483440 
>   /src/java/main/org/apache/zookeeper/server/persistence/TxnLog.java 1483440 
>   /src/java/main/org/apache/zookeeper/server/quorum/LearnerHandler.java 1483440 
>   /src/java/test/org/apache/zookeeper/server/quorum/LearnerHandlerTest.java PRE-CREATION 
>   /src/java/test/org/apache/zookeeper/test/FollowerResyncConcurrencyTest.java 1483440 
>   /src/java/test/org/apache/zookeeper/test/GetProposalFromTxnTest.java PRE-CREATION 
>   /src/java/test/org/apache/zookeeper/test/LoadFromLogTest.java 1483440 
> 
> Diff: https://reviews.apache.org/r/11231/diff/
> 
> 
> Testing
> -------
> 
> - unit tests
> - ran in prod for more than half a year
> 
> 
> Thanks,
> 
> Thawan Kooburat
> 
>


Re: Review Request: ZOOKEEPER-1413: Use on-disk transaction log for learner sync up

Posted by fp...@apache.org.

> On May 24, 2013, 12:19 a.m., fpj wrote:
> > /src/java/main/org/apache/zookeeper/server/TxnLogProposalIterator.java, line 67
> > <https://reviews.apache.org/r/11231/diff/1/?file=293955#file293955line67>
> >
> >     Ideally we propagate the exception up, but it is incompatible with the signature of the method unless we throw NoSuchElementException. What if we return null in the case of an exception?
> 
> Thawan Kooburat wrote:
>     IOException can be thrown from  
>     
>     hasNext = itr.next();
>     
>     which for prefetching the next element. 
>     
>     I think we only need to set hasNext to false when we see exception, so the client won't try to call next() again
>
> 
> Raul Gutierrez Segales wrote:
>     Record#serialize() can also throw IOException. Are callers of this method expected to deal with a Proposal that isn't properly initialized (i.e.: won't have the QuorumPacket member won't be initialized so might have wrong zxid, type, etc)?
> 
> Thawan Kooburat wrote:
>     itr.next() is the only place that actually perform IO-related operation. 
>     
>     Record#serialize() can thrown IO exception if the output stream is writing to network or disk. However, in this case, we are writing it to memory because we are using BinaryOutputArchive/ByteArrayOutputStream.  I looked at the code, these implementation cannot throw IOException but they have to inherit the method signature of output stream.
> 
> fpj wrote:
>     My concern here is that we simply log the error and keep making progress regularly, that's why I suggested we could try to propagate an exception up, but it is not completely straightforward because of the reasons I stated. If a server acks a txn but later cannot read it, then we might end up in an inconsistent state.
> 
> Thawan Kooburat wrote:
>     From synchronization perspective, if the leader encounter a possible hole or corrupted txnlog, it will have to abort by clear the txn queue which is going to be sent to the learner. Then, it can fall back to sending a snapshot.
>     
>     The txnlog facility need to be able to guarantee there properties but i don't think we have that yet.  I think we only do per-entry checksum and throws IOexception because of disk read, checksum or deserialization failure.

It is a bit worse than what you're saying. If we miss txns because the log is corrupt, then we might end up in a situation in which we don't have a majority of servers containing a given txn for an operation that has been replied. In such a case, we may end up starting an epoch missing some txns that we have replied to. 

If you think that the problem is deeper (and it might be) and we need to do more, then perhaps we need to discuss it in another jira. From your response, you prefer not to propagate an exception up the call chain.


> On May 24, 2013, 12:19 a.m., fpj wrote:
> > /src/java/main/org/apache/zookeeper/server/quorum/LearnerHandler.java, line 861
> > <https://reviews.apache.org/r/11231/diff/1/?file=293960#file293960line861>
> >
> >     I'm confused about this case. The only way I can think of right now of falling into this case is if the follower has tried to join an epoch that has never become established. 
> >     
> >     Why does the learner crash?
> 
> Thawan Kooburat wrote:
>     This is also a case when quorum is formed and it has never see any request.  Then, we restart the quorum.
> 
> fpj wrote:
>     When you say that a quorum is formed, which point of the protocol you're referring to? When a quorum has acknowledged NEWLEADER or some earlier point?
> 
> Thawan Kooburat wrote:
>     I think the case that prompt me to add NewEpochZxid-related stuff is below .   Here the leader has seen the follower transaction T(epoch, zxid)   
>     
>     1. T(1, 4)
>     2. T(2, 0)  - NEWLEADER
>     3. T(3, 0)  - NEWLEADER
>     4. T(3, 1)
>     
>     If a follower joined a quorum at epoch 2 and die. Its peerLastZxid can be T(2,0) if it get a snapshot from the leader, or it gets NEWLEADER packet via outstanding proposals.
>     
>     When the same follower try to join the quorum again, the leader will try to ask the follower to TRUNC to T(1,4) which is not possible because there is no txnlog to truncate and the learner will crash. 
>     
>     The fix is to send a DIFF starting at T(1,4) instead.  However, I think I added a block that handle (Cannot send TRUNC to peer ...) just for safety based on the observation that TRUNC can only work if it is within the same epoch.

Right now it sounds correct to me, but I'll think some more about this one. 


> On May 24, 2013, 12:19 a.m., fpj wrote:
> > /src/java/main/org/apache/zookeeper/server/quorum/LearnerHandler.java, line 866
> > <https://reviews.apache.org/r/11231/diff/1/?file=293960#file293960line866>
> >
> >     This seems to be more than a warn, right? It should be either error or fatal.
> 
> Thawan Kooburat wrote:
>     Should be WARN or INFO because we just send SNAP whenever we have a case that we cannot handle. It is correct but not optimized.
> 
> fpj wrote:
>     You're right, but the logic doesn't seem completely straightforward to me. Is it the case that we end up sending a snapshot because of the special case handled by this if block? 
>     
>             if (needOpPacket && !needSnap) {
>                 // This should never happen, but we should fall back to sending
>                 // snapshot just in case.
>     
>                 LOG.error("Unhandled scenario for peer sid: " +  getSid() +
>                          " fall back to use snapshot");
>     
>                 needSnap = true;
>             }
>     
>     If this is right, why do we need special handling instead of just capturing the fact we need a snapshot from the return value of queuedCommittedProposal?
>
> 
> Thawan Kooburat wrote:
>     This block just act as a safety net to send SNAP if there is case that we forgot to handle (in addition to this one).  When I thought about this carefully, I am not sure if I ever see this log line (Cannot send TRUNC to peer sid...) in our prod or not.

I was actually asking how we end up sending a snapshot. If I get it right, we need to set needSnap and the only place I could spot that sets needSnap after hitting the case I originally commented on (the comment about being error or fatal). Is the block I copied above the one that causes us to send a snapshot in the case we hit the scenario of LearnerHandler:750? 


- fpj


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/11231/#review20941
-----------------------------------------------------------


On May 20, 2013, 5:40 a.m., Thawan Kooburat wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/11231/
> -----------------------------------------------------------
> 
> (Updated May 20, 2013, 5:40 a.m.)
> 
> 
> Review request for zookeeper.
> 
> 
> Description
> -------
> 
> ZOOKEEPER-1413: Use on-disk transaction log for learner sync up
> 
> - Use txnlog for learner synchronization if learner fall too far behind
> - Refactoring LearnerHandler to deal with different cases of handling learner synchronization  
> 
> 
> This addresses bug https://issues.apache.org/jira/browse/ZOOKEEPER-1413.
>     https://issues.apache.org/jira/browse/https://issues.apache.org/jira/browse/ZOOKEEPER-1413
> 
> 
> Diffs
> -----
> 
>   /src/java/main/org/apache/zookeeper/server/TxnLogProposalIterator.java PRE-CREATION 
>   /src/java/main/org/apache/zookeeper/server/ZKDatabase.java 1483440 
>   /src/java/main/org/apache/zookeeper/server/persistence/FileTxnLog.java 1483440 
>   /src/java/main/org/apache/zookeeper/server/persistence/FileTxnSnapLog.java 1483440 
>   /src/java/main/org/apache/zookeeper/server/persistence/TxnLog.java 1483440 
>   /src/java/main/org/apache/zookeeper/server/quorum/LearnerHandler.java 1483440 
>   /src/java/test/org/apache/zookeeper/server/quorum/LearnerHandlerTest.java PRE-CREATION 
>   /src/java/test/org/apache/zookeeper/test/FollowerResyncConcurrencyTest.java 1483440 
>   /src/java/test/org/apache/zookeeper/test/GetProposalFromTxnTest.java PRE-CREATION 
>   /src/java/test/org/apache/zookeeper/test/LoadFromLogTest.java 1483440 
> 
> Diff: https://reviews.apache.org/r/11231/diff/
> 
> 
> Testing
> -------
> 
> - unit tests
> - ran in prod for more than half a year
> 
> 
> Thanks,
> 
> Thawan Kooburat
> 
>


Re: Review Request: ZOOKEEPER-1413: Use on-disk transaction log for learner sync up

Posted by Thawan Kooburat <th...@fb.com>.

> On May 24, 2013, 12:19 a.m., fpj wrote:
> > /src/java/main/org/apache/zookeeper/server/TxnLogProposalIterator.java, line 18
> > <https://reviews.apache.org/r/11231/diff/1/?file=293955#file293955line18>
> >
> >     "... provides an..."

fixed


> On May 24, 2013, 12:19 a.m., fpj wrote:
> > /src/java/main/org/apache/zookeeper/server/persistence/FileTxnLog.java, line 523
> > <https://reviews.apache.org/r/11231/diff/1/?file=293957#file293957line523>
> >
> >     What if we do the fast forwarding in this method instead of having a boolean passed to init?
> 
> Thawan Kooburat wrote:
>     sure, I can refactor it

fixed


> On May 24, 2013, 12:19 a.m., fpj wrote:
> > /src/java/main/org/apache/zookeeper/server/persistence/FileTxnLog.java, line 545
> > <https://reviews.apache.org/r/11231/diff/1/?file=293957#file293957line545>
> >
> >     This fast forward parameter is a bit odd. Need to check why that's necessary and if there is an alternative.
> 
> Thawan Kooburat wrote:
>     From learner synchronization perspective, the leader what to send only txns which has zxid > peerLastSeenZxid.  I have to do this either in this layer or in the LearnerHandler. Doing in in this layer is easier to write a unit test and reduce code complexity in LearnerHandler. 
>     
>
> 
> fpj wrote:
>     Ok, I think my comment is mostly about style, I rarely see an init method with parameters, but there is nothing really broken about doing it.

fixed


> On May 24, 2013, 12:19 a.m., fpj wrote:
> > /src/java/main/org/apache/zookeeper/server/persistence/FileTxnSnapLog.java, line 200
> > <https://reviews.apache.org/r/11231/diff/1/?file=293958#file293958line200>
> >
> >     Should we call this method and the one below something like readTxnLog?
> 
> Thawan Kooburat wrote:
>     This is an existing method, I think we should only rename a method only if its semantic has changed or the name is really miss leading.
> 
> fpj wrote:
>     I think the read method you say exists is in FileTxnLog. My comment is about having a read() method in FileTxnSnapLog because the read could be either for a txn log or a snapshot. I don't think we would be renaming an existing method, but perhaps I'm not getting your comment right.
> 
> Thawan Kooburat wrote:
>     Yes, you are right. I was confused with FileTxnLog.read(). Sure, I can rename this one.

fixed


> On May 24, 2013, 12:19 a.m., fpj wrote:
> > /src/java/main/org/apache/zookeeper/server/quorum/LearnerHandler.java, line 699
> > <https://reviews.apache.org/r/11231/diff/1/?file=293960#file293960line699>
> >
> >     Make the log info conditional or use curly braces.
> 
> Thawan Kooburat wrote:
>     You mean that I should use this? http://www.slf4j.org/apidocs/org/slf4j/Logger.html#info%28java.lang.String,%20java.lang.Object...%29 
>
> 
> fpj wrote:
>     Yep!

This requires new version of SLF4J


> On May 24, 2013, 12:19 a.m., fpj wrote:
> > /src/java/main/org/apache/zookeeper/server/quorum/LearnerHandler.java, line 733
> > <https://reviews.apache.org/r/11231/diff/1/?file=293960#file293960line733>
> >
> >     "send send"

fixed


> On May 24, 2013, 12:19 a.m., fpj wrote:
> > /src/java/main/org/apache/zookeeper/server/quorum/LearnerHandler.java, line 812
> > <https://reviews.apache.org/r/11231/diff/1/?file=293960#file293960line812>
> >
> >     Zab

fixed


> On May 24, 2013, 12:19 a.m., fpj wrote:
> > /src/java/main/org/apache/zookeeper/server/quorum/LearnerHandler.java, line 815
> > <https://reviews.apache.org/r/11231/diff/1/?file=293960#file293960line815>
> >
> >     suggested name for the method: queueCommittedProposals
> 
> Thawan Kooburat wrote:
>     no problem

fixed


> On May 24, 2013, 12:19 a.m., fpj wrote:
> > /src/java/main/org/apache/zookeeper/server/quorum/LearnerHandler.java, line 841
> > <https://reviews.apache.org/r/11231/diff/1/?file=293960#file293960line841>
> >
> >     ... when we see the...

fixed


> On May 24, 2013, 12:19 a.m., fpj wrote:
> > /src/java/main/org/apache/zookeeper/server/quorum/LearnerHandler.java, line 864
> > <https://reviews.apache.org/r/11231/diff/1/?file=293960#file293960line864>
> >
> >     ... if it is asked to do so...

fixed


- Thawan


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/11231/#review20941
-----------------------------------------------------------


On June 24, 2013, 8:40 a.m., Thawan Kooburat wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/11231/
> -----------------------------------------------------------
> 
> (Updated June 24, 2013, 8:40 a.m.)
> 
> 
> Review request for zookeeper.
> 
> 
> Description
> -------
> 
> ZOOKEEPER-1413: Use on-disk transaction log for learner sync up
> 
> - Use txnlog for learner synchronization if learner fall too far behind
> - Refactoring LearnerHandler to deal with different cases of handling learner synchronization  
> 
> 
> This addresses bug https://issues.apache.org/jira/browse/ZOOKEEPER-1413.
>     https://issues.apache.org/jira/browse/https://issues.apache.org/jira/browse/ZOOKEEPER-1413
> 
> 
> Diffs
> -----
> 
>   /ivy.xml 1495522 
>   /src/java/main/org/apache/zookeeper/server/TxnLogProposalIterator.java PRE-CREATION 
>   /src/java/main/org/apache/zookeeper/server/ZKDatabase.java 1495522 
>   /src/java/main/org/apache/zookeeper/server/persistence/FileTxnLog.java 1495522 
>   /src/java/main/org/apache/zookeeper/server/persistence/FileTxnSnapLog.java 1495522 
>   /src/java/main/org/apache/zookeeper/server/persistence/TxnLog.java 1495522 
>   /src/java/main/org/apache/zookeeper/server/quorum/LearnerHandler.java 1495522 
>   /src/java/test/org/apache/zookeeper/server/quorum/LearnerHandlerTest.java PRE-CREATION 
>   /src/java/test/org/apache/zookeeper/test/FollowerResyncConcurrencyTest.java 1495522 
>   /src/java/test/org/apache/zookeeper/test/GetProposalFromTxnTest.java PRE-CREATION 
>   /src/java/test/org/apache/zookeeper/test/LoadFromLogTest.java 1495522 
> 
> Diff: https://reviews.apache.org/r/11231/diff/
> 
> 
> Testing
> -------
> 
> - unit tests
> - ran in prod for more than half a year
> 
> 
> Thanks,
> 
> Thawan Kooburat
> 
>


Re: Review Request: ZOOKEEPER-1413: Use on-disk transaction log for learner sync up

Posted by Thawan Kooburat <th...@fb.com>.

> On May 24, 2013, 12:19 a.m., fpj wrote:
> > /src/java/main/org/apache/zookeeper/server/TxnLogProposalIterator.java, line 67
> > <https://reviews.apache.org/r/11231/diff/1/?file=293955#file293955line67>
> >
> >     Ideally we propagate the exception up, but it is incompatible with the signature of the method unless we throw NoSuchElementException. What if we return null in the case of an exception?
> 
> Thawan Kooburat wrote:
>     IOException can be thrown from  
>     
>     hasNext = itr.next();
>     
>     which for prefetching the next element. 
>     
>     I think we only need to set hasNext to false when we see exception, so the client won't try to call next() again
>
> 
> Raul Gutierrez Segales wrote:
>     Record#serialize() can also throw IOException. Are callers of this method expected to deal with a Proposal that isn't properly initialized (i.e.: won't have the QuorumPacket member won't be initialized so might have wrong zxid, type, etc)?

itr.next() is the only place that actually perform IO-related operation. 

Record#serialize() can thrown IO exception if the output stream is writing to network or disk. However, in this case, we are writing it to memory because we are using BinaryOutputArchive/ByteArrayOutputStream.  I looked at the code, these implementation cannot throw IOException but they have to inherit the method signature of output stream.  


- Thawan


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/11231/#review20941
-----------------------------------------------------------


On May 20, 2013, 5:40 a.m., Thawan Kooburat wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/11231/
> -----------------------------------------------------------
> 
> (Updated May 20, 2013, 5:40 a.m.)
> 
> 
> Review request for zookeeper.
> 
> 
> Description
> -------
> 
> ZOOKEEPER-1413: Use on-disk transaction log for learner sync up
> 
> - Use txnlog for learner synchronization if learner fall too far behind
> - Refactoring LearnerHandler to deal with different cases of handling learner synchronization  
> 
> 
> This addresses bug https://issues.apache.org/jira/browse/ZOOKEEPER-1413.
>     https://issues.apache.org/jira/browse/https://issues.apache.org/jira/browse/ZOOKEEPER-1413
> 
> 
> Diffs
> -----
> 
>   /src/java/main/org/apache/zookeeper/server/TxnLogProposalIterator.java PRE-CREATION 
>   /src/java/main/org/apache/zookeeper/server/ZKDatabase.java 1483440 
>   /src/java/main/org/apache/zookeeper/server/persistence/FileTxnLog.java 1483440 
>   /src/java/main/org/apache/zookeeper/server/persistence/FileTxnSnapLog.java 1483440 
>   /src/java/main/org/apache/zookeeper/server/persistence/TxnLog.java 1483440 
>   /src/java/main/org/apache/zookeeper/server/quorum/LearnerHandler.java 1483440 
>   /src/java/test/org/apache/zookeeper/server/quorum/LearnerHandlerTest.java PRE-CREATION 
>   /src/java/test/org/apache/zookeeper/test/FollowerResyncConcurrencyTest.java 1483440 
>   /src/java/test/org/apache/zookeeper/test/GetProposalFromTxnTest.java PRE-CREATION 
>   /src/java/test/org/apache/zookeeper/test/LoadFromLogTest.java 1483440 
> 
> Diff: https://reviews.apache.org/r/11231/diff/
> 
> 
> Testing
> -------
> 
> - unit tests
> - ran in prod for more than half a year
> 
> 
> Thanks,
> 
> Thawan Kooburat
> 
>


Re: Review Request: ZOOKEEPER-1413: Use on-disk transaction log for learner sync up

Posted by Thawan Kooburat <th...@fb.com>.

> On May 24, 2013, 12:19 a.m., fpj wrote:
> > /src/java/main/org/apache/zookeeper/server/TxnLogProposalIterator.java, line 67
> > <https://reviews.apache.org/r/11231/diff/1/?file=293955#file293955line67>
> >
> >     Ideally we propagate the exception up, but it is incompatible with the signature of the method unless we throw NoSuchElementException. What if we return null in the case of an exception?
> 
> Thawan Kooburat wrote:
>     IOException can be thrown from  
>     
>     hasNext = itr.next();
>     
>     which for prefetching the next element. 
>     
>     I think we only need to set hasNext to false when we see exception, so the client won't try to call next() again
>
> 
> Raul Gutierrez Segales wrote:
>     Record#serialize() can also throw IOException. Are callers of this method expected to deal with a Proposal that isn't properly initialized (i.e.: won't have the QuorumPacket member won't be initialized so might have wrong zxid, type, etc)?
> 
> Thawan Kooburat wrote:
>     itr.next() is the only place that actually perform IO-related operation. 
>     
>     Record#serialize() can thrown IO exception if the output stream is writing to network or disk. However, in this case, we are writing it to memory because we are using BinaryOutputArchive/ByteArrayOutputStream.  I looked at the code, these implementation cannot throw IOException but they have to inherit the method signature of output stream.
> 
> fpj wrote:
>     My concern here is that we simply log the error and keep making progress regularly, that's why I suggested we could try to propagate an exception up, but it is not completely straightforward because of the reasons I stated. If a server acks a txn but later cannot read it, then we might end up in an inconsistent state.
> 
> Thawan Kooburat wrote:
>     From synchronization perspective, if the leader encounter a possible hole or corrupted txnlog, it will have to abort by clear the txn queue which is going to be sent to the learner. Then, it can fall back to sending a snapshot.
>     
>     The txnlog facility need to be able to guarantee there properties but i don't think we have that yet.  I think we only do per-entry checksum and throws IOexception because of disk read, checksum or deserialization failure.
> 
> fpj wrote:
>     It is a bit worse than what you're saying. If we miss txns because the log is corrupt, then we might end up in a situation in which we don't have a majority of servers containing a given txn for an operation that has been replied. In such a case, we may end up starting an epoch missing some txns that we have replied to. 
>     
>     If you think that the problem is deeper (and it might be) and we need to do more, then perhaps we need to discuss it in another jira. From your response, you prefer not to propagate an exception up the call chain.
> 
> Thawan Kooburat wrote:
>     This depends on how we decide to handle the error.  With the current code, it will be easier but slower to verify the integrity of the entire txnlog file during db.getProposalsFromTxnLog() call,  so the leader won't enter try to use txnlog at all and fallback to use snapshot. If we decide to load txnlog on-demand, we need a way to propagate the exception and abort the DIFF attempt.
>     
>     Right now, I am not sure if we have a way to verify the integrity of txnlog file.  For example, we should have a notion that a file is already closed and it should contains txn [1..10].  If we get IOException before reading txn 10, its means that txnlog is corrupted. If a file is not close, it means that this is the latest file so failing to reading txn 10 may a result of a partial write, so it is ok to ignore.
>     
>     I think there are 2 main problems that can cause txnlog to fail integrity check needed by this feature: the file is partially corrupt or the operator didn't copied all the txnlog file. We may have to have a separate JIRA to deal with these problem.
> 
> fpj wrote:
>     Ok, let's wrap up this jira with the changes we agreed upon and discuss txn log integrity separately. Could you create a jira for this discussion, please?

Ok,  we can discuss about this in ZOOKEEPER-1710


- Thawan


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/11231/#review20941
-----------------------------------------------------------


On May 20, 2013, 5:40 a.m., Thawan Kooburat wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/11231/
> -----------------------------------------------------------
> 
> (Updated May 20, 2013, 5:40 a.m.)
> 
> 
> Review request for zookeeper.
> 
> 
> Description
> -------
> 
> ZOOKEEPER-1413: Use on-disk transaction log for learner sync up
> 
> - Use txnlog for learner synchronization if learner fall too far behind
> - Refactoring LearnerHandler to deal with different cases of handling learner synchronization  
> 
> 
> This addresses bug https://issues.apache.org/jira/browse/ZOOKEEPER-1413.
>     https://issues.apache.org/jira/browse/https://issues.apache.org/jira/browse/ZOOKEEPER-1413
> 
> 
> Diffs
> -----
> 
>   /src/java/main/org/apache/zookeeper/server/TxnLogProposalIterator.java PRE-CREATION 
>   /src/java/main/org/apache/zookeeper/server/ZKDatabase.java 1483440 
>   /src/java/main/org/apache/zookeeper/server/persistence/FileTxnLog.java 1483440 
>   /src/java/main/org/apache/zookeeper/server/persistence/FileTxnSnapLog.java 1483440 
>   /src/java/main/org/apache/zookeeper/server/persistence/TxnLog.java 1483440 
>   /src/java/main/org/apache/zookeeper/server/quorum/LearnerHandler.java 1483440 
>   /src/java/test/org/apache/zookeeper/server/quorum/LearnerHandlerTest.java PRE-CREATION 
>   /src/java/test/org/apache/zookeeper/test/FollowerResyncConcurrencyTest.java 1483440 
>   /src/java/test/org/apache/zookeeper/test/GetProposalFromTxnTest.java PRE-CREATION 
>   /src/java/test/org/apache/zookeeper/test/LoadFromLogTest.java 1483440 
> 
> Diff: https://reviews.apache.org/r/11231/diff/
> 
> 
> Testing
> -------
> 
> - unit tests
> - ran in prod for more than half a year
> 
> 
> Thanks,
> 
> Thawan Kooburat
> 
>


Re: Review Request: ZOOKEEPER-1413: Use on-disk transaction log for learner sync up

Posted by Thawan Kooburat <th...@fb.com>.

> On May 24, 2013, 12:19 a.m., fpj wrote:
> > /src/java/main/org/apache/zookeeper/server/quorum/LearnerHandler.java, line 866
> > <https://reviews.apache.org/r/11231/diff/1/?file=293960#file293960line866>
> >
> >     This seems to be more than a warn, right? It should be either error or fatal.
> 
> Thawan Kooburat wrote:
>     Should be WARN or INFO because we just send SNAP whenever we have a case that we cannot handle. It is correct but not optimized.
> 
> fpj wrote:
>     You're right, but the logic doesn't seem completely straightforward to me. Is it the case that we end up sending a snapshot because of the special case handled by this if block? 
>     
>             if (needOpPacket && !needSnap) {
>                 // This should never happen, but we should fall back to sending
>                 // snapshot just in case.
>     
>                 LOG.error("Unhandled scenario for peer sid: " +  getSid() +
>                          " fall back to use snapshot");
>     
>                 needSnap = true;
>             }
>     
>     If this is right, why do we need special handling instead of just capturing the fact we need a snapshot from the return value of queuedCommittedProposal?
>
> 
> Thawan Kooburat wrote:
>     This block just act as a safety net to send SNAP if there is case that we forgot to handle (in addition to this one).  When I thought about this carefully, I am not sure if I ever see this log line (Cannot send TRUNC to peer sid...) in our prod or not.
> 
> fpj wrote:
>     I was actually asking how we end up sending a snapshot. If I get it right, we need to set needSnap and the only place I could spot that sets needSnap after hitting the case I originally commented on (the comment about being error or fatal). Is the block I copied above the one that causes us to send a snapshot in the case we hit the scenario of LearnerHandler:750?

Yes, your understanding is correct


> On May 24, 2013, 12:19 a.m., fpj wrote:
> > /src/java/main/org/apache/zookeeper/server/TxnLogProposalIterator.java, line 67
> > <https://reviews.apache.org/r/11231/diff/1/?file=293955#file293955line67>
> >
> >     Ideally we propagate the exception up, but it is incompatible with the signature of the method unless we throw NoSuchElementException. What if we return null in the case of an exception?
> 
> Thawan Kooburat wrote:
>     IOException can be thrown from  
>     
>     hasNext = itr.next();
>     
>     which for prefetching the next element. 
>     
>     I think we only need to set hasNext to false when we see exception, so the client won't try to call next() again
>
> 
> Raul Gutierrez Segales wrote:
>     Record#serialize() can also throw IOException. Are callers of this method expected to deal with a Proposal that isn't properly initialized (i.e.: won't have the QuorumPacket member won't be initialized so might have wrong zxid, type, etc)?
> 
> Thawan Kooburat wrote:
>     itr.next() is the only place that actually perform IO-related operation. 
>     
>     Record#serialize() can thrown IO exception if the output stream is writing to network or disk. However, in this case, we are writing it to memory because we are using BinaryOutputArchive/ByteArrayOutputStream.  I looked at the code, these implementation cannot throw IOException but they have to inherit the method signature of output stream.
> 
> fpj wrote:
>     My concern here is that we simply log the error and keep making progress regularly, that's why I suggested we could try to propagate an exception up, but it is not completely straightforward because of the reasons I stated. If a server acks a txn but later cannot read it, then we might end up in an inconsistent state.
> 
> Thawan Kooburat wrote:
>     From synchronization perspective, if the leader encounter a possible hole or corrupted txnlog, it will have to abort by clear the txn queue which is going to be sent to the learner. Then, it can fall back to sending a snapshot.
>     
>     The txnlog facility need to be able to guarantee there properties but i don't think we have that yet.  I think we only do per-entry checksum and throws IOexception because of disk read, checksum or deserialization failure.
> 
> fpj wrote:
>     It is a bit worse than what you're saying. If we miss txns because the log is corrupt, then we might end up in a situation in which we don't have a majority of servers containing a given txn for an operation that has been replied. In such a case, we may end up starting an epoch missing some txns that we have replied to. 
>     
>     If you think that the problem is deeper (and it might be) and we need to do more, then perhaps we need to discuss it in another jira. From your response, you prefer not to propagate an exception up the call chain.

This depends on how we decide to handle the error.  With the current code, it will be easier but slower to verify the integrity of the entire txnlog file during db.getProposalsFromTxnLog() call,  so the leader won't enter try to use txnlog at all and fallback to use snapshot. If we decide to load txnlog on-demand, we need a way to propagate the exception and abort the DIFF attempt.

Right now, I am not sure if we have a way to verify the integrity of txnlog file.  For example, we should have a notion that a file is already closed and it should contains txn [1..10].  If we get IOException before reading txn 10, its means that txnlog is corrupted. If a file is not close, it means that this is the latest file so failing to reading txn 10 may a result of a partial write, so it is ok to ignore.

I think there are 2 main problems that can cause txnlog to fail integrity check needed by this feature: the file is partially corrupt or the operator didn't copied all the txnlog file. We may have to have a separate JIRA to deal with these problem.    


- Thawan


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/11231/#review20941
-----------------------------------------------------------


On May 20, 2013, 5:40 a.m., Thawan Kooburat wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/11231/
> -----------------------------------------------------------
> 
> (Updated May 20, 2013, 5:40 a.m.)
> 
> 
> Review request for zookeeper.
> 
> 
> Description
> -------
> 
> ZOOKEEPER-1413: Use on-disk transaction log for learner sync up
> 
> - Use txnlog for learner synchronization if learner fall too far behind
> - Refactoring LearnerHandler to deal with different cases of handling learner synchronization  
> 
> 
> This addresses bug https://issues.apache.org/jira/browse/ZOOKEEPER-1413.
>     https://issues.apache.org/jira/browse/https://issues.apache.org/jira/browse/ZOOKEEPER-1413
> 
> 
> Diffs
> -----
> 
>   /src/java/main/org/apache/zookeeper/server/TxnLogProposalIterator.java PRE-CREATION 
>   /src/java/main/org/apache/zookeeper/server/ZKDatabase.java 1483440 
>   /src/java/main/org/apache/zookeeper/server/persistence/FileTxnLog.java 1483440 
>   /src/java/main/org/apache/zookeeper/server/persistence/FileTxnSnapLog.java 1483440 
>   /src/java/main/org/apache/zookeeper/server/persistence/TxnLog.java 1483440 
>   /src/java/main/org/apache/zookeeper/server/quorum/LearnerHandler.java 1483440 
>   /src/java/test/org/apache/zookeeper/server/quorum/LearnerHandlerTest.java PRE-CREATION 
>   /src/java/test/org/apache/zookeeper/test/FollowerResyncConcurrencyTest.java 1483440 
>   /src/java/test/org/apache/zookeeper/test/GetProposalFromTxnTest.java PRE-CREATION 
>   /src/java/test/org/apache/zookeeper/test/LoadFromLogTest.java 1483440 
> 
> Diff: https://reviews.apache.org/r/11231/diff/
> 
> 
> Testing
> -------
> 
> - unit tests
> - ran in prod for more than half a year
> 
> 
> Thanks,
> 
> Thawan Kooburat
> 
>


Re: Review Request: ZOOKEEPER-1413: Use on-disk transaction log for learner sync up

Posted by Thawan Kooburat <th...@fb.com>.

> On May 24, 2013, 12:19 a.m., fpj wrote:
> > /src/java/main/org/apache/zookeeper/server/TxnLogProposalIterator.java, line 67
> > <https://reviews.apache.org/r/11231/diff/1/?file=293955#file293955line67>
> >
> >     Ideally we propagate the exception up, but it is incompatible with the signature of the method unless we throw NoSuchElementException. What if we return null in the case of an exception?

IOException can be thrown from  

hasNext = itr.next();

which for prefetching the next element. 

I think we only need to set hasNext to false when we see exception, so the client won't try to call next() again


> On May 24, 2013, 12:19 a.m., fpj wrote:
> > /src/java/main/org/apache/zookeeper/server/ZKDatabase.java, line 82
> > <https://reviews.apache.org/r/11231/diff/1/?file=293956#file293956line82>
> >
> >     1/3 seems reasonable but arbitrary. I'm curious to know if there is a concrete reason behind the choice or if it just feels right for the default.

This require some tuning. I will comment in the JIRA since this is related to other diff as well


> On May 24, 2013, 12:19 a.m., fpj wrote:
> > /src/java/main/org/apache/zookeeper/server/persistence/FileTxnLog.java, line 523
> > <https://reviews.apache.org/r/11231/diff/1/?file=293957#file293957line523>
> >
> >     What if we do the fast forwarding in this method instead of having a boolean passed to init?

sure, I can refactor it


> On May 24, 2013, 12:19 a.m., fpj wrote:
> > /src/java/main/org/apache/zookeeper/server/persistence/FileTxnLog.java, line 545
> > <https://reviews.apache.org/r/11231/diff/1/?file=293957#file293957line545>
> >
> >     This fast forward parameter is a bit odd. Need to check why that's necessary and if there is an alternative.

>From learner synchronization perspective, the leader what to send only txns which has zxid > peerLastSeenZxid.  I have to do this either in this layer or in the LearnerHandler. Doing in in this layer is easier to write a unit test and reduce code complexity in LearnerHandler. 


> On May 24, 2013, 12:19 a.m., fpj wrote:
> > /src/java/main/org/apache/zookeeper/server/persistence/FileTxnSnapLog.java, line 200
> > <https://reviews.apache.org/r/11231/diff/1/?file=293958#file293958line200>
> >
> >     Should we call this method and the one below something like readTxnLog?

This is an existing method, I think we should only rename a method only if its semantic has changed or the name is really miss leading.  


> On May 24, 2013, 12:19 a.m., fpj wrote:
> > /src/java/main/org/apache/zookeeper/server/quorum/LearnerHandler.java, line 677
> > <https://reviews.apache.org/r/11231/diff/1/?file=293960#file293960line677>
> >
> >     I think this means to say that there is no txn for this epoch yet because the leader of the epoch is just starting, is it right?

Any zxid should point to a valid txn.  However, this zxid point to NEWLEADER txn which is never persist to txnlog. If I don't have special handling for this case, the leader will think that this learner has a txn which it has never seen. So it will send TRUNC message to the learner and cause it to crash. 

Some follower may report that it has this zxid, if it gets a snapshot when the quorum is recently formed or get NEWLEADER package as part of outstanding proposals.  

You may also noticed that this patch only changes the leader side. This is intentional because I need to support rolling upgrade.   


> On May 24, 2013, 12:19 a.m., fpj wrote:
> > /src/java/main/org/apache/zookeeper/server/quorum/LearnerHandler.java, line 699
> > <https://reviews.apache.org/r/11231/diff/1/?file=293960#file293960line699>
> >
> >     Make the log info conditional or use curly braces.

You mean that I should use this? http://www.slf4j.org/apidocs/org/slf4j/Logger.html#info%28java.lang.String,%20java.lang.Object...%29 


> On May 24, 2013, 12:19 a.m., fpj wrote:
> > /src/java/main/org/apache/zookeeper/server/quorum/LearnerHandler.java, line 726
> > <https://reviews.apache.org/r/11231/diff/1/?file=293960#file293960line726>
> >
> >     I'm not sure I understand the second part of this case. Is this referring to a follower that is starting with a clean state?

See my comment about NEWLEADER zxid above 


> On May 24, 2013, 12:19 a.m., fpj wrote:
> > /src/java/main/org/apache/zookeeper/server/quorum/LearnerHandler.java, line 806
> > <https://reviews.apache.org/r/11231/diff/1/?file=293960#file293960line806>
> >
> >     This comment implies that if peerLastZxid == maxZxid, we still queue maxZxid, but it is not necessary, right? I have a related comment in one of the test cases.

If there is a better math notation, let me know. This method won't queue maxZxid in this case. 


> On May 24, 2013, 12:19 a.m., fpj wrote:
> > /src/java/main/org/apache/zookeeper/server/quorum/LearnerHandler.java, line 815
> > <https://reviews.apache.org/r/11231/diff/1/?file=293960#file293960line815>
> >
> >     suggested name for the method: queueCommittedProposals

no problem


> On May 24, 2013, 12:19 a.m., fpj wrote:
> > /src/java/main/org/apache/zookeeper/server/quorum/LearnerHandler.java, line 861
> > <https://reviews.apache.org/r/11231/diff/1/?file=293960#file293960line861>
> >
> >     I'm confused about this case. The only way I can think of right now of falling into this case is if the follower has tried to join an epoch that has never become established. 
> >     
> >     Why does the learner crash?

This is also a case when quorum is formed and it has never see any request.  Then, we restart the quorum.


> On May 24, 2013, 12:19 a.m., fpj wrote:
> > /src/java/main/org/apache/zookeeper/server/quorum/LearnerHandler.java, line 866
> > <https://reviews.apache.org/r/11231/diff/1/?file=293960#file293960line866>
> >
> >     This seems to be more than a warn, right? It should be either error or fatal.

Should be WARN or INFO because we just send SNAP whenever we have a case that we cannot handle. It is correct but not optimized.  


> On May 24, 2013, 12:19 a.m., fpj wrote:
> > /src/java/test/org/apache/zookeeper/server/quorum/LearnerHandlerTest.java, line 216
> > <https://reviews.apache.org/r/11231/diff/1/?file=293961#file293961line216>
> >
> >     I'm wondering why we should expect to have a packet in the queue, the peer is already synced. As I understand this is the case in which (peerLaxtZxid, maxZxid] = (1, 1].

DIFF packet


- Thawan


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/11231/#review20941
-----------------------------------------------------------


On May 20, 2013, 5:40 a.m., Thawan Kooburat wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/11231/
> -----------------------------------------------------------
> 
> (Updated May 20, 2013, 5:40 a.m.)
> 
> 
> Review request for zookeeper.
> 
> 
> Description
> -------
> 
> ZOOKEEPER-1413: Use on-disk transaction log for learner sync up
> 
> - Use txnlog for learner synchronization if learner fall too far behind
> - Refactoring LearnerHandler to deal with different cases of handling learner synchronization  
> 
> 
> This addresses bug https://issues.apache.org/jira/browse/ZOOKEEPER-1413.
>     https://issues.apache.org/jira/browse/https://issues.apache.org/jira/browse/ZOOKEEPER-1413
> 
> 
> Diffs
> -----
> 
>   /src/java/main/org/apache/zookeeper/server/TxnLogProposalIterator.java PRE-CREATION 
>   /src/java/main/org/apache/zookeeper/server/ZKDatabase.java 1483440 
>   /src/java/main/org/apache/zookeeper/server/persistence/FileTxnLog.java 1483440 
>   /src/java/main/org/apache/zookeeper/server/persistence/FileTxnSnapLog.java 1483440 
>   /src/java/main/org/apache/zookeeper/server/persistence/TxnLog.java 1483440 
>   /src/java/main/org/apache/zookeeper/server/quorum/LearnerHandler.java 1483440 
>   /src/java/test/org/apache/zookeeper/server/quorum/LearnerHandlerTest.java PRE-CREATION 
>   /src/java/test/org/apache/zookeeper/test/FollowerResyncConcurrencyTest.java 1483440 
>   /src/java/test/org/apache/zookeeper/test/GetProposalFromTxnTest.java PRE-CREATION 
>   /src/java/test/org/apache/zookeeper/test/LoadFromLogTest.java 1483440 
> 
> Diff: https://reviews.apache.org/r/11231/diff/
> 
> 
> Testing
> -------
> 
> - unit tests
> - ran in prod for more than half a year
> 
> 
> Thanks,
> 
> Thawan Kooburat
> 
>


Re: Review Request: ZOOKEEPER-1413: Use on-disk transaction log for learner sync up

Posted by Raul Gutierrez Segales <rg...@itevenworks.net>.

> On May 24, 2013, 12:19 a.m., fpj wrote:
> > /src/java/main/org/apache/zookeeper/server/TxnLogProposalIterator.java, line 67
> > <https://reviews.apache.org/r/11231/diff/1/?file=293955#file293955line67>
> >
> >     Ideally we propagate the exception up, but it is incompatible with the signature of the method unless we throw NoSuchElementException. What if we return null in the case of an exception?
> 
> Thawan Kooburat wrote:
>     IOException can be thrown from  
>     
>     hasNext = itr.next();
>     
>     which for prefetching the next element. 
>     
>     I think we only need to set hasNext to false when we see exception, so the client won't try to call next() again
>

Record#serialize() can also throw IOException. Are callers of this method expected to deal with a Proposal that isn't properly initialized (i.e.: won't have the QuorumPacket member won't be initialized so might have wrong zxid, type, etc)?


- Raul


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/11231/#review20941
-----------------------------------------------------------


On May 20, 2013, 5:40 a.m., Thawan Kooburat wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/11231/
> -----------------------------------------------------------
> 
> (Updated May 20, 2013, 5:40 a.m.)
> 
> 
> Review request for zookeeper.
> 
> 
> Description
> -------
> 
> ZOOKEEPER-1413: Use on-disk transaction log for learner sync up
> 
> - Use txnlog for learner synchronization if learner fall too far behind
> - Refactoring LearnerHandler to deal with different cases of handling learner synchronization  
> 
> 
> This addresses bug https://issues.apache.org/jira/browse/ZOOKEEPER-1413.
>     https://issues.apache.org/jira/browse/https://issues.apache.org/jira/browse/ZOOKEEPER-1413
> 
> 
> Diffs
> -----
> 
>   /src/java/main/org/apache/zookeeper/server/TxnLogProposalIterator.java PRE-CREATION 
>   /src/java/main/org/apache/zookeeper/server/ZKDatabase.java 1483440 
>   /src/java/main/org/apache/zookeeper/server/persistence/FileTxnLog.java 1483440 
>   /src/java/main/org/apache/zookeeper/server/persistence/FileTxnSnapLog.java 1483440 
>   /src/java/main/org/apache/zookeeper/server/persistence/TxnLog.java 1483440 
>   /src/java/main/org/apache/zookeeper/server/quorum/LearnerHandler.java 1483440 
>   /src/java/test/org/apache/zookeeper/server/quorum/LearnerHandlerTest.java PRE-CREATION 
>   /src/java/test/org/apache/zookeeper/test/FollowerResyncConcurrencyTest.java 1483440 
>   /src/java/test/org/apache/zookeeper/test/GetProposalFromTxnTest.java PRE-CREATION 
>   /src/java/test/org/apache/zookeeper/test/LoadFromLogTest.java 1483440 
> 
> Diff: https://reviews.apache.org/r/11231/diff/
> 
> 
> Testing
> -------
> 
> - unit tests
> - ran in prod for more than half a year
> 
> 
> Thanks,
> 
> Thawan Kooburat
> 
>


Re: Review Request: ZOOKEEPER-1413: Use on-disk transaction log for learner sync up

Posted by fp...@apache.org.

> On May 24, 2013, 12:19 a.m., fpj wrote:
> > /src/java/main/org/apache/zookeeper/server/persistence/FileTxnLog.java, line 545
> > <https://reviews.apache.org/r/11231/diff/1/?file=293957#file293957line545>
> >
> >     This fast forward parameter is a bit odd. Need to check why that's necessary and if there is an alternative.
> 
> Thawan Kooburat wrote:
>     From learner synchronization perspective, the leader what to send only txns which has zxid > peerLastSeenZxid.  I have to do this either in this layer or in the LearnerHandler. Doing in in this layer is easier to write a unit test and reduce code complexity in LearnerHandler. 
>     
>

Ok, I think my comment is mostly about style, I rarely see an init method with parameters, but there is nothing really broken about doing it.


> On May 24, 2013, 12:19 a.m., fpj wrote:
> > /src/java/main/org/apache/zookeeper/server/persistence/FileTxnSnapLog.java, line 200
> > <https://reviews.apache.org/r/11231/diff/1/?file=293958#file293958line200>
> >
> >     Should we call this method and the one below something like readTxnLog?
> 
> Thawan Kooburat wrote:
>     This is an existing method, I think we should only rename a method only if its semantic has changed or the name is really miss leading.

I think the read method you say exists is in FileTxnLog. My comment is about having a read() method in FileTxnSnapLog because the read could be either for a txn log or a snapshot. I don't think we would be renaming an existing method, but perhaps I'm not getting your comment right.


> On May 24, 2013, 12:19 a.m., fpj wrote:
> > /src/java/main/org/apache/zookeeper/server/quorum/LearnerHandler.java, line 677
> > <https://reviews.apache.org/r/11231/diff/1/?file=293960#file293960line677>
> >
> >     I think this means to say that there is no txn for this epoch yet because the leader of the epoch is just starting, is it right?
> 
> Thawan Kooburat wrote:
>     Any zxid should point to a valid txn.  However, this zxid point to NEWLEADER txn which is never persist to txnlog. If I don't have special handling for this case, the leader will think that this learner has a txn which it has never seen. So it will send TRUNC message to the learner and cause it to crash. 
>     
>     Some follower may report that it has this zxid, if it gets a snapshot when the quorum is recently formed or get NEWLEADER package as part of outstanding proposals.  
>     
>     You may also noticed that this patch only changes the leader side. This is intentional because I need to support rolling upgrade.

Ok, sounds good.


> On May 24, 2013, 12:19 a.m., fpj wrote:
> > /src/java/main/org/apache/zookeeper/server/quorum/LearnerHandler.java, line 699
> > <https://reviews.apache.org/r/11231/diff/1/?file=293960#file293960line699>
> >
> >     Make the log info conditional or use curly braces.
> 
> Thawan Kooburat wrote:
>     You mean that I should use this? http://www.slf4j.org/apidocs/org/slf4j/Logger.html#info%28java.lang.String,%20java.lang.Object...%29 
>

Yep!


> On May 24, 2013, 12:19 a.m., fpj wrote:
> > /src/java/main/org/apache/zookeeper/server/quorum/LearnerHandler.java, line 726
> > <https://reviews.apache.org/r/11231/diff/1/?file=293960#file293960line726>
> >
> >     I'm not sure I understand the second part of this case. Is this referring to a follower that is starting with a clean state?
> 
> Thawan Kooburat wrote:
>     See my comment about NEWLEADER zxid above

Ok.


> On May 24, 2013, 12:19 a.m., fpj wrote:
> > /src/java/main/org/apache/zookeeper/server/quorum/LearnerHandler.java, line 806
> > <https://reviews.apache.org/r/11231/diff/1/?file=293960#file293960line806>
> >
> >     This comment implies that if peerLastZxid == maxZxid, we still queue maxZxid, but it is not necessary, right? I have a related comment in one of the test cases.
> 
> Thawan Kooburat wrote:
>     If there is a better math notation, let me know. This method won't queue maxZxid in this case.

No, I think this is ok, I just wanted to make sure I had the right understanding. 


> On May 24, 2013, 12:19 a.m., fpj wrote:
> > /src/java/main/org/apache/zookeeper/server/quorum/LearnerHandler.java, line 861
> > <https://reviews.apache.org/r/11231/diff/1/?file=293960#file293960line861>
> >
> >     I'm confused about this case. The only way I can think of right now of falling into this case is if the follower has tried to join an epoch that has never become established. 
> >     
> >     Why does the learner crash?
> 
> Thawan Kooburat wrote:
>     This is also a case when quorum is formed and it has never see any request.  Then, we restart the quorum.

When you say that a quorum is formed, which point of the protocol you're referring to? When a quorum has acknowledged NEWLEADER or some earlier point?


> On May 24, 2013, 12:19 a.m., fpj wrote:
> > /src/java/main/org/apache/zookeeper/server/quorum/LearnerHandler.java, line 866
> > <https://reviews.apache.org/r/11231/diff/1/?file=293960#file293960line866>
> >
> >     This seems to be more than a warn, right? It should be either error or fatal.
> 
> Thawan Kooburat wrote:
>     Should be WARN or INFO because we just send SNAP whenever we have a case that we cannot handle. It is correct but not optimized.

You're right, but the logic doesn't seem completely straightforward to me. Is it the case that we end up sending a snapshot because of the special case handled by this if block? 

        if (needOpPacket && !needSnap) {
            // This should never happen, but we should fall back to sending
            // snapshot just in case.

            LOG.error("Unhandled scenario for peer sid: " +  getSid() +
                     " fall back to use snapshot");

            needSnap = true;
        }

If this is right, why do we need special handling instead of just capturing the fact we need a snapshot from the return value of queuedCommittedProposal?


> On May 24, 2013, 12:19 a.m., fpj wrote:
> > /src/java/test/org/apache/zookeeper/server/quorum/LearnerHandlerTest.java, line 216
> > <https://reviews.apache.org/r/11231/diff/1/?file=293961#file293961line216>
> >
> >     I'm wondering why we should expect to have a packet in the queue, the peer is already synced. As I understand this is the case in which (peerLaxtZxid, maxZxid] = (1, 1].
> 
> Thawan Kooburat wrote:
>     DIFF packet

Ok.


> On May 24, 2013, 12:19 a.m., fpj wrote:
> > /src/java/main/org/apache/zookeeper/server/TxnLogProposalIterator.java, line 67
> > <https://reviews.apache.org/r/11231/diff/1/?file=293955#file293955line67>
> >
> >     Ideally we propagate the exception up, but it is incompatible with the signature of the method unless we throw NoSuchElementException. What if we return null in the case of an exception?
> 
> Thawan Kooburat wrote:
>     IOException can be thrown from  
>     
>     hasNext = itr.next();
>     
>     which for prefetching the next element. 
>     
>     I think we only need to set hasNext to false when we see exception, so the client won't try to call next() again
>
> 
> Raul Gutierrez Segales wrote:
>     Record#serialize() can also throw IOException. Are callers of this method expected to deal with a Proposal that isn't properly initialized (i.e.: won't have the QuorumPacket member won't be initialized so might have wrong zxid, type, etc)?
> 
> Thawan Kooburat wrote:
>     itr.next() is the only place that actually perform IO-related operation. 
>     
>     Record#serialize() can thrown IO exception if the output stream is writing to network or disk. However, in this case, we are writing it to memory because we are using BinaryOutputArchive/ByteArrayOutputStream.  I looked at the code, these implementation cannot throw IOException but they have to inherit the method signature of output stream.

My concern here is that we simply log the error and keep making progress regularly, that's why I suggested we could try to propagate an exception up, but it is not completely straightforward because of the reasons I stated. If a server acks a txn but later cannot read it, then we might end up in an inconsistent state.  


- fpj


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/11231/#review20941
-----------------------------------------------------------


On May 20, 2013, 5:40 a.m., Thawan Kooburat wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/11231/
> -----------------------------------------------------------
> 
> (Updated May 20, 2013, 5:40 a.m.)
> 
> 
> Review request for zookeeper.
> 
> 
> Description
> -------
> 
> ZOOKEEPER-1413: Use on-disk transaction log for learner sync up
> 
> - Use txnlog for learner synchronization if learner fall too far behind
> - Refactoring LearnerHandler to deal with different cases of handling learner synchronization  
> 
> 
> This addresses bug https://issues.apache.org/jira/browse/ZOOKEEPER-1413.
>     https://issues.apache.org/jira/browse/https://issues.apache.org/jira/browse/ZOOKEEPER-1413
> 
> 
> Diffs
> -----
> 
>   /src/java/main/org/apache/zookeeper/server/TxnLogProposalIterator.java PRE-CREATION 
>   /src/java/main/org/apache/zookeeper/server/ZKDatabase.java 1483440 
>   /src/java/main/org/apache/zookeeper/server/persistence/FileTxnLog.java 1483440 
>   /src/java/main/org/apache/zookeeper/server/persistence/FileTxnSnapLog.java 1483440 
>   /src/java/main/org/apache/zookeeper/server/persistence/TxnLog.java 1483440 
>   /src/java/main/org/apache/zookeeper/server/quorum/LearnerHandler.java 1483440 
>   /src/java/test/org/apache/zookeeper/server/quorum/LearnerHandlerTest.java PRE-CREATION 
>   /src/java/test/org/apache/zookeeper/test/FollowerResyncConcurrencyTest.java 1483440 
>   /src/java/test/org/apache/zookeeper/test/GetProposalFromTxnTest.java PRE-CREATION 
>   /src/java/test/org/apache/zookeeper/test/LoadFromLogTest.java 1483440 
> 
> Diff: https://reviews.apache.org/r/11231/diff/
> 
> 
> Testing
> -------
> 
> - unit tests
> - ran in prod for more than half a year
> 
> 
> Thanks,
> 
> Thawan Kooburat
> 
>


Re: Review Request: ZOOKEEPER-1413: Use on-disk transaction log for learner sync up

Posted by Thawan Kooburat <th...@fb.com>.

> On May 24, 2013, 12:19 a.m., fpj wrote:
> > /src/java/main/org/apache/zookeeper/server/persistence/FileTxnSnapLog.java, line 200
> > <https://reviews.apache.org/r/11231/diff/1/?file=293958#file293958line200>
> >
> >     Should we call this method and the one below something like readTxnLog?
> 
> Thawan Kooburat wrote:
>     This is an existing method, I think we should only rename a method only if its semantic has changed or the name is really miss leading.
> 
> fpj wrote:
>     I think the read method you say exists is in FileTxnLog. My comment is about having a read() method in FileTxnSnapLog because the read could be either for a txn log or a snapshot. I don't think we would be renaming an existing method, but perhaps I'm not getting your comment right.

Yes, you are right. I was confused with FileTxnLog.read(). Sure, I can rename this one. 


> On May 24, 2013, 12:19 a.m., fpj wrote:
> > /src/java/main/org/apache/zookeeper/server/quorum/LearnerHandler.java, line 866
> > <https://reviews.apache.org/r/11231/diff/1/?file=293960#file293960line866>
> >
> >     This seems to be more than a warn, right? It should be either error or fatal.
> 
> Thawan Kooburat wrote:
>     Should be WARN or INFO because we just send SNAP whenever we have a case that we cannot handle. It is correct but not optimized.
> 
> fpj wrote:
>     You're right, but the logic doesn't seem completely straightforward to me. Is it the case that we end up sending a snapshot because of the special case handled by this if block? 
>     
>             if (needOpPacket && !needSnap) {
>                 // This should never happen, but we should fall back to sending
>                 // snapshot just in case.
>     
>                 LOG.error("Unhandled scenario for peer sid: " +  getSid() +
>                          " fall back to use snapshot");
>     
>                 needSnap = true;
>             }
>     
>     If this is right, why do we need special handling instead of just capturing the fact we need a snapshot from the return value of queuedCommittedProposal?
>

This block just act as a safety net to send SNAP if there is case that we forgot to handle (in addition to this one).  When I thought about this carefully, I am not sure if I ever see this log line (Cannot send TRUNC to peer sid...) in our prod or not. 


> On May 24, 2013, 12:19 a.m., fpj wrote:
> > /src/java/main/org/apache/zookeeper/server/quorum/LearnerHandler.java, line 861
> > <https://reviews.apache.org/r/11231/diff/1/?file=293960#file293960line861>
> >
> >     I'm confused about this case. The only way I can think of right now of falling into this case is if the follower has tried to join an epoch that has never become established. 
> >     
> >     Why does the learner crash?
> 
> Thawan Kooburat wrote:
>     This is also a case when quorum is formed and it has never see any request.  Then, we restart the quorum.
> 
> fpj wrote:
>     When you say that a quorum is formed, which point of the protocol you're referring to? When a quorum has acknowledged NEWLEADER or some earlier point?

I think the case that prompt me to add NewEpochZxid-related stuff is below .   Here the leader has seen the follower transaction T(epoch, zxid)   

1. T(1, 4)
2. T(2, 0)  - NEWLEADER
3. T(3, 0)  - NEWLEADER
4. T(3, 1)

If a follower joined a quorum at epoch 2 and die. Its peerLastZxid can be T(2,0) if it get a snapshot from the leader, or it gets NEWLEADER packet via outstanding proposals.

When the same follower try to join the quorum again, the leader will try to ask the follower to TRUNC to T(1,4) which is not possible because there is no txnlog to truncate and the learner will crash. 

The fix is to send a DIFF starting at T(1,4) instead.  However, I think I added a block that handle (Cannot send TRUNC to peer ...) just for safety based on the observation that TRUNC can only work if it is within the same epoch.   


- Thawan


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/11231/#review20941
-----------------------------------------------------------


On May 20, 2013, 5:40 a.m., Thawan Kooburat wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/11231/
> -----------------------------------------------------------
> 
> (Updated May 20, 2013, 5:40 a.m.)
> 
> 
> Review request for zookeeper.
> 
> 
> Description
> -------
> 
> ZOOKEEPER-1413: Use on-disk transaction log for learner sync up
> 
> - Use txnlog for learner synchronization if learner fall too far behind
> - Refactoring LearnerHandler to deal with different cases of handling learner synchronization  
> 
> 
> This addresses bug https://issues.apache.org/jira/browse/ZOOKEEPER-1413.
>     https://issues.apache.org/jira/browse/https://issues.apache.org/jira/browse/ZOOKEEPER-1413
> 
> 
> Diffs
> -----
> 
>   /src/java/main/org/apache/zookeeper/server/TxnLogProposalIterator.java PRE-CREATION 
>   /src/java/main/org/apache/zookeeper/server/ZKDatabase.java 1483440 
>   /src/java/main/org/apache/zookeeper/server/persistence/FileTxnLog.java 1483440 
>   /src/java/main/org/apache/zookeeper/server/persistence/FileTxnSnapLog.java 1483440 
>   /src/java/main/org/apache/zookeeper/server/persistence/TxnLog.java 1483440 
>   /src/java/main/org/apache/zookeeper/server/quorum/LearnerHandler.java 1483440 
>   /src/java/test/org/apache/zookeeper/server/quorum/LearnerHandlerTest.java PRE-CREATION 
>   /src/java/test/org/apache/zookeeper/test/FollowerResyncConcurrencyTest.java 1483440 
>   /src/java/test/org/apache/zookeeper/test/GetProposalFromTxnTest.java PRE-CREATION 
>   /src/java/test/org/apache/zookeeper/test/LoadFromLogTest.java 1483440 
> 
> Diff: https://reviews.apache.org/r/11231/diff/
> 
> 
> Testing
> -------
> 
> - unit tests
> - ran in prod for more than half a year
> 
> 
> Thanks,
> 
> Thawan Kooburat
> 
>


Re: Review Request: ZOOKEEPER-1413: Use on-disk transaction log for learner sync up

Posted by fp...@apache.org.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/11231/#review20941
-----------------------------------------------------------


Awesome job, Thawan. I have some comments and questions below, most of them are minor.


/src/java/main/org/apache/zookeeper/server/TxnLogProposalIterator.java
<https://reviews.apache.org/r/11231/#comment43273>

    "... provides an..."



/src/java/main/org/apache/zookeeper/server/TxnLogProposalIterator.java
<https://reviews.apache.org/r/11231/#comment43288>

    Ideally we propagate the exception up, but it is incompatible with the signature of the method unless we throw NoSuchElementException. What if we return null in the case of an exception?



/src/java/main/org/apache/zookeeper/server/ZKDatabase.java
<https://reviews.apache.org/r/11231/#comment43174>

    1/3 seems reasonable but arbitrary. I'm curious to know if there is a concrete reason behind the choice or if it just feels right for the default.



/src/java/main/org/apache/zookeeper/server/persistence/FileTxnLog.java
<https://reviews.apache.org/r/11231/#comment43310>

    What if we do the fast forwarding in this method instead of having a boolean passed to init?



/src/java/main/org/apache/zookeeper/server/persistence/FileTxnLog.java
<https://reviews.apache.org/r/11231/#comment43178>

    This fast forward parameter is a bit odd. Need to check why that's necessary and if there is an alternative.



/src/java/main/org/apache/zookeeper/server/persistence/FileTxnSnapLog.java
<https://reviews.apache.org/r/11231/#comment43315>

    Should we call this method and the one below something like readTxnLog?



/src/java/main/org/apache/zookeeper/server/quorum/LearnerHandler.java
<https://reviews.apache.org/r/11231/#comment43325>

    I think this means to say that there is no txn for this epoch yet because the leader of the epoch is just starting, is it right?



/src/java/main/org/apache/zookeeper/server/quorum/LearnerHandler.java
<https://reviews.apache.org/r/11231/#comment43326>

    Make the log info conditional or use curly braces.



/src/java/main/org/apache/zookeeper/server/quorum/LearnerHandler.java
<https://reviews.apache.org/r/11231/#comment43272>

    typo



/src/java/main/org/apache/zookeeper/server/quorum/LearnerHandler.java
<https://reviews.apache.org/r/11231/#comment43327>

    I'm not sure I understand the second part of this case. Is this referring to a follower that is starting with a clean state?



/src/java/main/org/apache/zookeeper/server/quorum/LearnerHandler.java
<https://reviews.apache.org/r/11231/#comment43328>

    "send send"



/src/java/main/org/apache/zookeeper/server/quorum/LearnerHandler.java
<https://reviews.apache.org/r/11231/#comment43349>

    This comment implies that if peerLastZxid == maxZxid, we still queue maxZxid, but it is not necessary, right? I have a related comment in one of the test cases.



/src/java/main/org/apache/zookeeper/server/quorum/LearnerHandler.java
<https://reviews.apache.org/r/11231/#comment43346>

    Zab



/src/java/main/org/apache/zookeeper/server/quorum/LearnerHandler.java
<https://reviews.apache.org/r/11231/#comment43329>

    suggested name for the method: queueCommittedProposals



/src/java/main/org/apache/zookeeper/server/quorum/LearnerHandler.java
<https://reviews.apache.org/r/11231/#comment43340>

    ... when we see the...



/src/java/main/org/apache/zookeeper/server/quorum/LearnerHandler.java
<https://reviews.apache.org/r/11231/#comment43335>

    I'm confused about this case. The only way I can think of right now of falling into this case is if the follower has tried to join an epoch that has never become established. 
    
    Why does the learner crash?



/src/java/main/org/apache/zookeeper/server/quorum/LearnerHandler.java
<https://reviews.apache.org/r/11231/#comment43333>

    ... if it is asked to do so...



/src/java/main/org/apache/zookeeper/server/quorum/LearnerHandler.java
<https://reviews.apache.org/r/11231/#comment43332>

    This seems to be more than a warn, right? It should be either error or fatal.



/src/java/test/org/apache/zookeeper/server/quorum/LearnerHandlerTest.java
<https://reviews.apache.org/r/11231/#comment43338>

    I'm wondering why we should expect to have a packet in the queue, the peer is already synced. As I understand this is the case in which (peerLaxtZxid, maxZxid] = (1, 1].


- fpj


On May 20, 2013, 5:40 a.m., Thawan Kooburat wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/11231/
> -----------------------------------------------------------
> 
> (Updated May 20, 2013, 5:40 a.m.)
> 
> 
> Review request for zookeeper.
> 
> 
> Description
> -------
> 
> ZOOKEEPER-1413: Use on-disk transaction log for learner sync up
> 
> - Use txnlog for learner synchronization if learner fall too far behind
> - Refactoring LearnerHandler to deal with different cases of handling learner synchronization  
> 
> 
> This addresses bug https://issues.apache.org/jira/browse/ZOOKEEPER-1413.
>     https://issues.apache.org/jira/browse/https://issues.apache.org/jira/browse/ZOOKEEPER-1413
> 
> 
> Diffs
> -----
> 
>   /src/java/main/org/apache/zookeeper/server/TxnLogProposalIterator.java PRE-CREATION 
>   /src/java/main/org/apache/zookeeper/server/ZKDatabase.java 1483440 
>   /src/java/main/org/apache/zookeeper/server/persistence/FileTxnLog.java 1483440 
>   /src/java/main/org/apache/zookeeper/server/persistence/FileTxnSnapLog.java 1483440 
>   /src/java/main/org/apache/zookeeper/server/persistence/TxnLog.java 1483440 
>   /src/java/main/org/apache/zookeeper/server/quorum/LearnerHandler.java 1483440 
>   /src/java/test/org/apache/zookeeper/server/quorum/LearnerHandlerTest.java PRE-CREATION 
>   /src/java/test/org/apache/zookeeper/test/FollowerResyncConcurrencyTest.java 1483440 
>   /src/java/test/org/apache/zookeeper/test/GetProposalFromTxnTest.java PRE-CREATION 
>   /src/java/test/org/apache/zookeeper/test/LoadFromLogTest.java 1483440 
> 
> Diff: https://reviews.apache.org/r/11231/diff/
> 
> 
> Testing
> -------
> 
> - unit tests
> - ran in prod for more than half a year
> 
> 
> Thanks,
> 
> Thawan Kooburat
> 
>


Re: Review Request: ZOOKEEPER-1413: Use on-disk transaction log for learner sync up

Posted by fp...@apache.org.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/11231/#review21111
-----------------------------------------------------------



/src/java/main/org/apache/zookeeper/server/persistence/FileTxnSnapLog.java
<https://reviews.apache.org/r/11231/#comment43844>

    I think there is a read call in FileTxnLog, but not in FileTxnSnapLog. My recommendation is mostly because the read in FileTxnSnapLog could be both of the txn logs or snapshots. I don't think you would be renaming a method, but perhaps I'm not getting it right.  


- fpj


On May 20, 2013, 5:40 a.m., Thawan Kooburat wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/11231/
> -----------------------------------------------------------
> 
> (Updated May 20, 2013, 5:40 a.m.)
> 
> 
> Review request for zookeeper.
> 
> 
> Description
> -------
> 
> ZOOKEEPER-1413: Use on-disk transaction log for learner sync up
> 
> - Use txnlog for learner synchronization if learner fall too far behind
> - Refactoring LearnerHandler to deal with different cases of handling learner synchronization  
> 
> 
> This addresses bug https://issues.apache.org/jira/browse/ZOOKEEPER-1413.
>     https://issues.apache.org/jira/browse/https://issues.apache.org/jira/browse/ZOOKEEPER-1413
> 
> 
> Diffs
> -----
> 
>   /src/java/main/org/apache/zookeeper/server/TxnLogProposalIterator.java PRE-CREATION 
>   /src/java/main/org/apache/zookeeper/server/ZKDatabase.java 1483440 
>   /src/java/main/org/apache/zookeeper/server/persistence/FileTxnLog.java 1483440 
>   /src/java/main/org/apache/zookeeper/server/persistence/FileTxnSnapLog.java 1483440 
>   /src/java/main/org/apache/zookeeper/server/persistence/TxnLog.java 1483440 
>   /src/java/main/org/apache/zookeeper/server/quorum/LearnerHandler.java 1483440 
>   /src/java/test/org/apache/zookeeper/server/quorum/LearnerHandlerTest.java PRE-CREATION 
>   /src/java/test/org/apache/zookeeper/test/FollowerResyncConcurrencyTest.java 1483440 
>   /src/java/test/org/apache/zookeeper/test/GetProposalFromTxnTest.java PRE-CREATION 
>   /src/java/test/org/apache/zookeeper/test/LoadFromLogTest.java 1483440 
> 
> Diff: https://reviews.apache.org/r/11231/diff/
> 
> 
> Testing
> -------
> 
> - unit tests
> - ran in prod for more than half a year
> 
> 
> Thanks,
> 
> Thawan Kooburat
> 
>


Re: Review Request: ZOOKEEPER-1413: Use on-disk transaction log for learner sync up

Posted by Thawan Kooburat <th...@fb.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/11231/
-----------------------------------------------------------

(Updated June 24, 2013, 8:40 a.m.)


Review request for zookeeper.


Changes
-------

- Move to new SLF4J.  The new version (1.7) require JDK5.  (ZK require 1.6 at the moment).  This is required to support new method signature. I can split this into a separate jira as well if needed
- Fix other comments and refactoring 


Description
-------

ZOOKEEPER-1413: Use on-disk transaction log for learner sync up

- Use txnlog for learner synchronization if learner fall too far behind
- Refactoring LearnerHandler to deal with different cases of handling learner synchronization  


This addresses bug https://issues.apache.org/jira/browse/ZOOKEEPER-1413.
    https://issues.apache.org/jira/browse/https://issues.apache.org/jira/browse/ZOOKEEPER-1413


Diffs (updated)
-----

  /ivy.xml 1495522 
  /src/java/main/org/apache/zookeeper/server/TxnLogProposalIterator.java PRE-CREATION 
  /src/java/main/org/apache/zookeeper/server/ZKDatabase.java 1495522 
  /src/java/main/org/apache/zookeeper/server/persistence/FileTxnLog.java 1495522 
  /src/java/main/org/apache/zookeeper/server/persistence/FileTxnSnapLog.java 1495522 
  /src/java/main/org/apache/zookeeper/server/persistence/TxnLog.java 1495522 
  /src/java/main/org/apache/zookeeper/server/quorum/LearnerHandler.java 1495522 
  /src/java/test/org/apache/zookeeper/server/quorum/LearnerHandlerTest.java PRE-CREATION 
  /src/java/test/org/apache/zookeeper/test/FollowerResyncConcurrencyTest.java 1495522 
  /src/java/test/org/apache/zookeeper/test/GetProposalFromTxnTest.java PRE-CREATION 
  /src/java/test/org/apache/zookeeper/test/LoadFromLogTest.java 1495522 

Diff: https://reviews.apache.org/r/11231/diff/


Testing
-------

- unit tests
- ran in prod for more than half a year


Thanks,

Thawan Kooburat


Re: Review Request: ZOOKEEPER-1413: Use on-disk transaction log for learner sync up

Posted by Thawan Kooburat <th...@fb.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/11231/#review20979
-----------------------------------------------------------


- Thawan Kooburat


On May 20, 2013, 5:40 a.m., Thawan Kooburat wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/11231/
> -----------------------------------------------------------
> 
> (Updated May 20, 2013, 5:40 a.m.)
> 
> 
> Review request for zookeeper.
> 
> 
> Description
> -------
> 
> ZOOKEEPER-1413: Use on-disk transaction log for learner sync up
> 
> - Use txnlog for learner synchronization if learner fall too far behind
> - Refactoring LearnerHandler to deal with different cases of handling learner synchronization  
> 
> 
> This addresses bug https://issues.apache.org/jira/browse/ZOOKEEPER-1413.
>     https://issues.apache.org/jira/browse/https://issues.apache.org/jira/browse/ZOOKEEPER-1413
> 
> 
> Diffs
> -----
> 
>   /src/java/main/org/apache/zookeeper/server/TxnLogProposalIterator.java PRE-CREATION 
>   /src/java/main/org/apache/zookeeper/server/ZKDatabase.java 1483440 
>   /src/java/main/org/apache/zookeeper/server/persistence/FileTxnLog.java 1483440 
>   /src/java/main/org/apache/zookeeper/server/persistence/FileTxnSnapLog.java 1483440 
>   /src/java/main/org/apache/zookeeper/server/persistence/TxnLog.java 1483440 
>   /src/java/main/org/apache/zookeeper/server/quorum/LearnerHandler.java 1483440 
>   /src/java/test/org/apache/zookeeper/server/quorum/LearnerHandlerTest.java PRE-CREATION 
>   /src/java/test/org/apache/zookeeper/test/FollowerResyncConcurrencyTest.java 1483440 
>   /src/java/test/org/apache/zookeeper/test/GetProposalFromTxnTest.java PRE-CREATION 
>   /src/java/test/org/apache/zookeeper/test/LoadFromLogTest.java 1483440 
> 
> Diff: https://reviews.apache.org/r/11231/diff/
> 
> 
> Testing
> -------
> 
> - unit tests
> - ran in prod for more than half a year
> 
> 
> Thanks,
> 
> Thawan Kooburat
> 
>