Posted to dev@bookkeeper.apache.org by "Sijie Guo (Created) (JIRA)" <ji...@apache.org> on 2012/01/11 11:11:43 UTC
[jira] [Created] (BOOKKEEPER-152) Can't recover a ledger whose
current ensemble contain failed bookie.
Can't recover a ledger whose current ensemble contain failed bookie.
--------------------------------------------------------------------
Key: BOOKKEEPER-152
URL: https://issues.apache.org/jira/browse/BOOKKEEPER-152
Project: Bookkeeper
Issue Type: Bug
Components: bookkeeper-client
Affects Versions: 4.0.0
Reporter: Sijie Guo
Fix For: 4.1.0
Suppose we have an unclosed ledger L whose ensemble size is 2 and quorum size is 2. The ledger's current ensemble is <bk1, bk2>.
bk2 crashes.
We use the recovery tool to recover the entries on bk2: $ bookkeeper-server/bin/bookkeeper org.apache.bookkeeper.tools.BookKeeperTools bk2
Recovery fails because the recovery tool can't open ledger L: with bk2 down, ledger L doesn't have enough bookies to form a quorum for readLastConfirmed. (asyncOpenLedgerNoRecovery)
--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (BOOKKEEPER-152) Can't recover a ledger whose
current ensemble contain failed bookie.
Posted by "jiraposter@reviews.apache.org (Commented) (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/BOOKKEEPER-152?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13199057#comment-13199057 ]
jiraposter@reviews.apache.org commented on BOOKKEEPER-152:
----------------------------------------------------------
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/3737/
-----------------------------------------------------------
Review request for bookkeeper.
Summary
-------
Proposed fix ensures that at least one of each quorum replies to ReadLastConfirmed.
Refactors code a bit to make the read last confirmed common for recovery and standalone read last confirmed.
The bug here was that we were waiting for quorumSize responses from the bookies, when all we really need is a response from one bookie in each possible quorum. In the 2/2 case above, this means only 1 bookie needs to respond.
There's a fix for the timeouts and an improvement in fencing which fixing this uncovered.
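The "one response from each possible quorum" condition in the summary above can be sketched roughly as follows. This is only an illustration, not the code in the patch: the class QuorumCoverage and its methods are hypothetical names, and the sketch assumes round-robin striping, where each quorum is quorumSize consecutive bookies in the ensemble (wrapping around at the end).

```java
import java.util.BitSet;

/**
 * Hypothetical helper, not part of BookKeeper: records which bookies in an
 * ensemble have replied and checks whether every possible quorum contains
 * at least one bookie that replied.
 *
 * Assumption: round-robin striping, so the quorum starting at index s is
 * the quorumSize consecutive bookies s, s+1, ... (mod ensembleSize).
 */
final class QuorumCoverage {
    private final int ensembleSize;
    private final int quorumSize;
    private final BitSet responded;

    QuorumCoverage(int ensembleSize, int quorumSize) {
        if (quorumSize < 1 || quorumSize > ensembleSize) {
            throw new IllegalArgumentException("need 1 <= quorumSize <= ensembleSize");
        }
        this.ensembleSize = ensembleSize;
        this.quorumSize = quorumSize;
        this.responded = new BitSet(ensembleSize);
    }

    /** Record a successful reply from the bookie at the given ensemble index. */
    void addResponse(int bookieIndex) {
        if (bookieIndex < 0 || bookieIndex >= ensembleSize) {
            throw new IndexOutOfBoundsException("bookieIndex: " + bookieIndex);
        }
        responded.set(bookieIndex);
    }

    /** True once every possible quorum has at least one bookie that replied. */
    boolean coversEveryQuorum() {
        for (int start = 0; start < ensembleSize; start++) {
            boolean covered = false;
            for (int i = 0; i < quorumSize && !covered; i++) {
                covered = responded.get((start + i) % ensembleSize);
            }
            if (!covered) {
                return false;
            }
        }
        return true;
    }
}
```

With ensemble size 2 and quorum size 2, a single addResponse(0) (bk1 answering while bk2 is down) is enough for coversEveryQuorum() to return true, which matches the 2/2 case in the report. With ensemble size 3 and quorum size 2, one reply is not enough, because the quorum made of the other two bookies is still uncovered.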
This addresses bug BOOKKEEPER-152.
https://issues.apache.org/jira/browse/BOOKKEEPER-152
Diffs
-----
bookkeeper-server/src/main/java/org/apache/bookkeeper/client/DistributionSchedule.java f2ed6bd
bookkeeper-server/src/main/java/org/apache/bookkeeper/client/LedgerHandle.java 547e240
bookkeeper-server/src/main/java/org/apache/bookkeeper/client/LedgerRecoveryOp.java 4625bbb
bookkeeper-server/src/main/java/org/apache/bookkeeper/client/ReadLastConfirmedOp.java 43e999d
bookkeeper-server/src/main/java/org/apache/bookkeeper/client/RoundRobinDistributionSchedule.java 4a88747
bookkeeper-server/src/main/java/org/apache/bookkeeper/proto/PerChannelBookieClient.java ded1379
bookkeeper-server/src/test/java/org/apache/bookkeeper/client/BookieRecoveryTest.java 8526db5
bookkeeper-server/src/test/java/org/apache/bookkeeper/test/BaseTestCase.java db1a763
bookkeeper-server/src/test/java/org/apache/bookkeeper/test/BookieFailureTest.java 5873255
bookkeeper-server/src/test/java/org/apache/bookkeeper/test/LedgerRecoveryTest.java 77a2f69
Diff: https://reviews.apache.org/r/3737/diff
Testing
-------
Thanks,
Ivan
> Can't recover a ledger whose current ensemble contain failed bookie.
> --------------------------------------------------------------------
>
> Key: BOOKKEEPER-152
> URL: https://issues.apache.org/jira/browse/BOOKKEEPER-152
> Project: Bookkeeper
> Issue Type: Bug
> Components: bookkeeper-client
> Affects Versions: 4.0.0
> Reporter: Sijie Guo
> Assignee: Ivan Kelly
> Fix For: 4.1.0
>
> Attachments: BK-152.draft.patch, BOOKKEEPER-152.diff
>
>
> Suppose we have an unclosed ledger L whose ensemble size is 2 and quorum size is 2. The ledger's current ensemble is <bk1, bk2>.
> bk2 crashes.
> We use the recovery tool to recover the entries on bk2: $ bookkeeper-server/bin/bookkeeper org.apache.bookkeeper.tools.BookKeeperTools bk2
> Recovery fails because the recovery tool can't open ledger L: with bk2 down, ledger L doesn't have enough bookies to form a quorum for readLastConfirmed. (asyncOpenLedgerNoRecovery)
[jira] [Commented] (BOOKKEEPER-152) Can't recover a ledger whose
current ensemble contain failed bookie.
Posted by "Sijie Guo (Commented) (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/BOOKKEEPER-152?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13198751#comment-13198751 ]
Sijie Guo commented on BOOKKEEPER-152:
--------------------------------------
> modify LedgerRecoveryOp to access timeout as a valid response
If I remember correctly, when a bookie has crashed the operation returns a could-not-connect exception, not a timeout exception.
[jira] [Commented] (BOOKKEEPER-152) Can't recover a ledger whose
current ensemble contain failed bookie.
Posted by "jiraposter@reviews.apache.org (Commented) (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/BOOKKEEPER-152?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13206801#comment-13206801 ]
jiraposter@reviews.apache.org commented on BOOKKEEPER-152:
----------------------------------------------------------
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/3737/#review5050
-----------------------------------------------------------
Ship it!
+1
- Sijie
[jira] [Commented] (BOOKKEEPER-152) Can't recover a ledger whose
current ensemble contain failed bookie.
Posted by "jiraposter@reviews.apache.org (Commented) (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/BOOKKEEPER-152?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13205330#comment-13205330 ]
jiraposter@reviews.apache.org commented on BOOKKEEPER-152:
----------------------------------------------------------
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/3737/#review5006
-----------------------------------------------------------
Most of it looks good to me, but it seems that you didn't assign the right last confirmed value in the readLastConfirmedOp callback.
bookkeeper-server/src/main/java/org/apache/bookkeeper/client/LedgerRecoveryOp.java
<https://reviews.apache.org/r/3737/#comment11009>
Actually, the result is lastAddConfirmed, not maxAddConfirmed. maxAddConfirmed is a member variable of LedgerRecoveryOp.
- Sijie
[jira] [Updated] (BOOKKEEPER-152) Can't recover a ledger whose
current ensemble contain failed bookie.
Posted by "Sijie Guo (Updated) (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/BOOKKEEPER-152?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Sijie Guo updated BOOKKEEPER-152:
---------------------------------
Attachment: BK-152.draft.patch
Attached a draft patch.
The idea is to add a list of excluded bookies to openLedgerNoRecovery. If an excluded bookie is in the quorum set that the maxAddConfirmed entry belongs to, the bookie client only needs to wait for quorumSize-n responses to readLastConfirmed.
The recovery tool can then pass the failed bookie as the excluded bookie.
What do you think?
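As a rough illustration of the draft idea above (not the code in BK-152.draft.patch), the number of responses to wait for could be computed like this. The class and method names are hypothetical, and the sketch assumes that n means the number of excluded bookies that fall inside the entry's quorum, which the message does not spell out. It also assumes round-robin striping, with the quorum for an entry starting at index entryId mod ensemble size.

```java
import java.util.List;
import java.util.Set;

/**
 * Hypothetical sketch, not BookKeeper code: how many readLastConfirmed
 * responses to wait for when some bookies are known to have failed.
 */
final class ExcludedBookieQuorum {
    private ExcludedBookieQuorum() {}

    /**
     * Returns quorumSize - n, where n is ASSUMED to be the number of
     * excluded bookies inside the quorum that the given entry belongs to.
     * The quorum is taken to be quorumSize consecutive ensemble members
     * starting at entryId mod ensemble size (round-robin striping).
     */
    static int requiredResponses(List<String> ensemble, int quorumSize,
                                 long entryId, Set<String> excluded) {
        int size = ensemble.size();
        if (quorumSize < 1 || quorumSize > size) {
            throw new IllegalArgumentException("need 1 <= quorumSize <= ensemble size");
        }
        // floorMod keeps the index non-negative even for a negative entry id.
        int start = (int) Math.floorMod(entryId, (long) size);
        int n = 0;
        for (int i = 0; i < quorumSize; i++) {
            if (excluded.contains(ensemble.get((start + i) % size))) {
                n++;
            }
        }
        return quorumSize - n;
    }
}
```

For the ensemble <bk1, bk2> with quorum size 2 and bk2 excluded, this returns 1, so the client would wait for bk1 only. If the excluded bookie is outside the entry's quorum, the result stays at quorumSize.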
[jira] [Updated] (BOOKKEEPER-152) Can't recover a ledger whose
current ensemble contain failed bookie.
Posted by "Ivan Kelly (Updated) (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/BOOKKEEPER-152?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Ivan Kelly updated BOOKKEEPER-152:
----------------------------------
Attachment: BOOKKEEPER-152.diff
Proposed fix ensures that at least one of each quorum replies to ReadLastConfirmed.
Refactors code a bit to make the read last confirmed common for recovery and standalone read last confirmed.
The bug here was that we were waiting for quorumSize responses from the bookies, when all we really need is a response from one bookie in each possible quorum. In the 2/2 case above, this means only 1 bookie needs to respond.
There's a fix for the timeouts and an improvement in fencing which fixing this uncovered.
[jira] [Updated] (BOOKKEEPER-152) Can't recover a ledger whose
current ensemble contain failed bookie.
Posted by "Ivan Kelly (Updated) (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/BOOKKEEPER-152?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Ivan Kelly updated BOOKKEEPER-152:
----------------------------------
Attachment: BOOKKEEPER-152.diff
[jira] [Updated] (BOOKKEEPER-152) Can't recover a ledger whose
current ensemble contain failed bookie.
Posted by "Ivan Kelly (Updated) (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/BOOKKEEPER-152?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Ivan Kelly updated BOOKKEEPER-152:
----------------------------------
Attachment: BOOKKEEPER-152.diff
Brought up to trunk
[jira] [Commented] (BOOKKEEPER-152) Can't recover a ledger whose
current ensemble contain failed bookie.
Posted by "jiraposter@reviews.apache.org (Commented) (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/BOOKKEEPER-152?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13206832#comment-13206832 ]
jiraposter@reviews.apache.org commented on BOOKKEEPER-152:
----------------------------------------------------------
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/3737/
-----------------------------------------------------------
(Updated 2012-02-13 12:16:12.769117)
Review request for bookkeeper.
Summary
-------
Proposed fix ensures that at least one of each quorum replies to ReadLastConfirmed.
Refactors code a bit to make the read last confirmed common for recovery and standalone read last confirmed.
The bug here was that we were waiting for quorumSize responses from the bookies, when all we really need is a response from one bookie in each possible quorum. In the 2/2 case above, this means only 1 bookie needs to respond.
There's a fix for the timeouts and an improvement in fencing which fixing this uncovered.
This addresses bug BOOKKEEPER-152.
https://issues.apache.org/jira/browse/BOOKKEEPER-152
Diffs (updated)
-----
bookkeeper-server/src/main/java/org/apache/bookkeeper/client/DigestManager.java ae375ec
bookkeeper-server/src/main/java/org/apache/bookkeeper/client/DistributionSchedule.java f2ed6bd
bookkeeper-server/src/main/java/org/apache/bookkeeper/client/LedgerHandle.java d6ade83
bookkeeper-server/src/main/java/org/apache/bookkeeper/client/LedgerRecoveryOp.java db2f782
bookkeeper-server/src/main/java/org/apache/bookkeeper/client/ReadLastConfirmedOp.java f5f0523
bookkeeper-server/src/main/java/org/apache/bookkeeper/client/RoundRobinDistributionSchedule.java 4a88747
bookkeeper-server/src/main/java/org/apache/bookkeeper/proto/PerChannelBookieClient.java 0064e24
bookkeeper-server/src/test/java/org/apache/bookkeeper/client/BookieRecoveryTest.java cbd2277
bookkeeper-server/src/test/java/org/apache/bookkeeper/test/BaseTestCase.java da52ca5
bookkeeper-server/src/test/java/org/apache/bookkeeper/test/BookieFailureTest.java 5873255
bookkeeper-server/src/test/java/org/apache/bookkeeper/test/LedgerRecoveryTest.java 77a2f69
Diff: https://reviews.apache.org/r/3737/diff
Testing
-------
Thanks,
Ivan
[jira] [Commented] (BOOKKEEPER-152) Can't recover a ledger whose
current ensemble contain failed bookie.
Posted by "jiraposter@reviews.apache.org (Commented) (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/BOOKKEEPER-152?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13206788#comment-13206788 ]
jiraposter@reviews.apache.org commented on BOOKKEEPER-152:
----------------------------------------------------------
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/3737/
-----------------------------------------------------------
(Updated 2012-02-13 10:18:44.480209)
Review request for bookkeeper.
Summary
-------
Proposed fix ensures that at least one of each quorum replies to ReadLastConfirmed.
Refactors code a bit to make the read last confirmed common for recovery and standalone read last confirmed.
The bug here was that we were waiting for quorumSize responses from the bookies, when all we really need is a response from one bookie in each possible quorum. In the 2/2 case above, this means only 1 bookie needs to respond.
There's a fix for the timeouts and an improvement in fencing which fixing this uncovered.
This addresses bug BOOKKEEPER-152.
https://issues.apache.org/jira/browse/BOOKKEEPER-152
Diffs (updated)
-----
bookkeeper-server/src/main/java/org/apache/bookkeeper/client/DigestManager.java ae375ec
bookkeeper-server/src/main/java/org/apache/bookkeeper/client/DistributionSchedule.java f2ed6bd
bookkeeper-server/src/main/java/org/apache/bookkeeper/client/LedgerHandle.java e3d1847
bookkeeper-server/src/main/java/org/apache/bookkeeper/client/LedgerRecoveryOp.java 4625bbb
bookkeeper-server/src/main/java/org/apache/bookkeeper/client/ReadLastConfirmedOp.java 43e999d
bookkeeper-server/src/main/java/org/apache/bookkeeper/client/RoundRobinDistributionSchedule.java 4a88747
bookkeeper-server/src/main/java/org/apache/bookkeeper/proto/PerChannelBookieClient.java a68fc8c
bookkeeper-server/src/test/java/org/apache/bookkeeper/client/BookieRecoveryTest.java cbd2277
bookkeeper-server/src/test/java/org/apache/bookkeeper/test/BaseTestCase.java da52ca5
bookkeeper-server/src/test/java/org/apache/bookkeeper/test/BookieFailureTest.java 5873255
bookkeeper-server/src/test/java/org/apache/bookkeeper/test/LedgerRecoveryTest.java 77a2f69
Diff: https://reviews.apache.org/r/3737/diff
Testing
-------
Thanks,
Ivan
[jira] [Commented] (BOOKKEEPER-152) Can't recover a ledger whose
current ensemble contain failed bookie.
Posted by "jiraposter@reviews.apache.org (Commented) (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/BOOKKEEPER-152?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13205349#comment-13205349 ]
jiraposter@reviews.apache.org commented on BOOKKEEPER-152:
----------------------------------------------------------
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/3737/
-----------------------------------------------------------
(Updated 2012-02-10 10:33:25.397382)
Review request for bookkeeper.
Summary
-------
Proposed fix ensures that at least one of each quorum replies to ReadLastConfirmed.
Refactors code a bit to make the read last confirmed common for recovery and standalone read last confirmed.
The bug here was that we were waiting for quorumSize responses from the bookies, when all we really need is a response from one bookie in each possible quorum. In the 2/2 case above, this means only 1 bookie needs to respond.
There's a fix for the timeouts and an improvement in fencing which fixing this uncovered.
This addresses bug BOOKKEEPER-152.
https://issues.apache.org/jira/browse/BOOKKEEPER-152
Diffs (updated)
-----
bookkeeper-server/src/test/java/org/apache/bookkeeper/test/LedgerRecoveryTest.java 77a2f69
bookkeeper-server/src/main/java/org/apache/bookkeeper/client/RoundRobinDistributionSchedule.java 4a88747
bookkeeper-server/src/main/java/org/apache/bookkeeper/proto/PerChannelBookieClient.java a68fc8c
bookkeeper-server/src/test/java/org/apache/bookkeeper/client/BookieRecoveryTest.java cbd2277
bookkeeper-server/src/test/java/org/apache/bookkeeper/test/BaseTestCase.java da52ca5
bookkeeper-server/src/test/java/org/apache/bookkeeper/test/BookieFailureTest.java 5873255
bookkeeper-server/src/main/java/org/apache/bookkeeper/client/DistributionSchedule.java f2ed6bd
bookkeeper-server/src/main/java/org/apache/bookkeeper/client/LedgerHandle.java e3d1847
bookkeeper-server/src/main/java/org/apache/bookkeeper/client/LedgerRecoveryOp.java 4625bbb
bookkeeper-server/src/main/java/org/apache/bookkeeper/client/ReadLastConfirmedOp.java 43e999d
Diff: https://reviews.apache.org/r/3737/diff
Testing
-------
Thanks,
Ivan
> Can't recover a ledger whose current ensemble contain failed bookie.
> --------------------------------------------------------------------
>
> Key: BOOKKEEPER-152
> URL: https://issues.apache.org/jira/browse/BOOKKEEPER-152
> Project: Bookkeeper
> Issue Type: Bug
> Components: bookkeeper-client
> Affects Versions: 4.0.0
> Reporter: Sijie Guo
> Assignee: Ivan Kelly
> Fix For: 4.1.0
>
> Attachments: BK-152.draft.patch, BOOKKEEPER-152.diff, BOOKKEEPER-152.diff
>
>
> Suppose we have an unclosed ledger L whose ensemble size is 2 and quorum size is 2. The ledger's current ensemble is <bk1, bk2>.
> bk2 has crashed.
> We use the recovery tool to recover the entries on bk2: $ bookkeeper-server/bin/bookkeeper org.apache.bookkeeper.tools.BookKeeperTools bk2
> Recovery fails because the recovery tool can't open ledger L: the ledger doesn't have enough of a quorum to read the last confirmed entry (asyncOpenLedgerNoRecovery).
[jira] [Updated] (BOOKKEEPER-152) Can't recover a ledger whose
current ensemble contain failed bookie.
Posted by "Ivan Kelly (Updated) (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/BOOKKEEPER-152?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Ivan Kelly updated BOOKKEEPER-152:
----------------------------------
Attachment: BOOKKEEPER-152.diff
[jira] [Commented] (BOOKKEEPER-152) Can't recover a ledger whose
current ensemble contain failed bookie.
Posted by "Ivan Kelly (Commented) (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/BOOKKEEPER-152?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13198687#comment-13198687 ]
Ivan Kelly commented on BOOKKEEPER-152:
---------------------------------------
I think a simpler solution is to modify LedgerRecoveryOp to accept a timeout as a valid response. The quorum checking should then take care of the rest. I'll test this hypothesis later.
[jira] [Commented] (BOOKKEEPER-152) Can't recover a ledger whose
current ensemble contain failed bookie.
Posted by "Ivan Kelly (Commented) (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/BOOKKEEPER-152?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13198709#comment-13198709 ]
Ivan Kelly commented on BOOKKEEPER-152:
---------------------------------------
It appears that ReadLastConfirmedOp isn't actually correct. It duplicates code from LedgerRecoveryOp but doesn't check that all quorums have replied, only that the number of responses equals the quorum size. In practice this isn't a big issue, but the code for checking the last confirmed entry should be common, so I'll merge them.
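To illustrate the difference between the two checks, here is a rough comparison, assuming round-robin quorums of consecutive ensemble indexes. The class and method names are made up for this example and do not come from the BookKeeper code base; the set of bookies that replied is modelled as a bitmask (bit i set means bookie i replied).

```java
/**
 * Illustrative comparison only; these are not BookKeeper's classes.
 * A responder set is a bitmask over ensemble indexes, so ensembleSize must be below 32.
 */
final class CountVersusCoverage {
    private CountVersusCoverage() {
    }

    /** Count rule: finish once at least quorumSize bookies have replied. */
    static boolean doneByCount(int responders, int quorumSize) {
        return Integer.bitCount(responders) >= quorumSize;
    }

    /** Coverage rule: finish once every round-robin quorum contains a bookie that replied. */
    static boolean doneByCoverage(int responders, int ensembleSize, int quorumSize) {
        for (int start = 0; start < ensembleSize; start++) {
            boolean hit = false;
            for (int j = 0; j < quorumSize; j++) {
                int bookie = (start + j) % ensembleSize;
                if ((responders & (1 << bookie)) != 0) {
                    hit = true;
                    break;
                }
            }
            if (!hit) {
                return false; // this quorum has no responder yet
            }
        }
        return true;
    }
}
```

Under these assumptions the two rules disagree in both directions. With ensemble 5 and quorum 2, replies from bookies 0 and 1 satisfy the count rule, yet the quorums {2,3} and {3,4} have no responder. With ensemble 2 and quorum 2, a single reply fails the count rule even though it already covers every quorum.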
[jira] [Commented] (BOOKKEEPER-152) Can't recover a ledger whose
current ensemble contain failed bookie.
Posted by "jiraposter@reviews.apache.org (Commented) (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/BOOKKEEPER-152?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13205498#comment-13205498 ]
jiraposter@reviews.apache.org commented on BOOKKEEPER-152:
----------------------------------------------------------
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/3737/
-----------------------------------------------------------
(Updated 2012-02-10 15:23:20.842439)
Review request for bookkeeper.
Summary
-------
The proposed fix ensures that at least one bookie from each quorum replies to ReadLastConfirmed.
It refactors the code a bit so that the read-last-confirmed logic is shared by recovery and by standalone read last confirmed.
The bug was that we were waiting for quorumSize responses from the bookies, when all we really need is a response from one bookie in each possible quorum. In the 2/2 case above, this means only one bookie needs to respond.
There's also a fix for the timeouts and an improvement in fencing, both of which fixing this uncovered.
This addresses bug BOOKKEEPER-152.
https://issues.apache.org/jira/browse/BOOKKEEPER-152
Diffs (updated)
-----
bookkeeper-server/src/main/java/org/apache/bookkeeper/client/DigestManager.java ae375ec
bookkeeper-server/src/main/java/org/apache/bookkeeper/client/DistributionSchedule.java f2ed6bd
bookkeeper-server/src/main/java/org/apache/bookkeeper/client/LedgerHandle.java e3d1847
bookkeeper-server/src/main/java/org/apache/bookkeeper/client/LedgerRecoveryOp.java 4625bbb
bookkeeper-server/src/main/java/org/apache/bookkeeper/client/ReadLastConfirmedOp.java 43e999d
bookkeeper-server/src/main/java/org/apache/bookkeeper/client/RoundRobinDistributionSchedule.java 4a88747
bookkeeper-server/src/main/java/org/apache/bookkeeper/proto/PerChannelBookieClient.java a68fc8c
bookkeeper-server/src/test/java/org/apache/bookkeeper/client/BookieRecoveryTest.java cbd2277
bookkeeper-server/src/test/java/org/apache/bookkeeper/test/BaseTestCase.java da52ca5
bookkeeper-server/src/test/java/org/apache/bookkeeper/test/BookieFailureTest.java 5873255
bookkeeper-server/src/test/java/org/apache/bookkeeper/test/LedgerRecoveryTest.java 77a2f69
Diff: https://reviews.apache.org/r/3737/diff
Testing
-------
Thanks,
Ivan
[jira] [Updated] (BOOKKEEPER-152) Can't recover a ledger whose
current ensemble contain failed bookie.
Posted by "Ivan Kelly (Updated) (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/BOOKKEEPER-152?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Ivan Kelly updated BOOKKEEPER-152:
----------------------------------
Attachment: BOOKKEEPER-152.diff
Hopefully the final patch for this JIRA. Resolves the conflicts with BOOKKEEPER-162.
[jira] [Commented] (BOOKKEEPER-152) Can't recover a ledger whose
current ensemble contain failed bookie.
Posted by "jiraposter@reviews.apache.org (Commented) (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/BOOKKEEPER-152?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13205350#comment-13205350 ]
jiraposter@reviews.apache.org commented on BOOKKEEPER-152:
----------------------------------------------------------
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/3737/#review5007
-----------------------------------------------------------
Ship it!
+1. Thanks, Ivan.
- Sijie
[jira] [Commented] (BOOKKEEPER-152) Can't recover a ledger whose
current ensemble contain failed bookie.
Posted by "Sijie Guo (Commented) (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/BOOKKEEPER-152?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13206858#comment-13206858 ]
Sijie Guo commented on BOOKKEEPER-152:
--------------------------------------
Yeah, I am OK with the final patch :) +1
[jira] [Commented] (BOOKKEEPER-152) Can't recover a ledger whose
current ensemble contain failed bookie.
Posted by "Sijie Guo (Commented) (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/BOOKKEEPER-152?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13202260#comment-13202260 ]
Sijie Guo commented on BOOKKEEPER-152:
--------------------------------------
BOOKKEEPER-163 and BOOKKEEPER-164 have been created to guard against incorrect responses, so that we avoid reading a wrong last confirmed. So I think we can let this JIRA go in first, since the bug here is more about the logic of reading last confirmed, as Ivan stated.
[jira] [Commented] (BOOKKEEPER-152) Can't recover a ledger whose
current ensemble contain failed bookie.
Posted by "jiraposter@reviews.apache.org (Commented) (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/BOOKKEEPER-152?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13206370#comment-13206370 ]
jiraposter@reviews.apache.org commented on BOOKKEEPER-152:
----------------------------------------------------------
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/3737/#review5029
-----------------------------------------------------------
Ah, thanks, Ivan, for fixing the wrong assignment of the ledger length.
The new patch looks good to me, but I don't like the new callback name 'RecoveredDataCallback': it sounds as though the data is returned by some recovery action, which it isn't. The data is retrieved by reading last confirmed without recovery. I'd prefer renaming the callback to 'ReadLastConfirmedDataCallback' and the method to 'readLastConfirmedDataComplete'.
- Sijie
[jira] [Commented] (BOOKKEEPER-152) Can't recover a ledger whose
current ensemble contain failed bookie.
Posted by "jiraposter@reviews.apache.org (Commented) (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/BOOKKEEPER-152?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13205319#comment-13205319 ]
jiraposter@reviews.apache.org commented on BOOKKEEPER-152:
----------------------------------------------------------
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/3737/
-----------------------------------------------------------
(Updated 2012-02-10 09:47:35.723185)
Review request for bookkeeper.
Summary
-------
The proposed fix ensures that at least one bookie from each quorum replies to ReadLastConfirmed.
It refactors the code a bit so that the read-last-confirmed logic is shared by recovery and by standalone read last confirmed.
The bug was that we were waiting for quorumSize responses from the bookies, when all we really need is a response from one bookie in each possible quorum. In the 2/2 case above, this means only one bookie needs to respond.
There's also a fix for the timeouts and an improvement in fencing, both of which fixing this uncovered.
This addresses bug BOOKKEEPER-152.
https://issues.apache.org/jira/browse/BOOKKEEPER-152
Diffs (updated)
-----
bookkeeper-server/src/main/java/org/apache/bookkeeper/proto/PerChannelBookieClient.java a68fc8c
bookkeeper-server/src/test/java/org/apache/bookkeeper/client/BookieRecoveryTest.java cbd2277
bookkeeper-server/src/test/java/org/apache/bookkeeper/test/BaseTestCase.java da52ca5
bookkeeper-server/src/test/java/org/apache/bookkeeper/test/BookieFailureTest.java 5873255
bookkeeper-server/src/test/java/org/apache/bookkeeper/test/LedgerRecoveryTest.java 77a2f69
bookkeeper-server/src/main/java/org/apache/bookkeeper/client/RoundRobinDistributionSchedule.java 4a88747
bookkeeper-server/src/main/java/org/apache/bookkeeper/client/DistributionSchedule.java f2ed6bd
bookkeeper-server/src/main/java/org/apache/bookkeeper/client/LedgerHandle.java e3d1847
bookkeeper-server/src/main/java/org/apache/bookkeeper/client/LedgerRecoveryOp.java 4625bbb
bookkeeper-server/src/main/java/org/apache/bookkeeper/client/ReadLastConfirmedOp.java 43e999d
Diff: https://reviews.apache.org/r/3737/diff
Testing
-------
Thanks,
Ivan
[jira] [Updated] (BOOKKEEPER-152) Can't recover a ledger whose
current ensemble contain failed bookie.
Posted by "Ivan Kelly (Updated) (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/BOOKKEEPER-152?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Ivan Kelly updated BOOKKEEPER-152:
----------------------------------
Attachment: BOOKKEEPER-152.diff
The previous patch wasn't passing all tests. LedgerRecoveryTest was failing. This one fixes that problem.