Posted to dev@bookkeeper.apache.org by "Sijie Guo (Created) (JIRA)" <ji...@apache.org> on 2012/01/11 11:11:43 UTC

[jira] [Created] (BOOKKEEPER-152) Can't recover a ledger whose current ensemble contain failed bookie.

Can't recover a ledger whose current ensemble contain failed bookie.
--------------------------------------------------------------------

                 Key: BOOKKEEPER-152
                 URL: https://issues.apache.org/jira/browse/BOOKKEEPER-152
             Project: Bookkeeper
          Issue Type: Bug
          Components: bookkeeper-client
    Affects Versions: 4.0.0
            Reporter: Sijie Guo
             Fix For: 4.1.0


Suppose we have an unclosed ledger L whose ensemble size is 2 and quorum size is 2. The ledger's current ensemble is <bk1, bk2>.

bk2 has crashed.

We use the recovery tool to recover the entries on bk2:

$ bookkeeper-server/bin/bookkeeper org.apache.bookkeeper.tools.BookKeeperTools bk2

Recovery fails because the recovery tool can't open ledger L: the ledger doesn't have enough of a quorum available to readLastConfirmed. (asyncOpenLedgerNoRecovery)

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (BOOKKEEPER-152) Can't recover a ledger whose current ensemble contain failed bookie.

Posted by "jiraposter@reviews.apache.org (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/BOOKKEEPER-152?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13199057#comment-13199057 ] 

jiraposter@reviews.apache.org commented on BOOKKEEPER-152:
----------------------------------------------------------


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/3737/
-----------------------------------------------------------

Review request for bookkeeper.


Summary
-------

Proposed fix ensures that at least one bookie from each quorum replies to ReadLastConfirmed.

Refactors the code a bit so that the read-last-confirmed logic is shared by recovery and the standalone read last confirmed.

The bug here was actually that we were waiting for quorumSize responses from the bookies, when really all we need is a response from one bookie in each possible quorum. In the 2/2 case above, this means only 1 bookie needs to respond.

There's a fix for the timeouts and an improvement in fencing which fixing this uncovered.
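
To illustrate the condition described in the summary, here is a minimal sketch in Java. It is not the code from the attached patch. The class QuorumCoverage and its methods are hypothetical, and it assumes a round-robin layout in which each possible quorum consists of quorumSize consecutive bookies of the ensemble (wrapping around at the end).

```java
import java.util.BitSet;

/** Hypothetical helper for illustration only; not BookKeeper's ReadLastConfirmedOp. */
final class QuorumCoverage {
    private final int ensembleSize;
    private final int quorumSize;
    private final BitSet responded; // bit i is set once bookie i has replied

    QuorumCoverage(int ensembleSize, int quorumSize) {
        if (quorumSize < 1 || quorumSize > ensembleSize) {
            throw new IllegalArgumentException("need 1 <= quorumSize <= ensembleSize");
        }
        this.ensembleSize = ensembleSize;
        this.quorumSize = quorumSize;
        this.responded = new BitSet(ensembleSize);
    }

    /** Record a successful reply from the bookie at the given ensemble index. */
    void addResponse(int bookieIndex) {
        if (bookieIndex < 0 || bookieIndex >= ensembleSize) {
            throw new IndexOutOfBoundsException("bookie index " + bookieIndex);
        }
        responded.set(bookieIndex);
    }

    /** Old condition from the thread: wait for quorumSize replies. */
    boolean hasQuorumSizeResponses() {
        return responded.cardinality() >= quorumSize;
    }

    /** New condition: every possible quorum contains at least one bookie that replied. */
    boolean coversEveryQuorum() {
        // Assumption: the quorum starting at index s is bookies s, s+1, ..., s+quorumSize-1 (mod ensembleSize).
        for (int start = 0; start < ensembleSize; start++) {
            boolean covered = false;
            for (int j = 0; j < quorumSize && !covered; j++) {
                covered = responded.get((start + j) % ensembleSize);
            }
            if (!covered) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        // The 2/2 case from the issue: bk1 (index 0) replies, bk2 (index 1) is down.
        QuorumCoverage coverage = new QuorumCoverage(2, 2);
        coverage.addResponse(0);
        System.out.println("quorumSize replies received: " + coverage.hasQuorumSizeResponses());
        System.out.println("every quorum covered:        " + coverage.coversEveryQuorum());
    }
}
```

In the 2/2 case the old check can never succeed while bk2 is down, whereas the coverage check is satisfied by the single reply from bk1.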


This addresses bug BOOKKEEPER-152.
    https://issues.apache.org/jira/browse/BOOKKEEPER-152


Diffs
-----

  bookkeeper-server/src/main/java/org/apache/bookkeeper/client/DistributionSchedule.java f2ed6bd 
  bookkeeper-server/src/main/java/org/apache/bookkeeper/client/LedgerHandle.java 547e240 
  bookkeeper-server/src/main/java/org/apache/bookkeeper/client/LedgerRecoveryOp.java 4625bbb 
  bookkeeper-server/src/main/java/org/apache/bookkeeper/client/ReadLastConfirmedOp.java 43e999d 
  bookkeeper-server/src/main/java/org/apache/bookkeeper/client/RoundRobinDistributionSchedule.java 4a88747 
  bookkeeper-server/src/main/java/org/apache/bookkeeper/proto/PerChannelBookieClient.java ded1379 
  bookkeeper-server/src/test/java/org/apache/bookkeeper/client/BookieRecoveryTest.java 8526db5 
  bookkeeper-server/src/test/java/org/apache/bookkeeper/test/BaseTestCase.java db1a763 
  bookkeeper-server/src/test/java/org/apache/bookkeeper/test/BookieFailureTest.java 5873255 
  bookkeeper-server/src/test/java/org/apache/bookkeeper/test/LedgerRecoveryTest.java 77a2f69 

Diff: https://reviews.apache.org/r/3737/diff


Testing
-------


Thanks,

Ivan


                
> Can't recover a ledger whose current ensemble contain failed bookie.
> --------------------------------------------------------------------
>
>                 Key: BOOKKEEPER-152
>                 URL: https://issues.apache.org/jira/browse/BOOKKEEPER-152
>             Project: Bookkeeper
>          Issue Type: Bug
>          Components: bookkeeper-client
>    Affects Versions: 4.0.0
>            Reporter: Sijie Guo
>            Assignee: Ivan Kelly
>             Fix For: 4.1.0
>
>         Attachments: BK-152.draft.patch, BOOKKEEPER-152.diff


        

[jira] [Commented] (BOOKKEEPER-152) Can't recover a ledger whose current ensemble contain failed bookie.

Posted by "Sijie Guo (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/BOOKKEEPER-152?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13198751#comment-13198751 ] 

Sijie Guo commented on BOOKKEEPER-152:
--------------------------------------

> modify LedgerRecoveryOp to access timeout as a valid response

If I remember correctly, when a bookie has crashed the operation returns a could-not-connect exception, not a timeout exception.
                

        

[jira] [Commented] (BOOKKEEPER-152) Can't recover a ledger whose current ensemble contain failed bookie.

Posted by "jiraposter@reviews.apache.org (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/BOOKKEEPER-152?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13206801#comment-13206801 ] 

jiraposter@reviews.apache.org commented on BOOKKEEPER-152:
----------------------------------------------------------


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/3737/#review5050
-----------------------------------------------------------

Ship it!


+1

- Sijie




                

        

[jira] [Commented] (BOOKKEEPER-152) Can't recover a ledger whose current ensemble contain failed bookie.

Posted by "jiraposter@reviews.apache.org (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/BOOKKEEPER-152?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13205330#comment-13205330 ] 

jiraposter@reviews.apache.org commented on BOOKKEEPER-152:
----------------------------------------------------------


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/3737/#review5006
-----------------------------------------------------------


Most of it looks good to me, but it seems that you didn't assign the right last confirmed value in the readLastConfirmedOp callback.


bookkeeper-server/src/main/java/org/apache/bookkeeper/client/LedgerRecoveryOp.java
<https://reviews.apache.org/r/3737/#comment11009>

    Actually, the result is not maxAddConfirmed; it is lastAddConfirmed. maxAddConfirmed is a member field of LedgerRecoveryOp.


- Sijie


On 2012-02-10 09:47:35, Ivan Kelly wrote:
bq.  
bq.  -----------------------------------------------------------
bq.  This is an automatically generated e-mail. To reply, visit:
bq.  https://reviews.apache.org/r/3737/
bq.  -----------------------------------------------------------
bq.  
bq.  (Updated 2012-02-10 09:47:35)
bq.  
bq.  
bq.  Review request for bookkeeper.
bq.  
bq.  
bq.  Summary
bq.  -------
bq.  
bq.  Proposed fix ensures that at least one bookie from each quorum replies to ReadLastConfirmed.
bq.  
bq.  Refactors the code a bit so that the read-last-confirmed logic is shared by recovery and the standalone read last confirmed.
bq.  
bq.  The bug here was actually that we were waiting for quorumSize responses from the bookies, when really all we need is a response from one bookie in each possible quorum. In the 2/2 case above, this means only 1 bookie needs to respond.
bq.  
bq.  There's a fix for the timeouts and an improvement in fencing which fixing this uncovered.
bq.  
bq.  
bq.  This addresses bug BOOKKEEPER-152.
bq.      https://issues.apache.org/jira/browse/BOOKKEEPER-152
bq.  
bq.  
bq.  Diffs
bq.  -----
bq.  
bq.    bookkeeper-server/src/main/java/org/apache/bookkeeper/proto/PerChannelBookieClient.java a68fc8c 
bq.    bookkeeper-server/src/test/java/org/apache/bookkeeper/client/BookieRecoveryTest.java cbd2277 
bq.    bookkeeper-server/src/test/java/org/apache/bookkeeper/test/BaseTestCase.java da52ca5 
bq.    bookkeeper-server/src/test/java/org/apache/bookkeeper/test/BookieFailureTest.java 5873255 
bq.    bookkeeper-server/src/test/java/org/apache/bookkeeper/test/LedgerRecoveryTest.java 77a2f69 
bq.    bookkeeper-server/src/main/java/org/apache/bookkeeper/client/RoundRobinDistributionSchedule.java 4a88747 
bq.    bookkeeper-server/src/main/java/org/apache/bookkeeper/client/DistributionSchedule.java f2ed6bd 
bq.    bookkeeper-server/src/main/java/org/apache/bookkeeper/client/LedgerHandle.java e3d1847 
bq.    bookkeeper-server/src/main/java/org/apache/bookkeeper/client/LedgerRecoveryOp.java 4625bbb 
bq.    bookkeeper-server/src/main/java/org/apache/bookkeeper/client/ReadLastConfirmedOp.java 43e999d 
bq.  
bq.  Diff: https://reviews.apache.org/r/3737/diff
bq.  
bq.  
bq.  Testing
bq.  -------
bq.  
bq.  
bq.  Thanks,
bq.  
bq.  Ivan
bq.  
bq.


                

        

[jira] [Updated] (BOOKKEEPER-152) Can't recover a ledger whose current ensemble contain failed bookie.

Posted by "Sijie Guo (Updated) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/BOOKKEEPER-152?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sijie Guo updated BOOKKEEPER-152:
---------------------------------

    Attachment: BK-152.draft.patch

Attached a draft patch.

The idea is to pass a list of excluded bookies to openLedgerNoRecovery. If an excluded bookie is in the quorum set that the maxAddConfirmed entry belongs to, the bookie client only needs to wait for quorumSize-n responses in readLastConfirmed.

Then, in the recovery tool, we can pass the failed bookie as an excluded bookie.

What is your opinion?
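
A rough sketch of the arithmetic behind this proposal, in Java. This is not taken from BK-152.draft.patch. The class ExcludedBookies and the method requiredResponses are hypothetical, and the sketch assumes that n means the number of excluded bookies that fall inside the quorum being read, which the comment above does not state explicitly.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Set;

/** Hypothetical helper for illustration only; not part of the BookKeeper client. */
final class ExcludedBookies {

    /**
     * Number of replies to wait for: quorumSize - n, where n is assumed to be
     * the count of excluded bookies that belong to the given quorum.
     */
    static int requiredResponses(List<String> quorum, Set<String> excluded) {
        int n = 0;
        for (String bookie : quorum) {
            if (excluded.contains(bookie)) {
                n++;
            }
        }
        return quorum.size() - n;
    }

    public static void main(String[] args) {
        // The case from the issue: quorum <bk1, bk2>, bk2 has failed and is excluded.
        List<String> quorum = Arrays.asList("bk1", "bk2");
        Set<String> excluded = Collections.singleton("bk2");
        System.out.println("responses to wait for: " + requiredResponses(quorum, excluded));
    }
}
```

With the failed bookie passed in by the recovery tool, the 2/2 ledger would wait for a single reply instead of two.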

                

        

[jira] [Updated] (BOOKKEEPER-152) Can't recover a ledger whose current ensemble contain failed bookie.

Posted by "Ivan Kelly (Updated) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/BOOKKEEPER-152?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ivan Kelly updated BOOKKEEPER-152:
----------------------------------

    Attachment: BOOKKEEPER-152.diff

Proposed fix ensures that at least one bookie from each quorum replies to ReadLastConfirmed.

Refactors the code a bit so that the read-last-confirmed logic is shared by recovery and the standalone read last confirmed.

The bug here was actually that we were waiting for quorumSize responses from the bookies, when really all we need is a response from one bookie in each possible quorum. In the 2/2 case above, this means only 1 bookie needs to respond.

There's a fix for the timeouts and an improvement in fencing which fixing this uncovered.
                

        

[jira] [Updated] (BOOKKEEPER-152) Can't recover a ledger whose current ensemble contain failed bookie.

Posted by "Ivan Kelly (Updated) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/BOOKKEEPER-152?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ivan Kelly updated BOOKKEEPER-152:
----------------------------------

    Attachment: BOOKKEEPER-152.diff
    

        

[jira] [Updated] (BOOKKEEPER-152) Can't recover a ledger whose current ensemble contain failed bookie.

Posted by "Ivan Kelly (Updated) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/BOOKKEEPER-152?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ivan Kelly updated BOOKKEEPER-152:
----------------------------------

    Attachment: BOOKKEEPER-152.diff

Brought up to trunk
                

        

[jira] [Commented] (BOOKKEEPER-152) Can't recover a ledger whose current ensemble contain failed bookie.

Posted by "jiraposter@reviews.apache.org (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/BOOKKEEPER-152?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13206832#comment-13206832 ] 

jiraposter@reviews.apache.org commented on BOOKKEEPER-152:
----------------------------------------------------------


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/3737/
-----------------------------------------------------------

(Updated 2012-02-13 12:16:12.769117)


Review request for bookkeeper.


Summary
-------

Proposed fix ensures that at least one bookie from each quorum replies to ReadLastConfirmed.

Refactors the code a bit so that the read-last-confirmed logic is shared by recovery and the standalone read last confirmed.

The bug here was actually that we were waiting for quorumSize responses from the bookies, when really all we need is a response from one bookie in each possible quorum. In the 2/2 case above, this means only 1 bookie needs to respond.

There's a fix for the timeouts and an improvement in fencing which fixing this uncovered.


This addresses bug BOOKKEEPER-152.
    https://issues.apache.org/jira/browse/BOOKKEEPER-152


Diffs (updated)
-----

  bookkeeper-server/src/main/java/org/apache/bookkeeper/client/DigestManager.java ae375ec 
  bookkeeper-server/src/main/java/org/apache/bookkeeper/client/DistributionSchedule.java f2ed6bd 
  bookkeeper-server/src/main/java/org/apache/bookkeeper/client/LedgerHandle.java d6ade83 
  bookkeeper-server/src/main/java/org/apache/bookkeeper/client/LedgerRecoveryOp.java db2f782 
  bookkeeper-server/src/main/java/org/apache/bookkeeper/client/ReadLastConfirmedOp.java f5f0523 
  bookkeeper-server/src/main/java/org/apache/bookkeeper/client/RoundRobinDistributionSchedule.java 4a88747 
  bookkeeper-server/src/main/java/org/apache/bookkeeper/proto/PerChannelBookieClient.java 0064e24 
  bookkeeper-server/src/test/java/org/apache/bookkeeper/client/BookieRecoveryTest.java cbd2277 
  bookkeeper-server/src/test/java/org/apache/bookkeeper/test/BaseTestCase.java da52ca5 
  bookkeeper-server/src/test/java/org/apache/bookkeeper/test/BookieFailureTest.java 5873255 
  bookkeeper-server/src/test/java/org/apache/bookkeeper/test/LedgerRecoveryTest.java 77a2f69 

Diff: https://reviews.apache.org/r/3737/diff


Testing
-------


Thanks,

Ivan


                

        

[jira] [Commented] (BOOKKEEPER-152) Can't recover a ledger whose current ensemble contain failed bookie.

Posted by "jiraposter@reviews.apache.org (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/BOOKKEEPER-152?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13206788#comment-13206788 ] 

jiraposter@reviews.apache.org commented on BOOKKEEPER-152:
----------------------------------------------------------


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/3737/
-----------------------------------------------------------

(Updated 2012-02-13 10:18:44.480209)


Review request for bookkeeper.


Summary
-------

Proposed fix ensures that at least one bookie from each quorum replies to ReadLastConfirmed.

Refactors the code a bit so that the read-last-confirmed logic is shared by recovery and the standalone read last confirmed.

The bug here was actually that we were waiting for quorumSize responses from the bookies, when really all we need is a response from one bookie in each possible quorum. In the 2/2 case above, this means only 1 bookie needs to respond.

There's a fix for the timeouts and an improvement in fencing which fixing this uncovered.


This addresses bug BOOKKEEPER-152.
    https://issues.apache.org/jira/browse/BOOKKEEPER-152


Diffs (updated)
-----

  bookkeeper-server/src/main/java/org/apache/bookkeeper/client/DigestManager.java ae375ec 
  bookkeeper-server/src/main/java/org/apache/bookkeeper/client/DistributionSchedule.java f2ed6bd 
  bookkeeper-server/src/main/java/org/apache/bookkeeper/client/LedgerHandle.java e3d1847 
  bookkeeper-server/src/main/java/org/apache/bookkeeper/client/LedgerRecoveryOp.java 4625bbb 
  bookkeeper-server/src/main/java/org/apache/bookkeeper/client/ReadLastConfirmedOp.java 43e999d 
  bookkeeper-server/src/main/java/org/apache/bookkeeper/client/RoundRobinDistributionSchedule.java 4a88747 
  bookkeeper-server/src/main/java/org/apache/bookkeeper/proto/PerChannelBookieClient.java a68fc8c 
  bookkeeper-server/src/test/java/org/apache/bookkeeper/client/BookieRecoveryTest.java cbd2277 
  bookkeeper-server/src/test/java/org/apache/bookkeeper/test/BaseTestCase.java da52ca5 
  bookkeeper-server/src/test/java/org/apache/bookkeeper/test/BookieFailureTest.java 5873255 
  bookkeeper-server/src/test/java/org/apache/bookkeeper/test/LedgerRecoveryTest.java 77a2f69 

Diff: https://reviews.apache.org/r/3737/diff


Testing
-------


Thanks,

Ivan


                

        

[jira] [Commented] (BOOKKEEPER-152) Can't recover a ledger whose current ensemble contain failed bookie.

Posted by "jiraposter@reviews.apache.org (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/BOOKKEEPER-152?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13205349#comment-13205349 ] 

jiraposter@reviews.apache.org commented on BOOKKEEPER-152:
----------------------------------------------------------


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/3737/
-----------------------------------------------------------

(Updated 2012-02-10 10:33:25.397382)


Review request for bookkeeper.


Summary
-------

The proposed fix ensures that at least one bookie from each quorum replies to ReadLastConfirmed.

It also refactors the code a bit so that the read-last-confirmed logic is shared by recovery and standalone read last confirmed.

The bug here was actually that we were waiting for quorumSize responses from the bookies, when really all we need is a response from one bookie in each possible quorum. In the 2/2 case above, this means only 1 bookie needs to respond.

There's also a fix for the timeouts and an improvement in fencing, both prompted by problems that fixing this uncovered.
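
To illustrate the difference between counting responses and checking quorum coverage, here is a minimal Java sketch. It is not taken from the patch: the class QuorumCoverageSketch and its methods are hypothetical names, and it assumes round-robin striping in which the quorum starting at ensemble index i consists of bookies i, i+1, ..., i+quorumSize-1 (mod ensembleSize).

```java
import java.util.BitSet;

// Illustrative sketch only. Names are hypothetical and not from the
// BOOKKEEPER-152 patch; round-robin striping is an assumption.
class QuorumCoverageSketch {
    private final int ensembleSize;
    private final int quorumSize;
    private final BitSet heardFrom;

    QuorumCoverageSketch(int ensembleSize, int quorumSize) {
        this.ensembleSize = ensembleSize;
        this.quorumSize = quorumSize;
        this.heardFrom = new BitSet(ensembleSize);
    }

    // Record a reply from the bookie at this index in the ensemble.
    public void addResponse(int bookieIndex) {
        heardFrom.set(bookieIndex);
    }

    // Count-based check: done once quorumSize bookies have replied.
    public boolean enoughByCount() {
        return heardFrom.cardinality() >= quorumSize;
    }

    // Coverage check: done once every possible quorum contains
    // at least one bookie that has replied.
    public boolean everyQuorumCovered() {
        for (int start = 0; start < ensembleSize; start++) {
            boolean covered = false;
            for (int j = 0; j < quorumSize && !covered; j++) {
                covered = heardFrom.get((start + j) % ensembleSize);
            }
            if (!covered) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        // 2/2 case from the issue: bk2 (index 1) is down, only bk1 replies.
        QuorumCoverageSketch twoOfTwo = new QuorumCoverageSketch(2, 2);
        twoOfTwo.addResponse(0);
        System.out.println(twoOfTwo.enoughByCount());      // false
        System.out.println(twoOfTwo.everyQuorumCovered()); // true

        // Ensemble 4, quorum 2: bookies 0 and 1 reply, quorum {2, 3} is silent.
        QuorumCoverageSketch twoOfFour = new QuorumCoverageSketch(4, 2);
        twoOfFour.addResponse(0);
        twoOfFour.addResponse(1);
        System.out.println(twoOfFour.enoughByCount());      // true
        System.out.println(twoOfFour.everyQuorumCovered()); // false
    }
}
```

In the 2/2 case the count-based check never completes while bk2 is down, whereas the coverage check is satisfied by bk1 alone. The second case shows the opposite failure: the count is reached even though one quorum has not been heard from at all.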


This addresses bug BOOKKEEPER-152.
    https://issues.apache.org/jira/browse/BOOKKEEPER-152


Diffs (updated)
-----

  bookkeeper-server/src/test/java/org/apache/bookkeeper/test/LedgerRecoveryTest.java 77a2f69 
  bookkeeper-server/src/main/java/org/apache/bookkeeper/client/RoundRobinDistributionSchedule.java 4a88747 
  bookkeeper-server/src/main/java/org/apache/bookkeeper/proto/PerChannelBookieClient.java a68fc8c 
  bookkeeper-server/src/test/java/org/apache/bookkeeper/client/BookieRecoveryTest.java cbd2277 
  bookkeeper-server/src/test/java/org/apache/bookkeeper/test/BaseTestCase.java da52ca5 
  bookkeeper-server/src/test/java/org/apache/bookkeeper/test/BookieFailureTest.java 5873255 
  bookkeeper-server/src/main/java/org/apache/bookkeeper/client/DistributionSchedule.java f2ed6bd 
  bookkeeper-server/src/main/java/org/apache/bookkeeper/client/LedgerHandle.java e3d1847 
  bookkeeper-server/src/main/java/org/apache/bookkeeper/client/LedgerRecoveryOp.java 4625bbb 
  bookkeeper-server/src/main/java/org/apache/bookkeeper/client/ReadLastConfirmedOp.java 43e999d 

Diff: https://reviews.apache.org/r/3737/diff


Testing
-------


Thanks,

Ivan


                


        

[jira] [Updated] (BOOKKEEPER-152) Can't recover a ledger whose current ensemble contain failed bookie.

Posted by "Ivan Kelly (Updated) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/BOOKKEEPER-152?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ivan Kelly updated BOOKKEEPER-152:
----------------------------------

    Attachment: BOOKKEEPER-152.diff
    


        

[jira] [Commented] (BOOKKEEPER-152) Can't recover a ledger whose current ensemble contain failed bookie.

Posted by "Ivan Kelly (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/BOOKKEEPER-152?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13198687#comment-13198687 ] 

Ivan Kelly commented on BOOKKEEPER-152:
---------------------------------------

I think a simpler solution to this is to modify LedgerRecoveryOp to accept a timeout as a valid response. Then the quorum checking should take care of the rest. I'll test this hypothesis later.
                


        

[jira] [Commented] (BOOKKEEPER-152) Can't recover a ledger whose current ensemble contain failed bookie.

Posted by "Ivan Kelly (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/BOOKKEEPER-152?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13198709#comment-13198709 ] 

Ivan Kelly commented on BOOKKEEPER-152:
---------------------------------------

It appears that ReadLastConfirmedOp isn't actually correct. It duplicates code from LedgerRecoveryOp but doesn't check that all quorums have replied, only that the number of responses equals the quorum size. In practice this isn't a big issue, but the code for checking the last confirmed entry should be common, so I'll merge them.
                


        

[jira] [Commented] (BOOKKEEPER-152) Can't recover a ledger whose current ensemble contain failed bookie.

Posted by "jiraposter@reviews.apache.org (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/BOOKKEEPER-152?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13205498#comment-13205498 ] 

jiraposter@reviews.apache.org commented on BOOKKEEPER-152:
----------------------------------------------------------


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/3737/
-----------------------------------------------------------

(Updated 2012-02-10 15:23:20.842439)


Review request for bookkeeper.


Summary
-------

The proposed fix ensures that at least one bookie from each quorum replies to ReadLastConfirmed.

It also refactors the code a bit so that the read-last-confirmed logic is shared by recovery and standalone read last confirmed.

The bug here was actually that we were waiting for quorumSize responses from the bookies, when really all we need is a response from one bookie in each possible quorum. In the 2/2 case above, this means only 1 bookie needs to respond.

There's also a fix for the timeouts and an improvement in fencing, both prompted by problems that fixing this uncovered.


This addresses bug BOOKKEEPER-152.
    https://issues.apache.org/jira/browse/BOOKKEEPER-152


Diffs (updated)
-----

  bookkeeper-server/src/main/java/org/apache/bookkeeper/client/DigestManager.java ae375ec 
  bookkeeper-server/src/main/java/org/apache/bookkeeper/client/DistributionSchedule.java f2ed6bd 
  bookkeeper-server/src/main/java/org/apache/bookkeeper/client/LedgerHandle.java e3d1847 
  bookkeeper-server/src/main/java/org/apache/bookkeeper/client/LedgerRecoveryOp.java 4625bbb 
  bookkeeper-server/src/main/java/org/apache/bookkeeper/client/ReadLastConfirmedOp.java 43e999d 
  bookkeeper-server/src/main/java/org/apache/bookkeeper/client/RoundRobinDistributionSchedule.java 4a88747 
  bookkeeper-server/src/main/java/org/apache/bookkeeper/proto/PerChannelBookieClient.java a68fc8c 
  bookkeeper-server/src/test/java/org/apache/bookkeeper/client/BookieRecoveryTest.java cbd2277 
  bookkeeper-server/src/test/java/org/apache/bookkeeper/test/BaseTestCase.java da52ca5 
  bookkeeper-server/src/test/java/org/apache/bookkeeper/test/BookieFailureTest.java 5873255 
  bookkeeper-server/src/test/java/org/apache/bookkeeper/test/LedgerRecoveryTest.java 77a2f69 

Diff: https://reviews.apache.org/r/3737/diff


Testing
-------


Thanks,

Ivan


                


        

[jira] [Updated] (BOOKKEEPER-152) Can't recover a ledger whose current ensemble contain failed bookie.

Posted by "Ivan Kelly (Updated) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/BOOKKEEPER-152?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ivan Kelly updated BOOKKEEPER-152:
----------------------------------

    Attachment: BOOKKEEPER-152.diff

Hopefully the final patch for this JIRA. Resolves the conflicts with BOOKKEEPER-162.
                


        

[jira] [Commented] (BOOKKEEPER-152) Can't recover a ledger whose current ensemble contain failed bookie.

Posted by "jiraposter@reviews.apache.org (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/BOOKKEEPER-152?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13205350#comment-13205350 ] 

jiraposter@reviews.apache.org commented on BOOKKEEPER-152:
----------------------------------------------------------


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/3737/#review5007
-----------------------------------------------------------

Ship it!


+1. thanks Ivan.

- Sijie


On 2012-02-10 10:33:25, Ivan Kelly wrote:
bq.  
bq.  -----------------------------------------------------------
bq.  This is an automatically generated e-mail. To reply, visit:
bq.  https://reviews.apache.org/r/3737/
bq.  -----------------------------------------------------------
bq.  
bq.  (Updated 2012-02-10 10:33:25)
bq.  
bq.  
bq.  Review request for bookkeeper.


                


        

[jira] [Commented] (BOOKKEEPER-152) Can't recover a ledger whose current ensemble contain failed bookie.

Posted by "Sijie Guo (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/BOOKKEEPER-152?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13206858#comment-13206858 ] 

Sijie Guo commented on BOOKKEEPER-152:
--------------------------------------

yeah. I am ok with the final patch :) +1
                


        

[jira] [Commented] (BOOKKEEPER-152) Can't recover a ledger whose current ensemble contain failed bookie.

Posted by "Sijie Guo (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/BOOKKEEPER-152?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13202260#comment-13202260 ] 

Sijie Guo commented on BOOKKEEPER-152:
--------------------------------------

BOOKKEEPER-163 and BOOKKEEPER-164 have been created to prevent incorrect responses, so that a wrong last confirmed is not read. So I think we can let this JIRA go in first, since the bug here is more related to the logic of reading last confirmed, as Ivan stated.
                


        

[jira] [Commented] (BOOKKEEPER-152) Can't recover a ledger whose current ensemble contain failed bookie.

Posted by "jiraposter@reviews.apache.org (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/BOOKKEEPER-152?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13206370#comment-13206370 ] 

jiraposter@reviews.apache.org commented on BOOKKEEPER-152:
----------------------------------------------------------


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/3737/#review5029
-----------------------------------------------------------


Ah, thanks Ivan for fixing the wrong assignment of ledger length.
The new patch looks good to me, but I don't like the new callback name 'RecoveredDataCallback', which sounds as if the data were returned by some recovery action, although it isn't. This data is retrieved by reading last confirmed without recovery. I would prefer renaming the callback to 'ReadLastConfirmedDataCallback' and the method to 'readLastConfirmedDataComplete'.
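
For illustration, the proposed naming could look like the Java sketch below. Only the names ReadLastConfirmedDataCallback and readLastConfirmedDataComplete come from the comment above. The parameter list, the LastConfirmedData holder and its fields are assumptions made for this example and are not copied from the patch.

```java
// Hypothetical sketch of the proposed naming. The callback signature and the
// LastConfirmedData holder are assumptions, not the code under review.
class ReadLastConfirmedNamingSketch {

    // Data obtained by reading last confirmed, without running recovery.
    static final class LastConfirmedData {
        final long lastAddConfirmed;
        final long length;

        LastConfirmedData(long lastAddConfirmed, long length) {
            this.lastAddConfirmed = lastAddConfirmed;
            this.length = length;
        }
    }

    // The name says where the data comes from (a read of last confirmed)
    // rather than suggesting that recovery produced it.
    interface ReadLastConfirmedDataCallback {
        void readLastConfirmedDataComplete(int rc, LastConfirmedData data);
    }

    public static void main(String[] args) {
        ReadLastConfirmedDataCallback cb = (rc, data) ->
            System.out.println("rc=" + rc
                + " lastAddConfirmed=" + data.lastAddConfirmed
                + " length=" + data.length);
        // Invoke the callback directly with made-up values.
        cb.readLastConfirmedDataComplete(0, new LastConfirmedData(41L, 4096L));
    }
}
```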

- Sijie


On 2012-02-10 15:23:20, Ivan Kelly wrote:
bq.  
bq.  -----------------------------------------------------------
bq.  This is an automatically generated e-mail. To reply, visit:
bq.  https://reviews.apache.org/r/3737/
bq.  -----------------------------------------------------------
bq.  
bq.  (Updated 2012-02-10 15:23:20)
bq.  
bq.  
bq.  Review request for bookkeeper.


                


        

[jira] [Commented] (BOOKKEEPER-152) Can't recover a ledger whose current ensemble contain failed bookie.

Posted by "jiraposter@reviews.apache.org (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/BOOKKEEPER-152?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13205319#comment-13205319 ] 

jiraposter@reviews.apache.org commented on BOOKKEEPER-152:
----------------------------------------------------------


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/3737/
-----------------------------------------------------------

(Updated 2012-02-10 09:47:35.723185)


Review request for bookkeeper.


Summary
-------

The proposed fix ensures that at least one bookie from each quorum replies to ReadLastConfirmed.

It also refactors the code a bit so that the read-last-confirmed logic is shared by recovery and standalone read last confirmed.

The bug here was actually that we were waiting for quorumSize responses from the bookies, when really all we need is a response from one bookie in each possible quorum. In the 2/2 case above, this means only 1 bookie needs to respond.

There's also a fix for the timeouts and an improvement in fencing, both prompted by problems that fixing this uncovered.


This addresses bug BOOKKEEPER-152.
    https://issues.apache.org/jira/browse/BOOKKEEPER-152


Diffs (updated)
-----

  bookkeeper-server/src/main/java/org/apache/bookkeeper/proto/PerChannelBookieClient.java a68fc8c 
  bookkeeper-server/src/test/java/org/apache/bookkeeper/client/BookieRecoveryTest.java cbd2277 
  bookkeeper-server/src/test/java/org/apache/bookkeeper/test/BaseTestCase.java da52ca5 
  bookkeeper-server/src/test/java/org/apache/bookkeeper/test/BookieFailureTest.java 5873255 
  bookkeeper-server/src/test/java/org/apache/bookkeeper/test/LedgerRecoveryTest.java 77a2f69 
  bookkeeper-server/src/main/java/org/apache/bookkeeper/client/RoundRobinDistributionSchedule.java 4a88747 
  bookkeeper-server/src/main/java/org/apache/bookkeeper/client/DistributionSchedule.java f2ed6bd 
  bookkeeper-server/src/main/java/org/apache/bookkeeper/client/LedgerHandle.java e3d1847 
  bookkeeper-server/src/main/java/org/apache/bookkeeper/client/LedgerRecoveryOp.java 4625bbb 
  bookkeeper-server/src/main/java/org/apache/bookkeeper/client/ReadLastConfirmedOp.java 43e999d 

Diff: https://reviews.apache.org/r/3737/diff


Testing
-------


Thanks,

Ivan


                


        

[jira] [Updated] (BOOKKEEPER-152) Can't recover a ledger whose current ensemble contain failed bookie.

Posted by "Ivan Kelly (Updated) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/BOOKKEEPER-152?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ivan Kelly updated BOOKKEEPER-152:
----------------------------------

    Attachment: BOOKKEEPER-152.diff

The previous patch wasn't passing all tests. LedgerRecoveryTest was failing. This one fixes that problem.
                
