Posted to dev@bookkeeper.apache.org by "Ivan Kelly (Updated) (JIRA)" <ji...@apache.org> on 2012/02/02 19:18:53 UTC

[jira] [Updated] (BOOKKEEPER-152) Can't recover a ledger whose current ensemble contain failed bookie.

     [ https://issues.apache.org/jira/browse/BOOKKEEPER-152?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ivan Kelly updated BOOKKEEPER-152:
----------------------------------

    Attachment: BOOKKEEPER-152.diff

The proposed fix ensures that at least one bookie from each quorum replies to ReadLastConfirmed.

It also refactors the code a bit so that the read-last-confirmed logic is shared between recovery and the standalone read last confirmed.

The bug here was that we were waiting for quorumSize responses from the bookies, when all we really need is a response from one bookie in each possible quorum. In the 2/2 case from the issue description, this means only one bookie needs to respond.
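
To illustrate the condition, here is a minimal sketch, not the code in the attached patch. It assumes entries are striped round-robin, so that the quorum for entry i is the bookies at ensemble indexes (i + j) % ensembleSize for j in [0, quorumSize). The class and method names are hypothetical.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

public class QuorumCoverageSketch {
    /**
     * Returns true if at least one bookie in every possible quorum has responded.
     * Assumption (for illustration only): round-robin striping, where the quorum
     * starting at ensemble index s is {s, s+1, ..., s+quorumSize-1} mod ensembleSize.
     */
    public static boolean coversEveryQuorum(int ensembleSize, int quorumSize,
                                            Set<Integer> responded) {
        for (int start = 0; start < ensembleSize; start++) {
            boolean covered = false;
            for (int j = 0; j < quorumSize; j++) {
                if (responded.contains((start + j) % ensembleSize)) {
                    covered = true;
                    break;
                }
            }
            if (!covered) {
                // Every bookie in this quorum is silent, so an entry stored
                // only on this quorum could be missed.
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        // Ensemble <bk1, bk2> at indexes 0 and 1; bk2 has crashed, only bk1 replies.
        Set<Integer> onlyBk1 = new HashSet<>(Arrays.asList(0));
        System.out.println(coversEveryQuorum(2, 2, onlyBk1)); // prints "true"

        // With ensemble size 3 and quorum size 2, one reply is not enough:
        // the quorum {1, 2} has no responder.
        System.out.println(coversEveryQuorum(3, 2, onlyBk1)); // prints "false"
    }
}
```

Under this assumption, waiting for quorumSize responses is stricter than necessary: in the 2/2 case a single reply from bk1 already touches every quorum.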

The patch also includes a fix for the timeouts and an improvement in fencing, which fixing this uncovered.
                
> Can't recover a ledger whose current ensemble contain failed bookie.
> --------------------------------------------------------------------
>
>                 Key: BOOKKEEPER-152
>                 URL: https://issues.apache.org/jira/browse/BOOKKEEPER-152
>             Project: Bookkeeper
>          Issue Type: Bug
>          Components: bookkeeper-client
>    Affects Versions: 4.0.0
>            Reporter: Sijie Guo
>             Fix For: 4.1.0
>
>         Attachments: BK-152.draft.patch, BOOKKEEPER-152.diff
>
>
> Suppose we have an unclosed ledger L whose ensemble size is 2 and quorum size is 2. The ledger's current ensemble is <bk1, bk2>.
> bk2 has crashed.
> We use the recovery tool to recover the entries on bk2:
> $ bookkeeper-server/bin/bookkeeper org.apache.bookkeeper.tools.BookKeeperTools bk2
> Recovery fails because the recovery tool can't open ledger L: too few bookies respond to form the quorum needed to read the last confirmed entry (asyncOpenLedgerNoRecovery).
