Posted to dev@bookkeeper.apache.org by "Sijie Guo (JIRA)" <ji...@apache.org> on 2012/12/12 06:47:27 UTC

[jira] [Updated] (BOOKKEEPER-365) Ledger will never recover if one of the quorum bookie is down forever and others dont have entry

     [ https://issues.apache.org/jira/browse/BOOKKEEPER-365?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sijie Guo updated BOOKKEEPER-365:
---------------------------------

    Attachment: BOOKKEEPER-365.diff

Attached a patch that tries to address the issue by distinguishing write_quorum_size from ack_quorum_size.

The patch includes:

1) A test case that reproduces the issue with write_quorum_size and ack_quorum_size set separately.
2) LedgerRecoveryTest is moved to the client package, for easier access to ledger metadata. This also makes more sense, since ledger recovery is a client procedure.
3) The fix counts how many missed reads (NoSuchLedger/NoSuchEntry) were received. If that count reaches (write_quorum_size - ack_quorum_size + 1) or more, and no other significant exception (like ReadException) was seen during reading, the entry is treated as missing and NoSuchEntry is returned to the client.
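
The rule in item 3 can be sketched as follows. This is only an illustrative sketch, not the code in BOOKKEEPER-365.diff: the class, enum, and method names are hypothetical, "reaches more than" is read here as "at least (write_quorum_size - ack_quorum_size + 1)", and a bookie that is simply unreachable is assumed not to count as a blocking exception. The reasoning behind the threshold is that an acknowledged entry must exist on at least ack_quorum_size bookies of its write quorum, so once (write_quorum_size - ack_quorum_size + 1) of them report it missing, fewer than ack_quorum_size can hold it.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch of the missed-read rule; none of these names come from the patch.
final class MissedReadSketch {
    // Simplified per-bookie read results (assumed categories).
    enum Response { OK, NO_SUCH_ENTRY, NO_SUCH_LEDGER, READ_ERROR, UNAVAILABLE }

    enum Outcome { ENTRY_FOUND, ENTRY_MISSING, UNDECIDED }

    static Outcome decide(int writeQuorumSize, int ackQuorumSize, List<Response> responses) {
        int missed = 0;
        boolean otherError = false;
        for (Response r : responses) {
            switch (r) {
                case OK:
                    return Outcome.ENTRY_FOUND;   // any successful read settles it
                case NO_SUCH_ENTRY:
                case NO_SUCH_LEDGER:
                    missed++;                     // bookie answered, but has no such entry
                    break;
                case READ_ERROR:
                    otherError = true;            // entry may exist but could not be read
                    break;
                default:
                    break;                        // UNAVAILABLE: assumed not to block the decision
            }
        }
        // Assumed threshold: with this many "missing" answers, fewer than
        // ackQuorumSize bookies can hold the entry, so it was never acknowledged.
        int threshold = writeQuorumSize - ackQuorumSize + 1;
        if (!otherError && missed >= threshold) {
            return Outcome.ENTRY_MISSING;
        }
        return Outcome.UNDECIDED;
    }

    public static void main(String[] args) {
        // write quorum 3, ack quorum 2: one bookie down, two report the entry missing.
        System.out.println(decide(3, 2,
                Arrays.asList(Response.UNAVAILABLE, Response.NO_SUCH_ENTRY, Response.NO_SUCH_LEDGER)));
        // prints ENTRY_MISSING
    }
}
```

With write_quorum_size = 3 and ack_quorum_size = 2, a single missed read next to a dead bookie stays undecided under this sketch, while two missed reads are enough to report the entry as missing.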
                
> Ledger will never recover if one of the quorum bookie is down forever and others dont have entry
> ------------------------------------------------------------------------------------------------
>
>                 Key: BOOKKEEPER-365
>                 URL: https://issues.apache.org/jira/browse/BOOKKEEPER-365
>             Project: Bookkeeper
>          Issue Type: Bug
>    Affects Versions: 4.0.0, 4.1.0
>            Reporter: Sijie Guo
>            Assignee: Yixue (Andrew) Zhu
>             Fix For: 4.2.0
>
>         Attachments: BOOKKEEPER-365.diff
>
>
> As discussed in BOOKKEEPER-355, the current fix for the issue below is not correct. We need to find a new solution.
> If some bookies of a quorum are gone forever, and the bookies of that quorum that are still alive don't have the entry (NoSuchEntry or NoSuchLedger), then there is no evidence available to recover/close the ledger.
