Posted to dev@bookkeeper.apache.org by "Sijie Guo (JIRA)" <ji...@apache.org> on 2012/12/13 20:38:11 UTC

[jira] [Commented] (BOOKKEEPER-355) Ledger recovery will mark ledger as closed with -1, in case of slow bookie is added to ensemble during recovery add

    [ https://issues.apache.org/jira/browse/BOOKKEEPER-355?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13531373#comment-13531373 ] 

Sijie Guo commented on BOOKKEEPER-355:
--------------------------------------

Two minor questions about ledger close need clarification.

After this change, suppose two clients call openLedger and attempt ledger recovery on the same ledger at the same time. The two clients might then recover the ledger with different ensembles. When they close the ledger, there are two possible cases:

1) Client 1 closes the ledger successfully. Client 2 tries to update the ledger metadata with its recovered ensemble but fails. Client 2 rereads the ledger metadata and finds that the conflicts between the two versions of the metadata can be resolved, so it updates the ledger metadata with its own version, overwriting client 1's.

For example, ensemble (A, B, C) is changed to (A, B, D) by client 1 and then overwritten with (A, B, E) by client 2.
Client 1 still sees the ensemble as (A, B, D), not (A, B, E). Both (A, B, D) and (A, B, E) are logically correct, but the two clients now hold different views of the same ledger. Could this be a problem? If not, we need to clarify it.
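Case 1 can be illustrated with a minimal Python sketch. It assumes the ledger metadata lives in a store with versioned compare-and-set writes (loosely modelled on a ZooKeeper znode) and that client 2's conflict check decides the conflict is resolvable. `MetadataStore`, `Metadata`, `Client`, `BadVersion` and `close_with` are hypothetical names for illustration, not BookKeeper's API.

```python
from dataclasses import dataclass


class BadVersion(Exception):
    """Raised when a conditional write carries a stale version (hypothetical)."""


@dataclass(frozen=True)
class Metadata:
    ensemble: tuple
    closed: bool = False


class MetadataStore:
    """Hypothetical versioned store, loosely modelled on a ZooKeeper znode."""

    def __init__(self, metadata):
        self._metadata = metadata
        self._version = 0

    def read(self):
        return self._metadata, self._version

    def write(self, metadata, expected_version):
        # Conditional write: succeeds only if nobody wrote since our read.
        if expected_version != self._version:
            raise BadVersion()
        self._metadata = metadata
        self._version += 1
        return self._version


class Client:
    def __init__(self, store):
        # Cached view taken when the client opens the ledger.
        self.store = store
        self.metadata, self.version = store.read()

    def close_with(self, ensemble):
        new = Metadata(ensemble=ensemble, closed=True)
        try:
            self.version = self.store.write(new, self.version)
        except BadVersion:
            # Case 1: reread, assume the conflict is resolvable, and
            # overwrite the other client's metadata with our own.
            _, latest_version = self.store.read()
            self.version = self.store.write(new, latest_version)
        self.metadata = new


store = MetadataStore(Metadata(ensemble=("A", "B", "C")))
client1 = Client(store)
client2 = Client(store)
client1.close_with(("A", "B", "D"))  # version 0 -> 1
client2.close_with(("A", "B", "E"))  # BadVersion, reread, version 1 -> 2
print(client1.metadata.ensemble)     # ('A', 'B', 'D'): client 1's cached view
print(store.read()[0].ensemble)      # ('A', 'B', 'E'): what is actually stored
```

Nothing in this model notifies client 1 that its metadata was replaced, which is exactly the divergence the question asks about.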

2) If the conflict between the two versions of the ledger metadata cannot be resolved, one client fails in openLedger. The behavior of two concurrent readers of an unclosed ledger is then undefined, as in 1). So the question is: if the ledger has already been closed by another client, should we fail the client that is trying to open the ledger? If not, we need to improve the close logic; otherwise, we need to clarify the undefined behavior.

For example, in client 1, ensemble N => (A, B, C) is changed to N => (A, B, D) and M => (A, B, E) (N < M), while in client 2, ensemble N => (A, B, C) is changed to N => (A, B, E).
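The two policies in case 2 can be contrasted with a small Python sketch. Ensembles are modelled as maps from a fragment's first entry id to its bookies. The resolvability rule (two views are reconcilable only if their fragments start at the same entry ids) is an assumption made for illustration, not BookKeeper's actual conflict check, and `conflict_is_resolvable`, `handle_close_conflict` and the `adopt_if_unresolvable` flag are hypothetical. The concrete values N = 0 and M = 5 are likewise illustrative.

```python
def conflict_is_resolvable(mine, theirs):
    """Hypothetical rule, for illustration only.

    Each argument maps a fragment's first entry id to its ensemble. The two
    views are treated as reconcilable only if they agree on where fragments
    start, i.e. they differ only in which replacement bookie was chosen.
    """
    return mine.keys() == theirs.keys()


def handle_close_conflict(mine, theirs, adopt_if_unresolvable=False):
    """Return the action a closing client takes after a version conflict."""
    if conflict_is_resolvable(mine, theirs):
        return "overwrite", mine   # case 1: our metadata replaces theirs
    if adopt_if_unresolvable:
        return "adopt", theirs     # hypothetical alternative to failing
    return "fail", None            # case 2 as described: the client fails


N, M = 0, 5  # illustrative entry ids with N < M
client1 = {N: ("A", "B", "D"), M: ("A", "B", "E")}  # closed the ledger first
client2 = {N: ("A", "B", "E")}

print(conflict_is_resolvable(client1, client2))            # False
print(handle_close_conflict(client2, client1)[0])          # fail
print(handle_close_conflict(client2, client1, True)[0])    # adopt
```

Under the "adopt" policy, client 2 would treat client 1's closed metadata as authoritative instead of failing in openLedger; whether that is safe is the open question above.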


> Ledger recovery will mark ledger as closed with -1, in case of slow bookie is added to ensemble during  recovery add
> --------------------------------------------------------------------------------------------------------------------
>
>                 Key: BOOKKEEPER-355
>                 URL: https://issues.apache.org/jira/browse/BOOKKEEPER-355
>             Project: Bookkeeper
>          Issue Type: Bug
>          Components: bookkeeper-server
>    Affects Versions: 4.1.0, 4.2.0
>            Reporter: Vinay
>            Assignee: Ivan Kelly
>             Fix For: 4.2.0
>
>         Attachments: 0001-BOOKKEEPER-355-Ledger-recovery-will-mark-ledger-as-c.patch, BOOKKEEPER-355.patch, BOOKKEEPER-355.patch
>
>
> Scenario:
> ------------
> 1. A ledger is created with ensemble and quorum size 2, and one entry is written to it.
> 2. The first bookie in the ensemble is brought down.
> 3. Another client fences the ledger and tries to recover it.
> 4. During recovery, an ensemble change happens and a new bookie is added, but this bookie is unreachable.
> 5. This recovery fails.
> 7. The newly added bookie then comes up.
> 8. Another client tries to recover the same ledger.
> 9. Since the new bookie is first in the ensemble, doRecoveryRead() reads from that bookie, gets a NoSuchLedgerException, and closes the ledger with -1,
> i.e. it marks the ledger as empty, even though the first client had successfully written one entry.
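The failure in step 9 can be reproduced with a toy Python model. It assumes, as the scenario states, that recovery reads only from the first bookie in the ensemble and treats NoSuchLedger as "ledger is empty". The in-memory bookies and the names `NoSuchLedger`, `last_entry_on`, `first_bookie_recovery` and `all_bookies_recovery` are hypothetical. The model ignores fencing, last-add-confirmed and quorum rules, and `all_bookies_recovery` is only a contrast, not the committed fix: it cannot tell an empty ledger from one whose data sits on unreachable bookies.

```python
class NoSuchLedger(Exception):
    """Stand-in for a bookie reporting that it has no data for the ledger."""


# Hypothetical in-memory bookies: name -> {ledger_id: [entries]}.
BOOKIES = {
    "A": {1: ["entry-0"]},  # original first bookie; down, no longer in the ensemble
    "B": {1: ["entry-0"]},
    "C": {},                # replacement added during the failed recovery; no data
}


def last_entry_on(bookies, bookie, ledger_id):
    entries = bookies[bookie].get(ledger_id)
    if entries is None:
        raise NoSuchLedger(bookie)
    return len(entries) - 1


def first_bookie_recovery(bookies, ensemble, ledger_id):
    # Models step 9: only the first bookie is asked, and NoSuchLedger is
    # taken to mean the ledger is empty, so it is closed with -1.
    try:
        return last_entry_on(bookies, ensemble[0], ledger_id)
    except NoSuchLedger:
        return -1


def all_bookies_recovery(bookies, ensemble, ledger_id):
    # Contrast only: NoSuchLedger from one bookie means that bookie holds
    # nothing; take the highest entry id seen on any bookie.
    last = -1
    for bookie in ensemble:
        try:
            last = max(last, last_entry_on(bookies, bookie, ledger_id))
        except NoSuchLedger:
            continue
    return last


ensemble = ["C", "B"]  # C replaced A at position 0
print(first_bookie_recovery(BOOKIES, ensemble, 1))  # -1: the written entry is lost
print(all_bookies_recovery(BOOKIES, ensemble, 1))   # 0
```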
