Posted to dev@bookkeeper.apache.org by "Sijie Guo (JIRA)" <ji...@apache.org> on 2012/11/01 01:23:12 UTC

[jira] [Commented] (BOOKKEEPER-447) Bookie can fail to recover if index pages flushed before ledger flush acknowledged

    [ https://issues.apache.org/jira/browse/BOOKKEEPER-447?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13488373#comment-13488373 ] 

Sijie Guo commented on BOOKKEEPER-447:
--------------------------------------

{quote}
Not reading data that has not been persisted can be achieved without having to delay inserting to the index or the log files.
{quote}

Changing the order of committing to the journal and adding to ledger storage doesn't affect when an entry becomes readable by a BookKeeper client, since BookKeeper guarantees that a client will not read an entry before that entry has been acked successfully. An ack means the entry has, at a minimum, been committed to the journal before the bookie responds to the client.
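
To make the argument concrete, here is a minimal toy model, not BookKeeper code. The class AckOrderSketch and all of its methods are hypothetical names invented for illustration, and the "journal" and "ledger storage" are plain in-memory collections. It only shows that, as long as the ack waits for the journal commit and clients only read acked entries, the order of the two internal steps does not change when an entry becomes readable.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Optional;
import java.util.Set;

/** Toy model of the add path; every name here is hypothetical, not a BookKeeper API. */
final class AckOrderSketch {
    private final Set<Long> journal = new HashSet<>();               // stands in for the committed journal
    private final Map<Long, String> ledgerStorage = new HashMap<>(); // stands in for index + entry log
    private long lastAcked = -1L;                                    // highest entry id acked to the client

    void commitToJournal(long entryId) {
        journal.add(entryId);
    }

    void addToLedgerStorage(long entryId, String data) {
        ledgerStorage.put(entryId, data);
    }

    /** The ack is only allowed once the journal holds the entry. */
    void ack(long entryId) {
        if (!journal.contains(entryId)) {
            throw new IllegalStateException("entry " + entryId + " is not in the journal yet");
        }
        lastAcked = Math.max(lastAcked, entryId);
    }

    /** Models a client that never asks for an entry beyond the last acked one. */
    Optional<String> clientRead(long entryId) {
        if (entryId > lastAcked) {
            return Optional.empty();
        }
        return Optional.ofNullable(ledgerStorage.get(entryId));
    }
}
```

In this sketch, calling addToLedgerStorage before commitToJournal (or the reverse) gives the same result from the client's side: clientRead returns nothing until ack has been called, and ack cannot happen before the journal commit.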
                
> Bookie can fail to recover if index pages flushed before ledger flush acknowledged
> ----------------------------------------------------------------------------------
>
>                 Key: BOOKKEEPER-447
>                 URL: https://issues.apache.org/jira/browse/BOOKKEEPER-447
>             Project: Bookkeeper
>          Issue Type: Bug
>          Components: bookkeeper-server
>    Affects Versions: 4.2.0
>            Reporter: Yixue (Andrew) Zhu
>            Assignee: Robin Dhamankar
>              Labels: patch
>             Fix For: 4.2.0
>
>
> Bookie index page stealing (LedgerCacheImpl::grabCleanPage) can cause the index file to reflect unacknowledged entries (due to flushLedger). If the ledger and entry then fail to flush because the BookKeeper server crashes, ledger recovery will not be able to use the bookie afterward, because InterleavedStorageLedger::getEntry throws IOException.
> If all the bookies in the ackSet experience this problem (DC environment), the ledger cannot be recovered.
> The problem here is essentially a violation of WAL (write-ahead logging). One reasonable fix is to track ledger flush progress (either per ledger entry or per topic message) and not flush index pages that track entries whose ledger (log) has not been flushed.
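
The fix proposed in the description above can be sketched roughly as follows. This is an illustration under assumptions, not the actual patch: WalGatedIndexSketch, EntryLog, IndexPage and tryFlushIndexPage are hypothetical names, offsets are simple counters, and no real I/O happens. Each index page remembers the highest entry-log offset it refers to, and a dirty page may only be written out once the log is known to be flushed up to that offset.

```java
/** Toy model of WAL-gated index flushing; all names are hypothetical, not BookKeeper APIs. */
final class WalGatedIndexSketch {

    /** Stand-in for the entry log: bytes appended versus bytes known to be durable. */
    static final class EntryLog {
        long appended = 0L;
        long flushed = 0L;

        /** Appends a record of the given size and returns its end offset. */
        long append(int size) {
            appended += size;
            return appended;
        }

        /** Pretends to force the log to disk. */
        void flush() {
            flushed = appended;
        }
    }

    /** Stand-in for one in-memory index page. */
    static final class IndexPage {
        boolean dirty = false;
        long maxLogOffset = 0L; // highest entry-log end offset this page points at

        void record(long logEndOffset) {
            dirty = true;
            maxLogOffset = Math.max(maxLogOffset, logEndOffset);
        }
    }

    private final EntryLog log;

    WalGatedIndexSketch(EntryLog log) {
        this.log = log;
    }

    /**
     * Writes the page out only if every entry it tracks is already durable in the log.
     * Returns false when the page must stay in memory (the log has to be flushed first).
     */
    boolean tryFlushIndexPage(IndexPage page) {
        if (!page.dirty) {
            return true; // nothing to write
        }
        if (page.maxLogOffset > log.flushed) {
            return false; // would violate write-ahead ordering
        }
        page.dirty = false; // a real implementation would write the page to the index file here
        return true;
    }
}
```

With such a check in place, a page-steal path would have to either skip pages for which tryFlushIndexPage returns false or flush the log first and retry; which of the two is preferable is a design choice this sketch does not settle.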
