You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@bookkeeper.apache.org by "Ivan Kelly (Commented) (JIRA)" <ji...@apache.org> on 2011/10/25 17:46:32 UTC

[jira] [Commented] (BOOKKEEPER-93) bookkeeper doesn't work correctly on OpenLedgerNoRecovery

    [ https://issues.apache.org/jira/browse/BOOKKEEPER-93?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13135159#comment-13135159 ] 

Ivan Kelly commented on BOOKKEEPER-93:
--------------------------------------

1) Yikes, that's a big oversight. There is actually a test for it, BookieReadWriteTest#testReadFromOpenLedger, but the @Test annotation is missing from it so it never gets run. Also, the actual checking code seems to be wrong, as it tries to read from lh, not lhOpen (line 861). Could you break the fix for this problem into a single patch along with the fix for the test and ill commit that as BOOKKEEPER-91. 

2) This is unrelated to 1) so should be in a separate JIRA. Also, im unsure the race you describe can occur. ReadLastConfirmedOp#readEntryComplete is already synchronized.

3) Actually this could go into BOOKKEEPER-91. However, I think a better solution may be to do a ReadLastConfirmedOp in the else part of LedgerOpenOp#processResult. 
{code}
        if(!unsafe) {
            lh.recover(new GenericCallback<Void>() {
            @Override
            public void operationComplete(int rc, Void result) {
                if (rc != BKException.Code.OK) {
                    cb.openComplete(BKException.Code.LedgerRecoveryException, null, LedgerOpenOp.this.ctx);
                } else {
                    cb.openComplete(BKException.Code.OK, lh, LedgerOpenOp.this.ctx);
                }
            }
       } else {
           lh.asyncReadLastConfirmed(new ReadLastConfirmedCallback() {
               void readLastConfirmedComplete(int rc, long lastConfirmed, Object ctx) {
                   lh.lastAddConfirmed = lh.lastAddPushed = lastConfirmed;
                   cb.complete(rc, LedgerOpenOp.this.ctx);
               }
           });
       }
{code}

This way, a non recovery ledger will be able to read entries up to the point it was opened and no further. I think this should be correct behaviour, as otherwise it could be possible for the ledger to read an entry which hasn't been confirmed to the writer. If it hasn't been confirmed to the writer and the writer closes at that point. Which means the reader can read more than the writer, which I don't think affects correctness, but is a little ugly.
                
> bookkeeper doesn't work correctly on OpenLedgerNoRecovery
> ---------------------------------------------------------
>
>                 Key: BOOKKEEPER-93
>                 URL: https://issues.apache.org/jira/browse/BOOKKEEPER-93
>             Project: Bookkeeper
>          Issue Type: Bug
>    Affects Versions: 3.4.0
>            Reporter: Sijie Guo
>            Assignee: Sijie Guo
>             Fix For: 3.4.0
>
>         Attachments: bookkeeper-93.patch
>
>
> 1) bookkeeper hang when openLedgerNoRecovery, since LedgerOpenOp didn't trigger callback when opening ledger no recovery.
> 2) race condition in ReadLastConfirmOp
> ReadLastConfirmOp callback on readEntryComplete.
> a) first decrement numResponsePending
> b) then increment validResponses
> c) check validResponses to callback with OK
> b) check numResponsePending to callback with LedgerRecoveryException
> support two callbacks returns on readEntryComplete: A, B. (quorum/ensemble size : 2)
> a) A first decrement numResponsePending from 2 to 1.
> b) A increment validResponses from 0 to 1.
> c) B then decrement numResponsePending from 1 to 0.
> d) A check numResponsePending before B check validResponse, A found the numResponsePending is 0 now. A will callback with exception. But the right action is B check validResponse and callback with OK.
> 3) if an LegerHandle is opened by openLedgerNoRecovery, the lastAddConfirmed will be set to -1. so all read requests will be failed since readEntry id > lastAddConfirmed.
> so I suggested that if an LegerHandle is opened by openLegerNoRecovery, the ledgerHandle is under unsafeRead mode. close/write operations will be failed, read operations should not check condition entry_id > lastAddConfirmed.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira