Posted to dev@bookkeeper.apache.org by "Sijie Guo (Created) (JIRA)" <ji...@apache.org> on 2011/10/25 16:46:32 UTC

[jira] [Created] (BOOKKEEPER-93) bookkeeper doesn't work correctly on OpenLedgerNoRecovery

bookkeeper doesn't work correctly on OpenLedgerNoRecovery
---------------------------------------------------------

                 Key: BOOKKEEPER-93
                 URL: https://issues.apache.org/jira/browse/BOOKKEEPER-93
             Project: Bookkeeper
          Issue Type: Bug
    Affects Versions: 3.4.0
            Reporter: Sijie Guo
            Assignee: Sijie Guo


1) BookKeeper hangs on openLedgerNoRecovery, because LedgerOpenOp does not trigger the callback when a ledger is opened without recovery.

2) Race condition in ReadLastConfirmedOp

ReadLastConfirmedOp handles each response in readEntryComplete as follows:
a) first decrement numResponsesPending
b) then increment validResponses
c) check validResponses to call back with OK
d) check numResponsesPending to call back with LedgerRecoveryException

Suppose two callbacks, A and B, return on readEntryComplete (quorum/ensemble size: 2).

a) A first decrements numResponsesPending from 2 to 1.
b) A increments validResponses from 0 to 1.
c) B then decrements numResponsesPending from 1 to 0.
d) A checks numResponsesPending before B checks validResponses, and A finds that numResponsesPending is now 0. A will call back with an exception, but the right action is for B to check validResponses and call back with OK.

3) If a LedgerHandle is opened by openLedgerNoRecovery, lastAddConfirmed is set to -1, so all read requests fail because the requested entry id > lastAddConfirmed.

So I suggest that a LedgerHandle opened by openLedgerNoRecovery be put in unsafeRead mode: close/write operations fail, and read operations do not check the condition entry_id > lastAddConfirmed.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (BOOKKEEPER-93) bookkeeper doesn't work correctly on OpenLedgerNoRecovery

Posted by "Ivan Kelly (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/BOOKKEEPER-93?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13135846#comment-13135846 ] 

Ivan Kelly commented on BOOKKEEPER-93:
--------------------------------------

My previous comment was incomplete. The changes should also be tested. The whole reason the bug exists is a lack of testing in the first place. The easiest thing is to simply extend the BookieReadWriteTest for this case to ensure that add fails on lhOpen, and that the ledger metadata isn't closed after lhOpen is called.

I'm still confused by the callback issue on ReadLastConfirmedOp. The only scenario where the callback can be called twice is where it receives more responses than it made requests. This discussion should continue on BOOKKEEPER-94.
                
> bookkeeper doesn't work correctly on OpenLedgerNoRecovery
> ---------------------------------------------------------
>
>                 Key: BOOKKEEPER-93
>                 URL: https://issues.apache.org/jira/browse/BOOKKEEPER-93
>             Project: Bookkeeper
>          Issue Type: Bug
>    Affects Versions: 3.4.0
>            Reporter: Sijie Guo
>            Assignee: Sijie Guo
>             Fix For: 3.4.0
>
>         Attachments: bookkeeper-93.patch, bookkeeper-93_v2.patch

        

[jira] [Commented] (BOOKKEEPER-93) bookkeeper doesn't work correctly on OpenLedgerNoRecovery

Posted by "Ivan Kelly (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/BOOKKEEPER-93?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13137278#comment-13137278 ] 

Ivan Kelly commented on BOOKKEEPER-93:
--------------------------------------

+1

Committed as r1189867. Thanks, Sijie.
                

        

[jira] [Commented] (BOOKKEEPER-93) bookkeeper doesn't work correctly on OpenLedgerNoRecovery

Posted by "Ivan Kelly (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/BOOKKEEPER-93?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13135159#comment-13135159 ] 

Ivan Kelly commented on BOOKKEEPER-93:
--------------------------------------

1) Yikes, that's a big oversight. There is actually a test for it, BookieReadWriteTest#testReadFromOpenLedger, but the @Test annotation is missing from it so it never gets run. Also, the actual checking code seems to be wrong, as it tries to read from lh, not lhOpen (line 861). Could you break the fix for this problem into a single patch along with the fix for the test, and I'll commit that as BOOKKEEPER-91?

2) This is unrelated to 1) so should be in a separate JIRA. Also, I'm unsure the race you describe can occur. ReadLastConfirmedOp#readEntryComplete is already synchronized.

3) Actually this could go into BOOKKEEPER-91. However, I think a better solution may be to do a ReadLastConfirmedOp in the else part of LedgerOpenOp#processResult. 
{code}
        if (!unsafe) {
            lh.recover(new GenericCallback<Void>() {
                @Override
                public void operationComplete(int rc, Void result) {
                    if (rc != BKException.Code.OK) {
                        cb.openComplete(BKException.Code.LedgerRecoveryException, null, LedgerOpenOp.this.ctx);
                    } else {
                        cb.openComplete(BKException.Code.OK, lh, LedgerOpenOp.this.ctx);
                    }
                }
            });
        } else {
            // Opened without recovery: snapshot the last confirmed entry so reads stop there.
            lh.asyncReadLastConfirmed(new ReadLastConfirmedCallback() {
                @Override
                public void readLastConfirmedComplete(int rc, long lastConfirmed, Object ctx) {
                    lh.lastAddConfirmed = lh.lastAddPushed = lastConfirmed;
                    cb.openComplete(rc, lh, LedgerOpenOp.this.ctx);
                }
            });
        }
{code}

This way, a non-recovery ledger handle will be able to read entries up to the point it was opened and no further. I think this should be the correct behaviour, as otherwise it could be possible for the reader to read an entry which hasn't been confirmed to the writer. If an entry hasn't been confirmed to the writer and the writer closes at that point, the reader can have read more than the writer, which I don't think affects correctness, but is a little ugly.
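
The read bound described above can be sketched roughly as follows. This is only an illustration, not BookKeeper code: the class `SnapshotLedgerReader` and its `canRead` method are hypothetical, and it assumes the only precondition on a read is that the entry id does not exceed lastAddConfirmed, as stated in the issue description.

```java
// Hypothetical sketch: a non-recovery reader that snapshots lastAddConfirmed at open time.
final class SnapshotLedgerReader {
    // Value the issue reports for a handle opened without recovery before the fix.
    static final long UNSET = -1L;

    // Fixed when the handle is opened; never advanced afterwards.
    private final long lastAddConfirmed;

    SnapshotLedgerReader(long lastAddConfirmedAtOpen) {
        this.lastAddConfirmed = lastAddConfirmedAtOpen;
    }

    // A read is accepted only for entries the writer had confirmed when the handle was opened.
    boolean canRead(long entryId) {
        return entryId >= 0 && entryId <= lastAddConfirmed;
    }

    public static void main(String[] args) {
        SnapshotLedgerReader unset = new SnapshotLedgerReader(UNSET);
        SnapshotLedgerReader snapshot = new SnapshotLedgerReader(4L);
        System.out.println(unset.canRead(0));    // false: with -1 every read is rejected
        System.out.println(snapshot.canRead(4)); // true: confirmed before the open
        System.out.println(snapshot.canRead(5)); // false: added after the open
    }
}
```

With the bound left at -1 no entry is readable, which matches item 3) of the report; with a snapshot taken at open time the reader sees exactly the entries confirmed up to that point.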
                

        

[jira] [Commented] (BOOKKEEPER-93) bookkeeper doesn't work correctly on OpenLedgerNoRecovery

Posted by "Sijie Guo (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/BOOKKEEPER-93?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13135611#comment-13135611 ] 

Sijie Guo commented on BOOKKEEPER-93:
-------------------------------------

Ivan,

> 2) This is unrelated to 1) so should be in a separate JIRA. Also, im unsure the race you describe can occur. ReadLastConfirmedOp#readEntryComplete is already synchronized.

You are right. readEntryComplete is synchronized, so there is no race condition on it.

The issue is that readLastConfirmedComplete will be triggered twice.

{code:title=ReadLastConfirmedOp.java|borderStyle=solid}
        // other return codes dont count as valid responses
        if ((validResponses >= lh.metadata.quorumSize) &&
                notComplete) {
            notComplete = false;
            if (LOG.isDebugEnabled()) {
                LOG.debug("Read Complete with enough validResponses");
            }
            cb.readLastConfirmedComplete(BKException.Code.OK, maxAddConfirmed, this.ctx);
            return;
        }

        if (numResponsesPending == 0) {
            // Have got all responses back but was still not enough, just fail the operation
            LOG.error("While readLastConfirmed ledger: " + ledgerId + " did not hear success responses from all quorums");
            cb.readLastConfirmedComplete(BKException.Code.LedgerRecoveryException, maxAddConfirmed, this.ctx);
        }
{code}

The last check will trigger readLastConfirmedComplete whether or not there are enough valid responses.

{quote}
2011-10-26 09:34:48,874 - DEBUG - [pool-174-thread-1:ReadLastConfirmedOp@90] - Read Complete with enough validResponses
2011-10-26 09:34:48,874 - ERROR - [pool-174-thread-1:ReadLastConfirmedOp@97] - While readLastConfirmed ledger: 1 did not hear success responses from
{quote}
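
One way to avoid the second callback is to guard both completion paths with the same flag. The sketch below is an assumption about the shape of such a fix, not the contents of the attached patch: `LastConfirmedTally`, `onResponse`, the `callbacks` list, and the integer result codes are hypothetical stand-ins for the real classes.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: count responses and complete at most once.
final class LastConfirmedTally {
    static final int OK = 0;              // stand-in for a success code
    static final int RECOVERY_ERROR = -1; // stand-in for a recovery failure code

    private final int quorumSize;
    private int numResponsesPending;
    private int validResponses = 0;
    private boolean completed = false;

    // Records every callback that was fired, so the behaviour can be inspected.
    final List<Integer> callbacks = new ArrayList<>();

    LastConfirmedTally(int ensembleSize, int quorumSize) {
        this.numResponsesPending = ensembleSize;
        this.quorumSize = quorumSize;
    }

    synchronized void onResponse(boolean valid) {
        numResponsesPending--;
        if (valid) {
            validResponses++;
        }
        if (completed) {
            return; // a result was already reported; ignore late responses
        }
        if (validResponses >= quorumSize) {
            completed = true;
            callbacks.add(OK);
        } else if (numResponsesPending == 0) {
            // All responses are back but the quorum was never reached.
            completed = true;
            callbacks.add(RECOVERY_ERROR);
        }
    }
}
```

Reading the snippet above, the double callback appears to need a quorum smaller than the ensemble: the first valid response reports OK, and the last response then finds numResponsesPending at 0 and reports the error as well. In the sketch, `new LastConfirmedTally(2, 1)` followed by two valid responses records a single OK.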
                

        

[jira] [Commented] (BOOKKEEPER-93) bookkeeper doesn't work correctly on OpenLedgerNoRecovery

Posted by "Ivan Kelly (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/BOOKKEEPER-93?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13135844#comment-13135844 ] 

Ivan Kelly commented on BOOKKEEPER-93:
--------------------------------------

I see you created BOOKKEEPER-94 for the test change. That change should actually be part of this JIRA. It's part 1) (the two-callback changes) which should be in the other JIRA, as it's unrelated, whereas 2) & 3) and the fix to testing are all the same thing.

Regarding 2 & 3, these changes look good. However, I'd change the unsafeRead flag to be called readOnly. Also, add a logging line before the addComplete in asyncAddEntry saying that the client tried to write on a read only ledger handle.
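
A rough sketch of what the readOnly guard with a log line could look like. Everything here is hypothetical and simplified: `ReadOnlyHandleSketch`, its `AddCallback` interface, and the `ILLEGAL_OP` code are illustrative names, not the LedgerHandle API or the patch itself.

```java
import java.util.logging.Logger;

// Hypothetical sketch: a handle that rejects writes when opened read-only.
final class ReadOnlyHandleSketch {
    private static final Logger LOG = Logger.getLogger(ReadOnlyHandleSketch.class.getName());

    static final int OK = 0;           // stand-in for a success code
    static final int ILLEGAL_OP = -1;  // stand-in for an "operation not allowed" code

    // Simplified callback; the real add callback carries more arguments.
    interface AddCallback {
        void addComplete(int rc, long entryId);
    }

    private final boolean readOnly;
    private long lastAddPushed = -1L;

    ReadOnlyHandleSketch(boolean readOnly) {
        this.readOnly = readOnly;
    }

    void asyncAddEntry(byte[] data, AddCallback cb) {
        if (readOnly) {
            // Log before completing the callback, as suggested in the review.
            LOG.warning("Client tried to add an entry on a read-only ledger handle");
            cb.addComplete(ILLEGAL_OP, -1L);
            return;
        }
        lastAddPushed++;
        cb.addComplete(OK, lastAddPushed);
    }
}
```

Failing the add through the callback, rather than throwing, keeps the asynchronous contract intact: the caller always hears back exactly once.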
                

        

[jira] [Updated] (BOOKKEEPER-93) bookkeeper doesn't work correctly on OpenLedgerNoRecovery

Posted by "Sijie Guo (Updated) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/BOOKKEEPER-93?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sijie Guo updated BOOKKEEPER-93:
--------------------------------

    Attachment: bookkeeper-93_v2.patch

Thanks to Ivan for the suggestions.

Fixes:

1) Avoid two callbacks in ReadLastConfirmedOp.

2) Use ReadLastConfirmedOp to set lastAddConfirmed when opening a ledger without recovery, so that every entry read has been confirmed to the writer.

3) Add unsafeRead to LedgerHandle to prevent close/write on it.
                

        

[jira] [Updated] (BOOKKEEPER-93) bookkeeper doesn't work correctly on OpenLedgerNoRecovery

Posted by "Sijie Guo (Updated) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/BOOKKEEPER-93?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sijie Guo updated BOOKKEEPER-93:
--------------------------------

    Attachment: bookkeeper-93.patch
    

        

[jira] [Updated] (BOOKKEEPER-93) bookkeeper doesn't work correctly on OpenLedgerNoRecovery

Posted by "Sijie Guo (Updated) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/BOOKKEEPER-93?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sijie Guo updated BOOKKEEPER-93:
--------------------------------

    Attachment: bookkeeper-93_v3.patch

Attached a new patch.

It adds tests for close/write on a read-only LedgerHandle to BookieReadWriteTest#testReadFromOpenLedger.
                