Posted to dev@bookkeeper.apache.org by "Sijie Guo (Created) (JIRA)" <ji...@apache.org> on 2012/01/03 15:44:39 UTC

[jira] [Created] (BOOKKEEPER-150) Entry is lost when recovering a ledger with not enough bookies.

Entry is lost when recovering a ledger with not enough bookies.
---------------------------------------------------------------

                 Key: BOOKKEEPER-150
                 URL: https://issues.apache.org/jira/browse/BOOKKEEPER-150
             Project: Bookkeeper
          Issue Type: Bug
          Components: bookkeeper-client
    Affects Versions: 4.0.0
            Reporter: Sijie Guo
            Assignee: Sijie Guo
             Fix For: 4.1.0


Suppose a ledger is created with ensemble size 3 and quorum size 3.
3 entries are added to this ledger, with entry ids 0, 1 and 2.

The ledger is not closed, and then one bookie server goes down.

The ledger is then opened, and it is recovered in the following steps:
1) Retrieve the LAC from the bookies in the ensemble to get maxAddConfirmed. maxAddPushed would be 2 and maxAddConfirmed would be 1, so lastAddConfirmed would be 1.
2) doRecovery reads entry lastAddConfirmed + 1 (2). The read returns the right data since there are still 2 replicas.
3) doRecovery adds entry 2, but the add fails since there are not enough bookies to form a new ensemble.
4) The ledger is closed with lastAddConfirmed (1), so entry 2 is lost.

This issue happened in the hub server. An old ledger is recovered and closed when ownership changes, so published messages would be lost.

We should not close the ledger when we encounter an exception during a recovery add; otherwise we would lose entries.
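
The steps above can be sketched as a toy model. This is only an illustration of the reported behaviour and of the proposed change, not BookKeeper's actual client code: the class RecoverySketch, the method recover and the flag closeOnAddFailure are hypothetical names, and "not enough bookies to form a new ensemble" is simplified to "fewer live bookies than the ensemble size".

```java
/** Toy model of the recovery steps in this report; not BookKeeper's real code. */
final class RecoverySketch {
    private RecoverySketch() {}

    /**
     * Simulates recovery of an unclosed ledger (all names are hypothetical).
     *
     * @param lastAddConfirmed  LAC learned in step 1 (1 in the report)
     * @param maxAddPushed      highest entry id still readable (2 in the report)
     * @param liveBookies       bookies currently available
     * @param ensembleSize      ensemble size of the ledger
     * @param closeOnAddFailure true models the reported behaviour,
     *                          false models the proposed behaviour
     * @return the last entry id the ledger is closed with
     */
    static long recover(long lastAddConfirmed, long maxAddPushed,
                        int liveBookies, int ensembleSize,
                        boolean closeOnAddFailure) {
        long lac = lastAddConfirmed;
        for (long entryId = lastAddConfirmed + 1; entryId <= maxAddPushed; entryId++) {
            // Step 2: the read of entryId is assumed to succeed here.
            // Step 3: the recovery add needs a full ensemble (simplifying assumption).
            if (liveBookies < ensembleSize) {
                if (closeOnAddFailure) {
                    // Step 4: close with the stale LAC; later entries are lost.
                    return lac;
                }
                // Proposed: fail the recovery and leave the ledger unclosed.
                throw new IllegalStateException(
                        "recovery add of entry " + entryId + " failed: not enough bookies");
            }
            lac = entryId; // the entry was re-replicated, so it is now confirmed
        }
        return lac;
    }

    public static void main(String[] args) {
        // Scenario from the report: 3 entries, ensemble 3, one bookie down.
        System.out.println("closed at entry " + recover(1, 2, 2, 3, true));
    }
}
```

With closeOnAddFailure set to true the model closes the ledger at entry 1 and drops entry 2. With it set to false the call throws instead, so a later recovery attempt with enough bookies can still close the ledger at entry 2.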

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Assigned] (BOOKKEEPER-150) Entry is lost when recovering a ledger with not enough bookies.

Posted by "Ivan Kelly (Assigned) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/BOOKKEEPER-150?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ivan Kelly reassigned BOOKKEEPER-150:
-------------------------------------

    Assignee: Ivan Kelly  (was: Sijie Guo)
    


[jira] [Updated] (BOOKKEEPER-150) Entry is lost when recovering a ledger with not enough bookies.

Posted by "Sijie Guo (Updated) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/BOOKKEEPER-150?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sijie Guo updated BOOKKEEPER-150:
---------------------------------

    Attachment: BOOKKEEPER-150.patch

Attached a patch that includes a test case to reproduce this issue and the code to fix it.
                


[jira] [Commented] (BOOKKEEPER-150) Entry is lost when recovering a ledger with not enough bookies.

Posted by "Hudson (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/BOOKKEEPER-150?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13187788#comment-13187788 ] 

Hudson commented on BOOKKEEPER-150:
-----------------------------------

Integrated in bookkeeper-trunk #320 (See [https://builds.apache.org/job/bookkeeper-trunk/320/])
    BOOKKEEPER-150: Entry is lost when recovering a ledger with not enough bookies. (Sijie Guo via ivank)

ivank : 
Files : 
* /zookeeper/bookkeeper/trunk/CHANGES.txt
* /zookeeper/bookkeeper/trunk/bookkeeper-server/src/main/java/org/apache/bookkeeper/client/LedgerHandle.java
* /zookeeper/bookkeeper/trunk/bookkeeper-server/src/main/java/org/apache/bookkeeper/client/LedgerMetadata.java
* /zookeeper/bookkeeper/trunk/bookkeeper-server/src/test/java/org/apache/bookkeeper/client/BookieRecoveryTest.java
* /zookeeper/bookkeeper/trunk/bookkeeper-server/src/test/java/org/apache/bookkeeper/test/BaseTestCase.java
* /zookeeper/bookkeeper/trunk/bookkeeper-server/src/test/java/org/apache/bookkeeper/test/LedgerRecoveryTest.java

                


[jira] [Commented] (BOOKKEEPER-150) Entry is lost when recovering a ledger with not enough bookies.

Posted by "Flavio Junqueira (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/BOOKKEEPER-150?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13179063#comment-13179063 ] 

Flavio Junqueira commented on BOOKKEEPER-150:
---------------------------------------------

I agree, it sounds wrong to close the ledger in that case.
                
