You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@bookkeeper.apache.org by Sijie Guo <gu...@gmail.com> on 2012/01/12 12:53:47 UTC

Review Request: BOOKKEEPER-112: Bookie Recovery on an open ledger will cause LedgerHandle#close on that ledger to fail

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/3472/
-----------------------------------------------------------

Review request for bookkeeper.


Summary
-------

Bookie recovery updates the ledger metadata in zookeeper. LedgerHandle will not get notified of this update, so it will try to write out its own ledger metadata, only to fail with KeeperException.BadVersion. This effectively fences all write operations on the LedgerHandle (close and addEntry). close will fail for obvious reasons. addEntry will fail once it gets to the failed bookie in the schedule, tries to write, fails, selects a new bookie and tries to update ledger metadata.

Update Line 605, testSyncBookieRecoveryToRandomBookiesCheckForDupes(), when done
Also, uncomment addEntry in TestFencing#testFencingInteractionWithBookieRecovery()


This addresses bug BOOKKEEPER-112.
    https://issues.apache.org/jira/browse/BOOKKEEPER-112


Diffs
-----

  bookkeeper-server/src/main/java/org/apache/bookkeeper/client/BookKeeperAdmin.java 37623dc 
  bookkeeper-server/src/main/java/org/apache/bookkeeper/client/LedgerHandle.java 8de20c9 
  bookkeeper-server/src/main/java/org/apache/bookkeeper/client/LedgerMetadata.java c353e46 
  bookkeeper-server/src/test/java/org/apache/bookkeeper/client/BookieRecoveryTest.java 80e46b9 
  bookkeeper-server/src/test/java/org/apache/bookkeeper/client/TestFencing.java e1e8449 

Diff: https://reviews.apache.org/r/3472/diff


Testing
-------


Thanks,

Sijie


Re: Review Request: BOOKKEEPER-112: Bookie Recovery on an open ledger will cause LedgerHandle#close on that ledger to fail

Posted by Sijie Guo <gu...@gmail.com>.

> On 2012-02-01 15:01:34, Ivan Kelly wrote:
> > I had a quick look over this. The exclude bookies stuff needs to be removed since BOOKKEEPER-23 gets rid of the need for it & it exposes internal details through the public api which is bad. The other fix in here looks ok, but I found it hard to pick it out with the excludeBookies stuff in there also.

 if the failed bookie existed in the quorum set which maxAddConfirmed entry belongs to, readLastConfirmed could not succeed, either open and openNoRecovery would fail. (this issue has been reported in BOOKKEEPER-152, but it seems that it is hard to separate into two patches. so I put them in this patch.)

so recovery tool needs failed bookies information as hints to tell readLastConfirmed to skip failed bookies. I think to change excludedBookies related api to protected, not expose to public.

what is your opinion?


- Sijie


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/3472/#review4745
-----------------------------------------------------------


On 2012-01-29 09:58:30, Sijie Guo wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/3472/
> -----------------------------------------------------------
> 
> (Updated 2012-01-29 09:58:30)
> 
> 
> Review request for bookkeeper.
> 
> 
> Summary
> -------
> 
> Bookie recovery updates the ledger metadata in zookeeper. LedgerHandle will not get notified of this update, so it will try to write out its own ledger metadata, only to fail with KeeperException.BadVersion. This effectively fences all write operations on the LedgerHandle (close and addEntry). close will fail for obvious reasons. addEntry will fail once it gets to the failed bookie in the schedule, tries to write, fails, selects a new bookie and tries to update ledger metadata.
> 
> Update Line 605, testSyncBookieRecoveryToRandomBookiesCheckForDupes(), when done
> Also, uncomment addEntry in TestFencing#testFencingInteractionWithBookieRecovery()
> 
> 
> This addresses bug BOOKKEEPER-112.
>     https://issues.apache.org/jira/browse/BOOKKEEPER-112
> 
> 
> Diffs
> -----
> 
>   bookkeeper-server/src/main/java/org/apache/bookkeeper/client/BookKeeper.java 5bb37c3 
>   bookkeeper-server/src/main/java/org/apache/bookkeeper/client/BookKeeperAdmin.java 37623dc 
>   bookkeeper-server/src/main/java/org/apache/bookkeeper/client/LedgerHandle.java 547e240 
>   bookkeeper-server/src/main/java/org/apache/bookkeeper/client/LedgerMetadata.java b403aa1 
>   bookkeeper-server/src/main/java/org/apache/bookkeeper/client/LedgerOpenOp.java 56186ab 
>   bookkeeper-server/src/main/java/org/apache/bookkeeper/client/LedgerRecoveryOp.java 4625bbb 
>   bookkeeper-server/src/main/java/org/apache/bookkeeper/client/PendingReadOp.java 29070eb 
>   bookkeeper-server/src/main/java/org/apache/bookkeeper/client/ReadLastConfirmedOp.java 43e999d 
>   bookkeeper-server/src/test/java/org/apache/bookkeeper/client/BookieRecoveryTest.java 8526db5 
>   bookkeeper-server/src/test/java/org/apache/bookkeeper/client/TestFencing.java 015e4e4 
>   bookkeeper-server/src/test/java/org/apache/bookkeeper/test/LedgerOpenTest.java PRE-CREATION 
> 
> Diff: https://reviews.apache.org/r/3472/diff
> 
> 
> Testing
> -------
> 
> 
> Thanks,
> 
> Sijie
> 
>


Re: Review Request: BOOKKEEPER-112: Bookie Recovery on an open ledger will cause LedgerHandle#close on that ledger to fail

Posted by Ivan Kelly <iv...@apache.org>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/3472/#review4745
-----------------------------------------------------------


I had a quick look over this. The exclude bookies stuff needs to be removed since BOOKKEEPER-23 gets rid of the need for it & it exposes internal details through the public api which is bad. The other fix in here looks ok, but I found it hard to pick it out with the excludeBookies stuff in there also.

- Ivan


On 2012-01-29 09:58:30, Sijie Guo wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/3472/
> -----------------------------------------------------------
> 
> (Updated 2012-01-29 09:58:30)
> 
> 
> Review request for bookkeeper.
> 
> 
> Summary
> -------
> 
> Bookie recovery updates the ledger metadata in zookeeper. LedgerHandle will not get notified of this update, so it will try to write out its own ledger metadata, only to fail with KeeperException.BadVersion. This effectively fences all write operations on the LedgerHandle (close and addEntry). close will fail for obvious reasons. addEntry will fail once it gets to the failed bookie in the schedule, tries to write, fails, selects a new bookie and tries to update ledger metadata.
> 
> Update Line 605, testSyncBookieRecoveryToRandomBookiesCheckForDupes(), when done
> Also, uncomment addEntry in TestFencing#testFencingInteractionWithBookieRecovery()
> 
> 
> This addresses bug BOOKKEEPER-112.
>     https://issues.apache.org/jira/browse/BOOKKEEPER-112
> 
> 
> Diffs
> -----
> 
>   bookkeeper-server/src/main/java/org/apache/bookkeeper/client/BookKeeper.java 5bb37c3 
>   bookkeeper-server/src/main/java/org/apache/bookkeeper/client/BookKeeperAdmin.java 37623dc 
>   bookkeeper-server/src/main/java/org/apache/bookkeeper/client/LedgerHandle.java 547e240 
>   bookkeeper-server/src/main/java/org/apache/bookkeeper/client/LedgerMetadata.java b403aa1 
>   bookkeeper-server/src/main/java/org/apache/bookkeeper/client/LedgerOpenOp.java 56186ab 
>   bookkeeper-server/src/main/java/org/apache/bookkeeper/client/LedgerRecoveryOp.java 4625bbb 
>   bookkeeper-server/src/main/java/org/apache/bookkeeper/client/PendingReadOp.java 29070eb 
>   bookkeeper-server/src/main/java/org/apache/bookkeeper/client/ReadLastConfirmedOp.java 43e999d 
>   bookkeeper-server/src/test/java/org/apache/bookkeeper/client/BookieRecoveryTest.java 8526db5 
>   bookkeeper-server/src/test/java/org/apache/bookkeeper/client/TestFencing.java 015e4e4 
>   bookkeeper-server/src/test/java/org/apache/bookkeeper/test/LedgerOpenTest.java PRE-CREATION 
> 
> Diff: https://reviews.apache.org/r/3472/diff
> 
> 
> Testing
> -------
> 
> 
> Thanks,
> 
> Sijie
> 
>


Re: Review Request: BOOKKEEPER-112: Bookie Recovery on an open ledger will cause LedgerHandle#close on that ledger to fail

Posted by Ivan Kelly <iv...@apache.org>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/3472/#review5324
-----------------------------------------------------------



bookkeeper-server/src/main/java/org/apache/bookkeeper/client/BookKeeperAdmin.java
<https://reviews.apache.org/r/3472/#comment11620>

    I don't think going up to lastAddConfirmed + 1 is enough. lastAddConfirmed can trail behind the last written entry by an undefined amount. For example, if we asyncAdd 10 entries, they will each have the same lastAddConfirmed in the packet if the 10th is sent before the 1st is acknowledged.
    
    The main problem is that bookie recovery reads each entry in parallel, while for this last bit, we need to read sequentially. I think after each ledger fragment successfully completes, this should kick off another operation, which steps up from lastAddConfirmed until it gets NoSuchEntry.
    


- Ivan


On 2012-02-24 17:09:54, Sijie Guo wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/3472/
> -----------------------------------------------------------
> 
> (Updated 2012-02-24 17:09:54)
> 
> 
> Review request for bookkeeper.
> 
> 
> Summary
> -------
> 
> Bookie recovery updates the ledger metadata in zookeeper. LedgerHandle will not get notified of this update, so it will try to write out its own ledger metadata, only to fail with KeeperException.BadVersion. This effectively fences all write operations on the LedgerHandle (close and addEntry). close will fail for obvious reasons. addEntry will fail once it gets to the failed bookie in the schedule, tries to write, fails, selects a new bookie and tries to update ledger metadata.
> 
> Update Line 605, testSyncBookieRecoveryToRandomBookiesCheckForDupes(), when done
> Also, uncomment addEntry in TestFencing#testFencingInteractionWithBookieRecovery()
> 
> 
> This addresses bug BOOKKEEPER-112.
>     https://issues.apache.org/jira/browse/BOOKKEEPER-112
> 
> 
> Diffs
> -----
> 
>   bookkeeper-server/src/main/java/org/apache/bookkeeper/client/BookKeeperAdmin.java 37623dc 
>   bookkeeper-server/src/main/java/org/apache/bookkeeper/client/LedgerHandle.java a94a0e5 
>   bookkeeper-server/src/main/java/org/apache/bookkeeper/client/LedgerMetadata.java b403aa1 
>   bookkeeper-server/src/main/java/org/apache/bookkeeper/client/LedgerRecoveryOp.java c67a79c 
>   bookkeeper-server/src/main/java/org/apache/bookkeeper/client/PendingReadOp.java 29070eb 
>   bookkeeper-server/src/test/java/org/apache/bookkeeper/client/BookieRecoveryTest.java 99258ac 
>   bookkeeper-server/src/test/java/org/apache/bookkeeper/client/TestFencing.java 015e4e4 
>   bookkeeper-server/src/test/java/org/apache/bookkeeper/test/BaseTestCase.java dada67a 
> 
> Diff: https://reviews.apache.org/r/3472/diff
> 
> 
> Testing
> -------
> 
> 
> Thanks,
> 
> Sijie
> 
>


Re: Review Request: BOOKKEEPER-112: Bookie Recovery on an open ledger will cause LedgerHandle#close on that ledger to fail

Posted by Ivan Kelly <iv...@apache.org>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/3472/#review5494
-----------------------------------------------------------



bookkeeper-server/src/main/java/org/apache/bookkeeper/client/BookKeeperAdmin.java
<https://reviews.apache.org/r/3472/#comment11911>

    Why not use the SafeOrderedExecutor as BookKeeper does?



bookkeeper-server/src/main/java/org/apache/bookkeeper/client/BookKeeperAdmin.java
<https://reviews.apache.org/r/3472/#comment11912>

    bitwise &, I guess you meant && here.



bookkeeper-server/src/main/java/org/apache/bookkeeper/client/BookKeeperAdmin.java
<https://reviews.apache.org/r/3472/#comment11913>

    I dont like boolean flags like readForward being passed to the constructor here. I think it would be better to have a class called SingleFragmentCallbackWithForwardRead which inherits from SingleFragmentCallback. You can then overload processResult on that.


- Ivan


On 2012-02-28 11:09:19, Sijie Guo wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/3472/
> -----------------------------------------------------------
> 
> (Updated 2012-02-28 11:09:19)
> 
> 
> Review request for bookkeeper.
> 
> 
> Summary
> -------
> 
> Bookie recovery updates the ledger metadata in zookeeper. LedgerHandle will not get notified of this update, so it will try to write out its own ledger metadata, only to fail with KeeperException.BadVersion. This effectively fences all write operations on the LedgerHandle (close and addEntry). close will fail for obvious reasons. addEntry will fail once it gets to the failed bookie in the schedule, tries to write, fails, selects a new bookie and tries to update ledger metadata.
> 
> Update Line 605, testSyncBookieRecoveryToRandomBookiesCheckForDupes(), when done
> Also, uncomment addEntry in TestFencing#testFencingInteractionWithBookieRecovery()
> 
> 
> This addresses bug BOOKKEEPER-112.
>     https://issues.apache.org/jira/browse/BOOKKEEPER-112
> 
> 
> Diffs
> -----
> 
>   bookkeeper-server/src/main/java/org/apache/bookkeeper/client/BookKeeperAdmin.java 37623dc 
>   bookkeeper-server/src/main/java/org/apache/bookkeeper/client/LedgerHandle.java a94a0e5 
>   bookkeeper-server/src/main/java/org/apache/bookkeeper/client/LedgerMetadata.java b403aa1 
>   bookkeeper-server/src/main/java/org/apache/bookkeeper/client/LedgerRecoveryOp.java c67a79c 
>   bookkeeper-server/src/main/java/org/apache/bookkeeper/client/PendingReadOp.java 29070eb 
>   bookkeeper-server/src/test/java/org/apache/bookkeeper/client/BookieRecoveryTest.java 99258ac 
>   bookkeeper-server/src/test/java/org/apache/bookkeeper/client/TestFencing.java 015e4e4 
>   bookkeeper-server/src/test/java/org/apache/bookkeeper/test/BaseTestCase.java dada67a 
> 
> Diff: https://reviews.apache.org/r/3472/diff
> 
> 
> Testing
> -------
> 
> 
> Thanks,
> 
> Sijie
> 
>


Re: Review Request: BOOKKEEPER-112: Bookie Recovery on an open ledger will cause LedgerHandle#close on that ledger to fail

Posted by Ivan Kelly <iv...@apache.org>.

> On 2012-03-29 12:44:18, Ivan Kelly wrote:
> > bookkeeper-server/src/main/java/org/apache/bookkeeper/client/BookKeeperAdmin.java, line 444
> > <https://reviews.apache.org/r/3472/diff/7/?file=94988#file94988line444>
> >
> >     Perhaps we should close lh here. It's a noop really, as a non recovery lh is a ReadOnlyLedgerHandle, where noop is just a stub. We should close it anyhow, just in case it changes to not being a noop in the future.

otherwise the patch looks good to me. +1


- Ivan


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/3472/#review6515
-----------------------------------------------------------


On 2012-03-23 15:30:15, Sijie Guo wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/3472/
> -----------------------------------------------------------
> 
> (Updated 2012-03-23 15:30:15)
> 
> 
> Review request for bookkeeper.
> 
> 
> Summary
> -------
> 
> Bookie recovery updates the ledger metadata in zookeeper. LedgerHandle will not get notified of this update, so it will try to write out its own ledger metadata, only to fail with KeeperException.BadVersion. This effectively fences all write operations on the LedgerHandle (close and addEntry). close will fail for obvious reasons. addEntry will fail once it gets to the failed bookie in the schedule, tries to write, fails, selects a new bookie and tries to update ledger metadata.
> 
> Update Line 605, testSyncBookieRecoveryToRandomBookiesCheckForDupes(), when done
> Also, uncomment addEntry in TestFencing#testFencingInteractionWithBookieRecovery()
> 
> 
> This addresses bug BOOKKEEPER-112.
>     https://issues.apache.org/jira/browse/BOOKKEEPER-112
> 
> 
> Diffs
> -----
> 
>   bookkeeper-server/src/main/java/org/apache/bookkeeper/client/BookKeeperAdmin.java 37623dc 
>   bookkeeper-server/src/main/java/org/apache/bookkeeper/client/LedgerHandle.java f71e53f 
>   bookkeeper-server/src/main/java/org/apache/bookkeeper/client/LedgerMetadata.java b403aa1 
>   bookkeeper-server/src/main/java/org/apache/bookkeeper/client/LedgerRecoveryOp.java c67a79c 
>   bookkeeper-server/src/main/java/org/apache/bookkeeper/client/PendingReadOp.java 539d6b2 
>   bookkeeper-server/src/test/java/org/apache/bookkeeper/client/BookieRecoveryTest.java b8923e8 
>   bookkeeper-server/src/test/java/org/apache/bookkeeper/client/TestFencing.java 7de1c10 
>   bookkeeper-server/src/test/java/org/apache/bookkeeper/test/BookKeeperClusterTestCase.java 0b882c6 
>   bookkeeper-server/src/test/java/org/apache/bookkeeper/test/CloseTest.java e28d32c 
> 
> Diff: https://reviews.apache.org/r/3472/diff
> 
> 
> Testing
> -------
> 
> 
> Thanks,
> 
> Sijie
> 
>


Re: Review Request: BOOKKEEPER-112: Bookie Recovery on an open ledger will cause LedgerHandle#close on that ledger to fail

Posted by Ivan Kelly <iv...@apache.org>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/3472/#review6515
-----------------------------------------------------------



bookkeeper-server/src/main/java/org/apache/bookkeeper/client/BookKeeperAdmin.java
<https://reviews.apache.org/r/3472/#comment14178>

    Perhaps we should close lh here. It's a noop really, as a non recovery lh is a ReadOnlyLedgerHandle, where noop is just a stub. We should close it anyhow, just in case it changes to not being a noop in the future.


- Ivan


On 2012-03-23 15:30:15, Sijie Guo wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/3472/
> -----------------------------------------------------------
> 
> (Updated 2012-03-23 15:30:15)
> 
> 
> Review request for bookkeeper.
> 
> 
> Summary
> -------
> 
> Bookie recovery updates the ledger metadata in zookeeper. LedgerHandle will not get notified of this update, so it will try to write out its own ledger metadata, only to fail with KeeperException.BadVersion. This effectively fences all write operations on the LedgerHandle (close and addEntry). close will fail for obvious reasons. addEntry will fail once it gets to the failed bookie in the schedule, tries to write, fails, selects a new bookie and tries to update ledger metadata.
> 
> Update Line 605, testSyncBookieRecoveryToRandomBookiesCheckForDupes(), when done
> Also, uncomment addEntry in TestFencing#testFencingInteractionWithBookieRecovery()
> 
> 
> This addresses bug BOOKKEEPER-112.
>     https://issues.apache.org/jira/browse/BOOKKEEPER-112
> 
> 
> Diffs
> -----
> 
>   bookkeeper-server/src/main/java/org/apache/bookkeeper/client/BookKeeperAdmin.java 37623dc 
>   bookkeeper-server/src/main/java/org/apache/bookkeeper/client/LedgerHandle.java f71e53f 
>   bookkeeper-server/src/main/java/org/apache/bookkeeper/client/LedgerMetadata.java b403aa1 
>   bookkeeper-server/src/main/java/org/apache/bookkeeper/client/LedgerRecoveryOp.java c67a79c 
>   bookkeeper-server/src/main/java/org/apache/bookkeeper/client/PendingReadOp.java 539d6b2 
>   bookkeeper-server/src/test/java/org/apache/bookkeeper/client/BookieRecoveryTest.java b8923e8 
>   bookkeeper-server/src/test/java/org/apache/bookkeeper/client/TestFencing.java 7de1c10 
>   bookkeeper-server/src/test/java/org/apache/bookkeeper/test/BookKeeperClusterTestCase.java 0b882c6 
>   bookkeeper-server/src/test/java/org/apache/bookkeeper/test/CloseTest.java e28d32c 
> 
> Diff: https://reviews.apache.org/r/3472/diff
> 
> 
> Testing
> -------
> 
> 
> Thanks,
> 
> Sijie
> 
>


Re: Review Request: BOOKKEEPER-112: Bookie Recovery on an open ledger will cause LedgerHandle#close on that ledger to fail

Posted by Sijie Guo <gu...@gmail.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/3472/
-----------------------------------------------------------

(Updated 2012-03-30 12:12:54.459832)


Review request for bookkeeper.


Changes
-------

adding one line to release permit when submit callback in PendingReadOp. similar as BOOKKEEPER-186.


Summary
-------

Bookie recovery updates the ledger metadata in zookeeper. LedgerHandle will not get notified of this update, so it will try to write out its own ledger metadata, only to fail with KeeperException.BadVersion. This effectively fences all write operations on the LedgerHandle (close and addEntry). close will fail for obvious reasons. addEntry will fail once it gets to the failed bookie in the schedule, tries to write, fails, selects a new bookie and tries to update ledger metadata.

Update Line 605, testSyncBookieRecoveryToRandomBookiesCheckForDupes(), when done
Also, uncomment addEntry in TestFencing#testFencingInteractionWithBookieRecovery()


This addresses bug BOOKKEEPER-112.
    https://issues.apache.org/jira/browse/BOOKKEEPER-112


Diffs (updated)
-----

  bookkeeper-server/src/main/java/org/apache/bookkeeper/client/BookKeeperAdmin.java 37623dc 
  bookkeeper-server/src/main/java/org/apache/bookkeeper/client/LedgerHandle.java f71e53f 
  bookkeeper-server/src/main/java/org/apache/bookkeeper/client/LedgerMetadata.java b403aa1 
  bookkeeper-server/src/main/java/org/apache/bookkeeper/client/LedgerRecoveryOp.java c67a79c 
  bookkeeper-server/src/main/java/org/apache/bookkeeper/client/PendingReadOp.java 539d6b2 
  bookkeeper-server/src/test/java/org/apache/bookkeeper/client/BookieRecoveryTest.java b8923e8 
  bookkeeper-server/src/test/java/org/apache/bookkeeper/client/TestFencing.java 7de1c10 
  bookkeeper-server/src/test/java/org/apache/bookkeeper/test/BookKeeperClusterTestCase.java 0b882c6 
  bookkeeper-server/src/test/java/org/apache/bookkeeper/test/CloseTest.java e28d32c 

Diff: https://reviews.apache.org/r/3472/diff


Testing
-------


Thanks,

Sijie


Re: Review Request: BOOKKEEPER-112: Bookie Recovery on an open ledger will cause LedgerHandle#close on that ledger to fail

Posted by Sijie Guo <gu...@gmail.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/3472/
-----------------------------------------------------------

(Updated 2012-03-29 13:29:23.834764)


Review request for bookkeeper.


Changes
-------

close opened non-recovery ledger handle as Ivan's suggestion.


Summary
-------

Bookie recovery updates the ledger metadata in zookeeper. LedgerHandle will not get notified of this update, so it will try to write out its own ledger metadata, only to fail with KeeperException.BadVersion. This effectively fences all write operations on the LedgerHandle (close and addEntry). close will fail for obvious reasons. addEntry will fail once it gets to the failed bookie in the schedule, tries to write, fails, selects a new bookie and tries to update ledger metadata.

Update Line 605, testSyncBookieRecoveryToRandomBookiesCheckForDupes(), when done
Also, uncomment addEntry in TestFencing#testFencingInteractionWithBookieRecovery()


This addresses bug BOOKKEEPER-112.
    https://issues.apache.org/jira/browse/BOOKKEEPER-112


Diffs (updated)
-----

  bookkeeper-server/src/main/java/org/apache/bookkeeper/client/BookKeeperAdmin.java 37623dc 
  bookkeeper-server/src/main/java/org/apache/bookkeeper/client/LedgerHandle.java f71e53f 
  bookkeeper-server/src/main/java/org/apache/bookkeeper/client/LedgerMetadata.java b403aa1 
  bookkeeper-server/src/main/java/org/apache/bookkeeper/client/LedgerRecoveryOp.java c67a79c 
  bookkeeper-server/src/main/java/org/apache/bookkeeper/client/PendingReadOp.java 539d6b2 
  bookkeeper-server/src/test/java/org/apache/bookkeeper/client/BookieRecoveryTest.java b8923e8 
  bookkeeper-server/src/test/java/org/apache/bookkeeper/client/TestFencing.java 7de1c10 
  bookkeeper-server/src/test/java/org/apache/bookkeeper/test/BookKeeperClusterTestCase.java 0b882c6 
  bookkeeper-server/src/test/java/org/apache/bookkeeper/test/CloseTest.java e28d32c 

Diff: https://reviews.apache.org/r/3472/diff


Testing
-------


Thanks,

Sijie


Re: Review Request: BOOKKEEPER-112: Bookie Recovery on an open ledger will cause LedgerHandle#close on that ledger to fail

Posted by Sijie Guo <gu...@gmail.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/3472/
-----------------------------------------------------------

(Updated 2012-03-23 15:30:15.796027)


Review request for bookkeeper.


Changes
-------

modify the patch according to the document.


Summary
-------

Bookie recovery updates the ledger metadata in zookeeper. LedgerHandle will not get notified of this update, so it will try to write out its own ledger metadata, only to fail with KeeperException.BadVersion. This effectively fences all write operations on the LedgerHandle (close and addEntry). close will fail for obvious reasons. addEntry will fail once it gets to the failed bookie in the schedule, tries to write, fails, selects a new bookie and tries to update ledger metadata.

Update Line 605, testSyncBookieRecoveryToRandomBookiesCheckForDupes(), when done
Also, uncomment addEntry in TestFencing#testFencingInteractionWithBookieRecovery()


This addresses bug BOOKKEEPER-112.
    https://issues.apache.org/jira/browse/BOOKKEEPER-112


Diffs (updated)
-----

  bookkeeper-server/src/main/java/org/apache/bookkeeper/client/BookKeeperAdmin.java 37623dc 
  bookkeeper-server/src/main/java/org/apache/bookkeeper/client/LedgerHandle.java f71e53f 
  bookkeeper-server/src/main/java/org/apache/bookkeeper/client/LedgerMetadata.java b403aa1 
  bookkeeper-server/src/main/java/org/apache/bookkeeper/client/LedgerRecoveryOp.java c67a79c 
  bookkeeper-server/src/main/java/org/apache/bookkeeper/client/PendingReadOp.java 539d6b2 
  bookkeeper-server/src/test/java/org/apache/bookkeeper/client/BookieRecoveryTest.java b8923e8 
  bookkeeper-server/src/test/java/org/apache/bookkeeper/client/TestFencing.java 7de1c10 
  bookkeeper-server/src/test/java/org/apache/bookkeeper/test/BookKeeperClusterTestCase.java 0b882c6 
  bookkeeper-server/src/test/java/org/apache/bookkeeper/test/CloseTest.java e28d32c 

Diff: https://reviews.apache.org/r/3472/diff


Testing
-------


Thanks,

Sijie


Re: Review Request: BOOKKEEPER-112: Bookie Recovery on an open ledger will cause LedgerHandle#close on that ledger to fail

Posted by Sijie Guo <gu...@gmail.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/3472/
-----------------------------------------------------------

(Updated 2012-03-02 11:27:34.221927)


Review request for bookkeeper.


Changes
-------

new patch remove readForward flag from constructor. and also remove ScheduledExecutorService, since we don't need to care about long-chain, since the callback will be triggered in bookie client's netty thread not in same thread. 


Summary
-------

Bookie recovery updates the ledger metadata in zookeeper. LedgerHandle will not get notified of this update, so it will try to write out its own ledger metadata, only to fail with KeeperException.BadVersion. This effectively fences all write operations on the LedgerHandle (close and addEntry). close will fail for obvious reasons. addEntry will fail once it gets to the failed bookie in the schedule, tries to write, fails, selects a new bookie and tries to update ledger metadata.

Update Line 605, testSyncBookieRecoveryToRandomBookiesCheckForDupes(), when done
Also, uncomment addEntry in TestFencing#testFencingInteractionWithBookieRecovery()


This addresses bug BOOKKEEPER-112.
    https://issues.apache.org/jira/browse/BOOKKEEPER-112


Diffs (updated)
-----

  bookkeeper-server/src/main/java/org/apache/bookkeeper/client/BookKeeperAdmin.java 37623dc 
  bookkeeper-server/src/main/java/org/apache/bookkeeper/client/LedgerHandle.java f71e53f 
  bookkeeper-server/src/main/java/org/apache/bookkeeper/client/LedgerMetadata.java b403aa1 
  bookkeeper-server/src/main/java/org/apache/bookkeeper/client/LedgerRecoveryOp.java c67a79c 
  bookkeeper-server/src/main/java/org/apache/bookkeeper/client/PendingReadOp.java 29070eb 
  bookkeeper-server/src/test/java/org/apache/bookkeeper/client/BookieRecoveryTest.java 99258ac 
  bookkeeper-server/src/test/java/org/apache/bookkeeper/client/TestFencing.java 015e4e4 
  bookkeeper-server/src/test/java/org/apache/bookkeeper/test/BaseTestCase.java dada67a 

Diff: https://reviews.apache.org/r/3472/diff


Testing
-------


Thanks,

Sijie


Re: Review Request: BOOKKEEPER-112: Bookie Recovery on an open ledger will cause LedgerHandle#close on that ledger to fail

Posted by Sijie Guo <gu...@gmail.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/3472/
-----------------------------------------------------------

(Updated 2012-02-28 11:09:19.006294)


Review request for bookkeeper.


Changes
-------

adding a readforward operation if lastEnsemble need to be repaired, which addresses Ivan's comment.


Summary
-------

Bookie recovery updates the ledger metadata in zookeeper. LedgerHandle will not get notified of this update, so it will try to write out its own ledger metadata, only to fail with KeeperException.BadVersion. This effectively fences all write operations on the LedgerHandle (close and addEntry). close will fail for obvious reasons. addEntry will fail once it gets to the failed bookie in the schedule, tries to write, fails, selects a new bookie and tries to update ledger metadata.

Update Line 605, testSyncBookieRecoveryToRandomBookiesCheckForDupes(), when done
Also, uncomment addEntry in TestFencing#testFencingInteractionWithBookieRecovery()


This addresses bug BOOKKEEPER-112.
    https://issues.apache.org/jira/browse/BOOKKEEPER-112


Diffs (updated)
-----

  bookkeeper-server/src/main/java/org/apache/bookkeeper/client/BookKeeperAdmin.java 37623dc 
  bookkeeper-server/src/main/java/org/apache/bookkeeper/client/LedgerHandle.java a94a0e5 
  bookkeeper-server/src/main/java/org/apache/bookkeeper/client/LedgerMetadata.java b403aa1 
  bookkeeper-server/src/main/java/org/apache/bookkeeper/client/LedgerRecoveryOp.java c67a79c 
  bookkeeper-server/src/main/java/org/apache/bookkeeper/client/PendingReadOp.java 29070eb 
  bookkeeper-server/src/test/java/org/apache/bookkeeper/client/BookieRecoveryTest.java 99258ac 
  bookkeeper-server/src/test/java/org/apache/bookkeeper/client/TestFencing.java 015e4e4 
  bookkeeper-server/src/test/java/org/apache/bookkeeper/test/BaseTestCase.java dada67a 

Diff: https://reviews.apache.org/r/3472/diff


Testing
-------


Thanks,

Sijie


Re: Review Request: BOOKKEEPER-112: Bookie Recovery on an open ledger will cause LedgerHandle#close on that ledger to fail

Posted by Sijie Guo <gu...@gmail.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/3472/
-----------------------------------------------------------

(Updated 2012-02-24 17:09:54.127303)


Review request for bookkeeper.


Changes
-------

discussed with Ivan offline, it seems that it is OK to recover those opened / in recovery status ledgers. for those in recovery status ledgers, if the last entry on the failed bookie, the recovery add should change ensemble. the last entry would be written on the other bookies. the tricky case is for opened ledgers, since readEntries for openNoRecovery can't read more than lastAddConfirmed, we change ledgerHandle#readEntries to use PendingReadOp to skip boundary checking to try to read lastAddConfirmed + 1 to avoid missing replicating this entry.


Summary
-------

Bookie recovery updates the ledger metadata in zookeeper. LedgerHandle will not get notified of this update, so it will try to write out its own ledger metadata, only to fail with KeeperException.BadVersion. This effectively fences all write operations on the LedgerHandle (close and addEntry). close will fail for obvious reasons. addEntry will fail once it gets to the failed bookie in the schedule, tries to write, fails, selects a new bookie and tries to update ledger metadata.

Update Line 605, testSyncBookieRecoveryToRandomBookiesCheckForDupes(), when done
Also, uncomment addEntry in TestFencing#testFencingInteractionWithBookieRecovery()


This addresses bug BOOKKEEPER-112.
    https://issues.apache.org/jira/browse/BOOKKEEPER-112


Diffs (updated)
-----

  bookkeeper-server/src/main/java/org/apache/bookkeeper/client/BookKeeperAdmin.java 37623dc 
  bookkeeper-server/src/main/java/org/apache/bookkeeper/client/LedgerHandle.java a94a0e5 
  bookkeeper-server/src/main/java/org/apache/bookkeeper/client/LedgerMetadata.java b403aa1 
  bookkeeper-server/src/main/java/org/apache/bookkeeper/client/LedgerRecoveryOp.java c67a79c 
  bookkeeper-server/src/main/java/org/apache/bookkeeper/client/PendingReadOp.java 29070eb 
  bookkeeper-server/src/test/java/org/apache/bookkeeper/client/BookieRecoveryTest.java 99258ac 
  bookkeeper-server/src/test/java/org/apache/bookkeeper/client/TestFencing.java 015e4e4 
  bookkeeper-server/src/test/java/org/apache/bookkeeper/test/BaseTestCase.java dada67a 

Diff: https://reviews.apache.org/r/3472/diff


Testing
-------


Thanks,

Sijie


Re: Review Request: BOOKKEEPER-112: Bookie Recovery on an open ledger will cause LedgerHandle#close on that ledger to fail

Posted by Sijie Guo <gu...@gmail.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/3472/
-----------------------------------------------------------

(Updated 2012-02-22 14:43:04.529989)


Review request for bookkeeper.


Changes
-------

attach a new patch to remove excludedBookies related codes. 


Summary
-------

Bookie recovery updates the ledger metadata in zookeeper. LedgerHandle will not get notified of this update, so it will try to write out its own ledger metadata, only to fail with KeeperException.BadVersion. This effectively fences all write operations on the LedgerHandle (close and addEntry). close will fail for obvious reasons. addEntry will fail once it gets to the failed bookie in the schedule, tries to write, fails, selects a new bookie and tries to update ledger metadata.

Update Line 605, testSyncBookieRecoveryToRandomBookiesCheckForDupes(), when done
Also, uncomment addEntry in TestFencing#testFencingInteractionWithBookieRecovery()


This addresses bug BOOKKEEPER-112.
    https://issues.apache.org/jira/browse/BOOKKEEPER-112


Diffs (updated)
-----

  bookkeeper-server/src/main/java/org/apache/bookkeeper/client/BookKeeperAdmin.java 37623dc 
  bookkeeper-server/src/main/java/org/apache/bookkeeper/client/LedgerHandle.java a94a0e5 
  bookkeeper-server/src/main/java/org/apache/bookkeeper/client/LedgerMetadata.java b403aa1 
  bookkeeper-server/src/main/java/org/apache/bookkeeper/client/LedgerRecoveryOp.java c67a79c 
  bookkeeper-server/src/main/java/org/apache/bookkeeper/client/PendingReadOp.java 29070eb 
  bookkeeper-server/src/test/java/org/apache/bookkeeper/client/BookieRecoveryTest.java 99258ac 
  bookkeeper-server/src/test/java/org/apache/bookkeeper/client/TestFencing.java 015e4e4 
  bookkeeper-server/src/test/java/org/apache/bookkeeper/test/BaseTestCase.java dada67a 

Diff: https://reviews.apache.org/r/3472/diff


Testing
-------


Thanks,

Sijie


Re: Review Request: BOOKKEEPER-112: Bookie Recovery on an open ledger will cause LedgerHandle#close on that ledger to fail

Posted by Sijie Guo <gu...@gmail.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/3472/
-----------------------------------------------------------

(Updated 2012-01-29 09:58:30.513187)


Review request for bookkeeper.


Changes
-------

we need to check the ledger metadata status before proceed recovery action.

for those OPENED ledgers,
1) whose last ensemble contains the failed bookie, we should not proceed recovery action. since we can't promise last entry to be fully replicated. (also there may be other side effects)
2) whose last ensemble doesn't contain the failed bookie, it is safe to proceed recovery action.

for those IN_RECOVERY ledgers, we have to check whether last ensemble contains the failed bookie. if it is, the recovery tool has to help closing this ledger, since the normal bookkeeper client may fail to close it. (a corn case: 3 bookies (bk1, bk2, bk3), quorum size 3, ensemble size 3. no entry is written. bk3 is failed. bk1 and bk2 returns NoEntry, bk3 returns HandleNotAvailable. ledger can't be closed.) 

for 2) case of OPENED ledgers, both PendingAddOp and BookKeeperAdmin needs to rereadMetadata when encountering BADVERSION and try to resolve such confliction to avoid #close it.


Summary
-------

Bookie recovery updates the ledger metadata in zookeeper. LedgerHandle will not get notified of this update, so it will try to write out its own ledger metadata, only to fail with KeeperException.BadVersion. This effectively fences all write operations on the LedgerHandle (close and addEntry). close will fail for obvious reasons. addEntry will fail once it gets to the failed bookie in the schedule, tries to write, fails, selects a new bookie and tries to update ledger metadata.

Update Line 605, testSyncBookieRecoveryToRandomBookiesCheckForDupes(), when done
Also, uncomment addEntry in TestFencing#testFencingInteractionWithBookieRecovery()


This addresses bug BOOKKEEPER-112.
    https://issues.apache.org/jira/browse/BOOKKEEPER-112


Diffs (updated)
-----

  bookkeeper-server/src/main/java/org/apache/bookkeeper/client/BookKeeper.java 5bb37c3 
  bookkeeper-server/src/main/java/org/apache/bookkeeper/client/BookKeeperAdmin.java 37623dc 
  bookkeeper-server/src/main/java/org/apache/bookkeeper/client/LedgerHandle.java 547e240 
  bookkeeper-server/src/main/java/org/apache/bookkeeper/client/LedgerMetadata.java b403aa1 
  bookkeeper-server/src/main/java/org/apache/bookkeeper/client/LedgerOpenOp.java 56186ab 
  bookkeeper-server/src/main/java/org/apache/bookkeeper/client/LedgerRecoveryOp.java 4625bbb 
  bookkeeper-server/src/main/java/org/apache/bookkeeper/client/PendingReadOp.java 29070eb 
  bookkeeper-server/src/main/java/org/apache/bookkeeper/client/ReadLastConfirmedOp.java 43e999d 
  bookkeeper-server/src/test/java/org/apache/bookkeeper/client/BookieRecoveryTest.java 8526db5 
  bookkeeper-server/src/test/java/org/apache/bookkeeper/client/TestFencing.java 015e4e4 
  bookkeeper-server/src/test/java/org/apache/bookkeeper/test/LedgerOpenTest.java PRE-CREATION 

Diff: https://reviews.apache.org/r/3472/diff


Testing
-------


Thanks,

Sijie