Posted to issues@bookkeeper.apache.org by GitBox <gi...@apache.org> on 2020/03/05 07:57:51 UTC

[GitHub] [bookkeeper] eolivelli commented on issue #2273: Bookie does not try to download ledger from another bookie

URL: https://github.com/apache/bookkeeper/issues/2273#issuecomment-595081016
 
 
   Finally we realized the "problem": it is not strictly a bug in BookKeeper, but it is very unexpected behaviour that needs a workaround in the application, and a workaround is not always possible.
   
   The scenario is the following:
   - we have the usual leader/follower pattern for BK users
   - the leader keeps a list of "active ledgers" on the metadata service (ZooKeeper); together these ledgers build up an unlimited stream of data
   - the leader creates a new ledger with WQ=2, gets the id and appends it to the list of "active ledgers"
   - the ledger is empty, no write has ever been issued to bookies
   - bookies locally do not know anything about the ledger
   - now let's stop one of the two bookies (or partition it away from the client network, which is @hamadodene's case)
   - so we have on ledger metadata an ensemble with bookie1 and bookie2, ledger is in state OPEN, LAC = -1
   - bookie1 is up and running, but it doesn't hold any entry
   - bookie2 is unreachable from the client
   - the follower tries to open the ledger (no recovery), and boom !
   - the ledger is OPEN, the follower reads "NoSuchLedger" from Bookie1 and it gets a network error (BookieHandleNotAvailable or something like that) from Bookie2
   
   It looks like even a recovery read is not possible in this case.
   
   The workaround is to write an entry to the ledger, and block until it is successfully acknowledged, before adding the ledger to the list of "active ledgers". This way you are sure that each bookie knows about the ledger and does not answer NoSuchLedger.
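
   The ordering in this workaround can be sketched as follows. This is a toy Python model, not the BookKeeper client API: `ToyBookie`, `ToyLedger` and `publish_ledger` are hypothetical names, and a plain in-memory list stands in for the "active ledgers" list kept on ZooKeeper. It assumes WQ = 2 with every write acknowledged by both bookies.

```python
class ToyBookie:
    """Hypothetical stand-in for a bookie: tracks the ledgers it holds entries for."""

    def __init__(self, name):
        self.name = name
        self.entries = {}  # ledger_id -> list of payloads

    def add_entry(self, ledger_id, payload):
        # Storing the first entry is what makes the ledger known to this bookie.
        self.entries.setdefault(ledger_id, []).append(payload)
        return True  # acknowledgement

    def knows(self, ledger_id):
        return ledger_id in self.entries


class ToyLedger:
    """Hypothetical ledger handle whose writes go to every bookie in the ensemble."""

    def __init__(self, ledger_id, ensemble):
        self.ledger_id = ledger_id
        self.ensemble = ensemble

    def add_entry_blocking(self, payload):
        # Return only after every bookie has acknowledged (models WQ == ack quorum).
        acks = [bookie.add_entry(self.ledger_id, payload) for bookie in self.ensemble]
        if not all(acks):
            raise RuntimeError("entry was not acknowledged by every bookie")


def publish_ledger(ledger, active_ledgers):
    # Step 1: write a first entry and wait for the acknowledgements.
    ledger.add_entry_blocking(b"first-entry")
    # Step 2: only now make the ledger visible to followers.
    active_ledgers.append(ledger.ledger_id)


bookie1, bookie2 = ToyBookie("bookie1"), ToyBookie("bookie2")
active_ledgers = []  # stand-in for the list stored on ZooKeeper
ledger = ToyLedger(ledger_id=7, ensemble=[bookie1, bookie2])
publish_ledger(ledger, active_ledgers)
```

   In this model, by the time a follower can see ledger 7 in the list, both bookies already hold an entry for it, so losing either bookie leaves one that does not answer NoSuchLedger.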
   
   With QA < WQ (ack quorum smaller than write quorum) this workaround won't work, because the write of entry 0 may not be acknowledged by Bookie1 (the one running during the open action) but the client will still consider it successfully written (because Bookie2 is up and running at the time of the write).
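
   A small check makes the gap concrete. This is a toy Python sketch with hypothetical function names, and nothing here calls BookKeeper. It assumes only that the client treats a write as successful once at least "ack quorum" bookies of the write set have acknowledged it.

```python
def write_succeeds(acked_by, ack_quorum):
    # The client reports success once enough acknowledgements have arrived.
    return len(acked_by) >= ack_quorum


def every_bookie_has_entry(acked_by, write_set):
    # The workaround relies on this stronger condition.
    return set(write_set) <= set(acked_by)


write_set = ["bookie1", "bookie2"]  # WQ = 2
acked_by = ["bookie2"]              # bookie1's acknowledgement never arrived

ok_with_ack_quorum_1 = write_succeeds(acked_by, ack_quorum=1)
ok_with_ack_quorum_2 = write_succeeds(acked_by, ack_quorum=2)
all_bookies_have_it = every_bookie_has_entry(acked_by, write_set)
```

   With an ack quorum of 1 the write is reported as successful even though bookie1 never stored the entry; with an ack quorum of 2 the same write would not be considered complete.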
   
   cc @hamadodene @aluccaroni @ivankelly @fpj @jvrao @sijie 
