Posted to issues@bookkeeper.apache.org by GitBox <gi...@apache.org> on 2021/03/01 09:10:34 UTC

[GitHub] [bookkeeper] ivankelly commented on issue #2612: Wrong ReadLastAddConfirmed logic that can lead to data loss in client applications

ivankelly commented on issue #2612:
URL: https://github.com/apache/bookkeeper/issues/2612#issuecomment-787788316


   I haven't read the whole thread, but I think I get the gist. As @Vanlightly says, this isn't a case that BookKeeper is currently designed for.
   ```
   +    @Test
   +    public void testCoverageSetsOnNoEntry() {
   +        RoundRobinDistributionSchedule schedule = new RoundRobinDistributionSchedule(
   +                3, 3, 3);
   +        DistributionSchedule.QuorumCoverageSet covSet = schedule.getCoverageSet();
   +        covSet.addBookie(1, BKException.Code.NoSuchEntryException);
   +        assertFalse(covSet.checkCovered());
   +    }
   +
   ```
   In this test case, the ledger has an ensemble of 3, write quorum of 3 and ack quorum of 3. I.e. no entry can have been acknowledged to a writer unless it has been written to all 3 nodes. So if a node responds with "No, I don't have it", it means that write was never acknowledged.
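
   As a rough sketch of that reasoning: if bookies never go back on an acknowledgement, an entry can only have been acknowledged to the writer if at least an ack quorum of its write quorum could still hold it. The class and method names below are hypothetical and are not part of BookKeeper's API; this is an illustration of the arithmetic, not the logic in `QuorumCoverageSet`.

   ```java
   // Hypothetical helper for illustration only; not part of BookKeeper's API.
   final class AckReasoning {
       private AckReasoning() {}

       // Assumes bookies are honest: a bookie that answers "no such entry"
       // never stored (and so never acknowledged) the entry.
       static boolean couldHaveBeenAcked(int writeQuorum, int ackQuorum, int noSuchEntryResponses) {
           // Bookies in the write quorum that have not denied holding the entry.
           int bookiesThatMayHoldEntry = writeQuorum - noSuchEntryResponses;
           // An acknowledged entry must be on at least ackQuorum bookies.
           return bookiesThatMayHoldEntry >= ackQuorum;
       }
   }
   ```

   With write quorum 3 and ack quorum 3, `couldHaveBeenAcked(3, 3, 1)` is false: a single negative response is enough to rule the entry out. With ack quorum 2, one negative response is not enough, but two are.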
   
   BookKeeper was never designed to tolerate bookies arbitrarily losing data. That's why we have the journal and why we have cookies. The assumption is that if a bookie acknowledges a write, it's not going to turn around and go "nope, never happened". Expecting otherwise is like expecting single-decree Paxos to still work without stable storage.
   
   However, as @Vanlightly also says, we do have a solution for this (and it doesn't involve fundamentally changing the protocol). The core problem here is a bookie going back on its word. It saw an entry, but now it says it didn't. Technically, that is an arbitrary or Byzantine failure. The solution is to turn these arbitrary failures into omission failures. In practice, this works by detecting possible data loss on boot (unclean shutdown or missing cookies), figuring out which entries we may have had (via metadata) and, if a read is requested for any of these entries, responding with an error that isn't "NoSuchEntry/Ledger". From the client PoV, the response should be the same as if the bookie never responded at all.
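
   A minimal sketch of what that could look like on the bookie's read path, assuming possible data loss is tracked per ledger. Everything here (`ReadStatus`, `LimboAwareStore`, `markPossiblyLost`) is a hypothetical name made up for illustration, not BookKeeper's real classes or response codes.

   ```java
   import java.util.HashMap;
   import java.util.HashSet;
   import java.util.Map;
   import java.util.Set;

   // Hypothetical response codes; not BookKeeper's real codes.
   enum ReadStatus { OK, NO_SUCH_ENTRY, UNAVAILABLE }

   // Hypothetical bookie-side store, for illustration only.
   class LimboAwareStore {
       private final Map<Long, Set<Long>> entries = new HashMap<>();
       // Ledgers this bookie may have held entries for before possible data loss.
       private final Set<Long> possiblyLostLedgers = new HashSet<>();

       void put(long ledgerId, long entryId) {
           entries.computeIfAbsent(ledgerId, k -> new HashSet<>()).add(entryId);
       }

       // Assumed to be called on boot, after detecting an unclean shutdown or
       // missing cookie, for every ledger that metadata says included this bookie.
       void markPossiblyLost(long ledgerId) {
           possiblyLostLedgers.add(ledgerId);
       }

       ReadStatus read(long ledgerId, long entryId) {
           Set<Long> ledger = entries.get(ledgerId);
           if (ledger != null && ledger.contains(entryId)) {
               return ReadStatus.OK;
           }
           if (possiblyLostLedgers.contains(ledgerId)) {
               // We can't be sure we never had it, so don't claim "no such entry".
               return ReadStatus.UNAVAILABLE;
           }
           return ReadStatus.NO_SUCH_ENTRY;
       }
   }
   ```

   The point of the third status is that the client can treat it like a timeout: it neither counts as a positive read nor as evidence that the entry was never written.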
   
   Changing anything in the client is _not_ the solution.

