You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@bookkeeper.apache.org by GitBox <gi...@apache.org> on 2021/02/19 18:25:47 UTC

[GitHub] [bookkeeper] dlg99 edited a comment on issue #2612: Wrong ReadLastAddConfirmed logic that can lead to data loss in client applications

dlg99 edited a comment on issue #2612:
URL: https://github.com/apache/bookkeeper/issues/2612#issuecomment-782249032


   @fpj as I understand, checkCovered() makes sure that all "spreads" of entries got covered with AQ responses. You need to consider ES > WQ >= AQ here.
   Everything is simple when ES = AQ, any single response is considered to be "authoritative" (assuming no single bookie data loss scenario).
   Otherwise, for ES = 5, WQ = 2, AQ = 2, entries placement on bookies (round robin schedule)
   ```
   b1  b2  b3  b4  b5
   ------------------
   e1  e1
       e2  e2
           e3  e3
               e4  e4
   e5              e5
   ```
   if only b1 does not respond all spreads are covered, `nodesUnknown < ackQuorumSize`.
   If b1 and b2 didn't respond, `nodesUnknown >= ackQuorumSize` and coverageSet is not covered.
   Non-responding b1 and b4 should still result in covered set, as I understand, as they don't participate in any "spread" (I didn't try to confirm my understanding with unit tests).
   
   All this is done to reduce request latency in case if there is a slow bookie. 
   
   That code assumes that bookie either has consistent dataset (recovered from journal in case of crash or misses some tail entries) or is not responding.
   
   No we are getting into the territory of how to handle case when bookie is up but some or all of the data is missing.
   My original suggestion was to add a check that we updated maxRecoveredData at least one time s not the best one as I think about it more.
   changing `checkCovered()` to treat `NoSuchEntryException`  as a "node unknown" case makes more sense (as in: remove line `&& covered[nodeIndex] != BKException.Code.NoSuchEntryException`).
   
   Another question is what to do with `NoSuchLedgerExistsException`  - either handle it similarly to `NoSuchEntryException` or keep it. I don't remember all details of the check of ledger existence, I think bookie checks the index. Did you keep the index files but deleted the log & journal files? if deleting both results in NoSuchLedgerExistsException I'd remove it from checkCovered as well.
   
   Then, of course, what if there is legit case when ledger was created, no entries ever written, and we are trying to read LAC?
   All bookies will return something like NoSuchEntryException and read LAC op will return error which changes current behavior.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org