You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@bookkeeper.apache.org by Venkateswara Rao Jujjuri <ju...@gmail.com> on 2018/12/03 06:57:41 UTC

Bug with blockAddCompletions decrement

This may not be an issue with the master as we moved to immutable metadata,
but older releases have an issue with not decrementing  blockAddCompletions.


Sijie/Ivan/Sam can you validate this?


blockAddCompletions gets incremented while initiating the ensemble change
in handleBookieFailure(). Based on how we handle the bookie failure this
should get decremented. Until this comes back to zero there wont be any
addEntryCompletions are sent to the application.



I found two areas where this counter may not get decremented. (Look for
"BUG" in the code snippet below)



   1. 1.
   2.
   3. updateMetadataIfPossible -> This should decrement blockAddCompletions
   if returning TRUE
   4. {
   5. if (metadata.isNewerThan(newMeta)) {
   6. rereadMetadata(this); ==> This is a loop back to reattempt which
   comes back to here.
   7. return true;
   8. }
   9. if (metadata.isConflictWith(newMeta)) {
   10. return false;
   11. }
   12. // update znode version
   13. metadata.setVersion(newMeta.getVersion());
   14. // merge ensemble infos from new meta except last ensemble
   15. // since they might be modified by recovery tool.
   16. metadata.mergeEnsembles(newMeta.getEnsembles());
   17. writeLedgerConfig(new ChangeEnsembleCb(ensembleInfo,
   curBlockAddCompletions,
   18. ensembleChangeIdx));
   19. return true;
   20. ====> BUG: We are NOT decrementing blockAddCompletions
   21. }
   22.
   23. 2.
   24. resolveConflict()
   25. {
   26. int diff = newMeta.getEnsembles().size() - metadata.getEnsembles().
   size();
   27. if (0 != diff) {
   28. if (-1 == diff) {
   29. // Case 1: metadata is changed by other ones (e.g. Recovery)
   30. return updateMetadataIfPossible(newMeta);
   31. }
   32. return false;
   33. }
   34. if (!areFailedBookiesReplaced(newMeta, ensembleInfo)) {
   35. if (areFailedBookiesReplaced(metadata, ensembleInfo)) {
   36. return updateMetadataIfPossible(newMeta);
   37. }
   38. else??
   39. ===>BUG: There is no else case here. If there can be an else case
   here, we are returning TRUE but not decrementing blockAddCompletions.
   40. } else {
   41. int newBlockAddCompletions = blockAddCompletions.decrementAndGet();
   42. unsetSuccessAndSendWriteRequest(ensembleInfo.replacedBookies);
   43. }
   44. return true;
   45. }

@Sam Just
<https://gus.lightning.force.com/lightning/r/005B0000000SS00IAG/view>  let
us discuss this tomorrow.


-- 
Jvrao
---
First they ignore you, then they laugh at you, then they fight you, then
you win. - Mahatma Gandhi

Re: Bug with blockAddCompletions decrement

Posted by Ivan Kelly <iv...@apache.org>.
> > Not an issue with master as blockAddCompletions has been replaced with
> > a simple boolean and failure handling changed to only deal with one
> > failure at a time. However, looking at it again, I think I did spot a
> > similar bug. will dig in after I send this.
>
> Great; looking forward.

It was something that sam had raised in the original patch. Fix is up.
https://github.com/apache/bookkeeper/pull/1857

> updateMetadataIfPossible() is called from resolveConflict() -> "return
> updateMetadataIfPossible(newMeta);"
> There is only one case in the  updateMetadataIfPossible(newMeta); that will
> rereadMetadata in other cases
> it can just return TRUE so there is no ChangeEnsembleCb.

When updateMetadataIfPossible is called, the precondition is that
there is at least one 'bump' that needs to be decremented on
blockAddCompletions before adds can be completed.
There's 3 branches in the code.
1. existing metadata is newer. reread is called, which will eventually
call updateMetadataIfPossible again under the same conditions
2. isConflictWith is true, so kill the ledger, we don't want to
decrement because we are in an undefined condition
3. we write new metadata, in which case ChangeEnsembleCb does the decrement.

This code is awful. It gives me nightmares. But AFAICS it is correct
w.r.t. to the preconditions stated above.

[1] https://github.com/apache/bookkeeper/blob/621799172a0de84f97b87d795c31e263de80e8a3/bookkeeper-server/src/main/java/org/apache/bookkeeper/client/LedgerHandle.java#L2197

-Ivan

Re: Bug with blockAddCompletions decrement

Posted by Venkateswara Rao Jujjuri <ju...@gmail.com>.
On Mon, Dec 3, 2018 at 1:40 AM Ivan Kelly <iv...@apache.org> wrote:

> > This may not be an issue with the master as we moved to immutable
> metadata,
>
> Not an issue with master as blockAddCompletions has been replaced with
> a simple boolean and failure handling changed to only deal with one
> failure at a time. However, looking at it again, I think I did spot a
> similar bug. will dig in after I send this.
>

Great; looking forward.


> Your email client mangled the code, but as far as I could untangle it.
>
> updateMetadataIfPossible: No bug. decrement gets called when
> ChangeEnsembleCb completes.
>

updateMetadataIfPossible() is called from resolveConflict() -> "return
updateMetadataIfPossible(newMeta);"
There is only one case in the  updateMetadataIfPossible(newMeta); that will
rereadMetadata in other cases
it can just return TRUE so there is no ChangeEnsembleCb.

I will work on  test cases.

Thanks,
JV


> resolveConflict: Looks like a bug. Would be good to had a test case
> that tickles it.
>
> -Ivan
>


-- 
Jvrao
---
First they ignore you, then they laugh at you, then they fight you, then
you win. - Mahatma Gandhi

Re: Bug with blockAddCompletions decrement

Posted by Ivan Kelly <iv...@apache.org>.
> This may not be an issue with the master as we moved to immutable metadata,

Not an issue with master as blockAddCompletions has been replaced with
a simple boolean and failure handling changed to only deal with one
failure at a time. However, looking at it again, I think I did spot a
similar bug. will dig in after I send this.

Your email client mangled the code, but as far as I could untangle it.

updateMetadataIfPossible: No bug. decrement gets called when
ChangeEnsembleCb completes.

resolveConflict: Looks like a bug. Would be good to had a test case
that tickles it.

-Ivan