You are viewing a plain text version of this content. The canonical link for it is here.
Posted to user@bookkeeper.apache.org by Enrico Olivelli <eo...@gmail.com> on 2017/04/04 07:47:38 UTC

Re: Write guarantees and the LAC Protocol

I have completed by debug session and I have an answer.

The problematic flow is the following:
- LedgerHandler#asyncAddEntry
- in case of BKNotEnoughBookiesException I'm triggering an async callback
(with a CompletableFuture#handleAsync) which in turn calls
LedgerHandler#close
- the LedgerHandler#close method truncates the ledger and some entry is
lost (at the time of the close call I have local LastAddConfirmed ==
LastAddPushed == 0 on the writer)

Is this a possible explanation ?
I wonder which is the best way to react to an error (in this case
BKNotEnoughBookiesException) in asyncAddEntry, usually I'm calling
LedgerHandler#close in order to the necessary clean up, but in this case it
seems very dangerous
(I am running BK 4.4.0)

This is the issue on my project, where I have put logs
https://github.com/diennea/herddb/issues/100

Thanks
-- Enrico








2017-03-17 16:14 GMT+01:00 Enrico Olivelli <eo...@gmail.com>:

>
>
> 2017-03-17 7:48 GMT+01:00 Sijie Guo <gu...@gmail.com>:
>
>>
>>
>> On Wed, Mar 15, 2017 at 9:17 AM, Enrico Olivelli <eo...@gmail.com>
>> wrote:
>>
>>> Hi BookKeepers,
>>> I'm trying to understand some basic aspect of BookKeeper....
>>>
>>> I have a strange case, and I would like to have some opinion from any
>>> "expert".
>>>
>>> I am using BookKeeper as a transaction log for a database, this is an
>>> example scenario:
>>>
>>> 1) the user inserts a record in a table
>>> the "fact" is logged to the Write-Ahead-log (Bookkeeper ledger), and the
>>> systems blocks until the asyncAddEntry completes (the callback is called
>>> with rc=0)
>>>
>>> now I (the 'writer') assume that the entry is written to durable storage
>>> and has been synched to disk
>>> then the DB applies the change to the local memory as every application
>>> with a WAL does
>>>
>>> 2) The writer 2  other entries for which the asyncAddEntry completes
>>> (the callback is called with rc=0). I block until the callback as been
>>> called
>>>
>>> 2) for tests I'm using an ensemble of 1 bookie, I "stop" the only bookie
>>> the bookie stops "gracefully" (no strange errors)
>>>
>>> 3) The writer issues other asyncAddEntry, which fail due to
>>> "notEnoughBookiesException" and then "closes" the LedgerHandle, inside the
>>> "callback" cade
>>>
>>> 4) I restart the bookie (no strange errors on logs)
>>>
>>> 5) I restart my "writer", opening the ledger with "openLedger" (with
>>> recovery and fencing...) in order to perform recovery from the WAL
>>>
>>> 6) now the LastAddConfirmed is 0 ! so my first entry is lost !
>>>
>>> Am I missing something ?
>>> The LAC has not been advanced even with 3 successful writes to the ledger
>>>
>>
>> Are you using readLastAddConfirmed at step 5)? 0 is expected when you
>> using #readLastAddConfirmed, since the call is retrieving LAC from bookies.
>>
>>
> I have added a call to readLastAddConfirmed but it does not help, the LAC
> is still 0
>
>
> I have double checked my code and the "close" happens in another thread,
> maybe I am hitting a race condition in my code, in fact the problem is not
> reproducible at 100%
>
>
> I will continue debugging
>
> Thanks
>
>
>
>
>> At step 3), it already closed the ledger and update the metadata's last
>> entry id with 2. so for tailing read, you might need to call
>> #readLastAddConfirmed and then get the last add confirmed using
>> #getLastAddConfirmed.
>>
>>
>>
>>>
>>> From the "writer" point-of-view the fact that the callback as been
>>> called is not a guarantee that the entry as been written to durable storage
>>> ?
>>>
>>> I understand that of a "tailing" reader (a ledger opened with
>>> openLedgerNoRecovery) it is very important not to be able to "read" data
>>> before the LAC, but for the "writer" it is important that it can count on
>>> the fact the entry will be recovered with success
>>>
>>>
>>> Thanks
>>> -- Enrico
>>>
>>>
>>
>

Re: Write guarantees and the LAC Protocol

Posted by Enrico Olivelli <eo...@gmail.com>.
Starting from top as I found a good explanation for my "failing" flaky test
case.
I'm asking to the user list some confirm of my explanation

Usecase:
1) 1 single writer, 1 bookie
2) The writer creates a new ledger and writes 1 entry and waits for the
acknowledge (addEntry, blocking call), let's call it "entry A"
3) The writer issues some other asynch writes (say B,C,D) and asserts that
at least one ends with BKNotEnoughBookiesException)
4) Stop the bookie (this will cause BKNotEnoughBookiesException to some of
the writes)
5) Start the bookie (boots OK, no errors)
6) Recover the ledger (open a new client with openLedger, WITH recovery),
use readLastAddConfirmed to be sure to read from all the bookies the max
LastAddConfirmed
7) The entry A some times is present (case a), sometimes is not present
(case b)

case A: success for entry A
at least ONE of the writes completed with success before the stop of the
Bookie, so the LastAddConfirmed has been written to the bookie and so it
can be recovered

case B: entry A disappeared
NONE of the writes completed with success before the stop of the Bookie, so
the LastAddConfirmed has NEVER been written to the bookie and so the ledger
appears EMPTY

Is this a good explanation ?


Thank you
Enrico





2017-04-04 10:05 GMT+02:00 Enrico Olivelli <eo...@gmail.com>:

>
>
> 2017-04-04 9:47 GMT+02:00 Enrico Olivelli <eo...@gmail.com>:
>
>> I have completed by debug session and I have an answer.
>>
>> The problematic flow is the following:
>> - LedgerHandler#asyncAddEntry
>> - in case of BKNotEnoughBookiesException I'm triggering an async callback
>> (with a CompletableFuture#handleAsync) which in turn calls
>> LedgerHandler#close
>> - the LedgerHandler#close method truncates the ledger and some entry is
>> lost (at the time of the close call I have local LastAddConfirmed ==
>> LastAddPushed == 0 on the writer)
>>
>> Is this a possible explanation ?
>> I wonder which is the best way to react to an error (in this case
>> BKNotEnoughBookiesException) in asyncAddEntry, usually I'm calling
>> LedgerHandler#close in order to the necessary clean up, but in this case it
>> seems very dangerous
>> (I am running BK 4.4.0)
>>
>> This is the issue on my project, where I have put logs
>> https://github.com/diennea/herddb/issues/100
>>
>> Thanks
>> -- Enrico
>>
>>
>>
>
> I'm sorry, during other test cases skipping the close() did not fix the
> issue consistently, I will debug more
>
>
>>
>>
>>
>>
>>
>>
>> 2017-03-17 16:14 GMT+01:00 Enrico Olivelli <eo...@gmail.com>:
>>
>>>
>>>
>>> 2017-03-17 7:48 GMT+01:00 Sijie Guo <gu...@gmail.com>:
>>>
>>>>
>>>>
>>>> On Wed, Mar 15, 2017 at 9:17 AM, Enrico Olivelli <eo...@gmail.com>
>>>> wrote:
>>>>
>>>>> Hi BookKeepers,
>>>>> I'm trying to understand some basic aspect of BookKeeper....
>>>>>
>>>>> I have a strange case, and I would like to have some opinion from any
>>>>> "expert".
>>>>>
>>>>> I am using BookKeeper as a transaction log for a database, this is an
>>>>> example scenario:
>>>>>
>>>>> 1) the user inserts a record in a table
>>>>> the "fact" is logged to the Write-Ahead-log (Bookkeeper ledger), and
>>>>> the systems blocks until the asyncAddEntry completes (the callback is
>>>>> called with rc=0)
>>>>>
>>>>> now I (the 'writer') assume that the entry is written to durable
>>>>> storage and has been synched to disk
>>>>> then the DB applies the change to the local memory as every
>>>>> application with a WAL does
>>>>>
>>>>> 2) The writer 2  other entries for which the asyncAddEntry completes
>>>>> (the callback is called with rc=0). I block until the callback as been
>>>>> called
>>>>>
>>>>> 2) for tests I'm using an ensemble of 1 bookie, I "stop" the only
>>>>> bookie
>>>>> the bookie stops "gracefully" (no strange errors)
>>>>>
>>>>> 3) The writer issues other asyncAddEntry, which fail due to
>>>>> "notEnoughBookiesException" and then "closes" the LedgerHandle, inside the
>>>>> "callback" cade
>>>>>
>>>>> 4) I restart the bookie (no strange errors on logs)
>>>>>
>>>>> 5) I restart my "writer", opening the ledger with "openLedger" (with
>>>>> recovery and fencing...) in order to perform recovery from the WAL
>>>>>
>>>>> 6) now the LastAddConfirmed is 0 ! so my first entry is lost !
>>>>>
>>>>> Am I missing something ?
>>>>> The LAC has not been advanced even with 3 successful writes to the
>>>>> ledger
>>>>>
>>>>
>>>> Are you using readLastAddConfirmed at step 5)? 0 is expected when you
>>>> using #readLastAddConfirmed, since the call is retrieving LAC from bookies.
>>>>
>>>>
>>> I have added a call to readLastAddConfirmed but it does not help, the
>>> LAC is still 0
>>>
>>>
>>> I have double checked my code and the "close" happens in another thread,
>>> maybe I am hitting a race condition in my code, in fact the problem is not
>>> reproducible at 100%
>>>
>>>
>>> I will continue debugging
>>>
>>> Thanks
>>>
>>>
>>>
>>>
>>>> At step 3), it already closed the ledger and update the metadata's last
>>>> entry id with 2. so for tailing read, you might need to call
>>>> #readLastAddConfirmed and then get the last add confirmed using
>>>> #getLastAddConfirmed.
>>>>
>>>>
>>>>
>>>>>
>>>>> From the "writer" point-of-view the fact that the callback as been
>>>>> called is not a guarantee that the entry as been written to durable storage
>>>>> ?
>>>>>
>>>>> I understand that of a "tailing" reader (a ledger opened with
>>>>> openLedgerNoRecovery) it is very important not to be able to "read" data
>>>>> before the LAC, but for the "writer" it is important that it can count on
>>>>> the fact the entry will be recovered with success
>>>>>
>>>>>
>>>>> Thanks
>>>>> -- Enrico
>>>>>
>>>>>
>>>>
>>>
>>
>

Re: Write guarantees and the LAC Protocol

Posted by Enrico Olivelli <eo...@gmail.com>.
2017-04-04 9:47 GMT+02:00 Enrico Olivelli <eo...@gmail.com>:

> I have completed by debug session and I have an answer.
>
> The problematic flow is the following:
> - LedgerHandler#asyncAddEntry
> - in case of BKNotEnoughBookiesException I'm triggering an async callback
> (with a CompletableFuture#handleAsync) which in turn calls
> LedgerHandler#close
> - the LedgerHandler#close method truncates the ledger and some entry is
> lost (at the time of the close call I have local LastAddConfirmed ==
> LastAddPushed == 0 on the writer)
>
> Is this a possible explanation ?
> I wonder which is the best way to react to an error (in this case
> BKNotEnoughBookiesException) in asyncAddEntry, usually I'm calling
> LedgerHandler#close in order to the necessary clean up, but in this case it
> seems very dangerous
> (I am running BK 4.4.0)
>
> This is the issue on my project, where I have put logs
> https://github.com/diennea/herddb/issues/100
>
> Thanks
> -- Enrico
>
>
>

I'm sorry, during other test cases skipping the close() did not fix the
issue consistently, I will debug more


>
>
>
>
>
>
> 2017-03-17 16:14 GMT+01:00 Enrico Olivelli <eo...@gmail.com>:
>
>>
>>
>> 2017-03-17 7:48 GMT+01:00 Sijie Guo <gu...@gmail.com>:
>>
>>>
>>>
>>> On Wed, Mar 15, 2017 at 9:17 AM, Enrico Olivelli <eo...@gmail.com>
>>> wrote:
>>>
>>>> Hi BookKeepers,
>>>> I'm trying to understand some basic aspect of BookKeeper....
>>>>
>>>> I have a strange case, and I would like to have some opinion from any
>>>> "expert".
>>>>
>>>> I am using BookKeeper as a transaction log for a database, this is an
>>>> example scenario:
>>>>
>>>> 1) the user inserts a record in a table
>>>> the "fact" is logged to the Write-Ahead-log (Bookkeeper ledger), and
>>>> the systems blocks until the asyncAddEntry completes (the callback is
>>>> called with rc=0)
>>>>
>>>> now I (the 'writer') assume that the entry is written to durable
>>>> storage and has been synched to disk
>>>> then the DB applies the change to the local memory as every application
>>>> with a WAL does
>>>>
>>>> 2) The writer 2  other entries for which the asyncAddEntry completes
>>>> (the callback is called with rc=0). I block until the callback as been
>>>> called
>>>>
>>>> 2) for tests I'm using an ensemble of 1 bookie, I "stop" the only bookie
>>>> the bookie stops "gracefully" (no strange errors)
>>>>
>>>> 3) The writer issues other asyncAddEntry, which fail due to
>>>> "notEnoughBookiesException" and then "closes" the LedgerHandle, inside the
>>>> "callback" cade
>>>>
>>>> 4) I restart the bookie (no strange errors on logs)
>>>>
>>>> 5) I restart my "writer", opening the ledger with "openLedger" (with
>>>> recovery and fencing...) in order to perform recovery from the WAL
>>>>
>>>> 6) now the LastAddConfirmed is 0 ! so my first entry is lost !
>>>>
>>>> Am I missing something ?
>>>> The LAC has not been advanced even with 3 successful writes to the
>>>> ledger
>>>>
>>>
>>> Are you using readLastAddConfirmed at step 5)? 0 is expected when you
>>> using #readLastAddConfirmed, since the call is retrieving LAC from bookies.
>>>
>>>
>> I have added a call to readLastAddConfirmed but it does not help, the LAC
>> is still 0
>>
>>
>> I have double checked my code and the "close" happens in another thread,
>> maybe I am hitting a race condition in my code, in fact the problem is not
>> reproducible at 100%
>>
>>
>> I will continue debugging
>>
>> Thanks
>>
>>
>>
>>
>>> At step 3), it already closed the ledger and update the metadata's last
>>> entry id with 2. so for tailing read, you might need to call
>>> #readLastAddConfirmed and then get the last add confirmed using
>>> #getLastAddConfirmed.
>>>
>>>
>>>
>>>>
>>>> From the "writer" point-of-view the fact that the callback as been
>>>> called is not a guarantee that the entry as been written to durable storage
>>>> ?
>>>>
>>>> I understand that of a "tailing" reader (a ledger opened with
>>>> openLedgerNoRecovery) it is very important not to be able to "read" data
>>>> before the LAC, but for the "writer" it is important that it can count on
>>>> the fact the entry will be recovered with success
>>>>
>>>>
>>>> Thanks
>>>> -- Enrico
>>>>
>>>>
>>>
>>
>