Posted to dev@bookkeeper.apache.org by Enrico Olivelli <eo...@gmail.com> on 2020/11/04 07:40:31 UTC

Bypassing writes to the Journal - everybody wants this feature !

Hello,
It looks like we all want to add a feature to BookKeeper to allow us to
skip writes to the journals.

We already have these patches:
https://github.com/apache/bookkeeper/pull/2401
https://github.com/apache/bookkeeper/pull/2157
and IIRC there are also tentative BPs.

The limitation of these patches is that they simply skip all writes to the
journal, and this in turn is a big problem:
- if you restart the bookie it is likely that you lose your data, and
especially the 'fenced' flag
- clients cannot rely on most of the guarantees that BK provides

Another problem is that those implementations work on a per-bookie
basis. I understand that the user in those cases is Pulsar, and usually you
do not share your BK cluster with other applications (but is that really
true? Think about PulsarFunctions and the BK StreamStorage service).

Btw this is not true in our case at EmailSuccess.com and MagNews.com,
where we share the bookies with other components
(like HerdDB, DistributedLog, BlobIt).

Skipping the journal is a good trade-off in several cases, because it
lowers write latency and also reduces write amplification.

I would like to wrap up all of this stuff and provide a feature to BK, to
be used consistently by all of the users.

I think it would be far better to have a WriteFlag to enable this
feature. This way different clients will be able to express their own
durability constraints and service level.

Also, when a Bookie that is not writing to the Journal restarts, it
should tell clients that it cannot reliably return data for a given
ledger or say whether the ledger has been fenced. IIUC Ivan and Matteo
already have this change in their private fork.
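
To make the WriteFlag idea concrete, here is a minimal sketch of a write path
that bypasses the journal only for ledgers that opted in. It is a toy model,
not BookKeeper code: SketchWriteFlag, SKIP_JOURNAL and SketchBookie are
hypothetical names invented for this example, and the two lists only stand in
for the journal and the ledger storage.

```java
import java.util.ArrayList;
import java.util.EnumSet;
import java.util.List;

/** Hypothetical per-ledger flag; SKIP_JOURNAL is an illustrative name, not an existing BookKeeper flag. */
enum SketchWriteFlag { SKIP_JOURNAL }

/** Toy bookie write path: the journal is bypassed only when the writer asked for it. */
final class SketchBookie {
    final List<String> journal = new ArrayList<>();       // stands in for the journal
    final List<String> ledgerStorage = new ArrayList<>(); // stands in for the ledger storage

    void addEntry(long ledgerId, long entryId, EnumSet<SketchWriteFlag> flags) {
        String key = ledgerId + ":" + entryId;
        if (!flags.contains(SketchWriteFlag.SKIP_JOURNAL)) {
            journal.add(key); // default path: the entry goes to the journal first
        }
        ledgerStorage.add(key); // every entry still reaches the ledger storage
    }
}
```

With a flag like this, two applications sharing the same bookies could pick
different durability levels per ledger, instead of the choice being made once
per bookie.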


I will be happy to start a BP or to help any other volunteer in writing it.
We should work as a community on this topic.

Thoughts ?
Enrico

Re: Bypassing writes to the Journal - everybody wants this feature !

Posted by Ivan Kelly <iv...@apache.org>.
> The limitation of these patches is that they simply skip all writes to the
> journal, and this in turn is a big problem:
> - if you restart the bookie it is likely that you lose your data, and
> especially the 'fenced' flag
> - clients cannot rely on most of the guarantees that BK provides

There are two problems (a restatement of the above):
- A bookie may accept writes for a ledger which it has previously
promised not to (loss of the fenced bit)
- A bookie may reply negatively to a read of a ledger's entry which
it has previously acknowledged receipt of (breaks consistency
guarantees)

In both cases, the problem is unclosed ledgers. Suppose the bookie, when it
starts, can detect a non-clean shutdown.
If it does, it can find all unclosed ledgers which were writing to it, and
a) accept no more writes for them
b) not reply negatively to requests for entries of those ledgers which
do not exist on the bookie.

a) is similar in effect to fencing. If a client was actively writing
to the ledger, it should have updated the ensemble by that time in any
case.
b) is a new concept (let's call it limbo). If the entries do not exist
locally, they may still have existed previously. So to respond
negatively would be untrue and would mess up the recovery process.
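
A rough sketch of (a) and (b), to show the intended behaviour rather than any
real implementation. Everything here is an assumption made for illustration:
the class and enum names are invented, the clean-shutdown check is just a
boolean passed in, and the set of unclosed ledgers is handed to the bookie
instead of being looked up anywhere.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

/** Possible answers to a read; UNKNOWN is the "limbo" answer. Names are illustrative. */
enum ReadResult { FOUND, NO_SUCH_ENTRY, UNKNOWN }

final class LimboBookieSketch {
    private final Map<Long, Set<Long>> entries = new HashMap<>(); // ledgerId -> entry ids held locally
    private final Set<Long> fenced = new HashSet<>();
    private final Set<Long> limbo = new HashSet<>();

    void store(long ledgerId, long entryId) {
        entries.computeIfAbsent(ledgerId, k -> new HashSet<>()).add(entryId);
    }

    /** Startup hook; how the unclean shutdown and the unclosed ledgers are found is assumed. */
    void start(boolean cleanShutdown, Set<Long> unclosedLedgers) {
        if (cleanShutdown) {
            return; // nothing was lost, normal behaviour applies
        }
        fenced.addAll(unclosedLedgers); // (a) accept no more writes for these ledgers
        limbo.addAll(unclosedLedgers);  // (b) never answer "no such entry" for them
    }

    boolean acceptsWrites(long ledgerId) {
        return !fenced.contains(ledgerId);
    }

    ReadResult read(long ledgerId, long entryId) {
        Set<Long> local = entries.getOrDefault(ledgerId, Collections.<Long>emptySet());
        if (local.contains(entryId)) {
            return ReadResult.FOUND;
        }
        // A missing entry of a limbo ledger may have existed before the crash.
        return limbo.contains(ledgerId) ? ReadResult.UNKNOWN : ReadResult.NO_SUCH_ENTRY;
    }
}
```

The point of the third answer is that a reader doing recovery can treat
UNKNOWN like an unavailable bookie and ask the other replicas, whereas
NO_SUCH_ENTRY would be taken as a definite statement.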

As you mentioned, Splunk already has this change internally. I'm going
to start another thread about that.

In summary, skipping the journal is fine as long as some other safeguards
are in place. However, I would make it a cluster-wide property.
If we say skipping the journal is safe (due to multi-AZ and the extra
checks) then it should be safe for all.


-Ivan