Posted to distributedlog-dev@bookkeeper.apache.org by Enrico Olivelli <eo...@gmail.com> on 2017/08/23 15:25:54 UTC

[DISCUSS] BP-14 Relax Durability

Hi all,
I have drafted a first proposal for BP-14 - Relax Durability.

We are talking about limiting the number of fsyncs to the journal while
preserving the correctness of the LAC protocol.

This is the link to the wiki page, but as the issue is huge we prefer to
use Google Documents for sharing comments:
https://cwiki.apache.org/confluence/display/BOOKKEEPER/BP+-+14+Relax+durability

This is the document:
https://docs.google.com/document/d/1JLYO3K3tZ5PJGmyS0YK_-NW8VOUUgUWVBmswCUOG158/edit?usp=sharing

All comments are welcome.

I have added the DL dev list in cc, as the discussion is interesting for both
groups.

Enrico Olivelli

Re: [DISCUSS] BP-14 Relax Durability

Posted by Enrico Olivelli <eo...@gmail.com>.
Thanks Sijie,
I will do my best.

I can try to split it into:
1) protocol changes (protobuf)
2) new client-side API
3) LAC protocol changes and bookie-side changes
4) additional tests

Actually, I already have a private work-in-progress branch with the full
stack; I will finish implementing the document and then split it into pieces.

b.q.
I left one comment on the doc about the retention of the SyncCounter on the
bookie side.

-- Enrico


2017-09-12 10:08 GMT+02:00 Sijie Guo <gu...@gmail.com>:

> Cool.
>
> I expect this to be a big change. It would be good if you could divide it
> into smaller tasks, so people can review them more easily.
>
> - Sijie
>
> On Tue, Sep 12, 2017 at 1:05 AM, Enrico Olivelli <eo...@gmail.com>
> wrote:
>
> > Thank you all!
> >
> > I will copy the content of the final draft to the Wiki and mark the
> > document as "Accepted".
> >
> > I will send a PR soon, but it will depend on BP-15 New CreateLedger API.
> >
> > I hope we can make it for 4.6.
> >
> >
> > Enrico
> >
> >
> > 2017-09-11 18:58 GMT+02:00 Sijie Guo <gu...@gmail.com>:
> >
> > > Enrico,
> > >
> > > Feel free to close the thread and mark this BP as accepted if there is
> > > no -1.
> > >
> > > - Sijie
> > >
> > > On Mon, Sep 11, 2017 at 2:26 AM, Enrico Olivelli <eo...@gmail.com>
> > > wrote:
> > >
> > > > Ping
> > > >
> > > > 2017-09-07 9:32 GMT+02:00 Enrico Olivelli <eo...@gmail.com>:
> > > >
> > > > > Hi all,
> > > > >
> > > > >
> > > > > You can find the revised proposal here:
> > > > > https://cwiki.apache.org/confluence/display/BOOKKEEPER/BP-14+Relax+durability
> > > > >
> > > > > The link to the document open for comments is:
> > > > > https://docs.google.com/document/d/1yNi9t2_deOOMXDaGzrnmaHTQeB3B3Fnym82DUERH7LM/edit?usp=sharing
> > > > >
> > > > > Please check it out.
> > > > > We are going to review this proposal at the meeting.
> > > > >
> > > > > -- Enrico
> > > > >
> > > > >
> > > > > 2017-08-30 8:56 GMT+02:00 Enrico Olivelli <eo...@gmail.com>:
> > > > >
> > > > >> Thank you Sijie for summarizing, and thanks to the community for helping
> > > > >> with this important enhancement to BookKeeper.
> > > > >>
> > > > >> I am convinced that, as JV pointed out, we need to declare at ledger
> > > > >> creation time that the ledger is going to perform no-sync writes.
> > > > >>
> > > > >> I think we currently need an explicit declaration to make things "clear"
> > > > >> to the developer who is using the LedgerHandle API, even at ledger
> > > > >> creation time.
> > > > >>
> > > > >> The case is that we are going to forbid "striping" ledgers (ensemble
> > > > >> size > quorum size) for no-sync writes in the first implementation:
> > > > >> - one option is to fail at the first no-sync addEntry, but this will be
> > > > >> really uncomfortable because the ack/write/ensemble sizes are usually
> > > > >> configured by the admin, and there will be configurations in which
> > > > >> errors come out only after starting the system.
> > > > >> - the second option is to make the developer explicitly enable no-sync
> > > > >> writes at creation time and fail the creation of the ledger if the
> > > > >> requested combination of options is not possible.
> > > > >>
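A minimal Java sketch of the second option: the developer opts in when creating the ledger, and an impossible combination fails before any entry is written. `LedgerOptions` and `validate()` are hypothetical names, not the BookKeeper client API, and the sketch assumes that "quorum size" above means the write quorum size.

```java
/**
 * Hypothetical creation-time options; this is not the BookKeeper builder API.
 * Assumes "striping" means ensembleSize > writeQuorumSize.
 */
final class LedgerOptions {
    final int ensembleSize;
    final int writeQuorumSize;
    final int ackQuorumSize;
    final boolean allowNoSyncWrites;

    LedgerOptions(int ensembleSize, int writeQuorumSize, int ackQuorumSize,
                  boolean allowNoSyncWrites) {
        this.ensembleSize = ensembleSize;
        this.writeQuorumSize = writeQuorumSize;
        this.ackQuorumSize = ackQuorumSize;
        this.allowNoSyncWrites = allowNoSyncWrites;
    }

    /** Fails at creation time instead of at the first no-sync addEntry. */
    void validate() {
        if (ackQuorumSize < 1 || ackQuorumSize > writeQuorumSize
                || writeQuorumSize > ensembleSize) {
            throw new IllegalArgumentException(
                    "expected ensemble >= writeQuorum >= ackQuorum >= 1");
        }
        // First implementation: no-sync writes are refused on striped ledgers.
        if (allowNoSyncWrites && ensembleSize > writeQuorumSize) {
            throw new IllegalArgumentException(
                    "no-sync writes are not allowed on striped ledgers");
        }
    }
}
```

With these rules `new LedgerOptions(3, 3, 2, true).validate()` succeeds, while `new LedgerOptions(5, 3, 2, true).validate()` throws immediately, so a sizing chosen by the admin is rejected when the ledger is created rather than after the system has started.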
> > > > >> I am not sure that the changes to the bookie internals are a Client-API
> > > > >> matter; maybe we can leverage custom metadata (as JV said) in order to
> > > > >> make the bookie handle ledgers in a different manner. This path will
> > > > >> always be open, as custom metadata already exist.
> > > > >>
> > > > >> JV preferred the ledger-type approach; the dual solution is to introduce
> > > > >> a list of "capabilities" or "ledger options".
> > > > >> I think that this ability to perform no-sync writes is so important that
> > > > >> "custom metadata" is not the right place to declare it, and the same
> > > > >> goes for "ledger type".
> > > > >>
> > > > >> So I am proposing to add a boolean 'allowNoSyncWrites' at ledger
> > > > >> creation time, without writing it into the ledger metadata on ZK.
> > > > >> If further improvements need ledger metadata changes, we will make
> > > > >> them then.
> > > > >>
> > > > >> I have updated the BP-14 document and added an "Open issues" footer
> > > > >> with the open points;
> > > > >> please add comments and I will correct the document as soon as possible.
> > > > >>
> > > > >>
> > > > >> Enrico
> > > > >>
> > > > >>
> > > > >>
> > > > >>
> > > > >> 2017-08-30 1:24 GMT+02:00 Sijie Guo <gu...@gmail.com>:
> > > > >>
> > > > >>> Thank you, Enrico, JV.
> > > > >>>
> > > > >>> These are great discussions.
> > > > >>>
> > > > >>> After reading these two proposals, I have a few very high-level comments,
> > > > >>> divided into three categories.
> > > > >>>
> > > > >>>
> > > > >>> *API*
> > > > >>>
> > > > >>> - I think there are no fundamental differences between these two
> > > > >>> proposals.
> > > > >>> They are trying to achieve similar goals by exposing durability levels in
> > > > >>> different ways.
> > > > >>> So this will be a discussion on what the API/interface should look like
> > > > >>> from the user / admin perspective.
> > > > >>> I would suggest focusing on what the API itself should be, putting the
> > > > >>> implementation design aside when talking about this.
> > > > >>>
> > > > >>> *Core*
> > > > >>>
> > > > >>> - Both proposals need to deal with a core function: what happens to the
> > > > >>> LAC and what semantic BookKeeper provides.
> > > > >>> JV did a good summary in his proposal. However, I am not a fan of
> > > > >>> maintaining two different semantics, so I am looking for
> > > > >>> a solution in which BookKeeper maintains only one semantic. The semantic
> > > > >>> is basically:
> > > > >>>
> > > > >>> 1) LAC is only advanced when the entries before LAC are committed to
> > > > >>> persistent storage
> > > > >>> 2) All the entries up to LAC are successfully committed to persistent
> > > > >>> storage
> > > > >>> 3) All the entries up to LAC must be readable at all times.
> > > > >>>
> > > > >>> If we maintain such a semantic, there is no need to change the auto
> > > > >>> recovery protocol in BookKeeper. All we guarantee is that the entries
> > > > >>> are durably persisted.
> > > > >>>
> > > > >>> In order to maintain such a semantic, I think JV and I proposed similar
> > > > >>> solutions in our respective proposals. I am trying to finalize one here:
> > > > >>>
> > > > >>> * bookie maintains a LAS (Last Add Synced) point for each entry.
> > > > >>> * LAS can be piggybacked on AddResponses
> > > > >>> * Client uses the LAS to advance LAC.
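A minimal Java sketch of the last bullet, under two assumptions not spelled out above: the ledger is not striped (every bookie stores every entry), and each bookie reports a single LAS value for the ledger on its add responses. The writer can then advance the LAC to the highest entry id that at least `ackQuorum` bookies report as synced, and never move it backwards. `LacTracker` and `onAddResponse` are hypothetical names.

```java
import java.util.Arrays;

/** Hypothetical client-side tracker; LAS values arrive piggybacked on AddResponses. */
final class LacTracker {
    private final long[] lasPerBookie;  // highest LAS reported by each bookie, -1 = none yet
    private final int ackQuorum;
    private long lac = -1;              // the LAC never moves backwards

    LacTracker(int ensembleSize, int ackQuorum) {
        this.lasPerBookie = new long[ensembleSize];
        Arrays.fill(lasPerBookie, -1L);
        this.ackQuorum = ackQuorum;
    }

    /** Records the LAS carried by an AddResponse and returns the (possibly advanced) LAC. */
    long onAddResponse(int bookieIndex, long lastAddSynced) {
        // Responses may be reordered, so keep the maximum LAS seen per bookie.
        lasPerBookie[bookieIndex] = Math.max(lasPerBookie[bookieIndex], lastAddSynced);
        long[] sorted = lasPerBookie.clone();
        Arrays.sort(sorted);            // ascending order
        // The ackQuorum-th largest LAS is synced on at least ackQuorum bookies.
        long candidate = sorted[sorted.length - ackQuorum];
        lac = Math.max(lac, candidate);
        return lac;
    }

    long lac() {
        return lac;
    }
}
```

For example, with three bookies and `ackQuorum = 2`, LAS reports of 5, 3 and 7 yield a LAC of 5: only entries up to 5 are known to be synced on two bookies.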
> > > > >>>
> > > > >>> If we can agree on the core semantic we are going to provide, the other
> > > > >>> things are just logistics.
> > > > >>>
> > > > >>> *Others*
> > > > >>>
> > > > >>> - Regarding separating the journal or bypassing the journal, there is no
> > > > >>> difference from the point of view of the core semantic. They are all
> > > > >>> non-durable writes (acknowledging before fsyncing).
> > > > >>> We can start with the same-journal approach (just acknowledging before
> > > > >>> fsyncing), implement the core, and add other options later on.
> > > > >>>
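A minimal in-memory sketch of the "same journal, acknowledge before fsyncing" starting point, assuming the bookie keeps one LAS per ledger and receives entries in entry-id order. A no-sync add is acknowledged right after the append with the current LAS; a durable add forces the flush first. `LedgerJournal` is a hypothetical name, and `flush()` only stands in for the journal fsync.

```java
/** Hypothetical per-ledger state on a bookie; flush() stands in for the journal fsync. */
final class LedgerJournal {
    private long lastAdded = -1;      // highest entry appended to the journal
    private long lastAddSynced = -1;  // LAS: highest entry known to be fsynced

    /**
     * Appends an entry (assumed to arrive in entry-id order) and returns the
     * LAS to piggyback on the AddResponse.
     */
    long addEntry(long entryId, boolean sync) {
        lastAdded = entryId;
        if (sync) {
            flush();                  // durable add: fsync before acknowledging
        }
        return lastAddSynced;         // no-sync add: acknowledged before fsync
    }

    /** Simulated fsync: everything appended so far becomes durable. */
    void flush() {
        lastAddSynced = lastAdded;
    }

    long lastAddSynced() {
        return lastAddSynced;
    }
}
```

Because the flush triggered by a durable add covers everything appended before it, the LAS returned for that add also covers the preceding no-sync entries.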
> > > > >>>
> > > > >>> From my point of view, I'd be more interested in providing a single
> > > > >>> consistent durability semantic that applications can rely on for both
> > > > >>> durable writes and non-durable writes. The other things seem to be more
> > > > >>> a matter of logistics.
> > > > >>>
> > > > >>>
> > > > >>> - Sijie
> > > > >>>
> > > > >>>
> > > > >>> On Mon, Aug 28, 2017 at 11:27 PM, Enrico Olivelli <eolivelli@gmail.com>
> > > > >>> wrote:
> > > > >>>
> > > > >>> > 2017-08-29 8:01 GMT+02:00 Venkateswara Rao Jujjuri <jujjuri@gmail.com>:
> > > > >>> >
> > > > >>> > > I don't believe I fully followed your second case. But even in this
> > > > >>> > > case, is your major concern the additional 'sync' RPC?
> > > > >>> > >
> > > > >>> >
> > > > >>> > Yes, apart from that I am fine with your proposal too, that is, to have
> > > > >>> > a LedgerType which drives durability,
> > > > >>> > and I think we need to add per-entry durability options.
> > > > >>> >
> > > > >>> > I think that at least for the 'simple' no-sync addEntry we do not need
> > > > >>> > to change many things. I am drafting a prototype and will share it as
> > > > >>> > soon as we all agree on the roadmap.
> > > > >>> >
> > > > >>> > The first implementation can cover the first case (no-sync addEntry)
> > > > >>> > and change the way the writer advances the LAC in order to support
> > > > >>> > 'relaxed durability writes'.
> > > > >>> > This change will be compatible with future improvements, and it will
> > > > >>> > open the door to big changes on the bookie side, like bypassing the
> > > > >>> > journal or leveraging multiple journals.
> > > > >>> >
> > > > >>> > -- Enrico
> > > > >>> >
> > > > >>> > > Or is there something else for which the LedgerType proposal won't work?
> > > > >>> > >
> > > > >>> >
> > > > >>> > >
> > > > >>> > >
> > > > >>> > > On Mon, Aug 28, 2017 at 7:35 AM, Enrico Olivelli <eolivelli@gmail.com>
> > > > >>> > > wrote:
> > > > >>> > >
> > > > >>> > > > I think that having a set of options in the ledger metadata will be a
> > > > >>> > > > good enhancement, and I am sure we will do it as soon as it is
> > > > >>> > > > needed; maybe we do not need it now.
> > > > >>> > > >
> > > > >>> > > > Actually, I think we will need to declare this durability level at
> > > > >>> > > > entry level to support some use cases in the BP-14 document. Let me
> > > > >>> > > > explain two of my use cases for which I need it:
> > > > >>> > > >
> > > > >>> > > > At a higher level we have two choices:
> > > > >>> > > >
> > > > >>> > > > A) per-ledger durability options (JV proposal)
> > > > >>> > > > all addEntry operations are durable or non-durable, and there is an
> > > > >>> > > > explicit 'sync' API (+ forced sync at close)
> > > > >>> > > >
> > > > >>> > > > B) per-entry durability options (original BP-14 proposal)
> > > > >>> > > > every addEntry has its own durable/non-durable option (sync/no-sync),
> > > > >>> > > > with the ability to call 'sync' without addEntry (+ forced sync at
> > > > >>> > > > close)
> > > > >>> > > >
> > > > >>> > > > I am speaking about the database WAL case: I am using the ledger as
> > > > >>> > > > a segment of the WAL of a database, and I am writing all data
> > > > >>> > > > changes in the scope of a 'transaction' with the relaxed-durability
> > > > >>> > > > flag, then writing the 'transaction committed' entry with the
> > > > >>> > > > "strict durability" requirement. This will in fact require that all
> > > > >>> > > > previous entries are persisted durably, so that the transaction will
> > > > >>> > > > never be lost.
> > > > >>> > > >
> > > > >>> > > > In this scenario we would in fact need an addEntry + sync API:
> > > > >>> > > >
> > > > >>> > > > Using option A) the WAL will look like:
> > > > >>> > > > - open ledger no-sync = true
> > > > >>> > > > - addEntry (set foo=bar) (this will be no-sync)
> > > > >>> > > > - addEntry (set foo=bar2) (this will be no-sync)
> > > > >>> > > > - addEntry (commit)
> > > > >>> > > > - sync
> > > > >>> > > >
> > > > >>> > > > Using option B) the WAL will look like:
> > > > >>> > > > - open ledger
> > > > >>> > > > - addEntry (set foo=bar), no-sync
> > > > >>> > > > - addEntry (set foo=bar2), no-sync
> > > > >>> > > > - addEntry (commit), sync
> > > > >>> > > >
> > > > >>> > > > In case B) we are "saving" one RPC call to every bookie (the 'sync'
> > > > >>> > > > one). The same holds for single data change entries, like updating
> > > > >>> > > > a single record in the database, which with BK 4.5 "costs" only a
> > > > >>> > > > single RPC to every bookie.
> > > > >>> > > >
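To make the RPC count concrete, here is a toy Java model of the two WAL sequences above. `RpcCounter` and its methods are hypothetical; the model only counts requests, treating each addEntry or sync as one round of RPCs to every bookie, and does not talk to BookKeeper.

```java
/**
 * Toy model counting RPC rounds (one round = one request to every bookie in
 * the write set) for the two API shapes discussed above. Not BookKeeper API.
 */
final class RpcCounter {
    private int rpcRounds = 0;

    /** Each add carries its own durability flag on the same request. */
    void addEntry(String payload, boolean sync) {
        rpcRounds++;
    }

    /** Explicit sync request, needed by option A to make the commit durable. */
    void sync() {
        rpcRounds++;
    }

    int rpcRounds() {
        return rpcRounds;
    }

    /** Option A: ledger opened with no-sync = true, so commit needs a separate sync. */
    static int walTransactionOptionA() {
        RpcCounter ledger = new RpcCounter();
        ledger.addEntry("set foo=bar", false);
        ledger.addEntry("set foo=bar2", false);
        ledger.addEntry("commit", false);
        ledger.sync();
        return ledger.rpcRounds();
    }

    /** Option B: the commit entry itself is written with sync = true. */
    static int walTransactionOptionB() {
        RpcCounter ledger = new RpcCounter();
        ledger.addEntry("set foo=bar", false);
        ledger.addEntry("set foo=bar2", false);
        ledger.addEntry("commit", true);
        return ledger.rpcRounds();
    }
}
```

`walTransactionOptionA()` returns 4 and `walTransactionOptionB()` returns 3; the difference is the separate 'sync' round. For a single-record update the same counting gives 2 rounds under option A (add plus sync) versus 1 under option B, which matches the single RPC a durable add costs with BK 4.5.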
> > > > >>> > > > Second case:
> > > > >>> > > > I am using BookKeeper to store binary objects, so I am packing
> > > > >>> > > > multiple 'objects' (named sequences of bytes) into a single ledger,
> > > > >>> > > > like you do when you write many records to a file in a streaming
> > > > >>> > > > fashion and keep track of the offset of the beginning of every record
> > > > >>> > > > (LedgerHandleAdv is perfect for this case).
> > > > >>> > > > I am not using a single ledger per 'file' because creating many
> > > > >>> > > > ledgers very fast kills ZooKeeper. In my systems I have big bursts of
> > > > >>> > > > writes which need to be really "fast", so I am writing multiple
> > > > >>> > > > 'files' to every single ledger. So close-to-open consistency at
> > > > >>> > > > ledger level is not suitable for this case.
> > > > >>> > > > I have to write as fast as possible to this 'ledger-backed' stream,
> > > > >>> > > > and, as with a 'traditional' filesystem, I am writing parts of each
> > > > >>> > > > file and then requiring 'sync' at the end of each file.
> > > > >>> > > > Using BookKeeper you need to split big 'files' into "little" parts;
> > > > >>> > > > you cannot transmit the contents as a "real" stream on the network.
> > > > >>> > > >
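A minimal Java sketch of the second use case, assuming each object is split into fixed-size chunks, every chunk is written as one no-sync entry, and only the last entry of an object requests a sync. An in-memory list stands in for the ledger, and `ObjectPacker`, `putObject` and `getObject` are hypothetical names; the recorded entry-id range per object plays the role of the "offsets" mentioned above.

```java
import java.io.ByteArrayOutputStream;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical packer storing many named objects in one ledger-like log. */
final class ObjectPacker {
    private final int chunkSize;
    private final List<byte[]> entries = new ArrayList<>();    // stands in for the ledger
    private final List<Boolean> syncFlags = new ArrayList<>(); // durability flag per entry
    private final Map<String, long[]> index = new HashMap<>(); // name -> {firstId, lastId}

    ObjectPacker(int chunkSize) {
        if (chunkSize <= 0) {
            throw new IllegalArgumentException("chunkSize must be positive");
        }
        this.chunkSize = chunkSize;
    }

    /** Splits the object into entries; only its last entry asks for a sync. */
    void putObject(String name, byte[] data) {
        long firstId = entries.size();
        int offset = 0;
        do {
            int end = Math.min(offset + chunkSize, data.length);
            entries.add(Arrays.copyOfRange(data, offset, end));
            offset = end;
            syncFlags.add(offset >= data.length);  // sync at the end of each object
        } while (offset < data.length);
        index.put(name, new long[] {firstId, entries.size() - 1});
    }

    /** Reassembles an object from its recorded entry-id range. */
    byte[] getObject(String name) {
        long[] range = index.get(name);
        if (range == null) {
            throw new IllegalArgumentException("unknown object: " + name);
        }
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        for (long id = range[0]; id <= range[1]; id++) {
            byte[] chunk = entries.get((int) id);
            out.write(chunk, 0, chunk.length);
        }
        return out.toByteArray();
    }

    int entryCount() {
        return entries.size();
    }

    boolean isSyncEntry(long entryId) {
        return syncFlags.get((int) entryId);
    }
}
```

Writing a 10-byte object and then a 2-byte object with `chunkSize = 4` produces four entries, with a sync requested only on entries 2 and 3, i.e. once per object.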
> > > > >>> > > > I am not talking about bookie-level implementation details; I would
> > > > >>> > > > like to define the high-level API in order to support all the
> > > > >>> > > > relevant known use cases and keep room for the future.
> > > > >>> > > > At this moment adding a per-entry 'durability option' seems to be
> > > > >>> > > > very flexible and simple to implement, and it does not prevent us
> > > > >>> > > > from making further improvements, namely skipping the journal.
> > > > >>> > > >
> > > > >>> > > > Enrico
> > > > >>> > > >
> > > > >>> > > >
> > > > >>> > > >
> > > > >>> > > > 2017-08-26 19:55 GMT+02:00 Enrico Olivelli <eolivelli@gmail.com>:
> > > > >>> > > >
> > > > >>> > > > >
> > > > >>> > > > >
> > > > >>> > > > > On Sat, Aug 26, 2017, 19:19 Venkateswara Rao Jujjuri <jujjuri@gmail.com>
> > > > >>> > > > > wrote:
> > > > >>> > > > >
> > > > >>> > > > >> Hi all,
> > > > >>> > > > >>
> > > > >>> > > > >> As promised during Thursday's call, here is my proposal.
> > > > >>> > > > >>
> > > > >>> > > > >> *NOTE*: The major difference in this proposal compared to Enrico's
> > > > >>> > > > >> <https://docs.google.com/document/d/1JLYO3K3tZ5PJGmyS0YK_-NW8VOUUgUWVBmswCUOG158/edit#heading=h.q2rewiqndr5v>
> > > > >>> > > > >> is making the durability a property of the ledger (type) as opposed
> > > > >>> > > > >> to addEntry(). The rest of the technical details have a lot of
> > > > >>> > > > >> similarities.
> > > > >>> > > > >>
> > > > >>> > > > >
> > > > >>> > > > > Thank you JV. I have just quickly read the doc and your view is
> > > > >>> > > > > certainly broader.
> > > > >>> > > > > I will dig into the doc as soon as possible on Monday.
> > > > >>> > > > > For me it is OK to have a ledger-wide configuration. I think that the
> > > > >>> > > > > most important decision is about the API we will provide, as in the
> > > > >>> > > > > future it will be difficult to change it.
> > > > >>> > > > >
> > > > >>> > > > >
> > > > >>> > > > > Cheers
> > > > >>> > > > > Enrico
> > > > >>> > > > >
> > > > >>> > > > >
> > > > >>> > > > >
> > > > >>> > > > >> https://docs.google.com/document/d/1g1eBcVVCZrTG8YZliZP0LVqvWpq432ODEghrGVQ4d4Q/edit?usp=sharing
> > > > >>> > > > >>
> > > > >>> > > > >> On Thu, Aug 24, 2017 at 1:14 AM, Enrico Olivelli <eolivelli@gmail.com>
> > > > >>> > > > >> wrote:
> > > > >>> > > > >>
> > > > >>> > > > >> > Thank you all for the comments and for taking a look at the
> > > > >>> > > > >> > document so soon.
> > > > >>> > > > >> > I have updated the doc; we will discuss it at the meeting.
> > > > >>> > > > >> >
> > > > >>> > > > >> >
> > > > >>> > > > >> > Enrico
> > > > >>> > > > >> >
> > > > >>> > > > >> > 2017-08-24 2:27 GMT+02:00 Sijie Guo <guosijie@gmail.com>:
> > > > >>> > > > >> >
> > > > >>> > > > >> > > Enrico,
> > > > >>> > > > >> > >
> > > > >>> > > > >> > > Thank you so much! It is a great effort to put this up. Overall it
> > > > >>> > > > >> > > looks good. I made some comments; we can discuss them at
> > > > >>> > > > >> > > tomorrow's community meeting.
> > > > >>> > > > >> > >
> > > > >>> > > > >> > > - Sijie
> > > > >>> > > > >> > >
> > > > >>> > > > >> > >
> > > > >>> > > > >> >
> > > > >>> > > > >>
> > > > >>> > > > >>
> > > > >>> > > > >>
> > > > >>> > > > >> --
> > > > >>> > > > >> Jvrao
> > > > >>> > > > >> ---
> > > > >>> > > > >> First they ignore you, then they laugh at you, then they fight you,
> > > > >>> > > > >> then you win. - Mahatma Gandhi
> > > > >>> > > > >>
> > > > >>> > > > > --
> > > > >>> > > > >
> > > > >>> > > > >
> > > > >>> > > > > -- Enrico Olivelli
> > > > >>> > > > >
> > > > >>> > > >
> > > > >>> > >
> > > > >>> > >
> > > > >>> > >
> > > > >>> > > --
> > > > >>> > > Jvrao
> > > > >>> > >
> > > > >>> >
> > > > >>>
> > > > >>
> > > > >>
> > > > >
> > > >
> > >
> >
>

> > > keep
> > > > >>> track
> > > > >>> > of
> > > > >>> > > > offsets of the beginning of every record (LedgerHandeAdv is
> > > > >>> perfect for
> > > > >>> > > > this case).
> > > > >>> > > > I am not using a single ledger per 'file' because it kills
> > > > >>> zookeeper to
> > > > >>> > > > create many ledgers very fast, in my systems I have big
> busts
> > > of
> > > > >>> > writes,
> > > > >>> > > > which need to be really "fast", so I am writing multiple
> > > 'files'
> > > > to
> > > > >>> > every
> > > > >>> > > > single ledger. So the close-to-open consistency at ledger
> > level
> > > > is
> > > > >>> not
> > > > >>> > > > suitable for this case.
> > > > >>> > > > I have to write as fast as possible to this 'ledger-backed'
> > > > >>> stream, and
> > > > >>> > > as
> > > > >>> > > > with a 'traditional'  filesystem I am writing parts of each
> > > file
> > > > >>> and
> > > > >>> > than
> > > > >>> > > > requiring 'sync' at the end of each file.
> > > > >>> > > > Using BookKeeper you need to split big 'files' into
> "little"
> > > > >>> parts, you
> > > > >>> > > > cannot transmit the contents as to "real" stream on
> network.
> > > > >>> > > >
> > > > >>> > > > I am not talking about bookie level implementation details
> I
> > > > would
> > > > >>> like
> > > > >>> > > to
> > > > >>> > > > define the high level API in order to support all the
> > relevant
> > > > >>> known
> > > > >>> > use
> > > > >>> > > > cases and keep space for the future,
> > > > >>> > > > at this moment adding a per-entry 'durability option' seems
> > to
> > > be
> > > > >>> very
> > > > >>> > > > flexible and simple to implement, it does not prevent us
> from
> > > > doing
> > > > >>> > > further
> > > > >>> > > > improvements, like namely skipping the journal.
> > > > >>> > > >
> > > > >>> > > > Enrico
> > > > >>> > > >
> > > > >>> > > >
> > > > >>> > > >
> > > > >>> > > > 2017-08-26 19:55 GMT+02:00 Enrico Olivelli <
> > > eolivelli@gmail.com
> > > > >:
> > > > >>> > > >
> > > > >>> > > > >
> > > > >>> > > > >
> > > > >>> > > > > On sab 26 ago 2017, 19:19 Venkateswara Rao Jujjuri <
> > > > >>> > jujjuri@gmail.com>
> > > > >>> > > > > wrote:
> > > > >>> > > > >
> > > > >>> > > > >> Hi all,
> > > > >>> > > > >>
> > > > >>> > > > >> As promised during Thursday call, here is my proposal.
> > > > >>> > > > >>
> > > > >>> > > > >> *NOTE*: Major difference in this proposal compared to
> > > Enrico’s
> > > > >>> > > > >> <https://docs.google.com/document/d/
> > 1JLYO3K3tZ5PJGmyS0YK_-
> > > > >>> > > > >> NW8VOUUgUWVBmswCUOG158/edit#heading=h.q2rewiqndr5v>
> > > > >>> > > > >> is
> > > > >>> > > > >> making the durability a property of the ledger(type) as
> > > > opposed
> > > > >>> to
> > > > >>> > > > >> addEntry(). Rest of the technical details have a lot of
> > > > >>> > similarities.
> > > > >>> > > > >>
> > > > >>> > > > >
> > > > >>> > > > > Thank you JV. I have just read quickly the doc and your
> > view
> > > is
> > > > >>> > > centantly
> > > > >>> > > > > broader.
> > > > >>> > > > > I will dig into the doc as soon as possible on Monday.
> > > > >>> > > > > For me it is ok to have a ledger wide configuration I
> think
> > > > that
> > > > >>> the
> > > > >>> > > most
> > > > >>> > > > > important decision is about the API we will provide as in
> > the
> > > > >>> future
> > > > >>> > it
> > > > >>> > > > > will be difficult to change it.
> > > > >>> > > > >
> > > > >>> > > > >
> > > > >>> > > > > Cheers
> > > > >>> > > > > Enrico
> > > > >>> > > > >
> > > > >>> > > > >
> > > > >>> > > > >
> > > > >>> > > > >> https://docs.google.com/document/d/
> > > 1g1eBcVVCZrTG8YZliZP0LVqv
> > > > >>> Wpq43
> > > > >>> > > > >> 2ODEghrGVQ4d4Q/edit?usp=sharing
> > > > >>> > > > >>
> > > > >>> > > > >> On Thu, Aug 24, 2017 at 1:14 AM, Enrico Olivelli <
> > > > >>> > eolivelli@gmail.com
> > > > >>> > > >
> > > > >>> > > > >> wrote:
> > > > >>> > > > >>
> > > > >>> > > > >> > Thank you all for the comments and for taking a look
> to
> > > the
> > > > >>> > document
> > > > >>> > > > so
> > > > >>> > > > >> > soon.
> > > > >>> > > > >> > I have updated the doc, we will discuss the document
> at
> > > the
> > > > >>> > meeting,
> > > > >>> > > > >> >
> > > > >>> > > > >> >
> > > > >>> > > > >> > Enrico
> > > > >>> > > > >> >
> > > > >>> > > > >> > 2017-08-24 2:27 GMT+02:00 Sijie Guo <
> guosijie@gmail.com
> > >:
> > > > >>> > > > >> >
> > > > >>> > > > >> > > Enrico,
> > > > >>> > > > >> > >
> > > > >>> > > > >> > > Thank you so much! It is a great effort for putting
> > this
> > > > up.
> > > > >>> > > Overall
> > > > >>> > > > >> > looks
> > > > >>> > > > >> > > good. I made some comments, we can discuss at
> > tomorrow's
> > > > >>> > community
> > > > >>> > > > >> > meeting.
> > > > >>> > > > >> > >
> > > > >>> > > > >> > > - Sijie
> > > > >>> > > > >> > >
> > > > >>> > > > >> > > On Wed, Aug 23, 2017 at 8:25 AM, Enrico Olivelli <
> > > > >>> > > > eolivelli@gmail.com
> > > > >>> > > > >> >
> > > > >>> > > > >> > > wrote:
> > > > >>> > > > >> > >
> > > > >>> > > > >> > > > Hi all,
> > > > >>> > > > >> > > > I have drafted a first proposal for BP-14 - Relax
> > > > >>> Durability
> > > > >>> > > > >> > > >
> > > > >>> > > > >> > > > We are talking about limiting the number of fsync
> to
> > > the
> > > > >>> > journal
> > > > >>> > > > >> while
> > > > >>> > > > >> > > > preserving the correctness of the LAC protocol.
> > > > >>> > > > >> > > >
> > > > >>> > > > >> > > > This is the link to the wiki page, but as the
> issue
> > is
> > > > >>> huge we
> > > > >>> > > > >> prefer
> > > > >>> > > > >> > to
> > > > >>> > > > >> > > > use Google Documents for sharing comments
> > > > >>> > > > >> > > > https://cwiki.apache.org/
> > > confluence/display/BOOKKEEPER/
> > > > >>> > > > >> > > > BP+-+14+Relax+durability
> > > > >>> > > > >> > > >
> > > > >>> > > > >> > > > This is the document
> > > > >>> > > > >> > > > https://docs.google.com/document/d/
> > > > 1JLYO3K3tZ5PJGmyS0YK_-
> > > > >>> > > > >> > > > NW8VOUUgUWVBmswCUOG158/edit?usp=sharing
> > > > >>> > > > >> > > >
> > > > >>> > > > >> > > > All comments are welcome
> > > > >>> > > > >> > > >
> > > > >>> > > > >> > > > I have added DL dev list in cc as the discussion
> is
> > > > >>> > interesting
> > > > >>> > > > for
> > > > >>> > > > >> > both
> > > > >>> > > > >> > > > groups
> > > > >>> > > > >> > > >
> > > > >>> > > > >> > > > Enrico Olivelli
> > > > >>> > > > >> > > >
> > > > >>> > > > >> > >
> > > > >>> > > > >> >
> > > > >>> > > > >>
> > > > >>> > > > >>
> > > > >>> > > > >>
> > > > >>> > > > >> --
> > > > >>> > > > >> Jvrao
> > > > >>> > > > >> ---
> > > > >>> > > > >> First they ignore you, then they laugh at you, then they
> > > fight
> > > > >>> you,
> > > > >>> > > then
> > > > >>> > > > >> you win. - Mahatma Gandhi
> > > > >>> > > > >>
> > > > >>> > > > > --
> > > > >>> > > > >
> > > > >>> > > > >
> > > > >>> > > > > -- Enrico Olivelli
> > > > >>> > > > >
> > > > >>> > > >
> > > > >>> > >
> > > > >>> > >
> > > > >>> > >
> > > > >>> > > --
> > > > >>> > > Jvrao
> > > > >>> > > ---
> > > > >>> > > First they ignore you, then they laugh at you, then they
> fight
> > > you,
> > > > >>> then
> > > > >>> > > you win. - Mahatma Gandhi
> > > > >>> > >
> > > > >>> >
> > > > >>>
> > > > >>
> > > > >>
> > > > >
> > > >
> > >
> >
>
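
The two WAL sequences quoted above (option A: per-ledger durability plus an explicit 'sync'; option B: a per-entry sync/no-sync flag) can be compared with a toy model. This is a minimal sketch in Python, not BookKeeper code: FakeBookie, add_entry, sync, wal_option_a and wal_option_b are hypothetical names, each call is counted as one RPC to the bookie, and one journal fsync is assumed to make every buffered entry durable (group commit).

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class FakeBookie:
    """Toy bookie that counts RPCs and fsyncs (hypothetical, for illustration)."""
    rpcs: int = 0
    fsyncs: int = 0
    buffered: List[int] = field(default_factory=list)  # written, not yet fsynced
    durable: List[int] = field(default_factory=list)   # fsynced to the journal

    def _fsync(self) -> None:
        # Assumption: one journal fsync makes every buffered entry durable.
        self.durable.extend(self.buffered)
        self.buffered.clear()
        self.fsyncs += 1

    def add_entry(self, entry_id: int, sync: bool) -> None:
        self.rpcs += 1  # one addEntry RPC
        self.buffered.append(entry_id)
        if sync:
            self._fsync()

    def sync(self) -> None:
        self.rpcs += 1  # one explicit 'sync' RPC without payload
        if self.buffered:
            self._fsync()


def wal_option_a(bookie: FakeBookie, txn: List[str]) -> None:
    # Option A: the whole ledger is no-sync; durability comes from an explicit sync.
    for entry_id, _record in enumerate(txn):
        bookie.add_entry(entry_id, sync=False)
    bookie.sync()


def wal_option_b(bookie: FakeBookie, txn: List[str]) -> None:
    # Option B: data entries are no-sync, the final 'commit' entry carries sync=True.
    last = len(txn) - 1
    for entry_id, _record in enumerate(txn):
        bookie.add_entry(entry_id, sync=(entry_id == last))


if __name__ == "__main__":
    txn = ["set foo=bar", "set foo=bar2", "commit"]
    a, b = FakeBookie(), FakeBookie()
    wal_option_a(a, txn)
    wal_option_b(b, txn)
    print(a.rpcs, a.fsyncs, b.rpcs, b.fsyncs)  # 4 1 3 1
```

In this model the three-entry transaction costs four RPCs per bookie under option A and three under option B, with a single fsync either way, which is the "saving one RPC" argument; a single-record update costs two RPCs under option A and one under option B.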

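Enrico's second case (packing many named objects into one ledger and asking for a sync only at the end of each object) boils down to keeping an index from object name to a range of entry ids. The sketch below models the ledger as an in-memory Python list; PackedLedgerWriter, max_entry_size and the (first, last) index are hypothetical illustrations, not the LedgerHandleAdv API, and the boolean stored with each chunk stands in for the proposed per-entry durability option.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class PackedLedgerWriter:
    """Sketch: many named objects packed into one append-only 'ledger' (a list)."""
    max_entry_size: int
    entries: List[Tuple[bytes, bool]] = field(default_factory=list)  # (payload, sync)
    index: Dict[str, Tuple[int, int]] = field(default_factory=dict)  # name -> (first, last)

    def write_object(self, name: str, data: bytes) -> Tuple[int, int]:
        # Split the object into entry-sized chunks; an empty object still uses one entry.
        step = self.max_entry_size
        chunks = [data[i:i + step] for i in range(0, len(data), step)] or [b""]
        first = len(self.entries)
        for i, chunk in enumerate(chunks):
            # Only the last chunk requests a sync, like an fsync at the end of a file.
            self.entries.append((chunk, i == len(chunks) - 1))
        last = len(self.entries) - 1
        self.index[name] = (first, last)
        return first, last

    def read_object(self, name: str) -> bytes:
        first, last = self.index[name]
        return b"".join(payload for payload, _sync in self.entries[first:last + 1])


if __name__ == "__main__":
    w = PackedLedgerWriter(max_entry_size=4)
    print(w.write_object("a", b"hello world"))  # (0, 2)
    print(w.write_object("b", b"xy"))           # (3, 3)
    print(w.read_object("a"))                   # b'hello world'
    print([sync for _, sync in w.entries])      # [False, False, True, True]
```

With per-entry flags, only one entry per object pays for a sync, while the intermediate chunks are acknowledged without waiting for the journal.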
Re: [DISCUSS] BP-14 Relax Durability

Posted by Sijie Guo <gu...@gmail.com>.
Cool.

I would expect this is a big change. It would be good if you can divide it
into smaller tasks, so people can review them easier.

- Sijie

On Tue, Sep 12, 2017 at 1:05 AM, Enrico Olivelli <eo...@gmail.com>
wrote:

> Thank you all!
>
> I will copy the content of the final draft to the Wiki and mark the
> document as "Accepted".
>
> I will send a PR soon, but it will depend on BP-15 New CreateLedger API.
>
> I hope we can make it for 4.6.
>
>
> Enrico
>
>
> 2017-09-11 18:58 GMT+02:00 Sijie Guo <gu...@gmail.com>:
>
> > Enrico,
> >
> > Feel free to close the thread and mark this BP as accepted, if there is
> > no -1.
> >
> > - Sijie
> >
> > On Mon, Sep 11, 2017 at 2:26 AM, Enrico Olivelli <eo...@gmail.com>
> > wrote:
> >
> > > Ping
> > >
> > > 2017-09-07 9:32 GMT+02:00 Enrico Olivelli <eo...@gmail.com>:
> > >
> > > > Hi all,
> > > >
> > > >
> > > > You can find the revised proposal here:
> > > > https://cwiki.apache.org/confluence/display/BOOKKEEPER/BP-14+Relax+durability
> > > >
> > > > The link to the document open for comments is this:
> > > > https://docs.google.com/document/d/1yNi9t2_deOOMXDaGzrnmaHTQeB3B3Fnym82DUERH7LM/edit?usp=sharing
> > > >
> > > > Please check it out.
> > > > We are going to review this proposal at the meeting.
> > > >
> > > > -- Enrico
> > > >
> > > >
> > > > 2017-08-30 8:56 GMT+02:00 Enrico Olivelli <eo...@gmail.com>:
> > > >
> > > > >> Thank you Sijie for summarizing, and thanks to the community for
> > > > >> helping with this important enhancement to BookKeeper.
> > > > >>
> > > > >> I am convinced that, as JV pointed out, we need to declare at ledger
> > > > >> creation time that the ledger is going to perform no-sync writes.
> > > > >>
> > > > >> I think we currently need an explicit declaration to make things
> > > > >> "clear" to the developer who is using the LedgerHandle API, even at
> > > > >> ledger creation time.
> > > > >>
> > > > >> The case is that we are going to forbid "striping" ledgers (ensemble
> > > > >> size > quorum size) for no-sync writes in the first implementation:
> > > > >> - one option is to fail at the first no-sync addEntry, but this will
> > > > >> be really inconvenient, because the ack/write/ensemble sizes are
> > > > >> usually configured by the admin, and there will be configurations in
> > > > >> which errors come out only after the system has started.
> > > > >> - the second option is to make the developer explicitly enable
> > > > >> no-sync writes at creation time, and fail the creation of the ledger
> > > > >> if the requested combination of options is not possible.
> > > > >>
> > > > >> I am not sure that the changes to the bookie internals are a
> > > > >> Client-API matter; maybe we can leverage custom metadata (as JV said)
> > > > >> in order to make the bookie handle ledgers in a different manner.
> > > > >> That path will always be open, as custom metadata already exist.
> > > > >>
> > > > >> JV preferred the ledger-type approach; the alternative solution is to
> > > > >> introduce a list of "capabilities" or "ledger options".
> > > > >> I think that this ability to perform no-sync writes is so important
> > > > >> that "custom metadata" is not the right place to declare it, and the
> > > > >> same goes for "ledger type".
> > > > >>
> > > > >> So I am proposing to add a boolean 'allowNoSyncWrites' at ledger
> > > > >> creation time, without writing it into the ledger metadata on ZK.
> > > > >> I think that if further improvements need ledger metadata changes,
> > > > >> we will make them.
> > > > >>
> > > > >> I have updated the BP-14 document and added an "Open issues" footer
> > > > >> with the open points; please add comments and I will correct the
> > > > >> document as soon as possible.
> > > > >>
> > > > >>
> > > > >> Enrico
> > > >>
> > > >>
> > > >>
> > > >>
> > > >> 2017-08-30 1:24 GMT+02:00 Sijie Guo <gu...@gmail.com>:
> > > >>
> > > > >>> Thank you, Enrico, JV.
> > > > >>>
> > > > >>> These are great discussions.
> > > > >>>
> > > > >>> After reading these two proposals, I have a few very high-level
> > > > >>> comments, divided into three categories.
> > > > >>>
> > > > >>>
> > > > >>> *API*
> > > > >>>
> > > > >>> - I think there are no fundamental differences between these two
> > > > >>> proposals. They are trying to achieve similar goals by exposing
> > > > >>> durability levels in different ways.
> > > > >>> So this will be a discussion on what the API/interface should look
> > > > >>> like from the user / admin perspective.
> > > > >>> I would suggest focusing on the API itself, putting the
> > > > >>> implementation design aside when talking about this.
> > > > >>>
> > > > >>> *Core*
> > > > >>>
> > > > >>> - Both proposals need to deal with a core function: what happens to
> > > > >>> the LAC and what semantic BookKeeper provides.
> > > > >>> JV did a good summary in his proposal. However, I am not a fan of
> > > > >>> maintaining two different semantics, so I am looking for a solution
> > > > >>> in which BookKeeper maintains only one semantic. The semantic is
> > > > >>> basically:
> > > > >>>
> > > > >>> 1) The LAC is only advanced when the entries before the LAC are
> > > > >>> committed to persistent storage.
> > > > >>> 2) All the entries up to the LAC are successfully committed to
> > > > >>> persistent storage.
> > > > >>> 3) All the entries up to the LAC must be readable at all times.
> > > > >>>
> > > > >>> If we maintain such a semantic, there is no need to change the
> > > > >>> auto-recovery protocol in BookKeeper. All we guarantee is that those
> > > > >>> entries are durably persisted.
> > > > >>>
> > > > >>> In order to maintain such a semantic, I think both JV and I proposed
> > > > >>> a similar solution in our proposals. I am trying to finalize one
> > > > >>> here:
> > > > >>>
> > > > >>> * the bookie maintains a LAS (Last Add Synced) point for each entry.
> > > > >>> * LAS can be piggybacked on AddResponses.
> > > > >>> * the client uses the LAS to advance the LAC.
> > > > >>>
> > > > >>> If we can agree on the core semantic we are going to provide, the
> > > > >>> other things are just logistics.
> > > > >>>
> > > > >>> *Others*
> > > > >>>
> > > > >>> - Regarding separating the journal or bypassing the journal, there
> > > > >>> is no difference when we talk about the core semantic. They are all
> > > > >>> non-durable writes (acknowledging before fsyncing).
> > > > >>> We can start with the same-journal approach (just acknowledging
> > > > >>> before fsyncing), implement the core, and add other options later on.
> > > > >>>
> > > > >>>
> > > > >>> From my point of view, I'd be more interested in providing a single
> > > > >>> consistent durability semantic that applications can rely on for
> > > > >>> both durable writes and non-durable writes. The other stuff seems to
> > > > >>> be more about logistics.
> > > > >>>
> > > > >>>
> > > > >>> - Sijie
> > > > >>>
> > > > >>>
> > > > >>> On Mon, Aug 28, 2017 at 11:27 PM, Enrico Olivelli <eolivelli@gmail.com>
> > > > >>> wrote:
> > > > >>>
> > > > >>> > 2017-08-29 8:01 GMT+02:00 Venkateswara Rao Jujjuri <jujjuri@gmail.com>:
> > > > >>> >
> > > > >>> > > I don't believe I fully followed your second case. But even in
> > > > >>> > > this case, is your major concern the additional 'sync' RPC?
> > > > >>> > >
> > > > >>> >
> > > > >>> > Yes. Apart from that I am fine with your proposal too, that is, to
> > > > >>> > have a LedgerType which drives durability, and I think we need to
> > > > >>> > add per-entry durability options.
> > > > >>> >
> > > > >>> > I think that at least for the 'simple' no-sync addEntry we do not
> > > > >>> > need to change many things. I am drafting a prototype and will
> > > > >>> > share it as soon as we all agree on the roadmap.
> > > > >>> >
> > > > >>> > The first implementation can cover the first case (no-sync
> > > > >>> > addEntry) and change the way the writer advances the LAC in order
> > > > >>> > to support 'relaxed durability writes'.
> > > > >>> > This change will be compatible with future improvements, and it
> > > > >>> > will open the door to big changes on the bookie side, like
> > > > >>> > bypassing the journal or leveraging multiple journals.
> > > > >>> >
> > > > >>> > -- Enrico
> > > > >>> >
> > > > >>> > > Or is it something else that the LedgerType proposal won't work for?
> > > > >>> > >
> > > > >>> > >
> > > > >>> > > On Mon, Aug 28, 2017 at 7:35 AM, Enrico Olivelli <eolivelli@gmail.com>
> > > > >>> > > wrote:
> > > > >>> > >
> > > > >>> > > > I think that having a set of options in the ledger metadata will
> > > > >>> > > > be a good enhancement, and I am sure we will do it as soon as it
> > > > >>> > > > is needed; maybe we do not need it now.
> > > > >>> > > >
> > > > >>> > > > Actually, I think we will need to declare this durability level
> > > > >>> > > > at entry level to support some use cases in the BP-14 document.
> > > > >>> > > > Let me explain two of my use cases for which I need it.
> > > > >>> > > >
> > > > >>> > > > At a higher level we have two choices:
> > > > >>> > > >
> > > > >>> > > > A) per-ledger durability options (JV proposal)
> > > > >>> > > > all addEntry operations are durable or non-durable, and there is
> > > > >>> > > > an explicit 'sync' API (+ forced sync at close)
> > > > >>> > > >
> > > > >>> > > > B) per-entry durability options (original BP-14 proposal)
> > > > >>> > > > every addEntry has its own durable/non-durable option
> > > > >>> > > > (sync/no-sync), with the ability to call 'sync' without addEntry
> > > > >>> > > > (+ forced sync at close)
> > > > >>> > > >
> > > > >>> > > > First case: the database WAL. I am using the ledger as a segment
> > > > >>> > > > of the WAL of a database, and I am writing all data changes in
> > > > >>> > > > the scope of a 'transaction' with the relaxed-durability flag;
> > > > >>> > > > then I am writing the 'transaction committed' entry with a
> > > > >>> > > > "strict durability" requirement. This in fact requires that all
> > > > >>> > > > previous entries are persisted durably, so that the transaction
> > > > >>> > > > will never be lost.
> > > > >>> > > >
> > > > >>> > > > In this scenario we would in fact need an addEntry + sync API:
> > > > >>> > > >
> > > > >>> > > > using option A) the WAL will look like:
> > > > >>> > > > - open ledger no-sync = true
> > > > >>> > > > - addEntry (set foo=bar)  (this will be no-sync)
> > > > >>> > > > - addEntry (set foo=bar2) (this will be no-sync)
> > > > >>> > > > - addEntry (commit)
> > > > >>> > > > - sync
> > > > >>> > > >
> > > > >>> > > > using option B) the WAL will look like:
> > > > >>> > > > - open ledger
> > > > >>> > > > - addEntry (set foo=bar), no-sync
> > > > >>> > > > - addEntry (set foo=bar2), no-sync
> > > > >>> > > > - addEntry (commit), sync
> > > > >>> > > >
> > > > >>> > > > In case B) we are "saving" one RPC call to every bookie (the
> > > > >>> > > > 'sync' one). The same goes for single data change entries, like
> > > > >>> > > > updating a single record in the database: with BK 4.5 this
> > > > >>> > > > "costs" only a single RPC to every bookie.
> > > > >>> > > >
> > > > >>> > > > Second case:
> > > > >>> > > > I am using BookKeeper to store binary objects, so I am packing
> > > > >>> > > > many 'objects' (named sequences of bytes) into a single ledger,
> > > > >>> > > > like you do when you write many records to a file in a streaming
> > > > >>> > > > fashion and keep track of the offset of the beginning of every
> > > > >>> > > > record (LedgerHandleAdv is perfect for this case).
> > > > >>> > > > I am not using a single ledger per 'file' because creating many
> > > > >>> > > > ledgers very fast kills ZooKeeper. In my systems I have big
> > > > >>> > > > bursts of writes, which need to be really "fast", so I am
> > > > >>> > > > writing multiple 'files' to every single ledger. So the
> > > > >>> > > > close-to-open consistency at ledger level is not suitable for
> > > > >>> > > > this case.
> > > > >>> > > > I have to write as fast as possible to this 'ledger-backed'
> > > > >>> > > > stream, and, as with a 'traditional' filesystem, I am writing
> > > > >>> > > > parts of each file and then requiring a 'sync' at the end of
> > > > >>> > > > each file.
> > > > >>> > > > Using BookKeeper you need to split big 'files' into "little"
> > > > >>> > > > parts; you cannot transmit the contents as a "real" stream over
> > > > >>> > > > the network.
> > > > >>> > > >
> > > > >>> > > > I am not talking about bookie-level implementation details; I
> > > > >>> > > > would like to define the high-level API in order to support all
> > > > >>> > > > the relevant known use cases and keep space for the future.
> > > > >>> > > > At this moment adding a per-entry 'durability option' seems to
> > > > >>> > > > be very flexible and simple to implement, and it does not
> > > > >>> > > > prevent us from making further improvements, namely skipping
> > > > >>> > > > the journal.
> > > > >>> > > >
> > > > >>> > > > Enrico
> > > > >>> > > >
> > > > >>> > > >
> > > > >>> > > >
> > > > >>> > > > 2017-08-26 19:55 GMT+02:00 Enrico Olivelli <eolivelli@gmail.com>:
> > > > >>> > > >
> > > > >>> > > > >
> > > > >>> > > > >
> > > > >>> > > > > On Sat 26 Aug 2017, 19:19 Venkateswara Rao Jujjuri <jujjuri@gmail.com>
> > > > >>> > > > > wrote:
> > > > >>> > > > >
> > > > >>> > > > >> Hi all,
> > > > >>> > > > >>
> > > > >>> > > > >> As promised during Thursday's call, here is my proposal.
> > > > >>> > > > >>
> > > > >>> > > > >> *NOTE*: The major difference in this proposal compared to
> > > > >>> > > > >> Enrico’s
> > > > >>> > > > >> <https://docs.google.com/document/d/1JLYO3K3tZ5PJGmyS0YK_-NW8VOUUgUWVBmswCUOG158/edit#heading=h.q2rewiqndr5v>
> > > > >>> > > > >> is making durability a property of the ledger (type) as
> > > > >>> > > > >> opposed to addEntry(). The rest of the technical details have
> > > > >>> > > > >> a lot of similarities.
> > > > >>> > > > >>
> > > > >>> > > > >
> > > > >>> > > > > Thank you JV. I have just read the doc quickly and your view
> > > > >>> > > > > is certainly broader.
> > > > >>> > > > > I will dig into the doc as soon as possible on Monday.
> > > > >>> > > > > For me it is OK to have a ledger-wide configuration. I think
> > > > >>> > > > > that the most important decision is about the API we will
> > > > >>> > > > > provide, as in the future it will be difficult to change it.
> > > > >>> > > > >
> > > > >>> > > > >
> > > > >>> > > > > Cheers
> > > > >>> > > > > Enrico
> > > > >>> > > > >
> > > > >>> > > > >
> > > > >>> > > > >
> > > > >>> > > > >> https://docs.google.com/document/d/1g1eBcVVCZrTG8YZliZP0LVqvWpq432ODEghrGVQ4d4Q/edit?usp=sharing
> > > > >>> > > > >>
> > > > >>> > > > >> On Thu, Aug 24, 2017 at 1:14 AM, Enrico Olivelli <eolivelli@gmail.com>
> > > > >>> > > > >> wrote:
> > > > >>> > > > >>
> > > > >>> > > > >> > Thank you all for the comments and for taking a look at the
> > > > >>> > > > >> > document so soon.
> > > > >>> > > > >> > I have updated the doc; we will discuss it at the meeting.
> > > > >>> > > > >> >
> > > > >>> > > > >> >
> > > > >>> > > > >> > Enrico
> > > > >>> > > > >> >
> > > > >>> > > > >> > 2017-08-24 2:27 GMT+02:00 Sijie Guo <guosijie@gmail.com>:
> > > > >>> > > > >> >
> > > > >>> > > > >> > > Enrico,
> > > > >>> > > > >> > >
> > > > >>> > > > >> > > Thank you so much! It is a great effort putting this up.
> > > > >>> > > > >> > > Overall it looks good. I made some comments; we can discuss
> > > > >>> > > > >> > > them at tomorrow's community meeting.
> > > > >>> > > > >> > >
> > > > >>> > > > >> > > - Sijie
> > > > >>> > > > >> > >
> > > > >>> > > > >> > > On Wed, Aug 23, 2017 at 8:25 AM, Enrico Olivelli <eolivelli@gmail.com>
> > > > >>> > > > >> > > wrote:
> > > > >>> > > > >> > >
> > > > >>> > > > >> > > > Hi all,
> > > > >>> > > > >> > > > I have drafted a first proposal for BP-14 - Relax
> > > > >>> > > > >> > > > Durability
> > > > >>> > > > >> > > >
> > > > >>> > > > >> > > > We are talking about limiting the number of fsyncs to
> > > > >>> > > > >> > > > the journal while preserving the correctness of the LAC
> > > > >>> > > > >> > > > protocol.
> > > > >>> > > > >> > > >
> > > > >>> > > > >> > > > This is the link to the wiki page, but as the issue is
> > > > >>> > > > >> > > > huge we prefer to use Google Documents for sharing
> > > > >>> > > > >> > > > comments
> > > > >>> > > > >> > > > https://cwiki.apache.org/confluence/display/BOOKKEEPER/BP+-+14+Relax+durability
> > > > >>> > > > >> > > >
> > > > >>> > > > >> > > > This is the document
> > > > >>> > > > >> > > > https://docs.google.com/document/d/1JLYO3K3tZ5PJGmyS0YK_-NW8VOUUgUWVBmswCUOG158/edit?usp=sharing
> > > > >>> > > > >> > > >
> > > > >>> > > > >> > > > All comments are welcome
> > > > >>> > > > >> > > >
> > > > >>> > > > >> > > > I have added DL dev list in cc as the discussion is
> > > > >>> > > > >> > > > interesting for both groups
> > > > >>> > > > >> > > >
> > > > >>> > > > >> > > > Enrico Olivelli
> > > > >>> > > > >> > > >
> > > > >>> > > > >> > >
> > > > >>> > > > >> >
> > > > >>> > > > >>
> > > > >>> > > > >>
> > > > >>> > > > >>
> > > > >>> > > > >> --
> > > > >>> > > > >> Jvrao
> > > > >>> > > > >> ---
> > > > >>> > > > >> First they ignore you, then they laugh at you, then they
> > > > >>> > > > >> fight you, then you win. - Mahatma Gandhi
> > > > >>> > > > >>
> > > > >>> > > > > --
> > > > >>> > > > >
> > > > >>> > > > >
> > > > >>> > > > > -- Enrico Olivelli
> > > > >>> > > > >
> > > > >>> > > >
> > > > >>> > >
> > > > >>> > >
> > > > >>> > >
> > > > >>> > > --
> > > > >>> > > Jvrao
> > > > >>> > > ---
> > > > >>> > > First they ignore you, then they laugh at you, then they fight
> > > > >>> > > you, then you win. - Mahatma Gandhi
> > > > >>> > >
> > > > >>> >
> > > > >>>
> > > > >>
> > > > >>
> > > > >
> > > >
> > >
> >
>
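
Enrico's second option quoted above (declare 'allowNoSyncWrites' when the ledger is created, and fail creation if the combination of options is not supported) can be sketched as a validation step. LedgerOptions, validate_at_creation and add_entry_allowed are hypothetical names, not the BookKeeper client API; "striping" is taken here to mean an ensemble size greater than the write quorum size.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LedgerOptions:
    """Hypothetical creation-time options (illustrative names only)."""
    ensemble_size: int
    write_quorum_size: int
    ack_quorum_size: int
    allow_no_sync_writes: bool = False


def validate_at_creation(opts: LedgerOptions) -> None:
    """Raise ValueError at creation time instead of at the first no-sync addEntry."""
    if not (opts.ensemble_size >= opts.write_quorum_size >= opts.ack_quorum_size >= 1):
        raise ValueError("expected ensemble >= write quorum >= ack quorum >= 1")
    striped = opts.ensemble_size > opts.write_quorum_size
    if opts.allow_no_sync_writes and striped:
        # First-implementation restriction discussed in the thread.
        raise ValueError("no-sync writes are not supported on striped ledgers")


def add_entry_allowed(opts: LedgerOptions, sync: bool) -> bool:
    """A no-sync add is legal only if the ledger declared it at creation time."""
    return sync or opts.allow_no_sync_writes


if __name__ == "__main__":
    validate_at_creation(LedgerOptions(3, 3, 2, allow_no_sync_writes=True))  # accepted
    try:
        validate_at_creation(LedgerOptions(5, 3, 2, allow_no_sync_writes=True))
    except ValueError as e:
        print("rejected:", e)
```

Checking at creation time means an admin-supplied ensemble/quorum configuration that cannot support no-sync writes is reported as soon as the ledger is created, rather than later when the system is already running.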

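Sijie's core proposal quoted above (bookies track a LAS, piggyback it on add responses, and the client advances the LAC from it) can be sketched on the client side. This is a minimal Python model under stated assumptions, not BookKeeper code: LacTracker and on_add_response are hypothetical names; a bookie's LAS is assumed to mean that every entry up to and including it is fsynced on that bookie; the ledger is assumed not to be striped, so every bookie stores every entry; and an entry is treated as durable once ack-quorum bookies report a LAS at or beyond it.

```python
from typing import Dict, List


class LacTracker:
    """Client-side sketch: advance the LAC from LAS values piggybacked on add
    responses (hypothetical names; no striping; LAS = highest contiguous synced id)."""

    def __init__(self, bookies: List[str], ack_quorum: int) -> None:
        if not 1 <= ack_quorum <= len(bookies):
            raise ValueError("ack_quorum must be between 1 and the ensemble size")
        self.ack_quorum = ack_quorum
        self.las: Dict[str, int] = {b: -1 for b in bookies}  # -1: nothing synced yet
        self.lac = -1

    def on_add_response(self, bookie: str, las: int) -> int:
        # A bookie's LAS only moves forward; ignore stale or reordered responses.
        self.las[bookie] = max(self.las[bookie], las)
        # Entry e is durable on a quorum once ack_quorum bookies report LAS >= e,
        # i.e. e <= the ack_quorum-th largest LAS across the ensemble.
        quorum_las = sorted(self.las.values(), reverse=True)[self.ack_quorum - 1]
        # The LAC never moves backwards.
        self.lac = max(self.lac, quorum_las)
        return self.lac


if __name__ == "__main__":
    t = LacTracker(["b1", "b2", "b3"], ack_quorum=2)
    print(t.on_add_response("b1", 4))  # -1: only one bookie has reported a LAS
    print(t.on_add_response("b2", 2))  # 2
    print(t.on_add_response("b3", 5))  # 4
    print(t.on_add_response("b2", 1))  # 4: stale LAS ignored, LAC does not regress
```

Taking the ack-quorum-th largest LAS is what keeps rule 1) of the proposed semantic in this model: the LAC only moves past entries that a quorum of bookies has already reported as synced.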
Re: [DISCUSS] BP-14 Relax Durability

Posted by Sijie Guo <gu...@gmail.com>.
Cool.

I would expect this is a big change. It would be good if you can divide it
into smaller tasks, so people can review them easier.

- Sijie

On Tue, Sep 12, 2017 at 1:05 AM, Enrico Olivelli <eo...@gmail.com>
wrote:

> Thank you all !
>
> I will copy the content of the Final draft to the Wiki and mark the
> document as "Accepted"
>
> I will send a PR soon but it will depend on BP-15 New CreateLeader API
>
> I hope we could make it for 4.6
>
>
> Enrico
>
>
> 2017-09-11 18:58 GMT+02:00 Sijie Guo <gu...@gmail.com>:
>
> > Enrico,
> >
> > Feel free to close the thread and mark this BP as accepted, if there is
> no
> > -1.
> >
> > - Sijie
> >
> > On Mon, Sep 11, 2017 at 2:26 AM, Enrico Olivelli <eo...@gmail.com>
> > wrote:
> >
> > > Ping
> > >
> > > 2017-09-07 9:32 GMT+02:00 Enrico Olivelli <eo...@gmail.com>:
> > >
> > > > Hi all,
> > > >
> > > >
> > > > You can find the revised proposal here
> > > > https://cwiki.apache.org/confluence/display/BOOKKEEPER/
> > > > BP-14+Relax+durability
> > > >
> > > > The link to the document open for comments is this:
> > > > https://docs.google.com/document/d/1yNi9t2_
> > > deOOMXDaGzrnmaHTQeB3B3Fnym82DU
> > > > ERH7LM/edit?usp=sharing
> > > >
> > > > Please check it out
> > > > We are going to review this Proposal at the meeting
> > > >
> > > > -- Enrico
> > > >
> > > >
> > > > 2017-08-30 8:56 GMT+02:00 Enrico Olivelli <eo...@gmail.com>:
> > > >
> > > >> Thank you Sijie for summarizing and thanks to the community for
> > helping
> > > >> in this important enhancement to BookKeeper
> > > >>
> > > >> I am convinced that as JV pointed out we need to declare at ledger
> > > >> creation time that the ledger is going to perform no-sync writes.
> > > >>
> > > >> I think we need an explicit declaration currently to make things
> > "clear"
> > > >> to the developer which is using the LedgerHandle API even and ledger
> > > >> creation tyime.
> > > >>
> > > >> The case is that we are going to forbid "striping" ledgers (ensemble
> > > size
> > > >> > quorum size) for no-sync writes in the first implementation:
> > > >> - one option is to  fail at the first no-sync addEntry, but this
> will
> > be
> > > >> really uncomfortable because usually the ack/write/ensemble sizes
> are
> > > >> configured by the admin, and there will be configurations in which
> > > errors
> > > >> will come out only after starting the system.
> > > >> - the second option is to make the developer explicitly enable
> no-sync
> > > >> writes at creation time and fail the creation of the ledger if the
> > > >> requested combination of options if not possible
> > > >>
> > > >> I am not sure that the changes to the bookie internals are a
> > Client-API
> > > >> matter, maybe we can leverage custom metadata (as JV said) in order
> to
> > > make
> > > >> the bookie handle ledgers in a different manner, this way will be
> > always
> > > >> open as custom metadata are already here.
> > > >>
> > > >> JV preferred the ledger-type approach, the dual solution is to
> > introduce
> > > >> a list of "capabilities" or "ledger options".
> > > >> I think that this ability to perform no-syc writes is so important
> > that
> > > >> "custom metadata" is not the good place to declare it, same for
> > "ledger
> > > >> type"
> > > >>
> > > >> So I am proposing to add a boolean 'allowNoSyncWrites" at ledger
> > > creation
> > > >> time, without writing in to ledger metadata on ZK,
> > > >> I think that if further improvements will need ledger metadata
> changes
> > > we
> > > >> will do.
> > > >>
> > > >> I have updated the BP-14 document, I have added an "Open issues"
> > footer
> > > >> with the open points,
> > > >> please add comments and I will correct the document as soon as
> > possible.
> > > >>
> > > >>
> > > >> Enrico
> > > >>
> > > >>
> > > >>
> > > >>
> > > >> 2017-08-30 1:24 GMT+02:00 Sijie Guo <gu...@gmail.com>:
> > > >>
> > > >>> Thank you, Enrico, JV.
> > > >>>
> > > >>> These are great discussions.
> > > >>>
> > > >>> After reading these two proposals, I have a few very high-level
> > > >>> comments, divided into three categories.
> > > >>>
> > > >>>
> > > >>> *API*
> > > >>>
> > > >>> - I think there are no fundamental differences between these two
> > > >>> proposals.
> > > >>> They are trying to achieve similar goals by exposing durability
> > > >>> levels in different ways.
> > > >>> So this will be a discussion on what the API/interface should look
> > > >>> like from the user / admin perspective.
> > > >>> I would suggest focusing on what the API itself should be, putting
> > > >>> the implementation design aside when talking about this.
> > > >>>
> > > >>> *Core*
> > > >>>
> > > >>> - Both proposals need to deal with a core question: what happens to
> > > >>> the LAC, and what semantic BookKeeper provides.
> > > >>> JV did a good summary in his proposal. However, I am not a fan of
> > > >>> maintaining two different semantics, so I am looking for a solution
> > > >>> in which BookKeeper maintains only one semantic. The semantic is
> > > >>> basically:
> > > >>>
> > > >>> 1) LAC only advances when the entries before LAC are committed to
> > > >>> persistent storage
> > > >>> 2) All the entries up to LAC are successfully committed to persistent
> > > >>> storage
> > > >>> 3) Entries up to LAC: all of them must be readable at all times.
> > > >>>
> > > >>> If we maintain such a semantic, there is no need to change the auto
> > > >>> recovery protocol in BookKeeper. All we guarantee is that the entries
> > > >>> are durably persisted.
> > > >>>
> > > >>> In order to maintain such a semantic, I think both JV and I proposed
> > > >>> a similar solution in our proposals. I am trying to finalize one here:
> > > >>>
> > > >>> * bookie maintains a LAS (Last Add Synced) point for each entry.
> > > >>> * LAS can be piggybacked on AddResponses
> > > >>> * Client uses the LAS to advance LAC.
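One way the client side of the three bullets above could look, as a sketch only. `LacTracker` and `onAddResponse` are hypothetical names; it assumes a non-striped ledger (every bookie stores every entry) and that a bookie's LAS means every entry up to that id is fsynced on that bookie. Under those assumptions, LAC can move to the highest LAS that at least ack-quorum bookies have reached, which keeps semantic 1) above.

```java
import java.util.Arrays;

// Hypothetical client-side tracker: each AddResponse is assumed to piggyback the
// bookie's LAS (Last Add Synced), and the ledger is assumed non-striped
// (every bookie in the ensemble stores every entry).
final class LacTracker {
    private final long[] lastAddSynced; // latest LAS reported by each bookie, -1 = none
    private final int ackQuorumSize;
    private long lac = -1;

    LacTracker(int ensembleSize, int ackQuorumSize) {
        this.lastAddSynced = new long[ensembleSize];
        Arrays.fill(lastAddSynced, -1L);
        this.ackQuorumSize = ackQuorumSize;
    }

    /** Called for every AddResponse; returns the (possibly advanced) LAC. */
    long onAddResponse(int bookieIndex, long piggybackedLas) {
        // LAS is monotonic per bookie, so a stale (reordered) response is ignored.
        lastAddSynced[bookieIndex] = Math.max(lastAddSynced[bookieIndex], piggybackedLas);
        long[] sorted = lastAddSynced.clone();
        Arrays.sort(sorted); // ascending order
        // The ackQuorumSize-th highest LAS: at least ackQuorumSize bookies
        // have durably stored every entry up to this id.
        long candidate = sorted[sorted.length - ackQuorumSize];
        lac = Math.max(lac, candidate); // LAC never moves backwards
        return lac;
    }

    long lastAddConfirmed() {
        return lac;
    }
}
```

A no-sync add can still be acknowledged to the application as soon as it is stored, but with this rule LAC (and therefore what readers may see) only follows the synced prefix.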
> > > >>>
> > > >>> If we can agree on the core semantic we are going to provide, the
> > > >>> other things are just logistics.
> > > >>>
> > > >>> *Others*
> > > >>>
> > > >>> - Regarding a separate journal or bypassing the journal, there is no
> > > >>> difference when talking about the core semantic. They are all
> > > >>> non-durable writes (acknowledged before fsyncing).
> > > >>> We can start with the same-journal approach (but just acknowledge
> > > >>> before fsyncing), implement the core, and add other options later on.
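A single-threaded sketch of the bookie side of the same-journal approach, under stated assumptions: one ledger, entries arriving in id order, and an explicit `fsync()` standing in for the journal's group commit. `JournalModel` is a hypothetical name, not a BookKeeper class. It shows that no-sync adds are acknowledged before any fsync, that a durable add forces an fsync covering every earlier entry, and where the LAS to piggyback comes from.

```java
// Hypothetical, single-threaded model of one ledger on a bookie using the
// "same journal, acknowledge before fsync" approach. Real bookies group-commit
// in a separate thread; this only models how LAS is tracked.
final class JournalModel {
    private long lastAppended = -1;  // last entry written to the journal buffer
    private long lastAddSynced = -1; // LAS: last entry known to be fsynced
    private int fsyncCount = 0;

    /** Appends an entry and returns the LAS to piggyback on the AddResponse. */
    long addEntry(long entryId, boolean durable) {
        if (entryId != lastAppended + 1) {
            throw new IllegalArgumentException("this model assumes in-order entries");
        }
        lastAppended = entryId;
        if (durable) {
            fsync(); // a durable add also makes every earlier no-sync entry durable
        }
        // no-sync adds are acknowledged immediately, before any fsync
        return lastAddSynced;
    }

    /** Simulates the journal fsync: everything appended so far becomes durable. */
    void fsync() {
        if (lastAppended > lastAddSynced) {
            lastAddSynced = lastAppended;
            fsyncCount++;
        }
    }

    long lastAddSynced() {
        return lastAddSynced;
    }

    int fsyncCount() {
        return fsyncCount;
    }
}
```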
> > > >>>
> > > >>>
> > > >>> From my point of view, I'd be more interested in providing a single
> > > >>> consistent durability semantic that applications can rely on for both
> > > >>> durable writes and non-durable writes. The other stuff seems to be
> > > >>> more logistics.
> > > >>>
> > > >>>
> > > >>> - Sijie
> > > >>>
> > > >>>
> > > >>> On Mon, Aug 28, 2017 at 11:27 PM, Enrico Olivelli <eolivelli@gmail.com>
> > > >>> wrote:
> > > >>>
> > > >>> > 2017-08-29 8:01 GMT+02:00 Venkateswara Rao Jujjuri <jujjuri@gmail.com>:
> > > >>> >
> > > >>> > > I don't believe I fully followed your second case. But even in
> > > >>> > > this case, is your major concern the additional 'sync' RPC?
> > > >>> > >
> > > >>> >
> > > >>> > Yes. Apart from that I am fine with your proposal too, that is, to
> > > >>> > have a LedgerType which drives durability, and I think we need to
> > > >>> > add per-entry durability options.
> > > >>> >
> > > >>> > I think that, at least for the 'simple' no-sync addEntry, we do not
> > > >>> > need to change many things. I am drafting a prototype, and I will
> > > >>> > share it as soon as we all agree on the roadmap.
> > > >>> >
> > > >>> > The first implementation can cover the first case (no-sync addEntry)
> > > >>> > and change the way the writer advances the LAC in order to support
> > > >>> > 'relaxed durability writes'.
> > > >>> > This change will be compatible with future improvements, and it will
> > > >>> > open the door to big changes on the bookie side, like bypassing the
> > > >>> > journal or leveraging multiple journals.
> > > >>> >
> > > >>> > -- Enrico
> > > >>> >
> > > >>> > > or something else for which the LedgerType proposal won't work?
> > > >>> > >
> > > >>> >
> > > >>> > >
> > > >>> > >
> > > >>> > > On Mon, Aug 28, 2017 at 7:35 AM, Enrico Olivelli <eolivelli@gmail.com>
> > > >>> > > wrote:
> > > >>> > >
> > > >>> > > > I think that having a set of options on the ledger metadata will
> > > >>> > > > be a good enhancement, and I am sure we will do it as soon as it
> > > >>> > > > is needed; maybe we do not need it now.
> > > >>> > > >
> > > >>> > > > Actually, I think we will need to declare this durability level
> > > >>> > > > at the entry level to support some use cases in the BP-14
> > > >>> > > > document. Let me explain two of my use cases for which I need it.
> > > >>> > > >
> > > >>> > > > At a higher level we have two choices:
> > > >>> > > >
> > > >>> > > > A) per-ledger durability options (JV proposal)
> > > >>> > > > all addEntry operations are durable or non-durable, and there is
> > > >>> > > > an explicit 'sync' API (+ forced sync at close)
> > > >>> > > >
> > > >>> > > > B) per-entry durability options (original BP-14 proposal)
> > > >>> > > > every addEntry has its own durable/non-durable option
> > > >>> > > > (sync/no-sync), with the ability to call 'sync' without addEntry
> > > >>> > > > (+ forced sync at close)
> > > >>> > > >
> > > >>> > > > I am speaking about the database WAL case: I am using the ledger
> > > >>> > > > as a segment of the WAL of a database, and I am writing all data
> > > >>> > > > changes in the scope of a 'transaction' with the
> > > >>> > > > relaxed-durability flag; then I am writing the 'transaction
> > > >>> > > > committed' entry with the "strict durability" requirement. This
> > > >>> > > > in fact requires that all previous entries are persisted
> > > >>> > > > durably, so that the transaction will never be lost.
> > > >>> > > >
> > > >>> > > > In this scenario we would in fact need an addEntry + sync API:
> > > >>> > > >
> > > >>> > > > using option A) the WAL will look like:
> > > >>> > > > - open ledger no-sync = true
> > > >>> > > > - addEntry (set foo=bar)  (this will be no-sync)
> > > >>> > > > - addEntry (set foo=bar2) (this will be no-sync)
> > > >>> > > > - addEntry (commit)
> > > >>> > > > - sync
> > > >>> > > >
> > > >>> > > > using option B) the WAL will look like:
> > > >>> > > > - open ledger
> > > >>> > > > - addEntry (set foo=bar), no-sync
> > > >>> > > > - addEntry (set foo=bar2), no-sync
> > > >>> > > > - addEntry (commit), sync
> > > >>> > > >
> > > >>> > > > In case B) we are "saving" one RPC call to every bookie (the
> > > >>> > > > 'sync' one).
> > > >>> > > > The same holds for single data change entries, like updating a
> > > >>> > > > single record in the database, which with BK 4.5 "costs" only a
> > > >>> > > > single RPC to every bookie.
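The RPC difference between the two WAL sequences above can be made concrete with a toy model that only counts the requests a single bookie receives. This is a hypothetical sketch: `RpcCounter`, `addEntry(payload, sync)` and `sync()` are illustrative names, not the BookKeeper client API.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model that only records the requests one bookie receives for the
// WAL transaction described above; the operation names are illustrative only.
final class RpcCounter {
    final List<String> rpcs = new ArrayList<>();

    void addEntry(String payload, boolean sync) {
        rpcs.add("ADD(" + payload + (sync ? ", sync" : ", no-sync") + ")");
    }

    void sync() {
        rpcs.add("SYNC");
    }

    /** Option A: per-ledger no-sync mode, explicit sync after the commit entry. */
    static RpcCounter optionA() {
        RpcCounter bookie = new RpcCounter();
        bookie.addEntry("set foo=bar", false);
        bookie.addEntry("set foo=bar2", false);
        bookie.addEntry("commit", false);
        bookie.sync();
        return bookie;
    }

    /** Option B: per-entry flag, the commit entry itself carries the sync. */
    static RpcCounter optionB() {
        RpcCounter bookie = new RpcCounter();
        bookie.addEntry("set foo=bar", false);
        bookie.addEntry("set foo=bar2", false);
        bookie.addEntry("commit", true);
        return bookie;
    }
}
```

Option A produces four requests per bookie for this transaction and option B three; the gap is the separate 'sync' round trip, which matters most for single-entry transactions.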
> > > >>> > > >
> > > >>> > > > Second case:
> > > >>> > > > I am using BookKeeper to store binary objects, so I am packing
> > > >>> > > > several 'objects' (named sequences of bytes) into a single
> > > >>> > > > ledger, like you do when you write many records to a file in a
> > > >>> > > > streaming fashion and keep track of the offset of the beginning
> > > >>> > > > of every record (LedgerHandleAdv is perfect for this case).
> > > >>> > > > I am not using a single ledger per 'file' because creating many
> > > >>> > > > ledgers very fast kills ZooKeeper. In my systems I have big
> > > >>> > > > bursts of writes which need to be really "fast", so I am writing
> > > >>> > > > multiple 'files' to every single ledger. So close-to-open
> > > >>> > > > consistency at the ledger level is not suitable for this case.
> > > >>> > > > I have to write as fast as possible to this 'ledger-backed'
> > > >>> > > > stream and, as with a 'traditional' filesystem, I am writing
> > > >>> > > > parts of each file and then requiring a 'sync' at the end of
> > > >>> > > > each file.
> > > >>> > > > Using BookKeeper you need to split big 'files' into "little"
> > > >>> > > > parts; you cannot transmit the contents as a "real" stream over
> > > >>> > > > the network.
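A sketch of how the second case maps onto per-entry durability: each 'file' is split into fixed-size chunks written as consecutive entries of one shared ledger, an index records the entry range of each file, and only the last chunk of a file requests a sync. `ObjectPacker` is a hypothetical planner that produces the list of writes; it does not call any BookKeeper API.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical packer for the "many files per ledger" case; not a BookKeeper API.
final class ObjectPacker {
    /** One planned addEntry: entry id, payload length, and whether it requests a sync. */
    static final class Write {
        final long entryId;
        final int length;
        final boolean sync;

        Write(long entryId, int length, boolean sync) {
            this.entryId = entryId;
            this.length = length;
            this.sync = sync;
        }
    }

    final List<Write> writes = new ArrayList<>();
    /** File name -> {first entry id, last entry id} inside the shared ledger. */
    final Map<String, long[]> index = new LinkedHashMap<>();
    private final int chunkSize;
    private long nextEntryId = 0;

    ObjectPacker(int chunkSize) {
        if (chunkSize <= 0) throw new IllegalArgumentException("chunkSize must be positive");
        this.chunkSize = chunkSize;
    }

    /** Splits one file into consecutive entries; only the last chunk asks for a sync. */
    void writeFile(String name, byte[] content) {
        long first = nextEntryId;
        int offset = 0;
        do {
            int len = Math.min(chunkSize, content.length - offset);
            offset += len;
            boolean lastChunk = offset >= content.length;
            writes.add(new Write(nextEntryId++, len, lastChunk));
        } while (offset < content.length);
        index.put(name, new long[] {first, nextEntryId - 1});
    }
}
```

With a 4-byte chunk size, a 10-byte file becomes entries 0 to 2 (only entry 2 synced) and a following 3-byte file becomes entry 3 (synced), so the number of syncs is one per file rather than one per entry.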
> > > >>> > > >
> > > >>> > > > I am not talking about bookie-level implementation details; I
> > > >>> > > > would like to define the high-level API in order to support all
> > > >>> > > > the relevant known use cases and keep room for the future.
> > > >>> > > > At this moment adding a per-entry 'durability option' seems to be
> > > >>> > > > very flexible and simple to implement, and it does not prevent us
> > > >>> > > > from making further improvements, namely skipping the journal.
> > > >>> > > >
> > > >>> > > > Enrico
> > > >>> > > >
> > > >>> > > >
> > > >>> > > >
> > > >>> > > > 2017-08-26 19:55 GMT+02:00 Enrico Olivelli <eolivelli@gmail.com>:
> > > >>> > > >
> > > >>> > > > >
> > > >>> > > > >
> > > >>> > > > > On Sat, Aug 26, 2017, 19:19 Venkateswara Rao Jujjuri <jujjuri@gmail.com>
> > > >>> > > > > wrote:
> > > >>> > > > >
> > > >>> > > > >> Hi all,
> > > >>> > > > >>
> > > >>> > > > >> As promised during Thursday's call, here is my proposal.
> > > >>> > > > >>
> > > >>> > > > >> *NOTE*: The major difference in this proposal compared to
> > > >>> > > > >> Enrico’s
> > > >>> > > > >> <https://docs.google.com/document/d/1JLYO3K3tZ5PJGmyS0YK_-NW8VOUUgUWVBmswCUOG158/edit#heading=h.q2rewiqndr5v>
> > > >>> > > > >> is making the durability a property of the ledger (type) as
> > > >>> > > > >> opposed to addEntry(). The rest of the technical details have a
> > > >>> > > > >> lot of similarities.
> > > >>> > > > >>
> > > >>> > > > >
> > > >>> > > > > Thank you JV. I have just read the doc quickly, and your view is
> > > >>> > > > > certainly broader.
> > > >>> > > > > I will dig into the doc as soon as possible on Monday.
> > > >>> > > > > For me it is OK to have a ledger-wide configuration. I think that
> > > >>> > > > > the most important decision is about the API we will provide, as
> > > >>> > > > > in the future it will be difficult to change it.
> > > >>> > > > >
> > > >>> > > > >
> > > >>> > > > > Cheers
> > > >>> > > > > Enrico
> > > >>> > > > >
> > > >>> > > > >
> > > >>> > > > >
> > > >>> > > > >> https://docs.google.com/document/d/1g1eBcVVCZrTG8YZliZP0LVqvWpq432ODEghrGVQ4d4Q/edit?usp=sharing
> > > >>> > > > >>
> > > >>> > > > >> On Thu, Aug 24, 2017 at 1:14 AM, Enrico Olivelli <eolivelli@gmail.com>
> > > >>> > > > >> wrote:
> > > >>> > > > >>
> > > >>> > > > >> > Thank you all for the comments and for taking a look at the
> > > >>> > > > >> > document so soon.
> > > >>> > > > >> > I have updated the doc; we will discuss it at the meeting.
> > > >>> > > > >> >
> > > >>> > > > >> >
> > > >>> > > > >> > Enrico
> > > >>> > > > >> >
> > > >>> > > > >> > 2017-08-24 2:27 GMT+02:00 Sijie Guo <guosijie@gmail.com>:
> > > >>> > > > >> >
> > > >>> > > > >> > > Enrico,
> > > >>> > > > >> > >
> > > >>> > > > >> > > Thank you so much! It is a great effort to put this up.
> > > >>> > > > >> > > Overall it looks good. I made some comments; we can discuss
> > > >>> > > > >> > > them at tomorrow's community meeting.
> > > >>> > > > >> > >
> > > >>> > > > >> > > - Sijie
> > > >>> > > > >> > >
> > > >>> > > > >> > > On Wed, Aug 23, 2017 at 8:25 AM, Enrico Olivelli <eolivelli@gmail.com>
> > > >>> > > > >> > > wrote:
> > > >>> > > > >> > >
> > > >>> > > > >> > > > Hi all,
> > > >>> > > > >> > > > I have drafted a first proposal for BP-14 - Relax Durability
> > > >>> > > > >> > > >
> > > >>> > > > >> > > > We are talking about limiting the number of fsyncs to the
> > > >>> > > > >> > > > journal while preserving the correctness of the LAC protocol.
> > > >>> > > > >> > > >
> > > >>> > > > >> > > > This is the link to the wiki page, but as the issue is huge
> > > >>> > > > >> > > > we prefer to use Google Documents for sharing comments
> > > >>> > > > >> > > > https://cwiki.apache.org/confluence/display/BOOKKEEPER/BP+-+14+Relax+durability
> > > >>> > > > >> > > >
> > > >>> > > > >> > > > This is the document
> > > >>> > > > >> > > > https://docs.google.com/document/d/1JLYO3K3tZ5PJGmyS0YK_-NW8VOUUgUWVBmswCUOG158/edit?usp=sharing
> > > >>> > > > >> > > >
> > > >>> > > > >> > > > All comments are welcome
> > > >>> > > > >> > > >
> > > >>> > > > >> > > > I have added the DL dev list in cc as the discussion is
> > > >>> > > > >> > > > interesting for both groups
> > > >>> > > > >> > > >
> > > >>> > > > >> > > > Enrico Olivelli
> > > >>> > > > >> > > >
> > > >>> > > > >> > >
> > > >>> > > > >> >
> > > >>> > > > >>
> > > >>> > > > >>
> > > >>> > > > >>
> > > >>> > > > >> --
> > > >>> > > > >> Jvrao
> > > >>> > > > >> ---
> > > >>> > > > >> First they ignore you, then they laugh at you, then they fight
> > > >>> > > > >> you, then you win. - Mahatma Gandhi
> > > >>> > > > >>
> > > >>> > > > > --
> > > >>> > > > >
> > > >>> > > > >
> > > >>> > > > > -- Enrico Olivelli
> > > >>> > > > >
> > > >>> > > >
> > > >>> > >
> > > >>> > >
> > > >>> > >
> > > >>> > > --
> > > >>> > > Jvrao
> > > >>> > >
> > > >>> >
> > > >>>
> > > >>
> > > >>
> > > >
> > >
> >
>

Re: [DISCUSS] BP-14 Relax Durability

Posted by Enrico Olivelli <eo...@gmail.com>.
Thank you all!

I will copy the content of the final draft to the Wiki and mark the
document as "Accepted".

I will send a PR soon, but it will depend on BP-15 New CreateLedger API.

I hope we can make it for 4.6.


Enrico


2017-09-11 18:58 GMT+02:00 Sijie Guo <gu...@gmail.com>:

> Enrico,
>
> Feel free to close the thread and mark this BP as accepted, if there is no
> -1.
>
> - Sijie
>
> On Mon, Sep 11, 2017 at 2:26 AM, Enrico Olivelli <eo...@gmail.com>
> wrote:
>
> > Ping
> >
> > 2017-09-07 9:32 GMT+02:00 Enrico Olivelli <eo...@gmail.com>:
> >
> > > Hi all,
> > >
> > >
> > > You can find the revised proposal here:
> > > https://cwiki.apache.org/confluence/display/BOOKKEEPER/BP-14+Relax+durability
> > >
> > > The link to the document open for comments is this:
> > > https://docs.google.com/document/d/1yNi9t2_deOOMXDaGzrnmaHTQeB3B3Fnym82DUERH7LM/edit?usp=sharing
> > >
> > > Please check it out.
> > > We are going to review this proposal at the meeting.
> > >
> > > -- Enrico
> > >
> > >
> > > 2017-08-30 8:56 GMT+02:00 Enrico Olivelli <eo...@gmail.com>:
> > >
> > >> Thank you Sijie for summarizing, and thanks to the community for
> > >> helping with this important enhancement to BookKeeper.
> > >>
> > >> I am convinced that, as JV pointed out, we need to declare at ledger
> > >> creation time that the ledger is going to perform no-sync writes.
> > >>
> > >> I think we currently need an explicit declaration to make things
> > >> "clear" to the developer who is using the LedgerHandle API, even at
> > >> ledger creation time.
> > >>
> > >> The point is that we are going to forbid "striping" ledgers
> > >> (ensemble size > quorum size) for no-sync writes in the first
> > >> implementation:
> > >> - one option is to fail at the first no-sync addEntry, but this would
> > >> be really uncomfortable, because the ack/write/ensemble sizes are
> > >> usually configured by the admin, and with some configurations errors
> > >> would come out only after the system has started.
> > >> - the second option is to make the developer explicitly enable no-sync
> > >> writes at creation time, and to fail the creation of the ledger if the
> > >> requested combination of options is not possible.
> > >>
> > >> I am not sure that the changes to the bookie internals are a
> > >> Client-API matter; maybe we can leverage custom metadata (as JV said)
> > >> in order to make the bookie handle ledgers in a different manner. This
> > >> way will always remain open, as custom metadata are already there.
> > >>
> > >> JV preferred the ledger-type approach; the dual solution is to
> > >> introduce a list of "capabilities" or "ledger options".
> > >> I think that the ability to perform no-sync writes is so important
> > >> that "custom metadata" is not the right place to declare it, and the
> > >> same goes for "ledger type".
> > >>
> > >> So I am proposing to add a boolean 'allowNoSyncWrites' at ledger
> > >> creation time, without writing it into the ledger metadata on ZK.
> > >> If further improvements need ledger metadata changes, we will make
> > >> them.
> > >>
> > >> I have updated the BP-14 document and added an "Open issues" footer
> > >> with the open points; please add comments and I will correct the
> > >> document as soon as possible.
> > >>
> > >>
> > >> Enrico
> > >>
> > >>
> > >>
> > >>
> > >> 2017-08-30 1:24 GMT+02:00 Sijie Guo <gu...@gmail.com>:
> > >>
> > >>> Thank you, Enrico, JV.
> > >>>
> > >>> These are great discussions.
> > >>>
> > >>> After reading these two proposals, I have a few very high-level
> > >>> comments, divided into three categories.
> > >>>
> > >>>
> > >>> *API*
> > >>>
> > >>> - I think there are no fundamental differences between these two
> > >>> proposals.
> > >>> They are trying to achieve similar goals by exposing durability
> > >>> levels in different ways.
> > >>> So this will be a discussion on what the API/interface should look
> > >>> like from the user / admin perspective.
> > >>> I would suggest focusing on what the API itself should be, putting
> > >>> the implementation design aside when talking about this.
> > >>>
> > >>> *Core*
> > >>>
> > >>> - Both proposals need to deal with a core question: what happens to
> > >>> the LAC, and what semantic BookKeeper provides.
> > >>> JV did a good summary in his proposal. However, I am not a fan of
> > >>> maintaining two different semantics, so I am looking for a solution
> > >>> in which BookKeeper maintains only one semantic. The semantic is
> > >>> basically:
> > >>>
> > >>> 1) LAC only advances when the entries before LAC are committed to
> > >>> persistent storage
> > >>> 2) All the entries up to LAC are successfully committed to persistent
> > >>> storage
> > >>> 3) Entries up to LAC: all of them must be readable at all times.
> > >>>
> > >>> If we maintain such a semantic, there is no need to change the auto
> > >>> recovery protocol in BookKeeper. All we guarantee is that the entries
> > >>> are durably persisted.
> > >>>
> > >>> In order to maintain such a semantic, I think both JV and I proposed
> > >>> a similar solution in our proposals. I am trying to finalize one here:
> > >>>
> > >>> * bookie maintains a LAS (Last Add Synced) point for each entry.
> > >>> * LAS can be piggybacked on AddResponses
> > >>> * Client uses the LAS to advance LAC.
> > >>>
> > >>> If we can agree on the core semantic we are going to provide, the
> > >>> other things are just logistics.
> > >>>
> > >>> *Others*
> > >>>
> > >>> - Regarding a separate journal or bypassing the journal, there is no
> > >>> difference when talking about the core semantic. They are all
> > >>> non-durable writes (acknowledged before fsyncing).
> > >>> We can start with the same-journal approach (but just acknowledge
> > >>> before fsyncing), implement the core, and add other options later on.
> > >>>
> > >>>
> > >>> From my point of view, I'd be more interested in providing a single
> > >>> consistent durability semantic that applications can rely on for both
> > >>> durable writes and non-durable writes. The other stuff seems to be
> > >>> more logistics.
> > >>>
> > >>>
> > >>> - Sijie
> > >>>
> > >>>
> > >>> On Mon, Aug 28, 2017 at 11:27 PM, Enrico Olivelli <eolivelli@gmail.com>
> > >>> wrote:
> > >>>
> > >>> > 2017-08-29 8:01 GMT+02:00 Venkateswara Rao Jujjuri <jujjuri@gmail.com>:
> > >>> >
> > >>> > > I don't believe I fully followed your second case. But even in
> > >>> > > this case, is your major concern the additional 'sync' RPC?
> > >>> > >
> > >>> >
> > >>> > Yes. Apart from that I am fine with your proposal too, that is, to
> > >>> > have a LedgerType which drives durability, and I think we need to
> > >>> > add per-entry durability options.
> > >>> >
> > >>> > I think that, at least for the 'simple' no-sync addEntry, we do not
> > >>> > need to change many things. I am drafting a prototype, and I will
> > >>> > share it as soon as we all agree on the roadmap.
> > >>> >
> > >>> > The first implementation can cover the first case (no-sync addEntry)
> > >>> > and change the way the writer advances the LAC in order to support
> > >>> > 'relaxed durability writes'.
> > >>> > This change will be compatible with future improvements, and it will
> > >>> > open the door to big changes on the bookie side, like bypassing the
> > >>> > journal or leveraging multiple journals.
> > >>> >
> > >>> > -- Enrico
> > >>> >
> > >>> > > or something else for which the LedgerType proposal won't work?
> > >>> > >
> > >>> >
> > >>> > >
> > >>> > >
> > >>> > > On Mon, Aug 28, 2017 at 7:35 AM, Enrico Olivelli <eolivelli@gmail.com>
> > >>> > > wrote:
> > >>> > >
> > >>> > > > I think that having a set of options on the ledger metadata will
> > >>> > > > be a good enhancement, and I am sure we will do it as soon as it
> > >>> > > > is needed; maybe we do not need it now.
> > >>> > > >
> > >>> > > > Actually, I think we will need to declare this durability level
> > >>> > > > at the entry level to support some use cases in the BP-14
> > >>> > > > document. Let me explain two of my use cases for which I need it.
> > >>> > > >
> > >>> > > > At a higher level we have two choices:
> > >>> > > >
> > >>> > > > A) per-ledger durability options (JV proposal)
> > >>> > > > all addEntry operations are durable or non-durable, and there is
> > >>> > > > an explicit 'sync' API (+ forced sync at close)
> > >>> > > >
> > >>> > > > B) per-entry durability options (original BP-14 proposal)
> > >>> > > > every addEntry has its own durable/non-durable option
> > >>> > > > (sync/no-sync), with the ability to call 'sync' without addEntry
> > >>> > > > (+ forced sync at close)
> > >>> > > >
> > >>> > > > I am speaking about the database WAL case: I am using the ledger
> > >>> > > > as a segment of the WAL of a database, and I am writing all data
> > >>> > > > changes in the scope of a 'transaction' with the
> > >>> > > > relaxed-durability flag; then I am writing the 'transaction
> > >>> > > > committed' entry with the "strict durability" requirement. This
> > >>> > > > in fact requires that all previous entries are persisted
> > >>> > > > durably, so that the transaction will never be lost.
> > >>> > > >
> > >>> > > > In this scenario we would in fact need an addEntry + sync API:
> > >>> > > >
> > >>> > > > using option A) the WAL will look like:
> > >>> > > > - open ledger no-sync = true
> > >>> > > > - addEntry (set foo=bar)  (this will be no-sync)
> > >>> > > > - addEntry (set foo=bar2) (this will be no-sync)
> > >>> > > > - addEntry (commit)
> > >>> > > > - sync
> > >>> > > >
> > >>> > > > using option B) the WAL will look like:
> > >>> > > > - open ledger
> > >>> > > > - addEntry (set foo=bar), no-sync
> > >>> > > > - addEntry (set foo=bar2), no-sync
> > >>> > > > - addEntry (commit), sync
> > >>> > > >
> > >>> > > > In case B) we are "saving" one RPC call to every bookie (the
> > >>> > > > 'sync' one).
> > >>> > > > The same holds for single data change entries, like updating a
> > >>> > > > single record in the database, which with BK 4.5 "costs" only a
> > >>> > > > single RPC to every bookie.
> > >>> > > >
> > >>> > > > Second case:
> > >>> > > > I am using BookKeeper to store binary objects, so I am packing
> > >>> > > > several 'objects' (named sequences of bytes) into a single
> > >>> > > > ledger, like you do when you write many records to a file in a
> > >>> > > > streaming fashion and keep track of the offset of the beginning
> > >>> > > > of every record (LedgerHandleAdv is perfect for this case).
> > >>> > > > I am not using a single ledger per 'file' because creating many
> > >>> > > > ledgers very fast kills ZooKeeper. In my systems I have big
> > >>> > > > bursts of writes which need to be really "fast", so I am writing
> > >>> > > > multiple 'files' to every single ledger. So close-to-open
> > >>> > > > consistency at the ledger level is not suitable for this case.
> > >>> > > > I have to write as fast as possible to this 'ledger-backed'
> > >>> > > > stream and, as with a 'traditional' filesystem, I am writing
> > >>> > > > parts of each file and then requiring a 'sync' at the end of
> > >>> > > > each file.
> > >>> > > > Using BookKeeper you need to split big 'files' into "little"
> > >>> > > > parts; you cannot transmit the contents as a "real" stream over
> > >>> > > > the network.
> > >>> > > >
> > >>> > > > I am not talking about bookie-level implementation details; I
> > >>> > > > would like to define the high-level API in order to support all
> > >>> > > > the relevant known use cases and keep room for the future.
> > >>> > > > At this moment adding a per-entry 'durability option' seems to be
> > >>> > > > very flexible and simple to implement, and it does not prevent us
> > >>> > > > from making further improvements, namely skipping the journal.
> > >>> > > >
> > >>> > > > Enrico
> > >>> > > >
> > >>> > > >
> > >>> > > >
> > >>> > > > 2017-08-26 19:55 GMT+02:00 Enrico Olivelli <eolivelli@gmail.com>:
> > >>> > > >
> > >>> > > > >
> > >>> > > > >
> > >>> > > > > On Sat, Aug 26, 2017, 19:19 Venkateswara Rao Jujjuri <jujjuri@gmail.com>
> > >>> > > > > wrote:
> > >>> > > > >
> > >>> > > > >> Hi all,
> > >>> > > > >>
> > >>> > > > >> As promised during Thursday's call, here is my proposal.
> > >>> > > > >>
> > >>> > > > >> *NOTE*: The major difference in this proposal compared to
> > >>> > > > >> Enrico’s
> > >>> > > > >> <https://docs.google.com/document/d/1JLYO3K3tZ5PJGmyS0YK_-NW8VOUUgUWVBmswCUOG158/edit#heading=h.q2rewiqndr5v>
> > >>> > > > >> is making the durability a property of the ledger (type) as
> > >>> > > > >> opposed to addEntry(). The rest of the technical details have a
> > >>> > > > >> lot of similarities.
> > >>> > > > >>
> > >>> > > > >
> > >>> > > > > Thank you JV. I have just read the doc quickly, and your view is
> > >>> > > > > certainly broader.
> > >>> > > > > I will dig into the doc as soon as possible on Monday.
> > >>> > > > > For me it is OK to have a ledger-wide configuration. I think that
> > >>> > > > > the most important decision is about the API we will provide, as
> > >>> > > > > in the future it will be difficult to change it.
> > >>> > > > >
> > >>> > > > >
> > >>> > > > > Cheers
> > >>> > > > > Enrico
> > >>> > > > >
> > >>> > > > >
> > >>> > > > >
> > >>> > > > >> https://docs.google.com/document/d/1g1eBcVVCZrTG8YZliZP0LVqvWpq432ODEghrGVQ4d4Q/edit?usp=sharing
> > >>> > > > >>
> > >>> > > > >> On Thu, Aug 24, 2017 at 1:14 AM, Enrico Olivelli <eolivelli@gmail.com>
> > >>> > > > >> wrote:
> > >>> > > > >>
> > >>> > > > >> > Thank you all for the comments and for taking a look at the
> > >>> > > > >> > document so soon.
> > >>> > > > >> > I have updated the doc; we will discuss it at the meeting.
> > >>> > > > >> >
> > >>> > > > >> >
> > >>> > > > >> > Enrico
> > >>> > > > >> >
> > >>> > > > >> > 2017-08-24 2:27 GMT+02:00 Sijie Guo <gu...@gmail.com>:
> > >>> > > > >> >
> > >>> > > > >> > > Enrico,
> > >>> > > > >> > >
> > >>> > > > >> > > Thank you so much! It is a great effort to put this up.
> > >>> > > > >> > > Overall it looks good. I made some comments; we can discuss
> > >>> > > > >> > > them at tomorrow's community meeting.
> > >>> > > > >> > >
> > >>> > > > >> > > - Sijie
> > >>> > > > >> > >
> > >>> > > > >> > > On Wed, Aug 23, 2017 at 8:25 AM, Enrico Olivelli <eolivelli@gmail.com>
> > >>> > > > >> > > wrote:
> > >>> > > > >> > >
> > >>> > > > >> > > > Hi all,
> > >>> > > > >> > > > I have drafted a first proposal for BP-14 - Relax Durability
> > >>> > > > >> > > >
> > >>> > > > >> > > > We are talking about limiting the number of fsyncs to the
> > >>> > > > >> > > > journal while preserving the correctness of the LAC protocol.
> > >>> > > > >> > > >
> > >>> > > > >> > > > This is the link to the wiki page, but as the issue is huge
> > >>> > > > >> > > > we prefer to use Google Documents for sharing comments
> > >>> > > > >> > > > https://cwiki.apache.org/confluence/display/BOOKKEEPER/BP+-+14+Relax+durability
> > >>> > > > >> > > >
> > >>> > > > >> > > > This is the document
> > >>> > > > >> > > > https://docs.google.com/document/d/1JLYO3K3tZ5PJGmyS0YK_-NW8VOUUgUWVBmswCUOG158/edit?usp=sharing
> > >>> > > > >> > > >
> > >>> > > > >> > > > All comments are welcome
> > >>> > > > >> > > >
> > >>> > > > >> > > > I have added the DL dev list in cc as the discussion is
> > >>> > > > >> > > > interesting for both groups
> > >>> > > > >> > > >
> > >>> > > > >> > > > Enrico Olivelli
> > >>> > > > >> > > >
> > >>> > > > >> > >
> > >>> > > > >> >
> > >>> > > > >>
> > >>> > > > >>
> > >>> > > > >>
> > >>> > > > >> --
> > >>> > > > >> Jvrao
> > >>> > > > >>
> > >>> > > > > --
> > >>> > > > >
> > >>> > > > >
> > >>> > > > > -- Enrico Olivelli
> > >>> > > > >
> > >>> > > >
> > >>> > >
> > >>> > >
> > >>> > >
> > >>> > > --
> > >>> > > Jvrao
> > >>> > >
> > >>> >
> > >>>
> > >>
> > >>
> > >
> >
>

Re: [DISCUSS] BP-14 Relax Durability

Posted by Enrico Olivelli <eo...@gmail.com>.
Thank you all!

I will copy the content of the final draft to the Wiki and mark the
document as "Accepted".

I will send a PR soon, but it will depend on BP-15 New CreateLedger API.

I hope we can make it for 4.6.


Enrico


2017-09-11 18:58 GMT+02:00 Sijie Guo <gu...@gmail.com>:

> Enrico,
>
> Feel free to close the thread and mark this BP as accepted, if there is no
> -1.
>
> - Sijie
>

Re: [DISCUSS] BP-14 Relax Durability

Posted by Sijie Guo <gu...@gmail.com>.
Enrico,

Feel free to close the thread and mark this BP as accepted, if there is no
-1.

- Sijie

On Mon, Sep 11, 2017 at 2:26 AM, Enrico Olivelli <eo...@gmail.com>
wrote:

> Ping
>
> 2017-09-07 9:32 GMT+02:00 Enrico Olivelli <eo...@gmail.com>:
>
> > Hi all,
> >
> >
> > You can find the revised proposal here:
> > https://cwiki.apache.org/confluence/display/BOOKKEEPER/BP-14+Relax+durability
> >
> > The document open for comments is here:
> > https://docs.google.com/document/d/1yNi9t2_deOOMXDaGzrnmaHTQeB3B3Fnym82DUERH7LM/edit?usp=sharing
> >
> > Please check it out. We are going to review this proposal at the meeting.
> >
> > -- Enrico
> >
> >
> > 2017-08-30 8:56 GMT+02:00 Enrico Olivelli <eo...@gmail.com>:
> >
> >> Thank you, Sijie, for summarizing, and thanks to the community for
> >> helping with this important enhancement to BookKeeper.
> >>
> >> I am convinced that, as JV pointed out, we need to declare at ledger
> >> creation time that the ledger is going to perform no-sync writes.
> >>
> >> I think we need an explicit declaration to make things "clear" to the
> >> developer who is using the LedgerHandle API, already at ledger creation
> >> time.
> >>
> >> The reason is that in the first implementation we are going to forbid
> >> "striping" ledgers (ensemble size > quorum size) for no-sync writes:
> >> - one option is to fail at the first no-sync addEntry, but this would be
> >>   really uncomfortable, because the ack/write/ensemble sizes are usually
> >>   configured by the admin, and there would be configurations in which
> >>   errors come out only after starting the system;
> >> - the second option is to make the developer explicitly enable no-sync
> >>   writes at creation time, and fail the creation of the ledger if the
> >>   requested combination of options is not possible.
> >>
> >> I am not sure that the changes to the bookie internals are a client-API
> >> matter; maybe we can leverage custom metadata (as JV said) to make the
> >> bookie handle these ledgers differently. This option will always remain
> >> open, since custom metadata already exist.
> >>
> >> JV preferred the ledger-type approach; the dual solution is to introduce
> >> a list of "capabilities" or "ledger options".
> >> I think that the ability to perform no-sync writes is so important that
> >> "custom metadata" is not the right place to declare it, and the same
> >> goes for "ledger type".
> >>
> >> So I am proposing to add a boolean 'allowNoSyncWrites' flag at ledger
> >> creation time, without writing it into the ledger metadata on ZK.
> >> If further improvements need ledger metadata changes, we will make them.
> >>
> >> I have updated the BP-14 document and added an "Open issues" footer
> >> with the open points; please add comments and I will correct the
> >> document as soon as possible.
> >>
> >>
> >> Enrico
> >>
> >>
> >>
> >>
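The creation-time check Enrico proposes can be sketched as follows. This is a hypothetical Python model, not the BookKeeper client API: `LedgerSpec`, `allow_no_sync_writes`, `InvalidLedgerConfig` and `create_ledger` are illustrative names. The only rule it encodes comes from the message above: in the first implementation, a ledger that asks for no-sync writes is rejected if it would be striped (ensemble size greater than the write quorum size), and the rejection happens when the ledger is created rather than at the first no-sync addEntry.

```python
from dataclasses import dataclass


class InvalidLedgerConfig(ValueError):
    """Raised when a requested combination of ledger options is not supported."""


@dataclass(frozen=True)
class LedgerSpec:
    # Hypothetical creation-time options; the names are illustrative only.
    ensemble_size: int
    write_quorum_size: int
    ack_quorum_size: int
    allow_no_sync_writes: bool = False

    def validate(self) -> None:
        if not (self.ensemble_size >= self.write_quorum_size
                >= self.ack_quorum_size >= 1):
            raise InvalidLedgerConfig("need ensemble >= write quorum >= ack quorum >= 1")
        # Rule from the thread: the first implementation forbids no-sync writes
        # on striped ledgers (ensemble larger than the write quorum).
        if self.allow_no_sync_writes and self.ensemble_size > self.write_quorum_size:
            raise InvalidLedgerConfig(
                "no-sync writes are not supported on striped ledgers "
                f"(ensemble={self.ensemble_size} > writeQuorum={self.write_quorum_size})"
            )


def create_ledger(spec: LedgerSpec) -> LedgerSpec:
    """Fail fast at creation time instead of at the first no-sync addEntry."""
    spec.validate()
    return spec  # a real client would now create the ledger and its metadata
```

Failing inside `create_ledger` means an admin-supplied configuration such as ensemble 5 / write quorum 3 surfaces as an error as soon as the application creates its first ledger, instead of in the middle of a write burst.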
> >> 2017-08-30 1:24 GMT+02:00 Sijie Guo <gu...@gmail.com>:
> >>
> >>> Thank you, Enrico, JV.
> >>>
> >>> These are great discussions.
> >>>
> >>> After reading these two proposals, I have a few very high-level
> >>> comments, divided into three categories.
> >>>
> >>>
> >>> *API*
> >>>
> >>> - I think there is no fundamental difference between these two
> >>> proposals. They try to achieve similar goals by exposing durability
> >>> levels in different ways. So this will be a discussion about what the
> >>> API/interface should look like from the user/admin perspective.
> >>> I would suggest focusing on the API itself and putting the
> >>> implementation design aside when talking about this.
> >>>
> >>> *Core*
> >>>
> >>> - Both proposals need to deal with a core question: what happens to the
> >>> LAC, and what semantics BookKeeper provides.
> >>> JV did a good summary in his proposal. However, I am not a fan of
> >>> maintaining two different semantics, so I am looking for a solution in
> >>> which BookKeeper maintains only one. The semantics are basically:
> >>>
> >>> 1) The LAC only advances when the entries before it are committed to
> >>> persistent storage.
> >>> 2) All the entries up to the LAC are successfully committed to
> >>> persistent storage.
> >>> 3) All the entries up to the LAC must be readable at all times.
> >>>
> >>> If we maintain these semantics, there is no need to change the
> >>> autorecovery protocol in BookKeeper: all we guarantee is that those
> >>> entries are durably persisted.
> >>>
> >>> To maintain these semantics, I think both JV and I proposed a similar
> >>> solution in our respective proposals. I am trying to finalize it here:
> >>>
> >>> * the bookie maintains a LAS (Last Add Synced) point for each entry.
> >>> * LAS can be piggybacked on AddResponses.
> >>> * The client uses the LAS to advance the LAC.
> >>>
> >>> If we can agree on the core semantics we are going to provide, the
> >>> other things are just logistics.
> >>>
> >>> *Others*
> >>>
> >>> - Regarding a separate journal or bypassing the journal, there is no
> >>> difference from the point of view of the core semantics: they are all
> >>> non-durable writes (acknowledged before fsyncing).
> >>> We can start with the same-journal approach (just acknowledging before
> >>> fsyncing), implement the core, and add other options later on.
> >>>
> >>>
> >>> From my point of view, I'd be more interested in providing a single,
> >>> consistent durability semantic that applications can rely on for both
> >>> durable and non-durable writes. The other stuff seems to be more about
> >>> logistics.
> >>>
> >>>
> >>> - Sijie
> >>>
> >>>
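The three LAS bullets above can be sketched as a small single-process model. This is an illustration under stated assumptions, not BookKeeper code: `FakeBookie`, `AddResponse` and `Writer` are hypothetical names; each bookie keeps one LAS for the ledger, acknowledges an add before fsyncing (the same-journal approach from the "Others" section), advances its LAS only when an fsync covers the entries, and piggybacks the LAS on every add response. The ledger is assumed not to be striped, so every bookie stores every entry.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class AddResponse:
    entry_id: int
    last_add_synced: int  # LAS piggybacked on the add acknowledgement


@dataclass
class FakeBookie:
    """Hypothetical bookie: acks before fsync and tracks the LAS of one ledger."""
    written: List[Tuple[int, bytes]] = field(default_factory=list)
    las: int = -1  # -1 means nothing has been synced yet

    def add_entry(self, entry_id: int, payload: bytes, durable: bool) -> AddResponse:
        self.written.append((entry_id, payload))  # appended to the journal buffer
        if durable:
            self.fsync()  # a durable add forces everything written so far to disk
        return AddResponse(entry_id, self.las)

    def fsync(self) -> None:
        # Entries arrive in order, so one fsync covers the whole written prefix.
        if self.written:
            self.las = self.written[-1][0]


class Writer:
    """Hypothetical client: advances the LAC only to entries synced on an ack quorum."""

    def __init__(self, bookies: List[FakeBookie], ack_quorum: int):
        self.bookies = bookies
        self.ack_quorum = ack_quorum
        self.next_entry_id = 0
        self.lac = -1
        self.known_las = [-1] * len(bookies)

    def add_entry(self, payload: bytes, durable: bool = True) -> int:
        entry_id = self.next_entry_id
        self.next_entry_id += 1
        for i, bookie in enumerate(self.bookies):
            resp = bookie.add_entry(entry_id, payload, durable)
            self.known_las[i] = max(self.known_las[i], resp.last_add_synced)
        self._advance_lac()
        return entry_id

    def _advance_lac(self) -> None:
        # The ack_quorum-th highest LAS is synced on at least ack_quorum bookies.
        synced_on_quorum = sorted(self.known_las, reverse=True)[self.ack_quorum - 1]
        self.lac = max(self.lac, synced_on_quorum)
```

Taking the ack_quorum-th highest LAS keeps the LAC from ever pointing past an entry that fewer than ack_quorum bookies have synced, which is what points 1) and 2) of the list above require. No-sync adds are still acknowledged by the bookies immediately; they simply do not move the LAC until a later response (or a durable add) reports them as synced.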
> >>> On Mon, Aug 28, 2017 at 11:27 PM, Enrico Olivelli <eolivelli@gmail.com>
> >>> wrote:
> >>>
> >>> > 2017-08-29 8:01 GMT+02:00 Venkateswara Rao Jujjuri <jujjuri@gmail.com>:
> >>> >
> >>> > > I don't believe I fully followed your second case. But even in this
> >>> > > case, is your major concern the additional 'sync' RPC, or something
> >>> > > else for which the LedgerType proposal won't work?
> >>> > >
> >>> >
> >>> > Yes. Apart from that, I am fine with your proposal too, that is, having
> >>> > a LedgerType which drives durability, and I think we need to add
> >>> > per-entry durability options.
> >>> >
> >>> > I think that, at least for the 'simple' no-sync addEntry, we do not
> >>> > need to change many things. I am drafting a prototype and will share
> >>> > it as soon as we all agree on the roadmap.
> >>> >
> >>> > The first implementation can cover the first cases (no-sync addEntry)
> >>> > and change the way the writer advances the LAC in order to support
> >>> > 'relaxed durability writes'.
> >>> > This change will be compatible with future improvements, and it will
> >>> > open the door to big changes on the bookie side, like bypassing the
> >>> > journal or leveraging multiple journals.
> >>> >
> >>> > -- Enrico
> >>> >
> >>> > >
> >>> > >
> >>> > > On Mon, Aug 28, 2017 at 7:35 AM, Enrico Olivelli <eolivelli@gmail.com>
> >>> > > wrote:
> >>> > >
> >>> > > > I think that having a set of options in the ledger metadata will be
> >>> > > > a good enhancement, and I am sure we will do it as soon as it is
> >>> > > > needed; maybe we do not need it now.
> >>> > > >
> >>> > > > Actually, I think we will need to declare the durability level at
> >>> > > > entry level to support some use cases in the BP-14 document. Let me
> >>> > > > explain two of my use cases for which I need it.
> >>> > > >
> >>> > > > At a higher level we have two choices:
> >>> > > >
> >>> > > > A) per-ledger durability options (JV's proposal):
> >>> > > > all addEntry operations are durable or non-durable, and there is an
> >>> > > > explicit 'sync' API (+ forced sync at close)
> >>> > > >
> >>> > > > B) per-entry durability options (original BP-14 proposal):
> >>> > > > every addEntry has its own durable/non-durable option
> >>> > > > (sync/no-sync), with the ability to call 'sync' without addEntry
> >>> > > > (+ forced sync at close)
> >>> > > >
> >>> > > > I am speaking about the database WAL case: I am using the ledger as
> >>> > > > a segment of the WAL of a database, and I am writing all the data
> >>> > > > changes in the scope of a 'transaction' with the relaxed-durability
> >>> > > > flag; then I am writing the 'transaction committed' entry with the
> >>> > > > "strict durability" requirement. This in fact requires that all
> >>> > > > previous entries are persisted durably, so that the transaction
> >>> > > > will never be lost.
> >>> > > >
> >>> > > > In this scenario we would in fact need an addEntry + sync API:
> >>> > > >
> >>> > > > using option A) the WAL will look like:
> >>> > > > - open ledger no-sync = true
> >>> > > > - addEntry (set foo=bar)  (this will be no-sync)
> >>> > > > - addEntry (set foo=bar2) (this will be no-sync)
> >>> > > > - addEntry (commit)
> >>> > > > - sync
> >>> > > >
> >>> > > > using option B) the WAL will look like:
> >>> > > > - open ledger
> >>> > > > - addEntry (set foo=bar), no-sync
> >>> > > > - addEntry (set foo=bar2), no-sync
> >>> > > > - addEntry (commit), sync
> >>> > > >
> >>> > > > In case B) we are "saving" one RPC call to every bookie (the 'sync'
> >>> > > > one). The same goes for single data change entries, like updating a
> >>> > > > single record in the database: with BK 4.5 this "costs" only a
> >>> > > > single RPC to every bookie.
> >>> > > >
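The RPC counting in the WAL example can be made explicit with a toy model. It is a hypothetical sketch, not the BookKeeper API: `option_a_ops`, `option_b_ops` and `Durability` are illustrative names, and every operation is assumed to cost exactly one round trip to each bookie. Under option A the commit point is a separate 'sync' call; under option B it is a flag on the last entry.

```python
from enum import Enum
from typing import List, Optional, Tuple


class Durability(Enum):
    SYNC = "sync"
    NO_SYNC = "no-sync"


Op = Tuple[str, Optional[str], Optional[Durability]]


def option_a_ops(entries: List[str]) -> List[Op]:
    """Per-ledger durability: a no-sync ledger, then an explicit 'sync' call."""
    ops: List[Op] = [("addEntry", e, None) for e in entries]
    ops.append(("sync", None, None))
    return ops


def option_b_ops(entries: List[str]) -> List[Op]:
    """Per-entry durability: only the last entry (the commit point) asks for SYNC."""
    *body, last = entries  # raises ValueError on an empty list
    ops: List[Op] = [("addEntry", e, Durability.NO_SYNC) for e in body]
    ops.append(("addEntry", last, Durability.SYNC))
    return ops


def rpcs_per_bookie(ops: List[Op]) -> int:
    # Assumption: every operation is one round trip to each bookie.
    return len(ops)
```

For the three-entry WAL above this gives four round trips per bookie under option A and three under option B. For a one-entry update it gives two versus one, and one is the single-RPC cost with BK 4.5 mentioned in the message.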
> >>> > > > Second case:
> >>> > > > I am using BookKeeper to store binary objects, so I am packing
> >>> > > > several 'objects' (named sequences of bytes) into a single ledger,
> >>> > > > like you do when you write many records to a file in a streaming
> >>> > > > fashion and keep track of the offset of the beginning of every
> >>> > > > record (LedgerHandleAdv is perfect for this case).
> >>> > > > I am not using a single ledger per 'file' because creating many
> >>> > > > ledgers very fast kills ZooKeeper. In my systems I have big bursts
> >>> > > > of writes which need to be really "fast", so I am writing multiple
> >>> > > > 'files' to every single ledger. So the close-to-open consistency at
> >>> > > > ledger level is not suitable for this case.
> >>> > > > I have to write as fast as possible to this 'ledger-backed' stream,
> >>> > > > and, as with a 'traditional' filesystem, I am writing parts of each
> >>> > > > file and then requiring a 'sync' at the end of each file.
> >>> > > > Using BookKeeper you need to split big 'files' into "little" parts;
> >>> > > > you cannot transmit the contents as a "real" stream over the network.
> >>> > > >
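A minimal sketch of this second use case: several named 'files' packed into one ledger, each split into small chunks, with only the last chunk of each file written with sync, and an index of entry ranges per file kept by the application. `PackedLedgerWriter`, `FileExtent` and the `append(entry_id, payload, sync)` callback are hypothetical stand-ins; with the real client, `LedgerHandleAdv` lets the application supply entry ids itself, which this sketch only imitates with a local counter. It assumes the per-entry semantics proposed in the thread, where a sync entry also makes every earlier entry durable.

```python
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass(frozen=True)
class FileExtent:
    first_entry: int
    last_entry: int  # inclusive; this entry was written with sync=True


class PackedLedgerWriter:
    """Packs many named 'files' into one ledger (hypothetical helper)."""

    def __init__(self, append: Callable[[int, bytes, bool], None], chunk_size: int = 4):
        self._append = append  # append(entry_id, payload, sync)
        self._chunk_size = chunk_size
        self._next_entry = 0
        self.index: Dict[str, FileExtent] = {}

    def write_file(self, name: str, data: bytes) -> FileExtent:
        chunks = [data[i:i + self._chunk_size]
                  for i in range(0, len(data), self._chunk_size)] or [b""]
        first = self._next_entry
        for n, chunk in enumerate(chunks):
            is_last = n == len(chunks) - 1
            # Intermediate chunks are no-sync; the last one makes the file durable.
            self._append(self._next_entry, chunk, is_last)
            self._next_entry += 1
        extent = FileExtent(first, self._next_entry - 1)
        self.index[name] = extent
        return extent
```

The application pays for one synced entry per file instead of one per chunk, and the per-file index plays the role of the record offsets mentioned above.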
> >>> > > > I am not talking about bookie-level implementation details; I would
> >>> > > > like to define the high-level API in order to support all the
> >>> > > > relevant known use cases and leave room for the future.
> >>> > > > At this moment, adding a per-entry 'durability option' seems very
> >>> > > > flexible and simple to implement, and it does not prevent us from
> >>> > > > making further improvements, such as skipping the journal.
> >>> > > >
> >>> > > > Enrico
> >>> > > >
> >>> > > >
> >>> > > >
> >>> > > > 2017-08-26 19:55 GMT+02:00 Enrico Olivelli <eolivelli@gmail.com>:
> >>> > > >
> >>> > > > >
> >>> > > > >
> >>> > > > > On Sat, Aug 26, 2017, 19:19 Venkateswara Rao Jujjuri <jujjuri@gmail.com>
> >>> > > > > wrote:
> >>> > > > >
> >>> > > > >> Hi all,
> >>> > > > >>
> >>> > > > >> As promised during Thursday's call, here is my proposal.
> >>> > > > >>
> >>> > > > >> *NOTE*: The major difference between this proposal and Enrico's
> >>> > > > >> <https://docs.google.com/document/d/1JLYO3K3tZ5PJGmyS0YK_-NW8VOUUgUWVBmswCUOG158/edit#heading=h.q2rewiqndr5v>
> >>> > > > >> is making durability a property of the ledger (type) as opposed
> >>> > > > >> to addEntry(). The rest of the technical details have a lot of
> >>> > > > >> similarities.
> >>> > > > >>
> >>> > > > >
> >>> > > > > Thank you, JV. I have just quickly read the doc, and your view is
> >>> > > > > certainly broader.
> >>> > > > > I will dig into the doc as soon as possible on Monday.
> >>> > > > > For me it is OK to have a ledger-wide configuration. I think that
> >>> > > > > the most important decision is about the API we will provide, as it
> >>> > > > > will be difficult to change it in the future.
> >>> > > > >
> >>> > > > >
> >>> > > > > Cheers
> >>> > > > > Enrico
> >>> > > > >
> >>> > > > >
> >>> > > > >
> >>> > > > >> https://docs.google.com/document/d/1g1eBcVVCZrTG8YZliZP0LVqvWpq432ODEghrGVQ4d4Q/edit?usp=sharing
> >>> > > > >>
> >>> > > > >> On Thu, Aug 24, 2017 at 1:14 AM, Enrico Olivelli <eolivelli@gmail.com>
> >>> > > > >> wrote:
> >>> > > > >>
> >>> > > > >> > Thank you all for the comments and for taking a look at the
> >>> > > > >> > document so soon.
> >>> > > > >> > I have updated the doc; we will discuss it at the meeting.
> >>> > > > >> >
> >>> > > > >> >
> >>> > > > >> > Enrico
> >>> > > > >> >
> >>> > > > >> > 2017-08-24 2:27 GMT+02:00 Sijie Guo <gu...@gmail.com>:
> >>> > > > >> >
> >>> > > > >> > > Enrico,
> >>> > > > >> > >
> >>> > > > >> > > Thank you so much! It was a great effort to put this up.
> >>> > > > >> > > Overall it looks good. I made some comments; we can discuss
> >>> > > > >> > > them at tomorrow's community meeting.
> >>> > > > >> > >
> >>> > > > >> > > - Sijie
> >>> > > > >> > >
> >>> > > > >> > > On Wed, Aug 23, 2017 at 8:25 AM, Enrico Olivelli <eolivelli@gmail.com>
> >>> > > > >> > > wrote:
> >>> > > > >> > >
> >>> > > > >> > > > Hi all,
> >>> > > > >> > > > I have drafted a first proposal for BP-14 - Relax Durability.
> >>> > > > >> > > >
> >>> > > > >> > > > We are talking about limiting the number of fsyncs to the
> >>> > > > >> > > > journal while preserving the correctness of the LAC protocol.
> >>> > > > >> > > >
> >>> > > > >> > > > This is the link to the wiki page, but as the issue is huge we
> >>> > > > >> > > > prefer to use Google Documents for sharing comments:
> >>> > > > >> > > > https://cwiki.apache.org/confluence/display/BOOKKEEPER/BP+-+14+Relax+durability
> >>> > > > >> > > >
> >>> > > > >> > > > This is the document:
> >>> > > > >> > > > https://docs.google.com/document/d/1JLYO3K3tZ5PJGmyS0YK_-NW8VOUUgUWVBmswCUOG158/edit?usp=sharing
> >>> > > > >> > > >
> >>> > > > >> > > > All comments are welcome.
> >>> > > > >> > > >
> >>> > > > >> > > > I have added the DL dev list in cc, as the discussion is
> >>> > > > >> > > > interesting for both groups.
> >>> > > > >> > > >
> >>> > > > >> > > > Enrico Olivelli
> >>> > > > >> > > >
> >>> > > > >> > >
> >>> > > > >> >
> >>> > > > >>
> >>> > > > >>
> >>> > > > >>
> >>> > > > >> --
> >>> > > > >> Jvrao
> >>> > > > >> ---
> >>> > > > >> First they ignore you, then they laugh at you, then they fight you,
> >>> > > > >> then you win. - Mahatma Gandhi
> >>> > > > >>
> >>> > > > > --
> >>> > > > >
> >>> > > > >
> >>> > > > > -- Enrico Olivelli
> >>> > > > >
> >>> > > >
> >>> > >
> >>> > >
> >>> > >
> >>> > > --
> >>> > > Jvrao
> >>> > > ---
> >>> > > First they ignore you, then they laugh at you, then they fight you,
> >>> > > then you win. - Mahatma Gandhi
> >>> > >
> >>> >
> >>>
> >>
> >>
> >
>

Re: [DISCUSS] BP-14 Relax Durability

Posted by Sijie Guo <gu...@gmail.com>.
Enrico,

Feel free to close the thread and mark this BP as accepted if there is no -1.

- Sijie

On Mon, Sep 11, 2017 at 2:26 AM, Enrico Olivelli <eo...@gmail.com> wrote:

> Ping
>
> 2017-09-07 9:32 GMT+02:00 Enrico Olivelli <eo...@gmail.com>:
>
> > Hi all,
> >
> >
> > You can find the revised proposal here:
> > https://cwiki.apache.org/confluence/display/BOOKKEEPER/BP-14+Relax+durability
> >
> > The link to the document open for comments is this:
> > https://docs.google.com/document/d/1yNi9t2_deOOMXDaGzrnmaHTQeB3B3Fnym82DUERH7LM/edit?usp=sharing
> >
> > Please check it out.
> > We are going to review this proposal at the meeting.
> >
> > -- Enrico
> >
> >
> > 2017-08-30 8:56 GMT+02:00 Enrico Olivelli <eo...@gmail.com>:
> >
> >> Thank you Sijie for summarizing, and thanks to the community for helping
> >> with this important enhancement to BookKeeper.
> >>
> >> I am convinced that, as JV pointed out, we need to declare at ledger
> >> creation time that the ledger is going to perform no-sync writes.
> >>
> >> I think we currently need an explicit declaration to make things "clear"
> >> to the developer who is using the LedgerHandle API, even at ledger
> >> creation time.
> >>
> >> The point is that we are going to forbid "striping" ledgers
> >> (ensemble size > quorum size) for no-sync writes in the first
> >> implementation:
> >> - one option is to fail at the first no-sync addEntry, but this would be
> >> really uncomfortable, because usually the ack/write/ensemble sizes are
> >> configured by the admin, and there will be configurations in which errors
> >> will come out only after starting the system.
> >> - the second option is to make the developer explicitly enable no-sync
> >> writes at creation time and fail the creation of the ledger if the
> >> requested combination of options is not possible.
> >>
> >> I am not sure that the changes to the bookie internals are a Client-API
> >> matter; maybe we can leverage custom metadata (as JV said) in order to make
> >> the bookie handle ledgers in a different manner. This way will always be
> >> open, as custom metadata are already available.
> >>
> >> JV preferred the ledger-type approach; the dual solution is to introduce
> >> a list of "capabilities" or "ledger options".
> >> I think that this ability to perform no-sync writes is so important that
> >> "custom metadata" is not the right place to declare it, and the same goes
> >> for "ledger type".
> >>
> >> So I am proposing to add a boolean 'allowNoSyncWrites' at ledger creation
> >> time, without writing it into the ledger metadata on ZK.
> >> If further improvements need ledger metadata changes, we will make them.
> >>
> >> I have updated the BP-14 document and added an "Open issues" footer
> >> with the open points.
> >> Please add comments and I will correct the document as soon as possible.
> >>
> >>
> >> Enrico
> >>
> >>
> >>
> >>
> >> 2017-08-30 1:24 GMT+02:00 Sijie Guo <gu...@gmail.com>:
> >>
> >>> Thank you, Enrico, JV.
> >>>
> >>> These are great discussions.
> >>>
> >>> After reading these two proposals, I have a few very high-level comments,
> >>> divided into three categories.
> >>>
> >>>
> >>> *API*
> >>>
> >>> - I think there are no fundamental differences between these two
> >>> proposals. They are trying to achieve similar goals by exposing durability
> >>> levels in different ways. So this will be a discussion on what the
> >>> API/interface should look like from the user/admin perspective.
> >>> I would suggest focusing on the API itself, putting the implementation
> >>> design aside when talking about this.
> >>>
> >>> *Core*
> >>>
> >>> - Both proposals need to deal with a core question: what happens to the
> >>> LAC, and what semantics BookKeeper provides.
> >>> JV did a good summary in his proposal. However, I am not a fan of
> >>> maintaining two different semantics, so I am looking for a solution in
> >>> which BookKeeper maintains only one. The semantics are basically:
> >>>
> >>> 1) LAC only advances when the entries before LAC are committed to
> >>> persistent storage
> >>> 2) All the entries up to LAC are successfully committed to persistent
> >>> storage
> >>> 3) All the entries up to LAC must be readable at all times.
> >>>
> >>> If we maintain these semantics, there is no need to change the
> >>> auto-recovery protocol in BookKeeper. All we guarantee is that the entries
> >>> are durably persisted.
> >>>
> >>> To maintain these semantics, I think JV and I proposed similar
> >>> solutions in our respective proposals. I am trying to finalize one here:
> >>>
> >>> * bookie maintains a LAS (Last Add Synced) point for each entry.
> >>> * LAS can be piggybacked on AddResponses
> >>> * Client uses the LAS to advance LAC.
> >>>
> >>> If we can agree on the core semantics we are going to provide, the other
> >>> things are just logistics.
> >>>
> >>> *Others*
> >>>
> >>> - Regarding a separate journal or bypassing the journal, there is no
> >>> difference as far as the core semantics are concerned. They are all
> >>> non-durable writes (acknowledged before fsyncing).
> >>> We can start with the same-journal approach (but just acknowledge before
> >>> fsyncing), implement the core and add other options later on.
> >>>
> >>>
> >>> From my point of view, I'd be more interested in providing a single
> >>> consistent durability semantic that applications can rely on for both
> >>> durable and non-durable writes. The other stuff seems to be more about
> >>> logistics.
> >>>
> >>>
> >>> - Sijie
> >>>
> >>>
> >>> On Mon, Aug 28, 2017 at 11:27 PM, Enrico Olivelli <eolivelli@gmail.com> wrote:
> >>>
> >>> > 2017-08-29 8:01 GMT+02:00 Venkateswara Rao Jujjuri <jujjuri@gmail.com>:
> >>> >
> >>> > > I don't believe I fully followed your second case. But even in this
> >>> > > case, your major concern is about the additional 'sync' RPC?
> >>> > >
> >>> >
> >>> > Yes. Apart from that I am fine with your proposal too, that is, to have a
> >>> > LedgerType which drives durability, and I think we need to add per-entry
> >>> > durability options.
> >>> >
> >>> > I think that, at least for the 'simple' no-sync addEntry, we do not need
> >>> > to change many things. I am drafting a prototype and I will share it as
> >>> > soon as we all agree on the roadmap.
> >>> >
> >>> > The first implementation can cover the first cases (no-sync addEntry)
> >>> > and change the way the writer advances the LAC in order to support
> >>> > 'relaxed durability writes'.
> >>> > This change will be compatible with future improvements, and it will
> >>> > open the door to big changes on the bookie side, like bypassing the
> >>> > journal or leveraging multiple journals.
> >>> >
> >>> > -- Enrico
> >>> >
> >>> > > or something else that the LedgerType proposal won't work?
> >>> > >
> >>> >
> >>> > >
> >>> > >
> >>> > > On Mon, Aug 28, 2017 at 7:35 AM, Enrico Olivelli <eolivelli@gmail.com> wrote:
> >>> > >
> >>> > > > I think that having a set of options in the ledger metadata will be a
> >>> > > > good enhancement, and I am sure we will do it as soon as it is
> >>> > > > needed; maybe we do not need it now.
> >>> > > >
> >>> > > > Actually, I think we will need to declare this durability level at
> >>> > > > the entry level to support some use cases in the BP-14 document. Let
> >>> > > > me explain two of my use cases for which I need it.
> >>> > > >
> >>> > > > At a higher level we have two choices:
> >>> > > >
> >>> > > > A) per-ledger durability options (JV proposal)
> >>> > > > all addEntry operations are durable or non-durable, and there is an
> >>> > > > explicit 'sync' API (+ forced sync at close)
> >>> > > >
> >>> > > > B) per-entry durability options (original BP-14 proposal)
> >>> > > > every addEntry has its own durable/non-durable option (sync/no-sync),
> >>> > > > with the ability to call 'sync' without addEntry (+ forced sync at
> >>> > > > close)
> >>> > > >
> >>> > > > First case, the database WAL: I am using the ledger as a segment of
> >>> > > > the WAL of a database, and I am writing all the data changes in the
> >>> > > > scope of a 'transaction' with the relaxed-durability flag; then I am
> >>> > > > writing the 'transaction committed' entry with a "strict durability"
> >>> > > > requirement. This will in fact require that all previous entries are
> >>> > > > persisted durably, so that the transaction will never be lost.
> >>> > > >
> >>> > > > So in this scenario we would need an addEntry + sync API:
> >>> > > >
> >>> > > > using option A) the WAL will look like:
> >>> > > > - open ledger no-sync = true
> >>> > > > - addEntry (set foo=bar)  (this will be no-sync)
> >>> > > > - addEntry (set foo=bar2) (this will be no-sync)
> >>> > > > - addEntry (commit)
> >>> > > > - sync
> >>> > > >
> >>> > > > using option B) the WAL will look like:
> >>> > > > - open ledger
> >>> > > > - addEntry (set foo=bar), no-sync
> >>> > > > - addEntry (set foo=bar2), no-sync
> >>> > > > - addEntry (commit), sync
> >>> > > >
> >>> > > > In case B) we are "saving" one RPC call to every bookie (the 'sync'
> >>> > > > one). The same holds for single data-change entries, like updating a
> >>> > > > single record in the database: with BK 4.5 this "costs" only a
> >>> > > > single RPC to every bookie.
> >>> > > >
> >>> > > > Second case:
> >>> > > > I am using BookKeeper to store binary objects, so I am packing
> >>> > > > multiple 'objects' (named sequences of bytes) into a single ledger,
> >>> > > > like you do when you write many records to a file in a streaming
> >>> > > > fashion and keep track of the offset of the beginning of every record
> >>> > > > (LedgerHandleAdv is perfect for this case).
> >>> > > > I am not using a single ledger per 'file' because creating many
> >>> > > > ledgers very fast kills ZooKeeper. In my systems I have big bursts of
> >>> > > > writes which need to be really "fast", so I am writing multiple
> >>> > > > 'files' to every single ledger. So close-to-open consistency at the
> >>> > > > ledger level is not suitable for this case.
> >>> > > > I have to write as fast as possible to this 'ledger-backed' stream,
> >>> > > > and, as with a 'traditional' filesystem, I am writing parts of each
> >>> > > > file and then requiring 'sync' at the end of each file.
> >>> > > > Using BookKeeper you need to split big 'files' into "little" parts;
> >>> > > > you cannot transmit the contents as a "real" stream over the network.
> >>> > > >
> >>> > > > I am not talking about bookie-level implementation details; I would
> >>> > > > like to define the high-level API in order to support all the relevant
> >>> > > > known use cases and leave room for the future.
> >>> > > > At this moment, adding a per-entry 'durability option' seems to be
> >>> > > > very flexible and simple to implement, and it does not prevent us from
> >>> > > > making further improvements, namely skipping the journal.
> >>> > > >
> >>> > > > Enrico
> >>> > > >
> >>> > > >
> >>> > > >
> >>> > > > 2017-08-26 19:55 GMT+02:00 Enrico Olivelli <eolivelli@gmail.com>:
> >>> > > >
> >>> > > > >
> >>> > > > >
> >>> > > > > On Sat, Aug 26, 2017, 19:19 Venkateswara Rao Jujjuri <jujjuri@gmail.com> wrote:
> >>> > > > >
> >>> > > > >> Hi all,
> >>> > > > >>
> >>> > > > >> As promised during Thursday's call, here is my proposal.
> >>> > > > >>
> >>> > > > >> *NOTE*: The major difference in this proposal compared to Enrico's
> >>> > > > >> <https://docs.google.com/document/d/1JLYO3K3tZ5PJGmyS0YK_-NW8VOUUgUWVBmswCUOG158/edit#heading=h.q2rewiqndr5v>
> >>> > > > >> is making durability a property of the ledger (type) as opposed to
> >>> > > > >> addEntry(). The rest of the technical details have a lot of similarities.
> >>> > > > >>
> >>> > > > >
> >>> > > > > Thank you JV. I have just quickly read the doc, and your view is
> >>> > > > > certainly broader.
> >>> > > > > I will dig into the doc as soon as possible on Monday.
> >>> > > > > For me it is OK to have a ledger-wide configuration; I think that the
> >>> > > > > most important decision is about the API we will provide, as in the
> >>> > > > > future it will be difficult to change it.
> >>> > > > >
> >>> > > > >
> >>> > > > > Cheers
> >>> > > > > Enrico
> >>> > > > >
> >>> > > > >
> >>> > > > >
> >>> > > > >> https://docs.google.com/document/d/1g1eBcVVCZrTG8YZliZP0LVqvWpq432ODEghrGVQ4d4Q/edit?usp=sharing
> >>> > > > >>
> >>> > > > >> On Thu, Aug 24, 2017 at 1:14 AM, Enrico Olivelli <eolivelli@gmail.com> wrote:
> >>> > > > >>
> >>> > > > >> > Thank you all for the comments and for taking a look at the document
> >>> > > > >> > so soon.
> >>> > > > >> > I have updated the doc; we will discuss the document at the meeting.
> >>> > > > >> >
> >>> > > > >> >
> >>> > > > >> > Enrico
> >>> > > > >> >
> >>> > > > >> > 2017-08-24 2:27 GMT+02:00 Sijie Guo <gu...@gmail.com>:
> >>> > > > >> >
> >>> > > > >> > > Enrico,
> >>> > > > >> > >
> >>> > > > >> > > Thank you so much! It is a great effort to put this up. Overall it
> >>> > > > >> > > looks good. I made some comments; we can discuss them at tomorrow's
> >>> > > > >> > > community meeting.
> >>> > > > >> > >
> >>> > > > >> > > - Sijie
> >>> > > > >> > >
> >>> > > > >> > > On Wed, Aug 23, 2017 at 8:25 AM, Enrico Olivelli <eolivelli@gmail.com> wrote:
> >>> > > > >> > >
> >>> > > > >> > > > Hi all,
> >>> > > > >> > > > I have drafted a first proposal for BP-14 - Relax Durability.
> >>> > > > >> > > >
> >>> > > > >> > > > We are talking about limiting the number of fsyncs to the journal
> >>> > > > >> > > > while preserving the correctness of the LAC protocol.
> >>> > > > >> > > >
> >>> > > > >> > > > This is the link to the wiki page, but as the issue is huge we prefer
> >>> > > > >> > > > to use Google Documents for sharing comments:
> >>> > > > >> > > > https://cwiki.apache.org/confluence/display/BOOKKEEPER/BP+-+14+Relax+durability
> >>> > > > >> > > >
> >>> > > > >> > > > This is the document:
> >>> > > > >> > > > https://docs.google.com/document/d/1JLYO3K3tZ5PJGmyS0YK_-NW8VOUUgUWVBmswCUOG158/edit?usp=sharing
> >>> > > > >> > > >
> >>> > > > >> > > > All comments are welcome.
> >>> > > > >> > > >
> >>> > > > >> > > > I have added the DL dev list in cc, as the discussion is interesting
> >>> > > > >> > > > for both groups.
> >>> > > > >> > > >
> >>> > > > >> > > > Enrico Olivelli
> >>> > > > >> > > >
> >>> > > > >> > >
> >>> > > > >> >
> >>> > > > >>
> >>> > > > >>
> >>> > > > >>
> >>> > > > >> --
> >>> > > > >> Jvrao
> >>> > > > >> ---
> >>> > > > >> First they ignore you, then they laugh at you, then they fight you,
> >>> > > > >> then you win. - Mahatma Gandhi
> >>> > > > >>
> >>> > > > > --
> >>> > > > >
> >>> > > > >
> >>> > > > > -- Enrico Olivelli
> >>> > > > >
> >>> > > >
> >>> > >
> >>> > >
> >>> > >
> >>> > > --
> >>> > > Jvrao
> >>> > > ---
> >>> > > First they ignore you, then they laugh at you, then they fight you,
> >>> > > then you win. - Mahatma Gandhi
> >>> > >
> >>> >
> >>>
> >>
> >>
> >
>

Re: [DISCUSS] BP-14 Relax Durability

Posted by Enrico Olivelli <eo...@gmail.com>.
Ping

2017-09-07 9:32 GMT+02:00 Enrico Olivelli <eo...@gmail.com>:

> Hi all,
>
>
> You can find the revised proposal here:
> https://cwiki.apache.org/confluence/display/BOOKKEEPER/BP-14+Relax+durability
>
> The link to the document open for comments is this:
> https://docs.google.com/document/d/1yNi9t2_deOOMXDaGzrnmaHTQeB3B3Fnym82DUERH7LM/edit?usp=sharing
>
> Please check it out.
> We are going to review this proposal at the meeting.
>
> -- Enrico
>
>
> 2017-08-30 8:56 GMT+02:00 Enrico Olivelli <eo...@gmail.com>:
>
>> Thank you Sijie for summarizing, and thanks to the community for helping
>> with this important enhancement to BookKeeper.
>>
>> I am convinced that, as JV pointed out, we need to declare at ledger
>> creation time that the ledger is going to perform no-sync writes.
>>
>> I think we currently need an explicit declaration to make things "clear"
>> to the developer who is using the LedgerHandle API, even at ledger
>> creation time.
>>
>> The point is that we are going to forbid "striping" ledgers
>> (ensemble size > quorum size) for no-sync writes in the first
>> implementation:
>> - one option is to fail at the first no-sync addEntry, but this would be
>> really uncomfortable, because usually the ack/write/ensemble sizes are
>> configured by the admin, and there will be configurations in which errors
>> will come out only after starting the system.
>> - the second option is to make the developer explicitly enable no-sync
>> writes at creation time and fail the creation of the ledger if the
>> requested combination of options is not possible.
>>
>> I am not sure that the changes to the bookie internals are a Client-API
>> matter; maybe we can leverage custom metadata (as JV said) in order to make
>> the bookie handle ledgers in a different manner. This way will always be
>> open, as custom metadata are already available.
>>
>> JV preferred the ledger-type approach; the dual solution is to introduce
>> a list of "capabilities" or "ledger options".
>> I think that this ability to perform no-sync writes is so important that
>> "custom metadata" is not the right place to declare it, and the same goes
>> for "ledger type".
>>
>> So I am proposing to add a boolean 'allowNoSyncWrites' at ledger creation
>> time, without writing it into the ledger metadata on ZK.
>> If further improvements need ledger metadata changes, we will make them.
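A minimal sketch of the second option above, in Python: the writer opts in to no-sync writes when the ledger is created, and creation fails immediately on a striped configuration instead of failing at the first no-sync addEntry. `LedgerConfig`, `allow_no_sync_writes` and `validate_ledger_config` are hypothetical names, not the BookKeeper client API, and "quorum size" is read here as the write quorum size (an assumption).

```python
from dataclasses import dataclass


class InvalidLedgerConfig(ValueError):
    """Raised when a requested combination of ledger options is not supported."""


@dataclass(frozen=True)
class LedgerConfig:
    # Hypothetical creation-time options; not the real BookKeeper client API.
    ensemble_size: int
    write_quorum_size: int
    ack_quorum_size: int
    allow_no_sync_writes: bool = False


def validate_ledger_config(cfg: LedgerConfig) -> LedgerConfig:
    """Reject unsupported option combinations when the ledger is created."""
    if not (cfg.ensemble_size >= cfg.write_quorum_size >= cfg.ack_quorum_size >= 1):
        raise InvalidLedgerConfig("need ensemble >= write quorum >= ack quorum >= 1")
    # First implementation: no-sync writes are refused on striped ledgers,
    # i.e. when the ensemble is larger than the write quorum.
    if cfg.allow_no_sync_writes and cfg.ensemble_size > cfg.write_quorum_size:
        raise InvalidLedgerConfig(
            "no-sync writes require ensemble size == write quorum size, got "
            f"{cfg.ensemble_size} > {cfg.write_quorum_size}"
        )
    return cfg
```

For example, `LedgerConfig(3, 3, 2, allow_no_sync_writes=True)` is accepted, while `LedgerConfig(5, 3, 2, allow_no_sync_writes=True)` raises at creation time, so an admin-supplied ensemble setting that is incompatible with no-sync writes surfaces before the system starts serving writes.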
>>
>> I have updated the BP-14 document and added an "Open issues" footer
>> with the open points.
>> Please add comments and I will correct the document as soon as possible.
>>
>>
>> Enrico
>>
>>
>>
>>
>> 2017-08-30 1:24 GMT+02:00 Sijie Guo <gu...@gmail.com>:
>>
>>> Thank you, Enrico, JV.
>>>
>>> These are great discussions.
>>>
>>> After reading these two proposals, I have a few very high-level comments,
>>> divided into three categories.
>>>
>>>
>>> *API*
>>>
>>> - I think there are no fundamental differences between these two
>>> proposals. They are trying to achieve similar goals by exposing durability
>>> levels in different ways. So this will be a discussion on what the
>>> API/interface should look like from the user/admin perspective.
>>> I would suggest focusing on the API itself, putting the implementation
>>> design aside when talking about this.
>>>
>>> *Core*
>>>
>>> - Both proposals need to deal with a core question: what happens to the
>>> LAC, and what semantics BookKeeper provides.
>>> JV did a good summary in his proposal. However, I am not a fan of
>>> maintaining two different semantics, so I am looking for a solution in
>>> which BookKeeper maintains only one. The semantics are basically:
>>>
>>> 1) LAC only advances when the entries before LAC are committed to
>>> persistent storage
>>> 2) All the entries up to LAC are successfully committed to persistent
>>> storage
>>> 3) All the entries up to LAC must be readable at all times.
>>>
>>> If we maintain these semantics, there is no need to change the
>>> auto-recovery protocol in BookKeeper. All we guarantee is that the entries
>>> are durably persisted.
>>>
>>> To maintain these semantics, I think JV and I proposed similar
>>> solutions in our respective proposals. I am trying to finalize one here:
>>>
>>> * bookie maintains a LAS (Last Add Synced) point for each entry.
>>> * LAS can be piggybacked on AddResponses
>>> * Client uses the LAS to advance LAC.
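The LAS rule above can be sketched as follows, under two assumptions: no striping (every bookie in the ensemble stores every entry, as in the first implementation discussed in this thread), and LAS read as a per-ledger watermark on each bookie meaning "every entry up to here is fsynced on this bookie". The client can then set LAC to the ack-quorum-th highest LAS it has seen, because that entry and everything before it is durable on at least ack-quorum bookies. `lac_from_las`, `LacTracker` and `on_add_response` are hypothetical names.

```python
from __future__ import annotations


def lac_from_las(las_by_bookie: dict[str, int], ack_quorum: int) -> int:
    """Highest entry id reported as synced by at least `ack_quorum` bookies.

    Returns -1 (nothing confirmed yet) if fewer than `ack_quorum` bookies
    are known.
    """
    if ack_quorum < 1:
        raise ValueError("ack_quorum must be >= 1")
    synced = sorted(las_by_bookie.values(), reverse=True)
    if len(synced) < ack_quorum:
        return -1
    # The ack_quorum-th highest LAS is durable on ack_quorum bookies (or more).
    return synced[ack_quorum - 1]


class LacTracker:
    """Client-side LAC that only advances on LAS piggybacked on AddResponses."""

    def __init__(self, bookies: list[str], ack_quorum: int) -> None:
        self.las = {b: -1 for b in bookies}
        self.ack_quorum = ack_quorum
        self.lac = -1

    def on_add_response(self, bookie: str, las: int) -> int:
        # A bookie's LAS never moves backwards; ignore stale responses.
        self.las[bookie] = max(self.las[bookie], las)
        # LAC is monotonic as well.
        self.lac = max(self.lac, lac_from_las(self.las, self.ack_quorum))
        return self.lac
```

With striping, a bookie's LAS would only cover the entries it actually stores, so the client would have to reason per entry about which bookies hold it; that is one way to see why no-sync writes are restricted to non-striped ledgers at first.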
>>>
>>> If we can agree on the core semantics we are going to provide, the other
>>> things are just logistics.
>>>
>>> *Others*
>>>
>>> - Regarding a separate journal or bypassing the journal, there is no
>>> difference as far as the core semantics are concerned. They are all
>>> non-durable writes (acknowledged before fsyncing).
>>> We can start with the same-journal approach (but just acknowledge before
>>> fsyncing), implement the core and add other options later on.
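A minimal sketch of the same-journal approach on the bookie side, assuming the entries of a ledger arrive in order: a no-sync add is acknowledged as soon as it is appended to the journal, LAS advances only when the journal is forced, and every response carries the current LAS. `BookieLedgerState` and its methods are hypothetical, and the fsync is simulated by a counter rather than a real call such as `os.fsync`.

```python
class BookieLedgerState:
    """Hypothetical per-ledger bookie state for the same-journal approach."""

    def __init__(self) -> None:
        self.last_add_written = -1  # appended to the journal, maybe not durable
        self.last_add_synced = -1   # LAS: every entry up to here is durable
        self.fsync_count = 0

    def add_entry(self, entry_id: int, sync: bool) -> int:
        """Append an entry; return the LAS to piggyback on the AddResponse."""
        if entry_id != self.last_add_written + 1:
            raise ValueError(f"expected entry {self.last_add_written + 1}, got {entry_id}")
        self.last_add_written = entry_id
        if sync:
            self._force_journal()
        return self.last_add_synced

    def sync(self) -> int:
        """Explicit 'sync' request: make everything written so far durable."""
        self._force_journal()
        return self.last_add_synced

    def _force_journal(self) -> None:
        # Stand-in for forcing the journal file: one fsync covers all prior appends.
        self.fsync_count += 1
        self.last_add_synced = self.last_add_written
```

In this model a durable add costs one fsync, but that fsync also makes every earlier no-sync entry durable, which is what lets a single strict entry "commit" the relaxed ones written before it.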
>>>
>>>
>>> From my point of view, I'd be more interested in providing a single
>>> consistent durability semantic that applications can rely on for both
>>> durable and non-durable writes. The other stuff seems to be more about
>>> logistics.
>>>
>>>
>>> - Sijie
>>>
>>>
>>> On Mon, Aug 28, 2017 at 11:27 PM, Enrico Olivelli <eo...@gmail.com> wrote:
>>>
>>> > 2017-08-29 8:01 GMT+02:00 Venkateswara Rao Jujjuri <jujjuri@gmail.com>:
>>> >
>>> > > I don't believe I fully followed your second case. But even in this
>>> > > case, your major concern is about the additional 'sync' RPC?
>>> > >
>>> >
>>> > Yes. Apart from that I am fine with your proposal too, that is, to have a
>>> > LedgerType which drives durability, and I think we need to add per-entry
>>> > durability options.
>>> >
>>> > I think that, at least for the 'simple' no-sync addEntry, we do not need
>>> > to change many things. I am drafting a prototype and I will share it as
>>> > soon as we all agree on the roadmap.
>>> >
>>> > The first implementation can cover the first cases (no-sync addEntry)
>>> > and change the way the writer advances the LAC in order to support
>>> > 'relaxed durability writes'.
>>> > This change will be compatible with future improvements, and it will
>>> > open the door to big changes on the bookie side, like bypassing the
>>> > journal or leveraging multiple journals.
>>> >
>>> > -- Enrico
>>> >
>>> > > or something else that the LedgerType proposal won't work?
>>> > >
>>> >
>>> > >
>>> > >
>>> > > On Mon, Aug 28, 2017 at 7:35 AM, Enrico Olivelli <eolivelli@gmail.com> wrote:
>>> > >
>>> > > > I think that having a set of options in the ledger metadata will be a
>>> > > > good enhancement, and I am sure we will do it as soon as it is
>>> > > > needed; maybe we do not need it now.
>>> > > >
>>> > > > Actually, I think we will need to declare this durability level at
>>> > > > the entry level to support some use cases in the BP-14 document. Let
>>> > > > me explain two of my use cases for which I need it.
>>> > > >
>>> > > > At a higher level we have two choices:
>>> > > >
>>> > > > A) per-ledger durability options (JV proposal)
>>> > > > all addEntry operations are durable or non-durable, and there is an
>>> > > > explicit 'sync' API (+ forced sync at close)
>>> > > >
>>> > > > B) per-entry durability options (original BP-14 proposal)
>>> > > > every addEntry has its own durable/non-durable option (sync/no-sync),
>>> > > > with the ability to call 'sync' without addEntry (+ forced sync at
>>> > > > close)
>>> > > >
>>> > > > First case, the database WAL: I am using the ledger as a segment of
>>> > > > the WAL of a database, and I am writing all the data changes in the
>>> > > > scope of a 'transaction' with the relaxed-durability flag; then I am
>>> > > > writing the 'transaction committed' entry with a "strict durability"
>>> > > > requirement. This will in fact require that all previous entries are
>>> > > > persisted durably, so that the transaction will never be lost.
>>> > > >
>>> > > > So in this scenario we would need an addEntry + sync API:
>>> > > >
>>> > > > using option A) the WAL will look like:
>>> > > > - open ledger no-sync = true
>>> > > > - addEntry (set foo=bar)  (this will be no-sync)
>>> > > > - addEntry (set foo=bar2) (this will be no-sync)
>>> > > > - addEntry (commit)
>>> > > > - sync
>>> > > >
>>> > > > using option B) the WAL will look like:
>>> > > > - open ledger
>>> > > > - addEntry (set foo=bar), no-sync
>>> > > > - addEntry (set foo=bar2), no-sync
>>> > > > - addEntry (commit), sync
>>> > > >
>>> > > > In case B) we are "saving" one RPC call to every bookie (the 'sync'
>>> > > > one). The same holds for single data-change entries, like updating a
>>> > > > single record in the database: with BK 4.5 this "costs" only a
>>> > > > single RPC to every bookie.
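The RPC arithmetic of the two WAL flows above can be checked with a toy client that counts the write requests each bookie receives, assuming no striping so that every bookie gets every request. `CountingLedger`, `add_entry(sync=...)` and `sync()` are hypothetical stand-ins for the two proposed API shapes, not existing BookKeeper methods.

```python
class CountingLedger:
    """Toy ledger that counts the requests sent to each bookie."""

    def __init__(self) -> None:
        self.requests_per_bookie = 0

    def add_entry(self, data: bytes, sync: bool = False) -> None:
        # One add request per bookie; `sync` only changes what the bookie does.
        self.requests_per_bookie += 1

    def sync(self) -> None:
        # Option A's explicit sync is an extra request to every bookie.
        self.requests_per_bookie += 1


def wal_option_a(ledger: CountingLedger, changes: list) -> int:
    """Per-ledger durability: no-sync adds, then an explicit sync."""
    for change in changes:
        ledger.add_entry(change)
    ledger.add_entry(b"commit")
    ledger.sync()
    return ledger.requests_per_bookie


def wal_option_b(ledger: CountingLedger, changes: list) -> int:
    """Per-entry durability: only the commit entry asks for sync."""
    for change in changes:
        ledger.add_entry(change, sync=False)
    ledger.add_entry(b"commit", sync=True)
    return ledger.requests_per_bookie
```

For the two-change transaction above, option A sends 4 requests per bookie and option B sends 3. A single-record update, modeled here as a commit entry with no separate data entries, costs 2 requests with option A and 1 with option B, matching the single RPC that a durable addEntry costs with BK 4.5.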
>>> > > >
>>> > > > Second case:
>>> > > > I am using BookKeeper to store binary objects, so I am packing
>>> > > > multiple 'objects' (named sequences of bytes) into a single ledger,
>>> > > > like you do when you write many records to a file in a streaming
>>> > > > fashion and keep track of the offset of the beginning of every record
>>> > > > (LedgerHandleAdv is perfect for this case).
>>> > > > I am not using a single ledger per 'file' because creating many
>>> > > > ledgers very fast kills ZooKeeper. In my systems I have big bursts of
>>> > > > writes which need to be really "fast", so I am writing multiple
>>> > > > 'files' to every single ledger. So close-to-open consistency at the
>>> > > > ledger level is not suitable for this case.
>>> > > > I have to write as fast as possible to this 'ledger-backed' stream,
>>> > > > and, as with a 'traditional' filesystem, I am writing parts of each
>>> > > > file and then requiring 'sync' at the end of each file.
>>> > > > Using BookKeeper you need to split big 'files' into "little" parts;
>>> > > > you cannot transmit the contents as a "real" stream over the network.
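A sketch of the second case: several named objects are packed into one ledger, each split into fixed-size entries, with an index from object name to its first and last entry id, and a sync requested only on the last entry of each object. `FakeLedger`, `write_object` and `read_object` are hypothetical. A real writer would go through the BookKeeper client (for example LedgerHandleAdv, which lets the caller choose entry ids), whereas this in-memory stand-in simply assigns sequential ids.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class FakeLedger:
    """In-memory stand-in for a ledger; entry ids are list positions."""
    entries: list[bytes] = field(default_factory=list)
    syncs: int = 0

    def add_entry(self, data: bytes, sync: bool = False) -> int:
        self.entries.append(data)
        if sync:
            self.syncs += 1
        return len(self.entries) - 1


def write_object(ledger: FakeLedger, index: dict, name: str,
                 payload: bytes, chunk_size: int) -> None:
    """Split `payload` into entries; sync only on the object's last entry."""
    chunks = [payload[i:i + chunk_size] for i in range(0, len(payload), chunk_size)]
    chunks = chunks or [b""]  # an empty object still gets one entry
    ids = [ledger.add_entry(c, sync=(i == len(chunks) - 1))
           for i, c in enumerate(chunks)]
    index[name] = (ids[0], ids[-1])


def read_object(ledger: FakeLedger, index: dict, name: str) -> bytes:
    """Reassemble an object from its contiguous range of entries."""
    first, last = index[name]
    return b"".join(ledger.entries[first:last + 1])
```

Writing an 11-byte object with 4-byte chunks followed by a 3-byte object produces entries 0 to 2 for the first and entry 3 for the second, with exactly one sync per object.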
>>> > > >
>>> > > > I am not talking about bookie-level implementation details; I would
>>> > > > like to define the high-level API in order to support all the relevant
>>> > > > known use cases and leave room for the future.
>>> > > > At this moment, adding a per-entry 'durability option' seems to be
>>> > > > very flexible and simple to implement, and it does not prevent us from
>>> > > > making further improvements, namely skipping the journal.
>>> > > >
>>> > > > Enrico
>>> > > >
>>> > > >
>>> > > >
>>> > > > 2017-08-26 19:55 GMT+02:00 Enrico Olivelli <eo...@gmail.com>:
>>> > > >
>>> > > > >
>>> > > > >
>>> > > > > On Sat, Aug 26, 2017, 19:19 Venkateswara Rao Jujjuri <jujjuri@gmail.com> wrote:
>>> > > > >
>>> > > > >> Hi all,
>>> > > > >>
>>> > > > >> As promised during Thursday's call, here is my proposal.
>>> > > > >>
>>> > > > >> *NOTE*: The major difference in this proposal compared to Enrico's
>>> > > > >> <https://docs.google.com/document/d/1JLYO3K3tZ5PJGmyS0YK_-NW8VOUUgUWVBmswCUOG158/edit#heading=h.q2rewiqndr5v>
>>> > > > >> is making durability a property of the ledger (type) as opposed to
>>> > > > >> addEntry(). The rest of the technical details have a lot of similarities.
>>> > > > >>
>>> > > > >
>>> > > > > Thank you JV. I have just quickly read the doc, and your view is
>>> > > > > certainly broader.
>>> > > > > I will dig into the doc as soon as possible on Monday.
>>> > > > > For me it is OK to have a ledger-wide configuration; I think that the
>>> > > > > most important decision is about the API we will provide, as in the
>>> > > > > future it will be difficult to change it.
>>> > > > >
>>> > > > >
>>> > > > > Cheers
>>> > > > > Enrico
>>> > > > >
>>> > > > >
>>> > > > >
>>> > > > >> https://docs.google.com/document/d/1g1eBcVVCZrTG8YZliZP0LVqvWpq432ODEghrGVQ4d4Q/edit?usp=sharing
>>> > > > >>
>>> > > > >> On Thu, Aug 24, 2017 at 1:14 AM, Enrico Olivelli <eolivelli@gmail.com> wrote:
>>> > > > >>
>>> > > > >> > Thank you all for the comments and for taking a look at the document
>>> > > > >> > so soon.
>>> > > > >> > I have updated the doc; we will discuss the document at the meeting.
>>> > > > >> >
>>> > > > >> >
>>> > > > >> > Enrico
>>> > > > >> >
>>> > > > >> > 2017-08-24 2:27 GMT+02:00 Sijie Guo <gu...@gmail.com>:
>>> > > > >> >
>>> > > > >> > > Enrico,
>>> > > > >> > >
>>> > > > >> > > Thank you so much! It is a great effort to put this up. Overall it
>>> > > > >> > > looks good. I made some comments; we can discuss them at tomorrow's
>>> > > > >> > > community meeting.
>>> > > > >> > >
>>> > > > >> > > - Sijie
>>> > > > >> > >
>>> > > > >> > > On Wed, Aug 23, 2017 at 8:25 AM, Enrico Olivelli <eolivelli@gmail.com> wrote:
>>> > > > >> > >
>>> > > > >> > > > Hi all,
>>> > > > >> > > > I have drafted a first proposal for BP-14 - Relax Durability.
>>> > > > >> > > >
>>> > > > >> > > > We are talking about limiting the number of fsyncs to the journal
>>> > > > >> > > > while preserving the correctness of the LAC protocol.
>>> > > > >> > > >
>>> > > > >> > > > This is the link to the wiki page, but as the issue is huge we prefer
>>> > > > >> > > > to use Google Documents for sharing comments:
>>> > > > >> > > > https://cwiki.apache.org/confluence/display/BOOKKEEPER/BP+-+14+Relax+durability
>>> > > > >> > > >
>>> > > > >> > > > This is the document:
>>> > > > >> > > > https://docs.google.com/document/d/1JLYO3K3tZ5PJGmyS0YK_-NW8VOUUgUWVBmswCUOG158/edit?usp=sharing
>>> > > > >> > > >
>>> > > > >> > > > All comments are welcome.
>>> > > > >> > > >
>>> > > > >> > > > I have added the DL dev list in cc, as the discussion is interesting
>>> > > > >> > > > for both groups.
>>> > > > >> > > >
>>> > > > >> > > > Enrico Olivelli
>>> > > > >> > > >
>>> > > > >> > >
>>> > > > >> >
>>> > > > >>
>>> > > > >>
>>> > > > >>
>>> > > > >> --
>>> > > > >> Jvrao
>>> > > > >> ---
>>> > > > >> First they ignore you, then they laugh at you, then they fight you,
>>> > > > >> then you win. - Mahatma Gandhi
>>> > > > >>
>>> > > > > --
>>> > > > >
>>> > > > >
>>> > > > > -- Enrico Olivelli
>>> > > > >
>>> > > >
>>> > >
>>> > >
>>> > >
>>> > > --
>>> > > Jvrao
>>> > > ---
>>> > > First they ignore you, then they laugh at you, then they fight you,
>>> > > then you win. - Mahatma Gandhi
>>> > >
>>> >
>>>
>>
>>
>

Re: [DISCUSS] BP-14 Relax Durability

Posted by Enrico Olivelli <eo...@gmail.com>.
Ping

2017-09-07 9:32 GMT+02:00 Enrico Olivelli <eo...@gmail.com>:

> Hi all,
>
>
> You can find the revised proposal here:
> https://cwiki.apache.org/confluence/display/BOOKKEEPER/BP-14+Relax+durability
>
> The link to the document open for comments is this:
> https://docs.google.com/document/d/1yNi9t2_deOOMXDaGzrnmaHTQeB3B3Fnym82DUERH7LM/edit?usp=sharing
>
> Please check it out.
> We are going to review this proposal at the meeting.
>
> -- Enrico
>
>
> 2017-08-30 8:56 GMT+02:00 Enrico Olivelli <eo...@gmail.com>:
>
>> Thank you Sijie for summarizing and thanks to the community for helping
>> in this important enhancement to BookKeeper
>>
>> I am convinced that, as JV pointed out, we need to declare at ledger
>> creation time that the ledger is going to perform no-sync writes.
>>
>> I think we currently need an explicit declaration to make things "clear"
>> to the developer who is using the LedgerHandle API, even at ledger
>> creation time.
>>
>> The reason is that we are going to forbid "striping" ledgers (ensemble
>> size > quorum size) for no-sync writes in the first implementation:
>> - one option is to fail at the first no-sync addEntry, but this would be
>> really inconvenient, because the ack/write/ensemble sizes are usually
>> configured by the admin, and with some configurations the errors would
>> show up only after the system has started.
>> - the second option is to make the developer explicitly enable no-sync
>> writes at creation time and fail the creation of the ledger if the
>> requested combination of options is not possible
>>
>> I am not sure that the changes to the bookie internals are a client-API
>> matter; maybe we can leverage custom metadata (as JV said) in order to make
>> the bookie handle such ledgers differently. This path will always stay
>> open, as custom metadata are already available.
>>
>> JV preferred the ledger-type approach; the alternative solution is to
>> introduce a list of "capabilities" or "ledger options".
>> I think that the ability to perform no-sync writes is so important that
>> "custom metadata" is not the right place to declare it, and the same goes
>> for "ledger type".
>>
>> So I am proposing to add a boolean 'allowNoSyncWrites' flag at ledger
>> creation time, without writing it into the ledger metadata on ZK.
>> If further improvements need ledger metadata changes, we will make them
>> then.
>>
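A minimal sketch of the creation-time check described above, assuming a hypothetical `LedgerCreateOptions` holder: the class and method names are illustrative only and are not part of the BookKeeper client API. "Quorum size" is read here as the write quorum size. The sketch rejects striped ledgers (ensemble size larger than write quorum size) when the proposed `allowNoSyncWrites` flag is set, so a bad admin configuration surfaces when the ledger is created rather than at the first no-sync addEntry.

```java
/**
 * Hypothetical creation-time options (illustrative names, not the BookKeeper API).
 * "Quorum size" in the discussion is interpreted as the write quorum size.
 */
final class LedgerCreateOptions {
    final int ensembleSize;
    final int writeQuorumSize;
    final int ackQuorumSize;
    final boolean allowNoSyncWrites; // proposed flag, declared only at creation time

    LedgerCreateOptions(int ensembleSize, int writeQuorumSize, int ackQuorumSize,
                        boolean allowNoSyncWrites) {
        this.ensembleSize = ensembleSize;
        this.writeQuorumSize = writeQuorumSize;
        this.ackQuorumSize = ackQuorumSize;
        this.allowNoSyncWrites = allowNoSyncWrites;
    }

    /** Fails at creation time instead of at the first no-sync addEntry. */
    void validate() {
        if (ackQuorumSize < 1 || writeQuorumSize < ackQuorumSize || ensembleSize < writeQuorumSize) {
            throw new IllegalArgumentException(
                "expected ensembleSize >= writeQuorumSize >= ackQuorumSize >= 1");
        }
        // First implementation: no-sync writes are not allowed on striped ledgers.
        if (allowNoSyncWrites && ensembleSize > writeQuorumSize) {
            throw new IllegalArgumentException(
                "no-sync writes require ensembleSize == writeQuorumSize (no striping)");
        }
    }
}
```

With this shape, `new LedgerCreateOptions(5, 3, 2, true).validate()` fails immediately, while the same sizes without the flag remain valid.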
>> I have updated the BP-14 document and added an "Open issues" footer
>> with the open points;
>> please add comments and I will correct the document as soon as possible.
>>
>>
>> Enrico
>>
>>
>>
>>
>> 2017-08-30 1:24 GMT+02:00 Sijie Guo <gu...@gmail.com>:
>>
>>> Thank you, Enrico, JV.
>>>
>>> These are great discussions.
>>>
>>> After reading these two proposals, I have a few very high-level comments,
>>> divided into three categories.
>>>
>>>
>>> *API*
>>>
>>> - I think there are no fundamental differences between these two
>>> proposals. They are trying to achieve similar goals by exposing durability
>>> levels in different ways.
>>> So this will be a discussion on what the API/interface should look like
>>> from a user / admin perspective.
>>> I would suggest focusing on the API itself, putting the implementation
>>> design aside when talking about this.
>>>
>>> *Core*
>>>
>>> - Both proposals need to deal with a core question: what happens to the
>>> LAC and what semantics BookKeeper provides.
>>> JV did a good summary in his proposal. However, I am not a fan of
>>> maintaining two different semantics, so I am looking for a solution in
>>> which BookKeeper maintains only one semantic. The semantic is basically:
>>>
>>> 1) LAC is only advanced when the entries up to the LAC are committed to
>>> persistent storage
>>> 2) All the entries up to the LAC are successfully committed to persistent
>>> storage
>>> 3) All the entries up to the LAC must be readable at all times.
>>>
>>> If we maintain such a semantic, there is no need to change the auto
>>> recovery protocol in BookKeeper. All we guarantee is that the entries are
>>> durably persisted.
>>>
>>> In order to maintain such a semantic, I think JV and I proposed similar
>>> solutions in our respective proposals. I am trying to finalize one here:
>>>
>>> * bookie maintains a LAS (Last Add Synced) point for each entry.
>>> * LAS can be piggybacked on AddResponses
>>> * Client uses the LAS to advance LAC.
>>>
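A minimal sketch of how a writer might advance the LAC from LAS values piggybacked on add responses. Assumptions, none taken from BookKeeper code: `LasTracker` is a hypothetical class; there is no striping, so every bookie stores every entry; and a bookie's LAS means that all entries of the ledger up to that id are fsynced on that bookie. Under these assumptions an entry is durable once at least ack-quorum bookies report LAS >= entry id, so the candidate LAC is the ack-quorum-th largest LAS, and the LAC is never moved backwards.

```java
import java.util.Arrays;

/**
 * Hypothetical writer-side tracker: advances LAC using the LAS (Last Add Synced)
 * piggybacked on each add response. Assumes no striping (every bookie stores every entry).
 */
final class LasTracker {
    private final long[] lastAddSynced; // one LAS per bookie, -1 = nothing synced yet
    private final int ackQuorumSize;
    private long lastAddConfirmed = -1;

    LasTracker(int ensembleSize, int ackQuorumSize) {
        if (ackQuorumSize < 1 || ackQuorumSize > ensembleSize) {
            throw new IllegalArgumentException("need 1 <= ackQuorumSize <= ensembleSize");
        }
        this.lastAddSynced = new long[ensembleSize];
        Arrays.fill(lastAddSynced, -1L);
        this.ackQuorumSize = ackQuorumSize;
    }

    /** Called for every add response; returns the (possibly advanced) LAC. */
    long onAddResponse(int bookieIndex, long piggybackedLas) {
        // A bookie's LAS never moves backwards; ignore stale or reordered responses.
        lastAddSynced[bookieIndex] = Math.max(lastAddSynced[bookieIndex], piggybackedLas);
        // An entry is durable once ackQuorumSize bookies report LAS >= entryId,
        // i.e. the candidate is the ackQuorumSize-th largest LAS.
        long[] sorted = lastAddSynced.clone();
        Arrays.sort(sorted); // ascending order
        long candidate = sorted[sorted.length - ackQuorumSize];
        // LAC is only ever advanced.
        lastAddConfirmed = Math.max(lastAddConfirmed, candidate);
        return lastAddConfirmed;
    }

    long lastAddConfirmed() {
        return lastAddConfirmed;
    }
}
```

For example, with an ensemble of 3 and an ack quorum of 2, LAS reports of 4, 2 and 6 from the three bookies yield a LAC of 4: entries up to 4 are synced on at least two bookies.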
>>> If we can agree on the core semantic we are going to provide, the other
>>> things are just logistics.
>>>
>>> *Others*
>>>
>>> - Regarding separating the journal or bypassing the journal, there is no
>>> difference from the point of view of the core semantic. They are all
>>> non-durable writes (acknowledged before fsyncing).
>>> We can start with the same-journal approach (but just acknowledge before
>>> fsyncing), implement the core and add other options later on.
>>>
>>>
>>> From my point of view, I'd be more interested in providing a single
>>> consistent durability semantic that applications can rely on for both
>>> durable and non-durable writes. The other things seem to be more about
>>> logistics.
>>>
>>>
>>> - Sijie
>>>
>>>
>>> On Mon, Aug 28, 2017 at 11:27 PM, Enrico Olivelli <eo...@gmail.com>
>>> wrote:
>>>
>>> > 2017-08-29 8:01 GMT+02:00 Venkateswara Rao Jujjuri <jujjuri@gmail.com>:
>>> >
>>> > > I don't believe I fully followed your second case. But even in this
>>> > > case, your major concern is about the additional 'sync' RPC?
>>> > >
>>> >
>>> > Yes. Apart from that, I am fine with your proposal too, that is, to have
>>> > a LedgerType which drives durability, and I think we also need to add
>>> > per-entry durability options.
>>> >
>>> > I think that, at least for the 'simple' no-sync addEntry, we do not need
>>> > to change many things. I am drafting a prototype and I will share it as
>>> > soon as we all agree on the roadmap.
>>> >
>>> > The first implementation can cover the first case (no-sync addEntry) and
>>> > change the way the writer advances the LAC in order to support 'relaxed
>>> > durability writes'.
>>> > This change will be compatible with future improvements, and it will open
>>> > the door to big changes on the bookie side, like bypassing the journal or
>>> > leveraging multiple journals.
>>> >
>>> > -- Enrico
>>> >
>>> > > Or is there something else for which the LedgerType proposal won't work?
>>> > >
>>> >
>>> > >
>>> > >
>>> > > On Mon, Aug 28, 2017 at 7:35 AM, Enrico Olivelli <eolivelli@gmail.com>
>>> > > wrote:
>>> > >
>>> > > > I think that having a set of options in the ledger metadata would be a
>>> > > > good enhancement, and I am sure we will do it as soon as it is needed;
>>> > > > maybe we do not need it now.
>>> > > >
>>> > > > Actually, I think we will need to declare this durability level at
>>> > > > the entry level to support some use cases in the BP-14 document. Let
>>> > > > me explain two of my use cases for which I need it:
>>> > > >
>>> > > > At a higher level we have two choices:
>>> > > >
>>> > > > A) per-ledger durability options (JV proposal)
>>> > > > all addEntry operations are durable or non-durable, and there is an
>>> > > > explicit 'sync' API (+ forced sync at close)
>>> > > >
>>> > > > B) per-entry durability options (original BP-14 proposal)
>>> > > > every addEntry has its own durable/non-durable option (sync/no-sync),
>>> > > > with the ability to call 'sync' without addEntry (+ forced sync at
>>> > > > close)
>>> > > >
>>> > > > I am speaking about the database WAL case: I am using the ledger as a
>>> > > > segment of the WAL of a database, and I am writing all data changes in
>>> > > > the scope of a 'transaction' with the relaxed-durability flag; then I
>>> > > > am writing the 'transaction committed' entry with the "strict
>>> > > > durability" requirement. This requires that all previous entries are
>>> > > > persisted durably, so that the transaction will never be lost.
>>> > > >
>>> > > > In fact, in this scenario we would need an addEntry + sync API:
>>> > > >
>>> > > > using option A) the WAL will look like:
>>> > > > - open ledger no-sync = true
>>> > > > - addEntry (set foo=bar)  (this will be no-sync)
>>> > > > - addEntry (set foo=bar2) (this will be no-sync)
>>> > > > - addEntry (commit)
>>> > > > - sync
>>> > > >
>>> > > > using option B) the WAL will look like:
>>> > > > - open ledger
>>> > > > - addEntry (set foo=bar), no-sync
>>> > > > - addEntry (set foo=bar2), no-sync
>>> > > > - addEntry (commit), sync
>>> > > >
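To compare the two API shapes on the WAL example, here is a sketch with hypothetical interfaces (`PerLedgerWal`, `PerEntryWal`, a `Durability` enum) and a toy `CountingWal` that only records the requests each bookie would receive; none of these are existing BookKeeper types. Option A needs an explicit `sync()` after the commit entry, while option B carries the sync request on the commit entry itself.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical per-entry durability flag (option B); not an existing BookKeeper type. */
enum Durability { SYNC, NO_SYNC }

/** Option A sketch: durability fixed when the ledger is opened, plus an explicit sync(). */
interface PerLedgerWal {
    void addEntry(byte[] data); // no-sync because the ledger was opened with no-sync = true
    void sync();                // one extra request to every bookie
}

/** Option B sketch: every addEntry carries its own durability flag. */
interface PerEntryWal {
    void addEntry(byte[] data, Durability durability);
    void sync();                // still available without an addEntry
}

/** Toy fake that only records the requests every bookie would receive. */
final class CountingWal implements PerLedgerWal, PerEntryWal {
    final List<String> requests = new ArrayList<>();

    @Override
    public void addEntry(byte[] data) {
        requests.add("add(NO_SYNC)");
    }

    @Override
    public void addEntry(byte[] data, Durability durability) {
        requests.add("add(" + durability + ")");
    }

    @Override
    public void sync() {
        requests.add("sync");
    }
}
```

Replaying the transaction from the example records four requests per bookie with option A (three adds plus `sync`) and three with option B, which is the saved 'sync' RPC discussed in the thread.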
>>> > > > In case B) we are "saving" one RPC call to every bookie (the 'sync'
>>> > > > one). The same holds for single data change entries, like updating a
>>> > > > single record in the database: with BK 4.5 this "costs" only a single
>>> > > > RPC to every bookie.
>>> > > >
>>> > > > Second case:
>>> > > > I am using BookKeeper to store binary objects, so I am packing several
>>> > > > 'objects' (named sequences of bytes) into a single ledger, like you do
>>> > > > when you write many records to a file in a streaming fashion and keep
>>> > > > track of the offset of the beginning of every record (LedgerHandleAdv
>>> > > > is perfect for this case).
>>> > > > I am not using a single ledger per 'file' because creating many
>>> > > > ledgers very fast kills ZooKeeper; in my systems I have big bursts of
>>> > > > writes which need to be really "fast", so I am writing multiple
>>> > > > 'files' to every single ledger. So close-to-open consistency at the
>>> > > > ledger level is not suitable for this case.
>>> > > > I have to write as fast as possible to this 'ledger-backed' stream,
>>> > > > and, as with a 'traditional' filesystem, I am writing parts of each
>>> > > > file and then requiring 'sync' at the end of each file.
>>> > > > Using BookKeeper you need to split big 'files' into "little" parts;
>>> > > > you cannot transmit the contents as a "real" stream over the network.
>>> > > >
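A minimal sketch of the second use case, assuming a hypothetical `EntryLog` abstraction standing in for a ledger (with a per-entry sync flag, as in option B) and a hypothetical `ObjectPacker`; neither is BookKeeper API. Each 'file' is split into chunks written as consecutive entries, an index records the first and last entry id of each file, and only the last chunk of each file requests a sync.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical append-only log standing in for a ledger; append returns the entry id. */
interface EntryLog {
    long append(byte[] chunk, boolean sync);
}

/** In-memory fake that records every entry and whether it requested a sync. */
final class InMemoryEntryLog implements EntryLog {
    final List<byte[]> entries = new ArrayList<>();
    final List<Boolean> syncFlags = new ArrayList<>();

    @Override
    public long append(byte[] chunk, boolean sync) {
        entries.add(chunk);
        syncFlags.add(sync);
        return entries.size() - 1;
    }
}

/** Packs several named 'files' into one log; only the last chunk of each file asks for sync. */
final class ObjectPacker {
    private final EntryLog log;
    private final int chunkSize;
    final Map<String, long[]> index = new HashMap<>(); // name -> {firstEntryId, lastEntryId}

    ObjectPacker(EntryLog log, int chunkSize) {
        this.log = log;
        this.chunkSize = chunkSize;
    }

    void writeFile(String name, byte[] content) {
        if (content.length == 0) {
            throw new IllegalArgumentException("empty files are not handled in this sketch");
        }
        long first = -1;
        long last = -1;
        for (int off = 0; off < content.length; off += chunkSize) {
            int end = Math.min(off + chunkSize, content.length);
            boolean lastChunk = end == content.length; // sync only at the end of the file
            last = log.append(Arrays.copyOfRange(content, off, end), lastChunk);
            if (first < 0) {
                first = last;
            }
        }
        index.put(name, new long[] {first, last});
    }
}
```

Writing a 10-byte file and then a 3-byte file with a chunk size of 4 produces entries 0 to 2 for the first file and entry 3 for the second, with sync requested only on entries 2 and 3.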
>>> > > > I am not talking about bookie-level implementation details; I would
>>> > > > like to define the high-level API in order to support all the relevant
>>> > > > known use cases and leave room for the future.
>>> > > > At this moment adding a per-entry 'durability option' seems to be very
>>> > > > flexible and simple to implement, and it does not prevent us from
>>> > > > making further improvements, namely skipping the journal.
>>> > > >
>>> > > > Enrico
>>> > > >
>>> > > >
>>> > > >
>>> > > > 2017-08-26 19:55 GMT+02:00 Enrico Olivelli <eo...@gmail.com>:
>>> > > >
>>> > > > >
>>> > > > >
>>> > > > > On Sat, Aug 26, 2017, 19:19 Venkateswara Rao Jujjuri <jujjuri@gmail.com>
>>> > > > > wrote:
>>> > > > >
>>> > > > >> Hi all,
>>> > > > >>
>>> > > > >> As promised during Thursday call, here is my proposal.
>>> > > > >>
>>> > > > >> *NOTE*: The major difference in this proposal compared to Enrico’s
>>> > > > >> <https://docs.google.com/document/d/1JLYO3K3tZ5PJGmyS0YK_-NW8VOUUgUWVBmswCUOG158/edit#heading=h.q2rewiqndr5v>
>>> > > > >> is making durability a property of the ledger (type) as opposed
>>> > > > >> to addEntry(). The rest of the technical details have a lot of
>>> > > > >> similarities.
>>> > > > >>
>>> > > > >
>>> > > > > Thank you JV. I have just quickly read the doc and your view is
>>> > > > > certainly broader.
>>> > > > > I will dig into the doc as soon as possible on Monday.
>>> > > > > For me it is OK to have a ledger-wide configuration. I think that
>>> > > > > the most important decision is about the API we will provide, as it
>>> > > > > will be difficult to change it in the future.
>>> > > > >
>>> > > > >
>>> > > > > Cheers
>>> > > > > Enrico
>>> > > > >
>>> > > > >
>>> > > > >
>>> > > > >> https://docs.google.com/document/d/1g1eBcVVCZrTG8YZliZP0LVqvWpq432ODEghrGVQ4d4Q/edit?usp=sharing
>>> > > > >>
>>> > > > >> On Thu, Aug 24, 2017 at 1:14 AM, Enrico Olivelli <eolivelli@gmail.com>
>>> > > > >> wrote:
>>> > > > >>
>>> > > > >> > Thank you all for the comments and for taking a look at the
>>> > > > >> > document so soon.
>>> > > > >> > I have updated the doc; we will discuss it at the meeting.
>>> > > > >> >
>>> > > > >> >
>>> > > > >> > Enrico
>>> > > > >> >
>>> > > > >> > 2017-08-24 2:27 GMT+02:00 Sijie Guo <gu...@gmail.com>:
>>> > > > >> >
>>> > > > >> > > Enrico,
>>> > > > >> > >
>>> > > > >> > > Thank you so much! It is a great effort to put this up. Overall it
>>> > > > >> > > looks good. I made some comments; we can discuss them at
>>> > > > >> > > tomorrow's community meeting.
>>> > > > >> > >
>>> > > > >> > > - Sijie
>>> > > > >> > >
>>> > > > >> > > On Wed, Aug 23, 2017 at 8:25 AM, Enrico Olivelli <eolivelli@gmail.com>
>>> > > > >> > > wrote:
>>> > > > >> > >
>>> > > > >> > > > Hi all,
>>> > > > >> > > > I have drafted a first proposal for BP-14 - Relax Durability
>>> > > > >> > > >
>>> > > > >> > > > We are talking about limiting the number of fsyncs to the
>>> > > > >> > > > journal while preserving the correctness of the LAC protocol.
>>> > > > >> > > >
>>> > > > >> > > > This is the link to the wiki page, but as the issue is huge we
>>> > > > >> > > > prefer to use Google Documents for sharing comments
>>> > > > >> > > > https://cwiki.apache.org/confluence/display/BOOKKEEPER/BP+-+14+Relax+durability
>>> > > > >> > > >
>>> > > > >> > > > This is the document
>>> > > > >> > > > https://docs.google.com/document/d/1JLYO3K3tZ5PJGmyS0YK_-NW8VOUUgUWVBmswCUOG158/edit?usp=sharing
>>> > > > >> > > >
>>> > > > >> > > > All comments are welcome
>>> > > > >> > > >
>>> > > > >> > > > I have added the DL dev list in cc as the discussion is
>>> > > > >> > > > interesting for both groups
>>> > > > >> > > >
>>> > > > >> > > > Enrico Olivelli
>>> > > > >> > > >
>>> > > > >> > >
>>> > > > >> >
>>> > > > >>
>>> > > > >>
>>> > > > >>
>>> > > > >> --
>>> > > > >> Jvrao
>>> > > > >> ---
>>> > > > >> First they ignore you, then they laugh at you, then they fight
>>> > > > >> you, then you win. - Mahatma Gandhi
>>> > > > >>
>>> > > > > --
>>> > > > >
>>> > > > >
>>> > > > > -- Enrico Olivelli
>>> > > > >
>>> > > >
>>> > >
>>> > >
>>> > >
>>> > > --
>>> > > Jvrao
>>> > > ---
>>> > > First they ignore you, then they laugh at you, then they fight you,
>>> > > then you win. - Mahatma Gandhi
>>> > >
>>> >
>>>
>>
>>
>

Re: [DISCUSS] BP-14 Relax Durability

Posted by Enrico Olivelli <eo...@gmail.com>.
Hi all,


You can find the revised proposal here
https://cwiki.apache.org/confluence/display/BOOKKEEPER/BP-14+Relax+durability

The link to the document open for comments is this:
https://docs.google.com/document/d/1yNi9t2_deOOMXDaGzrnmaHTQeB3B3Fnym82DUERH7LM/edit?usp=sharing

Please check it out; we are going to review this proposal at the meeting.

-- Enrico


2017-08-30 8:56 GMT+02:00 Enrico Olivelli <eo...@gmail.com>:

> Thank you, Sijie, for summarizing, and thanks to the community for helping
> with this important enhancement to BookKeeper.
>
> I am convinced that, as JV pointed out, we need to declare at ledger
> creation time that the ledger is going to perform no-sync writes.
>
> I think we currently need an explicit declaration to make things "clear"
> to the developer who is using the LedgerHandle API, even at ledger
> creation time.
>
> The reason is that we are going to forbid "striping" ledgers (ensemble
> size > quorum size) for no-sync writes in the first implementation:
> - one option is to fail at the first no-sync addEntry, but this would be
> really inconvenient, because the ack/write/ensemble sizes are usually
> configured by the admin, and with some configurations the errors would
> show up only after the system has started.
> - the second option is to make the developer explicitly enable no-sync
> writes at creation time and fail the creation of the ledger if the
> requested combination of options is not possible
>
> I am not sure that the changes to the bookie internals are a client-API
> matter; maybe we can leverage custom metadata (as JV said) in order to make
> the bookie handle such ledgers differently. This path will always stay
> open, as custom metadata are already available.
>
> JV preferred the ledger-type approach; the alternative solution is to
> introduce a list of "capabilities" or "ledger options".
> I think that the ability to perform no-sync writes is so important that
> "custom metadata" is not the right place to declare it, and the same goes
> for "ledger type".
>
> So I am proposing to add a boolean 'allowNoSyncWrites' flag at ledger
> creation time, without writing it into the ledger metadata on ZK.
> If further improvements need ledger metadata changes, we will make them
> then.
>
> I have updated the BP-14 document and added an "Open issues" footer
> with the open points;
> please add comments and I will correct the document as soon as possible.
>
>
> Enrico
>
>
>
>
> 2017-08-30 1:24 GMT+02:00 Sijie Guo <gu...@gmail.com>:
>
>> Thank you, Enrico, JV.
>>
>> These are great discussions.
>>
>> After reading these two proposals, I have a few very high-level comments,
>> divided into three categories.
>>
>>
>> *API*
>>
>> - I think there are no fundamental differences between these two
>> proposals. They are trying to achieve similar goals by exposing durability
>> levels in different ways.
>> So this will be a discussion on what the API/interface should look like
>> from a user / admin perspective.
>> I would suggest focusing on the API itself, putting the implementation
>> design aside when talking about this.
>>
>> *Core*
>>
>> - Both proposals need to deal with a core question: what happens to the
>> LAC and what semantics BookKeeper provides.
>> JV did a good summary in his proposal. However, I am not a fan of
>> maintaining two different semantics, so I am looking for a solution in
>> which BookKeeper maintains only one semantic. The semantic is basically:
>>
>> 1) LAC is only advanced when the entries up to the LAC are committed to
>> persistent storage
>> 2) All the entries up to the LAC are successfully committed to persistent
>> storage
>> 3) All the entries up to the LAC must be readable at all times.
>>
>> If we maintain such a semantic, there is no need to change the auto
>> recovery protocol in BookKeeper. All we guarantee is that the entries are
>> durably persisted.
>>
>> In order to maintain such a semantic, I think JV and I proposed similar
>> solutions in our respective proposals. I am trying to finalize one here:
>>
>> * bookie maintains a LAS (Last Add Synced) point for each entry.
>> * LAS can be piggybacked on AddResponses
>> * Client uses the LAS to advance LAC.
>>
>> If we can agree on the core semantic we are going to provide, the other
>> things are just logistics.
>>
>> *Others*
>>
>> - Regarding separating the journal or bypassing the journal, there is no
>> difference from the point of view of the core semantic. They are all
>> non-durable writes (acknowledged before fsyncing).
>> We can start with the same-journal approach (but just acknowledge before
>> fsyncing), implement the core and add other options later on.
>>
>>
>> From my point of view, I'd be more interested in providing a single
>> consistent durability semantic that applications can rely on for both
>> durable and non-durable writes. The other things seem to be more about
>> logistics.
>>
>>
>> - Sijie
>>
>>
>> On Mon, Aug 28, 2017 at 11:27 PM, Enrico Olivelli <eo...@gmail.com>
>> wrote:
>>
>> > 2017-08-29 8:01 GMT+02:00 Venkateswara Rao Jujjuri <ju...@gmail.com>:
>> >
>> > > I don't believe I fully followed your second case. But even in this
>> > > case, your major concern is about the additional 'sync' RPC?
>> > >
>> >
>> > Yes. Apart from that, I am fine with your proposal too, that is, to have
>> > a LedgerType which drives durability, and I think we also need to add
>> > per-entry durability options.
>> >
>> > I think that, at least for the 'simple' no-sync addEntry, we do not need
>> > to change many things. I am drafting a prototype and I will share it as
>> > soon as we all agree on the roadmap.
>> >
>> > The first implementation can cover the first case (no-sync addEntry) and
>> > change the way the writer advances the LAC in order to support 'relaxed
>> > durability writes'.
>> > This change will be compatible with future improvements, and it will open
>> > the door to big changes on the bookie side, like bypassing the journal or
>> > leveraging multiple journals.
>> >
>> > -- Enrico
>> >
>> > > Or is there something else for which the LedgerType proposal won't work?
>> > >
>> >
>> > >
>> > >
>> > > On Mon, Aug 28, 2017 at 7:35 AM, Enrico Olivelli <eolivelli@gmail.com>
>> > > wrote:
>> > >
>> > > > I think that having a set of options in the ledger metadata would be a
>> > > > good enhancement, and I am sure we will do it as soon as it is needed;
>> > > > maybe we do not need it now.
>> > > >
>> > > > Actually, I think we will need to declare this durability level at
>> > > > the entry level to support some use cases in the BP-14 document. Let
>> > > > me explain two of my use cases for which I need it:
>> > > >
>> > > > At a higher level we have two choices:
>> > > >
>> > > > A) per-ledger durability options (JV proposal)
>> > > > all addEntry operations are durable or non-durable, and there is an
>> > > > explicit 'sync' API (+ forced sync at close)
>> > > >
>> > > > B) per-entry durability options (original BP-14 proposal)
>> > > > every addEntry has its own durable/non-durable option (sync/no-sync),
>> > > > with the ability to call 'sync' without addEntry (+ forced sync at
>> > > > close)
>> > > >
>> > > > I am speaking about the database WAL case: I am using the ledger as a
>> > > > segment of the WAL of a database, and I am writing all data changes in
>> > > > the scope of a 'transaction' with the relaxed-durability flag; then I
>> > > > am writing the 'transaction committed' entry with the "strict
>> > > > durability" requirement. This requires that all previous entries are
>> > > > persisted durably, so that the transaction will never be lost.
>> > > >
>> > > > In fact, in this scenario we would need an addEntry + sync API:
>> > > >
>> > > > using option A) the WAL will look like:
>> > > > - open ledger no-sync = true
>> > > > - addEntry (set foo=bar)  (this will be no-sync)
>> > > > - addEntry (set foo=bar2) (this will be no-sync)
>> > > > - addEntry (commit)
>> > > > - sync
>> > > >
>> > > > using option B) the WAL will look like:
>> > > > - open ledger
>> > > > - addEntry (set foo=bar), no-sync
>> > > > - addEntry (set foo=bar2), no-sync
>> > > > - addEntry (commit), sync
>> > > >
>> > > > In case B) we are "saving" one RPC call to every bookie (the 'sync'
>> > > > one). The same holds for single data change entries, like updating a
>> > > > single record in the database: with BK 4.5 this "costs" only a single
>> > > > RPC to every bookie.
>> > > >
>> > > > Second case:
>> > > > I am using BookKeeper to store binary objects, so I am packing several
>> > > > 'objects' (named sequences of bytes) into a single ledger, like you do
>> > > > when you write many records to a file in a streaming fashion and keep
>> > > > track of the offset of the beginning of every record (LedgerHandleAdv
>> > > > is perfect for this case).
>> > > > I am not using a single ledger per 'file' because creating many
>> > > > ledgers very fast kills ZooKeeper; in my systems I have big bursts of
>> > > > writes which need to be really "fast", so I am writing multiple
>> > > > 'files' to every single ledger. So close-to-open consistency at the
>> > > > ledger level is not suitable for this case.
>> > > > I have to write as fast as possible to this 'ledger-backed' stream,
>> > > > and, as with a 'traditional' filesystem, I am writing parts of each
>> > > > file and then requiring 'sync' at the end of each file.
>> > > > Using BookKeeper you need to split big 'files' into "little" parts;
>> > > > you cannot transmit the contents as a "real" stream over the network.
>> > > >
>> > > > I am not talking about bookie-level implementation details; I would
>> > > > like to define the high-level API in order to support all the relevant
>> > > > known use cases and leave room for the future.
>> > > > At this moment adding a per-entry 'durability option' seems to be very
>> > > > flexible and simple to implement, and it does not prevent us from
>> > > > making further improvements, namely skipping the journal.
>> > > >
>> > > > Enrico
>> > > >
>> > > >
>> > > >
>> > > > 2017-08-26 19:55 GMT+02:00 Enrico Olivelli <eo...@gmail.com>:
>> > > >
>> > > > >
>> > > > >
>> > > > > On Sat, Aug 26, 2017, 19:19 Venkateswara Rao Jujjuri <jujjuri@gmail.com>
>> > > > > wrote:
>> > > > >
>> > > > >> Hi all,
>> > > > >>
>> > > > >> As promised during Thursday call, here is my proposal.
>> > > > >>
>> > > > >> *NOTE*: The major difference in this proposal compared to Enrico’s
>> > > > >> <https://docs.google.com/document/d/1JLYO3K3tZ5PJGmyS0YK_-NW8VOUUgUWVBmswCUOG158/edit#heading=h.q2rewiqndr5v>
>> > > > >> is making durability a property of the ledger (type) as opposed
>> > > > >> to addEntry(). The rest of the technical details have a lot of
>> > > > >> similarities.
>> > > > >>
>> > > > >
>> > > > > Thank you JV. I have just quickly read the doc and your view is
>> > > > > certainly broader.
>> > > > > I will dig into the doc as soon as possible on Monday.
>> > > > > For me it is OK to have a ledger-wide configuration. I think that
>> > > > > the most important decision is about the API we will provide, as it
>> > > > > will be difficult to change it in the future.
>> > > > >
>> > > > >
>> > > > > Cheers
>> > > > > Enrico
>> > > > >
>> > > > >
>> > > > >
>> > > > >> https://docs.google.com/document/d/1g1eBcVVCZrTG8YZliZP0LVqvWpq432ODEghrGVQ4d4Q/edit?usp=sharing
>> > > > >>
>> > > > >> On Thu, Aug 24, 2017 at 1:14 AM, Enrico Olivelli <eolivelli@gmail.com>
>> > > > >> wrote:
>> > > > >>
>> > > > >> > Thank you all for the comments and for taking a look at the
>> > > > >> > document so soon.
>> > > > >> > I have updated the doc; we will discuss it at the meeting.
>> > > > >> >
>> > > > >> >
>> > > > >> > Enrico
>> > > > >> >
>> > > > >> > 2017-08-24 2:27 GMT+02:00 Sijie Guo <gu...@gmail.com>:
>> > > > >> >
>> > > > >> > > Enrico,
>> > > > >> > >
>> > > > >> > > Thank you so much! It is a great effort to put this up. Overall it
>> > > > >> > > looks good. I made some comments; we can discuss them at
>> > > > >> > > tomorrow's community meeting.
>> > > > >> > >
>> > > > >> > > - Sijie
>> > > > >> > >
>> > > > >> > > On Wed, Aug 23, 2017 at 8:25 AM, Enrico Olivelli <eolivelli@gmail.com>
>> > > > >> > > wrote:
>> > > > >> > >
>> > > > >> > > > Hi all,
>> > > > >> > > > I have drafted a first proposal for BP-14 - Relax Durability
>> > > > >> > > >
>> > > > >> > > > We are talking about limiting the number of fsyncs to the
>> > > > >> > > > journal while preserving the correctness of the LAC protocol.
>> > > > >> > > >
>> > > > >> > > > This is the link to the wiki page, but as the issue is huge we
>> > > > >> > > > prefer to use Google Documents for sharing comments
>> > > > >> > > > https://cwiki.apache.org/confluence/display/BOOKKEEPER/BP+-+14+Relax+durability
>> > > > >> > > >
>> > > > >> > > > This is the document
>> > > > >> > > > https://docs.google.com/document/d/1JLYO3K3tZ5PJGmyS0YK_-NW8VOUUgUWVBmswCUOG158/edit?usp=sharing
>> > > > >> > > >
>> > > > >> > > > All comments are welcome
>> > > > >> > > >
>> > > > >> > > > I have added the DL dev list in cc as the discussion is
>> > > > >> > > > interesting for both groups
>> > > > >> > > >
>> > > > >> > > > Enrico Olivelli
>> > > > >> > > >
>> > > > >> > >
>> > > > >> >
>> > > > >>
>> > > > >>
>> > > > >>
>> > > > >> --
>> > > > >> Jvrao
>> > > > >> ---
>> > > > >> First they ignore you, then they laugh at you, then they fight
>> > > > >> you, then you win. - Mahatma Gandhi
>> > > > >>
>> > > > > --
>> > > > >
>> > > > >
>> > > > > -- Enrico Olivelli
>> > > > >
>> > > >
>> > >
>> > >
>> > >
>> > > --
>> > > Jvrao
>> > > ---
>> > > First they ignore you, then they laugh at you, then they fight you,
>> > > then you win. - Mahatma Gandhi
>> > >
>> >
>>
>
>

Re: [DISCUSS] BP-14 Relax Durability

Posted by Enrico Olivelli <eo...@gmail.com>.
Hi all,


You can find the revised proposal here
https://cwiki.apache.org/confluence/display/BOOKKEEPER/BP-14+Relax+durability

The link to the document open for comments is this:
https://docs.google.com/document/d/1yNi9t2_deOOMXDaGzrnmaHTQeB3B3Fnym82DUERH7LM/edit?usp=sharing

Please check it out; we are going to review this proposal at the meeting.

-- Enrico


2017-08-30 8:56 GMT+02:00 Enrico Olivelli <eo...@gmail.com>:

> Thank you, Sijie, for summarizing, and thanks to the community for helping
> with this important enhancement to BookKeeper.
>
> I am convinced that, as JV pointed out, we need to declare at ledger
> creation time that the ledger is going to perform no-sync writes.
>
> I think we currently need an explicit declaration to make things "clear"
> to the developer who is using the LedgerHandle API, even at ledger
> creation time.
>
> The reason is that we are going to forbid "striping" ledgers (ensemble
> size > quorum size) for no-sync writes in the first implementation:
> - one option is to fail at the first no-sync addEntry, but this would be
> really inconvenient, because the ack/write/ensemble sizes are usually
> configured by the admin, and with some configurations the errors would
> show up only after the system has started.
> - the second option is to make the developer explicitly enable no-sync
> writes at creation time and fail the creation of the ledger if the
> requested combination of options is not possible
>
> I am not sure that the changes to the bookie internals are a client-API
> matter; maybe we can leverage custom metadata (as JV said) in order to make
> the bookie handle such ledgers differently. This path will always stay
> open, as custom metadata are already available.
>
> JV preferred the ledger-type approach; the alternative solution is to
> introduce a list of "capabilities" or "ledger options".
> I think that the ability to perform no-sync writes is so important that
> "custom metadata" is not the right place to declare it, and the same goes
> for "ledger type".
>
> So I am proposing to add a boolean 'allowNoSyncWrites' flag at ledger
> creation time, without writing it into the ledger metadata on ZK.
> If further improvements need ledger metadata changes, we will make them
> then.
>
> I have updated the BP-14 document and added an "Open issues" footer
> with the open points;
> please add comments and I will correct the document as soon as possible.
>
>
> Enrico
>
>
>
>
> 2017-08-30 1:24 GMT+02:00 Sijie Guo <gu...@gmail.com>:
>
>> Thank you, Enrico, JV.
>>
>> These are great discussions.
>>
>> After reading these two proposals, I have a few very high-level comments,
>> divided into three categories.
>>
>>
>> *API*
>>
>> - I think there are no fundamental differences between these two
>> proposals. They are trying to achieve similar goals by exposing durability
>> levels in different ways.
>> So this will be a discussion on what the API/interface should look like
>> from a user / admin perspective.
>> I would suggest focusing on the API itself, putting the implementation
>> design aside when talking about this.
>>
>> *Core*
>>
>> - Both proposals need to deal with a core question: what happens to the
>> LAC and what semantics BookKeeper provides.
>> JV did a good summary in his proposal. However, I am not a fan of
>> maintaining two different semantics, so I am looking for a solution in
>> which BookKeeper maintains only one semantic. The semantic is basically:
>>
>> 1) LAC is only advanced when the entries up to the LAC are committed to
>> persistent storage
>> 2) All the entries up to the LAC are successfully committed to persistent
>> storage
>> 3) All the entries up to the LAC must be readable at all times.
>>
>> If we maintain such a semantic, there is no need to change the auto
>> recovery protocol in BookKeeper. All we guarantee is that the entries are
>> durably persisted.
>>
>> In order to maintain such a semantic, I think JV and I proposed similar
>> solutions in our respective proposals. I am trying to finalize one here:
>>
>> * The bookie maintains an LAS (Last Add Synced) point for each entry.
>> * LAS can be piggybacked on AddResponses.
>> * The client uses the LAS to advance LAC.
>>
>> If we can agree on the core semantic we are going to provide, the other
>> things are just logistics.
>>
>> *Others*
>>
>> - Regarding separating the journal or bypassing the journal, there is no
>> difference from the point of view of the core semantic. They are all
>> non-durable writes (acknowledged before fsyncing).
>> We can start with the same-journal approach (but just acknowledge before
>> fsyncing), implement the core and add other options later on.
>>
>>
>> From my point of view, I'd be more interested in providing a single
>> consistent durable semantic that applications can rely on for both durable
>> writes and non-durable writes. The other stuff seems to be more about
>> logistics.
>>
>>
>> - Sijie
>>
>>
>> On Mon, Aug 28, 2017 at 11:27 PM, Enrico Olivelli <eo...@gmail.com>
>> wrote:
>>
>> > 2017-08-29 8:01 GMT+02:00 Venkateswara Rao Jujjuri <ju...@gmail.com>:
>> >
>> > > I don't believe I fully followed your second case. But even in this case,
>> > > your major concern is about the additional 'sync' RPC?
>> > >
>> >
>> > Yes, apart from that I am fine with your proposal too, that is, to have a
>> > LedgerType which drives durability,
>> > and I think we need to add per-entry durability options.
>> >
>> > I think that at least for the 'simple' no-sync addEntry we do not need to
>> > change many things. I am drafting a prototype and I will share it as soon as
>> > we all agree on the roadmap.
>> >
>> > The first implementation can cover the first case (no-sync addEntry) and
>> > change the way the writer advances the LAC in order to support 'relaxed
>> > durability writes'.
>> > This change will be compatible with future improvements and it will open
>> > the door for big changes on the bookie side, like bypassing the journal or
>> > leveraging multiple journals.
>> >
>> > -- Enrico
>> >
>> > or something else that the LedgerType proposal won't work?
>> > >
>> >
>> > >
>> > >
>> > > On Mon, Aug 28, 2017 at 7:35 AM, Enrico Olivelli <eolivelli@gmail.com>
>> > > wrote:
>> > >
>> > > > I think that having a set of options in the ledger metadata would be a
>> > > > good enhancement, and I am sure we will do it as soon as it is needed;
>> > > > maybe we do not need it now.
>> > > >
>> > > > Actually I think we will need to declare this durability level at entry
>> > > > level to support some use cases in the BP-14 document. Let me explain two
>> > > > of my use cases for which I need it:
>> > > >
>> > > > At a higher level we have two choices:
>> > > >
>> > > > A) per-ledger durability options (JV's proposal):
>> > > > all addEntry operations are durable or non-durable and there is an
>> > > > explicit 'sync' API (+ forced sync at close)
>> > > >
>> > > > B) per-entry durability options (original BP-14 proposal):
>> > > > every addEntry has its own durable/non-durable option (sync/no-sync), with
>> > > > the ability to call 'sync' without addEntry (+ forced sync at close)
>> > > >
>> > > > I am speaking about the database WAL case: I am using the ledger as a
>> > > > segment of the WAL of a database and I am writing all data changes in the
>> > > > scope of a 'transaction' with the relaxed-durability flag, then I am
>> > > > writing the 'transaction committed' entry with a "strict durability"
>> > > > requirement. This will in fact require that all previous entries are
>> > > > persisted durably, so that the transaction will never be lost.
>> > > >
>> > > > In this scenario we would need an addEntry + sync API in fact:
>> > > >
>> > > > using option A) the WAL will look like:
>> > > > - open ledger no-sync = true
>> > > > - addEntry (set foo=bar)  (this will be no-sync)
>> > > > - addEntry (set foo=bar2) (this will be no-sync)
>> > > > - addEntry (commit)
>> > > > - sync
>> > > >
>> > > > using option B) the WAL will look like:
>> > > > - open ledger
>> > > > - addEntry (set foo=bar), no-sync
>> > > > - addEntry (set foo=bar2), no-sync
>> > > > - addEntry (commit), sync
>> > > >
>> > > > In case B) we are "saving" one RPC call to every bookie (the 'sync' one).
>> > > > The same goes for single data change entries, like updating a single record
>> > > > in the database, which with BK 4.5 "costs" only a single RPC to every bookie.
>> > > >
>> > > > Second case:
>> > > > I am using BookKeeper to store binary objects, so I am packing multiple
>> > > > 'objects' (named sequences of bytes) into a single ledger, like you do when
>> > > > you write many records to a file in a streaming fashion and keep track of
>> > > > the offsets of the beginning of every record (LedgerHandleAdv is perfect for
>> > > > this case).
>> > > > I am not using a single ledger per 'file' because creating many ledgers very
>> > > > fast kills ZooKeeper. In my systems I have big bursts of writes,
>> > > > which need to be really "fast", so I am writing multiple 'files' to every
>> > > > single ledger. So close-to-open consistency at the ledger level is not
>> > > > suitable for this case.
>> > > > I have to write as fast as possible to this 'ledger-backed' stream, and, as
>> > > > with a 'traditional' filesystem, I am writing parts of each file and then
>> > > > requiring 'sync' at the end of each file.
>> > > > Using BookKeeper you need to split big 'files' into "little" parts; you
>> > > > cannot transmit the contents as a "real" stream over the network.
>> > > >
>> > > > I am not talking about bookie-level implementation details; I would like to
>> > > > define the high-level API in order to support all the relevant known use
>> > > > cases and keep room for the future.
>> > > > At this moment adding a per-entry 'durability option' seems to be very
>> > > > flexible and simple to implement, and it does not prevent us from making
>> > > > further improvements, namely skipping the journal.
>> > > >
>> > > > Enrico
>> > > >
>> > > >
>> > > >
>> > > > 2017-08-26 19:55 GMT+02:00 Enrico Olivelli <eo...@gmail.com>:
>> > > >
>> > > > >
>> > > > >
>> > > > > On Sat, 26 Aug 2017, 19:19 Venkateswara Rao Jujjuri <jujjuri@gmail.com>
>> > > > > wrote:
>> > > > >
>> > > > >> Hi all,
>> > > > >>
>> > > > >> As promised during Thursday call, here is my proposal.
>> > > > >>
>> > > > >> *NOTE*: The major difference in this proposal compared to Enrico’s
>> > > > >> <https://docs.google.com/document/d/1JLYO3K3tZ5PJGmyS0YK_-NW8VOUUgUWVBmswCUOG158/edit#heading=h.q2rewiqndr5v>
>> > > > >> is making the durability a property of the ledger (type) as opposed to
>> > > > >> addEntry(). The rest of the technical details have a lot of similarities.
>> > > > >>
>> > > > >
>> > > > > Thank you JV. I have just quickly read the doc and your view is certainly
>> > > > > broader.
>> > > > > I will dig into the doc as soon as possible on Monday.
>> > > > > For me it is OK to have a ledger-wide configuration. I think that the most
>> > > > > important decision is about the API we will provide, as in the future it
>> > > > > will be difficult to change it.
>> > > > >
>> > > > >
>> > > > > Cheers
>> > > > > Enrico
>> > > > >
>> > > > >
>> > > > >
>> > > > >> https://docs.google.com/document/d/1g1eBcVVCZrTG8YZliZP0LVqvWpq432ODEghrGVQ4d4Q/edit?usp=sharing
>> > > > >>
>> > > > >> On Thu, Aug 24, 2017 at 1:14 AM, Enrico Olivelli <eolivelli@gmail.com>
>> > > > >> wrote:
>> > > > >>
>> > > > >> > Thank you all for the comments and for taking a look at the document
>> > > > >> > so soon.
>> > > > >> > I have updated the doc; we will discuss the document at the meeting.
>> > > > >> >
>> > > > >> >
>> > > > >> > Enrico
>> > > > >> >
>> > > > >> > 2017-08-24 2:27 GMT+02:00 Sijie Guo <gu...@gmail.com>:
>> > > > >> >
>> > > > >> > > Enrico,
>> > > > >> > >
>> > > > >> > > Thank you so much! It is a great effort putting this up. Overall it looks
>> > > > >> > > good. I made some comments; we can discuss them at tomorrow's community
>> > > > >> > > meeting.
>> > > > >> > >
>> > > > >> > > - Sijie
>> > > > >> > >
>> > > > >> > > On Wed, Aug 23, 2017 at 8:25 AM, Enrico Olivelli <eolivelli@gmail.com>
>> > > > >> > > wrote:
>> > > > >> > >
>> > > > >> > > > Hi all,
>> > > > >> > > > I have drafted a first proposal for BP-14 - Relax Durability
>> > > > >> > > >
>> > > > >> > > > We are talking about limiting the number of fsyncs to the journal while
>> > > > >> > > > preserving the correctness of the LAC protocol.
>> > > > >> > > >
>> > > > >> > > > This is the link to the wiki page, but as the issue is huge we prefer to
>> > > > >> > > > use Google Documents for sharing comments
>> > > > >> > > > https://cwiki.apache.org/confluence/display/BOOKKEEPER/BP+-+14+Relax+durability
>> > > > >> > > >
>> > > > >> > > > This is the document
>> > > > >> > > > https://docs.google.com/document/d/1JLYO3K3tZ5PJGmyS0YK_-NW8VOUUgUWVBmswCUOG158/edit?usp=sharing
>> > > > >> > > >
>> > > > >> > > > All comments are welcome
>> > > > >> > > >
>> > > > >> > > > I have added DL dev list in cc as the discussion is interesting for both
>> > > > >> > > > groups
>> > > > >> > > >
>> > > > >> > > > Enrico Olivelli
>> > > > >> > > >
>> > > > >> > >
>> > > > >> >
>> > > > >>
>> > > > >>
>> > > > >>
>> > > > >> --
>> > > > >> Jvrao
>> > > > >> ---
>> > > > >> First they ignore you, then they laugh at you, then they fight you,
>> > > > >> then you win. - Mahatma Gandhi
>> > > > >>
>> > > > > --
>> > > > >
>> > > > >
>> > > > > -- Enrico Olivelli
>> > > > >
>> > > >
>> > >
>> > >
>> > >
>> > > --
>> > > Jvrao
>> > > ---
>> > > First they ignore you, then they laugh at you, then they fight you,
>> > > then you win. - Mahatma Gandhi
>> > >
>> >
>>
>
>
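Sijie's summary quoted above (each bookie tracks an LAS, piggybacks it on add responses, and the client advances LAC from it) can be sketched on the client side roughly as follows. This is a minimal illustration under stated assumptions, not BookKeeper code: the class `LasTracker` and its methods are hypothetical, and it assumes a non-striped ledger (ensemble size equal to write quorum size) in which every bookie stores every entry and reports the highest entry id it has synced contiguously.

```java
import java.util.Arrays;

/**
 * Hypothetical client-side tracker (not a BookKeeper API): advances LAC only
 * from the LAS (Last Add Synced) values piggybacked on add responses.
 * Assumes a non-striped ledger (ensembleSize == writeQuorumSize), so every
 * bookie in the ensemble stores every entry.
 */
final class LasTracker {
    private final long[] lastAddSynced; // latest LAS seen per bookie, -1 = nothing synced yet
    private final int ackQuorumSize;
    private long lastAddConfirmed = -1L;

    LasTracker(int ensembleSize, int ackQuorumSize) {
        if (ackQuorumSize < 1 || ackQuorumSize > ensembleSize) {
            throw new IllegalArgumentException("ackQuorumSize must be in [1, ensembleSize]");
        }
        this.lastAddSynced = new long[ensembleSize];
        Arrays.fill(this.lastAddSynced, -1L);
        this.ackQuorumSize = ackQuorumSize;
    }

    /** Records the LAS carried by one add response and returns the (possibly advanced) LAC. */
    long onAddResponse(int bookieIndex, long piggybackedLas) {
        // A bookie's LAS never moves backwards; ignore stale or reordered responses.
        lastAddSynced[bookieIndex] = Math.max(lastAddSynced[bookieIndex], piggybackedLas);

        // The ackQuorumSize-th largest LAS is the highest entry id that at least
        // ackQuorumSize bookies have durably synced.
        long[] sorted = lastAddSynced.clone();
        Arrays.sort(sorted); // ascending order
        long candidate = sorted[sorted.length - ackQuorumSize];

        // LAC is monotonic: only ever move it forward.
        lastAddConfirmed = Math.max(lastAddConfirmed, candidate);
        return lastAddConfirmed;
    }

    long lastAddConfirmed() {
        return lastAddConfirmed;
    }
}
```

For example, with an ensemble of 3 and an ack quorum of 2, LAS reports of 5, 3 and 7 from the three bookies move LAC from -1 to 3 and then to 5: an entry is confirmed only once two bookies have synced it, which is the "LAC only advances over persisted entries" rule. Sorting on every response is fine for ensemble-sized arrays.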

Re: [DISCUSS] BP-14 Relax Durability

Posted by Enrico Olivelli <eo...@gmail.com>.
Thank you, Sijie, for summarizing, and thanks to the community for helping with
this important enhancement to BookKeeper.

I am convinced that, as JV pointed out, we need to declare at ledger creation
time that the ledger is going to perform no-sync writes.

I think we currently need an explicit declaration to make things "clear" to
the developer who is using the LedgerHandle API, right from ledger creation
time.

The point is that we are going to forbid "striping" ledgers (ensemble size >
write quorum size) for no-sync writes in the first implementation:
- one option is to fail at the first no-sync addEntry, but this would be
really inconvenient because the ack/write/ensemble sizes are usually
configured by the admin, and there would be configurations in which errors
come out only after starting the system.
- the second option is to make the developer explicitly enable no-sync
writes at creation time and fail the creation of the ledger if the
requested combination of options is not possible.
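The second option, failing fast at creation time, amounts to a simple check on the requested sizes. The sketch below is hypothetical: `LedgerCreationOptions` is not a BookKeeper class, and its `allowNoSyncWrites` field only mirrors the boolean proposed in this message; the "no striping" rule is the first-implementation restriction described above.

```java
/**
 * Hypothetical creation-time options (not a real BookKeeper API).
 * allowNoSyncWrites mirrors the boolean proposed in this thread.
 */
final class LedgerCreationOptions {
    final int ensembleSize;
    final int writeQuorumSize;
    final int ackQuorumSize;
    final boolean allowNoSyncWrites;

    LedgerCreationOptions(int ensembleSize, int writeQuorumSize,
                          int ackQuorumSize, boolean allowNoSyncWrites) {
        this.ensembleSize = ensembleSize;
        this.writeQuorumSize = writeQuorumSize;
        this.ackQuorumSize = ackQuorumSize;
        this.allowNoSyncWrites = allowNoSyncWrites;
    }

    /** Rejects unsupported combinations when the ledger is created, not at the first no-sync addEntry. */
    void validate() {
        if (ackQuorumSize < 1 || ackQuorumSize > writeQuorumSize || writeQuorumSize > ensembleSize) {
            throw new IllegalArgumentException(
                "expected ensembleSize >= writeQuorumSize >= ackQuorumSize >= 1");
        }
        // Assumed first-implementation rule: no striping when no-sync writes are enabled.
        if (allowNoSyncWrites && ensembleSize > writeQuorumSize) {
            throw new IllegalArgumentException(
                "no-sync writes require ensembleSize == writeQuorumSize (no striping), got E="
                    + ensembleSize + " WQ=" + writeQuorumSize);
        }
    }
}
```

With this shape, an admin-supplied configuration such as E=5, WQ=3 is rejected as soon as the application asks for a no-sync ledger, instead of surfacing as an error on the first no-sync write after the system is already running.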

I am not sure that the changes to the bookie internals are a Client-API
matter; maybe we can leverage custom metadata (as JV said) in order to make
the bookie handle ledgers in a different manner. This option will always
remain open, as custom metadata are already available.

JV preferred the ledger-type approach; the dual solution is to introduce a
list of "capabilities" or "ledger options".
I think that the ability to perform no-sync writes is so important that
"custom metadata" is not the right place to declare it, and the same goes for
"ledger type".

So I am proposing to add a boolean 'allowNoSyncWrites' flag at ledger creation
time, without writing it into the ledger metadata on ZK.
I think that if further improvements need ledger metadata changes, we
will make them then.

I have updated the BP-14 document and added an "Open issues" footer
with the open points;
please add comments and I will correct the document as soon as possible.


Enrico




2017-08-30 1:24 GMT+02:00 Sijie Guo <gu...@gmail.com>:

> Thank you, Enrico, JV.
>
> These are great discussions.
>
> After reading these two proposals, I have a few very high-level comments,
> divided into three categories.
>
>
> *API*
>
> - I think there are no fundamental differences between these two
> proposals.
> They are trying to achieve similar goals by exposing durability levels in
> different ways.
> So this will be a discussion on what the API/interface should look like from
> the user / admin perspective.
> I would suggest focusing on what the API itself should be, putting the
> implementation design aside when talking about this.
>
> *Core*
>
> - Both proposals need to deal with a core function: what happens to LAC and
> what semantics BookKeeper provides.
> JV did a good summary in his proposal. However, I am not a fan of
> maintaining two different semantics, so I am looking for
> a solution in which BookKeeper maintains only one semantic. The semantic is
> basically:
>
> 1) LAC is only advanced when the entries before LAC are committed to
> persistent storage
> 2) All the entries up to LAC are successfully committed to persistent
> storage
> 3) All the entries up to LAC must be readable at all times.
>
> If we maintain such a semantic, there is no need to change the auto recovery
> protocol in BookKeeper. All we guarantee is that the entries are durably
> persisted.
>
> In order to maintain such a semantic, I think JV and I proposed similar
> solutions in our respective proposals. I am trying to finalize one here:
>
> * The bookie maintains an LAS (Last Add Synced) point for each entry.
> * LAS can be piggybacked on AddResponses.
> * The client uses the LAS to advance LAC.
>
> If we can agree on the core semantic we are going to provide, the other
> things are just logistics.
>
> *Others*
>
> - Regarding separating the journal or bypassing the journal, there is no
> difference from the point of view of the core semantic. They are all
> non-durable writes (acknowledged before fsyncing).
> We can start with the same-journal approach (but just acknowledge before
> fsyncing), implement the core and add other options later on.
>
>
> From my point of view, I'd be more interested in providing a single
> consistent durable semantic that applications can rely on for both durable
> writes and non-durable writes. The other stuff seems to be more about
> logistics.
>
>
> - Sijie
>
>
> On Mon, Aug 28, 2017 at 11:27 PM, Enrico Olivelli <eo...@gmail.com>
> wrote:
>
> > 2017-08-29 8:01 GMT+02:00 Venkateswara Rao Jujjuri <ju...@gmail.com>:
> >
> > > I don't believe I fully followed your second case. But even in this case,
> > > your major concern is about the additional 'sync' RPC?
> > >
> >
> > Yes, apart from that I am fine with your proposal too, that is, to have a
> > LedgerType which drives durability,
> > and I think we need to add per-entry durability options.
> >
> > I think that at least for the 'simple' no-sync addEntry we do not need to
> > change many things. I am drafting a prototype and I will share it as soon as
> > we all agree on the roadmap.
> >
> > The first implementation can cover the first case (no-sync addEntry) and
> > change the way the writer advances the LAC in order to support 'relaxed
> > durability writes'.
> > This change will be compatible with future improvements and it will open
> > the door for big changes on the bookie side, like bypassing the journal or
> > leveraging multiple journals.
> >
> > -- Enrico
> >
> > or something else that the LedgerType proposal won't work?
> > >
> >
> > >
> > >
> > > On Mon, Aug 28, 2017 at 7:35 AM, Enrico Olivelli <eo...@gmail.com>
> > > wrote:
> > >
> > > > I think that having a set of options in the ledger metadata would be a
> > > > good enhancement, and I am sure we will do it as soon as it is needed;
> > > > maybe we do not need it now.
> > > >
> > > > Actually I think we will need to declare this durability level at entry
> > > > level to support some use cases in the BP-14 document. Let me explain two
> > > > of my use cases for which I need it:
> > > >
> > > > At a higher level we have two choices:
> > > >
> > > > A) per-ledger durability options (JV's proposal):
> > > > all addEntry operations are durable or non-durable and there is an
> > > > explicit 'sync' API (+ forced sync at close)
> > > >
> > > > B) per-entry durability options (original BP-14 proposal):
> > > > every addEntry has its own durable/non-durable option (sync/no-sync), with
> > > > the ability to call 'sync' without addEntry (+ forced sync at close)
> > > >
> > > > I am speaking about the database WAL case: I am using the ledger as a
> > > > segment of the WAL of a database and I am writing all data changes in the
> > > > scope of a 'transaction' with the relaxed-durability flag, then I am
> > > > writing the 'transaction committed' entry with a "strict durability"
> > > > requirement. This will in fact require that all previous entries are
> > > > persisted durably, so that the transaction will never be lost.
> > > >
> > > > In this scenario we would need an addEntry + sync API in fact:
> > > >
> > > > using option A) the WAL will look like:
> > > > - open ledger no-sync = true
> > > > - addEntry (set foo=bar)  (this will be no-sync)
> > > > - addEntry (set foo=bar2) (this will be no-sync)
> > > > - addEntry (commit)
> > > > - sync
> > > >
> > > > using option B) the WAL will look like:
> > > > - open ledger
> > > > - addEntry (set foo=bar), no-sync
> > > > - addEntry (set foo=bar2), no-sync
> > > > - addEntry (commit), sync
> > > >
> > > > In case B) we are "saving" one RPC call to every bookie (the 'sync' one).
> > > > The same goes for single data change entries, like updating a single record
> > > > in the database, which with BK 4.5 "costs" only a single RPC to every bookie.
> > > >
> > > > Second case:
> > > > I am using BookKeeper to store binary objects, so I am packing multiple
> > > > 'objects' (named sequences of bytes) into a single ledger, like you do when
> > > > you write many records to a file in a streaming fashion and keep track of
> > > > the offsets of the beginning of every record (LedgerHandleAdv is perfect for
> > > > this case).
> > > > I am not using a single ledger per 'file' because creating many ledgers very
> > > > fast kills ZooKeeper. In my systems I have big bursts of writes,
> > > > which need to be really "fast", so I am writing multiple 'files' to every
> > > > single ledger. So close-to-open consistency at the ledger level is not
> > > > suitable for this case.
> > > > I have to write as fast as possible to this 'ledger-backed' stream, and, as
> > > > with a 'traditional' filesystem, I am writing parts of each file and then
> > > > requiring 'sync' at the end of each file.
> > > > Using BookKeeper you need to split big 'files' into "little" parts; you
> > > > cannot transmit the contents as a "real" stream over the network.
> > > >
> > > > I am not talking about bookie-level implementation details; I would like to
> > > > define the high-level API in order to support all the relevant known use
> > > > cases and keep room for the future.
> > > > At this moment adding a per-entry 'durability option' seems to be very
> > > > flexible and simple to implement, and it does not prevent us from making
> > > > further improvements, namely skipping the journal.
> > > >
> > > > Enrico
> > > >
> > > >
> > > >
> > > > 2017-08-26 19:55 GMT+02:00 Enrico Olivelli <eo...@gmail.com>:
> > > >
> > > > >
> > > > >
> > > > > On Sat, 26 Aug 2017, 19:19 Venkateswara Rao Jujjuri <jujjuri@gmail.com>
> > > > > wrote:
> > > > >
> > > > >> Hi all,
> > > > >>
> > > > >> As promised during Thursday call, here is my proposal.
> > > > >>
> > > > >> *NOTE*: The major difference in this proposal compared to Enrico’s
> > > > >> <https://docs.google.com/document/d/1JLYO3K3tZ5PJGmyS0YK_-NW8VOUUgUWVBmswCUOG158/edit#heading=h.q2rewiqndr5v>
> > > > >> is making the durability a property of the ledger (type) as opposed to
> > > > >> addEntry(). The rest of the technical details have a lot of similarities.
> > > > >>
> > > > >
> > > > > Thank you JV. I have just quickly read the doc and your view is certainly
> > > > > broader.
> > > > > I will dig into the doc as soon as possible on Monday.
> > > > > For me it is OK to have a ledger-wide configuration. I think that the most
> > > > > important decision is about the API we will provide, as in the future it
> > > > > will be difficult to change it.
> > > > >
> > > > >
> > > > > Cheers
> > > > > Enrico
> > > > >
> > > > >
> > > > >
> > > > >> https://docs.google.com/document/d/1g1eBcVVCZrTG8YZliZP0LVqvWpq432ODEghrGVQ4d4Q/edit?usp=sharing
> > > > >>
> > > > >> On Thu, Aug 24, 2017 at 1:14 AM, Enrico Olivelli <eolivelli@gmail.com>
> > > > >> wrote:
> > > > >>
> > > > >> > Thank you all for the comments and for taking a look at the document
> > > > >> > so soon.
> > > > >> > I have updated the doc; we will discuss the document at the meeting.
> > > > >> >
> > > > >> >
> > > > >> > Enrico
> > > > >> >
> > > > >> > 2017-08-24 2:27 GMT+02:00 Sijie Guo <gu...@gmail.com>:
> > > > >> >
> > > > >> > > Enrico,
> > > > >> > >
> > > > >> > > Thank you so much! It is a great effort putting this up. Overall it looks
> > > > >> > > good. I made some comments; we can discuss them at tomorrow's community
> > > > >> > > meeting.
> > > > >> > >
> > > > >> > > - Sijie
> > > > >> > >
> > > > >> > > On Wed, Aug 23, 2017 at 8:25 AM, Enrico Olivelli <eolivelli@gmail.com>
> > > > >> > > wrote:
> > > > >> > >
> > > > >> > > > Hi all,
> > > > >> > > > I have drafted a first proposal for BP-14 - Relax Durability
> > > > >> > > >
> > > > >> > > > We are talking about limiting the number of fsyncs to the journal while
> > > > >> > > > preserving the correctness of the LAC protocol.
> > > > >> > > >
> > > > >> > > > This is the link to the wiki page, but as the issue is huge we prefer to
> > > > >> > > > use Google Documents for sharing comments
> > > > >> > > > https://cwiki.apache.org/confluence/display/BOOKKEEPER/BP+-+14+Relax+durability
> > > > >> > > >
> > > > >> > > > This is the document
> > > > >> > > > https://docs.google.com/document/d/1JLYO3K3tZ5PJGmyS0YK_-NW8VOUUgUWVBmswCUOG158/edit?usp=sharing
> > > > >> > > >
> > > > >> > > > All comments are welcome
> > > > >> > > >
> > > > >> > > > I have added DL dev list in cc as the discussion is interesting for both
> > > > >> > > > groups
> > > > >> > > >
> > > > >> > > > Enrico Olivelli
> > > > >> > > >
> > > > >> > >
> > > > >> >
> > > > >>
> > > > >>
> > > > >>
> > > > >> --
> > > > >> Jvrao
> > > > >> ---
> > > > >> First they ignore you, then they laugh at you, then they fight you,
> > > > >> then you win. - Mahatma Gandhi
> > > > >>
> > > > > --
> > > > >
> > > > >
> > > > > -- Enrico Olivelli
> > > > >
> > > >
> > >
> > >
> > >
> > > --
> > > Jvrao
> > > ---
> > > First they ignore you, then they laugh at you, then they fight you,
> > > then you win. - Mahatma Gandhi
> > >
> >
>
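The RPC arithmetic in Enrico's quoted WAL example (option A versus option B) can be made explicit with a toy model. This is a hypothetical sketch, assuming each addEntry and each explicit 'sync' costs exactly one RPC per bookie and ignoring batching or pipelining; `WalRpcModel` is an illustrative name, not an API.

```java
/**
 * Toy model of the WAL example (not a BookKeeper API): RPCs sent to each
 * bookie for one transaction. Assumes one RPC per addEntry and one RPC per
 * explicit sync, with no batching.
 */
final class WalRpcModel {
    /** Option A: ledger-wide no-sync; data adds + commit add + explicit sync. */
    static int rpcsPerBookieOptionA(int dataEntries) {
        int adds = dataEntries + 1; // data changes plus the 'commit' entry
        int syncs = 1;              // explicit 'sync' after the commit entry
        return adds + syncs;
    }

    /** Option B: per-entry flag; the 'commit' add itself is a sync write. */
    static int rpcsPerBookieOptionB(int dataEntries) {
        return dataEntries + 1;     // no separate sync RPC is needed
    }
}
```

For the two-change transaction in the example, option A costs 4 RPCs per bookie and option B costs 3. A single-record update (modeled as `dataEntries = 0`) costs 2 versus 1, and the latter matches the single RPC per bookie the thread attributes to BK 4.5.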

Re: [DISCUSS] BP-14 Relax Durability

Posted by Enrico Olivelli <eo...@gmail.com>.
Thank you, Sijie, for summarizing, and thanks to the community for helping with
this important enhancement to BookKeeper.

I am convinced that, as JV pointed out, we need to declare at ledger creation
time that the ledger is going to perform no-sync writes.

I think we currently need an explicit declaration to make things "clear" to
the developer who is using the LedgerHandle API, right from ledger creation
time.

The point is that we are going to forbid "striping" ledgers (ensemble size >
write quorum size) for no-sync writes in the first implementation:
- one option is to fail at the first no-sync addEntry, but this would be
really inconvenient because the ack/write/ensemble sizes are usually
configured by the admin, and there would be configurations in which errors
come out only after starting the system.
- the second option is to make the developer explicitly enable no-sync
writes at creation time and fail the creation of the ledger if the
requested combination of options is not possible.

I am not sure that the changes to the bookie internals are a Client-API
matter; maybe we can leverage custom metadata (as JV said) in order to make
the bookie handle ledgers in a different manner. This option will always
remain open, as custom metadata are already available.

JV preferred the ledger-type approach; the dual solution is to introduce a
list of "capabilities" or "ledger options".
I think that the ability to perform no-sync writes is so important that
"custom metadata" is not the right place to declare it, and the same goes for
"ledger type".

So I am proposing to add a boolean 'allowNoSyncWrites' flag at ledger creation
time, without writing it into the ledger metadata on ZK.
I think that if further improvements need ledger metadata changes, we
will make them then.

I have updated the BP-14 document and added an "Open issues" footer
with the open points;
please add comments and I will correct the document as soon as possible.


Enrico




2017-08-30 1:24 GMT+02:00 Sijie Guo <gu...@gmail.com>:

> Thank you, Enrico, JV.
>
> These are great discussions.
>
> After reading these two proposals, I have a few very high-level comments,
> divided into three categories.
>
>
> *API*
>
> - I think there are no fundamental differences between these two
> proposals.
> They are trying to achieve similar goals by exposing durability levels in
> different ways.
> So this will be a discussion on what the API/interface should look like from
> the user / admin perspective.
> I would suggest focusing on what the API itself should be, putting the
> implementation design aside when talking about this.
>
> *Core*
>
> - Both proposals need to deal with a core function: what happens to LAC and
> what semantics BookKeeper provides.
> JV did a good summary in his proposal. However, I am not a fan of
> maintaining two different semantics, so I am looking for
> a solution in which BookKeeper maintains only one semantic. The semantic is
> basically:
>
> 1) LAC is only advanced when the entries before LAC are committed to
> persistent storage
> 2) All the entries up to LAC are successfully committed to persistent
> storage
> 3) All the entries up to LAC must be readable at all times.
>
> If we maintain such a semantic, there is no need to change the auto recovery
> protocol in BookKeeper. All we guarantee is that the entries are durably
> persisted.
>
> In order to maintain such a semantic, I think JV and I proposed similar
> solutions in our respective proposals. I am trying to finalize one here:
>
> * The bookie maintains an LAS (Last Add Synced) point for each entry.
> * LAS can be piggybacked on AddResponses.
> * The client uses the LAS to advance LAC.
>
> If we can agree on the core semantic we are going to provide, the other
> things are just logistics.
>
> *Others*
>
> - Regarding separating the journal or bypassing the journal, there is no
> difference from the point of view of the core semantic. They are all
> non-durable writes (acknowledged before fsyncing).
> We can start with the same-journal approach (but just acknowledge before
> fsyncing), implement the core and add other options later on.
>
>
> From my point of view, I'd be more interested in providing a single
> consistent durable semantic that applications can rely on for both durable
> writes and non-durable writes. The other stuff seems to be more about
> logistics.
>
>
> - Sijie
>
>
> On Mon, Aug 28, 2017 at 11:27 PM, Enrico Olivelli <eo...@gmail.com>
> wrote:
>
> > 2017-08-29 8:01 GMT+02:00 Venkateswara Rao Jujjuri <ju...@gmail.com>:
> >
> > > I don't believe I fully followed your second case. But even in this case,
> > > your major concern is about the additional 'sync' RPC?
> > >
> >
> > Yes, apart from that I am fine with your proposal too, that is, to have a
> > LedgerType which drives durability,
> > and I think we need to add per-entry durability options.
> >
> > I think that at least for the 'simple' no-sync addEntry we do not need to
> > change many things. I am drafting a prototype and I will share it as soon as
> > we all agree on the roadmap.
> >
> > The first implementation can cover the first case (no-sync addEntry) and
> > change the way the writer advances the LAC in order to support 'relaxed
> > durability writes'.
> > This change will be compatible with future improvements and it will open
> > the door for big changes on the bookie side, like bypassing the journal or
> > leveraging multiple journals.
> >
> > -- Enrico
> >
> > or something else that the LedgerType proposal won't work?
> > >
> >
> > >
> > >
> > > On Mon, Aug 28, 2017 at 7:35 AM, Enrico Olivelli <eo...@gmail.com>
> > > wrote:
> > >
> > > > I think that having a set of options in the ledger metadata would be a
> > > > good enhancement, and I am sure we will do it as soon as it is needed;
> > > > maybe we do not need it now.
> > > >
> > > > Actually I think we will need to declare this durability level at entry
> > > > level to support some use cases in the BP-14 document. Let me explain two
> > > > of my use cases for which I need it:
> > > >
> > > > At a higher level we have two choices:
> > > >
> > > > A) per-ledger durability options (JV's proposal):
> > > > all addEntry operations are durable or non-durable and there is an
> > > > explicit 'sync' API (+ forced sync at close)
> > > >
> > > > B) per-entry durability options (original BP-14 proposal):
> > > > every addEntry has its own durable/non-durable option (sync/no-sync), with
> > > > the ability to call 'sync' without addEntry (+ forced sync at close)
> > > >
> > > > I am speaking about the database WAL case: I am using the ledger as a
> > > > segment of the WAL of a database and I am writing all data changes in the
> > > > scope of a 'transaction' with the relaxed-durability flag, then I am
> > > > writing the 'transaction committed' entry with a "strict durability"
> > > > requirement. This will in fact require that all previous entries are
> > > > persisted durably, so that the transaction will never be lost.
> > > >
> > > > In this scenario we would need an addEntry + sync API in fact:
> > > >
> > > > using option A) the WAL will look like:
> > > > - open ledger no-sync = true
> > > > - addEntry (set foo=bar)  (this will be no-sync)
> > > > - addEntry (set foo=bar2) (this will be no-sync)
> > > > - addEntry (commit)
> > > > - sync
> > > >
> > > > using option B) the WAL will look like:
> > > > - open ledger
> > > > - addEntry (set foo=bar), no-sync
> > > > - addEntry (set foo=bar2), no-sync
> > > > - addEntry (commit), sync
> > > >
> > > > In case B) we are "saving" one RPC call to every bookie (the 'sync' one).
> > > > The same goes for single data change entries, like updating a single record
> > > > in the database, which with BK 4.5 "costs" only a single RPC to every bookie.
> > > >
> > > > Second case:
> > > > I am using BookKeeper to store binary objects, so I am packing multiple
> > > > 'objects' (named sequences of bytes) into a single ledger, like you do when
> > > > you write many records to a file in a streaming fashion and keep track of
> > > > the offsets of the beginning of every record (LedgerHandleAdv is perfect for
> > > > this case).
> > > > I am not using a single ledger per 'file' because creating many ledgers very
> > > > fast kills ZooKeeper. In my systems I have big bursts of writes,
> > > > which need to be really "fast", so I am writing multiple 'files' to every
> > > > single ledger. So close-to-open consistency at the ledger level is not
> > > > suitable for this case.
> > > > I have to write as fast as possible to this 'ledger-backed' stream, and, as
> > > > with a 'traditional' filesystem, I am writing parts of each file and then
> > > > requiring 'sync' at the end of each file.
> > > > Using BookKeeper you need to split big 'files' into "little" parts; you
> > > > cannot transmit the contents as a "real" stream over the network.
> > > >
> > > > I am not talking about bookie level implementation details I would
> like
> > > to
> > > > define the high level API in order to support all the relevant known
> > use
> > > > cases and keep space for the future,
> > > > at this moment adding a per-entry 'durability option' seems to be
> very
> > > > flexible and simple to implement, it does not prevent us from doing
> > > further
> > > > improvements, like namely skipping the journal.
> > > >
> > > > Enrico
> > > >
> > > >
> > > >
> > > > 2017-08-26 19:55 GMT+02:00 Enrico Olivelli <eo...@gmail.com>:
> > > >
> > > > >
> > > > >
> > > > > On Sat, Aug 26, 2017, 19:19 Venkateswara Rao Jujjuri <jujjuri@gmail.com>
> > > > > wrote:
> > > > >
> > > > >> Hi all,
> > > > >>
> > > > >> As promised during the Thursday call, here is my proposal.
> > > > >>
> > > > >> *NOTE*: The major difference between this proposal and Enrico's
> > > > >> <https://docs.google.com/document/d/1JLYO3K3tZ5PJGmyS0YK_-NW8VOUUgUWVBmswCUOG158/edit#heading=h.q2rewiqndr5v>
> > > > >> is making durability a property of the ledger (type) as opposed to
> > > > >> addEntry(). The rest of the technical details have a lot of
> > > > >> similarities.
> > > > >>
> > > > >
> > > > > Thank you JV. I have just read the doc quickly, and your view is
> > > > > certainly broader.
> > > > > I will dig into the doc as soon as possible on Monday.
> > > > > For me it is OK to have a ledger-wide configuration. I think that the
> > > > > most important decision is about the API we will provide, as in the
> > > > > future it will be difficult to change it.
> > > > >
> > > > >
> > > > > Cheers
> > > > > Enrico
> > > > >
> > > > >
> > > > >
> > > > >> https://docs.google.com/document/d/1g1eBcVVCZrTG8YZliZP0LVqvWpq432ODEghrGVQ4d4Q/edit?usp=sharing
> > > > >>
> > > > >> On Thu, Aug 24, 2017 at 1:14 AM, Enrico Olivelli <eolivelli@gmail.com>
> > > > >> wrote:
> > > > >>
> > > > >> > Thank you all for the comments and for taking a look at the
> > > > >> > document so soon.
> > > > >> > I have updated the doc; we will discuss it at the meeting.
> > > > >> >
> > > > >> >
> > > > >> > Enrico
> > > > >> >
> > > > >> > 2017-08-24 2:27 GMT+02:00 Sijie Guo <gu...@gmail.com>:
> > > > >> >
> > > > >> > > Enrico,
> > > > >> > >
> > > > >> > > Thank you so much! It is a great effort, putting this up.
> > > > >> > > Overall it looks good. I made some comments; we can discuss
> > > > >> > > them at tomorrow's community meeting.
> > > > >> > >
> > > > >> > > - Sijie
> > > > >> > >
> > > > >> > > On Wed, Aug 23, 2017 at 8:25 AM, Enrico Olivelli <eolivelli@gmail.com>
> > > > >> > > wrote:
> > > > >> > >
> > > > >> > > > Hi all,
> > > > >> > > > I have drafted a first proposal for BP-14 - Relax Durability
> > > > >> > > >
> > > > >> > > > We are talking about limiting the number of fsyncs to the
> > > > >> > > > journal while preserving the correctness of the LAC protocol.
> > > > >> > > >
> > > > >> > > > This is the link to the wiki page, but as the issue is huge we
> > > > >> > > > prefer to use Google Documents for sharing comments:
> > > > >> > > > https://cwiki.apache.org/confluence/display/BOOKKEEPER/BP+-+14+Relax+durability
> > > > >> > > >
> > > > >> > > > This is the document:
> > > > >> > > > https://docs.google.com/document/d/1JLYO3K3tZ5PJGmyS0YK_-NW8VOUUgUWVBmswCUOG158/edit?usp=sharing
> > > > >> > > >
> > > > >> > > > All comments are welcome.
> > > > >> > > >
> > > > >> > > > I have added the DL dev list in cc as the discussion is
> > > > >> > > > interesting for both groups.
> > > > >> > > >
> > > > >> > > > Enrico Olivelli
> > > > >> > > >
> > > > >> > >
> > > > >> >
> > > > >>
> > > > >>
> > > > >>
> > > > >> --
> > > > >> Jvrao
> > > > >> ---
> > > > >> First they ignore you, then they laugh at you, then they fight you,
> > > > >> then you win. - Mahatma Gandhi
> > > > >>
> > > > > --
> > > > >
> > > > >
> > > > > -- Enrico Olivelli
> > > > >
> > > >
> > >
> > >
> > >
> > > --
> > > Jvrao
> > > ---
> > > First they ignore you, then they laugh at you, then they fight you,
> then
> > > you win. - Mahatma Gandhi
> > >
> >
>

Re: [DISCUSS] BP-14 Relax Durability

Posted by Sijie Guo <gu...@gmail.com>.
Thank you, Enrico, JV.

These are great discussions.

After reading these two proposals, I have a few very high-level comments,
divided into three categories.


*API*

- I think there are no fundamental differences between these two proposals.
They are trying to achieve similar goals by exposing durability levels in
different ways.
So this will be a discussion about what the API/interface should look like
from the user / admin perspective.
I would suggest focusing on the API itself, putting the implementation
design aside when talking about this.

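To make the API comparison concrete, here is a minimal sketch of the two shapes from Enrico's WAL example (option A: durability set per ledger plus an explicit sync; option B: a durability flag on each addEntry). All names here (`PerLedgerWriter`, `PerEntryWriter`, `RpcRecorder`) are hypothetical and are not part of the BookKeeper client API; the fake writer only records the RPCs a single bookie would receive for one transaction.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: neither interface exists in the BookKeeper client.
final class DurabilityApiSketch {

    /** Option A: durability is a property of the ledger; an explicit sync() is needed. */
    interface PerLedgerWriter {
        void addEntry(byte[] data); // no-sync if the ledger was opened with no-sync = true
        void sync();                // a separate RPC to every bookie
    }

    /** Option B: every addEntry carries its own durability flag. */
    interface PerEntryWriter {
        void addEntry(byte[] data, boolean durable); // durable = true also covers earlier entries
        void sync();                                 // still available without a new entry
    }

    /** Fake writer that only records the RPCs one bookie would receive. */
    static final class RpcRecorder implements PerLedgerWriter, PerEntryWriter {
        final List<String> rpcs = new ArrayList<>();

        public void addEntry(byte[] data) { rpcs.add("add"); }

        public void addEntry(byte[] data, boolean durable) { rpcs.add(durable ? "add+sync" : "add"); }

        public void sync() { rpcs.add("sync"); }
    }

    /** WAL transaction written with option A (ledger opened no-sync). */
    static List<String> walWithOptionA() {
        RpcRecorder w = new RpcRecorder();
        w.addEntry("set foo=bar".getBytes());
        w.addEntry("set foo=bar2".getBytes());
        w.addEntry("commit".getBytes());
        w.sync();
        return w.rpcs;
    }

    /** The same WAL transaction written with option B. */
    static List<String> walWithOptionB() {
        RpcRecorder w = new RpcRecorder();
        w.addEntry("set foo=bar".getBytes(), false);
        w.addEntry("set foo=bar2".getBytes(), false);
        w.addEntry("commit".getBytes(), true);
        return w.rpcs;
    }

    public static void main(String[] args) {
        System.out.println("A: " + walWithOptionA()); // A: [add, add, add, sync]
        System.out.println("B: " + walWithOptionB()); // B: [add, add, add+sync]
    }
}
```

Under these assumptions option A sends one extra 'sync' RPC per bookie per transaction, which is the RPC Enrico refers to; whether that cost matters enough to shape the public API is exactly the trade-off being discussed.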
*Core*

- Both proposals need to deal with a core question: what happens to the LAC
and what semantic BookKeeper provides.
JV did a good summary in his proposal. However, I am not a fan of
maintaining two different semantics, so I am looking for a solution in which
BookKeeper maintains only one semantic. The semantic is basically:

1) LAC is only advanced when the entries before LAC are committed to
persistent storage
2) All the entries up to LAC are successfully committed to persistent
storage
3) Entries up to LAC: all of these entries must be readable at all times.

If we maintain such a semantic, there is no need to change the auto recovery
protocol in BookKeeper. All we guarantee is that these entries are durably
persisted.

In order to maintain such a semantic, I think JV and I proposed similar
solutions in our respective proposals. I am trying to finalize one here:

* the bookie maintains a LAS (Last Add Synced) point for each ledger.
* LAS can be piggybacked on AddResponses.
* the client uses the LAS to advance the LAC.

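The three bullets above can be illustrated with a minimal client-side sketch. It assumes, for simplicity, that the write quorum equals the ensemble (every bookie stores every entry) and that each bookie syncs its journal in entry order, so a bookie's LAS implies that all earlier entries are synced too. The class and method names (`LasTracker`, `onAddResponse`) are hypothetical, not part of the BookKeeper client. The LAC is advanced to the highest entry id that at least ack-quorum bookies report as synced, and it never moves backwards.

```java
import java.util.Arrays;

// Hypothetical client-side sketch; assumes writeQuorum == ensemble and
// in-order syncing on each bookie.
final class LasTracker {
    private final long[] lastAddSynced; // one slot per bookie, -1 = nothing synced yet
    private final int ackQuorumSize;
    private long lastAddConfirmed = -1L;

    LasTracker(int ensembleSize, int ackQuorumSize) {
        if (ackQuorumSize < 1 || ackQuorumSize > ensembleSize) {
            throw new IllegalArgumentException("invalid ack quorum size");
        }
        this.lastAddSynced = new long[ensembleSize];
        Arrays.fill(this.lastAddSynced, -1L);
        this.ackQuorumSize = ackQuorumSize;
    }

    /** Record the LAS piggybacked on an AddResponse from the given bookie; returns the LAC. */
    public long onAddResponse(int bookieIndex, long las) {
        // LAS is monotonic per bookie, so ignore stale (reordered) responses.
        lastAddSynced[bookieIndex] = Math.max(lastAddSynced[bookieIndex], las);
        long[] sorted = lastAddSynced.clone();
        Arrays.sort(sorted); // ascending order
        // The ackQuorumSize-th largest LAS is synced on at least ackQuorumSize bookies.
        long candidate = sorted[sorted.length - ackQuorumSize];
        // The LAC never moves backwards.
        lastAddConfirmed = Math.max(lastAddConfirmed, candidate);
        return lastAddConfirmed;
    }

    public long lastAddConfirmed() {
        return lastAddConfirmed;
    }

    public static void main(String[] args) {
        LasTracker tracker = new LasTracker(3, 2); // ensemble 3, ack quorum 2
        tracker.onAddResponse(0, 4); // bookie 0 has synced up to entry 4
        tracker.onAddResponse(1, 2); // bookie 1 has synced up to entry 2
        System.out.println(tracker.lastAddConfirmed()); // 2: entries 0..2 are on two bookies
        tracker.onAddResponse(2, 5);
        System.out.println(tracker.lastAddConfirmed()); // 4
    }
}
```

With striping (write quorum smaller than the ensemble), the client would have to check each entry's LAS against that entry's own write set, which this sketch deliberately leaves out.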
If we can agree on the core semantic we are going to provide, the other
things are just logistics.

*Others*

- Regarding separating the journal or bypassing the journal, there is no
difference when we are talking about the core semantic. They are all
non-durable writes (acknowledged before fsyncing).
We can start with the same-journal approach (just acknowledging before
fsyncing), implement the core, and add other options later on.

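A minimal bookie-side sketch of the "same journal, acknowledge before fsync" starting point, assuming a single ledger and a simulated fsync. The class `RelaxedJournalSketch` is hypothetical and does not reflect how the bookie Journal is actually implemented: non-durable adds are acknowledged immediately and return the current LAS, while a durable add forces an fsync first, so the LAS it returns covers every earlier entry.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: one ledger, simulated fsync, no real I/O.
final class RelaxedJournalSketch {
    private final List<Long> unsynced = new ArrayList<>(); // appended but not yet fsynced
    private long lastAddSynced = -1L;

    /** Append an entry and return the LAS to piggyback on the AddResponse. */
    public long addEntry(long entryId, boolean durable) {
        unsynced.add(entryId);
        if (durable) {
            fsync(); // strict durability: sync before acknowledging
        }
        return lastAddSynced; // a non-durable add is acknowledged before any fsync
    }

    /** Simulated fsync: everything appended so far becomes durable. */
    public void fsync() {
        for (long id : unsynced) {
            lastAddSynced = Math.max(lastAddSynced, id);
        }
        unsynced.clear();
    }

    public long lastAddSynced() {
        return lastAddSynced;
    }

    public static void main(String[] args) {
        RelaxedJournalSketch journal = new RelaxedJournalSketch();
        System.out.println(journal.addEntry(0, false)); // -1: acknowledged, not yet synced
        System.out.println(journal.addEntry(1, false)); // -1
        System.out.println(journal.addEntry(2, true));  // 2: entries 0..2 are now durable
    }
}
```

In a real bookie the journal is shared by many ledgers and fsyncs are group-committed, so the LAS would be tracked per ledger; the sketch only shows the ordering guarantee that the LAC semantic above relies on.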

From my point of view, I'd be more interested in providing a single
consistent durability semantic that applications can rely on for both
durable writes and non-durable writes. The other things seem to be more
logistics.


- Sijie


On Mon, Aug 28, 2017 at 11:27 PM, Enrico Olivelli <eo...@gmail.com>
wrote:

> 2017-08-29 8:01 GMT+02:00 Venkateswara Rao Jujjuri <ju...@gmail.com>:
>
> > I don't believe I fully followed your second case. But even in this case,
> > is your major concern the additional 'sync' RPC? Or is there something
> > else for which the LedgerType proposal won't work?
> >
>
> Yes. Apart from that, I am fine with your proposal too, that is, to have a
> LedgerType which drives durability, and I think we need to add per-entry
> durability options.
>
> I think that, at least for the 'simple' no-sync addEntry, we do not need to
> change many things. I am drafting a prototype and will share it as soon as
> we all agree on the roadmap.
>
> The first implementation can cover the first case (no-sync addEntry) and
> change the way the writer advances the LAC in order to support 'relaxed
> durability' writes.
> This change will be compatible with future improvements, and it will open
> the door to big changes on the bookie side, like bypassing the journal or
> leveraging multiple journals.
>
> -- Enrico
>
> >
> >
> > On Mon, Aug 28, 2017 at 7:35 AM, Enrico Olivelli <eo...@gmail.com>
> > wrote:
> >
> > > I think that having a set of options in the ledger metadata will be a good
> > > enhancement, and I am sure we will do it as soon as it is needed; maybe we
> > > do not need it now.
> > >
> > > Actually, I think we will need to declare this durability level at entry
> > > level to support some use cases in the BP-14 document. Let me explain two of
> > > my use cases for which I need it.
> > >
> > > At a higher level we have two choices:
> > >
> > > A) per-ledger durability options (JV's proposal)
> > > all addEntry operations are durable or non-durable, and there is an
> > > explicit 'sync' API (+ forced sync at close)
> > >
> > > B) per-entry durability options (original BP-14 proposal)
> > > every addEntry has its own durable/non-durable option (sync/no-sync),
> > > with the ability to call 'sync' without addEntry (+ forced sync at close)
> > >
> > > I am speaking about the database WAL case: I am using the ledger as a
> > > segment of the WAL of a database, and I am writing all data changes in
> > > the scope of a 'transaction' with the relaxed-durability flag; then I am
> > > writing the 'transaction committed' entry with the "strict durability"
> > > requirement. This in fact requires that all previous entries are
> > > persisted durably, so that the transaction will never be lost.
> > >
> > > In this scenario we would in fact need an addEntry + sync API:
> > >
> > > using option A) the WAL will look like:
> > > - open ledger with no-sync = true
> > > - addEntry (set foo=bar)  (this will be no-sync)
> > > - addEntry (set foo=bar2) (this will be no-sync)
> > > - addEntry (commit)
> > > - sync
> > >
> > > using option B) the WAL will look like:
> > > - open ledger
> > > - addEntry (set foo=bar), no-sync
> > > - addEntry (set foo=bar2), no-sync
> > > - addEntry (commit), sync
> > >
> > > In case B) we "save" one RPC call to every bookie (the 'sync' one).
> > > The same holds for single data-change entries, like updating a single
> > > record in the database, which with BK 4.5 "costs" only a single RPC to
> > > every bookie.
> > >
> > > Second case:
> > > I am using BookKeeper to store binary objects, so I am packing several
> > > 'objects' (named sequences of bytes) into a single ledger, like when you
> > > write many records to a file in a streaming fashion and keep track of the
> > > offset of the beginning of every record (LedgerHandleAdv is perfect for
> > > this case).
> > > I am not using a single ledger per 'file' because creating many ledgers
> > > very fast kills ZooKeeper. In my systems I have big bursts of writes,
> > > which need to be really "fast", so I am writing multiple 'files' to every
> > > single ledger. So the close-to-open consistency at ledger level is not
> > > suitable for this case.
> > > I have to write as fast as possible to this 'ledger-backed' stream, and,
> > > as with a 'traditional' filesystem, I am writing parts of each file and
> > > then requiring 'sync' at the end of each file.
> > > Using BookKeeper you need to split big 'files' into "little" parts; you
> > > cannot transmit the contents as a "real" stream over the network.
> > >
> > > I am not talking about bookie-level implementation details. I would like
> > > to define the high-level API in order to support all the relevant known
> > > use cases and keep room for the future.
> > > At this moment, adding a per-entry 'durability option' seems to be very
> > > flexible and simple to implement, and it does not prevent us from making
> > > further improvements, such as skipping the journal.
> > >
> > > Enrico
> > >
> > >
> > >
> > > 2017-08-26 19:55 GMT+02:00 Enrico Olivelli <eo...@gmail.com>:
> > >
> > > >
> > > >
> > > > On Sat, Aug 26, 2017, 19:19 Venkateswara Rao Jujjuri <jujjuri@gmail.com>
> > > > wrote:
> > > >
> > > >> Hi all,
> > > >>
> > > >> As promised during the Thursday call, here is my proposal.
> > > >>
> > > >> *NOTE*: The major difference between this proposal and Enrico's
> > > >> <https://docs.google.com/document/d/1JLYO3K3tZ5PJGmyS0YK_-NW8VOUUgUWVBmswCUOG158/edit#heading=h.q2rewiqndr5v>
> > > >> is making durability a property of the ledger (type) as opposed to
> > > >> addEntry(). The rest of the technical details have a lot of
> > > >> similarities.
> > > >>
> > > >
> > > > Thank you JV. I have just read the doc quickly, and your view is
> > > > certainly broader.
> > > > I will dig into the doc as soon as possible on Monday.
> > > > For me it is OK to have a ledger-wide configuration. I think that the
> > > > most important decision is about the API we will provide, as in the
> > > > future it will be difficult to change it.
> > > >
> > > >
> > > > Cheers
> > > > Enrico
> > > >
> > > >
> > > >
> > > >> https://docs.google.com/document/d/1g1eBcVVCZrTG8YZliZP0LVqvWpq432ODEghrGVQ4d4Q/edit?usp=sharing
> > > >>
> > > >> On Thu, Aug 24, 2017 at 1:14 AM, Enrico Olivelli <eolivelli@gmail.com>
> > > >> wrote:
> > > >>
> > > >> > Thank you all for the comments and for taking a look at the
> > > >> > document so soon.
> > > >> > I have updated the doc; we will discuss it at the meeting.
> > > >> >
> > > >> >
> > > >> > Enrico
> > > >> >
> > > >> > 2017-08-24 2:27 GMT+02:00 Sijie Guo <gu...@gmail.com>:
> > > >> >
> > > >> > > Enrico,
> > > >> > >
> > > >> > > Thank you so much! It is a great effort, putting this up.
> > > >> > > Overall it looks good. I made some comments; we can discuss
> > > >> > > them at tomorrow's community meeting.
> > > >> > >
> > > >> > > - Sijie
> > > >> > >
> > > >> > > On Wed, Aug 23, 2017 at 8:25 AM, Enrico Olivelli <eolivelli@gmail.com>
> > > >> > > wrote:
> > > >> > >
> > > >> > > > Hi all,
> > > >> > > > I have drafted a first proposal for BP-14 - Relax Durability
> > > >> > > >
> > > >> > > > We are talking about limiting the number of fsyncs to the
> > > >> > > > journal while preserving the correctness of the LAC protocol.
> > > >> > > >
> > > >> > > > This is the link to the wiki page, but as the issue is huge we
> > > >> > > > prefer to use Google Documents for sharing comments:
> > > >> > > > https://cwiki.apache.org/confluence/display/BOOKKEEPER/BP+-+14+Relax+durability
> > > >> > > >
> > > >> > > > This is the document:
> > > >> > > > https://docs.google.com/document/d/1JLYO3K3tZ5PJGmyS0YK_-NW8VOUUgUWVBmswCUOG158/edit?usp=sharing
> > > >> > > >
> > > >> > > > All comments are welcome.
> > > >> > > >
> > > >> > > > I have added the DL dev list in cc as the discussion is
> > > >> > > > interesting for both groups.
> > > >> > > >
> > > >> > > > Enrico Olivelli
> > > >> > > >
> > > >> > >
> > > >> >
> > > >>
> > > >>
> > > >>
> > > >> --
> > > >> Jvrao
> > > >> ---
> > > >> First they ignore you, then they laugh at you, then they fight you,
> > > >> then you win. - Mahatma Gandhi
> > > >>
> > > > --
> > > >
> > > >
> > > > -- Enrico Olivelli
> > > >
> > >
> >
> >
> >
> > --
> > Jvrao
> > ---
> > First they ignore you, then they laugh at you, then they fight you, then
> > you win. - Mahatma Gandhi
> >
>

Re: [DISCUSS] BP-14 Relax Durability

Posted by Enrico Olivelli <eo...@gmail.com>.
2017-08-29 8:01 GMT+02:00 Venkateswara Rao Jujjuri <ju...@gmail.com>:

> I don't believe I fully followed your second case. But even in this case,
> is your major concern the additional 'sync' RPC? Or is there something
> else for which the LedgerType proposal won't work?
>

Yes. Apart from that, I am fine with your proposal too, that is, to have a
LedgerType which drives durability, and I think we need to add per-entry
durability options.

I think that, at least for the 'simple' no-sync addEntry, we do not need to
change many things. I am drafting a prototype and will share it as soon as
we all agree on the roadmap.

The first implementation can cover the first case (no-sync addEntry) and
change the way the writer advances the LAC in order to support 'relaxed
durability' writes.
This change will be compatible with future improvements, and it will open
the door to big changes on the bookie side, like bypassing the journal or
leveraging multiple journals.

-- Enrico

>
>
> On Mon, Aug 28, 2017 at 7:35 AM, Enrico Olivelli <eo...@gmail.com>
> wrote:
>
> > I think that having a set of options in the ledger metadata will be a good
> > enhancement, and I am sure we will do it as soon as it is needed; maybe we
> > do not need it now.
> >
> > Actually, I think we will need to declare this durability level at entry
> > level to support some use cases in the BP-14 document. Let me explain two of
> > my use cases for which I need it.
> >
> > At a higher level we have two choices:
> >
> > A) per-ledger durability options (JV's proposal)
> > all addEntry operations are durable or non-durable, and there is an
> > explicit 'sync' API (+ forced sync at close)
> >
> > B) per-entry durability options (original BP-14 proposal)
> > every addEntry has its own durable/non-durable option (sync/no-sync),
> > with the ability to call 'sync' without addEntry (+ forced sync at close)
> >
> > I am speaking about the database WAL case: I am using the ledger as a
> > segment of the WAL of a database, and I am writing all data changes in
> > the scope of a 'transaction' with the relaxed-durability flag; then I am
> > writing the 'transaction committed' entry with the "strict durability"
> > requirement. This in fact requires that all previous entries are
> > persisted durably, so that the transaction will never be lost.
> >
> > In this scenario we would in fact need an addEntry + sync API:
> >
> > using option A) the WAL will look like:
> > - open ledger with no-sync = true
> > - addEntry (set foo=bar)  (this will be no-sync)
> > - addEntry (set foo=bar2) (this will be no-sync)
> > - addEntry (commit)
> > - sync
> >
> > using option B) the WAL will look like:
> > - open ledger
> > - addEntry (set foo=bar), no-sync
> > - addEntry (set foo=bar2), no-sync
> > - addEntry (commit), sync
> >
> > In case B) we "save" one RPC call to every bookie (the 'sync' one).
> > The same holds for single data-change entries, like updating a single
> > record in the database, which with BK 4.5 "costs" only a single RPC to
> > every bookie.
> >
> > Second case:
> > I am using BookKeeper to store binary objects, so I am packing several
> > 'objects' (named sequences of bytes) into a single ledger, like when you
> > write many records to a file in a streaming fashion and keep track of the
> > offset of the beginning of every record (LedgerHandleAdv is perfect for
> > this case).
> > I am not using a single ledger per 'file' because creating many ledgers
> > very fast kills ZooKeeper. In my systems I have big bursts of writes,
> > which need to be really "fast", so I am writing multiple 'files' to every
> > single ledger. So the close-to-open consistency at ledger level is not
> > suitable for this case.
> > I have to write as fast as possible to this 'ledger-backed' stream, and,
> > as with a 'traditional' filesystem, I am writing parts of each file and
> > then requiring 'sync' at the end of each file.
> > Using BookKeeper you need to split big 'files' into "little" parts; you
> > cannot transmit the contents as a "real" stream over the network.
> >
> > I am not talking about bookie-level implementation details. I would like
> > to define the high-level API in order to support all the relevant known
> > use cases and keep room for the future.
> > At this moment, adding a per-entry 'durability option' seems to be very
> > flexible and simple to implement, and it does not prevent us from making
> > further improvements, such as skipping the journal.
> >
> > Enrico
> >
> >
> >
> > 2017-08-26 19:55 GMT+02:00 Enrico Olivelli <eo...@gmail.com>:
> >
> > >
> > >
> > > On sab 26 ago 2017, 19:19 Venkateswara Rao Jujjuri <ju...@gmail.com>
> > > wrote:
> > >
> > >> Hi all,
> > >>
> > >> As promised during Thursday call, here is my proposal.
> > >>
> > >> *NOTE*: Major difference in this proposal compared to Enrico’s
> > >> <https://docs.google.com/document/d/1JLYO3K3tZ5PJGmyS0YK_-
> > >> NW8VOUUgUWVBmswCUOG158/edit#heading=h.q2rewiqndr5v>
> > >> is
> > >> making the durability a property of the ledger(type) as opposed to
> > >> addEntry(). Rest of the technical details have a lot of similarities.
> > >>
> > >
> > > Thank you JV. I have just read quickly the doc and your view is
> centantly
> > > broader.
> > > I will dig into the doc as soon as possible on Monday.
> > > For me it is ok to have a ledger wide configuration I think that the
> most
> > > important decision is about the API we will provide as in the future it
> > > will be difficult to change it.
> > >
> > >
> > > Cheers
> > > Enrico
> > >
> > >
> > >
> > >> https://docs.google.com/document/d/1g1eBcVVCZrTG8YZliZP0LVqvWpq432ODEghrGVQ4d4Q/edit?usp=sharing
> > >>
> > >> On Thu, Aug 24, 2017 at 1:14 AM, Enrico Olivelli <eolivelli@gmail.com>
> > >> wrote:
> > >>
> > >> > Thank you all for the comments and for taking a look to the document
> > >> > so soon.
> > >> > I have updated the doc, we will discuss the document at the meeting,
> > >> >
> > >> >
> > >> > Enrico
> > >> >
> > >> > 2017-08-24 2:27 GMT+02:00 Sijie Guo <gu...@gmail.com>:
> > >> >
> > >> > > Enrico,
> > >> > >
> > >> > > Thank you so much! It is a great effort for putting this up.
> > >> > > Overall looks good. I made some comments, we can discuss at
> > >> > > tomorrow's community meeting.
> > >> > >
> > >> > > - Sijie
> > >> > >
> > >> > > On Wed, Aug 23, 2017 at 8:25 AM, Enrico Olivelli <eolivelli@gmail.com>
> > >> > > wrote:
> > >> > >
> > >> > > > Hi all,
> > >> > > > I have drafted a first proposal for BP-14 - Relax Durability
> > >> > > >
> > >> > > > We are talking about limiting the number of fsync to the journal
> > >> > > > while preserving the correctness of the LAC protocol.
> > >> > > >
> > >> > > > This is the link to the wiki page, but as the issue is huge we
> > >> > > > prefer to use Google Documents for sharing comments
> > >> > > > https://cwiki.apache.org/confluence/display/BOOKKEEPER/BP+-+14+Relax+durability
> > >> > > >
> > >> > > > This is the document
> > >> > > > https://docs.google.com/document/d/1JLYO3K3tZ5PJGmyS0YK_-NW8VOUUgUWVBmswCUOG158/edit?usp=sharing
> > >> > > >
> > >> > > > All comments are welcome
> > >> > > >
> > >> > > > I have added DL dev list in cc as the discussion is interesting
> > >> > > > for both groups
> > >> > > >
> > >> > > > Enrico Olivelli
> > >> > > >
> > >> > >
> > >> >
> > >>
> > >>
> > >>
> > >> --
> > >> Jvrao
> > >> ---
> > >> First they ignore you, then they laugh at you, then they fight you,
> > >> then you win. - Mahatma Gandhi
> > >>
> > > --
> > >
> > >
> > > -- Enrico Olivelli
> > >
> >
>
>
>
> --
> Jvrao
> ---
> First they ignore you, then they laugh at you, then they fight you, then
> you win. - Mahatma Gandhi
>

Re: [DISCUSS] BP-14 Relax Durability

Posted by Enrico Olivelli <eo...@gmail.com>.
2017-08-29 8:01 GMT+02:00 Venkateswara Rao Jujjuri <ju...@gmail.com>:

> I don't believe I fully followed your second case. But even in this case,
> your major concern is about the additional 'sync' RPC?
>

Yes. Apart from that, I am fine with your proposal too, that is, having a
LedgerType which drives durability, and I think we also need to add
per-entry durability options.

I think that, at least for the 'simple' no-sync addEntry, we do not need to
change many things. I am drafting a prototype and I will share it as soon as
we all agree on the roadmap.

The first implementation can cover the first case (no-sync addEntry) and
change the way the writer advances the LAC in order to support 'relaxed
durability' writes.
This change will be compatible with future improvements, and it will open
the door to big changes on the bookie side, like bypassing the journal or
leveraging multiple journals.
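One way to picture the LAC change, as a sketch under assumptions this thread does not fix: each bookie reports the highest entry id it has synced to its journal, every bookie stores every entry (ensemble size equal to write quorum), and the writer only advances the LAC to an entry that is synced on at least an ack quorum of bookies. The method `lastAddConfirmed` and the per-bookie "last synced" counters are hypothetical, not the BP-14 wire protocol.

```java
import java.util.Arrays;

public class RelaxedLac {

    // Hypothetical model: lastSyncedPerBookie[i] is the highest entry id that
    // bookie i reports as synced; all entries up to that id are assumed durable
    // on that bookie. Every bookie is assumed to receive every entry.
    static long lastAddConfirmed(long[] lastSyncedPerBookie, int ackQuorum) {
        if (ackQuorum < 1 || ackQuorum > lastSyncedPerBookie.length) {
            throw new IllegalArgumentException("invalid ack quorum: " + ackQuorum);
        }
        long[] sorted = lastSyncedPerBookie.clone();
        Arrays.sort(sorted); // ascending order
        // The ackQuorum-th largest value is synced on at least ackQuorum bookies,
        // so every entry up to it can be confirmed to readers.
        return sorted[sorted.length - ackQuorum];
    }

    public static void main(String[] args) {
        // Entries 0..9 were written no-sync; bookies have synced different prefixes.
        long[] lastSynced = {9, 4, 7};
        System.out.println(lastAddConfirmed(lastSynced, 2)); // 7
        System.out.println(lastAddConfirmed(lastSynced, 3)); // 4
    }
}
```

With `{9, 4, 7}` and an ack quorum of 2, entries up to 7 are synced on two bookies, so the LAC can advance to 7 even though entry 9 was already written to one of them.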

-- Enrico

> or something else that the LedgerType proposal won't work?
>

>
>
> On Mon, Aug 28, 2017 at 7:35 AM, Enrico Olivelli <eo...@gmail.com>
> wrote:
>
> > I think that having a set of options on the ledger metadata will be a
> > good enhancement and I am sure we will do it as soon as it will be
> > needed, maybe we do not need it now.
> >
> > Actually I think we will need to declare this durability-level at entry
> > level to support some uses cases in BP-14 document, let me explain two of
> > my usecases for which I need it:
> >
> > At higher level we have to choices:
> >
> > A) per-ledger durability options (JV proposal)
> > all addEntry operations are durable or non-durable and there is an
> > explicit 'sync' API (+ forced sync at close)
> >
> > B) per-entry durability options (original BP-14 proposal)
> > every addEntry has an own durable/non-durable option (sync/no-sync), with
> > the ability to call 'sync' without addEntry (+ forced sync at close)
> >
> > I am speaking about the the database WAL case, I am using the ledger as
> > segment for the WAL of a database and I am writing all data changes in
> > the scope of a 'transaction' with the relaxed-durability flag, then I am
> > writing the 'transaction committed' entry with "strict durability"
> > requirement, this will in fact require that all previous entries are
> > persisted durably and so that the transaction will never be lost.
> >
> > In this scenario we would need an addEntry + sync API in fact:
> >
> > using option  A) the WAL will look like:
> > - open ledger no-sync = true
> > - addEntry (set foo=bar)  (this will be no-sync)
> > - addEntry (set foo=bar2) (this will be no-sync)
> > - addEntry (commit)
> > - sync
> >
> > using option B) the WAL will look like
> > - open ledger
> > - addEntry (set foo=bar), no-sync
> > - addEntry (set foo=bar2), no-sync
> > - addEntry (commit), sync
> >
> > in case B) we are "saving" one RPC call to every bookie (the 'sync' one)
> > same for single data change entries, like updating a single record on the
> > database, this with BK 4.5 "costs" only a single RPC to every bookie
> >
> > Second case:
> > I am using BookKeeper to store binary objects, so I am packing more
> > 'objects' (named sequences of bytes) into a single ledger, like you do
> > when you write many records to a file in a streaming fashion and keep track of
> > offsets of the beginning of every record (LedgerHandeAdv is perfect for
> > this case).
> > I am not using a single ledger per 'file' because it kills zookeeper to
> > create many ledgers very fast, in my systems I have big busts of writes,
> > which need to be really "fast", so I am writing multiple 'files' to every
> > single ledger. So the close-to-open consistency at ledger level is not
> > suitable for this case.
> > I have to write as fast as possible to this 'ledger-backed' stream, and
> > as with a 'traditional'  filesystem I am writing parts of each file and than
> > requiring 'sync' at the end of each file.
> > Using BookKeeper you need to split big 'files' into "little" parts, you
> > cannot transmit the contents as to "real" stream on network.
> >
> > I am not talking about bookie level implementation details I would like
> > to define the high level API in order to support all the relevant known
> > use cases and keep space for the future,
> > at this moment adding a per-entry 'durability option' seems to be very
> > flexible and simple to implement, it does not prevent us from doing
> > further improvements, like namely skipping the journal.
> >
> > Enrico
> >
> >
> >
> > 2017-08-26 19:55 GMT+02:00 Enrico Olivelli <eo...@gmail.com>:
> >
> > >
> > >
> > > On sab 26 ago 2017, 19:19 Venkateswara Rao Jujjuri <ju...@gmail.com>
> > > wrote:
> > >
> > >> Hi all,
> > >>
> > >> As promised during Thursday call, here is my proposal.
> > >>
> > >> *NOTE*: Major difference in this proposal compared to Enrico’s
> > >> <https://docs.google.com/document/d/1JLYO3K3tZ5PJGmyS0YK_-NW8VOUUgUWVBmswCUOG158/edit#heading=h.q2rewiqndr5v>
> > >> is making the durability a property of the ledger(type) as opposed to
> > >> addEntry(). Rest of the technical details have a lot of similarities.
> > >>
> > >
> > > Thank you JV. I have just read quickly the doc and your view is
> > > centantly broader.
> > > I will dig into the doc as soon as possible on Monday.
> > > For me it is ok to have a ledger wide configuration I think that the
> > > most important decision is about the API we will provide as in the
> > > future it will be difficult to change it.
> > >
> > >
> > > Cheers
> > > Enrico
> > >
> > >
> > >
> > >> https://docs.google.com/document/d/1g1eBcVVCZrTG8YZliZP0LVqvWpq432ODEghrGVQ4d4Q/edit?usp=sharing
> > >>
> > >> On Thu, Aug 24, 2017 at 1:14 AM, Enrico Olivelli <eolivelli@gmail.com>
> > >> wrote:
> > >>
> > >> > Thank you all for the comments and for taking a look to the document
> > >> > so soon.
> > >> > I have updated the doc, we will discuss the document at the meeting,
> > >> >
> > >> >
> > >> > Enrico
> > >> >
> > >> > 2017-08-24 2:27 GMT+02:00 Sijie Guo <gu...@gmail.com>:
> > >> >
> > >> > > Enrico,
> > >> > >
> > >> > > Thank you so much! It is a great effort for putting this up.
> > >> > > Overall looks good. I made some comments, we can discuss at
> > >> > > tomorrow's community meeting.
> > >> > >
> > >> > > - Sijie
> > >> > >
> > >> > > On Wed, Aug 23, 2017 at 8:25 AM, Enrico Olivelli <eolivelli@gmail.com>
> > >> > > wrote:
> > >> > >
> > >> > > > Hi all,
> > >> > > > I have drafted a first proposal for BP-14 - Relax Durability
> > >> > > >
> > >> > > > We are talking about limiting the number of fsync to the journal
> > >> > > > while preserving the correctness of the LAC protocol.
> > >> > > >
> > >> > > > This is the link to the wiki page, but as the issue is huge we
> > >> > > > prefer to use Google Documents for sharing comments
> > >> > > > https://cwiki.apache.org/confluence/display/BOOKKEEPER/BP+-+14+Relax+durability
> > >> > > >
> > >> > > > This is the document
> > >> > > > https://docs.google.com/document/d/1JLYO3K3tZ5PJGmyS0YK_-NW8VOUUgUWVBmswCUOG158/edit?usp=sharing
> > >> > > >
> > >> > > > All comments are welcome
> > >> > > >
> > >> > > > I have added DL dev list in cc as the discussion is interesting
> > >> > > > for both groups
> > >> > > >
> > >> > > > Enrico Olivelli
> > >> > > >
> > >> > >
> > >> >
> > >>
> > >>
> > >>
> > >> --
> > >> Jvrao
> > >> ---
> > >> First they ignore you, then they laugh at you, then they fight you,
> > >> then you win. - Mahatma Gandhi
> > >>
> > > --
> > >
> > >
> > > -- Enrico Olivelli
> > >
> >
>
>
>
> --
> Jvrao
> ---
> First they ignore you, then they laugh at you, then they fight you, then
> you win. - Mahatma Gandhi
>

Re: [DISCUSS] BP-14 Relax Durability

Posted by Venkateswara Rao Jujjuri <ju...@gmail.com>.
I don't believe I fully followed your second case. But even in this case,
is your major concern the additional 'sync' RPC, or is there something else
for which the LedgerType proposal won't work?



On Mon, Aug 28, 2017 at 7:35 AM, Enrico Olivelli <eo...@gmail.com>
wrote:

> I think that having a set of options on the ledger metadata will be a good
> enhancement and I am sure we will do it as soon as it will be needed, maybe
> we do not need it now.
>
> Actually I think we will need to declare this durability-level at entry
> level to support some uses cases in BP-14 document, let me explain two of
> my usecases for which I need it:
>
> At higher level we have to choices:
>
> A) per-ledger durability options (JV proposal)
> all addEntry operations are durable or non-durable and there is an explicit
> 'sync' API (+ forced sync at close)
>
> B) per-entry durability options (original BP-14 proposal)
> every addEntry has an own durable/non-durable option (sync/no-sync), with
> the ability to call 'sync' without addEntry (+ forced sync at close)
>
> I am speaking about the the database WAL case, I am using the ledger as
> segment for the WAL of a database and I am writing all data changes in the
> scope of a 'transaction' with the relaxed-durability flag, then I am
> writing the 'transaction committed' entry with "strict durability"
> requirement, this will in fact require that all previous entries are
> persisted durably and so that the transaction will never be lost.
>
> In this scenario we would need an addEntry + sync API in fact:
>
> using option  A) the WAL will look like:
> - open ledger no-sync = true
> - addEntry (set foo=bar)  (this will be no-sync)
> - addEntry (set foo=bar2) (this will be no-sync)
> - addEntry (commit)
> - sync
>
> using option B) the WAL will look like
> - open ledger
> - addEntry (set foo=bar), no-sync
> - addEntry (set foo=bar2), no-sync
> - addEntry (commit), sync
>
> in case B) we are "saving" one RPC call to every bookie (the 'sync' one)
> same for single data change entries, like updating a single record on the
> database, this with BK 4.5 "costs" only a single RPC to every bookie
>
> Second case:
> I am using BookKeeper to store binary objects, so I am packing more
> 'objects' (named sequences of bytes) into a single ledger, like you do when
> you write many records to a file in a streaming fashion and keep track of
> offsets of the beginning of every record (LedgerHandeAdv is perfect for
> this case).
> I am not using a single ledger per 'file' because it kills zookeeper to
> create many ledgers very fast, in my systems I have big busts of writes,
> which need to be really "fast", so I am writing multiple 'files' to every
> single ledger. So the close-to-open consistency at ledger level is not
> suitable for this case.
> I have to write as fast as possible to this 'ledger-backed' stream, and as
> with a 'traditional'  filesystem I am writing parts of each file and than
> requiring 'sync' at the end of each file.
> Using BookKeeper you need to split big 'files' into "little" parts, you
> cannot transmit the contents as to "real" stream on network.
>
> I am not talking about bookie level implementation details I would like to
> define the high level API in order to support all the relevant known use
> cases and keep space for the future,
> at this moment adding a per-entry 'durability option' seems to be very
> flexible and simple to implement, it does not prevent us from doing further
> improvements, like namely skipping the journal.
>
> Enrico
>
>
>
> 2017-08-26 19:55 GMT+02:00 Enrico Olivelli <eo...@gmail.com>:
>
> >
> >
> > On sab 26 ago 2017, 19:19 Venkateswara Rao Jujjuri <ju...@gmail.com>
> > wrote:
> >
> >> Hi all,
> >>
> >> As promised during Thursday call, here is my proposal.
> >>
> >> *NOTE*: Major difference in this proposal compared to Enrico’s
> >> <https://docs.google.com/document/d/1JLYO3K3tZ5PJGmyS0YK_-NW8VOUUgUWVBmswCUOG158/edit#heading=h.q2rewiqndr5v>
> >> is making the durability a property of the ledger(type) as opposed to
> >> addEntry(). Rest of the technical details have a lot of similarities.
> >>
> >
> > Thank you JV. I have just read quickly the doc and your view is centantly
> > broader.
> > I will dig into the doc as soon as possible on Monday.
> > For me it is ok to have a ledger wide configuration I think that the most
> > important decision is about the API we will provide as in the future it
> > will be difficult to change it.
> >
> >
> > Cheers
> > Enrico
> >
> >
> >
> >> https://docs.google.com/document/d/1g1eBcVVCZrTG8YZliZP0LVqvWpq432ODEghrGVQ4d4Q/edit?usp=sharing
> >>
> >> On Thu, Aug 24, 2017 at 1:14 AM, Enrico Olivelli <eo...@gmail.com>
> >> wrote:
> >>
> >> > Thank you all for the comments and for taking a look to the document
> >> > so soon.
> >> > I have updated the doc, we will discuss the document at the meeting,
> >> >
> >> >
> >> > Enrico
> >> >
> >> > 2017-08-24 2:27 GMT+02:00 Sijie Guo <gu...@gmail.com>:
> >> >
> >> > > Enrico,
> >> > >
> >> > > Thank you so much! It is a great effort for putting this up. Overall
> >> > > looks good. I made some comments, we can discuss at tomorrow's
> >> > > community meeting.
> >> > >
> >> > > - Sijie
> >> > >
> >> > > On Wed, Aug 23, 2017 at 8:25 AM, Enrico Olivelli <eolivelli@gmail.com>
> >> > > wrote:
> >> > >
> >> > > > Hi all,
> >> > > > I have drafted a first proposal for BP-14 - Relax Durability
> >> > > >
> >> > > > We are talking about limiting the number of fsync to the journal
> >> > > > while preserving the correctness of the LAC protocol.
> >> > > >
> >> > > > This is the link to the wiki page, but as the issue is huge we
> >> > > > prefer to use Google Documents for sharing comments
> >> > > > https://cwiki.apache.org/confluence/display/BOOKKEEPER/BP+-+14+Relax+durability
> >> > > >
> >> > > > This is the document
> >> > > > https://docs.google.com/document/d/1JLYO3K3tZ5PJGmyS0YK_-NW8VOUUgUWVBmswCUOG158/edit?usp=sharing
> >> > > >
> >> > > > All comments are welcome
> >> > > >
> >> > > > I have added DL dev list in cc as the discussion is interesting
> >> > > > for both groups
> >> > > >
> >> > > > Enrico Olivelli
> >> > > >
> >> > >
> >> >
> >>
> >>
> >>
> >> --
> >> Jvrao
> >> ---
> >> First they ignore you, then they laugh at you, then they fight you, then
> >> you win. - Mahatma Gandhi
> >>
> > --
> >
> >
> > -- Enrico Olivelli
> >
>



-- 
Jvrao
---
First they ignore you, then they laugh at you, then they fight you, then
you win. - Mahatma Gandhi

Re: [DISCUSS] BP-14 Relax Durability

Posted by Enrico Olivelli <eo...@gmail.com>.
I think that having a set of options in the ledger metadata will be a good
enhancement, and I am sure we will do it as soon as it is needed; maybe
we do not need it now.

Actually, I think we will need to declare this durability level at the entry
level to support some use cases in the BP-14 document. Let me explain two of
my use cases for which I need it.

At a higher level we have two choices:

A) per-ledger durability options (JV's proposal):
all addEntry operations are either durable or non-durable, and there is an
explicit 'sync' API (+ forced sync at close)

B) per-entry durability options (original BP-14 proposal):
every addEntry has its own durable/non-durable option (sync/no-sync), with
the ability to call 'sync' without an addEntry (+ forced sync at close)
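The two API shapes can be sketched in Java. All names here (`PerLedgerDurability`, `PerEntryDurability`, `Durability`, `ToyLedger`) are hypothetical and not part of the BookKeeper 4.5 client API; the toy ledger assumes that a durable write, or an explicit `sync()`, makes every entry written before it durable, which is the property the WAL case relies on.

```java
import java.util.ArrayList;
import java.util.List;

public class DurabilityApiSketch {

    // Hypothetical per-entry durability flag (option B).
    enum Durability { SYNC, NO_SYNC }

    // Option A: durability is fixed when the ledger is opened; sync() forces a flush.
    interface PerLedgerDurability {
        long addEntry(byte[] data);
        void sync();
    }

    // Option B: every addEntry carries its own durability requirement.
    interface PerEntryDurability {
        long addEntry(byte[] data, Durability durability);
        void sync();
    }

    // Toy in-memory model: entries are stored at once but only become durable
    // on a flush, and a flush covers every entry written so far.
    static class ToyLedger implements PerLedgerDurability, PerEntryDurability {
        private final boolean ledgerNoSync; // option A's ledger-wide setting
        private final List<byte[]> entries = new ArrayList<>();
        private long lastDurableEntryId = -1;

        ToyLedger(boolean ledgerNoSync) {
            this.ledgerNoSync = ledgerNoSync;
        }

        @Override
        public long addEntry(byte[] data) {
            // Option A: the ledger-wide setting decides.
            return addEntry(data, ledgerNoSync ? Durability.NO_SYNC : Durability.SYNC);
        }

        @Override
        public long addEntry(byte[] data, Durability durability) {
            entries.add(data);
            long entryId = entries.size() - 1;
            if (durability == Durability.SYNC) {
                flush();
            }
            return entryId;
        }

        @Override
        public void sync() {
            flush();
        }

        private void flush() {
            lastDurableEntryId = entries.size() - 1;
        }

        long lastDurableEntryId() {
            return lastDurableEntryId;
        }
    }

    public static void main(String[] args) {
        // Option A: ledger opened with no-sync = true, explicit sync after commit.
        ToyLedger optionA = new ToyLedger(true);
        optionA.addEntry("set foo=bar".getBytes());
        optionA.addEntry("set foo=bar2".getBytes());
        optionA.addEntry("commit".getBytes());
        System.out.println("A before sync: " + optionA.lastDurableEntryId()); // -1
        optionA.sync();
        System.out.println("A after sync: " + optionA.lastDurableEntryId()); // 2

        // Option B: only the commit entry asks for durability.
        ToyLedger optionB = new ToyLedger(false);
        optionB.addEntry("set foo=bar".getBytes(), Durability.NO_SYNC);
        optionB.addEntry("set foo=bar2".getBytes(), Durability.NO_SYNC);
        optionB.addEntry("commit".getBytes(), Durability.SYNC);
        System.out.println("B after commit: " + optionB.lastDurableEntryId()); // 2
    }
}
```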

First case: the database WAL. I am using the ledger as a segment of the WAL
of a database, and I write all data changes in the scope of a 'transaction'
with the relaxed-durability flag; then I write the 'transaction committed'
entry with a "strict durability" requirement. This in fact requires that all
previous entries are persisted durably, so that the transaction will never
be lost.

In this scenario we would need a combined "addEntry + sync" API:

Using option A) the WAL will look like:
- open ledger no-sync = true
- addEntry (set foo=bar) (this will be no-sync)
- addEntry (set foo=bar2) (this will be no-sync)
- addEntry (commit)
- sync

Using option B) the WAL will look like:
- open ledger
- addEntry (set foo=bar), no-sync
- addEntry (set foo=bar2), no-sync
- addEntry (commit), sync

In case B) we "save" one RPC call to every bookie (the 'sync' one). The same
holds for single data change entries, like updating a single record in the
database: with BK 4.5 this "costs" only a single RPC to every bookie, and
with option B it still does, whereas option A needs an extra 'sync' RPC.
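A toy count of the requests each bookie receives in the two WAL sequences. It assumes every request reaches every bookie (for example, ensemble size equal to write quorum) and that an explicit 'sync' is one extra request; the method names are illustrative only.

```java
import java.util.ArrayList;
import java.util.List;

public class WalRpcCount {

    // Option A: the ledger is opened with no-sync = true, so every addEntry is
    // relaxed and making the commit durable needs a separate 'sync' request.
    static List<String> optionA(List<String> changes) {
        List<String> requests = new ArrayList<>();
        for (String change : changes) {
            requests.add("addEntry(" + change + ")");
        }
        requests.add("addEntry(commit)");
        requests.add("sync");
        return requests;
    }

    // Option B: data changes are written no-sync and the commit entry itself
    // carries the sync requirement, so no separate 'sync' request is sent.
    static List<String> optionB(List<String> changes) {
        List<String> requests = new ArrayList<>();
        for (String change : changes) {
            requests.add("addEntry(" + change + ", no-sync)");
        }
        requests.add("addEntry(commit, sync)");
        return requests;
    }

    public static void main(String[] args) {
        List<String> tx = List.of("set foo=bar", "set foo=bar2");
        System.out.println("transaction, option A: " + optionA(tx).size()); // 4
        System.out.println("transaction, option B: " + optionB(tx).size()); // 3

        // A single-record update is modeled as a commit entry with no prior changes.
        System.out.println("single update, option A: " + optionA(List.of()).size()); // 2
        System.out.println("single update, option B: " + optionB(List.of()).size()); // 1
    }
}
```

The single-record update shows the same gap: option B keeps the one-request cost of a durable add in BK 4.5, while option A pays for an extra 'sync'.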

Second case:
I am using BookKeeper to store binary objects, so I am packing several
'objects' (named sequences of bytes) into a single ledger, as you do when
you write many records to a file in a streaming fashion and keep track of
the offset of the beginning of every record (LedgerHandleAdv is perfect for
this case).
I am not using a single ledger per 'file' because creating many ledgers very
fast kills ZooKeeper: my systems have big bursts of writes which need to be
really "fast", so I am writing multiple 'files' to every single ledger. So
close-to-open consistency at the ledger level is not suitable for this case.
I have to write as fast as possible to this 'ledger-backed' stream and, as
with a 'traditional' filesystem, I write the parts of each file and then
require a 'sync' at the end of each file.
With BookKeeper you need to split big 'files' into "little" parts; you
cannot transmit the contents as a "real" stream over the network.
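A minimal sketch of this packing scheme, assuming a hypothetical per-entry sync flag: each 'file' is cut into fixed-size chunks, written with caller-chosen, contiguous entry ids in the spirit of `LedgerHandleAdv`, and only the last chunk of each file asks for durability. `ToyAdvLedger`, `pack` and `Location` are in-memory stand-ins, not BookKeeper classes; the returned index records where each object starts and how many entries it spans.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class ObjectPacking {

    // In-memory stand-in for a ledger written with caller-assigned entry ids.
    static class ToyAdvLedger {
        final List<byte[]> entries = new ArrayList<>();
        long lastDurableEntryId = -1;
        int syncRequests = 0;

        // Hypothetical signature: explicit entry id plus a per-entry sync flag.
        void addEntry(long entryId, byte[] data, boolean sync) {
            if (entryId != entries.size()) {
                throw new IllegalArgumentException("entry ids must be contiguous");
            }
            entries.add(data);
            if (sync) {
                syncRequests++;
                lastDurableEntryId = entryId; // a sync also covers all earlier entries
            }
        }
    }

    // Where a packed object lives inside the ledger.
    static final class Location {
        final long firstEntryId;
        final int entryCount;

        Location(long firstEntryId, int entryCount) {
            this.firstEntryId = firstEntryId;
            this.entryCount = entryCount;
        }
    }

    static Map<String, Location> pack(ToyAdvLedger ledger, Map<String, byte[]> files, int chunkSize) {
        Map<String, Location> index = new LinkedHashMap<>();
        long nextEntryId = 0;
        for (Map.Entry<String, byte[]> file : files.entrySet()) {
            byte[] content = file.getValue();
            // At least one entry per file, even for an empty object.
            int chunks = Math.max(1, (content.length + chunkSize - 1) / chunkSize);
            index.put(file.getKey(), new Location(nextEntryId, chunks));
            for (int i = 0; i < chunks; i++) {
                int from = i * chunkSize;
                int to = Math.min(content.length, from + chunkSize);
                boolean lastChunk = i == chunks - 1;
                // Only the final chunk of each file requests durability.
                ledger.addEntry(nextEntryId++, Arrays.copyOfRange(content, from, to), lastChunk);
            }
        }
        return index;
    }

    public static void main(String[] args) {
        Map<String, byte[]> files = new LinkedHashMap<>();
        files.put("a.bin", new byte[10]); // 3 chunks of at most 4 bytes: entries 0..2
        files.put("b.bin", new byte[3]);  // 1 chunk: entry 3
        ToyAdvLedger ledger = new ToyAdvLedger();
        Map<String, Location> index = pack(ledger, files, 4);
        System.out.println(index.get("b.bin").firstEntryId); // 3
        System.out.println(ledger.syncRequests);             // 2, one per file
    }
}
```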

I am not talking about bookie-level implementation details; I would like to
define the high-level API so that it supports all the relevant known use
cases and leaves room for the future.
At this moment, adding a per-entry 'durability option' seems very flexible
and simple to implement, and it does not prevent further improvements,
namely skipping the journal.

Enrico



2017-08-26 19:55 GMT+02:00 Enrico Olivelli <eo...@gmail.com>:

>
>
> On sab 26 ago 2017, 19:19 Venkateswara Rao Jujjuri <ju...@gmail.com>
> wrote:
>
>> Hi all,
>>
>> As promised during Thursday call, here is my proposal.
>>
>> *NOTE*: Major difference in this proposal compared to Enrico’s
>> <https://docs.google.com/document/d/1JLYO3K3tZ5PJGmyS0YK_-NW8VOUUgUWVBmswCUOG158/edit#heading=h.q2rewiqndr5v>
>> is making the durability a property of the ledger(type) as opposed to
>> addEntry(). Rest of the technical details have a lot of similarities.
>>
>
> Thank you JV. I have just read quickly the doc and your view is centantly
> broader.
> I will dig into the doc as soon as possible on Monday.
> For me it is ok to have a ledger wide configuration I think that the most
> important decision is about the API we will provide as in the future it
> will be difficult to change it.
>
>
> Cheers
> Enrico
>
>
>
>> https://docs.google.com/document/d/1g1eBcVVCZrTG8YZliZP0LVqvWpq432ODEghrGVQ4d4Q/edit?usp=sharing
>>
>> On Thu, Aug 24, 2017 at 1:14 AM, Enrico Olivelli <eo...@gmail.com>
>> wrote:
>>
>> > Thank you all for the comments and for taking a look to the document so
>> > soon.
>> > I have updated the doc, we will discuss the document at the meeting,
>> >
>> >
>> > Enrico
>> >
>> > 2017-08-24 2:27 GMT+02:00 Sijie Guo <gu...@gmail.com>:
>> >
>> > > Enrico,
>> > >
>> > > Thank you so much! It is a great effort for putting this up. Overall
>> > > looks good. I made some comments, we can discuss at tomorrow's
>> > > community meeting.
>> > >
>> > > - Sijie
>> > >
>> > > On Wed, Aug 23, 2017 at 8:25 AM, Enrico Olivelli <eolivelli@gmail.com>
>> > > wrote:
>> > >
>> > > > Hi all,
>> > > > I have drafted a first proposal for BP-14 - Relax Durability
>> > > >
>> > > > We are talking about limiting the number of fsync to the journal
>> > > > while preserving the correctness of the LAC protocol.
>> > > >
>> > > > This is the link to the wiki page, but as the issue is huge we
>> > > > prefer to use Google Documents for sharing comments
>> > > > https://cwiki.apache.org/confluence/display/BOOKKEEPER/BP+-+14+Relax+durability
>> > > >
>> > > > This is the document
>> > > > https://docs.google.com/document/d/1JLYO3K3tZ5PJGmyS0YK_-NW8VOUUgUWVBmswCUOG158/edit?usp=sharing
>> > > >
>> > > > All comments are welcome
>> > > >
>> > > > I have added DL dev list in cc as the discussion is interesting for
>> > > > both groups
>> > > >
>> > > > Enrico Olivelli
>> > > >
>> > >
>> >
>>
>>
>>
>> --
>> Jvrao
>> ---
>> First they ignore you, then they laugh at you, then they fight you, then
>> you win. - Mahatma Gandhi
>>
> --
>
>
> -- Enrico Olivelli
>

Re: [DISCUSS] BP-14 Relax Durability

Posted by Enrico Olivelli <eo...@gmail.com>.
I think that having a set of options on the ledger metadata will be a good
enhancement and I am sure we will do it as soon as it will be needed, maybe
we do not need it now.

Actually I think we will need to declare this durability-level at entry
level to support some uses cases in BP-14 document, let me explain two of
my usecases for which I need it:

At higher level we have to choices:

A) per-ledger durability options (JV proposal)
all addEntry operations are durable or non-durable and there is an explicit
'sync' API (+ forced sync at close)

B) per-entry durability options (original BP-14 proposal)
every addEntry has an own durable/non-durable option (sync/no-sync), with
the ability to call 'sync' without addEntry (+ forced sync at close)

I am speaking about the the database WAL case, I am using the ledger as
segment for the WAL of a database and I am writing all data changes in the
scope of a 'transaction' with the relaxed-durability flag, then I am
writing the 'transaction committed' entry with "strict durability"
requirement, this will in fact require that all previous entries are
persisted durably and so that the transaction will never be lost.

In this scenario we would need an addEntry + sync API in fact:

using option  A) the WAL will look like:
- open ledger no-sync = true
- addEntry (set foo=bar)  (this will be no-sync)
- addEntry (set foo=bar2) (this will be no-sync)
- addEntry (commit)
- sync

using option B) the WAL will look like
- open ledger
- addEntry (set foo=bar), no-sync
- addEntry (set foo=bar2), no-sync
- addEntry (commit), sync

in case B) we are "saving" one RPC call to every bookie (the 'sync' one)
same for single data change entries, like updating a single record on the
database, this with BK 4.5 "costs" only a single RPC to every bookie

Second case:
I am using BookKeeper to store binary objects, so I am packing more
'objects' (named sequences of bytes) into a single ledger, like you do when
you write many records to a file in a streaming fashion and keep track of
offsets of the beginning of every record (LedgerHandeAdv is perfect for
this case).
I am not using a single ledger per 'file' because it kills zookeeper to
create many ledgers very fast, in my systems I have big busts of writes,
which need to be really "fast", so I am writing multiple 'files' to every
single ledger. So the close-to-open consistency at ledger level is not
suitable for this case.
I have to write as fast as possible to this 'ledger-backed' stream, and as
with a 'traditional'  filesystem I am writing parts of each file and than
requiring 'sync' at the end of each file.
Using BookKeeper you need to split big 'files' into "little" parts, you
cannot transmit the contents as to "real" stream on network.

I am not talking about bookie-level implementation details; I would like to
define the high-level API in order to support all the relevant known use
cases and leave room for the future.
At this moment adding a per-entry 'durability option' seems to be very
flexible and simple to implement, and it does not prevent us from making
further improvements, such as skipping the journal.
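
To illustrate how a per-entry option can still cover a ledger-wide configuration, here is a hypothetical API shape (the enum and method names are made up for this sketch and are not the BookKeeper client API): the ledger carries a default durability, addEntry(data) inherits it, and addEntry(data, durability) overrides it for a single entry such as a commit.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical API shape: a ledger-wide default durability plus an optional
 * per-entry override. Names are illustrative, not the BookKeeper client API.
 */
public class DurabilityApiSketch {
    enum Durability { STRICT, RELAXED }

    private final Durability ledgerDefault;
    // Durability actually applied to each entry, indexed by entry id.
    private final List<Durability> applied = new ArrayList<>();

    public DurabilityApiSketch(Durability ledgerDefault) {
        this.ledgerDefault = ledgerDefault;
    }

    /** Ledger-wide behaviour: the entry inherits the ledger default. */
    public long addEntry(byte[] data) {
        return addEntry(data, ledgerDefault);
    }

    /** Per-entry override; the payload itself is not stored in this sketch. */
    public long addEntry(byte[] data, Durability durability) {
        applied.add(durability);
        return applied.size() - 1;
    }

    public List<Durability> appliedDurability() {
        return applied;
    }

    public static void main(String[] args) {
        DurabilityApiSketch ledger = new DurabilityApiSketch(Durability.RELAXED);
        ledger.addEntry("set foo=bar".getBytes(StandardCharsets.UTF_8));
        ledger.addEntry("set foo=bar2".getBytes(StandardCharsets.UTF_8));
        ledger.addEntry("commit".getBytes(StandardCharsets.UTF_8), Durability.STRICT);
        System.out.println(ledger.appliedDurability()); // [RELAXED, RELAXED, STRICT]
    }
}
```

With a STRICT default every entry behaves as in BK 4.5, and a purely ledger-wide setting is the special case where no entry overrides the default.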

Enrico



2017-08-26 19:55 GMT+02:00 Enrico Olivelli <eo...@gmail.com>:

>
>
> On Sat, 26 Aug 2017, 19:19 Venkateswara Rao Jujjuri <ju...@gmail.com>
> wrote:
>
>> Hi all,
>>
>> As promised during Thursday call, here is my proposal.
>>
>> *NOTE*: The major difference in this proposal compared to Enrico’s
>> <https://docs.google.com/document/d/1JLYO3K3tZ5PJGmyS0YK_-NW8VOUUgUWVBmswCUOG158/edit#heading=h.q2rewiqndr5v>
>> is making the durability a property of the ledger (type) as opposed to
>> addEntry(). The rest of the technical details have a lot of similarities.
>>
>
> Thank you JV. I have just quickly read the doc and your view is certainly
> broader.
> I will dig into the doc as soon as possible on Monday.
> For me it is OK to have a ledger-wide configuration. I think that the most
> important decision is about the API we will provide, as it will be
> difficult to change it in the future.
>
>
> Cheers
> Enrico
>
>
>
>> https://docs.google.com/document/d/1g1eBcVVCZrTG8YZliZP0LVqvWpq432ODEghrGVQ4d4Q/edit?usp=sharing
>>
>> On Thu, Aug 24, 2017 at 1:14 AM, Enrico Olivelli <eo...@gmail.com>
>> wrote:
>>
>> > Thank you all for the comments and for taking a look at the document so
>> > soon.
>> > I have updated the doc; we will discuss it at the meeting.
>> >
>> >
>> > Enrico
>> >
>> > 2017-08-24 2:27 GMT+02:00 Sijie Guo <gu...@gmail.com>:
>> >
>> > > Enrico,
>> > >
>> > > Thank you so much! It is a great effort for putting this up. Overall
>> > > looks good. I made some comments, we can discuss at tomorrow's
>> > > community meeting.
>> > >
>> > > - Sijie
>> > >
>> > > On Wed, Aug 23, 2017 at 8:25 AM, Enrico Olivelli <eolivelli@gmail.com>
>> > > wrote:
>> > >
>> > > > Hi all,
>> > > > I have drafted a first proposal for BP-14 - Relax Durability
>> > > >
>> > > > We are talking about limiting the number of fsync to the journal
>> > > > while preserving the correctness of the LAC protocol.
>> > > >
>> > > > This is the link to the wiki page, but as the issue is huge we
>> > > > prefer to use Google Documents for sharing comments
>> > > > https://cwiki.apache.org/confluence/display/BOOKKEEPER/
>> > > > BP+-+14+Relax+durability
>> > > >
>> > > > This is the document
>> > > > https://docs.google.com/document/d/1JLYO3K3tZ5PJGmyS0YK_-
>> > > > NW8VOUUgUWVBmswCUOG158/edit?usp=sharing
>> > > >
>> > > > All comments are welcome
>> > > >
>> > > > I have added DL dev list in cc as the discussion is interesting for
>> > > > both groups
>> > > >
>> > > > Enrico Olivelli
>> > > >
>> > >
>> >
>>
>>
>>
>> --
>> Jvrao
>> ---
>> First they ignore you, then they laugh at you, then they fight you, then
>> you win. - Mahatma Gandhi
>>
> --
>
>
> -- Enrico Olivelli
>

Re: [DISCUSS] BP-14 Relax Durability

Posted by Enrico Olivelli <eo...@gmail.com>.
On Sat, 26 Aug 2017, 19:19 Venkateswara Rao Jujjuri <ju...@gmail.com>
wrote:

> Hi all,
>
> As promised during Thursday call, here is my proposal.
>
> *NOTE*: The major difference in this proposal compared to Enrico’s
> <https://docs.google.com/document/d/1JLYO3K3tZ5PJGmyS0YK_-NW8VOUUgUWVBmswCUOG158/edit#heading=h.q2rewiqndr5v>
> is making the durability a property of the ledger (type) as opposed to
> addEntry(). The rest of the technical details have a lot of similarities.
>

Thank you JV. I have just quickly read the doc and your view is certainly
broader.
I will dig into the doc as soon as possible on Monday.
For me it is OK to have a ledger-wide configuration. I think that the most
important decision is about the API we will provide, as it will be
difficult to change it in the future.


Cheers
Enrico



>
> https://docs.google.com/document/d/1g1eBcVVCZrTG8YZliZP0LVqvWpq432ODEghrGVQ4d4Q/edit?usp=sharing
>
> On Thu, Aug 24, 2017 at 1:14 AM, Enrico Olivelli <eo...@gmail.com>
> wrote:
>
> > Thank you all for the comments and for taking a look at the document so
> > soon.
> > I have updated the doc; we will discuss it at the meeting.
> >
> >
> > Enrico
> >
> > 2017-08-24 2:27 GMT+02:00 Sijie Guo <gu...@gmail.com>:
> >
> > > Enrico,
> > >
> > > Thank you so much! It is a great effort for putting this up. Overall
> > > looks good. I made some comments, we can discuss at tomorrow's
> > > community meeting.
> > >
> > > - Sijie
> > >
> > > On Wed, Aug 23, 2017 at 8:25 AM, Enrico Olivelli <eo...@gmail.com>
> > > wrote:
> > >
> > > > Hi all,
> > > > I have drafted a first proposal for BP-14 - Relax Durability
> > > >
> > > > We are talking about limiting the number of fsync to the journal
> > > > while preserving the correctness of the LAC protocol.
> > > >
> > > > This is the link to the wiki page, but as the issue is huge we prefer
> > > > to use Google Documents for sharing comments
> > > > https://cwiki.apache.org/confluence/display/BOOKKEEPER/
> > > > BP+-+14+Relax+durability
> > > >
> > > > This is the document
> > > > https://docs.google.com/document/d/1JLYO3K3tZ5PJGmyS0YK_-
> > > > NW8VOUUgUWVBmswCUOG158/edit?usp=sharing
> > > >
> > > > All comments are welcome
> > > >
> > > > I have added DL dev list in cc as the discussion is interesting for
> > > > both groups
> > > >
> > > > Enrico Olivelli
> > > >
> > >
> >
>
>
>
> --
> Jvrao
> ---
> First they ignore you, then they laugh at you, then they fight you, then
> you win. - Mahatma Gandhi
>
-- 


-- Enrico Olivelli

Re: [DISCUSS] BP-14 Relax Durability

Posted by Venkateswara Rao Jujjuri <ju...@gmail.com>.
Hi all,

As promised during Thursday call, here is my proposal.

*NOTE*: The major difference in this proposal compared to Enrico’s
<https://docs.google.com/document/d/1JLYO3K3tZ5PJGmyS0YK_-NW8VOUUgUWVBmswCUOG158/edit#heading=h.q2rewiqndr5v>
is making the durability a property of the ledger (type) as opposed to
addEntry(). The rest of the technical details have a lot of similarities.

https://docs.google.com/document/d/1g1eBcVVCZrTG8YZliZP0LVqvWpq432ODEghrGVQ4d4Q/edit?usp=sharing

On Thu, Aug 24, 2017 at 1:14 AM, Enrico Olivelli <eo...@gmail.com>
wrote:

> Thank you all for the comments and for taking a look at the document so
> soon.
> I have updated the doc; we will discuss it at the meeting.
>
>
> Enrico
>
> 2017-08-24 2:27 GMT+02:00 Sijie Guo <gu...@gmail.com>:
>
> > Enrico,
> >
> > Thank you so much! It is a great effort for putting this up. Overall
> > looks good. I made some comments, we can discuss at tomorrow's
> > community meeting.
> >
> > - Sijie
> >
> > On Wed, Aug 23, 2017 at 8:25 AM, Enrico Olivelli <eo...@gmail.com>
> > wrote:
> >
> > > Hi all,
> > > I have drafted a first proposal for BP-14 - Relax Durability
> > >
> > > We are talking about limiting the number of fsync to the journal while
> > > preserving the correctness of the LAC protocol.
> > >
> > > This is the link to the wiki page, but as the issue is huge we prefer
> > > to use Google Documents for sharing comments
> > > https://cwiki.apache.org/confluence/display/BOOKKEEPER/
> > > BP+-+14+Relax+durability
> > >
> > > This is the document
> > > https://docs.google.com/document/d/1JLYO3K3tZ5PJGmyS0YK_-
> > > NW8VOUUgUWVBmswCUOG158/edit?usp=sharing
> > >
> > > All comments are welcome
> > >
> > > I have added DL dev list in cc as the discussion is interesting for
> > > both groups
> > >
> > > Enrico Olivelli
> > >
> >
>



-- 
Jvrao
---
First they ignore you, then they laugh at you, then they fight you, then
you win. - Mahatma Gandhi

Re: [DISCUSS] BP-14 Relax Durability

Posted by Enrico Olivelli <eo...@gmail.com>.
Thank you all for the comments and for taking a look at the document so
soon.
I have updated the doc; we will discuss it at the meeting.


Enrico

2017-08-24 2:27 GMT+02:00 Sijie Guo <gu...@gmail.com>:

> Enrico,
>
> Thank you so much! It is a great effort for putting this up. Overall looks
> good. I made some comments, we can discuss at tomorrow's community meeting.
>
> - Sijie
>
> On Wed, Aug 23, 2017 at 8:25 AM, Enrico Olivelli <eo...@gmail.com>
> wrote:
>
> > Hi all,
> > I have drafted a first proposal for BP-14 - Relax Durability
> >
> > We are talking about limiting the number of fsync to the journal while
> > preserving the correctness of the LAC protocol.
> >
> > This is the link to the wiki page, but as the issue is huge we prefer to
> > use Google Documents for sharing comments
> > https://cwiki.apache.org/confluence/display/BOOKKEEPER/
> > BP+-+14+Relax+durability
> >
> > This is the document
> > https://docs.google.com/document/d/1JLYO3K3tZ5PJGmyS0YK_-
> > NW8VOUUgUWVBmswCUOG158/edit?usp=sharing
> >
> > All comments are welcome
> >
> > I have added DL dev list in cc as the discussion is interesting for both
> > groups
> >
> > Enrico Olivelli
> >
>

Re: [DISCUSS] BP-14 Relax Durability

Posted by Sijie Guo <gu...@gmail.com>.
Enrico,

Thank you so much! It is a great effort for putting this up. Overall looks
good. I made some comments, we can discuss at tomorrow's community meeting.

- Sijie

On Wed, Aug 23, 2017 at 8:25 AM, Enrico Olivelli <eo...@gmail.com>
wrote:

> Hi all,
> I have drafted a first proposal for BP-14 - Relax Durability
>
> We are talking about limiting the number of fsync to the journal while
> preserving the correctness of the LAC protocol.
>
> This is the link to the wiki page, but as the issue is huge we prefer to
> use Google Documents for sharing comments
> https://cwiki.apache.org/confluence/display/BOOKKEEPER/
> BP+-+14+Relax+durability
>
> This is the document
> https://docs.google.com/document/d/1JLYO3K3tZ5PJGmyS0YK_-
> NW8VOUUgUWVBmswCUOG158/edit?usp=sharing
>
> All comments are welcome
>
> I have added DL dev list in cc as the discussion is interesting for both
> groups
>
> Enrico Olivelli
>
