Posted to dev@bookkeeper.apache.org by Venkateswara Rao Jujjuri <ju...@gmail.com> on 2017/05/16 05:35:20 UTC

Client-Bookie Protocol Enhancements.

As we move towards a mega store, which can house tens (even hundreds) of
millions of ledgers and thousands of bookies, some of the operations we
perform today can carry a huge overhead.

1. Compaction/GC
A deletion is just a metadata operation: it deletes the zk node.
(Currently it deletes only the leaf node; deleting the entire tree, where
applicable, would be a minor bugfix.) But each bookie needs to get the
list of existing ledgers from zk, compare it to its local storage, and
identify the deleted ledgers. It then goes through the compaction logic,
which is a separate process.

But 1000s of bookies parsing the entire zk tree in parallel, each building
its own list, doesn't appear to be an efficient, scalable architecture to
me.
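
For illustration only, the scan described above can be sketched as a set
difference, together with a rough count of how many ledger ids get read
when every bookie walks the full tree. This is a minimal sketch, not
BookKeeper code: `find_deleted_ledgers` and `total_ledger_ids_read` are
hypothetical names, and the numbers are made up to match the scale
mentioned in this mail.

```python
def find_deleted_ledgers(local_ledgers, metadata_ledgers):
    """Return ledgers held locally that no longer exist in the metadata store."""
    return set(local_ledgers) - set(metadata_ledgers)


def total_ledger_ids_read(num_bookies, num_ledgers):
    """Assumption: every bookie lists every ledger on each scan."""
    return num_bookies * num_ledgers


# Hypothetical data: ledger ids on one bookie vs. ledger ids still in zk.
local = {1, 2, 3, 7}
in_metadata = {1, 3, 4, 5}
deleted = find_deleted_ledgers(local, in_metadata)
print(sorted(deleted))

# 1000 bookies each listing 10 million ledgers per scan.
print(total_ledger_ids_read(1_000, 10_000_000))
```

The second number grows with the product of bookies and ledgers, which is
the scaling concern raised here.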

Why not introduce an opportunistic delete operation on the client side,
which informs all the bookies in that ledger's metadata? We can still keep
our brute-force method, but at a very low frequency (once a week?), to
address transient/corner-case scenarios such as a bookie being down at
that time. Is there any big architectural correctness issue I am missing
in this method?
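
A rough sketch of that idea, under stated assumptions: the `Bookie` class,
`delete_hint`, `full_scan` and `delete_ledger` below are hypothetical names
invented for illustration and do not correspond to an existing BookKeeper
API. The delete hint is best effort; a bookie that misses it is cleaned up
by the infrequent full scan.

```python
class Bookie:
    """Toy bookie that only tracks which ledger ids it stores."""

    def __init__(self, name):
        self.name = name
        self.up = True
        self.ledgers = set()

    def delete_hint(self, ledger_id):
        # Opportunistic path: the client tells us the ledger is gone.
        if not self.up:
            raise ConnectionError(self.name)
        self.ledgers.discard(ledger_id)

    def full_scan(self, live_ledgers):
        # Brute-force path: keep only ledgers still present in metadata.
        self.ledgers &= set(live_ledgers)


def delete_ledger(ledger_id, metadata, bookies):
    """Delete the metadata first, then notify the ensemble on a best-effort basis."""
    ensemble = metadata.pop(ledger_id)
    missed = []
    for name in ensemble:
        try:
            bookies[name].delete_hint(ledger_id)
        except ConnectionError:
            missed.append(name)  # left for the periodic full scan
    return missed


bookies = {name: Bookie(name) for name in ("b1", "b2", "b3")}
for bookie in bookies.values():
    bookie.ledgers.update({10, 11})
metadata = {10: ["b1", "b2", "b3"], 11: ["b1", "b2", "b3"]}

bookies["b3"].up = False                  # b3 is down when the delete happens
missed = delete_ledger(10, metadata, bookies)

bookies["b3"].up = True                   # later, the low-frequency scan runs
bookies["b3"].full_scan(metadata.keys())
```

Correctness here still rests on the full scan; the hint only shortens the
time until space is reclaimed on the bookies that received it.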

2. Close
Ledger close is also a metadata operation. I believe sending an
opportunistic close to the bookies of the current ensemble can greatly
enhance some of the use cases where we need open-to-close consistency,
wherein the data doesn't need to be persistent until the close. Any
thoughts?
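
One possible reading of this, sketched under assumptions: a bookie buffers
entries for an open ledger and only makes them durable when it receives the
close. `BufferingBookie` and its methods are hypothetical names used for
illustration; the "durable" dictionary merely stands in for data flushed to
disk.

```python
class BufferingBookie:
    """Toy bookie that defers persistence of a ledger until close."""

    def __init__(self):
        self.pending = {}  # ledger_id -> entries not yet durable
        self.durable = {}  # ledger_id -> entries considered persisted

    def add_entry(self, ledger_id, entry):
        # Acknowledge without persisting; data may be lost before close.
        self.pending.setdefault(ledger_id, []).append(entry)

    def close(self, ledger_id):
        # The close request is the point at which everything is flushed.
        entries = self.pending.pop(ledger_id, [])
        self.durable.setdefault(ledger_id, []).extend(entries)
        return len(self.durable[ledger_id])


bookie = BufferingBookie()
bookie.add_entry(42, b"a")
bookie.add_entry(42, b"b")
durable_before_close = dict(bookie.durable)
flushed = bookie.close(42)
```

Before the close nothing is durable, so a reader can only rely on the
ledger contents once the close has been acknowledged.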

-- 
Jvrao
---
First they ignore you, then they laugh at you, then they fight you, then
you win. - Mahatma Gandhi

Re: Client-Bookie Protocol Enhancements.

Posted by Sijie Guo <gu...@gmail.com>.
On Tue, May 16, 2017 at 10:59 PM, Venkateswara Rao Jujjuri <
jujjuri@gmail.com> wrote:

> Please note that I started this thread to stir up discussion; these are
> not my final proposals.
>
>
> On Tue, May 16, 2017 at 1:02 AM, Sijie Guo <gu...@gmail.com> wrote:
>
> > On May 16, 2017 12:09 AM, "Enrico Olivelli - Diennea" <
> > enrico.olivelli@diennea.com> wrote:
> >
> > On Tue, 16/05/2017 at 00:06 -0700, Sijie Guo wrote:
> >
> > On Mon, May 15, 2017 at 10:35 PM, Venkateswara Rao Jujjuri <
> > jujjuri@gmail.com> wrote:
> >
> > As we move towards a mega store, which can house tens (even hundreds) of
> > millions of ledgers and thousands of bookies, some of the operations we
> > perform today can carry a huge overhead.
> >
> > 1. Compaction/GC
> > A deletion is just a metadata operation: it deletes the zk node.
> > (Currently it deletes only the leaf node; deleting the entire tree, where
> > applicable, would be a minor bugfix.)
> >
> > +1 for a fix to improve this.
> >
> > +1 from me too. The client could send some hint to the bookie that the
> > ledger has been removed, and maybe trigger a special GC.
> >
> > Just to clarify: my +1 is for the fix of deleting the tree nodes in the
> > current approach, not for sending the requests yet.
> >
> > But each bookie needs to get the list of existing ledgers from zk,
> > compare it to its local storage, and identify the deleted ledgers. It
> > then goes through the compaction logic, which is a separate process.
> >
> > But 1000s of bookies parsing the entire zk tree in parallel, each
> > building its own list, doesn't appear to be an efficient, scalable
> > architecture to me.
> >
> > Why not introduce an opportunistic delete operation on the client side,
> > which informs all the bookies in that ledger's metadata? We can still
> > keep our brute-force method, but at a very low frequency (once a week?),
> > to address transient/corner-case scenarios such as a bookie being down at
> > that time. Is there any big architectural correctness issue I am missing
> > in this method?
> >
> > I don't think there is a correctness issue with the approach you
> > proposed, as long as the current background GC is still running.
> > The current approach exists just to simplify the client logic.
> >
> > Instead of introducing complexity (more operations) on the client side,
> > why can't the leader (auditor) perform the deletions?
> >
>
> I believe having the auditor do it is more complicated, unless I am mistaken.
>

Technically the Auditor is a **client**. What I am trying to say is that it
can be part of the client's job, but kept in an internal client. It is
always good to make the public client as thin as possible.


> - The auditor leader runs on just one node, so it needs to communicate
>   with the bookies over the client protocol, right?
>   Hence we can't avoid the 'complexity' you mentioned above.
>
> - How does the auditor leader know about the list of deleted ledgers? This
>   needs more work.
>
> >
> > 2. Close
> > Ledger close is also a metadata operation. I believe sending an
> > opportunistic close to the bookies of the current ensemble can greatly
> > enhance some of the use cases where we need open-to-close consistency,
> > wherein the data doesn't need to be persistent until the close. Any
> > thoughts?
> >
> > You mean "close-to-open" consistency?
> >
> > I am trying to understand: why is "wherein the data doesn't need to be
> > persistent until the close" related to ledger close? Are you thinking of
> > flushing all entries on the bookies when closing a ledger?
> >
>
> Correct.
>
>
> > How do you handle
> > ensemble changes?
> >
>
> Thinking aloud... but in this mode, any ensemble change will result in a
> write failure.
> The client must be aware of this mode.


+1 on this (although there is some tech debt around close-to-open
semantics). At the very least, I think writing a fence request or advancing
the LAC on close would help a lot in reducing the zk dependency.


>
> JV
>
> >
> > - Sijie
> >
> > --
> >
> > Enrico Olivelli Software Development Manager @Diennea Tel.: (+39) 0546
> > 066100 - Int. 925 Viale G.Marconi 30/14 - 48018 Faenza (RA) MagNews -
> > E-mail Marketing Solutions http://www.magnews.it Diennea - Digital
> > Marketing Solutions http://www.diennea.com

Re: Client-Bookie Protocol Enhancements.

Posted by Venkateswara Rao Jujjuri <ju...@gmail.com>.
Please note that I started this thread to stir up discussion; these are not
my final proposals.


On Tue, May 16, 2017 at 1:02 AM, Sijie Guo <gu...@gmail.com> wrote:

> On May 16, 2017 12:09 AM, "Enrico Olivelli - Diennea" <
> enrico.olivelli@diennea.com> wrote:
>
> On Tue, 16/05/2017 at 00:06 -0700, Sijie Guo wrote:
>
> On Mon, May 15, 2017 at 10:35 PM, Venkateswara Rao Jujjuri <
> jujjuri@gmail.com> wrote:
>
> As we move towards a mega store, which can house tens (even hundreds) of
> millions of ledgers and thousands of bookies, some of the operations we
> perform today can carry a huge overhead.
>
> 1. Compaction/GC
> A deletion is just a metadata operation: it deletes the zk node.
> (Currently it deletes only the leaf node; deleting the entire tree, where
> applicable, would be a minor bugfix.)
>
> +1 for a fix to improve this.
>
> +1 from me too. The client could send some hint to the bookie that the
> ledger has been removed, and maybe trigger a special GC.
>
> Just to clarify: my +1 is for the fix of deleting the tree nodes in the
> current approach, not for sending the requests yet.
>
> But each bookie needs to get the list of existing ledgers from zk, compare
> it to its local storage, and identify the deleted ledgers. It then goes
> through the compaction logic, which is a separate process.
>
> But 1000s of bookies parsing the entire zk tree in parallel, each building
> its own list, doesn't appear to be an efficient, scalable architecture to
> me.
>
> Why not introduce an opportunistic delete operation on the client side,
> which informs all the bookies in that ledger's metadata? We can still keep
> our brute-force method, but at a very low frequency (once a week?), to
> address transient/corner-case scenarios such as a bookie being down at
> that time. Is there any big architectural correctness issue I am missing
> in this method?
>
> I don't think there is a correctness issue with the approach you proposed,
> as long as the current background GC is still running.
> The current approach exists just to simplify the client logic.
>
> Instead of introducing complexity (more operations) on the client side,
> why can't the leader (auditor) perform the deletions?
>

I believe having the auditor do it is more complicated, unless I am mistaken.

- The auditor leader runs on just one node, so it needs to communicate with
  the bookies over the client protocol, right?
  Hence we can't avoid the 'complexity' you mentioned above.

- How does the auditor leader know about the list of deleted ledgers? This
  needs more work.




>
>
>
>
>
> 2. Close
> Ledger close is also a metadata operation. I believe sending an
> opportunistic close to the bookies of the current ensemble can greatly
> enhance some of the use cases where we need open-to-close consistency,
> wherein the data doesn't need to be persistent until the close. Any
> thoughts?
>
> You mean "close-to-open" consistency?
>
> I am trying to understand: why is "wherein the data doesn't need to be
> persistent until the close" related to ledger close? Are you thinking of
> flushing all entries on the bookies when closing a ledger?
>

Correct.


> How do you handle
> ensemble changes?
>

Thinking aloud... but in this mode, any ensemble change will result in a
write failure.
The client must be aware of this mode.
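
As a sketch of what "aware of this mode" could look like on the client
side, assuming hypothetical names (`CloseConsistentWriter` and
`EnsembleChangedError` are not existing BookKeeper classes): the writer
pins its ensemble and fails the write, rather than replacing a bookie, as
soon as any ensemble member is unavailable.

```python
class EnsembleChangedError(Exception):
    """Raised when a write would require changing the ensemble."""


class CloseConsistentWriter:
    """Toy writer that treats any ensemble change as a write failure."""

    def __init__(self, ensemble):
        self.ensemble = tuple(ensemble)
        self.failed = False
        self.written = []

    def add_entry(self, entry, live_bookies):
        if self.failed:
            raise EnsembleChangedError("ledger has already failed")
        if not set(self.ensemble) <= set(live_bookies):
            # A normal writer would swap in a new bookie here; in this
            # mode the write fails instead.
            self.failed = True
            raise EnsembleChangedError("ensemble member unavailable")
        self.written.append(entry)
        return len(self.written) - 1  # entry id


writer = CloseConsistentWriter(["b1", "b2", "b3"])
first_id = writer.add_entry(b"x", {"b1", "b2", "b3"})

try:
    writer.add_entry(b"y", {"b1", "b2"})  # b3 went away
    second_write_failed = False
except EnsembleChangedError:
    second_write_failed = True
```

The application then has to decide what to do with the failed ledger, for
example rewriting the data to a new one.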

JV

>
> - Sijie

Re: Client-Bookie Protocol Enhancements.

Posted by Sijie Guo <gu...@gmail.com>.
On May 16, 2017 12:09 AM, "Enrico Olivelli - Diennea" <
enrico.olivelli@diennea.com> wrote:

> On Tue, 16/05/2017 at 00:06 -0700, Sijie Guo wrote:
>
> > On Mon, May 15, 2017 at 10:35 PM, Venkateswara Rao Jujjuri <
> > jujjuri@gmail.com> wrote:
> >
> > > As we move towards a mega store, which can house tens (even hundreds)
> > > of millions of ledgers and thousands of bookies, some of the
> > > operations we perform today can carry a huge overhead.
> > >
> > > 1. Compaction/GC
> > > A deletion is just a metadata operation: it deletes the zk node.
> > > (Currently it deletes only the leaf node; deleting the entire tree,
> > > where applicable, would be a minor bugfix.)
> >
> > +1 for a fix to improve this.
>
> +1 from me too. The client could send some hint to the bookie that the
> ledger has been removed, and maybe trigger a special GC.

Just to clarify: my +1 is for the fix of deleting the tree nodes in the
current approach, not for sending the requests yet.

> > > But each bookie needs to get the list of existing ledgers from zk,
> > > compare it to its local storage, and identify the deleted ledgers. It
> > > then goes through the compaction logic, which is a separate process.
> > >
> > > But 1000s of bookies parsing the entire zk tree in parallel, each
> > > building its own list, doesn't appear to be an efficient, scalable
> > > architecture to me.
> > >
> > > Why not introduce an opportunistic delete operation on the client
> > > side, which informs all the bookies in that ledger's metadata? We can
> > > still keep our brute-force method, but at a very low frequency (once a
> > > week?), to address transient/corner-case scenarios such as a bookie
> > > being down at that time. Is there any big architectural correctness
> > > issue I am missing in this method?
> >
> > I don't think there is a correctness issue with the approach you
> > proposed, as long as the current background GC is still running.
> > The current approach exists just to simplify the client logic.
> >
> > Instead of introducing complexity (more operations) on the client side,
> > why can't the leader (auditor) perform the deletions?
> >
> > > 2. Close
> > > Ledger close is also a metadata operation. I believe sending an
> > > opportunistic close to the bookies of the current ensemble can greatly
> > > enhance some of the use cases where we need open-to-close consistency,
> > > wherein the data doesn't need to be persistent until the close. Any
> > > thoughts?
> >
> > You mean "close-to-open" consistency?
> >
> > I am trying to understand: why is "wherein the data doesn't need to be
> > persistent until the close" related to ledger close? Are you thinking of
> > flushing all entries on the bookies when closing a ledger? How do you
> > handle ensemble changes?
> >
> > - Sijie

Re: Client-Bookie Protocol Enhancements.

Posted by Enrico Olivelli - Diennea <en...@diennea.com>.
On Tue, 16/05/2017 at 00:06 -0700, Sijie Guo wrote:

> On Mon, May 15, 2017 at 10:35 PM, Venkateswara Rao Jujjuri <
> jujjuri@gmail.com> wrote:
>
> > As we move towards a mega store, which can house tens (even hundreds) of
> > millions of ledgers and thousands of bookies, some of the operations we
> > perform today can carry a huge overhead.
> >
> > 1. Compaction/GC
> > A deletion is just a metadata operation: it deletes the zk node.
> > (Currently it deletes only the leaf node; deleting the entire tree, where
> > applicable, would be a minor bugfix.)
>
> +1 for a fix to improve this.

+1 from me too. The client could send some hint to the bookie that the
ledger has been removed, and maybe trigger a special GC.

> > But each bookie needs to get the list of existing ledgers from zk,
> > compare it to its local storage, and identify the deleted ledgers. It
> > then goes through the compaction logic, which is a separate process.
> >
> > But 1000s of bookies parsing the entire zk tree in parallel, each
> > building its own list, doesn't appear to be an efficient, scalable
> > architecture to me.
> >
> > Why not introduce an opportunistic delete operation on the client side,
> > which informs all the bookies in that ledger's metadata? We can still
> > keep our brute-force method, but at a very low frequency (once a week?),
> > to address transient/corner-case scenarios such as a bookie being down at
> > that time. Is there any big architectural correctness issue I am missing
> > in this method?
>
> I don't think there is a correctness issue with the approach you proposed,
> as long as the current background GC is still running.
> The current approach exists just to simplify the client logic.
>
> Instead of introducing complexity (more operations) on the client side,
> why can't the leader (auditor) perform the deletions?
>
> > 2. Close
> > Ledger close is also a metadata operation. I believe sending an
> > opportunistic close to the bookies of the current ensemble can greatly
> > enhance some of the use cases where we need open-to-close consistency,
> > wherein the data doesn't need to be persistent until the close. Any
> > thoughts?
>
> You mean "close-to-open" consistency?
>
> I am trying to understand: why is "wherein the data doesn't need to be
> persistent until the close" related to ledger close? Are you thinking of
> flushing all entries on the bookies when closing a ledger? How do you
> handle ensemble changes?
>
> - Sijie

Re: Client-Bookie Protocol Enhancements.

Posted by Sijie Guo <gu...@gmail.com>.
On Mon, May 15, 2017 at 10:35 PM, Venkateswara Rao Jujjuri <
jujjuri@gmail.com> wrote:

> As we move towards a mega store, which can house tens (even hundreds) of
> millions of ledgers and thousands of bookies, some of the operations we
> perform today can carry a huge overhead.
>
> 1. Compaction/GC
> A deletion is just a metadata operation: it deletes the zk node.
> (Currently it deletes only the leaf node; deleting the entire tree, where
> applicable, would be a minor bugfix.)


+1 for a fix to improve this.


> But each bookie needs to get the list of existing ledgers from zk, compare
> it to its local storage, and identify the deleted ledgers. It then goes
> through the compaction logic, which is a separate process.
>
> But 1000s of bookies parsing the entire zk tree in parallel, each building
> its own list, doesn't appear to be an efficient, scalable architecture to
> me.
>
> Why not introduce an opportunistic delete operation on the client side,
> which informs all the bookies in that ledger's metadata? We can still keep
> our brute-force method, but at a very low frequency (once a week?), to
> address transient/corner-case scenarios such as a bookie being down at
> that time. Is there any big architectural correctness issue I am missing
> in this method?
>

I don't think there is a correctness issue with the approach you proposed,
as long as the current background GC is still running.
The current approach exists just to simplify the client logic.

Instead of introducing complexity (more operations) on the client side, why
can't the leader (auditor) perform the deletions?


>
> 2. Close
> Ledger close is also a metadata operation. I believe sending an
> opportunistic close to the bookies of the current ensemble can greatly
> enhance some of the use cases where we need open-to-close consistency,
> wherein the data doesn't need to be persistent until the close. Any
> thoughts?
>

You mean "close-to-open" consistency?

I am trying to understand: why is "wherein the data doesn't need to be
persistent until the close" related to ledger close? Are you thinking of
flushing all entries on the bookies when closing a ledger? How do you
handle ensemble changes?

- Sijie



