Posted to dev@bookkeeper.apache.org by Jia Zhai <zh...@gmail.com> on 2017/09/05 13:10:25 UTC

[DISCUSS] BP-16: remove zookeeper dependency from bookkeeper client

Hi all,

I have just posted a proposal to remove zookeeper dependency from
bookkeeper client, to make bookkeeper client a thin client:

https://cwiki.apache.org/confluence/display/BOOKKEEPER/BP-16%3A+remove+zookeeper+dependency+from+bookkeeper+client


BookKeeper uses zookeeper for service discovery (discovering the available
bookies in the cluster) and metadata management (storing all the metadata for
ledgers). However, it exposes the metadata storage directly to the clients,
making the bookkeeper client a very thick client. This also introduces some
problems.

This BP explores the possibility of eliminating zookeeper completely from
client side, to produce a thin bookkeeper client.

I will send a patch as soon as we agree on the proposal.


Thanks.

-Jia

Re: [DISCUSS] BP-16: remove zookeeper dependency from bookkeeper client

Posted by Enrico Olivelli <eo...@gmail.com>.
On Tue 5 Sep 2017, 21:28 Sijie Guo <gu...@gmail.com> wrote:

> Enrico,
>
> Thank you for your feedback.
>
> Just FYI - this BP is the first part of the work that we've been working on
> improving metadata management on BookKeeper. We are doing this in three
> parts:
>
> - thin client : avoid talking to metadata store directly in clients, moving
> the metadata management to the bookie side.
> - new metadata store: storing metadata in bookies (both journal and
> snapshots are stored at zookeeper-based ledgers), reduce the zookeeper
> usage
> - eliminating zookeeper:  eliminate zookeeper usage completely.
>

That sounds really great to me.

I feel it is a good roadmap; I will help as much as possible in this
direction.



> One comment inline. I would let Jia answer other questions.
>
> On Tue, Sep 5, 2017 at 6:31 AM, Enrico Olivelli <eo...@gmail.com>
> wrote:
>
> > Great to see you working on this !
> > It would be great to have such a feature, as it is the first step to a
> > 'standalone' BookKeeper mode
> >
> > Some complementary ideas/first look questions:
> > - the document does not talk about security; IMHO we have at least to
> > cover authentication and TLS. It would be great to leverage the existing
> > AuthPlugins, as they are based on exchanging byte[] (as SASL wants)
> > - do we have some kind of "bootstrap servers list" configuration option?
> > Should the list be complete or just a subset of bookies? At connection
> > time the client could discover the list of other bookies
> > - will the client connect to only one bookie at a time? How will we deal
> > with errors?
> > - should the bookie write its gRPC endpoint info in the ZK metadata? (This
> > will be useful for a bookie to tell the connected clients about other
> > bookies.)
> > - the bookie will be somehow a proxy for zookeeper. I think that the
> > 'watch' part is the most complex; we will have to deal with reconnections,
> > errors... maybe it is worth writing more detail about this
> >
> > Minor issues:
> > - Maybe you can consider using ledgerId and not ledger_id, like in
> > LedgerMetadataFormat we are using lastEntryId
> >
>
>
>
>
> > -In the "motivation" part you write that having more clients
> > than the number of bookies would be a problem for zookeeper; actually
> > zookeeper is very good at dealing with a huge number of clients. Actually I
> > am always running clusters with 3-5 bookies and 10-100 writing clients and
> > this has never given troubles
> >
>
>
> First, I would not claim zookeeper is good at dealing with a huge number of
> clients when a zookeeper ensemble is only serving 10-100 clients.
>
> Second, based on my production experience, watches and session expiry are
> the two main issues with zookeeper when there are a lot of watchers and a lot
> of connections (a lot means more than thousands or even tens of thousands).
>
> Watches and session expiry are also the two main reasons that I don't like
> zookeeper:
>
> - session expiry. For simplicity, zookeeper ties session state directly to
> connection state. So when a connection is broken, a session is usually
> expired (unless it reconnects before the session expires); when a session is
> expired, the underlying connection cannot be used anymore: the application
> has to close the connection and recreate a new client (establishing a new
> connection). It is understandable that this makes zookeeper development super
> easy. However, it is a very bad design in practice, because it means that if
> you cannot establish a session, you can't use this connection and you have
> to create new connections: once your zookeeper is in a bad state (e.g.
> network issue or JVM GC), the whole environment will be in a very bad state
> (e.g. connection storm), and can barely recover from that state until you
> kill clients and ask them not to connect to zookeeper.
>
> - watcher: 1) it is a one-time watcher, so I can't reliably use it to get
> updates; 2) in order to set a watcher, you have to read a znode or get its
> children. Imagine such a use case: clients are watching a list of znodes
> (e.g. the list of bookies); when those clients' sessions expire, they have
> to rewatch the list. In order to rewatch the list, the clients have to read
> the list first, even if the list has never changed. It becomes a disaster,
> because all the clients will reread the whole list, overwhelm the network
> bandwidth, and cause session expiry.
>

There is interesting work from Jordan Z, the creator of Curator, on
persistent watches; I think this work could be useful for us
https://github.com/apache/zookeeper/pull/136

>
>
> I can tell a lot of production issues related to the above two behaviors
> (either one of them, or a combination of them) if you are interested.
>

Sure I believe you

Cheers
Enrico

>
>
>
> >
> > Future:
> > - as bookies will be proxies, maybe we should take care not to overwhelm a
> > bookie with too many clients
> > - iteration on ledgers: sometimes a client enumerates ledgers but is not
> > interested in having all of them; as we are using the bookie as a proxy,
> > maybe some kind of "filter" (at least on custom metadata) could be created
> > to limit the number of returned items. Another point: I don't know gRPC,
> > but it does not seem to be very clear how to 'stop' the iteration
> >
> > -- Enrico
> >
-- 


-- Enrico Olivelli

Re: [DISCUSS] BP-16: remove zookeeper dependency from bookkeeper client

Posted by Sijie Guo <gu...@gmail.com>.
Enrico,

Thank you for your feedback.

Just FYI - this BP is the first part of the work that we've been working on
improving metadata management on BookKeeper. We are doing this in three
parts:

- thin client : avoid talking to metadata store directly in clients, moving
the metadata management to the bookie side.
- new metadata store: storing metadata in bookies (both journal and
snapshots are stored at zookeeper-based ledgers), reduce the zookeeper usage
- eliminating zookeeper:  eliminate zookeeper usage completely.

One comment inline. I would let Jia answer other questions.

On Tue, Sep 5, 2017 at 6:31 AM, Enrico Olivelli <eo...@gmail.com> wrote:

> Great to see you working on this !
> It would be great to have such a feature, as it is the first step to a
> 'standalone' BookKeeper mode
>
> Some complementary ideas/first look questions:
> - the document does not talk about security; IMHO we have at least to cover
> authentication and TLS. It would be great to leverage the existing
> AuthPlugins, as they are based on exchanging byte[] (as SASL wants)
> - do we have some kind of "bootstrap servers list" configuration option?
> Should the list be complete or just a subset of bookies? At connection time
> the client could discover the list of other bookies
> - will the client connect to only one bookie at a time? How will we deal
> with errors?
> - should the bookie write its gRPC endpoint info in the ZK metadata? (This
> will be useful for a bookie to tell the connected clients about other
> bookies.)
> - the bookie will be somehow a proxy for zookeeper. I think that the
> 'watch' part is the most complex; we will have to deal with reconnections,
> errors... maybe it is worth writing more detail about this
>
> Minor issues:
> - Maybe you can consider using ledgerId and not ledger_id, like in
> LedgerMetadataFormat we are using lastEntryId
>




> -In the "motivation" part you write that having more clients
> than the number of bookies would be a problem for zookeeper; actually
> zookeeper is very good at dealing with a huge number of clients. Actually I
> am always running clusters with 3-5 bookies and 10-100 writing clients and
> this has never given troubles
>


First, I would not claim zookeeper is good at dealing with a huge number of
clients when a zookeeper ensemble is only serving 10-100 clients.

Second, based on my production experience, watches and session expiry are
the two main issues with zookeeper when there are a lot of watchers and a lot
of connections (a lot means more than thousands or even tens of thousands).

Watches and session expiry are also the two main reasons that I don't like
zookeeper:

- session expiry. For simplicity, zookeeper ties session state directly to
connection state. So when a connection is broken, a session is usually
expired (unless it reconnects before the session expires); when a session is
expired, the underlying connection cannot be used anymore: the application
has to close the connection and recreate a new client (establishing a new
connection). It is understandable that this makes zookeeper development super
easy. However, it is a very bad design in practice, because it means that if
you cannot establish a session, you can't use this connection and you have
to create new connections: once your zookeeper is in a bad state (e.g.
network issue or JVM GC), the whole environment will be in a very bad state
(e.g. connection storm), and can barely recover from that state until you
kill clients and ask them not to connect to zookeeper.

- watcher: 1) it is a one-time watcher, so I can't reliably use it to get
updates; 2) in order to set a watcher, you have to read a znode or get its
children. Imagine such a use case: clients are watching a list of znodes
(e.g. the list of bookies); when those clients' sessions expire, they have
to rewatch the list. In order to rewatch the list, the clients have to read
the list first, even if the list has never changed. It becomes a disaster,
because all the clients will reread the whole list, overwhelm the network
bandwidth, and cause session expiry.


I can tell a lot of production issues related to the above two behaviors
(either one of them, or a combination of them) if you are interested.
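To make the watcher problem concrete, here is a toy Python model. It is not the ZooKeeper client API, and the class names are hypothetical; it keeps only the properties described above: a watch fires once, it can only be re-armed by re-reading the whole list, and session expiry drops every watch. It counts the bytes returned by reads.

```python
"""Toy model of one-time watches (illustration only, not the ZooKeeper API)."""
from typing import Callable, List


class ToyZNodeStore:
    """Hypothetical in-memory stand-in for one znode's children list.

    It models only two behaviors: a watch fires at most once, and a watch
    can only be registered as a side effect of a read.
    """

    def __init__(self, children: List[str]) -> None:
        self.children = list(children)
        self.watchers: List[Callable[[], None]] = []
        self.bytes_served = 0  # total size of the lists returned by reads

    def get_children(self, watcher: Callable[[], None]) -> List[str]:
        # Setting a watch requires reading the whole list.
        self.watchers.append(watcher)
        self.bytes_served += sum(len(child) for child in self.children)
        return list(self.children)

    def add_child(self, name: str) -> None:
        self.children.append(name)
        # One-time semantics: every registered watch fires once and is cleared.
        fired, self.watchers = self.watchers, []
        for watcher in fired:
            watcher()

    def expire_all_sessions(self) -> None:
        # Session expiry discards every registered watch.
        self.watchers = []


class ToyClient:
    """Keeps a local view of the list and re-arms its watch by re-reading."""

    def __init__(self, store: ToyZNodeStore) -> None:
        self.store = store
        self.view: List[str] = []
        self.rewatch()

    def rewatch(self) -> None:
        self.view = self.store.get_children(self.rewatch)


if __name__ == "__main__":
    # 1,000 hypothetical bookie entries of 16 characters each ("bookie-0000:3181").
    store = ToyZNodeStore([f"bookie-{i:04d}:3181" for i in range(1000)])
    clients = [ToyClient(store) for _ in range(100)]
    print("initial reads:", store.bytes_served)
    store.expire_all_sessions()
    for client in clients:
        client.rewatch()  # the list did not change, but it is read again
    print("after mass expiry:", store.bytes_served)
```

With 1,000 ASCII entries of 16 characters, every re-watch re-reads 16,000 bytes, so 100 clients re-read 1,600,000 bytes after a mass expiry even though nothing changed. The cost grows with clients times list size.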



>
> Future:
> - as bookies will be proxies, maybe we should take care not to overwhelm a
> bookie with too many clients
> - iteration on ledgers: sometimes a client enumerates ledgers but is not
> interested in having all of them; as we are using the bookie as a proxy,
> maybe some kind of "filter" (at least on custom metadata) could be created
> to limit the number of returned items. Another point: I don't know gRPC,
> but it does not seem to be very clear how to 'stop' the iteration
>
> -- Enrico
>
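On the iteration question, one way to think about it is to apply the filter on the serving side and let the client stop the stream explicitly. The Python sketch below is an assumption-laden illustration, not gRPC code: the names (LedgerInfo, stream_ledgers, first_matching) are hypothetical, and closing a Python generator stands in for a client cancelling a streaming call.

```python
"""Sketch: server-side filtering plus an explicit early stop of a result stream."""
from dataclasses import dataclass, field
from typing import Callable, Dict, Generator, Iterable, List


@dataclass
class LedgerInfo:
    ledger_id: int
    custom_metadata: Dict[str, str] = field(default_factory=dict)


@dataclass
class StreamStats:
    sent: int = 0
    cancelled: bool = False


def stream_ledgers(ledgers: Iterable[LedgerInfo],
                   predicate: Callable[[LedgerInfo], bool],
                   stats: StreamStats) -> Generator[LedgerInfo, None, None]:
    """Hypothetical serving side: only ledgers matching the filter are sent."""
    try:
        for ledger in ledgers:
            if predicate(ledger):
                stats.sent += 1
                yield ledger
    except GeneratorExit:
        # Raised at the paused yield when the consumer calls close(); a real
        # server would release the resources of the cancelled stream here.
        stats.cancelled = True
        raise


def first_matching(stream: Generator[LedgerInfo, None, None], limit: int) -> List[int]:
    """Client side: take at most `limit` results, then stop the stream."""
    ids: List[int] = []
    if limit > 0:
        for ledger in stream:
            ids.append(ledger.ledger_id)
            if len(ids) == limit:
                break
    stream.close()  # explicit stop instead of waiting for garbage collection
    return ids
```

Non-matching ledgers never leave the producer, and the client's explicit close tells the producer it can stop early instead of enumerating everything.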

Re: [DISCUSS] BP-16: remove zookeeper dependency from bookkeeper client

Posted by Sijie Guo <gu...@gmail.com>.
Thank you JV. we will try to come up with a BP covering the overall picture
on the metadata enhancements we are thinking. we can then discuss and
collaborate.

On Sat, Sep 16, 2017 at 3:52 PM, Venkateswara Rao Jujjuri <jujjuri@gmail.com>
wrote:

> If the real use of this work item is to use:
> -  Only one bookie
> - Bookie is the metadata server too


the metadata rpc service is a 'stateless' service that can be co-run in
bookies, like how we run `autorecovery`. It will also help with upgrading
from one metadata store to another.


>
> Maybe it is ok. But I am worried about the *magnitude* of the code refactor
> to accommodate this.
> I really like the idea of finding an alternate to ZK, but I think we need
> to have that direction ironed out
> before rushing with the first step without knowing what the next step is.
> Again, I am worried about the
> code refactor and resulting regression.
>

the plan is not to refactor any existing implementation of ledger manager.
we are also concerned about stability with refactor.
the plan is introducing new ledger manager implementation, and have a full
story about upgrade, rollback and migration. would like to cover more in
the BP I will share later.


>
> Removing ZK is one part, but changing from centralized metadata server
> (ZK/etcd kind) model to distributing
> metadata across bookies in the cluster is a HUGE leap. We are looking at
> implementing our own version of Paxos.
>

yes. our plan is to have a decentralized metadata store (can run along with
bookies or on separated machines, depending on your deployment methods).
The approach we are thinking of
is reusing the storage we have on bookies and bootstrap metadata from
ledgers/logs.


>
> It is possible I am not understanding the full intent here and would love
> to have more discussion.
> Can we have more discussion on Thursday call again?
>

I will try to share more details before Thursday. Let's discuss this once I
share a BP covering the aspects we are thinking of and working on.


>
> Thanks,
> JV
>
>
>
> On Sat, Sep 16, 2017 at 3:17 AM, Enrico Olivelli <eo...@gmail.com>
> wrote:
>
> > Yep
> > Thank you Jia
> >
> > Enrico
> >
> > On Sat 16 Sep 2017, 12:10 Jia Zhai <zh...@gmail.com> wrote:
> >
> > > Since there are no objections, I would like to get this BP approved.
> > >
> > > On Wed, Sep 13, 2017 at 4:24 PM, Sijie Guo <gu...@gmail.com> wrote:
> > >
> > > > On Wed, Sep 13, 2017 at 1:18 AM, Enrico Olivelli <eolivelli@gmail.com>
> > > > wrote:
> > > >
> > > > > 2017-09-13 10:10 GMT+02:00 Sijie Guo <gu...@gmail.com>:
> > > > >
> > > > > > On Wed, Sep 13, 2017 at 12:16 AM, Enrico Olivelli <eolivelli@gmail.com>
> > > > > > wrote:
> > > > > >
> > > > > > > I think that this is a good direction to go.
> > > > > > >
> > > > > > > I believe the reasons about ZK in huge systems, even though it is not my
> > > > > > > case, so I cannot add comments on this use case.
> > > > > > >
> > > > > > > I am fine with the direction as long as we are still going to support
> > > > > > > ZooKeeper.
> > > > > > > BookKeeper is in the Hadoop / ZooKeeper ecosystem and several products
> > > > > > > rely on ZK too; for instance, in my systems it is usual to have
> > > > > > > BookKeeper/Kafka/HBase/Majordodo.... and so I am not going to live
> > > > > > > without zookeeper in the short/mid term.
> > > > > > >
> > > > > > > I am really OK with dropping ZK for "simple" systems: when you need
> > > > > > > only BK, having the burden of setting up a zookeeper server is weird
> > > > > > > for customers. I usually re-distribute BK + ZK with my applications,
> > > > > > > and we are talking about little clusters of up to 10 machines.
> > > > > > >
> > > > > >
> > > > > > Just to clarify - we are not dropping ZK here. We are just proposing to
> > > > > > have a ledger manager implementation that doesn't depend on zookeeper
> > > > > > directly.
> > > > > > We are not modifying any existing ledger manager implementation.
> > > > > >
> > > > >
> > > > >
> > > > > Yep, we are on the same page.
> > > > > For this proposal, the bookie will be a sort of "proxy" between the client
> > > > > and the actual ledger manager implementation, which will "live" inside the
> > > > > bookie.
> > > > > It is only a new ledger manager to be used in clients; this ledger manager
> > > > > will issue RPCs (or some kind of "streaming" RPCs) to a list of bookies.
> > > > >
> > > > >
> > > > > >
> > > > > >
> > > > > > >
> > > > > > > The direction of this proposal is OK for me, and it is very similar to
> > > > > > > the work I was starting on a "standalone mode".
> > > > > >
> > > > > >
> > > > > > > I think it will be very easy to support the case of having a single
> > > > > > > bookie with this approach, or even client + bookie in the same JVM.
> > > > > > > Having multiple bookies will require us to add some other coordination
> > > > > > > facility between bookies. I would like to know if there is already some
> > > > > > > idea about this: are we going to use another product like etcd or
> > > > > > > JGroups, or implement our own coordination protocol?
> > > > > >
> > > > > >
> > > > > > We are not replacing A with B, even with zookeeper. The ledger management
> > > > > > is already abstracted in interfaces.
> > > > > > Users can use whatever system they prefer as the metadata store.
> > > > > >
> > > > > > Our direction is to provide an option to store metadata as well as data
> > > > > > in bookies. So with this option, no external metadata storage is needed.
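A minimal sketch of what "abstracted in interfaces" allows, assuming a deliberately simplified interface that is hypothetical and not BookKeeper's actual ledger manager API: the client programs against one interface, and a thin-client implementation forwards each call to a bookie-side service wrapping whatever backend is configured. The in-memory backend stands in for ZooKeeper, etcd, or bookie-hosted storage.

```python
"""Sketch: one metadata interface, interchangeable implementations."""
from abc import ABC, abstractmethod
from typing import Dict, Optional


class LedgerMetadataStore(ABC):
    """Hypothetical, simplified interface; not BookKeeper's actual LedgerManager."""

    @abstractmethod
    def create(self, ledger_id: int, metadata: bytes) -> None: ...

    @abstractmethod
    def read(self, ledger_id: int) -> Optional[bytes]: ...

    @abstractmethod
    def remove(self, ledger_id: int) -> None: ...


class InMemoryStore(LedgerMetadataStore):
    """Stands in for any backend (ZooKeeper, etcd, bookie-hosted storage, ...)."""

    def __init__(self) -> None:
        self._data: Dict[int, bytes] = {}

    def create(self, ledger_id: int, metadata: bytes) -> None:
        if ledger_id in self._data:
            raise KeyError(f"ledger {ledger_id} already exists")
        self._data[ledger_id] = metadata

    def read(self, ledger_id: int) -> Optional[bytes]:
        return self._data.get(ledger_id)

    def remove(self, ledger_id: int) -> None:
        self._data.pop(ledger_id, None)


class BookieMetadataService:
    """Bookie-side endpoint wrapping whichever store the cluster is configured with."""

    def __init__(self, backend: LedgerMetadataStore) -> None:
        self.backend = backend

    def handle(self, op: str, ledger_id: int, payload: bytes = b"") -> Optional[bytes]:
        if op == "create":
            self.backend.create(ledger_id, payload)
            return None
        if op == "read":
            return self.backend.read(ledger_id)
        if op == "remove":
            self.backend.remove(ledger_id)
            return None
        raise ValueError(f"unknown operation {op!r}")


class ThinClientStore(LedgerMetadataStore):
    """Client-side implementation: same interface, every call becomes a request."""

    def __init__(self, service: BookieMetadataService) -> None:
        self.service = service  # stands in for an RPC stub pointing at a bookie

    def create(self, ledger_id: int, metadata: bytes) -> None:
        self.service.handle("create", ledger_id, metadata)

    def read(self, ledger_id: int) -> Optional[bytes]:
        return self.service.handle("read", ledger_id)

    def remove(self, ledger_id: int) -> None:
        self.service.handle("remove", ledger_id)
```

Because the thin client implements the same interface, code above it does not change, and existing implementations stay untouched, which matches the point that no existing ledger manager is modified.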
> > > > > >
> > > > >
> > > > > Sorry, maybe my question was not clear.
> > > > > If you have multiple bookies and each bookie holds its own version of the
> > > > > metadata, how do you coordinate them? Which will be the source of truth?
> > > > > Maybe we should start a new email thread in the future to talk about
> > > > > "alternative distributed metadata storages".
> > > > >
> > > >
> > > > It is out of the scope of this BP. We will have a next BP to cover this
> > > > part.
> > > >
> > > >
> > > >
> > > >
> > > > >
> > > > > Anyway, the meaning and the scope of the proposal are clear to me, and I
> > > > > am really OK with it. I hope it will get approved soon.
> > > > >
> > > > > -- Enrico
> > > > >
> > > > >
> > > > > >
> > > > > >
> > > > > > > ZK is simple but it is very effective.
> > > > > > > Maybe we could help the ZK community to move forward and resolve
> > > > > > > the problems we are bringing to light.
> > > > > > >
> > > > > > >
> > > > > > > Enrico
> > > > > > >
> > > > > > >
> > > > > > > 2017-09-13 3:15 GMT+02:00 Jia Zhai <zh...@gmail.com>:
> > > > > > >
> > > > > > > > Any thoughts or comments
> > > > > > > > :)
> > > > > > > >
> > > > > > > > Thanks a lot.
> > > > > > > > -Jia
> > > > > > > >
> > > > > > > > On Tue, Sep 12, 2017 at 4:30 PM, Jia Zhai <zhaijia03@gmail.com>
> > > > > > > > wrote:
> > > > > > > >
> > > > > > > > > This blog also refers a little to the limitations of zookeeper in
> > > > > > > > > bookkeeper: https://bitworks.software/blog/en/2017-07-12-replicated-scalable-commitlog-with-apachebookkeeper.html
> > > > > > > > >
> > > > > > > > > On Thu, Sep 7, 2017 at 9:45 AM, Jia Zhai <zhaijia03@gmail.com>
> > > > > > > > > wrote:
> > > > > > > > >
> > > > > > > > >> 👍. Thanks a lot for the suggestions and feedback.
> > > > > > > > >>
> > > > > > > > >> On Thu, Sep 7, 2017 at 4:24 AM, Sijie Guo <guosijie@gmail.com>
> > > > > > > > >> wrote:
> > > > > > > > >>
> > > > > > > > >>> On Wed, Sep 6, 2017 at 1:07 PM, Enrico Olivelli <eolivelli@gmail.com>
> > > > > > > > >>> wrote:
> > > > > > > > >>>
> > > > > > > > >>> > Off-topic curiosity... Jia and Sijie, do you think we are going to
> > > > > > > > >>> > drop ZK from DL too?
> > > > > > > > >>> >
> > > > > > > > >>>
> > > > > > > > >>> Yes. That's the goal: 1) for large deployments, we are trying to
> > > > > > > > >>> overcome the limitations of zookeeper; 2) for smaller deployments, it
> > > > > > > > >>> will make deployment much easier: you just need to deploy a cluster of
> > > > > > > > >>> bookies. Once it is done, you can use the ledger API or the log stream
> > > > > > > > >>> API to access the bookkeeper cluster.
> > > > > > > > >>>
> > > > > > > > >>> Both DL and BK have pluggable metadata storage. They have very clear
> > > > > > > > >>> interfaces defining the metadata operations, so it is straightforward
> > > > > > > > >>> to use a different metadata storage.
> > > > > > > > >>>
> > > > > > > > >>>
> > > > > > > > >>> > Enrico
> > > > > > > > >>> >
> > > > > > > > >>> > On Wed 6 Sep 2017, 19:51 Enrico Olivelli <eolivelli@gmail.com> wrote:
> > > > > > > > >>> >
> > > > > > > > >>> > >
> > > > > > > > >>> > >
> > > > > > > > >>> > > On Wed 6 Sep 2017, 18:25 Sijie Guo <guosijie@gmail.com> wrote:
> > > > > > > > >>> > >
> > > > > > > > >>> > >> On Sep 6, 2017 4:57 AM, "Enrico Olivelli" <eolivelli@gmail.com> wrote:
> > > > > > > > >>> > >>
> > > > > > > > >>> > >> Thank you Sijie and Jia for your comments and explanations,
> > > > > > > > >>> > >> answers inline
> > > > > > > > >>> > >>
> > > > > > > > >>> > >> 2017-09-06 2:23 GMT+02:00 Jia Zhai <zhaijia03@gmail.com>:
> > > > > > > > >>> > >>
> > > > > > > > >>> > >> > Thanks a lot Enrico and Sijie for your comments and information
> > > > > > > > >>> > >> > on this.
> > > > > > > > >>> > >> >
> > > > > > > > >>> > >> > On Tue, Sep 5, 2017 at 9:31 PM, Enrico Olivelli <eolivelli@gmail.com>
> > > > > > > > >>> > >> > wrote:
> > > > > > > > >>> > >> >
> > > > > > > > >>> > >> > > Great to see you working on this !
> > > > > > > > >>> > >> > > It would be great to have such a feature, as it is the first step
> > > > > > > > >>> > >> > > to a 'standalone' BookKeeper mode
> > > > > > > > >>> > >> > >
> > > > > > > > >>> > >> > > Some complementary ideas/first look questions:
> > > > > > > > >>> > >> > > - the document does not talk about security; IMHO we have at least
> > > > > > > > >>> > >> > > to cover authentication and TLS. It would be great to leverage the
> > > > > > > > >>> > >> > > existing AuthPlugins, as they are based on exchanging byte[] (as
> > > > > > > > >>> > >> > > SASL wants)
> > > > > > > > >>> > >> > >
> > > > > > > > >>> > >> > [Jia] It is a good idea. We left out the security part for now for
> > > > > > > > >>> > >> > a few reasons: 1) to keep this BP focused on removing the zookeeper
> > > > > > > > >>> > >> > dependency from the client; 2) it is introduced as a separate
> > > > > > > > >>> > >> > implementation of existing interfaces, so it won't impact the
> > > > > > > > >>> > >> > existing security story. And for sure, we will add the security
> > > > > > > > >>> > >> > part after this.
> > > > > > > > >>> > >> >
> > > > > > > > >>> > >>
> > > > > > > > >>> > >>
> > > > > > > > >>> > >> I am fine with that; I am only afraid that we won't be able to
> > > > > > > > >>> > >> support it in the (near) future.
> > > > > > > > >>> > >> Maybe you could just cite the security story and add some reference
> > > > > > > > >>> > >> to how we would deal with it in the future.
> > > > > > > > >>> > >>
> > > > > > > > >>> > >>
> > > > > > > > >>> > >> The new ledger manager will first be marked as experimental, until
> > > > > > > > >>> > >> it is stable and has the security feature.
> > > > > > > > >>> > >>
> > > > > > > > >>> > >> How does that sound?
> > > > > > > > >>> > >>
> > > > > > > > >>> > >
> > > > > > > > >>> > > Ok
> > > > > > > > >>> > >
> > > > > > > > >>> > >>
> > > > > > > > >>> > >>
> > > > > > > > >>> > >>
> > > > > > > > >>> > >> >
> > > > > > > > >>> > >> > > - do we have some kind of "bootstrap servers list" configuration
> > > > > > > > >>> > >> > > option? Should the list be complete or just a subset of bookies?
> > > > > > > > >>> > >> > > At connection time the client could discover the list of other
> > > > > > > > >>> > >> > > bookies
> > > > > > > > >>> > >> > >
> > > > > > > > >>> > >> > [Jia] Yes, we will have a `clientBootstrapBookies` setting in the
> > > > > > > > >>> > >> > server set. It can be a list of bookies or just a simple DNS name
> > > > > > > > >>> > >> > over the bookies. Will add this to the BP.
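The thread names the `clientBootstrapBookies` setting but not its format. Assuming a comma-separated `host[:port]` list, where an entry may also be a DNS name resolving to several bookies, parsing could look like this sketch. The function names are hypothetical, and the default port 3281 is simply the number mentioned in this thread.

```python
"""Sketch: interpreting a bootstrap setting as a host list or a DNS name."""
import socket
from typing import Callable, List, Optional, Tuple

Endpoint = Tuple[str, int]
DEFAULT_PORT = 3281  # the port number mentioned in this thread; an assumption here


def parse_bootstrap(value: str, default_port: int = DEFAULT_PORT) -> List[Endpoint]:
    """Split "host[:port],host[:port],..." into (host, port) pairs.

    IPv6 literals are not handled in this sketch.
    """
    endpoints: List[Endpoint] = []
    for item in value.split(","):
        item = item.strip()
        if not item:
            continue
        host, sep, port = item.rpartition(":")
        if sep:
            endpoints.append((host, int(port)))
        else:
            endpoints.append((item, default_port))
    return endpoints


def system_resolve(host: str, port: int) -> List[str]:
    """Return the distinct addresses DNS gives for `host`."""
    infos = socket.getaddrinfo(host, port, type=socket.SOCK_STREAM)
    return sorted({info[4][0] for info in infos})


def expand_bootstrap(value: str,
                     resolve: Optional[Callable[[str, int], List[str]]] = None) -> List[Endpoint]:
    """Resolve every entry, so one DNS name can stand for many bookies."""
    resolver = resolve or system_resolve
    return [(address, port)
            for host, port in parse_bootstrap(value)
            for address in resolver(host, port)]
```

Accepting either form keeps small setups simple (a short static list) while letting scheduler-based deployments point at a single DNS name.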
> > > > > > > > >>> > >> >
> > > > > > > > >>> > >> > > - will the client connect to only one bookie at a time? How will
> > > > > > > > >>> > >> > > we deal with errors?
> > > > > > > > >>> > >> > >
> > > > > > > > >>> > >> > [Jia] It will connect to the list of bootstrap servers. gRPC will
> > > > > > > > >>> > >> > load balance the requests and manage the connection errors.
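gRPC provides its own client-side load balancing, so this is not how it would be implemented there. The sketch below, with a hypothetical RoundRobinCaller, only illustrates the behavior described: rotate requests across the bootstrap list and fail over to the next endpoint when a connection error occurs.

```python
"""Sketch: spreading requests over bootstrap endpoints with failover."""
from typing import Callable, List, Sequence, TypeVar

T = TypeVar("T")


class RoundRobinCaller:
    """Hypothetical helper; gRPC has its own load-balancing policies."""

    def __init__(self, endpoints: Sequence[str]) -> None:
        if not endpoints:
            raise ValueError("at least one bootstrap endpoint is required")
        self.endpoints = list(endpoints)
        self._next = 0

    def call(self, rpc: Callable[[str], T]) -> T:
        """Try endpoints starting at the next one in rotation until one succeeds."""
        start = self._next
        self._next = (self._next + 1) % len(self.endpoints)
        errors: List[str] = []
        for offset in range(len(self.endpoints)):
            endpoint = self.endpoints[(start + offset) % len(self.endpoints)]
            try:
                return rpc(endpoint)
            except ConnectionError as exc:  # only transport-level failures are retried
                errors.append(f"{endpoint}: {exc}")
        raise ConnectionError("all bootstrap endpoints failed: " + "; ".join(errors))
```

Only connection-level errors trigger failover here; application errors (for example, a missing ledger) should surface to the caller rather than be retried on another bookie.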
> > > > > > > > >>> > >> >
> > > > > > > > >>> > >> > > - should the bookie write its gRPC endpoint info in the ZK
> > > > > > > > >>> > >> > > metadata? (This will be useful for a bookie to tell the connected
> > > > > > > > >>> > >> > > clients about other bookies.)
> > > > > > > > >>> > >> > >
> > > > > > > > >>> > >> > [Jia] No, it won't. We don't see a strong reason to add it,
> > > > > > > > >>> > >> > especially since we may eventually eliminate zookeeper completely.
> > > > > > > > >>> > >> > It can be a fixed port `3281`, or, in a scheduler-based environment,
> > > > > > > > >>> > >> > it is very easy to have a load balancer sitting in front of those
> > > > > > > > >>> > >> > bookies.
> > > > > > > > >>> > >> >
> > > > > > > > >>> > >>
> > > > > > > > >>> > >> I think a fixed port is not a good way.
> > > > > > > > >>> > >> You will not be able to run more than one bookie on a single host.
> > > > > > > > >>> > >>
> > > > > > > > >>> > >> We should support:
> > > > > > > > >>> > >> - configurable port
> > > > > > > > >>> > >> - ephemeral port for tests
> > > > > > > > >>> > >>
> > > > > > > > >>> > >>
> > > > > > > > >>> > >> I think what Jia means is a configurable port, but a relatively
> > > > > > > > >>> > >> fixed one: the client doesn't discover this port from zookeeper.
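A sketch of the two port options listed above, assuming a hypothetical configuration key `metadataServicePort` with 3281 as its default: a configured port for deployments, and port 0 in tests, where the operating system assigns a free ephemeral port so that several bookies can run on one host.

```python
"""Sketch: a configurable listening port, with 0 meaning "any free port"."""
import socket
from typing import Mapping

DEFAULT_METADATA_PORT = 3281  # value mentioned in the thread; assumed default


def configured_port(conf: Mapping[str, str]) -> int:
    # "metadataServicePort" is a hypothetical key name used only for this sketch.
    return int(conf.get("metadataServicePort", str(DEFAULT_METADATA_PORT)))


def listen(port: int, host: str = "127.0.0.1") -> socket.socket:
    """Bind and listen; port 0 lets the OS pick a free ephemeral port."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    sock.bind((host, port))
    sock.listen()
    return sock


if __name__ == "__main__":
    # Two "bookies" on one host in a test: each asks for port 0 and reports
    # the port the OS actually assigned.
    first, second = listen(0), listen(0)
    print(first.getsockname()[1], second.getsockname()[1])
    first.close()
    second.close()
```

In a test, the actual port must be read back with `getsockname()` and handed to the client, since it is not known in advance.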
> > > > > > > > >>> > >>
> > > > > > > > >>> > >
> > > > > > > > >>> > > Very good
> > > > > > > > >>> > >
> > > > > > > > >>> > >>
> > > > > > > > >>> > >>
> > > > > > > > >>> > >> Ideally I would like to have a local transport option, in order to
> > > > > > > > >>> > >> run in a single JVM, but this is not a blocker: as we are running
> > > > > > > > >>> > >> gRPC on netty it should be feasible, or we can create some kind of
> > > > > > > > >>> > >> short-circuit between the client and the Bookie.
> > > > > > > > >>> > >>
> > > > > > > > >>> > >>
> > > > > > > > >>> > >> gRPC supports an in-process channel, so you don't need to use the
> > > > > > > > >>> > >> low-level netty settings.
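In gRPC Java the in-process transport lives in `io.grpc.inprocess` (`InProcessServerBuilder`, `InProcessChannelBuilder`). The Python sketch below does not use gRPC; it only shows the design idea under stated assumptions: client code written once against a transport interface, with an in-process implementation that calls the bookie-side handler directly and a serializing one standing in for a network hop. All names and message fields are hypothetical.

```python
"""Sketch: the same client code over an in-process or a serializing transport."""
import json
from typing import Any, Callable, Dict, Protocol

Message = Dict[str, Any]
Handler = Callable[[Message], Message]


class Transport(Protocol):
    def send(self, request: Message) -> Message: ...


class InProcessTransport:
    """Calls the bookie-side handler directly: no socket, no serialization."""

    def __init__(self, handler: Handler) -> None:
        self.handler = handler

    def send(self, request: Message) -> Message:
        return self.handler(request)


class SerializingTransport:
    """Stands in for a network hop by round-tripping messages through bytes."""

    def __init__(self, handler: Handler) -> None:
        self.handler = handler

    def send(self, request: Message) -> Message:
        wire_request = json.dumps(request).encode("utf-8")
        response = self.handler(json.loads(wire_request))
        wire_response = json.dumps(response).encode("utf-8")
        return json.loads(wire_response)


def read_ledger_metadata(transport: Transport, ledger_id: int) -> Message:
    # Client logic is written once against the Transport protocol.
    return transport.send({"op": "read", "ledgerId": ledger_id})


def bookie_handler(request: Message) -> Message:
    # Hypothetical bookie-side handler returning made-up metadata fields.
    return {"ledgerId": request["ledgerId"], "ensembleSize": 3}
```

Because both transports satisfy the same interface, a single-JVM setup can swap in the direct path without touching client logic.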
> > > > > > > > >>> > >>
> > > > > > > > >>> > >
> > > > > > > > >>> > > Great
> > > > > > > > >>> > >
> > > > > > > > >>> > > So it sounds all good to me thanks
> > > > > > > > >>> > >
> > > > > > > > >>> > > Enrico
> > > > > > > > >>> > >
> > > > > > > > >>> > >
> > > > > > > > >>> > >>
> > > > > > > > >>> > >> I am OK with not writing this to the bookie metadata, leaving it up
> > > > > > > > >>> > >> to the client to have a configured list of bookies enabled for
> > > > > > > > >>> > >> metadata operations.
> > > > > > > > >>> > >>
> > > > > > > > >>> > >>
> > > > > > > > >>> > >>
> > > > > > > > >>> > >>
> > > > > > > > >>> > >> >
> > > > > > > > >>> > >> > > - the bookie will be somehow a proxy for zookeeper. I think that
> > > > > > > > >>> > >> > > the 'watch' part is the most complex; we will have to deal with
> > > > > > > > >>> > >> > > reconnections, errors... maybe it is worth writing more detail
> > > > > > > > >>> > >> > > about this
> > > > > > > > >>> > >> > >
> > > > > > > > >>> > >> > [Jia] The `watch` API uses the `streaming` RPC in gRPC. It is a
> > > > > > > > >>> > >> > straightforward proxy behavior: if a connection is broken, the
> > > > > > > > >>> > >> > client will simply retry watching again.
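A sketch of "simply retry watching again", assuming, beyond what the thread specifies, that watch events carry a monotonically increasing version and that a stream can be reopened from a given version, so replayed events are skipped. The exponential backoff and retry limit are added design choices, and `open_stream` is hypothetical.

```python
"""Sketch: re-subscribing a watch stream after the connection breaks."""
import time
from typing import Callable, Iterator, Tuple

Event = Tuple[int, str]  # (version, payload); the version field is an assumption


def watch_with_retry(open_stream: Callable[[int], Iterator[Event]],
                     on_event: Callable[[Event], None],
                     max_retries: int = 5,
                     base_delay: float = 0.1) -> None:
    """Consume a watch stream, re-opening it on ConnectionError.

    `open_stream(since)` is a hypothetical call that starts a new stream at or
    after version `since`; events already seen are skipped, so a replay after
    reconnecting does not deliver duplicates.
    """
    last_version = 0
    failures = 0
    while True:
        try:
            for version, payload in open_stream(last_version):
                if version <= last_version:
                    continue  # replayed after a reconnect
                last_version = version
                failures = 0  # progress was made, reset the backoff
                on_event((version, payload))
            return  # the server closed the stream normally
        except ConnectionError:
            failures += 1
            if failures > max_retries:
                raise
            time.sleep(base_delay * 2 ** (failures - 1))  # exponential backoff
```

Resuming from the last seen version avoids the "reread the whole list to rewatch" pattern criticized earlier in the thread, at the cost of the server tracking versions.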
> > > > > > > > >>> > >> >
> > > > > > > > >>> > >> >
> > > > > > > > >>> > >> > > Minor issues:
> > > > > > > > >>> > >> > > - Maybe you can consider using ledgerId and not ledger_id, like in
> > > > > > > > >>> > >> > > LedgerMetadataFormat we are using lastEntryId
> > > > > > > > >>> > >> > >
> > > > > > > > >>> > >> > [Jia] Thanks. It is the protobuf style: protobuf will convert
> > > > > > > > >>> > >> > `ledger_id` to `ledgerId`. We don't need to worry about this.
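For reference, the naming rule Jia describes can be sketched as follows: protobuf derives the JSON name and the generated Java accessor from a snake_case field name by dropping each underscore and upper-casing the following letter. This is a simplified approximation of what the protobuf code generators do, not their implementation.

```python
"""Sketch: how a snake_case proto field name maps to camelCase names."""


def json_name(field_name: str) -> str:
    """lowerCamelCase name: drop each '_' and upper-case the letter after it."""
    out = []
    upper_next = False
    for ch in field_name:
        if ch == "_":
            upper_next = True
        elif upper_next:
            out.append(ch.upper())
            upper_next = False
        else:
            out.append(ch)
    return "".join(out)


def java_getter(field_name: str) -> str:
    """Accessor name in generated Java code, e.g. getLedgerId()."""
    name = json_name(field_name)
    return "get" + name[:1].upper() + name[1:]
```

So `ledger_id` in the .proto file surfaces to Java users as `getLedgerId()`, which is why the snake_case spelling in the proposal does not leak into the client API.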
> > > > > > > > >>> > >> >
> > > > > > > > >>> > >>
> > > > > > > > >>> > >> got it, thanks
> > > > > > > > >>> > >>
> > > > > > > > >>> > >>
> > > > > > > > >>> > >> >
> > > > > > > > >>> > >> >
> > > > > > > > >>> > >> > > - In the "motivation" part you write that having more clients than the
> > > > > > > > >>> > >> > > number of bookies would be a problem for zookeeper; actually zookeeper is
> > > > > > > > >>> > >> > > very good at dealing with a huge number of clients. I am always running
> > > > > > > > >>> > >> > > clusters with 3-5 bookies and 10-100 writing clients and this has never
> > > > > > > > >>> > >> > > given trouble
> > > > > > > > >>> > >> >
> > > > > > > > >>> > >> > [Jia] :) It seems “10-100 writing clients” is not “a huge number of clients”.
> > > > > > > > >>> > >> >
> > > > > > > > >>> > >>
> > > > > > > > >>> > >> OK, I agree with you and Sijie, I have no experience of larger clusters
> > > > > > > > >>> > >>
> > > > > > > > >>> > >>
> > > > > > > > >>> > >> >
> > > > > > > > >>> > >> > >
> > > > > > > > >>> > >> >
> > > > > > > > >>> > >> >
> > > > > > > > >>> > >> >
> > > > > > > > >>> > >> > > Future:
> > > > > > > > >>> > >> > > - as bookies will be proxies, maybe we should take care not to overwhelm
> > > > > > > > >>> > >> > > a bookie with too many clients
> > > > > > > > >>> > >> > >
> > > > > > > > >>> > >> > [Jia] First, gRPC is based on Netty and the protocol is HTTP/2, so the
> > > > > > > > >>> > >> > connection is multiplexed. We don’t need to worry about connection count.
> > > > > > > > >>> > >> > Second, all the bookies are treated equally for the metadata operations;
> > > > > > > > >>> > >> > gRPC will load balance the requests across the bookies. We don’t need to
> > > > > > > > >>> > >> > worry about some bookies being overwhelmed.
> > > > > > > > >>> > >> >
> > > > > > > > >>> > >>
> > > > > > > > >>> > >> gRPC sounds great
> > > > > > > > >>> > >>
> > > > > > > > >>> > >>
> > > > > > > > >>> > >> >
> > > > > > > > >>> > >> >
> > > > > > > > >>> > >> > > - iteration on ledgers: sometimes the client enumerates ledgers but it is
> > > > > > > > >>> > >> > > not interested in having all of them. As we are using the bookie as a
> > > > > > > > >>> > >> > > proxy, maybe some kind of "filter" (at least on custom metadata) could be
> > > > > > > > >>> > >> > > created to limit the number of returned items. Another point: I don't
> > > > > > > > >>> > >> > > know gRPC, but it does not seem to be very clear how to 'stop' the
> > > > > > > > >>> > >> > > iteration
> > > > > > > > >>> > >> > >
> > > > > > > > >>> > >> > [Jia] Thanks, we can add it later. For now, we would like to focus on
> > > > > > > > >>> > >> > adding the features the ledger manager needs.
> > > > > > > > >>> > >> >
> > > > > > > > >>> > >>
> > > > > > > > >>> > >> Yup
> > > > > > > > >>> > >>
> > > > > > > > >>> > >> -- Enrico
> > > > > > > > >>> > >>
> > > > > > > > >>> > >>
> > > > > > > > >>> > >> >
> > > > > > > > >>> > >> > >
> > > > > > > > >>> > >> > > -- Enrico
> > > > > > > > >>> > >> > >
> > > > > > > > >>> > >> > >
> > > > > > > > >>> > >> > > 2017-09-05 15:10 GMT+02:00 Jia Zhai <zhaijia03@gmail.com>:
> > > > > > > > >>> > >> > >
> > > > > > > > >>> > >> > > > Hi all,
> > > > > > > > >>> > >> > > >
> > > > > > > > >>> > >> > > > I have just posted a proposal to remove zookeeper dependency from
> > > > > > > > >>> > >> > > > bookkeeper client, to make bookkeeper client a thin client:
> > > > > > > > >>> > >> > > >
> > > > > > > > >>> > >> > > > https://cwiki.apache.org/confluence/display/BOOKKEEPER/BP-16%3A+remove+zookeeper+dependency+from+bookkeeper+client
> > > > > > > > >>> > >> > > >
> > > > > > > > >>> > >> > > >
> > > > > > > > >>> > >> > > > BookKeeper uses zookeeper for service discovery (discovering the available
> > > > > > > > >>> > >> > > > bookies in the cluster), metadata management (storing all the metadata for
> > > > > > > > >>> > >> > > > ledgers). However it exposes the metadata storage directly to the clients,
> > > > > > > > >>> > >> > > > making bookkeeper client a very thick client. It also exposes some problems.
> > > > > > > > >>> > >> > > >
> > > > > > > > >>> > >> > > > This BP explores the possibility of eliminating zookeeper completely from
> > > > > > > > >>> > >> > > > client side, to produce a thin bookkeeper client.
> > > > > > > > >>> > >> > > >
> > > > > > > > >>> > >> > > > I will send a patch as soon as we agree on the proposal.
> > > > > > > > >>> > >> > > >
> > > > > > > > >>> > >> > > >
> > > > > > > > >>> > >> > > > Thanks.
> > > > > > > > >>> > >> > > >
> > > > > > > > >>> > >> > > > -Jia
> > > > > > > > >>> > >> > > >
> > > > > > > > >>> > >> > >
> > > > > > > > >>> > >> >
> > > > > > > > >>> > >>
> > > > > > > > >>> > > --
> > > > > > > > >>> > >
> > > > > > > > >>> > >
> > > > > > > > >>> > > -- Enrico Olivelli
> > > > > > > > >>> > >
> > > > > > > > >>> > --
> > > > > > > > >>> >
> > > > > > > > >>> >
> > > > > > > > >>> > -- Enrico Olivelli
> > > > > > > > >>> >
> > > > > > > > >>>
> > > > > > > > >>
> > > > > > > > >>
> > > > > > > > >
> > > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> > --
> >
> >
> > -- Enrico Olivelli
> >
>
>
>
> --
> Jvrao
> ---
> First they ignore you, then they laugh at you, then they fight you, then
> you win. - Mahatma Gandhi
>
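
Jia's reply about `ledger_id` versus `ledgerId` in the thread above relies on protobuf's naming rule: fields are declared in snake_case in the `.proto` file, while the generated Java accessors and the default JSON names use lowerCamelCase. A minimal Python sketch of that conversion, for illustration only; `to_lower_camel` is a hypothetical helper, not part of the protobuf API:

```python
def to_lower_camel(field_name: str) -> str:
    """Convert a snake_case proto field name to lowerCamelCase.

    Sketch of protobuf's default JSON-name rule: each underscore is
    dropped and the character after it is upper-cased.
    """
    parts = []
    capitalize_next = False
    for ch in field_name:
        if ch == "_":
            capitalize_next = True      # skip the underscore itself
        elif capitalize_next:
            parts.append(ch.upper())    # first letter of the next word
            capitalize_next = False
        else:
            parts.append(ch)
    return "".join(parts)


print(to_lower_camel("ledger_id"))
print(to_lower_camel("last_entry_id"))
```

Because the mapping is mechanical, the `.proto` file can keep protobuf's snake_case style while Java code still sees an accessor such as `getLedgerId()`.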

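Jia describes the `watch` API as a gRPC streaming RPC that the client simply re-opens when the connection breaks. The loop below is a minimal sketch of that client-side behavior under stated assumptions: `open_watch` is a hypothetical stand-in for starting the streaming call (no gRPC library is used), a broken stream is modelled as a `ConnectionError`, and the retry limit and linear backoff are our own choices, not part of BP-16.

```python
import time
from typing import Callable, Iterable


def watch_with_retry(
    open_watch: Callable[[], Iterable[str]],
    on_event: Callable[[str], None],
    max_attempts: int = 5,
    backoff_seconds: float = 0.1,
) -> int:
    """Consume a watch stream, re-opening it whenever the connection breaks.

    `open_watch` is a hypothetical stand-in for starting a server-streaming
    RPC; it returns an iterable of change notifications. Returns the number
    of attempts that were needed.
    """
    attempt = 0
    while True:
        attempt += 1
        try:
            for event in open_watch():
                on_event(event)
            return attempt  # the server closed the stream cleanly
        except ConnectionError:
            if attempt >= max_attempts:
                raise  # give up and surface the error to the caller
            # Broken stream: back off a little, then simply watch again.
            time.sleep(backoff_seconds * attempt)


# Usage with a fake stream that breaks once after delivering one event.
calls = {"n": 0}


def flaky_watch():
    calls["n"] += 1
    yield f"ledger 7 changed (stream {calls['n']})"
    if calls["n"] == 1:
        raise ConnectionError("stream reset")


events = []
attempts = watch_with_retry(flaky_watch, events.append, backoff_seconds=0.0)
print(attempts, events)
```

After re-subscribing, a real client may see notifications it has already processed, so it would likely need to de-duplicate, for example by comparing metadata versions; that detail is left out of this sketch.
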
Re: [DISCUSS] BP-16: remove zookeeper dependency from bookkeeper client

Posted by Venkateswara Rao Jujjuri <ju...@gmail.com>.
If the real use of this work item is to use:
- Only one bookie
- Bookie is the metadata server too

Maybe it is OK. But I am worried about the *magnitude* of the code refactor
needed to accommodate this. I really like the idea of finding an alternative
to ZK, but I think we need to have that direction ironed out before rushing
with the first step without knowing what the next step is. Again, I am
worried about the code refactor and the resulting regressions.

Removing ZK is one part, but changing from a centralized metadata server
(ZK/etcd kind) model to distributing metadata across bookies in the cluster
is a HUGE leap. We are looking at implementing our own version of Paxos.

It is possible I am not understanding the full intent here, and I would love
to have more discussion. Can we discuss this again on the Thursday call?

Thanks,
JV



On Sat, Sep 16, 2017 at 3:17 AM, Enrico Olivelli <eo...@gmail.com>
wrote:

> Yep
> Thank you Jia
>
> Enrico
>
> On Sat, Sep 16, 2017, 12:10 Jia Zhai <zh...@gmail.com> wrote:
>
> > Since there is no objection, I would like to get this BP approved.
> >
> > On Wed, Sep 13, 2017 at 4:24 PM, Sijie Guo <gu...@gmail.com> wrote:
> >
> > > On Wed, Sep 13, 2017 at 1:18 AM, Enrico Olivelli <eo...@gmail.com>
> > > wrote:
> > >
> > > > 2017-09-13 10:10 GMT+02:00 Sijie Guo <gu...@gmail.com>:
> > > >
> > > > > On Wed, Sep 13, 2017 at 12:16 AM, Enrico Olivelli <eolivelli@gmail.com>
> > > > > wrote:
> > > > >
> > > > > > I think that this is a good direction to go.
> > > > > >
> > > > > > I trust the reasons given about ZK in huge systems, even if it is not my
> > > > > > case, so I cannot comment on this use case.
> > > > > >
> > > > > > I am fine with the direction as long as we are still going to support
> > > > > > ZooKeeper.
> > > > > > BookKeeper is in the Hadoop / ZooKeeper ecosystem and several products rely
> > > > > > on ZK too; for instance, in my systems it is usual to have
> > > > > > BookKeeper/Kafka/HBase/Majordodo... and so I am not going to live without
> > > > > > zookeeper in the short/mid term.
> > > > > >
> > > > > > I am really OK with dropping ZK, because for "simple" systems, when you
> > > > > > need only BK, having the burden of setting up a zookeeper server is weird
> > > > > > for customers. I usually re-distribute BK + ZK with my applications and we
> > > > > > are talking about little clusters of up to 10 machines.
> > > > > >
> > > > >
> > > > > Just to clarify - we are not dropping ZK here. We are just proposing to
> > > > > have a ledger manager implementation that doesn't depend on zookeeper
> > > > > directly.
> > > > > We are not modifying any existing ledger manager implementation.
> > > > >
> > > >
> > > >
> > > > Yep, we are on the same page:
> > > > for this proposal the bookie will be a sort of "proxy" between the client
> > > > and the actual ledger manager implementation, which will "live" inside the
> > > > bookie. It is only a new ledger manager to be used in clients; this ledger
> > > > manager will issue RPCs (or kind of "streaming" RPCs) to a list of bookies
> > > >
> > > >
> > > > >
> > > > >
> > > > > >
> > > > > > The direction of this proposal is OK for me and it is very much like the
> > > > > > work I was starting on "standalone mode".
> > > > >
> > > > >
> > > > > > I think it will be very easy to support the case of having a single bookie
> > > > > > with this approach, or even client + bookie in the same JVM.
> > > > > > Having multiple bookies will require us to add some other coordination
> > > > > > facility between bookies. I would like to know if there is already some
> > > > > > idea about this: are we going to use another product like etcd or jgroups,
> > > > > > or implement our own coordination protocol?
> > > > >
> > > > >
> > > > > We are not replacing A with B, even with zookeeper. The ledger management
> > > > > is already abstracted in interfaces.
> > > > > The users can use whatever system they prefer as the metadata store.
> > > > >
> > > > > Our direction is to provide an option to store metadata as well as data in
> > > > > bookies. So with this option, no external metadata storage is needed.
> > > > >
> > > >
> > > > Sorry. Maybe my curiosity is not clear.
> > > > If you have multiple bookies and each bookie holds its own version of
> > > > metadata, how do you coordinate them? Which will be the source of truth?
> > > > Maybe we should start a new email thread in the future to talk about
> > > > "alternative distributed metadata storages"
> > > >
> > >
> > > It is out of the scope of this BP. We will have a next BP to cover this
> > > part.
> > >
> > >
> > >
> > >
> > > >
> > > > Anyway, the meaning and the scope of the proposal are clear to me and I am
> > > > really OK with it. I hope it will get approved soon.
> > > >
> > > > -- Enrico
> > > >
> > > >
> > > > >
> > > > >
> > > > > > ZK is simple but it is very effective.
> > > > > > Maybe we could help the ZK community to move forward and resolve
> > > > > > the problems we are bringing to light
> > > > > >
> > > > > >
> > > > > > Enrico
> > > > > >
> > > > > >
> > > > > > 2017-09-13 3:15 GMT+02:00 Jia Zhai <zh...@gmail.com>:
> > > > > >
> > > > > > > Any thoughts or comments? :)
> > > > > > >
> > > > > > > Thanks a lot.
> > > > > > > -Jia
> > > > > > >
> > > > > > > On Tue, Sep 12, 2017 at 4:30 PM, Jia Zhai <zhaijia03@gmail.com>
> > > > > > > wrote:
> > > > > > >
> > > > > > > > This blog post also touches a little on the limitations of zookeeper in
> > > > > > > > bookkeeper:
> > > > > > > > https://bitworks.software/blog/en/2017-07-12-replicated-scalable-commitlog-with-apachebookkeeper.html
> > > > > > > >
> > > > > > > > On Thu, Sep 7, 2017 at 9:45 AM, Jia Zhai <zhaijia03@gmail.com>
> > > > > > > > wrote:
> > > > > > > >
> > > > > > > >> 👍. Thanks a lot for the suggestions and feedback.
> > > > > > > >>
> > > > > > > >> On Thu, Sep 7, 2017 at 4:24 AM, Sijie Guo <guosijie@gmail.com>
> > > > > > > >> wrote:
> > > > > > > >>
> > > > > > > >>> On Wed, Sep 6, 2017 at 1:07 PM, Enrico Olivelli <eolivelli@gmail.com>
> > > > > > > >>> wrote:
> > > > > > > >>>
> > > > > > > >>> > Off topic curiosity... Jia and Sijie, do you think we are going to drop ZK
> > > > > > > >>> > from DL too?
> > > > > > > >>> >
> > > > > > > >>>
> > > > > > > >>> Yes. That's the goal: 1) for large deployments, we are trying to overcome
> > > > > > > >>> the limitation of zookeeper; 2) for smaller deployments, it will make
> > > > > > > >>> deployment much easier: you just need to deploy a cluster of bookies. Once
> > > > > > > >>> it is done, you can use the ledger API or the log stream API to access the
> > > > > > > >>> bookkeeper cluster.
> > > > > > > >>>
> > > > > > > >>> Both DL and BK have pluggable metadata storage. They have very clear
> > > > > > > >>> interfaces defining metadata operations, so it is straightforward to use
> > > > > > > >>> a different metadata storage.
> > > > > > > >>>
> > > > > > > >>>
> > > > > > > >>> > Enrico
> > > > > > > >>> >
> > > > > > > >>> > On Wed, Sep 6, 2017, 19:51 Enrico Olivelli <eolivelli@gmail.com> wrote:
> > > > > > > >>> >
> > > > > > > >>> > >
> > > > > > > >>> > >
> > > > > > > >>> > > On Wed, Sep 6, 2017, 18:25 Sijie Guo <guosijie@gmail.com> wrote:
> > > > > > > >>> > >
> > > > > > > >>> > >> On Sep 6, 2017 4:57 AM, "Enrico Olivelli" <eolivelli@gmail.com> wrote:
> > > > > > > >>> > >>
> > > > > > > >>> > >> Thank you Sijie and Jia for your comments and explanations,
> > > > > > > >>> > >> answers inline
> > > > > > > >>> > >>
> > > > > > > >>> > >> 2017-09-06 2:23 GMT+02:00 Jia Zhai <zhaijia03@gmail.com>:
> > > > > > > >>> > >>
> > > > > > > >>> > >> > Thanks a lot Enrico and Sijie for your comments and information on this.
> > > > > > > >>> > >> >
> > > > > > > >>> > >> > On Tue, Sep 5, 2017 at 9:31 PM, Enrico Olivelli <eolivelli@gmail.com>
> > > > > > > >>> > >> > wrote:
> > > > > > > >>> > >> >
> > > > > > > >>> > >> > > Great to see you working on this!
> > > > > > > >>> > >> > > It would be great to have such a feature, as it is the first step to a
> > > > > > > >>> > >> > > 'standalone' BookKeeper mode
> > > > > > > >>> > >> > >
> > > > > > > >>> > >> > > Some complementary ideas/first look questions:
> > > > > > > >>> > >> > > - the document does not talk about security; IMHO we have at least to
> > > > > > > >>> > >> > > cover authentication and TLS. It would be great to leverage the existing
> > > > > > > >>> > >> > > AuthPlugins, as they are based on exchanging byte[] (as SASL wants)
> > > > > > > >>> > >> > >
> > > > > > > >>> > >> > [Jia] It is a good idea. We left out the security part for now for a few
> > > > > > > >>> > >> > reasons: 1) to keep this BP focused on removing zookeeper dependencies from
> > > > > > > >>> > >> > the client; 2) it is introduced as a separate implementation of existing
> > > > > > > >>> > >> > interfaces, so it won’t impact the existing security story. And for sure, we
> > > > > > > >>> > >> > will add the security part later, after this.
> > > > > > > >>> > >> >
> > > > > > > >>> > >>
> > > > > > > >>> > >>
> > > > > > > >>> > >> I am fine, I am only afraid that we won't be able to support it in the
> > > > > > > >>> > >> (near) future. Maybe you could just mention the security story and add some
> > > > > > > >>> > >> reference to how we would deal with it in the future.
> > > > > > > >>> > >>
> > > > > > > >>> > >>
> > > > > > > >>> > >> The new ledger manager will first be marked as experimental, until it is
> > > > > > > >>> > >> stable and has the security feature.
> > > > > > > >>> > >>
> > > > > > > >>> > >> How does that sound?
> > > > > > > >>> > >>
> > > > > > > >>> > >
> > > > > > > >>> > > Ok
> > > > > > > >>> > >
> > > > > > > >>> > >>
> > > > > > > >>> > >>
> > > > > > > >>> > >>
> > > > > > > >>> > >> >
> > > > > > > >>> > >> > > - do we have some kind of "bootstrap servers list" configuration option?
> > > > > > > >>> > >> > > Should the list be complete or just a subset of bookies? At connection
> > > > > > > >>> > >> > > time the client could discover the list of other bookies
> > > > > > > >>> > >> > >
> > > > > > > >>> > >> > [Jia] Yes, we will have a `clientBootstrapBookies` setting in the server
> > > > > > > >>> > >> > set. It can be a list of bookies or just simply a DNS name over the bookies.
> > > > > > > >>> > >> > Will add this to the BP
> > > > > > > >>> > >> >
> > > > > > > >>> > >> > > - will the client connect to only one bookie at a time? How will we deal
> > > > > > > >>> > >> > > with errors?
> > > > > > > >>> > >> > >
> > > > > > > >>> > >> > [Jia] It will connect to the list of bootstrap servers. gRPC will load
> > > > > > > >>> > >> > balance the requests and manage the connection errors.
> > > > > > > >>> > >> >
> > > > > > > >>> > >> > > - should the bookie write its gRPC endpoint info on the ZK metadata? (this
> > > > > > > >>> > >> > > will be useful for a bookie to tell the connected clients about other
> > > > > > > >>> > >> > > bookies)
> > > > > > > >>> > >> > >
> > > > > > > >>> > >> > [Jia] No, it won’t. We don’t see a strong reason to add it, especially since
> > > > > > > >>> > >> > eventually we may eliminate zookeeper completely.
> > > > > > > >>> > >> > It can be a fixed port `3281`, or in a scheduler-based environment, it is
> > > > > > > >>> > >> > very easy to have a load balancer sitting in front of those bookies.
> > > > > > > >>> > >> >
> > > > > > > >>> > >>
> > > > > > > >>> > >> I think a fixed port is not a good way.
> > > > > > > >>> > >> You will not be able to run more than one bookie on a single host.
> > > > > > > >>> > >>
> > > > > > > >>> > >> We should support:
> > > > > > > >>> > >> - configurable port
> > > > > > > >>> > >> - ephemeral port for tests
> > > > > > > >>> > >>
> > > > > > > >>> > >>
> > > > > > > >>> > >> I think what Jia means is a configurable port, but it is a relatively fixed
> > > > > > > >>> > >> port, which the client doesn't discover from zookeeper.
> > > > > > > >>> > >>
> > > > > > > >>> > >
> > > > > > > >>> > > Very good
> > > > > > > >>> > >
> > > > > > > >>> > >>
> > > > > > > >>> > >>
> > > > > > > >>> > >> Ideally I would like to have the local transport option, in order to have
> > > > > > > >>> > >> a single JVM, but this is not a blocker problem: as we are running gRPC on
> > > > > > > >>> > >> netty it should be feasible, or we can create some kind of short-circuit
> > > > > > > >>> > >> between the client and the Bookie
> > > > > > > >>> > >>
> > > > > > > >>> > >>
> > > > > > > >>> > >> gRPC supports an in-process channel, so you don't need to use the
> > > > > > > >>> > >> low-level netty settings.
> > > > > > > >>> > >>
> > > > > > > >>> > >
> > > > > > > >>> > > Great
> > > > > > > >>> > >
> > > > > > > >>> > > So it sounds all good to me thanks
> > > > > > > >>> > >
> > > > > > > >>> > > Enrico
> > > > > > > >>> > >
> > > > > > > >>> > >
> > > > > > > >>> > >>
> > > > > > > >>> > >> I am OK with not writing this to the bookie metadata, leaving it up to
> > > > > > > >>> > >> the client to have a configured list of bookies enabled for metadata
> > > > > > > >>> > >> operations
> > > > > > > >>> > >>
> > > > > > > >>> > >>
> > > > > > > >>> > >>
> > > > > > > >>> > >>
> > > > > > > >>> > >> >
> > > > > > > >>> > >> > > - the bookie will be somehow a proxy for zookeeper, I think that the
> > > > > > > >>> > >> > > 'watch' part is the more complex one, we will have to deal with
> > > > > > > >>> > >> > > reconnections, errors... maybe it is worth writing more detail about this
> > > > > > > >>> > >> > >
> > > > > > > >>> > >> > [Jia] The `watch` API uses a `streaming` RPC in gRPC. It is a
> > > > > > > >>> > >> > straightforward proxy behavior: if a connection is broken, the client
> > > > > > > >>> > >> > will simply retry watching again.
> > > > > > > >>> > >> >
> > > > > > > >>> > >> >
> > > > > > > >>> > >> > > Minor issues:
> > > > > > > >>> > >> > > - Maybe you can consider using ledgerId and not ledger_id, as in
> > > > > > > >>> > >> > > LedgerMetadataFormat where we are using lastEntryId
> > > > > > > >>> > >> > >
> > > > > > > >>> > >> > [Jia] Thanks. It is the protobuf style: protobuf will convert `ledger_id`
> > > > > > > >>> > >> > to `ledgerId`. We don’t need to worry about this.
> > > > > > > >>> > >> >
> > > > > > > >>> > >>
> > > > > > > >>> > >> got it, thanks
> > > > > > > >>> > >>
> > > > > > > >>> > >>
> > > > > > > >>> > >> >
> > > > > > > >>> > >> >
> > > > > > > >>> > >> > > - In the "motivation" part you write that having more clients than the
> > > > > > > >>> > >> > > number of bookies would be a problem for zookeeper; actually zookeeper is
> > > > > > > >>> > >> > > very good at dealing with a huge number of clients. I am always running
> > > > > > > >>> > >> > > clusters with 3-5 bookies and 10-100 writing clients and this has never
> > > > > > > >>> > >> > > given trouble
> > > > > > > >>> > >> >
> > > > > > > >>> > >> > [Jia] :) It seems “10-100 writing clients” is not “a huge number of clients”.
> > > > > > > >>> > >> >
> > > > > > > >>> > >>
> > > > > > > >>> > >> OK, I agree with you and Sijie, I have no experience of larger clusters
> > > > > > > >>> > >>
> > > > > > > >>> > >>
> > > > > > > >>> > >> >
> > > > > > > >>> > >> > >
> > > > > > > >>> > >> >
> > > > > > > >>> > >> >
> > > > > > > >>> > >> >
> > > > > > > >>> > >> > > Future:
> > > > > > > >>> > >> > > - as bookies will be proxies, maybe we should take care not to overwhelm
> > > > > > > >>> > >> > > a bookie with too many clients
> > > > > > > >>> > >> > >
> > > > > > > >>> > >> > [Jia] First, gRPC is based on Netty and the protocol is HTTP/2, so the
> > > > > > > >>> > >> > connection is multiplexed. We don’t need to worry about connection count.
> > > > > > > >>> > >> > Second, all the bookies are treated equally for the metadata operations;
> > > > > > > >>> > >> > gRPC will load balance the requests across the bookies. We don’t need to
> > > > > > > >>> > >> > worry about some bookies being overwhelmed.
> > > > > > > >>> > >> >
> > > > > > > >>> > >>
> > > > > > > >>> > >> gRPC sounds great
> > > > > > > >>> > >>
> > > > > > > >>> > >>
> > > > > > > >>> > >> >
> > > > > > > >>> > >> >
> > > > > > > >>> > >> > > - iteration on ledgers: sometimes the client enumerates ledgers but it is
> > > > > > > >>> > >> > > not interested in having all of them. As we are using the bookie as a
> > > > > > > >>> > >> > > proxy, maybe some kind of "filter" (at least on custom metadata) could be
> > > > > > > >>> > >> > > created to limit the number of returned items. Another point: I don't
> > > > > > > >>> > >> > > know gRPC, but it does not seem to be very clear how to 'stop' the
> > > > > > > >>> > >> > > iteration
> > > > > > > >>> > >> > >
> > > > > > > >>> > >> > [Jia] Thanks, we can add it later. For now, we would like to focus on
> > > > > > > >>> > >> > adding the features the ledger manager needs.
> > > > > > > >>> > >> >
> > > > > > > >>> > >>
> > > > > > > >>> > >> Yup
> > > > > > > >>> > >>
> > > > > > > >>> > >> -- Enrico
> > > > > > > >>> > >>
> > > > > > > >>> > >>
> > > > > > > >>> > >> >
> > > > > > > >>> > >> > >
> > > > > > > >>> > >> > > -- Enrico
> > > > > > > >>> > >> > >
> > > > > > > >>> > >> > >
> > > > > > > >>> > >> > > 2017-09-05 15:10 GMT+02:00 Jia Zhai <zhaijia03@gmail.com>:
> > > > > > > >>> > >> > >
> > > > > > > >>> > >> > > > Hi all,
> > > > > > > >>> > >> > > >
> > > > > > > >>> > >> > > > I have just posted a proposal to remove zookeeper dependency from
> > > > > > > >>> > >> > > > bookkeeper client, to make bookkeeper client a thin client:
> > > > > > > >>> > >> > > >
> > > > > > > >>> > >> > > > https://cwiki.apache.org/confluence/display/BOOKKEEPER/BP-16%3A+remove+zookeeper+dependency+from+bookkeeper+client
> > > > > > > >>> > >> > > >
> > > > > > > >>> > >> > > >
> > > > > > > >>> > >> > > > BookKeeper uses zookeeper for service discovery (discovering the available
> > > > > > > >>> > >> > > > bookies in the cluster), metadata management (storing all the metadata for
> > > > > > > >>> > >> > > > ledgers). However it exposes the metadata storage directly to the clients,
> > > > > > > >>> > >> > > > making bookkeeper client a very thick client. It also exposes some problems.
> > > > > > > >>> > >> > > >
> > > > > > > >>> > >> > > > This BP explores the possibility of eliminating zookeeper completely from
> > > > > > > >>> > >> > > > client side, to produce a thin bookkeeper client.
> > > > > > > >>> > >> > > >
> > > > > > > >>> > >> > > > I will send a patch as soon as we agree on the proposal.
> > > > > > > >>> > >> > > >
> > > > > > > >>> > >> > > >
> > > > > > > >>> > >> > > > Thanks.
> > > > > > > >>> > >> > > >
> > > > > > > >>> > >> > > > -Jia
> > > > > > > >>> > >> > > >
> > > > > > > >>> > >> > >
> > > > > > > >>> > >> >
> > > > > > > >>> > >>
> > > > > > > >>> > > --
> > > > > > > >>> > >
> > > > > > > >>> > >
> > > > > > > >>> > > -- Enrico Olivelli
> > > > > > > >>> > >
> > > > > > > >>> > --
> > > > > > > >>> >
> > > > > > > >>> >
> > > > > > > >>> > -- Enrico Olivelli
> > > > > > > >>> >
> > > > > > > >>>
> > > > > > > >>
> > > > > > > >>
> > > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
> --
>
>
> -- Enrico Olivelli
>



-- 
Jvrao
---
First they ignore you, then they laugh at you, then they fight you, then
you win. - Mahatma Gandhi
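
Sijie's point in the quoted thread that metadata storage is pluggable behind clear interfaces, together with Jia's answers about the `clientBootstrapBookies` list and gRPC spreading requests across bookies, can be pictured with the small Python sketch below. Everything in it is hypothetical: `LedgerMetadataStore` is not BookKeeper's real `LedgerManager` interface, the comma-separated `host:port` bootstrap format is assumed, and the `call` function stands in for the RPC to a bookie. In the proposal gRPC itself would do the balancing; the round-robin with failover is written out by hand only to make the behavior visible.

```python
from abc import ABC, abstractmethod
from itertools import cycle
from typing import Callable, Dict, List

Metadata = Dict[str, int]


class LedgerMetadataStore(ABC):
    """Hypothetical client-side metadata interface (illustration only)."""

    @abstractmethod
    def read(self, ledger_id: int) -> Metadata:
        ...


class InMemoryStore(LedgerMetadataStore):
    """Stands in for an existing implementation, e.g. one backed by ZooKeeper."""

    def __init__(self, data: Dict[int, Metadata]):
        self._data = data

    def read(self, ledger_id: int) -> Metadata:
        return self._data[ledger_id]


class BookieProxyStore(LedgerMetadataStore):
    """Thin-client store: sends each read to a bootstrap bookie, round-robin,
    and fails over to the next bookie when a call raises ConnectionError."""

    def __init__(self, bootstrap: str, call: Callable[[str, int], Metadata]):
        # Assumed format, e.g. "bookie-1:3281,bookie-2:3281".
        self._bookies: List[str] = [b.strip() for b in bootstrap.split(",") if b.strip()]
        if not self._bookies:
            raise ValueError("empty bootstrap bookie list")
        self._ring = cycle(self._bookies)
        self._call = call  # stands in for the metadata RPC to one bookie

    def read(self, ledger_id: int) -> Metadata:
        last_error = None
        for _ in range(len(self._bookies)):  # try each bookie at most once
            bookie = next(self._ring)
            try:
                return self._call(bookie, ledger_id)
            except ConnectionError as err:
                last_error = err
        raise ConnectionError("all bootstrap bookies failed") from last_error


# Usage: each bookie serves metadata from its own (here in-memory) store.
backing = InMemoryStore({7: {"ensembleSize": 3, "lastEntryId": 41}})
contacted = []


def fake_rpc(bookie: str, ledger_id: int) -> Metadata:
    contacted.append(bookie)
    if bookie == "bookie-2:3281":
        raise ConnectionError(f"{bookie} is unreachable")
    return backing.read(ledger_id)


client: LedgerMetadataStore = BookieProxyStore(
    "bookie-1:3281, bookie-2:3281, bookie-3:3281", fake_rpc)
first = client.read(7)   # answered by bookie-1
second = client.read(7)  # bookie-2 fails, bookie-3 answers
print(contacted)
```

Because client code depends only on the interface, swapping one implementation for the other changes where metadata requests go without touching the existing implementation, which is the property Sijie describes when he says no existing ledger manager is modified.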

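Enrico asks for a configurable port plus an ephemeral port for tests, so that several bookies can run on one host. The usual way to get an ephemeral port is to bind to port 0 and read back the port the operating system assigned. A minimal Python sketch with plain sockets (not gRPC); `bind_listener` and its defaults are hypothetical:

```python
import socket


def bind_listener(port: int = 0, host: str = "127.0.0.1") -> socket.socket:
    """Open a listening TCP socket for a (hypothetical) bookie RPC endpoint.

    port > 0  -> use the configured port, as in production;
    port == 0 -> let the OS pick a free ephemeral port, as in tests.
    """
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    sock.bind((host, port))
    sock.listen()
    return sock


# Two "bookies" on the same host, each on an OS-assigned port.
first_bookie = bind_listener()
second_bookie = bind_listener()
port_a = first_bookie.getsockname()[1]  # the port actually bound
port_b = second_bookie.getsockname()[1]
print(port_a, port_b)
first_bookie.close()
second_bookie.close()
```

With a configurable port the value simply comes from the bookie's configuration; as Sijie notes, the client then learns it from its own bootstrap list rather than discovering it from zookeeper.
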
Re: [DISCUSS] BP-16: remove zookeeper dependency from bookkeeper client

Posted by Enrico Olivelli <eo...@gmail.com>.
Yep
Thank you Jia

Enrico

On Sat, Sep 16, 2017, 12:10 Jia Zhai <zh...@gmail.com> wrote:

> Since there is no objection, I would like to get this BP approved.
>
> On Wed, Sep 13, 2017 at 4:24 PM, Sijie Guo <gu...@gmail.com> wrote:
>
> > On Wed, Sep 13, 2017 at 1:18 AM, Enrico Olivelli <eo...@gmail.com>
> > wrote:
> >
> > > 2017-09-13 10:10 GMT+02:00 Sijie Guo <gu...@gmail.com>:
> > >
> > > > On Wed, Sep 13, 2017 at 12:16 AM, Enrico Olivelli <eolivelli@gmail.com>
> > > > wrote:
> > > >
> > > > > I think that this is a good direction to go.
> > > > >
> > > > > I trust the reasons given about ZK in huge systems, even if it is not my
> > > > > case, so I cannot comment on this use case.
> > > > >
> > > > > I am fine with the direction as long as we are still going to support
> > > > > ZooKeeper.
> > > > > BookKeeper is in the Hadoop / ZooKeeper ecosystem and several products rely
> > > > > on ZK too; for instance, in my systems it is usual to have
> > > > > BookKeeper/Kafka/HBase/Majordodo... and so I am not going to live without
> > > > > zookeeper in the short/mid term.
> > > > >
> > > > > I am really OK with dropping ZK, because for "simple" systems, when you
> > > > > need only BK, having the burden of setting up a zookeeper server is weird
> > > > > for customers. I usually re-distribute BK + ZK with my applications and we
> > > > > are talking about little clusters of up to 10 machines.
> > > > >
> > > >
> > > > Just to clarify - we are not dropping ZK here. We are just proposing to
> > > > have a ledger manager implementation that doesn't depend on zookeeper
> > > > directly.
> > > > We are not modifying any existing ledger manager implementation.
> > > >
> > >
> > >
> > > Yep, we are on the same page:
> > > for this proposal the bookie will be a sort of "proxy" between the client
> > > and the actual ledger manager implementation, which will "live" inside the
> > > bookie. It is only a new ledger manager to be used in clients; this ledger
> > > manager will issue RPCs (or kind of "streaming" RPCs) to a list of bookies
> > >
> > >
> > > >
> > > >
> > > > >
> > > > > The direction on this proposal is OK for me and it is very like the
> > > work
> > > > I
> > > > > was starting about "standalone mode".
> > > >
> > > >
> > > > > I think it will be very easy to support the case of having a single
> > > > bookie
> > > > > with this approach or even client+ bookie in the same JVM,
> > > > > Having multiple bookies will make us to add some other coordination
> > > > > facility between bookies, I would like to know if there is already
> > some
> > > > > idea about this, are we going to use another product like
> > etcd,jgroups
> > > or
> > > > > implement our own coordination protocol ?
> > > >
> > > >
> > > > we are not replacing A with B, even with zookeeper. the ledger
> > management
> > > > is already abstracted in interfaces.
> > > > the users can use whatever system they prefer as the metadata store.
> > > >
> > > > our direction is to provide an option to store metadata as well as
> data
> > > in
> > > > bookies. so in this option, there is no external metadata storage
> > needed.
> > > >
> > >
> > > Sorry. Maybe my curiosity is not clear.
> > > If you have multiple bookies and each bookie holds its own version of
> > > metadata, how do you coordinate them ? which will be the source of
> truth
> > ?
> > > Maybe we should start a new email thread in the future to talk about
> > > "alternative distributed metadata storages"
> > >
> >
> > It is out of the scope of this BP. We will have a next BP to cover this
> > part.
> >
> >
> >
> >
> > >
> > > Any way the meaning and the scope of the proposal is clear to me and I
> am
> > > really OK with it, I hope it will get soon approved
> > >
> > > -- Enrico
> > >
> > >
> > > >
> > > >
> > > > > ZK is simple but it very
> > > > > effective.
> > > >
> > > > Maybe we could help the ZK community to move forward and resolve
> > > > > the problems we are bringing to light
> > > > >
> > > > >
> > > > > Enrico
> > > > >
> > > > >
> > > > > 2017-09-13 3:15 GMT+02:00 Jia Zhai <zh...@gmail.com>:
> > > > >
> > > > > > Any thoughts or comments
> > > > > > :)
> > > > > >
> > > > > > Thanks a lot.
> > > > > > -Jia
> > > > > >
> > > > > > On Tue, Sep 12, 2017 at 4:30 PM, Jia Zhai <zh...@gmail.com>
> > > wrote:
> > > > > >
> > > > > > > This blog: https://bitworks.software/
> > > blog/en/2017-07-12-replicated-
> > > > > > > scalable-commitlog-with-apachebookkeeper.html, which also refer
> > a
> > > > > little
> > > > > > > the limitation of zookeeper in bookkeeper
> > > > > > >
> > > > > > > On Thu, Sep 7, 2017 at 9:45 AM, Jia Zhai <zh...@gmail.com>
> > > > wrote:
> > > > > > >
> > > > > > >> đź‘Ť. Thanks a lot for the suggestions and feed back.
> > > > > > >>
> > > > > > >> On Thu, Sep 7, 2017 at 4:24 AM, Sijie Guo <guosijie@gmail.com
> >
> > > > wrote:
> > > > > > >>
> > > > > > >>> On Wed, Sep 6, 2017 at 1:07 PM, Enrico Olivelli <
> > > > eolivelli@gmail.com
> > > > > >
> > > > > > >>> wrote:
> > > > > > >>>
> > > > > > >>> > Off topic curiosity... Jia and Sijie, do you think we are
> > going
> > > > to
> > > > > > >>> drop ZK
> > > > > > >>> > from DL too?
> > > > > > >>> >
> > > > > > >>>
> > > > > > >>> Yes. That's the goal - 1) for large deployment, we are trying
> > to
> > > > > > overcome
> > > > > > >>> the limitation of zookeeper; 2) for smaller deployments, it
> > will
> > > > make
> > > > > > >>> deployment much easier, you just need to deploy a cluster of
> > > > bookies.
> > > > > > >>> once
> > > > > > >>> it is done, you can use ledger api or log stream api to
> access
> > > the
> > > > > > >>> bookkeeper cluster.
> > > > > > >>>
> > > > > > >>> Both DL and BK are metadata storage pluggable. They have very
> > > clear
> > > > > > >>> interfaces on defining metadata operations. So it is
> > > > straightforward
> > > > > to
> > > > > > >>> use
> > > > > > >>> a different metadata storage.
> > > > > > >>>
> > > > > > >>>
> > > > > > >>> > Enrico
> > > > > > >>> >
> > > > > > >>> > On mer 6 set 2017, 19:51 Enrico Olivelli <
> > eolivelli@gmail.com>
> > > > > > wrote:
> > > > > > >>> >
> > > > > > >>> > >
> > > > > > >>> > >
> > > > > > >>> > > On mer 6 set 2017, 18:25 Sijie Guo <gu...@gmail.com>
> > > wrote:
> > > > > > >>> > >
> > > > > > >>> > >> On Sep 6, 2017 4:57 AM, "Enrico Olivelli" <
> > > > eolivelli@gmail.com>
> > > > > > >>> wrote:
> > > > > > >>> > >>
> > > > > > >>> > >> Thank you Sijie and Jia for your comments and
> > explanations,
> > > > > > >>> > >> answers inline
> > > > > > >>> > >>
> > > > > > >>> > >> 2017-09-06 2:23 GMT+02:00 Jia Zhai <zhaijia03@gmail.com
> >:
> > > > > > >>> > >>
> > > > > > >>> > >> > Thanks a lot Enrico and Sijie for your comments and
> > > > > information
> > > > > > on
> > > > > > >>> > this.
> > > > > > >>> > >> >
> > > > > > >>> > >> > On Tue, Sep 5, 2017 at 9:31 PM, Enrico Olivelli <
> > > > > > >>> eolivelli@gmail.com>
> > > > > > >>> > >> > wrote:
> > > > > > >>> > >> >
> > > > > > >>> > >> > > Great to see you working on this !
> > > > > > >>> > >> > > I would be great to have such feature, as it is the
> > > first
> > > > > step
> > > > > > >>> to a
> > > > > > >>> > >> > > 'standalone' BookKeeper mode
> > > > > > >>> > >> > >
> > > > > > >>> > >> > > Some complementary ideas/first look questions:
> > > > > > >>> > >> > > - the document does not talk about security, IMHO we
> > > have
> > > > at
> > > > > > >>> least
> > > > > > >>> > to
> > > > > > >>> > >> > cover
> > > > > > >>> > >> > > authentication and TLS, it would be great to
> leverage
> > > > > existing
> > > > > > >>> > >> > AuthPlugins,
> > > > > > >>> > >> > > as they are based on exchanging byte[] (as SASL
> wants)
> > > > > > >>> > >> > >
> > > > > > >>> > >> > [Jia] It is a good idea. We left the security part for
> > now
> > > > > for a
> > > > > > >>> few
> > > > > > >>> > >> > reasons. 1) Make this BP more focus on removing
> > zookeeper
> > > > > > >>> dependencies
> > > > > > >>> > >> from
> > > > > > >>> > >> > client. 2) It is introduced as a separated
> > implementation
> > > of
> > > > > > >>> existing
> > > > > > >>> > >> > interfaces. So it won’t impact existing security
> story.
> > > >  And
> > > > > > for
> > > > > > >>> > sure,
> > > > > > >>> > >> We
> > > > > > >>> > >> > will add the security part later after this.
> > > > > > >>> > >> >
> > > > > > >>> > >>
> > > > > > >>> > >>
> > > > > > >>> > >> I am fine, I am only afraid that we won't be able to
> > support
> > > > it
> > > > > in
> > > > > > >>> the
> > > > > > >>> > >> (near) future,
> > > > > > >>> > >> maybe you could just only cite the security story and
> add
> > > some
> > > > > > >>> reference
> > > > > > >>> > >> to
> > > > > > >>> > >> how we would deal with it in future
> > > > > > >>> > >>
> > > > > > >>> > >>
> > > > > > >>> > >> The new ledger manager will be first marked as
> > experimental,
> > > > > until
> > > > > > >>> it is
> > > > > > >>> > >> stable and have security feature.
> > > > > > >>> > >>
> > > > > > >>> > >> How does that sound?
> > > > > > >>> > >>
> > > > > > >>> > >
> > > > > > >>> > > Ok
> > > > > > >>> > >
> > > > > > >>> > >>
> > > > > > >>> > >>
> > > > > > >>> > >>
> > > > > > >>> > >> >
> > > > > > >>> > >> > - do we have some kind of "bootstrap servers list"
> > > > > configuration
> > > > > > >>> > option
> > > > > > >>> > >> ?
> > > > > > >>> > >> > > the list should be complete or just a subset of
> > bookies
> > > ?
> > > > at
> > > > > > >>> > >> connection
> > > > > > >>> > >> > the
> > > > > > >>> > >> > > client could discover the list of other bookies
> > > > > > >>> > >> > >
> > > > > > >>> > >> > [Jia] Yes, we will have a `clientBootstrapBookies`
> > > settings
> > > > in
> > > > > > the
> > > > > > >>> > >> server
> > > > > > >>> > >> > set. It can be a list of bookies or just simple a DNS
> > over
> > > > the
> > > > > > >>> > bookies.
> > > > > > >>> > >> > Will add this to the BP
> > > > > > >>> > >> >
> > > > > > >>> > >> > - will the client connect to only one bookie at a
> time ?
> > > how
> > > > > we
> > > > > > >>> will
> > > > > > >>> > >> deal
> > > > > > >>> > >> > > with errors ?
> > > > > > >>> > >> > >
> > > > > > >>> > >> > [Jia] It will connect the the list of bootstrap
> servers.
> > > > gPRC
> > > > > > will
> > > > > > >>> > load
> > > > > > >>> > >> > balance the requests and manage the connection errors.
> > > > > > >>> > >> >
> > > > > > >>> > >> > - should the bookie write on ZK metadata its gRPC
> > endpoint
> > > > > info
> > > > > > ?
> > > > > > >>> > (this
> > > > > > >>> > >> > > will be useful for a bookie to tell about other
> > bookies
> > > to
> > > > > the
> > > > > > >>> > >> connected
> > > > > > >>> > >> > > clients)
> > > > > > >>> > >> > >
> > > > > > >>> > >> > [Jia]No, it won’t. We don’t see a strong reason to add
> > it.
> > > > > > >>> Especially
> > > > > > >>> > >> > eventually we may eliminate zookeeper completely.
> > > > > > >>> > >> > It can be a fixed port `3281`, or in a scheduler-based
> > > > > > >>> environment, it
> > > > > > >>> > >> is
> > > > > > >>> > >> > very easy to have a load balancer sitting in front of
> > > those
> > > > > > >>> bookies.
> > > > > > >>> > >> >
> > > > > > >>> > >>
> > > > > > >>> > >> I think a fixed port is not a good way.
> > > > > > >>> > >> You will not be able to run more than one bookie on a
> > single
> > > > > host.
> > > > > > >>> > >>
> > > > > > >>> > >> We should support:
> > > > > > >>> > >> - configurable port
> > > > > > >>> > >> - ephemeral port for tests
> > > > > > >>> > >>
> > > > > > >>> > >>
> > > > > > >>> > >> I think what Jia means is a configurable port, but it
> is a
> > > > > > >>> relatively
> > > > > > >>> > >> fixed
> > > > > > >>> > >> port, which client doesn't discover this port from
> > > zookeeper.
> > > > > > >>> > >>
> > > > > > >>> > >
> > > > > > >>> > > Very good
> > > > > > >>> > >
> > > > > > >>> > >>
> > > > > > >>> > >>
> > > > > > >>> > >> Ideally I would like to have the local transport option,
> > in
> > > > > order
> > > > > > to
> > > > > > >>> > have
> > > > > > >>> > >> a
> > > > > > >>> > >> single JVM, but this is not a blocker problem, as we are
> > > > running
> > > > > > >>> gRPC on
> > > > > > >>> > >> netty it should be feasible or we can create some kind
> of
> > > > > > >>> short-circut
> > > > > > >>> > >> between the client and the Bookie
> > > > > > >>> > >>
> > > > > > >>> > >>
> > > > > > >>> > >> GRPC supports inprocess channel. So you don't need to
> use
> > > the
> > > > > low
> > > > > > >>> level
> > > > > > >>> > >> netty settings.
> > > > > > >>> > >>
> > > > > > >>> > >
> > > > > > >>> > > Great
> > > > > > >>> > >
> > > > > > >>> > > So it sounds all good to me thanks
> > > > > > >>> > >
> > > > > > >>> > > Enrico
> > > > > > >>> > >
> > > > > > >>> > >
> > > > > > >>> > >>
> > > > > > >>> > >> I am OK for not writing this to the bookie metadata,
> > leaving
> > > > up
> > > > > to
> > > > > > >>> the
> > > > > > >>> > >> client have a configured list of bookies enabled to
> > metadata
> > > > > > >>> operations
> > > > > > >>> > >>
> > > > > > >>> > >>
> > > > > > >>> > >>
> > > > > > >>> > >>
> > > > > > >>> > >> >
> > > > > > >>> > >> > - the bookie will be somehow a proxy for zookeeper, I
> > > think
> > > > > that
> > > > > > >>> the
> > > > > > >>> > >> > > 'watch' part is the more complex, we will have to
> deal
> > > > with
> > > > > > >>> > >> > reconnections,
> > > > > > >>> > >> > > errors....maybe it is worth to write more detail
> about
> > > > this
> > > > > > >>> > >> > >
> > > > > > >>> > >> > [Jia] The `watch` API is using the `streaming` rpc in
> > > gRPC.
> > > > It
> > > > > > is
> > > > > > >>> a
> > > > > > >>> > >> > straightforward proxy behavior, if a connection is
> > broken,
> > > > the
> > > > > > >>> client
> > > > > > >>> > >> will
> > > > > > >>> > >> > simply retry on watching again.
> > > > > > >>> > >> >
> > > > > > >>> > >> >
> > > > > > >>> > >> > > Minor issues:
> > > > > > >>> > >> > > - Maybe you can consider using ledgerId and not
> > > ledger_id,
> > > > > > like
> > > > > > >>> in
> > > > > > >>> > >> > > LedgerMetadataFormat we are using lastEntryId
> > > > > > >>> > >> > >
> > > > > > >>> > >> > [Jia] Thanks, It is a protobuf style. The protobuf
> will
> > > > > convert
> > > > > > >>> > >> `ledger_id`
> > > > > > >>> > >> > to `ledgerId`. We don’t need to worry about this.
> > > > > > >>> > >> >
> > > > > > >>> > >>
> > > > > > >>> > >> got it, thanks
> > > > > > >>> > >>
> > > > > > >>> > >>
> > > > > > >>> > >> >
> > > > > > >>> > >> >
> > > > > > >>> > >> > > -In the "motivation" part you write that the fact
> the
> > > > having
> > > > > > >>> more
> > > > > > >>> > >> clients
> > > > > > >>> > >> > > than the number of bookies would be a problem for
> > > > zookeeper,
> > > > > > >>> > actually
> > > > > > >>> > >> > > zookeeper is very good at dealing with a huge number
> > of
> > > > > > clients.
> > > > > > >>> > >> > Actually I
> > > > > > >>> > >> > > am always running clusters with 3-5 bookies and
> 10-100
> > > > > writing
> > > > > > >>> > clients
> > > > > > >>> > >> > and
> > > > > > >>> > >> > > this has never given troubles
> > > > > > >>> > >> >
> > > > > > >>> > >> > [Jia] :) Seems “10-100 writing clients” is not “a huge
> > > > number
> > > > > of
> > > > > > >>> > >> clients”.
> > > > > > >>> > >> >
> > > > > > >>> > >>
> > > > > > >>> > >> OK, I agree with you an Sijie, I have no experience of
> > > larger
> > > > > > >>> clusters
> > > > > > >>> > >>
> > > > > > >>> > >>
> > > > > > >>> > >> >
> > > > > > >>> > >> > >
> > > > > > >>> > >> >
> > > > > > >>> > >> >
> > > > > > >>> > >> >
> > > > > > >>> > >> > > Future:
> > > > > > >>> > >> > > - as bookies will be proxies maybe we should take
> care
> > > not
> > > > > to
> > > > > > >>> > >> overwhelm
> > > > > > >>> > >> a
> > > > > > >>> > >> > > bookie with too many clients
> > > > > > >>> > >> > >
> > > > > > >>> > >> > [Jia] First, gRPC is based on Netty, the protocol is
> > > http2,
> > > > so
> > > > > > the
> > > > > > >>> > >> > connection is multiplexed. We don’t need to worry
> about
> > > > > > connection
> > > > > > >>> > >> count.
> > > > > > >>> > >> > Second, all the bookies are treated equally for the
> > > metadata
> > > > > > >>> > operations,
> > > > > > >>> > >> > gRPC will load balancing the requests across the
> > bookies.
> > > We
> > > > > > don’t
> > > > > > >>> > need
> > > > > > >>> > >> to
> > > > > > >>> > >> > worry about some bookies are overwhelmed.
> > > > > > >>> > >> >
> > > > > > >>> > >>
> > > > > > >>> > >> gRPC sounds great
> > > > > > >>> > >>
> > > > > > >>> > >>
> > > > > > >>> > >> >
> > > > > > >>> > >> >
> > > > > > >>> > >> > > - iteration on ledgers, sometimes the clients
> > enumerates
> > > > > > >>> ledgers but
> > > > > > >>> > >> it
> > > > > > >>> > >> > is
> > > > > > >>> > >> > > not interested in having all of them, as we are
> using
> > > the
> > > > > > >>> bookie as
> > > > > > >>> > >> proxy
> > > > > > >>> > >> > > maybe some kind of "filter" (at least on custom
> > > metadata)
> > > > > > would
> > > > > > >>> be
> > > > > > >>> > >> create
> > > > > > >>> > >> > > to limit the number of returned items. Other point I
> > > don't
> > > > > > know
> > > > > > >>> gRPC
> > > > > > >>> > >> but
> > > > > > >>> > >> > it
> > > > > > >>> > >> > > does not seems to be very clear how to 'stop' the
> > > > iteration
> > > > > > >>> > >> > >
> > > > > > >>> > >> > [Jia] Thanks, We can add it later. For now, we would
> > like
> > > to
> > > > > > >>> focus on
> > > > > > >>> > >> > adding the features the ledger manager needs.
> > > > > > >>> > >> >
> > > > > > >>> > >>
> > > > > > >>> > >> Yup
> > > > > > >>> > >>
> > > > > > >>> > >> -- Enrico
> > > > > > >>> > >>
> > > > > > >>> > >>
> > > > > > >>> > >> >
> > > > > > >>> > >> > >
> > > > > > >>> > >> > > -- Enrico
> > > > > > >>> > >> > >
> > > > > > >>> > >> > >
> > > > > > >>> > >> > > 2017-09-05 15:10 GMT+02:00 Jia Zhai <
> > > zhaijia03@gmail.com
> > > > >:
> > > > > > >>> > >> > >
> > > > > > >>> > >> > > > Hi all,
> > > > > > >>> > >> > > >
> > > > > > >>> > >> > > > I have just posted a proposal to remove zookeeper
> > > > > dependency
> > > > > > >>> from
> > > > > > >>> > >> > > > bookkeeper client, to make bookkeeper client a
> thin
> > > > > client:
> > > > > > >>> > >> > > >
> > > > > > >>> > >> > > > https://cwiki.apache.org/
> > > confluence/display/BOOKKEEPER/
> > > > > > >>> > >> > > > BP-16%3A+remove+zookeeper+
> > dependency+from+bookkeeper+
> > > > > client
> > > > > > >>> > >> > > >
> > > > > > >>> > >> > > >
> > > > > > >>> > >> > > > BookKeeper uses zookeeper for service discovery
> > > > > (discovering
> > > > > > >>> the
> > > > > > >>> > >> > > available
> > > > > > >>> > >> > > > bookies in the cluster), metadata management
> > (storing
> > > > all
> > > > > > the
> > > > > > >>> > >> metadata
> > > > > > >>> > >> > > for
> > > > > > >>> > >> > > > ledgers). However it exposes the metadata storage
> > > > directly
> > > > > > to
> > > > > > >>> the
> > > > > > >>> > >> > > clients,
> > > > > > >>> > >> > > > making bookkeeper client a very thick client. It
> > also
> > > > > > exposes
> > > > > > >>> some
> > > > > > >>> > >> > > > problems.
> > > > > > >>> > >> > > >
> > > > > > >>> > >> > > > This BP explores the possibility of eliminating
> > > > zookeeper
> > > > > > >>> > completely
> > > > > > >>> > >> > from
> > > > > > >>> > >> > > > client side, to produce a thin bookkeeper client.
> > > > > > >>> > >> > > >
> > > > > > >>> > >> > > > I will send a patch as soon as we agree on the
> > > proposal.
> > > > > > >>> > >> > > >
> > > > > > >>> > >> > > >
> > > > > > >>> > >> > > > Thanks.
> > > > > > >>> > >> > > >
> > > > > > >>> > >> > > > -Jia
> > > > > > >>> > >> > > >
> > > > > > >>> > >> > >
> > > > > > >>> > >> >
> > > > > > >>> > >>
> > > > > > >>> > > --
> > > > > > >>> > >
> > > > > > >>> > >
> > > > > > >>> > > -- Enrico Olivelli
> > > > > > >>> > >
> > > > > > >>> > --
> > > > > > >>> >
> > > > > > >>> >
> > > > > > >>> > -- Enrico Olivelli
> > > > > > >>> >
> > > > > > >>>
> > > > > > >>
> > > > > > >>
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
>
-- 


-- Enrico Olivelli

Re: [DISCUSS] BP-16: remove zookeeper dependency from bookkeeper client

Posted by Jia Zhai <zh...@gmail.com>.
Since there are no objections, I would like to get this BP approved.

On Wed, Sep 13, 2017 at 4:24 PM, Sijie Guo <gu...@gmail.com> wrote:

> On Wed, Sep 13, 2017 at 1:18 AM, Enrico Olivelli <eo...@gmail.com> wrote:
>
> > 2017-09-13 10:10 GMT+02:00 Sijie Guo <gu...@gmail.com>:
> >
> > > On Wed, Sep 13, 2017 at 12:16 AM, Enrico Olivelli <eolivelli@gmail.com> wrote:
> > >
> > > > I think that this is a good direction to go.
> > > >
> > > > I believe in the reasons about ZK in huge systems, even if it is not my case, so
> > > > I cannot comment on this use case.
> > > >
> > > > I am fine with direction as long as we are still going to support
> > > > ZooKeeper.
> > > > BookKeeper is in the Hadoop / ZooKeeper ecosystem and several products rely
> > > > on ZK too, for instance in my systems it is usual to have
> > > > BookKeeper/Kafka/HBase/Majordodo.... and so I am not going to live without
> > > > zookeeper in the short/mid term.
> > > >
> > > > I am really OK with dropping ZK, because for "simple" systems, when you
> > > > need only BK, the burden of setting up a ZooKeeper server is weird
> > > > for customers. I usually re-distribute BK + ZK with my applications and we
> > > > are talking about little clusters of up to 10 machines.
> > > >
> > >
> > > Just to clarify: we are not dropping ZK here. We are just proposing to
> > > have a ledger manager implementation that doesn't depend on ZooKeeper
> > > directly. We are not modifying any existing ledger manager implementation.
> > >
> >
> >
> > Yep, we are on the same page.
> > For this proposal the bookie will be a sort of "proxy" between the client
> > and the actual ledger manager implementation, which will "live" inside the
> > bookie. It is only a new ledger manager to be used in clients; this ledger
> > manager will issue RPCs (or a kind of "streaming" RPCs) to a list of bookies.
> >
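The proxy arrangement described above (a client-side ledger manager that forwards metadata calls to a bookie, which delegates to the real backend living inside the bookie) can be sketched as follows. This is a minimal Python sketch under stated assumptions, not BookKeeper's actual LedgerManager API: the interface `LedgerMetadataStore`, the classes `InMemoryStore`, `BookieMetadataService` and `RemoteLedgerMetadataStore`, and the dict-based request format are all hypothetical, and a direct method call stands in for the gRPC hop.

```python
from abc import ABC, abstractmethod
from typing import Dict, Optional


class LedgerMetadataStore(ABC):
    """Hypothetical, minimal metadata interface (not BookKeeper's LedgerManager)."""

    @abstractmethod
    def create(self, ledger_id: int, metadata: bytes) -> None: ...

    @abstractmethod
    def read(self, ledger_id: int) -> Optional[bytes]: ...


class InMemoryStore(LedgerMetadataStore):
    """Stand-in for the real backend (e.g. ZooKeeper) living inside the bookie."""

    def __init__(self) -> None:
        self._data: Dict[int, bytes] = {}

    def create(self, ledger_id: int, metadata: bytes) -> None:
        if ledger_id in self._data:
            raise ValueError(f"ledger {ledger_id} already exists")
        self._data[ledger_id] = metadata

    def read(self, ledger_id: int) -> Optional[bytes]:
        return self._data.get(ledger_id)


class BookieMetadataService:
    """Bookie side: turns requests into calls on whatever backend is configured."""

    def __init__(self, backend: LedgerMetadataStore) -> None:
        self._backend = backend

    def handle(self, request: dict) -> dict:
        try:
            if request["op"] == "create":
                self._backend.create(request["ledger_id"], request["metadata"])
                return {"ok": True}
            if request["op"] == "read":
                return {"ok": True, "metadata": self._backend.read(request["ledger_id"])}
            return {"ok": False, "error": f"unknown op {request['op']!r}"}
        except ValueError as e:
            # Errors travel back as data, as they would across an RPC boundary.
            return {"ok": False, "error": str(e)}


class RemoteLedgerMetadataStore(LedgerMetadataStore):
    """Client side: same interface, no backend access, only requests to a bookie."""

    def __init__(self, bookie: BookieMetadataService) -> None:
        self._bookie = bookie  # a real client would hold an RPC channel instead

    def _call(self, request: dict) -> dict:
        response = self._bookie.handle(request)
        if not response["ok"]:
            raise RuntimeError(response["error"])
        return response

    def create(self, ledger_id: int, metadata: bytes) -> None:
        self._call({"op": "create", "ledger_id": ledger_id, "metadata": metadata})

    def read(self, ledger_id: int) -> Optional[bytes]:
        return self._call({"op": "read", "ledger_id": ledger_id})["metadata"]
```

The point of the shape is that the client only depends on the interface; which backend sits behind the bookie is invisible to it.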
> >
> > >
> > >
> > > >
> > > > The direction of this proposal is OK for me and it is very much like the work
> > > > I was starting on "standalone mode".
> > >
> > >
> > > > I think it will be very easy to support the case of having a single bookie
> > > > with this approach, or even client + bookie in the same JVM.
> > > > Having multiple bookies will require us to add some other coordination
> > > > facility between bookies. I would like to know if there is already some
> > > > idea about this: are we going to use another product like etcd or JGroups,
> > > > or implement our own coordination protocol?
> > >
> > >
> > > We are not replacing A with B, even with ZooKeeper. The ledger management
> > > is already abstracted in interfaces, so users can use whatever system they
> > > prefer as the metadata store.
> > >
> > > Our direction is to provide an option to store metadata as well as data in
> > > bookies, so with this option no external metadata storage is needed.
> > >
> >
> > Sorry, maybe my question was not clear.
> > If you have multiple bookies and each bookie holds its own version of the
> > metadata, how do you coordinate them? Which will be the source of truth?
> > Maybe we should start a new email thread in the future to talk about
> > "alternative distributed metadata storages".
> >
>
> It is out of the scope of this BP. We will have a follow-up BP to cover this part.
>
>
>
>
> >
> > Anyway, the meaning and the scope of the proposal are clear to me and I am
> > really OK with it. I hope it will get approved soon.
> >
> > -- Enrico
> >
> >
> > >
> > >
> > > > ZK is simple but it is very effective.
> > > >
> > > > Maybe we could help the ZK community to move forward and resolve
> > > > the problems we are bringing to light.
> > > >
> > > >
> > > > Enrico
> > > >
> > > >
> > > > 2017-09-13 3:15 GMT+02:00 Jia Zhai <zh...@gmail.com>:
> > > >
> > > > > Any thoughts or comments? :)
> > > > >
> > > > > Thanks a lot.
> > > > > -Jia
> > > > >
> > > > > On Tue, Sep 12, 2017 at 4:30 PM, Jia Zhai <zh...@gmail.com> wrote:
> > > > >
> > > > > > This blog post also touches a little on the limitations of ZooKeeper in BookKeeper:
> > > > > > https://bitworks.software/blog/en/2017-07-12-replicated-scalable-commitlog-with-apachebookkeeper.html
> > > > > >
> > > > > > On Thu, Sep 7, 2017 at 9:45 AM, Jia Zhai <zh...@gmail.com> wrote:
> > > > > >
> > > > > >> 👍 Thanks a lot for the suggestions and feedback.
> > > > > >>
> > > > > >> On Thu, Sep 7, 2017 at 4:24 AM, Sijie Guo <gu...@gmail.com> wrote:
> > > > > >>
> > > > > >>> On Wed, Sep 6, 2017 at 1:07 PM, Enrico Olivelli <eolivelli@gmail.com> wrote:
> > > > > >>>
> > > > > >>> > Off-topic curiosity... Jia and Sijie, do you think we are going to drop ZK
> > > > > >>> > from DL too?
> > > > > >>> >
> > > > > >>>
> > > > > >>> Yes. That's the goal: 1) for large deployments, we are trying to overcome
> > > > > >>> the limitations of ZooKeeper; 2) for smaller deployments, it will make
> > > > > >>> deployment much easier: you just need to deploy a cluster of bookies.
> > > > > >>> Once it is done, you can use the ledger API or the log stream API to access
> > > > > >>> the BookKeeper cluster.
> > > > > >>>
> > > > > >>> Both DL and BK have pluggable metadata storage. They have very clear
> > > > > >>> interfaces defining the metadata operations, so it is straightforward to
> > > > > >>> use a different metadata storage.
> > > > > >>>
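To illustrate what "pluggable metadata storage" can look like in practice, here is a hypothetical registry that picks a metadata store implementation from the scheme of a configured URI. The schemes `zk` and `bookie`, the class names and `create_store` are assumptions made up for this sketch; neither BK nor DL is claimed to expose this exact mechanism.

```python
from typing import Callable, Dict
from urllib.parse import urlparse

# Hypothetical registry: URI scheme -> factory taking the rest of the URI.
_FACTORIES: Dict[str, Callable[[str], object]] = {}


def register(scheme: str):
    """Class decorator that registers a metadata store implementation."""
    def decorator(factory):
        _FACTORIES[scheme] = factory
        return factory
    return decorator


@register("zk")
class ZkMetadataStore:
    def __init__(self, location: str) -> None:
        self.location = location  # e.g. "zk1:2181/ledgers"


@register("bookie")
class BookieProxyMetadataStore:
    def __init__(self, location: str) -> None:
        self.location = location  # e.g. "bookie1:3281"


def create_store(uri: str) -> object:
    """Pick an implementation by URI scheme and hand it the remaining location."""
    parsed = urlparse(uri)
    factory = _FACTORIES.get(parsed.scheme)
    if factory is None:
        raise ValueError(f"no metadata store registered for scheme {parsed.scheme!r}")
    return factory(parsed.netloc + parsed.path)
```

Adding a new backend then means registering one more class; callers that only go through `create_store` do not change.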
> > > > > >>>
> > > > > >>> > Enrico
> > > > > >>> >
> > > > > >>> > On Wed, Sep 6, 2017, 19:51 Enrico Olivelli <eolivelli@gmail.com> wrote:
> > > > > >>> >
> > > > > >>> > >
> > > > > >>> > >
> > > > > >>> > > On Wed, Sep 6, 2017, 18:25 Sijie Guo <gu...@gmail.com> wrote:
> > > > > >>> > >
> > > > > >>> > >> On Sep 6, 2017 4:57 AM, "Enrico Olivelli" <eolivelli@gmail.com> wrote:
> > > > > >>> > >>
> > > > > >>> > >> Thank you Sijie and Jia for your comments and explanations,
> > > > > >>> > >> answers inline.
> > > > > >>> > >>
> > > > > >>> > >> 2017-09-06 2:23 GMT+02:00 Jia Zhai <zh...@gmail.com>:
> > > > > >>> > >>
> > > > > >>> > >> > Thanks a lot Enrico and Sijie for your comments and information on this.
> > > > > >>> > >> >
> > > > > >>> > >> > On Tue, Sep 5, 2017 at 9:31 PM, Enrico Olivelli <eolivelli@gmail.com> wrote:
> > > > > >>> > >> >
> > > > > >>> > >> > > Great to see you working on this!
> > > > > >>> > >> > > It would be great to have such a feature, as it is the first step to a
> > > > > >>> > >> > > 'standalone' BookKeeper mode.
> > > > > >>> > >> > >
> > > > > >>> > >> > > Some complementary ideas/first look questions:
> > > > > >>> > >> > > - the document does not talk about security; IMHO we have to cover at least
> > > > > >>> > >> > > authentication and TLS. It would be great to leverage the existing
> > > > > >>> > >> > > AuthPlugins, as they are based on exchanging byte[] (as SASL wants).
> > > > > >>> > >> > >
> > > > > >>> > >> > [Jia] It is a good idea. We left the security part out for now for a few
> > > > > >>> > >> > reasons: 1) to keep this BP focused on removing the ZooKeeper dependency
> > > > > >>> > >> > from the client; 2) it is introduced as a separate implementation of the
> > > > > >>> > >> > existing interfaces, so it won’t impact the existing security story. And
> > > > > >>> > >> > for sure, we will add the security part after this.
> > > > > >>> > >> >
> > > > > >>> > >>
> > > > > >>> > >>
> > > > > >>> > >> I am fine with that; I am only afraid that we won't be able to support it
> > > > > >>> > >> in the (near) future. Maybe you could just mention the security story and
> > > > > >>> > >> add some reference to how we would deal with it in the future.
> > > > > >>> > >>
> > > > > >>> > >>
> > > > > >>> > >> The new ledger manager will first be marked as experimental, until it is
> > > > > >>> > >> stable and has security features.
> > > > > >>> > >>
> > > > > >>> > >> How does that sound?
> > > > > >>> > >>
> > > > > >>> > >
> > > > > >>> > > Ok
> > > > > >>> > >
> > > > > >>> > >>
> > > > > >>> > >>
> > > > > >>> > >>
> > > > > >>> > >> >
> > > > > >>> > >> > > - do we have some kind of "bootstrap servers list" configuration option?
> > > > > >>> > >> > > Should the list be complete or just a subset of bookies? At connection
> > > > > >>> > >> > > time the client could discover the list of other bookies.
> > > > > >>> > >> > >
> > > > > >>> > >> > [Jia] Yes, we will have a `clientBootstrapBookies` setting in the server
> > > > > >>> > >> > set. It can be a list of bookies or simply a DNS name that resolves to the
> > > > > >>> > >> > bookies. Will add this to the BP.
> > > > > >>> > >> >
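A hedged sketch of how a client might interpret a `clientBootstrapBookies` value. The setting name comes from the thread, but its format here is an assumption: either a comma-separated list of `host:port` entries or a single DNS name, with the fixed port `3281` mentioned later in the thread used as an assumed default. `parse_bootstrap_bookies` is a hypothetical helper.

```python
from typing import List, Tuple

# Assumed default: the fixed port `3281` mentioned elsewhere in this thread.
DEFAULT_METADATA_PORT = 3281


def parse_bootstrap_bookies(value: str,
                            default_port: int = DEFAULT_METADATA_PORT) -> List[Tuple[str, int]]:
    """Parse a hypothetical `clientBootstrapBookies` value into (host, port) pairs.

    "b1:3281,b2:3281"     -> an explicit list of bookies
    "bookies.example.com" -> one DNS name, to be resolved to the bookies later
    """
    endpoints: List[Tuple[str, int]] = []
    for item in value.split(","):
        item = item.strip()
        if not item:
            continue
        host, sep, port = item.rpartition(":")
        if sep and host and port.isdigit():
            endpoints.append((host, int(port)))
        else:
            endpoints.append((item, default_port))  # no usable port: use the default
    if not endpoints:
        raise ValueError("clientBootstrapBookies must name at least one bookie or DNS name")
    return endpoints
```

IPv6 literals and the DNS lookup itself are deliberately left out; a subset of bookies is enough here because, as Jia describes, the client only needs some reachable bookies for metadata operations.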
> > > > > >>> > >> > > - will the client connect to only one bookie at a time? How will we deal
> > > > > >>> > >> > > with errors?
> > > > > >>> > >> > >
> > > > > >>> > >> > [Jia] It will connect to the list of bootstrap servers. gRPC will load
> > > > > >>> > >> > balance the requests and manage the connection errors.
> > > > > >>> > >> >
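gRPC does its load balancing inside the library (for example with its round-robin policy), so client code would not implement this itself. Purely to illustrate the behavior described (spreading requests over the bootstrap servers and moving on when one is unreachable), here is a toy round-robin caller with failover; `RoundRobinCaller` and the `rpc` callback are hypothetical.

```python
import itertools
from typing import Callable, List, Sequence, Tuple, TypeVar

T = TypeVar("T")
Endpoint = Tuple[str, int]


class RoundRobinCaller:
    """Toy client-side balancer: rotates over endpoints and fails over on errors."""

    def __init__(self, endpoints: Sequence[Endpoint]) -> None:
        if not endpoints:
            raise ValueError("need at least one endpoint")
        self._endpoints = list(endpoints)
        self._order = itertools.cycle(range(len(self._endpoints)))

    def call(self, rpc: Callable[[Endpoint], T]) -> T:
        errors: List[Exception] = []
        # Try each endpoint at most once per call, starting from the next one in rotation.
        for _ in range(len(self._endpoints)):
            endpoint = self._endpoints[next(self._order)]
            try:
                return rpc(endpoint)
            except ConnectionError as exc:
                errors.append(exc)  # unreachable bookie: move on to the next one
        raise ConnectionError(f"all {len(errors)} endpoints failed: {errors}")
```

Because every bookie can serve metadata operations, any endpoint is an acceptable target; the caller only fails once all of them are down.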
> > > > > >>> > >> > > - should the bookie write its gRPC endpoint info to the ZK metadata?
> > > > > >>> > >> > > (This will be useful for a bookie to tell the connected clients about
> > > > > >>> > >> > > other bookies.)
> > > > > >>> > >> > >
> > > > > >>> > >> > [Jia] No, it won’t. We don’t see a strong reason to add it, especially
> > > > > >>> > >> > since we may eventually eliminate ZooKeeper completely.
> > > > > >>> > >> > It can be a fixed port `3281`, or in a scheduler-based environment, it is
> > > > > >>> > >> > very easy to have a load balancer sitting in front of those bookies.
> > > > > >>> > >> >
> > > > > >>> > >>
> > > > > >>> > >> I think a fixed port is not a good approach.
> > > > > >>> > >> You will not be able to run more than one bookie on a single host.
> > > > > >>> > >>
> > > > > >>> > >> We should support:
> > > > > >>> > >> - configurable port
> > > > > >>> > >> - ephemeral port for tests
> > > > > >>> > >>
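The "configurable port, ephemeral port for tests" request usually comes down to one convention: a configured port is used as-is, and port 0 asks the operating system for any free port, which the server then reads back. A minimal Python sketch with plain sockets; `bind_listener` is a hypothetical name, not a BookKeeper or gRPC API.

```python
import socket
from typing import Optional


def bind_listener(port: Optional[int] = None, host: str = "127.0.0.1") -> socket.socket:
    """Open a listening TCP socket on a configurable port.

    A configured port is used as-is; None or 0 lets the OS pick a free
    ephemeral port, which is what tests usually want.
    """
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        sock.bind((host, port or 0))
        sock.listen()
    except OSError:
        sock.close()
        raise
    return sock
```

Two bookies started in the same test process each get their own port this way, and the actual port (from `getsockname()`) can then be handed to the client configuration.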
> > > > > >>> > >>
> > > > > >>> > >> I think what Jia means is a configurable port, but a relatively fixed
> > > > > >>> > >> port, which the client doesn't discover from ZooKeeper.
> > > > > >>> > >>
> > > > > >>> > >
> > > > > >>> > > Very good
> > > > > >>> > >
> > > > > >>> > >>
> > > > > >>> > >>
> > > > > >>> > >> Ideally I would like to have the local transport option, in order to have
> > > > > >>> > >> a single JVM, but this is not a blocker. As we are running gRPC on Netty
> > > > > >>> > >> it should be feasible, or we can create some kind of short-circuit
> > > > > >>> > >> between the client and the bookie.
> > > > > >>> > >>
> > > > > >>> > >>
> > > > > >>> > >> gRPC supports an in-process channel, so you don't need to use the
> > > > > >>> > >> low-level Netty settings.
> > > > > >>> > >>
> > > > > >>> > >
> > > > > >>> > > Great
> > > > > >>> > >
> > > > > >>> > > So it all sounds good to me, thanks
> > > > > >>> > >
> > > > > >>> > > Enrico
> > > > > >>> > >
> > > > > >>> > >
> > > > > >>> > >>
> > > > > >>> > >> I am OK with not writing this to the bookie metadata, leaving it up to
> > > > > >>> > >> the client to have a configured list of bookies enabled for metadata
> > > > > >>> > >> operations.
> > > > > >>> > >>
> > > > > >>> > >>
> > > > > >>> > >>
> > > > > >>> > >>
> > > > > >>> > >> >
> > > > > >>> > >> > > - the bookie will somehow be a proxy for ZooKeeper. I think that the
> > > > > >>> > >> > > 'watch' part is the most complex; we will have to deal with
> > > > > >>> > >> > > reconnections, errors... maybe it is worth writing more detail about this.
> > > > > >>> > >> > >
> > > > > >>> > >> > [Jia] The `watch` API uses the `streaming` RPC in gRPC. It is a
> > > > > >>> > >> > straightforward proxy behavior: if a connection is broken, the client
> > > > > >>> > >> > will simply retry the watch.
> > > > > >>> > >> >
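The retry-on-broken-stream behavior for `watch` can be sketched as below. Assumptions not taken from the BP: `open_stream(last_version)` is a hypothetical call that opens a new server-streaming watch, events are `(version, metadata)` pairs, and the client drops events whose version it has already seen, since a re-opened stream may replay them. A real client would also add backoff between retries.

```python
from typing import Callable, Iterator, Tuple

Event = Tuple[int, bytes]  # (version, metadata): hypothetical event shape


def watch_with_retry(open_stream: Callable[[int], Iterator[Event]],
                     on_change: Callable[[Event], None],
                     max_retries: int = 3) -> None:
    """Consume a streaming watch; if the stream breaks, open a new one and continue."""
    last_version = -1
    retries = 0
    while True:
        try:
            for version, data in open_stream(last_version):
                if version <= last_version:
                    continue  # already delivered before the reconnect
                last_version = version
                retries = 0  # the stream is healthy again
                on_change((version, data))
            return  # server closed the stream normally
        except ConnectionError:
            retries += 1
            if retries > max_retries:
                raise
```

Passing `last_version` when re-opening lets a server skip events the client already has; the local version check keeps the client correct even if it does not.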
> > > > > >>> > >> >
> > > > > >>> > >> > > Minor issues:
> > > > > >>> > >> > > - Maybe you can consider using ledgerId and not ledger_id, as in
> > > > > >>> > >> > > LedgerMetadataFormat, where we are using lastEntryId.
> > > > > >>> > >> > >
> > > > > >>> > >> > [Jia] Thanks. It is the protobuf style; protobuf will convert `ledger_id`
> > > > > >>> > >> > to `ledgerId`. We don’t need to worry about this.
> > > > > >>> > >> >
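For reference, the protobuf style is `lower_snake_case` field names, and the generated code and the JSON mapping derive camelCase names from them (for a field `ledger_id`, Java gets `getLedgerId()` and the default JSON name is `ledgerId`). The default JSON-name conversion amounts to the following; `proto_json_name` is a hypothetical helper written for this sketch, not a protobuf API.

```python
def proto_json_name(field_name: str) -> str:
    """Convert a snake_case proto field name to lowerCamelCase.

    Each underscore is dropped and the character after it is upper-cased,
    which is how protobuf derives the default JSON name of a field.
    """
    out = []
    capitalize_next = False
    for ch in field_name:
        if ch == "_":
            capitalize_next = True
        elif capitalize_next:
            out.append(ch.upper())
            capitalize_next = False
        else:
            out.append(ch)
    return "".join(out)
```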
> > > > > >>> > >>
> > > > > >>> > >> got it, thanks
> > > > > >>> > >>
> > > > > >>> > >>
> > > > > >>> > >> >
> > > > > >>> > >> >
> > > > > >>> > >> > > - In the "motivation" part you write that having more clients than
> > > > > >>> > >> > > bookies would be a problem for ZooKeeper; actually ZooKeeper is very
> > > > > >>> > >> > > good at dealing with a huge number of clients. I am always running
> > > > > >>> > >> > > clusters with 3-5 bookies and 10-100 writing clients and this has never
> > > > > >>> > >> > > given trouble.
> > > > > >>> > >> >
> > > > > >>> > >> > [Jia] :) Seems “10-100 writing clients” is not “a huge
> > > number
> > > > of
> > > > > >>> > >> clients”.
> > > > > >>> > >> >
> > > > > >>> > >>
> > > > > >>> > >> OK, I agree with you an Sijie, I have no experience of
> > larger
> > > > > >>> clusters
> > > > > >>> > >>
> > > > > >>> > >>
> > > > > >>> > >> >
> > > > > >>> > >> > >
> > > > > >>> > >> >
> > > > > >>> > >> >
> > > > > >>> > >> >
> > > > > >>> > >> > > Future:
> > > > > >>> > >> > > - as bookies will be proxies maybe we should take care
> > not
> > > > to
> > > > > >>> > >> overwhelm
> > > > > >>> > >> a
> > > > > >>> > >> > > bookie with too many clients
> > > > > >>> > >> > >
> > > > > >>> > >> > [Jia] First, gRPC is based on Netty, the protocol is
> > http2,
> > > so
> > > > > the
> > > > > >>> > >> > connection is multiplexed. We don’t need to worry about
> > > > > connection
> > > > > >>> > >> count.
> > > > > >>> > >> > Second, all the bookies are treated equally for the
> > metadata
> > > > > >>> > operations,
> > > > > >>> > >> > gRPC will load balancing the requests across the
> bookies.
> > We
> > > > > don’t
> > > > > >>> > need
> > > > > >>> > >> to
> > > > > >>> > >> > worry about some bookies are overwhelmed.
> > > > > >>> > >> >
> > > > > >>> > >>
> > > > > >>> > >> gRPC sounds great
> > > > > >>> > >>
> > > > > >>> > >>
> > > > > >>> > >> >
> > > > > >>> > >> >
> > > > > >>> > >> > > - iteration on ledgers, sometimes the clients
> enumerates
> > > > > >>> ledgers but
> > > > > >>> > >> it
> > > > > >>> > >> > is
> > > > > >>> > >> > > not interested in having all of them, as we are using
> > the
> > > > > >>> bookie as
> > > > > >>> > >> proxy
> > > > > >>> > >> > > maybe some kind of "filter" (at least on custom
> > metadata)
> > > > > would
> > > > > >>> be
> > > > > >>> > >> create
> > > > > >>> > >> > > to limit the number of returned items. Other point I
> > don't
> > > > > know
> > > > > >>> gRPC
> > > > > >>> > >> but
> > > > > >>> > >> > it
> > > > > >>> > >> > > does not seems to be very clear how to 'stop' the
> > > iteration
> > > > > >>> > >> > >
> > > > > >>> > >> > [Jia] Thanks, We can add it later. For now, we would
> like
> > to
> > > > > >>> focus on
> > > > > >>> > >> > adding the features the ledger manager needs.
> > > > > >>> > >> >
> > > > > >>> > >>
> > > > > >>> > >> Yup
> > > > > >>> > >>
> > > > > >>> > >> -- Enrico
> > > > > >>> > >>
> > > > > >>> > >>
> > > > > >>> > >> >
> > > > > >>> > >> > >
> > > > > >>> > >> > > -- Enrico
> > > > > >>> > >> > >
> > > > > >>> > >> > >
> > > > > >>> > >> > > 2017-09-05 15:10 GMT+02:00 Jia Zhai <
> > zhaijia03@gmail.com
> > > >:
> > > > > >>> > >> > >
> > > > > >>> > >> > > > Hi all,
> > > > > >>> > >> > > >
> > > > > >>> > >> > > > I have just posted a proposal to remove zookeeper
> > > > dependency
> > > > > >>> from
> > > > > >>> > >> > > > bookkeeper client, to make bookkeeper client a thin
> > > > client:
> > > > > >>> > >> > > >
> > > > > >>> > >> > > > https://cwiki.apache.org/
> > confluence/display/BOOKKEEPER/
> > > > > >>> > >> > > > BP-16%3A+remove+zookeeper+
> dependency+from+bookkeeper+
> > > > client
> > > > > >>> > >> > > >
> > > > > >>> > >> > > >
> > > > > >>> > >> > > > BookKeeper uses zookeeper for service discovery
> > > > (discovering
> > > > > >>> the
> > > > > >>> > >> > > available
> > > > > >>> > >> > > > bookies in the cluster), metadata management
> (storing
> > > all
> > > > > the
> > > > > >>> > >> metadata
> > > > > >>> > >> > > for
> > > > > >>> > >> > > > ledgers). However it exposes the metadata storage
> > > directly
> > > > > to
> > > > > >>> the
> > > > > >>> > >> > > clients,
> > > > > >>> > >> > > > making bookkeeper client a very thick client. It
> also
> > > > > exposes
> > > > > >>> some
> > > > > >>> > >> > > > problems.
> > > > > >>> > >> > > >
> > > > > >>> > >> > > > This BP explores the possibility of eliminating
> > > zookeeper
> > > > > >>> > completely
> > > > > >>> > >> > from
> > > > > >>> > >> > > > client side, to produce a thin bookkeeper client.
> > > > > >>> > >> > > >
> > > > > >>> > >> > > > I will send a patch as soon as we agree on the
> > proposal.
> > > > > >>> > >> > > >
> > > > > >>> > >> > > >
> > > > > >>> > >> > > > Thanks.
> > > > > >>> > >> > > >
> > > > > >>> > >> > > > -Jia
> > > > > >>> > >> > > >
> > > > > >>> > >> > >
> > > > > >>> > >> >
> > > > > >>> > >>
> > > > > >>> > > --
> > > > > >>> > >
> > > > > >>> > >
> > > > > >>> > > -- Enrico Olivelli
> > > > > >>> > >
> > > > > >>> > --
> > > > > >>> >
> > > > > >>> >
> > > > > >>> > -- Enrico Olivelli
> > > > > >>> >
> > > > > >>>
> > > > > >>
> > > > > >>
> > > > > >
> > > > >
> > > >
> > >
> >
>

Re: [DISCUSS] BP-16: remove zookeeper dependency from bookkeeper client

Posted by Sijie Guo <gu...@gmail.com>.
On Wed, Sep 13, 2017 at 1:18 AM, Enrico Olivelli <eo...@gmail.com>
wrote:

> 2017-09-13 10:10 GMT+02:00 Sijie Guo <gu...@gmail.com>:
>
> > On Wed, Sep 13, 2017 at 12:16 AM, Enrico Olivelli <eo...@gmail.com>
> > wrote:
> >
> > > I think that this is a good direction to go.
> > >
> > > I trust the reasons given about ZK in huge systems, even if that is not
> > > my case, so I cannot comment on this use case.
> > >
> > > I am fine with the direction as long as we are still going to support
> > > ZooKeeper. BookKeeper is in the Hadoop/ZooKeeper ecosystem and several
> > > products rely on ZK too; for instance, in my systems it is usual to have
> > > BookKeeper/Kafka/HBase/Majordodo... so I am not going to live without
> > > ZooKeeper in the short/mid term.
> > >
> > > I am really OK with dropping ZK, because for "simple" systems, where you
> > > need only BK, the burden of setting up a ZooKeeper server is weird for
> > > customers. I usually redistribute BK + ZK with my applications, and we
> > > are talking about small clusters of up to 10 machines.
> > >
> >
> > Just to clarify: we are not dropping ZK here. We are just proposing to
> > have a ledger manager implementation that doesn't depend on ZooKeeper
> > directly. We are not modifying any existing ledger manager implementation.
> >
>
>
> Yep, we are on the same page.
> For this proposal, the bookie will be a sort of "proxy" between the client
> and the actual ledger manager implementation, which will "live" inside the
> bookie. It is only a new ledger manager to be used in clients; this ledger
> manager will issue RPCs (or a kind of "streaming" RPCs) to a list of bookies.
>
>
> >
> >
> > >
> > > The direction of this proposal is OK for me, and it is very similar to
> > > the work I was starting on a "standalone mode".
> >
> >
> > > I think it will be very easy to support the case of having a single
> > > bookie with this approach, or even client + bookie in the same JVM.
> > > Having multiple bookies will require us to add some other coordination
> > > facility between bookies. I would like to know if there is already some
> > > idea about this: are we going to use another product like etcd or
> > > JGroups, or implement our own coordination protocol?
> >
> >
> > We are not replacing A with B, even with ZooKeeper. The ledger management
> > is already abstracted behind interfaces, so users can use whatever system
> > they prefer as the metadata store.
> >
> > Our direction is to provide an option to store metadata as well as data in
> > bookies. With this option, no external metadata storage is needed.
> >
>
> Sorry, maybe my question was not clear.
> If you have multiple bookies and each bookie holds its own version of the
> metadata, how do you coordinate them? Which one will be the source of truth?
> Maybe we should start a new email thread in the future to talk about
> "alternative distributed metadata storages".
>

It is out of the scope of this BP. We will cover this part in a follow-up BP.




>
> Anyway, the meaning and the scope of the proposal are clear to me and I am
> really OK with it. I hope it will be approved soon.
>
> -- Enrico
>
>
> >
> >
> > > ZK is simple but very effective. Maybe we could help the ZK community
> > > move forward and resolve the problems we are bringing to light.
> > >
> > >
> > > Enrico
> > >
> > >

Re: [DISCUSS] BP-16: remove zookeeper dependency from bookkeeper client

Posted by Enrico Olivelli <eo...@gmail.com>.
2017-09-13 10:10 GMT+02:00 Sijie Guo <gu...@gmail.com>:

> On Wed, Sep 13, 2017 at 12:16 AM, Enrico Olivelli <eo...@gmail.com>
> wrote:
>
> > I think that this is a good direction to go.
> >
> > I trust the reasons given about ZK in huge systems, even if that is not
> > my case, so I cannot comment on this use case.
> >
> > I am fine with the direction as long as we are still going to support
> > ZooKeeper. BookKeeper is in the Hadoop/ZooKeeper ecosystem and several
> > products rely on ZK too; for instance, in my systems it is usual to have
> > BookKeeper/Kafka/HBase/Majordodo... so I am not going to live without
> > ZooKeeper in the short/mid term.
> >
> > I am really OK with dropping ZK, because for "simple" systems, where you
> > need only BK, the burden of setting up a ZooKeeper server is weird for
> > customers. I usually redistribute BK + ZK with my applications, and we
> > are talking about small clusters of up to 10 machines.
> >
>
> Just to clarify: we are not dropping ZK here. We are just proposing to
> have a ledger manager implementation that doesn't depend on ZooKeeper
> directly. We are not modifying any existing ledger manager implementation.
>


Yep, we are on the same page.
For this proposal, the bookie will be a sort of "proxy" between the client
and the actual ledger manager implementation, which will "live" inside the
bookie. It is only a new ledger manager to be used in clients; this ledger
manager will issue RPCs (or a kind of "streaming" RPCs) to a list of bookies.
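
To make the proxy idea concrete, here is a minimal Python sketch of a client-side ledger manager that owns no metadata-store connection and simply forwards each call to one of the configured bookies, moving on to the next bookie when one is unreachable. Everything in it is hypothetical: `ProxyLedgerManager`, `BookieUnavailable`, the `ReadLedgerMetadata` method name and the `transport` callable stand in for the gRPC stubs the BP proposes, and the `:3281` addresses only echo the example port mentioned later in the thread.

```python
from typing import Callable, Dict, List, Optional


class BookieUnavailable(Exception):
    """Raised by a transport when a bookie cannot serve the request."""


# Hypothetical transport signature: (bookie_address, method, request) -> response.
Transport = Callable[[str, str, Dict], Dict]


class ProxyLedgerManager:
    """Client-side ledger manager that forwards metadata calls to bookies.

    The client holds no metadata-store handle; every call is an RPC to one
    of the configured bookies, which runs the real ledger manager internally.
    """

    def __init__(self, bookies: List[str], transport: Transport) -> None:
        if not bookies:
            raise ValueError("at least one bookie address is required")
        self._bookies = list(bookies)
        self._transport = transport
        self._next = 0  # index of the bookie to try first on the next call

    def _call(self, method: str, request: Dict) -> Dict:
        last_error: Optional[Exception] = None
        # Try each bookie at most once, starting from a rotating position.
        for attempt in range(len(self._bookies)):
            idx = (self._next + attempt) % len(self._bookies)
            try:
                response = self._transport(self._bookies[idx], method, request)
                self._next = (idx + 1) % len(self._bookies)
                return response
            except BookieUnavailable as err:
                last_error = err
        raise BookieUnavailable(f"all bookies failed for {method}") from last_error

    def read_ledger_metadata(self, ledger_id: int) -> Dict:
        return self._call("ReadLedgerMetadata", {"ledger_id": ledger_id})
```

Because the transport is injected, the same class could sit on top of a network stub or an in-process dispatcher without changes.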


>
>
> >
> > The direction of this proposal is OK for me, and it is very similar to
> > the work I was starting on a "standalone mode".
>
>
> > I think it will be very easy to support the case of having a single
> > bookie with this approach, or even client + bookie in the same JVM.
> > Having multiple bookies will require us to add some other coordination
> > facility between bookies. I would like to know if there is already some
> > idea about this: are we going to use another product like etcd or
> > JGroups, or implement our own coordination protocol?
>
>
> We are not replacing A with B, even with ZooKeeper. The ledger management
> is already abstracted behind interfaces, so users can use whatever system
> they prefer as the metadata store.
>
> Our direction is to provide an option to store metadata as well as data in
> bookies. With this option, no external metadata storage is needed.
>

Sorry, maybe my question was not clear.
If you have multiple bookies and each bookie holds its own version of the
metadata, how do you coordinate them? Which one will be the source of truth?
Maybe we should start a new email thread in the future to talk about
"alternative distributed metadata storages".

Anyway, the meaning and the scope of the proposal are clear to me and I am
really OK with it. I hope it will be approved soon.

-- Enrico


>
>
> > ZK is simple but very effective. Maybe we could help the ZK community
> > move forward and resolve the problems we are bringing to light.
> >
> >
> > Enrico
> >
> >
> > 2017-09-13 3:15 GMT+02:00 Jia Zhai <zh...@gmail.com>:
> >
> > > Any thoughts or comments? :)
> > >
> > > Thanks a lot.
> > > -Jia
> > >
> > > On Tue, Sep 12, 2017 at 4:30 PM, Jia Zhai <zh...@gmail.com> wrote:
> > >
> > > > This blog post also touches briefly on the limitations of ZooKeeper
> > > > in BookKeeper:
> > > > https://bitworks.software/blog/en/2017-07-12-replicated-scalable-commitlog-with-apachebookkeeper.html
> > > >
> > > > On Thu, Sep 7, 2017 at 9:45 AM, Jia Zhai <zh...@gmail.com> wrote:
> > > >
> > > >> 👍 Thanks a lot for the suggestions and feedback.
> > > >>
> > > >> On Thu, Sep 7, 2017 at 4:24 AM, Sijie Guo <gu...@gmail.com> wrote:
> > > >>
> > > >>> On Wed, Sep 6, 2017 at 1:07 PM, Enrico Olivelli <eolivelli@gmail.com>
> > > >>> wrote:
> > > >>>
> > > >>> > Off-topic curiosity... Jia and Sijie, do you think we are going to
> > > >>> > drop ZK from DL too?
> > > >>> >
> > > >>>
> > > >>> Yes. That's the goal: 1) for large deployments, we are trying to
> > > >>> overcome the limitations of ZooKeeper; 2) for smaller deployments, it
> > > >>> will make deployment much easier: you just need to deploy a cluster
> > > >>> of bookies. Once it is done, you can use the ledger API or the log
> > > >>> stream API to access the BookKeeper cluster.
> > > >>>
> > > >>> Both DL and BK have pluggable metadata storage. They have very clear
> > > >>> interfaces defining the metadata operations, so it is straightforward
> > > >>> to use a different metadata storage.
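
What "pluggable metadata storage" buys can be sketched in a few lines: client code is written against an interface, and any backend (ZooKeeper, etcd, or the bookies themselves) can implement it. The `MetadataStore` interface below, with its version-checked write, is a hypothetical illustration loosely modelled on the compare-and-set style of ZooKeeper's versioned updates; it is not BookKeeper's or DistributedLog's actual interface.

```python
from abc import ABC, abstractmethod
from typing import Dict, Tuple


class VersionMismatch(Exception):
    """Raised when a conditional write sees a stale version."""


class MetadataStore(ABC):
    """Hypothetical pluggable metadata interface (not the real BK API)."""

    @abstractmethod
    def create(self, ledger_id: int, metadata: bytes) -> int: ...

    @abstractmethod
    def read(self, ledger_id: int) -> Tuple[bytes, int]: ...

    @abstractmethod
    def write(self, ledger_id: int, metadata: bytes, expected_version: int) -> int: ...


class InMemoryMetadataStore(MetadataStore):
    """One possible backend; a ZK- or bookie-backed one would plug in the same way."""

    def __init__(self) -> None:
        self._data: Dict[int, Tuple[bytes, int]] = {}

    def create(self, ledger_id: int, metadata: bytes) -> int:
        if ledger_id in self._data:
            raise KeyError(f"ledger {ledger_id} already exists")
        self._data[ledger_id] = (metadata, 0)
        return 0

    def read(self, ledger_id: int) -> Tuple[bytes, int]:
        return self._data[ledger_id]

    def write(self, ledger_id: int, metadata: bytes, expected_version: int) -> int:
        _, current = self._data[ledger_id]
        # Reject the update if someone else changed the metadata meanwhile.
        if current != expected_version:
            raise VersionMismatch(f"expected {expected_version}, found {current}")
        self._data[ledger_id] = (metadata, current + 1)
        return current + 1


def close_ledger(store: MetadataStore, ledger_id: int) -> int:
    """Client logic depends only on the interface, not on the backend."""
    _, version = store.read(ledger_id)
    return store.write(ledger_id, b"state=CLOSED", version)
```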
> > > >>>
> > > >>>
> > > >>> > Enrico
> > > >>> >
> > > >>> > On Wed, Sep 6, 2017, 19:51 Enrico Olivelli <eo...@gmail.com> wrote:
> > > >>> >
> > > >>> > >
> > > >>> > >
> > > >>> > > On Wed, Sep 6, 2017, 18:25 Sijie Guo <gu...@gmail.com> wrote:
> > > >>> > >
> > > >>> > >> On Sep 6, 2017 4:57 AM, "Enrico Olivelli" <eolivelli@gmail.com>
> > > >>> > >> wrote:
> > > >>> > >>
> > > >>> > >> Thank you Sijie and Jia for your comments and explanations,
> > > >>> > >> answers inline.
> > > >>> > >>
> > > >>> > >> 2017-09-06 2:23 GMT+02:00 Jia Zhai <zh...@gmail.com>:
> > > >>> > >>
> > > >>> > >> > Thanks a lot Enrico and Sijie for your comments and
> > > >>> > >> > information on this.
> > > >>> > >> >
> > > >>> > >> > On Tue, Sep 5, 2017 at 9:31 PM, Enrico Olivelli <eolivelli@gmail.com>
> > > >>> > >> > wrote:
> > > >>> > >> >
> > > >>> > >> > > Great to see you working on this!
> > > >>> > >> > > It would be great to have such a feature, as it is the first
> > > >>> > >> > > step to a 'standalone' BookKeeper mode.
> > > >>> > >> > >
> > > >>> > >> > > Some complementary ideas/first-look questions:
> > > >>> > >> > > - the document does not talk about security; IMHO we have to
> > > >>> > >> > > cover at least authentication and TLS. It would be great to
> > > >>> > >> > > leverage the existing AuthPlugins, as they are based on
> > > >>> > >> > > exchanging byte[] (as SASL wants)
> > > >>> > >> > >
> > > >>> > >> > [Jia] It is a good idea. We left the security part out for now
> > > >>> > >> > for a few reasons: 1) to keep this BP focused on removing the
> > > >>> > >> > ZooKeeper dependency from the client; 2) it is introduced as a
> > > >>> > >> > separate implementation of existing interfaces, so it won’t
> > > >>> > >> > impact the existing security story. And for sure, we will add
> > > >>> > >> > the security part after this.
> > > >>> > >> >
> > > >>> > >>
> > > >>> > >>
> > > >>> > >> I am fine with that; I am only afraid that we won't be able to
> > > >>> > >> support it in the (near) future. Maybe you could at least mention
> > > >>> > >> the security story and add some reference to how we would deal
> > > >>> > >> with it in the future.
> > > >>> > >>
> > > >>> > >>
> > > >>> > >> The new ledger manager will first be marked as experimental,
> > > >>> > >> until it is stable and has the security feature.
> > > >>> > >>
> > > >>> > >> How does that sound?
> > > >>> > >>
> > > >>> > >
> > > >>> > > Ok
> > > >>> > >
> > > >>> > >>
> > > >>> > >>
> > > >>> > >>
> > > >>> > >> >
> > > >>> > >> > > - do we have some kind of "bootstrap servers list"
> > > >>> > >> > > configuration option? Should the list be complete or just a
> > > >>> > >> > > subset of bookies? At connection time, the client could
> > > >>> > >> > > discover the list of other bookies.
> > > >>> > >> > >
> > > >>> > >> > [Jia] Yes, we will have a `clientBootstrapBookies` setting in
> > > >>> > >> > the server set. It can be a list of bookies or simply a DNS
> > > >>> > >> > name over the bookies. Will add this to the BP.
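
A minimal sketch of how a client might interpret a bootstrap setting that is either an explicit list of bookies or a single DNS name. The comma-separated `host[:port]` format, the default port (taken from the `3281` example later in the thread) and both function names are assumptions for illustration; the thread only says the BP will add a `clientBootstrapBookies` setting. IPv6 literals are not handled.

```python
from typing import Callable, List, Tuple

DEFAULT_PORT = 3281  # example port from the thread; assumed to be configurable

Endpoint = Tuple[str, int]


def parse_bootstrap_bookies(value: str, default_port: int = DEFAULT_PORT) -> List[Endpoint]:
    """Parse a hypothetical 'host[:port],host[:port],...' setting value."""
    endpoints: List[Endpoint] = []
    for raw in value.split(","):
        entry = raw.strip()
        if not entry:
            continue
        host, sep, port = entry.rpartition(":")
        if sep:
            endpoints.append((host, int(port)))
        else:
            endpoints.append((entry, default_port))
    return endpoints


def expand_dns(endpoints: List[Endpoint], resolve: Callable[[str], List[str]]) -> List[Endpoint]:
    """Expand each host name into one endpoint per resolved address."""
    expanded: List[Endpoint] = []
    for host, port in endpoints:
        for addr in resolve(host):
            expanded.append((addr, port))
    return expanded
```

In a real client the resolver could be `lambda h: socket.gethostbyname_ex(h)[2]` (IPv4 only); keeping it injectable lets unit tests use a fake resolver instead of real DNS.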
> > > >>> > >> >
> > > >>> > >> > > - will the client connect to only one bookie at a time? How
> > > >>> > >> > > will we deal with errors?
> > > >>> > >> > >
> > > >>> > >> > [Jia] It will connect to the list of bootstrap servers. gRPC
> > > >>> > >> > will load-balance the requests and manage the connection errors.
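
gRPC does this inside its channel, but only if a balancing policy such as `round_robin` is selected; the default `pick_first` policy sends all requests to one backend until that backend fails. The plain-Python sketch below only illustrates the idea (rotate across bootstrap bookies, skip ones that failed recently, retry them after a cool-down). The class name and the cool-down rule are illustrative assumptions, not gRPC internals.

```python
import itertools
from typing import Callable, Dict, List


class RoundRobinPicker:
    """Round-robin over endpoints, skipping ones that failed recently."""

    def __init__(self, endpoints: List[str], cooldown_s: float,
                 clock: Callable[[], float]) -> None:
        self._endpoints = list(endpoints)
        self._cooldown_s = cooldown_s
        self._clock = clock  # injected so the behavior is deterministic in tests
        self._failed_at: Dict[str, float] = {}
        self._cursor = itertools.cycle(range(len(self._endpoints)))

    def _healthy(self, endpoint: str) -> bool:
        failed = self._failed_at.get(endpoint)
        return failed is None or self._clock() - failed >= self._cooldown_s

    def pick(self) -> str:
        # Visit each endpoint at most once per pick.
        for _ in range(len(self._endpoints)):
            endpoint = self._endpoints[next(self._cursor)]
            if self._healthy(endpoint):
                return endpoint
        raise RuntimeError("no healthy endpoint available")

    def report_failure(self, endpoint: str) -> None:
        self._failed_at[endpoint] = self._clock()

    def report_success(self, endpoint: str) -> None:
        self._failed_at.pop(endpoint, None)
```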
> > > >>> > >> >
> > > >>> > >> > - should the bookie write on ZK metadata its gRPC endpoint
> > info
> > > ?
> > > >>> > (this
> > > >>> > >> > > will be useful for a bookie to tell about other bookies to
> > the
> > > >>> > >> connected
> > > >>> > >> > > clients)
> > > >>> > >> > >
> > > >>> > >> > [Jia]No, it won’t. We don’t see a strong reason to add it.
> > > >>> Especially
> > > >>> > >> > eventually we may eliminate zookeeper completely.
> > > >>> > >> > It can be a fixed port `3281`, or in a scheduler-based
> > > >>> environment, it
> > > >>> > >> is
> > > >>> > >> > very easy to have a load balancer sitting in front of those
> > > >>> bookies.
> > > >>> > >> >
> > > >>> > >>
> > > >>> > >> I think a fixed port is not a good way.
> > > >>> > >> You will not be able to run more than one bookie on a single
> > host.
> > > >>> > >>
> > > >>> > >> We should support:
> > > >>> > >> - configurable port
> > > >>> > >> - ephemeral port for tests
> > > >>> > >>
> > > >>> > >>
> > > >>> > >> I think what Jia means is a configurable port, but a relatively
> > > >>> > >> fixed one: the client doesn't discover this port from zookeeper.
> > > >>> > >>
> > > >>> > >
> > > >>> > > Very good
> > > >>> > >
> > > >>> > >>
> > > >>> > >>
> > > >>> > >> Ideally I would like to have the local transport option, in order
> > > >>> > >> to have a single JVM, but this is not a blocker: as we are running
> > > >>> > >> gRPC on netty it should be feasible, or we can create some kind of
> > > >>> > >> short-circuit between the client and the Bookie.
> > > >>> > >>
> > > >>> > >>
> > > >>> > >> gRPC supports an in-process channel, so you don't need to use the
> > > >>> > >> low-level netty settings.
> > > >>> > >>
> > > >>> > >
> > > >>> > > Great
> > > >>> > >
> > > >>> > > So it sounds all good to me thanks
> > > >>> > >
> > > >>> > > Enrico
> > > >>> > >
> > > >>> > >
> > > >>> > >>
> > > >>> > >> I am OK with not writing this to the bookie metadata, leaving it up
> > > >>> > >> to the client to have a configured list of bookies enabled for
> > > >>> > >> metadata operations.
> > > >>> > >>
> > > >>> > >>
> > > >>> > >>
> > > >>> > >>
> > > >>> > >> >
> > > >>> > >> > - the bookie will somehow be a proxy for zookeeper. I think that
> > > >>> > >> > > the 'watch' part is the most complex: we will have to deal with
> > > >>> > >> > > reconnections, errors... maybe it is worth writing more detail
> > > >>> > >> > > about this.
> > > >>> > >> > >
> > > >>> > >> > [Jia] The `watch` API uses `streaming` RPC in gRPC. It is
> > > >>> > >> > straightforward proxy behavior: if a connection is broken, the
> > > >>> > >> > client will simply retry the watch.
> > > >>> > >> >
> > > >>> > >> >
> > > >>> > >> > > Minor issues:
> > > >>> > >> > > - Maybe you could consider using ledgerId instead of ledger_id,
> > > >>> > >> > > as in LedgerMetadataFormat, where we are using lastEntryId
> > > >>> > >> > >
> > > >>> > >> > [Jia] Thanks, it is the protobuf style. Protobuf will convert
> > > >>> > >> > `ledger_id` to `ledgerId`. We don’t need to worry about this.
> > > >>> > >> >
> > > >>> > >>
> > > >>> > >> got it, thanks
> > > >>> > >>
> > > >>> > >>
> > > >>> > >> >
> > > >>> > >> >
> > > >>> > >> > > - In the "motivation" part you write that having more clients
> > > >>> > >> > > than bookies would be a problem for zookeeper; actually,
> > > >>> > >> > > zookeeper is very good at dealing with a huge number of clients.
> > > >>> > >> > > I am always running clusters with 3-5 bookies and 10-100 writing
> > > >>> > >> > > clients and this has never caused trouble.
> > > >>> > >> >
> > > >>> > >> > [Jia] :) It seems “10-100 writing clients” is not “a huge number
> > > >>> > >> > of clients”.
> > > >>> > >> >
> > > >>> > >>
> > > >>> > >> OK, I agree with you and Sijie, I have no experience of larger
> > > >>> > >> clusters
> > > >>> > >>
> > > >>> > >>
> > > >>> > >> >
> > > >>> > >> > >
> > > >>> > >> >
> > > >>> > >> >
> > > >>> > >> >
> > > >>> > >> > > Future:
> > > >>> > >> > > - as bookies will be proxies maybe we should take care not
> > to
> > > >>> > >> overwhelm
> > > >>> > >> a
> > > >>> > >> > > bookie with too many clients
> > > >>> > >> > >
> > > >>> > >> > [Jia] First, gRPC is based on Netty and the protocol is HTTP/2, so
> > > >>> > >> > the connection is multiplexed. We don’t need to worry about
> > > >>> > >> > connection count. Second, all the bookies are treated equally for
> > > >>> > >> > the metadata operations, and gRPC will load balance the requests
> > > >>> > >> > across the bookies. We don’t need to worry about some bookies
> > > >>> > >> > being overwhelmed.
> > > >>> > >> >
> > > >>> > >>
> > > >>> > >> gRPC sounds great
> > > >>> > >>
> > > >>> > >>
> > > >>> > >> >
> > > >>> > >> >
> > > >>> > >> > > - iteration on ledgers: sometimes the client enumerates ledgers
> > > >>> > >> > > but is not interested in having all of them. As we are using the
> > > >>> > >> > > bookie as a proxy, maybe some kind of "filter" (at least on
> > > >>> > >> > > custom metadata) could be created to limit the number of
> > > >>> > >> > > returned items. Another point: I don't know gRPC, but it does
> > > >>> > >> > > not seem to be very clear how to 'stop' the iteration.
> > > >>> > >> > >
> > > >>> > >> > [Jia] Thanks, we can add it later. For now, we would like to focus
> > > >>> > >> > on adding the features the ledger manager needs.
> > > >>> > >> >
> > > >>> > >>
> > > >>> > >> Yup
> > > >>> > >>
> > > >>> > >> -- Enrico
> > > >>> > >>
> > > >>> > >>
> > > >>> > >> >
> > > >>> > >> > >
> > > >>> > >> > > -- Enrico
> > > >>> > >> > >
> > > >>> > >> > >
> > > >>> > >> > > 2017-09-05 15:10 GMT+02:00 Jia Zhai <zhaijia03@gmail.com>:
> > > >>> > >> > >
> > > >>> > >> > > > Hi all,
> > > >>> > >> > > >
> > > >>> > >> > > > I have just posted a proposal to remove zookeeper dependency
> > > >>> > >> > > > from bookkeeper client, to make bookkeeper client a thin client:
> > > >>> > >> > > >
> > > >>> > >> > > > https://cwiki.apache.org/confluence/display/BOOKKEEPER/BP-16%3A+remove+zookeeper+dependency+from+bookkeeper+client
> > > >>> > >> > > >
> > > >>> > >> > > >
> > > >>> > >> > > > BookKeeper uses zookeeper for service discovery (discovering
> > > >>> > >> > > > the available bookies in the cluster), metadata management
> > > >>> > >> > > > (storing all the metadata for ledgers). However it exposes the
> > > >>> > >> > > > metadata storage directly to the clients, making bookkeeper
> > > >>> > >> > > > client a very thick client. It also exposes some problems.
> > > >>> > >> > > >
> > > >>> > >> > > > This BP explores the possibility of eliminating zookeeper
> > > >>> > >> > > > completely from client side, to produce a thin bookkeeper
> > > >>> > >> > > > client.
> > > >>> > >> > > >
> > > >>> > >> > > > I will send a patch as soon as we agree on the proposal.
> > > >>> > >> > > >
> > > >>> > >> > > >
> > > >>> > >> > > > Thanks.
> > > >>> > >> > > >
> > > >>> > >> > > > -Jia
> > > >>> > >> > > >
> > > >>> > >> > >
> > > >>> > >> >
> > > >>> > >>
> > > >>> > > --
> > > >>> > >
> > > >>> > >
> > > >>> > > -- Enrico Olivelli
> > > >>> > >
> > > >>> > --
> > > >>> >
> > > >>> >
> > > >>> > -- Enrico Olivelli
> > > >>> >
> > > >>>
> > > >>
> > > >>
> > > >
> > >
> >
>
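The bootstrap discussion quoted above (a `clientBootstrapBookies` setting that holds a list of bookies or a single DNS name, with requests spread over that list and connection errors handled for the client) can be illustrated with a minimal Java sketch. It assumes the value is a comma-separated `host:port` list and that bare host names get a caller-supplied default port; the class `BootstrapBookies` and its methods are hypothetical and belong to neither BookKeeper nor gRPC. In the proposal this balancing and failover is delegated to gRPC itself, so the sketch only shows the idea.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Function;

// Hypothetical helper: the value format ("host:port,host:port" or a single
// host/DNS name) is an assumption, not something the BP specifies.
final class BootstrapBookies {
    private final List<String> endpoints;
    private final AtomicInteger next = new AtomicInteger();

    private BootstrapBookies(List<String> endpoints) {
        this.endpoints = Collections.unmodifiableList(endpoints);
    }

    // Parses a comma-separated list; entries without a port get defaultPort.
    // IPv6 literals are not handled in this sketch.
    static BootstrapBookies parse(String value, int defaultPort) {
        List<String> endpoints = new ArrayList<>();
        for (String raw : value.split(",")) {
            String entry = raw.trim();
            if (entry.isEmpty()) {
                continue; // tolerate blank entries and trailing commas
            }
            endpoints.add(entry.contains(":") ? entry : entry + ":" + defaultPort);
        }
        if (endpoints.isEmpty()) {
            throw new IllegalArgumentException("no bootstrap bookies configured");
        }
        return new BootstrapBookies(endpoints);
    }

    List<String> endpoints() {
        return endpoints;
    }

    // Round-robin choice; floorMod keeps the index valid after int overflow.
    String pick() {
        return endpoints.get(Math.floorMod(next.getAndIncrement(), endpoints.size()));
    }

    // Makes up to endpoints.size() attempts, taking endpoints round-robin, and
    // returns the first successful result; rethrows the last failure otherwise.
    <T> T callWithFailover(Function<String, T> call) {
        RuntimeException last = null;
        for (int i = 0; i < endpoints.size(); i++) {
            String endpoint = pick();
            try {
                return call.apply(endpoint);
            } catch (RuntimeException e) {
                last = e; // this bookie failed; move on to the next one
            }
        }
        throw last;
    }
}
```

For example, `BootstrapBookies.parse("bk1:3281, bk2", 3281)` yields `bk1:3281` and `bk2:3281`, and successive `pick()` calls alternate between them. Trying several endpoints before giving up means one unreachable bookie does not fail a metadata request.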

Re: [DISCUSS] BP-16: remove zookeeper dependency from bookkeeper client

Posted by Sijie Guo <gu...@gmail.com>.
On Wed, Sep 13, 2017 at 12:16 AM, Enrico Olivelli <eo...@gmail.com>
wrote:

> I think that this is a good direction to go.
>
> I trust the reasons given about ZK in huge systems, even if that is not my
> case, so I cannot comment on this use case.
>
> I am fine with the direction as long as we are still going to support
> ZooKeeper. BookKeeper is in the Hadoop / ZooKeeper ecosystem and several
> products rely on ZK too; for instance, in my systems it is usual to have
> BookKeeper/Kafka/HBase/Majordodo... and so I am not going to live without
> zookeeper in the short/mid term.
>
> I am really OK with dropping ZK, because for "simple" systems, when you
> need only BK, the burden of setting up a zookeeper server is weird for
> customers. I usually re-distribute BK + ZK with my applications and we
> are talking about little clusters of up to 10 machines.
>

Just to clarify: we are not dropping ZK here. We are just proposing to
have a ledger manager implementation that doesn't depend on zookeeper
directly.
We are not modifying any existing ledger manager implementation.


>
> The direction of this proposal is OK for me and it is very similar to the
> work I was starting on "standalone mode".
>
> I think it will be very easy to support the case of having a single bookie
> with this approach, or even client + bookie in the same JVM.
> Having multiple bookies will require us to add some other coordination
> facility between bookies. I would like to know if there is already some
> idea about this: are we going to use another product like etcd or jgroups,
> or implement our own coordination protocol?


We are not replacing A with B, even with zookeeper. The ledger management
is already abstracted in interfaces, so users can use whatever system they
prefer as the metadata store.

Our direction is to provide an option to store metadata as well as data in
bookies, so with this option no external metadata storage is needed.
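Sijie's point that ledger management already sits behind interfaces can be sketched in Java as follows. `LedgerMetadataStore`, `InMemoryLedgerMetadataStore` and `LedgerClient` are hypothetical names for a heavily simplified example; they are not BookKeeper's `LedgerManager` API. The sketch only shows that client code written against an interface does not change when the metadata store behind it does, whether that store is ZooKeeper, the bookies themselves, or (here) an in-memory map.

```java
import java.nio.charset.StandardCharsets;
import java.util.Map;
import java.util.Optional;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical, heavily simplified metadata interface (NOT BookKeeper's
// LedgerManager API), used only to show backend independence.
interface LedgerMetadataStore {
    // Stores metadata for a new ledger; returns false if the id already exists.
    boolean create(long ledgerId, byte[] metadata);

    Optional<byte[]> read(long ledgerId);

    void remove(long ledgerId);
}

// One possible backend: an in-memory map, e.g. for tests or a single JVM.
// A ZooKeeper-backed or bookie-backed store would implement the same interface.
final class InMemoryLedgerMetadataStore implements LedgerMetadataStore {
    private final Map<Long, byte[]> ledgers = new ConcurrentHashMap<>();

    @Override
    public boolean create(long ledgerId, byte[] metadata) {
        // Copy on the way in and out so callers cannot mutate stored bytes.
        return ledgers.putIfAbsent(ledgerId, metadata.clone()) == null;
    }

    @Override
    public Optional<byte[]> read(long ledgerId) {
        return Optional.ofNullable(ledgers.get(ledgerId)).map(byte[]::clone);
    }

    @Override
    public void remove(long ledgerId) {
        ledgers.remove(ledgerId);
    }
}

// Client-side code depends only on the interface, so swapping the metadata
// store does not change this class.
final class LedgerClient {
    private final LedgerMetadataStore store;

    LedgerClient(LedgerMetadataStore store) {
        this.store = store;
    }

    boolean createLedger(long ledgerId, String description) {
        return store.create(ledgerId, description.getBytes(StandardCharsets.UTF_8));
    }

    Optional<String> describe(long ledgerId) {
        return store.read(ledgerId).map(bytes -> new String(bytes, StandardCharsets.UTF_8));
    }
}
```

Replacing `new InMemoryLedgerMetadataStore()` with another implementation of the same interface would leave `LedgerClient` untouched.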


> ZK is simple but very effective. Maybe we could help the ZK community to
> move forward and resolve the problems we are bringing to light.
>
>
> Enrico
>
>
> 2017-09-13 3:15 GMT+02:00 Jia Zhai <zh...@gmail.com>:
>
> > Any thoughts or comments? :)
> >
> > Thanks a lot.
> > -Jia
> >
> > On Tue, Sep 12, 2017 at 4:30 PM, Jia Zhai <zh...@gmail.com> wrote:
> >
> > > This blog post also touches briefly on the limitations of zookeeper in
> > > bookkeeper:
> > > https://bitworks.software/blog/en/2017-07-12-replicated-scalable-commitlog-with-apachebookkeeper.html
> > >
> > > On Thu, Sep 7, 2017 at 9:45 AM, Jia Zhai <zh...@gmail.com> wrote:
> > >
> > >> 👍 Thanks a lot for the suggestions and feedback.
> > >>
> > >> On Thu, Sep 7, 2017 at 4:24 AM, Sijie Guo <gu...@gmail.com> wrote:
> > >>
> > >>> On Wed, Sep 6, 2017 at 1:07 PM, Enrico Olivelli <eolivelli@gmail.com>
> > >>> wrote:
> > >>>
> > >>> > Off-topic curiosity... Jia and Sijie, do you think we are going to
> > >>> > drop ZK from DL too?
> > >>> >
> > >>>
> > >>> Yes. That's the goal: 1) for large deployments, we are trying to
> > >>> overcome the limitations of zookeeper; 2) for smaller deployments, it
> > >>> will make deployment much easier: you just need to deploy a cluster of
> > >>> bookies. Once it is done, you can use the ledger API or the log stream
> > >>> API to access the bookkeeper cluster.
> > >>>
> > >>> Both DL and BK have pluggable metadata storage. They have very clear
> > >>> interfaces defining metadata operations, so it is straightforward to
> > >>> use a different metadata storage.
> > >>>
> > >>>
> > >>> > Enrico
> > >>> >
> > >>> > On Wed, Sep 6, 2017, 19:51 Enrico Olivelli <eo...@gmail.com> wrote:
> > >>> >
> > >>> > >
> > >>> > >
> > >>> > > On Wed, Sep 6, 2017, 18:25 Sijie Guo <gu...@gmail.com> wrote:
> > >>> > >
> > >>> > >> On Sep 6, 2017 4:57 AM, "Enrico Olivelli" <eo...@gmail.com>
> > >>> wrote:
> > >>> > >>
> > >>> > >> Thank you Sijie and Jia for your comments and explanations,
> > >>> > >> answers inline
> > >>> > >>
> > >>> > >> 2017-09-06 2:23 GMT+02:00 Jia Zhai <zh...@gmail.com>:
> > >>> > >>
> > >>> > >> > Thanks a lot Enrico and Sijie for your comments and
> information
> > on
> > >>> > this.
> > >>> > >> >
> > >>> > >> > On Tue, Sep 5, 2017 at 9:31 PM, Enrico Olivelli <
> > >>> eolivelli@gmail.com>
> > >>> > >> > wrote:
> > >>> > >> >
> > >>> > >> > > Great to see you working on this!
> > >>> > >> > > It would be great to have such a feature, as it is the first step
> > >>> > >> > > to a 'standalone' BookKeeper mode.
> > >>> > >> > >
> > >>> > >> > > Some complementary ideas/first look questions:
> > >>> > >> > > - the document does not talk about security. IMHO we have to cover
> > >>> > >> > > at least authentication and TLS; it would be great to leverage
> > >>> > >> > > the existing AuthPlugins, as they are based on exchanging byte[]
> > >>> > >> > > (as SASL wants)
> > >>> > >> > >
> > >>> > >> > [Jia] It is a good idea. We left out the security part for now for
> > >>> > >> > a few reasons: 1) to keep this BP focused on removing zookeeper
> > >>> > >> > dependencies from the client; 2) it is introduced as a separate
> > >>> > >> > implementation of existing interfaces, so it won’t impact the
> > >>> > >> > existing security story. And for sure, we will add the security
> > >>> > >> > part after this.
> > >>> > >> >
> > >>> > >>
> > >>> > >>
> > >>> > >> I am fine with that; I am only afraid that we won't be able to
> > >>> > >> support it in the (near) future. Maybe you could just mention the
> > >>> > >> security story and add some reference to how we would deal with it
> > >>> > >> in the future.
> > >>> > >>
> > >>> > >>
> > >>> > >> The new ledger manager will first be marked as experimental, until
> > >>> > >> it is stable and has security features.
> > >>> > >>
> > >>> > >> How does that sound?
> > >>> > >>
> > >>> > >
> > >>> > > Ok
> > >>> > >
> > >>> > >>
> > >>> > >>
> > >>> > >>
> > >>> > >> >
> > >>> > >> > - do we have some kind of "bootstrap servers list" configuration
> > >>> > >> > > option? Should the list be complete or just a subset of bookies?
> > >>> > >> > > At connection time the client could discover the list of other
> > >>> > >> > > bookies.
> > >>> > >> > >
> > >>> > >> > [Jia] Yes, we will have a `clientBootstrapBookies` setting in the
> > >>> > >> > server set. It can be a list of bookies or simply a DNS name over
> > >>> > >> > the bookies. Will add this to the BP.
> > >>> > >> >
> > >>> > >> > - will the client connect to only one bookie at a time? How will
> > >>> > >> > > we deal with errors?
> > >>> > >> > >
> > >>> > >> > [Jia] It will connect to the list of bootstrap servers. gRPC will
> > >>> > >> > load balance the requests and manage the connection errors.
> > >>> > >> >
> > >>> > >> > - should the bookie write its gRPC endpoint info to the ZK metadata?
> > >>> > >> > > (This would be useful for a bookie to tell the connected clients
> > >>> > >> > > about other bookies.)
> > >>> > >> > >
> > >>> > >> > [Jia] No, it won’t. We don’t see a strong reason to add it,
> > >>> > >> > especially since we may eventually eliminate zookeeper completely.
> > >>> > >> > It can be a fixed port `3281`, or, in a scheduler-based
> > >>> > >> > environment, it is very easy to have a load balancer sitting in
> > >>> > >> > front of those bookies.
> > >>> > >> >
> > >>> > >>
> > >>> > >> I think a fixed port is not a good way.
> > >>> > >> You will not be able to run more than one bookie on a single
> host.
> > >>> > >>
> > >>> > >> We should support:
> > >>> > >> - configurable port
> > >>> > >> - ephemeral port for tests
> > >>> > >>
> > >>> > >>
> > >>> > >> I think what Jia means is a configurable port, but a relatively
> > >>> > >> fixed one: the client doesn't discover this port from zookeeper.
> > >>> > >>
> > >>> > >
> > >>> > > Very good
> > >>> > >
> > >>> > >>
> > >>> > >>
> > >>> > >> Ideally I would like to have the local transport option, in order
> > >>> > >> to have a single JVM, but this is not a blocker: as we are running
> > >>> > >> gRPC on netty it should be feasible, or we can create some kind of
> > >>> > >> short-circuit between the client and the Bookie.
> > >>> > >>
> > >>> > >>
> > >>> > >> gRPC supports an in-process channel, so you don't need to use the
> > >>> > >> low-level netty settings.
> > >>> > >>
> > >>> > >
> > >>> > > Great
> > >>> > >
> > >>> > > So it sounds all good to me thanks
> > >>> > >
> > >>> > > Enrico
> > >>> > >
> > >>> > >
> > >>> > >>
> > >>> > >> I am OK with not writing this to the bookie metadata, leaving it up
> > >>> > >> to the client to have a configured list of bookies enabled for
> > >>> > >> metadata operations.
> > >>> > >>
> > >>> > >>
> > >>> > >>
> > >>> > >>
> > >>> > >> >
> > >>> > >> > - the bookie will somehow be a proxy for zookeeper. I think that
> > >>> > >> > > the 'watch' part is the most complex: we will have to deal with
> > >>> > >> > > reconnections, errors... maybe it is worth writing more detail
> > >>> > >> > > about this.
> > >>> > >> > >
> > >>> > >> > [Jia] The `watch` API uses `streaming` RPC in gRPC. It is
> > >>> > >> > straightforward proxy behavior: if a connection is broken, the
> > >>> > >> > client will simply retry the watch.
> > >>> > >> >
> > >>> > >> >
> > >>> > >> > > Minor issues:
> > >>> > >> > > - Maybe you could consider using ledgerId instead of ledger_id,
> > >>> > >> > > as in LedgerMetadataFormat, where we are using lastEntryId
> > >>> > >> > >
> > >>> > >> > [Jia] Thanks, it is the protobuf style. Protobuf will convert
> > >>> > >> > `ledger_id` to `ledgerId`. We don’t need to worry about this.
> > >>> > >> >
> > >>> > >>
> > >>> > >> got it, thanks
> > >>> > >>
> > >>> > >>
> > >>> > >> >
> > >>> > >> >
> > >>> > >> > > - In the "motivation" part you write that having more clients
> > >>> > >> > > than bookies would be a problem for zookeeper; actually,
> > >>> > >> > > zookeeper is very good at dealing with a huge number of clients.
> > >>> > >> > > I am always running clusters with 3-5 bookies and 10-100 writing
> > >>> > >> > > clients and this has never caused trouble.
> > >>> > >> >
> > >>> > >> > [Jia] :) It seems “10-100 writing clients” is not “a huge number
> > >>> > >> > of clients”.
> > >>> > >> >
> > >>> > >>
> > >>> > >> OK, I agree with you and Sijie, I have no experience of larger
> > >>> > >> clusters
> > >>> > >>
> > >>> > >>
> > >>> > >> >
> > >>> > >> > >
> > >>> > >> >
> > >>> > >> >
> > >>> > >> >
> > >>> > >> > > Future:
> > >>> > >> > > - as bookies will be proxies maybe we should take care not
> to
> > >>> > >> overwhelm
> > >>> > >> a
> > >>> > >> > > bookie with too many clients
> > >>> > >> > >
> > >>> > >> > [Jia] First, gRPC is based on Netty and the protocol is HTTP/2, so
> > >>> > >> > the connection is multiplexed. We don’t need to worry about
> > >>> > >> > connection count. Second, all the bookies are treated equally for
> > >>> > >> > the metadata operations, and gRPC will load balance the requests
> > >>> > >> > across the bookies. We don’t need to worry about some bookies
> > >>> > >> > being overwhelmed.
> > >>> > >> >
> > >>> > >>
> > >>> > >> gRPC sounds great
> > >>> > >>
> > >>> > >>
> > >>> > >> >
> > >>> > >> >
> > >>> > >> > > - iteration on ledgers: sometimes the client enumerates ledgers
> > >>> > >> > > but is not interested in having all of them. As we are using the
> > >>> > >> > > bookie as a proxy, maybe some kind of "filter" (at least on
> > >>> > >> > > custom metadata) could be created to limit the number of
> > >>> > >> > > returned items. Another point: I don't know gRPC, but it does
> > >>> > >> > > not seem to be very clear how to 'stop' the iteration.
> > >>> > >> > >
> > >>> > >> > [Jia] Thanks, we can add it later. For now, we would like to focus
> > >>> > >> > on adding the features the ledger manager needs.
> > >>> > >> >
> > >>> > >>
> > >>> > >> Yup
> > >>> > >>
> > >>> > >> -- Enrico
> > >>> > >>
> > >>> > >>
> > >>> > >> >
> > >>> > >> > >
> > >>> > >> > > -- Enrico
> > >>> > >> > >
> > >>> > >> > >
> > >>> > >> > > 2017-09-05 15:10 GMT+02:00 Jia Zhai <zh...@gmail.com>:
> > >>> > >> > >
> > >>> > >> > > > Hi all,
> > >>> > >> > > >
> > >>> > >> > > > I have just posted a proposal to remove zookeeper dependency
> > >>> > >> > > > from bookkeeper client, to make bookkeeper client a thin client:
> > >>> > >> > > >
> > >>> > >> > > > https://cwiki.apache.org/confluence/display/BOOKKEEPER/BP-16%3A+remove+zookeeper+dependency+from+bookkeeper+client
> > >>> > >> > > >
> > >>> > >> > > >
> > >>> > >> > > > BookKeeper uses zookeeper for service discovery (discovering
> > >>> > >> > > > the available bookies in the cluster), metadata management
> > >>> > >> > > > (storing all the metadata for ledgers). However it exposes the
> > >>> > >> > > > metadata storage directly to the clients, making bookkeeper
> > >>> > >> > > > client a very thick client. It also exposes some problems.
> > >>> > >> > > >
> > >>> > >> > > > This BP explores the possibility of eliminating zookeeper
> > >>> > >> > > > completely from client side, to produce a thin bookkeeper
> > >>> > >> > > > client.
> > >>> > >> > > >
> > >>> > >> > > > I will send a patch as soon as we agree on the proposal.
> > >>> > >> > > >
> > >>> > >> > > >
> > >>> > >> > > > Thanks.
> > >>> > >> > > >
> > >>> > >> > > > -Jia
> > >>> > >> > > >
> > >>> > >> > >
> > >>> > >> >
> > >>> > >>
> > >>> > > --
> > >>> > >
> > >>> > >
> > >>> > > -- Enrico Olivelli
> > >>> > >
> > >>> > --
> > >>> >
> > >>> >
> > >>> > -- Enrico Olivelli
> > >>> >
> > >>>
> > >>
> > >>
> > >
> >
>
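The `watch` behavior Jia describes in the quoted thread (a streaming RPC that the client simply re-establishes when the connection breaks) could look roughly like this Java sketch. `WatchStream` is a hypothetical stand-in for a server-streaming call, not a gRPC or BookKeeper type, and the attempt limit and linear backoff are assumptions added for the example.

```java
import java.util.function.Consumer;

// Hypothetical stand-in for a server-streaming watch call: it pushes events to
// onEvent, returns normally when the server ends the stream, and throws when
// the connection breaks. Not a gRPC or BookKeeper type.
interface WatchStream {
    void watch(long ledgerId, Consumer<String> onEvent) throws Exception;
}

// Re-establishes the watch whenever the stream breaks, up to maxAttempts.
final class RetryingWatcher {
    private final WatchStream stream;
    private final int maxAttempts;
    private final long backoffMillis;

    RetryingWatcher(WatchStream stream, int maxAttempts, long backoffMillis) {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be >= 1");
        }
        this.stream = stream;
        this.maxAttempts = maxAttempts;
        this.backoffMillis = backoffMillis;
    }

    // Returns the attempt number on which the stream completed normally.
    int watchUntilDone(long ledgerId, Consumer<String> onEvent) {
        Exception last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                stream.watch(ledgerId, onEvent);
                return attempt;
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException("interrupted while watching", e);
            } catch (Exception e) {
                last = e; // broken stream: back off, then watch again
                if (attempt < maxAttempts) {
                    sleep(backoffMillis * attempt); // linear backoff (an assumption)
                }
            }
        }
        throw new IllegalStateException("watch failed after " + maxAttempts + " attempts", last);
    }

    private static void sleep(long millis) {
        try {
            Thread.sleep(millis);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted during backoff", e);
        }
    }
}
```

One thing the sketch leaves open: events that occur while the stream is down are not replayed, so a real client would likely need to re-read the current metadata (or resume from a known version) after re-establishing the watch.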

Re: [DISCUSS] BP-16: remove zookeeper dependency from bookkeeper client

Posted by Enrico Olivelli <eo...@gmail.com>.
I think that this is a good direction to go.

I trust the reasons given about ZK in huge systems, even if that is not my
case, so I cannot comment on this use case.

I am fine with the direction as long as we are still going to support ZooKeeper.
BookKeeper is in the Hadoop / ZooKeeper ecosystem and several products rely
on ZK too; for instance, in my systems it is usual to have
BookKeeper/Kafka/HBase/Majordodo... and so I am not going to live without
zookeeper in the short/mid term.

I am really OK with dropping ZK, because for "simple" systems, when you
need only BK, the burden of setting up a zookeeper server is weird for
customers. I usually re-distribute BK + ZK with my applications and we
are talking about little clusters of up to 10 machines.

The direction of this proposal is OK for me and it is very similar to the
work I was starting on "standalone mode".

I think it will be very easy to support the case of having a single bookie
with this approach, or even client + bookie in the same JVM.
Having multiple bookies will require us to add some other coordination
facility between bookies. I would like to know if there is already some
idea about this: are we going to use another product like etcd or jgroups,
or implement our own coordination protocol? ZK is simple but very
effective. Maybe we could help the ZK community to move forward and resolve
the problems we are bringing to light.


Enrico


2017-09-13 3:15 GMT+02:00 Jia Zhai <zh...@gmail.com>:

> Any thoughts or comments? :)
>
> Thanks a lot.
> -Jia
>
> On Tue, Sep 12, 2017 at 4:30 PM, Jia Zhai <zh...@gmail.com> wrote:
>
> > This blog post also touches briefly on the limitations of zookeeper in
> > bookkeeper:
> > https://bitworks.software/blog/en/2017-07-12-replicated-scalable-commitlog-with-apachebookkeeper.html
> >
> > On Thu, Sep 7, 2017 at 9:45 AM, Jia Zhai <zh...@gmail.com> wrote:
> >
> >> 👍 Thanks a lot for the suggestions and feedback.
> >>
> >> On Thu, Sep 7, 2017 at 4:24 AM, Sijie Guo <gu...@gmail.com> wrote:
> >>
> >>> On Wed, Sep 6, 2017 at 1:07 PM, Enrico Olivelli <eo...@gmail.com>
> >>> wrote:
> >>>
> >>> > Off-topic curiosity... Jia and Sijie, do you think we are going to
> >>> > drop ZK from DL too?
> >>> >
> >>>
> >>> Yes. That's the goal: 1) for large deployments, we are trying to
> >>> overcome the limitations of zookeeper; 2) for smaller deployments, it
> >>> will make deployment much easier: you just need to deploy a cluster of
> >>> bookies. Once it is done, you can use the ledger API or the log stream
> >>> API to access the bookkeeper cluster.
> >>>
> >>> Both DL and BK have pluggable metadata storage. They have very clear
> >>> interfaces defining metadata operations, so it is straightforward to
> >>> use a different metadata storage.
> >>>
> >>>
> >>> > Enrico
> >>> >
> >>> > On Wed, Sep 6, 2017, 19:51 Enrico Olivelli <eo...@gmail.com> wrote:
> >>> >
> >>> > >
> >>> > >
> >>> > > On Wed, Sep 6, 2017, 18:25 Sijie Guo <gu...@gmail.com> wrote:
> >>> > >
> >>> > >> On Sep 6, 2017 4:57 AM, "Enrico Olivelli" <eo...@gmail.com>
> >>> wrote:
> >>> > >>
> >>> > >> Thank you Sijie and Jia for your comments and explanations,
> >>> > >> answers inline
> >>> > >>
> >>> > >> 2017-09-06 2:23 GMT+02:00 Jia Zhai <zh...@gmail.com>:
> >>> > >>
> >>> > >> > Thanks a lot Enrico and Sijie for your comments and information
> on
> >>> > this.
> >>> > >> >
> >>> > >> > On Tue, Sep 5, 2017 at 9:31 PM, Enrico Olivelli <
> >>> eolivelli@gmail.com>
> >>> > >> > wrote:
> >>> > >> >
> >>> > >> > > Great to see you working on this!
> >>> > >> > > It would be great to have such a feature, as it is the first step
> >>> > >> > > to a 'standalone' BookKeeper mode.
> >>> > >> > >
> >>> > >> > > Some complementary ideas/first look questions:
> >>> > >> > > - the document does not talk about security. IMHO we have to cover
> >>> > >> > > at least authentication and TLS; it would be great to leverage
> >>> > >> > > the existing AuthPlugins, as they are based on exchanging byte[]
> >>> > >> > > (as SASL wants)
> >>> > >> > >
> >>> > >> > [Jia] It is a good idea. We left out the security part for now for
> >>> > >> > a few reasons: 1) to keep this BP focused on removing zookeeper
> >>> > >> > dependencies from the client; 2) it is introduced as a separate
> >>> > >> > implementation of existing interfaces, so it won’t impact the
> >>> > >> > existing security story. And for sure, we will add the security
> >>> > >> > part after this.
> >>> > >> >
> >>> > >>
> >>> > >>
> >>> > >> I am fine with that; I am only afraid that we won't be able to
> >>> > >> support it in the (near) future. Maybe you could just mention the
> >>> > >> security story and add some reference to how we would deal with it
> >>> > >> in the future.
> >>> > >>
> >>> > >>
> >>> > >> The new ledger manager will first be marked as experimental, until
> >>> > >> it is stable and has security features.
> >>> > >>
> >>> > >> How does that sound?
> >>> > >>
> >>> > >
> >>> > > Ok
> >>> > >
> >>> > >>
> >>> > >>
> >>> > >>
> >>> > >> >
> >>> > >> > - do we have some kind of "bootstrap servers list" configuration
> >>> > >> > > option? Should the list be complete or just a subset of bookies?
> >>> > >> > > At connection time the client could discover the list of other
> >>> > >> > > bookies.
> >>> > >> > >
> >>> > >> > [Jia] Yes, we will have a `clientBootstrapBookies` setting in the
> >>> > >> > server set. It can be a list of bookies or simply a DNS name over
> >>> > >> > the bookies. Will add this to the BP.
> >>> > >> >
> >>> > >> > - will the client connect to only one bookie at a time? How will
> >>> > >> > > we deal with errors?
> >>> > >> > >
> >>> > >> > [Jia] It will connect to the list of bootstrap servers. gRPC will
> >>> > >> > load balance the requests and manage the connection errors.
> >>> > >> >
> >>> > >> > - should the bookie write its gRPC endpoint info to the ZK metadata?
> >>> > >> > > (This would be useful for a bookie to tell the connected clients
> >>> > >> > > about other bookies.)
> >>> > >> > >
> >>> > >> > [Jia] No, it won’t. We don’t see a strong reason to add it,
> >>> > >> > especially since we may eventually eliminate zookeeper completely.
> >>> > >> > It can be a fixed port `3281`, or, in a scheduler-based
> >>> > >> > environment, it is very easy to have a load balancer sitting in
> >>> > >> > front of those bookies.
> >>> > >> >
> >>> > >>
> >>> > >> I think a fixed port is not a good way.
> >>> > >> You will not be able to run more than one bookie on a single host.
> >>> > >>
> >>> > >> We should support:
> >>> > >> - configurable port
> >>> > >> - ephemeral port for tests
> >>> > >>
> >>> > >>
> >>> > >> I think what Jia means is a configurable port, but a relatively
> >>> > >> fixed one: the client doesn't discover this port from zookeeper.
> >>> > >>
> >>> > >
> >>> > > Very good
> >>> > >
> >>> > >>
> >>> > >>
> >>> > >> Ideally I would like to have the local transport option, in order
> >>> > >> to have a single JVM, but this is not a blocker: as we are running
> >>> > >> gRPC on netty it should be feasible, or we can create some kind of
> >>> > >> short-circuit between the client and the Bookie.
> >>> > >>
> >>> > >>
> >>> > >> gRPC supports an in-process channel, so you don't need to use the
> >>> > >> low-level netty settings.
> >>> > >>
> >>> > >
> >>> > > Great
> >>> > >
> >>> > > So it sounds all good to me thanks
> >>> > >
> >>> > > Enrico
> >>> > >
> >>> > >
> >>> > >>
> >>> > >> I am OK with not writing this to the bookie metadata, leaving it up
> >>> > >> to the client to have a configured list of bookies enabled for
> >>> > >> metadata operations.
> >>> > >>
> >>> > >>
> >>> > >>
> >>> > >>
> >>> > >> >
> >>> > >> > - the bookie will somehow be a proxy for zookeeper. I think that
> >>> > >> > > the 'watch' part is the most complex: we will have to deal with
> >>> > >> > > reconnections, errors... maybe it is worth writing more detail
> >>> > >> > > about this.
> >>> > >> > >
> >>> > >> > [Jia] The `watch` API uses `streaming` RPC in gRPC. It is
> >>> > >> > straightforward proxy behavior: if a connection is broken, the
> >>> > >> > client will simply retry the watch.
> >>> > >> >
> >>> > >> >
> >>> > >> > > Minor issues:
> >>> > >> > > - Maybe you could consider using ledgerId instead of ledger_id,
> >>> > >> > > as in LedgerMetadataFormat, where we are using lastEntryId
> >>> > >> > >
> >>> > >> > [Jia] Thanks, it is the protobuf style. Protobuf will convert
> >>> > >> > `ledger_id` to `ledgerId`. We don’t need to worry about this.
> >>> > >> >
> >>> > >>
> >>> > >> got it, thanks
> >>> > >>
> >>> > >>
> >>> > >> >
> >>> > >> >
> >>> > >> > > - In the "motivation" part you write that having more clients
> >>> > >> > > than bookies would be a problem for zookeeper; actually,
> >>> > >> > > zookeeper is very good at dealing with a huge number of clients.
> >>> > >> > > I am always running clusters with 3-5 bookies and 10-100 writing
> >>> > >> > > clients and this has never caused trouble.
> >>> > >> >
> >>> > >> > [Jia] :) It seems “10-100 writing clients” is not “a huge number
> >>> > >> > of clients”.
> >>> > >> >
> >>> > >>
> >>> > >> OK, I agree with you and Sijie, I have no experience of larger
> >>> > >> clusters
> >>> > >>
> >>> > >>
> >>> > >> >
> >>> > >> > >
> >>> > >> >
> >>> > >> >
> >>> > >> >
> >>> > >> > > Future:
> >>> > >> > > - as bookies will be proxies maybe we should take care not to
> >>> > >> overwhelm
> >>> > >> a
> >>> > >> > > bookie with too many clients
> >>> > >> > >
> >>> > >> > [Jia] First, gRPC is based on Netty and the protocol is HTTP/2, so
> >>> > >> > the connection is multiplexed. We don’t need to worry about
> >>> > >> > connection count. Second, all the bookies are treated equally for
> >>> > >> > the metadata operations, and gRPC will load balance the requests
> >>> > >> > across the bookies. We don’t need to worry about some bookies
> >>> > >> > being overwhelmed.
> >>> > >> >
> >>> > >>
> >>> > >> gRPC sounds great
> >>> > >>
> >>> > >>
> >>> > >> >
> >>> > >> >
> >>> > >> > > - iteration on ledgers: sometimes a client enumerates ledgers but
> >>> > >> > > is not interested in all of them. As we are using the bookie as a
> >>> > >> > > proxy, maybe some kind of "filter" (at least on custom metadata)
> >>> > >> > > could be created to limit the number of returned items. Another
> >>> > >> > > point: I don't know gRPC, but it does not seem very clear how to
> >>> > >> > > 'stop' the iteration
> >>> > >> > >
> >>> > >> > [Jia] Thanks, we can add it later. For now, we would like to focus on
> >>> > >> > adding the features the ledger manager needs.
> >>> > >> >
> >>> > >>
> >>> > >> Yup
> >>> > >>
> >>> > >> -- Enrico
> >>> > >>
> >>> > >>
> >>> > >> >
> >>> > >> > >
> >>> > >> > > -- Enrico
> >>> > >> > >
> >>> > >> > >
> >>> > >> >
> >>> > >>
> >>> > > --
> >>> > >
> >>> > >
> >>> > > -- Enrico Olivelli
> >>> > >
> >>> > --
> >>> >
> >>> >
> >>> > -- Enrico Olivelli
> >>> >
> >>>
> >>
> >>
> >
>
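The rewatch-on-disconnect behavior described for the streaming `watch` RPC can be sketched in plain Java, without gRPC, as a retry loop with capped exponential backoff. This is a minimal sketch under stated assumptions, not the BP-16 code: `WatchStream` and `WatchRetrier` are hypothetical names, the stream is modeled as a blocking call that throws when the connection breaks, and the backoff values are arbitrary.

```java
/** Hypothetical stand-in for one streaming watch call: returns when the stream ends cleanly, throws if it breaks. */
interface WatchStream {
    void openAndConsume() throws Exception;
}

/** Hypothetical helper that re-opens a broken watch stream with capped exponential backoff. */
final class WatchRetrier {
    private final long initialBackoffMs;
    private final long maxBackoffMs;
    private final int maxAttempts;

    WatchRetrier(long initialBackoffMs, long maxBackoffMs, int maxAttempts) {
        if (initialBackoffMs <= 0 || maxBackoffMs < initialBackoffMs || maxAttempts < 1) {
            throw new IllegalArgumentException("invalid retry settings");
        }
        this.initialBackoffMs = initialBackoffMs;
        this.maxBackoffMs = maxBackoffMs;
        this.maxAttempts = maxAttempts;
    }

    /** Returns the number of attempts used; throws IllegalStateException once attempts are exhausted. */
    int run(WatchStream stream) {
        long backoffMs = initialBackoffMs;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                stream.openAndConsume();
                return attempt; // stream ended cleanly
            } catch (Exception broken) {
                if (attempt == maxAttempts) {
                    throw new IllegalStateException("watch failed after " + attempt + " attempts", broken);
                }
            }
            try {
                Thread.sleep(backoffMs); // wait before watching again
            } catch (InterruptedException ie) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException("interrupted while backing off", ie);
            }
            backoffMs = Math.min(backoffMs * 2, maxBackoffMs); // double, but never beyond the cap
        }
        throw new AssertionError("unreachable");
    }
}
```

A real client would likely also need to re-read the current metadata after resubscribing, since changes made while the stream was down are not replayed on a new stream; the thread does not say how BP-16 handles that gap.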
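For reference, the renaming Jia mentions can be approximated in a few lines. This is a simplified sketch of how protoc derives Java names from lower_snake_case field names (protoc applies extra rules, for example around digits, that are not reproduced here); `ProtoNames` is a hypothetical helper, not part of the protobuf API.

```java
/** Hypothetical helper: simplified lower_snake_case to lowerCamelCase, roughly as protoc does for Java. */
final class ProtoNames {
    static String snakeToLowerCamel(String fieldName) {
        StringBuilder out = new StringBuilder(fieldName.length());
        boolean upperNext = false;
        for (char c : fieldName.toCharArray()) {
            if (c == '_') {
                upperNext = true; // drop the underscore and capitalize the next character
            } else if (upperNext) {
                out.append(Character.toUpperCase(c));
                upperNext = false;
            } else {
                out.append(c);
            }
        }
        return out.toString();
    }

    /** Java getter name derived from the camel-case name, e.g. getLedgerId. */
    static String javaGetter(String fieldName) {
        String camel = snakeToLowerCamel(fieldName);
        return "get" + Character.toUpperCase(camel.charAt(0)) + camel.substring(1);
    }
}
```

With this rule a `ledger_id` field surfaces in Java as `getLedgerId()`, and the proto3 JSON mapping also uses `ledgerId` by default, so the `.proto` file can follow protobuf style without leaking underscores into client code.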
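The load-balancing and failover behavior attributed to gRPC above can be illustrated with a plain-Java round-robin picker that skips endpoints marked unhealthy. This only illustrates the idea and is not how gRPC implements it: `BookiePicker` is hypothetical, and gRPC's default policy is `pick_first`, so round-robin balancing has to be selected explicitly on the channel.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical round-robin picker over bootstrap bookies that skips ones marked unhealthy. */
final class BookiePicker {
    private final List<String> endpoints;
    private final Set<String> unhealthy = ConcurrentHashMap.newKeySet();
    private final AtomicInteger next = new AtomicInteger();

    BookiePicker(List<String> endpoints) {
        if (endpoints.isEmpty()) {
            throw new IllegalArgumentException("no bootstrap bookies configured");
        }
        this.endpoints = Collections.unmodifiableList(new ArrayList<>(endpoints));
    }

    void markUnhealthy(String endpoint) { unhealthy.add(endpoint); }

    void markHealthy(String endpoint) { unhealthy.remove(endpoint); }

    /** Returns the next healthy endpoint in rotation; fails only if every endpoint is unhealthy. */
    String pick() {
        int n = endpoints.size();
        for (int tried = 0; tried < n; tried++) {
            // floorMod keeps the index valid even after the counter overflows
            String candidate = endpoints.get(Math.floorMod(next.getAndIncrement(), n));
            if (!unhealthy.contains(candidate)) {
                return candidate;
            }
        }
        throw new IllegalStateException("all bootstrap bookies are unhealthy");
    }
}
```

Round-robin spreads load evenly only when requests cost about the same; a long-lived streaming call such as a watch stays on the bookie it was opened on, so per-bookie load may still be worth monitoring.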
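Enrico's filtering idea amounts to evaluating a predicate where the ledgers are enumerated and stopping once a limit is reached, so non-matching ledgers never cross the wire. A minimal sketch, assuming a hypothetical `LedgerInfo` type that carries custom metadata as string pairs (neither the type nor `LedgerListing.list` comes from the BP):

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.Map;
import java.util.function.Predicate;

final class LedgerListing {
    /** Hypothetical listing item: a ledger id plus its custom metadata. */
    static final class LedgerInfo {
        final long ledgerId;
        final Map<String, String> customMetadata;

        LedgerInfo(long ledgerId, Map<String, String> customMetadata) {
            this.ledgerId = ledgerId;
            this.customMetadata = customMetadata;
        }
    }

    /**
     * Serving-side filtering: pulls from the source only until `limit` matches are found,
     * so the remaining ledgers are never read or sent.
     */
    static List<Long> list(Iterator<LedgerInfo> source, Predicate<LedgerInfo> filter, int limit) {
        List<Long> matches = new ArrayList<>();
        while (matches.size() < limit && source.hasNext()) {
            LedgerInfo ledger = source.next();
            if (filter.test(ledger)) {
                matches.add(ledger.ledgerId);
            }
        }
        return matches;
    }
}
```

Stopping early falls out naturally: the loop simply stops pulling from the source. For a gRPC server-streaming listing, the client-side equivalent would be cancelling the call once it has seen enough items.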

Re: [DISCUSS] BP-16: remove zookeeper dependency from bookkeeper client

Posted by Jia Zhai <zh...@gmail.com>.
Any thoughts or comments? :)

Thanks a lot.
-Jia

On Tue, Sep 12, 2017 at 4:30 PM, Jia Zhai <zh...@gmail.com> wrote:

> This blog post also touches briefly on the limitations of zookeeper in
> bookkeeper:
> https://bitworks.software/blog/en/2017-07-12-replicated-scalable-commitlog-with-apachebookkeeper.html

Re: [DISCUSS] BP-16: remove zookeeper dependency from bookkeeper client

Posted by Jia Zhai <zh...@gmail.com>.
This blog post also touches briefly on the limitations of zookeeper in
bookkeeper:
https://bitworks.software/blog/en/2017-07-12-replicated-scalable-commitlog-with-apachebookkeeper.html

On Thu, Sep 7, 2017 at 9:45 AM, Jia Zhai <zh...@gmail.com> wrote:

> 👍 Thanks a lot for the suggestions and feedback.

Re: [DISCUSS] BP-16: remove zookeeper dependency from bookkeeper client

Posted by Jia Zhai <zh...@gmail.com>.
👍 Thanks a lot for the suggestions and feedback.

On Thu, Sep 7, 2017 at 4:24 AM, Sijie Guo <gu...@gmail.com> wrote:

> On Wed, Sep 6, 2017 at 1:07 PM, Enrico Olivelli <eo...@gmail.com>
> wrote:
>
> > Off-topic curiosity... Jia and Sijie, do you think we are going to drop
> > ZK from DL too?
> >
>
> Yes. That's the goal: 1) for large deployments, we are trying to overcome
> the limitations of zookeeper; 2) for smaller deployments, it will make
> deployment much easier: you just need to deploy a cluster of bookies. Once
> it is done, you can use the ledger API or the log stream API to access the
> bookkeeper cluster.
>
> Both DL and BK have pluggable metadata storage. They have very clear
> interfaces defining the metadata operations, so it is straightforward to
> use a different metadata storage.
>
>
> > Enrico
> >
> > On Wed, Sep 6, 2017, 19:51 Enrico Olivelli <eo...@gmail.com> wrote:
> >
> > >
> > >
> > > On Wed, Sep 6, 2017, 18:25 Sijie Guo <gu...@gmail.com> wrote:
> > >
> > >> On Sep 6, 2017 4:57 AM, "Enrico Olivelli" <eo...@gmail.com>
> wrote:
> > >>
> > >> Thank you Sijie and Jia for your comments and explanations,
> > >> answers inline
> > >>
> > >> 2017-09-06 2:23 GMT+02:00 Jia Zhai <zh...@gmail.com>:
> > >>
> > >> > Thanks a lot, Enrico and Sijie, for your comments and information on
> > >> > this.
> > >> >
> > >> > On Tue, Sep 5, 2017 at 9:31 PM, Enrico Olivelli <eolivelli@gmail.com>
> > >> > wrote:
> > >> >
> > >> > > Great to see you working on this!
> > >> > > It would be great to have such a feature, as it is the first step
> > >> > > towards a 'standalone' BookKeeper mode
> > >> > >
> > >> > > Some complementary ideas/first-look questions:
> > >> > > - the document does not talk about security. IMHO we have to cover at
> > >> > > least authentication and TLS; it would be great to leverage the
> > >> > > existing AuthPlugins, as they are based on exchanging byte[] (as SASL
> > >> > > wants)
> > >> > >
> > >> > [Jia] It is a good idea. We left out the security part for now for a
> > >> > few reasons: 1) to keep this BP focused on removing zookeeper
> > >> > dependencies from the client; 2) it is introduced as a separate
> > >> > implementation of existing interfaces, so it won't impact the existing
> > >> > security story. And for sure, we will add the security part after this.
> > >> >
> > >>
> > >>
> > >> I am fine with that; I am only afraid that we won't be able to support
> > >> it in the (near) future. Maybe you could just mention the security story
> > >> and add some reference to how we would deal with it in the future
> > >>
> > >>
> > >> The new ledger manager will first be marked as experimental, until it
> > >> is stable and has the security feature.
> > >>
> > >> How does that sound?
> > >>
> > >
> > > Ok
> > >
> > >>
> > >>
> > >>
> > >> >
> > >> > > - do we have some kind of "bootstrap servers list" configuration
> > >> > > option? Should the list be complete or just a subset of bookies? At
> > >> > > connection time the client could discover the list of other bookies
> > >> > >
> > >> > [Jia] Yes, we will have a `clientBootstrapBookies` setting in the
> > >> > server set. It can be a list of bookies or simply a DNS name over the
> > >> > bookies. I will add this to the BP
> > >> >
> > >> > > - will the client connect to only one bookie at a time? How will we
> > >> > > deal with errors?
> > >> > >
> > >> > [Jia] It will connect to the list of bootstrap servers. gRPC will
> > >> > load-balance the requests and manage the connection errors.
> > >> >
> > >> > > - should the bookie write its gRPC endpoint info to the ZK metadata?
> > >> > > (This would be useful for a bookie to tell the connected clients
> > >> > > about other bookies)
> > >> > >
> > >> > [Jia] No, it won't. We don't see a strong reason to add it, especially
> > >> > since we may eventually eliminate zookeeper completely. It can be a
> > >> > fixed port `3281`, or, in a scheduler-based environment, it is very
> > >> > easy to have a load balancer sitting in front of those bookies.
> > >> >
> > >>
> > >> I think a fixed port is not a good idea.
> > >> You will not be able to run more than one bookie on a single host.
> > >>
> > >> We should support:
> > >> - configurable port
> > >> - ephemeral port for tests
> > >>
> > >>
> > >> I think what Jia means is a configurable port, but a relatively fixed
> > >> one: the client doesn't discover this port from zookeeper.
> > >>
> > >
> > > Very good
> > >
> > >>
> > >>
> > >> Ideally I would like to have a local transport option, in order to have
> > >> a single JVM, but this is not a blocking problem: as we are running gRPC
> > >> on Netty it should be feasible, or we can create some kind of
> > >> short-circuit between the client and the Bookie
> > >>
> > >>
> > >> gRPC supports an in-process channel, so you don't need to use the
> > >> low-level Netty settings.
> > >>
> > >
> > > Great
> > >
> > > So it all sounds good to me, thanks
> > >
> > > Enrico
> > >
> > >
> > >>
> > >> I am OK with not writing this to the bookie metadata, leaving it up to
> > >> the client to have a configured list of bookies enabled for metadata
> > >> operations
> > >>
> > >>
> > >>
> > >>
> > >> >
> > >> > > - the bookie will be a kind of proxy for zookeeper. I think that the
> > >> > > 'watch' part is the most complex: we will have to deal with
> > >> > > reconnections, errors... maybe it is worth writing more detail about
> > >> > > this
> > >> > >
> > >> > [Jia] The `watch` API uses a `streaming` RPC in gRPC. It is a
> > >> > straightforward proxy behavior: if a connection is broken, the client
> > >> > will simply retry the watch.
> > >> >
> > >> >
> > >> > > Minor issues:
> > >> > > - Maybe you could consider using ledgerId instead of ledger_id, as in
> > >> > > LedgerMetadataFormat, where we are using lastEntryId
> > >> > >
> > >> > [Jia] Thanks. It is the protobuf naming style: the protobuf compiler
> > >> > converts `ledger_id` to `ledgerId` in the generated code, so we don't
> > >> > need to worry about this.
> > >> >
> > >>
> > >> got it, thanks
> > >>
> > >>
> > >> >
> > >> >
> > >> > > - In the "motivation" part you write that having more clients than
> > >> > > bookies would be a problem for zookeeper; actually, zookeeper is
> > >> > > very good at dealing with a huge number of clients. I am always
> > >> > > running clusters with 3-5 bookies and 10-100 writing clients, and
> > >> > > this has never caused trouble
> > >> >
> > >> > [Jia] :) It seems “10-100 writing clients” is not “a huge number of
> > >> > clients”.
> > >> >
> > >>
> > >> OK, I agree with you and Sijie; I have no experience with larger clusters
> > >>
> > >>
> > >> >
> > >> > >
> > >> >
> > >> >
> > >> >
> > >> > > Future:
> > >> > > - as bookies will be proxies, maybe we should take care not to
> > >> > > overwhelm a bookie with too many clients
> > >> > >
> > >> > [Jia] First, gRPC is based on Netty and the protocol is HTTP/2, so the
> > >> > connection is multiplexed; we don't need to worry about the connection
> > >> > count. Second, all the bookies are treated equally for metadata
> > >> > operations, and gRPC will load-balance the requests across the
> > >> > bookies, so we don't need to worry about some bookies being
> > >> > overwhelmed.
> > >> >
> > >>
> > >> gRPC sounds great
> > >>
> > >>
> > >> >
> > >> >
> > >> > > - iteration on ledgers: sometimes a client enumerates ledgers but is
> > >> > > not interested in all of them. As we are using the bookie as a proxy,
> > >> > > maybe some kind of "filter" (at least on custom metadata) could be
> > >> > > created to limit the number of returned items. Another point: I don't
> > >> > > know gRPC, but it does not seem very clear how to 'stop' the iteration
> > >> > >
> > >> > [Jia] Thanks, we can add it later. For now, we would like to focus on
> > >> > adding the features the ledger manager needs.
> > >> >
> > >>
> > >> Yup
> > >>
> > >> -- Enrico
> > >>
> > >>
> > >> >
> > >> > >
> > >> > > -- Enrico
> > >> > >
> > >> > >
> > >> >
> > >>
> > > --
> > >
> > >
> > > -- Enrico Olivelli
> > >
> > --
> >
> >
> > -- Enrico Olivelli
> >
>

Re: [DISCUSS] BP-16: remove zookeeper dependency from bookkeeper client

Posted by Sijie Guo <gu...@gmail.com>.
On Wed, Sep 6, 2017 at 1:07 PM, Enrico Olivelli <eo...@gmail.com> wrote:

> Off topic curiosity... Jia and Sijie, do you think we are going to drop ZK
> from DL too?
>

Yes, that's the goal: 1) for large deployments, we are trying to overcome
the limitations of zookeeper; 2) for smaller deployments, it will make
deployment much easier, since you just need to deploy a cluster of bookies.
Once it is done, you can use the ledger API or the log stream API to access
the bookkeeper cluster.

Both DL and BK have pluggable metadata storage, with very clear interfaces
defining the metadata operations, so it is straightforward to use a
different metadata store.
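
To make the "pluggable metadata storage" point concrete, here is a minimal Java sketch. The `MetadataStore` interface and the classes below are hypothetical and far simpler than BookKeeper's or DistributedLog's real metadata interfaces; they only show that code written against an interface does not care whether a zookeeper-backed, bookie-backed, or in-memory store sits behind it.

```java
import java.nio.charset.StandardCharsets;
import java.util.Map;
import java.util.Optional;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical metadata-store abstraction (not the real BookKeeper API).
interface MetadataStore {
    long createLedger(byte[] metadata);

    Optional<byte[]> readLedger(long ledgerId);

    boolean removeLedger(long ledgerId);
}

// In-memory backend, e.g. for tests. A zookeeper-backed or a
// bookie-backed (gRPC) store would implement the same interface.
class InMemoryMetadataStore implements MetadataStore {
    private final AtomicLong nextId = new AtomicLong();
    private final Map<Long, byte[]> ledgers = new ConcurrentHashMap<>();

    @Override
    public long createLedger(byte[] metadata) {
        long id = nextId.getAndIncrement();
        ledgers.put(id, metadata.clone()); // defensive copy
        return id;
    }

    @Override
    public Optional<byte[]> readLedger(long ledgerId) {
        return Optional.ofNullable(ledgers.get(ledgerId)).map(byte[]::clone);
    }

    @Override
    public boolean removeLedger(long ledgerId) {
        return ledgers.remove(ledgerId) != null;
    }
}

// Client code depends only on the interface, so the backend is swappable.
class LedgerClient {
    private final MetadataStore store;

    LedgerClient(MetadataStore store) {
        this.store = store;
    }

    long create(String description) {
        return store.createLedger(description.getBytes(StandardCharsets.UTF_8));
    }

    String describe(long ledgerId) {
        return store.readLedger(ledgerId)
                .map(bytes -> new String(bytes, StandardCharsets.UTF_8))
                .orElse("<missing>");
    }

    boolean delete(long ledgerId) {
        return store.removeLedger(ledgerId);
    }
}
```

Switching backends then only means constructing `LedgerClient` with a different `MetadataStore` implementation.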


Re: [DISCUSS] BP-16: remove zookeeper dependency from bookkeeper client

Posted by Enrico Olivelli <eo...@gmail.com>.
Off topic curiosity... Jia and Sijie, do you think we are going to drop ZK
from DL too?
Enrico

On mer 6 set 2017, 19:51 Enrico Olivelli <eo...@gmail.com> wrote:

>
>
> On mer 6 set 2017, 18:25 Sijie Guo <gu...@gmail.com> wrote:
>
>> On Sep 6, 2017 4:57 AM, "Enrico Olivelli" <eo...@gmail.com> wrote:
>>
>> Thank you Sijie and Jia for your comments and explanations,
>> answers inline
>>
>> 2017-09-06 2:23 GMT+02:00 Jia Zhai <zh...@gmail.com>:
>>
>> > Thanks a lot Enrico and Sijie for your comments and information on this.
>> >
>> > On Tue, Sep 5, 2017 at 9:31 PM, Enrico Olivelli <eo...@gmail.com>
>> > wrote:
>> >
>> > > Great to see you working on this !
>> > > I would be great to have such feature, as it is the first step to a
>> > > 'standalone' BookKeeper mode
>> > >
>> > > Some complementary ideas/first look questions:
>> > > - the document does not talk about security, IMHO we have at least to
>> > cover
>> > > authentication and TLS, it would be great to leverage existing
>> > AuthPlugins,
>> > > as they are based on exchanging byte[] (as SASL wants)
>> > >
>> > [Jia] It is a good idea. We left the security part for now for a few
>> > reasons. 1) Make this BP more focus on removing zookeeper dependencies
>> from
>> > client. 2) It is introduced as a separated implementation of existing
>> > interfaces. So it won’t impact existing security story.   And for sure,
>> We
>> > will add the security part later after this.
>> >
>>
>>
>> I am fine, I am only afraid that we won't be able to support it in the
>> (near) future,
>> maybe you could just only cite the security story and add some reference
>> to
>> how we would deal with it in future
>>
>>
>> The new ledger manager will be first marked as experimental, until it is
>> stable and have security feature.
>>
>> How does that sound?
>>
>
> Ok
>
>>
>>
>>
>> >
>> > - do we have some kind of "bootstrap servers list" configuration option
>> ?
>> > > the list should be complete or just a subset of bookies ? at
>> connection
>> > the
>> > > client could discover the list of other bookies
>> > >
>> > [Jia] Yes, we will have a `clientBootstrapBookies` settings in the
>> server
>> > set. It can be a list of bookies or just simple a DNS over the bookies.
>> > Will add this to the BP
>> >
>> > - will the client connect to only one bookie at a time ? how we will
>> deal
>> > > with errors ?
>> > >
>> > [Jia] It will connect the the list of bootstrap servers. gPRC will load
>> > balance the requests and manage the connection errors.
>> >
>> > - should the bookie write on ZK metadata its gRPC endpoint info ? (this
>> > > will be useful for a bookie to tell about other bookies to the
>> connected
>> > > clients)
>> > >
>> > [Jia]No, it won’t. We don’t see a strong reason to add it. Especially
>> > eventually we may eliminate zookeeper completely.
>> > It can be a fixed port `3281`, or in a scheduler-based environment, it
>> is
>> > very easy to have a load balancer sitting in front of those bookies.
>> >
>>
>> I think a fixed port is not a good way.
>> You will not be able to run more than one bookie on a single host.
>>
>> We should support:
>> - configurable port
>> - ephemeral port for tests
>>
>>
>> I think what Jia means is a configurable port, but it is a relatively
>> fixed
>> port, which client doesn't discover this port from zookeeper.
>>
>
> Very good
>
>>
>>
>> Ideally I would like to have the local transport option, in order to have
>> a
>> single JVM, but this is not a blocker problem, as we are running gRPC on
>> netty it should be feasible or we can create some kind of short-circut
>> between the client and the Bookie
>>
>>
>> GRPC supports inprocess channel. So you don't need to use the low level
>> netty settings.
>>
>
> Great
>
> So it sounds all good to me thanks
>
> Enrico
>
>
>>
>> I am OK for not writing this to the bookie metadata, leaving up to the
>> client have a configured list of bookies enabled to metadata operations
>>
>>
>>
>>
>> >
>> > - the bookie will be somehow a proxy for zookeeper, I think that the
>> > > 'watch' part is the more complex, we will have to deal with
>> > reconnections,
>> > > errors....maybe it is worth to write more detail about this
>> > >
>> > [Jia] The `watch` API is using the `streaming` rpc in gRPC. It is a
>> > straightforward proxy behavior, if a connection is broken, the client
>> will
>> > simply retry on watching again.
>> >
>> >
>> > > Minor issues:
>> > > - Maybe you can consider using ledgerId and not ledger_id, like in
>> > > LedgerMetadataFormat we are using lastEntryId
>> > >
>> > [Jia] Thanks, It is a protobuf style. The protobuf will convert
>> `ledger_id`
>> > to `ledgerId`. We don’t need to worry about this.
>> >
>>
>> got it, thanks
>>
>>
>> >
>> >
>> > > -In the "motivation" part you write that the fact the having more
>> clients
>> > > than the number of bookies would be a problem for zookeeper, actually
>> > > zookeeper is very good at dealing with a huge number of clients.
>> > Actually I
>> > > am always running clusters with 3-5 bookies and 10-100 writing clients
>> > and
>> > > this has never given troubles
>> >
>> > [Jia] :) Seems “10-100 writing clients” is not “a huge number of
>> clients”.
>> >
>>
>> OK, I agree with you an Sijie, I have no experience of larger clusters
>>
>>
>> >
>> > >
>> >
>> >
>> >
>> > > Future:
>> > > - as bookies will be proxies maybe we should take care not to
>> overwhelm
>> a
>> > > bookie with too many clients
>> > >
>> > [Jia] First, gRPC is based on Netty, the protocol is http2, so the
>> > connection is multiplexed. We don’t need to worry about connection
>> count.
>> > Second, all the bookies are treated equally for the metadata operations,
>> > gRPC will load balancing the requests across the bookies. We don’t need
>> to
>> > worry about some bookies are overwhelmed.
>> >
>>
>> gRPC sounds great
>>
>>
>> >
>> >
>> > > - iteration on ledgers, sometimes the clients enumerates ledgers but
>> it
>> > is
>> > > not interested in having all of them, as we are using the bookie as
>> proxy
>> > > maybe some kind of "filter" (at least on custom metadata) would be
>> create
>> > > to limit the number of returned items. Other point I don't know gRPC
>> but
>> > it
>> > > does not seems to be very clear how to 'stop' the iteration
>> > >
>> > [Jia] Thanks, We can add it later. For now, we would like to focus on
>> > adding the features the ledger manager needs.
>> >
>>
>> Yup
>>
>> -- Enrico
>>
>>
>> >
>> > >
>> > > -- Enrico
>> > >
>> > >
>> > > 2017-09-05 15:10 GMT+02:00 Jia Zhai <zh...@gmail.com>:
>> > >
>> > > > Hi all,
>> > > >
>> > > > I have just posted a proposal to remove zookeeper dependency from
>> > > > bookkeeper client, to make bookkeeper client a thin client:
>> > > >
>> > > > https://cwiki.apache.org/confluence/display/BOOKKEEPER/
>> > > > BP-16%3A+remove+zookeeper+dependency+from+bookkeeper+client
>> > > >
>> > > >
>> > > > BookKeeper uses zookeeper for service discovery (discovering the
>> > > available
>> > > > bookies in the cluster), metadata management (storing all the
>> metadata
>> > > for
>> > > > ledgers). However it exposes the metadata storage directly to the
>> > > clients,
>> > > > making bookkeeper client a very thick client. It also exposes some
>> > > > problems.
>> > > >
>> > > > This BP explores the possibility of eliminating zookeeper completely
>> > from
>> > > > client side, to produce a thin bookkeeper client.
>> > > >
>> > > > I will send a patch as soon as we agree on the proposal.
>> > > >
>> > > >
>> > > > Thanks.
>> > > >
>> > > > -Jia
>> > > >
>> > >
>> >
>>
> --
>
>
> -- Enrico Olivelli
>
-- 


-- Enrico Olivelli

Re: [DISCUSS] BP-16: remove zookeeper dependency from bookkeeper client

Posted by Enrico Olivelli <eo...@gmail.com>.
On mer 6 set 2017, 18:25 Sijie Guo <gu...@gmail.com> wrote:

> On Sep 6, 2017 4:57 AM, "Enrico Olivelli" <eo...@gmail.com> wrote:
>
> Thank you Sijie and Jia for your comments and explanations,
> answers inline
>
> 2017-09-06 2:23 GMT+02:00 Jia Zhai <zh...@gmail.com>:
>
> > Thanks a lot Enrico and Sijie for your comments and information on this.
> >
> > On Tue, Sep 5, 2017 at 9:31 PM, Enrico Olivelli <eo...@gmail.com>
> > wrote:
> >
> > > Great to see you working on this!
> > > It would be great to have such a feature, as it is the first step
> > > towards a 'standalone' BookKeeper mode.
> > >
> > > Some complementary ideas/first look questions:
> > > - the document does not talk about security. IMHO we have at least to
> > > cover authentication and TLS; it would be great to leverage the existing
> > > AuthPlugins, as they are based on exchanging byte[] (as SASL wants).
> > >
> > [Jia] It is a good idea. We left out the security part for now for a few
> > reasons: 1) to keep this BP focused on removing the zookeeper dependency
> > from the client; 2) it is introduced as a separate implementation of
> > existing interfaces, so it won’t impact the existing security story. We
> > will certainly add the security part after this.
> >
>
>
> I am fine; I am only afraid that we won't be able to support it in the
> (near) future. Maybe you could just mention the security story and add some
> reference to how we would deal with it in the future.
>
>
> The new ledger manager will first be marked as experimental until it is
> stable and has security features.
>
> How does that sound?
>

Ok

>
>
>
> >
> > > - do we have some kind of "bootstrap servers list" configuration option?
> > > Should the list be complete or just a subset of bookies? At connection
> > > time the client could discover the list of other bookies.
> > >
> > [Jia] Yes, we will have a `clientBootstrapBookies` setting in the server
> > set. It can be a list of bookies or simply a DNS name over the bookies.
> > I will add this to the BP.
> >
> > > - will the client connect to only one bookie at a time? How will we
> > > deal with errors?
> > >
> > [Jia] It will connect to the list of bootstrap servers. gRPC will load
> > balance the requests and manage the connection errors.
> >
> > > - should the bookie write its gRPC endpoint info to the ZK metadata?
> > > (this will be useful for a bookie to tell the connected clients about
> > > other bookies)
> > >
> > [Jia] No, it won’t. We don’t see a strong reason to add it, especially
> > since we may eventually eliminate zookeeper completely.
> > It can be a fixed port `3281`, or, in a scheduler-based environment, it
> > is very easy to have a load balancer sitting in front of those bookies.
> >
>
> I think a fixed port is not a good approach: you would not be able to run
> more than one bookie on a single host.
>
> We should support:
> - configurable port
> - ephemeral port for tests
>
>
> I think what Jia means is a configurable port, but a relatively fixed one:
> the client doesn't discover this port from zookeeper.
>

Very good

>
>
> Ideally I would like to have a local transport option, in order to run in
> a single JVM, but this is not a blocking problem: as we are running gRPC on
> Netty it should be feasible, or we can create some kind of short-circuit
> between the client and the Bookie.
>
>
> gRPC supports an in-process channel, so you don't need to use the
> low-level Netty settings.
>

Great

So it all sounds good to me, thanks.

Enrico


>
> I am OK with not writing this to the bookie metadata, leaving it up to the
> client to have a configured list of bookies enabled for metadata operations.
>
>
>
>
> >
> > > - the bookie will somehow be a proxy for zookeeper. I think that the
> > > 'watch' part is the most complex: we will have to deal with
> > > reconnections, errors... maybe it is worth writing more detail about
> > > this.
> > >
> > [Jia] The `watch` API uses `streaming` RPC in gRPC. It is straightforward
> > proxy behavior: if a connection is broken, the client will simply retry
> > the watch.
> >
> >
> > > Minor issues:
> > > - Maybe you can consider using ledgerId and not ledger_id, as in
> > > LedgerMetadataFormat, where we are using lastEntryId.
> > >
> > [Jia] Thanks, it is the protobuf style. Protobuf will convert `ledger_id`
> > to `ledgerId`, so we don’t need to worry about this.
> >
>
> got it, thanks
>
>
> >
> >
> > > - In the "motivation" part you write that having more clients than the
> > > number of bookies would be a problem for zookeeper; actually, zookeeper
> > > is very good at dealing with a huge number of clients. I always run
> > > clusters with 3-5 bookies and 10-100 writing clients, and this has never
> > > caused trouble.
> >
> > [Jia] :) It seems “10-100 writing clients” is not “a huge number of
> > clients”.
> >
>
> OK, I agree with you and Sijie; I have no experience with larger clusters.
>
>
> >
> > >
> >
> >
> >
> > > Future:
> > > - as bookies will be proxies, maybe we should take care not to overwhelm
> > > a bookie with too many clients
> > >
> > [Jia] First, gRPC is based on Netty and the protocol is HTTP/2, so the
> > connection is multiplexed. We don’t need to worry about connection count.
> > Second, all the bookies are treated equally for metadata operations, and
> > gRPC will load balance the requests across the bookies. We don’t need to
> > worry about some bookies being overwhelmed.
> >
>
> gRPC sounds great
>
>
> >
> >
> > > - iteration on ledgers: sometimes the client enumerates ledgers but is
> > > not interested in having all of them. As we are using the bookie as a
> > > proxy, maybe some kind of "filter" (at least on custom metadata) could be
> > > created to limit the number of returned items. Another point: I don't
> > > know gRPC, but it does not seem very clear how to 'stop' the iteration.
> > >
> > [Jia] Thanks, we can add it later. For now, we would like to focus on
> > adding the features the ledger manager needs.
> >
>
> Yup
>
> -- Enrico
>
>
> >
> > >
> > > -- Enrico
> >
>
-- 


-- Enrico Olivelli

Re: [DISCUSS] BP-16: remove zookeeper dependency from bookkeeper client

Posted by Sijie Guo <gu...@gmail.com>.
On Sep 6, 2017 4:57 AM, "Enrico Olivelli" <eo...@gmail.com> wrote:

Thank you Sijie and Jia for your comments and explanations,
answers inline

2017-09-06 2:23 GMT+02:00 Jia Zhai <zh...@gmail.com>:

> Thanks a lot Enrico and Sijie for your comments and information on this.
>
> On Tue, Sep 5, 2017 at 9:31 PM, Enrico Olivelli <eo...@gmail.com>
> wrote:
>
> > Great to see you working on this!
> > It would be great to have such a feature, as it is the first step towards
> > a 'standalone' BookKeeper mode.
> >
> > Some complementary ideas/first look questions:
> > - the document does not talk about security. IMHO we have at least to
> > cover authentication and TLS; it would be great to leverage the existing
> > AuthPlugins, as they are based on exchanging byte[] (as SASL wants).
> >
> [Jia] It is a good idea. We left out the security part for now for a few
> reasons: 1) to keep this BP focused on removing the zookeeper dependency
> from the client; 2) it is introduced as a separate implementation of
> existing interfaces, so it won’t impact the existing security story. We
> will certainly add the security part after this.
>


I am fine; I am only afraid that we won't be able to support it in the
(near) future. Maybe you could just mention the security story and add some
reference to how we would deal with it in the future.


The new ledger manager will first be marked as experimental until it is
stable and has security features.

How does that sound?



>
> > - do we have some kind of "bootstrap servers list" configuration option?
> > Should the list be complete or just a subset of bookies? At connection
> > time the client could discover the list of other bookies.
> >
> [Jia] Yes, we will have a `clientBootstrapBookies` setting in the server
> set. It can be a list of bookies or simply a DNS name over the bookies.
> I will add this to the BP.
>
> > - will the client connect to only one bookie at a time? How will we deal
> > with errors?
> >
> [Jia] It will connect to the list of bootstrap servers. gRPC will load
> balance the requests and manage the connection errors.
>
> > - should the bookie write its gRPC endpoint info to the ZK metadata?
> > (this will be useful for a bookie to tell the connected clients about
> > other bookies)
> >
> [Jia] No, it won’t. We don’t see a strong reason to add it, especially
> since we may eventually eliminate zookeeper completely.
> It can be a fixed port `3281`, or, in a scheduler-based environment, it is
> very easy to have a load balancer sitting in front of those bookies.
>

I think a fixed port is not a good approach: you would not be able to run
more than one bookie on a single host.

We should support:
- configurable port
- ephemeral port for tests


I think what Jia means is a configurable port, but a relatively fixed one:
the client doesn't discover this port from zookeeper.
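
A minimal sketch of the configurable-port idea, using only `java.net.ServerSocket`: a configured value of 0 asks the OS for a free ephemeral port, which lets tests start several bookies on one host. The property name `metadataServicePort` is a hypothetical placeholder, not an actual BookKeeper setting; the default of 3281 is simply the port mentioned in this thread.

```java
import java.io.IOException;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.util.Properties;

class MetadataEndpoint {
    // Hypothetical configuration key; default taken from the port mentioned above.
    static final String PORT_KEY = "metadataServicePort";
    static final int DEFAULT_PORT = 3281;

    // Returns the configured port; 0 means "let the OS pick an ephemeral port".
    static int configuredPort(Properties conf) {
        String value = conf.getProperty(PORT_KEY, Integer.toString(DEFAULT_PORT)).trim();
        int port = Integer.parseInt(value);
        if (port < 0 || port > 65535) {
            throw new IllegalArgumentException("invalid port: " + port);
        }
        return port;
    }

    // Binds on loopback (enough for a sketch or a test); callers read
    // getLocalPort() to learn the port that was actually chosen.
    static ServerSocket bind(Properties conf) throws IOException {
        return new ServerSocket(configuredPort(conf), 50, InetAddress.getLoopbackAddress());
    }
}
```

Because an ephemeral port is only known after binding, a test would hand `getLocalPort()` to its clients explicitly, which fits the idea that clients do not discover the port from zookeeper.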


Ideally I would like to have a local transport option, in order to run in a
single JVM, but this is not a blocking problem: as we are running gRPC on
Netty it should be feasible, or we can create some kind of short-circuit
between the client and the Bookie.


gRPC supports an in-process channel, so you don't need to use the low-level
Netty settings.


I am OK with not writing this to the bookie metadata, leaving it up to the
client to have a configured list of bookies enabled for metadata operations.
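
A configured list of bookies could be parsed roughly as follows. This is a sketch under assumptions: the thread names the `clientBootstrapBookies` setting but not its format, so the comma-separated `host[:port]` syntax and the `BootstrapBookies` helper are hypothetical.

```java
import java.net.InetSocketAddress;
import java.util.ArrayList;
import java.util.List;

class BootstrapBookies {
    // Parses a value such as "bk1:3281,bk2,bk3:4000" into unresolved addresses.
    // Entries without ":port" use defaultPort. A single DNS name that fronts
    // all bookies is simply a one-entry list. IPv6 literals are not handled.
    static List<InetSocketAddress> parse(String value, int defaultPort) {
        List<InetSocketAddress> bookies = new ArrayList<>();
        for (String raw : value.split(",")) {
            String entry = raw.trim();
            if (entry.isEmpty()) {
                continue; // tolerate stray commas
            }
            int colon = entry.lastIndexOf(':');
            String host = colon < 0 ? entry : entry.substring(0, colon);
            int port = colon < 0 ? defaultPort : Integer.parseInt(entry.substring(colon + 1));
            // Left unresolved so DNS lookups happen when connecting, not at parse time.
            bookies.add(InetSocketAddress.createUnresolved(host, port));
        }
        if (bookies.isEmpty()) {
            throw new IllegalArgumentException("no bootstrap bookies configured");
        }
        return bookies;
    }
}
```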




>
> > - the bookie will somehow be a proxy for zookeeper. I think that the
> > 'watch' part is the most complex: we will have to deal with reconnections,
> > errors... maybe it is worth writing more detail about this.
> >
> [Jia] The `watch` API uses `streaming` RPC in gRPC. It is straightforward
> proxy behavior: if a connection is broken, the client will simply retry
> the watch.
>
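
The retry-on-broken-stream behaviour described for the `watch` API can be sketched without gRPC dependencies. `WatchStream` is a hypothetical stand-in for a server-streaming call (in gRPC Java, a broken stream surfaces through `StreamObserver.onError`); the exponential backoff and the attempt cap are illustrative choices, not part of the proposal.

```java
import java.util.function.Consumer;

// Hypothetical stand-in for a server-streaming watch RPC.
interface WatchStream {
    // Pushes change events to listener until the stream completes;
    // throws if the connection breaks.
    void subscribe(Consumer<String> listener) throws Exception;
}

class RetryingWatcher {
    private final WatchStream stream;
    private final int maxAttempts;
    private final long initialBackoffMillis;

    RetryingWatcher(WatchStream stream, int maxAttempts, long initialBackoffMillis) {
        this.stream = stream;
        this.maxAttempts = maxAttempts;
        this.initialBackoffMillis = initialBackoffMillis;
    }

    // Re-subscribes after each broken stream, doubling the pause between
    // attempts. Returns how many attempts were needed.
    int watch(Consumer<String> listener) throws InterruptedException {
        long backoff = initialBackoffMillis;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                stream.subscribe(listener);
                return attempt; // stream completed without error
            } catch (InterruptedException e) {
                throw e; // interruption is not a broken connection
            } catch (Exception broken) {
                if (attempt == maxAttempts) {
                    throw new IllegalStateException(
                            "watch failed after " + attempt + " attempts", broken);
                }
                Thread.sleep(backoff);
                backoff *= 2;
            }
        }
        throw new IllegalArgumentException("maxAttempts must be positive");
    }
}
```

One caveat that may be worth spelling out in the BP: events emitted while the stream is down are not delivered, so after re-watching a client would likely need to re-read the current metadata.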
>
> > Minor issues:
> > - Maybe you can consider using ledgerId and not ledger_id, as in
> > LedgerMetadataFormat, where we are using lastEntryId.
> >
> [Jia] Thanks, it is the protobuf style. Protobuf will convert `ledger_id`
> to `ledgerId`, so we don’t need to worry about this.
>

got it, thanks
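
The renaming Jia mentions can be illustrated with a small converter: for a field declared as `ledger_id`, protobuf uses `ledgerId` as the JSON name and generates a Java accessor `getLedgerId()`. The helper below is a simplified sketch of that rule for plain lowercase/underscore names, not protoc's actual code.

```java
class ProtoNames {
    // Simplified snake_case -> lowerCamelCase: drop each '_' and upper-case
    // the following character. protoc applies extra rules (e.g. after
    // digits) that this sketch ignores.
    static String toLowerCamel(String snake) {
        StringBuilder out = new StringBuilder(snake.length());
        boolean upperNext = false;
        for (char c : snake.toCharArray()) {
            if (c == '_') {
                upperNext = true;
            } else if (upperNext) {
                out.append(Character.toUpperCase(c));
                upperNext = false;
            } else {
                out.append(c);
            }
        }
        return out.toString();
    }

    // Getter name in the style of protobuf-generated Java code, e.g. getLedgerId().
    static String javaGetter(String snake) {
        String camel = toLowerCamel(snake);
        if (camel.isEmpty()) {
            throw new IllegalArgumentException("empty field name");
        }
        return "get" + Character.toUpperCase(camel.charAt(0)) + camel.substring(1);
    }
}
```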


>
>
> > - In the "motivation" part you write that having more clients than the
> > number of bookies would be a problem for zookeeper; actually, zookeeper is
> > very good at dealing with a huge number of clients. I always run clusters
> > with 3-5 bookies and 10-100 writing clients, and this has never caused
> > trouble.
>
> [Jia] :) It seems “10-100 writing clients” is not “a huge number of clients”.
>

OK, I agree with you and Sijie; I have no experience with larger clusters.


>
> >
>
>
>
> > Future:
> > - as bookies will be proxies, maybe we should take care not to overwhelm a
> > bookie with too many clients
> >
> [Jia] First, gRPC is based on Netty and the protocol is HTTP/2, so the
> connection is multiplexed. We don’t need to worry about connection count.
> Second, all the bookies are treated equally for metadata operations, and
> gRPC will load balance the requests across the bookies. We don’t need to
> worry about some bookies being overwhelmed.
>

gRPC sounds great
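
Spreading metadata requests over equivalent bookies is a client-side load-balancing policy. gRPC's default policy is `pick_first`, so an even distribution typically means configuring a policy such as `round_robin`. The stdlib-only picker below (hypothetical `RoundRobinPicker`) just illustrates the round-robin idea; it is not gRPC's implementation.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

class RoundRobinPicker<T> {
    private final List<T> targets;
    private final AtomicInteger next = new AtomicInteger();

    RoundRobinPicker(List<T> targets) {
        if (targets.isEmpty()) {
            throw new IllegalArgumentException("no targets");
        }
        this.targets = Collections.unmodifiableList(new ArrayList<>(targets));
    }

    // Thread-safe; Math.floorMod keeps the index valid even after the
    // counter wraps around to negative values.
    T pick() {
        return targets.get(Math.floorMod(next.getAndIncrement(), targets.size()));
    }
}
```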


>
>
> > - iteration on ledgers: sometimes the client enumerates ledgers but is not
> > interested in having all of them. As we are using the bookie as a proxy,
> > maybe some kind of "filter" (at least on custom metadata) could be created
> > to limit the number of returned items. Another point: I don't know gRPC,
> > but it does not seem very clear how to 'stop' the iteration.
> >
> [Jia] Thanks, we can add it later. For now, we would like to focus on
> adding the features the ledger manager needs.
>

Yup
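
Both iteration concerns, filtering and stopping early, fit a lazily paged iterator: the filter travels with each page request, and the caller stops simply by no longer calling `next()`, so no further pages are fetched. `PageFetcher`, `LedgerIterator`, and the page-size parameter are hypothetical names for illustration; the sketch also assumes ledger ids are non-negative and returned in ascending order.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;
import java.util.function.LongPredicate;

// Hypothetical paged "list ledgers" call. The filter travels with the
// request so the server only returns matching ledger ids.
interface PageFetcher {
    // Up to pageSize ids greater than afterId that match filter, ascending.
    List<Long> fetch(long afterId, int pageSize, LongPredicate filter);
}

// Fetches one page at a time, only when the caller asks for more; the
// caller "stops" by not calling next() again.
class LedgerIterator implements Iterator<Long> {
    private final PageFetcher fetcher;
    private final int pageSize;
    private final LongPredicate filter;
    private List<Long> page = new ArrayList<>();
    private int pos = 0;
    private long afterId = -1L; // assumes ledger ids are non-negative
    private boolean exhausted = false;

    LedgerIterator(PageFetcher fetcher, int pageSize, LongPredicate filter) {
        if (pageSize <= 0) {
            throw new IllegalArgumentException("pageSize must be positive");
        }
        this.fetcher = fetcher;
        this.pageSize = pageSize;
        this.filter = filter;
    }

    @Override
    public boolean hasNext() {
        if (pos < page.size()) {
            return true;
        }
        if (exhausted) {
            return false;
        }
        page = fetcher.fetch(afterId, pageSize, filter); // one remote call per page
        pos = 0;
        if (page.size() < pageSize) {
            exhausted = true; // a short page means nothing follows it
        }
        if (!page.isEmpty()) {
            afterId = page.get(page.size() - 1);
        }
        return !page.isEmpty();
    }

    @Override
    public Long next() {
        if (!hasNext()) {
            throw new NoSuchElementException();
        }
        return page.get(pos++);
    }
}
```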

-- Enrico


>
> >
> > -- Enrico
>

Re: [DISCUSS] BP-16: remove zookeeper dependency from bookkeeper client

Posted by Enrico Olivelli <eo...@gmail.com>.
Thank you Sijie and Jia for your comments and explanations.
Answers inline.

2017-09-06 2:23 GMT+02:00 Jia Zhai <zh...@gmail.com>:

> Thanks a lot Enrico and Sijie for your comments and information on this.
>
> On Tue, Sep 5, 2017 at 9:31 PM, Enrico Olivelli <eo...@gmail.com>
> wrote:
>
> > Great to see you working on this!
> > It would be great to have such a feature, as it is the first step to a
> > 'standalone' BookKeeper mode
> >
> > Some complementary ideas/first look questions:
> > - the document does not talk about security. IMHO we should at least cover
> > authentication and TLS; it would be great to leverage the existing
> > AuthPlugins, as they are based on exchanging byte[] (as SASL expects)
> >
> [Jia] It is a good idea. We left out the security part for now for two
> reasons: 1) to keep this BP focused on removing zookeeper dependencies from
> the client; 2) it is introduced as a separate implementation of existing
> interfaces, so it won’t impact the existing security story. And for sure, we
> will add the security part after this.
>


I am fine with that. I am only afraid that we won't be able to support it in
the (near) future; maybe you could just mention the security story and add
some reference to how we would deal with it in the future.


>
> > - do we have some kind of "bootstrap servers list" configuration option?
> > Should the list be complete or just a subset of bookies? On connection, the
> > client could discover the list of other bookies
> >
> [Jia] Yes, we will have a `clientBootstrapBookies` setting for the server
> set. It can be a list of bookies or simply a DNS name over the bookies.
> Will add this to the BP
>
> > - will the client connect to only one bookie at a time? How will we deal
> > with errors?
> >
> [Jia] It will connect to the list of bootstrap servers. gRPC will load
> balance the requests and manage the connection errors.
>
> > - should the bookie write its gRPC endpoint info to the ZK metadata? (This
> > would be useful for a bookie to tell the connected clients about other
> > bookies.)
> >
> [Jia] No, it won’t. We don’t see a strong reason to add it, especially since
> we may eventually eliminate zookeeper completely.
> It can be a fixed port `3281`, or in a scheduler-based environment, it is
> very easy to have a load balancer sitting in front of those bookies.
>

I think a fixed port is not a good approach: you would not be able to run
more than one bookie on a single host.

We should support:
- configurable port
- ephemeral port for tests
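A minimal Python sketch of the two options above, assuming a plain TCP listener stands in for the bookie's gRPC endpoint: either a configured port, or port 0 so the OS assigns a free ephemeral port, which lets several bookies (or test instances) share one host. `bind_metadata_listener` is a hypothetical name, not a BookKeeper API.

```python
import socket


def bind_metadata_listener(port: int = 0, host: str = "127.0.0.1") -> socket.socket:
    """Bind a TCP listener (hypothetical stand-in for a bookie's metadata endpoint).

    port > 0 uses the configured port; port == 0 asks the OS for a free
    ephemeral port, so multiple instances can coexist on one host.
    """
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        sock.bind((host, port))
        sock.listen()
    except OSError:
        sock.close()  # e.g. "address already in use" for a taken fixed port
        raise
    return sock


# Two "bookies" in one test process, each on its own OS-assigned port.
first = bind_metadata_listener()
second = bind_metadata_listener()
print("bookie ports:", first.getsockname()[1], second.getsockname()[1])
first.close()
second.close()
```

With a single fixed port, the second bind on the same host fails with `OSError`, which is exactly the limitation described above.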

Ideally I would like to have a local transport option, in order to run
everything in a single JVM, but this is not a blocker. As we are running gRPC
on Netty it should be feasible, or we can create some kind of short-circuit
between the client and the bookie.

I am OK with not writing this to the bookie metadata, leaving it up to the
client to have a configured list of bookies enabled for metadata operations.




>
> > - the bookie will somehow be a proxy for zookeeper. I think that the
> > 'watch' part is the most complex: we will have to deal with reconnections,
> > errors... maybe it is worth writing more detail about this
> >
> [Jia] The `watch` API uses `streaming` RPC in gRPC. It is a
> straightforward proxy behavior: if a connection is broken, the client will
> simply retry watching.
>
>
> > Minor issues:
> > - Maybe you can consider using ledgerId and not ledger_id, as in
> > LedgerMetadataFormat, where we are using lastEntryId
> >
> [Jia] Thanks, it is the protobuf style. Protobuf will convert `ledger_id`
> to `ledgerId`, so we don’t need to worry about this.
>

got it, thanks


>
>
> > - In the "motivation" part you write that having more clients
> > than bookies would be a problem for zookeeper; actually,
> > zookeeper is very good at dealing with a huge number of clients. I
> > am always running clusters with 3-5 bookies and 10-100 writing clients and
> > this has never given any trouble
>
> [Jia] :) It seems “10-100 writing clients” is not “a huge number of clients”.
>

OK, I agree with you and Sijie; I have no experience with larger clusters


>
> >
>
>
>
> > Future:
> > - as bookies will be proxies maybe we should take care not to overwhelm a
> > bookie with too many clients
> >
> [Jia] First, gRPC is based on Netty and uses HTTP/2, so the
> connection is multiplexed. We don’t need to worry about connection count.
> Second, all the bookies are treated equally for metadata operations, and
> gRPC will load-balance the requests across the bookies. We don’t need to
> worry about some bookies being overwhelmed.
>

gRPC sounds great


>
>
> > - iteration on ledgers: sometimes a client enumerates ledgers but is not
> > interested in having all of them. As we are using the bookie as a proxy,
> > maybe some kind of "filter" (at least on custom metadata) could be created
> > to limit the number of returned items. Another point: I don't know gRPC,
> > but it does not seem very clear how to 'stop' the iteration
> >
> [Jia] Thanks, we can add it later. For now, we would like to focus on
> adding the features the ledger manager needs.
>

Yup

-- Enrico



Re: [DISCUSS] BP-16: remove zookeeper dependency from bookkeeper client

Posted by Jia Zhai <zh...@gmail.com>.
Thanks a lot Enrico and Sijie for your comments and information on this.

On Tue, Sep 5, 2017 at 9:31 PM, Enrico Olivelli <eo...@gmail.com> wrote:

> Great to see you working on this!
> It would be great to have such a feature, as it is the first step to a
> 'standalone' BookKeeper mode
>
> Some complementary ideas/first look questions:
> - the document does not talk about security. IMHO we should at least cover
> authentication and TLS; it would be great to leverage the existing
> AuthPlugins, as they are based on exchanging byte[] (as SASL expects)
>
[Jia] It is a good idea. We left out the security part for now for two
reasons: 1) to keep this BP focused on removing zookeeper dependencies from
the client; 2) it is introduced as a separate implementation of existing
interfaces, so it won’t impact the existing security story. And for sure, we
will add the security part after this.

> - do we have some kind of "bootstrap servers list" configuration option?
> Should the list be complete or just a subset of bookies? On connection, the
> client could discover the list of other bookies
>
[Jia] Yes, we will have a `clientBootstrapBookies` setting for the server
set. It can be a list of bookies or simply a DNS name over the bookies.
Will add this to the BP
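A rough Python sketch of how a `clientBootstrapBookies` value might be interpreted, assuming a comma-separated `host[:port]` format in which a single DNS name is just a one-entry list and a missing port falls back to the `3281` mentioned later in this thread. The value format and the function name are assumptions for illustration; the thread does not define them.

```python
from typing import List, Tuple

DEFAULT_PORT = 3281  # the fixed port mentioned in this thread; assumed default here


def parse_bootstrap_bookies(value: str,
                            default_port: int = DEFAULT_PORT) -> List[Tuple[str, int]]:
    """Parse a hypothetical 'host[:port],host[:port]' setting into endpoints.

    A single DNS name is simply a one-entry list. IPv6 literals are not handled.
    """
    endpoints = []
    for item in value.split(","):
        item = item.strip()
        if not item:
            continue  # tolerate stray or trailing commas
        host, sep, port = item.rpartition(":")
        if sep:
            endpoints.append((host, int(port)))
        else:
            endpoints.append((item, default_port))
    return endpoints


print(parse_bootstrap_bookies("bk1:3181, bk2:3182"))
print(parse_bootstrap_bookies("bookies.example.com"))
```

Treating a DNS name as a one-entry list keeps both cases on the same code path; resolving that name to several bookies would be left to the resolver.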

> - will the client connect to only one bookie at a time? How will we deal
> with errors?
>
[Jia] It will connect to the list of bootstrap servers. gRPC will load
balance the requests and manage the connection errors.
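gRPC manages connections internally, so the following is only a hand-rolled Python illustration of the behavior described above (try the bootstrap servers and move on when one is unreachable), not gRPC's API. `call_with_failover`, `AllEndpointsFailed` and `fake_rpc` are hypothetical names.

```python
from typing import Callable, List, Sequence, TypeVar

T = TypeVar("T")


class AllEndpointsFailed(Exception):
    """Raised when no bootstrap endpoint could serve the request."""


def call_with_failover(endpoints: Sequence[str], rpc: Callable[[str], T]) -> T:
    """Try each endpoint in order and return the first successful result."""
    errors: List[Exception] = []
    for endpoint in endpoints:
        try:
            return rpc(endpoint)
        except ConnectionError as exc:  # only connection failures trigger failover
            errors.append(exc)
    raise AllEndpointsFailed(f"all {len(endpoints)} endpoints failed: {errors}")


def fake_rpc(endpoint: str) -> str:
    """Simulated metadata call where the first bookie is down."""
    if endpoint == "bk1:3181":
        raise ConnectionError(f"{endpoint} unreachable")
    return f"metadata from {endpoint}"


print(call_with_failover(["bk1:3181", "bk2:3181"], fake_rpc))
```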

> - should the bookie write its gRPC endpoint info to the ZK metadata? (This
> would be useful for a bookie to tell the connected clients about other
> bookies.)
>
[Jia] No, it won’t. We don’t see a strong reason to add it, especially since
we may eventually eliminate zookeeper completely.
It can be a fixed port `3281`, or in a scheduler-based environment, it is
very easy to have a load balancer sitting in front of those bookies.

> - the bookie will somehow be a proxy for zookeeper. I think that the
> 'watch' part is the most complex: we will have to deal with reconnections,
> errors... maybe it is worth writing more detail about this
>
[Jia] The `watch` API uses `streaming` RPC in gRPC. It is a
straightforward proxy behavior: if a connection is broken, the client will
simply retry watching.
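A minimal Python sketch of "retry watching" on a broken stream, assuming the stream is an iterator that raises `ConnectionError` when the connection drops. It re-opens the watch with exponential backoff; `watch_with_retry` and the scripted `open_stream` are hypothetical, not part of gRPC or BookKeeper.

```python
import time
from typing import Callable, Iterator


def watch_with_retry(open_stream: Callable[[], Iterator[str]],
                     on_event: Callable[[str], None],
                     max_attempts: int = 5,
                     base_delay: float = 0.01) -> None:
    """Consume a watch stream, re-opening it with backoff when it breaks."""
    attempt = 0
    while attempt < max_attempts:
        try:
            for event in open_stream():
                on_event(event)
                attempt = 0  # a healthy stream resets the backoff counter
            return  # the server ended the stream cleanly
        except ConnectionError:
            attempt += 1
            time.sleep(base_delay * 2 ** (attempt - 1))  # 0.01s, 0.02s, 0.04s, ...
    raise ConnectionError(f"watch failed {max_attempts} times in a row")


# Simulated server: the first stream breaks after one event, the second completes.
_scripted_streams = iter([["v1", "BROKEN"], ["v2"]])


def open_stream() -> Iterator[str]:
    for event in next(_scripted_streams):
        if event == "BROKEN":
            raise ConnectionError("stream reset")
        yield event


seen = []
watch_with_retry(open_stream, seen.append)
print(seen)
```

One design point: changes made while the client is disconnected are not delivered on the old stream, so a real client would presumably re-read the current metadata after re-establishing the watch.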


> Minor issues:
> - Maybe you can consider using ledgerId and not ledger_id, as in
> LedgerMetadataFormat, where we are using lastEntryId
>
[Jia] Thanks, it is the protobuf style. Protobuf will convert `ledger_id`
to `ledgerId`, so we don’t need to worry about this.
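For reference, protoc derives lowerCamelCase names from snake_case field names: `ledger_id` gets the JSON name `ledgerId` and the generated Java accessor `getLedgerId()`. The Python sketch below approximates that rule for simple field names; it is an illustration, not protoc's actual implementation.

```python
def proto_json_name(field_name: str) -> str:
    """Approximate protoc's default JSON name: drop '_' and upper-case the next letter."""
    parts = []
    upper_next = False
    for ch in field_name:
        if ch == "_":
            upper_next = True
        elif upper_next:
            parts.append(ch.upper())
            upper_next = False
        else:
            parts.append(ch)
    return "".join(parts)


def java_getter(field_name: str) -> str:
    """Name of the generated Java accessor, e.g. ledger_id -> getLedgerId."""
    camel = proto_json_name(field_name)
    return "get" + camel[:1].upper() + camel[1:]


for name in ("ledger_id", "last_entry_id"):
    print(name, "->", proto_json_name(name), java_getter(name))
```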


> - In the "motivation" part you write that having more clients
> than bookies would be a problem for zookeeper; actually,
> zookeeper is very good at dealing with a huge number of clients. I
> am always running clusters with 3-5 bookies and 10-100 writing clients and
> this has never given any trouble

[Jia] :) It seems “10-100 writing clients” is not “a huge number of clients”.

>



> Future:
> - as bookies will be proxies maybe we should take care not to overwhelm a
> bookie with too many clients
>
[Jia] First, gRPC is based on Netty and uses HTTP/2, so the
connection is multiplexed. We don’t need to worry about connection count.
Second, all the bookies are treated equally for metadata operations, and
gRPC will load-balance the requests across the bookies. We don’t need to
worry about some bookies being overwhelmed.
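Spreading calls across bookies depends on the channel's load-balancing policy: gRPC's default `pick_first` sends all calls to a single backend, while the `round_robin` policy rotates across the resolved addresses. A tiny Python model of round-robin selection, assuming three hypothetical bookie addresses:

```python
import itertools
from collections import Counter
from typing import Iterator, Sequence


def round_robin(endpoints: Sequence[str]) -> Iterator[str]:
    """Yield endpoints in rotation, as a round-robin policy would pick them."""
    return itertools.cycle(endpoints)


picker = round_robin(["bk1:3181", "bk2:3181", "bk3:3181"])
load = Counter(next(picker) for _ in range(9))  # nine metadata calls
print(load)
```

Each bookie receives the same share of calls, which is the property the answer above relies on when it says no single bookie should be overwhelmed.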


> - iteration on ledgers: sometimes a client enumerates ledgers but is not
> interested in having all of them. As we are using the bookie as a proxy,
> maybe some kind of "filter" (at least on custom metadata) could be created
> to limit the number of returned items. Another point: I don't know gRPC,
> but it does not seem very clear how to 'stop' the iteration
>
[Jia] Thanks, we can add it later. For now, we would like to focus on
adding the features the ledger manager needs.
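This is not part of the BP (as noted, it can be added later), but here is one possible shape for filtered iteration with an early stop: the server applies the filter and returns pages with a continuation token, and the client pulls pages lazily, so breaking out of the loop stops further requests. A Python sketch with hypothetical `list_page`/`iter_ledgers` functions over an in-memory ledger range 0..99:

```python
from typing import Callable, Iterator, List, Optional, Tuple

# (ledger ids on this page, token for the next page or None when done)
Page = Tuple[List[int], Optional[int]]
ListPage = Callable[[Optional[int], int, Callable[[int], bool]], Page]

TOTAL_LEDGERS = 100  # hypothetical in-memory "ledger store": ids 0..99


def list_page(token: Optional[int], page_size: int,
              predicate: Callable[[int], bool]) -> Page:
    """Hypothetical server side: filter ledgers and return at most one page."""
    ids: List[int] = []
    i = token or 0
    while i < TOTAL_LEDGERS and len(ids) < page_size:
        if predicate(i):
            ids.append(i)
        i += 1
    return ids, (i if i < TOTAL_LEDGERS else None)


def iter_ledgers(fetch: ListPage, page_size: int,
                 predicate: Callable[[int], bool]) -> Iterator[int]:
    """Client side: pull pages lazily until the server reports no more."""
    token: Optional[int] = None
    while True:
        ids, token = fetch(token, page_size, predicate)
        yield from ids
        if token is None:
            return


# Count server round-trips to show that breaking out stops further fetches.
requests: List[Optional[int]] = []


def counting_fetch(token, page_size, predicate):
    requests.append(token)
    return list_page(token, page_size, predicate)


first_three = []
for ledger_id in iter_ledgers(counting_fetch, 2, lambda i: i % 2 == 0):
    first_three.append(ledger_id)
    if len(first_three) == 3:
        break  # stop early: no further pages are requested

print(first_three, "after", len(requests), "requests")
```

With a gRPC server-streaming call instead of explicit pages, the equivalent of `break` would be the client cancelling the call.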


Re: [DISCUSS] BP-16: remove zookeeper dependency from bookkeeper client

Posted by Enrico Olivelli <eo...@gmail.com>.
Great to see you working on this!
It would be great to have such a feature, as it is the first step to a
'standalone' BookKeeper mode

Some complementary ideas/first look questions:
- the document does not talk about security. IMHO we should at least cover
authentication and TLS; it would be great to leverage the existing
AuthPlugins, as they are based on exchanging byte[] (as SASL expects)
- do we have some kind of "bootstrap servers list" configuration option?
Should the list be complete or just a subset of bookies? On connection, the
client could discover the list of other bookies
- will the client connect to only one bookie at a time? How will we deal
with errors?
- should the bookie write its gRPC endpoint info to the ZK metadata? (This
would be useful for a bookie to tell the connected clients about other
bookies.)
- the bookie will somehow be a proxy for zookeeper. I think that the
'watch' part is the most complex: we will have to deal with reconnections,
errors... maybe it is worth writing more detail about this

Minor issues:
- Maybe you can consider using ledgerId and not ledger_id, as in
LedgerMetadataFormat, where we are using lastEntryId
- In the "motivation" part you write that having more clients
than bookies would be a problem for zookeeper; actually,
zookeeper is very good at dealing with a huge number of clients. I
am always running clusters with 3-5 bookies and 10-100 writing clients and
this has never given any trouble

Future:
- as bookies will be proxies maybe we should take care not to overwhelm a
bookie with too many clients
- iteration on ledgers: sometimes a client enumerates ledgers but is not
interested in having all of them. As we are using the bookie as a proxy,
maybe some kind of "filter" (at least on custom metadata) could be created
to limit the number of returned items. Another point: I don't know gRPC,
but it does not seem very clear how to 'stop' the iteration

-- Enrico

