Posted to dev@bookkeeper.apache.org by Enrico Olivelli <eo...@gmail.com> on 2017/06/19 19:58:41 UTC
Re: Bookie labels and Placement policy

On Wed, May 24, 2017, 21:49 Sijie Guo <gu...@gmail.com> wrote:

> On Tue, May 16, 2017 at 11:39 PM, Venkateswara Rao Jujjuri
> <jujjuri@gmail.com> wrote:
>
> > We have this use case too. I believe introducing "pools" is the right
> > approach for this.
> > Pools are a very high-level abstraction: pools are treated simply as
> > different clusters, but wrapped into one.
> >
> > Some of the high level thoughts:
> >
> > * Pools are a top-level abstraction.
> > * A pool is assigned at the time of ledger creation (based on some
> > criteria at the client).
> > * Ensemble changes and replication happen only within that pool of
> > bookies.
> > * Stats and storage capacity are tracked at the pool level.
> > * Capacity is added (capadd) to a particular pool.
> > * Each pool of bookies may run with a different server configuration.
> > * Client configuration should accommodate pools too, with different
> > configuration values for different pools.
> >
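A minimal sketch of the pool semantics listed above, assuming pools are disjoint sets of bookie addresses kept in memory. The class `PoolScopedPlacement` and its methods are hypothetical illustrations, not BookKeeper APIs. It shows the two rules that matter for placement: the pool is fixed when the ledger is created, and an ensemble change only draws a replacement from that same pool. Real placement policies also randomize and consider racks; this sketch just takes bookies in registration order.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: pools as disjoint bookie sets. Not a BookKeeper API.
public class PoolScopedPlacement {
    private final Map<String, List<String>> bookiesByPool = new HashMap<>();
    private final Map<Long, String> poolByLedger = new HashMap<>();

    public void addBookie(String pool, String bookie) {
        bookiesByPool.computeIfAbsent(pool, p -> new ArrayList<>()).add(bookie);
    }

    // The pool is chosen once, at ledger creation, and remembered for the ledger's lifetime.
    public List<String> createLedger(long ledgerId, String pool, int ensembleSize) {
        List<String> members = bookiesByPool.getOrDefault(pool, new ArrayList<>());
        if (members.size() < ensembleSize) {
            throw new IllegalStateException("pool " + pool + " has only " + members.size() + " bookies");
        }
        poolByLedger.put(ledgerId, pool);
        // Simplification: take the first bookies in registration order
        return new ArrayList<>(members.subList(0, ensembleSize));
    }

    // Ensemble change: a replacement is only ever drawn from the ledger's own pool.
    public String replaceBookie(long ledgerId, List<String> ensemble, String failed) {
        String pool = poolByLedger.get(ledgerId);
        if (pool == null) {
            throw new IllegalArgumentException("unknown ledger " + ledgerId);
        }
        for (String candidate : bookiesByPool.get(pool)) {
            if (!candidate.equals(failed) && !ensemble.contains(candidate)) {
                return candidate;
            }
        }
        throw new IllegalStateException("no spare bookie left in pool " + pool);
    }

    public static void main(String[] args) {
        PoolScopedPlacement placement = new PoolScopedPlacement();
        placement.addBookie("wal-pool", "b1");
        placement.addBookie("wal-pool", "b2");
        placement.addBookie("wal-pool", "b3");
        placement.addBookie("storage-pool", "s1");
        List<String> ensemble = placement.createLedger(42L, "wal-pool", 2);
        System.out.println(ensemble);                                     // [b1, b2]
        System.out.println(placement.replaceBookie(42L, ensemble, "b1")); // b3, never s1
    }
}
```

Because a ledger never leaves its pool, per-pool stats and capacity tracking fall out naturally: everything a ledger touches is inside one pool.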
>
> JV, can you provide more details about the pool idea? Have you
> considered using pools for 128-bit ledger ids?
>

Any news?

Enrico


> - Sijie
>
>
>
>
> >
> > JV
> >
> > On Tue, May 16, 2017 at 7:49 AM, Bobby Evans <evans@yahoo-inc.com.invalid>
> > wrote:
> >
> > > OK, so I am not keen on the idea of labels, probably because when I
> > > have seen it done in the past (YARN) it just felt like a hack that was
> > > trying to avoid fixing the real underlying problem. YARN wanted to
> > > schedule for arbitrary resources, but that is hard, so they went with
> > > node labels instead. Node labels have evolved in YARN and are now used
> > > for partitioning a cluster for isolation as well (although really that
> > > is because network scheduling/isolation is hard).
> > >
> > > Now that I am done with my YARN node label rant, I want to add that
> > > HBase put in an option for isolating table groups from each other on
> > > different region servers that has worked really well for a
> > > multi-tenant setup, so I am not completely opposed to the idea; I just
> > > want to be sure we do it right.
> > >
> > > In my opinion, if this is a feature to isolate different groups from
> > > each other so that one bad actor cannot impact everyone else, I would
> > > prefer to see quotas for clients and/or users, plus nodes reporting
> > > their capabilities and current usage, instead. If you want some kind
> > > of affinity because you bought hardware to handle longer-term vs.
> > > shorter-term storage, then I would prefer to see that called out
> > > explicitly when the ledger is created instead of having arbitrary
> > > labels. That way a long-lived ledger could be placed on a node with
> > > lots of free capacity, and short-lived ledgers can go anywhere. A
> > > client could set it when creating a ledger, with a default in the
> > > config if it is not specified.
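A minimal sketch of the explicit-hint alternative described above, assuming each bookie reports its free capacity and the client passes an optional retention hint at ledger creation. `RetentionHintPlacement`, `Retention`, and `DEFAULT_RETENTION` are hypothetical names (not BookKeeper APIs); `DEFAULT_RETENTION` stands in for the config default. Long-lived ledgers go to the bookies with the most free space; short-lived ones may go anywhere, and this sketch simply uses name order to stay deterministic.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of an explicit retention hint; none of these names are BookKeeper APIs.
public class RetentionHintPlacement {
    public enum Retention { SHORT_LIVED, LONG_LIVED }

    // Stand-in for a client configuration default, used when the caller passes no hint
    public static final Retention DEFAULT_RETENTION = Retention.SHORT_LIVED;

    // freeBytesByBookie: free capacity reported by each bookie; hint may be null
    public static List<String> choose(Map<String, Long> freeBytesByBookie, Retention hint, int ensembleSize) {
        Retention retention = (hint != null) ? hint : DEFAULT_RETENTION;
        List<String> bookies = new ArrayList<>(freeBytesByBookie.keySet());
        if (retention == Retention.LONG_LIVED) {
            // Long-lived ledgers prefer the bookies with the most free space (ties broken by name)
            Comparator<String> byFreeDesc =
                (a, b) -> Long.compare(freeBytesByBookie.get(b), freeBytesByBookie.get(a));
            bookies.sort(byFreeDesc.thenComparing(Comparator.naturalOrder()));
        } else {
            // Short-lived ledgers can go anywhere; name order keeps this sketch deterministic
            Collections.sort(bookies);
        }
        if (bookies.size() < ensembleSize) {
            throw new IllegalStateException("not enough bookies for ensemble of " + ensembleSize);
        }
        return new ArrayList<>(bookies.subList(0, ensembleSize));
    }

    public static void main(String[] args) {
        Map<String, Long> free = Map.of("b1", 100L, "b2", 900L, "b3", 500L);
        System.out.println(choose(free, Retention.LONG_LIVED, 2)); // [b2, b3]
        System.out.println(choose(free, null, 2));                 // [b1, b2] via the default
    }
}
```

The design point is that placement is driven by a declared property of the ledger plus live bookie capacity, rather than by operator-assigned labels.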
> > >
> > > If we do go with labels, I want to be sure that we stress that users
> > > should keep their matching rules as simple as possible.
> > > Hard partitioning of a cluster on labels gives users many ways to
> > > shoot themselves in the foot without noticing it.
> > > They need to make sure that they have ways to easily monitor bookies
> > > grouped in the same way their client rules group them. They need to
> > > make sure that when doing a rolling upgrade they take the client rules
> > > into account when deciding which bookies to take out and upgrade, to
> > > avoid making a group of clients completely unusable.
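A minimal sketch of the rolling-upgrade concern above, assuming client rules are "required label set plus ensemble size" and bookie labels are known in memory. `UpgradeSafetyCheck` and its methods are hypothetical, not part of BookKeeper. Before taking a bookie down, it checks that every rule still has at least its ensemble size of matching, available bookies. The per-rule `available` count is also the kind of number one might export for monitoring bookies grouped the same way as the client rules.

```java
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical pre-upgrade check; names are illustrative, not part of BookKeeper.
public class UpgradeSafetyCheck {

    // Counts bookies that are not down and carry all labels required by a rule.
    static int available(Map<String, Set<String>> labelsByBookie, Set<String> down, Set<String> rule) {
        int n = 0;
        for (Map.Entry<String, Set<String>> e : labelsByBookie.entrySet()) {
            if (!down.contains(e.getKey()) && e.getValue().containsAll(rule)) {
                n++;
            }
        }
        return n;
    }

    // True if taking 'candidate' offline still leaves every client rule
    // with at least its ensemble size of matching bookies.
    public static boolean safeToTakeDown(Map<String, Set<String>> labelsByBookie,
                                         Set<String> alreadyDown, String candidate,
                                         Map<Set<String>, Integer> ensembleSizeByRule) {
        Set<String> down = new HashSet<>(alreadyDown);
        down.add(candidate);
        for (Map.Entry<Set<String>, Integer> rule : ensembleSizeByRule.entrySet()) {
            if (available(labelsByBookie, down, rule.getKey()) < rule.getValue()) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        Map<String, Set<String>> cluster = Map.of(
            "b1", Set.of("wals"), "b2", Set.of("wals"), "b3", Set.of("wals"),
            "b4", Set.of("long-term"), "b5", Set.of("long-term"));
        Map<Set<String>, Integer> rules = Map.of(Set.of("wals"), 2, Set.of("long-term"), 2);
        System.out.println(safeToTakeDown(cluster, Set.of(), "b1", rules)); // true: 2 "wals" bookies remain
        System.out.println(safeToTakeDown(cluster, Set.of(), "b4", rules)); // false: only 1 "long-term" left
    }
}
```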
> > >
> > > - Bobby
> > >
> > > On Tuesday, May 16, 2017, 6:05:21 AM CDT, Enrico Olivelli
> > > <eolivelli@gmail.com> wrote:
> > > Hi bookkeepers,
> > > I'm using BookKeeper for several projects; every project has its own
> > > workload characteristics, and I would like to be able to assign
> > > bookies depending on the client type. It is quite common to share a
> > > BookKeeper cluster between different applications.
> > >
> > > For instance, I am using bookies to store database logs and task
> > > broker logs, and recently I have started to use BookKeeper as data
> > > storage.
> > >
> > > Within the cluster I would like to use specific bookies for mid-term
> > > storage, other bookies for logs, and so on, but the current placement
> > > policies are not able to "distinguish" bookies.
> > >
> > > Currently I can achieve my goal by using a custom policy + custom
> > > metadata + out-of-band bookie metadata.
> > >
> > > As a first step, following the work on "Resource aware data
> > > placement" [1], I would like to introduce a list of "labels" to be
> > > assigned to every bookie.
> > >
> > > For instance, bookies for long-term storage will have the label
> > > "long-term", and bookies for transaction logs may have the label "wals".
> > >
> > > Another use case is to be able to request BookKeeper to write ledger
> > > data on specific sets of bookies depending on the "customer" who owns
> > > the data (my customers are already grouped by labels/tags).
> > >
> > > I would like to have a simple "standard" policy which uses some
> > > "standard" metadata to select bookies.
> > >
> > > Things to add:
> > > - a set of "labels" configurable for bookies
> > > - enrich the API (getBookieInfo) to query for labels, and have the
> > > BookKeeper client keep a local cache of label-to-bookie assignments
> > > - add a standard "custom metadata field" which is a list of labels to
> > > use to select bookies; a bookie would be used only if it currently
> > > "has" all of the requested labels
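A minimal sketch of the matching rule proposed above ("a bookie is used only if it has all of the requested labels"), assuming the client already holds a label-to-bookie map in memory, as the proposed local cache would. `LabelMatchingSelector` and its methods are hypothetical illustrations, not the getBookieInfo API or any existing placement policy. A real policy would also shuffle candidates and respect rack awareness; this sketch picks eligible bookies in address order so the result is predictable.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

// Hypothetical sketch of a label-matching bookie filter. Not a BookKeeper API.
public class LabelMatchingSelector {

    // Returns bookies (sorted by address) that carry ALL requested labels.
    // An empty request matches every bookie.
    public static List<String> eligibleBookies(Map<String, Set<String>> labelsByBookie,
                                               Set<String> requiredLabels) {
        List<String> eligible = new ArrayList<>();
        // TreeMap gives a deterministic iteration order for this sketch
        for (Map.Entry<String, Set<String>> e : new TreeMap<>(labelsByBookie).entrySet()) {
            if (e.getValue().containsAll(requiredLabels)) {
                eligible.add(e.getKey());
            }
        }
        return eligible;
    }

    // Picks the first ensembleSize eligible bookies, or fails if too few match.
    public static List<String> newEnsemble(Map<String, Set<String>> labelsByBookie,
                                           Set<String> requiredLabels, int ensembleSize) {
        List<String> eligible = eligibleBookies(labelsByBookie, requiredLabels);
        if (eligible.size() < ensembleSize) {
            throw new IllegalStateException("Only " + eligible.size() + " bookies match "
                + requiredLabels + ", need " + ensembleSize);
        }
        return new ArrayList<>(eligible.subList(0, ensembleSize));
    }

    public static void main(String[] args) {
        Map<String, Set<String>> cluster = Map.of(
            "bookie1:3181", Set.of("wals"),
            "bookie2:3181", Set.of("wals", "long-term"),
            "bookie3:3181", Set.of("long-term"),
            "bookie4:3181", Set.of("long-term", "customer-a"));
        // [bookie2:3181, bookie3:3181, bookie4:3181]
        System.out.println(eligibleBookies(cluster, Set.of("long-term")));
        // [bookie1:3181, bookie2:3181]
        System.out.println(newEnsemble(cluster, Set.of("wals"), 2));
    }
}
```

Requiring all labels (set containment) keeps the rule simple and lets a per-customer label such as "customer-a" be combined with a workload label such as "long-term".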
> > >
> > >
> > > [1] https://cwiki.apache.org/confluence/display/BOOKKEEPER/BP-2+-+Resource+aware+data+placement
> > >
> > > All comments are welcome
> > >
> > > -- Enrico
> > >
> >
> >
> >
> > --
> > Jvrao
> > ---
> > First they ignore you, then they laugh at you, then they fight you, then
> > you win. - Mahatma Gandhi
> >
>
-- 


-- Enrico Olivelli