Posted to dev@bookkeeper.apache.org by Sijie Guo <si...@apache.org> on 2016/12/10 09:45:59 UTC

Re: [DISCUSS][BP-2] Resource aware data placement

As a reminder, we will be discussing the resource aware placement at the
meeting next week.

Rithin, do you mind updating the proposal by next Tue, so we have time to
check it before the meeting?

- Sijie

On Tue, Nov 1, 2016 at 2:29 AM, Sijie Guo <si...@apache.org> wrote:

> https://cwiki.apache.org/confluence/display/BOOKKEEPER/BP-2+
> -+Resource+aware+data+placement
>
> I created a wiki page for place holder. Anyone that is interested in this
> topic can help fill a basic proposal and drive the discussion?
>
> *Problem*
>
> There are bunch of tickets around talking about placement policy :
> - handle different types of storage (tier storage)
> - handle storage node having different type of resources (resource aware)
> - better to canary/bucket-testing new version (manageability)
>
> It would be good to consolidate these thoughts to provide a common
> framework/solution to meet those basic requirements.
>
> *Proposal*
>
> [TBD]
>
> - Sijie
>
>

Re: [DISCUSS][BP-2] Resource aware data placement

Posted by Sijie Guo <si...@apache.org>.
Just a reminder to the community,

We will be discussing BP-2 on Thursday (12/15) 8AM Pacific Time.

The meeting link is https://goo.gl/6UZR1w

Please take a look at this proposal before the meeting.

- Sijie

On Mon, Dec 12, 2016 at 11:09 AM, Rithin Shetty <ri...@gmail.com> wrote:

> Hi Sijie,
>
>     I've updated the proposal page now:
> https://cwiki.apache.org/confluence/display/BOOKKEEPER/
> BP-2+-+Resource+aware+data+placement.
> Sorry about the delay.
>
> Thanks,
> --Rithin
>
> On Sat, Dec 10, 2016 at 1:45 AM, Sijie Guo <si...@apache.org> wrote:
>
> > As a reminder, we will be discussing the resource aware placement at the
> > meeting next week.
> >
> > Rithin, do you mind updating the proposal by next Tue? so we have time to
> > check before the meeting.
> >
> > - Sijie
> >
> > On Tue, Nov 1, 2016 at 2:29 AM, Sijie Guo <si...@apache.org> wrote:
> >
> > > https://cwiki.apache.org/confluence/display/BOOKKEEPER/BP-2+
> > > -+Resource+aware+data+placement
> > >
> > > I created a wiki page for place holder. Anyone that is interested in
> this
> > > topic can help fill a basic proposal and drive the discussion?
> > >
> > > *Problem*
> > >
> > > There are bunch of tickets around talking about placement policy :
> > > - handle different types of storage (tier storage)
> > > - handle storage node having different type of resources (resource
> aware)
> > > - better to canary/bucket-testing new version (manageability)
> > >
> > > It would be good to consolidate these thoughts to provide a common
> > > framework/solution to meet those basic requirements.
> > >
> > > *Proposal*
> > >
> > > [TBD]
> > >
> > > - Sijie
> > >
> > >
> >
>

Re: [DISCUSS][BP-2] Resource aware data placement

Posted by Sijie Guo <si...@apache.org>.
+ Leigh Stewart (for the connection concern on current proposal)

Rithin,

Leigh, Yiming and I had a brief discussion at lunch. Leigh raised the idea
of using the P2C (power of two choices) load balancing algorithm [1], so that
clients can avoid talking to all bookies. The idea has been implemented and
described in Finagle load balancing [2]. It would be good to discuss whether
we can use this approach to avoid pulling information from all bookies.

Leigh, if you can, could you describe the idea of using P2C load balancing
in more detail?
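
For readers unfamiliar with P2C, here is a minimal sketch of the selection
step, assuming a per-candidate load value is available from some callback.
The class name P2CSelector and the load function are hypothetical; this is
not BookKeeper or Finagle code, only an illustration of the idea in [1].

```java
import java.util.List;
import java.util.Random;
import java.util.function.ToLongFunction;

// Hypothetical sketch of "power of two choices" selection; not BookKeeper code.
final class P2CSelector {
    private final Random random;

    P2CSelector(Random random) {
        this.random = random;
    }

    // Samples two distinct candidates uniformly at random and returns the one
    // with the lower load. Only those two candidates have their load queried.
    <T> T pick(List<T> candidates, ToLongFunction<T> load) {
        if (candidates.isEmpty()) {
            throw new IllegalArgumentException("no candidates");
        }
        if (candidates.size() == 1) {
            return candidates.get(0);
        }
        int i = random.nextInt(candidates.size());
        // Draw the second index from the remaining size-1 slots so that j != i.
        int j = random.nextInt(candidates.size() - 1);
        if (j >= i) {
            j++;
        }
        T a = candidates.get(i);
        T b = candidates.get(j);
        return load.applyAsLong(a) <= load.applyAsLong(b) ? a : b;
    }
}
```

The point relevant to this thread is that each decision inspects only two
candidates, so a client would not need fresh information from every bookie
before choosing one.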

[1] Michael Mitzenmacher. 2001. The Power of Two Choices in Randomized Load
Balancing. IEEE Trans. Parallel Distrib.
[2] Finagle Load Balancing.
https://twitter.github.io/finagle/guide/Clients.html#power-of-two-choices-p2c-least-loaded

- Sijie


On Tue, Dec 13, 2016 at 7:12 PM, Sijie Guo <si...@apache.org> wrote:

> Hi Rithin,
>
> The proposal looks good to me. I have a few comments:
>
> - How will the BookieInfo structure look like?
> - I like the idea of a general GET_BOOKIE_INFO request. Do you have any
> proposed wire protocol changes?
> - I assume that you might need the clients to configure a function to
> compute the weight for a bookie, right? So we can also compute the weight
> based on network bandwidth?
> - How does bookie collect the resource information?
>
> - Sijie
>
>
> On Mon, Dec 12, 2016 at 11:09 AM, Rithin Shetty <ri...@gmail.com> wrote:
>
>> Hi Sijie,
>>
>>     I've updated the proposal page now:
>> https://cwiki.apache.org/confluence/display/BOOKKEEPER/BP-2+
>> -+Resource+aware+data+placement.
>> Sorry about the delay.
>>
>> Thanks,
>> --Rithin
>>
>> On Sat, Dec 10, 2016 at 1:45 AM, Sijie Guo <si...@apache.org> wrote:
>>
>> > As a reminder, we will be discussing the resource aware placement at the
>> > meeting next week.
>> >
>> > Rithin, do you mind updating the proposal by next Tue? so we have time
>> to
>> > check before the meeting.
>> >
>> > - Sijie
>> >
>> > On Tue, Nov 1, 2016 at 2:29 AM, Sijie Guo <si...@apache.org> wrote:
>> >
>> > > https://cwiki.apache.org/confluence/display/BOOKKEEPER/BP-2+
>> > > -+Resource+aware+data+placement
>> > >
>> > > I created a wiki page for place holder. Anyone that is interested in
>> this
>> > > topic can help fill a basic proposal and drive the discussion?
>> > >
>> > > *Problem*
>> > >
>> > > There are bunch of tickets around talking about placement policy :
>> > > - handle different types of storage (tier storage)
>> > > - handle storage node having different type of resources (resource
>> aware)
>> > > - better to canary/bucket-testing new version (manageability)
>> > >
>> > > It would be good to consolidate these thoughts to provide a common
>> > > framework/solution to meet those basic requirements.
>> > >
>> > > *Proposal*
>> > >
>> > > [TBD]
>> > >
>> > > - Sijie
>> > >
>> > >
>> >
>>
>
>

Re: [DISCUSS][BP-2] Resource aware data placement

Posted by Rithin Shetty <ri...@gmail.com>.
Hi Sijie,

    See my responses below:

On Tue, Dec 13, 2016 at 7:12 PM, Sijie Guo <si...@apache.org> wrote:

> Hi Rithin,
>
> The proposal looks good to me. I have a few comments:
>
> - How will the BookieInfo structure look like?
>

I've updated the wiki with this info now. It looks like this:


> - I like the idea of a general GET_BOOKIE_INFO request. Do you have any
> proposed wire protocol changes?
>

message GetBookieInfoRequest {
    enum Flags {
        TOTAL_DISK_CAPACITY = 0x01;
        FREE_DISK_SPACE = 0x02;
    }
    // bitwise OR of Flags
    optional int64 requested = 1;
}

message GetBookieInfoResponse {
    required StatusCode status = 1;
    optional int64 totalDiskCapacity = 2;
    optional int64 freeDiskSpace = 3;
}
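
To illustrate how the 'requested' bitmask might be built and read, here is a
small sketch. The constants mirror the Flags enum above, but the class
BookieInfoFlags and its helper methods are hypothetical; real code would
presumably go through the classes generated by the protobuf compiler.

```java
// Hypothetical constants mirroring the Flags enum in the message above;
// real code would use the classes generated by the protobuf compiler.
final class BookieInfoFlags {
    static final long TOTAL_DISK_CAPACITY = 0x01;
    static final long FREE_DISK_SPACE = 0x02;

    private BookieInfoFlags() {}

    // Combines individual flags into the value carried by 'requested'.
    static long request(long... flags) {
        long requested = 0;
        for (long flag : flags) {
            requested |= flag;
        }
        return requested;
    }

    // Returns true if the given flag is set in 'requested'.
    static boolean isRequested(long requested, long flag) {
        return (requested & flag) != 0;
    }
}
```

Because each flag occupies its own bit, a server that does not recognize a
newer bit can simply leave the corresponding optional response field unset.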

> - I assume that you might need the clients to configure a function to
> compute the weight for a bookie, right? So we can also compute the weight
> based on network bandwidth?
>

I've implemented a separate class called 'WeightedRandomSelection', which
is given a long value and uses it as the basis for weight-based selection.
Please see the details in the pull request I created just now:
https://github.com/apache/bookkeeper/pull/93
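
As a rough illustration of weight-based selection from a long value per
bookie, here is a minimal sketch. It is not the code in the pull request;
the class WeightedPickSketch and its pick method are hypothetical, and the
weight is assumed to be something like free disk space in bytes.

```java
import java.util.Map;
import java.util.Random;

// Hypothetical sketch; not the WeightedRandomSelection class from the PR.
final class WeightedPickSketch {
    private WeightedPickSketch() {}

    // Picks a key with probability proportional to its weight.
    static <T> T pick(Map<T, Long> weights, Random random) {
        long total = 0;
        for (long w : weights.values()) {
            if (w < 0) {
                throw new IllegalArgumentException("negative weight");
            }
            total += w;
        }
        if (total <= 0) {
            throw new IllegalArgumentException("total weight must be positive");
        }
        // Uniform point in [0, total); walk the entries until it is covered.
        long point = (long) (random.nextDouble() * total);
        for (Map.Entry<T, Long> e : weights.entrySet()) {
            long w = e.getValue();
            if (point < w) {
                return e.getKey();
            }
            point -= w;
        }
        throw new IllegalStateException("unreachable when total > 0");
    }
}
```

With free disk space as the weight, a bookie with twice the free space would
be chosen about twice as often, and a bookie with weight 0 would never be
chosen.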


> - How does bookie collect the resource information?
>

The bookie just goes through all the ledger directories and retrieves the
free disk space and total disk space. Those are the only two values it
collects right now, and they are collected on demand.
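
A minimal sketch of what such on-demand collection could look like, using
only java.io.File. The class DiskInfoSketch is hypothetical, and summing
over directories is an assumption: directories that share a partition would
be counted more than once here, and this thread does not say how that case
is handled.

```java
import java.io.File;
import java.util.List;

// Hypothetical sketch; not the bookie's actual implementation.
final class DiskInfoSketch {
    private DiskInfoSketch() {}

    // Returns {totalBytes, freeBytes} summed over the given directories.
    // Assumption: each directory lives on its own partition.
    static long[] collect(List<File> ledgerDirs) {
        long total = 0;
        long free = 0;
        for (File dir : ledgerDirs) {
            total += dir.getTotalSpace();   // size of the partition holding dir
            free += dir.getUsableSpace();   // bytes available to this JVM
        }
        return new long[] {total, free};
    }
}
```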



>
> - Sijie
>
> On Mon, Dec 12, 2016 at 11:09 AM, Rithin Shetty <ri...@gmail.com> wrote:
>
> > Hi Sijie,
> >
> >     I've updated the proposal page now:
> > https://cwiki.apache.org/confluence/display/BOOKKEEPER/
> > BP-2+-+Resource+aware+data+placement.
> > Sorry about the delay.
> >
> > Thanks,
> > --Rithin
> >
> > On Sat, Dec 10, 2016 at 1:45 AM, Sijie Guo <si...@apache.org> wrote:
> >
> > > As a reminder, we will be discussing the resource aware placement at
> the
> > > meeting next week.
> > >
> > > Rithin, do you mind updating the proposal by next Tue? so we have time
> to
> > > check before the meeting.
> > >
> > > - Sijie
> > >
> > > On Tue, Nov 1, 2016 at 2:29 AM, Sijie Guo <si...@apache.org> wrote:
> > >
> > > > https://cwiki.apache.org/confluence/display/BOOKKEEPER/BP-2+
> > > > -+Resource+aware+data+placement
> > > >
> > > > I created a wiki page for place holder. Anyone that is interested in
> > this
> > > > topic can help fill a basic proposal and drive the discussion?
> > > >
> > > > *Problem*
> > > >
> > > > There are bunch of tickets around talking about placement policy :
> > > > - handle different types of storage (tier storage)
> > > > - handle storage node having different type of resources (resource
> > aware)
> > > > - better to canary/bucket-testing new version (manageability)
> > > >
> > > > It would be good to consolidate these thoughts to provide a common
> > > > framework/solution to meet those basic requirements.
> > > >
> > > > *Proposal*
> > > >
> > > > [TBD]
> > > >
> > > > - Sijie
> > > >
> > > >
> > >
> >
>

Re: [DISCUSS][BP-2] Resource aware data placement

Posted by Rithin Shetty <ri...@gmail.com>.
Hi Enrico,

    Thanks for the comments. Please see my replies inline below:

On Wed, Dec 14, 2016 at 4:08 AM, Enrico Olivelli <eo...@gmail.com>
wrote:

> Very interesting,
> some other notes for the discussion:
> - as we are now on java8 can you add a "default" no-op implementation
> to the new method in EnsemblePlacementPolicy#updateBookieInfo, this
> way we will not break compatibility with custom policies
>

OK. I wasn't familiar with it. I'll explore it.
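
For reference, a minimal sketch of the Java 8 "default" method idea. The
interface and classes below are hypothetical stand-ins, and the parameter
type is an assumption; the real EnsemblePlacementPolicy#updateBookieInfo
signature may differ.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for the policy interface; the real signature may differ.
interface PlacementPolicySketch {
    String name();

    // Java 8 default method: implementations written before this method
    // existed still compile and simply ignore the update.
    default void updateBookieInfo(Map<String, Long> freeDiskSpaceByBookie) {
        // no-op
    }
}

// A custom policy that predates the new method; it needs no changes.
final class LegacyPolicy implements PlacementPolicySketch {
    @Override
    public String name() {
        return "legacy";
    }
}

// A policy that opts in by overriding the default.
final class WeightedPolicy implements PlacementPolicySketch {
    private Map<String, Long> latest = Collections.emptyMap();

    @Override
    public String name() {
        return "weighted";
    }

    @Override
    public void updateBookieInfo(Map<String, Long> freeDiskSpaceByBookie) {
        latest = new HashMap<>(freeDiskSpaceByBookie);
    }

    long freeSpaceOf(String bookie) {
        return latest.getOrDefault(bookie, 0L);
    }
}
```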


> - inside the new BookieInfo can we provide a set of string "labels"
> for the bookie, to be configured statically on bookie configuration
> (ServerConfiguration). maybe this could be another implementation, but
> I would like that the BookieInfo will be extensible enough to add new
> fields in the future without protocol changes and without "protocol
> extensions"
>

My current implementation requires the client to request the information
it needs, and the server responds with only those fields.


> - I wonder if we can change actual out-of-the-box policies (RackAware,
> Default...) or just create a new policy ResourceAwarePlacementPolicy
> which takes into account this new info
>
>
I think creating a new one will not be easy. I would need to specialize each
of the existing ones: Default, RackAware, RegionAware, etc. Instead I chose
to add a weighted property to the existing policies, controlled by a
separate config parameter.


> Thanks for the BP
> Enrico
>
> 2016-12-14 4:12 GMT+01:00 Sijie Guo <si...@apache.org>:
> > Hi Rithin,
> >
> > The proposal looks good to me. I have a few comments:
> >
> > - How will the BookieInfo structure look like?
> > - I like the idea of a general GET_BOOKIE_INFO request. Do you have any
> > proposed wire protocol changes?
> > - I assume that you might need the clients to configure a function to
> > compute the weight for a bookie, right? So we can also compute the weight
> > based on network bandwidth?
> > - How does bookie collect the resource information?
> >
> > - Sijie
> >
> > On Mon, Dec 12, 2016 at 11:09 AM, Rithin Shetty <ri...@gmail.com>
> wrote:
> >
> >> Hi Sijie,
> >>
> >>     I've updated the proposal page now:
> >> https://cwiki.apache.org/confluence/display/BOOKKEEPER/
> >> BP-2+-+Resource+aware+data+placement.
> >> Sorry about the delay.
> >>
> >> Thanks,
> >> --Rithin
> >>
> >> On Sat, Dec 10, 2016 at 1:45 AM, Sijie Guo <si...@apache.org> wrote:
> >>
> >> > As a reminder, we will be discussing the resource aware placement at
> the
> >> > meeting next week.
> >> >
> >> > Rithin, do you mind updating the proposal by next Tue? so we have
> time to
> >> > check before the meeting.
> >> >
> >> > - Sijie
> >> >
> >> > On Tue, Nov 1, 2016 at 2:29 AM, Sijie Guo <si...@apache.org> wrote:
> >> >
> >> > > https://cwiki.apache.org/confluence/display/BOOKKEEPER/BP-2+
> >> > > -+Resource+aware+data+placement
> >> > >
> >> > > I created a wiki page for place holder. Anyone that is interested in
> >> this
> >> > > topic can help fill a basic proposal and drive the discussion?
> >> > >
> >> > > *Problem*
> >> > >
> >> > > There are bunch of tickets around talking about placement policy :
> >> > > - handle different types of storage (tier storage)
> >> > > - handle storage node having different type of resources (resource
> >> aware)
> >> > > - better to canary/bucket-testing new version (manageability)
> >> > >
> >> > > It would be good to consolidate these thoughts to provide a common
> >> > > framework/solution to meet those basic requirements.
> >> > >
> >> > > *Proposal*
> >> > >
> >> > > [TBD]
> >> > >
> >> > > - Sijie
> >> > >
> >> > >
> >> >
> >>
>

Re: [DISCUSS][BP-2] Resource aware data placement

Posted by Enrico Olivelli <eo...@gmail.com>.
Very interesting,
some other notes for the discussion:
- as we are now on Java 8, can you add a "default" no-op implementation
for the new method EnsemblePlacementPolicy#updateBookieInfo? This way we
will not break compatibility with custom policies.
- inside the new BookieInfo, can we provide a set of string "labels"
for the bookie, to be configured statically in the bookie configuration
(ServerConfiguration)? Maybe this could be another implementation, but
I would like BookieInfo to be extensible enough to add new fields in the
future without protocol changes and without "protocol extensions".
- I wonder whether we should change the existing out-of-the-box policies
(RackAware, Default...) or just create a new policy,
ResourceAwarePlacementPolicy, which takes this new info into account.

Thanks for the BP
Enrico

2016-12-14 4:12 GMT+01:00 Sijie Guo <si...@apache.org>:
> Hi Rithin,
>
> The proposal looks good to me. I have a few comments:
>
> - How will the BookieInfo structure look like?
> - I like the idea of a general GET_BOOKIE_INFO request. Do you have any
> proposed wire protocol changes?
> - I assume that you might need the clients to configure a function to
> compute the weight for a bookie, right? So we can also compute the weight
> based on network bandwidth?
> - How does bookie collect the resource information?
>
> - Sijie
>
> On Mon, Dec 12, 2016 at 11:09 AM, Rithin Shetty <ri...@gmail.com> wrote:
>
>> Hi Sijie,
>>
>>     I've updated the proposal page now:
>> https://cwiki.apache.org/confluence/display/BOOKKEEPER/
>> BP-2+-+Resource+aware+data+placement.
>> Sorry about the delay.
>>
>> Thanks,
>> --Rithin
>>
>> On Sat, Dec 10, 2016 at 1:45 AM, Sijie Guo <si...@apache.org> wrote:
>>
>> > As a reminder, we will be discussing the resource aware placement at the
>> > meeting next week.
>> >
>> > Rithin, do you mind updating the proposal by next Tue? so we have time to
>> > check before the meeting.
>> >
>> > - Sijie
>> >
>> > On Tue, Nov 1, 2016 at 2:29 AM, Sijie Guo <si...@apache.org> wrote:
>> >
>> > > https://cwiki.apache.org/confluence/display/BOOKKEEPER/BP-2+
>> > > -+Resource+aware+data+placement
>> > >
>> > > I created a wiki page for place holder. Anyone that is interested in
>> this
>> > > topic can help fill a basic proposal and drive the discussion?
>> > >
>> > > *Problem*
>> > >
>> > > There are bunch of tickets around talking about placement policy :
>> > > - handle different types of storage (tier storage)
>> > > - handle storage node having different type of resources (resource
>> aware)
>> > > - better to canary/bucket-testing new version (manageability)
>> > >
>> > > It would be good to consolidate these thoughts to provide a common
>> > > framework/solution to meet those basic requirements.
>> > >
>> > > *Proposal*
>> > >
>> > > [TBD]
>> > >
>> > > - Sijie
>> > >
>> > >
>> >
>>

Re: [DISCUSS][BP-2] Resource aware data placement

Posted by Sijie Guo <si...@apache.org>.
Hi Rithin,

The proposal looks good to me. I have a few comments:

- What will the BookieInfo structure look like?
- I like the idea of a general GET_BOOKIE_INFO request. Do you have any
proposed wire protocol changes?
- I assume that you might need the clients to configure a function to
compute the weight for a bookie, right? So we can also compute the weight
based on network bandwidth?
- How does a bookie collect the resource information?

- Sijie

On Mon, Dec 12, 2016 at 11:09 AM, Rithin Shetty <ri...@gmail.com> wrote:

> Hi Sijie,
>
>     I've updated the proposal page now:
> https://cwiki.apache.org/confluence/display/BOOKKEEPER/
> BP-2+-+Resource+aware+data+placement.
> Sorry about the delay.
>
> Thanks,
> --Rithin
>
> On Sat, Dec 10, 2016 at 1:45 AM, Sijie Guo <si...@apache.org> wrote:
>
> > As a reminder, we will be discussing the resource aware placement at the
> > meeting next week.
> >
> > Rithin, do you mind updating the proposal by next Tue? so we have time to
> > check before the meeting.
> >
> > - Sijie
> >
> > On Tue, Nov 1, 2016 at 2:29 AM, Sijie Guo <si...@apache.org> wrote:
> >
> > > https://cwiki.apache.org/confluence/display/BOOKKEEPER/BP-2+
> > > -+Resource+aware+data+placement
> > >
> > > I created a wiki page for place holder. Anyone that is interested in
> this
> > > topic can help fill a basic proposal and drive the discussion?
> > >
> > > *Problem*
> > >
> > > There are bunch of tickets around talking about placement policy :
> > > - handle different types of storage (tier storage)
> > > - handle storage node having different type of resources (resource
> aware)
> > > - better to canary/bucket-testing new version (manageability)
> > >
> > > It would be good to consolidate these thoughts to provide a common
> > > framework/solution to meet those basic requirements.
> > >
> > > *Proposal*
> > >
> > > [TBD]
> > >
> > > - Sijie
> > >
> > >
> >
>

Re: [DISCUSS][BP-2] Resource aware data placement

Posted by Rithin Shetty <ri...@gmail.com>.
Hi Sijie,

    I've updated the proposal page now:
https://cwiki.apache.org/confluence/display/BOOKKEEPER/BP-2+-+Resource+aware+data+placement.
Sorry about the delay.

Thanks,
--Rithin

On Sat, Dec 10, 2016 at 1:45 AM, Sijie Guo <si...@apache.org> wrote:

> As a reminder, we will be discussing the resource aware placement at the
> meeting next week.
>
> Rithin, do you mind updating the proposal by next Tue? so we have time to
> check before the meeting.
>
> - Sijie
>
> On Tue, Nov 1, 2016 at 2:29 AM, Sijie Guo <si...@apache.org> wrote:
>
> > https://cwiki.apache.org/confluence/display/BOOKKEEPER/BP-2+
> > -+Resource+aware+data+placement
> >
> > I created a wiki page for place holder. Anyone that is interested in this
> > topic can help fill a basic proposal and drive the discussion?
> >
> > *Problem*
> >
> > There are bunch of tickets around talking about placement policy :
> > - handle different types of storage (tier storage)
> > - handle storage node having different type of resources (resource aware)
> > - better to canary/bucket-testing new version (manageability)
> >
> > It would be good to consolidate these thoughts to provide a common
> > framework/solution to meet those basic requirements.
> >
> > *Proposal*
> >
> > [TBD]
> >
> > - Sijie
> >
> >
>