Posted to dev@hbase.apache.org by Chia-Ping Tsai <ch...@apache.org> on 2017/10/01 04:28:23 UTC

Re: [DISCUSS] Move Type out of KeyValue

The "custom cell type" never existed in this story. (Sorry for misleading you.)

Here is the story. I add some custom cells (to save memory) to a Put via Put#add(Cell). The pseudocode of the custom cell is shown below.

{code}
class MyObject {
  Cell toCell() {
    return CellBuilderFactory.newBuilder(SHALLOW_COPY)
        .setRow(sharedBuffer, myRowOffset, myRowLength)
        // We have to call the IA.Private class to get the valid code of Put
        .setType(KeyValue.Type.Put.getCode())
        // set other fields
        .build();
  }
}

put.add(myObject.toCell());
{code}
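
To make the shallow-copy idea concrete, here is a minimal JDK-only sketch, not HBase code. `RowView` and its methods are hypothetical names used only for illustration; the point is that the object keeps a reference to the shared buffer plus an offset and length, and copies bytes only when explicitly asked to.

```java
import java.util.Arrays;

/** Hypothetical stand-in for a shallow-copy cell: it references the shared buffer and never copies it eagerly. */
final class RowView {
  private final byte[] buffer; // shared backing array, owned by the caller
  private final int offset;    // start of this row inside the shared array
  private final int length;    // number of bytes that belong to this row

  RowView(byte[] buffer, int offset, int length) {
    // Reject ranges that fall outside the shared array (written to avoid int overflow).
    if (offset < 0 || length < 0 || length > buffer.length - offset) {
      throw new IllegalArgumentException("range out of bounds");
    }
    this.buffer = buffer;
    this.offset = offset;
    this.length = length;
  }

  byte[] getRowArray() { return buffer; }
  int getRowOffset() { return offset; }
  int getRowLength() { return length; }

  /** Copies the row bytes on demand only, e.g. for logging or debugging. */
  byte[] copyRow() {
    return Arrays.copyOfRange(buffer, offset, offset + length);
  }
}
```

Many such views can share one large array, which is the memory saving described above.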

And then I noticed that Put#add is not optimized for our heavy table (a chunk of cells in a single row), so I also extended Put with some #add methods that avoid resizing the collection.
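
The resizing cost can be illustrated without HBase. The sketch below is an assumption about the general technique, not our actual Put subclass: `PresizedCellList` is a hypothetical name, and it simply allocates the backing list once when the number of cells in the row is known up front.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/** Hypothetical container for the cells of one row when the cell count is known in advance. */
final class PresizedCellList<C> {
  private final List<C> cells;

  PresizedCellList(int expectedCells) {
    // Allocating the backing array once avoids the repeated grow-and-copy
    // steps an ArrayList performs when it starts from its default capacity.
    this.cells = new ArrayList<>(expectedCells);
  }

  /** Appends a cell and returns this container so calls can be chained. */
  PresizedCellList<C> add(C cell) {
    cells.add(cell);
    return this;
  }

  int size() { return cells.size(); }

  /** Read-only view of the collected cells. */
  List<C> view() { return Collections.unmodifiableList(cells); }
}
```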

That was the story: I am trying to reduce the cost of converting our objects to Put/Cell. Another story I had mentioned is building a custom write path via an Endpoint, but it is unrelated to this topic.

All the classes and methods we use are shown below:
1) Cell -> IA.Public
2) CellBuilder -> IA.Public
3) CellBuilderFactory -> IA.Public
4) Put -> IA.Public
5) Put#add(Cell) -> IA.Public
6) KeyValue#Type -> IA.Private

That is why I want to make KeyValue#Type IA.Public.
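
As a rough sketch of what a public type could look like (an assumption for discussion, not an existing HBase API): `CellType` below is a hypothetical enum whose byte codes mirror the ones I understand KeyValue.Type to use; please verify them against the HBase version in use.

```java
/** Hypothetical public enum; the byte codes are assumed to mirror those of KeyValue.Type. */
enum CellType {
  PUT((byte) 4),
  DELETE((byte) 8),
  DELETE_FAMILY_VERSION((byte) 10),
  DELETE_COLUMN((byte) 12),
  DELETE_FAMILY((byte) 14);

  private final byte code;

  CellType(byte code) {
    this.code = code;
  }

  /** Returns the on-wire type code for this cell type. */
  byte getCode() {
    return code;
  }

  /** Maps a raw type code back to the enum, rejecting unknown codes. */
  static CellType fromCode(byte code) {
    for (CellType t : values()) {
      if (t.code == code) {
        return t;
      }
    }
    throw new IllegalArgumentException("Unknown type code: " + code);
  }
}
```

With something like this, client code could write `.setType(CellType.PUT.getCode())` without touching an IA.Private class.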

--
Chia-Ping

On 2017-10-01 00:34, Andrew Purtell <an...@gmail.com> wrote: 
> Thanks for sharing these details. They are intriguing. If possible could you explain why the custom type is needed? 
> 
> Something has to be deployed on the server or the custom cell type isn’t guaranteed to be handled correctly. It may work now by accident. I’m a little surprised a custom cell type doesn’t cause an abort. Did you patch the code to handle it?
> 
> 
> > On Sep 30, 2017, at 1:06 AM, Chia-Ping Tsai <ch...@apache.org> wrote:
> > 
> > Thanks for the nice suggestions, Andrew. Sorry for the delayed response; I was busy today.
> > 
> > The root reason we must build our own Cell on the client side is that the data are located in shared memory, which is similar to MSLAB.
> > 
> > You are right. We can use an attribute to carry our data, but a byte[] is not acceptable because we can’t assign the offset and length. In fact, the endpoint is a better way for our case because our object can be directly converted to a PB object. It is also easy to apply shared memory to manage our object. However, it is easier and more readable to follow the regular Put operation. All we have to do is build our own cell and an extended Put. Nothing has to be deployed on the server.
> > 
> > I agree that the custom cell is a low-level thing and should be used by advanced users. What concerns me is that the classes related to the custom Cell have different IA declarations. I am fine with making them IA.Private, but building a custom cell may be a common case.
> > 
> > --
> > Chia-Ping
> > 
> >> On 2017-09-30 06:05, Andrew Purtell <ap...@apache.org> wrote: 
> >> Construct a normal put or delete or batch mutation, add whatever extra
> >> state you need in one or more operation attributes, and use a
> >> regionobserver to extend normal processing to handle the extra state. I'm
> >> curious what dispatching to extension code because of a custom cell type
> >> buys you over dispatching to extension code because of the presence of an
> >> attribute (or cell tag). For example, in security coprocessors we take
> >> attribute data and attach it to the cell using cell tags. Later we check
> >> for cell tag(s) to determine if we have to take special action when the
> >> cell is accessed by a scanner, or during some operations (e.g. appends or
> >> increments have to do extra handling for cell security tags).
> >> 
> >> 
> >> On Fri, Sep 29, 2017 at 2:43 PM, Chia-Ping Tsai <ch...@apache.org> wrote:
> >> 
> >>>> Instead of a custom cell, could you use a regular cell with a custom
> >>>> operation attribute (see OperationWithAttributes).
> >>> Pardon me, I didn't get what you said.
> >>> 
> >>> 
> >>> 
> >>>> On 2017-09-30 04:31, Andrew Purtell <ap...@apache.org> wrote:
> >>>> Instead of a custom cell, could you use a regular cell with a custom
> >>>> operation attribute (see OperationWithAttributes).
> >>>> 
> >>>> On Fri, Sep 29, 2017 at 1:28 PM, Chia-Ping Tsai <ch...@apache.org> wrote:
> >>>> 
> >>>>> The custom cell helps us reduce memory consumption. We don't have our own
> >>>>> serialization/deserialization mechanism, hence transferring data from
> >>>>> client to server needs many conversion phases (user data -> Put/Cell -> pb
> >>>>> object). The cost of conversion is large when transferring bulk data. In
> >>>>> fact, we also have a custom mutation to manage the memory usage of the
> >>>>> inner cell collection.
> >>>>> 
> >>>>>> On 2017-09-30 02:43, Andrew Purtell <ap...@apache.org> wrote:
> >>>>>> What are the use cases for a custom cell? It seems a dangerously low level
> >>>>>> thing to attempt and perhaps we should unwind support for it. But perhaps
> >>>>>> there is a compelling justification.
> >>>>>> 
> >>>>>> 
> >>>>>> On Thu, Sep 28, 2017 at 10:20 PM, Chia-Ping Tsai <chia7712@apache.org> wrote:
> >>>>>> 
> >>>>>>> Thanks for all the comments.
> >>>>>>> 
> >>>>>>> The problem I want to resolve is that the valid type code should be
> >>>>>>> exposed as IA.Public. Otherwise, end users have to access the IA.Private
> >>>>>>> class to build the custom cell.
> >>>>>>> 
> >>>>>>> For example, I have a use case which plays a streaming role in our
> >>>>>>> application. It applies the CellBuilder (HBASE-18519) to build custom
> >>>>>>> cells. These cells have many identical fields, so they are put in shared
> >>>>>>> memory to avoid GC pauses. Everything is wonderful. However, we have to
> >>>>>>> access the IA.Private class - KeyValue#Type - to get the valid code of Put.
> >>>>>>> 
> >>>>>>> I believe there are many use cases for custom cells, and consequently it
> >>>>>>> is worth adding a way to get the valid type via an IA.Public class.
> >>>>>>> Otherwise, it may imply that the custom cell is built on an unstable
> >>>>>>> foundation, because the related code can be changed at any time.
> >>>>>>> --
> >>>>>>> Chia-Ping
> >>>>>>> 
> >>>>>>>> On 2017-09-29 00:49, Andrew Purtell <ap...@apache.org> wrote:
> >>>>>>>> I agree with Stack. Was typing up a reply to Anoop but let me move it down
> >>>>>>>> here.
> >>>>>>>> 
> >>>>>>>> The type code exposes some low level details of how our current stores are
> >>>>>>>> architected. But what if in the future you could swap out HStore implements
> >>>>>>>> Store with PStore implements Store, where HStore is backed by HFiles and
> >>>>>>>> PStore is backed by Parquet? Just as a hypothetical example. I know there
> >>>>>>>> would be larger issues if this were actually attempted. Bear with me. You
> >>>>>>>> can imagine some different new Store implementation that has some
> >>>>>>>> advantages but is not a design derived from the log structured merge tree
> >>>>>>>> if you like. Most values from a new Cell.Type based on KeyValue.Type
> >>>>>>>> wouldn't apply to cells from such a thing because they are particular to
> >>>>>>>> how LSMs work. I'm sure such a project if attempted would make a number of
> >>>>>>>> changes requiring a major version increment and low level details could be
> >>>>>>>> unwound from Cell then, but if we could avoid doing it in the first place,
> >>>>>>>> I think it would be better for maintainability.
> >>>>>>>> 
> >>>>>>>> 
> >>>>>>>>> On Thu, Sep 28, 2017 at 9:39 AM, Stack <st...@duboce.net> wrote:
> >>>>>>>>> 
> >>>>>>>>> On Thu, Sep 28, 2017 at 2:25 AM, Chia-Ping Tsai <chia7712@apache.org> wrote:
> >>>>>>>>> 
> >>>>>>>>>> hi folks,
> >>>>>>>>>> 
> >>>>>>>>>> Users are allowed to create custom cells, but the valid type code -
> >>>>>>>>>> KeyValue#Type - is declared as IA.Private. As I see it, we should expose
> >>>>>>>>>> KeyValue#Type as Public Client. Three possible ways are shown below:
> >>>>>>>>>> 1) Change the declaration of KeyValue#Type from IA.Private to IA.Public
> >>>>>>>>>> 2) Move KeyValue#Type into Cell
> >>>>>>>>>> 3) Move KeyValue#Type to an upper level
> >>>>>>>>>> 
> >>>>>>>>>> Any suggestions?
> >>>>>>>>>> 
> >>>>>>>>>> 
> >>>>>>>>> What is the problem that we are trying to solve Chia-Ping? You want to make
> >>>>>>>>> Cells of a new Type?
> >>>>>>>>> 
> >>>>>>>>> My first reaction is that KV#Type is particular to the KV implementation.
> >>>>>>>>> Any new Cell implementation should not have to adopt the KeyValue typing
> >>>>>>>>> mechanism.
> >>>>>>>>> 
> >>>>>>>>> S
> >>>>>>>>> 
> >>>>>>>>> 
> >>>>>>>>> 
> >>>>>>>>> 
> >>>>>>>>>> --
> >>>>>>>>>> Chia-Ping
> >>>>>>>>>> 
> >>>>>>>>>> 
> >>>>>>>>> 
> >>>>>>>> 
> >>>>>>>> 
> >>>>>>>> 
> >>>>>>>> --
> >>>>>>>> Best regards,
> >>>>>>>> Andrew
> >>>>>>>> 
> >>>>>>>> Words like orphans lost among the crosstalk, meaning torn from truth's
> >>>>>>>> decrepit hands
> >>>>>>>>   - A23, Crosstalk
> >>>>>>>> 
> >>>>>>> 
> >>>>>> 
> >>>>>> 
> >>>>>> 
> >>>>>> 
> >>>>> 
> >>>> 
> >>>> 
> >>>> 
> >>>> 
> >>> 
> >> 
> >> 
> >> 
> >> 
> 

Re: [DISCUSS] Move Type out of KeyValue

Posted by Andrew Purtell <an...@gmail.com>.
Ok, thanks. I understand now.

+1
