Posted to dev@hbase.apache.org by OpenInx <op...@gmail.com> on 2019/01/15 02:17:47 UTC

Did the branch-2.1 need the patch HBASE-21657 ?

Hi:

In HBASE-21657, I simplified the code path of estimatedSerializedSize() and
estimatedSerializedSizeOfCell() by moving the general getSerializedSize()
and heapSize() from ExtendedCell to the Cell interface. It's an incompatible
change in some cases: if users have implemented their own Cells (rare, but
it can happen), their code will no longer compile.
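
To illustrate the compatibility concern, here is a minimal sketch. It does
not use the real HBase classes: Cell and ExtendedCell below are simplified,
hypothetical stand-ins with only a few methods, and MyCustomCell is an
invented user class. Assuming the change works as described above, any class
implementing Cell directly has to add the two new methods before it compiles
again.

```java
// Hypothetical, simplified stand-ins for HBase's Cell and ExtendedCell;
// the real interfaces have many more methods.
interface Cell {
    byte[] getRowArray();

    // After the change described above, these two live on Cell itself,
    // so every class implementing Cell directly must define them.
    int getSerializedSize();

    long heapSize();
}

interface ExtendedCell extends Cell {
    // Variant that lets the caller decide whether tags are counted.
    int getSerializedSize(boolean withTags);
}

// A user-defined cell written against the older, smaller Cell interface
// would stop compiling until the last two methods below are added.
final class MyCustomCell implements Cell {
    private final byte[] row;

    MyCustomCell(byte[] row) {
        this.row = row;
    }

    @Override
    public byte[] getRowArray() {
        return row;
    }

    @Override
    public int getSerializedSize() {
        // Illustrative only: this sketch serializes nothing but the row.
        return row.length;
    }

    @Override
    public long heapSize() {
        // Placeholder estimate: payload plus an assumed 16-byte header.
        return 16L + row.length;
    }
}

class CellCompatDemo {
    public static void main(String[] args) {
        Cell cell = new MyCustomCell(new byte[] {1, 2, 3});
        System.out.println(cell.getSerializedSize() + " " + cell.heapSize());
    }
}
```

Removing getSerializedSize() and heapSize() from MyCustomCell reproduces the
kind of compile error a user with a custom Cell would see.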

We gain a throughput improvement of roughly 40% in the 100% scan case for
branch-2 (cacheHitRatio~100%)[1], which is a good thing. But I'm not sure
whether the patch should go to branch-2.1. In [2], stack says branch-2.0
won't need this Cell interface change (agreed; maybe the following changes
can be included, I will file an issue for that), but I'm not quite sure
about branch-1. Discussion is welcome (smile).

Anyway, the patch can be included in branch-2/master because we haven't made
a release yet.

BTW, the patch also includes some other improvements:
1. In 99% of cases our cells have no tags, so let HFileScannerImpl just
   return a NoTagsByteBufferKeyValue when there are no tags. This saves a
   lot of CPU time when sending tag-less cells to RPC, because we can just
   return the length instead of computing the serialized size from the
   offset/length of each field (row/cf/cq...).
2. Move the subclasses' getSerializedSize implementations from ExtendedCell
   into their own classes, so we no longer need to call ExtendedCell's
   getSerializedSize() first and then forward to the subclass's
   getSerializedSize(withTags).
3. Give the result ArrayList an estimated initial size to avoid frequent
   list expansion during a big scan; we now estimate the array size as
   min(scan.rows, 512). This also helps a lot.
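
Items 1 and 3 can be sketched roughly as follows. This is not the HBase
implementation: NoTagsCellSketch, ScanSketch, the [keyLen][valueLen][key][value]
layout, and the clamp against negative row counts are all assumptions made
for illustration. Only the min(rows, 512) rule and the idea of returning a
known length come from the list above.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical tag-less cell laid out as [int keyLen][int valueLen][key][value].
final class NoTagsCellSketch {
    private final int keyLength;
    private final int valueLength;
    private final int length; // whole serialized cell, known at construction

    NoTagsCellSketch(int keyLength, int valueLength) {
        this.keyLength = keyLength;
        this.valueLength = valueLength;
        this.length = 2 * Integer.BYTES + keyLength + valueLength;
    }

    // Slow path: rebuild the size from the individual parts on every call.
    int computeSerializedSize() {
        return Integer.BYTES + Integer.BYTES + keyLength + valueLength;
    }

    // Fast path: with no tags, the stored length already is the answer.
    int getSerializedSize() {
        return length;
    }
}

final class ScanSketch {
    static final int MAX_INITIAL_CAPACITY = 512;

    // Initial capacity of min(rows, 512); the lower clamp to 0 is an added
    // safety assumption so a negative row count cannot reach ArrayList.
    static int initialCapacity(int rows) {
        return Math.max(0, Math.min(rows, MAX_INITIAL_CAPACITY));
    }

    static List<NoTagsCellSketch> scan(int rows) {
        // Pre-sizing avoids repeated backing-array growth for small and
        // medium scans, while capping the up-front allocation for huge ones.
        List<NoTagsCellSketch> results = new ArrayList<>(initialCapacity(rows));
        for (int i = 0; i < rows; i++) {
            results.add(new NoTagsCellSketch(16, 100));
        }
        return results;
    }

    public static void main(String[] args) {
        List<NoTagsCellSketch> results = scan(1000);
        System.out.println(results.size() + " cells, first is "
            + results.get(0).getSerializedSize() + " bytes");
    }
}
```

The cap matters because the requested row count can be far larger than what
a single batch actually returns; the list still grows past 512 when needed.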

Thanks.

1.
https://issues.apache.org/jira/browse/HBASE-21657?focusedCommentId=16735455&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-16735455
2.
https://issues.apache.org/jira/browse/HBASE-21657?focusedCommentId=16742330&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-16742330

Re: Did the branch-2.1 need the patch HBASE-21657 ?

Posted by OpenInx <op...@gmail.com>.
Well, thanks all. Let me push this patch to branch-2 and master.

Re: Did the branch-2.1 need the patch HBASE-21657 ?

Posted by "张铎 (Duo Zhang)" <pa...@gmail.com>.
On Tue, Jan 15, 2019 at 11:33 AM Sean Busbey <bu...@apache.org> wrote:

> I'd much rather see this in 2.2.0 rather than squeeze it into a 2.1
> maintenance release.
>
> Can 2.2 do rolling upgrade from earlier 2.y releases? There's nothing in
> the ref guide, but I thought it didn't work due to some assignment change.
>
You need to make sure that there are no ongoing RITs (regions in transition)
when upgrading the HMaster. If there are, the new master will quit and you
have to start the old HMaster again to finish them first.
If we want to release 2.2.0, we at least need to document this.

Re: Did the branch-2.1 need the patch HBASE-21657 ?

Posted by Sean Busbey <bu...@apache.org>.
I'd much rather see this in 2.2.0 rather than squeeze it into a 2.1
maintenance release.

Can 2.2 do rolling upgrade from earlier 2.y releases? There's nothing in
the ref guide, but I thought it didn't work due to some assignment change.

Re: Did the branch-2.1 need the patch HBASE-21657 ?

Posted by OpenInx <op...@gmail.com>.
> For me, I would say: let's start the 2.2.x release line soon, so users
> can benefit from the change after they upgrade to 2.2.x.
Sounds good.

Re: Did the branch-2.1 need the patch HBASE-21657 ?

Posted by OpenInx <op...@gmail.com>.
b

Re: Did the branch-2.1 need the patch HBASE-21657 ?

Posted by "张铎 (Duo Zhang)" <pa...@gmail.com>.
For me, I would say: let's start the 2.2.x release line soon, so users
can benefit from the change after they upgrade to 2.2.x.

Re: Did the branch-2.1 need the patch HBASE-21657 ?

Posted by OpenInx <op...@gmail.com>.
Sorry, there was a typo in my message.

> but I'm not quite sure about branch-1. Discussion is welcome (smile).
It should read: but I'm not quite sure about branch-2.1
