Posted to dev@kudu.apache.org by helifu <hz...@corp.netease.com> on 2018/05/08 06:53:49 UTC

A question about culling row-sets in the 'Tablet::CaptureConsistentIterators' function

Hi all,

 

In the Kudu source code, we only cull row-sets when both lower_bound_key
and exclusive_upper_bound_key are set on the scan spec. Otherwise we grab
all row-sets of the tablet and then have to seek to the key in
'CFileSet::Iterator::PushdownRangeScanPredicate' for row-sets that cannot
contain any matching rows, which wastes disk I/O.

Would it be possible, and helpful, to also cull row-sets when the range is
half-open (only one bound is set) or when the predicate is an equality?

 

Thanks in advance.

 

何李夫

2017-04-10 16:06:24

 


Re: A question about culling row-sets in the 'Tablet::CaptureConsistentIterators' function

Posted by helifu <hz...@corp.netease.com>.
Thanks, Todd.
I had not noticed that comment. Sorry about that :(

何李夫
2017-04-10 16:06:24

-----Original Message-----
From: dev-return-5994-hzhelifu=corp.netease.com@kudu.apache.org <de...@kudu.apache.org> on behalf of Todd Lipcon
Sent: May 8, 2018 23:03
To: dev <de...@kudu.apache.org>
Subject: Re: A question about culling row-sets in the 'Tablet::CaptureConsistentIterators' function

Hi 何李夫

Yes, it seems there's a TODO in the code here about supporting the open-ended intervals:

  if (spec != nullptr && spec->lower_bound_key() &&
      spec->exclusive_upper_bound_key()) {
    // TODO : support open-ended intervals

I think fixing this would be relatively straightforward: add some methods to the RowSetTree and IntervalTree implementations that accept open-ended intervals. Alternatively, it might be possible for the RowSetTree to remember the min and max key of all of its contents, and just use that min/max to fill in the missing bounds on the scan spec.
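
As a rough illustration of what culling with an open-ended interval means (this is not Kudu's RowSetTree or IntervalTree code), the sketch below treats a missing bound as unbounded on that side. The types RowSetBounds and ScanBounds and the function CullRowSets are hypothetical stand-ins, keys are assumed to be byte strings compared lexicographically, and a linear scan replaces the interval tree for brevity.

```cpp
#include <cassert>
#include <optional>
#include <string>
#include <vector>

// Hypothetical stand-in for a row-set: it holds keys in [min_key, max_key].
struct RowSetBounds {
  std::string min_key;
  std::string max_key;  // inclusive
};

// Hypothetical stand-in for the key bounds of a scan spec.
// An empty optional means "no bound on this side".
struct ScanBounds {
  std::optional<std::string> lower;            // inclusive
  std::optional<std::string> exclusive_upper;  // exclusive
};

// Returns the row-sets whose key range may overlap the scan bounds.
// Linear scan for clarity; a real implementation would query a tree.
std::vector<RowSetBounds> CullRowSets(const std::vector<RowSetBounds>& rowsets,
                                      const ScanBounds& scan) {
  std::vector<RowSetBounds> kept;
  for (const auto& rs : rowsets) {
    assert(rs.min_key <= rs.max_key);
    // The row-set lies entirely below the inclusive lower bound.
    if (scan.lower && rs.max_key < *scan.lower) continue;
    // The row-set lies entirely at or above the exclusive upper bound.
    if (scan.exclusive_upper && rs.min_key >= *scan.exclusive_upper) continue;
    kept.push_back(rs);
  }
  return kept;
}
```

With only a lower bound set, row-sets whose max key sorts below that bound are dropped; with no bounds at all, every row-set is kept, which matches the current fallback behavior.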

For the common case of equality predicates, note that equality is already converted into a scan with lower and upper bound set, so I think it will already cull to just the appropriate RowSets in that case.
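
One way such a conversion can work, sketched under the assumption that keys are byte strings compared lexicographically: the equality key becomes the inclusive lower bound, and the key with a zero byte appended (its immediate successor in that ordering) becomes the exclusive upper bound. EqualityToRange is a hypothetical helper for illustration, not a Kudu function.

```cpp
#include <cassert>
#include <string>
#include <utility>

// Returns {lower, exclusive_upper} such that [lower, exclusive_upper)
// contains exactly `key` under byte-wise lexicographic ordering.
std::pair<std::string, std::string> EqualityToRange(const std::string& key) {
  std::string upper = key;
  // Appending a zero byte yields the smallest string strictly greater than key.
  upper.push_back('\0');
  assert(key < upper);
  return {key, upper};
}
```

Because both bounds are then present, the existing two-sided culling path applies without any change.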

If you're interested in working on this, that would be great.

-Todd

--
Todd Lipcon
Software Engineer, Cloudera

