Posted to dev@kudu.apache.org by helifu <hz...@corp.netease.com> on 2018/04/10 07:37:42 UTC

The speed of 'Scan' could be accelerated!

Hi all,

 

I read through ‘MaterializingIterator::MaterializeBlock’ carefully and
think there is room for improvement. If, after each round of predicate
evaluation, we advance ‘cur_idx_’ in ‘CFileSet::Iterator’ to the first row
still selected in ‘dst->selection_vector’, we could skip reading some
unnecessary ‘data_block’s, which should speed up scans.
Am I right? :)

 

何李夫

2017-04-10 16:06:24

Re: The speed of 'Scan' could be accelerated!

Posted by Andrew Wong <aw...@cloudera.com>.

Hi 何李夫,

Just making sure I understand your point:

In MaterializingIterator::MaterializeBlock(), we iterate through all of the
column predicates, and for each one, we start from the beginning of the
column block. Your point is that upon evaluating a single column predicate,
we may have some rows that we know don't belong in the result set. It then
stands to reason that we should be able to move up the cur_idx_ for the
entire rowwise iterator (CFileSet::Iterator) to avoid consideration of the
rows we've already filtered out.
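
A minimal sketch of that idea, using simplified stand-in types rather than
Kudu's actual classes: Column, Predicate, ScanStats, and ScanBlock are all
hypothetical names, and "decoding" is modeled as nothing more than counting
the cells touched. With advance_start set, the start index plays the role
of cur_idx_ and moves to the first row still selected after each predicate.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <vector>

// Hypothetical stand-ins for illustration only; not Kudu's real types.
using Column = std::vector<int>;
using Predicate = std::function<bool(int)>;

struct ScanStats {
  std::size_t cells_decoded = 0;            // cells read across all columns
  std::vector<std::size_t> selected_rows;   // rows that pass every predicate
};

// Evaluates predicates[c] against columns[c] for one block of rows.
// When advance_start is true, the start index is moved past the leading
// rows that are already filtered out, so later columns never read them.
ScanStats ScanBlock(const std::vector<Column>& columns,
                    const std::vector<Predicate>& predicates,
                    bool advance_start) {
  assert(columns.size() == predicates.size());
  const std::size_t num_rows = columns.empty() ? 0 : columns[0].size();
  std::vector<bool> selected(num_rows, true);
  std::size_t start = 0;
  ScanStats stats;

  for (std::size_t c = 0; c < columns.size(); ++c) {
    assert(columns[c].size() == num_rows);
    for (std::size_t r = start; r < num_rows; ++r) {
      ++stats.cells_decoded;  // models a decoder that ignores the selection vector
      if (selected[r] && !predicates[c](columns[c][r])) {
        selected[r] = false;
      }
    }
    if (advance_start) {
      while (start < num_rows && !selected[start]) {
        ++start;
      }
    }
  }

  for (std::size_t r = 0; r < num_rows; ++r) {
    if (selected[r]) {
      stats.selected_rows.push_back(r);
    }
  }
  return stats;
}
```

In this model the savings come only from a filtered prefix of the block;
rows filtered out in the middle are still read, which is one reason a real
implementation may need more than a single index.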

This would probably serve to save some time for most block decoders, which
don't take into account the selection vector at all; some decoders (like
the dictionary decoder) do take into account the existing selection vector
and avoid the unnecessary materialization. So you're probably right: we
could save some cycles here. There might be some gotchas in
actually implementing this since the hierarchy for cfile sets, cfile
readers, and decoders is a bit complex. It seems you've got a fair amount
of context already, so feel free to try testing it out!
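
To illustrate what "taking the selection vector into account" can look
like at the decoder level, here is a rough sketch. It assumes a toy
dictionary-encoded block; DictBlock and DecodeSelected are hypothetical
names and do not reflect Kudu's actual dictionary decoder.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical dictionary-encoded block: each row stores a code into `dict`.
struct DictBlock {
  std::vector<std::string> dict;
  std::vector<std::uint32_t> codes;
};

// Materializes only the rows whose selection bit is set and returns how
// many values were actually copied out of the dictionary. Unselected rows
// are left as empty strings in `out`.
std::size_t DecodeSelected(const DictBlock& block,
                           const std::vector<bool>& selected,
                           std::vector<std::string>* out) {
  assert(selected.size() == block.codes.size());
  out->assign(block.codes.size(), std::string());
  std::size_t materialized = 0;
  for (std::size_t r = 0; r < block.codes.size(); ++r) {
    if (!selected[r]) {
      continue;  // row already filtered out: skip the dictionary lookup and copy
    }
    (*out)[r] = block.dict.at(block.codes[r]);
    ++materialized;
  }
  return materialized;
}
```

A decoder that ignores the selection vector would do the lookup and copy
for every row; here the work scales with the number of rows still selected.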

Andrew

-- 
Andrew Wong