Posted to dev@kudu.apache.org by helifu <hz...@corp.netease.com> on 2018/06/14 07:20:33 UTC

Question about 'CFileIterator::Scan'

Hi all,

 

I read the code of ‘CFileIterator::Scan’ and think it would be better to
pass ‘remaining_sel’ to the function ‘CopyNextValues’, so that it can skip
copying unnecessary data for the columns that are not in predicates. In
other words, the decoder currently copies all the data of the columns that
are not in predicates, even for rows that have already been deselected.

 

CFileIterator::Scan:

    if (ctx->DecoderEvalNotDisabled()) {
      RETURN_NOT_OK(pb->dblk_->CopyNextAndEval(&this_batch, ctx, &remaining_sel,
                                               &remaining_dst));
    } else {
      RETURN_NOT_OK(pb->dblk_->CopyNextValues(&this_batch, &remaining_dst));  // <-- Here
    }

 

 

For example: select column_a, column_b from table where column_c = 'c';

In function ‘MaterializingIterator::MaterializeBlock’:

    Because column_c has a predicate, it is in ‘col_idx_predicates_’. The
decoder evaluates the predicate and either copies the matching data or sets
the corresponding ‘SelectionVector’ entries to false.

    Next, column_a and column_b have no predicates, so they are in
‘non_predicate_column_indexes_’. This time the decoder copies the data
directly, even for rows that have already been set to false.
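
To make the difference concrete, here is a minimal sketch in plain C++. It
uses toy stand-ins only: ‘ToySelection’, ‘CopyAll’ and ‘CopySelected’ are
hypothetical names for illustration and are not Kudu's classes or
signatures. ‘CopyAll’ mimics the behavior described above, and
‘CopySelected’ mimics what passing a selection vector to the copy could
allow.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Toy stand-in for a selection vector (hypothetical, not Kudu's class):
// sel[i] is true if row i is still selected after predicate evaluation.
using ToySelection = std::vector<bool>;

// Mimics the behavior described above: every cell of a non-predicate
// column is copied, whether or not its row is still selected.
// Returns the number of cells copied.
std::size_t CopyAll(const std::vector<std::string>& src,
                    std::vector<std::string>* dst) {
  dst->assign(src.size(), std::string());
  for (std::size_t i = 0; i < src.size(); ++i) {
    (*dst)[i] = src[i];
  }
  return src.size();
}

// Mimics the suggested behavior: rows already deselected by an earlier
// predicate are skipped, so their cells are left as empty strings.
// Returns the number of cells copied.
std::size_t CopySelected(const std::vector<std::string>& src,
                         const ToySelection& sel,
                         std::vector<std::string>* dst) {
  assert(sel.size() == src.size());
  dst->assign(src.size(), std::string());
  std::size_t copied = 0;
  for (std::size_t i = 0; i < src.size(); ++i) {
    if (!sel[i]) continue;  // Row was filtered out; skip the copy.
    (*dst)[i] = src[i];
    ++copied;
  }
  return copied;
}
```

With four rows of column_a and only one row matching the predicate on
column_c, ‘CopyAll’ copies four cells while ‘CopySelected’ copies one.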

 

I just want to make sure I am interpreting this correctly. Thanks in
advance.

 

何李夫

2017-04-10 16:06:24

 


Re: Question about 'CFileIterator::Scan'

Posted by Andrew Wong <aw...@cloudera.com.INVALID>.
Hi,

Glancing at the code a bit, this seems like a reasonable optimization. The
SelectionVectorView was originally added as an optimization for decoders
that are able to fully evaluate predicates, but that isn't to say it can't
be used further by other decoders as a means of avoiding unnecessary
copying. As you suggest, it'd be particularly helpful in materializing
large rows on which the predicates are very selective (not many rows
returned).
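
As a rough back-of-the-envelope sketch of why row size and selectivity
matter, the helper below estimates how many bytes a selection-aware copy
would not have to touch. The function name ‘BytesSkipped’ and the
fixed per-row width are assumptions for illustration; real columns are
encoded and variable-width, so this is only an approximation.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Estimates the bytes of non-predicate column data that would not need to
// be copied if deselected rows were skipped. 'bytes_per_row' is an assumed
// fixed width for the non-predicate columns of one row.
std::size_t BytesSkipped(const std::vector<bool>& sel,
                         std::size_t bytes_per_row) {
  assert(bytes_per_row > 0);  // A zero width makes the estimate meaningless.
  std::size_t deselected = 0;
  for (bool selected : sel) {
    if (!selected) ++deselected;
  }
  return deselected * bytes_per_row;
}
```

For instance, with a batch of 1000 rows of which 10 remain selected, and a
hypothetical 256 bytes of non-predicate data per row, the estimate is
990 * 256 = 253440 bytes that would not be copied.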


Hope this helped,
Andrew


-- 
Andrew Wong