Posted to issues@drill.apache.org by "Paul Rogers (JIRA)" <ji...@apache.org> on 2018/08/09 23:44:00 UTC

[jira] [Commented] (DRILL-6678) Improve SelectionVectorRemover to pack output batch based on BatchSizing

    [ https://issues.apache.org/jira/browse/DRILL-6678?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16575578#comment-16575578 ] 

Paul Rogers commented on DRILL-6678:
------------------------------------

The SVR copies data. The size of the output batch can be no larger than the input. In the worst case, output will be a single row.
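
A minimal sketch of that copy step, using plain Java arrays rather than Drill's vector classes. The class and method names are hypothetical and only illustrate why the output can never hold more rows than the input:

```java
import java.util.Arrays;

public class SelectionVectorCopySketch {

  // Copies only the rows named by the selection vector into a new, dense array.
  // The output length equals the number of selected rows, so it cannot exceed
  // the input length when each selected index appears once.
  static int[] removeSelectionVector(int[] values, int[] selection) {
    int[] out = new int[selection.length];
    for (int i = 0; i < selection.length; i++) {
      out[i] = values[selection[i]];
    }
    return out;
  }

  public static void main(String[] args) {
    int[] input = {10, 20, 30, 40, 50};
    int[] selected = {1, 4};  // a filter kept rows 1 and 4 only
    System.out.println(Arrays.toString(removeSelectionVector(input, selected)));
  }
}
```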

Is the proposal here to consolidate multiple incoming batches into a single output batch to preserve an ideal batch size? If so, that changes the semantics of the operator somewhat.

Is the idea to do all or nothing for each incoming batch? Either append it all to the output batch, or send off the current output and start anew?

Or, is the idea to append rows from multiple incoming batches until the output batch reaches the target size? That is, if A, B and C are incoming batches, the output batch may have all selected rows from A and B, and, say, half the selected rows from C.
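
To make the difference between these two options concrete, here is a rough sketch. It models a batch as a list of already-selected rows and measures the target in rows rather than bytes; both simplifications are assumptions, as are the class and method names, none of which come from Drill:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class BatchPackingSketch {

  // Option 1: all or nothing. An incoming batch is either appended whole,
  // or the current output is sent off first and a new one is started.
  static List<List<Integer>> packWholeBatches(List<List<Integer>> incoming, int targetRows) {
    List<List<Integer>> out = new ArrayList<>();
    List<Integer> current = new ArrayList<>();
    for (List<Integer> batch : incoming) {
      if (!current.isEmpty() && current.size() + batch.size() > targetRows) {
        out.add(current);
        current = new ArrayList<>();
      }
      current.addAll(batch);
    }
    if (!current.isEmpty()) {
      out.add(current);
    }
    return out;
  }

  // Option 2: append row by row, splitting an incoming batch across
  // output batches whenever the target size is reached.
  static List<List<Integer>> packRows(List<List<Integer>> incoming, int targetRows) {
    List<List<Integer>> out = new ArrayList<>();
    List<Integer> current = new ArrayList<>();
    for (List<Integer> batch : incoming) {
      for (Integer row : batch) {
        current.add(row);
        if (current.size() == targetRows) {
          out.add(current);
          current = new ArrayList<>();
        }
      }
    }
    if (!current.isEmpty()) {
      out.add(current);
    }
    return out;
  }

  public static void main(String[] args) {
    // Selected rows of incoming batches A, B and C.
    List<List<Integer>> incoming = Arrays.asList(
        Arrays.asList(1, 2, 3),
        Arrays.asList(4, 5, 6),
        Arrays.asList(7, 8, 9, 10));
    System.out.println(packWholeBatches(incoming, 8)); // [[1, 2, 3, 4, 5, 6], [7, 8, 9, 10]]
    System.out.println(packRows(incoming, 8));         // [[1, 2, 3, 4, 5, 6, 7, 8], [9, 10]]
  }
}
```

With a target of 8 rows, the first option leaves the first output batch under-filled (6 rows), while the second fills it exactly at the cost of splitting C across two output batches.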

If the goal is to consolidate, then you can get a rough cut using batch sizing (the "sizer"). But the description mentions "maximum utilization." The best way to achieve actual (rather than approximate) maximum utilization is to use the Result Set Loader: its whole purpose is to pack rows into a batch until it just meets the target output size. Using that might save having to reinvent some of the same wheels.


> Improve SelectionVectorRemover to pack output batch based on BatchSizing
> ------------------------------------------------------------------------
>
>                 Key: DRILL-6678
>                 URL: https://issues.apache.org/jira/browse/DRILL-6678
>             Project: Apache Drill
>          Issue Type: Improvement
>          Components: Execution - Relational Operators
>    Affects Versions: 1.14.0
>            Reporter: Sorabh Hamirwasia
>            Assignee: Sorabh Hamirwasia
>            Priority: Major
>
> In most cases SelectionVectorRemover is downstream of Filter, which reduces the number of records to be copied into the output container. In those cases, if SelectionVectorRemover can pack the output batch to maximum utilization, it will reduce the number of output batches it produces and help improve performance. During Lateral & Unnest performance evaluation we noticed a significant decrease in performance as the number of batches increases for the same number of rows (i.e. batches are not fully packed).


