Posted to issues@drill.apache.org by "Kunal Khatua (JIRA)" <ji...@apache.org> on 2018/10/08 18:07:00 UTC

[jira] [Resolved] (DRILL-6211) Optimizations for SelectionVectorRemover

     [ https://issues.apache.org/jira/browse/DRILL-6211?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Kunal Khatua resolved DRILL-6211.
---------------------------------
    Resolution: Fixed

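This has been verified as part of the Lateral Unnest feature commits. The table compares query time and SelectionVectorRemover (SVR) cost in Drill 1.13 and Drill 1.14 across filter selectivities.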

|| Selectivity || *Drill-1.13* query time (ms) || Drill-1.13 % time in SVR || Drill-1.13 est. SVR time (ms) || *Drill-1.14* query time (ms) || Drill-1.14 % time in SVR || Drill-1.14 est. SVR time (ms) ||
| 0%   | 5935  | 0.08%  | 4.75    | 4584 | 0.14% | 6.42  |
| 10%  | 6665  | 7.51%  | 500.54  | 4972 | 0.12% | 5.97  |
| 20%  | 7512  | 13.22% | 993.09  | 5187 | 0.14% | 7.26  |
| 30%  | 7814  | 19.03% | 1487.00 | 5432 | 0.20% | 10.86 |
| 40%  | 8827  | 22.06% | 1947.24 | 5579 | 0.16% | 8.93  |
| 50%  | 9499  | 25.36% | 2408.95 | 5739 | 0.17% | 9.76  |
| 60%  | 10108 | 28.63% | 2893.92 | 5823 | 0.18% | 10.48 |
| 70%  | 10624 | 31.47% | 3343.37 | 6096 | 0.19% | 11.58 |
| 80%  | 11342 | 33.58% | 3808.64 | 6266 | 0.20% | 12.53 |
| 90%  | 12088 | 35.40% | 4279.15 | 6324 | 0.21% | 13.28 |
| 100% | 12741 | 37.42% | 4767.68 | 6250 | 0.23% | 14.38 |
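The estimated SVR time columns appear to be the query time multiplied by the share of time spent in the SelectionVectorRemover; for example, 6665 ms × 7.51% ≈ 500.54 ms. The Java sketch below reproduces a few rows under that assumption; the class and method names are illustrative and not part of Drill.

```java
// Illustrative only: reproduces the "Est SVR Time" arithmetic from the table,
// assuming estimate = query time * (SVR percentage / 100).
public class SvrTimeEstimate {

    // Estimated time spent in the SelectionVectorRemover, in the same unit as queryMillis.
    static double estimateSvrMillis(double queryMillis, double svrPercent) {
        return queryMillis * svrPercent / 100.0;
    }

    public static void main(String[] args) {
        // {selectivity %, 1.13 query ms, 1.13 SVR %, 1.14 query ms, 1.14 SVR %}, copied from the table.
        double[][] rows = {
            {10, 6665, 7.51, 4972, 0.12},
            {50, 9499, 25.36, 5739, 0.17},
            {100, 12741, 37.42, 6250, 0.23},
        };
        for (double[] r : rows) {
            double before = estimateSvrMillis(r[1], r[2]);
            double after = estimateSvrMillis(r[3], r[4]);
            System.out.printf("%3.0f%% selectivity: SVR ~%.2f ms (1.13) vs ~%.2f ms (1.14)%n",
                    r[0], before, after);
        }
    }
}
```

The figure that matters is the SVR share itself, which drops from tens of percent in Drill 1.13 to at most 0.23% in Drill 1.14 across all measured selectivities.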


> Optimizations for SelectionVectorRemover 
> -----------------------------------------
>
>                 Key: DRILL-6211
>                 URL: https://issues.apache.org/jira/browse/DRILL-6211
>             Project: Apache Drill
>          Issue Type: Bug
>          Components: Execution - Codegen
>            Reporter: Kunal Khatua
>            Assignee: Karthikeyan Manivannan
>            Priority: Major
>             Fix For: 1.15.0
>
>         Attachments: 255d264c-f55e-b343-0bef-49d3e672d93f.sys.drill, 255d2664-2418-19e0-00ea-2076a06572a2.sys.drill, 255d2682-8481-bed0-fc22-197a75371c04.sys.drill, 255d26ae-2c0b-6cd6-ae71-4ad04c992daf.sys.drill, 255d2880-48a2-d86b-5410-29ce0cd249ed.sys.drill
>
>
> Currently, when a SelectionVectorRemover receives a record batch from an upstream operator (such as a Filter), it immediately starts copying records into a new outgoing batch.
> It would be worthwhile to enrich the RecordBatch with summary statistics about the attached SelectionVector, such as:
>  # number of records that need to be removed/copied
>  # total number of records in the record-batch
> The benefit is that in the extreme cases, where *all* the records in a batch need to be either removed or copied, the SelectionVectorRemover could simply drop the record batch or forward it unchanged to the next downstream operator.
> The extreme case of dropping the batch already more or less works (because there is no copying overhead), but when the record batch should pass through, the copying overhead remains (and actually accounts for more than 35% of the time, if you discount the streaming aggregate cost within the tests).
> Here are statistics on the current SVR cost that motivate such an optimization:
> ||Selectivity||Query Time (s)||%Time used by SVR||Est. SVR Time (s)||Profile||
> |0%|6.996|0.13%|0.0090948|[^255d264c-f55e-b343-0bef-49d3e672d93f.sys.drill]|
> |10%|7.836|7.97%|0.6245292|[^255d2682-8481-bed0-fc22-197a75371c04.sys.drill]|
> |50%|11.225|25.59%|2.8724775|[^255d2664-2418-19e0-00ea-2076a06572a2.sys.drill]|
> |90%|14.966|33.91%|5.0749706|[^255d26ae-2c0b-6cd6-ae71-4ad04c992daf.sys.drill]|
> |100%|19.003|35.73%|6.7897719|[^255d2880-48a2-d86b-5410-29ce0cd249ed.sys.drill]|
> To summarize, the SVR should avoid creating new batches as much as possible.
> A more general (non-trivial) optimization should account for the fact that multiple emitted batches can be coalesced, but we don't currently have test metrics for that.
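The fast path proposed in the issue (drop the batch when no records are selected, forward it unchanged when all records are selected, and copy only otherwise) can be sketched in Java as follows. This is a minimal, hypothetical illustration: `Batch`, `decide`, and `remove` are invented names, not Drill's RecordBatch or SelectionVectorRemover API, and the sketch assumes the selection vector holds unique, ascending indices (as a filter produces), so that equal counts imply every record is selected.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of the proposed SVR fast path; not Drill's actual API.
public class SvrFastPathSketch {

    // Stand-in for a record batch with an attached selection vector
    // (indices of the records that survived the upstream filter).
    static final class Batch {
        final List<Integer> records;
        final int[] selection;

        Batch(List<Integer> records, int[] selection) {
            this.records = records;
            this.selection = selection;
        }

        // The two summary statistics suggested in the issue.
        int totalCount() { return records.size(); }
        int selectedCount() { return selection.length; }
    }

    enum Action { DROP, PASS_THROUGH, COPY }

    // Chooses an action from the summary statistics alone, without touching record data.
    // Assumes unique, ascending selection indices, so equal counts mean "all selected".
    static Action decide(Batch b) {
        if (b.selectedCount() == 0) {
            return Action.DROP;
        }
        if (b.selectedCount() == b.totalCount()) {
            return Action.PASS_THROUGH;
        }
        return Action.COPY;
    }

    // Returns the outgoing records, or null when the whole batch is dropped.
    static List<Integer> remove(Batch b) {
        Action action = decide(b);
        if (action == Action.DROP) {
            return null;
        }
        if (action == Action.PASS_THROUGH) {
            return b.records; // forwarded as-is: no new batch, no copy
        }
        List<Integer> out = new ArrayList<>(b.selectedCount());
        for (int idx : b.selection) {
            out.add(b.records.get(idx)); // copy only the selected records
        }
        return out;
    }

    public static void main(String[] args) {
        List<Integer> data = List.of(10, 20, 30, 40);
        System.out.println(remove(new Batch(data, new int[0])));             // null
        System.out.println(remove(new Batch(data, new int[] {0, 1, 2, 3}))); // [10, 20, 30, 40]
        System.out.println(remove(new Batch(data, new int[] {1, 3})));       // [20, 40]
    }
}
```

Only the partially selected case allocates and copies, which matches the issue's goal of having the SVR avoid creating new batches wherever possible.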



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)