Posted to issues@drill.apache.org by "Jacques Nadeau (JIRA)" <ji...@apache.org> on 2015/01/04 22:28:55 UTC

[jira] [Updated] (DRILL-173) Join operator should reuse ValueVectors when duplicate keys are present

     [ https://issues.apache.org/jira/browse/DRILL-173?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jacques Nadeau updated DRILL-173:
---------------------------------
    Component/s: Execution - Operators

> Join operator should reuse ValueVectors when duplicate keys are present
> -----------------------------------------------------------------------
>
>                 Key: DRILL-173
>                 URL: https://issues.apache.org/jira/browse/DRILL-173
>             Project: Apache Drill
>          Issue Type: Improvement
>          Components: Execution - Operators
>    Affects Versions: m1
>            Reporter: Ben Becker
>              Labels: optimization
>             Fix For: Future
>
>
> There are cases where joining two record batches can result in redundant work.  Consider a merge join performed on two tables (*t1* and *t2*) with duplicate keys on both sides:
> h5. t1
> || key || value ||
> | 2 | 'a' |
> | 2 | 'b' |
> h5. t2
> || key || value ||
> | 2 | 'A' |
> | 2 | 'B' |
> | 2 | 'C' |
> The resulting table will contain the cross product of all rows whose key is 2:
> || key || t1.value || t2.value ||
> | 2 | 'a' | 'A' |
> | 2 | 'a' | 'B' |
> | 2 | 'a' | 'C' |
> | 2 | 'b' | 'A' |
> | 2 | 'b' | 'B' |
> | 2 | 'b' | 'C' |
> The current implementation iteratively copies t2.value from the incoming vectors on every pass.  Ideally, the t2.value vector would be constructed iteratively only on the first pass; after that, the already-built values could be copied.
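
The proposed optimization can be sketched roughly as follows. This is only an illustration under stated assumptions: plain Java arrays stand in for Drill's ValueVectors, and the names DuplicateKeyJoinSketch, Result, and joinRun are hypothetical and not part of Drill. The sketch handles one run of duplicate keys. It builds the t2.value segment value by value for the first t1 row, then block-copies that segment for each later t1 row.

```java
import java.util.Arrays;

public class DuplicateKeyJoinSketch {

    /** Output columns for one run of equal keys: key, t1.value, t2.value. */
    public static final class Result {
        public final int[] keys;
        public final String[] leftValues;
        public final String[] rightValues;

        Result(int[] keys, String[] leftValues, String[] rightValues) {
            this.keys = keys;
            this.leftValues = leftValues;
            this.rightValues = rightValues;
        }
    }

    /**
     * Joins one run of rows sharing the same key (hypothetical helper).
     * leftRun holds the t1 values for that key, rightRun the t2 values.
     */
    public static Result joinRun(int key, String[] leftRun, String[] rightRun) {
        int width = rightRun.length;
        int total = leftRun.length * width;
        int[] keys = new int[total];
        String[] left = new String[total];
        String[] right = new String[total];
        if (total == 0) {
            return new Result(keys, left, right);
        }
        Arrays.fill(keys, key);

        // First pass: construct the t2.value segment one value at a time.
        for (int j = 0; j < width; j++) {
            right[j] = rightRun[j];
        }

        for (int i = 0; i < leftRun.length; i++) {
            int start = i * width;
            // Each t1 value repeats once per matching t2 row.
            Arrays.fill(left, start, start + width, leftRun[i]);
            if (i > 0) {
                // Later passes: copy the segment built on the first pass as a block.
                System.arraycopy(right, 0, right, start, width);
            }
        }
        return new Result(keys, left, right);
    }

    public static void main(String[] args) {
        Result r = joinRun(2, new String[] {"a", "b"}, new String[] {"A", "B", "C"});
        System.out.println(Arrays.toString(r.keys));
        System.out.println(Arrays.toString(r.leftValues));
        System.out.println(Arrays.toString(r.rightValues));
    }
}
```

With the t1 and t2 rows from the description, the three output arrays line up with the six-row result table above. Whether a block copy is actually cheaper inside Drill depends on the vector type, which this sketch does not model.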



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)