Posted to issues@drill.apache.org by "Jacques Nadeau (JIRA)" <ji...@apache.org> on 2015/01/04 22:28:55 UTC
[jira] [Updated] (DRILL-173) Join operator should reuse ValueVectors when duplicate keys are present
[ https://issues.apache.org/jira/browse/DRILL-173?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Jacques Nadeau updated DRILL-173:
---------------------------------
Component/s: Execution - Operators
> Join operator should reuse ValueVectors when duplicate keys are present
> -----------------------------------------------------------------------
>
> Key: DRILL-173
> URL: https://issues.apache.org/jira/browse/DRILL-173
> Project: Apache Drill
> Issue Type: Improvement
> Components: Execution - Operators
> Affects Versions: m1
> Reporter: Ben Becker
> Labels: optimization
> Fix For: Future
>
>
> There are cases where joining two record batches can result in redundant work. Consider a merge join performed on two tables (*t1* and *t2*) with duplicate keys on both sides:
> h5. t1
> || key || value ||
> | 2 | 'a' |
> | 2 | 'b' |
> h5. t2
> || key || value ||
> | 2 | 'A' |
> | 2 | 'B' |
> | 2 | 'C' |
> The resulting table will contain the cross product of all rows with key '2':
> || key || t1.value || t2.value ||
> | 2 | 'a' | 'A' |
> | 2 | 'a' | 'B' |
> | 2 | 'a' | 'C' |
> | 2 | 'b' | 'A' |
> | 2 | 'b' | 'B' |
> | 2 | 'b' | 'C' |
> The current implementation iteratively copies t2.value from the incoming vectors for each matching t1 row. Ideally, the t2.value vector would be constructed iteratively only on the first pass; on subsequent passes, the values built during the first pass can simply be copied.
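
The proposed optimization could be sketched roughly as follows. This is a minimal illustration only, not Drill's join code: it uses plain String arrays in place of ValueVectors, and the class name DuplicateKeyJoinSketch and the method joinRun are hypothetical names introduced for this example. It handles a single run of equal keys, building the t2.value segment element by element on the first pass and reusing it with one bulk copy on each later pass.

```java
import java.util.Arrays;

// Hypothetical sketch: plain arrays stand in for Drill's ValueVectors.
class DuplicateKeyJoinSketch {

    /**
     * Joins one run of equal keys: every left value is paired with every right value.
     * Returns {leftColumn, rightColumn}, each of length left.length * right.length.
     */
    static String[][] joinRun(String[] left, String[] right) {
        int n = right.length;
        String[] outLeft = new String[left.length * n];
        String[] outRight = new String[left.length * n];

        for (int i = 0; i < left.length; i++) {
            int base = i * n;
            // The left value repeats once per right-side row.
            Arrays.fill(outLeft, base, base + n, left[i]);

            if (i == 0) {
                // First pass: build the right-side segment element by element.
                for (int j = 0; j < n; j++) {
                    outRight[j] = right[j];
                }
            } else {
                // Later passes: reuse the segment built on the first pass
                // with a single bulk copy instead of another element loop.
                System.arraycopy(outRight, 0, outRight, base, n);
            }
        }
        return new String[][] {outLeft, outRight};
    }

    public static void main(String[] args) {
        // The t1 and t2 values from the example tables, all sharing key 2.
        String[][] out = joinRun(new String[] {"a", "b"}, new String[] {"A", "B", "C"});
        for (int r = 0; r < out[0].length; r++) {
            System.out.println("2 | " + out[0][r] + " | " + out[1][r]);
        }
    }
}
```

Whether a bulk copy is actually cheaper than re-reading the incoming vectors would depend on the vector type and batch boundaries, which this sketch does not model.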
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)