Posted to dev@drill.apache.org by "Ben Becker (JIRA)" <ji...@apache.org> on 2013/08/16 05:55:52 UTC
[jira] [Created] (DRILL-173) Join operator should reuse ValueVectors when duplicate keys are present
Ben Becker created DRILL-173:
--------------------------------
Summary: Join operator should reuse ValueVectors when duplicate keys are present
Key: DRILL-173
URL: https://issues.apache.org/jira/browse/DRILL-173
Project: Apache Drill
Issue Type: Improvement
Affects Versions: Alpha
Reporter: Ben Becker
There are cases where joining two record batches can result in redundant work. Consider a merge join performed on two tables (*t1* and *t2*) with duplicate keys on both sides:
h5. t1
|| key || value ||
| 2 | 'a' |
| 2 | 'b' |
h5. t2
|| key || value ||
| 2 | 'A' |
| 2 | 'B' |
| 2 | 'C' |
The resulting table will contain the cross product of all rows with key value '2':
|| key || t1.value || t2.value ||
| 2 | 'a' | 'A' |
| 2 | 'a' | 'B' |
| 2 | 'a' | 'C' |
| 2 | 'b' | 'A' |
| 2 | 'b' | 'B' |
| 2 | 'b' | 'C' |
The current implementation iteratively copies t2.value from the incoming vectors for every matching t1 row. Ideally, the t2.value vector would be constructed iteratively only on the first pass; on later passes, the values already built can simply be copied.
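As a rough illustration of the idea, here is a minimal sketch that joins a single run of duplicate keys. It does not use Drill's ValueVector API: plain String arrays stand in for the vectors, and the names DuplicateKeyJoinSketch, Output and joinRun are hypothetical. The t2 values are written element by element only for the first t1 row; for each later t1 row, the block already written is copied in one bulk operation.

```java
import java.util.Arrays;

// Hypothetical sketch: plain arrays stand in for Drill's value vectors.
final class DuplicateKeyJoinSketch {

    /** Output columns for one run of equal keys. */
    static final class Output {
        final String[] leftValues;
        final String[] rightValues;

        Output(String[] leftValues, String[] rightValues) {
            this.leftValues = leftValues;
            this.rightValues = rightValues;
        }
    }

    /**
     * Produces the cross product of two runs that share the same key.
     * leftRun holds the t1 values, rightRun holds the t2 values.
     */
    static Output joinRun(String[] leftRun, String[] rightRun) {
        int n = rightRun.length;
        String[] outLeft = new String[leftRun.length * n];
        String[] outRight = new String[leftRun.length * n];

        for (int i = 0; i < leftRun.length; i++) {
            int offset = i * n;
            // The t1 value repeats once per matching t2 row.
            Arrays.fill(outLeft, offset, offset + n, leftRun[i]);

            if (i == 0) {
                // First pass: build the t2 column element by element.
                for (int j = 0; j < n; j++) {
                    outRight[j] = rightRun[j];
                }
            } else {
                // Later passes: reuse the block built on the first pass.
                System.arraycopy(outRight, 0, outRight, offset, n);
            }
        }
        return new Output(outLeft, outRight);
    }

    public static void main(String[] args) {
        Output out = joinRun(new String[] {"a", "b"}, new String[] {"A", "B", "C"});
        System.out.println(Arrays.toString(out.leftValues));
        System.out.println(Arrays.toString(out.rightValues));
    }
}
```

For the t1 and t2 tables above, this prints [a, a, a, b, b, b] and [A, B, C, A, B, C], matching the six result rows. Whether a bulk copy pays off in the real operator depends on how the output vectors are allocated, which this sketch does not model.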