Posted to issues@drill.apache.org by "Boaz Ben-Zvi (JIRA)" <ji...@apache.org> on 2019/02/20 03:33:00 UTC

[jira] [Comment Edited] (DRILL-7043) Enhance Merge-Join to support Full Outer Join

    [ https://issues.apache.org/jira/browse/DRILL-7043?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16772590#comment-16772590 ] 

Boaz Ben-Zvi edited comment on DRILL-7043 at 2/20/19 3:32 AM:
--------------------------------------------------------------

This enhancement is becoming more useful as our storage begins to support "sortedness", e.g., Secondary Indexes and future Parquet Metadata (e.g., taken from Hive). A Merge-Join on two sorted tables always outperforms a Hash-Join.

was (Author: ben-zvi):
This enhancement is becoming more useful as our storage begins to support "sortedness" - e.g., Secondary Indexes, and future Parquet Metadata (e.g., taken from Hive).

> Enhance Merge-Join to support Full Outer Join
> ---------------------------------------------
>
>                 Key: DRILL-7043
>                 URL: https://issues.apache.org/jira/browse/DRILL-7043
>             Project: Apache Drill
>          Issue Type: Improvement
>          Components: Execution - Relational Operators, Query Planning & Optimization
>    Affects Versions: 1.15.0
>            Reporter: Boaz Ben-Zvi
>            Assignee: Boaz Ben-Zvi
>            Priority: Major
>
>    Currently the Merge Join operator internally cannot support a Right Outer Join (and thus a Full Outer Join; for ROJ alone, the planner rotates the inputs and specifies a Left Outer Join).
>    The actual reason for not supporting ROJ is the current MJ implementation: when a match is found, it puts a mark on the right side and iterates down the right side, resetting back to the mark at the end (and moving on to the next left-side entry). This creates an ambiguity when the next left entry is bigger than the previous one: is the current right entry unmatched (i.e., it needs to be returned), or was there a prior match (i.e., just advance to the next right entry)?
>    It seems that adding a relevant flag to the persisted state ({{status}}), along with some other code changes, would let the operator support a Right Outer Join as well (and thus a Full Outer Join). The planner needs an update as well: to suggest the MJ in case of a FOJ, and maybe not to rotate the inputs in some MJ cases.
>    Currently, trying a FOJ with MJ (i.e., HJ disabled) produces the following "no plan found" error from Calcite:
> {noformat}
> 0: jdbc:drill:zk=local> select * from temp t1 full outer join temp2 t2 on t1.d_date = t2.d_date;
> Error: SYSTEM ERROR: CannotPlanException: Node [rel#2804:Subset#8.PHYSICAL.SINGLETON([]).[]] could not be implemented; planner state:
> Root: rel#2804:Subset#8.PHYSICAL.SINGLETON([]).[]
> Original rel:
> DrillScreenRel(subset=[rel#2804:Subset#8.PHYSICAL.SINGLETON([]).[]]): rowcount = 6.0, cumulative cost = {0.6000000000000001 rows, 0.6000000000000001 cpu, 0.0 io, 0.0 network, 0.0 memory}, id = 2802
>   DrillProjectRel(subset=[rel#2801:Subset#7.LOGICAL.ANY([]).[]], **=[$0], **0=[$2]): rowcount = 6.0, cumulative cost = {6.0 rows, 12.0 cpu, 0.0 io, 0.0 network, 0.0 memory}, id = 2800
>     DrillJoinRel(subset=[rel#2799:Subset#6.LOGICAL.ANY([]).[]], condition=[=($1, $3)], joinType=[full]): rowcount = 6.0, cumulative cost = {10.0 rows, 104.0 cpu, 0.0 io, 0.0 network, 70.4 memory}, id = 2798
> {noformat}
>  
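To illustrate the mark/reset ambiguity described in the issue and how one extra flag could resolve it, here is a minimal Java sketch of a Full Outer merge join over two in-memory lists, both assumed already sorted ascending on an int key. It is not Drill's actual Merge-Join operator code: the class MergeFullOuterJoinSketch, the Row/Joined records, and the markMatched flag are hypothetical stand-ins for the operator's record batches and its persisted {{status}}. The sketch ignores batch boundaries, NULL join keys, and non-int keys.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch, not Drill code: both inputs are assumed sorted ascending on an int key.
public class MergeFullOuterJoinSketch {

    record Row(int key, String value) {}

    // A null side means the row from the other side had no match (NULL-extended output).
    record Joined(Row left, Row right) {}

    static List<Joined> fullOuterJoin(List<Row> left, List<Row> right) {
        List<Joined> out = new ArrayList<>();
        int l = 0;
        int mark = 0;                // first row of the current right-side key group
        boolean markMatched = false; // hypothetical extra flag: has any left row matched this right group?

        while (l < left.size() && mark < right.size()) {
            int lk = left.get(l).key();
            int rk = right.get(mark).key();
            if (lk < rk) {
                out.add(new Joined(left.get(l++), null)); // unmatched left row
            } else if (lk == rk) {
                // Iterate down the right group from the mark; the next left row "resets" to the mark.
                for (int r = mark; r < right.size() && right.get(r).key() == lk; r++) {
                    out.add(new Joined(left.get(l), right.get(r)));
                }
                markMatched = true;
                l++;
            } else {
                // Left has moved past this right group. The flag resolves the ambiguity:
                // emit the group as unmatched only if no left row ever joined with it.
                while (mark < right.size() && right.get(mark).key() == rk) {
                    if (!markMatched) out.add(new Joined(null, right.get(mark)));
                    mark++;
                }
                markMatched = false;
            }
        }
        // Drain the left side: these rows found no right match.
        while (l < left.size()) out.add(new Joined(left.get(l++), null));
        // Drain the right side, skipping the current group if it was already matched.
        if (mark < right.size()) {
            int groupKey = right.get(mark).key();
            for (int r = mark; r < right.size(); r++) {
                if (!(markMatched && right.get(r).key() == groupKey)) {
                    out.add(new Joined(null, right.get(r)));
                }
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<Row> left = List.of(new Row(1, "a"), new Row(2, "b"), new Row(2, "c"), new Row(4, "d"));
        List<Row> right = List.of(new Row(2, "x"), new Row(2, "y"), new Row(3, "z"), new Row(5, "w"));
        fullOuterJoin(left, right).forEach(System.out::println);
    }
}
```

With the sample inputs this prints 8 rows: b and c each join with x and y, while a, d, z and w come out NULL-extended. If the flag were always false (i.e., a right group is emitted as unmatched whenever the left side moves past it), x and y would also be wrongly returned as unmatched, which is exactly the ambiguity the issue describes.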



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)