You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@drill.apache.org by Jinfeng Ni <jn...@maprtech.com> on 2015/02/13 02:10:55 UTC

Review Request 30959: DRILL-2236: Optimize hash inner join by swapping inputs based on row count comparison.

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/30959/
-----------------------------------------------------------

Review request for drill.


Repository: drill-git


Description
-------

See https://issues.apache.org/jira/browse/DRILL-2236#.


Diffs
-----

  exec/java-exec/src/main/java/org/apache/drill/exec/planner/physical/HashJoinPrel.java a3c42de 
  exec/java-exec/src/main/java/org/apache/drill/exec/planner/physical/JoinPrel.java 3541db7 
  exec/java-exec/src/main/java/org/apache/drill/exec/planner/physical/MergeJoinPrel.java 394a82c 
  exec/java-exec/src/main/java/org/apache/drill/exec/planner/physical/PlannerSettings.java faa8546 
  exec/java-exec/src/main/java/org/apache/drill/exec/planner/physical/explain/NumberingRelWriter.java 6522ad9 
  exec/java-exec/src/main/java/org/apache/drill/exec/planner/physical/visitor/SwapHashJoinVisitor.java PRE-CREATION 
  exec/java-exec/src/main/java/org/apache/drill/exec/planner/sql/handlers/DefaultSqlHandler.java 6b3d301 
  exec/java-exec/src/main/java/org/apache/drill/exec/server/options/SystemOptionManager.java aa0a5ad 

Diff: https://reviews.apache.org/r/30959/diff/


Testing
-------

Unit test done. 

Other regression workloads are running. Will update status once complete.


Thanks,

Jinfeng Ni


Re: Review Request 30959: DRILL-2236: Optimize hash inner join by swapping inputs based on row count comparison.

Posted by Aman Sinha <as...@maprtech.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/30959/#review73378
-----------------------------------------------------------

Ship it!


Ship It!

- Aman Sinha


On Feb. 18, 2015, 8:16 p.m., Jinfeng Ni wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/30959/
> -----------------------------------------------------------
> 
> (Updated Feb. 18, 2015, 8:16 p.m.)
> 
> 
> Review request for drill.
> 
> 
> Repository: drill-git
> 
> 
> Description
> -------
> 
> See https://issues.apache.org/jira/browse/DRILL-2236#.
> 
> 
> Diffs
> -----
> 
>   exec/java-exec/src/main/java/org/apache/drill/exec/planner/physical/HashJoinPrel.java a3c42de 
>   exec/java-exec/src/main/java/org/apache/drill/exec/planner/physical/JoinPrel.java 3541db7 
>   exec/java-exec/src/main/java/org/apache/drill/exec/planner/physical/MergeJoinPrel.java 394a82c 
>   exec/java-exec/src/main/java/org/apache/drill/exec/planner/physical/PlannerSettings.java faa8546 
>   exec/java-exec/src/main/java/org/apache/drill/exec/planner/physical/explain/NumberingRelWriter.java 6522ad9 
>   exec/java-exec/src/main/java/org/apache/drill/exec/planner/physical/visitor/SwapHashJoinVisitor.java PRE-CREATION 
>   exec/java-exec/src/main/java/org/apache/drill/exec/planner/sql/handlers/DefaultSqlHandler.java 0ac7c97 
>   exec/java-exec/src/main/java/org/apache/drill/exec/server/options/SystemOptionManager.java aa0a5ad 
> 
> Diff: https://reviews.apache.org/r/30959/diff/
> 
> 
> Testing
> -------
> 
> Unit test done. 
> 
> Other regression workloads are running. Will update status once complete.
> 
> 
> Thanks,
> 
> Jinfeng Ni
> 
>


Re: Review Request 30959: DRILL-2236: Optimize hash inner join by swapping inputs based on row count comparison.

Posted by Jinfeng Ni <jn...@maprtech.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/30959/
-----------------------------------------------------------

(Updated Feb. 18, 2015, 12:16 p.m.)


Review request for drill.


Changes
-------

Revise code based review comments.


Repository: drill-git


Description
-------

See https://issues.apache.org/jira/browse/DRILL-2236#.


Diffs (updated)
-----

  exec/java-exec/src/main/java/org/apache/drill/exec/planner/physical/HashJoinPrel.java a3c42de 
  exec/java-exec/src/main/java/org/apache/drill/exec/planner/physical/JoinPrel.java 3541db7 
  exec/java-exec/src/main/java/org/apache/drill/exec/planner/physical/MergeJoinPrel.java 394a82c 
  exec/java-exec/src/main/java/org/apache/drill/exec/planner/physical/PlannerSettings.java faa8546 
  exec/java-exec/src/main/java/org/apache/drill/exec/planner/physical/explain/NumberingRelWriter.java 6522ad9 
  exec/java-exec/src/main/java/org/apache/drill/exec/planner/physical/visitor/SwapHashJoinVisitor.java PRE-CREATION 
  exec/java-exec/src/main/java/org/apache/drill/exec/planner/sql/handlers/DefaultSqlHandler.java 0ac7c97 
  exec/java-exec/src/main/java/org/apache/drill/exec/server/options/SystemOptionManager.java aa0a5ad 

Diff: https://reviews.apache.org/r/30959/diff/


Testing
-------

Unit test done. 

Other regression workloads are running. Will update status once complete.


Thanks,

Jinfeng Ni


Re: Review Request 30959: DRILL-2236: Optimize hash inner join by swapping inputs based on row count comparison.

Posted by Jinfeng Ni <jn...@maprtech.com>.

> On Feb. 13, 2015, 3:58 p.m., Aman Sinha wrote:
> > exec/java-exec/src/main/java/org/apache/drill/exec/planner/physical/visitor/SwapHashJoinVisitor.java, line 30
> > <https://reviews.apache.org/r/30959/diff/1/?file=862732#file862732line30>
> >
> >     Can you add javadoc comments to describe what this visitor is doing.

Add javadoc comments.


> On Feb. 13, 2015, 3:58 p.m., Aman Sinha wrote:
> > exec/java-exec/src/main/java/org/apache/drill/exec/planner/physical/visitor/SwapHashJoinVisitor.java, line 59
> > <https://reviews.apache.org/r/30959/diff/1/?file=862732#file862732line59>
> >
> >     We should have a percentage threshold to do the comparison.. i.e swap only if right side is 10% more than left side.  Also, I feel that we should have some option where we don't put intermediate result on the right side of the join, because the estimates on the intermediate result is highly unpredictable.

Add another option to set the percentage threshold. The default is 10%.

I did not add option where we don't put intermediate result on the right side of the join. It's true that the intermediate result is not accurate, and the decision based on that may not lead to a plan that we want. However, the current cost-based optimizer also heavily replies on the intermediate result estmiates. In that sense, I feel both of approachs would face the same issue. 

Also, I'm thinking to disable this feature by default. Only when users see performance hit because of hash join order, they could turn on the option. This way, it gives user a workaround, and avoid the overhead to either re-write the query, or mannully modify the JSON physical plan.


- Jinfeng


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/30959/#review72454
-----------------------------------------------------------


On Feb. 12, 2015, 5:10 p.m., Jinfeng Ni wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/30959/
> -----------------------------------------------------------
> 
> (Updated Feb. 12, 2015, 5:10 p.m.)
> 
> 
> Review request for drill.
> 
> 
> Repository: drill-git
> 
> 
> Description
> -------
> 
> See https://issues.apache.org/jira/browse/DRILL-2236#.
> 
> 
> Diffs
> -----
> 
>   exec/java-exec/src/main/java/org/apache/drill/exec/planner/physical/HashJoinPrel.java a3c42de 
>   exec/java-exec/src/main/java/org/apache/drill/exec/planner/physical/JoinPrel.java 3541db7 
>   exec/java-exec/src/main/java/org/apache/drill/exec/planner/physical/MergeJoinPrel.java 394a82c 
>   exec/java-exec/src/main/java/org/apache/drill/exec/planner/physical/PlannerSettings.java faa8546 
>   exec/java-exec/src/main/java/org/apache/drill/exec/planner/physical/explain/NumberingRelWriter.java 6522ad9 
>   exec/java-exec/src/main/java/org/apache/drill/exec/planner/physical/visitor/SwapHashJoinVisitor.java PRE-CREATION 
>   exec/java-exec/src/main/java/org/apache/drill/exec/planner/sql/handlers/DefaultSqlHandler.java 6b3d301 
>   exec/java-exec/src/main/java/org/apache/drill/exec/server/options/SystemOptionManager.java aa0a5ad 
> 
> Diff: https://reviews.apache.org/r/30959/diff/
> 
> 
> Testing
> -------
> 
> Unit test done. 
> 
> Other regression workloads are running. Will update status once complete.
> 
> 
> Thanks,
> 
> Jinfeng Ni
> 
>


Re: Review Request 30959: DRILL-2236: Optimize hash inner join by swapping inputs based on row count comparison.

Posted by Aman Sinha <as...@maprtech.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/30959/#review72454
-----------------------------------------------------------



exec/java-exec/src/main/java/org/apache/drill/exec/planner/physical/visitor/SwapHashJoinVisitor.java
<https://reviews.apache.org/r/30959/#comment118516>

    Can you add javadoc comments to describe what this visitor is doing.



exec/java-exec/src/main/java/org/apache/drill/exec/planner/physical/visitor/SwapHashJoinVisitor.java
<https://reviews.apache.org/r/30959/#comment118523>

    We should have a percentage threshold to do the comparison.. i.e swap only if right side is 10% more than left side.  Also, I feel that we should have some option where we don't put intermediate result on the right side of the join, because the estimates on the intermediate result is highly unpredictable.


- Aman Sinha


On Feb. 13, 2015, 1:10 a.m., Jinfeng Ni wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/30959/
> -----------------------------------------------------------
> 
> (Updated Feb. 13, 2015, 1:10 a.m.)
> 
> 
> Review request for drill.
> 
> 
> Repository: drill-git
> 
> 
> Description
> -------
> 
> See https://issues.apache.org/jira/browse/DRILL-2236#.
> 
> 
> Diffs
> -----
> 
>   exec/java-exec/src/main/java/org/apache/drill/exec/planner/physical/HashJoinPrel.java a3c42de 
>   exec/java-exec/src/main/java/org/apache/drill/exec/planner/physical/JoinPrel.java 3541db7 
>   exec/java-exec/src/main/java/org/apache/drill/exec/planner/physical/MergeJoinPrel.java 394a82c 
>   exec/java-exec/src/main/java/org/apache/drill/exec/planner/physical/PlannerSettings.java faa8546 
>   exec/java-exec/src/main/java/org/apache/drill/exec/planner/physical/explain/NumberingRelWriter.java 6522ad9 
>   exec/java-exec/src/main/java/org/apache/drill/exec/planner/physical/visitor/SwapHashJoinVisitor.java PRE-CREATION 
>   exec/java-exec/src/main/java/org/apache/drill/exec/planner/sql/handlers/DefaultSqlHandler.java 6b3d301 
>   exec/java-exec/src/main/java/org/apache/drill/exec/server/options/SystemOptionManager.java aa0a5ad 
> 
> Diff: https://reviews.apache.org/r/30959/diff/
> 
> 
> Testing
> -------
> 
> Unit test done. 
> 
> Other regression workloads are running. Will update status once complete.
> 
> 
> Thanks,
> 
> Jinfeng Ni
> 
>


Re: Review Request 30959: DRILL-2236: Optimize hash inner join by swapping inputs based on row count comparison.

Posted by Aman Sinha <as...@maprtech.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/30959/#review72460
-----------------------------------------------------------



exec/java-exec/src/main/java/org/apache/drill/exec/planner/physical/HashJoinPrel.java
<https://reviews.apache.org/r/30959/#comment118527>

    I think swapped should be set to true here such that the join knows that the inputs have been swapped if it is visited again.



exec/java-exec/src/main/java/org/apache/drill/exec/planner/physical/visitor/SwapHashJoinVisitor.java
<https://reviews.apache.org/r/30959/#comment118530>

    This visitor is top-down; I was also thinking what the advantages/disadvantages are for a bottom-up approach to swapping the inputs. The output cardinality of a join node does not change in either approach but perhaps we should think a little bit about the other approach as well.


- Aman Sinha


On Feb. 13, 2015, 1:10 a.m., Jinfeng Ni wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/30959/
> -----------------------------------------------------------
> 
> (Updated Feb. 13, 2015, 1:10 a.m.)
> 
> 
> Review request for drill.
> 
> 
> Repository: drill-git
> 
> 
> Description
> -------
> 
> See https://issues.apache.org/jira/browse/DRILL-2236#.
> 
> 
> Diffs
> -----
> 
>   exec/java-exec/src/main/java/org/apache/drill/exec/planner/physical/HashJoinPrel.java a3c42de 
>   exec/java-exec/src/main/java/org/apache/drill/exec/planner/physical/JoinPrel.java 3541db7 
>   exec/java-exec/src/main/java/org/apache/drill/exec/planner/physical/MergeJoinPrel.java 394a82c 
>   exec/java-exec/src/main/java/org/apache/drill/exec/planner/physical/PlannerSettings.java faa8546 
>   exec/java-exec/src/main/java/org/apache/drill/exec/planner/physical/explain/NumberingRelWriter.java 6522ad9 
>   exec/java-exec/src/main/java/org/apache/drill/exec/planner/physical/visitor/SwapHashJoinVisitor.java PRE-CREATION 
>   exec/java-exec/src/main/java/org/apache/drill/exec/planner/sql/handlers/DefaultSqlHandler.java 6b3d301 
>   exec/java-exec/src/main/java/org/apache/drill/exec/server/options/SystemOptionManager.java aa0a5ad 
> 
> Diff: https://reviews.apache.org/r/30959/diff/
> 
> 
> Testing
> -------
> 
> Unit test done. 
> 
> Other regression workloads are running. Will update status once complete.
> 
> 
> Thanks,
> 
> Jinfeng Ni
> 
>