You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@drill.apache.org by Jinfeng Ni <jn...@maprtech.com> on 2015/04/18 22:34:21 UTC

Re: Review Request 33136: DRILL-1384: Rebase Drill's forked Optiq library onto Calcite release 1.0.

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/33136/
-----------------------------------------------------------

(Updated April 18, 2015, 1:34 p.m.)


Review request for drill and Aman Sinha.


Changes
-------

Exclude most of changes coming from Calcite pacakge rename/restructure, and keep the other major changes, to make it easier for review purpose.


Summary (updated)
-----------------

DRILL-1384: Rebase Drill's forked Optiq library onto Calcite release 1.0.


Repository: drill-git


Description
-------

Drill currently uses a forked Optiq (Renamed to Calcite) version, dated back in July 2014. The forked version has 10-20 Drill specific patches. However, we did not rebase the forked version onto the on-going Calcite release. As such, Drill misses some bug fixes/new feature development on Calcite side.

This patch is trying to rebase the Drill's forked version onto Calcite release 1.0. (More precisely, on commit of CALCITE-603, when this rebasing work started).

Most of changes happen due to Calcite's package structure / renaming (See CALCITE-296]. 

Some Drill specific changes:

1. Provide a Drill specifc RelDataTypeSystem, to support decimal with precision/scale up to 38.
2. Modify Drill's parser, to allow * in Compound Identifier.
3. Provide Drill specific FilterJoinRule, to enforce Drill only support equal-join in JOIN operator.
4. Modify Drill costing comparison, such that the costing oder is a total order when compare different plans.
5. Modify costing estimation for Drill Project operator. 
6. Use a ProjectRemove rule, such that it will honor parent's output field name.
7. Modify Calcite's Frameworks/planner interface, such that Drill will use validatedRowType to construct a top-level project, to ensure the final output field is what the query specified. (Calcite could inject "$F0", or "$EXPR0 into converted RelNode tree, in Sql2RelConverter)
8. Fix couple of Drill unit test cases, since the expected result by query semantics are not fixed.
9. Some type-related to Drill Sql operators. 

Some impact of such rebasing.
1. TPCH Q16, or query with NOT IN predicate involving NULLABLE column could hit CanNotPlanException. 

The previous plan for Q16, although return the correct result, is not valid, when the column is nullable. See DRILL-1957 OR CALCITE-373.

2. Plan changed in some TPCH queries, which may cause timeout for Q5 in TPCH scale factor 100 run.
We probably need continue refine the costing estimation formula, especially for Join operator.


Diffs (updated)
-----

  common/pom.xml 525b533 
  contrib/storage-hive/core/pom.xml 9bd6293 
  exec/java-exec/pom.xml f5313ca 
  exec/java-exec/src/main/codegen/includes/compoundIdentifier.ftl 50d8c20 
  exec/java-exec/src/main/java/org/apache/drill/exec/planner/common/DrillJoinRelBase.java 3b3aa1a 
  exec/java-exec/src/main/java/org/apache/drill/exec/planner/common/DrillProjectRelBase.java 7cf98cd 
  exec/java-exec/src/main/java/org/apache/drill/exec/planner/common/DrillRelOptUtil.java bbe7cf3 
  exec/java-exec/src/main/java/org/apache/drill/exec/planner/cost/DrillCostBase.java 87a1ea3 
  exec/java-exec/src/main/java/org/apache/drill/exec/planner/logical/DrillFilterJoinRules.java PRE-CREATION 
  exec/java-exec/src/main/java/org/apache/drill/exec/planner/logical/DrillProjectRel.java 14ea894 
  exec/java-exec/src/main/java/org/apache/drill/exec/planner/logical/DrillPushProjIntoScan.java fcfced2 
  exec/java-exec/src/main/java/org/apache/drill/exec/planner/logical/DrillReduceExpressionsRule.java 29175e5 
  exec/java-exec/src/main/java/org/apache/drill/exec/planner/logical/DrillRuleSets.java 84a0b51 
  exec/java-exec/src/main/java/org/apache/drill/exec/planner/physical/HashJoinPrel.java f63057f 
  exec/java-exec/src/main/java/org/apache/drill/exec/planner/sql/DrillSqlOperator.java 7bd48c8 
  exec/java-exec/src/main/java/org/apache/drill/exec/planner/sql/handlers/CreateTableHandler.java a17a604 
  exec/java-exec/src/main/java/org/apache/drill/exec/planner/sql/handlers/DefaultSqlHandler.java 5ee502d 
  exec/java-exec/src/main/java/org/apache/drill/exec/planner/sql/handlers/ExplainHandler.java 1cce6a5 
  exec/java-exec/src/main/java/org/apache/drill/exec/planner/sql/handlers/SqlHandlerUtil.java c347bef 
  exec/jdbc-all/pom.xml b369aed 
  exec/jdbc/pom.xml b4ec758 
  pom.xml f6bcd91 

Diff: https://reviews.apache.org/r/33136/diff/


Testing
-------

Unit testcase clean run. (except for disabled testcases. See DRILL-2630, DRILL-2761, etc)


Thanks,

Jinfeng Ni


Re: Review Request 33136: DRILL-1384: Rebase Drill's forked Optiq library onto Calcite release 1.0.

Posted by Jinfeng Ni <jn...@maprtech.com>.

> On April 18, 2015, 4:10 p.m., Aman Sinha wrote:
> > exec/java-exec/src/main/java/org/apache/drill/exec/planner/cost/DrillCostBase.java, line 61
> > <https://reviews.apache.org/r/33136/diff/3/?file=934734#file934734line61>
> >
> >     Is this factor actually helping with any of the plans ? If not, we should skip it since having more factors makes it difficult to estimate the cost.

This is used when in DrillCostBase we compare the total cost of cpu/io/network/memory of two different plans. Previously, memory cost is excluded, while in HashAgg/HashJoin/StreamAgg, the memory cost is one important factor when choose a plan. For io/network, seems we calculated in terms of # of instruction cycles, yet memory cost is calcuted propotionly to bytes. That's why we need this ratio, when combine them together. I understand this ratio for now is just a magic number, requiring further tuning in the near future.


> On April 18, 2015, 4:10 p.m., Aman Sinha wrote:
> > exec/java-exec/src/main/java/org/apache/drill/exec/planner/logical/DrillRuleSets.java, line 123
> > <https://reviews.apache.org/r/33136/diff/3/?file=934739#file934739line123>
> >
> >     Remove all the commented out old rule names that are not valid in the master branch of Calcite since it adds confusion.

I'll remove. Originally, I kept them, so that we know what is the prior rule name, before the rebasing work, since we're more faimilar with the old rule name.


> On April 18, 2015, 4:10 p.m., Aman Sinha wrote:
> > exec/java-exec/src/main/java/org/apache/drill/exec/planner/physical/HashJoinPrel.java, line 78
> > <https://reviews.apache.org/r/33136/diff/3/?file=934740#file934740line78>
> >
> >     Since this logic has moved to DrillJoinRelBase, you should just remove it from here.

remove.


- Jinfeng


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/33136/#review80617
-----------------------------------------------------------


On April 18, 2015, 1:34 p.m., Jinfeng Ni wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/33136/
> -----------------------------------------------------------
> 
> (Updated April 18, 2015, 1:34 p.m.)
> 
> 
> Review request for drill and Aman Sinha.
> 
> 
> Repository: drill-git
> 
> 
> Description
> -------
> 
> Drill currently uses a forked Optiq (Renamed to Calcite) version, dated back in July 2014. The forked version has 10-20 Drill specific patches. However, we did not rebase the forked version onto the on-going Calcite release. As such, Drill misses some bug fixes/new feature development on Calcite side.
> 
> This patch is trying to rebase the Drill's forked version onto Calcite release 1.0. (More precisely, on commit of CALCITE-603, when this rebasing work started).
> 
> Most of changes happen due to Calcite's package structure / renaming (See CALCITE-296]. 
> 
> Some Drill specific changes:
> 
> 1. Provide a Drill specifc RelDataTypeSystem, to support decimal with precision/scale up to 38.
> 2. Modify Drill's parser, to allow * in Compound Identifier.
> 3. Provide Drill specific FilterJoinRule, to enforce Drill only support equal-join in JOIN operator.
> 4. Modify Drill costing comparison, such that the costing oder is a total order when compare different plans.
> 5. Modify costing estimation for Drill Project operator. 
> 6. Use a ProjectRemove rule, such that it will honor parent's output field name.
> 7. Modify Calcite's Frameworks/planner interface, such that Drill will use validatedRowType to construct a top-level project, to ensure the final output field is what the query specified. (Calcite could inject "$F0", or "$EXPR0 into converted RelNode tree, in Sql2RelConverter)
> 8. Fix couple of Drill unit test cases, since the expected result by query semantics are not fixed.
> 9. Some type-related to Drill Sql operators. 
> 
> Some impact of such rebasing.
> 1. TPCH Q16, or query with NOT IN predicate involving NULLABLE column could hit CanNotPlanException. 
> 
> The previous plan for Q16, although return the correct result, is not valid, when the column is nullable. See DRILL-1957 OR CALCITE-373.
> 
> 2. Plan changed in some TPCH queries, which may cause timeout for Q5 in TPCH scale factor 100 run.
> We probably need continue refine the costing estimation formula, especially for Join operator.
> 
> 
> Diffs
> -----
> 
>   common/pom.xml 525b533 
>   contrib/storage-hive/core/pom.xml 9bd6293 
>   exec/java-exec/pom.xml f5313ca 
>   exec/java-exec/src/main/codegen/includes/compoundIdentifier.ftl 50d8c20 
>   exec/java-exec/src/main/java/org/apache/drill/exec/planner/common/DrillJoinRelBase.java 3b3aa1a 
>   exec/java-exec/src/main/java/org/apache/drill/exec/planner/common/DrillProjectRelBase.java 7cf98cd 
>   exec/java-exec/src/main/java/org/apache/drill/exec/planner/common/DrillRelOptUtil.java bbe7cf3 
>   exec/java-exec/src/main/java/org/apache/drill/exec/planner/cost/DrillCostBase.java 87a1ea3 
>   exec/java-exec/src/main/java/org/apache/drill/exec/planner/logical/DrillFilterJoinRules.java PRE-CREATION 
>   exec/java-exec/src/main/java/org/apache/drill/exec/planner/logical/DrillProjectRel.java 14ea894 
>   exec/java-exec/src/main/java/org/apache/drill/exec/planner/logical/DrillPushProjIntoScan.java fcfced2 
>   exec/java-exec/src/main/java/org/apache/drill/exec/planner/logical/DrillReduceExpressionsRule.java 29175e5 
>   exec/java-exec/src/main/java/org/apache/drill/exec/planner/logical/DrillRuleSets.java 84a0b51 
>   exec/java-exec/src/main/java/org/apache/drill/exec/planner/physical/HashJoinPrel.java f63057f 
>   exec/java-exec/src/main/java/org/apache/drill/exec/planner/sql/DrillSqlOperator.java 7bd48c8 
>   exec/java-exec/src/main/java/org/apache/drill/exec/planner/sql/handlers/CreateTableHandler.java a17a604 
>   exec/java-exec/src/main/java/org/apache/drill/exec/planner/sql/handlers/DefaultSqlHandler.java 5ee502d 
>   exec/java-exec/src/main/java/org/apache/drill/exec/planner/sql/handlers/ExplainHandler.java 1cce6a5 
>   exec/java-exec/src/main/java/org/apache/drill/exec/planner/sql/handlers/SqlHandlerUtil.java c347bef 
>   exec/jdbc-all/pom.xml b369aed 
>   exec/jdbc/pom.xml b4ec758 
>   pom.xml f6bcd91 
> 
> Diff: https://reviews.apache.org/r/33136/diff/
> 
> 
> Testing
> -------
> 
> Unit testcase clean run. (except for disabled testcases. See DRILL-2630, DRILL-2761, etc)
> 
> 
> Thanks,
> 
> Jinfeng Ni
> 
>


Re: Review Request 33136: DRILL-1384: Rebase Drill's forked Optiq library onto Calcite release 1.0.

Posted by Aman Sinha <as...@maprtech.com>.

> On April 18, 2015, 11:10 p.m., Aman Sinha wrote:
> > exec/java-exec/src/main/java/org/apache/drill/exec/planner/cost/DrillCostBase.java, line 61
> > <https://reviews.apache.org/r/33136/diff/3/?file=934734#file934734line61>
> >
> >     Is this factor actually helping with any of the plans ? If not, we should skip it since having more factors makes it difficult to estimate the cost.
> 
> Jinfeng Ni wrote:
>     This is used when in DrillCostBase we compare the total cost of cpu/io/network/memory of two different plans. Previously, memory cost is excluded, while in HashAgg/HashJoin/StreamAgg, the memory cost is one important factor when choose a plan. For io/network, seems we calculated in terms of # of instruction cycles, yet memory cost is calcuted propotionly to bytes. That's why we need this ratio, when combine them together. I understand this ratio for now is just a magic number, requiring further tuning in the near future.

Ok to keep it in for now...


- Aman


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/33136/#review80617
-----------------------------------------------------------


On April 18, 2015, 8:34 p.m., Jinfeng Ni wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/33136/
> -----------------------------------------------------------
> 
> (Updated April 18, 2015, 8:34 p.m.)
> 
> 
> Review request for drill and Aman Sinha.
> 
> 
> Repository: drill-git
> 
> 
> Description
> -------
> 
> Drill currently uses a forked Optiq (Renamed to Calcite) version, dated back in July 2014. The forked version has 10-20 Drill specific patches. However, we did not rebase the forked version onto the on-going Calcite release. As such, Drill misses some bug fixes/new feature development on Calcite side.
> 
> This patch is trying to rebase the Drill's forked version onto Calcite release 1.0. (More precisely, on commit of CALCITE-603, when this rebasing work started).
> 
> Most of changes happen due to Calcite's package structure / renaming (See CALCITE-296]. 
> 
> Some Drill specific changes:
> 
> 1. Provide a Drill specifc RelDataTypeSystem, to support decimal with precision/scale up to 38.
> 2. Modify Drill's parser, to allow * in Compound Identifier.
> 3. Provide Drill specific FilterJoinRule, to enforce Drill only support equal-join in JOIN operator.
> 4. Modify Drill costing comparison, such that the costing oder is a total order when compare different plans.
> 5. Modify costing estimation for Drill Project operator. 
> 6. Use a ProjectRemove rule, such that it will honor parent's output field name.
> 7. Modify Calcite's Frameworks/planner interface, such that Drill will use validatedRowType to construct a top-level project, to ensure the final output field is what the query specified. (Calcite could inject "$F0", or "$EXPR0 into converted RelNode tree, in Sql2RelConverter)
> 8. Fix couple of Drill unit test cases, since the expected result by query semantics are not fixed.
> 9. Some type-related to Drill Sql operators. 
> 
> Some impact of such rebasing.
> 1. TPCH Q16, or query with NOT IN predicate involving NULLABLE column could hit CanNotPlanException. 
> 
> The previous plan for Q16, although return the correct result, is not valid, when the column is nullable. See DRILL-1957 OR CALCITE-373.
> 
> 2. Plan changed in some TPCH queries, which may cause timeout for Q5 in TPCH scale factor 100 run.
> We probably need continue refine the costing estimation formula, especially for Join operator.
> 
> 
> Diffs
> -----
> 
>   common/pom.xml 525b533 
>   contrib/storage-hive/core/pom.xml 9bd6293 
>   exec/java-exec/pom.xml f5313ca 
>   exec/java-exec/src/main/codegen/includes/compoundIdentifier.ftl 50d8c20 
>   exec/java-exec/src/main/java/org/apache/drill/exec/planner/common/DrillJoinRelBase.java 3b3aa1a 
>   exec/java-exec/src/main/java/org/apache/drill/exec/planner/common/DrillProjectRelBase.java 7cf98cd 
>   exec/java-exec/src/main/java/org/apache/drill/exec/planner/common/DrillRelOptUtil.java bbe7cf3 
>   exec/java-exec/src/main/java/org/apache/drill/exec/planner/cost/DrillCostBase.java 87a1ea3 
>   exec/java-exec/src/main/java/org/apache/drill/exec/planner/logical/DrillFilterJoinRules.java PRE-CREATION 
>   exec/java-exec/src/main/java/org/apache/drill/exec/planner/logical/DrillProjectRel.java 14ea894 
>   exec/java-exec/src/main/java/org/apache/drill/exec/planner/logical/DrillPushProjIntoScan.java fcfced2 
>   exec/java-exec/src/main/java/org/apache/drill/exec/planner/logical/DrillReduceExpressionsRule.java 29175e5 
>   exec/java-exec/src/main/java/org/apache/drill/exec/planner/logical/DrillRuleSets.java 84a0b51 
>   exec/java-exec/src/main/java/org/apache/drill/exec/planner/physical/HashJoinPrel.java f63057f 
>   exec/java-exec/src/main/java/org/apache/drill/exec/planner/sql/DrillSqlOperator.java 7bd48c8 
>   exec/java-exec/src/main/java/org/apache/drill/exec/planner/sql/handlers/CreateTableHandler.java a17a604 
>   exec/java-exec/src/main/java/org/apache/drill/exec/planner/sql/handlers/DefaultSqlHandler.java 5ee502d 
>   exec/java-exec/src/main/java/org/apache/drill/exec/planner/sql/handlers/ExplainHandler.java 1cce6a5 
>   exec/java-exec/src/main/java/org/apache/drill/exec/planner/sql/handlers/SqlHandlerUtil.java c347bef 
>   exec/jdbc-all/pom.xml b369aed 
>   exec/jdbc/pom.xml b4ec758 
>   pom.xml f6bcd91 
> 
> Diff: https://reviews.apache.org/r/33136/diff/
> 
> 
> Testing
> -------
> 
> Unit testcase clean run. (except for disabled testcases. See DRILL-2630, DRILL-2761, etc)
> 
> 
> Thanks,
> 
> Jinfeng Ni
> 
>


Re: Review Request 33136: DRILL-1384: Rebase Drill's forked Optiq library onto Calcite release 1.0.

Posted by Aman Sinha <as...@maprtech.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/33136/#review80617
-----------------------------------------------------------



exec/java-exec/src/main/java/org/apache/drill/exec/planner/common/DrillProjectRelBase.java
<https://reviews.apache.org/r/33136/#comment130713>

    Remove this..



exec/java-exec/src/main/java/org/apache/drill/exec/planner/cost/DrillCostBase.java
<https://reviews.apache.org/r/33136/#comment130718>

    Is this factor actually helping with any of the plans ? If not, we should skip it since having more factors makes it difficult to estimate the cost.



exec/java-exec/src/main/java/org/apache/drill/exec/planner/logical/DrillFilterJoinRules.java
<https://reviews.apache.org/r/33136/#comment130712>

    Keeping a chunk of commented out code creates confusion..better to remove it.



exec/java-exec/src/main/java/org/apache/drill/exec/planner/logical/DrillRuleSets.java
<https://reviews.apache.org/r/33136/#comment130714>

    Remove all the commented out old rule names that are not valid in the master branch of Calcite since it adds confusion.



exec/java-exec/src/main/java/org/apache/drill/exec/planner/physical/HashJoinPrel.java
<https://reviews.apache.org/r/33136/#comment130715>

    Since this logic has moved to DrillJoinRelBase, you should just remove it from here.


- Aman Sinha


On April 18, 2015, 8:34 p.m., Jinfeng Ni wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/33136/
> -----------------------------------------------------------
> 
> (Updated April 18, 2015, 8:34 p.m.)
> 
> 
> Review request for drill and Aman Sinha.
> 
> 
> Repository: drill-git
> 
> 
> Description
> -------
> 
> Drill currently uses a forked Optiq (Renamed to Calcite) version, dated back in July 2014. The forked version has 10-20 Drill specific patches. However, we did not rebase the forked version onto the on-going Calcite release. As such, Drill misses some bug fixes/new feature development on Calcite side.
> 
> This patch is trying to rebase the Drill's forked version onto Calcite release 1.0. (More precisely, on commit of CALCITE-603, when this rebasing work started).
> 
> Most of changes happen due to Calcite's package structure / renaming (See CALCITE-296]. 
> 
> Some Drill specific changes:
> 
> 1. Provide a Drill specifc RelDataTypeSystem, to support decimal with precision/scale up to 38.
> 2. Modify Drill's parser, to allow * in Compound Identifier.
> 3. Provide Drill specific FilterJoinRule, to enforce Drill only support equal-join in JOIN operator.
> 4. Modify Drill costing comparison, such that the costing oder is a total order when compare different plans.
> 5. Modify costing estimation for Drill Project operator. 
> 6. Use a ProjectRemove rule, such that it will honor parent's output field name.
> 7. Modify Calcite's Frameworks/planner interface, such that Drill will use validatedRowType to construct a top-level project, to ensure the final output field is what the query specified. (Calcite could inject "$F0", or "$EXPR0 into converted RelNode tree, in Sql2RelConverter)
> 8. Fix couple of Drill unit test cases, since the expected result by query semantics are not fixed.
> 9. Some type-related to Drill Sql operators. 
> 
> Some impact of such rebasing.
> 1. TPCH Q16, or query with NOT IN predicate involving NULLABLE column could hit CanNotPlanException. 
> 
> The previous plan for Q16, although return the correct result, is not valid, when the column is nullable. See DRILL-1957 OR CALCITE-373.
> 
> 2. Plan changed in some TPCH queries, which may cause timeout for Q5 in TPCH scale factor 100 run.
> We probably need continue refine the costing estimation formula, especially for Join operator.
> 
> 
> Diffs
> -----
> 
>   common/pom.xml 525b533 
>   contrib/storage-hive/core/pom.xml 9bd6293 
>   exec/java-exec/pom.xml f5313ca 
>   exec/java-exec/src/main/codegen/includes/compoundIdentifier.ftl 50d8c20 
>   exec/java-exec/src/main/java/org/apache/drill/exec/planner/common/DrillJoinRelBase.java 3b3aa1a 
>   exec/java-exec/src/main/java/org/apache/drill/exec/planner/common/DrillProjectRelBase.java 7cf98cd 
>   exec/java-exec/src/main/java/org/apache/drill/exec/planner/common/DrillRelOptUtil.java bbe7cf3 
>   exec/java-exec/src/main/java/org/apache/drill/exec/planner/cost/DrillCostBase.java 87a1ea3 
>   exec/java-exec/src/main/java/org/apache/drill/exec/planner/logical/DrillFilterJoinRules.java PRE-CREATION 
>   exec/java-exec/src/main/java/org/apache/drill/exec/planner/logical/DrillProjectRel.java 14ea894 
>   exec/java-exec/src/main/java/org/apache/drill/exec/planner/logical/DrillPushProjIntoScan.java fcfced2 
>   exec/java-exec/src/main/java/org/apache/drill/exec/planner/logical/DrillReduceExpressionsRule.java 29175e5 
>   exec/java-exec/src/main/java/org/apache/drill/exec/planner/logical/DrillRuleSets.java 84a0b51 
>   exec/java-exec/src/main/java/org/apache/drill/exec/planner/physical/HashJoinPrel.java f63057f 
>   exec/java-exec/src/main/java/org/apache/drill/exec/planner/sql/DrillSqlOperator.java 7bd48c8 
>   exec/java-exec/src/main/java/org/apache/drill/exec/planner/sql/handlers/CreateTableHandler.java a17a604 
>   exec/java-exec/src/main/java/org/apache/drill/exec/planner/sql/handlers/DefaultSqlHandler.java 5ee502d 
>   exec/java-exec/src/main/java/org/apache/drill/exec/planner/sql/handlers/ExplainHandler.java 1cce6a5 
>   exec/java-exec/src/main/java/org/apache/drill/exec/planner/sql/handlers/SqlHandlerUtil.java c347bef 
>   exec/jdbc-all/pom.xml b369aed 
>   exec/jdbc/pom.xml b4ec758 
>   pom.xml f6bcd91 
> 
> Diff: https://reviews.apache.org/r/33136/diff/
> 
> 
> Testing
> -------
> 
> Unit testcase clean run. (except for disabled testcases. See DRILL-2630, DRILL-2761, etc)
> 
> 
> Thanks,
> 
> Jinfeng Ni
> 
>


Re: Review Request 33136: DRILL-1384: Rebase Drill's forked Optiq library onto Calcite release 1.0.

Posted by Jinfeng Ni <jn...@maprtech.com>.

> On April 20, 2015, 9:27 p.m., Julian Hyde wrote:
> > Difficult to tell the rationale behind each change, but all of the changes seem to make sense.
> > 
> > I noticed that you had to move to the version of ProjectRemoveRule.isTrivial with a boolean parameter. That was temporary, and was removed shortly afterwards. See CALCITE-575 and CALCITE-577. Unavoidable, I suppose.
> > 
> > At a later point, consider renaming Drill's RelNodes and Rules to match Calcite's new naming scheme. Strictly optional of course.

That's right. ProjectRemoveRule.isTrivial used in Drill is a deprecated version. The reason we use that version is because Drill's execution uses name-based logic, different from Calcite's ordinal-based executiion. A project with identical expressions, yet different names should not be removed in Drill's plan; otherwise, the query may not return correct result. We probably have to re-visit this deprecated code issue, when we keep going on with Calcite's new code. 

Agree that we should renaming Drill's Rules to match Calcite's new naming scheme. I'll do that in a follow-up patch. This patch is mainly focus on not causing any regression to the existing test workloads.


- Jinfeng


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/33136/#review80902
-----------------------------------------------------------


On April 18, 2015, 1:34 p.m., Jinfeng Ni wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/33136/
> -----------------------------------------------------------
> 
> (Updated April 18, 2015, 1:34 p.m.)
> 
> 
> Review request for drill and Aman Sinha.
> 
> 
> Repository: drill-git
> 
> 
> Description
> -------
> 
> Drill currently uses a forked Optiq (Renamed to Calcite) version, dated back in July 2014. The forked version has 10-20 Drill specific patches. However, we did not rebase the forked version onto the on-going Calcite release. As such, Drill misses some bug fixes/new feature development on Calcite side.
> 
> This patch is trying to rebase the Drill's forked version onto Calcite release 1.0. (More precisely, on commit of CALCITE-603, when this rebasing work started).
> 
> Most of changes happen due to Calcite's package structure / renaming (See CALCITE-296]. 
> 
> Some Drill specific changes:
> 
> 1. Provide a Drill specifc RelDataTypeSystem, to support decimal with precision/scale up to 38.
> 2. Modify Drill's parser, to allow * in Compound Identifier.
> 3. Provide Drill specific FilterJoinRule, to enforce Drill only support equal-join in JOIN operator.
> 4. Modify Drill costing comparison, such that the costing oder is a total order when compare different plans.
> 5. Modify costing estimation for Drill Project operator. 
> 6. Use a ProjectRemove rule, such that it will honor parent's output field name.
> 7. Modify Calcite's Frameworks/planner interface, such that Drill will use validatedRowType to construct a top-level project, to ensure the final output field is what the query specified. (Calcite could inject "$F0", or "$EXPR0 into converted RelNode tree, in Sql2RelConverter)
> 8. Fix couple of Drill unit test cases, since the expected result by query semantics are not fixed.
> 9. Some type-related to Drill Sql operators. 
> 
> Some impact of such rebasing.
> 1. TPCH Q16, or query with NOT IN predicate involving NULLABLE column could hit CanNotPlanException. 
> 
> The previous plan for Q16, although return the correct result, is not valid, when the column is nullable. See DRILL-1957 OR CALCITE-373.
> 
> 2. Plan changed in some TPCH queries, which may cause timeout for Q5 in TPCH scale factor 100 run.
> We probably need continue refine the costing estimation formula, especially for Join operator.
> 
> 
> Diffs
> -----
> 
>   common/pom.xml 525b533 
>   contrib/storage-hive/core/pom.xml 9bd6293 
>   exec/java-exec/pom.xml f5313ca 
>   exec/java-exec/src/main/codegen/includes/compoundIdentifier.ftl 50d8c20 
>   exec/java-exec/src/main/java/org/apache/drill/exec/planner/common/DrillJoinRelBase.java 3b3aa1a 
>   exec/java-exec/src/main/java/org/apache/drill/exec/planner/common/DrillProjectRelBase.java 7cf98cd 
>   exec/java-exec/src/main/java/org/apache/drill/exec/planner/common/DrillRelOptUtil.java bbe7cf3 
>   exec/java-exec/src/main/java/org/apache/drill/exec/planner/cost/DrillCostBase.java 87a1ea3 
>   exec/java-exec/src/main/java/org/apache/drill/exec/planner/logical/DrillFilterJoinRules.java PRE-CREATION 
>   exec/java-exec/src/main/java/org/apache/drill/exec/planner/logical/DrillProjectRel.java 14ea894 
>   exec/java-exec/src/main/java/org/apache/drill/exec/planner/logical/DrillPushProjIntoScan.java fcfced2 
>   exec/java-exec/src/main/java/org/apache/drill/exec/planner/logical/DrillReduceExpressionsRule.java 29175e5 
>   exec/java-exec/src/main/java/org/apache/drill/exec/planner/logical/DrillRuleSets.java 84a0b51 
>   exec/java-exec/src/main/java/org/apache/drill/exec/planner/physical/HashJoinPrel.java f63057f 
>   exec/java-exec/src/main/java/org/apache/drill/exec/planner/sql/DrillSqlOperator.java 7bd48c8 
>   exec/java-exec/src/main/java/org/apache/drill/exec/planner/sql/handlers/CreateTableHandler.java a17a604 
>   exec/java-exec/src/main/java/org/apache/drill/exec/planner/sql/handlers/DefaultSqlHandler.java 5ee502d 
>   exec/java-exec/src/main/java/org/apache/drill/exec/planner/sql/handlers/ExplainHandler.java 1cce6a5 
>   exec/java-exec/src/main/java/org/apache/drill/exec/planner/sql/handlers/SqlHandlerUtil.java c347bef 
>   exec/jdbc-all/pom.xml b369aed 
>   exec/jdbc/pom.xml b4ec758 
>   pom.xml f6bcd91 
> 
> Diff: https://reviews.apache.org/r/33136/diff/
> 
> 
> Testing
> -------
> 
> Unit testcase clean run. (except for disabled testcases. See DRILL-2630, DRILL-2761, etc)
> 
> 
> Thanks,
> 
> Jinfeng Ni
> 
>


Re: Review Request 33136: DRILL-1384: Rebase Drill's forked Optiq library onto Calcite release 1.0.

Posted by Julian Hyde <ju...@hydromatic.net>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/33136/#review80902
-----------------------------------------------------------

Ship it!


Difficult to tell the rationale behind each change, but all of the changes seem to make sense.

I noticed that you had to move to the version of ProjectRemoveRule.isTrivial with a boolean parameter. That was temporary, and was removed shortly afterwards. See CALCITE-575 and CALCITE-577. Unavoidable, I suppose.

At a later point, consider renaming Drill's RelNodes and Rules to match Calcite's new naming scheme. Strictly optional of course.

- Julian Hyde


On April 18, 2015, 8:34 p.m., Jinfeng Ni wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/33136/
> -----------------------------------------------------------
> 
> (Updated April 18, 2015, 8:34 p.m.)
> 
> 
> Review request for drill and Aman Sinha.
> 
> 
> Repository: drill-git
> 
> 
> Description
> -------
> 
> Drill currently uses a forked Optiq (Renamed to Calcite) version, dated back in July 2014. The forked version has 10-20 Drill specific patches. However, we did not rebase the forked version onto the on-going Calcite release. As such, Drill misses some bug fixes/new feature development on Calcite side.
> 
> This patch is trying to rebase the Drill's forked version onto Calcite release 1.0. (More precisely, on commit of CALCITE-603, when this rebasing work started).
> 
> Most of changes happen due to Calcite's package structure / renaming (See CALCITE-296]. 
> 
> Some Drill specific changes:
> 
> 1. Provide a Drill specifc RelDataTypeSystem, to support decimal with precision/scale up to 38.
> 2. Modify Drill's parser, to allow * in Compound Identifier.
> 3. Provide Drill specific FilterJoinRule, to enforce Drill only support equal-join in JOIN operator.
> 4. Modify Drill costing comparison, such that the costing oder is a total order when compare different plans.
> 5. Modify costing estimation for Drill Project operator. 
> 6. Use a ProjectRemove rule, such that it will honor parent's output field name.
> 7. Modify Calcite's Frameworks/planner interface, such that Drill will use validatedRowType to construct a top-level project, to ensure the final output field is what the query specified. (Calcite could inject "$F0", or "$EXPR0 into converted RelNode tree, in Sql2RelConverter)
> 8. Fix couple of Drill unit test cases, since the expected result by query semantics are not fixed.
> 9. Some type-related to Drill Sql operators. 
> 
> Some impact of such rebasing.
> 1. TPCH Q16, or query with NOT IN predicate involving NULLABLE column could hit CanNotPlanException. 
> 
> The previous plan for Q16, although return the correct result, is not valid, when the column is nullable. See DRILL-1957 OR CALCITE-373.
> 
> 2. Plan changed in some TPCH queries, which may cause timeout for Q5 in TPCH scale factor 100 run.
> We probably need continue refine the costing estimation formula, especially for Join operator.
> 
> 
> Diffs
> -----
> 
>   common/pom.xml 525b533 
>   contrib/storage-hive/core/pom.xml 9bd6293 
>   exec/java-exec/pom.xml f5313ca 
>   exec/java-exec/src/main/codegen/includes/compoundIdentifier.ftl 50d8c20 
>   exec/java-exec/src/main/java/org/apache/drill/exec/planner/common/DrillJoinRelBase.java 3b3aa1a 
>   exec/java-exec/src/main/java/org/apache/drill/exec/planner/common/DrillProjectRelBase.java 7cf98cd 
>   exec/java-exec/src/main/java/org/apache/drill/exec/planner/common/DrillRelOptUtil.java bbe7cf3 
>   exec/java-exec/src/main/java/org/apache/drill/exec/planner/cost/DrillCostBase.java 87a1ea3 
>   exec/java-exec/src/main/java/org/apache/drill/exec/planner/logical/DrillFilterJoinRules.java PRE-CREATION 
>   exec/java-exec/src/main/java/org/apache/drill/exec/planner/logical/DrillProjectRel.java 14ea894 
>   exec/java-exec/src/main/java/org/apache/drill/exec/planner/logical/DrillPushProjIntoScan.java fcfced2 
>   exec/java-exec/src/main/java/org/apache/drill/exec/planner/logical/DrillReduceExpressionsRule.java 29175e5 
>   exec/java-exec/src/main/java/org/apache/drill/exec/planner/logical/DrillRuleSets.java 84a0b51 
>   exec/java-exec/src/main/java/org/apache/drill/exec/planner/physical/HashJoinPrel.java f63057f 
>   exec/java-exec/src/main/java/org/apache/drill/exec/planner/sql/DrillSqlOperator.java 7bd48c8 
>   exec/java-exec/src/main/java/org/apache/drill/exec/planner/sql/handlers/CreateTableHandler.java a17a604 
>   exec/java-exec/src/main/java/org/apache/drill/exec/planner/sql/handlers/DefaultSqlHandler.java 5ee502d 
>   exec/java-exec/src/main/java/org/apache/drill/exec/planner/sql/handlers/ExplainHandler.java 1cce6a5 
>   exec/java-exec/src/main/java/org/apache/drill/exec/planner/sql/handlers/SqlHandlerUtil.java c347bef 
>   exec/jdbc-all/pom.xml b369aed 
>   exec/jdbc/pom.xml b4ec758 
>   pom.xml f6bcd91 
> 
> Diff: https://reviews.apache.org/r/33136/diff/
> 
> 
> Testing
> -------
> 
> Unit testcase clean run. (except for disabled testcases. See DRILL-2630, DRILL-2761, etc)
> 
> 
> Thanks,
> 
> Jinfeng Ni
> 
>


Re: Review Request 33136: DRILL-1384: Rebase Drill's forked Optiq library onto Calcite release 1.0.

Posted by Aman Sinha <as...@maprtech.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/33136/#review80631
-----------------------------------------------------------

Ship it!


Ship It!

- Aman Sinha


On April 18, 2015, 8:34 p.m., Jinfeng Ni wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/33136/
> -----------------------------------------------------------
> 
> (Updated April 18, 2015, 8:34 p.m.)
> 
> 
> Review request for drill and Aman Sinha.
> 
> 
> Repository: drill-git
> 
> 
> Description
> -------
> 
> Drill currently uses a forked Optiq (Renamed to Calcite) version, dated back in July 2014. The forked version has 10-20 Drill specific patches. However, we did not rebase the forked version onto the on-going Calcite release. As such, Drill misses some bug fixes/new feature development on Calcite side.
> 
> This patch is trying to rebase the Drill's forked version onto Calcite release 1.0. (More precisely, on commit of CALCITE-603, when this rebasing work started).
> 
> Most of changes happen due to Calcite's package structure / renaming (See CALCITE-296]. 
> 
> Some Drill specific changes:
> 
> 1. Provide a Drill specifc RelDataTypeSystem, to support decimal with precision/scale up to 38.
> 2. Modify Drill's parser, to allow * in Compound Identifier.
> 3. Provide Drill specific FilterJoinRule, to enforce Drill only support equal-join in JOIN operator.
> 4. Modify Drill costing comparison, such that the costing oder is a total order when compare different plans.
> 5. Modify costing estimation for Drill Project operator. 
> 6. Use a ProjectRemove rule, such that it will honor parent's output field name.
> 7. Modify Calcite's Frameworks/planner interface, such that Drill will use validatedRowType to construct a top-level project, to ensure the final output field is what the query specified. (Calcite could inject "$F0", or "$EXPR0 into converted RelNode tree, in Sql2RelConverter)
> 8. Fix couple of Drill unit test cases, since the expected result by query semantics are not fixed.
> 9. Some type-related to Drill Sql operators. 
> 
> Some impact of such rebasing.
> 1. TPCH Q16, or query with NOT IN predicate involving NULLABLE column could hit CanNotPlanException. 
> 
> The previous plan for Q16, although return the correct result, is not valid, when the column is nullable. See DRILL-1957 OR CALCITE-373.
> 
> 2. Plan changed in some TPCH queries, which may cause timeout for Q5 in TPCH scale factor 100 run.
> We probably need continue refine the costing estimation formula, especially for Join operator.
> 
> 
> Diffs
> -----
> 
>   common/pom.xml 525b533 
>   contrib/storage-hive/core/pom.xml 9bd6293 
>   exec/java-exec/pom.xml f5313ca 
>   exec/java-exec/src/main/codegen/includes/compoundIdentifier.ftl 50d8c20 
>   exec/java-exec/src/main/java/org/apache/drill/exec/planner/common/DrillJoinRelBase.java 3b3aa1a 
>   exec/java-exec/src/main/java/org/apache/drill/exec/planner/common/DrillProjectRelBase.java 7cf98cd 
>   exec/java-exec/src/main/java/org/apache/drill/exec/planner/common/DrillRelOptUtil.java bbe7cf3 
>   exec/java-exec/src/main/java/org/apache/drill/exec/planner/cost/DrillCostBase.java 87a1ea3 
>   exec/java-exec/src/main/java/org/apache/drill/exec/planner/logical/DrillFilterJoinRules.java PRE-CREATION 
>   exec/java-exec/src/main/java/org/apache/drill/exec/planner/logical/DrillProjectRel.java 14ea894 
>   exec/java-exec/src/main/java/org/apache/drill/exec/planner/logical/DrillPushProjIntoScan.java fcfced2 
>   exec/java-exec/src/main/java/org/apache/drill/exec/planner/logical/DrillReduceExpressionsRule.java 29175e5 
>   exec/java-exec/src/main/java/org/apache/drill/exec/planner/logical/DrillRuleSets.java 84a0b51 
>   exec/java-exec/src/main/java/org/apache/drill/exec/planner/physical/HashJoinPrel.java f63057f 
>   exec/java-exec/src/main/java/org/apache/drill/exec/planner/sql/DrillSqlOperator.java 7bd48c8 
>   exec/java-exec/src/main/java/org/apache/drill/exec/planner/sql/handlers/CreateTableHandler.java a17a604 
>   exec/java-exec/src/main/java/org/apache/drill/exec/planner/sql/handlers/DefaultSqlHandler.java 5ee502d 
>   exec/java-exec/src/main/java/org/apache/drill/exec/planner/sql/handlers/ExplainHandler.java 1cce6a5 
>   exec/java-exec/src/main/java/org/apache/drill/exec/planner/sql/handlers/SqlHandlerUtil.java c347bef 
>   exec/jdbc-all/pom.xml b369aed 
>   exec/jdbc/pom.xml b4ec758 
>   pom.xml f6bcd91 
> 
> Diff: https://reviews.apache.org/r/33136/diff/
> 
> 
> Testing
> -------
> 
> Unit testcase clean run. (except for disabled testcases. See DRILL-2630, DRILL-2761, etc)
> 
> 
> Thanks,
> 
> Jinfeng Ni
> 
>