You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@drill.apache.org by Aman Sinha <as...@maprtech.com> on 2015/02/02 19:15:17 UTC

Review Request 30503: Process the null comparisons for IS NOT DISTINCT FROM operator.

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/30503/
-----------------------------------------------------------

Review request for drill, Jinfeng Ni and Mehant Baid.


Bugs: DRILL-2092
    https://issues.apache.org/jira/browse/DRILL-2092


Repository: drill-git


Description
-------

For queries of the form SELECT a, COUNT(DISTINCT b), SUM(b) FROM T GROUP BY a,  Calcite generates a plan where there are 2 subqueries each of which corresponds to an aggregate function with group-by and the 2 subqeuries are joined in the outer query block on the group-by columns.  This join-back uses 'IS NOT DISTINCT FROM' rather than equality..e.g here's an extract from the Explain: 
      DrillJoinRel(condition=[AND(IS NOT DISTINCT FROM($0, $5), IS NOT DISTINCT FROM($1, $6))], joinType=[inner])

The IS NOT DISTINCT FROM differs from '=' in terms of null handling.  null == null is FALSE, whereas null IS NOT DISTINCT FROM null is TRUE.  This patch handles the nulls for this comparator both during planning and execution operations.


Diffs
-----

  exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/common/ChainedHashTable.java fd6a3e2 
  exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/join/HashJoinBatch.java 4af0292 
  exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/join/JoinUtils.java PRE-CREATION 
  exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/join/MergeJoinBatch.java 14bc094 
  exec/java-exec/src/main/java/org/apache/drill/exec/planner/physical/HashJoinPrel.java d9a7277 
  exec/java-exec/src/main/java/org/apache/drill/exec/planner/physical/JoinPrel.java d5473f2 
  exec/java-exec/src/main/java/org/apache/drill/exec/planner/physical/MergeJoinPrel.java f6b7ef6 
  exec/java-exec/src/test/java/org/apache/drill/exec/fn/impl/TestAggregateFunctions.java 2b3ff50 

Diff: https://reviews.apache.org/r/30503/diff/


Testing
-------

Added new unit tests.  Ran unit tests, functional suite and tpch 100.


Thanks,

Aman Sinha


Re: Review Request 30503: Process the null comparisons for IS NOT DISTINCT FROM operator.

Posted by Jinfeng Ni <jn...@maprtech.com>.

> On Feb. 2, 2015, 5:39 p.m., Jinfeng Ni wrote:
> > exec/java-exec/src/test/java/org/apache/drill/exec/fn/impl/TestAggregateFunctions.java, line 68
> > <https://reviews.apache.org/r/30503/diff/1/?file=843838#file843838line68>
> >
> >     Because of this transformation of join, Drill will not be able to handle case without "group by", because it will convert into cartesian join.
> >     
> >     select a1, count(distinct a1), sum(b1) from t1;
> >     
> >     Maybe we should log a new issue for this particular case?
> 
> Aman Sinha wrote:
>     There might be an open JIRA .. I will check.

The example query should be:

select count(distinct a1), sum(b1) from t1;


- Jinfeng


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/30503/#review70688
-----------------------------------------------------------


On Feb. 4, 2015, 10:19 a.m., Aman Sinha wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/30503/
> -----------------------------------------------------------
> 
> (Updated Feb. 4, 2015, 10:19 a.m.)
> 
> 
> Review request for drill, Jinfeng Ni and Mehant Baid.
> 
> 
> Bugs: DRILL-2092
>     https://issues.apache.org/jira/browse/DRILL-2092
> 
> 
> Repository: drill-git
> 
> 
> Description
> -------
> 
> For queries of the form SELECT a, COUNT(DISTINCT b), SUM(b) FROM T GROUP BY a,  Calcite generates a plan where there are 2 subqueries each of which corresponds to an aggregate function with group-by and the 2 subqeuries are joined in the outer query block on the group-by columns.  This join-back uses 'IS NOT DISTINCT FROM' rather than equality..e.g here's an extract from the Explain: 
>       DrillJoinRel(condition=[AND(IS NOT DISTINCT FROM($0, $5), IS NOT DISTINCT FROM($1, $6))], joinType=[inner])
> 
> The IS NOT DISTINCT FROM differs from '=' in terms of null handling.  null == null is FALSE, whereas null IS NOT DISTINCT FROM null is TRUE.  This patch handles the nulls for this comparator both during planning and execution operations.
> 
> 
> Diffs
> -----
> 
>   exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/common/ChainedHashTable.java fd6a3e2 
>   exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/join/HashJoinBatch.java 4af0292 
>   exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/join/JoinUtils.java PRE-CREATION 
>   exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/join/MergeJoinBatch.java 14bc094 
>   exec/java-exec/src/main/java/org/apache/drill/exec/planner/physical/HashJoinPrel.java d9a7277 
>   exec/java-exec/src/main/java/org/apache/drill/exec/planner/physical/JoinPrel.java d5473f2 
>   exec/java-exec/src/main/java/org/apache/drill/exec/planner/physical/MergeJoinPrel.java f6b7ef6 
>   exec/java-exec/src/test/java/org/apache/drill/exec/fn/impl/TestAggregateFunctions.java 2b3ff50 
>   exec/java-exec/src/test/resources/agg/bugs/drill2092/input.json PRE-CREATION 
>   exec/java-exec/src/test/resources/agg/bugs/drill2092/result.tsv PRE-CREATION 
> 
> Diff: https://reviews.apache.org/r/30503/diff/
> 
> 
> Testing
> -------
> 
> Added new unit tests.  Ran unit tests, functional suite and tpch 100.
> 
> 
> Thanks,
> 
> Aman Sinha
> 
>


Re: Review Request 30503: Process the null comparisons for IS NOT DISTINCT FROM operator.

Posted by Aman Sinha <as...@maprtech.com>.

> On Feb. 3, 2015, 1:39 a.m., Jinfeng Ni wrote:
> > exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/join/JoinUtils.java, line 35
> > <https://reviews.apache.org/r/30503/diff/1/?file=843833#file843833line35>
> >
> >     Is it better to use "equal", instead of "startsWith"? If someone put any string starts with "=" in the physical plan, will the code treat it as an EQUALS?

The comparison expression is actually something like '=($2, $4)' which is why I was not able to use exact equality. But I will make this checking smarter..


> On Feb. 3, 2015, 1:39 a.m., Jinfeng Ni wrote:
> > exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/join/JoinUtils.java, line 42
> > <https://reviews.apache.org/r/30503/diff/1/?file=843833#file843833line42>
> >
> >     Same as the startsWith("=").

see above..


> On Feb. 3, 2015, 1:39 a.m., Jinfeng Ni wrote:
> > exec/java-exec/src/main/java/org/apache/drill/exec/planner/physical/JoinPrel.java, line 127
> > <https://reviews.apache.org/r/30503/diff/1/?file=843836#file843836line127>
> >
> >     In stead of convert the condition into string first, the call split(), can we check the condition is a SqlBinaryOperator and it's SqlKind is IS_NOT_DISTINCT_FROM?   Using string comparison sometimes could cause issue, when the sql parser/planner uses a different string for that operator.

Let me look into it...should be doable.


> On Feb. 3, 2015, 1:39 a.m., Jinfeng Ni wrote:
> > exec/java-exec/src/test/java/org/apache/drill/exec/fn/impl/TestAggregateFunctions.java, line 68
> > <https://reviews.apache.org/r/30503/diff/1/?file=843838#file843838line68>
> >
> >     Because of this transformation of join, Drill will not be able to handle case without "group by", because it will convert into cartesian join.
> >     
> >     select a1, count(distinct a1), sum(b1) from t1;
> >     
> >     Maybe we should log a new issue for this particular case?

There might be an open JIRA .. I will check.


- Aman


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/30503/#review70688
-----------------------------------------------------------


On Feb. 2, 2015, 6:15 p.m., Aman Sinha wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/30503/
> -----------------------------------------------------------
> 
> (Updated Feb. 2, 2015, 6:15 p.m.)
> 
> 
> Review request for drill, Jinfeng Ni and Mehant Baid.
> 
> 
> Bugs: DRILL-2092
>     https://issues.apache.org/jira/browse/DRILL-2092
> 
> 
> Repository: drill-git
> 
> 
> Description
> -------
> 
> For queries of the form SELECT a, COUNT(DISTINCT b), SUM(b) FROM T GROUP BY a,  Calcite generates a plan where there are 2 subqueries each of which corresponds to an aggregate function with group-by and the 2 subqeuries are joined in the outer query block on the group-by columns.  This join-back uses 'IS NOT DISTINCT FROM' rather than equality..e.g here's an extract from the Explain: 
>       DrillJoinRel(condition=[AND(IS NOT DISTINCT FROM($0, $5), IS NOT DISTINCT FROM($1, $6))], joinType=[inner])
> 
> The IS NOT DISTINCT FROM differs from '=' in terms of null handling.  null == null is FALSE, whereas null IS NOT DISTINCT FROM null is TRUE.  This patch handles the nulls for this comparator both during planning and execution operations.
> 
> 
> Diffs
> -----
> 
>   exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/common/ChainedHashTable.java fd6a3e2 
>   exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/join/HashJoinBatch.java 4af0292 
>   exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/join/JoinUtils.java PRE-CREATION 
>   exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/join/MergeJoinBatch.java 14bc094 
>   exec/java-exec/src/main/java/org/apache/drill/exec/planner/physical/HashJoinPrel.java d9a7277 
>   exec/java-exec/src/main/java/org/apache/drill/exec/planner/physical/JoinPrel.java d5473f2 
>   exec/java-exec/src/main/java/org/apache/drill/exec/planner/physical/MergeJoinPrel.java f6b7ef6 
>   exec/java-exec/src/test/java/org/apache/drill/exec/fn/impl/TestAggregateFunctions.java 2b3ff50 
> 
> Diff: https://reviews.apache.org/r/30503/diff/
> 
> 
> Testing
> -------
> 
> Added new unit tests.  Ran unit tests, functional suite and tpch 100.
> 
> 
> Thanks,
> 
> Aman Sinha
> 
>


Re: Review Request 30503: Process the null comparisons for IS NOT DISTINCT FROM operator.

Posted by Jinfeng Ni <jn...@maprtech.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/30503/#review70688
-----------------------------------------------------------



exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/join/JoinUtils.java
<https://reviews.apache.org/r/30503/#comment116027>

    Is it better to use "equal", instead of "startsWith"? If someone put any string starts with "=" in the physical plan, will the code treat it as an EQUALS?



exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/join/JoinUtils.java
<https://reviews.apache.org/r/30503/#comment116028>

    Same as the startsWith("=").



exec/java-exec/src/main/java/org/apache/drill/exec/planner/physical/JoinPrel.java
<https://reviews.apache.org/r/30503/#comment116031>

    In stead of convert the condition into string first, the call split(), can we check the condition is a SqlBinaryOperator and it's SqlKind is IS_NOT_DISTINCT_FROM?   Using string comparison sometimes could cause issue, when the sql parser/planner uses a different string for that operator.



exec/java-exec/src/main/java/org/apache/drill/exec/planner/physical/JoinPrel.java
<https://reviews.apache.org/r/30503/#comment116032>

    Same comment as in line 125.



exec/java-exec/src/test/java/org/apache/drill/exec/fn/impl/TestAggregateFunctions.java
<https://reviews.apache.org/r/30503/#comment116033>

    Because of this transformation of join, Drill will not be able to handle case without "group by", because it will convert into cartesian join.
    
    select a1, count(distinct a1), sum(b1) from t1;
    
    Maybe we should log a new issue for this particular case?


- Jinfeng Ni


On Feb. 2, 2015, 10:15 a.m., Aman Sinha wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/30503/
> -----------------------------------------------------------
> 
> (Updated Feb. 2, 2015, 10:15 a.m.)
> 
> 
> Review request for drill, Jinfeng Ni and Mehant Baid.
> 
> 
> Bugs: DRILL-2092
>     https://issues.apache.org/jira/browse/DRILL-2092
> 
> 
> Repository: drill-git
> 
> 
> Description
> -------
> 
> For queries of the form SELECT a, COUNT(DISTINCT b), SUM(b) FROM T GROUP BY a,  Calcite generates a plan where there are 2 subqueries each of which corresponds to an aggregate function with group-by and the 2 subqeuries are joined in the outer query block on the group-by columns.  This join-back uses 'IS NOT DISTINCT FROM' rather than equality..e.g here's an extract from the Explain: 
>       DrillJoinRel(condition=[AND(IS NOT DISTINCT FROM($0, $5), IS NOT DISTINCT FROM($1, $6))], joinType=[inner])
> 
> The IS NOT DISTINCT FROM differs from '=' in terms of null handling.  null == null is FALSE, whereas null IS NOT DISTINCT FROM null is TRUE.  This patch handles the nulls for this comparator both during planning and execution operations.
> 
> 
> Diffs
> -----
> 
>   exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/common/ChainedHashTable.java fd6a3e2 
>   exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/join/HashJoinBatch.java 4af0292 
>   exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/join/JoinUtils.java PRE-CREATION 
>   exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/join/MergeJoinBatch.java 14bc094 
>   exec/java-exec/src/main/java/org/apache/drill/exec/planner/physical/HashJoinPrel.java d9a7277 
>   exec/java-exec/src/main/java/org/apache/drill/exec/planner/physical/JoinPrel.java d5473f2 
>   exec/java-exec/src/main/java/org/apache/drill/exec/planner/physical/MergeJoinPrel.java f6b7ef6 
>   exec/java-exec/src/test/java/org/apache/drill/exec/fn/impl/TestAggregateFunctions.java 2b3ff50 
> 
> Diff: https://reviews.apache.org/r/30503/diff/
> 
> 
> Testing
> -------
> 
> Added new unit tests.  Ran unit tests, functional suite and tpch 100.
> 
> 
> Thanks,
> 
> Aman Sinha
> 
>


Re: Review Request 30503: Process the null comparisons for IS NOT DISTINCT FROM operator.

Posted by Jinfeng Ni <jn...@maprtech.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/30503/#review71068
-----------------------------------------------------------

Ship it!


Ship It!

- Jinfeng Ni


On Feb. 4, 2015, 10:19 a.m., Aman Sinha wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/30503/
> -----------------------------------------------------------
> 
> (Updated Feb. 4, 2015, 10:19 a.m.)
> 
> 
> Review request for drill, Jinfeng Ni and Mehant Baid.
> 
> 
> Bugs: DRILL-2092
>     https://issues.apache.org/jira/browse/DRILL-2092
> 
> 
> Repository: drill-git
> 
> 
> Description
> -------
> 
> For queries of the form SELECT a, COUNT(DISTINCT b), SUM(b) FROM T GROUP BY a,  Calcite generates a plan where there are 2 subqueries each of which corresponds to an aggregate function with group-by and the 2 subqeuries are joined in the outer query block on the group-by columns.  This join-back uses 'IS NOT DISTINCT FROM' rather than equality..e.g here's an extract from the Explain: 
>       DrillJoinRel(condition=[AND(IS NOT DISTINCT FROM($0, $5), IS NOT DISTINCT FROM($1, $6))], joinType=[inner])
> 
> The IS NOT DISTINCT FROM differs from '=' in terms of null handling.  null == null is FALSE, whereas null IS NOT DISTINCT FROM null is TRUE.  This patch handles the nulls for this comparator both during planning and execution operations.
> 
> 
> Diffs
> -----
> 
>   exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/common/ChainedHashTable.java fd6a3e2 
>   exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/join/HashJoinBatch.java 4af0292 
>   exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/join/JoinUtils.java PRE-CREATION 
>   exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/join/MergeJoinBatch.java 14bc094 
>   exec/java-exec/src/main/java/org/apache/drill/exec/planner/physical/HashJoinPrel.java d9a7277 
>   exec/java-exec/src/main/java/org/apache/drill/exec/planner/physical/JoinPrel.java d5473f2 
>   exec/java-exec/src/main/java/org/apache/drill/exec/planner/physical/MergeJoinPrel.java f6b7ef6 
>   exec/java-exec/src/test/java/org/apache/drill/exec/fn/impl/TestAggregateFunctions.java 2b3ff50 
>   exec/java-exec/src/test/resources/agg/bugs/drill2092/input.json PRE-CREATION 
>   exec/java-exec/src/test/resources/agg/bugs/drill2092/result.tsv PRE-CREATION 
> 
> Diff: https://reviews.apache.org/r/30503/diff/
> 
> 
> Testing
> -------
> 
> Added new unit tests.  Ran unit tests, functional suite and tpch 100.
> 
> 
> Thanks,
> 
> Aman Sinha
> 
>


Re: Review Request 30503: Process the null comparisons for IS NOT DISTINCT FROM operator.

Posted by Aman Sinha <as...@maprtech.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/30503/
-----------------------------------------------------------

(Updated Feb. 4, 2015, 6:19 p.m.)


Review request for drill, Jinfeng Ni and Mehant Baid.


Changes
-------

Updated patch after addressing review comments. The JoinPrel will generate the condition string corresponding to SqlKind (either EQUALS or IS_NOT_DISTINCT_FROM) and the execution operator will check for this.  However, several existing join tests already have .json plans that contain '--' comparison operator, so I have preserved that check.


Bugs: DRILL-2092
    https://issues.apache.org/jira/browse/DRILL-2092


Repository: drill-git


Description
-------

For queries of the form SELECT a, COUNT(DISTINCT b), SUM(b) FROM T GROUP BY a,  Calcite generates a plan where there are 2 subqueries each of which corresponds to an aggregate function with group-by and the 2 subqeuries are joined in the outer query block on the group-by columns.  This join-back uses 'IS NOT DISTINCT FROM' rather than equality..e.g here's an extract from the Explain: 
      DrillJoinRel(condition=[AND(IS NOT DISTINCT FROM($0, $5), IS NOT DISTINCT FROM($1, $6))], joinType=[inner])

The IS NOT DISTINCT FROM differs from '=' in terms of null handling.  null == null is FALSE, whereas null IS NOT DISTINCT FROM null is TRUE.  This patch handles the nulls for this comparator both during planning and execution operations.


Diffs (updated)
-----

  exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/common/ChainedHashTable.java fd6a3e2 
  exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/join/HashJoinBatch.java 4af0292 
  exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/join/JoinUtils.java PRE-CREATION 
  exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/join/MergeJoinBatch.java 14bc094 
  exec/java-exec/src/main/java/org/apache/drill/exec/planner/physical/HashJoinPrel.java d9a7277 
  exec/java-exec/src/main/java/org/apache/drill/exec/planner/physical/JoinPrel.java d5473f2 
  exec/java-exec/src/main/java/org/apache/drill/exec/planner/physical/MergeJoinPrel.java f6b7ef6 
  exec/java-exec/src/test/java/org/apache/drill/exec/fn/impl/TestAggregateFunctions.java 2b3ff50 
  exec/java-exec/src/test/resources/agg/bugs/drill2092/input.json PRE-CREATION 
  exec/java-exec/src/test/resources/agg/bugs/drill2092/result.tsv PRE-CREATION 

Diff: https://reviews.apache.org/r/30503/diff/


Testing
-------

Added new unit tests.  Ran unit tests, functional suite and tpch 100.


Thanks,

Aman Sinha