You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@drill.apache.org by Aman Sinha <as...@maprtech.com> on 2015/02/02 19:15:17 UTC
Review Request 30503: Process the null comparisons for IS NOT
DISTINCT FROM operator.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/30503/
-----------------------------------------------------------
Review request for drill, Jinfeng Ni and Mehant Baid.
Bugs: DRILL-2092
https://issues.apache.org/jira/browse/DRILL-2092
Repository: drill-git
Description
-------
For queries of the form SELECT a, COUNT(DISTINCT b), SUM(b) FROM T GROUP BY a, Calcite generates a plan where there are 2 subqueries each of which corresponds to an aggregate function with group-by and the 2 subqeuries are joined in the outer query block on the group-by columns. This join-back uses 'IS NOT DISTINCT FROM' rather than equality..e.g here's an extract from the Explain:
DrillJoinRel(condition=[AND(IS NOT DISTINCT FROM($0, $5), IS NOT DISTINCT FROM($1, $6))], joinType=[inner])
The IS NOT DISTINCT FROM differs from '=' in terms of null handling. null == null is FALSE, whereas null IS NOT DISTINCT FROM null is TRUE. This patch handles the nulls for this comparator both during planning and execution operations.
Diffs
-----
exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/common/ChainedHashTable.java fd6a3e2
exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/join/HashJoinBatch.java 4af0292
exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/join/JoinUtils.java PRE-CREATION
exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/join/MergeJoinBatch.java 14bc094
exec/java-exec/src/main/java/org/apache/drill/exec/planner/physical/HashJoinPrel.java d9a7277
exec/java-exec/src/main/java/org/apache/drill/exec/planner/physical/JoinPrel.java d5473f2
exec/java-exec/src/main/java/org/apache/drill/exec/planner/physical/MergeJoinPrel.java f6b7ef6
exec/java-exec/src/test/java/org/apache/drill/exec/fn/impl/TestAggregateFunctions.java 2b3ff50
Diff: https://reviews.apache.org/r/30503/diff/
Testing
-------
Added new unit tests. Ran unit tests, functional suite and tpch 100.
Thanks,
Aman Sinha
Re: Review Request 30503: Process the null comparisons for IS NOT
DISTINCT FROM operator.
Posted by Jinfeng Ni <jn...@maprtech.com>.
> On Feb. 2, 2015, 5:39 p.m., Jinfeng Ni wrote:
> > exec/java-exec/src/test/java/org/apache/drill/exec/fn/impl/TestAggregateFunctions.java, line 68
> > <https://reviews.apache.org/r/30503/diff/1/?file=843838#file843838line68>
> >
> > Because of this transformation of join, Drill will not be able to handle case without "group by", because it will convert into cartesian join.
> >
> > select a1, count(distinct a1), sum(b1) from t1;
> >
> > Maybe we should log a new issue for this particular case?
>
> Aman Sinha wrote:
> There might be an open JIRA .. I will check.
The example query should be:
select count(distinct a1), sum(b1) from t1;
- Jinfeng
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/30503/#review70688
-----------------------------------------------------------
On Feb. 4, 2015, 10:19 a.m., Aman Sinha wrote:
>
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/30503/
> -----------------------------------------------------------
>
> (Updated Feb. 4, 2015, 10:19 a.m.)
>
>
> Review request for drill, Jinfeng Ni and Mehant Baid.
>
>
> Bugs: DRILL-2092
> https://issues.apache.org/jira/browse/DRILL-2092
>
>
> Repository: drill-git
>
>
> Description
> -------
>
> For queries of the form SELECT a, COUNT(DISTINCT b), SUM(b) FROM T GROUP BY a, Calcite generates a plan where there are 2 subqueries each of which corresponds to an aggregate function with group-by and the 2 subqeuries are joined in the outer query block on the group-by columns. This join-back uses 'IS NOT DISTINCT FROM' rather than equality..e.g here's an extract from the Explain:
> DrillJoinRel(condition=[AND(IS NOT DISTINCT FROM($0, $5), IS NOT DISTINCT FROM($1, $6))], joinType=[inner])
>
> The IS NOT DISTINCT FROM differs from '=' in terms of null handling. null == null is FALSE, whereas null IS NOT DISTINCT FROM null is TRUE. This patch handles the nulls for this comparator both during planning and execution operations.
>
>
> Diffs
> -----
>
> exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/common/ChainedHashTable.java fd6a3e2
> exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/join/HashJoinBatch.java 4af0292
> exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/join/JoinUtils.java PRE-CREATION
> exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/join/MergeJoinBatch.java 14bc094
> exec/java-exec/src/main/java/org/apache/drill/exec/planner/physical/HashJoinPrel.java d9a7277
> exec/java-exec/src/main/java/org/apache/drill/exec/planner/physical/JoinPrel.java d5473f2
> exec/java-exec/src/main/java/org/apache/drill/exec/planner/physical/MergeJoinPrel.java f6b7ef6
> exec/java-exec/src/test/java/org/apache/drill/exec/fn/impl/TestAggregateFunctions.java 2b3ff50
> exec/java-exec/src/test/resources/agg/bugs/drill2092/input.json PRE-CREATION
> exec/java-exec/src/test/resources/agg/bugs/drill2092/result.tsv PRE-CREATION
>
> Diff: https://reviews.apache.org/r/30503/diff/
>
>
> Testing
> -------
>
> Added new unit tests. Ran unit tests, functional suite and tpch 100.
>
>
> Thanks,
>
> Aman Sinha
>
>
Re: Review Request 30503: Process the null comparisons for IS NOT
DISTINCT FROM operator.
Posted by Aman Sinha <as...@maprtech.com>.
> On Feb. 3, 2015, 1:39 a.m., Jinfeng Ni wrote:
> > exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/join/JoinUtils.java, line 35
> > <https://reviews.apache.org/r/30503/diff/1/?file=843833#file843833line35>
> >
> > Is it better to use "equal", instead of "startsWith"? If someone put any string starts with "=" in the physical plan, will the code treat it as an EQUALS?
The comparison expression is actually something like '=($2, $4)' which is why I was not able to use exact equality. But I will make this checking smarter..
> On Feb. 3, 2015, 1:39 a.m., Jinfeng Ni wrote:
> > exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/join/JoinUtils.java, line 42
> > <https://reviews.apache.org/r/30503/diff/1/?file=843833#file843833line42>
> >
> > Same as the startsWith("=").
see above..
> On Feb. 3, 2015, 1:39 a.m., Jinfeng Ni wrote:
> > exec/java-exec/src/main/java/org/apache/drill/exec/planner/physical/JoinPrel.java, line 127
> > <https://reviews.apache.org/r/30503/diff/1/?file=843836#file843836line127>
> >
> > In stead of convert the condition into string first, the call split(), can we check the condition is a SqlBinaryOperator and it's SqlKind is IS_NOT_DISTINCT_FROM? Using string comparison sometimes could cause issue, when the sql parser/planner uses a different string for that operator.
Let me look into it...should be doable.
> On Feb. 3, 2015, 1:39 a.m., Jinfeng Ni wrote:
> > exec/java-exec/src/test/java/org/apache/drill/exec/fn/impl/TestAggregateFunctions.java, line 68
> > <https://reviews.apache.org/r/30503/diff/1/?file=843838#file843838line68>
> >
> > Because of this transformation of join, Drill will not be able to handle case without "group by", because it will convert into cartesian join.
> >
> > select a1, count(distinct a1), sum(b1) from t1;
> >
> > Maybe we should log a new issue for this particular case?
There might be an open JIRA .. I will check.
- Aman
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/30503/#review70688
-----------------------------------------------------------
On Feb. 2, 2015, 6:15 p.m., Aman Sinha wrote:
>
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/30503/
> -----------------------------------------------------------
>
> (Updated Feb. 2, 2015, 6:15 p.m.)
>
>
> Review request for drill, Jinfeng Ni and Mehant Baid.
>
>
> Bugs: DRILL-2092
> https://issues.apache.org/jira/browse/DRILL-2092
>
>
> Repository: drill-git
>
>
> Description
> -------
>
> For queries of the form SELECT a, COUNT(DISTINCT b), SUM(b) FROM T GROUP BY a, Calcite generates a plan where there are 2 subqueries each of which corresponds to an aggregate function with group-by and the 2 subqeuries are joined in the outer query block on the group-by columns. This join-back uses 'IS NOT DISTINCT FROM' rather than equality..e.g here's an extract from the Explain:
> DrillJoinRel(condition=[AND(IS NOT DISTINCT FROM($0, $5), IS NOT DISTINCT FROM($1, $6))], joinType=[inner])
>
> The IS NOT DISTINCT FROM differs from '=' in terms of null handling. null == null is FALSE, whereas null IS NOT DISTINCT FROM null is TRUE. This patch handles the nulls for this comparator both during planning and execution operations.
>
>
> Diffs
> -----
>
> exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/common/ChainedHashTable.java fd6a3e2
> exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/join/HashJoinBatch.java 4af0292
> exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/join/JoinUtils.java PRE-CREATION
> exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/join/MergeJoinBatch.java 14bc094
> exec/java-exec/src/main/java/org/apache/drill/exec/planner/physical/HashJoinPrel.java d9a7277
> exec/java-exec/src/main/java/org/apache/drill/exec/planner/physical/JoinPrel.java d5473f2
> exec/java-exec/src/main/java/org/apache/drill/exec/planner/physical/MergeJoinPrel.java f6b7ef6
> exec/java-exec/src/test/java/org/apache/drill/exec/fn/impl/TestAggregateFunctions.java 2b3ff50
>
> Diff: https://reviews.apache.org/r/30503/diff/
>
>
> Testing
> -------
>
> Added new unit tests. Ran unit tests, functional suite and tpch 100.
>
>
> Thanks,
>
> Aman Sinha
>
>
Re: Review Request 30503: Process the null comparisons for IS NOT
DISTINCT FROM operator.
Posted by Jinfeng Ni <jn...@maprtech.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/30503/#review70688
-----------------------------------------------------------
exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/join/JoinUtils.java
<https://reviews.apache.org/r/30503/#comment116027>
Is it better to use "equal", instead of "startsWith"? If someone put any string starts with "=" in the physical plan, will the code treat it as an EQUALS?
exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/join/JoinUtils.java
<https://reviews.apache.org/r/30503/#comment116028>
Same as the startsWith("=").
exec/java-exec/src/main/java/org/apache/drill/exec/planner/physical/JoinPrel.java
<https://reviews.apache.org/r/30503/#comment116031>
In stead of convert the condition into string first, the call split(), can we check the condition is a SqlBinaryOperator and it's SqlKind is IS_NOT_DISTINCT_FROM? Using string comparison sometimes could cause issue, when the sql parser/planner uses a different string for that operator.
exec/java-exec/src/main/java/org/apache/drill/exec/planner/physical/JoinPrel.java
<https://reviews.apache.org/r/30503/#comment116032>
Same comment as in line 125.
exec/java-exec/src/test/java/org/apache/drill/exec/fn/impl/TestAggregateFunctions.java
<https://reviews.apache.org/r/30503/#comment116033>
Because of this transformation of join, Drill will not be able to handle case without "group by", because it will convert into cartesian join.
select a1, count(distinct a1), sum(b1) from t1;
Maybe we should log a new issue for this particular case?
- Jinfeng Ni
On Feb. 2, 2015, 10:15 a.m., Aman Sinha wrote:
>
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/30503/
> -----------------------------------------------------------
>
> (Updated Feb. 2, 2015, 10:15 a.m.)
>
>
> Review request for drill, Jinfeng Ni and Mehant Baid.
>
>
> Bugs: DRILL-2092
> https://issues.apache.org/jira/browse/DRILL-2092
>
>
> Repository: drill-git
>
>
> Description
> -------
>
> For queries of the form SELECT a, COUNT(DISTINCT b), SUM(b) FROM T GROUP BY a, Calcite generates a plan where there are 2 subqueries each of which corresponds to an aggregate function with group-by and the 2 subqeuries are joined in the outer query block on the group-by columns. This join-back uses 'IS NOT DISTINCT FROM' rather than equality..e.g here's an extract from the Explain:
> DrillJoinRel(condition=[AND(IS NOT DISTINCT FROM($0, $5), IS NOT DISTINCT FROM($1, $6))], joinType=[inner])
>
> The IS NOT DISTINCT FROM differs from '=' in terms of null handling. null == null is FALSE, whereas null IS NOT DISTINCT FROM null is TRUE. This patch handles the nulls for this comparator both during planning and execution operations.
>
>
> Diffs
> -----
>
> exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/common/ChainedHashTable.java fd6a3e2
> exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/join/HashJoinBatch.java 4af0292
> exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/join/JoinUtils.java PRE-CREATION
> exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/join/MergeJoinBatch.java 14bc094
> exec/java-exec/src/main/java/org/apache/drill/exec/planner/physical/HashJoinPrel.java d9a7277
> exec/java-exec/src/main/java/org/apache/drill/exec/planner/physical/JoinPrel.java d5473f2
> exec/java-exec/src/main/java/org/apache/drill/exec/planner/physical/MergeJoinPrel.java f6b7ef6
> exec/java-exec/src/test/java/org/apache/drill/exec/fn/impl/TestAggregateFunctions.java 2b3ff50
>
> Diff: https://reviews.apache.org/r/30503/diff/
>
>
> Testing
> -------
>
> Added new unit tests. Ran unit tests, functional suite and tpch 100.
>
>
> Thanks,
>
> Aman Sinha
>
>
Re: Review Request 30503: Process the null comparisons for IS NOT
DISTINCT FROM operator.
Posted by Jinfeng Ni <jn...@maprtech.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/30503/#review71068
-----------------------------------------------------------
Ship it!
Ship It!
- Jinfeng Ni
On Feb. 4, 2015, 10:19 a.m., Aman Sinha wrote:
>
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/30503/
> -----------------------------------------------------------
>
> (Updated Feb. 4, 2015, 10:19 a.m.)
>
>
> Review request for drill, Jinfeng Ni and Mehant Baid.
>
>
> Bugs: DRILL-2092
> https://issues.apache.org/jira/browse/DRILL-2092
>
>
> Repository: drill-git
>
>
> Description
> -------
>
> For queries of the form SELECT a, COUNT(DISTINCT b), SUM(b) FROM T GROUP BY a, Calcite generates a plan where there are 2 subqueries each of which corresponds to an aggregate function with group-by and the 2 subqeuries are joined in the outer query block on the group-by columns. This join-back uses 'IS NOT DISTINCT FROM' rather than equality..e.g here's an extract from the Explain:
> DrillJoinRel(condition=[AND(IS NOT DISTINCT FROM($0, $5), IS NOT DISTINCT FROM($1, $6))], joinType=[inner])
>
> The IS NOT DISTINCT FROM differs from '=' in terms of null handling. null == null is FALSE, whereas null IS NOT DISTINCT FROM null is TRUE. This patch handles the nulls for this comparator both during planning and execution operations.
>
>
> Diffs
> -----
>
> exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/common/ChainedHashTable.java fd6a3e2
> exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/join/HashJoinBatch.java 4af0292
> exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/join/JoinUtils.java PRE-CREATION
> exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/join/MergeJoinBatch.java 14bc094
> exec/java-exec/src/main/java/org/apache/drill/exec/planner/physical/HashJoinPrel.java d9a7277
> exec/java-exec/src/main/java/org/apache/drill/exec/planner/physical/JoinPrel.java d5473f2
> exec/java-exec/src/main/java/org/apache/drill/exec/planner/physical/MergeJoinPrel.java f6b7ef6
> exec/java-exec/src/test/java/org/apache/drill/exec/fn/impl/TestAggregateFunctions.java 2b3ff50
> exec/java-exec/src/test/resources/agg/bugs/drill2092/input.json PRE-CREATION
> exec/java-exec/src/test/resources/agg/bugs/drill2092/result.tsv PRE-CREATION
>
> Diff: https://reviews.apache.org/r/30503/diff/
>
>
> Testing
> -------
>
> Added new unit tests. Ran unit tests, functional suite and tpch 100.
>
>
> Thanks,
>
> Aman Sinha
>
>
Re: Review Request 30503: Process the null comparisons for IS NOT
DISTINCT FROM operator.
Posted by Aman Sinha <as...@maprtech.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/30503/
-----------------------------------------------------------
(Updated Feb. 4, 2015, 6:19 p.m.)
Review request for drill, Jinfeng Ni and Mehant Baid.
Changes
-------
Updated patch after addressing review comments. The JoinPrel will generate the condition string corresponding to SqlKind (either EQUALS or IS_NOT_DISTINCT_FROM) and the execution operator will check for this. However, several existing join tests already have .json plans that contain '--' comparison operator, so I have preserved that check.
Bugs: DRILL-2092
https://issues.apache.org/jira/browse/DRILL-2092
Repository: drill-git
Description
-------
For queries of the form SELECT a, COUNT(DISTINCT b), SUM(b) FROM T GROUP BY a, Calcite generates a plan where there are 2 subqueries each of which corresponds to an aggregate function with group-by and the 2 subqeuries are joined in the outer query block on the group-by columns. This join-back uses 'IS NOT DISTINCT FROM' rather than equality..e.g here's an extract from the Explain:
DrillJoinRel(condition=[AND(IS NOT DISTINCT FROM($0, $5), IS NOT DISTINCT FROM($1, $6))], joinType=[inner])
The IS NOT DISTINCT FROM differs from '=' in terms of null handling. null == null is FALSE, whereas null IS NOT DISTINCT FROM null is TRUE. This patch handles the nulls for this comparator both during planning and execution operations.
Diffs (updated)
-----
exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/common/ChainedHashTable.java fd6a3e2
exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/join/HashJoinBatch.java 4af0292
exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/join/JoinUtils.java PRE-CREATION
exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/join/MergeJoinBatch.java 14bc094
exec/java-exec/src/main/java/org/apache/drill/exec/planner/physical/HashJoinPrel.java d9a7277
exec/java-exec/src/main/java/org/apache/drill/exec/planner/physical/JoinPrel.java d5473f2
exec/java-exec/src/main/java/org/apache/drill/exec/planner/physical/MergeJoinPrel.java f6b7ef6
exec/java-exec/src/test/java/org/apache/drill/exec/fn/impl/TestAggregateFunctions.java 2b3ff50
exec/java-exec/src/test/resources/agg/bugs/drill2092/input.json PRE-CREATION
exec/java-exec/src/test/resources/agg/bugs/drill2092/result.tsv PRE-CREATION
Diff: https://reviews.apache.org/r/30503/diff/
Testing
-------
Added new unit tests. Ran unit tests, functional suite and tpch 100.
Thanks,
Aman Sinha