You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@drill.apache.org by Mehant Baid <ba...@gmail.com> on 2015/02/18 20:21:30 UTC
Review Request 31160: DRILL-2244: Implicit cast for join conditions
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/31160/
-----------------------------------------------------------
Review request for drill and Aman Sinha.
Repository: drill-git
Description
-------
If the return type of the two expressions in the join condition are of different types then we need to inject a cast on one of the sides for comparison functions to work as expected
Diffs
-----
exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/common/ChainedHashTable.java ea19645
exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/join/JoinUtils.java 04f3bbe
exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/join/MergeJoinBatch.java 257b93e
exec/java-exec/src/test/java/org/apache/drill/TestExampleQueries.java 9d3b8dc
exec/java-exec/src/test/resources/parquetinput/department_decimal.parquet PRE-CREATION
exec/java-exec/src/test/resources/parquetinput/employee_decimal.parquet PRE-CREATION
Diff: https://reviews.apache.org/r/31160/diff/
Testing
-------
Added unit tests with hash join and merge join
Thanks,
Mehant Baid
Re: Review Request 31160: DRILL-2244: Implicit cast for join
conditions
Posted by Aman Sinha <as...@maprtech.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/31160/#review72989
-----------------------------------------------------------
exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/common/ChainedHashTable.java
<https://reviews.apache.org/r/31160/#comment119152>
This operation was previously done before adding the casts. Doing it after the cast seems fine from functional perspective; however if we are joining say Int to Bigint and the Int happens to be the build side, we will now create a bigint vector, so it will increase the memory footprint. It's probably worth noting this in the comments for future reference. One could argue that if the build and probe were reversed, we would have created the bigint vector anyways.
exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/join/JoinUtils.java
<https://reviews.apache.org/r/31160/#comment119116>
Since this method is being used by both HJ and MJ, we should change the error message to something more general: Join conditions cannot be compared; left expression: , right expression:
exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/join/MergeJoinBatch.java
<https://reviews.apache.org/r/31160/#comment119161>
It seems like it would be useful to keep this check such that we don't materialize either left or right side if the right side returns NONE.
- Aman Sinha
On Feb. 18, 2015, 7:21 p.m., Mehant Baid wrote:
>
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/31160/
> -----------------------------------------------------------
>
> (Updated Feb. 18, 2015, 7:21 p.m.)
>
>
> Review request for drill and Aman Sinha.
>
>
> Repository: drill-git
>
>
> Description
> -------
>
> If the return type of the two expressions in the join condition are of different types then we need to inject a cast on one of the sides for comparison functions to work as expected
>
>
> Diffs
> -----
>
> exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/common/ChainedHashTable.java ea19645
> exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/join/JoinUtils.java 04f3bbe
> exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/join/MergeJoinBatch.java 257b93e
> exec/java-exec/src/test/java/org/apache/drill/TestExampleQueries.java 9d3b8dc
> exec/java-exec/src/test/resources/parquetinput/department_decimal.parquet PRE-CREATION
> exec/java-exec/src/test/resources/parquetinput/employee_decimal.parquet PRE-CREATION
>
> Diff: https://reviews.apache.org/r/31160/diff/
>
>
> Testing
> -------
>
> Added unit tests with hash join and merge join
>
>
> Thanks,
>
> Mehant Baid
>
>
Re: Review Request 31160: DRILL-2244: Implicit cast for join
conditions
Posted by Aman Sinha <as...@maprtech.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/31160/#review78788
-----------------------------------------------------------
Ship it!
Updated patch looks ok to me. Couple of suggestions about testing: the testing section says 'added unit tests for HJ and MJ' but I did not see those...I am thinking it is already covered by existing HJ and MJ tests as long as you set slice_target = 1 to force exchanges. Note that TpchDistributed sets slice_target = 1 but most of those plans are HashJoin plans. So you might want to check that out. Also, it would be useful to have a negative test case where we don't expect the hash as double (e.g for aggregates or for non-numeric types).
- Aman Sinha
On April 3, 2015, 8:13 a.m., Mehant Baid wrote:
>
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/31160/
> -----------------------------------------------------------
>
> (Updated April 3, 2015, 8:13 a.m.)
>
>
> Review request for drill and Aman Sinha.
>
>
> Repository: drill-git
>
>
> Description
> -------
>
> If the return type of the two expressions in the join condition are of different types then we need to inject a cast on one of the sides for comparison functions to work as expected
>
>
> Diffs
> -----
>
> exec/java-exec/src/main/java/org/apache/drill/exec/expr/fn/impl/Hash64AsDouble.java PRE-CREATION
> exec/java-exec/src/main/java/org/apache/drill/exec/expr/fn/impl/Hash64Functions.java 57154ed
> exec/java-exec/src/main/java/org/apache/drill/exec/expr/fn/impl/Hash64FunctionsWithSeed.java b9ec956
> exec/java-exec/src/main/java/org/apache/drill/exec/expr/fn/impl/Hash64WithSeedAsDouble.java PRE-CREATION
> exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/common/ChainedHashTable.java 84a2956
> exec/java-exec/src/main/java/org/apache/drill/exec/planner/physical/PrelUtil.java f7c144f
> exec/java-exec/src/main/java/org/apache/drill/exec/resolver/TypeCastRules.java d8652f2
> exec/java-exec/src/test/java/org/apache/drill/TestFunctionsQuery.java f1005ab
>
> Diff: https://reviews.apache.org/r/31160/diff/
>
>
> Testing
> -------
>
> Added unit tests with hash join and merge join
>
>
> Thanks,
>
> Mehant Baid
>
>
Re: Review Request 31160: DRILL-2244: Implicit cast for join
conditions
Posted by Mehant Baid <ba...@gmail.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/31160/
-----------------------------------------------------------
(Updated April 3, 2015, 8:13 a.m.)
Review request for drill and Aman Sinha.
Changes
-------
Updated patch to address review comment from Aman to not use hashing as double for aggregates and only to use it for joins and distribution.
Repository: drill-git
Description
-------
If the return type of the two expressions in the join condition are of different types then we need to inject a cast on one of the sides for comparison functions to work as expected
Diffs (updated)
-----
exec/java-exec/src/main/java/org/apache/drill/exec/expr/fn/impl/Hash64AsDouble.java PRE-CREATION
exec/java-exec/src/main/java/org/apache/drill/exec/expr/fn/impl/Hash64Functions.java 57154ed
exec/java-exec/src/main/java/org/apache/drill/exec/expr/fn/impl/Hash64FunctionsWithSeed.java b9ec956
exec/java-exec/src/main/java/org/apache/drill/exec/expr/fn/impl/Hash64WithSeedAsDouble.java PRE-CREATION
exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/common/ChainedHashTable.java 84a2956
exec/java-exec/src/main/java/org/apache/drill/exec/planner/physical/PrelUtil.java f7c144f
exec/java-exec/src/main/java/org/apache/drill/exec/resolver/TypeCastRules.java d8652f2
exec/java-exec/src/test/java/org/apache/drill/TestFunctionsQuery.java f1005ab
Diff: https://reviews.apache.org/r/31160/diff/
Testing
-------
Added unit tests with hash join and merge join
Thanks,
Mehant Baid
Re: Review Request 31160: DRILL-2244: Implicit cast for join
conditions
Posted by Mehant Baid <ba...@gmail.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/31160/
-----------------------------------------------------------
(Updated April 3, 2015, 12:48 a.m.)
Review request for drill and Aman Sinha.
Repository: drill-git
Description
-------
If the return type of the two expressions in the join condition are of different types then we need to inject a cast on one of the sides for comparison functions to work as expected
Diffs (updated)
-----
exec/java-exec/src/main/java/org/apache/drill/exec/expr/fn/impl/Hash64Functions.java 57154ed
exec/java-exec/src/main/java/org/apache/drill/exec/expr/fn/impl/Hash64FunctionsWithSeed.java b9ec956
exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/common/ChainedHashTable.java 84a2956
exec/java-exec/src/main/java/org/apache/drill/exec/resolver/TypeCastRules.java d8652f2
exec/java-exec/src/test/java/org/apache/drill/TestFunctionsQuery.java f1005ab
Diff: https://reviews.apache.org/r/31160/diff/
Testing
-------
Added unit tests with hash join and merge join
Thanks,
Mehant Baid
Re: Review Request 31160: DRILL-2244: Implicit cast for join
conditions
Posted by Mehant Baid <ba...@gmail.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/31160/
-----------------------------------------------------------
(Updated April 2, 2015, 6:57 p.m.)
Review request for drill and Aman Sinha.
Changes
-------
Now we hash all numeric data types as double, this would fix the distribution issue and we can apply implicit casting in the join.
Repository: drill-git
Description
-------
If the return type of the two expressions in the join condition are of different types then we need to inject a cast on one of the sides for comparison functions to work as expected
Diffs (updated)
-----
exec/java-exec/src/main/java/org/apache/drill/exec/expr/fn/impl/Hash64Functions.java 57154ed
exec/java-exec/src/main/java/org/apache/drill/exec/expr/fn/impl/Hash64FunctionsWithSeed.java b9ec956
exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/common/ChainedHashTable.java 84a2956
exec/java-exec/src/main/java/org/apache/drill/exec/resolver/TypeCastRules.java d8652f2
exec/java-exec/src/test/java/org/apache/drill/TestFunctionsQuery.java f1005ab
Diff: https://reviews.apache.org/r/31160/diff/
Testing
-------
Added unit tests with hash join and merge join
Thanks,
Mehant Baid
Re: Review Request 31160: DRILL-2244: Implicit cast for join
conditions
Posted by Aman Sinha <as...@maprtech.com>.
> On Feb. 19, 2015, 3:30 a.m., Jacques Nadeau wrote:
> > This isn't legal unless we know that the distribution keys were hashed the same way. Our thinking is that we should make sure that all numeric values hash the same way. For example, all convert to double and then are hashed. We need to evaluate the performance impact. When that isn't possible, we should throw a run time schema change exception and require the user to express the query with an explicit cast. Otherwise we'll return wrong result.
Good point..I thought this issue was a follow-up to an older bug DRILL-1125 which was fixed in 0.6. True, the implicit cast join will work only if the data is co-located or for broadcast joins.
- Aman
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/31160/#review73080
-----------------------------------------------------------
On Feb. 19, 2015, 2:23 a.m., Mehant Baid wrote:
>
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/31160/
> -----------------------------------------------------------
>
> (Updated Feb. 19, 2015, 2:23 a.m.)
>
>
> Review request for drill and Aman Sinha.
>
>
> Repository: drill-git
>
>
> Description
> -------
>
> If the return type of the two expressions in the join condition are of different types then we need to inject a cast on one of the sides for comparison functions to work as expected
>
>
> Diffs
> -----
>
> exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/common/ChainedHashTable.java ea19645
> exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/join/JoinUtils.java 04f3bbe
> exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/join/MergeJoinBatch.java 257b93e
> exec/java-exec/src/test/java/org/apache/drill/TestExampleQueries.java 55fb59f
> exec/java-exec/src/test/resources/parquetinput/department_decimal.parquet PRE-CREATION
> exec/java-exec/src/test/resources/parquetinput/employee_decimal.parquet PRE-CREATION
>
> Diff: https://reviews.apache.org/r/31160/diff/
>
>
> Testing
> -------
>
> Added unit tests with hash join and merge join
>
>
> Thanks,
>
> Mehant Baid
>
>
Re: Review Request 31160: DRILL-2244: Implicit cast for join
conditions
Posted by Jacques Nadeau <ja...@gmail.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/31160/#review73080
-----------------------------------------------------------
This isn't legal unless we know that the distribution keys were hashed the same way. Our thinking is that we should make sure that all numeric values hash the same way. For example, all convert to double and then are hashed. We need to evaluate the performance impact. When that isn't possible, we should throw a run time schema change exception and require the user to express the query with an explicit cast. Otherwise we'll return wrong result.
- Jacques Nadeau
On Feb. 19, 2015, 2:23 a.m., Mehant Baid wrote:
>
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/31160/
> -----------------------------------------------------------
>
> (Updated Feb. 19, 2015, 2:23 a.m.)
>
>
> Review request for drill and Aman Sinha.
>
>
> Repository: drill-git
>
>
> Description
> -------
>
> If the return type of the two expressions in the join condition are of different types then we need to inject a cast on one of the sides for comparison functions to work as expected
>
>
> Diffs
> -----
>
> exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/common/ChainedHashTable.java ea19645
> exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/join/JoinUtils.java 04f3bbe
> exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/join/MergeJoinBatch.java 257b93e
> exec/java-exec/src/test/java/org/apache/drill/TestExampleQueries.java 55fb59f
> exec/java-exec/src/test/resources/parquetinput/department_decimal.parquet PRE-CREATION
> exec/java-exec/src/test/resources/parquetinput/employee_decimal.parquet PRE-CREATION
>
> Diff: https://reviews.apache.org/r/31160/diff/
>
>
> Testing
> -------
>
> Added unit tests with hash join and merge join
>
>
> Thanks,
>
> Mehant Baid
>
>
Re: Review Request 31160: DRILL-2244: Implicit cast for join
conditions
Posted by Aman Sinha <as...@maprtech.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/31160/#review73073
-----------------------------------------------------------
Ship it!
Ship It!
- Aman Sinha
On Feb. 19, 2015, 2:23 a.m., Mehant Baid wrote:
>
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/31160/
> -----------------------------------------------------------
>
> (Updated Feb. 19, 2015, 2:23 a.m.)
>
>
> Review request for drill and Aman Sinha.
>
>
> Repository: drill-git
>
>
> Description
> -------
>
> If the return type of the two expressions in the join condition are of different types then we need to inject a cast on one of the sides for comparison functions to work as expected
>
>
> Diffs
> -----
>
> exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/common/ChainedHashTable.java ea19645
> exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/join/JoinUtils.java 04f3bbe
> exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/join/MergeJoinBatch.java 257b93e
> exec/java-exec/src/test/java/org/apache/drill/TestExampleQueries.java 55fb59f
> exec/java-exec/src/test/resources/parquetinput/department_decimal.parquet PRE-CREATION
> exec/java-exec/src/test/resources/parquetinput/employee_decimal.parquet PRE-CREATION
>
> Diff: https://reviews.apache.org/r/31160/diff/
>
>
> Testing
> -------
>
> Added unit tests with hash join and merge join
>
>
> Thanks,
>
> Mehant Baid
>
>
Re: Review Request 31160: DRILL-2244: Implicit cast for join
conditions
Posted by Mehant Baid <ba...@gmail.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/31160/
-----------------------------------------------------------
(Updated Feb. 19, 2015, 2:23 a.m.)
Review request for drill and Aman Sinha.
Changes
-------
Updated patch based on review comments
Repository: drill-git
Description
-------
If the return type of the two expressions in the join condition are of different types then we need to inject a cast on one of the sides for comparison functions to work as expected
Diffs (updated)
-----
exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/common/ChainedHashTable.java ea19645
exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/join/JoinUtils.java 04f3bbe
exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/join/MergeJoinBatch.java 257b93e
exec/java-exec/src/test/java/org/apache/drill/TestExampleQueries.java 55fb59f
exec/java-exec/src/test/resources/parquetinput/department_decimal.parquet PRE-CREATION
exec/java-exec/src/test/resources/parquetinput/employee_decimal.parquet PRE-CREATION
Diff: https://reviews.apache.org/r/31160/diff/
Testing
-------
Added unit tests with hash join and merge join
Thanks,
Mehant Baid