You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@drill.apache.org by Mehant Baid <ba...@gmail.com> on 2015/02/18 20:21:30 UTC

Review Request 31160: DRILL-2244: Implicit cast for join conditions

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/31160/
-----------------------------------------------------------

Review request for drill and Aman Sinha.


Repository: drill-git


Description
-------

If the return type of the two expressions in the join condition are of different types then we need to inject a cast on one of the sides for comparison functions to work as expected


Diffs
-----

  exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/common/ChainedHashTable.java ea19645 
  exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/join/JoinUtils.java 04f3bbe 
  exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/join/MergeJoinBatch.java 257b93e 
  exec/java-exec/src/test/java/org/apache/drill/TestExampleQueries.java 9d3b8dc 
  exec/java-exec/src/test/resources/parquetinput/department_decimal.parquet PRE-CREATION 
  exec/java-exec/src/test/resources/parquetinput/employee_decimal.parquet PRE-CREATION 

Diff: https://reviews.apache.org/r/31160/diff/


Testing
-------

Added unit tests with hash join and merge join


Thanks,

Mehant Baid


Re: Review Request 31160: DRILL-2244: Implicit cast for join conditions

Posted by Aman Sinha <as...@maprtech.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/31160/#review72989
-----------------------------------------------------------



exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/common/ChainedHashTable.java
<https://reviews.apache.org/r/31160/#comment119152>

    This operation was previously done before adding the casts. Doing it after the cast seems fine from functional perspective; however if we are joining say Int to Bigint and the Int happens to be the build side, we will now create a bigint vector, so it will increase the memory footprint. It's probably worth noting this in the comments for future reference.  One could argue that if the build and probe were reversed, we would have created the bigint vector anyways.



exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/join/JoinUtils.java
<https://reviews.apache.org/r/31160/#comment119116>

    Since this method is being used by both HJ and MJ, we should change the error message to something more general: Join conditions cannot be compared; left expression: , right expression:



exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/join/MergeJoinBatch.java
<https://reviews.apache.org/r/31160/#comment119161>

    It seems like it would be useful to keep this check such that we don't materialize either left or right side if the right side returns NONE.


- Aman Sinha


On Feb. 18, 2015, 7:21 p.m., Mehant Baid wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/31160/
> -----------------------------------------------------------
> 
> (Updated Feb. 18, 2015, 7:21 p.m.)
> 
> 
> Review request for drill and Aman Sinha.
> 
> 
> Repository: drill-git
> 
> 
> Description
> -------
> 
> If the return type of the two expressions in the join condition are of different types then we need to inject a cast on one of the sides for comparison functions to work as expected
> 
> 
> Diffs
> -----
> 
>   exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/common/ChainedHashTable.java ea19645 
>   exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/join/JoinUtils.java 04f3bbe 
>   exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/join/MergeJoinBatch.java 257b93e 
>   exec/java-exec/src/test/java/org/apache/drill/TestExampleQueries.java 9d3b8dc 
>   exec/java-exec/src/test/resources/parquetinput/department_decimal.parquet PRE-CREATION 
>   exec/java-exec/src/test/resources/parquetinput/employee_decimal.parquet PRE-CREATION 
> 
> Diff: https://reviews.apache.org/r/31160/diff/
> 
> 
> Testing
> -------
> 
> Added unit tests with hash join and merge join
> 
> 
> Thanks,
> 
> Mehant Baid
> 
>


Re: Review Request 31160: DRILL-2244: Implicit cast for join conditions

Posted by Aman Sinha <as...@maprtech.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/31160/#review78788
-----------------------------------------------------------

Ship it!


Updated patch looks ok to me.  Couple of suggestions about testing:  the testing section says 'added unit tests for HJ and MJ' but I did not see those...I am thinking it is already covered by existing HJ and MJ tests as long as you set slice_target = 1 to force exchanges. Note that TpchDistributed sets slice_target = 1 but most of those plans are HashJoin plans.  So you might want to check that out.  Also, it would be useful to have a negative test case where we don't expect the hash as double (e.g for aggregates or for non-numeric types).

- Aman Sinha


On April 3, 2015, 8:13 a.m., Mehant Baid wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/31160/
> -----------------------------------------------------------
> 
> (Updated April 3, 2015, 8:13 a.m.)
> 
> 
> Review request for drill and Aman Sinha.
> 
> 
> Repository: drill-git
> 
> 
> Description
> -------
> 
> If the return type of the two expressions in the join condition are of different types then we need to inject a cast on one of the sides for comparison functions to work as expected
> 
> 
> Diffs
> -----
> 
>   exec/java-exec/src/main/java/org/apache/drill/exec/expr/fn/impl/Hash64AsDouble.java PRE-CREATION 
>   exec/java-exec/src/main/java/org/apache/drill/exec/expr/fn/impl/Hash64Functions.java 57154ed 
>   exec/java-exec/src/main/java/org/apache/drill/exec/expr/fn/impl/Hash64FunctionsWithSeed.java b9ec956 
>   exec/java-exec/src/main/java/org/apache/drill/exec/expr/fn/impl/Hash64WithSeedAsDouble.java PRE-CREATION 
>   exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/common/ChainedHashTable.java 84a2956 
>   exec/java-exec/src/main/java/org/apache/drill/exec/planner/physical/PrelUtil.java f7c144f 
>   exec/java-exec/src/main/java/org/apache/drill/exec/resolver/TypeCastRules.java d8652f2 
>   exec/java-exec/src/test/java/org/apache/drill/TestFunctionsQuery.java f1005ab 
> 
> Diff: https://reviews.apache.org/r/31160/diff/
> 
> 
> Testing
> -------
> 
> Added unit tests with hash join and merge join
> 
> 
> Thanks,
> 
> Mehant Baid
> 
>


Re: Review Request 31160: DRILL-2244: Implicit cast for join conditions

Posted by Mehant Baid <ba...@gmail.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/31160/
-----------------------------------------------------------

(Updated April 3, 2015, 8:13 a.m.)


Review request for drill and Aman Sinha.


Changes
-------

Updated patch to address review comment from Aman to not use hashing as double for aggregates and only to use it for joins and distribution.


Repository: drill-git


Description
-------

If the return type of the two expressions in the join condition are of different types then we need to inject a cast on one of the sides for comparison functions to work as expected


Diffs (updated)
-----

  exec/java-exec/src/main/java/org/apache/drill/exec/expr/fn/impl/Hash64AsDouble.java PRE-CREATION 
  exec/java-exec/src/main/java/org/apache/drill/exec/expr/fn/impl/Hash64Functions.java 57154ed 
  exec/java-exec/src/main/java/org/apache/drill/exec/expr/fn/impl/Hash64FunctionsWithSeed.java b9ec956 
  exec/java-exec/src/main/java/org/apache/drill/exec/expr/fn/impl/Hash64WithSeedAsDouble.java PRE-CREATION 
  exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/common/ChainedHashTable.java 84a2956 
  exec/java-exec/src/main/java/org/apache/drill/exec/planner/physical/PrelUtil.java f7c144f 
  exec/java-exec/src/main/java/org/apache/drill/exec/resolver/TypeCastRules.java d8652f2 
  exec/java-exec/src/test/java/org/apache/drill/TestFunctionsQuery.java f1005ab 

Diff: https://reviews.apache.org/r/31160/diff/


Testing
-------

Added unit tests with hash join and merge join


Thanks,

Mehant Baid


Re: Review Request 31160: DRILL-2244: Implicit cast for join conditions

Posted by Mehant Baid <ba...@gmail.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/31160/
-----------------------------------------------------------

(Updated April 3, 2015, 12:48 a.m.)


Review request for drill and Aman Sinha.


Repository: drill-git


Description
-------

If the return type of the two expressions in the join condition are of different types then we need to inject a cast on one of the sides for comparison functions to work as expected


Diffs (updated)
-----

  exec/java-exec/src/main/java/org/apache/drill/exec/expr/fn/impl/Hash64Functions.java 57154ed 
  exec/java-exec/src/main/java/org/apache/drill/exec/expr/fn/impl/Hash64FunctionsWithSeed.java b9ec956 
  exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/common/ChainedHashTable.java 84a2956 
  exec/java-exec/src/main/java/org/apache/drill/exec/resolver/TypeCastRules.java d8652f2 
  exec/java-exec/src/test/java/org/apache/drill/TestFunctionsQuery.java f1005ab 

Diff: https://reviews.apache.org/r/31160/diff/


Testing
-------

Added unit tests with hash join and merge join


Thanks,

Mehant Baid


Re: Review Request 31160: DRILL-2244: Implicit cast for join conditions

Posted by Mehant Baid <ba...@gmail.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/31160/
-----------------------------------------------------------

(Updated April 2, 2015, 6:57 p.m.)


Review request for drill and Aman Sinha.


Changes
-------

Now we hash all numeric data types as double, this would fix the distribution issue and we can apply implicit casting in the join.


Repository: drill-git


Description
-------

If the return type of the two expressions in the join condition are of different types then we need to inject a cast on one of the sides for comparison functions to work as expected


Diffs (updated)
-----

  exec/java-exec/src/main/java/org/apache/drill/exec/expr/fn/impl/Hash64Functions.java 57154ed 
  exec/java-exec/src/main/java/org/apache/drill/exec/expr/fn/impl/Hash64FunctionsWithSeed.java b9ec956 
  exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/common/ChainedHashTable.java 84a2956 
  exec/java-exec/src/main/java/org/apache/drill/exec/resolver/TypeCastRules.java d8652f2 
  exec/java-exec/src/test/java/org/apache/drill/TestFunctionsQuery.java f1005ab 

Diff: https://reviews.apache.org/r/31160/diff/


Testing
-------

Added unit tests with hash join and merge join


Thanks,

Mehant Baid


Re: Review Request 31160: DRILL-2244: Implicit cast for join conditions

Posted by Aman Sinha <as...@maprtech.com>.

> On Feb. 19, 2015, 3:30 a.m., Jacques Nadeau wrote:
> > This isn't legal unless we know that the distribution keys were hashed the same way.  Our thinking is that we should make sure that all numeric values hash the same way. For example, all convert to double and then are hashed.  We need to evaluate the performance impact.  When that isn't possible, we should throw a run time schema change exception and require the user to express the query with an explicit cast.  Otherwise we'll return wrong result.

Good point..I thought this issue was a follow-up to an older bug DRILL-1125 which was fixed in 0.6. True, the implicit cast join will work only if the data is co-located or for broadcast joins.


- Aman


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/31160/#review73080
-----------------------------------------------------------


On Feb. 19, 2015, 2:23 a.m., Mehant Baid wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/31160/
> -----------------------------------------------------------
> 
> (Updated Feb. 19, 2015, 2:23 a.m.)
> 
> 
> Review request for drill and Aman Sinha.
> 
> 
> Repository: drill-git
> 
> 
> Description
> -------
> 
> If the return type of the two expressions in the join condition are of different types then we need to inject a cast on one of the sides for comparison functions to work as expected
> 
> 
> Diffs
> -----
> 
>   exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/common/ChainedHashTable.java ea19645 
>   exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/join/JoinUtils.java 04f3bbe 
>   exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/join/MergeJoinBatch.java 257b93e 
>   exec/java-exec/src/test/java/org/apache/drill/TestExampleQueries.java 55fb59f 
>   exec/java-exec/src/test/resources/parquetinput/department_decimal.parquet PRE-CREATION 
>   exec/java-exec/src/test/resources/parquetinput/employee_decimal.parquet PRE-CREATION 
> 
> Diff: https://reviews.apache.org/r/31160/diff/
> 
> 
> Testing
> -------
> 
> Added unit tests with hash join and merge join
> 
> 
> Thanks,
> 
> Mehant Baid
> 
>


Re: Review Request 31160: DRILL-2244: Implicit cast for join conditions

Posted by Jacques Nadeau <ja...@gmail.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/31160/#review73080
-----------------------------------------------------------


This isn't legal unless we know that the distribution keys were hashed the same way.  Our thinking is that we should make sure that all numeric values hash the same way. For example, all convert to double and then are hashed.  We need to evaluate the performance impact.  When that isn't possible, we should throw a run time schema change exception and require the user to express the query with an explicit cast.  Otherwise we'll return wrong result.

- Jacques Nadeau


On Feb. 19, 2015, 2:23 a.m., Mehant Baid wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/31160/
> -----------------------------------------------------------
> 
> (Updated Feb. 19, 2015, 2:23 a.m.)
> 
> 
> Review request for drill and Aman Sinha.
> 
> 
> Repository: drill-git
> 
> 
> Description
> -------
> 
> If the return type of the two expressions in the join condition are of different types then we need to inject a cast on one of the sides for comparison functions to work as expected
> 
> 
> Diffs
> -----
> 
>   exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/common/ChainedHashTable.java ea19645 
>   exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/join/JoinUtils.java 04f3bbe 
>   exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/join/MergeJoinBatch.java 257b93e 
>   exec/java-exec/src/test/java/org/apache/drill/TestExampleQueries.java 55fb59f 
>   exec/java-exec/src/test/resources/parquetinput/department_decimal.parquet PRE-CREATION 
>   exec/java-exec/src/test/resources/parquetinput/employee_decimal.parquet PRE-CREATION 
> 
> Diff: https://reviews.apache.org/r/31160/diff/
> 
> 
> Testing
> -------
> 
> Added unit tests with hash join and merge join
> 
> 
> Thanks,
> 
> Mehant Baid
> 
>


Re: Review Request 31160: DRILL-2244: Implicit cast for join conditions

Posted by Aman Sinha <as...@maprtech.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/31160/#review73073
-----------------------------------------------------------

Ship it!


Ship It!

- Aman Sinha


On Feb. 19, 2015, 2:23 a.m., Mehant Baid wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/31160/
> -----------------------------------------------------------
> 
> (Updated Feb. 19, 2015, 2:23 a.m.)
> 
> 
> Review request for drill and Aman Sinha.
> 
> 
> Repository: drill-git
> 
> 
> Description
> -------
> 
> If the return type of the two expressions in the join condition are of different types then we need to inject a cast on one of the sides for comparison functions to work as expected
> 
> 
> Diffs
> -----
> 
>   exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/common/ChainedHashTable.java ea19645 
>   exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/join/JoinUtils.java 04f3bbe 
>   exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/join/MergeJoinBatch.java 257b93e 
>   exec/java-exec/src/test/java/org/apache/drill/TestExampleQueries.java 55fb59f 
>   exec/java-exec/src/test/resources/parquetinput/department_decimal.parquet PRE-CREATION 
>   exec/java-exec/src/test/resources/parquetinput/employee_decimal.parquet PRE-CREATION 
> 
> Diff: https://reviews.apache.org/r/31160/diff/
> 
> 
> Testing
> -------
> 
> Added unit tests with hash join and merge join
> 
> 
> Thanks,
> 
> Mehant Baid
> 
>


Re: Review Request 31160: DRILL-2244: Implicit cast for join conditions

Posted by Mehant Baid <ba...@gmail.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/31160/
-----------------------------------------------------------

(Updated Feb. 19, 2015, 2:23 a.m.)


Review request for drill and Aman Sinha.


Changes
-------

Updated patch based on review comments


Repository: drill-git


Description
-------

If the return type of the two expressions in the join condition are of different types then we need to inject a cast on one of the sides for comparison functions to work as expected


Diffs (updated)
-----

  exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/common/ChainedHashTable.java ea19645 
  exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/join/JoinUtils.java 04f3bbe 
  exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/join/MergeJoinBatch.java 257b93e 
  exec/java-exec/src/test/java/org/apache/drill/TestExampleQueries.java 55fb59f 
  exec/java-exec/src/test/resources/parquetinput/department_decimal.parquet PRE-CREATION 
  exec/java-exec/src/test/resources/parquetinput/employee_decimal.parquet PRE-CREATION 

Diff: https://reviews.apache.org/r/31160/diff/


Testing
-------

Added unit tests with hash join and merge join


Thanks,

Mehant Baid