You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@drill.apache.org by Mehant Baid <ba...@gmail.com> on 2015/04/19 23:34:56 UTC

Review Request 33343: DRILL-2823: Implicit cast for comparisions in join conditions

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/33343/
-----------------------------------------------------------

Review request for drill and Aman Sinha.


Repository: drill-git


Description
-------

DRILL-2753 aims to remove the comparision function implementations that have different data type inputs (Look at the JIRA for more details). As a result we need to modify hash join and merge join so that when we have different types in the join condition they can apply implicit casts to do the comparison. We have resolved the issue of distribution (as part of DRILL-2244) and so different data types with the same numeric value will be correctly distributed to the same node. 

As part of this change we materialize the expression in the join condition, check if the types are different and apply casts if necessary.


Diffs
-----

  exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/common/ChainedHashTable.java 9df67d8 
  exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/join/JoinUtils.java 7fa79a1 
  exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/join/MergeJoinBatch.java 8fce52e 
  exec/java-exec/src/test/java/org/apache/drill/exec/physical/impl/join/TestHashJoinAdvanced.java 796f6fe 
  exec/java-exec/src/test/java/org/apache/drill/exec/physical/impl/join/TestJoinAdvanced.java PRE-CREATION 

Diff: https://reviews.apache.org/r/33343/diff/


Testing
-------

Added unit tests plus ran existing tests with different data types in join conditions.


Thanks,

Mehant Baid


Re: Review Request 33343: DRILL-2823: Implicit cast for comparisions in join conditions

Posted by Aman Sinha <as...@maprtech.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/33343/#review80650
-----------------------------------------------------------

Ship it!


Ship It!

- Aman Sinha


On April 20, 2015, 4:55 a.m., Mehant Baid wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/33343/
> -----------------------------------------------------------
> 
> (Updated April 20, 2015, 4:55 a.m.)
> 
> 
> Review request for drill and Aman Sinha.
> 
> 
> Repository: drill-git
> 
> 
> Description
> -------
> 
> DRILL-2753 aims to remove the comparision function implementations that have different data type inputs (Look at the JIRA for more details). As a result we need to modify hash join and merge join so that when we have different types in the join condition they can apply implicit casts to do the comparison. We have resolved the issue of distribution (as part of DRILL-2244) and so different data types with the same numeric value will be correctly distributed to the same node. 
> 
> As part of this change we materialize the expression in the join condition, check if the types are different and apply casts if necessary.
> 
> 
> Diffs
> -----
> 
>   exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/common/ChainedHashTable.java 9df67d8 
>   exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/join/JoinUtils.java 7fa79a1 
>   exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/join/MergeJoinBatch.java 8fce52e 
>   exec/java-exec/src/test/java/org/apache/drill/exec/physical/impl/join/TestHashJoinAdvanced.java 796f6fe 
>   exec/java-exec/src/test/java/org/apache/drill/exec/physical/impl/join/TestJoinAdvanced.java PRE-CREATION 
>   exec/java-exec/src/test/resources/jsoninput/implicit_cast_join_1.json PRE-CREATION 
> 
> Diff: https://reviews.apache.org/r/33343/diff/
> 
> 
> Testing
> -------
> 
> Added unit tests plus ran existing tests with different data types in join conditions.
> 
> 
> Thanks,
> 
> Mehant Baid
> 
>


Re: Review Request 33343: DRILL-2823: Implicit cast for comparisions in join conditions

Posted by Mehant Baid <ba...@gmail.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/33343/
-----------------------------------------------------------

(Updated April 20, 2015, 4:55 a.m.)


Review request for drill and Aman Sinha.


Changes
-------

Addressed review comments.


Repository: drill-git


Description
-------

DRILL-2753 aims to remove the comparision function implementations that have different data type inputs (Look at the JIRA for more details). As a result we need to modify hash join and merge join so that when we have different types in the join condition they can apply implicit casts to do the comparison. We have resolved the issue of distribution (as part of DRILL-2244) and so different data types with the same numeric value will be correctly distributed to the same node. 

As part of this change we materialize the expression in the join condition, check if the types are different and apply casts if necessary.


Diffs (updated)
-----

  exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/common/ChainedHashTable.java 9df67d8 
  exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/join/JoinUtils.java 7fa79a1 
  exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/join/MergeJoinBatch.java 8fce52e 
  exec/java-exec/src/test/java/org/apache/drill/exec/physical/impl/join/TestHashJoinAdvanced.java 796f6fe 
  exec/java-exec/src/test/java/org/apache/drill/exec/physical/impl/join/TestJoinAdvanced.java PRE-CREATION 
  exec/java-exec/src/test/resources/jsoninput/implicit_cast_join_1.json PRE-CREATION 

Diff: https://reviews.apache.org/r/33343/diff/


Testing
-------

Added unit tests plus ran existing tests with different data types in join conditions.


Thanks,

Mehant Baid


Re: Review Request 33343: DRILL-2823: Implicit cast for comparisions in join conditions

Posted by Mehant Baid <ba...@gmail.com>.

> On April 20, 2015, 1:01 a.m., Aman Sinha wrote:
> > exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/common/ChainedHashTable.java, line 189
> > <https://reviews.apache.org/r/33343/diff/1/?file=935063#file935063line189>
> >
> >     If we have  int a = bigint b , both sides will be hash distributed as doubles, then when we are doing the join (either hash or merge join), we will ignore that they were distributed as double and just implict cast the int to bigint ...is that correct ? 
> >     
> >     Any thoughts on the performance impact ? Previously the comparison function would have taken care of comparing different types, now it must go through implicit casting.

we will ignore that they were distributed as double and just implict cast the int to bigint ...is that correct ? 
Yes that is the behavior currently. 


Performance impact should be minimal or none, I would think since even with the explicit function implementations we were doing the casting. However we probably might be creating an extra object (for the holder) during implicit casting but I think that should be ok.


> On April 20, 2015, 1:01 a.m., Aman Sinha wrote:
> > exec/java-exec/src/test/java/org/apache/drill/exec/physical/impl/join/TestJoinAdvanced.java, line 80
> > <https://reviews.apache.org/r/33343/diff/1/?file=935067#file935067line80>
> >
> >     It would be good to have a test with join conditions on 2 or more columns that require implicit casting - 
> >     WHERE t1.a1 = t2.a2 
> >           AND t1.b1 = t2.b2 
> >           AND t1.c1 = t2.c2 
> >      where a1:int, b1:bigint, b1:float, b2:double, c1:double, c2: varchar with numeric double values.  
> >     This does not necesarily have to be unit test but see if you can get it included either as unit or functional test.

I have added a unit test with miniscule data as part of the unit test. I will add a test with more substantial data to the functional suite as well. 

I have added the case with multiple join conditions with the following data types on the left and right sides
1. bigint and int
2. double and float 
3. bigint and double

The case that you mention about varchar and double isn't supported, since our distribution only takes care of distributing the numeric values as double. This behavior is consistent with Postgres where in such cases users are required to put an explicit cast. To support such a scenario we would have to hash everything as varchar which will not be very efficient.


- Mehant


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/33343/#review80643
-----------------------------------------------------------


On April 20, 2015, 4:55 a.m., Mehant Baid wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/33343/
> -----------------------------------------------------------
> 
> (Updated April 20, 2015, 4:55 a.m.)
> 
> 
> Review request for drill and Aman Sinha.
> 
> 
> Repository: drill-git
> 
> 
> Description
> -------
> 
> DRILL-2753 aims to remove the comparision function implementations that have different data type inputs (Look at the JIRA for more details). As a result we need to modify hash join and merge join so that when we have different types in the join condition they can apply implicit casts to do the comparison. We have resolved the issue of distribution (as part of DRILL-2244) and so different data types with the same numeric value will be correctly distributed to the same node. 
> 
> As part of this change we materialize the expression in the join condition, check if the types are different and apply casts if necessary.
> 
> 
> Diffs
> -----
> 
>   exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/common/ChainedHashTable.java 9df67d8 
>   exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/join/JoinUtils.java 7fa79a1 
>   exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/join/MergeJoinBatch.java 8fce52e 
>   exec/java-exec/src/test/java/org/apache/drill/exec/physical/impl/join/TestHashJoinAdvanced.java 796f6fe 
>   exec/java-exec/src/test/java/org/apache/drill/exec/physical/impl/join/TestJoinAdvanced.java PRE-CREATION 
>   exec/java-exec/src/test/resources/jsoninput/implicit_cast_join_1.json PRE-CREATION 
> 
> Diff: https://reviews.apache.org/r/33343/diff/
> 
> 
> Testing
> -------
> 
> Added unit tests plus ran existing tests with different data types in join conditions.
> 
> 
> Thanks,
> 
> Mehant Baid
> 
>


Re: Review Request 33343: DRILL-2823: Implicit cast for comparisions in join conditions

Posted by Aman Sinha <as...@maprtech.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/33343/#review80643
-----------------------------------------------------------



exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/common/ChainedHashTable.java
<https://reviews.apache.org/r/33343/#comment130779>

    If we have  int a = bigint b , both sides will be hash distributed as doubles, then when we are doing the join (either hash or merge join), we will ignore that they were distributed as double and just implict cast the int to bigint ...is that correct ? 
    
    Any thoughts on the performance impact ? Previously the comparison function would have taken care of comparing different types, now it must go through implicit casting.



exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/join/MergeJoinBatch.java
<https://reviews.apache.org/r/33343/#comment130778>

    Since this method is called for left and right expressions, it would be useful to know if the materialization error occurred for left or right exprs in the join condition.



exec/java-exec/src/test/java/org/apache/drill/exec/physical/impl/join/TestJoinAdvanced.java
<https://reviews.apache.org/r/33343/#comment130780>

    It would be good to have a test with join conditions on 2 or more columns that require implicit casting - 
    WHERE t1.a1 = t2.a2 
          AND t1.b1 = t2.b2 
          AND t1.c1 = t2.c2 
     where a1:int, b1:bigint, b1:float, b2:double, c1:double, c2: varchar with numeric double values.  
    This does not necesarily have to be unit test but see if you can get it included either as unit or functional test.


- Aman Sinha


On April 19, 2015, 9:34 p.m., Mehant Baid wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/33343/
> -----------------------------------------------------------
> 
> (Updated April 19, 2015, 9:34 p.m.)
> 
> 
> Review request for drill and Aman Sinha.
> 
> 
> Repository: drill-git
> 
> 
> Description
> -------
> 
> DRILL-2753 aims to remove the comparision function implementations that have different data type inputs (Look at the JIRA for more details). As a result we need to modify hash join and merge join so that when we have different types in the join condition they can apply implicit casts to do the comparison. We have resolved the issue of distribution (as part of DRILL-2244) and so different data types with the same numeric value will be correctly distributed to the same node. 
> 
> As part of this change we materialize the expression in the join condition, check if the types are different and apply casts if necessary.
> 
> 
> Diffs
> -----
> 
>   exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/common/ChainedHashTable.java 9df67d8 
>   exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/join/JoinUtils.java 7fa79a1 
>   exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/join/MergeJoinBatch.java 8fce52e 
>   exec/java-exec/src/test/java/org/apache/drill/exec/physical/impl/join/TestHashJoinAdvanced.java 796f6fe 
>   exec/java-exec/src/test/java/org/apache/drill/exec/physical/impl/join/TestJoinAdvanced.java PRE-CREATION 
> 
> Diff: https://reviews.apache.org/r/33343/diff/
> 
> 
> Testing
> -------
> 
> Added unit tests plus ran existing tests with different data types in join conditions.
> 
> 
> Thanks,
> 
> Mehant Baid
> 
>