Posted to dev@hive.apache.org by Zoltan Haindrich <ki...@rxd.hu> on 2019/07/09 16:12:36 UTC

Review Request 71040: HIVE-21923 Vectorized MapJoin may miss results when only the join key is selected

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/71040/
-----------------------------------------------------------

Review request for hive and Jesús Camacho Rodríguez.


Bugs: HIVE-21923
    https://issues.apache.org/jira/browse/HIVE-21923


Repository: hive-git


Description
-------

HIVE-21923
Vectorized MapJoin may miss results when only the join key is selected


Diffs
-----

  common/src/test/org/apache/hadoop/hive/common/format/datetime/package-info.java 70ee4266f45219fd81bf0d0df0a2c4380334e307 
  ql/src/java/org/apache/hadoop/hive/ql/exec/vector/mapjoin/VectorMapJoinInnerBigOnlyGenerateResultOperator.java 35dddddb844f236f24d2f17f4a43d064c9ebaf8c 
  ql/src/test/queries/clientpositive/hybridgrace_hashjoin_2.q d989ca7dc883fa071cf5772f358c68bff78f659f 
  ql/src/test/results/clientpositive/llap/correlationoptimizer4.q.out 45a646c948ec8b72710a6b8a3949fbe0203dd68e 
  ql/src/test/results/clientpositive/llap/hybridgrace_hashjoin_2.q.out 2305f87e45bd65152a6c77ce04f7b8efad4724d7 
  ql/src/test/results/clientpositive/spark/auto_join14.q.out 0c80c13889d134abe82bde30c98300620b1fd432 
  ql/src/test/results/clientpositive/spark/bucket_map_join_tez1.q.out 4ee669fa7dd50e0373910030b35c8860383a3a70 
  ql/src/test/results/clientpositive/tez/hybridgrace_hashjoin_2.q.out e28b15044503ea4bb5bd12b7caed6b105f337efd 


Diff: https://reviews.apache.org/r/71040/diff/1/


Testing
-------


Thanks,

Zoltan Haindrich


Re: Review Request 71040: HIVE-21923 Vectorized MapJoin may miss results when only the join key is selected

Posted by Zoltan Haindrich <ki...@rxd.hu>.

> On July 11, 2019, 4 a.m., Jesús Camacho Rodríguez wrote:
> > ql/src/java/org/apache/hadoop/hive/ql/exec/vector/mapjoin/VectorMapJoinInnerBigOnlyGenerateResultOperator.java
> > Line 255 (original), 255 (patched)
> > <https://reviews.apache.org/r/71040/diff/1/?file=2154108#file2154108line255>
> >
> >     Can we update this comment since it is not only the big table? Feel free to add any more info to understand better what is going on.

I've removed the "bigtable" keyword... I don't think extending the comment will help.
I feel that redesigning the 3-4 mappings and reducing them to one would make this code easier to understand, and it would also avoid bugs like this.
Most importantly, the part that pieces these mappings together is hard to follow, and I think that is where the problem arose.


> On July 11, 2019, 4 a.m., Jesús Camacho Rodríguez wrote:
> > ql/src/test/queries/clientpositive/hybridgrace_hashjoin_2.q
> > Line 10 (original), 9 (patched)
> > <https://reviews.apache.org/r/71040/diff/1/?file=2154109#file2154109line10>
> >
> >     Why is this disabled now? This is causing map join conversion not to be triggered below.

Oh damn... I was making a final check that it's working correctly, and it looks like I've committed this. Fixed; all the other joins are mapjoins again.


- Zoltan


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/71040/#review216516
-----------------------------------------------------------




Re: Review Request 71040: HIVE-21923 Vectorized MapJoin may miss results when only the join key is selected

Posted by Jesús Camacho Rodríguez <jc...@hortonworks.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/71040/#review216516
-----------------------------------------------------------




ql/src/java/org/apache/hadoop/hive/ql/exec/vector/mapjoin/VectorMapJoinInnerBigOnlyGenerateResultOperator.java
Line 255 (original), 255 (patched)
<https://reviews.apache.org/r/71040/#comment303733>

    Can we update this comment since it is not only the big table? Feel free to add any more info to understand better what is going on.



ql/src/test/queries/clientpositive/hybridgrace_hashjoin_2.q
Line 10 (original), 9 (patched)
<https://reviews.apache.org/r/71040/#comment303736>

    Why is this disabled now? This is causing map join conversion not to be triggered below.



ql/src/test/results/clientpositive/llap/hybridgrace_hashjoin_2.q.out
Line 83 (original), 84 (patched)
<https://reviews.apache.org/r/71040/#comment303734>

    Map Join conversion not being triggered.



ql/src/test/results/clientpositive/llap/hybridgrace_hashjoin_2.q.out
Line 241 (original), 255 (patched)
<https://reviews.apache.org/r/71040/#comment303735>

    Map Join conversion not being triggered.



ql/src/test/results/clientpositive/spark/bucket_map_join_tez1.q.out
Line 5149 (original), 5149 (patched)
<https://reviews.apache.org/r/71040/#comment303737>

    Cool.


- Jesús Camacho Rodríguez




Re: Review Request 71040: HIVE-21923 Vectorized MapJoin may miss results when only the join key is selected

Posted by Zoltan Haindrich <ki...@rxd.hu>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/71040/
-----------------------------------------------------------

(Updated July 11, 2019, 9:48 a.m.)


Review request for hive and Jesús Camacho Rodríguez.


Changes
-------

patch#7


Bugs: HIVE-21923
    https://issues.apache.org/jira/browse/HIVE-21923


Repository: hive-git


Description
-------

HIVE-21923
Vectorized MapJoin may miss results when only the join key is selected


Diffs (updated)
-----

  common/src/test/org/apache/hadoop/hive/common/format/datetime/package-info.java 70ee4266f4 
  ql/src/java/org/apache/hadoop/hive/ql/exec/vector/mapjoin/VectorMapJoinInnerBigOnlyGenerateResultOperator.java 35dddddb84 
  ql/src/test/queries/clientpositive/hybridgrace_hashjoin_2.q d989ca7dc8 
  ql/src/test/results/clientpositive/llap/correlationoptimizer4.q.out 45a646c948 
  ql/src/test/results/clientpositive/llap/hybridgrace_hashjoin_2.q.out 1ddc1ea1ec 
  ql/src/test/results/clientpositive/spark/auto_join14.q.out 0c80c13889 
  ql/src/test/results/clientpositive/spark/bucket_map_join_tez1.q.out 4ee669fa7d 
  ql/src/test/results/clientpositive/tez/hybridgrace_hashjoin_2.q.out 8e9bd0513e 


Diff: https://reviews.apache.org/r/71040/diff/2/

Changes: https://reviews.apache.org/r/71040/diff/1-2/


Testing
-------


Thanks,

Zoltan Haindrich