You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@hive.apache.org by Jesús Camacho Rodríguez <jc...@hortonworks.com> on 2017/06/15 10:49:28 UTC

Review Request 60116: HIVE-16885

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/60116/
-----------------------------------------------------------

Review request for hive and Ashutosh Chauhan.


Bugs: HIVE-16885
    https://issues.apache.org/jira/browse/HIVE-16885


Repository: hive-git


Description
-------

HIVE-16885


Diffs
-----

  common/src/java/org/apache/hadoop/hive/conf/HiveConf.java fce8db3df1026de8b6ee8c59567e55db40696217 
  ql/src/java/org/apache/hadoop/hive/ql/ErrorMsg.java 6651900e79a5c3d4ad8329afbe3894544ce9f46e 
  ql/src/java/org/apache/hadoop/hive/ql/exec/CommonJoinOperator.java 07fd653dedc9a98d89b492ae6b49da70984569f7 
  ql/src/java/org/apache/hadoop/hive/ql/optimizer/physical/Vectorizer.java 737aad1b764ee6487b420f2b9ea651c42e08e9bf 
  ql/src/java/org/apache/hadoop/hive/ql/optimizer/stats/annotation/StatsRulesProcFactory.java fc6adafa0ebd0bd49d59cd0f4a82f70e9646ca6d 
  ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java 9e84a29470c481d932d4f2d12e2898e05a925e5b 
  ql/src/test/queries/clientpositive/join47.q PRE-CREATION 
  ql/src/test/queries/clientpositive/mapjoin47.q PRE-CREATION 
  ql/src/test/queries/clientpositive/smb_mapjoin_47.q PRE-CREATION 
  ql/src/test/results/clientpositive/join47.q.out PRE-CREATION 
  ql/src/test/results/clientpositive/mapjoin47.q.out PRE-CREATION 
  ql/src/test/results/clientpositive/smb_mapjoin_47.q.out PRE-CREATION 


Diff: https://reviews.apache.org/r/60116/diff/1/


Testing
-------


Thanks,

Jesús Camacho Rodríguez


Re: Review Request 60116: HIVE-16885

Posted by Jesús Camacho Rodríguez <jc...@hortonworks.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/60116/
-----------------------------------------------------------

(Updated June 21, 2017, 2:10 p.m.)


Review request for hive and Ashutosh Chauhan.


Bugs: HIVE-16885
    https://issues.apache.org/jira/browse/HIVE-16885


Repository: hive-git


Description
-------

HIVE-16885


Diffs (updated)
-----

  common/src/java/org/apache/hadoop/hive/conf/HiveConf.java be38f381e6a7638be96fa6060bf296f91e92c18c 
  data/conf/llap/hive-site.xml 05ab6eeaf070c4b66bf3fffb88c04b93153217ad 
  data/conf/tez/hive-site.xml dbff10c75f5c5f38cb9ac4d7d8792120a6ff0205 
  ql/src/java/org/apache/hadoop/hive/ql/ErrorMsg.java 6651900e79a5c3d4ad8329afbe3894544ce9f46e 
  ql/src/java/org/apache/hadoop/hive/ql/exec/CommonJoinOperator.java 07fd653dedc9a98d89b492ae6b49da70984569f7 
  ql/src/java/org/apache/hadoop/hive/ql/optimizer/physical/Vectorizer.java 737aad1b764ee6487b420f2b9ea651c42e08e9bf 
  ql/src/java/org/apache/hadoop/hive/ql/optimizer/stats/annotation/StatsRulesProcFactory.java fc6adafa0ebd0bd49d59cd0f4a82f70e9646ca6d 
  ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java 9e84a29470c481d932d4f2d12e2898e05a925e5b 
  ql/src/test/queries/clientpositive/join47.q PRE-CREATION 
  ql/src/test/queries/clientpositive/mapjoin47.q PRE-CREATION 
  ql/src/test/queries/clientpositive/smb_mapjoin_47.q PRE-CREATION 
  ql/src/test/queries/clientpositive/vector_between_columns.q 920b6922a9f9f9d5e30509b9118d4500046fc1e4 
  ql/src/test/results/clientpositive/join47.q.out PRE-CREATION 
  ql/src/test/results/clientpositive/llap/dynamic_partition_pruning.q.out 2875e13bbaccb4e5bfee3e682e5b2c88acaf31a5 
  ql/src/test/results/clientpositive/llap/explainuser_1.q.out 28ca13b6d448d11d5baa1c4c96c959d413cf24f7 
  ql/src/test/results/clientpositive/llap/lineage3.q.out 25f0439477153adc57889623d05a419f320320ff 
  ql/src/test/results/clientpositive/llap/subquery_in.q.out 1f9c9e447416c153f909ebbe960a8343454dde14 
  ql/src/test/results/clientpositive/llap/subquery_notin.q.out b4af91579bb44cd30503d48de72a2f8f36e98501 
  ql/src/test/results/clientpositive/llap/subquery_scalar.q.out e94edff262685a2e2e3c4ab4ba6382b5d2cc186d 
  ql/src/test/results/clientpositive/llap/subquery_select.q.out 202980e975b4bdd988df856bc1e533990e5c664b 
  ql/src/test/results/clientpositive/llap/vectorized_dynamic_partition_pruning.q.out d9fc6b5f5834a041d1f19495234c30948aca2751 
  ql/src/test/results/clientpositive/mapjoin47.q.out PRE-CREATION 
  ql/src/test/results/clientpositive/smb_mapjoin_47.q.out PRE-CREATION 


Diff: https://reviews.apache.org/r/60116/diff/2/

Changes: https://reviews.apache.org/r/60116/diff/1-2/


Testing
-------


Thanks,

Jesús Camacho Rodríguez


Re: Review Request 60116: HIVE-16885

Posted by Ashutosh Chauhan <ha...@apache.org>.

> On June 19, 2017, 3:21 p.m., Jesús Camacho Rodríguez wrote:
> > common/src/java/org/apache/hadoop/hive/conf/HiveConf.java
> > Lines 1092 (patched)
> > <https://reviews.apache.org/r/60116/diff/1/?file=1751804#file1751804line1092>
> >
> >     The problem is not cross-joins, but all kind of inner joins.
> >     
> >     For instance, consider _JOIN ON (a=b and c>10)_. With this property set to true, _a_ and _b_ are keys and _c>10_ is the residual. Thus, the plan (in fact, the work containing the JOIN) will not be vectorized, however if optimization is disabled, this would not happen.
> >     
> >     My idea was to create a follow-up to close the gap for vectorization and then enable it by default. Another option would be to push the residual within the join only for cross joins and lift the restriction when the vectorization support for residual predicates is there. What do you think?

That still depends on data distribution. Non-vectorized path may still be faster.
I think what we shall do is to a) turn this on for tests via data/conf/llap/hive-site.xml and data/conf/tez/hive-site.xml so that we get coverage in tests from compiler side and b) Create a follow-up jira for vectorization work.


- Ashutosh


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/60116/#review178249
-----------------------------------------------------------


On June 15, 2017, 10:49 a.m., Jesús Camacho Rodríguez wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/60116/
> -----------------------------------------------------------
> 
> (Updated June 15, 2017, 10:49 a.m.)
> 
> 
> Review request for hive and Ashutosh Chauhan.
> 
> 
> Bugs: HIVE-16885
>     https://issues.apache.org/jira/browse/HIVE-16885
> 
> 
> Repository: hive-git
> 
> 
> Description
> -------
> 
> HIVE-16885
> 
> 
> Diffs
> -----
> 
>   common/src/java/org/apache/hadoop/hive/conf/HiveConf.java fce8db3df1026de8b6ee8c59567e55db40696217 
>   ql/src/java/org/apache/hadoop/hive/ql/ErrorMsg.java 6651900e79a5c3d4ad8329afbe3894544ce9f46e 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/CommonJoinOperator.java 07fd653dedc9a98d89b492ae6b49da70984569f7 
>   ql/src/java/org/apache/hadoop/hive/ql/optimizer/physical/Vectorizer.java 737aad1b764ee6487b420f2b9ea651c42e08e9bf 
>   ql/src/java/org/apache/hadoop/hive/ql/optimizer/stats/annotation/StatsRulesProcFactory.java fc6adafa0ebd0bd49d59cd0f4a82f70e9646ca6d 
>   ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java 9e84a29470c481d932d4f2d12e2898e05a925e5b 
>   ql/src/test/queries/clientpositive/join47.q PRE-CREATION 
>   ql/src/test/queries/clientpositive/mapjoin47.q PRE-CREATION 
>   ql/src/test/queries/clientpositive/smb_mapjoin_47.q PRE-CREATION 
>   ql/src/test/results/clientpositive/join47.q.out PRE-CREATION 
>   ql/src/test/results/clientpositive/mapjoin47.q.out PRE-CREATION 
>   ql/src/test/results/clientpositive/smb_mapjoin_47.q.out PRE-CREATION 
> 
> 
> Diff: https://reviews.apache.org/r/60116/diff/1/
> 
> 
> Testing
> -------
> 
> 
> Thanks,
> 
> Jesús Camacho Rodríguez
> 
>


Re: Review Request 60116: HIVE-16885

Posted by Jesús Camacho Rodríguez <jc...@hortonworks.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/60116/#review178249
-----------------------------------------------------------




common/src/java/org/apache/hadoop/hive/conf/HiveConf.java
Lines 1092 (patched)
<https://reviews.apache.org/r/60116/#comment252125>

    The problem is not cross-joins, but all kind of inner joins.
    
    For instance, consider _JOIN ON (a=b and c>10)_. With this property set to true, _a_ and _b_ are keys and _c>10_ is the residual. Thus, the plan (in fact, the work containing the JOIN) will not be vectorized, however if optimization is disabled, this would not happen.
    
    My idea was to create a follow-up to close the gap for vectorization and then enable it by default. Another option would be to push the residual within the join only for cross joins and lift the restriction when the vectorization support for residual predicates is there. What do you think?


- Jesús Camacho Rodríguez


On June 15, 2017, 10:49 a.m., Jesús Camacho Rodríguez wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/60116/
> -----------------------------------------------------------
> 
> (Updated June 15, 2017, 10:49 a.m.)
> 
> 
> Review request for hive and Ashutosh Chauhan.
> 
> 
> Bugs: HIVE-16885
>     https://issues.apache.org/jira/browse/HIVE-16885
> 
> 
> Repository: hive-git
> 
> 
> Description
> -------
> 
> HIVE-16885
> 
> 
> Diffs
> -----
> 
>   common/src/java/org/apache/hadoop/hive/conf/HiveConf.java fce8db3df1026de8b6ee8c59567e55db40696217 
>   ql/src/java/org/apache/hadoop/hive/ql/ErrorMsg.java 6651900e79a5c3d4ad8329afbe3894544ce9f46e 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/CommonJoinOperator.java 07fd653dedc9a98d89b492ae6b49da70984569f7 
>   ql/src/java/org/apache/hadoop/hive/ql/optimizer/physical/Vectorizer.java 737aad1b764ee6487b420f2b9ea651c42e08e9bf 
>   ql/src/java/org/apache/hadoop/hive/ql/optimizer/stats/annotation/StatsRulesProcFactory.java fc6adafa0ebd0bd49d59cd0f4a82f70e9646ca6d 
>   ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java 9e84a29470c481d932d4f2d12e2898e05a925e5b 
>   ql/src/test/queries/clientpositive/join47.q PRE-CREATION 
>   ql/src/test/queries/clientpositive/mapjoin47.q PRE-CREATION 
>   ql/src/test/queries/clientpositive/smb_mapjoin_47.q PRE-CREATION 
>   ql/src/test/results/clientpositive/join47.q.out PRE-CREATION 
>   ql/src/test/results/clientpositive/mapjoin47.q.out PRE-CREATION 
>   ql/src/test/results/clientpositive/smb_mapjoin_47.q.out PRE-CREATION 
> 
> 
> Diff: https://reviews.apache.org/r/60116/diff/1/
> 
> 
> Testing
> -------
> 
> 
> Thanks,
> 
> Jesús Camacho Rodríguez
> 
>


Re: Review Request 60116: HIVE-16885

Posted by Jesús Camacho Rodríguez <jc...@hortonworks.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/60116/#review178259
-----------------------------------------------------------




common/src/java/org/apache/hadoop/hive/conf/HiveConf.java
Lines 1092 (patched)
<https://reviews.apache.org/r/60116/#comment252131>

    Sounds like a plan, I'll update the patch.


- Jesús Camacho Rodríguez


On June 15, 2017, 10:49 a.m., Jesús Camacho Rodríguez wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/60116/
> -----------------------------------------------------------
> 
> (Updated June 15, 2017, 10:49 a.m.)
> 
> 
> Review request for hive and Ashutosh Chauhan.
> 
> 
> Bugs: HIVE-16885
>     https://issues.apache.org/jira/browse/HIVE-16885
> 
> 
> Repository: hive-git
> 
> 
> Description
> -------
> 
> HIVE-16885
> 
> 
> Diffs
> -----
> 
>   common/src/java/org/apache/hadoop/hive/conf/HiveConf.java fce8db3df1026de8b6ee8c59567e55db40696217 
>   ql/src/java/org/apache/hadoop/hive/ql/ErrorMsg.java 6651900e79a5c3d4ad8329afbe3894544ce9f46e 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/CommonJoinOperator.java 07fd653dedc9a98d89b492ae6b49da70984569f7 
>   ql/src/java/org/apache/hadoop/hive/ql/optimizer/physical/Vectorizer.java 737aad1b764ee6487b420f2b9ea651c42e08e9bf 
>   ql/src/java/org/apache/hadoop/hive/ql/optimizer/stats/annotation/StatsRulesProcFactory.java fc6adafa0ebd0bd49d59cd0f4a82f70e9646ca6d 
>   ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java 9e84a29470c481d932d4f2d12e2898e05a925e5b 
>   ql/src/test/queries/clientpositive/join47.q PRE-CREATION 
>   ql/src/test/queries/clientpositive/mapjoin47.q PRE-CREATION 
>   ql/src/test/queries/clientpositive/smb_mapjoin_47.q PRE-CREATION 
>   ql/src/test/results/clientpositive/join47.q.out PRE-CREATION 
>   ql/src/test/results/clientpositive/mapjoin47.q.out PRE-CREATION 
>   ql/src/test/results/clientpositive/smb_mapjoin_47.q.out PRE-CREATION 
> 
> 
> Diff: https://reviews.apache.org/r/60116/diff/1/
> 
> 
> Testing
> -------
> 
> 
> Thanks,
> 
> Jesús Camacho Rodríguez
> 
>


Re: Review Request 60116: HIVE-16885

Posted by Ashutosh Chauhan <ha...@apache.org>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/60116/#review178246
-----------------------------------------------------------




common/src/java/org/apache/hadoop/hive/conf/HiveConf.java
Lines 1092 (patched)
<https://reviews.apache.org/r/60116/#comment252120>

    This should be true by default. Compiler should always generate an optimal plan. 
    Further, its not clear if vectorized cross join followed by filter will actually be faster than non-vectorized join which doesn't generate unnecessary rows.


- Ashutosh Chauhan


On June 15, 2017, 10:49 a.m., Jesús Camacho Rodríguez wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/60116/
> -----------------------------------------------------------
> 
> (Updated June 15, 2017, 10:49 a.m.)
> 
> 
> Review request for hive and Ashutosh Chauhan.
> 
> 
> Bugs: HIVE-16885
>     https://issues.apache.org/jira/browse/HIVE-16885
> 
> 
> Repository: hive-git
> 
> 
> Description
> -------
> 
> HIVE-16885
> 
> 
> Diffs
> -----
> 
>   common/src/java/org/apache/hadoop/hive/conf/HiveConf.java fce8db3df1026de8b6ee8c59567e55db40696217 
>   ql/src/java/org/apache/hadoop/hive/ql/ErrorMsg.java 6651900e79a5c3d4ad8329afbe3894544ce9f46e 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/CommonJoinOperator.java 07fd653dedc9a98d89b492ae6b49da70984569f7 
>   ql/src/java/org/apache/hadoop/hive/ql/optimizer/physical/Vectorizer.java 737aad1b764ee6487b420f2b9ea651c42e08e9bf 
>   ql/src/java/org/apache/hadoop/hive/ql/optimizer/stats/annotation/StatsRulesProcFactory.java fc6adafa0ebd0bd49d59cd0f4a82f70e9646ca6d 
>   ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java 9e84a29470c481d932d4f2d12e2898e05a925e5b 
>   ql/src/test/queries/clientpositive/join47.q PRE-CREATION 
>   ql/src/test/queries/clientpositive/mapjoin47.q PRE-CREATION 
>   ql/src/test/queries/clientpositive/smb_mapjoin_47.q PRE-CREATION 
>   ql/src/test/results/clientpositive/join47.q.out PRE-CREATION 
>   ql/src/test/results/clientpositive/mapjoin47.q.out PRE-CREATION 
>   ql/src/test/results/clientpositive/smb_mapjoin_47.q.out PRE-CREATION 
> 
> 
> Diff: https://reviews.apache.org/r/60116/diff/1/
> 
> 
> Testing
> -------
> 
> 
> Thanks,
> 
> Jesús Camacho Rodríguez
> 
>