You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@hive.apache.org by Jesús Camacho Rodríguez <jc...@hortonworks.com> on 2017/06/15 10:49:28 UTC
Review Request 60116: HIVE-16885
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/60116/
-----------------------------------------------------------
Review request for hive and Ashutosh Chauhan.
Bugs: HIVE-16885
https://issues.apache.org/jira/browse/HIVE-16885
Repository: hive-git
Description
-------
HIVE-16885
Diffs
-----
common/src/java/org/apache/hadoop/hive/conf/HiveConf.java fce8db3df1026de8b6ee8c59567e55db40696217
ql/src/java/org/apache/hadoop/hive/ql/ErrorMsg.java 6651900e79a5c3d4ad8329afbe3894544ce9f46e
ql/src/java/org/apache/hadoop/hive/ql/exec/CommonJoinOperator.java 07fd653dedc9a98d89b492ae6b49da70984569f7
ql/src/java/org/apache/hadoop/hive/ql/optimizer/physical/Vectorizer.java 737aad1b764ee6487b420f2b9ea651c42e08e9bf
ql/src/java/org/apache/hadoop/hive/ql/optimizer/stats/annotation/StatsRulesProcFactory.java fc6adafa0ebd0bd49d59cd0f4a82f70e9646ca6d
ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java 9e84a29470c481d932d4f2d12e2898e05a925e5b
ql/src/test/queries/clientpositive/join47.q PRE-CREATION
ql/src/test/queries/clientpositive/mapjoin47.q PRE-CREATION
ql/src/test/queries/clientpositive/smb_mapjoin_47.q PRE-CREATION
ql/src/test/results/clientpositive/join47.q.out PRE-CREATION
ql/src/test/results/clientpositive/mapjoin47.q.out PRE-CREATION
ql/src/test/results/clientpositive/smb_mapjoin_47.q.out PRE-CREATION
Diff: https://reviews.apache.org/r/60116/diff/1/
Testing
-------
Thanks,
Jesús Camacho Rodríguez
Re: Review Request 60116: HIVE-16885
Posted by Jesús Camacho Rodríguez <jc...@hortonworks.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/60116/
-----------------------------------------------------------
(Updated June 21, 2017, 2:10 p.m.)
Review request for hive and Ashutosh Chauhan.
Bugs: HIVE-16885
https://issues.apache.org/jira/browse/HIVE-16885
Repository: hive-git
Description
-------
HIVE-16885
Diffs (updated)
-----
common/src/java/org/apache/hadoop/hive/conf/HiveConf.java be38f381e6a7638be96fa6060bf296f91e92c18c
data/conf/llap/hive-site.xml 05ab6eeaf070c4b66bf3fffb88c04b93153217ad
data/conf/tez/hive-site.xml dbff10c75f5c5f38cb9ac4d7d8792120a6ff0205
ql/src/java/org/apache/hadoop/hive/ql/ErrorMsg.java 6651900e79a5c3d4ad8329afbe3894544ce9f46e
ql/src/java/org/apache/hadoop/hive/ql/exec/CommonJoinOperator.java 07fd653dedc9a98d89b492ae6b49da70984569f7
ql/src/java/org/apache/hadoop/hive/ql/optimizer/physical/Vectorizer.java 737aad1b764ee6487b420f2b9ea651c42e08e9bf
ql/src/java/org/apache/hadoop/hive/ql/optimizer/stats/annotation/StatsRulesProcFactory.java fc6adafa0ebd0bd49d59cd0f4a82f70e9646ca6d
ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java 9e84a29470c481d932d4f2d12e2898e05a925e5b
ql/src/test/queries/clientpositive/join47.q PRE-CREATION
ql/src/test/queries/clientpositive/mapjoin47.q PRE-CREATION
ql/src/test/queries/clientpositive/smb_mapjoin_47.q PRE-CREATION
ql/src/test/queries/clientpositive/vector_between_columns.q 920b6922a9f9f9d5e30509b9118d4500046fc1e4
ql/src/test/results/clientpositive/join47.q.out PRE-CREATION
ql/src/test/results/clientpositive/llap/dynamic_partition_pruning.q.out 2875e13bbaccb4e5bfee3e682e5b2c88acaf31a5
ql/src/test/results/clientpositive/llap/explainuser_1.q.out 28ca13b6d448d11d5baa1c4c96c959d413cf24f7
ql/src/test/results/clientpositive/llap/lineage3.q.out 25f0439477153adc57889623d05a419f320320ff
ql/src/test/results/clientpositive/llap/subquery_in.q.out 1f9c9e447416c153f909ebbe960a8343454dde14
ql/src/test/results/clientpositive/llap/subquery_notin.q.out b4af91579bb44cd30503d48de72a2f8f36e98501
ql/src/test/results/clientpositive/llap/subquery_scalar.q.out e94edff262685a2e2e3c4ab4ba6382b5d2cc186d
ql/src/test/results/clientpositive/llap/subquery_select.q.out 202980e975b4bdd988df856bc1e533990e5c664b
ql/src/test/results/clientpositive/llap/vectorized_dynamic_partition_pruning.q.out d9fc6b5f5834a041d1f19495234c30948aca2751
ql/src/test/results/clientpositive/mapjoin47.q.out PRE-CREATION
ql/src/test/results/clientpositive/smb_mapjoin_47.q.out PRE-CREATION
Diff: https://reviews.apache.org/r/60116/diff/2/
Changes: https://reviews.apache.org/r/60116/diff/1-2/
Testing
-------
Thanks,
Jesús Camacho Rodríguez
Re: Review Request 60116: HIVE-16885
Posted by Ashutosh Chauhan <ha...@apache.org>.
> On June 19, 2017, 3:21 p.m., Jesús Camacho Rodríguez wrote:
> > common/src/java/org/apache/hadoop/hive/conf/HiveConf.java
> > Lines 1092 (patched)
> > <https://reviews.apache.org/r/60116/diff/1/?file=1751804#file1751804line1092>
> >
> > The problem is not cross-joins, but all kind of inner joins.
> >
> > For instance, consider _JOIN ON (a=b and c>10)_. With this property set to true, _a_ and _b_ are keys and _c>10_ is the residual. Thus, the plan (in fact, the work containing the JOIN) will not be vectorized, however if optimization is disabled, this would not happen.
> >
> > My idea was to create a follow-up to close the gap for vectorization and then enable it by default. Another option would be to push the residual within the join only for cross joins and lift the restriction when the vectorization support for residual predicates is there. What do you think?
That still depends on data distribution. Non-vectorized path may still be faster.
I think what we shall do is to a) turn this on for tests via data/conf/llap/hive-site.xml and data/conf/tez/hive-site.xml so that we get coverage in tests from compiler side and b) Create a follow-up jira for vectorization work.
- Ashutosh
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/60116/#review178249
-----------------------------------------------------------
On June 15, 2017, 10:49 a.m., Jesús Camacho Rodríguez wrote:
>
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/60116/
> -----------------------------------------------------------
>
> (Updated June 15, 2017, 10:49 a.m.)
>
>
> Review request for hive and Ashutosh Chauhan.
>
>
> Bugs: HIVE-16885
> https://issues.apache.org/jira/browse/HIVE-16885
>
>
> Repository: hive-git
>
>
> Description
> -------
>
> HIVE-16885
>
>
> Diffs
> -----
>
> common/src/java/org/apache/hadoop/hive/conf/HiveConf.java fce8db3df1026de8b6ee8c59567e55db40696217
> ql/src/java/org/apache/hadoop/hive/ql/ErrorMsg.java 6651900e79a5c3d4ad8329afbe3894544ce9f46e
> ql/src/java/org/apache/hadoop/hive/ql/exec/CommonJoinOperator.java 07fd653dedc9a98d89b492ae6b49da70984569f7
> ql/src/java/org/apache/hadoop/hive/ql/optimizer/physical/Vectorizer.java 737aad1b764ee6487b420f2b9ea651c42e08e9bf
> ql/src/java/org/apache/hadoop/hive/ql/optimizer/stats/annotation/StatsRulesProcFactory.java fc6adafa0ebd0bd49d59cd0f4a82f70e9646ca6d
> ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java 9e84a29470c481d932d4f2d12e2898e05a925e5b
> ql/src/test/queries/clientpositive/join47.q PRE-CREATION
> ql/src/test/queries/clientpositive/mapjoin47.q PRE-CREATION
> ql/src/test/queries/clientpositive/smb_mapjoin_47.q PRE-CREATION
> ql/src/test/results/clientpositive/join47.q.out PRE-CREATION
> ql/src/test/results/clientpositive/mapjoin47.q.out PRE-CREATION
> ql/src/test/results/clientpositive/smb_mapjoin_47.q.out PRE-CREATION
>
>
> Diff: https://reviews.apache.org/r/60116/diff/1/
>
>
> Testing
> -------
>
>
> Thanks,
>
> Jesús Camacho Rodríguez
>
>
Re: Review Request 60116: HIVE-16885
Posted by Jesús Camacho Rodríguez <jc...@hortonworks.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/60116/#review178249
-----------------------------------------------------------
common/src/java/org/apache/hadoop/hive/conf/HiveConf.java
Lines 1092 (patched)
<https://reviews.apache.org/r/60116/#comment252125>
The problem is not cross-joins, but all kind of inner joins.
For instance, consider _JOIN ON (a=b and c>10)_. With this property set to true, _a_ and _b_ are keys and _c>10_ is the residual. Thus, the plan (in fact, the work containing the JOIN) will not be vectorized, however if optimization is disabled, this would not happen.
My idea was to create a follow-up to close the gap for vectorization and then enable it by default. Another option would be to push the residual within the join only for cross joins and lift the restriction when the vectorization support for residual predicates is there. What do you think?
- Jesús Camacho Rodríguez
On June 15, 2017, 10:49 a.m., Jesús Camacho Rodríguez wrote:
>
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/60116/
> -----------------------------------------------------------
>
> (Updated June 15, 2017, 10:49 a.m.)
>
>
> Review request for hive and Ashutosh Chauhan.
>
>
> Bugs: HIVE-16885
> https://issues.apache.org/jira/browse/HIVE-16885
>
>
> Repository: hive-git
>
>
> Description
> -------
>
> HIVE-16885
>
>
> Diffs
> -----
>
> common/src/java/org/apache/hadoop/hive/conf/HiveConf.java fce8db3df1026de8b6ee8c59567e55db40696217
> ql/src/java/org/apache/hadoop/hive/ql/ErrorMsg.java 6651900e79a5c3d4ad8329afbe3894544ce9f46e
> ql/src/java/org/apache/hadoop/hive/ql/exec/CommonJoinOperator.java 07fd653dedc9a98d89b492ae6b49da70984569f7
> ql/src/java/org/apache/hadoop/hive/ql/optimizer/physical/Vectorizer.java 737aad1b764ee6487b420f2b9ea651c42e08e9bf
> ql/src/java/org/apache/hadoop/hive/ql/optimizer/stats/annotation/StatsRulesProcFactory.java fc6adafa0ebd0bd49d59cd0f4a82f70e9646ca6d
> ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java 9e84a29470c481d932d4f2d12e2898e05a925e5b
> ql/src/test/queries/clientpositive/join47.q PRE-CREATION
> ql/src/test/queries/clientpositive/mapjoin47.q PRE-CREATION
> ql/src/test/queries/clientpositive/smb_mapjoin_47.q PRE-CREATION
> ql/src/test/results/clientpositive/join47.q.out PRE-CREATION
> ql/src/test/results/clientpositive/mapjoin47.q.out PRE-CREATION
> ql/src/test/results/clientpositive/smb_mapjoin_47.q.out PRE-CREATION
>
>
> Diff: https://reviews.apache.org/r/60116/diff/1/
>
>
> Testing
> -------
>
>
> Thanks,
>
> Jesús Camacho Rodríguez
>
>
Re: Review Request 60116: HIVE-16885
Posted by Jesús Camacho Rodríguez <jc...@hortonworks.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/60116/#review178259
-----------------------------------------------------------
common/src/java/org/apache/hadoop/hive/conf/HiveConf.java
Lines 1092 (patched)
<https://reviews.apache.org/r/60116/#comment252131>
Sounds like a plan, I'll update the patch.
- Jesús Camacho Rodríguez
On June 15, 2017, 10:49 a.m., Jesús Camacho Rodríguez wrote:
>
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/60116/
> -----------------------------------------------------------
>
> (Updated June 15, 2017, 10:49 a.m.)
>
>
> Review request for hive and Ashutosh Chauhan.
>
>
> Bugs: HIVE-16885
> https://issues.apache.org/jira/browse/HIVE-16885
>
>
> Repository: hive-git
>
>
> Description
> -------
>
> HIVE-16885
>
>
> Diffs
> -----
>
> common/src/java/org/apache/hadoop/hive/conf/HiveConf.java fce8db3df1026de8b6ee8c59567e55db40696217
> ql/src/java/org/apache/hadoop/hive/ql/ErrorMsg.java 6651900e79a5c3d4ad8329afbe3894544ce9f46e
> ql/src/java/org/apache/hadoop/hive/ql/exec/CommonJoinOperator.java 07fd653dedc9a98d89b492ae6b49da70984569f7
> ql/src/java/org/apache/hadoop/hive/ql/optimizer/physical/Vectorizer.java 737aad1b764ee6487b420f2b9ea651c42e08e9bf
> ql/src/java/org/apache/hadoop/hive/ql/optimizer/stats/annotation/StatsRulesProcFactory.java fc6adafa0ebd0bd49d59cd0f4a82f70e9646ca6d
> ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java 9e84a29470c481d932d4f2d12e2898e05a925e5b
> ql/src/test/queries/clientpositive/join47.q PRE-CREATION
> ql/src/test/queries/clientpositive/mapjoin47.q PRE-CREATION
> ql/src/test/queries/clientpositive/smb_mapjoin_47.q PRE-CREATION
> ql/src/test/results/clientpositive/join47.q.out PRE-CREATION
> ql/src/test/results/clientpositive/mapjoin47.q.out PRE-CREATION
> ql/src/test/results/clientpositive/smb_mapjoin_47.q.out PRE-CREATION
>
>
> Diff: https://reviews.apache.org/r/60116/diff/1/
>
>
> Testing
> -------
>
>
> Thanks,
>
> Jesús Camacho Rodríguez
>
>
Re: Review Request 60116: HIVE-16885
Posted by Ashutosh Chauhan <ha...@apache.org>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/60116/#review178246
-----------------------------------------------------------
common/src/java/org/apache/hadoop/hive/conf/HiveConf.java
Lines 1092 (patched)
<https://reviews.apache.org/r/60116/#comment252120>
This should be true by default. Compiler should always generate an optimal plan.
Further, its not clear if vectorized cross join followed by filter will actually be faster than non-vectorized join which doesn't generate unnecessary rows.
- Ashutosh Chauhan
On June 15, 2017, 10:49 a.m., Jesús Camacho Rodríguez wrote:
>
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/60116/
> -----------------------------------------------------------
>
> (Updated June 15, 2017, 10:49 a.m.)
>
>
> Review request for hive and Ashutosh Chauhan.
>
>
> Bugs: HIVE-16885
> https://issues.apache.org/jira/browse/HIVE-16885
>
>
> Repository: hive-git
>
>
> Description
> -------
>
> HIVE-16885
>
>
> Diffs
> -----
>
> common/src/java/org/apache/hadoop/hive/conf/HiveConf.java fce8db3df1026de8b6ee8c59567e55db40696217
> ql/src/java/org/apache/hadoop/hive/ql/ErrorMsg.java 6651900e79a5c3d4ad8329afbe3894544ce9f46e
> ql/src/java/org/apache/hadoop/hive/ql/exec/CommonJoinOperator.java 07fd653dedc9a98d89b492ae6b49da70984569f7
> ql/src/java/org/apache/hadoop/hive/ql/optimizer/physical/Vectorizer.java 737aad1b764ee6487b420f2b9ea651c42e08e9bf
> ql/src/java/org/apache/hadoop/hive/ql/optimizer/stats/annotation/StatsRulesProcFactory.java fc6adafa0ebd0bd49d59cd0f4a82f70e9646ca6d
> ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java 9e84a29470c481d932d4f2d12e2898e05a925e5b
> ql/src/test/queries/clientpositive/join47.q PRE-CREATION
> ql/src/test/queries/clientpositive/mapjoin47.q PRE-CREATION
> ql/src/test/queries/clientpositive/smb_mapjoin_47.q PRE-CREATION
> ql/src/test/results/clientpositive/join47.q.out PRE-CREATION
> ql/src/test/results/clientpositive/mapjoin47.q.out PRE-CREATION
> ql/src/test/results/clientpositive/smb_mapjoin_47.q.out PRE-CREATION
>
>
> Diff: https://reviews.apache.org/r/60116/diff/1/
>
>
> Testing
> -------
>
>
> Thanks,
>
> Jesús Camacho Rodríguez
>
>