Posted to dev@hive.apache.org by Hari Sankar Sivarama Subramaniyan <hs...@hortonworks.com> on 2016/02/02 23:16:43 UTC
Review Request 43115: HIVE-12924 CBO: Calcite Operator To Hive Operator (Calcite Return Path): TestCliDriver groupby_ppr_multi_distinct.q failure
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/43115/
-----------------------------------------------------------
Review request for hive, Jesús Camacho Rodríguez and John Pullokkaran.
Repository: hive-git
Description
-------
CBO: Calcite Operator To Hive Operator (Calcite Return Path): TestCliDriver groupby_ppr_multi_distinct.q failure
Diffs
-----
ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/translator/HiveGBOpConvUtil.java 7fbf8cd
Diff: https://reviews.apache.org/r/43115/diff/
Testing
-------
Precommit runs
Thanks,
Hari Sankar Sivarama Subramaniyan
Re: Review Request 43115: HIVE-12924 CBO: Calcite Operator To Hive Operator (Calcite Return Path): TestCliDriver groupby_ppr_multi_distinct.q failure
Posted by Hari Sankar Sivarama Subramaniyan <hs...@hortonworks.com>.
(Updated Feb. 10, 2016, 12:12 a.m.)
Review request for hive, Jesús Camacho Rodríguez and John Pullokkaran.
Repository: hive-git
Description
-------
CBO: Calcite Operator To Hive Operator (Calcite Return Path): TestCliDriver groupby_ppr_multi_distinct.q failure
Diffs (updated)
-----
ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/translator/HiveGBOpConvUtil.java 7fbf8cd
ql/src/test/results/clientpositive/cbo_rp_groupby3_noskew_multi_distinct.q.out 95233b0
Diff: https://reviews.apache.org/r/43115/diff/
Testing
-------
Precommit runs
Thanks,
Hari Sankar Sivarama Subramaniyan
Re: Review Request 43115: HIVE-12924 CBO: Calcite Operator To Hive Operator (Calcite Return Path): TestCliDriver groupby_ppr_multi_distinct.q failure
Posted by Hari Sankar Sivarama Subramaniyan <hs...@hortonworks.com>.
(Updated Feb. 4, 2016, 9:29 p.m.)
Review request for hive, Jesús Camacho Rodríguez and John Pullokkaran.
Changes
-------
Thanks, John, for the review.
The naming convention for a Distinct UDAF field in the reduce-side GBY is <Last Reduce Key>:<Current Distinct UDAF #>._col<Distinct Key # in the current Distinct UDAF>, for example KEY._col1:1._col0. It seems that we currently don't generate the colExprMap correctly for this convention in HiveGBOpConvUtil.genMapSideRS(). The reduce-side GBY pipeline in the current return path code looks good to me. Because the map-side ReduceSink operator does not generate colExprMap entries for the correct columns, we run into an exception when the reduce-side aggregation looks up the entry for one of those columns.
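To make the naming convention concrete, here is a minimal Java sketch (not the actual HiveGBOpConvUtil code) that builds the reduce-side distinct column names and a colExprMap keyed by them. The class and method names (DistinctColumnNames, reduceSideName, buildColExprMap) are hypothetical, and plain strings stand in for Hive's ExprNodeDesc objects. The last reduce key is passed with its KEY. prefix so that the names line up with the explain output in this thread.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration of the reduce-side distinct column naming convention:
// <Last Reduce Key>:<Distinct UDAF #>._col<Distinct key # within that UDAF>
final class DistinctColumnNames {

  // Builds the name the reduce-side GBY would look up for one distinct argument.
  static String reduceSideName(String lastReduceKey, int distinctUdafIndex, int keyIndex) {
    return lastReduceKey + ":" + distinctUdafIndex + "._col" + keyIndex;
  }

  // Sketch of a colExprMap keyed by that convention: one entry per argument of
  // every distinct UDAF. argsPerUdaf.get(i) holds the arguments of distinct UDAF i
  // (strings stand in for ExprNodeDesc objects).
  static Map<String, String> buildColExprMap(String lastReduceKey, List<List<String>> argsPerUdaf) {
    Map<String, String> colExprMap = new LinkedHashMap<>();
    for (int i = 0; i < argsPerUdaf.size(); i++) {
      List<String> args = argsPerUdaf.get(i);
      for (int j = 0; j < args.size(); j++) {
        colExprMap.put(reduceSideName(lastReduceKey, i, j), args.get(j));
      }
    }
    return colExprMap;
  }

  public static void main(String[] args) {
    // Distinct arguments of: count(DISTINCT value), count(DISTINCT key, key), sum(DISTINCT value)
    List<List<String>> distinctArgs = List.of(
        List.of("src.value"),
        List.of("src.key", "src.key"),
        List.of("src.value"));
    buildColExprMap("KEY._col1", distinctArgs)
        .forEach((name, expr) -> System.out.println(name + " -> " + expr));
  }
}
```

With the distinct arguments of the query below, main prints the four names that appear in the explain output: KEY._col1:0._col0, KEY._col1:1._col0, KEY._col1:1._col1 and KEY._col1:2._col0.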
Another optimization could possibly be done in the scenario below (with map-side aggregation turned off):
explain FROM srcpart src SELECT count(DISTINCT src.value), count(DISTINCT src.key,src.key), sum(DISTINCT src.value) WHERE src.ds = '2008-04-08' GROUP BY substr(src.key,1,1);
The Reduce Operator Tree:
.......
Reduce Operator Tree:
  Group By Operator
    aggregations: count(DISTINCT KEY._col1:0._col0), count(DISTINCT KEY._col1:1._col0, KEY._col1:1._col1), sum(DISTINCT KEY._col1:2._col0)
    keys: KEY._col0 (type: string)
    mode: complete
    outputColumnNames: _col0, _col1, _col2, _col3
    Statistics: Num rows: 1 Data size: 0 Basic stats: PARTIAL Column stats: NONE
    Select Operator
      ......
As you can see:
1. KEY._col1:1._col0 and KEY._col1:1._col1 are mapped to the same column (src.key), so we could have used a single column in the row schema of the ReduceSink operator pipeline.
2. KEY._col1:2._col0 and KEY._col1:0._col0 are mapped to the same column (src.value), so the same change as in 1 applies.
I verified that this also happens in the non-return-path code, so it should be addressed as a general optimization in a separate JIRA.
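The follow-up optimization could look roughly like the sketch below. It assumes that identical distinct expressions can be detected by comparing them; here they are compared by string form, which a real implementation would replace with a proper expression-equality check. SharedDistinctColumns and its methods are hypothetical names, and the code only computes which reduce-side names could share a ReduceSink column; it does not call any Hive API.

```java
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: give each distinct expression one column in the
// ReduceSink row schema and point every reduce-side name that uses the
// same expression at that shared column.
final class SharedDistinctColumns {

  static Map<String, Integer> assignSharedColumns(String lastReduceKey,
                                                  List<List<String>> argsPerUdaf) {
    // Expression -> column index in the (hypothetical) deduplicated row schema.
    Map<String, Integer> exprToColumn = new LinkedHashMap<>();
    // Reduce-side name (e.g. KEY._col1:1._col0) -> shared column index.
    Map<String, Integer> nameToColumn = new LinkedHashMap<>();
    for (int i = 0; i < argsPerUdaf.size(); i++) {
      List<String> args = argsPerUdaf.get(i);
      for (int j = 0; j < args.size(); j++) {
        String expr = args.get(j);
        Integer col = exprToColumn.get(expr);
        if (col == null) {
          // First occurrence of this expression: allocate the next column.
          col = exprToColumn.size();
          exprToColumn.put(expr, col);
        }
        nameToColumn.put(lastReduceKey + ":" + i + "._col" + j, col);
      }
    }
    return nameToColumn;
  }

  // Number of columns the row schema would need after sharing.
  static int columnCount(Map<String, Integer> nameToColumn) {
    return new HashSet<>(nameToColumn.values()).size();
  }

  public static void main(String[] args) {
    Map<String, Integer> shared = assignSharedColumns("KEY._col1", List.of(
        List.of("src.value"),
        List.of("src.key", "src.key"),
        List.of("src.value")));
    shared.forEach((name, col) -> System.out.println(name + " -> column " + col));
    System.out.println("columns needed: " + columnCount(shared));
  }
}
```

For the query above, the four reduce-side names collapse onto two columns: both src.value references share column 0 and both src.key references share column 1.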
Thanks
Hari
Repository: hive-git
Description
-------
CBO: Calcite Operator To Hive Operator (Calcite Return Path): TestCliDriver groupby_ppr_multi_distinct.q failure
Diffs (updated)
-----
ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/translator/HiveGBOpConvUtil.java 7fbf8cd
Diff: https://reviews.apache.org/r/43115/diff/
Testing
-------
Precommit runs
Thanks,
Hari Sankar Sivarama Subramaniyan
Re: Review Request 43115: HIVE-12924 CBO: Calcite Operator To Hive Operator (Calcite Return Path): TestCliDriver groupby_ppr_multi_distinct.q failure
Posted by John Pullokkaran <jp...@hortonworks.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/43115/#review117552
-----------------------------------------------------------
ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/translator/HiveGBOpConvUtil.java (line 225)
<https://reviews.apache.org/r/43115/#comment178783>
Instead of walking the expression, you could keep a map from inputref to exprnode.
- John Pullokkaran
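A minimal sketch of the suggested alternative, assuming the walk at line 225 resolves input references to expressions that have already been generated (the surrounding code is not shown in this thread). InputRefLookup, RefExpr, index and resolve are hypothetical names, and strings stand in for ExprNodeDesc objects. The map is built once; after that each argument is resolved with a hash lookup instead of a traversal.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class InputRefLookup {

  // Hypothetical pairing of an input reference position with the expression
  // generated for it (a String stands in for a Hive ExprNodeDesc).
  static final class RefExpr {
    final int inputRef;
    final String expr;

    RefExpr(int inputRef, String expr) {
      this.inputRef = inputRef;
      this.expr = expr;
    }
  }

  // Built once: input ref position -> expression.
  static Map<Integer, String> index(List<RefExpr> pairs) {
    Map<Integer, String> refToExpr = new HashMap<>();
    for (RefExpr p : pairs) {
      // Keep the first expression registered for a given input ref.
      refToExpr.putIfAbsent(p.inputRef, p.expr);
    }
    return refToExpr;
  }

  // Constant-time lookup; a missing entry fails immediately with a clear message.
  static String resolve(Map<Integer, String> refToExpr, int inputRef) {
    String expr = refToExpr.get(inputRef);
    if (expr == null) {
      throw new IllegalArgumentException("No expression for input ref $" + inputRef);
    }
    return expr;
  }
}
```

Whether keeping the first expression per input ref (putIfAbsent) is the right policy depends on the real code; the sketch only shows the build-once, look-up-many shape of the suggestion.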
Re: Review Request 43115: HIVE-12924 CBO: Calcite Operator To Hive Operator (Calcite Return Path): TestCliDriver groupby_ppr_multi_distinct.q failure
Posted by John Pullokkaran <jp...@hortonworks.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/43115/#review117553
-----------------------------------------------------------
ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/translator/HiveGBOpConvUtil.java (line 730)
<https://reviews.apache.org/r/43115/#comment178784>
Why is this being changed on the map-side RS?
I thought the issue was with distincts on the reduce-side GB when map-side aggregation is turned off.
- John Pullokkaran