Posted to dev@hive.apache.org by Hari Sankar Sivarama Subramaniyan <hs...@hortonworks.com> on 2016/02/02 23:16:43 UTC

Review Request 43115: HIVE-12924 CBO: Calcite Operator To Hive Operator (Calcite Return Path): TestCliDriver groupby_ppr_multi_distinct.q failure

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/43115/
-----------------------------------------------------------

Review request for hive, Jesús Camacho Rodríguez and John Pullokkaran.


Repository: hive-git


Description
-------

CBO: Calcite Operator To Hive Operator (Calcite Return Path): TestCliDriver groupby_ppr_multi_distinct.q failure


Diffs
-----

  ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/translator/HiveGBOpConvUtil.java 7fbf8cd 

Diff: https://reviews.apache.org/r/43115/diff/


Testing
-------

Precommit runs


Thanks,

Hari Sankar Sivarama Subramaniyan


Re: Review Request 43115: HIVE-12924 CBO: Calcite Operator To Hive Operator (Calcite Return Path): TestCliDriver groupby_ppr_multi_distinct.q failure

Posted by Hari Sankar Sivarama Subramaniyan <hs...@hortonworks.com>.

(Updated Feb. 10, 2016, 12:12 a.m.)


Review request for hive, Jesús Camacho Rodríguez and John Pullokkaran.


Repository: hive-git


Description
-------

CBO: Calcite Operator To Hive Operator (Calcite Return Path): TestCliDriver groupby_ppr_multi_distinct.q failure


Diffs (updated)
-----

  ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/translator/HiveGBOpConvUtil.java 7fbf8cd 
  ql/src/test/results/clientpositive/cbo_rp_groupby3_noskew_multi_distinct.q.out 95233b0 

Diff: https://reviews.apache.org/r/43115/diff/


Testing
-------

Precommit runs


Thanks,

Hari Sankar Sivarama Subramaniyan


Re: Review Request 43115: HIVE-12924 CBO: Calcite Operator To Hive Operator (Calcite Return Path): TestCliDriver groupby_ppr_multi_distinct.q failure

Posted by Hari Sankar Sivarama Subramaniyan <hs...@hortonworks.com>.

(Updated Feb. 4, 2016, 9:29 p.m.)


Review request for hive, Jesús Camacho Rodríguez and John Pullokkaran.


Changes
-------

Thanks John for the review.

The naming convention for a Distinct UDAF field in the reduce-side GBY is: <Last Reduce Key>:<Current Distinct UDF#>._col<Distinct Key # in the current Distinct UDF>. It seems that we currently don't generate the colExprMap correctly for this convention in HiveGBOpConvUtil.genMapSideRS(). The reduce-side GBY pipeline looks good to me in the current return path code. Since we are not generating entries for the correct columns in the map-side ReduceSink operator, we run into an exception when we look up the entry for a column during the reduce-side aggregation.
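
To make the convention concrete, here is a minimal sketch of how such names could be composed. It is not the Hive code: the class DistinctColumnNames and its methods are hypothetical, and the "KEY." prefix is taken from the explain output quoted further down in this message.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of Hive: composes reduce-side column names
// for the arguments of DISTINCT aggregates using the convention above.
final class DistinctColumnNames {
    private DistinctColumnNames() {}

    // lastReduceKeyIdx: index of the last reduce key column (1 for _col1)
    // distinctUdfIdx:   position of the DISTINCT aggregate among the distinct aggregates
    // distinctKeyIdx:   position of the argument inside that aggregate
    static String name(int lastReduceKeyIdx, int distinctUdfIdx, int distinctKeyIdx) {
        return "KEY._col" + lastReduceKeyIdx + ":" + distinctUdfIdx + "._col" + distinctKeyIdx;
    }

    // argCounts[i] is the number of arguments of the i-th DISTINCT aggregate.
    static List<String> namesFor(int lastReduceKeyIdx, int[] argCounts) {
        List<String> names = new ArrayList<>();
        for (int udf = 0; udf < argCounts.length; udf++) {
            for (int key = 0; key < argCounts[udf]; key++) {
                names.add(name(lastReduceKeyIdx, udf, key));
            }
        }
        return names;
    }

    public static void main(String[] args) {
        // count(DISTINCT a), count(DISTINCT b, c), sum(DISTINCT d): 1, 2 and 1 arguments
        System.out.println(namesFor(1, new int[] {1, 2, 1}));
    }
}
```

With argument counts 1, 2 and 1 this yields the four KEY._col1:... names that appear in the reduce-side Group By Operator below.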

There is another optimization that could possibly be done in the scenario below (after turning off map-side aggregation):
explain FROM srcpart src SELECT count(DISTINCT src.value), count(DISTINCT src.key,src.key), sum(DISTINCT src.value) WHERE src.ds = '2008-04-08' GROUP BY substr(src.key,1,1);

The Reduce Operator Tree :
.......
      Reduce Operator Tree:
        Group By Operator
          aggregations: count(DISTINCT KEY._col1:0._col0), count(DISTINCT KEY._col1:1._col0, KEY._col1:1._col1), sum(DISTINCT KEY._col1:2._col0)
          keys: KEY._col0 (type: string)
          mode: complete
          outputColumnNames: _col0, _col1, _col2, _col3
          Statistics: Num rows: 1 Data size: 0 Basic stats: PARTIAL Column stats: NONE
          Select Operator
          ......
As you can see:
1. KEY._col1:1._col0 and KEY._col1:1._col1 are mapped to the same column, so we could have used a single column in the row schema of the ReduceSink operator pipeline.
2. KEY._col1:2._col0 and KEY._col1:0._col0 are mapped to the same column, so the same applies as in 1.

I verified that this also happens in the non-return path code, so it should be handled as a general change, as a further optimization in a separate JIRA.
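
As a rough illustration of the duplication described above, the sketch below groups the reduce-side DISTINCT column names by the source expression they carry. It is only a toy model, not Hive code: the class SharedDistinctColumns is hypothetical, and source expressions are represented as plain strings.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical toy model, not part of Hive: shows which reduce-side
// DISTINCT columns carry the same source expression.
final class SharedDistinctColumns {
    private SharedDistinctColumns() {}

    // distinctArgs.get(i) lists the argument expressions of the i-th DISTINCT aggregate.
    static Map<String, List<String>> groupBySource(int lastReduceKeyIdx,
                                                   List<List<String>> distinctArgs) {
        Map<String, List<String>> bySource = new LinkedHashMap<>();
        for (int udf = 0; udf < distinctArgs.size(); udf++) {
            List<String> args = distinctArgs.get(udf);
            for (int key = 0; key < args.size(); key++) {
                String col = "KEY._col" + lastReduceKeyIdx + ":" + udf + "._col" + key;
                bySource.computeIfAbsent(args.get(key), s -> new ArrayList<>()).add(col);
            }
        }
        return bySource;
    }

    public static void main(String[] args) {
        // count(DISTINCT src.value), count(DISTINCT src.key, src.key), sum(DISTINCT src.value)
        List<List<String>> distinctArgs = Arrays.asList(
            Arrays.asList("src.value"),
            Arrays.asList("src.key", "src.key"),
            Arrays.asList("src.value"));
        System.out.println(groupBySource(1, distinctArgs));
    }
}
```

Any source expression that ends up with more than one column name in this grouping is a candidate for sharing a single column in the ReduceSink row schema.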

Thanks
Hari


Repository: hive-git


Description
-------

CBO: Calcite Operator To Hive Operator (Calcite Return Path): TestCliDriver groupby_ppr_multi_distinct.q failure


Diffs (updated)
-----

  ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/translator/HiveGBOpConvUtil.java 7fbf8cd 

Diff: https://reviews.apache.org/r/43115/diff/


Testing
-------

Precommit runs


Thanks,

Hari Sankar Sivarama Subramaniyan


Re: Review Request 43115: HIVE-12924 CBO: Calcite Operator To Hive Operator (Calcite Return Path): TestCliDriver groupby_ppr_multi_distinct.q failure

Posted by John Pullokkaran <jp...@hortonworks.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/43115/#review117552
-----------------------------------------------------------




ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/translator/HiveGBOpConvUtil.java (line 225)
<https://reviews.apache.org/r/43115/#comment178783>

    Instead of walking the expression you could keep a map of inputref to exprnode.
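
    One possible shape for such a map, as a minimal sketch only. InputRefCache and Node are hypothetical stand-ins (in Hive the key would presumably be the index of a Calcite RexInputRef and the value an ExprNodeDesc); this is not the code under review.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch, not Hive code: remember the translated expression for
// each input reference once, then look it up instead of re-walking expressions.
final class InputRefCache {
    // Stand-in for a translated expression node.
    static final class Node {
        final String text;
        Node(String text) { this.text = text; }
    }

    private final Map<Integer, Node> byInputRef = new HashMap<>();

    // Record the node produced for the given input reference index.
    void register(int inputRefIndex, Node node) {
        byInputRef.put(inputRefIndex, node);
    }

    // Constant-time lookup; fails loudly if the reference was never registered.
    Node lookup(int inputRefIndex) {
        Node node = byInputRef.get(inputRefIndex);
        if (node == null) {
            throw new IllegalStateException("no expression registered for input ref " + inputRefIndex);
        }
        return node;
    }

    public static void main(String[] args) {
        InputRefCache cache = new InputRefCache();
        cache.register(0, new Node("substr(key, 1, 1)"));
        cache.register(1, new Node("value"));
        System.out.println(cache.lookup(1).text);
    }
}
```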


- John Pullokkaran




Re: Review Request 43115: HIVE-12924 CBO: Calcite Operator To Hive Operator (Calcite Return Path): TestCliDriver groupby_ppr_multi_distinct.q failure

Posted by John Pullokkaran <jp...@hortonworks.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/43115/#review117553
-----------------------------------------------------------




ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/translator/HiveGBOpConvUtil.java (line 730)
<https://reviews.apache.org/r/43115/#comment178784>

    Why is this being changed on the MapSide RS?
    I thought the issue was with distincts on the ReduceSide GB when map-side aggregation is turned off.


- John Pullokkaran

