You are viewing a plain text version of this content. The canonical link for it is here.

Posted to dev@hive.apache.org by Jesús Camacho Rodríguez <jc...@hortonworks.com> on 2018/11/01 20:33:00 UTC

Re: Review Request 69202: HIVE-20804 Further improvements to group by optimization with constraints

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/69202/#review210258
-----------------------------------------------------------




ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/HiveRelOptUtil.java
Lines 774 (patched)
<https://reviews.apache.org/r/69202/#comment294900>

    Cardinality and NDV may change. This just checks whether column origin/lineage can be backtrack to single input. We should use RelMdColumnOrigins or RelMdExpressionLineage instead of writing this new class.



ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/HiveRelOptUtil.java
Lines 809 (patched)
<https://reviews.apache.org/r/69202/#comment294899>

    You can use _shift_ method in ImmutableBitSet for these cases.



ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/HiveRelOptUtil.java
Lines 823 (patched)
<https://reviews.apache.org/r/69202/#comment294904>

    Not used?



ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/HiveRelOptUtil.java
Lines 833 (patched)
<https://reviews.apache.org/r/69202/#comment294894>

    colSet.contains(ImmutableBitSet.of(i)) -> colSet.get(i)



ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/HiveRelOptUtil.java
Lines 834 (patched)
<https://reviews.apache.org/r/69202/#comment294896>

    This will create a copy in every iteration. You could create a ImmutableBitSet.Builder from colset outside of the loop and then clear it here.



ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/rules/HiveRelFieldTrimmer.java
Lines 321 (patched)
<https://reviews.apache.org/r/69202/#comment294901>

    Method name should change, tbh it is not about cardinality, but about whether the rows can continue being identified uniquely.



ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/rules/HiveRelFieldTrimmer.java
Lines 346 (patched)
<https://reviews.apache.org/r/69202/#comment294902>

    This should never be empty?



ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/rules/HiveRelFieldTrimmer.java
Lines 359 (patched)
<https://reviews.apache.org/r/69202/#comment294903>

    This logic does not seem correct. What happens when a Project permutes input references and the GBy columns would not be in same order as the key? For instance consider GBy key is (0,1,2) which maps into columns (2,0,4) in TS. Key is (0,4). You would remove column 1 from GBy, but you should remove column 0. Using ColumnOrigins or ExpressionLineage providers to get the mapping would solve the issue.


- Jesús Camacho Rodríguez


On Oct. 29, 2018, 6:45 p.m., Vineet Garg wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/69202/
> -----------------------------------------------------------
> 
> (Updated Oct. 29, 2018, 6:45 p.m.)
> 
> 
> Review request for hive and Jesús Camacho Rodríguez.
> 
> 
> Bugs: HIVE-20804
>     https://issues.apache.org/jira/browse/HIVE-20804
> 
> 
> Repository: hive-git
> 
> 
> Description
> -------
> 
> See Jira
> 
> 
> Diffs
> -----
> 
>   ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/HiveRelOptUtil.java 9aa30129b6 
>   ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/rules/HiveRelFieldTrimmer.java b7c31bdfca 
>   ql/src/test/queries/clientpositive/constraints_optimization.q 70ab8509c5 
>   ql/src/test/results/clientpositive/llap/constraints_optimization.q.out 96caa4d6dd 
> 
> 
> Diff: https://reviews.apache.org/r/69202/diff/1/
> 
> 
> Testing
> -------
> 
> 
> Thanks,
> 
> Vineet Garg
> 
>

Re: Review Request 69202: HIVE-20804 Further improvements to group by optimization with constraints

Posted by Vineet Garg <vg...@hortonworks.com>.


> On Nov. 1, 2018, 8:33 p.m., Jesús Camacho Rodríguez wrote:
> > ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/HiveRelOptUtil.java
> > Lines 774 (patched)
> > <https://reviews.apache.org/r/69202/diff/1/?file=2103050#file2103050line774>
> >
> >     Cardinality and NDV may change. This just checks whether column origin/lineage can be backtrack to single input. We should use RelMdColumnOrigins or RelMdExpressionLineage instead of writing this new class.

The reason I decided not to use RelMdColumnOrigins or RelMdExpressionLineage is because both of them operate on single column/rex node. To get to column origin for a set of columns/RexRefs we will need to call them repeatadely. I feel this is big overkill and therefore wrote another class to take on set of columns.

I agree about the name being misleading. I'll rename the class to better communicate its purpose.


> On Nov. 1, 2018, 8:33 p.m., Jesús Camacho Rodríguez wrote:
> > ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/rules/HiveRelFieldTrimmer.java
> > Lines 346 (patched)
> > <https://reviews.apache.org/r/69202/diff/1/?file=2103051#file2103051line346>
> >
> >     This should never be empty?

It could be empty if 'CardinalityChange' couldn't backtrack. However if can never be null.


> On Nov. 1, 2018, 8:33 p.m., Jesús Camacho Rodríguez wrote:
> > ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/rules/HiveRelFieldTrimmer.java
> > Lines 359 (patched)
> > <https://reviews.apache.org/r/69202/diff/1/?file=2103051#file2103051line359>
> >
> >     This logic does not seem correct. What happens when a Project permutes input references and the GBy columns would not be in same order as the key? For instance consider GBy key is (0,1,2) which maps into columns (2,0,4) in TS. Key is (0,4). You would remove column 1 from GBy, but you should remove column 0. Using ColumnOrigins or ExpressionLineage providers to get the mapping would solve the issue.

The whole for loop actually takes care of this case. CardinalityChange returns you backtracked (if it could) columns (2,0,4) in this case. Then we loop over this backtracked columns and remove the corresponding column from the GBy key (based on the position) if not part of key. For instance in this case first column in backtracked columns (i.e. 2) is not part of key therefore corresponding column (i.e. 0) will be removed from GBy key.


- Vineet


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/69202/#review210258
-----------------------------------------------------------


On Nov. 5, 2018, 10:57 p.m., Vineet Garg wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/69202/
> -----------------------------------------------------------
> 
> (Updated Nov. 5, 2018, 10:57 p.m.)
> 
> 
> Review request for hive and Jesús Camacho Rodríguez.
> 
> 
> Bugs: HIVE-20804
>     https://issues.apache.org/jira/browse/HIVE-20804
> 
> 
> Repository: hive-git
> 
> 
> Description
> -------
> 
> See Jira
> 
> 
> Diffs
> -----
> 
>   ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/HiveRelOptUtil.java 9aa30129b6 
>   ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/rules/HiveRelFieldTrimmer.java b7c31bdfca 
>   ql/src/test/queries/clientpositive/constraints_optimization.q 70ab8509c5 
>   ql/src/test/results/clientpositive/llap/constraints_optimization.q.out 96caa4d6dd 
> 
> 
> Diff: https://reviews.apache.org/r/69202/diff/2/
> 
> 
> Testing
> -------
> 
> 
> Thanks,
> 
> Vineet Garg
> 
>