You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@hive.apache.org by Attila Magyar <am...@hortonworks.com> on 2020/02/25 15:12:28 UTC
Review Request 71783: Implement TopNKeyFilter efficiency check
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/71783/
-----------------------------------------------------------
Review request for hive, Gopal V, Jesús Camacho Rodríguez, Krisztian Kasa, and Panos Garefalakis.
Bugs: HIVE-22925
https://issues.apache.org/jira/browse/HIVE-22925
Repository: hive-git
Description
-------
In certain cases the TopNKey filter might work in an inefficient way and adds extra CPU overhead. For example if the rows are coming in an ascending order but the filter wants the top N smallest elements the filter will forward everything.
Inefficient should be detected in runtime so that the filter can be disabled of the ration between forwarder_rows/total_rows is too high.
Diffs
-----
common/src/java/org/apache/hadoop/hive/conf/HiveConf.java e419dc5eb3b
ql/src/java/org/apache/hadoop/hive/ql/exec/TopNKeyFilter.java 38d2e08b760
ql/src/java/org/apache/hadoop/hive/ql/exec/TopNKeyOperator.java dd66dfcd72e
ql/src/java/org/apache/hadoop/hive/ql/exec/vector/VectorTopNKeyOperator.java 7feadd3137d
ql/src/java/org/apache/hadoop/hive/ql/optimizer/topnkey/TopNKeyProcessor.java 3869ffa2b83
ql/src/java/org/apache/hadoop/hive/ql/parse/TezCompiler.java 31735c9ea3d
ql/src/java/org/apache/hadoop/hive/ql/plan/TopNKeyDesc.java 19910a341e0
ql/src/test/org/apache/hadoop/hive/ql/exec/TestTopNKeyFilter.java 95cd45978a8
Diff: https://reviews.apache.org/r/71783/diff/1/
Testing
-------
on dwx
Thanks,
Attila Magyar
Re: Review Request 71783: Implement TopNKeyFilter efficiency check
Posted by Krisztian Kasa <kk...@hortonworks.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/71783/#review219672
-----------------------------------------------------------
Ship it!
+1 LGTM
- Krisztian Kasa
On Feb. 27, 2020, 12:57 p.m., Attila Magyar wrote:
>
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/71783/
> -----------------------------------------------------------
>
> (Updated Feb. 27, 2020, 12:57 p.m.)
>
>
> Review request for hive, Gopal V, Jesús Camacho Rodríguez, Krisztian Kasa, and Panos Garefalakis.
>
>
> Bugs: HIVE-22925
> https://issues.apache.org/jira/browse/HIVE-22925
>
>
> Repository: hive-git
>
>
> Description
> -------
>
> In certain cases the TopNKey filter might work in an inefficient way and adds extra CPU overhead. For example if the rows are coming in an descending order but the filter wants the top N smallest elements the filter will forward everything.
>
> Inefficient should be detected in runtime so that the filter can be disabled of the ration between forwarder_rows/total_rows is too high.
>
>
> Diffs
> -----
>
> common/src/java/org/apache/hadoop/hive/conf/HiveConf.java e419dc5eb3b
> ql/src/java/org/apache/hadoop/hive/ql/exec/TopNKeyFilter.java 38d2e08b760
> ql/src/java/org/apache/hadoop/hive/ql/exec/TopNKeyOperator.java dd66dfcd72e
> ql/src/java/org/apache/hadoop/hive/ql/exec/vector/VectorTopNKeyOperator.java 7feadd3137d
> ql/src/java/org/apache/hadoop/hive/ql/optimizer/topnkey/TopNKeyProcessor.java 3869ffa2b83
> ql/src/java/org/apache/hadoop/hive/ql/parse/TezCompiler.java 31735c9ea3d
> ql/src/java/org/apache/hadoop/hive/ql/plan/TopNKeyDesc.java 19910a341e0
> ql/src/test/org/apache/hadoop/hive/ql/exec/TestTopNKeyFilter.java 95cd45978a8
>
>
> Diff: https://reviews.apache.org/r/71783/diff/3/
>
>
> Testing
> -------
>
> on dwx
>
>
> Thanks,
>
> Attila Magyar
>
>
Re: Review Request 71783: Implement TopNKeyFilter efficiency check
Posted by Attila Magyar <am...@hortonworks.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/71783/
-----------------------------------------------------------
(Updated Feb. 27, 2020, 12:57 p.m.)
Review request for hive, Gopal V, Jesús Camacho Rodríguez, Krisztian Kasa, and Panos Garefalakis.
Bugs: HIVE-22925
https://issues.apache.org/jira/browse/HIVE-22925
Repository: hive-git
Description
-------
In certain cases the TopNKey filter might work in an inefficient way and adds extra CPU overhead. For example if the rows are coming in an descending order but the filter wants the top N smallest elements the filter will forward everything.
Inefficient should be detected in runtime so that the filter can be disabled of the ration between forwarder_rows/total_rows is too high.
Diffs (updated)
-----
common/src/java/org/apache/hadoop/hive/conf/HiveConf.java e419dc5eb3b
ql/src/java/org/apache/hadoop/hive/ql/exec/TopNKeyFilter.java 38d2e08b760
ql/src/java/org/apache/hadoop/hive/ql/exec/TopNKeyOperator.java dd66dfcd72e
ql/src/java/org/apache/hadoop/hive/ql/exec/vector/VectorTopNKeyOperator.java 7feadd3137d
ql/src/java/org/apache/hadoop/hive/ql/optimizer/topnkey/TopNKeyProcessor.java 3869ffa2b83
ql/src/java/org/apache/hadoop/hive/ql/parse/TezCompiler.java 31735c9ea3d
ql/src/java/org/apache/hadoop/hive/ql/plan/TopNKeyDesc.java 19910a341e0
ql/src/test/org/apache/hadoop/hive/ql/exec/TestTopNKeyFilter.java 95cd45978a8
Diff: https://reviews.apache.org/r/71783/diff/3/
Changes: https://reviews.apache.org/r/71783/diff/2-3/
Testing
-------
on dwx
Thanks,
Attila Magyar
Re: Review Request 71783: Implement TopNKeyFilter efficiency check
Posted by Jesús Camacho Rodríguez <jc...@hortonworks.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/71783/#review219667
-----------------------------------------------------------
Overall LGTM, left mostly minor comments.
- Jesús Camacho Rodríguez
On Feb. 25, 2020, 3:12 p.m., Attila Magyar wrote:
>
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/71783/
> -----------------------------------------------------------
>
> (Updated Feb. 25, 2020, 3:12 p.m.)
>
>
> Review request for hive, Gopal V, Jesús Camacho Rodríguez, Krisztian Kasa, and Panos Garefalakis.
>
>
> Bugs: HIVE-22925
> https://issues.apache.org/jira/browse/HIVE-22925
>
>
> Repository: hive-git
>
>
> Description
> -------
>
> In certain cases the TopNKey filter might work in an inefficient way and adds extra CPU overhead. For example if the rows are coming in an descending order but the filter wants the top N smallest elements the filter will forward everything.
>
> Inefficient should be detected in runtime so that the filter can be disabled of the ration between forwarder_rows/total_rows is too high.
>
>
> Diffs
> -----
>
> common/src/java/org/apache/hadoop/hive/conf/HiveConf.java e419dc5eb3b
> ql/src/java/org/apache/hadoop/hive/ql/exec/TopNKeyFilter.java 38d2e08b760
> ql/src/java/org/apache/hadoop/hive/ql/exec/TopNKeyOperator.java dd66dfcd72e
> ql/src/java/org/apache/hadoop/hive/ql/exec/vector/VectorTopNKeyOperator.java 7feadd3137d
> ql/src/java/org/apache/hadoop/hive/ql/optimizer/topnkey/TopNKeyProcessor.java 3869ffa2b83
> ql/src/java/org/apache/hadoop/hive/ql/parse/TezCompiler.java 31735c9ea3d
> ql/src/java/org/apache/hadoop/hive/ql/plan/TopNKeyDesc.java 19910a341e0
> ql/src/test/org/apache/hadoop/hive/ql/exec/TestTopNKeyFilter.java 95cd45978a8
> ql/src/test/results/clientpositive/llap/topnkey.q.out 345d481f18c
>
>
> Diff: https://reviews.apache.org/r/71783/diff/2/
>
>
> Testing
> -------
>
> on dwx
>
>
> Thanks,
>
> Attila Magyar
>
>
Re: Review Request 71783: Implement TopNKeyFilter efficiency check
Posted by Jesús Camacho Rodríguez <jc...@hortonworks.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/71783/#review219661
-----------------------------------------------------------
ql/src/java/org/apache/hadoop/hive/ql/exec/TopNKeyFilter.java
Lines 76 (patched)
<https://reviews.apache.org/r/71783/#comment307871>
This will include the hashcode of the object... Is this intended?
ql/src/java/org/apache/hadoop/hive/ql/exec/TopNKeyFilter.java
Lines 87 (patched)
<https://reviews.apache.org/r/71783/#comment307870>
* Ratio between the forwarded rows and the total incoming rows.
* The higher the number is, the less is the efficiency of the filter.
* 1 means all rows should be forwarded.
ql/src/java/org/apache/hadoop/hive/ql/exec/TopNKeyOperator.java
Lines 141 (patched)
<https://reviews.apache.org/r/71783/#comment307873>
_runTimeNumRows % conf.getCheckEfficiencyNumRows() == 0_
Could you leave a short comment here explaining when we are doing the efficiency check?
ql/src/java/org/apache/hadoop/hive/ql/exec/TopNKeyOperator.java
Lines 148 (patched)
<https://reviews.apache.org/r/71783/#comment307875>
Is it worth to have this if/else branch?
ql/src/java/org/apache/hadoop/hive/ql/exec/vector/VectorTopNKeyOperator.java
Lines 181 (patched)
<https://reviews.apache.org/r/71783/#comment307876>
nit. Maybe method should not be static so we do not have pass the Logger instance around.
ql/src/java/org/apache/hadoop/hive/ql/plan/TopNKeyDesc.java
Lines 96 (patched)
<https://reviews.apache.org/r/71783/#comment307872>
Probably we do not need these at USER level (doubting we even need them at DEFAULT level).
- Jesús Camacho Rodríguez
On Feb. 25, 2020, 3:12 p.m., Attila Magyar wrote:
>
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/71783/
> -----------------------------------------------------------
>
> (Updated Feb. 25, 2020, 3:12 p.m.)
>
>
> Review request for hive, Gopal V, Jesús Camacho Rodríguez, Krisztian Kasa, and Panos Garefalakis.
>
>
> Bugs: HIVE-22925
> https://issues.apache.org/jira/browse/HIVE-22925
>
>
> Repository: hive-git
>
>
> Description
> -------
>
> In certain cases the TopNKey filter might work in an inefficient way and adds extra CPU overhead. For example if the rows are coming in an descending order but the filter wants the top N smallest elements the filter will forward everything.
>
> Inefficient should be detected in runtime so that the filter can be disabled of the ration between forwarder_rows/total_rows is too high.
>
>
> Diffs
> -----
>
> common/src/java/org/apache/hadoop/hive/conf/HiveConf.java e419dc5eb3b
> ql/src/java/org/apache/hadoop/hive/ql/exec/TopNKeyFilter.java 38d2e08b760
> ql/src/java/org/apache/hadoop/hive/ql/exec/TopNKeyOperator.java dd66dfcd72e
> ql/src/java/org/apache/hadoop/hive/ql/exec/vector/VectorTopNKeyOperator.java 7feadd3137d
> ql/src/java/org/apache/hadoop/hive/ql/optimizer/topnkey/TopNKeyProcessor.java 3869ffa2b83
> ql/src/java/org/apache/hadoop/hive/ql/parse/TezCompiler.java 31735c9ea3d
> ql/src/java/org/apache/hadoop/hive/ql/plan/TopNKeyDesc.java 19910a341e0
> ql/src/test/org/apache/hadoop/hive/ql/exec/TestTopNKeyFilter.java 95cd45978a8
> ql/src/test/results/clientpositive/llap/topnkey.q.out 345d481f18c
>
>
> Diff: https://reviews.apache.org/r/71783/diff/2/
>
>
> Testing
> -------
>
> on dwx
>
>
> Thanks,
>
> Attila Magyar
>
>