You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@hive.apache.org by Attila Magyar <am...@hortonworks.com> on 2020/02/25 15:12:28 UTC

Review Request 71783: Implement TopNKeyFilter efficiency check

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/71783/
-----------------------------------------------------------

Review request for hive, Gopal V, Jesús Camacho Rodríguez, Krisztian Kasa, and Panos Garefalakis.


Bugs: HIVE-22925
    https://issues.apache.org/jira/browse/HIVE-22925


Repository: hive-git


Description
-------

In certain cases the TopNKey filter might work in an inefficient way and adds extra CPU overhead. For example if the rows are coming in an ascending order but the filter wants the top N smallest elements the filter will forward everything.

Inefficient should be detected in runtime so that the filter can be disabled of the ration between forwarder_rows/total_rows is too high.


Diffs
-----

  common/src/java/org/apache/hadoop/hive/conf/HiveConf.java e419dc5eb3b 
  ql/src/java/org/apache/hadoop/hive/ql/exec/TopNKeyFilter.java 38d2e08b760 
  ql/src/java/org/apache/hadoop/hive/ql/exec/TopNKeyOperator.java dd66dfcd72e 
  ql/src/java/org/apache/hadoop/hive/ql/exec/vector/VectorTopNKeyOperator.java 7feadd3137d 
  ql/src/java/org/apache/hadoop/hive/ql/optimizer/topnkey/TopNKeyProcessor.java 3869ffa2b83 
  ql/src/java/org/apache/hadoop/hive/ql/parse/TezCompiler.java 31735c9ea3d 
  ql/src/java/org/apache/hadoop/hive/ql/plan/TopNKeyDesc.java 19910a341e0 
  ql/src/test/org/apache/hadoop/hive/ql/exec/TestTopNKeyFilter.java 95cd45978a8 


Diff: https://reviews.apache.org/r/71783/diff/1/


Testing
-------

on dwx


Thanks,

Attila Magyar


Re: Review Request 71783: Implement TopNKeyFilter efficiency check

Posted by Krisztian Kasa <kk...@hortonworks.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/71783/#review219672
-----------------------------------------------------------


Ship it!




+1 LGTM

- Krisztian Kasa


On Feb. 27, 2020, 12:57 p.m., Attila Magyar wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/71783/
> -----------------------------------------------------------
> 
> (Updated Feb. 27, 2020, 12:57 p.m.)
> 
> 
> Review request for hive, Gopal V, Jesús Camacho Rodríguez, Krisztian Kasa, and Panos Garefalakis.
> 
> 
> Bugs: HIVE-22925
>     https://issues.apache.org/jira/browse/HIVE-22925
> 
> 
> Repository: hive-git
> 
> 
> Description
> -------
> 
> In certain cases the TopNKey filter might work in an inefficient way and adds extra CPU overhead. For example if the rows are coming in an descending order but the filter wants the top N smallest elements the filter will forward everything.
> 
> Inefficient should be detected in runtime so that the filter can be disabled of the ration between forwarder_rows/total_rows is too high.
> 
> 
> Diffs
> -----
> 
>   common/src/java/org/apache/hadoop/hive/conf/HiveConf.java e419dc5eb3b 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/TopNKeyFilter.java 38d2e08b760 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/TopNKeyOperator.java dd66dfcd72e 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/vector/VectorTopNKeyOperator.java 7feadd3137d 
>   ql/src/java/org/apache/hadoop/hive/ql/optimizer/topnkey/TopNKeyProcessor.java 3869ffa2b83 
>   ql/src/java/org/apache/hadoop/hive/ql/parse/TezCompiler.java 31735c9ea3d 
>   ql/src/java/org/apache/hadoop/hive/ql/plan/TopNKeyDesc.java 19910a341e0 
>   ql/src/test/org/apache/hadoop/hive/ql/exec/TestTopNKeyFilter.java 95cd45978a8 
> 
> 
> Diff: https://reviews.apache.org/r/71783/diff/3/
> 
> 
> Testing
> -------
> 
> on dwx
> 
> 
> Thanks,
> 
> Attila Magyar
> 
>


Re: Review Request 71783: Implement TopNKeyFilter efficiency check

Posted by Attila Magyar <am...@hortonworks.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/71783/
-----------------------------------------------------------

(Updated Feb. 27, 2020, 12:57 p.m.)


Review request for hive, Gopal V, Jesús Camacho Rodríguez, Krisztian Kasa, and Panos Garefalakis.


Bugs: HIVE-22925
    https://issues.apache.org/jira/browse/HIVE-22925


Repository: hive-git


Description
-------

In certain cases the TopNKey filter might work in an inefficient way and adds extra CPU overhead. For example if the rows are coming in an descending order but the filter wants the top N smallest elements the filter will forward everything.

Inefficient should be detected in runtime so that the filter can be disabled of the ration between forwarder_rows/total_rows is too high.


Diffs (updated)
-----

  common/src/java/org/apache/hadoop/hive/conf/HiveConf.java e419dc5eb3b 
  ql/src/java/org/apache/hadoop/hive/ql/exec/TopNKeyFilter.java 38d2e08b760 
  ql/src/java/org/apache/hadoop/hive/ql/exec/TopNKeyOperator.java dd66dfcd72e 
  ql/src/java/org/apache/hadoop/hive/ql/exec/vector/VectorTopNKeyOperator.java 7feadd3137d 
  ql/src/java/org/apache/hadoop/hive/ql/optimizer/topnkey/TopNKeyProcessor.java 3869ffa2b83 
  ql/src/java/org/apache/hadoop/hive/ql/parse/TezCompiler.java 31735c9ea3d 
  ql/src/java/org/apache/hadoop/hive/ql/plan/TopNKeyDesc.java 19910a341e0 
  ql/src/test/org/apache/hadoop/hive/ql/exec/TestTopNKeyFilter.java 95cd45978a8 


Diff: https://reviews.apache.org/r/71783/diff/3/

Changes: https://reviews.apache.org/r/71783/diff/2-3/


Testing
-------

on dwx


Thanks,

Attila Magyar


Re: Review Request 71783: Implement TopNKeyFilter efficiency check

Posted by Jesús Camacho Rodríguez <jc...@hortonworks.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/71783/#review219667
-----------------------------------------------------------



Overall LGTM, left mostly minor comments.

- Jesús Camacho Rodríguez


On Feb. 25, 2020, 3:12 p.m., Attila Magyar wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/71783/
> -----------------------------------------------------------
> 
> (Updated Feb. 25, 2020, 3:12 p.m.)
> 
> 
> Review request for hive, Gopal V, Jesús Camacho Rodríguez, Krisztian Kasa, and Panos Garefalakis.
> 
> 
> Bugs: HIVE-22925
>     https://issues.apache.org/jira/browse/HIVE-22925
> 
> 
> Repository: hive-git
> 
> 
> Description
> -------
> 
> In certain cases the TopNKey filter might work in an inefficient way and adds extra CPU overhead. For example if the rows are coming in an descending order but the filter wants the top N smallest elements the filter will forward everything.
> 
> Inefficient should be detected in runtime so that the filter can be disabled of the ration between forwarder_rows/total_rows is too high.
> 
> 
> Diffs
> -----
> 
>   common/src/java/org/apache/hadoop/hive/conf/HiveConf.java e419dc5eb3b 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/TopNKeyFilter.java 38d2e08b760 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/TopNKeyOperator.java dd66dfcd72e 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/vector/VectorTopNKeyOperator.java 7feadd3137d 
>   ql/src/java/org/apache/hadoop/hive/ql/optimizer/topnkey/TopNKeyProcessor.java 3869ffa2b83 
>   ql/src/java/org/apache/hadoop/hive/ql/parse/TezCompiler.java 31735c9ea3d 
>   ql/src/java/org/apache/hadoop/hive/ql/plan/TopNKeyDesc.java 19910a341e0 
>   ql/src/test/org/apache/hadoop/hive/ql/exec/TestTopNKeyFilter.java 95cd45978a8 
>   ql/src/test/results/clientpositive/llap/topnkey.q.out 345d481f18c 
> 
> 
> Diff: https://reviews.apache.org/r/71783/diff/2/
> 
> 
> Testing
> -------
> 
> on dwx
> 
> 
> Thanks,
> 
> Attila Magyar
> 
>


Re: Review Request 71783: Implement TopNKeyFilter efficiency check

Posted by Jesús Camacho Rodríguez <jc...@hortonworks.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/71783/#review219661
-----------------------------------------------------------




ql/src/java/org/apache/hadoop/hive/ql/exec/TopNKeyFilter.java
Lines 76 (patched)
<https://reviews.apache.org/r/71783/#comment307871>

    This will include the hashcode of the object... Is this intended?



ql/src/java/org/apache/hadoop/hive/ql/exec/TopNKeyFilter.java
Lines 87 (patched)
<https://reviews.apache.org/r/71783/#comment307870>

    * Ratio between the forwarded rows and the total incoming rows.
       * The higher the number is, the less is the efficiency of the filter.
       * 1 means all rows should be forwarded.



ql/src/java/org/apache/hadoop/hive/ql/exec/TopNKeyOperator.java
Lines 141 (patched)
<https://reviews.apache.org/r/71783/#comment307873>

    _runTimeNumRows % conf.getCheckEfficiencyNumRows() == 0_
    
    Could you leave a short comment here explaining when we are doing the efficiency check?



ql/src/java/org/apache/hadoop/hive/ql/exec/TopNKeyOperator.java
Lines 148 (patched)
<https://reviews.apache.org/r/71783/#comment307875>

    Is it worth to have this if/else branch?



ql/src/java/org/apache/hadoop/hive/ql/exec/vector/VectorTopNKeyOperator.java
Lines 181 (patched)
<https://reviews.apache.org/r/71783/#comment307876>

    nit. Maybe method should not be static so we do not have pass the Logger instance around.



ql/src/java/org/apache/hadoop/hive/ql/plan/TopNKeyDesc.java
Lines 96 (patched)
<https://reviews.apache.org/r/71783/#comment307872>

    Probably we do not need these at USER level (doubting we even need them at DEFAULT level).


- Jesús Camacho Rodríguez


On Feb. 25, 2020, 3:12 p.m., Attila Magyar wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/71783/
> -----------------------------------------------------------
> 
> (Updated Feb. 25, 2020, 3:12 p.m.)
> 
> 
> Review request for hive, Gopal V, Jesús Camacho Rodríguez, Krisztian Kasa, and Panos Garefalakis.
> 
> 
> Bugs: HIVE-22925
>     https://issues.apache.org/jira/browse/HIVE-22925
> 
> 
> Repository: hive-git
> 
> 
> Description
> -------
> 
> In certain cases the TopNKey filter might work in an inefficient way and adds extra CPU overhead. For example if the rows are coming in an descending order but the filter wants the top N smallest elements the filter will forward everything.
> 
> Inefficient should be detected in runtime so that the filter can be disabled of the ration between forwarder_rows/total_rows is too high.
> 
> 
> Diffs
> -----
> 
>   common/src/java/org/apache/hadoop/hive/conf/HiveConf.java e419dc5eb3b 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/TopNKeyFilter.java 38d2e08b760 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/TopNKeyOperator.java dd66dfcd72e 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/vector/VectorTopNKeyOperator.java 7feadd3137d 
>   ql/src/java/org/apache/hadoop/hive/ql/optimizer/topnkey/TopNKeyProcessor.java 3869ffa2b83 
>   ql/src/java/org/apache/hadoop/hive/ql/parse/TezCompiler.java 31735c9ea3d 
>   ql/src/java/org/apache/hadoop/hive/ql/plan/TopNKeyDesc.java 19910a341e0 
>   ql/src/test/org/apache/hadoop/hive/ql/exec/TestTopNKeyFilter.java 95cd45978a8 
>   ql/src/test/results/clientpositive/llap/topnkey.q.out 345d481f18c 
> 
> 
> Diff: https://reviews.apache.org/r/71783/diff/2/
> 
> 
> Testing
> -------
> 
> on dwx
> 
> 
> Thanks,
> 
> Attila Magyar
> 
>