Posted to dev@hive.apache.org by Vineet Garg <vg...@hortonworks.com> on 2018/08/12 22:08:43 UTC

Review Request 68313: HIVE-20366 TPC-DS query78 stats estimates are off for is null filter

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/68313/
-----------------------------------------------------------

Review request for hive and Ashutosh Chauhan.


Bugs: HIVE-20366
    https://issues.apache.org/jira/browse/HIVE-20366


Repository: hive-git


Description
-------

Heuristic to estimate unmatched rows for outer joins


Diffs
-----

  ql/src/java/org/apache/hadoop/hive/ql/optimizer/stats/annotation/StatsRulesProcFactory.java 7682791f4d 
  ql/src/test/results/clientpositive/annotate_stats_join.q.out b0d2b05ab0 
  ql/src/test/results/clientpositive/llap/auto_join30.q.out 874511a112 
  ql/src/test/results/clientpositive/llap/bucket_map_join_tez2.q.out 205cd444b2 
  ql/src/test/results/clientpositive/llap/check_constraint.q.out ec1ed64fe8 
  ql/src/test/results/clientpositive/llap/correlationoptimizer1.q.out 21b07b2c80 
  ql/src/test/results/clientpositive/llap/explainuser_1.q.out a98191653f 
  ql/src/test/results/clientpositive/llap/insert_into_default_keyword.q.out 1a61c0e592 
  ql/src/test/results/clientpositive/llap/join46.q.out b6ef9b184e 
  ql/src/test/results/clientpositive/llap/join_emit_interval.q.out 9484b7ae0a 
  ql/src/test/results/clientpositive/llap/limit_join_transpose.q.out f8ce1ce93e 
  ql/src/test/results/clientpositive/llap/mapjoin3.q.out 7aa7318896 
  ql/src/test/results/clientpositive/llap/mapjoin46.q.out 204e7755e5 
  ql/src/test/results/clientpositive/llap/mapjoin_emit_interval.q.out f6a1a6ee41 
  ql/src/test/results/clientpositive/llap/skewjoinopt15.q.out cd20c3ab17 
  ql/src/test/results/clientpositive/llap/subquery_in.q.out a045b12dc6 
  ql/src/test/results/clientpositive/llap/subquery_multi.q.out a865ee9259 
  ql/src/test/results/clientpositive/llap/subquery_notin.q.out f5f5f36aa3 
  ql/src/test/results/clientpositive/llap/subquery_scalar.q.out 4423aec8a2 
  ql/src/test/results/clientpositive/llap/subquery_select.q.out cf3d60f4b3 
  ql/src/test/results/clientpositive/llap/tez_join_tests.q.out bf2f5a8548 
  ql/src/test/results/clientpositive/llap/tez_joins_explain.q.out 72b84a0106 
  ql/src/test/results/clientpositive/llap/tez_union.q.out 914ed47859 
  ql/src/test/results/clientpositive/llap/unionDistinct_1.q.out f006e37b56 
  ql/src/test/results/clientpositive/llap/vector_coalesce_3.q.out d05dd70206 
  ql/src/test/results/clientpositive/llap/vector_groupby_mapjoin.q.out 6443678f89 
  ql/src/test/results/clientpositive/llap/vector_outer_join0.q.out 19e98f3f4a 
  ql/src/test/results/clientpositive/llap/vector_outer_join1.q.out c74a588993 
  ql/src/test/results/clientpositive/llap/vectorized_join46.q.out e03948f8b0 
  ql/src/test/results/clientpositive/spark/annotate_stats_join.q.out 7d45328d41 
  ql/src/test/results/clientpositive/spark/spark_explainuser_1.q.out 00f5d7ef11 


Diff: https://reviews.apache.org/r/68313/diff/1/


Testing
-------


Thanks,

Vineet Garg


Re: Review Request 68313: HIVE-20366 TPC-DS query78 stats estimates are off for is null filter

Posted by Ashutosh Chauhan <ha...@apache.org>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/68313/#review207573
-----------------------------------------------------------




ql/src/java/org/apache/hadoop/hive/ql/optimizer/stats/annotation/StatsRulesProcFactory.java
Lines 1700 (patched)
<https://reviews.apache.org/r/68313/#comment290972>

    Let's s/dangling/unmatched/gc everywhere.



ql/src/java/org/apache/hadoop/hive/ql/optimizer/stats/annotation/StatsRulesProcFactory.java
Lines 1703 (patched)
<https://reviews.apache.org/r/68313/#comment290973>

    e.g., here



ql/src/java/org/apache/hadoop/hive/ql/optimizer/stats/annotation/StatsRulesProcFactory.java
Line 2162 (original), 2178 (patched)
<https://reviews.apache.org/r/68313/#comment290975>

    Rename interimNumRows to unmatchedRows?



ql/src/test/results/clientpositive/annotate_stats_join.q.out
Line 901 (original), 901 (patched)
<https://reviews.apache.org/r/68313/#comment290976>

    This looks incorrect both before and after. There are a few columns being output, with 4 of them being join keys. 194/54 is less than 4 bytes per row. Can you check how many nulls we predicted here and whether that's correct?
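
The arithmetic behind this check can be sketched as follows. This is only an illustration and not code from the patch: the class name `RowSizeCheck`, both method names, and the assumed minimum of 4 bytes per join key (for example, int keys) are hypothetical. The figures 194 (data size) and 54 (row count) are the ones quoted in the comment above.

```java
public class RowSizeCheck {

    // Average bytes per row implied by an operator's estimated data size and row count.
    static double avgBytesPerRow(long dataSize, long numRows) {
        if (numRows <= 0) {
            throw new IllegalArgumentException("numRows must be positive");
        }
        return (double) dataSize / numRows;
    }

    // Hypothetical plausibility test: flags an estimate whose average row is
    // smaller than the join keys alone would need if every key were non-null.
    // minBytesPerKey is an assumption, not a value taken from Hive.
    static boolean looksUnderestimated(long dataSize, long numRows,
                                       int keyColumns, int minBytesPerKey) {
        double minRowWidth = (double) keyColumns * minBytesPerKey;
        return avgBytesPerRow(dataSize, numRows) < minRowWidth;
    }

    public static void main(String[] args) {
        // Data size 194 over 54 rows, with 4 join keys assumed to be 4 bytes each.
        System.out.println("avg bytes/row = " + avgBytesPerRow(194, 54));
        System.out.println("underestimated = " + looksUnderestimated(194, 54, 4, 4));
    }
}
```

Because outer joins emit nulls for unmatched rows, a low average does not prove the estimate is wrong, which is why the comment also asks how many nulls were predicted.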



ql/src/test/results/clientpositive/llap/auto_join30.q.out
Line 172 (original), 172 (patched)
<https://reviews.apache.org/r/68313/#comment290977>

    Data size is way underestimated. Can you verify?



ql/src/test/results/clientpositive/llap/subquery_notin.q.out
Line 5354 (original), 5354 (patched)
<https://reviews.apache.org/r/68313/#comment290978>

    Data size is too low. Can you verify?


- Ashutosh Chauhan


Re: Review Request 68313: HIVE-20366 TPC-DS query78 stats estimates are off for is null filter

Posted by Vineet Garg <vg...@hortonworks.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/68313/
-----------------------------------------------------------

(Updated Aug. 19, 2018, 1:01 a.m.)


Review request for hive and Ashutosh Chauhan.


Bugs: HIVE-20366
    https://issues.apache.org/jira/browse/HIVE-20366


Repository: hive-git


Description
-------

Heuristic to estimate unmatched rows for outer joins


Diffs (updated)
-----

  ql/src/java/org/apache/hadoop/hive/ql/optimizer/stats/annotation/StatsRulesProcFactory.java 7682791f4d 
  ql/src/test/results/clientpositive/annotate_stats_join.q.out b0d2b05ab0 
  ql/src/test/results/clientpositive/llap/bucket_map_join_tez2.q.out 205cd444b2 
  ql/src/test/results/clientpositive/llap/check_constraint.q.out ec1ed64fe8 
  ql/src/test/results/clientpositive/llap/correlationoptimizer1.q.out 21b07b2c80 
  ql/src/test/results/clientpositive/llap/explainuser_1.q.out a98191653f 
  ql/src/test/results/clientpositive/llap/insert_into_default_keyword.q.out 1a61c0e592 
  ql/src/test/results/clientpositive/llap/join46.q.out b6ef9b184e 
  ql/src/test/results/clientpositive/llap/join_emit_interval.q.out 9484b7ae0a 
  ql/src/test/results/clientpositive/llap/mapjoin46.q.out 204e7755e5 
  ql/src/test/results/clientpositive/llap/mapjoin_emit_interval.q.out f6a1a6ee41 
  ql/src/test/results/clientpositive/llap/subquery_in.q.out cb2aa4c08c 
  ql/src/test/results/clientpositive/llap/subquery_multi.q.out a865ee9259 
  ql/src/test/results/clientpositive/llap/subquery_notin.q.out 70501f9cca 
  ql/src/test/results/clientpositive/llap/subquery_scalar.q.out 82871f40a7 
  ql/src/test/results/clientpositive/llap/subquery_select.q.out 866ae28486 
  ql/src/test/results/clientpositive/llap/tez_join_tests.q.out bf2f5a8548 
  ql/src/test/results/clientpositive/llap/tez_joins_explain.q.out 72b84a0106 
  ql/src/test/results/clientpositive/llap/unionDistinct_1.q.out f006e37b56 
  ql/src/test/results/clientpositive/llap/vector_coalesce_3.q.out d05dd70206 
  ql/src/test/results/clientpositive/llap/vector_groupby_mapjoin.q.out 6443678f89 
  ql/src/test/results/clientpositive/llap/vector_outer_join0.q.out 19e98f3f4a 
  ql/src/test/results/clientpositive/llap/vectorized_join46.q.out e03948f8b0 
  ql/src/test/results/clientpositive/spark/annotate_stats_join.q.out 7d45328d41 
  ql/src/test/results/clientpositive/spark/spark_explainuser_1.q.out 00f5d7ef11 


Diff: https://reviews.apache.org/r/68313/diff/2/

Changes: https://reviews.apache.org/r/68313/diff/1-2/


Testing
-------


Thanks,

Vineet Garg