Posted to dev@hive.apache.org by Vineet Garg <vg...@hortonworks.com> on 2018/08/12 22:08:43 UTC
Review Request 68313: HIVE-20366 TPC-DS query78 stats estimates are off for is null filter
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/68313/
-----------------------------------------------------------
Review request for hive and Ashutosh Chauhan.
Bugs: HIVE-20366
https://issues.apache.org/jira/browse/HIVE-20366
Repository: hive-git
Description
-------
Heuristic to estimate unmatched rows for outer joins
Diffs
-----
ql/src/java/org/apache/hadoop/hive/ql/optimizer/stats/annotation/StatsRulesProcFactory.java 7682791f4d
ql/src/test/results/clientpositive/annotate_stats_join.q.out b0d2b05ab0
ql/src/test/results/clientpositive/llap/auto_join30.q.out 874511a112
ql/src/test/results/clientpositive/llap/bucket_map_join_tez2.q.out 205cd444b2
ql/src/test/results/clientpositive/llap/check_constraint.q.out ec1ed64fe8
ql/src/test/results/clientpositive/llap/correlationoptimizer1.q.out 21b07b2c80
ql/src/test/results/clientpositive/llap/explainuser_1.q.out a98191653f
ql/src/test/results/clientpositive/llap/insert_into_default_keyword.q.out 1a61c0e592
ql/src/test/results/clientpositive/llap/join46.q.out b6ef9b184e
ql/src/test/results/clientpositive/llap/join_emit_interval.q.out 9484b7ae0a
ql/src/test/results/clientpositive/llap/limit_join_transpose.q.out f8ce1ce93e
ql/src/test/results/clientpositive/llap/mapjoin3.q.out 7aa7318896
ql/src/test/results/clientpositive/llap/mapjoin46.q.out 204e7755e5
ql/src/test/results/clientpositive/llap/mapjoin_emit_interval.q.out f6a1a6ee41
ql/src/test/results/clientpositive/llap/skewjoinopt15.q.out cd20c3ab17
ql/src/test/results/clientpositive/llap/subquery_in.q.out a045b12dc6
ql/src/test/results/clientpositive/llap/subquery_multi.q.out a865ee9259
ql/src/test/results/clientpositive/llap/subquery_notin.q.out f5f5f36aa3
ql/src/test/results/clientpositive/llap/subquery_scalar.q.out 4423aec8a2
ql/src/test/results/clientpositive/llap/subquery_select.q.out cf3d60f4b3
ql/src/test/results/clientpositive/llap/tez_join_tests.q.out bf2f5a8548
ql/src/test/results/clientpositive/llap/tez_joins_explain.q.out 72b84a0106
ql/src/test/results/clientpositive/llap/tez_union.q.out 914ed47859
ql/src/test/results/clientpositive/llap/unionDistinct_1.q.out f006e37b56
ql/src/test/results/clientpositive/llap/vector_coalesce_3.q.out d05dd70206
ql/src/test/results/clientpositive/llap/vector_groupby_mapjoin.q.out 6443678f89
ql/src/test/results/clientpositive/llap/vector_outer_join0.q.out 19e98f3f4a
ql/src/test/results/clientpositive/llap/vector_outer_join1.q.out c74a588993
ql/src/test/results/clientpositive/llap/vectorized_join46.q.out e03948f8b0
ql/src/test/results/clientpositive/spark/annotate_stats_join.q.out 7d45328d41
ql/src/test/results/clientpositive/spark/spark_explainuser_1.q.out 00f5d7ef11
Diff: https://reviews.apache.org/r/68313/diff/1/
Testing
-------
Thanks,
Vineet Garg
Re: Review Request 68313: HIVE-20366 TPC-DS query78 stats estimates are off for is null filter
Posted by Ashutosh Chauhan <ha...@apache.org>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/68313/#review207573
-----------------------------------------------------------
ql/src/java/org/apache/hadoop/hive/ql/optimizer/stats/annotation/StatsRulesProcFactory.java
Lines 1700 (patched)
<https://reviews.apache.org/r/68313/#comment290972>
Let's s/dangling/unmatched/gc everywhere.
ql/src/java/org/apache/hadoop/hive/ql/optimizer/stats/annotation/StatsRulesProcFactory.java
Lines 1703 (patched)
<https://reviews.apache.org/r/68313/#comment290973>
e.g., here
ql/src/java/org/apache/hadoop/hive/ql/optimizer/stats/annotation/StatsRulesProcFactory.java
Line 2162 (original), 2178 (patched)
<https://reviews.apache.org/r/68313/#comment290975>
Rename interimNumRows to unmatchedRows?
ql/src/test/results/clientpositive/annotate_stats_join.q.out
Line 901 (original), 901 (patched)
<https://reviews.apache.org/r/68313/#comment290976>
This looks incorrect both before and after. Only a few columns are being output, and 4 of them are join keys. 194/54 is less than 4 bytes per row. Can you check how many nulls we predicted here and whether that's correct?
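The arithmetic behind this check can be sketched as follows. This is only an illustration, not code from the patch: the class and method names are hypothetical, and the 16-byte lower bound assumes the four join keys are non-null 4-byte ints, which the q.out file would need to confirm.

```java
public final class RowSizeCheck {
    private RowSizeCheck() {}

    /** Average bytes per row implied by an estimated data size and row count. */
    static double avgBytesPerRow(long dataSize, long numRows) {
        if (numRows <= 0) {
            throw new IllegalArgumentException("numRows must be positive");
        }
        return (double) dataSize / numRows;
    }

    /** True when the implied row width is below the assumed minimum row width. */
    static boolean looksUnderestimated(long dataSize, long numRows, long minBytesPerRow) {
        return avgBytesPerRow(dataSize, numRows) < minBytesPerRow;
    }

    public static void main(String[] args) {
        // Figures from the comment above: data size 194 over 54 rows.
        System.out.println(avgBytesPerRow(194, 54) + " bytes/row");
        // Assumption (hypothetical): four non-null 4-byte int join keys,
        // so a row should be at least 16 bytes wide.
        System.out.println(looksUnderestimated(194, 54, 16));
    }
}
```

A large predicted null fraction on the join keys would shrink the expected row width, which is why the null count is the first thing to verify.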
ql/src/test/results/clientpositive/llap/auto_join30.q.out
Line 172 (original), 172 (patched)
<https://reviews.apache.org/r/68313/#comment290977>
Data size is way underestimated. Can you verify?
ql/src/test/results/clientpositive/llap/subquery_notin.q.out
Line 5354 (original), 5354 (patched)
<https://reviews.apache.org/r/68313/#comment290978>
Data size is too low. Can you verify?
- Ashutosh Chauhan
Re: Review Request 68313: HIVE-20366 TPC-DS query78 stats estimates are off for is null filter
Posted by Vineet Garg <vg...@hortonworks.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/68313/
-----------------------------------------------------------
(Updated Aug. 19, 2018, 1:01 a.m.)
Review request for hive and Ashutosh Chauhan.
Bugs: HIVE-20366
https://issues.apache.org/jira/browse/HIVE-20366
Repository: hive-git
Description
-------
Heuristic to estimate unmatched rows for outer joins
Diffs (updated)
-----
ql/src/java/org/apache/hadoop/hive/ql/optimizer/stats/annotation/StatsRulesProcFactory.java 7682791f4d
ql/src/test/results/clientpositive/annotate_stats_join.q.out b0d2b05ab0
ql/src/test/results/clientpositive/llap/bucket_map_join_tez2.q.out 205cd444b2
ql/src/test/results/clientpositive/llap/check_constraint.q.out ec1ed64fe8
ql/src/test/results/clientpositive/llap/correlationoptimizer1.q.out 21b07b2c80
ql/src/test/results/clientpositive/llap/explainuser_1.q.out a98191653f
ql/src/test/results/clientpositive/llap/insert_into_default_keyword.q.out 1a61c0e592
ql/src/test/results/clientpositive/llap/join46.q.out b6ef9b184e
ql/src/test/results/clientpositive/llap/join_emit_interval.q.out 9484b7ae0a
ql/src/test/results/clientpositive/llap/mapjoin46.q.out 204e7755e5
ql/src/test/results/clientpositive/llap/mapjoin_emit_interval.q.out f6a1a6ee41
ql/src/test/results/clientpositive/llap/subquery_in.q.out cb2aa4c08c
ql/src/test/results/clientpositive/llap/subquery_multi.q.out a865ee9259
ql/src/test/results/clientpositive/llap/subquery_notin.q.out 70501f9cca
ql/src/test/results/clientpositive/llap/subquery_scalar.q.out 82871f40a7
ql/src/test/results/clientpositive/llap/subquery_select.q.out 866ae28486
ql/src/test/results/clientpositive/llap/tez_join_tests.q.out bf2f5a8548
ql/src/test/results/clientpositive/llap/tez_joins_explain.q.out 72b84a0106
ql/src/test/results/clientpositive/llap/unionDistinct_1.q.out f006e37b56
ql/src/test/results/clientpositive/llap/vector_coalesce_3.q.out d05dd70206
ql/src/test/results/clientpositive/llap/vector_groupby_mapjoin.q.out 6443678f89
ql/src/test/results/clientpositive/llap/vector_outer_join0.q.out 19e98f3f4a
ql/src/test/results/clientpositive/llap/vectorized_join46.q.out e03948f8b0
ql/src/test/results/clientpositive/spark/annotate_stats_join.q.out 7d45328d41
ql/src/test/results/clientpositive/spark/spark_explainuser_1.q.out 00f5d7ef11
Diff: https://reviews.apache.org/r/68313/diff/2/
Changes: https://reviews.apache.org/r/68313/diff/1-2/
Testing
-------
Thanks,
Vineet Garg