Posted to dev@hive.apache.org by Vineet Garg <vg...@hortonworks.com> on 2017/07/11 00:50:50 UTC
Review Request 60757: HIVE-17066: Better estimation for number of nulls for outer join
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/60757/
-----------------------------------------------------------
Review request for hive and Ashutosh Chauhan.
Bugs: HIVE-17066
https://issues.apache.org/jira/browse/HIVE-17066
Repository: hive-git
Description
-------
This patch improves the estimate of the number of nulls for columns coming out of an outer join.
Diffs
-----
ql/src/java/org/apache/hadoop/hive/ql/optimizer/stats/annotation/StatsRulesProcFactory.java ae32a28c66
ql/src/test/results/clientpositive/annotate_stats_join.q.out 48ba40ef41
ql/src/test/results/clientpositive/cbo_rp_join0.q.out ba96de2fe1
ql/src/test/results/clientpositive/llap/correlationoptimizer1.q.out b970dd6716
ql/src/test/results/clientpositive/llap/correlationoptimizer2.q.out 6067c77f85
ql/src/test/results/clientpositive/llap/explainuser_1.q.out bb42e45c2d
ql/src/test/results/clientpositive/llap/subquery_in.q.out e401f31e52
ql/src/test/results/clientpositive/llap/subquery_multi.q.out a876c620e3
ql/src/test/results/clientpositive/llap/subquery_notin.q.out 018ef1db54
ql/src/test/results/clientpositive/llap/subquery_scalar.q.out 3a0d1464c5
ql/src/test/results/clientpositive/llap/subquery_select.q.out 703d19de05
ql/src/test/results/clientpositive/llap/tez_dynpart_hashjoin_2.q.out f434a1e00b
ql/src/test/results/clientpositive/llap/tez_join_tests.q.out b0eff1e1f4
ql/src/test/results/clientpositive/llap/tez_joins_explain.q.out 418c23c16d
ql/src/test/results/clientpositive/llap/tez_smb_empty.q.out cd392a7b2b
ql/src/test/results/clientpositive/llap/tez_vector_dynpart_hashjoin_2.q.out 6eba119ca0
ql/src/test/results/clientpositive/llap/vector_left_outer_join.q.out 82111bb27f
ql/src/test/results/clientpositive/llap/vector_outer_join1.q.out f64e7393d9
ql/src/test/results/clientpositive/llap/vector_outer_join2.q.out c24a2d0e6e
ql/src/test/results/clientpositive/spark/annotate_stats_join.q.out d09bc52155
Diff: https://reviews.apache.org/r/60757/diff/1/
Testing
-------
Updated existing tests
Thanks,
Vineet Garg
Re: Review Request 60757: HIVE-17066: Better estimation for number of nulls for outer join
Posted by Ashutosh Chauhan <ha...@apache.org>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/60757/#review180200
-----------------------------------------------------------
ql/src/java/org/apache/hadoop/hive/ql/optimizer/stats/annotation/StatsRulesProcFactory.java
Line 1921 (patched)
<https://reviews.apache.org/r/60757/#comment255242>
Better function name: updateNumNulls
ql/src/java/org/apache/hadoop/hive/ql/optimizer/stats/annotation/StatsRulesProcFactory.java
Line 1925 (patched)
<https://reviews.apache.org/r/60757/#comment255243>
Better comment: TODO: handle multi joins
ql/src/java/org/apache/hadoop/hive/ql/optimizer/stats/annotation/StatsRulesProcFactory.java
Line 1933 (patched)
<https://reviews.apache.org/r/60757/#comment255250>
We should update the else case as well; otherwise, I believe it will be set to 0. We can use:
Math.min(newNumRows, oldNumNulls);
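The suggested fallback can be sketched as follows. This is only an illustration under assumptions, not the code from the patch: the class and method names are hypothetical, and only `newNumRows`, `oldNumNulls`, and the `Math.min` expression come from the comment above. The idea is that a column's null count carried over from the input is capped at the join's estimated output row count instead of being left at 0.

```java
final class NumNullsClamp {
    // Hypothetical helper (not from the patch): carry the old null count
    // forward, but never report more nulls than estimated output rows.
    static long clampNumNulls(long newNumRows, long oldNumNulls) {
        return Math.min(newNumRows, oldNumNulls);
    }

    public static void main(String[] args) {
        // Old estimate exceeds the new row count, so it is capped.
        System.out.println(clampNumNulls(100L, 250L)); // prints 100
        // Old estimate already fits, so it is kept unchanged.
        System.out.println(clampNumNulls(100L, 40L));  // prints 40
    }
}
```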
ql/src/java/org/apache/hadoop/hive/ql/optimizer/stats/annotation/StatsRulesProcFactory.java
Lines 1936-1937 (patched)
<https://reviews.apache.org/r/60757/#comment255246>
Add comment: interimNumRows represents the number of matches for join keys on the two sides.
newNumRows - interimNumRows represents the number of non-matches.
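To make the two quantities concrete, here is a minimal sketch. It assumes (this is not confirmed by the thread) that each non-matching row contributes one null to every column of the null-supplying side, on top of the nulls the column already had, and that the total is capped at the output row count. The class and method names are hypothetical; `newNumRows`, `interimNumRows`, and `oldNumNulls` follow the names used in the review comments.

```java
final class OuterJoinNullEstimate {
    // interimNumRows: estimated rows whose join keys match on both sides.
    // newNumRows: estimated rows produced by the outer join.
    // The difference is the estimated number of non-matching rows.
    static long nonMatchingRows(long newNumRows, long interimNumRows) {
        return Math.max(0L, newNumRows - interimNumRows);
    }

    // Assumption for illustration: nulls on the null-supplying side are the
    // pre-existing nulls plus one per non-matching row, capped at newNumRows.
    static long estimateNumNulls(long newNumRows, long interimNumRows, long oldNumNulls) {
        long nonMatches = nonMatchingRows(newNumRows, interimNumRows);
        return Math.min(newNumRows, oldNumNulls + nonMatches);
    }

    public static void main(String[] args) {
        // 1000 output rows, 700 matches, 50 pre-existing nulls.
        System.out.println(nonMatchingRows(1000L, 700L));       // prints 300
        System.out.println(estimateNumNulls(1000L, 700L, 50L)); // prints 350
    }
}
```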
ql/src/java/org/apache/hadoop/hive/ql/optimizer/stats/annotation/StatsRulesProcFactory.java
Line 1980 (patched)
<https://reviews.apache.org/r/60757/#comment255249>
I think we should implement this. If it's a join key, numNulls = 0. Otherwise it's
Math.min(newNumRows, oldNumNulls);
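A minimal sketch of that rule, assuming it applies to a join whose equality condition discards rows with a null key (so a join-key column cannot be null in the output). The `isJoinKey` flag and the class and method names are hypothetical and only serve the illustration; the two branches are the ones stated in the comment.

```java
final class JoinKeyNulls {
    // Hypothetical helper: a join-key column is assumed to have no nulls in
    // the output; any other column keeps its old null count, capped at the
    // estimated number of output rows.
    static long numNullsAfterJoin(boolean isJoinKey, long newNumRows, long oldNumNulls) {
        if (isJoinKey) {
            return 0L;
        }
        return Math.min(newNumRows, oldNumNulls);
    }

    public static void main(String[] args) {
        System.out.println(numNullsAfterJoin(true, 100L, 30L));   // prints 0
        System.out.println(numNullsAfterJoin(false, 100L, 30L));  // prints 30
        System.out.println(numNullsAfterJoin(false, 100L, 400L)); // prints 100
    }
}
```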
ql/src/java/org/apache/hadoop/hive/ql/optimizer/stats/annotation/StatsRulesProcFactory.java
Line 2078 (patched)
<https://reviews.apache.org/r/60757/#comment255251>
Better name: computeRowCountAssumingInnerJoin()
- Ashutosh Chauhan
Re: Review Request 60757: HIVE-17066: Better estimation for number of nulls for outer join
Posted by Vineet Garg <vg...@hortonworks.com>.
> On July 11, 2017, 10:16 p.m., Ashutosh Chauhan wrote:
> > ql/src/java/org/apache/hadoop/hive/ql/optimizer/stats/annotation/StatsRulesProcFactory.java
> > Line 1965 (original)
> > <https://reviews.apache.org/r/60757/diff/1-2/?file=1773876#file1773876line1971>
> >
> > Did you intend to remove break; from here?
No, this was unintentional! Thanks for catching this. Updating the patch.
- Vineet
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/60757/#review180251
-----------------------------------------------------------
On July 11, 2017, 10:07 p.m., Vineet Garg wrote:
>
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/60757/
> -----------------------------------------------------------
>
> (Updated July 11, 2017, 10:07 p.m.)
>
>
> Review request for hive and Ashutosh Chauhan.
>
>
> Bugs: HIVE-17066
> https://issues.apache.org/jira/browse/HIVE-17066
>
>
> Repository: hive-git
>
>
> Description
> -------
>
> This patch improves the estimate of the number of nulls for columns coming out of an outer join.
>
>
> Diffs
> -----
>
> ql/src/java/org/apache/hadoop/hive/ql/optimizer/stats/annotation/StatsRulesProcFactory.java ae32a28c66
> ql/src/test/results/clientpositive/annotate_stats_join.q.out 48ba40ef41
> ql/src/test/results/clientpositive/annotate_stats_join_pkfk.q.out b4d46d28e9
> ql/src/test/results/clientpositive/cbo_rp_join0.q.out ba96de2fe1
> ql/src/test/results/clientpositive/llap/correlationoptimizer1.q.out b970dd6716
> ql/src/test/results/clientpositive/llap/correlationoptimizer2.q.out 6067c77f85
> ql/src/test/results/clientpositive/llap/dynamic_semijoin_reduction.q.out 88c4a17bad
> ql/src/test/results/clientpositive/llap/explainuser_1.q.out bb42e45c2d
> ql/src/test/results/clientpositive/llap/explainuser_4.q.out 99db828d6a
> ql/src/test/results/clientpositive/llap/subquery_in.q.out e401f31e52
> ql/src/test/results/clientpositive/llap/subquery_multi.q.out a876c620e3
> ql/src/test/results/clientpositive/llap/subquery_notin.q.out 018ef1db54
> ql/src/test/results/clientpositive/llap/subquery_scalar.q.out 3a0d1464c5
> ql/src/test/results/clientpositive/llap/subquery_select.q.out 703d19de05
> ql/src/test/results/clientpositive/llap/tez_dynpart_hashjoin_1.q.out 6dd3fbf6ca
> ql/src/test/results/clientpositive/llap/tez_dynpart_hashjoin_2.q.out f434a1e00b
> ql/src/test/results/clientpositive/llap/tez_join_tests.q.out b0eff1e1f4
> ql/src/test/results/clientpositive/llap/tez_joins_explain.q.out 418c23c16d
> ql/src/test/results/clientpositive/llap/tez_smb_empty.q.out cd392a7b2b
> ql/src/test/results/clientpositive/llap/tez_vector_dynpart_hashjoin_1.q.out 3b47383803
> ql/src/test/results/clientpositive/llap/tez_vector_dynpart_hashjoin_2.q.out 6eba119ca0
> ql/src/test/results/clientpositive/llap/vector_left_outer_join.q.out 82111bb27f
> ql/src/test/results/clientpositive/llap/vector_outer_join1.q.out f64e7393d9
> ql/src/test/results/clientpositive/llap/vector_outer_join2.q.out c24a2d0e6e
> ql/src/test/results/clientpositive/llap/vectorized_mapjoin.q.out e56800ad40
> ql/src/test/results/clientpositive/llap/vectorized_nested_mapjoin.q.out ed28530eec
> ql/src/test/results/clientpositive/llap/vectorized_shufflejoin.q.out a750d9fd01
> ql/src/test/results/clientpositive/llap/windowing_gby.q.out 945f8e0caf
> ql/src/test/results/clientpositive/perf/query14.q.out 42bad8da14
> ql/src/test/results/clientpositive/perf/query23.q.out ebd2271108
> ql/src/test/results/clientpositive/spark/annotate_stats_join.q.out d09bc52155
> ql/src/test/results/clientpositive/tez/explainanalyze_4.q.out 14535f63da
>
>
> Diff: https://reviews.apache.org/r/60757/diff/2/
>
>
> Testing
> -------
>
> Updated existing tests
>
>
> Thanks,
>
> Vineet Garg
>
>
Re: Review Request 60757: HIVE-17066: Better estimation for number of nulls for outer join
Posted by Ashutosh Chauhan <ha...@apache.org>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/60757/#review180251
-----------------------------------------------------------
ql/src/java/org/apache/hadoop/hive/ql/optimizer/stats/annotation/StatsRulesProcFactory.java
Line 1965 (original)
<https://reviews.apache.org/r/60757/#comment255329>
Did you intend to remove break; from here?
- Ashutosh Chauhan
Re: Review Request 60757: HIVE-17066: Better estimation for number of nulls for outer join
Posted by Vineet Garg <vg...@hortonworks.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/60757/
-----------------------------------------------------------
(Updated July 11, 2017, 10:33 p.m.)
Review request for hive and Ashutosh Chauhan.
Bugs: HIVE-17066
https://issues.apache.org/jira/browse/HIVE-17066
Repository: hive-git
Description
-------
This patch improves the estimate of the number of nulls for columns coming out of an outer join.
Diffs (updated)
-----
ql/src/java/org/apache/hadoop/hive/ql/optimizer/stats/annotation/StatsRulesProcFactory.java ae32a28c66
ql/src/test/results/clientpositive/annotate_stats_join.q.out 48ba40ef41
ql/src/test/results/clientpositive/annotate_stats_join_pkfk.q.out b4d46d28e9
ql/src/test/results/clientpositive/cbo_rp_join0.q.out ba96de2fe1
ql/src/test/results/clientpositive/llap/correlationoptimizer1.q.out b970dd6716
ql/src/test/results/clientpositive/llap/correlationoptimizer2.q.out 6067c77f85
ql/src/test/results/clientpositive/llap/dynamic_semijoin_reduction.q.out 88c4a17bad
ql/src/test/results/clientpositive/llap/explainuser_1.q.out bb42e45c2d
ql/src/test/results/clientpositive/llap/explainuser_4.q.out 99db828d6a
ql/src/test/results/clientpositive/llap/subquery_in.q.out e401f31e52
ql/src/test/results/clientpositive/llap/subquery_multi.q.out a876c620e3
ql/src/test/results/clientpositive/llap/subquery_notin.q.out 018ef1db54
ql/src/test/results/clientpositive/llap/subquery_scalar.q.out 3a0d1464c5
ql/src/test/results/clientpositive/llap/subquery_select.q.out 703d19de05
ql/src/test/results/clientpositive/llap/tez_dynpart_hashjoin_1.q.out 6dd3fbf6ca
ql/src/test/results/clientpositive/llap/tez_dynpart_hashjoin_2.q.out f434a1e00b
ql/src/test/results/clientpositive/llap/tez_join_tests.q.out b0eff1e1f4
ql/src/test/results/clientpositive/llap/tez_joins_explain.q.out 418c23c16d
ql/src/test/results/clientpositive/llap/tez_smb_empty.q.out cd392a7b2b
ql/src/test/results/clientpositive/llap/tez_vector_dynpart_hashjoin_1.q.out 3b47383803
ql/src/test/results/clientpositive/llap/tez_vector_dynpart_hashjoin_2.q.out 6eba119ca0
ql/src/test/results/clientpositive/llap/vector_left_outer_join.q.out 82111bb27f
ql/src/test/results/clientpositive/llap/vector_outer_join1.q.out f64e7393d9
ql/src/test/results/clientpositive/llap/vector_outer_join2.q.out c24a2d0e6e
ql/src/test/results/clientpositive/llap/vectorized_mapjoin.q.out e56800ad40
ql/src/test/results/clientpositive/llap/vectorized_nested_mapjoin.q.out ed28530eec
ql/src/test/results/clientpositive/llap/vectorized_shufflejoin.q.out a750d9fd01
ql/src/test/results/clientpositive/llap/windowing_gby.q.out 945f8e0caf
ql/src/test/results/clientpositive/perf/query14.q.out 42bad8da14
ql/src/test/results/clientpositive/perf/query23.q.out ebd2271108
ql/src/test/results/clientpositive/spark/annotate_stats_join.q.out d09bc52155
ql/src/test/results/clientpositive/tez/explainanalyze_4.q.out 14535f63da
Diff: https://reviews.apache.org/r/60757/diff/3/
Changes: https://reviews.apache.org/r/60757/diff/2-3/
Testing
-------
Updated existing tests
Thanks,
Vineet Garg
Re: Review Request 60757: HIVE-17066: Better estimation for number of nulls for outer join
Posted by Vineet Garg <vg...@hortonworks.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/60757/
-----------------------------------------------------------
(Updated July 11, 2017, 10:07 p.m.)
Review request for hive and Ashutosh Chauhan.
Changes
-------
Addressed review comments
Bugs: HIVE-17066
https://issues.apache.org/jira/browse/HIVE-17066
Repository: hive-git
Description
-------
This patch improves the estimate of the number of nulls for columns coming out of an outer join.
Diffs (updated)
-----
ql/src/java/org/apache/hadoop/hive/ql/optimizer/stats/annotation/StatsRulesProcFactory.java ae32a28c66
ql/src/test/results/clientpositive/annotate_stats_join.q.out 48ba40ef41
ql/src/test/results/clientpositive/annotate_stats_join_pkfk.q.out b4d46d28e9
ql/src/test/results/clientpositive/cbo_rp_join0.q.out ba96de2fe1
ql/src/test/results/clientpositive/llap/correlationoptimizer1.q.out b970dd6716
ql/src/test/results/clientpositive/llap/correlationoptimizer2.q.out 6067c77f85
ql/src/test/results/clientpositive/llap/dynamic_semijoin_reduction.q.out 88c4a17bad
ql/src/test/results/clientpositive/llap/explainuser_1.q.out bb42e45c2d
ql/src/test/results/clientpositive/llap/explainuser_4.q.out 99db828d6a
ql/src/test/results/clientpositive/llap/subquery_in.q.out e401f31e52
ql/src/test/results/clientpositive/llap/subquery_multi.q.out a876c620e3
ql/src/test/results/clientpositive/llap/subquery_notin.q.out 018ef1db54
ql/src/test/results/clientpositive/llap/subquery_scalar.q.out 3a0d1464c5
ql/src/test/results/clientpositive/llap/subquery_select.q.out 703d19de05
ql/src/test/results/clientpositive/llap/tez_dynpart_hashjoin_1.q.out 6dd3fbf6ca
ql/src/test/results/clientpositive/llap/tez_dynpart_hashjoin_2.q.out f434a1e00b
ql/src/test/results/clientpositive/llap/tez_join_tests.q.out b0eff1e1f4
ql/src/test/results/clientpositive/llap/tez_joins_explain.q.out 418c23c16d
ql/src/test/results/clientpositive/llap/tez_smb_empty.q.out cd392a7b2b
ql/src/test/results/clientpositive/llap/tez_vector_dynpart_hashjoin_1.q.out 3b47383803
ql/src/test/results/clientpositive/llap/tez_vector_dynpart_hashjoin_2.q.out 6eba119ca0
ql/src/test/results/clientpositive/llap/vector_left_outer_join.q.out 82111bb27f
ql/src/test/results/clientpositive/llap/vector_outer_join1.q.out f64e7393d9
ql/src/test/results/clientpositive/llap/vector_outer_join2.q.out c24a2d0e6e
ql/src/test/results/clientpositive/llap/vectorized_mapjoin.q.out e56800ad40
ql/src/test/results/clientpositive/llap/vectorized_nested_mapjoin.q.out ed28530eec
ql/src/test/results/clientpositive/llap/vectorized_shufflejoin.q.out a750d9fd01
ql/src/test/results/clientpositive/llap/windowing_gby.q.out 945f8e0caf
ql/src/test/results/clientpositive/perf/query14.q.out 42bad8da14
ql/src/test/results/clientpositive/perf/query23.q.out ebd2271108
ql/src/test/results/clientpositive/spark/annotate_stats_join.q.out d09bc52155
ql/src/test/results/clientpositive/tez/explainanalyze_4.q.out 14535f63da
Diff: https://reviews.apache.org/r/60757/diff/2/
Changes: https://reviews.apache.org/r/60757/diff/1-2/
Testing
-------
Updated existing tests
Thanks,
Vineet Garg