You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@hive.apache.org by Zoltan Haindrich <ki...@rxd.hu> on 2018/05/15 13:07:48 UTC
Review Request 67126: HIVE-19326: stats auto gather: incorrect
aggregation during UNION queries (may lead to incorrect results)
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/67126/
-----------------------------------------------------------
Review request for hive, Ashutosh Chauhan and Sergey Shelukhin.
Bugs: HIVE-19326
https://issues.apache.org/jira/browse/HIVE-19326
Repository: hive-git
Description
-------
in queries like: INSERT ... SELECT ... UNION ALL SELECT ...
the stats are only collected for the first select
there are 2 issues fixed - which both resulted in the same result:
* statscollectors have overwritten eachothers result; because the filename was only dependent from the resulting table name
* in case tez.merge.files the 2. task have not been set to collect statistics
Diffs
-----
itests/src/test/resources/testconfiguration.properties 13c08de3c57fdd7fcd4181814bb8e547c699b9f1
ql/src/java/org/apache/hadoop/hive/ql/exec/FileSinkOperator.java 01a5b4c9c328cb034a613a1539cea2584e122fb4
ql/src/java/org/apache/hadoop/hive/ql/exec/Operator.java 108bb57c4189b720e672eb6f09b1ef05f78448c2
ql/src/java/org/apache/hadoop/hive/ql/exec/TableScanOperator.java 07991811f92bc7accae2fde23244f424bdd64c6b
ql/src/java/org/apache/hadoop/hive/ql/optimizer/GenMapRedUtils.java 605bb09caba3b60ad8d51e50dace70757ab80188
ql/src/java/org/apache/hadoop/hive/ql/stats/StatsCollectionContext.java 5c3328c63e8ca37e284e1dc1cdbee5969e185a80
ql/src/java/org/apache/hadoop/hive/ql/stats/fs/FSStatsPublisher.java 902b37f7874dd5b1afaf8c8bb1259c6f0ddf817f
ql/src/test/queries/clientpositive/union_fast_stats.q d69bef3ac083d5a06acda9f47e5d2c1cbe2dfb69
ql/src/test/queries/clientpositive/union_rowcounts.q PRE-CREATION
ql/src/test/queries/clientpositive/union_stats.q 0e91c23fea475ec95a7fa67433707cb290b277a2
ql/src/test/results/clientpositive/llap/multiMapJoin1.q.out f8adcd4ba24f2122f7e7e20770e24a71cfb01a7e
ql/src/test/results/clientpositive/llap/union_fast_stats.q.out 4ca5f47a850ba62290c0845eb11a8d0a32780526
ql/src/test/results/clientpositive/llap/union_rowcounts.q.out PRE-CREATION
ql/src/test/results/clientpositive/llap/union_stats.q.out 5a088f40f5fea41d04a5b6cb6c12bf852a22f097
ql/src/test/results/clientpositive/union_stats.q.out 8bd3f44b6e276d6636082673084d66bba3b5c0d3
Diff: https://reviews.apache.org/r/67126/diff/1/
Testing
-------
Thanks,
Zoltan Haindrich
Re: Review Request 67126: HIVE-19326: stats auto gather: incorrect
aggregation during UNION queries (may lead to incorrect results)
Posted by Zoltan Haindrich <ki...@rxd.hu>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/67126/#review203111
-----------------------------------------------------------
ql/src/java/org/apache/hadoop/hive/ql/optimizer/GenMapRedUtils.java
Line 1910 (original), 1910 (patched)
<https://reviews.apache.org/r/67126/#comment285189>
this change fixes the tez.merge.files case; because the only problem in that cases is that the second filesink is not gathering stats
my own opinion: is that gathering statistics has no real overhead (it writes a file)...I think by enabling it here and there it somewhat just adds complexity
ql/src/test/results/clientpositive/llap/union_stats.q.out
Lines 427 (patched)
<https://reviews.apache.org/r/67126/#comment285190>
FS_7 is present 2 times in this plan
operator ids are reused multiple times in queries like:
from (select * from src union all select * from src)s
insert overwrite table t1 select *
insert overwrite table t2 select *;
if I understand correctly actually the file sink id's are reused for in every union branch to do output.
HIVE-19237 should fix this; and probably also remove indexInTezUnion setters/etc
- Zoltan Haindrich
On May 15, 2018, 1:07 p.m., Zoltan Haindrich wrote:
>
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/67126/
> -----------------------------------------------------------
>
> (Updated May 15, 2018, 1:07 p.m.)
>
>
> Review request for hive, Ashutosh Chauhan and Sergey Shelukhin.
>
>
> Bugs: HIVE-19326
> https://issues.apache.org/jira/browse/HIVE-19326
>
>
> Repository: hive-git
>
>
> Description
> -------
>
> in queries like: INSERT ... SELECT ... UNION ALL SELECT ...
> the stats are only collected for the first select
>
> there are 2 issues fixed - which both resulted in the same result:
>
> * statscollectors have overwritten eachothers result; because the filename was only dependent from the resulting table name
> * in case tez.merge.files the 2. task have not been set to collect statistics
>
>
> Diffs
> -----
>
> itests/src/test/resources/testconfiguration.properties 13c08de3c57fdd7fcd4181814bb8e547c699b9f1
> ql/src/java/org/apache/hadoop/hive/ql/exec/FileSinkOperator.java 01a5b4c9c328cb034a613a1539cea2584e122fb4
> ql/src/java/org/apache/hadoop/hive/ql/exec/Operator.java 108bb57c4189b720e672eb6f09b1ef05f78448c2
> ql/src/java/org/apache/hadoop/hive/ql/exec/TableScanOperator.java 07991811f92bc7accae2fde23244f424bdd64c6b
> ql/src/java/org/apache/hadoop/hive/ql/optimizer/GenMapRedUtils.java 605bb09caba3b60ad8d51e50dace70757ab80188
> ql/src/java/org/apache/hadoop/hive/ql/stats/StatsCollectionContext.java 5c3328c63e8ca37e284e1dc1cdbee5969e185a80
> ql/src/java/org/apache/hadoop/hive/ql/stats/fs/FSStatsPublisher.java 902b37f7874dd5b1afaf8c8bb1259c6f0ddf817f
> ql/src/test/queries/clientpositive/union_fast_stats.q d69bef3ac083d5a06acda9f47e5d2c1cbe2dfb69
> ql/src/test/queries/clientpositive/union_rowcounts.q PRE-CREATION
> ql/src/test/queries/clientpositive/union_stats.q 0e91c23fea475ec95a7fa67433707cb290b277a2
> ql/src/test/results/clientpositive/llap/multiMapJoin1.q.out f8adcd4ba24f2122f7e7e20770e24a71cfb01a7e
> ql/src/test/results/clientpositive/llap/union_fast_stats.q.out 4ca5f47a850ba62290c0845eb11a8d0a32780526
> ql/src/test/results/clientpositive/llap/union_rowcounts.q.out PRE-CREATION
> ql/src/test/results/clientpositive/llap/union_stats.q.out 5a088f40f5fea41d04a5b6cb6c12bf852a22f097
> ql/src/test/results/clientpositive/union_stats.q.out 8bd3f44b6e276d6636082673084d66bba3b5c0d3
>
>
> Diff: https://reviews.apache.org/r/67126/diff/1/
>
>
> Testing
> -------
>
>
> Thanks,
>
> Zoltan Haindrich
>
>