Posted to dev@hive.apache.org by Eugene Koifman <ek...@hortonworks.com> on 2018/03/01 00:40:25 UTC
Re: Review Request 65415: HIVE-18571 stats issues for MM tables
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/65415/#review198419
-----------------------------------------------------------
ql/src/java/org/apache/hadoop/hive/ql/io/AcidUtils.java
Line 1126 (original), 1134 (patched)
<https://reviews.apache.org/r/65415/#comment278548>
I think Wei added this to skip aborted deltas. In full acid it's not possible since it relies on MoveTask. This could probably safely generalize to all isTransactional().
ql/src/java/org/apache/hadoop/hive/ql/io/AcidUtils.java
Lines 1683 (patched)
<https://reviews.apache.org/r/65415/#comment278549>
For full acid table, if you have no ParseDelta.isDeleteDelta(), then the ((non-delete) ParseDelta from getCurrentDirectories + (base | getOriginalFiles())) fileset should be accurate
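A minimal sketch of the fileset described in the comment above, under stated assumptions. `Delta`, `Fileset` and `select` are hypothetical stand-ins written for illustration; they are not Hive classes and do not reproduce the AcidUtils API. The sketch reads "(base | getOriginalFiles())" as "the base directory if there is one, otherwise the original files", which is an interpretation of the comment rather than something the thread confirms.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hypothetical stand-ins only; none of these types are Hive classes.
final class AcidFilesetSketch {

    /** A delta directory, flagged as a delete delta or not. */
    static final class Delta {
        final String path;
        final boolean deleteDelta;

        Delta(String path, boolean deleteDelta) {
            this.path = path;
            this.deleteDelta = deleteDelta;
        }
    }

    /** The paths to count, plus whether counting them should be accurate. */
    static final class Fileset {
        final List<String> paths;
        final boolean accurate;

        Fileset(List<String> paths, boolean accurate) {
            this.paths = paths;
            this.accurate = accurate;
        }
    }

    /**
     * Collects the non-delete deltas plus either the base (if non-null)
     * or the original files. The result is marked accurate only when
     * no delete delta was seen among the current deltas.
     */
    static Fileset select(List<Delta> currentDeltas, String base,
                          List<String> originalFiles) {
        List<String> paths = new ArrayList<>();
        boolean sawDeleteDelta = false;
        for (Delta d : currentDeltas) {
            if (d.deleteDelta) {
                sawDeleteDelta = true;   // stats from plain files are no longer exact
                continue;
            }
            paths.add(d.path);
        }
        if (base != null) {
            paths.add(base);
        } else {
            paths.addAll(originalFiles);
        }
        return new Fileset(paths, !sawDeleteDelta);
    }

    public static void main(String[] args) {
        Fileset f = select(
            Arrays.asList(new Delta("delta_1_1", false), new Delta("delta_2_2", false)),
            "base_0",
            Collections.<String>emptyList());
        System.out.println(f.paths + " accurate=" + f.accurate);
    }
}
```

The `accurate` flag is the point of the sketch: as soon as one delete delta is present, summing the remaining files can no longer be trusted as an exact count.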
ql/src/java/org/apache/hadoop/hive/ql/io/AcidUtils.java
Lines 1695 (patched)
<https://reviews.apache.org/r/65415/#comment278550>
adding ParseDelta.isDeleteDelta() seems wrong - if anything it should subtract from file size/row count
ql/src/java/org/apache/hadoop/hive/ql/metadata/Hive.java
Lines 1585 (patched)
<https://reviews.apache.org/r/65415/#comment278551>
unused
ql/src/java/org/apache/hadoop/hive/ql/metadata/Hive.java
Lines 1597 (patched)
<https://reviews.apache.org/r/65415/#comment278553>
This needs elaboration or should be removed - it will be confusing to most people I think
ql/src/java/org/apache/hadoop/hive/ql/metadata/Hive.java
Lines 1619 (patched)
<https://reviews.apache.org/r/65415/#comment278552>
should all these todos be jiras?
ql/src/java/org/apache/hadoop/hive/ql/metadata/Hive.java
Lines 1697 (patched)
<https://reviews.apache.org/r/65415/#comment278554>
I saw a number of comments/logic to this effect - probably better to wait for HIVE-18824 and remove these
ql/src/java/org/apache/hadoop/hive/ql/metadata/Hive.java
Lines 2194 (patched)
<https://reviews.apache.org/r/65415/#comment278555>
Jiras?
ql/src/java/org/apache/hadoop/hive/ql/parse/ImportSemanticAnalyzer.java
Lines 523 (patched)
<https://reviews.apache.org/r/65415/#comment278556>
Jira?
standalone-metastore/src/main/java/org/apache/hadoop/hive/metastore/utils/MetaStoreUtils.java
Line 616 (original), 611 (patched)
<https://reviews.apache.org/r/65415/#comment278558>
can you elaborate?
standalone-metastore/src/main/java/org/apache/hadoop/hive/metastore/utils/MetaStoreUtils.java
Lines 647 (patched)
<https://reviews.apache.org/r/65415/#comment278557>
why? No ValidTxnList?
standalone-metastore/src/main/java/org/apache/hadoop/hive/metastore/utils/MetaStoreUtils.java
Line 675 (original), 671 (patched)
<https://reviews.apache.org/r/65415/#comment278559>
Jira? Assert? at least a Wtf...
standalone-metastore/src/main/java/org/apache/hadoop/hive/metastore/utils/MetaStoreUtils.java
Lines 758 (patched)
<https://reviews.apache.org/r/65415/#comment278561>
jira?
- Eugene Koifman
On Feb. 26, 2018, 7:14 p.m., Sergey Shelukhin wrote:
>
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/65415/
> -----------------------------------------------------------
>
> (Updated Feb. 26, 2018, 7:14 p.m.)
>
>
> Review request for hive and Eugene Koifman.
>
>
> Repository: hive-git
>
>
> Description
> -------
>
> f.,v fbghdscd
>
>
> Diffs
> -----
>
> common/src/java/org/apache/hadoop/hive/common/HiveStatsUtils.java df77a4a2f2
> ql/src/java/org/apache/hadoop/hive/ql/exec/MoveTask.java b490325091
> ql/src/java/org/apache/hadoop/hive/ql/exec/Utilities.java fd8423129f
> ql/src/java/org/apache/hadoop/hive/ql/exec/repl/bootstrap/load/table/LoadPartitions.java 0a82225d4a
> ql/src/java/org/apache/hadoop/hive/ql/io/AcidUtils.java 70fcd2c142
> ql/src/java/org/apache/hadoop/hive/ql/io/merge/MergeFileWork.java 1a63d3f971
> ql/src/java/org/apache/hadoop/hive/ql/metadata/Hive.java 8b0af3e5c8
> ql/src/java/org/apache/hadoop/hive/ql/parse/ImportSemanticAnalyzer.java 67d05e65dd
> ql/src/java/org/apache/hadoop/hive/ql/parse/LoadSemanticAnalyzer.java 7d2de75315
> ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java cd6f1ee692
> ql/src/java/org/apache/hadoop/hive/ql/plan/BasicStatsWork.java a4e770ce95
> ql/src/java/org/apache/hadoop/hive/ql/plan/ConditionalResolverMergeFiles.java 8ce0cb05b6
> ql/src/java/org/apache/hadoop/hive/ql/stats/BasicStatsNoJobTask.java 946c300750
> ql/src/java/org/apache/hadoop/hive/ql/stats/BasicStatsTask.java 1d7660e8b2
> ql/src/java/org/apache/hadoop/hive/ql/stats/ColStatsProcessor.java 7591c0681b
> ql/src/java/org/apache/hadoop/hive/ql/stats/Partish.java 05b0474e90
> ql/src/java/org/apache/hadoop/hive/ql/stats/fs/FSStatsAggregator.java d84cf136d5
> ql/src/test/results/clientpositive/autoColumnStats_4.q.out 9c0e020351
> standalone-metastore/src/main/java/org/apache/hadoop/hive/common/StatsSetupConst.java 59190893e6
> standalone-metastore/src/main/java/org/apache/hadoop/hive/metastore/HiveAlterHandler.java 89354a2d34
> standalone-metastore/src/main/java/org/apache/hadoop/hive/metastore/HiveMetaStore.java c6e34a8a22
> standalone-metastore/src/main/java/org/apache/hadoop/hive/metastore/Warehouse.java 20c10607bb
> standalone-metastore/src/main/java/org/apache/hadoop/hive/metastore/utils/FileUtils.java b44ff8ce47
> standalone-metastore/src/main/java/org/apache/hadoop/hive/metastore/utils/MetaStoreUtils.java 50f873a013
> standalone-metastore/src/test/java/org/apache/hadoop/hive/metastore/TestHiveMetaStore.java 2599ab103e
>
>
> Diff: https://reviews.apache.org/r/65415/diff/4/
>
>
> Testing
> -------
>
>
> Thanks,
>
> Sergey Shelukhin
>
>
Re: Review Request 65415: HIVE-18571 stats issues for MM tables
Posted by Sergey Shelukhin <se...@hortonworks.com>.
> On March 1, 2018, 12:40 a.m., Eugene Koifman wrote:
> > ql/src/java/org/apache/hadoop/hive/ql/io/AcidUtils.java
> > Lines 1683 (patched)
> > <https://reviews.apache.org/r/65415/diff/4/?file=1965759#file1965759line1688>
> >
> > For full acid table, if you have no ParseDelta.isDeleteDelta(), then the ((non-delete) ParseDelta from getCurrentDirectories + (base | getOriginalFiles())) fileset should be accurate
Can you elaborate? I'm not sure what you mean wrt this code.
> On March 1, 2018, 12:40 a.m., Eugene Koifman wrote:
> > ql/src/java/org/apache/hadoop/hive/ql/io/AcidUtils.java
> > Lines 1695 (patched)
> > <https://reviews.apache.org/r/65415/diff/4/?file=1965759#file1965759line1700>
> >
> > adding ParseDelta.isDeleteDelta() seems wrong - if anything it should subtract from file size/row count
Well, I think at least the size is more likely to be used for scan size estimation, and delete deltas would need to be scanned together with other files.
I think the proper impl of stats for ACID would need to be done in a separate JIRA and actually account properly for ACID operations.
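A minimal sketch of the distinction drawn in this exchange, under stated assumptions. `DeltaStats`, `scanSizeBytes` and `liveRowCount` are hypothetical names made up for illustration and are not part of Hive. The row-count method assumes every delete event cancels exactly one previously inserted row, which is a simplification and not something the patch or the thread establishes.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-ins only; none of these types are Hive classes.
final class AcidStatsSketch {

    /** Size and row count of one delta directory. */
    static final class DeltaStats {
        final boolean deleteDelta;
        final long sizeBytes;
        final long rowCount;

        DeltaStats(boolean deleteDelta, long sizeBytes, long rowCount) {
            this.deleteDelta = deleteDelta;
            this.sizeBytes = sizeBytes;
            this.rowCount = rowCount;
        }
    }

    /** Bytes a reader must touch: delete deltas are read along with the rest. */
    static long scanSizeBytes(List<DeltaStats> deltas) {
        long total = 0;
        for (DeltaStats d : deltas) {
            total += d.sizeBytes;
        }
        return total;
    }

    /**
     * Rows visible to a query: rows in delete deltas are subtracted.
     * Assumes one delete event per existing row; clamped at zero.
     */
    static long liveRowCount(List<DeltaStats> deltas) {
        long rows = 0;
        for (DeltaStats d : deltas) {
            rows += d.deleteDelta ? -d.rowCount : d.rowCount;
        }
        return Math.max(rows, 0);
    }

    public static void main(String[] args) {
        List<DeltaStats> deltas = Arrays.asList(
            new DeltaStats(false, 1000, 100),  // insert delta
            new DeltaStats(true, 50, 10));     // delete delta
        System.out.println("scan bytes = " + scanSizeBytes(deltas));
        System.out.println("live rows  = " + liveRowCount(deltas));
    }
}
```

With the sample values in `main`, the delete delta adds 50 bytes to the scan size while removing 10 rows from the live count, which is why one statistic grows with delete deltas and the other shrinks.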
> On March 1, 2018, 12:40 a.m., Eugene Koifman wrote:
> > ql/src/java/org/apache/hadoop/hive/ql/metadata/Hive.java
> > Lines 1619 (patched)
> > <https://reviews.apache.org/r/65415/diff/4/?file=1965761#file1965761line1620>
> >
> > should all these todos be jiras?
Will file JIRA(s) after the final version of the patch, based on all the TODOs added, where relevant.
> On March 1, 2018, 12:40 a.m., Eugene Koifman wrote:
> > ql/src/java/org/apache/hadoop/hive/ql/metadata/Hive.java
> > Lines 1697 (patched)
> > <https://reviews.apache.org/r/65415/diff/4/?file=1965761#file1965761line1698>
> >
> > I saw a number of comments/logic to this effect - probably better to wait for HIVE-18824 and remove these
Well, it can still happen due to some bug. I'll keep the checks for safety, we'll see 0 stats if they happen to trigger.
> On March 1, 2018, 12:40 a.m., Eugene Koifman wrote:
> > standalone-metastore/src/main/java/org/apache/hadoop/hive/metastore/utils/MetaStoreUtils.java
> > Lines 647 (patched)
> > <https://reviews.apache.org/r/65415/diff/4/?file=1965778#file1965778line656>
> >
> > why? No ValidTxnList?
Yes; also, the code itself is in QL.
> On March 1, 2018, 12:40 a.m., Eugene Koifman wrote:
> > standalone-metastore/src/main/java/org/apache/hadoop/hive/metastore/utils/MetaStoreUtils.java
> > Line 675 (original), 671 (patched)
> > <https://reviews.apache.org/r/65415/diff/4/?file=1965778#file1965778line682>
> >
> > Jira? Assert? at least a Wtf...
It's impossible to assert what the file list is...
It would be valid to call this by getting the file list from AcidUtils.
I'm going to file a follow-up JIRA for this.
- Sergey