Posted to dev@hive.apache.org by Eugene Koifman <ek...@hortonworks.com> on 2018/03/01 00:40:25 UTC

Re: Review Request 65415: HIVE-18571 stats issues for MM tables

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/65415/#review198419
-----------------------------------------------------------




ql/src/java/org/apache/hadoop/hive/ql/io/AcidUtils.java
Line 1126 (original), 1134 (patched)
<https://reviews.apache.org/r/65415/#comment278548>

    I think Wei added this to skip aborted deltas.  In full acid it's not possible since it relies on MoveTask.  This could probably be safely generalized to all isTransactional()



ql/src/java/org/apache/hadoop/hive/ql/io/AcidUtils.java
Lines 1683 (patched)
<https://reviews.apache.org/r/65415/#comment278549>

    For a full acid table, if no ParsedDelta has isDeleteDelta() set, then the fileset ((non-delete) ParsedDelta from getCurrentDirectories() + (base | getOriginalFiles())) should be accurate
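
The comment above can be read as a rule for when a directory listing is enough to compute stats. The following is only a sketch of that reading: the class, method, and parameter names are hypothetical and are not Hive's AcidUtils API, and "base | getOriginalFiles()" is assumed here to mean "the base files if a base exists, otherwise the original files".

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;

// Hypothetical helper, not part of Hive: models the fileset rule with plain file-name lists.
final class AcidFilesetSketch {

    // Returns the files to count for stats, or empty when delete deltas make a plain listing inaccurate.
    static Optional<List<String>> accurateFileset(List<String> baseFiles,
                                                  List<String> originalFiles,
                                                  List<String> insertDeltaFiles,
                                                  boolean hasDeleteDelta) {
        if (hasDeleteDelta) {
            // Deleted rows are still physically present in the data files,
            // so file-level counts would overstate the live data.
            return Optional.empty();
        }
        // Assumption: a base, when present, supersedes the original (pre-ACID) files.
        List<String> result = new ArrayList<>(baseFiles.isEmpty() ? originalFiles : baseFiles);
        result.addAll(insertDeltaFiles);
        return Optional.of(result);
    }
}
```

With no delete deltas the result is simply base-or-original files followed by the insert deltas; with any delete delta the sketch declines to answer instead of returning a misleading list.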



ql/src/java/org/apache/hadoop/hive/ql/io/AcidUtils.java
Lines 1695 (patched)
<https://reviews.apache.org/r/65415/#comment278550>

    Adding the ParsedDelta.isDeleteDelta() entries seems wrong - if anything, they should subtract from file size/row count



ql/src/java/org/apache/hadoop/hive/ql/metadata/Hive.java
Lines 1585 (patched)
<https://reviews.apache.org/r/65415/#comment278551>

    unused



ql/src/java/org/apache/hadoop/hive/ql/metadata/Hive.java
Lines 1597 (patched)
<https://reviews.apache.org/r/65415/#comment278553>

    This needs elaboration or should be removed - it will be confusing to most people, I think



ql/src/java/org/apache/hadoop/hive/ql/metadata/Hive.java
Lines 1619 (patched)
<https://reviews.apache.org/r/65415/#comment278552>

    should all these todos be jiras?



ql/src/java/org/apache/hadoop/hive/ql/metadata/Hive.java
Lines 1697 (patched)
<https://reviews.apache.org/r/65415/#comment278554>

    I saw a number of comments/logic to this effect - probably better to wait for HIVE-18824 and remove these



ql/src/java/org/apache/hadoop/hive/ql/metadata/Hive.java
Lines 2194 (patched)
<https://reviews.apache.org/r/65415/#comment278555>

    Jiras?



ql/src/java/org/apache/hadoop/hive/ql/parse/ImportSemanticAnalyzer.java
Lines 523 (patched)
<https://reviews.apache.org/r/65415/#comment278556>

    Jira?



standalone-metastore/src/main/java/org/apache/hadoop/hive/metastore/utils/MetaStoreUtils.java
Line 616 (original), 611 (patched)
<https://reviews.apache.org/r/65415/#comment278558>

    can you elaborate?



standalone-metastore/src/main/java/org/apache/hadoop/hive/metastore/utils/MetaStoreUtils.java
Lines 647 (patched)
<https://reviews.apache.org/r/65415/#comment278557>

    why?  No ValidTxnList?



standalone-metastore/src/main/java/org/apache/hadoop/hive/metastore/utils/MetaStoreUtils.java
Line 675 (original), 671 (patched)
<https://reviews.apache.org/r/65415/#comment278559>

    Jira? Assert?  at least a Wtf...



standalone-metastore/src/main/java/org/apache/hadoop/hive/metastore/utils/MetaStoreUtils.java
Lines 758 (patched)
<https://reviews.apache.org/r/65415/#comment278561>

    jira?


- Eugene Koifman


On Feb. 26, 2018, 7:14 p.m., Sergey Shelukhin wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/65415/
> -----------------------------------------------------------
> 
> (Updated Feb. 26, 2018, 7:14 p.m.)
> 
> 
> Review request for hive and Eugene Koifman.
> 
> 
> Repository: hive-git
> 
> 
> Description
> -------
> 
> f.,v fbghdscd
> 
> 
> Diffs
> -----
> 
>   common/src/java/org/apache/hadoop/hive/common/HiveStatsUtils.java df77a4a2f2 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/MoveTask.java b490325091 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/Utilities.java fd8423129f 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/repl/bootstrap/load/table/LoadPartitions.java 0a82225d4a 
>   ql/src/java/org/apache/hadoop/hive/ql/io/AcidUtils.java 70fcd2c142 
>   ql/src/java/org/apache/hadoop/hive/ql/io/merge/MergeFileWork.java 1a63d3f971 
>   ql/src/java/org/apache/hadoop/hive/ql/metadata/Hive.java 8b0af3e5c8 
>   ql/src/java/org/apache/hadoop/hive/ql/parse/ImportSemanticAnalyzer.java 67d05e65dd 
>   ql/src/java/org/apache/hadoop/hive/ql/parse/LoadSemanticAnalyzer.java 7d2de75315 
>   ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java cd6f1ee692 
>   ql/src/java/org/apache/hadoop/hive/ql/plan/BasicStatsWork.java a4e770ce95 
>   ql/src/java/org/apache/hadoop/hive/ql/plan/ConditionalResolverMergeFiles.java 8ce0cb05b6 
>   ql/src/java/org/apache/hadoop/hive/ql/stats/BasicStatsNoJobTask.java 946c300750 
>   ql/src/java/org/apache/hadoop/hive/ql/stats/BasicStatsTask.java 1d7660e8b2 
>   ql/src/java/org/apache/hadoop/hive/ql/stats/ColStatsProcessor.java 7591c0681b 
>   ql/src/java/org/apache/hadoop/hive/ql/stats/Partish.java 05b0474e90 
>   ql/src/java/org/apache/hadoop/hive/ql/stats/fs/FSStatsAggregator.java d84cf136d5 
>   ql/src/test/results/clientpositive/autoColumnStats_4.q.out 9c0e020351 
>   standalone-metastore/src/main/java/org/apache/hadoop/hive/common/StatsSetupConst.java 59190893e6 
>   standalone-metastore/src/main/java/org/apache/hadoop/hive/metastore/HiveAlterHandler.java 89354a2d34 
>   standalone-metastore/src/main/java/org/apache/hadoop/hive/metastore/HiveMetaStore.java c6e34a8a22 
>   standalone-metastore/src/main/java/org/apache/hadoop/hive/metastore/Warehouse.java 20c10607bb 
>   standalone-metastore/src/main/java/org/apache/hadoop/hive/metastore/utils/FileUtils.java b44ff8ce47 
>   standalone-metastore/src/main/java/org/apache/hadoop/hive/metastore/utils/MetaStoreUtils.java 50f873a013 
>   standalone-metastore/src/test/java/org/apache/hadoop/hive/metastore/TestHiveMetaStore.java 2599ab103e 
> 
> 
> Diff: https://reviews.apache.org/r/65415/diff/4/
> 
> 
> Testing
> -------
> 
> 
> Thanks,
> 
> Sergey Shelukhin
> 
>


Re: Review Request 65415: HIVE-18571 stats issues for MM tables

Posted by Sergey Shelukhin <se...@hortonworks.com>.

> On March 1, 2018, 12:40 a.m., Eugene Koifman wrote:
> > ql/src/java/org/apache/hadoop/hive/ql/io/AcidUtils.java
> > Lines 1683 (patched)
> > <https://reviews.apache.org/r/65415/diff/4/?file=1965759#file1965759line1688>
> >
> >     For a full acid table, if no ParsedDelta has isDeleteDelta() set, then the fileset ((non-delete) ParsedDelta from getCurrentDirectories() + (base | getOriginalFiles())) should be accurate

Can you elaborate? I'm not sure what you mean with respect to this code.


> On March 1, 2018, 12:40 a.m., Eugene Koifman wrote:
> > ql/src/java/org/apache/hadoop/hive/ql/io/AcidUtils.java
> > Lines 1695 (patched)
> > <https://reviews.apache.org/r/65415/diff/4/?file=1965759#file1965759line1700>
> >
> >     Adding the ParsedDelta.isDeleteDelta() entries seems wrong - if anything, they should subtract from file size/row count

Well, I think at least the size is more likely to be used for scan size estimation, and delete deltas would need to be scanned together with the other files.
I think a proper implementation of stats for ACID would need to be done in a separate jira and actually account properly for ACID operations.
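
The two positions in this exchange can be put side by side in a small sketch. Everything here is hypothetical (the DeltaDir type, the directory names, and the numbers are made up, and none of it is Hive's stats code); it only illustrates that a scan-size estimate would add delete deltas while a live-row estimate would subtract them, assuming each delete event removes exactly one existing row.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-in for a parsed delta directory; not Hive's ParsedDelta.
final class DeltaDir {
    final String name;
    final boolean deleteDelta;
    final long bytes;
    final long rows;

    DeltaDir(String name, boolean deleteDelta, long bytes, long rows) {
        this.name = name;
        this.deleteDelta = deleteDelta;
        this.bytes = bytes;
        this.rows = rows;
    }
}

final class AcidStatsSketch {

    // Bytes a reader has to scan: delete deltas are read together with the data files, so they are added.
    static long scanBytes(List<DeltaDir> dirs) {
        long total = 0;
        for (DeltaDir d : dirs) {
            total += d.bytes;
        }
        return total;
    }

    // Live-row estimate: rows in delete deltas are subtracted (assumes one delete event per existing row).
    static long liveRows(List<DeltaDir> dirs) {
        long rows = 0;
        for (DeltaDir d : dirs) {
            rows += d.deleteDelta ? -d.rows : d.rows;
        }
        return Math.max(rows, 0);
    }

    public static void main(String[] args) {
        List<DeltaDir> dirs = Arrays.asList(
            new DeltaDir("delta_a", false, 1000, 100),
            new DeltaDir("delta_b", false, 500, 50),
            new DeltaDir("delete_delta_c", true, 200, 30));
        System.out.println("scanBytes=" + scanBytes(dirs)); // prints scanBytes=1700
        System.out.println("liveRows=" + liveRows(dirs));   // prints liveRows=120
    }
}
```

The point of keeping the two methods separate is that a single "totalSize plus numRows" pass over the directories cannot serve both purposes once delete deltas exist.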


> On March 1, 2018, 12:40 a.m., Eugene Koifman wrote:
> > ql/src/java/org/apache/hadoop/hive/ql/metadata/Hive.java
> > Lines 1619 (patched)
> > <https://reviews.apache.org/r/65415/diff/4/?file=1965761#file1965761line1620>
> >
> >     should all these todos be jiras?

Will file jira(s) after the final version of the patch, based on all the TODOs added, where relevant.


> On March 1, 2018, 12:40 a.m., Eugene Koifman wrote:
> > ql/src/java/org/apache/hadoop/hive/ql/metadata/Hive.java
> > Lines 1697 (patched)
> > <https://reviews.apache.org/r/65415/diff/4/?file=1965761#file1965761line1698>
> >
> >     I saw a number of comments/logic to this effect - probably better to wait for HIVE-18824 and remove these

Well, it can still happen due to some bug. I'll keep the checks for safety; we'll see 0 stats if they happen to trigger.


> On March 1, 2018, 12:40 a.m., Eugene Koifman wrote:
> > standalone-metastore/src/main/java/org/apache/hadoop/hive/metastore/utils/MetaStoreUtils.java
> > Lines 647 (patched)
> > <https://reviews.apache.org/r/65415/diff/4/?file=1965778#file1965778line656>
> >
> >     why?  No ValidTxnList?

yes; also the code itself is in QL


> On March 1, 2018, 12:40 a.m., Eugene Koifman wrote:
> > standalone-metastore/src/main/java/org/apache/hadoop/hive/metastore/utils/MetaStoreUtils.java
> > Line 675 (original), 671 (patched)
> > <https://reviews.apache.org/r/65415/diff/4/?file=1965778#file1965778line682>
> >
> >     Jira? Assert?  at least a Wtf...

It's impossible to assert what the file list is...
It would be valid to call this by getting the file list from AcidUtils.
I'm going to file a follow-up JIRA for this.


- Sergey


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/65415/#review198419
-----------------------------------------------------------

