Posted to issues@hive.apache.org by "Karen Coppage (Jira)" <ji...@apache.org> on 2020/08/13 12:01:00 UTC

[jira] [Updated] (HIVE-23966) Minor query-based compaction always results in delta dirs with minWriteId=1

     [ https://issues.apache.org/jira/browse/HIVE-23966?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Karen Coppage updated HIVE-23966:
---------------------------------
    Fix Version/s: 4.0.0

> Minor query-based compaction always results in delta dirs with minWriteId=1
> ---------------------------------------------------------------------------
>
>                 Key: HIVE-23966
>                 URL: https://issues.apache.org/jira/browse/HIVE-23966
>             Project: Hive
>          Issue Type: Bug
>            Reporter: Karen Coppage
>            Assignee: Karen Coppage
>            Priority: Major
>              Labels: pull-request-available
>             Fix For: 4.0.0
>
>          Time Spent: 1h 40m
>  Remaining Estimate: 0h
>
> Minor query-based compaction run after a major compaction or insert overwrite (IOW) results in directories that look like:
>  * base_z_v
>  * delta_1_y_v
>  * delete_delta_1_y_v
> The directories should instead be:
>  * base_z_v
>  * delta_(z+1)_y_v
>  * delete_delta_(z+1)_y_v
> Issues this causes:
> For example, if insert overwrite is followed by a minor compaction, a subsequent major compaction will fail with the following error:
> {noformat}
> Found 2 equal splits: OrcSplit [hdfs://.../warehouse/tablespace/managed/hive/bucketed/delta_0000001_0000006_v0001058/bucket_00004,
> start=0, length=722, isOriginal=false, fileLength=722, hasFooter=false, hasBase=true, deltas=1] and OrcSplit 
> [hdfs://.../warehouse/tablespace/managed/hive/bucketed/base_0000001/bucket_00004_0,
> start=0, length=811, isOriginal=false, fileLength=811, hasFooter=false, hasBase=true, deltas=1]
> {noformat}
> or it can fail with:
> {noformat}
> Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Wrong sort order of Acid rows detected for the rows: org.apache.hadoop.hive.ql.udf.generic.GenericUDFValidateAcidSortOrder$WriteIdRowId@201be62b
> and org.apache.hadoop.hive.ql.udf.generic.GenericUDFValidateAcidSortOrder$WriteIdRowId@5f97bd3f
> {noformat}
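
The naming rule described in the issue (a compacted delta should start at z+1 when the base is base_z) can be illustrated with a small sketch. This is not Hive code: the class DeltaDirSketch and its methods are hypothetical, and the 7-digit zero padding and the _v suffix are assumptions read off the paths in the error message above. It builds the expected directory name and checks whether an existing delta's minWriteId falls inside the range already covered by the base, which is the situation the "equal splits" error points to.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper for illustration only; not part of Hive's AcidUtils. */
class DeltaDirSketch {
    // Assumed layout: [delete_]delta_<minWriteId>_<maxWriteId>[_v<visibilityTxnId>]
    private static final Pattern DELTA =
            Pattern.compile("(?:delete_)?delta_(\\d+)_(\\d+)(?:_v\\d+)?");
    // Assumed layout: base_<writeId>[_v<visibilityTxnId>]
    private static final Pattern BASE =
            Pattern.compile("base_(\\d+)(?:_v\\d+)?");

    /** Name a minor compaction should produce: minWriteId is baseWriteId + 1. */
    static String expectedDeltaDir(long baseWriteId, long maxWriteId, long visibilityTxnId) {
        return String.format("delta_%07d_%07d_v%07d",
                baseWriteId + 1, maxWriteId, visibilityTxnId);
    }

    /** True if the delta's minWriteId is already covered by the base directory. */
    static boolean overlapsBase(String deltaDir, String baseDir) {
        Matcher d = DELTA.matcher(deltaDir);
        Matcher b = BASE.matcher(baseDir);
        if (!d.matches() || !b.matches()) {
            throw new IllegalArgumentException(
                    "Unrecognized directory name: " + deltaDir + " / " + baseDir);
        }
        long deltaMinWriteId = Long.parseLong(d.group(1));
        long baseWriteId = Long.parseLong(b.group(1));
        return deltaMinWriteId <= baseWriteId;
    }

    public static void main(String[] args) {
        // Directory names taken from the "Found 2 equal splits" message.
        System.out.println(overlapsBase("delta_0000001_0000006_v0001058", "base_0000001"));
        // Name the compactor would be expected to write for the same write ID range.
        System.out.println(expectedDeltaDir(1, 6, 1058));
    }
}
```

With the names from the error message, the first call reports an overlap, and the second call yields a delta starting at write ID 2, matching the delta_(z+1)_y_v form given in the description.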



--
This message was sent by Atlassian Jira
(v8.3.4#803005)