Posted to issues@hive.apache.org by "Denys Kuzmenko (Jira)" <ji...@apache.org> on 2022/10/19 08:56:00 UTC

[jira] [Updated] (HIVE-25441) Incorrect deltas split for sub-compactions when using `hive.compactor.max.num.delta`

     [ https://issues.apache.org/jira/browse/HIVE-25441?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Denys Kuzmenko updated HIVE-25441:
----------------------------------
    Fix Version/s: 4.0.0-alpha-1

> Incorrect deltas split for sub-compactions when using `hive.compactor.max.num.delta`
> ------------------------------------------------------------------------------------
>
>                 Key: HIVE-25441
>                 URL: https://issues.apache.org/jira/browse/HIVE-25441
>             Project: Hive
>          Issue Type: Task
>            Reporter: Denys Kuzmenko
>            Assignee: Denys Kuzmenko
>            Priority: Major
>              Labels: pull-request-available
>             Fix For: 4.0.0-alpha-1
>
>          Time Spent: 2h
>  Remaining Estimate: 0h
>
> {code}
> #Repro steps:
> #1./ set hive.compactor.max.num.delta to 5 on HMS
> #2./ Set up the table
> set hive.merge.cardinality.check=false;
> create table test (k int);
> ALTER TABLE test SET TBLPROPERTIES ('NO_AUTO_COMPACTION'='true');
> insert into test values (1);
> alter table test compact 'major' and wait;
> dfs -ls '/warehouse/tablespace/managed/hive/test';
> # drwxrwx---+  - hive hive          0 2021-08-09 12:26 /warehouse/tablespace/managed/hive/test/base_0000008_v0000416
> select * from test;
> # k=1
> #run the MERGE below 3 times so there are enough delta dirs, i.e. 6 (each run should just increase k by 1)
> #each run removes the existing row and adds a new row with the value k+1
> MERGE INTO test AS T USING (select * from test union all select k+1 from test) AS S
> ON T.k=s.k
> WHEN MATCHED THEN DELETE
> WHEN not MATCHED THEN INSERT values (s.k);
> select * from test;
> #k=4
> dfs -ls '/warehouse/tablespace/managed/hive/test';
> #drwxrwx---+  - hive hive          0 2021-08-09 12:26 /warehouse/tablespace/managed/hive/test/base_0000008_v0000416
> #drwxrwx---+  - hive hive          0 2021-08-09 12:28 /warehouse/tablespace/managed/hive/test/delete_delta_0000009_0000009_0001
> #drwxrwx---+  - hive hive          0 2021-08-09 12:29 /warehouse/tablespace/managed/hive/test/delete_delta_0000010_0000010_0001
> #drwxrwx---+  - hive hive          0 2021-08-09 12:29 /warehouse/tablespace/managed/hive/test/delete_delta_0000011_0000011_0001
> #drwxrwx---+  - hive hive          0 2021-08-09 12:28 /warehouse/tablespace/managed/hive/test/delta_0000009_0000009_0003
> #drwxrwx---+  - hive hive          0 2021-08-09 12:29 /warehouse/tablespace/managed/hive/test/delta_0000010_0000010_0003
> #drwxrwx---+  - hive hive          0 2021-08-09 12:29 /warehouse/tablespace/managed/hive/test/delta_0000011_0000011_0003
> alter table test compact 'major' and wait;
> select * from test;
> #result is empty
> dfs -ls '/warehouse/tablespace/managed/hive/test';
> #drwxrwx---+  - hive hive          0 2021-08-09 12:31 /warehouse/tablespace/managed/hive/test/base_0000011_v0000428
> {code}
> Some logs from the above example: 
> {code}
> 2021-08-09 12:30:37,532 WARN  org.apache.hadoop.hive.ql.txn.compactor.CompactorMR: [nightly-7x-us-2-2.nightly-7x-us-2.root.hwx.site-49_executor]: 6 delta files found for default.test located at hdfs://nightly-7x-us-2-2.nightly-7x-us-2.root.hwx.site:8020/warehouse/tablespace/managed/hive/test! This is likely a sign of misconfiguration, especially if this message repeats.  Check that compaction is running properly.  Check for any runaway/mis-configured process writing to ACID tables, especially using Streaming Ingest API.
> 2021-08-09 12:30:37,533 INFO  org.apache.hadoop.hive.ql.txn.compactor.CompactorMR: [nightly-7x-us-2-2.nightly-7x-us-2.root.hwx.site-49_executor]: Submitting MINOR compaction job 'nightly-7x-us-2-2.nightly-7x-us-2.root.hwx.site-49-compactor-default.test_0' to default queue.  (current delta dirs count=5, obsolete delta dirs count=-1. TxnIdRange[9,11]
> 2021-08-09 12:30:38,003 INFO  org.apache.hadoop.hive.ql.txn.compactor.CompactorMR: [nightly-7x-us-2-2.nightly-7x-us-2.root.hwx.site-49_executor]: Submitted compaction job 'nightly-7x-us-2-2.nightly-7x-us-2.root.hwx.site-49-compactor-default.test_0' with jobID=job_1628497133224_0051 compaction ID=23
> #From the app logs of the minor compaction; note that delta_0000011_0000011_0003 is missing from the list
> 2021-08-09 12:30:47,399 INFO [main] org.apache.hadoop.mapred.MapTask: Processing split: CompactorInputSplit{base: null, bucket: 0, length: 3231, deltas: [delete_delta_0000009_0000009_0001, delta_0000009_0000009_0003, delete_delta_0000010_0000010_0001, delta_0000010_0000010_0003, delete_delta_0000011_0000011_0001]}
> 2021-08-09 12:30:53,061 INFO  org.apache.hadoop.hive.ql.txn.compactor.CompactorMR: [nightly-7x-us-2-2.nightly-7x-us-2.root.hwx.site-49_executor]: Submitting MAJOR compaction job 'nightly-7x-us-2-2.nightly-7x-us-2.root.hwx.site-49-compactor-default.test' to default queue.  (current delta dirs count=2, obsolete delta dirs count=6. TxnIdRange[9,11]
> 2021-08-09 12:30:53,501 INFO  org.apache.hadoop.hive.ql.txn.compactor.CompactorMR: [nightly-7x-us-2-2.nightly-7x-us-2.root.hwx.site-49_executor]: Submitted compaction job 'nightly-7x-us-2-2.nightly-7x-us-2.root.hwx.site-49-compactor-default.test' with jobID=job_1628497133224_0052 compaction ID=23
> 2021-08-09 12:31:03,493 INFO [main] org.apache.hadoop.mapred.MapTask: Processing split: CompactorInputSplit{base: hdfs://nightly-7x-us-2-2.nightly-7x-us-2.root.hwx.site:8020/warehouse/tablespace/managed/hive/test/base_0000008_v0000416, bucket: 0, length: 1697, deltas: [delete_delta_0000009_0000011_v0000428, delta_0000009_0000011_v0000428]}
> {code}
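
The split in the log above can be read as a fixed-size cut through the sorted delta list. The Python sketch below is an illustration only, not Hive's CompactorMR code. It assumes the directories are ordered by write ID with the delete_delta first (the order shown in the CompactorInputSplit line) and that the first sub-compaction simply takes the first hive.compactor.max.num.delta entries. The helper names (write_id_range, sort_key, split_write_ids) are hypothetical.

```python
import re

# Directory names copied from the dfs -ls listing in the repro steps.
DELTA_DIRS = [
    "delete_delta_0000009_0000009_0001",
    "delete_delta_0000010_0000010_0001",
    "delete_delta_0000011_0000011_0001",
    "delta_0000009_0000009_0003",
    "delta_0000010_0000010_0003",
    "delta_0000011_0000011_0003",
]

# Value of hive.compactor.max.num.delta used in the repro.
MAX_NUM_DELTA = 5


def write_id_range(name):
    """Return (min_write_id, max_write_id) parsed from a delta dir name."""
    match = re.match(r"(?:delete_)?delta_(\d+)_(\d+)", name)
    return int(match.group(1)), int(match.group(2))


def sort_key(name):
    """Assumed ordering: by write ID range, delete_delta before delta."""
    low, high = write_id_range(name)
    return (low, high, 0 if name.startswith("delete_") else 1)


def split_write_ids(batch, rest):
    """Write ID ranges that have directories on both sides of the cut."""
    in_batch = {write_id_range(name) for name in batch}
    in_rest = {write_id_range(name) for name in rest}
    return sorted(in_batch & in_rest)


ordered = sorted(DELTA_DIRS, key=sort_key)
first_batch = ordered[:MAX_NUM_DELTA]   # what the first sub-compaction reads
left_out = ordered[MAX_NUM_DELTA:]      # what it never sees

print("first batch:", first_batch)
print("left out:   ", left_out)
print("write IDs split across the cut:", split_write_ids(first_batch, left_out))
```

Under these assumptions the cut falls between the delete_delta and the delta of write ID 11, so the batch still spans the range [9,11] reported in the log while the insert for write ID 11 is not part of it. A cut that is aware of write ID boundaries would keep both directories of a write ID on the same side.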



--
This message was sent by Atlassian Jira
(v8.20.10#820010)