Posted to issues@hive.apache.org by "ASF GitHub Bot (Jira)" <ji...@apache.org> on 2023/01/03 10:03:00 UTC
[jira] [Work logged] (HIVE-26825) Compactor: Cleaner shouldn't fetch table details again and again for partitioned tables
[ https://issues.apache.org/jira/browse/HIVE-26825?focusedWorklogId=836574&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-836574 ]
ASF GitHub Bot logged work on HIVE-26825:
-----------------------------------------
Author: ASF GitHub Bot
Created on: 03/Jan/23 10:02
Start Date: 03/Jan/23 10:02
Worklog Time Spent: 10m
Work Description: veghlaci05 commented on code in PR #3864:
URL: https://github.com/apache/hive/pull/3864#discussion_r1060420473
##########
ql/src/java/org/apache/hadoop/hive/ql/txn/compactor/MetaStoreCompactorThread.java:
##########
@@ -133,4 +138,14 @@ protected static long updateCycleDurationMetric(String metric, long startedAt) {
}
return 0;
}
+ <T extends TBase<T,?>> T computeIfAbsent(Optional<Cache<String, TBase>> metaCache, String key, Callable<T> callable) throws Exception {
Review Comment:
Nit: new line between methods
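The diff shows only the signature of `computeIfAbsent`. Below is a minimal sketch of the pattern it appears to implement: look up a key in an optional cache, and call the loader only on a miss (or always, when no cache is configured). This is a sketch under stated assumptions, not the PR's code. A plain `Map` stands in for the Guava `Cache`, and a generic `V` stands in for Thrift's `TBase`. The class name `OptionalCacheSketch` is hypothetical.

```java
import java.util.Map;
import java.util.Optional;
import java.util.concurrent.Callable;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch: a Map stands in for the Guava Cache used in the PR,
// and the generic V stands in for the Thrift TBase type.
public class OptionalCacheSketch {

    // Returns the cached value for key, or calls the loader and caches its result.
    // With no cache configured, the loader runs on every call.
    // Not atomic: two concurrent callers may both run the loader on a miss.
    static <V> V computeIfAbsent(Optional<Map<String, V>> metaCache, String key,
                                 Callable<V> loader) throws Exception {
        if (!metaCache.isPresent()) {
            return loader.call();
        }
        Map<String, V> cache = metaCache.get();
        V cached = cache.get(key);
        if (cached != null) {
            return cached;
        }
        V loaded = loader.call();
        if (loaded != null) {          // ConcurrentHashMap rejects null values
            cache.put(key, loaded);
        }
        return loaded;
    }

    public static void main(String[] args) throws Exception {
        Map<String, String> backing = new ConcurrentHashMap<>();
        Optional<Map<String, String>> cache = Optional.of(backing);
        int[] loads = {0};
        Callable<String> loader = () -> { loads[0]++; return "table-details"; };
        computeIfAbsent(cache, "default.retry_test", loader);
        computeIfAbsent(cache, "default.retry_test", loader);
        System.out.println(loads[0]); // prints 1
    }
}
```

The explicit get/put is used instead of `Map.computeIfAbsent` because a `Callable` may throw a checked exception, which a `java.util.function.Function` cannot propagate. Guava's `Cache.get(key, Callable)` deals with the same problem by wrapping loader failures in an `ExecutionException`.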
##########
ql/src/test/org/apache/hadoop/hive/ql/txn/compactor/TestCleaner.java:
##########
@@ -1093,5 +1095,34 @@ public void testReady() throws Exception {
Assert.assertEquals(TxnStore.CLEANING_RESPONSE, rsp.getCompacts().get(0).getState());
}
+ @Test
+ public void testMetaCache() throws Exception {
+ conf.setBoolVar(HIVE_COMPACTOR_DELAYED_CLEANUP_ENABLED, false);
+
+ Table t = newTable("default", "retry_test", false);
+
+ addBaseFile(t, null, 20L, 20);
+ addDeltaFile(t, null, 21L, 22L, 2);
+ addDeltaFile(t, null, 23L, 24L, 2);
+ burnThroughTransactions("default", "retry_test", 25);
+
+ CompactionRequest rqst = new CompactionRequest("default", "retry_test", CompactionType.MAJOR);
+ long compactTxn = compactInTxn(rqst);
+ addBaseFile(t, null, 25L, 25, compactTxn);
+
+ //Prevent cleaner from marking the compaction as cleaned
+ TxnStore mockedHandler = spy(txnHandler);
+ doThrow(new RuntimeException()).when(mockedHandler).markCleaned(nullable(CompactionInfo.class));
+ Cleaner cleaner = Mockito.spy(new Cleaner());
+ cleaner.setConf(conf);
+ cleaner.init(new AtomicBoolean(true));
+ cleaner.run();
+
+ ShowCompactResponse rsp = txnHandler.showCompact(new ShowCompactRequest());
+ List<ShowCompactResponseElement> compacts = rsp.getCompacts();
+ Assert.assertEquals(1, compacts.size());
+ Mockito.verify(cleaner, times(1)).resolveTable(Mockito.any());
Review Comment:
For a single run without removing any files, `resolveTable` will be called only once anyway, so this test does not ensure that the cache is used. You should run cleaning for the **same table** twice and assert that `resolveTable` is called only once, while `computeIfAbsent` is called twice.
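The check suggested above can be illustrated without Mockito. Below is a minimal sketch in which plain counters stand in for `Mockito.verify(..., times(n))`. The `FakeCleaner`, `lookup`, and `clean` names are hypothetical, not Hive's classes. The point is that only a second pass over the same table separates "resolved once because it was cached" from "resolved once because it ran only once".

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for the Cleaner: counts how often the cache lookup and the
// expensive table resolution run, so a test can tell a cache hit from a single run.
public class CacheUsageCheck {

    static class FakeCleaner {
        final Map<String, String> metaCache = new HashMap<>();
        int lookupCalls = 0;   // plays the role of computeIfAbsent
        int resolveCalls = 0;  // plays the role of resolveTable

        String resolveTable(String fullTableName) {
            resolveCalls++;
            return "details of " + fullTableName;
        }

        String lookup(String fullTableName) {
            lookupCalls++;
            // resolveTable runs only when the table is not cached yet
            return metaCache.computeIfAbsent(fullTableName, this::resolveTable);
        }

        // One cleaning pass over a single compaction entry for the given table.
        void clean(String fullTableName) {
            lookup(fullTableName);
        }
    }

    public static void main(String[] args) {
        FakeCleaner cleaner = new FakeCleaner();
        // Clean the same table twice, as the review suggests.
        cleaner.clean("default.retry_test");
        cleaner.clean("default.retry_test");
        System.out.println(cleaner.lookupCalls + " " + cleaner.resolveCalls); // prints "2 1"
    }
}
```

After one pass both counters are 1 whether or not a cache exists. Only after the second pass does `resolveCalls == 1` together with `lookupCalls == 2` show that the cached value was reused.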
Issue Time Tracking
-------------------
Worklog Id: (was: 836574)
Time Spent: 40m (was: 0.5h)
> Compactor: Cleaner shouldn't fetch table details again and again for partitioned tables
> ---------------------------------------------------------------------------------------
>
> Key: HIVE-26825
> URL: https://issues.apache.org/jira/browse/HIVE-26825
> Project: Hive
> Issue Type: Improvement
> Components: Transactions
> Reporter: KIRTI RUGE
> Assignee: KIRTI RUGE
> Priority: Major
> Labels: pull-request-available
> Time Spent: 40m
> Remaining Estimate: 0h
>
> The Cleaner shouldn't fetch table/partition details again for each of a table's partitions. When there is a large number of databases/tables, it takes a lot of time for the Initiator to complete its initial iteration, and the load on the DB also increases.
--
This message was sent by Atlassian Jira
(v8.20.10#820010)