Posted to issues@hive.apache.org by "ASF GitHub Bot (Jira)" <ji...@apache.org> on 2023/01/03 10:03:00 UTC
[jira] [Work logged] (HIVE-26825) Compactor: Cleaner shouldn't fetch table details again and again for partitioned tables

     [ https://issues.apache.org/jira/browse/HIVE-26825?focusedWorklogId=836574&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-836574 ]

ASF GitHub Bot logged work on HIVE-26825:
-----------------------------------------

                Author: ASF GitHub Bot
            Created on: 03/Jan/23 10:02
            Start Date: 03/Jan/23 10:02
    Worklog Time Spent: 10m 
      Work Description: veghlaci05 commented on code in PR #3864:
URL: https://github.com/apache/hive/pull/3864#discussion_r1060420473


##########
ql/src/java/org/apache/hadoop/hive/ql/txn/compactor/MetaStoreCompactorThread.java:
##########
@@ -133,4 +138,14 @@ protected static long updateCycleDurationMetric(String metric, long startedAt) {
     }
     return 0;
   }
+  <T extends TBase<T,?>> T computeIfAbsent(Optional<Cache<String, TBase>> metaCache, String key, Callable<T> callable) throws Exception {

Review Comment:
   Nit: add a blank line between methods.



##########
ql/src/test/org/apache/hadoop/hive/ql/txn/compactor/TestCleaner.java:
##########
@@ -1093,5 +1095,34 @@ public void testReady() throws Exception {
     Assert.assertEquals(TxnStore.CLEANING_RESPONSE, rsp.getCompacts().get(0).getState());
   }
 
+  @Test
+  public void testMetaCache() throws Exception {
+    conf.setBoolVar(HIVE_COMPACTOR_DELAYED_CLEANUP_ENABLED, false);
+
+    Table t = newTable("default", "retry_test", false);
+
+    addBaseFile(t, null, 20L, 20);
+    addDeltaFile(t, null, 21L, 22L, 2);
+    addDeltaFile(t, null, 23L, 24L, 2);
+    burnThroughTransactions("default", "retry_test", 25);
+
+    CompactionRequest rqst = new CompactionRequest("default", "retry_test", CompactionType.MAJOR);
+    long compactTxn = compactInTxn(rqst);
+    addBaseFile(t, null, 25L, 25, compactTxn);
+
+    //Prevent cleaner from marking the compaction as cleaned
+    TxnStore mockedHandler = spy(txnHandler);
+    doThrow(new RuntimeException()).when(mockedHandler).markCleaned(nullable(CompactionInfo.class));
+    Cleaner cleaner = Mockito.spy(new Cleaner());
+    cleaner.setConf(conf);
+    cleaner.init(new AtomicBoolean(true));
+    cleaner.run();
+
+    ShowCompactResponse rsp = txnHandler.showCompact(new ShowCompactRequest());
+    List<ShowCompactResponseElement> compacts = rsp.getCompacts();
+    Assert.assertEquals(1, compacts.size());
+    Mockito.verify(cleaner, times(1)).resolveTable(Mockito.any());

Review Comment:
   For a single run that does not remove any files, `resolveTable` is called only once anyway, so this test does not verify that the cache is used. Run the cleaner for the **same table** twice and assert that `resolveTable` is called only once, while `computeIfAbsent` is called twice.
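
   As a rough illustration of the check described above, here is a minimal, JDK-only sketch. It is not the PR's code: the class `MetaCacheSketch`, the method `cleanOnce`, the two counters, and the `ConcurrentHashMap` standing in for the `Optional<Cache<String, TBase>>` are all assumptions made for the example. It only shows why two cleaning passes over the same table should produce two cache lookups but a single `resolveTable` call.

```java
import java.util.Map;
import java.util.concurrent.Callable;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical stand-in for the Cleaner; none of these names come from Hive.
public class MetaCacheSketch {
  // Assumption: a plain map replaces the Optional<Cache<String, TBase>> from the PR.
  private final Map<String, Object> metaCache = new ConcurrentHashMap<>();

  // Counters play the role of Mockito.verify(..., times(n)) in the real test.
  final AtomicInteger lookupCalls = new AtomicInteger();
  final AtomicInteger resolveCalls = new AtomicInteger();

  // Returns the cached value for key, invoking the loader only on a miss.
  @SuppressWarnings("unchecked")
  <T> T computeIfAbsent(String key, Callable<T> loader) throws Exception {
    lookupCalls.incrementAndGet();
    Object cached = metaCache.get(key);
    if (cached == null) {
      cached = loader.call();
      metaCache.put(key, cached);
    }
    return (T) cached;
  }

  // Simulates the expensive metastore round trip that the cache should avoid.
  String resolveTable(String dbName, String tableName) {
    resolveCalls.incrementAndGet();
    return dbName + "." + tableName;
  }

  // One simulated cleaning pass: look the table up through the cache.
  void cleanOnce(String dbName, String tableName) throws Exception {
    computeIfAbsent(dbName + "." + tableName, () -> resolveTable(dbName, tableName));
  }

  public static void main(String[] args) throws Exception {
    MetaCacheSketch cleaner = new MetaCacheSketch();
    cleaner.cleanOnce("default", "retry_test");
    cleaner.cleanOnce("default", "retry_test");
    // prints "lookups=2 resolves=1"
    System.out.println("lookups=" + cleaner.lookupCalls.get()
        + " resolves=" + cleaner.resolveCalls.get());
  }
}
```

   In the actual test the same shape would presumably be expressed with two `cleaner.run()` invocations against the same table, followed by `Mockito.verify(cleaner, times(1)).resolveTable(...)` and a `times(2)` verification on `computeIfAbsent`.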





Issue Time Tracking
-------------------

    Worklog Id:     (was: 836574)
    Time Spent: 40m  (was: 0.5h)

> Compactor: Cleaner shouldn't fetch table details again and again for partitioned tables
> ---------------------------------------------------------------------------------------
>
>                 Key: HIVE-26825
>                 URL: https://issues.apache.org/jira/browse/HIVE-26825
>             Project: Hive
>          Issue Type: Improvement
>          Components: Transactions
>            Reporter: KIRTI RUGE
>            Assignee: KIRTI RUGE
>            Priority: Major
>              Labels: pull-request-available
>          Time Spent: 40m
>  Remaining Estimate: 0h
>
> Cleaner shouldn't fetch table/partition details for all of its partitions. When there are a large number of databases/tables, it takes a lot of time for the Initiator to complete its initial iteration, and the load on the DB also goes up.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)