Posted to gitbox@hive.apache.org by GitBox <gi...@apache.org> on 2022/01/21 07:55:28 UTC

[GitHub] [hive] deniskuzZ commented on a change in pull request #2958: HIVE-25883: Enhance Compaction Cleaner to skip when there is nothing to do

deniskuzZ commented on a change in pull request #2958:
URL: https://github.com/apache/hive/pull/2958#discussion_r789180459



##########
File path: ql/src/java/org/apache/hadoop/hive/ql/txn/compactor/Cleaner.java
##########
@@ -410,6 +409,10 @@ private boolean removeFiles(String location, ValidWriteIdList writeIdList, Compa
       // Including obsolete directories for partitioned tables can result in data loss.
       obsoleteDirs = dir.getAbortedDirectories();
     }
+    if (obsoleteDirs.isEmpty() && !dir.hasDataBelowWatermark(writeIdList.getHighWatermark())) {

Review comment:
   That won't work, because dir.getCurrentDirectories() is limited by the HWM (high watermark).
   Try this test:
   ````
       Table t = newTable("default", "camtc", false);
       openTxn();
       
       addBaseFile(t, null, 19L, 20);
       addBaseFile(t, null, 20L, 20);
       addDeltaFile(t, null, 21L, 22L, 2);
       burnThroughTransactions("default", "camtc", 22);
   
       addDeltaFile(t, null, 24L, 25L, 2);
       burnThroughTransactions("default", "camtc", 3);
       
       CompactionRequest rqst = new CompactionRequest("default", "camtc", CompactionType.MAJOR);
       long compactTxn = compactInTxn(rqst);
       addBaseFile(t, null, 25L, 25, compactTxn);
   
       startCleaner();
   ````
   In the above test, both the obsolete and the current directories would be empty.
   
   You should list the whole directory instead:
   ````
   new FileGenerator(context, ()->dir.getFileSystem(conf), dir, useFileIds, ugi)
   ````
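
To make the concern concrete, here is a toy model of the two kinds of listing. It is not Hive code: `AcidDir`, `visibleUpTo`, `listAll` and `sampleLayout` are hypothetical names, the directory names are shortened for readability, and the high watermark of 0 is an assumed value chosen so that every directory sits above it. It only illustrates that an HWM-limited view can be empty while the table directory still contains files.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class WatermarkListingSketch {

    /** Hypothetical stand-in for an ACID directory, keyed by its highest write ID. */
    static final class AcidDir {
        final String name;
        final long maxWriteId;

        AcidDir(String name, long maxWriteId) {
            this.name = name;
            this.maxWriteId = maxWriteId;
        }
    }

    /** HWM-limited view: only directories at or below the high watermark are returned. */
    static List<String> visibleUpTo(List<AcidDir> all, long highWatermark) {
        List<String> visible = new ArrayList<>();
        for (AcidDir d : all) {
            if (d.maxWriteId <= highWatermark) {
                visible.add(d.name);
            }
        }
        return visible;
    }

    /** Whole-directory listing: every directory, regardless of the watermark. */
    static List<String> listAll(List<AcidDir> all) {
        List<String> names = new ArrayList<>();
        for (AcidDir d : all) {
            names.add(d.name);
        }
        return names;
    }

    /** Layout loosely mirroring the files created in the test above (names simplified). */
    static List<AcidDir> sampleLayout() {
        return Arrays.asList(
            new AcidDir("base_19", 19L),
            new AcidDir("base_20", 20L),
            new AcidDir("delta_21_22", 22L),
            new AcidDir("delta_24_25", 25L),
            new AcidDir("base_25", 25L));
    }

    public static void main(String[] args) {
        List<AcidDir> onDisk = sampleLayout();
        long highWatermark = 0L; // assumed: lower than every write ID on disk

        System.out.println("HWM-limited view: " + visibleUpTo(onDisk, highWatermark));
        System.out.println("Full listing:     " + listAll(onDisk));
    }
}
```

In this model, a check built only on the HWM-limited view would conclude there is nothing to clean, whereas the full listing still shows directories that a later cleaner run may need to handle.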




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: gitbox-unsubscribe@hive.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org
