Posted to gitbox@hive.apache.org by GitBox <gi...@apache.org> on 2021/03/31 07:21:09 UTC

[GitHub] [hive] klcopp commented on a change in pull request #2134: HIVE-24943: Initiator: Optimise when tables/partitions are not eligib…

klcopp commented on a change in pull request #2134:
URL: https://github.com/apache/hive/pull/2134#discussion_r604650698



##########
File path: ql/src/java/org/apache/hadoop/hive/ql/txn/compactor/Initiator.java
##########
@@ -444,29 +447,47 @@ private static boolean isDynPartIngest(Table t, CompactionInfo ci){
     return false;
   }
 
-  private boolean isEligibleForCompaction(CompactionInfo ci, ShowCompactResponse currentCompactions) {
-    LOG.info("Checking to see if we should compact " + ci.getFullPartitionName());
-
-    // Check if we already have initiated or are working on a compaction for this partition
-    // or table. If so, skip it. If we are just waiting on cleaning we can still check,
-    // as it may be time to compact again even though we haven't cleaned.
-    // todo: this is not robust. You can easily run `alter table` to start a compaction between
-    // the time currentCompactions is generated and now
-    if (lookForCurrentCompactions(currentCompactions, ci)) {
-      LOG.info("Found currently initiated or working compaction for " +
-          ci.getFullPartitionName() + " so we will not initiate another compaction");
-      return false;
-    }
-
+  private boolean isEligibleForCompaction(CompactionInfo ci,
+      ShowCompactResponse currentCompactions, Set<String> skipDBs, Set<String> skipTables) {
     try {
+      if (skipDBs.contains(ci.dbname)) {
+        LOG.debug("Skipping {}::{}, skipDBs:{}", ci.dbname, ci.tableName, skipDBs);

Review comment:
       Consider logging this at info level.

##########
File path: ql/src/java/org/apache/hadoop/hive/ql/txn/compactor/Initiator.java
##########
@@ -444,29 +447,47 @@ private static boolean isDynPartIngest(Table t, CompactionInfo ci){
     return false;
   }
 
-  private boolean isEligibleForCompaction(CompactionInfo ci, ShowCompactResponse currentCompactions) {
-    LOG.info("Checking to see if we should compact " + ci.getFullPartitionName());
-
-    // Check if we already have initiated or are working on a compaction for this partition
-    // or table. If so, skip it. If we are just waiting on cleaning we can still check,
-    // as it may be time to compact again even though we haven't cleaned.
-    // todo: this is not robust. You can easily run `alter table` to start a compaction between
-    // the time currentCompactions is generated and now
-    if (lookForCurrentCompactions(currentCompactions, ci)) {
-      LOG.info("Found currently initiated or working compaction for " +
-          ci.getFullPartitionName() + " so we will not initiate another compaction");
-      return false;
-    }
-
+  private boolean isEligibleForCompaction(CompactionInfo ci,
+      ShowCompactResponse currentCompactions, Set<String> skipDBs, Set<String> skipTables) {
     try {
+      if (skipDBs.contains(ci.dbname)) {
+        LOG.debug("Skipping {}::{}, skipDBs:{}", ci.dbname, ci.tableName, skipDBs);
+        return false;
+      } else {
+        if (replIsCompactionDisabledForDatabase(ci.dbname)) {
+          skipDBs.add(ci.dbname);
+          LOG.debug("Skipping {}::{}, skipDBs:{}", ci.dbname, ci.tableName, skipDBs);

Review comment:
       Especially consider making this one info-level. Also, it would be nice if the messages at lines 454 and 459 weren't identical, so the log shows whether the database was already in skipDBs or was just added to it.
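
       A minimal sketch of what distinct messages could look like. `SkipDbSketch`, `skipReason`, and the `compactionDisabled` predicate are hypothetical names standing in for the branch in `isEligibleForCompaction` and for `replIsCompactionDisabledForDatabase`; the message wording is only a suggestion. The method returns the message rather than logging it so that the example stays self-contained.

```java
import java.util.HashSet;
import java.util.Set;
import java.util.function.Predicate;

// Hypothetical illustration only; not the actual Initiator code.
final class SkipDbSketch {
  // Assumed stand-in for replIsCompactionDisabledForDatabase(dbname).
  private final Predicate<String> compactionDisabled;
  // Databases already known to be ineligible.
  private final Set<String> skipDBs = new HashSet<>();

  SkipDbSketch(Predicate<String> compactionDisabled) {
    this.compactionDisabled = compactionDisabled;
  }

  /**
   * Returns the message to log when db::table is skipped,
   * or null when the database is not skipped.
   */
  String skipReason(String dbName, String tableName) {
    if (skipDBs.contains(dbName)) {
      // Cache hit: the database was marked as skipped earlier.
      return String.format("Skipping %s::%s, database is already in skipDBs: %s",
          dbName, tableName, skipDBs);
    }
    if (compactionDisabled.test(dbName)) {
      // First time this database is seen as ineligible: remember it.
      skipDBs.add(dbName);
      return String.format(
          "Skipping %s::%s, compaction is disabled for this database; adding it to skipDBs",
          dbName, tableName);
    }
    return null;
  }
}
```

       With two different messages, the first skip of a database can be told apart from later cache hits when reading the log.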

##########
File path: ql/src/java/org/apache/hadoop/hive/ql/txn/compactor/Initiator.java
##########
@@ -123,11 +124,13 @@ public void run() {
 
           ShowCompactResponse currentCompactions = txnHandler.showCompact(new ShowCompactRequest());
 
+          Set<String> skipDBs = new HashSet<>();
+          Set<String> skipTables = new HashSet<>();

Review comment:
       IIUC, compaction should resume on DBs and tables after the first successful incremental load. How will these sets be updated to reflect that?



