Posted to gitbox@hive.apache.org by "akshat0395 (via GitHub)" <gi...@apache.org> on 2023/03/20 22:01:46 UTC

[GitHub] [hive] akshat0395 commented on a diff in pull request #4091: HIVE-27020: Implement a separate handler to handle aborted transaction cleanup

akshat0395 commented on code in PR #4091:
URL: https://github.com/apache/hive/pull/4091#discussion_r1142680898


##########
itests/hive-unit/src/test/java/org/apache/hadoop/hive/ql/txn/compactor/TestCompactor.java:
##########
@@ -76,10 +83,7 @@
 import static org.apache.hadoop.hive.ql.TestTxnCommands2.runWorker;
 import static org.junit.Assert.assertEquals;
 import static org.mockito.ArgumentMatchers.any;
-import static org.mockito.Mockito.doAnswer;
-import static org.mockito.Mockito.doThrow;
-import static org.mockito.Mockito.times;
-import static org.mockito.Mockito.verify;
+import static org.mockito.Mockito.*;

Review Comment:
   I see Mockito is already imported on line 67; we should avoid wildcard imports and instead import only the specific classes that are required.
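   A sketch of what that would look like (the exact additions depend on which Mockito methods the new test code uses, so `when` below is just an assumed example):
   ```java
   import static org.mockito.Mockito.doAnswer;
   import static org.mockito.Mockito.doThrow;
   import static org.mockito.Mockito.times;
   import static org.mockito.Mockito.verify;
   // plus only the specific new methods the added tests actually need, e.g. (assumed):
   import static org.mockito.Mockito.when;
   ```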



##########
itests/hive-unit/src/test/java/org/apache/hadoop/hive/ql/txn/compactor/TestCompactor.java:
##########
@@ -1064,7 +1068,80 @@ public void testCleanAbortCompactAfterAbort() throws Exception {
     connection2.close();
   }
 
+  @Test
+  public void testAbortAfterMarkCleaned() throws Exception {
+    boolean useCleanerForAbortCleanup = MetastoreConf.getBoolVar(conf, MetastoreConf.ConfVars.COMPACTOR_CLEAN_ABORTS_USING_CLEANER);

Review Comment:
   We can avoid using this config altogether for this test, as the only place I can see it being used is the `if` statement.
   Since the entire test logic depends on COMPACTOR_CLEAN_ABORTS_USING_CLEANER being true, we can simply call
   `assumeTrue(MetastoreConf.getBoolVar(conf, MetastoreConf.ConfVars.COMPACTOR_CLEAN_ABORTS_USING_CLEANER))` at the start and run the test case.
   The default value of this config is also true, hence I feel the `if` condition can be avoided. WDYT?
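   A minimal sketch of what I mean (assuming JUnit 4's `org.junit.Assume`, consistent with the `org.junit.Assert` import this class already uses):
   ```java
   import static org.junit.Assume.assumeTrue;

   @Test
   public void testAbortAfterMarkCleaned() throws Exception {
     // Skip the test entirely unless cleaner-based abort cleanup is enabled;
     // no local boolean or if-block is needed.
     assumeTrue(MetastoreConf.getBoolVar(conf,
         MetastoreConf.ConfVars.COMPACTOR_CLEAN_ABORTS_USING_CLEANER));
     // ... rest of the test body ...
   }
   ```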
   



##########
ql/src/java/org/apache/hadoop/hive/ql/txn/compactor/handler/AbortedTxnCleaner.java:
##########
@@ -0,0 +1,168 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.hadoop.hive.ql.txn.compactor.handler;
+
+import org.apache.hadoop.hive.common.ValidReaderWriteIdList;
+import org.apache.hadoop.hive.common.ValidTxnList;
+import org.apache.hadoop.hive.conf.HiveConf;
+import org.apache.hadoop.hive.metastore.api.MetaException;
+import org.apache.hadoop.hive.metastore.api.Partition;
+import org.apache.hadoop.hive.metastore.api.Table;
+import org.apache.hadoop.hive.metastore.metrics.MetricsConstants;
+import org.apache.hadoop.hive.metastore.metrics.PerfLogger;
+import org.apache.hadoop.hive.metastore.txn.AcidTxnInfo;
+import org.apache.hadoop.hive.metastore.txn.TxnStore;
+import org.apache.hadoop.hive.metastore.txn.TxnUtils;
+import org.apache.hadoop.hive.metastore.utils.MetaStoreUtils;
+import org.apache.hadoop.hive.ql.txn.compactor.CompactorUtil;
+import org.apache.hadoop.hive.ql.txn.compactor.CompactorUtil.ThrowingRunnable;
+import org.apache.hadoop.hive.ql.txn.compactor.FSRemover;
+import org.apache.hadoop.hive.ql.txn.compactor.MetadataCache;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+import java.util.Collections;
+import java.util.List;
+import java.util.concurrent.TimeUnit;
+import java.util.stream.Collectors;
+
+import static java.util.Objects.isNull;
+
+/**
+ * Abort-cleanup based implementation of TaskHandler.
+ * Provides the implementation for creating abort-cleanup tasks.
+ */
+class AbortedTxnCleaner extends AcidTxnCleaner {

Review Comment:
   Do we have a unit test for this class?
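   If not, it would be good to add one; a rough sketch of the kind of check I mean (Mockito and JUnit 4 assumed; since the class is package-private, the test would sit in the same handler package, and `conf`, `metadataCache`, `fsRemover` are assumed test fixtures):
   ```java
   // Hypothetical sketch: no cleanup tasks should be produced when the TxnStore
   // reports nothing ready for abort cleanup.
   @Test
   public void testNoTasksWhenNothingIsReadyToClean() throws Exception {
     TxnStore mockedTxnHandler = Mockito.mock(TxnStore.class);
     Mockito.when(mockedTxnHandler.findReadyToCleanForAborts(Mockito.anyLong(), Mockito.anyInt()))
         .thenReturn(Collections.emptyList());
     AbortedTxnCleaner cleaner =
         new AbortedTxnCleaner(conf, mockedTxnHandler, metadataCache, false, fsRemover);
     Assert.assertTrue(cleaner.getTasks().isEmpty());
   }
   ```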



##########
ql/src/java/org/apache/hadoop/hive/ql/txn/compactor/handler/AbortedTxnCleaner.java:
##########
@@ -0,0 +1,168 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.hadoop.hive.ql.txn.compactor.handler;
+
+import org.apache.hadoop.hive.common.ValidReaderWriteIdList;
+import org.apache.hadoop.hive.common.ValidTxnList;
+import org.apache.hadoop.hive.conf.HiveConf;
+import org.apache.hadoop.hive.metastore.api.MetaException;
+import org.apache.hadoop.hive.metastore.api.Partition;
+import org.apache.hadoop.hive.metastore.api.Table;
+import org.apache.hadoop.hive.metastore.metrics.MetricsConstants;
+import org.apache.hadoop.hive.metastore.metrics.PerfLogger;
+import org.apache.hadoop.hive.metastore.txn.AcidTxnInfo;
+import org.apache.hadoop.hive.metastore.txn.TxnStore;
+import org.apache.hadoop.hive.metastore.txn.TxnUtils;
+import org.apache.hadoop.hive.metastore.utils.MetaStoreUtils;
+import org.apache.hadoop.hive.ql.txn.compactor.CompactorUtil;
+import org.apache.hadoop.hive.ql.txn.compactor.CompactorUtil.ThrowingRunnable;
+import org.apache.hadoop.hive.ql.txn.compactor.FSRemover;
+import org.apache.hadoop.hive.ql.txn.compactor.MetadataCache;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+import java.util.Collections;
+import java.util.List;
+import java.util.concurrent.TimeUnit;
+import java.util.stream.Collectors;
+
+import static java.util.Objects.isNull;
+
+/**
+ * Abort-cleanup based implementation of TaskHandler.
+ * Provides the implementation for creating abort-cleanup tasks.
+ */
+class AbortedTxnCleaner extends AcidTxnCleaner {
+
+  private static final Logger LOG = LoggerFactory.getLogger(AbortedTxnCleaner.class.getName());
+
+  public AbortedTxnCleaner(HiveConf conf, TxnStore txnHandler,
+                           MetadataCache metadataCache, boolean metricsEnabled,
+                           FSRemover fsRemover) {
+    super(conf, txnHandler, metadataCache, metricsEnabled, fsRemover);
+  }
+
+  /**
+   The following cleanup is based on the following idea - <br>
+   1. Aborted cleanup is independent of compaction. This is because directories which are written by
+      aborted txns are not visible to any open txns. They are only visible while determining the AcidState (which
+      only sees the aborted deltas and does not read the file).<br><br>
+
+   The following algorithm is used to clean the set of aborted directories - <br>
+      a. Find the list of entries which are suitable for cleanup (This is done in {@link TxnStore#findReadyToCleanForAborts(long, int)}).<br>
+      b. If the table/partition does not exist, then remove the associated aborted entry in TXN_COMPONENTS table. <br>
+      c. Get the AcidState of the table by using the min open txnID, database name, tableName, partition name, highest write ID <br>
+      d. Fetch the aborted directories and delete the directories. <br>
+      e. Fetch the aborted write IDs from the AcidState and use it to delete the associated metadata in the TXN_COMPONENTS table.
+   **/
+  @Override
+  public List<Runnable> getTasks() throws MetaException {
+    int abortedThreshold = HiveConf.getIntVar(conf,
+              HiveConf.ConfVars.HIVE_COMPACTOR_ABORTEDTXN_THRESHOLD);
+    long abortedTimeThreshold = HiveConf
+              .getTimeVar(conf, HiveConf.ConfVars.HIVE_COMPACTOR_ABORTEDTXN_TIME_THRESHOLD,
+                      TimeUnit.MILLISECONDS);
+    List<AcidTxnInfo> readyToCleanAborts = txnHandler.findReadyToCleanForAborts(abortedTimeThreshold, abortedThreshold);
+
+    if (!readyToCleanAborts.isEmpty()) {
+      return readyToCleanAborts.stream().map(ci -> ThrowingRunnable.unchecked(() ->
+                      clean(ci, ci.txnId > 0 ? ci.txnId : Long.MAX_VALUE, metricsEnabled)))
+              .collect(Collectors.toList());
+    }
+    return Collections.emptyList();
+  }
+
+  private void clean(AcidTxnInfo info, long minOpenTxn, boolean metricsEnabled) throws MetaException {
+    LOG.info("Starting cleaning for {}", info);
+    PerfLogger perfLogger = PerfLogger.getPerfLogger(false);
+    String cleanerMetric = MetricsConstants.COMPACTION_CLEANER_CYCLE + "_";
+    try {
+      if (metricsEnabled) {
+        perfLogger.perfLogBegin(AbortedTxnCleaner.class.getName(), cleanerMetric);
+      }
+      Table t;
+      Partition p = null;
+      t = metadataCache.computeIfAbsent(info.getFullTableName(), () -> resolveTable(info.dbname, info.tableName));
+      if (isNull(t)) {
+        // The table was dropped before we got around to cleaning it.
+        LOG.info("Unable to find table {}, assuming it was dropped.", info.getFullTableName());
+        txnHandler.markCleanedForAborts(info);
+        return;
+      }
+      if (MetaStoreUtils.isNoCleanUpSet(t.getParameters())) {
+        // The table was marked no clean up true.
+        LOG.info("Skipping table {} clean up, as NO_CLEANUP set to true", info.getFullTableName());
+        return;
+      }
+      if (!isNull(info.partName)) {
+        p = resolvePartition(info.dbname, info.tableName, info.partName);
+        if (isNull(p)) {
+          // The partition was dropped before we got around to cleaning it.
+          LOG.info("Unable to find partition {}, assuming it was dropped.",
+                  info.getFullPartitionName());
+          txnHandler.markCleanedForAborts(info);
+          return;
+        }
+        if (MetaStoreUtils.isNoCleanUpSet(p.getParameters())) {
+          // The partition was marked no clean up true.
+          LOG.info("Skipping partition {} clean up, as NO_CLEANUP set to true", info.getFullPartitionName());
+          return;
+        }
+      }
+
+      String location = CompactorUtil.resolveStorageDescriptor(t, p).getLocation();
+      info.runAs = TxnUtils.findUserToRunAs(location, t, conf);
+      abortCleanUsingAcidDir(info, location, minOpenTxn);
+
+    } catch (Exception e) {
+      LOG.error("Caught exception when cleaning, unable to complete cleaning of {} due to {}", info,
+              e.getMessage());
+      throw new MetaException(e.getMessage());
+    } finally {
+      if (metricsEnabled) {
+        perfLogger.perfLogEnd(AbortedTxnCleaner.class.getName(), cleanerMetric);

Review Comment:
   Should we add a retry mechanism? Any thoughts, @SourabhBadhya @veghlaci05?
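   For instance, something along these lines (purely illustrative; the attempt count is a made-up parameter, and reusing an existing retry utility would be preferable if one is available):
   ```java
   // Hypothetical retry wrapper around the per-entry cleanup; parameters are illustrative.
   private static final int MAX_CLEAN_ATTEMPTS = 3;

   private void cleanWithRetry(AcidTxnInfo info, long minOpenTxn, boolean metricsEnabled)
       throws MetaException {
     MetaException lastFailure = null;
     for (int attempt = 1; attempt <= MAX_CLEAN_ATTEMPTS; attempt++) {
       try {
         clean(info, minOpenTxn, metricsEnabled);
         return;
       } catch (MetaException e) {
         lastFailure = e;
         LOG.warn("Cleanup attempt {}/{} failed for {}", attempt, MAX_CLEAN_ATTEMPTS, info, e);
       }
     }
     throw lastFailure;
   }
   ```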



##########
ql/src/java/org/apache/hadoop/hive/ql/txn/compactor/handler/AbortedTxnCleaner.java:
##########
@@ -0,0 +1,168 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.hadoop.hive.ql.txn.compactor.handler;
+
+import org.apache.hadoop.hive.common.ValidReaderWriteIdList;
+import org.apache.hadoop.hive.common.ValidTxnList;
+import org.apache.hadoop.hive.conf.HiveConf;
+import org.apache.hadoop.hive.metastore.api.MetaException;
+import org.apache.hadoop.hive.metastore.api.Partition;
+import org.apache.hadoop.hive.metastore.api.Table;
+import org.apache.hadoop.hive.metastore.metrics.MetricsConstants;
+import org.apache.hadoop.hive.metastore.metrics.PerfLogger;
+import org.apache.hadoop.hive.metastore.txn.AcidTxnInfo;
+import org.apache.hadoop.hive.metastore.txn.TxnStore;
+import org.apache.hadoop.hive.metastore.txn.TxnUtils;
+import org.apache.hadoop.hive.metastore.utils.MetaStoreUtils;
+import org.apache.hadoop.hive.ql.txn.compactor.CompactorUtil;
+import org.apache.hadoop.hive.ql.txn.compactor.CompactorUtil.ThrowingRunnable;
+import org.apache.hadoop.hive.ql.txn.compactor.FSRemover;
+import org.apache.hadoop.hive.ql.txn.compactor.MetadataCache;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+import java.util.Collections;
+import java.util.List;
+import java.util.concurrent.TimeUnit;
+import java.util.stream.Collectors;
+
+import static java.util.Objects.isNull;
+
+/**
+ * Abort-cleanup based implementation of TaskHandler.
+ * Provides the implementation for creating abort-cleanup tasks.
+ */
+class AbortedTxnCleaner extends AcidTxnCleaner {
+
+  private static final Logger LOG = LoggerFactory.getLogger(AbortedTxnCleaner.class.getName());
+
+  public AbortedTxnCleaner(HiveConf conf, TxnStore txnHandler,
+                           MetadataCache metadataCache, boolean metricsEnabled,
+                           FSRemover fsRemover) {
+    super(conf, txnHandler, metadataCache, metricsEnabled, fsRemover);
+  }
+
+  /**
+   The following cleanup is based on the following idea - <br>
+   1. Aborted cleanup is independent of compaction. This is because directories which are written by
+      aborted txns are not visible to any open txns. They are only visible while determining the AcidState (which
+      only sees the aborted deltas and does not read the file).<br><br>
+
+   The following algorithm is used to clean the set of aborted directories - <br>
+      a. Find the list of entries which are suitable for cleanup (This is done in {@link TxnStore#findReadyToCleanForAborts(long, int)}).<br>
+      b. If the table/partition does not exist, then remove the associated aborted entry in TXN_COMPONENTS table. <br>
+      c. Get the AcidState of the table by using the min open txnID, database name, tableName, partition name, highest write ID <br>
+      d. Fetch the aborted directories and delete the directories. <br>
+      e. Fetch the aborted write IDs from the AcidState and use it to delete the associated metadata in the TXN_COMPONENTS table.
+   **/
+  @Override
+  public List<Runnable> getTasks() throws MetaException {
+    int abortedThreshold = HiveConf.getIntVar(conf,
+              HiveConf.ConfVars.HIVE_COMPACTOR_ABORTEDTXN_THRESHOLD);
+    long abortedTimeThreshold = HiveConf
+              .getTimeVar(conf, HiveConf.ConfVars.HIVE_COMPACTOR_ABORTEDTXN_TIME_THRESHOLD,
+                      TimeUnit.MILLISECONDS);
+    List<AcidTxnInfo> readyToCleanAborts = txnHandler.findReadyToCleanForAborts(abortedTimeThreshold, abortedThreshold);
+
+    if (!readyToCleanAborts.isEmpty()) {
+      return readyToCleanAborts.stream().map(ci -> ThrowingRunnable.unchecked(() ->
+                      clean(ci, ci.txnId > 0 ? ci.txnId : Long.MAX_VALUE, metricsEnabled)))
+              .collect(Collectors.toList());
+    }
+    return Collections.emptyList();
+  }
+
+  private void clean(AcidTxnInfo info, long minOpenTxn, boolean metricsEnabled) throws MetaException {
+    LOG.info("Starting cleaning for {}", info);
+    PerfLogger perfLogger = PerfLogger.getPerfLogger(false);
+    String cleanerMetric = MetricsConstants.COMPACTION_CLEANER_CYCLE + "_";
+    try {
+      if (metricsEnabled) {
+        perfLogger.perfLogBegin(AbortedTxnCleaner.class.getName(), cleanerMetric);
+      }
+      Table t;
+      Partition p = null;
+      t = metadataCache.computeIfAbsent(info.getFullTableName(), () -> resolveTable(info.dbname, info.tableName));
+      if (isNull(t)) {
+        // The table was dropped before we got around to cleaning it.
+        LOG.info("Unable to find table {}, assuming it was dropped.", info.getFullTableName());
+        txnHandler.markCleanedForAborts(info);
+        return;
+      }
+      if (MetaStoreUtils.isNoCleanUpSet(t.getParameters())) {
+        // The table was marked no clean up true.
+        LOG.info("Skipping table {} clean up, as NO_CLEANUP set to true", info.getFullTableName());
+        return;
+      }
+      if (!isNull(info.partName)) {
+        p = resolvePartition(info.dbname, info.tableName, info.partName);
+        if (isNull(p)) {
+          // The partition was dropped before we got around to cleaning it.
+          LOG.info("Unable to find partition {}, assuming it was dropped.",
+                  info.getFullPartitionName());
+          txnHandler.markCleanedForAborts(info);
+          return;
+        }
+        if (MetaStoreUtils.isNoCleanUpSet(p.getParameters())) {
+          // The partition was marked no clean up true.
+          LOG.info("Skipping partition {} clean up, as NO_CLEANUP set to true", info.getFullPartitionName());
+          return;
+        }
+      }
+
+      String location = CompactorUtil.resolveStorageDescriptor(t, p).getLocation();
+      info.runAs = TxnUtils.findUserToRunAs(location, t, conf);
+      abortCleanUsingAcidDir(info, location, minOpenTxn);
+
+    } catch (Exception e) {

Review Comment:
   We should also catch more specific exceptions related to location resolution; this will help identify access- or location-related issues faster.
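   Something along these lines, for example (a sketch of the catch chain only; `IOException` is an assumption, and the exact types thrown by the location/run-as resolution would need checking):
   ```java
   } catch (IOException e) {
     // Assumed example: surface filesystem/permission problems around the table
     // location separately, so access issues are easier to spot in the logs.
     LOG.error("Failed to resolve location or run-as user for {}", info, e);
     throw new MetaException(e.getMessage());
   } catch (Exception e) {
     LOG.error("Caught exception when cleaning, unable to complete cleaning of {}", info, e);
     throw new MetaException(e.getMessage());
   }
   ```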



##########
standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/txn/TxnUtils.java:
##########
@@ -84,6 +84,28 @@ public static ValidTxnList createValidTxnListForCleaner(GetOpenTxnsResponse txns
     // string as they'd have to know which object to instantiate
     return new ValidReadTxnList(abortedTxns, bitSet, highWaterMark, Long.MAX_VALUE);
   }
+
+  public static ValidTxnList createValidTxnListForTxnAbortedCleaner(GetOpenTxnsResponse txns, long minOpenTxn) {
+    long highWaterMark = minOpenTxn - 1;

Review Comment:
   nit: "watermark" is a single word; we can rename this to `highWatermark`.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: gitbox-unsubscribe@hive.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

