You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@iceberg.apache.org by GitBox <gi...@apache.org> on 2022/12/19 14:02:03 UTC

[GitHub] [iceberg] pvary opened a new pull request, #6451: Hive: Lock hardening

pvary opened a new pull request, #6451:
URL: https://github.com/apache/iceberg/pull/6451

   The PR adds:
   
   - If the lock is not successful, then based on the agentInfo field we try to find the created lock
      - If we find it, then we use it
      - If we do not find it, then we retry the lock one more time
   - When unlocking, if we do not have a lockId, we still try to find the lock by the agentInfo field, and we try to remove the lock
      - If there is an interruption during the unlock, then we still try to remove the lock one more time


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] pvary commented on a diff in pull request #6451: Hive: Lock hardening

Posted by GitBox <gi...@apache.org>.
pvary commented on code in PR #6451:
URL: https://github.com/apache/iceberg/pull/6451#discussion_r1063348799


##########
hive-metastore/src/main/java/org/apache/iceberg/hive/HiveTableOperations.java:
##########
@@ -606,20 +630,13 @@ private StorageDescriptor storageDescriptor(TableMetadata metadata, boolean hive
     return storageDescriptor;
   }
 
-  @SuppressWarnings("ReverseDnsLookup")
   @VisibleForTesting
-  long acquireLock() throws UnknownHostException, TException, InterruptedException {
-    final LockComponent lockComponent =
-        new LockComponent(LockType.EXCLUSIVE, LockLevel.TABLE, database);
-    lockComponent.setTablename(tableName);
-    final LockRequest lockRequest =
-        new LockRequest(
-            Lists.newArrayList(lockComponent),
-            System.getProperty("user.name"),
-            InetAddress.getLocalHost().getHostName());
-    LockResponse lockResponse = metaClients.run(client -> client.lock(lockRequest));
-    AtomicReference<LockState> state = new AtomicReference<>(lockResponse.getState());
-    long lockId = lockResponse.getLockid();
+  long acquireLock(String agentInfo) throws UnknownHostException, TException, InterruptedException {
+    AtomicReference<Long> lockId = new AtomicReference<>();
+    AtomicReference<LockState> state = new AtomicReference<>();
+
+    // This will set the lockId and state if successful
+    tryLock(agentInfo, lockId, state);

Review Comment:
   Done



##########
hive-metastore/src/main/java/org/apache/iceberg/hive/HiveTableOperations.java:
##########
@@ -699,18 +717,53 @@ private void cleanupMetadataAndUnlock(
     } catch (RuntimeException e) {
       LOG.error("Failed to cleanup metadata file at {}", metadataLocation, e);
     } finally {
-      unlock(lockId);
+      unlock(lockId, agentInfo);
       tableLevelMutex.unlock();
     }
   }
 
-  private void unlock(Optional<Long> lockId) {
-    if (lockId.isPresent()) {
-      try {
-        doUnlock(lockId.get());
-      } catch (Exception e) {
-        LOG.warn("Failed to unlock {}.{}", database, tableName, e);
+  private void unlock(Optional<Long> lockId, String agentInfo) {
+    Long id = null;
+    try {
+      if (!lockId.isPresent()) {
+        // Try to find the lock based on agentInfo. Only works with Hive 2 or later.
+        if (MetastoreClientVersion.min(MetastoreClientVersion.HIVE_2)) {
+          AtomicReference<Long> foundLockId = new AtomicReference<>();
+          AtomicReference<LockState> state = new AtomicReference<>();
+          if (!findLock(agentInfo, foundLockId, state)) {

Review Comment:
   Done



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] singhpk234 commented on a diff in pull request #6451: Hive: Lock hardening

Posted by GitBox <gi...@apache.org>.
singhpk234 commented on code in PR #6451:
URL: https://github.com/apache/iceberg/pull/6451#discussion_r1053007788


##########
hive-metastore/src/main/java/org/apache/iceberg/hive/HiveTableOperations.java:
##########
@@ -759,6 +806,29 @@ private static boolean hiveEngineEnabled(TableMetadata metadata, Configuration c
         ConfigProperties.ENGINE_HIVE_ENABLED, TableProperties.ENGINE_HIVE_ENABLED_DEFAULT);
   }
 
+  private Pair<Long, LockState> findLock(String agentInfo) throws TException, InterruptedException {
+    // Search for the locks using HMSClient.showLocks identified by the agentInfo
+    ShowLocksRequest showLocksRequest = new ShowLocksRequest();
+    showLocksRequest.setDbname(database);
+    showLocksRequest.setDbname(tableName);

Review Comment:
   should this be :
   ```suggestion
       showLocksRequest.setTableName(tableName);
   ```



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] szehon-ho commented on a diff in pull request #6451: Hive: Lock hardening

Posted by GitBox <gi...@apache.org>.
szehon-ho commented on code in PR #6451:
URL: https://github.com/apache/iceberg/pull/6451#discussion_r1061999067


##########
hive-metastore/src/main/java/org/apache/iceberg/hive/MetastoreClientVersion.java:
##########
@@ -0,0 +1,65 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+package org.apache.iceberg.hive;
+
+import org.apache.hive.common.util.HiveVersionInfo;
+
+public enum MetastoreClientVersion {
+  HIVE_4(4),
+  HIVE_3(3),
+  HIVE_2(2),
+  HIVE_1_2(1),
+  NOT_SUPPORTED(0);
+
+  private final int order;
+  private static final MetastoreClientVersion current = calculate();
+
+  MetastoreClientVersion(int order) {
+    this.order = order;
+  }
+
+  public static MetastoreClientVersion current() {
+    return current;
+  }
+
+  public static boolean min(MetastoreClientVersion other) {
+    return current.order >= other.order;
+  }
+
+  private static MetastoreClientVersion calculate() {
+    String version = HiveVersionInfo.getShortVersion();
+    String[] versions = version.split("\\.");
+    switch (versions[0]) {
+      case "4":
+        return HIVE_4;
+      case "3":
+        return HIVE_3;
+      case "2":
+        return HIVE_2;
+      case "1":
+        if (versions[1].equals("2")) {
+          return HIVE_1_2;
+        } else {
+          return NOT_SUPPORTED;

Review Comment:
   Why are we explicitly not-supporting versions lesser than Hive 1.2 now?  I don't think there was any checks   previously to throw an exception for this case.  Anyway, we are protecting the agentInfo code already with a Hive 2 check



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] szehon-ho commented on a diff in pull request #6451: Hive: Lock hardening

Posted by GitBox <gi...@apache.org>.
szehon-ho commented on code in PR #6451:
URL: https://github.com/apache/iceberg/pull/6451#discussion_r1055950865


##########
hive-metastore/src/main/java/org/apache/iceberg/hive/HiveTableOperations.java:
##########
@@ -617,9 +624,27 @@ long acquireLock() throws UnknownHostException, TException, InterruptedException
             Lists.newArrayList(lockComponent),
             System.getProperty("user.name"),
             InetAddress.getLocalHost().getHostName());
-    LockResponse lockResponse = metaClients.run(client -> client.lock(lockRequest));
-    AtomicReference<LockState> state = new AtomicReference<>(lockResponse.getState());
-    long lockId = lockResponse.getLockid();
+
+    // Only works in Hive 2 or later.
+    if (MetastoreVersion.min(MetastoreVersion.HIVE_2)) {
+      lockRequest.setAgentInfo(agentInfo);
+    }
+
+    Pair<Long, LockState> lockResult;

Review Comment:
   I also think we could have number of retries configurable, (and optionally backoff) otherwise it doesnt match how everything else is configurable.



##########
hive-metastore/src/main/java/org/apache/iceberg/hive/HiveTableOperations.java:
##########
@@ -617,9 +624,27 @@ long acquireLock() throws UnknownHostException, TException, InterruptedException
             Lists.newArrayList(lockComponent),
             System.getProperty("user.name"),
             InetAddress.getLocalHost().getHostName());
-    LockResponse lockResponse = metaClients.run(client -> client.lock(lockRequest));
-    AtomicReference<LockState> state = new AtomicReference<>(lockResponse.getState());
-    long lockId = lockResponse.getLockid();
+
+    // Only works in Hive 2 or later.
+    if (MetastoreVersion.min(MetastoreVersion.HIVE_2)) {
+      lockRequest.setAgentInfo(agentInfo);
+    }
+
+    Pair<Long, LockState> lockResult;

Review Comment:
   Can we re-use the Tasks.retry().shouldRetryTest().run(...) like the rest of retries?
   
   That way we get backoff and configurable number of retries.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] hililiwei commented on a diff in pull request #6451: Hive: Lock hardening

Posted by GitBox <gi...@apache.org>.
hililiwei commented on code in PR #6451:
URL: https://github.com/apache/iceberg/pull/6451#discussion_r1056350192


##########
hive-metastore/src/main/java/org/apache/iceberg/hive/HiveTableOperations.java:
##########
@@ -617,9 +624,27 @@ long acquireLock() throws UnknownHostException, TException, InterruptedException
             Lists.newArrayList(lockComponent),
             System.getProperty("user.name"),
             InetAddress.getLocalHost().getHostName());
-    LockResponse lockResponse = metaClients.run(client -> client.lock(lockRequest));
-    AtomicReference<LockState> state = new AtomicReference<>(lockResponse.getState());
-    long lockId = lockResponse.getLockid();
+
+    // Only works in Hive 2 or later.
+    if (MetastoreVersion.min(MetastoreVersion.HIVE_2)) {
+      lockRequest.setAgentInfo(agentInfo);
+    }
+
+    Pair<Long, LockState> lockResult;
+    try {
+      lockResult = tryLock(lockRequest);
+    } catch (TException e) {
+      // Try to find the lock. Only works in Hive 2 or later.
+      if (MetastoreVersion.min(MetastoreVersion.HIVE_2)) {
+        lockResult = reTryLock(agentInfo, lockRequest);
+      } else {
+        LOG.info("Could not retry with HMSClient {}", MetastoreVersion.current(), e);

Review Comment:
   Should this be a higher level log, like `warn`?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] pvary commented on a diff in pull request #6451: Hive: Lock hardening

Posted by GitBox <gi...@apache.org>.
pvary commented on code in PR #6451:
URL: https://github.com/apache/iceberg/pull/6451#discussion_r1064208704


##########
docs/configuration.md:
##########
@@ -159,15 +159,24 @@ Here are the catalog properties related to locking. They are used by some catalo
 ## Hadoop configuration
 
 The following properties from the Hadoop configuration are used by the Hive Metastore connector.
-
-| Property                              | Default          | Description                                                                        |
-| ------------------------------------- | ---------------- | ---------------------------------------------------------------------------------- |
-| iceberg.hive.client-pool-size         | 5                | The size of the Hive client pool when tracking tables in HMS                       |
-| iceberg.hive.lock-timeout-ms          | 180000 (3 min)   | Maximum time in milliseconds to acquire a lock                                     |
-| iceberg.hive.lock-check-min-wait-ms   | 50               | Minimum time in milliseconds to check back on the status of lock acquisition       |
-| iceberg.hive.lock-check-max-wait-ms   | 5000             | Maximum time in milliseconds to check back on the status of lock acquisition       |
-
-Note: `iceberg.hive.lock-check-max-wait-ms` should be less than the [transaction timeout](https://cwiki.apache.org/confluence/display/Hive/Configuration+Properties#ConfigurationProperties-hive.txn.timeout) 
+The HMS table locking is a 2-step process:
+- Lock creation - the Lock itself is created and queued
+- Lock acquisition - the queued Lock should wait until every previously created Lock is released
+
+| Property                                  | Default         | Description                                                                  |
+|-------------------------------------------|-----------------|------------------------------------------------------------------------------|
+| iceberg.hive.client-pool-size             | 5               | The size of the Hive client pool when tracking tables in HMS                 |
+| iceberg.hive.lock-creation-timeout-ms     | 180000 (3 min)  | Maximum time in milliseconds to create a lock in the HMS                     |
+| iceberg.hive.lock-creation-min-wait-ms    | 50              | Minimum time in milliseconds to check the lock creation in the HMS           |
+| iceberg.hive.lock-creation-max-wait-ms    | 5000            | Maximum time in milliseconds to check the lock creation in the HMS           |
+| iceberg.hive.lock-timeout-ms              | 180000 (3 min)  | Maximum time in milliseconds to acquire a lock                               |
+| iceberg.hive.lock-check-min-wait-ms       | 50              | Minimum time in milliseconds to check back on the status of lock acquisition |
+| iceberg.hive.lock-check-max-wait-ms       | 5000            | Maximum time in milliseconds to check back on the status of lock acquisition |

Review Comment:
   Done with a somewhat reworded way - left out the "retry" from the description, and this one is an absolutely "normal" way of checking the status of the lock - no actual retry is involved



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] pvary commented on a diff in pull request #6451: Hive: Lock hardening

Posted by GitBox <gi...@apache.org>.
pvary commented on code in PR #6451:
URL: https://github.com/apache/iceberg/pull/6451#discussion_r1064208486


##########
docs/configuration.md:
##########
@@ -160,14 +160,20 @@ Here are the catalog properties related to locking. They are used by some catalo
 
 The following properties from the Hadoop configuration are used by the Hive Metastore connector.
 
-| Property                              | Default          | Description                                                                        |
-| ------------------------------------- | ---------------- | ---------------------------------------------------------------------------------- |
-| iceberg.hive.client-pool-size         | 5                | The size of the Hive client pool when tracking tables in HMS                       |
-| iceberg.hive.lock-timeout-ms          | 180000 (3 min)   | Maximum time in milliseconds to acquire a lock                                     |
-| iceberg.hive.lock-check-min-wait-ms   | 50               | Minimum time in milliseconds to check back on the status of lock acquisition       |
-| iceberg.hive.lock-check-max-wait-ms   | 5000             | Maximum time in milliseconds to check back on the status of lock acquisition       |
-
-Note: `iceberg.hive.lock-check-max-wait-ms` should be less than the [transaction timeout](https://cwiki.apache.org/confluence/display/Hive/Configuration+Properties#ConfigurationProperties-hive.txn.timeout) 
+| Property                                  | Default         | Description                                                                  |
+|-------------------------------------------|-----------------|------------------------------------------------------------------------------|
+| iceberg.hive.client-pool-size             | 5               | The size of the Hive client pool when tracking tables in HMS                 |
+| iceberg.hive.lock-timeout-ms              | 180000 (3 min)  | Maximum time in milliseconds to acquire a lock                               |

Review Comment:
   Done



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] szehon-ho commented on a diff in pull request #6451: Hive: Lock hardening

Posted by GitBox <gi...@apache.org>.
szehon-ho commented on code in PR #6451:
URL: https://github.com/apache/iceberg/pull/6451#discussion_r1063863413


##########
hive-metastore/src/main/java/org/apache/iceberg/hive/MetastoreClientVersion.java:
##########
@@ -0,0 +1,65 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+package org.apache.iceberg.hive;
+
+import org.apache.hive.common.util.HiveVersionInfo;
+
+public enum MetastoreClientVersion {
+  HIVE_4(4),
+  HIVE_3(3),
+  HIVE_2(2),
+  HIVE_1_2(1),
+  NOT_SUPPORTED(0);
+
+  private final int order;
+  private static final MetastoreClientVersion current = calculate();
+
+  MetastoreClientVersion(int order) {
+    this.order = order;
+  }
+
+  public static MetastoreClientVersion current() {
+    return current;
+  }
+
+  public static boolean min(MetastoreClientVersion other) {
+    return current.order >= other.order;
+  }
+
+  private static MetastoreClientVersion calculate() {
+    String version = HiveVersionInfo.getShortVersion();
+    String[] versions = version.split("\\.");
+    switch (versions[0]) {
+      case "4":
+        return HIVE_4;
+      case "3":
+        return HIVE_3;
+      case "2":
+        return HIVE_2;
+      case "1":
+        if (versions[1].equals("2")) {
+          return HIVE_1_2;
+        } else {
+          return NOT_SUPPORTED;

Review Comment:
   Yea makes sense, chatted offline, this is more for code completeness purpose as NOT_SUPPORTED is not used anywhere.  Somehow I mistakenly thought this was going throw an exception.  Im ok with this then.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] szehon-ho commented on a diff in pull request #6451: Hive: Lock hardening

Posted by GitBox <gi...@apache.org>.
szehon-ho commented on code in PR #6451:
URL: https://github.com/apache/iceberg/pull/6451#discussion_r1063985982


##########
docs/configuration.md:
##########
@@ -160,14 +160,20 @@ Here are the catalog properties related to locking. They are used by some catalo
 
 The following properties from the Hadoop configuration are used by the Hive Metastore connector.
 
-| Property                              | Default          | Description                                                                        |
-| ------------------------------------- | ---------------- | ---------------------------------------------------------------------------------- |
-| iceberg.hive.client-pool-size         | 5                | The size of the Hive client pool when tracking tables in HMS                       |
-| iceberg.hive.lock-timeout-ms          | 180000 (3 min)   | Maximum time in milliseconds to acquire a lock                                     |
-| iceberg.hive.lock-check-min-wait-ms   | 50               | Minimum time in milliseconds to check back on the status of lock acquisition       |
-| iceberg.hive.lock-check-max-wait-ms   | 5000             | Maximum time in milliseconds to check back on the status of lock acquisition       |
-
-Note: `iceberg.hive.lock-check-max-wait-ms` should be less than the [transaction timeout](https://cwiki.apache.org/confluence/display/Hive/Configuration+Properties#ConfigurationProperties-hive.txn.timeout) 
+| Property                                  | Default         | Description                                                                  |
+|-------------------------------------------|-----------------|------------------------------------------------------------------------------|
+| iceberg.hive.client-pool-size             | 5               | The size of the Hive client pool when tracking tables in HMS                 |
+| iceberg.hive.lock-timeout-ms              | 180000 (3 min)  | Maximum time in milliseconds to acquire a lock                               |

Review Comment:
   I see, just saw its from another pr and unrelated to this one.  But it still doesnt fit very well with the two phases (creation/acquisition) you put in the first part of the doc, should we just say simply:
   
   
   The HMS table locking is a 2-step process:
   1. Lock create: Create lock in HMS and queue for acquisition
   2. Lock check: Check if lock successfully acquired
   
   This way, they are more relevant to the property names (lock-create, lock-check)  Also wasnt sure about putting the details of 'should wait until every previously created Lock is released', is this the Hive internals?  As I feel it will just confuse more. 



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] szehon-ho commented on a diff in pull request #6451: Hive: Lock hardening

Posted by GitBox <gi...@apache.org>.
szehon-ho commented on code in PR #6451:
URL: https://github.com/apache/iceberg/pull/6451#discussion_r1063985982


##########
docs/configuration.md:
##########
@@ -160,14 +160,20 @@ Here are the catalog properties related to locking. They are used by some catalo
 
 The following properties from the Hadoop configuration are used by the Hive Metastore connector.
 
-| Property                              | Default          | Description                                                                        |
-| ------------------------------------- | ---------------- | ---------------------------------------------------------------------------------- |
-| iceberg.hive.client-pool-size         | 5                | The size of the Hive client pool when tracking tables in HMS                       |
-| iceberg.hive.lock-timeout-ms          | 180000 (3 min)   | Maximum time in milliseconds to acquire a lock                                     |
-| iceberg.hive.lock-check-min-wait-ms   | 50               | Minimum time in milliseconds to check back on the status of lock acquisition       |
-| iceberg.hive.lock-check-max-wait-ms   | 5000             | Maximum time in milliseconds to check back on the status of lock acquisition       |
-
-Note: `iceberg.hive.lock-check-max-wait-ms` should be less than the [transaction timeout](https://cwiki.apache.org/confluence/display/Hive/Configuration+Properties#ConfigurationProperties-hive.txn.timeout) 
+| Property                                  | Default         | Description                                                                  |
+|-------------------------------------------|-----------------|------------------------------------------------------------------------------|
+| iceberg.hive.client-pool-size             | 5               | The size of the Hive client pool when tracking tables in HMS                 |
+| iceberg.hive.lock-timeout-ms              | 180000 (3 min)  | Maximum time in milliseconds to acquire a lock                               |

Review Comment:
   I see, just saw its from another pr and unrelated to this one.  But it still doesnt fit very well with the two phases (creation/acquisition) you put in the first part of the doc, should we just say simply:
   
   
   The HMS table locking is a 2-step process:
   1. Lock create: Create lock in HMS and queue for acquisition
   2. Lock check: Check if lock successfully acquired
   
   This way, they are more relevant to the property name.  Also wasnt sure about putting the details of 'should wait until every previously created Lock is released', is this the Hive internals?  As I feel it will just confuse more. 



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] szehon-ho commented on a diff in pull request #6451: Hive: Lock hardening

Posted by GitBox <gi...@apache.org>.
szehon-ho commented on code in PR #6451:
URL: https://github.com/apache/iceberg/pull/6451#discussion_r1063897801


##########
docs/configuration.md:
##########
@@ -160,14 +160,20 @@ Here are the catalog properties related to locking. They are used by some catalo
 
 The following properties from the Hadoop configuration are used by the Hive Metastore connector.
 
-| Property                              | Default          | Description                                                                        |
-| ------------------------------------- | ---------------- | ---------------------------------------------------------------------------------- |
-| iceberg.hive.client-pool-size         | 5                | The size of the Hive client pool when tracking tables in HMS                       |
-| iceberg.hive.lock-timeout-ms          | 180000 (3 min)   | Maximum time in milliseconds to acquire a lock                                     |
-| iceberg.hive.lock-check-min-wait-ms   | 50               | Minimum time in milliseconds to check back on the status of lock acquisition       |
-| iceberg.hive.lock-check-max-wait-ms   | 5000             | Maximum time in milliseconds to check back on the status of lock acquisition       |
-
-Note: `iceberg.hive.lock-check-max-wait-ms` should be less than the [transaction timeout](https://cwiki.apache.org/confluence/display/Hive/Configuration+Properties#ConfigurationProperties-hive.txn.timeout) 
+| Property                                  | Default         | Description                                                                  |
+|-------------------------------------------|-----------------|------------------------------------------------------------------------------|
+| iceberg.hive.client-pool-size             | 5               | The size of the Hive client pool when tracking tables in HMS                 |
+| iceberg.hive.lock-timeout-ms              | 180000 (3 min)  | Maximum time in milliseconds to acquire a lock                               |

Review Comment:
   I must have missed the earlier discussion, but should 'iceberg.hive.lock-timeout-ms' be 
   'iceberg.hive.lock-check-timeout' as its used with iceberg.hive.lock-check-min-wait-ms and iceberg.hive.lock-check.max-wait-ms?
   
   I guess the two phases to me would be  :  
   1. Lock Creation request
   2. Lock Check (check for successful acquisition of lock)
   
   



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] pvary commented on pull request #6451: Hive: Lock hardening

Posted by GitBox <gi...@apache.org>.
pvary commented on PR #6451:
URL: https://github.com/apache/iceberg/pull/6451#issuecomment-1376783079

   @RussellSpitzer, @hililiwei, @stevenzwu or anyone else who is interested: If there are no more comments, I would like to merge this PR tomorrow.
   
   Thanks for all the reviews so far!


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] pvary commented on a diff in pull request #6451: Hive: Lock hardening

Posted by GitBox <gi...@apache.org>.
pvary commented on code in PR #6451:
URL: https://github.com/apache/iceberg/pull/6451#discussion_r1064208559


##########
docs/configuration.md:
##########
@@ -159,15 +159,24 @@ Here are the catalog properties related to locking. They are used by some catalo
 ## Hadoop configuration
 
 The following properties from the Hadoop configuration are used by the Hive Metastore connector.
-
-| Property                              | Default          | Description                                                                        |
-| ------------------------------------- | ---------------- | ---------------------------------------------------------------------------------- |
-| iceberg.hive.client-pool-size         | 5                | The size of the Hive client pool when tracking tables in HMS                       |
-| iceberg.hive.lock-timeout-ms          | 180000 (3 min)   | Maximum time in milliseconds to acquire a lock                                     |
-| iceberg.hive.lock-check-min-wait-ms   | 50               | Minimum time in milliseconds to check back on the status of lock acquisition       |
-| iceberg.hive.lock-check-max-wait-ms   | 5000             | Maximum time in milliseconds to check back on the status of lock acquisition       |
-
-Note: `iceberg.hive.lock-check-max-wait-ms` should be less than the [transaction timeout](https://cwiki.apache.org/confluence/display/Hive/Configuration+Properties#ConfigurationProperties-hive.txn.timeout) 
+The HMS table locking is a 2-step process:
+- Lock creation - the Lock itself is created and queued
+- Lock acquisition - the queued Lock should wait until every previously created Lock is released
+
+| Property                                  | Default         | Description                                                                  |
+|-------------------------------------------|-----------------|------------------------------------------------------------------------------|
+| iceberg.hive.client-pool-size             | 5               | The size of the Hive client pool when tracking tables in HMS                 |
+| iceberg.hive.lock-creation-timeout-ms     | 180000 (3 min)  | Maximum time in milliseconds to create a lock in the HMS                     |
+| iceberg.hive.lock-creation-min-wait-ms    | 50              | Minimum time in milliseconds to check the lock creation in the HMS           |

Review Comment:
   Done



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] pvary commented on pull request #6451: Hive: Lock hardening

Posted by GitBox <gi...@apache.org>.
pvary commented on PR #6451:
URL: https://github.com/apache/iceberg/pull/6451#issuecomment-1378304930

   Merged to master.
   Thanks for all the reviews!


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] szehon-ho commented on a diff in pull request #6451: Hive: Lock hardening

Posted by GitBox <gi...@apache.org>.
szehon-ho commented on code in PR #6451:
URL: https://github.com/apache/iceberg/pull/6451#discussion_r1062807599


##########
hive-metastore/src/main/java/org/apache/iceberg/hive/HiveTableOperations.java:
##########
@@ -606,20 +630,13 @@ private StorageDescriptor storageDescriptor(TableMetadata metadata, boolean hive
     return storageDescriptor;
   }
 
-  @SuppressWarnings("ReverseDnsLookup")
   @VisibleForTesting
-  long acquireLock() throws UnknownHostException, TException, InterruptedException {
-    final LockComponent lockComponent =
-        new LockComponent(LockType.EXCLUSIVE, LockLevel.TABLE, database);
-    lockComponent.setTablename(tableName);
-    final LockRequest lockRequest =
-        new LockRequest(
-            Lists.newArrayList(lockComponent),
-            System.getProperty("user.name"),
-            InetAddress.getLocalHost().getHostName());
-    LockResponse lockResponse = metaClients.run(client -> client.lock(lockRequest));
-    AtomicReference<LockState> state = new AtomicReference<>(lockResponse.getState());
-    long lockId = lockResponse.getLockid();
+  long acquireLock(String agentInfo) throws UnknownHostException, TException, InterruptedException {
+    AtomicReference<Long> lockId = new AtomicReference<>();
+    AtomicReference<LockState> state = new AtomicReference<>();
+
+    // This will set the lockId and state if successful
+    tryLock(agentInfo, lockId, state);

Review Comment:
   Thanks!



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] pvary commented on a diff in pull request #6451: Hive: Lock hardening

Posted by GitBox <gi...@apache.org>.
pvary commented on code in PR #6451:
URL: https://github.com/apache/iceberg/pull/6451#discussion_r1063371043


##########
hive-metastore/src/main/java/org/apache/iceberg/hive/MetastoreVersion.java:
##########
@@ -0,0 +1,58 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+package org.apache.iceberg.hive;
+
+public enum MetastoreVersion {
+  HIVE_4("org.apache.hadoop.hive.metastore.api.AlterTableResponse", 4),

Review Comment:
   Done



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] RussellSpitzer commented on a diff in pull request #6451: Hive: Lock hardening

Posted by GitBox <gi...@apache.org>.
RussellSpitzer commented on code in PR #6451:
URL: https://github.com/apache/iceberg/pull/6451#discussion_r1061839321


##########
docs/configuration.md:
##########
@@ -160,14 +160,20 @@ Here are the catalog properties related to locking. They are used by some catalo
 
 The following properties from the Hadoop configuration are used by the Hive Metastore connector.
 
-| Property                              | Default          | Description                                                                        |
-| ------------------------------------- | ---------------- | ---------------------------------------------------------------------------------- |
-| iceberg.hive.client-pool-size         | 5                | The size of the Hive client pool when tracking tables in HMS                       |
-| iceberg.hive.lock-timeout-ms          | 180000 (3 min)   | Maximum time in milliseconds to acquire a lock                                     |
-| iceberg.hive.lock-check-min-wait-ms   | 50               | Minimum time in milliseconds to check back on the status of lock acquisition       |
-| iceberg.hive.lock-check-max-wait-ms   | 5000             | Maximum time in milliseconds to check back on the status of lock acquisition       |
-
-Note: `iceberg.hive.lock-check-max-wait-ms` should be less than the [transaction timeout](https://cwiki.apache.org/confluence/display/Hive/Configuration+Properties#ConfigurationProperties-hive.txn.timeout) 
+| Property                                  | Default         | Description                                                                  |
+|-------------------------------------------|-----------------|------------------------------------------------------------------------------|
+| iceberg.hive.client-pool-size             | 5               | The size of the Hive client pool when tracking tables in HMS                 |
+| iceberg.hive.lock-timeout-ms              | 180000 (3 min)  | Maximum time in milliseconds to acquire a lock                               |

Review Comment:
   I'm a little confused on the difference between the lock timeout, lock check and lock request.
   
   They all seem to have very similar descriptions. Could you clarify which is which?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] pvary merged pull request #6451: Hive: Lock hardening

Posted by GitBox <gi...@apache.org>.
pvary merged PR #6451:
URL: https://github.com/apache/iceberg/pull/6451


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] pvary commented on a diff in pull request #6451: Hive: Lock hardening

Posted by GitBox <gi...@apache.org>.
pvary commented on code in PR #6451:
URL: https://github.com/apache/iceberg/pull/6451#discussion_r1056518850


##########
hive-metastore/src/main/java/org/apache/iceberg/hive/HiveTableOperations.java:
##########
@@ -617,9 +624,27 @@ long acquireLock() throws UnknownHostException, TException, InterruptedException
             Lists.newArrayList(lockComponent),
             System.getProperty("user.name"),
             InetAddress.getLocalHost().getHostName());
-    LockResponse lockResponse = metaClients.run(client -> client.lock(lockRequest));
-    AtomicReference<LockState> state = new AtomicReference<>(lockResponse.getState());
-    long lockId = lockResponse.getLockid();
+
+    // Only works in Hive 2 or later.
+    if (MetastoreVersion.min(MetastoreVersion.HIVE_2)) {
+      lockRequest.setAgentInfo(agentInfo);
+    }
+
+    Pair<Long, LockState> lockResult;
+    try {
+      lockResult = tryLock(lockRequest);
+    } catch (TException e) {
+      // Try to find the lock. Only works in Hive 2 or later.
+      if (MetastoreVersion.min(MetastoreVersion.HIVE_2)) {
+        lockResult = reTryLock(agentInfo, lockRequest);
+      } else {
+        LOG.info("Could not retry with HMSClient {}", MetastoreVersion.current(), e);

Review Comment:
   The Exception is rethrown, and most likely logged later.
   This log line is only here to highlight the  codepath used
   



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] pvary commented on a diff in pull request #6451: Hive: Lock hardening

Posted by GitBox <gi...@apache.org>.
pvary commented on code in PR #6451:
URL: https://github.com/apache/iceberg/pull/6451#discussion_r1056523305


##########
hive-metastore/src/main/java/org/apache/iceberg/hive/HiveTableOperations.java:
##########
@@ -617,9 +624,27 @@ long acquireLock() throws UnknownHostException, TException, InterruptedException
             Lists.newArrayList(lockComponent),
             System.getProperty("user.name"),
             InetAddress.getLocalHost().getHostName());
-    LockResponse lockResponse = metaClients.run(client -> client.lock(lockRequest));
-    AtomicReference<LockState> state = new AtomicReference<>(lockResponse.getState());
-    long lockId = lockResponse.getLockid();
+
+    // Only works in Hive 2 or later.
+    if (MetastoreVersion.min(MetastoreVersion.HIVE_2)) {
+      lockRequest.setAgentInfo(agentInfo);
+    }
+
+    Pair<Long, LockState> lockResult;

Review Comment:
   Let me check this out



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] RussellSpitzer commented on a diff in pull request #6451: Hive: Lock hardening

Posted by GitBox <gi...@apache.org>.
RussellSpitzer commented on code in PR #6451:
URL: https://github.com/apache/iceberg/pull/6451#discussion_r1061835671


##########
hive-metastore/src/main/java/org/apache/iceberg/hive/MetastoreVersion.java:
##########
@@ -0,0 +1,58 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+package org.apache.iceberg.hive;
+
+public enum MetastoreVersion {
+  HIVE_4("org.apache.hadoop.hive.metastore.api.AlterTableResponse", 4),

Review Comment:
   Does Hive not have a Version API? I think this is fine if this is our only approach it just feels brittle.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] szehon-ho commented on a diff in pull request #6451: Hive: Lock hardening

Posted by GitBox <gi...@apache.org>.
szehon-ho commented on code in PR #6451:
URL: https://github.com/apache/iceberg/pull/6451#discussion_r1063901869


##########
hive-metastore/src/main/java/org/apache/iceberg/hive/HiveTableOperations.java:
##########
@@ -263,7 +286,7 @@ protected void doCommit(TableMetadata base, TableMetadata metadata) {
     tableLevelMutex.lock();
     HiveLockHeartbeat hiveLockHeartbeat = null;
     try {
-      lockId = Optional.of(acquireLock());
+      lockId = Optional.ofNullable(acquireLock(agentInfo));

Review Comment:
   Here I get an Intellij warning because acquireLock returns long which is never null , should we keep Optional.of()?  



##########
hive-metastore/src/main/java/org/apache/iceberg/hive/HiveTableOperations.java:
##########
@@ -699,18 +717,53 @@ private void cleanupMetadataAndUnlock(
     } catch (RuntimeException e) {
       LOG.error("Failed to cleanup metadata file at {}", metadataLocation, e);
     } finally {
-      unlock(lockId);
+      unlock(lockId, agentInfo);
       tableLevelMutex.unlock();
     }
   }
 
-  private void unlock(Optional<Long> lockId) {
-    if (lockId.isPresent()) {
-      try {
-        doUnlock(lockId.get());
-      } catch (Exception e) {
-        LOG.warn("Failed to unlock {}.{}", database, tableName, e);
+  private void unlock(Optional<Long> lockId, String agentInfo) {
+    Long id = null;
+    try {
+      if (!lockId.isPresent()) {
+        // Try to find the lock based on agentInfo. Only works with Hive 2 or later.
+        if (MetastoreClientVersion.min(MetastoreClientVersion.HIVE_2)) {
+          AtomicReference<Long> foundLockId = new AtomicReference<>();
+          AtomicReference<LockState> state = new AtomicReference<>();
+          if (!findLock(agentInfo, foundLockId, state)) {
+            // No lock found
+            LOG.info("No lock found with {} agentInfo", agentInfo);
+            return;
+          }
+
+          id = foundLockId.get();
+        } else {
+          LOG.warn("Could not find lock with HMSClient {}", MetastoreClientVersion.current());
+          return;
+        }
+      } else {
+        id = lockId.get();
+      }
+
+      doUnlock(id);
+    } catch (InterruptedException ie) {
+      if (id != null) {

Review Comment:
   Thanks for the explanation, makes sense.  It does seem a bit risky for complex handling of interrupt by unlocking again, unless you are very sure thats what caused the issue (if the first call is interrupted because of timeout, I'm not sure what the chance is the unlock retry will succeed), what do you think?  Will leave it to you as you have seen the issue.  Also fyi @stevenzwu 



##########
docs/configuration.md:
##########
@@ -160,14 +160,20 @@ Here are the catalog properties related to locking. They are used by some catalo
 
 The following properties from the Hadoop configuration are used by the Hive Metastore connector.
 
-| Property                              | Default          | Description                                                                        |
-| ------------------------------------- | ---------------- | ---------------------------------------------------------------------------------- |
-| iceberg.hive.client-pool-size         | 5                | The size of the Hive client pool when tracking tables in HMS                       |
-| iceberg.hive.lock-timeout-ms          | 180000 (3 min)   | Maximum time in milliseconds to acquire a lock                                     |
-| iceberg.hive.lock-check-min-wait-ms   | 50               | Minimum time in milliseconds to check back on the status of lock acquisition       |
-| iceberg.hive.lock-check-max-wait-ms   | 5000             | Maximum time in milliseconds to check back on the status of lock acquisition       |
-
-Note: `iceberg.hive.lock-check-max-wait-ms` should be less than the [transaction timeout](https://cwiki.apache.org/confluence/display/Hive/Configuration+Properties#ConfigurationProperties-hive.txn.timeout) 
+| Property                                  | Default         | Description                                                                  |
+|-------------------------------------------|-----------------|------------------------------------------------------------------------------|
+| iceberg.hive.client-pool-size             | 5               | The size of the Hive client pool when tracking tables in HMS                 |
+| iceberg.hive.lock-timeout-ms              | 180000 (3 min)  | Maximum time in milliseconds to acquire a lock                               |

Review Comment:
   I must have missed the earlier discussion, but should 'iceberg.hive.lock-timeout-ms' be 
   'iceberg.hive.lock-check-timeout' as its used with iceberg.hive.lock-check-min-wait-ms.min-wait-ms and iceberg.hive.lock-check-min-wait-ms.max-wait-ms?
   
   I guess the two phases to me would be  :  
   1. Lock Creation request
   2. Lock Check (check for successful acquisition of lock)
   
   



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] pvary commented on a diff in pull request #6451: Hive: Lock hardening

Posted by GitBox <gi...@apache.org>.
pvary commented on code in PR #6451:
URL: https://github.com/apache/iceberg/pull/6451#discussion_r1063372178


##########
hive-metastore/src/main/java/org/apache/iceberg/hive/HiveTableOperations.java:
##########
@@ -759,6 +812,98 @@ private static boolean hiveEngineEnabled(TableMetadata metadata, Configuration c
         ConfigProperties.ENGINE_HIVE_ENABLED, TableProperties.ENGINE_HIVE_ENABLED_DEFAULT);
   }
 
+  @SuppressWarnings("ReverseDnsLookup")
+  private void tryLock(
+      String agentInfo, AtomicReference<Long> lockId, AtomicReference<LockState> state)
+      throws UnknownHostException, TException {
+    final LockComponent lockComponent =
+        new LockComponent(LockType.EXCLUSIVE, LockLevel.TABLE, database);
+    lockComponent.setTablename(tableName);
+    final LockRequest lockRequest =
+        new LockRequest(
+            Lists.newArrayList(lockComponent),
+            System.getProperty("user.name"),
+            InetAddress.getLocalHost().getHostName());
+
+    // Only works in Hive 2 or later.
+    if (MetastoreClientVersion.min(MetastoreClientVersion.HIVE_2)) {
+      lockRequest.setAgentInfo(agentInfo);
+    }
+
+    Tasks.foreach(lockRequest)
+        .retry(Integer.MAX_VALUE - 100)
+        .exponentialBackoff(
+            lockCreationMinWaitTime, lockCreationMaxWaitTime, lockCreationTimeout, 2.0)
+        .shouldRetryTest(
+            e ->
+                e instanceof TException
+                    && MetastoreClientVersion.min(MetastoreClientVersion.HIVE_2))
+        .throwFailureWhenFinished()
+        .run(
+            request -> {
+              try {
+                LockResponse lockResponse = metaClients.run(client -> client.lock(request));
+                lockId.set(lockResponse.getLockid());
+                state.set(lockResponse.getState());
+              } catch (TException te) {
+                LOG.warn("Failed to acquire lock {}", request, te);
+                try {
+                  // If we can not check for lock, or we do not find it, then rethrow the exception
+                  // Otherwise we are happy as the findLock sets the lockId and the state correctly
+                  if (!MetastoreClientVersion.min(MetastoreClientVersion.HIVE_2)
+                      || !findLock(agentInfo, lockId, state)) {
+                    throw te;
+                  } else {
+                    LOG.info(
+                        "Found lock by agentInfo {} with id {} and state {}",
+                        agentInfo,
+                        lockId,
+                        state);
+                  }
+                } catch (InterruptedException e) {
+                  Thread.currentThread().interrupt();
+                  LOG.warn(
+                      "Interrupted while checking for lock on table {}.{}", database, tableName, e);
+                  throw new RuntimeException("Interrupted while checking for lock", e);
+                }
+              } catch (InterruptedException e) {
+                Thread.currentThread().interrupt();
+                LOG.warn("Interrupted while acquiring lock on table {}.{}", database, tableName, e);
+                throw new RuntimeException("Interrupted while acquiring lock", e);
+              }
+            },
+            TException.class);
+  }
+
+  /**
+   * Search for the locks using HMSClient.showLocks identified by the agentInfo. If the lock is
+   * there, then the lockId and the state is set based on the result.
+   *
+   * @param agentInfo The key for searching the locks
+   * @param lockId The lockId if the lock has found
+   * @param state The state if the lock has found
+   * @return <code>true</code> if the the lock is there <code>false</code> if the lock is missing
+   */
+  private boolean findLock(
+      String agentInfo, AtomicReference<Long> lockId, AtomicReference<LockState> state)
+      throws TException, InterruptedException {
+    ShowLocksRequest showLocksRequest = new ShowLocksRequest();
+    showLocksRequest.setDbname(database);
+    showLocksRequest.setTablename(tableName);
+    ShowLocksResponse response = metaClients.run(client -> client.showLocks(showLocksRequest));
+    for (ShowLocksResponseElement lock : response.getLocks()) {
+      if (lock.getAgentInfo().equals(agentInfo)) {

Review Comment:
   Added the precondition check to the method



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] pvary commented on a diff in pull request #6451: Hive: Lock hardening

Posted by GitBox <gi...@apache.org>.
pvary commented on code in PR #6451:
URL: https://github.com/apache/iceberg/pull/6451#discussion_r1063348152


##########
docs/configuration.md:
##########
@@ -160,14 +160,20 @@ Here are the catalog properties related to locking. They are used by some catalo
 
 The following properties from the Hadoop configuration are used by the Hive Metastore connector.
 
-| Property                              | Default          | Description                                                                        |
-| ------------------------------------- | ---------------- | ---------------------------------------------------------------------------------- |
-| iceberg.hive.client-pool-size         | 5                | The size of the Hive client pool when tracking tables in HMS                       |
-| iceberg.hive.lock-timeout-ms          | 180000 (3 min)   | Maximum time in milliseconds to acquire a lock                                     |
-| iceberg.hive.lock-check-min-wait-ms   | 50               | Minimum time in milliseconds to check back on the status of lock acquisition       |
-| iceberg.hive.lock-check-max-wait-ms   | 5000             | Maximum time in milliseconds to check back on the status of lock acquisition       |
-
-Note: `iceberg.hive.lock-check-max-wait-ms` should be less than the [transaction timeout](https://cwiki.apache.org/confluence/display/Hive/Configuration+Properties#ConfigurationProperties-hive.txn.timeout) 
+| Property                                  | Default         | Description                                                                  |
+|-------------------------------------------|-----------------|------------------------------------------------------------------------------|
+| iceberg.hive.client-pool-size             | 5               | The size of the Hive client pool when tracking tables in HMS                 |
+| iceberg.hive.lock-timeout-ms              | 180000 (3 min)  | Maximum time in milliseconds to acquire a lock                               |

Review Comment:
   If you have even better suggestion, I am open to update, so feel free to comment 😄 



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] pvary commented on a diff in pull request #6451: Hive: Lock hardening

Posted by GitBox <gi...@apache.org>.
pvary commented on code in PR #6451:
URL: https://github.com/apache/iceberg/pull/6451#discussion_r1063970277


##########
docs/configuration.md:
##########
@@ -160,14 +160,20 @@ Here are the catalog properties related to locking. They are used by some catalo
 
 The following properties from the Hadoop configuration are used by the Hive Metastore connector.
 
-| Property                              | Default          | Description                                                                        |
-| ------------------------------------- | ---------------- | ---------------------------------------------------------------------------------- |
-| iceberg.hive.client-pool-size         | 5                | The size of the Hive client pool when tracking tables in HMS                       |
-| iceberg.hive.lock-timeout-ms          | 180000 (3 min)   | Maximum time in milliseconds to acquire a lock                                     |
-| iceberg.hive.lock-check-min-wait-ms   | 50               | Minimum time in milliseconds to check back on the status of lock acquisition       |
-| iceberg.hive.lock-check-max-wait-ms   | 5000             | Maximum time in milliseconds to check back on the status of lock acquisition       |
-
-Note: `iceberg.hive.lock-check-max-wait-ms` should be less than the [transaction timeout](https://cwiki.apache.org/confluence/display/Hive/Configuration+Properties#ConfigurationProperties-hive.txn.timeout) 
+| Property                                  | Default         | Description                                                                  |
+|-------------------------------------------|-----------------|------------------------------------------------------------------------------|
+| iceberg.hive.client-pool-size             | 5               | The size of the Hive client pool when tracking tables in HMS                 |
+| iceberg.hive.lock-timeout-ms              | 180000 (3 min)  | Maximum time in milliseconds to acquire a lock                               |

Review Comment:
   `iceberg.hive.lock-timeout-ms` is an already existing configuration. Is there an existing way to seamlessly deprecate configuration keys in favor of a new one? I did not want to break existing uses of this configuration key.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] pvary commented on a diff in pull request #6451: Hive: Lock hardening

Posted by GitBox <gi...@apache.org>.
pvary commented on code in PR #6451:
URL: https://github.com/apache/iceberg/pull/6451#discussion_r1063368432


##########
hive-metastore/src/main/java/org/apache/iceberg/hive/HiveTableOperations.java:
##########
@@ -699,18 +717,53 @@ private void cleanupMetadataAndUnlock(
     } catch (RuntimeException e) {
       LOG.error("Failed to cleanup metadata file at {}", metadataLocation, e);
     } finally {
-      unlock(lockId);
+      unlock(lockId, agentInfo);
       tableLevelMutex.unlock();
     }
   }
 
-  private void unlock(Optional<Long> lockId) {
-    if (lockId.isPresent()) {
-      try {
-        doUnlock(lockId.get());
-      } catch (Exception e) {
-        LOG.warn("Failed to unlock {}.{}", database, tableName, e);
+  private void unlock(Optional<Long> lockId, String agentInfo) {
+    Long id = null;
+    try {
+      if (!lockId.isPresent()) {
+        // Try to find the lock based on agentInfo. Only works with Hive 2 or later.
+        if (MetastoreClientVersion.min(MetastoreClientVersion.HIVE_2)) {
+          AtomicReference<Long> foundLockId = new AtomicReference<>();
+          AtomicReference<LockState> state = new AtomicReference<>();
+          if (!findLock(agentInfo, foundLockId, state)) {
+            // No lock found
+            LOG.info("No lock found with {} agentInfo", agentInfo);
+            return;
+          }
+
+          id = foundLockId.get();
+        } else {
+          LOG.warn("Could not find lock with HMSClient {}", MetastoreClientVersion.current());
+          return;
+        }
+      } else {
+        id = lockId.get();
+      }
+
+      doUnlock(id);
+    } catch (InterruptedException ie) {
+      if (id != null) {

Review Comment:
   If the `tryLock` is failed because of an `InterruptedException` it will throw a `RuntimeException` and does not set the `lockId`:
   ```
                 } catch (InterruptedException e) {
                   Thread.currentThread().interrupt();
                   LOG.warn("Interrupted while acquiring lock on table {}.{}", database, tableName, e);
                   throw new RuntimeException("Interrupted while acquiring lock", e);
                 }
   ```
   
   If the `lockId` is not set then we try to use `findLock` to find the lockId, and then `doUnlock` to unlock the table.
   
   We need to harden the HMS `lock` request specifically, because it is not handled by the `RetryingMetastoreClient` because of this annotation:
   ```
     @RetrySemantics.CannotRetry
     public LockResponse lock(LockRequest rqst) throws NoSuchTxnException, TxnAbortedException, MetaException {
   ```
   For more details see: https://github.com/apache/hive/blob/rel/release-2.3.9/metastore/src/java/org/apache/hadoop/hive/metastore/txn/TxnHandler.java#L855-L871
   
   Since both the `unlock`, and the `findLock` is retried, I think we should not do any extra work there, and expect that the `RetryingMetastoreClient` will fix our issues when possible



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] pvary commented on a diff in pull request #6451: Hive: Lock hardening

Posted by GitBox <gi...@apache.org>.
pvary commented on code in PR #6451:
URL: https://github.com/apache/iceberg/pull/6451#discussion_r1061857857


##########
hive-metastore/src/main/java/org/apache/iceberg/hive/MetastoreVersion.java:
##########
@@ -0,0 +1,58 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+package org.apache.iceberg.hive;
+
+public enum MetastoreVersion {
+  HIVE_4("org.apache.hadoop.hive.metastore.api.AlterTableResponse", 4),

Review Comment:
   I have known only about this one [one](https://issues.apache.org/jira/browse/HIVE-21484) starting from 2.4/3.2.
   
   After your pointer, I have found `HiveVersionInfo` which is available, so I was able to fix this



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] pvary commented on a diff in pull request #6451: Hive: Lock hardening

Posted by GitBox <gi...@apache.org>.
pvary commented on code in PR #6451:
URL: https://github.com/apache/iceberg/pull/6451#discussion_r1061865620


##########
docs/configuration.md:
##########
@@ -160,14 +160,20 @@ Here are the catalog properties related to locking. They are used by some catalo
 
 The following properties from the Hadoop configuration are used by the Hive Metastore connector.
 
-| Property                              | Default          | Description                                                                        |
-| ------------------------------------- | ---------------- | ---------------------------------------------------------------------------------- |
-| iceberg.hive.client-pool-size         | 5                | The size of the Hive client pool when tracking tables in HMS                       |
-| iceberg.hive.lock-timeout-ms          | 180000 (3 min)   | Maximum time in milliseconds to acquire a lock                                     |
-| iceberg.hive.lock-check-min-wait-ms   | 50               | Minimum time in milliseconds to check back on the status of lock acquisition       |
-| iceberg.hive.lock-check-max-wait-ms   | 5000             | Maximum time in milliseconds to check back on the status of lock acquisition       |
-
-Note: `iceberg.hive.lock-check-max-wait-ms` should be less than the [transaction timeout](https://cwiki.apache.org/confluence/display/Hive/Configuration+Properties#ConfigurationProperties-hive.txn.timeout) 
+| Property                                  | Default         | Description                                                                  |
+|-------------------------------------------|-----------------|------------------------------------------------------------------------------|
+| iceberg.hive.client-pool-size             | 5               | The size of the Hive client pool when tracking tables in HMS                 |
+| iceberg.hive.lock-timeout-ms              | 180000 (3 min)  | Maximum time in milliseconds to acquire a lock                               |

Review Comment:
   Tried to rename and add some description.
   Would this be better?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] pvary commented on a diff in pull request #6451: Hive: Lock hardening

Posted by GitBox <gi...@apache.org>.
pvary commented on code in PR #6451:
URL: https://github.com/apache/iceberg/pull/6451#discussion_r1062147705


##########
hive-metastore/src/main/java/org/apache/iceberg/hive/HiveTableOperations.java:
##########
@@ -606,20 +630,13 @@ private StorageDescriptor storageDescriptor(TableMetadata metadata, boolean hive
     return storageDescriptor;
   }
 
-  @SuppressWarnings("ReverseDnsLookup")
   @VisibleForTesting
-  long acquireLock() throws UnknownHostException, TException, InterruptedException {
-    final LockComponent lockComponent =
-        new LockComponent(LockType.EXCLUSIVE, LockLevel.TABLE, database);
-    lockComponent.setTablename(tableName);
-    final LockRequest lockRequest =
-        new LockRequest(
-            Lists.newArrayList(lockComponent),
-            System.getProperty("user.name"),
-            InetAddress.getLocalHost().getHostName());
-    LockResponse lockResponse = metaClients.run(client -> client.lock(lockRequest));
-    AtomicReference<LockState> state = new AtomicReference<>(lockResponse.getState());
-    long lockId = lockResponse.getLockid();
+  long acquireLock(String agentInfo) throws UnknownHostException, TException, InterruptedException {
+    AtomicReference<Long> lockId = new AtomicReference<>();
+    AtomicReference<LockState> state = new AtomicReference<>();
+
+    // This will set the lockId and state if successful
+    tryLock(agentInfo, lockId, state);

Review Comment:
   I started with this solution, but changed that later. Let me check again if I find the reason why I ended up using AtomicReferences again. Maybe this had to do something with the lambda expressions.
   I also prefer returning a LockInfo above using references 



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] pvary commented on a diff in pull request #6451: Hive: Lock hardening

Posted by GitBox <gi...@apache.org>.
pvary commented on code in PR #6451:
URL: https://github.com/apache/iceberg/pull/6451#discussion_r1062153218


##########
hive-metastore/src/main/java/org/apache/iceberg/hive/HiveTableOperations.java:
##########
@@ -699,18 +717,53 @@ private void cleanupMetadataAndUnlock(
     } catch (RuntimeException e) {
       LOG.error("Failed to cleanup metadata file at {}", metadataLocation, e);
     } finally {
-      unlock(lockId);
+      unlock(lockId, agentInfo);
       tableLevelMutex.unlock();
     }
   }
 
-  private void unlock(Optional<Long> lockId) {
-    if (lockId.isPresent()) {
-      try {
-        doUnlock(lockId.get());
-      } catch (Exception e) {
-        LOG.warn("Failed to unlock {}.{}", database, tableName, e);
+  private void unlock(Optional<Long> lockId, String agentInfo) {
+    Long id = null;
+    try {
+      if (!lockId.isPresent()) {
+        // Try to find the lock based on agentInfo. Only works with Hive 2 or later.
+        if (MetastoreClientVersion.min(MetastoreClientVersion.HIVE_2)) {
+          AtomicReference<Long> foundLockId = new AtomicReference<>();
+          AtomicReference<LockState> state = new AtomicReference<>();
+          if (!findLock(agentInfo, foundLockId, state)) {
+            // No lock found
+            LOG.info("No lock found with {} agentInfo", agentInfo);
+            return;
+          }
+
+          id = foundLockId.get();
+        } else {
+          LOG.warn("Could not find lock with HMSClient {}", MetastoreClientVersion.current());
+          return;
+        }
+      } else {
+        id = lockId.get();
+      }
+
+      doUnlock(id);
+    } catch (InterruptedException ie) {
+      if (id != null) {

Review Comment:
   TBH this was the original change which started my thinking and exploded into a general hardening 😄 
   
   In Flink we had the following situation:
   - Commited a change and tried to unlock the table
   - The first unlock failed because of a network issue
   - RetryingMetastoreClient started the sleep between retries
   - The job was interrupted
   - The lock was not removed
   
   So this change aims to try to unlock the table in any case, especially in the situation when there is an interrupt.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] hililiwei commented on a diff in pull request #6451: Hive: Lock hardening

Posted by GitBox <gi...@apache.org>.
hililiwei commented on code in PR #6451:
URL: https://github.com/apache/iceberg/pull/6451#discussion_r1056353089


##########
hive-metastore/src/main/java/org/apache/iceberg/hive/HiveTableOperations.java:
##########
@@ -759,6 +816,40 @@ private static boolean hiveEngineEnabled(TableMetadata metadata, Configuration c
         ConfigProperties.ENGINE_HIVE_ENABLED, TableProperties.ENGINE_HIVE_ENABLED_DEFAULT);
   }
 
+  private Pair<Long, LockState> findLock(String agentInfo) throws TException, InterruptedException {
+    // Search for the locks using HMSClient.showLocks identified by the agentInfo
+    ShowLocksRequest showLocksRequest = new ShowLocksRequest();
+    showLocksRequest.setDbname(database);
+    showLocksRequest.setTablename(tableName);
+    ShowLocksResponse response = metaClients.run(client -> client.showLocks(showLocksRequest));
+    for (ShowLocksResponseElement lock : response.getLocks()) {

Review Comment:
   Not familiar with this. But excuse, is it possible to have multiple agentInfo locked in it?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] pvary commented on pull request #6451: Hive: Lock hardening

Posted by GitBox <gi...@apache.org>.
pvary commented on PR #6451:
URL: https://github.com/apache/iceberg/pull/6451#issuecomment-1359346478

   I am not sure who is on PTO, or who is still working, so I tagged @RussellSpitzer, @szehon-ho and @stevenzwu who might be interested in this patch 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] pvary commented on a diff in pull request #6451: Hive: Lock hardening

Posted by GitBox <gi...@apache.org>.
pvary commented on code in PR #6451:
URL: https://github.com/apache/iceberg/pull/6451#discussion_r1061912802


##########
hive-metastore/src/main/java/org/apache/iceberg/hive/HiveTableOperations.java:
##########
@@ -617,9 +624,27 @@ long acquireLock() throws UnknownHostException, TException, InterruptedException
             Lists.newArrayList(lockComponent),
             System.getProperty("user.name"),
             InetAddress.getLocalHost().getHostName());
-    LockResponse lockResponse = metaClients.run(client -> client.lock(lockRequest));
-    AtomicReference<LockState> state = new AtomicReference<>(lockResponse.getState());
-    long lockId = lockResponse.getLockid();
+
+    // Only works in Hive 2 or later.
+    if (MetastoreVersion.min(MetastoreVersion.HIVE_2)) {
+      lockRequest.setAgentInfo(agentInfo);
+    }
+
+    Pair<Long, LockState> lockResult;

Review Comment:
   Thanks for the idea!
   Done as suggested.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] pvary commented on a diff in pull request #6451: Hive: Lock hardening

Posted by GitBox <gi...@apache.org>.
pvary commented on code in PR #6451:
URL: https://github.com/apache/iceberg/pull/6451#discussion_r1062145905


##########
hive-metastore/src/main/java/org/apache/iceberg/hive/HiveTableOperations.java:
##########
@@ -759,6 +812,98 @@ private static boolean hiveEngineEnabled(TableMetadata metadata, Configuration c
         ConfigProperties.ENGINE_HIVE_ENABLED, TableProperties.ENGINE_HIVE_ENABLED_DEFAULT);
   }
 
+  @SuppressWarnings("ReverseDnsLookup")
+  private void tryLock(
+      String agentInfo, AtomicReference<Long> lockId, AtomicReference<LockState> state)
+      throws UnknownHostException, TException {
+    final LockComponent lockComponent =
+        new LockComponent(LockType.EXCLUSIVE, LockLevel.TABLE, database);
+    lockComponent.setTablename(tableName);
+    final LockRequest lockRequest =
+        new LockRequest(
+            Lists.newArrayList(lockComponent),
+            System.getProperty("user.name"),
+            InetAddress.getLocalHost().getHostName());
+
+    // Only works in Hive 2 or later.
+    if (MetastoreClientVersion.min(MetastoreClientVersion.HIVE_2)) {
+      lockRequest.setAgentInfo(agentInfo);
+    }
+
+    Tasks.foreach(lockRequest)
+        .retry(Integer.MAX_VALUE - 100)
+        .exponentialBackoff(
+            lockCreationMinWaitTime, lockCreationMaxWaitTime, lockCreationTimeout, 2.0)
+        .shouldRetryTest(
+            e ->
+                e instanceof TException
+                    && MetastoreClientVersion.min(MetastoreClientVersion.HIVE_2))
+        .throwFailureWhenFinished()
+        .run(
+            request -> {
+              try {
+                LockResponse lockResponse = metaClients.run(client -> client.lock(request));
+                lockId.set(lockResponse.getLockid());
+                state.set(lockResponse.getState());
+              } catch (TException te) {
+                LOG.warn("Failed to acquire lock {}", request, te);
+                try {
+                  // If we can not check for lock, or we do not find it, then rethrow the exception
+                  // Otherwise we are happy as the findLock sets the lockId and the state correctly
+                  if (!MetastoreClientVersion.min(MetastoreClientVersion.HIVE_2)
+                      || !findLock(agentInfo, lockId, state)) {
+                    throw te;
+                  } else {
+                    LOG.info(
+                        "Found lock by agentInfo {} with id {} and state {}",
+                        agentInfo,
+                        lockId,
+                        state);
+                  }
+                } catch (InterruptedException e) {
+                  Thread.currentThread().interrupt();
+                  LOG.warn(
+                      "Interrupted while checking for lock on table {}.{}", database, tableName, e);
+                  throw new RuntimeException("Interrupted while checking for lock", e);
+                }
+              } catch (InterruptedException e) {
+                Thread.currentThread().interrupt();
+                LOG.warn("Interrupted while acquiring lock on table {}.{}", database, tableName, e);
+                throw new RuntimeException("Interrupted while acquiring lock", e);
+              }
+            },
+            TException.class);
+  }
+
+  /**
+   * Search for the locks using HMSClient.showLocks identified by the agentInfo. If the lock is
+   * there, then the lockId and the state is set based on the result.
+   *
+   * @param agentInfo The key for searching the locks
+   * @param lockId The lockId if the lock has found
+   * @param state The state if the lock has found
+   * @return <code>true</code> if the the lock is there <code>false</code> if the lock is missing
+   */
+  private boolean findLock(
+      String agentInfo, AtomicReference<Long> lockId, AtomicReference<LockState> state)
+      throws TException, InterruptedException {
+    ShowLocksRequest showLocksRequest = new ShowLocksRequest();
+    showLocksRequest.setDbname(database);
+    showLocksRequest.setTablename(tableName);
+    ShowLocksResponse response = metaClients.run(client -> client.showLocks(showLocksRequest));
+    for (ShowLocksResponseElement lock : response.getLocks()) {
+      if (lock.getAgentInfo().equals(agentInfo)) {

Review Comment:
   Hive 1 has ShowLock, but with a different signature (no agentInfo), which is not enough for our purposes. Spark has Hive 1.2 embedded  so the feature is not working there.
   
   Also in Hive 1.2 we can not filter the results of the ShowLocks in advance, we have to fetch all the locks, and filter them on the client side, which is suboptimal for large deployments.
   
   I did not find a suitable solution in Hive 1.2, so I disabled it for now, and opted in for using ShowLocks only with newer HMSClients.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] pvary commented on a diff in pull request #6451: Hive: Lock hardening

Posted by GitBox <gi...@apache.org>.
pvary commented on code in PR #6451:
URL: https://github.com/apache/iceberg/pull/6451#discussion_r1063970352


##########
hive-metastore/src/main/java/org/apache/iceberg/hive/MetastoreUtil.java:
##########
@@ -73,8 +73,12 @@ public static void alterTable(
   }
 
   private static boolean detectHive3() {
+    return findClass(HIVE3_UNIQUE_CLASS);

Review Comment:
   Good call. Removed.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] szehon-ho commented on a diff in pull request #6451: Hive: Lock hardening

Posted by GitBox <gi...@apache.org>.
szehon-ho commented on code in PR #6451:
URL: https://github.com/apache/iceberg/pull/6451#discussion_r1063985982


##########
docs/configuration.md:
##########
@@ -160,14 +160,20 @@ Here are the catalog properties related to locking. They are used by some catalo
 
 The following properties from the Hadoop configuration are used by the Hive Metastore connector.
 
-| Property                              | Default          | Description                                                                        |
-| ------------------------------------- | ---------------- | ---------------------------------------------------------------------------------- |
-| iceberg.hive.client-pool-size         | 5                | The size of the Hive client pool when tracking tables in HMS                       |
-| iceberg.hive.lock-timeout-ms          | 180000 (3 min)   | Maximum time in milliseconds to acquire a lock                                     |
-| iceberg.hive.lock-check-min-wait-ms   | 50               | Minimum time in milliseconds to check back on the status of lock acquisition       |
-| iceberg.hive.lock-check-max-wait-ms   | 5000             | Maximum time in milliseconds to check back on the status of lock acquisition       |
-
-Note: `iceberg.hive.lock-check-max-wait-ms` should be less than the [transaction timeout](https://cwiki.apache.org/confluence/display/Hive/Configuration+Properties#ConfigurationProperties-hive.txn.timeout) 
+| Property                                  | Default         | Description                                                                  |
+|-------------------------------------------|-----------------|------------------------------------------------------------------------------|
+| iceberg.hive.client-pool-size             | 5               | The size of the Hive client pool when tracking tables in HMS                 |
+| iceberg.hive.lock-timeout-ms              | 180000 (3 min)  | Maximum time in milliseconds to acquire a lock                               |

Review Comment:
   I see, just saw its from another pr and unrelated to this one.  But it still doesnt fit very well with the two phases (creation/acquisition) you put in the first part of the doc, should we just say simply:
   
   ```
   The HMS table locking is a 2-step process:
   1. Lock Creation: Create lock in HMS and queue for acquisition
   2. Lock Check: Check if lock successfully acquired
   ```
   
   This way, they are more relevant to the property names (lock-create, lock-check)  Also wasnt sure about putting the details of 'should wait until every previously created Lock is released', is this the Hive internals?  As I feel it will just confuse more. 



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] szehon-ho commented on a diff in pull request #6451: Hive: Lock hardening

Posted by GitBox <gi...@apache.org>.
szehon-ho commented on code in PR #6451:
URL: https://github.com/apache/iceberg/pull/6451#discussion_r1063897801


##########
docs/configuration.md:
##########
@@ -160,14 +160,20 @@ Here are the catalog properties related to locking. They are used by some catalo
 
 The following properties from the Hadoop configuration are used by the Hive Metastore connector.
 
-| Property                              | Default          | Description                                                                        |
-| ------------------------------------- | ---------------- | ---------------------------------------------------------------------------------- |
-| iceberg.hive.client-pool-size         | 5                | The size of the Hive client pool when tracking tables in HMS                       |
-| iceberg.hive.lock-timeout-ms          | 180000 (3 min)   | Maximum time in milliseconds to acquire a lock                                     |
-| iceberg.hive.lock-check-min-wait-ms   | 50               | Minimum time in milliseconds to check back on the status of lock acquisition       |
-| iceberg.hive.lock-check-max-wait-ms   | 5000             | Maximum time in milliseconds to check back on the status of lock acquisition       |
-
-Note: `iceberg.hive.lock-check-max-wait-ms` should be less than the [transaction timeout](https://cwiki.apache.org/confluence/display/Hive/Configuration+Properties#ConfigurationProperties-hive.txn.timeout) 
+| Property                                  | Default         | Description                                                                  |
+|-------------------------------------------|-----------------|------------------------------------------------------------------------------|
+| iceberg.hive.client-pool-size             | 5               | The size of the Hive client pool when tracking tables in HMS                 |
+| iceberg.hive.lock-timeout-ms              | 180000 (3 min)  | Maximum time in milliseconds to acquire a lock                               |

Review Comment:
   I must have missed the earlier discussion, but should 'iceberg.hive.lock-timeout-ms' be 
   'iceberg.hive.lock-check-timeout' as its used with iceberg.hive.lock-check-min-wait-ms and iceberg.hive.lock-check.max-wait-ms?
   
   I guess the two phases to me should be (to go with the property names)  :  
   1. Lock Creation request
   2. Lock Check (check for successful acquisition of lock)
   
   



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] szehon-ho commented on a diff in pull request #6451: Hive: Lock hardening

Posted by GitBox <gi...@apache.org>.
szehon-ho commented on code in PR #6451:
URL: https://github.com/apache/iceberg/pull/6451#discussion_r1063985982


##########
docs/configuration.md:
##########
@@ -160,14 +160,20 @@ Here are the catalog properties related to locking. They are used by some catalo
 
 The following properties from the Hadoop configuration are used by the Hive Metastore connector.
 
-| Property                              | Default          | Description                                                                        |
-| ------------------------------------- | ---------------- | ---------------------------------------------------------------------------------- |
-| iceberg.hive.client-pool-size         | 5                | The size of the Hive client pool when tracking tables in HMS                       |
-| iceberg.hive.lock-timeout-ms          | 180000 (3 min)   | Maximum time in milliseconds to acquire a lock                                     |
-| iceberg.hive.lock-check-min-wait-ms   | 50               | Minimum time in milliseconds to check back on the status of lock acquisition       |
-| iceberg.hive.lock-check-max-wait-ms   | 5000             | Maximum time in milliseconds to check back on the status of lock acquisition       |
-
-Note: `iceberg.hive.lock-check-max-wait-ms` should be less than the [transaction timeout](https://cwiki.apache.org/confluence/display/Hive/Configuration+Properties#ConfigurationProperties-hive.txn.timeout) 
+| Property                                  | Default         | Description                                                                  |
+|-------------------------------------------|-----------------|------------------------------------------------------------------------------|
+| iceberg.hive.client-pool-size             | 5               | The size of the Hive client pool when tracking tables in HMS                 |
+| iceberg.hive.lock-timeout-ms              | 180000 (3 min)  | Maximum time in milliseconds to acquire a lock                               |

Review Comment:
   I see, just saw its from unrelated changelist.  But it still doesnt fit very well with the two phases (creation/acquisition) you put in the first part of the doc, should we just say simply:
   
   
   The HMS table locking is a 2-step process:
   1. Lock create: Create lock in HMS and queue for acquisition
   2. Lock check: Check if lock successfully acquired
   
   This way, they are more relevant to the property name.  Also wasnt sure about putting the details of 'should wait until every previously created Lock is released', is this the Hive internals?  As I feel it will just confuse more. 



##########
docs/configuration.md:
##########
@@ -159,15 +159,24 @@ Here are the catalog properties related to locking. They are used by some catalo
 ## Hadoop configuration
 
 The following properties from the Hadoop configuration are used by the Hive Metastore connector.
-
-| Property                              | Default          | Description                                                                        |
-| ------------------------------------- | ---------------- | ---------------------------------------------------------------------------------- |
-| iceberg.hive.client-pool-size         | 5                | The size of the Hive client pool when tracking tables in HMS                       |
-| iceberg.hive.lock-timeout-ms          | 180000 (3 min)   | Maximum time in milliseconds to acquire a lock                                     |
-| iceberg.hive.lock-check-min-wait-ms   | 50               | Minimum time in milliseconds to check back on the status of lock acquisition       |
-| iceberg.hive.lock-check-max-wait-ms   | 5000             | Maximum time in milliseconds to check back on the status of lock acquisition       |
-
-Note: `iceberg.hive.lock-check-max-wait-ms` should be less than the [transaction timeout](https://cwiki.apache.org/confluence/display/Hive/Configuration+Properties#ConfigurationProperties-hive.txn.timeout) 
+The HMS table locking is a 2-step process:
+- Lock creation - the Lock itself is created and queued
+- Lock acquisition - the queued Lock should wait until every previously created Lock is released
+
+| Property                                  | Default         | Description                                                                  |
+|-------------------------------------------|-----------------|------------------------------------------------------------------------------|
+| iceberg.hive.client-pool-size             | 5               | The size of the Hive client pool when tracking tables in HMS                 |
+| iceberg.hive.lock-creation-timeout-ms     | 180000 (3 min)  | Maximum time in milliseconds to create a lock in the HMS                     |
+| iceberg.hive.lock-creation-min-wait-ms    | 50              | Minimum time in milliseconds to check the lock creation in the HMS           |

Review Comment:
   How about, max/min time between retries of creating lock.  (Using the word 'check' here confuses with the actual check.x properties)



##########
docs/configuration.md:
##########
@@ -159,15 +159,24 @@ Here are the catalog properties related to locking. They are used by some catalo
 ## Hadoop configuration
 
 The following properties from the Hadoop configuration are used by the Hive Metastore connector.
-
-| Property                              | Default          | Description                                                                        |
-| ------------------------------------- | ---------------- | ---------------------------------------------------------------------------------- |
-| iceberg.hive.client-pool-size         | 5                | The size of the Hive client pool when tracking tables in HMS                       |
-| iceberg.hive.lock-timeout-ms          | 180000 (3 min)   | Maximum time in milliseconds to acquire a lock                                     |
-| iceberg.hive.lock-check-min-wait-ms   | 50               | Minimum time in milliseconds to check back on the status of lock acquisition       |
-| iceberg.hive.lock-check-max-wait-ms   | 5000             | Maximum time in milliseconds to check back on the status of lock acquisition       |
-
-Note: `iceberg.hive.lock-check-max-wait-ms` should be less than the [transaction timeout](https://cwiki.apache.org/confluence/display/Hive/Configuration+Properties#ConfigurationProperties-hive.txn.timeout) 
+The HMS table locking is a 2-step process:
+- Lock creation - the Lock itself is created and queued
+- Lock acquisition - the queued Lock should wait until every previously created Lock is released
+
+| Property                                  | Default         | Description                                                                  |
+|-------------------------------------------|-----------------|------------------------------------------------------------------------------|
+| iceberg.hive.client-pool-size             | 5               | The size of the Hive client pool when tracking tables in HMS                 |
+| iceberg.hive.lock-creation-timeout-ms     | 180000 (3 min)  | Maximum time in milliseconds to create a lock in the HMS                     |
+| iceberg.hive.lock-creation-min-wait-ms    | 50              | Minimum time in milliseconds to check the lock creation in the HMS           |
+| iceberg.hive.lock-creation-max-wait-ms    | 5000            | Maximum time in milliseconds to check the lock creation in the HMS           |
+| iceberg.hive.lock-timeout-ms              | 180000 (3 min)  | Maximum time in milliseconds to acquire a lock                               |
+| iceberg.hive.lock-check-min-wait-ms       | 50              | Minimum time in milliseconds to check back on the status of lock acquisition |
+| iceberg.hive.lock-check-max-wait-ms       | 5000            | Maximum time in milliseconds to check back on the status of lock acquisition |

Review Comment:
   Same , how about, min/max time to retry checking acquisition status of lock



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] szehon-ho commented on a diff in pull request #6451: Hive: Lock hardening

Posted by GitBox <gi...@apache.org>.
szehon-ho commented on code in PR #6451:
URL: https://github.com/apache/iceberg/pull/6451#discussion_r1061980018


##########
hive-metastore/src/main/java/org/apache/iceberg/hive/HiveTableOperations.java:
##########
@@ -759,6 +812,98 @@ private static boolean hiveEngineEnabled(TableMetadata metadata, Configuration c
         ConfigProperties.ENGINE_HIVE_ENABLED, TableProperties.ENGINE_HIVE_ENABLED_DEFAULT);
   }
 
+  @SuppressWarnings("ReverseDnsLookup")
+  private void tryLock(
+      String agentInfo, AtomicReference<Long> lockId, AtomicReference<LockState> state)
+      throws UnknownHostException, TException {
+    final LockComponent lockComponent =
+        new LockComponent(LockType.EXCLUSIVE, LockLevel.TABLE, database);
+    lockComponent.setTablename(tableName);
+    final LockRequest lockRequest =
+        new LockRequest(
+            Lists.newArrayList(lockComponent),
+            System.getProperty("user.name"),
+            InetAddress.getLocalHost().getHostName());
+
+    // Only works in Hive 2 or later.
+    if (MetastoreClientVersion.min(MetastoreClientVersion.HIVE_2)) {
+      lockRequest.setAgentInfo(agentInfo);
+    }
+
+    Tasks.foreach(lockRequest)
+        .retry(Integer.MAX_VALUE - 100)
+        .exponentialBackoff(
+            lockCreationMinWaitTime, lockCreationMaxWaitTime, lockCreationTimeout, 2.0)
+        .shouldRetryTest(
+            e ->
+                e instanceof TException
+                    && MetastoreClientVersion.min(MetastoreClientVersion.HIVE_2))
+        .throwFailureWhenFinished()
+        .run(
+            request -> {
+              try {
+                LockResponse lockResponse = metaClients.run(client -> client.lock(request));
+                lockId.set(lockResponse.getLockid());
+                state.set(lockResponse.getState());
+              } catch (TException te) {
+                LOG.warn("Failed to acquire lock {}", request, te);
+                try {
+                  // If we can not check for lock, or we do not find it, then rethrow the exception
+                  // Otherwise we are happy as the findLock sets the lockId and the state correctly
+                  if (!MetastoreClientVersion.min(MetastoreClientVersion.HIVE_2)
+                      || !findLock(agentInfo, lockId, state)) {
+                    throw te;
+                  } else {
+                    LOG.info(
+                        "Found lock by agentInfo {} with id {} and state {}",
+                        agentInfo,
+                        lockId,
+                        state);
+                  }
+                } catch (InterruptedException e) {
+                  Thread.currentThread().interrupt();
+                  LOG.warn(
+                      "Interrupted while checking for lock on table {}.{}", database, tableName, e);
+                  throw new RuntimeException("Interrupted while checking for lock", e);
+                }
+              } catch (InterruptedException e) {
+                Thread.currentThread().interrupt();
+                LOG.warn("Interrupted while acquiring lock on table {}.{}", database, tableName, e);
+                throw new RuntimeException("Interrupted while acquiring lock", e);
+              }
+            },
+            TException.class);
+  }
+
+  /**
+   * Search for the locks using HMSClient.showLocks identified by the agentInfo. If the lock is
+   * there, then the lockId and the state is set based on the result.
+   *
+   * @param agentInfo The key for searching the locks
+   * @param lockId The lockId if the lock has found
+   * @param state The state if the lock has found
+   * @return <code>true</code> if the the lock is there <code>false</code> if the lock is missing
+   */
+  private boolean findLock(
+      String agentInfo, AtomicReference<Long> lockId, AtomicReference<LockState> state)
+      throws TException, InterruptedException {
+    ShowLocksRequest showLocksRequest = new ShowLocksRequest();
+    showLocksRequest.setDbname(database);
+    showLocksRequest.setTablename(tableName);
+    ShowLocksResponse response = metaClients.run(client -> client.showLocks(showLocksRequest));
+    for (ShowLocksResponseElement lock : response.getLocks()) {
+      if (lock.getAgentInfo().equals(agentInfo)) {

Review Comment:
   Should there be a Hive2+ check here?



##########
hive-metastore/src/main/java/org/apache/iceberg/hive/HiveTableOperations.java:
##########
@@ -699,18 +717,53 @@ private void cleanupMetadataAndUnlock(
     } catch (RuntimeException e) {
       LOG.error("Failed to cleanup metadata file at {}", metadataLocation, e);
     } finally {
-      unlock(lockId);
+      unlock(lockId, agentInfo);
       tableLevelMutex.unlock();
     }
   }
 
-  private void unlock(Optional<Long> lockId) {
-    if (lockId.isPresent()) {
-      try {
-        doUnlock(lockId.get());
-      } catch (Exception e) {
-        LOG.warn("Failed to unlock {}.{}", database, tableName, e);
+  private void unlock(Optional<Long> lockId, String agentInfo) {
+    Long id = null;
+    try {
+      if (!lockId.isPresent()) {
+        // Try to find the lock based on agentInfo. Only works with Hive 2 or later.
+        if (MetastoreClientVersion.min(MetastoreClientVersion.HIVE_2)) {
+          AtomicReference<Long> foundLockId = new AtomicReference<>();
+          AtomicReference<LockState> state = new AtomicReference<>();
+          if (!findLock(agentInfo, foundLockId, state)) {

Review Comment:
   Style: same, would prefer findLock to return a LockInfo if it finds a lock, or null if it cannot find it.



##########
hive-metastore/src/main/java/org/apache/iceberg/hive/MetastoreClientVersion.java:
##########
@@ -0,0 +1,65 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+package org.apache.iceberg.hive;
+
+import org.apache.hive.common.util.HiveVersionInfo;
+
+public enum MetastoreClientVersion {
+  HIVE_4(4),
+  HIVE_3(3),
+  HIVE_2(2),
+  HIVE_1_2(1),
+  NOT_SUPPORTED(0);
+
+  private final int order;
+  private static final MetastoreClientVersion current = calculate();
+
+  MetastoreClientVersion(int order) {
+    this.order = order;
+  }
+
+  public static MetastoreClientVersion current() {
+    return current;
+  }
+
+  public static boolean min(MetastoreClientVersion other) {
+    return current.order >= other.order;
+  }
+
+  private static MetastoreClientVersion calculate() {
+    String version = HiveVersionInfo.getShortVersion();
+    String[] versions = version.split("\\.");
+    switch (versions[0]) {
+      case "4":
+        return HIVE_4;
+      case "3":
+        return HIVE_3;
+      case "2":
+        return HIVE_2;
+      case "1":
+        if (versions[1].equals("2")) {
+          return HIVE_1_2;
+        } else {
+          return NOT_SUPPORTED;

Review Comment:
   Why are we explicitly not-supporting versions lesser than Hive 1.2 now?  I don't think there was any checks   previously to throw an exception for this case.  Anyway, we are protecting the agentInfo code around Hive 2 check



##########
hive-metastore/src/main/java/org/apache/iceberg/hive/HiveTableOperations.java:
##########
@@ -606,20 +630,13 @@ private StorageDescriptor storageDescriptor(TableMetadata metadata, boolean hive
     return storageDescriptor;
   }
 
-  @SuppressWarnings("ReverseDnsLookup")
   @VisibleForTesting
-  long acquireLock() throws UnknownHostException, TException, InterruptedException {
-    final LockComponent lockComponent =
-        new LockComponent(LockType.EXCLUSIVE, LockLevel.TABLE, database);
-    lockComponent.setTablename(tableName);
-    final LockRequest lockRequest =
-        new LockRequest(
-            Lists.newArrayList(lockComponent),
-            System.getProperty("user.name"),
-            InetAddress.getLocalHost().getHostName());
-    LockResponse lockResponse = metaClients.run(client -> client.lock(lockRequest));
-    AtomicReference<LockState> state = new AtomicReference<>(lockResponse.getState());
-    long lockId = lockResponse.getLockid();
+  long acquireLock(String agentInfo) throws UnknownHostException, TException, InterruptedException {
+    AtomicReference<Long> lockId = new AtomicReference<>();
+    AtomicReference<LockState> state = new AtomicReference<>();
+
+    // This will set the lockId and state if successful
+    tryLock(agentInfo, lockId, state);

Review Comment:
   Style Preference: would be great to do it more functional style, and have the method return a LockInfo class of (lockId, state), rather than having the arguments mutated which is harder to keep track.



##########
hive-metastore/src/main/java/org/apache/iceberg/hive/HiveTableOperations.java:
##########
@@ -699,18 +717,53 @@ private void cleanupMetadataAndUnlock(
     } catch (RuntimeException e) {
       LOG.error("Failed to cleanup metadata file at {}", metadataLocation, e);
     } finally {
-      unlock(lockId);
+      unlock(lockId, agentInfo);
       tableLevelMutex.unlock();
     }
   }
 
-  private void unlock(Optional<Long> lockId) {
-    if (lockId.isPresent()) {
-      try {
-        doUnlock(lockId.get());
-      } catch (Exception e) {
-        LOG.warn("Failed to unlock {}.{}", database, tableName, e);
+  private void unlock(Optional<Long> lockId, String agentInfo) {
+    Long id = null;
+    try {
+      if (!lockId.isPresent()) {
+        // Try to find the lock based on agentInfo. Only works with Hive 2 or later.
+        if (MetastoreClientVersion.min(MetastoreClientVersion.HIVE_2)) {
+          AtomicReference<Long> foundLockId = new AtomicReference<>();
+          AtomicReference<LockState> state = new AtomicReference<>();
+          if (!findLock(agentInfo, foundLockId, state)) {
+            // No lock found
+            LOG.info("No lock found with {} agentInfo", agentInfo);
+            return;
+          }
+
+          id = foundLockId.get();
+        } else {
+          LOG.warn("Could not find lock with HMSClient {}", MetastoreClientVersion.current());
+          return;
+        }
+      } else {
+        id = lockId.get();
+      }
+
+      doUnlock(id);
+    } catch (InterruptedException ie) {
+      if (id != null) {

Review Comment:
   Is this just to handle interrupt for unlock?  I'm not sure I see how its related to the rest of the change, I would think if anything here, we should be checking TException for unlock and try unlock again, synonomous with tryLock?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] RussellSpitzer commented on a diff in pull request #6451: Hive: Lock hardening

Posted by GitBox <gi...@apache.org>.
RussellSpitzer commented on code in PR #6451:
URL: https://github.com/apache/iceberg/pull/6451#discussion_r1062550991


##########
docs/configuration.md:
##########
@@ -160,14 +160,20 @@ Here are the catalog properties related to locking. They are used by some catalo
 
 The following properties from the Hadoop configuration are used by the Hive Metastore connector.
 
-| Property                              | Default          | Description                                                                        |
-| ------------------------------------- | ---------------- | ---------------------------------------------------------------------------------- |
-| iceberg.hive.client-pool-size         | 5                | The size of the Hive client pool when tracking tables in HMS                       |
-| iceberg.hive.lock-timeout-ms          | 180000 (3 min)   | Maximum time in milliseconds to acquire a lock                                     |
-| iceberg.hive.lock-check-min-wait-ms   | 50               | Minimum time in milliseconds to check back on the status of lock acquisition       |
-| iceberg.hive.lock-check-max-wait-ms   | 5000             | Maximum time in milliseconds to check back on the status of lock acquisition       |
-
-Note: `iceberg.hive.lock-check-max-wait-ms` should be less than the [transaction timeout](https://cwiki.apache.org/confluence/display/Hive/Configuration+Properties#ConfigurationProperties-hive.txn.timeout) 
+| Property                                  | Default         | Description                                                                  |
+|-------------------------------------------|-----------------|------------------------------------------------------------------------------|
+| iceberg.hive.client-pool-size             | 5               | The size of the Hive client pool when tracking tables in HMS                 |
+| iceberg.hive.lock-timeout-ms              | 180000 (3 min)  | Maximum time in milliseconds to acquire a lock                               |

Review Comment:
   I think that is a bit clearer for me.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] szehon-ho commented on a diff in pull request #6451: Hive: Lock hardening

Posted by GitBox <gi...@apache.org>.
szehon-ho commented on code in PR #6451:
URL: https://github.com/apache/iceberg/pull/6451#discussion_r1062800518


##########
hive-metastore/src/main/java/org/apache/iceberg/hive/HiveTableOperations.java:
##########
@@ -759,6 +812,98 @@ private static boolean hiveEngineEnabled(TableMetadata metadata, Configuration c
         ConfigProperties.ENGINE_HIVE_ENABLED, TableProperties.ENGINE_HIVE_ENABLED_DEFAULT);
   }
 
+  @SuppressWarnings("ReverseDnsLookup")
+  private void tryLock(
+      String agentInfo, AtomicReference<Long> lockId, AtomicReference<LockState> state)
+      throws UnknownHostException, TException {
+    final LockComponent lockComponent =
+        new LockComponent(LockType.EXCLUSIVE, LockLevel.TABLE, database);
+    lockComponent.setTablename(tableName);
+    final LockRequest lockRequest =
+        new LockRequest(
+            Lists.newArrayList(lockComponent),
+            System.getProperty("user.name"),
+            InetAddress.getLocalHost().getHostName());
+
+    // Only works in Hive 2 or later.
+    if (MetastoreClientVersion.min(MetastoreClientVersion.HIVE_2)) {
+      lockRequest.setAgentInfo(agentInfo);
+    }
+
+    Tasks.foreach(lockRequest)
+        .retry(Integer.MAX_VALUE - 100)
+        .exponentialBackoff(
+            lockCreationMinWaitTime, lockCreationMaxWaitTime, lockCreationTimeout, 2.0)
+        .shouldRetryTest(
+            e ->
+                e instanceof TException
+                    && MetastoreClientVersion.min(MetastoreClientVersion.HIVE_2))
+        .throwFailureWhenFinished()
+        .run(
+            request -> {
+              try {
+                LockResponse lockResponse = metaClients.run(client -> client.lock(request));
+                lockId.set(lockResponse.getLockid());
+                state.set(lockResponse.getState());
+              } catch (TException te) {
+                LOG.warn("Failed to acquire lock {}", request, te);
+                try {
+                  // If we can not check for lock, or we do not find it, then rethrow the exception
+                  // Otherwise we are happy as the findLock sets the lockId and the state correctly
+                  if (!MetastoreClientVersion.min(MetastoreClientVersion.HIVE_2)
+                      || !findLock(agentInfo, lockId, state)) {
+                    throw te;
+                  } else {
+                    LOG.info(
+                        "Found lock by agentInfo {} with id {} and state {}",
+                        agentInfo,
+                        lockId,
+                        state);
+                  }
+                } catch (InterruptedException e) {
+                  Thread.currentThread().interrupt();
+                  LOG.warn(
+                      "Interrupted while checking for lock on table {}.{}", database, tableName, e);
+                  throw new RuntimeException("Interrupted while checking for lock", e);
+                }
+              } catch (InterruptedException e) {
+                Thread.currentThread().interrupt();
+                LOG.warn("Interrupted while acquiring lock on table {}.{}", database, tableName, e);
+                throw new RuntimeException("Interrupted while acquiring lock", e);
+              }
+            },
+            TException.class);
+  }
+
+  /**
+   * Search for the locks using HMSClient.showLocks identified by the agentInfo. If the lock is
+   * there, then the lockId and the state is set based on the result.
+   *
+   * @param agentInfo The key for searching the locks
+   * @param lockId The lockId if the lock has found
+   * @param state The state if the lock has found
+   * @return <code>true</code> if the the lock is there <code>false</code> if the lock is missing
+   */
+  private boolean findLock(
+      String agentInfo, AtomicReference<Long> lockId, AtomicReference<LockState> state)
+      throws TException, InterruptedException {
+    ShowLocksRequest showLocksRequest = new ShowLocksRequest();
+    showLocksRequest.setDbname(database);
+    showLocksRequest.setTablename(tableName);
+    ShowLocksResponse response = metaClients.run(client -> client.showLocks(showLocksRequest));
+    for (ShowLocksResponseElement lock : response.getLocks()) {
+      if (lock.getAgentInfo().equals(agentInfo)) {

Review Comment:
   Sorry I missed , that this whole block only gets called for Hive2.  Thinking it may be worth a Precondition check, up to you



##########
hive-metastore/src/main/java/org/apache/iceberg/hive/MetastoreClientVersion.java:
##########
@@ -0,0 +1,65 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+package org.apache.iceberg.hive;
+
+import org.apache.hive.common.util.HiveVersionInfo;
+
+public enum MetastoreClientVersion {
+  HIVE_4(4),
+  HIVE_3(3),
+  HIVE_2(2),
+  HIVE_1_2(1),
+  NOT_SUPPORTED(0);
+
+  private final int order;
+  private static final MetastoreClientVersion current = calculate();
+
+  MetastoreClientVersion(int order) {
+    this.order = order;
+  }
+
+  public static MetastoreClientVersion current() {
+    return current;
+  }
+
+  public static boolean min(MetastoreClientVersion other) {
+    return current.order >= other.order;
+  }
+
+  private static MetastoreClientVersion calculate() {
+    String version = HiveVersionInfo.getShortVersion();
+    String[] versions = version.split("\\.");
+    switch (versions[0]) {
+      case "4":
+        return HIVE_4;
+      case "3":
+        return HIVE_3;
+      case "2":
+        return HIVE_2;
+      case "1":
+        if (versions[1].equals("2")) {
+          return HIVE_1_2;
+        } else {
+          return NOT_SUPPORTED;

Review Comment:
   Makes sense, I am not sure if its desirable to make the effort here to explicitly throw an exception though for Hive less than 1.2.  Also was there an explicit note in any Iceberg docs?  
   
   Though im open if thats what community feels, any thoughts: @rdblue @RussellSpitzer 



##########
hive-metastore/src/main/java/org/apache/iceberg/hive/HiveTableOperations.java:
##########
@@ -699,18 +717,53 @@ private void cleanupMetadataAndUnlock(
     } catch (RuntimeException e) {
       LOG.error("Failed to cleanup metadata file at {}", metadataLocation, e);
     } finally {
-      unlock(lockId);
+      unlock(lockId, agentInfo);
       tableLevelMutex.unlock();
     }
   }
 
-  private void unlock(Optional<Long> lockId) {
-    if (lockId.isPresent()) {
-      try {
-        doUnlock(lockId.get());
-      } catch (Exception e) {
-        LOG.warn("Failed to unlock {}.{}", database, tableName, e);
+  private void unlock(Optional<Long> lockId, String agentInfo) {
+    Long id = null;
+    try {
+      if (!lockId.isPresent()) {
+        // Try to find the lock based on agentInfo. Only works with Hive 2 or later.
+        if (MetastoreClientVersion.min(MetastoreClientVersion.HIVE_2)) {
+          AtomicReference<Long> foundLockId = new AtomicReference<>();
+          AtomicReference<LockState> state = new AtomicReference<>();
+          if (!findLock(agentInfo, foundLockId, state)) {
+            // No lock found
+            LOG.info("No lock found with {} agentInfo", agentInfo);
+            return;
+          }
+
+          id = foundLockId.get();
+        } else {
+          LOG.warn("Could not find lock with HMSClient {}", MetastoreClientVersion.current());
+          return;
+        }
+      } else {
+        id = lockId.get();
+      }
+
+      doUnlock(id);
+    } catch (InterruptedException ie) {
+      if (id != null) {

Review Comment:
   While this makes sense, following this logic, this change seems like its missing handling for following:
   
   - lock fails due to interrupted (or is this handled?  I didnt see any explicit interruptedException handling in lock code)
   - unlock fails because of Thrift Exception
   
   Shouldn't we harden these cases then?
   



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] pvary commented on a diff in pull request #6451: Hive: Lock hardening

Posted by GitBox <gi...@apache.org>.
pvary commented on code in PR #6451:
URL: https://github.com/apache/iceberg/pull/6451#discussion_r1063351994


##########
hive-metastore/src/main/java/org/apache/iceberg/hive/MetastoreClientVersion.java:
##########
@@ -0,0 +1,65 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+package org.apache.iceberg.hive;
+
+import org.apache.hive.common.util.HiveVersionInfo;
+
+public enum MetastoreClientVersion {
+  HIVE_4(4),
+  HIVE_3(3),
+  HIVE_2(2),
+  HIVE_1_2(1),
+  NOT_SUPPORTED(0);
+
+  private final int order;
+  private static final MetastoreClientVersion current = calculate();
+
+  MetastoreClientVersion(int order) {
+    this.order = order;
+  }
+
+  public static MetastoreClientVersion current() {
+    return current;
+  }
+
+  public static boolean min(MetastoreClientVersion other) {
+    return current.order >= other.order;
+  }
+
+  private static MetastoreClientVersion calculate() {
+    String version = HiveVersionInfo.getShortVersion();
+    String[] versions = version.split("\\.");
+    switch (versions[0]) {
+      case "4":
+        return HIVE_4;
+      case "3":
+        return HIVE_3;
+      case "2":
+        return HIVE_2;
+      case "1":
+        if (versions[1].equals("2")) {
+          return HIVE_1_2;
+        } else {
+          return NOT_SUPPORTED;

Review Comment:
   Is is never called out explicitly, but it would be hard find Hive older than 1.2 in any production environment. (I know of one, but that is a serious outlier)



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] pvary commented on a diff in pull request #6451: Hive: Lock hardening

Posted by GitBox <gi...@apache.org>.
pvary commented on code in PR #6451:
URL: https://github.com/apache/iceberg/pull/6451#discussion_r1056519719


##########
hive-metastore/src/main/java/org/apache/iceberg/hive/HiveTableOperations.java:
##########
@@ -759,6 +816,40 @@ private static boolean hiveEngineEnabled(TableMetadata metadata, Configuration c
         ConfigProperties.ENGINE_HIVE_ENABLED, TableProperties.ENGINE_HIVE_ENABLED_DEFAULT);
   }
 
+  private Pair<Long, LockState> findLock(String agentInfo) throws TException, InterruptedException {
+    // Search for the locks using HMSClient.showLocks identified by the agentInfo
+    ShowLocksRequest showLocksRequest = new ShowLocksRequest();
+    showLocksRequest.setDbname(database);
+    showLocksRequest.setTablename(tableName);
+    ShowLocksResponse response = metaClients.run(client -> client.showLocks(showLocksRequest));
+    for (ShowLocksResponseElement lock : response.getLocks()) {

Review Comment:
   If there is a concurrent commit with its own lock, then we need to find our own



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] RussellSpitzer commented on a diff in pull request #6451: Hive: Lock hardening

Posted by GitBox <gi...@apache.org>.
RussellSpitzer commented on code in PR #6451:
URL: https://github.com/apache/iceberg/pull/6451#discussion_r1063566573


##########
hive-metastore/src/main/java/org/apache/iceberg/hive/MetastoreUtil.java:
##########
@@ -73,8 +73,12 @@ public static void alterTable(
   }
 
   private static boolean detectHive3() {
+    return findClass(HIVE3_UNIQUE_CLASS);

Review Comment:
   Do we still need this? Doesn't MetastoreClientVersion give us a Version 3 check?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] pvary commented on a diff in pull request #6451: Hive: Lock hardening

Posted by GitBox <gi...@apache.org>.
pvary commented on code in PR #6451:
URL: https://github.com/apache/iceberg/pull/6451#discussion_r1062148775


##########
hive-metastore/src/main/java/org/apache/iceberg/hive/MetastoreClientVersion.java:
##########
@@ -0,0 +1,65 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+package org.apache.iceberg.hive;
+
+import org.apache.hive.common.util.HiveVersionInfo;
+
+public enum MetastoreClientVersion {
+  HIVE_4(4),
+  HIVE_3(3),
+  HIVE_2(2),
+  HIVE_1_2(1),
+  NOT_SUPPORTED(0);
+
+  private final int order;
+  private static final MetastoreClientVersion current = calculate();
+
+  MetastoreClientVersion(int order) {
+    this.order = order;
+  }
+
+  public static MetastoreClientVersion current() {
+    return current;
+  }
+
+  public static boolean min(MetastoreClientVersion other) {
+    return current.order >= other.order;
+  }
+
+  private static MetastoreClientVersion calculate() {
+    String version = HiveVersionInfo.getShortVersion();
+    String[] versions = version.split("\\.");
+    switch (versions[0]) {
+      case "4":
+        return HIVE_4;
+      case "3":
+        return HIVE_3;
+      case "2":
+        return HIVE_2;
+      case "1":
+        if (versions[1].equals("2")) {
+          return HIVE_1_2;
+        } else {
+          return NOT_SUPPORTED;

Review Comment:
   We support Hive 1.2 through Spark. I think we do not want to start supporting even older versions of Hive. That is the reason that I explicitly kept Hive 1.2 here and added anything else as unsupported.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org