You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@iceberg.apache.org by GitBox <gi...@apache.org> on 2020/12/04 03:09:59 UTC

[GitHub] [iceberg] raptond opened a new pull request #1873: Hive: Make lock check retries backoff exponentially

raptond opened a new pull request #1873:
URL: https://github.com/apache/iceberg/pull/1873


   50 milliseconds (constant) sleep time between "checking lock status" thrashes hive metastore databases when multiple jobs try to commit to the same Iceberg table. This fix allows the frequency of "checking the WAITING lock status" configurable and makes use of Tasks to backoff exponentially.
   
   Every time a check on the lock is made, the HMS performs heartbeats on the lock record and the transaction record. It eventually ends up with the below errors if the number of jobs on the same table grew and commit at the same time. Ability to configure the delay between retries and slowing down retries further exponentially would help. Thanks.
   
   ```
   MetaException(message:Unable to update transaction database org.postgresql.util.PSQLException: ERROR: could not serialize access due to read/write dependencies among transactions
   Detail: Reason code: Canceled on identification as a pivot, during write.
   Hint: The transaction might succeed if retried.
   ```


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] aokolnychyi commented on pull request #1873: Hive: Make lock check retries backoff exponentially

Posted by GitBox <gi...@apache.org>.
aokolnychyi commented on pull request #1873:
URL: https://github.com/apache/iceberg/pull/1873#issuecomment-742680915


   I'll do a pass in 15 mins


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] rdblue commented on a change in pull request #1873: Hive: Make lock check retries backoff exponentially

Posted by GitBox <gi...@apache.org>.
rdblue commented on a change in pull request #1873:
URL: https://github.com/apache/iceberg/pull/1873#discussion_r536253250



##########
File path: hive-metastore/src/main/java/org/apache/iceberg/hive/HiveTableOperations.java
##########
@@ -96,7 +115,14 @@ protected HiveTableOperations(Configuration conf, HiveClientPool metaClients, Fi
     this.database = database;
     this.tableName = table;
     this.lockAcquireTimeout =
-        conf.getLong(HIVE_ACQUIRE_LOCK_STATE_TIMEOUT_MS, HIVE_ACQUIRE_LOCK_STATE_TIMEOUT_MS_DEFAULT);
+        conf.getLong(HIVE_ACQUIRE_LOCK_TIMEOUT_MS, HIVE_ACQUIRE_LOCK_TIMEOUT_MS_DEFAULT);
+    this.lockCheckMinWaitTime =
+        conf.getLong(HIVE_LOCK_CHECK_MIN_WAIT_MS, HIVE_LOCK_CHECK_MIN_WAIT_MS_DEFAULT);
+    this.lockCheckMaxWaitTime =
+        conf.getLong(HIVE_LOCK_CHECK_MAX_WAIT_MS, HIVE_LOCK_CHECK_MAX_WAIT_MS_DEFAULT);
+    this.lockCheckBackoffScaleFactor =
+        conf.getDouble(HIVE_LOCK_CHECK_BACKOFF_SCALE_FACTOR, HIVE_LOCK_CHECK_BACKOFF_SCALE_FACTOR_DEFAULT);

Review comment:
       We don't expose the scale factor in other places. I think it makes sense to keep it simple and not have one here either.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] raptond commented on a change in pull request #1873: Hive: Make lock check retries backoff exponentially

Posted by GitBox <gi...@apache.org>.
raptond commented on a change in pull request #1873:
URL: https://github.com/apache/iceberg/pull/1873#discussion_r536354930



##########
File path: hive-metastore/src/main/java/org/apache/iceberg/hive/HiveTableOperations.java
##########
@@ -96,7 +115,14 @@ protected HiveTableOperations(Configuration conf, HiveClientPool metaClients, Fi
     this.database = database;
     this.tableName = table;
     this.lockAcquireTimeout =
-        conf.getLong(HIVE_ACQUIRE_LOCK_STATE_TIMEOUT_MS, HIVE_ACQUIRE_LOCK_STATE_TIMEOUT_MS_DEFAULT);
+        conf.getLong(HIVE_ACQUIRE_LOCK_TIMEOUT_MS, HIVE_ACQUIRE_LOCK_TIMEOUT_MS_DEFAULT);
+    this.lockCheckMinWaitTime =
+        conf.getLong(HIVE_LOCK_CHECK_MIN_WAIT_MS, HIVE_LOCK_CHECK_MIN_WAIT_MS_DEFAULT);
+    this.lockCheckMaxWaitTime =
+        conf.getLong(HIVE_LOCK_CHECK_MAX_WAIT_MS, HIVE_LOCK_CHECK_MAX_WAIT_MS_DEFAULT);
+    this.lockCheckBackoffScaleFactor =
+        conf.getDouble(HIVE_LOCK_CHECK_BACKOFF_SCALE_FACTOR, HIVE_LOCK_CHECK_BACKOFF_SCALE_FACTOR_DEFAULT);

Review comment:
       I'll remove the scale factor.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] rdblue commented on a change in pull request #1873: Hive: Make lock check retries backoff exponentially

Posted by GitBox <gi...@apache.org>.
rdblue commented on a change in pull request #1873:
URL: https://github.com/apache/iceberg/pull/1873#discussion_r536255756



##########
File path: hive-metastore/src/main/java/org/apache/iceberg/hive/HiveTableOperations.java
##########
@@ -70,21 +72,38 @@
 public class HiveTableOperations extends BaseMetastoreTableOperations {
   private static final Logger LOG = LoggerFactory.getLogger(HiveTableOperations.class);
 
-  private static final String HIVE_ACQUIRE_LOCK_STATE_TIMEOUT_MS = "iceberg.hive.lock-timeout-ms";
-  private static final long HIVE_ACQUIRE_LOCK_STATE_TIMEOUT_MS_DEFAULT = 3 * 60 * 1000; // 3 minutes
+  private static final String HIVE_ACQUIRE_LOCK_TIMEOUT_MS = "iceberg.hive.lock-timeout-ms";
+  private static final String HIVE_LOCK_CHECK_MIN_WAIT_MS = "iceberg.hive.lock-check-min-wait-ms";
+  private static final String HIVE_LOCK_CHECK_MAX_WAIT_MS = "iceberg.hive.lock-check-max-wait-ms";
+  private static final String HIVE_LOCK_CHECK_BACKOFF_SCALE_FACTOR = "iceberg.hive.lock-check-backoff-scale-factor";
+  private static final long HIVE_ACQUIRE_LOCK_TIMEOUT_MS_DEFAULT = 3 * 60 * 1000; // 3 minutes
+  private static final long HIVE_LOCK_CHECK_MIN_WAIT_MS_DEFAULT = 50; // 50 milliseconds
+  private static final long HIVE_LOCK_CHECK_MAX_WAIT_MS_DEFAULT = 5 * 1000; // 5 seconds
+  private static final double HIVE_LOCK_CHECK_BACKOFF_SCALE_FACTOR_DEFAULT = 1.5;
   private static final DynMethods.UnboundMethod ALTER_TABLE = DynMethods.builder("alter_table")
       .impl(HiveMetaStoreClient.class, "alter_table_with_environmentContext",
           String.class, String.class, Table.class, EnvironmentContext.class)
       .impl(HiveMetaStoreClient.class, "alter_table",
           String.class, String.class, Table.class, EnvironmentContext.class)
       .build();
 
+  protected static class WaitingForLockException extends RuntimeException {

Review comment:
       I agree. If this is always caught, it should be private.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] rdblue commented on a change in pull request #1873: Hive: Make lock check retries backoff exponentially

Posted by GitBox <gi...@apache.org>.
rdblue commented on a change in pull request #1873:
URL: https://github.com/apache/iceberg/pull/1873#discussion_r536252025



##########
File path: hive-metastore/src/main/java/org/apache/iceberg/hive/HiveTableOperations.java
##########
@@ -70,21 +72,38 @@
 public class HiveTableOperations extends BaseMetastoreTableOperations {
   private static final Logger LOG = LoggerFactory.getLogger(HiveTableOperations.class);
 
-  private static final String HIVE_ACQUIRE_LOCK_STATE_TIMEOUT_MS = "iceberg.hive.lock-timeout-ms";
-  private static final long HIVE_ACQUIRE_LOCK_STATE_TIMEOUT_MS_DEFAULT = 3 * 60 * 1000; // 3 minutes
+  private static final String HIVE_ACQUIRE_LOCK_TIMEOUT_MS = "iceberg.hive.lock-timeout-ms";
+  private static final String HIVE_LOCK_CHECK_MIN_WAIT_MS = "iceberg.hive.lock-check-min-wait-ms";
+  private static final String HIVE_LOCK_CHECK_MAX_WAIT_MS = "iceberg.hive.lock-check-max-wait-ms";
+  private static final String HIVE_LOCK_CHECK_BACKOFF_SCALE_FACTOR = "iceberg.hive.lock-check-backoff-scale-factor";
+  private static final long HIVE_ACQUIRE_LOCK_TIMEOUT_MS_DEFAULT = 3 * 60 * 1000; // 3 minutes
+  private static final long HIVE_LOCK_CHECK_MIN_WAIT_MS_DEFAULT = 50; // 50 milliseconds
+  private static final long HIVE_LOCK_CHECK_MAX_WAIT_MS_DEFAULT = 5 * 1000; // 5 seconds
+  private static final double HIVE_LOCK_CHECK_BACKOFF_SCALE_FACTOR_DEFAULT = 1.5;
   private static final DynMethods.UnboundMethod ALTER_TABLE = DynMethods.builder("alter_table")
       .impl(HiveMetaStoreClient.class, "alter_table_with_environmentContext",
           String.class, String.class, Table.class, EnvironmentContext.class)
       .impl(HiveMetaStoreClient.class, "alter_table",
           String.class, String.class, Table.class, EnvironmentContext.class)
       .build();
 
+  protected static class WaitingForLockException extends RuntimeException {
+    public WaitingForLockException(String message) {
+      super(message);
+    }
+  }
+
+  private static final WaitingForLockException WAITING_FOR_LOCK_EXCEPTION =

Review comment:
       I agree. What is the purpose of having a single instance of an exception? Wouldn't that lead to deceptive stack traces?




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] raptond commented on a change in pull request #1873: Hive: Make lock check retries backoff exponentially

Posted by GitBox <gi...@apache.org>.
raptond commented on a change in pull request #1873:
URL: https://github.com/apache/iceberg/pull/1873#discussion_r539075969



##########
File path: hive-metastore/src/main/java/org/apache/iceberg/hive/HiveTableOperations.java
##########
@@ -96,7 +115,14 @@ protected HiveTableOperations(Configuration conf, HiveClientPool metaClients, Fi
     this.database = database;
     this.tableName = table;
     this.lockAcquireTimeout =
-        conf.getLong(HIVE_ACQUIRE_LOCK_STATE_TIMEOUT_MS, HIVE_ACQUIRE_LOCK_STATE_TIMEOUT_MS_DEFAULT);
+        conf.getLong(HIVE_ACQUIRE_LOCK_TIMEOUT_MS, HIVE_ACQUIRE_LOCK_TIMEOUT_MS_DEFAULT);
+    this.lockCheckMinWaitTime =
+        conf.getLong(HIVE_LOCK_CHECK_MIN_WAIT_MS, HIVE_LOCK_CHECK_MIN_WAIT_MS_DEFAULT);
+    this.lockCheckMaxWaitTime =
+        conf.getLong(HIVE_LOCK_CHECK_MAX_WAIT_MS, HIVE_LOCK_CHECK_MAX_WAIT_MS_DEFAULT);
+    this.lockCheckBackoffScaleFactor =
+        conf.getDouble(HIVE_LOCK_CHECK_BACKOFF_SCALE_FACTOR, HIVE_LOCK_CHECK_BACKOFF_SCALE_FACTOR_DEFAULT);
+

Review comment:
       taken care. 




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] aokolnychyi commented on a change in pull request #1873: Hive: Make lock check retries backoff exponentially

Posted by GitBox <gi...@apache.org>.
aokolnychyi commented on a change in pull request #1873:
URL: https://github.com/apache/iceberg/pull/1873#discussion_r536382097



##########
File path: hive-metastore/src/main/java/org/apache/iceberg/hive/HiveTableOperations.java
##########
@@ -70,21 +72,38 @@
 public class HiveTableOperations extends BaseMetastoreTableOperations {
   private static final Logger LOG = LoggerFactory.getLogger(HiveTableOperations.class);
 
-  private static final String HIVE_ACQUIRE_LOCK_STATE_TIMEOUT_MS = "iceberg.hive.lock-timeout-ms";
-  private static final long HIVE_ACQUIRE_LOCK_STATE_TIMEOUT_MS_DEFAULT = 3 * 60 * 1000; // 3 minutes
+  private static final String HIVE_ACQUIRE_LOCK_TIMEOUT_MS = "iceberg.hive.lock-timeout-ms";
+  private static final String HIVE_LOCK_CHECK_MIN_WAIT_MS = "iceberg.hive.lock-check-min-wait-ms";
+  private static final String HIVE_LOCK_CHECK_MAX_WAIT_MS = "iceberg.hive.lock-check-max-wait-ms";
+  private static final String HIVE_LOCK_CHECK_BACKOFF_SCALE_FACTOR = "iceberg.hive.lock-check-backoff-scale-factor";
+  private static final long HIVE_ACQUIRE_LOCK_TIMEOUT_MS_DEFAULT = 3 * 60 * 1000; // 3 minutes
+  private static final long HIVE_LOCK_CHECK_MIN_WAIT_MS_DEFAULT = 50; // 50 milliseconds
+  private static final long HIVE_LOCK_CHECK_MAX_WAIT_MS_DEFAULT = 5 * 1000; // 5 seconds

Review comment:
       That being said, I am +1 for adding catalog options. I am just not sure we can get rid of Hadoop conf completely in this case.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] raptond commented on a change in pull request #1873: Hive: Make lock check retries backoff exponentially

Posted by GitBox <gi...@apache.org>.
raptond commented on a change in pull request #1873:
URL: https://github.com/apache/iceberg/pull/1873#discussion_r539077360



##########
File path: hive-metastore/src/main/java/org/apache/iceberg/hive/HiveTableOperations.java
##########
@@ -294,32 +313,50 @@ private long acquireLock() throws UnknownHostException, TException, InterruptedE
         System.getProperty("user.name"),
         InetAddress.getLocalHost().getHostName());
     LockResponse lockResponse = metaClients.run(client -> client.lock(lockRequest));
-    LockState state = lockResponse.getState();
+    AtomicReference<LockState> state = new AtomicReference<>(lockResponse.getState());
     long lockId = lockResponse.getLockid();
 
     final long start = System.currentTimeMillis();
     long duration = 0;
     boolean timeout = false;
-    while (!timeout && state.equals(LockState.WAITING)) {
-      lockResponse = metaClients.run(client -> client.checkLock(lockId));
-      state = lockResponse.getState();
 
-      // check timeout
-      duration = System.currentTimeMillis() - start;
-      if (duration > lockAcquireTimeout) {
+    if (state.get().equals(LockState.WAITING)) {
+      try {
+        Tasks.foreach(lockId)
+            .retry(Integer.MAX_VALUE - 100) // Endless retries bound by timeouts. Tasks.retry adds 1 for "first try".
+            .exponentialBackoff(
+                lockCheckMinWaitTime,
+                lockCheckMaxWaitTime,
+                lockAcquireTimeout,
+                1.5)
+            .throwFailureWhenFinished()
+            .onlyRetryOn(WaitingForLockException.class)
+            .run(id -> {
+              try {
+                LockResponse response = metaClients.run(client -> client.checkLock(id));
+                LockState newState = response.getState();
+                state.set(newState);
+                if (newState.equals(LockState.WAITING)) {
+                  throw new WaitingForLockException("Waiting for lock.");
+                }
+              } catch (InterruptedException e) {
+                Thread.currentThread().interrupt();
+                throw new RuntimeException("Interrupted while checking lock status.", e);

Review comment:
       You are correct. I was fixated on failing the execution. This suggestion works nicely and I have a test case for this. Thank you.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] pvary commented on a change in pull request #1873: Hive: Make lock check retries backoff exponentially

Posted by GitBox <gi...@apache.org>.
pvary commented on a change in pull request #1873:
URL: https://github.com/apache/iceberg/pull/1873#discussion_r535925641



##########
File path: hive-metastore/src/main/java/org/apache/iceberg/hive/HiveTableOperations.java
##########
@@ -70,21 +72,38 @@
 public class HiveTableOperations extends BaseMetastoreTableOperations {
   private static final Logger LOG = LoggerFactory.getLogger(HiveTableOperations.class);
 
-  private static final String HIVE_ACQUIRE_LOCK_STATE_TIMEOUT_MS = "iceberg.hive.lock-timeout-ms";
-  private static final long HIVE_ACQUIRE_LOCK_STATE_TIMEOUT_MS_DEFAULT = 3 * 60 * 1000; // 3 minutes
+  private static final String HIVE_ACQUIRE_LOCK_TIMEOUT_MS = "iceberg.hive.lock-timeout-ms";
+  private static final String HIVE_LOCK_CHECK_MIN_WAIT_MS = "iceberg.hive.lock-check-min-wait-ms";
+  private static final String HIVE_LOCK_CHECK_MAX_WAIT_MS = "iceberg.hive.lock-check-max-wait-ms";
+  private static final String HIVE_LOCK_CHECK_BACKOFF_SCALE_FACTOR = "iceberg.hive.lock-check-backoff-scale-factor";
+  private static final long HIVE_ACQUIRE_LOCK_TIMEOUT_MS_DEFAULT = 3 * 60 * 1000; // 3 minutes
+  private static final long HIVE_LOCK_CHECK_MIN_WAIT_MS_DEFAULT = 50; // 50 milliseconds
+  private static final long HIVE_LOCK_CHECK_MAX_WAIT_MS_DEFAULT = 5 * 1000; // 5 seconds

Review comment:
       Might worth to mention in the documentation, or somewhere that this should be smaller than `hive.txn.timeout` or in newer versions `metastore.txn.timeout` otherwise the locks might be timed out without because of the lack of heartbeat.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] aokolnychyi commented on a change in pull request #1873: Hive: Make lock check retries backoff exponentially

Posted by GitBox <gi...@apache.org>.
aokolnychyi commented on a change in pull request #1873:
URL: https://github.com/apache/iceberg/pull/1873#discussion_r535988782



##########
File path: hive-metastore/src/main/java/org/apache/iceberg/hive/HiveTableOperations.java
##########
@@ -70,21 +72,38 @@
 public class HiveTableOperations extends BaseMetastoreTableOperations {
   private static final Logger LOG = LoggerFactory.getLogger(HiveTableOperations.class);
 
-  private static final String HIVE_ACQUIRE_LOCK_STATE_TIMEOUT_MS = "iceberg.hive.lock-timeout-ms";
-  private static final long HIVE_ACQUIRE_LOCK_STATE_TIMEOUT_MS_DEFAULT = 3 * 60 * 1000; // 3 minutes
+  private static final String HIVE_ACQUIRE_LOCK_TIMEOUT_MS = "iceberg.hive.lock-timeout-ms";
+  private static final String HIVE_LOCK_CHECK_MIN_WAIT_MS = "iceberg.hive.lock-check-min-wait-ms";
+  private static final String HIVE_LOCK_CHECK_MAX_WAIT_MS = "iceberg.hive.lock-check-max-wait-ms";
+  private static final String HIVE_LOCK_CHECK_BACKOFF_SCALE_FACTOR = "iceberg.hive.lock-check-backoff-scale-factor";
+  private static final long HIVE_ACQUIRE_LOCK_TIMEOUT_MS_DEFAULT = 3 * 60 * 1000; // 3 minutes
+  private static final long HIVE_LOCK_CHECK_MIN_WAIT_MS_DEFAULT = 50; // 50 milliseconds
+  private static final long HIVE_LOCK_CHECK_MAX_WAIT_MS_DEFAULT = 5 * 1000; // 5 seconds

Review comment:
       We should also add these configs to `configuration.md` to the rest of Hadoop conf, @raptond.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] raptond commented on a change in pull request #1873: Hive: Make lock check retries backoff exponentially

Posted by GitBox <gi...@apache.org>.
raptond commented on a change in pull request #1873:
URL: https://github.com/apache/iceberg/pull/1873#discussion_r539077810



##########
File path: hive-metastore/src/main/java/org/apache/iceberg/hive/HiveTableOperations.java
##########
@@ -304,32 +330,57 @@ private long acquireLock() throws UnknownHostException, TException, InterruptedE
         System.getProperty("user.name"),
         InetAddress.getLocalHost().getHostName());
     LockResponse lockResponse = metaClients.run(client -> client.lock(lockRequest));
-    LockState state = lockResponse.getState();
+    AtomicReference<LockState> state = new AtomicReference<>(lockResponse.getState());
     long lockId = lockResponse.getLockid();
 
     final long start = System.currentTimeMillis();
     long duration = 0;
     boolean timeout = false;
-    while (!timeout && state.equals(LockState.WAITING)) {
-      lockResponse = metaClients.run(client -> client.checkLock(lockId));
-      state = lockResponse.getState();
 
-      // check timeout
-      duration = System.currentTimeMillis() - start;
-      if (duration > lockAcquireTimeout) {
+    if (state.get().equals(LockState.WAITING)) {
+      try {
+        Tasks.foreach(lockId)
+            .retry(Integer.MAX_VALUE - 100) // Endless retries bound by timeouts. Tasks.retry adds 1 for "first try".
+            .exponentialBackoff(
+                lockCheckMinWaitTime,
+                lockCheckMaxWaitTime,
+                lockAcquireTimeout,
+                lockCheckBackoffScaleFactor)
+            .throwFailureWhenFinished()
+            .onlyRetryOn(WaitingForLockException.class)
+            .run(id -> {
+              try {
+                LockResponse response = metaClients.run(client -> client.checkLock(id));
+                LockState newState = response.getState();
+                state.set(newState);
+                if (newState.equals(LockState.WAITING)) {
+                  throw WAITING_FOR_LOCK_EXCEPTION;
+                }
+              } catch (InterruptedException | TException e) {

Review comment:
       Taken care as per [the other comment](https://github.com/apache/iceberg/pull/1873#discussion_r539077360).




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] aokolnychyi edited a comment on pull request #1873: Hive: Make lock check retries backoff exponentially

Posted by GitBox <gi...@apache.org>.
aokolnychyi edited a comment on pull request #1873:
URL: https://github.com/apache/iceberg/pull/1873#issuecomment-738705947


   This change looks great to me, just minor comments in addition to what @pvary mentioned.
   Thanks for working on this, @raptond!
   
   We could add fewer configs but I'd be in favor of what this PR does. It was really painful when we hit this problem and couldn't do anything without changing the code so having props to configure every aspect sounds good to me. It is rather a sensitive area and more control here seems justified to me.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] raptond commented on a change in pull request #1873: Hive: Make lock check retries backoff exponentially

Posted by GitBox <gi...@apache.org>.
raptond commented on a change in pull request #1873:
URL: https://github.com/apache/iceberg/pull/1873#discussion_r536348290



##########
File path: hive-metastore/src/main/java/org/apache/iceberg/hive/HiveTableOperations.java
##########
@@ -70,21 +72,38 @@
 public class HiveTableOperations extends BaseMetastoreTableOperations {
   private static final Logger LOG = LoggerFactory.getLogger(HiveTableOperations.class);
 
-  private static final String HIVE_ACQUIRE_LOCK_STATE_TIMEOUT_MS = "iceberg.hive.lock-timeout-ms";
-  private static final long HIVE_ACQUIRE_LOCK_STATE_TIMEOUT_MS_DEFAULT = 3 * 60 * 1000; // 3 minutes
+  private static final String HIVE_ACQUIRE_LOCK_TIMEOUT_MS = "iceberg.hive.lock-timeout-ms";
+  private static final String HIVE_LOCK_CHECK_MIN_WAIT_MS = "iceberg.hive.lock-check-min-wait-ms";
+  private static final String HIVE_LOCK_CHECK_MAX_WAIT_MS = "iceberg.hive.lock-check-max-wait-ms";
+  private static final String HIVE_LOCK_CHECK_BACKOFF_SCALE_FACTOR = "iceberg.hive.lock-check-backoff-scale-factor";
+  private static final long HIVE_ACQUIRE_LOCK_TIMEOUT_MS_DEFAULT = 3 * 60 * 1000; // 3 minutes
+  private static final long HIVE_LOCK_CHECK_MIN_WAIT_MS_DEFAULT = 50; // 50 milliseconds
+  private static final long HIVE_LOCK_CHECK_MAX_WAIT_MS_DEFAULT = 5 * 1000; // 5 seconds
+  private static final double HIVE_LOCK_CHECK_BACKOFF_SCALE_FACTOR_DEFAULT = 1.5;
   private static final DynMethods.UnboundMethod ALTER_TABLE = DynMethods.builder("alter_table")
       .impl(HiveMetaStoreClient.class, "alter_table_with_environmentContext",
           String.class, String.class, Table.class, EnvironmentContext.class)
       .impl(HiveMetaStoreClient.class, "alter_table",
           String.class, String.class, Table.class, EnvironmentContext.class)
       .build();
 
+  protected static class WaitingForLockException extends RuntimeException {
+    public WaitingForLockException(String message) {
+      super(message);
+    }
+  }
+
+  private static final WaitingForLockException WAITING_FOR_LOCK_EXCEPTION =

Review comment:
       It's not needed to be a single instance and holding a different stack trace than where it originated. I'll fix this. 




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] aokolnychyi commented on pull request #1873: Hive: Make lock check retries backoff exponentially

Posted by GitBox <gi...@apache.org>.
aokolnychyi commented on pull request #1873:
URL: https://github.com/apache/iceberg/pull/1873#issuecomment-738705947


   This change looks great to me, just minor comments in addition to what @pvary mentioned.
   Thanks for working on this, @raptond!
   
   We could add fewer configs but I'd be in favor of what this PR does. It was really painful when we hit this problem and couldn't do anything without changing the code so having props to configure every aspect sounds good to me. It is rather a sensitive area and more control here would be good.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] raptond commented on a change in pull request #1873: Hive: Make lock check retries backoff exponentially

Posted by GitBox <gi...@apache.org>.
raptond commented on a change in pull request #1873:
URL: https://github.com/apache/iceberg/pull/1873#discussion_r539112140



##########
File path: hive-metastore/src/main/java/org/apache/iceberg/hive/HiveTableOperations.java
##########
@@ -70,21 +72,38 @@
 public class HiveTableOperations extends BaseMetastoreTableOperations {
   private static final Logger LOG = LoggerFactory.getLogger(HiveTableOperations.class);
 
-  private static final String HIVE_ACQUIRE_LOCK_STATE_TIMEOUT_MS = "iceberg.hive.lock-timeout-ms";
-  private static final long HIVE_ACQUIRE_LOCK_STATE_TIMEOUT_MS_DEFAULT = 3 * 60 * 1000; // 3 minutes
+  private static final String HIVE_ACQUIRE_LOCK_TIMEOUT_MS = "iceberg.hive.lock-timeout-ms";
+  private static final String HIVE_LOCK_CHECK_MIN_WAIT_MS = "iceberg.hive.lock-check-min-wait-ms";
+  private static final String HIVE_LOCK_CHECK_MAX_WAIT_MS = "iceberg.hive.lock-check-max-wait-ms";
+  private static final String HIVE_LOCK_CHECK_BACKOFF_SCALE_FACTOR = "iceberg.hive.lock-check-backoff-scale-factor";
+  private static final long HIVE_ACQUIRE_LOCK_TIMEOUT_MS_DEFAULT = 3 * 60 * 1000; // 3 minutes
+  private static final long HIVE_LOCK_CHECK_MIN_WAIT_MS_DEFAULT = 50; // 50 milliseconds
+  private static final long HIVE_LOCK_CHECK_MAX_WAIT_MS_DEFAULT = 5 * 1000; // 5 seconds

Review comment:
       @aokolnychyi @pvary - 
   ![image](https://user-images.githubusercontent.com/30231978/101605486-12138680-39b7-11eb-8b9c-bad652894e71.png)
   
   
   




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] aokolnychyi commented on pull request #1873: Hive: Make lock check retries backoff exponentially

Posted by GitBox <gi...@apache.org>.
aokolnychyi commented on pull request #1873:
URL: https://github.com/apache/iceberg/pull/1873#issuecomment-742693183


   @raptond, do you want to work on catalog options for this next?


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] jackye1995 commented on a change in pull request #1873: Hive: Make lock check retries backoff exponentially

Posted by GitBox <gi...@apache.org>.
jackye1995 commented on a change in pull request #1873:
URL: https://github.com/apache/iceberg/pull/1873#discussion_r536270675



##########
File path: hive-metastore/src/main/java/org/apache/iceberg/hive/HiveTableOperations.java
##########
@@ -70,21 +72,38 @@
 public class HiveTableOperations extends BaseMetastoreTableOperations {
   private static final Logger LOG = LoggerFactory.getLogger(HiveTableOperations.class);
 
-  private static final String HIVE_ACQUIRE_LOCK_STATE_TIMEOUT_MS = "iceberg.hive.lock-timeout-ms";
-  private static final long HIVE_ACQUIRE_LOCK_STATE_TIMEOUT_MS_DEFAULT = 3 * 60 * 1000; // 3 minutes
+  private static final String HIVE_ACQUIRE_LOCK_TIMEOUT_MS = "iceberg.hive.lock-timeout-ms";
+  private static final String HIVE_LOCK_CHECK_MIN_WAIT_MS = "iceberg.hive.lock-check-min-wait-ms";
+  private static final String HIVE_LOCK_CHECK_MAX_WAIT_MS = "iceberg.hive.lock-check-max-wait-ms";
+  private static final String HIVE_LOCK_CHECK_BACKOFF_SCALE_FACTOR = "iceberg.hive.lock-check-backoff-scale-factor";
+  private static final long HIVE_ACQUIRE_LOCK_TIMEOUT_MS_DEFAULT = 3 * 60 * 1000; // 3 minutes
+  private static final long HIVE_LOCK_CHECK_MIN_WAIT_MS_DEFAULT = 50; // 50 milliseconds
+  private static final long HIVE_LOCK_CHECK_MAX_WAIT_MS_DEFAULT = 5 * 1000; // 5 seconds

Review comment:
       Yes agree. But Hive is currently built around reading from Hadoop configs. If we want to change it to use catalog properties, we need to also change all the places that loads `HiveCatalog` using the constructor `public HiveCatalog(Configuration conf)`, such as https://github.com/apache/iceberg/blob/master/mr/src/main/java/org/apache/iceberg/mr/Catalogs.java#L215




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] rdblue commented on a change in pull request #1873: Hive: Make lock check retries backoff exponentially

Posted by GitBox <gi...@apache.org>.
rdblue commented on a change in pull request #1873:
URL: https://github.com/apache/iceberg/pull/1873#discussion_r536251635



##########
File path: hive-metastore/src/main/java/org/apache/iceberg/hive/HiveTableOperations.java
##########
@@ -70,21 +72,38 @@
 public class HiveTableOperations extends BaseMetastoreTableOperations {
   private static final Logger LOG = LoggerFactory.getLogger(HiveTableOperations.class);
 
-  private static final String HIVE_ACQUIRE_LOCK_STATE_TIMEOUT_MS = "iceberg.hive.lock-timeout-ms";
-  private static final long HIVE_ACQUIRE_LOCK_STATE_TIMEOUT_MS_DEFAULT = 3 * 60 * 1000; // 3 minutes
+  private static final String HIVE_ACQUIRE_LOCK_TIMEOUT_MS = "iceberg.hive.lock-timeout-ms";
+  private static final String HIVE_LOCK_CHECK_MIN_WAIT_MS = "iceberg.hive.lock-check-min-wait-ms";
+  private static final String HIVE_LOCK_CHECK_MAX_WAIT_MS = "iceberg.hive.lock-check-max-wait-ms";
+  private static final String HIVE_LOCK_CHECK_BACKOFF_SCALE_FACTOR = "iceberg.hive.lock-check-backoff-scale-factor";
+  private static final long HIVE_ACQUIRE_LOCK_TIMEOUT_MS_DEFAULT = 3 * 60 * 1000; // 3 minutes
+  private static final long HIVE_LOCK_CHECK_MIN_WAIT_MS_DEFAULT = 50; // 50 milliseconds
+  private static final long HIVE_LOCK_CHECK_MAX_WAIT_MS_DEFAULT = 5 * 1000; // 5 seconds

Review comment:
       The Glue catalog is introducing support for a lock using DynamoDB. It would be nice to standardize these options across catalogs so that we only need to document them once and they work the same way. FYI @jackye1995.
   
   I think it would also make sense for these to be catalog options, rather than pulled from the Hive configuration. We used HiveConf originally because we didn't have catalog-specific configuration, but now I think it would make sense to move these into catalog properties. We don't want to increase the cases where we use a Hadoop Configuration.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] pvary commented on pull request #1873: Hive: Make lock check retries backoff exponentially

Posted by GitBox <gi...@apache.org>.
pvary commented on pull request #1873:
URL: https://github.com/apache/iceberg/pull/1873#issuecomment-738659966


   Minor comments, looks good to me (non-binding)


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] raptond commented on a change in pull request #1873: Hive: Make lock check retries backoff exponentially

Posted by GitBox <gi...@apache.org>.
raptond commented on a change in pull request #1873:
URL: https://github.com/apache/iceberg/pull/1873#discussion_r539071463



##########
File path: hive-metastore/src/test/java/org/apache/iceberg/hive/TestHiveCommitLocks.java
##########
@@ -0,0 +1,186 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.iceberg.hive;
+
+import java.util.concurrent.atomic.AtomicReference;
+import org.apache.hadoop.conf.Configuration;
+import org.apache.hadoop.hive.metastore.HiveMetaStoreClient;
+import org.apache.hadoop.hive.metastore.api.LockResponse;
+import org.apache.hadoop.hive.metastore.api.LockState;
+import org.apache.iceberg.AssertHelpers;
+import org.apache.iceberg.HasTableOperations;
+import org.apache.iceberg.Table;
+import org.apache.iceberg.TableMetadata;
+import org.apache.iceberg.exceptions.CommitFailedException;
+import org.apache.iceberg.types.Types;
+import org.apache.thrift.TException;
+import org.junit.After;
+import org.junit.Assert;
+import org.junit.Before;
+import org.junit.Test;
+
+import static org.mockito.Matchers.any;
+import static org.mockito.Matchers.eq;
+import static org.mockito.Mockito.doNothing;
+import static org.mockito.Mockito.doReturn;
+import static org.mockito.Mockito.spy;
+import static org.mockito.Mockito.when;
+
+public class TestHiveCommitLocks extends HiveTableBaseTest {
+  HiveTableOperations ops = null;
+  HiveTableOperations spyOps = null;
+  HiveClientPool spyClientPool = null;
+  HiveMetaStoreClient spyClient = null;
+  TableMetadata metadataV1 = null;
+  TableMetadata metadataV2 = null;
+
+  long dummyLockId = 500L;
+  LockResponse waitLockResponse = new LockResponse(dummyLockId, LockState.WAITING);
+  LockResponse acquiredLockResponse = new LockResponse(dummyLockId, LockState.ACQUIRED);
+  LockResponse notAcquiredLockResponse = new LockResponse(dummyLockId, LockState.NOT_ACQUIRED);
+
+  @Before
+  public void before() throws Exception {
+    Table table = catalog.loadTable(TABLE_IDENTIFIER);
+    ops = (HiveTableOperations) ((HasTableOperations) table).operations();
+    String dbName = TABLE_IDENTIFIER.namespace().level(0);
+    String tableName = TABLE_IDENTIFIER.name();
+    Configuration overriddenHiveConf = new Configuration(hiveConf);
+    overriddenHiveConf.setLong("iceberg.hive.lock-timeout-ms", 10 * 1000);
+    overriddenHiveConf.setLong("iceberg.hive.lock-check-min-wait-ms", 50);
+    overriddenHiveConf.setLong("iceberg.hive.lock-check-max-wait-ms", 5 * 1000);
+    overriddenHiveConf.setDouble("iceberg.hive.lock-check-backoff-scale-factor", 3.0);
+
+    metadataV1 = ops.current();
+
+    table.updateSchema()
+        .addColumn("n", Types.IntegerType.get())
+        .commit();
+
+    ops.refresh();
+
+    metadataV2 = ops.current();
+
+    Assert.assertEquals(2, ops.current().schema().columns().size());
+
+    spyClientPool = spy(new HiveClientPool(1, overriddenHiveConf));
+    AtomicReference<HiveMetaStoreClient> spyClientRef = new AtomicReference<>();
+
+    when(spyClientPool.newClient()).thenAnswer(invocation -> {
+      HiveMetaStoreClient client = (HiveMetaStoreClient) invocation.callRealMethod();
+      spyClientRef.set(spy(client));
+      return spyClientRef.get();
+    });
+
+    spyOps = spy(new HiveTableOperations(overriddenHiveConf, spyClientPool, ops.io(), catalog.name(),
+        dbName, tableName));
+    spyClientPool.run(client -> client.isLocalMetaStore()); // To ensure new client is created.
+    Assert.assertNotNull(spyClientRef.get());
+
+    spyClient = spyClientRef.get();
+  }
+
+  @After
+  public void cleanup() {
+    try {
+      spyClientPool.close();
+    } catch (Throwable t) {
+      // Ignore any exception
+    }
+  }
+
+  @Test
+  public void testLockAcquisitionAtFirstTime() throws TException, InterruptedException {
+    doReturn(acquiredLockResponse).when(spyClient).lock(any());
+    doNothing().when(spyOps).doUnlock(eq(dummyLockId));
+
+    spyOps.doCommit(metadataV2, metadataV1);
+
+    Assert.assertEquals(1, spyOps.current().schema().columns().size()); // should be 1 again
+  }
+
+  @Test
+  public void testLockAcquisitionAfterRetries() throws TException, InterruptedException {

Review comment:
       yes, here the [HiveTableOperations.doUnlock](https://github.com/apache/iceberg/blob/master/hive-metastore/src/main/java/org/apache/iceberg/hive/HiveTableOperations.java#L340) method throws `InterruptedException`. 




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] pvary commented on a change in pull request #1873: Hive: Make lock check retries backoff exponentially

Posted by GitBox <gi...@apache.org>.
pvary commented on a change in pull request #1873:
URL: https://github.com/apache/iceberg/pull/1873#discussion_r535923649



##########
File path: hive-metastore/src/main/java/org/apache/iceberg/hive/HiveTableOperations.java
##########
@@ -70,21 +72,38 @@
 public class HiveTableOperations extends BaseMetastoreTableOperations {
   private static final Logger LOG = LoggerFactory.getLogger(HiveTableOperations.class);
 
-  private static final String HIVE_ACQUIRE_LOCK_STATE_TIMEOUT_MS = "iceberg.hive.lock-timeout-ms";
-  private static final long HIVE_ACQUIRE_LOCK_STATE_TIMEOUT_MS_DEFAULT = 3 * 60 * 1000; // 3 minutes
+  private static final String HIVE_ACQUIRE_LOCK_TIMEOUT_MS = "iceberg.hive.lock-timeout-ms";
+  private static final String HIVE_LOCK_CHECK_MIN_WAIT_MS = "iceberg.hive.lock-check-min-wait-ms";
+  private static final String HIVE_LOCK_CHECK_MAX_WAIT_MS = "iceberg.hive.lock-check-max-wait-ms";
+  private static final String HIVE_LOCK_CHECK_BACKOFF_SCALE_FACTOR = "iceberg.hive.lock-check-backoff-scale-factor";
+  private static final long HIVE_ACQUIRE_LOCK_TIMEOUT_MS_DEFAULT = 3 * 60 * 1000; // 3 minutes

Review comment:
       Might worth to mention in the documentation, or somewhere that this should be smaller than `hive.txn.timeout` or in newer versions `metastore.txn.timeout` otherwise the locks might be timed out without because of the lack of heartbeat.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] raptond commented on a change in pull request #1873: Hive: Make lock check retries backoff exponentially

Posted by GitBox <gi...@apache.org>.
raptond commented on a change in pull request #1873:
URL: https://github.com/apache/iceberg/pull/1873#discussion_r539075826



##########
File path: hive-metastore/src/main/java/org/apache/iceberg/hive/HiveTableOperations.java
##########
@@ -304,32 +330,57 @@ private long acquireLock() throws UnknownHostException, TException, InterruptedE
         System.getProperty("user.name"),
         InetAddress.getLocalHost().getHostName());
     LockResponse lockResponse = metaClients.run(client -> client.lock(lockRequest));
-    LockState state = lockResponse.getState();
+    AtomicReference<LockState> state = new AtomicReference<>(lockResponse.getState());
     long lockId = lockResponse.getLockid();
 
     final long start = System.currentTimeMillis();
     long duration = 0;
     boolean timeout = false;
-    while (!timeout && state.equals(LockState.WAITING)) {
-      lockResponse = metaClients.run(client -> client.checkLock(lockId));
-      state = lockResponse.getState();
 
-      // check timeout
-      duration = System.currentTimeMillis() - start;
-      if (duration > lockAcquireTimeout) {
+    if (state.get().equals(LockState.WAITING)) {
+      try {
+        Tasks.foreach(lockId)
+            .retry(Integer.MAX_VALUE - 100) // Endless retries bound by timeouts. Tasks.retry adds 1 for "first try".

Review comment:
       👍 +1 I have added the rationale in the comments.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] raptond commented on a change in pull request #1873: Hive: Make lock check retries backoff exponentially

Posted by GitBox <gi...@apache.org>.
raptond commented on a change in pull request #1873:
URL: https://github.com/apache/iceberg/pull/1873#discussion_r538762910



##########
File path: hive-metastore/src/main/java/org/apache/iceberg/hive/HiveTableOperations.java
##########
@@ -304,32 +330,57 @@ private long acquireLock() throws UnknownHostException, TException, InterruptedE
         System.getProperty("user.name"),
         InetAddress.getLocalHost().getHostName());
     LockResponse lockResponse = metaClients.run(client -> client.lock(lockRequest));
-    LockState state = lockResponse.getState();
+    AtomicReference<LockState> state = new AtomicReference<>(lockResponse.getState());
     long lockId = lockResponse.getLockid();
 
     final long start = System.currentTimeMillis();
     long duration = 0;
     boolean timeout = false;
-    while (!timeout && state.equals(LockState.WAITING)) {
-      lockResponse = metaClients.run(client -> client.checkLock(lockId));
-      state = lockResponse.getState();
 
-      // check timeout
-      duration = System.currentTimeMillis() - start;
-      if (duration > lockAcquireTimeout) {
+    if (state.get().equals(LockState.WAITING)) {
+      try {
+        Tasks.foreach(lockId)
+            .retry(Integer.MAX_VALUE - 100) // Endless retries bound by timeouts. Tasks.retry adds 1 for "first try".
+            .exponentialBackoff(
+                lockCheckMinWaitTime,
+                lockCheckMaxWaitTime,
+                lockAcquireTimeout,
+                lockCheckBackoffScaleFactor)
+            .throwFailureWhenFinished()
+            .onlyRetryOn(WaitingForLockException.class)
+            .run(id -> {
+              try {
+                LockResponse response = metaClients.run(client -> client.checkLock(id));
+                LockState newState = response.getState();
+                state.set(newState);
+                if (newState.equals(LockState.WAITING)) {
+                  throw WAITING_FOR_LOCK_EXCEPTION;
+                }
+              } catch (InterruptedException | TException e) {

Review comment:
       The code looks better after this comment. However throwing `WaitingForLockException` ends up losing the source of the original `InterruptedException` because it would get handled here: https://github.com/apache/iceberg/blob/master/core/src/main/java/org/apache/iceberg/util/Tasks.java#L452
   
   So, I chose to throw `RuntimeException` which will stop the retry and preserve the source stack trace. 
   
   




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] rdblue commented on a change in pull request #1873: Hive: Make lock check retries backoff exponentially

Posted by GitBox <gi...@apache.org>.
rdblue commented on a change in pull request #1873:
URL: https://github.com/apache/iceberg/pull/1873#discussion_r538893914



##########
File path: hive-metastore/src/main/java/org/apache/iceberg/hive/HiveTableOperations.java
##########
@@ -294,32 +313,50 @@ private long acquireLock() throws UnknownHostException, TException, InterruptedE
         System.getProperty("user.name"),
         InetAddress.getLocalHost().getHostName());
     LockResponse lockResponse = metaClients.run(client -> client.lock(lockRequest));
-    LockState state = lockResponse.getState();
+    AtomicReference<LockState> state = new AtomicReference<>(lockResponse.getState());
     long lockId = lockResponse.getLockid();
 
     final long start = System.currentTimeMillis();
     long duration = 0;
     boolean timeout = false;
-    while (!timeout && state.equals(LockState.WAITING)) {
-      lockResponse = metaClients.run(client -> client.checkLock(lockId));
-      state = lockResponse.getState();
 
-      // check timeout
-      duration = System.currentTimeMillis() - start;
-      if (duration > lockAcquireTimeout) {
+    if (state.get().equals(LockState.WAITING)) {
+      try {
+        Tasks.foreach(lockId)
+            .retry(Integer.MAX_VALUE - 100) // Endless retries bound by timeouts. Tasks.retry adds 1 for "first try".
+            .exponentialBackoff(
+                lockCheckMinWaitTime,
+                lockCheckMaxWaitTime,
+                lockAcquireTimeout,
+                1.5)
+            .throwFailureWhenFinished()
+            .onlyRetryOn(WaitingForLockException.class)
+            .run(id -> {
+              try {
+                LockResponse response = metaClients.run(client -> client.checkLock(id));
+                LockState newState = response.getState();
+                state.set(newState);
+                if (newState.equals(LockState.WAITING)) {
+                  throw new WaitingForLockException("Waiting for lock.");
+                }
+              } catch (InterruptedException e) {
+                Thread.currentThread().interrupt();
+                throw new RuntimeException("Interrupted while checking lock status.", e);

Review comment:
       I don't think it is necessary to throw `RuntimeException` here. If this doesn't throw `WaitingForLockException` then it will exit and move on. Since timeout is not set, it would hit the check for whether the lock was acquired and fail, resulting in the `CommitFailedException`.
   
   I think that's a fairly reasonable way to handle an interrupt without wrapping it in a `RuntimeException`.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] RussellSpitzer commented on a change in pull request #1873: Hive: Make lock check retries backoff exponentially

Posted by GitBox <gi...@apache.org>.
RussellSpitzer commented on a change in pull request #1873:
URL: https://github.com/apache/iceberg/pull/1873#discussion_r536165380



##########
File path: hive-metastore/src/main/java/org/apache/iceberg/hive/HiveTableOperations.java
##########
@@ -304,32 +330,57 @@ private long acquireLock() throws UnknownHostException, TException, InterruptedE
         System.getProperty("user.name"),
         InetAddress.getLocalHost().getHostName());
     LockResponse lockResponse = metaClients.run(client -> client.lock(lockRequest));
-    LockState state = lockResponse.getState();
+    AtomicReference<LockState> state = new AtomicReference<>(lockResponse.getState());
     long lockId = lockResponse.getLockid();
 
     final long start = System.currentTimeMillis();
     long duration = 0;
     boolean timeout = false;
-    while (!timeout && state.equals(LockState.WAITING)) {
-      lockResponse = metaClients.run(client -> client.checkLock(lockId));
-      state = lockResponse.getState();
 
-      // check timeout
-      duration = System.currentTimeMillis() - start;
-      if (duration > lockAcquireTimeout) {
+    if (state.get().equals(LockState.WAITING)) {
+      try {
+        Tasks.foreach(lockId)
+            .retry(Integer.MAX_VALUE - 100) // Endless retries bound by timeouts. Tasks.retry adds 1 for "first try".

Review comment:
       Why - 100?




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] aokolnychyi merged pull request #1873: Hive: Make lock check retries backoff exponentially

Posted by GitBox <gi...@apache.org>.
aokolnychyi merged pull request #1873:
URL: https://github.com/apache/iceberg/pull/1873


   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] raptond commented on pull request #1873: Hive: Make lock check retries backoff exponentially

Posted by GitBox <gi...@apache.org>.
raptond commented on pull request #1873:
URL: https://github.com/apache/iceberg/pull/1873#issuecomment-742712798


   @aokolnychyi - Sure, I will work on to submit a new one with catalog options (properties via initialize method) overriding the Configuration. 


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] aokolnychyi commented on a change in pull request #1873: Hive: Make lock check retries backoff exponentially

Posted by GitBox <gi...@apache.org>.
aokolnychyi commented on a change in pull request #1873:
URL: https://github.com/apache/iceberg/pull/1873#discussion_r535992519



##########
File path: hive-metastore/src/main/java/org/apache/iceberg/hive/HiveTableOperations.java
##########
@@ -70,21 +72,38 @@
 public class HiveTableOperations extends BaseMetastoreTableOperations {
   private static final Logger LOG = LoggerFactory.getLogger(HiveTableOperations.class);
 
-  private static final String HIVE_ACQUIRE_LOCK_STATE_TIMEOUT_MS = "iceberg.hive.lock-timeout-ms";
-  private static final long HIVE_ACQUIRE_LOCK_STATE_TIMEOUT_MS_DEFAULT = 3 * 60 * 1000; // 3 minutes
+  private static final String HIVE_ACQUIRE_LOCK_TIMEOUT_MS = "iceberg.hive.lock-timeout-ms";
+  private static final String HIVE_LOCK_CHECK_MIN_WAIT_MS = "iceberg.hive.lock-check-min-wait-ms";
+  private static final String HIVE_LOCK_CHECK_MAX_WAIT_MS = "iceberg.hive.lock-check-max-wait-ms";
+  private static final String HIVE_LOCK_CHECK_BACKOFF_SCALE_FACTOR = "iceberg.hive.lock-check-backoff-scale-factor";
+  private static final long HIVE_ACQUIRE_LOCK_TIMEOUT_MS_DEFAULT = 3 * 60 * 1000; // 3 minutes
+  private static final long HIVE_LOCK_CHECK_MIN_WAIT_MS_DEFAULT = 50; // 50 milliseconds
+  private static final long HIVE_LOCK_CHECK_MAX_WAIT_MS_DEFAULT = 5 * 1000; // 5 seconds
+  private static final double HIVE_LOCK_CHECK_BACKOFF_SCALE_FACTOR_DEFAULT = 1.5;
   private static final DynMethods.UnboundMethod ALTER_TABLE = DynMethods.builder("alter_table")
       .impl(HiveMetaStoreClient.class, "alter_table_with_environmentContext",
           String.class, String.class, Table.class, EnvironmentContext.class)
       .impl(HiveMetaStoreClient.class, "alter_table",
           String.class, String.class, Table.class, EnvironmentContext.class)
       .build();
 
+  protected static class WaitingForLockException extends RuntimeException {
+    public WaitingForLockException(String message) {
+      super(message);
+    }
+  }
+
+  private static final WaitingForLockException WAITING_FOR_LOCK_EXCEPTION =

Review comment:
       nit: it seems we use it only in one place. Would it make sense to just call `throw new` there?




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] rdblue commented on a change in pull request #1873: Hive: Make lock check retries backoff exponentially

Posted by GitBox <gi...@apache.org>.
rdblue commented on a change in pull request #1873:
URL: https://github.com/apache/iceberg/pull/1873#discussion_r538851899



##########
File path: hive-metastore/src/main/java/org/apache/iceberg/hive/HiveTableOperations.java
##########
@@ -304,32 +330,57 @@ private long acquireLock() throws UnknownHostException, TException, InterruptedE
         System.getProperty("user.name"),
         InetAddress.getLocalHost().getHostName());
     LockResponse lockResponse = metaClients.run(client -> client.lock(lockRequest));
-    LockState state = lockResponse.getState();
+    AtomicReference<LockState> state = new AtomicReference<>(lockResponse.getState());
     long lockId = lockResponse.getLockid();
 
     final long start = System.currentTimeMillis();
     long duration = 0;
     boolean timeout = false;
-    while (!timeout && state.equals(LockState.WAITING)) {
-      lockResponse = metaClients.run(client -> client.checkLock(lockId));
-      state = lockResponse.getState();
 
-      // check timeout
-      duration = System.currentTimeMillis() - start;
-      if (duration > lockAcquireTimeout) {
+    if (state.get().equals(LockState.WAITING)) {
+      try {
+        Tasks.foreach(lockId)
+            .retry(Integer.MAX_VALUE - 100) // Endless retries bound by timeouts. Tasks.retry adds 1 for "first try".

Review comment:
       I think it would be worth noting the rational for a choice like this in a comment.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] raptond commented on pull request #1873: Hive: Make lock check retries backoff exponentially

Posted by GitBox <gi...@apache.org>.
raptond commented on pull request #1873:
URL: https://github.com/apache/iceberg/pull/1873#issuecomment-741624785


   All comments taken care. 


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] aokolnychyi commented on a change in pull request #1873: Hive: Make lock check retries backoff exponentially

Posted by GitBox <gi...@apache.org>.
aokolnychyi commented on a change in pull request #1873:
URL: https://github.com/apache/iceberg/pull/1873#discussion_r536382097



##########
File path: hive-metastore/src/main/java/org/apache/iceberg/hive/HiveTableOperations.java
##########
@@ -70,21 +72,38 @@
 public class HiveTableOperations extends BaseMetastoreTableOperations {
   private static final Logger LOG = LoggerFactory.getLogger(HiveTableOperations.class);
 
-  private static final String HIVE_ACQUIRE_LOCK_STATE_TIMEOUT_MS = "iceberg.hive.lock-timeout-ms";
-  private static final long HIVE_ACQUIRE_LOCK_STATE_TIMEOUT_MS_DEFAULT = 3 * 60 * 1000; // 3 minutes
+  private static final String HIVE_ACQUIRE_LOCK_TIMEOUT_MS = "iceberg.hive.lock-timeout-ms";
+  private static final String HIVE_LOCK_CHECK_MIN_WAIT_MS = "iceberg.hive.lock-check-min-wait-ms";
+  private static final String HIVE_LOCK_CHECK_MAX_WAIT_MS = "iceberg.hive.lock-check-max-wait-ms";
+  private static final String HIVE_LOCK_CHECK_BACKOFF_SCALE_FACTOR = "iceberg.hive.lock-check-backoff-scale-factor";
+  private static final long HIVE_ACQUIRE_LOCK_TIMEOUT_MS_DEFAULT = 3 * 60 * 1000; // 3 minutes
+  private static final long HIVE_LOCK_CHECK_MIN_WAIT_MS_DEFAULT = 50; // 50 milliseconds
+  private static final long HIVE_LOCK_CHECK_MAX_WAIT_MS_DEFAULT = 5 * 1000; // 5 seconds

Review comment:
       That being said, I am +1 for adding catalog options.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] rdblue commented on a change in pull request #1873: Hive: Make lock check retries backoff exponentially

Posted by GitBox <gi...@apache.org>.
rdblue commented on a change in pull request #1873:
URL: https://github.com/apache/iceberg/pull/1873#discussion_r538895183



##########
File path: hive-metastore/src/main/java/org/apache/iceberg/hive/HiveTableOperations.java
##########
@@ -304,32 +330,57 @@ private long acquireLock() throws UnknownHostException, TException, InterruptedE
         System.getProperty("user.name"),
         InetAddress.getLocalHost().getHostName());
     LockResponse lockResponse = metaClients.run(client -> client.lock(lockRequest));
-    LockState state = lockResponse.getState();
+    AtomicReference<LockState> state = new AtomicReference<>(lockResponse.getState());
     long lockId = lockResponse.getLockid();
 
     final long start = System.currentTimeMillis();
     long duration = 0;
     boolean timeout = false;
-    while (!timeout && state.equals(LockState.WAITING)) {
-      lockResponse = metaClients.run(client -> client.checkLock(lockId));
-      state = lockResponse.getState();
 
-      // check timeout
-      duration = System.currentTimeMillis() - start;
-      if (duration > lockAcquireTimeout) {
+    if (state.get().equals(LockState.WAITING)) {
+      try {
+        Tasks.foreach(lockId)
+            .retry(Integer.MAX_VALUE - 100) // Endless retries bound by timeouts. Tasks.retry adds 1 for "first try".
+            .exponentialBackoff(
+                lockCheckMinWaitTime,
+                lockCheckMaxWaitTime,
+                lockAcquireTimeout,
+                lockCheckBackoffScaleFactor)
+            .throwFailureWhenFinished()
+            .onlyRetryOn(WaitingForLockException.class)
+            .run(id -> {
+              try {
+                LockResponse response = metaClients.run(client -> client.checkLock(id));
+                LockState newState = response.getState();
+                state.set(newState);
+                if (newState.equals(LockState.WAITING)) {
+                  throw WAITING_FOR_LOCK_EXCEPTION;
+                }
+              } catch (InterruptedException | TException e) {

Review comment:
       I'd probably opt to suppress the interrupt and let the code carry on after setting that the thread was interrupted. That results in a `CommitFailedException`. I don't think that preserving the stack of the `InterruptedException` is really needed, but I'm fin with it this way if you prefer it.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] aokolnychyi commented on pull request #1873: Hive: Make lock check retries backoff exponentially

Posted by GitBox <gi...@apache.org>.
aokolnychyi commented on pull request #1873:
URL: https://github.com/apache/iceberg/pull/1873#issuecomment-742692663


   Thanks everyone! The change looks solid so I merged it!


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] raptond commented on a change in pull request #1873: Hive: Make lock check retries backoff exponentially

Posted by GitBox <gi...@apache.org>.
raptond commented on a change in pull request #1873:
URL: https://github.com/apache/iceberg/pull/1873#discussion_r536348462



##########
File path: hive-metastore/src/main/java/org/apache/iceberg/hive/HiveTableOperations.java
##########
@@ -70,21 +72,38 @@
 public class HiveTableOperations extends BaseMetastoreTableOperations {
   private static final Logger LOG = LoggerFactory.getLogger(HiveTableOperations.class);
 
-  private static final String HIVE_ACQUIRE_LOCK_STATE_TIMEOUT_MS = "iceberg.hive.lock-timeout-ms";
-  private static final long HIVE_ACQUIRE_LOCK_STATE_TIMEOUT_MS_DEFAULT = 3 * 60 * 1000; // 3 minutes
+  private static final String HIVE_ACQUIRE_LOCK_TIMEOUT_MS = "iceberg.hive.lock-timeout-ms";
+  private static final String HIVE_LOCK_CHECK_MIN_WAIT_MS = "iceberg.hive.lock-check-min-wait-ms";
+  private static final String HIVE_LOCK_CHECK_MAX_WAIT_MS = "iceberg.hive.lock-check-max-wait-ms";
+  private static final String HIVE_LOCK_CHECK_BACKOFF_SCALE_FACTOR = "iceberg.hive.lock-check-backoff-scale-factor";
+  private static final long HIVE_ACQUIRE_LOCK_TIMEOUT_MS_DEFAULT = 3 * 60 * 1000; // 3 minutes
+  private static final long HIVE_LOCK_CHECK_MIN_WAIT_MS_DEFAULT = 50; // 50 milliseconds
+  private static final long HIVE_LOCK_CHECK_MAX_WAIT_MS_DEFAULT = 5 * 1000; // 5 seconds
+  private static final double HIVE_LOCK_CHECK_BACKOFF_SCALE_FACTOR_DEFAULT = 1.5;
   private static final DynMethods.UnboundMethod ALTER_TABLE = DynMethods.builder("alter_table")
       .impl(HiveMetaStoreClient.class, "alter_table_with_environmentContext",
           String.class, String.class, Table.class, EnvironmentContext.class)
       .impl(HiveMetaStoreClient.class, "alter_table",
           String.class, String.class, Table.class, EnvironmentContext.class)
       .build();
 
+  protected static class WaitingForLockException extends RuntimeException {

Review comment:
       It's a miss from using in a testing. Fixing it. 




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] rdblue commented on a change in pull request #1873: Hive: Make lock check retries backoff exponentially

Posted by GitBox <gi...@apache.org>.
rdblue commented on a change in pull request #1873:
URL: https://github.com/apache/iceberg/pull/1873#discussion_r536255049



##########
File path: hive-metastore/src/main/java/org/apache/iceberg/hive/HiveTableOperations.java
##########
@@ -304,32 +330,57 @@ private long acquireLock() throws UnknownHostException, TException, InterruptedE
         System.getProperty("user.name"),
         InetAddress.getLocalHost().getHostName());
     LockResponse lockResponse = metaClients.run(client -> client.lock(lockRequest));
-    LockState state = lockResponse.getState();
+    AtomicReference<LockState> state = new AtomicReference<>(lockResponse.getState());
     long lockId = lockResponse.getLockid();
 
     final long start = System.currentTimeMillis();
     long duration = 0;
     boolean timeout = false;
-    while (!timeout && state.equals(LockState.WAITING)) {
-      lockResponse = metaClients.run(client -> client.checkLock(lockId));
-      state = lockResponse.getState();
 
-      // check timeout
-      duration = System.currentTimeMillis() - start;
-      if (duration > lockAcquireTimeout) {
+    if (state.get().equals(LockState.WAITING)) {
+      try {
+        Tasks.foreach(lockId)
+            .retry(Integer.MAX_VALUE - 100) // Endless retries bound by timeouts. Tasks.retry adds 1 for "first try".
+            .exponentialBackoff(
+                lockCheckMinWaitTime,
+                lockCheckMaxWaitTime,
+                lockAcquireTimeout,
+                lockCheckBackoffScaleFactor)
+            .throwFailureWhenFinished()
+            .onlyRetryOn(WaitingForLockException.class)
+            .run(id -> {
+              try {
+                LockResponse response = metaClients.run(client -> client.checkLock(id));
+                LockState newState = response.getState();
+                state.set(newState);
+                if (newState.equals(LockState.WAITING)) {
+                  throw WAITING_FOR_LOCK_EXCEPTION;
+                }
+              } catch (InterruptedException | TException e) {

Review comment:
       For `InterruptedException`, why not throw `WaitingForLockException` and signal that the thread was interrupted? Then this could use the checked exception call, `run(id -> {...}, TException.class)` and would not need to wrap the exceptions.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] pvary commented on a change in pull request #1873: Hive: Make lock check retries backoff exponentially

Posted by GitBox <gi...@apache.org>.
pvary commented on a change in pull request #1873:
URL: https://github.com/apache/iceberg/pull/1873#discussion_r535935973



##########
File path: hive-metastore/src/test/java/org/apache/iceberg/hive/TestHiveMetastore.java
##########
@@ -84,7 +84,7 @@
   private ExecutorService executorService;
   private TServer server;
   private HiveMetaStore.HMSHandler baseHandler;
-  private HiveClientPool clientPool;
+  protected HiveClientPool clientPool; // Exposed for testing.

Review comment:
       nit: Could we use VisibleForTesting annotation here?




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] aokolnychyi commented on a change in pull request #1873: Hive: Make lock check retries backoff exponentially

Posted by GitBox <gi...@apache.org>.
aokolnychyi commented on a change in pull request #1873:
URL: https://github.com/apache/iceberg/pull/1873#discussion_r536168371



##########
File path: hive-metastore/src/test/java/org/apache/iceberg/hive/TestHiveCommitLocks.java
##########
@@ -0,0 +1,186 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.iceberg.hive;
+
+import java.util.concurrent.atomic.AtomicReference;
+import org.apache.hadoop.conf.Configuration;
+import org.apache.hadoop.hive.metastore.HiveMetaStoreClient;
+import org.apache.hadoop.hive.metastore.api.LockResponse;
+import org.apache.hadoop.hive.metastore.api.LockState;
+import org.apache.iceberg.AssertHelpers;
+import org.apache.iceberg.HasTableOperations;
+import org.apache.iceberg.Table;
+import org.apache.iceberg.TableMetadata;
+import org.apache.iceberg.exceptions.CommitFailedException;
+import org.apache.iceberg.types.Types;
+import org.apache.thrift.TException;
+import org.junit.After;
+import org.junit.Assert;
+import org.junit.Before;
+import org.junit.Test;
+
+import static org.mockito.Matchers.any;
+import static org.mockito.Matchers.eq;
+import static org.mockito.Mockito.doNothing;
+import static org.mockito.Mockito.doReturn;
+import static org.mockito.Mockito.spy;
+import static org.mockito.Mockito.when;
+
+public class TestHiveCommitLocks extends HiveTableBaseTest {
+  HiveTableOperations ops = null;
+  HiveTableOperations spyOps = null;
+  HiveClientPool spyClientPool = null;
+  HiveMetaStoreClient spyClient = null;
+  TableMetadata metadataV1 = null;
+  TableMetadata metadataV2 = null;
+
+  long dummyLockId = 500L;
+  LockResponse waitLockResponse = new LockResponse(dummyLockId, LockState.WAITING);
+  LockResponse acquiredLockResponse = new LockResponse(dummyLockId, LockState.ACQUIRED);
+  LockResponse notAcquiredLockResponse = new LockResponse(dummyLockId, LockState.NOT_ACQUIRED);
+
+  @Before
+  public void before() throws Exception {
+    Table table = catalog.loadTable(TABLE_IDENTIFIER);
+    ops = (HiveTableOperations) ((HasTableOperations) table).operations();
+    String dbName = TABLE_IDENTIFIER.namespace().level(0);
+    String tableName = TABLE_IDENTIFIER.name();
+    Configuration overriddenHiveConf = new Configuration(hiveConf);
+    overriddenHiveConf.setLong("iceberg.hive.lock-timeout-ms", 10 * 1000);
+    overriddenHiveConf.setLong("iceberg.hive.lock-check-min-wait-ms", 50);
+    overriddenHiveConf.setLong("iceberg.hive.lock-check-max-wait-ms", 5 * 1000);
+    overriddenHiveConf.setDouble("iceberg.hive.lock-check-backoff-scale-factor", 3.0);
+
+    metadataV1 = ops.current();
+
+    table.updateSchema()
+        .addColumn("n", Types.IntegerType.get())
+        .commit();
+
+    ops.refresh();
+
+    metadataV2 = ops.current();
+
+    Assert.assertEquals(2, ops.current().schema().columns().size());
+
+    spyClientPool = spy(new HiveClientPool(1, overriddenHiveConf));
+    AtomicReference<HiveMetaStoreClient> spyClientRef = new AtomicReference<>();
+
+    when(spyClientPool.newClient()).thenAnswer(invocation -> {
+      HiveMetaStoreClient client = (HiveMetaStoreClient) invocation.callRealMethod();
+      spyClientRef.set(spy(client));
+      return spyClientRef.get();
+    });
+
+    spyOps = spy(new HiveTableOperations(overriddenHiveConf, spyClientPool, ops.io(), catalog.name(),
+        dbName, tableName));
+    spyClientPool.run(client -> client.isLocalMetaStore()); // To ensure new client is created.
+    Assert.assertNotNull(spyClientRef.get());
+
+    spyClient = spyClientRef.get();
+  }
+
+  @After
+  public void cleanup() {
+    try {
+      spyClientPool.close();
+    } catch (Throwable t) {
+      // Ignore any exception
+    }
+  }
+
+  @Test
+  public void testLockAcquisitionAtFirstTime() throws TException, InterruptedException {
+    doReturn(acquiredLockResponse).when(spyClient).lock(any());
+    doNothing().when(spyOps).doUnlock(eq(dummyLockId));
+
+    spyOps.doCommit(metadataV2, metadataV1);
+
+    Assert.assertEquals(1, spyOps.current().schema().columns().size()); // should be 1 again
+  }
+
+  @Test
+  public void testLockAcquisitionAfterRetries() throws TException, InterruptedException {

Review comment:
       Is `InterruptedException` needed?




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] aokolnychyi commented on a change in pull request #1873: Hive: Make lock check retries backoff exponentially

Posted by GitBox <gi...@apache.org>.
aokolnychyi commented on a change in pull request #1873:
URL: https://github.com/apache/iceberg/pull/1873#discussion_r536381718



##########
File path: hive-metastore/src/main/java/org/apache/iceberg/hive/HiveTableOperations.java
##########
@@ -70,21 +72,38 @@
 public class HiveTableOperations extends BaseMetastoreTableOperations {
   private static final Logger LOG = LoggerFactory.getLogger(HiveTableOperations.class);
 
-  private static final String HIVE_ACQUIRE_LOCK_STATE_TIMEOUT_MS = "iceberg.hive.lock-timeout-ms";
-  private static final long HIVE_ACQUIRE_LOCK_STATE_TIMEOUT_MS_DEFAULT = 3 * 60 * 1000; // 3 minutes
+  private static final String HIVE_ACQUIRE_LOCK_TIMEOUT_MS = "iceberg.hive.lock-timeout-ms";
+  private static final String HIVE_LOCK_CHECK_MIN_WAIT_MS = "iceberg.hive.lock-check-min-wait-ms";
+  private static final String HIVE_LOCK_CHECK_MAX_WAIT_MS = "iceberg.hive.lock-check-max-wait-ms";
+  private static final String HIVE_LOCK_CHECK_BACKOFF_SCALE_FACTOR = "iceberg.hive.lock-check-backoff-scale-factor";
+  private static final long HIVE_ACQUIRE_LOCK_TIMEOUT_MS_DEFAULT = 3 * 60 * 1000; // 3 minutes
+  private static final long HIVE_LOCK_CHECK_MIN_WAIT_MS_DEFAULT = 50; // 50 milliseconds
+  private static final long HIVE_LOCK_CHECK_MAX_WAIT_MS_DEFAULT = 5 * 1000; // 5 seconds

Review comment:
       This API is also used from Spark 2 where we don't have a way to specify catalog options. Plus, we already have a Hadoop conf for the lock timeout. How do we approach this?




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] rdblue commented on a change in pull request #1873: Hive: Make lock check retries backoff exponentially

Posted by GitBox <gi...@apache.org>.
rdblue commented on a change in pull request #1873:
URL: https://github.com/apache/iceberg/pull/1873#discussion_r536402264



##########
File path: hive-metastore/src/main/java/org/apache/iceberg/hive/HiveTableOperations.java
##########
@@ -70,21 +72,38 @@
 public class HiveTableOperations extends BaseMetastoreTableOperations {
   private static final Logger LOG = LoggerFactory.getLogger(HiveTableOperations.class);
 
-  private static final String HIVE_ACQUIRE_LOCK_STATE_TIMEOUT_MS = "iceberg.hive.lock-timeout-ms";
-  private static final long HIVE_ACQUIRE_LOCK_STATE_TIMEOUT_MS_DEFAULT = 3 * 60 * 1000; // 3 minutes
+  private static final String HIVE_ACQUIRE_LOCK_TIMEOUT_MS = "iceberg.hive.lock-timeout-ms";
+  private static final String HIVE_LOCK_CHECK_MIN_WAIT_MS = "iceberg.hive.lock-check-min-wait-ms";
+  private static final String HIVE_LOCK_CHECK_MAX_WAIT_MS = "iceberg.hive.lock-check-max-wait-ms";
+  private static final String HIVE_LOCK_CHECK_BACKOFF_SCALE_FACTOR = "iceberg.hive.lock-check-backoff-scale-factor";
+  private static final long HIVE_ACQUIRE_LOCK_TIMEOUT_MS_DEFAULT = 3 * 60 * 1000; // 3 minutes
+  private static final long HIVE_LOCK_CHECK_MIN_WAIT_MS_DEFAULT = 50; // 50 milliseconds
+  private static final long HIVE_LOCK_CHECK_MAX_WAIT_MS_DEFAULT = 5 * 1000; // 5 seconds

Review comment:
       In that case, we should default from Configuration, but prefer options passed to the initialize method.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] raptond commented on a change in pull request #1873: Hive: Make lock check retries backoff exponentially

Posted by GitBox <gi...@apache.org>.
raptond commented on a change in pull request #1873:
URL: https://github.com/apache/iceberg/pull/1873#discussion_r539116073



##########
File path: hive-metastore/src/main/java/org/apache/iceberg/hive/HiveTableOperations.java
##########
@@ -66,33 +68,50 @@
 public class HiveTableOperations extends BaseMetastoreTableOperations {
   private static final Logger LOG = LoggerFactory.getLogger(HiveTableOperations.class);
 
-  private static final String HIVE_ACQUIRE_LOCK_STATE_TIMEOUT_MS = "iceberg.hive.lock-timeout-ms";
-  private static final long HIVE_ACQUIRE_LOCK_STATE_TIMEOUT_MS_DEFAULT = 3 * 60 * 1000; // 3 minutes
+  private static final String HIVE_ACQUIRE_LOCK_TIMEOUT_MS = "iceberg.hive.lock-timeout-ms";
+  private static final String HIVE_LOCK_CHECK_MIN_WAIT_MS = "iceberg.hive.lock-check-min-wait-ms";
+  private static final String HIVE_LOCK_CHECK_MAX_WAIT_MS = "iceberg.hive.lock-check-max-wait-ms";
+  private static final long HIVE_ACQUIRE_LOCK_TIMEOUT_MS_DEFAULT = 3 * 60 * 1000; // 3 minutes
+  private static final long HIVE_LOCK_CHECK_MIN_WAIT_MS_DEFAULT = 50; // 50 milliseconds
+  private static final long HIVE_LOCK_CHECK_MAX_WAIT_MS_DEFAULT = 5 * 1000; // 5 seconds
   private static final DynMethods.UnboundMethod ALTER_TABLE = DynMethods.builder("alter_table")
       .impl(HiveMetaStoreClient.class, "alter_table_with_environmentContext",
           String.class, String.class, Table.class, EnvironmentContext.class)
       .impl(HiveMetaStoreClient.class, "alter_table",
           String.class, String.class, Table.class, EnvironmentContext.class)
       .build();
 
+  private static class WaitingForLockException extends RuntimeException {
+    WaitingForLockException(String message) {
+      super(message);
+    }
+  }
+
   private final HiveClientPool metaClients;
   private final String fullName;
   private final String database;
   private final String tableName;
   private final Configuration conf;
   private final long lockAcquireTimeout;
+  private final long lockCheckMinWaitTime;
+  private final long lockCheckMaxWaitTime;
   private final FileIO fileIO;
 
   protected HiveTableOperations(Configuration conf, HiveClientPool metaClients, FileIO fileIO,
-                                String catalogName, String database, String table) {
+      String catalogName, String database, String table) {

Review comment:
       done.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] aokolnychyi commented on a change in pull request #1873: Hive: Make lock check retries backoff exponentially

Posted by GitBox <gi...@apache.org>.
aokolnychyi commented on a change in pull request #1873:
URL: https://github.com/apache/iceberg/pull/1873#discussion_r535993815



##########
File path: hive-metastore/src/main/java/org/apache/iceberg/hive/HiveTableOperations.java
##########
@@ -70,21 +72,38 @@
 public class HiveTableOperations extends BaseMetastoreTableOperations {
   private static final Logger LOG = LoggerFactory.getLogger(HiveTableOperations.class);
 
-  private static final String HIVE_ACQUIRE_LOCK_STATE_TIMEOUT_MS = "iceberg.hive.lock-timeout-ms";
-  private static final long HIVE_ACQUIRE_LOCK_STATE_TIMEOUT_MS_DEFAULT = 3 * 60 * 1000; // 3 minutes
+  private static final String HIVE_ACQUIRE_LOCK_TIMEOUT_MS = "iceberg.hive.lock-timeout-ms";
+  private static final String HIVE_LOCK_CHECK_MIN_WAIT_MS = "iceberg.hive.lock-check-min-wait-ms";
+  private static final String HIVE_LOCK_CHECK_MAX_WAIT_MS = "iceberg.hive.lock-check-max-wait-ms";
+  private static final String HIVE_LOCK_CHECK_BACKOFF_SCALE_FACTOR = "iceberg.hive.lock-check-backoff-scale-factor";
+  private static final long HIVE_ACQUIRE_LOCK_TIMEOUT_MS_DEFAULT = 3 * 60 * 1000; // 3 minutes
+  private static final long HIVE_LOCK_CHECK_MIN_WAIT_MS_DEFAULT = 50; // 50 milliseconds
+  private static final long HIVE_LOCK_CHECK_MAX_WAIT_MS_DEFAULT = 5 * 1000; // 5 seconds
+  private static final double HIVE_LOCK_CHECK_BACKOFF_SCALE_FACTOR_DEFAULT = 1.5;
   private static final DynMethods.UnboundMethod ALTER_TABLE = DynMethods.builder("alter_table")
       .impl(HiveMetaStoreClient.class, "alter_table_with_environmentContext",
           String.class, String.class, Table.class, EnvironmentContext.class)
       .impl(HiveMetaStoreClient.class, "alter_table",
           String.class, String.class, Table.class, EnvironmentContext.class)
       .build();
 
+  protected static class WaitingForLockException extends RuntimeException {

Review comment:
       Why protected?

##########
File path: hive-metastore/src/main/java/org/apache/iceberg/hive/HiveTableOperations.java
##########
@@ -70,21 +72,38 @@
 public class HiveTableOperations extends BaseMetastoreTableOperations {
   private static final Logger LOG = LoggerFactory.getLogger(HiveTableOperations.class);
 
-  private static final String HIVE_ACQUIRE_LOCK_STATE_TIMEOUT_MS = "iceberg.hive.lock-timeout-ms";
-  private static final long HIVE_ACQUIRE_LOCK_STATE_TIMEOUT_MS_DEFAULT = 3 * 60 * 1000; // 3 minutes
+  private static final String HIVE_ACQUIRE_LOCK_TIMEOUT_MS = "iceberg.hive.lock-timeout-ms";
+  private static final String HIVE_LOCK_CHECK_MIN_WAIT_MS = "iceberg.hive.lock-check-min-wait-ms";
+  private static final String HIVE_LOCK_CHECK_MAX_WAIT_MS = "iceberg.hive.lock-check-max-wait-ms";
+  private static final String HIVE_LOCK_CHECK_BACKOFF_SCALE_FACTOR = "iceberg.hive.lock-check-backoff-scale-factor";
+  private static final long HIVE_ACQUIRE_LOCK_TIMEOUT_MS_DEFAULT = 3 * 60 * 1000; // 3 minutes
+  private static final long HIVE_LOCK_CHECK_MIN_WAIT_MS_DEFAULT = 50; // 50 milliseconds
+  private static final long HIVE_LOCK_CHECK_MAX_WAIT_MS_DEFAULT = 5 * 1000; // 5 seconds
+  private static final double HIVE_LOCK_CHECK_BACKOFF_SCALE_FACTOR_DEFAULT = 1.5;
   private static final DynMethods.UnboundMethod ALTER_TABLE = DynMethods.builder("alter_table")
       .impl(HiveMetaStoreClient.class, "alter_table_with_environmentContext",
           String.class, String.class, Table.class, EnvironmentContext.class)
       .impl(HiveMetaStoreClient.class, "alter_table",
           String.class, String.class, Table.class, EnvironmentContext.class)
       .build();
 
+  protected static class WaitingForLockException extends RuntimeException {

Review comment:
       Why protected?




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] aokolnychyi commented on a change in pull request #1873: Hive: Make lock check retries backoff exponentially

Posted by GitBox <gi...@apache.org>.
aokolnychyi commented on a change in pull request #1873:
URL: https://github.com/apache/iceberg/pull/1873#discussion_r540368933



##########
File path: hive-metastore/src/main/java/org/apache/iceberg/hive/HiveTableOperations.java
##########
@@ -70,21 +72,38 @@
 public class HiveTableOperations extends BaseMetastoreTableOperations {
   private static final Logger LOG = LoggerFactory.getLogger(HiveTableOperations.class);
 
-  private static final String HIVE_ACQUIRE_LOCK_STATE_TIMEOUT_MS = "iceberg.hive.lock-timeout-ms";
-  private static final long HIVE_ACQUIRE_LOCK_STATE_TIMEOUT_MS_DEFAULT = 3 * 60 * 1000; // 3 minutes
+  private static final String HIVE_ACQUIRE_LOCK_TIMEOUT_MS = "iceberg.hive.lock-timeout-ms";
+  private static final String HIVE_LOCK_CHECK_MIN_WAIT_MS = "iceberg.hive.lock-check-min-wait-ms";
+  private static final String HIVE_LOCK_CHECK_MAX_WAIT_MS = "iceberg.hive.lock-check-max-wait-ms";
+  private static final String HIVE_LOCK_CHECK_BACKOFF_SCALE_FACTOR = "iceberg.hive.lock-check-backoff-scale-factor";
+  private static final long HIVE_ACQUIRE_LOCK_TIMEOUT_MS_DEFAULT = 3 * 60 * 1000; // 3 minutes
+  private static final long HIVE_LOCK_CHECK_MIN_WAIT_MS_DEFAULT = 50; // 50 milliseconds
+  private static final long HIVE_LOCK_CHECK_MAX_WAIT_MS_DEFAULT = 5 * 1000; // 5 seconds

Review comment:
       To make sure I got @rdblue and @jackye1995. You are talking about generalizing catalog options, right?




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] aokolnychyi commented on a change in pull request #1873: Hive: Make lock check retries backoff exponentially

Posted by GitBox <gi...@apache.org>.
aokolnychyi commented on a change in pull request #1873:
URL: https://github.com/apache/iceberg/pull/1873#discussion_r536170029



##########
File path: hive-metastore/src/main/java/org/apache/iceberg/hive/HiveTableOperations.java
##########
@@ -96,7 +115,14 @@ protected HiveTableOperations(Configuration conf, HiveClientPool metaClients, Fi
     this.database = database;
     this.tableName = table;
     this.lockAcquireTimeout =
-        conf.getLong(HIVE_ACQUIRE_LOCK_STATE_TIMEOUT_MS, HIVE_ACQUIRE_LOCK_STATE_TIMEOUT_MS_DEFAULT);
+        conf.getLong(HIVE_ACQUIRE_LOCK_TIMEOUT_MS, HIVE_ACQUIRE_LOCK_TIMEOUT_MS_DEFAULT);
+    this.lockCheckMinWaitTime =
+        conf.getLong(HIVE_LOCK_CHECK_MIN_WAIT_MS, HIVE_LOCK_CHECK_MIN_WAIT_MS_DEFAULT);
+    this.lockCheckMaxWaitTime =
+        conf.getLong(HIVE_LOCK_CHECK_MAX_WAIT_MS, HIVE_LOCK_CHECK_MAX_WAIT_MS_DEFAULT);
+    this.lockCheckBackoffScaleFactor =
+        conf.getDouble(HIVE_LOCK_CHECK_BACKOFF_SCALE_FACTOR, HIVE_LOCK_CHECK_BACKOFF_SCALE_FACTOR_DEFAULT);
+

Review comment:
       nit: extra empty line




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] raptond commented on a change in pull request #1873: Hive: Make lock check retries backoff exponentially

Posted by GitBox <gi...@apache.org>.
raptond commented on a change in pull request #1873:
URL: https://github.com/apache/iceberg/pull/1873#discussion_r538772266



##########
File path: hive-metastore/src/main/java/org/apache/iceberg/hive/HiveTableOperations.java
##########
@@ -304,32 +330,57 @@ private long acquireLock() throws UnknownHostException, TException, InterruptedE
         System.getProperty("user.name"),
         InetAddress.getLocalHost().getHostName());
     LockResponse lockResponse = metaClients.run(client -> client.lock(lockRequest));
-    LockState state = lockResponse.getState();
+    AtomicReference<LockState> state = new AtomicReference<>(lockResponse.getState());
     long lockId = lockResponse.getLockid();
 
     final long start = System.currentTimeMillis();
     long duration = 0;
     boolean timeout = false;
-    while (!timeout && state.equals(LockState.WAITING)) {
-      lockResponse = metaClients.run(client -> client.checkLock(lockId));
-      state = lockResponse.getState();
 
-      // check timeout
-      duration = System.currentTimeMillis() - start;
-      if (duration > lockAcquireTimeout) {
+    if (state.get().equals(LockState.WAITING)) {
+      try {
+        Tasks.foreach(lockId)
+            .retry(Integer.MAX_VALUE - 100) // Endless retries bound by timeouts. Tasks.retry adds 1 for "first try".

Review comment:
       I only wanted to keep a big number for retry.  Eg `Integer.MAX_VALUE`. But, the [setter](https://github.com/apache/iceberg/blob/master/core/src/main/java/org/apache/iceberg/util/Tasks.java#L163) adds 1 overflowing to `MIN_VALUE`. 
   
   `Integer.MAX_VALUE - 1` would simply suffice, but I chose conservatively to set `Integer.MAX_VALUE - 100`. 




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] raptond commented on a change in pull request #1873: Hive: Make lock check retries backoff exponentially

Posted by GitBox <gi...@apache.org>.
raptond commented on a change in pull request #1873:
URL: https://github.com/apache/iceberg/pull/1873#discussion_r539078415



##########
File path: hive-metastore/src/test/java/org/apache/iceberg/hive/TestHiveMetastore.java
##########
@@ -84,7 +84,7 @@
   private ExecutorService executorService;
   private TServer server;
   private HiveMetaStore.HMSHandler baseHandler;
-  private HiveClientPool clientPool;
+  protected HiveClientPool clientPool; // Exposed for testing.

Review comment:
       Since this was a test class, I didn't add this annotation. I finally ended up not using it, so I reverted back the change. Thanks for the review. 




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] rdblue commented on a change in pull request #1873: Hive: Make lock check retries backoff exponentially

Posted by GitBox <gi...@apache.org>.
rdblue commented on a change in pull request #1873:
URL: https://github.com/apache/iceberg/pull/1873#discussion_r538857584



##########
File path: hive-metastore/src/main/java/org/apache/iceberg/hive/HiveTableOperations.java
##########
@@ -66,33 +68,50 @@
 public class HiveTableOperations extends BaseMetastoreTableOperations {
   private static final Logger LOG = LoggerFactory.getLogger(HiveTableOperations.class);
 
-  private static final String HIVE_ACQUIRE_LOCK_STATE_TIMEOUT_MS = "iceberg.hive.lock-timeout-ms";
-  private static final long HIVE_ACQUIRE_LOCK_STATE_TIMEOUT_MS_DEFAULT = 3 * 60 * 1000; // 3 minutes
+  private static final String HIVE_ACQUIRE_LOCK_TIMEOUT_MS = "iceberg.hive.lock-timeout-ms";
+  private static final String HIVE_LOCK_CHECK_MIN_WAIT_MS = "iceberg.hive.lock-check-min-wait-ms";
+  private static final String HIVE_LOCK_CHECK_MAX_WAIT_MS = "iceberg.hive.lock-check-max-wait-ms";
+  private static final long HIVE_ACQUIRE_LOCK_TIMEOUT_MS_DEFAULT = 3 * 60 * 1000; // 3 minutes
+  private static final long HIVE_LOCK_CHECK_MIN_WAIT_MS_DEFAULT = 50; // 50 milliseconds
+  private static final long HIVE_LOCK_CHECK_MAX_WAIT_MS_DEFAULT = 5 * 1000; // 5 seconds
   private static final DynMethods.UnboundMethod ALTER_TABLE = DynMethods.builder("alter_table")
       .impl(HiveMetaStoreClient.class, "alter_table_with_environmentContext",
           String.class, String.class, Table.class, EnvironmentContext.class)
       .impl(HiveMetaStoreClient.class, "alter_table",
           String.class, String.class, Table.class, EnvironmentContext.class)
       .build();
 
+  private static class WaitingForLockException extends RuntimeException {
+    WaitingForLockException(String message) {
+      super(message);
+    }
+  }
+
   private final HiveClientPool metaClients;
   private final String fullName;
   private final String database;
   private final String tableName;
   private final Configuration conf;
   private final long lockAcquireTimeout;
+  private final long lockCheckMinWaitTime;
+  private final long lockCheckMaxWaitTime;
   private final FileIO fileIO;
 
   protected HiveTableOperations(Configuration conf, HiveClientPool metaClients, FileIO fileIO,
-                                String catalogName, String database, String table) {
+      String catalogName, String database, String table) {

Review comment:
       Nit: unnecessary whitespace change.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org