You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@iceberg.apache.org by GitBox <gi...@apache.org> on 2021/04/30 13:00:58 UTC

[GitHub] [iceberg] marton-bod opened a new pull request #2547: Hive: add table level JVM lock on commit operation

marton-bod opened a new pull request #2547:
URL: https://github.com/apache/iceberg/pull/2547


   Following up on https://github.com/apache/iceberg/issues/2540. 
   This commit introduces table-level JVM locks for commit operations to avoid requesting HMS locks unnecessarily when multiple threads are trying to commit to the same table concurrently. The implementation uses a lock cache in order to instantiate any new locks in a thread-safe manner and evict table keys over time to avoid the caching table keys indefinitely.
   
   @RussellSpitzer @raptond @pvary @aokolnychyi 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] aokolnychyi commented on a change in pull request #2547: Hive: add table level JVM lock on HiveCatalog commit operation

Posted by GitBox <gi...@apache.org>.
aokolnychyi commented on a change in pull request #2547:
URL: https://github.com/apache/iceberg/pull/2547#discussion_r627915043



##########
File path: hive-metastore/src/main/java/org/apache/iceberg/hive/HiveTableOperations.java
##########
@@ -191,6 +209,10 @@ protected void doCommit(TableMetadata base, TableMetadata metadata) {
     CommitStatus commitStatus = CommitStatus.FAILURE;
     boolean updateHiveTable = false;
     Optional<Long> lockId = Optional.empty();
+    // getting a process-level lock per table to avoid concurrent commit attempts to the same table from the same
+    // JVM process, which would result in unnecessary and costly HMS lock acquisition requests
+    ReentrantLock tableLevelMutex = commitLockCache.get(fullName, t -> new ReentrantLock(true));

Review comment:
       This seems safe even though one application can interact with multiple metastores as full name includes the catalog name.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] aokolnychyi commented on pull request #2547: Hive: add table level JVM lock on HiveCatalog commit operation

Posted by GitBox <gi...@apache.org>.
aokolnychyi commented on pull request #2547:
URL: https://github.com/apache/iceberg/pull/2547#issuecomment-834724744


   Thanks for fixing this, @marton-bod! Thanks for reviewing, @RussellSpitzer @pvary @kbendick!


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] RussellSpitzer commented on pull request #2547: Hive: add table level JVM lock on HiveCatalog commit operation

Posted by GitBox <gi...@apache.org>.
RussellSpitzer commented on pull request #2547:
URL: https://github.com/apache/iceberg/pull/2547#issuecomment-834068536


   Users can only synchronize entire writes not just the commit process. Without this patch, a user has to do the entire write process synchronously to gain the same effect.
   
   For example instead of two jobs preparing files at the same time and then synchronously committing, a user would need to do one write and commit before proceeding to the next.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] RussellSpitzer commented on a change in pull request #2547: Hive: add table level JVM lock on HiveCatalog commit operation

Posted by GitBox <gi...@apache.org>.
RussellSpitzer commented on a change in pull request #2547:
URL: https://github.com/apache/iceberg/pull/2547#discussion_r623886438



##########
File path: hive-metastore/src/main/java/org/apache/iceberg/hive/HiveTableOperations.java
##########
@@ -96,6 +102,15 @@
       GC_ENABLED, "external.table.purge"
   );
 
+  private static Cache<String, ReentrantLock> commitLockCache;
+
+  private static synchronized void initTableLevelLockCache(long evictionTimeout) {
+    if (commitLockCache == null) {
+      commitLockCache = Caffeine.newBuilder()
+          .expireAfterAccess(evictionTimeout, TimeUnit.MILLISECONDS)

Review comment:
       Yeah I like how the worst case scenario in the lock's not working is that we just fall back to the old behavior.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] pvary commented on a change in pull request #2547: Hive: add table level JVM lock on HiveCatalog commit operation

Posted by GitBox <gi...@apache.org>.
pvary commented on a change in pull request #2547:
URL: https://github.com/apache/iceberg/pull/2547#discussion_r623867809



##########
File path: hive-metastore/src/main/java/org/apache/iceberg/hive/HiveTableOperations.java
##########
@@ -96,6 +102,15 @@
       GC_ENABLED, "external.table.purge"
   );
 
+  private static Cache<String, ReentrantLock> commitLockCache;
+
+  private static synchronized void initTableLevelLockCache(long evictionTimeout) {
+    if (commitLockCache == null) {
+      commitLockCache = Caffeine.newBuilder()
+          .expireAfterAccess(evictionTimeout, TimeUnit.MILLISECONDS)

Review comment:
       How does `expireAfterAccess` work?
   I am worried about this sequence:
   - C1 commit comes, gets the JVM lock, and HMS lock
   - eviction timeout
   - C2 commit comes, gets a new JVM lock, and tries to lock
   - C1 finishes and unlocks the old JVM lock




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] aokolnychyi merged pull request #2547: Hive: add table level JVM lock on HiveCatalog commit operation

Posted by GitBox <gi...@apache.org>.
aokolnychyi merged pull request #2547:
URL: https://github.com/apache/iceberg/pull/2547


   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] marton-bod commented on pull request #2547: Hive: add table level JVM lock on HiveCatalog commit operation

Posted by GitBox <gi...@apache.org>.
marton-bod commented on pull request #2547:
URL: https://github.com/apache/iceberg/pull/2547#issuecomment-836450994


   Thanks everyone for the reviews and @aokolnychyi for the merge!


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] kbendick commented on pull request #2547: Hive: add table level JVM lock on HiveCatalog commit operation

Posted by GitBox <gi...@apache.org>.
kbendick commented on pull request #2547:
URL: https://github.com/apache/iceberg/pull/2547#issuecomment-833243630


   Thanks for making this update. Should we add the new config `iceberg.hive.table-level-lock-evict-ms` to the docs? Unless a user knows about the concurrent threads issue, this particular config option might be kind of confusing. I'm simply thinking out loud and would be ok with following up with a small docs update in a later PR and/or accepting for the moment that this is likely a more advanced config that most users won't need to update.
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] kbendick commented on pull request #2547: Hive: add table level JVM lock on HiveCatalog commit operation

Posted by GitBox <gi...@apache.org>.
kbendick commented on pull request #2547:
URL: https://github.com/apache/iceberg/pull/2547#issuecomment-834866563


   > Thanks @kbendick for reviewing. I also think we probably don't want to include this in the docs in order to avoid overloading users with internally-used configs they'll most likely never need to tune. Tuning this parameter should be very rare and 99% of users would not care about it, I expect. For the remainder of users, they will probably dig into the code anyway and find this flag quite soon after a bit of search, but just my 2 cents.
   
   Yeah that makes sense to me. Only a power user would be likely to tune this parameter and they'd probably go into the code / reach out in one of the channels, etc, like you mentioned. 👍 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] marton-bod commented on pull request #2547: Hive: add table level JVM lock on HiveCatalog commit operation

Posted by GitBox <gi...@apache.org>.
marton-bod commented on pull request #2547:
URL: https://github.com/apache/iceberg/pull/2547#issuecomment-833546353


   Thanks @kbendick for reviewing. I also think we probably don't want to include this in the docs in order to avoid overloading users with internally-used configs they'll most likely never need to tune. Tuning this parameter should be very rare and 99% of users would not care about it, I expect. For the remainder of users, they will probably dig into the code anyway and find this flag quite soon after a bit of search, but just my 2 cents.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] marton-bod commented on a change in pull request #2547: Hive: add table level JVM lock on HiveCatalog commit operation

Posted by GitBox <gi...@apache.org>.
marton-bod commented on a change in pull request #2547:
URL: https://github.com/apache/iceberg/pull/2547#discussion_r623880632



##########
File path: hive-metastore/src/main/java/org/apache/iceberg/hive/HiveTableOperations.java
##########
@@ -96,6 +102,15 @@
       GC_ENABLED, "external.table.purge"
   );
 
+  private static Cache<String, ReentrantLock> commitLockCache;
+
+  private static synchronized void initTableLevelLockCache(long evictionTimeout) {
+    if (commitLockCache == null) {
+      commitLockCache = Caffeine.newBuilder()
+          .expireAfterAccess(evictionTimeout, TimeUnit.MILLISECONDS)

Review comment:
       `expireAfterAccess` is based on the amount of time that elapsed after the last read/write operation on the cache entry. So the scenario that you mentioned could only happen if the C1 commit takes more than the entire eviction timeout period (10 minutes by default) which is very unlikely. Even if it does happens (e.g. due to some extreme lock starvation), and C2 starts the commit operation, it shouldn't have too much of an impact because the HMS locking mechanism will still enforce that there won't be write conflicts between threads (i.e. we would be basically back to where we are in the status quo, without the table-level locks).




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] marton-bod commented on a change in pull request #2547: Hive: add table level JVM lock on HiveCatalog commit operation

Posted by GitBox <gi...@apache.org>.
marton-bod commented on a change in pull request #2547:
URL: https://github.com/apache/iceberg/pull/2547#discussion_r623880632



##########
File path: hive-metastore/src/main/java/org/apache/iceberg/hive/HiveTableOperations.java
##########
@@ -96,6 +102,15 @@
       GC_ENABLED, "external.table.purge"
   );
 
+  private static Cache<String, ReentrantLock> commitLockCache;
+
+  private static synchronized void initTableLevelLockCache(long evictionTimeout) {
+    if (commitLockCache == null) {
+      commitLockCache = Caffeine.newBuilder()
+          .expireAfterAccess(evictionTimeout, TimeUnit.MILLISECONDS)

Review comment:
       `expireAfterAccess` is based on the amount of time that elapsed after the last read/write operation on the cache entry. So the scenario that you mentioned could only happen if the C1 commit takes more than the entire eviction timeout period (10 minutes by default) which is very unlikely. Even if it does happen (e.g. due to some extreme lock starvation), and C2 starts the commit operation, it shouldn't have too much of an impact because the HMS locking mechanism will still enforce that there won't be write conflicts between threads (i.e. we would be basically back to where we are in the status quo, without the table-level locks).




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] aokolnychyi commented on pull request #2547: Hive: add table level JVM lock on HiveCatalog commit operation

Posted by GitBox <gi...@apache.org>.
aokolnychyi commented on pull request #2547:
URL: https://github.com/apache/iceberg/pull/2547#issuecomment-833973237


   Let me also take a look today/tomorrow. 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org