You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@iceberg.apache.org by "frankliee (via GitHub)" <gi...@apache.org> on 2023/04/09 11:34:06 UTC

[GitHub] [iceberg] frankliee opened a new pull request, #7310: Hive: Clean up expired metastore clients

frankliee opened a new pull request, #7310:
URL: https://github.com/apache/iceberg/pull/7310

   Fixed a bug that could leak hive client connections.
   
   Caffeine cache will not always call removelistener by default, so an extra scheduler is required to invoke removelistener.
   
   See: https://github.com/google/guava/issues/3295. 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] pvary commented on a diff in pull request #7310: Hive: Clean up expired metastore clients

Posted by "pvary (via GitHub)" <gi...@apache.org>.
pvary commented on code in PR #7310:
URL: https://github.com/apache/iceberg/pull/7310#discussion_r1169469620


##########
hive-metastore/src/main/java/org/apache/iceberg/hive/CachedClientPool.java:
##########
@@ -97,10 +99,15 @@ HiveClientPool clientPool() {
 
   private synchronized void init() {
     if (clientPoolCache == null) {
+      // Since Caffeine does not ensure that removalListener will be involved after expiration
+      // We use a scheduler with 2 threads to clean up expired clients.

Review Comment:
   Why this is size 2? Why not 1?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] frankliee commented on a diff in pull request #7310: Hive: Clean up expired metastore clients

Posted by "frankliee (via GitHub)" <gi...@apache.org>.
frankliee commented on code in PR #7310:
URL: https://github.com/apache/iceberg/pull/7310#discussion_r1161677318


##########
core/src/main/java/org/apache/iceberg/ClientPoolImpl.java:
##########
@@ -147,4 +147,8 @@ private void release(C client) {
   public int poolSize() {
     return poolSize;
   }
+
+  public boolean isClosed() {

Review Comment:
   See the above comment.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] nastra commented on a diff in pull request #7310: Hive: Clean up expired metastore clients

Posted by "nastra (via GitHub)" <gi...@apache.org>.
nastra commented on code in PR #7310:
URL: https://github.com/apache/iceberg/pull/7310#discussion_r1169568272


##########
hive-metastore/src/main/java/org/apache/iceberg/hive/CachedClientPool.java:
##########
@@ -97,10 +99,15 @@ HiveClientPool clientPool() {
 
   private synchronized void init() {
     if (clientPoolCache == null) {
+      // Since Caffeine does not ensure that removalListener will be involved after expiration
+      // We use a scheduler with 2 threads to clean up expired clients.

Review Comment:
   I agree that this should be rather 1 instead of 2



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] nastra commented on a diff in pull request #7310: Hive: Clean up expired metastore clients

Posted by "nastra (via GitHub)" <gi...@apache.org>.
nastra commented on code in PR #7310:
URL: https://github.com/apache/iceberg/pull/7310#discussion_r1169566618


##########
hive-metastore/src/test/java/org/apache/iceberg/hive/TestCachedClientPool.java:
##########
@@ -46,6 +46,10 @@ public void testClientPoolCleaner() throws InterruptedException {
     Assert.assertNull(
         CachedClientPool.clientPoolCache()
             .getIfPresent(CachedClientPool.extractKey(null, hiveConf)));
+
+    // The client has been really closed.
+    Assert.assertTrue(clientPool1.isClosed());

Review Comment:
   I've checked this when reviewing and the test fails with the code as-is. But the test passes with 
   ```
   Awaitility.await()
           .atMost(10, TimeUnit.SECONDS)
           .untilAsserted(
               () -> {
                 Assert.assertTrue(clientPool1.isClosed());
                 Assert.assertTrue(clientPool2.isClosed());
               });
   ```
   because cache removal will eventually be executed (just not immediately)



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] frankliee commented on a diff in pull request #7310: Hive: Clean up expired metastore clients

Posted by "frankliee (via GitHub)" <gi...@apache.org>.
frankliee commented on code in PR #7310:
URL: https://github.com/apache/iceberg/pull/7310#discussion_r1161675791


##########
hive-metastore/src/main/java/org/apache/iceberg/hive/CachedClientPool.java:
##########
@@ -97,10 +99,13 @@ HiveClientPool clientPool() {
 
   private synchronized void init() {
     if (clientPoolCache == null) {
+      // Since Caffeine does not ensure that removalListener will be involved after expiration
+      // We use a scheduler with 2 threads to clean up expired clients.
       clientPoolCache =
           Caffeine.newBuilder()
               .expireAfterAccess(evictionInterval, TimeUnit.MILLISECONDS)
               .removalListener((ignored, value, cause) -> ((HiveClientPool) value).close())
+              .scheduler(Scheduler.forScheduledExecutorService(Executors.newScheduledThreadPool(2)))

Review Comment:
   OK



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] frankliee commented on pull request #7310: Hive: Clean up expired metastore clients

Posted by "frankliee (via GitHub)" <gi...@apache.org>.
frankliee commented on PR #7310:
URL: https://github.com/apache/iceberg/pull/7310#issuecomment-1512351204

   > > I think it would be good here to get @pvary's opinion as I'm not very familiar with Hive and how critical it is in this particular case that we perform removal immediately
   > 
   > I have seen cases with Flink when the open HMS client numbers are kept increasing. I was suspicious about the Caffeine cleanup, but was not able to pin down the the issue. That said, I am not sure the issue was the same.
   > 
   > @frankliee: did you have a concrete issue which needs fixing? What was it?
   > 
   > Thanks, Peter
   
   
   In another project, I found guava cache can leak connections, when the cache will not be "set" for a long time.
   So Iceberg has the similar risk.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] frankliee commented on a diff in pull request #7310: Hive: Clean up expired metastore clients

Posted by "frankliee (via GitHub)" <gi...@apache.org>.
frankliee commented on code in PR #7310:
URL: https://github.com/apache/iceberg/pull/7310#discussion_r1177447929


##########
hive-metastore/src/main/java/org/apache/iceberg/hive/CachedClientPool.java:
##########
@@ -97,10 +99,15 @@ HiveClientPool clientPool() {
 
   private synchronized void init() {
     if (clientPoolCache == null) {
+      // Since Caffeine does not ensure that removalListener will be involved after expiration
+      // We use a scheduler with 2 threads to clean up expired clients.

Review Comment:
   Thanks for your correction. I have updated the comment. @pvary 



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] pvary commented on a diff in pull request #7310: Hive: Clean up expired metastore clients

Posted by "pvary (via GitHub)" <gi...@apache.org>.
pvary commented on code in PR #7310:
URL: https://github.com/apache/iceberg/pull/7310#discussion_r1169469984


##########
hive-metastore/src/test/java/org/apache/iceberg/hive/TestCachedClientPool.java:
##########
@@ -46,6 +46,10 @@ public void testClientPoolCleaner() throws InterruptedException {
     Assert.assertNull(
         CachedClientPool.clientPoolCache()
             .getIfPresent(CachedClientPool.extractKey(null, hiveConf)));
+
+    // The client has been really closed.
+    Assert.assertTrue(clientPool1.isClosed());

Review Comment:
   Just for the record: Did we test that this test fails without the fix?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] frankliee commented on a diff in pull request #7310: Hive: Clean up expired metastore clients

Posted by "frankliee (via GitHub)" <gi...@apache.org>.
frankliee commented on code in PR #7310:
URL: https://github.com/apache/iceberg/pull/7310#discussion_r1170739394


##########
hive-metastore/src/main/java/org/apache/iceberg/hive/CachedClientPool.java:
##########
@@ -97,10 +99,15 @@ HiveClientPool clientPool() {
 
   private synchronized void init() {
     if (clientPoolCache == null) {
+      // Since Caffeine does not ensure that removalListener will be involved after expiration
+      // We use a scheduler with 2 threads to clean up expired clients.

Review Comment:
   OK



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] pvary commented on a diff in pull request #7310: Hive: Clean up expired metastore clients

Posted by "pvary (via GitHub)" <gi...@apache.org>.
pvary commented on code in PR #7310:
URL: https://github.com/apache/iceberg/pull/7310#discussion_r1176335156


##########
hive-metastore/src/main/java/org/apache/iceberg/hive/CachedClientPool.java:
##########
@@ -97,10 +99,15 @@ HiveClientPool clientPool() {
 
   private synchronized void init() {
     if (clientPoolCache == null) {
+      // Since Caffeine does not ensure that removalListener will be involved after expiration
+      // We use a scheduler with 2 threads to clean up expired clients.

Review Comment:
   Could you please update the comment?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] pvary merged pull request #7310: Hive: Clean up expired metastore clients

Posted by "pvary (via GitHub)" <gi...@apache.org>.
pvary merged PR #7310:
URL: https://github.com/apache/iceberg/pull/7310


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] frankliee commented on a diff in pull request #7310: Hive: Clean up expired metastore clients

Posted by "frankliee (via GitHub)" <gi...@apache.org>.
frankliee commented on code in PR #7310:
URL: https://github.com/apache/iceberg/pull/7310#discussion_r1162234862


##########
core/src/main/java/org/apache/iceberg/ClientPoolImpl.java:
##########
@@ -147,4 +147,8 @@ private void release(C client) {
   public int poolSize() {
     return poolSize;
   }
+
+  public boolean isClosed() {

Review Comment:
   @VisibleForTesting + public cannot pass Iceberg's style check, so I use public.
   
   Since `boolean isClosed()` is read-only function, public will not introduce risks.
   
   Besides,  `int poolSize()` is this class is also public.
    



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] chenjunjiedada commented on a diff in pull request #7310: Hive: Clean up expired metastore clients

Posted by "chenjunjiedada (via GitHub)" <gi...@apache.org>.
chenjunjiedada commented on code in PR #7310:
URL: https://github.com/apache/iceberg/pull/7310#discussion_r1161430375


##########
core/src/main/java/org/apache/iceberg/ClientPoolImpl.java:
##########
@@ -147,4 +147,8 @@ private void release(C client) {
   public int poolSize() {
     return poolSize;
   }
+
+  public boolean isClosed() {

Review Comment:
   Do we need this to be public? 



##########
hive-metastore/src/main/java/org/apache/iceberg/hive/CachedClientPool.java:
##########
@@ -97,10 +99,13 @@ HiveClientPool clientPool() {
 
   private synchronized void init() {
     if (clientPoolCache == null) {
+      // Since Caffeine does not ensure that removalListener will be involved after expiration
+      // We use a scheduler with 2 threads to clean up expired clients.
       clientPoolCache =
           Caffeine.newBuilder()
               .expireAfterAccess(evictionInterval, TimeUnit.MILLISECONDS)
               .removalListener((ignored, value, cause) -> ((HiveClientPool) value).close())
+              .scheduler(Scheduler.forScheduledExecutorService(Executors.newScheduledThreadPool(2)))

Review Comment:
   Nit: Iceberg has a thread pool util `ThreadPools.newScheduledPool` that you can use. 



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] frankliee commented on a diff in pull request #7310: Hive: Clean up expired metastore clients

Posted by "frankliee (via GitHub)" <gi...@apache.org>.
frankliee commented on code in PR #7310:
URL: https://github.com/apache/iceberg/pull/7310#discussion_r1161677318


##########
core/src/main/java/org/apache/iceberg/ClientPoolImpl.java:
##########
@@ -147,4 +147,8 @@ private void release(C client) {
   public int poolSize() {
     return poolSize;
   }
+
+  public boolean isClosed() {

Review Comment:
   See the above comment



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] nastra commented on pull request #7310: Hive: Clean up expired metastore clients

Posted by "nastra (via GitHub)" <gi...@apache.org>.
nastra commented on PR #7310:
URL: https://github.com/apache/iceberg/pull/7310#issuecomment-1507222190

   I think it would be good here to get @pvary's opinion as I'm not very familiar with Hive and how critical it is in this particular case that we perform removal immediately


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] chenjunjiedada commented on pull request #7310: Hive: Clean up expired metastore clients

Posted by "chenjunjiedada (via GitHub)" <gi...@apache.org>.
chenjunjiedada commented on PR #7310:
URL: https://github.com/apache/iceberg/pull/7310#issuecomment-1521123966

   @nastra @pvary Any more options on this? Is it ready to go? 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] frankliee commented on a diff in pull request #7310: Hive: Clean up expired metastore clients

Posted by "frankliee (via GitHub)" <gi...@apache.org>.
frankliee commented on code in PR #7310:
URL: https://github.com/apache/iceberg/pull/7310#discussion_r1161675630


##########
core/src/main/java/org/apache/iceberg/ClientPoolImpl.java:
##########
@@ -147,4 +147,8 @@ private void release(C client) {
   public int poolSize() {
     return poolSize;
   }
+
+  public boolean isClosed() {

Review Comment:
   ClientPoolImpl and UT are in different packages, so package-private is not enough.
   But I will add VisibleForTesting.
   
   cc @chenjunjiedada 



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] nastra commented on pull request #7310: Hive: Clean up expired metastore clients

Posted by "nastra (via GitHub)" <gi...@apache.org>.
nastra commented on PR #7310:
URL: https://github.com/apache/iceberg/pull/7310#issuecomment-1506476128

   > Caffeine cache will not always call removelistener by default, so an extra scheduler is required to invoke removelistener.
   
   From how I understand how Caffeine does this is that it performs the cleanup whenever the cache is being accessed or modified.
   
   > If expireAfter, expireAfterWrite, or expireAfterAccess is requested then entries may be evicted on each cache modification, on occasional cache accesses, or on calls to Cache.cleanUp. A scheduler(Scheduler) may be specified to provide prompt removal of expired entries rather than waiting until activity triggers the periodic maintenance. Expired entries may be counted by Cache.estimatedSize(), but will never be visible to read or write operations.
   
   That means the removal listener will be called. The question for this PR should rather be how **critical** it is that the removal is performed immediately vs waiting until activity triggers the periodic maintenance.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] frankliee commented on pull request #7310: Hive: Clean up expired metastore clients

Posted by "frankliee (via GitHub)" <gi...@apache.org>.
frankliee commented on PR #7310:
URL: https://github.com/apache/iceberg/pull/7310#issuecomment-1506863502

   > > Caffeine cache will not always call removelistener by default, so an extra scheduler is required to invoke removelistener.
   > 
   > From how I understand how Caffeine does this is that it performs the cleanup whenever the cache is being accessed or modified. Below is an excerpt from the Javadoc:
   > 
   > > If expireAfter, expireAfterWrite, or expireAfterAccess is requested then entries may be evicted on each cache modification, on occasional cache accesses, or on calls to Cache.cleanUp. A scheduler(Scheduler) may be specified to provide prompt removal of expired entries rather than waiting until activity triggers the periodic maintenance. Expired entries may be counted by Cache.estimatedSize(), but will never be visible to read or write operations.
   > 
   > That means the removal listener will be called. And this can be seen by modifying the test to
   > 
   > ```
   >     Awaitility.await()
   >         .atMost(10, TimeUnit.SECONDS)
   >         .untilAsserted(
   >             () -> {
   >               Assert.assertTrue(clientPool1.isClosed());
   >               Assert.assertTrue(clientPool2.isClosed());
   >             });
   > ```
   > 
   > The question for this PR should rather be how **critical** it is that the removal is performed immediately vs waiting until activity triggers the periodic maintenance.
   
   
   
   I understand your concerns, and this PR chooses to call removal immediately for the following two reasons.
   
   1) Caffeine's default stratey will postpone removal as much as possible to reduce overhead.  In this Iceberg, removal will be call for each 5 min by default, so the overhead is acceptable. 
   
   2) Iceberg has a conf called "client.pool.cache.eviction-interval-ms" to ensure the time of eviction, but the uncertainty of Caffeine default stratey will make it strange to predict. This PR makes it meaningful.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] frankliee commented on a diff in pull request #7310: Hive: Clean up expired metastore clients

Posted by "frankliee (via GitHub)" <gi...@apache.org>.
frankliee commented on code in PR #7310:
URL: https://github.com/apache/iceberg/pull/7310#discussion_r1177447929


##########
hive-metastore/src/main/java/org/apache/iceberg/hive/CachedClientPool.java:
##########
@@ -97,10 +99,15 @@ HiveClientPool clientPool() {
 
   private synchronized void init() {
     if (clientPoolCache == null) {
+      // Since Caffeine does not ensure that removalListener will be involved after expiration
+      // We use a scheduler with 2 threads to clean up expired clients.

Review Comment:
   Thanks for your correction. I have updated the comment.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] ConeyLiu commented on a diff in pull request #7310: Hive: Clean up expired metastore clients

Posted by "ConeyLiu (via GitHub)" <gi...@apache.org>.
ConeyLiu commented on code in PR #7310:
URL: https://github.com/apache/iceberg/pull/7310#discussion_r1161564293


##########
core/src/main/java/org/apache/iceberg/ClientPoolImpl.java:
##########
@@ -147,4 +147,8 @@ private void release(C client) {
   public int poolSize() {
     return poolSize;
   }
+
+  public boolean isClosed() {

Review Comment:
   package-private is enough? For UT, we could mark it as `VisibleForTesting`



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] frankliee commented on a diff in pull request #7310: Hive: Clean up expired metastore clients

Posted by "frankliee (via GitHub)" <gi...@apache.org>.
frankliee commented on code in PR #7310:
URL: https://github.com/apache/iceberg/pull/7310#discussion_r1170760148


##########
hive-metastore/src/test/java/org/apache/iceberg/hive/TestCachedClientPool.java:
##########
@@ -46,6 +46,10 @@ public void testClientPoolCleaner() throws InterruptedException {
     Assert.assertNull(
         CachedClientPool.clientPoolCache()
             .getIfPresent(CachedClientPool.extractKey(null, hiveConf)));
+
+    // The client has been really closed.
+    Assert.assertTrue(clientPool1.isClosed());

Review Comment:
   Caffeine's doc describes its complicated cleanup behavior, and introduces the scheduler to prompt cleanup.
    
   `By default, Caffeine does not perform cleanup and evict values "automatically" or instantly after a value expires. Instead, it performs small amounts of maintenance work after write operations or occasionally after read operations if writes are rare. `
   ( see: https://github.com/ben-manes/caffeine/wiki/Cleanup )
   
   The unpredictability of removal could increase the pressure on HMS, when there are many applications (including Flink and Spark).
   
   



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] pvary commented on pull request #7310: Hive: Clean up expired metastore clients

Posted by "pvary (via GitHub)" <gi...@apache.org>.
pvary commented on PR #7310:
URL: https://github.com/apache/iceberg/pull/7310#issuecomment-1523605182

   Thanks for the review @chenjunjiedada, @nastra, @ConeyLiu for the review, and @frankliee for the PR


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] pvary commented on pull request #7310: Hive: Clean up expired metastore clients

Posted by "pvary (via GitHub)" <gi...@apache.org>.
pvary commented on PR #7310:
URL: https://github.com/apache/iceberg/pull/7310#issuecomment-1511560759

   > I think it would be good here to get @pvary's opinion as I'm not very familiar with Hive and how critical it is in this particular case that we perform removal immediately
   
   I have seen cases with Flink when the open HMS client numbers are kept increasing. I was suspicious about the Caffeine cleanup, but was not able to pin down the the issue. That said, I am not sure the issue was the same.
   
   @frankliee: did you have a concrete issue which needs fixing? What was it?
   
   Thanks, Peter 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] nastra commented on a diff in pull request #7310: Hive: Clean up expired metastore clients

Posted by "nastra (via GitHub)" <gi...@apache.org>.
nastra commented on code in PR #7310:
URL: https://github.com/apache/iceberg/pull/7310#discussion_r1169566618


##########
hive-metastore/src/test/java/org/apache/iceberg/hive/TestCachedClientPool.java:
##########
@@ -46,6 +46,10 @@ public void testClientPoolCleaner() throws InterruptedException {
     Assert.assertNull(
         CachedClientPool.clientPoolCache()
             .getIfPresent(CachedClientPool.extractKey(null, hiveConf)));
+
+    // The client has been really closed.
+    Assert.assertTrue(clientPool1.isClosed());

Review Comment:
   I've checked this when reviewing and the test fails on master with these 2 checks added. However, the test passes with some wait time
   ```
   Awaitility.await()
           .atMost(10, TimeUnit.SECONDS)
           .untilAsserted(
               () -> {
                 Assert.assertTrue(clientPool1.isClosed());
                 Assert.assertTrue(clientPool2.isClosed());
               });
   ```
   because cache removal will eventually be executed (just not immediately).



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org