You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@iceberg.apache.org by "zhangminglei (via GitHub)" <gi...@apache.org> on 2023/10/16 10:23:54 UTC

[PR] Add an expireAfterWrite cache eviction policy to CachingCatalog [iceberg]

zhangminglei opened a new pull request, #8844:
URL: https://github.com/apache/iceberg/pull/8844

   (no comment)


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


Re: [PR] Update CachingCatalog to use expireAfterWrite instead of expireAfterAccess [iceberg]

Posted by "zhangminglei (via GitHub)" <gi...@apache.org>.
zhangminglei commented on PR #8844:
URL: https://github.com/apache/iceberg/pull/8844#issuecomment-1773078235

   Allow me to provide some context. The reason I initially opened this PR was because we observed that when a user attempted to create a table, they got an exception indicating that the table already existed. However, the table did not exist because it had been deleted by another client. And this problem has been encountered many times, but we cannot give users an accurate answer about how long it will take to successfully rebuild the table.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


Re: [PR] Update CachingCatalog to use expireAfterWrite instead of expireAfterAccess [iceberg]

Posted by "RussellSpitzer (via GitHub)" <gi...@apache.org>.
RussellSpitzer commented on PR #8844:
URL: https://github.com/apache/iceberg/pull/8844#issuecomment-1773316624

   I'm also pretty cautious about changing the behavior here. I'm pretty happy with the current default and using "REFRESH TABLE" if the user is in doubt about cache state (or disabling cache entirely). Does Refresh table not work in this instance? 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


Re: [PR] Add an expireAfterWrite cache eviction policy to CachingCatalog [iceberg]

Posted by "zhangminglei (via GitHub)" <gi...@apache.org>.
zhangminglei commented on PR #8844:
URL: https://github.com/apache/iceberg/pull/8844#issuecomment-1765624174

   @nastra Thanks for your review, code updated done. 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


Re: [PR] Add an expireAfterWrite cache eviction policy to CachingCatalog [iceberg]

Posted by "zhangminglei (via GitHub)" <gi...@apache.org>.
zhangminglei commented on code in PR #8844:
URL: https://github.com/apache/iceberg/pull/8844#discussion_r1360798631


##########
spark/v3.5/spark-extensions/src/test/java/org/apache/iceberg/spark/extensions/TestCachingCatalogExpirationAfterWrite.java:
##########
@@ -0,0 +1,82 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+package org.apache.iceberg.spark.extensions;
+
+import java.util.Map;
+import org.apache.hadoop.conf.Configuration;
+import org.apache.iceberg.hive.HiveClientPool;
+import org.apache.iceberg.relocated.com.google.common.collect.ImmutableMap;
+import org.apache.iceberg.spark.SparkCatalog;
+import org.apache.spark.sql.connector.catalog.Identifier;
+import org.junit.After;
+import org.junit.BeforeClass;
+import org.junit.Test;
+import org.junit.runners.Parameterized;
+import org.mockito.Mockito;
+
+public class TestCachingCatalogExpirationAfterWrite extends SparkExtensionsTestBase {
+  private static HiveClientPool clients;
+
+  @Parameterized.Parameters(name = "catalogName = {0}, implementation = {1}, config = {2}")
+  public static Object[][] parameters() {
+    return new Object[][] {
+      {
+        "iceberg",
+        SparkCatalog.class.getName(),
+        ImmutableMap.of(
+            "type", "hive",
+            "default-namespace", "default",
+            "parquet-enabled", "true",
+            "cache-enabled", "true",
+            "cache.expiration-interval-ms", "5000")
+      }
+    };
+  }
+
+  public TestCachingCatalogExpirationAfterWrite(
+      String catalogName, String implementation, Map<String, String> config) {
+    super(catalogName, implementation, config);
+  }
+
+  @BeforeClass
+  public static void start() {
+    HiveClientPool clientPool = new HiveClientPool(2, new Configuration());
+    clients = Mockito.spy(clientPool);

Review Comment:
   Thanks @nastra 👍 



##########
spark/v3.5/spark-extensions/src/test/java/org/apache/iceberg/spark/extensions/TestCachingCatalogExpirationAfterWrite.java:
##########
@@ -0,0 +1,82 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+package org.apache.iceberg.spark.extensions;
+
+import java.util.Map;
+import org.apache.hadoop.conf.Configuration;
+import org.apache.iceberg.hive.HiveClientPool;
+import org.apache.iceberg.relocated.com.google.common.collect.ImmutableMap;
+import org.apache.iceberg.spark.SparkCatalog;
+import org.apache.spark.sql.connector.catalog.Identifier;
+import org.junit.After;
+import org.junit.BeforeClass;
+import org.junit.Test;
+import org.junit.runners.Parameterized;
+import org.mockito.Mockito;
+
+public class TestCachingCatalogExpirationAfterWrite extends SparkExtensionsTestBase {
+  private static HiveClientPool clients;
+
+  @Parameterized.Parameters(name = "catalogName = {0}, implementation = {1}, config = {2}")
+  public static Object[][] parameters() {
+    return new Object[][] {
+      {
+        "iceberg",
+        SparkCatalog.class.getName(),
+        ImmutableMap.of(
+            "type", "hive",
+            "default-namespace", "default",
+            "parquet-enabled", "true",
+            "cache-enabled", "true",
+            "cache.expiration-interval-ms", "5000")
+      }
+    };
+  }
+
+  public TestCachingCatalogExpirationAfterWrite(
+      String catalogName, String implementation, Map<String, String> config) {
+    super(catalogName, implementation, config);
+  }
+
+  @BeforeClass
+  public static void start() {
+    HiveClientPool clientPool = new HiveClientPool(2, new Configuration());
+    clients = Mockito.spy(clientPool);

Review Comment:
   Thanks @nastra 👍 



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


Re: [PR] Add an expireAfterWrite cache eviction policy to CachingCatalog [iceberg]

Posted by "zhangminglei (via GitHub)" <gi...@apache.org>.
zhangminglei commented on code in PR #8844:
URL: https://github.com/apache/iceberg/pull/8844#discussion_r1360727462


##########
core/src/main/java/org/apache/iceberg/CachingCatalog.java:
##########
@@ -110,6 +110,8 @@ private Cache<TableIdentifier, Table> createTableCache(Ticker ticker) {
           .removalListener(new MetadataTableInvalidatingRemovalListener())
           .executor(Runnable::run) // Makes the callbacks to removal listener synchronous
           .expireAfterAccess(Duration.ofMillis(expirationIntervalMillis))
+          .expireAfterWrite(

Review Comment:
   In fact, my local code does not have `expireAfterAccess`. I kept `expireAfterAccess` to see how reviewer thinks about this problem. I set the `expireAfterWrite` time to be longer than `expireAfterAccess` because I want to use it as a cache ttl.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


Re: [PR] Add an expireAfterWrite cache eviction policy to CachingCatalog [iceberg]

Posted by "zhangminglei (via GitHub)" <gi...@apache.org>.
zhangminglei commented on code in PR #8844:
URL: https://github.com/apache/iceberg/pull/8844#discussion_r1361733500


##########
spark/v3.5/spark/src/test/java/org/apache/iceberg/spark/TestCachingCatalogExpirationAfterWrite.java:
##########
@@ -0,0 +1,89 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+package org.apache.iceberg.spark;
+
+import java.util.Map;
+import java.util.concurrent.TimeUnit;
+import org.apache.hadoop.conf.Configuration;
+import org.apache.iceberg.hive.HiveClientPool;
+import org.apache.iceberg.relocated.com.google.common.collect.ImmutableMap;
+import org.apache.spark.sql.connector.catalog.Identifier;
+import org.awaitility.Awaitility;
+import org.junit.After;
+import org.junit.AfterClass;
+import org.junit.BeforeClass;
+import org.junit.Test;
+import org.junit.runners.Parameterized;
+
+public class TestCachingCatalogExpirationAfterWrite extends SparkCatalogTestBase {

Review Comment:
   Sounds reasonable, What do you think ? @nastra , You wish put it into `spark` module ?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


Re: [PR] Update CachingCatalog to use expireAfterWrite instead of expireAfterAccess [iceberg]

Posted by "zhangminglei (via GitHub)" <gi...@apache.org>.
zhangminglei commented on PR #8844:
URL: https://github.com/apache/iceberg/pull/8844#issuecomment-1769792110

   Hi, @nastra @lirui-apache I think all the PR comments have been addressed now - can you please check if anything else is required, and if not, merge this in? Thanks a lot!


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


Re: [PR] Add an expireAfterWrite cache eviction policy to CachingCatalog [iceberg]

Posted by "nastra (via GitHub)" <gi...@apache.org>.
nastra commented on code in PR #8844:
URL: https://github.com/apache/iceberg/pull/8844#discussion_r1360480900


##########
core/src/main/java/org/apache/iceberg/CachingCatalog.java:
##########
@@ -110,6 +110,8 @@ private Cache<TableIdentifier, Table> createTableCache(Ticker ticker) {
           .removalListener(new MetadataTableInvalidatingRemovalListener())
           .executor(Runnable::run) // Makes the callbacks to removal listener synchronous
           .expireAfterAccess(Duration.ofMillis(expirationIntervalMillis))
+          .expireAfterWrite(

Review Comment:
   I don't think we need/should specify `expireAfterWrite` in addition to `expireAfterAccess`:
   
   * The javadoc of `expireAfterWrite` mentions: `Specifies that each entry should be automatically removed from the cache once a fixed duration has elapsed after the entry's creation, or the most recent replacement of its value.`
   
   * The javadoc of `expireAfterAccess` mentions: `Specifies that each entry should be automatically removed from the cache once a fixed duration has elapsed after the entry's creation, the most recent replacement of its value, or its last access.`



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


Re: [PR] Update CachingCatalog to use expireAfterWrite instead of expireAfterAccess [iceberg]

Posted by "lirui-apache (via GitHub)" <gi...@apache.org>.
lirui-apache commented on code in PR #8844:
URL: https://github.com/apache/iceberg/pull/8844#discussion_r1364866417


##########
core/src/main/java/org/apache/iceberg/CatalogProperties.java:
##########
@@ -86,8 +86,8 @@ private CatalogProperties() {}
    * <ul>
    *   <li>Zero - Cache entries expires only if it gets evicted due to memory pressure from {@link
    *       #IO_MANIFEST_CACHE_MAX_TOTAL_BYTES} setting.
-   *   <li>Positive Values - Cache entries expire if not accessed via the cache after this many
-   *       milliseconds
+   *   <li>Positive Values - Cache entries expire after a fixed time period, regardless of whether
+   *        they are accessed via the cache during this time

Review Comment:
   I don't think manifest cache is changed, right?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


Re: [PR] Update CachingCatalog to use expireAfterWrite instead of expireAfterAccess [iceberg]

Posted by "nastra (via GitHub)" <gi...@apache.org>.
nastra commented on PR #8844:
URL: https://github.com/apache/iceberg/pull/8844#issuecomment-1772561757

   @RussellSpitzer any thoughts on changing the caching behavior here? I'm a little bit hesitant with this change. On one hand I can see a reasoning about wanting a table to expire from the cache after it was initially written. On the other hand it's possible that longer running jobs all of a sudden might start to fail, because the table they are operating on changed suddenly (due to cache expiration)


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


Re: [PR] Add an expireAfterWrite cache eviction policy to CachingCatalog [iceberg]

Posted by "nastra (via GitHub)" <gi...@apache.org>.
nastra commented on code in PR #8844:
URL: https://github.com/apache/iceberg/pull/8844#discussion_r1360480900


##########
core/src/main/java/org/apache/iceberg/CachingCatalog.java:
##########
@@ -110,6 +110,8 @@ private Cache<TableIdentifier, Table> createTableCache(Ticker ticker) {
           .removalListener(new MetadataTableInvalidatingRemovalListener())
           .executor(Runnable::run) // Makes the callbacks to removal listener synchronous
           .expireAfterAccess(Duration.ofMillis(expirationIntervalMillis))
+          .expireAfterWrite(

Review Comment:
   I don't think we need/should specify `expireAfterWrite` in addition to `expireAfterAccess`:
   
   * The javadoc of `expireAfterWrite` mentions: `Specifies that each entry should be automatically removed from the cache once a fixed duration has elapsed after the entry's creation, or the most recent replacement of its value.`
   
   * The javadoc of `expireAfterAccess` mentions: `Specifies that each entry should be automatically removed from the cache once a fixed duration has elapsed after the entry's creation, the most recent replacement of its value, or its last access.`
   
   That being said, did you actually mean to replace `expireAfterAccess` with `expireAfterWrite`?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


Re: [PR] Update CachingCatalog to use expireAfterWrite instead of expireAfterAccess [iceberg]

Posted by "zhangminglei (via GitHub)" <gi...@apache.org>.
zhangminglei commented on PR #8844:
URL: https://github.com/apache/iceberg/pull/8844#issuecomment-1772312138

   @nastra Hi, if has no problem, please merge this in ? 😄 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


Re: [PR] Update CachingCatalog to use expireAfterWrite instead of expireAfterAccess [iceberg]

Posted by "lirui-apache (via GitHub)" <gi...@apache.org>.
lirui-apache commented on PR #8844:
URL: https://github.com/apache/iceberg/pull/8844#issuecomment-1774392135

   Thanks @RussellSpitzer , I think "REFRESH TABLE" can solve our issue. Is it correct to assume that the best practice is to always refresh before running any DDLs on a table?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


Re: [PR] Add an expireAfterWrite cache eviction policy to CachingCatalog [iceberg]

Posted by "nastra (via GitHub)" <gi...@apache.org>.
nastra commented on code in PR #8844:
URL: https://github.com/apache/iceberg/pull/8844#discussion_r1360589867


##########
spark/v3.5/spark-extensions/src/test/java/org/apache/iceberg/spark/extensions/TestCachingCatalogExpirationAfterWrite.java:
##########
@@ -0,0 +1,82 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+package org.apache.iceberg.spark.extensions;
+
+import java.util.Map;
+import org.apache.hadoop.conf.Configuration;
+import org.apache.iceberg.hive.HiveClientPool;
+import org.apache.iceberg.relocated.com.google.common.collect.ImmutableMap;
+import org.apache.iceberg.spark.SparkCatalog;
+import org.apache.spark.sql.connector.catalog.Identifier;
+import org.junit.After;
+import org.junit.BeforeClass;
+import org.junit.Test;
+import org.junit.runners.Parameterized;
+import org.mockito.Mockito;
+
+public class TestCachingCatalogExpirationAfterWrite extends SparkExtensionsTestBase {
+  private static HiveClientPool clients;
+
+  @Parameterized.Parameters(name = "catalogName = {0}, implementation = {1}, config = {2}")
+  public static Object[][] parameters() {
+    return new Object[][] {
+      {
+        "iceberg",
+        SparkCatalog.class.getName(),
+        ImmutableMap.of(
+            "type", "hive",
+            "default-namespace", "default",
+            "parquet-enabled", "true",
+            "cache-enabled", "true",
+            "cache.expiration-interval-ms", "5000")

Review Comment:
   we can lower this to `1000`, no need to wait 5 seconds



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


Re: [PR] Update CachingCatalog to use expireAfterWrite instead of expireAfterAccess [iceberg]

Posted by "zhangminglei (via GitHub)" <gi...@apache.org>.
zhangminglei commented on PR #8844:
URL: https://github.com/apache/iceberg/pull/8844#issuecomment-1774520236

   From spark doc https://spark.apache.org/docs/3.0.0-preview/sql-ref-syntax-aux-refresh-table.html#description
   
   **REFRESH TABLE statement invalidates the cached entries, which include data and metadata of the given table or view. The invalidated cache is populated in lazy manner when the cached table or the query associated with it is executed again.**
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


Re: [PR] Update CachingCatalog to use expireAfterWrite instead of expireAfterAccess [iceberg]

Posted by "nastra (via GitHub)" <gi...@apache.org>.
nastra closed pull request #8844: Update CachingCatalog to use expireAfterWrite instead of expireAfterAccess
URL: https://github.com/apache/iceberg/pull/8844


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


Re: [PR] Add an expireAfterWrite cache eviction policy to CachingCatalog [iceberg]

Posted by "nastra (via GitHub)" <gi...@apache.org>.
nastra commented on code in PR #8844:
URL: https://github.com/apache/iceberg/pull/8844#discussion_r1360600506


##########
spark/v3.5/spark-extensions/src/test/java/org/apache/iceberg/spark/extensions/TestCachingCatalogExpirationAfterWrite.java:
##########
@@ -0,0 +1,82 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+package org.apache.iceberg.spark.extensions;
+
+import java.util.Map;
+import org.apache.hadoop.conf.Configuration;
+import org.apache.iceberg.hive.HiveClientPool;
+import org.apache.iceberg.relocated.com.google.common.collect.ImmutableMap;
+import org.apache.iceberg.spark.SparkCatalog;
+import org.apache.spark.sql.connector.catalog.Identifier;
+import org.junit.After;
+import org.junit.BeforeClass;
+import org.junit.Test;
+import org.junit.runners.Parameterized;
+import org.mockito.Mockito;
+
+public class TestCachingCatalogExpirationAfterWrite extends SparkExtensionsTestBase {

Review Comment:
   does the test need to be located in the `spark-extensions` module or can it go into `spark`?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


Re: [PR] Add an expireAfterWrite cache eviction policy to CachingCatalog [iceberg]

Posted by "lirui-apache (via GitHub)" <gi...@apache.org>.
lirui-apache commented on code in PR #8844:
URL: https://github.com/apache/iceberg/pull/8844#discussion_r1361499834


##########
spark/v3.5/spark/src/test/java/org/apache/iceberg/spark/TestCachingCatalogExpirationAfterWrite.java:
##########
@@ -0,0 +1,89 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+package org.apache.iceberg.spark;
+
+import java.util.Map;
+import java.util.concurrent.TimeUnit;
+import org.apache.hadoop.conf.Configuration;
+import org.apache.iceberg.hive.HiveClientPool;
+import org.apache.iceberg.relocated.com.google.common.collect.ImmutableMap;
+import org.apache.spark.sql.connector.catalog.Identifier;
+import org.awaitility.Awaitility;
+import org.junit.After;
+import org.junit.AfterClass;
+import org.junit.BeforeClass;
+import org.junit.Test;
+import org.junit.runners.Parameterized;
+
+public class TestCachingCatalogExpirationAfterWrite extends SparkCatalogTestBase {

Review Comment:
   Is it possible to add the new test to `TestCachingCatalog`? There seems nothing specific to spark here.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


Re: [PR] Add an expireAfterWrite cache eviction policy to CachingCatalog [iceberg]

Posted by "lirui-apache (via GitHub)" <gi...@apache.org>.
lirui-apache commented on PR #8844:
URL: https://github.com/apache/iceberg/pull/8844#issuecomment-1765657323

   Thanks @zhangminglei for working on this. I guess we also need to update the doc in `CatalogProperties` to indicate the change.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


Re: [PR] Add an expireAfterWrite cache eviction policy to CachingCatalog [iceberg]

Posted by "nastra (via GitHub)" <gi...@apache.org>.
nastra commented on code in PR #8844:
URL: https://github.com/apache/iceberg/pull/8844#discussion_r1360588962


##########
spark/v3.5/spark-extensions/src/test/java/org/apache/iceberg/spark/extensions/TestCachingCatalogExpirationAfterWrite.java:
##########
@@ -0,0 +1,82 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+package org.apache.iceberg.spark.extensions;
+
+import java.util.Map;
+import org.apache.hadoop.conf.Configuration;
+import org.apache.iceberg.hive.HiveClientPool;
+import org.apache.iceberg.relocated.com.google.common.collect.ImmutableMap;
+import org.apache.iceberg.spark.SparkCatalog;
+import org.apache.spark.sql.connector.catalog.Identifier;
+import org.junit.After;
+import org.junit.BeforeClass;
+import org.junit.Test;
+import org.junit.runners.Parameterized;
+import org.mockito.Mockito;
+
+public class TestCachingCatalogExpirationAfterWrite extends SparkExtensionsTestBase {
+  private static HiveClientPool clients;
+
+  @Parameterized.Parameters(name = "catalogName = {0}, implementation = {1}, config = {2}")
+  public static Object[][] parameters() {
+    return new Object[][] {
+      {
+        "iceberg",
+        SparkCatalog.class.getName(),
+        ImmutableMap.of(
+            "type", "hive",
+            "default-namespace", "default",
+            "parquet-enabled", "true",
+            "cache-enabled", "true",
+            "cache.expiration-interval-ms", "5000")
+      }
+    };
+  }
+
+  public TestCachingCatalogExpirationAfterWrite(
+      String catalogName, String implementation, Map<String, String> config) {
+    super(catalogName, implementation, config);
+  }
+
+  @BeforeClass
+  public static void start() {
+    HiveClientPool clientPool = new HiveClientPool(2, new Configuration());
+    clients = Mockito.spy(clientPool);
+  }
+
+  @After
+  public void removeTable() {
+    sql("DROP TABLE IF EXISTS iceberg.default.foo");
+  }
+
+  @Test(timeout = 60000)
+  public void testExpireCacheAfterWrite() throws Exception {
+    sql("create table iceberg.default.foo(x int) using iceberg");
+    clients.run(
+        client -> {
+          client.dropTable("default", "foo");
+          return null;
+        });
+    SparkCatalog sparkCatalog =
+        (SparkCatalog) spark.sessionState().catalogManager().catalog(catalogName);
+    while (sparkCatalog.tableExists(Identifier.of(new String[] {"default"}, "foo"))) {
+      Thread.sleep(1000);

Review Comment:
   I think it's better to use Awaitility here, because we don't want to use sleeps and potentially have a long running (60 seconds) test:
   
   ```
   @BeforeClass
     public static void start() {
       clients = new HiveClientPool(2, new Configuration());
     }
   
     @AfterClass
     public static void stop() {
       if (null != clients) {
         clients.close();
       }
     }
   
     @After
     public void removeTable() {
       sql("DROP TABLE IF EXISTS iceberg.default.foo");
     }
   
     @Test
     public void testExpireCacheAfterWrite() throws Exception {
       sql("create table iceberg.default.foo(x int) using iceberg");
       clients.run(
           client -> {
             client.dropTable("default", "foo");
             return null;
           });
   
       SparkCatalog sparkCatalog =
           (SparkCatalog) spark.sessionState().catalogManager().catalog(catalogName);
   
       Awaitility.await("wait for table to be removed from cache")
           .atMost(3, TimeUnit.SECONDS)
           .until(() -> !sparkCatalog.tableExists(Identifier.of(new String[] {"default"}, "foo")));
     }
   ```



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


Re: [PR] Add an expireAfterWrite cache eviction policy to CachingCatalog [iceberg]

Posted by "nastra (via GitHub)" <gi...@apache.org>.
nastra commented on code in PR #8844:
URL: https://github.com/apache/iceberg/pull/8844#discussion_r1360589513


##########
spark/v3.5/spark-extensions/src/test/java/org/apache/iceberg/spark/extensions/TestCachingCatalogExpirationAfterWrite.java:
##########
@@ -0,0 +1,82 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+package org.apache.iceberg.spark.extensions;
+
+import java.util.Map;
+import org.apache.hadoop.conf.Configuration;
+import org.apache.iceberg.hive.HiveClientPool;
+import org.apache.iceberg.relocated.com.google.common.collect.ImmutableMap;
+import org.apache.iceberg.spark.SparkCatalog;
+import org.apache.spark.sql.connector.catalog.Identifier;
+import org.junit.After;
+import org.junit.BeforeClass;
+import org.junit.Test;
+import org.junit.runners.Parameterized;
+import org.mockito.Mockito;
+
+public class TestCachingCatalogExpirationAfterWrite extends SparkExtensionsTestBase {
+  private static HiveClientPool clients;
+
+  @Parameterized.Parameters(name = "catalogName = {0}, implementation = {1}, config = {2}")
+  public static Object[][] parameters() {
+    return new Object[][] {
+      {
+        "iceberg",
+        SparkCatalog.class.getName(),
+        ImmutableMap.of(
+            "type", "hive",
+            "default-namespace", "default",
+            "parquet-enabled", "true",
+            "cache-enabled", "true",
+            "cache.expiration-interval-ms", "5000")
+      }
+    };
+  }
+
+  public TestCachingCatalogExpirationAfterWrite(
+      String catalogName, String implementation, Map<String, String> config) {
+    super(catalogName, implementation, config);
+  }
+
+  @BeforeClass
+  public static void start() {
+    HiveClientPool clientPool = new HiveClientPool(2, new Configuration());
+    clients = Mockito.spy(clientPool);

Review Comment:
   no need for Mockito here. Also the client pool needs closing in an `@AfterClass`



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


Re: [PR] Add an expireAfterWrite cache eviction policy to CachingCatalog [iceberg]

Posted by "nastra (via GitHub)" <gi...@apache.org>.
nastra commented on code in PR #8844:
URL: https://github.com/apache/iceberg/pull/8844#discussion_r1362413695


##########
spark/v3.5/spark/src/test/java/org/apache/iceberg/spark/TestCachingCatalogExpirationAfterWrite.java:
##########
@@ -0,0 +1,89 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+package org.apache.iceberg.spark;
+
+import java.util.Map;
+import java.util.concurrent.TimeUnit;
+import org.apache.hadoop.conf.Configuration;
+import org.apache.iceberg.hive.HiveClientPool;
+import org.apache.iceberg.relocated.com.google.common.collect.ImmutableMap;
+import org.apache.spark.sql.connector.catalog.Identifier;
+import org.awaitility.Awaitility;
+import org.junit.After;
+import org.junit.AfterClass;
+import org.junit.BeforeClass;
+import org.junit.Test;
+import org.junit.runners.Parameterized;
+
+public class TestCachingCatalogExpirationAfterWrite extends SparkCatalogTestBase {

Review Comment:
   +1



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


Re: [PR] Add an expireAfterWrite cache eviction policy to CachingCatalog [iceberg]

Posted by "zhangminglei (via GitHub)" <gi...@apache.org>.
zhangminglei commented on code in PR #8844:
URL: https://github.com/apache/iceberg/pull/8844#discussion_r1363068688


##########
spark/v3.5/spark/src/test/java/org/apache/iceberg/spark/TestCachingCatalogExpirationAfterWrite.java:
##########
@@ -0,0 +1,89 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+package org.apache.iceberg.spark;
+
+import java.util.Map;
+import java.util.concurrent.TimeUnit;
+import org.apache.hadoop.conf.Configuration;
+import org.apache.iceberg.hive.HiveClientPool;
+import org.apache.iceberg.relocated.com.google.common.collect.ImmutableMap;
+import org.apache.spark.sql.connector.catalog.Identifier;
+import org.awaitility.Awaitility;
+import org.junit.After;
+import org.junit.AfterClass;
+import org.junit.BeforeClass;
+import org.junit.Test;
+import org.junit.runners.Parameterized;
+
+public class TestCachingCatalogExpirationAfterWrite extends SparkCatalogTestBase {

Review Comment:
   I've briefly reviewed the code, and it seems that the existing test `testCatalogExpirationTtl` already covers the code change. I believe we can safely remove this test.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


Re: [PR] Update CachingCatalog to use expireAfterWrite instead of expireAfterAccess [iceberg]

Posted by "zhangminglei (via GitHub)" <gi...@apache.org>.
zhangminglei commented on code in PR #8844:
URL: https://github.com/apache/iceberg/pull/8844#discussion_r1364968054


##########
core/src/main/java/org/apache/iceberg/CatalogProperties.java:
##########
@@ -86,8 +86,8 @@ private CatalogProperties() {}
    * <ul>
    *   <li>Zero - Cache entries expires only if it gets evicted due to memory pressure from {@link
    *       #IO_MANIFEST_CACHE_MAX_TOTAL_BYTES} setting.
-   *   <li>Positive Values - Cache entries expire if not accessed via the cache after this many
-   *       milliseconds
+   *   <li>Positive Values - Cache entries expire after a fixed time period, regardless of whether
+   *        they are accessed via the cache during this time

Review Comment:
   Correct, manifest cache use `weakKeys` and `softValue` to expire entries by gc.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


Re: [PR] Update CachingCatalog to use expireAfterWrite instead of expireAfterAccess [iceberg]

Posted by "zhangminglei (via GitHub)" <gi...@apache.org>.
zhangminglei commented on PR #8844:
URL: https://github.com/apache/iceberg/pull/8844#issuecomment-1770804416

   @nastra Can you please take a finally review ? 😺 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org