You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@iceberg.apache.org by GitBox <gi...@apache.org> on 2022/01/08 11:32:23 UTC

[GitHub] [iceberg] RussellSpitzer commented on a change in pull request #3861: Spark: Reduce unnecessary remote service requests in `SparkSessionCatalog.invalidateTable`

RussellSpitzer commented on a change in pull request #3861:
URL: https://github.com/apache/iceberg/pull/3861#discussion_r780660136



##########
File path: spark/v3.0/spark/src/main/java/org/apache/iceberg/spark/SparkSessionCatalog.java
##########
@@ -123,11 +123,10 @@ public Table loadTable(Identifier ident) throws NoSuchTableException {
 
   @Override
   public void invalidateTable(Identifier ident) {
-    if (icebergCatalog.tableExists(ident)) {
-      icebergCatalog.invalidateTable(ident);
-    } else {
-      getSessionCatalog().invalidateTable(ident);
-    }
+    // We do not need to check whether the table exists and whether
+    // it is an Iceberg table to reduce remote service requests.
+    icebergCatalog.invalidateTable(ident);

Review comment:
       So we always have to do at least one catalog load here. Since we have a caching catalog this is at most a single remote lookup but in most circumstances should probably be zero because the reference is cached.
   
   The code for the SparkCatalog 'invalidatetable' method starts by doing a 'load' operation, so either that call will use a cached reference or do a remote lookup. If the 'exists' call was remote than it the table reference would be cached and the invalidate call would use that cache. If the exists call wasn't remote it also would be cached that means the reference was cached already and invalidate should also use that cache. So the number of remote lookups should be at most one both with this patch and without.
   
   This change now also always calls the underlying session catalogs invalidate method and I'm not sure what the consequences of that so I would probably recommend leaving this as is unless we have some evidence this is causing an issue.
   
   I think if we want to remove the remote calls we have to first go to the invalidateTables method and do some mods there.
   
   First, check whether the catalog is caching or not. For non caching or no cache timeout catalogs we can return a noop. For tables backed by a catalog with a cache we would probably need to add a dropCache function to the Iceberg cachingcatalog api or something like that so we don't have to rely on calling load.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org