You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@iceberg.apache.org by "wypoon (via GitHub)" <gi...@apache.org> on 2023/04/29 00:12:53 UTC

[GitHub] [iceberg] wypoon opened a new pull request, #7469: Core, Spark: Add configuration to control case sensitivity of CachingCatalog

wypoon opened a new pull request, #7469:
URL: https://github.com/apache/iceberg/pull/7469

   ... and make the SparkCatalog use a case-insensitive CachingCatalog by default.
   
   Motivation:
   The `CachingCatalog` has a field that determines if the cache uses case-sensitive `TableIdentifier` keys. By default, caching is enabled and the `SparkCatalog` wraps itself in a case-sensitive `CachingCatalog`. There is no configuration to allow specifying the use of a case-insensitive `CachingCatalog`.
   In a customer SQL workload, we discovered that queries used inconsistent case for database and table names. A table is read from using an upper case name and is updated using a lower case name. This is not incorrect as SQL is case-insensitive for database, table and column names. This is in the **same** Spark session. Normally the new snapshot should be read immediately after the write, but it is not, due to a **different** table being read from the cache (two different entries for the table are in the case, under different keys). As a result, stale data is read until the cache expiration occurs. (Due to repeated reads, the cache keeps getting renewed, exacerbating the problem.)
   
   Note: Currently, in the `CachingCatalog`, the metadata table resolution is already case-insensitive for the metadata part of the name.
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] wypoon commented on pull request #7469: Core, Spark: Add configuration to control case sensitivity of CachingCatalog

Posted by "wypoon (via GitHub)" <gi...@apache.org>.
wypoon commented on PR #7469:
URL: https://github.com/apache/iceberg/pull/7469#issuecomment-1532287984

   @RussellSpitzer @szehon-ho can this be merged?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] szehon-ho commented on pull request #7469: Core, Spark: Add configuration to control case sensitivity of CachingCatalog

Posted by "szehon-ho (via GitHub)" <gi...@apache.org>.
szehon-ho commented on PR #7469:
URL: https://github.com/apache/iceberg/pull/7469#issuecomment-1535688492

   Thanks @wypoon , I approved #7535 to restore the behavior.  
   
   You guys may be right that its a specific property to IcebergCatalog types, it looks like due to Hive limitation, HiveCatalog will always be case insensitive to database/table names, and we cannot ever support caseSensitive db/table names, is it right?  In that case, I dont mind adding that catalog flag unless I'm missing something.  cc @RussellSpitzer for sanity check.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] szehon-ho commented on a diff in pull request #7469: Core, Spark: Add configuration to control case sensitivity of CachingCatalog

Posted by "szehon-ho (via GitHub)" <gi...@apache.org>.
szehon-ho commented on code in PR #7469:
URL: https://github.com/apache/iceberg/pull/7469#discussion_r1184353752


##########
spark/v3.1/spark/src/main/java/org/apache/iceberg/spark/SparkCatalog.java:
##########
@@ -457,6 +459,13 @@ public final void initialize(String name, CaseInsensitiveStringMap options) {
         PropertyUtil.propertyAsBoolean(
             options, CatalogProperties.CACHE_ENABLED, CatalogProperties.CACHE_ENABLED_DEFAULT);
 
+    SparkSession sparkSession = SparkSession.active();
+    boolean cacheCaseSensitive =
+        PropertyUtil.propertyAsBoolean(
+            options,
+            CatalogProperties.CACHE_CASE_SENSITIVE,
+            Boolean.parseBoolean(sparkSession.conf().get("spark.sql.caseSensitive")));

Review Comment:
   Yea, i would say if its just a few lines, add the method to SparkUtil and then use it here, so at least this code is same across three versions.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] wypoon commented on a diff in pull request #7469: Core, Spark: Add configuration to control case sensitivity of CachingCatalog

Posted by "wypoon (via GitHub)" <gi...@apache.org>.
wypoon commented on code in PR #7469:
URL: https://github.com/apache/iceberg/pull/7469#discussion_r1184096271


##########
spark/v3.1/spark/src/main/java/org/apache/iceberg/spark/SparkCatalog.java:
##########
@@ -457,6 +459,13 @@ public final void initialize(String name, CaseInsensitiveStringMap options) {
         PropertyUtil.propertyAsBoolean(
             options, CatalogProperties.CACHE_ENABLED, CatalogProperties.CACHE_ENABLED_DEFAULT);
 
+    SparkSession sparkSession = SparkSession.active();
+    boolean cacheCaseSensitive =
+        PropertyUtil.propertyAsBoolean(
+            options,
+            CatalogProperties.CACHE_CASE_SENSITIVE,
+            Boolean.parseBoolean(sparkSession.conf().get("spark.sql.caseSensitive")));

Review Comment:
   In 3.1, `SparkUtil.caseSensitive(SparkSession)` does not exist. In 3.2 - 3.4, that helper method is used instead.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] RussellSpitzer commented on a diff in pull request #7469: Core, Spark: Add configuration to control case sensitivity of CachingCatalog

Posted by "RussellSpitzer (via GitHub)" <gi...@apache.org>.
RussellSpitzer commented on code in PR #7469:
URL: https://github.com/apache/iceberg/pull/7469#discussion_r1183182500


##########
core/src/main/java/org/apache/iceberg/CatalogProperties.java:
##########
@@ -41,6 +41,11 @@ private CatalogProperties() {}
 
   public static final boolean CACHE_ENABLED_DEFAULT = true;
 
+  /** Controls whether the caching catalog will cache table entries using case-sensitive keys. */
+  public static final String CACHE_CASE_SENSITIVE = "cache.case-sensitive";
+
+  public static final boolean CACHE_CASE_SENSITIVE_DEFAULT = false;

Review Comment:
   Exactly, reading the parameter from the runtime conf creating the catalog and using that



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] RussellSpitzer commented on a diff in pull request #7469: Core, Spark: Add configuration to control case sensitivity of CachingCatalog

Posted by "RussellSpitzer (via GitHub)" <gi...@apache.org>.
RussellSpitzer commented on code in PR #7469:
URL: https://github.com/apache/iceberg/pull/7469#discussion_r1183182682


##########
core/src/main/java/org/apache/iceberg/CatalogProperties.java:
##########
@@ -41,6 +41,11 @@ private CatalogProperties() {}
 
   public static final boolean CACHE_ENABLED_DEFAULT = true;
 
+  /** Controls whether the caching catalog will cache table entries using case-sensitive keys. */
+  public static final String CACHE_CASE_SENSITIVE = "cache.case-sensitive";
+
+  public static final boolean CACHE_CASE_SENSITIVE_DEFAULT = false;

Review Comment:
   Not that we have to do that, but I think that would make sense



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] wypoon commented on a diff in pull request #7469: Core, Spark: Add configuration to control case sensitivity of CachingCatalog

Posted by "wypoon (via GitHub)" <gi...@apache.org>.
wypoon commented on code in PR #7469:
URL: https://github.com/apache/iceberg/pull/7469#discussion_r1181959523


##########
spark/v3.1/spark/src/main/java/org/apache/iceberg/spark/SparkCatalog.java:
##########
@@ -90,6 +90,8 @@
  *   <li><code>catalog-impl</code> - a custom {@link Catalog} implementation to use
  *   <li><code>default-namespace</code> - a namespace to use as the default
  *   <li><code>cache-enabled</code> - whether to enable catalog cache
+ *   <li><code>cache.case-sensitive</code> - whether the catalog cache should use case-sensitive

Review Comment:
   It is more than table names. Database name can also be an issue. Basically all the components of the TableIdentifier.
   How about "whether the catalog cache should compare table identifiers in a case sensitive way"?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] szehon-ho commented on a diff in pull request #7469: Core, Spark: Add configuration to control case sensitivity of CachingCatalog

Posted by "szehon-ho (via GitHub)" <gi...@apache.org>.
szehon-ho commented on code in PR #7469:
URL: https://github.com/apache/iceberg/pull/7469#discussion_r1184230222


##########
core/src/main/java/org/apache/iceberg/CatalogProperties.java:
##########
@@ -41,6 +41,11 @@ private CatalogProperties() {}
 
   public static final boolean CACHE_ENABLED_DEFAULT = true;
 
+  /** Controls whether the caching catalog will cache table entries using case-sensitive keys. */
+  public static final String CACHE_CASE_SENSITIVE = "cache.case-sensitive";
+
+  public static final boolean CACHE_CASE_SENSITIVE_DEFAULT = false;

Review Comment:
   Great, makes sense to me.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] szehon-ho commented on a diff in pull request #7469: Core, Spark: Add configuration to control case sensitivity of CachingCatalog

Posted by "szehon-ho (via GitHub)" <gi...@apache.org>.
szehon-ho commented on code in PR #7469:
URL: https://github.com/apache/iceberg/pull/7469#discussion_r1181945744


##########
spark/v3.1/spark/src/main/java/org/apache/iceberg/spark/SparkCatalog.java:
##########
@@ -90,6 +90,8 @@
  *   <li><code>catalog-impl</code> - a custom {@link Catalog} implementation to use
  *   <li><code>default-namespace</code> - a namespace to use as the default
  *   <li><code>cache-enabled</code> - whether to enable catalog cache
+ *   <li><code>cache.case-sensitive</code> - whether the catalog cache should use case-sensitive

Review Comment:
   Key may be a bit abstract?
   
   "whether the catalog cache should compare table names in a case sensitive manner"?



##########
core/src/main/java/org/apache/iceberg/CatalogProperties.java:
##########
@@ -41,6 +41,11 @@ private CatalogProperties() {}
 
   public static final boolean CACHE_ENABLED_DEFAULT = true;
 
+  /** Controls whether the caching catalog will cache table entries using case-sensitive keys. */
+  public static final String CACHE_CASE_SENSITIVE = "cache.case-sensitive";
+
+  public static final boolean CACHE_CASE_SENSITIVE_DEFAULT = false;

Review Comment:
   I see the current wrapCatalog two-arg call defaults to caseSensitive=true?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] wypoon commented on pull request #7469: Core, Spark: Add configuration to control case sensitivity of CachingCatalog

Posted by "wypoon (via GitHub)" <gi...@apache.org>.
wypoon commented on PR #7469:
URL: https://github.com/apache/iceberg/pull/7469#issuecomment-1530539002

   @szehon-ho thanks for reviewing! I'll update the javadoc comment.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] szehon-ho commented on a diff in pull request #7469: Core, Spark: Add configuration to control case sensitivity of CachingCatalog

Posted by "szehon-ho (via GitHub)" <gi...@apache.org>.
szehon-ho commented on code in PR #7469:
URL: https://github.com/apache/iceberg/pull/7469#discussion_r1184228640


##########
spark/v3.1/spark/src/main/java/org/apache/iceberg/spark/SparkCatalog.java:
##########
@@ -457,6 +459,13 @@ public final void initialize(String name, CaseInsensitiveStringMap options) {
         PropertyUtil.propertyAsBoolean(
             options, CatalogProperties.CACHE_ENABLED, CatalogProperties.CACHE_ENABLED_DEFAULT);
 
+    SparkSession sparkSession = SparkSession.active();
+    boolean cacheCaseSensitive =
+        PropertyUtil.propertyAsBoolean(
+            options,
+            CatalogProperties.CACHE_CASE_SENSITIVE,
+            Boolean.parseBoolean(sparkSession.conf().get("spark.sql.caseSensitive")));

Review Comment:
   Just realized that even the util method uses the constant, I would have thought it would use the SqlConf object val, oh well.  
   
   It's minor but could we can add that to SparkUtil if we are anyway in Spark3.1 code, if its quick?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] wypoon commented on a diff in pull request #7469: Core, Spark: Add configuration to control case sensitivity of CachingCatalog

Posted by "wypoon (via GitHub)" <gi...@apache.org>.
wypoon commented on code in PR #7469:
URL: https://github.com/apache/iceberg/pull/7469#discussion_r1184092083


##########
core/src/main/java/org/apache/iceberg/CatalogProperties.java:
##########
@@ -41,6 +41,11 @@ private CatalogProperties() {}
 
   public static final boolean CACHE_ENABLED_DEFAULT = true;
 
+  /** Controls whether the caching catalog will cache table entries using case-sensitive keys. */
+  public static final String CACHE_CASE_SENSITIVE = "cache.case-sensitive";
+
+  public static final boolean CACHE_CASE_SENSITIVE_DEFAULT = false;

Review Comment:
   @RussellSpitzer I had not understood what you meant.
   @szehon-ho I have now updated the PR.
   
   Assuming caching is enabled, the behavior is as follows:
   If the user configures `cache.case-sensitive`, the conf is used. Otherwise, the case sensitivity of `CachingCatalog` is set to match `spark.sql.caseSensitive` in the runtime configuration of the `SparkSession`.
   Thus we have configurability as well as a default that should match users' expectations. In practice, `spark.sql.caseSensitive` should rarely be set (and thus default to false), as the Spark documentation for it advises, "It is highly discouraged to turn on case sensitive mode."



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] wypoon commented on a diff in pull request #7469: Core, Spark: Add configuration to control case sensitivity of CachingCatalog

Posted by "wypoon (via GitHub)" <gi...@apache.org>.
wypoon commented on code in PR #7469:
URL: https://github.com/apache/iceberg/pull/7469#discussion_r1181958644


##########
core/src/main/java/org/apache/iceberg/CatalogProperties.java:
##########
@@ -41,6 +41,11 @@ private CatalogProperties() {}
 
   public static final boolean CACHE_ENABLED_DEFAULT = true;
 
+  /** Controls whether the caching catalog will cache table entries using case-sensitive keys. */
+  public static final String CACHE_CASE_SENSITIVE = "cache.case-sensitive";
+
+  public static final boolean CACHE_CASE_SENSITIVE_DEFAULT = false;

Review Comment:
   Yes, and that is the problem. I'd argue that we should default the case sensitivity to false, because I think this causes the least surprise to users.
   If others have use cases where they need case sensitivity to be true, I'd like to hear about them. I may yet be convinced that case sensitivity should default to true.
   In any case, this critical point is that now the case sensitivity can be configured.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] RussellSpitzer commented on a diff in pull request #7469: Core, Spark: Add configuration to control case sensitivity of CachingCatalog

Posted by "RussellSpitzer (via GitHub)" <gi...@apache.org>.
RussellSpitzer commented on code in PR #7469:
URL: https://github.com/apache/iceberg/pull/7469#discussion_r1181967223


##########
core/src/main/java/org/apache/iceberg/CatalogProperties.java:
##########
@@ -41,6 +41,11 @@ private CatalogProperties() {}
 
   public static final boolean CACHE_ENABLED_DEFAULT = true;
 
+  /** Controls whether the caching catalog will cache table entries using case-sensitive keys. */
+  public static final String CACHE_CASE_SENSITIVE = "cache.case-sensitive";
+
+  public static final boolean CACHE_CASE_SENSITIVE_DEFAULT = false;

Review Comment:
   Should we match the spark sql case sensitivity patan?



##########
core/src/main/java/org/apache/iceberg/CatalogProperties.java:
##########
@@ -41,6 +41,11 @@ private CatalogProperties() {}
 
   public static final boolean CACHE_ENABLED_DEFAULT = true;
 
+  /** Controls whether the caching catalog will cache table entries using case-sensitive keys. */
+  public static final String CACHE_CASE_SENSITIVE = "cache.case-sensitive";
+
+  public static final boolean CACHE_CASE_SENSITIVE_DEFAULT = false;

Review Comment:
   Should we match the spark sql case sensitivity param



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] szehon-ho commented on pull request #7469: Core, Spark: Add configuration to control case sensitivity of CachingCatalog

Posted by "szehon-ho (via GitHub)" <gi...@apache.org>.
szehon-ho commented on PR #7469:
URL: https://github.com/apache/iceberg/pull/7469#issuecomment-1535485253

   Ok, I'm thinking its simpler to just change the default from SparkUtil.caseSensitive to true ?  Anyway we should also wait for others here to comment, but feel free to open a pr .


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] snazy commented on pull request #7469: Core, Spark: Add configuration to control case sensitivity of CachingCatalog

Posted by "snazy (via GitHub)" <gi...@apache.org>.
snazy commented on PR #7469:
URL: https://github.com/apache/iceberg/pull/7469#issuecomment-1535346215

   Sure, you can use Nessie via Spark.
   This PR changes the default and existing behavior.
   Telling users of case sensitive catalogs to change all their client configurations isn‘t a good option.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] wypoon commented on a diff in pull request #7469: Core, Spark: Add configuration to control case sensitivity of CachingCatalog

Posted by "wypoon (via GitHub)" <gi...@apache.org>.
wypoon commented on code in PR #7469:
URL: https://github.com/apache/iceberg/pull/7469#discussion_r1184300768


##########
spark/v3.1/spark/src/main/java/org/apache/iceberg/spark/SparkCatalog.java:
##########
@@ -457,6 +459,13 @@ public final void initialize(String name, CaseInsensitiveStringMap options) {
         PropertyUtil.propertyAsBoolean(
             options, CatalogProperties.CACHE_ENABLED, CatalogProperties.CACHE_ENABLED_DEFAULT);
 
+    SparkSession sparkSession = SparkSession.active();
+    boolean cacheCaseSensitive =
+        PropertyUtil.propertyAsBoolean(
+            options,
+            CatalogProperties.CACHE_CASE_SENSITIVE,
+            Boolean.parseBoolean(sparkSession.conf().get("spark.sql.caseSensitive")));

Review Comment:
   I could add the `caseSensitive(SparkSession)` helper method to `SparkUtil` in 3.1 as well, but where do I stop? Do I then go on to rewrite the other places in 3.1 that should then use this helper instead? I didn't do so because this would touch more files unrelated to this change. @szehon-ho let me know what you prefer.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] wypoon commented on a diff in pull request #7469: Core, Spark: Add configuration to control case sensitivity of CachingCatalog

Posted by "wypoon (via GitHub)" <gi...@apache.org>.
wypoon commented on code in PR #7469:
URL: https://github.com/apache/iceberg/pull/7469#discussion_r1181973306


##########
core/src/main/java/org/apache/iceberg/CatalogProperties.java:
##########
@@ -41,6 +41,11 @@ private CatalogProperties() {}
 
   public static final boolean CACHE_ENABLED_DEFAULT = true;
 
+  /** Controls whether the caching catalog will cache table entries using case-sensitive keys. */
+  public static final String CACHE_CASE_SENSITIVE = "cache.case-sensitive";
+
+  public static final boolean CACHE_CASE_SENSITIVE_DEFAULT = false;

Review Comment:
   @RussellSpitzer are you referring to https://github.com/apache/spark/blob/v3.4.0/sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala#L867-L873
   That conf defaults to false, so I am in fact matching it.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] szehon-ho commented on a diff in pull request #7469: Core, Spark: Add configuration to control case sensitivity of CachingCatalog

Posted by "szehon-ho (via GitHub)" <gi...@apache.org>.
szehon-ho commented on code in PR #7469:
URL: https://github.com/apache/iceberg/pull/7469#discussion_r1181968213


##########
spark/v3.1/spark/src/main/java/org/apache/iceberg/spark/SparkCatalog.java:
##########
@@ -90,6 +90,8 @@
  *   <li><code>catalog-impl</code> - a custom {@link Catalog} implementation to use
  *   <li><code>default-namespace</code> - a namespace to use as the default
  *   <li><code>cache-enabled</code> - whether to enable catalog cache
+ *   <li><code>cache.case-sensitive</code> - whether the catalog cache should use case-sensitive

Review Comment:
   Yea sounds good to me.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] snazy commented on pull request #7469: Core, Spark: Add configuration to control case sensitivity of CachingCatalog

Posted by "snazy (via GitHub)" <gi...@apache.org>.
snazy commented on PR #7469:
URL: https://github.com/apache/iceberg/pull/7469#issuecomment-1535892623

   (Just shouting at that Spark property:) Oh my, so that `spark.sql.caseSensitive`, which seems to be originally intended for table-level column-name case-sensitivity, is now used for catalog/database/table name case-sensitivity as well. This Spark property is IMHO "maturely broken".
   
   #7535 fixes the breaking change - thanks for that!


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] wypoon commented on pull request #7469: Core, Spark: Add configuration to control case sensitivity of CachingCatalog

Posted by "wypoon (via GitHub)" <gi...@apache.org>.
wypoon commented on PR #7469:
URL: https://github.com/apache/iceberg/pull/7469#issuecomment-1528851861

   @rdblue @aokolnychyi @RussellSpitzer @jackye1995 
   Since this is a simple change, I made changes to all the Spark versions at once, but if you insist, I can remove the changes to Spark 3.1 - 3.3 and make them separately.
   @stevenzwu I can make a similar change to the `FlinkCatalog` as well; what do you think?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] wypoon commented on a diff in pull request #7469: Core, Spark: Add configuration to control case sensitivity of CachingCatalog

Posted by "wypoon (via GitHub)" <gi...@apache.org>.
wypoon commented on code in PR #7469:
URL: https://github.com/apache/iceberg/pull/7469#discussion_r1181973306


##########
core/src/main/java/org/apache/iceberg/CatalogProperties.java:
##########
@@ -41,6 +41,11 @@ private CatalogProperties() {}
 
   public static final boolean CACHE_ENABLED_DEFAULT = true;
 
+  /** Controls whether the caching catalog will cache table entries using case-sensitive keys. */
+  public static final String CACHE_CASE_SENSITIVE = "cache.case-sensitive";
+
+  public static final boolean CACHE_CASE_SENSITIVE_DEFAULT = false;

Review Comment:
   @RussellSpitzer are you referring to https://github.com/apache/spark/blob/v3.4.0/sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala#L867-L873
   That conf defaults to false, so I am in fact matching it.
   
   "Default to case insensitive. It is highly discouraged to turn on case sensitive mode."



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] wypoon commented on a diff in pull request #7469: Core, Spark: Add configuration to control case sensitivity of CachingCatalog

Posted by "wypoon (via GitHub)" <gi...@apache.org>.
wypoon commented on code in PR #7469:
URL: https://github.com/apache/iceberg/pull/7469#discussion_r1184381819


##########
spark/v3.1/spark/src/main/java/org/apache/iceberg/spark/SparkCatalog.java:
##########
@@ -457,6 +459,13 @@ public final void initialize(String name, CaseInsensitiveStringMap options) {
         PropertyUtil.propertyAsBoolean(
             options, CatalogProperties.CACHE_ENABLED, CatalogProperties.CACHE_ENABLED_DEFAULT);
 
+    SparkSession sparkSession = SparkSession.active();
+    boolean cacheCaseSensitive =
+        PropertyUtil.propertyAsBoolean(
+            options,
+            CatalogProperties.CACHE_CASE_SENSITIVE,
+            Boolean.parseBoolean(sparkSession.conf().get("spark.sql.caseSensitive")));

Review Comment:
   Ok, done.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] wypoon commented on a diff in pull request #7469: Core, Spark: Add configuration to control case sensitivity of CachingCatalog

Posted by "wypoon (via GitHub)" <gi...@apache.org>.
wypoon commented on code in PR #7469:
URL: https://github.com/apache/iceberg/pull/7469#discussion_r1184096271


##########
spark/v3.1/spark/src/main/java/org/apache/iceberg/spark/SparkCatalog.java:
##########
@@ -457,6 +459,13 @@ public final void initialize(String name, CaseInsensitiveStringMap options) {
         PropertyUtil.propertyAsBoolean(
             options, CatalogProperties.CACHE_ENABLED, CatalogProperties.CACHE_ENABLED_DEFAULT);
 
+    SparkSession sparkSession = SparkSession.active();
+    boolean cacheCaseSensitive =
+        PropertyUtil.propertyAsBoolean(
+            options,
+            CatalogProperties.CACHE_CASE_SENSITIVE,
+            Boolean.parseBoolean(sparkSession.conf().get("spark.sql.caseSensitive")));

Review Comment:
   In 3.1, `SparkUtil.caseSensitive(SparkSession)` does not exists. In 3.2 - 3.4, that helper method is used instead.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] szehon-ho commented on pull request #7469: Core, Spark: Add configuration to control case sensitivity of CachingCatalog

Posted by "szehon-ho (via GitHub)" <gi...@apache.org>.
szehon-ho commented on PR #7469:
URL: https://github.com/apache/iceberg/pull/7469#issuecomment-1535184596

   Merged, thanks @wypoon , also @RussellSpitzer for extra review


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] szehon-ho commented on pull request #7469: Core, Spark: Add configuration to control case sensitivity of CachingCatalog

Posted by "szehon-ho (via GitHub)" <gi...@apache.org>.
szehon-ho commented on PR #7469:
URL: https://github.com/apache/iceberg/pull/7469#issuecomment-1535399213

   OK, taking a look through it, I think it is not breaking Iceberg's [NessieCatalog](https://github.com/apache/iceberg/blob/master/nessie/src/main/java/org/apache/iceberg/nessie/NessieCatalog.java) but just any user of SparkCatalog that depended on case sensitive cache to be true.
   
   Seems to be the same the concern I mentioned at https://github.com/apache/iceberg/pull/7469#discussion_r1181945451 ,  consensus there was indeed that it should follow spark.sql.caseSensitive config.  
   
   > spark.sql.caseSensitive, because AFAIU that property is used for the table schema (case (in)sensitive column names).
   
   I do see spark.sql.caseSensitive being used for table identifiers in some point in past like :  https://github.com/apache/spark/blob/master/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/catalog/SessionCatalog.scala#L154 , but I guess now it may depend on backing Catalog.  Hive Metastore for example is case insensitive period.
   
   I dont mind reverting default explicitly to the original value (true), though of course it means the other way, users who want case insensitive need to set it to false.  Not sure what @wypoon @RussellSpitzer think, or if there is any other way.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] wypoon commented on pull request #7469: Core, Spark: Add configuration to control case sensitivity of CachingCatalog

Posted by "wypoon (via GitHub)" <gi...@apache.org>.
wypoon commented on PR #7469:
URL: https://github.com/apache/iceberg/pull/7469#issuecomment-1535713944

   Thanks @szehon-ho.
   
   The `SparkCatalog` holds an instance of an Iceberg `Catalog` (in the `icebergCatalog` field), and uses it to implement the Spark `TableCatalog` functionality. The implementation of Iceberg `Catalog` that it uses depends on configuration; e.g., spark.sql.catalog.<catalog_name>.type=hive means a `HiveCatalog` is used; spark.sql.catalog.<catalog_name>.type=hadoop means a `HadoopCatalog` is used; and I already mentioned above how a `NessieCatalog` can be configured. This `Catalog` is then wrapped in a `CachingCatalog` if caching is enabled.
   
   I wouldn't call it a limitation of Hive that the `HiveCatalog` is case insensitive; I would call it a feature. :-)
   Come to think of it, the `HadoopCatalog` must be case sensitive, since the path is constructed from the database and table name, and the path is case sensitive.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] snazy commented on pull request #7469: Core, Spark: Add configuration to control case sensitivity of CachingCatalog

Posted by "snazy (via GitHub)" <gi...@apache.org>.
snazy commented on PR #7469:
URL: https://github.com/apache/iceberg/pull/7469#issuecomment-1535323466

   This change breaks the Nessie catalog, which is case sensitive.
   
   `SparkUtil.caseSensitive` returns `false`, if the `spark.sql.caseSensitive` is not set, which is IMO not the right behavior for case-sensitive catalogs. Also not sure whether it's correct to imply that a _catalog_ is case sensitive (or not) from `spark.sql.caseSensitive`, because AFAIU that property is used for the table schema (case (in)sensitive column names).
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] wypoon commented on pull request #7469: Core, Spark: Add configuration to control case sensitivity of CachingCatalog

Posted by "wypoon (via GitHub)" <gi...@apache.org>.
wypoon commented on PR #7469:
URL: https://github.com/apache/iceberg/pull/7469#issuecomment-1528520056

   Failure in org.apache.iceberg.spark.source.TestStructuredStreamingRead3.testReadStreamOnIcebergTableWithMultipleSnapshots_WithNumberOfRows_1 for Spark 3.4 occurs without this change too in my local build, so I think it is not related to this change.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] szehon-ho commented on a diff in pull request #7469: Core, Spark: Add configuration to control case sensitivity of CachingCatalog

Posted by "szehon-ho (via GitHub)" <gi...@apache.org>.
szehon-ho commented on code in PR #7469:
URL: https://github.com/apache/iceberg/pull/7469#discussion_r1183148833


##########
core/src/main/java/org/apache/iceberg/CatalogProperties.java:
##########
@@ -41,6 +41,11 @@ private CatalogProperties() {}
 
   public static final boolean CACHE_ENABLED_DEFAULT = true;
 
+  /** Controls whether the caching catalog will cache table entries using case-sensitive keys. */
+  public static final String CACHE_CASE_SENSITIVE = "cache.case-sensitive";
+
+  public static final boolean CACHE_CASE_SENSITIVE_DEFAULT = false;

Review Comment:
   Hm I'm not too familiar, but is tthis a breaking change then?  If the user set case-sensitive=true on spark-sql side, and is running with expectation for Iceberg to follow that?  I guess @RussellSpitzer is suggesting we read the spark conf and set that from SparkCatalog?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] wypoon commented on a diff in pull request #7469: Core, Spark: Add configuration to control case sensitivity of CachingCatalog

Posted by "wypoon (via GitHub)" <gi...@apache.org>.
wypoon commented on code in PR #7469:
URL: https://github.com/apache/iceberg/pull/7469#discussion_r1181958644


##########
core/src/main/java/org/apache/iceberg/CatalogProperties.java:
##########
@@ -41,6 +41,11 @@ private CatalogProperties() {}
 
   public static final boolean CACHE_ENABLED_DEFAULT = true;
 
+  /** Controls whether the caching catalog will cache table entries using case-sensitive keys. */
+  public static final String CACHE_CASE_SENSITIVE = "cache.case-sensitive";
+
+  public static final boolean CACHE_CASE_SENSITIVE_DEFAULT = false;

Review Comment:
   Yes, and that is the problem. I'd argue that we should default the case sensitivity to false, because I think this causes the least surprise to users.
   If others have use cases where they need case sensitivity to be true, I'd like to hear about them. I may yet be convinced that case sensitivity should default to true.
   In any case, the critical point is that now the case sensitivity can be configured.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] RussellSpitzer commented on a diff in pull request #7469: Core, Spark: Add configuration to control case sensitivity of CachingCatalog

Posted by "RussellSpitzer (via GitHub)" <gi...@apache.org>.
RussellSpitzer commented on code in PR #7469:
URL: https://github.com/apache/iceberg/pull/7469#discussion_r1181967223


##########
core/src/main/java/org/apache/iceberg/CatalogProperties.java:
##########
@@ -41,6 +41,11 @@ private CatalogProperties() {}
 
   public static final boolean CACHE_ENABLED_DEFAULT = true;
 
+  /** Controls whether the caching catalog will cache table entries using case-sensitive keys. */
+  public static final String CACHE_CASE_SENSITIVE = "cache.case-sensitive";
+
+  public static final boolean CACHE_CASE_SENSITIVE_DEFAULT = false;

Review Comment:
   Should we match the spark sql case sensitivity param?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] wypoon commented on a diff in pull request #7469: Core, Spark: Add configuration to control case sensitivity of CachingCatalog

Posted by "wypoon (via GitHub)" <gi...@apache.org>.
wypoon commented on code in PR #7469:
URL: https://github.com/apache/iceberg/pull/7469#discussion_r1184300768


##########
spark/v3.1/spark/src/main/java/org/apache/iceberg/spark/SparkCatalog.java:
##########
@@ -457,6 +459,13 @@ public final void initialize(String name, CaseInsensitiveStringMap options) {
         PropertyUtil.propertyAsBoolean(
             options, CatalogProperties.CACHE_ENABLED, CatalogProperties.CACHE_ENABLED_DEFAULT);
 
+    SparkSession sparkSession = SparkSession.active();
+    boolean cacheCaseSensitive =
+        PropertyUtil.propertyAsBoolean(
+            options,
+            CatalogProperties.CACHE_CASE_SENSITIVE,
+            Boolean.parseBoolean(sparkSession.conf().get("spark.sql.caseSensitive")));

Review Comment:
   I could add the `caseSensitive(SparkSession)` helper method to `SparkUtil` in 3.1 as well, but where do I stop? Do I then go on to rewrite the other places in 3.1 that should then use this helper instead? I didn't do so because this would touch more files unrelated to this change. @szehon-ho let me know what you prefer.
   
   Note: The helper method was introduced in https://github.com/apache/iceberg/pull/6899 in 3.3, which was backported to 3.2 but not 3.1.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] wypoon commented on a diff in pull request #7469: Core, Spark: Add configuration to control case sensitivity of CachingCatalog

Posted by "wypoon (via GitHub)" <gi...@apache.org>.
wypoon commented on code in PR #7469:
URL: https://github.com/apache/iceberg/pull/7469#discussion_r1184183581


##########
core/src/main/java/org/apache/iceberg/CatalogProperties.java:
##########
@@ -41,6 +41,11 @@ private CatalogProperties() {}
 
   public static final boolean CACHE_ENABLED_DEFAULT = true;
 
+  /** Controls whether the caching catalog will cache table entries using case-sensitive keys. */
+  public static final String CACHE_CASE_SENSITIVE = "cache.case-sensitive";
+
+  public static final boolean CACHE_CASE_SENSITIVE_DEFAULT = false;

Review Comment:
   In case there are other use cases I do not know about, I am leaving configurability in. Also, other catalogs such as `FlinkCatalog` that use the `CachingCatalog` may choose to make use of the `cache.case-sensitive` conf.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] szehon-ho merged pull request #7469: Core, Spark: Add configuration to control case sensitivity of CachingCatalog

Posted by "szehon-ho (via GitHub)" <gi...@apache.org>.
szehon-ho merged PR #7469:
URL: https://github.com/apache/iceberg/pull/7469


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] szehon-ho commented on pull request #7469: Core, Spark: Add configuration to control case sensitivity of CachingCatalog

Posted by "szehon-ho (via GitHub)" <gi...@apache.org>.
szehon-ho commented on PR #7469:
URL: https://github.com/apache/iceberg/pull/7469#issuecomment-1535337646

   Hi, it seems the changes here are mostly on SparkCatalog, is it used by NessieCatalog?  L
   
   But I guess looks like putting it on CatalogProperties does indicate it should be implemented by all catalogs at some point.
   
   >  Also not sure whether it's correct to imply that a catalog is case sensitive (or not) from spark.sql.caseSensitive
   
   I think spark.sql.caseSensitive is set on the session so it seemed to me even at a higher level than catalog, but could be wrong.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] wypoon commented on pull request #7469: Core, Spark: Add configuration to control case sensitivity of CachingCatalog

Posted by "wypoon (via GitHub)" <gi...@apache.org>.
wypoon commented on PR #7469:
URL: https://github.com/apache/iceberg/pull/7469#issuecomment-1535522018

   The terminology gets confusing because there are two concepts of catalog here: Spark's concept of catalog (`org.apache.spark.sql.connector.catalog.TableCatalog`) and Iceberg's concept of catalog (`org.apache.iceberg.catalog.Catalog`). `org.apache.iceberg.nessie.NessieCatalog` is an `org.apache.iceberg.catalog.Catalog`. `org.apache.iceberg.spark.SparkCatalog` is an `org.apache.spark.sql.connector.catalog.TableCatalog`.
   
   I have not used Nessie and am not familiar with it so I did not know that the Nessie catalog is case sensitive.
   If I understand @snazy correctly, a Spark user can use `SparkCatalog` to interact with Nessie integration with Iceberg by configuring a Spark catalog, call it `nessie`, by setting `spark.sql.catalog.nessie=org.apache.iceberg.spark.SparkCatalog` and `spark.sql.catalog.nessie.catalog-impl=org.apache.iceberg.nessie.NessieCatalog`, among other confs. The `SparkCatalog` will by default wrap itself in a `CachingCatalog` with case sensitivity by default matching that of `spark.sql.caseSensitive` (which defaults to false), and this causes a problem due to Nessie's case sensitivity.
   Of course, the problem can be addressed by setting `spark.sql.catalog.nessie.cache.case-sensitive=false`.
   
   I agree with @szehon-ho that the simplest remedy is to introduce a default of true for `CatalogProperties.CACHE_CASE_SENSITIVE` and use that instead of using the value of `spark.sql.caseSensitive`. This restores the default behavior and allows configurability.
   For non-Nessie users like users of the HiveCatalog, the current behavior is actually very counterintuitive and baffling (see https://github.com/apache/iceberg/issues/7474) and a default of false would make more sense. However, I'm fine to go with false to avoid breaking existing use cases. At the end of the day, some users are going to have to use a conf to get the non-default behavior, whether Nessie users or others. (Of course, I'd prefer that our users not have to use a conf!)
   
   Incidentally, `spark.sql.caseSensitive` does not apply only to column names. It applies to database and table names too. E.g., in `org.apache.spark.sql.catalyst.catalog.SessionCatalog`, `loadTable` calls `formatDatabaseName` and `formatTableName` (which both depend of the value of `spark.sql.caseSensitive`) before calling the `ExternalCatalog` to load the table.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] snazy commented on pull request #7469: Core, Spark: Add configuration to control case sensitivity of CachingCatalog

Posted by "snazy (via GitHub)" <gi...@apache.org>.
snazy commented on PR #7469:
URL: https://github.com/apache/iceberg/pull/7469#issuecomment-1535430527

   I think the idea to let users explicitly override this is fine. But I‘d suggest a different property for that. The default behavior should be inquired from the (Iceberg) catalog implementation in some way.
   +1 on revering this and taking a different approach. Happy to help with design/review.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] wypoon commented on pull request #7469: Core, Spark: Add configuration to control case sensitivity of CachingCatalog

Posted by "wypoon (via GitHub)" <gi...@apache.org>.
wypoon commented on PR #7469:
URL: https://github.com/apache/iceberg/pull/7469#issuecomment-1535541837

   As a quick remedy, I opened https://github.com/apache/iceberg/pull/7535.
   However, an alternative would be (in line with @snazy's suggestion) to have Iceberg `Catalog` implementations implement a `caseSensitive` boolean method (this requires an API change), and then in `SparkCatalog`, when wrapping its `icebergCatalog` in a `CachingCatalog`, to check the case sensitivity of this `Catalog` and use a case sensitive/insensitive `CachingCatalog` accordingly.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org