Posted to reviews@spark.apache.org by GitBox <gi...@apache.org> on 2021/04/15 18:01:56 UTC

[GitHub] [spark] sandeep-katta opened a new pull request #32194: [SPARK-35096][Core] SchemaPruning should adhere spark.sql.caseSensitive config

sandeep-katta opened a new pull request #32194:
URL: https://github.com/apache/spark/pull/32194


   
   ### What changes were proposed in this pull request?
   
   SPARK-26837 added support for pruning nested fields from object serializers, but it did not account for Spark's case-insensitive column resolution.
   
   This PR resolves the column names to be pruned according to the `spark.sql.caseSensitive` config.
   
   **Exception Before Fix**
   
   ```
   Caused by: java.lang.ArrayIndexOutOfBoundsException: 0
     at org.apache.spark.sql.types.StructType.apply(StructType.scala:414)
     at org.apache.spark.sql.catalyst.optimizer.ObjectSerializerPruning$$anonfun$apply$4.$anonfun$applyOrElse$3(objects.scala:216)
     at scala.collection.TraversableLike.$anonfun$map$1(TraversableLike.scala:238)
     at scala.collection.immutable.List.foreach(List.scala:392)
     at scala.collection.TraversableLike.map(TraversableLike.scala:238)
     at scala.collection.TraversableLike.map$(TraversableLike.scala:231)
     at scala.collection.immutable.List.map(List.scala:298)
     at org.apache.spark.sql.catalyst.optimizer.ObjectSerializerPruning$$anonfun$apply$4.applyOrElse(objects.scala:215)
     at org.apache.spark.sql.catalyst.optimizer.ObjectSerializerPruning$$anonfun$apply$4.applyOrElse(objects.scala:203)
     at org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$transformDown$1(TreeNode.scala:309)
     at org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(TreeNode.scala:72)
     at org.apache.spark.sql.catalyst.trees.TreeNode.transformDown(TreeNode.scala:309)
     at 
   ```
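   
   The case-aware name matching described above can be sketched in plain Scala, without any Spark dependency. This is only an illustration under stated assumptions: `ResolverSketch`, `matchingFields`, and the two resolver values are hypothetical names, and the `Resolver` alias merely mirrors the `(String, String) => Boolean` shape that a name resolver takes. It is not the code in this patch.
   
   ```scala
   object ResolverSketch {
     // Hypothetical stand-in for a name resolver: decides whether two names match.
     type Resolver = (String, String) => Boolean
   
     // Exact comparison, as when spark.sql.caseSensitive is true.
     val caseSensitive: Resolver = (a, b) => a == b
     // Case-insensitive comparison, as when spark.sql.caseSensitive is false.
     val caseInsensitive: Resolver = (a, b) => a.equalsIgnoreCase(b)
   
     // Keep the data-schema field names that at least one requested name resolves to.
     // The data schema's own spelling is what gets returned.
     def matchingFields(
         dataFields: Seq[String],
         requested: Seq[String],
         resolver: Resolver): Seq[String] =
       dataFields.filter(field => requested.exists(resolver(field, _)))
   
     def main(args: Array[String]): Unit = {
       val dataFields = Seq("id", "name")
       val requested = Seq("ID")
       println(matchingFields(dataFields, requested, caseSensitive))   // List()
       println(matchingFields(dataFields, requested, caseInsensitive)) // List(id)
     }
   }
   ```
   
   With exact matching, a request for `ID` finds nothing in a schema that declares `id`, so nothing survives pruning; with the case-insensitive resolver the field is kept under the schema's spelling.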
   
   ### Why are the changes needed?
   After upgrading to Spark 3, the `foreachBatch` API throws `java.lang.ArrayIndexOutOfBoundsException`. This PR fixes the issue.
   
   
   ### Does this PR introduce _any_ user-facing change?
   No. In fact, it fixes a regression.
   
   
   ### How was this patch tested?
   Added tests and also verified manually.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] [spark] SparkQA commented on pull request #32194: [SPARK-35096][SQL] SchemaPruning should adhere spark.sql.caseSensitive config

Posted by GitBox <gi...@apache.org>.

SparkQA commented on pull request #32194:
URL: https://github.com/apache/spark/pull/32194#issuecomment-821053547


   Kubernetes integration test status failure
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/42051/
   



[GitHub] [spark] SparkQA commented on pull request #32194: [SPARK-35096][SQL] SchemaPruning should adhere spark.sql.caseSensitive config

Posted by GitBox <gi...@apache.org>.

SparkQA commented on pull request #32194:
URL: https://github.com/apache/spark/pull/32194#issuecomment-821050744


   Kubernetes integration test starting
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/42051/
   



[GitHub] [spark] wangyum commented on a change in pull request #32194: [SPARK-35096][SQL] SchemaPruning should adhere spark.sql.caseSensitive config

Posted by GitBox <gi...@apache.org>.

wangyum commented on a change in pull request #32194:
URL: https://github.com/apache/spark/pull/32194#discussion_r614620249



##########
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/SchemaPruning.scala
##########
@@ -61,12 +63,15 @@ object SchemaPruning {
           sortLeftFieldsByRight(leftValueType, rightValueType),
           containsNull)
       case (leftStruct: StructType, rightStruct: StructType) =>
-        val filteredRightFieldNames = rightStruct.fieldNames.filter(leftStruct.fieldNames.contains)
+        val resolver = conf.resolver
+        val filteredRightFieldNames = rightStruct.fieldNames
+          .filter(name => leftStruct.fieldNames.exists(resolver(_, name)))
         val sortedLeftFields = filteredRightFieldNames.map { fieldName =>
-          val leftFieldType = leftStruct(fieldName).dataType
+          val resolvedLeftStruct = leftStruct.filter(p => resolver(p.name, fieldName)).head

Review comment:
       `filter` -> `find`?
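   
   A minimal plain-Scala sketch of the difference behind this suggestion, assuming a hypothetical `Field` case class in place of `StructField` and a case-insensitive resolver (neither is taken from the patch):
   
   ```scala
   object FindVsFilterSketch {
     // Hypothetical minimal field type standing in for Spark's StructField.
     final case class Field(name: String, dataType: String)
   
     val fields: Seq[Field] = Seq(Field("ID", "int"), Field("name", "string"))
     val resolver: (String, String) => Boolean = (a, b) => a.equalsIgnoreCase(b)
   
     // filter builds the full collection of matches; head then throws
     // NoSuchElementException if nothing matched.
     def viaFilter(name: String): Field =
       fields.filter(f => resolver(f.name, name)).head
   
     // find stops at the first match and reports absence as None.
     def viaFind(name: String): Option[Field] =
       fields.find(f => resolver(f.name, name))
   }
   ```
   
   Both return the same field when a match exists; `find` avoids the intermediate collection and makes the no-match case explicit.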





[GitHub] [spark] sandeep-katta commented on a change in pull request #32194: [SPARK-35096][Core] SchemaPruning should adhere spark.sql.caseSensitive config

Posted by GitBox <gi...@apache.org>.

sandeep-katta commented on a change in pull request #32194:
URL: https://github.com/apache/spark/pull/32194#discussion_r614535186



##########
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/SchemaPruning.scala
##########
@@ -36,7 +38,8 @@ object SchemaPruning {
       .reduceLeft(_ merge _)
     val dataSchemaFieldNames = dataSchema.fieldNames.toSet
     val mergedDataSchema =
-      StructType(mergedSchema.filter(f => dataSchemaFieldNames.contains(f.name)))
+      StructType(mergedSchema.filter(f => dataSchemaFieldNames.exists(resolver(_, f.name))

Review comment:
       The query below fails without this fix:
   ```
   import org.apache.spark.sql.DataFrame
   import org.apache.spark.sql.streaming.Trigger
   import org.apache.spark.sql.types.{LongType, StructType}
   import spark.implicits._  // for the $"..." column syntax (pre-imported in spark-shell)
   
   val inputPath = "/Users/xyz/data/testcaseInsensitivity"
   val output_path = "/Users/xyz/output"
   
   spark.range(10).write.format("parquet").save(inputPath)
   
   // Selects the column as "ID" although the schema declares it as "id"
   def process_row(microBatch: DataFrame, batchId: Long): Unit = {
     val df = microBatch.select($"ID".alias("other")) // Fails before this fix
     df.write.format("parquet").mode("append").save(output_path)
   }
   
   val schema = new StructType().add("id", LongType)
   
   val stream_df = spark.readStream.schema(schema).format("parquet").load(inputPath)
   stream_df.writeStream.trigger(Trigger.Once).foreachBatch(process_row _)
     .start().awaitTermination()
   ```





[GitHub] [spark] viirya commented on a change in pull request #32194: [SPARK-35096][Core] SchemaPruning should adhere spark.sql.caseSensitive config

Posted by GitBox <gi...@apache.org>.

viirya commented on a change in pull request #32194:
URL: https://github.com/apache/spark/pull/32194#discussion_r614341325



##########
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/SchemaPruning.scala
##########
@@ -36,7 +38,8 @@ object SchemaPruning {
       .reduceLeft(_ merge _)
     val dataSchemaFieldNames = dataSchema.fieldNames.toSet
     val mergedDataSchema =
-      StructType(mergedSchema.filter(f => dataSchemaFieldNames.contains(f.name)))
+      StructType(mergedSchema.filter(f => dataSchemaFieldNames.exists(resolver(_, f.name))

Review comment:
       I believe I already fixed this issue recently in SPARK-34963. `requestedRootFields` is already resolved with case sensitivity taken into account.
   





[GitHub] [spark] sandeep-katta edited a comment on pull request #32194: [SPARK-35096][SQL] SchemaPruning should adhere spark.sql.caseSensitive config

Posted by GitBox <gi...@apache.org>.

sandeep-katta edited a comment on pull request #32194:
URL: https://github.com/apache/spark/pull/32194#issuecomment-823261848


   @dongjoon-hyun @viirya, can this PR be merged? If not, I am happy to address any review comments.



[GitHub] [spark] SparkQA commented on pull request #32194: [SPARK-35096][Core] SchemaPruning should adhere spark.sql.caseSensitive config

Posted by GitBox <gi...@apache.org>.

SparkQA commented on pull request #32194:
URL: https://github.com/apache/spark/pull/32194#issuecomment-820785818


   **[Test build #137436 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/137436/testReport)** for PR 32194 at commit [`f6e4b6b`](https://github.com/apache/spark/commit/f6e4b6b78d2a46a9e27b9674902e61c15fae61df).
    * This patch passes all tests.
    * This patch merges cleanly.
    * This patch adds no public classes.



[GitHub] [spark] sandeep-katta commented on pull request #32194: [SPARK-35096][Core] SchemaPruning should adhere spark.sql.caseSensitive config

Posted by GitBox <gi...@apache.org>.

sandeep-katta commented on pull request #32194:
URL: https://github.com/apache/spark/pull/32194#issuecomment-820627846


   CC @viirya @cloud-fan 



[GitHub] [spark] AmplabJenkins removed a comment on pull request #32194: [SPARK-35096][Core] SchemaPruning should adhere spark.sql.caseSensitive config

Posted by GitBox <gi...@apache.org>.

AmplabJenkins removed a comment on pull request #32194:
URL: https://github.com/apache/spark/pull/32194#issuecomment-820805775


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/137436/
   



[GitHub] [spark] SparkQA commented on pull request #32194: [SPARK-35096][SQL] SchemaPruning should adhere spark.sql.caseSensitive config

Posted by GitBox <gi...@apache.org>.

SparkQA commented on pull request #32194:
URL: https://github.com/apache/spark/pull/32194#issuecomment-821071130


   **[Test build #137464 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/137464/testReport)** for PR 32194 at commit [`baf8125`](https://github.com/apache/spark/commit/baf81252128a10c169433a16721f5f568d827934).
    * This patch passes all tests.
    * This patch merges cleanly.
    * This patch adds no public classes.



[GitHub] [spark] SparkQA commented on pull request #32194: [SPARK-35096][SQL] SchemaPruning should adhere spark.sql.caseSensitive config

Posted by GitBox <gi...@apache.org>.

SparkQA commented on pull request #32194:
URL: https://github.com/apache/spark/pull/32194#issuecomment-821167742


   **[Test build #137475 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/137475/testReport)** for PR 32194 at commit [`04b24c9`](https://github.com/apache/spark/commit/04b24c9fee2d10ff6066cb87d9baf9a600736c88).
    * This patch passes all tests.
    * This patch merges cleanly.
    * This patch adds no public classes.



[GitHub] [spark] SparkQA commented on pull request #32194: [SPARK-35096][SQL] SchemaPruning should adhere spark.sql.caseSensitive config

Posted by GitBox <gi...@apache.org>.

SparkQA commented on pull request #32194:
URL: https://github.com/apache/spark/pull/32194#issuecomment-820923870


   **[Test build #137464 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/137464/testReport)** for PR 32194 at commit [`baf8125`](https://github.com/apache/spark/commit/baf81252128a10c169433a16721f5f568d827934).



[GitHub] [spark] dongjoon-hyun commented on a change in pull request #32194: [SPARK-35096][SQL] SchemaPruning should adhere spark.sql.caseSensitive config

Posted by GitBox <gi...@apache.org>.

dongjoon-hyun commented on a change in pull request #32194:
URL: https://github.com/apache/spark/pull/32194#discussion_r614556852



##########
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/SchemaPruning.scala
##########
@@ -36,7 +38,8 @@ object SchemaPruning {
       .reduceLeft(_ merge _)
     val dataSchemaFieldNames = dataSchema.fieldNames.toSet
     val mergedDataSchema =
-      StructType(mergedSchema.filter(f => dataSchemaFieldNames.contains(f.name)))
+      StructType(mergedSchema.filter(f => dataSchemaFieldNames.exists(resolver(_, f.name))
+      ))

Review comment:
       Indentation? Can we piggy-back this to line 41?





[GitHub] [spark] dongjoon-hyun commented on a change in pull request #32194: [SPARK-35096][SQL] SchemaPruning should adhere spark.sql.caseSensitive config

Posted by GitBox <gi...@apache.org>.

dongjoon-hyun commented on a change in pull request #32194:
URL: https://github.com/apache/spark/pull/32194#discussion_r614557384



##########
File path: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/SchemaPruningSuite.scala
##########
@@ -59,4 +69,28 @@ class SchemaPruningSuite extends SparkFunSuite {
       StructType.fromDDL("e int, f string")))
     testPrunedSchema(complexStruct, StructField("c", IntegerType), selectFieldInMap)
   }
+
+  test("SPARK-35096: test case insensitivity of pruned schema  ") {

Review comment:
       Shall we remove the trailing space in the test name?





[GitHub] [spark] github-actions[bot] commented on pull request #32194: [SPARK-35096][SQL] SchemaPruning should adhere spark.sql.caseSensitive config

Posted by GitBox <gi...@apache.org>.

github-actions[bot] commented on pull request #32194:
URL: https://github.com/apache/spark/pull/32194#issuecomment-820998327


   **[Test build #754876451](https://github.com/sandeep-katta/spark/actions/runs/754876451)** for PR 32194 at commit [`04b24c9`](https://github.com/sandeep-katta/spark/commit/04b24c9fee2d10ff6066cb87d9baf9a600736c88).



[GitHub] [spark] cloud-fan commented on pull request #32194: [SPARK-35096][SQL] SchemaPruning should adhere spark.sql.caseSensitive config

Posted by GitBox <gi...@apache.org>.

cloud-fan commented on pull request #32194:
URL: https://github.com/apache/spark/pull/32194#issuecomment-823835594


   thanks, merging to master/3.1/3.0!



[GitHub] [spark] SparkQA removed a comment on pull request #32194: [SPARK-35096][SQL] SchemaPruning should adhere spark.sql.caseSensitive config

Posted by GitBox <gi...@apache.org>.

SparkQA removed a comment on pull request #32194:
URL: https://github.com/apache/spark/pull/32194#issuecomment-821022533


   **[Test build #137475 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/137475/testReport)** for PR 32194 at commit [`04b24c9`](https://github.com/apache/spark/commit/04b24c9fee2d10ff6066cb87d9baf9a600736c88).



[GitHub] [spark] SparkQA commented on pull request #32194: [SPARK-35096][SQL] SchemaPruning should adhere spark.sql.caseSensitive config

Posted by GitBox <gi...@apache.org>.

SparkQA commented on pull request #32194:
URL: https://github.com/apache/spark/pull/32194#issuecomment-821022533


   **[Test build #137475 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/137475/testReport)** for PR 32194 at commit [`04b24c9`](https://github.com/apache/spark/commit/04b24c9fee2d10ff6066cb87d9baf9a600736c88).



[GitHub] [spark] AmplabJenkins commented on pull request #32194: [SPARK-35096][Core] SchemaPruning should adhere spark.sql.caseSensitive config

Posted by GitBox <gi...@apache.org>.

AmplabJenkins commented on pull request #32194:
URL: https://github.com/apache/spark/pull/32194#issuecomment-820805775


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/137436/
   



[GitHub] [spark] dongjoon-hyun commented on a change in pull request #32194: [SPARK-35096][SQL] SchemaPruning should adhere spark.sql.caseSensitive config

Posted by GitBox <gi...@apache.org>.

dongjoon-hyun commented on a change in pull request #32194:
URL: https://github.com/apache/spark/pull/32194#discussion_r614557660



##########
File path: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/SchemaPruningSuite.scala
##########
@@ -59,4 +69,28 @@ class SchemaPruningSuite extends SparkFunSuite {
       StructType.fromDDL("e int, f string")))
     testPrunedSchema(complexStruct, StructField("c", IntegerType), selectFieldInMap)
   }
+
+  test("SPARK-35096: test case insensitivity of pruned schema  ") {
+    Seq(true, false).foreach(isCaseSensitive => {
+      withSQLConf(CASE_SENSITIVE.key -> isCaseSensitive.toString) {
+        if (isCaseSensitive) {
+          val requestedFields = getRootFields(StructField("id", IntegerType))
+          val prunedSchema = SchemaPruning.pruneDataSchema(
+            StructType.fromDDL("ID int, name String"), requestedFields)
+          assert(prunedSchema == StructType(Seq.empty))
+        } else {
+          // Schema is case insensitive
+          val prunedSchema = SchemaPruning.pruneDataSchema(
+            StructType.fromDDL("ID int, name String"),
+            getRootFields(StructField("id", IntegerType)))
+          assert(prunedSchema == StructType(StructField("ID", IntegerType) :: Nil))
+          // Root fields are insensitive

Review comment:
       `are insensitive` -> `are case-insensitive`?





[GitHub] [spark] sandeep-katta commented on pull request #32194: [SPARK-35096][SQL] SchemaPruning should adhere spark.sql.caseSensitive config

Posted by GitBox <gi...@apache.org>.

sandeep-katta commented on pull request #32194:
URL: https://github.com/apache/spark/pull/32194#issuecomment-823261848


   @dongjoon-hyun @viirya, can this PR be merged? If not, I am happy to address any review comments.



[GitHub] [spark] github-actions[bot] commented on pull request #32194: [SPARK-35096][Core] SchemaPruning should adhere spark.sql.caseSensitive config

Posted by GitBox <gi...@apache.org>.

github-actions[bot] commented on pull request #32194:
URL: https://github.com/apache/spark/pull/32194#issuecomment-820624884


   **[Test build #752951989](https://github.com/sandeep-katta/spark/actions/runs/752951989)** for PR 32194 at commit [`db4a74a`](https://github.com/sandeep-katta/spark/commit/db4a74a2da53272e1b3cdd27f8e9105938ef9d2c).



[GitHub] [spark] SparkQA commented on pull request #32194: [SPARK-35096][Core] SchemaPruning should adhere spark.sql.caseSensitive config

Posted by GitBox <gi...@apache.org>.

SparkQA commented on pull request #32194:
URL: https://github.com/apache/spark/pull/32194#issuecomment-820683081







[GitHub] [spark] SparkQA removed a comment on pull request #32194: [SPARK-35096][Core] SchemaPruning should adhere spark.sql.caseSensitive config

Posted by GitBox <gi...@apache.org>.

SparkQA removed a comment on pull request #32194:
URL: https://github.com/apache/spark/pull/32194#issuecomment-820653119


   **[Test build #137436 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/137436/testReport)** for PR 32194 at commit [`f6e4b6b`](https://github.com/apache/spark/commit/f6e4b6b78d2a46a9e27b9674902e61c15fae61df).



[GitHub] [spark] AmplabJenkins removed a comment on pull request #32194: [SPARK-35096][Core] SchemaPruning should adhere spark.sql.caseSensitive config

Posted by GitBox <gi...@apache.org>.

AmplabJenkins removed a comment on pull request #32194:
URL: https://github.com/apache/spark/pull/32194#issuecomment-820685927


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/42011/
   



[GitHub] [spark] AmplabJenkins commented on pull request #32194: [SPARK-35096][SQL] SchemaPruning should adhere spark.sql.caseSensitive config

Posted by GitBox <gi...@apache.org>.

AmplabJenkins commented on pull request #32194:
URL: https://github.com/apache/spark/pull/32194#issuecomment-821056637


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/42051/
   



[GitHub] [spark] SparkQA commented on pull request #32194: [SPARK-35096][SQL] SchemaPruning should adhere spark.sql.caseSensitive config

Posted by GitBox <gi...@apache.org>.

SparkQA commented on pull request #32194:
URL: https://github.com/apache/spark/pull/32194#issuecomment-821021484


   Kubernetes integration test unable to build dist.
   
   exiting with code: 1
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/42049/
   



[GitHub] [spark] AmplabJenkins removed a comment on pull request #32194: [SPARK-35096][SQL] SchemaPruning should adhere spark.sql.caseSensitive config

Posted by GitBox <gi...@apache.org>.

AmplabJenkins removed a comment on pull request #32194:
URL: https://github.com/apache/spark/pull/32194#issuecomment-821056637


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/42051/
   



[GitHub] [spark] github-actions[bot] commented on pull request #32194: [SPARK-35096][Core] SchemaPruning should adhere spark.sql.caseSensitive config

Posted by GitBox <gi...@apache.org>.

github-actions[bot] commented on pull request #32194:
URL: https://github.com/apache/spark/pull/32194#issuecomment-820627109


   **[Test build #752985634](https://github.com/sandeep-katta/spark/actions/runs/752985634)** for PR 32194 at commit [`f6e4b6b`](https://github.com/sandeep-katta/spark/commit/f6e4b6b78d2a46a9e27b9674902e61c15fae61df).



[GitHub] [spark] SparkQA commented on pull request #32194: [SPARK-35096][SQL] SchemaPruning should adhere spark.sql.caseSensitive config

Posted by GitBox <gi...@apache.org>.

SparkQA commented on pull request #32194:
URL: https://github.com/apache/spark/pull/32194#issuecomment-820947927


   Kubernetes integration test status failure
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/42039/
   



[GitHub] [spark] AmplabJenkins commented on pull request #32194: [SPARK-35096][Core] SchemaPruning should adhere spark.sql.caseSensitive config

Posted by GitBox <gi...@apache.org>.

AmplabJenkins commented on pull request #32194:
URL: https://github.com/apache/spark/pull/32194#issuecomment-820685927


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/42011/
   



[GitHub] [spark] AmplabJenkins removed a comment on pull request #32194: [SPARK-35096][SQL] SchemaPruning should adhere spark.sql.caseSensitive config

Posted by GitBox <gi...@apache.org>.

AmplabJenkins removed a comment on pull request #32194:
URL: https://github.com/apache/spark/pull/32194#issuecomment-821088566


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/137464/
   



[GitHub] [spark] cloud-fan commented on pull request #32194: [SPARK-35096][SQL] SchemaPruning should adhere spark.sql.caseSensitive config

Posted by GitBox <gi...@apache.org>.

cloud-fan commented on pull request #32194:
URL: https://github.com/apache/spark/pull/32194#issuecomment-824509815


   @sandeep-katta can you help resubmit the PR for 3.0? Thanks!



[GitHub] [spark] sandeep-katta commented on a change in pull request #32194: [SPARK-35096][SQL] SchemaPruning should adhere spark.sql.caseSensitive config

Posted by GitBox <gi...@apache.org>.

sandeep-katta commented on a change in pull request #32194:
URL: https://github.com/apache/spark/pull/32194#discussion_r614642928



##########
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/SchemaPruning.scala
##########
@@ -61,12 +63,15 @@ object SchemaPruning {
           sortLeftFieldsByRight(leftValueType, rightValueType),
           containsNull)
       case (leftStruct: StructType, rightStruct: StructType) =>
-        val filteredRightFieldNames = rightStruct.fieldNames.filter(leftStruct.fieldNames.contains)
+        val resolver = conf.resolver
+        val filteredRightFieldNames = rightStruct.fieldNames
+          .filter(name => leftStruct.fieldNames.exists(resolver(_, name)))
         val sortedLeftFields = filteredRightFieldNames.map { fieldName =>
-          val leftFieldType = leftStruct(fieldName).dataType
+          val resolvedLeftStruct = leftStruct.filter(p => resolver(p.name, fieldName)).head

Review comment:
       Done, updated. Just curious: is there any performance difference between these two?
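
On the performance question, here is a minimal sketch, assuming the two alternatives being compared are `filter(...).head` (the form in the hunk above) and `find(...).get` (an assumption, since the suggested alternative is not quoted in this message). `FindVersusFilterSketch` and its helpers are hypothetical names for illustration and use plain Scala collections rather than `StructType`. The sketch counts how often the predicate runs on a strict collection such as `List`.

```scala
object FindVersusFilterSketch {
  // Returns the first case-insensitive match and the number of predicate calls,
  // using filter(...).head: on a strict collection every element is tested.
  def viaFilterHead(names: List[String], target: String): (String, Int) = {
    var calls = 0
    val hit = names.filter { n => calls += 1; n.equalsIgnoreCase(target) }.head
    (hit, calls)
  }

  // Same lookup using find(...).get: traversal stops at the first match.
  def viaFind(names: List[String], target: String): (String, Int) = {
    var calls = 0
    val hit = names.find { n => calls += 1; n.equalsIgnoreCase(target) }.get
    (hit, calls)
  }
}
```

Both forms return the same element and both throw if nothing matches. The difference is only in how much of the collection is traversed and whether an intermediate collection is built, which is unlikely to matter for schemas with a handful of fields.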





[GitHub] [spark] AmplabJenkins commented on pull request #32194: [SPARK-35096][SQL] SchemaPruning should adhere spark.sql.caseSensitive config

Posted by GitBox <gi...@apache.org>.

AmplabJenkins commented on pull request #32194:
URL: https://github.com/apache/spark/pull/32194#issuecomment-821188377


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/137475/
   



[GitHub] [spark] sandeep-katta commented on pull request #32194: [SPARK-35096][SQL] SchemaPruning should adhere spark.sql.caseSensitive config

Posted by GitBox <gi...@apache.org>.

sandeep-katta commented on pull request #32194:
URL: https://github.com/apache/spark/pull/32194#issuecomment-824511186


   Sure, I will raise it soon.
   



[GitHub] [spark] SparkQA removed a comment on pull request #32194: [SPARK-35096][Core] SchemaPruning should adhere spark.sql.caseSensitive config

Posted by GitBox <gi...@apache.org>.

SparkQA removed a comment on pull request #32194:
URL: https://github.com/apache/spark/pull/32194#issuecomment-820625573


   **[Test build #137434 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/137434/testReport)** for PR 32194 at commit [`db4a74a`](https://github.com/apache/spark/commit/db4a74a2da53272e1b3cdd27f8e9105938ef9d2c).



[GitHub] [spark] sandeep-katta commented on a change in pull request #32194: [SPARK-35096][SQL] SchemaPruning should adhere spark.sql.caseSensitive config

Posted by GitBox <gi...@apache.org>.

sandeep-katta commented on a change in pull request #32194:
URL: https://github.com/apache/spark/pull/32194#discussion_r614535186



##########
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/SchemaPruning.scala
##########
@@ -36,7 +38,8 @@ object SchemaPruning {
       .reduceLeft(_ merge _)
     val dataSchemaFieldNames = dataSchema.fieldNames.toSet
     val mergedDataSchema =
-      StructType(mergedSchema.filter(f => dataSchemaFieldNames.contains(f.name)))
+      StructType(mergedSchema.filter(f => dataSchemaFieldNames.exists(resolver(_, f.name))

Review comment:
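       The query below should fail without this fix:
   ```
   import org.apache.spark.sql.DataFrame
   import org.apache.spark.sql.streaming.Trigger
   import org.apache.spark.sql.types.{LongType, StructType}
   import spark.implicits._ // for the $"..." column syntax (already in scope in spark-shell)
   
   val inputPath = "/Users/xyz/data/testcaseInsensitivity"
   val output_path = "/Users/xyz/output"
   
   // Write a small Parquet dataset with a single lower-case column "id"
   spark.range(10).write.format("parquet").save(inputPath)
   
   def process_row(microBatch: DataFrame, batchId: Long): Unit = {
     // Fails without the fix: "ID" differs in case from the schema field "id"
     val df = microBatch.select($"ID".alias("other"))
     df.write.format("parquet").mode("append").save(output_path)
   }
   
   val schema = new StructType().add("id", LongType)
   
   val stream_df = spark.readStream.schema(schema).format("parquet").load(inputPath)
   stream_df.writeStream.trigger(Trigger.Once()).foreachBatch(process_row _)
     .start().awaitTermination()
   ```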





[GitHub] [spark] SparkQA commented on pull request #32194: [SPARK-35096][Core] SchemaPruning should adhere spark.sql.caseSensitive config

Posted by GitBox <gi...@apache.org>.

SparkQA commented on pull request #32194:
URL: https://github.com/apache/spark/pull/32194#issuecomment-820766431


   **[Test build #137434 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/137434/testReport)** for PR 32194 at commit [`db4a74a`](https://github.com/apache/spark/commit/db4a74a2da53272e1b3cdd27f8e9105938ef9d2c).
    * This patch passes all tests.
    * This patch merges cleanly.
    * This patch adds no public classes.



[GitHub] [spark] viirya commented on a change in pull request #32194: [SPARK-35096][Core] SchemaPruning should adhere spark.sql.caseSensitive config

Posted by GitBox <gi...@apache.org>.

viirya commented on a change in pull request #32194:
URL: https://github.com/apache/spark/pull/32194#discussion_r614341643



##########
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/SchemaPruning.scala
##########
@@ -36,7 +38,8 @@ object SchemaPruning {
       .reduceLeft(_ merge _)
     val dataSchemaFieldNames = dataSchema.fieldNames.toSet
     val mergedDataSchema =
-      StructType(mergedSchema.filter(f => dataSchemaFieldNames.contains(f.name)))
+      StructType(mergedSchema.filter(f => dataSchemaFieldNames.exists(resolver(_, f.name))

Review comment:
       Do you have an e2e test case that fails to resolve to the correct field?





[GitHub] [spark] AmplabJenkins removed a comment on pull request #32194: [SPARK-35096][SQL] SchemaPruning should adhere spark.sql.caseSensitive config

Posted by GitBox <gi...@apache.org>.

AmplabJenkins removed a comment on pull request #32194:
URL: https://github.com/apache/spark/pull/32194#issuecomment-821022075


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/42049/
   



[GitHub] [spark] AmplabJenkins commented on pull request #32194: [SPARK-35096][SQL] SchemaPruning should adhere spark.sql.caseSensitive config

Posted by GitBox <gi...@apache.org>.

AmplabJenkins commented on pull request #32194:
URL: https://github.com/apache/spark/pull/32194#issuecomment-820951539


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/42039/
   



[GitHub] [spark] dongjoon-hyun commented on a change in pull request #32194: [SPARK-35096][SQL] SchemaPruning should adhere spark.sql.caseSensitive config

Posted by GitBox <gi...@apache.org>.

dongjoon-hyun commented on a change in pull request #32194:
URL: https://github.com/apache/spark/pull/32194#discussion_r614558228



##########
File path: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/SchemaPruningSuite.scala
##########
@@ -59,4 +69,28 @@ class SchemaPruningSuite extends SparkFunSuite {
       StructType.fromDDL("e int, f string")))
     testPrunedSchema(complexStruct, StructField("c", IntegerType), selectFieldInMap)
   }
+
+  test("SPARK-35096: test case insensitivity of pruned schema  ") {
+    Seq(true, false).foreach(isCaseSensitive => {
+      withSQLConf(CASE_SENSITIVE.key -> isCaseSensitive.toString) {
+        if (isCaseSensitive) {
+          val requestedFields = getRootFields(StructField("id", IntegerType))
+          val prunedSchema = SchemaPruning.pruneDataSchema(
+            StructType.fromDDL("ID int, name String"), requestedFields)
+          assert(prunedSchema == StructType(Seq.empty))
+        } else {
+          // Schema is case insensitive
+          val prunedSchema = SchemaPruning.pruneDataSchema(
+            StructType.fromDDL("ID int, name String"),
+            getRootFields(StructField("id", IntegerType)))
+          assert(prunedSchema == StructType(StructField("ID", IntegerType) :: Nil))
+          // Root fields are insensitive
+          val prunedSchema_1 = SchemaPruning.pruneDataSchema(

Review comment:
       Apache Spark follows a Scala style guideline for variable naming. We use `camelCase` names rather than suffixes such as `_1`.
   - https://github.com/databricks/scala-style-guide#variable-naming





[GitHub] [spark] AmplabJenkins removed a comment on pull request #32194: [SPARK-35096][SQL] SchemaPruning should adhere spark.sql.caseSensitive config

Posted by GitBox <gi...@apache.org>.

AmplabJenkins removed a comment on pull request #32194:
URL: https://github.com/apache/spark/pull/32194#issuecomment-821188377


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/137475/
   



[GitHub] [spark] AmplabJenkins removed a comment on pull request #32194: [SPARK-35096][Core] SchemaPruning should adhere spark.sql.caseSensitive config

Posted by GitBox <gi...@apache.org>.

AmplabJenkins removed a comment on pull request #32194:
URL: https://github.com/apache/spark/pull/32194#issuecomment-820772685


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/137434/
   



[GitHub] [spark] AmplabJenkins commented on pull request #32194: [SPARK-35096][SQL] SchemaPruning should adhere spark.sql.caseSensitive config

Posted by GitBox <gi...@apache.org>.

AmplabJenkins commented on pull request #32194:
URL: https://github.com/apache/spark/pull/32194#issuecomment-821022075


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/42049/
   



[GitHub] [spark] sandeep-katta commented on pull request #32194: [SPARK-35096][SQL] SchemaPruning should adhere spark.sql.caseSensitive config

Posted by GitBox <gi...@apache.org>.

sandeep-katta commented on pull request #32194:
URL: https://github.com/apache/spark/pull/32194#issuecomment-824527188


   Backport PR: https://github.com/apache/spark/pull/32284



[GitHub] [spark] SparkQA commented on pull request #32194: [SPARK-35096][Core] SchemaPruning should adhere spark.sql.caseSensitive config

Posted by GitBox <gi...@apache.org>.

SparkQA commented on pull request #32194:
URL: https://github.com/apache/spark/pull/32194#issuecomment-820625573


   **[Test build #137434 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/137434/testReport)** for PR 32194 at commit [`db4a74a`](https://github.com/apache/spark/commit/db4a74a2da53272e1b3cdd27f8e9105938ef9d2c).



[GitHub] [spark] cloud-fan closed pull request #32194: [SPARK-35096][SQL] SchemaPruning should adhere spark.sql.caseSensitive config

Posted by GitBox <gi...@apache.org>.

cloud-fan closed pull request #32194:
URL: https://github.com/apache/spark/pull/32194


   



[GitHub] [spark] AmplabJenkins removed a comment on pull request #32194: [SPARK-35096][SQL] SchemaPruning should adhere spark.sql.caseSensitive config

Posted by GitBox <gi...@apache.org>.

AmplabJenkins removed a comment on pull request #32194:
URL: https://github.com/apache/spark/pull/32194#issuecomment-820951539


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/42039/
   



[GitHub] [spark] dongjoon-hyun commented on a change in pull request #32194: [SPARK-35096][SQL] SchemaPruning should adhere spark.sql.caseSensitive config

Posted by GitBox <gi...@apache.org>.

dongjoon-hyun commented on a change in pull request #32194:
URL: https://github.com/apache/spark/pull/32194#discussion_r614557235



##########
File path: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/SchemaPruningSuite.scala
##########
@@ -32,7 +43,6 @@ class SchemaPruningSuite extends SparkFunSuite {
       val expectedSchema = SchemaPruning.pruneDataSchema(schema, requestedRootFields)
       assert(expectedSchema == StructType(requestedFields))
     }
-

Review comment:
       Shall we revert this removal?





[GitHub] [spark] SparkQA commented on pull request #32194: [SPARK-35096][SQL] SchemaPruning should adhere spark.sql.caseSensitive config

Posted by GitBox <gi...@apache.org>.

SparkQA commented on pull request #32194:
URL: https://github.com/apache/spark/pull/32194#issuecomment-820945023


   Kubernetes integration test starting
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/42039/
   



[GitHub] [spark] AmplabJenkins commented on pull request #32194: [SPARK-35096][Core] SchemaPruning should adhere spark.sql.caseSensitive config

Posted by GitBox <gi...@apache.org>.

AmplabJenkins commented on pull request #32194:
URL: https://github.com/apache/spark/pull/32194#issuecomment-820772685


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/137434/
   



[GitHub] [spark] github-actions[bot] commented on pull request #32194: [SPARK-35096][SQL] SchemaPruning should adhere spark.sql.caseSensitive config

Posted by GitBox <gi...@apache.org>.

github-actions[bot] commented on pull request #32194:
URL: https://github.com/apache/spark/pull/32194#issuecomment-820906993


   **[Test build #754432050](https://github.com/sandeep-katta/spark/actions/runs/754432050)** for PR 32194 at commit [`e477e91`](https://github.com/sandeep-katta/spark/commit/e477e91222cff396d8d3421f77b9957c238e9757).



[GitHub] [spark] SparkQA removed a comment on pull request #32194: [SPARK-35096][SQL] SchemaPruning should adhere spark.sql.caseSensitive config

Posted by GitBox <gi...@apache.org>.

SparkQA removed a comment on pull request #32194:
URL: https://github.com/apache/spark/pull/32194#issuecomment-820923870


   **[Test build #137464 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/137464/testReport)** for PR 32194 at commit [`baf8125`](https://github.com/apache/spark/commit/baf81252128a10c169433a16721f5f568d827934).



[GitHub] [spark] viirya commented on a change in pull request #32194: [SPARK-35096][Core] SchemaPruning should adhere spark.sql.caseSensitive config

Posted by GitBox <gi...@apache.org>.

viirya commented on a change in pull request #32194:
URL: https://github.com/apache/spark/pull/32194#discussion_r614544878



##########
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/SchemaPruning.scala
##########
@@ -61,12 +64,16 @@ object SchemaPruning {
           sortLeftFieldsByRight(leftValueType, rightValueType),
           containsNull)
       case (leftStruct: StructType, rightStruct: StructType) =>
-        val filteredRightFieldNames = rightStruct.fieldNames.filter(leftStruct.fieldNames.contains)
+        val resolver = conf.resolver
+        val filteredRightFieldNames = rightStruct.fieldNames
+          .filter(name => leftStruct.fieldNames.exists(resolver(_, name)))
         val sortedLeftFields = filteredRightFieldNames.map { fieldName =>
-          val leftFieldType = leftStruct(fieldName).dataType
-          val rightFieldType = rightStruct(fieldName).dataType
+          val resolvedLeftStruct = leftStruct.filter(p => resolver(p.name, fieldName)).head
+          val leftFieldType = resolvedLeftStruct.dataType
+          val resolvedRightStruct = rightStruct.filter(p => resolver(p.name, fieldName)).head

Review comment:
       `fieldName` is from `rightStruct`. I think we can directly use `val rightFieldType = rightStruct(fieldName).dataType`?
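
The point of this comment can be illustrated with a small sketch in plain Scala. It is not the code in `SchemaPruning.scala`: `Field`, `Resolver`, and `pairFields` are hypothetical stand-ins, and data types are modelled as strings. Because each `fieldName` is taken from the right-hand struct, the right-hand lookup can be exact, and only the left-hand lookup needs the resolver.

```scala
object SortLeftByRightSketch {
  // Hypothetical stand-in for a name resolver: decides whether two names match.
  type Resolver = (String, String) => Boolean

  // Hypothetical minimal field model; a real StructField carries more information.
  final case class Field(name: String, dataType: String)

  // Pairs each right-hand field with the left-hand field that resolves to it.
  def pairFields(left: Seq[Field], right: Seq[Field], resolver: Resolver): Seq[(Field, Field)] = {
    val rightByName = right.map(f => f.name -> f).toMap
    // Keep only right-hand names that some left-hand field resolves to.
    val filteredRightFieldNames =
      right.map(_.name).filter(name => left.exists(l => resolver(l.name, name)))
    filteredRightFieldNames.map { fieldName =>
      // The left-hand name may differ in case, so it goes through the resolver.
      // The filter above guarantees a match, so .get cannot fail here.
      val leftField = left.find(f => resolver(f.name, fieldName)).get
      // fieldName originates from the right-hand side, so an exact lookup suffices.
      val rightField = rightByName(fieldName)
      (leftField, rightField)
    }
  }
}
```

With a case-insensitive resolver, a left-hand field `ID` pairs with a right-hand field `id`; with a case-sensitive resolver, no pair is produced.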





[GitHub] [spark] viirya commented on a change in pull request #32194: [SPARK-35096][Core] SchemaPruning should adhere spark.sql.caseSensitive config

Posted by GitBox <gi...@apache.org>.

viirya commented on a change in pull request #32194:
URL: https://github.com/apache/spark/pull/32194#discussion_r614543508



##########
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/SchemaPruning.scala
##########
@@ -36,7 +38,8 @@ object SchemaPruning {
       .reduceLeft(_ merge _)
     val dataSchemaFieldNames = dataSchema.fieldNames.toSet
     val mergedDataSchema =
-      StructType(mergedSchema.filter(f => dataSchemaFieldNames.contains(f.name)))
+      StructType(mergedSchema.filter(f => dataSchemaFieldNames.exists(resolver(_, f.name))

Review comment:
       Oh, I see. SPARK-34963 fixes the nested column extractor case. This is top-level.
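
To illustrate the top-level case touched by the diff line above, here is a small dependency-free sketch. The names `TopLevelFilterSketch`, `filterExact`, and `filterResolved` are hypothetical, and field names are modelled as plain strings rather than Spark's `StructField`. It contrasts exact `Set.contains` filtering with resolver-based filtering.

```scala
object TopLevelFilterSketch {
  // Assumption: a resolver is just a name-equality function.
  type Resolver = (String, String) => Boolean

  val caseInsensitive: Resolver = (a, b) => a.equalsIgnoreCase(b)

  // Exact-match filtering, in the style of the removed diff line.
  def filterExact(merged: Seq[String], dataFields: Set[String]): Seq[String] =
    merged.filter(dataFields.contains)

  // Resolver-based filtering, in the style of the added diff line.
  def filterResolved(
      merged: Seq[String],
      dataFields: Set[String],
      resolver: Resolver): Seq[String] =
    merged.filter(name => dataFields.exists(resolver(_, name)))
}
```

Given merged names `Seq("COL1", "col2")` and data fields `Set("col1", "col2")`, the exact filter keeps only `col2`, whereas the case-insensitive resolver keeps both.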





[GitHub] [spark] AmplabJenkins commented on pull request #32194: [SPARK-35096][SQL] SchemaPruning should adhere spark.sql.caseSensitive config

Posted by GitBox <gi...@apache.org>.

AmplabJenkins commented on pull request #32194:
URL: https://github.com/apache/spark/pull/32194#issuecomment-821088566


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/137464/
   



[GitHub] [spark] github-actions[bot] commented on pull request #32194: [SPARK-35096][SQL] SchemaPruning should adhere spark.sql.caseSensitive config

Posted by GitBox <gi...@apache.org>.

github-actions[bot] commented on pull request #32194:
URL: https://github.com/apache/spark/pull/32194#issuecomment-820917401


   **[Test build #754495960](https://github.com/sandeep-katta/spark/actions/runs/754495960)** for PR 32194 at commit [`baf8125`](https://github.com/sandeep-katta/spark/commit/baf81252128a10c169433a16721f5f568d827934).



[GitHub] [spark] dongjoon-hyun commented on pull request #32194: [SPARK-35096][SQL] SchemaPruning should adhere spark.sql.caseSensitive config

Posted by GitBox <gi...@apache.org>.

dongjoon-hyun commented on pull request #32194:
URL: https://github.com/apache/spark/pull/32194#issuecomment-824505643


   Hi, All.
   This broke branch-3.0 because `SQLConfHelper` does not exist on that branch.
   ```
   Error: ] /home/runner/work/spark/spark/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/SchemaPruning.scala:20: object SQLConfHelper is not a member of package org.apache.spark.sql.catalyst
   Error: ] /home/runner/work/spark/spark/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/SchemaPruning.scala:23: not found: type SQLConfHelper
   Error: ] /home/runner/work/spark/spark/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/SchemaPruning.scala:32: not found: value conf
   ```
   
   ```
   spark-3.0:branch-3.0 $ git grep SQLConfHelper
   sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/SchemaPruning.scala:import org.apache.spark.sql.catalyst.SQLConfHelper
   sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/SchemaPruning.scala:object SchemaPruning extends SQLConfHelper {
   ```
   
   I'll revert this from branch-3.0. Please open a backport PR against branch-3.0.
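
The compile errors above come from a missing helper trait, not from the pruning logic itself. The sketch below shows one way code can avoid depending on such a trait, under the assumption that the helper is only a thin accessor over a globally reachable configuration. Every name here (`Conf`, `ConfHelper`, `PruningWithHelper`, `PruningWithoutHelper`) is a hypothetical stand-in, not Spark's API, and this is not a claim about how the actual backport was written.

```scala
// Hypothetical stand-in for a session configuration; not Spark's SQLConf.
final case class Conf(caseSensitive: Boolean) {
  // Name comparison that honours the case-sensitivity setting.
  def resolver: (String, String) => Boolean =
    if (caseSensitive) (a, b) => a == b
    else (a, b) => a.equalsIgnoreCase(b)
}

object Conf {
  @volatile private var current: Conf = Conf(caseSensitive = false)
  def get: Conf = current
  def set(c: Conf): Unit = { current = c }
}

// Thin accessor trait: it only forwards to the global getter.
trait ConfHelper {
  def conf: Conf = Conf.get
}

// Variant that relies on the helper trait.
object PruningWithHelper extends ConfHelper {
  def sameField(a: String, b: String): Boolean = conf.resolver(a, b)
}

// Variant with no trait dependency: it reads the active configuration directly.
object PruningWithoutHelper {
  def sameField(a: String, b: String): Boolean = Conf.get.resolver(a, b)
}
```

Both variants behave identically; the second simply compiles on a code base where the helper trait is absent.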



[GitHub] [spark] dongjoon-hyun commented on pull request #32194: [SPARK-35096][SQL] SchemaPruning should adhere spark.sql.caseSensitive config

Posted by GitBox <gi...@apache.org>.

dongjoon-hyun commented on pull request #32194:
URL: https://github.com/apache/spark/pull/32194#issuecomment-824515044


   Thank you!



[GitHub] [spark] github-actions[bot] commented on pull request #32194: [SPARK-35096][SQL] SchemaPruning should adhere spark.sql.caseSensitive config

Posted by GitBox <gi...@apache.org>.

github-actions[bot] commented on pull request #32194:
URL: https://github.com/apache/spark/pull/32194#issuecomment-820997111


   **[Test build #754870665](https://github.com/sandeep-katta/spark/actions/runs/754870665)** for PR 32194 at commit [`004d56c`](https://github.com/sandeep-katta/spark/commit/004d56cbed5937e25a0163b11d5610f017a6f7a8).



[GitHub] [spark] SparkQA commented on pull request #32194: [SPARK-35096][Core] SchemaPruning should adhere spark.sql.caseSensitive config

Posted by GitBox <gi...@apache.org>.

SparkQA commented on pull request #32194:
URL: https://github.com/apache/spark/pull/32194#issuecomment-820653119


   **[Test build #137436 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/137436/testReport)** for PR 32194 at commit [`f6e4b6b`](https://github.com/apache/spark/commit/f6e4b6b78d2a46a9e27b9674902e61c15fae61df).

