Posted to reviews@spark.apache.org by GitBox <gi...@apache.org> on 2021/04/15 18:01:56 UTC
[GitHub] [spark] sandeep-katta opened a new pull request #32194: [SPARK-35096][Core] SchemaPruning should adhere spark.sql.caseSensitive config
sandeep-katta opened a new pull request #32194:
URL: https://github.com/apache/spark/pull/32194
### What changes were proposed in this pull request?
As part of SPARK-26837, pruning of nested fields from object serializers is supported. However, that change does not handle Spark's case-insensitive column resolution.
In this PR I resolve the column names to be pruned according to the `spark.sql.caseSensitive` config.
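A minimal sketch of what "resolving names according to `spark.sql.caseSensitive`" means, written in plain Scala so it runs without Spark on the classpath. In Spark, `SQLConf.resolver` returns either `caseSensitiveResolution` or `caseInsensitiveResolution`, and both have the `Resolver` type `(String, String) => Boolean`. The `ResolverSketch` object and its `resolverFor` helper are hypothetical stand-ins for that behavior and are not Spark API.

```scala
object ResolverSketch {
  // Same shape as Spark's `Resolver` type alias: (String, String) => Boolean
  type Resolver = (String, String) => Boolean

  // Behaves like Spark's caseSensitiveResolution
  val exactMatch: Resolver = (a, b) => a == b
  // Behaves like Spark's caseInsensitiveResolution
  val ignoreCase: Resolver = (a, b) => a.equalsIgnoreCase(b)

  // Hypothetical stand-in for SQLConf.resolver: picks the comparison from the
  // value of spark.sql.caseSensitive (false by default)
  def resolverFor(caseSensitive: Boolean): Resolver =
    if (caseSensitive) exactMatch else ignoreCase
}
```

With `caseSensitive = false`, `resolverFor(false)("ID", "id")` is `true`. With `caseSensitive = true`, it is `false`. The old `fieldNames.contains` check always behaved like the case-sensitive branch, whatever the config said.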
**Exception Before Fix**
```
Caused by: java.lang.ArrayIndexOutOfBoundsException: 0
at org.apache.spark.sql.types.StructType.apply(StructType.scala:414)
at org.apache.spark.sql.catalyst.optimizer.ObjectSerializerPruning$$anonfun$apply$4.$anonfun$applyOrElse$3(objects.scala:216)
at scala.collection.TraversableLike.$anonfun$map$1(TraversableLike.scala:238)
at scala.collection.immutable.List.foreach(List.scala:392)
at scala.collection.TraversableLike.map(TraversableLike.scala:238)
at scala.collection.TraversableLike.map$(TraversableLike.scala:231)
at scala.collection.immutable.List.map(List.scala:298)
at org.apache.spark.sql.catalyst.optimizer.ObjectSerializerPruning$$anonfun$apply$4.applyOrElse(objects.scala:215)
at org.apache.spark.sql.catalyst.optimizer.ObjectSerializerPruning$$anonfun$apply$4.applyOrElse(objects.scala:203)
at org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$transformDown$1(TreeNode.scala:309)
at org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(TreeNode.scala:72)
at org.apache.spark.sql.catalyst.trees.TreeNode.transformDown(TreeNode.scala:309)
...
```
### Why are the changes needed?
After upgrading to Spark 3, the `foreachBatch` API throws `java.lang.ArrayIndexOutOfBoundsException`. This PR fixes that issue.
### Does this PR introduce _any_ user-facing change?
No. In fact, it fixes the regression.
### How was this patch tested?
Added tests and also verified manually.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] [spark] SparkQA commented on pull request #32194: [SPARK-35096][SQL] SchemaPruning should adhere spark.sql.caseSensitive config
Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #32194:
URL: https://github.com/apache/spark/pull/32194#issuecomment-821053547
Kubernetes integration test status failure
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/42051/
[GitHub] [spark] SparkQA commented on pull request #32194: [SPARK-35096][SQL] SchemaPruning should adhere spark.sql.caseSensitive config
Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #32194:
URL: https://github.com/apache/spark/pull/32194#issuecomment-821050744
Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/42051/
[GitHub] [spark] wangyum commented on a change in pull request #32194: [SPARK-35096][SQL] SchemaPruning should adhere spark.sql.caseSensitive config
Posted by GitBox <gi...@apache.org>.
wangyum commented on a change in pull request #32194:
URL: https://github.com/apache/spark/pull/32194#discussion_r614620249
##########
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/SchemaPruning.scala
##########
@@ -61,12 +63,15 @@ object SchemaPruning {
sortLeftFieldsByRight(leftValueType, rightValueType),
containsNull)
case (leftStruct: StructType, rightStruct: StructType) =>
- val filteredRightFieldNames = rightStruct.fieldNames.filter(leftStruct.fieldNames.contains)
+ val resolver = conf.resolver
+ val filteredRightFieldNames = rightStruct.fieldNames
+ .filter(name => leftStruct.fieldNames.exists(resolver(_, name)))
val sortedLeftFields = filteredRightFieldNames.map { fieldName =>
- val leftFieldType = leftStruct(fieldName).dataType
+ val resolvedLeftStruct = leftStruct.filter(p => resolver(p.name, fieldName)).head
Review comment:
`filter` -> `find`?
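Here is a small illustration of the suggested change. It is plain Scala and uses a hypothetical `Field` case class in place of `StructField`. In Spark, `StructType` is itself a `Seq[StructField]`, so `find` is available on `leftStruct` directly.

`find` stops at the first match and returns an `Option`. `filter(...).head` scans every field, builds an intermediate collection, and throws when nothing matches. In the diff above, each name has already been checked with `exists`, so `.head` cannot fail there. The benefit of `find` is mainly clarity.

```scala
object FindSketch {
  // Hypothetical, simplified stand-in for Spark's StructField
  final case class Field(name: String, dataType: String)

  type Resolver = (String, String) => Boolean
  val ignoreCase: Resolver = (a, b) => a.equalsIgnoreCase(b)

  // Shape used in the diff: builds a filtered collection, then takes its head
  // (throws NoSuchElementException if no field resolves)
  def viaFilter(fields: Seq[Field], name: String, resolver: Resolver): Field =
    fields.filter(f => resolver(f.name, name)).head

  // Suggested shape: stops at the first match and returns an Option
  def viaFind(fields: Seq[Field], name: String, resolver: Resolver): Option[Field] =
    fields.find(f => resolver(f.name, name))
}
```

For a schema containing `ID`, looking up `"id"` case-insensitively returns `Some` of the `ID` field. A name that resolves to nothing returns `None` instead of throwing.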
[GitHub] [spark] sandeep-katta commented on a change in pull request #32194: [SPARK-35096][Core] SchemaPruning should adhere spark.sql.caseSensitive config
Posted by GitBox <gi...@apache.org>.
sandeep-katta commented on a change in pull request #32194:
URL: https://github.com/apache/spark/pull/32194#discussion_r614535186
##########
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/SchemaPruning.scala
##########
@@ -36,7 +38,8 @@ object SchemaPruning {
.reduceLeft(_ merge _)
val dataSchemaFieldNames = dataSchema.fieldNames.toSet
val mergedDataSchema =
- StructType(mergedSchema.filter(f => dataSchemaFieldNames.contains(f.name)))
+ StructType(mergedSchema.filter(f => dataSchemaFieldNames.exists(resolver(_, f.name))
Review comment:
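Without this fix, the following query fails:
```
// Run in spark-shell, which provides `spark`
import org.apache.spark.sql.DataFrame
import org.apache.spark.sql.streaming.Trigger
import org.apache.spark.sql.types.{LongType, StructType}
import spark.implicits._

val inputPath = "/Users/xyz/data/testcaseInsensitivity"
val output_path = "/Users/xyz/output"
// Write Parquet data whose only column is lower-case `id`
spark.range(10).write.format("parquet").save(inputPath)

def process_row(microBatch: DataFrame, batchId: Long): Unit = {
  // `ID` resolves to `id` because spark.sql.caseSensitive defaults to false;
  // before this fix, this failed with java.lang.ArrayIndexOutOfBoundsException
  val df = microBatch.select($"ID".alias("other"))
  df.write.format("parquet").mode("append").save(output_path)
}

val schema = new StructType().add("id", LongType)
val stream_df = spark.readStream.schema(schema).format("parquet").load(inputPath)
stream_df.writeStream.trigger(Trigger.Once).foreachBatch(process_row _)
  .start().awaitTermination()
```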
[GitHub] [spark] viirya commented on a change in pull request #32194: [SPARK-35096][Core] SchemaPruning should adhere spark.sql.caseSensitive config
Posted by GitBox <gi...@apache.org>.
viirya commented on a change in pull request #32194:
URL: https://github.com/apache/spark/pull/32194#discussion_r614341325
##########
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/SchemaPruning.scala
##########
@@ -36,7 +38,8 @@ object SchemaPruning {
.reduceLeft(_ merge _)
val dataSchemaFieldNames = dataSchema.fieldNames.toSet
val mergedDataSchema =
- StructType(mergedSchema.filter(f => dataSchemaFieldNames.contains(f.name)))
+ StructType(mergedSchema.filter(f => dataSchemaFieldNames.exists(resolver(_, f.name))
Review comment:
I think I already fixed this issue recently in SPARK-34963. `requestedRootFields` is already resolved with respect to case sensitivity.
[GitHub] [spark] sandeep-katta edited a comment on pull request #32194: [SPARK-35096][SQL] SchemaPruning should adhere spark.sql.caseSensitive config
Posted by GitBox <gi...@apache.org>.
sandeep-katta edited a comment on pull request #32194:
URL: https://github.com/apache/spark/pull/32194#issuecomment-823261848
@dongjoon-hyun @viirya, can this PR be merged? If not, I am happy to address any review comments.
[GitHub] [spark] SparkQA commented on pull request #32194: [SPARK-35096][Core] SchemaPruning should adhere spark.sql.caseSensitive config
Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #32194:
URL: https://github.com/apache/spark/pull/32194#issuecomment-820785818
**[Test build #137436 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/137436/testReport)** for PR 32194 at commit [`f6e4b6b`](https://github.com/apache/spark/commit/f6e4b6b78d2a46a9e27b9674902e61c15fae61df).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] [spark] sandeep-katta commented on pull request #32194: [SPARK-35096][Core] SchemaPruning should adhere spark.sql.caseSensitive config
Posted by GitBox <gi...@apache.org>.
sandeep-katta commented on pull request #32194:
URL: https://github.com/apache/spark/pull/32194#issuecomment-820627846
CC @viirya @cloud-fan
[GitHub] [spark] AmplabJenkins removed a comment on pull request #32194: [SPARK-35096][Core] SchemaPruning should adhere spark.sql.caseSensitive config
Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #32194:
URL: https://github.com/apache/spark/pull/32194#issuecomment-820805775
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/137436/
[GitHub] [spark] SparkQA commented on pull request #32194: [SPARK-35096][SQL] SchemaPruning should adhere spark.sql.caseSensitive config
Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #32194:
URL: https://github.com/apache/spark/pull/32194#issuecomment-821071130
**[Test build #137464 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/137464/testReport)** for PR 32194 at commit [`baf8125`](https://github.com/apache/spark/commit/baf81252128a10c169433a16721f5f568d827934).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] [spark] SparkQA commented on pull request #32194: [SPARK-35096][SQL] SchemaPruning should adhere spark.sql.caseSensitive config
Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #32194:
URL: https://github.com/apache/spark/pull/32194#issuecomment-821167742
**[Test build #137475 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/137475/testReport)** for PR 32194 at commit [`04b24c9`](https://github.com/apache/spark/commit/04b24c9fee2d10ff6066cb87d9baf9a600736c88).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] [spark] SparkQA commented on pull request #32194: [SPARK-35096][SQL] SchemaPruning should adhere spark.sql.caseSensitive config
Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #32194:
URL: https://github.com/apache/spark/pull/32194#issuecomment-820923870
**[Test build #137464 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/137464/testReport)** for PR 32194 at commit [`baf8125`](https://github.com/apache/spark/commit/baf81252128a10c169433a16721f5f568d827934).
[GitHub] [spark] dongjoon-hyun commented on a change in pull request #32194: [SPARK-35096][SQL] SchemaPruning should adhere spark.sql.caseSensitive config
Posted by GitBox <gi...@apache.org>.
dongjoon-hyun commented on a change in pull request #32194:
URL: https://github.com/apache/spark/pull/32194#discussion_r614556852
##########
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/SchemaPruning.scala
##########
@@ -36,7 +38,8 @@ object SchemaPruning {
.reduceLeft(_ merge _)
val dataSchemaFieldNames = dataSchema.fieldNames.toSet
val mergedDataSchema =
- StructType(mergedSchema.filter(f => dataSchemaFieldNames.contains(f.name)))
+ StructType(mergedSchema.filter(f => dataSchemaFieldNames.exists(resolver(_, f.name))
+ ))
Review comment:
Indentation? Can we piggy-back this onto line 41?
[GitHub] [spark] dongjoon-hyun commented on a change in pull request #32194: [SPARK-35096][SQL] SchemaPruning should adhere spark.sql.caseSensitive config
Posted by GitBox <gi...@apache.org>.
dongjoon-hyun commented on a change in pull request #32194:
URL: https://github.com/apache/spark/pull/32194#discussion_r614557384
##########
File path: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/SchemaPruningSuite.scala
##########
@@ -59,4 +69,28 @@ class SchemaPruningSuite extends SparkFunSuite {
StructType.fromDDL("e int, f string")))
testPrunedSchema(complexStruct, StructField("c", IntegerType), selectFieldInMap)
}
+
+ test("SPARK-35096: test case insensitivity of pruned schema ") {
Review comment:
Shall we remove the trailing space in the test name?
[GitHub] [spark] github-actions[bot] commented on pull request #32194: [SPARK-35096][SQL] SchemaPruning should adhere spark.sql.caseSensitive config
Posted by GitBox <gi...@apache.org>.
github-actions[bot] commented on pull request #32194:
URL: https://github.com/apache/spark/pull/32194#issuecomment-820998327
**[Test build #754876451](https://github.com/sandeep-katta/spark/actions/runs/754876451)** for PR 32194 at commit [`04b24c9`](https://github.com/sandeep-katta/spark/commit/04b24c9fee2d10ff6066cb87d9baf9a600736c88).
[GitHub] [spark] cloud-fan commented on pull request #32194: [SPARK-35096][SQL] SchemaPruning should adhere spark.sql.caseSensitive config
Posted by GitBox <gi...@apache.org>.
cloud-fan commented on pull request #32194:
URL: https://github.com/apache/spark/pull/32194#issuecomment-823835594
thanks, merging to master/3.1/3.0!
[GitHub] [spark] SparkQA removed a comment on pull request #32194: [SPARK-35096][SQL] SchemaPruning should adhere spark.sql.caseSensitive config
Posted by GitBox <gi...@apache.org>.
SparkQA removed a comment on pull request #32194:
URL: https://github.com/apache/spark/pull/32194#issuecomment-821022533
**[Test build #137475 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/137475/testReport)** for PR 32194 at commit [`04b24c9`](https://github.com/apache/spark/commit/04b24c9fee2d10ff6066cb87d9baf9a600736c88).
[GitHub] [spark] SparkQA commented on pull request #32194: [SPARK-35096][SQL] SchemaPruning should adhere spark.sql.caseSensitive config
Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #32194:
URL: https://github.com/apache/spark/pull/32194#issuecomment-821022533
**[Test build #137475 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/137475/testReport)** for PR 32194 at commit [`04b24c9`](https://github.com/apache/spark/commit/04b24c9fee2d10ff6066cb87d9baf9a600736c88).
[GitHub] [spark] AmplabJenkins commented on pull request #32194: [SPARK-35096][Core] SchemaPruning should adhere spark.sql.caseSensitive config
Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #32194:
URL: https://github.com/apache/spark/pull/32194#issuecomment-820805775
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/137436/
[GitHub] [spark] dongjoon-hyun commented on a change in pull request #32194: [SPARK-35096][SQL] SchemaPruning should adhere spark.sql.caseSensitive config
Posted by GitBox <gi...@apache.org>.
dongjoon-hyun commented on a change in pull request #32194:
URL: https://github.com/apache/spark/pull/32194#discussion_r614557660
##########
File path: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/SchemaPruningSuite.scala
##########
@@ -59,4 +69,28 @@ class SchemaPruningSuite extends SparkFunSuite {
StructType.fromDDL("e int, f string")))
testPrunedSchema(complexStruct, StructField("c", IntegerType), selectFieldInMap)
}
+
+ test("SPARK-35096: test case insensitivity of pruned schema ") {
+ Seq(true, false).foreach(isCaseSensitive => {
+ withSQLConf(CASE_SENSITIVE.key -> isCaseSensitive.toString) {
+ if (isCaseSensitive) {
+ val requestedFields = getRootFields(StructField("id", IntegerType))
+ val prunedSchema = SchemaPruning.pruneDataSchema(
+ StructType.fromDDL("ID int, name String"), requestedFields)
+ assert(prunedSchema == StructType(Seq.empty))
+ } else {
+ // Schema is case insensitive
+ val prunedSchema = SchemaPruning.pruneDataSchema(
+ StructType.fromDDL("ID int, name String"),
+ getRootFields(StructField("id", IntegerType)))
+ assert(prunedSchema == StructType(StructField("ID", IntegerType) :: Nil))
+ // Root fields are insensitive
Review comment:
`are insensitive` -> `are case-insensitive`?
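The two branches of the test above can be mirrored in plain Scala as a quick sanity check of the expected behavior. This is a hypothetical sketch, not Spark's `SchemaPruning.pruneDataSchema`. It only handles root-level fields, uses a simplified `Field` case class, and hard-codes the two resolvers.

It shows what the assertions check. With case-sensitive resolution, a request for `id` prunes everything from a schema containing `ID`. With case-insensitive resolution, the pruned schema keeps the data schema's own spelling (`ID`), not the requested one.

```scala
object PruneSketch {
  // Hypothetical, simplified stand-in for Spark's StructField
  final case class Field(name: String, dataType: String)

  type Resolver = (String, String) => Boolean
  val exactMatch: Resolver = (a, b) => a == b
  val ignoreCase: Resolver = (a, b) => a.equalsIgnoreCase(b)

  // Keep each data-schema field that some requested root field resolves to;
  // surviving fields keep the data schema's own name and type
  def pruneRoot(dataSchema: Seq[Field], requested: Seq[String], resolver: Resolver): Seq[Field] =
    dataSchema.filter(f => requested.exists(resolver(_, f.name)))
}
```

Keeping the data schema's spelling matters because the pruned schema is later used to read the underlying data. The requested name is only used for matching.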
[GitHub] [spark] sandeep-katta commented on pull request #32194: [SPARK-35096][SQL] SchemaPruning should adhere spark.sql.caseSensitive config
Posted by GitBox <gi...@apache.org>.
sandeep-katta commented on pull request #32194:
URL: https://github.com/apache/spark/pull/32194#issuecomment-823261848
@dongjoon-hyun @viirya, can this PR be merged? If not, I am happy to address any review comments.
[GitHub] [spark] github-actions[bot] commented on pull request #32194: [SPARK-35096][Core] SchemaPruning should adhere spark.sql.caseSensitive config
Posted by GitBox <gi...@apache.org>.
github-actions[bot] commented on pull request #32194:
URL: https://github.com/apache/spark/pull/32194#issuecomment-820624884
**[Test build #752951989](https://github.com/sandeep-katta/spark/actions/runs/752951989)** for PR 32194 at commit [`db4a74a`](https://github.com/sandeep-katta/spark/commit/db4a74a2da53272e1b3cdd27f8e9105938ef9d2c).
[GitHub] [spark] SparkQA commented on pull request #32194: [SPARK-35096][Core] SchemaPruning should adhere spark.sql.caseSensitive config
Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #32194:
URL: https://github.com/apache/spark/pull/32194#issuecomment-820683081
[GitHub] [spark] SparkQA removed a comment on pull request #32194: [SPARK-35096][Core] SchemaPruning should adhere spark.sql.caseSensitive config
Posted by GitBox <gi...@apache.org>.
SparkQA removed a comment on pull request #32194:
URL: https://github.com/apache/spark/pull/32194#issuecomment-820653119
**[Test build #137436 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/137436/testReport)** for PR 32194 at commit [`f6e4b6b`](https://github.com/apache/spark/commit/f6e4b6b78d2a46a9e27b9674902e61c15fae61df).
[GitHub] [spark] AmplabJenkins removed a comment on pull request #32194: [SPARK-35096][Core] SchemaPruning should adhere spark.sql.caseSensitive config
Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #32194:
URL: https://github.com/apache/spark/pull/32194#issuecomment-820685927
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/42011/
[GitHub] [spark] AmplabJenkins commented on pull request #32194: [SPARK-35096][SQL] SchemaPruning should adhere spark.sql.caseSensitive config
Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #32194:
URL: https://github.com/apache/spark/pull/32194#issuecomment-821056637
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/42051/
[GitHub] [spark] SparkQA commented on pull request #32194: [SPARK-35096][SQL] SchemaPruning should adhere spark.sql.caseSensitive config
Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #32194:
URL: https://github.com/apache/spark/pull/32194#issuecomment-821021484
Kubernetes integration test unable to build dist.
exiting with code: 1
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/42049/
[GitHub] [spark] AmplabJenkins removed a comment on pull request #32194: [SPARK-35096][SQL] SchemaPruning should adhere spark.sql.caseSensitive config
Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #32194:
URL: https://github.com/apache/spark/pull/32194#issuecomment-821056637
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/42051/
[GitHub] [spark] github-actions[bot] commented on pull request #32194: [SPARK-35096][Core] SchemaPruning should adhere spark.sql.caseSensitive config
Posted by GitBox <gi...@apache.org>.
github-actions[bot] commented on pull request #32194:
URL: https://github.com/apache/spark/pull/32194#issuecomment-820627109
**[Test build #752985634](https://github.com/sandeep-katta/spark/actions/runs/752985634)** for PR 32194 at commit [`f6e4b6b`](https://github.com/sandeep-katta/spark/commit/f6e4b6b78d2a46a9e27b9674902e61c15fae61df).
[GitHub] [spark] SparkQA commented on pull request #32194: [SPARK-35096][SQL] SchemaPruning should adhere spark.sql.caseSensitive config
Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #32194:
URL: https://github.com/apache/spark/pull/32194#issuecomment-820947927
Kubernetes integration test status failure
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/42039/
[GitHub] [spark] AmplabJenkins commented on pull request #32194: [SPARK-35096][Core] SchemaPruning should adhere spark.sql.caseSensitive config
Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #32194:
URL: https://github.com/apache/spark/pull/32194#issuecomment-820685927
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/42011/
[GitHub] [spark] AmplabJenkins removed a comment on pull request #32194: [SPARK-35096][SQL] SchemaPruning should adhere spark.sql.caseSensitive config
Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #32194:
URL: https://github.com/apache/spark/pull/32194#issuecomment-821088566
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/137464/
[GitHub] [spark] cloud-fan commented on pull request #32194: [SPARK-35096][SQL] SchemaPruning should adhere spark.sql.caseSensitive config
Posted by GitBox <gi...@apache.org>.
cloud-fan commented on pull request #32194:
URL: https://github.com/apache/spark/pull/32194#issuecomment-824509815
@sandeep-katta can you help resubmit the PR for 3.0? Thanks!
[GitHub] [spark] sandeep-katta commented on a change in pull request #32194: [SPARK-35096][SQL] SchemaPruning should adhere spark.sql.caseSensitive config
Posted by GitBox <gi...@apache.org>.
sandeep-katta commented on a change in pull request #32194:
URL: https://github.com/apache/spark/pull/32194#discussion_r614642928
##########
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/SchemaPruning.scala
##########
@@ -61,12 +63,15 @@ object SchemaPruning {
sortLeftFieldsByRight(leftValueType, rightValueType),
containsNull)
case (leftStruct: StructType, rightStruct: StructType) =>
- val filteredRightFieldNames = rightStruct.fieldNames.filter(leftStruct.fieldNames.contains)
+ val resolver = conf.resolver
+ val filteredRightFieldNames = rightStruct.fieldNames
+ .filter(name => leftStruct.fieldNames.exists(resolver(_, name)))
val sortedLeftFields = filteredRightFieldNames.map { fieldName =>
- val leftFieldType = leftStruct(fieldName).dataType
+ val resolvedLeftStruct = leftStruct.filter(p => resolver(p.name, fieldName)).head
Review comment:
Done, updated. Just curious: is there any performance difference between these two?
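On the performance question: assuming the two variants being compared are `filter(...).head` and a short-circuiting `find`, the difference is small. `filter` visits every field and builds an intermediate collection before `.head` takes the first match (throwing `NoSuchElementException` if there is none), while `find` stops at the first match and returns an `Option`. Both are linear in the number of fields, so this only matters for very wide structs. A dependency-free sketch, where `Field` is a hypothetical stand-in for `StructField` and the two resolver functions are assumed to mirror Spark's case-sensitive and case-insensitive name comparison:

```scala
object FieldLookupSketch {
  // Assumed stand-in for Spark's Resolver type: (String, String) => Boolean.
  type Resolver = (String, String) => Boolean
  val caseSensitive: Resolver = (a, b) => a == b
  val caseInsensitive: Resolver = (a, b) => a.equalsIgnoreCase(b)

  // Hypothetical stand-in for StructField: a name and a type name.
  final case class Field(name: String, dataType: String)

  // Variant 1: scans all fields, materializes every match, then takes the first.
  // Throws NoSuchElementException when nothing matches.
  def lookupWithFilter(fields: Seq[Field], name: String, resolver: Resolver): Field =
    fields.filter(f => resolver(f.name, name)).head

  // Variant 2: stops at the first match and reports "no match" as None.
  def lookupWithFind(fields: Seq[Field], name: String, resolver: Resolver): Option[Field] =
    fields.find(f => resolver(f.name, name))

  def main(args: Array[String]): Unit = {
    val fields = Seq(Field("ID", "int"), Field("name", "string"))
    println(lookupWithFilter(fields, "id", caseInsensitive)) // Field(ID,int)
    println(lookupWithFind(fields, "id", caseInsensitive))   // Some(Field(ID,int))
    println(lookupWithFind(fields, "id", caseSensitive))     // None
  }
}
```

Whenever a match exists both variants return the same field; `find` additionally makes the no-match case explicit instead of throwing.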
[GitHub] [spark] AmplabJenkins commented on pull request #32194: [SPARK-35096][SQL] SchemaPruning should adhere spark.sql.caseSensitive config
Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #32194:
URL: https://github.com/apache/spark/pull/32194#issuecomment-821188377
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/137475/
[GitHub] [spark] sandeep-katta commented on pull request #32194: [SPARK-35096][SQL] SchemaPruning should adhere spark.sql.caseSensitive config
Posted by GitBox <gi...@apache.org>.
sandeep-katta commented on pull request #32194:
URL: https://github.com/apache/spark/pull/32194#issuecomment-824511186
Sure, will raise it soon.
[GitHub] [spark] SparkQA removed a comment on pull request #32194: [SPARK-35096][Core] SchemaPruning should adhere spark.sql.caseSensitive config
Posted by GitBox <gi...@apache.org>.
SparkQA removed a comment on pull request #32194:
URL: https://github.com/apache/spark/pull/32194#issuecomment-820625573
**[Test build #137434 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/137434/testReport)** for PR 32194 at commit [`db4a74a`](https://github.com/apache/spark/commit/db4a74a2da53272e1b3cdd27f8e9105938ef9d2c).
[GitHub] [spark] sandeep-katta commented on a change in pull request #32194: [SPARK-35096][SQL] SchemaPruning should adhere spark.sql.caseSensitive config
Posted by GitBox <gi...@apache.org>.
sandeep-katta commented on a change in pull request #32194:
URL: https://github.com/apache/spark/pull/32194#discussion_r614535186
##########
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/SchemaPruning.scala
##########
@@ -36,7 +38,8 @@ object SchemaPruning {
.reduceLeft(_ merge _)
val dataSchemaFieldNames = dataSchema.fieldNames.toSet
val mergedDataSchema =
- StructType(mergedSchema.filter(f => dataSchemaFieldNames.contains(f.name)))
+ StructType(mergedSchema.filter(f => dataSchemaFieldNames.exists(resolver(_, f.name))
Review comment:
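The query below fails without this fix:
```
import org.apache.spark.sql.DataFrame
import org.apache.spark.sql.streaming.Trigger
import org.apache.spark.sql.types.{LongType, StructType}
import spark.implicits._ // for the $"..." column syntax (spark-shell imports this automatically)

val inputPath = "/Users/xyz/data/testcaseInsensitivity"
val outputPath = "/Users/xyz/output"
spark.range(10).write.format("parquet").save(inputPath)

// Called once per micro-batch; selects "ID" although the schema declares "id"
def processRow(microBatch: DataFrame, batchId: Long): Unit = {
  val df = microBatch.select($"ID".alias("other")) // Doesn't work
  df.write.format("parquet").mode("append").save(outputPath)
}

val schema = new StructType().add("id", LongType)
val streamDf = spark.readStream.schema(schema).format("parquet").load(inputPath)
streamDf.writeStream.trigger(Trigger.Once).foreachBatch(processRow _)
  .start().awaitTermination()
```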
[GitHub] [spark] SparkQA commented on pull request #32194: [SPARK-35096][Core] SchemaPruning should adhere spark.sql.caseSensitive config
Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #32194:
URL: https://github.com/apache/spark/pull/32194#issuecomment-820766431
**[Test build #137434 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/137434/testReport)** for PR 32194 at commit [`db4a74a`](https://github.com/apache/spark/commit/db4a74a2da53272e1b3cdd27f8e9105938ef9d2c).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] [spark] viirya commented on a change in pull request #32194: [SPARK-35096][Core] SchemaPruning should adhere spark.sql.caseSensitive config
Posted by GitBox <gi...@apache.org>.
viirya commented on a change in pull request #32194:
URL: https://github.com/apache/spark/pull/32194#discussion_r614341643
##########
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/SchemaPruning.scala
##########
@@ -36,7 +38,8 @@ object SchemaPruning {
.reduceLeft(_ merge _)
val dataSchemaFieldNames = dataSchema.fieldNames.toSet
val mergedDataSchema =
- StructType(mergedSchema.filter(f => dataSchemaFieldNames.contains(f.name)))
+ StructType(mergedSchema.filter(f => dataSchemaFieldNames.exists(resolver(_, f.name))
Review comment:
Do you have an e2e test case that fails to resolve to the correct field?
[GitHub] [spark] AmplabJenkins removed a comment on pull request #32194: [SPARK-35096][SQL] SchemaPruning should adhere spark.sql.caseSensitive config
Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #32194:
URL: https://github.com/apache/spark/pull/32194#issuecomment-821022075
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/42049/
[GitHub] [spark] AmplabJenkins commented on pull request #32194: [SPARK-35096][SQL] SchemaPruning should adhere spark.sql.caseSensitive config
Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #32194:
URL: https://github.com/apache/spark/pull/32194#issuecomment-820951539
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/42039/
[GitHub] [spark] dongjoon-hyun commented on a change in pull request #32194: [SPARK-35096][SQL] SchemaPruning should adhere spark.sql.caseSensitive config
Posted by GitBox <gi...@apache.org>.
dongjoon-hyun commented on a change in pull request #32194:
URL: https://github.com/apache/spark/pull/32194#discussion_r614558228
##########
File path: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/SchemaPruningSuite.scala
##########
@@ -59,4 +69,28 @@ class SchemaPruningSuite extends SparkFunSuite {
StructType.fromDDL("e int, f string")))
testPrunedSchema(complexStruct, StructField("c", IntegerType), selectFieldInMap)
}
+
+ test("SPARK-35096: test case insensitivity of pruned schema ") {
+ Seq(true, false).foreach(isCaseSensitive => {
+ withSQLConf(CASE_SENSITIVE.key -> isCaseSensitive.toString) {
+ if (isCaseSensitive) {
+ val requestedFields = getRootFields(StructField("id", IntegerType))
+ val prunedSchema = SchemaPruning.pruneDataSchema(
+ StructType.fromDDL("ID int, name String"), requestedFields)
+ assert(prunedSchema == StructType(Seq.empty))
+ } else {
+ // Schema is case insensitive
+ val prunedSchema = SchemaPruning.pruneDataSchema(
+ StructType.fromDDL("ID int, name String"),
+ getRootFields(StructField("id", IntegerType)))
+ assert(prunedSchema == StructType(StructField("ID", IntegerType) :: Nil))
+ // Root fields are insensitive
+ val prunedSchema_1 = SchemaPruning.pruneDataSchema(
Review comment:
Apache Spark follows a Scala style guideline for variable naming: we use `camelCase` names instead of a `_1` suffix.
- https://github.com/databricks/scala-style-guide#variable-naming
[GitHub] [spark] AmplabJenkins removed a comment on pull request #32194: [SPARK-35096][SQL] SchemaPruning should adhere spark.sql.caseSensitive config
Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #32194:
URL: https://github.com/apache/spark/pull/32194#issuecomment-821188377
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/137475/
[GitHub] [spark] AmplabJenkins removed a comment on pull request #32194: [SPARK-35096][Core] SchemaPruning should adhere spark.sql.caseSensitive config
Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #32194:
URL: https://github.com/apache/spark/pull/32194#issuecomment-820772685
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/137434/
[GitHub] [spark] AmplabJenkins commented on pull request #32194: [SPARK-35096][SQL] SchemaPruning should adhere spark.sql.caseSensitive config
Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #32194:
URL: https://github.com/apache/spark/pull/32194#issuecomment-821022075
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/42049/
[GitHub] [spark] sandeep-katta commented on pull request #32194: [SPARK-35096][SQL] SchemaPruning should adhere spark.sql.caseSensitive config
Posted by GitBox <gi...@apache.org>.
sandeep-katta commented on pull request #32194:
URL: https://github.com/apache/spark/pull/32194#issuecomment-824527188
Backport PR: https://github.com/apache/spark/pull/32284
[GitHub] [spark] SparkQA commented on pull request #32194: [SPARK-35096][Core] SchemaPruning should adhere spark.sql.caseSensitive config
Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #32194:
URL: https://github.com/apache/spark/pull/32194#issuecomment-820625573
**[Test build #137434 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/137434/testReport)** for PR 32194 at commit [`db4a74a`](https://github.com/apache/spark/commit/db4a74a2da53272e1b3cdd27f8e9105938ef9d2c).
[GitHub] [spark] cloud-fan closed pull request #32194: [SPARK-35096][SQL] SchemaPruning should adhere spark.sql.caseSensitive config
Posted by GitBox <gi...@apache.org>.
cloud-fan closed pull request #32194:
URL: https://github.com/apache/spark/pull/32194
[GitHub] [spark] AmplabJenkins removed a comment on pull request #32194: [SPARK-35096][SQL] SchemaPruning should adhere spark.sql.caseSensitive config
Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #32194:
URL: https://github.com/apache/spark/pull/32194#issuecomment-820951539
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/42039/
[GitHub] [spark] dongjoon-hyun commented on a change in pull request #32194: [SPARK-35096][SQL] SchemaPruning should adhere spark.sql.caseSensitive config
Posted by GitBox <gi...@apache.org>.
dongjoon-hyun commented on a change in pull request #32194:
URL: https://github.com/apache/spark/pull/32194#discussion_r614557235
##########
File path: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/SchemaPruningSuite.scala
##########
@@ -32,7 +43,6 @@ class SchemaPruningSuite extends SparkFunSuite {
val expectedSchema = SchemaPruning.pruneDataSchema(schema, requestedRootFields)
assert(expectedSchema == StructType(requestedFields))
}
-
Review comment:
Shall we revert this removal?
[GitHub] [spark] SparkQA commented on pull request #32194: [SPARK-35096][SQL] SchemaPruning should adhere spark.sql.caseSensitive config
Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #32194:
URL: https://github.com/apache/spark/pull/32194#issuecomment-820945023
Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/42039/
[GitHub] [spark] AmplabJenkins commented on pull request #32194: [SPARK-35096][Core] SchemaPruning should adhere spark.sql.caseSensitive config
Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #32194:
URL: https://github.com/apache/spark/pull/32194#issuecomment-820772685
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/137434/
[GitHub] [spark] github-actions[bot] commented on pull request #32194: [SPARK-35096][SQL] SchemaPruning should adhere spark.sql.caseSensitive config
Posted by GitBox <gi...@apache.org>.
github-actions[bot] commented on pull request #32194:
URL: https://github.com/apache/spark/pull/32194#issuecomment-820906993
**[Test build #754432050](https://github.com/sandeep-katta/spark/actions/runs/754432050)** for PR 32194 at commit [`e477e91`](https://github.com/sandeep-katta/spark/commit/e477e91222cff396d8d3421f77b9957c238e9757).
[GitHub] [spark] SparkQA removed a comment on pull request #32194: [SPARK-35096][SQL] SchemaPruning should adhere spark.sql.caseSensitive config
Posted by GitBox <gi...@apache.org>.
SparkQA removed a comment on pull request #32194:
URL: https://github.com/apache/spark/pull/32194#issuecomment-820923870
**[Test build #137464 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/137464/testReport)** for PR 32194 at commit [`baf8125`](https://github.com/apache/spark/commit/baf81252128a10c169433a16721f5f568d827934).
[GitHub] [spark] viirya commented on a change in pull request #32194: [SPARK-35096][Core] SchemaPruning should adhere spark.sql.caseSensitive config
Posted by GitBox <gi...@apache.org>.
viirya commented on a change in pull request #32194:
URL: https://github.com/apache/spark/pull/32194#discussion_r614544878
##########
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/SchemaPruning.scala
##########
@@ -61,12 +64,16 @@ object SchemaPruning {
sortLeftFieldsByRight(leftValueType, rightValueType),
containsNull)
case (leftStruct: StructType, rightStruct: StructType) =>
- val filteredRightFieldNames = rightStruct.fieldNames.filter(leftStruct.fieldNames.contains)
+ val resolver = conf.resolver
+ val filteredRightFieldNames = rightStruct.fieldNames
+ .filter(name => leftStruct.fieldNames.exists(resolver(_, name)))
val sortedLeftFields = filteredRightFieldNames.map { fieldName =>
- val leftFieldType = leftStruct(fieldName).dataType
- val rightFieldType = rightStruct(fieldName).dataType
+ val resolvedLeftStruct = leftStruct.filter(p => resolver(p.name, fieldName)).head
+ val leftFieldType = resolvedLeftStruct.dataType
+ val resolvedRightStruct = rightStruct.filter(p => resolver(p.name, fieldName)).head
Review comment:
`fieldName` is from `rightStruct`. I think we can directly use `val rightFieldType = rightStruct(fieldName).dataType`?
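Because `filteredRightFieldNames` is derived from `rightStruct.fieldNames`, each `fieldName` matches a right-hand field exactly, so only the left-hand lookup needs the resolver. A simplified, dependency-free model of that idea follows; `Field` and `pairByRight` are hypothetical names standing in for `StructField` and the struct branch of `sortLeftFieldsByRight`, nested types are not recursed into, and this is not the code in the PR:

```scala
object SortByRightSketch {
  // Assumed stand-in for Spark's Resolver type.
  type Resolver = (String, String) => Boolean
  val caseSensitive: Resolver = (a, b) => a == b
  val caseInsensitive: Resolver = (a, b) => a.equalsIgnoreCase(b)

  // Hypothetical stand-in for StructField.
  final case class Field(name: String, dataType: String)

  // Pairs each right field with its resolver-matched left field, in right-hand order.
  // Left fields without a counterpart on the right are dropped.
  def pairByRight(left: Seq[Field], right: Seq[Field], resolver: Resolver): Seq[(Field, Field)] = {
    // Exact-name index over the right side; valid because every name looked up
    // below is taken from `right` itself (duplicate names keep the last entry).
    val rightByName: Map[String, Field] = right.map(f => f.name -> f).toMap
    right.map(_.name)
      .filter(name => left.exists(l => resolver(l.name, name)))
      .map { name =>
        val leftField = left.find(l => resolver(l.name, name)).get // non-empty thanks to the filter
        val rightField = rightByName(name)                         // exact lookup, no resolver needed
        (leftField, rightField)
      }
  }

  def main(args: Array[String]): Unit = {
    val left = Seq(Field("ID", "int"), Field("name", "string"), Field("extra", "double"))
    val right = Seq(Field("name", "string"), Field("id", "int"))
    pairByRight(left, right, caseInsensitive).foreach(println)
    // (Field(name,string),Field(name,string))
    // (Field(ID,int),Field(id,int))
  }
}
```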
[GitHub] [spark] viirya commented on a change in pull request #32194: [SPARK-35096][Core] SchemaPruning should adhere spark.sql.caseSensitive config
Posted by GitBox <gi...@apache.org>.
viirya commented on a change in pull request #32194:
URL: https://github.com/apache/spark/pull/32194#discussion_r614543508
##########
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/SchemaPruning.scala
##########
@@ -36,7 +38,8 @@ object SchemaPruning {
.reduceLeft(_ merge _)
val dataSchemaFieldNames = dataSchema.fieldNames.toSet
val mergedDataSchema =
- StructType(mergedSchema.filter(f => dataSchemaFieldNames.contains(f.name)))
+ StructType(mergedSchema.filter(f => dataSchemaFieldNames.exists(resolver(_, f.name))
Review comment:
Oh, I see. SPARK-34963 fixes the nested column extractor case. This is top-level.
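The hunk replaces the exact `contains` check in the top-level `mergedDataSchema` filter with a resolver-based `exists`. A dependency-free model of that difference, assuming plain strings for field names; `exactFilter` and `resolverFilter` are hypothetical helpers, and the resolvers are assumed to mirror Spark's `(String, String) => Boolean` name comparison. It illustrates the filter only and does not reproduce the full optimizer path from the stack trace:

```scala
object TopLevelPruningSketch {
  // Assumed stand-in for Spark's Resolver type.
  type Resolver = (String, String) => Boolean
  val caseSensitive: Resolver = (a, b) => a == b
  val caseInsensitive: Resolver = (a, b) => a.equalsIgnoreCase(b)

  // Before: keep a requested top-level name only if the data schema has exactly that name.
  def exactFilter(requested: Seq[String], dataFieldNames: Set[String]): Seq[String] =
    requested.filter(dataFieldNames.contains)

  // After: keep it if any data-schema name matches under the configured resolver.
  def resolverFilter(requested: Seq[String], dataFieldNames: Set[String],
                     resolver: Resolver): Seq[String] =
    requested.filter(name => dataFieldNames.exists(resolver(_, name)))

  def main(args: Array[String]): Unit = {
    val dataFieldNames = Set("id") // name as stored in the data schema
    val requested = Seq("ID")      // same field, different case
    println(exactFilter(requested, dataFieldNames))                     // List()
    println(resolverFilter(requested, dataFieldNames, caseInsensitive)) // List(ID)
    println(resolverFilter(requested, dataFieldNames, caseSensitive))   // List()
  }
}
```

With the exact check the field is silently dropped even when `spark.sql.caseSensitive` is `false`; with the resolver it is dropped only in case-sensitive mode.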
[GitHub] [spark] AmplabJenkins commented on pull request #32194: [SPARK-35096][SQL] SchemaPruning should adhere spark.sql.caseSensitive config
Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #32194:
URL: https://github.com/apache/spark/pull/32194#issuecomment-821088566
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/137464/
[GitHub] [spark] github-actions[bot] commented on pull request #32194: [SPARK-35096][SQL] SchemaPruning should adhere spark.sql.caseSensitive config
Posted by GitBox <gi...@apache.org>.
github-actions[bot] commented on pull request #32194:
URL: https://github.com/apache/spark/pull/32194#issuecomment-820917401
**[Test build #754495960](https://github.com/sandeep-katta/spark/actions/runs/754495960)** for PR 32194 at commit [`baf8125`](https://github.com/sandeep-katta/spark/commit/baf81252128a10c169433a16721f5f568d827934).
[GitHub] [spark] dongjoon-hyun commented on pull request #32194: [SPARK-35096][SQL] SchemaPruning should adhere spark.sql.caseSensitive config
Posted by GitBox <gi...@apache.org>.
dongjoon-hyun commented on pull request #32194:
URL: https://github.com/apache/spark/pull/32194#issuecomment-824505643
Hi, All.
This broke branch-3.0 because there is no `SQLConfHelper`.
```
Error: ] /home/runner/work/spark/spark/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/SchemaPruning.scala:20: object SQLConfHelper is not a member of package org.apache.spark.sql.catalyst
Error: ] /home/runner/work/spark/spark/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/SchemaPruning.scala:23: not found: type SQLConfHelper
Error: ] /home/runner/work/spark/spark/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/SchemaPruning.scala:32: not found: value conf
```
```
spark-3.0:branch-3.0 $ git grep SQLConfHelper
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/SchemaPruning.scala:import org.apache.spark.sql.catalyst.SQLConfHelper
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/SchemaPruning.scala:object SchemaPruning extends SQLConfHelper {
```
I'll revert this from branch-3.0. Please make a backport PR for branch-3.0.
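The compile errors above come from `SchemaPruning` mixing in `SQLConfHelper` to obtain `conf`, and branch-3.0 does not have that trait. The sketch below models the two access patterns so it runs without Spark. It uses hypothetical stand-ins: `Conf` for `SQLConf`, `get` for `SQLConf.get`, and `ConfHelper` for `SQLConfHelper`. It also assumes, rather than shows, that a branch-3.0 backport would read the active conf directly. The actual backport PR is not part of this thread.

```scala
object ConfHelperSketch {
  // Hypothetical stand-in for SQLConf, reduced to the one flag this PR cares about.
  final case class Conf(caseSensitiveAnalysis: Boolean) {
    // Pick a name-matching function from the flag, as a resolver would.
    def resolver: (String, String) => Boolean = {
      val exact: (String, String) => Boolean = (a, b) => a == b
      val ignoreCase: (String, String) => Boolean = (a, b) => a.equalsIgnoreCase(b)
      if (caseSensitiveAnalysis) exact else ignoreCase
    }
  }

  // Hypothetical stand-in for SQLConf.get: a per-thread "active" conf.
  private val active = new ThreadLocal[Conf] {
    override def initialValue(): Conf = Conf(caseSensitiveAnalysis = false)
  }
  def get: Conf = active.get()
  def set(c: Conf): Unit = active.set(c)

  // Newer-branch pattern: a mix-in trait exposes `conf`, so callers write `conf.xxx`.
  trait ConfHelper {
    def conf: Conf = get
  }

  object WithHelper extends ConfHelper {
    def matches(a: String, b: String): Boolean = conf.resolver(a, b)
  }

  // Backport pattern: no helper trait exists, so read the active conf directly.
  object WithoutHelper {
    def matches(a: String, b: String): Boolean = get.resolver(a, b)
  }
}
```

Both objects resolve names identically. Only the route to the active conf differs, so in this sketch removing the mix-in does not change behavior.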
[GitHub] [spark] dongjoon-hyun commented on pull request #32194: [SPARK-35096][SQL] SchemaPruning should adhere spark.sql.caseSensitive config
Posted by GitBox <gi...@apache.org>.
dongjoon-hyun commented on pull request #32194:
URL: https://github.com/apache/spark/pull/32194#issuecomment-824515044
Thank you!
[GitHub] [spark] github-actions[bot] commented on pull request #32194: [SPARK-35096][SQL] SchemaPruning should adhere spark.sql.caseSensitive config
Posted by GitBox <gi...@apache.org>.
github-actions[bot] commented on pull request #32194:
URL: https://github.com/apache/spark/pull/32194#issuecomment-820997111
**[Test build #754870665](https://github.com/sandeep-katta/spark/actions/runs/754870665)** for PR 32194 at commit [`004d56c`](https://github.com/sandeep-katta/spark/commit/004d56cbed5937e25a0163b11d5610f017a6f7a8).
[GitHub] [spark] SparkQA commented on pull request #32194: [SPARK-35096][Core] SchemaPruning should adhere spark.sql.caseSensitive config
Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #32194:
URL: https://github.com/apache/spark/pull/32194#issuecomment-820653119
**[Test build #137436 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/137436/testReport)** for PR 32194 at commit [`f6e4b6b`](https://github.com/apache/spark/commit/f6e4b6b78d2a46a9e27b9674902e61c15fae61df).