Posted to reviews@spark.apache.org by GitBox <gi...@apache.org> on 2021/04/06 06:45:04 UTC

[GitHub] [spark] viirya opened a new pull request #32059: [SPARK-34963][SQL] Fix nested column pruning for extracting case-insensitive struct field from array of struct

viirya opened a new pull request #32059:
URL: https://github.com/apache/spark/pull/32059


   
   ### What changes were proposed in this pull request?
   
   This patch fixes nested column pruning when a struct field is extracted from an array of structs using a case-insensitive field name.
   
   ### Why are the changes needed?
   
   In case-insensitive mode, the nested column pruning rule cannot correctly push down an extractor of a struct field from an array of structs, e.g.,
   
   ```scala
   val query = spark.table("contacts").select("friends.First", "friends.MiDDle")
   ```
   
   Error stack:
   ```
   [info]   java.lang.IllegalArgumentException: Field "First" does not exist.                                                                                        
   [info] Available fields:                                                                                                                                          
   [info]   at org.apache.spark.sql.types.StructType$$anonfun$apply$1.apply(StructType.scala:274)                                                                    
   [info]   at org.apache.spark.sql.types.StructType$$anonfun$apply$1.apply(StructType.scala:274)                            
   [info]   at scala.collection.MapLike$class.getOrElse(MapLike.scala:128)                                                                                           
   [info]   at scala.collection.AbstractMap.getOrElse(Map.scala:59)                                                                                                  
   [info]   at org.apache.spark.sql.types.StructType.apply(StructType.scala:273)                                                                                     
   [info]   at org.apache.spark.sql.execution.ProjectionOverSchema$$anonfun$getProjection$3.apply(ProjectionOverSchema.scala:44)                                   
   [info]   at org.apache.spark.sql.execution.ProjectionOverSchema$$anonfun$getProjection$3.apply(ProjectionOverSchema.scala:41) 
   ```
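
   The failure can be illustrated without Spark. The following is a minimal sketch, not the Spark implementation: `Field`, `Schema`, `exactLookup`, and `resolvedLookup` are hypothetical names, and the schema contents are assumed for illustration. It shows how an exact-name lookup rejects `First` when the stored field is `first`, while a lookup through a case-insensitive resolver finds it.

   ```scala
   // Hypothetical stand-ins for a struct field and a struct schema (not Spark classes).
   final case class Field(name: String)
   final case class Schema(fields: Seq[Field]) {
     // Exact-name lookup: fails when only the letter case differs.
     def exactLookup(name: String): Field =
       fields.find(_.name == name).getOrElse(
         throw new IllegalArgumentException(s"""Field "$name" does not exist."""))

     // Lookup through a resolver, i.e. a function deciding whether two names match.
     def resolvedLookup(name: String, resolver: (String, String) => Boolean): Option[Field] =
       fields.find(f => resolver(f.name, name))
   }

   // A case-insensitive resolver, assumed here for illustration.
   val caseInsensitive: (String, String) => Boolean = (a, b) => a.equalsIgnoreCase(b)

   // Assumed element schema of the `friends` array.
   val friendSchema = Schema(Seq(Field("first"), Field("middle"), Field("last")))

   // The exact lookup throws, so it is wrapped in Try; the resolved lookup returns an Option.
   val exact = scala.util.Try(friendSchema.exactLookup("First"))
   val resolved = friendSchema.resolvedLookup("First", caseInsensitive)
   ```

   In this sketch `exact` is a failure wrapping an `IllegalArgumentException`, while `resolved` holds the `first` field.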
   
   ### Does this PR introduce _any_ user-facing change?
   
   No
   
   ### How was this patch tested?
   
   Unit test


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] viirya commented on pull request #32059: [SPARK-34963][SQL] Fix nested column pruning for extracting case-insensitive struct field from array of struct

Posted by GitBox <gi...@apache.org>.
viirya commented on pull request #32059:
URL: https://github.com/apache/spark/pull/32059#issuecomment-816803019


   > +1, LGTM for Apache Spark 3.2.0.
   > For me, I believe this can be considered an improvement to give additional support cases.
   
   Thanks @dongjoon-hyun and @maropu.
   
   If this case did not throw an exception but silently read all nested columns, it would be fine to treat it as an improvement. But it throws an exception, which is why I marked it as a bug in JIRA.




[GitHub] [spark] viirya commented on a change in pull request #32059: [SPARK-34963][SQL] Fix nested column pruning for extracting case-insensitive struct field from array of struct

Posted by GitBox <gi...@apache.org>.
viirya commented on a change in pull request #32059:
URL: https://github.com/apache/spark/pull/32059#discussion_r608305174



##########
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/SchemaPruning.scala
##########
@@ -48,7 +49,8 @@ object SchemaPruning {
    * right, recursively. That is, left is a "subschema" of right, ignoring order of
    * fields.
    */
-  private def sortLeftFieldsByRight(left: DataType, right: DataType): DataType =
+  private def sortLeftFieldsByRight(left: DataType, right: DataType): DataType = {

Review comment:
       It is. As in `https://github.com/apache/spark/pull/32059#discussion_r608277894`, `selectField` treats `GetStructField` and `GetArrayStructFields` differently, which causes different behavior in case-sensitivity-aware resolution here.
   
   It looks like we had better correct them together.






[GitHub] [spark] SparkQA commented on pull request #32059: [SPARK-34963][SQL] Fix nested column pruning for extracting case-insensitive struct field from array of struct

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #32059:
URL: https://github.com/apache/spark/pull/32059#issuecomment-814571064


   **[Test build #136983 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/136983/testReport)** for PR 32059 at commit [`15bdcd6`](https://github.com/apache/spark/commit/15bdcd6cb6c0147473ac3335350a190e14837110).




[GitHub] [spark] viirya commented on pull request #32059: [SPARK-34963][SQL] Fix nested column pruning for extracting case-insensitive struct field from array of struct

Posted by GitBox <gi...@apache.org>.
viirya commented on pull request #32059:
URL: https://github.com/apache/spark/pull/32059#issuecomment-816899479


   Thanks @dongjoon-hyun @maropu. Merged to master/3.1/3.0. The 2.4 branch has a conflict, so I will backport it manually.




[GitHub] [spark] AmplabJenkins removed a comment on pull request #32059: [SPARK-34963][SQL] Fix nested column pruning for extracting case-insensitive struct field from array of struct

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #32059:
URL: https://github.com/apache/spark/pull/32059#issuecomment-816044574


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/41671/
   




[GitHub] [spark] viirya commented on a change in pull request #32059: [SPARK-34963][SQL] Fix nested column pruning for extracting case-insensitive struct field from array of struct

Posted by GitBox <gi...@apache.org>.
viirya commented on a change in pull request #32059:
URL: https://github.com/apache/spark/pull/32059#discussion_r611092514



##########
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/ProjectionOverSchema.scala
##########
@@ -41,9 +41,14 @@ case class ProjectionOverSchema(schema: StructType) {
       case a: GetArrayStructFields =>
         getProjection(a.child).map(p => (p, p.dataType)).map {
           case (projection, ArrayType(projSchema @ StructType(_), _)) =>
+            // For case-sensitivity aware field resolution, we should take `ordinal` which

Review comment:
       Ah, I missed this comment. As it is minor, I will add the comment in #31966 for master only.






[GitHub] [spark] SparkQA commented on pull request #32059: [SPARK-34963][SQL] Fix nested column pruning for extracting case-insensitive struct field from array of struct

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #32059:
URL: https://github.com/apache/spark/pull/32059#issuecomment-814049658


   **[Test build #136935 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/136935/testReport)** for PR 32059 at commit [`f4db7e9`](https://github.com/apache/spark/commit/f4db7e94c1a13c545dba9ee267a3e55946830010).
    * This patch passes all tests.
    * This patch merges cleanly.
    * This patch adds no public classes.




[GitHub] [spark] viirya commented on a change in pull request #32059: [SPARK-34963][SQL] Fix nested column pruning for extracting case-insensitive struct field from array of struct

Posted by GitBox <gi...@apache.org>.
viirya commented on a change in pull request #32059:
URL: https://github.com/apache/spark/pull/32059#discussion_r608278115



##########
File path: sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/SchemaPruningSuite.scala
##########
@@ -774,4 +774,17 @@ abstract class SchemaPruningSuite
         assert(scanSchema === expectedScanSchema)
     }
   }
+
+  testSchemaPruning("extract case-insensitive struct field from array") {

Review comment:
       It is fine (https://github.com/apache/spark/pull/32059#discussion_r608277894), but adding a test would improve coverage. Let me add one.






[GitHub] [spark] SparkQA commented on pull request #32059: [SPARK-34963][SQL] Fix nested column pruning for extracting case-insensitive struct field from array of struct

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #32059:
URL: https://github.com/apache/spark/pull/32059#issuecomment-813882211


   **[Test build #136935 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/136935/testReport)** for PR 32059 at commit [`f4db7e9`](https://github.com/apache/spark/commit/f4db7e94c1a13c545dba9ee267a3e55946830010).






[GitHub] [spark] AmplabJenkins removed a comment on pull request #32059: [SPARK-34963][SQL] Fix nested column pruning for extracting case-insensitive struct field from array of struct

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #32059:
URL: https://github.com/apache/spark/pull/32059#issuecomment-814662895


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/136983/
   




[GitHub] [spark] SparkQA commented on pull request #32059: [SPARK-34963][SQL] Fix nested column pruning for extracting case-insensitive struct field from array of struct

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #32059:
URL: https://github.com/apache/spark/pull/32059#issuecomment-814595654


   Kubernetes integration test unable to build dist.
   
   exiting with code: 1
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/41562/
   




[GitHub] [spark] AmplabJenkins commented on pull request #32059: [SPARK-34963][SQL] Fix nested column pruning for extracting case-insensitive struct field from array of struct

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #32059:
URL: https://github.com/apache/spark/pull/32059#issuecomment-813918550


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/41512/
   




[GitHub] [spark] dongjoon-hyun commented on pull request #32059: [SPARK-34963][SQL] Fix nested column pruning for extracting case-insensitive struct field from array of struct

Posted by GitBox <gi...@apache.org>.
dongjoon-hyun commented on pull request #32059:
URL: https://github.com/apache/spark/pull/32059#issuecomment-816883901


   Feel free to proceed as you want, @viirya . I respect your decision here.




[GitHub] [spark] maropu commented on a change in pull request #32059: [SPARK-34963][SQL] Fix nested column pruning for extracting case-insensitive struct field from array of struct

Posted by GitBox <gi...@apache.org>.
maropu commented on a change in pull request #32059:
URL: https://github.com/apache/spark/pull/32059#discussion_r608270127



##########
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/ProjectionOverSchema.scala
##########
@@ -41,9 +43,10 @@ case class ProjectionOverSchema(schema: StructType) {
       case a: GetArrayStructFields =>
         getProjection(a.child).map(p => (p, p.dataType)).map {
           case (projection, ArrayType(projSchema @ StructType(_), _)) =>
+            val selectedField = projSchema.find(f => resolver(f.name, a.field.name)).get

Review comment:
       Can this issue occur only in an array of structs? The code at line 66 (which @dongjoon-hyun pointed out above) has the same pattern, `projSchema.fieldIndex(field.name)`, so I'm worried that it can occur in other cases.






[GitHub] [spark] SparkQA commented on pull request #32059: [SPARK-34963][SQL] Fix nested column pruning for extracting case-insensitive struct field from array of struct

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #32059:
URL: https://github.com/apache/spark/pull/32059#issuecomment-814572017


   **[Test build #136984 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/136984/testReport)** for PR 32059 at commit [`c335304`](https://github.com/apache/spark/commit/c33530448016aa3ff7e785b9613a23bb89c93059).




[GitHub] [spark] dongjoon-hyun commented on a change in pull request #32059: [SPARK-34963][SQL] Fix nested column pruning for extracting case-insensitive struct field from array of struct

Posted by GitBox <gi...@apache.org>.
dongjoon-hyun commented on a change in pull request #32059:
URL: https://github.com/apache/spark/pull/32059#discussion_r608243907



##########
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/ProjectionOverSchema.scala
##########
@@ -41,9 +43,10 @@ case class ProjectionOverSchema(schema: StructType) {
       case a: GetArrayStructFields =>
         getProjection(a.child).map(p => (p, p.dataType)).map {
           case (projection, ArrayType(projSchema @ StructType(_), _)) =>
+            val selectedField = projSchema.find(f => resolver(f.name, a.field.name)).get

Review comment:
       It seems that we are not doing this for `struct` type. To allow this for `array of struct`, it seems that we need this for `struct` first at line 66.
   ```
   GetStructField(projection, projSchema.fieldIndex(field.name))
   ```






[GitHub] [spark] dongjoon-hyun commented on pull request #32059: [SPARK-34963][SQL] Fix nested column pruning for extracting case-insensitive struct field from array of struct

Posted by GitBox <gi...@apache.org>.
dongjoon-hyun commented on pull request #32059:
URL: https://github.com/apache/spark/pull/32059#issuecomment-816754875


   Sure, thanks, @viirya and @maropu .




[GitHub] [spark] viirya commented on a change in pull request #32059: [SPARK-34963][SQL] Fix nested column pruning for extracting case-insensitive struct field from array of struct

Posted by GitBox <gi...@apache.org>.
viirya commented on a change in pull request #32059:
URL: https://github.com/apache/spark/pull/32059#discussion_r608273677



##########
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/ProjectionOverSchema.scala
##########
@@ -41,9 +43,10 @@ case class ProjectionOverSchema(schema: StructType) {
       case a: GetArrayStructFields =>
         getProjection(a.child).map(p => (p, p.dataType)).map {
           case (projection, ArrayType(projSchema @ StructType(_), _)) =>
+            val selectedField = projSchema.find(f => resolver(f.name, a.field.name)).get

Review comment:
       Let me check on it.






[GitHub] [spark] AmplabJenkins commented on pull request #32059: [SPARK-34963][SQL] Fix nested column pruning for extracting case-insensitive struct field from array of struct

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #32059:
URL: https://github.com/apache/spark/pull/32059#issuecomment-814589170


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/41560/
   




[GitHub] [spark] viirya commented on pull request #32059: [SPARK-34963][SQL] Fix nested column pruning for extracting case-insensitive struct field from array of struct

Posted by GitBox <gi...@apache.org>.
viirya commented on pull request #32059:
URL: https://github.com/apache/spark/pull/32059#issuecomment-816299179


   @dongjoon-hyun @maropu This now has a more appropriate fix, and I added a few more tests. Please take another look. Thank you.




[GitHub] [spark] SparkQA removed a comment on pull request #32059: [SPARK-34963][SQL] Fix nested column pruning for extracting case-insensitive struct field from array of struct

Posted by GitBox <gi...@apache.org>.
SparkQA removed a comment on pull request #32059:
URL: https://github.com/apache/spark/pull/32059#issuecomment-813882211


   **[Test build #136935 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/136935/testReport)** for PR 32059 at commit [`f4db7e9`](https://github.com/apache/spark/commit/f4db7e94c1a13c545dba9ee267a3e55946830010).




[GitHub] [spark] AmplabJenkins removed a comment on pull request #32059: [SPARK-34963][SQL] Fix nested column pruning for extracting case-insensitive struct field from array of struct

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #32059:
URL: https://github.com/apache/spark/pull/32059#issuecomment-816088746


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/41673/
   




[GitHub] [spark] viirya commented on a change in pull request #32059: [SPARK-34963][SQL] Fix nested column pruning for extracting case-insensitive struct field from array of struct

Posted by GitBox <gi...@apache.org>.
viirya commented on a change in pull request #32059:
URL: https://github.com/apache/spark/pull/32059#discussion_r608277894



##########
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/ProjectionOverSchema.scala
##########
@@ -41,9 +43,10 @@ case class ProjectionOverSchema(schema: StructType) {
       case a: GetArrayStructFields =>
         getProjection(a.child).map(p => (p, p.dataType)).map {
           case (projection, ArrayType(projSchema @ StructType(_), _)) =>
+            val selectedField = projSchema.find(f => resolver(f.name, a.field.name)).get

Review comment:
       Oh it is fine. `ExtractValue` actually does column name resolving correctly. The difference is how `ProjectionOverSchema` treats `GetArrayStructFields` and `GetStructField` there.
   
   That also means we may not need to resolve the name again in `ProjectionOverSchema`, as this PR currently does. We can just use `GetArrayStructFields.ordinal`, which already points to the correct field in the child expression.
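
To illustrate the ordinal-based idea above, here is a minimal sketch in plain Scala. It does not use Spark's classes: `Schema`, `Extraction`, and `indexInPruned` are hypothetical stand-ins invented for this example, and the sketch assumes the extraction keeps the name as the user wrote it together with an ordinal into the unpruned child schema.

```scala
object OrdinalLookupSketch {
  // Hypothetical stand-in for a struct schema: an ordered list of field names.
  final case class Schema(fieldNames: Vector[String]) {
    // Exact, case-sensitive lookup of a field's position.
    def fieldIndex(name: String): Option[Int] = {
      val i = fieldNames.indexOf(name)
      if (i >= 0) Some(i) else None
    }
  }

  // Hypothetical stand-in for an already-resolved extraction: it keeps the name
  // as the user wrote it plus the ordinal chosen against the full child schema.
  final case class Extraction(writtenName: String, ordinal: Int)

  // Recover the canonical field name through the ordinal, then find that name's
  // position in the pruned schema. No second round of name resolution is needed.
  def indexInPruned(full: Schema, pruned: Schema, e: Extraction): Option[Int] =
    pruned.fieldIndex(full.fieldNames(e.ordinal))
}
```

With a full schema of `id, Name, age`, a pruned schema of `Name`, and an extraction written as `name` with ordinal 1, an exact lookup of `name` in the pruned schema finds nothing, while the ordinal-based lookup finds `Name` at position 0.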






[GitHub] [spark] AmplabJenkins commented on pull request #32059: [SPARK-34963][SQL] Fix nested column pruning for extracting case-insensitive struct field from array of struct

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #32059:
URL: https://github.com/apache/spark/pull/32059#issuecomment-816294153


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/137093/
   




[GitHub] [spark] dongjoon-hyun commented on a change in pull request #32059: [SPARK-34963][SQL] Fix nested column pruning for extracting case-insensitive struct field from array of struct

Posted by GitBox <gi...@apache.org>.
dongjoon-hyun commented on a change in pull request #32059:
URL: https://github.com/apache/spark/pull/32059#discussion_r608235728



##########
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/SchemaPruning.scala
##########
@@ -48,7 +49,8 @@ object SchemaPruning {
    * right, recursively. That is, left is a "subschema" of right, ignoring order of
    * fields.
    */
-  private def sortLeftFieldsByRight(left: DataType, right: DataType): DataType =
+  private def sortLeftFieldsByRight(left: DataType, right: DataType): DataType = {

Review comment:
       When we construct `mergedDataSchema` in line 39, it's also case-sensitive, isn't it?
   > StructType(mergedSchema.filter(f => dataSchemaFieldNames.contains(f.name)))
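
The concern about `contains` can be sketched in plain Scala as follows. This is only an illustration, not the code in `SchemaPruning`: field names are modeled as strings, `keepExact` and `keepResolved` are hypothetical helpers, and the `Resolver` alias is an assumption that mirrors the `(String, String) => Boolean` shape of a name resolver.

```scala
object ResolverFilterSketch {
  // Assumed resolver shape: decides whether two field names refer to the same field.
  type Resolver = (String, String) => Boolean

  val caseSensitive: Resolver = (a, b) => a == b
  val caseInsensitive: Resolver = (a, b) => a.equalsIgnoreCase(b)

  // Set membership is always an exact match, so a name that differs only in
  // case is dropped regardless of the session's case-sensitivity setting.
  def keepExact(merged: Seq[String], dataNames: Set[String]): Seq[String] =
    merged.filter(dataNames.contains)

  // Resolver-aware variant: keep a name if any data-schema name resolves to it.
  def keepResolved(merged: Seq[String], dataNames: Set[String], resolver: Resolver): Seq[String] =
    merged.filter(m => dataNames.exists(d => resolver(d, m)))
}
```

For merged names `NAME, id` and data-schema names `name, id`, `keepExact` keeps only `id`, while `keepResolved` with `caseInsensitive` keeps both.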






[GitHub] [spark] SparkQA commented on pull request #32059: [SPARK-34963][SQL] Fix nested column pruning for extracting case-insensitive struct field from array of struct

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #32059:
URL: https://github.com/apache/spark/pull/32059#issuecomment-814589145








[GitHub] [spark] dongjoon-hyun commented on a change in pull request #32059: [SPARK-34963][SQL] Fix nested column pruning for extracting case-insensitive struct field from array of struct

Posted by GitBox <gi...@apache.org>.
dongjoon-hyun commented on a change in pull request #32059:
URL: https://github.com/apache/spark/pull/32059#discussion_r608243907



##########
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/ProjectionOverSchema.scala
##########
@@ -41,9 +43,10 @@ case class ProjectionOverSchema(schema: StructType) {
       case a: GetArrayStructFields =>
         getProjection(a.child).map(p => (p, p.dataType)).map {
           case (projection, ArrayType(projSchema @ StructType(_), _)) =>
+            val selectedField = projSchema.find(f => resolver(f.name, a.field.name)).get

Review comment:
       It seems that we are not doing this for the `struct` type. To allow this for `array of struct`, we would need it for `struct` first, at line 66.






[GitHub] [spark] SparkQA commented on pull request #32059: [SPARK-34963][SQL] Fix nested column pruning for extracting case-insensitive struct field from array of struct

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #32059:
URL: https://github.com/apache/spark/pull/32059#issuecomment-816273839


   **[Test build #137093 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/137093/testReport)** for PR 32059 at commit [`ea17366`](https://github.com/apache/spark/commit/ea17366a9d3cddebdfa671d38ed1e46e755de668).
    * This patch passes all tests.
    * This patch merges cleanly.
    * This patch adds no public classes.




[GitHub] [spark] AmplabJenkins removed a comment on pull request #32059: [SPARK-34963][SQL] Fix nested column pruning for extracting case-insensitive struct field from array of struct

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #32059:
URL: https://github.com/apache/spark/pull/32059#issuecomment-816294153


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/137093/
   




[GitHub] [spark] AmplabJenkins removed a comment on pull request #32059: [SPARK-34963][SQL] Fix nested column pruning for extracting case-insensitive struct field from array of struct

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #32059:
URL: https://github.com/apache/spark/pull/32059#issuecomment-816297279


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/137095/
   




[GitHub] [spark] dongjoon-hyun commented on a change in pull request #32059: [SPARK-34963][SQL] Fix nested column pruning for extracting case-insensitive struct field from array of struct

Posted by GitBox <gi...@apache.org>.
dongjoon-hyun commented on a change in pull request #32059:
URL: https://github.com/apache/spark/pull/32059#discussion_r608244497



##########
File path: sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/SchemaPruningSuite.scala
##########
@@ -774,4 +774,17 @@ abstract class SchemaPruningSuite
         assert(scanSchema === expectedScanSchema)
     }
   }
+
+  testSchemaPruning("extract case-insensitive struct field from array") {

Review comment:
       Do we need test coverage for `extract case-insensitive struct field` too?






[GitHub] [spark] SparkQA commented on pull request #32059: [SPARK-34963][SQL] Fix nested column pruning for extracting case-insensitive struct field from array of struct

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #32059:
URL: https://github.com/apache/spark/pull/32059#issuecomment-814654543


   **[Test build #136983 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/136983/testReport)** for PR 32059 at commit [`15bdcd6`](https://github.com/apache/spark/commit/15bdcd6cb6c0147473ac3335350a190e14837110).
    * This patch **fails Spark unit tests**.
    * This patch merges cleanly.
    * This patch adds no public classes.




[GitHub] [spark] AmplabJenkins removed a comment on pull request #32059: [SPARK-34963][SQL] Fix nested column pruning for extracting case-insensitive struct field from array of struct

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #32059:
URL: https://github.com/apache/spark/pull/32059#issuecomment-814066364


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/136935/
   




[GitHub] [spark] SparkQA commented on pull request #32059: [SPARK-34963][SQL] Fix nested column pruning for extracting case-insensitive struct field from array of struct

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #32059:
URL: https://github.com/apache/spark/pull/32059#issuecomment-816088670








[GitHub] [spark] SparkQA commented on pull request #32059: [SPARK-34963][SQL] Fix nested column pruning for extracting case-insensitive struct field from array of struct

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #32059:
URL: https://github.com/apache/spark/pull/32059#issuecomment-816035909


   **[Test build #137095 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/137095/testReport)** for PR 32059 at commit [`9005055`](https://github.com/apache/spark/commit/9005055df2f971a7d6ee909baaa0366c5e3b6683).




[GitHub] [spark] AmplabJenkins commented on pull request #32059: [SPARK-34963][SQL] Fix nested column pruning for extracting case-insensitive struct field from array of struct

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #32059:
URL: https://github.com/apache/spark/pull/32059#issuecomment-816088746


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/41673/
   




[GitHub] [spark] SparkQA commented on pull request #32059: [SPARK-34963][SQL] Fix nested column pruning for extracting case-insensitive struct field from array of struct

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #32059:
URL: https://github.com/apache/spark/pull/32059#issuecomment-813911550








[GitHub] [spark] AmplabJenkins removed a comment on pull request #32059: [SPARK-34963][SQL] Fix nested column pruning for extracting case-insensitive struct field from array of struct

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #32059:
URL: https://github.com/apache/spark/pull/32059#issuecomment-813918550


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/41512/
   




[GitHub] [spark] SparkQA removed a comment on pull request #32059: [SPARK-34963][SQL] Fix nested column pruning for extracting case-insensitive struct field from array of struct

Posted by GitBox <gi...@apache.org>.
SparkQA removed a comment on pull request #32059:
URL: https://github.com/apache/spark/pull/32059#issuecomment-815997970


   **[Test build #137093 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/137093/testReport)** for PR 32059 at commit [`ea17366`](https://github.com/apache/spark/commit/ea17366a9d3cddebdfa671d38ed1e46e755de668).




[GitHub] [spark] AmplabJenkins removed a comment on pull request #32059: [SPARK-34963][SQL] Fix nested column pruning for extracting case-insensitive struct field from array of struct

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #32059:
URL: https://github.com/apache/spark/pull/32059#issuecomment-814714377


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/136984/
   




[GitHub] [spark] dongjoon-hyun commented on a change in pull request #32059: [SPARK-34963][SQL] Fix nested column pruning for extracting case-insensitive struct field from array of struct

Posted by GitBox <gi...@apache.org>.
dongjoon-hyun commented on a change in pull request #32059:
URL: https://github.com/apache/spark/pull/32059#discussion_r608243907



##########
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/ProjectionOverSchema.scala
##########
@@ -41,9 +43,10 @@ case class ProjectionOverSchema(schema: StructType) {
       case a: GetArrayStructFields =>
         getProjection(a.child).map(p => (p, p.dataType)).map {
           case (projection, ArrayType(projSchema @ StructType(_), _)) =>
+            val selectedField = projSchema.find(f => resolver(f.name, a.field.name)).get

Review comment:
       It seems that we are not doing this for the `struct` type. To allow this for `array of struct`, we may need it for `struct` first, at line 66.
   ```
   GetStructField(projection, projSchema.fieldIndex(field.name))
   ```






[GitHub] [spark] SparkQA removed a comment on pull request #32059: [SPARK-34963][SQL] Fix nested column pruning for extracting case-insensitive struct field from array of struct

Posted by GitBox <gi...@apache.org>.
SparkQA removed a comment on pull request #32059:
URL: https://github.com/apache/spark/pull/32059#issuecomment-816035909


   **[Test build #137095 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/137095/testReport)** for PR 32059 at commit [`9005055`](https://github.com/apache/spark/commit/9005055df2f971a7d6ee909baaa0366c5e3b6683).




[GitHub] [spark] viirya closed pull request #32059: [SPARK-34963][SQL] Fix nested column pruning for extracting case-insensitive struct field from array of struct

Posted by GitBox <gi...@apache.org>.
viirya closed pull request #32059:
URL: https://github.com/apache/spark/pull/32059


   




[GitHub] [spark] AmplabJenkins removed a comment on pull request #32059: [SPARK-34963][SQL] Fix nested column pruning for extracting case-insensitive struct field from array of struct

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #32059:
URL: https://github.com/apache/spark/pull/32059#issuecomment-814604213


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/41562/
   




[GitHub] [spark] AmplabJenkins removed a comment on pull request #32059: [SPARK-34963][SQL] Fix nested column pruning for extracting case-insensitive struct field from array of struct

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #32059:
URL: https://github.com/apache/spark/pull/32059#issuecomment-814589170


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/41560/
   




[GitHub] [spark] SparkQA commented on pull request #32059: [SPARK-34963][SQL] Fix nested column pruning for extracting case-insensitive struct field from array of struct

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #32059:
URL: https://github.com/apache/spark/pull/32059#issuecomment-815997970


   **[Test build #137093 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/137093/testReport)** for PR 32059 at commit [`ea17366`](https://github.com/apache/spark/commit/ea17366a9d3cddebdfa671d38ed1e46e755de668).




[GitHub] [spark] AmplabJenkins commented on pull request #32059: [SPARK-34963][SQL] Fix nested column pruning for extracting case-insensitive struct field from array of struct

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #32059:
URL: https://github.com/apache/spark/pull/32059#issuecomment-814604213


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/41562/
   




[GitHub] [spark] SparkQA commented on pull request #32059: [SPARK-34963][SQL] Fix nested column pruning for extracting case-insensitive struct field from array of struct

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #32059:
URL: https://github.com/apache/spark/pull/32059#issuecomment-816044534


   Kubernetes integration test unable to build dist.
   
   exiting with code: 1
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/41671/
   




[GitHub] [spark] AmplabJenkins commented on pull request #32059: [SPARK-34963][SQL] Fix nested column pruning for extracting case-insensitive struct field from array of struct

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #32059:
URL: https://github.com/apache/spark/pull/32059#issuecomment-814662895


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/136983/
   




[GitHub] [spark] maropu commented on a change in pull request #32059: [SPARK-34963][SQL] Fix nested column pruning for extracting case-insensitive struct field from array of struct

Posted by GitBox <gi...@apache.org>.
maropu commented on a change in pull request #32059:
URL: https://github.com/apache/spark/pull/32059#discussion_r610651237



##########
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/ProjectionOverSchema.scala
##########
@@ -41,9 +41,14 @@ case class ProjectionOverSchema(schema: StructType) {
       case a: GetArrayStructFields =>
         getProjection(a.child).map(p => (p, p.dataType)).map {
           case (projection, ArrayType(projSchema @ StructType(_), _)) =>
+            // For case-sensitivity aware field resolution, we should take `ordinal` which

Review comment:
       How about leaving your comment `ExtractValue actually does column name resolving correctly` here, too?






[GitHub] [spark] SparkQA removed a comment on pull request #32059: [SPARK-34963][SQL] Fix nested column pruning for extracting case-insensitive struct field from array of struct

Posted by GitBox <gi...@apache.org>.
SparkQA removed a comment on pull request #32059:
URL: https://github.com/apache/spark/pull/32059#issuecomment-814572017


   **[Test build #136984 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/136984/testReport)** for PR 32059 at commit [`c335304`](https://github.com/apache/spark/commit/c33530448016aa3ff7e785b9613a23bb89c93059).




[GitHub] [spark] AmplabJenkins commented on pull request #32059: [SPARK-34963][SQL] Fix nested column pruning for extracting case-insensitive struct field from array of struct

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #32059:
URL: https://github.com/apache/spark/pull/32059#issuecomment-816297279


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/137095/
   




[GitHub] [spark] SparkQA removed a comment on pull request #32059: [SPARK-34963][SQL] Fix nested column pruning for extracting case-insensitive struct field from array of struct

Posted by GitBox <gi...@apache.org>.
SparkQA removed a comment on pull request #32059:
URL: https://github.com/apache/spark/pull/32059#issuecomment-814571064


   **[Test build #136983 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/136983/testReport)** for PR 32059 at commit [`15bdcd6`](https://github.com/apache/spark/commit/15bdcd6cb6c0147473ac3335350a190e14837110).




[GitHub] [spark] SparkQA commented on pull request #32059: [SPARK-34963][SQL] Fix nested column pruning for extracting case-insensitive struct field from array of struct

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #32059:
URL: https://github.com/apache/spark/pull/32059#issuecomment-816296445


   **[Test build #137095 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/137095/testReport)** for PR 32059 at commit [`9005055`](https://github.com/apache/spark/commit/9005055df2f971a7d6ee909baaa0366c5e3b6683).
    * This patch passes all tests.
    * This patch merges cleanly.
    * This patch adds no public classes.




[GitHub] [spark] AmplabJenkins commented on pull request #32059: [SPARK-34963][SQL] Fix nested column pruning for extracting case-insensitive struct field from array of struct

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #32059:
URL: https://github.com/apache/spark/pull/32059#issuecomment-814714377


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/136984/
   




[GitHub] [spark] AmplabJenkins commented on pull request #32059: [SPARK-34963][SQL] Fix nested column pruning for extracting case-insensitive struct field from array of struct

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #32059:
URL: https://github.com/apache/spark/pull/32059#issuecomment-816044574


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/41671/
   




[GitHub] [spark] AmplabJenkins commented on pull request #32059: [SPARK-34963][SQL] Fix nested column pruning for extracting case-insensitive struct field from array of struct

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #32059:
URL: https://github.com/apache/spark/pull/32059#issuecomment-814066364


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/136935/
   




[GitHub] [spark] dongjoon-hyun commented on a change in pull request #32059: [SPARK-34963][SQL] Fix nested column pruning for extracting case-insensitive struct field from array of struct

Posted by GitBox <gi...@apache.org>.
dongjoon-hyun commented on a change in pull request #32059:
URL: https://github.com/apache/spark/pull/32059#discussion_r608235728



##########
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/SchemaPruning.scala
##########
@@ -48,7 +49,8 @@ object SchemaPruning {
    * right, recursively. That is, left is a "subschema" of right, ignoring order of
    * fields.
    */
-  private def sortLeftFieldsByRight(left: DataType, right: DataType): DataType =
+  private def sortLeftFieldsByRight(left: DataType, right: DataType): DataType = {

Review comment:
       The construction of `mergedDataSchema` at line 39 also appears to be case-sensitive, doesn't it?
   > StructType(mergedSchema.filter(f => dataSchemaFieldNames.contains(f.name)))
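
   To make the concern concrete, here is a minimal sketch that does not depend on Spark. `Field` is a hypothetical stand-in for `StructField`, and the sample names (`id`, `NAME`, `name`) are assumptions chosen for illustration, not values taken from the PR. It shows that an exact-match `contains` on the field name drops a field whose name differs only in case, while a comparison with `equalsIgnoreCase` keeps it. Whether the real code path can actually see differently-cased names at this point is not established by this sketch.

   ```scala
   object CaseSensitivityDemo {
     // Hypothetical stand-in for Spark's StructField; only the name matters here.
     final case class Field(name: String)

     // Assumed sample: fields after merging requested schemas, where a query
     // spelled one column as "NAME".
     val mergedSchema: Seq[Field] = Seq(Field("id"), Field("NAME"))

     // Assumed sample: field names as declared in the data schema.
     val dataSchemaFieldNames: Set[String] = Set("id", "name")

     // Same shape as the quoted filter: exact string match, so "NAME" is dropped.
     val caseSensitive: Seq[Field] =
       mergedSchema.filter(f => dataSchemaFieldNames.contains(f.name))

     // Case-insensitive variant: "NAME" matches "name" and is kept.
     val caseInsensitive: Seq[Field] =
       mergedSchema.filter(f => dataSchemaFieldNames.exists(_.equalsIgnoreCase(f.name)))

     def main(args: Array[String]): Unit = {
       println(caseSensitive.map(_.name))
       println(caseInsensitive.map(_.name))
     }
   }
   ```

   In a real fix the comparison would presumably follow the session's case-sensitivity setting rather than always ignoring case; the unconditional `equalsIgnoreCase` above is only for illustration.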






[GitHub] [spark] SparkQA commented on pull request #32059: [SPARK-34963][SQL] Fix nested column pruning for extracting case-insensitive struct field from array of struct

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #32059:
URL: https://github.com/apache/spark/pull/32059#issuecomment-814712382


   **[Test build #136984 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/136984/testReport)** for PR 32059 at commit [`c335304`](https://github.com/apache/spark/commit/c33530448016aa3ff7e785b9613a23bb89c93059).
    * This patch **fails Spark unit tests**.
    * This patch merges cleanly.
    * This patch adds no public classes.

