You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@spark.apache.org by GitBox <gi...@apache.org> on 2022/06/15 19:50:32 UTC

[GitHub] [spark] edgarRd opened a new pull request, #36881: [SPARK-39484][SQL] Fix field name case sensitivity for struct type in V2WriteCommand.outputResolved

edgarRd opened a new pull request, #36881:
URL: https://github.com/apache/spark/pull/36881

   <!--
   Thanks for sending a pull request!  Here are some tips for you:
     1. If this is your first time, please read our contributor guidelines: https://spark.apache.org/contributing.html
     2. Ensure you have added or run the appropriate tests for your PR: https://spark.apache.org/developer-tools.html
     3. If the PR is unfinished, add '[WIP]' in your PR title, e.g., '[WIP][SPARK-XXXX] Your PR title ...'.
     4. Be sure to keep the PR description updated to reflect all changes.
     5. Please write your PR title to summarize what this PR proposes.
     6. If possible, provide a concise example to reproduce the issue for a faster review.
     7. If you want to add a new configuration, please read the guideline first for naming configurations in
        'core/src/main/scala/org/apache/spark/internal/config/ConfigEntry.scala'.
     8. If you want to add or modify an error type or message, please read the guideline first in
        'core/src/main/resources/error/README.md'.
   -->
   
   ### What changes were proposed in this pull request?
   
   When a V2 write uses an input with a struct type which contains differences in the casing of field names, the `caseSensitive` config is not being honored, always doing a strict case sensitive comparison.
    
   This PR adds the resolver to the private method `DataType#equalsIgnoreCompatibleNullabilityWithNameResolution`, renamed from `DataType#equalsIgnoreCompatibleNullability` and uses that resolver to perform the field name comparison. The `config.resolver` is passed in the invocation via `V2WriteCommand.outputResolved`.
   
   ### Why are the changes needed?
   I think the same case sensitivity configuration should be honored for field names, therefore this would be a bug.
   
   Repro:
   ```
   CREATE TABLE tmp.test_table_to (key int, object struct<shardId:int>) USING ICEBERG;
   CREATE TABLE tmp.test_table_from (key int, object struct<shardid:int>) USING HIVE;
   INSERT OVERWRITE tmp.test_table_to SELECT 1 as key, object FROM tmp.test_table_from;
   ```
   
   The above results in Exception:
   ```
   Error in query: unresolved operator 'OverwriteByExpression RelationV2[key#3, object#4] spark_catalog.tmp.test_table_to, true, false;
   'OverwriteByExpression RelationV2[key#3, object#4] spark_catalog.tmp.test_table_to, true, false
   +- Project [1 AS key#0, object#2]
      +- SubqueryAlias spark_catalog.tmp.test_table_from
         +- HiveTableRelation [`tmp`.`test_table_from`, org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe, Data Cols: [key#1, object#2], Partition Cols: []]
   ```
   
   ### Does this PR introduce _any_ user-facing change?
   Yes, previously failing v2 write queries with mismatching casing in struct field names, now would respect the `caseSensitive` config, and if `caseSensitive = false` these queries would now work, as compared to before when they would always fail regardless of the `caseSensitive` config value.
   
   
   ### How was this patch tested?
   
   - Added unit test
   - Tested manually in distribution


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] edgarRd commented on pull request #36881: [SPARK-39484][SQL] Fix field name case sensitivity for struct type in V2WriteCommand.outputResolved

Posted by GitBox <gi...@apache.org>.
edgarRd commented on PR #36881:
URL: https://github.com/apache/spark/pull/36881#issuecomment-1161909366

   PTAL @cloud-fan @dongjoon-hyun - Thanks!


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] cloud-fan commented on a diff in pull request #36881: [SPARK-39484][SQL] Fix field name case sensitivity for struct type in V2WriteCommand.outputResolved

Posted by GitBox <gi...@apache.org>.
cloud-fan commented on code in PR #36881:
URL: https://github.com/apache/spark/pull/36881#discussion_r903236179


##########
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/v2Commands.scala:
##########
@@ -53,7 +53,7 @@ trait V2WriteCommand extends UnaryCommand {
           val outType = CharVarcharUtils.getRawType(outAttr.metadata).getOrElse(outAttr.dataType)
           // names and types must match, nullability must be compatible
           inAttr.name == outAttr.name &&

Review Comment:
   It's actually more complicated than it looks. We have a rule `ResolveOutputRelation` to do the by-name resolution for top-level columns. Case sensitivity is not the only inconsistency between top-level columns and inner fields. We should update `ResolveOutputRelation` to take care of inner fields as well.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins commented on pull request #36881: [SPARK-39484][SQL] Fix field name case sensitivity for struct type in V2WriteCommand.outputResolved

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on PR #36881:
URL: https://github.com/apache/spark/pull/36881#issuecomment-1157399016

   Can one of the admins verify this patch?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] github-actions[bot] commented on pull request #36881: [SPARK-39484][SQL] Fix field name case sensitivity for struct type in V2WriteCommand.outputResolved

Posted by GitBox <gi...@apache.org>.
github-actions[bot] commented on PR #36881:
URL: https://github.com/apache/spark/pull/36881#issuecomment-1264162468

   We're closing this PR because it hasn't been updated in a while. This isn't a judgement on the merit of the PR in any way. It's just a way of keeping the PR queue manageable.
   If you'd like to revive this PR, please reopen it and ask a committer to remove the Stale tag!


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] github-actions[bot] closed pull request #36881: [SPARK-39484][SQL] Fix field name case sensitivity for struct type in V2WriteCommand.outputResolved

Posted by GitBox <gi...@apache.org>.
github-actions[bot] closed pull request #36881: [SPARK-39484][SQL] Fix field name case sensitivity for struct type in V2WriteCommand.outputResolved
URL: https://github.com/apache/spark/pull/36881


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] dongjoon-hyun commented on a diff in pull request #36881: [SPARK-39484][SQL] Fix field name case sensitivity for struct type in V2WriteCommand.outputResolved

Posted by GitBox <gi...@apache.org>.
dongjoon-hyun commented on code in PR #36881:
URL: https://github.com/apache/spark/pull/36881#discussion_r903184211


##########
sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/analysis/V2WriteAnalysisSuite.scala:
##########
@@ -238,12 +238,28 @@ abstract class V2WriteAnalysisSuiteBase extends AnalysisTest {
   def byPosition(table: NamedRelation, query: LogicalPlan): LogicalPlan
 
   test("SPARK-33136: output resolved on complex types for V2 write commands") {

Review Comment:
   Thank you for pinging me.
   If you don't mind, please add a new test case instead of changing the existing test case, @edgarRd .



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] cloud-fan commented on a diff in pull request #36881: [SPARK-39484][SQL] Fix field name case sensitivity for struct type in V2WriteCommand.outputResolved

Posted by GitBox <gi...@apache.org>.
cloud-fan commented on code in PR #36881:
URL: https://github.com/apache/spark/pull/36881#discussion_r903234801


##########
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/v2Commands.scala:
##########
@@ -53,7 +53,7 @@ trait V2WriteCommand extends UnaryCommand {
           val outType = CharVarcharUtils.getRawType(outAttr.metadata).getOrElse(outAttr.dataType)
           // names and types must match, nullability must be compatible
           inAttr.name == outAttr.name &&

Review Comment:
   shall we be consistent and also respect case sensitivity here?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org