Posted to commits@spark.apache.org by ma...@apache.org on 2022/11/23 06:14:11 UTC

[spark] branch master updated: [SPARK-41206][SQL][FOLLOWUP] Make result of `checkColumnNameDuplication` stable to fix `COLUMN_ALREADY_EXISTS` check failed with Scala 2.13

This is an automated email from the ASF dual-hosted git repository.

maxgekk pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
     new e42d3836af9 [SPARK-41206][SQL][FOLLOWUP] Make result of `checkColumnNameDuplication` stable to fix `COLUMN_ALREADY_EXISTS` check failed with Scala 2.13
e42d3836af9 is described below

commit e42d3836af9eea881868c80f3c2cbc29e1d7b4f1
Author: yangjie01 <ya...@baidu.com>
AuthorDate: Wed Nov 23 09:13:56 2022 +0300

    [SPARK-41206][SQL][FOLLOWUP] Make result of `checkColumnNameDuplication` stable to fix `COLUMN_ALREADY_EXISTS` check failed with Scala 2.13
    
    ### What changes were proposed in this pull request?
    This PR adds a sort before `columnAlreadyExistsError` is thrown, making the result of `SchemaUtils#checkColumnNameDuplication` stable.
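    
    As a rough illustration (a standalone sketch, not the Spark source; the sample column names are invented), sorting the grouped entries by key before `collectFirst` removes the dependence on map iteration order:
    
    ```
    // Order-dependent: `groupBy` returns a Map whose iteration order is not
    // guaranteed, so this may report either "a" or "b" as the duplicate.
    val names = Seq("a", "b", "a", "b")
    val unstable = names.groupBy(identity).collectFirst {
      case (x, ys) if ys.length > 1 => x
    }
    
    // Deterministic: sorting by key first always reports "a", the smallest
    // duplicated name.
    val stable = names.groupBy(identity).toSeq.sortBy(_._1).collectFirst {
      case (x, ys) if ys.length > 1 => x
    }
    ```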
    
    ### Why are the changes needed?
    Fix the `COLUMN_ALREADY_EXISTS` check failures with Scala 2.13: the unsorted `groupBy` result made the duplicate column reported by `collectFirst` non-deterministic, so the check could fail depending on map iteration order.
    
    ### Does this PR introduce _any_ user-facing change?
    No
    
    ### How was this patch tested?
    
    - Pass GA (GitHub Actions)
    - Manual test:
    
    ```
    dev/change-scala-version.sh 2.13
    build/sbt clean "sql/testOnly org.apache.spark.sql.DataFrameSuite" -Pscala-2.13
    build/sbt  "sql/testOnly org.apache.spark.sql.execution.datasources.json.JsonV1Suite" -Pscala-2.13
    build/sbt  "sql/testOnly org.apache.spark.sql.execution.datasources.json.JsonV2Suite" -Pscala-2.13
    build/sbt  "sql/testOnly org.apache.spark.sql.execution.datasources.json.JsonLegacyTimeParserSuite" -Pscala-2.13
    ```
    All tests passed
    
    Closes #38764 from LuciferYang/SPARK-41206.
    
    Authored-by: yangjie01 <ya...@baidu.com>
    Signed-off-by: Max Gekk <ma...@gmail.com>
---
 sql/catalyst/src/main/scala/org/apache/spark/sql/util/SchemaUtils.scala | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/util/SchemaUtils.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/util/SchemaUtils.scala
index aac96a9b56c..d202900381a 100644
--- a/sql/catalyst/src/main/scala/org/apache/spark/sql/util/SchemaUtils.scala
+++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/util/SchemaUtils.scala
@@ -107,7 +107,7 @@ private[spark] object SchemaUtils {
     val names = if (caseSensitiveAnalysis) columnNames else columnNames.map(_.toLowerCase)
     // scalastyle:on caselocale
     if (names.distinct.length != names.length) {
-      val columnName = names.groupBy(identity).collectFirst {
+      val columnName = names.groupBy(identity).toSeq.sortBy(_._1).collectFirst {
         case (x, ys) if ys.length > 1 => x
       }.get
       throw QueryCompilationErrors.columnAlreadyExistsError(columnName)


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@spark.apache.org
For additional commands, e-mail: commits-help@spark.apache.org