You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@spark.apache.org by "viirya (via GitHub)" <gi...@apache.org> on 2023/03/09 05:56:18 UTC

[GitHub] [spark] viirya commented on a diff in pull request #40300: [SPARK-42683] Automatically rename conflicting metadata columns

viirya commented on code in PR #40300:
URL: https://github.com/apache/spark/pull/40300#discussion_r1130515459


##########
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/LogicalPlan.scala:
##########
@@ -340,13 +358,26 @@ trait ExposesMetadataColumns extends LogicalPlan {
       val resolve = conf.resolver
       val outputNames = outputSet.map(_.name)
 
-      def isOutputColumn(col: AttributeReference): Boolean = {
-        outputNames.exists(name => resolve(col.name, name))
+      // Generate a unique name by prepending underscores.
+      @scala.annotation.tailrec
+      def makeUnique(name: String): String = name match {
+        case name if outputNames.exists(resolve(_, name)) => makeUnique(s"_$name")
+        case name => name
+      }
+
+      // Rename metadata struct columns whose names conflict with output columns.

Review Comment:
   Looks like `getProjection` wrongly matches attribute. It tries to get pruned datatype from `schema` by looking with field name. `StructType` maintains the mapping between (field name -> StructField), not (expr id -> StructField). If there are duplicate columns, it possibly picks incorrect attribute.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org