Posted to issues@spark.apache.org by "Daniel Davies (Jira)" <ji...@apache.org> on 2022/02/12 23:08:00 UTC

[jira] [Created] (SPARK-38193) [Spark Core] [Feature] change of unionByName parameter

Daniel Davies created SPARK-38193:
-------------------------------------

             Summary: [Spark Core] [Feature] change of unionByName parameter
                 Key: SPARK-38193
                 URL: https://issues.apache.org/jira/browse/SPARK-38193
             Project: Spark
          Issue Type: New Feature
          Components: Spark Core
    Affects Versions: 3.2.1
            Reporter: Daniel Davies


Hello,

I have a quick question about the unionByName function. It currently accepts a boolean parameter, "allowMissingColumns", that gives some tolerance when merging datasets with different schemas [here|https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/Dataset.scala#L2170]. The implementation is a bit restrictive, though: because the second parameter is a boolean, the only alternative to requiring matching columns is to have unionByName add all columns from both dataframes.

We have other use cases in our workflows, for example keeping only the columns whose names appear in both dataframes, and I assume other users will have different merge strategies in mind as well. Does it seem reasonable to extend the parameter from "allowMissingColumns" to a string-typed "mode" parameter natively in Spark?

If so, I'm happy to make a PR to achieve this. The change would involve amending the [ResolveUnion.scala|https://github.com/apache/spark/blob/master/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/ResolveUnion.scala] utility to make it more flexible in merging columns; to a user it would look a lot more like the 'join' operator, where a join strategy is selected.
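
For illustration, the column selection that such a "mode" parameter might drive can be sketched in plain Python. This is not Spark code and not an existing Spark API: the function name resolve_union_columns and the mode names "strict", "outer" and "inner" are hypothetical, chosen only to mirror the existing boolean behaviour plus the "columns in both dataframes" case described above. It also ignores case sensitivity and column types, which a real change to ResolveUnion.scala would need to handle.

```python
from typing import List


def resolve_union_columns(left: List[str], right: List[str], mode: str = "strict") -> List[str]:
    """Return the output column names of a by-name union under a hypothetical mode.

    Assumed modes (illustrative names only, not part of Spark):
      "strict": both sides must have the same column names
                (comparable to allowMissingColumns=false)
      "outer":  keep every column from either side
                (comparable to allowMissingColumns=true)
      "inner":  keep only the columns present on both sides
    """
    left_set, right_set = set(left), set(right)

    if mode == "strict":
        if left_set != right_set:
            missing = sorted(left_set ^ right_set)
            raise ValueError(f"columns not present on both sides: {missing}")
        return list(left)

    if mode == "outer":
        # Left-hand columns keep their order; right-only columns are appended.
        return list(left) + [c for c in right if c not in left_set]

    if mode == "inner":
        # Only the shared columns survive, in left-hand order.
        return [c for c in left if c in right_set]

    raise ValueError(f"unknown mode: {mode!r}")


if __name__ == "__main__":
    left_cols = ["id", "name", "age"]
    right_cols = ["id", "age", "country"]
    print(resolve_union_columns(left_cols, right_cols, "outer"))  # ['id', 'name', 'age', 'country']
    print(resolve_union_columns(left_cols, right_cols, "inner"))  # ['id', 'age']
```

Under this sketch, "outer" would fill the columns missing on either side with nulls, as the current boolean option does, while "inner" would simply project both sides onto the shared columns before the union.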

I've posted this question on the dev mailing list also; happy to continue the conversation there if that is preferable.

 


