Posted to issues@spark.apache.org by "Nicholas Chammas (Jira)" <ji...@apache.org> on 2021/11/02 14:42:00 UTC
[jira] [Comment Edited] (SPARK-24853) Support Column type for withColumn and withColumnRenamed apis
[ https://issues.apache.org/jira/browse/SPARK-24853?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17437394#comment-17437394 ]
Nicholas Chammas edited comment on SPARK-24853 at 11/2/21, 2:41 PM:
--------------------------------------------------------------------
[~hyukjin.kwon] - It's not just for consistency. As noted in the description, this is useful when you are trying to rename a column with an ambiguous name.
For example, imagine two DataFrames, {{left_counts}} and {{right_counts}} (aliased below as {{left}} and {{right}}), each with a column called {{count}}:
{code:python}
(
    left_counts.alias('left')
    .join(right_counts.alias('right'), on='join_key')
    .withColumn(
        'total_count',
        left_counts['count'] + right_counts['count']
    )
    .withColumnRenamed('left.count', 'left_count')  # no-op; alias doesn't work
    .withColumnRenamed('count', 'left_count')  # incorrect; it renames both count columns
    .withColumnRenamed(left_counts['count'], 'left_count')  # what, ideally, users want to do here
    .show()
){code}
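The difference between the second and third calls can be sketched with a toy model in plain Python. This is not Spark's implementation; {{Col}}, {{rename_by_name}}, and {{rename_by_ref}} are hypothetical names made up for illustration, and the model only assumes that a joined result keeps track of which side each column came from.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Col:
    """Toy stand-in for a column: where it came from and its output name."""
    source: str
    name: str


def rename_by_name(cols, existing, new):
    # Name-based rename: every column whose name matches is renamed,
    # so duplicate names cannot be told apart.
    return [replace(c, name=new) if c.name == existing else c for c in cols]


def rename_by_ref(cols, target, new):
    # Reference-based rename (hypothetical): only the column equal to
    # the given reference is renamed.
    return [replace(c, name=new) if c == target else c for c in cols]


# Columns of a joined result with a duplicate 'count' name.
joined = [Col('left', 'join_key'), Col('left', 'count'), Col('right', 'count')]

by_name = rename_by_name(joined, 'count', 'left_count')
by_ref = rename_by_ref(joined, Col('left', 'count'), 'left_count')

print([c.name for c in by_name])  # ['join_key', 'left_count', 'left_count']
print([c.name for c in by_ref])   # ['join_key', 'left_count', 'count']
```

In this sketch, only the reference-based variant leaves the right-hand {{count}} column untouched, which is the behavior the requested overload is meant to provide.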
If you don't mind, I'm going to reopen this issue.
> Support Column type for withColumn and withColumnRenamed apis
> -------------------------------------------------------------
>
> Key: SPARK-24853
> URL: https://issues.apache.org/jira/browse/SPARK-24853
> Project: Spark
> Issue Type: Improvement
> Components: SQL
> Affects Versions: 2.2.2, 3.2.0
> Reporter: nirav patel
> Priority: Minor
>
> Can we add overloaded versions of withColumn and withColumnRenamed that accept a Column instead of a String? That way I can specify a fully qualified name (FQN) when there are duplicate column names. For example, if a join leaves me with two columns of the same name and I want to rename one of them, I could do it with this new API.
>
> This would be similar to the drop API, which supports both String and Column types.
>
> def withColumn(colName: Column, col: Column): DataFrame
> Returns a new Dataset by adding a column or replacing the existing column that has the same name.
>
> def withColumnRenamed(existingName: Column, newName: Column): DataFrame
> Returns a new Dataset with a column renamed.
>
>
>
> I think there should also be this one:
>
> def withColumnRenamed(existingName: *Column*, newName: *Column*): DataFrame
> Returns a new Dataset with a column renamed.
>