You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@spark.apache.org by "Nicholas Chammas (Jira)" <ji...@apache.org> on 2021/11/02 14:42:00 UTC

[jira] [Comment Edited] (SPARK-24853) Support Column type for withColumn and withColumnRenamed apis

    [ https://issues.apache.org/jira/browse/SPARK-24853?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17437394#comment-17437394 ] 

Nicholas Chammas edited comment on SPARK-24853 at 11/2/21, 2:41 PM:
--------------------------------------------------------------------

[~hyukjin.kwon] - It's not just for consistency. As noted in the description, this is useful when you are trying to rename a column with an ambiguous name.

For example, imagine two tables {{left}} and {{right}}, each with a column called {{count}}:
{code:python}
(
  left_counts.alias('left')
  .join(right_counts.alias('right'), on='join_key')
  .withColumn(
    'total_count',
    left_counts['count'] + right_counts['count']
  )
  .withColumnRenamed('left.count', 'left_count')  # no-op; alias doesn't work
  .withColumnRenamed('count', 'left_count')  # incorrect; it renames both count columns
  .withColumnRenamed(left_counts['count'], 'left_count')  # what, ideally, users want to do here
  .show()
){code}
If you don't mind, I'm going to reopen this issue.


was (Author: nchammas):
[~hyukjin.kwon] - It's not just for consistency. As noted in the description, this is useful when you are trying to rename a column with an ambiguous name.

For example, imagine two tables {{left}} and {{right}}, each with a column called {{count}}:
{code:java}
(
  left_counts.alias('left')
  .join(right_counts.alias('right'), on='join_key')
  .withColumn(
    'total_count',
    left_counts['count'] + right_counts['count']
  )
  .withColumnRenamed('left.count', 'left_count')  # no-op; alias doesn't work
  .withColumnRenamed('count', 'left_count')  # incorrect; it renames both count columns
  .withColumnRenamed(left_counts['count'], 'left_count')  # what, ideally, users want to do here
  .show()
){code}
If you don't mind, I'm going to reopen this issue.

> Support Column type for withColumn and withColumnRenamed apis
> -------------------------------------------------------------
>
>                 Key: SPARK-24853
>                 URL: https://issues.apache.org/jira/browse/SPARK-24853
>             Project: Spark
>          Issue Type: Improvement
>          Components: SQL
>    Affects Versions: 2.2.2, 3.2.0
>            Reporter: nirav patel
>            Priority: Minor
>
> Can we add overloaded version of withColumn or withColumnRenamed that accept Column type instead of String? That way I can specify FQN in case when there is duplicate column names. e.g. if I have 2 columns with same name as a result of join and I want to rename one of the field I can do it with this new API.
>  
> This would be similar to Drop api which supports both String and Column type.
>  
> def
> withColumn(colName: Column, col: Column): DataFrame
> Returns a new Dataset by adding a column or replacing the existing column that has the same name.
>  
> def
> withColumnRenamed(existingName: Column, newName: Column): DataFrame
> Returns a new Dataset with a column renamed.
>  
>  
>  
> I think there should also be this one:
>  
> def
> withColumnRenamed(existingName: *Column*, newName: *Column*): DataFrame
> Returns a new Dataset with a column renamed.
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@spark.apache.org
For additional commands, e-mail: issues-help@spark.apache.org