You are viewing a plain text version of this content. The canonical link for it is here.
Posted to user@spark.apache.org by Debabrata Ghosh <ma...@gmail.com> on 2017/10/18 06:17:57 UTC

Need help with String Concat Operation

Hi,
             I am having a dataframe column (name of the column is CTOFF)
and I intend to prefix with '0' in case the length of the column is 3.
Unfortunately, I am unable to acheive my goal and wonder whether you can
help me here.

Command which I am executing:

ctoff_dedup_prep_temp =
ctoff_df.withColumn('CTOFF_NEW',when(length(col('CTOFF')) ==
3,'0'+col('CTOFF')))
ctoff_dedup_prep_temp.show()

+------------+----------+----------+--------------------+---------+-----+---------+
|EVNT_SRVC_AR|EVNT_FCLTY|EVNT_TP_CD|        NTWRK_PRD_CD|
DY_OF_WK|CTOFF|CTOFF_NEW|
+------------+----------+----------+--------------------+---------+-----+---------+
|         HKG|       HKC|        AR|2,3,7,8,C,D,J,P,Q...|1,2,3,4,5|
1440|     null|
|         HKG|       HKC|        AR|             C,Q,T,Y|1,2,3,4,5|
730|    730.0|
|         HKG|       HKC|        AR|         E,K,C,Q,T,Y|1,2,3,4,5|
600|    600.0|
|         HKG|       HKC|        AR|         E,K,C,Q,T,Y|1,2,3,4,5|
900|    900.0|
+------------+----------+----------+--------------------+---------+-----+---------+

The result which I want is:
+------------+----------+----------+--------------------+---------+-----+---------+
|EVNT_SRVC_AR|EVNT_FCLTY|EVNT_TP_CD|        NTWRK_PRD_CD|
DY_OF_WK|CTOFF|CTOFF_NEW|
+------------+----------+----------+--------------------+---------+-----+---------+
|         HKG|       HKC|        AR|2,3,7,8,C,D,J,P,Q...|1,2,3,4,5|
1440|    1440|
|         HKG|       HKC|        AR|             C,Q,T,Y|1,2,3,4,5|
730|    0730|
|         HKG|       HKC|        AR|         E,K,C,Q,T,Y|1,2,3,4,5|
600|    0600|
|         HKG|       HKC|        AR|         E,K,C,Q,T,Y|1,2,3,4,5|
900|    0900|
+------------+----------+----------+--------------------+---------+-----+---------+

So I want the '0' to be prefixed but it's getting suffixed as '.0'. Any
clue around why is this happening

Thanks,

Debu

Re: Need help with String Concat Operation

Posted by 高佳翔 <ga...@gmail.com>.
Hi Debu,

First, Instead of using ‘+’, you can use ‘concat’ to concatenate string
columns. And you should enclose “0” with "lit()" to make it a column.
Second, 1440 become null because you didn’t tell spark what to do if the
when clause is failed. So it simply set the value to null. To fix this, you
should add “.otherwise()” right behind “when()”.

The code looks like this:

ctoff_df.withColumn("CTOFF_NEW",
  when(
    length(col("CTOFF")) ==3,
    concat(lit("0"), col("CTOFF"))
  ).otherwise(
    col("CTOFF")
  ))

Best,

JiaXiang
​

On Wed, Oct 18, 2017 at 2:17 PM, Debabrata Ghosh <ma...@gmail.com>
wrote:

> Hi,
>              I am having a dataframe column (name of the column is CTOFF)
> and I intend to prefix with '0' in case the length of the column is 3.
> Unfortunately, I am unable to acheive my goal and wonder whether you can
> help me here.
>
> Command which I am executing:
>
> ctoff_dedup_prep_temp = ctoff_df.withColumn('CTOFF_NEW',when(length(col('CTOFF'))
> == 3,'0'+col('CTOFF')))
> ctoff_dedup_prep_temp.show()
>
> +------------+----------+----------+--------------------+---
> ------+-----+---------+
> |EVNT_SRVC_AR|EVNT_FCLTY|EVNT_TP_CD|        NTWRK_PRD_CD|
> DY_OF_WK|CTOFF|CTOFF_NEW|
> +------------+----------+----------+--------------------+---
> ------+-----+---------+
> |         HKG|       HKC|        AR|2,3,7,8,C,D,J,P,Q...|1,2,3,4,5|
> 1440|     null|
> |         HKG|       HKC|        AR|             C,Q,T,Y|1,2,3,4,5|
> 730|    730.0|
> |         HKG|       HKC|        AR|         E,K,C,Q,T,Y|1,2,3,4,5|
> 600|    600.0|
> |         HKG|       HKC|        AR|         E,K,C,Q,T,Y|1,2,3,4,5|
> 900|    900.0|
> +------------+----------+----------+--------------------+---
> ------+-----+---------+
>
> The result which I want is:
> +------------+----------+----------+--------------------+---
> ------+-----+---------+
> |EVNT_SRVC_AR|EVNT_FCLTY|EVNT_TP_CD|        NTWRK_PRD_CD|
> DY_OF_WK|CTOFF|CTOFF_NEW|
> +------------+----------+----------+--------------------+---
> ------+-----+---------+
> |         HKG|       HKC|        AR|2,3,7,8,C,D,J,P,Q...|1,2,3,4,5|
> 1440|    1440|
> |         HKG|       HKC|        AR|             C,Q,T,Y|1,2,3,4,5|
> 730|    0730|
> |         HKG|       HKC|        AR|         E,K,C,Q,T,Y|1,2,3,4,5|
> 600|    0600|
> |         HKG|       HKC|        AR|         E,K,C,Q,T,Y|1,2,3,4,5|
> 900|    0900|
> +------------+----------+----------+--------------------+---
> ------+-----+---------+
>
> So I want the '0' to be prefixed but it's getting suffixed as '.0'. Any
> clue around why is this happening
>
> Thanks,
>
> Debu
>



-- 
Gao JiaXiang
Data Analyst, GCBI <http://www.gcbi.com.cn>