Posted to user@spark.apache.org by Junfeng Chen <da...@gmail.com> on 2018/03/28 07:16:07 UTC
[Spark Java] Add new column in DataSet based on existed column
I am working on adding a date-derived field to an existing dataset.
The current dataset contains a column named timestamp in ISO format. I want
to parse this field into a Joda-Time type, and then extract the year, month,
day, and hour info as new columns attached to the original dataset.
I have tried the df.withColumn function, but it seems to only support simple
expressions rather than customized functions like MapFunction.
How can I solve this?
Thanks!
Regards,
Junfeng Chen
Re: [Spark Java] Add new column in DataSet based on existed column
Posted by Divya Gehlot <di...@gmail.com>.
Hi,
Here is an example snippet in Scala:
// Convert to a Date type
val timestamp2datetype: (Column) => Column = (x) => { to_date(x) }
df = df.withColumn("date", timestamp2datetype(col("end_date")))
Hope this helps!
Thanks,
Divya
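A side note on the original question: the per-row extraction the poster describes can be written with java.time (the modern successor to Joda-Time) and then wrapped in a Spark UDF for use with withColumn. The sketch below shows just that extraction logic on its own, outside Spark, using a hypothetical sample timestamp; field names and the helper class are illustrative, not from the thread.

```java
import java.time.OffsetDateTime;
import java.time.format.DateTimeFormatter;

public class TimestampParts {
    // Parse an ISO-8601 timestamp and pull out year, month, day, and hour.
    // This is the kind of logic a Spark UDF body could contain; java.time
    // is used here in place of Joda-Time.
    public static int[] extract(String isoTimestamp) {
        OffsetDateTime t = OffsetDateTime.parse(
                isoTimestamp, DateTimeFormatter.ISO_OFFSET_DATE_TIME);
        return new int[] { t.getYear(), t.getMonthValue(),
                           t.getDayOfMonth(), t.getHour() };
    }

    public static void main(String[] args) {
        // Hypothetical sample value; any ISO-8601 string with an offset works.
        int[] parts = extract("2018-03-28T07:16:07+00:00");
        System.out.println(parts[0] + " " + parts[1] + " "
                + parts[2] + " " + parts[3]);
        // prints: 2018 3 28 7
    }
}
```

Each resulting value could then be attached as its own column via a registered UDF and withColumn, one call per field.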
On 28 March 2018 at 15:16, Junfeng Chen <da...@gmail.com> wrote:
> I am working on adding a date-derived field to an existing dataset.
>
> The current dataset contains a column named timestamp in ISO format. I
> want to parse this field into a Joda-Time type, and then extract the year,
> month, day, and hour info as new columns attached to the original dataset.
> I have tried the df.withColumn function, but it seems to only support
> simple expressions rather than customized functions like MapFunction.
> How can I solve this?
>
> Thanks!
>
>
>
> Regards,
> Junfeng Chen
>