Posted to user@spark.apache.org by Junfeng Chen <da...@gmail.com> on 2018/03/28 07:16:07 UTC

[Spark Java] Add new column in DataSet based on existed column

I am working on adding a date-derived field to an existing dataset.

The current dataset contains a column named timestamp in ISO format. I want
to parse this field into a Joda time type and then extract the year, month,
day, and hour as new columns attached to the original dataset.
I have tried the df.withColumn function, but it seems to support only simple
expressions rather than customized functions like MapFunction.
How can I solve this?
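
[Editor's note: withColumn is in fact not limited to built-in expressions; a custom function can be wrapped with functions.udf and then used inside withColumn. A minimal sketch in Scala (the reply below uses Scala), where the padding helper and the "hour_str" column name are hypothetical, not from the thread:

    import org.apache.spark.sql.functions.{col, udf}

    // Wrap arbitrary custom logic as a UDF so withColumn can apply it.
    // Hypothetical example: pull the two-digit hour out of an ISO-8601
    // timestamp string such as "2018-03-28T07:16:07".
    val hourString = udf { (ts: String) => ts.substring(11, 13) }

    val withHour = df.withColumn("hour_str", hourString(col("timestamp")))

The same pattern is available from Java through org.apache.spark.sql.functions.udf together with the UDF1 interface.]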

Thanks!



Regards,
Junfeng Chen

Re: [Spark Java] Add new column in DataSet based on existed column

Posted by Divya Gehlot <di...@gmail.com>.
Hi,

Here is an example snippet in Scala:

    // Convert to a Date type
    import org.apache.spark.sql.Column
    import org.apache.spark.sql.functions.{col, to_date}

    val timestamp2datetype: (Column) => Column = (x) => { to_date(x) }
    df = df.withColumn("date", timestamp2datetype(col("end_date")))
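
[Editor's note: for the year/month/day/hour columns the question asks about, Spark's built-in column functions cover the whole task without Joda or a custom MapFunction. A sketch under the assumption of Spark 2.2+ (where to_timestamp is available); "ts" is an assumed intermediate column name:

    import org.apache.spark.sql.functions.{col, to_timestamp, year, month, dayofmonth, hour}

    // Parse the ISO-format string column once, then derive the parts from it.
    val withParts = df
      .withColumn("ts",    to_timestamp(col("timestamp")))
      .withColumn("year",  year(col("ts")))
      .withColumn("month", month(col("ts")))
      .withColumn("day",   dayofmonth(col("ts")))
      .withColumn("hour",  hour(col("ts")))

Each withColumn call returns a new Dataset, so the derived columns can be chained as above.]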

Hope this helps!

Thanks,

Divya


