Posted to user@spark.apache.org by Yong Zhang <ja...@hotmail.com> on 2017/02/28 20:03:53 UTC

Why Spark cannot get the derived field of case class in Dataset?

In the following example, the "day" value is defined in the case class, but I cannot access it in the Spark Dataset, where I would like to use it at runtime. Any idea why? Do I have to make it a constructor parameter? I'd like it to be derived automatically and still be usable in the Dataset or DataFrame.


Thanks


scala> spark.version
res12: String = 2.1.0

scala> import java.text.SimpleDateFormat
import java.text.SimpleDateFormat

scala> val dateFormat = new SimpleDateFormat("yyyy-MM-dd")
dateFormat: java.text.SimpleDateFormat = java.text.SimpleDateFormat@f67a0200

scala> case class Test(time: Long) {
     |   val day = dateFormat.format(time)
     | }
defined class Test
scala> val t = Test(1487185076410L)
t: Test = Test(1487185076410)

scala> t.time
res13: Long = 1487185076410

scala> t.day
res14: String = 2017-02-15

scala> val ds = Seq(t).toDS()
ds: org.apache.spark.sql.Dataset[Test] = [time: bigint]

scala> ds.show
+-------------+
|         time|
+-------------+
|1487185076410|
+-------------+


Re: Why Spark cannot get the derived field of case class in Dataset?

Posted by Michael Armbrust <mi...@databricks.com>.
We only serialize things that are in the constructor. You would have
access to it in the typed API (ds.map(_.day)). I'd suggest making a
factory method that fills these fields in and puts them in the constructor
if you need to reach them from other DataFrame operations.
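To make the suggestion concrete, here is one hedged sketch (not code from the thread) of such a factory: `day` becomes a constructor parameter, so Spark's product encoder serializes it, and an overloaded companion `apply` computes it so callers still pass only the timestamp.

```scala
import java.text.SimpleDateFormat

// Sketch: move the derived field into the constructor so the Dataset
// encoder sees it, and compute it in a companion-object factory.
case class Test(time: Long, day: String)

object Test {
  // Same formatter as in the thread; note SimpleDateFormat is not
  // thread-safe, which is fine for a single-threaded illustration.
  private val dateFormat = new SimpleDateFormat("yyyy-MM-dd")

  // Overloaded apply: callers still write Test(timestamp) and get
  // the derived `day` filled in automatically.
  def apply(time: Long): Test = Test(time, dateFormat.format(time))
}

val t = Test(1487185076410L)
// `day` is now part of the case class schema, so in a Spark shell
// Seq(t).toDS() would show both a `time` and a `day` column.
println(t)
```

Because `day` is now a real column, it is available to untyped DataFrame operations (select, filter, groupBy) as well as the typed API.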

On Tue, Feb 28, 2017 at 12:03 PM, Yong Zhang <ja...@hotmail.com> wrote:

> In the following example, the "day" value is in the case class, but I
> cannot get that in the Spark dataset, which I would like to use at runtime?
> Any idea? Do I have to force it to be present in the case class
> constructor? I like to derive it out automatically and used in the dataset
> or dataframe.
>
>
> Thanks
>
>
> scala> spark.versionres12: String = 2.1.0
>
> scala> import java.text.SimpleDateFormatimport java.text.SimpleDateFormat
>
> scala> val dateFormat = new SimpleDateFormat("yyyy-MM-dd")dateFormat: java.text.SimpleDateFormat = java.text.SimpleDateFormat@f67a0200
>
> scala> case class Test(time: Long) {     |   val day = dateFormat.format(time)     | }defined class Testscala> val t = Test(1487185076410L)t: Test = Test(1487185076410)
>
> scala> t.timeres13: Long = 1487185076410
>
> scala> t.dayres14: String = 2017-02-15
>
> scala> val ds = Seq(t).toDS()ds: org.apache.spark.sql.Dataset[Test] = [time: bigint]
>
> scala> ds.show+-------------+|         time|+-------------+|1487185076410|+-------------+
>
>
>