Posted to user@spark.apache.org by Yong Zhang <ja...@hotmail.com> on 2017/02/28 20:03:53 UTC
Why Spark cannot get the derived field of case class in Dataset?
In the following example, the "day" value is defined in the case class, but I cannot get it in the Spark Dataset, where I would like to use it at runtime. Any idea? Do I have to force it into the case class constructor? I would like to derive it automatically and use it in the Dataset or DataFrame.
Thanks
scala> spark.version
res12: String = 2.1.0
scala> import java.text.SimpleDateFormat
import java.text.SimpleDateFormat
scala> val dateFormat = new SimpleDateFormat("yyyy-MM-dd")
dateFormat: java.text.SimpleDateFormat = java.text.SimpleDateFormat@f67a0200
scala> case class Test(time: Long) {
| val day = dateFormat.format(time)
| }
defined class Test
scala> val t = Test(1487185076410L)
t: Test = Test(1487185076410)
scala> t.time
res13: Long = 1487185076410
scala> t.day
res14: String = 2017-02-15
scala> val ds = Seq(t).toDS()
ds: org.apache.spark.sql.Dataset[Test] = [time: bigint]
scala> ds.show
+-------------+
|         time|
+-------------+
|1487185076410|
+-------------+
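The transcript already hints at the cause: the REPL echoes `t` as `Test(1487185076410)`, with no `day`. A case class's generated `toString`, `equals`, and `Product` members cover only the primary constructor parameters, and Spark's product encoder likewise builds the Dataset schema from those parameters, which is why `ds` shows only `[time: bigint]`. The plain-Scala sketch below (no Spark required) makes that split visible through the `Product` view; it is an illustration, not Spark's encoder code. The `DateFmt` holder object and the pinned UTC time zone are assumptions added so the output does not depend on the machine's default zone.

```scala
import java.text.SimpleDateFormat
import java.util.TimeZone

// Hypothetical holder for the formatter; UTC is assumed so the result
// is the same on every machine.
object DateFmt {
  val dateFormat: SimpleDateFormat = {
    val f = new SimpleDateFormat("yyyy-MM-dd")
    f.setTimeZone(TimeZone.getTimeZone("UTC"))
    f
  }
}

// Same shape as the Test class in the transcript: `day` is a body val,
// not a constructor parameter.
case class Test(time: Long) {
  val day: String = DateFmt.dateFormat.format(time)
}

object ProductViewDemo {
  def main(args: Array[String]): Unit = {
    val t = Test(1487185076410L)
    println(t.day)                    // 2017-02-15 (computed on the JVM object)
    println(t.productArity)           // 1: only `time` is a product element
    println(t.productIterator.toList) // List(1487185076410)
    println(t)                        // Test(1487185076410): `day` is not part of the product
  }
}
```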
Re: Why Spark cannot get the derived field of case class in Dataset?
Posted by Michael Armbrust <mi...@databricks.com>.
We only serialize fields that are in the constructor. You would still have
access to it in the typed API (df.map(_.day)). I'd suggest making a
factory method that fills these in, and putting them in the constructor if you
need to get to them from other DataFrame operations.
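A minimal sketch of the suggested factory-method approach, assuming the day should be computed in UTC: `day` becomes a constructor parameter, and a companion method (the name `fromTime` is hypothetical) derives it from `time` so callers still pass only the timestamp. It uses `java.time` rather than a shared `SimpleDateFormat`, since `SimpleDateFormat` is not thread-safe and Spark may run several tasks concurrently in one executor JVM.

```scala
import java.time.{Instant, ZoneOffset}

// `day` is now a constructor parameter, so it is part of the case class's
// product and would be encoded by Spark as a column next to `time`.
case class Test(time: Long, day: String)

object Test {
  // Hypothetical factory: derives `day` (yyyy-MM-dd, UTC assumed) from the
  // epoch-millisecond timestamp, so callers only supply `time`.
  def fromTime(time: Long): Test =
    Test(time, Instant.ofEpochMilli(time).atZone(ZoneOffset.UTC).toLocalDate.toString)
}

object FactoryDemo {
  def main(args: Array[String]): Unit = {
    val t = Test.fromTime(1487185076410L)
    println(t) // Test(1487185076410,2017-02-15)
  }
}
```

With Spark, `Seq(Test.fromTime(1487185076410L)).toDS()` (after `import spark.implicits._`) should then report the schema `[time: bigint, day: string]`. If the derived value is only needed inside typed operations, the original class also works without changing the constructor: `ds.map(_.day)` recomputes `day` on each deserialized object.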