Posted to issues@spark.apache.org by "Jakob Odersky (JIRA)" <ji...@apache.org> on 2016/10/17 17:40:58 UTC

[jira] [Commented] (SPARK-17368) Scala value classes create encoder problems and break at runtime

    [ https://issues.apache.org/jira/browse/SPARK-17368?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15582882#comment-15582882 ] 

Jakob Odersky commented on SPARK-17368:
---------------------------------------

[~arisofalaska@gmail.com] Let me explain the fix to what I initially thought was impossible.
Value classes do have a class representation for compatibility with Java, and although this incurs a slight overhead compared to their primitive counterparts, catalyst will mostly negate that overhead by providing its own encoders and operators on serialized objects. This means that any Dataset operations that take user-defined functions (e.g. `map`, `filter`, etc.) will work with the class representation instead of the wrapped value, as the sketch below illustrates.
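To make that concrete, here is a minimal sketch (not taken from the patch itself; it reuses the reporter's `FeatureId` class and assumes a running `SparkSession` named `spark` with `spark.implicits._` already imported):
{noformat}
import org.apache.spark.sql.Dataset

// With the fix, user-defined functions see the class representation,
// so `id` below is a FeatureId instance, not a raw Int.
val ds: Dataset[FeatureId] = spark.createDataset(Seq(FeatureId(1), FeatureId(2)))
val bumped = ds.map(id => FeatureId(id.v + 1)) // operates on the wrapper
val big    = bumped.filter(_.v > 1)            // likewise
{noformat}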
Regarding the availability of encoders: without resorting to macros, we cannot create type classes that apply only to value classes (an implicit for `AnyVal` would also be applied to primitive types), so this fix instead adds value class support to the existing encoders. E.g. you can define your value class as a case class and get a working encoder out of the box; see the example below.
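For example (again a sketch, assuming the fix is in place and a `spark` session is in scope):
{noformat}
import org.apache.spark.sql.Encoder

// Defining the value class as a case class...
case class FeatureId(v: Int) extends AnyVal

// ...means the standard implicit product encoder resolves with no extra code:
import spark.implicits._
val enc: Encoder[FeatureId] = implicitly[Encoder[FeatureId]]
{noformat}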
Unfortunately there is no way to statically verify that the wrapped value is itself encodable, but encoders in general perform "deep inspection" at runtime, so an unsupported wrapped type will still surface as a runtime error rather than a compile error.
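For instance, something like the following hypothetical example (where `ReentrantLock` merely stands in for any type catalyst cannot encode) compiles fine, and the failure only surfaces once the encoder inspects the wrapped type at runtime:
{noformat}
import java.util.concurrent.locks.ReentrantLock

// The compiler accepts this value class; nothing statically checks
// that ReentrantLock is encodable.
case class LockId(lock: ReentrantLock) extends AnyVal

import spark.implicits._
// Compiles, but encoder derivation should fail at runtime during
// catalyst's deep inspection of the wrapped type:
// val ds = spark.createDataset(Seq(LockId(new ReentrantLock())))
{noformat}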

> Scala value classes create encoder problems and break at runtime
> ----------------------------------------------------------------
>
>                 Key: SPARK-17368
>                 URL: https://issues.apache.org/jira/browse/SPARK-17368
>             Project: Spark
>          Issue Type: Bug
>          Components: Spark Core, SQL
>    Affects Versions: 1.6.2, 2.0.0
>         Environment: JDK 8 on MacOS
> Scala 2.11.8
> Spark 2.0.0
>            Reporter: Aris Vlasakakis
>            Assignee: Jakob Odersky
>             Fix For: 2.1.0
>
>
> Using Scala value classes as the element type of a Dataset breaks in Spark 2.0 and 1.6.x.
> This simple Spark 2 application demonstrates that the code compiles but breaks at runtime with the error below. The value class is *FeatureId*, as it extends AnyVal.
> {noformat}
> Exception in thread "main" java.lang.RuntimeException: Error while encoding: java.lang.RuntimeException: Couldn't find v on int
> assertnotnull(input[0, int, true], top level non-flat input object).v AS v#0
> +- assertnotnull(input[0, int, true], top level non-flat input object).v
>    +- assertnotnull(input[0, int, true], top level non-flat input object)
>       +- input[0, int, true]".
>         at org.apache.spark.sql.catalyst.encoders.ExpressionEncoder.toRow(ExpressionEncoder.scala:279)
>         at org.apache.spark.sql.SparkSession$$anonfun$3.apply(SparkSession.scala:421)
>         at org.apache.spark.sql.SparkSession$$anonfun$3.apply(SparkSession.scala:421)
> {noformat}
> Test code for Spark 2.0.0:
> {noformat}
> import org.apache.spark.sql.{Dataset, SparkSession}
> object BreakSpark {
>   case class FeatureId(v: Int) extends AnyVal
>   def main(args: Array[String]): Unit = {
>     val seq = Seq(FeatureId(1), FeatureId(2), FeatureId(3))
>     val spark = SparkSession.builder.getOrCreate()
>     import spark.implicits._
>     spark.sparkContext.setLogLevel("warn")
>     val ds: Dataset[FeatureId] = spark.createDataset(seq)
>     println(s"BREAK HERE: ${ds.count}")
>   }
> }
> {noformat}


