You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@spark.apache.org by "Lee Dongjin (JIRA)" <ji...@apache.org> on 2017/03/09 15:50:38 UTC

[jira] [Commented] (SPARK-17248) Add native Scala enum support to Dataset Encoders

    [ https://issues.apache.org/jira/browse/SPARK-17248?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15903245#comment-15903245 ] 

Lee Dongjin commented on SPARK-17248:
-------------------------------------

[~pdxleif] // Although it may be an expired question, let me answer.

There are two ways of implementing Enum types in Scala. For more information, see: http://alvinalexander.com/scala/how-to-use-scala-enums-enumeration-examples The 'enumeration object' in the tuning guide seems to point the second way, that is, using sealed traits and case objects.

However, it seems like the sentence "Consider using numeric IDs or enumeration objects instead of strings for keys." in the tuning guide does not apply to DataSet, like your case. From my humble knowledge, DataSet supports only case classes with primitive types, not with the other case class or object, in this case, the enum objects.

If you want some workaround, please check out this example: https://github.com/dongjinleekr/spark-dataset/blob/master/src/main/scala/com/github/dongjinleekr/spark/dataset/Titanic.scala It shows an example of DataSet using one of the public datasets. Please pay attention to how I matched the Passenger case class and its corresponding SchemaType - age, pClass, sex and embarked.

[~srowen] // It seems like lots of users are experiencing similar problems. How about changing this issue into providing more examples and explanations in official documentation? I needed, I would like to take the issue.

> Add native Scala enum support to Dataset Encoders
> -------------------------------------------------
>
>                 Key: SPARK-17248
>                 URL: https://issues.apache.org/jira/browse/SPARK-17248
>             Project: Spark
>          Issue Type: Improvement
>          Components: SQL
>    Affects Versions: 2.0.0
>            Reporter: Silvio Fiorito
>
> Enable support for Scala enums in Encoders. Ideally, users should be able to use enums as part of case classes automatically.
> Currently, this code...
> {code}
> object MyEnum extends Enumeration {
>   type MyEnum = Value
>   val EnumVal1, EnumVal2 = Value
> }
> case class MyClass(col: MyEnum.MyEnum)
> val data = Seq(MyClass(MyEnum.EnumVal1), MyClass(MyEnum.EnumVal2)).toDS()
> {code}
> ...results in this stacktrace:
> {code}
> ava.lang.UnsupportedOperationException: No Encoder found for MyEnum.MyEnum
> - field (class: "scala.Enumeration.Value", name: "col")
> - root class: "line550c9f34c5144aa1a1e76bcac863244717.$read.$iwC.$iwC.$iwC.$iwC.MyClass"
> 	at org.apache.spark.sql.catalyst.ScalaReflection$.org$apache$spark$sql$catalyst$ScalaReflection$$serializerFor(ScalaReflection.scala:598)
> 	at org.apache.spark.sql.catalyst.ScalaReflection$$anonfun$9.apply(ScalaReflection.scala:592)
> 	at org.apache.spark.sql.catalyst.ScalaReflection$$anonfun$9.apply(ScalaReflection.scala:583)
> 	at scala.collection.TraversableLike$$anonfun$flatMap$1.apply(TraversableLike.scala:251)
> 	at scala.collection.TraversableLike$$anonfun$flatMap$1.apply(TraversableLike.scala:251)
> 	at scala.collection.immutable.List.foreach(List.scala:318)
> 	at scala.collection.TraversableLike$class.flatMap(TraversableLike.scala:251)
> 	at scala.collection.AbstractTraversable.flatMap(Traversable.scala:105)
> 	at org.apache.spark.sql.catalyst.ScalaReflection$.org$apache$spark$sql$catalyst$ScalaReflection$$serializerFor(ScalaReflection.scala:583)
> 	at org.apache.spark.sql.catalyst.ScalaReflection$.serializerFor(ScalaReflection.scala:425)
> 	at org.apache.spark.sql.catalyst.encoders.ExpressionEncoder$.apply(ExpressionEncoder.scala:61)
> 	at org.apache.spark.sql.Encoders$.product(Encoders.scala:274)
> 	at org.apache.spark.sql.SQLImplicits.newProductEncoder(SQLImplicits.scala:47)
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@spark.apache.org
For additional commands, e-mail: issues-help@spark.apache.org