Posted to issues@spark.apache.org by "Minh Thai (JIRA)" <ji...@apache.org> on 2018/08/14 22:36:00 UTC
[jira] [Commented] (SPARK-20384) supporting value classes over primitives in DataSets
[ https://issues.apache.org/jira/browse/SPARK-20384?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16580498#comment-16580498 ]
Minh Thai commented on SPARK-20384:
-----------------------------------
_(from my comment in SPARK-17368)_
I think the main problem is that, even today, there is no way to create an encoder specifically for value classes. However, we could introduce a [universal trait|https://docs.scala-lang.org/overviews/core/value-classes.html] called {{OpaqueValue}}^1^ to be used as an upper type bound in the encoder. This means:
- Any user-defined value class has to mix in {{OpaqueValue}}
- An encoder can be created to target those value classes.
{code:java}
trait OpaqueValue extends Any

// more specific than the existing Product encoder, so it only matches value classes
implicit def newValueClassEncoder[T <: Product with OpaqueValue : TypeTag]: Encoder[T] = ???

case class Id(value: Int) extends AnyVal with OpaqueValue
{code}
This doesn't clash with the existing encoder for case classes, since the type constraint above is more specific:
{code:java}
implicit def newProductEncoder[T <: Product : TypeTag]: Encoder[T] = Encoders.product[T]
{code}
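To make the priority explicit rather than relying on ambiguity-resolution rules, the generic encoder can live in a low-priority parent trait, the same pattern {{SQLImplicits}} already uses for {{newProductEncoder}} (via {{LowPrioritySQLImplicits}}). Below is a minimal, Spark-free sketch of that idea; {{Enc}}, {{encoderFor}} and the string labels are placeholders of mine, not Spark API:

```scala
trait OpaqueValue extends Any

// Stand-in for Spark's Encoder, just to demonstrate implicit priority.
trait Enc[T] { def name: String }

trait LowPriorityEnc {
  // Fallback: matches any case class (all case classes extend Product).
  implicit def productEnc[T <: Product]: Enc[T] =
    new Enc[T] { val name = "product" }
}

object Enc extends LowPriorityEnc {
  // Higher priority (defined in the derived object): only matches
  // case classes that also mix in OpaqueValue.
  implicit def valueClassEnc[T <: Product with OpaqueValue]: Enc[T] =
    new Enc[T] { val name = "value-class" }
}

case class Id(value: Int) extends AnyVal with OpaqueValue
case class User(id: Int, name: String)

def encoderFor[T](implicit e: Enc[T]): String = e.name

val forId   = encoderFor[Id]   // "value-class": the specific encoder wins
val forUser = encoderFor[User] // "product": falls back to the generic one
```

Because {{valueClassEnc}} is defined in the object that extends the low-priority trait, it is unambiguously preferred whenever both implicits apply.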
_I'm experimenting with this on my fork and will make a PR if it works well._
_(1) the name is inspired by the [Opaque Type|https://docs.scala-lang.org/sips/opaque-types.html] feature of Scala 3_
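One caveat worth flagging with this approach (my observation, not part of the proposal above): mixing a universal trait into a value class compiles fine, but any use of an instance through the trait forces boxing, so an encoder implementation should keep working against the concrete class rather than the trait. A small sketch:

```scala
trait OpaqueValue extends Any

case class Id(value: Int) extends AnyVal with OpaqueValue

// Accepting the universal trait boxes the value class: `describe`
// receives an allocated Id wrapper, not a bare Int.
def describe(v: OpaqueValue): String = v.toString

val boxed = describe(Id(42)) // "Id(42)" — default case-class toString
```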
> supporting value classes over primitives in DataSets
> ----------------------------------------------------
>
> Key: SPARK-20384
> URL: https://issues.apache.org/jira/browse/SPARK-20384
> Project: Spark
> Issue Type: Improvement
> Components: Optimizer, SQL
> Affects Versions: 2.1.0
> Reporter: Daniel Davis
> Priority: Minor
>
> As a Spark user who uses value classes in Scala to model domain objects, I would also like to make use of them in Datasets.
> For example, I would like to use the {{User}} case class, which uses a value class for its {{id}}, as the type of a Dataset:
> - the underlying primitive should be mapped to the value-class column
> - functions on the column (for example, comparisons) should only work if they are defined on the value class, and should use that implementation
> - show() should pick up the toString method of the value-class
> {code}
> case class Id(value: Long) extends AnyVal {
>   override def toString: String = value.toHexString
> }
> case class User(id: Id, name: String)
> import spark.implicits._ // required for .toDS() and .as[User]
> val ds = spark.sparkContext
>   .parallelize(0L to 12L).map(i => (i, f"name-$i")).toDS()
>   .withColumnRenamed("_1", "id")
>   .withColumnRenamed("_2", "name")
> // mapping should work
> val usrs = ds.as[User]
> // show should use toString
> usrs.show()
> // comparison with long should throw exception, as not defined on Id
> usrs.col("id") > 0L
> {code}
> For example, {{show()}} should use the {{toString}} of the {{Id}} value class:
> {noformat}
> +---+-------+
> | id|   name|
> +---+-------+
> |  0| name-0|
> |  1| name-1|
> |  2| name-2|
> |  3| name-3|
> |  4| name-4|
> |  5| name-5|
> |  6| name-6|
> |  7| name-7|
> |  8| name-8|
> |  9| name-9|
> |  a|name-10|
> |  b|name-11|
> |  c|name-12|
> +---+-------+
> {noformat}
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)