Posted to reviews@spark.apache.org by GitBox <gi...@apache.org> on 2019/08/09 05:47:04 UTC

[GitHub] [spark] coleleahy commented on issue #21310: [SPARK-24256][SQL] SPARK-24256: ExpressionEncoder should support user-defined types as fields of Scala case class and tuple

URL: https://github.com/apache/spark/pull/21310#issuecomment-519787012
 
 
   As @fangshil [points out](https://github.com/apache/spark/pull/21310#issue-187607196), because Spark's encoder-generating facilities in ScalaReflection and JavaTypeInference cannot be made aware of a user-defined Encoder[T], it is fairly inconvenient to work with a Dataset[T] for which such an encoder has been defined. He gives two reasons:
   
   1. Common operations like joins and aggregations require an encoder for a Dataset[(T, S)] or the like, which Spark cannot derive -- precisely because the encoder-generating facilities in ScalaReflection cannot see the custom user-defined Encoder[T].
   
   2. The perfectly reasonable desire to create a case class or Java bean containing a member of type T is thwarted, again because the encoder-generating facilities in ScalaReflection and JavaTypeInference cannot see the custom Encoder[T] (see the sketch after this list).
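
   A minimal sketch of that second failure, assuming a hypothetical Point class and Record case class (the exact exception message may differ across Spark versions):

   ```scala
   import org.apache.spark.sql.{Encoder, Encoders, SparkSession}

   // Hypothetical user-defined type with no built-in Spark encoder.
   class Point(val x: Double, val y: Double) extends Serializable

   case class Record(id: Long, location: Point)

   object EncoderGap {
     // A user-defined encoder for Point, e.g. Kryo-based.
     implicit val pointEncoder: Encoder[Point] = Encoders.kryo[Point]

     def main(args: Array[String]): Unit = {
       val spark = SparkSession.builder().master("local[*]").appName("encoder-gap").getOrCreate()
       import spark.implicits._

       // Works: the implicit Encoder[Point] is picked up directly.
       val points = spark.createDataset(Seq(new Point(0.0, 1.0)))

       // Fails at runtime (typically UnsupportedOperationException: "No Encoder found for Point"):
       // the Encoder[Record] is derived via ScalaReflection, which never consults
       // the implicit Encoder[Point] that is in scope.
       val records = spark.createDataset(Seq(Record(1L, new Point(0.0, 1.0))))

       spark.stop()
     }
   }
   ```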
   
   Now, the first problem can perhaps be worked around, for example by implicitly defining an Encoder[(T, S)] whenever there is an implicit Encoder[T] and Encoder[S]. However, the second problem remains. And that is precisely what the present PR sets out to solve.
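
   Something like the following is what I have in mind for that workaround -- a sketch only, built on Encoders.tuple, which composes an encoder for a pair out of the two component encoders (depending on which other implicits are in scope, the composed encoder may need to be passed explicitly, as in the second method):

   ```scala
   import org.apache.spark.sql.{Dataset, Encoder, Encoders}

   object TupleEncoderWorkaround {
     // Make an Encoder[(T, S)] derivable whenever Encoder[T] and Encoder[S]
     // are implicitly available, e.g. for operations that yield pairs.
     implicit def tuple2Encoder[T, S](implicit et: Encoder[T], es: Encoder[S]): Encoder[(T, S)] =
       Encoders.tuple(et, es)

     // The composed encoder can also be passed explicitly, e.g. to Dataset.map,
     // which sidesteps any ambiguity with the implicits from spark.implicits._.
     def zipWithConstant[T](ds: Dataset[T], c: Long)(implicit et: Encoder[T]): Dataset[(T, Long)] =
       ds.map(t => (t, c))(Encoders.tuple(et, Encoders.scalaLong))
   }
   ```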
   
   I understand if the Spark community would prefer to take another approach to solving this problem, but then I'd like to find out what that approach is.
   
   For instance, is the consensus that the best approach is to create a UserDefinedType[T] and register it through the currently private UDTRegistration API? If so, could someone please point me to a thread in the Spark dev list that can shed light on the justification behind this choice, and on the timeline for making that API public?
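
   For context, and in case it helps frame the question, here is roughly what I understand that approach to look like (the Point and PointUDT names are hypothetical; since UserDefinedType and UDTRegistration are private[spark], this only compiles when placed under an org.apache.spark package -- which is precisely the inconvenience at issue):

   ```scala
   // Must live under org.apache.spark.* because UserDefinedType and
   // UDTRegistration are private[spark] as of Spark 2.x.
   package org.apache.spark.example

   import org.apache.spark.sql.catalyst.InternalRow
   import org.apache.spark.sql.catalyst.expressions.GenericInternalRow
   import org.apache.spark.sql.types._

   class Point(val x: Double, val y: Double) extends Serializable

   class PointUDT extends UserDefinedType[Point] {
     override def sqlType: DataType =
       StructType(Seq(StructField("x", DoubleType), StructField("y", DoubleType)))

     override def serialize(obj: Point): Any =
       new GenericInternalRow(Array[Any](obj.x, obj.y))

     override def deserialize(datum: Any): Point = datum match {
       case row: InternalRow => new Point(row.getDouble(0), row.getDouble(1))
     }

     override def userClass: Class[Point] = classOf[Point]
   }

   object PointUDTSetup {
     // Must run before any encoder for a type containing Point is derived.
     def register(): Unit =
       UDTRegistration.register(classOf[Point].getName, classOf[PointUDT].getName)
   }
   ```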
   
   Finally, I'd like to ask why, even if the UserDefinedType[T] approach is preferred, the work in the present PR isn't being considered as a supplementary enhancement -- one which many Spark users would find very convenient.
