Posted to issues@spark.apache.org by "Hyukjin Kwon (JIRA)" <ji...@apache.org> on 2019/05/21 04:16:39 UTC
[jira] [Resolved] (SPARK-11003) Allowing UserDefinedTypes to extend primitives
[ https://issues.apache.org/jira/browse/SPARK-11003?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Hyukjin Kwon resolved SPARK-11003.
----------------------------------
Resolution: Incomplete
> Allowing UserDefinedTypes to extend primitives
> ----------------------------------------------
>
> Key: SPARK-11003
> URL: https://issues.apache.org/jira/browse/SPARK-11003
> Project: Spark
> Issue Type: Improvement
> Components: SQL
> Affects Versions: 1.5.0, 1.5.1
> Reporter: John Muller
> Priority: Minor
> Labels: DataType, UDT, bulk-closed
>
> Currently, the classes and constructors of all the primitive DataTypes (of StructFields) are private:
> https://github.com/apache/spark/tree/master/sql/catalyst/src/main/scala/org/apache/spark/sql/types
> This means that even for simple String-based UDTs, users always have to implement serialize() and deserialize(). UDTs for something as simple as a Northwind database (products, orders, customers) would be very useful for pattern matching / validation. For example:
> import org.apache.spark.sql.types._
>
> @SQLUserDefinedType(udt = classOf[ProductNameUDT])
> case class ProductName(name: String) extends StringType with Validator {
>   import scala.util.matching.Regex
>
>   // .r compiles the String into a Regex so it can be used as an extractor below
>   private val pattern: Regex = """[A-Z][A-Za-z]*""".r
>
>   def validate(): Boolean = {
>     name match {
>       case pattern(_*) => true
>       case _ => false
>     }
>   }
> }
>
> class ProductNameUDT extends UserDefinedType[ProductName] {
>   // No need for this; ProductName is a StringType so we know how to serialize
>   override def serialize(p: Any): Any = {
>     p match {
>       case p: ProductName => Seq(p.name)
>     }
>   }
>
>   // Not sure why this override is needed at all; can't we always get this simply by the UDT type param?
>   override def userClass: Class[ProductName] = classOf[ProductName]
>
>   // Instead of the below, just infer the StructField name via reflection of the wrapper class' name
>   override def sqlType: DataType = StructType(Seq(StructField("ProductName", StringType)))
>
>   // Still needed.
>   override def deserialize(datum: Any): ProductName = {
>     datum match {
>       case values: Seq[_] =>
>         assert(values.length == 1)
>         ProductName(values.head.asInstanceOf[String])
>     }
>   }
> }
> This would simplify the process of creating "primitive extension" UDTs down to just 2 steps:
> 1. An annotated case class that extends a primitive DataType
> 2. A UDT that only needs a deserializer
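
The validation and round-trip logic in the description above can be tried without Spark. The following is only a minimal sketch in plain Scala: the Validator trait (which the report references but does not define) and the ProductNameCodec object are hypothetical stand-ins introduced for illustration, not part of any Spark API.

```scala
import scala.util.matching.Regex

// Hypothetical stand-in for the Validator trait used in the report;
// the report itself does not define it.
trait Validator {
  def validate(): Boolean
}

// Plain case class wrapping a single String (no Spark dependency).
final case class ProductName(name: String) extends Validator {
  // .r turns the String into a Regex so it can act as an extractor.
  private val pattern: Regex = """[A-Z][A-Za-z]*""".r

  // A Regex extractor in a pattern match succeeds only when the
  // whole input matches the expression.
  def validate(): Boolean = name match {
    case pattern(_*) => true
    case _           => false
  }
}

// Hypothetical helper mirroring the serialize/deserialize pair from the
// report: a one-element Seq on the way out, the wrapper on the way back.
object ProductNameCodec {
  def serialize(p: ProductName): Seq[Any] = Seq(p.name)

  def deserialize(datum: Any): ProductName = datum match {
    case values: Seq[_] if values.length == 1 =>
      ProductName(values.head.asInstanceOf[String])
    case other =>
      throw new IllegalArgumentException(s"Cannot deserialize: $other")
  }
}
```

Because the pattern has no capture groups, `pattern(_*)` is used only as a full-match test; a name with a lowercase first letter or a trailing digit falls through to the default case.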