Posted to reviews@spark.apache.org by GitBox <gi...@apache.org> on 2019/02/27 17:29:43 UTC

[GitHub] fottey commented on issue #23908: [SPARK-27001][SQL] Refactor "serializerFor" method between ScalaReflection and JavaTypeInference

fottey commented on issue #23908:  [SPARK-27001][SQL] Refactor "serializerFor" method between ScalaReflection and JavaTypeInference
URL: https://github.com/apache/spark/pull/23908#issuecomment-467955010
 
 
   As an outside observer: would this refactoring allow `ScalaReflection.serializerFor` to handle arbitrary types that follow the Java bean conventions, and/or common Java-specific types such as `java.util.List`?
   
   I recently discovered that, because most of the common Scala implicit encoders reduce to `ExpressionEncoder`'s `apply` method, it is very difficult to work with arbitrary Java bean types in the Dataset API.
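
   To make that chain concrete, here is a minimal check (a sketch, assuming Spark 2.4 on the classpath; `Point` is just an illustrative case class) showing that the implicitly derived case-class encoder is an `ExpressionEncoder` under the hood:

   ```scala
   import org.apache.spark.sql.{Encoder, Encoders}
   import org.apache.spark.sql.catalyst.encoders.ExpressionEncoder

   object ImplicitChain {
       case class Point(x: Int, y: Int)

       def main(args: Array[String]): Unit = {
           // spark.implicits._ derives case-class encoders via Encoders.product,
           // which reflectively builds an ExpressionEncoder.
           val enc: Encoder[Point] = Encoders.product[Point]
           println(enc.isInstanceOf[ExpressionEncoder[_]]) // prints: true
       }
   }
   ```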
   
   Specifically, given a Java bean type, `MyBean`, and an implicit encoder for that bean type in scope, the existing Spark 2.4.0 machinery can't synthesize a valid encoder at runtime for hybrid Scala / Java types like `Seq[MyBean]`, or tuple types like `(Int, MyBean)`, despite the fact that encoders for `Seq[_]`, `Tuple2[_, _]`, and `MyBean` are each available separately.
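
   For the tuple case specifically, composing the encoder by hand with `Encoders.tuple` may be a partial workaround (a sketch, assuming `com.example.MyBean` is a well-formed bean with a no-arg constructor and getters/setters):

   ```scala
   import com.example.MyBean
   import org.apache.spark.sql.{Dataset, Encoder, Encoders, SparkSession}

   object TupleWorkaround {
       def main(args: Array[String]): Unit = {
           val spark: SparkSession = SparkSession.builder().master("local[*]").getOrCreate()

           // Compose the tuple encoder explicitly from its parts instead of
           // asking ExpressionEncoder's apply to derive it reflectively.
           val intBeanEncoder: Encoder[(Int, MyBean)] =
               Encoders.tuple(Encoders.scalaInt, Encoders.bean(classOf[MyBean]))

           // Pass the composed encoder explicitly to sidestep implicit resolution.
           val ds: Dataset[(Int, MyBean)] =
               spark.createDataset(Seq((0, new MyBean()), (1, new MyBean())))(intBeanEncoder)
           ds.show()
       }
   }
   ```

   Note that this only composes at the top level: as far as I can tell there is no analogous public combinator for collections, so it doesn't help with `Seq[MyBean]`.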
   
   While it may be unreasonable to solve the problem generically across all potential classes, it would be really nice if `ExpressionEncoder`'s `apply` method could somehow detect and support at least Java beans and `java.util.List`s at runtime...
   
   See the code examples below:
   
   ```scala
   import com.example.MyBean
   import org.apache.spark.sql.{Dataset, Encoder, Encoders, SparkSession}
   import org.apache.spark.sql.catalyst.encoders.ExpressionEncoder

   object Example {
       case class Test()

       def main(args: Array[String]): Unit = {
           val spark: SparkSession = ???

           import spark.implicits._

           // Works today, given the implicit import above
           val ds1: Dataset[Seq[Test]] = Seq(Seq(Test()), Seq(Test())).toDS

           // DOES NOT WORK:
           // ExpressionEncoder's apply method cannot handle type MyBean,
           // so this throws as soon as the encoder is materialized.
           // implicit def newMyBeanExpressionEncoder: Encoder[MyBean] = ExpressionEncoder()
           //
           // Instead we need to do the following:
           implicit def newMyBeanBeanEncoder: Encoder[MyBean] = Encoders.bean(classOf[MyBean])

           // But that only allows expressing things like this:
           val ds2: Dataset[MyBean] = Seq(new MyBean(), new MyBean()).toDS

           // Due to the above limitation we CANNOT do the following, EVEN AFTER
           // newMyBeanBeanEncoder is brought into scope!
           // DOES NOT WORK (no Encoder[Seq[MyBean]] can be derived):
           // val ds3: Dataset[Seq[MyBean]] = Seq(Seq(new MyBean()), Seq(new MyBean())).toDS

           // Finally, these do not work either:

           // DOES NOT WORK (no Encoder[(Int, MyBean)] can be derived):
           // val ds4: Dataset[(Int, MyBean)] = Seq((0, new MyBean()), (1, new MyBean())).toDS

           // DOES NOT WORK (compiles, but throws as soon as the encoder is used):
           implicit def newSeqMyBeanEncoder: Encoder[Seq[MyBean]] = ExpressionEncoder()

           // DOES NOT WORK (compiles, but throws as soon as the encoder is used):
           implicit def newJavaListMyBeanEncoder: Encoder[java.util.List[MyBean]] = ExpressionEncoder()

           // The samples above all rely on ExpressionEncoder being able to
           // handle every type in the expression. Currently that seems to
           // work for:
           // - case classes
           // - tuples
           // - scala.Product
           // - Scala "primitives"
           // - other common types with encoders... BUT NOT Java beans :'(
       }
   }
   ```
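
   In the meantime, one possible escape hatch for the collection cases is `Encoders.kryo`: it serializes the whole value into a single binary column, so the data becomes opaque to Catalyst, but at least the Dataset can be constructed (a sketch, again using the hypothetical `com.example.MyBean`):

   ```scala
   import com.example.MyBean
   import org.apache.spark.sql.{Dataset, Encoders, SparkSession}

   object KryoFallback {
       def main(args: Array[String]): Unit = {
           val spark: SparkSession = SparkSession.builder().master("local[*]").getOrCreate()

           // Kryo packs each Seq[MyBean] into one binary cell: no columnar
           // representation and no pushdown, but construction succeeds.
           val ds: Dataset[Seq[MyBean]] =
               spark.createDataset(Seq(Seq(new MyBean()), Seq(new MyBean())))(
                   Encoders.kryo[Seq[MyBean]])

           // The same trick applies to the Java side, e.g.
           // Encoders.kryo[java.util.List[MyBean]]
           println(ds.count())
       }
   }
   ```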

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org