Posted to issues@spark.apache.org by "Sam hendley (Jira)" <ji...@apache.org> on 2019/12/31 18:56:00 UTC

[jira] [Commented] (SPARK-24202) Separate SQLContext dependency from SparkSession.implicits

    [ https://issues.apache.org/jira/browse/SPARK-24202?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17006214#comment-17006214 ] 

Sam hendley commented on SPARK-24202:
-------------------------------------

I agree that this would be a very valuable change. Was there a reason this was closed without comment?

> Separate SQLContext dependency from SparkSession.implicits
> ----------------------------------------------------------
>
>                 Key: SPARK-24202
>                 URL: https://issues.apache.org/jira/browse/SPARK-24202
>             Project: Spark
>          Issue Type: Improvement
>          Components: SQL
>    Affects Versions: 2.3.0
>            Reporter: Gerard Maas
>            Priority: Major
>              Labels: bulk-closed
>
> The current implementation of the implicits in SparkSession passes the currently active SQLContext to the SQLImplicits class. This implies that any usage of these (extremely helpful) implicits requires the prior creation of a SparkSession instance.
> Usage is typically done as follows:
>  
> {code:java}
> val sparkSession = SparkSession.builder()
>   // ... builder configuration elided ...
>   .getOrCreate()
> import sparkSession.implicits._
> {code}
>  
> This is fine in user code, but it burdens the creation of library code that uses Spark, where static imports for _Encoder_ support are required.
> A simple example would be:
>  
> {code:java}
> abstract class SparkTransformation[In: Encoder, Out: Encoder] {
>     def transform(ds: Dataset[In]): Dataset[Out]
> }
> {code}
>  
> Attempting to use such a class with concrete types, without a SparkSession's implicits in scope, results in the following compile error:
> {code:java}
> Unable to find encoder for type stored in a Dataset.  Primitive types (Int, String, etc) and Product types (case classes) are supported by importing spark.implicits._  Support for serializing other types will be added in future releases.{code}
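> For illustration, here is a hypothetical call site that triggers this error; the _Person_ case class and the anonymous subclass are made up for this example:
> {code:java}
> import org.apache.spark.sql.Dataset
>
> case class Person(name: String, age: Int)
>
> // Without `import spark.implicits._` in scope, the compiler cannot
> // materialize an Encoder[Person] for the context bounds above, so
> // this fails with the error quoted earlier:
> val noOp = new SparkTransformation[Person, Person] {
>   def transform(ds: Dataset[Person]): Dataset[Person] = ds
> }
> {code}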
> The usage of the _SQLContext_ instance in _SQLImplicits_ is limited to two utilities that transform an _RDD_ or a local collection into a _Dataset_.
> These are 2 of the 46 implicit conversions offered by this class.
> The request is to move these two SQLContext-dependent implicit methods into a separate class:
> {code:java}
> SQLImplicits#214-229
> /**
>  * Creates a [[Dataset]] from an RDD.
>  *
>  * @since 1.6.0
>  */
> implicit def rddToDatasetHolder[T : Encoder](rdd: RDD[T]): DatasetHolder[T] = {
>   DatasetHolder(_sqlContext.createDataset(rdd))
> }
>
> /**
>  * Creates a [[Dataset]] from a local Seq.
>  * @since 1.6.0
>  */
> implicit def localSeqToDatasetHolder[T : Encoder](s: Seq[T]): DatasetHolder[T] = {
>   DatasetHolder(_sqlContext.createDataset(s))
> }{code}
> By separating the static methods from these two methods that depend on _sqlContext_ into separate classes, we could provide static imports for all the other functionality and only require the instance-bound implicits for the RDD and collection support (which is an uncommon use case these days).
> As this potentially breaks the current interface, it might be a candidate for Spark 3.0, although there's nothing stopping us from creating a separate hierarchy for the static encoders already.
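> A rough sketch of what such a split could look like; the names _StaticImplicits_ and _SessionBoundImplicits_ are made up for illustration, and the code would have to live in the org.apache.spark.sql package since _DatasetHolder_'s constructor is package-private:
> {code:java}
> import scala.reflect.runtime.universe.TypeTag
>
> import org.apache.spark.rdd.RDD
> import org.apache.spark.sql.{DatasetHolder, Encoder, Encoders, SQLContext}
>
> // Static half: encoder implicits with no SQLContext dependency,
> // importable from library code without a SparkSession.
> object StaticImplicits {
>   implicit def newProductEncoder[T <: Product : TypeTag]: Encoder[T] =
>     Encoders.product[T]
>
>   implicit def newIntEncoder: Encoder[Int] = Encoders.scalaInt
>   // ... the other encoder implicits would move here as well ...
> }
>
> // Instance-bound half: only the two conversions that need a SQLContext.
> abstract class SessionBoundImplicits {
>   protected def _sqlContext: SQLContext
>
>   implicit def rddToDatasetHolder[T : Encoder](rdd: RDD[T]): DatasetHolder[T] =
>     DatasetHolder(_sqlContext.createDataset(rdd))
>
>   implicit def localSeqToDatasetHolder[T : Encoder](s: Seq[T]): DatasetHolder[T] =
>     DatasetHolder(_sqlContext.createDataset(s))
> }
> {code}
> With a split along these lines, the library example above would only need a static import such as {{import StaticImplicits._}}, with no SparkSession in scope.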



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@spark.apache.org
For additional commands, e-mail: issues-help@spark.apache.org