Posted to issues@spark.apache.org by "Russell Spitzer (JIRA)" <ji...@apache.org> on 2019/04/01 19:49:00 UTC

[jira] [Comment Edited] (SPARK-25003) Pyspark Does not use Spark Sql Extensions

    [ https://issues.apache.org/jira/browse/SPARK-25003?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16807121#comment-16807121 ] 

Russell Spitzer edited comment on SPARK-25003 at 4/1/19 7:48 PM:
-----------------------------------------------------------------

There was no interest in putting this into OSS 2.4 and 2.2, but I did do this backport for the Datastax Distribution of Spark 2.4, and I can report that it is a relatively simple and straightforward process.


was (Author: rspitzer):
There was no interest in putting this into OSS 2.4, but I did do this backport for the Datastax Distribution of Spark 2.4, and I can report that it is a relatively simple and straightforward process.

> Pyspark Does not use Spark Sql Extensions
> -----------------------------------------
>
>                 Key: SPARK-25003
>                 URL: https://issues.apache.org/jira/browse/SPARK-25003
>             Project: Spark
>          Issue Type: Bug
>          Components: PySpark
>    Affects Versions: 2.2.2, 2.3.1
>            Reporter: Russell Spitzer
>            Assignee: Russell Spitzer
>            Priority: Major
>             Fix For: 3.0.0
>
>
> When creating a SparkSession here
> [https://github.com/apache/spark/blob/v2.2.2/python/pyspark/sql/session.py#L216]
> {code:python}
> if jsparkSession is None:
>   jsparkSession = self._jvm.SparkSession(self._jsc.sc())
> self._jsparkSession = jsparkSession
> {code}
> I believe it ends up calling the constructor here
> https://github.com/apache/spark/blob/v2.2.2/sql/core/src/main/scala/org/apache/spark/sql/SparkSession.scala#L85-L87
> {code:scala}
>   private[sql] def this(sc: SparkContext) {
>     this(sc, None, None, new SparkSessionExtensions)
>   }
> {code}
> This creates a new SparkSessionExtensions object and does not pick up extensions that may have been set in the config, the way the companion object's getOrCreate does.
> https://github.com/apache/spark/blob/v2.2.2/sql/core/src/main/scala/org/apache/spark/sql/SparkSession.scala#L928-L944
> {code:scala}
> //in getOrCreate
>         // Initialize extensions if the user has defined a configurator class.
>         val extensionConfOption = sparkContext.conf.get(StaticSQLConf.SPARK_SESSION_EXTENSIONS)
>         if (extensionConfOption.isDefined) {
>           val extensionConfClassName = extensionConfOption.get
>           try {
>             val extensionConfClass = Utils.classForName(extensionConfClassName)
>             val extensionConf = extensionConfClass.newInstance()
>               .asInstanceOf[SparkSessionExtensions => Unit]
>             extensionConf(extensions)
>           } catch {
>             // Ignore the error if we cannot find the class or when the class has the wrong type.
>             case e @ (_: ClassCastException |
>                       _: ClassNotFoundException |
>                       _: NoClassDefFoundError) =>
>               logWarning(s"Cannot use $extensionConfClassName to configure session extensions.", e)
>           }
>         }
> {code}
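> For context, here is a minimal sketch of how this surfaces from the Python side. It is illustrative only: com.example.MyExtensions is a hypothetical configurator class, not something from this issue, and I believe the Python-side builder ultimately lands on the same constructor as the direct call below.
> {code:python}
> from pyspark import SparkConf, SparkContext
> from pyspark.sql import SparkSession
>
> # "com.example.MyExtensions" is a hypothetical extensions configurator
> # registered through the static SQL config.
> conf = SparkConf().set("spark.sql.extensions", "com.example.MyExtensions")
> sc = SparkContext(conf=conf)
>
> # This runs the Python constructor shown above, which calls the plain JVM
> # constructor, so the configured extensions are never applied to the session.
> spark = SparkSession(sc)
> {code}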
> I think a quick fix would be to use the getOrCreate method from the companion object instead of calling the constructor directly with the SparkContext. Alternatively, we could fix this by ensuring that every constructor attempts to pick up custom extensions when they are set.
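> To make the first option concrete, here is a rough sketch of what the change in python/pyspark/sql/session.py could look like. This is a sketch only, not a tested patch; it assumes the Scala companion object's builder is reachable through the py4j gateway the same way the constructor is today.
> {code:python}
> if jsparkSession is None:
>     # Going through the companion object's builder runs the extension-loading
>     # block from getOrCreate shown above, instead of the bare constructor.
>     jsparkSession = self._jvm.SparkSession.builder().getOrCreate()
> self._jsparkSession = jsparkSession
> {code}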



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@spark.apache.org
For additional commands, e-mail: issues-help@spark.apache.org