Posted to issues@spark.apache.org by "Sunitha Kambhampati (JIRA)" <ji...@apache.org> on 2019/04/01 18:53:00 UTC

[jira] [Commented] (SPARK-25003) Pyspark Does not use Spark Sql Extensions

    [ https://issues.apache.org/jira/browse/SPARK-25003?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16807073#comment-16807073 ] 

Sunitha Kambhampati commented on SPARK-25003:
---------------------------------------------

Hi [~hyukjin.kwon], [~rspitzer], the extension points functionality is not available from pyspark. Would it be possible to get this fix onto the v2.4 branch? Are there any issues with backporting it? Thank you so much.

> Pyspark Does not use Spark Sql Extensions
> -----------------------------------------
>
>                 Key: SPARK-25003
>                 URL: https://issues.apache.org/jira/browse/SPARK-25003
>             Project: Spark
>          Issue Type: Bug
>          Components: PySpark
>    Affects Versions: 2.2.2, 2.3.1
>            Reporter: Russell Spitzer
>            Assignee: Russell Spitzer
>            Priority: Major
>             Fix For: 3.0.0
>
>
> When creating a SparkSession here
> [https://github.com/apache/spark/blob/v2.2.2/python/pyspark/sql/session.py#L216]
> {code:python}
> if jsparkSession is None:
>   jsparkSession = self._jvm.SparkSession(self._jsc.sc())
> self._jsparkSession = jsparkSession
> {code}
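> To make the symptom concrete, here is a minimal sketch of what a user would see (com.example.MyExtensions is a hypothetical configurator class on the driver classpath, not something shipped with Spark):
> {code:python}
> from pyspark import SparkConf, SparkContext
> from pyspark.sql import SparkSession
>
> # Request a custom extensions configurator via the static SQL conf.
> conf = SparkConf().set("spark.sql.extensions", "com.example.MyExtensions")
> sc = SparkContext(conf=conf)
>
> # This goes through the constructor shown above: the JVM SparkSession is
> # built with a brand-new SparkSessionExtensions, so MyExtensions is never
> # instantiated and any rules it would inject are silently missing.
> spark = SparkSession(sc)
> {code}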
> I believe it ends up calling the constructor here
> https://github.com/apache/spark/blob/v2.2.2/sql/core/src/main/scala/org/apache/spark/sql/SparkSession.scala#L85-L87
> {code:scala}
>   private[sql] def this(sc: SparkContext) {
>     this(sc, None, None, new SparkSessionExtensions)
>   }
> {code}
> This creates a new SparkSessionExtensions object and does not pick up custom extensions that may have been set in the config, unlike the companion object's getOrCreate, which does:
> https://github.com/apache/spark/blob/v2.2.2/sql/core/src/main/scala/org/apache/spark/sql/SparkSession.scala#L928-L944
> {code:scala}
> //in getOrCreate
>         // Initialize extensions if the user has defined a configurator class.
>         val extensionConfOption = sparkContext.conf.get(StaticSQLConf.SPARK_SESSION_EXTENSIONS)
>         if (extensionConfOption.isDefined) {
>           val extensionConfClassName = extensionConfOption.get
>           try {
>             val extensionConfClass = Utils.classForName(extensionConfClassName)
>             val extensionConf = extensionConfClass.newInstance()
>               .asInstanceOf[SparkSessionExtensions => Unit]
>             extensionConf(extensions)
>           } catch {
>             // Ignore the error if we cannot find the class or when the class has the wrong type.
>             case e @ (_: ClassCastException |
>                       _: ClassNotFoundException |
>                       _: NoClassDefFoundError) =>
>               logWarning(s"Cannot use $extensionConfClassName to configure session extensions.", e)
>           }
>         }
> {code}
> I think a quick fix would be to use the getOrCreate method from the companion object instead of calling the constructor from the SparkContext. Or we could fix this by ensuring that all constructors attempt to pick up custom extensions if they are set.
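> One way to sketch the first option on the Python side (illustrative only, assuming the Py4J gateway resolves the Scala companion object's builder() through self._jvm.SparkSession, which the existing code already uses):
> {code:python}
> # In pyspark/sql/session.py, inside SparkSession.__init__
> if jsparkSession is None:
>     # Route through the companion object's builder/getOrCreate so the
>     # spark.sql.extensions configurator is loaded and applied, instead of
>     # constructing the JVM session with an empty SparkSessionExtensions.
>     jsparkSession = self._jvm.SparkSession.builder().getOrCreate()
> self._jsparkSession = jsparkSession
> {code}
> This would keep the extension-loading logic in one place (the Scala builder) instead of duplicating it in Python.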



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
