Posted to issues@spark.apache.org by "Vladimir Feinberg (JIRA)" <ji...@apache.org> on 2016/06/28 20:26:57 UTC

[jira] [Commented] (SPARK-16263) SparkSession caches configuration in an unintuitive global way

    [ https://issues.apache.org/jira/browse/SPARK-16263?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15353658#comment-15353658 ] 

Vladimir Feinberg commented on SPARK-16263:
-------------------------------------------

Right, I'm not arguing for the need for multiple sessions at once, but I think it's reasonable to expect this global state to have some notion of idempotency. Whatever we do, the restrictions on the use case must be enforced by the API. If I'm really only ever allowed to create a SparkSession once, then the builder should raise on the second attempt (and building a session should be a process independent of getOrCreate()-ing it).
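To make the first option concrete, here is a minimal sketch of that stricter contract. The names (StrictSessionBuilder, create) are hypothetical illustrations, not PySpark's actual API:

```python
class StrictSessionBuilder:
    """Hypothetical builder that refuses to build a second session."""

    _active = None  # module-level record of the one live session

    def __init__(self):
        self._options = {}

    def config(self, key, value):
        self._options[key] = value
        return self

    def create(self):
        # Raise instead of silently handing back the existing session
        # with its stale configuration.
        if StrictSessionBuilder._active is not None:
            raise RuntimeError("A session already exists; stop it first.")
        StrictSessionBuilder._active = dict(self._options)
        return StrictSessionBuilder._active
```

Under this contract the surprising behavior in the report below becomes a hard error rather than a silently reused config.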

On the other hand, if we're OK with the one-spark-session-at-a-time restriction (which the code is mostly in line with already), then it's just a matter of clearing the global variables on shutdown.
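A sketch of that second option, assuming stop() is made responsible for clearing the cached module-level state (again, illustrative names, not SparkSession's real internals):

```python
_instantiated_session = None  # module-level cache, analogous to SparkSession._instantiatedContext


def get_or_create(options=None):
    """Return the cached session, or build a fresh one from defaults."""
    global _instantiated_session
    if _instantiated_session is None:
        # Start from this builder's options only; nothing is inherited
        # from a previously stopped session.
        _instantiated_session = dict(options or {})
    return _instantiated_session


def stop():
    """Shut down the session and clear ALL global state, so the next
    get_or_create() starts from a clean slate."""
    global _instantiated_session
    _instantiated_session = None
```

With stop() clearing the cache, the reproduction below would print u'true' on the last line, since the third session no longer sees the second session's builder options.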

> SparkSession caches configuration in an unintuitive global way
> -------------------------------------------------------------
>
>                 Key: SPARK-16263
>                 URL: https://issues.apache.org/jira/browse/SPARK-16263
>             Project: Spark
>          Issue Type: Improvement
>          Components: PySpark
>            Reporter: Vladimir Feinberg
>            Priority: Minor
>
> The following use case demonstrates the issue. Note that as a workaround to SPARK-16262 I use {{reset_spark()}} to stop the current {{SparkSession}}.
> {code} 
> >>> from pyspark.sql import SparkSession
> >>> def reset_spark(): global spark; spark.stop(); SparkSession._instantiatedContext = None
> ... 
> >>> spark = SparkSession.builder.getOrCreate()
> Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
> Setting default log level to "WARN".
> To adjust logging level use sc.setLogLevel(newLevel).
> 16/06/28 11:41:36 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
> 16/06/28 11:41:36 WARN Utils: Your hostname, vlad-databricks resolves to a loopback address: 127.0.1.1; using 192.168.3.166 instead (on interface enp0s31f6)
> 16/06/28 11:41:36 WARN Utils: Set SPARK_LOCAL_IP if you need to bind to another address
> >>> spark.conf.get("spark.sql.retainGroupColumns")
> u'true'
> >>> reset_spark()
> >>> spark = SparkSession.builder.config("spark.sql.retainGroupColumns", "false").getOrCreate()
> >>> spark.conf.get("spark.sql.retainGroupColumns")
> u'false'
> >>> reset_spark()
> >>> spark = SparkSession.builder.getOrCreate()
> >>> spark.conf.get("spark.sql.retainGroupColumns")
> u'false'
> >>> 
> {code}
> The last line should output {{u'true'}} instead: there is no expectation that global config state persists across sessions. Each session should start from the default configuration unless its own builder deviates from it.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
