Posted to issues@spark.apache.org by "Rafal Wojdyla (Jira)" <ji...@apache.org> on 2022/04/14 09:08:00 UTC

[jira] [Commented] (SPARK-38438) Can't update spark.jars.packages on existing global/default context

    [ https://issues.apache.org/jira/browse/SPARK-38438?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17522176#comment-17522176 ] 

Rafal Wojdyla commented on SPARK-38438:
---------------------------------------

For posterity, see the context in https://lists.apache.org/thread/42rsmbyqc5p1zfv956rwz4wk9lhj4s6w.

[~srowen] thanks for the comment; feel free to close this issue if you believe there's no chance of getting this one in.

> Can't update spark.jars.packages on existing global/default context
> -------------------------------------------------------------------
>
>                 Key: SPARK-38438
>                 URL: https://issues.apache.org/jira/browse/SPARK-38438
>             Project: Spark
>          Issue Type: Improvement
>          Components: PySpark, Spark Core
>    Affects Versions: 3.2.1
>         Environment: py: 3.9
> spark: 3.2.1
>            Reporter: Rafal Wojdyla
>            Priority: Minor
>
> Reproduction:
> {code:python}
> from pyspark.sql import SparkSession
> # default session:
> s = SparkSession.builder.getOrCreate()
> # later on we want to update jars.packages, here's e.g. spark-hats
> s = (SparkSession.builder
>      .config("spark.jars.packages", "za.co.absa:spark-hats_2.12:0.2.2")
>      .getOrCreate())
> # the line below returns None; the config was not propagated:
> s._sc._conf.get("spark.jars.packages")
> {code}
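> For contrast, a minimal sketch (an assumption for illustration: a fresh Python process in which no session exists yet) showing that the same config does propagate when it is set before the first {{getOrCreate()}}:
> {code:python}
> from pyspark.sql import SparkSession
> # fresh process: no global session exists yet, so the builder's config is
> # applied when the JVM-side context is created
> s = (SparkSession.builder
>      .config("spark.jars.packages", "za.co.absa:spark-hats_2.12:0.2.2")
>      .getOrCreate())
> # returns 'za.co.absa:spark-hats_2.12:0.2.2'; the package is downloaded
> # and added to the classpath
> s._sc._conf.get("spark.jars.packages")
> {code}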
> Stopping the context doesn't help; in fact it's even more confusing, because the configuration is updated but has no effect:
> {code:python}
> from pyspark.sql import SparkSession
> # default session:
> s = SparkSession.builder.getOrCreate()
> s.stop()
> s = (SparkSession.builder
>      .config("spark.jars.packages", "za.co.absa:spark-hats_2.12:0.2.2")
>      .getOrCreate())
> # now this line returns 'za.co.absa:spark-hats_2.12:0.2.2', but the new
> # context doesn't download the package, as it would if no global context had
> # existed before; the extra package is unusable: it's neither downloaded
> # nor added to the classpath.
> s._sc._conf.get("spark.jars.packages")
> {code}
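> One way to confirm the package was never actually materialized is to list the jars the JVM-side context knows about; a diagnostic sketch that goes through the internal {{_jsc}} handle (private API, may change between versions):
> {code:python}
> # listJars() returns the jars added to the Scala SparkContext; the
> # spark-hats jar is absent even though spark.jars.packages claims otherwise
> print(s._sc._jsc.sc().listJars())
> {code}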
> One workaround is to stop the context AND kill the JVM gateway, which seems to be a kind of hard reset:
> {code:python}
> from pyspark import SparkContext
> from pyspark.sql import SparkSession
> # default session:
> s = SparkSession.builder.getOrCreate()
> # Hard reset: stop the session, then tear down the py4j gateway so the
> # JVM process exits (these are private/internal attributes):
> s.stop()
> s._sc._gateway.shutdown()
> s._sc._gateway.proc.stdin.close()
> # clear the cached gateway/JVM handles so the next getOrCreate() starts fresh:
> SparkContext._gateway = None
> SparkContext._jvm = None
> s = (SparkSession.builder
>      .config("spark.jars.packages", "za.co.absa:spark-hats_2.12:0.2.2")
>      .getOrCreate())
> # Now we are guaranteed a brand-new JVM and Spark session; the package
> # is downloaded, added to the classpath, etc.
> {code}
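> For reuse, the hard reset can be wrapped in a small helper; a sketch, where {{hard_reset}} is a hypothetical name and the body relies on the same private attributes as above, so it may break across PySpark versions:
> {code:python}
> from pyspark import SparkContext
> from pyspark.sql import SparkSession
>
> def hard_reset(session: SparkSession) -> None:
>     # stop the session, then tear down the py4j gateway so the JVM exits
>     session.stop()
>     session._sc._gateway.shutdown()
>     session._sc._gateway.proc.stdin.close()
>     # clear the cached handles so the next getOrCreate() launches a fresh JVM
>     SparkContext._gateway = None
>     SparkContext._jvm = None
> {code}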


