Posted to issues@spark.apache.org by "Rafal Wojdyla (Jira)" <ji...@apache.org> on 2022/03/07 22:56:00 UTC

[jira] [Created] (SPARK-38438) Can't update spark.jars.packages on existing global/default context

Rafal Wojdyla created SPARK-38438:
-------------------------------------

             Summary: Can't update spark.jars.packages on existing global/default context
                 Key: SPARK-38438
                 URL: https://issues.apache.org/jira/browse/SPARK-38438
             Project: Spark
          Issue Type: New Feature
          Components: PySpark, Spark Core
    Affects Versions: 3.2.1
         Environment: py: 3.9
spark: 3.2.1
            Reporter: Rafal Wojdyla


Reproduction:

{code:python}
from pyspark import SparkConf
from pyspark.sql import SparkSession

# default session:
s = SparkSession.builder.getOrCreate()

# later on we want to update spark.jars.packages, e.g. to add spark-hats:
s = (SparkSession.builder
     .config("spark.jars.packages", "za.co.absa:spark-hats_2.12:0.2.2")
     .getOrCreate())

# the line below returns None; the config was not propagated:
s._sc._conf.get("spark.jars.packages")
{code}
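
For comparison, if the config is set before any session/JVM exists, the package is resolved as expected (a minimal sketch, assuming a fresh Python process with no prior Spark session):

{code:python}
from pyspark.sql import SparkSession

# fresh process: no session/JVM yet, so the builder config is honored and the
# package is downloaded and added to the classpath at context creation
s = (SparkSession.builder
     .config("spark.jars.packages", "za.co.absa:spark-hats_2.12:0.2.2")
     .getOrCreate())

# returns 'za.co.absa:spark-hats_2.12:0.2.2':
s._sc._conf.get("spark.jars.packages")
{code}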

Stopping the context doesn't help; in fact it's even more confusing, because the configuration is updated but has no effect:

{code:python}
from pyspark import SparkConf
from pyspark.sql import SparkSession

# default session:
s = SparkSession.builder.getOrCreate()

s.stop()

s = (SparkSession.builder
     .config("spark.jars.packages", "za.co.absa:spark-hats_2.12:0.2.2")
     .getOrCreate())

# now this line returns 'za.co.absa:spark-hats_2.12:0.2.2', but the context
# doesn't download the jar/package, as it would if there was no global context,
# so the extra package is unusable: it's neither downloaded nor added to the
# classpath.
s._sc._conf.get("spark.jars.packages")
{code}
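
The reason seems to be that spark.jars.packages is only acted on when the JVM / py4j gateway is launched; stopping the SparkContext leaves the gateway, and hence the already-built classpath, in place. A quick way to see that (a sketch relying on PySpark internals):

{code:python}
from pyspark import SparkContext

# even after s.stop(), the py4j gateway and JVM handles are still cached on the
# SparkContext class, so a new context reuses the existing JVM and never
# re-runs the package resolution that happens at JVM launch:
print(SparkContext._gateway)  # still a JavaGateway instance
print(SparkContext._jvm)      # still a JVMView instance
{code}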

One workaround is to stop the context AND kill the JVM gateway, which seems to be a kind of hard reset:

{code:python}
from pyspark import SparkConf, SparkContext
from pyspark.sql import SparkSession

# default session:
s = SparkSession.builder.getOrCreate()

# Hard reset:
s.stop()
s._sc._gateway.shutdown()
SparkContext._gateway = None
SparkContext._jvm = None

s = (SparkSession.builder
     .config("spark.jars.packages", "za.co.absa:spark-hats_2.12:0.2.2")
     .getOrCreate())

# Now we are guaranteed to get a new Spark session, and the packages
# are downloaded and added to the classpath, etc.
{code}
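
For convenience, the hard reset could be wrapped in a small helper (a sketch only; restart_session_with_packages is a hypothetical name, and it relies on PySpark internals, so it may break across versions):

{code:python}
from pyspark import SparkContext
from pyspark.sql import SparkSession


def restart_session_with_packages(packages: str) -> SparkSession:
    """Hypothetical helper: hard-reset the py4j gateway so that
    spark.jars.packages is honored, then build a fresh session."""
    active = SparkSession.getActiveSession()
    if active is not None:
        sc = active.sparkContext
        active.stop()
        if sc._gateway is not None:
            sc._gateway.shutdown()
        SparkContext._gateway = None
        SparkContext._jvm = None
    return (SparkSession.builder
            .config("spark.jars.packages", packages)
            .getOrCreate())


s = restart_session_with_packages("za.co.absa:spark-hats_2.12:0.2.2")
{code}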



--
This message was sent by Atlassian Jira
(v8.20.1#820001)
