You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@spark.apache.org by we...@apache.org on 2023/03/14 10:33:48 UTC

[spark] branch master updated: [SPARK-42732][PYSPARK][CONNECT] Support spark connect session getActiveSession method

This is an automated email from the ASF dual-hosted git repository.

weichenxu123 pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
     new 753864fedee [SPARK-42732][PYSPARK][CONNECT] Support spark connect session getActiveSession method
753864fedee is described below

commit 753864fedee62f638354040063d95b2b3ba93d46
Author: Weichen Xu <we...@databricks.com>
AuthorDate: Tue Mar 14 18:33:24 2023 +0800

    [SPARK-42732][PYSPARK][CONNECT] Support spark connect session getActiveSession method
    
    ### What changes were proposed in this pull request?
    
    Support spark connect session getActiveSession method.
    
    Spark connect ML needs this API to get active session in some cases (e.g. fetching model attributes from server side).
    
    ### Why are the changes needed?
    
    Manually.
    
    ### Does this PR introduce _any_ user-facing change?
    
    Yes. Implemented `pyspark.sql.connect.session.SparkSession.getActiveSession` API.
    
    ### How was this patch tested?
    
    N/A
    
    Closes #40353 from WeichenXu123/spark-connect-get-active-session.
    
    Authored-by: Weichen Xu <we...@databricks.com>
    Signed-off-by: Weichen Xu <we...@databricks.com>
---
 python/pyspark/sql/connect/session.py | 13 ++++++++++++-
 1 file changed, 12 insertions(+), 1 deletion(-)

diff --git a/python/pyspark/sql/connect/session.py b/python/pyspark/sql/connect/session.py
index 5e7c8361d80..ffa139eba3e 100644
--- a/python/pyspark/sql/connect/session.py
+++ b/python/pyspark/sql/connect/session.py
@@ -74,6 +74,11 @@ if TYPE_CHECKING:
     from pyspark.sql.connect.udf import UDFRegistration
 
 
+# `_active_spark_session` stores the active spark connect session created by
+# `SparkSession.builder.getOrCreate`. It is used by ML code.
+_active_spark_session = None
+
+
 class SparkSession:
     class Builder:
         """Builder for :class:`SparkSession`."""
@@ -119,7 +124,11 @@ class SparkSession:
             raise NotImplementedError("enableHiveSupport not implemented for Spark Connect")
 
         def getOrCreate(self) -> "SparkSession":
-            return SparkSession(connectionString=self._options["spark.remote"])
+            global _active_spark_session
+            if _active_spark_session is not None:
+                return _active_spark_session
+            _active_spark_session = SparkSession(connectionString=self._options["spark.remote"])
+            return _active_spark_session
 
     _client: SparkConnectClient
 
@@ -434,7 +443,9 @@ class SparkSession:
         # specifically in Spark Connect the Spark Connect server is designed for
         # multi-tenancy - the remote client side cannot just stop the server and stop
         # other remote clients being used from other users.
+        global _active_spark_session
         self.client.close()
+        _active_spark_session = None
 
         if "SPARK_LOCAL_REMOTE" in os.environ:
             # When local mode is in use, follow the regular Spark session's


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@spark.apache.org
For additional commands, e-mail: commits-help@spark.apache.org