You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@spark.apache.org by huaxingao <gi...@git.apache.org> on 2018/08/31 00:53:43 UTC

[GitHub] spark pull request #22295: [SPARK-25255][PYTHON]Add getActiveSession to Spar...

GitHub user huaxingao opened a pull request:

    https://github.com/apache/spark/pull/22295

    [SPARK-25255][PYTHON]Add getActiveSession to SparkSession in PySpark

    ## What changes were proposed in this pull request?
    
    add getActiveSession  in session.py
    
    ## How was this patch tested?
    
    add doctest


You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/huaxingao/spark spark25255

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/22295.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #22295
    
----
commit e9885b38e35dcfd01a40e43cf442beeaea226b98
Author: Huaxin Gao <hu...@...>
Date:   2018-08-31T00:50:23Z

    [SPARK-25255][PYTHON]Add getActiveSession to SparkSession in PySpark

----


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22295: [SPARK-25255][PYTHON]Add getActiveSession to SparkSessio...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22295
  
    Merged build finished. Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22295: [SPARK-25255][PYTHON]Add getActiveSession to SparkSessio...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/22295
  
    **[Test build #96833 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/96833/testReport)** for PR 22295 at commit [`b83cf8e`](https://github.com/apache/spark/commit/b83cf8ee68806a5fcab83d939d55f19f752e2a2a).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22295: [SPARK-25255][PYTHON]Add getActiveSession to SparkSessio...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22295
  
    Merged build finished. Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22295: [SPARK-25255][PYTHON]Add getActiveSession to SparkSessio...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22295
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/3609/
    Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22295: [SPARK-25255][PYTHON]Add getActiveSession to SparkSessio...

Posted by holdenk <gi...@git.apache.org>.
Github user holdenk commented on the issue:

    https://github.com/apache/spark/pull/22295
  
    LGTM except the 3.0 to 2.5 I'll change that during the merge.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #22295: [SPARK-25255][PYTHON]Add getActiveSession to Spar...

Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22295#discussion_r226184011
  
    --- Diff: python/pyspark/sql/functions.py ---
    @@ -2713,6 +2713,25 @@ def from_csv(col, schema, options={}):
         return Column(jc)
     
     
    +@since(3.0)
    +def _getActiveSession():
    --- End diff --
    
    I mean the function itself .. 


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #22295: [SPARK-25255][PYTHON]Add getActiveSession to Spar...

Posted by asfgit <gi...@git.apache.org>.
Github user asfgit closed the pull request at:

    https://github.com/apache/spark/pull/22295


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22295: [SPARK-25255][PYTHON]Add getActiveSession to SparkSessio...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22295
  
    Merged build finished. Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #22295: [SPARK-25255][PYTHON]Add getActiveSession to Spar...

Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22295#discussion_r224983437
  
    --- Diff: python/pyspark/sql/session.py ---
    @@ -252,6 +253,22 @@ def newSession(self):
             """
             return self.__class__(self._sc, self._jsparkSession.newSession())
     
    +    @since(3.0)
    --- End diff --
    
    Yes, at that time, 2.5 was targeted. Now 3.0 is targeted per https://github.com/apache/spark/commit/9bf397c0e45cb161f3f12f09bd2bf14ff96dc823


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #22295: [SPARK-25255][PYTHON]Add getActiveSession to Spar...

Posted by huaxingao <gi...@git.apache.org>.
Github user huaxingao commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22295#discussion_r226178191
  
    --- Diff: python/pyspark/sql/tests.py ---
    @@ -3863,6 +3863,145 @@ def test_jvm_default_session_already_set(self):
                 spark.stop()
     
     
    +class SparkSessionTests2(unittest.TestCase):
    +
    +    def test_active_session(self):
    +        spark = SparkSession.builder \
    +            .master("local") \
    +            .getOrCreate()
    +        try:
    +            activeSession = SparkSession.getActiveSession()
    +            df = activeSession.createDataFrame([(1, 'Alice')], ['age', 'name'])
    +            self.assertEqual(df.collect(), [Row(age=1, name=u'Alice')])
    +        finally:
    +            spark.stop()
    +
    +    def test_get_active_session_when_no_active_session(self):
    +        active = SparkSession.getActiveSession()
    +        self.assertEqual(active, None)
    +        spark = SparkSession.builder \
    +            .master("local") \
    +            .getOrCreate()
    +        active = SparkSession.getActiveSession()
    +        self.assertEqual(active, spark)
    +        spark.stop()
    +        active = SparkSession.getActiveSession()
    +        self.assertEqual(active, None)
    +
    +    def test_SparkSession(self):
    +        spark = SparkSession.builder \
    +            .master("local") \
    +            .config("some-config", "v2") \
    +            .getOrCreate()
    +        try:
    +            self.assertEqual(spark.conf.get("some-config"), "v2")
    +            self.assertEqual(spark.sparkContext._conf.get("some-config"), "v2")
    +            self.assertEqual(spark.version, spark.sparkContext.version)
    +            spark.sql("CREATE DATABASE test_db")
    +            spark.catalog.setCurrentDatabase("test_db")
    +            self.assertEqual(spark.catalog.currentDatabase(), "test_db")
    +            spark.sql("CREATE TABLE table1 (name STRING, age INT) USING parquet")
    +            self.assertEqual(spark.table("table1").columns, ['name', 'age'])
    +            self.assertEqual(spark.range(3).count(), 3)
    +        finally:
    +            spark.stop()
    +
    +    def test_global_default_session(self):
    +        spark = SparkSession.builder \
    +            .master("local") \
    +            .getOrCreate()
    +        try:
    +            self.assertEqual(SparkSession.builder.getOrCreate(), spark)
    +        finally:
    +            spark.stop()
    +
    +    def test_default_and_active_session(self):
    +        spark = SparkSession.builder \
    +            .master("local") \
    +            .getOrCreate()
    +        activeSession = spark._jvm.SparkSession.getActiveSession()
    +        defaultSession = spark._jvm.SparkSession.getDefaultSession()
    +        try:
    +            self.assertEqual(activeSession, defaultSession)
    +        finally:
    +            spark.stop()
    +
    +    def test_config_option_propagated_to_existing_SparkSession(self):
    --- End diff --
    
    Will change. Thanks!


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #22295: [SPARK-25255][PYTHON]Add getActiveSession to Spar...

Posted by huaxingao <gi...@git.apache.org>.
Github user huaxingao commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22295#discussion_r215022091
  
    --- Diff: python/pyspark/sql/session.py ---
    @@ -252,6 +252,16 @@ def newSession(self):
             """
             return self.__class__(self._sc, self._jsparkSession.newSession())
     
    +    @since(2.4)
    +    def getActiveSession(self):
    +        """
    +        Returns the active SparkSession for the current thread, returned by the builder.
    +        >>> s = spark.getActiveSession()
    +        >>> spark._jsparkSession.getDefaultSession().get().equals(s.get())
    --- End diff --
    
    @holdenk @felixcheung Thanks for the review. I will change this. 


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #22295: [SPARK-25255][PYTHON]Add getActiveSession to Spar...

Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22295#discussion_r224983406
  
    --- Diff: python/pyspark/sql/functions.py ---
    @@ -2633,6 +2633,23 @@ def sequence(start, stop, step=None):
                 _to_java_column(start), _to_java_column(stop), _to_java_column(step)))
     
     
    +@since(3.0)
    +def getActiveSession():
    +    """
    +    Returns the active SparkSession for the current thread
    +    """
    +    from pyspark.sql import SparkSession
    +    sc = SparkContext._active_spark_context
    --- End diff --
    
    Yea, we should match the behaviour with Scala side - that was my point essentially. The problem about the previous approach was that session was being handled within Python - I believe we will basically reuse JVM's session implementation rather than reimplementing the seperate Python session support within PySpark side.
    
    > What about if sc isNone we just return Nonesince we can't have an activeSession without an active SparkContext -- does that sound reasonable?
    
    In that case, I think we should follow Scala's behaviour.



---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22295: [SPARK-25255][PYTHON]Add getActiveSession to SparkSessio...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/22295
  
    **[Test build #97124 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/97124/testReport)** for PR 22295 at commit [`55f1b03`](https://github.com/apache/spark/commit/55f1b03d870a20a38f9590964f7717aaef1d0d4c).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22295: [SPARK-25255][PYTHON]Add getActiveSession to SparkSessio...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22295
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/97466/
    Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22295: [SPARK-25255][PYTHON]Add getActiveSession to SparkSessio...

Posted by holdenk <gi...@git.apache.org>.
Github user holdenk commented on the issue:

    https://github.com/apache/spark/pull/22295
  
    nvm, the merge script only triggers the edits if we have conflicts. If you can update 3.0 to 2.5 I'd be happy to merge.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22295: [SPARK-25255][PYTHON]Add getActiveSession to SparkSessio...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22295
  
    Merged build finished. Test FAILed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #22295: [SPARK-25255][PYTHON]Add getActiveSession to Spar...

Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22295#discussion_r221005236
  
    --- Diff: python/pyspark/sql/session.py ---
    @@ -231,6 +231,7 @@ def __init__(self, sparkContext, jsparkSession=None):
                     or SparkSession._instantiatedSession._sc._jsc is None:
                 SparkSession._instantiatedSession = self
                 self._jvm.SparkSession.setDefaultSession(self._jsparkSession)
    +            self._jvm.SparkSession.setActiveSession(self._jsparkSession)
    --- End diff --
    
    @huaxingao, can you check if the active session is set? for instance when we `createDataFrame`? From a cursory look, we are not setting it.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22295: [SPARK-25255][PYTHON]Add getActiveSession to SparkSessio...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22295
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/96707/
    Test FAILed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22295: [SPARK-25255][PYTHON]Add getActiveSession to SparkSessio...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22295
  
    Merged build finished. Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #22295: [SPARK-25255][PYTHON]Add getActiveSession to Spar...

Posted by huaxingao <gi...@git.apache.org>.
Github user huaxingao commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22295#discussion_r215022059
  
    --- Diff: python/pyspark/sql/session.py ---
    @@ -252,6 +252,16 @@ def newSession(self):
             """
             return self.__class__(self._sc, self._jsparkSession.newSession())
     
    +    @since(2.4)
    +    def getActiveSession(self):
    +        """
    +        Returns the active SparkSession for the current thread, returned by the builder.
    +        >>> s = spark.getActiveSession()
    +        >>> spark._jsparkSession.getDefaultSession().get().equals(s.get())
    +        True
    +        """
    +        return self._jsparkSession.getActiveSession()
    --- End diff --
    
    @HyukjinKwon Sorry for the late reply. Yes, this returns a JVM instance. 
    In the scala code, ```SparkSession.getActiveSession``` returns an ```Option[SparkSession]```
    I am not sure how to do a python equivalent of Scala's ```Option```. In the following code, is there a way to wrap the python ```session``` in else path to something equivalent of Scala's ```Option```?  If not, can I just return the python ```session```?
    ```
            if self._jsparkSession.getActiveSession() is None:
                return None
            else:
                return self.__class__(self._sc, self._jsparkSession.getActiveSession().get())
    ```



---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22295: [SPARK-25255][PYTHON]Add getActiveSession to SparkSessio...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22295
  
    Merged build finished. Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22295: [SPARK-25255][PYTHON]Add getActiveSession to SparkSessio...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/22295
  
    **[Test build #96707 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/96707/testReport)** for PR 22295 at commit [`c966846`](https://github.com/apache/spark/commit/c966846366d1f9a4deb2c7fde4485eac86a0610c).
     * This patch **fails PySpark unit tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22295: [SPARK-25255][PYTHON]Add getActiveSession to SparkSessio...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22295
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/2716/
    Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22295: [SPARK-25255][PYTHON]Add getActiveSession to SparkSessio...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/22295
  
    **[Test build #95509 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/95509/testReport)** for PR 22295 at commit [`89c3b44`](https://github.com/apache/spark/commit/89c3b44fa965dde7623d00596df25fd254806c9d).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22295: [SPARK-25255][PYTHON]Add getActiveSession to SparkSessio...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/22295
  
    **[Test build #95507 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/95507/testReport)** for PR 22295 at commit [`e9885b3`](https://github.com/apache/spark/commit/e9885b38e35dcfd01a40e43cf442beeaea226b98).
     * This patch **fails Python style tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #22295: [SPARK-25255][PYTHON]Add getActiveSession to Spar...

Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22295#discussion_r223173806
  
    --- Diff: python/pyspark/sql/session.py ---
    @@ -252,6 +255,20 @@ def newSession(self):
             """
             return self.__class__(self._sc, self._jsparkSession.newSession())
     
    +    @classmethod
    +    @since(2.5)
    +    def getActiveSession(cls):
    +        """
    +        Returns the active SparkSession for the current thread, returned by the builder.
    +        >>> s = SparkSession.getActiveSession()
    +        >>> l = [('Alice', 1)]
    +        >>> rdd = s.sparkContext.parallelize(l)
    +        >>> df = s.createDataFrame(rdd, ['name', 'age'])
    +        >>> df.select("age").collect()
    +        [Row(age=1)]
    +        """
    +        return cls._activeSession
    --- End diff --
    
    Yea, it should look like that


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22295: [SPARK-25255][PYTHON]Add getActiveSession to SparkSessio...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/22295
  
    **[Test build #96708 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/96708/testReport)** for PR 22295 at commit [`3e11d0a`](https://github.com/apache/spark/commit/3e11d0a239e645baf0abae4f855983ff2948002b).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22295: [SPARK-25255][PYTHON]Add getActiveSession to SparkSessio...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22295
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/3801/
    Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #22295: [SPARK-25255][PYTHON]Add getActiveSession to Spar...

Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22295#discussion_r221429200
  
    --- Diff: python/pyspark/sql/session.py ---
    @@ -252,6 +253,22 @@ def newSession(self):
             """
             return self.__class__(self._sc, self._jsparkSession.newSession())
     
    +    @since(2.5)
    +    def getActiveSession(self):
    --- End diff --
    
    Wait .. this should be class method. since the scala usage is `SparkSession. getActiveSession`


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22295: [SPARK-25255][PYTHON]Add getActiveSession to SparkSessio...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22295
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/97577/
    Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #22295: [SPARK-25255][PYTHON]Add getActiveSession to Spar...

Posted by holdenk <gi...@git.apache.org>.
Github user holdenk commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22295#discussion_r219551059
  
    --- Diff: python/pyspark/sql/session.py ---
    @@ -231,6 +231,7 @@ def __init__(self, sparkContext, jsparkSession=None):
                     or SparkSession._instantiatedSession._sc._jsc is None:
                 SparkSession._instantiatedSession = self
                 self._jvm.SparkSession.setDefaultSession(self._jsparkSession)
    +            self._jvm.SparkSession.setActiveSession(self._jsparkSession)
    --- End diff --
    
    Yes, that sounds like the right approach and I think we need that.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22295: [SPARK-25255][PYTHON]Add getActiveSession to SparkSessio...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22295
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/3360/
    Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #22295: [SPARK-25255][PYTHON]Add getActiveSession to Spar...

Posted by huaxingao <gi...@git.apache.org>.
Github user huaxingao commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22295#discussion_r225666954
  
    --- Diff: python/pyspark/sql/session.py ---
    @@ -231,6 +231,7 @@ def __init__(self, sparkContext, jsparkSession=None):
                     or SparkSession._instantiatedSession._sc._jsc is None:
                 SparkSession._instantiatedSession = self
                 self._jvm.SparkSession.setDefaultSession(self._jsparkSession)
    +            self._jvm.SparkSession.setActiveSession(self._jsparkSession)
    --- End diff --
    
    @holdenk @HyukjinKwon 
    Thanks for the comments. I looked the scala code, it ```setActiveSession``` in ```createDataFrame```. 
    ```
      def createDataFrame[A <: Product : TypeTag](rdd: RDD[A]): DataFrame = {
        SparkSession.setActiveSession(this)
        ...
      }
    ```
    I will do the same for python.
    ```
    def createDataFrame(self, data, schema=None, samplingRatio=None, verifySchema=True):
            SparkSession._activeSession = self
            self._jvm.SparkSession.setActiveSession(self._jsparkSession)
    ```
    Will also add a test


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #22295: [SPARK-25255][PYTHON]Add getActiveSession to Spar...

Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22295#discussion_r218370828
  
    --- Diff: python/pyspark/sql/tests.py ---
    @@ -3654,6 +3654,107 @@ def test_jvm_default_session_already_set(self):
                 spark.stop()
     
     
    +class SparkSessionTests2(ReusedSQLTestCase):
    --- End diff --
    
    Do we need to extend `ReusedSQLTestCase`? Looks we can just `unittest.TestCase`.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #22295: [SPARK-25255][PYTHON]Add getActiveSession to Spar...

Posted by holdenk <gi...@git.apache.org>.
Github user holdenk commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22295#discussion_r222700425
  
    --- Diff: python/pyspark/sql/session.py ---
    @@ -252,6 +255,20 @@ def newSession(self):
             """
             return self.__class__(self._sc, self._jsparkSession.newSession())
     
    +    @classmethod
    +    @since(2.5)
    +    def getActiveSession(cls):
    +        """
    +        Returns the active SparkSession for the current thread, returned by the builder.
    +        >>> s = SparkSession.getActiveSession()
    +        >>> l = [('Alice', 1)]
    +        >>> rdd = s.sparkContext.parallelize(l)
    +        >>> df = s.createDataFrame(rdd, ['name', 'age'])
    +        >>> df.select("age").collect()
    +        [Row(age=1)]
    +        """
    +        return cls._activeSession
    --- End diff --
    
    Do you mean in a multi-language notebook environment?


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #22295: [SPARK-25255][PYTHON]Add getActiveSession to Spar...

Posted by huaxingao <gi...@git.apache.org>.
Github user huaxingao commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22295#discussion_r226178054
  
    --- Diff: python/pyspark/sql/functions.py ---
    @@ -2713,6 +2713,25 @@ def from_csv(col, schema, options={}):
         return Column(jc)
     
     
    +@since(3.0)
    +def _getActiveSession():
    --- End diff --
    
    Do you mean the _ prefix or the function itself?


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22295: [SPARK-25255][PYTHON]Add getActiveSession to SparkSessio...

Posted by holdenk <gi...@git.apache.org>.
Github user holdenk commented on the issue:

    https://github.com/apache/spark/pull/22295
  
    Merged to master for 3.0. Thanks for fixing this @huaxingao :)


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #22295: [SPARK-25255][PYTHON]Add getActiveSession to Spar...

Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22295#discussion_r226165866
  
    --- Diff: python/pyspark/sql/functions.py ---
    @@ -2713,6 +2713,25 @@ def from_csv(col, schema, options={}):
         return Column(jc)
     
     
    +@since(3.0)
    +def _getActiveSession():
    --- End diff --
    
    eh.. why is it in functions.py?


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22295: [SPARK-25255][PYTHON]Add getActiveSession to SparkSessio...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/22295
  
    **[Test build #97577 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/97577/testReport)** for PR 22295 at commit [`94e3db0`](https://github.com/apache/spark/commit/94e3db0c0c9873daaca688c2a63f01420882692e).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22295: [SPARK-25255][PYTHON]Add getActiveSession to SparkSessio...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/22295
  
    **[Test build #97503 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/97503/testReport)** for PR 22295 at commit [`56282da`](https://github.com/apache/spark/commit/56282dab6c569c4eaa7e1217dff3f8c092563985).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22295: [SPARK-25255][PYTHON]Add getActiveSession to SparkSessio...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22295
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/95507/
    Test FAILed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22295: [SPARK-25255][PYTHON]Add getActiveSession to SparkSessio...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/22295
  
    **[Test build #95815 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/95815/testReport)** for PR 22295 at commit [`cd87f06`](https://github.com/apache/spark/commit/cd87f0648e4e782e76b7166a7b494a8a9266096a).
     * This patch **fails PySpark unit tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22295: [SPARK-25255][PYTHON]Add getActiveSession to SparkSessio...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22295
  
    Merged build finished. Test FAILed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22295: [SPARK-25255][PYTHON]Add getActiveSession to SparkSessio...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22295
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/95815/
    Test FAILed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22295: [SPARK-25255][PYTHON]Add getActiveSession to SparkSessio...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22295
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/95889/
    Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #22295: [SPARK-25255][PYTHON]Add getActiveSession to Spar...

Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22295#discussion_r221104383
  
    --- Diff: python/pyspark/sql/session.py ---
    @@ -231,6 +231,7 @@ def __init__(self, sparkContext, jsparkSession=None):
                     or SparkSession._instantiatedSession._sc._jsc is None:
                 SparkSession._instantiatedSession = self
                 self._jvm.SparkSession.setDefaultSession(self._jsparkSession)
    +            self._jvm.SparkSession.setActiveSession(self._jsparkSession)
    --- End diff --
    
    > When createDataFrame, we already have a session
    
    but wouldn't we not set the active session properly if session A sets an active session in `__init__`, and then session B sets an active session in `__init__`, and then session A calls `createDataFrame` ?


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22295: [SPARK-25255][PYTHON]Add getActiveSession to SparkSessio...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22295
  
    Merged build finished. Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22295: [SPARK-25255][PYTHON]Add getActiveSession to SparkSessio...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22295
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/97124/
    Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22295: [SPARK-25255][PYTHON]Add getActiveSession to SparkSessio...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/22295
  
    **[Test build #97124 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/97124/testReport)** for PR 22295 at commit [`55f1b03`](https://github.com/apache/spark/commit/55f1b03d870a20a38f9590964f7717aaef1d0d4c).


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22295: [SPARK-25255][PYTHON]Add getActiveSession to SparkSessio...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22295
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/96451/
    Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #22295: [SPARK-25255][PYTHON]Add getActiveSession to Spar...

Posted by huaxingao <gi...@git.apache.org>.
Github user huaxingao commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22295#discussion_r216115581
  
    --- Diff: python/pyspark/sql/session.py ---
    @@ -252,6 +252,16 @@ def newSession(self):
             """
             return self.__class__(self._sc, self._jsparkSession.newSession())
     
    +    @since(2.4)
    +    def getActiveSession(self):
    +        """
    +        Returns the active SparkSession for the current thread, returned by the builder.
    +        >>> s = spark.getActiveSession()
    +        >>> spark._jsparkSession.getDefaultSession().get().equals(s.get())
    +        True
    +        """
    +        return self._jsparkSession.getActiveSession()
    --- End diff --
    
    @HyukjinKwon I add a set of tests. Some of them are borrowed from ```SparkSessionBuilderSuite.scala```


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #22295: [SPARK-25255][PYTHON]Add getActiveSession to Spar...

Posted by holdenk <gi...@git.apache.org>.
Github user holdenk commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22295#discussion_r214410416
  
    --- Diff: python/pyspark/sql/session.py ---
    @@ -252,6 +252,16 @@ def newSession(self):
             """
             return self.__class__(self._sc, self._jsparkSession.newSession())
     
    +    @since(2.4)
    +    def getActiveSession(self):
    +        """
    +        Returns the active SparkSession for the current thread, returned by the builder.
    +        >>> s = spark.getActiveSession()
    +        >>> spark._jsparkSession.getDefaultSession().get().equals(s.get())
    --- End diff --
    
    So normally we try and have doc tests like these be examples of how the user should use this. So I would consider getting the active session and then doing something a normal user would with it (like paralleling a collection).


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22295: [SPARK-25255][PYTHON]Add getActiveSession to SparkSessio...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22295
  
    Merged build finished. Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22295: [SPARK-25255][PYTHON]Add getActiveSession to SparkSessio...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/22295
  
    **[Test build #97503 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/97503/testReport)** for PR 22295 at commit [`56282da`](https://github.com/apache/spark/commit/56282dab6c569c4eaa7e1217dff3f8c092563985).


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22295: [SPARK-25255][PYTHON]Add getActiveSession to SparkSessio...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/22295
  
    **[Test build #95889 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/95889/testReport)** for PR 22295 at commit [`2345e55`](https://github.com/apache/spark/commit/2345e559723183c9ad8f51c76d93f1992451fbf4).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22295: [SPARK-25255][PYTHON]Add getActiveSession to SparkSessio...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22295
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/95509/
    Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22295: [SPARK-25255][PYTHON]Add getActiveSession to SparkSessio...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22295
  
    Merged build finished. Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #22295: [SPARK-25255][PYTHON]Add getActiveSession to Spar...

Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22295#discussion_r226166020
  
    --- Diff: python/pyspark/sql/tests.py ---
    @@ -3863,6 +3863,145 @@ def test_jvm_default_session_already_set(self):
                 spark.stop()
     
     
    +class SparkSessionTests2(unittest.TestCase):
    +
    +    def test_active_session(self):
    +        spark = SparkSession.builder \
    +            .master("local") \
    +            .getOrCreate()
    +        try:
    +            activeSession = SparkSession.getActiveSession()
    +            df = activeSession.createDataFrame([(1, 'Alice')], ['age', 'name'])
    +            self.assertEqual(df.collect(), [Row(age=1, name=u'Alice')])
    +        finally:
    +            spark.stop()
    +
    +    def test_get_active_session_when_no_active_session(self):
    +        active = SparkSession.getActiveSession()
    +        self.assertEqual(active, None)
    +        spark = SparkSession.builder \
    +            .master("local") \
    +            .getOrCreate()
    +        active = SparkSession.getActiveSession()
    +        self.assertEqual(active, spark)
    +        spark.stop()
    +        active = SparkSession.getActiveSession()
    +        self.assertEqual(active, None)
    +
    +    def test_SparkSession(self):
    +        spark = SparkSession.builder \
    +            .master("local") \
    +            .config("some-config", "v2") \
    +            .getOrCreate()
    +        try:
    +            self.assertEqual(spark.conf.get("some-config"), "v2")
    +            self.assertEqual(spark.sparkContext._conf.get("some-config"), "v2")
    +            self.assertEqual(spark.version, spark.sparkContext.version)
    +            spark.sql("CREATE DATABASE test_db")
    +            spark.catalog.setCurrentDatabase("test_db")
    +            self.assertEqual(spark.catalog.currentDatabase(), "test_db")
    +            spark.sql("CREATE TABLE table1 (name STRING, age INT) USING parquet")
    +            self.assertEqual(spark.table("table1").columns, ['name', 'age'])
    +            self.assertEqual(spark.range(3).count(), 3)
    +        finally:
    +            spark.stop()
    +
    +    def test_global_default_session(self):
    +        spark = SparkSession.builder \
    +            .master("local") \
    +            .getOrCreate()
    +        try:
    +            self.assertEqual(SparkSession.builder.getOrCreate(), spark)
    +        finally:
    +            spark.stop()
    +
    +    def test_default_and_active_session(self):
    +        spark = SparkSession.builder \
    +            .master("local") \
    +            .getOrCreate()
    +        activeSession = spark._jvm.SparkSession.getActiveSession()
    +        defaultSession = spark._jvm.SparkSession.getDefaultSession()
    +        try:
    +            self.assertEqual(activeSession, defaultSession)
    +        finally:
    +            spark.stop()
    +
    +    def test_config_option_propagated_to_existing_SparkSession(self):
    +        session1 = SparkSession.builder \
    +            .master("local") \
    +            .config("spark-config1", "a") \
    +            .getOrCreate()
    +        self.assertEqual(session1.conf.get("spark-config1"), "a")
    +        session2 = SparkSession.builder \
    +            .config("spark-config1", "b") \
    +            .getOrCreate()
    +        try:
    +            self.assertEqual(session1, session2)
    +            self.assertEqual(session1.conf.get("spark-config1"), "b")
    +        finally:
    +            session1.stop()
    +
    +    def test_new_session(self):
    +        session = SparkSession.builder \
    +            .master("local") \
    +            .getOrCreate()
    +        newSession = session.newSession()
    +        try:
    +            self.assertNotEqual(session, newSession)
    +        finally:
    +            session.stop()
    +            newSession.stop()
    +
    +    def test_create_new_session_if_old_session_stopped(self):
    +        session = SparkSession.builder \
    +            .master("local") \
    +            .getOrCreate()
    +        session.stop()
    +        newSession = SparkSession.builder \
    +            .master("local") \
    +            .getOrCreate()
    +        try:
    +            self.assertNotEqual(session, newSession)
    +        finally:
    +            newSession.stop()
    +
    +    def test_active_session_with_None_and_not_None_context(self):
    +        from pyspark.context import SparkContext
    +        from pyspark.conf import SparkConf
    +        sc = SparkContext._active_spark_context
    +        self.assertEqual(sc, None)
    +        activeSession = SparkSession.getActiveSession()
    +        self.assertEqual(activeSession, None)
    +        sparkConf = SparkConf()
    +        sc = SparkContext.getOrCreate(sparkConf)
    +        activeSession = sc._jvm.SparkSession.getActiveSession()
    +        self.assertFalse(activeSession.isDefined())
    +        session = SparkSession(sc)
    +        activeSession = sc._jvm.SparkSession.getActiveSession()
    +        self.assertTrue(activeSession.isDefined())
    +        activeSession2 = SparkSession.getActiveSession()
    +        self.assertNotEqual(activeSession2, None)
    +
    +
    +class SparkSessionTests3(ReusedSQLTestCase):
    +
    +    def test_get_active_session_after_create_dataframe(self):
    +        activeSession1 = SparkSession.getActiveSession()
    +        session1 = self.spark
    +        self.assertEqual(session1, activeSession1)
    +        session2 = self.spark.newSession()
    +        activeSession2 = SparkSession.getActiveSession()
    +        self.assertEqual(session1, activeSession2)
    +        self.assertNotEqual(session2, activeSession2)
    +        session2.createDataFrame([(1, 'Alice')], ['age', 'name'])
    +        activeSession3 = SparkSession.getActiveSession()
    +        self.assertEqual(session2, activeSession3)
    +        session1.createDataFrame([(1, 'Alice')], ['age', 'name'])
    +        activeSession4 = SparkSession.getActiveSession()
    +        self.assertEqual(session1, activeSession4)
    +        session2.stop()
    --- End diff --
    
    I think you can put this in try-finally


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22295: [SPARK-25255][PYTHON]Add getActiveSession to SparkSessio...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22295
  
    Merged build finished. Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #22295: [SPARK-25255][PYTHON]Add getActiveSession to Spar...

Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22295#discussion_r218370450
  
    --- Diff: python/pyspark/sql/tests.py ---
    @@ -3654,6 +3654,107 @@ def test_jvm_default_session_already_set(self):
                 spark.stop()
     
     
    +class SparkSessionTests2(ReusedSQLTestCase):
    +
    +    def test_active_session(self):
    +        spark = SparkSession.builder \
    +            .master("local") \
    +            .getOrCreate()
    +        try:
    +            activeSession = spark.getActiveSession()
    +            df = activeSession.createDataFrame([(1, 'Alice')], ['age', 'name'])
    +            self.assertEqual(df.collect(), [Row(age=1, name=u'Alice')])
    +        finally:
    +            spark.stop()
    +
    +    def test_SparkSession(self):
    +        spark = SparkSession.builder \
    +            .master("local") \
    +            .config("some-config", "v2") \
    +            .getOrCreate()
    +        try:
    +            self.assertEqual(spark.conf.get("some-config"), "v2")
    +            self.assertEqual(spark.sparkContext._conf.get("some-config"), "v2")
    +            self.assertEqual(spark.version, spark.sparkContext.version)
    +            spark.sql("CREATE DATABASE test_db")
    +            spark.catalog.setCurrentDatabase("test_db")
    +            self.assertEqual(spark.catalog.currentDatabase(), "test_db")
    +            spark.sql("CREATE TABLE table1 (name STRING, age INT) USING parquet")
    +            self.assertEqual(spark.table("table1").columns, ['name', 'age'])
    +            self.assertEqual(spark.range(3).count(), 3)
    +        finally:
    +            spark.stop()
    +
    +    def test_global_default_session(self):
    +        spark = SparkSession.builder \
    +            .master("local") \
    +            .getOrCreate()
    +        try:
    +            self.assertEqual(SparkSession.builder.getOrCreate(), spark)
    +        finally:
    +            spark.stop()
    +
    +    def test_default_and_active_session(self):
    +        spark = SparkSession.builder \
    +            .master("local") \
    +            .getOrCreate()
    +        activeSession = spark._jvm.SparkSession.getActiveSession()
    +        defaultSession = spark._jvm.SparkSession.getDefaultSession()
    +        try:
    +            self.assertEqual(activeSession, defaultSession)
    +        finally:
    +            spark.stop()
    +
    +    def test_config_option_propagated_to_existing_SparkSession(self):
    +        session1 = SparkSession.builder \
    +            .master("local") \
    +            .config("spark-config1", "a") \
    +            .getOrCreate()
    +        self.assertEqual(session1.conf.get("spark-config1"), "a")
    +        session2 = SparkSession.builder \
    +            .config("spark-config1", "b") \
    +            .getOrCreate()
    +        try:
    +            self.assertEqual(session1, session2)
    +            self.assertEqual(session1.conf.get("spark-config1"), "b")
    +        finally:
    +            session1.stop()
    +
    +    def test_newSession(self):
    +        session = SparkSession.builder \
    +            .master("local") \
    +            .getOrCreate()
    +        newSession = session.newSession()
    +        try:
    +            self.assertNotEqual(session, newSession)
    +        finally:
    +            session.stop()
    +            newSession.stop()
    +
    +    def test_create_new_session_if_old_session_stopped(self):
    +        session = SparkSession.builder \
    +            .master("local") \
    +            .getOrCreate()
    +        session.stop()
    +        newSession = SparkSession.builder \
    +            .master("local") \
    +            .getOrCreate()
    +        try:
    +            self.assertNotEqual(session, newSession)
    +        finally:
    +            newSession.stop()
    +
    +    def test_create_SparkContext_then_SparkSession(self):
    --- End diff --
    
    nit: let's just name it `spark_context` and `spark_session`


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22295: [SPARK-25255][PYTHON]Add getActiveSession to SparkSessio...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/22295
  
    **[Test build #97502 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/97502/testReport)** for PR 22295 at commit [`7c6d2d5`](https://github.com/apache/spark/commit/7c6d2d57363c0cef2eeb80228cc2a2f14ec9b226).
     * This patch **fails RAT tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #22295: [SPARK-25255][PYTHON]Add getActiveSession to Spar...

Posted by holdenk <gi...@git.apache.org>.
Github user holdenk commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22295#discussion_r217771519
  
    --- Diff: python/pyspark/sql/session.py ---
    @@ -231,6 +231,7 @@ def __init__(self, sparkContext, jsparkSession=None):
                     or SparkSession._instantiatedSession._sc._jsc is None:
                 SparkSession._instantiatedSession = self
                 self._jvm.SparkSession.setDefaultSession(self._jsparkSession)
    +            self._jvm.SparkSession.setActiveSession(self._jsparkSession)
    --- End diff --
    
    Thanks for catching this! Filed a follow up https://issues.apache.org/jira/browse/SPARK-25432


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #22295: [SPARK-25255][PYTHON]Add getActiveSession to Spar...

Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22295#discussion_r221429353
  
    --- Diff: python/pyspark/sql/session.py ---
    @@ -252,6 +253,22 @@ def newSession(self):
             """
             return self.__class__(self._sc, self._jsparkSession.newSession())
     
    +    @since(2.5)
    +    def getActiveSession(self):
    --- End diff --
    
    I think the class method should initialize JVM if non existent (see functions.py). Probably Spark context too. If exists, it should use the existing one.
    
    Also, let's define this as a property since that's closer to Scala's usage.
    
    I know it's difficult to define a static property. You can refer https://github.com/graphframes/graphframes/pull/169/files#diff-e81e6b169c0aa35012a3263b2f31b330R381 or we should consider adding this as a function
    



---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22295: [SPARK-25255][PYTHON]Add getActiveSession to SparkSessio...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22295
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/97503/
    Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22295: [SPARK-25255][PYTHON]Add getActiveSession to SparkSessio...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22295
  
    Merged build finished. Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #22295: [SPARK-25255][PYTHON]Add getActiveSession to Spar...

Posted by holdenk <gi...@git.apache.org>.
Github user holdenk commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22295#discussion_r224860828
  
    --- Diff: python/pyspark/sql/session.py ---
    @@ -231,6 +231,7 @@ def __init__(self, sparkContext, jsparkSession=None):
                     or SparkSession._instantiatedSession._sc._jsc is None:
                 SparkSession._instantiatedSession = self
                 self._jvm.SparkSession.setDefaultSession(self._jsparkSession)
    +            self._jvm.SparkSession.setActiveSession(self._jsparkSession)
    --- End diff --
    
    If we're going to support this we should have test for it, or if we aren't going to support this right now we should document the behaviour.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22295: [SPARK-25255][PYTHON]Add getActiveSession to SparkSessio...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22295
  
    Merged build finished. Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22295: [SPARK-25255][PYTHON]Add getActiveSession to SparkSessio...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22295
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/4111/
    Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #22295: [SPARK-25255][PYTHON]Add getActiveSession to Spar...

Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22295#discussion_r222392450
  
    --- Diff: python/pyspark/sql/session.py ---
    @@ -252,6 +255,20 @@ def newSession(self):
             """
             return self.__class__(self._sc, self._jsparkSession.newSession())
     
    +    @classmethod
    +    @since(2.5)
    +    def getActiveSession(cls):
    +        """
    +        Returns the active SparkSession for the current thread, returned by the builder.
    +        >>> s = SparkSession.getActiveSession()
    +        >>> l = [('Alice', 1)]
    +        >>> rdd = s.sparkContext.parallelize(l)
    +        >>> df = s.createDataFrame(rdd, ['name', 'age'])
    +        >>> df.select("age").collect()
    +        [Row(age=1)]
    +        """
    +        return cls._activeSession
    --- End diff --
    
    The problem here is when we share single JVM like Zeppelin. It should get the session from JVM.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22295: [SPARK-25255][PYTHON]Add getActiveSession to SparkSessio...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22295
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/2985/
    Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22295: [SPARK-25255][PYTHON]Add getActiveSession to SparkSessio...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22295
  
    Merged build finished. Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22295: [SPARK-25255][PYTHON]Add getActiveSession to SparkSessio...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22295
  
    Merged build finished. Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #22295: [SPARK-25255][PYTHON]Add getActiveSession to Spar...

Posted by holdenk <gi...@git.apache.org>.
Github user holdenk commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22295#discussion_r224860350
  
    --- Diff: python/pyspark/sql/tests.py ---
    @@ -3654,6 +3654,109 @@ def test_jvm_default_session_already_set(self):
                 spark.stop()
     
     
    +class SparkSessionTests2(unittest.TestCase):
    +
    +    def test_active_session(self):
    +        spark = SparkSession.builder \
    +            .master("local") \
    +            .getOrCreate()
    +        try:
    +            activeSession = SparkSession.getActiveSession()
    +            df = activeSession.createDataFrame([(1, 'Alice')], ['age', 'name'])
    +            self.assertEqual(df.collect(), [Row(age=1, name=u'Alice')])
    +        finally:
    +            spark.stop()
    +
    +    def test_get_active_session_when_no_active_session(self):
    +        active = SparkSession.getActiveSession()
    +        self.assertEqual(active, None)
    +        spark = SparkSession.builder \
    +            .master("local") \
    +            .getOrCreate()
    +        active = SparkSession.getActiveSession()
    +        self.assertEqual(active, spark)
    +        spark.stop()
    +        active = SparkSession.getActiveSession()
    +        self.assertEqual(active, None)
    --- End diff --
    
    Given the change for how we construct the SparkSession can we add a test that makes sure we do whatever we decide to with the SparkContext?


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22295: [SPARK-25255][PYTHON]Add getActiveSession to SparkSessio...

Posted by huaxingao <gi...@git.apache.org>.
Github user huaxingao commented on the issue:

    https://github.com/apache/spark/pull/22295
  
    I just saw this fix [SPARK-25525][SQL][PYSPARK] Do not update conf for existing SparkContext in SparkSession.getOrCreate. #22545
    I will remove ```test_create_SparkContext_then_SparkSession``` 


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22295: [SPARK-25255][PYTHON]Add getActiveSession to SparkSessio...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/22295
  
    **[Test build #97502 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/97502/testReport)** for PR 22295 at commit [`7c6d2d5`](https://github.com/apache/spark/commit/7c6d2d57363c0cef2eeb80228cc2a2f14ec9b226).


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22295: [SPARK-25255][PYTHON]Add getActiveSession to SparkSessio...

Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on the issue:

    https://github.com/apache/spark/pull/22295
  
    @huaxingao, thanks for addressing comments. Would you mind rebasing it and resolving the conflicts?


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #22295: [SPARK-25255][PYTHON]Add getActiveSession to Spar...

Posted by holdenk <gi...@git.apache.org>.
Github user holdenk commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22295#discussion_r219551669
  
    --- Diff: python/pyspark/sql/tests.py ---
    @@ -3654,6 +3654,107 @@ def test_jvm_default_session_already_set(self):
                 spark.stop()
     
     
    +class SparkSessionTests2(ReusedSQLTestCase):
    --- End diff --
    
    @HyukjinKwon there's no strong need for it, however it does mean that the first `getOrCreate` will already have a session it can use, but given that we set up and tear down the session this may be less than ideal.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #22295: [SPARK-25255][PYTHON]Add getActiveSession to Spar...

Posted by holdenk <gi...@git.apache.org>.
Github user holdenk commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22295#discussion_r219552522
  
    --- Diff: python/pyspark/sql/tests.py ---
    @@ -3654,6 +3654,107 @@ def test_jvm_default_session_already_set(self):
                 spark.stop()
     
     
    +class SparkSessionTests2(ReusedSQLTestCase):
    +
    +    def test_active_session(self):
    +        spark = SparkSession.builder \
    +            .master("local") \
    +            .getOrCreate()
    +        try:
    +            activeSession = spark.getActiveSession()
    +            df = activeSession.createDataFrame([(1, 'Alice')], ['age', 'name'])
    +            self.assertEqual(df.collect(), [Row(age=1, name=u'Alice')])
    +        finally:
    +            spark.stop()
    +
    +    def test_SparkSession(self):
    +        spark = SparkSession.builder \
    +            .master("local") \
    +            .config("some-config", "v2") \
    +            .getOrCreate()
    +        try:
    +            self.assertEqual(spark.conf.get("some-config"), "v2")
    +            self.assertEqual(spark.sparkContext._conf.get("some-config"), "v2")
    +            self.assertEqual(spark.version, spark.sparkContext.version)
    +            spark.sql("CREATE DATABASE test_db")
    +            spark.catalog.setCurrentDatabase("test_db")
    +            self.assertEqual(spark.catalog.currentDatabase(), "test_db")
    +            spark.sql("CREATE TABLE table1 (name STRING, age INT) USING parquet")
    +            self.assertEqual(spark.table("table1").columns, ['name', 'age'])
    +            self.assertEqual(spark.range(3).count(), 3)
    +        finally:
    +            spark.stop()
    +
    +    def test_global_default_session(self):
    +        spark = SparkSession.builder \
    +            .master("local") \
    +            .getOrCreate()
    +        try:
    +            self.assertEqual(SparkSession.builder.getOrCreate(), spark)
    +        finally:
    +            spark.stop()
    +
    +    def test_default_and_active_session(self):
    +        spark = SparkSession.builder \
    +            .master("local") \
    +            .getOrCreate()
    +        activeSession = spark._jvm.SparkSession.getActiveSession()
    +        defaultSession = spark._jvm.SparkSession.getDefaultSession()
    +        try:
    +            self.assertEqual(activeSession, defaultSession)
    +        finally:
    +            spark.stop()
    +
    +    def test_config_option_propagated_to_existing_SparkSession(self):
    +        session1 = SparkSession.builder \
    +            .master("local") \
    +            .config("spark-config1", "a") \
    +            .getOrCreate()
    +        self.assertEqual(session1.conf.get("spark-config1"), "a")
    +        session2 = SparkSession.builder \
    +            .config("spark-config1", "b") \
    +            .getOrCreate()
    +        try:
    +            self.assertEqual(session1, session2)
    +            self.assertEqual(session1.conf.get("spark-config1"), "b")
    +        finally:
    +            session1.stop()
    +
    +    def test_newSession(self):
    +        session = SparkSession.builder \
    +            .master("local") \
    +            .getOrCreate()
    +        newSession = session.newSession()
    +        try:
    +            self.assertNotEqual(session, newSession)
    +        finally:
    +            session.stop()
    +            newSession.stop()
    +
    +    def test_create_new_session_if_old_session_stopped(self):
    +        session = SparkSession.builder \
    +            .master("local") \
    +            .getOrCreate()
    +        session.stop()
    +        newSession = SparkSession.builder \
    +            .master("local") \
    +            .getOrCreate()
    +        try:
    +            self.assertNotEqual(session, newSession)
    +        finally:
    +            newSession.stop()
    +
    +    def test_create_SparkContext_then_SparkSession(self):
    --- End diff --
    
    I don't strongly agree here. I think given that the method names are camel case in the `SparkSession` & `SparkContext` in Python this naming is perfectly reasonable.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22295: [SPARK-25255][PYTHON]Add getActiveSession to SparkSessio...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/22295
  
    **[Test build #95889 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/95889/testReport)** for PR 22295 at commit [`2345e55`](https://github.com/apache/spark/commit/2345e559723183c9ad8f51c76d93f1992451fbf4).


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22295: [SPARK-25255][PYTHON]Add getActiveSession to SparkSessio...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22295
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/3544/
    Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #22295: [SPARK-25255][PYTHON]Add getActiveSession to Spar...

Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22295#discussion_r220787563
  
    --- Diff: python/pyspark/sql/session.py ---
    @@ -252,6 +253,22 @@ def newSession(self):
             """
             return self.__class__(self._sc, self._jsparkSession.newSession())
     
    +    @since(3.0)
    --- End diff --
    
    Let's change this to 2.5


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22295: [SPARK-25255][PYTHON]Add getActiveSession to SparkSessio...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22295
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/96833/
    Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #22295: [SPARK-25255][PYTHON]Add getActiveSession to Spar...

Posted by huaxingao <gi...@git.apache.org>.
Github user huaxingao commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22295#discussion_r226178127
  
    --- Diff: python/pyspark/sql/tests.py ---
    @@ -3863,6 +3863,145 @@ def test_jvm_default_session_already_set(self):
                 spark.stop()
     
     
    +class SparkSessionTests2(unittest.TestCase):
    +
    +    def test_active_session(self):
    +        spark = SparkSession.builder \
    +            .master("local") \
    +            .getOrCreate()
    +        try:
    +            activeSession = SparkSession.getActiveSession()
    +            df = activeSession.createDataFrame([(1, 'Alice')], ['age', 'name'])
    +            self.assertEqual(df.collect(), [Row(age=1, name=u'Alice')])
    +        finally:
    +            spark.stop()
    +
    +    def test_get_active_session_when_no_active_session(self):
    +        active = SparkSession.getActiveSession()
    +        self.assertEqual(active, None)
    +        spark = SparkSession.builder \
    +            .master("local") \
    +            .getOrCreate()
    +        active = SparkSession.getActiveSession()
    +        self.assertEqual(active, spark)
    +        spark.stop()
    +        active = SparkSession.getActiveSession()
    +        self.assertEqual(active, None)
    +
    +    def test_SparkSession(self):
    +        spark = SparkSession.builder \
    +            .master("local") \
    +            .config("some-config", "v2") \
    +            .getOrCreate()
    +        try:
    +            self.assertEqual(spark.conf.get("some-config"), "v2")
    +            self.assertEqual(spark.sparkContext._conf.get("some-config"), "v2")
    +            self.assertEqual(spark.version, spark.sparkContext.version)
    +            spark.sql("CREATE DATABASE test_db")
    +            spark.catalog.setCurrentDatabase("test_db")
    +            self.assertEqual(spark.catalog.currentDatabase(), "test_db")
    +            spark.sql("CREATE TABLE table1 (name STRING, age INT) USING parquet")
    +            self.assertEqual(spark.table("table1").columns, ['name', 'age'])
    +            self.assertEqual(spark.range(3).count(), 3)
    +        finally:
    +            spark.stop()
    +
    +    def test_global_default_session(self):
    +        spark = SparkSession.builder \
    +            .master("local") \
    +            .getOrCreate()
    +        try:
    +            self.assertEqual(SparkSession.builder.getOrCreate(), spark)
    +        finally:
    +            spark.stop()
    +
    +    def test_default_and_active_session(self):
    +        spark = SparkSession.builder \
    +            .master("local") \
    +            .getOrCreate()
    +        activeSession = spark._jvm.SparkSession.getActiveSession()
    +        defaultSession = spark._jvm.SparkSession.getDefaultSession()
    +        try:
    +            self.assertEqual(activeSession, defaultSession)
    +        finally:
    +            spark.stop()
    +
    +    def test_config_option_propagated_to_existing_SparkSession(self):
    +        session1 = SparkSession.builder \
    +            .master("local") \
    +            .config("spark-config1", "a") \
    +            .getOrCreate()
    +        self.assertEqual(session1.conf.get("spark-config1"), "a")
    +        session2 = SparkSession.builder \
    +            .config("spark-config1", "b") \
    +            .getOrCreate()
    +        try:
    +            self.assertEqual(session1, session2)
    +            self.assertEqual(session1.conf.get("spark-config1"), "b")
    +        finally:
    +            session1.stop()
    +
    +    def test_new_session(self):
    +        session = SparkSession.builder \
    +            .master("local") \
    +            .getOrCreate()
    +        newSession = session.newSession()
    +        try:
    +            self.assertNotEqual(session, newSession)
    +        finally:
    +            session.stop()
    +            newSession.stop()
    +
    +    def test_create_new_session_if_old_session_stopped(self):
    +        session = SparkSession.builder \
    +            .master("local") \
    +            .getOrCreate()
    +        session.stop()
    +        newSession = SparkSession.builder \
    +            .master("local") \
    +            .getOrCreate()
    +        try:
    +            self.assertNotEqual(session, newSession)
    +        finally:
    +            newSession.stop()
    +
    +    def test_active_session_with_None_and_not_None_context(self):
    +        from pyspark.context import SparkContext
    +        from pyspark.conf import SparkConf
    +        sc = SparkContext._active_spark_context
    +        self.assertEqual(sc, None)
    +        activeSession = SparkSession.getActiveSession()
    +        self.assertEqual(activeSession, None)
    +        sparkConf = SparkConf()
    +        sc = SparkContext.getOrCreate(sparkConf)
    +        activeSession = sc._jvm.SparkSession.getActiveSession()
    +        self.assertFalse(activeSession.isDefined())
    +        session = SparkSession(sc)
    +        activeSession = sc._jvm.SparkSession.getActiveSession()
    +        self.assertTrue(activeSession.isDefined())
    +        activeSession2 = SparkSession.getActiveSession()
    +        self.assertNotEqual(activeSession2, None)
    +
    +
    +class SparkSessionTests3(ReusedSQLTestCase):
    +
    +    def test_get_active_session_after_create_dataframe(self):
    +        activeSession1 = SparkSession.getActiveSession()
    +        session1 = self.spark
    +        self.assertEqual(session1, activeSession1)
    +        session2 = self.spark.newSession()
    +        activeSession2 = SparkSession.getActiveSession()
    +        self.assertEqual(session1, activeSession2)
    +        self.assertNotEqual(session2, activeSession2)
    +        session2.createDataFrame([(1, 'Alice')], ['age', 'name'])
    +        activeSession3 = SparkSession.getActiveSession()
    +        self.assertEqual(session2, activeSession3)
    +        session1.createDataFrame([(1, 'Alice')], ['age', 'name'])
    +        activeSession4 = SparkSession.getActiveSession()
    +        self.assertEqual(session1, activeSession4)
    +        session2.stop()
    --- End diff --
    
    Will change. Thanks!


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22295: [SPARK-25255][PYTHON]Add getActiveSession to SparkSessio...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22295
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/3019/
    Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22295: [SPARK-25255][PYTHON]Add getActiveSession to SparkSessio...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22295
  
    Merged build finished. Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #22295: [SPARK-25255][PYTHON]Add getActiveSession to Spar...

Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22295#discussion_r216553283
  
    --- Diff: python/pyspark/sql/session.py ---
    @@ -252,6 +253,22 @@ def newSession(self):
             """
             return self.__class__(self._sc, self._jsparkSession.newSession())
     
    +    @since(2.4)
    --- End diff --
    
    @huaxingao, let's target this 3.0.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #22295: [SPARK-25255][PYTHON]Add getActiveSession to Spar...

Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22295#discussion_r214237818
  
    --- Diff: python/pyspark/sql/session.py ---
    @@ -252,6 +252,16 @@ def newSession(self):
             """
             return self.__class__(self._sc, self._jsparkSession.newSession())
     
    +    @since(2.4)
    +    def getActiveSession(self):
    +        """
    +        Returns the active SparkSession for the current thread, returned by the builder.
    +        >>> s = spark.getActiveSession()
    +        >>> spark._jsparkSession.getDefaultSession().get().equals(s.get())
    +        True
    +        """
    +        return self._jsparkSession.getActiveSession()
    --- End diff --
    
    Does this return JVM instance?


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22295: [SPARK-25255][PYTHON]Add getActiveSession to SparkSessio...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/22295
  
    **[Test build #95815 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/95815/testReport)** for PR 22295 at commit [`cd87f06`](https://github.com/apache/spark/commit/cd87f0648e4e782e76b7166a7b494a8a9266096a).


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22295: [SPARK-25255][PYTHON]Add getActiveSession to SparkSessio...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22295
  
    Merged build finished. Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22295: [SPARK-25255][PYTHON]Add getActiveSession to SparkSessio...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22295
  
    Merged build finished. Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22295: [SPARK-25255][PYTHON]Add getActiveSession to SparkSessio...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22295
  
    Merged build finished. Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22295: [SPARK-25255][PYTHON]Add getActiveSession to SparkSessio...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/22295
  
    **[Test build #96831 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/96831/testReport)** for PR 22295 at commit [`765cf27`](https://github.com/apache/spark/commit/765cf27d2d182e6956271ccc8aae01595d40853c).
     * This patch **fails PySpark unit tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22295: [SPARK-25255][PYTHON]Add getActiveSession to SparkSessio...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22295
  
    Merged build finished. Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22295: [SPARK-25255][PYTHON]Add getActiveSession to SparkSessio...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22295
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/4066/
    Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22295: [SPARK-25255][PYTHON]Add getActiveSession to SparkSessio...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/22295
  
    **[Test build #96831 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/96831/testReport)** for PR 22295 at commit [`765cf27`](https://github.com/apache/spark/commit/765cf27d2d182e6956271ccc8aae01595d40853c).


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #22295: [SPARK-25255][PYTHON]Add getActiveSession to Spar...

Posted by huaxingao <gi...@git.apache.org>.
Github user huaxingao commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22295#discussion_r225667174
  
    --- Diff: python/pyspark/sql/functions.py ---
    @@ -2633,6 +2633,23 @@ def sequence(start, stop, step=None):
                 _to_java_column(start), _to_java_column(stop), _to_java_column(step)))
     
     
    +@since(3.0)
    +def getActiveSession():
    +    """
    +    Returns the active SparkSession for the current thread
    +    """
    +    from pyspark.sql import SparkSession
    +    sc = SparkContext._active_spark_context
    --- End diff --
    
    @holdenk @HyukjinKwon 
    Thanks for the comments.
    I checked Scala's behavior:
    ```
      test("my test") {
        val cx = SparkContext.getActive
        val session = SparkSession.getActiveSession
        println(cx)
        println(session)
      }
    ```
    The result is 
    ```
    None
    None
    ```
    So it returns None if sc isNone. Actually my current code returns None if sc isNone, but I will change the code a bit to make it more obvious. I will also add _ prefix in the function name and mention in the docstring that this function is not supposed to be called directly.



---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22295: [SPARK-25255][PYTHON]Add getActiveSession to SparkSessio...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22295
  
    Merged build finished. Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #22295: [SPARK-25255][PYTHON]Add getActiveSession to Spar...

Posted by holdenk <gi...@git.apache.org>.
Github user holdenk commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22295#discussion_r219552270
  
    --- Diff: python/pyspark/sql/session.py ---
    @@ -231,6 +231,7 @@ def __init__(self, sparkContext, jsparkSession=None):
                     or SparkSession._instantiatedSession._sc._jsc is None:
                 SparkSession._instantiatedSession = self
                 self._jvm.SparkSession.setDefaultSession(self._jsparkSession)
    +            self._jvm.SparkSession.setActiveSession(self._jsparkSession)
    --- End diff --
    
    Yes this seems like the right path forward, thanks for figuring out that was missing as well.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22295: [SPARK-25255][PYTHON]Add getActiveSession to SparkSessio...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/22295
  
    **[Test build #95507 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/95507/testReport)** for PR 22295 at commit [`e9885b3`](https://github.com/apache/spark/commit/e9885b38e35dcfd01a40e43cf442beeaea226b98).


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22295: [SPARK-25255][PYTHON]Add getActiveSession to SparkSessio...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22295
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/96831/
    Test FAILed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22295: [SPARK-25255][PYTHON]Add getActiveSession to SparkSessio...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22295
  
    Merged build finished. Test FAILed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22295: [SPARK-25255][PYTHON]Add getActiveSession to SparkSessio...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/22295
  
    **[Test build #96451 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/96451/testReport)** for PR 22295 at commit [`d7be3bf`](https://github.com/apache/spark/commit/d7be3bfbdbbcd2d95885f26bef690b7a949ff5ed).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22295: [SPARK-25255][PYTHON]Add getActiveSession to SparkSessio...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22295
  
    Merged build finished. Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #22295: [SPARK-25255][PYTHON]Add getActiveSession to Spar...

Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22295#discussion_r218309924
  
    --- Diff: python/pyspark/sql/session.py ---
    @@ -252,6 +253,22 @@ def newSession(self):
             """
             return self.__class__(self._sc, self._jsparkSession.newSession())
     
    +    @since(3.0)
    +    def getActiveSession(self):
    --- End diff --
    
    cc @ueshin 


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22295: [SPARK-25255][PYTHON]Add getActiveSession to SparkSessio...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22295
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/2938/
    Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22295: [SPARK-25255][PYTHON]Add getActiveSession to SparkSessio...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/22295
  
    **[Test build #97466 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/97466/testReport)** for PR 22295 at commit [`1ee58af`](https://github.com/apache/spark/commit/1ee58af4cf68ccc18e00cb5b0b2919886dcf04f7).


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #22295: [SPARK-25255][PYTHON]Add getActiveSession to Spar...

Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22295#discussion_r215556819
  
    --- Diff: python/pyspark/sql/session.py ---
    @@ -252,6 +252,16 @@ def newSession(self):
             """
             return self.__class__(self._sc, self._jsparkSession.newSession())
     
    +    @since(2.4)
    +    def getActiveSession(self):
    +        """
    +        Returns the active SparkSession for the current thread, returned by the builder.
    +        >>> s = spark.getActiveSession()
    +        >>> spark._jsparkSession.getDefaultSession().get().equals(s.get())
    +        True
    +        """
    +        return self._jsparkSession.getActiveSession()
    --- End diff --
    
    Yea, I think we should return Python session one. JVM instance should not be exposed .. I assume returning `None` is fine. The thing is, we have the lack of session supports in PySpark. It's partially implemented but not very well tested as far as I can tell.
    
    Can you add a set of tests for it, and manually test them as well? Actually, my guts say this is quite a big deal 


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #22295: [SPARK-25255][PYTHON]Add getActiveSession to Spar...

Posted by huaxingao <gi...@git.apache.org>.
Github user huaxingao commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22295#discussion_r218237306
  
    --- Diff: python/pyspark/sql/session.py ---
    @@ -231,6 +231,7 @@ def __init__(self, sparkContext, jsparkSession=None):
                     or SparkSession._instantiatedSession._sc._jsc is None:
                 SparkSession._instantiatedSession = self
                 self._jvm.SparkSession.setDefaultSession(self._jsparkSession)
    +            self._jvm.SparkSession.setActiveSession(self._jsparkSession)
    --- End diff --
    
    Thanks you very much for your comments. 
    I have a question here. In stop() method, shall we clear the activeSession too? Currently, it has
    ```
        def stop(self):
            """Stop the underlying :class:`SparkContext`.
            """
            self._jvm.SparkSession.clearDefaultSession()
            SparkSession._instantiatedSession = None
    ```
    Do I need to add the following?
    ```
          self._jvm.SparkSession.clearActiveSession()
    ```
    To test for getActiveSession when there is no active session, I am thinking of adding
    ```
        def test_get_active_session_when_no_active_session(self):
            spark = SparkSession.builder \
                .master("local") \
                .getOrCreate()
            spark.stop()
            active = spark.getActiveSession()
            self.assertEqual(active, None)
    ```
    The test didn't pass because in stop(), the active session is not cleared. 



---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22295: [SPARK-25255][PYTHON]Add getActiveSession to SparkSessio...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/22295
  
    **[Test build #96708 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/96708/testReport)** for PR 22295 at commit [`3e11d0a`](https://github.com/apache/spark/commit/3e11d0a239e645baf0abae4f855983ff2948002b).


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22295: [SPARK-25255][PYTHON]Add getActiveSession to SparkSessio...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/22295
  
    **[Test build #96707 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/96707/testReport)** for PR 22295 at commit [`c966846`](https://github.com/apache/spark/commit/c966846366d1f9a4deb2c7fde4485eac86a0610c).


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22295: [SPARK-25255][PYTHON]Add getActiveSession to SparkSessio...

Posted by holdenk <gi...@git.apache.org>.
Github user holdenk commented on the issue:

    https://github.com/apache/spark/pull/22295
  
    I'll leave this for if @HyukjinKwon has any final comments, otherwise I'm happy to merge.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #22295: [SPARK-25255][PYTHON]Add getActiveSession to Spar...

Posted by holdenk <gi...@git.apache.org>.
Github user holdenk commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22295#discussion_r224858233
  
    --- Diff: python/pyspark/sql/functions.py ---
    @@ -2633,6 +2633,23 @@ def sequence(start, stop, step=None):
                 _to_java_column(start), _to_java_column(stop), _to_java_column(step)))
     
     
    +@since(3.0)
    +def getActiveSession():
    +    """
    +    Returns the active SparkSession for the current thread
    +    """
    +    from pyspark.sql import SparkSession
    +    sc = SparkContext._active_spark_context
    --- End diff --
    
    If this is being done to simplify implementation and we don't expect people to call it directly here we should mention that in the docstring and also use an _ prefix.
    
    I disagree with @HyukjinKwon about this behaviour being what people would expect -- it doesn't match the Scala behaviour and one of the reasons to have something like `getActiveSession()` instead of `getOrCreate()` is to allow folks to do something if we have an active session or do something else if we don't.
    
    What about if `sc` is`None` we just return `None `since we can't have an `activeSession` without an active `SparkContext` -- does that sound reasonable?
    
    That being said if folks feel strongly about this I'm _ok_ with us setting up a SparkContext but we need to document that if that's the path we go.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22295: [SPARK-25255][PYTHON]Add getActiveSession to SparkSessio...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22295
  
    Merged build finished. Test FAILed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22295: [SPARK-25255][PYTHON]Add getActiveSession to SparkSessio...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22295
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/4042/
    Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #22295: [SPARK-25255][PYTHON]Add getActiveSession to Spar...

Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22295#discussion_r221429170
  
    --- Diff: python/pyspark/sql/session.py ---
    @@ -231,6 +231,7 @@ def __init__(self, sparkContext, jsparkSession=None):
                     or SparkSession._instantiatedSession._sc._jsc is None:
                 SparkSession._instantiatedSession = self
                 self._jvm.SparkSession.setDefaultSession(self._jsparkSession)
    +            self._jvm.SparkSession.setActiveSession(self._jsparkSession)
    --- End diff --
    
    Simialr. I was expecting something like:
    
    ```python
    session1 = SparkSession.builder.config("key1", "value1").getOrCreate()
    session2 = SparkSession.builder.config("key2", "value2").getOrCreate()
    
    assert(session2 == SparkSession.getActiveSession())
    
    session1.createDataFrame([(1, 'Alice')], ['age', 'name'])
    
    assert(session1 == SparkSession.getActiveSession())
    ```
    
    does this work?


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22295: [SPARK-25255][PYTHON]Add getActiveSession to SparkSessio...

Posted by huaxingao <gi...@git.apache.org>.
Github user huaxingao commented on the issue:

    https://github.com/apache/spark/pull/22295
  
    Thank you very much for your help! ! @holdenk @HyukjinKwon 


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22295: [SPARK-25255][PYTHON]Add getActiveSession to SparkSessio...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22295
  
    Merged build finished. Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22295: [SPARK-25255][PYTHON]Add getActiveSession to SparkSessio...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/22295
  
    **[Test build #95509 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/95509/testReport)** for PR 22295 at commit [`89c3b44`](https://github.com/apache/spark/commit/89c3b44fa965dde7623d00596df25fd254806c9d).


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22295: [SPARK-25255][PYTHON]Add getActiveSession to SparkSessio...

Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on the issue:

    https://github.com/apache/spark/pull/22295
  
    Looks close to go.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22295: [SPARK-25255][PYTHON]Add getActiveSession to SparkSessio...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22295
  
    Merged build finished. Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22295: [SPARK-25255][PYTHON]Add getActiveSession to SparkSessio...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/22295
  
    **[Test build #96451 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/96451/testReport)** for PR 22295 at commit [`d7be3bf`](https://github.com/apache/spark/commit/d7be3bfbdbbcd2d95885f26bef690b7a949ff5ed).


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22295: [SPARK-25255][PYTHON]Add getActiveSession to SparkSessio...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22295
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/95953/
    Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22295: [SPARK-25255][PYTHON]Add getActiveSession to SparkSessio...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/22295
  
    **[Test build #97577 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/97577/testReport)** for PR 22295 at commit [`94e3db0`](https://github.com/apache/spark/commit/94e3db0c0c9873daaca688c2a63f01420882692e).


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #22295: [SPARK-25255][PYTHON]Add getActiveSession to Spar...

Posted by huaxingao <gi...@git.apache.org>.
Github user huaxingao commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22295#discussion_r221394694
  
    --- Diff: python/pyspark/sql/session.py ---
    @@ -231,6 +231,7 @@ def __init__(self, sparkContext, jsparkSession=None):
                     or SparkSession._instantiatedSession._sc._jsc is None:
                 SparkSession._instantiatedSession = self
                 self._jvm.SparkSession.setDefaultSession(self._jsparkSession)
    +            self._jvm.SparkSession.setActiveSession(self._jsparkSession)
    --- End diff --
    
    @HyukjinKwon Do you mean something like this:
    ```
        def test_two_spark_session(self):
            session1 = None
            session2 = None
            try:
                session1 = SparkSession.builder.config("key1", "value1").getOrCreate()
                session2 = SparkSession.builder.config("key2", "value2").getOrCreate()
                self.assertEqual(session1, session2)
    
                df = session1.createDataFrame([(1, 'Alice')], ['age', 'name'])
                self.assertEqual(df.collect(), [Row(age=1, name=u'Alice')])
                activeSession1 = session1.getActiveSession()
                activeSession2 = session2.getActiveSession()
                self.assertEqual(activeSession1, activeSession1)
    
            finally:
                if session1 is not None:
                    session1.stop()
                if session2 is not None:
                    session2.stop()
    ```


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #22295: [SPARK-25255][PYTHON]Add getActiveSession to Spar...

Posted by huaxingao <gi...@git.apache.org>.
Github user huaxingao commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22295#discussion_r223165392
  
    --- Diff: python/pyspark/sql/session.py ---
    @@ -252,6 +255,20 @@ def newSession(self):
             """
             return self.__class__(self._sc, self._jsparkSession.newSession())
     
    +    @classmethod
    +    @since(2.5)
    +    def getActiveSession(cls):
    +        """
    +        Returns the active SparkSession for the current thread, returned by the builder.
    +        >>> s = SparkSession.getActiveSession()
    +        >>> l = [('Alice', 1)]
    +        >>> rdd = s.sparkContext.parallelize(l)
    +        >>> df = s.createDataFrame(rdd, ['name', 'age'])
    +        >>> df.select("age").collect()
    +        [Row(age=1)]
    +        """
    +        return cls._activeSession
    --- End diff --
    
    @HyukjinKwon I am not sure if I follow your suggestion correctly. Does the following look right to you?
    session.py
    ```
        @classmethod
        @since(3.0)
        def getActiveSession(cls):
            from pyspark.sql import functions
            return functions.getActiveSession()
    ```
    functions.py
    ```
    @since(3.0)
    def getActiveSession():
        from pyspark.sql import SparkSession
        sc = SparkContext._active_spark_context
        if sc is None:
          sc = SparkContext()
    
        if sc._jvm.SparkSession.getActiveSession().isDefined():
            SparkSession(sc, sc._jvm.SparkSession.getActiveSession().get())
            return SparkSession._activeSession
        else:
            return None
    ```



---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22295: [SPARK-25255][PYTHON]Add getActiveSession to SparkSessio...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22295
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/96708/
    Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22295: [SPARK-25255][PYTHON]Add getActiveSession to SparkSessio...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22295
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/3611/
    Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22295: [SPARK-25255][PYTHON]Add getActiveSession to SparkSessio...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22295
  
    Merged build finished. Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #22295: [SPARK-25255][PYTHON]Add getActiveSession to Spar...

Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22295#discussion_r222958780
  
    --- Diff: python/pyspark/sql/session.py ---
    @@ -252,6 +255,20 @@ def newSession(self):
             """
             return self.__class__(self._sc, self._jsparkSession.newSession())
     
    +    @classmethod
    +    @since(2.5)
    +    def getActiveSession(cls):
    +        """
    +        Returns the active SparkSession for the current thread, returned by the builder.
    +        >>> s = SparkSession.getActiveSession()
    +        >>> l = [('Alice', 1)]
    +        >>> rdd = s.sparkContext.parallelize(l)
    +        >>> df = s.createDataFrame(rdd, ['name', 'age'])
    +        >>> df.select("age").collect()
    +        [Row(age=1)]
    +        """
    +        return cls._activeSession
    --- End diff --
    
    Yup.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #22295: [SPARK-25255][PYTHON]Add getActiveSession to Spar...

Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22295#discussion_r224983583
  
    --- Diff: python/pyspark/sql/session.py ---
    @@ -231,6 +231,7 @@ def __init__(self, sparkContext, jsparkSession=None):
                     or SparkSession._instantiatedSession._sc._jsc is None:
                 SparkSession._instantiatedSession = self
                 self._jvm.SparkSession.setDefaultSession(self._jsparkSession)
    +            self._jvm.SparkSession.setActiveSession(self._jsparkSession)
    --- End diff --
    
    Oh, okay. I had to be explicit. I meant:
    
    ```scala
    scala> import org.apache.spark.sql.SparkSession
    import org.apache.spark.sql.SparkSession
    
    scala> SparkSession.getActiveSession
    res0: Option[org.apache.spark.sql.SparkSession] = Some(org.apache.spark.sql.SparkSession@3ef4a8fb)
    
    scala> val session1 = spark
    session1: org.apache.spark.sql.SparkSession = org.apache.spark.sql.SparkSession@3ef4a8fb
    
    scala> val session2 = spark.newSession()
    session2: org.apache.spark.sql.SparkSession = org.apache.spark.sql.SparkSession@4b74a4d
    
    scala> SparkSession.getActiveSession
    res1: Option[org.apache.spark.sql.SparkSession] = Some(org.apache.spark.sql.SparkSession@3ef4a8fb)
    
    scala> session2.createDataFrame(Seq(Tuple1(1)))
    res2: org.apache.spark.sql.DataFrame = [_1: int]
    
    scala> SparkSession.getActiveSession
    res3: Option[org.apache.spark.sql.SparkSession] = Some(org.apache.spark.sql.SparkSession@4b74a4d)
    ```


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22295: [SPARK-25255][PYTHON]Add getActiveSession to SparkSessio...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22295
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/2717/
    Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22295: [SPARK-25255][PYTHON]Add getActiveSession to SparkSessio...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22295
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/3545/
    Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22295: [SPARK-25255][PYTHON]Add getActiveSession to SparkSessio...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22295
  
    Merged build finished. Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #22295: [SPARK-25255][PYTHON]Add getActiveSession to Spar...

Posted by huaxingao <gi...@git.apache.org>.
Github user huaxingao commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22295#discussion_r221089916
  
    --- Diff: python/pyspark/sql/session.py ---
    @@ -231,6 +231,7 @@ def __init__(self, sparkContext, jsparkSession=None):
                     or SparkSession._instantiatedSession._sc._jsc is None:
                 SparkSession._instantiatedSession = self
                 self._jvm.SparkSession.setDefaultSession(self._jsparkSession)
    +            self._jvm.SparkSession.setActiveSession(self._jsparkSession)
    --- End diff --
    
    @HyukjinKwon Seems to me that active session is set OK in the ```__init__```. When createDataFrame, we already have a session, and the active session is already set in the ```__init__```. 


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #22295: [SPARK-25255][PYTHON]Add getActiveSession to Spar...

Posted by holdenk <gi...@git.apache.org>.
Github user holdenk commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22295#discussion_r222699236
  
    --- Diff: python/pyspark/sql/session.py ---
    @@ -231,6 +231,7 @@ def __init__(self, sparkContext, jsparkSession=None):
                     or SparkSession._instantiatedSession._sc._jsc is None:
                 SparkSession._instantiatedSession = self
                 self._jvm.SparkSession.setDefaultSession(self._jsparkSession)
    +            self._jvm.SparkSession.setActiveSession(self._jsparkSession)
    --- End diff --
    
    So @HyukjinKwon in this code session1 and session2 are already equal:
    
    > Welcome to
    >       ____              __
    >      / __/__  ___ _____/ /__
    >     _\ \/ _ \/ _ `/ __/  '_/
    >    /__ / .__/\_,_/_/ /_/\_\   version 2.3.1
    >       /_/
    > 
    > Using Python version 3.6.5 (default, Apr 29 2018 16:14:56)
    > SparkSession available as 'spark'.
    > >>> session1 = SparkSession.builder.config("key1", "value1").getOrCreate()
    > >>> session2 = SparkSession.builder.config("key2", "value2").getOrCreate()
    > >>> session1
    > <pyspark.sql.session.SparkSession object at 0x7ff6d4843b00>
    > >>> session2
    > <pyspark.sql.session.SparkSession object at 0x7ff6d4843b00>
    > >>> session1 == session2
    > True
    > >>> 
    > 
    > 
    > 
    > 
    > 
    
    That being said the possibility of having multiple Spark session in Python is doable you manually have to call the init e.g.:
    
    > >>> session3 = SparkSession(sc)
    > >>> session3
    > <pyspark.sql.session.SparkSession object at 0x7ff6d3dbd160>
    > >>> 
    > 
    
    And supporting that is reasonable.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22295: [SPARK-25255][PYTHON]Add getActiveSession to SparkSessio...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22295
  
    Merged build finished. Test FAILed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22295: [SPARK-25255][PYTHON]Add getActiveSession to SparkSessio...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/22295
  
    **[Test build #95953 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/95953/testReport)** for PR 22295 at commit [`65fc45f`](https://github.com/apache/spark/commit/65fc45fe6677a20fcfb01e0426e5187b01b044cc).


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22295: [SPARK-25255][PYTHON]Add getActiveSession to SparkSessio...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/22295
  
    **[Test build #95953 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/95953/testReport)** for PR 22295 at commit [`65fc45f`](https://github.com/apache/spark/commit/65fc45fe6677a20fcfb01e0426e5187b01b044cc).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #22295: [SPARK-25255][PYTHON]Add getActiveSession to Spar...

Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22295#discussion_r226166057
  
    --- Diff: python/pyspark/sql/tests.py ---
    @@ -3863,6 +3863,145 @@ def test_jvm_default_session_already_set(self):
                 spark.stop()
     
     
    +class SparkSessionTests2(unittest.TestCase):
    +
    +    def test_active_session(self):
    +        spark = SparkSession.builder \
    +            .master("local") \
    +            .getOrCreate()
    +        try:
    +            activeSession = SparkSession.getActiveSession()
    +            df = activeSession.createDataFrame([(1, 'Alice')], ['age', 'name'])
    +            self.assertEqual(df.collect(), [Row(age=1, name=u'Alice')])
    +        finally:
    +            spark.stop()
    +
    +    def test_get_active_session_when_no_active_session(self):
    +        active = SparkSession.getActiveSession()
    +        self.assertEqual(active, None)
    +        spark = SparkSession.builder \
    +            .master("local") \
    +            .getOrCreate()
    +        active = SparkSession.getActiveSession()
    +        self.assertEqual(active, spark)
    +        spark.stop()
    +        active = SparkSession.getActiveSession()
    +        self.assertEqual(active, None)
    +
    +    def test_SparkSession(self):
    +        spark = SparkSession.builder \
    +            .master("local") \
    +            .config("some-config", "v2") \
    +            .getOrCreate()
    +        try:
    +            self.assertEqual(spark.conf.get("some-config"), "v2")
    +            self.assertEqual(spark.sparkContext._conf.get("some-config"), "v2")
    +            self.assertEqual(spark.version, spark.sparkContext.version)
    +            spark.sql("CREATE DATABASE test_db")
    +            spark.catalog.setCurrentDatabase("test_db")
    +            self.assertEqual(spark.catalog.currentDatabase(), "test_db")
    +            spark.sql("CREATE TABLE table1 (name STRING, age INT) USING parquet")
    +            self.assertEqual(spark.table("table1").columns, ['name', 'age'])
    +            self.assertEqual(spark.range(3).count(), 3)
    +        finally:
    +            spark.stop()
    +
    +    def test_global_default_session(self):
    +        spark = SparkSession.builder \
    +            .master("local") \
    +            .getOrCreate()
    +        try:
    +            self.assertEqual(SparkSession.builder.getOrCreate(), spark)
    +        finally:
    +            spark.stop()
    +
    +    def test_default_and_active_session(self):
    +        spark = SparkSession.builder \
    +            .master("local") \
    +            .getOrCreate()
    +        activeSession = spark._jvm.SparkSession.getActiveSession()
    +        defaultSession = spark._jvm.SparkSession.getDefaultSession()
    +        try:
    +            self.assertEqual(activeSession, defaultSession)
    +        finally:
    +            spark.stop()
    +
    +    def test_config_option_propagated_to_existing_SparkSession(self):
    --- End diff --
    
    Let's just above `SparkSession` -> `spark_session`


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #22295: [SPARK-25255][PYTHON]Add getActiveSession to Spar...

Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22295#discussion_r222959106
  
    --- Diff: python/pyspark/sql/session.py ---
    @@ -252,6 +255,20 @@ def newSession(self):
             """
             return self.__class__(self._sc, self._jsparkSession.newSession())
     
    +    @classmethod
    +    @since(2.5)
    --- End diff --
    
    Let's do this to 3.0. Per https://github.com/apache/spark/commit/9bf397c0e45cb161f3f12f09bd2bf14ff96dc823, looks we are going ahead for 3.0 now.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #22295: [SPARK-25255][PYTHON]Add getActiveSession to Spar...

Posted by holdenk <gi...@git.apache.org>.
Github user holdenk commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22295#discussion_r224858616
  
    --- Diff: python/pyspark/sql/session.py ---
    @@ -252,6 +253,22 @@ def newSession(self):
             """
             return self.__class__(self._sc, self._jsparkSession.newSession())
     
    +    @since(3.0)
    --- End diff --
    
    @HyukjinKwon are you OK to mark this comment as resolved since we're now targeting `3.0`?


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #22295: [SPARK-25255][PYTHON]Add getActiveSession to Spar...

Posted by felixcheung <gi...@git.apache.org>.
Github user felixcheung commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22295#discussion_r214530177
  
    --- Diff: python/pyspark/sql/session.py ---
    @@ -252,6 +252,16 @@ def newSession(self):
             """
             return self.__class__(self._sc, self._jsparkSession.newSession())
     
    +    @since(2.4)
    +    def getActiveSession(self):
    +        """
    +        Returns the active SparkSession for the current thread, returned by the builder.
    +        >>> s = spark.getActiveSession()
    +        >>> spark._jsparkSession.getDefaultSession().get().equals(s.get())
    --- End diff --
    
    ..and probably shouldn't access `_jsparkSession`


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #22295: [SPARK-25255][PYTHON]Add getActiveSession to Spar...

Posted by huaxingao <gi...@git.apache.org>.
Github user huaxingao commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22295#discussion_r225667299
  
    --- Diff: python/pyspark/sql/tests.py ---
    @@ -3654,6 +3654,109 @@ def test_jvm_default_session_already_set(self):
                 spark.stop()
     
     
    +class SparkSessionTests2(unittest.TestCase):
    +
    +    def test_active_session(self):
    +        spark = SparkSession.builder \
    +            .master("local") \
    +            .getOrCreate()
    +        try:
    +            activeSession = SparkSession.getActiveSession()
    +            df = activeSession.createDataFrame([(1, 'Alice')], ['age', 'name'])
    +            self.assertEqual(df.collect(), [Row(age=1, name=u'Alice')])
    +        finally:
    +            spark.stop()
    +
    +    def test_get_active_session_when_no_active_session(self):
    +        active = SparkSession.getActiveSession()
    +        self.assertEqual(active, None)
    +        spark = SparkSession.builder \
    +            .master("local") \
    +            .getOrCreate()
    +        active = SparkSession.getActiveSession()
    +        self.assertEqual(active, spark)
    +        spark.stop()
    +        active = SparkSession.getActiveSession()
    +        self.assertEqual(active, None)
    --- End diff --
    
    Thanks @holdenk 
    I will add a test for the above comment and also add a test for your comment regarding
    ```
    self._jvm.SparkSession.setActiveSession(self._jsparkSession)
    ```



---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #22295: [SPARK-25255][PYTHON]Add getActiveSession to Spar...

Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22295#discussion_r218371105
  
    --- Diff: python/pyspark/sql/tests.py ---
    @@ -3654,6 +3654,107 @@ def test_jvm_default_session_already_set(self):
                 spark.stop()
     
     
    +class SparkSessionTests2(ReusedSQLTestCase):
    +
    +    def test_active_session(self):
    +        spark = SparkSession.builder \
    +            .master("local") \
    +            .getOrCreate()
    +        try:
    +            activeSession = spark.getActiveSession()
    +            df = activeSession.createDataFrame([(1, 'Alice')], ['age', 'name'])
    +            self.assertEqual(df.collect(), [Row(age=1, name=u'Alice')])
    +        finally:
    +            spark.stop()
    +
    +    def test_SparkSession(self):
    +        spark = SparkSession.builder \
    +            .master("local") \
    +            .config("some-config", "v2") \
    +            .getOrCreate()
    +        try:
    +            self.assertEqual(spark.conf.get("some-config"), "v2")
    +            self.assertEqual(spark.sparkContext._conf.get("some-config"), "v2")
    +            self.assertEqual(spark.version, spark.sparkContext.version)
    +            spark.sql("CREATE DATABASE test_db")
    +            spark.catalog.setCurrentDatabase("test_db")
    +            self.assertEqual(spark.catalog.currentDatabase(), "test_db")
    +            spark.sql("CREATE TABLE table1 (name STRING, age INT) USING parquet")
    +            self.assertEqual(spark.table("table1").columns, ['name', 'age'])
    +            self.assertEqual(spark.range(3).count(), 3)
    +        finally:
    +            spark.stop()
    +
    +    def test_global_default_session(self):
    +        spark = SparkSession.builder \
    +            .master("local") \
    +            .getOrCreate()
    +        try:
    +            self.assertEqual(SparkSession.builder.getOrCreate(), spark)
    +        finally:
    +            spark.stop()
    +
    +    def test_default_and_active_session(self):
    +        spark = SparkSession.builder \
    +            .master("local") \
    +            .getOrCreate()
    +        activeSession = spark._jvm.SparkSession.getActiveSession()
    +        defaultSession = spark._jvm.SparkSession.getDefaultSession()
    +        try:
    +            self.assertEqual(activeSession, defaultSession)
    +        finally:
    +            spark.stop()
    +
    +    def test_config_option_propagated_to_existing_SparkSession(self):
    +        session1 = SparkSession.builder \
    +            .master("local") \
    +            .config("spark-config1", "a") \
    +            .getOrCreate()
    +        self.assertEqual(session1.conf.get("spark-config1"), "a")
    +        session2 = SparkSession.builder \
    +            .config("spark-config1", "b") \
    +            .getOrCreate()
    +        try:
    +            self.assertEqual(session1, session2)
    +            self.assertEqual(session1.conf.get("spark-config1"), "b")
    +        finally:
    +            session1.stop()
    +
    +    def test_newSession(self):
    --- End diff --
    
    ditto for naming. Let's just follow Python's convention in those names


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22295: [SPARK-25255][PYTHON]Add getActiveSession to SparkSessio...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22295
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/97502/
    Test FAILed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22295: [SPARK-25255][PYTHON]Add getActiveSession to SparkSessio...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/22295
  
    **[Test build #97466 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/97466/testReport)** for PR 22295 at commit [`1ee58af`](https://github.com/apache/spark/commit/1ee58af4cf68ccc18e00cb5b0b2919886dcf04f7).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22295: [SPARK-25255][PYTHON]Add getActiveSession to SparkSessio...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22295
  
    Merged build finished. Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #22295: [SPARK-25255][PYTHON]Add getActiveSession to Spar...

Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22295#discussion_r215556683
  
    --- Diff: python/pyspark/sql/session.py ---
    @@ -252,6 +252,16 @@ def newSession(self):
             """
             return self.__class__(self._sc, self._jsparkSession.newSession())
     
    +    @since(2.4)
    +    def getActiveSession(self):
    +        """
    +        Returns the active SparkSession for the current thread, returned by the builder.
    +        >>> s = spark.getActiveSession()
    +        >>> spark._jsparkSession.getDefaultSession().get().equals(s.get())
    +        True
    +        """
    +        return self._jsparkSession.getActiveSession()
    --- End diff --
    
    Yea, I think we should return Python session one. JVM instance should not be exposed .. I assume returning `None` is fine. The thing is, we have the lack of session supports in PySpark. It's partially implemented but not very well tested as far as I can tell.
    
    Can you add a set of tests for it, and manually test them as well? Actually, my guys say this is quite a big deal 


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22295: [SPARK-25255][PYTHON]Add getActiveSession to SparkSessio...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/22295
  
    **[Test build #96833 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/96833/testReport)** for PR 22295 at commit [`b83cf8e`](https://github.com/apache/spark/commit/b83cf8ee68806a5fcab83d939d55f19f752e2a2a).


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22295: [SPARK-25255][PYTHON]Add getActiveSession to SparkSessio...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22295
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/4067/
    Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org