You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@spark.apache.org by viirya <gi...@git.apache.org> on 2017/01/03 03:35:29 UTC

[GitHub] spark pull request #16454: [SPARK-19055][SQL][PySpark] Fix SparkSession init...

GitHub user viirya opened a pull request:

    https://github.com/apache/spark/pull/16454

    [SPARK-19055][SQL][PySpark] Fix SparkSession initialization when SparkContext is stopped

    ## What changes were proposed in this pull request?
    
    
    
    In SparkSession initialization, we store created the instance of SparkSession into a class variable _instantiatedContext. Next time we can use SparkSession.builder.getOrCreate() to retrieve the existing SparkSession instance.
    
    However, when the active SparkContext is stopped and we create another new SparkContext to use, the existing SparkSession is still associated with the stopped SparkContext. So the operations with this existing SparkSession will be failed.
    
    We need to detect such case in SparkSession and renew the class variable _instantiatedContext if needed.
    
    ## How was this patch tested?
    
    New test added in PySpark.
    
    Please review http://spark.apache.org/contributing.html before opening a pull request.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/viirya/spark-1 fix-pyspark-sparksession

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/16454.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #16454
    
----
commit 80bba5ead0601f3ef4b05fff5391d07a61e06341
Author: Liang-Chi Hsieh <vi...@gmail.com>
Date:   2017-01-03T03:06:21Z

    Fix SparkSession initialization when previous SparkContext is stopped and new SparkContext is created.

----


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #16454: [SPARK-19055][SQL][PySpark] Fix SparkSession initializat...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/16454
  
    **[Test build #70979 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/70979/testReport)** for PR 16454 at commit [`6078c7d`](https://github.com/apache/spark/commit/6078c7dd37152301859d31789ca571a6be92f4eb).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #16454: [SPARK-19055][SQL][PySpark] Fix SparkSession initializat...

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on the issue:

    https://github.com/apache/spark/pull/16454
  
    retest this please


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #16454: [SPARK-19055][SQL][PySpark] Fix SparkSession initializat...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/16454
  
    **[Test build #70962 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/70962/testReport)** for PR 16454 at commit [`a22a08f`](https://github.com/apache/spark/commit/a22a08fd8655862fbbadc7b376b1a0ad8acfaadb).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #16454: [SPARK-19055][SQL][PySpark] Fix SparkSession init...

Posted by asfgit <gi...@git.apache.org>.
Github user asfgit closed the pull request at:

    https://github.com/apache/spark/pull/16454


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #16454: [SPARK-19055][SQL][PySpark] Fix SparkSession initializat...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/16454
  
    **[Test build #70962 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/70962/testReport)** for PR 16454 at commit [`a22a08f`](https://github.com/apache/spark/commit/a22a08fd8655862fbbadc7b376b1a0ad8acfaadb).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #16454: [SPARK-19055][SQL][PySpark] Fix SparkSession init...

Posted by vijoshi <gi...@git.apache.org>.
Github user vijoshi commented on a diff in the pull request:

    https://github.com/apache/spark/pull/16454#discussion_r94903274
  
    --- Diff: python/pyspark/sql/session.py ---
    @@ -214,8 +214,14 @@ def __init__(self, sparkContext, jsparkSession=None):
             self._wrapped = SQLContext(self._sc, self, self._jwrapped)
             _monkey_patch_RDD(self)
             install_exception_handler()
    -        if SparkSession._instantiatedContext is None:
    -            SparkSession._instantiatedContext = self
    +        if SparkSession._instantiatedSession is None:
    +            SparkSession._instantiatedSession = self
    +        else:
    +            # If we had an instantiated SparkSession attached with a SparkContext
    +            # which is stopped now, we need to renew the instantiated SparkSession.
    +            # Otherwise, we will use invalid SparkSession when we call Builder.getOrCreate.
    +            if SparkSession._instantiatedSession._sc._jsc is None:
    +                SparkSession._instantiatedSession = self
    --- End diff --
    
    Would this be simpler ?
    ``` 
        if SparkSession._instantiatedSession is None \
                or SparkSession._instantiatedSession._sc._jsc is None:
            SparkSession._instantiatedContext = self
    ```


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #16454: [SPARK-19055][SQL][PySpark] Fix SparkSession init...

Posted by rdblue <gi...@git.apache.org>.
Github user rdblue commented on a diff in the pull request:

    https://github.com/apache/spark/pull/16454#discussion_r95258453
  
    --- Diff: python/pyspark/sql/session.py ---
    @@ -161,8 +161,8 @@ def getOrCreate(self):
                 with self._lock:
                     from pyspark.context import SparkContext
                     from pyspark.conf import SparkConf
    -                session = SparkSession._instantiatedContext
    -                if session is None:
    +                session = SparkSession._instantiatedSession
    +                if session is None or session._sc._jsc is None:
    --- End diff --
    
    #16519 adds `isStopped`, which I think is a better way of checking whether the Spark context is valid than just checking whether `_jsc` is defined because `_jsc` could be stopped.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #16454: [SPARK-19055][SQL][PySpark] Fix SparkSession initializat...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/16454
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/71251/
    Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #16454: [SPARK-19055][SQL][PySpark] Fix SparkSession initializat...

Posted by rdblue <gi...@git.apache.org>.
Github user rdblue commented on the issue:

    https://github.com/apache/spark/pull/16454
  
    I just posted a fix to this also. I'll close that one in favor of this and add comments here for what it did differently that we should consider.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #16454: [SPARK-19055][SQL][PySpark] Fix SparkSession init...

Posted by viirya <gi...@git.apache.org>.
Github user viirya commented on a diff in the pull request:

    https://github.com/apache/spark/pull/16454#discussion_r95055690
  
    --- Diff: python/pyspark/sql/tests.py ---
    @@ -1886,6 +1887,28 @@ def test_hivecontext(self):
             self.assertTrue(os.path.exists(metastore_path))
     
     
    +class SQLTests2(ReusedPySparkTestCase):
    --- End diff --
    
    `ReusedPySparkTestCase` launches a class's spark context. So I use it as the default spark context.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #16454: [SPARK-19055][SQL][PySpark] Fix SparkSession init...

Posted by viirya <gi...@git.apache.org>.
Github user viirya commented on a diff in the pull request:

    https://github.com/apache/spark/pull/16454#discussion_r94903491
  
    --- Diff: python/pyspark/sql/session.py ---
    @@ -214,8 +214,14 @@ def __init__(self, sparkContext, jsparkSession=None):
             self._wrapped = SQLContext(self._sc, self, self._jwrapped)
             _monkey_patch_RDD(self)
             install_exception_handler()
    -        if SparkSession._instantiatedContext is None:
    -            SparkSession._instantiatedContext = self
    +        if SparkSession._instantiatedSession is None:
    +            SparkSession._instantiatedSession = self
    +        else:
    +            # If we had an instantiated SparkSession attached with a SparkContext
    +            # which is stopped now, we need to renew the instantiated SparkSession.
    +            # Otherwise, we will use invalid SparkSession when we call Builder.getOrCreate.
    +            if SparkSession._instantiatedSession._sc._jsc is None:
    +                SparkSession._instantiatedSession = self
    --- End diff --
    
    yeah. sure.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #16454: [SPARK-19055][SQL][PySpark] Fix SparkSession initializat...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/16454
  
    Merged build finished. Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #16454: [SPARK-19055][SQL][PySpark] Fix SparkSession initializat...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/16454
  
    **[Test build #70793 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/70793/testReport)** for PR 16454 at commit [`80bba5e`](https://github.com/apache/spark/commit/80bba5ead0601f3ef4b05fff5391d07a61e06341).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #16454: [SPARK-19055][SQL][PySpark] Fix SparkSession initializat...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/16454
  
    **[Test build #70793 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/70793/testReport)** for PR 16454 at commit [`80bba5e`](https://github.com/apache/spark/commit/80bba5ead0601f3ef4b05fff5391d07a61e06341).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #16454: [SPARK-19055][SQL][PySpark] Fix SparkSession initializat...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/16454
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/70979/
    Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #16454: [SPARK-19055][SQL][PySpark] Fix SparkSession init...

Posted by viirya <gi...@git.apache.org>.
Github user viirya commented on a diff in the pull request:

    https://github.com/apache/spark/pull/16454#discussion_r94905140
  
    --- Diff: python/pyspark/sql/session.py ---
    @@ -214,8 +214,14 @@ def __init__(self, sparkContext, jsparkSession=None):
             self._wrapped = SQLContext(self._sc, self, self._jwrapped)
             _monkey_patch_RDD(self)
             install_exception_handler()
    -        if SparkSession._instantiatedContext is None:
    -            SparkSession._instantiatedContext = self
    +        if SparkSession._instantiatedSession is None:
    +            SparkSession._instantiatedSession = self
    +        else:
    +            # If we had an instantiated SparkSession attached with a SparkContext
    +            # which is stopped now, we need to renew the instantiated SparkSession.
    +            # Otherwise, we will use invalid SparkSession when we call Builder.getOrCreate.
    +            if SparkSession._instantiatedSession._sc._jsc is None:
    +                SparkSession._instantiatedSession = self
    --- End diff --
    
    `SparkSession._instantiatedSession._sc` is from the `sparkContext` passed into `__init__`. I think it should not be None.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #16454: [SPARK-19055][SQL][PySpark] Fix SparkSession initializat...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/16454
  
    **[Test build #71251 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/71251/testReport)** for PR 16454 at commit [`6078c7d`](https://github.com/apache/spark/commit/6078c7dd37152301859d31789ca571a6be92f4eb).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #16454: [SPARK-19055][SQL][PySpark] Fix SparkSession init...

Posted by rdblue <gi...@git.apache.org>.
Github user rdblue commented on a diff in the pull request:

    https://github.com/apache/spark/pull/16454#discussion_r95846995
  
    --- Diff: python/pyspark/sql/session.py ---
    @@ -161,8 +161,8 @@ def getOrCreate(self):
                 with self._lock:
                     from pyspark.context import SparkContext
                     from pyspark.conf import SparkConf
    -                session = SparkSession._instantiatedContext
    -                if session is None:
    +                session = SparkSession._instantiatedSession
    +                if session is None or session._sc._jsc is None:
    --- End diff --
    
    Yeah, I had to change to just checking whether _jsc was defined in our branch for the same reason. I'd prefer to add isStopped to Java and Python eventually, but that doesn't need to block this commit.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #16454: [SPARK-19055][SQL][PySpark] Fix SparkSession initializat...

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on the issue:

    https://github.com/apache/spark/pull/16454
  
    how about we set `_instantiatedSession` to None in `SparkContext.stop`?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #16454: [SPARK-19055][SQL][PySpark] Fix SparkSession init...

Posted by viirya <gi...@git.apache.org>.
Github user viirya commented on a diff in the pull request:

    https://github.com/apache/spark/pull/16454#discussion_r95724161
  
    --- Diff: python/pyspark/sql/session.py ---
    @@ -161,8 +161,8 @@ def getOrCreate(self):
                 with self._lock:
                     from pyspark.context import SparkContext
                     from pyspark.conf import SparkConf
    -                session = SparkSession._instantiatedContext
    -                if session is None:
    +                session = SparkSession._instantiatedSession
    +                if session is None or session._sc._jsc is None:
    --- End diff --
    
    Actually `_jsc` is a `JavaSparkContext` instead of `SparkContext` and it doesn't have `isStopped`.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #16454: [SPARK-19055][SQL][PySpark] Fix SparkSession initializat...

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on the issue:

    https://github.com/apache/spark/pull/16454
  
    LGTM


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #16454: [SPARK-19055][SQL][PySpark] Fix SparkSession initializat...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/16454
  
    Merged build finished. Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #16454: [SPARK-19055][SQL][PySpark] Fix SparkSession initializat...

Posted by viirya <gi...@git.apache.org>.
Github user viirya commented on the issue:

    https://github.com/apache/spark/pull/16454
  
    @vanzin yup, I also think about it too. Maybe we should use `_instantiatedSession` to replace it.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #16454: [SPARK-19055][SQL][PySpark] Fix SparkSession init...

Posted by viirya <gi...@git.apache.org>.
Github user viirya commented on a diff in the pull request:

    https://github.com/apache/spark/pull/16454#discussion_r95724228
  
    --- Diff: python/pyspark/sql/session.py ---
    @@ -161,8 +161,8 @@ def getOrCreate(self):
                 with self._lock:
                     from pyspark.context import SparkContext
                     from pyspark.conf import SparkConf
    -                session = SparkSession._instantiatedContext
    -                if session is None:
    +                session = SparkSession._instantiatedSession
    +                if session is None or session._sc._jsc is None:
    --- End diff --
    
    We can add `isStopped` to `JavaSparkContext`, but I will prefer to solve it in Python side if possible.



---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #16454: [SPARK-19055][SQL][PySpark] Fix SparkSession init...

Posted by felixcheung <gi...@git.apache.org>.
Github user felixcheung commented on a diff in the pull request:

    https://github.com/apache/spark/pull/16454#discussion_r94904472
  
    --- Diff: python/pyspark/sql/session.py ---
    @@ -214,8 +214,14 @@ def __init__(self, sparkContext, jsparkSession=None):
             self._wrapped = SQLContext(self._sc, self, self._jwrapped)
             _monkey_patch_RDD(self)
             install_exception_handler()
    -        if SparkSession._instantiatedContext is None:
    -            SparkSession._instantiatedContext = self
    +        if SparkSession._instantiatedSession is None:
    +            SparkSession._instantiatedSession = self
    +        else:
    +            # If we had an instantiated SparkSession attached with a SparkContext
    +            # which is stopped now, we need to renew the instantiated SparkSession.
    +            # Otherwise, we will use invalid SparkSession when we call Builder.getOrCreate.
    +            if SparkSession._instantiatedSession._sc._jsc is None:
    +                SparkSession._instantiatedSession = self
    --- End diff --
    
    Can SparkSession._instantiatedSession._sc is None?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #16454: [SPARK-19055][SQL][PySpark] Fix SparkSession initializat...

Posted by viirya <gi...@git.apache.org>.
Github user viirya commented on the issue:

    https://github.com/apache/spark/pull/16454
  
    `SparkContext` is not associated with `SparkSession` because `SparkSession`  is a SQL-only class. So I think it is not good to import sql's `SparkSession` to `SparkContext` and to modify its value.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #16454: [SPARK-19055][SQL][PySpark] Fix SparkSession init...

Posted by rdblue <gi...@git.apache.org>.
Github user rdblue commented on a diff in the pull request:

    https://github.com/apache/spark/pull/16454#discussion_r95846835
  
    --- Diff: python/pyspark/sql/session.py ---
    @@ -214,8 +214,12 @@ def __init__(self, sparkContext, jsparkSession=None):
             self._wrapped = SQLContext(self._sc, self, self._jwrapped)
             _monkey_patch_RDD(self)
             install_exception_handler()
    -        if SparkSession._instantiatedContext is None:
    -            SparkSession._instantiatedContext = self
    +        # If we had an instantiated SparkSession attached with a SparkContext
    +        # which is stopped now, we need to renew the instantiated SparkSession.
    +        # Otherwise, we will use invalid SparkSession when we call Builder.getOrCreate.
    +        if SparkSession._instantiatedSession is None \
    +                or SparkSession._instantiatedSession._sc._jsc is None:
    --- End diff --
    
    If this can't tell whether the java context is stopped, then it should be fine to use this logic. +1.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #16454: [SPARK-19055][SQL][PySpark] Fix SparkSession init...

Posted by viirya <gi...@git.apache.org>.
Github user viirya commented on a diff in the pull request:

    https://github.com/apache/spark/pull/16454#discussion_r95729251
  
    --- Diff: python/pyspark/sql/session.py ---
    @@ -214,8 +214,12 @@ def __init__(self, sparkContext, jsparkSession=None):
             self._wrapped = SQLContext(self._sc, self, self._jwrapped)
             _monkey_patch_RDD(self)
             install_exception_handler()
    -        if SparkSession._instantiatedContext is None:
    -            SparkSession._instantiatedContext = self
    +        # If we had an instantiated SparkSession attached with a SparkContext
    +        # which is stopped now, we need to renew the instantiated SparkSession.
    +        # Otherwise, we will use invalid SparkSession when we call Builder.getOrCreate.
    +        if SparkSession._instantiatedSession is None \
    +                or SparkSession._instantiatedSession._sc._jsc is None:
    --- End diff --
    
    Well, as I tried, this would case test failed.
    
    `__init__` and `getOrCreate` are both legal paths. Once one calls `__init__` to get a `SparkSession`  and other calls `getOrCreate` later, the second one will be the different session than the first one, if the logic to update `_instantiatedSession` is different.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #16454: [SPARK-19055][SQL][PySpark] Fix SparkSession initializat...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/16454
  
    **[Test build #70979 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/70979/testReport)** for PR 16454 at commit [`6078c7d`](https://github.com/apache/spark/commit/6078c7dd37152301859d31789ca571a6be92f4eb).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #16454: [SPARK-19055][SQL][PySpark] Fix SparkSession initializat...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/16454
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/70962/
    Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #16454: [SPARK-19055][SQL][PySpark] Fix SparkSession init...

Posted by holdenk <gi...@git.apache.org>.
Github user holdenk commented on a diff in the pull request:

    https://github.com/apache/spark/pull/16454#discussion_r95050517
  
    --- Diff: python/pyspark/sql/tests.py ---
    @@ -1886,6 +1887,28 @@ def test_hivecontext(self):
             self.assertTrue(os.path.exists(metastore_path))
     
     
    +class SQLTests2(ReusedPySparkTestCase):
    --- End diff --
    
    Is there any particular reason this is built on a `ReusedPySparkTestCase`?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #16454: [SPARK-19055][SQL][PySpark] Fix SparkSession initializat...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/16454
  
    **[Test build #71251 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/71251/testReport)** for PR 16454 at commit [`6078c7d`](https://github.com/apache/spark/commit/6078c7dd37152301859d31789ca571a6be92f4eb).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #16454: [SPARK-19055][SQL][PySpark] Fix SparkSession initializat...

Posted by vanzin <gi...@git.apache.org>.
Github user vanzin commented on the issue:

    https://github.com/apache/spark/pull/16454
  
    Looks ok but I wonder why the variable is called `_instantiatedContext` and not `_instantiatedSession` or something that says it's a session.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #16454: [SPARK-19055][SQL][PySpark] Fix SparkSession initializat...

Posted by viirya <gi...@git.apache.org>.
Github user viirya commented on the issue:

    https://github.com/apache/spark/pull/16454
  
    cc @cloud-fan @holdenk @srowen @davies 


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #16454: [SPARK-19055][SQL][PySpark] Fix SparkSession initializat...

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on the issue:

    https://github.com/apache/spark/pull/16454
  
    thanks, merging to maser/2.1/2.0


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #16454: [SPARK-19055][SQL][PySpark] Fix SparkSession init...

Posted by viirya <gi...@git.apache.org>.
Github user viirya commented on a diff in the pull request:

    https://github.com/apache/spark/pull/16454#discussion_r95724518
  
    --- Diff: python/pyspark/sql/session.py ---
    @@ -161,8 +161,8 @@ def getOrCreate(self):
                 with self._lock:
                     from pyspark.context import SparkContext
                     from pyspark.conf import SparkConf
    -                session = SparkSession._instantiatedContext
    -                if session is None:
    +                session = SparkSession._instantiatedSession
    +                if session is None or session._sc._jsc is None:
    --- End diff --
    
    I can change it to `isStopped` way if others also prefer that.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #16454: [SPARK-19055][SQL][PySpark] Fix SparkSession initializat...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/16454
  
    Merged build finished. Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #16454: [SPARK-19055][SQL][PySpark] Fix SparkSession initializat...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/16454
  
    Merged build finished. Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #16454: [SPARK-19055][SQL][PySpark] Fix SparkSession init...

Posted by viirya <gi...@git.apache.org>.
Github user viirya commented on a diff in the pull request:

    https://github.com/apache/spark/pull/16454#discussion_r95725013
  
    --- Diff: python/pyspark/sql/session.py ---
    @@ -214,8 +214,12 @@ def __init__(self, sparkContext, jsparkSession=None):
             self._wrapped = SQLContext(self._sc, self, self._jwrapped)
             _monkey_patch_RDD(self)
             install_exception_handler()
    -        if SparkSession._instantiatedContext is None:
    -            SparkSession._instantiatedContext = self
    +        # If we had an instantiated SparkSession attached with a SparkContext
    +        # which is stopped now, we need to renew the instantiated SparkSession.
    +        # Otherwise, we will use invalid SparkSession when we call Builder.getOrCreate.
    +        if SparkSession._instantiatedSession is None \
    +                or SparkSession._instantiatedSession._sc._jsc is None:
    --- End diff --
    
    Sounds good. I will do the change, thanks.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #16454: [SPARK-19055][SQL][PySpark] Fix SparkSession init...

Posted by rdblue <gi...@git.apache.org>.
Github user rdblue commented on a diff in the pull request:

    https://github.com/apache/spark/pull/16454#discussion_r95259073
  
    --- Diff: python/pyspark/sql/session.py ---
    @@ -214,8 +214,12 @@ def __init__(self, sparkContext, jsparkSession=None):
             self._wrapped = SQLContext(self._sc, self, self._jwrapped)
             _monkey_patch_RDD(self)
             install_exception_handler()
    -        if SparkSession._instantiatedContext is None:
    -            SparkSession._instantiatedContext = self
    +        # If we had an instantiated SparkSession attached with a SparkContext
    +        # which is stopped now, we need to renew the instantiated SparkSession.
    +        # Otherwise, we will use invalid SparkSession when we call Builder.getOrCreate.
    +        if SparkSession._instantiatedSession is None \
    +                or SparkSession._instantiatedSession._sc._jsc is None:
    --- End diff --
    
    What about setting `SparkSession._instantiatedSession` to `None` in `getOrCreate` instead of duplicating the logic?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #16454: [SPARK-19055][SQL][PySpark] Fix SparkSession initializat...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/16454
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/70793/
    Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org