Posted to reviews@spark.apache.org by RussellSpitzer <gi...@git.apache.org> on 2018/08/03 16:07:13 UTC

[GitHub] spark pull request #21990: [SPARK-25003][PYSPARK] Use SessionExtensions in P...

GitHub user RussellSpitzer opened a pull request:

    https://github.com/apache/spark/pull/21990

    [SPARK-25003][PYSPARK] Use SessionExtensions in Pyspark

    Master
    
    ## What changes were proposed in this pull request?
    
    Previously, PySpark used the private SparkSession constructor when
    building that object. This resulted in a SparkSession that never checked
    the spark.sql.extensions parameter for additional session extensions. To
    fix this, we instead go through the SparkSession.builder() path, as SparkR
    does; this loads the extensions and allows their use in PySpark.
    
    ## How was this patch tested?
    
    This was manually tested by passing a class to spark.sql.extensions and making sure its injected strategies appeared in the spark._jsparkSession.sessionState.planner.strategies list; a minimal sketch of that check is shown below. We could add an automatic test, but I'm not very familiar with the PySpark testing framework. I would be glad to implement one if requested.
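
    As a minimal, hedged sketch of that check (it assumes the `org.apache.spark.sql.MyExtensions` test helper referenced later in this thread, or any other extensions class, is on the driver classpath):

    ```python
    from pyspark.sql import SparkSession

    # Build the session through the builder so that spark.sql.extensions is honored.
    spark = SparkSession.builder \
        .master("local[4]") \
        .appName("extensions-check") \
        .config("spark.sql.extensions", "org.apache.spark.sql.MyExtensions") \
        .getOrCreate()

    # Inspect the JVM session's planner strategies through py4j; the strategy
    # injected by the extensions class should appear in this list.
    strategies = spark._jsparkSession.sessionState().planner().strategies()
    print(strategies.toString())

    spark.stop()
    ```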
    
    Please review http://spark.apache.org/contributing.html before opening a pull request.


You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/RussellSpitzer/spark SPARK-25003-master

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/21990.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #21990
    
----
commit f790ae000ae1c3f4030162028186448d345e2984
Author: Russell Spitzer <ru...@...>
Date:   2018-08-03T16:04:00Z

    [SPARK-25003][PYSPARK] Use SessionExtensions in Pyspark
    
    Previously, PySpark used the private SparkSession constructor when
    building that object. This resulted in a SparkSession that never checked
    the spark.sql.extensions parameter for additional session extensions. To
    fix this, we instead go through the SparkSession.builder() path, as SparkR
    does; this loads the extensions and allows their use in PySpark.

----


---



[GitHub] spark issue #21990: [SPARK-25003][PYSPARK] Use SessionExtensions in Pyspark

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21990
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/94975/
    Test PASSed.


---



[GitHub] spark pull request #21990: [SPARK-25003][PYSPARK] Use SessionExtensions in P...

Posted by RussellSpitzer <gi...@git.apache.org>.
Github user RussellSpitzer commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21990#discussion_r225613517
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/SparkSession.scala ---
    @@ -1136,4 +1121,27 @@ object SparkSession extends Logging {
           SparkSession.clearDefaultSession()
         }
       }
    +
    +  /**
    +   * Initialize extensions if the user has defined a configurator class in their SparkConf.
    +   * This class will be applied to the extensions passed into this function.
    +   */
    +  private[sql] def applyExtensionsFromConf(conf: SparkConf, extensions: SparkSessionExtensions) {
    --- End diff --
    
    It's difficult here since I'm attempting to cause the least change in behavior for the old code paths :(


---



[GitHub] spark issue #21990: [SPARK-25003][PYSPARK] Use SessionExtensions in Pyspark

Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on the issue:

    https://github.com/apache/spark/pull/21990
  
    @RussellSpitzer, also please ping me here if you face any difficulties. I am willing to help or push some changes to your branch.


---



[GitHub] spark pull request #21990: [SPARK-25003][PYSPARK] Use SessionExtensions in P...

Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21990#discussion_r221436324
  
    --- Diff: python/pyspark/sql/session.py ---
    @@ -212,13 +212,17 @@ def __init__(self, sparkContext, jsparkSession=None):
             self._sc = sparkContext
             self._jsc = self._sc._jsc
             self._jvm = self._sc._jvm
    +
    --- End diff --
    
    really not a big deal but let's revert this newline to leave related changes only.


---



[GitHub] spark issue #21990: [SPARK-25003][PYSPARK][MASTER] Use SessionExtensions in ...

Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on the issue:

    https://github.com/apache/spark/pull/21990
  
    @RussellSpitzer, let's close the other ones except for this and name it `[SPARK-25003][PYSPARK] Use SessionExtensions in Pyspark`. Let me review this one within a few days. Also, I don't think we should do this for branch-2.2, at least. The logic here is quite convoluted, and I would rather avoid backporting it even to branch-2.3.


---



[GitHub] spark pull request #21990: [SPARK-25003][PYSPARK] Use SessionExtensions in P...

Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21990#discussion_r210790581
  
    --- Diff: python/pyspark/sql/tests.py ---
    @@ -3563,6 +3563,51 @@ def test_query_execution_listener_on_collect_with_arrow(self):
                     "The callback from the query execution listener should be called after 'toPandas'")
     
     
    +class SparkExtensionsTest(unittest.TestCase, SQLTestUtils):
    +    # These tests are separate because it uses 'spark.sql.extensions' which is
    +    # static and immutable. This can't be set or unset, for example, via `spark.conf`.
    +
    +    @classmethod
    +    def setUpClass(cls):
    +        import glob
    +        from pyspark.find_spark_home import _find_spark_home
    +
    +        SPARK_HOME = _find_spark_home()
    +        filename_pattern = (
    +            "sql/core/target/scala-*/test-classes/org/apache/spark/sql/"
    +            "SparkSessionExtensionSuite.class")
    +        if not glob.glob(os.path.join(SPARK_HOME, filename_pattern)):
    +            raise unittest.SkipTest(
    +                "'org.apache.spark.sql.SparkSessionExtensionSuite.' is not "
    +                "available. Will skip the related tests.")
    +
    +        # Note that 'spark.sql.extensions' is a static immutable configuration.
    +        cls.spark = SparkSession.builder \
    +            .master("local[4]") \
    +            .appName(cls.__name__) \
    +            .config(
    +                "spark.sql.extensions",
    +                "org.apache.spark.sql.MyExtensions") \
    +            .getOrCreate()
    +
    +    @classmethod
    +    def tearDownClass(cls):
    +        cls.spark.stop()
    +
    +    def tearDown(self):
    +        self.spark._jvm.OnSuccessCall.clear()
    --- End diff --
    
    This wouldn't be needed here, since I added that only to test whether the callback is called, in the PR I pointed out.


---



[GitHub] spark pull request #21990: [SPARK-25003][PYSPARK] Use SessionExtensions in P...

Posted by RussellSpitzer <gi...@git.apache.org>.
Github user RussellSpitzer commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21990#discussion_r210909083
  
    --- Diff: python/pyspark/sql/tests.py ---
    @@ -3563,6 +3563,51 @@ def test_query_execution_listener_on_collect_with_arrow(self):
                     "The callback from the query execution listener should be called after 'toPandas'")
     
     
    +class SparkExtensionsTest(unittest.TestCase, SQLTestUtils):
    +    # These tests are separate because it uses 'spark.sql.extensions' which is
    +    # static and immutable. This can't be set or unset, for example, via `spark.conf`.
    +
    +    @classmethod
    +    def setUpClass(cls):
    +        import glob
    +        from pyspark.find_spark_home import _find_spark_home
    +
    +        SPARK_HOME = _find_spark_home()
    +        filename_pattern = (
    +            "sql/core/target/scala-*/test-classes/org/apache/spark/sql/"
    +            "SparkSessionExtensionSuite.class")
    +        if not glob.glob(os.path.join(SPARK_HOME, filename_pattern)):
    +            raise unittest.SkipTest(
    +                "'org.apache.spark.sql.SparkSessionExtensionSuite.' is not "
    +                "available. Will skip the related tests.")
    +
    +        # Note that 'spark.sql.extensions' is a static immutable configuration.
    +        cls.spark = SparkSession.builder \
    +            .master("local[4]") \
    +            .appName(cls.__name__) \
    +            .config(
    +                "spark.sql.extensions",
    +                "org.apache.spark.sql.MyExtensions") \
    --- End diff --
    
    I'm not sure what you mean here. This class already exists as part of SparkSessionExtensionSuite, and I'm just reusing it with no changes.


---



[GitHub] spark issue #21990: [SPARK-25003][PYSPARK] Use SessionExtensions in Pyspark

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/21990
  
    **[Test build #94873 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/94873/testReport)** for PR 21990 at commit [`0eea205`](https://github.com/apache/spark/commit/0eea205ca0591c68975412873b34393f6bf19437).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds the following public classes _(experimental)_:
      * `class SparkExtensionsTest(unittest.TestCase, SQLTestUtils):`


---



[GitHub] spark issue #21990: [SPARK-25003][PYSPARK] Use SessionExtensions in Pyspark

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/21990
  
    **[Test build #94866 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/94866/testReport)** for PR 21990 at commit [`7ba70b5`](https://github.com/apache/spark/commit/7ba70b524f9779529142f6c70b04610b5b068a05).
     * This patch **fails Python style tests**.
     * This patch merges cleanly.
     * This patch adds the following public classes _(experimental)_:
      * `class SparkExtensionsTest(unittest.TestCase, SQLTestUtils):`


---



[GitHub] spark issue #21990: [SPARK-25003][PYSPARK] Use SessionExtensions in Pyspark

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/21990
  
    **[Test build #97472 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/97472/testReport)** for PR 21990 at commit [`d9b2a55`](https://github.com/apache/spark/commit/d9b2a55275b74c406d9f9c435bf1b53a6ef4b35a).


---



[GitHub] spark pull request #21990: [SPARK-25003][PYSPARK] Use SessionExtensions in P...

Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21990#discussion_r225818067
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/SparkSession.scala ---
    @@ -1136,4 +1121,27 @@ object SparkSession extends Logging {
           SparkSession.clearDefaultSession()
         }
       }
    +
    +  /**
    +   * Initialize extensions if the user has defined a configurator class in their SparkConf.
    +   * This class will be applied to the extensions passed into this function.
    +   */
    +  private[sql] def applyExtensionsFromConf(conf: SparkConf, extensions: SparkSessionExtensions) {
    --- End diff --
    
    Eh .. I think it's okay to have a function and returns that updated extensions.


---



[GitHub] spark issue #21990: [SPARK-25003][PYSPARK] Use SessionExtensions in Pyspark

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21990
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/94866/
    Test FAILed.


---



[GitHub] spark issue #21990: [SPARK-25003][PYSPARK] Use SessionExtensions in Pyspark

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/21990
  
    **[Test build #97497 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/97497/testReport)** for PR 21990 at commit [`3629c78`](https://github.com/apache/spark/commit/3629c78118eb77589ae1126e877f0fd2b57441ce).


---



[GitHub] spark issue #21990: [SPARK-25003][PYSPARK] Use SessionExtensions in Pyspark

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21990
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/94870/
    Test FAILed.


---



[GitHub] spark issue #21990: [SPARK-25003][PYSPARK] Use SessionExtensions in Pyspark

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/21990
  
    **[Test build #96827 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/96827/testReport)** for PR 21990 at commit [`cf7cf75`](https://github.com/apache/spark/commit/cf7cf75efc616151d75e1da10630731fa98e117e).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---



[GitHub] spark issue #21990: [SPARK-25003][PYSPARK] Use SessionExtensions in Pyspark

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21990
  
    Merged build finished. Test PASSed.


---



[GitHub] spark issue #21990: [SPARK-25003][PYSPARK] Use SessionExtensions in Pyspark

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21990
  
    Merged build finished. Test PASSed.


---



[GitHub] spark issue #21990: [SPARK-25003][PYSPARK] Use SessionExtensions in Pyspark

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21990
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/96824/
    Test FAILed.


---



[GitHub] spark issue #21990: [SPARK-25003][PYSPARK] Use SessionExtensions in Pyspark

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/21990
  
    **[Test build #96253 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/96253/testReport)** for PR 21990 at commit [`7775c08`](https://github.com/apache/spark/commit/7775c0886a7ae2be35e3caa974763f7b23954c6f).
     * This patch **fails Scala style tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.


---



[GitHub] spark pull request #21990: [SPARK-25003][PYSPARK] Use SessionExtensions in P...

Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21990#discussion_r213010408
  
    --- Diff: python/pyspark/sql/session.py ---
    @@ -218,7 +218,9 @@ def __init__(self, sparkContext, jsparkSession=None):
                             .sparkContext().isStopped():
                     jsparkSession = self._jvm.SparkSession.getDefaultSession().get()
                 else:
    -                jsparkSession = self._jvm.SparkSession(self._jsc.sc())
    +                jsparkSession = self._jvm.SparkSession.builder() \
    --- End diff --
    
    @RussellSpitzer, have you maybe had a chance to take a look and see if we can deduplicate some logic compared to Scala's `getOrCreate`? I am suggesting this since the code path now looks like it duplicates some of the logic there.
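
    As a rough Python sketch of the two construction paths being compared (hedged: the quoted diff above is truncated, so the exact chained calls in the patch are not reproduced here):

    ```python
    from pyspark import SparkContext

    sc = SparkContext(master="local[2]", appName="builder-path-demo")

    # Old path: instantiate the JVM SparkSession directly; nothing along this
    # path consults spark.sql.extensions.
    jsession_direct = sc._jvm.SparkSession(sc._jsc.sc())

    # Builder path: Scala's Builder.getOrCreate() resolves spark.sql.extensions
    # and applies the configured SparkSessionExtensions, which is the logic we
    # would rather reuse than re-implement in Python.
    jsession_built = sc._jvm.SparkSession.builder().getOrCreate()
    ```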


---



[GitHub] spark issue #21990: [SPARK-25003][PYSPARK] Use SessionExtensions in Pyspark

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/21990
  
    **[Test build #96257 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/96257/testReport)** for PR 21990 at commit [`67d9772`](https://github.com/apache/spark/commit/67d9772e1f470bba83e311cd10cfe586a7f11b92).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---



[GitHub] spark issue #21990: [SPARK-25003][PYSPARK] Use SessionExtensions in Pyspark

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21990
  
    Merged build finished. Test PASSed.


---



[GitHub] spark issue #21990: [SPARK-25003][PYSPARK] Use SessionExtensions in Pyspark

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21990
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/96253/
    Test FAILed.


---



[GitHub] spark issue #21990: [SPARK-25003][PYSPARK] Use SessionExtensions in Pyspark

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/21990
  
    **[Test build #97471 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/97471/testReport)** for PR 21990 at commit [`8ab76d4`](https://github.com/apache/spark/commit/8ab76d4385d1849c43ffd940d539c143e0ba8c81).
     * This patch **fails Scala style tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.


---



[GitHub] spark issue #21990: [SPARK-25003][PYSPARK] Use SessionExtensions in Pyspark

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21990
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/96827/
    Test PASSed.


---



[GitHub] spark pull request #21990: [SPARK-25003][PYSPARK] Use SessionExtensions in P...

Posted by ueshin <gi...@git.apache.org>.
Github user ueshin commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21990#discussion_r225077270
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/SparkSession.scala ---
    @@ -1136,4 +1121,27 @@ object SparkSession extends Logging {
           SparkSession.clearDefaultSession()
         }
       }
    +
    +  /**
    +   * Initialize extensions if the user has defined a configurator class in their SparkConf.
    +   * This class will be applied to the extensions passed into this function.
    +   */
    +  private[sql] def applyExtensionsFromConf(conf: SparkConf, extensions: SparkSessionExtensions) {
    --- End diff --
    
    How about returning `SparkSessionExtensions` from this method, and modifying the secondary constructor of `SparkSession` as:
    
    ```scala
    private[sql] def this(sc: SparkContext) {
      this(sc, None, None,
        SparkSession.applyExtensionsFromConf(sc.getConf, new SparkSessionExtensions))
    }
    ```
    
    I'm a little worried about whether the order in which we apply the extensions might matter.


---



[GitHub] spark issue #21990: [SPARK-25003][PYSPARK] Use SessionExtensions in Pyspark

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21990
  
    Merged build finished. Test PASSed.


---



[GitHub] spark pull request #21990: [SPARK-25003][PYSPARK] Use SessionExtensions in P...

Posted by ueshin <gi...@git.apache.org>.
Github user ueshin commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21990#discussion_r225439067
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/SparkSession.scala ---
    @@ -1136,4 +1121,27 @@ object SparkSession extends Logging {
           SparkSession.clearDefaultSession()
         }
       }
    +
    +  /**
    +   * Initialize extensions if the user has defined a configurator class in their SparkConf.
    +   * This class will be applied to the extensions passed into this function.
    +   */
    +  private[sql] def applyExtensionsFromConf(conf: SparkConf, extensions: SparkSessionExtensions) {
    --- End diff --
    
    On second thought, could we move the method call to the top of the default constructor?


---



[GitHub] spark issue #21990: [SPARK-25003][PYSPARK] Use SessionExtensions in Pyspark

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21990
  
    Merged build finished. Test PASSed.


---



[GitHub] spark issue #21990: [SPARK-25003][PYSPARK] Use SessionExtensions in Pyspark

Posted by RussellSpitzer <gi...@git.apache.org>.
Github user RussellSpitzer commented on the issue:

    https://github.com/apache/spark/pull/21990
  
    Addressed comments from @HyukjinKwon. I'm interested in @ueshin's suggestions, but I can't figure out how to do that unless we bake it into the Extensions constructor. If we place it in the SparkSession constructor, then invocations of `newSession` will reapply existing extensions. I added a note in the code.


---



[GitHub] spark issue #21990: [SPARK-25003][PYSPARK] Use SessionExtensions in Pyspark

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21990
  
    Merged build finished. Test PASSed.


---



[GitHub] spark issue #21990: [SPARK-25003][PYSPARK] Use SessionExtensions in Pyspark

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/21990
  
    **[Test build #94151 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/94151/testReport)** for PR 21990 at commit [`84c2513`](https://github.com/apache/spark/commit/84c2513a2cb7a50ba01cdc0d49e2ad7f9583799b).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---



[GitHub] spark issue #21990: [SPARK-25003][PYSPARK] Use SessionExtensions in Pyspark

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/21990
  
    **[Test build #97510 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/97510/testReport)** for PR 21990 at commit [`3629c78`](https://github.com/apache/spark/commit/3629c78118eb77589ae1126e877f0fd2b57441ce).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---



[GitHub] spark issue #21990: [SPARK-25003][PYSPARK] Use SessionExtensions in Pyspark

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21990
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/96199/
    Test PASSed.


---



[GitHub] spark pull request #21990: [SPARK-25003][PYSPARK] Use SessionExtensions in P...

Posted by RussellSpitzer <gi...@git.apache.org>.
Github user RussellSpitzer commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21990#discussion_r225805019
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/SparkSession.scala ---
    @@ -1136,4 +1121,27 @@ object SparkSession extends Logging {
           SparkSession.clearDefaultSession()
         }
       }
    +
    +  /**
    +   * Initialize extensions if the user has defined a configurator class in their SparkConf.
    +   * This class will be applied to the extensions passed into this function.
    +   */
    +  private[sql] def applyExtensionsFromConf(conf: SparkConf, extensions: SparkSessionExtensions) {
    --- End diff --
    
    I am always a little nervous about having functions return objects they take in as parameters and then modify; it gives the impression that they are stateless. If you think this is clearer, I can make the change.
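
    As a small, hedged Python stand-in for the concern (the real method is Scala's `applyExtensionsFromConf`; these names and the dict-based conf are purely illustrative):

    ```python
    def apply_extensions_in_place(conf, extensions):
        # Mutates `extensions`; callers have to know about the side effect.
        name = conf.get("spark.sql.extensions")
        if name:
            extensions.append(name)

    def apply_extensions_returning(conf, extensions):
        # Same mutation, but also returning the argument can read as if a
        # fresh, independent object were produced.
        apply_extensions_in_place(conf, extensions)
        return extensions

    exts = []
    result = apply_extensions_returning({"spark.sql.extensions": "a.b.MyExtensions"}, exts)
    print(result is exts)  # True -- the "returned" object is the mutated input
    ```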


---



[GitHub] spark pull request #21990: [SPARK-25003][PYSPARK] Use SessionExtensions in P...

Posted by RussellSpitzer <gi...@git.apache.org>.
Github user RussellSpitzer commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21990#discussion_r210941883
  
    --- Diff: python/pyspark/sql/tests.py ---
    @@ -3563,6 +3563,51 @@ def test_query_execution_listener_on_collect_with_arrow(self):
                     "The callback from the query execution listener should be called after 'toPandas'")
     
     
    +class SparkExtensionsTest(unittest.TestCase, SQLTestUtils):
    +    # These tests are separate because it uses 'spark.sql.extensions' which is
    +    # static and immutable. This can't be set or unset, for example, via `spark.conf`.
    +
    +    @classmethod
    +    def setUpClass(cls):
    +        import glob
    +        from pyspark.find_spark_home import _find_spark_home
    +
    +        SPARK_HOME = _find_spark_home()
    +        filename_pattern = (
    +            "sql/core/target/scala-*/test-classes/org/apache/spark/sql/"
    +            "SparkSessionExtensionSuite.class")
    +        if not glob.glob(os.path.join(SPARK_HOME, filename_pattern)):
    +            raise unittest.SkipTest(
    +                "'org.apache.spark.sql.SparkSessionExtensionSuite.' is not "
    +                "available. Will skip the related tests.")
    +
    +        # Note that 'spark.sql.extensions' is a static immutable configuration.
    +        cls.spark = SparkSession.builder \
    +            .master("local[4]") \
    +            .appName(cls.__name__) \
    +            .config(
    +                "spark.sql.extensions",
    +                "org.apache.spark.sql.MyExtensions") \
    --- End diff --
    
    Reduce, reuse, recycle :)


---



[GitHub] spark issue #21990: [SPARK-25003][PYSPARK] Use SessionExtensions in Pyspark

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21990
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/97497/
    Test FAILed.


---



[GitHub] spark issue #21990: [SPARK-25003][PYSPARK] Use SessionExtensions in Pyspark

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21990
  
    Merged build finished. Test FAILed.


---



[GitHub] spark issue #21990: [SPARK-25003][PYSPARK] Use SessionExtensions in Pyspark

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/21990
  
    **[Test build #94975 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/94975/testReport)** for PR 21990 at commit [`21ff627`](https://github.com/apache/spark/commit/21ff6273a7fcf54c2f40abde951d4aab89f96a02).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds the following public classes _(experimental)_:
      * `class SparkExtensionsTest(unittest.TestCase, SQLTestUtils):`


---



[GitHub] spark issue #21990: [SPARK-25003][PYSPARK] Use SessionExtensions in Pyspark

Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on the issue:

    https://github.com/apache/spark/pull/21990
  
    Merged to master.


---



[GitHub] spark pull request #21990: [SPARK-25003][PYSPARK] Use SessionExtensions in P...

Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21990#discussion_r225819012
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/SparkSession.scala ---
    @@ -1136,4 +1121,27 @@ object SparkSession extends Logging {
           SparkSession.clearDefaultSession()
         }
       }
    +
    +  /**
    +   * Initialize extensions if the user has defined a configurator class in their SparkConf.
    +   * This class will be applied to the extensions passed into this function.
    +   */
    +  private[sql] def applyExtensionsFromConf(conf: SparkConf, extensions: SparkSessionExtensions) {
    --- End diff --
    
    Actually either way looks okay.


---



[GitHub] spark issue #21990: [SPARK-25003][PYSPARK] Use SessionExtensions in Pyspark

Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on the issue:

    https://github.com/apache/spark/pull/21990
  
    Looks close to go.


---



[GitHub] spark issue #21990: [SPARK-25003][PYSPARK] Use SessionExtensions in Pyspark

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/21990
  
    **[Test build #94873 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/94873/testReport)** for PR 21990 at commit [`0eea205`](https://github.com/apache/spark/commit/0eea205ca0591c68975412873b34393f6bf19437).


---



[GitHub] spark issue #21990: [SPARK-25003][PYSPARK] Use SessionExtensions in Pyspark

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21990
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/97510/
    Test PASSed.


---



[GitHub] spark issue #21990: [SPARK-25003][PYSPARK] Use SessionExtensions in Pyspark

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21990
  
    Merged build finished. Test FAILed.


---



[GitHub] spark pull request #21990: [SPARK-25003][PYSPARK] Use SessionExtensions in P...

Posted by ueshin <gi...@git.apache.org>.
Github user ueshin commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21990#discussion_r225762148
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/SparkSession.scala ---
    @@ -1136,4 +1121,27 @@ object SparkSession extends Logging {
           SparkSession.clearDefaultSession()
         }
       }
    +
    +  /**
    +   * Initialize extensions if the user has defined a configurator class in their SparkConf.
    +   * This class will be applied to the extensions passed into this function.
    +   */
    +  private[sql] def applyExtensionsFromConf(conf: SparkConf, extensions: SparkSessionExtensions) {
    --- End diff --
    
    Oh, I see, moving it to the default constructor was not a good idea.
    How about the first suggestion?


---



[GitHub] spark issue #21990: [SPARK-25003][PYSPARK] Use SessionExtensions in Pyspark

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/21990
  
    **[Test build #96257 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/96257/testReport)** for PR 21990 at commit [`67d9772`](https://github.com/apache/spark/commit/67d9772e1f470bba83e311cd10cfe586a7f11b92).


---



[GitHub] spark issue #21990: [SPARK-25003][PYSPARK] Use SessionExtensions in Pyspark

Posted by RussellSpitzer <gi...@git.apache.org>.
Github user RussellSpitzer commented on the issue:

    https://github.com/apache/spark/pull/21990
  
    @HyukjinKwon, so you want me to rewrite the code in Python? I will note that SparkR is doing this exact same thing.


---



[GitHub] spark issue #21990: [SPARK-25003][PYSPARK] Use SessionExtensions in Pyspark

Posted by RussellSpitzer <gi...@git.apache.org>.
Github user RussellSpitzer commented on the issue:

    https://github.com/apache/spark/pull/21990
  
    @HyukjinKwon So I've been staring at this for a while today, and I guess the big issue is that we always need to make a Python SparkContext to get a handle on the JavaGateway, so everything that happens before the context is made cannot just be a wrapper around SQL methods and must be reimplemented. Unless we decide to refactor the code so that the JVM is more generally available (probably not possible), we will be stuck with redoing code in Python ...
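
    A hedged sketch of that constraint from the Python side (master/appName values are placeholders):

    ```python
    from pyspark import SparkConf, SparkContext

    # The py4j gateway -- and with it any JVM-side SparkSession/Builder API --
    # only becomes reachable once a Python SparkContext exists, so logic that
    # must run before that point has to be re-implemented in Python.
    conf = SparkConf().setMaster("local[2]").setAppName("gateway-demo")
    sc = SparkContext(conf=conf)

    # Only now do we have a handle on the JVM.
    jvm = sc._jvm
    builder = jvm.SparkSession.builder()  # JVM-side APIs can be wrapped from here on
    ```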


---



[GitHub] spark pull request #21990: [SPARK-25003][PYSPARK] Use SessionExtensions in P...

Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21990#discussion_r224984761
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/SparkSession.scala ---
    @@ -1136,4 +1121,27 @@ object SparkSession extends Logging {
           SparkSession.clearDefaultSession()
         }
       }
    +
    +  /**
    +   * Initialize extensions if the user has defined a configurator class in their SparkConf.
    +   * This class will be applied to the extensions passed into this function.
    +   */
    +  private[sql] def applyExtensionsFromConf(conf: SparkConf, extensions: SparkSessionExtensions) {
    --- End diff --
    
    Let's make it `private`


---



[GitHub] spark issue #21990: [SPARK-25003][PYSPARK] Use SessionExtensions in Pyspark

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21990
  
    Merged build finished. Test FAILed.


---



[GitHub] spark pull request #21990: [SPARK-25003][PYSPARK] Use SessionExtensions in P...

Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21990#discussion_r210940572
  
    --- Diff: python/pyspark/sql/tests.py ---
    @@ -3563,6 +3563,51 @@ def test_query_execution_listener_on_collect_with_arrow(self):
                     "The callback from the query execution listener should be called after 'toPandas'")
     
     
    +class SparkExtensionsTest(unittest.TestCase, SQLTestUtils):
    +    # These tests are separate because it uses 'spark.sql.extensions' which is
    +    # static and immutable. This can't be set or unset, for example, via `spark.conf`.
    +
    +    @classmethod
    +    def setUpClass(cls):
    +        import glob
    +        from pyspark.find_spark_home import _find_spark_home
    +
    +        SPARK_HOME = _find_spark_home()
    +        filename_pattern = (
    +            "sql/core/target/scala-*/test-classes/org/apache/spark/sql/"
    +            "SparkSessionExtensionSuite.class")
    +        if not glob.glob(os.path.join(SPARK_HOME, filename_pattern)):
    +            raise unittest.SkipTest(
    +                "'org.apache.spark.sql.SparkSessionExtensionSuite.' is not "
    +                "available. Will skip the related tests.")
    +
    +        # Note that 'spark.sql.extensions' is a static immutable configuration.
    +        cls.spark = SparkSession.builder \
    +            .master("local[4]") \
    +            .appName(cls.__name__) \
    +            .config(
    +                "spark.sql.extensions",
    +                "org.apache.spark.sql.MyExtensions") \
    --- End diff --
    
    Oh sorry I thought you wrote that class for this test.


---



[GitHub] spark issue #21990: [SPARK-25003][PYSPARK] Use SessionExtensions in Pyspark

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/21990
  
    **[Test build #94149 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/94149/testReport)** for PR 21990 at commit [`f790ae0`](https://github.com/apache/spark/commit/f790ae000ae1c3f4030162028186448d345e2984).
     * This patch **fails Python style tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.


---



[GitHub] spark issue #21990: [SPARK-25003][PYSPARK] Use SessionExtensions in Pyspark

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/21990
  
    **[Test build #94870 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/94870/testReport)** for PR 21990 at commit [`d5c37b7`](https://github.com/apache/spark/commit/d5c37b732f1948d8240bd8de33a080ac5db03571).
     * This patch **fails Python style tests**.
     * This patch merges cleanly.
     * This patch adds the following public classes _(experimental)_:
      * `class SparkExtensionsTest(unittest.TestCase, SQLTestUtils):`


---



[GitHub] spark pull request #21990: [SPARK-25003][PYSPARK] Use SessionExtensions in P...

Posted by ueshin <gi...@git.apache.org>.
Github user ueshin commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21990#discussion_r225811960
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/SparkSession.scala ---
    @@ -1136,4 +1121,27 @@ object SparkSession extends Logging {
           SparkSession.clearDefaultSession()
         }
       }
    +
    +  /**
    +   * Initialize extensions if the user has defined a configurator class in their SparkConf.
    +   * This class will be applied to the extensions passed into this function.
    +   */
    +  private[sql] def applyExtensionsFromConf(conf: SparkConf, extensions: SparkSessionExtensions) {
    --- End diff --
    
    I see, but in that case, we need to ensure that no injection of extensions is used in the default constructor to avoid initializing without extensions from the conf.


---



[GitHub] spark issue #21990: [SPARK-25003][PYSPARK] Use SessionExtensions in Pyspark

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/21990
  
    **[Test build #97497 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/97497/testReport)** for PR 21990 at commit [`3629c78`](https://github.com/apache/spark/commit/3629c78118eb77589ae1126e877f0fd2b57441ce).
     * This patch **fails Spark unit tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.


---



[GitHub] spark pull request #21990: [SPARK-25003][PYSPARK] Use SessionExtensions in P...

Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21990#discussion_r214572917
  
    --- Diff: python/pyspark/sql/tests.py ---
    @@ -3563,6 +3563,48 @@ def test_query_execution_listener_on_collect_with_arrow(self):
                     "The callback from the query execution listener should be called after 'toPandas'")
     
     
    +class SparkExtensionsTest(unittest.TestCase, SQLTestUtils):
    +    # These tests are separate because it uses 'spark.sql.extensions' which is
    +    # static and immutable. This can't be set or unset, for example, via `spark.conf`.
    +
    +    @classmethod
    +    def setUpClass(cls):
    +        import glob
    +        from pyspark.find_spark_home import _find_spark_home
    +
    +        SPARK_HOME = _find_spark_home()
    +        filename_pattern = (
    +            "sql/core/target/scala-*/test-classes/org/apache/spark/sql/"
    +            "SparkSessionExtensionSuite.class")
    +        if not glob.glob(os.path.join(SPARK_HOME, filename_pattern)):
    +            raise unittest.SkipTest(
    +                "'org.apache.spark.sql.SparkSessionExtensionSuite.' is not "
    --- End diff --
    
    nit: `SparkSessionExtensionSuite.` -> `SparkSessionExtensionSuite`


---



[GitHub] spark issue #21990: [SPARK-25003][PYSPARK] Use SessionExtensions in Pyspark

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21990
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/96828/
    Test PASSed.


---



[GitHub] spark issue #21990: [SPARK-25003][PYSPARK] Use SessionExtensions in Pyspark

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/21990
  
    **[Test build #96824 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/96824/testReport)** for PR 21990 at commit [`8a26728`](https://github.com/apache/spark/commit/8a2672806c6513692435bef113935f2e6f67a910).


---



[GitHub] spark issue #21990: [SPARK-25003][PYSPARK] Use SessionExtensions in Pyspark

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/21990
  
    **[Test build #97472 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/97472/testReport)** for PR 21990 at commit [`d9b2a55`](https://github.com/apache/spark/commit/d9b2a55275b74c406d9f9c435bf1b53a6ef4b35a).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---



[GitHub] spark issue #21990: [SPARK-25003][PYSPARK] Use SessionExtensions in Pyspark

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21990
  
    Merged build finished. Test FAILed.


---



[GitHub] spark issue #21990: [SPARK-25003][PYSPARK] Use SessionExtensions in Pyspark

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21990
  
    Can one of the admins verify this patch?


---



[GitHub] spark pull request #21990: [SPARK-25003][PYSPARK] Use SessionExtensions in P...

Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21990#discussion_r224984675
  
    --- Diff: python/pyspark/sql/session.py ---
    @@ -219,6 +219,7 @@ def __init__(self, sparkContext, jsparkSession=None):
                     jsparkSession = self._jvm.SparkSession.getDefaultSession().get()
                 else:
                     jsparkSession = self._jvm.SparkSession(self._jsc.sc())
    +
    --- End diff --
    
    Oh haha, let's get rid of this change


---



[GitHub] spark pull request #21990: [SPARK-25003][PYSPARK] Use SessionExtensions in P...

Posted by RussellSpitzer <gi...@git.apache.org>.
Github user RussellSpitzer commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21990#discussion_r225612270
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/SparkSession.scala ---
    @@ -1136,4 +1121,27 @@ object SparkSession extends Logging {
           SparkSession.clearDefaultSession()
         }
       }
    +
    +  /**
    +   * Initialize extensions if the user has defined a configurator class in their SparkConf.
    +   * This class will be applied to the extensions passed into this function.
    +   */
    +  private[sql] def applyExtensionsFromConf(conf: SparkConf, extensions: SparkSessionExtensions) {
    --- End diff --
    
    I thought about this, but was worried about multiple invocations of the extensions:
    once every time the SparkSession is cloned.


---



[GitHub] spark issue #21990: [SPARK-25003][PYSPARK] Use SessionExtensions in Pyspark

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21990
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/94873/
    Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21990: [SPARK-25003][PYSPARK] Use SessionExtensions in Pyspark

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/21990
  
    **[Test build #96199 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/96199/testReport)** for PR 21990 at commit [`def4f3e`](https://github.com/apache/spark/commit/def4f3e91b38e7d63c03fc7206489fd01220e878).


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21990: [SPARK-25003][PYSPARK] Use SessionExtensions in Pyspark

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21990
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/96825/
    Test FAILed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21990: [SPARK-25003][PYSPARK] Use SessionExtensions in Pyspark

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/21990
  
    **[Test build #97471 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/97471/testReport)** for PR 21990 at commit [`8ab76d4`](https://github.com/apache/spark/commit/8ab76d4385d1849c43ffd940d539c143e0ba8c81).


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #21990: [SPARK-25003][PYSPARK] Use SessionExtensions in P...

Posted by asfgit <gi...@git.apache.org>.
Github user asfgit closed the pull request at:

    https://github.com/apache/spark/pull/21990


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21990: [SPARK-25003][PYSPARK] Use SessionExtensions in Pyspark

Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on the issue:

    https://github.com/apache/spark/pull/21990
  
    Merged to master.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21990: [SPARK-25003][PYSPARK] Use SessionExtensions in Pyspark

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21990
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/94151/
    Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21990: [SPARK-25003][PYSPARK] Use SessionExtensions in Pyspark

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/21990
  
    **[Test build #94870 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/94870/testReport)** for PR 21990 at commit [`d5c37b7`](https://github.com/apache/spark/commit/d5c37b732f1948d8240bd8de33a080ac5db03571).


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #21990: [SPARK-25003][PYSPARK] Use SessionExtensions in P...

Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21990#discussion_r224985166
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/SparkSession.scala ---
    @@ -1136,4 +1121,27 @@ object SparkSession extends Logging {
           SparkSession.clearDefaultSession()
         }
       }
    +
    +  /**
    +   * Initialize extensions if the user has defined a configurator class in their SparkConf.
    +   * This class will be applied to the extensions passed into this function.
    +   */
    +  private[sql] def applyExtensionsFromConf(conf: SparkConf, extensions: SparkSessionExtensions) {
    +    val extensionConfOption = conf.get(StaticSQLConf.SPARK_SESSION_EXTENSIONS)
    --- End diff --
    
    I think we can pass just `conf.get(StaticSQLConf.SPARK_SESSION_EXTENSIONS)` as the argument instead of the whole `SparkConf`, and name it `applyExtensions`.
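    
    For illustration only, a minimal sketch of what the suggested `applyExtensions(extensionConfOption, extensions)` could look like. The body below is assumed to mirror the existing `applyExtensionsFromConf` (whose body is not quoted in this thread), so the class-loading and error-handling details are assumptions, not the merged code:
    
    ```scala
    // Hypothetical sketch: assumes the helpers available inside the
    // SparkSession object (Utils.classForName, logWarning) and that the
    // configured class implements SparkSessionExtensions => Unit.
    private[sql] def applyExtensions(
        extensionConfOption: Option[String],
        extensions: SparkSessionExtensions): Unit = {
      extensionConfOption.foreach { extensionConfClassName =>
        try {
          val extensionConfClass = Utils.classForName(extensionConfClassName)
          val extensionConf = extensionConfClass.getConstructor().newInstance()
            .asInstanceOf[SparkSessionExtensions => Unit]
          extensionConf(extensions)
        } catch {
          // Ignore the error if the configured class cannot be found or instantiated.
          case e @ (_: ClassCastException |
                    _: ClassNotFoundException |
                    _: NoClassDefFoundError) =>
            logWarning(s"Cannot use $extensionConfClassName to configure session extensions.", e)
        }
      }
    }
    ```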


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #21990: [SPARK-25003][PYSPARK] Use SessionExtensions in P...

Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21990#discussion_r224985023
  
    --- Diff: python/pyspark/sql/tests.py ---
    @@ -3563,6 +3563,48 @@ def test_query_execution_listener_on_collect_with_arrow(self):
                     "The callback from the query execution listener should be called after 'toPandas'")
     
     
    +class SparkExtensionsTest(unittest.TestCase, SQLTestUtils):
    --- End diff --
    
    I think `SQLTestUtils` is not needed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21990: [SPARK-25003][PYSPARK] Use SessionExtensions in Pyspark

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21990
  
    Merged build finished. Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #21990: [SPARK-25003][PYSPARK] Use SessionExtensions in P...

Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21990#discussion_r224984813
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/SparkSession.scala ---
    @@ -86,6 +86,7 @@ class SparkSession private(
     
       private[sql] def this(sc: SparkContext) {
         this(sc, None, None, new SparkSessionExtensions)
    +    SparkSession.applyExtensionsFromConf(sc.getConf, this.extensions)
    --- End diff --
    
    Let's add a comment explaining why this is only in this constructor. It might look odd that this constructor alone needs to run `applyExtensionsFromConf`.
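    
    For instance (the comment wording below is illustrative and not the text that was eventually merged), something along these lines:
    
    ```scala
    private[sql] def this(sc: SparkContext) {
      this(sc, None, None, new SparkSessionExtensions)
      // This constructor bypasses Builder.getOrCreate() (it is used by PySpark
      // and by Scala-side tests), so spark.sql.extensions would otherwise be
      // ignored; apply the configured extensions here to keep that code path
      // consistent with sessions built through the builder.
      SparkSession.applyExtensionsFromConf(sc.getConf, this.extensions)
    }
    ```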


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21990: [SPARK-25003][PYSPARK] Use SessionExtensions in Pyspark

Posted by RussellSpitzer <gi...@git.apache.org>.
Github user RussellSpitzer commented on the issue:

    https://github.com/apache/spark/pull/21990
  
    I'm fine with anything, really. I still think the ideal solution is probably not to tie the creation of the py4j gateway to the SparkContext, but that's a much bigger refactor.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21990: [SPARK-25003][PYSPARK] Use SessionExtensions in Pyspark

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21990
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/97472/
    Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #21990: [SPARK-25003][PYSPARK] Use SessionExtensions in P...

Posted by RussellSpitzer <gi...@git.apache.org>.
Github user RussellSpitzer commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21990#discussion_r225610975
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/SparkSession.scala ---
    @@ -1136,4 +1121,27 @@ object SparkSession extends Logging {
           SparkSession.clearDefaultSession()
         }
       }
    +
    +  /**
    +   * Initialize extensions if the user has defined a configurator class in their SparkConf.
    +   * This class will be applied to the extensions passed into this function.
    +   */
    +  private[sql] def applyExtensionsFromConf(conf: SparkConf, extensions: SparkSessionExtensions) {
    --- End diff --
    
    The default constructor of SparkSession?


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #21990: [SPARK-25003][PYSPARK] Use SessionExtensions in P...

Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21990#discussion_r221436367
  
    --- Diff: python/pyspark/sql/session.py ---
    @@ -212,13 +212,17 @@ def __init__(self, sparkContext, jsparkSession=None):
             self._sc = sparkContext
             self._jsc = self._sc._jsc
             self._jvm = self._sc._jvm
    +
             if jsparkSession is None:
                 if self._jvm.SparkSession.getDefaultSession().isDefined() \
                         and not self._jvm.SparkSession.getDefaultSession().get() \
                             .sparkContext().isStopped():
                     jsparkSession = self._jvm.SparkSession.getDefaultSession().get()
                 else:
    -                jsparkSession = self._jvm.SparkSession(self._jsc.sc())
    +                extensions = self._sc._jvm.org.apache.spark.sql\
    --- End diff --
    
    tiny nit: just for consistency `extensions = self._sc._jvm.org.apache.spark.sql\` -> `extensions = self._sc._jvm.org.apache.spark.sql \`


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21990: [SPARK-25003][PYSPARK] Use SessionExtensions in Pyspark

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21990
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/97471/
    Test FAILed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21990: [SPARK-25003][PYSPARK] Use SessionExtensions in Pyspark

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/21990
  
    **[Test build #96828 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/96828/testReport)** for PR 21990 at commit [`4ddaff8`](https://github.com/apache/spark/commit/4ddaff8fcd1bfedc221eda7d6f37e5e9f28bfb4e).


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21990: [SPARK-25003][PYSPARK] Use SessionExtensions in Pyspark

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21990
  
    Merged build finished. Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21990: [SPARK-25003][PYSPARK] Use SessionExtensions in Pyspark

Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on the issue:

    https://github.com/apache/spark/pull/21990
  
    retest this please


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21990: [SPARK-25003][PYSPARK] Use SessionExtensions in Pyspark

Posted by ueshin <gi...@git.apache.org>.
Github user ueshin commented on the issue:

    https://github.com/apache/spark/pull/21990
  
    LGTM.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21990: [SPARK-25003][PYSPARK] Use SessionExtensions in Pyspark

Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on the issue:

    https://github.com/apache/spark/pull/21990
  
    > We could just translate the rest of `getOrCreate` into Python, but then every time the Scala code is patched it would need a matching Python change as well.
    
    @RussellSpitzer, I actually think we already duplicate some code between the Python and Scala sides on this code path (see `getOrCreate` on the Scala side and `__init__` on the Python side), and with this change that code path is executed twice. I don't think this is an orthogonal issue to handle separately.
    
    If avoiding that duplicated execution requires a bit of duplicated code on the Python side, then I think that is an acceptable workaround for now.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #21990: [SPARK-25003][PYSPARK] Use SessionExtensions in P...

Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21990#discussion_r209507286
  
    --- Diff: python/pyspark/sql/session.py ---
    @@ -218,7 +218,9 @@ def __init__(self, sparkContext, jsparkSession=None):
                             .sparkContext().isStopped():
                     jsparkSession = self._jvm.SparkSession.getDefaultSession().get()
                 else:
    -                jsparkSession = self._jvm.SparkSession(self._jsc.sc())
    +                jsparkSession = self._jvm.SparkSession.builder() \
    +                    .sparkContext(self._jsc.sc()) \
    +                    .getOrCreate()
    --- End diff --
    
    @RussellSpitzer, mind checking the `getOrCreate` logic on the Scala side and deduplicating it here while we are at it? Some logic, for instance setting the default session, is duplicated here on the Python side and there on the Scala side.
    
    It would be nicer if we had some tests as well. `spark.sql.extensions` is a static configuration, right? In that case we could add a test; for an example, please refer to https://github.com/apache/spark/pull/21007, where I added a test with a static configuration.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21990: [SPARK-25003][PYSPARK] Use SessionExtensions in Pyspark

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/21990
  
    **[Test build #94149 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/94149/testReport)** for PR 21990 at commit [`f790ae0`](https://github.com/apache/spark/commit/f790ae000ae1c3f4030162028186448d345e2984).


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21990: [SPARK-25003][PYSPARK] Use SessionExtensions in Pyspark

Posted by RussellSpitzer <gi...@git.apache.org>.
Github user RussellSpitzer commented on the issue:

    https://github.com/apache/spark/pull/21990
  
    Added a new method of injecting extensions; this way the `getOrCreate` code from the Scala method is not needed. @HyukjinKwon 


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #21990: [SPARK-25003][PYSPARK] Use SessionExtensions in P...

Posted by RussellSpitzer <gi...@git.apache.org>.
Github user RussellSpitzer commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21990#discussion_r210428804
  
    --- Diff: python/pyspark/sql/session.py ---
    @@ -218,7 +218,9 @@ def __init__(self, sparkContext, jsparkSession=None):
                             .sparkContext().isStopped():
                     jsparkSession = self._jvm.SparkSession.getDefaultSession().get()
                 else:
    -                jsparkSession = self._jvm.SparkSession(self._jsc.sc())
    +                jsparkSession = self._jvm.SparkSession.builder() \
    +                    .sparkContext(self._jsc.sc()) \
    +                    .getOrCreate()
    --- End diff --
    
    Yeah, let me add the test, and then I'll clear out all the Python duplication of Scala code. I can make it more of a wrapper and less of a reimplementation.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21990: [SPARK-25003][PYSPARK] Use SessionExtensions in Pyspark

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/21990
  
    **[Test build #96253 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/96253/testReport)** for PR 21990 at commit [`7775c08`](https://github.com/apache/spark/commit/7775c0886a7ae2be35e3caa974763f7b23954c6f).


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21990: [SPARK-25003][PYSPARK] Use SessionExtensions in Pyspark

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/21990
  
    **[Test build #96828 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/96828/testReport)** for PR 21990 at commit [`4ddaff8`](https://github.com/apache/spark/commit/4ddaff8fcd1bfedc221eda7d6f37e5e9f28bfb4e).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21990: [SPARK-25003][PYSPARK] Use SessionExtensions in Pyspark

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21990
  
    Merged build finished. Test FAILed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21990: [SPARK-25003][PYSPARK] Use SessionExtensions in Pyspark

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/21990
  
    **[Test build #96824 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/96824/testReport)** for PR 21990 at commit [`8a26728`](https://github.com/apache/spark/commit/8a2672806c6513692435bef113935f2e6f67a910).
     * This patch **fails to build**.
     * This patch merges cleanly.
     * This patch adds no public classes.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21990: [SPARK-25003][PYSPARK] Use SessionExtensions in Pyspark

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/21990
  
    **[Test build #97510 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/97510/testReport)** for PR 21990 at commit [`3629c78`](https://github.com/apache/spark/commit/3629c78118eb77589ae1126e877f0fd2b57441ce).


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21990: [SPARK-25003][PYSPARK] Use SessionExtensions in Pyspark

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21990
  
    Can one of the admins verify this patch?


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21990: [SPARK-25003][PYSPARK] Use SessionExtensions in Pyspark

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21990
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/94149/
    Test FAILed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21990: [SPARK-25003][PYSPARK] Use SessionExtensions in Pyspark

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/21990
  
    **[Test build #96827 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/96827/testReport)** for PR 21990 at commit [`cf7cf75`](https://github.com/apache/spark/commit/cf7cf75efc616151d75e1da10630731fa98e117e).


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21990: [SPARK-25003][PYSPARK] Use SessionExtensions in Pyspark

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/21990
  
    **[Test build #96825 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/96825/testReport)** for PR 21990 at commit [`27c9d28`](https://github.com/apache/spark/commit/27c9d28430d5c0cc5404947d9ec690d53a626c2d).


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21990: [SPARK-25003][PYSPARK] Use SessionExtensions in Pyspark

Posted by holdenk <gi...@git.apache.org>.
Github user holdenk commented on the issue:

    https://github.com/apache/spark/pull/21990
  
    I'm +1 on switching to the builder and not using the private interface.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21990: [SPARK-25003][PYSPARK] Use SessionExtensions in Pyspark

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21990
  
    Merged build finished. Test FAILed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #21990: [SPARK-25003][PYSPARK] Use SessionExtensions in P...

Posted by RussellSpitzer <gi...@git.apache.org>.
Github user RussellSpitzer commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21990#discussion_r210909196
  
    --- Diff: python/pyspark/sql/tests.py ---
    @@ -3563,6 +3563,51 @@ def test_query_execution_listener_on_collect_with_arrow(self):
                     "The callback from the query execution listener should be called after 'toPandas'")
     
     
    +class SparkExtensionsTest(unittest.TestCase, SQLTestUtils):
    +    # These tests are separate because it uses 'spark.sql.extensions' which is
    +    # static and immutable. This can't be set or unset, for example, via `spark.conf`.
    +
    +    @classmethod
    +    def setUpClass(cls):
    +        import glob
    +        from pyspark.find_spark_home import _find_spark_home
    +
    +        SPARK_HOME = _find_spark_home()
    +        filename_pattern = (
    +            "sql/core/target/scala-*/test-classes/org/apache/spark/sql/"
    +            "SparkSessionExtensionSuite.class")
    +        if not glob.glob(os.path.join(SPARK_HOME, filename_pattern)):
    +            raise unittest.SkipTest(
    +                "'org.apache.spark.sql.SparkSessionExtensionSuite.' is not "
    +                "available. Will skip the related tests.")
    +
    +        # Note that 'spark.sql.extensions' is a static immutable configuration.
    +        cls.spark = SparkSession.builder \
    +            .master("local[4]") \
    +            .appName(cls.__name__) \
    +            .config(
    +                "spark.sql.extensions",
    +                "org.apache.spark.sql.MyExtensions") \
    +            .getOrCreate()
    +
    +    @classmethod
    +    def tearDownClass(cls):
    +        cls.spark.stop()
    +
    +    def tearDown(self):
    +        self.spark._jvm.OnSuccessCall.clear()
    --- End diff --
    
    Sounds good to me, I'll take that out.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #21990: [SPARK-25003][PYSPARK] Use SessionExtensions in P...

Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21990#discussion_r210790531
  
    --- Diff: python/pyspark/sql/tests.py ---
    @@ -3563,6 +3563,51 @@ def test_query_execution_listener_on_collect_with_arrow(self):
                     "The callback from the query execution listener should be called after 'toPandas'")
     
     
    +class SparkExtensionsTest(unittest.TestCase, SQLTestUtils):
    +    # These tests are separate because it uses 'spark.sql.extensions' which is
    +    # static and immutable. This can't be set or unset, for example, via `spark.conf`.
    +
    +    @classmethod
    +    def setUpClass(cls):
    +        import glob
    +        from pyspark.find_spark_home import _find_spark_home
    +
    +        SPARK_HOME = _find_spark_home()
    +        filename_pattern = (
    +            "sql/core/target/scala-*/test-classes/org/apache/spark/sql/"
    +            "SparkSessionExtensionSuite.class")
    +        if not glob.glob(os.path.join(SPARK_HOME, filename_pattern)):
    +            raise unittest.SkipTest(
    +                "'org.apache.spark.sql.SparkSessionExtensionSuite.' is not "
    +                "available. Will skip the related tests.")
    +
    +        # Note that 'spark.sql.extensions' is a static immutable configuration.
    +        cls.spark = SparkSession.builder \
    +            .master("local[4]") \
    +            .appName(cls.__name__) \
    +            .config(
    +                "spark.sql.extensions",
    +                "org.apache.spark.sql.MyExtensions") \
    --- End diff --
    
    @RussellSpitzer, I think you should push the `MyExtensions` Scala-side code too.
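    
    For reference, a minimal sketch of the shape such a configurator takes (this is only illustrative; the actual `MyExtensions` referenced by the test lives in `SparkSessionExtensionSuite.scala` on the Scala side and injects its own rules):
    
    ```scala
    import org.apache.spark.sql.SparkSessionExtensions
    import org.apache.spark.sql.catalyst.plans.logical.LogicalPlan
    import org.apache.spark.sql.catalyst.rules.Rule
    
    // Illustrative configurator in the form spark.sql.extensions expects:
    // a class implementing SparkSessionExtensions => Unit.
    class MyExtensions extends (SparkSessionExtensions => Unit) {
      override def apply(extensions: SparkSessionExtensions): Unit = {
        // Inject a no-op optimizer rule so a test can assert it was registered.
        extensions.injectOptimizerRule { _ =>
          new Rule[LogicalPlan] {
            override def apply(plan: LogicalPlan): LogicalPlan = plan
          }
        }
      }
    }
    ```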


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #21990: [SPARK-25003][PYSPARK] Use SessionExtensions in P...

Posted by RussellSpitzer <gi...@git.apache.org>.
Github user RussellSpitzer commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21990#discussion_r225608605
  
    --- Diff: python/pyspark/sql/session.py ---
    @@ -219,6 +219,7 @@ def __init__(self, sparkContext, jsparkSession=None):
                     jsparkSession = self._jvm.SparkSession.getDefaultSession().get()
                 else:
                     jsparkSession = self._jvm.SparkSession(self._jsc.sc())
    +
    --- End diff --
    
    I'm addicted to whitespace apparently


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21990: [SPARK-25003][PYSPARK] Use SessionExtensions in Pyspark

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21990
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/96257/
    Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #21990: [SPARK-25003][PYSPARK] Use SessionExtensions in P...

Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21990#discussion_r224984751
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/SparkSession.scala ---
    @@ -86,6 +86,7 @@ class SparkSession private(
     
       private[sql] def this(sc: SparkContext) {
    --- End diff --
    
    @RussellSpitzer, mind if I ask you to add a note that this is currently not used in the Scala code base, only by PySpark and by Scala-side tests?


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21990: [SPARK-25003][PYSPARK] Use SessionExtensions in Pyspark

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21990
  
    Merged build finished. Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21990: [SPARK-25003][PYSPARK] Use SessionExtensions in Pyspark

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/21990
  
    **[Test build #94975 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/94975/testReport)** for PR 21990 at commit [`21ff627`](https://github.com/apache/spark/commit/21ff6273a7fcf54c2f40abde951d4aab89f96a02).


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21990: [SPARK-25003][PYSPARK] Use SessionExtensions in Pyspark

Posted by RussellSpitzer <gi...@git.apache.org>.
Github user RussellSpitzer commented on the issue:

    https://github.com/apache/spark/pull/21990
  
    What I wanted was to just call the Scala methods instead of having half the code in Scala and half in Python, but we create the JVM in the SparkContext creation code, so I don't think that ends up being a good approach. We could just translate the rest of `getOrCreate` into Python, but then every time the Scala code is patched it would need a matching Python change as well. 


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21990: [SPARK-25003][PYSPARK] Use SessionExtensions in Pyspark

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/21990
  
    **[Test build #96825 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/96825/testReport)** for PR 21990 at commit [`27c9d28`](https://github.com/apache/spark/commit/27c9d28430d5c0cc5404947d9ec690d53a626c2d).
     * This patch **fails to build**.
     * This patch merges cleanly.
     * This patch adds no public classes.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21990: [SPARK-25003][PYSPARK] Use SessionExtensions in Pyspark

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/21990
  
    **[Test build #96199 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/96199/testReport)** for PR 21990 at commit [`def4f3e`](https://github.com/apache/spark/commit/def4f3e91b38e7d63c03fc7206489fd01220e878).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds the following public classes _(experimental)_:
      * `class SparkExtensionsTest(unittest.TestCase, SQLTestUtils):`


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21990: [SPARK-25003][PYSPARK] Use SessionExtensions in Pyspark

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21990
  
    Merged build finished. Test FAILed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #21990: [SPARK-25003][PYSPARK] Use SessionExtensions in P...

Posted by RussellSpitzer <gi...@git.apache.org>.
Github user RussellSpitzer commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21990#discussion_r225992993
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/SparkSession.scala ---
    @@ -1136,4 +1121,27 @@ object SparkSession extends Logging {
           SparkSession.clearDefaultSession()
         }
       }
    +
    +  /**
    +   * Initialize extensions if the user has defined a configurator class in their SparkConf.
    +   * This class will be applied to the extensions passed into this function.
    +   */
    +  private[sql] def applyExtensionsFromConf(conf: SparkConf, extensions: SparkSessionExtensions) {
    --- End diff --
    
    Updated with replacement then :)


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21990: [SPARK-25003][PYSPARK] Use SessionExtensions in Pyspark

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/21990
  
    **[Test build #94866 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/94866/testReport)** for PR 21990 at commit [`7ba70b5`](https://github.com/apache/spark/commit/7ba70b524f9779529142f6c70b04610b5b068a05).


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #21990: [SPARK-25003][PYSPARK] Use SessionExtensions in P...

Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21990#discussion_r221436261
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/SparkSessionExtensions.scala ---
    @@ -169,3 +178,29 @@ class SparkSessionExtensions {
         parserBuilders += builder
       }
     }
    +
    +object SparkSessionExtensions extends Logging {
    +
    +  /**
    +   * Initialize extensions if the user has defined a configurator class in their SparkConf.
    +   * This class will be applied to the extensions passed into this function.
    +   */
    +  private[sql] def applyExtensionsFromConf(conf: SparkConf, extensions: SparkSessionExtensions) {
    --- End diff --
    
    Can you just put this in the `SparkSession` object and make it `private[spark]`?


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21990: [SPARK-25003][PYSPARK] Use SessionExtensions in Pyspark

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/21990
  
    **[Test build #94151 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/94151/testReport)** for PR 21990 at commit [`84c2513`](https://github.com/apache/spark/commit/84c2513a2cb7a50ba01cdc0d49e2ad7f9583799b).


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org