Posted to reviews@spark.apache.org by dongjoon-hyun <gi...@git.apache.org> on 2016/05/03 03:50:34 UTC

[GitHub] spark pull request: [SPARK-15084][PYTHON][SQL] Use builder pattern...

GitHub user dongjoon-hyun opened a pull request:

    https://github.com/apache/spark/pull/12860

    [SPARK-15084][PYTHON][SQL] Use builder pattern to create SparkSession in PySpark.

    ## What changes were proposed in this pull request?
    
    This is a Python port of the corresponding Scala builder pattern code. `sql.py` is modified as a target example case.
    
    ## How was this patch tested?
    
    Manual.


You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/dongjoon-hyun/spark SPARK-15084

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/12860.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #12860
    
----
commit 2b55814bd077ae6acc1ea9ee12a1117707b8f2af
Author: Dongjoon Hyun <do...@apache.org>
Date:   2016-05-03T03:46:00Z

    [SPARK-15084][PYSPARK] Use builder pattern to create SparkSession in PySpark.
    
    This is a port of corresponding Scala builder pattern code.
    `sql.py` is modified as a target example case.

----


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-15084][PYTHON][SQL] Use builder pattern...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/12860#issuecomment-216435366
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/57593/
    Test FAILed.




[GitHub] spark pull request: [SPARK-15084][PYTHON][SQL] Use builder pattern...

Posted by dongjoon-hyun <gi...@git.apache.org>.
Github user dongjoon-hyun commented on the pull request:

    https://github.com/apache/spark/pull/12860#issuecomment-216724708
  
    Thank you, @andrewor14 !




[GitHub] spark pull request: [SPARK-15084][PYTHON][SQL] Use builder pattern...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/12860#issuecomment-216431710
  
    **[Test build #57592 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/57592/consoleFull)** for PR 12860 at commit [`07a926b`](https://github.com/apache/spark/commit/07a926bca1c17cf99acd1d4ccc1e89a404bde17e).




[GitHub] spark pull request: [SPARK-15084][PYTHON][SQL] Use builder pattern...

Posted by davies <gi...@git.apache.org>.
Github user davies commented on a diff in the pull request:

    https://github.com/apache/spark/pull/12860#discussion_r61929305
  
    --- Diff: python/pyspark/sql/session.py ---
    @@ -445,6 +452,86 @@ def read(self):
             """
             return DataFrameReader(self._wrapped)
     
    +    @classmethod
    +    @since(2.0)
    +    def builder(cls):
    +        """Returns a new :class:`SparkSession.Builder` for constructing a :class:`SparkSession`.
    +        """
    +        return SparkSession.Builder()
    +
    +    class Builder(object):
    +        """Builder for :class:`SparkSession`.
    +        """
    +
    +        _lock = RLock()
    --- End diff --
    
    Since we create a Builder() every time, it does not need to be thread safe.
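
The point above can be illustrated with a minimal sketch. It assumes each builder keeps its options in an instance attribute; `SketchBuilder`, `make_builder` and `collect_options` are hypothetical names for illustration, not PySpark's API. When every caller gets a fresh builder, no two threads write to the same dictionary, so no lock is needed around `config`.

```python
import threading


class SketchBuilder(object):
    """Hypothetical stand-in for a session builder (not the PySpark class)."""

    def __init__(self):
        # Per-instance state: nothing here is shared between builders.
        self._options = {}

    def config(self, key, value):
        # No lock: only the thread that owns this builder mutates it.
        self._options[key] = str(value)
        return self


def make_builder():
    # Assumption: a new builder is returned on every call.
    return SketchBuilder()


def collect_options(n_threads=4):
    """Run one builder per thread and return each thread's options."""
    results = [None] * n_threads  # each thread writes only its own slot

    def worker(i):
        b = make_builder().config("spark.app.name", "app-%d" % i)
        results[i] = dict(b._options)

    threads = [threading.Thread(target=worker, args=(i,)) for i in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results
```

If the options dictionary were instead a class-level attribute shared by all builders, this reasoning would no longer hold and some form of synchronization would be needed again.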




[GitHub] spark pull request: [SPARK-15084][PYTHON][SQL] Use builder pattern...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/12860#issuecomment-216693876
  
    Merged build finished. Test PASSed.




[GitHub] spark pull request: [SPARK-15084][PYTHON][SQL] Use builder pattern...

Posted by davies <gi...@git.apache.org>.
Github user davies commented on a diff in the pull request:

    https://github.com/apache/spark/pull/12860#discussion_r61841998
  
    --- Diff: python/pyspark/sql/session.py ---
    @@ -445,6 +452,77 @@ def read(self):
             """
             return DataFrameReader(self._wrapped)
     
    +    @classmethod
    +    @since(2.0)
    +    def builder(cls):
    +        """Returns a new :class:`SparkSession.Builder` for constructing a :class:`SparkSession`.
    +        """
    +        return SparkSession.Builder()
    +
    +    class Builder(object):
    +        """Builder for :class:`SparkSession`.
    +        """
    +
    +        _lock = RLock()
    +        _options = {}
    +
    +        @since(2.0)
    +        def config(self, key=None, value=None, conf=None):
    +            """Sets a config option. Options set using this method are automatically propagated to
    +            both :class:`SparkConf` and :class:`SparkSession`'s own configuration.
    +
    +            :param key: a key name string for configuration property
    +            :param value: a value for configuration property
    +            :param conf: an instance of :class:`SparkConf`
    +            """
    +            with SparkSession.Builder._lock:
    --- End diff --
    
    self._lock




[GitHub] spark pull request: [SPARK-15084][PYTHON][SQL] Use builder pattern...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/12860#issuecomment-216435351
  
    **[Test build #57593 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/57593/consoleFull)** for PR 12860 at commit [`1fe7fb0`](https://github.com/apache/spark/commit/1fe7fb0af52e6a5f27e66daf0783d22c045c4f07).
     * This patch **fails PySpark unit tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.




[GitHub] spark pull request: [SPARK-15084][PYTHON][SQL] Use builder pattern...

Posted by andrewor14 <gi...@git.apache.org>.
Github user andrewor14 commented on the pull request:

    https://github.com/apache/spark/pull/12860#issuecomment-216713413
  
    Thanks, merging into master and 2.0.




[GitHub] spark pull request: [SPARK-15084][PYTHON][SQL] Use builder pattern...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/12860#issuecomment-216430890
  
    **[Test build #57589 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/57589/consoleFull)** for PR 12860 at commit [`2b55814`](https://github.com/apache/spark/commit/2b55814bd077ae6acc1ea9ee12a1117707b8f2af).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds the following public classes _(experimental)_:
      * `    class Builder(object):`




[GitHub] spark pull request: [SPARK-15084][PYTHON][SQL] Use builder pattern...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/12860#issuecomment-216433356
  
    Merged build finished. Test FAILed.




[GitHub] spark pull request: [SPARK-15084][PYTHON][SQL] Use builder pattern...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/12860#issuecomment-216690794
  
    **[Test build #57684 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/57684/consoleFull)** for PR 12860 at commit [`589cba8`](https://github.com/apache/spark/commit/589cba8e237b034e96c5b23891236ceb998c6f0c).




[GitHub] spark pull request: [SPARK-15084][PYTHON][SQL] Use builder pattern...

Posted by dongjoon-hyun <gi...@git.apache.org>.
Github user dongjoon-hyun commented on a diff in the pull request:

    https://github.com/apache/spark/pull/12860#discussion_r61970538
  
    --- Diff: python/pyspark/sql/session.py ---
    @@ -58,10 +59,16 @@ def toDF(self, schema=None, sampleRatio=None):
     
     
     class SparkSession(object):
    -    """Main entry point for Spark SQL functionality.
    +    r"""The entry point to programming Spark with the Dataset and DataFrame API.
    --- End diff --
    
    Yep. I'll remove that.




[GitHub] spark pull request: [SPARK-15084][PYTHON][SQL] Use builder pattern...

Posted by dongjoon-hyun <gi...@git.apache.org>.
Github user dongjoon-hyun commented on a diff in the pull request:

    https://github.com/apache/spark/pull/12860#discussion_r61968765
  
    --- Diff: python/pyspark/sql/session.py ---
    @@ -58,10 +59,16 @@ def toDF(self, schema=None, sampleRatio=None):
     
     
     class SparkSession(object):
    -    """Main entry point for Spark SQL functionality.
    +    r"""The entry point to programming Spark with the Dataset and DataFrame API.
    --- End diff --
    
    Oh, I added that because Python 3 complains about multiline commands.
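
One plausible reading of this remark, offered as an assumption and not as the author's stated reason, is that the docstring contains doctest commands continued across lines with a trailing backslash. In a plain triple-quoted string the backslash-newline pair is consumed when the string is compiled, so the continuation never reaches doctest; the `r` prefix keeps it. The names below (`continued_sum`, `run_docstring`, `PLAIN`, `RAW`) are hypothetical and only illustrate the string behaviour.

```python
import doctest


def continued_sum():
    r"""Doctest whose command spans two lines via a trailing backslash.

    >>> total = 1 + \
    ...     2
    >>> total
    3
    """


def run_docstring(func):
    """Run the doctest examples in func's docstring; return (failed, attempted)."""
    parser = doctest.DocTestParser()
    test = parser.get_doctest(func.__doc__, {}, func.__name__, None, 0)
    runner = doctest.DocTestRunner(verbose=False)
    result = runner.run(test)
    return result.failed, result.attempted


# In a plain string the backslash-newline pair is dropped at compile time;
# in a raw string both characters are kept.
PLAIN = """a \
b"""
RAW = r"""a \
b"""
```

Without the `r` prefix, the same effect can be had by doubling the backslash inside the docstring.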




[GitHub] spark pull request: [SPARK-15084][PYTHON][SQL] Use builder pattern...

Posted by dongjoon-hyun <gi...@git.apache.org>.
Github user dongjoon-hyun commented on the pull request:

    https://github.com/apache/spark/pull/12860#issuecomment-216706125
  
    Hi, @davies and @andrewor14 . Now, it's updated.
    
    - Add `stop` to `SparkSession`.
    - Update the builder pattern to match the Scala version.
      - One builder per SparkSession.
    - Clean up docs.
    
    As for calling Scala's `.getOrCreate`, I prefer to keep the Python builder similar to the Scala one. Once the Scala builder is updated, I hope to update this in the same manner easily.
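
How "get or create" interacts with `stop` can be pictured with the sketch below. Everything in it is hypothetical (`SketchSession`, `_active`, `get_or_create`); it is not how PySpark implements these methods, only a small model of the lifecycle under the assumption that a single live session is cached and `stop` clears it.

```python
import threading


class SketchSession(object):
    """Hypothetical session with get-or-create semantics (not PySpark)."""

    _lock = threading.RLock()
    _active = None  # the single live session, if any

    def __init__(self, options):
        self.options = dict(options)

    @classmethod
    def get_or_create(cls, options):
        # Reuse the live session if there is one; otherwise create it.
        with cls._lock:
            if cls._active is None:
                cls._active = cls(options)
            return cls._active

    def stop(self):
        # Forget the live session so the next call creates a fresh one.
        with SketchSession._lock:
            if SketchSession._active is self:
                SketchSession._active = None
```

In this sketch, options passed while a session is already live are ignored; they only take effect after `stop` has been called.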




[GitHub] spark pull request: [SPARK-15084][PYTHON][SQL] Use builder pattern...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/12860#issuecomment-216620533
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/57649/
    Test FAILed.




[GitHub] spark pull request: [SPARK-15084][PYTHON][SQL] Use builder pattern...

Posted by davies <gi...@git.apache.org>.
Github user davies commented on a diff in the pull request:

    https://github.com/apache/spark/pull/12860#discussion_r61841944
  
    --- Diff: python/pyspark/sql/session.py ---
    @@ -445,6 +452,77 @@ def read(self):
             """
             return DataFrameReader(self._wrapped)
     
    +    @classmethod
    +    @since(2.0)
    +    def builder(cls):
    +        """Returns a new :class:`SparkSession.Builder` for constructing a :class:`SparkSession`.
    +        """
    +        return SparkSession.Builder()
    +
    +    class Builder(object):
    +        """Builder for :class:`SparkSession`.
    +        """
    +
    +        _lock = RLock()
    +        _options = {}
    +
    +        @since(2.0)
    +        def config(self, key=None, value=None, conf=None):
    +            """Sets a config option. Options set using this method are automatically propagated to
    +            both :class:`SparkConf` and :class:`SparkSession`'s own configuration.
    +
    +            :param key: a key name string for configuration property
    +            :param value: a value for configuration property
    +            :param conf: an instance of :class:`SparkConf`
    +            """
    +            with SparkSession.Builder._lock:
    +                if conf is None:
    +                    self._options[key] = str(value)
    +                else:
    +                    for (k, v) in conf.getAll():
    +                        self._options[k] = v
    +                return self
    +
    +        @since(2.0)
    +        def master(self, master):
    +            """Sets the Spark master URL to connect to, such as "local" to run locally, "local[4]"
    +            to run locally with 4 cores, or "spark://master:7077" to run on a Spark standalone
    +            cluster.
    +
    +            :param master: a url for spark master
    +            """
    +            return self.config("spark.master", master)
    +
    +        @since(2.0)
    +        def appName(self, name):
    +            """Sets a name for the application, which will be shown in the Spark web UI.
    +
    +            :param name: an application name
    +            """
    +            return self.config("spark.app.name", name)
    +
    +        @since(2.0)
    +        def enableHiveSupport(self):
    +            """Enables Hive support, including connectivity to a persistent Hive metastore, support
    +            for Hive serdes, and Hive user-defined functions.
    +            """
    +            return self.config("spark.sql.catalogImplementation", "hive")
    +
    +        @since(2.0)
    +        def getOrCreate(self):
    +            """Gets an existing :class:`SparkSession` or, if there is no existing one, creates a new
    +            one based on the options set in this builder.
    +            """
    +            with SparkSession.Builder._lock:
    +                from pyspark.conf import SparkConf
    +                from pyspark.context import SparkContext
    +                from pyspark.sql.context import SQLContext
    +                sparkConf = SparkConf()
    +                for key, value in self._options.items():
    +                    sparkConf.set(key, value)
    +                sparkContext = SparkContext.getOrCreate(sparkConf)
    +                return SQLContext.getOrCreate(sparkContext).sparkSession
    +
    --- End diff --
    
    We could create a builder here, then we can use it like this:
    ```
    SparkSession.builder.master().getOrCreate()
    ```
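
A minimal sketch of what this suggestion could look like, assuming the builder is instantiated once in the class body so that it is reachable as an attribute. `AttrSession` and its nested `Builder` are hypothetical names, and `getOrCreate` is simplified here to always construct a new object.

```python
class AttrSession(object):
    """Hypothetical session class (not PySpark's SparkSession)."""

    def __init__(self, options):
        self.options = dict(options)

    class Builder(object):
        def __init__(self):
            self._options = {}

        def master(self, url):
            self._options["spark.master"] = url
            return self  # returning self enables chaining

        def getOrCreate(self):
            # Simplified: no caching, always builds from the options so far.
            return AttrSession(self._options)

    # Created once, at class-definition time, so callers write
    # AttrSession.builder.master(...) with no parentheses after builder.
    builder = Builder()
```

Because `builder` is a single shared object in this sketch, options set by one caller remain visible to the next. That is the trade-off against returning a new builder from a method on every call.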




[GitHub] spark pull request: [SPARK-15084][PYTHON][SQL] Use builder pattern...

Posted by davies <gi...@git.apache.org>.
Github user davies commented on a diff in the pull request:

    https://github.com/apache/spark/pull/12860#discussion_r61929447
  
    --- Diff: python/pyspark/sql/session.py ---
    @@ -445,6 +452,77 @@ def read(self):
             """
             return DataFrameReader(self._wrapped)
     
    +    @classmethod
    +    @since(2.0)
    +    def builder(cls):
    +        """Returns a new :class:`SparkSession.Builder` for constructing a :class:`SparkSession`.
    +        """
    +        return SparkSession.Builder()
    +
    +    class Builder(object):
    +        """Builder for :class:`SparkSession`.
    +        """
    +
    +        _lock = RLock()
    +        _options = {}
    +
    +        @since(2.0)
    +        def config(self, key=None, value=None, conf=None):
    +            """Sets a config option. Options set using this method are automatically propagated to
    +            both :class:`SparkConf` and :class:`SparkSession`'s own configuration.
    +
    +            :param key: a key name string for configuration property
    +            :param value: a value for configuration property
    +            :param conf: an instance of :class:`SparkConf`
    +            """
    +            with SparkSession.Builder._lock:
    +                if conf is None:
    +                    self._options[key] = str(value)
    +                else:
    +                    for (k, v) in conf.getAll():
    +                        self._options[k] = v
    +                return self
    +
    +        @since(2.0)
    +        def master(self, master):
    +            """Sets the Spark master URL to connect to, such as "local" to run locally, "local[4]"
    +            to run locally with 4 cores, or "spark://master:7077" to run on a Spark standalone
    +            cluster.
    +
    +            :param master: a url for spark master
    +            """
    +            return self.config("spark.master", master)
    +
    +        @since(2.0)
    +        def appName(self, name):
    +            """Sets a name for the application, which will be shown in the Spark web UI.
    +
    +            :param name: an application name
    +            """
    +            return self.config("spark.app.name", name)
    +
    +        @since(2.0)
    +        def enableHiveSupport(self):
    +            """Enables Hive support, including connectivity to a persistent Hive metastore, support
    +            for Hive serdes, and Hive user-defined functions.
    +            """
    +            return self.config("spark.sql.catalogImplementation", "hive")
    +
    +        @since(2.0)
    +        def getOrCreate(self):
    +            """Gets an existing :class:`SparkSession` or, if there is no existing one, creates a new
    +            one based on the options set in this builder.
    +            """
    +            with SparkSession.Builder._lock:
    +                from pyspark.conf import SparkConf
    +                from pyspark.context import SparkContext
    +                from pyspark.sql.context import SQLContext
    +                sparkConf = SparkConf()
    +                for key, value in self._options.items():
    +                    sparkConf.set(key, value)
    +                sparkContext = SparkContext.getOrCreate(sparkConf)
    +                return SQLContext.getOrCreate(sparkContext).sparkSession
    +
    --- End diff --
    
    nvm, we also create a Builder every time in Scala.




[GitHub] spark pull request: [SPARK-15084][PYTHON][SQL] Use builder pattern...

Posted by andrewor14 <gi...@git.apache.org>.
Github user andrewor14 commented on a diff in the pull request:

    https://github.com/apache/spark/pull/12860#discussion_r61969146
  
    --- Diff: python/pyspark/sql/session.py ---
    @@ -58,10 +59,16 @@ def toDF(self, schema=None, sampleRatio=None):
     
     
     class SparkSession(object):
    -    """Main entry point for Spark SQL functionality.
    +    r"""The entry point to programming Spark with the Dataset and DataFrame API.
    --- End diff --
    
    aren't there multiline comments everywhere else?




[GitHub] spark pull request: [SPARK-15084][PYTHON][SQL] Use builder pattern...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/12860#issuecomment-216430923
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/57589/
    Test PASSed.




[GitHub] spark pull request: [SPARK-15084][PYTHON][SQL] Use builder pattern...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/12860#issuecomment-216433350
  
    **[Test build #57592 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/57592/consoleFull)** for PR 12860 at commit [`07a926b`](https://github.com/apache/spark/commit/07a926bca1c17cf99acd1d4ccc1e89a404bde17e).
     * This patch **fails PySpark unit tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.




[GitHub] spark pull request: [SPARK-15084][PYTHON][SQL] Use builder pattern...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/12860#issuecomment-216692033
  
    Merged build finished. Test PASSed.




[GitHub] spark pull request: [SPARK-15084][PYTHON][SQL] Use builder pattern...

Posted by dongjoon-hyun <gi...@git.apache.org>.
Github user dongjoon-hyun commented on a diff in the pull request:

    https://github.com/apache/spark/pull/12860#discussion_r61837000
  
    --- Diff: python/pyspark/sql/session.py ---
    @@ -445,6 +446,77 @@ def read(self):
             """
             return DataFrameReader(self._wrapped)
     
    +    @classmethod
    +    @since(2.0)
    --- End diff --
    
    Yep. It's updated.




[GitHub] spark pull request: [SPARK-15084][PYTHON][SQL] Use builder pattern...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/12860#issuecomment-216688891
  
    **[Test build #57683 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/57683/consoleFull)** for PR 12860 at commit [`ac5bc68`](https://github.com/apache/spark/commit/ac5bc68d22942c369cc3a79e5462b1eba54accd3).




[GitHub] spark pull request: [SPARK-15084][PYTHON][SQL] Use builder pattern...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/12860#issuecomment-216620498
  
    **[Test build #57649 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/57649/consoleFull)** for PR 12860 at commit [`6cd7740`](https://github.com/apache/spark/commit/6cd7740da667514eb95070daeec5f783318a05e2).
     * This patch **fails PySpark unit tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.




[GitHub] spark pull request: [SPARK-15084][PYTHON][SQL] Use builder pattern...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/12860#issuecomment-216447683
  
    Merged build finished. Test PASSed.




[GitHub] spark pull request: [SPARK-15084][PYTHON][SQL] Use builder pattern...

Posted by andrewor14 <gi...@git.apache.org>.
Github user andrewor14 commented on a diff in the pull request:

    https://github.com/apache/spark/pull/12860#discussion_r61929215
  
    --- Diff: python/pyspark/sql/session.py ---
    @@ -58,10 +59,16 @@ def toDF(self, schema=None, sampleRatio=None):
     
     
     class SparkSession(object):
    -    """Main entry point for Spark SQL functionality.
    +    r"""The entry point to programming Spark with the Dataset and DataFrame API.
    --- End diff --
    
    Why add this `r` prefix here?
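
As background for this question: in Python, an `r` prefix makes a docstring a raw string, so backslashes are kept literally instead of being processed as escape sequences. The quoted diff does not show whether the new docstring contains any backslashes, so the sketch below only illustrates the general behavior, using hypothetical function names.

```python
def plain_doc():
    """Escapes are processed: a\tb"""


def raw_doc():
    r"""Escapes are kept literally: a\tb"""


# In the plain docstring, \t has become a single tab character.
print(repr(plain_doc.__doc__))
# In the raw docstring, the backslash and the letter t are both preserved.
print(repr(raw_doc.__doc__))
```

If a docstring has no backslashes, the prefix changes nothing, which is presumably why the reviewer asks about it.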




[GitHub] spark pull request: [SPARK-15084][PYTHON][SQL] Use builder pattern...

Posted by dongjoon-hyun <gi...@git.apache.org>.
Github user dongjoon-hyun commented on the pull request:

    https://github.com/apache/spark/pull/12860#issuecomment-216601861
  
    Thank you for the review, @davies. I'll update soon.




[GitHub] spark pull request: [SPARK-15084][PYTHON][SQL] Use builder pattern...

Posted by rxin <gi...@git.apache.org>.
Github user rxin commented on the pull request:

    https://github.com/apache/spark/pull/12860#issuecomment-216429783
  
    cc @davies: can you take a look at the builder API?




[GitHub] spark pull request: [SPARK-15084][PYTHON][SQL] Use builder pattern...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/12860#issuecomment-216447634
  
    **[Test build #57602 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/57602/consoleFull)** for PR 12860 at commit [`356bc85`](https://github.com/apache/spark/commit/356bc859a3d928055118f945c8ec4d3ae0a0da77).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.




[GitHub] spark pull request: [SPARK-15084][PYTHON][SQL] Use builder pattern...

Posted by dongjoon-hyun <gi...@git.apache.org>.
Github user dongjoon-hyun commented on the pull request:

    https://github.com/apache/spark/pull/12860#issuecomment-216614939
  
    @davies, I addressed two of the comments, but I'm not sure about the first one.
    If we want to change that, we need to change `ScalaSession.Builder` first.




[GitHub] spark pull request: [SPARK-15084][PYTHON][SQL] Use builder pattern...

Posted by dongjoon-hyun <gi...@git.apache.org>.
Github user dongjoon-hyun commented on the pull request:

    https://github.com/apache/spark/pull/12860#issuecomment-216666678
  
    Great! Thank you, @rxin 




[GitHub] spark pull request: [SPARK-15084][PYTHON][SQL] Use builder pattern...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/12860#issuecomment-216433357
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/57592/
    Test FAILed.




[GitHub] spark pull request: [SPARK-15084][PYTHON][SQL] Use builder pattern...

Posted by rxin <gi...@git.apache.org>.
Github user rxin commented on a diff in the pull request:

    https://github.com/apache/spark/pull/12860#discussion_r61836206
  
    --- Diff: python/pyspark/sql/session.py ---
    @@ -445,6 +446,77 @@ def read(self):
             """
             return DataFrameReader(self._wrapped)
     
    +    @classmethod
    +    @since(2.0)
    --- End diff --
    
    We should update SparkSession's own doc to show how to create one.




[GitHub] spark pull request: [SPARK-15084][PYTHON][SQL] Use builder pattern...

Posted by dongjoon-hyun <gi...@git.apache.org>.
Github user dongjoon-hyun commented on the pull request:

    https://github.com/apache/spark/pull/12860#issuecomment-216623388
  
    Thank you, @davies and @andrewor14.
    Yes, it's still evolving! No problem. Once #12873 is merged, I'll update this accordingly.




[GitHub] spark pull request: [SPARK-15084][PYTHON][SQL] Use builder pattern...

Posted by rxin <gi...@git.apache.org>.
Github user rxin commented on the pull request:

    https://github.com/apache/spark/pull/12860#issuecomment-216663715
  
    It's been merged!





[GitHub] spark pull request: [SPARK-15084][PYTHON][SQL] Use builder pattern...

Posted by davies <gi...@git.apache.org>.
Github user davies commented on a diff in the pull request:

    https://github.com/apache/spark/pull/12860#discussion_r61842033
  
    --- Diff: python/pyspark/sql/session.py ---
    @@ -445,6 +452,77 @@ def read(self):
             """
             return DataFrameReader(self._wrapped)
     
    +    @classmethod
    +    @since(2.0)
    +    def builder(cls):
    +        """Returns a new :class:`SparkSession.Builder` for constructing a :class:`SparkSession`.
    +        """
    +        return SparkSession.Builder()
    +
    +    class Builder(object):
    +        """Builder for :class:`SparkSession`.
    +        """
    +
    +        _lock = RLock()
    +        _options = {}
    +
    +        @since(2.0)
    +        def config(self, key=None, value=None, conf=None):
    +            """Sets a config option. Options set using this method are automatically propagated to
    +            both :class:`SparkConf` and :class:`SparkSession`'s own configuration.
    +
    +            :param key: a key name string for configuration property
    +            :param value: a value for configuration property
    +            :param conf: an instance of :class:`SparkConf`
    --- End diff --
    
    Could you add an example here?
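
To make the request concrete, here is a minimal sketch of the two call forms that the `config` signature in the diff allows: passing an existing configuration through `conf`, and passing a (key, value) pair. `SketchConf` and `SketchBuilder` are hypothetical stand-ins written for illustration, not PySpark classes, so the block runs without Spark. Unlike the quoted diff, which declares `_options` at class level, the sketch keeps options per instance; that is a choice made for the sketch, not something the PR states.

```python
class SketchConf(object):
    """Hypothetical stand-in for SparkConf; only set() and getAll() are modeled."""

    def __init__(self):
        self._pairs = {}

    def set(self, key, value):
        self._pairs[key] = str(value)
        return self

    def getAll(self):
        return list(self._pairs.items())


class SketchBuilder(object):
    """Hypothetical stand-in for the Builder in the diff above."""

    def __init__(self):
        # Kept per instance here so that separate builders do not share options.
        self._options = {}

    def config(self, key=None, value=None, conf=None):
        if conf is None:
            # (key, value) form: the value is stored as a string.
            self._options[key] = str(value)
        else:
            # conf form: copy every pair from the existing configuration.
            for k, v in conf.getAll():
                self._options[k] = v
        return self  # returning self is what allows chaining


existing = SketchConf().set("spark.app.name", "demo")
builder = (SketchBuilder()
           .config(conf=existing)
           .config("spark.some.config.option", 4))
print(sorted(builder._options.items()))
```

Because `config` returns the builder itself, both forms can be chained in a single expression.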




[GitHub] spark pull request: [SPARK-15084][PYTHON][SQL] Use builder pattern...

Posted by andrewor14 <gi...@git.apache.org>.
Github user andrewor14 commented on a diff in the pull request:

    https://github.com/apache/spark/pull/12860#discussion_r61930511
  
    --- Diff: python/pyspark/sql/session.py ---
    @@ -445,6 +452,86 @@ def read(self):
             """
             return DataFrameReader(self._wrapped)
     
    +    @classmethod
    +    @since(2.0)
    +    def builder(cls):
    +        """Returns a new :class:`SparkSession.Builder` for constructing a :class:`SparkSession`.
    +        """
    +        return SparkSession.Builder()
    +
    +    class Builder(object):
    +        """Builder for :class:`SparkSession`.
    +        """
    +
    +        _lock = RLock()
    +        _options = {}
    +
    +        @since(2.0)
    +        def config(self, key=None, value=None, conf=None):
    +            """Sets a config option. Options set using this method are automatically propagated to
    +            both :class:`SparkConf` and :class:`SparkSession`'s own configuration.
    +
    +            For an existing SparkConf, use `conf` parameter.
    +            >>> from pyspark.conf import SparkConf
    +            >>> SparkSession.builder().config(conf=SparkConf())
    +            <pyspark.sql.session.Builder object at ...>
    +
    +            For a (key, value) pair, you can omit parameter names.
    +            >>> SparkSession.builder().config("spark.some.config.option", "some-value")
    +            <pyspark.sql.session.Builder object at ...>
    +
    +            :param key: a key name string for configuration property
    +            :param value: a value for configuration property
    +            :param conf: an instance of :class:`SparkConf`
    +            """
    +            with self._lock:
    +                if conf is None:
    +                    self._options[key] = str(value)
    +                else:
    +                    for (k, v) in conf.getAll():
    +                        self._options[k] = v
    +                return self
    +
    +        @since(2.0)
    +        def master(self, master):
    +            """Sets the Spark master URL to connect to, such as "local" to run locally, "local[4]"
    +            to run locally with 4 cores, or "spark://master:7077" to run on a Spark standalone
    +            cluster.
    +
    +            :param master: a url for spark master
    +            """
    +            return self.config("spark.master", master)
    +
    +        @since(2.0)
    +        def appName(self, name):
    +            """Sets a name for the application, which will be shown in the Spark web UI.
    +
    +            :param name: an application name
    +            """
    +            return self.config("spark.app.name", name)
    +
    +        @since(2.0)
    +        def enableHiveSupport(self):
    +            """Enables Hive support, including connectivity to a persistent Hive metastore, support
    +            for Hive serdes, and Hive user-defined functions.
    +            """
    +            return self.config("spark.sql.catalogImplementation", "hive")
    +
    +        @since(2.0)
    +        def getOrCreate(self):
    +            """Gets an existing :class:`SparkSession` or, if there is no existing one, creates a new
    +            one based on the options set in this builder.
    +            """
    +            with self._lock:
    +                from pyspark.conf import SparkConf
    +                from pyspark.context import SparkContext
    +                from pyspark.sql.context import SQLContext
    +                sparkConf = SparkConf()
    +                for key, value in self._options.items():
    +                    sparkConf.set(key, value)
    +                sparkContext = SparkContext.getOrCreate(sparkConf)
    +                return SQLContext.getOrCreate(sparkContext).sparkSession
    --- End diff --
    
    If we just call Scala's `getOrCreate` here, we won't need to fix this in the future.




[GitHub] spark pull request: [SPARK-15084][PYTHON][SQL] Use builder pattern...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/12860#issuecomment-216447685
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/57602/
    Test PASSed.




[GitHub] spark pull request: [SPARK-15084][PYTHON][SQL] Use builder pattern...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/12860#issuecomment-216434348
  
    **[Test build #57593 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/57593/consoleFull)** for PR 12860 at commit [`1fe7fb0`](https://github.com/apache/spark/commit/1fe7fb0af52e6a5f27e66daf0783d22c045c4f07).




[GitHub] spark pull request: [SPARK-15084][PYTHON][SQL] Use builder pattern...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/12860#issuecomment-216429563
  
    **[Test build #57589 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/57589/consoleFull)** for PR 12860 at commit [`2b55814`](https://github.com/apache/spark/commit/2b55814bd077ae6acc1ea9ee12a1117707b8f2af).




[GitHub] spark pull request: [SPARK-15084][PYTHON][SQL] Use builder pattern...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/12860#issuecomment-216445508
  
    **[Test build #57602 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/57602/consoleFull)** for PR 12860 at commit [`356bc85`](https://github.com/apache/spark/commit/356bc859a3d928055118f945c8ec4d3ae0a0da77).




[GitHub] spark pull request: [SPARK-15084][PYTHON][SQL] Use builder pattern...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/12860#issuecomment-216691977
  
    **[Test build #57683 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/57683/consoleFull)** for PR 12860 at commit [`ac5bc68`](https://github.com/apache/spark/commit/ac5bc68d22942c369cc3a79e5462b1eba54accd3).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds the following public classes _(experimental)_:
      * `    class Builder(object):`




[GitHub] spark pull request: [SPARK-15084][PYTHON][SQL] Use builder pattern...

Posted by davies <gi...@git.apache.org>.
Github user davies commented on a diff in the pull request:

    https://github.com/apache/spark/pull/12860#discussion_r61928761
  
    --- Diff: python/pyspark/sql/session.py ---
    @@ -445,6 +452,86 @@ def read(self):
             """
             return DataFrameReader(self._wrapped)
     
    +    @classmethod
    +    @since(2.0)
    +    def builder(cls):
    +        """Returns a new :class:`SparkSession.Builder` for constructing a :class:`SparkSession`.
    +        """
    +        return SparkSession.Builder()
    +
    +    class Builder(object):
    +        """Builder for :class:`SparkSession`.
    +        """
    +
    +        _lock = RLock()
    +        _options = {}
    +
    +        @since(2.0)
    +        def config(self, key=None, value=None, conf=None):
    +            """Sets a config option. Options set using this method are automatically propagated to
    +            both :class:`SparkConf` and :class:`SparkSession`'s own configuration.
    +
    +            For an existing SparkConf, use `conf` parameter.
    +            >>> from pyspark.conf import SparkConf
    +            >>> SparkSession.builder().config(conf=SparkConf())
    +            <pyspark.sql.session.Builder object at ...>
    +
    +            For a (key, value) pair, you can omit parameter names.
    +            >>> SparkSession.builder().config("spark.some.config.option", "some-value")
    +            <pyspark.sql.session.Builder object at ...>
    +
    +            :param key: a key name string for configuration property
    +            :param value: a value for configuration property
    +            :param conf: an instance of :class:`SparkConf`
    +            """
    +            with self._lock:
    +                if conf is None:
    +                    self._options[key] = str(value)
    +                else:
    +                    for (k, v) in conf.getAll():
    +                        self._options[k] = v
    +                return self
    +
    +        @since(2.0)
    +        def master(self, master):
    +            """Sets the Spark master URL to connect to, such as "local" to run locally, "local[4]"
    +            to run locally with 4 cores, or "spark://master:7077" to run on a Spark standalone
    +            cluster.
    +
    +            :param master: a url for spark master
    +            """
    +            return self.config("spark.master", master)
    +
    +        @since(2.0)
    +        def appName(self, name):
    +            """Sets a name for the application, which will be shown in the Spark web UI.
    +
    +            :param name: an application name
    +            """
    +            return self.config("spark.app.name", name)
    +
    +        @since(2.0)
    +        def enableHiveSupport(self):
    +            """Enables Hive support, including connectivity to a persistent Hive metastore, support
    +            for Hive serdes, and Hive user-defined functions.
    +            """
    +            return self.config("spark.sql.catalogImplementation", "hive")
    +
    +        @since(2.0)
    +        def getOrCreate(self):
    +            """Gets an existing :class:`SparkSession` or, if there is no existing one, creates a new
    +            one based on the options set in this builder.
    +            """
    +            with self._lock:
    +                from pyspark.conf import SparkConf
    +                from pyspark.context import SparkContext
    +                from pyspark.sql.context import SQLContext
    +                sparkConf = SparkConf()
    +                for key, value in self._options.items():
    +                    sparkConf.set(key, value)
    +                sparkContext = SparkContext.getOrCreate(sparkConf)
    +                return SQLContext.getOrCreate(sparkContext).sparkSession
    --- End diff --
    
    It's weird to use SQLContext to create a SparkSession. Can't we create a SparkSession directly?
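
One way to read this suggestion is that the session class could own the get-or-create logic itself, rather than reaching the session through another context object. The sketch below shows that general pattern with a lock-guarded, class-level instance. `SketchSession` and `get_or_create` are hypothetical names; this is not PySpark's API and not necessarily how the PR was later changed.

```python
from threading import RLock


class SketchSession(object):
    """Hypothetical stand-in for SparkSession; it only records its options."""

    _lock = RLock()
    _active = None  # the one shared session, once it has been created

    def __init__(self, options):
        self.options = dict(options)

    @classmethod
    def get_or_create(cls, options):
        with cls._lock:
            if cls._active is None:
                # Construct the session directly; no intermediate context object.
                cls._active = cls(options)
            return cls._active


first = SketchSession.get_or_create({"spark.app.name": "demo"})
second = SketchSession.get_or_create({"spark.app.name": "ignored"})
print(first is second)
print(second.options["spark.app.name"])
```

In this sketch the second call returns the existing session, so its options are not applied; whether a real implementation should merge or ignore later options is a separate design question.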




[GitHub] spark pull request: [SPARK-15084][PYTHON][SQL] Use builder pattern...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/12860#issuecomment-216693795
  
    **[Test build #57684 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/57684/consoleFull)** for PR 12860 at commit [`589cba8`](https://github.com/apache/spark/commit/589cba8e237b034e96c5b23891236ceb998c6f0c).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.




[GitHub] spark pull request: [SPARK-15084][PYTHON][SQL] Use builder pattern...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/12860#issuecomment-216430921
  
    Merged build finished. Test PASSed.




[GitHub] spark pull request: [SPARK-15084][PYTHON][SQL] Use builder pattern...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/12860#issuecomment-216615278
  
    **[Test build #57649 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/57649/consoleFull)** for PR 12860 at commit [`6cd7740`](https://github.com/apache/spark/commit/6cd7740da667514eb95070daeec5f783318a05e2).




[GitHub] spark pull request: [SPARK-15084][PYTHON][SQL] Use builder pattern...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/12860#issuecomment-216620530
  
    Merged build finished. Test FAILed.




[GitHub] spark pull request: [SPARK-15084][PYTHON][SQL] Use builder pattern...

Posted by asfgit <gi...@git.apache.org>.
Github user asfgit closed the pull request at:

    https://github.com/apache/spark/pull/12860




[GitHub] spark pull request: [SPARK-15084][PYTHON][SQL] Use builder pattern...

Posted by davies <gi...@git.apache.org>.
Github user davies commented on a diff in the pull request:

    https://github.com/apache/spark/pull/12860#discussion_r61929340
  
    --- Diff: python/pyspark/sql/session.py ---
    @@ -445,6 +452,86 @@ def read(self):
             """
             return DataFrameReader(self._wrapped)
     
    +    @classmethod
    +    @since(2.0)
    +    def builder(cls):
    +        """Returns a new :class:`SparkSession.Builder` for constructing a :class:`SparkSession`.
    +        """
    +        return SparkSession.Builder()
    +
    +    class Builder(object):
    +        """Builder for :class:`SparkSession`.
    +        """
    +
    +        _lock = RLock()
    +        _options = {}
    +
    +        @since(2.0)
    +        def config(self, key=None, value=None, conf=None):
    +            """Sets a config option. Options set using this method are automatically propagated to
    +            both :class:`SparkConf` and :class:`SparkSession`'s own configuration.
    +
    +            For an existing SparkConf, use `conf` parameter.
    +            >>> from pyspark.conf import SparkConf
    +            >>> SparkSession.builder().config(conf=SparkConf())
    +            <pyspark.sql.session.Builder object at ...>
    +
    +            For a (key, value) pair, you can omit parameter names.
    +            >>> SparkSession.builder().config("spark.some.config.option", "some-value")
    +            <pyspark.sql.session.Builder object at ...>
    +
    +            :param key: a key name string for configuration property
    +            :param value: a value for configuration property
    +            :param conf: an instance of :class:`SparkConf`
    +            """
    +            with self._lock:
    +                if conf is None:
    +                    self._options[key] = str(value)
    +                else:
    +                    for (k, v) in conf.getAll():
    +                        self._options[k] = v
    +                return self
    +
    +        @since(2.0)
    +        def master(self, master):
    +            """Sets the Spark master URL to connect to, such as "local" to run locally, "local[4]"
    +            to run locally with 4 cores, or "spark://master:7077" to run on a Spark standalone
    +            cluster.
    +
    +            :param master: a url for spark master
    +            """
    +            return self.config("spark.master", master)
    +
    +        @since(2.0)
    +        def appName(self, name):
    +            """Sets a name for the application, which will be shown in the Spark web UI.
    +
    +            :param name: an application name
    +            """
    +            return self.config("spark.app.name", name)
    +
    +        @since(2.0)
    +        def enableHiveSupport(self):
    +            """Enables Hive support, including connectivity to a persistent Hive metastore, support
    +            for Hive serdes, and Hive user-defined functions.
    +            """
    +            return self.config("spark.sql.catalogImplementation", "hive")
    +
    +        @since(2.0)
    +        def getOrCreate(self):
    +            """Gets an existing :class:`SparkSession` or, if there is no existing one, creates a new
    +            one based on the options set in this builder.
    +            """
    +            with self._lock:
    +                from pyspark.conf import SparkConf
    +                from pyspark.context import SparkContext
    +                from pyspark.sql.context import SQLContext
    +                sparkConf = SparkConf()
    +                for key, value in self._options.items():
    +                    sparkConf.set(key, value)
    +                sparkContext = SparkContext.getOrCreate(sparkConf)
    +                return SQLContext.getOrCreate(sparkContext).sparkSession
    --- End diff --
    
    We also use that in Scala, so it's OK for now.
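
The builder in the diff above can be tried without a Spark installation. The following is a minimal sketch, not the PR's code: `MiniBuilder` is a hypothetical stand-in that keeps the same `config`/`master`/`appName` chaining, assumes `conf` is a plain iterable of (key, value) pairs rather than a `SparkConf`, and has `getOrCreate` return a copy of the collected options instead of a session. It also shows one consequence of declaring `_options` as a class attribute, as the diff does: every builder instance shares the same dictionary.

```python
from threading import RLock


class MiniBuilder(object):
    """Hypothetical stand-in for the Builder in the diff; no Spark required."""

    _lock = RLock()
    _options = {}  # class attribute, as in the diff: shared by all instances

    def config(self, key=None, value=None, conf=None):
        with self._lock:
            if conf is None:
                self._options[key] = str(value)
            else:
                # Assumption: conf is an iterable of (key, value) pairs here,
                # standing in for SparkConf.getAll() in the diff.
                for k, v in conf:
                    self._options[k] = v
            return self  # returning self is what allows method chaining

    def master(self, master):
        return self.config("spark.master", master)

    def appName(self, name):
        return self.config("spark.app.name", name)

    def getOrCreate(self):
        # The diff builds a SparkConf and SparkContext at this point;
        # this sketch only returns a copy of the collected options.
        with self._lock:
            return dict(self._options)


first = MiniBuilder().master("local[4]").appName("demo")
second = MiniBuilder()
# second never set anything itself, yet it sees the options set through first.
options = second.getOrCreate()
```

If per-builder isolation were wanted, one option would be to create the dictionary in `__init__` instead; whether that is desirable for the real `SparkSession.Builder` is a design question this sketch does not settle.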




[GitHub] spark pull request: [SPARK-15084][PYTHON][SQL] Use builder pattern...

Posted by dongjoon-hyun <gi...@git.apache.org>.
Github user dongjoon-hyun commented on the pull request:

    https://github.com/apache/spark/pull/12860#issuecomment-216429487
  
    @rxin
    This is the initial commit to confirm the direction. Could you give me some advice?




[GitHub] spark pull request: [SPARK-15084][PYTHON][SQL] Use builder pattern...

Posted by andrewor14 <gi...@git.apache.org>.
Github user andrewor14 commented on the pull request:

    https://github.com/apache/spark/pull/12860#issuecomment-216619217
  
    Looks good otherwise




[GitHub] spark pull request: [SPARK-15084][PYTHON][SQL] Use builder pattern...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/12860#issuecomment-216693879
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/57684/




[GitHub] spark pull request: [SPARK-15084][PYTHON][SQL] Use builder pattern...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/12860#issuecomment-216692036
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/57683/




[GitHub] spark pull request: [SPARK-15084][PYTHON][SQL] Use builder pattern...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/12860#issuecomment-216435365
  
    Merged build finished. Test FAILed.




[GitHub] spark pull request: [SPARK-15084][PYTHON][SQL] Use builder pattern...

Posted by andrewor14 <gi...@git.apache.org>.
Github user andrewor14 commented on a diff in the pull request:

    https://github.com/apache/spark/pull/12860#discussion_r61928851
  
    --- Diff: examples/src/main/python/sql.py ---
    @@ -57,24 +52,22 @@
         else:
             path = sys.argv[1]
         # Create a DataFrame from the file(s) pointed to by path
    -    people = sqlContext.jsonFile(path)
    +    people = spark.read.json(path)
         # root
         #  |-- person_name: string (nullable = false)
         #  |-- person_age: integer (nullable = false)
     
         # The inferred schema can be visualized using the printSchema() method.
         people.printSchema()
         # root
    -    #  |-- age: IntegerType
    -    #  |-- name: StringType
    +    #  |-- age: long (nullable = true)
    +    #  |-- name: string (nullable = true)
     
         # Register this DataFrame as a table.
    -    people.registerAsTable("people")
    +    people.registerTempTable("people")
     
         # SQL statements can be run by using the sql methods provided by sqlContext
    -    teenagers = sqlContext.sql("SELECT name FROM people WHERE age >= 13 AND age <= 19")
    +    teenagers = spark.sql("SELECT name FROM people WHERE age >= 13 AND age <= 19")
     
         for each in teenagers.collect():
             print(each[0])
    -
    -    sc.stop()
    --- End diff --
    
    We still need to do this. Once we merge #12873, we'll have a `spark.stop()` method available in Scala.
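
One way to make sure the stop call in the example always runs is to wrap the job in `try`/`finally`. This is only a sketch under stated assumptions, not something proposed in the PR: `StubContext` is a hypothetical stand-in for a `SparkContext` that merely records whether `stop()` was called, and `run_and_stop` is an illustrative helper name.

```python
class StubContext(object):
    """Hypothetical stand-in for a SparkContext; only tracks whether stop() ran."""

    def __init__(self):
        self.stopped = False

    def stop(self):
        self.stopped = True


def run_and_stop(sc, job):
    """Run job(sc) and always call sc.stop(), even if the job raises."""
    try:
        return job(sc)
    finally:
        sc.stop()


sc = StubContext()
result = run_and_stop(sc, lambda ctx: "done")
```

With a real context, the same shape would apply to the body of `sql.py`, with the explicit `sc.stop()` kept in the `finally` clause.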

