Posted to reviews@spark.apache.org by xuanyuanking <gi...@git.apache.org> on 2018/06/27 05:44:56 UTC

[GitHub] spark pull request #21648: [SPARK-24665][PySpark] Add SQLConf in PySpark to ...

GitHub user xuanyuanking opened a pull request:

    https://github.com/apache/spark/pull/21648

    [SPARK-24665][PySpark] Add SQLConf in PySpark to manage all sql configs

    ## What changes were proposed in this pull request?
    
    Add SQLConf and ConfigEntry for PySpark to manage all SQL configs, dropping all of the hard-coded keys in config usage.
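
    A small sketch of the intent, using the `ConfigEntry`/`SQLConf` declarations from this patch (assuming a `sql_ctx` in scope):

    ```
    # Before: hard-coded key and manual string parsing at every call site.
    eager_eval = sql_ctx.getConf("spark.sql.repl.eagerEval.enabled", "false").lower() == "true"

    # After: key, type and default are declared once and read through the entry.
    eager_eval = SQLConf.REPL_EAGER_EVAL_ENABLED.read(sql_ctx)
    ```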
    
    ## How was this patch tested?
    
    Existing UT.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/xuanyuanking/spark SPARK-24665

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/21648.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #21648
    
----
commit 6a77ed523a27b98cc2c2887ebb85703039c318a4
Author: Yuanjian Li <xy...@...>
Date:   2018-06-26T15:12:17Z

    Move all core configs into a single class

----


---



[GitHub] spark issue #21648: [SPARK-24665][PySpark] Use SQLConf in PySpark to manage ...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21648
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/92518/
    Test PASSed.


---



[GitHub] spark pull request #21648: [SPARK-24665][PySpark] Use SQLConf in PySpark to ...

Posted by viirya <gi...@git.apache.org>.
Github user viirya commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21648#discussion_r199385685
  
    --- Diff: python/pyspark/sql/context.py ---
    @@ -93,6 +93,11 @@ def _ssql_ctx(self):
             """
             return self._jsqlContext
     
    +    @property
    +    def _conf(self):
    +        """Accessor for the JVM SQL-specific configurations"""
    +        return self.sparkSession._jsparkSession.sessionState().conf()
    --- End diff --
    
    I'm fine with it.


---



[GitHub] spark pull request #21648: [SPARK-24665][PySpark] Use SQLConf in PySpark to ...

Posted by asfgit <gi...@git.apache.org>.
Github user asfgit closed the pull request at:

    https://github.com/apache/spark/pull/21648


---



[GitHub] spark pull request #21648: [SPARK-24665][PySpark] Use SQLConf in PySpark to ...

Posted by viirya <gi...@git.apache.org>.
Github user viirya commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21648#discussion_r199378189
  
    --- Diff: python/pyspark/sql/context.py ---
    @@ -93,6 +93,11 @@ def _ssql_ctx(self):
             """
             return self._jsqlContext
     
    +    @property
    +    def _conf(self):
    +        """Accessor for the JVM SQL-specific configurations"""
    +        return self.sparkSession._jsparkSession.sessionState().conf()
    --- End diff --
    
    What's the difference between this and `SQLContext.getConf`? Can users be confused by two similar APIs?


---



[GitHub] spark pull request #21648: [SPARK-24665][PySpark] Use SQLConf in PySpark to ...

Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21648#discussion_r199370180
  
    --- Diff: python/pyspark/sql/context.py ---
    @@ -93,6 +93,11 @@ def _ssql_ctx(self):
             """
             return self._jsqlContext
     
    +    @property
    +    def _conf(self):
    +        """Get SQLConf from current SqlContext"""
    --- End diff --
    
    nit: `Accessor for the JVM SQL-specific configurations`


---



[GitHub] spark issue #21648: [SPARK-24665][PySpark] Use SQLConf in PySpark to manage ...

Posted by xuanyuanking <gi...@git.apache.org>.
Github user xuanyuanking commented on the issue:

    https://github.com/apache/spark/pull/21648
  
    I changed the PR title and description because maybe we can just use the SQLConf in SessionState and don't need to do extra wrapping work.


---



[GitHub] spark issue #21648: [SPARK-24665][PySpark] Use SQLConf in PySpark to manage ...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21648
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/92496/
    Test PASSed.


---



[GitHub] spark issue #21648: [SPARK-24665][PySpark] Use SQLConf in PySpark to manage ...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21648
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/611/
    Test PASSed.


---



[GitHub] spark pull request #21648: [SPARK-24665][PySpark] Use SQLConf in PySpark to ...

Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21648#discussion_r199386302
  
    --- Diff: python/pyspark/sql/context.py ---
    @@ -93,6 +93,11 @@ def _ssql_ctx(self):
             """
             return self._jsqlContext
     
    +    @property
    +    def _conf(self):
    +        """Accessor for the JVM SQL-specific configurations"""
    +        return self.sparkSession._jsparkSession.sessionState().conf()
    --- End diff --
    
    I am okay too but let's just leave it. It's private and not a big deal though.


---



[GitHub] spark issue #21648: [SPARK-24665][PySpark] Add SQLConf in PySpark to manage ...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21648
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/92372/
    Test FAILed.


---



[GitHub] spark issue #21648: [SPARK-24665][PySpark] Use SQLConf in PySpark to manage ...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/21648
  
    **[Test build #92496 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/92496/testReport)** for PR 21648 at commit [`b816549`](https://github.com/apache/spark/commit/b8165495f2ad68d54f3f2ba6b4247d58825bd41c).


---



[GitHub] spark issue #21648: [SPARK-24665][PySpark] Use SQLConf in PySpark to manage ...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21648
  
    Merged build finished. Test PASSed.


---



[GitHub] spark issue #21648: [SPARK-24665][PySpark] Use SQLConf in PySpark to manage ...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/21648
  
    **[Test build #92496 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/92496/testReport)** for PR 21648 at commit [`b816549`](https://github.com/apache/spark/commit/b8165495f2ad68d54f3f2ba6b4247d58825bd41c).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---



[GitHub] spark pull request #21648: [SPARK-24665][PySpark] Add SQLConf in PySpark to ...

Posted by xuanyuanking <gi...@git.apache.org>.
Github user xuanyuanking commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21648#discussion_r198377709
  
    --- Diff: python/pyspark/sql/conf.py ---
    @@ -64,6 +64,97 @@ def _checkType(self, obj, identifier):
                                 (identifier, obj, type(obj).__name__))
     
     
    +class ConfigEntry(object):
    +    """An entry contains all meta information for a configuration"""
    +
    +    def __init__(self, confKey):
    +        """Create a new ConfigEntry with config key"""
    +        self.confKey = confKey
    +        self.converter = None
    +        self.default = _NoValue
    +
    +    def boolConf(self):
    +        """Designate current config entry is boolean config"""
    +        self.converter = lambda x: str(x).lower() == "true"
    +        return self
    +
    +    def intConf(self):
    +        """Designate current config entry is integer config"""
    +        self.converter = lambda x: int(x)
    +        return self
    +
    +    def stringConf(self):
    +        """Designate current config entry is string config"""
    +        self.converter = lambda x: str(x)
    +        return self
    +
    +    def withDefault(self, default):
    +        """Give a default value for current config entry, the default value will be set
    +        to _NoValue when its absent"""
    +        self.default = default
    +        return self
    +
    +    def read(self, ctx):
    +        """Read value from this config entry through sql context"""
    +        return self.converter(ctx.getConf(self.confKey, self.default))
    +
    +
    +class SQLConf(object):
    +    """A class that enables the getting of SQL config parameters in pyspark"""
    +
    +    REPL_EAGER_EVAL_ENABLED = ConfigEntry("spark.sql.repl.eagerEval.enabled")\
    --- End diff --
    
    Great, thanks for your guidance, I'll take a look at them in detail.


---



[GitHub] spark pull request #21648: [SPARK-24665][PySpark] Add SQLConf in PySpark to ...

Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21648#discussion_r198376125
  
    --- Diff: python/pyspark/sql/conf.py ---
    @@ -64,6 +64,97 @@ def _checkType(self, obj, identifier):
                                 (identifier, obj, type(obj).__name__))
     
     
    +class ConfigEntry(object):
    +    """An entry contains all meta information for a configuration"""
    +
    +    def __init__(self, confKey):
    +        """Create a new ConfigEntry with config key"""
    +        self.confKey = confKey
    +        self.converter = None
    +        self.default = _NoValue
    +
    +    def boolConf(self):
    +        """Designate current config entry is boolean config"""
    +        self.converter = lambda x: str(x).lower() == "true"
    +        return self
    +
    +    def intConf(self):
    +        """Designate current config entry is integer config"""
    +        self.converter = lambda x: int(x)
    +        return self
    +
    +    def stringConf(self):
    +        """Designate current config entry is string config"""
    +        self.converter = lambda x: str(x)
    +        return self
    +
    +    def withDefault(self, default):
    +        """Give a default value for current config entry, the default value will be set
    +        to _NoValue when its absent"""
    +        self.default = default
    +        return self
    +
    +    def read(self, ctx):
    +        """Read value from this config entry through sql context"""
    +        return self.converter(ctx.getConf(self.confKey, self.default))
    +
    +
    +class SQLConf(object):
    +    """A class that enables the getting of SQL config parameters in pyspark"""
    +
    +    REPL_EAGER_EVAL_ENABLED = ConfigEntry("spark.sql.repl.eagerEval.enabled")\
    --- End diff --
    
    or [this hack](https://github.com/apache/spark/blob/173fe450df203b262b58f7e71c6b52a79db95ee0/python/pyspark/ml/image.py#L37-L234)
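
    For reference, a minimal sketch of that image.py-style pattern: a module-level singleton whose instance properties behave like static attributes (`ConfigEntry` is from the diff above; the rest is illustrative):

    ```
    class _SQLConf(object):
        """Internal class for SQL configs; use the module-level SQLConf instead."""

        @property
        def REPL_EAGER_EVAL_ENABLED(self):
            # Built lazily on first access instead of at class-definition time.
            return ConfigEntry("spark.sql.repl.eagerEval.enabled").boolConf().withDefault("false")

    # Module-level singleton, so SQLConf.REPL_EAGER_EVAL_ENABLED acts like a static property.
    SQLConf = _SQLConf()

    # Monkey patch to disallow creating further instances, as image.py does.
    def _disallow_instance(_):
        raise RuntimeError("Creating an instance of _SQLConf is disallowed.")
    _SQLConf.__init__ = _disallow_instance
    ```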


---



[GitHub] spark issue #21648: [SPARK-24665][PySpark] Use SQLConf in PySpark to manage ...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21648
  
    Merged build finished. Test PASSed.


---



[GitHub] spark pull request #21648: [SPARK-24665][PySpark] Add SQLConf in PySpark to ...

Posted by xuanyuanking <gi...@git.apache.org>.
Github user xuanyuanking commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21648#discussion_r198378065
  
    --- Diff: python/pyspark/sql/dataframe.py ---
    @@ -358,22 +360,19 @@ def show(self, n=20, truncate=True, vertical=False):
         def _eager_eval(self):
    --- End diff --
    
    Yep, agree.


---



[GitHub] spark issue #21648: [SPARK-24665][PySpark] Use SQLConf in PySpark to manage ...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21648
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/92497/
    Test PASSed.


---



[GitHub] spark pull request #21648: [SPARK-24665][PySpark] Add SQLConf in PySpark to ...

Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21648#discussion_r198375767
  
    --- Diff: python/pyspark/sql/conf.py ---
    @@ -64,6 +64,97 @@ def _checkType(self, obj, identifier):
                                 (identifier, obj, type(obj).__name__))
     
     
    +class ConfigEntry(object):
    +    """An entry contains all meta information for a configuration"""
    +
    +    def __init__(self, confKey):
    +        """Create a new ConfigEntry with config key"""
    +        self.confKey = confKey
    +        self.converter = None
    +        self.default = _NoValue
    +
    +    def boolConf(self):
    +        """Designate current config entry is boolean config"""
    +        self.converter = lambda x: str(x).lower() == "true"
    +        return self
    +
    +    def intConf(self):
    +        """Designate current config entry is integer config"""
    +        self.converter = lambda x: int(x)
    +        return self
    +
    +    def stringConf(self):
    +        """Designate current config entry is string config"""
    +        self.converter = lambda x: str(x)
    +        return self
    +
    +    def withDefault(self, default):
    +        """Give a default value for current config entry, the default value will be set
    +        to _NoValue when its absent"""
    +        self.default = default
    +        return self
    +
    +    def read(self, ctx):
    +        """Read value from this config entry through sql context"""
    +        return self.converter(ctx.getConf(self.confKey, self.default))
    +
    +
    +class SQLConf(object):
    +    """A class that enables the getting of SQL config parameters in pyspark"""
    +
    +    REPL_EAGER_EVAL_ENABLED = ConfigEntry("spark.sql.repl.eagerEval.enabled")\
    +        .boolConf()\
    +        .withDefault("false")
    +
    +    REPL_EAGER_EVAL_MAX_NUM_ROWS = ConfigEntry("spark.sql.repl.eagerEval.maxNumRows")\
    +        .intConf()\
    +        .withDefault("20")
    +
    +    REPL_EAGER_EVAL_TRUNCATE = ConfigEntry("spark.sql.repl.eagerEval.truncate")\
    +        .intConf()\
    +        .withDefault("20")
    +
    +    PANDAS_RESPECT_SESSION_LOCAL_TIMEZONE = \
    +        ConfigEntry("spark.sql.execution.pandas.respectSessionTimeZone")\
    +        .boolConf()
    +
    +    SESSION_LOCAL_TIMEZONE = ConfigEntry("spark.sql.session.timeZone")\
    +        .stringConf()
    +
    +    ARROW_EXECUTION_ENABLED = ConfigEntry("spark.sql.execution.arrow.enabled")\
    +        .boolConf()\
    +        .withDefault("false")
    +
    +    ARROW_FALLBACK_ENABLED = ConfigEntry("spark.sql.execution.arrow.fallback.enabled")\
    +        .boolConf()\
    +        .withDefault("true")
    --- End diff --
    
    For the current approach, it introduces another chunk of code - a 100-line addition - and it doesn't look quite extendable either. I suggested a few possible thoughts below. If those are difficult or impossible, I wouldn't do this for now.


---



[GitHub] spark issue #21648: [SPARK-24665][PySpark] Use SQLConf in PySpark to manage ...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/21648
  
    **[Test build #92518 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/92518/testReport)** for PR 21648 at commit [`1f40375`](https://github.com/apache/spark/commit/1f40375f4ac75977ab0e989bb1ed9e04bd697763).


---



[GitHub] spark issue #21648: [SPARK-24665][PySpark] Use SQLConf in PySpark to manage ...

Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on the issue:

    https://github.com/apache/spark/pull/21648
  
    Seems fine if the tests pass. I (or someone else) should take another look before merging it though.


---



[GitHub] spark issue #21648: [SPARK-24665][PySpark] Use SQLConf in PySpark to manage ...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21648
  
    Merged build finished. Test PASSed.


---



[GitHub] spark pull request #21648: [SPARK-24665][PySpark] Use SQLConf in PySpark to ...

Posted by viirya <gi...@git.apache.org>.
Github user viirya commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21648#discussion_r199380206
  
    --- Diff: python/pyspark/sql/context.py ---
    @@ -93,6 +93,11 @@ def _ssql_ctx(self):
             """
             return self._jsqlContext
     
    +    @property
    +    def _conf(self):
    +        """Accessor for the JVM SQL-specific configurations"""
    +        return self.sparkSession._jsparkSession.sessionState().conf()
    --- End diff --
    
    Ok. Makes sense to me.


---



[GitHub] spark issue #21648: [SPARK-24665][PySpark] Use SQLConf in PySpark to manage ...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21648
  
    Merged build finished. Test PASSed.


---



[GitHub] spark issue #21648: [SPARK-24665][PySpark] Add SQLConf in PySpark to manage ...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/21648
  
    **[Test build #92372 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/92372/testReport)** for PR 21648 at commit [`6a77ed5`](https://github.com/apache/spark/commit/6a77ed523a27b98cc2c2887ebb85703039c318a4).


---



[GitHub] spark issue #21648: [SPARK-24665][PySpark] Add SQLConf in PySpark to manage ...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21648
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/506/
    Test PASSed.


---



[GitHub] spark issue #21648: [SPARK-24665][PySpark] Use SQLConf in PySpark to manage ...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21648
  
    Merged build finished. Test PASSed.


---



[GitHub] spark issue #21648: [SPARK-24665][PySpark] Use SQLConf in PySpark to manage ...

Posted by ueshin <gi...@git.apache.org>.
Github user ueshin commented on the issue:

    https://github.com/apache/spark/pull/21648
  
    LGTM for the changes.
    I'm just wondering if all config types are supported by the Py4J method call. Do we need some tests for that?
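
    For what it's worth, a rough sketch of such a test (hedged; the helper method names are the JVM SQLConf ones mentioned elsewhere in this thread, and Python 3 string types are assumed):

    ```
    def test_sql_conf_types(self):
        # Round-trip a few config value types through Py4J and check what
        # arrives on the Python side.
        jconf = self.spark._jsparkSession.sessionState().conf()
        self.assertIsInstance(jconf.isReplEagerEvalEnabled(), bool)  # Scala Boolean -> bool
        self.assertIsInstance(jconf.replEagerEvalMaxNumRows(), int)  # Scala Int -> int
        self.assertIsInstance(jconf.sessionLocalTimeZone(), str)     # Scala String -> str
    ```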


---



[GitHub] spark pull request #21648: [SPARK-24665][PySpark] Add SQLConf in PySpark to ...

Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21648#discussion_r198375388
  
    --- Diff: python/pyspark/sql/conf.py ---
    @@ -64,6 +64,97 @@ def _checkType(self, obj, identifier):
                                 (identifier, obj, type(obj).__name__))
     
     
    +class ConfigEntry(object):
    +    """An entry contains all meta information for a configuration"""
    +
    +    def __init__(self, confKey):
    +        """Create a new ConfigEntry with config key"""
    +        self.confKey = confKey
    +        self.converter = None
    +        self.default = _NoValue
    +
    +    def boolConf(self):
    +        """Designate current config entry is boolean config"""
    +        self.converter = lambda x: str(x).lower() == "true"
    +        return self
    +
    +    def intConf(self):
    +        """Designate current config entry is integer config"""
    +        self.converter = lambda x: int(x)
    +        return self
    +
    +    def stringConf(self):
    +        """Designate current config entry is string config"""
    +        self.converter = lambda x: str(x)
    +        return self
    +
    +    def withDefault(self, default):
    +        """Give a default value for current config entry, the default value will be set
    +        to _NoValue when its absent"""
    +        self.default = default
    +        return self
    +
    +    def read(self, ctx):
    +        """Read value from this config entry through sql context"""
    +        return self.converter(ctx.getConf(self.confKey, self.default))
    +
    +
    +class SQLConf(object):
    +    """A class that enables the getting of SQL config parameters in pyspark"""
    +
    +    REPL_EAGER_EVAL_ENABLED = ConfigEntry("spark.sql.repl.eagerEval.enabled")\
    +        .boolConf()\
    +        .withDefault("false")
    +
    +    REPL_EAGER_EVAL_MAX_NUM_ROWS = ConfigEntry("spark.sql.repl.eagerEval.maxNumRows")\
    --- End diff --
    
    I would also look up the attributes via Py4J and dynamically define them here. It's for internal purposes.
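
    For illustration, one hedged reading of "dynamically define the attributes" (every name here is made up for the sketch):

    ```
    class SQLConf(object):
        def __init__(self, jconf):
            self._jconf = jconf  # the JVM SQLConf reached via Py4J

    def _make_accessor(name):
        # Each generated accessor forwards to the JVM method of the same name.
        def accessor(self):
            return getattr(self._jconf, name)()
        return accessor

    # Generate Python-side accessors from a list of JVM helper method names.
    for _name in ["isReplEagerEvalEnabled", "replEagerEvalMaxNumRows", "sessionLocalTimeZone"]:
        setattr(SQLConf, _name, _make_accessor(_name))
    ```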


---



[GitHub] spark pull request #21648: [SPARK-24665][PySpark] Use SQLConf in PySpark to ...

Posted by viirya <gi...@git.apache.org>.
Github user viirya commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21648#discussion_r199379810
  
    --- Diff: python/pyspark/sql/context.py ---
    @@ -93,6 +93,11 @@ def _ssql_ctx(self):
             """
             return self._jsqlContext
     
    +    @property
    +    def _conf(self):
    +        """Accessor for the JVM SQL-specific configurations"""
    +        return self.sparkSession._jsparkSession.sessionState().conf()
    --- End diff --
    
    Btw, `_conf` sounds like a property internal to the class/object; why do we have an underscore there?


---



[GitHub] spark pull request #21648: [SPARK-24665][PySpark] Add SQLConf in PySpark to ...

Posted by xuanyuanking <gi...@git.apache.org>.
Github user xuanyuanking commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21648#discussion_r198375472
  
    --- Diff: python/pyspark/sql/conf.py ---
    @@ -64,6 +64,97 @@ def _checkType(self, obj, identifier):
                                 (identifier, obj, type(obj).__name__))
     
     
    +class ConfigEntry(object):
    +    """An entry contains all meta information for a configuration"""
    +
    +    def __init__(self, confKey):
    +        """Create a new ConfigEntry with config key"""
    +        self.confKey = confKey
    +        self.converter = None
    +        self.default = _NoValue
    +
    +    def boolConf(self):
    +        """Designate current config entry is boolean config"""
    +        self.converter = lambda x: str(x).lower() == "true"
    +        return self
    +
    +    def intConf(self):
    +        """Designate current config entry is integer config"""
    +        self.converter = lambda x: int(x)
    +        return self
    +
    +    def stringConf(self):
    +        """Designate current config entry is string config"""
    +        self.converter = lambda x: str(x)
    +        return self
    +
    +    def withDefault(self, default):
    +        """Give a default value for current config entry, the default value will be set
    +        to _NoValue when its absent"""
    +        self.default = default
    +        return self
    +
    +    def read(self, ctx):
    +        """Read value from this config entry through sql context"""
    +        return self.converter(ctx.getConf(self.confKey, self.default))
    +
    +
    +class SQLConf(object):
    +    """A class that enables the getting of SQL config parameters in pyspark"""
    +
    +    REPL_EAGER_EVAL_ENABLED = ConfigEntry("spark.sql.repl.eagerEval.enabled")\
    +        .boolConf()\
    +        .withDefault("false")
    +
    +    REPL_EAGER_EVAL_MAX_NUM_ROWS = ConfigEntry("spark.sql.repl.eagerEval.maxNumRows")\
    +        .intConf()\
    +        .withDefault("20")
    +
    +    REPL_EAGER_EVAL_TRUNCATE = ConfigEntry("spark.sql.repl.eagerEval.truncate")\
    +        .intConf()\
    +        .withDefault("20")
    +
    +    PANDAS_RESPECT_SESSION_LOCAL_TIMEZONE = \
    +        ConfigEntry("spark.sql.execution.pandas.respectSessionTimeZone")\
    +        .boolConf()
    +
    +    SESSION_LOCAL_TIMEZONE = ConfigEntry("spark.sql.session.timeZone")\
    +        .stringConf()
    +
    +    ARROW_EXECUTION_ENABLED = ConfigEntry("spark.sql.execution.arrow.enabled")\
    +        .boolConf()\
    +        .withDefault("false")
    +
    +    ARROW_FALLBACK_ENABLED = ConfigEntry("spark.sql.execution.arrow.fallback.enabled")\
    +        .boolConf()\
    +        .withDefault("true")
    --- End diff --
    
    Just want to remove the hard coding as we discussed in https://github.com/apache/spark/pull/21370#discussion_r194276735. For the duplication of the Scala code, my current idea is to call buildConf and doc on the Scala side to register the config and keep its doc there, and manage the name and default value in the Python SQLConf. May I ask for your suggestion? :) Thx.


---



[GitHub] spark pull request #21648: [SPARK-24665][PySpark] Use SQLConf in PySpark to ...

Posted by xuanyuanking <gi...@git.apache.org>.
Github user xuanyuanking commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21648#discussion_r199379417
  
    --- Diff: python/pyspark/sql/context.py ---
    @@ -93,6 +93,11 @@ def _ssql_ctx(self):
             """
             return self._jsqlContext
     
    +    @property
    +    def _conf(self):
    +        """Accessor for the JVM SQL-specific configurations"""
    +        return self.sparkSession._jsparkSession.sessionState().conf()
    --- End diff --
    
    ```
    What's the difference between this and SQLContext.getConf?
    ```
    getConf returns the value of a Spark SQL conf for a given key, while the _conf we add here is for accessing the helper methods in SQLConf. This actually follows the behavior in SQLContext.scala: https://github.com/apache/spark/blob/3c0c2d09ca89c6b6247137823169db17847dfae3/sql/core/src/main/scala/org/apache/spark/sql/SQLContext.scala#L80
    
    ```
    Can users be confused by two similar APIs?
    ```
    Maybe we should add more comments to explain this? AFAIC it's more natural than getting a conf by a hard-coded config key.
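
    To make the contrast concrete (a sketch; `isReplEagerEvalEnabled` is one of the helper methods on the JVM SQLConf):

    ```
    # getConf: string in, string out, with the key hard-coded at the call site.
    sqlContext.getConf("spark.sql.repl.eagerEval.enabled", "false")  # -> "false"

    # _conf: typed helper methods on the JVM SQLConf, no hard-coded key.
    sqlContext._conf.isReplEagerEvalEnabled()  # -> False
    ```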


---



[GitHub] spark pull request #21648: [SPARK-24665][PySpark] Add SQLConf in PySpark to ...

Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21648#discussion_r198373530
  
    --- Diff: python/pyspark/sql/conf.py ---
    @@ -64,6 +64,97 @@ def _checkType(self, obj, identifier):
                                 (identifier, obj, type(obj).__name__))
     
     
    +class ConfigEntry(object):
    +    """An entry contains all meta information for a configuration"""
    +
    +    def __init__(self, confKey):
    +        """Create a new ConfigEntry with config key"""
    +        self.confKey = confKey
    +        self.converter = None
    +        self.default = _NoValue
    +
    +    def boolConf(self):
    +        """Designate current config entry is boolean config"""
    +        self.converter = lambda x: str(x).lower() == "true"
    +        return self
    +
    +    def intConf(self):
    +        """Designate current config entry is integer config"""
    +        self.converter = lambda x: int(x)
    +        return self
    +
    +    def stringConf(self):
    +        """Designate current config entry is string config"""
    +        self.converter = lambda x: str(x)
    +        return self
    +
    +    def withDefault(self, default):
    +        """Give a default value for current config entry, the default value will be set
    +        to _NoValue when its absent"""
    +        self.default = default
    +        return self
    +
    +    def read(self, ctx):
    +        """Read value from this config entry through sql context"""
    +        return self.converter(ctx.getConf(self.confKey, self.default))
    +
    +
    +class SQLConf(object):
    +    """A class that enables the getting of SQL config parameters in pyspark"""
    +
    +    REPL_EAGER_EVAL_ENABLED = ConfigEntry("spark.sql.repl.eagerEval.enabled")\
    +        .boolConf()\
    +        .withDefault("false")
    +
    +    REPL_EAGER_EVAL_MAX_NUM_ROWS = ConfigEntry("spark.sql.repl.eagerEval.maxNumRows")\
    +        .intConf()\
    +        .withDefault("20")
    +
    +    REPL_EAGER_EVAL_TRUNCATE = ConfigEntry("spark.sql.repl.eagerEval.truncate")\
    +        .intConf()\
    +        .withDefault("20")
    +
    +    PANDAS_RESPECT_SESSION_LOCAL_TIMEZONE = \
    +        ConfigEntry("spark.sql.execution.pandas.respectSessionTimeZone")\
    +        .boolConf()
    +
    +    SESSION_LOCAL_TIMEZONE = ConfigEntry("spark.sql.session.timeZone")\
    +        .stringConf()
    +
    +    ARROW_EXECUTION_ENABLED = ConfigEntry("spark.sql.execution.arrow.enabled")\
    +        .boolConf()\
    +        .withDefault("false")
    +
    +    ARROW_FALLBACK_ENABLED = ConfigEntry("spark.sql.execution.arrow.fallback.enabled")\
    +        .boolConf()\
    +        .withDefault("true")
    --- End diff --
    
    This duplicates code on the Scala side. What's the gain from doing this?


---



[GitHub] spark issue #21648: [SPARK-24665][PySpark] Add SQLConf in PySpark to manage ...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21648
  
    Merged build finished. Test FAILed.


---



[GitHub] spark pull request #21648: [SPARK-24665][PySpark] Use SQLConf in PySpark to ...

Posted by xuanyuanking <gi...@git.apache.org>.
Github user xuanyuanking commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21648#discussion_r199385323
  
    --- Diff: python/pyspark/sql/context.py ---
    @@ -93,6 +93,11 @@ def _ssql_ctx(self):
             """
             return self._jsqlContext
     
    +    @property
    +    def _conf(self):
    +        """Accessor for the JVM SQL-specific configurations"""
    +        return self.sparkSession._jsparkSession.sessionState().conf()
    --- End diff --
    
    Thanks for your advice. I want to change the comment to:
    ```
    """Accessor for the JVM SQL-specific configurations.
    
    This property returns a SQLConf that keeps the same behavior as the Scala
    SQLContext; we mainly use it to call the helper methods in SQLConf.
    """
    ```
    Do you think that's reasonable?


---



[GitHub] spark pull request #21648: [SPARK-24665][PySpark] Add SQLConf in PySpark to ...

Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21648#discussion_r198375444
  
    --- Diff: python/pyspark/sql/dataframe.py ---
    @@ -81,6 +82,7 @@ def __init__(self, jdf, sql_ctx):
             # Check whether _repr_html is supported or not, we use it to avoid calling _jdf twice
             # by __repr__ and _repr_html_ while eager evaluation opened.
             self._support_repr_html = False
    +        self.sql_conf = SQLConf(sql_ctx)
    --- End diff --
    
    I'd make this "private" `self._sql_conf`


---



[GitHub] spark pull request #21648: [SPARK-24665][PySpark] Add SQLConf in PySpark to ...

Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21648#discussion_r198375498
  
    --- Diff: python/pyspark/sql/dataframe.py ---
    @@ -81,6 +82,7 @@ def __init__(self, jdf, sql_ctx):
             # Check whether _repr_html is supported or not, we use it to avoid calling _jdf twice
             # by __repr__ and _repr_html_ while eager evaluation opened.
             self._support_repr_html = False
    +        self.sql_conf = SQLConf(sql_ctx)
    --- End diff --
    
    BTW, does the Scala side have such an attribute? If not, I wouldn't do it this way.


---



[GitHub] spark pull request #21648: [SPARK-24665][PySpark] Use SQLConf in PySpark to ...

Posted by xuanyuanking <gi...@git.apache.org>.
Github user xuanyuanking commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21648#discussion_r199316323
  
    --- Diff: python/pyspark/sql/conf.py ---
    @@ -64,6 +64,97 @@ def _checkType(self, obj, identifier):
                                 (identifier, obj, type(obj).__name__))
     
     
    +class ConfigEntry(object):
    +    """An entry contains all meta information for a configuration"""
    +
    +    def __init__(self, confKey):
    +        """Create a new ConfigEntry with config key"""
    +        self.confKey = confKey
    +        self.converter = None
    +        self.default = _NoValue
    +
    +    def boolConf(self):
    +        """Designate current config entry is boolean config"""
    +        self.converter = lambda x: str(x).lower() == "true"
    +        return self
    +
    +    def intConf(self):
    +        """Designate current config entry is integer config"""
    +        self.converter = lambda x: int(x)
    +        return self
    +
    +    def stringConf(self):
    +        """Designate current config entry is string config"""
    +        self.converter = lambda x: str(x)
    +        return self
    +
    +    def withDefault(self, default):
    +        """Give a default value for current config entry, the default value will be set
    +        to _NoValue when its absent"""
    +        self.default = default
    +        return self
    +
    +    def read(self, ctx):
    +        """Read value from this config entry through sql context"""
    +        return self.converter(ctx.getConf(self.confKey, self.default))
    +
    +
    +class SQLConf(object):
    +    """A class that enables the getting of SQL config parameters in pyspark"""
    +
    +    REPL_EAGER_EVAL_ENABLED = ConfigEntry("spark.sql.repl.eagerEval.enabled")\
    --- End diff --
    
    Looking at the current implementation, maybe we can directly use the SQLConf in SessionState? I did this in the last commit, 453004a. Please have a look when you have time, thanks :)


---



[GitHub] spark pull request #21648: [SPARK-24665][PySpark] Use SQLConf in PySpark to ...

Posted by xuanyuanking <gi...@git.apache.org>.
Github user xuanyuanking commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21648#discussion_r199316817
  
    --- Diff: python/pyspark/sql/context.py ---
    @@ -93,6 +93,10 @@ def _ssql_ctx(self):
             """
             return self._jsqlContext
     
    +    def conf(self):
    --- End diff --
    
    Thanks, done in 4fc0ae4.


---



[GitHub] spark issue #21648: [SPARK-24665][PySpark] Use SQLConf in PySpark to manage ...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21648
  
    Merged build finished. Test PASSed.


---



[GitHub] spark pull request #21648: [SPARK-24665][PySpark] Add SQLConf in PySpark to ...

Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21648#discussion_r198375556
  
    --- Diff: python/pyspark/sql/dataframe.py ---
    @@ -358,22 +360,19 @@ def show(self, n=20, truncate=True, vertical=False):
         def _eager_eval(self):
    --- End diff --
    
    If this can be replaced by sql_conf access, we don't really need this private function.
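
    Roughly like this sketch (using `ConfigEntry.read` from the conf.py diff; the call site is illustrative):

    ```
    # Instead of a private _eager_eval helper that re-reads raw conf strings:
    if SQLConf.REPL_EAGER_EVAL_ENABLED.read(self.sql_ctx):
        ...  # build the eagerly evaluated representation
    ```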


---



[GitHub] spark issue #21648: [SPARK-24665][PySpark] Add SQLConf in PySpark to manage ...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21648
  
    Merged build finished. Test PASSed.


---



[GitHub] spark pull request #21648: [SPARK-24665][PySpark] Use SQLConf in PySpark to ...

Posted by xuanyuanking <gi...@git.apache.org>.
Github user xuanyuanking commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21648#discussion_r199387596
  
    --- Diff: python/pyspark/sql/context.py ---
    @@ -93,6 +93,11 @@ def _ssql_ctx(self):
             """
             return self._jsqlContext
     
    +    @property
    +    def _conf(self):
    +        """Accessor for the JVM SQL-specific configurations"""
    +        return self.sparkSession._jsparkSession.sessionState().conf()
    --- End diff --
    
    Got it :)


---



[GitHub] spark issue #21648: [SPARK-24665][PySpark] Use SQLConf in PySpark to manage ...

Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on the issue:

    https://github.com/apache/spark/pull/21648
  
    Merged to master.


---



[GitHub] spark issue #21648: [SPARK-24665][PySpark] Use SQLConf in PySpark to manage ...

Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on the issue:

    https://github.com/apache/spark/pull/21648
  
    I roughly double-checked that most primitive types like Boolean, Long and String work via Py4J, and read some of the code related to this in Py4J. Some types like `Option[...]` will probably just return the Py4J object (via those private helper methods like `isReplEagerEvalEnabled`).
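
    Roughly, the mapping that was checked (a sketch; an `Option[...]`-returning helper would hand back a Py4J `JavaObject` rather than a native value):

    ```
    jconf = spark._jsparkSession.sessionState().conf()

    jconf.isReplEagerEvalEnabled()   # Scala Boolean -> Python bool
    jconf.replEagerEvalMaxNumRows()  # Scala Int     -> Python int
    jconf.sessionLocalTimeZone()     # Scala String  -> Python str (unicode on Python 2)
    ```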


---



[GitHub] spark issue #21648: [SPARK-24665][PySpark] Use SQLConf in PySpark to manage ...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/21648
  
    **[Test build #92497 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/92497/testReport)** for PR 21648 at commit [`4fc0ae4`](https://github.com/apache/spark/commit/4fc0ae4410edcf9890a89bb0b47d0d633bfb3dda).


---



[GitHub] spark issue #21648: [SPARK-24665][PySpark] Add SQLConf in PySpark to manage ...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/21648
  
    **[Test build #92372 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/92372/testReport)** for PR 21648 at commit [`6a77ed5`](https://github.com/apache/spark/commit/6a77ed523a27b98cc2c2887ebb85703039c318a4).
     * This patch **fails PySpark unit tests**.
     * This patch merges cleanly.
     * This patch adds the following public classes _(experimental)_:
      * `class ConfigEntry(object):`
      * `class SQLConf(object):`


---



[GitHub] spark pull request #21648: [SPARK-24665][PySpark] Add SQLConf in PySpark to ...

Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21648#discussion_r198375176
  
    --- Diff: python/pyspark/sql/conf.py ---
    @@ -64,6 +64,97 @@ def _checkType(self, obj, identifier):
                                 (identifier, obj, type(obj).__name__))
     
     
    +class ConfigEntry(object):
    +    """An entry contains all meta information for a configuration"""
    +
    +    def __init__(self, confKey):
    +        """Create a new ConfigEntry with config key"""
    +        self.confKey = confKey
    +        self.converter = None
    +        self.default = _NoValue
    +
    +    def boolConf(self):
    +        """Designate current config entry is boolean config"""
    +        self.converter = lambda x: str(x).lower() == "true"
    +        return self
    +
    +    def intConf(self):
    +        """Designate current config entry is integer config"""
    +        self.converter = lambda x: int(x)
    +        return self
    +
    +    def stringConf(self):
    +        """Designate current config entry is string config"""
    +        self.converter = lambda x: str(x)
    +        return self
    +
    +    def withDefault(self, default):
    +        """Give a default value for current config entry, the default value will be set
    +        to _NoValue when its absent"""
    +        self.default = default
    +        return self
    +
    +    def read(self, ctx):
    +        """Read value from this config entry through sql context"""
    +        return self.converter(ctx.getConf(self.confKey, self.default))
    +
    +
    +class SQLConf(object):
    +    """A class that enables the getting of SQL config parameters in pyspark"""
    +
    +    REPL_EAGER_EVAL_ENABLED = ConfigEntry("spark.sql.repl.eagerEval.enabled")\
    --- End diff --
    
    Can we do this by wrapping the existing SQLConf? We can make them static properties by, for example, [this hack](https://github.com/graphframes/graphframes/pull/169/files#diff-e81e6b169c0aa35012a3263b2f31b330R381)
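
    A minimal sketch of that kind of static-property hack (the descriptor name is made up; `ConfigEntry` is from the diff above):

    ```
    class _ClassProperty(object):
        """Descriptor giving read-only, class-level properties."""
        def __init__(self, fget):
            self.fget = fget
        def __get__(self, obj, objtype=None):
            return self.fget(objtype)

    class SQLConf(object):
        @_ClassProperty
        def REPL_EAGER_EVAL_ENABLED(cls):
            # Resolved on access, so the entry can be built in one place.
            return ConfigEntry("spark.sql.repl.eagerEval.enabled").boolConf().withDefault("false")
    ```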


---



[GitHub] spark pull request #21648: [SPARK-24665][PySpark] Add SQLConf in PySpark to ...

Posted by xuanyuanking <gi...@git.apache.org>.
Github user xuanyuanking commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21648#discussion_r198378267
  
    --- Diff: python/pyspark/sql/dataframe.py ---
    @@ -81,6 +82,7 @@ def __init__(self, jdf, sql_ctx):
             # Check whether _repr_html is supported or not, we use it to avoid calling _jdf twice
             # by __repr__ and _repr_html_ while eager evaluation opened.
             self._support_repr_html = False
    +        self.sql_conf = SQLConf(sql_ctx)
    --- End diff --
    
    Got it, I'll try to do this by wrapping the existing SQLConf.


---



[GitHub] spark pull request #21648: [SPARK-24665][PySpark] Use SQLConf in PySpark to ...

Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21648#discussion_r199316401
  
    --- Diff: python/pyspark/sql/context.py ---
    @@ -93,6 +93,10 @@ def _ssql_ctx(self):
             """
             return self._jsqlContext
     
    +    def conf(self):
    --- End diff --
    
    Let's make this `_conf` with a property.


---



[GitHub] spark issue #21648: [SPARK-24665][PySpark] Use SQLConf in PySpark to manage ...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/21648
  
    **[Test build #92518 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/92518/testReport)** for PR 21648 at commit [`1f40375`](https://github.com/apache/spark/commit/1f40375f4ac75977ab0e989bb1ed9e04bd697763).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---



[GitHub] spark pull request #21648: [SPARK-24665][PySpark] Add SQLConf in PySpark to ...

Posted by xuanyuanking <gi...@git.apache.org>.
Github user xuanyuanking commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21648#discussion_r198377607
  
    --- Diff: python/pyspark/sql/conf.py ---
    @@ -64,6 +64,97 @@ def _checkType(self, obj, identifier):
                                 (identifier, obj, type(obj).__name__))
     
     
    +class ConfigEntry(object):
    +    """An entry contains all meta information for a configuration"""
    +
    +    def __init__(self, confKey):
    +        """Create a new ConfigEntry with config key"""
    +        self.confKey = confKey
    +        self.converter = None
    +        self.default = _NoValue
    +
    +    def boolConf(self):
    +        """Designate current config entry is boolean config"""
    +        self.converter = lambda x: str(x).lower() == "true"
    +        return self
    +
    +    def intConf(self):
    +        """Designate current config entry is integer config"""
    +        self.converter = lambda x: int(x)
    +        return self
    +
    +    def stringConf(self):
    +        """Designate current config entry is string config"""
    +        self.converter = lambda x: str(x)
    +        return self
    +
    +    def withDefault(self, default):
    +        """Give a default value for current config entry, the default value will be set
    +        to _NoValue when its absent"""
    +        self.default = default
    +        return self
    +
    +    def read(self, ctx):
    +        """Read value from this config entry through sql context"""
    +        return self.converter(ctx.getConf(self.confKey, self.default))
    +
    +
    +class SQLConf(object):
    +    """A class that enables the getting of SQL config parameters in pyspark"""
    +
    +    REPL_EAGER_EVAL_ENABLED = ConfigEntry("spark.sql.repl.eagerEval.enabled")\
    +        .boolConf()\
    +        .withDefault("false")
    +
    +    REPL_EAGER_EVAL_MAX_NUM_ROWS = ConfigEntry("spark.sql.repl.eagerEval.maxNumRows")\
    +        .intConf()\
    +        .withDefault("20")
    +
    +    REPL_EAGER_EVAL_TRUNCATE = ConfigEntry("spark.sql.repl.eagerEval.truncate")\
    +        .intConf()\
    +        .withDefault("20")
    +
    +    PANDAS_RESPECT_SESSION_LOCAL_TIMEZONE = \
    +        ConfigEntry("spark.sql.execution.pandas.respectSessionTimeZone")\
    +        .boolConf()
    +
    +    SESSION_LOCAL_TIMEZONE = ConfigEntry("spark.sql.session.timeZone")\
    +        .stringConf()
    +
    +    ARROW_EXECUTION_ENABLED = ConfigEntry("spark.sql.execution.arrow.enabled")\
    +        .boolConf()\
    +        .withDefault("false")
    +
    +    ARROW_FALLBACK_ENABLED = ConfigEntry("spark.sql.execution.arrow.fallback.enabled")\
    +        .boolConf()\
    +        .withDefault("true")
    --- End diff --
    
    Got it, thanks for your advice.


---



[GitHub] spark pull request #21648: [SPARK-24665][PySpark] Use SQLConf in PySpark to ...

Posted by xuanyuanking <gi...@git.apache.org>.
Github user xuanyuanking commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21648#discussion_r199316328
  
    --- Diff: python/pyspark/sql/dataframe.py ---
    @@ -358,22 +360,19 @@ def show(self, n=20, truncate=True, vertical=False):
         def _eager_eval(self):
    --- End diff --
    
    Thanks, done in 453004a.


---



[GitHub] spark issue #21648: [SPARK-24665][PySpark] Use SQLConf in PySpark to manage ...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21648
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/599/
    Test PASSed.


---



[GitHub] spark pull request #21648: [SPARK-24665][PySpark] Use SQLConf in PySpark to ...

Posted by xuanyuanking <gi...@git.apache.org>.
Github user xuanyuanking commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21648#discussion_r199373295
  
    --- Diff: python/pyspark/sql/context.py ---
    @@ -93,6 +93,11 @@ def _ssql_ctx(self):
             """
             return self._jsqlContext
     
    +    @property
    +    def _conf(self):
    +        """Get SQLConf from current SqlContext"""
    --- End diff --
    
    Thanks, done in the next commit.


---



[GitHub] spark issue #21648: [SPARK-24665][PySpark] Use SQLConf in PySpark to manage ...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21648
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/598/
    Test PASSed.


---



[GitHub] spark issue #21648: [SPARK-24665][PySpark] Use SQLConf in PySpark to manage ...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/21648
  
    **[Test build #92497 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/92497/testReport)** for PR 21648 at commit [`4fc0ae4`](https://github.com/apache/spark/commit/4fc0ae4410edcf9890a89bb0b47d0d633bfb3dda).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---



[GitHub] spark issue #21648: [SPARK-24665][PySpark] Use SQLConf in PySpark to manage ...

Posted by xuanyuanking <gi...@git.apache.org>.
Github user xuanyuanking commented on the issue:

    https://github.com/apache/spark/pull/21648
  
    Got it, thanks again for your advice and guidance.


---



[GitHub] spark pull request #21648: [SPARK-24665][PySpark] Use SQLConf in PySpark to ...

Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21648#discussion_r199379979
  
    --- Diff: python/pyspark/sql/context.py ---
    @@ -93,6 +93,11 @@ def _ssql_ctx(self):
             """
             return self._jsqlContext
     
    +    @property
    +    def _conf(self):
    +        """Accessor for the JVM SQL-specific configurations"""
    +        return self.sparkSession._jsparkSession.sessionState().conf()
    --- End diff --
    
    This one should be considered an internal API, used only to reduce the hardcoded ones, and the leading underscore implies that. I think we are fine with it.


---



[GitHub] spark pull request #21648: [SPARK-24665][PySpark] Use SQLConf in PySpark to ...

Posted by viirya <gi...@git.apache.org>.
Github user viirya commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21648#discussion_r199379854
  
    --- Diff: python/pyspark/sql/context.py ---
    @@ -93,6 +93,11 @@ def _ssql_ctx(self):
             """
             return self._jsqlContext
     
    +    @property
    +    def _conf(self):
    +        """Accessor for the JVM SQL-specific configurations"""
    +        return self.sparkSession._jsparkSession.sessionState().conf()
    --- End diff --
    
    > Maybe we should add more comments to explain this? AFAIC it's more natural than getting a conf by a hard-coded config key.
    
    Yeah, I think this is a good idea.


---
