You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@spark.apache.org by HyukjinKwon <gi...@git.apache.org> on 2018/02/03 18:33:14 UTC

[GitHub] spark pull request #20499: [SPARK-19454][PYTHON][FOLLOWUP] Disallow default ...

GitHub user HyukjinKwon opened a pull request:

    https://github.com/apache/spark/pull/20499

    [SPARK-19454][PYTHON][FOLLOWUP] Disallow default value None when 'to_replace' is not a dictionary

    ## What changes were proposed in this pull request?
    
    This PR proposes to disallow default value None when 'to_replace' is not a dictionary.
    
    It seems weird we set the default value of `value` to `None` and we ended up allowing the case as below:
    
    ```python
    >>> df.show()
    ```
    ```
    +----+------+-----+
    | age|height| name|
    +----+------+-----+
    |  10|    80|Alice|
    ...
    ```
    
    ```python
    >>> df.na.replace('Alice').show()
    ```
    ```
    +----+------+----+
    | age|height|name|
    +----+------+----+
    |  10|    80|null|
    ...
    ```
    
    **After**
    
    This PR targets to disallow the case above:
    
    ```python
    >>> df.na.replace('Alice').show()
    ```
    ```
    ...
    TypeError: value is required when to_replace is not a dictionary.
    ```
    
    while we still allow when `to_replace` is a dictionary:
    
    ```python
    >>> df.na.replace({'Alice': None}).show()
    ```
    ```
    +----+------+----+
    | age|height|name|
    +----+------+----+
    |  10|    80|null|
    ...
    ```
    
    This is tricky because we should `value` is explicitly set or not. So, this PR uses `args` and `kwargs`.
    
    So, there are manual arguments checks:
    
    ```
    >>> df.na.replace(1, 2, 3, 4).show()
    ...
    TypeError: Arguments are expected to be to_replace, value and subset; however, got [1, 2, 3, 4]
    ```
    
    ```
    >>> df.na.replace("a", foo=1, bar=1).show()
    ...
    TypeError: Arguments are expected to be to_replace, value and subset; however, got unexpected keyword arguments: [foo, bar]
    ```
    
    One small downside of this is pydoc because now we use args and kwargs signature:
    
    ```
    replace(self, to_replace, *args, **kwargs)
    ```
    
    ## How was this patch tested?
    
    Manually tested, tests were added in `python/pyspark/sql/tests.py` and doctests were fixed.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/HyukjinKwon/spark SPARK-19454-followup

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/20499.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #20499
    
----
commit 13bdc24696563a85283a9bb6d9dc3f9bb2bedaf2
Author: hyukjinkwon <gu...@...>
Date:   2018-02-03T18:14:04Z

    Disallow default value None when 'to_replace' is not a dictionary

----


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #20499: [SPARK-23328][PYTHON] Disallow default value None in na....

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/20499
  
    Merged build finished. Test FAILed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #20499: [SPARK-23328][PYTHON] Disallow default value None in na....

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/20499
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/706/
    Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #20499: [SPARK-23328][PYTHON] Disallow default value None...

Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on a diff in the pull request:

    https://github.com/apache/spark/pull/20499#discussion_r165834947
  
    --- Diff: python/pyspark/sql/dataframe.py ---
    @@ -1587,6 +1600,51 @@ def replace(self, to_replace, value=None, subset=None):
             |null|  null|null|
             +----+------+----+
             """
    +
    +        # It deals with a problem when 'value' is set to None and 'to_replace' is a dictionary.
    +        is_more_than_two = len(args) + len(kwargs) > 2
    +        if is_more_than_two:
    +            raise TypeError(
    +                "Arguments are expected to be to_replace, value and subset; however, got "
    +                "[%s]" % ", ".join(map(str, [to_replace] + list(args) + list(kwargs.values()))))
    +
    +        is_unexpected_kwargs = \
    +            len(kwargs) == 2 and ("value" not in kwargs or "subset" not in kwargs)
    +        if is_unexpected_kwargs:
    +            raise TypeError(
    +                "Arguments are expected to be to_replace, value and subset; however, got "
    +                "unexpected keyword arguments: [%s]" % ", ".join(
    +                    filter(lambda a: a not in ["value", "subset"], kwargs.keys())))
    +
    +        if "value" in kwargs and "subset" in kwargs:
    +            # replace(..., value=..., subset=...)
    +            # replace(..., subset=..., value=...)
    +            value = kwargs["value"]
    +            subset = kwargs["subset"]
    +        elif "value" in kwargs:
    +            # replace(..., ..., value=...)
    +            value = kwargs["value"]
    +            subset = args[0] if len(args) == 1 else None
    +        elif "subset" in kwargs:
    +            # replace(..., ..., subset=...)
    +            if len(args) == 1:
    +                value = args[0]
    +            elif isinstance(to_replace, dict):
    +                value = None  # When to_replace is a dictionary, value can be omitted.
    +            else:
    +                raise TypeError("value is required when to_replace is not a dictionary.")
    --- End diff --
    
    Hm, I think that check is still valid. The newly added logic here focuses on checking missing arguments whereas the below logics focus on checking if arguments are valid.
    
    Will try to add a explicit test for https://github.com/apache/spark/pull/20499#discussion_r165828448 case with few comment changes.
    
    For https://github.com/apache/spark/pull/20499#discussion_r165828519, I just tried to check. Seems we should keep that `None` to support:
    
    ```python
    >>> df.na.replace('Alice', None).show()
    ```
    
    ```
    +----+------+----+
    | age|height|name|
    +----+------+----+
    |  10|    80|null|
    ...
    ```
    
    ```
    ...
    ValueError: If to_replace is not a dict, value should be a bool, float, int, long, string, list, tuple or None. Got <type 'NoneType'>
    ```


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #20499: [SPARK-23328][PYTHON] Disallow default value None in na....

Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on the issue:

    https://github.com/apache/spark/pull/20499
  
    Will update this tonight


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #20499: [SPARK-23328][PYTHON] Disallow default value None in na....

Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on the issue:

    https://github.com/apache/spark/pull/20499
  
    Yup, I should fix the guide for 2.2 anyway :-) Will open a backport tonight KST.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #20499: [SPARK-23328][PYTHON] Disallow default value None in na....

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/20499
  
    **[Test build #87205 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87205/testReport)** for PR 20499 at commit [`885b4d0`](https://github.com/apache/spark/commit/885b4d00af53dfd0148c431fdacce9a2789f32a2).
     * This patch **fails PySpark unit tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #20499: [SPARK-23328][PYTHON] Disallow default value None in na....

Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on the issue:

    https://github.com/apache/spark/pull/20499
  
    Yup, sounds good.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #20499: [SPARK-23328][PYTHON] Disallow default value None in na....

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/20499
  
    **[Test build #87051 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87051/testReport)** for PR 20499 at commit [`5c18594`](https://github.com/apache/spark/commit/5c185941312d5379761d7600127c735eb5b27084).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #20499: [SPARK-23328][PYTHON] Disallow default value None in na....

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/20499
  
    **[Test build #87206 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87206/testReport)** for PR 20499 at commit [`885b4d0`](https://github.com/apache/spark/commit/885b4d00af53dfd0148c431fdacce9a2789f32a2).


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #20499: [SPARK-23328][PYTHON] Disallow default value None in na....

Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on the issue:

    https://github.com/apache/spark/pull/20499
  
    Sure, let me unset the target version.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #20499: [SPARK-23328][PYTHON] Disallow default value None in na....

Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on the issue:

    https://github.com/apache/spark/pull/20499
  
    retest this please


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #20499: [SPARK-23328][PYTHON] Disallow default value None in na....

Posted by viirya <gi...@git.apache.org>.
Github user viirya commented on the issue:

    https://github.com/apache/spark/pull/20499
  
    I think that behavior is shipped in 2.2, right? Then we may need to add a note in migration guide.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #20499: [SPARK-23328][PYTHON] Disallow default value None in na....

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/20499
  
    Merged build finished. Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #20499: [SPARK-19454][PYTHON][FOLLOWUP] Disallow default ...

Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on a diff in the pull request:

    https://github.com/apache/spark/pull/20499#discussion_r165821715
  
    --- Diff: python/pyspark/sql/tests.py ---
    @@ -2245,11 +2245,6 @@ def test_replace(self):
                    .replace(False, True).first())
             self.assertTupleEqual(row, (True, True))
     
    -        # replace list while value is not given (default to None)
    --- End diff --
    
    Seems we should disallow the case. Please see https://github.com/apache/spark/pull/16793#issuecomment-362684399


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #20499: [SPARK-23328][PYTHON] Disallow default value None...

Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on a diff in the pull request:

    https://github.com/apache/spark/pull/20499#discussion_r165951235
  
    --- Diff: python/pyspark/sql/tests.py ---
    @@ -2186,7 +2186,7 @@ def test_replace(self):
             # replace with subset specified with one column replaced, another column not in subset
             # stays unchanged.
             row = self.spark.createDataFrame(
    -            [(u'Alice', 10, 10.0)], schema).replace(10, 20, subset=['name', 'age']).first()
    +            [(u'Alice', 10, 10.0)], schema).replace(10, value=20, subset=['name', 'age']).first()
    --- End diff --
    
    I don't think it's necessary but let me keep them since at least it tests different combinations of valid cases.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #20499: [SPARK-23328][PYTHON] Disallow default value None in na....

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on the issue:

    https://github.com/apache/spark/pull/20499
  
    LGTM, waiting for more feedbacks.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #20499: [SPARK-23328][PYTHON] Disallow default value None...

Posted by viirya <gi...@git.apache.org>.
Github user viirya commented on a diff in the pull request:

    https://github.com/apache/spark/pull/20499#discussion_r166216324
  
    --- Diff: python/pyspark/sql/tests.py ---
    @@ -2175,7 +2175,7 @@ def test_replace(self):
     
             # replace with subset specified by a string of a column name w/ actual change
             row = self.spark.createDataFrame(
    -            [(u'Alice', 10, 80.1)], schema).replace(10, 20, subset='age').first()
    +            [(u'Alice', 10, 80.1)], schema).replace(10, 'age', value=20).first()
    --- End diff --
    
    Will this conflict with the convention of function argument in Python?
    
    Usually, I think the arguments before keyword arg are resolved by position. But now `age` is resolved to `subset` which is the third argument behind `value`.
    
    Since the function signature is changed, this may not be a big issue.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #20499: [SPARK-23328][PYTHON] Disallow default value None in na....

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/20499
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/702/
    Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #20499: [SPARK-23328][PYTHON] Disallow default value None...

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on a diff in the pull request:

    https://github.com/apache/spark/pull/20499#discussion_r166290471
  
    --- Diff: python/pyspark/sql/dataframe.py ---
    @@ -1532,7 +1532,7 @@ def fillna(self, value, subset=None):
                 return DataFrame(self._jdf.na().fill(value, self._jseq(subset)), self.sql_ctx)
     
         @since(1.4)
    -    def replace(self, to_replace, value=None, subset=None):
    +    def replace(self, to_replace, *args, **kwargs):
    --- End diff --
    
    I see the problem now. If `to_replace` is dict, then `value` should be ignored and we should provide a default value. If `to_replace` is not dict, then `value` is required and we should not provide a default value.
    
    Can we use an invalid value as the default value for `value`? Then we can throw exception if the `value` is not set by user.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #20499: [SPARK-23328][PYTHON] Disallow default value None in na....

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/20499
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/87201/
    Test FAILed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #20499: [SPARK-23328][PYTHON] Disallow default value None...

Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on a diff in the pull request:

    https://github.com/apache/spark/pull/20499#discussion_r166302987
  
    --- Diff: python/pyspark/sql/dataframe.py ---
    @@ -1532,7 +1532,7 @@ def fillna(self, value, subset=None):
                 return DataFrame(self._jdf.na().fill(value, self._jseq(subset)), self.sql_ctx)
     
         @since(1.4)
    -    def replace(self, to_replace, value=None, subset=None):
    +    def replace(self, to_replace, *args, **kwargs):
    --- End diff --
    
    It's just as is:
    
    ```
    def replace(self, to_replace, *args, **kwargs)
    ```
    
    but this is better than `replace(self, to_replace, value={}, subset=None)` IMHO.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #20499: [SPARK-23328][PYTHON] Disallow default value None in na....

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/20499
  
    Merged build finished. Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #20499: [SPARK-23328][PYTHON] Disallow default value None in na....

Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on the issue:

    https://github.com/apache/spark/pull/20499
  
    I think it's a bug fix. For the context,
    
    1. the dictionary support was added but `value` was required (and ignored) at the first place [SPARK-6876](https://issues.apache.org/jira/browse/SPARK-6876):
    
        ```python
        df.replace({"Alice": "Bob"}, value=1).show()
        ```
    
    2. `value=None` was added to allow the case below [SPARK-19454](https://issues.apache.org/jira/browse/SPARK-19454):
    
        ```python
        df.replace({"Alice": "Bob"}).show()
        ```
    
        but it happened to allow the case below too:
    
        ```
        df.replace("Alice").show()
        ```
    
    3. `df.na.replace` is an alias of `df.replace`. So, it has the same default value `value=None` [SPARK-21658](https://issues.apache.org/jira/browse/SPARK-21658):
    
        ```python
        df.replace({"Alice": "Bob"}).show()
        ```
    
    4. An design issue was raised in [SPARK-21658](https://issues.apache.org/jira/browse/SPARK-21658) and  https://github.com/apache/spark/pull/16793#issuecomment-362684399 ([SPARK-19454](https://issues.apache.org/jira/browse/SPARK-19454)).
    
        To cut it short, I think the issue raised is basically that the case below is weird:
    
        ```python
        df.replace("Alice").show()
        ```
    
    5.  [SPARK-21658](https://issues.apache.org/jira/browse/SPARK-21658) was reverted.
    
    5. This PR deals tries to throw an exception in 4. case, and also deals with [SPARK-21658](https://issues.apache.org/jira/browse/SPARK-21658)


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #20499: [SPARK-23328][PYTHON][FOLLOWUP] Disallow default value N...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/20499
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/87037/
    Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #20499: [SPARK-23328][PYTHON] Disallow default value None...

Posted by asfgit <gi...@git.apache.org>.
Github user asfgit closed the pull request at:

    https://github.com/apache/spark/pull/20499


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #20499: [SPARK-23328][PYTHON] Disallow default value None...

Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on a diff in the pull request:

    https://github.com/apache/spark/pull/20499#discussion_r165950192
  
    --- Diff: python/pyspark/sql/dataframe.py ---
    @@ -1557,6 +1557,9 @@ def replace(self, to_replace, value=None, subset=None):
                 For example, if `value` is a string, and subset contains a non-string column,
                 then the non-string column is simply ignored.
     
    +        .. note:: `value` can only be omitted when `to_replace` is a dictionary. Otherwise,
    +            it is required.
    --- End diff --
    
    Sure.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #20499: [SPARK-23328][PYTHON] Disallow default value None in na....

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/20499
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/698/
    Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #20499: [SPARK-23328][PYTHON][FOLLOWUP] Disallow default value N...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/20499
  
    Merged build finished. Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #20499: [SPARK-23328][PYTHON][FOLLOWUP] Disallow default value N...

Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on the issue:

    https://github.com/apache/spark/pull/20499
  
    The linked JIRA targets 2.3.0 and it was an alternative of reverting https://github.com/apache/spark/pull/20496#issuecomment-362843558 .. Let me rebase it here anyway ..


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #20499: [SPARK-23328][PYTHON] Disallow default value None in na....

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/20499
  
    Merged build finished. Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #20499: [SPARK-23328][PYTHON] Disallow default value None...

Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on a diff in the pull request:

    https://github.com/apache/spark/pull/20499#discussion_r166286067
  
    --- Diff: python/pyspark/sql/dataframe.py ---
    @@ -1532,7 +1532,7 @@ def fillna(self, value, subset=None):
                 return DataFrame(self._jdf.na().fill(value, self._jseq(subset)), self.sql_ctx)
     
         @since(1.4)
    -    def replace(self, to_replace, value=None, subset=None):
    +    def replace(self, to_replace, *args, **kwargs):
    --- End diff --
    
    Yes, if `value` is explicitly given, I thought ignoring `value` as we did from the first place.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #20499: [SPARK-23328][PYTHON] Disallow default value None...

Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on a diff in the pull request:

    https://github.com/apache/spark/pull/20499#discussion_r166320786
  
    --- Diff: python/pyspark/sql/dataframe.py ---
    @@ -1532,7 +1532,7 @@ def fillna(self, value, subset=None):
                 return DataFrame(self._jdf.na().fill(value, self._jseq(subset)), self.sql_ctx)
     
         @since(1.4)
    -    def replace(self, to_replace, value=None, subset=None):
    +    def replace(self, to_replace, *args, **kwargs):
    --- End diff --
    
    Yea, to me either way works to me. Let me try to look around this a bit more and give a shot to show how it looks like.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #20499: [SPARK-23328][PYTHON] Disallow default value None...

Posted by viirya <gi...@git.apache.org>.
Github user viirya commented on a diff in the pull request:

    https://github.com/apache/spark/pull/20499#discussion_r165884365
  
    --- Diff: python/pyspark/sql/dataframe.py ---
    @@ -1587,6 +1600,52 @@ def replace(self, to_replace, value=None, subset=None):
             |null|  null|null|
             +----+------+----+
             """
    +
    +        # It deals with a problem when 'value' is set to None and 'to_replace' is a dictionary.
    +        # Validate if arguments are missing or not.
    +        is_more_than_two = len(args) + len(kwargs) > 2
    +        if is_more_than_two:
    +            raise TypeError(
    +                "The number of arguments should not be more than 3; however, got "
    +                "%s arguments." % len([to_replace] + list(args) + list(kwargs.values())))
    +
    +        is_unexpected_kwargs = \
    +            len(kwargs) == 2 and ("value" not in kwargs or "subset" not in kwargs)
    --- End diff --
    
    ```python
    df.na.replace({'Alice': 'Bob'}, foo = 'bar').show()
    ```
    
    Seems this case can't be detected?
    
    
    



---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #20499: [SPARK-19454][PYTHON][FOLLOWUP] Disallow default value N...

Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on the issue:

    https://github.com/apache/spark/pull/20499
  
    @HyukjinKwon We need a separate PR and target it to 2.3


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #20499: [SPARK-23328][PYTHON] Disallow default value None in na....

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/20499
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/87198/
    Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #20499: [SPARK-23328][PYTHON] Disallow default value None in na....

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/20499
  
    **[Test build #87042 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87042/testReport)** for PR 20499 at commit [`3da7ba6`](https://github.com/apache/spark/commit/3da7ba620a13ff5121556b3439a36c75d78d70f8).


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #20499: [SPARK-23328][PYTHON] Disallow default value None in na....

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/20499
  
    Merged build finished. Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #20499: [SPARK-23328][PYTHON] Disallow default value None in na....

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/20499
  
    **[Test build #87201 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87201/testReport)** for PR 20499 at commit [`885b4d0`](https://github.com/apache/spark/commit/885b4d00af53dfd0148c431fdacce9a2789f32a2).


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #20499: [SPARK-23328][PYTHON] Disallow default value None in na....

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/20499
  
    **[Test build #87068 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87068/testReport)** for PR 20499 at commit [`1849f59`](https://github.com/apache/spark/commit/1849f5948d41d9a0a137a810b8a699755232f7cb).


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #20499: [SPARK-23328][PYTHON] Disallow default value None in na....

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/20499
  
    Merged build finished. Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #20499: [SPARK-23328][PYTHON] Disallow default value None in na....

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/20499
  
    **[Test build #87196 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87196/testReport)** for PR 20499 at commit [`9f49b05`](https://github.com/apache/spark/commit/9f49b05de312495c84fccec93c82d0af8205eff3).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds the following public classes _(experimental)_:
      * `class _NoValueType(object):`


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #20499: [SPARK-23328][PYTHON] Disallow default value None in na....

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/20499
  
    **[Test build #87196 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87196/testReport)** for PR 20499 at commit [`9f49b05`](https://github.com/apache/spark/commit/9f49b05de312495c84fccec93c82d0af8205eff3).


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #20499: [SPARK-23328][PYTHON][FOLLOWUP] Disallow default value N...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/20499
  
    **[Test build #87038 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87038/testReport)** for PR 20499 at commit [`198bda4`](https://github.com/apache/spark/commit/198bda411e19c21712549255c1bb2c484a044e61).


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #20499: [SPARK-23328][PYTHON][FOLLOWUP] Disallow default value N...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/20499
  
    **[Test build #87037 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87037/testReport)** for PR 20499 at commit [`13bdc24`](https://github.com/apache/spark/commit/13bdc24696563a85283a9bb6d9dc3f9bb2bedaf2).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #20499: [SPARK-23328][PYTHON] Disallow default value None in na....

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/20499
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/87206/
    Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #20499: [SPARK-23328][PYTHON][FOLLOWUP] Disallow default value N...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/20499
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/567/
    Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #20499: [SPARK-23328][PYTHON] Disallow default value None...

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on a diff in the pull request:

    https://github.com/apache/spark/pull/20499#discussion_r166220816
  
    --- Diff: python/pyspark/sql/dataframe.py ---
    @@ -1545,8 +1545,8 @@ def replace(self, to_replace, value=None, subset=None):
     
             :param to_replace: bool, int, long, float, string, list or dict.
                 Value to be replaced.
    -            If the value is a dict, then `value` is ignored and `to_replace` must be a
    -            mapping between a value and a replacement.
    +            If the value is a dict, then `value` is ignored or can be omitted, and `to_replace`
    --- End diff --
    
    now there is no parameter named `value`.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #20499: [SPARK-19454][PYTHON][FOLLOWUP] Disallow default value N...

Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on the issue:

    https://github.com/apache/spark/pull/20499
  
    cc @rxin, @gatorsmile, @holdenk, @zero323 and @viirya, this is an alternative of reverting its alias matching, and a fix to address https://github.com/apache/spark/pull/16793#issuecomment-362684399. Could you guys take a look and see if makes sense?


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #20499: [SPARK-23328][PYTHON] Disallow default value None in na....

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/20499
  
    **[Test build #87038 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87038/testReport)** for PR 20499 at commit [`198bda4`](https://github.com/apache/spark/commit/198bda411e19c21712549255c1bb2c484a044e61).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #20499: [SPARK-23328][PYTHON] Disallow default value None...

Posted by viirya <gi...@git.apache.org>.
Github user viirya commented on a diff in the pull request:

    https://github.com/apache/spark/pull/20499#discussion_r165883566
  
    --- Diff: python/pyspark/sql/tests.py ---
    @@ -2186,7 +2186,7 @@ def test_replace(self):
             # replace with subset specified with one column replaced, another column not in subset
             # stays unchanged.
             row = self.spark.createDataFrame(
    -            [(u'Alice', 10, 10.0)], schema).replace(10, 20, subset=['name', 'age']).first()
    +            [(u'Alice', 10, 10.0)], schema).replace(10, value=20, subset=['name', 'age']).first()
    --- End diff --
    
    Are the above two test changes necessary?


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #20499: [SPARK-23328][PYTHON] Disallow default value None in na....

Posted by viirya <gi...@git.apache.org>.
Github user viirya commented on the issue:

    https://github.com/apache/spark/pull/20499
  
    I think this is what originally proposed in the JIRA:
    
    >It is possible to use dict as to_replace, but we cannot skip or use None as the value value (although it is ignored). This requires passing "magic" values:
    ```scala
    df = sc.parallelize([("Alice", 1, 3.0)]).toDF()
    df.replace({"Alice": "Bob"}, 1)
    ```
    



---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #20499: [SPARK-23328][PYTHON] Disallow default value None in na....

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/20499
  
    Merged build finished. Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #20499: [SPARK-23328][PYTHON] Disallow default value None...

Posted by viirya <gi...@git.apache.org>.
Github user viirya commented on a diff in the pull request:

    https://github.com/apache/spark/pull/20499#discussion_r165828448
  
    --- Diff: python/pyspark/sql/dataframe.py ---
    @@ -1587,6 +1600,51 @@ def replace(self, to_replace, value=None, subset=None):
             |null|  null|null|
             +----+------+----+
             """
    +
    +        # It deals with a problem when 'value' is set to None and 'to_replace' is a dictionary.
    +        is_more_than_two = len(args) + len(kwargs) > 2
    +        if is_more_than_two:
    +            raise TypeError(
    +                "Arguments are expected to be to_replace, value and subset; however, got "
    +                "[%s]" % ", ".join(map(str, [to_replace] + list(args) + list(kwargs.values()))))
    +
    +        is_unexpected_kwargs = \
    +            len(kwargs) == 2 and ("value" not in kwargs or "subset" not in kwargs)
    +        if is_unexpected_kwargs:
    +            raise TypeError(
    +                "Arguments are expected to be to_replace, value and subset; however, got "
    +                "unexpected keyword arguments: [%s]" % ", ".join(
    +                    filter(lambda a: a not in ["value", "subset"], kwargs.keys())))
    +
    +        if "value" in kwargs and "subset" in kwargs:
    +            # replace(..., value=..., subset=...)
    +            # replace(..., subset=..., value=...)
    +            value = kwargs["value"]
    +            subset = kwargs["subset"]
    +        elif "value" in kwargs:
    +            # replace(..., ..., value=...)
    +            value = kwargs["value"]
    +            subset = args[0] if len(args) == 1 else None
    +        elif "subset" in kwargs:
    +            # replace(..., ..., subset=...)
    +            if len(args) == 1:
    +                value = args[0]
    +            elif isinstance(to_replace, dict):
    +                value = None  # When to_replace is a dictionary, value can be omitted.
    +            else:
    +                raise TypeError("value is required when to_replace is not a dictionary.")
    --- End diff --
    
    There are some old check below like:
    ```python
            if not isinstance(value, valid_types) and value is not None \
                    and not isinstance(to_replace, dict):
                raise ValueError("If to_replace is not a dict, value should be "
                                 "a bool, float, int, long, string, list, tuple or None. "
                                 "Got {0}".format(type(value)))
    ```
    
    Should we clean up it too?


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #20499: [SPARK-23328][PYTHON] Disallow default value None in na....

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/20499
  
    Merged build finished. Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #20499: [SPARK-23328][PYTHON] Disallow default value None in na....

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/20499
  
    **[Test build #87068 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87068/testReport)** for PR 20499 at commit [`1849f59`](https://github.com/apache/spark/commit/1849f5948d41d9a0a137a810b8a699755232f7cb).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #20499: [SPARK-23328][PYTHON] Disallow default value None in na....

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on the issue:

    https://github.com/apache/spark/pull/20499
  
    thanks, merging to master/2.3!
    
    Can you send a new PR for 2.2? it conflicts...


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #20499: [SPARK-23328][PYTHON] Disallow default value None...

Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on a diff in the pull request:

    https://github.com/apache/spark/pull/20499#discussion_r166300631
  
    --- Diff: python/pyspark/sql/dataframe.py ---
    @@ -1532,7 +1532,7 @@ def fillna(self, value, subset=None):
                 return DataFrame(self._jdf.na().fill(value, self._jseq(subset)), self.sql_ctx)
     
         @since(1.4)
    -    def replace(self, to_replace, value=None, subset=None):
    +    def replace(self, to_replace, *args, **kwargs):
    --- End diff --
    
    Yea, I think that summarises the issue 
    
    > Can we use an invalid value as the default value for value? Then we can throw exception if the value is not set by user.
    
    Yea, we could define a class / instance to indeicate no value like NumPy does -
     https://github.com/numpy/numpy/blob/master/numpy/_globals.py#L76 . I was thinking resembling this way too but this is kind of a new approach to Spark and this is a single case so far.
    
    To get to the point, yea, we could maybe use an invalid value and unset if `to_replace` is a dictionary. For example, I can assign `{}`. But then the problem is docstring by pydoc and API documentation. It will show something like:
    
    ```
    Help on method replace in module pyspark.sql.dataframe:
    
    replace(self, to_replace, value={}, subset=None) method of pyspark.sql.dataframe.DataFrame instance
        Returns a new :class:`DataFrame` replacing a value with another value.
    ...
    ```
    
    This is pretty confusing. Up to my knowledge, we can't really override this signature - I tried few times before, and I failed if I remember this correctly.
    
    Maybe, this is good enough but I didn't want to start it by such because the issue @rxin raised sounds like because it has a default value, to be more strictly. 
    
    To be honest, seems Pandas's `replace` also has `None` for default value -
     https://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.replace.html#pandas.DataFrame.replace.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #20499: [SPARK-23328][PYTHON] Disallow default value None in na....

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/20499
  
    **[Test build #87198 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87198/testReport)** for PR 20499 at commit [`a349d07`](https://github.com/apache/spark/commit/a349d078a78efe0763b90b932c8d0b26f6aa4c86).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds the following public classes _(experimental)_:
      * `class _NoValueType(object):`


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #20499: [SPARK-23328][PYTHON] Disallow default value None...

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on a diff in the pull request:

    https://github.com/apache/spark/pull/20499#discussion_r166302599
  
    --- Diff: python/pyspark/sql/dataframe.py ---
    @@ -1532,7 +1532,7 @@ def fillna(self, value, subset=None):
                 return DataFrame(self._jdf.na().fill(value, self._jseq(subset)), self.sql_ctx)
     
         @since(1.4)
    -    def replace(self, to_replace, value=None, subset=None):
    +    def replace(self, to_replace, *args, **kwargs):
    --- End diff --
    
    what's the docstring for `def replace(self, to_replace, *args, **kwargs)`?


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #20499: [SPARK-23328][PYTHON] Disallow default value None in na....

Posted by rxin <gi...@git.apache.org>.
Github user rxin commented on the issue:

    https://github.com/apache/spark/pull/20499
  
    I'd fix this in 2.3, and 2.2.1 as well.



---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #20499: [SPARK-23328][PYTHON] Disallow default value None in na....

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/20499
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/705/
    Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #20499: [SPARK-23328][PYTHON] Disallow default value None...

Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on a diff in the pull request:

    https://github.com/apache/spark/pull/20499#discussion_r166301172
  
    --- Diff: python/pyspark/sql/dataframe.py ---
    @@ -1532,7 +1532,7 @@ def fillna(self, value, subset=None):
                 return DataFrame(self._jdf.na().fill(value, self._jseq(subset)), self.sql_ctx)
     
         @since(1.4)
    -    def replace(self, to_replace, value=None, subset=None):
    +    def replace(self, to_replace, *args, **kwargs):
    --- End diff --
    
    So, to cut it short, yea, if less pretty doc is fine, I can try. That would reduce the change a lot.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #20499: [SPARK-23328][PYTHON] Disallow default value None in na....

Posted by viirya <gi...@git.apache.org>.
Github user viirya commented on the issue:

    https://github.com/apache/spark/pull/20499
  
    Seems RC3 is near to be cut, do we want to get this in 2.3?


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #20499: [SPARK-23328][PYTHON] Disallow default value None in na....

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/20499
  
    Merged build finished. Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #20499: [SPARK-23328][PYTHON] Disallow default value None in na....

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/20499
  
    Merged build finished. Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #20499: [SPARK-23328][PYTHON][FOLLOWUP] Disallow default value N...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/20499
  
    Merged build finished. Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #20499: [SPARK-19454][PYTHON][FOLLOWUP] Disallow default value N...

Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on the issue:

    https://github.com/apache/spark/pull/20499
  
    Hm .. could't we backport this to 2.3 as well? Or, do you suggest to make the current change to branch-2.3 only?


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #20499: [SPARK-23328][PYTHON] Disallow default value None in na....

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/20499
  
    **[Test build #87201 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87201/testReport)** for PR 20499 at commit [`885b4d0`](https://github.com/apache/spark/commit/885b4d00af53dfd0148c431fdacce9a2789f32a2).
     * This patch **fails due to an unknown error code, -9**.
     * This patch merges cleanly.
     * This patch adds no public classes.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #20499: [SPARK-23328][PYTHON] Disallow default value None in na....

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/20499
  
    **[Test build #87042 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87042/testReport)** for PR 20499 at commit [`3da7ba6`](https://github.com/apache/spark/commit/3da7ba620a13ff5121556b3439a36c75d78d70f8).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #20499: [SPARK-19454][PYTHON][FOLLOWUP] Disallow default value N...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/20499
  
    Merged build finished. Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #20499: [SPARK-19454][PYTHON][FOLLOWUP] Disallow default value N...

Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on the issue:

    https://github.com/apache/spark/pull/20499
  
    > We need a separate JIRA and target it to 2.3
    
    Sure.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #20499: [SPARK-23328][PYTHON] Disallow default value None in na....

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on the issue:

    https://github.com/apache/spark/pull/20499
  
    is it a bug fix or a new feature?


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #20499: [SPARK-19454][PYTHON][FOLLOWUP] Disallow default value N...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/20499
  
    **[Test build #87037 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87037/testReport)** for PR 20499 at commit [`13bdc24`](https://github.com/apache/spark/commit/13bdc24696563a85283a9bb6d9dc3f9bb2bedaf2).


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #20499: [SPARK-23328][PYTHON] Disallow default value None in na....

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/20499
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/571/
    Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #20499: [SPARK-23328][PYTHON] Disallow default value None...

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on a diff in the pull request:

    https://github.com/apache/spark/pull/20499#discussion_r166310719
  
    --- Diff: python/pyspark/sql/dataframe.py ---
    @@ -1532,7 +1532,7 @@ def fillna(self, value, subset=None):
                 return DataFrame(self._jdf.na().fill(value, self._jseq(subset)), self.sql_ctx)
     
         @since(1.4)
    -    def replace(self, to_replace, value=None, subset=None):
    +    def replace(self, to_replace, *args, **kwargs):
    --- End diff --
    
    I prefer `def replace(self, to_replace, value=_NoValue, subset=None)`.
    
    `def replace(self, to_replace, *args, **kwargs)` loses the information about `value` and `subset`.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #20499: [SPARK-19454][PYTHON][FOLLOWUP] Disallow default value N...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/20499
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/566/
    Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #20499: [SPARK-23328][PYTHON] Disallow default value None in na....

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/20499
  
    Merged build finished. Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #20499: [SPARK-23328][PYTHON] Disallow default value None in na....

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/20499
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/87038/
    Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #20499: [SPARK-23328][PYTHON] Disallow default value None in na....

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/20499
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/87051/
    Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #20499: [SPARK-23328][PYTHON] Disallow default value None in na....

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on the issue:

    https://github.com/apache/spark/pull/20499
  
    Looks like an existing issue since Spark 2.2, I don't think this should block 2.3.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #20499: [SPARK-23328][PYTHON] Disallow default value None in na....

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/20499
  
    Merged build finished. Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #20499: [SPARK-23328][PYTHON] Disallow default value None...

Posted by viirya <gi...@git.apache.org>.
Github user viirya commented on a diff in the pull request:

    https://github.com/apache/spark/pull/20499#discussion_r166217503
  
    --- Diff: python/pyspark/sql/dataframe.py ---
    @@ -1587,6 +1597,51 @@ def replace(self, to_replace, value=None, subset=None):
             |null|  null|null|
             +----+------+----+
             """
    +
    +        # It deals with a problem when 'value' is set to None and 'to_replace' is a dictionary.
    +        # Validate if arguments are missing or not.
    +        is_more_than_two = len(args) + len(kwargs) > 2
    --- End diff --
    
    I read this few times and still feel that this is kind of verbose. But seems there is no better way to check if an optional parameter is set or not in Python.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #20499: [SPARK-23328][PYTHON] Disallow default value None in na....

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/20499
  
    **[Test build #87198 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87198/testReport)** for PR 20499 at commit [`a349d07`](https://github.com/apache/spark/commit/a349d078a78efe0763b90b932c8d0b26f6aa4c86).


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #20499: [SPARK-23328][PYTHON] Disallow default value None in na....

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/20499
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/87205/
    Test FAILed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #20499: [SPARK-23328][PYTHON] Disallow default value None in na....

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/20499
  
    **[Test build #87051 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87051/testReport)** for PR 20499 at commit [`5c18594`](https://github.com/apache/spark/commit/5c185941312d5379761d7600127c735eb5b27084).


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #20499: [SPARK-23328][PYTHON] Disallow default value None in na....

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/20499
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/87196/
    Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #20499: [SPARK-23328][PYTHON] Disallow default value None in na....

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/20499
  
    **[Test build #87206 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87206/testReport)** for PR 20499 at commit [`885b4d0`](https://github.com/apache/spark/commit/885b4d00af53dfd0148c431fdacce9a2789f32a2).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #20499: [SPARK-23328][PYTHON] Disallow default value None in na....

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/20499
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/87042/
    Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #20499: [SPARK-23328][PYTHON] Disallow default value None in na....

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/20499
  
    Merged build finished. Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #20499: [SPARK-23328][PYTHON] Disallow default value None in na....

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/20499
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/591/
    Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #20499: [SPARK-23328][PYTHON] Disallow default value None in na....

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/20499
  
    **[Test build #87205 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87205/testReport)** for PR 20499 at commit [`885b4d0`](https://github.com/apache/spark/commit/885b4d00af53dfd0148c431fdacce9a2789f32a2).


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #20499: [SPARK-23328][PYTHON] Disallow default value None...

Posted by viirya <gi...@git.apache.org>.
Github user viirya commented on a diff in the pull request:

    https://github.com/apache/spark/pull/20499#discussion_r165883051
  
    --- Diff: python/pyspark/sql/dataframe.py ---
    @@ -1557,6 +1557,9 @@ def replace(self, to_replace, value=None, subset=None):
                 For example, if `value` is a string, and subset contains a non-string column,
                 then the non-string column is simply ignored.
     
    +        .. note:: `value` can only be omitted when `to_replace` is a dictionary. Otherwise,
    +            it is required.
    --- End diff --
    
    Shall we just describe this in `value`'s param doc?


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #20499: [SPARK-23328][PYTHON] Disallow default value None...

Posted by viirya <gi...@git.apache.org>.
Github user viirya commented on a diff in the pull request:

    https://github.com/apache/spark/pull/20499#discussion_r165828519
  
    --- Diff: python/pyspark/sql/dataframe.py ---
    @@ -1587,6 +1600,51 @@ def replace(self, to_replace, value=None, subset=None):
             |null|  null|null|
             +----+------+----+
             """
    +
    +        # It deals with a problem when 'value' is set to None and 'to_replace' is a dictionary.
    +        is_more_than_two = len(args) + len(kwargs) > 2
    +        if is_more_than_two:
    +            raise TypeError(
    +                "Arguments are expected to be to_replace, value and subset; however, got "
    +                "[%s]" % ", ".join(map(str, [to_replace] + list(args) + list(kwargs.values()))))
    +
    +        is_unexpected_kwargs = \
    +            len(kwargs) == 2 and ("value" not in kwargs or "subset" not in kwargs)
    +        if is_unexpected_kwargs:
    +            raise TypeError(
    +                "Arguments are expected to be to_replace, value and subset; however, got "
    +                "unexpected keyword arguments: [%s]" % ", ".join(
    +                    filter(lambda a: a not in ["value", "subset"], kwargs.keys())))
    +
    +        if "value" in kwargs and "subset" in kwargs:
    +            # replace(..., value=..., subset=...)
    +            # replace(..., subset=..., value=...)
    +            value = kwargs["value"]
    +            subset = kwargs["subset"]
    +        elif "value" in kwargs:
    +            # replace(..., ..., value=...)
    +            value = kwargs["value"]
    +            subset = args[0] if len(args) == 1 else None
    +        elif "subset" in kwargs:
    +            # replace(..., ..., subset=...)
    +            if len(args) == 1:
    +                value = args[0]
    +            elif isinstance(to_replace, dict):
    +                value = None  # When to_replace is a dictionary, value can be omitted.
    +            else:
    +                raise TypeError("value is required when to_replace is not a dictionary.")
    --- End diff --
    
    Btw, can't we just remove `value is not None` in above to let `None` disallowed when `to_replace` is not a dict?


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #20499: [SPARK-23328][PYTHON] Disallow default value None in na....

Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on the issue:

    https://github.com/apache/spark/pull/20499
  
    retest this please


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #20499: [SPARK-23328][PYTHON] Disallow default value None...

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on a diff in the pull request:

    https://github.com/apache/spark/pull/20499#discussion_r166278935
  
    --- Diff: python/pyspark/sql/dataframe.py ---
    @@ -1532,7 +1532,7 @@ def fillna(self, value, subset=None):
                 return DataFrame(self._jdf.na().fill(value, self._jseq(subset)), self.sql_ctx)
     
         @since(1.4)
    -    def replace(self, to_replace, value=None, subset=None):
    +    def replace(self, to_replace, *args, **kwargs):
    --- End diff --
    
    what's the expectation? if `to_replace` is a dict, `value` should be ignored?


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #20499: [SPARK-19454][PYTHON][FOLLOWUP] Disallow default value N...

Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on the issue:

    https://github.com/apache/spark/pull/20499
  
    Thanks! 
    
    Also cc @ueshin @cloud-fan 


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #20499: [SPARK-23328][PYTHON] Disallow default value None in na....

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/20499
  
    Merged build finished. Test FAILed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #20499: [SPARK-23328][PYTHON] Disallow default value None in na....

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/20499
  
    Merged build finished. Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #20499: [SPARK-23328][PYTHON] Disallow default value None in na....

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/20499
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/87068/
    Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #20499: [SPARK-23328][PYTHON] Disallow default value None in na....

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/20499
  
    Merged build finished. Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #20499: [SPARK-23328][PYTHON] Disallow default value None in na....

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/20499
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/576/
    Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org