Posted to reviews@spark.apache.org by zero323 <gi...@git.apache.org> on 2017/02/03 20:13:44 UTC

[GitHub] spark pull request #16793: [SPARK-19454][PYTHON][SQL] DataFrame.replace impr...

GitHub user zero323 opened a pull request:

    https://github.com/apache/spark/pull/16793

    [SPARK-19454][PYTHON][SQL] DataFrame.replace improvements

    ## What changes were proposed in this pull request?
    
    - Allows skipping `value` argument if `to_replace` is a `dict`:
    	```python
    	In [1]: df = sc.parallelize([("Alice", 1, 3.0)]).toDF()
    	
    	In [2]: df.replace({"Alice": "Bob"}).show()
    	+---+---+---+
    	| _1| _2| _3|
    	+---+---+---+
    	|Bob|  1|3.0|
    	+---+---+---+
    	```
    - Adds validation step to ensure homogeneous values / replacements.
    - Simplifies internal control flow.
    - Improves unit test coverage.
    
    ## How was this patch tested?
    
    Existing unit tests, additional unit tests, manual testing.
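
The first two items in the summary above (a dict `to_replace` that makes `value` optional, and a check for homogeneous values and replacements) can be illustrated with a small pure-Python sketch. This is not the PySpark implementation: `normalize_replacement` and `kind` are hypothetical names, Python 3 `str` stands in for `basestring`, and the exact error messages are assumptions.

```python
def normalize_replacement(to_replace, value=None):
    """Return parallel lists (keys, values) describing the replacement.

    Hypothetical helper for illustration only; not part of PySpark.
    """
    if isinstance(to_replace, dict):
        # The mapping already carries the replacements, so `value` is ignored.
        keys = list(to_replace.keys())
        values = list(to_replace.values())
    else:
        keys = list(to_replace) if isinstance(to_replace, (list, tuple)) else [to_replace]
        # A scalar value is broadcast to every key.
        values = list(value) if isinstance(value, (list, tuple)) else [value] * len(keys)
        if len(keys) != len(values):
            raise ValueError("to_replace and value lists should be of the same length")

    def kind(x):
        # bool is checked first because bool is a subclass of int in Python.
        if isinstance(x, bool):
            return "bool"
        if isinstance(x, (int, float)):
            return "numeric"
        if isinstance(x, str):
            return "string"
        raise ValueError("Unsupported type: {0}".format(type(x)))

    # Keys and replacements must all belong to the same kind.
    kinds = {kind(x) for x in keys + values}
    if len(kinds) > 1:
        raise ValueError("Mixed type replacements are not supported")
    return keys, values
```

With this sketch, `normalize_replacement({"Alice": "Bob"})` yields one key list and one value list without any `value` argument, while a mapping such as `{"a": 1}` is rejected because it mixes strings and numbers.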

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/zero323/spark SPARK-19454

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/16793.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #16793
    
----
commit 9045a3587b8ad27df893182bff67800118c661d3
Author: zero323 <ze...@users.noreply.github.com>
Date:   2017-02-02T00:16:05Z

    Ignore value in DataFrame.replace if to_replace is dict

commit e13351f2635dd58c575c665d1626d4d3e495e91b
Author: zero323 <ze...@users.noreply.github.com>
Date:   2017-02-03T10:26:53Z

    Test if failure conditions are recognized

commit 5550ba7738cd44a4f50c2e22ffd2bcc5f8ececab
Author: zero323 <ze...@users.noreply.github.com>
Date:   2017-02-03T10:35:15Z

    Add tests for DataFrame.replace failures

commit 557c4fd8713324f3684a3dbdf4ef83c41278850a
Author: zero323 <ze...@users.noreply.github.com>
Date:   2017-02-03T10:42:24Z

    Group preconditions in DataFrame.replace

commit 609563b9b353d7a8e520dc13f83e3741c3a9a3f5
Author: zero323 <ze...@users.noreply.github.com>
Date:   2017-02-03T10:53:21Z

    Add tests for DataFrame.with tuple and multi-element sequence

commit bc1ed34d7695596c67199ae49b8d2e3339059b3d
Author: zero323 <ze...@users.noreply.github.com>
Date:   2017-02-03T10:57:24Z

    Remove obsolete casts to tuple

commit 68826073151e6bea9c25d67bdd3f49881940fcb9
Author: zero323 <ze...@users.noreply.github.com>
Date:   2017-02-03T10:58:26Z

    Simplify overall workflow

commit f6d9a5cd254ef81fc12f8c4d06865586007f3e98
Author: zero323 <ze...@users.noreply.github.com>
Date:   2017-02-03T11:03:57Z

    Reorder pre-conditions and extend error messages

commit de6167696645eecee216ac09cd33d89e62a7fff3
Author: zero323 <ze...@users.noreply.github.com>
Date:   2017-02-03T11:13:59Z

    Issue a warning when to_replace is dict but the value has been provided

commit 86db30b19d8986947d76cf578bb0f8fba9ff06ed
Author: zero323 <ze...@users.noreply.github.com>
Date:   2017-02-03T11:32:28Z

    Raise ValuError if received mixed types

commit 904db242be9dde4b05b46bfdc58758b85af28c90
Author: zero323 <ze...@users.noreply.github.com>
Date:   2017-02-03T11:35:15Z

    Explain purpose of each section

----


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #16793: [SPARK-19454][PYTHON][SQL] DataFrame.replace impr...

Posted by holdenk <gi...@git.apache.org>.
Github user holdenk commented on a diff in the pull request:

    https://github.com/apache/spark/pull/16793#discussion_r103089544
  
    --- Diff: python/pyspark/sql/dataframe.py ---
    @@ -1307,43 +1307,66 @@ def replace(self, to_replace, value, subset=None):
             |null|  null|null|
             +----+------+----+
             """
    -        if not isinstance(to_replace, (float, int, long, basestring, list, tuple, dict)):
    +        # Helper functions
    +        def all_of(types):
    --- End diff --
    
    Maybe give this a docstring to clarify what `all_of` does. Even though it's not user-facing, it's better to have a docstring than not.




[GitHub] spark issue #16793: [SPARK-19454][PYTHON][SQL] DataFrame.replace improvement...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/16793
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/72319/
    Test PASSed.




[GitHub] spark issue #16793: [SPARK-19454][PYTHON][SQL] DataFrame.replace improvement...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/16793
  
    **[Test build #72392 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/72392/testReport)** for PR 16793 at commit [`f61b782`](https://github.com/apache/spark/commit/f61b78264693dd626ae379d950bbc051153bca0e).




[GitHub] spark issue #16793: [SPARK-19454][PYTHON][SQL] DataFrame.replace improvement...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/16793
  
    **[Test build #72318 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/72318/testReport)** for PR 16793 at commit [`904db24`](https://github.com/apache/spark/commit/904db242be9dde4b05b46bfdc58758b85af28c90).




[GitHub] spark issue #16793: [SPARK-19454][PYTHON][SQL] DataFrame.replace improvement...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/16793
  
    **[Test build #73524 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/73524/testReport)** for PR 16793 at commit [`17e6820`](https://github.com/apache/spark/commit/17e68205ef639893902c65c0394c8aa4406191be).




[GitHub] spark issue #16793: [SPARK-19454][PYTHON][SQL] DataFrame.replace improvement...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/16793
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/74153/
    Test FAILed.




[GitHub] spark issue #16793: [SPARK-19454][PYTHON][SQL] DataFrame.replace improvement...

Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on the issue:

    https://github.com/apache/spark/pull/16793
  
    Otherwise, please give me a few days. Let me give it a shot with `def replace(self, to_replace, *args, **kwargs):` and see if I can resolve it, if we are okay with that, although I guess pydoc will show a less pretty doc.
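
A rough sketch of what the `*args, **kwargs` signature mentioned above might involve, assuming the goal is to tell an omitted `value` apart from an explicitly passed one (the message itself does not say). The helper name `parse_replace_args` and the sentinel `_NO_VALUE` are hypothetical and not taken from the pull request.

```python
_NO_VALUE = object()  # hypothetical sentinel meaning "value was not passed"


def parse_replace_args(to_replace, *args, **kwargs):
    """Return (to_replace, value, subset) from a flexible call signature."""
    if len(args) > 2:
        raise TypeError("replace() takes at most 3 positional arguments")
    # Positional arguments win; otherwise fall back to keywords.
    value = args[0] if len(args) > 0 else kwargs.pop("value", _NO_VALUE)
    subset = args[1] if len(args) > 1 else kwargs.pop("subset", None)
    if kwargs:
        raise TypeError("unexpected keyword arguments: {0}".format(sorted(kwargs)))
    if value is _NO_VALUE:
        # Only a dict carries its own replacements.
        if not isinstance(to_replace, dict):
            raise TypeError("value is required when to_replace is not a dict")
        value = None
    return to_replace, value, subset
```

The trade-off raised in the comment still applies: a signature built on `*args, **kwargs` hides the parameter names from pydoc, so the docstring has to spell them out.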




[GitHub] spark pull request #16793: [SPARK-19454][PYTHON][SQL] DataFrame.replace impr...

Posted by holdenk <gi...@git.apache.org>.
Github user holdenk commented on a diff in the pull request:

    https://github.com/apache/spark/pull/16793#discussion_r103533408
  
    --- Diff: python/pyspark/sql/dataframe.py ---
    @@ -1307,43 +1309,75 @@ def replace(self, to_replace, value, subset=None):
             |null|  null|null|
             +----+------+----+
             """
    -        if not isinstance(to_replace, (float, int, long, basestring, list, tuple, dict)):
    +        # Helper functions
    +        def all_of(types):
    +            """Given a type or tuple of types
    --- End diff --
    
    The formatting of this docstring seems odd here. Also, I'd clarify that `all_of` returns a function that you can use for the check, rather than doing the check itself.
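
One way the docstring could state this explicitly is sketched below. It is written for Python 3, so `str` stands in for `basestring` and `long` is dropped; the wording of the docstring is a suggestion, not the text that was merged.

```python
def all_of(types):
    """Return a predicate that tests whether every element of a sequence
    is an instance of ``types`` (a single type or a tuple of types).

    ``all_of`` performs no check itself; it only builds the function
    that does.
    """
    def all_of_(xs):
        return all(isinstance(x, types) for x in xs)
    return all_of_


all_of_str = all_of(str)
all_of_numeric = all_of((float, int))
```

Because `bool` is a subclass of `int` in Python, `all_of_numeric` also accepts booleans, so the order in which such predicates are applied matters.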




[GitHub] spark issue #16793: [SPARK-19454][PYTHON][SQL] DataFrame.replace improvement...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/16793
  
    Merged build finished. Test PASSed.




[GitHub] spark issue #16793: [SPARK-19454][PYTHON][SQL] DataFrame.replace improvement...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/16793
  
    **[Test build #74146 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/74146/testReport)** for PR 16793 at commit [`03303df`](https://github.com/apache/spark/commit/03303dfba528f78f7c9118e8a98cca49371993f7).
     * This patch **fails PySpark unit tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.




[GitHub] spark pull request #16793: [SPARK-19454][PYTHON][SQL] DataFrame.replace impr...

Posted by zero323 <gi...@git.apache.org>.
Github user zero323 commented on a diff in the pull request:

    https://github.com/apache/spark/pull/16793#discussion_r100704329
  
    --- Diff: python/pyspark/sql/tests.py ---
    @@ -1591,6 +1591,67 @@ def test_replace(self):
             self.assertEqual(row.age, 10)
             self.assertEqual(row.height, None)
     
    +        # replace with lists
    +        row = self.spark.createDataFrame(
    +            [(u'Alice', 10, 80.1)], schema).replace([u'Alice'], [u'Ann']).first()
    +        self.assertTupleEqual(row, (u'Ann', 10, 80.1))
    +
    +        # replace with dict
    +        row = self.spark.createDataFrame(
    +            [(u'Alice', 10, 80.1)], schema).replace({10: 11}).first()
    +        self.assertTupleEqual(row, (u'Alice', 11, 80.1))
    --- End diff --
    
    These tests are mostly a side effect of discussions related to https://github.com/apache/spark/pull/16792. Right now test coverage is low, and we depend on certain behavior of Py4J and the Scala counterpart. I also wanted to be sure that all the expected types are still accepted after the changes I've made.




[GitHub] spark issue #16793: [SPARK-19454][PYTHON][SQL] DataFrame.replace improvement...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/16793
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/74163/
    Test PASSed.




[GitHub] spark pull request #16793: [SPARK-19454][PYTHON][SQL] DataFrame.replace impr...

Posted by nchammas <gi...@git.apache.org>.
Github user nchammas commented on a diff in the pull request:

    https://github.com/apache/spark/pull/16793#discussion_r100701818
  
    --- Diff: python/pyspark/sql/tests.py ---
    @@ -1591,6 +1591,67 @@ def test_replace(self):
             self.assertEqual(row.age, 10)
             self.assertEqual(row.height, None)
     
    +        # replace with lists
    +        row = self.spark.createDataFrame(
    +            [(u'Alice', 10, 80.1)], schema).replace([u'Alice'], [u'Ann']).first()
    +        self.assertTupleEqual(row, (u'Ann', 10, 80.1))
    +
    +        # replace with dict
    +        row = self.spark.createDataFrame(
    +            [(u'Alice', 10, 80.1)], schema).replace({10: 11}).first()
    +        self.assertTupleEqual(row, (u'Alice', 11, 80.1))
    --- End diff --
    
    This is the only test of "new" functionality (excluding error cases), correct?




[GitHub] spark issue #16793: [SPARK-19454][PYTHON][SQL] DataFrame.replace improvement...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/16793
  
    **[Test build #74163 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/74163/testReport)** for PR 16793 at commit [`03303df`](https://github.com/apache/spark/commit/03303dfba528f78f7c9118e8a98cca49371993f7).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.




[GitHub] spark issue #16793: [SPARK-19454][PYTHON][SQL] DataFrame.replace improvement...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/16793
  
    **[Test build #72319 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/72319/testReport)** for PR 16793 at commit [`a3a3127`](https://github.com/apache/spark/commit/a3a3127e49aa96e36ac9fa52ab2398829fe84115).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.




[GitHub] spark issue #16793: [SPARK-19454][PYTHON][SQL] DataFrame.replace improvement...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/16793
  
    **[Test build #72390 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/72390/testReport)** for PR 16793 at commit [`a7b6dba`](https://github.com/apache/spark/commit/a7b6dba5ce2268e2dddff2b3267961ad4441f43a).




[GitHub] spark issue #16793: [SPARK-19454][PYTHON][SQL] DataFrame.replace improvement...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/16793
  
    **[Test build #72389 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/72389/testReport)** for PR 16793 at commit [`c06b97c`](https://github.com/apache/spark/commit/c06b97c84ad62b02c066a4df50e3a720d4868cba).




[GitHub] spark issue #16793: [SPARK-19454][PYTHON][SQL] DataFrame.replace improvement...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/16793
  
    **[Test build #72393 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/72393/testReport)** for PR 16793 at commit [`a02e4ff`](https://github.com/apache/spark/commit/a02e4ff65e0ed4d785dbadbb2bbabde5b5fb3f91).




[GitHub] spark issue #16793: [SPARK-19454][PYTHON][SQL] DataFrame.replace improvement...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/16793
  
    Merged build finished. Test PASSed.




[GitHub] spark issue #16793: [SPARK-19454][PYTHON][SQL] DataFrame.replace improvement...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/16793
  
    **[Test build #73517 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/73517/testReport)** for PR 16793 at commit [`e014867`](https://github.com/apache/spark/commit/e014867271099b8450369fd591fd765c530b083d).
     * This patch **fails PySpark unit tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.




[GitHub] spark issue #16793: [SPARK-19454][PYTHON][SQL] DataFrame.replace improvement...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/16793
  
    **[Test build #72389 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/72389/testReport)** for PR 16793 at commit [`c06b97c`](https://github.com/apache/spark/commit/c06b97c84ad62b02c066a4df50e3a720d4868cba).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.




[GitHub] spark issue #16793: [SPARK-19454][PYTHON][SQL] DataFrame.replace improvement...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/16793
  
    **[Test build #72319 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/72319/testReport)** for PR 16793 at commit [`a3a3127`](https://github.com/apache/spark/commit/a3a3127e49aa96e36ac9fa52ab2398829fe84115).




[GitHub] spark issue #16793: [SPARK-19454][PYTHON][SQL] DataFrame.replace improvement...

Posted by holdenk <gi...@git.apache.org>.
Github user holdenk commented on the issue:

    https://github.com/apache/spark/pull/16793
  
    Merged to master




[GitHub] spark issue #16793: [SPARK-19454][PYTHON][SQL] DataFrame.replace improvement...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/16793
  
    **[Test build #74153 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/74153/testReport)** for PR 16793 at commit [`03303df`](https://github.com/apache/spark/commit/03303dfba528f78f7c9118e8a98cca49371993f7).




[GitHub] spark issue #16793: [SPARK-19454][PYTHON][SQL] DataFrame.replace improvement...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/16793
  
    Merged build finished. Test FAILed.




[GitHub] spark pull request #16793: [SPARK-19454][PYTHON][SQL] DataFrame.replace impr...

Posted by holdenk <gi...@git.apache.org>.
Github user holdenk commented on a diff in the pull request:

    https://github.com/apache/spark/pull/16793#discussion_r103089704
  
    --- Diff: python/pyspark/sql/tests.py ---
    @@ -1591,6 +1591,67 @@ def test_replace(self):
             self.assertEqual(row.age, 10)
             self.assertEqual(row.height, None)
     
    +        # replace with lists
    +        row = self.spark.createDataFrame(
    +            [(u'Alice', 10, 80.1)], schema).replace([u'Alice'], [u'Ann']).first()
    +        self.assertTupleEqual(row, (u'Ann', 10, 80.1))
    +
    +        # replace with dict
    +        row = self.spark.createDataFrame(
    +            [(u'Alice', 10, 80.1)], schema).replace({10: 11}).first()
    +        self.assertTupleEqual(row, (u'Alice', 11, 80.1))
    --- End diff --
    
    I think (and I could be wrong) that @nchammas was suggesting it might make sense to have some more tests with dict, not that the other additional new tests are bad.
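
As a rough idea of what extra dict cases could cover, here is a table-driven sketch. It runs without Spark: `replace_in_row` is a hypothetical plain-Python stand-in that substitutes by equality, so it does not reproduce the column-type semantics of `DataFrame.replace`; only the sample row is taken from the diff.

```python
def replace_in_row(row, mapping):
    """Hypothetical stand-in for DataFrame.replace applied to a single row."""
    return tuple(mapping.get(x, x) for x in row)


row = (u'Alice', 10, 80.1)

cases = [
    ({10: 11}, (u'Alice', 11, 80.1)),              # numeric key, as in the diff
    ({u'Alice': u'Ann'}, (u'Ann', 10, 80.1)),      # string key
    ({10: 11, 80.1: 90.5}, (u'Alice', 11, 90.5)),  # several keys at once
    ({u'Bob': u'Ann'}, row),                       # no match leaves the row unchanged
]

for mapping, expected in cases:
    assert replace_in_row(row, mapping) == expected
```

In the real test suite each case would build a DataFrame and call `.replace(mapping).first()`, as the existing dict test does.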




[GitHub] spark pull request #16793: [SPARK-19454][PYTHON][SQL] DataFrame.replace impr...

Posted by zero323 <gi...@git.apache.org>.
Github user zero323 commented on a diff in the pull request:

    https://github.com/apache/spark/pull/16793#discussion_r104810137
  
    --- Diff: python/pyspark/sql/dataframe.py ---
    @@ -1307,43 +1309,75 @@ def replace(self, to_replace, value, subset=None):
             |null|  null|null|
             +----+------+----+
             """
    -        if not isinstance(to_replace, (float, int, long, basestring, list, tuple, dict)):
    +        # Helper functions
    +        def all_of(types):
    +            """Given a type or tuple of types
    +            and sequence of xs check if each x
    +            is instance of type(s)
    +
    +            >>> all_of(bool)([True, False])
    +            True
    +            >>> all_of(basestring)(["a", 1])
    +            False
    +            """
    +            def all_of_(xs):
    +                return all(isinstance(x, types) for x in xs)
    +            return all_of_
    +
    +        all_of_bool = all_of(bool)
    +        all_of_str = all_of(basestring)
    +        all_of_numeric = all_of((float, int, long))
    +
    +        # Validate input types
    +        valid_types = (bool, float, int, long, basestring, list, tuple)
    +        if not isinstance(to_replace, valid_types + (dict, )):
                 raise ValueError(
    -                "to_replace should be a float, int, long, string, list, tuple, or dict")
    +                "to_replace should be a float, int, long, string, list, tuple, or dict. "
    +                "Got {0}".format(type(to_replace)))
     
    -        if not isinstance(value, (float, int, long, basestring, list, tuple)):
    -            raise ValueError("value should be a float, int, long, string, list, or tuple")
    +        if (not isinstance(value, valid_types) and
    +                not isinstance(to_replace, dict)):
    +            raise ValueError("If to_replace is not a dict, value should be "
    +                             "a float, int, long, string, list, or tuple. "
    +                             "Got {0}".format(type(value)))
    +
    +        if isinstance(to_replace, (list, tuple)) and isinstance(value, (list, tuple)):
    +            if len(to_replace) != len(value):
    +                raise ValueError("to_replace and value lists should be of the same length. "
    +                                 "Got {0} and {1}".format(len(to_replace), len(value)))
     
    -        rep_dict = dict()
    +        if not (subset is None or isinstance(subset, (list, tuple, basestring))):
    +            raise ValueError("subset should be a list or tuple of column names, "
    +                             "column name or None. Got {0}".format(type(subset)))
     
    +        # Reshape input arguments if necessary
             if isinstance(to_replace, (float, int, long, basestring)):
                 to_replace = [to_replace]
     
    -        if isinstance(to_replace, tuple):
    -            to_replace = list(to_replace)
    +        if isinstance(value, (float, int, long, basestring)):
    +            value = [value for _ in range(len(to_replace))]
     
    -        if isinstance(value, tuple):
    -            value = list(value)
    -
    -        if isinstance(to_replace, list) and isinstance(value, list):
    -            if len(to_replace) != len(value):
    -                raise ValueError("to_replace and value lists should be of the same length")
    -            rep_dict = dict(zip(to_replace, value))
    -        elif isinstance(to_replace, list) and isinstance(value, (float, int, long, basestring)):
    -            rep_dict = dict([(tr, value) for tr in to_replace])
    -        elif isinstance(to_replace, dict):
    +        if isinstance(to_replace, dict):
                 rep_dict = to_replace
    +            if value is not None:
    +                warnings.warn("to_replace is a dict, but value is not None. "
    --- End diff --
    
    Maybe not.
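
The validate-then-reshape flow in the diff above can be sketched as a standalone function. This is a simplified illustration under stated assumptions, not the code in the PR: `build_rep_dict` and `SCALAR_TYPES` are hypothetical names, Python 3 types stand in for `long` and `basestring`, the warning text is invented because the diff is cut off mid-message, and the length check runs after reshaping, which is slightly stricter than the diff.

```python
import warnings

# Python 3 stand-ins for (bool, float, int, long, basestring) in the diff.
SCALAR_TYPES = (bool, float, int, str)


def build_rep_dict(to_replace, value=None):
    """Hypothetical helper: normalise the arguments into one {old: new} dict."""
    if not isinstance(to_replace, SCALAR_TYPES + (list, tuple, dict)):
        raise ValueError(
            "to_replace should be a scalar, list, tuple, or dict. "
            "Got {0}".format(type(to_replace)))

    if isinstance(to_replace, dict):
        # The dict already carries the replacements, so value is redundant.
        if value is not None:
            # Assumed wording; the message in the diff is truncated.
            warnings.warn("value is ignored when to_replace is a dict")
        return dict(to_replace)

    if not isinstance(value, SCALAR_TYPES + (list, tuple)):
        raise ValueError(
            "If to_replace is not a dict, value should be a scalar, list, "
            "or tuple. Got {0}".format(type(value)))

    # Reshape scalars into lists so both arguments can be zipped together.
    if isinstance(to_replace, SCALAR_TYPES):
        to_replace = [to_replace]
    if isinstance(value, SCALAR_TYPES):
        value = [value] * len(to_replace)

    if len(to_replace) != len(value):
        raise ValueError(
            "to_replace and value lists should be of the same length. "
            "Got {0} and {1}".format(len(to_replace), len(value)))

    return dict(zip(to_replace, value))
```

For example, `build_rep_dict(["a", "b"], "c")` broadcasts the scalar to both keys, while `build_rep_dict({10: 11})` returns a copy of the dict unchanged.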





[GitHub] spark pull request #16793: [SPARK-19454][PYTHON][SQL] DataFrame.replace impr...

Posted by asfgit <gi...@git.apache.org>.
Github user asfgit closed the pull request at:

    https://github.com/apache/spark/pull/16793




[GitHub] spark issue #16793: [SPARK-19454][PYTHON][SQL] DataFrame.replace improvement...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/16793
  
    Merged build finished. Test PASSed.




[GitHub] spark issue #16793: [SPARK-19454][PYTHON][SQL] DataFrame.replace improvement...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/16793
  
    Merged build finished. Test FAILed.




[GitHub] spark issue #16793: [SPARK-19454][PYTHON][SQL] DataFrame.replace improvement...

Posted by zero323 <gi...@git.apache.org>.
Github user zero323 commented on the issue:

    https://github.com/apache/spark/pull/16793
  
    cc @holdenk 




[GitHub] spark issue #16793: [SPARK-19454][PYTHON][SQL] DataFrame.replace improvement...

Posted by zero323 <gi...@git.apache.org>.
Github user zero323 commented on the issue:

    https://github.com/apache/spark/pull/16793
  
    Jenkins retest this please (47b2f68a885b7a2fc593ac7a55cd19742016364d).




[GitHub] spark issue #16793: [SPARK-19454][PYTHON][SQL] DataFrame.replace improvement...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/16793
  
    Merged build finished. Test PASSed.




[GitHub] spark issue #16793: [SPARK-19454][PYTHON][SQL] DataFrame.replace improvement...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/16793
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/73517/
    Test FAILed.




[GitHub] spark issue #16793: [SPARK-19454][PYTHON][SQL] DataFrame.replace improvement...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/16793
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/72390/
    Test FAILed.




[GitHub] spark issue #16793: [SPARK-19454][PYTHON][SQL] DataFrame.replace improvement...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/16793
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/72389/
    Test PASSed.




[GitHub] spark issue #16793: [SPARK-19454][PYTHON][SQL] DataFrame.replace improvement...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/16793
  
    **[Test build #73524 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/73524/testReport)** for PR 16793 at commit [`17e6820`](https://github.com/apache/spark/commit/17e68205ef639893902c65c0394c8aa4406191be).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.




[GitHub] spark issue #16793: [SPARK-19454][PYTHON][SQL] DataFrame.replace improvement...

Posted by rxin <gi...@git.apache.org>.
Github user rxin commented on the issue:

    https://github.com/apache/spark/pull/16793
  
    Also, the implementation doesn't match what was proposed in https://issues.apache.org/jira/browse/SPARK-19454
    
    Having a null value as the default in a function called `replace` is too risky and error-prone.
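
One way to read this concern: if `value` defaults to `None`, a call that forgets the argument may become hard to tell apart from a call that explicitly asks for null. A common Python idiom for keeping the two apart is a private sentinel default. The sketch below is an illustration of that idiom only, not what this PR implements; `_NO_VALUE` and `resolve_replacements` are hypothetical names.

```python
# Hypothetical sentinel meaning "the caller did not pass value at all".
_NO_VALUE = object()


def resolve_replacements(to_replace, value=_NO_VALUE):
    """Return an {old: new} dict, rejecting calls that omit value by accident."""
    if isinstance(to_replace, dict):
        # The dict carries its own replacements; value is not needed here.
        return dict(to_replace)
    if value is _NO_VALUE:
        # Omitted value is an error rather than a silent "replace with null".
        raise TypeError("value is required unless to_replace is a dict")
    # An explicit None is still allowed and is kept distinct from "omitted".
    return {to_replace: value}
```

With this shape, `resolve_replacements({"Alice": "Bob"})` works without a second argument, `resolve_replacements("Alice", None)` is an explicit request for null, and `resolve_replacements("Alice")` raises instead of guessing.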





[GitHub] spark issue #16793: [SPARK-19454][PYTHON][SQL] DataFrame.replace improvement...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/16793
  
    **[Test build #74163 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/74163/testReport)** for PR 16793 at commit [`03303df`](https://github.com/apache/spark/commit/03303dfba528f78f7c9118e8a98cca49371993f7).




[GitHub] spark issue #16793: [SPARK-19454][PYTHON][SQL] DataFrame.replace improvement...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/16793
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/72392/
    Test FAILed.




[GitHub] spark issue #16793: [SPARK-19454][PYTHON][SQL] DataFrame.replace improvement...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/16793
  
    **[Test build #72318 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/72318/testReport)** for PR 16793 at commit [`904db24`](https://github.com/apache/spark/commit/904db242be9dde4b05b46bfdc58758b85af28c90).
     * This patch **fails Python style tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.




[GitHub] spark issue #16793: [SPARK-19454][PYTHON][SQL] DataFrame.replace improvement...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/16793
  
    Merged build finished. Test FAILed.




[GitHub] spark issue #16793: [SPARK-19454][PYTHON][SQL] DataFrame.replace improvement...

Posted by holdenk <gi...@git.apache.org>.
Github user holdenk commented on the issue:

    https://github.com/apache/spark/pull/16793
  
    LGTM




[GitHub] spark issue #16793: [SPARK-19454][PYTHON][SQL] DataFrame.replace improvement...

Posted by zero323 <gi...@git.apache.org>.
Github user zero323 commented on the issue:

    https://github.com/apache/spark/pull/16793
  
    Jenkins retest this please.




[GitHub] spark issue #16793: [SPARK-19454][PYTHON][SQL] DataFrame.replace improvement...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/16793
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/73524/
    Test PASSed.




[GitHub] spark issue #16793: [SPARK-19454][PYTHON][SQL] DataFrame.replace improvement...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/16793
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/74146/
    Test FAILed.




[GitHub] spark issue #16793: [SPARK-19454][PYTHON][SQL] DataFrame.replace improvement...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/16793
  
    Merged build finished. Test FAILed.




[GitHub] spark issue #16793: [SPARK-19454][PYTHON][SQL] DataFrame.replace improvement...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/16793
  
    **[Test build #72390 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/72390/testReport)** for PR 16793 at commit [`a7b6dba`](https://github.com/apache/spark/commit/a7b6dba5ce2268e2dddff2b3267961ad4441f43a).
     * This patch **fails Python style tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.




[GitHub] spark issue #16793: [SPARK-19454][PYTHON][SQL] DataFrame.replace improvement...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/16793
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/72318/
    Test FAILed.




[GitHub] spark issue #16793: [SPARK-19454][PYTHON][SQL] DataFrame.replace improvement...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/16793
  
    **[Test build #73517 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/73517/testReport)** for PR 16793 at commit [`e014867`](https://github.com/apache/spark/commit/e014867271099b8450369fd591fd765c530b083d).




[GitHub] spark issue #16793: [SPARK-19454][PYTHON][SQL] DataFrame.replace improvement...

Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on the issue:

    https://github.com/apache/spark/pull/16793
  
    I think the actual root cause is that we allowed a dictionary for `to_replace` in the first place.
    
    So, would you prefer to keep the following signature?
    
    ```python
    def replace(self, to_replace, value, subset=None):
        ...
    ```
    
    But in this case, we would have to call it as below if `to_replace` is a dictionary.
    
    ```python
    df.replace({"Alice": "Bob"}, 1).show()
    ```





[GitHub] spark issue #16793: [SPARK-19454][PYTHON][SQL] DataFrame.replace improvement...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/16793
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/72393/
    Test PASSed.




[GitHub] spark pull request #16793: [SPARK-19454][PYTHON][SQL] DataFrame.replace impr...

Posted by holdenk <gi...@git.apache.org>.
Github user holdenk commented on a diff in the pull request:

    https://github.com/apache/spark/pull/16793#discussion_r104452225
  
    --- Diff: python/pyspark/sql/dataframe.py ---
    @@ -1307,43 +1309,75 @@ def replace(self, to_replace, value, subset=None):
             |null|  null|null|
             +----+------+----+
             """
    -        if not isinstance(to_replace, (float, int, long, basestring, list, tuple, dict)):
    +        # Helper functions
    +        def all_of(types):
    +            """Given a type or tuple of types
    +            and sequence of xs check if each x
    +            is instance of type(s)
    +
    +            >>> all_of(bool)([True, False])
    +            True
    +            >>> all_of(basestring)(["a", 1])
    +            False
    +            """
    +            def all_of_(xs):
    +                return all(isinstance(x, types) for x in xs)
    +            return all_of_
    +
    +        all_of_bool = all_of(bool)
    +        all_of_str = all_of(basestring)
    +        all_of_numeric = all_of((float, int, long))
    +
    +        # Validate input types
    +        valid_types = (bool, float, int, long, basestring, list, tuple)
    +        if not isinstance(to_replace, valid_types + (dict, )):
                 raise ValueError(
    -                "to_replace should be a float, int, long, string, list, tuple, or dict")
    +                "to_replace should be a float, int, long, string, list, tuple, or dict. "
    +                "Got {0}".format(type(to_replace)))
     
    -        if not isinstance(value, (float, int, long, basestring, list, tuple)):
    -            raise ValueError("value should be a float, int, long, string, list, or tuple")
    +        if (not isinstance(value, valid_types) and
    --- End diff --
    
    This seems like a weird split.




[GitHub] spark issue #16793: [SPARK-19454][PYTHON][SQL] DataFrame.replace improvement...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/16793
  
    Merged build finished. Test PASSed.




[GitHub] spark issue #16793: [SPARK-19454][PYTHON][SQL] DataFrame.replace improvement...

Posted by zero323 <gi...@git.apache.org>.
Github user zero323 commented on the issue:

    https://github.com/apache/spark/pull/16793
  
    Jenkins, retest this please




[GitHub] spark issue #16793: [SPARK-19454][PYTHON][SQL] DataFrame.replace improvement...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/16793
  
    **[Test build #74153 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/74153/testReport)** for PR 16793 at commit [`03303df`](https://github.com/apache/spark/commit/03303dfba528f78f7c9118e8a98cca49371993f7).
     * This patch **fails PySpark unit tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.




[GitHub] spark issue #16793: [SPARK-19454][PYTHON][SQL] DataFrame.replace improvement...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/16793
  
    **[Test build #72393 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/72393/testReport)** for PR 16793 at commit [`a02e4ff`](https://github.com/apache/spark/commit/a02e4ff65e0ed4d785dbadbb2bbabde5b5fb3f91).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.




[GitHub] spark issue #16793: [SPARK-19454][PYTHON][SQL] DataFrame.replace improvement...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/16793
  
    Merged build finished. Test FAILed.




[GitHub] spark issue #16793: [SPARK-19454][PYTHON][SQL] DataFrame.replace improvement...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/16793
  
    Merged build finished. Test FAILed.




[GitHub] spark issue #16793: [SPARK-19454][PYTHON][SQL] DataFrame.replace improvement...

Posted by zero323 <gi...@git.apache.org>.
Github user zero323 commented on the issue:

    https://github.com/apache/spark/pull/16793
  
    Thanks @holdenk 




[GitHub] spark issue #16793: [SPARK-19454][PYTHON][SQL] DataFrame.replace improvement...

Posted by zero323 <gi...@git.apache.org>.
Github user zero323 commented on the issue:

    https://github.com/apache/spark/pull/16793
  
    @holdenk Do you think it is realistic to see this merged into 2.2?




[GitHub] spark issue #16793: [SPARK-19454][PYTHON][SQL] DataFrame.replace improvement...

Posted by zero323 <gi...@git.apache.org>.
Github user zero323 commented on the issue:

    https://github.com/apache/spark/pull/16793
  
    > I think (and I could be wrong) that @nchammas was suggesting it might make sense to have some more tests with dict, not that the other additional new tests are bad.
    
    I am like Python - you have to be explicit :) I'll try to figure out some useful tests and get back to you. Thanks for the feedback @holdenk, @nchammas




[GitHub] spark issue #16793: [SPARK-19454][PYTHON][SQL] DataFrame.replace improvement...

Posted by holdenk <gi...@git.apache.org>.
Github user holdenk commented on the issue:

    https://github.com/apache/spark/pull/16793
  
    Let me try to take a look tonight. At a quick glance there still seem to be some small formatting issues, but I feel like this should be close.




[GitHub] spark pull request #16793: [SPARK-19454][PYTHON][SQL] DataFrame.replace impr...

Posted by holdenk <gi...@git.apache.org>.
Github user holdenk commented on a diff in the pull request:

    https://github.com/apache/spark/pull/16793#discussion_r104451583
  
    --- Diff: python/pyspark/sql/dataframe.py ---
    @@ -1307,43 +1309,75 @@ def replace(self, to_replace, value, subset=None):
             |null|  null|null|
             +----+------+----+
             """
    -        if not isinstance(to_replace, (float, int, long, basestring, list, tuple, dict)):
    +        # Helper functions
    +        def all_of(types):
    +            """Given a type or tuple of types
    +            and sequence of xs check if each x
    +            is instance of type(s)
    +
    +            >>> all_of(bool)([True, False])
    +            True
    +            >>> all_of(basestring)(["a", 1])
    +            False
    +            """
    +            def all_of_(xs):
    +                return all(isinstance(x, types) for x in xs)
    +            return all_of_
    +
    +        all_of_bool = all_of(bool)
    +        all_of_str = all_of(basestring)
    +        all_of_numeric = all_of((float, int, long))
    +
    +        # Validate input types
    +        valid_types = (bool, float, int, long, basestring, list, tuple)
    +        if not isinstance(to_replace, valid_types + (dict, )):
                 raise ValueError(
    -                "to_replace should be a float, int, long, string, list, tuple, or dict")
    +                "to_replace should be a float, int, long, string, list, tuple, or dict. "
    +                "Got {0}".format(type(to_replace)))
     
    -        if not isinstance(value, (float, int, long, basestring, list, tuple)):
    -            raise ValueError("value should be a float, int, long, string, list, or tuple")
    +        if (not isinstance(value, valid_types) and
    +                not isinstance(to_replace, dict)):
    +            raise ValueError("If to_replace is not a dict, value should be "
    +                             "a float, int, long, string, list, or tuple. "
    +                             "Got {0}".format(type(value)))
    +
    +        if isinstance(to_replace, (list, tuple)) and isinstance(value, (list, tuple)):
    +            if len(to_replace) != len(value):
    +                raise ValueError("to_replace and value lists should be of the same length. "
    +                                 "Got {0} and {1}".format(len(to_replace), len(value)))
     
    -        rep_dict = dict()
    +        if not (subset is None or isinstance(subset, (list, tuple, basestring))):
    +            raise ValueError("subset should be a list or tuple of column names, "
    +                             "column name or None. Got {0}".format(type(subset)))
     
    +        # Reshape input arguments if necessary
             if isinstance(to_replace, (float, int, long, basestring)):
                 to_replace = [to_replace]
     
    -        if isinstance(to_replace, tuple):
    -            to_replace = list(to_replace)
    +        if isinstance(value, (float, int, long, basestring)):
    +            value = [value for _ in range(len(to_replace))]
     
    -        if isinstance(value, tuple):
    -            value = list(value)
    -
    -        if isinstance(to_replace, list) and isinstance(value, list):
    -            if len(to_replace) != len(value):
    -                raise ValueError("to_replace and value lists should be of the same length")
    -            rep_dict = dict(zip(to_replace, value))
    -        elif isinstance(to_replace, list) and isinstance(value, (float, int, long, basestring)):
    -            rep_dict = dict([(tr, value) for tr in to_replace])
    -        elif isinstance(to_replace, dict):
    +        if isinstance(to_replace, dict):
                 rep_dict = to_replace
    +            if value is not None:
    +                warnings.warn("to_replace is a dict, but value is not None. "
    --- End diff --
    
    Does this warning string need to be split across two lines?
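
The `all_of` helper in the diff above targets Python 2 (`basestring`, `long`). A minimal Python 3 sketch of the same idea follows; substituting `str` for `basestring` and `(float, int)` for `(float, int, long)` is an assumption made here for illustration and is not part of the patch.

```python
def all_of(types):
    """Return a predicate that checks whether every item of a sequence
    is an instance of the given type or tuple of types."""
    def all_of_(xs):
        return all(isinstance(x, types) for x in xs)
    return all_of_


# Assumed Python 3 stand-ins for the type names used in the patch.
all_of_bool = all_of(bool)
all_of_str = all_of(str)
all_of_numeric = all_of((float, int))

print(all_of_bool([True, False]))  # every item is a bool
print(all_of_str(["a", 1]))        # 1 is not a str
print(all_of_numeric([1, 2.0]))    # ints and floats both count as numeric
```

Because `bool` is a subclass of `int` in Python, `all_of_numeric([True, 1])` also returns `True`, so a check built on these predicates may want to test for booleans before testing for numerics.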




[GitHub] spark issue #16793: [SPARK-19454][PYTHON][SQL] DataFrame.replace improvement...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/16793
  
    **[Test build #72392 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/72392/testReport)** for PR 16793 at commit [`f61b782`](https://github.com/apache/spark/commit/f61b78264693dd626ae379d950bbc051153bca0e).
     * This patch **fails Python style tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.




[GitHub] spark issue #16793: [SPARK-19454][PYTHON][SQL] DataFrame.replace improvement...

Posted by rxin <gi...@git.apache.org>.
Github user rxin commented on the issue:

    https://github.com/apache/spark/pull/16793
  
    Sorry, I object to this change. Why would we make null the default replacement value in a function called replace? That seems very counterintuitive and error-prone.
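
One way to read the objection: with `None` as the default, a caller who forgets `value` cannot be told apart from one who passed `None` on purpose. The sketch below shows a common Python idiom for separating the two cases, a private sentinel default. `_NO_VALUE` and `check_replace_args` are hypothetical names introduced for illustration; this is not what the patch under review does.

```python
_NO_VALUE = object()  # hypothetical sentinel meaning "argument was omitted"


def check_replace_args(to_replace, value=_NO_VALUE):
    """Return the effective (to_replace, value) pair.

    Raises TypeError when value is omitted and to_replace is not a dict.
    """
    if value is _NO_VALUE:
        if isinstance(to_replace, dict):
            # The dict already carries the replacements, so no value is needed.
            return to_replace, None
        raise TypeError(
            "value argument is required when to_replace is not a dict")
    # An explicitly passed value, including None, is returned unchanged.
    return to_replace, value
```

With this shape, `check_replace_args({"Alice": "Bob"})` is accepted, `check_replace_args("Alice")` raises, and an explicit `None` is still distinguishable from an omitted argument.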




[GitHub] spark pull request #16793: [SPARK-19454][PYTHON][SQL] DataFrame.replace impr...

Posted by holdenk <gi...@git.apache.org>.
Github user holdenk commented on a diff in the pull request:

    https://github.com/apache/spark/pull/16793#discussion_r103089653
  
    --- Diff: python/pyspark/sql/dataframe.py ---
    @@ -1307,43 +1307,66 @@ def replace(self, to_replace, value, subset=None):
             |null|  null|null|
             +----+------+----+
             """
    -        if not isinstance(to_replace, (float, int, long, basestring, list, tuple, dict)):
    +        # Helper functions
    +        def all_of(types):
    +            def all_of_(xs):
    +                return all(isinstance(x, types) for x in xs)
    +            return all_of_
    +
    +        all_of_bool = all_of(bool)
    +        all_of_str = all_of(basestring)
    +        all_of_numeric = all_of((float, int, long))
    +
    +        # Validate input types
    +        valid_types = (bool, float, int, long, basestring, list, tuple)
    +        if not isinstance(to_replace, valid_types + (dict, )):
                 raise ValueError(
    -                "to_replace should be a float, int, long, string, list, tuple, or dict")
    +                "to_replace should be a float, int, long, string, list, tuple, or dict. "
    +                "Got {0}".format(type(to_replace)))
     
    -        if not isinstance(value, (float, int, long, basestring, list, tuple)):
    -            raise ValueError("value should be a float, int, long, string, list, or tuple")
    +        if (not isinstance(value, valid_types) and
    +                not isinstance(to_replace, dict)):
    +            raise ValueError("If to_replace is not a dict, value should be "
    +                             "a float, int, long, string, list, or tuple. "
    +                             "Got {0}".format(type(value)))
    +
    +        if isinstance(to_replace, (list, tuple)) and isinstance(value, (list, tuple)):
    +            if len(to_replace) != len(value):
    +                raise ValueError("to_replace and value lists should be of the same length. "
    +                                 "Got {0} and {1}".format(len(to_replace), len(value)))
     
    -        rep_dict = dict()
    +        if not (subset is None or isinstance(subset, (list, tuple, basestring))):
    +            raise ValueError("subset should be a list or tuple of column names, "
    +                             "column name or None. Got {0}".format(type(subset)))
     
    +        # Reshape input arguments if necessary
             if isinstance(to_replace, (float, int, long, basestring)):
                 to_replace = [to_replace]
     
    -        if isinstance(to_replace, tuple):
    -            to_replace = list(to_replace)
    +        if isinstance(value, (float, int, long, basestring)):
    +            value = [value for _ in range(len(to_replace))]
     
    -        if isinstance(value, tuple):
    -            value = list(value)
    -
    -        if isinstance(to_replace, list) and isinstance(value, list):
    -            if len(to_replace) != len(value):
    -                raise ValueError("to_replace and value lists should be of the same length")
    -            rep_dict = dict(zip(to_replace, value))
    -        elif isinstance(to_replace, list) and isinstance(value, (float, int, long, basestring)):
    -            rep_dict = dict([(tr, value) for tr in to_replace])
    -        elif isinstance(to_replace, dict):
    +        if isinstance(to_replace, dict):
                 rep_dict = to_replace
    +            if value is not None:
    +                warnings.warn("to_replace is a dict, but value is not None. "
    +                              "value will be ignored.")
    +        else:
    +            rep_dict = dict(zip(to_replace, value))
     
    -        if subset is None:
    -            return DataFrame(self._jdf.na().replace('*', rep_dict), self.sql_ctx)
    -        elif isinstance(subset, basestring):
    +        if isinstance(subset, basestring):
                 subset = [subset]
     
    -        if not isinstance(subset, (list, tuple)):
    -            raise ValueError("subset should be a list or tuple of column names")
    +        # Check if we won't pass mixed type generics
    --- End diff --
    
    This reads a bit awkwardly. How about "Verify we were not passed in mixed type generics."?
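
The reshaping steps in the diff above (wrap scalars, broadcast a scalar `value`, then zip into a dict) can be sketched outside of `DataFrame` as follows. `build_replacement_map` is a hypothetical standalone function, and the Python 3 type names are assumed stand-ins for `basestring` and `long`; the real method goes on to pass the resulting dict to the JVM side.

```python
import warnings


def build_replacement_map(to_replace, value=None):
    """Normalize (to_replace, value) into a single {old: new} dict."""
    scalar_types = (bool, float, int, str)  # assumed Python 3 stand-ins

    if isinstance(to_replace, dict):
        if value is not None:
            warnings.warn("to_replace is a dict, but value is not None. "
                          "value will be ignored.")
        return dict(to_replace)

    # Wrap a scalar to_replace in a list.
    if isinstance(to_replace, scalar_types):
        to_replace = [to_replace]
    if not isinstance(to_replace, (list, tuple)):
        raise ValueError("to_replace should be a scalar, list, tuple, or dict. "
                         "Got {0}".format(type(to_replace)))

    # Broadcast a scalar value to the length of to_replace.
    if isinstance(value, scalar_types):
        value = [value for _ in range(len(to_replace))]
    if not isinstance(value, (list, tuple)):
        raise ValueError("If to_replace is not a dict, value should be "
                         "a scalar, list, or tuple. Got {0}".format(type(value)))

    if len(to_replace) != len(value):
        raise ValueError(
            "to_replace and value lists should be of the same length. "
            "Got {0} and {1}".format(len(to_replace), len(value)))

    return dict(zip(to_replace, value))
```

For example, `build_replacement_map(["a", "b"], "c")` yields `{"a": "c", "b": "c"}`, while `build_replacement_map([1, 2], [3])` raises `ValueError` because the lengths differ.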




[GitHub] spark issue #16793: [SPARK-19454][PYTHON][SQL] DataFrame.replace improvement...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/16793
  
    **[Test build #74146 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/74146/testReport)** for PR 16793 at commit [`03303df`](https://github.com/apache/spark/commit/03303dfba528f78f7c9118e8a98cca49371993f7).

