You are viewing a plain text version of this content. The canonical link for it is here.

Posted to reviews@spark.apache.org by rberenguel <gi...@git.apache.org> on 2017/05/31 15:59:55 UTC

[GitHub] spark pull request #18164: [Spark-19732][SQL][PYSPARK] fillna bools

GitHub user rberenguel opened a pull request:

    https://github.com/apache/spark/pull/18164

    [Spark-19732][SQL][PYSPARK] fillna bools

    ## What changes were proposed in this pull request?
    
    Allow fill/replace of NAs with booleans, both in Python and Scala
    
    ## How was this patch tested?
    
    Unit tests, doctests
    
    This PR is original work from me and I license this work to the Spark project

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/rberenguel/spark SPARK-19732-fillna-bools

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/18164.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #18164
    
----
commit 60b60e307acae467b40d6dd8880918134e367140
Author: Ruben Berenguel Montoro <ru...@mostlymaths.net>
Date:   2017-05-30T23:22:44Z

    SPARK-19732 fillna for booleans in place. Tomorrow testing for it and adding to PySpark

commit fb0904b21a048f40a35da72ff518822e8e3fb6c4
Author: Ruben Berenguel Montoro <ru...@mostlymaths.net>
Date:   2017-05-31T15:57:03Z

    SPARK-19732 Fillna for booleans (Spark and Scala)

----


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark issue #18164: [SPARK-19732][SQL][PYSPARK] Add fill functions for nulls...

Posted by ueshin <gi...@git.apache.org>.

Github user ueshin commented on the issue:

    https://github.com/apache/spark/pull/18164
  
    LGTM.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request #18164: [SPARK-19732][SQL][PYSPARK] fillna bools

Posted by rberenguel <gi...@git.apache.org>.

Github user rberenguel commented on a diff in the pull request:

    https://github.com/apache/spark/pull/18164#discussion_r119812666
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/DataFrameNaFunctions.scala ---
    @@ -196,6 +196,24 @@ final class DataFrameNaFunctions private[sql](df: DataFrame) {
       def fill(value: String, cols: Seq[String]): DataFrame = fillValue(value, cols)
     
       /**
    +   * Returns a new `DataFrame` that replaces null values in boolean columns with `value`.
    +   */
    +  def fill(value: Boolean): DataFrame = fill(value, df.columns)
    --- End diff --
    
    I wasn't sure about this, wanted to ask actually. Thanks!


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark issue #18164: [SPARK-19732][SQL][PYSPARK] fillna bools

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/18164
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/77674/
    Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark issue #18164: [SPARK-19732][SQL][PYSPARK] fillna bools

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/18164
  
    **[Test build #77675 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/77675/testReport)** for PR 18164 at commit [`21b4f67`](https://github.com/apache/spark/commit/21b4f677fe3c521005bbc9b95877dc9d093fbe40).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark issue #18164: [SPARK-19732][SQL][PYSPARK] fillna bools

Posted by ueshin <gi...@git.apache.org>.

Github user ueshin commented on the issue:

    https://github.com/apache/spark/pull/18164
  
    ok to test


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark issue #18164: [SPARK-19732][SQL][PYSPARK] Add fill functions for nulls...

Posted by rberenguel <gi...@git.apache.org>.

Github user rberenguel commented on the issue:

    https://github.com/apache/spark/pull/18164
  
    @HyukjinKwon I changed it, does it look any clearer? I have always thought of `na` in terms of Python and not R anyway :) 


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark issue #18164: [SPARK-19732][SQL][PYSPARK] Add fill functions for nulls...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/18164
  
    **[Test build #77685 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/77685/testReport)** for PR 18164 at commit [`fb65d34`](https://github.com/apache/spark/commit/fb65d34870919cfd73043a7d743f21f3d5f7806f).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request #18164: [SPARK-19732][SQL][PYSPARK] fillna bools

Posted by ueshin <gi...@git.apache.org>.

Github user ueshin commented on a diff in the pull request:

    https://github.com/apache/spark/pull/18164#discussion_r119804978
  
    --- Diff: python/pyspark/sql/tests.py ---
    @@ -1697,40 +1697,56 @@ def test_fillna(self):
             schema = StructType([
                 StructField("name", StringType(), True),
                 StructField("age", IntegerType(), True),
    -            StructField("height", DoubleType(), True)])
    +            StructField("height", DoubleType(), True),
    +            StructField("spy", BooleanType(), True)])
     
             # fillna shouldn't change non-null values
    -        row = self.spark.createDataFrame([(u'Alice', 10, 80.1)], schema).fillna(50).first()
    +        row = self.spark.createDataFrame([(u'Alice', 10, 80.1, True)], schema).fillna(50).first()
             self.assertEqual(row.age, 10)
     
             # fillna with int
    -        row = self.spark.createDataFrame([(u'Alice', None, None)], schema).fillna(50).first()
    +        row = self.spark.createDataFrame([(u'Alice', None, None, None)], schema).fillna(50).first()
             self.assertEqual(row.age, 50)
             self.assertEqual(row.height, 50.0)
     
             # fillna with double
    -        row = self.spark.createDataFrame([(u'Alice', None, None)], schema).fillna(50.1).first()
    +        row = self.spark.createDataFrame([(u'Alice', None, None, None)], schema).fillna(50.1).first()
             self.assertEqual(row.age, 50)
             self.assertEqual(row.height, 50.1)
     
    +        # fillna with bool
    +        row = self.spark.createDataFrame([(u'Alice', None, None, None)], schema).fillna(True).first()
    +        self.assertEqual(row.age, None)
    +        self.assertEqual(row.spy, True)
    +
             # fillna with string
    -        row = self.spark.createDataFrame([(None, None, None)], schema).fillna("hello").first()
    +        row = self.spark.createDataFrame([(None, None, None, None)], schema).fillna("hello").first()
             self.assertEqual(row.name, u"hello")
             self.assertEqual(row.age, None)
     
             # fillna with subset specified for numeric cols
             row = self.spark.createDataFrame(
    -            [(None, None, None)], schema).fillna(50, subset=['name', 'age']).first()
    +            [(None, None, None, None)], schema).fillna(50, subset=['name', 'age']).first()
             self.assertEqual(row.name, None)
             self.assertEqual(row.age, 50)
             self.assertEqual(row.height, None)
    +        self.assertEqual(row.spy, None)
     
    -        # fillna with subset specified for numeric cols
    +        # fillna with subset specified for string cols
             row = self.spark.createDataFrame(
    -            [(None, None, None)], schema).fillna("haha", subset=['name', 'age']).first()
    +            [(None, None, None, None)], schema).fillna("haha", subset=['name', 'age']).first()
             self.assertEqual(row.name, "haha")
             self.assertEqual(row.age, None)
             self.assertEqual(row.height, None)
    +        self.assertEqual(row.spy, None)
    +
    +        # fillna with subset specified for bool cols
    +        row = self.spark.createDataFrame(
    +            [(None, None, None, None)], schema).fillna(True, subset=['name', 'age']).first()
    +        self.assertEqual(row.name, None)
    +        self.assertEqual(row.age, None)
    +        self.assertEqual(row.height, None)
    +        self.assertEqual(row.spy, True)
    --- End diff --
    
    @HyukjinKwon I passed the test in my local environment after I updated to the latest commit.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark issue #18164: [SPARK-19732][SQL][PYSPARK] Add fill functions for nulls...

Posted by HyukjinKwon <gi...@git.apache.org>.

Github user HyukjinKwon commented on the issue:

    https://github.com/apache/spark/pull/18164
  
    For me, it looks good. Please let me leave this to @ueshin. 


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request #18164: [SPARK-19732][SQL][PYSPARK] fillna bools

Posted by HyukjinKwon <gi...@git.apache.org>.

Github user HyukjinKwon commented on a diff in the pull request:

    https://github.com/apache/spark/pull/18164#discussion_r119800139
  
    --- Diff: python/pyspark/sql/tests.py ---
    @@ -1697,40 +1697,56 @@ def test_fillna(self):
             schema = StructType([
                 StructField("name", StringType(), True),
                 StructField("age", IntegerType(), True),
    -            StructField("height", DoubleType(), True)])
    +            StructField("height", DoubleType(), True),
    +            StructField("spy", BooleanType(), True)])
     
             # fillna shouldn't change non-null values
    -        row = self.spark.createDataFrame([(u'Alice', 10, 80.1)], schema).fillna(50).first()
    +        row = self.spark.createDataFrame([(u'Alice', 10, 80.1, True)], schema).fillna(50).first()
             self.assertEqual(row.age, 10)
     
             # fillna with int
    -        row = self.spark.createDataFrame([(u'Alice', None, None)], schema).fillna(50).first()
    +        row = self.spark.createDataFrame([(u'Alice', None, None, None)], schema).fillna(50).first()
             self.assertEqual(row.age, 50)
             self.assertEqual(row.height, 50.0)
     
             # fillna with double
    -        row = self.spark.createDataFrame([(u'Alice', None, None)], schema).fillna(50.1).first()
    +        row = self.spark.createDataFrame([(u'Alice', None, None, None)], schema).fillna(50.1).first()
             self.assertEqual(row.age, 50)
             self.assertEqual(row.height, 50.1)
     
    +        # fillna with bool
    +        row = self.spark.createDataFrame([(u'Alice', None, None, None)], schema).fillna(True).first()
    +        self.assertEqual(row.age, None)
    +        self.assertEqual(row.spy, True)
    +
             # fillna with string
    -        row = self.spark.createDataFrame([(None, None, None)], schema).fillna("hello").first()
    +        row = self.spark.createDataFrame([(None, None, None, None)], schema).fillna("hello").first()
             self.assertEqual(row.name, u"hello")
             self.assertEqual(row.age, None)
     
             # fillna with subset specified for numeric cols
             row = self.spark.createDataFrame(
    -            [(None, None, None)], schema).fillna(50, subset=['name', 'age']).first()
    +            [(None, None, None, None)], schema).fillna(50, subset=['name', 'age']).first()
             self.assertEqual(row.name, None)
             self.assertEqual(row.age, 50)
             self.assertEqual(row.height, None)
    +        self.assertEqual(row.spy, None)
     
    -        # fillna with subset specified for numeric cols
    +        # fillna with subset specified for string cols
             row = self.spark.createDataFrame(
    -            [(None, None, None)], schema).fillna("haha", subset=['name', 'age']).first()
    +            [(None, None, None, None)], schema).fillna("haha", subset=['name', 'age']).first()
             self.assertEqual(row.name, "haha")
             self.assertEqual(row.age, None)
             self.assertEqual(row.height, None)
    +        self.assertEqual(row.spy, None)
    +
    +        # fillna with subset specified for bool cols
    +        row = self.spark.createDataFrame(
    +            [(None, None, None, None)], schema).fillna(True, subset=['name', 'age']).first()
    +        self.assertEqual(row.name, None)
    +        self.assertEqual(row.age, None)
    +        self.assertEqual(row.height, None)
    +        self.assertEqual(row.spy, True)
    --- End diff --
    
    Well, I think this fails :
    
    ```
    ======================================================================
    ERROR [0.452s]: test_fillna (pyspark.sql.tests.SQLTests)
    ----------------------------------------------------------------------
    Traceback (most recent call last):
      File ".../spark/python/pyspark/sql/tests.py", line 1749, in test_fillna
        self.assertEqual(row.spy, True)
    AssertionError: None != True
    ```


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark issue #18164: [SPARK-19732][SQL][PYSPARK] Add fill functions for nulls...

Posted by ueshin <gi...@git.apache.org>.

Github user ueshin commented on the issue:

    https://github.com/apache/spark/pull/18164
  
    Thanks! Merging to master.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request #18164: [SPARK-19732][SQL][PYSPARK] fillna bools

Posted by HyukjinKwon <gi...@git.apache.org>.

Github user HyukjinKwon commented on a diff in the pull request:

    https://github.com/apache/spark/pull/18164#discussion_r119805293
  
    --- Diff: python/pyspark/sql/tests.py ---
    @@ -1697,40 +1697,56 @@ def test_fillna(self):
             schema = StructType([
                 StructField("name", StringType(), True),
                 StructField("age", IntegerType(), True),
    -            StructField("height", DoubleType(), True)])
    +            StructField("height", DoubleType(), True),
    +            StructField("spy", BooleanType(), True)])
     
             # fillna shouldn't change non-null values
    -        row = self.spark.createDataFrame([(u'Alice', 10, 80.1)], schema).fillna(50).first()
    +        row = self.spark.createDataFrame([(u'Alice', 10, 80.1, True)], schema).fillna(50).first()
             self.assertEqual(row.age, 10)
     
             # fillna with int
    -        row = self.spark.createDataFrame([(u'Alice', None, None)], schema).fillna(50).first()
    +        row = self.spark.createDataFrame([(u'Alice', None, None, None)], schema).fillna(50).first()
             self.assertEqual(row.age, 50)
             self.assertEqual(row.height, 50.0)
     
             # fillna with double
    -        row = self.spark.createDataFrame([(u'Alice', None, None)], schema).fillna(50.1).first()
    +        row = self.spark.createDataFrame([(u'Alice', None, None, None)], schema).fillna(50.1).first()
             self.assertEqual(row.age, 50)
             self.assertEqual(row.height, 50.1)
     
    +        # fillna with bool
    +        row = self.spark.createDataFrame([(u'Alice', None, None, None)], schema).fillna(True).first()
    +        self.assertEqual(row.age, None)
    +        self.assertEqual(row.spy, True)
    +
             # fillna with string
    -        row = self.spark.createDataFrame([(None, None, None)], schema).fillna("hello").first()
    +        row = self.spark.createDataFrame([(None, None, None, None)], schema).fillna("hello").first()
             self.assertEqual(row.name, u"hello")
             self.assertEqual(row.age, None)
     
             # fillna with subset specified for numeric cols
             row = self.spark.createDataFrame(
    -            [(None, None, None)], schema).fillna(50, subset=['name', 'age']).first()
    +            [(None, None, None, None)], schema).fillna(50, subset=['name', 'age']).first()
             self.assertEqual(row.name, None)
             self.assertEqual(row.age, 50)
             self.assertEqual(row.height, None)
    +        self.assertEqual(row.spy, None)
     
    -        # fillna with subset specified for numeric cols
    +        # fillna with subset specified for string cols
             row = self.spark.createDataFrame(
    -            [(None, None, None)], schema).fillna("haha", subset=['name', 'age']).first()
    +            [(None, None, None, None)], schema).fillna("haha", subset=['name', 'age']).first()
             self.assertEqual(row.name, "haha")
             self.assertEqual(row.age, None)
             self.assertEqual(row.height, None)
    +        self.assertEqual(row.spy, None)
    +
    +        # fillna with subset specified for bool cols
    +        row = self.spark.createDataFrame(
    +            [(None, None, None, None)], schema).fillna(True, subset=['name', 'age']).first()
    +        self.assertEqual(row.name, None)
    +        self.assertEqual(row.age, None)
    +        self.assertEqual(row.height, None)
    +        self.assertEqual(row.spy, True)
    --- End diff --
    
    Yea, I meant your initial comment was right ...


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark issue #18164: [SPARK-19732][SQL][PYSPARK] fillna bools

Posted by rberenguel <gi...@git.apache.org>.

Github user rberenguel commented on the issue:

    https://github.com/apache/spark/pull/18164
  
    @ueshin @HyukjinKwon thanks for giving it a very thorough look and sorry for my previous comment, that was terribly unclear. I was confused because the Appveyor tick mark was green for commit 076ebed and I  had run the tests locally (forgot linting, though), so I was pretty sure the test was right and I was confused about how the subset wrong still had a passing test. 
    
    I probably skipped the wrong step for testing the Python tests (I'm still figuring out which corners I can cut to avoid a full compile/build cycle for the whole project, which takes ages for me) so I didn't see my local test failing, but the remote one was more puzzling, I guess appveyor had a hiccup here. Sorry again for the confused and confusing statements above :)


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request #18164: [SPARK-19732][SQL][PYSPARK] fillna bools

Posted by HyukjinKwon <gi...@git.apache.org>.

Github user HyukjinKwon commented on a diff in the pull request:

    https://github.com/apache/spark/pull/18164#discussion_r119811145
  
    --- Diff: sql/core/src/test/scala/org/apache/spark/sql/DataFrameNaFunctionsSuite.scala ---
    @@ -124,6 +134,13 @@ class DataFrameNaFunctionsSuite extends QueryTest with SharedSQLContext {
             Row("Nina") :: Row("Amy") :: Row("unknown") :: Nil)
         assert(input.na.fill("unknown").columns.toSeq === input.columns.toSeq)
     
    +    // boolean
    +    checkAnswer(
    +      boolInput.na.fill(true).select("spy"),
    +      Row(false) :: Row(true) :: Row(true) ::
    --- End diff --
    
    I think we could make this inlined.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request #18164: [SPARK-19732][SQL][PYSPARK] fillna bools

Posted by HyukjinKwon <gi...@git.apache.org>.

Github user HyukjinKwon commented on a diff in the pull request:

    https://github.com/apache/spark/pull/18164#discussion_r119813861
  
    --- Diff: sql/core/src/test/scala/org/apache/spark/sql/DataFrameNaFunctionsSuite.scala ---
    @@ -124,6 +134,13 @@ class DataFrameNaFunctionsSuite extends QueryTest with SharedSQLContext {
             Row("Nina") :: Row("Amy") :: Row("unknown") :: Nil)
         assert(input.na.fill("unknown").columns.toSeq === input.columns.toSeq)
     
    +    // boolean
    +    checkAnswer(
    +      boolInput.na.fill(true).select("spy"),
    +      Row(false) :: Row(true) :: Row(true) ::
    --- End diff --
    
    Ah, I meant ...
    
    ```
    Row(false) :: Row(true) :: Row(true) :: Row(true) :: Nil
    ```
    
    because it does not look exceeding the length limit, 100 - https://github.com/apache/spark/blob/master/scalastyle-config.xml#L78


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request #18164: [SPARK-19732][SQL][PYSPARK] fillna bools

Posted by rberenguel <gi...@git.apache.org>.

Github user rberenguel commented on a diff in the pull request:

    https://github.com/apache/spark/pull/18164#discussion_r119798778
  
    --- Diff: python/pyspark/sql/tests.py ---
    @@ -1697,40 +1697,56 @@ def test_fillna(self):
             schema = StructType([
                 StructField("name", StringType(), True),
                 StructField("age", IntegerType(), True),
    -            StructField("height", DoubleType(), True)])
    +            StructField("height", DoubleType(), True),
    +            StructField("spy", BooleanType(), True)])
     
             # fillna shouldn't change non-null values
    -        row = self.spark.createDataFrame([(u'Alice', 10, 80.1)], schema).fillna(50).first()
    +        row = self.spark.createDataFrame([(u'Alice', 10, 80.1, True)], schema).fillna(50).first()
             self.assertEqual(row.age, 10)
     
             # fillna with int
    -        row = self.spark.createDataFrame([(u'Alice', None, None)], schema).fillna(50).first()
    +        row = self.spark.createDataFrame([(u'Alice', None, None, None)], schema).fillna(50).first()
             self.assertEqual(row.age, 50)
             self.assertEqual(row.height, 50.0)
     
             # fillna with double
    -        row = self.spark.createDataFrame([(u'Alice', None, None)], schema).fillna(50.1).first()
    +        row = self.spark.createDataFrame([(u'Alice', None, None, None)], schema).fillna(50.1).first()
             self.assertEqual(row.age, 50)
             self.assertEqual(row.height, 50.1)
     
    +        # fillna with bool
    +        row = self.spark.createDataFrame([(u'Alice', None, None, None)], schema).fillna(True).first()
    +        self.assertEqual(row.age, None)
    +        self.assertEqual(row.spy, True)
    +
             # fillna with string
    -        row = self.spark.createDataFrame([(None, None, None)], schema).fillna("hello").first()
    +        row = self.spark.createDataFrame([(None, None, None, None)], schema).fillna("hello").first()
             self.assertEqual(row.name, u"hello")
             self.assertEqual(row.age, None)
     
             # fillna with subset specified for numeric cols
             row = self.spark.createDataFrame(
    -            [(None, None, None)], schema).fillna(50, subset=['name', 'age']).first()
    +            [(None, None, None, None)], schema).fillna(50, subset=['name', 'age']).first()
             self.assertEqual(row.name, None)
             self.assertEqual(row.age, 50)
             self.assertEqual(row.height, None)
    +        self.assertEqual(row.spy, None)
     
    -        # fillna with subset specified for numeric cols
    +        # fillna with subset specified for string cols
             row = self.spark.createDataFrame(
    -            [(None, None, None)], schema).fillna("haha", subset=['name', 'age']).first()
    +            [(None, None, None, None)], schema).fillna("haha", subset=['name', 'age']).first()
             self.assertEqual(row.name, "haha")
             self.assertEqual(row.age, None)
             self.assertEqual(row.height, None)
    +        self.assertEqual(row.spy, None)
    +
    +        # fillna with subset specified for bool cols
    +        row = self.spark.createDataFrame(
    +            [(None, None, None, None)], schema).fillna(True, subset=['name', 'age']).first()
    +        self.assertEqual(row.name, None)
    +        self.assertEqual(row.age, None)
    +        self.assertEqual(row.height, None)
    +        self.assertEqual(row.spy, True)
    --- End diff --
    
    Hi @ueshin indeed! Thanks for catching this, I have modified the test. BUT, this test, as it stands on your comment, should have failed, doesn't it? The subset should not have been applied to spy (so, spy should have been None, and the assertion should have been marked as false, but either the test passed or the test didn't run), if I understood correctly how subsetting fillna's work. But this is weird, since I didn't change any internals of how it works, I just created the methods to enable it. 


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark issue #18164: [SPARK-19732][SQL][PYSPARK] fillna bools

Posted by HyukjinKwon <gi...@git.apache.org>.

Github user HyukjinKwon commented on the issue:

    https://github.com/apache/spark/pull/18164
  
    @ueshin, do you think it is okay to add this? I want to help review here if so.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark issue #18164: [SPARK-19732][SQL][PYSPARK] Add fill functions for nulls...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/18164
  
    Merged build finished. Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request #18164: [SPARK-19732][SQL][PYSPARK] fillna bools

Posted by rberenguel <gi...@git.apache.org>.

Github user rberenguel commented on a diff in the pull request:

    https://github.com/apache/spark/pull/18164#discussion_r119812409
  
    --- Diff: python/pyspark/sql/dataframe.py ---
    @@ -1303,9 +1312,11 @@ def fillna(self, value, subset=None):
             +---+------+-------+
             """
             if not isinstance(value, (float, int, long, basestring, dict)):
    -            raise ValueError("value should be a float, int, long, string, or dict")
    +            raise ValueError("value should be a float, int, long, string, boolean or dict")
     
    -        if isinstance(value, (int, long)):
    +        if isinstance(value, bool):
    +            pass
    +        elif isinstance(value, (int, long)):
    --- End diff --
    
    Thanks, indeed makes sense and makes it a bit nicer than having a pass. 


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request #18164: [SPARK-19732][SQL][PYSPARK] fillna bools

Posted by rberenguel <gi...@git.apache.org>.

Github user rberenguel commented on a diff in the pull request:

    https://github.com/apache/spark/pull/18164#discussion_r119812943
  
    --- Diff: sql/core/src/test/scala/org/apache/spark/sql/DataFrameNaFunctionsSuite.scala ---
    @@ -36,6 +36,15 @@ class DataFrameNaFunctionsSuite extends QueryTest with SharedSQLContext {
           ).toDF("name", "age", "height")
       }
     
    +  def createBooleanDF(): DataFrame = {
    --- End diff --
    
    Yup, right. I added it on top to keep both together, but it's only used for the boolean tests


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark issue #18164: [SPARK-19732][SQL][PYSPARK] fillna bools

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/18164
  
    **[Test build #77674 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/77674/testReport)** for PR 18164 at commit [`1b3c712`](https://github.com/apache/spark/commit/1b3c7126f827b23db585bc4e5a4cecef854320c4).
     * This patch **fails Python style tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark issue #18164: [SPARK-19732][SQL][PYSPARK] Add fill functions for nulls...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/18164
  
    **[Test build #77685 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/77685/testReport)** for PR 18164 at commit [`fb65d34`](https://github.com/apache/spark/commit/fb65d34870919cfd73043a7d743f21f3d5f7806f).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request #18164: [SPARK-19732][SQL][PYSPARK] fillna bools

Posted by rberenguel <gi...@git.apache.org>.

Github user rberenguel commented on a diff in the pull request:

    https://github.com/apache/spark/pull/18164#discussion_r119812631
  
    --- Diff: python/pyspark/sql/dataframe.py ---
    @@ -1303,9 +1312,11 @@ def fillna(self, value, subset=None):
             +---+------+-------+
             """
             if not isinstance(value, (float, int, long, basestring, dict)):
    -            raise ValueError("value should be a float, int, long, string, or dict")
    +            raise ValueError("value should be a float, int, long, string, boolean or dict")
    --- End diff --
    
    Thanks, will change


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request #18164: [SPARK-19732][SQL][PYSPARK] fillna bools

Posted by ueshin <gi...@git.apache.org>.

Github user ueshin commented on a diff in the pull request:

    https://github.com/apache/spark/pull/18164#discussion_r119800727
  
    --- Diff: python/pyspark/sql/tests.py ---
    @@ -1697,40 +1697,56 @@ def test_fillna(self):
             schema = StructType([
                 StructField("name", StringType(), True),
                 StructField("age", IntegerType(), True),
    -            StructField("height", DoubleType(), True)])
    +            StructField("height", DoubleType(), True),
    +            StructField("spy", BooleanType(), True)])
     
             # fillna shouldn't change non-null values
    -        row = self.spark.createDataFrame([(u'Alice', 10, 80.1)], schema).fillna(50).first()
    +        row = self.spark.createDataFrame([(u'Alice', 10, 80.1, True)], schema).fillna(50).first()
             self.assertEqual(row.age, 10)
     
             # fillna with int
    -        row = self.spark.createDataFrame([(u'Alice', None, None)], schema).fillna(50).first()
    +        row = self.spark.createDataFrame([(u'Alice', None, None, None)], schema).fillna(50).first()
             self.assertEqual(row.age, 50)
             self.assertEqual(row.height, 50.0)
     
             # fillna with double
    -        row = self.spark.createDataFrame([(u'Alice', None, None)], schema).fillna(50.1).first()
    +        row = self.spark.createDataFrame([(u'Alice', None, None, None)], schema).fillna(50.1).first()
             self.assertEqual(row.age, 50)
             self.assertEqual(row.height, 50.1)
     
    +        # fillna with bool
    +        row = self.spark.createDataFrame([(u'Alice', None, None, None)], schema).fillna(True).first()
    +        self.assertEqual(row.age, None)
    +        self.assertEqual(row.spy, True)
    +
             # fillna with string
    -        row = self.spark.createDataFrame([(None, None, None)], schema).fillna("hello").first()
    +        row = self.spark.createDataFrame([(None, None, None, None)], schema).fillna("hello").first()
             self.assertEqual(row.name, u"hello")
             self.assertEqual(row.age, None)
     
             # fillna with subset specified for numeric cols
             row = self.spark.createDataFrame(
    -            [(None, None, None)], schema).fillna(50, subset=['name', 'age']).first()
    +            [(None, None, None, None)], schema).fillna(50, subset=['name', 'age']).first()
             self.assertEqual(row.name, None)
             self.assertEqual(row.age, 50)
             self.assertEqual(row.height, None)
    +        self.assertEqual(row.spy, None)
     
    -        # fillna with subset specified for numeric cols
    +        # fillna with subset specified for string cols
             row = self.spark.createDataFrame(
    -            [(None, None, None)], schema).fillna("haha", subset=['name', 'age']).first()
    +            [(None, None, None, None)], schema).fillna("haha", subset=['name', 'age']).first()
             self.assertEqual(row.name, "haha")
             self.assertEqual(row.age, None)
             self.assertEqual(row.height, None)
    +        self.assertEqual(row.spy, None)
    +
    +        # fillna with subset specified for bool cols
    +        row = self.spark.createDataFrame(
    +            [(None, None, None, None)], schema).fillna(True, subset=['name', 'age']).first()
    +        self.assertEqual(row.name, None)
    +        self.assertEqual(row.age, None)
    +        self.assertEqual(row.height, None)
    +        self.assertEqual(row.spy, True)
    --- End diff --
    
    @rberenguel I'm sorry but I didn't understand what you are getting at.
    I guess if the subset is `['name', 'spy']` as you updated, `row.spy` will become `True` because the `row.spy` is `BooleanType` and the value is boolean.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request #18164: [SPARK-19732][SQL][PYSPARK] fillna bools

Posted by ueshin <gi...@git.apache.org>.

Github user ueshin commented on a diff in the pull request:

    https://github.com/apache/spark/pull/18164#discussion_r119791032
  
    --- Diff: python/pyspark/sql/tests.py ---
    @@ -1697,40 +1697,56 @@ def test_fillna(self):
             schema = StructType([
                 StructField("name", StringType(), True),
                 StructField("age", IntegerType(), True),
    -            StructField("height", DoubleType(), True)])
    +            StructField("height", DoubleType(), True),
    +            StructField("spy", BooleanType(), True)])
     
             # fillna shouldn't change non-null values
    -        row = self.spark.createDataFrame([(u'Alice', 10, 80.1)], schema).fillna(50).first()
    +        row = self.spark.createDataFrame([(u'Alice', 10, 80.1, True)], schema).fillna(50).first()
             self.assertEqual(row.age, 10)
     
             # fillna with int
    -        row = self.spark.createDataFrame([(u'Alice', None, None)], schema).fillna(50).first()
    +        row = self.spark.createDataFrame([(u'Alice', None, None, None)], schema).fillna(50).first()
             self.assertEqual(row.age, 50)
             self.assertEqual(row.height, 50.0)
     
             # fillna with double
    -        row = self.spark.createDataFrame([(u'Alice', None, None)], schema).fillna(50.1).first()
    +        row = self.spark.createDataFrame([(u'Alice', None, None, None)], schema).fillna(50.1).first()
             self.assertEqual(row.age, 50)
             self.assertEqual(row.height, 50.1)
     
    +        # fillna with bool
    +        row = self.spark.createDataFrame([(u'Alice', None, None, None)], schema).fillna(True).first()
    +        self.assertEqual(row.age, None)
    +        self.assertEqual(row.spy, True)
    +
             # fillna with string
    -        row = self.spark.createDataFrame([(None, None, None)], schema).fillna("hello").first()
    +        row = self.spark.createDataFrame([(None, None, None, None)], schema).fillna("hello").first()
             self.assertEqual(row.name, u"hello")
             self.assertEqual(row.age, None)
     
             # fillna with subset specified for numeric cols
             row = self.spark.createDataFrame(
    -            [(None, None, None)], schema).fillna(50, subset=['name', 'age']).first()
    +            [(None, None, None, None)], schema).fillna(50, subset=['name', 'age']).first()
             self.assertEqual(row.name, None)
             self.assertEqual(row.age, 50)
             self.assertEqual(row.height, None)
    +        self.assertEqual(row.spy, None)
     
    -        # fillna with subset specified for numeric cols
    +        # fillna with subset specified for string cols
             row = self.spark.createDataFrame(
    -            [(None, None, None)], schema).fillna("haha", subset=['name', 'age']).first()
    +            [(None, None, None, None)], schema).fillna("haha", subset=['name', 'age']).first()
             self.assertEqual(row.name, "haha")
             self.assertEqual(row.age, None)
             self.assertEqual(row.height, None)
    +        self.assertEqual(row.spy, None)
    +
    +        # fillna with subset specified for bool cols
    +        row = self.spark.createDataFrame(
    +            [(None, None, None, None)], schema).fillna(True, subset=['name', 'age']).first()
    +        self.assertEqual(row.name, None)
    +        self.assertEqual(row.age, None)
    +        self.assertEqual(row.height, None)
    +        self.assertEqual(row.spy, True)
    --- End diff --
    
    This should be `None` or an argument `subset` of `fillna()` above should be `['name', 'spy']`?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark issue #18164: [SPARK-19732][SQL][PYSPARK] fillna bools

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/18164
  
    Can one of the admins verify this patch?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark issue #18164: [SPARK-19732][SQL][PYSPARK] fillna bools

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/18164
  
    **[Test build #77674 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/77674/testReport)** for PR 18164 at commit [`1b3c712`](https://github.com/apache/spark/commit/1b3c7126f827b23db585bc4e5a4cecef854320c4).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark issue #18164: [SPARK-19732][SQL][PYSPARK] fillna bools

Posted by HyukjinKwon <gi...@git.apache.org>.

Github user HyukjinKwon commented on the issue:

    https://github.com/apache/spark/pull/18164
  
    BTW, mind fixng the title/contents in the PR to be a bit more descriptive, for example, saying "null" instead of "NA"?  Not a big deal but non R guys might get confused ... 


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark issue #18164: [SPARK-19732][SQL][PYSPARK] fillna bools

Posted by HyukjinKwon <gi...@git.apache.org>.

Github user HyukjinKwon commented on the issue:

    https://github.com/apache/spark/pull/18164
  
    Ah... I see. Sorry, I misunderstood. BTW, AppVeyor only runs SparkR tests on Windows currently.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request #18164: [SPARK-19732][SQL][PYSPARK] fillna bools

Posted by ueshin <gi...@git.apache.org>.

Github user ueshin commented on a diff in the pull request:

    https://github.com/apache/spark/pull/18164#discussion_r119805698
  
    --- Diff: python/pyspark/sql/tests.py ---
    @@ -1697,40 +1697,56 @@ def test_fillna(self):
             schema = StructType([
                 StructField("name", StringType(), True),
                 StructField("age", IntegerType(), True),
    -            StructField("height", DoubleType(), True)])
    +            StructField("height", DoubleType(), True),
    +            StructField("spy", BooleanType(), True)])
     
             # fillna shouldn't change non-null values
    -        row = self.spark.createDataFrame([(u'Alice', 10, 80.1)], schema).fillna(50).first()
    +        row = self.spark.createDataFrame([(u'Alice', 10, 80.1, True)], schema).fillna(50).first()
             self.assertEqual(row.age, 10)
     
             # fillna with int
    -        row = self.spark.createDataFrame([(u'Alice', None, None)], schema).fillna(50).first()
    +        row = self.spark.createDataFrame([(u'Alice', None, None, None)], schema).fillna(50).first()
             self.assertEqual(row.age, 50)
             self.assertEqual(row.height, 50.0)
     
             # fillna with double
    -        row = self.spark.createDataFrame([(u'Alice', None, None)], schema).fillna(50.1).first()
    +        row = self.spark.createDataFrame([(u'Alice', None, None, None)], schema).fillna(50.1).first()
             self.assertEqual(row.age, 50)
             self.assertEqual(row.height, 50.1)
     
    +        # fillna with bool
    +        row = self.spark.createDataFrame([(u'Alice', None, None, None)], schema).fillna(True).first()
    +        self.assertEqual(row.age, None)
    +        self.assertEqual(row.spy, True)
    +
             # fillna with string
    -        row = self.spark.createDataFrame([(None, None, None)], schema).fillna("hello").first()
    +        row = self.spark.createDataFrame([(None, None, None, None)], schema).fillna("hello").first()
             self.assertEqual(row.name, u"hello")
             self.assertEqual(row.age, None)
     
             # fillna with subset specified for numeric cols
             row = self.spark.createDataFrame(
    -            [(None, None, None)], schema).fillna(50, subset=['name', 'age']).first()
    +            [(None, None, None, None)], schema).fillna(50, subset=['name', 'age']).first()
             self.assertEqual(row.name, None)
             self.assertEqual(row.age, 50)
             self.assertEqual(row.height, None)
    +        self.assertEqual(row.spy, None)
     
    -        # fillna with subset specified for numeric cols
    +        # fillna with subset specified for string cols
             row = self.spark.createDataFrame(
    -            [(None, None, None)], schema).fillna("haha", subset=['name', 'age']).first()
    +            [(None, None, None, None)], schema).fillna("haha", subset=['name', 'age']).first()
             self.assertEqual(row.name, "haha")
             self.assertEqual(row.age, None)
             self.assertEqual(row.height, None)
    +        self.assertEqual(row.spy, None)
    +
    +        # fillna with subset specified for bool cols
    +        row = self.spark.createDataFrame(
    +            [(None, None, None, None)], schema).fillna(True, subset=['name', 'age']).first()
    +        self.assertEqual(row.name, None)
    +        self.assertEqual(row.age, None)
    +        self.assertEqual(row.height, None)
    +        self.assertEqual(row.spy, True)
    --- End diff --
    
    Ah, I see. Thanks.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request #18164: [SPARK-19732][SQL][PYSPARK] fillna bools

Posted by HyukjinKwon <gi...@git.apache.org>.

Github user HyukjinKwon commented on a diff in the pull request:

    https://github.com/apache/spark/pull/18164#discussion_r119810057
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/DataFrameNaFunctions.scala ---
    @@ -196,6 +196,24 @@ final class DataFrameNaFunctions private[sql](df: DataFrame) {
       def fill(value: String, cols: Seq[String]): DataFrame = fillValue(value, cols)
     
       /**
    +   * Returns a new `DataFrame` that replaces null values in boolean columns with `value`.
    +   */
    +  def fill(value: Boolean): DataFrame = fill(value, df.columns)
    --- End diff --
    
    Looks we need `@since 2.3.0` for this and the same instances below.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request #18164: [SPARK-19732][SQL][PYSPARK] fillna bools

Posted by HyukjinKwon <gi...@git.apache.org>.

Github user HyukjinKwon commented on a diff in the pull request:

    https://github.com/apache/spark/pull/18164#discussion_r119810649
  
    --- Diff: sql/core/src/test/scala/org/apache/spark/sql/DataFrameNaFunctionsSuite.scala ---
    @@ -36,6 +36,15 @@ class DataFrameNaFunctionsSuite extends QueryTest with SharedSQLContext {
           ).toDF("name", "age", "height")
       }
     
    +  def createBooleanDF(): DataFrame = {
    --- End diff --
    
    It looks this functions is only used once. I think we could just move the lines in the functions into the test, "fill".


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request #18164: [SPARK-19732][SQL][PYSPARK] fillna bools

Posted by rberenguel <gi...@git.apache.org>.

Github user rberenguel commented on a diff in the pull request:

    https://github.com/apache/spark/pull/18164#discussion_r119813043
  
    --- Diff: sql/core/src/test/scala/org/apache/spark/sql/DataFrameNaFunctionsSuite.scala ---
    @@ -124,6 +134,13 @@ class DataFrameNaFunctionsSuite extends QueryTest with SharedSQLContext {
             Row("Nina") :: Row("Amy") :: Row("unknown") :: Nil)
         assert(input.na.fill("unknown").columns.toSeq === input.columns.toSeq)
     
    +    // boolean
    +    checkAnswer(
    +      boolInput.na.fill(true).select("spy"),
    +      Row(false) :: Row(true) :: Row(true) ::
    --- End diff --
    
    Sorry, what do you mean by inlined here?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark issue #18164: [SPARK-19732][SQL][PYSPARK] Add fill functions for nulls...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/18164
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/77685/
    Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request #18164: [SPARK-19732][SQL][PYSPARK] fillna bools

Posted by rberenguel <gi...@git.apache.org>.

Github user rberenguel commented on a diff in the pull request:

    https://github.com/apache/spark/pull/18164#discussion_r119812590
  
    --- Diff: python/pyspark/sql/dataframe.py ---
    @@ -1303,9 +1312,11 @@ def fillna(self, value, subset=None):
             +---+------+-------+
             """
             if not isinstance(value, (float, int, long, basestring, dict)):
    --- End diff --
    
    I omitted it just because it wasn't failing for this if, but indeed, I'm a bit more on the side of putting it in even if just for completeness. Makes reading the code much saner if we have the if for bool


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark issue #18164: [SPARK-19732][SQL][PYSPARK] Add fill functions for nulls...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/18164
  
    Merged build finished. Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request #18164: [SPARK-19732][SQL][PYSPARK] fillna bools

Posted by HyukjinKwon <gi...@git.apache.org>.

Github user HyukjinKwon commented on a diff in the pull request:

    https://github.com/apache/spark/pull/18164#discussion_r119808075
  
    --- Diff: python/pyspark/sql/dataframe.py ---
    @@ -1303,9 +1312,11 @@ def fillna(self, value, subset=None):
             +---+------+-------+
             """
             if not isinstance(value, (float, int, long, basestring, dict)):
    -            raise ValueError("value should be a float, int, long, string, or dict")
    +            raise ValueError("value should be a float, int, long, string, boolean or dict")
    --- End diff --
    
    I think we should use the same term, `bool` or `boolean`.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request #18164: [SPARK-19732][SQL][PYSPARK] fillna bools

Posted by HyukjinKwon <gi...@git.apache.org>.

Github user HyukjinKwon commented on a diff in the pull request:

    https://github.com/apache/spark/pull/18164#discussion_r119808932
  
    --- Diff: python/pyspark/sql/dataframe.py ---
    @@ -1303,9 +1312,11 @@ def fillna(self, value, subset=None):
             +---+------+-------+
             """
             if not isinstance(value, (float, int, long, basestring, dict)):
    -            raise ValueError("value should be a float, int, long, string, or dict")
    +            raise ValueError("value should be a float, int, long, string, boolean or dict")
     
    -        if isinstance(value, (int, long)):
    +        if isinstance(value, bool):
    +            pass
    +        elif isinstance(value, (int, long)):
    --- End diff --
    
    Could we just make this `not isinstance(value, bool) and isinstance(value, (int, long))` (maybe with a small comment)?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark issue #18164: [SPARK-19732][SQL][PYSPARK] fillna bools

Posted by ueshin <gi...@git.apache.org>.

Github user ueshin commented on the issue:

    https://github.com/apache/spark/pull/18164
  
    @HyukjinKwon Yes, I think it's okay to add this.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark issue #18164: [SPARK-19732][SQL][PYSPARK] Add fill functions for nulls...

Posted by HyukjinKwon <gi...@git.apache.org>.

Github user HyukjinKwon commented on the issue:

    https://github.com/apache/spark/pull/18164
  
    Aaa...okay that's fine to me. NA always reminds me of R first :). 


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request #18164: [SPARK-19732][SQL][PYSPARK] Add fill functions fo...

Posted by asfgit <gi...@git.apache.org>.

Github user asfgit closed the pull request at:

    https://github.com/apache/spark/pull/18164


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request #18164: [SPARK-19732][SQL][PYSPARK] fillna bools

Posted by HyukjinKwon <gi...@git.apache.org>.

Github user HyukjinKwon commented on a diff in the pull request:

    https://github.com/apache/spark/pull/18164#discussion_r119808351
  
    --- Diff: python/pyspark/sql/dataframe.py ---
    @@ -1303,9 +1312,11 @@ def fillna(self, value, subset=None):
             +---+------+-------+
             """
             if not isinstance(value, (float, int, long, basestring, dict)):
    --- End diff --
    
    I know a bool in Python inherits an int but wouldn't it be more clear if we explicitly mention it here? I don't strongly feel about this.
    
    BTW, this rings a bell - some Python APIs take a `bool` in this way and work unexpectedly in some cases IIRC ...


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark issue #18164: [SPARK-19732][SQL][PYSPARK] fillna bools

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/18164
  
    Merged build finished. Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request #18164: [SPARK-19732][SQL][PYSPARK] fillna bools

Posted by rberenguel <gi...@git.apache.org>.

Github user rberenguel commented on a diff in the pull request:

    https://github.com/apache/spark/pull/18164#discussion_r119812842
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/DataFrameNaFunctions.scala ---
    @@ -196,6 +196,24 @@ final class DataFrameNaFunctions private[sql](df: DataFrame) {
       def fill(value: String, cols: Seq[String]): DataFrame = fillValue(value, cols)
     
       /**
    +   * Returns a new `DataFrame` that replaces null values in boolean columns with `value`.
    +   */
    +  def fill(value: Boolean): DataFrame = fill(value, df.columns)
    +
    +  /**
    +   * (Scala-specific) Returns a new `DataFrame` that replaces null or NaN values in specified
    --- End diff --
    
    Oh, right. I copied the defs and docs from double, as it shows. Will change, NaN booleans would be weird indeed


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request #18164: [SPARK-19732][SQL][PYSPARK] fillna bools

Posted by HyukjinKwon <gi...@git.apache.org>.

Github user HyukjinKwon commented on a diff in the pull request:

    https://github.com/apache/spark/pull/18164#discussion_r119810367
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/DataFrameNaFunctions.scala ---
    @@ -196,6 +196,24 @@ final class DataFrameNaFunctions private[sql](df: DataFrame) {
       def fill(value: String, cols: Seq[String]): DataFrame = fillValue(value, cols)
     
       /**
    +   * Returns a new `DataFrame` that replaces null values in boolean columns with `value`.
    +   */
    +  def fill(value: Boolean): DataFrame = fill(value, df.columns)
    +
    +  /**
    +   * (Scala-specific) Returns a new `DataFrame` that replaces null or NaN values in specified
    --- End diff --
    
    I think a boolean column could not have "NaN values".


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark issue #18164: [SPARK-19732][SQL][PYSPARK] Add fill functions for nulls...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/18164
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/77675/
    Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark issue #18164: [SPARK-19732][SQL][PYSPARK] Add fill functions for nulls...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/18164
  
    **[Test build #77675 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/77675/testReport)** for PR 18164 at commit [`21b4f67`](https://github.com/apache/spark/commit/21b4f677fe3c521005bbc9b95877dc9d093fbe40).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org