You are viewing a plain text version of this content. The canonical link for it is here.

Posted to reviews@spark.apache.org by zhengruifeng <gi...@git.apache.org> on 2017/02/17 03:23:11 UTC

[GitHub] spark pull request #16971: [SPARK-19573][SQL] Make NaN/null handling consist...

GitHub user zhengruifeng opened a pull request:

    https://github.com/apache/spark/pull/16971

    [SPARK-19573][SQL] Make NaN/null handling consistent in approxQuantile

    ## What changes were proposed in this pull request?
    update `StatFunctions.multipleApproxQuantiles` to handle NaN/null
    
    (Please fill in changes proposed in this fix)
    
    ## How was this patch tested?
    existing tests and added tests

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/zhengruifeng/spark quantiles_nan

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/16971.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #16971
    
----
commit d5e79a809b1edd91a7e0c1d8046bb8bfec2ba4c9
Author: Zheng RuiFeng <ru...@foxmail.com>
Date:   2017-02-17T03:18:43Z

    create pr

----


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request #16971: [SPARK-19573][SQL] Make NaN/null handling consist...

Posted by MLnick <gi...@git.apache.org>.

Github user MLnick commented on a diff in the pull request:

    https://github.com/apache/spark/pull/16971#discussion_r102146260
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/stat/StatFunctions.scala ---
    @@ -78,7 +80,13 @@ object StatFunctions extends Logging {
         def apply(summaries: Array[QuantileSummaries], row: Row): Array[QuantileSummaries] = {
           var i = 0
           while (i < summaries.length) {
    -        summaries(i) = summaries(i).insert(row.getDouble(i))
    +        val item = row(i)
    --- End diff --
    
    This works, though perhaps we can do:
    
    ```scala
    if (!row.isNullAt(i)) {
      val v = row.getDouble(i)
      if (!v.isNaN) {
        summaries(i) = summaries(i).insert(v)
      }
    }
    ```


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request #16971: [SPARK-19573][SQL] Make NaN/null handling consist...

Posted by thunterdb <gi...@git.apache.org>.

Github user thunterdb commented on a diff in the pull request:

    https://github.com/apache/spark/pull/16971#discussion_r102592174
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/stat/StatFunctions.scala ---
    @@ -78,7 +80,12 @@ object StatFunctions extends Logging {
         def apply(summaries: Array[QuantileSummaries], row: Row): Array[QuantileSummaries] = {
           var i = 0
           while (i < summaries.length) {
    -        summaries(i) = summaries(i).insert(row.getDouble(i))
    +        if (!row.isNullAt(i)) {
    --- End diff --
    
    Thank you for fixing this issue, it was an oversight in my original implementation. 
    
    The current exception being thrown depends on an implementation detail (calling `sampled.head`). Can you modify the function `def query` below to explicitly throw an exception if `sampled` is empty, and document this behavior in that function? This way, we will not forget it if decide to change the semantics of that class.
    
    As @MLnick was mentioning above, it would be preferrable to either return an Option, null or NaN eventually, but this can wait for more consensus.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request #16971: [SPARK-19573][SQL] Make NaN/null handling consist...

Posted by gatorsmile <gi...@git.apache.org>.

Github user gatorsmile commented on a diff in the pull request:

    https://github.com/apache/spark/pull/16971#discussion_r103844548
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/ApproximatePercentile.scala ---
    @@ -245,7 +245,7 @@ object ApproximatePercentile {
             val result = new Array[Double](percentages.length)
             var i = 0
             while (i < percentages.length) {
    -          result(i) = summaries.query(percentages(i))
    +          result(i) = summaries.query(percentages(i)).get
    --- End diff --
    
    Is it possible to return `None`? Then, you will get a strange exception. Could you also add a test case for `getPercentiles` in `ApproximatePercentileQuerySuite`?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark issue #16971: [SPARK-19573][SQL] Make NaN/null handling consistent in ...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/16971
  
    Merged build finished. Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark issue #16971: [SPARK-19573][SQL] Make NaN/null handling consistent in ...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/16971
  
    **[Test build #73744 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/73744/testReport)** for PR 16971 at commit [`2071aae`](https://github.com/apache/spark/commit/2071aaec4cb3805c2cebbf2732f274d881182f3d).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark issue #16971: [SPARK-19573][SQL] Make NaN/null handling consistent in ...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/16971
  
    Merged build finished. Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark issue #16971: [SPARK-19573][SQL] Make NaN/null handling consistent in ...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/16971
  
    **[Test build #73674 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/73674/testReport)** for PR 16971 at commit [`8d5941b`](https://github.com/apache/spark/commit/8d5941b8dff12a429a0db833b812b23993c7a888).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark issue #16971: [SPARK-19573][SQL] Make NaN/null handling consistent in ...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/16971
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/74717/
    Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark issue #16971: [SPARK-19573][SQL] Make NaN/null handling consistent in ...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/16971
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/74703/
    Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark issue #16971: [SPARK-19573][SQL] Make NaN/null handling consistent in ...

Posted by zhengruifeng <gi...@git.apache.org>.

Github user zhengruifeng commented on the issue:

    https://github.com/apache/spark/pull/16971
  
    Jenkins, retest this please


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark issue #16971: [SPARK-19573][SQL] Make NaN/null handling consistent in ...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/16971
  
    **[Test build #73739 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/73739/testReport)** for PR 16971 at commit [`2071aae`](https://github.com/apache/spark/commit/2071aaec4cb3805c2cebbf2732f274d881182f3d).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark issue #16971: [SPARK-19573][SQL] Make NaN/null handling consistent in ...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/16971
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/74705/
    Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark issue #16971: [SPARK-19573][SQL] Make NaN/null handling consistent in ...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/16971
  
    **[Test build #74837 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/74837/testReport)** for PR 16971 at commit [`b1125fd`](https://github.com/apache/spark/commit/b1125fd1a6610d61689f28b6022b4554146ac878).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark issue #16971: [SPARK-19573][SQL] Make NaN/null handling consistent in ...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/16971
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/74908/
    Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request #16971: [SPARK-19573][SQL] Make NaN/null handling consist...

Posted by gatorsmile <gi...@git.apache.org>.

Github user gatorsmile commented on a diff in the pull request:

    https://github.com/apache/spark/pull/16971#discussion_r103270963
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/stat/StatFunctions.scala ---
    @@ -91,7 +100,13 @@ object StatFunctions extends Logging {
         }
         val summaries = df.select(columns: _*).rdd.aggregate(emptySummaries)(apply, merge)
     
    -    summaries.map { summary => probabilities.map(summary.query) }
    +    summaries.map { summary =>
    +      try {
    +        probabilities.map(summary.query)
    +      } catch {
    +        case e: SparkException => Seq.empty[Double]
    --- End diff --
    
    Please do not use the Exception handling for this purpose. Instead, you can return None. 


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark issue #16971: [SPARK-19573][SQL] Make NaN/null handling consistent in ...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/16971
  
    **[Test build #74837 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/74837/testReport)** for PR 16971 at commit [`b1125fd`](https://github.com/apache/spark/commit/b1125fd1a6610d61689f28b6022b4554146ac878).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark issue #16971: [SPARK-19573][SQL] Make NaN/null handling consistent in ...

Posted by MLnick <gi...@git.apache.org>.

Github user MLnick commented on the issue:

    https://github.com/apache/spark/pull/16971
  
    Yes my point was returning null is not very idiomatic in Scala. Better to
    return Option or empty collection. Option doesn't work for Java compat, so
    empty Array is best in this case I believe.
    
    +1 for empty Array and if we can return the quantiles for the non-empty /
    non-NaN cols as per your suggestion that is ideal.
    On Thu, 23 Feb 2017 at 08:12, Ruifeng Zheng <no...@github.com>
    wrote:
    
    > @thunterdb <https://github.com/thunterdb> Good point. I will check the
    > sampled in def query.
    >
    > @MLnick <https://github.com/MLnick> @gatorsmile
    > <https://github.com/gatorsmile> I perfer empty array as the result for
    > empty dataset or columns that only contains na.
    > And, in the case that only some columns only contains na. Current
    > implementation will return null, and result for all column can not be
    > obtained. I think the result for common columns should be accessable.
    >
    > val rows = spark.sparkContext.parallelize(Seq(Row(Double.NaN, 1.0, Double.NaN),
    > +      Row(1.0, -1.0, null), Row(-1.0, Double.NaN, null), Row(Double.NaN, Double.NaN, null),
    > +      Row(null, null, Double.NaN), Row(null, 1.0, null), Row(-1.0, null, Double.NaN),
    > +      Row(Double.NaN, null, null)))
    >      val schema = StructType(Seq(StructField("input1", DoubleType, nullable = true),
    > +      StructField("input2", DoubleType, nullable = true),
    > +      StructField("input3", DoubleType, nullable = true)))
    >      val dfNaN = spark.createDataFrame(rows, schema)
    > val resNaNAll = dfNaN.stat.approxQuantile(Array("input1", "input2", "input3"),
    >        Array(q1, q2), epsilon)
    >
    > In the returned array, result for columns input1 and input2 should be ok,
    > and result for input3 is empty. Array(Array(num1, num2), Array(num3,
    > num4), Array())
    >
    > \u2014
    > You are receiving this because you were mentioned.
    > Reply to this email directly, view it on GitHub
    > <https://github.com/apache/spark/pull/16971#issuecomment-281904333>, or mute
    > the thread
    > <https://github.com/notifications/unsubscribe-auth/AA_SB04fXTwOGec3BqPZ06w9F6ps-hTxks5rfSNLgaJpZM4MD1Gt>
    > .
    >



---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark issue #16971: [SPARK-19573][SQL] Make NaN/null handling consistent in ...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/16971
  
    Merged build finished. Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request #16971: [SPARK-19573][SQL] Make NaN/null handling consist...

Posted by gatorsmile <gi...@git.apache.org>.

Github user gatorsmile commented on a diff in the pull request:

    https://github.com/apache/spark/pull/16971#discussion_r105588823
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/ApproximatePercentile.scala ---
    @@ -245,7 +245,7 @@ object ApproximatePercentile {
             val result = new Array[Double](percentages.length)
             var i = 0
             while (i < percentages.length) {
    -          result(i) = summaries.query(percentages(i))
    +          result(i) = summaries.query(percentages(i)).get
    --- End diff --
    
    cc @thunterdb to ensure it.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark issue #16971: [SPARK-19573][SQL] Make NaN/null handling consistent in ...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/16971
  
    **[Test build #73676 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/73676/testReport)** for PR 16971 at commit [`402deb0`](https://github.com/apache/spark/commit/402deb00f34d535be3c7070ff4062b7f4e67101b).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark issue #16971: [SPARK-19573][SQL] Make NaN/null handling consistent in ...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/16971
  
    Merged build finished. Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request #16971: [SPARK-19573][SQL] Make NaN/null handling consist...

Posted by gatorsmile <gi...@git.apache.org>.

Github user gatorsmile commented on a diff in the pull request:

    https://github.com/apache/spark/pull/16971#discussion_r106335040
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/ApproximatePercentile.scala ---
    @@ -245,7 +245,7 @@ object ApproximatePercentile {
             val result = new Array[Double](percentages.length)
             var i = 0
             while (i < percentages.length) {
    -          result(i) = summaries.query(percentages(i))
    +          result(i) = summaries.query(percentages(i)).get
    --- End diff --
    
    Thank you!


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request #16971: [SPARK-19573][SQL] Make NaN/null handling consist...

Posted by zhengruifeng <gi...@git.apache.org>.

Github user zhengruifeng commented on a diff in the pull request:

    https://github.com/apache/spark/pull/16971#discussion_r106569064
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/ApproximatePercentile.scala ---
    @@ -245,7 +245,7 @@ object ApproximatePercentile {
             val result = new Array[Double](percentages.length)
             var i = 0
             while (i < percentages.length) {
    -          result(i) = summaries.query(percentages(i))
    +          result(i) = summaries.query(percentages(i)).get
    --- End diff --
    
    Thanks all. I will add a comment here.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark issue #16971: [SPARK-19573][SQL] Make NaN/null handling consistent in ...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/16971
  
    Merged build finished. Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark issue #16971: [SPARK-19573][SQL] Make NaN/null handling consistent in ...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/16971
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/73739/
    Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request #16971: [SPARK-19573][SQL] Make NaN/null handling consist...

Posted by gatorsmile <gi...@git.apache.org>.

Github user gatorsmile commented on a diff in the pull request:

    https://github.com/apache/spark/pull/16971#discussion_r103844007
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/QuantileSummaries.scala ---
    @@ -176,17 +176,21 @@ class QuantileSummaries(
        * @param quantile the target quantile
        * @return
        */
    -  def query(quantile: Double): Double = {
    +  def query(quantile: Double): Option[Double] = {
         require(quantile >= 0 && quantile <= 1.0, "quantile should be in the range [0.0, 1.0]")
         require(headSampled.isEmpty,
           "Cannot operate on an uncompressed summary, call compress() first")
     
    +    if (sampled.isEmpty) {
    +      return None
    +    }
    --- End diff --
    
    Nit: 
    ```
    if (sampled.isEmpty) return None
    ```


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark issue #16971: [SPARK-19573][SQL] Make NaN/null handling consistent in ...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/16971
  
    **[Test build #74793 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/74793/testReport)** for PR 16971 at commit [`ed6dacd`](https://github.com/apache/spark/commit/ed6dacdb3e3bdfd4e9ccb5c57bf8b4118636b0c6).
     * This patch **fails Spark unit tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark issue #16971: [SPARK-19573][SQL] Make NaN/null handling consistent in ...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/16971
  
    **[Test build #74717 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/74717/testReport)** for PR 16971 at commit [`00d67f7`](https://github.com/apache/spark/commit/00d67f71e8c3eb254eabb63a53efdf675689aeb3).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark issue #16971: [SPARK-19573][SQL] Make NaN/null handling consistent in ...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/16971
  
    Merged build finished. Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark issue #16971: [SPARK-19573][SQL] Make NaN/null handling consistent in ...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/16971
  
    Merged build finished. Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark issue #16971: [SPARK-19573][SQL] Make NaN/null handling consistent in ...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/16971
  
    **[Test build #74908 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/74908/testReport)** for PR 16971 at commit [`b1125fd`](https://github.com/apache/spark/commit/b1125fd1a6610d61689f28b6022b4554146ac878).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark issue #16971: [SPARK-19573][SQL] Make NaN/null handling consistent in ...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/16971
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/74724/
    Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark issue #16971: [SPARK-19573][SQL] Make NaN/null handling consistent in ...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/16971
  
    **[Test build #74908 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/74908/testReport)** for PR 16971 at commit [`b1125fd`](https://github.com/apache/spark/commit/b1125fd1a6610d61689f28b6022b4554146ac878).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request #16971: [SPARK-19573][SQL] Make NaN/null handling consist...

Posted by zhengruifeng <gi...@git.apache.org>.

Github user zhengruifeng commented on a diff in the pull request:

    https://github.com/apache/spark/pull/16971#discussion_r106569149
  
    --- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/util/QuantileSummariesSuite.scala ---
    @@ -55,7 +55,7 @@ class QuantileSummariesSuite extends SparkFunSuite {
       }
     
       private def checkQuantile(quant: Double, data: Seq[Double], summary: QuantileSummaries): Unit = {
    -    val approx = summary.query(quant)
    +    val approx = summary.query(quant).get
    --- End diff --
    
    Ok, I will add a test on empty `data` here


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark issue #16971: [SPARK-19573][SQL] Make NaN/null handling consistent in ...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/16971
  
    **[Test build #73676 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/73676/testReport)** for PR 16971 at commit [`402deb0`](https://github.com/apache/spark/commit/402deb00f34d535be3c7070ff4062b7f4e67101b).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request #16971: [SPARK-19573][SQL] Make NaN/null handling consist...

Posted by gatorsmile <gi...@git.apache.org>.

Github user gatorsmile commented on a diff in the pull request:

    https://github.com/apache/spark/pull/16971#discussion_r102601793
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/DataFrameStatFunctions.scala ---
    @@ -89,18 +89,17 @@ final class DataFrameStatFunctions private[sql](df: DataFrame) {
        *   Note that values greater than 1 are accepted but give the same result as 1.
        * @return the approximate quantiles at the given probabilities of each column
        *
    -   * @note Rows containing any null or NaN values will be removed before calculation. If
    -   *   the dataframe is empty or all rows contain null or NaN, null is returned.
    +   * @note null and NaN values will be removed from the numerical column before calculation. If
    +   *   the dataframe is empty, or all rows in some column contain null or NaN, null is returned.
        *
        * @since 2.2.0
        */
       def approxQuantile(
           cols: Array[String],
           probabilities: Array[Double],
           relativeError: Double): Array[Array[Double]] = {
    -    // TODO: Update NaN/null handling to keep consistent with the single-column version
         try {
    -      StatFunctions.multipleApproxQuantiles(df.select(cols.map(col): _*).na.drop(), cols,
    +      StatFunctions.multipleApproxQuantiles(df.select(cols.map(col): _*), cols,
             probabilities, relativeError).map(_.toArray).toArray
         } catch {
           case e: NoSuchElementException => null
    --- End diff --
    
    In Spark SQL, all the other built-in functions will not throw an exception if the input data set is empty. An empty inupt data set is pretty normal. Returning either `null` or empty `Array` looks ok to me.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request #16971: [SPARK-19573][SQL] Make NaN/null handling consist...

Posted by gatorsmile <gi...@git.apache.org>.

Github user gatorsmile commented on a diff in the pull request:

    https://github.com/apache/spark/pull/16971#discussion_r105588596
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/ApproximatePercentile.scala ---
    @@ -245,7 +245,7 @@ object ApproximatePercentile {
             val result = new Array[Double](percentages.length)
             var i = 0
             while (i < percentages.length) {
    -          result(i) = summaries.query(percentages(i))
    +          result(i) = summaries.query(percentages(i)).get
    --- End diff --
    
    Yes. I think it is impossible to hit it. We need some test cases to ensure it. See my next comment.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark issue #16971: [SPARK-19573][SQL] Make NaN/null handling consistent in ...

Posted by gatorsmile <gi...@git.apache.org>.

Github user gatorsmile commented on the issue:

    https://github.com/apache/spark/pull/16971
  
    ping @zhengruifeng  


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark issue #16971: [SPARK-19573][SQL] Make NaN/null handling consistent in ...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/16971
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/74837/
    Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark issue #16971: [SPARK-19573][SQL] Make NaN/null handling consistent in ...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/16971
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/73344/
    Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark issue #16971: [SPARK-19573][SQL] Make NaN/null handling consistent in ...

Posted by zhengruifeng <gi...@git.apache.org>.

Github user zhengruifeng commented on the issue:

    https://github.com/apache/spark/pull/16971
  
    @gatorsmile @jkbradley 


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark issue #16971: [SPARK-19573][SQL] Make NaN/null handling consistent in ...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/16971
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/73674/
    Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark issue #16971: [SPARK-19573][SQL] Make NaN/null handling consistent in ...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/16971
  
    Merged build finished. Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark issue #16971: [SPARK-19573][SQL] Make NaN/null handling consistent in ...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/16971
  
    **[Test build #74724 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/74724/testReport)** for PR 16971 at commit [`ed6dacd`](https://github.com/apache/spark/commit/ed6dacdb3e3bdfd4e9ccb5c57bf8b4118636b0c6).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark issue #16971: [SPARK-19573][SQL] Make NaN/null handling consistent in ...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/16971
  
    **[Test build #73211 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/73211/testReport)** for PR 16971 at commit [`31346c3`](https://github.com/apache/spark/commit/31346c3afc67974223a4c001aeb171dcf6ed9d13).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark issue #16971: [SPARK-19573][SQL] Make NaN/null handling consistent in ...

Posted by gatorsmile <gi...@git.apache.org>.

Github user gatorsmile commented on the issue:

    https://github.com/apache/spark/pull/16971
  
    retest this please


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request #16971: [SPARK-19573][SQL] Make NaN/null handling consist...

Posted by zhengruifeng <gi...@git.apache.org>.

Github user zhengruifeng commented on a diff in the pull request:

    https://github.com/apache/spark/pull/16971#discussion_r103872828
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/ApproximatePercentile.scala ---
    @@ -245,7 +245,7 @@ object ApproximatePercentile {
             val result = new Array[Double](percentages.length)
             var i = 0
             while (i < percentages.length) {
    -          result(i) = summaries.query(percentages(i))
    +          result(i) = summaries.query(percentages(i)).get
    --- End diff --
    
    It looks like that it is impossible not return `None` here: Since `summaries.count != 0`, the `summaries.sampled` should not be empty, then `None` will not be returned. @thunterdb Is this correct?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request #16971: [SPARK-19573][SQL] Make NaN/null handling consist...

Posted by MLnick <gi...@git.apache.org>.

Github user MLnick commented on a diff in the pull request:

    https://github.com/apache/spark/pull/16971#discussion_r102145412
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/stat/StatFunctions.scala ---
    @@ -54,6 +54,8 @@ object StatFunctions extends Logging {
        *   Note that values greater than 1 are accepted but give the same result as 1.
        *
        * @return for each column, returns the requested approximations
    +   *
    +   * @note null and NaN values will be removed from the numerical column before calculation.
    --- End diff --
    
    I think "will be ignored" is more accurate than "will be removed"


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request #16971: [SPARK-19573][SQL] Make NaN/null handling consist...

Posted by zhengruifeng <gi...@git.apache.org>.

Github user zhengruifeng commented on a diff in the pull request:

    https://github.com/apache/spark/pull/16971#discussion_r102167055
  
    --- Diff: sql/core/src/test/scala/org/apache/spark/sql/DataFrameStatSuite.scala ---
    @@ -214,20 +214,29 @@ class DataFrameStatSuite extends QueryTest with SharedSQLContext {
         val q1 = 0.5
         val q2 = 0.8
         val epsilon = 0.1
    -    val rows = spark.sparkContext.parallelize(Seq(Row(Double.NaN, 1.0), Row(1.0, 1.0),
    +    val rows = spark.sparkContext.parallelize(Seq(Row(Double.NaN, 1.0), Row(1.0, -1.0),
           Row(-1.0, Double.NaN), Row(Double.NaN, Double.NaN), Row(null, null), Row(null, 1.0),
           Row(-1.0, null), Row(Double.NaN, null)))
         val schema = StructType(Seq(StructField("input1", DoubleType, nullable = true),
           StructField("input2", DoubleType, nullable = true)))
         val dfNaN = spark.createDataFrame(rows, schema)
    -    val resNaN = dfNaN.stat.approxQuantile("input1", Array(q1, q2), epsilon)
    -    assert(resNaN.count(_.isNaN) === 0)
    -    assert(resNaN.count(_ == null) === 0)
    +    val resNaN1 = dfNaN.stat.approxQuantile("input1", Array(q1, q2), epsilon)
    +    assert(resNaN1.count(_.isNaN) === 0)
    +    assert(resNaN1.count(_ == null) === 0)
     
    -    val resNaN2 = dfNaN.stat.approxQuantile(Array("input1", "input2"),
    +    val resNaN2 = dfNaN.stat.approxQuantile("input2", Array(q1, q2), epsilon)
    +    assert(resNaN2.count(_.isNaN) === 0)
    +    assert(resNaN2.count(_ == null) === 0)
    +
    +    val resNaNAll = dfNaN.stat.approxQuantile(Array("input1", "input2"),
           Array(q1, q2), epsilon)
    -    assert(resNaN2.flatten.count(_.isNaN) === 0)
    -    assert(resNaN2.flatten.count(_ == null) === 0)
    +    assert(resNaNAll.flatten.count(_.isNaN) === 0)
    +    assert(resNaNAll.flatten.count(_ == null) === 0)
    +
    +    assert(resNaN1(0) === resNaNAll(0)(0))
    +    assert(resNaN1(1) === resNaNAll(0)(1))
    +    assert(resNaN2(0) === resNaNAll(1)(0))
    +    assert(resNaN2(1) === resNaNAll(1)(1))
    --- End diff --
    
    Yes, I create a new column containing only NaN/null.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request #16971: [SPARK-19573][SQL] Make NaN/null handling consist...

Posted by gatorsmile <gi...@git.apache.org>.

Github user gatorsmile commented on a diff in the pull request:

    https://github.com/apache/spark/pull/16971#discussion_r105588798
  
    --- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/util/QuantileSummariesSuite.scala ---
    @@ -55,7 +55,7 @@ class QuantileSummariesSuite extends SparkFunSuite {
       }
     
       private def checkQuantile(quant: Double, data: Seq[Double], summary: QuantileSummaries): Unit = {
    -    val approx = summary.query(quant)
    +    val approx = summary.query(quant).get
    --- End diff --
    
    Add a test case with `summary.count == 0` and improve this helper function to cover it?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark issue #16971: [SPARK-19573][SQL] Make NaN/null handling consistent in ...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/16971
  
    **[Test build #74705 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/74705/testReport)** for PR 16971 at commit [`7bf7db3`](https://github.com/apache/spark/commit/7bf7db3114234dc900a9fd7a9b36615fa0bc1a3f).
     * This patch **fails Spark unit tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark issue #16971: [SPARK-19573][SQL] Make NaN/null handling consistent in ...

Posted by zhengruifeng <gi...@git.apache.org>.

Github user zhengruifeng commented on the issue:

    https://github.com/apache/spark/pull/16971
  
    @thunterdb Good point. I will check the `sampled` in `def query`. 
    
    @MLnick @gatorsmile I perfer empty array as the result for empty dataset or columns that only contains na.
    And, in the case that only some columns only contains na. Current implementation will return null,  and result for all column can not be obtained. I think the result for common columns should be accessable.
    ```
    val rows = spark.sparkContext.parallelize(Seq(Row(Double.NaN, 1.0, Double.NaN),
    +      Row(1.0, -1.0, null), Row(-1.0, Double.NaN, null), Row(Double.NaN, Double.NaN, null),
    +      Row(null, null, Double.NaN), Row(null, 1.0, null), Row(-1.0, null, Double.NaN),
    +      Row(Double.NaN, null, null)))
         val schema = StructType(Seq(StructField("input1", DoubleType, nullable = true),
    +      StructField("input2", DoubleType, nullable = true),
    +      StructField("input3", DoubleType, nullable = true)))
         val dfNaN = spark.createDataFrame(rows, schema)
    val resNaNAll = dfNaN.stat.approxQuantile(Array("input1", "input2", "input3"),
           Array(q1, q2), epsilon)
    ```
    In the returned array, result for columns `input1` and `input2` should be ok, and result for `input3` is empty. `Array(Array(num1, num2), Array(num3, num4), Array())`


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request #16971: [SPARK-19573][SQL] Make NaN/null handling consist...

Posted by MLnick <gi...@git.apache.org>.

Github user MLnick commented on a diff in the pull request:

    https://github.com/apache/spark/pull/16971#discussion_r102145908
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/DataFrameStatFunctions.scala ---
    @@ -89,18 +89,17 @@ final class DataFrameStatFunctions private[sql](df: DataFrame) {
        *   Note that values greater than 1 are accepted but give the same result as 1.
        * @return the approximate quantiles at the given probabilities of each column
        *
    -   * @note Rows containing any null or NaN values will be removed before calculation. If
    -   *   the dataframe is empty or all rows contain null or NaN, null is returned.
    +   * @note null and NaN values will be removed from the numerical column before calculation. If
    +   *   the dataframe is empty, or all rows in some column contain null or NaN, null is returned.
        *
        * @since 2.2.0
        */
       def approxQuantile(
           cols: Array[String],
           probabilities: Array[Double],
           relativeError: Double): Array[Array[Double]] = {
    -    // TODO: Update NaN/null handling to keep consistent with the single-column version
         try {
    -      StatFunctions.multipleApproxQuantiles(df.select(cols.map(col): _*).na.drop(), cols,
    +      StatFunctions.multipleApproxQuantiles(df.select(cols.map(col): _*), cols,
             probabilities, relativeError).map(_.toArray).toArray
         } catch {
           case e: NoSuchElementException => null
    --- End diff --
    
    This went in for the other PR but I still question whether we should be returning `null` here. Is this standard in SparkSQL? What about returning an empty `Array`? cc @gatorsmile 


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request #16971: [SPARK-19573][SQL] Make NaN/null handling consist...

Posted by thunterdb <gi...@git.apache.org>.

Github user thunterdb commented on a diff in the pull request:

    https://github.com/apache/spark/pull/16971#discussion_r106309333
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/ApproximatePercentile.scala ---
    @@ -245,7 +245,7 @@ object ApproximatePercentile {
             val result = new Array[Double](percentages.length)
             var i = 0
             while (i < percentages.length) {
    -          result(i) = summaries.query(percentages(i))
    +          result(i) = summaries.query(percentages(i)).get
    --- End diff --
    
    Yes, this is correct. But you should leave a comment, since it is not obvious.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark issue #16971: [SPARK-19573][SQL] Make NaN/null handling consistent in ...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/16971
  
    Merged build finished. Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark issue #16971: [SPARK-19573][SQL] Make NaN/null handling consistent in ...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/16971
  
    Merged build finished. Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark issue #16971: [SPARK-19573][SQL] Make NaN/null handling consistent in ...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/16971
  
    Merged build finished. Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request #16971: [SPARK-19573][SQL] Make NaN/null handling consist...

Posted by gatorsmile <gi...@git.apache.org>.

Github user gatorsmile commented on a diff in the pull request:

    https://github.com/apache/spark/pull/16971#discussion_r103843868
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/DataFrameStatFunctions.scala ---
    @@ -89,22 +88,17 @@ final class DataFrameStatFunctions private[sql](df: DataFrame) {
        *   Note that values greater than 1 are accepted but give the same result as 1.
        * @return the approximate quantiles at the given probabilities of each column
        *
    -   * @note Rows containing any null or NaN values will be removed before calculation. If
    -   *   the dataframe is empty or all rows contain null or NaN, null is returned.
    +   * @note null and NaN values will be ignored in numerical columns before calculation. For
    +   *   columns only containing null or NaN values, an empty array is returned.
        *
        * @since 2.2.0
        */
       def approxQuantile(
           cols: Array[String],
           probabilities: Array[Double],
           relativeError: Double): Array[Array[Double]] = {
    -    // TODO: Update NaN/null handling to keep consistent with the single-column version
    -    try {
    -      StatFunctions.multipleApproxQuantiles(df.select(cols.map(col): _*).na.drop(), cols,
    -        probabilities, relativeError).map(_.toArray).toArray
    -    } catch {
    -      case e: NoSuchElementException => null
    -    }
    +    StatFunctions.multipleApproxQuantiles(df.select(cols.map(col): _*), cols,
    +      probabilities, relativeError).map(_.toArray).toArray
    --- End diff --
    
    Nit: style issue
    ```
        StatFunctions.multipleApproxQuantiles(
          df.select(cols.map(col): _*),
          cols,
          probabilities,
          relativeError).map(_.toArray).toArray
    ```


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark issue #16971: [SPARK-19573][SQL] Make NaN/null handling consistent in ...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/16971
  
    **[Test build #73033 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/73033/testReport)** for PR 16971 at commit [`d5e79a8`](https://github.com/apache/spark/commit/d5e79a809b1edd91a7e0c1d8046bb8bfec2ba4c9).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark issue #16971: [SPARK-19573][SQL] Make NaN/null handling consistent in ...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/16971
  
    Merged build finished. Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark issue #16971: [SPARK-19573][SQL] Make NaN/null handling consistent in ...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/16971
  
    **[Test build #73744 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/73744/testReport)** for PR 16971 at commit [`2071aae`](https://github.com/apache/spark/commit/2071aaec4cb3805c2cebbf2732f274d881182f3d).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark issue #16971: [SPARK-19573][SQL] Make NaN/null handling consistent in ...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/16971
  
    Merged build finished. Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark issue #16971: [SPARK-19573][SQL] Make NaN/null handling consistent in ...

Posted by thunterdb <gi...@git.apache.org>.

Github user thunterdb commented on the issue:

    https://github.com/apache/spark/pull/16971
  
    @zhengruifeng thanks for looking into this issue. I have one comment above.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark issue #16971: [SPARK-19573][SQL] Make NaN/null handling consistent in ...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/16971
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/74793/
    Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark issue #16971: [SPARK-19573][SQL] Make NaN/null handling consistent in ...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/16971
  
    **[Test build #74705 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/74705/testReport)** for PR 16971 at commit [`7bf7db3`](https://github.com/apache/spark/commit/7bf7db3114234dc900a9fd7a9b36615fa0bc1a3f).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark issue #16971: [SPARK-19573][SQL] Make NaN/null handling consistent in ...

Posted by gatorsmile <gi...@git.apache.org>.

Github user gatorsmile commented on the issue:

    https://github.com/apache/spark/pull/16971
  
    LGTM pending Jenkins 
    
    cc @thunterdb @MLnick 


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request #16971: [SPARK-19573][SQL] Make NaN/null handling consist...

Posted by MLnick <gi...@git.apache.org>.

Github user MLnick commented on a diff in the pull request:

    https://github.com/apache/spark/pull/16971#discussion_r102145538
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/DataFrameStatFunctions.scala ---
    @@ -89,18 +89,17 @@ final class DataFrameStatFunctions private[sql](df: DataFrame) {
        *   Note that values greater than 1 are accepted but give the same result as 1.
        * @return the approximate quantiles at the given probabilities of each column
        *
    -   * @note Rows containing any null or NaN values will be removed before calculation. If
    -   *   the dataframe is empty or all rows contain null or NaN, null is returned.
    +   * @note null and NaN values will be removed from the numerical column before calculation. If
    --- End diff --
    
    Again, "ignored" is slightly better than "removed from"


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark issue #16971: [SPARK-19573][SQL] Make NaN/null handling consistent in ...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/16971
  
    **[Test build #73033 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/73033/testReport)** for PR 16971 at commit [`d5e79a8`](https://github.com/apache/spark/commit/d5e79a809b1edd91a7e0c1d8046bb8bfec2ba4c9).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark issue #16971: [SPARK-19573][SQL] Make NaN/null handling consistent in ...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/16971
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/73744/
    Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request #16971: [SPARK-19573][SQL] Make NaN/null handling consist...

Posted by gatorsmile <gi...@git.apache.org>.

Github user gatorsmile commented on a diff in the pull request:

    https://github.com/apache/spark/pull/16971#discussion_r103844048
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/stat/StatFunctions.scala ---
    @@ -78,7 +81,12 @@ object StatFunctions extends Logging {
         def apply(summaries: Array[QuantileSummaries], row: Row): Array[QuantileSummaries] = {
           var i = 0
           while (i < summaries.length) {
    -        summaries(i) = summaries(i).insert(row.getDouble(i))
    +        if (!row.isNullAt(i)) {
    +          val v = row.getDouble(i)
    +          if (!v.isNaN) {
    +            summaries(i) = summaries(i).insert(v)
    +          }
    --- End diff --
    
    Nit:
    ```
    if (!v.isNaN) summaries(i) = summaries(i).insert(v)
    ```


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark issue #16971: [SPARK-19573][SQL] Make NaN/null handling consistent in ...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/16971
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/73676/
    Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark issue #16971: [SPARK-19573][SQL] Make NaN/null handling consistent in ...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/16971
  
    Merged build finished. Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark issue #16971: [SPARK-19573][SQL] Make NaN/null handling consistent in ...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/16971
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/73033/
    Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request #16971: [SPARK-19573][SQL] Make NaN/null handling consist...

Posted by asfgit <gi...@git.apache.org>.

Github user asfgit closed the pull request at:

    https://github.com/apache/spark/pull/16971


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark issue #16971: [SPARK-19573][SQL] Make NaN/null handling consistent in ...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/16971
  
    **[Test build #73211 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/73211/testReport)** for PR 16971 at commit [`31346c3`](https://github.com/apache/spark/commit/31346c3afc67974223a4c001aeb171dcf6ed9d13).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark issue #16971: [SPARK-19573][SQL] Make NaN/null handling consistent in ...

Posted by gatorsmile <gi...@git.apache.org>.

Github user gatorsmile commented on the issue:

    https://github.com/apache/spark/pull/16971
  
    retest this please


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request #16971: [SPARK-19573][SQL] Make NaN/null handling consist...

Posted by thunterdb <gi...@git.apache.org>.

Github user thunterdb commented on a diff in the pull request:

    https://github.com/apache/spark/pull/16971#discussion_r102589719
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/DataFrameStatFunctions.scala ---
    @@ -89,18 +89,17 @@ final class DataFrameStatFunctions private[sql](df: DataFrame) {
        *   Note that values greater than 1 are accepted but give the same result as 1.
        * @return the approximate quantiles at the given probabilities of each column
        *
    -   * @note Rows containing any null or NaN values will be removed before calculation. If
    -   *   the dataframe is empty or all rows contain null or NaN, null is returned.
    +   * @note null and NaN values will be removed from the numerical column before calculation. If
    +   *   the dataframe is empty, or all rows in some column contain null or NaN, null is returned.
        *
        * @since 2.2.0
        */
       def approxQuantile(
           cols: Array[String],
           probabilities: Array[Double],
           relativeError: Double): Array[Array[Double]] = {
    -    // TODO: Update NaN/null handling to keep consistent with the single-column version
         try {
    -      StatFunctions.multipleApproxQuantiles(df.select(cols.map(col): _*).na.drop(), cols,
    +      StatFunctions.multipleApproxQuantiles(df.select(cols.map(col): _*), cols,
             probabilities, relativeError).map(_.toArray).toArray
         } catch {
           case e: NoSuchElementException => null
    --- End diff --
    
    +1. I tend to think that the result should be `NaN` (following the IEEE convention) or `null` (following scala Option convention). But pending a resolution, I am fine with throwing an exception because it is the most conservative behavior (stopping computations).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark issue #16971: [SPARK-19573][SQL] Make NaN/null handling consistent in ...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/16971
  
    **[Test build #74703 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/74703/testReport)** for PR 16971 at commit [`42c7b25`](https://github.com/apache/spark/commit/42c7b25ca0665bd5187e67e732dba0edb37c05a8).
     * This patch **fails Spark unit tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark issue #16971: [SPARK-19573][SQL] Make NaN/null handling consistent in ...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/16971
  
    **[Test build #73338 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/73338/testReport)** for PR 16971 at commit [`bdf3bf0`](https://github.com/apache/spark/commit/bdf3bf08e30b4fceda0351e62bc03f00e84bb459).
     * This patch **fails Spark unit tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark issue #16971: [SPARK-19573][SQL] Make NaN/null handling consistent in ...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/16971
  
    **[Test build #74703 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/74703/testReport)** for PR 16971 at commit [`42c7b25`](https://github.com/apache/spark/commit/42c7b25ca0665bd5187e67e732dba0edb37c05a8).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark issue #16971: [SPARK-19573][SQL] Make NaN/null handling consistent in ...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/16971
  
    **[Test build #74717 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/74717/testReport)** for PR 16971 at commit [`00d67f7`](https://github.com/apache/spark/commit/00d67f71e8c3eb254eabb63a53efdf675689aeb3).
     * This patch **fails Spark unit tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark issue #16971: [SPARK-19573][SQL] Make NaN/null handling consistent in ...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/16971
  
    **[Test build #73338 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/73338/testReport)** for PR 16971 at commit [`bdf3bf0`](https://github.com/apache/spark/commit/bdf3bf08e30b4fceda0351e62bc03f00e84bb459).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark issue #16971: [SPARK-19573][SQL] Make NaN/null handling consistent in ...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/16971
  
    **[Test build #73344 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/73344/testReport)** for PR 16971 at commit [`e0365b9`](https://github.com/apache/spark/commit/e0365b9c494e2eefaef04057a70877a14b50cb06).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request #16971: [SPARK-19573][SQL] Make NaN/null handling consist...

Posted by MLnick <gi...@git.apache.org>.

Github user MLnick commented on a diff in the pull request:

    https://github.com/apache/spark/pull/16971#discussion_r102146144
  
    --- Diff: sql/core/src/test/scala/org/apache/spark/sql/DataFrameStatSuite.scala ---
    @@ -214,20 +214,29 @@ class DataFrameStatSuite extends QueryTest with SharedSQLContext {
         val q1 = 0.5
         val q2 = 0.8
         val epsilon = 0.1
    -    val rows = spark.sparkContext.parallelize(Seq(Row(Double.NaN, 1.0), Row(1.0, 1.0),
    +    val rows = spark.sparkContext.parallelize(Seq(Row(Double.NaN, 1.0), Row(1.0, -1.0),
           Row(-1.0, Double.NaN), Row(Double.NaN, Double.NaN), Row(null, null), Row(null, 1.0),
           Row(-1.0, null), Row(Double.NaN, null)))
         val schema = StructType(Seq(StructField("input1", DoubleType, nullable = true),
           StructField("input2", DoubleType, nullable = true)))
         val dfNaN = spark.createDataFrame(rows, schema)
    -    val resNaN = dfNaN.stat.approxQuantile("input1", Array(q1, q2), epsilon)
    -    assert(resNaN.count(_.isNaN) === 0)
    -    assert(resNaN.count(_ == null) === 0)
    +    val resNaN1 = dfNaN.stat.approxQuantile("input1", Array(q1, q2), epsilon)
    +    assert(resNaN1.count(_.isNaN) === 0)
    +    assert(resNaN1.count(_ == null) === 0)
     
    -    val resNaN2 = dfNaN.stat.approxQuantile(Array("input1", "input2"),
    +    val resNaN2 = dfNaN.stat.approxQuantile("input2", Array(q1, q2), epsilon)
    +    assert(resNaN2.count(_.isNaN) === 0)
    +    assert(resNaN2.count(_ == null) === 0)
    +
    +    val resNaNAll = dfNaN.stat.approxQuantile(Array("input1", "input2"),
           Array(q1, q2), epsilon)
    -    assert(resNaN2.flatten.count(_.isNaN) === 0)
    -    assert(resNaN2.flatten.count(_ == null) === 0)
    +    assert(resNaNAll.flatten.count(_.isNaN) === 0)
    +    assert(resNaNAll.flatten.count(_ == null) === 0)
    +
    +    assert(resNaN1(0) === resNaNAll(0)(0))
    +    assert(resNaN1(1) === resNaNAll(0)(1))
    +    assert(resNaN2(0) === resNaNAll(1)(0))
    +    assert(resNaN2(1) === resNaNAll(1)(1))
    --- End diff --
    
    Do we need a test for one column all nulls (that it returns null)?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark issue #16971: [SPARK-19573][SQL] Make NaN/null handling consistent in ...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/16971
  
    **[Test build #73674 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/73674/testReport)** for PR 16971 at commit [`8d5941b`](https://github.com/apache/spark/commit/8d5941b8dff12a429a0db833b812b23993c7a888).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request #16971: [SPARK-19573][SQL] Make NaN/null handling consist...

Posted by gatorsmile <gi...@git.apache.org>.

Github user gatorsmile commented on a diff in the pull request:

    https://github.com/apache/spark/pull/16971#discussion_r103844105
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/stat/StatFunctions.scala ---
    @@ -91,7 +99,9 @@ object StatFunctions extends Logging {
         }
         val summaries = df.select(columns: _*).rdd.aggregate(emptySummaries)(apply, merge)
     
    -    summaries.map { summary => probabilities.map(summary.query) }
    +    summaries.map { summary =>
    +      probabilities.flatMap(summary.query)
    +    }
    --- End diff --
    
    Nit: 
    ```
    summaries.map { summary => probabilities.flatMap(summary.query) }
    ```


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark issue #16971: [SPARK-19573][SQL] Make NaN/null handling consistent in ...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/16971
  
    Merged build finished. Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark issue #16971: [SPARK-19573][SQL] Make NaN/null handling consistent in ...

Posted by zhengruifeng <gi...@git.apache.org>.

Github user zhengruifeng commented on the issue:

    https://github.com/apache/spark/pull/16971
  
    ping @MLnick @gatorsmile @thunterdb 


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark issue #16971: [SPARK-19573][SQL] Make NaN/null handling consistent in ...

Posted by gatorsmile <gi...@git.apache.org>.

Github user gatorsmile commented on the issue:

    https://github.com/apache/spark/pull/16971
  
    Since it is close to code freeze, I am first merging this PR. If any more comments, we can resolve them in the follow-up PR. 


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark issue #16971: [SPARK-19573][SQL] Make NaN/null handling consistent in ...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/16971
  
    **[Test build #73344 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/73344/testReport)** for PR 16971 at commit [`e0365b9`](https://github.com/apache/spark/commit/e0365b9c494e2eefaef04057a70877a14b50cb06).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark issue #16971: [SPARK-19573][SQL] Make NaN/null handling consistent in ...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/16971
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/73338/
    Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark issue #16971: [SPARK-19573][SQL] Make NaN/null handling consistent in ...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/16971
  
    **[Test build #74793 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/74793/testReport)** for PR 16971 at commit [`ed6dacd`](https://github.com/apache/spark/commit/ed6dacdb3e3bdfd4e9ccb5c57bf8b4118636b0c6).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark issue #16971: [SPARK-19573][SQL] Make NaN/null handling consistent in ...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/16971
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/73211/
    Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org