Posted to reviews@spark.apache.org by wzhfy <gi...@git.apache.org> on 2017/10/05 15:50:52 UTC

[GitHub] spark pull request #19438: [SPARK-22208] [SQL] Improve percentile_approx by ...

GitHub user wzhfy opened a pull request:

    https://github.com/apache/spark/pull/19438

    [SPARK-22208] [SQL] Improve percentile_approx by not rounding up targetError and starting from index 0

    ## What changes were proposed in this pull request?
    
    Currently, percentile_approx never returns the first element when the percentile is in (relativeError, 1/N], where relativeError defaults to 1/10000 and N is the total number of elements. Ideally, percentiles in [0, 1/N] should all return the first element as the answer.
    
    For example, given input data 1 to 10, if a user queries the 10% percentile (or an even smaller one), it should return 1, because the first value 1 already reaches 10%. Currently it returns 2.
    
    In the paper, targetError is not rounded up, and the search index starts from 0 instead of 1. By following the paper, we should be able to fix the cases mentioned above.
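    
    The expected answers for the 1 to 10 example can be illustrated with an exact lookup on sorted data. This is only a sketch: `exactPercentile` is a hypothetical helper, not Spark's approximate summary, and it assumes the answer is the first element whose 1-based rank reaches ceil(percent * N / 100).
    
    ```scala
    object ExactPercentileSketch {
      // Hypothetical helper (not a Spark API): exact percentile on a small,
      // fully sorted data set. The percentage is an integer in [0, 100] so
      // that the rank can be computed without floating-point rounding.
      def exactPercentile(data: Seq[Int], percent: Int): Int = {
        require(data.nonEmpty, "data must not be empty")
        require(percent >= 0 && percent <= 100, "percent must be in [0, 100]")
        val sorted = data.sorted
        // ceil(percent * N / 100) in integer arithmetic, clamped to at least 1
        // so that a 0% query still returns the first element.
        val targetRank = math.max(1, (percent * sorted.length + 99) / 100)
        sorted(targetRank - 1)
      }
    }
    ```
    
    With `1 to 10` as input, every `percent` from 0 to 10 yields 1, and 11 is the first value that yields 2.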
    
    ## How was this patch tested?
    
    Added a new test case and fixed existing test cases.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/wzhfy/spark improve_percentile_approx

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/19438.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #19438
    
----
commit 24f8295498a7ad6d2d99ea27a196ccf154165907
Author: Zhenhua Wang <wz...@163.com>
Date:   2017-09-30T16:04:32Z

    return the first element for small percentage

commit 8c8c22dbebe99def6127b49988dfc4f886797bd6
Author: Zhenhua Wang <wz...@163.com>
Date:   2017-10-02T10:24:28Z

    fix test

commit dbc3d47b0a56113032d2a4565180932e4ef26219
Author: Zhenhua Wang <wz...@163.com>
Date:   2017-10-02T14:53:04Z

    fix test

commit 9815ce8e17e34422f8c915d115061a9635abd119
Author: Zhenhua Wang <wz...@163.com>
Date:   2017-10-03T14:51:55Z

    fix pyspark test

commit f2b153800ebdf10999d4a8bb3578101a12f6d631
Author: Zhenhua Wang <wz...@163.com>
Date:   2017-10-05T15:47:27Z

    follow the paper and fix sparkR test

----


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request #19438: [SPARK-22208] [SQL] Improve percentile_approx by ...

Posted by wzhfy <gi...@git.apache.org>.

Github user wzhfy commented on a diff in the pull request:

    https://github.com/apache/spark/pull/19438#discussion_r143481416
  
    --- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/util/QuantileSummariesSuite.scala ---
    @@ -58,7 +58,7 @@ class QuantileSummariesSuite extends SparkFunSuite {
         if (data.nonEmpty) {
           val approx = summary.query(quant).get
           // The rank of the approximation.
    -      val rank = data.count(_ < approx) // has to be <, not <= to be exact
    +      val rank = data.count(_ <= approx)
    --- End diff --
    
    Alternatively, I can compute the rank as follows, and then the tests pass:
    ```
          val minRank = data.count(_ < approx)
          val maxRank = data.count(_ <= approx)
          val rank = if (maxRank - minRank > 1) (minRank + maxRank) / 2 else maxRank
    ```
    What do you think?


---


[GitHub] spark issue #19438: [SPARK-22208] [SQL] Improve percentile_approx by not rou...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/19438
  
    **[Test build #82534 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/82534/testReport)** for PR 19438 at commit [`49262d1`](https://github.com/apache/spark/commit/49262d1bc2bc30c635e727952eda2dc5612b887c).
     * This patch **fails SparkR unit tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.


---


[GitHub] spark pull request #19438: [SPARK-22208] [SQL] Improve percentile_approx by ...

Posted by wzhfy <gi...@git.apache.org>.

Github user wzhfy commented on a diff in the pull request:

    https://github.com/apache/spark/pull/19438#discussion_r143480931
  
    --- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/util/QuantileSummariesSuite.scala ---
    @@ -58,7 +58,7 @@ class QuantileSummariesSuite extends SparkFunSuite {
         if (data.nonEmpty) {
           val approx = summary.query(quant).get
           // The rank of the approximation.
    -      val rank = data.count(_ < approx) // has to be <, not <= to be exact
    +      val rank = data.count(_ <= approx)
    --- End diff --
    
    In one of the test cases, `data.count(_ < approx)` = 39 and `data.count(_ <= approx)` = 40, so the average (39 + 40) / 2 is below 40 (the lower bound) and the test still fails. Besides, the data in the test suite is increasing, decreasing, or random, so a case like [1,2,2,2,2,2,2,2,3] can hardly happen.


---


[GitHub] spark issue #19438: [SPARK-22208] [SQL] Improve percentile_approx by not rou...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/19438
  
    Merged build finished. Test FAILed.


---


[GitHub] spark issue #19438: [SPARK-22208] [SQL] Improve percentile_approx by not rou...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/19438
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/82593/
    Test PASSed.


---


[GitHub] spark pull request #19438: [SPARK-22208] [SQL] Improve percentile_approx by ...

Posted by wzhfy <gi...@git.apache.org>.

Github user wzhfy commented on a diff in the pull request:

    https://github.com/apache/spark/pull/19438#discussion_r143203632
  
    --- Diff: mllib/src/test/scala/org/apache/spark/ml/feature/ImputerSuite.scala ---
    @@ -43,7 +43,7 @@ class ImputerSuite extends SparkFunSuite with MLlibTestSparkContext with Default
           (0, 1.0, 1.0, 1.0),
           (1, 3.0, 3.0, 3.0),
           (2, Double.NaN, Double.NaN, Double.NaN),
    -      (3, -1.0, 2.0, 3.0)
    +      (3, -1.0, 2.0, 1.0)
    --- End diff --
    
    Yes. The data has only two values, 1.0 and 3.0, so after the change the median (50% percentile) is 1.0.


---


[GitHub] spark issue #19438: [SPARK-22208] [SQL] Improve percentile_approx by not rou...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/19438
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/82512/
    Test FAILed.


---


[GitHub] spark issue #19438: [SPARK-22208] [SQL] Improve percentile_approx by not rou...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/19438
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/82479/
    Test FAILed.


---


[GitHub] spark issue #19438: [SPARK-22208] [SQL] Improve percentile_approx by not rou...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/19438
  
    **[Test build #82539 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/82539/testReport)** for PR 19438 at commit [`b3e976f`](https://github.com/apache/spark/commit/b3e976f5209651cbec5f8fd360a3ea43bd620d80).


---


[GitHub] spark pull request #19438: [SPARK-22208] [SQL] Improve percentile_approx by ...

Posted by wzhfy <gi...@git.apache.org>.

Github user wzhfy commented on a diff in the pull request:

    https://github.com/apache/spark/pull/19438#discussion_r143321930
  
    --- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/util/QuantileSummariesSuite.scala ---
    @@ -58,7 +58,7 @@ class QuantileSummariesSuite extends SparkFunSuite {
         if (data.nonEmpty) {
           val approx = summary.query(quant).get
           // The rank of the approximation.
    -      val rank = data.count(_ < approx) // has to be <, not <= to be exact
    +      val rank = data.count(_ <= approx)
    --- End diff --
    
    @srowen @jiangxb1987 @thunterdb I'm not sure I made the right change. Is the previous comment (has to be <, not <= to be exact) still correct? Similar changes were made in `DataFrameStatSuite`.


---


[GitHub] spark issue #19438: [SPARK-22208] [SQL] Improve percentile_approx by not rou...

Posted by srowen <gi...@git.apache.org>.

Github user srowen commented on the issue:

    https://github.com/apache/spark/pull/19438
  
    Getting close, down to some R test failures.
    
    ```
    Failed -------------------------------------------------------------------------
    1. Failure: approxQuantile() on a DataFrame (@test_sparkSQL.R#2747) ------------
    quantiles2[[2]] not equal to list(50, 80).
    Component 1: Mean relative difference: 0.02040816
    Component 2: Mean relative difference: 0.01265823
    ```


---


[GitHub] spark issue #19438: [SPARK-22208] [SQL] Improve percentile_approx by not rou...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/19438
  
    **[Test build #82536 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/82536/testReport)** for PR 19438 at commit [`e5fbdca`](https://github.com/apache/spark/commit/e5fbdca6cf73b2b931cc22f815fc7fa426351bea).


---


[GitHub] spark issue #19438: [SPARK-22208] [SQL] Improve percentile_approx by not rou...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/19438
  
    **[Test build #82512 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/82512/testReport)** for PR 19438 at commit [`aae2769`](https://github.com/apache/spark/commit/aae2769b753db483e0ec2652fa0c4f09fcd14c61).
     * This patch **fails Spark unit tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.


---


[GitHub] spark pull request #19438: [SPARK-22208] [SQL] Improve percentile_approx by ...

Posted by wzhfy <gi...@git.apache.org>.

Github user wzhfy commented on a diff in the pull request:

    https://github.com/apache/spark/pull/19438#discussion_r143481784
  
    --- Diff: sql/core/src/test/scala/org/apache/spark/sql/DataFrameStatSuite.scala ---
    @@ -157,21 +157,21 @@ class DataFrameStatSuite extends QueryTest with SharedSQLContext {
           val error_single = 2 * 1000 * epsilon
           val error_double = 2 * 2000 * epsilon
     
    -      assert(math.abs(single1 - q1 * n) < error_single)
    -      assert(math.abs(double2 - 2 * q2 * n) < error_double)
    -      assert(math.abs(s1 - q1 * n) < error_single)
    -      assert(math.abs(s2 - q2 * n) < error_single)
    -      assert(math.abs(d1 - 2 * q1 * n) < error_double)
    -      assert(math.abs(d2 - 2 * q2 * n) < error_double)
    +      assert(math.abs(single1 - q1 * n) <= error_single)
    --- End diff --
    
    Yes, I can change the data to an odd number of values.


---


[GitHub] spark issue #19438: [SPARK-22208] [SQL] Improve percentile_approx by not rou...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/19438
  
    Merged build finished. Test PASSed.


---


[GitHub] spark pull request #19438: [SPARK-22208] [SQL] Improve percentile_approx by ...

Posted by srowen <gi...@git.apache.org>.

Github user srowen commented on a diff in the pull request:

    https://github.com/apache/spark/pull/19438#discussion_r143324827
  
    --- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/util/QuantileSummariesSuite.scala ---
    @@ -58,7 +58,7 @@ class QuantileSummariesSuite extends SparkFunSuite {
         if (data.nonEmpty) {
           val approx = summary.query(quant).get
           // The rank of the approximation.
    -      val rank = data.count(_ < approx) // has to be <, not <= to be exact
    +      val rank = data.count(_ <= approx)
    --- End diff --
    
    This is trying to recover the rank of the element that was picked as the quantile. I think that's problematic when the input repeats the value chosen as the quantile. Consider estimating the median of [1,2,2,2,2,2,2,2,3]. If the method correctly picks 2, depending on whether you define this test as < or <=, you conclude that it picked rank 1 or 8 of 9 as the median. Any reasonable test of whether that rank is near the expected 5 will fail either way in some cases.
    
    One reasonable fix is to actually use the average of `data.count(_ < approx)` and `data.count(_ <= approx)` as the implied rank that was chosen.
    
    Do you know which test case failed without this change?
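    
    The ambiguity for repeated values can be reproduced directly. A minimal sketch, with illustrative names that are not part of the test suite:
    
    ```scala
    object ImpliedRankSketch {
      val data: Seq[Int] = Seq(1, 2, 2, 2, 2, 2, 2, 2, 3)
      // The value a correct median estimate would return for this input.
      val approx = 2

      // Rank recovered with a strict comparison: only the single 1 is counted.
      val strictRank: Int = data.count(_ < approx)
      // Rank recovered with an inclusive comparison: the 1 and all seven 2s.
      val inclusiveRank: Int = data.count(_ <= approx)
      // Averaging both counts gives the implied rank of the chosen value.
      val impliedRank: Double = (strictRank + inclusiveRank) / 2.0
    }
    ```
    
    Here the strict count is 1, the inclusive count is 8, and their average is 4.5, which is much closer to the expected rank of 5 than either count alone.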


---


[GitHub] spark issue #19438: [SPARK-22208] [SQL] Improve percentile_approx by not rou...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/19438
  
    **[Test build #82479 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/82479/testReport)** for PR 19438 at commit [`f2b1538`](https://github.com/apache/spark/commit/f2b153800ebdf10999d4a8bb3578101a12f6d631).
     * This patch **fails Spark unit tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.


---


[GitHub] spark issue #19438: [SPARK-22208] [SQL] Improve percentile_approx by not rou...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/19438
  
    Merged build finished. Test FAILed.


---


[GitHub] spark issue #19438: [SPARK-22208] [SQL] Improve percentile_approx by not rou...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/19438
  
    Merged build finished. Test FAILed.


---


[GitHub] spark issue #19438: [SPARK-22208] [SQL] Improve percentile_approx by not rou...

Posted by jiangxb1987 <gi...@git.apache.org>.

Github user jiangxb1987 commented on the issue:

    https://github.com/apache/spark/pull/19438
  
    Maybe we can run some of the major test suites locally and update all the results.


---


[GitHub] spark issue #19438: [SPARK-22208] [SQL] Improve percentile_approx by not rou...

Posted by wzhfy <gi...@git.apache.org>.

Github user wzhfy commented on the issue:

    https://github.com/apache/spark/pull/19438
  
    retest this please


---


[GitHub] spark pull request #19438: [SPARK-22208] [SQL] Improve percentile_approx by ...

Posted by srowen <gi...@git.apache.org>.

Github user srowen commented on a diff in the pull request:

    https://github.com/apache/spark/pull/19438#discussion_r143000567
  
    --- Diff: mllib/src/test/scala/org/apache/spark/ml/feature/ImputerSuite.scala ---
    @@ -43,7 +43,7 @@ class ImputerSuite extends SparkFunSuite with MLlibTestSparkContext with Default
           (0, 1.0, 1.0, 1.0),
           (1, 3.0, 3.0, 3.0),
           (2, Double.NaN, Double.NaN, Double.NaN),
    -      (3, -1.0, 2.0, 3.0)
    +      (3, -1.0, 2.0, 1.0)
    --- End diff --
    
    Did this have to change as a result? Just checking that it's intentional.


---


[GitHub] spark issue #19438: [SPARK-22208] [SQL] Improve percentile_approx by not rou...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/19438
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/82539/
    Test PASSed.


---


[GitHub] spark issue #19438: [SPARK-22208] [SQL] Improve percentile_approx by not rou...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/19438
  
    **[Test build #82536 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/82536/testReport)** for PR 19438 at commit [`e5fbdca`](https://github.com/apache/spark/commit/e5fbdca6cf73b2b931cc22f815fc7fa426351bea).
     * This patch **fails SparkR unit tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.


---


[GitHub] spark pull request #19438: [SPARK-22208] [SQL] Improve percentile_approx by ...

Posted by WeichenXu123 <gi...@git.apache.org>.

Github user WeichenXu123 commented on a diff in the pull request:

    https://github.com/apache/spark/pull/19438#discussion_r143492975
  
    --- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/util/QuantileSummariesSuite.scala ---
    @@ -58,7 +58,7 @@ class QuantileSummariesSuite extends SparkFunSuite {
         if (data.nonEmpty) {
           val approx = summary.query(quant).get
           // The rank of the approximation.
    -      val rank = data.count(_ < approx) // has to be <, not <= to be exact
    +      val rank = data.count(_ <= approx)
    --- End diff --
    
    I agree that the rank here is not accurate, especially in a case such as `[1,2,2,2,2,2,2,2,3]`.
    Using the average of `data.count(_ < approx)` and `data.count(_ <= approx)` looks more reasonable.


---


[GitHub] spark pull request #19438: [SPARK-22208] [SQL] Improve percentile_approx by ...

Posted by wzhfy <gi...@git.apache.org>.

Github user wzhfy commented on a diff in the pull request:

    https://github.com/apache/spark/pull/19438#discussion_r143348208
  
    --- Diff: R/pkg/tests/fulltests/test_sparkSQL.R ---
    @@ -2738,7 +2738,7 @@ test_that("sampleBy() on a DataFrame", {
     })
     
     test_that("approxQuantile() on a DataFrame", {
    -  l <- lapply(c(0:99), function(i) { list(i, 99 - i) })
    +  l <- lapply(c(1:100), function(i) { list(i, 101 - i) })
    --- End diff --
    
    For data 0-99, the 0.5 percentile is 50 before this PR and 49 after it. Both 49 and 50 are correct answers for the 0.5 percentile of 0-99.
    So we can fix the test either by changing the data to 1-100, or by changing the expected percentile to 49 if the data stays unchanged (0-99).
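    
    To see why both 49 and 50 are acceptable for 0-99, one can look at the two middle elements of the sorted data. `middlePair` is a hypothetical helper written for this note; it is not part of Spark or of the R test.
    
    ```scala
    object MiddleElementsSketch {
      // Hypothetical helper: returns the (lower, upper) middle elements of an
      // even-sized data set. Either one is a valid exact median candidate.
      def middlePair(data: Seq[Int]): (Int, Int) = {
        require(data.nonEmpty && data.length % 2 == 0, "expects an even, non-zero size")
        val sorted = data.sorted
        val upperIndex = sorted.length / 2
        (sorted(upperIndex - 1), sorted(upperIndex))
      }
    }
    ```
    
    `middlePair(0 to 99)` gives (49, 50), while `middlePair(1 to 100)` gives (50, 51), so with data 1-100 the lower middle element is 50, which is consistent with the value 50 the existing test expects.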


---


[GitHub] spark pull request #19438: [SPARK-22208] [SQL] Improve percentile_approx by ...

Posted by wzhfy <gi...@git.apache.org>.

Github user wzhfy commented on a diff in the pull request:

    https://github.com/apache/spark/pull/19438#discussion_r143748535
  
    --- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/util/QuantileSummariesSuite.scala ---
    @@ -58,7 +58,7 @@ class QuantileSummariesSuite extends SparkFunSuite {
         if (data.nonEmpty) {
           val approx = summary.query(quant).get
           // The rank of the approximation.
    -      val rank = data.count(_ < approx) // has to be <, not <= to be exact
    +      val rank = data.count(_ <= approx)
    --- End diff --
    
    It queries the 50% quantile with relativeError 0.1, so targetError is 0.1*100 = 10 and the expected rank should be in [40, 60].
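    
    The arithmetic behind that interval can be written out as a small sketch. `rankBounds` and `withinBounds` are hypothetical names used only for illustration, and the inclusive comparison is an assumption about how the bounds are meant to be read.
    
    ```scala
    object RankBoundsSketch {
      // Hypothetical helper: the rank interval allowed for a quantile query,
      // centered on quantile * n and widened by targetError = relativeError * n.
      def rankBounds(quantile: Double, relativeError: Double, n: Int): (Double, Double) = {
        val targetError = relativeError * n
        val expectedRank = quantile * n
        (expectedRank - targetError, expectedRank + targetError)
      }

      // Checks whether an observed rank falls inside the allowed interval
      // (both ends treated as inclusive in this sketch).
      def withinBounds(rank: Int, quantile: Double, relativeError: Double, n: Int): Boolean = {
        val (lower, upper) = rankBounds(quantile, relativeError, n)
        rank >= lower && rank <= upper
      }
    }
    ```
    
    For the 50% quantile with relativeError 0.1 and 100 elements, a rank of 40 is accepted while a rank of 39 is rejected.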


---


[GitHub] spark pull request #19438: [SPARK-22208] [SQL] Improve percentile_approx by ...

Posted by asfgit <gi...@git.apache.org>.

Github user asfgit closed the pull request at:

    https://github.com/apache/spark/pull/19438


---


[GitHub] spark issue #19438: [SPARK-22208] [SQL] Improve percentile_approx by not rou...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/19438
  
    **[Test build #82538 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/82538/testReport)** for PR 19438 at commit [`b3e976f`](https://github.com/apache/spark/commit/b3e976f5209651cbec5f8fd360a3ea43bd620d80).


---


[GitHub] spark pull request #19438: [SPARK-22208] [SQL] Improve percentile_approx by ...

Posted by srowen <gi...@git.apache.org>.

Github user srowen commented on a diff in the pull request:

    https://github.com/apache/spark/pull/19438#discussion_r143324966
  
    --- Diff: sql/core/src/test/scala/org/apache/spark/sql/DataFrameStatSuite.scala ---
    @@ -157,21 +157,21 @@ class DataFrameStatSuite extends QueryTest with SharedSQLContext {
           val error_single = 2 * 1000 * epsilon
           val error_double = 2 * 2000 * epsilon
     
    -      assert(math.abs(single1 - q1 * n) < error_single)
    -      assert(math.abs(double2 - 2 * q2 * n) < error_double)
    -      assert(math.abs(s1 - q1 * n) < error_single)
    -      assert(math.abs(s2 - q2 * n) < error_single)
    -      assert(math.abs(d1 - 2 * q1 * n) < error_double)
    -      assert(math.abs(d2 - 2 * q2 * n) < error_double)
    +      assert(math.abs(single1 - q1 * n) <= error_single)
    --- End diff --
    
    Were these failing?
    I think the test is a little off. The input col "singles" is 0-999, not 1-1000. The median, for example, could really be any number between 499 and 500. It might conventionally be defined as 499.5 but given that this is approximate and chooses an integral rank, 499 and 500 are OK, as are 498 and 501.
    
    I think loosening the condition like this is OK; it's coherent. It also strikes me that changing to `Seq.tabulate(n+1)...` above would make the expected values implied here correct and thus also fix it.
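    
    The effect of using n + 1 values can be checked with a small sketch. The sequences below hold plain integers chosen for illustration; the actual columns built in the test are not reproduced here.
    
    ```scala
    object OddSizedInputSketch {
      val n = 1000
      // 0 to 999: an even number of values, so there is no single middle element.
      val evenSized: Seq[Int] = Seq.tabulate(n)(i => i)
      // 0 to 1000: an odd number of values with exactly one middle element.
      val oddSized: Seq[Int] = Seq.tabulate(n + 1)(i => i)

      // The two middle elements of the even-sized input straddle 499.5.
      val evenMiddle: (Int, Int) = (evenSized(n / 2 - 1), evenSized(n / 2))
      // The single middle element of the odd-sized input equals 0.5 * n.
      val oddMedian: Int = oddSized(oddSized.length / 2)
    }
    ```
    
    With n = 1000, the even-sized input has middle elements 499 and 500, whereas the odd-sized input has the single median 500, which matches an expected value of 0.5 * n.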


---


[GitHub] spark issue #19438: [SPARK-22208] [SQL] Improve percentile_approx by not rou...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/19438
  
    Merged build finished. Test FAILed.


---


[GitHub] spark pull request #19438: [SPARK-22208] [SQL] Improve percentile_approx by ...

Posted by srowen <gi...@git.apache.org>.

Github user srowen commented on a diff in the pull request:

    https://github.com/apache/spark/pull/19438#discussion_r142999631
  
    --- Diff: sql/core/src/test/scala/org/apache/spark/sql/ApproximatePercentileQuerySuite.scala ---
    @@ -129,7 +144,7 @@ class ApproximatePercentileQuerySuite extends QueryTest with SharedSQLContext {
         withTempView(table) {
           (1 to 1000).toDF("col").createOrReplaceTempView(table)
           checkAnswer(
    -        spark.sql(s"SELECT percentile_approx(col, array(0.25 + 0.25D), 200 + 800D) FROM $table"),
    +        spark.sql(s"SELECT percentile_approx(col, array(0.25 + 0.25D), 200 + 8000D) FROM $table"),
    --- End diff --
    
    I recall that without the change the answer was "499", which is also really close, so I think this is fine.


---


[GitHub] spark pull request #19438: [SPARK-22208] [SQL] Improve percentile_approx by ...

Posted by wzhfy <gi...@git.apache.org>.

Github user wzhfy commented on a diff in the pull request:

    https://github.com/apache/spark/pull/19438#discussion_r143331525
  
    --- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/util/QuantileSummariesSuite.scala ---
    @@ -58,7 +58,7 @@ class QuantileSummariesSuite extends SparkFunSuite {
         if (data.nonEmpty) {
           val approx = summary.query(quant).get
           // The rank of the approximation.
    -      val rank = data.count(_ < approx) // has to be <, not <= to be exact
    +      val rank = data.count(_ <= approx)
    --- End diff --
    
    Without this change, four test cases failed:
    1. Merging ordered lists with epsi=0.1 and seq=increasing, compression=1000
    2. Merging ordered lists with epsi=0.1 and seq=increasing, compression=10
    3. Merging ordered lists with epsi=0.1 and seq=decreasing, compression=1000
    4. Merging ordered lists with epsi=0.1 and seq=decreasing, compression=10


---


[GitHub] spark pull request #19438: [SPARK-22208] [SQL] Improve percentile_approx by ...

Posted by wzhfy <gi...@git.apache.org>.

Github user wzhfy commented on a diff in the pull request:

    https://github.com/apache/spark/pull/19438#discussion_r143761026
  
    --- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/util/QuantileSummariesSuite.scala ---
    @@ -58,7 +58,7 @@ class QuantileSummariesSuite extends SparkFunSuite {
         if (data.nonEmpty) {
           val approx = summary.query(quant).get
           // The rank of the approximation.
    -      val rank = data.count(_ < approx) // has to be <, not <= to be exact
    +      val rank = data.count(_ <= approx)
    --- End diff --
    
    Yes, I think rounding up the average can solve the problem.
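    
    For the failing case reported earlier in the thread (counts of 39 and 40 against a lower bound of 40), the difference between the truncated and the rounded-up average can be sketched as follows. The object and value names are illustrative only.
    
    ```scala
    object RoundedRankSketch {
      // Counts reported in the thread for one failing test case.
      val minRank = 39 // data.count(_ < approx)
      val maxRank = 40 // data.count(_ <= approx)
      // Lower bound reported in the thread for the 50% quantile query.
      val lowerBound = 40

      // Integer division truncates 39.5 down to 39, below the lower bound.
      val truncatedAverage: Int = (minRank + maxRank) / 2
      // Rounding the average up gives 40, which meets the lower bound.
      val roundedUpAverage: Int = math.ceil((minRank + maxRank) / 2.0).toInt
    }
    ```
    
    The truncated average (39) falls below the lower bound, while the rounded-up average (40) satisfies it.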


---


[GitHub] spark issue #19438: [SPARK-22208] [SQL] Improve percentile_approx by not rou...

Posted by wzhfy <gi...@git.apache.org>.

Github user wzhfy commented on the issue:

    https://github.com/apache/spark/pull/19438
  
    cc @srowen @jiangxb1987 @HyukjinKwon 


---


[GitHub] spark issue #19438: [SPARK-22208] [SQL] Improve percentile_approx by not rou...

Posted by gatorsmile <gi...@git.apache.org>.

Github user gatorsmile commented on the issue:

    https://github.com/apache/spark/pull/19438
  
    Thanks! 
    
    This changes query results, and I don't think it is a regression bug, so I am merging it to master only. If anybody else has a concern, we can still address it in follow-up PRs. Also cc @yanboliang 


---

[GitHub] spark issue #19438: [SPARK-22208] [SQL] Improve percentile_approx by not rou...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/19438
  
    Merged build finished. Test FAILed.


---

[GitHub] spark pull request #19438: [SPARK-22208] [SQL] Improve percentile_approx by ...

Posted by wzhfy <gi...@git.apache.org>.

Github user wzhfy commented on a diff in the pull request:

    https://github.com/apache/spark/pull/19438#discussion_r142981865
  
    --- Diff: sql/core/src/test/scala/org/apache/spark/sql/ApproximatePercentileQuerySuite.scala ---
    @@ -129,7 +144,7 @@ class ApproximatePercentileQuerySuite extends QueryTest with SharedSQLContext {
         withTempView(table) {
           (1 to 1000).toDF("col").createOrReplaceTempView(table)
           checkAnswer(
    -        spark.sql(s"SELECT percentile_approx(col, array(0.25 + 0.25D), 200 + 800D) FROM $table"),
    +        spark.sql(s"SELECT percentile_approx(col, array(0.25 + 0.25D), 200 + 8000D) FROM $table"),
    --- End diff --
    
    Here, the test case is fixed by increasing the accuracy.
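
To illustrate why a larger accuracy argument tightens the result, here is a rough sketch in plain Python (not Spark code). It assumes the relative error is 1/accuracy, in line with the default relativeError of 1/10000 quoted in the PR description, and that the allowed rank deviation is about n times that relative error. `rank_tolerance` is a hypothetical helper introduced only for this illustration.

```python
def rank_tolerance(n: int, accuracy: float) -> float:
    """Approximate rank slack for n rows, assuming relative error = 1 / accuracy."""
    return n / accuracy


n = 1000  # the test view holds the values 1 to 1000
for accuracy in (200 + 800, 200 + 8000):  # old and new accuracy arguments
    # A slack below one rank leaves no room to return a neighbouring element.
    print(accuracy, rank_tolerance(n, accuracy))
```

Under these assumptions the old argument (1000) leaves a slack of a full rank, while the new one (8200) leaves less than one rank.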


---

[GitHub] spark pull request #19438: [SPARK-22208] [SQL] Improve percentile_approx by ...

Posted by wzhfy <gi...@git.apache.org>.

Github user wzhfy commented on a diff in the pull request:

    https://github.com/apache/spark/pull/19438#discussion_r143327081
  
    --- Diff: sql/core/src/test/scala/org/apache/spark/sql/DataFrameStatSuite.scala ---
    @@ -157,21 +157,21 @@ class DataFrameStatSuite extends QueryTest with SharedSQLContext {
           val error_single = 2 * 1000 * epsilon
           val error_double = 2 * 2000 * epsilon
     
    -      assert(math.abs(single1 - q1 * n) < error_single)
    -      assert(math.abs(double2 - 2 * q2 * n) < error_double)
    -      assert(math.abs(s1 - q1 * n) < error_single)
    -      assert(math.abs(s2 - q2 * n) < error_single)
    -      assert(math.abs(d1 - 2 * q1 * n) < error_double)
    -      assert(math.abs(d2 - 2 * q2 * n) < error_double)
    +      assert(math.abs(single1 - q1 * n) <= error_single)
    --- End diff --
    
    Yes, I tried them one by one; all of these checks need to be loosened for the test to pass (if we keep the input unchanged as 0-999).
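
A small sketch in plain Python of why the strict comparison can fail: when the deviation lands exactly on the bound, `<` rejects it while `<=` accepts it. The bound `2 * n * epsilon` follows the `error_single` line in the diff; the concrete numbers below (epsilon = 0.125, an observed value of 750) are made up for illustration and are not taken from the test run.

```python
def within_bound(actual: float, q: float, n: int, epsilon: float, strict: bool) -> bool:
    """Check |actual - q * n| against the bound 2 * n * epsilon."""
    bound = 2 * n * epsilon
    deviation = abs(actual - q * n)
    return deviation < bound if strict else deviation <= bound


# Hypothetical values chosen so the deviation equals the bound exactly (250.0).
print(within_bound(750, 0.5, 1000, 0.125, strict=True))   # strict check rejects
print(within_bound(750, 0.5, 1000, 0.125, strict=False))  # loosened check accepts
```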


---

[GitHub] spark pull request #19438: [SPARK-22208] [SQL] Improve percentile_approx by ...

Posted by felixcheung <gi...@git.apache.org>.

Github user felixcheung commented on a diff in the pull request:

    https://github.com/apache/spark/pull/19438#discussion_r143347310
  
    --- Diff: R/pkg/tests/fulltests/test_sparkSQL.R ---
    @@ -2738,7 +2738,7 @@ test_that("sampleBy() on a DataFrame", {
     })
     
     test_that("approxQuantile() on a DataFrame", {
    -  l <- lapply(c(0:99), function(i) { list(i, 99 - i) })
    +  l <- lapply(c(1:100), function(i) { list(i, 101 - i) })
    --- End diff --
    
    Could you elaborate on how this fixes the test?


---

[GitHub] spark issue #19438: [SPARK-22208] [SQL] Improve percentile_approx by not rou...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/19438
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/82538/


---

[GitHub] spark issue #19438: [SPARK-22208] [SQL] Improve percentile_approx by not rou...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/19438
  
    Merged build finished. Test PASSed.


---

[GitHub] spark issue #19438: [SPARK-22208] [SQL] Improve percentile_approx by not rou...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/19438
  
    **[Test build #82479 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/82479/testReport)** for PR 19438 at commit [`f2b1538`](https://github.com/apache/spark/commit/f2b153800ebdf10999d4a8bb3578101a12f6d631).


---

[GitHub] spark pull request #19438: [SPARK-22208] [SQL] Improve percentile_approx by ...

Posted by srowen <gi...@git.apache.org>.

Github user srowen commented on a diff in the pull request:

    https://github.com/apache/spark/pull/19438#discussion_r143406835
  
    --- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/util/QuantileSummariesSuite.scala ---
    @@ -58,7 +58,7 @@ class QuantileSummariesSuite extends SparkFunSuite {
         if (data.nonEmpty) {
           val approx = summary.query(quant).get
           // The rank of the approximation.
    -      val rank = data.count(_ < approx) // has to be <, not <= to be exact
    +      val rank = data.count(_ <= approx)
    --- End diff --
    
    I still tend to think we need to fix this test case differently, as it has the same potential problem, just in the other direction.


---

[GitHub] spark issue #19438: [SPARK-22208] [SQL] Improve percentile_approx by not rou...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/19438
  
    **[Test build #82593 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/82593/testReport)** for PR 19438 at commit [`1180265`](https://github.com/apache/spark/commit/11802650c938be59163721254193c67bf3949a99).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---

[GitHub] spark pull request #19438: [SPARK-22208] [SQL] Improve percentile_approx by ...

Posted by srowen <gi...@git.apache.org>.

Github user srowen commented on a diff in the pull request:

    https://github.com/apache/spark/pull/19438#discussion_r143001025
  
    --- Diff: R/pkg/tests/fulltests/test_sparkSQL.R ---
    @@ -2538,7 +2538,7 @@ test_that("describe() and summary() on a DataFrame", {
     
       stats2 <- summary(df)
       expect_equal(collect(stats2)[5, "summary"], "25%")
    -  expect_equal(collect(stats2)[5, "age"], "30")
    +  expect_equal(collect(stats2)[5, "age"], "19")
    --- End diff --
    
    This also looks more logical, given that the input contains only the values 19 and 30.


---

[GitHub] spark issue #19438: [SPARK-22208] [SQL] Improve percentile_approx by not rou...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/19438
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/82536/


---

[GitHub] spark issue #19438: [SPARK-22208] [SQL] Improve percentile_approx by not rou...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/19438
  
    **[Test build #82539 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/82539/testReport)** for PR 19438 at commit [`b3e976f`](https://github.com/apache/spark/commit/b3e976f5209651cbec5f8fd360a3ea43bd620d80).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---

[GitHub] spark issue #19438: [SPARK-22208] [SQL] Improve percentile_approx by not rou...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/19438
  
    **[Test build #82538 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/82538/testReport)** for PR 19438 at commit [`b3e976f`](https://github.com/apache/spark/commit/b3e976f5209651cbec5f8fd360a3ea43bd620d80).
     * This patch **fails due to an unknown error code, -9**.
     * This patch merges cleanly.
     * This patch adds no public classes.


---

[GitHub] spark pull request #19438: [SPARK-22208] [SQL] Improve percentile_approx by ...

Posted by srowen <gi...@git.apache.org>.

Github user srowen commented on a diff in the pull request:

    https://github.com/apache/spark/pull/19438#discussion_r143406940
  
    --- Diff: sql/core/src/test/scala/org/apache/spark/sql/DataFrameStatSuite.scala ---
    @@ -157,21 +157,21 @@ class DataFrameStatSuite extends QueryTest with SharedSQLContext {
           val error_single = 2 * 1000 * epsilon
           val error_double = 2 * 2000 * epsilon
     
    -      assert(math.abs(single1 - q1 * n) < error_single)
    -      assert(math.abs(double2 - 2 * q2 * n) < error_double)
    -      assert(math.abs(s1 - q1 * n) < error_single)
    -      assert(math.abs(s2 - q2 * n) < error_single)
    -      assert(math.abs(d1 - 2 * q1 * n) < error_double)
    -      assert(math.abs(d2 - 2 * q2 * n) < error_double)
    +      assert(math.abs(single1 - q1 * n) <= error_single)
    --- End diff --
    
    Does `Seq.tabulate(n + 1)` also work? For this and the R test below, it's less ambiguous to search for the median in a list with an odd number of values.
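
To make the ambiguity concrete, here is a minimal sketch in plain Python. With an even number of values there are two equally central elements, while an odd count leaves exactly one. `median_candidates` is a hypothetical helper, and the inputs 0 to n-1 versus 0 to n are assumed from the discussion rather than copied from the test.

```python
def median_candidates(values):
    """Return the central element(s) of the sorted values as a tuple."""
    ordered = sorted(values)
    n = len(ordered)
    if n % 2 == 1:
        return (ordered[n // 2],)  # odd count: one unambiguous middle element
    return (ordered[n // 2 - 1], ordered[n // 2])  # even count: two candidates


n = 1000
print(median_candidates(range(n)))      # even count: two candidates
print(median_candidates(range(n + 1)))  # odd count: a single candidate
```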


---

[GitHub] spark pull request #19438: [SPARK-22208] [SQL] Improve percentile_approx by ...

Posted by srowen <gi...@git.apache.org>.

Github user srowen commented on a diff in the pull request:

    https://github.com/apache/spark/pull/19438#discussion_r143684083
  
    --- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/util/QuantileSummariesSuite.scala ---
    @@ -58,7 +58,7 @@ class QuantileSummariesSuite extends SparkFunSuite {
         if (data.nonEmpty) {
           val approx = summary.query(quant).get
           // The rank of the approximation.
    -      val rank = data.count(_ < approx) // has to be <, not <= to be exact
    +      val rank = data.count(_ <= approx)
    --- End diff --
    
    @wzhfy that formula is asymmetric, which feels wrong; it may happen to fix this case but could fail some future one. It would be a little more principled to round the average.
    
    Yeah I know that [1,2,2,2,2,2,2,2,3] can't happen in this test, just illustrating a general point.
    
    Hm, what's the case where the quantile is between 39 and 40? Is the input 0-99 in that case? I don't see a test for the 40% quantile, so I'm wondering whether we really do have a problem or are misunderstanding the failure.
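
For illustration, a minimal sketch in plain Python of the asymmetry on the [1,2,2,2,2,2,2,2,3] example. Counting with `<` gives the first position of the value and counting with `<=` gives the last; averaging the two and rounding is one possible reading of "round the average". The helper names are hypothetical and this is not the test code itself.

```python
import math


def rank_bounds(data, approx):
    """Lowest and highest rank the value approx can claim within data."""
    lower = sum(1 for x in data if x < approx)   # counterpart of data.count(_ < approx)
    upper = sum(1 for x in data if x <= approx)  # counterpart of data.count(_ <= approx)
    return lower, upper


def averaged_rank(data, approx):
    """Average of both ranks, rounded up (an assumed choice of rounding)."""
    lower, upper = rank_bounds(data, approx)
    return math.ceil((lower + upper) / 2)


data = [1, 2, 2, 2, 2, 2, 2, 2, 3]
print(rank_bounds(data, 2))    # the two one-sided ranks differ widely
print(averaged_rank(data, 2))  # the averaged rank sits in the middle
```

With heavy ties, either one-sided count lands far from the middle of the run of equal values, which is the general point being made here.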


---

[GitHub] spark pull request #19438: [SPARK-22208] [SQL] Improve percentile_approx by ...

Posted by wzhfy <gi...@git.apache.org>.

Github user wzhfy commented on a diff in the pull request:

    https://github.com/apache/spark/pull/19438#discussion_r143202515
  
    --- Diff: sql/core/src/test/scala/org/apache/spark/sql/ApproximatePercentileQuerySuite.scala ---
    @@ -129,7 +144,7 @@ class ApproximatePercentileQuerySuite extends QueryTest with SharedSQLContext {
         withTempView(table) {
           (1 to 1000).toDF("col").createOrReplaceTempView(table)
           checkAnswer(
    -        spark.sql(s"SELECT percentile_approx(col, array(0.25 + 0.25D), 200 + 800D) FROM $table"),
    +        spark.sql(s"SELECT percentile_approx(col, array(0.25 + 0.25D), 200 + 8000D) FROM $table"),
    --- End diff --
    
    OK, I'll change the expected result to 499.


---

[GitHub] spark issue #19438: [SPARK-22208] [SQL] Improve percentile_approx by not rou...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/19438
  
    **[Test build #82593 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/82593/testReport)** for PR 19438 at commit [`1180265`](https://github.com/apache/spark/commit/11802650c938be59163721254193c67bf3949a99).


---

[GitHub] spark pull request #19438: [SPARK-22208] [SQL] Improve percentile_approx by ...

Posted by srowen <gi...@git.apache.org>.

Github user srowen commented on a diff in the pull request:

    https://github.com/apache/spark/pull/19438#discussion_r143000448
  
    --- Diff: python/pyspark/sql/dataframe.py ---
    @@ -1038,8 +1038,8 @@ def summary(self, *statistics):
             |   mean|               3.5| null|
             | stddev|2.1213203435596424| null|
             |    min|                 2|Alice|
    -        |    25%|                 5| null|
    -        |    50%|                 5| null|
    +        |    25%|                 2| null|
    --- End diff --
    
    Although this looks like a big change, the test data set has only two data elements, with values 2 and 5, so these are pretty equally valid. It's probably more logical that the 25% percentile is 2 if 75% is 5.
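
As a sanity check on the new expectations, here is a sketch in plain Python of the exact nearest-rank rule described in the PR description: return the first element whose cumulative share of the data reaches the requested percentile. This is not Spark's approximate algorithm, and `nearest_rank_percentile` is a hypothetical helper used only for illustration.

```python
import math


def nearest_rank_percentile(values, p):
    """Smallest element whose cumulative fraction of the data is at least p."""
    ordered = sorted(values)
    # Rank is 1-based; clamp to 1 so that p = 0 still returns the first element.
    rank = max(1, math.ceil(p * len(ordered)))
    return ordered[rank - 1]


ages = [2, 5]  # the two values in the docstring's data set
print(nearest_rank_percentile(ages, 0.25))  # lower quartile falls on 2
print(nearest_rank_percentile(ages, 0.75))  # upper quartile falls on 5

# The PR description's example: the 10% percentile of 1..10 is the first element.
print(nearest_rank_percentile(range(1, 11), 0.1))
```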


---

[GitHub] spark issue #19438: [SPARK-22208] [SQL] Improve percentile_approx by not rou...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/19438
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/82534/


---

[GitHub] spark issue #19438: [SPARK-22208] [SQL] Improve percentile_approx by not rou...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/19438
  
    **[Test build #82534 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/82534/testReport)** for PR 19438 at commit [`49262d1`](https://github.com/apache/spark/commit/49262d1bc2bc30c635e727952eda2dc5612b887c).


---

[GitHub] spark issue #19438: [SPARK-22208] [SQL] Improve percentile_approx by not rou...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/19438
  
    **[Test build #82512 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/82512/testReport)** for PR 19438 at commit [`aae2769`](https://github.com/apache/spark/commit/aae2769b753db483e0ec2652fa0c4f09fcd14c61).


---
