Posted to issues@spark.apache.org by "Sean Owen (JIRA)" <ji...@apache.org> on 2016/09/11 08:59:20 UTC

[jira] [Resolved] (SPARK-17439) QuantilesSummaries returns the wrong result after compression

     [ https://issues.apache.org/jira/browse/SPARK-17439?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sean Owen resolved SPARK-17439.
-------------------------------
       Resolution: Fixed
         Assignee: Tim Hunter
    Fix Version/s: 2.1.0
                   2.0.1

Resolved by https://github.com/apache/spark/pull/15002

> QuantilesSummaries returns the wrong result after compression
> -------------------------------------------------------------
>
>                 Key: SPARK-17439
>                 URL: https://issues.apache.org/jira/browse/SPARK-17439
>             Project: Spark
>          Issue Type: Bug
>            Reporter: Tim Hunter
>            Assignee: Tim Hunter
>              Labels: correctness
>             Fix For: 2.0.1, 2.1.0
>
>
> [~clockfly] found the following corner case that returns the wrong quantile (off by 1):
> {code}
> test("test QuantileSummaries compression") {
>   var left = new QuantileSummaries(10000, 0.0001)
>   System.out.println("LEFT      RIGHT")
>   System.out.println("====================")
>   (0 to 10).foreach { index =>
>     left = left.insert(index)
>     left = left.compress()
>     var right = new QuantileSummaries(10000, 0.0001)
>     (0 to index).foreach(right.insert(_))
>     right = right.compress()
>     System.out.println(s"${left.query(0.5)}   ${right.query(0.5)}")
>   }
> }
> {code}
> The result is:
> {code}
> LEFT      RIGHT
> ====================
> 0.0   0.0
> 0.0   1.0
> 0.0   1.0
> 0.0   1.0
> 1.0   2.0
> 1.0   2.0
> 2.0   3.0
> 2.0   3.0
> 3.0   4.0
> 3.0   4.0
> 4.0   5.0
> {code}
> The "LEFT" column shows the output when QuantileSummaries is used in a window function; the "RIGHT" column shows the expected result. The difference between the two is that the "LEFT" column compresses the QuantileSummaries storage after every insert, whereas the "RIGHT" column inserts all values first and compresses once.
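
To make the "off by 1" claim concrete, the reported rows can be checked mechanically. The following is only a sketch that works on the numbers copied from the output above; it does not call Spark, and the names `OffByOneCheck`, `reported`, `gaps`, and `mismatched` are hypothetical, introduced for illustration.

```scala
// Sketch only: operates on the values copied from the reported output,
// not on a live QuantileSummaries instance.
object OffByOneCheck {
  // (LEFT, RIGHT) pairs for iteration index 0..10, as reported in the issue.
  val reported: Seq[(Double, Double)] = Seq(
    (0.0, 0.0), (0.0, 1.0), (0.0, 1.0), (0.0, 1.0),
    (1.0, 2.0), (1.0, 2.0), (2.0, 3.0), (2.0, 3.0),
    (3.0, 4.0), (3.0, 4.0), (4.0, 5.0)
  )

  // RIGHT minus LEFT for each row.
  val gaps: Seq[Double] = reported.map { case (left, right) => right - left }

  // Indices of the rows where the two columns disagree.
  val mismatched: Seq[Int] =
    gaps.zipWithIndex.collect { case (gap, i) if gap != 0.0 => i }

  def main(args: Array[String]): Unit = {
    println(s"gaps: ${gaps.mkString(", ")}")
    println(s"mismatched rows: ${mismatched.mkString(", ")}")
  }
}
```

In the reported data the two columns agree only on the first row; every later row differs by exactly 1.0, which is the discrepancy the issue describes.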


