Posted to issues@spark.apache.org by "Apache Spark (JIRA)" <ji...@apache.org> on 2016/09/07 21:15:21 UTC

[jira] [Assigned] (SPARK-17439) QuantilesSummaries returns the wrong result after compression

     [ https://issues.apache.org/jira/browse/SPARK-17439?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Apache Spark reassigned SPARK-17439:
------------------------------------

    Assignee: Apache Spark

> QuantilesSummaries returns the wrong result after compression
> -------------------------------------------------------------
>
>                 Key: SPARK-17439
>                 URL: https://issues.apache.org/jira/browse/SPARK-17439
>             Project: Spark
>          Issue Type: Bug
>            Reporter: Tim Hunter
>            Assignee: Apache Spark
>
> [~clockfly] found the following corner case that returns the wrong quantile (off by 1):
> {code}
> test("test QuantileSummaries compression") {
>     var left = new QuantileSummaries(10000, 0.0001)
>     System.out.println("LEFT      RIGHT")
>     System.out.println("====================")
>     (0 to 10).foreach { index =>
>       left = left.insert(index)
>       left = left.compress()
>       var right = new QuantileSummaries(10000, 0.0001)
>       (0 to index).foreach(right.insert(_))
>       right = right.compress()
>       System.out.println(s"${left.query(0.5)}   ${right.query(0.5)}")
>     }
>   }
> {code}
> The result is:
> {code}
> LEFT      RIGHT
> ====================
> 0.0   0.0
> 0.0   1.0
> 0.0   1.0
> 0.0   1.0
> 1.0   2.0
> 1.0   2.0
> 2.0   3.0
> 2.0   3.0
> 3.0   4.0
> 3.0   4.0
> 4.0   5.0
> {code}
> The "LEFT" column shows the output when QuantileSummaries is used in a Window function; the "RIGHT" column shows the expected result. The difference between the two columns is that the "LEFT" column compresses the QuantileSummaries storage after every insert, whereas the "RIGHT" column compresses only once, after all values have been inserted.


