Posted to issues@spark.apache.org by "Tim Hunter (JIRA)" <ji...@apache.org> on 2016/09/07 21:04:22 UTC

[jira] [Created] (SPARK-17439) QuantilesSummaries returns the wrong result after compression

Tim Hunter created SPARK-17439:
----------------------------------

             Summary: QuantilesSummaries returns the wrong result after compression
                 Key: SPARK-17439
                 URL: https://issues.apache.org/jira/browse/SPARK-17439
             Project: Spark
          Issue Type: Bug
            Reporter: Tim Hunter


[~clockfly] found the following corner case that returns the wrong quantile (off by 1):

{code}
test("test QuantileSummaries compression") {
  var left = new QuantileSummaries(10000, 0.0001)
  println("LEFT      RIGHT")
  println("====================")
  (0 to 10).foreach { index =>
    // LEFT: insert one value at a time and compress after every insert
    left = left.insert(index)
    left = left.compress()

    // RIGHT: insert 0..index into a fresh summary (keeping the summary that
    // insert returns), then compress once at the end
    val right = (0 to index)
      .foldLeft(new QuantileSummaries(10000, 0.0001))((s, x) => s.insert(x))
      .compress()
    println(s"${left.query(0.5)}   ${right.query(0.5)}")
  }
}
{code}

The result is:
{code}
LEFT      RIGHT
====================
0.0   0.0
0.0   1.0
0.0   1.0
0.0   1.0
1.0   2.0
1.0   2.0
2.0   3.0
2.0   3.0
3.0   4.0
3.0   4.0
4.0   5.0
{code}


The "LEFT" column is the output when QuantileSummaries is used in a window function; the "RIGHT" column is the expected result. The only difference between the two columns is that "LEFT" compresses the summary after every insert (intermediate compression), while "RIGHT" inserts all values first and compresses only once.
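To make "intermediate compression" concrete, the sketch below implements a minimal, textbook-style Greenwald-Khanna (GK) summary in Scala. It is not Spark's QuantileSummaries: the object GKSketch, the types Entry and Summary, and the helpers incremental and oneShot are hypothetical, and the merge rule and the target rank ceil(q * n) are assumptions that may differ from Spark's code (so its medians for even n need not match the "RIGHT" column). In this textbook form two adjacent tuples merge only when g_i + g_{i+1} + delta_{i+1} <= 2 * eps * n. Every g is at least 1, so nothing can merge while 2 * eps * n < 2; with eps = 0.0001 that holds for any n below 10,000. For the 11 values in the test, compression should therefore be a no-op, and compressing after every insert (the "LEFT" pattern) should give exactly the same answer as compressing once (the "RIGHT" pattern).

```scala
import scala.collection.mutable.ArrayBuffer

// Hypothetical illustration of a textbook GK summary, not Spark's QuantileSummaries API.
object GKSketch {
  // One GK tuple: a retained sample `value`, `g` = rmin(value) - rmin(previous value),
  // and `delta` = rmax(value) - rmin(value).
  final case class Entry(value: Double, g: Long, delta: Long)

  final case class Summary(eps: Double, tuples: Vector[Entry], count: Long) {
    def insert(x: Double): Summary = {
      val idx = tuples.indexWhere(_.value > x) match {
        case -1 => tuples.length
        case i  => i
      }
      // A new minimum or maximum has an exact rank; otherwise its rank
      // is uncertain by up to floor(2 * eps * n).
      val delta =
        if (idx == 0 || idx == tuples.length) 0L
        else math.floor(2 * eps * count).toLong
      copy(tuples = tuples.patch(idx, Seq(Entry(x, 1L, delta)), 0), count = count + 1)
    }

    // Fold a tuple into its right neighbour when the merged tuple still satisfies
    // g + delta <= 2 * eps * n. The first tuple (the minimum) is never merged away.
    def compress(): Summary = {
      val threshold = 2 * eps * count
      val out = ArrayBuffer.empty[Entry] // already-processed suffix, stored right to left
      for (i <- tuples.indices.reverse) {
        val e = tuples(i)
        if (i > 0 && out.nonEmpty && e.g + out.last.g + out.last.delta <= threshold) {
          val right = out.last
          out(out.length - 1) = right.copy(g = right.g + e.g)
        } else {
          out += e
        }
      }
      copy(tuples = out.reverse.toVector)
    }

    // Return a retained value whose rank interval [rmin, rmax] lies within
    // eps * n of the (assumed) target rank ceil(q * n).
    def query(q: Double): Double = {
      require(tuples.nonEmpty, "query on an empty summary")
      val rank = math.max(1L, math.ceil(q * count).toLong)
      val slack = eps * count
      val rMins = tuples.scanLeft(0L)(_ + _.g).tail
      tuples.indices
        .find(i => rank - slack <= rMins(i) && rMins(i) + tuples(i).delta <= rank + slack)
        .map(i => tuples(i).value)
        .getOrElse(tuples.last.value)
    }
  }

  def empty(eps: Double): Summary = Summary(eps, Vector.empty, 0L)

  // "LEFT" pattern: compress after every insert.
  def incremental(eps: Double, xs: Seq[Double]): Summary =
    xs.foldLeft(empty(eps))((s, x) => s.insert(x).compress())

  // "RIGHT" pattern: insert everything, then compress once.
  def oneShot(eps: Double, xs: Seq[Double]): Summary =
    xs.foldLeft(empty(eps))((s, x) => s.insert(x)).compress()
}
```

With this sketch, GKSketch.incremental(1e-4, (0 to 10).map(_.toDouble)).query(0.5) and GKSketch.oneShot(1e-4, (0 to 10).map(_.toDouble)).query(0.5) both return 5.0. Tuples are kept in an immutable Vector, so each insert is linear in the summary size; that keeps the sketch short at the cost of speed.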




