Posted to reviews@spark.apache.org by GitBox <gi...@apache.org> on 2019/10/05 09:30:26 UTC

[GitHub] [spark] sitegui commented on issue #26029: [SPARK-29336][SQL] Fix the implementation of QuantileSummaries.merge (guarantee that the relativeError will be respected)

URL: https://github.com/apache/spark/pull/26029#issuecomment-538633749
 
 
   [This test](https://github.com/apache/spark/blob/8556710409d9f2fbaee9dbf76a2ea70218316693/sql/core/src/test/scala/org/apache/spark/sql/ApproximatePercentileQuerySuite.scala#L124-L142) breaks, but I guess it was not testing the right thing.
   
   It checks whether a higher accuracy yields a lower error for one specific query. However, the algorithm only guarantees that the maximum error over all queries shrinks as accuracy grows: it is bounded by `count / accuracy`. For an individual query, the measured error can be much lower than that maximum, so it need not decrease monotonically with accuracy.
   
   The test in question runs a single query against a dataset of 1000 elements, with accuracies 1, 10, 100, 1000 and 10000. Before the patch the measured errors were `249, 97, 9, 1, 0`; now they are `249, 40, 0, 1, 0`. Both respect the maximum errors of `1000, 100, 10, 1, 0`. We just got lucky in the third case, so the error at accuracy 1000 (`1`) is now higher than the error at accuracy 100 (`0`).
   
   I'll modify this test case to check that the measured error stays within the maximum bound, which shrinks as accuracy grows. If you have other suggestions, please let me know.
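
   As a rough illustration of the difference between the two checks, here is a minimal Python sketch that uses only the figures quoted above. It is not the Spark test itself: the helper names are hypothetical, and the floor division is an assumption chosen because it reproduces the quoted bounds of `1000, 100, 10, 1, 0`.

```python
# Figures quoted in this comment; everything else is illustrative.
COUNT = 1000
ACCURACIES = [1, 10, 100, 1000, 10000]
ERRORS_BEFORE = [249, 97, 9, 1, 0]  # measured errors before the patch
ERRORS_AFTER = [249, 40, 0, 1, 0]   # measured errors after the patch


def max_error(count, accuracy):
    # Assumption: floor division, which matches the quoted maximum errors.
    return count // accuracy


def is_non_increasing(values):
    # A monotonic check of the kind the existing test is described as doing.
    return all(a >= b for a, b in zip(values, values[1:]))


def within_bounds(errors, count, accuracies):
    # The proposed check: every measured error stays under its own bound.
    return all(e <= max_error(count, a) for e, a in zip(errors, accuracies))


bounds = [max_error(COUNT, a) for a in ACCURACIES]
print("bounds:", bounds)
print("monotonic before/after:",
      is_non_increasing(ERRORS_BEFORE), is_non_increasing(ERRORS_AFTER))
print("within bounds before/after:",
      within_bounds(ERRORS_BEFORE, COUNT, ACCURACIES),
      within_bounds(ERRORS_AFTER, COUNT, ACCURACIES))
```

   With these numbers the monotonic check passes only for the pre-patch errors, while the bound check passes for both, which is the behaviour the algorithm actually guarantees.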
