Posted to issues@spark.apache.org by "Yang Jie (Jira)" <ji...@apache.org> on 2020/08/24 11:30:00 UTC

[jira] [Created] (SPARK-32690) Spark-32550 affects the performance of some cases

Yang Jie created SPARK-32690:
--------------------------------

             Summary: Spark-32550 affects the performance of some cases
                 Key: SPARK-32690
                 URL: https://issues.apache.org/jira/browse/SPARK-32690
             Project: Spark
          Issue Type: Improvement
          Components: SQL
    Affects Versions: 3.1.0
            Reporter: Yang Jie


I found that [SPARK-32550|https://github.com/apache/spark/pull/29366] affects the performance of some cases. A typical case is the "deterministic cardinality estimation" test in HyperLogLogPlusPlusSuite when rsd is 0.001; the code that becomes significantly slower is the buffer creation at:

[https://github.com/apache/spark/blob/08b951b1cb58cea2c34703e43202fe7c84725c8a/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/aggregate/HyperLogLogPlusPlusSuite.scala#L41]
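
For reference, below is a minimal sketch of the path that slows down: constructing the HyperLogLogPlusPlus estimator and its mutable aggregation buffer with rsd = 0.001, roughly what the suite's helper at the linked line does. The object name, timing wrapper, and println are illustrative assumptions and not part of the suite.

{code:scala}
// Sketch only: builds the HLL++ estimator and its aggregation buffer row.
// With rsd = 0.001 the register array grows as 1/rsd^2, so the buffer row has
// a very large number of long word fields, which amplifies any extra per-field
// work in SpecificInternalRow construction.
import org.apache.spark.sql.catalyst.InternalRow
import org.apache.spark.sql.catalyst.expressions.{BoundReference, SpecificInternalRow}
import org.apache.spark.sql.catalyst.expressions.aggregate.HyperLogLogPlusPlus
import org.apache.spark.sql.types.IntegerType

object HllBufferCreationSketch {
  def main(args: Array[String]): Unit = {
    val rsd = 0.001
    val hll = HyperLogLogPlusPlus(BoundReference(0, IntegerType, nullable = true), rsd)

    val start = System.nanoTime()
    val buffer: InternalRow = new SpecificInternalRow(hll.aggBufferAttributes.map(_.dataType))
    hll.initialize(buffer)
    println(s"create buffer: ${System.nanoTime() - start} ns for ${buffer.numFields} fields")
  }
}
{code}

The interesting number is how long the SpecificInternalRow construction and initialize call take before and after the SPARK-32550 change, which is what the "create" columns in the table below measure.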

 

The results of the comparison before and after SPARK-32550 was merged are as follows:
| |After SPARK-32550: createBuffer|After SPARK-32550: end to end|Before SPARK-32550: create input|Before SPARK-32550: end to end|
|rsd 0.001, n 1000|52715513243|53004810687|195807999|773977677|
|rsd 0.001, n 5000|51881246165|52519358215|13689949|249974855|
|rsd 0.001, n 10000|52234282788|52374639172|14199071|183452846|
|rsd 0.001, n 50000|55503517122|55664035449|15219394|584477125|
|rsd 0.001, n 100000|51862662845|52116774177|19662834|166483678|
|rsd 0.001, n 500000|51619226715|52183189526|178048012|16681330|
|rsd 0.001, n 1000000|54861366981|54976399142|226178708|18826340|
|rsd 0.001, n 5000000|52023602143|52354615149|388173579|15446409|
|rsd 0.001, n 10000000|53008591660|53601392304|533454460|16033032|

--
This message was sent by Atlassian Jira
(v8.3.4#803005)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@spark.apache.org
For additional commands, e-mail: issues-help@spark.apache.org