Posted to issues@spark.apache.org by "Apache Spark (Jira)" <ji...@apache.org> on 2020/08/24 14:16:00 UTC

[jira] [Assigned] (SPARK-32690) Spark-32550 affects the performance of some cases

     [ https://issues.apache.org/jira/browse/SPARK-32690?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Apache Spark reassigned SPARK-32690:
------------------------------------

    Assignee:     (was: Apache Spark)

> Spark-32550 affects the performance of some cases
> -------------------------------------------------
>
>                 Key: SPARK-32690
>                 URL: https://issues.apache.org/jira/browse/SPARK-32690
>             Project: Spark
>          Issue Type: Improvement
>          Components: SQL
>    Affects Versions: 3.1.0
>            Reporter: Yang Jie
>            Priority: Major
>
> I found that [SPARK-32550|https://github.com/apache/spark/pull/29366] slows down some cases. A typical one is "deterministic cardinality estimation" in 
> HyperLogLogPlusPlusSuite when rsd is 0.001. The code that became significantly slower is:
>  
> [https://github.com/apache/spark/blob/08b951b1cb58cea2c34703e43202fe7c84725c8a/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/aggregate/HyperLogLogPlusPlusSuite.scala#L41]
>  
> {code:java}
> // Allocates one buffer slot per aggregation buffer attribute, then initializes it.
> def createBuffer(hll: HyperLogLogPlusPlus): InternalRow = {
>   val buffer = new SpecificInternalRow(hll.aggBufferAttributes.map(_.dataType))
>   hll.initialize(buffer)
>   buffer
> }
> {code}
>  
> The size of "hll.aggBufferAttributes" in this case is 209716. The results before and after SPARK-32550 was merged are as follows:
> | |After SPARK-32550 createBuffer|After SPARK-32550 end to end|Before SPARK-32550 create input|Before SPARK-32550 end to end|
> |rsd 0.001, n 1000|52715513243|53004810687|195807999|773977677|
> |rsd 0.001, n 5000|51881246165|52519358215|13689949|249974855|
> |rsd 0.001, n 10000|52234282788|52374639172|14199071|183452846|
> |rsd 0.001, n 50000|55503517122|55664035449|15219394|584477125|
> |rsd 0.001, n 100000|51862662845|52116774177|19662834|166483678|
> |rsd 0.001, n 500000|51619226715|52183189526|178048012|16681330|
> |rsd 0.001, n 1000000|54861366981|54976399142|226178708|18826340|
> |rsd 0.001, n 5000000|52023602143|52354615149|388173579|15446409|
> |rsd 0.001, n 10000000|53008591660|53601392304|533454460|16033032|
>  
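The figure of 209716 buffer attributes quoted above can be cross-checked with a small sketch. It assumes that the HyperLogLog++ buffer packs 6-bit registers into 64-bit words and that the precision is derived from rsd as ceil(2 * log2(1.106 / rsd)). Both are assumptions about the implementation rather than something stated in this report, and the object and method names below are hypothetical.

```scala
// Sketch only: names are hypothetical, and the formula and constants are
// assumptions about how the HLL++ buffer size follows from rsd.
object HllBufferSize {
  // Assumed layout: 6-bit registers packed into 64-bit words.
  val WordSize = 64
  val RegisterSize = 6
  val RegistersPerWord: Int = WordSize / RegisterSize // integer division: 10

  /** Precision p (number of index bits) for a target relative standard deviation. */
  def precision(rsd: Double): Int =
    math.ceil(2.0d * math.log(1.106d / rsd) / math.log(2.0d)).toInt

  /** Number of Long words (buffer slots) needed to hold 2^p registers. */
  def numWords(rsd: Double): Int =
    (1 << precision(rsd)) / RegistersPerWord + 1

  def main(args: Array[String]): Unit = {
    println(precision(0.001)) // 21
    println(numWords(0.001))  // 209716
  }
}
```

Under these assumptions, rsd 0.001 gives p = 21, so 2^21 = 2097152 registers fit into 209716 words, which matches the size reported for this test case.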



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
