Posted to issues@spark.apache.org by "Yang Jie (Jira)" <ji...@apache.org> on 2022/07/13 14:45:00 UTC

[jira] [Comment Edited] (SPARK-39766) For the `arrayOfAnyAsSeq` scenario in `GenericArrayDataBenchmark`, using Scala 2.13 is slower than Scala 2.12

    [ https://issues.apache.org/jira/browse/SPARK-39766?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17566355#comment-17566355 ] 

Yang Jie edited comment on SPARK-39766 at 7/13/22 2:44 PM:
-----------------------------------------------------------

Should we add a separate `GenericArrayData` for Scala 2.13? Then we could rewrite `def this(seq: scala.collection.Seq[Any])` to optimize this scenario, for example:
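{code:java}
def this(seq: scala.collection.Seq[Any]) = this(seq match {
  // immutable.ArraySeq: reuse the wrapped array, no copy
  case ias: scala.collection.immutable.ArraySeq.ofRef[_] =>
    ias.unsafeArray.asInstanceOf[Array[Any]]
  // mutable.ArraySeq: reuse the wrapped array, no copy
  case mas: scala.collection.mutable.ArraySeq.ofRef[_] =>
    mas.array.asInstanceOf[Array[Any]]
  // any other Seq: fall back to toArray, which copies
  case _ => seq.toArray
}) {code}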
With the above change, the results of the `arrayOfAnyAsSeq` scenario are as follows:
{code:java}
OpenJDK 64-Bit Server VM 1.8.0_332-b09 on Linux 5.13.0-1031-azure
Intel(R) Xeon(R) Platinum 8370C CPU @ 2.80GHz
constructor:                              Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
------------------------------------------------------------------------------------------------------------------------
arrayOfAnyAsSeq                                       4              4           0       2491.6           0.4       1.0X
 {code}
*The rate improves from `41.4M/s` to `2491.6M/s`.*
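A minimal standalone sketch of the same pattern match, assuming Scala 2.13 (it can be pasted into a Scala 2.13 REPL). `ArrayDataUtil` and `toAnyArray` are hypothetical names used only for illustration, not Spark APIs. It checks that a `Seq` backed by an `ArraySeq.ofRef` hands back its own array, while any other `Seq` falls back to a copy.

```scala
import scala.collection.{immutable, mutable}

// Hypothetical helper (not a Spark API); relies on Scala 2.13 collections.
object ArrayDataUtil {
  // Returns an Array[Any] for `seq`, reusing the backing array of an ArraySeq when possible.
  def toAnyArray(seq: scala.collection.Seq[Any]): Array[Any] = seq match {
    case ias: immutable.ArraySeq.ofRef[_] =>
      // No copy: share the array wrapped by the immutable ArraySeq
      ias.unsafeArray.asInstanceOf[Array[Any]]
    case mas: mutable.ArraySeq.ofRef[_] =>
      // No copy: share the array wrapped by the mutable ArraySeq
      mas.array.asInstanceOf[Array[Any]]
    case _ =>
      // Any other Seq: copy the elements into a new array
      seq.toArray
  }
}

val immBacking = new Array[Any](4)
val mutBacking = new Array[Any](4)
// Both factories wrap without copying; for an Array[Any] they produce ofRef instances
val imm = immutable.ArraySeq.unsafeWrapArray(immBacking)
val mut = mutable.ArraySeq.make(mutBacking)

println(ArrayDataUtil.toAnyArray(imm) eq immBacking)        // true
println(ArrayDataUtil.toAnyArray(mut) eq mutBacking)        // true
println(ArrayDataUtil.toAnyArray(Vector[Any](1, 2)).length) // 2
```

Because the fast paths share the array instead of copying it, writes to the returned array are visible through the original `ArraySeq`; avoiding that copy is exactly where the speedup comes from.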


> For the `arrayOfAnyAsSeq` scenario in `GenericArrayDataBenchmark`, using Scala 2.13 is slower than Scala 2.12
> -------------------------------------------------------------------------------------------------------------
>
>                 Key: SPARK-39766
>                 URL: https://issues.apache.org/jira/browse/SPARK-39766
>             Project: Spark
>          Issue Type: Improvement
>          Components: SQL
>    Affects Versions: 3.4.0
>            Reporter: Yang Jie
>            Priority: Major
>
> Running `GenericArrayDataBenchmark` with both Scala 2.12 and Scala 2.13 shows that the `arrayOfAnyAsSeq` scenario is slower with Scala 2.13 than with Scala 2.12:
> *Scala 2.12*
> {code:java}
> OpenJDK 64-Bit Server VM 1.8.0_322-b06 on Linux 5.13.0-1021-azure
> Intel(R) Xeon(R) Platinum 8171M CPU @ 2.60GHz
> constructor:                              Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
> ------------------------------------------------------------------------------------------------------------------------
> arrayOfAnyAsSeq                                      25             29           2        395.1           2.5       0.1X{code}
> *Scala 2.13*
> {code:java}
> OpenJDK 64-Bit Server VM 1.8.0_332-b09 on Linux 5.13.0-1031-azure
> Intel(R) Xeon(R) Platinum 8272CL CPU @ 2.60GHz
> constructor:                              Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
> ------------------------------------------------------------------------------------------------------------------------
> arrayOfAnyAsSeq                                     241            243           1         41.4          24.1       0.0X {code}
> The test code is as follows:
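> {code:java}
> benchmark.addCase("arrayOfAnyAsSeq") { _ =>
>   // The Array[Any] is used through a Seq[Any] reference
>   val arr: Seq[Any] = new Array[Any](arraySize)
>   var n = 0
>   while (n < valuesPerIteration) {
>     // Each iteration goes through the Seq-based auxiliary constructor
>     new GenericArrayData(arr)
>     n += 1
>   }
> } {code}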
> The constructor of `GenericArrayData` is as follows:
> {code:java}
> def this(seq: scala.collection.Seq[Any]) = this(seq.toArray) {code}
>  
> The performance difference is due to the following reasons:
> *When using Scala 2.12:*
> The runtime class of `arr` is `s.c.mutable.WrappedArray$ofRef`, whose `toArray` returns `array.asInstanceOf[Array[U]]`, so there is no memory copy.
> *When using Scala 2.13:*
> The runtime class of `arr` is `s.c.immutable.ArraySeq$ofRef`, whose `toArray` calls `IterableOnceOps#toArray`; that implementation copies the elements into a newly allocated array on every call.
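To observe the Scala 2.13 behaviour described in the issue, here is a small sketch that can be pasted into a Scala 2.13 REPL. As an assumption made for clarity, it builds the `immutable.ArraySeq$ofRef` explicitly with `immutable.ArraySeq.unsafeWrapArray` instead of relying on the implicit `Array`-to-`Seq` conversion that the benchmark line triggers, and then checks that `toArray` does not hand back the wrapped array.

```scala
import scala.collection.immutable

// Assumes Scala 2.13, where the default Seq is immutable.Seq.
// Build the same runtime class the benchmark sees, without the implicit conversion.
val backing = new Array[Any](4)
val arr: Seq[Any] = immutable.ArraySeq.unsafeWrapArray(backing)

println(arr.getClass.getName) // scala.collection.immutable.ArraySeq$ofRef

// toArray allocates a fresh array instead of returning `backing`
val copied = arr.toArray
println(copied eq backing) // false
println(copied.length)     // 4
```

Since this allocation and copy happen inside every `new GenericArrayData(arr)` call in the benchmark loop, the per-row cost grows with `arraySize`, which is consistent with the gap between the two tables.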


