Posted to issues@spark.apache.org by "Yang Jie (Jira)" <ji...@apache.org> on 2022/07/15 04:49:00 UTC

[jira] [Commented] (SPARK-39652) For CoalescedRDDBenchmark, the test results using Scala 2.13 are slower than Scala 2.12

    [ https://issues.apache.org/jira/browse/SPARK-39652?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17567087#comment-17567087 ] 

Yang Jie commented on SPARK-39652:
----------------------------------

I haven't found a workaround for this issue. We need to wait for [https://github.com/scala/bug/issues/12614] to be fixed and then upgrade to a Scala 2.13 release that includes the fix.

 

> For CoalescedRDDBenchmark, the test results using Scala 2.13 are slower than Scala 2.12
> ---------------------------------------------------------------------------------------
>
>                 Key: SPARK-39652
>                 URL: https://issues.apache.org/jira/browse/SPARK-39652
>             Project: Spark
>          Issue Type: Sub-task
>          Components: Spark Core
>    Affects Versions: 3.4.0
>            Reporter: Yang Jie
>            Priority: Major
>         Attachments: CoalescedRDDBenchmark-212-results.txt, CoalescedRDDBenchmark-213-results.txt
>
>
> Typical results are as follows:
> *2.12.16*
> {code:java}
> OpenJDK 64-Bit Server VM 1.8.0_332-b09 on Linux 5.13.0-1031-azure
> Intel(R) Xeon(R) Platinum 8272CL CPU @ 2.60GHz
> Coalesced RDD:                                Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
> ----------------------------------------------------------------------------------------------------------------------------
> Coalesce Num Partitions: 500 Num Hosts: 1               659            671          19          0.2        6588.2       0.4X
> Coalesce Num Partitions: 1000 Num Hosts: 1             1111           1133          25          0.1       11114.3       0.3X
> Coalesce Num Partitions: 5000 Num Hosts: 1             4573           4580          12          0.0       45727.1       0.1X
> Coalesce Num Partitions: 10000 Num Hosts: 1            8949           9107         240          0.0       89494.7       0.0X{code}
>  
> *2.13.8*
> {code:java}
> OpenJDK 64-Bit Server VM 1.8.0_332-b09 on Linux 5.13.0-1031-azure
> Intel(R) Xeon(R) Platinum 8272CL CPU @ 2.60GHz
> Coalesced RDD:                                Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
> ----------------------------------------------------------------------------------------------------------------------------
> Coalesce Num Partitions: 500 Num Hosts: 1               964           1131         284          0.1        9643.3       0.3X
> Coalesce Num Partitions: 1000 Num Hosts: 1             1732           1742          10          0.1       17318.2       0.2X
> Coalesce Num Partitions: 5000 Num Hosts: 1             7534           7539           4          0.0       75339.8       0.0X
> Coalesce Num Partitions: 10000 Num Hosts: 1           14524          14565          37          0.0      145245.0       0.0X {code}
>  
> From the analysis of the JFR results, the hot path is `ArrayBuffer.min`:
> [https://github.com/apache/spark/blob/ded5981823ac8e8e9339291415d9828dfcc6e062/core/src/main/scala/org/apache/spark/rdd/CoalescedRDD.scala#L224-L225]
>  
> {code:java}
> def getLeastGroupHash(key: String): Option[PartitionGroup] =
>   groupHash.get(key).filter(_.nonEmpty).map(_.min) {code}
>  
> I then wrote a simple new benchmark to test `ArrayBuffer.min` and `ArrayBuffer.max`:
> {code:java}
> private def min(numIters: Int, bufferSize: Int, loops: Int): Unit = {
>   val benchmark = new Benchmark("Array Buffer", loops, output = output)
>   val buffer = new mutable.ArrayBuffer[Int]()
>   (0 until bufferSize).foreach(i => buffer += i)
>   benchmark.addCase(s"buffer min with $bufferSize items", numIters) { _ =>
>     (0 until loops).foreach(_ => buffer.min)
>   }
>   benchmark.addCase(s"buffer sorted head with $bufferSize items", numIters) { _ =>
>     (0 until loops).foreach(_ => buffer.sorted.head)
>   }
>   benchmark.addCase(s"buffer max with $bufferSize items", numIters) { _ =>
>     (0 until loops).foreach(_ => buffer.max)
>   }
>   benchmark.addCase(s"buffer sorted last with $bufferSize items", numIters) { _ =>
>     (0 until loops).foreach(_ => buffer.sorted.last)
>   }
>   benchmark.run()
> }
> override def runBenchmarkSuite(mainArgs: Array[String]): Unit = {
>   val numIters = 3
>   val loops = 20000
>   runBenchmark("ArrayBuffer min and max") {
>     Seq(1000, 10000, 100000).foreach { bufferSize =>
>       min(numIters, bufferSize, loops)
>     }
>   }
> } {code}
>  
> The results are as follows:
> *Scala 2.12*
>  
> {code:java}
> OpenJDK 64-Bit Server VM 1.8.0_332-b09 on Linux 5.13.0-1031-azure
> Intel(R) Xeon(R) Platinum 8272CL CPU @ 2.60GHz
> Array Buffer:                             Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
> ------------------------------------------------------------------------------------------------------------------------
> buffer min with 1000 items                          121            121           0          0.2        6043.9       1.0X
> buffer max with 1000 items                          134            134           0          0.1        6705.5       0.9X
>
> OpenJDK 64-Bit Server VM 1.8.0_332-b09 on Linux 5.13.0-1031-azure
> Intel(R) Xeon(R) Platinum 8272CL CPU @ 2.60GHz
> Array Buffer:                             Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
> ------------------------------------------------------------------------------------------------------------------------
> buffer min with 10000 items                        1272           1272           0          0.0       63601.9       1.0X
> buffer max with 10000 items                        1339           1340           0          0.0       66972.2       0.9X
>
> OpenJDK 64-Bit Server VM 1.8.0_332-b09 on Linux 5.13.0-1031-azure
> Intel(R) Xeon(R) Platinum 8272CL CPU @ 2.60GHz
> Array Buffer:                             Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
> ------------------------------------------------------------------------------------------------------------------------
> buffer min with 100000 items                      12785          12788           3          0.0      639232.4       1.0X
> buffer max with 100000 items                      13433          13434           1          0.0      671654.2       1.0X {code}
>  
> *Scala 2.13*
> {code:java}
> OpenJDK 64-Bit Server VM 1.8.0_332-b09 on Linux 5.13.0-1031-azure
> Intel(R) Xeon(R) Platinum 8272CL CPU @ 2.60GHz
> Array Buffer:                             Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
> ------------------------------------------------------------------------------------------------------------------------
> buffer min with 1000 items                          273            273           0          0.1       13646.5       1.0X
> buffer max with 1000 items                          162            162           0          0.1        8076.9       1.7X
>
> OpenJDK 64-Bit Server VM 1.8.0_332-b09 on Linux 5.13.0-1031-azure
> Intel(R) Xeon(R) Platinum 8272CL CPU @ 2.60GHz
> Array Buffer:                             Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
> ------------------------------------------------------------------------------------------------------------------------
> buffer min with 10000 items                        1676           1676           0          0.0       83782.1       1.0X
> buffer max with 10000 items                        1610           1610           0          0.0       80485.5       1.0X
>
> OpenJDK 64-Bit Server VM 1.8.0_332-b09 on Linux 5.13.0-1031-azure
> Intel(R) Xeon(R) Platinum 8272CL CPU @ 2.60GHz
> Array Buffer:                             Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
> ------------------------------------------------------------------------------------------------------------------------
> buffer min with 100000 items                      16849          16856          10          0.0      842469.3       1.0X
> buffer max with 100000 items                      15513          15515           4          0.0      775641.0       1.1X{code}
>  
> I have already filed an issue with Scala: [https://github.com/scala/bug/issues/12614]
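
For anyone who wants to check the `ArrayBuffer.min` / `ArrayBuffer.max` gap without a Spark checkout, the following is a minimal sketch that does not use Spark's `Benchmark` class. The object name `ArrayBufferMinMaxSketch` and the helpers `fill` and `timeMs` are hypothetical names made up for this example, and plain `System.nanoTime` timing is only a rough approximation of what the benchmark in the description measures, so the absolute numbers will not match the tables above. Compile and run the same file under Scala 2.12 and 2.13 to compare.

```scala
import scala.collection.mutable

// Hypothetical stand-alone harness; not part of Spark or of the original report.
object ArrayBufferMinMaxSketch {

  /** Builds a buffer holding 0 until size, the same way the benchmark in the description does. */
  def fill(size: Int): mutable.ArrayBuffer[Int] = {
    val buffer = new mutable.ArrayBuffer[Int]()
    (0 until size).foreach(i => buffer += i)
    buffer
  }

  /** Evaluates `body` `loops` times and returns the elapsed wall-clock time in milliseconds. */
  def timeMs(loops: Int)(body: => Int): Long = {
    var sink = 0 // fold every result into `sink` so the calls are not dead code
    val start = System.nanoTime()
    var i = 0
    while (i < loops) {
      sink ^= body
      i += 1
    }
    val elapsedMs = (System.nanoTime() - start) / 1000000L
    if (sink == Int.MinValue) println(sink) // keeps `sink` observable
    elapsedMs
  }

  def main(args: Array[String]): Unit = {
    println(s"Scala ${scala.util.Properties.versionNumberString}")
    val loops = 20000
    Seq(1000, 10000).foreach { bufferSize =>
      val buffer = fill(bufferSize)
      // One untimed pass of each call as a crude JIT warm-up.
      timeMs(loops)(buffer.min)
      timeMs(loops)(buffer.max)
      val minMs = timeMs(loops)(buffer.min)
      val maxMs = timeMs(loops)(buffer.max)
      println(s"buffer min with $bufferSize items: $minMs ms")
      println(s"buffer max with $bufferSize items: $maxMs ms")
    }
  }
}
```

A single warm-up pass and wall-clock timing are a deliberate simplification; a JMH-based benchmark would give more trustworthy numbers if the difference needs to be quantified precisely.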


