Posted to issues@spark.apache.org by "Yang Jie (Jira)" <ji...@apache.org> on 2022/07/15 04:49:00 UTC
[jira] [Commented] (SPARK-39652) For CoalescedRDDBenchmark, the test results using Scala 2.13 are slower than Scala 2.12
[ https://issues.apache.org/jira/browse/SPARK-39652?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17567087#comment-17567087 ]
Yang Jie commented on SPARK-39652:
----------------------------------
For this issue, I haven't found a workaround. We need to wait for the fix of [https://github.com/scala/bug/issues/12614] and then upgrade to a new Scala 2.13 version.
> For CoalescedRDDBenchmark, the test results using Scala 2.13 are slower than Scala 2.12
> ---------------------------------------------------------------------------------------
>
> Key: SPARK-39652
> URL: https://issues.apache.org/jira/browse/SPARK-39652
> Project: Spark
> Issue Type: Sub-task
> Components: Spark Core
> Affects Versions: 3.4.0
> Reporter: Yang Jie
> Priority: Major
> Attachments: CoalescedRDDBenchmark-212-results.txt, CoalescedRDDBenchmark-213-results.txt
>
>
> Typical results are as follows:
> *2.12.16*
> {code:java}
> OpenJDK 64-Bit Server VM 1.8.0_332-b09 on Linux 5.13.0-1031-azure
> Intel(R) Xeon(R) Platinum 8272CL CPU @ 2.60GHz
> Coalesced RDD:                              Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
> --------------------------------------------------------------------------------------------------------------------------
> Coalesce Num Partitions: 500 Num Hosts: 1             659            671          19          0.2        6588.2       0.4X
> Coalesce Num Partitions: 1000 Num Hosts: 1           1111           1133          25          0.1       11114.3       0.3X
> Coalesce Num Partitions: 5000 Num Hosts: 1           4573           4580          12          0.0       45727.1       0.1X
> Coalesce Num Partitions: 10000 Num Hosts: 1          8949           9107         240          0.0       89494.7       0.0X
> {code}
>
> *2.13.8*
> {code:java}
> OpenJDK 64-Bit Server VM 1.8.0_332-b09 on Linux 5.13.0-1031-azure
> Intel(R) Xeon(R) Platinum 8272CL CPU @ 2.60GHz
> Coalesced RDD:                              Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
> --------------------------------------------------------------------------------------------------------------------------
> Coalesce Num Partitions: 500 Num Hosts: 1             964           1131         284          0.1        9643.3       0.3X
> Coalesce Num Partitions: 1000 Num Hosts: 1           1732           1742          10          0.1       17318.2       0.2X
> Coalesce Num Partitions: 5000 Num Hosts: 1           7534           7539           4          0.0       75339.8       0.0X
> Coalesce Num Partitions: 10000 Num Hosts: 1         14524          14565          37          0.0      145245.0       0.0X
> {code}
>
> From the analysis of the JFR results, the hot path is `ArrayBuffer.min`:
> [https://github.com/apache/spark/blob/ded5981823ac8e8e9339291415d9828dfcc6e062/core/src/main/scala/org/apache/spark/rdd/CoalescedRDD.scala#L224-L225]
>
> {code:java}
> def getLeastGroupHash(key: String): Option[PartitionGroup] =
>   groupHash.get(key).filter(_.nonEmpty).map(_.min)
> {code}
>
> I then wrote a simple new benchmark to test `ArrayBuffer.min` and `ArrayBuffer.max`:
> {code:java}
> private def min(numIters: Int, bufferSize: Int, loops: Int): Unit = {
>   val benchmark = new Benchmark("Array Buffer", loops, output = output)
>   val buffer = new mutable.ArrayBuffer[Int]()
>   (0 until bufferSize).foreach(i => buffer += i)
>   benchmark.addCase(s"buffer min with $bufferSize items", numIters) { _ =>
>     (0 until loops).foreach(_ => buffer.min)
>   }
>   benchmark.addCase(s"buffer sorted head with $bufferSize items", numIters) { _ =>
>     (0 until loops).foreach(_ => buffer.sorted.head)
>   }
>   benchmark.addCase(s"buffer max with $bufferSize items", numIters) { _ =>
>     (0 until loops).foreach(_ => buffer.max)
>   }
>   benchmark.addCase(s"buffer sorted last with $bufferSize items", numIters) { _ =>
>     (0 until loops).foreach(_ => buffer.sorted.last)
>   }
>   benchmark.run()
> }
>
> override def runBenchmarkSuite(mainArgs: Array[String]): Unit = {
>   val numIters = 3
>   val loops = 20000
>   runBenchmark("ArrayBuffer min and max") {
>     Seq(1000, 10000, 100000).foreach { bufferSize =>
>       min(numIters, bufferSize, loops)
>     }
>   }
> }
> {code}
>
> The results are as follows:
> *Scala 2.12*
>
> {code:java}
> OpenJDK 64-Bit Server VM 1.8.0_332-b09 on Linux 5.13.0-1031-azure
> Intel(R) Xeon(R) Platinum 8272CL CPU @ 2.60GHz
> Array Buffer:                               Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
> --------------------------------------------------------------------------------------------------------------------------
> buffer min with 1000 items                            121            121           0          0.2        6043.9       1.0X
> buffer max with 1000 items                            134            134           0          0.1        6705.5       0.9X
>
> OpenJDK 64-Bit Server VM 1.8.0_332-b09 on Linux 5.13.0-1031-azure
> Intel(R) Xeon(R) Platinum 8272CL CPU @ 2.60GHz
> Array Buffer:                               Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
> --------------------------------------------------------------------------------------------------------------------------
> buffer min with 10000 items                          1272           1272           0          0.0       63601.9       1.0X
> buffer max with 10000 items                          1339           1340           0          0.0       66972.2       0.9X
>
> OpenJDK 64-Bit Server VM 1.8.0_332-b09 on Linux 5.13.0-1031-azure
> Intel(R) Xeon(R) Platinum 8272CL CPU @ 2.60GHz
> Array Buffer:                               Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
> --------------------------------------------------------------------------------------------------------------------------
> buffer min with 100000 items                        12785          12788           3          0.0      639232.4       1.0X
> buffer max with 100000 items                        13433          13434           1          0.0      671654.2       1.0X
> {code}
>
> *Scala 2.13*
> {code:java}
> OpenJDK 64-Bit Server VM 1.8.0_332-b09 on Linux 5.13.0-1031-azure
> Intel(R) Xeon(R) Platinum 8272CL CPU @ 2.60GHz
> Array Buffer:                               Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
> --------------------------------------------------------------------------------------------------------------------------
> buffer min with 1000 items                            273            273           0          0.1       13646.5       1.0X
> buffer max with 1000 items                            162            162           0          0.1        8076.9       1.7X
>
> OpenJDK 64-Bit Server VM 1.8.0_332-b09 on Linux 5.13.0-1031-azure
> Intel(R) Xeon(R) Platinum 8272CL CPU @ 2.60GHz
> Array Buffer:                               Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
> --------------------------------------------------------------------------------------------------------------------------
> buffer min with 10000 items                          1676           1676           0          0.0       83782.1       1.0X
> buffer max with 10000 items                          1610           1610           0          0.0       80485.5       1.0X
>
> OpenJDK 64-Bit Server VM 1.8.0_332-b09 on Linux 5.13.0-1031-azure
> Intel(R) Xeon(R) Platinum 8272CL CPU @ 2.60GHz
> Array Buffer:                               Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
> --------------------------------------------------------------------------------------------------------------------------
> buffer min with 100000 items                        16849          16856          10          0.0      842469.3       1.0X
> buffer max with 100000 items                        15513          15515           4          0.0      775641.0       1.1X
> {code}
>
> I have already submitted an issue to Scala: [https://github.com/scala/bug/issues/12614]
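
To look at the hot path from the issue description in isolation, here is a minimal standalone sketch of the same lookup pattern. It is not Spark's code: `Group`, `LeastGroupSketch`, and the ordering by `numPartitions` are hypothetical stand-ins for `PartitionGroup` and its implicit ordering, assumed here only so the snippet compiles on its own. It shows that every call to `getLeastGroupHash` ends in `ArrayBuffer.min` resolved through an implicit `Ordering`, which is the call the JFR analysis points at.

```scala
import scala.collection.mutable

// Hypothetical stand-in for Spark's PartitionGroup; only the size matters here.
final case class Group(host: String, numPartitions: Int)

object LeastGroupSketch {
  // Assumption: groups are compared by the number of partitions they hold.
  implicit val groupOrdering: Ordering[Group] =
    Ordering.by[Group, Int](_.numPartitions)

  // Host name -> groups located on that host.
  val groupHash: mutable.Map[String, mutable.ArrayBuffer[Group]] =
    mutable.Map.empty

  // Same shape as the quoted method: None for an unknown host or an empty
  // buffer, otherwise the smallest group according to the implicit Ordering.
  def getLeastGroupHash(key: String): Option[Group] =
    groupHash.get(key).filter(_.nonEmpty).map(_.min)
}
```

Because the method scans the whole buffer on each call, any per-element overhead in `min` is multiplied by the number of lookups, which would be consistent with the benchmark gap growing with the partition count.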