You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@spark.apache.org by "Apache Spark (Jira)" <ji...@apache.org> on 2022/05/28 06:57:00 UTC
[jira] [Commented] (SPARK-39325) Improve MapOutputTracker convertMapStatuses performance

    [ https://issues.apache.org/jira/browse/SPARK-39325?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17543424#comment-17543424 ] 

Apache Spark commented on SPARK-39325:
--------------------------------------

User 'wankunde' has created a pull request for this issue:
https://github.com/apache/spark/pull/36709

> Improve MapOutputTracker convertMapStatuses performance
> -------------------------------------------------------
>
>                 Key: SPARK-39325
>                 URL: https://issues.apache.org/jira/browse/SPARK-39325
>             Project: Spark
>          Issue Type: Improvement
>          Components: Spark Core
>    Affects Versions: 3.4.0
>            Reporter: Wan Kun
>            Priority: Major
>
> How to reproduce this issue:
> {code:java}
> val benchmark = new Benchmark("MapStatuses Convert", 1, output = output)
> val blockManagerNumber = 1000
> val mapNumber = 50000
> val shufflePartitions = 10000
> val shuffleId: Int = 0
> // First reduce task will fetch map data from startPartition to endPartition
> val startPartition = 0
> val startMapIndex = 0
> val endMapIndex = mapNumber
> val blockManagers = Array.tabulate(blockManagerNumber) { i =>
>   BlockManagerId("a", "host" + i, 7337)
> }
> val mapStatuses: Array[MapStatus] = Array.tabulate(mapNumber) { mapTaskId =>
>   HighlyCompressedMapStatus(
>     blockManagers(mapTaskId % blockManagerNumber),
>     Array.tabulate(shufflePartitions)(i => if (i % 50 == 0) 1 else 0),
>     mapTaskId)
> }
> val bitmap = new RoaringBitmap()
> Range(0, 4000).foreach(bitmap.add(_))
> val mergeStatuses = Array.tabulate(shufflePartitions) { part =>
>   MergeStatus(blockManagers(part % blockManagerNumber), shuffleId, bitmap, 100)
> }
> Array(499, 999, 1499).foreach { endPartition =>
>   benchmark.addCase(
>     s"Num Maps: $mapNumber Fetch partitions:${endPartition - startPartition + 1}",
>     numIters) { _ =>
>     MapOutputTracker.convertMapStatuses(
>       shuffleId,
>       startPartition,
>       endPartition,
>       mapStatuses,
>       startMapIndex,
>       endMapIndex,
>       Some(mergeStatuses))
>   }
> }
> benchmark.run() {code}
> Benchmark result
> {code:java}
> ================================================================================================ MapStatuses Convert Benchmark ================================================================================================Java HotSpot(TM) 64-Bit Server VM 1.8.0_281-b09 on Mac OS X 10.15.7 Intel(R) Core(TM) i9-9980HK CPU @ 2.40GHz MapStatuses Convert:                      Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative ------------------------------------------------------------------------------------------------------------------------ Num Maps: 50000 Fetch partitions:500               3393           3483          96          0.0  3393439257.0       1.0X Num Maps: 50000 Fetch partitions:1000              6640           6772         121          0.0  6639654832.0       0.5X Num Maps: 50000 Fetch partitions:1500             10035          10143         108          0.0 10035100069.0       0.3X
> {code}



--
This message was sent by Atlassian Jira
(v8.20.7#820007)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@spark.apache.org
For additional commands, e-mail: issues-help@spark.apache.org