Posted to issues@spark.apache.org by "Wan Kun (Jira)" <ji...@apache.org> on 2022/05/28 06:46:00 UTC
[jira] [Created] (SPARK-39325) Improve MapOutputTracker convertMapStatuses performance
Wan Kun created SPARK-39325:
-------------------------------
Summary: Improve MapOutputTracker convertMapStatuses performance
Key: SPARK-39325
URL: https://issues.apache.org/jira/browse/SPARK-39325
Project: Spark
Issue Type: Improvement
Components: Spark Core
Affects Versions: 3.4.0
Reporter: Wan Kun
How to reproduce this issue:
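{code:java}
// HighlyCompressedMapStatus, MergeStatus and MapOutputTracker.convertMapStatuses are
// Spark-internal, so this snippet is meant to run inside Spark's own (test) sources,
// e.g. as a benchmark in spark-core. `output` and `numIters` are assumed to be provided
// by the enclosing benchmark; they are not shown in this report.
import org.apache.spark.MapOutputTracker
import org.apache.spark.benchmark.Benchmark
import org.apache.spark.scheduler.{HighlyCompressedMapStatus, MapStatus, MergeStatus}
import org.apache.spark.storage.BlockManagerId
import org.roaringbitmap.RoaringBitmap

val benchmark = new Benchmark("MapStatuses Convert", 1, output = output)
val blockManagerNumber = 1000
val mapNumber = 50000
val shufflePartitions = 10000
val shuffleId: Int = 0
// The first reduce task fetches map output for partitions startPartition to endPartition
val startPartition = 0
val startMapIndex = 0
val endMapIndex = mapNumber
val blockManagers = Array.tabulate(blockManagerNumber) { i =>
  BlockManagerId("a", "host" + i, 7337)
}
// Every 50th shuffle partition gets a 1-byte block; all other blocks are empty
val mapStatuses: Array[MapStatus] = Array.tabulate(mapNumber) { mapTaskId =>
  HighlyCompressedMapStatus(
    blockManagers(mapTaskId % blockManagerNumber),
    Array.tabulate(shufflePartitions)(i => if (i % 50 == 0) 1L else 0L),
    mapTaskId)
}
// Each merge status records map indexes 0 to 3999 as merged
val bitmap = new RoaringBitmap()
Range(0, 4000).foreach(bitmap.add(_))
val mergeStatuses = Array.tabulate(shufflePartitions) { part =>
  MergeStatus(blockManagers(part % blockManagerNumber), shuffleId, bitmap, 100)
}
Array(499, 999, 1499).foreach { endPartition =>
  benchmark.addCase(
    s"Num Maps: $mapNumber Fetch partitions:${endPartition - startPartition + 1}",
    numIters) { _ =>
    MapOutputTracker.convertMapStatuses(
      shuffleId,
      startPartition,
      endPartition,
      mapStatuses,
      startMapIndex,
      endMapIndex,
      Some(mergeStatuses))
  }
}
benchmark.run()
{code}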
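In the reproduction, each merge status marks only map indexes 0 to 3999 as merged out of 50000 maps, so for every fetched partition most map outputs are not covered by the merged block and have to be tracked individually. The report does not pinpoint the hotspot. One hypothesis worth checking is whether the set of maps missing from a merged block is recomputed for every (map, partition) pair rather than once per partition. The following is a simplified, hypothetical model of those two loop shapes (the `MissingMapsSketch` object and its methods are invented for illustration, and a plain Scala `BitSet` stands in for the `RoaringBitmap`); it is not Spark's implementation.

```scala
import scala.collection.immutable.BitSet

// Hypothetical, simplified model: mergedMaps(p) stands in for the bitmap of map
// indexes that were merged into partition p's merged block.
object MissingMapsSketch {

  // Naive shape: the missing-map set is rebuilt for every (map, partition) pair.
  // Each rebuild scans all numMaps indexes, so the total work is
  // O(numMaps * numMaps * numPartitions).
  def countUnmergedNaive(numMaps: Int, mergedMaps: Array[BitSet]): Long = {
    var count = 0L
    for (mapIndex <- 0 until numMaps; p <- mergedMaps.indices) {
      val missing = (0 until numMaps).filterNot(i => mergedMaps(p).contains(i)).toSet
      if (missing.contains(mapIndex)) count += 1
    }
    count
  }

  // Hoisted shape: the missing-map set is computed once per partition and reused,
  // so the total work is O(numMaps * numPartitions).
  def countUnmergedHoisted(numMaps: Int, mergedMaps: Array[BitSet]): Long = {
    val missingPerPartition: Array[BitSet] = mergedMaps.map { merged =>
      BitSet((0 until numMaps).filterNot(i => merged.contains(i)): _*)
    }
    var count = 0L
    for (mapIndex <- 0 until numMaps; p <- mergedMaps.indices) {
      if (missingPerPartition(p).contains(mapIndex)) count += 1
    }
    count
  }
}
```

With 10 maps, 4 partitions and every merged block covering maps 0 to 3, both methods count 24 unmerged blocks (6 per partition); they differ only in how much work is repeated. Hoisting the per-partition computation out of the map loop is a standard loop-invariant optimization.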
Benchmark results:
{code:java}
================================================================================================
MapStatuses Convert Benchmark
================================================================================================

Java HotSpot(TM) 64-Bit Server VM 1.8.0_281-b09 on Mac OS X 10.15.7
Intel(R) Core(TM) i9-9980HK CPU @ 2.40GHz
MapStatuses Convert:                      Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
------------------------------------------------------------------------------------------------------------------------
Num Maps: 50000 Fetch partitions:500               3393           3483          96          0.0  3393439257.0       1.0X
Num Maps: 50000 Fetch partitions:1000              6640           6772         121          0.0  6639654832.0       0.5X
Num Maps: 50000 Fetch partitions:1500             10035          10143         108          0.0 10035100069.0       0.3X
{code}
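Because the benchmark is constructed with one value per iteration, Per Row(ns) is simply the per-iteration time, and the Relative column (1.0X, 0.5X, 0.3X) shows the best time growing roughly in proportion to the number of fetched partitions. A small sketch of that arithmetic, using only the best times from the table (the `ScalingCheck` name is hypothetical):

```scala
// Hypothetical helper: divides each best time from the benchmark table
// by the number of fetched partitions to show how the cost scales.
object ScalingCheck {
  // (fetched partitions, best time in ms), copied from the benchmark results
  val bestTimes: Seq[(Int, Long)] = Seq(500 -> 3393L, 1000 -> 6640L, 1500 -> 10035L)

  // Best time per fetched partition, in milliseconds
  def msPerPartition: Seq[Double] =
    bestTimes.map { case (parts, ms) => ms.toDouble / parts }

  def main(args: Array[String]): Unit =
    bestTimes.zip(msPerPartition).foreach { case ((parts, _), perPart) =>
      println(f"$parts%5d partitions: $perPart%.2f ms per partition")
    }
}
```

This prints 6.79, 6.64 and 6.69 ms per partition, i.e. close to linear scaling in the number of fetched partitions at 50000 maps.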