You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@spark.apache.org by "Wan Kun (Jira)" <ji...@apache.org> on 2022/05/28 06:46:00 UTC

[jira] [Created] (SPARK-39325) Improve MapOutputTracker convertMapStatuses performance

Wan Kun created SPARK-39325:
-------------------------------

             Summary: Improve MapOutputTracker convertMapStatuses performance
                 Key: SPARK-39325
                 URL: https://issues.apache.org/jira/browse/SPARK-39325
             Project: Spark
          Issue Type: Improvement
          Components: Spark Core
    Affects Versions: 3.4.0
            Reporter: Wan Kun


How to reproduce this issue:
{code:java}
val benchmark = new Benchmark("MapStatuses Convert", 1, output = output)

val blockManagerNumber = 1000
val mapNumber = 50000
val shufflePartitions = 10000

val shuffleId: Int = 0
// First reduce task will fetch map data from startPartition to endPartition
val startPartition = 0
val startMapIndex = 0
val endMapIndex = mapNumber
val blockManagers = Array.tabulate(blockManagerNumber) { i =>
  BlockManagerId("a", "host" + i, 7337)
}
val mapStatuses: Array[MapStatus] = Array.tabulate(mapNumber) { mapTaskId =>
  HighlyCompressedMapStatus(
    blockManagers(mapTaskId % blockManagerNumber),
    Array.tabulate(shufflePartitions)(i => if (i % 50 == 0) 1 else 0),
    mapTaskId)
}
val bitmap = new RoaringBitmap()
Range(0, 4000).foreach(bitmap.add(_))
val mergeStatuses = Array.tabulate(shufflePartitions) { part =>
  MergeStatus(blockManagers(part % blockManagerNumber), shuffleId, bitmap, 100)
}

Array(499, 999, 1499).foreach { endPartition =>
  benchmark.addCase(
    s"Num Maps: $mapNumber Fetch partitions:${endPartition - startPartition + 1}",
    numIters) { _ =>
    MapOutputTracker.convertMapStatuses(
      shuffleId,
      startPartition,
      endPartition,
      mapStatuses,
      startMapIndex,
      endMapIndex,
      Some(mergeStatuses))
  }
}

benchmark.run() {code}
Benchmark result
{code:java}
================================================================================================ MapStatuses Convert Benchmark ================================================================================================Java HotSpot(TM) 64-Bit Server VM 1.8.0_281-b09 on Mac OS X 10.15.7 Intel(R) Core(TM) i9-9980HK CPU @ 2.40GHz MapStatuses Convert:                      Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative ------------------------------------------------------------------------------------------------------------------------ Num Maps: 50000 Fetch partitions:500               3393           3483          96          0.0  3393439257.0       1.0X Num Maps: 50000 Fetch partitions:1000              6640           6772         121          0.0  6639654832.0       0.5X Num Maps: 50000 Fetch partitions:1500             10035          10143         108          0.0 10035100069.0       0.3X
{code}



--
This message was sent by Atlassian Jira
(v8.20.7#820007)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@spark.apache.org
For additional commands, e-mail: issues-help@spark.apache.org