Posted to reviews@spark.apache.org by GitBox <gi...@apache.org> on 2022/08/01 02:29:18 UTC
[GitHub] [spark] LuciferYang opened a new pull request, #37353: [CORE] Optimize `Utils.getIteratorSize` for Scala 2.13 refer to `IterableOnceOps.size`
LuciferYang opened a new pull request, #37353:
URL: https://github.com/apache/spark/pull/37353
### What changes were proposed in this pull request?
`Utils.getIteratorSize` is slightly slower than `Iterator.size` when using Scala 2.13. However, `Iterator.size` cannot be called directly: `Utils.getIteratorSize` returns a `Long`, whereas `Iterator.size` returns an `Int`, so using it directly cannot guarantee that overflow will not occur.
So this PR adds an `if (iterator.knownSize > 0)` branch to optimize this method when using Scala 2.13. The approach follows `IterableOnceOps.size`:
https://github.com/scala/scala/blob/cbc012e573346dc685c478eec5fcbb56d22ea884/src/library/scala/collection/IterableOnce.scala#L835-L849
<img width="553" alt="image" src="https://user-images.githubusercontent.com/1475305/182060742-c56ead62-4164-4633-a501-48e2e3151752.png">
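As a rough illustration of the approach described above, here is a minimal sketch, not the actual patch. It assumes Scala 2.13, where `knownSize` returns -1 when the size cannot be computed cheaply; the sketch therefore checks `knownSize >= 0`, as `IterableOnceOps.size` does, so the exact condition in the PR may differ. The object name `IteratorSizeSketch` is hypothetical.

```scala
object IteratorSizeSketch {
  /** Counts the elements of an iterator, returning a Long so the count cannot overflow an Int. */
  def getIteratorSize(iterator: Iterator[_]): Long = {
    val known = iterator.knownSize
    if (known >= 0) {
      // Fast path: the iterator reports its remaining size without being traversed.
      known.toLong
    } else {
      // Slow path: size unknown (-1), so consume the iterator and count in a Long.
      var count = 0L
      while (iterator.hasNext) {
        count += 1L
        iterator.next()
      }
      count
    }
  }

  def main(args: Array[String]): Unit = {
    // Array iterators report a known size; a filtered iterator does not.
    println(getIteratorSize(Array(1, 2, 3).iterator))
    println(getIteratorSize(Iterator(1, 2, 3, 4).filter(_ % 2 == 0)))
  }
}
```

One difference worth keeping in mind with this shape: the fast path does not advance the iterator, while the loop consumes it.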
### Why are the changes needed?
Optimize `Utils.getIteratorSize` for Scala 2.13.
### Does this PR introduce _any_ user-facing change?
No.
### How was this patch tested?
- Pass GitHub Actions
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] [spark] srowen closed pull request #37353: [SPARK-39928][CORE] Optimize `Utils.getIteratorSize` for Scala 2.13 refer to `IterableOnceOps.size`
Posted by GitBox <gi...@apache.org>.
srowen closed pull request #37353: [SPARK-39928][CORE] Optimize `Utils.getIteratorSize` for Scala 2.13 refer to `IterableOnceOps.size`
URL: https://github.com/apache/spark/pull/37353
[GitHub] [spark] LuciferYang commented on pull request #37353: [WIP][SPARK-39928][CORE] Optimize `Utils.getIteratorSize` for Scala 2.13 refer to `IterableOnceOps.size`
Posted by GitBox <gi...@apache.org>.
LuciferYang commented on PR #37353:
URL: https://github.com/apache/spark/pull/37353#issuecomment-1200633108
The micro-benchmark results show that, before this PR, the performance gap grows linearly with the input size; after this PR, there is no significant performance difference.
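One way to see why the gap grows linearly before the change: a loop-based count calls `next()` once per element, while a `knownSize` fast path calls it zero times. The sketch below is illustrative only; `CountingIterator`, `countByLoop` and `countWithKnownSize` are hypothetical names, not Spark code.

```scala
object NextCallsSketch {
  /** Wraps an iterator, forwards knownSize, and records how often next() is called. */
  final class CountingIterator[A](underlying: Iterator[A]) extends Iterator[A] {
    var nextCalls: Long = 0L
    override def knownSize: Int = underlying.knownSize
    def hasNext: Boolean = underlying.hasNext
    def next(): A = { nextCalls += 1L; underlying.next() }
  }

  /** Loop-based count: always traverses the whole iterator. */
  def countByLoop(it: Iterator[_]): Long = {
    var count = 0L
    while (it.hasNext) { count += 1L; it.next() }
    count
  }

  /** Count with a knownSize fast path; falls back to the loop when the size is unknown. */
  def countWithKnownSize(it: Iterator[_]): Long =
    if (it.knownSize >= 0) it.knownSize.toLong else countByLoop(it)

  def main(args: Array[String]): Unit = {
    for (n <- Seq(100, 1000, 10000)) {
      val looped = new CountingIterator(Array.range(0, n).iterator)
      val fast = new CountingIterator(Array.range(0, n).iterator)
      countByLoop(looped)
      countWithKnownSize(fast)
      println(s"n=$n loop next() calls=${looped.nextCalls} fast-path next() calls=${fast.nextCalls}")
    }
  }
}
```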
[GitHub] [spark] LuciferYang commented on pull request #37353: [WIP][SPARK-39928][CORE] Optimize `Utils.getIteratorSize` for Scala 2.13 refer to `IterableOnceOps.size`
Posted by GitBox <gi...@apache.org>.
LuciferYang commented on PR #37353:
URL: https://github.com/apache/spark/pull/37353#issuecomment-1200630210
I wrote a micro-benchmark to compare `Utils.getIteratorSize` with `Iterator.size`:
```
import org.apache.spark.benchmark.{Benchmark, BenchmarkBase}
import org.apache.spark.util.Utils

// Compares Iterator.size with Utils.getIteratorSize for Range, Seq and Array iterators.
// Note: Utils is private[spark], so this object must live in a package under org.apache.spark.
object IteratorSizeBenchmark extends BenchmarkBase {

  private def testRangeIteratorSize(valuesPerIteration: Int, range: Range): Unit = {
    val benchmark = new Benchmark(s"Test Range iterator size ${range.size}",
      valuesPerIteration,
      output = output)

    benchmark.addCase("Use Iterator.size") { _: Int =>
      for (_ <- 0L until valuesPerIteration) {
        range.iterator.size.toLong
      }
    }

    benchmark.addCase("Use Utils.getIteratorSize") { _: Int =>
      for (_ <- 0L until valuesPerIteration) {
        Utils.getIteratorSize(range.iterator)
      }
    }
    benchmark.run()
  }

  private def testSeqIteratorSize(valuesPerIteration: Int, seq: Seq[Int]): Unit = {
    val benchmark = new Benchmark(s"Test Seq iterator size ${seq.size}",
      valuesPerIteration,
      output = output)

    benchmark.addCase("Use Iterator.size") { _: Int =>
      for (_ <- 0L until valuesPerIteration) {
        seq.iterator.size.toLong
      }
    }

    benchmark.addCase("Use Utils.getIteratorSize") { _: Int =>
      for (_ <- 0L until valuesPerIteration) {
        Utils.getIteratorSize(seq.iterator)
      }
    }
    benchmark.run()
  }

  private def testArrayIteratorSize(valuesPerIteration: Int, array: Array[Int]): Unit = {
    val benchmark = new Benchmark(s"Test Array iterator size ${array.length}",
      valuesPerIteration,
      output = output)

    benchmark.addCase("Use Iterator.size") { _: Int =>
      for (_ <- 0L until valuesPerIteration) {
        array.iterator.size.toLong
      }
    }

    benchmark.addCase("Use Utils.getIteratorSize") { _: Int =>
      for (_ <- 0L until valuesPerIteration) {
        Utils.getIteratorSize(array.iterator)
      }
    }
    benchmark.run()
  }

  override def runBenchmarkSuite(mainArgs: Array[String]): Unit = {
    // Number of size calls made per benchmark case.
    val valuesPerIteration = 100000

    // Test Range
    testRangeIteratorSize(valuesPerIteration, Range(0, 100))
    testRangeIteratorSize(valuesPerIteration, Range(0, 1000))
    testRangeIteratorSize(valuesPerIteration, Range(0, 10000))
    testRangeIteratorSize(valuesPerIteration, Range(0, 30000))

    // Test Seq
    testSeqIteratorSize(valuesPerIteration, Seq.range(0, 100))
    testSeqIteratorSize(valuesPerIteration, Seq.range(0, 1000))
    testSeqIteratorSize(valuesPerIteration, Seq.range(0, 10000))
    testSeqIteratorSize(valuesPerIteration, Seq.range(0, 30000))

    // Test Array
    testArrayIteratorSize(valuesPerIteration, Array.range(0, 100))
    testArrayIteratorSize(valuesPerIteration, Array.range(0, 1000))
    testArrayIteratorSize(valuesPerIteration, Array.range(0, 10000))
    testArrayIteratorSize(valuesPerIteration, Array.range(0, 30000))
  }
}
```
[GitHub] [spark] LuciferYang commented on pull request #37353: [WIP][SPARK-39928][CORE] Optimize `Utils.getIteratorSize` for Scala 2.13 refer to `IterableOnceOps.size`
Posted by GitBox <gi...@apache.org>.
LuciferYang commented on PR #37353:
URL: https://github.com/apache/spark/pull/37353#issuecomment-1200630766
Using Scala 2.13
**Before**
```
OpenJDK 64-Bit Server VM 1.8.0_322-b06 on Mac OS X 11.4
Apple M1
Test Range iterator size 100:             Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
------------------------------------------------------------------------------------------------------------------------
Use Iterator.size                                     1              1           1         88.5          11.3       1.0X
Use Utils.getIteratorSize                            12             12           0          8.4         119.4       0.1X

OpenJDK 64-Bit Server VM 1.8.0_322-b06 on Mac OS X 11.4
Apple M1
Test Range iterator size 1000:            Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
------------------------------------------------------------------------------------------------------------------------
Use Iterator.size                                     1              1           1         88.6          11.3       1.0X
Use Utils.getIteratorSize                           419            504         104          0.2        4187.4       0.0X

OpenJDK 64-Bit Server VM 1.8.0_322-b06 on Mac OS X 11.4
Apple M1
Test Range iterator size 10000:           Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
------------------------------------------------------------------------------------------------------------------------
Use Iterator.size                                     2              3           2         50.1          19.9       1.0X
Use Utils.getIteratorSize                          7666           8169         711          0.0       76659.3       0.0X

OpenJDK 64-Bit Server VM 1.8.0_322-b06 on Mac OS X 11.4
Apple M1
Test Range iterator size 30000:           Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
------------------------------------------------------------------------------------------------------------------------
Use Iterator.size                                     1              2           1         87.0          11.5       1.0X
Use Utils.getIteratorSize                         18400          18615         305          0.0      183999.1       0.0X

OpenJDK 64-Bit Server VM 1.8.0_322-b06 on Mac OS X 11.4
Apple M1
Test Seq iterator size 100:               Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
------------------------------------------------------------------------------------------------------------------------
Use Iterator.size                                     5              6           2         21.1          47.3       1.0X
Use Utils.getIteratorSize                            38             39           1          2.6         382.4       0.1X

OpenJDK 64-Bit Server VM 1.8.0_322-b06 on Mac OS X 11.4
Apple M1
Test Seq iterator size 1000:              Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
------------------------------------------------------------------------------------------------------------------------
Use Iterator.size                                     5              5           1         20.0          50.1       1.0X
Use Utils.getIteratorSize                           941            968          46          0.1        9407.8       0.0X

OpenJDK 64-Bit Server VM 1.8.0_322-b06 on Mac OS X 11.4
Apple M1
Test Seq iterator size 10000:             Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
------------------------------------------------------------------------------------------------------------------------
Use Iterator.size                                     9             14           2         11.3          88.6       1.0X
Use Utils.getIteratorSize                          8646           8977         468          0.0       86457.7       0.0X

OpenJDK 64-Bit Server VM 1.8.0_322-b06 on Mac OS X 11.4
Apple M1
Test Seq iterator size 30000:             Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
------------------------------------------------------------------------------------------------------------------------
Use Iterator.size                                     5              6           2         20.1          49.8       1.0X
Use Utils.getIteratorSize                         28275          29490        1717          0.0      282753.1       0.0X

OpenJDK 64-Bit Server VM 1.8.0_322-b06 on Mac OS X 11.4
Apple M1
Test Array iterator size 100:             Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
------------------------------------------------------------------------------------------------------------------------
Use Iterator.size                                     1              2           1         77.8          12.8       1.0X
Use Utils.getIteratorSize                            30             31           1          3.3         303.9       0.0X

OpenJDK 64-Bit Server VM 1.8.0_322-b06 on Mac OS X 11.4
Apple M1
Test Array iterator size 1000:            Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
------------------------------------------------------------------------------------------------------------------------
Use Iterator.size                                     1              1           1        185.2           5.4       1.0X
Use Utils.getIteratorSize                           388            414          27          0.3        3881.9       0.0X

OpenJDK 64-Bit Server VM 1.8.0_322-b06 on Mac OS X 11.4
Apple M1
Test Array iterator size 10000:           Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
------------------------------------------------------------------------------------------------------------------------
Use Iterator.size                                     1              1           1        185.3           5.4       1.0X
Use Utils.getIteratorSize                          4651           5066         586          0.0       46508.7       0.0X

OpenJDK 64-Bit Server VM 1.8.0_322-b06 on Mac OS X 11.4
Apple M1
Test Array iterator size 30000:           Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
------------------------------------------------------------------------------------------------------------------------
Use Iterator.size                                     1              1           1        185.7           5.4       1.0X
Use Utils.getIteratorSize                         14514          15972        2061          0.0      145144.5       0.0X
```
**After**
```
OpenJDK 64-Bit Server VM 1.8.0_322-b06 on Mac OS X 11.4
Apple M1
Test Range iterator size 100:             Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
------------------------------------------------------------------------------------------------------------------------
Use Iterator.size                                     1              2           1         90.3          11.1       1.0X
Use Utils.getIteratorSize                             1              2           2         82.7          12.1       0.9X

OpenJDK 64-Bit Server VM 1.8.0_322-b06 on Mac OS X 11.4
Apple M1
Test Range iterator size 1000:            Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
------------------------------------------------------------------------------------------------------------------------
Use Iterator.size                                     1              2           1         90.9          11.0       1.0X
Use Utils.getIteratorSize                             1              2           1         83.8          11.9       0.9X

OpenJDK 64-Bit Server VM 1.8.0_322-b06 on Mac OS X 11.4
Apple M1
Test Range iterator size 10000:           Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
------------------------------------------------------------------------------------------------------------------------
Use Iterator.size                                     1              2           1         89.1          11.2       1.0X
Use Utils.getIteratorSize                             1              1           1         83.1          12.0       0.9X

OpenJDK 64-Bit Server VM 1.8.0_322-b06 on Mac OS X 11.4
Apple M1
Test Range iterator size 30000:           Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
------------------------------------------------------------------------------------------------------------------------
Use Iterator.size                                     1              1           1         89.6          11.2       1.0X
Use Utils.getIteratorSize                             1              2           1         82.8          12.1       0.9X

OpenJDK 64-Bit Server VM 1.8.0_322-b06 on Mac OS X 11.4
Apple M1
Test Seq iterator size 100:               Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
------------------------------------------------------------------------------------------------------------------------
Use Iterator.size                                     4              6           3         28.4          35.2       1.0X
Use Utils.getIteratorSize                             4              5           2         27.3          36.6       1.0X

OpenJDK 64-Bit Server VM 1.8.0_322-b06 on Mac OS X 11.4
Apple M1
Test Seq iterator size 1000:              Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
------------------------------------------------------------------------------------------------------------------------
Use Iterator.size                                     5              7           3         20.4          49.1       1.0X
Use Utils.getIteratorSize                             6             11           2         16.8          59.5       0.8X

OpenJDK 64-Bit Server VM 1.8.0_322-b06 on Mac OS X 11.4
Apple M1
Test Seq iterator size 10000:             Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
------------------------------------------------------------------------------------------------------------------------
Use Iterator.size                                     5              9           5         19.9          50.3       1.0X
Use Utils.getIteratorSize                             6             11           2         15.6          64.0       0.8X

OpenJDK 64-Bit Server VM 1.8.0_322-b06 on Mac OS X 11.4
Apple M1
Test Seq iterator size 30000:             Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
------------------------------------------------------------------------------------------------------------------------
Use Iterator.size                                     8             14           2         12.5          80.3       1.0X
Use Utils.getIteratorSize                             6             11           3         18.0          55.5       1.4X

OpenJDK 64-Bit Server VM 1.8.0_322-b06 on Mac OS X 11.4
Apple M1
Test Array iterator size 100:             Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
------------------------------------------------------------------------------------------------------------------------
Use Iterator.size                                     1              1           1        101.2           9.9       1.0X
Use Utils.getIteratorSize                             1              1           1         80.0          12.5       0.8X

OpenJDK 64-Bit Server VM 1.8.0_322-b06 on Mac OS X 11.4
Apple M1
Test Array iterator size 1000:            Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
------------------------------------------------------------------------------------------------------------------------
Use Iterator.size                                     1              1           0        185.3           5.4       1.0X
Use Utils.getIteratorSize                             2              4           2         42.9          23.3       0.2X

OpenJDK 64-Bit Server VM 1.8.0_322-b06 on Mac OS X 11.4
Apple M1
Test Array iterator size 10000:           Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
------------------------------------------------------------------------------------------------------------------------
Use Iterator.size                                     1              1           1        142.1           7.0       1.0X
Use Utils.getIteratorSize                             1              2           1         83.7          11.9       0.6X

OpenJDK 64-Bit Server VM 1.8.0_322-b06 on Mac OS X 11.4
Apple M1
Test Array iterator size 30000:           Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
------------------------------------------------------------------------------------------------------------------------
Use Iterator.size                                     1              1           1        185.4           5.4       1.0X
Use Utils.getIteratorSize                             1              1           1        118.8           8.4       0.6X
```
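For a rough sense of scale, the speedup of `Utils.getIteratorSize` can be worked out from the "Per Row(ns)" figures reported above for the largest input (30000) in each group. The sketch only does arithmetic on those reported numbers; the object and member names are illustrative.

```scala
object SpeedupSketch {
  // (case, per-row ns before, per-row ns after) for Utils.getIteratorSize at size 30000,
  // copied from the Before/After benchmark output.
  val perRowNs: Seq[(String, Double, Double)] = Seq(
    ("Range", 183999.1, 12.1),
    ("Seq", 282753.1, 55.5),
    ("Array", 145144.5, 8.4)
  )

  /** Ratio of the per-row time before the change to the per-row time after it. */
  def speedup(before: Double, after: Double): Double = before / after

  def main(args: Array[String]): Unit =
    perRowNs.foreach { case (name, before, after) =>
      println(f"$name%-5s ${speedup(before, after)}%.0fx")
    }
}
```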
[GitHub] [spark] LuciferYang commented on pull request #37353: [SPARK-39928][CORE] Optimize `Utils.getIteratorSize` for Scala 2.13 refer to `IterableOnceOps.size`
Posted by GitBox <gi...@apache.org>.
LuciferYang commented on PR #37353:
URL: https://github.com/apache/spark/pull/37353#issuecomment-1203975885
thanks @srowen @mridulm
[GitHub] [spark] mridulm commented on pull request #37353: [SPARK-39928][CORE] Optimize `Utils.getIteratorSize` for Scala 2.13 refer to `IterableOnceOps.size`
Posted by GitBox <gi...@apache.org>.
mridulm commented on PR #37353:
URL: https://github.com/apache/spark/pull/37353#issuecomment-1202154486
Looks good to me.
+CC @srowen
[GitHub] [spark] LuciferYang commented on pull request #37353: [SPARK-39928][CORE] Optimize `Utils.getIteratorSize` for Scala 2.13 refer to `IterableOnceOps.size`
Posted by GitBox <gi...@apache.org>.
LuciferYang commented on PR #37353:
URL: https://github.com/apache/spark/pull/37353#issuecomment-1202398817
The following three call sites use this method:
https://github.com/apache/spark/blob/222cb80227fb9c24b1677c5a746c0a174a18b537/core/src/main/scala/org/apache/spark/rdd/LocalRDDCheckpointData.scala#L49
https://github.com/apache/spark/blob/222cb80227fb9c24b1677c5a746c0a174a18b537/core/src/main/scala/org/apache/spark/rdd/RDD.scala#L1268
https://github.com/apache/spark/blob/222cb80227fb9c24b1677c5a746c0a174a18b537/core/src/main/scala/org/apache/spark/rdd/ZippedWithIndexRDD.scala#L50-L54
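To give a flavour of how call sites like these can use the method, here is a Spark-free sketch under stated assumptions: partitions are modelled as plain iterators, a total count is the sum of the per-partition sizes, and zip-with-index start offsets are the running sums of those sizes. `partitionSizes`, `totalCount` and `startIndices` are hypothetical helpers, not the Spark implementations linked above.

```scala
object CallSiteSketch {
  /** Same shape as the method under discussion: knownSize fast path, else count by traversal. */
  def getIteratorSize(iterator: Iterator[_]): Long =
    if (iterator.knownSize >= 0) iterator.knownSize.toLong
    else {
      var count = 0L
      while (iterator.hasNext) { count += 1L; iterator.next() }
      count
    }

  /** Size of each (simulated) partition. */
  def partitionSizes(partitions: Seq[Iterator[_]]): Seq[Long] =
    partitions.map(p => getIteratorSize(p))

  /** Count-style usage: total number of elements across all partitions. */
  def totalCount(partitions: Seq[Iterator[_]]): Long =
    partitionSizes(partitions).sum

  /** Zip-with-index-style usage: global index of the first element of each partition. */
  def startIndices(partitions: Seq[Iterator[_]]): Seq[Long] =
    partitionSizes(partitions).scanLeft(0L)(_ + _).init

  def main(args: Array[String]): Unit = {
    // A def, so each call builds fresh iterators (the slow path consumes them).
    def parts: Seq[Iterator[_]] = Seq(Iterator(1, 2, 3), Iterator(4, 5), Iterator(6))
    println(totalCount(parts))
    println(startIndices(parts))
  }
}
```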
[GitHub] [spark] srowen commented on pull request #37353: [SPARK-39928][CORE] Optimize `Utils.getIteratorSize` for Scala 2.13 refer to `IterableOnceOps.size`
Posted by GitBox <gi...@apache.org>.
srowen commented on PR #37353:
URL: https://github.com/apache/spark/pull/37353#issuecomment-1202391005
It's a little ugly to do just for this method, but yeah maybe worth it for the speedup in 2.13. I don't know how much this method is used or in what hot paths.
[GitHub] [spark] srowen commented on pull request #37353: [SPARK-39928][CORE] Optimize `Utils.getIteratorSize` for Scala 2.13 refer to `IterableOnceOps.size`
Posted by GitBox <gi...@apache.org>.
srowen commented on PR #37353:
URL: https://github.com/apache/spark/pull/37353#issuecomment-1203963655
Merged to master