You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@spark.apache.org by do...@apache.org on 2020/10/16 20:50:06 UTC
[spark] branch branch-3.0 updated: [SPARK-32436][CORE] Initialize
numNonEmptyBlocks in HighlyCompressedMapStatus.readExternal
This is an automated email from the ASF dual-hosted git repository.
dongjoon pushed a commit to branch branch-3.0
in repository https://gitbox.apache.org/repos/asf/spark.git
The following commit(s) were added to refs/heads/branch-3.0 by this push:
new 1bec8a3 [SPARK-32436][CORE] Initialize numNonEmptyBlocks in HighlyCompressedMapStatus.readExternal
1bec8a3 is described below
commit 1bec8a3e17ffd81126b36037a852cfc648764006
Author: Dongjoon Hyun <do...@apache.org>
AuthorDate: Sat Jul 25 10:16:01 2020 -0700
[SPARK-32436][CORE] Initialize numNonEmptyBlocks in HighlyCompressedMapStatus.readExternal
### What changes were proposed in this pull request?
This PR aims to initialize `numNonEmptyBlocks` in `HighlyCompressedMapStatus.readExternal`.
In Scala 2.12, this is initialized to `-1` via the following.
```scala
protected def this() = this(null, -1, null, -1, null, -1) // For deserialization only
```
### Why are the changes needed?
In Scala 2.13, this causes several UT failures because `HighlyCompressedMapStatus.readExternal` doesn't initialize this field. The following is one example.
- org.apache.spark.scheduler.MapStatusSuite
```
MapStatusSuite:
- compressSize
- decompressSize
*** RUN ABORTED ***
java.lang.NoSuchFieldError: numNonEmptyBlocks
at org.apache.spark.scheduler.HighlyCompressedMapStatus.<init>(MapStatus.scala:181)
at org.apache.spark.scheduler.HighlyCompressedMapStatus$.apply(MapStatus.scala:281)
at org.apache.spark.scheduler.MapStatus$.apply(MapStatus.scala:73)
at org.apache.spark.scheduler.MapStatusSuite.$anonfun$new$8(MapStatusSuite.scala:64)
at scala.runtime.java8.JFunction1$mcVD$sp.apply(JFunction1$mcVD$sp.scala:18)
at scala.collection.immutable.List.foreach(List.scala:333)
at org.apache.spark.scheduler.MapStatusSuite.$anonfun$new$7(MapStatusSuite.scala:61)
at scala.runtime.java8.JFunction1$mcVJ$sp.apply(JFunction1$mcVJ$sp.scala:18)
at scala.collection.immutable.List.foreach(List.scala:333)
at org.apache.spark.scheduler.MapStatusSuite.$anonfun$new$6(MapStatusSuite.scala:60)
...
```
### Does this PR introduce _any_ user-facing change?
No. This is a private class.
### How was this patch tested?
1. Pass the GitHub Action or Jenkins with the existing tests.
2. Test with Scala-2.13 with `MapStatusSuite`.
```
$ dev/change-scala-version.sh 2.13
$ build/mvn test -pl core --am -Pscala-2.13 -Dtest=none -DwildcardSuites=org.apache.spark.scheduler.MapStatusSuite
...
MapStatusSuite:
- compressSize
- decompressSize
- MapStatus should never report non-empty blocks' sizes as 0
- large tasks should use org.apache.spark.scheduler.HighlyCompressedMapStatus
- HighlyCompressedMapStatus: estimated size should be the average non-empty block size
- SPARK-22540: ensure HighlyCompressedMapStatus calculates correct avgSize
- RoaringBitmap: runOptimize succeeded
- RoaringBitmap: runOptimize failed
- Blocks which are bigger than SHUFFLE_ACCURATE_BLOCK_THRESHOLD should not be underestimated.
- SPARK-21133 HighlyCompressedMapStatus#writeExternal throws NPE
Run completed in 7 seconds, 971 milliseconds.
Total number of tests run: 10
Suites: completed 2, aborted 0
Tests: succeeded 10, failed 0, canceled 0, ignored 0, pending 0
All tests passed.
```
Closes #29231 from dongjoon-hyun/SPARK-32436.
Authored-by: Dongjoon Hyun <do...@apache.org>
Signed-off-by: Dongjoon Hyun <do...@apache.org>
(cherry picked from commit f9f18673dca16c78a77c3386a1629136031bf2ba)
Signed-off-by: Dongjoon Hyun <dh...@apple.com>
---
core/src/main/scala/org/apache/spark/scheduler/MapStatus.scala | 1 +
1 file changed, 1 insertion(+)
diff --git a/core/src/main/scala/org/apache/spark/scheduler/MapStatus.scala b/core/src/main/scala/org/apache/spark/scheduler/MapStatus.scala
index 7f8893f..6005240 100644
--- a/core/src/main/scala/org/apache/spark/scheduler/MapStatus.scala
+++ b/core/src/main/scala/org/apache/spark/scheduler/MapStatus.scala
@@ -206,6 +206,7 @@ private[spark] class HighlyCompressedMapStatus private (
override def readExternal(in: ObjectInput): Unit = Utils.tryOrIOException {
loc = BlockManagerId(in)
+ numNonEmptyBlocks = -1 // SPARK-32436 Scala 2.13 doesn't initialize this during deserialization
emptyBlocks = new RoaringBitmap()
emptyBlocks.readExternal(in)
avgSize = in.readLong()
---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@spark.apache.org
For additional commands, e-mail: commits-help@spark.apache.org