You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@spark.apache.org by wangjiaochun <gi...@git.apache.org> on 2018/12/05 08:04:49 UTC
[GitHub] spark pull request #23225: [MINOR][CORE]Don't need to create an empty spill ...
GitHub user wangjiaochun opened a pull request:
https://github.com/apache/spark/pull/23225
[MINOR][CORE]Don't need to create an empty spill file when memory has no records
## What changes were proposed in this pull request?
If there are no records in memory, then we don't need to create an empty temp spill file.
## How was this patch tested?
Existing tests
(Please explain how this patch was tested. E.g. unit tests, integration tests, manual tests)
(If this patch involves UI changes, please attach a screenshot; otherwise, remove this)
Please review http://spark.apache.org/contributing.html before opening a pull request.
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/wangjiaochun/spark ShufflSorter
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/spark/pull/23225.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #23225
----
commit 6f9dcee39131429d9b40df45229149ffdde4fdbd
Author: 10087686 <wa...@...>
Date: 2018-12-05T07:52:55Z
"add if"
----
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request #23225: [SPARK-26287][CORE]Don't need to create an empty ...
Posted by dongjoon-hyun <gi...@git.apache.org>.
Github user dongjoon-hyun commented on a diff in the pull request:
https://github.com/apache/spark/pull/23225#discussion_r239707130
--- Diff: core/src/test/java/org/apache/spark/shuffle/sort/UnsafeShuffleWriterSuite.java ---
@@ -562,4 +562,18 @@ public void testPeakMemoryUsed() throws Exception {
}
}
+ @Test
+ public void writeEmptyIteratorNotCreateEmptySpillFile() throws Exception {
+ final UnsafeShuffleWriter<Object, Object> writer = createWriter(true);
+ writer.write(Iterators.emptyIterator());
+ final Option<MapStatus> mapStatus = writer.stop(true);
+ assertTrue(mapStatus.isDefined());
+ assertTrue(mergedOutputFile.exists());
+ assertEquals(0, spillFilesCreated.size());
--- End diff --
Ya. That's the same what I observed. There is no need to keep both test cases because there is no code path like that.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #23225: [MINOR][CORE]Don't need to create an empty spill file wh...
Posted by wangjiaochun <gi...@git.apache.org>.
Github user wangjiaochun commented on the issue:
https://github.com/apache/spark/pull/23225
1. I think test case writeEmptyIterator in UnsafeShuffleWriterSuite.java cover this scenes
2. I will propose a JIRA soon.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #23225: [SPARK-26287][CORE]Don't need to create an empty spill f...
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/23225
**[Test build #99890 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/99890/testReport)** for PR 23225 at commit [`bac1991`](https://github.com/apache/spark/commit/bac1991a5c047fd546785179d19144f9c076aa58).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #23225: [SPARK-26287][CORE]Don't need to create an empty spill f...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/23225
Merged build finished. Test PASSed.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #23225: [SPARK-26287][CORE]Don't need to create an empty spill f...
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/23225
**[Test build #4458 has finished](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/4458/testReport)** for PR 23225 at commit [`5a3e797`](https://github.com/apache/spark/commit/5a3e797b1338ea967e3d6bf7a80b945465f4891d).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request #23225: [SPARK-26287][CORE]Don't need to create an empty ...
Posted by wangjiaochun <gi...@git.apache.org>.
Github user wangjiaochun commented on a diff in the pull request:
https://github.com/apache/spark/pull/23225#discussion_r239989805
--- Diff: core/src/main/java/org/apache/spark/shuffle/sort/ShuffleExternalSorter.java ---
@@ -161,6 +161,10 @@ private void writeSortedFile(boolean isLastFile) {
final ShuffleInMemorySorter.ShuffleSorterIterator sortedRecords =
inMemSorter.getSortedIterator();
+ // If there are no sorted records, so we don't need to create an empty spill file.
+ if (!sortedRecords.hasNext()) {
+ return;
+ }
--- End diff --
I think it's better not to do that.Because change the original code style and it don't makes an appreciable difference in readability.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request #23225: [SPARK-26287][CORE]Don't need to create an empty ...
Posted by dongjoon-hyun <gi...@git.apache.org>.
Github user dongjoon-hyun commented on a diff in the pull request:
https://github.com/apache/spark/pull/23225#discussion_r239859088
--- Diff: core/src/main/java/org/apache/spark/shuffle/sort/ShuffleExternalSorter.java ---
@@ -161,6 +161,10 @@ private void writeSortedFile(boolean isLastFile) {
final ShuffleInMemorySorter.ShuffleSorterIterator sortedRecords =
inMemSorter.getSortedIterator();
+ // If there are no sorted records, so we don't need to create an empty spill file.
+ if (!sortedRecords.hasNext()) {
+ return;
+ }
--- End diff --
@wangjiaochun . Can we move this line 160 ~ 167 to line 147? If then, we don't need to do `new SuffleWriteMetrics()` at all.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #23225: [MINOR][CORE]Don't need to create an empty spill file wh...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/23225
Can one of the admins verify this patch?
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request #23225: [SPARK-26287][CORE]Don't need to create an empty ...
Posted by dongjoon-hyun <gi...@git.apache.org>.
Github user dongjoon-hyun commented on a diff in the pull request:
https://github.com/apache/spark/pull/23225#discussion_r239702595
--- Diff: core/src/test/java/org/apache/spark/shuffle/sort/UnsafeShuffleWriterSuite.java ---
@@ -562,4 +562,18 @@ public void testPeakMemoryUsed() throws Exception {
}
}
+ @Test
+ public void writeEmptyIteratorNotCreateEmptySpillFile() throws Exception {
+ final UnsafeShuffleWriter<Object, Object> writer = createWriter(true);
+ writer.write(Iterators.emptyIterator());
+ final Option<MapStatus> mapStatus = writer.stop(true);
+ assertTrue(mapStatus.isDefined());
+ assertTrue(mergedOutputFile.exists());
+ assertEquals(0, spillFilesCreated.size());
--- End diff --
Do you mean that `writeEmptyIterator` will fail if we add this line `assertEquals(0, spillFilesCreated.size());` there?
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request #23225: [MINOR][CORE]Don't need to create an empty spill ...
Posted by heary-cao <gi...@git.apache.org>.
Github user heary-cao commented on a diff in the pull request:
https://github.com/apache/spark/pull/23225#discussion_r239305307
--- Diff: core/src/main/java/org/apache/spark/shuffle/sort/ShuffleExternalSorter.java ---
@@ -167,64 +167,67 @@ private void writeSortedFile(boolean isLastFile) {
// record;
final byte[] writeBuffer = new byte[diskWriteBufferSize];
- // Because this output will be read during shuffle, its compression codec must be controlled by
- // spark.shuffle.compress instead of spark.shuffle.spill.compress, so we need to use
- // createTempShuffleBlock here; see SPARK-3426 for more details.
- final Tuple2<TempShuffleBlockId, File> spilledFileInfo =
- blockManager.diskBlockManager().createTempShuffleBlock();
- final File file = spilledFileInfo._2();
- final TempShuffleBlockId blockId = spilledFileInfo._1();
- final SpillInfo spillInfo = new SpillInfo(numPartitions, file, blockId);
-
- // Unfortunately, we need a serializer instance in order to construct a DiskBlockObjectWriter.
- // Our write path doesn't actually use this serializer (since we end up calling the `write()`
- // OutputStream methods), but DiskBlockObjectWriter still calls some methods on it. To work
- // around this, we pass a dummy no-op serializer.
- final SerializerInstance ser = DummySerializerInstance.INSTANCE;
-
- int currentPartition = -1;
- final FileSegment committedSegment;
- try (DiskBlockObjectWriter writer =
- blockManager.getDiskWriter(blockId, file, ser, fileBufferSizeBytes, writeMetricsToUse)) {
-
- final int uaoSize = UnsafeAlignedOffset.getUaoSize();
- while (sortedRecords.hasNext()) {
- sortedRecords.loadNext();
- final int partition = sortedRecords.packedRecordPointer.getPartitionId();
- assert (partition >= currentPartition);
- if (partition != currentPartition) {
- // Switch to the new partition
- if (currentPartition != -1) {
- final FileSegment fileSegment = writer.commitAndGet();
- spillInfo.partitionLengths[currentPartition] = fileSegment.length();
+ // If there are no sorted records, so we don't need to create an empty spill file.
+ if (sortedRecords.hasNext()) {
--- End diff --
Can you go to the function entry ahead of time to decide whether to return? Be similar to `BypassMergeSortShuffleWriter.write`
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #23225: [SPARK-26287][CORE]Don't need to create an empty spill f...
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/23225
**[Test build #99890 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/99890/testReport)** for PR 23225 at commit [`bac1991`](https://github.com/apache/spark/commit/bac1991a5c047fd546785179d19144f9c076aa58).
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request #23225: [SPARK-26287][CORE]Don't need to create an empty ...
Posted by wangjiaochun <gi...@git.apache.org>.
Github user wangjiaochun commented on a diff in the pull request:
https://github.com/apache/spark/pull/23225#discussion_r239700999
--- Diff: core/src/test/java/org/apache/spark/shuffle/sort/UnsafeShuffleWriterSuite.java ---
@@ -562,4 +562,18 @@ public void testPeakMemoryUsed() throws Exception {
}
}
+ @Test
+ public void writeEmptyIteratorNotCreateEmptySpillFile() throws Exception {
+ final UnsafeShuffleWriter<Object, Object> writer = createWriter(true);
+ writer.write(Iterators.emptyIterator());
+ final Option<MapStatus> mapStatus = writer.stop(true);
+ assertTrue(mapStatus.isDefined());
+ assertTrue(mergedOutputFile.exists());
+ assertEquals(0, spillFilesCreated.size());
--- End diff --
Test writeEmptyIterator() has create spill file although write empty iterator,but the writeEmptyIteratorNotCreateEmptySpillFile test not create spill file while there has not records in memroy
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request #23225: [SPARK-26287][CORE]Don't need to create an empty ...
Posted by dongjoon-hyun <gi...@git.apache.org>.
Github user dongjoon-hyun commented on a diff in the pull request:
https://github.com/apache/spark/pull/23225#discussion_r239860885
--- Diff: core/src/main/java/org/apache/spark/shuffle/sort/ShuffleExternalSorter.java ---
@@ -161,6 +161,10 @@ private void writeSortedFile(boolean isLastFile) {
final ShuffleInMemorySorter.ShuffleSorterIterator sortedRecords =
inMemSorter.getSortedIterator();
+ // If there are no sorted records, so we don't need to create an empty spill file.
+ if (!sortedRecords.hasNext()) {
+ return;
+ }
--- End diff --
@wangjiaochun . Shall we move line 160 ~ 167 to the beginning of this function (line 147)?
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request #23225: [SPARK-26287][CORE]Don't need to create an empty ...
Posted by dongjoon-hyun <gi...@git.apache.org>.
Github user dongjoon-hyun commented on a diff in the pull request:
https://github.com/apache/spark/pull/23225#discussion_r239697860
--- Diff: core/src/test/java/org/apache/spark/shuffle/sort/UnsafeShuffleWriterSuite.java ---
@@ -562,4 +562,18 @@ public void testPeakMemoryUsed() throws Exception {
}
}
+ @Test
+ public void writeEmptyIteratorNotCreateEmptySpillFile() throws Exception {
--- End diff --
This looks like mostly a duplication of `writeEmptyIterator`. Logically, do we need to keep both `writeEmptyIterator` and `writeEmptyIteratorNotCreateEmptySpillFile` seperately?
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #23225: [MINOR][CORE]Don't need to create an empty spill file wh...
Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on the issue:
https://github.com/apache/spark/pull/23225
If a test passes without/with this changes, it wouldn't verify the regression.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request #23225: [SPARK-26287][CORE]Don't need to create an empty ...
Posted by dongjoon-hyun <gi...@git.apache.org>.
Github user dongjoon-hyun commented on a diff in the pull request:
https://github.com/apache/spark/pull/23225#discussion_r239872296
--- Diff: core/src/test/java/org/apache/spark/shuffle/sort/UnsafeShuffleWriterSuite.java ---
@@ -235,6 +235,7 @@ public void writeEmptyIterator() throws Exception {
final Option<MapStatus> mapStatus = writer.stop(true);
assertTrue(mapStatus.isDefined());
assertTrue(mergedOutputFile.exists());
+ assertEquals(0, spillFilesCreated.size());
--- End diff --
BTW, since we are touching `ShuffleExternalSorter.java`, it seems that we need to add a test case to `ShuffleExternalSorterSuite`, too.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request #23225: [SPARK-26287][CORE]Don't need to create an empty ...
Posted by wangjiaochun <gi...@git.apache.org>.
Github user wangjiaochun commented on a diff in the pull request:
https://github.com/apache/spark/pull/23225#discussion_r239990083
--- Diff: core/src/test/java/org/apache/spark/shuffle/sort/UnsafeShuffleWriterSuite.java ---
@@ -235,6 +235,7 @@ public void writeEmptyIterator() throws Exception {
final Option<MapStatus> mapStatus = writer.stop(true);
assertTrue(mapStatus.isDefined());
assertTrue(mergedOutputFile.exists());
+ assertEquals(0, spillFilesCreated.size());
--- End diff --
I think it's unnecessary to add a new test case, and it can delete line 239~242 of this test writeEmptyIterator because they're always right.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request #23225: [SPARK-26287][CORE]Don't need to create an empty ...
Posted by wangjiaochun <gi...@git.apache.org>.
Github user wangjiaochun commented on a diff in the pull request:
https://github.com/apache/spark/pull/23225#discussion_r239990587
--- Diff: core/src/main/java/org/apache/spark/shuffle/sort/ShuffleExternalSorter.java ---
@@ -161,6 +161,10 @@ private void writeSortedFile(boolean isLastFile) {
final ShuffleInMemorySorter.ShuffleSorterIterator sortedRecords =
inMemSorter.getSortedIterator();
+ // If there are no sorted records, so we don't need to create an empty spill file.
+ if (!sortedRecords.hasNext()) {
+ return;
+ }
--- End diff --
Okay, I will make the changes.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #23225: [MINOR][CORE]Don't need to create an empty spill file wh...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/23225
Can one of the admins verify this patch?
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #23225: [SPARK-26287][CORE]Don't need to create an empty spill f...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/23225
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/5906/
Test PASSed.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request #23225: [SPARK-26287][CORE]Don't need to create an empty ...
Posted by asfgit <gi...@git.apache.org>.
Github user asfgit closed the pull request at:
https://github.com/apache/spark/pull/23225
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #23225: [SPARK-26287][CORE]Don't need to create an empty spill f...
Posted by dongjoon-hyun <gi...@git.apache.org>.
Github user dongjoon-hyun commented on the issue:
https://github.com/apache/spark/pull/23225
Thank you, @wangjiaochun .
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #23225: [SPARK-26287][CORE]Don't need to create an empty spill f...
Posted by dongjoon-hyun <gi...@git.apache.org>.
Github user dongjoon-hyun commented on the issue:
https://github.com/apache/spark/pull/23225
Retest this please.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #23225: [MINOR][CORE]Don't need to create an empty spill file wh...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/23225
Can one of the admins verify this patch?
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #23225: [MINOR][CORE]Don't need to create an empty spill file wh...
Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on the issue:
https://github.com/apache/spark/pull/23225
How come existing tests cover if the empty file is created or not?
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request #23225: [SPARK-26287][CORE]Don't need to create an empty ...
Posted by dongjoon-hyun <gi...@git.apache.org>.
Github user dongjoon-hyun commented on a diff in the pull request:
https://github.com/apache/spark/pull/23225#discussion_r239697971
--- Diff: core/src/test/java/org/apache/spark/shuffle/sort/UnsafeShuffleWriterSuite.java ---
@@ -562,4 +562,18 @@ public void testPeakMemoryUsed() throws Exception {
}
}
+ @Test
+ public void writeEmptyIteratorNotCreateEmptySpillFile() throws Exception {
+ final UnsafeShuffleWriter<Object, Object> writer = createWriter(true);
+ writer.write(Iterators.emptyIterator());
+ final Option<MapStatus> mapStatus = writer.stop(true);
+ assertTrue(mapStatus.isDefined());
+ assertTrue(mergedOutputFile.exists());
+ assertEquals(0, spillFilesCreated.size());
--- End diff --
It seems that we can move this line to `writeEmptyIterator()` instead of making a new test case.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #23225: [SPARK-26287][CORE]Don't need to create an empty spill f...
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/23225
**[Test build #4458 has started](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/4458/testReport)** for PR 23225 at commit [`5a3e797`](https://github.com/apache/spark/commit/5a3e797b1338ea967e3d6bf7a80b945465f4891d).
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #23225: [SPARK-26287][CORE]Don't need to create an empty spill f...
Posted by wangjiaochun <gi...@git.apache.org>.
Github user wangjiaochun commented on the issue:
https://github.com/apache/spark/pull/23225
Okey.@dongjoon-hyun
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request #23225: [SPARK-26287][CORE]Don't need to create an empty ...
Posted by dongjoon-hyun <gi...@git.apache.org>.
Github user dongjoon-hyun commented on a diff in the pull request:
https://github.com/apache/spark/pull/23225#discussion_r240014126
--- Diff: core/src/test/java/org/apache/spark/shuffle/sort/UnsafeShuffleWriterSuite.java ---
@@ -235,11 +235,8 @@ public void writeEmptyIterator() throws Exception {
final Option<MapStatus> mapStatus = writer.stop(true);
assertTrue(mapStatus.isDefined());
assertTrue(mergedOutputFile.exists());
+ assertEquals(0, spillFilesCreated.size());
assertArrayEquals(new long[NUM_PARTITITONS], partitionSizesInMergedFile);
- assertEquals(0, taskMetrics.shuffleWriteMetrics().recordsWritten());
- assertEquals(0, taskMetrics.shuffleWriteMetrics().bytesWritten());
- assertEquals(0, taskMetrics.diskBytesSpilled());
- assertEquals(0, taskMetrics.memoryBytesSpilled());
--- End diff --
We need to keep these test coverage guaranteeing that the task metrics remained an untouched state.
Please revert this removal.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #23225: [SPARK-26287][CORE]Don't need to create an empty spill f...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/23225
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/99890/
Test PASSed.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #23225: [SPARK-26287][CORE]Don't need to create an empty spill f...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/23225
Merged build finished. Test PASSed.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request #23225: [SPARK-26287][CORE]Don't need to create an empty ...
Posted by srowen <gi...@git.apache.org>.
Github user srowen commented on a diff in the pull request:
https://github.com/apache/spark/pull/23225#discussion_r239990036
--- Diff: core/src/main/java/org/apache/spark/shuffle/sort/ShuffleExternalSorter.java ---
@@ -161,6 +161,10 @@ private void writeSortedFile(boolean isLastFile) {
final ShuffleInMemorySorter.ShuffleSorterIterator sortedRecords =
inMemSorter.getSortedIterator();
+ // If there are no sorted records, so we don't need to create an empty spill file.
+ if (!sortedRecords.hasNext()) {
+ return;
+ }
--- End diff --
If you're going to short-circuit, why not do it at the start of the function and save the rest of the work done above?
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request #23225: [SPARK-26287][CORE]Don't need to create an empty ...
Posted by wangjiaochun <gi...@git.apache.org>.
Github user wangjiaochun commented on a diff in the pull request:
https://github.com/apache/spark/pull/23225#discussion_r239711919
--- Diff: core/src/test/java/org/apache/spark/shuffle/sort/UnsafeShuffleWriterSuite.java ---
@@ -562,4 +562,18 @@ public void testPeakMemoryUsed() throws Exception {
}
}
+ @Test
+ public void writeEmptyIteratorNotCreateEmptySpillFile() throws Exception {
+ final UnsafeShuffleWriter<Object, Object> writer = createWriter(true);
+ writer.write(Iterators.emptyIterator());
+ final Option<MapStatus> mapStatus = writer.stop(true);
+ assertTrue(mapStatus.isDefined());
+ assertTrue(mergedOutputFile.exists());
+ assertEquals(0, spillFilesCreated.size());
--- End diff --
That's it.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #23225: [SPARK-26287][CORE]Don't need to create an empty spill f...
Posted by heary-cao <gi...@git.apache.org>.
Github user heary-cao commented on the issue:
https://github.com/apache/spark/pull/23225
LGTM
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #23225: [MINOR][CORE]Don't need to create an empty spill file wh...
Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on the issue:
https://github.com/apache/spark/pull/23225
Also, it needs a JIRA. it's not minor one.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request #23225: [SPARK-26287][CORE]Don't need to create an empty ...
Posted by wangjiaochun <gi...@git.apache.org>.
Github user wangjiaochun commented on a diff in the pull request:
https://github.com/apache/spark/pull/23225#discussion_r239704796
--- Diff: core/src/test/java/org/apache/spark/shuffle/sort/UnsafeShuffleWriterSuite.java ---
@@ -562,4 +562,18 @@ public void testPeakMemoryUsed() throws Exception {
}
}
+ @Test
+ public void writeEmptyIteratorNotCreateEmptySpillFile() throws Exception {
+ final UnsafeShuffleWriter<Object, Object> writer = createWriter(true);
+ writer.write(Iterators.emptyIterator());
+ final Option<MapStatus> mapStatus = writer.stop(true);
+ assertTrue(mapStatus.isDefined());
+ assertTrue(mergedOutputFile.exists());
+ assertEquals(0, spillFilesCreated.size());
--- End diff --
I mean that before add code "if (sortedRecords.hasNext()) { return }" it will fail. now add assertEquals(0, spillFilesCreated.size()) to writeEmptyIterator seems good.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org