Posted to reviews@spark.apache.org by wangjiaochun <gi...@git.apache.org> on 2018/12/05 08:04:49 UTC

[GitHub] spark pull request #23225: [MINOR][CORE]Don't need to create an empty spill ...

GitHub user wangjiaochun opened a pull request:

    https://github.com/apache/spark/pull/23225

    [MINOR][CORE]Don't need to create an empty spill file when memory has no records

    ## What changes were proposed in this pull request?
     If there are no records in memory, then we don't need to create an empty temp spill file.
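
    As a rough sketch of the idea (a simplified, self-contained stand-in, not the actual `ShuffleExternalSorter.writeSortedFile` code; the class and method shape below are hypothetical), the change amounts to returning before any temp spill file is requested:

        import java.util.Collections;
        import java.util.Iterator;

        final class EmptySpillSketch {
          // Simplified stand-in for writeSortedFile: bail out before touching disk
          // when the in-memory sorter has nothing to spill.
          static int writeSortedFile(Iterator<long[]> sortedRecords) {
            if (!sortedRecords.hasNext()) {
              return 0;                        // no records -> no temp spill file created
            }
            int written = 0;
            while (sortedRecords.hasNext()) {  // the real code writes via DiskBlockObjectWriter
              sortedRecords.next();
              written++;
            }
            return written;
          }

          public static void main(String[] args) {
            // With an empty iterator the method returns immediately.
            System.out.println(writeSortedFile(Collections.<long[]>emptyIterator()));
          }
        }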
    
    ## How was this patch tested?
    Existing tests
    


You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/wangjiaochun/spark ShufflSorter

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/23225.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #23225
    
----
commit 6f9dcee39131429d9b40df45229149ffdde4fdbd
Author: 10087686 <wa...@...>
Date:   2018-12-05T07:52:55Z

    "add if"

----


---



[GitHub] spark pull request #23225: [SPARK-26287][CORE]Don't need to create an empty ...

Posted by dongjoon-hyun <gi...@git.apache.org>.
Github user dongjoon-hyun commented on a diff in the pull request:

    https://github.com/apache/spark/pull/23225#discussion_r239707130
  
    --- Diff: core/src/test/java/org/apache/spark/shuffle/sort/UnsafeShuffleWriterSuite.java ---
    @@ -562,4 +562,18 @@ public void testPeakMemoryUsed() throws Exception {
         }
       }
     
    +  @Test
    +  public void writeEmptyIteratorNotCreateEmptySpillFile() throws Exception {
    +    final UnsafeShuffleWriter<Object, Object> writer = createWriter(true);
    +    writer.write(Iterators.emptyIterator());
    +    final Option<MapStatus> mapStatus = writer.stop(true);
    +    assertTrue(mapStatus.isDefined());
    +    assertTrue(mergedOutputFile.exists());
    +    assertEquals(0, spillFilesCreated.size());
    --- End diff --
    
    Ya. That's the same as what I observed. There is no need to keep both test cases, because there is no code path like that.


---



[GitHub] spark issue #23225: [MINOR][CORE]Don't need to create an empty spill file wh...

Posted by wangjiaochun <gi...@git.apache.org>.
Github user wangjiaochun commented on the issue:

    https://github.com/apache/spark/pull/23225
  
    1. I think the test case writeEmptyIterator in UnsafeShuffleWriterSuite.java covers this scenario.
    2. I will file a JIRA soon.


---



[GitHub] spark issue #23225: [SPARK-26287][CORE]Don't need to create an empty spill f...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/23225
  
    **[Test build #99890 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/99890/testReport)** for PR 23225 at commit [`bac1991`](https://github.com/apache/spark/commit/bac1991a5c047fd546785179d19144f9c076aa58).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---



[GitHub] spark issue #23225: [SPARK-26287][CORE]Don't need to create an empty spill f...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/23225
  
    Merged build finished. Test PASSed.


---



[GitHub] spark issue #23225: [SPARK-26287][CORE]Don't need to create an empty spill f...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/23225
  
    **[Test build #4458 has finished](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/4458/testReport)** for PR 23225 at commit [`5a3e797`](https://github.com/apache/spark/commit/5a3e797b1338ea967e3d6bf7a80b945465f4891d).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---



[GitHub] spark pull request #23225: [SPARK-26287][CORE]Don't need to create an empty ...

Posted by wangjiaochun <gi...@git.apache.org>.
Github user wangjiaochun commented on a diff in the pull request:

    https://github.com/apache/spark/pull/23225#discussion_r239989805
  
    --- Diff: core/src/main/java/org/apache/spark/shuffle/sort/ShuffleExternalSorter.java ---
    @@ -161,6 +161,10 @@ private void writeSortedFile(boolean isLastFile) {
         final ShuffleInMemorySorter.ShuffleSorterIterator sortedRecords =
           inMemSorter.getSortedIterator();
     
    +    // If there are no sorted records, so we don't need to create an empty spill file.
    +    if (!sortedRecords.hasNext()) {
    +      return;
    +    }
    --- End diff --
    
    I think it's better not to do that, because it changes the original code style and doesn't make an appreciable difference in readability.


---



[GitHub] spark pull request #23225: [SPARK-26287][CORE]Don't need to create an empty ...

Posted by dongjoon-hyun <gi...@git.apache.org>.
Github user dongjoon-hyun commented on a diff in the pull request:

    https://github.com/apache/spark/pull/23225#discussion_r239859088
  
    --- Diff: core/src/main/java/org/apache/spark/shuffle/sort/ShuffleExternalSorter.java ---
    @@ -161,6 +161,10 @@ private void writeSortedFile(boolean isLastFile) {
         final ShuffleInMemorySorter.ShuffleSorterIterator sortedRecords =
           inMemSorter.getSortedIterator();
     
    +    // If there are no sorted records, so we don't need to create an empty spill file.
    +    if (!sortedRecords.hasNext()) {
    +      return;
    +    }
    --- End diff --
    
    @wangjiaochun . Can we move lines 160 ~ 167 to line 147? Then we don't need to do `new ShuffleWriteMetrics()` at all.
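
    As a minimal sketch of that reordering (stand-in types and names, not the actual Spark classes): obtain the sorted iterator first, return if it is empty, and only then build the per-spill metrics object:

        import java.util.Iterator;

        final class HoistCheckSketch {
          static final class Metrics {}   // illustrative stand-in for ShuffleWriteMetrics

          // Hypothetical simplification of writeSortedFile(boolean isLastFile).
          static void writeSortedFile(Iterator<Long> sortedRecords,
                                      boolean isLastFile,
                                      Metrics writeMetrics) {
            // Hoisted check: nothing to spill means no metrics setup and no spill file.
            if (!sortedRecords.hasNext()) {
              return;
            }
            final Metrics writeMetricsToUse = isLastFile ? writeMetrics : new Metrics();
            // ... write the sorted records to a spill file, accumulating into writeMetricsToUse ...
          }
        }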


---



[GitHub] spark issue #23225: [MINOR][CORE]Don't need to create an empty spill file wh...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/23225
  
    Can one of the admins verify this patch?


---



[GitHub] spark pull request #23225: [SPARK-26287][CORE]Don't need to create an empty ...

Posted by dongjoon-hyun <gi...@git.apache.org>.
Github user dongjoon-hyun commented on a diff in the pull request:

    https://github.com/apache/spark/pull/23225#discussion_r239702595
  
    --- Diff: core/src/test/java/org/apache/spark/shuffle/sort/UnsafeShuffleWriterSuite.java ---
    @@ -562,4 +562,18 @@ public void testPeakMemoryUsed() throws Exception {
         }
       }
     
    +  @Test
    +  public void writeEmptyIteratorNotCreateEmptySpillFile() throws Exception {
    +    final UnsafeShuffleWriter<Object, Object> writer = createWriter(true);
    +    writer.write(Iterators.emptyIterator());
    +    final Option<MapStatus> mapStatus = writer.stop(true);
    +    assertTrue(mapStatus.isDefined());
    +    assertTrue(mergedOutputFile.exists());
    +    assertEquals(0, spillFilesCreated.size());
    --- End diff --
    
    Do you mean that `writeEmptyIterator` will fail if we add this line `assertEquals(0, spillFilesCreated.size());` there?


---



[GitHub] spark pull request #23225: [MINOR][CORE]Don't need to create an empty spill ...

Posted by heary-cao <gi...@git.apache.org>.
Github user heary-cao commented on a diff in the pull request:

    https://github.com/apache/spark/pull/23225#discussion_r239305307
  
    --- Diff: core/src/main/java/org/apache/spark/shuffle/sort/ShuffleExternalSorter.java ---
    @@ -167,64 +167,67 @@ private void writeSortedFile(boolean isLastFile) {
         // record;
         final byte[] writeBuffer = new byte[diskWriteBufferSize];
     
    -    // Because this output will be read during shuffle, its compression codec must be controlled by
    -    // spark.shuffle.compress instead of spark.shuffle.spill.compress, so we need to use
    -    // createTempShuffleBlock here; see SPARK-3426 for more details.
    -    final Tuple2<TempShuffleBlockId, File> spilledFileInfo =
    -      blockManager.diskBlockManager().createTempShuffleBlock();
    -    final File file = spilledFileInfo._2();
    -    final TempShuffleBlockId blockId = spilledFileInfo._1();
    -    final SpillInfo spillInfo = new SpillInfo(numPartitions, file, blockId);
    -
    -    // Unfortunately, we need a serializer instance in order to construct a DiskBlockObjectWriter.
    -    // Our write path doesn't actually use this serializer (since we end up calling the `write()`
    -    // OutputStream methods), but DiskBlockObjectWriter still calls some methods on it. To work
    -    // around this, we pass a dummy no-op serializer.
    -    final SerializerInstance ser = DummySerializerInstance.INSTANCE;
    -
    -    int currentPartition = -1;
    -    final FileSegment committedSegment;
    -    try (DiskBlockObjectWriter writer =
    -        blockManager.getDiskWriter(blockId, file, ser, fileBufferSizeBytes, writeMetricsToUse)) {
    -
    -      final int uaoSize = UnsafeAlignedOffset.getUaoSize();
    -      while (sortedRecords.hasNext()) {
    -        sortedRecords.loadNext();
    -        final int partition = sortedRecords.packedRecordPointer.getPartitionId();
    -        assert (partition >= currentPartition);
    -        if (partition != currentPartition) {
    -          // Switch to the new partition
    -          if (currentPartition != -1) {
    -            final FileSegment fileSegment = writer.commitAndGet();
    -            spillInfo.partitionLengths[currentPartition] = fileSegment.length();
    +    // If there are no sorted records, so we don't need to create an empty spill file.
    +    if (sortedRecords.hasNext()) {
    --- End diff --
    
    Can you check at the function entry, ahead of time, whether to return early? Similar to `BypassMergeSortShuffleWriter.write`.


---



[GitHub] spark issue #23225: [SPARK-26287][CORE]Don't need to create an empty spill f...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/23225
  
    **[Test build #99890 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/99890/testReport)** for PR 23225 at commit [`bac1991`](https://github.com/apache/spark/commit/bac1991a5c047fd546785179d19144f9c076aa58).


---



[GitHub] spark pull request #23225: [SPARK-26287][CORE]Don't need to create an empty ...

Posted by wangjiaochun <gi...@git.apache.org>.
Github user wangjiaochun commented on a diff in the pull request:

    https://github.com/apache/spark/pull/23225#discussion_r239700999
  
    --- Diff: core/src/test/java/org/apache/spark/shuffle/sort/UnsafeShuffleWriterSuite.java ---
    @@ -562,4 +562,18 @@ public void testPeakMemoryUsed() throws Exception {
         }
       }
     
    +  @Test
    +  public void writeEmptyIteratorNotCreateEmptySpillFile() throws Exception {
    +    final UnsafeShuffleWriter<Object, Object> writer = createWriter(true);
    +    writer.write(Iterators.emptyIterator());
    +    final Option<MapStatus> mapStatus = writer.stop(true);
    +    assertTrue(mapStatus.isDefined());
    +    assertTrue(mergedOutputFile.exists());
    +    assertEquals(0, spillFilesCreated.size());
    --- End diff --
    
    The existing writeEmptyIterator() test creates a spill file even though it writes an empty iterator, but the writeEmptyIteratorNotCreateEmptySpillFile test does not create a spill file when there are no records in memory.


---



[GitHub] spark pull request #23225: [SPARK-26287][CORE]Don't need to create an empty ...

Posted by dongjoon-hyun <gi...@git.apache.org>.
Github user dongjoon-hyun commented on a diff in the pull request:

    https://github.com/apache/spark/pull/23225#discussion_r239860885
  
    --- Diff: core/src/main/java/org/apache/spark/shuffle/sort/ShuffleExternalSorter.java ---
    @@ -161,6 +161,10 @@ private void writeSortedFile(boolean isLastFile) {
         final ShuffleInMemorySorter.ShuffleSorterIterator sortedRecords =
           inMemSorter.getSortedIterator();
     
    +    // If there are no sorted records, so we don't need to create an empty spill file.
    +    if (!sortedRecords.hasNext()) {
    +      return;
    +    }
    --- End diff --
    
    @wangjiaochun . Shall we move lines 160 ~ 167 to the beginning of this function (line 147)?


---



[GitHub] spark pull request #23225: [SPARK-26287][CORE]Don't need to create an empty ...

Posted by dongjoon-hyun <gi...@git.apache.org>.
Github user dongjoon-hyun commented on a diff in the pull request:

    https://github.com/apache/spark/pull/23225#discussion_r239697860
  
    --- Diff: core/src/test/java/org/apache/spark/shuffle/sort/UnsafeShuffleWriterSuite.java ---
    @@ -562,4 +562,18 @@ public void testPeakMemoryUsed() throws Exception {
         }
       }
     
    +  @Test
    +  public void writeEmptyIteratorNotCreateEmptySpillFile() throws Exception {
    --- End diff --
    
    This looks mostly like a duplication of `writeEmptyIterator`. Logically, do we need to keep both `writeEmptyIterator` and `writeEmptyIteratorNotCreateEmptySpillFile` separately?


---



[GitHub] spark issue #23225: [MINOR][CORE]Don't need to create an empty spill file wh...

Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on the issue:

    https://github.com/apache/spark/pull/23225
  
    If a test passes both with and without this change, it wouldn't verify the regression.


---



[GitHub] spark pull request #23225: [SPARK-26287][CORE]Don't need to create an empty ...

Posted by dongjoon-hyun <gi...@git.apache.org>.
Github user dongjoon-hyun commented on a diff in the pull request:

    https://github.com/apache/spark/pull/23225#discussion_r239872296
  
    --- Diff: core/src/test/java/org/apache/spark/shuffle/sort/UnsafeShuffleWriterSuite.java ---
    @@ -235,6 +235,7 @@ public void writeEmptyIterator() throws Exception {
         final Option<MapStatus> mapStatus = writer.stop(true);
         assertTrue(mapStatus.isDefined());
         assertTrue(mergedOutputFile.exists());
    +    assertEquals(0, spillFilesCreated.size());
    --- End diff --
    
    BTW, since we are touching `ShuffleExternalSorter.java`, it seems that we need to add a test case to `ShuffleExternalSorterSuite`, too.


---



[GitHub] spark pull request #23225: [SPARK-26287][CORE]Don't need to create an empty ...

Posted by wangjiaochun <gi...@git.apache.org>.
Github user wangjiaochun commented on a diff in the pull request:

    https://github.com/apache/spark/pull/23225#discussion_r239990083
  
    --- Diff: core/src/test/java/org/apache/spark/shuffle/sort/UnsafeShuffleWriterSuite.java ---
    @@ -235,6 +235,7 @@ public void writeEmptyIterator() throws Exception {
         final Option<MapStatus> mapStatus = writer.stop(true);
         assertTrue(mapStatus.isDefined());
         assertTrue(mergedOutputFile.exists());
    +    assertEquals(0, spillFilesCreated.size());
    --- End diff --
    
    I think it's unnecessary to add a new test case, and we can delete lines 239~242 of the writeEmptyIterator test because they always pass.


---



[GitHub] spark pull request #23225: [SPARK-26287][CORE]Don't need to create an empty ...

Posted by wangjiaochun <gi...@git.apache.org>.
Github user wangjiaochun commented on a diff in the pull request:

    https://github.com/apache/spark/pull/23225#discussion_r239990587
  
    --- Diff: core/src/main/java/org/apache/spark/shuffle/sort/ShuffleExternalSorter.java ---
    @@ -161,6 +161,10 @@ private void writeSortedFile(boolean isLastFile) {
         final ShuffleInMemorySorter.ShuffleSorterIterator sortedRecords =
           inMemSorter.getSortedIterator();
     
    +    // If there are no sorted records, so we don't need to create an empty spill file.
    +    if (!sortedRecords.hasNext()) {
    +      return;
    +    }
    --- End diff --
    
    Okay, I will make the changes.


---



[GitHub] spark issue #23225: [MINOR][CORE]Don't need to create an empty spill file wh...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/23225
  
    Can one of the admins verify this patch?


---



[GitHub] spark issue #23225: [SPARK-26287][CORE]Don't need to create an empty spill f...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/23225
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/5906/
    Test PASSed.


---



[GitHub] spark pull request #23225: [SPARK-26287][CORE]Don't need to create an empty ...

Posted by asfgit <gi...@git.apache.org>.
Github user asfgit closed the pull request at:

    https://github.com/apache/spark/pull/23225


---



[GitHub] spark issue #23225: [SPARK-26287][CORE]Don't need to create an empty spill f...

Posted by dongjoon-hyun <gi...@git.apache.org>.
Github user dongjoon-hyun commented on the issue:

    https://github.com/apache/spark/pull/23225
  
    Thank you, @wangjiaochun.


---



[GitHub] spark issue #23225: [SPARK-26287][CORE]Don't need to create an empty spill f...

Posted by dongjoon-hyun <gi...@git.apache.org>.
Github user dongjoon-hyun commented on the issue:

    https://github.com/apache/spark/pull/23225
  
    Retest this please.


---



[GitHub] spark issue #23225: [MINOR][CORE]Don't need to create an empty spill file wh...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/23225
  
    Can one of the admins verify this patch?


---



[GitHub] spark issue #23225: [MINOR][CORE]Don't need to create an empty spill file wh...

Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on the issue:

    https://github.com/apache/spark/pull/23225
  
    How would the existing tests cover whether the empty file is created or not?


---



[GitHub] spark pull request #23225: [SPARK-26287][CORE]Don't need to create an empty ...

Posted by dongjoon-hyun <gi...@git.apache.org>.
Github user dongjoon-hyun commented on a diff in the pull request:

    https://github.com/apache/spark/pull/23225#discussion_r239697971
  
    --- Diff: core/src/test/java/org/apache/spark/shuffle/sort/UnsafeShuffleWriterSuite.java ---
    @@ -562,4 +562,18 @@ public void testPeakMemoryUsed() throws Exception {
         }
       }
     
    +  @Test
    +  public void writeEmptyIteratorNotCreateEmptySpillFile() throws Exception {
    +    final UnsafeShuffleWriter<Object, Object> writer = createWriter(true);
    +    writer.write(Iterators.emptyIterator());
    +    final Option<MapStatus> mapStatus = writer.stop(true);
    +    assertTrue(mapStatus.isDefined());
    +    assertTrue(mergedOutputFile.exists());
    +    assertEquals(0, spillFilesCreated.size());
    --- End diff --
    
    It seems that we can move this line to `writeEmptyIterator()` instead of making a new test case.


---



[GitHub] spark issue #23225: [SPARK-26287][CORE]Don't need to create an empty spill f...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/23225
  
    **[Test build #4458 has started](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/4458/testReport)** for PR 23225 at commit [`5a3e797`](https://github.com/apache/spark/commit/5a3e797b1338ea967e3d6bf7a80b945465f4891d).


---



[GitHub] spark issue #23225: [SPARK-26287][CORE]Don't need to create an empty spill f...

Posted by wangjiaochun <gi...@git.apache.org>.
Github user wangjiaochun commented on the issue:

    https://github.com/apache/spark/pull/23225
  
    Okay, @dongjoon-hyun.


---



[GitHub] spark pull request #23225: [SPARK-26287][CORE]Don't need to create an empty ...

Posted by dongjoon-hyun <gi...@git.apache.org>.
Github user dongjoon-hyun commented on a diff in the pull request:

    https://github.com/apache/spark/pull/23225#discussion_r240014126
  
    --- Diff: core/src/test/java/org/apache/spark/shuffle/sort/UnsafeShuffleWriterSuite.java ---
    @@ -235,11 +235,8 @@ public void writeEmptyIterator() throws Exception {
         final Option<MapStatus> mapStatus = writer.stop(true);
         assertTrue(mapStatus.isDefined());
         assertTrue(mergedOutputFile.exists());
    +    assertEquals(0, spillFilesCreated.size());
         assertArrayEquals(new long[NUM_PARTITITONS], partitionSizesInMergedFile);
    -    assertEquals(0, taskMetrics.shuffleWriteMetrics().recordsWritten());
    -    assertEquals(0, taskMetrics.shuffleWriteMetrics().bytesWritten());
    -    assertEquals(0, taskMetrics.diskBytesSpilled());
    -    assertEquals(0, taskMetrics.memoryBytesSpilled());
    --- End diff --
    
    We need to keep this test coverage to guarantee that the task metrics remain in an untouched state.
    Please revert this removal.
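
    Putting the pieces from this thread together, the consolidated `writeEmptyIterator()` would then look roughly as follows (a sketch only; it relies on the suite's existing harness fields such as `mergedOutputFile`, `spillFilesCreated` and `taskMetrics`, so it is not standalone):

          @Test
          public void writeEmptyIterator() throws Exception {
            final UnsafeShuffleWriter<Object, Object> writer = createWriter(true);
            writer.write(Iterators.emptyIterator());
            final Option<MapStatus> mapStatus = writer.stop(true);
            assertTrue(mapStatus.isDefined());
            assertTrue(mergedOutputFile.exists());
            // New assertion from this PR: an empty input must not create a spill file.
            assertEquals(0, spillFilesCreated.size());
            assertArrayEquals(new long[NUM_PARTITITONS], partitionSizesInMergedFile);
            // Kept so the task metrics stay verifiably untouched.
            assertEquals(0, taskMetrics.shuffleWriteMetrics().recordsWritten());
            assertEquals(0, taskMetrics.shuffleWriteMetrics().bytesWritten());
            assertEquals(0, taskMetrics.diskBytesSpilled());
            assertEquals(0, taskMetrics.memoryBytesSpilled());
          }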


---



[GitHub] spark issue #23225: [SPARK-26287][CORE]Don't need to create an empty spill f...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/23225
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/99890/
    Test PASSed.


---



[GitHub] spark issue #23225: [SPARK-26287][CORE]Don't need to create an empty spill f...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/23225
  
    Merged build finished. Test PASSed.


---



[GitHub] spark pull request #23225: [SPARK-26287][CORE]Don't need to create an empty ...

Posted by srowen <gi...@git.apache.org>.
Github user srowen commented on a diff in the pull request:

    https://github.com/apache/spark/pull/23225#discussion_r239990036
  
    --- Diff: core/src/main/java/org/apache/spark/shuffle/sort/ShuffleExternalSorter.java ---
    @@ -161,6 +161,10 @@ private void writeSortedFile(boolean isLastFile) {
         final ShuffleInMemorySorter.ShuffleSorterIterator sortedRecords =
           inMemSorter.getSortedIterator();
     
    +    // If there are no sorted records, so we don't need to create an empty spill file.
    +    if (!sortedRecords.hasNext()) {
    +      return;
    +    }
    --- End diff --
    
    If you're going to short-circuit, why not do it at the start of the function and save the rest of the work done above?


---



[GitHub] spark pull request #23225: [SPARK-26287][CORE]Don't need to create an empty ...

Posted by wangjiaochun <gi...@git.apache.org>.
Github user wangjiaochun commented on a diff in the pull request:

    https://github.com/apache/spark/pull/23225#discussion_r239711919
  
    --- Diff: core/src/test/java/org/apache/spark/shuffle/sort/UnsafeShuffleWriterSuite.java ---
    @@ -562,4 +562,18 @@ public void testPeakMemoryUsed() throws Exception {
         }
       }
     
    +  @Test
    +  public void writeEmptyIteratorNotCreateEmptySpillFile() throws Exception {
    +    final UnsafeShuffleWriter<Object, Object> writer = createWriter(true);
    +    writer.write(Iterators.emptyIterator());
    +    final Option<MapStatus> mapStatus = writer.stop(true);
    +    assertTrue(mapStatus.isDefined());
    +    assertTrue(mergedOutputFile.exists());
    +    assertEquals(0, spillFilesCreated.size());
    --- End diff --
    
    That's it.


---



[GitHub] spark issue #23225: [SPARK-26287][CORE]Don't need to create an empty spill f...

Posted by heary-cao <gi...@git.apache.org>.
Github user heary-cao commented on the issue:

    https://github.com/apache/spark/pull/23225
  
    LGTM


---



[GitHub] spark issue #23225: [MINOR][CORE]Don't need to create an empty spill file wh...

Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on the issue:

    https://github.com/apache/spark/pull/23225
  
    Also, it needs a JIRA; it's not a minor one.


---



[GitHub] spark pull request #23225: [SPARK-26287][CORE]Don't need to create an empty ...

Posted by wangjiaochun <gi...@git.apache.org>.
Github user wangjiaochun commented on a diff in the pull request:

    https://github.com/apache/spark/pull/23225#discussion_r239704796
  
    --- Diff: core/src/test/java/org/apache/spark/shuffle/sort/UnsafeShuffleWriterSuite.java ---
    @@ -562,4 +562,18 @@ public void testPeakMemoryUsed() throws Exception {
         }
       }
     
    +  @Test
    +  public void writeEmptyIteratorNotCreateEmptySpillFile() throws Exception {
    +    final UnsafeShuffleWriter<Object, Object> writer = createWriter(true);
    +    writer.write(Iterators.emptyIterator());
    +    final Option<MapStatus> mapStatus = writer.stop(true);
    +    assertTrue(mapStatus.isDefined());
    +    assertTrue(mergedOutputFile.exists());
    +    assertEquals(0, spillFilesCreated.size());
    --- End diff --
    
    I mean that before adding the code `if (!sortedRecords.hasNext()) { return; }` it would fail. Now, adding `assertEquals(0, spillFilesCreated.size())` to writeEmptyIterator seems good.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org