You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@spark.apache.org by WenboZhao <gi...@git.apache.org> on 2018/06/19 20:43:05 UTC

[GitHub] spark pull request #21593: [SPARK-24578][Core] Cap sub-region's size of retu...

GitHub user WenboZhao opened a pull request:

    https://github.com/apache/spark/pull/21593

    [SPARK-24578][Core] Cap sub-region's size of returned nio buffer

    ## What changes were proposed in this pull request?
    This PR tries to fix the performance regression introduced by SPARK-21517. 
    
    In our production job, we performed many parallel computations, with high possibility, some task could be scheduled to a host-2 where it needs to read the cache block data from host-1. Often, this big transfer makes the cluster suffer time out issue (it will retry 3 times, each with 120s timeout, and then do recompute to put the cache block into the local MemoryStore).
    
    The root cause is that we don't do `consolidateIfNeeded` anymore for many small chunks which causes the `buf.notBuffer()` has bad performance in the case that we have to call `copyByteBuf()` many times.
    
    ## How was this patch tested?
    Existing unit tests and also test in production 


You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/WenboZhao/spark spark-24578

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/21593.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #21593
    
----
commit a30d4de019ac4380cf5bfd36ff0cf12ef72d78f7
Author: Wenbo Zhao <wz...@...>
Date:   2018-06-19T20:34:30Z

    Cap sub-region's size of returned nio buffer

----


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #21593: [SPARK-24578][Core] Cap sub-region's size of retu...

Posted by asfgit <gi...@git.apache.org>.
Github user asfgit closed the pull request at:

    https://github.com/apache/spark/pull/21593


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21593: [SPARK-24578][Core] Cap sub-region's size of returned ni...

Posted by squito <gi...@git.apache.org>.
Github user squito commented on the issue:

    https://github.com/apache/spark/pull/21593
  
    Jenkins, ok to test


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21593: [SPARK-24578][Core] Cap sub-region's size of returned ni...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21593
  
    Merged build finished. Test FAILed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21593: [SPARK-24578][Core] Cap sub-region's size of returned ni...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21593
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/92114/
    Test FAILed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #21593: [SPARK-24578][Core] Cap sub-region's size of retu...

Posted by squito <gi...@git.apache.org>.
Github user squito commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21593#discussion_r210094158
  
    --- Diff: common/network-common/src/main/java/org/apache/spark/network/protocol/MessageWithHeader.java ---
    @@ -137,30 +137,15 @@ protected void deallocate() {
       }
     
       private int copyByteBuf(ByteBuf buf, WritableByteChannel target) throws IOException {
    -    ByteBuffer buffer = buf.nioBuffer();
    -    int written = (buffer.remaining() <= NIO_BUFFER_LIMIT) ?
    -      target.write(buffer) : writeNioBuffer(target, buffer);
    +    // SPARK-24578: cap the sub-region's size of returned nio buffer to improve the performance
    +    // for the case that the passed-in buffer has too many components.
    +    int length = Math.min(buf.readableBytes(), NIO_BUFFER_LIMIT);
    +    ByteBuffer buffer = buf.nioBuffer(buf.readerIndex(), length);
    --- End diff --
    
    for anybody watching this, eventually SPARK-25115 was opened (currently has a PR)


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #21593: [SPARK-24578][Core] Cap sub-region's size of retu...

Posted by WenboZhao <gi...@git.apache.org>.
Github user WenboZhao commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21593#discussion_r196939992
  
    --- Diff: common/network-common/src/main/java/org/apache/spark/network/protocol/MessageWithHeader.java ---
    @@ -137,30 +137,15 @@ protected void deallocate() {
       }
     
       private int copyByteBuf(ByteBuf buf, WritableByteChannel target) throws IOException {
    -    ByteBuffer buffer = buf.nioBuffer();
    -    int written = (buffer.remaining() <= NIO_BUFFER_LIMIT) ?
    -      target.write(buffer) : writeNioBuffer(target, buffer);
    +    // SPARK-24578: cap the sub-region's size of returned nio buffer to improve the performance
    +    // for the case that the passed-in buffer has too many components.
    +    int length = Math.min(buf.readableBytes(), NIO_BUFFER_LIMIT);
    +    ByteBuffer buffer = buf.nioBuffer(buf.readerIndex(), length);
    --- End diff --
    
    Thanks @squito and @zsxwing. I would prefer to do it in a different PR with more careful benchmark and testing. As @squito, that change is more prone to bugs.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21593: [SPARK-24578][Core] Cap sub-region's size of returned ni...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/21593
  
    **[Test build #92114 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/92114/testReport)** for PR 21593 at commit [`a30d4de`](https://github.com/apache/spark/commit/a30d4de019ac4380cf5bfd36ff0cf12ef72d78f7).


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #21593: [SPARK-24578][Core] Cap sub-region's size of retu...

Posted by squito <gi...@git.apache.org>.
Github user squito commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21593#discussion_r196933769
  
    --- Diff: common/network-common/src/main/java/org/apache/spark/network/protocol/MessageWithHeader.java ---
    @@ -137,30 +137,15 @@ protected void deallocate() {
       }
     
       private int copyByteBuf(ByteBuf buf, WritableByteChannel target) throws IOException {
    -    ByteBuffer buffer = buf.nioBuffer();
    -    int written = (buffer.remaining() <= NIO_BUFFER_LIMIT) ?
    -      target.write(buffer) : writeNioBuffer(target, buffer);
    +    // SPARK-24578: cap the sub-region's size of returned nio buffer to improve the performance
    +    // for the case that the passed-in buffer has too many components.
    +    int length = Math.min(buf.readableBytes(), NIO_BUFFER_LIMIT);
    +    ByteBuffer buffer = buf.nioBuffer(buf.readerIndex(), length);
    --- End diff --
    
    this pr is fixing a pretty serious issue, we know Wenbo is going to roll this out immediately, and I suspect even more users will.  this fix is also "obviously correct" -- the followup here is not super complicated, but also will be more prone to bugs.  So I'm inclined to just get this in.
    
    anyway if @WenboZhao can do the other part today, then sure, but I think we should get this in quickly.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21593: [SPARK-24578][Core] Cap sub-region's size of returned ni...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/21593
  
    **[Test build #92122 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/92122/testReport)** for PR 21593 at commit [`a30d4de`](https://github.com/apache/spark/commit/a30d4de019ac4380cf5bfd36ff0cf12ef72d78f7).


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21593: [SPARK-24578][Core] Cap sub-region's size of returned ni...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/21593
  
    **[Test build #92114 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/92114/testReport)** for PR 21593 at commit [`a30d4de`](https://github.com/apache/spark/commit/a30d4de019ac4380cf5bfd36ff0cf12ef72d78f7).
     * This patch **fails due to an unknown error code, -9**.
     * This patch merges cleanly.
     * This patch adds no public classes.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21593: [SPARK-24578][Core] Cap sub-region's size of returned ni...

Posted by zsxwing <gi...@git.apache.org>.
Github user zsxwing commented on the issue:

    https://github.com/apache/spark/pull/21593
  
    Thanks! Merging to master and 2.3.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21593: [SPARK-24578][Core] Cap sub-region's size of returned ni...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21593
  
    Can one of the admins verify this patch?


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #21593: [SPARK-24578][Core] Cap sub-region's size of retu...

Posted by WenboZhao <gi...@git.apache.org>.
Github user WenboZhao commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21593#discussion_r196761978
  
    --- Diff: common/network-common/src/main/java/org/apache/spark/network/protocol/MessageWithHeader.java ---
    @@ -137,30 +137,15 @@ protected void deallocate() {
       }
     
       private int copyByteBuf(ByteBuf buf, WritableByteChannel target) throws IOException {
    -    ByteBuffer buffer = buf.nioBuffer();
    -    int written = (buffer.remaining() <= NIO_BUFFER_LIMIT) ?
    -      target.write(buffer) : writeNioBuffer(target, buffer);
    +    // SPARK-24578: cap the sub-region's size of returned nio buffer to improve the performance
    +    // for the case that the passed-in buffer has too many components.
    +    int length = Math.min(buf.readableBytes(), NIO_BUFFER_LIMIT);
    +    ByteBuffer buffer = buf.nioBuffer(buf.readerIndex(), length);
    --- End diff --
    
    Sure, I will make a follow up PR to address this.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21593: [SPARK-24578][Core] Cap sub-region's size of returned ni...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/21593
  
    **[Test build #92122 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/92122/testReport)** for PR 21593 at commit [`a30d4de`](https://github.com/apache/spark/commit/a30d4de019ac4380cf5bfd36ff0cf12ef72d78f7).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21593: [SPARK-24578][Core] Cap sub-region's size of returned ni...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21593
  
    Can one of the admins verify this patch?


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #21593: [SPARK-24578][Core] Cap sub-region's size of retu...

Posted by squito <gi...@git.apache.org>.
Github user squito commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21593#discussion_r203157915
  
    --- Diff: common/network-common/src/main/java/org/apache/spark/network/protocol/MessageWithHeader.java ---
    @@ -137,30 +137,15 @@ protected void deallocate() {
       }
     
       private int copyByteBuf(ByteBuf buf, WritableByteChannel target) throws IOException {
    -    ByteBuffer buffer = buf.nioBuffer();
    -    int written = (buffer.remaining() <= NIO_BUFFER_LIMIT) ?
    -      target.write(buffer) : writeNioBuffer(target, buffer);
    +    // SPARK-24578: cap the sub-region's size of returned nio buffer to improve the performance
    +    // for the case that the passed-in buffer has too many components.
    +    int length = Math.min(buf.readableBytes(), NIO_BUFFER_LIMIT);
    +    ByteBuffer buffer = buf.nioBuffer(buf.readerIndex(), length);
    --- End diff --
    
    @WenboZhao did you ever follow up on this, or at least file another jira for it?  sorry if I missed it


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21593: [SPARK-24578][Core] Cap sub-region's size of returned ni...

Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on the issue:

    https://github.com/apache/spark/pull/21593
  
    cc @zsxwing @JoshRosen 


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21593: [SPARK-24578][Core] Cap sub-region's size of returned ni...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21593
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/92122/
    Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #21593: [SPARK-24578][Core] Cap sub-region's size of retu...

Posted by zsxwing <gi...@git.apache.org>.
Github user zsxwing commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21593#discussion_r196918296
  
    --- Diff: common/network-common/src/main/java/org/apache/spark/network/protocol/MessageWithHeader.java ---
    @@ -137,30 +137,15 @@ protected void deallocate() {
       }
     
       private int copyByteBuf(ByteBuf buf, WritableByteChannel target) throws IOException {
    -    ByteBuffer buffer = buf.nioBuffer();
    -    int written = (buffer.remaining() <= NIO_BUFFER_LIMIT) ?
    -      target.write(buffer) : writeNioBuffer(target, buffer);
    +    // SPARK-24578: cap the sub-region's size of returned nio buffer to improve the performance
    +    // for the case that the passed-in buffer has too many components.
    +    int length = Math.min(buf.readableBytes(), NIO_BUFFER_LIMIT);
    +    ByteBuffer buffer = buf.nioBuffer(buf.readerIndex(), length);
    --- End diff --
    
    Why not do this in this PR since this is a small change and we don't have a new release recently?


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #21593: [SPARK-24578][Core] Cap sub-region's size of retu...

Posted by squito <gi...@git.apache.org>.
Github user squito commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21593#discussion_r196637144
  
    --- Diff: common/network-common/src/main/java/org/apache/spark/network/protocol/MessageWithHeader.java ---
    @@ -137,30 +137,15 @@ protected void deallocate() {
       }
     
       private int copyByteBuf(ByteBuf buf, WritableByteChannel target) throws IOException {
    -    ByteBuffer buffer = buf.nioBuffer();
    -    int written = (buffer.remaining() <= NIO_BUFFER_LIMIT) ?
    -      target.write(buffer) : writeNioBuffer(target, buffer);
    +    // SPARK-24578: cap the sub-region's size of returned nio buffer to improve the performance
    +    // for the case that the passed-in buffer has too many components.
    +    int length = Math.min(buf.readableBytes(), NIO_BUFFER_LIMIT);
    +    ByteBuffer buffer = buf.nioBuffer(buf.readerIndex(), length);
    --- End diff --
    
    I think you can go one step further here, and call `buf.nioBuffers(int, int)` (plural) 
    https://github.com/netty/netty/blob/4.1/buffer/src/main/java/io/netty/buffer/ByteBuf.java#L2355
    
    that will avoid the copying required to create the merged buffer (though its a bit complicated as you have to check for incomplete writes from any single `target.write()` call).
    
    Also OK to leave this for now as this is a pretty important fix.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21593: [SPARK-24578][Core] Cap sub-region's size of returned ni...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21593
  
    Can one of the admins verify this patch?


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #21593: [SPARK-24578][Core] Cap sub-region's size of retu...

Posted by zsxwing <gi...@git.apache.org>.
Github user zsxwing commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21593#discussion_r196947295
  
    --- Diff: common/network-common/src/main/java/org/apache/spark/network/protocol/MessageWithHeader.java ---
    @@ -137,30 +137,15 @@ protected void deallocate() {
       }
     
       private int copyByteBuf(ByteBuf buf, WritableByteChannel target) throws IOException {
    -    ByteBuffer buffer = buf.nioBuffer();
    -    int written = (buffer.remaining() <= NIO_BUFFER_LIMIT) ?
    -      target.write(buffer) : writeNioBuffer(target, buffer);
    +    // SPARK-24578: cap the sub-region's size of returned nio buffer to improve the performance
    +    // for the case that the passed-in buffer has too many components.
    +    int length = Math.min(buf.readableBytes(), NIO_BUFFER_LIMIT);
    +    ByteBuffer buffer = buf.nioBuffer(buf.readerIndex(), length);
    --- End diff --
    
    Fair enough.
    
    I just spent several minutes to write the following codes:
    ```
      private int copyByteBuf(ByteBuf buf, WritableByteChannel target) throws IOException {
        // SPARK-24578: cap the sub-region's size of returned nio buffer to improve the performance
        // for the case that the passed-in buffer has too many components.
        int length = Math.min(buf.readableBytes(), NIO_BUFFER_LIMIT);
        ByteBuffer[] buffers = buf.nioBuffers(buf.readerIndex(), length);
        int totalWritten = 0;
        for (ByteBuffer buffer : buffers) {
          int remaining = buffer.remaining();
          int written = target.write(buffer);
          totalWritten += written;
          if (written < remaining) {
            break;
          }
        }
        buf.skipBytes(totalWritten);
        return totalWritten;
      }
    ```
    Feel free to use them in your follow up PR.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21593: [SPARK-24578][Core] Cap sub-region's size of returned ni...

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on the issue:

    https://github.com/apache/spark/pull/21593
  
    retest this please


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21593: [SPARK-24578][Core] Cap sub-region's size of returned ni...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21593
  
    Merged build finished. Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org