Posted to reviews@spark.apache.org by WenboZhao <gi...@git.apache.org> on 2018/06/19 20:43:05 UTC
[GitHub] spark pull request #21593: [SPARK-24578][Core] Cap sub-region's size of retu...
GitHub user WenboZhao opened a pull request:
https://github.com/apache/spark/pull/21593
[SPARK-24578][Core] Cap sub-region's size of returned nio buffer
## What changes were proposed in this pull request?
This PR tries to fix the performance regression introduced by SPARK-21517.
In our production jobs we run many parallel computations, so with high probability some task gets scheduled on host-2 and needs to read cached block data from host-1. This large transfer often makes the cluster hit timeouts (the fetch retries 3 times, each with a 120s timeout, and the task then recomputes the cache block into the local MemoryStore).
The root cause is that we no longer call `consolidateIfNeeded` for buffers composed of many small chunks, which makes `buf.nioBuffer()` perform badly in the case where we have to call `copyByteBuf()` many times.
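The capped-slice idea behind this fix can be sketched with plain java.nio (no Netty); the class name, `NIO_BUFFER_LIMIT` constant, and helper below are illustrative, not Spark's actual code, though Spark's `MessageWithHeader` uses the same 256 KB cap:

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.Channels;
import java.nio.channels.WritableByteChannel;

// Sketch: instead of materializing one huge NIO view of the whole buffer,
// copy it to the channel in slices capped at NIO_BUFFER_LIMIT bytes each.
public class CappedCopy {
  // Illustrative cap, matching Spark's 256 KB NIO_BUFFER_LIMIT.
  static final int NIO_BUFFER_LIMIT = 256 * 1024;

  static int copyCapped(ByteBuffer src, WritableByteChannel target) throws IOException {
    int length = Math.min(src.remaining(), NIO_BUFFER_LIMIT);
    // Duplicate so we can bound the write without disturbing src's limit.
    ByteBuffer slice = src.duplicate();
    slice.limit(slice.position() + length);
    int written = target.write(slice);
    src.position(src.position() + written);  // advance past what was written
    return written;
  }

  public static void main(String[] args) throws IOException {
    ByteArrayOutputStream out = new ByteArrayOutputStream();
    WritableByteChannel ch = Channels.newChannel(out);
    ByteBuffer src = ByteBuffer.wrap(new byte[NIO_BUFFER_LIMIT + 123]);
    int total = 0;
    while (src.hasRemaining()) {
      total += copyCapped(src, ch);   // first call writes 256 KB, second the rest
    }
    System.out.println(total);
  }
}
```

The caller loops until the source is drained, so a short write from the channel is retried naturally on the next iteration.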
## How was this patch tested?
Existing unit tests, and also tested in production.
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/WenboZhao/spark spark-24578
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/spark/pull/21593.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #21593
----
commit a30d4de019ac4380cf5bfd36ff0cf12ef72d78f7
Author: Wenbo Zhao <wz...@...>
Date: 2018-06-19T20:34:30Z
Cap sub-region's size of returned nio buffer
----
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request #21593: [SPARK-24578][Core] Cap sub-region's size of retu...
Posted by asfgit <gi...@git.apache.org>.
Github user asfgit closed the pull request at:
https://github.com/apache/spark/pull/21593
---
[GitHub] spark issue #21593: [SPARK-24578][Core] Cap sub-region's size of returned ni...
Posted by squito <gi...@git.apache.org>.
Github user squito commented on the issue:
https://github.com/apache/spark/pull/21593
Jenkins, ok to test
---
[GitHub] spark issue #21593: [SPARK-24578][Core] Cap sub-region's size of returned ni...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/21593
Merged build finished. Test FAILed.
---
[GitHub] spark issue #21593: [SPARK-24578][Core] Cap sub-region's size of returned ni...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/21593
Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/92114/
Test FAILed.
---
[GitHub] spark pull request #21593: [SPARK-24578][Core] Cap sub-region's size of retu...
Posted by squito <gi...@git.apache.org>.
Github user squito commented on a diff in the pull request:
https://github.com/apache/spark/pull/21593#discussion_r210094158
--- Diff: common/network-common/src/main/java/org/apache/spark/network/protocol/MessageWithHeader.java ---
@@ -137,30 +137,15 @@ protected void deallocate() {
}
private int copyByteBuf(ByteBuf buf, WritableByteChannel target) throws IOException {
- ByteBuffer buffer = buf.nioBuffer();
- int written = (buffer.remaining() <= NIO_BUFFER_LIMIT) ?
- target.write(buffer) : writeNioBuffer(target, buffer);
+ // SPARK-24578: cap the sub-region's size of returned nio buffer to improve the performance
+ // for the case that the passed-in buffer has too many components.
+ int length = Math.min(buf.readableBytes(), NIO_BUFFER_LIMIT);
+ ByteBuffer buffer = buf.nioBuffer(buf.readerIndex(), length);
--- End diff --
for anybody watching this, eventually SPARK-25115 was opened (currently has a PR)
---
[GitHub] spark pull request #21593: [SPARK-24578][Core] Cap sub-region's size of retu...
Posted by WenboZhao <gi...@git.apache.org>.
Github user WenboZhao commented on a diff in the pull request:
https://github.com/apache/spark/pull/21593#discussion_r196939992
--- Diff: common/network-common/src/main/java/org/apache/spark/network/protocol/MessageWithHeader.java ---
@@ -137,30 +137,15 @@ protected void deallocate() {
}
private int copyByteBuf(ByteBuf buf, WritableByteChannel target) throws IOException {
- ByteBuffer buffer = buf.nioBuffer();
- int written = (buffer.remaining() <= NIO_BUFFER_LIMIT) ?
- target.write(buffer) : writeNioBuffer(target, buffer);
+ // SPARK-24578: cap the sub-region's size of returned nio buffer to improve the performance
+ // for the case that the passed-in buffer has too many components.
+ int length = Math.min(buf.readableBytes(), NIO_BUFFER_LIMIT);
+ ByteBuffer buffer = buf.nioBuffer(buf.readerIndex(), length);
--- End diff --
Thanks @squito and @zsxwing. I would prefer to do it in a different PR with more careful benchmarking and testing. As @squito said, that change is more prone to bugs.
---
[GitHub] spark issue #21593: [SPARK-24578][Core] Cap sub-region's size of returned ni...
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/21593
**[Test build #92114 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/92114/testReport)** for PR 21593 at commit [`a30d4de`](https://github.com/apache/spark/commit/a30d4de019ac4380cf5bfd36ff0cf12ef72d78f7).
---
[GitHub] spark pull request #21593: [SPARK-24578][Core] Cap sub-region's size of retu...
Posted by squito <gi...@git.apache.org>.
Github user squito commented on a diff in the pull request:
https://github.com/apache/spark/pull/21593#discussion_r196933769
--- Diff: common/network-common/src/main/java/org/apache/spark/network/protocol/MessageWithHeader.java ---
@@ -137,30 +137,15 @@ protected void deallocate() {
}
private int copyByteBuf(ByteBuf buf, WritableByteChannel target) throws IOException {
- ByteBuffer buffer = buf.nioBuffer();
- int written = (buffer.remaining() <= NIO_BUFFER_LIMIT) ?
- target.write(buffer) : writeNioBuffer(target, buffer);
+ // SPARK-24578: cap the sub-region's size of returned nio buffer to improve the performance
+ // for the case that the passed-in buffer has too many components.
+ int length = Math.min(buf.readableBytes(), NIO_BUFFER_LIMIT);
+ ByteBuffer buffer = buf.nioBuffer(buf.readerIndex(), length);
--- End diff --
This PR fixes a pretty serious issue; we know Wenbo is going to roll it out immediately, and I suspect many more users will too. This fix is also "obviously correct" -- the follow-up here is not super complicated, but it is more prone to bugs. So I'm inclined to just get this in.
Anyway, if @WenboZhao can do the other part today, then sure, but I think we should get this in quickly.
---
[GitHub] spark issue #21593: [SPARK-24578][Core] Cap sub-region's size of returned ni...
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/21593
**[Test build #92122 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/92122/testReport)** for PR 21593 at commit [`a30d4de`](https://github.com/apache/spark/commit/a30d4de019ac4380cf5bfd36ff0cf12ef72d78f7).
---
[GitHub] spark issue #21593: [SPARK-24578][Core] Cap sub-region's size of returned ni...
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/21593
**[Test build #92114 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/92114/testReport)** for PR 21593 at commit [`a30d4de`](https://github.com/apache/spark/commit/a30d4de019ac4380cf5bfd36ff0cf12ef72d78f7).
* This patch **fails due to an unknown error code, -9**.
* This patch merges cleanly.
* This patch adds no public classes.
---
[GitHub] spark issue #21593: [SPARK-24578][Core] Cap sub-region's size of returned ni...
Posted by zsxwing <gi...@git.apache.org>.
Github user zsxwing commented on the issue:
https://github.com/apache/spark/pull/21593
Thanks! Merging to master and 2.3.
---
[GitHub] spark issue #21593: [SPARK-24578][Core] Cap sub-region's size of returned ni...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/21593
Can one of the admins verify this patch?
---
[GitHub] spark pull request #21593: [SPARK-24578][Core] Cap sub-region's size of retu...
Posted by WenboZhao <gi...@git.apache.org>.
Github user WenboZhao commented on a diff in the pull request:
https://github.com/apache/spark/pull/21593#discussion_r196761978
--- Diff: common/network-common/src/main/java/org/apache/spark/network/protocol/MessageWithHeader.java ---
@@ -137,30 +137,15 @@ protected void deallocate() {
}
private int copyByteBuf(ByteBuf buf, WritableByteChannel target) throws IOException {
- ByteBuffer buffer = buf.nioBuffer();
- int written = (buffer.remaining() <= NIO_BUFFER_LIMIT) ?
- target.write(buffer) : writeNioBuffer(target, buffer);
+ // SPARK-24578: cap the sub-region's size of returned nio buffer to improve the performance
+ // for the case that the passed-in buffer has too many components.
+ int length = Math.min(buf.readableBytes(), NIO_BUFFER_LIMIT);
+ ByteBuffer buffer = buf.nioBuffer(buf.readerIndex(), length);
--- End diff --
Sure, I will make a follow up PR to address this.
---
[GitHub] spark issue #21593: [SPARK-24578][Core] Cap sub-region's size of returned ni...
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/21593
**[Test build #92122 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/92122/testReport)** for PR 21593 at commit [`a30d4de`](https://github.com/apache/spark/commit/a30d4de019ac4380cf5bfd36ff0cf12ef72d78f7).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
---
[GitHub] spark issue #21593: [SPARK-24578][Core] Cap sub-region's size of returned ni...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/21593
Can one of the admins verify this patch?
---
[GitHub] spark pull request #21593: [SPARK-24578][Core] Cap sub-region's size of retu...
Posted by squito <gi...@git.apache.org>.
Github user squito commented on a diff in the pull request:
https://github.com/apache/spark/pull/21593#discussion_r203157915
--- Diff: common/network-common/src/main/java/org/apache/spark/network/protocol/MessageWithHeader.java ---
@@ -137,30 +137,15 @@ protected void deallocate() {
}
private int copyByteBuf(ByteBuf buf, WritableByteChannel target) throws IOException {
- ByteBuffer buffer = buf.nioBuffer();
- int written = (buffer.remaining() <= NIO_BUFFER_LIMIT) ?
- target.write(buffer) : writeNioBuffer(target, buffer);
+ // SPARK-24578: cap the sub-region's size of returned nio buffer to improve the performance
+ // for the case that the passed-in buffer has too many components.
+ int length = Math.min(buf.readableBytes(), NIO_BUFFER_LIMIT);
+ ByteBuffer buffer = buf.nioBuffer(buf.readerIndex(), length);
--- End diff --
@WenboZhao did you ever follow up on this, or at least file another jira for it? sorry if I missed it
---
[GitHub] spark issue #21593: [SPARK-24578][Core] Cap sub-region's size of returned ni...
Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on the issue:
https://github.com/apache/spark/pull/21593
cc @zsxwing @JoshRosen
---
[GitHub] spark issue #21593: [SPARK-24578][Core] Cap sub-region's size of returned ni...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/21593
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/92122/
Test PASSed.
---
[GitHub] spark pull request #21593: [SPARK-24578][Core] Cap sub-region's size of retu...
Posted by zsxwing <gi...@git.apache.org>.
Github user zsxwing commented on a diff in the pull request:
https://github.com/apache/spark/pull/21593#discussion_r196918296
--- Diff: common/network-common/src/main/java/org/apache/spark/network/protocol/MessageWithHeader.java ---
@@ -137,30 +137,15 @@ protected void deallocate() {
}
private int copyByteBuf(ByteBuf buf, WritableByteChannel target) throws IOException {
- ByteBuffer buffer = buf.nioBuffer();
- int written = (buffer.remaining() <= NIO_BUFFER_LIMIT) ?
- target.write(buffer) : writeNioBuffer(target, buffer);
+ // SPARK-24578: cap the sub-region's size of returned nio buffer to improve the performance
+ // for the case that the passed-in buffer has too many components.
+ int length = Math.min(buf.readableBytes(), NIO_BUFFER_LIMIT);
+ ByteBuffer buffer = buf.nioBuffer(buf.readerIndex(), length);
--- End diff --
Why not do this in this PR since this is a small change and we don't have a new release recently?
---
[GitHub] spark pull request #21593: [SPARK-24578][Core] Cap sub-region's size of retu...
Posted by squito <gi...@git.apache.org>.
Github user squito commented on a diff in the pull request:
https://github.com/apache/spark/pull/21593#discussion_r196637144
--- Diff: common/network-common/src/main/java/org/apache/spark/network/protocol/MessageWithHeader.java ---
@@ -137,30 +137,15 @@ protected void deallocate() {
}
private int copyByteBuf(ByteBuf buf, WritableByteChannel target) throws IOException {
- ByteBuffer buffer = buf.nioBuffer();
- int written = (buffer.remaining() <= NIO_BUFFER_LIMIT) ?
- target.write(buffer) : writeNioBuffer(target, buffer);
+ // SPARK-24578: cap the sub-region's size of returned nio buffer to improve the performance
+ // for the case that the passed-in buffer has too many components.
+ int length = Math.min(buf.readableBytes(), NIO_BUFFER_LIMIT);
+ ByteBuffer buffer = buf.nioBuffer(buf.readerIndex(), length);
--- End diff --
I think you can go one step further here and call `buf.nioBuffers(int, int)` (plural):
https://github.com/netty/netty/blob/4.1/buffer/src/main/java/io/netty/buffer/ByteBuf.java#L2355
That will avoid the copying required to create the merged buffer (though it's a bit complicated, as you have to check for incomplete writes from any single `target.write()` call).
Also OK to leave this for now, as this is a pretty important fix.
---
[GitHub] spark issue #21593: [SPARK-24578][Core] Cap sub-region's size of returned ni...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/21593
Can one of the admins verify this patch?
---
[GitHub] spark pull request #21593: [SPARK-24578][Core] Cap sub-region's size of retu...
Posted by zsxwing <gi...@git.apache.org>.
Github user zsxwing commented on a diff in the pull request:
https://github.com/apache/spark/pull/21593#discussion_r196947295
--- Diff: common/network-common/src/main/java/org/apache/spark/network/protocol/MessageWithHeader.java ---
@@ -137,30 +137,15 @@ protected void deallocate() {
}
private int copyByteBuf(ByteBuf buf, WritableByteChannel target) throws IOException {
- ByteBuffer buffer = buf.nioBuffer();
- int written = (buffer.remaining() <= NIO_BUFFER_LIMIT) ?
- target.write(buffer) : writeNioBuffer(target, buffer);
+ // SPARK-24578: cap the sub-region's size of returned nio buffer to improve the performance
+ // for the case that the passed-in buffer has too many components.
+ int length = Math.min(buf.readableBytes(), NIO_BUFFER_LIMIT);
+ ByteBuffer buffer = buf.nioBuffer(buf.readerIndex(), length);
--- End diff --
Fair enough.
I just spent several minutes writing the following code:
```java
private int copyByteBuf(ByteBuf buf, WritableByteChannel target) throws IOException {
  // SPARK-24578: cap the sub-region's size of returned nio buffer to improve the performance
  // for the case that the passed-in buffer has too many components.
  int length = Math.min(buf.readableBytes(), NIO_BUFFER_LIMIT);
  ByteBuffer[] buffers = buf.nioBuffers(buf.readerIndex(), length);
  int totalWritten = 0;
  for (ByteBuffer buffer : buffers) {
    int remaining = buffer.remaining();
    int written = target.write(buffer);
    totalWritten += written;
    if (written < remaining) {
      break;
    }
  }
  buf.skipBytes(totalWritten);
  return totalWritten;
}
```
Feel free to use them in your follow up PR.
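The partial-write handling in the loop above can be exercised without Netty; the sketch below (class, method, and `cappedChannel` helper are all hypothetical names) writes an array of plain java.nio buffers and stops at the first short write, just as the Netty version does:

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.WritableByteChannel;

// Plain-java.nio analogue of the partial-write loop: write each buffer in
// turn, stop as soon as the channel accepts fewer bytes than offered, and
// return the total written so the caller can resume later.
public class PartialWriteLoop {
  static int writeBuffers(ByteBuffer[] buffers, WritableByteChannel target) throws IOException {
    int totalWritten = 0;
    for (ByteBuffer buffer : buffers) {
      int remaining = buffer.remaining();
      int written = target.write(buffer);
      totalWritten += written;
      if (written < remaining) {
        break; // channel accepted less than offered; stop and retry later
      }
    }
    return totalWritten;
  }

  /** Test channel accepting at most `cap` bytes per write, simulating a full socket buffer. */
  static WritableByteChannel cappedChannel(int cap) {
    return new WritableByteChannel() {
      public int write(ByteBuffer src) {
        int n = Math.min(cap, src.remaining());
        src.position(src.position() + n);
        return n;
      }
      public boolean isOpen() { return true; }
      public void close() {}
    };
  }
}
```

With a channel capped at 3 bytes, writing two 4-byte buffers stops after the first short write and reports 3 bytes written; an uncapped channel drains all 8.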
---
[GitHub] spark issue #21593: [SPARK-24578][Core] Cap sub-region's size of returned ni...
Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on the issue:
https://github.com/apache/spark/pull/21593
retest this please
---
[GitHub] spark issue #21593: [SPARK-24578][Core] Cap sub-region's size of returned ni...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/21593
Merged build finished. Test PASSed.
---