You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@spark.apache.org by tg...@apache.org on 2020/01/29 21:13:46 UTC
[spark] branch branch-2.4 updated: [SPARK-30512] Added a dedicated
boss event loop group
This is an automated email from the ASF dual-hosted git repository.
tgraves pushed a commit to branch branch-2.4
in repository https://gitbox.apache.org/repos/asf/spark.git
The following commit(s) were added to refs/heads/branch-2.4 by this push:
new 12f4492 [SPARK-30512] Added a dedicated boss event loop group
12f4492 is described below
commit 12f449220dde0b8476f58806f7dee06fcb54da87
Author: Chandni Singh <ch...@linkedin.com>
AuthorDate: Wed Jan 29 15:02:48 2020 -0600
[SPARK-30512] Added a dedicated boss event loop group
### What changes were proposed in this pull request?
Adding a dedicated boss event loop group to the Netty pipeline in the External Shuffle Service to avoid the delay in channel registration.
```
EventLoopGroup bossGroup = NettyUtils.createEventLoop(ioMode, 1,
conf.getModuleName() + "-boss");
EventLoopGroup workerGroup = NettyUtils.createEventLoop(ioMode, conf.serverThreads(),
conf.getModuleName() + "-server");
bootstrap = new ServerBootstrap()
.group(bossGroup, workerGroup)
.channel(NettyUtils.getServerChannelClass(ioMode))
.option(ChannelOption.ALLOCATOR, allocator)
```
### Why are the changes needed?
We have been seeing a large number of SASL authentication (RPC requests) timing out with the external shuffle service.
```
java.lang.RuntimeException: java.util.concurrent.TimeoutException: Timeout waiting for task.
at org.spark-project.guava.base.Throwables.propagate(Throwables.java:160)
at org.apache.spark.network.client.TransportClient.sendRpcSync(TransportClient.java:278)
at org.apache.spark.network.sasl.SaslClientBootstrap.doBootstrap(SaslClientBootstrap.java:80)
at org.apache.spark.network.client.TransportClientFactory.createClient(TransportClientFactory.java:228)
at org.apache.spark.network.client.TransportClientFactory.createUnmanagedClient(TransportClientFactory.java:181)
at org.apache.spark.network.shuffle.ExternalShuffleClient.registerWithShuffleServer(ExternalShuffleClient.java:141)
at org.apache.spark.storage.BlockManager$$anonfun$registerWithExternalShuffleServer$1.apply$mcVI$sp(BlockManager.scala:218)
```
The investigation that we have done is described here:
https://github.com/netty/netty/issues/9890
After adding `LoggingHandler` to the netty pipeline, we saw that the registration of the channel was getting delay which is because the worker threads are busy with the existing channels.
### Does this PR introduce any user-facing change?
No
### How was this patch tested?
We have tested the patch on our clusters and with a stress testing tool. After this change, we didn't see any SASL requests timing out. Existing unit tests pass.
Closes #27240 from otterc/SPARK-30512.
Authored-by: Chandni Singh <ch...@linkedin.com>
Signed-off-by: Thomas Graves <tg...@apache.org>
(cherry picked from commit 6b47ace27d04012bcff47951ea1eea2aa6fb7d60)
Signed-off-by: Thomas Graves <tg...@apache.org>
---
.../main/java/org/apache/spark/network/server/TransportServer.java | 7 ++++---
1 file changed, 4 insertions(+), 3 deletions(-)
diff --git a/common/network-common/src/main/java/org/apache/spark/network/server/TransportServer.java b/common/network-common/src/main/java/org/apache/spark/network/server/TransportServer.java
index 9c85ab2..da3fe30 100644
--- a/common/network-common/src/main/java/org/apache/spark/network/server/TransportServer.java
+++ b/common/network-common/src/main/java/org/apache/spark/network/server/TransportServer.java
@@ -91,9 +91,10 @@ public class TransportServer implements Closeable {
private void init(String hostToBind, int portToBind) {
IOMode ioMode = IOMode.valueOf(conf.ioMode());
- EventLoopGroup bossGroup =
- NettyUtils.createEventLoop(ioMode, conf.serverThreads(), conf.getModuleName() + "-server");
- EventLoopGroup workerGroup = bossGroup;
+ EventLoopGroup bossGroup = NettyUtils.createEventLoop(ioMode, 1,
+ conf.getModuleName() + "-boss");
+ EventLoopGroup workerGroup = NettyUtils.createEventLoop(ioMode, conf.serverThreads(),
+ conf.getModuleName() + "-server");
PooledByteBufAllocator allocator = NettyUtils.createPooledByteBufAllocator(
conf.preferDirectBufs(), true /* allowCache */, conf.serverThreads());
---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@spark.apache.org
For additional commands, e-mail: commits-help@spark.apache.org