Posted to issues@spark.apache.org by "Manu Zhang (Jira)" <ji...@apache.org> on 2020/03/23 09:31:00 UTC
[jira] [Updated] (SPARK-31219) YarnShuffleService doesn't close idle netty channel
[ https://issues.apache.org/jira/browse/SPARK-31219?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Manu Zhang updated SPARK-31219:
-------------------------------
Description:
Recently, we found that our YarnShuffleService has many [half-open connections|https://blog.stephencleary.com/2009/05/detection-of-half-open-dropped.html]: the shuffle server still sees the connection as established while the client has already closed it.
For example, the output of `ss -nt sport = :7337` on the server shows
{code:java}
ESTAB 0 0 server:7337 client:port
{code}
However, running `ss -nt dport = :7337 | grep server` on the client returns nothing.
Looking at the code, `YarnShuffleService` creates a `TransportContext` with `closeIdleConnections` set to false.
{code:java}
public class YarnShuffleService extends AuxiliaryService {
  ...
  @Override protected void serviceInit(Configuration conf) throws Exception {
    ...
    transportContext = new TransportContext(transportConf, blockHandler);
    ...
  }
  ...
}

public class TransportContext implements Closeable {
  ...
  public TransportContext(TransportConf conf, RpcHandler rpcHandler) {
    this(conf, rpcHandler, false, false);
  }

  public TransportContext(TransportConf conf, RpcHandler rpcHandler, boolean closeIdleConnections) {
    this(conf, rpcHandler, closeIdleConnections, false);
  }
  ...
}
{code}
Hence, the channel may never be closed on the server side if the server misses the event signaling that the client has closed it.
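To illustrate why an idle timeout would help: a half-open connection carries no further traffic, so a server that reaps channels after a period of inactivity would eventually drop it even if it never saw the client's close. The following is only a minimal, self-contained sketch of that idea. `IdleConnectionTracker` and its methods are hypothetical names made up for illustration; this is not how Spark or Netty implement it.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.Iterator;
import java.util.List;
import java.util.Map;

// Hypothetical illustration only; not a Spark or Netty class.
class IdleConnectionTracker {
  private final long idleTimeoutMs;
  // channel id -> timestamp (ms) of the last read or write seen on that channel
  private final Map<String, Long> lastActivityMs = new HashMap<>();

  IdleConnectionTracker(long idleTimeoutMs) {
    this.idleTimeoutMs = idleTimeoutMs;
  }

  // Record traffic on a channel, which resets its idle clock.
  void recordActivity(String channelId, long nowMs) {
    lastActivityMs.put(channelId, nowMs);
  }

  // Forget and return every channel that has been idle for at least the timeout.
  // A real server would also close the underlying socket here.
  List<String> closeIdle(long nowMs) {
    List<String> closed = new ArrayList<>();
    Iterator<Map.Entry<String, Long>> it = lastActivityMs.entrySet().iterator();
    while (it.hasNext()) {
      Map.Entry<String, Long> entry = it.next();
      if (nowMs - entry.getValue() >= idleTimeoutMs) {
        closed.add(entry.getKey());
        it.remove();
      }
    }
    return closed;
  }

  // Number of channels still tracked as open.
  int openCount() {
    return lastActivityMs.size();
  }
}
```

Without such a sweep (the equivalent of `closeIdleConnections` being false), a channel whose client vanished silently would stay in the map forever, which matches the lingering `ESTAB` entries we see.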
I found that this parameter is set to true for `ExternalShuffleService`.
Is there any reason for the difference here? Can we enable `closeIdleConnections` in `YarnShuffleService`, or at least add a configuration to enable it?
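A rough sketch of what the configuration option could look like is below. The key name `spark.yarn.shuffle.closeIdleConnections` is an assumption made up for this example and is not an existing Spark setting, `ShuffleServiceSettings` is a hypothetical helper, and `java.util.Properties` stands in for the Hadoop `Configuration` that `serviceInit` actually receives.

```java
import java.util.Properties;

// Hypothetical helper; the key below is an assumed name, not an existing Spark setting.
class ShuffleServiceSettings {
  static final String CLOSE_IDLE_KEY = "spark.yarn.shuffle.closeIdleConnections";

  // Returns the value to pass as closeIdleConnections.
  // Defaults to false so that the current behavior is kept unless the user opts in.
  static boolean closeIdleConnections(Properties conf) {
    return Boolean.parseBoolean(conf.getProperty(CLOSE_IDLE_KEY, "false"));
  }
}
```

The returned value would then be passed as the third argument of the three-argument `TransportContext` constructor shown above, instead of relying on the two-argument constructor that hard-codes false.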
was:
Recently, we found that our YarnShuffleService has many [half-open connections|https://blog.stephencleary.com/2009/05/detection-of-half-open-dropped.html]: the shuffle server still sees the connection as established while the client has already closed it.
For example, the output of `ss -nt sport = :7337` on the server shows
{code:java}
ESTAB 0 0 server:7337 client:port
{code}
However, running `ss -nt dport = :7337 | grep server` on the client returns nothing.
Looking at the code, `YarnShuffleService` creates a `TransportContext` with `closeIdleConnections` set to false.
{code:java}
public class YarnShuffleService extends AuxiliaryService {
  ...
  @Override protected void serviceInit(Configuration conf) throws Exception {
    ...
    transportContext = new TransportContext(transportConf, blockHandler);
    ...
  }
  ...
}

public class TransportContext implements Closeable {
  ...
  public TransportContext(TransportConf conf, RpcHandler rpcHandler) {
    this(conf, rpcHandler, false, false);
  }

  public TransportContext(TransportConf conf, RpcHandler rpcHandler, boolean closeIdleConnections) {
    this(conf, rpcHandler, closeIdleConnections, false);
  }
  ...
}
{code}
Hence, the channel may never be closed on the server side if the server misses the event signaling that the client has closed it.
I found that this parameter is set to true for `ExternalShuffleService`.
Is there any reason for the difference here? Would it be valuable to add a configuration that allows enabling `closeIdleConnections`?
> YarnShuffleService doesn't close idle netty channel
> ---------------------------------------------------
>
> Key: SPARK-31219
> URL: https://issues.apache.org/jira/browse/SPARK-31219
> Project: Spark
> Issue Type: Improvement
> Components: Shuffle
> Affects Versions: 2.4.5, 3.0.0
> Reporter: Manu Zhang
> Priority: Major
>