Posted to reviews@spark.apache.org by peshopetrov <gi...@git.apache.org> on 2018/02/05 17:46:13 UTC
[GitHub] spark pull request #20512: [SPARK-23182][CORE] Allow enabling TCP keep alive...
GitHub user peshopetrov opened a pull request:
https://github.com/apache/spark/pull/20512
[SPARK-23182][CORE] Allow enabling TCP keep alive on the master RPC connections.
## What changes were proposed in this pull request?
Make it possible for the master to enable TCP keep alive on the RPC connections with clients.
## How was this patch tested?
Manually tested.
Added the following:
spark.rpc.io.enableTcpKeepAlive true
to spark-defaults.conf.
Observed the following:
# netstat -town | grep 7077
tcp6 0 0 10.240.3.134:7077 10.240.1.25:42851 ESTABLISHED keepalive (6736.50/0/0)
tcp6 0 0 10.240.3.134:44911 10.240.3.134:7077 ESTABLISHED keepalive (4098.68/0/0)
tcp6 0 0 10.240.3.134:7077 10.240.3.134:44911 ESTABLISHED keepalive (4098.68/0/0)
This confirms that the keep alive setting is taking effect.
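For readers who want to check the same thing programmatically, here is a minimal sketch that pulls the timer column out of a `netstat -o` line like the ones above. The `Timer` type and `parse_timer` helper are hypothetical names used only for illustration. The reading of the three numbers (seconds left on the timer, retransmissions, keepalive probes sent) is an assumption based on the usual net-tools output and should be checked against the local netstat documentation.

```python
import re
from typing import NamedTuple, Optional


class Timer(NamedTuple):
    """Timer column of a `netstat -o` line (hypothetical helper type)."""
    name: str                 # e.g. "keepalive", "off", "on", "timewait"
    remaining_seconds: float  # assumed: time left before the timer fires
    retransmits: int          # assumed: retransmissions so far
    probes_sent: int          # assumed: keepalive probes sent so far


# Matches a trailing "<name> (<float>/<int>/<int>)" at the end of the line.
_TIMER_RE = re.compile(r"(\w+) \((\d+(?:\.\d+)?)/(\d+)/(\d+)\)\s*$")


def parse_timer(netstat_line: str) -> Optional[Timer]:
    """Return the timer column of a netstat line, or None if it has none."""
    match = _TIMER_RE.search(netstat_line)
    if match is None:
        return None
    name, remaining, retransmits, probes = match.groups()
    return Timer(name, float(remaining), int(retransmits), int(probes))


if __name__ == "__main__":
    line = ("tcp6 0 0 10.240.3.134:7077 10.240.1.25:42851 "
            "ESTABLISHED keepalive (6736.50/0/0)")
    print(parse_timer(line))
```

A connection without keep alive would typically show `off` as the timer name, so filtering on `name == "keepalive"` is one way to spot connections that are not covered.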
It is currently possible to enable TCP keep alive on the worker / executor side, but not on the master, and it is unclear to me why. Keep alive matters more on the master, because it protects the master against workers / executors that depart suddenly, so I think it is important to have it. In particular, it makes the master resilient when using preemptible worker VMs in GCE. GCE has the concept of shutdown scripts, but it does not guarantee that they execute. As a result, workers often do not get shut down gracefully, and their TCP connections linger on the master because nothing closes them. Hence the need to enable keep alive.
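To illustrate the mechanism in general terms, the following sketch enables `SO_KEEPALIVE` on the server side of an accepted connection using plain sockets. It is not the code from this patch and does not use Spark or Netty; the function names are hypothetical, and the loopback listener exists only to make the example self-contained.

```python
import socket


def accept_with_keepalive(listener: socket.socket) -> socket.socket:
    """Accept one connection and enable TCP keep alive on the server side.

    With keep alive on, the OS probes an idle peer and eventually closes
    the connection if the peer has vanished without sending FIN or RST.
    """
    conn, _addr = listener.accept()
    conn.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)
    return conn


def keepalive_enabled(sock: socket.socket) -> bool:
    """Report whether SO_KEEPALIVE is set (any nonzero value counts)."""
    return sock.getsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE) != 0


if __name__ == "__main__":
    # Illustration only: a throwaway listener on an ephemeral loopback port.
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))
    listener.listen(1)
    client = socket.create_connection(listener.getsockname())
    server_side = accept_with_keepalive(listener)
    print("server-side keep alive:", keepalive_enabled(server_side))
    for s in (client, server_side, listener):
        s.close()
```

The point of setting the option on the accepted socket is that each side of a connection controls its own keep alive probes, so enabling it on the client alone does not help the server notice a vanished peer.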
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/peshopetrov/spark master
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/spark/pull/20512.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #20512
----
commit c5e2d98b9e98fd3416a36ab91262260146bf4ac5
Author: Petar Petrov <pe...@...>
Date: 2018-01-23T09:02:41Z
[SPARK-23182][CORE] Allow enabling TCP keep alive on the master RPC connections.
----
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #20512: [SPARK-23182][CORE] Allow enabling TCP keep alive on the...
Posted by vundela <gi...@git.apache.org>.
Github user vundela commented on the issue:
https://github.com/apache/spark/pull/20512
cc @squito @vanzin
Can you please comment on this PR?
---
[GitHub] spark issue #20512: [SPARK-23182][CORE] Allow enabling TCP keep alive on the...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/20512
Can one of the admins verify this patch?
---
[GitHub] spark issue #20512: [SPARK-23182][CORE] Allow enabling TCP keep alive on the...
Posted by jerryshao <gi...@git.apache.org>.
Github user jerryshao commented on the issue:
https://github.com/apache/spark/pull/20512
Is it possible that TCP keepalive is disabled by the kernel, so that your approach would not work? I was wondering whether it would be better to add an application-level heartbeat message to detect lost workers.
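As a rough picture of what an application-level heartbeat could look like, here is a minimal sketch of the bookkeeping on the receiving side. It is not taken from Spark; `HeartbeatTracker`, its methods, and the 60-second timeout are hypothetical and chosen only for illustration.

```python
import time
from typing import Callable, Dict, List


class HeartbeatTracker:
    """Tracks the last heartbeat per worker and reports workers gone silent.

    Hypothetical illustration; the clock is injectable so the logic can be
    exercised without sleeping.
    """

    def __init__(self, timeout_seconds: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._timeout = timeout_seconds
        self._clock = clock
        self._last_seen: Dict[str, float] = {}

    def record(self, worker_id: str) -> None:
        """Call whenever a heartbeat message arrives from a worker."""
        self._last_seen[worker_id] = self._clock()

    def expire(self) -> List[str]:
        """Remove and return workers whose last heartbeat is too old."""
        now = self._clock()
        lost = [worker for worker, seen in self._last_seen.items()
                if now - seen > self._timeout]
        for worker in lost:
            del self._last_seen[worker]
        return lost


if __name__ == "__main__":
    tracker = HeartbeatTracker(timeout_seconds=60.0)
    tracker.record("worker-a")
    print(tracker.expire())  # nothing has timed out yet
```

Unlike TCP keep alive, this kind of check does not depend on kernel settings, but it requires every peer to send heartbeats and the receiver to call `expire()` periodically.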
---
[GitHub] spark issue #20512: [SPARK-23182][CORE] Allow enabling TCP keep alive on the...
Posted by peshopetrov <gi...@git.apache.org>.
Github user peshopetrov commented on the issue:
https://github.com/apache/spark/pull/20512
For completeness, it should be possible to enable OS-level TCP keep alive. The client already enables TCP keepalive on its side, and it should be possible on the server too.
Independently of that, it perhaps makes sense to also have application-level heartbeats, because in the JVM it seems it is not possible to tune the TCP keepalive timeouts.
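To make the timeout concern concrete, the sketch below shows what per-socket tuning looks like at the OS level and how long detection can take. It is an illustration outside the JVM: it assumes a platform where Python exposes `TCP_KEEPIDLE`, `TCP_KEEPINTVL` and `TCP_KEEPCNT` (Linux does), and the helper names are hypothetical. The default values quoted in the comments (7200 s idle, 75 s interval, 9 probes) are the stock Linux sysctl defaults; a given host may be configured differently.

```python
import socket


def worst_case_detection_seconds(idle: int, interval: int, probes: int) -> int:
    """Time until a silent dead peer is detected: idle period plus all probes."""
    return idle + interval * probes


def tune_keepalive(sock: socket.socket, idle: int, interval: int,
                   probes: int) -> bool:
    """Enable keep alive with per-socket timers where the platform allows it.

    Returns False (and changes nothing) if the per-socket options are not
    exposed on this platform.
    """
    needed = ("TCP_KEEPIDLE", "TCP_KEEPINTVL", "TCP_KEEPCNT")
    if not all(hasattr(socket, name) for name in needed):
        return False
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)
    sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPIDLE, idle)
    sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPINTVL, interval)
    sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPCNT, probes)
    return True


if __name__ == "__main__":
    # Stock Linux defaults: 7200 s idle, 75 s between probes, 9 probes.
    print(worst_case_detection_seconds(7200, 75, 9), "seconds with defaults")
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    print("tuned:", tune_keepalive(sock, idle=60, interval=10, probes=3))
    sock.close()
```

With untuned defaults the first probe is only sent after about two hours of idleness, which is why a shorter application-level heartbeat can still be worthwhile even when keep alive is enabled.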
---
[GitHub] spark issue #20512: [SPARK-23182][CORE] Allow enabling TCP keep alive on the...
Posted by squito <gi...@git.apache.org>.
Github user squito commented on the issue:
https://github.com/apache/spark/pull/20512
This is just far enough outside my expertise that I don't have an opinion, but @zsxwing might have some thoughts.
---
[GitHub] spark issue #20512: [SPARK-23182][CORE] Allow enabling TCP keep alive on the...
Posted by peshopetrov <gi...@git.apache.org>.
Github user peshopetrov commented on the issue:
https://github.com/apache/spark/pull/20512
Any update?
We have rolled out our Spark clusters with this change and it seems to be working great. We see no lingering connections on the masters.
---