Posted to reviews@spark.apache.org by peshopetrov <gi...@git.apache.org> on 2018/02/05 17:46:13 UTC

[GitHub] spark pull request #20512: [SPARK-23182][CORE] Allow enabling TCP keep alive...

GitHub user peshopetrov opened a pull request:

    https://github.com/apache/spark/pull/20512

    [SPARK-23182][CORE] Allow enabling TCP keep alive on the master RPC connections.

    ## What changes were proposed in this pull request?
    
    Make it possible for the master to enable TCP keep alive on the RPC connections with clients.
    
    ## How was this patch tested?
    
    Manually tested.
    
    Added the following:
    spark.rpc.io.enableTcpKeepAlive  true
    
    to spark-defaults.conf.
    
    Observed the following:
    # netstat -town | grep 7077
    tcp6       0      0 10.240.3.134:7077       10.240.1.25:42851       ESTABLISHED keepalive (6736.50/0/0)
    tcp6       0      0 10.240.3.134:44911      10.240.3.134:7077       ESTABLISHED keepalive (4098.68/0/0)
    tcp6       0      0 10.240.3.134:7077       10.240.3.134:44911      ESTABLISHED keepalive (4098.68/0/0)
    
    This shows that the keep alive setting is taking effect.
    
    
    It is currently possible to enable TCP keep alive on the worker / executor side, but not on the master, and it is unclear to me why. Keep alive matters more on the master, because it protects the master against workers / executors that depart suddenly. In particular, this makes the master resilient when using preemptible worker VMs in GCE. GCE has the concept of shutdown scripts, but it does not guarantee that they run. As a result, workers are often not shut down gracefully, and their TCP connections linger on the master because nothing closes them. Hence the need to enable keep alive.
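
As a rough illustration of what enabling keep alive on the accepting (server) side means at the socket level, here is a minimal Python sketch. It is not the code in this patch and not Spark's implementation; the helper names `enable_keepalive`, `accept_with_keepalive` and `keepalive_enabled` are hypothetical and exist only for this example.

```python
import socket


def enable_keepalive(sock: socket.socket) -> None:
    """Turn on SO_KEEPALIVE so the OS probes the peer when the connection is idle."""
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)


def keepalive_enabled(sock: socket.socket) -> bool:
    """Report whether SO_KEEPALIVE is currently set on the socket."""
    return sock.getsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE) != 0


def accept_with_keepalive(server: socket.socket) -> socket.socket:
    """Accept one client connection and enable keep alive on it.

    Hypothetical helper: 'server' is assumed to be a listening TCP socket.
    """
    conn, _addr = server.accept()
    enable_keepalive(conn)
    return conn
```

With the option set, the OS eventually notices a peer that vanished without closing the connection and tears the socket down, instead of leaving it in ESTABLISHED indefinitely.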


You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/peshopetrov/spark master

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/20512.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #20512
    
----
commit c5e2d98b9e98fd3416a36ab91262260146bf4ac5
Author: Petar Petrov <pe...@...>
Date:   2018-01-23T09:02:41Z

    [SPARK-23182][CORE] Allow enabling TCP keep alive on the master RPC connections.

----


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #20512: [SPARK-23182][CORE] Allow enabling TCP keep alive on the...

Posted by vundela <gi...@git.apache.org>.
Github user vundela commented on the issue:

    https://github.com/apache/spark/pull/20512
  
    cc @squito @vanzin 
    Can you please comment on this PR? 


---



[GitHub] spark issue #20512: [SPARK-23182][CORE] Allow enabling TCP keep alive on the...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/20512
  
    Can one of the admins verify this patch?


---



[GitHub] spark issue #20512: [SPARK-23182][CORE] Allow enabling TCP keep alive on the...

Posted by jerryshao <gi...@git.apache.org>.
Github user jerryshao commented on the issue:

    https://github.com/apache/spark/pull/20512
  
    Is it possible that TCP keepalive is disabled by the kernel, in which case your approach would not work? I was wondering whether it would be better to add an application-level heartbeat message to detect lost workers.
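
To make the application-level alternative concrete, here is a minimal Python sketch of heartbeat-based loss detection. It is a generic illustration, not a description of any existing Spark mechanism; the class `HeartbeatTracker`, its methods, and the worker ids are all hypothetical.

```python
import time
from typing import Callable, Dict, List


class HeartbeatTracker:
    """Remembers when each worker last sent a heartbeat and lists silent ones."""

    def __init__(self, timeout_s: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.timeout_s = timeout_s
        # The clock is injectable so the logic can be exercised without sleeping.
        self.clock = clock
        self.last_seen: Dict[str, float] = {}

    def record(self, worker_id: str) -> None:
        """Call this whenever a heartbeat message arrives from a worker."""
        self.last_seen[worker_id] = self.clock()

    def lost_workers(self) -> List[str]:
        """Return ids of workers whose last heartbeat is older than the timeout."""
        now = self.clock()
        return sorted(w for w, seen in self.last_seen.items()
                      if now - seen > self.timeout_s)

    def remove(self, worker_id: str) -> None:
        """Forget a worker, for example after its connection has been closed."""
        self.last_seen.pop(worker_id, None)
```

A server using this approach would call `lost_workers()` periodically and close the connections of the workers it returns. Unlike OS-level keep alive, the timeout here is chosen by the application.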


---



[GitHub] spark issue #20512: [SPARK-23182][CORE] Allow enabling TCP keep alive on the...

Posted by peshopetrov <gi...@git.apache.org>.
Github user peshopetrov commented on the issue:

    https://github.com/apache/spark/pull/20512
  
    For completeness, it should be possible to enable OS-level TCP keep alives. The client already enables TCP keepalive on its side, and it should be possible on the server too.
    
    However, independent of that, it perhaps makes sense to also have application-level heartbeats, because in the JVM it seems it is not possible to tune the TCP keepalive timeouts.
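
For contrast, here is a sketch of what per-socket tuning looks like where the platform exposes it. It is written in Python rather than for the JVM, and it assumes the `TCP_KEEPIDLE`, `TCP_KEEPINTVL` and `TCP_KEEPCNT` socket options are available (as on Linux). The function names `tune_keepalive` and `worst_case_detection_s` are hypothetical, and the detection-time formula is an approximation.

```python
import socket


def tune_keepalive(sock: socket.socket, idle_s: int, interval_s: int,
                   probes: int) -> bool:
    """Enable keep alive and, where supported, set per-socket probe timings.

    Returns True if the timing options were applied, False if this platform's
    socket module does not expose them (keep alive itself is still enabled).
    """
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)
    names = ("TCP_KEEPIDLE", "TCP_KEEPINTVL", "TCP_KEEPCNT")
    if not all(hasattr(socket, name) for name in names):
        return False
    # Seconds of idleness before the first probe is sent.
    sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPIDLE, idle_s)
    # Seconds between successive probes.
    sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPINTVL, interval_s)
    # Number of unanswered probes before the connection is dropped.
    sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPCNT, probes)
    return True


def worst_case_detection_s(idle_s: int, interval_s: int, probes: int) -> int:
    """Approximate time until a silently dead peer is detected."""
    return idle_s + interval_s * probes
```

With the commonly cited Linux system defaults of 7200 s idle, 75 s interval and 9 probes, `worst_case_detection_s(7200, 75, 9)` gives 7875 s, a little over two hours. Without per-socket tuning, those system-wide values decide how long a dead connection can linger.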


---



[GitHub] spark issue #20512: [SPARK-23182][CORE] Allow enabling TCP keep alive on the...

Posted by squito <gi...@git.apache.org>.
Github user squito commented on the issue:

    https://github.com/apache/spark/pull/20512
  
    This is just far enough outside my expertise that I don't have an opinion, but @zsxwing might have some thoughts.


---



[GitHub] spark issue #20512: [SPARK-23182][CORE] Allow enabling TCP keep alive on the...

Posted by peshopetrov <gi...@git.apache.org>.
Github user peshopetrov commented on the issue:

    https://github.com/apache/spark/pull/20512
  
    Any update?
    We have rolled out our Spark clusters with this change and it seems to be working great. We see no lingering connections on the masters.


---
