Posted to issues@spark.apache.org by "Josh Rosen (JIRA)" <ji...@apache.org> on 2014/08/01 00:37:40 UTC
[jira] [Comment Edited] (SPARK-2282) PySpark crashes if too many tasks complete quickly
[ https://issues.apache.org/jira/browse/SPARK-2282?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14081616#comment-14081616 ]
Josh Rosen edited comment on SPARK-2282 at 7/31/14 10:37 PM:
-------------------------------------------------------------
Merged the improved fix from https://github.com/apache/spark/pull/1503 into 1.1.
was (Author: joshrosen):
Merged the improved fix from https://github.com/apache/spark/pull/1503
> PySpark crashes if too many tasks complete quickly
> --------------------------------------------------
>
> Key: SPARK-2282
> URL: https://issues.apache.org/jira/browse/SPARK-2282
> Project: Spark
> Issue Type: Bug
> Components: PySpark
> Affects Versions: 0.9.1, 1.0.0, 1.0.1
> Reporter: Aaron Davidson
> Assignee: Aaron Davidson
> Fix For: 0.9.2, 1.0.0, 1.0.1, 1.1.0
>
>
> Upon every task completion, PythonAccumulatorParam opens a new socket to the Accumulator server running inside the PySpark daemon. Each closed socket lingers in the TIME_WAIT state and keeps its ephemeral port occupied, so used ports can build up and crash the SparkContext if too many tasks complete too quickly. We ran into this bug with 17k tasks completing in 15 seconds.
> This bug can be worked around outside of Spark by ensuring these properties are set (on a Linux server):
> echo "1" > /proc/sys/net/ipv4/tcp_tw_reuse
> echo "1" > /proc/sys/net/ipv4/tcp_tw_recycle
> or by adding the SO_REUSEADDR option to the Socket creation within Spark.
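For illustration only, here is a minimal Python sketch of what setting SO_REUSEADDR at socket creation looks like, as suggested in the description above. It is not the code merged in the pull request; the helper name connect_with_reuseaddr and its timeout parameter are hypothetical, and whether the option actually relieves ephemeral port pressure for outgoing connections depends on the operating system.

```python
import socket


def connect_with_reuseaddr(host, port, timeout=5.0):
    """Open a TCP client socket with SO_REUSEADDR set before connecting.

    Hypothetical helper for illustration; not part of Spark.
    """
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        # The option has to be set before the socket is bound or connected,
        # because the local address is chosen at that point.
        sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
        sock.settimeout(timeout)
        sock.connect((host, port))
    except OSError:
        # Do not leak the file descriptor if the connection attempt fails.
        sock.close()
        raise
    return sock
```

A caller would use it in place of a plain connect, for example sock = connect_with_reuseaddr("127.0.0.1", port), and close the socket when done. Keeping one long-lived connection instead of opening one per task would avoid the TIME_WAIT buildup altogether, but that is a different approach from the one named in the description.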