Posted to user@spark.apache.org by Antony Mayi <an...@yahoo.com.INVALID> on 2016/01/04 10:40:05 UTC

Re: pyspark streaming crashes

Just for reference: in my case this problem is caused by this bug: https://issues.apache.org/jira/browse/SPARK-12617

    On Monday, 21 December 2015, 14:32, Antony Mayi <an...@yahoo.com> wrote:
 
 

I noticed it might be related to longer GC pauses (1-2 sec); the crash usually occurs right after such a pause. Could that be causing the Python-Java gateway to time out?
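
One thing that might help rule that out is giving the RPC layer more slack, so a 1-2 sec pause is not mistaken for a dead peer. A minimal sketch, assuming the conf is built in the driver script; the setting names are standard Spark configs, but the values are only illustrative:

    from pyspark import SparkConf, SparkContext

    # Illustrative values only: raise the timeouts so a 1-2 sec GC pause
    # does not look like a lost executor or a dead gateway connection.
    conf = (SparkConf()
            .setAppName("streaming-job")
            .set("spark.network.timeout", "300s")              # default 120s
            .set("spark.executor.heartbeatInterval", "30s"))   # default 10s

    sc = SparkContext(conf=conf)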

    On Sunday, 20 December 2015, 23:05, Antony Mayi <an...@yahoo.com> wrote:
 
 

 Hi,
Can anyone please help me troubleshoot this problem: I have a streaming PySpark application (Spark 1.5.2 on yarn-client) which keeps crashing after a few hours. It doesn't seem to be running out of memory on either the driver or the executors.
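
The job itself follows the usual StreamingContext pattern; here is a minimal sketch of its shape (the source, batch interval, and transformation below are placeholders, not the real pipeline):

    from pyspark import SparkContext
    from pyspark.streaming import StreamingContext

    sc = SparkContext(appName="streaming-job")
    ssc = StreamingContext(sc, batchDuration=10)   # 10-second batches (placeholder)

    # Placeholder source/transform; the real job runs its own DStream pipeline.
    lines = ssc.socketTextStream("localhost", 9999)
    lines.count().pprint()

    ssc.start()
    ssc.awaitTermination()   # this is the call that raises the Py4JJavaError below
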
driver error:
py4j.protocol.Py4JJavaError: An error occurred while calling o1.awaitTermination.
: java.io.IOException: py4j.Py4JException: Error while obtaining a new communication channel
        at org.apache.spark.util.Utils$.tryOrIOException(Utils.scala:1163)
        at org.apache.spark.streaming.api.python.TransformFunction.writeObject(PythonDStream.scala:77)
        at sun.reflect.GeneratedMethodAccessor17.invoke(Unknown Source)

error on all executors:
  File "/u04/yarn/local/usercache/das/appcache/application_1450337892069_0336/container_1450337892069_0336_01_000008/pyspark.zip/pyspark/worker.py", line 136, in main    if read_int(infile) == SpecialLengths.END_OF_STREAM:  File "/u04/yarn/local/usercache/das/appcache/application_1450337892069_0336/container_1450337892069_0336_01_000008/pyspark.zip/pyspark/serializers.py", line 545, in read_int    raise EOFError


GC output (using G1GC) just before the crash:
driver:   [Eden: 2316.0M(2316.0M)->0.0B(2318.0M) Survivors: 140.0M->138.0M Heap: 3288.7M(4096.0M)->675.5M(4096.0M)]
executor(s):   [Eden: 2342.0M(2342.0M)->0.0B(2378.0M) Survivors: 52.0M->34.0M Heap: 3601.7M(4096.0M)->1242.7M(4096.0M)]
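
For reference, output in that format generally comes from running with G1 plus detailed GC logging; something along these lines (the flag set is illustrative, and on yarn-client the driver-side options have to go on the spark-submit command line or into spark-defaults.conf, since SparkConf cannot change the already-running driver JVM):

    from pyspark import SparkConf

    # Illustrative executor-side GC logging; with G1 this produces the
    # "[Eden: ... Survivors: ... Heap: ...]" summaries quoted above.
    gc_opts = "-XX:+UseG1GC -XX:+PrintGCDetails -XX:+PrintGCDateStamps"
    conf = SparkConf().set("spark.executor.extraJavaOptions", gc_opts)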

thanks.