Posted to user@spark.apache.org by shahid <sh...@trialx.com> on 2015/10/22 18:37:27 UTC

Python worker exited unexpectedly (crashed)

Hi 

I am running a 10-node standalone cluster on AWS and loading 100 GB of data on HDFS.
The first step is a groupBy operation. I then generate pairs from the
grouped RDD, which looks like (key, [a1, b1]), (key, [a, b, c]).
The generated pairs look like
(a1, b1), (a, b), (a, c) ... n
so the pair RDD will get large in size.
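
A minimal sketch of the pair-generation step described above, in plain Python so it runs without a cluster. It assumes the pairs are unordered combinations within each group; the function name pairs_from_group and the sample keys are hypothetical and not taken from the actual job.

```python
from itertools import combinations


def pairs_from_group(record):
    """Yield every unordered pair of values from one (key, values) record."""
    _key, values = record
    # combinations() is lazy, so the pairs for a group are produced one at a
    # time instead of being collected into a list first.
    for pair in combinations(values, 2):
        yield pair


# Local check using the groups from the example above (keys are made up).
groups = [("key1", ["a1", "b1"]), ("key2", ["a", "b", "c"])]
pairs = [p for g in groups for p in pairs_from_group(g)]
print(pairs)
```

On an RDD the same function could be passed to flatMap, for example grouped_rdd.flatMap(pairs_from_group), where grouped_rdd is an assumed name for the grouped RDD. A group with n values produces n*(n-1)/2 pairs, which is why the pair RDD grows so quickly.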

Some stats from the UI at the point where the errors start, after which the script eventually fails:
Details for Stage 1 (Attempt 0)
Total Time Across All Tasks: 1.3 h
Shuffle Read: 4.4 GB / 1402058
Shuffle Spill (Memory): 73.1 GB
Shuffle Spill (Disk): 3.6 GB

I get the following stack trace:

WARN scheduler.TaskSetManager: Lost task 0.3 in stage 1.0 (TID 943, 10.239.131.154): org.apache.spark.SparkException: Python worker exited unexpectedly (crashed)
	at org.apache.spark.api.python.PythonRDD$$anon$1.read(PythonRDD.scala:175)
	at org.apache.spark.api.python.PythonRDD$$anon$1.<init>(PythonRDD.scala:179)
	at org.apache.spark.api.python.PythonRDD.compute(PythonRDD.scala:97)
	at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:277)
	at org.apache.spark.rdd.RDD.iterator(RDD.scala:244)
	at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:63)
	at org.apache.spark.scheduler.Task.run(Task.scala:70)
	at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:213)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
	at java.lang.Thread.run(Thread.java:745)
Caused by: java.io.EOFException
	at java.io.DataInputStream.readInt(DataInputStream.java:392)
	at org.apache.spark.api.python.PythonRDD$$anon$1.read(PythonRDD.scala:111)
	... 10 more

15/10/22 16:30:17 ERROR scheduler.TaskSetManager: Task 0 in stage 1.0 failed 4 times; aborting job
15/10/22 16:30:17 INFO scheduler.TaskSchedulerImpl: Cancelling stage 1



