Posted to user@spark.apache.org by Kostas Kougios <ko...@googlemail.com> on 2015/07/03 16:16:00 UTC

ERROR executor.CoarseGrainedExecutorBackend: RECEIVED SIGNAL 15: SIGTERM

I have a problem with a job. A random executor gets this error:

ERROR executor.CoarseGrainedExecutorBackend: RECEIVED SIGNAL 15: SIGTERM

It happens almost always at the same point in the processing of the data. I am
processing 1 million files with sc.wholeTextFiles. At around the 600,000th file, a
container receives this signal. On the driver I get:

15/07/03 14:20:11 INFO yarn.ApplicationMaster$AMEndpoint: Driver terminated
or disconnected! Shutting down. cruncher03.stratified:44617
15/07/03 14:20:11 ERROR cluster.YarnClusterScheduler: Lost executor 3 on
cruncher03.stratified: remote Rpc client disassociated
15/07/03 14:20:11 WARN remote.ReliableDeliverySupervisor: Association with
remote system [akka.tcp://sparkExecutor@cruncher03.stratified:44617] has
failed, address is now gated for [5000] ms. Reason is: [Disassociated].
15/07/03 14:20:11 INFO yarn.ApplicationMaster$AMEndpoint: Driver terminated
or disconnected! Shutting down. cruncher03.stratified:44617
15/07/03 14:20:11 INFO scheduler.TaskSetManager: Re-queueing tasks for 3
from TaskSet 5.0


There is plenty of memory on the machine and in the container JVM, so I don't
think it is the OS OOM killer (after all, that would be a SIGKILL) or a JVM
OutOfMemoryError (there is no out-of-memory exception).

What can be causing this?



