Posted to issues@spark.apache.org by "Herman van Hovell (JIRA)" <ji...@apache.org> on 2016/09/07 11:14:20 UTC

[jira] [Commented] (SPARK-17430) Spark task Hangs after OOM while DAG scheduler tries to serialize a task

    [ https://issues.apache.org/jira/browse/SPARK-17430?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15470321#comment-15470321 ] 

Herman van Hovell commented on SPARK-17430:
-------------------------------------------

[~Mikhail] We use GitHub pull requests for fixes. Could you open one?

In the attached patch you are catching an `OutOfMemoryError`. The problem with that is that we cannot be sure what caused the OOME (it doesn't have to be the DAG scheduler), and we cannot guarantee that the driver is in a usable state after one occurs. It might be a better plan to figure out why it is OOMing.
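
To make the trade-off concrete, here is a minimal Java sketch. It is not Spark code: `OomCatchSketch`, `serializeTask` and `waiterNotified` are hypothetical names, and the `OutOfMemoryError` is simulated with a plain `throw`. It assumes the hang comes from the event-loop thread dying without anyone notifying the job's submitter. It shows that catching the error lets the submitter be notified, but the catch block only learns that some allocation failed, not which component used up the heap.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

class OomCatchSketch {

    // Hypothetical stand-in for the task serialization step.
    // The OOME is simulated; a real one can be thrown by any allocation on any thread.
    static byte[] serializeTask() {
        throw new OutOfMemoryError("Java heap space (simulated)");
    }

    // Runs one "event" on a dedicated thread and reports whether the
    // submitter waiting on the job was notified.
    static boolean waiterNotified(boolean catchOom) {
        CountDownLatch jobEnded = new CountDownLatch(1);
        Thread loop = new Thread(() -> {
            try {
                serializeTask();
                jobEnded.countDown(); // normal completion would notify the waiter
            } catch (OutOfMemoryError e) {
                if (!catchOom) {
                    throw e; // the thread dies here and nobody tells the waiter
                }
                // Patch-style handling: fail the job so the waiter is released.
                // Nothing here can tell whether this thread caused the exhaustion,
                // or whether the rest of the process is still healthy.
                jobEnded.countDown();
            }
        }, "event-loop-sketch");
        loop.setUncaughtExceptionHandler(
            (t, e) -> System.err.println(t.getName() + " died: " + e));
        loop.start();
        try {
            loop.join();
            // A real submitter would wait indefinitely; a timeout keeps the sketch finite.
            return jobEnded.await(200, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("catch OOME, waiter notified: " + waiterNotified(true));
        System.out.println("no catch, waiter notified:   " + waiterNotified(false));
    }
}
```

The catching variant avoids the hang in this toy setup, but it carries on after an error whose cause it cannot identify, which is the concern raised above.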

> Spark task Hangs after OOM while DAG scheduler tries to serialize a task
> ------------------------------------------------------------------------
>
>                 Key: SPARK-17430
>                 URL: https://issues.apache.org/jira/browse/SPARK-17430
>             Project: Spark
>          Issue Type: Bug
>          Components: Scheduler
>    Affects Versions: 1.6.2
>            Reporter: Mikhail
>         Attachments: abort-task-on-oom-in-dag-scheduler.patch
>
>
> Hi there,
> We're running Spark on Hadoop 2.7.1 YARN and have run into a problem.
> Sometimes an exception is raised inside JavaSerializer (see the stack trace below). The exception itself isn't the problem, but after it happens the task hangs. It is shown as "running" in the Hadoop task list, but no worker is executing the task, and no more records appear in the Spark job log until somebody kills it.
> We have fixed the issue by patching the Spark code (catching the OOM in submitMissingTasks()), but it looks like the OOM error is deliberately ignored, so there is probably a better solution.
> {noformat}
> Exception in thread "dag-scheduler-event-loop" java.lang.OutOfMemoryError: Java heap space
> 	at java.util.Arrays.copyOf(Arrays.java:3332)
> 	at java.lang.AbstractStringBuilder.expandCapacity(AbstractStringBuilder.java:137)
> 	at java.lang.AbstractStringBuilder.ensureCapacityInternal(AbstractStringBuilder.java:121)
> 	at java.lang.AbstractStringBuilder.append(AbstractStringBuilder.java:421)
> 	at java.lang.StringBuilder.append(StringBuilder.java:136)
> 	at java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1421)
> 	at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1178)
> 	at java.io.ObjectOutputStream.defaultWriteFields(ObjectOutputStream.java:1548)
> 	at java.io.ObjectOutputStream.writeSerialData(ObjectOutputStream.java:1509)
> 	at java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1432)
> 	at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1178)
> 	at java.io.ObjectOutputStream.defaultWriteFields(ObjectOutputStream.java:1548)
> 	at java.io.ObjectOutputStream.writeSerialData(ObjectOutputStream.java:1509)
> 	at java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1432)
> 	at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1178)
> 	at java.io.ObjectOutputStream.defaultWriteFields(ObjectOutputStream.java:1548)
> 	at java.io.ObjectOutputStream.writeSerialData(ObjectOutputStream.java:1509)
> 	at java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1432)
> 	at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1178)
> 	at java.io.ObjectOutputStream.defaultWriteFields(ObjectOutputStream.java:1548)
> 	at java.io.ObjectOutputStream.writeSerialData(ObjectOutputStream.java:1509)
> 	at java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1432)
> 	at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1178)
> 	at java.io.ObjectOutputStream.writeObject(ObjectOutputStream.java:348)
> 	at org.apache.spark.serializer.JavaSerializationStream.writeObject(JavaSerializer.scala:44)
> 	at org.apache.spark.serializer.JavaSerializerInstance.serialize(JavaSerializer.scala:101)
> 	at org.apache.spark.scheduler.DAGScheduler.submitMissingTasks(DAGScheduler.scala:1003)
> 	at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$submitStage(DAGScheduler.scala:921)
> 	at org.apache.spark.scheduler.DAGScheduler.handleJobSubmitted(DAGScheduler.scala:861)
> 	at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.doOnReceive(DAGScheduler.scala:1607)
> 	at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1599)
> 	at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1588)
> {noformat}
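
The trace above shows the error surfacing while `JavaSerializerInstance.serialize` writes an object graph from `submitMissingTasks`. As a rough first step toward finding out why it is OOMing, one could measure how many bytes a suspect object takes up under plain Java serialization. The sketch below uses only `java.io`. `SerializedSizeSketch`, `CountingOutputStream` and `serializedSize` are hypothetical names, not Spark APIs, and which object to measure is left to the reader.

```java
import java.io.IOException;
import java.io.ObjectOutputStream;
import java.io.OutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;
import java.util.ArrayList;

class SerializedSizeSketch {

    // Counts bytes instead of storing them, so the measurement itself
    // does not need a heap buffer as large as the serialized form.
    static final class CountingOutputStream extends OutputStream {
        private long count;

        @Override
        public void write(int b) {
            count++;
        }

        @Override
        public void write(byte[] b, int off, int len) {
            count += len;
        }

        long count() {
            return count;
        }
    }

    // Returns the number of bytes Java serialization writes for obj.
    static long serializedSize(Serializable obj) {
        CountingOutputStream counter = new CountingOutputStream();
        // Closing the ObjectOutputStream flushes its internal buffer into the counter.
        try (ObjectOutputStream out = new ObjectOutputStream(counter)) {
            out.writeObject(obj);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return counter.count();
    }

    public static void main(String[] args) {
        ArrayList<Integer> small = new ArrayList<>();
        ArrayList<Integer> large = new ArrayList<>();
        for (int i = 0; i < 10; i++) small.add(i);
        for (int i = 0; i < 100_000; i++) large.add(i);
        System.out.println("small list: " + serializedSize(small) + " bytes");
        System.out.println("large list: " + serializedSize(large) + " bytes");
    }
}
```

Comparing the sizes of candidate objects, such as values captured by a closure, can point to the part of the graph that dominates. This only measures the serialized output and does not account for temporary allocations made while serializing.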



