Posted to issues@spark.apache.org by "Apache Spark (JIRA)" <ji...@apache.org> on 2017/05/08 02:16:04 UTC

[jira] [Assigned] (SPARK-20633) FileFormatWriter wrap the FetchFailedException which breaks job's failover

     [ https://issues.apache.org/jira/browse/SPARK-20633?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Apache Spark reassigned SPARK-20633:
------------------------------------

    Assignee: Apache Spark

> FileFormatWriter wrap the FetchFailedException which breaks job's failover
> --------------------------------------------------------------------------
>
>                 Key: SPARK-20633
>                 URL: https://issues.apache.org/jira/browse/SPARK-20633
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 2.0.1
>            Reporter: Liu Shaohui
>            Assignee: Apache Spark
>
> The task scheduler handles FetchFailedException separately so that it can fail over the task. However, FileFormatWriter wraps the FetchFailedException in a SparkException. As a result, the job cannot recover from failures such as an external shuffle server going down.
> See the stack trace:
> {code}
> 2017-04-30,05:02:42,348 ERROR org.apache.spark.sql.execution.datasources.DefaultWriterContainer: Task attempt attempt_201704300443_0018_m_000096_1 aborted.
> 2017-04-30,05:02:42,392 ERROR org.apache.spark.executor.Executor: Exception in task 96.1 in stage 18.0 (TID 26538)
> org.apache.spark.SparkException: Task failed while writing rows
>   at org.apache.spark.sql.execution.datasources.DefaultWriterContainer.writeRows(WriterContainer.scala:261)
>   at org.apache.spark.sql.execution.datasources.InsertIntoHadoopFsRelationCommand$$anonfun$run$1$$anonfun$apply$mcV$sp$1.apply(InsertIntoHadoopFsRelationCommand.scala:143)
>   at org.apache.spark.sql.execution.datasources.InsertIntoHadoopFsRelationCommand$$anonfun$run$1$$anonfun$apply$mcV$sp$1.apply(InsertIntoHadoopFsRelationCommand.scala:143)
>   at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:70)
>   at org.apache.spark.scheduler.Task.run(Task.scala:86)
>   at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:274)
>   at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>   at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>   at java.lang.Thread.run(Thread.java:745)
> Caused by: org.apache.spark.shuffle.FetchFailedException: java.lang.RuntimeException: Executor is not registered (appId=application_1491898760056_636981, execId=546)
>   at org.apache.spark.network.shuffle.ExternalShuffleBlockResolver.getBlockData(ExternalShuffleBlockResolver.java:319)
>   at org.apache.spark.network.shuffle.ExternalShuffleBlockHandler.handleMessage(ExternalShuffleBlockHandler.java:87)
>   at org.apache.spark.network.shuffle.ExternalShuffleBlockHandler.receive(ExternalShuffleBlockHandler.java:74)
>   at org.apache.spark.network.server.TransportRequestHandler.processRpcRequest(TransportRequestHandler.java:152)
>   at org.apache.spark.network.server.TransportRequestHandler.handle(TransportRequestHandler.java:102)
>   at org.apache.spark.network.server.TransportChannelHandler.channelRead0(TransportChannelHandler.java:104)
>   at org.apache.spark.network.server.TransportChannelHandler.channelRead0(TransportChannelHandler.java:51)
>   at io.netty.channel.SimpleChannelInboundHandler.channelRead(SimpleChannelInboundHandler.java:105)
>   at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:308)
>   at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:294)
>   at io.netty.handler.timeout.IdleStateHandler.channelRead(IdleStateHandler.java:266)
>   at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:308)
>   at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:294)
> {code}
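
The wrapping problem described above can be illustrated with a minimal, self-contained sketch. It does not use Spark: the two exception classes are hypothetical stand-ins for Spark's FetchFailedException and SparkException, and classify is an assumed, simplified model of a scheduler that looks only at the outermost exception type. The second writer shows one possible way to keep the fetch failure visible; it is not necessarily the change that resolved this issue.

```scala
// Hypothetical stand-ins for Spark's classes; these are NOT the real Spark API.
class FetchFailedException(msg: String) extends Exception(msg)
class SparkException(msg: String, cause: Throwable) extends Exception(msg, cause)

object FetchFailureSketch {
  // Assumed, simplified scheduler model: the decision depends only on the
  // type of the outermost exception thrown by the task.
  def classify(t: Throwable): String = t match {
    case _: FetchFailedException => "resubmit map stage"
    case _                       => "count as ordinary task failure"
  }

  // Wraps every failure, which is the behavior seen in the stack trace:
  // the FetchFailedException survives only as the cause.
  def writeWrappingAll(body: => Unit): Unit =
    try body
    catch {
      case t: Throwable =>
        throw new SparkException("Task failed while writing rows", t)
    }

  // Rethrows fetch failures unwrapped and wraps everything else.
  def writePreservingFetchFailure(body: => Unit): Unit =
    try body
    catch {
      case e: FetchFailedException => throw e
      case t: Throwable =>
        throw new SparkException("Task failed while writing rows", t)
    }

  // Runs a task body and reports how the model scheduler would react.
  def outcome(run: => Unit): String =
    try { run; "success" }
    catch { case t: Throwable => classify(t) }

  def main(args: Array[String]): Unit = {
    def failingFetch(): Unit =
      throw new FetchFailedException("Executor is not registered")

    println(outcome(writeWrappingAll(failingFetch())))
    println(outcome(writePreservingFetchFailure(failingFetch())))
  }
}
```

In this model the first call is treated as an ordinary task failure because the scheduler only sees the wrapper, while the second call is recognized as a fetch failure and can trigger the separate failover path.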



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
