Posted to issues@spark.apache.org by "Sean Owen (JIRA)" <ji...@apache.org> on 2019/01/29 14:46:00 UTC

[jira] [Resolved] (SPARK-26728) Make rdd.unpersist blocking configurable

     [ https://issues.apache.org/jira/browse/SPARK-26728?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sean Owen resolved SPARK-26728.
-------------------------------
    Resolution: Won't Fix

Closing this in favor of https://issues.apache.org/jira/browse/SPARK-26728

> Make rdd.unpersist blocking configurable
> ----------------------------------------
>
>                 Key: SPARK-26728
>                 URL: https://issues.apache.org/jira/browse/SPARK-26728
>             Project: Spark
>          Issue Type: Improvement
>          Components: Spark Core
>    Affects Versions: 2.1.0, 2.4.0
>            Reporter: liupengcheng
>            Priority: Major
>
> Currently, rdd.unpersist's blocking argument defaults to true. However, in real production clusters (especially large ones), node loss and network issues can happen at any time.
> Users generally call rdd.unpersist expecting it not to throw, so a blocking unpersist can fail the user's job when such an error occurs; this has happened many times in our cluster.
> {code:java}
> 2018-05-16,13:28:33,489 WARN org.apache.spark.storage.BlockManagerMaster: Failed to remove RDD 15 - Failed to send RPC 7571440800577648876 to c3-hadoop-prc-st2325.bj/10.136.136.25:43474: java.nio.channels.ClosedChannelException
> java.io.IOException: Failed to send RPC 7571440800577648876 to c3-hadoop-prc-st2325.bj/10.136.136.25:43474: java.nio.channels.ClosedChannelException
> 	at org.apache.spark.network.client.TransportClient$3.operationComplete(TransportClient.java:239)
> 	at org.apache.spark.network.client.TransportClient$3.operationComplete(TransportClient.java:226)
> 	at io.netty.util.concurrent.DefaultPromise.notifyListener0(DefaultPromise.java:680)
> 	at io.netty.util.concurrent.DefaultPromise.notifyListeners(DefaultPromise.java:567)
> 	at io.netty.util.concurrent.DefaultPromise.tryFailure(DefaultPromise.java:424)
> 	at io.netty.channel.AbstractChannel$AbstractUnsafe.safeSetFailure(AbstractChannel.java:801)
> 	at io.netty.channel.AbstractChannel$AbstractUnsafe.write(AbstractChannel.java:699)
> 	at io.netty.channel.DefaultChannelPipeline$HeadContext.write(DefaultChannelPipeline.java:1122)
> 	at io.netty.channel.AbstractChannelHandlerContext.invokeWrite(AbstractChannelHandlerContext.java:633)
> 	at io.netty.channel.AbstractChannelHandlerContext.access$1900(AbstractChannelHandlerContext.java:32)
> 	at io.netty.channel.AbstractChannelHandlerContext$AbstractWriteTask.write(AbstractChannelHandlerContext.java:908)
> 	at io.netty.channel.AbstractChannelHandlerContext$WriteAndFlushTask.write(AbstractChannelHandlerContext.java:960)
> 	at io.netty.channel.AbstractChannelHandlerContext$AbstractWriteTask.run(AbstractChannelHandlerContext.java:893)
> 	at io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:357)
> 	at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:357)
> 	at io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:111)
> 	at java.lang.Thread.run(Thread.java:745)
> Caused by: java.nio.channels.ClosedChannelException
> 2018-05-16,13:28:33,489 ERROR org.apache.spark.deploy.yarn.ApplicationMaster: User class threw exception: java.io.IOException: Failed to send RPC 7571440800577648876 to c3-hadoop-prc-st2325.bj/10.136.136.25:43474: java.nio.channels.ClosedChannelException
> java.io.IOException: Failed to send RPC 7571440800577648876 to c3-hadoop-prc-st2325.bj/10.136.136.25:43474: java.nio.channels.ClosedChannelException
> 	at org.apache.spark.network.client.TransportClient$3.operationComplete(TransportClient.java:239)
> 	at org.apache.spark.network.client.TransportClient$3.operationComplete(TransportClient.java:226)
> 	at io.netty.util.concurrent.DefaultPromise.notifyListener0(DefaultPromise.java:680)
> 	at io.netty.util.concurrent.DefaultPromise.notifyListeners(DefaultPromise.java:567)
> 	at io.netty.util.concurrent.DefaultPromise.tryFailure(DefaultPromise.java:424)
> 	at io.netty.channel.AbstractChannel$AbstractUnsafe.safeSetFailure(AbstractChannel.java:801)
> 	at io.netty.channel.AbstractChannel$AbstractUnsafe.write(AbstractChannel.java:699)
> 	at io.netty.channel.DefaultChannelPipeline$HeadContext.write(DefaultChannelPipeline.java:1122)
> 	at io.netty.channel.AbstractChannelHandlerContext.invokeWrite(AbstractChannelHandlerContext.java:633)
> 	at io.netty.channel.AbstractChannelHandlerContext.access$1900(AbstractChannelHandlerContext.java:32)
> 	at io.netty.channel.AbstractChannelHandlerContext$AbstractWriteTask.write(AbstractChannelHandlerContext.java:908)
> 	at io.netty.channel.AbstractChannelHandlerContext$WriteAndFlushTask.write(AbstractChannelHandlerContext.java:960)
> 	at io.netty.channel.AbstractChannelHandlerContext$AbstractWriteTask.run(AbstractChannelHandlerContext.java:893)
> 	at io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:357)
> 	at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:357)
> 	at io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:111)
> 	at java.lang.Thread.run(Thread.java:745)
> Caused by: java.nio.channels.ClosedChannelException
> {code}
> I think we can turn this blocking argument into a configuration option, so that we can control its default value through our gray-release (gradual rollout) system.
>  
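The proposal above can be illustrated with a minimal, self-contained sketch. Note that everything here is hypothetical: the config key name (`spark.rdd.unpersist.blocking`), the helper names, and the simplified failure model are illustrations of the idea, not Spark's actual API. The point is only that when the flag is read from configuration, a lost executor no longer turns a blocking unpersist into a job failure once the default is flipped to false.

```java
import java.io.IOException;
import java.util.Map;

public class UnpersistSketch {

    // Stand-in for rdd.unpersist(blocking): when blocking is true, a lost
    // executor surfaces as an IOException, mirroring the
    // ClosedChannelException in the report's stack trace. When blocking is
    // false, the removal is fire-and-forget and the caller never sees the
    // failure.
    static void unpersist(boolean blocking, boolean executorLost) throws IOException {
        if (blocking && executorLost) {
            throw new IOException(
                "Failed to send RPC: java.nio.channels.ClosedChannelException");
        }
    }

    // Proposed shape: the default for `blocking` comes from a config entry
    // (key name is hypothetical) instead of being hard-coded to true.
    // Returns true if unpersist completed without surfacing an error.
    static boolean unpersistWithConf(Map<String, String> conf, boolean executorLost) {
        boolean blocking = Boolean.parseBoolean(
            conf.getOrDefault("spark.rdd.unpersist.blocking", "true"));
        try {
            unpersist(blocking, executorLost);
            return true;
        } catch (IOException e) {
            // With the hard-coded blocking=true behavior, this is the path
            // that fails user jobs today.
            return false;
        }
    }
}
```

With the flag set to "false", a lost executor during unpersist is ignored; with the current default of "true", the same event propagates as an exception.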



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@spark.apache.org
For additional commands, e-mail: issues-help@spark.apache.org