You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@spark.apache.org by "liupengcheng (JIRA)" <ji...@apache.org> on 2019/01/25 09:21:00 UTC

[jira] [Created] (SPARK-26728) Make rdd.unpersist and broadcast.unpersist blocking configurable

liupengcheng created SPARK-26728:
------------------------------------

             Summary: Make rdd.unpersist and broadcast.unpersist blocking configurable
                 Key: SPARK-26728
                 URL: https://issues.apache.org/jira/browse/SPARK-26728
             Project: Spark
          Issue Type: Improvement
          Components: Spark Core
    Affects Versions: 2.4.0, 2.1.0
            Reporter: liupengcheng


Currently, rdd.unpersist's blocking argument is set to true by default. However, in actual production cluster(especially large cluster), node lost or network issue can always happen.

Users always use rdd.unpersist as non-exceptional, so sometimes the blocking unpersist may cause user's job failure, and this happened many times in our cluster.
{code:java}
2018-05-16,13:28:33,489 WARN org.apache.spark.storage.BlockManagerMaster: Failed to remove RDD 15 - Failed to send RPC 7571440800577648876 to c3-hadoop-prc-st2325.bj/10.136.136.25:43474: java.nio.channels.ClosedChannelException
java.io.IOException: Failed to send RPC 7571440800577648876 to c3-hadoop-prc-st2325.bj/10.136.136.25:43474: java.nio.channels.ClosedChannelException
	at org.apache.spark.network.client.TransportClient$3.operationComplete(TransportClient.java:239)
	at org.apache.spark.network.client.TransportClient$3.operationComplete(TransportClient.java:226)
	at io.netty.util.concurrent.DefaultPromise.notifyListener0(DefaultPromise.java:680)
	at io.netty.util.concurrent.DefaultPromise.notifyListeners(DefaultPromise.java:567)
	at io.netty.util.concurrent.DefaultPromise.tryFailure(DefaultPromise.java:424)
	at io.netty.channel.AbstractChannel$AbstractUnsafe.safeSetFailure(AbstractChannel.java:801)
	at io.netty.channel.AbstractChannel$AbstractUnsafe.write(AbstractChannel.java:699)
	at io.netty.channel.DefaultChannelPipeline$HeadContext.write(DefaultChannelPipeline.java:1122)
	at io.netty.channel.AbstractChannelHandlerContext.invokeWrite(AbstractChannelHandlerContext.java:633)
	at io.netty.channel.AbstractChannelHandlerContext.access$1900(AbstractChannelHandlerContext.java:32)
	at io.netty.channel.AbstractChannelHandlerContext$AbstractWriteTask.write(AbstractChannelHandlerContext.java:908)
	at io.netty.channel.AbstractChannelHandlerContext$WriteAndFlushTask.write(AbstractChannelHandlerContext.java:960)
	at io.netty.channel.AbstractChannelHandlerContext$AbstractWriteTask.run(AbstractChannelHandlerContext.java:893)
	at io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:357)
	at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:357)
	at io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:111)
	at java.lang.Thread.run(Thread.java:745)
Caused by: java.nio.channels.ClosedChannelException
2018-05-16,13:28:33,489 ERROR org.apache.spark.deploy.yarn.ApplicationMaster: User class threw exception: java.io.IOException: Failed to send RPC 7571440800577648876 to c3-hadoop-prc-st2325.bj/10.136.136.25:43474: java.nio.channels.ClosedChannelException
java.io.IOException: Failed to send RPC 7571440800577648876 to c3-hadoop-prc-st2325.bj/10.136.136.25:43474: java.nio.channels.ClosedChannelException
	at org.apache.spark.network.client.TransportClient$3.operationComplete(TransportClient.java:239)
	at org.apache.spark.network.client.TransportClient$3.operationComplete(TransportClient.java:226)
	at io.netty.util.concurrent.DefaultPromise.notifyListener0(DefaultPromise.java:680)
	at io.netty.util.concurrent.DefaultPromise.notifyListeners(DefaultPromise.java:567)
	at io.netty.util.concurrent.DefaultPromise.tryFailure(DefaultPromise.java:424)
	at io.netty.channel.AbstractChannel$AbstractUnsafe.safeSetFailure(AbstractChannel.java:801)
	at io.netty.channel.AbstractChannel$AbstractUnsafe.write(AbstractChannel.java:699)
	at io.netty.channel.DefaultChannelPipeline$HeadContext.write(DefaultChannelPipeline.java:1122)
	at io.netty.channel.AbstractChannelHandlerContext.invokeWrite(AbstractChannelHandlerContext.java:633)
	at io.netty.channel.AbstractChannelHandlerContext.access$1900(AbstractChannelHandlerContext.java:32)
	at io.netty.channel.AbstractChannelHandlerContext$AbstractWriteTask.write(AbstractChannelHandlerContext.java:908)
	at io.netty.channel.AbstractChannelHandlerContext$WriteAndFlushTask.write(AbstractChannelHandlerContext.java:960)
	at io.netty.channel.AbstractChannelHandlerContext$AbstractWriteTask.run(AbstractChannelHandlerContext.java:893)
	at io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:357)
	at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:357)
	at io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:111)
	at java.lang.Thread.run(Thread.java:745)
Caused by: java.nio.channels.ClosedChannelException
{code}
I think we can make this blocking argument as a config, so that we can control the default value of it with gray scale systems.

 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@spark.apache.org
For additional commands, e-mail: issues-help@spark.apache.org