Posted to issues@spark.apache.org by "Sean Owen (JIRA)" <ji...@apache.org> on 2019/01/29 14:46:00 UTC
[jira] [Comment Edited] (SPARK-26728) Make rdd.unpersist blocking configurable
[ https://issues.apache.org/jira/browse/SPARK-26728?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16755069#comment-16755069 ]
Sean Owen edited comment on SPARK-26728 at 1/29/19 2:45 PM:
------------------------------------------------------------
Closing this in favor of https://issues.apache.org/jira/browse/SPARK-26771
was (Author: srowen):
Closing this in favor of https://issues.apache.org/jira/browse/SPARK-26728
> Make rdd.unpersist blocking configurable
> ----------------------------------------
>
> Key: SPARK-26728
> URL: https://issues.apache.org/jira/browse/SPARK-26728
> Project: Spark
> Issue Type: Improvement
> Components: Spark Core
> Affects Versions: 2.1.0, 2.4.0
> Reporter: liupengcheng
> Priority: Major
>
> Currently, rdd.unpersist's blocking argument defaults to true. However, in real production clusters (especially large ones), node loss and network issues can always happen.
> Users generally expect rdd.unpersist not to throw, so a blocking unpersist can unexpectedly fail a user's job; this has happened many times in our cluster.
> {code:java}
> 2018-05-16,13:28:33,489 WARN org.apache.spark.storage.BlockManagerMaster: Failed to remove RDD 15 - Failed to send RPC 7571440800577648876 to c3-hadoop-prc-st2325.bj/10.136.136.25:43474: java.nio.channels.ClosedChannelException
> java.io.IOException: Failed to send RPC 7571440800577648876 to c3-hadoop-prc-st2325.bj/10.136.136.25:43474: java.nio.channels.ClosedChannelException
> at org.apache.spark.network.client.TransportClient$3.operationComplete(TransportClient.java:239)
> at org.apache.spark.network.client.TransportClient$3.operationComplete(TransportClient.java:226)
> at io.netty.util.concurrent.DefaultPromise.notifyListener0(DefaultPromise.java:680)
> at io.netty.util.concurrent.DefaultPromise.notifyListeners(DefaultPromise.java:567)
> at io.netty.util.concurrent.DefaultPromise.tryFailure(DefaultPromise.java:424)
> at io.netty.channel.AbstractChannel$AbstractUnsafe.safeSetFailure(AbstractChannel.java:801)
> at io.netty.channel.AbstractChannel$AbstractUnsafe.write(AbstractChannel.java:699)
> at io.netty.channel.DefaultChannelPipeline$HeadContext.write(DefaultChannelPipeline.java:1122)
> at io.netty.channel.AbstractChannelHandlerContext.invokeWrite(AbstractChannelHandlerContext.java:633)
> at io.netty.channel.AbstractChannelHandlerContext.access$1900(AbstractChannelHandlerContext.java:32)
> at io.netty.channel.AbstractChannelHandlerContext$AbstractWriteTask.write(AbstractChannelHandlerContext.java:908)
> at io.netty.channel.AbstractChannelHandlerContext$WriteAndFlushTask.write(AbstractChannelHandlerContext.java:960)
> at io.netty.channel.AbstractChannelHandlerContext$AbstractWriteTask.run(AbstractChannelHandlerContext.java:893)
> at io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:357)
> at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:357)
> at io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:111)
> at java.lang.Thread.run(Thread.java:745)
> Caused by: java.nio.channels.ClosedChannelException
> 2018-05-16,13:28:33,489 ERROR org.apache.spark.deploy.yarn.ApplicationMaster: User class threw exception: java.io.IOException: Failed to send RPC 7571440800577648876 to c3-hadoop-prc-st2325.bj/10.136.136.25:43474: java.nio.channels.ClosedChannelException
> java.io.IOException: Failed to send RPC 7571440800577648876 to c3-hadoop-prc-st2325.bj/10.136.136.25:43474: java.nio.channels.ClosedChannelException
> at org.apache.spark.network.client.TransportClient$3.operationComplete(TransportClient.java:239)
> at org.apache.spark.network.client.TransportClient$3.operationComplete(TransportClient.java:226)
> at io.netty.util.concurrent.DefaultPromise.notifyListener0(DefaultPromise.java:680)
> at io.netty.util.concurrent.DefaultPromise.notifyListeners(DefaultPromise.java:567)
> at io.netty.util.concurrent.DefaultPromise.tryFailure(DefaultPromise.java:424)
> at io.netty.channel.AbstractChannel$AbstractUnsafe.safeSetFailure(AbstractChannel.java:801)
> at io.netty.channel.AbstractChannel$AbstractUnsafe.write(AbstractChannel.java:699)
> at io.netty.channel.DefaultChannelPipeline$HeadContext.write(DefaultChannelPipeline.java:1122)
> at io.netty.channel.AbstractChannelHandlerContext.invokeWrite(AbstractChannelHandlerContext.java:633)
> at io.netty.channel.AbstractChannelHandlerContext.access$1900(AbstractChannelHandlerContext.java:32)
> at io.netty.channel.AbstractChannelHandlerContext$AbstractWriteTask.write(AbstractChannelHandlerContext.java:908)
> at io.netty.channel.AbstractChannelHandlerContext$WriteAndFlushTask.write(AbstractChannelHandlerContext.java:960)
> at io.netty.channel.AbstractChannelHandlerContext$AbstractWriteTask.run(AbstractChannelHandlerContext.java:893)
> at io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:357)
> at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:357)
> at io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:111)
> at java.lang.Thread.run(Thread.java:745)
> Caused by: java.nio.channels.ClosedChannelException
> {code}
> I think we can make this blocking argument configurable, so that we can control its default value and roll the change out gradually (e.g. via a gray-release system).
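> To sketch the idea: today the caller-side workaround is to pass the {{blocking}} flag explicitly; the config key shown in the comment below ({{spark.rdd.unpersist.blocking}}) is hypothetical, used only to illustrate the proposal:
> {code:java}
> // Current default: blocking unpersist. Waits for every executor to
> // confirm block removal, so an RPC failure on a lost node (e.g. the
> // ClosedChannelException above) propagates to the caller and can fail the job.
> rdd.unpersist() // same as rdd.unpersist(blocking = true)
>
> // Caller-side workaround: fire-and-forget unpersist. Returns immediately;
> // removal failures on lost executors are only logged by the BlockManagerMaster.
> rdd.unpersist(blocking = false)
>
> // Proposed: let a config control the default, e.g. (hypothetical key)
> //   spark.rdd.unpersist.blocking=false   in spark-defaults.conf
> // so the safer behavior can be enabled cluster-wide without code changes.
> {code}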
>
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@spark.apache.org
For additional commands, e-mail: issues-help@spark.apache.org