Posted to issues@spark.apache.org by "wuyi (Jira)" <ji...@apache.org> on 2020/06/24 15:02:00 UTC

[jira] [Created] (SPARK-32091) Ignore timeout error when remove blocks on the lost executor

wuyi created SPARK-32091:
----------------------------

             Summary: Ignore timeout error when remove blocks on the lost executor
                 Key: SPARK-32091
                 URL: https://issues.apache.org/jira/browse/SPARK-32091
             Project: Spark
          Issue Type: Improvement
          Components: Spark Core
    Affects Versions: 3.0.0, 2.4.0
            Reporter: wuyi


When removing blocks (e.g. RDD, broadcast, shuffle), BlockManagerMasterEndpoint makes RPC calls to each known BlockManagerSlaveEndpoint to remove the specific blocks. The RPC call can sometimes end in a timeout when the executor has already been lost but the loss is only reported after the remove call has been made. The timeout can therefore fail the whole query.

In this case, we could simply ignore the error, since the blocks on the lost executor can be considered already removed.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)