You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@spark.apache.org by "L. C. Hsieh (Jira)" <ji...@apache.org> on 2019/10/14 21:27:00 UTC

[jira] [Assigned] (SPARK-29469) Avoid retries by RetryingBlockFetcher when ExternalBlockStoreClient is closed

     [ https://issues.apache.org/jira/browse/SPARK-29469?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

L. C. Hsieh reassigned SPARK-29469:
-----------------------------------

    Assignee: L. C. Hsieh

> Avoid retries by RetryingBlockFetcher when ExternalBlockStoreClient is closed
> -----------------------------------------------------------------------------
>
>                 Key: SPARK-29469
>                 URL: https://issues.apache.org/jira/browse/SPARK-29469
>             Project: Spark
>          Issue Type: Improvement
>          Components: Shuffle
>    Affects Versions: 3.0.0
>            Reporter: L. C. Hsieh
>            Assignee: L. C. Hsieh
>            Priority: Minor
>
> Found that some NPE was thrown in job log:
> 2019-10-14 20:06:16 ERROR RetryingBlockFetcher:143 - Exception while beginning fetch of 2 outstanding blocks (after 3 retries)
> java.lang.NullPointerException
> 	at org.apache.spark.network.shuffle.ExternalShuffleClient.lambda$fetchBlocks$0(ExternalShuffleClient.java:100)
> 	at org.apache.spark.network.shuffle.RetryingBlockFetcher.fetchAllOutstanding(RetryingBlockFetcher.java:141)
> 	at org.apache.spark.network.shuffle.RetryingBlockFetcher.lambda$initiateRetry$0(RetryingBlockFetcher.java:169)
> 	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
> 	at java.util.concurrent.FutureTask.run(FutureTask.java:266)
> 	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
> 	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
> 	at io.netty.util.concurrent.DefaultThreadFactory$DefaultRunnableDecorator.run(DefaultThreadFactory.java:138)
> It was happened after BlockManager and ExternalBlockStoreClient was closed due to previous errors. In this cases, RetryingBlockFetcher does not need to retry. This NPE is harmless for job execution, but is a source of misleading when looking at log. Especially for end-users.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@spark.apache.org
For additional commands, e-mail: issues-help@spark.apache.org