Posted to issues@spark.apache.org by "Hyukjin Kwon (JIRA)" <ji...@apache.org> on 2019/05/21 04:35:23 UTC

[jira] [Resolved] (SPARK-16830) Executors Keep Trying to Fetch Blocks from a Bad Host

     [ https://issues.apache.org/jira/browse/SPARK-16830?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hyukjin Kwon resolved SPARK-16830.
----------------------------------
    Resolution: Incomplete

> Executors Keep Trying to Fetch Blocks from a Bad Host
> -----------------------------------------------------
>
>                 Key: SPARK-16830
>                 URL: https://issues.apache.org/jira/browse/SPARK-16830
>             Project: Spark
>          Issue Type: Bug
>          Components: DStreams, Spark Core
>    Affects Versions: 1.6.2
>         Environment: EMR 4.7.2
>            Reporter: Renxia Wang
>            Priority: Major
>              Labels: bulk-closed
>
> When a host becomes unreachable, the driver removes the executors and block managers on that host because it stops receiving heartbeats. However, executors on other hosts keep trying to fetch blocks from the bad host.
> I am running a Spark Streaming job that consumes data from Kinesis. As a result of these block fetches retrying and failing, I started seeing ProvisionedThroughputExceededException on shards, AmazonHttpClient (to Kinesis) SocketException, Kinesis ExpiredIteratorException, etc.
> This issue also exposes a potential memory leak. From the time the bad host became unreachable, the physical memory usage of the executors that kept trying to fetch blocks from it grew steadily until they hit the physical memory limit and were killed by YARN.
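
For context, the number and pacing of these shuffle fetch retries are governed by Spark's shuffle I/O settings. A sketch of tightening them in spark-defaults.conf so executors give up on a dead peer sooner (the property names are standard Spark configs; the values below are illustrative, not tested recommendations for this workload):

```
# Retry each failed block fetch fewer times before reporting a fetch failure
# (default is 3 retries):
spark.shuffle.io.maxRetries   2

# Wait between retries (default 5s); total retry window is roughly
# maxRetries * retryWait per connection:
spark.shuffle.io.retryWait    5s

# Let the network layer declare unreachable peers dead sooner
# (default 120s):
spark.network.timeout         60s
```

This only shortens the retry window; it does not address the underlying problem reported here, which is that peers are not told the bad host's blocks are gone after the driver removes its block manager.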



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@spark.apache.org
For additional commands, e-mail: issues-help@spark.apache.org