Posted to issues@spark.apache.org by "Renxia Wang (JIRA)" <ji...@apache.org> on 2016/08/01 06:35:20 UTC

[jira] [Created] (SPARK-16830) Executors Keep Trying to Fetch Blocks from a Bad Host

Renxia Wang created SPARK-16830:
-----------------------------------

             Summary: Executors Keep Trying to Fetch Blocks from a Bad Host
                 Key: SPARK-16830
                 URL: https://issues.apache.org/jira/browse/SPARK-16830
             Project: Spark
          Issue Type: Bug
          Components: Spark Core, Streaming
    Affects Versions: 1.6.2
         Environment: EMR 4.7.2
            Reporter: Renxia Wang


When a host becomes unreachable, the driver removes the executors and block managers on that host because it no longer receives their heartbeats. However, executors on other hosts keep trying to fetch blocks from the bad host.
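A possible mitigation (not a fix for the stale block locations) is to bound how long executors retry fetches against a dead host. These are standard Spark network settings; the values shown are the documented defaults, so lowering them is the assumed tuning direction:

    spark.shuffle.io.maxRetries   3     (default: number of fetch retries per request)
    spark.shuffle.io.retryWait    5s    (default: wait between consecutive retries)

With the defaults, each fetch request can stall for up to maxRetries * retryWait (about 15 seconds) before failing, and the failure is then retried at the task level, which is why a dead host can be hammered for a long time.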

I am running a Spark Streaming job that consumes data from Kinesis. As a result of these block fetches repeatedly retrying and failing, I started seeing ProvisionedThroughputExceededException on shards, AmazonHttpClient (to Kinesis) SocketException, Kinesis ExpiredIteratorException, etc.

This issue also exposes a potential memory leak. Starting from the time the bad host became unreachable, the physical memory usage of the executors that kept trying to fetch blocks from it increased steadily until they hit the physical memory limit and were killed by YARN.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@spark.apache.org
For additional commands, e-mail: issues-help@spark.apache.org