Posted to issues@spark.apache.org by "Dibyendu Bhattacharya (JIRA)" <ji...@apache.org> on 2015/07/04 09:17:04 UTC

[jira] [Updated] (SPARK-8591) Block failed to unroll to memory should not be replicated for MEMORY_ONLY_2 StorageLevel

     [ https://issues.apache.org/jira/browse/SPARK-8591?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Dibyendu Bhattacharya updated SPARK-8591:
-----------------------------------------
    Description: 
A block that fails to unroll to memory (the put returns an iterator and size 0) should not be replicated to a peer node, because putBlockStatus comes back as StorageLevel.NONE and the BlockStatus is not reported to the Master.

The primary issue is that, for the MEMORY_ONLY_2 level, if the BlockManager fails to unroll the block to memory and the local store fails, the BlockManager still replicates the block to a remote peer. In the Spark Streaming case, the Receiver gets the PutResult from the local BlockManager, and if the block failed to store, ReceivedBlockHandler throws a SparkException back to the Receiver even though the block was successfully replicated to the remote peer by the BlockManager. This wastes memory on the remote peer, because that block can never be used by Streaming jobs. When the Receiver fails to store a block it can retry, and every failed retry (of the local store) may add yet another unused block on the remote peer, so high-volume receivers doing multiple retries can accumulate many unwanted blocks. The sketch below shows where this failure surfaces.
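As a rough illustration only (not the reporter's actual receiver), a custom receiver along these lines shows where the exception surfaces; fetchBatchSomehow and the retry count are placeholders:

    import org.apache.spark.storage.StorageLevel
    import org.apache.spark.streaming.receiver.Receiver

    import scala.util.control.NonFatal

    class SketchReceiver extends Receiver[String](StorageLevel.MEMORY_ONLY_2) {

      def onStart(): Unit = {
        new Thread("sketch-receiver-loop") {
          override def run(): Unit = receive()
        }.start()
      }

      def onStop(): Unit = { }

      private def receive(): Unit = {
        while (!isStopped()) {
          val batch = fetchBatchSomehow()            // hypothetical data source
          var attempts = 0
          var stored = false
          while (!stored && attempts < 3 && !isStopped()) {
            try {
              // If the block cannot be unrolled into local memory, ReceivedBlockHandler
              // throws a SparkException here -- yet (before the proposed fix) the block
              // may already have been replicated to a remote peer, where it is never used.
              store(batch.iterator)
              stored = true
            } catch {
              case NonFatal(_) =>
                attempts += 1  // each failed retry can leave another orphaned remote copy
            }
          }
        }
      }

      // Placeholder standing in for whatever the real receiver reads from.
      private def fetchBatchSomehow(): Seq[String] = Seq.empty[String]
    }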

The fix proposed here is to stop replicating the block when the local store has failed (a sketch follows). This prevents the scenario described above and does not affect RDD partition replication (during cache or persist), because the RDD CacheManager unrolls the partition to memory before attempting the local store, so a block can never unroll successfully and then fail to store to local memory.
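A rough sketch of that change, using illustrative helper names (tryPutLocal, replicateToPeer) rather than the actual BlockManager internals:

    import org.apache.spark.storage.{BlockId, StorageLevel}

    object ReplicationFixSketch {
      // Hypothetical stand-ins for the BlockManager's local-put and replication steps.
      def tryPutLocal(blockId: BlockId, values: Iterator[Any], level: StorageLevel): Boolean = ???
      def replicateToPeer(blockId: BlockId, level: StorageLevel): Unit = ???

      def putAndMaybeReplicate(blockId: BlockId, values: Iterator[Any],
                               level: StorageLevel): Boolean = {
        val storedLocally = tryPutLocal(blockId, values, level)
        // The proposed behaviour: replicate only when the local store succeeded, so a
        // block that failed to unroll under MEMORY_ONLY_2 never lands unused on a peer.
        if (storedLocally && level.replication > 1) {
          replicateToPeer(blockId, level)
        }
        storedLocally
      }
    }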



  was:A block that fails to unroll to memory (the put returns an iterator and size 0) should not be replicated to a peer node, because putBlockStatus comes back as StorageLevel.NONE and the BlockStatus is not reported to the Master.


> Block failed to unroll to memory should not be replicated for MEMORY_ONLY_2 StorageLevel
> ----------------------------------------------------------------------------------------
>
>                 Key: SPARK-8591
>                 URL: https://issues.apache.org/jira/browse/SPARK-8591
>             Project: Spark
>          Issue Type: Bug
>          Components: Block Manager
>    Affects Versions: 1.4.0
>            Reporter: Dibyendu Bhattacharya
>
> A block that fails to unroll to memory (the put returns an iterator and size 0) should not be replicated to a peer node, because putBlockStatus comes back as StorageLevel.NONE and the BlockStatus is not reported to the Master.
> The primary issue is that, for the MEMORY_ONLY_2 level, if the BlockManager fails to unroll the block to memory and the local store fails, the BlockManager still replicates the block to a remote peer. In the Spark Streaming case, the Receiver gets the PutResult from the local BlockManager, and if the block failed to store, ReceivedBlockHandler throws a SparkException back to the Receiver even though the block was successfully replicated to the remote peer by the BlockManager. This wastes memory on the remote peer, because that block can never be used by Streaming jobs. When the Receiver fails to store a block it can retry, and every failed retry (of the local store) may add yet another unused block on the remote peer, so high-volume receivers doing multiple retries can accumulate many unwanted blocks.
> The fix proposed here is to stop replicating the block when the local store has failed. This prevents the scenario described above and does not affect RDD partition replication (during cache or persist), because the RDD CacheManager unrolls the partition to memory before attempting the local store, so a block can never unroll successfully and then fail to store to local memory.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@spark.apache.org
For additional commands, e-mail: issues-help@spark.apache.org