Posted to issues@spark.apache.org by "liangtianlun (Jira)" <ji...@apache.org> on 2020/12/09 07:18:00 UTC

[jira] [Commented] (SPARK-33710) Shuffle Index use Guava cache OOM, Yarn NodeManage GC Alarm

    [ https://issues.apache.org/jira/browse/SPARK-33710?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17246329#comment-17246329 ] 

liangtianlun commented on SPARK-33710:
--------------------------------------

Thank you. I'll translate it into English.

> Shuffle Index use Guava cache OOM, Yarn NodeManage GC Alarm
> -----------------------------------------------------------
>
>                 Key: SPARK-33710
>                 URL: https://issues.apache.org/jira/browse/SPARK-33710
>             Project: Spark
>          Issue Type: Bug
>          Components: Shuffle, YARN
>    Affects Versions: 3.2.0
>            Reporter: liangtianlun
>            Priority: Major
>
> h2. On CDH 6.3, the YARN NodeManager runs GC frequently and then generates a heap dump after running out of memory
> !https://upload-images.jianshu.io/upload_images/18249296-24acecfcc46dc744.png?imageMogr2/auto-orient/strip|imageView2/2/w/607/format/webp!
>   
> h2. The Memory Analyzer Tool (MAT) points to the shuffle index module
>  
> The Guava cache is bounded by a memory limit, but that limit does not account for the cache keys, so a large amount of path information accumulates in memory. When each ShuffleIndexInformation in the cache is very small, the number of entries (and therefore keys) becomes very large, which eventually leads to an out-of-memory error. I think this is a defect: the size of the key should be counted toward the 100 MB limit.
>  
> !https://upload-images.jianshu.io/upload_images/18249296-ed0cfee76b6f6bf2.png?imageMogr2/auto-orient/strip|imageView2/2/w/630/format/webp!
> According to MAT, the Guava LocalCache held by ExternalShuffleBlockHandler takes up 82.88% of the heap memory.
>  
>  
> !https://upload-images.jianshu.io/upload_images/18249296-43ec91771f3c68b7.png?imageMogr2/auto-orient/strip|imageView2/2/w/760/format/webp!
> !https://upload-images.jianshu.io/upload_images/18249296-f85e27a501605260.png?imageMogr2/auto-orient/strip|imageView2/2/w/1147/format/webp!
> The analysis shows that a very large number of shuffle index file paths are held in memory, taking up more than 400 MB. These paths are the keys of the shuffle index cache in ExternalShuffleBlockResolver. The source code suggests a defect in the cache management: the 100 MB limit does not count the size of the keys.
>   !https://upload-images.jianshu.io/upload_images/18249296-87118ce13744c2ca.png?imageMogr2/auto-orient/strip|imageView2/2/w/1200/format/webp!
>  
>   
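
The weighting problem described above can be illustrated with a small, self-contained sketch. It does not use Guava or Spark code; it only mimics the idea of a weight-bounded cache (in Guava, `CacheBuilder.maximumWeight` combined with a `Weigher`). The class name, the method names, the sample path, and the 16-byte value size are all hypothetical and chosen for illustration. The key size is estimated from the UTF-8 length of the path, which is a lower bound on what a path key actually retains on the JVM heap.

```java
import java.nio.charset.StandardCharsets;

// Illustrative sketch only: not the Spark or Guava implementation.
class IndexCacheWeightSketch {
    // 100 MB weight limit, as mentioned in the issue description.
    static final long MAX_WEIGHT_BYTES = 100L * 1024 * 1024;

    // Weight as described in the issue: only the cached value is counted.
    static int valueOnlyWeight(String indexFilePath, int indexInfoSizeBytes) {
        return indexInfoSizeBytes;
    }

    // Proposed weight: value size plus a rough (lower-bound) estimate of the key size.
    static int keyAwareWeight(String indexFilePath, int indexInfoSizeBytes) {
        int keyBytes = indexFilePath.getBytes(StandardCharsets.UTF_8).length;
        return indexInfoSizeBytes + keyBytes;
    }

    // Number of equally weighted entries that fit under the limit.
    // Assumes weightPerEntry is positive.
    static long maxEntries(int weightPerEntry) {
        return MAX_WEIGHT_BYTES / weightPerEntry;
    }

    public static void main(String[] args) {
        // Hypothetical index file path and value size, for illustration only.
        String path = "/hypothetical/nm-local-dir/blockmgr-0000/0a/shuffle_0_0_0.index";
        int infoSize = 16;

        long admitted = maxEntries(valueOnlyWeight(path, infoSize));
        long keyBytes = admitted * path.getBytes(StandardCharsets.UTF_8).length;

        System.out.println("Entries admitted when only values are weighed: " + admitted);
        System.out.println("Key bytes not counted by the limit (lower bound, MB): "
                + keyBytes / (1024 * 1024));
        System.out.println("Entries admitted when keys are weighed too: "
                + maxEntries(keyAwareWeight(path, infoSize)));
    }
}
```

With small values, the value-only weight admits millions of entries before the limit is reached, and the memory held by their keys is never counted. Adding the key size to the weight caps the entry count much earlier.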



--
This message was sent by Atlassian Jira
(v8.3.4#803005)