Posted to reviews@spark.apache.org by GitBox <gi...@apache.org> on 2018/12/20 12:41:09 UTC

[GitHub] wangshuo128 commented on issue #23355: [SPARK-26418][SHUFFLE] Only OpenBlocks without any ChunkFetch for one stream will cause memory leak in ExternalShuffleService

URL: https://github.com/apache/spark/pull/23355#issuecomment-448986131
 
 
   Let me explain my problem in detail.
   
   We use `YarnShuffleService` as an auxiliary service of the NodeManager in our cluster. Full GCs occurred on some NodeManagers. We dumped the heap and found that the map holding the `StreamState` objects in `OneForOneStreamManager` occupied about 3 GB, almost 80% of the heap. Some applications had already finished, but their `StreamState`s were still in `OneForOneStreamManager`.
   
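   In the current code, the server creates a `StreamState` when it handles an `OpenBlocks` request, but associates that `StreamState` with a channel only when it handles the subsequent `ChunkFetchRequest`s. A `StreamState` that never gets associated with a channel is not removed when the connection closes.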
   
   I think two scenarios can cause this leak:
   
   1. An `OpenBlocks` request is received and a `StreamState` is initialized on the server side. Then the transport-layer client, or even the whole executor, is lost, so no `ChunkFetchRequest` is ever sent to the server for that stream.
   2. An `OpenBlocks` request is received and a `StreamState` is initialized on the server side. `ChunkFetchRequest`s for the stream are sent to the server, but the server is under heavy load and cannot handle them before the timeout. The client then closes its connection in `TransportChannelHandler.userEventTriggered`.
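   
   To make both scenarios concrete, here is a minimal sketch of the behaviour described above. It is a simplified model, not Spark's actual code: the class `LazyChannelStreamManager`, its method names, and the use of a plain `Object` in place of a Netty channel are all assumptions made for illustration.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical, simplified model of the leak; not Spark's real OneForOneStreamManager.
class LazyChannelStreamManager {
  // Per-stream state; the channel stays null until the first chunk fetch is handled.
  static final class StreamState {
    final String appId;
    volatile Object associatedChannel; // stand-in for a Netty Channel
    StreamState(String appId) { this.appId = appId; }
  }

  private final AtomicLong nextStreamId = new AtomicLong();
  private final Map<Long, StreamState> streams = new ConcurrentHashMap<>();

  // Models handling OpenBlocks: state is created, but no channel is recorded yet.
  long registerStream(String appId) {
    long id = nextStreamId.getAndIncrement();
    streams.put(id, new StreamState(appId));
    return id;
  }

  // Models handling a ChunkFetchRequest: only now is the channel recorded.
  void onChunkFetch(long streamId, Object channel) {
    StreamState state = streams.get(streamId);
    if (state != null) {
      state.associatedChannel = channel;
    }
  }

  // Models connection close: removes only streams already tied to that channel.
  void connectionTerminated(Object channel) {
    streams.values().removeIf(s -> s.associatedChannel == channel);
  }

  int liveStreams() {
    return streams.size();
  }
}
```

   In this model, a stream that was registered but never reached `onChunkFetch` survives `connectionTerminated`, because its `associatedChannel` is still null and so never matches the closing channel.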
   
   Currently, the `OpenBlocks` request and the subsequent `ChunkFetchRequest`s for a given stream are sent over the same `TransportClient` in `OneForOneBlockFetcher`. So I think associating the `StreamState` with the channel when handling the `OpenBlocks` request will be fine.
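   
   A rough sketch of that idea, under the assumption that the channel is available when the `OpenBlocks` request is handled. The class `EagerChannelStreamManager` and its method signatures are hypothetical and only illustrate the proposal; a plain `Object` stands in for the Netty channel.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical sketch of the proposal: record the channel at registration time.
class EagerChannelStreamManager {
  static final class StreamState {
    final String appId;
    final Object associatedChannel; // stand-in for a Netty Channel, set once at creation
    StreamState(String appId, Object channel) {
      this.appId = appId;
      this.associatedChannel = channel;
    }
  }

  private final AtomicLong nextStreamId = new AtomicLong();
  private final Map<Long, StreamState> streams = new ConcurrentHashMap<>();

  // Models handling OpenBlocks: the requesting channel is stored with the new state.
  long registerStream(String appId, Object channel) {
    long id = nextStreamId.getAndIncrement();
    streams.put(id, new StreamState(appId, channel));
    return id;
  }

  // Models connection close: every stream opened on this channel is removed,
  // whether or not a chunk fetch was ever handled for it.
  void connectionTerminated(Object channel) {
    streams.values().removeIf(s -> s.associatedChannel == channel);
  }

  int liveStreams() {
    return streams.size();
  }
}
```

   With this shape, closing a connection cleans up all streams opened on it, while streams opened on other connections are left alone.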
