Posted to reviews@spark.apache.org by GitBox <gi...@apache.org> on 2021/09/24 03:31:56 UTC

[GitHub] [spark] taroplus opened a new pull request #34090: [SPARK-36827][CORE] Fix perf issue in AppStatusListener.cleanupStages

taroplus opened a new pull request #34090:
URL: https://github.com/apache/spark/pull/34090


   ### What changes were proposed in this pull request?
   This PR fixes a performance issue in `AppStatusListener.cleanupStages`. When the store holds a large number of stages, the logic below runs in roughly N*M time, because it opens a new view over the store for every stage being deleted.
   
   ```
       val stageIds = stages.map { s =>
         val key = Array(s.info.stageId, s.info.attemptId)
         kvstore.delete(s.getClass(), key)
   
         // Check whether there are remaining attempts for the same stage. If there aren't, then
         // also delete the RDD graph data.
         val remainingAttempts = kvstore.view(classOf[StageDataWrapper])
           .index("stageId")
           .first(s.info.stageId)
           .last(s.info.stageId)
           .closeableIterator()
           ...
   ```
   Instead of accessing the view once per stage to check for remaining attempts, this change moves that check to after the stages have been removed. It then only needs to access the view (`kvstore.view(classOf[StageDataWrapper])`) once.
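   
   As a rough illustration of the idea (not the code in this PR), the sketch below contrasts a per-stage lookup with a single pass over the remaining entries. `StageKey`, `CleanupSketch`, and both method names are hypothetical stand-ins, and plain Scala collections are used in place of the KVStore. It assumes each per-stage lookup costs time proportional to the number of remaining entries.
   
   ```scala
   // Hypothetical stand-in for a stage attempt; not Spark's StageDataWrapper.
   final case class StageKey(stageId: Int, attemptId: Int)

   object CleanupSketch {
     // Per-stage lookup: every deleted stage triggers its own scan of the
     // remaining entries, so the total work grows like N*M.
     def orphanedPerStage(remaining: Seq[StageKey], deleted: Seq[StageKey]): Set[Int] =
       deleted.map(_.stageId).filter(id => !remaining.exists(_.stageId == id)).toSet

     // Single pass: collect the stage ids that still have attempts once,
     // then test each deleted stage id against that set.
     def orphanedSinglePass(remaining: Seq[StageKey], deleted: Seq[StageKey]): Set[Int] = {
       val stillPresent = remaining.iterator.map(_.stageId).toSet
       deleted.iterator.map(_.stageId).filterNot(stillPresent).toSet
     }
   }
   ```
   
   For example, with remaining attempts (1, 1) and (3, 0) and deleted attempts (1, 0), (2, 0) and (2, 1), both methods return `Set(2)`: stage 1 still has an attempt left, so only stage 2 would have its RDD graph data removed.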
   
   ### Why are the changes needed?
   When more stages than intended are kept in memory, the cleanup process cannot keep up with the rate of incoming stages because of this performance issue. The resulting behavior looks like a memory leak and eventually causes an OutOfMemoryError.
   
   ### Does this PR introduce _any_ user-facing change?
   No
   
   ### How was this patch tested?
   The behavior should be identical before and after the change, and the existing tests should verify that. This change has been applied to an environment where a constant memory leak was observed. Under the same load, the services now run healthily.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] gengliangwang commented on pull request #34090: [SPARK-36827][CORE] Fix perf issue in AppStatusListener.cleanupStages

Posted by GitBox <gi...@apache.org>.
gengliangwang commented on pull request #34090:
URL: https://github.com/apache/spark/pull/34090#issuecomment-926347932


   @taroplus I was working on this as well yesterday: https://github.com/apache/spark/pull/34092
   If we have to pull all the stage data out of the KVStore, we should avoid calling `KVUtils.viewToSeq(view, countToDelete.toInt)`, which copies the stage data and sorts it.




[GitHub] [spark] taroplus closed pull request #34090: [SPARK-36827][CORE] Fix perf issue in AppStatusListener.cleanupStages

Posted by GitBox <gi...@apache.org>.
taroplus closed pull request #34090:
URL: https://github.com/apache/spark/pull/34090


   




[GitHub] [spark] AmplabJenkins commented on pull request #34090: [SPARK-36827][CORE] Fix perf issue in AppStatusListener.cleanupStages

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #34090:
URL: https://github.com/apache/spark/pull/34090#issuecomment-926316219


   Can one of the admins verify this patch?




[GitHub] [spark] taroplus commented on pull request #34090: [SPARK-36827][CORE] Fix perf issue in AppStatusListener.cleanupStages

Posted by GitBox <gi...@apache.org>.
taroplus commented on pull request #34090:
URL: https://github.com/apache/spark/pull/34090#issuecomment-926348877


   Closing in favor of
   https://github.com/apache/spark/pull/34092

