You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@spark.apache.org by "VitoMakarevich (via GitHub)" <gi...@apache.org> on 2023/09/13 17:32:36 UTC

[GitHub] [spark] VitoMakarevich commented on pull request #42893: [SPARK-44459][Core] Add System.runFinalization() to periodic cleanup

VitoMakarevich commented on PR #42893:
URL: https://github.com/apache/spark/pull/42893#issuecomment-1718041681

   My reasoning was that currently Spark has `System.gc()` so it should not be a problem to add another call simular to it.
   As for comments:
   1. `Impact on performance` - Indeed in some circumstances, it may be a problem, but it's better than failing with OOM since millions of objects are accumulated. Also, it's done by a single thread, so the blast radius is limited. And yet it may be added under a feature flag so users who struggle with simular problem may enable it(probably it's minority of users, but they may need a fix as well).
   2. `Impact on memory usage` - Yet again you are saying that usually standard approach should work and I'm sure it works, but there are cases when it's not enough. I assume we can say the same about `System.gc()` - usually you should not invoke it since it's costly, but it's here.
   3. `Possible change in expected behavior` - there is no way to define logic depending on it as far as I know, it's JVM internal, and it's different from `ReferenceQueue` - a mechanism designed for controllable release of resources - and that's why `finalization` is evil.
   
   As for `try finally` - for sure this is the best way to do it, but as I said - the problem comes from the side library(as I understood from `netty`) and java implementation - so you can't apply this pattern there. And the second problem is that even if you surround a block with `try finally` if it uses objects with non-empty `finalize` - they will go through `Finalizer` thread anyway, so there is no way to prevent it in Java 8.
   As for Spark update - you guys see better, maybe it's not a problem in the long-term, but as usual with Java - many users will live years on Java 8 as living now.
   
   As for GC settings - it may look strange, but I didn't find `any` mechanism to control the finalization queue in `any` GC. If you know such - for sure it's up to users to change it according to their application. Another argument for adding this call is that while the user can tune GC - there is still a `System.gc()` in the Spark codebase.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org