
Posted to reviews@spark.apache.org by mridulm <gi...@git.apache.org> on 2016/01/24 10:17:01 UTC

[GitHub] spark pull request: [SPARK-10193] [core] [wip] Eliminate Skipped S...

Github user mridulm commented on the pull request:

    https://github.com/apache/spark/pull/8427#issuecomment-174271331
  
    Just a note about MapOutputTracker - it is fairly trivial to make it use a bare minimum of memory even if it does not get cleaned up for 'old' stages: use a disk-backed map (mapdb, for example) with LRU eviction.
    This keeps at most the current and previous map outputs in memory and everything else on disk (until a node failure requires recomputation, which brings portions of it back into memory).
    
    This is what we used to do for production jobs in some earlier projects.
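
    For illustration only, here is a minimal sketch of the disk-backed LRU idea described above. It is not the code from those earlier projects and not a MapOutputTracker patch: plain java.util collections and temp files stand in for mapdb, and the class name SpillingLruMap and all of its methods are hypothetical. It is also not thread-safe.

```java
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical illustration: keeps the most recently used entries on the heap
// and spills the rest to temp files. Not part of Spark or mapdb.
class SpillingLruMap<K, V extends Serializable> {
    private final int maxInMemory;
    private final Path spillDir = createSpillDir();
    // Key -> file holding the serialized value of an entry evicted from memory.
    private final Map<K, Path> spilled = new HashMap<>();
    // accessOrder = true: iteration order is least recently used first.
    private final LinkedHashMap<K, V> hot;

    SpillingLruMap(int maxInMemory) {
        this.maxInMemory = maxInMemory;
        this.hot = new LinkedHashMap<K, V>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
                if (size() <= maxInMemory) {
                    return false;
                }
                // Over budget: write the LRU entry to disk before dropping it.
                spill(eldest.getKey(), eldest.getValue());
                return true;
            }
        };
    }

    void put(K key, V value) {
        Path stale = spilled.remove(key);
        if (stale != null) {
            stale.toFile().delete(); // the new value supersedes the spilled one
        }
        hot.put(key, value);
    }

    // Returns the value, reloading it from disk if it was spilled; null if absent.
    V get(K key) {
        V value = hot.get(key);
        if (value != null) {
            return value;
        }
        Path file = spilled.remove(key);
        if (file == null) {
            return null;
        }
        value = load(file);
        hot.put(key, value); // may in turn spill another entry
        return value;
    }

    int inMemoryCount() { return hot.size(); }

    int spilledCount() { return spilled.size(); }

    private void spill(K key, V value) {
        try {
            Path file = Files.createTempFile(spillDir, "entry", ".bin");
            try (ObjectOutputStream out = new ObjectOutputStream(Files.newOutputStream(file))) {
                out.writeObject(value);
            }
            spilled.put(key, file);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    @SuppressWarnings("unchecked")
    private V load(Path file) {
        try {
            V value;
            try (ObjectInputStream in = new ObjectInputStream(Files.newInputStream(file))) {
                value = (V) in.readObject();
            }
            Files.delete(file);
            return value;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    private static Path createSpillDir() {
        try {
            return Files.createTempDirectory("spill-lru");
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

    With maxInMemory = 2, this mirrors "current and previous on the heap, everything else on disk": a lookup for an older key reads it back from its spill file and pushes the least recently used entry out in its place.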
    
    
    I am not sure what the impact of the current proposal is from a memory overhead point of view. Map output was (obviously) expensive enough to attempt this, and the effect was not pervasive/diffuse across the codebase for shuffle output tracking.


