You are viewing a plain text version of this content. The canonical link for it is here.
Posted to mapreduce-issues@hadoop.apache.org by "Robert Joseph Evans (JIRA)" <ji...@apache.org> on 2011/06/08 19:18:00 UTC

[jira] [Updated] (MAPREDUCE-2494) Make the distributed cache delete entires using LRU priority

     [ https://issues.apache.org/jira/browse/MAPREDUCE-2494?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Robert Joseph Evans updated MAPREDUCE-2494:
-------------------------------------------

    Attachment: MAPREDUCE-2494-20.20X-V1.patch

This patch also takes into account the issues shown with MAPREDUCE-2573.  This is for the security branch.

     [exec] +1 overall.  
     [exec] 
     [exec]     +1 @author.  The patch does not contain any @author tags.
     [exec] 
     [exec]     +1 tests included.  The patch appears to include 3 new or modified tests.
     [exec] 
     [exec]     +1 javadoc.  The javadoc tool did not generate any warning messages.
     [exec] 
     [exec]     +1 javac.  The applied patch does not increase the total number of javac compiler warnings.
     [exec] 
     [exec]     +1 findbugs.  The patch does not introduce any new Findbugs (version 1.3.9) warnings.


> Make the distributed cache delete entires using LRU priority
> ------------------------------------------------------------
>
>                 Key: MAPREDUCE-2494
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2494
>             Project: Hadoop Map/Reduce
>          Issue Type: Improvement
>          Components: distributed-cache
>    Affects Versions: 0.21.0
>            Reporter: Robert Joseph Evans
>            Assignee: Robert Joseph Evans
>             Fix For: 0.23.0
>
>         Attachments: MAPREDUCE-2494-20.20X-V1.patch, MAPREDUCE-2494-V1.patch, MAPREDUCE-2494-V2.patch
>
>
> Currently the distributed cache will wait until a cache directory is above a preconfigured threshold.  At which point it will delete all entries that are not currently being used.  It seems like we would get far fewer cache misses if we kept some of them around, even when they are not being used.  We should add in a configurable percentage for a goal of how much of the cache should remain clear when not in use, and select objects to delete based off of how recently they were used, and possibly also how large they are/how difficult is it to download them again.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira