Posted to mapreduce-issues@hadoop.apache.org by "Robert Joseph Evans (JIRA)" <ji...@apache.org> on 2011/06/07 16:06:58 UTC

[jira] [Updated] (MAPREDUCE-2572) Throttle the deletion of data from the distributed cache

     [ https://issues.apache.org/jira/browse/MAPREDUCE-2572?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Robert Joseph Evans updated MAPREDUCE-2572:
-------------------------------------------

    Attachment: THROTTLING-security-v1.patch

This patch includes a backport of MAPREDUCE-2494 (LRU ordering of deletion) together with throttling.  Currently we throttle deletion to a given number of bytes per second.  A lot of work still needs to go into this: the tests need to be improved, and the sleep interval needs to take into account the amount of time actually spent deleting data.
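
To illustrate the sleep-interval point, here is a minimal sketch of a bytes-per-second throttle whose sleep is shortened by the time the deletion itself took.  It is not taken from THROTTLING-security-v1.patch; the class name DeletionThrottle and its methods are hypothetical and exist only for this example.

```java
import java.util.concurrent.TimeUnit;

/** Hypothetical helper for illustration; not part of the attached patch. */
final class DeletionThrottle {
    private final long bytesPerSecond;

    DeletionThrottle(long bytesPerSecond) {
        if (bytesPerSecond <= 0) {
            throw new IllegalArgumentException("bytesPerSecond must be positive");
        }
        this.bytesPerSecond = bytesPerSecond;
    }

    /**
     * Milliseconds to sleep after deleting bytesDeleted bytes, given that the
     * deletion itself already consumed elapsedMillis of the time budget.
     */
    long sleepMillisFor(long bytesDeleted, long elapsedMillis) {
        // Time this many bytes are allowed to take at the target rate.
        long budgetMillis = (bytesDeleted * 1000L) / bytesPerSecond;
        // Only sleep for the part of the budget the deletion did not use up.
        return Math.max(0L, budgetMillis - elapsedMillis);
    }

    /** Runs one deletion, times it, then sleeps off the rest of its budget. */
    void throttle(long bytesDeleted, Runnable deletion) throws InterruptedException {
        long start = System.nanoTime();
        deletion.run();
        long elapsedMillis = TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - start);
        long sleepMillis = sleepMillisFor(bytesDeleted, elapsedMillis);
        if (sleepMillis > 0) {
            Thread.sleep(sleepMillis);
        }
    }
}
```

With a limit of 1000 bytes per second, deleting 2000 bytes in 500 ms leaves 1500 ms to sleep, while a deletion that is already slower than the limit sleeps for 0 ms.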

It has also been suggested that we tie the throttling to the fill rate of the cache, so that the faster the cache fills, the faster we clear it out.  I would like some feedback on this.
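
One possible reading of the fill-rate idea, as a sketch only: measure how many bytes were added to the cache over a recent window and use that as the deletion rate, clamped between a floor and a ceiling.  The class FillRateThrottle, the clamping bounds, and the windowing are all assumptions made for this example; nothing here has been decided in this issue.

```java
/** Hypothetical illustration of tying the deletion rate to the cache fill rate. */
final class FillRateThrottle {
    private final long minBytesPerSecond;
    private final long maxBytesPerSecond;

    FillRateThrottle(long minBytesPerSecond, long maxBytesPerSecond) {
        if (minBytesPerSecond <= 0 || maxBytesPerSecond < minBytesPerSecond) {
            throw new IllegalArgumentException("require 0 < min <= max");
        }
        this.minBytesPerSecond = minBytesPerSecond;
        this.maxBytesPerSecond = maxBytesPerSecond;
    }

    /**
     * Deletion rate in bytes per second, derived from the bytes added to the
     * cache during the last windowMillis milliseconds.
     */
    long deletionRate(long bytesAdded, long windowMillis) {
        if (windowMillis <= 0) {
            // No usable measurement yet, so fall back to the slowest rate.
            return minBytesPerSecond;
        }
        long fillRate = (bytesAdded * 1000L) / windowMillis;
        // Follow the fill rate, but never leave the [min, max] range.
        return Math.max(minBytesPerSecond, Math.min(maxBytesPerSecond, fillRate));
    }
}
```

The floor keeps deletion making progress when the cache is idle, and the ceiling keeps a burst of localization from turning back into the flood of delete requests this issue is trying to avoid.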

> Throttle the deletion of data from the distributed cache
> --------------------------------------------------------
>
>                 Key: MAPREDUCE-2572
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2572
>             Project: Hadoop Map/Reduce
>          Issue Type: Improvement
>          Components: distributed-cache
>    Affects Versions: 0.20.205.0
>            Reporter: Robert Joseph Evans
>            Assignee: Robert Joseph Evans
>         Attachments: THROTTLING-security-v1.patch
>
>
> When deleting entries from the distributed cache we do so in a background thread.  Once the size limit of the distributed cache is reached all unused entries are deleted.  MAPREDUCE-2494 changes this so that entries are deleted in LRU order until the usage falls below a given threshold.  In either of these cases we are periodically flooding a disk with delete requests which can slow down all IO operations to a drive.  It would be better to be able to throttle this deletion so that it is spread out over a longer period of time.  This jira is to add in this throttling.
> On investigation, it seems much simpler to backport MAPREDUCE-2494 to 20S before implementing this change rather than try to implement it without LRU deletion, because LRU goes a long way towards reducing the load on the disk anyway.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira