Posted to mapreduce-issues@hadoop.apache.org by "Hudson (JIRA)" <ji...@apache.org> on 2011/04/07 17:42:11 UTC

[jira] [Commented] (MAPREDUCE-1892) RaidNode can allow layered policies more efficiently

    [ https://issues.apache.org/jira/browse/MAPREDUCE-1892?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13016899#comment-13016899 ] 

Hudson commented on MAPREDUCE-1892:
-----------------------------------

Integrated in Hadoop-Mapreduce-trunk #643 (See [https://hudson.apache.org/hudson/job/Hadoop-Mapreduce-trunk/643/])

> RaidNode can allow layered policies more efficiently
> ----------------------------------------------------
>
>                 Key: MAPREDUCE-1892
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1892
>             Project: Hadoop Map/Reduce
>          Issue Type: Improvement
>          Components: contrib/raid
>            Reporter: Ramkumar Vadali
>            Assignee: Ramkumar Vadali
>             Fix For: 0.22.0
>
>         Attachments: MAPREDUCE-1892.patch, MAPREDUCE-1892.patch
>
>
> The RaidNode policy file can have layered policies that can cover a file more than once. To avoid processing a file multiple times (for RAIDing), RaidNode maintains a list of processed files that is used to avoid duplicate processing attempts.
> This is problematic in that a large number of processed files could cause the RaidNode to run out of memory.
> This task proposes a better method of detecting processed files. The method is based on the observation that a more selective policy will have a better match with a file name than a less selective one. Specifically, the more selective policy will have a longer common prefix with the file name.
> So to detect if a file has already been processed, the RaidNode only needs to maintain a list of processed policies and compare the lengths of the common prefixes. If the file has a longer common prefix with one of the processed policies than with the current policy, it can be assumed to be processed already.
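
The check described above can be sketched roughly as follows. This is only an illustration, not the code from the attached patch: the class name, method names, and the use of plain path strings for policy source paths are all hypothetical. It also assumes the common prefix is measured in whole path components rather than characters, so that sibling directories with similar names are not mistaken for a better match.

```java
import java.util.List;

// Hypothetical helper; not part of the RaidNode source.
class LayeredPolicyCheck {

    // Number of leading path components shared by two slash-separated paths.
    // Assumption: components are compared, not characters. The leading empty
    // component produced by an absolute path is counted for both inputs, so
    // it does not affect comparisons between counts.
    static int commonPrefixLength(String a, String b) {
        String[] pa = a.split("/");
        String[] pb = b.split("/");
        int n = Math.min(pa.length, pb.length);
        int i = 0;
        while (i < n && pa[i].equals(pb[i])) {
            i++;
        }
        return i;
    }

    // A file is assumed to be processed already if some previously processed
    // policy shares a longer common prefix with it than the current policy.
    static boolean alreadyProcessed(String filePath,
                                    String currentPolicyPath,
                                    List<String> processedPolicyPaths) {
        int current = commonPrefixLength(filePath, currentPolicyPath);
        for (String processed : processedPolicyPaths) {
            if (commonPrefixLength(filePath, processed) > current) {
                return true;
            }
        }
        return false;
    }
}
```

For example, with a processed policy on /user/warehouse/logs and a current policy on /user/warehouse, a file under /user/warehouse/logs would be skipped, while a file under /user/warehouse/tables would still be handled by the current policy. Memory use in this sketch grows with the number of policies, not with the number of files.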

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira