Posted to mapreduce-issues@hadoop.apache.org by "Mahadev konar (JIRA)" <ji...@apache.org> on 2011/05/12 19:25:47 UTC

[jira] [Commented] (MAPREDUCE-2459) Cache HAR filesystem metadata

    [ https://issues.apache.org/jira/browse/MAPREDUCE-2459?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13032524#comment-13032524 ] 

Mahadev konar commented on MAPREDUCE-2459:
------------------------------------------

Mac, it looks like the tests are failing (TestHarFileSystem in particular). The patch looks good to me. Is there a particular reason for the leading underscore on the following variable name?

{noformat}
_harMetaCache
{noformat}

Also, is this meant for trunk only?




> Cache HAR filesystem metadata
> -----------------------------
>
>                 Key: MAPREDUCE-2459
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2459
>             Project: Hadoop Map/Reduce
>          Issue Type: Improvement
>          Components: harchive
>            Reporter: Mac Yang
>            Assignee: Mac Yang
>             Fix For: 0.23.0
>
>         Attachments: MAPREDUCE-2459.1.patch
>
>
> Each HAR file system has two index files that contain information on how files are stored in the part files. During block location calculation, these indexes are reread for every file in the archive. Caching the indexes and the status of the part files will greatly reduce the number of NameNode operations during job setup.
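
For readers unfamiliar with the idea in the description above, here is a rough sketch of what caching per-archive index metadata could look like. It is not the code from MAPREDUCE-2459.1.patch and it does not use any Hadoop classes. The class names (HarMetaCacheSketch, HarMetaData), the LRU size bound, and the check against the index file's modification time are all assumptions made for illustration, and the parsed index is simplified to a map from archived path to part-file location.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Function;

/** Hypothetical sketch of a HAR metadata cache; all names here are illustrative. */
class HarMetaCacheSketch {

    /** Parsed contents of one archive's index files, simplified to a map. */
    static final class HarMetaData {
        final long indexModTime;               // modification time of the index when it was read
        final Map<String, String> fileToPart;  // archived file path -> part-file location

        HarMetaData(long indexModTime, Map<String, String> fileToPart) {
            this.indexModTime = indexModTime;
            this.fileToPart = fileToPart;
        }
    }

    private final Function<String, HarMetaData> loader;  // reads and parses the index files
    private final Map<String, HarMetaData> cache;
    private int loads = 0;                                // number of times the loader ran

    HarMetaCacheSketch(int maxEntries, Function<String, HarMetaData> loader) {
        final int limit = maxEntries;
        this.loader = loader;
        // An access-ordered LinkedHashMap evicts the least recently used archive
        // once the cache holds more than `limit` entries.
        this.cache = new LinkedHashMap<String, HarMetaData>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<String, HarMetaData> eldest) {
                return size() > limit;
            }
        };
    }

    /**
     * Returns the cached metadata for an archive, reloading it only when there is
     * no entry yet or the index file's modification time has changed (assumed
     * invalidation rule).
     */
    synchronized HarMetaData get(String archiveUri, long currentIndexModTime) {
        HarMetaData meta = cache.get(archiveUri);
        if (meta == null || meta.indexModTime != currentIndexModTime) {
            meta = loader.apply(archiveUri);
            loads++;
            cache.put(archiveUri, meta);
        }
        return meta;
    }

    synchronized int loadCount() {
        return loads;
    }
}
```

With a cache along these lines, looking up block locations for many files in the same archive would run the loader once per archive rather than once per file, which is where the saving in NameNode operations would come from.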
