Posted to mapreduce-issues@hadoop.apache.org by "Ramkumar Vadali (JIRA)" <ji...@apache.org> on 2011/05/23 19:40:47 UTC

[jira] [Commented] (MAPREDUCE-2186) DistributedRaidFileSystem should implement getFileBlockLocations()

    [ https://issues.apache.org/jira/browse/MAPREDUCE-2186?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13038086#comment-13038086 ] 

Ramkumar Vadali commented on MAPREDUCE-2186:
--------------------------------------------

The main motivation to open this jira was to allow CombineFileInputFormat to work when there are missing blocks. CombineFileInputFormat figures out the host/rack information for input blocks and uses that information to create input splits. It does not handle the case where a block does not have any host/rack information.

The proposed fix, returning the locations of parity blocks when source blocks are missing, is not good because it fixes the problem in the wrong place. It also gives us false locality: the hosts reported for a missing block would be those of its parity blocks, not of the data itself.
Instead of changing RAID FS to handle this case, it's better to fix CombineFileInputFormat (CFIF) to handle missing blocks (MAPREDUCE-2185).
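
To illustrate the kind of handling meant here, below is a minimal sketch in plain Java. It is not the actual CombineFileInputFormat code and does not use any Hadoop classes; the names LocalityGrouping, BlockInfo, groupByHost and the NO_LOCALITY bucket are all hypothetical. It only shows the idea of building a host-to-blocks index in which a block that reports no hosts is placed in a separate bucket instead of being dropped or causing a failure.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Illustrative only: a simplified host-to-blocks index, not Hadoop code.
final class LocalityGrouping {

    // Hypothetical bucket name for blocks that report no host information.
    static final String NO_LOCALITY = "<no-locality>";

    // Hypothetical stand-in for a block and the hosts holding its replicas.
    static final class BlockInfo {
        final String id;
        final List<String> hosts;

        BlockInfo(String id, List<String> hosts) {
            this.id = id;
            this.hosts = hosts;
        }
    }

    // Index block ids by host. A block with no hosts (for example, a missing
    // block) goes into the NO_LOCALITY bucket so it can still be assigned
    // to a split, just without any locality preference.
    static Map<String, List<String>> groupByHost(List<BlockInfo> blocks) {
        Map<String, List<String>> byHost = new LinkedHashMap<>();
        for (BlockInfo block : blocks) {
            if (block.hosts == null || block.hosts.isEmpty()) {
                byHost.computeIfAbsent(NO_LOCALITY, k -> new ArrayList<>()).add(block.id);
                continue;
            }
            for (String host : block.hosts) {
                byHost.computeIfAbsent(host, k -> new ArrayList<>()).add(block.id);
            }
        }
        return byHost;
    }
}
```

With this shape, the file system keeps reporting only real data locations, and the input format decides what to do with blocks that have none.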

> DistributedRaidFileSystem should implement getFileBlockLocations()
> ------------------------------------------------------------------
>
>                 Key: MAPREDUCE-2186
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2186
>             Project: Hadoop Map/Reduce
>          Issue Type: Improvement
>          Components: contrib/raid
>            Reporter: Ramkumar Vadali
>            Assignee: Ramkumar Vadali
>
> If a RAIDed file has missing blocks, DistributedRaidFileSystem.getFileBlockLocations() would return no block locations. This could lead a client to believe that the file is not readable. But if parity data is available, the file actually is readable.
> It would be better to implement getFileBlockLocations() and return the location of the parity blocks that would be needed to reconstruct the missing block.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira