Posted to common-dev@hadoop.apache.org by "Konstantin Shvachko (JIRA)" <ji...@apache.org> on 2007/03/21 02:53:32 UTC

[jira] Commented: (HADOOP-894) dfs client protocol should allow asking for parts of the block map

    [ https://issues.apache.org/jira/browse/HADOOP-894?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12482631 ] 

Konstantin Shvachko commented on HADOOP-894:
--------------------------------------------

As I understand the problem, many clients open the same file and read its first block,
e.g. in streaming, and then each reads a specific part of the file. So a client does not need to receive
the block map for the whole file; it only needs the block locations in a specified range.

I propose to modify ClientProtocol.open() to
OpenFileInfo open( String src, int numBlocks )
where
src - is the path;
numBlocks - is the number of blocks whose locations the client wants open() to compute
@returns
OpenFileInfo : extends DFSFileInfo {
    LocatedBlock[ numBlocks ];
}
DFSFileInfo contains file information including file length and replication.
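
A minimal Java sketch of the proposed open() return type, to make the shape concrete. Only the names OpenFileInfo, DFSFileInfo, LocatedBlock, src and numBlocks come from the proposal above; the field names, the blockId field, and the in-memory open() helper that takes the full block list as an argument are hypothetical stand-ins, not the actual Hadoop classes.

```java
import java.util.Arrays;

/** Sketch only: an in-memory stand-in for the proposed open(src, numBlocks). */
class OpenSketch {

    /** Stand-in for LocatedBlock; blockId is a hypothetical field. */
    static class LocatedBlock {
        final long blockId;
        LocatedBlock(long blockId) { this.blockId = blockId; }
    }

    /** File information including file length and replication. */
    static class DFSFileInfo {
        final long length;
        final short replication;
        DFSFileInfo(long length, short replication) {
            this.length = length;
            this.replication = replication;
        }
    }

    /** DFSFileInfo plus the locations of at most numBlocks leading blocks. */
    static class OpenFileInfo extends DFSFileInfo {
        final LocatedBlock[] blocks;
        OpenFileInfo(long length, short replication, LocatedBlock[] blocks) {
            super(length, replication);
            this.blocks = blocks;
        }
    }

    /**
     * Hypothetical server-side helper: returns the file info together with
     * the first numBlocks entries of the file's block list (fewer if the
     * file has fewer blocks).
     */
    static OpenFileInfo open(LocatedBlock[] allBlocks, long length,
                             short replication, int numBlocks) {
        int n = Math.max(0, Math.min(numBlocks, allBlocks.length));
        return new OpenFileInfo(length, replication, Arrays.copyOf(allBlocks, n));
    }

    public static void main(String[] args) {
        LocatedBlock[] all = {
            new LocatedBlock(1L), new LocatedBlock(2L), new LocatedBlock(3L)
        };
        OpenFileInfo info = open(all, 300L, (short) 3, 2);
        System.out.println(info.blocks.length + " of " + all.length + " blocks returned");
    }
}
```

In this sketch, asking for more blocks than the file has simply returns all of them; how the real open() should treat that case is not specified in the proposal.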

ClientProtocol should also contain
public LocatedBlock[] getBlockLocations(String src, int offset, int length) throws IOException;
offset - is the starting offset in the file
length - is the number of bytes the client is supposed to read

class LocatedBlock should include an additional field
+ long startFrom;  which gives the offset within the block at which the desired region of bytes begins.
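
A sketch of how a byte range could map to blocks and startFrom values, under assumptions that are not part of the proposal: every block has the same fixed size, the range is clamped to the file length, and startFrom is 0 for every block after the first one in the range. The blockIndex field and the fileLength and blockSize parameters are hypothetical, and long is used for offset and length here only to avoid overflow on large files.

```java
import java.util.ArrayList;
import java.util.List;

/** Sketch only: maps a byte range to the blocks that overlap it. */
class BlockRangeSketch {

    /** Stand-in for LocatedBlock; a real one would also carry datanode locations. */
    static final class LocatedBlock {
        final int blockIndex;  // hypothetical: position of the block within the file
        final long startFrom;  // offset within the block to the desired region of bytes
        LocatedBlock(int blockIndex, long startFrom) {
            this.blockIndex = blockIndex;
            this.startFrom = startFrom;
        }
    }

    /**
     * Returns the blocks overlapping [offset, offset + length), assuming
     * fixed-size blocks. An empty list is returned for an empty or
     * out-of-range request.
     */
    static List<LocatedBlock> getBlockLocations(long fileLength, long blockSize,
                                                long offset, long length) {
        List<LocatedBlock> result = new ArrayList<>();
        if (length <= 0 || offset < 0 || offset >= fileLength) {
            return result;
        }
        long end = Math.min(offset + length, fileLength);  // exclusive end of range
        int first = (int) (offset / blockSize);
        int last = (int) ((end - 1) / blockSize);
        for (int i = first; i <= last; i++) {
            // Only the first block is entered mid-way; later blocks start at 0.
            long startFrom = (i == first) ? offset % blockSize : 0L;
            result.add(new LocatedBlock(i, startFrom));
        }
        return result;
    }

    public static void main(String[] args) {
        // A 640-byte file in 64-byte blocks; the client wants 100 bytes at offset 100.
        for (LocatedBlock b : getBlockLocations(640L, 64L, 100L, 100L)) {
            System.out.println("block " + b.blockIndex + " startFrom " + b.startFrom);
        }
    }
}
```

For the request in main(), the range covers bytes 100 to 199, so the sketch returns blocks 1, 2 and 3, with startFrom equal to 36 for block 1 and 0 for the other two.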

Then we will need to reimplement seeks and reads for DFSInputStream using that API.
What would be a good default for the number of blocks that getBlockLocations()
would fetch per call if the file is read from start to finish?
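
To put numbers on that question, the sketch below counts how many getBlockLocations() calls a start-to-finish read would issue for a given number of blocks fetched per call. It assumes fixed-size blocks and that the client fetches consecutive, non-overlapping windows; the helper name and its parameters are hypothetical, and it does not suggest what the default should be.

```java
/** Sketch only: call count for a sequential read with windowed block fetches. */
class PrefetchSketch {

    /**
     * Number of getBlockLocations() calls needed to read the whole file when
     * each call returns locations for up to blocksPerCall consecutive blocks.
     */
    static long callsForSequentialRead(long fileLength, long blockSize, int blocksPerCall) {
        if (fileLength <= 0) {
            return 0L;
        }
        long totalBlocks = (fileLength + blockSize - 1) / blockSize;  // ceiling division
        return (totalBlocks + blocksPerCall - 1) / blocksPerCall;     // ceiling division
    }

    public static void main(String[] args) {
        long blockSize = 64L;
        long fileLength = 100L * blockSize;  // a file of exactly 100 blocks
        for (int perCall : new int[] {1, 7, 10, 100}) {
            System.out.println(perCall + " blocks per call -> "
                + callsForSequentialRead(fileLength, blockSize, perCall) + " calls");
        }
    }
}
```

A larger window means fewer calls but more location data per response, much of it wasted for a client that reads only part of the file.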

> dfs client protocol should allow asking for parts of the block map
> ------------------------------------------------------------------
>
>                 Key: HADOOP-894
>                 URL: https://issues.apache.org/jira/browse/HADOOP-894
>             Project: Hadoop
>          Issue Type: Improvement
>          Components: dfs
>            Reporter: Owen O'Malley
>         Assigned To: Wendy Chien
>
> I think that the HDFS client protocol should change like:
> /** The meta-data about a file that was opened. */
> class OpenFileInfo {
>   /** the info for the first block */
>   public LocatedBlockInfo getBlockInfo();
>   public long getBlockSize();
>   public long getLength();
> }
> interface ClientProtocol extends VersionedProtocol {
>   public OpenFileInfo open(String name) throws IOException;
>   /** get block info for any range of blocks */
>   public LocatedBlockInfo[] getBlockInfo(String name, int blockOffset, int blockLength) throws IOException;
> }
> so that the client can decide how much block info to request and when. Currently, when the file is opened or an error occurs, the entire block list is requested and sent.
