Posted to hdfs-dev@hadoop.apache.org by Arjun <ba...@mail.uc.edu> on 2014/07/23 17:15:24 UTC

Finding file size during block placement

Hi,

I want to write a block placement policy that takes the size of the file 
being placed into account, something like what is done in the CoHadoop or 
BEEMR papers. I have the following questions:

1- Is srcPath in chooseTarget the path to the original un-chunked file, 
or is it a path to a single block?

2- Will a simple new File(srcPath) do?

3- I've spent time looking at the Hadoop source code. I can't find a way to 
go from srcPath in chooseTarget to a file size. Every function I think 
can do it, in FSNamesystem, FSDirectory, etc., is either non-public or 
cannot be called from inside the blockmanagement package or the 
block placement class.

How do I go from srcPath in the block placement class to the size of the 
file being placed?

Thank you,

AB

Re: Finding file size during block placement

Posted by Colin McCabe <cm...@alumni.cmu.edu>.
On Wed, Jul 23, 2014 at 8:15 AM, Arjun <ba...@mail.uc.edu> wrote:

> Hi,
>
> I want to write a block placement policy that takes the size of the file
> being placed into account, something like what is done in the CoHadoop or
> BEEMR papers. I have the following questions:
>
>
Hadoop uses a stream metaphor.  So at the time you're deciding what blocks
to use for a DFSOutputStream, you don't know how many bytes the user code
is going to write.  It could be terabytes, or nothing.
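
To make the stream metaphor concrete, here is a toy model in plain Java.
It is not Hadoop code: ToyBlockStream, BLOCK_SIZE and
bytesKnownAtAllocation are made-up names, and the 128-byte block size is
only there to keep the numbers small. It records what size information
exists at the moment each new block is requested.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy model of a block-allocating output stream. Hypothetical, not HDFS code. */
class ToyBlockStream {
    static final long BLOCK_SIZE = 128; // bytes; deliberately tiny for the demo

    /** Bytes written so far, recorded each time a new block is requested. */
    final List<Long> bytesKnownAtAllocation = new ArrayList<>();

    private long bytesWritten = 0;
    private long roomInCurrentBlock = 0;

    /** Appends n bytes, requesting a new block whenever the current one is full. */
    void write(long n) {
        while (n > 0) {
            if (roomInCurrentBlock == 0) {
                // A placement decision would happen here. The final file size
                // is unknown; only the bytes written so far are available.
                bytesKnownAtAllocation.add(bytesWritten);
                roomInCurrentBlock = BLOCK_SIZE;
            }
            long chunk = Math.min(n, roomInCurrentBlock);
            bytesWritten += chunk;
            roomInCurrentBlock -= chunk;
            n -= chunk;
        }
    }

    /** The total size only becomes known when the writer closes the stream. */
    long close() {
        return bytesWritten;
    }

    public static void main(String[] args) {
        ToyBlockStream s = new ToyBlockStream();
        s.write(100);
        s.write(200);
        System.out.println("known at each allocation: " + s.bytesKnownAtAllocation);
        System.out.println("final size: " + s.close());
    }
}
```

In this model the first block is always requested with zero bytes
written, so any size-based decision for it would have to rely on a guess.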

You could potentially start placing the later replicas differently once
the first few blocks have been written.  You would probably need to modify
the BlockPlacementPolicy interface to supply the number of bytes written so
far.  I could be wrong, but as far as I can see, there's no way to access
that information with the current API.
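
As a rough sketch of what such a modified interface might look like,
here is a standalone Java example. Everything in it is hypothetical:
SizeAwarePlacement, ThresholdPlacement, the bytesWrittenSoFar parameter
and the node-group names do not exist in HDFS, and a real change would
have to fit the actual chooseTarget signature.

```java
/** Hypothetical extension point; no such interface exists in HDFS. */
interface SizeAwarePlacement {
    /**
     * Picks a (made-up) node group for the next block of srcPath, given
     * how many bytes of that file have been written so far.
     */
    String chooseTargetGroup(String srcPath, long bytesWrittenSoFar);
}

/** Sends blocks to a different group once a file has grown past a threshold. */
class ThresholdPlacement implements SizeAwarePlacement {
    private final long thresholdBytes;

    ThresholdPlacement(long thresholdBytes) {
        this.thresholdBytes = thresholdBytes;
    }

    @Override
    public String chooseTargetGroup(String srcPath, long bytesWrittenSoFar) {
        // The decision is based on the size so far, not the final size,
        // because the final size is unknown while the stream is open.
        return bytesWrittenSoFar < thresholdBytes
                ? "small-file-nodes"
                : "large-file-nodes";
    }

    public static void main(String[] args) {
        SizeAwarePlacement policy = new ThresholdPlacement(256);
        long[] sizesSoFar = {0, 128, 256, 384};
        for (long size : sizesSoFar) {
            System.out.println(size + " bytes so far -> "
                    + policy.chooseTargetGroup("/user/ab/data.bin", size));
        }
    }
}
```

Note that with a policy like this, the early blocks of a large file still
land in the small-file group, since nothing distinguishes the file from a
small one until it has grown past the threshold.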

cheers,
Colin



> 1- Is srcPath in chooseTarget the path to the original un-chunked file, or
> is it a path to a single block?
>
> 2- Will a simple new File(srcPath) do?
>
> 3- I've spent time looking at hadoop source code. I can't find a way to go
> from srcPath in chooseTarget to a file size. Every function I think can do
> it, in FSNamesystem, FSDirectory, etc., is either non-public, or cannot be
> called from inside the blockmanagement package or blockplacement class.
>
> How do I go from srcPath in blockplacement class to size of the file being
> placed?
>
> Thank you,
>
> AB
>