Posted to hdfs-dev@hadoop.apache.org by Arjun <ba...@mail.uc.edu> on 2014/07/23 17:15:24 UTC
Finding file size during block placement
Hi,
I want to write a block placement policy that takes the size of the file
being placed into account, something like what is done in the CoHadoop
or BEEMR papers. I have the following questions:
1- Is srcPath in chooseTarget the path to the original un-chunked file,
or is it a path to a single block?
2- Will a simple new File(srcPath) do?
3- I've spent time looking at the Hadoop source code. I can't find a way to
go from srcPath in chooseTarget to a file size. Every function I think
can do it, in FSNamesystem, FSDirectory, etc., is either non-public, or
cannot be called from inside the blockmanagement package or
blockplacement class.
How do I go from srcPath in blockplacement class to size of the file
being placed?
Thank you,
AB
Re: Finding file size during block placement
Posted by Colin McCabe <cm...@alumni.cmu.edu>.
On Wed, Jul 23, 2014 at 8:15 AM, Arjun <ba...@mail.uc.edu> wrote:
> Hi,
>
> I want to write a block placement policy that takes the size of the file
> being placed into account. Something like what is done in CoHadoop or BEEMR
> paper. I have the following questions:
>
>
Hadoop uses a stream metaphor. So at the time you're deciding what blocks
to use for a DFSOutputStream, you don't know how many bytes the user code
is going to write. It could be terabytes, or nothing.
You could potentially start placing the later replicas differently once
the first few blocks have been written. You would probably need to modify
the BlockPlacementPolicy interface to supply this information. I could be
wrong, but as far as I can see, there's no way to get the file size from
inside the placement policy with the current API.
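To make the stream point concrete, here is a rough, self-contained sketch.
It does not use any Hadoop classes. SizeAwarePolicy, thresholdPolicy,
simulate, the node-group labels and the path /user/ab/data.bin are all
made up for illustration; they only show what a policy could see if the
interface were extended to pass along the bytes written so far.

```java
import java.util.ArrayList;
import java.util.List;

public class StreamPlacementSketch {

    /** Hypothetical size-aware hook; NOT part of Hadoop's BlockPlacementPolicy. */
    public interface SizeAwarePolicy {
        String chooseTarget(String srcPath, long bytesWrittenSoFar);
    }

    /** Toy policy: treat a file as "large" once it has filled at least two blocks. */
    public static SizeAwarePolicy thresholdPolicy(long blockSize) {
        return (srcPath, bytesWrittenSoFar) ->
                bytesWrittenSoFar >= 2 * blockSize ? "large-file nodes" : "default nodes";
    }

    /** Simulates a writer streaming totalBytes; one placement decision per block. */
    public static List<String> simulate(String srcPath, long totalBytes, long blockSize) {
        SizeAwarePolicy policy = thresholdPolicy(blockSize);
        List<String> decisions = new ArrayList<>();
        long written = 0;
        int blockIndex = 0;
        while (written < totalBytes) {
            // The block is allocated before its data arrives, so only
            // `written` (never totalBytes) is visible to the policy here.
            decisions.add("block " + blockIndex + " (" + written + " bytes so far) -> "
                    + policy.chooseTarget(srcPath, written));
            written += Math.min(blockSize, totalBytes - written);
            blockIndex++;
        }
        return decisions;
    }

    public static void main(String[] args) {
        // Hypothetical path and sizes, chosen only to exercise the sketch.
        simulate("/user/ab/data.bin", 500L, 128L).forEach(System.out::println);
    }
}
```

In this toy run the first block is always placed with zero bytes known,
so any size-based decision can only affect blocks allocated later in the
stream.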
cheers,
Colin