Posted to common-dev@hadoop.apache.org by momina khan <mo...@gmail.com> on 2010/05/07 12:52:28 UTC

data locality on HDFS

hi,

I am trying to figure out how Hadoop uses data locality to schedule map
tasks on the nodes that locally store the map input. Going through the
code I keep going in circles between a couple of files without really
getting anywhere; that is to say, I can't locate the HDFS API or function
that returns the list of nodes storing the replicas of, say, a given block!

I keep going from FSNamesystem.java to DFSClient.java to
BlocksWithLocations.java to DatanodeDescriptor.java and back again,
without ever reaching the HDFS interface that reports which nodes store
a block's replicas.

Someone please help!
momina

Re: data locality on HDFS

Posted by Eli Collins <el...@cloudera.com>.
Hey Momina,

Here's the call path on 0.20:

DistributedFileSystem#getFileBlockLocations
-> DFSClient#getFileBlockLocations
    -> callGetBlockLocations
       -> ClientProtocol#getBlockLocations
           -> (via proxy) NameNode#getBlockLocations
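
For example, a minimal client-side sketch of where that chain starts from
user code (the file path here is made up, and the call ends up as the
NameNode RPC shown above):

import java.util.Arrays;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.BlockLocation;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class BlockLocationsDemo {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();   // reads core-site.xml / hdfs-site.xml
    FileSystem fs = FileSystem.get(conf);       // a DistributedFileSystem for an hdfs:// default FS
    Path file = new Path("/user/momina/input.txt");   // hypothetical file

    FileStatus status = fs.getFileStatus(file);
    // Walks the chain above and ends as a getBlockLocations RPC to the NameNode.
    BlockLocation[] blocks = fs.getFileBlockLocations(status, 0, status.getLen());

    for (BlockLocation b : blocks) {
      System.out.println("offset=" + b.getOffset()
          + " length=" + b.getLength()
          + " hosts=" + Arrays.toString(b.getHosts()));
    }
    fs.close();
  }
}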

See createNamenode and createRPCNamenode in the DFSClient constructor
for how the RPC proxy is established.
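
Roughly, those boil down to something like the sketch below (simplified:
the real code also passes a UserGroupInformation and a SocketFactory and
wraps the proxy with retry logic, and the exact overloads vary between
versions; the path is made up):

import java.net.InetSocketAddress;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hdfs.protocol.ClientProtocol;
import org.apache.hadoop.hdfs.protocol.LocatedBlocks;
import org.apache.hadoop.hdfs.server.namenode.NameNode;
import org.apache.hadoop.ipc.RPC;

public class NameNodeProxySketch {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    InetSocketAddress nnAddr = NameNode.getAddress(conf);   // from fs.default.name

    // The proxy implements ClientProtocol; each method call becomes an RPC to
    // the NameNode, which answers from the block map kept by FSNamesystem.
    ClientProtocol namenode = (ClientProtocol) RPC.getProxy(
        ClientProtocol.class, ClientProtocol.versionID, nnAddr, conf);

    // The same kind of call DFSClient#callGetBlockLocations issues under the hood.
    LocatedBlocks located =
        namenode.getBlockLocations("/user/momina/input.txt", 0, Long.MAX_VALUE);
    System.out.println("located " + located.locatedBlockCount() + " block(s)");

    RPC.stopProxy(namenode);
  }
}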

Thanks,
Eli

On Fri, May 7, 2010 at 8:16 PM, momina khan <mo...@gmail.com> wrote:
> hi
>
> I am still going in circles. I still can't pinpoint a single function
> call that actually asks HDFS for block locations; it looks as if the
> classes just make circular calls to getBlockLocations(), each
> implementation delegating to the same method in a different class,
> without ever talking to HDFS itself.
>
> Please help!
> momina
>
> On 5/7/10, Amogh Vasekar <am...@yahoo-inc.com> wrote:
>> Hi,
>> The (o.a.h.fs) FileSystem API has getFileBlockLocations(), which is used
>> to determine where a block's replicas live.
>> In the common case, (o.a.h.mapreduce.lib.input) FileInputFormat's
>> getSplits() calls this method, and the resulting host list is passed
>> along with the split info for job scheduling.
>>
>> Hope this is what you were looking for.
>>
>> Amogh
>>
>>
>> On 5/7/10 4:22 PM, "momina khan" <mo...@gmail.com> wrote:
>>
>> hi,
>>
>> I am trying to figure out how Hadoop uses data locality to schedule map
>> tasks on the nodes that locally store the map input. Going through the
>> code I keep going in circles between a couple of files without really
>> getting anywhere; that is to say, I can't locate the HDFS API or function
>> that returns the list of nodes storing the replicas of, say, a given block!
>>
>> I keep going from FSNamesystem.java to DFSClient.java to
>> BlocksWithLocations.java to DatanodeDescriptor.java and back again,
>> without ever reaching the HDFS interface that reports which nodes store
>> a block's replicas.
>>
>> Someone please help!
>> momina
>>
>>
>

Re: data locality on HDFS

Posted by momina khan <mo...@gmail.com>.
hi

I am still going in circles. I still can't pinpoint a single function
call that actually asks HDFS for block locations; it looks as if the
classes just make circular calls to getBlockLocations(), each
implementation delegating to the same method in a different class,
without ever talking to HDFS itself.

Please help!
momina

On 5/7/10, Amogh Vasekar <am...@yahoo-inc.com> wrote:
> Hi,
> The (o.a.h.fs) FileSystem API has getFileBlockLocations(), which is used
> to determine where a block's replicas live.
> In the common case, (o.a.h.mapreduce.lib.input) FileInputFormat's
> getSplits() calls this method, and the resulting host list is passed
> along with the split info for job scheduling.
>
> Hope this is what you were looking for.
>
> Amogh
>
>
> On 5/7/10 4:22 PM, "momina khan" <mo...@gmail.com> wrote:
>
> hi,
>
> I am trying to figure out how Hadoop uses data locality to schedule map
> tasks on the nodes that locally store the map input. Going through the
> code I keep going in circles between a couple of files without really
> getting anywhere; that is to say, I can't locate the HDFS API or function
> that returns the list of nodes storing the replicas of, say, a given block!
>
> I keep going from FSNamesystem.java to DFSClient.java to
> BlocksWithLocations.java to DatanodeDescriptor.java and back again,
> without ever reaching the HDFS interface that reports which nodes store
> a block's replicas.
>
> Someone please help!
> momina
>
>

Re: data locality on HDFS

Posted by Amogh Vasekar <am...@yahoo-inc.com>.
Hi,
The (o.a.h.fs) FileSystem API has getFileBlockLocations(), which is used to determine where a block's replicas live.
In the common case, (o.a.h.mapreduce.lib.input) FileInputFormat's getSplits() calls this method, and the resulting host list is passed along with the split info for job scheduling.
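
For illustration, here is a condensed sketch of that getSplits() logic
(simplified to one split per block, skipping the unsplittable-file
handling; the class and helper names are made up):

import java.util.ArrayList;
import java.util.List;
import org.apache.hadoop.fs.BlockLocation;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapreduce.InputSplit;
import org.apache.hadoop.mapreduce.JobContext;
import org.apache.hadoop.mapreduce.lib.input.FileSplit;

public class SplitSketch {
  // One split per block-sized chunk, carrying the hosts that store that block;
  // the scheduler later matches these hosts against tasktracker locations to
  // run each map close to its data.
  static List<InputSplit> splitsFor(JobContext job, FileStatus file) throws Exception {
    Path path = file.getPath();
    FileSystem fs = path.getFileSystem(job.getConfiguration());
    long length = file.getLen();
    long splitSize = file.getBlockSize();   // simplification: split size == block size

    BlockLocation[] blocks = fs.getFileBlockLocations(file, 0, length);
    List<InputSplit> splits = new ArrayList<InputSplit>();
    for (long offset = 0; offset < length; offset += splitSize) {
      long size = Math.min(splitSize, length - offset);
      int idx = blockIndexFor(blocks, offset);
      splits.add(new FileSplit(path, offset, size, blocks[idx].getHosts()));
    }
    return splits;
  }

  // Index of the block that contains the given file offset.
  private static int blockIndexFor(BlockLocation[] blocks, long offset) {
    for (int i = 0; i < blocks.length; i++) {
      if (blocks[i].getOffset() <= offset
          && offset < blocks[i].getOffset() + blocks[i].getLength()) {
        return i;
      }
    }
    return blocks.length - 1;   // fall back to the last block
  }
}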

Hope this is what you were looking for.

Amogh


On 5/7/10 4:22 PM, "momina khan" <mo...@gmail.com> wrote:

hi,

I am trying to figure out how Hadoop uses data locality to schedule map
tasks on the nodes that locally store the map input. Going through the
code I keep going in circles between a couple of files without really
getting anywhere; that is to say, I can't locate the HDFS API or function
that returns the list of nodes storing the replicas of, say, a given block!

I keep going from FSNamesystem.java to DFSClient.java to
BlocksWithLocations.java to DatanodeDescriptor.java and back again,
without ever reaching the HDFS interface that reports which nodes store
a block's replicas.

Someone please help!
momina