Posted to user@hadoop.apache.org by Arun C Murthy <ac...@hortonworks.com> on 2013/11/01 03:14:29 UTC

Re: question about preserving data locality in MapReduce with Yarn

The code is slightly hard to follow since it's split between the client and the ApplicationMaster.

The client invokes InputFormat.getSplits to compute the splits and their locations, and writes them to a file in HDFS.
The ApplicationMaster then reads that file and creates resource requests based on the locations of each input split (typically 3 replicas, the HDFS default). See TaskAttemptImpl.dataLocalHosts and TaskAttemptImpl.dataLocalRacks, and follow those variables through the code base.
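
For anyone writing their own AM, the request side of that flow can be sketched roughly as below. This is a simplified, dependency-free model and not the MapReduce code itself: LocalityRequestSketch, resourceNames and the host-to-rack map are hypothetical names made up for illustration. In a real AM the hosts would usually come from InputSplit.getLocations() or BlockLocation.getHosts(), and the node and rack lists would be handed to something like AMRMClient.ContainerRequest rather than returned as strings.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Illustrative model only; none of these names are Hadoop classes. */
class LocalityRequestSketch {

    /** YARN's wildcard resource name, meaning "any node in the cluster". */
    static final String ANY = "*";

    /** Rack assumed for hosts missing from the topology map (Hadoop's default rack name). */
    static final String DEFAULT_RACK = "/default-rack";

    /**
     * Derives the resource names to request for one split: every host that
     * holds a replica, then the racks of those hosts, then the wildcard.
     * With no known hosts, only the wildcard remains.
     *
     * @param splitHosts hosts holding replicas of the split's data
     * @param hostToRack hypothetical topology lookup (host name to rack name)
     */
    static List<String> resourceNames(List<String> splitHosts,
                                      Map<String, String> hostToRack) {
        // LinkedHashSet drops duplicates while keeping first-seen order.
        Set<String> hosts = new LinkedHashSet<>(splitHosts);
        Set<String> racks = new LinkedHashSet<>();
        for (String host : hosts) {
            racks.add(hostToRack.getOrDefault(host, DEFAULT_RACK));
        }

        List<String> names = new ArrayList<>(hosts); // node-local candidates
        names.addAll(racks);                         // rack-local fallback
        names.add(ANY);                              // off-rack fallback
        return names;
    }
}
```

For a split whose replicas sit on h1 and h2 (both in /rack1) and h3 (in /rack2), resourceNames returns h1, h2, h3, /rack1, /rack2, *. A split with no known locations yields only "*", which is the case where locality cannot be preserved.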

hth,
Arun

On Oct 28, 2013, at 11:10 PM, ricky l <ri...@gmail.com> wrote:

> Hi Sandy, thank you very much for the information. It is good to know that MapReduce AM considers the block location information. BTW, I am not very familiar with the concept of splits. Is it specific to MR jobs? If possible, code location would be very helpful for reference as I am trying to implement an application master that needs to consider HDFS data-locality. thx.
> 
> r.
> 
> 
> On Mon, Oct 28, 2013 at 10:21 PM, Sandy Ryza <sa...@cloudera.com> wrote:
> Hi Ricky,
> 
> The input splits contain the locations of the blocks they cover. The AM gets the information from the input splits and submits requests for those locations. Each container request spans all the replicas that the block is located on. Are you interested in something more specific?
> 
> -Sandy
> 
> 
> On Mon, Oct 28, 2013 at 7:09 PM, ricky lee <ri...@gmail.com> wrote:
> Well, I thought an application master could ask a namenode where the data reside... isn't that true? If it does not know where the data reside, does a MapReduce application master specify the resource name as "*", which means data locality might not be preserved at all? thx,
> 
> r
> 
> 

--
Arun C. Murthy
Hortonworks Inc.
http://hortonworks.com/


