Posted to common-dev@hadoop.apache.org by Arun C Murthy <ac...@yahoo-inc.com> on 2010/04/01 00:37:02 UTC

Re: Where MR meets block locations

Moving to mapreduce-dev@ (bcc common-dev@).

Responses inline:

On Mar 29, 2010, at 7:02 PM, Mike Cardosa wrote:
>
> 1) When the jobtracker assigns a task to a tasktracker, it determines
> if the task is data-local or rack-local from the splits (which were
> generated during the job init process). Where in the code could I
> "refresh" the split locations in case they have changed or blocks have
> been replicated to additional new datanodes?
>

No easy way to do that. But in practice, I don't think it matters much.
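For context, the locality decision described in the question can be modeled roughly as below. This is a simplified sketch, not the JobTracker's actual code: the class, enum and method names (`LocalityCheck`, `Locality`, `classify`) are hypothetical, and a host-to-rack map is assumed as input. The key point it shows is that the check only consults the hosts recorded for the split at job init, so a replica created later is invisible to it.

```java
import java.util.List;
import java.util.Map;

/** Hypothetical model of how a map task's locality follows from its split hosts. */
public class LocalityCheck {
    enum Locality { DATA_LOCAL, RACK_LOCAL, OFF_RACK }

    /**
     * @param splitHosts hosts recorded for the split when the job was initialized
     * @param rackOf     assumed mapping from hostname to rack path (e.g. "/rack1")
     * @param ttHost     host of the tasktracker asking for work
     */
    static Locality classify(List<String> splitHosts, Map<String, String> rackOf, String ttHost) {
        // Same host as one of the recorded replicas: data-local.
        if (splitHosts.contains(ttHost)) {
            return Locality.DATA_LOCAL;
        }
        // Otherwise, same rack as any recorded replica: rack-local.
        String ttRack = rackOf.get(ttHost);
        for (String host : splitHosts) {
            if (ttRack != null && ttRack.equals(rackOf.get(host))) {
                return Locality.RACK_LOCAL;
            }
        }
        return Locality.OFF_RACK;
    }

    public static void main(String[] args) {
        Map<String, String> racks = Map.of("h1", "/rack1", "h2", "/rack1", "h3", "/rack2");
        List<String> split = List.of("h1"); // locations captured at job init
        System.out.println(classify(split, racks, "h1")); // DATA_LOCAL
        System.out.println(classify(split, racks, "h2")); // RACK_LOCAL
        System.out.println(classify(split, racks, "h3")); // OFF_RACK
    }
}
```

If a new replica of the block later lands on h3, this model still reports OFF_RACK for h3, because `split` was never updated.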

> 2) When a tasktracker is assigned a map task, is it informed if it's a
> data-local or rack-local map task? If so, where in the code does this
> take place, and is it possible to patch the code to have it check to
> see if it has a data-local copy of the block first before going to the
> network to download the block from another datanode?
>

No, the TT doesn't know or care. The DFSClient in the map task has the
smarts to do the I/O from the 'nearest' datanode.
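The idea of reading from the "nearest" datanode can be illustrated with a rack-aware distance in the spirit of Hadoop's two-level topology (0 for the same node, 2 for the same rack, 4 across racks). This is a minimal sketch under that assumption, not the DFSClient's real code path; `NearestReplica`, `Node`, `distance` and `nearest` are hypothetical names.

```java
import java.util.List;

/** Hypothetical model of choosing the "nearest" replica for a block read. */
public class NearestReplica {
    /** A datanode identified by an assumed rack path and hostname. */
    static final class Node {
        final String rack;
        final String host;

        Node(String rack, String host) {
            this.rack = rack;
            this.host = host;
        }
    }

    /** Two-level distance: 0 same node, 2 same rack, 4 different racks. */
    static int distance(Node a, Node b) {
        if (a.rack.equals(b.rack)) {
            return a.host.equals(b.host) ? 0 : 2;
        }
        return 4;
    }

    /** Returns the replica closest to the reader; the first one wins ties. */
    static Node nearest(Node reader, List<Node> replicas) {
        Node best = null;
        for (Node r : replicas) {
            if (best == null || distance(reader, r) < distance(reader, best)) {
                best = r;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        Node reader = new Node("/rack1", "h2"); // node running the map task
        List<Node> replicas = List.of(
                new Node("/rack2", "h3"),
                new Node("/rack1", "h1"),
                new Node("/rack1", "h2"));
        System.out.println(nearest(reader, replicas).host); // prints h2
    }
}
```

Because the choice is made at read time, a replica that happens to live on the task's own node is picked even though the tasktracker was never told the task is data-local.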

Arun