You are viewing a plain text version of this content. The canonical link for it is here.

Posted to user@hadoop.apache.org by Fabio <an...@gmail.com> on 2015/01/11 07:03:23 UTC

How is locality really implemented?

Hi everyone,
I am trying to understand how locality is actually implemented in 
Hadoop, specifically in the Capacity Scheduler.
I see that an application has to specify the "location" for a request, 
that can be a node, a rack, or ANY (*).
For this reason I want to ask if an application that wants a resource 
request to be considered at higher levels of locality, has to submit 
more than just one request for the same resource (e.g.: if I want my 
request to be considered also at rack and off-switch level, I will issue 
a request specifying the node, one specifying the rack, and one with 
just *, all of them with the relax locality set to true).
This seems to me as a necessity since in the Capacity scheduler code the 
rack local requests (and similarly for the off-switch ones) are obtained 
in a way like this:

application.getResourceRequest(priority, *node.getRackName()*)

while if I submitted a single request just for a specific node, even 
allowing relaxed locality, the scheduler would not be able to process it 
at rack level, since it specifically looks for requests made for the 
current rack (and at switch level too).
Is this correct?
If it is, what's the point of the relax locality parameter? I don't see 
a possible situation when I would have any request with relax locality 
set to false (in particular for rack and off-switch levels). Why would 
an application issue a rack-level request with relax locality set to false?

Thanks in advance

Fabio