You are viewing a plain text version of this content. The canonical link for it is here.
Posted to yarn-dev@hadoop.apache.org by Sandy Ryza <sa...@cloudera.com> on 2013/02/05 22:59:18 UTC

reforming the the AM-RM resource requests

I recently filed a JIRA (YARN-371) that has sparked a fair bit of
discussion.  Arun asked me to move the discussion to yarn-dev.  Here's a
summary:

Each AMRM heartbeat consists of a list of resource requests. Currently,
each resource request consists of a container count, a resource vector, and
a location, which may be a node, a rack, or "*". When an application wishes
to request a task run in multiple locations, it must issue a request for
each location. This means that for a node-local task, it must issue three
requests, one at the node-level, one at the rack-level, and one with *
(any). These requests are not linked with each other, so when a container
is allocated for one of them, the RM has no way of knowing which others to
get rid of. When a node-local container is allocated, this is handled by
decrementing the number of requests on that node's rack and in *. But when
the scheduler allocates a task with a node-local request on its rack, the
request on the node is left there.

This can cause delay-scheduling to try to assign a container on a node that
nobody cares about anymore.  It also makes it impossible to support
requests for only a single node (not its rack and * as well), as well as
other weird kinds of needs, such as gang-scheduling, in which tasks need to
be run at the same time, and scheduling for IO bound jobs where an
application wants its tasks to run close to each other, but doesn't care
where.

My proposal is to modify the AMRM protocol to be "task-centric", i.e. to
contain information about which requests to particular locations are linked
to requests to other locations.  This would allow future schedulers the
ability to act on a task-centric view of an application's needs, and
support a richer set of scheduling features.

I understand that this is not the first time this discussion has been had,
and I'm sure this design was considered during YARN's initial design
stages.  However, I believe that the current format may severely limit the
types of applications that can be run on YARN, and as the system starts to
become widely used, it is important to at least take another look at the
protocol before APIs are locked down.

Arun had concerns related to the overhead that would be induced by this
change.  I won't try to represent his views myself, but they are available
on the JIRA.

Thanks for reading, and let me know what you think!
Sandy