Posted to user@hadoop.apache.org by Everett Anderson <ev...@nuna.com.INVALID> on 2017/06/15 15:47:56 UTC

Application Master machine affinity/preference settings?

Hi!

We've been using Hadoop MapReduce and Spark on YARN on AWS Elastic
MapReduce (EMR). EMR has a concept of Core versus Task nodes, where Core
nodes participate in HDFS but Task nodes don't and their number can be more
easily scaled up or down based on load.

Most applications we run are batch and can tolerate machines going away
well, but some of them are ad hoc interactive Spark sessions. Spark seems
to handle executors (workers) going away okay, but if the main Application
Master for that user's session goes away, they lose state.

Is there a mechanism in YARN such that we could prioritize launching
Application Masters on the Core machine pool in a cluster when resources
are available?

I know there are scheduling queues that we could use to segregate or isolate
entire applications -- such as batch versus interactive ones -- but I'm not
sure if there's a way to ensure just the AM of a given application is
prioritized to be on a specific set of machines.

Thanks!

- Everett

Re: Application Master machine affinity/preference settings?

Posted by Naganarasimha Garla <na...@apache.org>.

Hi Everett Anderson,
     I can think of two ways to do this:
1. Create a node label for the CoreMachine pool (as an exclusive or
non-exclusive partition) and submit the AM request with the CoreMachine
label expression. This way, AMs are launched within the CoreMachine pool
itself. See
http://hadoop.apache.org/docs/current/hadoop-yarn/hadoop-yarn-site/NodeLabel.html
2. After YARN-6050, if you know which nodes are CoreMachine nodes, you can
submit the AM with multiple ResourceRequests, each with its ResourceName
pointing to a different node.

Regards,
+ Naga
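
For readers who want to try the first suggestion (node labels), here is a minimal sketch of the commands involved, written as a small Python helper that only builds and prints them. It assumes the label is named CoreMachine as in the reply, uses the `yarn rmadmin` label commands described in the linked NodeLabel page, and uses Spark's `spark.yarn.am.nodeLabelExpression` property to apply the label to the AM only. The host names and application file are hypothetical, and the queue the job runs in would likely also need access to the label in the scheduler configuration.

```python
import shlex

CORE_LABEL = "CoreMachine"  # label name taken from the reply above


def label_admin_commands(core_hosts, label=CORE_LABEL, exclusive=False):
    """Build the yarn rmadmin commands that define the label and attach it to hosts."""
    definition = f"{label}(exclusive={'true' if exclusive else 'false'})"
    mapping = " ".join(f"{host}={label}" for host in core_hosts)
    return [
        ["yarn", "rmadmin", "-addToClusterNodeLabels", definition],
        ["yarn", "rmadmin", "-replaceLabelsOnNode", mapping],
    ]


def spark_submit_command(app, label=CORE_LABEL):
    """Build a spark-submit command that sets a node label expression for the AM only."""
    return [
        "spark-submit",
        "--master", "yarn",
        "--conf", f"spark.yarn.am.nodeLabelExpression={label}",
        app,
    ]


if __name__ == "__main__":
    # Hypothetical host names and application file, for illustration only.
    hosts = ["core-node-1.example.com", "core-node-2.example.com"]
    for cmd in label_admin_commands(hosts) + [spark_submit_command("my_app.py")]:
        print(shlex.join(cmd))
```

Because the label expression is set only for the AM, executors are requested without it, which matches the goal of pinning just the AM to the Core pool. Whether a non-exclusive partition is the better fit depends on how much idle Core capacity should be shared with other containers.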

