Posted to yarn-issues@hadoop.apache.org by "Agam (Jira)" <ji...@apache.org> on 2021/09/13 13:03:00 UTC

[jira] [Created] (YARN-10941) Wrong Yarn node label mapping with AWS EMR machine types

Agam created YARN-10941:
---------------------------

             Summary: Wrong Yarn node label mapping with AWS EMR machine types
                 Key: YARN-10941
                 URL: https://issues.apache.org/jira/browse/YARN-10941
             Project: Hadoop YARN
          Issue Type: Bug
          Components: yarn
    Affects Versions: 2.10.1
            Reporter: Agam


Does anyone have experience with YARN node labels on AWS EMR? If so, please share your thoughts. We want to run all Spark executors on TASK (Spot) machines and all Spark ApplicationMasters/drivers on CORE (On-Demand) machines. Previously we ran both the Spark executors and the Spark drivers on CORE (On-Demand) machines.

To achieve this, we create the "TASK" YARN node label as part of a custom AWS EMR bootstrap action, and in a separate bootstrap action we map that "TASK" label onto any Spot instance when it registers with AWS EMR. Since "CORE" is the default YARN node label expression, we simply map it onto On-Demand instances upon node registration in the bootstrap action.

We use the Spark configuration `"spark.yarn.executor.nodeLabelExpression": "TASK"` to launch Spark executors on TASK nodes.
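For reference, the placement we are aiming for can be expressed in one `spark-submit` invocation. This is a sketch: `spark.yarn.am.nodeLabelExpression` is the AM/driver-side counterpart of the executor setting quoted above (in cluster deploy mode the driver runs inside the ApplicationMaster), and the class/jar names are hypothetical placeholders:

```shell
spark-submit \
  --master yarn \
  --deploy-mode cluster \
  --conf spark.yarn.executor.nodeLabelExpression=TASK \
  --conf spark.yarn.am.nodeLabelExpression=CORE \
  --class com.example.Main \
  app.jar
```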

The problem we face is a transient wrong mapping of YARN node labels to machine types: for a short duration (around 1-2 minutes) the "TASK" label is mapped to On-Demand instances and the "CORE" label is mapped to Spot instances. During this window YARN launches Spark executors on On-Demand instances and Spark drivers on Spot instances.

This wrong label-to-machine mapping persists until the bootstrap actions complete, after which the mapping automatically resolves to its correct state.

The script we run as part of the bootstrap action is shown below. It runs on every new machine to assign a label to that machine, and it is launched as a background process because the `yarn` command only becomes available after all custom bootstrap actions have completed.

```
#!/usr/bin/env bash
set -ex

# Wait until the `yarn` binary is available on PATH.
function waitTillYarnComesUp() {
  IS_YARN_EXIST=$(which yarn | grep -i yarn | wc -l)
  while [ "$IS_YARN_EXIST" != '1' ]; do
    echo "Yarn not exist"
    sleep 15
    IS_YARN_EXIST=$(which yarn | grep -i yarn | wc -l)
  done
  echo "Yarn exist.."
}

# Wait until the TASK label has been added to the cluster.
function waitTillTaskLabelSyncs() {
  LABEL_EXIST=$(yarn cluster --list-node-labels | grep -i TASK | wc -l)
  while [ "$LABEL_EXIST" -eq 0 ]; do
    sleep 15
    LABEL_EXIST=$(yarn cluster --list-node-labels | grep -i TASK | wc -l)
  done
}

# Query EC2 instance metadata and apply the matching label to this node.
function getHostInstanceTypeAndApplyLabel() {
  HOST_IP=$(curl http://169.254.169.254/latest/meta-data/local-hostname)
  echo "host ip is ${HOST_IP}"
  INSTANCE_TYPE=$(curl http://169.254.169.254/latest/meta-data/instance-life-cycle)
  echo "instance type is ${INSTANCE_TYPE}"
  PORT_NUMBER=8041
  spot="spot"
  onDemand="on-demand"

  if [ "$INSTANCE_TYPE" == "$spot" ]; then
    yarn rmadmin -replaceLabelsOnNode "${HOST_IP}:${PORT_NUMBER}=TASK"
  elif [ "$INSTANCE_TYPE" == "$onDemand" ]; then
    yarn rmadmin -replaceLabelsOnNode "${HOST_IP}:${PORT_NUMBER}=CORE"
  fi
}

waitTillYarnComesUp
# holding for resource manager sync
sleep 100
waitTillTaskLabelSyncs
getHostInstanceTypeAndApplyLabel
exit 0
```
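The if/elif branch in the script above can be factored into a small pure helper that maps the metadata `instance-life-cycle` value to a label. This is only a sketch; `label_for_lifecycle` is a hypothetical helper name, and the fallback to CORE for unrecognized values is our assumption (it mirrors CORE being the default label expression):

```shell
# Hypothetical helper: map an EC2 instance-life-cycle value to a YARN node label.
label_for_lifecycle() {
  case "$1" in
    spot)      echo "TASK" ;;   # Spot instances get the TASK label
    on-demand) echo "CORE" ;;   # On-Demand instances get the CORE label
    *)         echo "CORE" ;;   # assumed fallback: CORE is the cluster default
  esac
}
```

The script would then call `yarn rmadmin -replaceLabelsOnNode "${HOST_IP}:${PORT_NUMBER}=$(label_for_lifecycle "$INSTANCE_TYPE")"`, keeping the metadata lookup and the label decision separate.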

 

`yarn rmadmin -addToClusterNodeLabels "TASK(exclusive=false)"`

This command is run on the master instance at cluster-creation time to create the new TASK YARN node label.
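After the label is created, its registration can be confirmed by hand with the same command the bootstrap script polls:

```shell
# Lists all cluster node labels; once the addToClusterNodeLabels call above
# has taken effect, TASK should appear (as a non-exclusive label).
yarn cluster --list-node-labels
```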

Does anyone have a clue how to prevent this wrong mapping of labels?



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
