Posted to dev@myriad.apache.org by Swapnil Daingade <sw...@gmail.com> on 2015/07/17 02:24:35 UTC

Merging MyriadExecutor with NodeManager

Hi All,

Currently with Fine Grained Scheduling (FGS), the workflow for reporting
the status of a YARN container and relinquishing the resources it used
is as follows:

1. The NodeManager reports the status/completion of the container to the
   ResourceManager as part of the container statuses included in the
   NM-to-RM heartbeat.

2. This container status is intercepted by the Myriad Scheduler. The
   Scheduler sends a frameworkMessage to the MyriadExecutor running on the
   NodeManager's node. See NMHeartBeatHandler.handleStatusUpdate here:

https://github.com/mesos/myriad/blob/issue_14/myriad-scheduler/src/main/java/com/ebay/myriad/scheduler/NMHeartBeatHandler.java#L112

3. This frameworkMessage instructs the MyriadExecutor to report the task
   state corresponding to the YARN container status back to Mesos.
   See MyriadExecutor.frameworkMessage here:

https://github.com/mesos/myriad/blob/issue_14/myriad-executor/src/main/java/com/ebay/myriad/executor/MyriadExecutor.java#L252
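The two-hop flow above can be sketched as follows. This is a hypothetical,
self-contained illustration: the SchedulerDriverLike/ExecutorDriverLike
interfaces and the comma-separated payload format are stand-ins for the real
Mesos SchedulerDriver/ExecutorDriver APIs and Myriad's actual message
encoding, which live in the files linked above.

```java
import java.util.function.BiConsumer;

// Hypothetical stand-ins for the Mesos driver APIs; the real calls are
// SchedulerDriver.sendFrameworkMessage(...) and ExecutorDriver.sendStatusUpdate(...).
interface SchedulerDriverLike { void sendFrameworkMessage(String executorId, byte[] data); }
interface ExecutorDriverLike { void sendStatusUpdate(String taskId, String state); }

public class TwoHopStatusFlow {
    public static void main(String[] args) {
        // Executor side: decode the framework message and report the task state
        // to the Mesos master (mirrors the role of MyriadExecutor.frameworkMessage).
        ExecutorDriverLike executorDriver =
                (taskId, state) -> System.out.println(taskId + " -> " + state);
        BiConsumer<String, byte[]> executorOnMessage = (executorId, data) -> {
            String[] parts = new String(data).split(",");  // hypothetical payload format
            executorDriver.sendStatusUpdate(parts[0], parts[1]);
        };

        // Scheduler side: intercept a completed container from the NM heartbeat
        // and relay it via a best-effort framework message (mirrors
        // NMHeartBeatHandler.handleStatusUpdate).
        SchedulerDriverLike schedulerDriver = executorOnMessage::accept;
        schedulerDriver.sendFrameworkMessage("myriad_executor",
                "container_42,TASK_FINISHED".getBytes());
    }
}
```

Note that the container status has to travel NM -> RM -> executor -> Mesos
master, even though the container ran on the executor's own node.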

There are some disadvantages to this approach:

1. In step 2 we use the SchedulerDriver.sendFrameworkMessage() API.
   According to the API documentation, these messages are best-effort:
  /**
    * Sends a message from the framework to one of its executors. These
    * messages are best effort; do not expect a framework message to be
    * retransmitted in any reliable fashion.
    */

2. This requires the Scheduler/RM to be up for YARN containers/Mesos tasks
   to be able to report statuses to the Mesos master. If the Scheduler/RM
   goes down, we will not be able to send task statuses to Mesos until the
   Scheduler/RM is back up. This can lead to resource leaks.

3. There is the additional overhead of sending messages from the Scheduler/RM
   back to the executors for each container on each heartbeat:
   (number of YARN containers per node * number of nodes) additional messages.
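As a concrete illustration of point 3 (the numbers below are hypothetical,
not measurements): with 50 running YARN containers per node across 100
nodes, the scheduler would send 50 * 100 = 5,000 extra framework messages
per heartbeat interval.

```java
public class HeartbeatOverhead {
    // Extra scheduler-to-executor messages per heartbeat interval:
    // one per running YARN container on each node.
    static long extraMessagesPerHeartbeat(long containersPerNode, long nodes) {
        return containersPerNode * nodes;
    }

    public static void main(String[] args) {
        // Hypothetical cluster: 50 containers/node, 100 nodes.
        System.out.println(extraMessagesPerHeartbeat(50, 100)); // prints 5000
    }
}
```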

To avoid the above-mentioned issues, we are proposing merging the
MyriadExecutor and the NodeManager. The MyriadExecutor will run as an NM
auxiliary service (in the same process as the NM). It will be able to
intercept YARN container completion locally and inform the Mesos master
irrespective of whether the scheduler is running. We will no longer have to
use the sendFrameworkMessage method, and there will be less message traffic
from the scheduler to the executors.
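The merged design might look roughly like the sketch below. This is a
hypothetical, self-contained illustration: stopContainer mirrors the hook of
the same name on YARN's org.apache.hadoop.yarn.server.api.AuxiliaryService,
and MesosDriverLike stands in for the Mesos ExecutorDriver; the actual
implementation is in the pull request linked below.

```java
// Hypothetical stand-in for the Mesos ExecutorDriver, for illustration only.
interface MesosDriverLike { void sendStatusUpdate(String taskId, String state); }

// Runs inside the NodeManager process as an auxiliary service, so container
// completion is observed locally and reported straight to the Mesos master,
// with no round trip through the Scheduler/RM.
public class MyriadExecutorAuxServiceSketch {
    private final MesosDriverLike driver;

    public MyriadExecutorAuxServiceSketch(MesosDriverLike driver) {
        this.driver = driver;
    }

    // Mirrors AuxiliaryService.stopContainer(ContainerTerminationContext):
    // invoked by the NM when a container on this node terminates.
    public void stopContainer(String containerId) {
        driver.sendStatusUpdate(containerId, "TASK_FINISHED");
    }
}
```

Because the report happens on the node where the container ran, it no longer
depends on the Scheduler/RM being up, which addresses points 1 and 2 above.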

I have posted my proposed changes as a pull request here:
https://github.com/mesos/myriad/pull/118

Please take a look and let me know your feedback.

Regards
Swapnil

Re: Merging MyriadExecutor with NodeManager

Posted by Swapnil Daingade <sw...@gmail.com>.
Hi Darin,

I had to add the following to yarn-site.xml on the NMs:

  <property>
    <description>A comma separated list of services where service name
    should only contain a-zA-Z0-9_ and can not start with numbers</description>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle, myriad_executor</value>
  </property>
  <property>
    <name>yarn.nodemanager.aux-services.myriad_executor.class</name>
    <value>com.ebay.myriad.executor.MyriadExecutorAuxService</value>
  </property>

I also had to extract the contents of myriad-executor-runnable-0.0.1.jar
into $YARN_HOME/share/hadoop/yarn/lib to get it working.
Please let me know if you run into any issues or have any other feedback.

Regards
Swapnil



Re: Merging MyriadExecutor with NodeManager

Posted by Swapnil Daingade <sw...@gmail.com>.
It should work with 2.5 as before (I tested on a single-node MapR cluster
with hadoop-common 2.5.1).

The only change I made was to copy the Myriad executor jars under
$PROJECT_HOME/myriad-executor/build/libs/
to $YARN_HOME/share/hadoop/yarn/lib,
as the MyriadExecutor now runs as part of the NM.

Actually, let me spin up an NM-only node and make sure all the dependencies
for the MyriadExecutor are satisfied (I tested on a node that had both the
RM and the NM).

Regards
Swapnil


On Fri, Jul 17, 2015 at 5:27 AM, Darin Johnson <db...@gmail.com>
wrote:

> Awesome took a quick look, will test and go over code soon.  Any dependency
> issues between 2.5, 2.6 and 2.7 to be aware of?

Re: Merging MyriadExecutor with NodeManager

Posted by Darin Johnson <db...@gmail.com>.
Awesome, took a quick look; will test and go over the code soon. Any
dependency issues between 2.5, 2.6, and 2.7 to be aware of?

Re: Merging MyriadExecutor with NodeManager

Posted by Adam Bordelon <ad...@mesosphere.io>.
Thanks for the clear explanation, Swapnil. This sounds awesome. A much
cleaner design, and it seems like exactly the kind of thing YARN
AuxServices were created for.
