You are viewing a plain text version of this content. The canonical link for it is here.
Posted to yarn-issues@hadoop.apache.org by "Zhankun Tang (JIRA)" <ji...@apache.org> on 2018/10/18 02:22:00 UTC

[jira] [Comment Edited] (YARN-8851) [Umbrella] A new pluggable device plugin framework to ease vendor plugin development

    [ https://issues.apache.org/jira/browse/YARN-8851?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16654534#comment-16654534 ] 

Zhankun Tang edited comment on YARN-8851 at 10/18/18 2:21 AM:
--------------------------------------------------------------

[~leftnoteasy] Thanks for the review! Very helpful comments!

1. 
{code:java}
 
 // Check version for compatibility
 String pluginVersion = request.getVersion();
 if (!isVersionCompatible(pluginVersion)) {
 LOG.error("Class: " + pluginClassName + " version: " + pluginVersion +
 " is not compatible. Expected: " + DeviceConstants.version);
 }
{code}
What's the use case for this? My understanding is, version match should happen when requests come to NM. And I'm not sure if it is the best idea to limit format of version, maybe we should just treat it as an identifier in addition to name?

{color:#ff0000}Zhankun -->{color} Sorry for the misleading name "pluginVersion". It should be "APIVersion" in fact. The format of it follows semantic versioning which is "Major.Mino.patch". A vendor plugin should report which DevicePlugin API version it is using.

Given a version number MAJOR.MINOR.PATCH, increment the:
MAJOR version when you make incompatible API changes,
MINOR version when you add functionality in a backwards-compatible manner, and
PATCH version when you make backwards-compatible bug fixes.

When NM gets the request from vendor plugin, this "APIVersion" is used to check if the vendor plugin is developed by a compatible version of "org.apache.hadoop.yarn.server.nodemanager.api.deviceplugin". For instance, the NM uses a "1.0.0" but the plugin's APIversion is "0.1.0"(which means this vendor plugin is developed by 0.1.0 APIs), we should reject this register request because the APIs it used maybe deprecated (major version 0 < 1).

And we can add a field of "pluginVersion" for the plugin to indicate its own version. But I guess this not that important to YARN.

2. 

Instead of adding two configs:
{code:java}
 @Private
 public static final String NM_RESOURCE_PLUGINS_ENABLE_PLUGGABLE_DEVICE_FRAMEWORK =
 NM_RESOURCE_PLUGINS + ".pluggable-device-framework.enable";


 @Private
 public static final String NM_RESOURCE_PLUGINS_PLUGGABLE_CLASS =
 NM_RESOURCE_PLUGINS + ".pluggable-class";
{code}
Maybe leaving the ....pluggable-class is sufficient?

{color:#ff0000}Zhankun -->{color} Ah ha, I think leave only this one is ok for now. But I'm not sure if there'll be more configurations related to the device framework. So maybe leave a switch here is more easy for the administrator to open/close the whole?

3.

Set<Device> getAndWatch(), 
I'm not sure what does the "Watch" mean? Should it be just getDevices?

{color:#ff0000}Zhankun–>{color} Good idea.

4. It looks like you try to make DevicePlugin agnostic to Container itself, maybe we should change the name:
preLaunchContainer => allocateDevices 
postCompleteContainer => releaseDevices?

{color:#ff0000}Zhankun–> {color:#333333}Yeah. This name is confusing. How about this? Since we want the vendor plugin {color}{color}developer{color:#333333} to know these two are hooks which will be invoked by NM (more accurate, DevicePluginAdapter).{color}

 

 


was (Author: tangzhankun):
[~leftnoteasy] Thanks for the review! Very helpful comments!

1. 
{code:java}
 
 // Check version for compatibility
 String pluginVersion = request.getVersion();
 if (!isVersionCompatible(pluginVersion)) {
 LOG.error("Class: " + pluginClassName + " version: " + pluginVersion +
 " is not compatible. Expected: " + DeviceConstants.version);
 }
{code}
What's the use case for this? My understanding is, version match should happen when requests come to NM. And I'm not sure if it is the best idea to limit format of version, maybe we should just treat it as an identifier in addition to name?

{color:#FF0000}Zhankun -->{color} Sorry for the misleading name "pluginVersion". It should be "APIVersion" in fact. The format of it follows semantic versioning which is "Major.Mino.patch". A vendor plugin should report which DevicePlugin API version it is using.

Given a version number MAJOR.MINOR.PATCH, increment the:
MAJOR version when you make incompatible API changes,
MINOR version when you add functionality in a backwards-compatible manner, and
PATCH version when you make backwards-compatible bug fixes.

When NM gets the request from vendor plugin, this "APIVersion" is used to check if the vendor plugin is developed by a compatible version of "org.apache.hadoop.yarn.server.nodemanager.api.deviceplugin". For instance, the NM uses a "1.0.0" but the plugin's APIversion is "0.1.0"(which means this vendor plugin is developed by 0.1.0 APIs), we should reject this register request because the APIs it used maybe deprecated (major version 0 < 1).

And we can add a field of "pluginVersion" for the plugin to indicate its own version. But I guess this not that important to YARN.

2. 

Instead of adding two configs:
{code:java}
 @Private
 public static final String NM_RESOURCE_PLUGINS_ENABLE_PLUGGABLE_DEVICE_FRAMEWORK =
 NM_RESOURCE_PLUGINS + ".pluggable-device-framework.enable";


 @Private
 public static final String NM_RESOURCE_PLUGINS_PLUGGABLE_CLASS =
 NM_RESOURCE_PLUGINS + ".pluggable-class";
{code}
Maybe leaving the ....pluggable-class is sufficient?

{color:#FF0000}Zhankun -->{color} Ah ha, I think leave only this one is ok for now. But I'm not sure if there'll be more configurations related to the device framework. So maybe leave a switch here is more easy for the administrator to open/close the whole?

3.

Set<Device> getAndWatch(), 
I'm not sure what does the "Watch" mean? Should it be just getDevices?

{color:#FF0000}Zhankun–>{color} Good idea.

4. It looks like you try to make DevicePlugin agnostic to Container itself, maybe we should change the name:
preLaunchContainer => allocateDevices 
postCompleteContainer => releaseDevices?

{color:#FF0000}Zhankun–> {color:#333333}Yeah. This name is confusing. How about this? Since we want the vendor plugin {color}deveoper{color:#333333} to know these two are hooks which will be invoked by NM (more accurate, DevicePluginAdapter).{color}{color}

 

 

> [Umbrella] A new pluggable device plugin framework to ease vendor plugin development
> ------------------------------------------------------------------------------------
>
>                 Key: YARN-8851
>                 URL: https://issues.apache.org/jira/browse/YARN-8851
>             Project: Hadoop YARN
>          Issue Type: New Feature
>          Components: yarn
>            Reporter: Zhankun Tang
>            Assignee: Zhankun Tang
>            Priority: Major
>         Attachments: YARN-8851-WIP2-trunk.001.patch, YARN-8851-WIP3-trunk.001.patch, YARN-8851-WIP4-trunk.001.patch, [YARN-8851] YARN_New_Device_Plugin_Framework_Design_Proposal-3.pdf, [YARN-8851] YARN_New_Device_Plugin_Framework_Design_Proposal.pdf
>
>
> At present, we support GPU/FPGA device in YARN through a native, coupling way. But it's difficult for a vendor to implement such a device plugin because the developer needs much knowledge of YARN internals. And this brings burden to the community to maintain both YARN core and vendor-specific code.
> Here we propose a new device plugin framework to ease vendor device plugin development and provide a more flexible way to integrate with YARN NM.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: yarn-issues-unsubscribe@hadoop.apache.org
For additional commands, e-mail: yarn-issues-help@hadoop.apache.org