Posted to common-dev@hadoop.apache.org by "Nate Woody (JIRA)" <ji...@apache.org> on 2009/03/09 13:44:50 UTC

[jira] Created: (HADOOP-5441) HOD refactoring to ease integration with scheduler/resource managers other than torque

HOD refactoring to ease integration with scheduler/resource managers other than torque
--------------------------------------------------------------------------------------

                 Key: HADOOP-5441
                 URL: https://issues.apache.org/jira/browse/HADOOP-5441
             Project: Hadoop Core
          Issue Type: Improvement
          Components: contrib/hod
         Environment: All
            Reporter: Nate Woody


Situation: HOD currently uses the pbsdsh command (a distributed shell that uses Torque's TM interface to start remote processes) to start processes on all nodes in the job.  This call is provided as part of a torqueInterface class that is meant to abstract interactions with the Torque resource manager (RM).  However, other RMs typically do not provide this functionality; it is instead performed by a distributed command available on the HPC system, such as mpiexec, ssh, or site-specific scripts.  Because pbsdsh is specific to Torque, writing HOD interfaces to other RMs is somewhat difficult: it forces the implementer to choose the remote start method on a per-RM basis, which is a somewhat faulty way to make that choice.

Proposal: Refactor the torqueInterface and nodePool classes so that the choice of remote start method is available as a configuration option in hodrc.  This involves fairly simple changes to remove the pbsdsh command from the Scheduler class, plus an additional configuration step that starts the appropriate remote start wrapper.  The selection of the nodePool class will be altered to allow dynamic loading of classes, so that new interfaces people choose to write will not require altering HOD code.  Provide remote start classes for pbsdsh, mpiexec, and ssh, as well as custom scripts (sites often provide mpiexec wrappers that ensure proper selection of network interfaces, etc.).  Provide interface classes for SGE and Moab, as well as an updated Torque class.
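
As a rough illustration of the proposal (not code from the patch), the remote start method could be modelled as a small set of interchangeable classes looked up by name.  The class names, the build_commands method, and the registry below are all hypothetical; only the pbsdsh and ssh command names come from the description above.

```python
import shlex


class RemoteStart:
    """Base class: expands one command into the argv lists that launch it."""

    def build_commands(self, command, hosts):
        raise NotImplementedError


class PbsdshStart(RemoteStart):
    def build_commands(self, command, hosts):
        # pbsdsh takes its node list from the Torque job itself, so a
        # single invocation covers every node and `hosts` is unused.
        return [["pbsdsh"] + shlex.split(command)]


class SshStart(RemoteStart):
    def build_commands(self, command, hosts):
        # One ssh invocation per host; ssh hands the command string to
        # the remote shell.
        return [["ssh", host, command] for host in hosts]


# Hypothetical registry keyed by the value of the hodrc option.
REMOTE_START_METHODS = {"pbsdsh": PbsdshStart, "ssh": SshStart}


def get_remote_start(name):
    """Return a remote start object for the configured method name."""
    try:
        return REMOTE_START_METHODS[name]()
    except KeyError:
        raise ValueError("unknown remote start method: %r" % (name,))
```

With this shape, an interface for another RM would only need to report the allocated hosts; how processes are started on them is decided separately by the configured method.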

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Updated: (HADOOP-5441) HOD refactoring to ease integration with scheduler/resource managers other than torque

Posted by "Nate Woody (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-5441?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Nate Woody updated HADOOP-5441:
-------------------------------

    Status: In Progress  (was: Patch Available)

> HOD refactoring to ease integration with scheduler/resource managers other than torque
> --------------------------------------------------------------------------------------
>
>                 Key: HADOOP-5441
>                 URL: https://issues.apache.org/jira/browse/HADOOP-5441
>             Project: Hadoop Core
>          Issue Type: Improvement
>          Components: contrib/hod
>    Affects Versions: 0.19.1
>         Environment: All
>            Reporter: Nate Woody
>             Fix For: 0.19.1
>
>         Attachments: HOD_patch1
>
>   Original Estimate: 72h
>  Remaining Estimate: 72h


[jira] Updated: (HADOOP-5441) HOD refactoring to ease integration with scheduler/resource managers other than torque

Posted by "Nate Woody (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-5441?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Nate Woody updated HADOOP-5441:
-------------------------------

    Attachment:     (was: HOD_patch1)

> HOD refactoring to ease integration with scheduler/resource managers other than torque
> --------------------------------------------------------------------------------------
>
>                 Key: HADOOP-5441
>                 URL: https://issues.apache.org/jira/browse/HADOOP-5441
>             Project: Hadoop Core
>          Issue Type: Improvement
>          Components: contrib/hod
>    Affects Versions: 0.19.1
>         Environment: All
>            Reporter: Nate Woody
>             Fix For: 0.19.1
>
>         Attachments: HOD_patch2
>
>   Original Estimate: 72h
>  Remaining Estimate: 72h


[jira] Updated: (HADOOP-5441) HOD refactoring to ease integration with scheduler/resource managers other than torque

Posted by "Nate Woody (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-5441?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Nate Woody updated HADOOP-5441:
-------------------------------

    Attachment: HOD_patch2

Previous patch was diff taken from wrong location in tree.

> HOD refactoring to ease integration with scheduler/resource managers other than torque
> --------------------------------------------------------------------------------------
>
>                 Key: HADOOP-5441
>                 URL: https://issues.apache.org/jira/browse/HADOOP-5441
>             Project: Hadoop Core
>          Issue Type: Improvement
>          Components: contrib/hod
>    Affects Versions: 0.19.1
>         Environment: All
>            Reporter: Nate Woody
>             Fix For: 0.19.1
>
>         Attachments: HOD_patch1, HOD_patch2
>
>   Original Estimate: 72h
>  Remaining Estimate: 72h


[jira] Updated: (HADOOP-5441) HOD refactoring to ease integration with scheduler/resource managers other than torque

Posted by "Nate Woody (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-5441?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Nate Woody updated HADOOP-5441:
-------------------------------

    Attachment: HOD_patch1

Patch to resolve issue
1) Allow dynamic loading of nodePools (other than torque)
2) Move pbsdsh functionality out of Schedulers/torque into separate remote-start module
3) Expose new config-setting to allow specification of remote-start method and set default to pbsdsh
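
Point 3 can be pictured with a short sketch that reads an INI-style hodrc and falls back to pbsdsh when nothing is set.  This is an assumption-laden example, not the patch itself: the option name "remote-start" and its placement in a [resource_manager] section are hypothetical, and Python's standard configparser stands in for HOD's own configuration code.

```python
import configparser

# Default named in the patch summary above.
DEFAULT_REMOTE_START = "pbsdsh"


def remote_start_method(hodrc_text):
    """Return the configured remote start method, or the default."""
    parser = configparser.ConfigParser()
    parser.read_string(hodrc_text)
    # Section and option names are hypothetical placeholders.  The
    # fallback also applies when the whole section is missing.
    return parser.get("resource_manager", "remote-start",
                      fallback=DEFAULT_REMOTE_START)
```

Existing Torque sites would then keep working with an unchanged hodrc, since an absent option resolves to pbsdsh.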

> HOD refactoring to ease integration with scheduler/resource managers other than torque
> --------------------------------------------------------------------------------------
>
>                 Key: HADOOP-5441
>                 URL: https://issues.apache.org/jira/browse/HADOOP-5441
>             Project: Hadoop Core
>          Issue Type: Improvement
>          Components: contrib/hod
>    Affects Versions: 0.19.1
>         Environment: All
>            Reporter: Nate Woody
>             Fix For: 0.19.1
>
>         Attachments: HOD_patch1
>
>   Original Estimate: 72h
>  Remaining Estimate: 72h


[jira] Updated: (HADOOP-5441) HOD refactoring to ease integration with scheduler/resource managers other than torque

Posted by "Nate Woody (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-5441?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Nate Woody updated HADOOP-5441:
-------------------------------

        Fix Version/s: 0.19.1
    Affects Version/s: 0.19.1
         Release Note: Allows dynamic loading of nodePool objects and moves remote start (pbsdsh) functionality out of Scheduler objects
               Status: Patch Available  (was: Open)

This patch removes the pbsdsh command from Schedulers/torque and moves it into a new module.  The NodePool parent object was given a new method that selects the appropriate remote start object at runtime, based on the configuration.  Common/desc was modified to provide access to the remote-start config-file option and to set pbsdsh as the default.  Common/nodepoolutil was modified to allow dynamic loading of nodePool objects based on the naming scheme used for the TorquePool class.
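
One way to read "dynamic loading based on the naming scheme used for the TorquePool class" is sketched below.  The scheme assumed here (an RM id such as "torque" maps to a module of the same name containing a class named TorquePool) and the default package name are guesses for illustration; the actual patch may differ.

```python
import importlib


def pool_class_name(rm_id):
    """Map a resource-manager id such as 'torque' to 'TorquePool'."""
    return rm_id.capitalize() + "Pool"


def load_node_pool_class(rm_id, package="NodePools"):
    """Import <package>.<rm_id> and return its <Id>Pool class.

    The package name is a hypothetical placeholder.
    """
    module = importlib.import_module("%s.%s" % (package, rm_id))
    name = pool_class_name(rm_id)
    try:
        return getattr(module, name)
    except AttributeError:
        raise ImportError("module %s defines no class %s"
                          % (module.__name__, name))
```

Under this assumption, supporting a new RM means dropping in one module that follows the naming scheme, with no edits to the loader.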

> HOD refactoring to ease integration with scheduler/resource managers other than torque
> --------------------------------------------------------------------------------------
>
>                 Key: HADOOP-5441
>                 URL: https://issues.apache.org/jira/browse/HADOOP-5441
>             Project: Hadoop Core
>          Issue Type: Improvement
>          Components: contrib/hod
>    Affects Versions: 0.19.1
>         Environment: All
>            Reporter: Nate Woody
>             Fix For: 0.19.1
>
>   Original Estimate: 72h
>  Remaining Estimate: 72h


[jira] Commented: (HADOOP-5441) HOD refactoring to ease integration with scheduler/resource managers other than torque

Posted by "Hadoop QA (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-5441?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12682364#action_12682364 ] 

Hadoop QA commented on HADOOP-5441:
-----------------------------------

-1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12402294/HOD_patch1
  against trunk revision 754927.

    +1 @author.  The patch does not contain any @author tags.

    -1 tests included.  The patch doesn't appear to include any new or modified tests.
                        Please justify why no tests are needed for this patch.

    -1 patch.  The patch command could not apply the patch.

Console output: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch-minerva.apache.org/68/console

This message is automatically generated.

> HOD refactoring to ease integration with scheduler/resource managers other than torque
> --------------------------------------------------------------------------------------
>
>                 Key: HADOOP-5441
>                 URL: https://issues.apache.org/jira/browse/HADOOP-5441
>             Project: Hadoop Core
>          Issue Type: Improvement
>          Components: contrib/hod
>    Affects Versions: 0.19.1
>         Environment: All
>            Reporter: Nate Woody
>             Fix For: 0.19.1
>
>         Attachments: HOD_patch1
>
>   Original Estimate: 72h
>  Remaining Estimate: 72h


[jira] Updated: (HADOOP-5441) HOD refactoring to ease integration with scheduler/resource managers other than torque

Posted by "Chris Douglas (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-5441?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Chris Douglas updated HADOOP-5441:
----------------------------------

         Fix Version/s:     (was: 0.19.1)
    Remaining Estimate:     (was: 72h)
     Original Estimate:     (was: 72h)

0.19.1 has been released; this can be committed no earlier than 0.19.2. It is rare for improvements to be backported, but since this is in contrib it's not impossible.

> HOD refactoring to ease integration with scheduler/resource managers other than torque
> --------------------------------------------------------------------------------------
>
>                 Key: HADOOP-5441
>                 URL: https://issues.apache.org/jira/browse/HADOOP-5441
>             Project: Hadoop Core
>          Issue Type: Improvement
>          Components: contrib/hod
>    Affects Versions: 0.19.1
>         Environment: All
>            Reporter: Nate Woody


[jira] Updated: (HADOOP-5441) HOD refactoring to ease integration with scheduler/resource managers other than torque

Posted by "Nate Woody (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-5441?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Nate Woody updated HADOOP-5441:
-------------------------------

    Attachment:     (was: HOD_patch2)

> HOD refactoring to ease integration with scheduler/resource managers other than torque
> --------------------------------------------------------------------------------------
>
>                 Key: HADOOP-5441
>                 URL: https://issues.apache.org/jira/browse/HADOOP-5441
>             Project: Hadoop Core
>          Issue Type: Improvement
>          Components: contrib/hod
>    Affects Versions: 0.19.1
>         Environment: All
>            Reporter: Nate Woody
>             Fix For: 0.19.1
>
>   Original Estimate: 72h
>  Remaining Estimate: 72h


[jira] Updated: (HADOOP-5441) HOD refactoring to ease integration with scheduler/resource managers other than torque

Posted by "Nate Woody (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-5441?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Nate Woody updated HADOOP-5441:
-------------------------------

    Comment: was deleted

(was: This patch removes the pbsdsh command from Schedulers/torque and moves it into a new module.  The NodePool parent object was given a new method that selects the appropriate remote start object at runtime, based on the configuration.  Common/desc was modified to provide access to the remote-start config-file option and to set pbsdsh as the default.  Common/nodepoolutil was modified to allow dynamic loading of nodePool objects based on the naming scheme used for the TorquePool class.)

> HOD refactoring to ease integration with scheduler/resource managers other than torque
> --------------------------------------------------------------------------------------
>
>                 Key: HADOOP-5441
>                 URL: https://issues.apache.org/jira/browse/HADOOP-5441
>             Project: Hadoop Core
>          Issue Type: Improvement
>          Components: contrib/hod
>    Affects Versions: 0.19.1
>         Environment: All
>            Reporter: Nate Woody
>             Fix For: 0.19.1
>
>   Original Estimate: 72h
>  Remaining Estimate: 72h