Posted to mapreduce-issues@hadoop.apache.org by "Ralph Castain (Commented) (JIRA)" <ji...@apache.org> on 2011/11/25 17:33:41 UTC

[jira] [Commented] (MAPREDUCE-2911) Hamster: Hadoop And Mpi on the same cluSTER

    [ https://issues.apache.org/jira/browse/MAPREDUCE-2911?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13157227#comment-13157227 ] 

Ralph Castain commented on MAPREDUCE-2911:
------------------------------------------

Let me preface my comment by confessing my current ignorance of Hadoop. I'm working on rectifying that situation, but won't claim to be anywhere close to fully understanding it.

That said, I'm wondering if it is possible to simply run the MPI processes as standard Hadoop processes? I confess this was my initial thought. Rather than creating a cluster and using mpirun, just have the user start a standard Hadoop job - but with the processes being part of an overall MPI application. Thus, the processes would all call MPI_Init, execute as an MPI application, call MPI_Finalize, and then exit. If a user wants to integrate that application with MapReduce, more power to them - I can see some cases where that would be of interest.

My point here is that you don't need mpirun at all, nor do you need all the overhead of running OMPI daemons. The Hadoop daemons can start and monitor the state of health of the MPI processes just fine. We might add some capability to the Hadoop daemons to assist (e.g., binding), but those would be of use regardless of whether or not the process is part of an MPI application.
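
To make the idea above concrete, here is a minimal sketch of a generic launcher that starts a set of processes, gives each one its rank, and monitors how they exit, with no mpirun involved. It is an illustration only, not Hadoop or Open MPI code: the function launch_and_monitor, the environment variables SKETCH_RANK and SKETCH_SIZE, and the child program are all hypothetical, and the child only marks where MPI_Init and MPI_Finalize would be called rather than calling them.

```python
import os
import subprocess
import sys

# Hypothetical stand-in for one MPI process. A real process would call
# MPI_Init, run the application, call MPI_Finalize, and then exit.
# SKETCH_RANK and SKETCH_SIZE are made-up names for this sketch.
CHILD_SOURCE = r"""
import os, sys
rank = int(os.environ["SKETCH_RANK"])
size = int(os.environ["SKETCH_SIZE"])
# (MPI_Init would be called here)
print(f"rank {rank} of {size} running")
# (MPI_Finalize would be called here)
sys.exit(0)
"""


def launch_and_monitor(num_procs):
    """Start num_procs children and return {rank: (exit_code, output)}."""
    procs = {}
    for rank in range(num_procs):
        # Pass each child its identity through the environment.
        env = dict(os.environ,
                   SKETCH_RANK=str(rank),
                   SKETCH_SIZE=str(num_procs))
        procs[rank] = subprocess.Popen(
            [sys.executable, "-c", CHILD_SOURCE],
            env=env,
            stdout=subprocess.PIPE,
            text=True,
        )

    results = {}
    for rank, proc in procs.items():
        # Wait for the child and record its exit code and output; a real
        # daemon would react to a nonzero code (restart, abort the job).
        out, _ = proc.communicate()
        results[rank] = (proc.returncode, out.strip())
    return results


if __name__ == "__main__":
    for rank, (code, out) in sorted(launch_and_monitor(4).items()):
        print(f"rank {rank}: exit code {code}, output: {out!r}")
```

In this sketch the launcher plays the role the comment assigns to the Hadoop daemons: it starts the processes and watches their health, so whether a process happens to be part of an MPI application makes no difference to the launcher itself.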

As I said, please forgive the ignorance if my suggestion makes no sense.

                
> Hamster: Hadoop And Mpi on the same cluSTER
> -------------------------------------------
>
>                 Key: MAPREDUCE-2911
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2911
>             Project: Hadoop Map/Reduce
>          Issue Type: New Feature
>          Components: mrv2
>    Affects Versions: 0.23.0
>         Environment: All Unix-Environments
>            Reporter: Milind Bhandarkar
>            Assignee: Milind Bhandarkar
>             Fix For: 0.24.0
>
>   Original Estimate: 336h
>  Remaining Estimate: 336h
>
> MPI is commonly used for many machine-learning applications. OpenMPI (http://www.open-mpi.org/) is a popular BSD-licensed implementation of MPI. In the past, running MPI applications on a Hadoop cluster was achieved using Hadoop Streaming (http://videolectures.net/nipsworkshops2010_ye_gbd/), but it was kludgy. After the separation of the resource manager from the JobTracker in Hadoop, we have all the tools needed to make MPI a first-class citizen on a Hadoop cluster. I am currently working on the patch to make MPI an application master. An initial version of this patch will be available soon (hopefully before September 10). This jira will track the development of Hamster: The application master for MPI.
