Posted to dev@airavata.apache.org by Mangirish Wagle <va...@gmail.com> on 2016/09/21 18:36:20 UTC

Running MPI jobs on Mesos based clusters

Hello All,

I would like to share, for everybody's awareness, the study that I am
undertaking this fall: evaluating frameworks that would facilitate MPI
jobs on Mesos based clusters for Apache Airavata.

Some of the options that I am looking at are:-

   1. MPI support framework bundled with Mesos
   2. Apache Aurora
   3. Marathon
   4. Chronos

Some of the evaluation criteria that I am planning to base my investigation
on are:-

   - Ease of setup
   - Documentation
   - Reliability features like HA
   - Scaling and Fault recovery
   - Performance
   - Community Support

Gourav and Shameera are working on ansible based automation to spin up a
mesos based cluster, and I am planning to use it to set up a cluster for
experimentation.

Any suggestions or information about prior work on this would be highly
appreciated.

Thank you.

Best Regards,
Mangirish Wagle

Re: Running MPI jobs on Mesos based clusters

Posted by Mangirish Wagle <va...@gmail.com>.
Hello Devs,

I was able to run a basic single node MPI program on Mesos cluster on EC2
using the OpenMPI library through Aurora.

Link to the Mesos Sandbox of the MPI Job
<http://52.91.23.81:5050/#/agents/8d8ad711-1a0f-410e-840d-6190173c69ca-S0/browse?path=%2Fvar%2Flib%2Fmesos%2Fslaves%2F8d8ad711-1a0f-410e-840d-6190173c69ca-S0%2Fframeworks%2Fa257854f-3f3c-462c-8edb-c7b9dc3c79f5-0000%2Fexecutors%2Fthermos-centos-devel-mpi_test-0-a73d0de0-e232-4b27-8ebd-c42e57a38253%2Fruns%2F393f692c-1876-4c54-90bc-bf58a67cf1ae%2Fsandbox>

Link to the outputs of the job
<http://52.91.23.81:5050/#/agents/8d8ad711-1a0f-410e-840d-6190173c69ca-S0/browse?path=%2Fvar%2Flib%2Fmesos%2Fslaves%2F8d8ad711-1a0f-410e-840d-6190173c69ca-S0%2Fframeworks%2Fa257854f-3f3c-462c-8edb-c7b9dc3c79f5-0000%2Fexecutors%2Fthermos-centos-devel-mpi_test-0-a73d0de0-e232-4b27-8ebd-c42e57a38253%2Fruns%2F393f692c-1876-4c54-90bc-bf58a67cf1ae%2Fsandbox%2F.logs%2Ftest_mpi%2F0>

Following are the steps that I took to run the MPI program.


   1. For running MPI, the Mesos slaves need to be equipped with an MPI
   library. I installed OpenMPI 2.0.1 on the slaves, using a quick
   installation script that I created (*openmpi_install.sh*).
   2. Prerequisite setup on Mesos slave:-
      - I used a sample test MPI C program and compiled it using the mpicc
      compiler provided by OpenMPI. I have attached the code file
      '*mpi_test.c*' that I used with this email; a minimal sketch of such a
      program appears after this list (*Reference:
      https://hpcc.usc.edu/support/documentation/examples-of-mpi-programs
      <https://hpcc.usc.edu/support/documentation/examples-of-mpi-programs>*).
      - The 'mpirun' tool provided by OpenMPI requires a machine host file
      that specifies the list of hosts to run the jobs on. I used a host file
      with just one 'localhost' entry, targeting single node MPI execution
      local to the target slave on which Aurora would run the job.
   3. The next step is to launch an Aurora job calling mpirun on the
   compiled binary of the C program.
   4. I created an Aurora config file '*mpi_test.aurora*' with steps to
   copy the binary and machine host file inside the execution container and
   call mpirun (see the config sketch further below).
   5. The job was then submitted using the aurora command line client:-
      - # aurora job create example/centos/devel/mpi_test mpi_test.aurora
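
For reference, a minimal MPI test program along the lines of the referenced
example would look like the sketch below. This is an illustration matching
the job output linked above, not necessarily the exact attached file:

/* mpi_test.c -- minimal MPI hello-world sketch.
   Compile with: mpicc -o mpitest mpi_test.c */
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv) {
    int rank, namelen;
    char hostname[MPI_MAX_PROCESSOR_NAME];

    MPI_Init(&argc, &argv);                      /* start the MPI runtime */
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);        /* rank of this process */
    MPI_Get_processor_name(hostname, &namelen);  /* host we are running on */
    printf("Hello world!  I am process number: %d on host %s\n",
           rank, hostname);
    MPI_Finalize();
    return 0;
}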

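An Aurora config is a Python-based DSL. A hypothetical sketch of what
'mpi_test.aurora' could look like is below; the process names, file paths,
and resource sizes are assumptions for illustration, not the exact file used:

# mpi_test.aurora -- hedged sketch; paths and sizes are assumed.
copy_files = Process(
    name = 'copy_files',
    # stage the compiled binary and the host file into the sandbox
    cmdline = 'cp /home/centos/mpitest /home/centos/machinefile .')

run_mpi = Process(
    name = 'test_mpi',
    # single-node run: the host file contains only 'localhost'
    cmdline = 'mpirun --hostfile machinefile -np 1 ./mpitest')

mpi_task = Task(
    name = 'test_mpi',
    processes = [copy_files, run_mpi],
    constraints = order(copy_files, run_mpi),  # copy before running
    resources = Resources(cpu = 1, ram = 512*MB, disk = 512*MB))

jobs = [Job(
    cluster = 'example', role = 'centos', environment = 'devel',
    name = 'mpi_test', task = mpi_task, instances = 1)]

The job key 'example/centos/devel/mpi_test' used in step 5 corresponds to
the cluster/role/environment/name fields of this Job.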

*Further improvements:-*

   - We could set up a shared file system between the masters and slaves
   using NFS/ SSHFS to share the MPI executables, avoiding the manual copy
   in the steps above.
   - The slave configuration described above could be automated through
   ansible.


*Further work I would want to focus on w.r.t. gang scheduling:-*
Multiple nodes could be mimicked by launching multiple Aurora processes
using separate containers. But the key issues that need to be addressed
are:-

   1. We need a reliable way for inter-container communication between
   parallel processes.
   2. We need to figure out a reliable technique for the external
   scaffolding required to synchronize the parallel processes.


Any thoughts/ suggestions would be highly appreciated.

Best Regards,
Mangirish


On Thu, Oct 20, 2016 at 11:53 PM, Mangirish Wagle <va...@gmail.com>
wrote:

> Thanks *Gourav* for sharing the information. I may need your help ramping
> up quickly on using Aurora on the mesos cluster and exploring its
> capabilities further.
>
> Hi *Suresh*,
>
> Thanks for bringing that up. I did notice that repository earlier. It is
> maintained by the same developer from the Mesos team with whom I am in
> touch over email. He did not say anything specific about the mesos-slurm
> repo in his earlier emails, but rather recommended looking at the GaSc
> repo. I observed that the code is almost the same as the main slurm repo
> code (https://github.com/SchedMD/slurm). The readme instructions are not
> specific to mesos. Nonetheless, I have dropped Niklas an email asking him
> if there has been any mesos specific customization in this repo. It would
> be interesting to know if/ how he has played around with it over mesos.
> I shall keep the dev list updated with the info that I get from him.
>
> Regards,
> Mangirish
>
> On Thu, Oct 20, 2016 at 11:19 PM, Suresh Marru <sm...@apache.org> wrote:
>
>> Hi Gourav, Mangirish,
>>
>> Did you check out SLURM on Mesos - https://github.com/nqn/slurm-mesos
>>
>> Note that this is GPL licensed code and incompatible with ASL V2. It
>> does not preclude us from using it, but we need to watch out when
>> integrating incompatibly licensed code.
>>
>> Suresh
>>
>> On Oct 20, 2016, at 10:26 PM, Shenoy, Gourav Ganesh <go...@indiana.edu>
>> wrote:
>>
>> Hi Mangirish, devs:
>>
>> The Aurora documentation for “Tasks” & “Processes” provides very good
>> information which I felt would be helpful in implementing gang scheduling,
>> as you mentioned.
>>
>> http://aurora.apache.org/documentation/latest/reference/configuration/
>>
>>
>> From what I understood, there are these constraints:
>> 1.       If targeting single-node (multi-core) MPI, then a “JOB” will be
>> broken down into multiple “PROCESSES”, each of which will run on these
>> multiple cores.
>> 2.       Even if *any one* of these processes fails, then the JOB should
>> be marked as failed.
>>
>> As mentioned in my earlier email, Aurora provides Job abstraction – “a
>> job consists of multiple tasks, which in turn consist of multiple
>> processes”. This abstraction comes in extremely handy if we want to run MPI
>> jobs on a single node.
>>
>> While submitting a job to Aurora, we can control the following parameters
>> for a TASK:
>>
>> a.       “max_failures” for a TASK – the number of failed processes
>> needed to mark a task as failed. Hence if we set max_failures = 1,
>> then even if a single process in a task fails, Aurora will mark that task
>> as failed.
>> *Note*: Since a JOB can have multiple tasks, and a JOB also has a
>> “max_task_failures” parameter, we can set this to 1.
>>
>> b.       “max_concurrency” for a TASK – number of processes to run in
>> parallel. If a node has 16 cores, then we can limit the amount of
>> parallelism to <=16.
>>
>> I did not get much time to experiment with these parameters for job
>> submission, but found this document to be handy and worth sharing. Hope
>> this helps!
>>
>> Thanks and Regards,
>> Gourav Shenoy
>>
>> *From: *Mangirish Wagle <va...@gmail.com>
>> *Reply-To: *"dev@airavata.apache.org" <de...@airavata.apache.org>
>> *Date: *Tuesday, October 18, 2016 at 11:48 AM
>> *To: *"dev@airavata.apache.org" <de...@airavata.apache.org>
>> *Subject: *Re: Running MPI jobs on Mesos based clusters
>>
>> Sure Suresh, will update my findings on the mailing list. Thanks!
>>
>> On Tue, Oct 18, 2016 at 7:59 AM, Suresh Marru <sm...@apache.org> wrote:
>>
>> Hi Mangirish,
>>
>> This is interesting. Looking forward to seeing what you find out
>> further on gang scheduling support. Since the compute nodes are getting
>> bigger, even if you can only explore single node MPI (on Jetstream using
>> 22 cores), that will help.
>>
>> Suresh
>>
>> P.S. Good to see the momentum on mailing list discussions on such topics.
>>
>>
>> On Oct 18, 2016, at 1:54 AM, Mangirish Wagle <va...@gmail.com>
>> wrote:
>>
>>
>> Hello Devs,
>>
>> Here is an update on some new learnings and thoughts based on my
>> interactions with Mesos and Aurora devs.
>>
>> MPI implementations in Mesos repositories (like MPI Hydra) rely on
>> obsolete MPI platforms and are no longer supported by the developer
>> community. Hence it is not recommended that we use them for our purpose.
>>
>> One of the known ways of running MPI jobs over mesos is "gang
>> scheduling", which is basically distributing the MPI run over multiple
>> jobs on mesos in place of multiple nodes. The challenge here is that the
>> jobs need to be scheduled as one task, and any job that errors should
>> collectively error out the main program, including all the distributed
>> jobs.
>>
>> One of the Mesos developers (Niklas Nielsen) pointed me to his work on
>> gang scheduling: https://github.com/nqn. This code may not be fully
>> tested, but it is certainly a good starting point for exploring gang
>> scheduling.
>>
>> One of the Aurora developers (Stephen Erb) suggested using gang
>> scheduling on top of Aurora. The Aurora scheduler assumes that every job
>> is independent. Hence, there would be a need to develop some external
>> scaffolding to coordinate and schedule these jobs, which might not be
>> trivial. One advantage of using Aurora as a backend for gang scheduling
>> is that we would inherit the robustness of Aurora, which otherwise would
>> be a key challenge if targeting bare mesos.
>>
>> As an alternative to all the options above, I think we should probably
>> be able to run a 1 node MPI job through Aurora. A resource offer with
>> CPUs and memory from Mesos is abstracted as a single runtime, but is
>> mapped to multiple nodes underneath, which eventually would exploit
>> distributed resource capabilities.
>>
>> I intend to try out the 1 node MPI job submission approach first and
>> simultaneously explore the gang scheduling approach.
>>
>> Please let me know your thoughts/ suggestions.
>> Best Regards,
>> Mangirish
>>
>>
>>
>> On Thu, Oct 13, 2016 at 12:39 PM, Mangirish Wagle <
>> vaglomangirish@gmail.com> wrote:
>>
>> Hi Marlon,
>>
>> Thanks for confirming and sharing the legal link.
>> -Mangirish
>>
>> On Thu, Oct 13, 2016 at 12:13 PM, Pierce, Marlon <ma...@iu.edu> wrote:
>>
>> BSD is ok: https://www.apache.org/legal/resolved.
>>
>> *From: *Mangirish Wagle <va...@gmail.com>
>> *Reply-To: *"dev@airavata.apache.org" <de...@airavata.apache.org>
>> *Date: *Thursday, October 13, 2016 at 12:03 PM
>> *To: *"dev@airavata.apache.org" <de...@airavata.apache.org>
>> *Subject: *Re: Running MPI jobs on Mesos based clusters
>>
>>
>> Hello Devs,
>>
>> I needed some advice on the license of the MPI libraries. The MPICH
>> library that I have been trying claims to have a "BSD Like" license (
>> http://git.mpich.org/mpich.git/blob/HEAD:/COPYRIGHT).
>>
>> I am aware that OpenMPI, which uses a BSD license, is currently used in
>> our application. I chose to start investigating MPICH because it claims
>> to be a highly portable, high quality implementation of the latest MPI
>> standard, suitable for cloud based clusters.
>>
>> If anyone could please advise on the acceptance of the MPICH library's
>> BSD Like license for ASF, that would help.
>>
>> Thank you.
>> Best Regards,
>> Mangirish Wagle
>>
>> On Thu, Oct 6, 2016 at 1:48 AM, Mangirish Wagle <va...@gmail.com>
>> wrote:
>>
>> Hello Devs,
>>
>> The network issue mentioned above now stands resolved. The problem was
>> that iptables had some conflicting rules which blocked the traffic. It
>> was resolved by a simple iptables flush.
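>>
>> For reference, a flush of this kind amounts to something like:
>>
>> # iptables -F
>>
>> which clears all rules in the filter table on the node (the exact
>> commands used may have differed).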
>>
>> Here is the test MPI program running on multiple machines:-
>>
>> [centos@mesos-slave-1 ~]$ mpiexec -f machinefile -n 2 ./mpitest
>> Hello world!  I am process number: 0 on host mesos-slave-1
>> Hello world!  I am process number: 1 on host mesos-slave-2
>>
>> The next step is to try invoking this through a framework like Marathon.
>> However, the job submission still does not run through Marathon. It seems
>> to get stuck in the 'waiting' state forever (for example
>> http://149.165.170.245:8080/ui/#/apps/%2Fmaw-try). Further, I notice
>> that Marathon is listed under 'inactive frameworks' in the mesos
>> dashboard (http://149.165.171.33:5050/#/frameworks).
>>
>> I am trying to get this working, though any help/ clues with this would
>> be really helpful.
>>
>> Thanks and Regards,
>>
>> Mangirish Wagle
>>
>>
>> On Fri, Sep 30, 2016 at 9:21 PM, Mangirish Wagle <
>> vaglomangirish@gmail.com> wrote:
>>
>> Hello Devs,
>>
>> I am currently running a sample MPI C program using 'mpiexec' provided by
>> MPICH. I followed their installation guide
>> <http://www.mpich.org/static/downloads/3.2/mpich-3.2-installguide.pdf> to
>> install the libraries on the master and slave nodes of the mesos cluster.
>>
>> The approach that I am trying out here is to equip the underlying nodes
>> with MPI handling tools and then use a Mesos framework like Marathon/
>> Aurora to submit jobs that run MPI programs by invoking these tools.
>>
>> You can potentially run an MPI program using mpiexec in the following
>> manner:-
>>
>> # *mpiexec -f machinefile -n 2 ./mpitest*
>>
>>    - *machinefile *-> File which contains an inventory of machines to
>>    run the program on and the number of processes on each machine.
>>    - *mpitest *-> MPI program compiled in C using the mpicc compiler. The
>>    program returns the process number and the hostname of the machine
>>    running the process.
>>    - *-n *option indicates the number of processes to spawn
>>
>> Example of machinefile contents:-
>>
>> # Entries in the format <hostname/IP>:<number of processes>
>> mesos-slave-1:1
>> mesos-slave-2:1
>>
>> The reason for choosing the slaves is that Mesos runs jobs on the
>> slaves, managed by the 'agents' belonging to those slaves.
>>
>> Output of the program with '-n 1':-
>>
>> # mpiexec -f machinefile -n 1 ./mpitest
>> Hello world!  I am process number: 0 on host mesos-slave-1
>>
>> But when I try for '-n 2', I am hitting the following error:-
>>
>> # mpiexec -f machinefile -n 2 ./mpitest
>> [proxy:0:1@mesos-slave-2] HYDU_sock_connect
>> (/home/centos/mpich-3.2/src/pm/hydra/utils/sock/sock.c:172): unable to
>> connect from "mesos-slave-2" to "mesos-slave-1" (No route to host)
>> [proxy:0:1@mesos-slave-2] main (/home/centos/mpich-3.2/src/pm
>> /hydra/pm/pmiserv/pmip.c:189): *unable to connect to server
>> mesos-slave-1 at port 44788* (check for firewalls!)
>>
>> It seems the program execution fails because network traffic is being
>> blocked. I checked the security groups in scigap openstack for the
>> mesos-slave-1 and mesos-slave-2 nodes and they are set to the 'wideopen'
>> policy. Furthermore, I tried adding explicit rules to the policies to
>> allow all TCP and UDP (currently I am not sure what protocol is used
>> underneath), but even then it continues throwing this error.
>>
>> Any clues, suggestions, comments about the error or approach as a whole
>> would be helpful.
>>
>> Thanks and Regards,
>> Mangirish Wagle
>>
>>
>> On Tue, Sep 27, 2016 at 11:23 AM, Mangirish Wagle <
>> vaglomangirish@gmail.com> wrote:
>>
>> Hello Devs,
>>
>> Thanks Gourav and Shameera for all the work w.r.t. setting up the
>> Mesos-Marathon cluster on Jetstream.
>>
>> I am currently evaluating MPICH (http://www.mpich.org/about/overview/)
>> to be used for launching MPI jobs on top of mesos. MPICH version 1.2
>> supports Mesos based MPI scheduling. I have also been trying to submit
>> jobs to the cluster through Marathon. However, in either case I am
>> currently facing issues which I am working to resolve.
>>
>> I am compiling my notes into the following google doc. Please review and
>> let me know your comments and suggestions.
>>
>> https://docs.google.com/document/d/1p_Y4Zd4I4lgt264IHspXJli3la25y6bcPcmrTD6nR8g/edit?usp=sharing
>>
>> Thanks and Regards,
>> Mangirish Wagle
>>
>>
>>
>> On Wed, Sep 21, 2016 at 3:20 PM, Shenoy, Gourav Ganesh <
>> goshenoy@indiana.edu> wrote:
>>
>> Hi Mangirish,
>>
>> I have set up a Mesos-Marathon cluster for you on Jetstream. I will
>> share the cluster details with you in a separate email. Kindly note that
>> there are 3 masters & 2 slaves in this cluster.
>>
>> I am also working on automating this process for Jetstream (similar to
>> Shameera’s ansible script for EC2) and when that is ready, we can create
>> clusters or add/remove slave machines from the cluster.
>>
>> Thanks and Regards,
>> Gourav Shenoy
>>
>> *From: *Mangirish Wagle <va...@gmail.com>
>> *Reply-To: *"dev@airavata.apache.org" <de...@airavata.apache.org>
>> *Date: *Wednesday, September 21, 2016 at 2:36 PM
>> *To: *"dev@airavata.apache.org" <de...@airavata.apache.org>
>> *Subject: *Running MPI jobs on Mesos based clusters
>>
>> Hello All,
>>
>> I would like to share, for everybody's awareness, the study that I am
>> undertaking this fall: evaluating frameworks that would facilitate MPI
>> jobs on Mesos based clusters for Apache Airavata.
>>
>> Some of the options that I am looking at are:-
>>
>>    1. MPI support framework bundled with Mesos
>>    2. Apache Aurora
>>    3. Marathon
>>    4. Chronos
>>
>> Some of the evaluation criteria that I am planning to base my
>> investigation on are:-
>>
>>    - Ease of setup
>>    - Documentation
>>    - Reliability features like HA
>>    - Scaling and Fault recovery
>>    - Performance
>>    - Community Support
>>
>> Gourav and Shameera are working on ansible based automation to spin up a
>> mesos based cluster, and I am planning to use it to set up a cluster for
>> experimentation.
>>
>> Any suggestions or information about prior work on this would be highly
>> appreciated.
>>
>> Thank you.
>>
>> Best Regards,
>> Mangirish Wagle

Re: Running MPI jobs on Mesos based clusters

Posted by Mangirish Wagle <va...@gmail.com>.
Thanks *Gourav* for sharing the information. I may need your help ramping
up quickly on using Aurora on the mesos cluster and exploring its
capabilities further.

Hi *Suresh*,

Thanks for bringing that up. I did notice that repository earlier. It is
maintained by the same developer from the Mesos team with whom I am in
touch over email. He did not say anything specific about the mesos-slurm
repo in his earlier emails, but rather recommended looking at the GaSc
repo. I observed that the code is almost the same as the main slurm repo
code (https://github.com/SchedMD/slurm). The readme instructions are not
specific to mesos. Nonetheless, I have dropped Niklas an email asking him
if there has been any mesos specific customization in this repo. It would
be interesting to know if/ how he has played around with it over mesos.
I shall keep the dev list updated with the info that I get from him.

Regards,
Mangirish


Re: Running MPI jobs on Mesos based clusters

Posted by Suresh Marru <sm...@apache.org>.
Hi Gourav, Mangirish,

Did you check out SLURM on Mesos - https://github.com/nqn/slurm-mesos

Note that this is GPL licensed code and incompatible with ASL V2. It does not preclude us from using it, but we need to watch out when integrating incompatibly licensed code.

Suresh



Re: Running MPI jobs on Mesos based clusters

Posted by "Shenoy, Gourav Ganesh" <go...@indiana.edu>.
Hi Mangirish, devs:

The Aurora documentation for “Tasks” & “Processes” provides very good information which I felt would be helpful in implementing gang scheduling, as you mentioned.

http://aurora.apache.org/documentation/latest/reference/configuration/


From what I understood, there are these constraints:

1.       If targeting single-node (multi-core) MPI, then a “JOB” will be broken down into multiple “PROCESSES”, each of which will run on these multiple cores.

2.       Even if any one of these processes fails, then the JOB should be marked as failed.

As mentioned in my earlier email, Aurora provides Job abstraction – “a job consists of multiple tasks, which in turn consist of multiple processes”. This abstraction comes in extremely handy if we want to run MPI jobs on a single node.

While submitting a job to Aurora, we can control the following parameters for a TASK:


a.       “max_failures” for a TASK – the number of failed processes needed to mark a task as failed. Hence if we set max_failures = 1, then even if a single process in a task fails, Aurora will mark that task as failed.
Note: Since a JOB can have multiple tasks, and a JOB also has a “max_task_failures” parameter, we can set this to 1.


b.       “max_concurrency” for a TASK – number of processes to run in parallel. If a node has 16 cores, then we can limit the amount of parallelism to <=16.
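
As an illustration, these parameters would appear in an .aurora config roughly as follows. This is a sketch under the above assumptions, with hypothetical process and job names and resource sizes, not a tested configuration:

# Hedged sketch; names and resources are assumptions.
workers = [Process(name = 'worker_%d' % i, cmdline = './mpitest')
           for i in range(16)]          # 16 hypothetical worker processes

mpi_task = Task(
    name = 'mpi_task',
    processes = workers,
    max_failures = 1,                   # one failed process fails the task
    max_concurrency = 16,               # cap parallelism at the core count
    resources = Resources(cpu = 16, ram = 8*GB, disk = 4*GB))

jobs = [Job(
    cluster = 'example', role = 'centos', environment = 'devel',
    name = 'mpi_gang_test',             # hypothetical job name
    task = mpi_task,
    max_task_failures = 1)]             # one failed task fails the job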

I did not get much time to experiment with these parameters for job submission, but found this document to be handy and worth sharing. Hope this helps!

Thanks and Regards,
Gourav Shenoy



Re: Running MPI jobs on Mesos based clusters

Posted by Mangirish Wagle <va...@gmail.com>.
Sure Suresh, will update my findings on the mailing list. Thanks!


Re: Running MPI jobs on Mesos based clusters

Posted by Suresh Marru <sm...@apache.org>.
Hi Mangirish,

This is interesting. Looking forward to seeing what you find out about gang scheduling support. Since the compute nodes are getting bigger, even exploring single node MPI (on Jetstream, using 22 cores) would help.

Suresh

P.S. Good to see the momentum on mailing list discussions on such topics. 



Re: Running MPI jobs on Mesos based clusters

Posted by Mangirish Wagle <va...@gmail.com>.
Hello Devs,

Here is an update on some new learnings and thoughts based on my
interactions with Mesos and Aurora devs.

MPI implementations in the Mesos repositories (like MPI Hydra) rely on
obsolete MPI platforms and are no longer supported by the developer
community. Hence it is not recommended that we use them for our purpose.

One of the known ways of running MPI jobs over Mesos is "gang
scheduling", which basically distributes the MPI run over multiple jobs
on Mesos in place of multiple nodes. The challenge here is that the jobs
need to be scheduled as one task, and if any job errors out, the main
program, including all the distributed jobs, should collectively error out.

One of the Mesos developers (Niklas Nielsen) pointed me to his work on
gang scheduling: https://github.com/nqn. This code may not be fully
tested, but it is certainly a good starting point to explore gang scheduling.

One of the Aurora developers (Stephen Erb) suggests implementing gang
scheduling on top of Aurora. The Aurora scheduler assumes that every job
is independent; hence, there would be a need to develop some external
scaffolding to coordinate and schedule these jobs, which might not be
trivial. One advantage of using Aurora as a backend for gang scheduling
is that we would inherit the robustness of Aurora, which otherwise would
be a key challenge if targeting bare Mesos.
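
To make the scaffolding idea concrete, a naive coordinator could look
something like the sketch below. This is untested and purely
illustrative: the job keys, the per-rank rank_N.aurora configs, and the
RUNNING check against the status output are all assumptions, not a
working design.

# gang_launch.sh -- hypothetical gang-scheduling scaffolding (untested)
RANKS=2

# Launch one Aurora job per MPI rank
for i in $(seq 0 $((RANKS-1))); do
  aurora job create example/centos/devel/mpi_rank_$i rank_$i.aurora
done

# Naive watchdog: if any rank's job stops running, kill the whole gang
# (the grep against the status output is schematic)
while true; do
  for i in $(seq 0 $((RANKS-1))); do
    if ! aurora job status example/centos/devel/mpi_rank_$i | grep -q RUNNING; then
      for j in $(seq 0 $((RANKS-1))); do
        aurora job killall example/centos/devel/mpi_rank_$j
      done
      exit 1
    fi
  done
  sleep 10
done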

As an alternative to all the options above, I think we should probably be
able to run a 1 node MPI job through Aurora. A resource offer with CPUs
and memory from Mesos is abstracted as a single runtime, but it is mapped
to multiple nodes underneath, which eventually would exploit distributed
resource capabilities.
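
For illustration, a minimal Aurora config for such a single-node MPI job
might look like the sketch below (resource sizes, file names, and the
mpiexec invocation are hypothetical placeholders, not a tested
configuration):

# mpi_single_node.aurora -- hypothetical sketch, untested
run_mpi = Process(
  name = 'run_mpi',
  # all ranks land on the single node backing the resource offer
  cmdline = 'mpiexec -n 4 ./mpitest'
)

mpi_task = Task(
  name = 'mpi_task',
  processes = [run_mpi],
  resources = Resources(cpu = 4, ram = 4*GB, disk = 1*GB)
)

jobs = [
  Job(
    cluster = 'example',
    role = 'centos',
    environment = 'devel',
    name = 'mpi_single_node',
    task = mpi_task
  )
]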

I intend to try out the 1 node MPI job submission approach first and
simultaneously explore the gang scheduling approach.

Please let me know your thoughts/ suggestions.

Best Regards,
Mangirish




Re: Running MPI jobs on Mesos based clusters

Posted by Mangirish Wagle <va...@gmail.com>.
Hi Marlon,
Thanks for confirming and sharing the legal link.

-Mangirish


Re: Running MPI jobs on Mesos based clusters

Posted by "Pierce, Marlon" <ma...@iu.edu>.
BSD is ok: https://www.apache.org/legal/resolved.

Re: Running MPI jobs on Mesos based clusters

Posted by Mangirish Wagle <va...@gmail.com>.
Hello Devs,

I needed some advice on the license of the MPI libraries. The MPICH
library that I have been trying out claims to have a "BSD Like" license
(http://git.mpich.org/mpich.git/blob/HEAD:/COPYRIGHT).

I am aware that OpenMPI, which uses a BSD license, is currently used in
our application. I had chosen to start investigating MPICH because it
claims to be a highly portable, high quality implementation of the latest
MPI standard, suitable for cloud based clusters.

If anyone could please advise on the acceptability of the MPICH library's
BSD-like license for ASF, that would help.

Thank you.

Best Regards,
Mangirish Wagle


Re: Running MPI jobs on Mesos based clusters

Posted by Mangirish Wagle <va...@gmail.com>.
Hello Devs,

The network issue mentioned above now stands resolved. The problem was
that iptables had some conflicting rules which blocked the traffic; it
was resolved by a simple iptables flush.
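
For reference, the flush amounted to commands along these lines on each
node (a sketch; whether and how the cleared rules are persisted depends
on the distribution):

# Flush all iptables rules (run on each master/slave node)
sudo iptables -F

# Persist the cleared rules on CentOS-style systems (assumed here)
sudo service iptables save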

Here is the test MPI program running on multiple machines:-

[centos@mesos-slave-1 ~]$ mpiexec -f machinefile -n 2 ./mpitest
Hello world!  I am process number: 0 on host mesos-slave-1
Hello world!  I am process number: 1 on host mesos-slave-2

The next step is to try invoking this through a framework like Marathon.
However, the job submission still does not run through Marathon. It seems
to get stuck in the 'waiting' state forever (for example
http://149.165.170.245:8080/ui/#/apps/%2Fmaw-try). Further, I notice that
Marathon is listed under 'inactive frameworks' in the mesos dashboard (
http://149.165.171.33:5050/#/frameworks).
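
For context, the hanging submission is just a trivial test app posted to
Marathon's REST API, roughly as below (the app id matches the 'maw-try'
link above; the command and resource numbers are hypothetical test
values):

# Submit a minimal app definition to the Marathon master
curl -X POST -H "Content-Type: application/json" \
     http://149.165.170.245:8080/v2/apps -d '{
  "id": "/maw-try",
  "cmd": "echo hello from marathon && sleep 60",
  "cpus": 0.1,
  "mem": 32,
  "instances": 1
}'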

I am trying to get this working, though any help or clues with this would
be really helpful.

Thanks and Regards,
Mangirish Wagle





Re: Running MPI jobs on Mesos based clusters

Posted by Mangirish Wagle <va...@gmail.com>.
Hello Devs,

I am currently running a sample MPI C program using 'mpiexec' provided by
MPICH. I followed their installation guide
<http://www.mpich.org/static/downloads/3.2/mpich-3.2-installguide.pdf> to
install the libraries on the master and slave nodes of the Mesos cluster.
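
For reference, the build boils down to roughly the following steps from
the guide (the install prefix here is my choice, not a required path):

$ tar xzf mpich-3.2.tar.gz && cd mpich-3.2
$ ./configure --prefix=/home/centos/mpich-install
$ make && make install
$ export PATH=/home/centos/mpich-install/bin:$PATH

The same steps are repeated on every node that will run MPI processes.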

The approach that I am trying out here is to equip the underlying nodes
with MPI tooling and then use a Mesos framework like Marathon or Aurora
to submit jobs that run MPI programs by invoking these tools.

You can potentially run an MPI program using mpiexec in the following
manner:-

# *mpiexec -f machinefile -n 2 ./mpitest*

   - *machinefile *-> A file containing an inventory of machines to run
   the program on, with the number of processes for each machine.
   - *mpitest *-> An MPI program written in C and compiled with the mpicc
   compiler; it prints the process number and the hostname of the machine
   running the process (a minimal sketch of such a program follows this
   list).
   - *-n *-> The number of processes that mpiexec needs to spawn.
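
A minimal sketch of such a test program (my actual file may differ
slightly in detail):

#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    int rank, len;
    char host[MPI_MAX_PROCESSOR_NAME];

    MPI_Init(&argc, &argv);                /* start the MPI runtime */
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);  /* this process's number */
    MPI_Get_processor_name(host, &len);    /* hostname running this rank */
    printf("Hello world!  I am process number: %d on host %s\n", rank, host);
    MPI_Finalize();                        /* shut the runtime down */
    return 0;
}

It is compiled on a node with the MPI library installed using:

# mpicc -o mpitest mpitest.c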

Example of machinefile contents:-

# Entries in the format <hostname/IP>:<number of processes>
mesos-slave-1:1
mesos-slave-2:1

The reason for choosing the slaves is that Mesos runs jobs on the slave
nodes, managed by the 'agents' pertaining to each slave.

Output of the program with '-n 1':-

# mpiexec -f machinefile -n 1 ./mpitest
Hello world!  I am process number: 0 on host mesos-slave-1

But when I try '-n 2', I am hitting the following error:-

# mpiexec -f machinefile -n 2 ./mpitest
[proxy:0:1@mesos-slave-2] HYDU_sock_connect
(/home/centos/mpich-3.2/src/pm/hydra/utils/sock/sock.c:172): unable to
connect from "mesos-slave-2" to "mesos-slave-1" (No route to host)
[proxy:0:1@mesos-slave-2] main
(/home/centos/mpich-3.2/src/pm/hydra/pm/pmiserv/pmip.c:189): *unable to
connect to server mesos-slave-1 at port 44788* (check for firewalls!)

The execution appears to fail because network traffic between the nodes
is being blocked. I checked the security groups in the SciGaP OpenStack
for the mesos-slave-1 and mesos-slave-2 nodes, and they are set to the
'wideopen' policy. Furthermore, I tried adding explicit rules to the
policies to allow all TCP and UDP traffic (I am not yet sure which
protocol is used underneath), but it continues throwing this error.
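
Since the security groups already look open, the next checks I am
planning are at the host level (a sketch; the port number is whatever the
mpiexec error reports).

On mesos-slave-1, inspect the local firewall rules:

# iptables -L -n

From mesos-slave-2, test whether the port from the error is reachable:

# nc -zv mesos-slave-1 44788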

Any clues, suggestions, comments about the error or approach as a whole
would be helpful.

Thanks and Regards,
Mangirish Wagle



Re: Running MPI jobs on Mesos based clusters

Posted by Mangirish Wagle <va...@gmail.com>.
Hello Devs,

Thanks Gourav and Shameera for all the work w.r.t. setting up the
Mesos-Marathon cluster on Jetstream.

I am currently evaluating MPICH (http://www.mpich.org/about/overview/) to
be used for launching MPI jobs on top of Mesos. MPICH version 1.2 supports
Mesos-based MPI scheduling. I have also been trying to submit jobs to the
cluster through Marathon. However, in both cases I am currently facing
issues which I am working to resolve.

I am compiling my notes in the following Google Doc. Please review it and
let me know your comments and suggestions.

https://docs.google.com/document/d/1p_Y4Zd4I4lgt264IHspXJli3la25y6bcPcmrTD6nR8g/edit?usp=sharing

Thanks and Regards,
Mangirish Wagle




Re: Running MPI jobs on Mesos based clusters

Posted by "Shenoy, Gourav Ganesh" <go...@indiana.edu>.
Hi Mangirish,

I have set up a Mesos-Marathon cluster for you on Jetstream. I will share the cluster details with you in a separate email. Kindly note that there are 3 masters & 2 slaves in this cluster.

I am also working on automating this process for Jetstream (similar to Shameera’s ansible script for EC2) and when that is ready, we can create clusters or add/remove slave machines from the cluster.
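
Purely as an illustration of what that looks like on the inventory side
(host names here are hypothetical, and Shameera's script may organize
things differently), adding or removing slaves would map to editing an
Ansible inventory such as:

[mesos-masters]
mesos-master-[1:3]

[mesos-slaves]
mesos-slave-[1:2]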

Thanks and Regards,
Gourav Shenoy
