Posted to architecture@airavata.apache.org by Borries Demeler <de...@biochem.uthscsa.edu> on 2014/09/02 17:29:11 UTC

[marpierc@iu.edu: Re: Scheduling stratergies for Airavata]

I just got off the phone with Chris Hempel from TACC. He will investigate if they can create some
exception for the UltraScan community account for the duration of the workshop, so that we can 
have a short term solution in place for the workshop.

For the long term solution, after some discussion with Gary, I agree that we had best move forward
by implementing a MS in the context of the thrift version of Airavata. I just ask that we do 
not put this on the back burner, and that we should also involve some of the operators of
the XSEDE resources in the design decision so that we can understand what kinds of tools they
have available to handle such job conditions, and how they would like us to handle mass submissions.
For example, Chris mentioned something about dynamic linking of jobs, which I think means that
only one of the jobs is in the queue at one time, and all will get processed serially, which 
would prevent overloads. Also, he said something about 'backfill' methods that can use our
jobs (which are generally only a few seconds to a couple of minutes long), and can be set up
for different numbers of cores and hardware configurations to suit the queue and hardware
environment.

-b.

----- Forwarded message from Marlon Pierce <ma...@iu.edu> -----

Date: Tue, 02 Sep 2014 10:39:11 -0400
From: Marlon Pierce <ma...@iu.edu>
To: architecture@airavata.apache.org
Subject: Re: Scheduling stratergies for Airavata
User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10.9; rv:24.0) Gecko/20100101 Thunderbird/24.6.0

One internal note: I think we need to include "Launched" state when
determining how many jobs a gateway is currently running.

Marlon

On 9/2/14, 10:29 AM, Miller, Mark wrote:
> We would have the same issue Borries mentioned: the community account user under XSEDE owns all the jobs. Luckily, SDSC makes allowances for known gateways, trusting that we represent many individuals. We are building throttling tools to prevent users from submitting more than x running jobs, and placing reserves against their allocation for running jobs.
> 
> 
> I don't see how to solve the problem of XSEDE or other resources seeing a gateway user as equivalent to a regular user without help from XSEDE policy decisions/infrastructure changes, especially for the case where a code requires a single resource and is submitted by many users at once.
> 
> I think solving that would require resource providers to disambiguate regular and community users.
> 
> Mark
> 
>> On Sep 2, 2014, at 10:11 AM, "Borries Demeler" <de...@biochem.uthscsa.edu> wrote:
>> 
>> Our application involves submission of several hundred quite small (a couple of minutes for most
>> clusters, ~128 cores, give or take) computational jobs, running the same code on multiple datasets.
>> 
>> We are hitting the limit of 50 jobs on TACC resources, with all others failing. The problem is
>> made worse because all users submit under a community account, which treats every submission as
>> part of the same allocation account.
>> 
>> I see a few possibilities:
>> 
>> 1. a separate FIFO queue, making sure none of the resources get overloaded by any community account user
>> 
>> 2. submitting all jobs as a single job, requested for the aggregate walltime of all jobs.
>> A special workscript would spawn jobs underneath the parent submission. Not sure if this
>> is feasible or reasonable.
>> 
>> 3. spreading the jobs around all possible resources
>> 
>> 4. a combination of 1 and 3.
>> 
>> -Borries
>> 
>> 
>> 
>> 
>>> On Tue, Sep 02, 2014 at 07:50:12AM -0400, Suresh Marru wrote:
>>> Hi All,
>>> 
>>> Need some guidance on identifying a scheduling strategy and a pluggable third-party implementation for Airavata scheduling needs. For context, let me describe the use cases for scheduling within Airavata:
>>> 
>>> * If a gateway/user is submitting a series of jobs, Airavata currently does not throttle them and sends them to compute clusters in a FIFO way. Resources enforce per-user job limits within a queue to ensure fair use of the clusters (for example, Stampede allows 50 jobs per user in the normal queue [1]). Airavata will need to implement queues and throttle jobs, respecting the max-jobs-per-queue limits of the underlying resource queue.
>>> 
>>> * The current version of Airavata also does not schedule jobs across available computational resources and expects gateways/users to pick resources during experiment launch. Airavata will need to implement schedulers which are aware of existing loads on the clusters and spread jobs efficiently. The scheduler should be able to get access to heuristics on previous executions and current requirements, which include job size (number of nodes/cores), memory requirements, wall time estimates and so forth.
>>> 
>>> * As Airavata is mapping multiple individual user jobs into one or more community account submissions, it also becomes critical to implement fair-share scheduling among these users to ensure fair use of allocations as well as allowable queue limits.
>>> 
>>> Other use cases?
>>> 
>>> We would greatly appreciate it if folks on this list could shed light on experiences using schedulers implemented in Hadoop, Mesos, Storm or other frameworks outside of their intended use. For instance, the Hadoop (YARN) capacity [2] and fair schedulers [3][4][5] seem to meet the needs of Airavata. Is it a good idea to attempt to reuse these implementations? Are there any other pluggable third-party alternatives?
>>> 
>>> Thanks in advance for your time and insights,
>>> 
>>> Suresh
>>> 
>>> [1] - https://www.tacc.utexas.edu/user-services/user-guides/stampede-user-guide#running
>>> [2] - http://hadoop.apache.org/docs/r2.4.1/hadoop-yarn/hadoop-yarn-site/CapacityScheduler.html
>>> [3] - http://hadoop.apache.org/docs/r2.4.1/hadoop-yarn/hadoop-yarn-site/FairScheduler.html
>>> [4] - https://issues.apache.org/jira/browse/HADOOP-3746
>>> [5] - https://issues.apache.org/jira/browse/YARN-326
>>> 
>>> 

----- End forwarded message -----
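
As a rough illustration of the throttling idea discussed in the forwarded thread, here is a minimal Python sketch. It is not Airavata code. The class name ThrottledQueue, the submit callable, and every state name other than "Launched" are assumptions made for this example. It holds jobs in a FIFO and only releases one when the number of active jobs is below the per-queue limit, counting jobs in the "Launched" state as active, as Marlon suggests.

```python
from collections import deque

# States that count against the per-queue limit. Counting "LAUNCHED" follows
# Marlon's note; the other state names are assumptions for this sketch.
ACTIVE_STATES = {"LAUNCHED", "QUEUED", "RUNNING"}


class ThrottledQueue:
    """Hold jobs back so that at most max_active are on the resource at once."""

    def __init__(self, max_active, submit):
        self.max_active = max_active  # e.g. 50 for the Stampede normal queue
        self.submit = submit          # hypothetical callable that sends a job to the cluster
        self.pending = deque()        # FIFO of job ids not yet submitted
        self.states = {}              # job id -> last known state

    def add(self, job_id):
        """Accept a new job and submit it right away if a slot is free."""
        self.pending.append(job_id)
        self.pump()

    def update(self, job_id, state):
        """Record a state change reported by monitoring, then refill free slots."""
        self.states[job_id] = state
        self.pump()

    def active_count(self):
        """Number of jobs currently counted against the limit."""
        return sum(1 for s in self.states.values() if s in ACTIVE_STATES)

    def pump(self):
        """Submit pending jobs in FIFO order while slots remain."""
        while self.pending and self.active_count() < self.max_active:
            job_id = self.pending.popleft()
            self.states[job_id] = "LAUNCHED"
            self.submit(job_id)
```

With a limit of 2 and three jobs added, the third job would stay pending until monitoring reports that one of the first two has left the active states. A real implementation would also need persistence and failure handling, which this sketch leaves out.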

Re: Scheduling stratergies for Airavata

Posted by Suresh Marru <sm...@apache.org>.
Thank you Mark and Borries, these additional use cases and experiences help this conversation.
 
Looking forward to seeing some third-party library suggestions from folks on the architecture list.

Suresh
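
To make the fair-share use case from this thread a bit more concrete, here is a minimal Python sketch of one possible policy for users who share a community account. It is only an illustration under stated assumptions, not the Hadoop fair scheduler and not Airavata code. The class name FairShareQueue and its methods are hypothetical. The policy is simply: when a slot opens up, hand it to the waiting user who has been given the fewest slots so far, so that one user with hundreds of jobs cannot starve the others.

```python
from collections import defaultdict, deque


class FairShareQueue:
    """Pick the next job from the user who has had the fewest dispatches so far."""

    def __init__(self):
        self.pending = defaultdict(deque)   # user -> FIFO of that user's job ids
        self.dispatched = defaultdict(int)  # user -> number of jobs handed out

    def add(self, user, job_id):
        """Queue a job on behalf of an individual gateway user."""
        self.pending[user].append(job_id)

    def next_job(self):
        """Return (user, job_id) for the next job to submit, or None if idle."""
        waiting = [u for u, jobs in self.pending.items() if jobs]
        if not waiting:
            return None
        # Fewest dispatches wins; ties are broken by user name to stay deterministic.
        user = min(waiting, key=lambda u: (self.dispatched[u], u))
        self.dispatched[user] += 1
        return user, self.pending[user].popleft()
```

A throttling layer could call next_job() each time a slot frees up under the per-queue limit of the resource. Weighting by consumed core-hours instead of job counts would be a natural refinement, but that is beyond this sketch.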
