Posted to dev@aurora.apache.org by Bryan Helmkamp <br...@codeclimate.com> on 2014/02/26 07:11:32 UTC

Suitability of Aurora for one-time tasks

Hello,

I am considering Aurora for a key component of our infrastructure.
Awesome work being done here.

My question is: How suitable is Aurora for running short-lived tasks?

Background: We (Code Climate) do static analysis of tens of thousands
of repositories every day. We run a variety of forms of analysis, with
heterogeneous resource requirements, and thus our interest in Mesos.

Looking at Aurora, a lot of the core features look very helpful to us.
Where I am getting hung up is figuring out how to model short-lived
tasks as tasks/jobs. Long-running resource allocations are not really
an option for us due to the variation in our workloads.

My first thought was to create a Task for each type of analysis we
run, and then start a new Job with the appropriate Task every time we
want to run analysis (regulated by a queue). This doesn't seem to work
though. I can't `aurora create` the same `.aurora` file multiple times
with different Job names (as far as I can tell). Also there is the
problem of how to customize each Job slightly (e.g. a payload).

An obvious alternative is to create a unique Task every time we want
to run work. This would result in tens of thousands of tasks being
created every day, and from what I can tell Aurora does not intend to
be used like that. (Please correct me if I am wrong.)

Basically, I would like to hook my job queue up to Aurora to perform
the actual work. There are a dozen different types of jobs, each with
different performance requirements. Every time a job runs, it has a
unique payload containing the definition of the work to be performed.

Can Aurora be used this way? If so, what is the proper way to model
this with respect to Jobs and Tasks?

Any/all help is appreciated.

Thanks!

-Bryan

-- 
Bryan Helmkamp, Founder, Code Climate
bryan@codeclimate.com / 646-379-1810 / @brynary

Re: Suitability of Aurora for one-time tasks

Posted by Bill Farner <wf...@apache.org>.
>
> The problem
> is that sometimes we end up with not enough workers for certain
> classes of jobs (e.g. High Memory), while part of the cluster sits
> idle.


There's no prior art for this, but the Aurora API is actually designed in a
way that would make it possible to have a 'supervisor' job that tunes the
number of instances in each job by sending RPCs to the scheduler.  You'd be
trailblazing here, but it's another path to consider.
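
As a rough sketch of that idea (the control loop only -- queue_depth,
set_instance_count, and all of the numbers below are placeholders, not Aurora
APIs), a supervisor could poll queue depths and resize the worker jobs
accordingly:

  # supervisor.py -- illustrative only; wire the two stubs up to your queue
  # and to the scheduler RPCs (or the aurora client) yourself.
  import time

  POOLS = {
      # job name -> (queue it drains, work items per instance, max instances)
      'graph_traversals_processor': ('graph_traversals', 10, 50),
      'compilations_processor':     ('compilations',     20, 200),
  }

  def queue_depth(queue_name):
      """Return the current backlog of the named queue (stub)."""
      raise NotImplementedError

  def set_instance_count(job_name, count):
      """Resize the job, e.g. via a scheduler RPC or the aurora client (stub)."""
      raise NotImplementedError

  while True:
      for job, (queue, per_instance, cap) in POOLS.items():
          wanted = min(cap, max(1, queue_depth(queue) // per_instance))
          set_instance_count(job, wanted)
      time.sleep(60)

The thresholds and the 60-second poll are arbitrary; the only point is that
instance counts become a function of queue depth rather than being fixed in
the config.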


-=Bill


On Wed, Feb 26, 2014 at 12:58 PM, Bill Farner <wf...@apache.org> wrote:

> Can you offer some more details on what the workload execution looks like?
>  Are these shell commands?  An application that's provided different
> configuration?
>
> -=Bill
>
>
> On Wed, Feb 26, 2014 at 12:45 PM, Bryan Helmkamp <br...@codeclimate.com>wrote:
>
>> Thanks, Kevin. The idea of always-on workers of varying sizes is
>> effectively what we have right now in our non-Mesos world. The problem
>> is that sometimes we end up with not enough workers for certain
>> classes of jobs (e.g. High Memory), while part of the cluster sits
>> idle.
>>
>> Conceptually, in my mind we would define approximately a dozen Tasks,
>> one for each type of work we need to perform (with different resource
>> requirements), and then run Jobs, each with a Task and a unique
>> payload, but I don't think this model works with Mesos. It seems we'd
>> need to create a unique Task for every Job.
>>
>> -Bryan
>>
>> On Wed, Feb 26, 2014 at 3:35 PM, Kevin Sweeney <ke...@apache.org>
>> wrote:
>> > A job is a group of nearly-identical tasks plus some constraints like rack
>> > diversity. The scheduler considers each task within a job equivalently
>> > schedulable, so you can't vary things like resource footprint. It's
>> > perfectly fine to have several jobs with just a single task, as long as
>> > each has a different job key (which is (role, environment, name)).
>> >
>> > Another approach is to have a bunch of uniform always-on workers (in
>> > different sizes). This can be expressed as a Service like so:
>> >
>> > # workers.aurora
>> > class Profile(Struct):
>> >   queue_name = Required(String)
>> >   resources = Required(Resources)
>> >   instances = Required(Integer)
>> >
>> > HIGH_MEM = Resources(cpu = 8.0, ram = 32 * GB, disk = 64 * GB)
>> > HIGH_CPU = Resources(cpu = 16.0, ram = 4 * GB, disk = 64 * GB)
>> >
>> > work_forever = Process(name = 'work_forever',
>> >   cmdline = '''
>> >     # TODO: Replace this with something that isn't pseudo-bash
>> >     while true; do
>> >       work_item=`take_from_work_queue {{profile.queue_name}}`
>> >       do_work "$work_item"
>> >       tell_work_queue_finished "{{profile.queue_name}}" "$work_item"
>> >     done
>> >   ''')
>> >
>> > task = Task(processes = [work_forever],
>> >   resources = '{{profile.resources}}',  # Note this is static per queue-name.
>> > )
>> >
>> > service = Service(
>> >   task = task,
>> >   cluster = 'west',
>> >   role = 'service-account-name',
>> >   environment = 'prod',
>> >   name = '{{profile.queue_name}}_processor',
>> >   instances = '{{profile.instances}}',  # Scale here.
>> > )
>> >
>> > jobs = [
>> >   service.bind(profile = Profile(
>> >     resources = HIGH_MEM,
>> >     queue_name = 'graph_traversals',
>> >     instances = 50,
>> >   )),
>> >   service.bind(profile = Profile(
>> >     resources = HIGH_CPU,
>> >     queue_name = 'compilations',
>> >     instances = 200,
>> >   )),
>> > ]
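
Assuming the cluster/role/env/name job-key syntax used elsewhere in this
thread, plus a trailing config-file argument (an assumption -- the thread never
shows one), each pool above would then be brought up along the lines of:

  aurora create west/service-account-name/prod/graph_traversals_processor workers.aurora
  aurora create west/service-account-name/prod/compilations_processor workers.aurora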
>> >
>> >
>> > On Wed, Feb 26, 2014 at 11:46 AM, Bryan Helmkamp <bryan@codeclimate.com
>> >wrote:
>> >
>> >> Thanks, Bill.
>> >>
>> >> Am I correct in understanding that it is not possible to parameterize
>> >> individual Jobs, just Tasks? Therefore, since I don't know the job
>> >> definitions up front, I will have parameterized Task templates, and
>> >> generate a new Task every time I need to run a Job?
>> >>
>> >> Is that the recommended route?
>> >>
>> >> Our work is very non-uniform so I don't think work-stealing would be
>> >> efficient for us.
>> >>
>> >> -Bryan
>> >>
>> >> On Wed, Feb 26, 2014 at 12:49 PM, Bill Farner <wf...@apache.org>
>> wrote:
>> >> > Thanks for checking out Aurora!
>> >> >
>> >> > My short answer is that Aurora should handle thousands of short-lived
>> >> > tasks/jobs per day without trouble.  (If you proceed with this approach
>> >> > and encounter performance issues, feel free to file tickets!)  The DSL
>> >> > does have some mechanisms for parameterization.  In your case since you
>> >> > probably don't know all the job definitions upfront, you'll probably want
>> >> > to parameterize with environment variables.  I don't see this described
>> >> > in our docs, but there's a little detail at the option declaration [1].
>> >> >
>> >> > Another approach worth considering is work-stealing, using a single job
>> >> > as your pool of workers.  I would find this easier to manage, but it
>> >> > would only be suitable if your work items are sufficiently-uniform.
>> >> >
>> >> > Feel free to continue the discussion!  We're also pretty active in our
>> >> > IRC channel if you'd prefer that medium.
>> >> >
>> >> >
>> >> > [1]
>> >> > https://github.com/apache/incubator-aurora/blob/master/src/main/python/apache/aurora/client/options.py#L170-L183
>> >> >
>> >> >
>> >> > -=Bill
>> >> >
>> >> >
>> >>
>> >>
>> >>
>> >> --
>> >> Bryan Helmkamp, Founder, Code Climate
>> >> bryan@codeclimate.com / 646-379-1810 / @brynary
>> >>
>>
>>
>>
>> --
>> Bryan Helmkamp, Founder, Code Climate
>> bryan@codeclimate.com / 646-379-1810 / @brynary
>>
>
>

Re: Suitability of Aurora for one-time tasks

Posted by Bill Farner <wf...@apache.org>.
On Wed, Feb 26, 2014 at 7:45 PM, Bryan Helmkamp <br...@codeclimate.com>wrote:

> Got it. Thanks. Do finished Jobs and Tasks get garbage collected
> automatically at some point?


> Otherwise it seems like they will stack up pretty fast. (We might run
> hundreds of thousands of jobs in a day.)
>

Jobs are garbage-collected after a configurable period of inactivity.  This
is tuned on the scheduler with the command-line arg history_prune_threshold;
the default is currently 2 days.
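
If that is too long for hundreds of thousands of one-shot jobs, the retention
can be shortened when the scheduler is launched, e.g. by adding something like
the following to its command line (the exact value syntax is an assumption --
check the scheduler's help output):

  -history_prune_threshold=6hrs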


>
> BTW, Aurora does not seem to like the resources =
> '{{resources[{{resource_profile}}]}}' part. I tried to fix it, but
> keep getting:
>
>     InvalidConfigError: Expected dictionary argument, got
> '{{resources[{{resource_profile}}]}}'
>

Kevin -- does the DSL support nested interpolation?  Either way, maybe you
meant this:

task = Task(processes = [work_on_one_item],
  resources = '{{resources[{{work_item}}]}}')


>
> (For now I'm using a different .aurora file for each resource
> configuration.)
>
> Best,
>
> -Bryan
>
> On Wed, Feb 26, 2014 at 9:04 PM, Kevin Sweeney <ke...@apache.org> wrote:
> > And after a bit of code spelunking the semantics you want already exist
> > (just undocumented). Updated the ticket to update the documentation.
> >
> >
> > On Wed, Feb 26, 2014 at 6:00 PM, Kevin Sweeney <ke...@apache.org>
> wrote:
> >
> >> The example I gave is somewhat syntactically invalid due to coding via
> >> email, but that's more or less what the interface will look like. I also
> >> filed https://issues.apache.org/jira/browse/AURORA-236 for more
> >> first-class support of the semantics I think you want (though currently you
> >> can fake it by setting max_failures to a very high number).
> >>
> >>
> >> On Wed, Feb 26, 2014 at 5:33 PM, Bryan Helmkamp <bryan@codeclimate.com
> >wrote:
> >>
> >>> Thanks, Kevin. That pretty much looks like exactly what I need.
> >>>
> >>> -Bryan
> >>>
> >>> On Wed, Feb 26, 2014 at 8:16 PM, Kevin Sweeney <ke...@apache.org>
> >>> wrote:
> >>> > For a more dynamic approach to resource utilization you can use
> >>> > something like this:
> >>> >
> >>> > # dynamic.aurora
> >>> > # Enqueue each individual work-item with aurora create -E
> >>> > #   work_item=$work_item -E resource_profile=graph_traversals
> >>> > #   west/service-account-name/prod/process_$work_item
> >>> > class Profile(Struct):
> >>> >   queue_name = Required(String)
> >>> >   resources = Required(Resources)
> >>> >
> >>> > HIGH_MEM = Resources(cpu = 8.0, ram = 32 * GB, disk = 64 * GB)
> >>> > HIGH_CPU = Resources(cpu = 16.0, ram = 4 * GB, disk = 64 * GB)
> >>> >
> >>> > work_on_one_item = Process(name = 'work_on_one_item',
> >>> >   cmdline = '''
> >>> >     do_work "{{work_item}}"
> >>> >   ''',
> >>> > )
> >>> >
> >>> > task = Task(processes = [work_on_one_item],
> >>> >   resources = '{{resources[{{resource_profile}}]}}')
> >>> >
> >>> > job = Job(
> >>> >   task = task,
> >>> >   cluster = 'west',
> >>> >   role = 'service-account-name',
> >>> >   environment = 'prod',
> >>> >   name = 'process_{{work_item}}',
> >>> > )
> >>> >
> >>> > resources = {
> >>> >   'graph_traversals': HIGH_MEM,
> >>> >   'compilations': HIGH_CPU,
> >>> > }
> >>> >
> >>> > jobs = [job.bind(resources = resources)]
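
Spelled out for one concrete work item, the enqueue command sketched in the
comment at the top of that config would look roughly like this (the work-item
ID and the trailing dynamic.aurora argument are assumptions):

  aurora create -E work_item=repo_1234 -E resource_profile=compilations \
      west/service-account-name/prod/process_repo_1234 dynamic.aurora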
> >>> >
> >>> >
> >>> >
> >>> > On Wed, Feb 26, 2014 at 1:08 PM, Bryan Helmkamp <
> bryan@codeclimate.com
> >>> >wrote:
> >>> >
> >>> >> Sure. Yes, they are shell commands and yes they are provided different
> >>> >> configuration on each run.
> >>> >>
> >>> >> In effect we have a number of different job types that are queued up,
> >>> >> and we need to run as quickly as possible. Each job type has different
> >>> >> resource requirements. Every time we run the job, we provide different
> >>> >> arguments (the "payload"). For example:
> >>> >>
> >>> >> $ ./do_something.sh SOME_ID (Requires 1 CPU and 1GB RAM)
> >>> >> $ ./do_something_else.sh SOME_OTHER_ID (Requires 4 CPU and 4GB RAM)
> >>> >> [... there are about 12 of these ...]
> >>> >>
> >>> >> -Bryan
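
Plugging those two scripts into the dynamic.aurora pattern above would amount
to one resource profile per script; a sketch, with CPU/RAM taken from the
requirements quoted here and the disk figures and profile names made up for
illustration:

  DO_SOMETHING      = Resources(cpu = 1.0, ram = 1 * GB, disk = 8 * GB)
  DO_SOMETHING_ELSE = Resources(cpu = 4.0, ram = 4 * GB, disk = 8 * GB)

  resources = {
    'do_something':      DO_SOMETHING,       # ./do_something.sh {{work_item}}
    'do_something_else': DO_SOMETHING_ELSE,  # ./do_something_else.sh {{work_item}}
  }

with the Process cmdline invoking the matching script on "{{work_item}}" and
resource_profile set to the appropriate key at enqueue time.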

Re: Suitability of Aurora for one-time tasks

Posted by Bryan Helmkamp <br...@codeclimate.com>.
Got it. Thanks. Do finished Jobs and Tasks get garbage collected
automatically at some point?

Otherwise it seems like they will stack up pretty fast. (We might run
hundreds of thousands of jobs in a day.)

BTW, Aurora does not seem to like the resources =
'{{resources[{{resource_profile}}]}}' part. I tried to fix it, but
keep getting:

    InvalidConfigError: Expected dictionary argument, got
'{{resources[{{resource_profile}}]}}'

(For now I'm using a different .aurora file for each resource configuration.)

Best,

-Bryan


-- 
Bryan Helmkamp, Founder, Code Climate
bryan@codeclimate.com / 646-379-1810 / @brynary

Re: Suitability of Aurora for one-time tasks

Posted by Kevin Sweeney <ke...@apache.org>.
And after a bit of code spelunking the semantics you want already exist
(just undocumented). Updated the ticket to update the documentation.



Re: Suitability of Aurora for one-time tasks

Posted by Kevin Sweeney <ke...@apache.org>.
The example I gave is somewhat syntactically invalid due to coding via
email, but that's more or less what the interface will look like. I also
filed https://issues.apache.org/jira/browse/AURORA-236 for more first-class
support of the semantics I think you want (though currently you can fake it
by setting max_failures to a very high number).
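
Concretely, the "very high number" would go on the work process (or the Task)
in the dynamic.aurora sketch above; a hedged example, using the schema fields
as I understand them:

  work_on_one_item = Process(
    name = 'work_on_one_item',
    cmdline = 'do_work "{{work_item}}"',
    max_failures = 100000,  # effectively: keep retrying until the item succeeds
  )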


> >> >> >> bryan@codeclimate.com / 646-379-1810 / @brynary
> >> >> >>
> >> >>
> >> >>
> >> >>
> >> >> --
> >> >> Bryan Helmkamp, Founder, Code Climate
> >> >> bryan@codeclimate.com / 646-379-1810 / @brynary
> >> >>
> >>
> >>
> >>
> >> --
> >> Bryan Helmkamp, Founder, Code Climate
> >> bryan@codeclimate.com / 646-379-1810 / @brynary
> >>
>
>
>
> --
> Bryan Helmkamp, Founder, Code Climate
> bryan@codeclimate.com / 646-379-1810 / @brynary
>

Re: Suitibility of Aurora for one-time tasks

Posted by Bryan Helmkamp <br...@codeclimate.com>.
Thanks, Kevin. That pretty much looks like exactly what I need.

-Bryan

On Wed, Feb 26, 2014 at 8:16 PM, Kevin Sweeney <ke...@apache.org> wrote:
> For a more dynamic approach to resource utilization you can use something
> like this:
>
> # dynamic.aurora
> *# Enqueue each individual work-item with aurora create -E
> work_item=$work_item -E resource_profile=graph_traversals
> west/service-account-name/prod/process_$work_item*
> class Profile(Struct):
>   queue_name = Required(String)
>   resources = Required(Resources)
>
> HIGH_MEM = Resources(cpu = 8.0, ram = 32 * GB, disk = 64 * GB)
> HIGH_CPU = Resources(cpu = 16.0, ram = 4 * GB, disk = 64 * GB)
>
> work_on_one_item = Process(name = 'work_on_one_item',
>   cmdline = '''
>     do_work "{{work_item}}"
>   ''',
> )
>
> task = Task(processes = [work_on_one_item],
>   resources = '{{resources[{{resource_profile}}]}}')
>
> job = Job(
>   task = task,
>   cluster = 'west',
>   role = 'service-account-name',
>   environment = 'prod',
>   name = 'process_{{work_item}}',
> )
>
> resources = {
>   'graph_traversals': HIGH_MEM,
>   'compilations': HIGH_CPU,
> }
>
> jobs = [job.bind(resources = resources)]
>
>
>
> On Wed, Feb 26, 2014 at 1:08 PM, Bryan Helmkamp <br...@codeclimate.com>wrote:
>
>> Sure. Yes, they are shell commands and yes they are provided different
>> configuration on each run.
>>
>> In effect we have a number of different job types that are queued up,
>> and we need to run as quickly as possible. Each job type has different
>> resource requirements. Every time we run the job, we provide different
>> arguments (the "payload"). For example:
>>
>> $ ./do_something.sh SOME_ID (Requires 1 CPU and 1GB RAM)
>> $ ./do_something_else.sh SOME_OTHER_ID (Requires 4 CPU and 4GB RAM)
>> [... there are about 12 of these ...]
>>
>> -Bryan
>>
>> On Wed, Feb 26, 2014 at 3:58 PM, Bill Farner <wf...@apache.org> wrote:
>> > Can you offer some more details on what the workload execution looks
>> like?
>> >  Are these shell commands?  An application that's provided different
>> > configuration?
>> >
>> > -=Bill
>> >
>> >
>> > On Wed, Feb 26, 2014 at 12:45 PM, Bryan Helmkamp <bryan@codeclimate.com
>> >wrote:
>> >
>> >> Thanks, Kevin. The idea of always-on workers of varying sizes is
>> >> effectively what we have right now in our non-Mesos world. The problem
>> >> is that sometimes we end up with not enough workers for certain
>> >> classes of jobs (e.g. High Memory), while part of the cluster sits
>> >> idle.
>> >>
>> >> Conceptually, in my mind we would define approximately a dozen Tasks,
>> >> one for each type of work we need to perform (with different resource
>> >> requirements), and then run Jobs, each with a Task and a unique
>> >> payload, but I don't think this model works with Mesos. It seems we'd
>> >> need to create a unique Task for every Job.
>> >>
>> >> -Bryan
>> >>
>> >> On Wed, Feb 26, 2014 at 3:35 PM, Kevin Sweeney <ke...@apache.org>
>> wrote:
>> >> > A job is a group of nearly-identical tasks plus some constraints like
>> >> rack
>> >> > diversity. The scheduler considers each task within a job equivalently
>> >> > schedulable, so you can't vary things like resource footprint. It's
>> >> > perfectly fine to have several jobs with just a single task, as long
>> as
>> >> > each has a different job key (which is (role, environment, name)).
>> >> >
>> >> > Another approach is to have a bunch of uniform always-on workers (in
>> >> > different sizes). This can be expressed as a Service like so:
>> >> >
>> >> > # workers.aurora
>> >> > class Profile(Struct):
>> >> >   queue_name = Required(String)
>> >> >   resources = Required(Resources)
>> >> >   instances = Required(Integer)
>> >> >
>> >> > HIGH_MEM = Resources(cpu = 8.0, ram = 32 * GB, disk = 64 * GB)
>> >> > HIGH_CPU = Resources(cpu = 16.0, ram = 4 * GB, disk = 64 * GB)
>> >> >
>> >> > work_forever = Process(name = 'work_forever',
>> >> >   cmdline = '''
>> >> >     # TODO: Replace this with something that isn't pseudo-bash
>> >> >     while true; do
>> >> >       work_item=`take_from_work_queue {{profile.queue_name}}`
>> >> >       do_work "$work_item"
>> >> >       tell_work_queue_finished "{{profile.queue_name}}" "$work_item"
>> >> >     done
>> >> >   ''')
>> >> >
>> >> > task = Task(processes = [work_forever],
>> >> > *  resources = '{{profile.resources}}, # Note this is static per
>> >> > queue-name.*
>> >> > )
>> >> >
>> >> > service = Service(
>> >> >   task = task,
>> >> >   cluster = 'west',
>> >> >   role = 'service-account-name',
>> >> >   environment = 'prod',
>> >> >   name = '{{profile.queue_name}}_processor'
>> >> >   *instances = '{{profile.instances}}', # Scale here.*
>> >> > )
>> >> >
>> >> > jobs = [
>> >> >   service.bind(profile = Profile(
>> >> >     resources = HIGH_MEM,
>> >> >     queue_name = 'graph_traversals',
>> >> >     instances = 50,
>> >> >   )),
>> >> >   service.bind(profile = Profile(
>> >> >     resources = HIGH_CPU,
>> >> >     queue_name = 'compilations',
>> >> >     instances = 200,
>> >> >   )),
>> >> > ]
>> >> >
>> >> >
>> >> > On Wed, Feb 26, 2014 at 11:46 AM, Bryan Helmkamp <
>> bryan@codeclimate.com
>> >> >wrote:
>> >> >
>> >> >> Thanks, Bill.
>> >> >>
>> >> >> Am I correct in understanding that is not possible to parameterize
>> >> >> individual Jobs, just Tasks? Therefore, since I don't know the job
>> >> >> definitions up front, I will have parameterized Task templates, and
>> >> >> generate a new Task every time I need to run a Job?
>> >> >>
>> >> >> Is that the recommended route?
>> >> >>
>> >> >> Our work is very non-uniform so I don't think work-stealing would be
>> >> >> efficient for us.
>> >> >>
>> >> >> -Bryan
>> >> >>
>> >> >> On Wed, Feb 26, 2014 at 12:49 PM, Bill Farner <wf...@apache.org>
>> >> wrote:
>> >> >> > Thanks for checking out Aurora!
>> >> >> >
>> >> >> > My short answer is that Aurora should handle thousands of
>> short-lived
>> >> >> > tasks/jobs per day without trouble.  (If you proceed with this
>> >> approach
>> >> >> and
>> >> >> > encounter performance issues, feel free to file tickets!)  The DSL
>> >> does
>> >> >> > have some mechanisms for parameterization.  In your case since you
>> >> >> probably
>> >> >> > don't know all the job definitions upfront, you'll probably want to
>> >> >> > parameterize with environment variables.  I don't see this
>> described
>> >> in
>> >> >> our
>> >> >> > docs, but you there's a little detail at the option declaration
>> [1].
>> >> >> >
>> >> >> > Another approach worth considering is work-stealing, using a single
>> >> job
>> >> >> as
>> >> >> > your pool of workers.  I would find this easier to manage, but it
>> >> would
>> >> >> > only be suitable if your work items are sufficiently-uniform.
>> >> >> >
>> >> >> > Feel free to continue the discussion!  We're also pretty active in
>> our
>> >> >> IRC
>> >> >> > channel if you'd prefer that medium.
>> >> >> >
>> >> >> >
>> >> >> > [1]
>> >> >> >
>> >> >>
>> >>
>> https://github.com/apache/incubator-aurora/blob/master/src/main/python/apache/aurora/client/options.py#L170-L183
>> >> >> >
>> >> >> >
>> >> >> > -=Bill
>> >> >> >
>> >> >> >
>> >> >> > On Tue, Feb 25, 2014 at 10:11 PM, Bryan Helmkamp <
>> >> bryan@codeclimate.com
>> >> >> >wrote:
>> >> >> >
>> >> >> >> Hello,
>> >> >> >>
>> >> >> >> I am considering Aurora for a key component of our infrastructure.
>> >> >> >> Awesome work being done here.
>> >> >> >>
>> >> >> >> My question is: How suitable is Aurora for running short-lived
>> tasks?
>> >> >> >>
>> >> >> >> Background: We (Code Climate) do static analysis of tens of
>> thousands
>> >> >> >> of repositories every day. We run a variety of forms of analysis,
>> >> with
>> >> >> >> heterogeneous resource requirements, and thus our interest in
>> Mesos.
>> >> >> >>
>> >> >> >> Looking at Aurora, a lot of the core features look very helpful to
>> >> us.
>> >> >> >> Where I am getting hung up is figuring out how to model
>> short-lived
>> >> >> >> tasks as tasks/jobs. Long-running resource allocations are not
>> really
>> >> >> >> an option for us due to the variation in our workloads.
>> >> >> >>
>> >> >> >> My first thought was to create a Task for each type of analysis we
>> >> >> >> run, and then start a new Job with the appropriate Task every
>> time we
>> >> >> >> want to run analysis (regulated by a queue). This doesn't seem to
>> >> work
>> >> >> >> though. I can't `aurora create` the same `.aurora` file multiple
>> >> times
>> >> >> >> with different Job names (as far as I can tell). Also there is the
>> >> >> >> problem of how to customize each Job slightly (e.g. a payload).
>> >> >> >>
>> >> >> >> An obvious alternative is to create a unique Task every time we
>> want
>> >> >> >> to run work. This would result in tens of thousands of tasks being
>> >> >> >> created every day, and from what I can tell Aurora does not
>> intend to
>> >> >> >> be used like that. (Please correct me if I am wrong.)
>> >> >> >>
>> >> >> >> Basically, I would like to hook my job queue up to Aurora to
>> perform
>> >> >> >> the actual work. There are a dozen different types of jobs, each
>> with
>> >> >> >> different performance requirements. Every time a job runs, it has
>> a
>> >> >> >> unique payload containing the definition of the work it should be
>> >> >> >> performed.
>> >> >> >>
>> >> >> >> Can Aurora be used this way? If so, what is the proper way to
>> model
>> >> >> >> this with respect to Jobs and Tasks?
>> >> >> >>
>> >> >> >> Any/all help is appreciated.
>> >> >> >>
>> >> >> >> Thanks!
>> >> >> >>
>> >> >> >> -Bryan
>> >> >> >>
>> >> >> >> --
>> >> >> >> Bryan Helmkamp, Founder, Code Climate
>> >> >> >> bryan@codeclimate.com / 646-379-1810 / @brynary
>> >> >> >>
>> >> >>
>> >> >>
>> >> >>
>> >> >> --
>> >> >> Bryan Helmkamp, Founder, Code Climate
>> >> >> bryan@codeclimate.com / 646-379-1810 / @brynary
>> >> >>
>> >>
>> >>
>> >>
>> >> --
>> >> Bryan Helmkamp, Founder, Code Climate
>> >> bryan@codeclimate.com / 646-379-1810 / @brynary
>> >>
>>
>>
>>
>> --
>> Bryan Helmkamp, Founder, Code Climate
>> bryan@codeclimate.com / 646-379-1810 / @brynary
>>



-- 
Bryan Helmkamp, Founder, Code Climate
bryan@codeclimate.com / 646-379-1810 / @brynary

Re: Suitibility of Aurora for one-time tasks

Posted by Kevin Sweeney <ke...@apache.org>.
For a more dynamic approach to resource utilization you can use something
like this:

# dynamic.aurora
# Enqueue each individual work item with:
#   aurora create -E work_item=$work_item -E resource_profile=graph_traversals \
#     west/service-account-name/prod/process_$work_item
class Profile(Struct):
  queue_name = Required(String)
  resources = Required(Resources)

HIGH_MEM = Resources(cpu = 8.0, ram = 32 * GB, disk = 64 * GB)
HIGH_CPU = Resources(cpu = 16.0, ram = 4 * GB, disk = 64 * GB)

work_on_one_item = Process(name = 'work_on_one_item',
  cmdline = '''
    do_work "{{work_item}}"
  ''',
)

task = Task(processes = [work_on_one_item],
  resources = '{{resources[{{resource_profile}}]}}')

job = Job(
  task = task,
  cluster = 'west',
  role = 'service-account-name',
  environment = 'prod',
  name = 'process_{{work_item}}',
)

resources = {
  'graph_traversals': HIGH_MEM,
  'compilations': HIGH_CPU,
}

jobs = [job.bind(resources = resources)]
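
A rough submission wrapper on top of this could look something like the
following (an untested sketch: take_from_work_queue is a stand-in for
whatever queue client you actually use, and the usual job-key-plus-config
form of `aurora create` is assumed):

#!/usr/bin/env bash
# Drain one queue and submit a single-shot Aurora job per work item.
set -u

QUEUE="graph_traversals"   # also selects the resource profile above

# take_from_work_queue should print one work-item id and exit non-zero
# when the queue is empty (hypothetical helper).
while work_item="$(take_from_work_queue "$QUEUE")"; do
  aurora create \
    -E work_item="$work_item" \
    -E resource_profile="$QUEUE" \
    "west/service-account-name/prod/process_${work_item}" \
    dynamic.aurora
done

Since the job key embeds the work-item id, every submission gets its own
job and repeated runs don't collide with each other.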



On Wed, Feb 26, 2014 at 1:08 PM, Bryan Helmkamp <br...@codeclimate.com>wrote:

> Sure. Yes, they are shell commands and yes they are provided different
> configuration on each run.
>
> In effect we have a number of different job types that are queued up,
> and we need to run as quickly as possible. Each job type has different
> resource requirements. Every time we run the job, we provide different
> arguments (the "payload"). For example:
>
> $ ./do_something.sh SOME_ID (Requires 1 CPU and 1GB RAM)
> $ ./do_something_else.sh SOME_OTHER_ID (Requires 4 CPU and 4GB RAM)
> [... there are about 12 of these ...]
>
> -Bryan
>
> On Wed, Feb 26, 2014 at 3:58 PM, Bill Farner <wf...@apache.org> wrote:
> > Can you offer some more details on what the workload execution looks
> like?
> >  Are these shell commands?  An application that's provided different
> > configuration?
> >
> > -=Bill
> >
> >
> > On Wed, Feb 26, 2014 at 12:45 PM, Bryan Helmkamp <bryan@codeclimate.com
> >wrote:
> >
> >> Thanks, Kevin. The idea of always-on workers of varying sizes is
> >> effectively what we have right now in our non-Mesos world. The problem
> >> is that sometimes we end up with not enough workers for certain
> >> classes of jobs (e.g. High Memory), while part of the cluster sits
> >> idle.
> >>
> >> Conceptually, in my mind we would define approximately a dozen Tasks,
> >> one for each type of work we need to perform (with different resource
> >> requirements), and then run Jobs, each with a Task and a unique
> >> payload, but I don't think this model works with Mesos. It seems we'd
> >> need to create a unique Task for every Job.
> >>
> >> -Bryan
> >>
> >> On Wed, Feb 26, 2014 at 3:35 PM, Kevin Sweeney <ke...@apache.org>
> wrote:
> >> > A job is a group of nearly-identical tasks plus some constraints like
> >> rack
> >> > diversity. The scheduler considers each task within a job equivalently
> >> > schedulable, so you can't vary things like resource footprint. It's
> >> > perfectly fine to have several jobs with just a single task, as long
> as
> >> > each has a different job key (which is (role, environment, name)).
> >> >
> >> > Another approach is to have a bunch of uniform always-on workers (in
> >> > different sizes). This can be expressed as a Service like so:
> >> >
> >> > # workers.aurora
> >> > class Profile(Struct):
> >> >   queue_name = Required(String)
> >> >   resources = Required(Resources)
> >> >   instances = Required(Integer)
> >> >
> >> > HIGH_MEM = Resources(cpu = 8.0, ram = 32 * GB, disk = 64 * GB)
> >> > HIGH_CPU = Resources(cpu = 16.0, ram = 4 * GB, disk = 64 * GB)
> >> >
> >> > work_forever = Process(name = 'work_forever',
> >> >   cmdline = '''
> >> >     # TODO: Replace this with something that isn't pseudo-bash
> >> >     while true; do
> >> >       work_item=`take_from_work_queue {{profile.queue_name}}`
> >> >       do_work "$work_item"
> >> >       tell_work_queue_finished "{{profile.queue_name}}" "$work_item"
> >> >     done
> >> >   ''')
> >> >
> >> > task = Task(processes = [work_forever],
> >> > *  resources = '{{profile.resources}}, # Note this is static per
> >> > queue-name.*
> >> > )
> >> >
> >> > service = Service(
> >> >   task = task,
> >> >   cluster = 'west',
> >> >   role = 'service-account-name',
> >> >   environment = 'prod',
> >> >   name = '{{profile.queue_name}}_processor'
> >> >   *instances = '{{profile.instances}}', # Scale here.*
> >> > )
> >> >
> >> > jobs = [
> >> >   service.bind(profile = Profile(
> >> >     resources = HIGH_MEM,
> >> >     queue_name = 'graph_traversals',
> >> >     instances = 50,
> >> >   )),
> >> >   service.bind(profile = Profile(
> >> >     resources = HIGH_CPU,
> >> >     queue_name = 'compilations',
> >> >     instances = 200,
> >> >   )),
> >> > ]
> >> >
> >> >
> >> > On Wed, Feb 26, 2014 at 11:46 AM, Bryan Helmkamp <
> bryan@codeclimate.com
> >> >wrote:
> >> >
> >> >> Thanks, Bill.
> >> >>
> >> >> Am I correct in understanding that is not possible to parameterize
> >> >> individual Jobs, just Tasks? Therefore, since I don't know the job
> >> >> definitions up front, I will have parameterized Task templates, and
> >> >> generate a new Task every time I need to run a Job?
> >> >>
> >> >> Is that the recommended route?
> >> >>
> >> >> Our work is very non-uniform so I don't think work-stealing would be
> >> >> efficient for us.
> >> >>
> >> >> -Bryan
> >> >>
> >> >> On Wed, Feb 26, 2014 at 12:49 PM, Bill Farner <wf...@apache.org>
> >> wrote:
> >> >> > Thanks for checking out Aurora!
> >> >> >
> >> >> > My short answer is that Aurora should handle thousands of
> short-lived
> >> >> > tasks/jobs per day without trouble.  (If you proceed with this
> >> approach
> >> >> and
> >> >> > encounter performance issues, feel free to file tickets!)  The DSL
> >> does
> >> >> > have some mechanisms for parameterization.  In your case since you
> >> >> probably
> >> >> > don't know all the job definitions upfront, you'll probably want to
> >> >> > parameterize with environment variables.  I don't see this
> described
> >> in
> >> >> our
> >> >> > docs, but you there's a little detail at the option declaration
> [1].
> >> >> >
> >> >> > Another approach worth considering is work-stealing, using a single
> >> job
> >> >> as
> >> >> > your pool of workers.  I would find this easier to manage, but it
> >> would
> >> >> > only be suitable if your work items are sufficiently-uniform.
> >> >> >
> >> >> > Feel free to continue the discussion!  We're also pretty active in
> our
> >> >> IRC
> >> >> > channel if you'd prefer that medium.
> >> >> >
> >> >> >
> >> >> > [1]
> >> >> >
> >> >>
> >>
> https://github.com/apache/incubator-aurora/blob/master/src/main/python/apache/aurora/client/options.py#L170-L183
> >> >> >
> >> >> >
> >> >> > -=Bill
> >> >> >
> >> >> >
> >> >> > On Tue, Feb 25, 2014 at 10:11 PM, Bryan Helmkamp <
> >> bryan@codeclimate.com
> >> >> >wrote:
> >> >> >
> >> >> >> Hello,
> >> >> >>
> >> >> >> I am considering Aurora for a key component of our infrastructure.
> >> >> >> Awesome work being done here.
> >> >> >>
> >> >> >> My question is: How suitable is Aurora for running short-lived
> tasks?
> >> >> >>
> >> >> >> Background: We (Code Climate) do static analysis of tens of
> thousands
> >> >> >> of repositories every day. We run a variety of forms of analysis,
> >> with
> >> >> >> heterogeneous resource requirements, and thus our interest in
> Mesos.
> >> >> >>
> >> >> >> Looking at Aurora, a lot of the core features look very helpful to
> >> us.
> >> >> >> Where I am getting hung up is figuring out how to model
> short-lived
> >> >> >> tasks as tasks/jobs. Long-running resource allocations are not
> really
> >> >> >> an option for us due to the variation in our workloads.
> >> >> >>
> >> >> >> My first thought was to create a Task for each type of analysis we
> >> >> >> run, and then start a new Job with the appropriate Task every
> time we
> >> >> >> want to run analysis (regulated by a queue). This doesn't seem to
> >> work
> >> >> >> though. I can't `aurora create` the same `.aurora` file multiple
> >> times
> >> >> >> with different Job names (as far as I can tell). Also there is the
> >> >> >> problem of how to customize each Job slightly (e.g. a payload).
> >> >> >>
> >> >> >> An obvious alternative is to create a unique Task every time we
> want
> >> >> >> to run work. This would result in tens of thousands of tasks being
> >> >> >> created every day, and from what I can tell Aurora does not
> intend to
> >> >> >> be used like that. (Please correct me if I am wrong.)
> >> >> >>
> >> >> >> Basically, I would like to hook my job queue up to Aurora to
> perform
> >> >> >> the actual work. There are a dozen different types of jobs, each
> with
> >> >> >> different performance requirements. Every time a job runs, it has
> a
> >> >> >> unique payload containing the definition of the work it should be
> >> >> >> performed.
> >> >> >>
> >> >> >> Can Aurora be used this way? If so, what is the proper way to
> model
> >> >> >> this with respect to Jobs and Tasks?
> >> >> >>
> >> >> >> Any/all help is appreciated.
> >> >> >>
> >> >> >> Thanks!
> >> >> >>
> >> >> >> -Bryan
> >> >> >>
> >> >> >> --
> >> >> >> Bryan Helmkamp, Founder, Code Climate
> >> >> >> bryan@codeclimate.com / 646-379-1810 / @brynary
> >> >> >>
> >> >>
> >> >>
> >> >>
> >> >> --
> >> >> Bryan Helmkamp, Founder, Code Climate
> >> >> bryan@codeclimate.com / 646-379-1810 / @brynary
> >> >>
> >>
> >>
> >>
> >> --
> >> Bryan Helmkamp, Founder, Code Climate
> >> bryan@codeclimate.com / 646-379-1810 / @brynary
> >>
>
>
>
> --
> Bryan Helmkamp, Founder, Code Climate
> bryan@codeclimate.com / 646-379-1810 / @brynary
>

Re: Suitibility of Aurora for one-time tasks

Posted by Bryan Helmkamp <br...@codeclimate.com>.
Sure. Yes, they are shell commands and yes they are provided different
configuration on each run.

In effect, we have a number of different job types that are queued up,
and we need to run them as quickly as possible. Each job type has different
resource requirements. Every time we run a job, we provide different
arguments (the "payload"). For example:

$ ./do_something.sh SOME_ID (Requires 1 CPU and 1GB RAM)
$ ./do_something_else.sh SOME_OTHER_ID (Requires 4 CPU and 4GB RAM)
[... there are about 12 of these ...]

-Bryan

On Wed, Feb 26, 2014 at 3:58 PM, Bill Farner <wf...@apache.org> wrote:
> Can you offer some more details on what the workload execution looks like?
>  Are these shell commands?  An application that's provided different
> configuration?
>
> -=Bill
>
>
> On Wed, Feb 26, 2014 at 12:45 PM, Bryan Helmkamp <br...@codeclimate.com>wrote:
>
>> Thanks, Kevin. The idea of always-on workers of varying sizes is
>> effectively what we have right now in our non-Mesos world. The problem
>> is that sometimes we end up with not enough workers for certain
>> classes of jobs (e.g. High Memory), while part of the cluster sits
>> idle.
>>
>> Conceptually, in my mind we would define approximately a dozen Tasks,
>> one for each type of work we need to perform (with different resource
>> requirements), and then run Jobs, each with a Task and a unique
>> payload, but I don't think this model works with Mesos. It seems we'd
>> need to create a unique Task for every Job.
>>
>> -Bryan
>>
>> On Wed, Feb 26, 2014 at 3:35 PM, Kevin Sweeney <ke...@apache.org> wrote:
>> > A job is a group of nearly-identical tasks plus some constraints like
>> rack
>> > diversity. The scheduler considers each task within a job equivalently
>> > schedulable, so you can't vary things like resource footprint. It's
>> > perfectly fine to have several jobs with just a single task, as long as
>> > each has a different job key (which is (role, environment, name)).
>> >
>> > Another approach is to have a bunch of uniform always-on workers (in
>> > different sizes). This can be expressed as a Service like so:
>> >
>> > # workers.aurora
>> > class Profile(Struct):
>> >   queue_name = Required(String)
>> >   resources = Required(Resources)
>> >   instances = Required(Integer)
>> >
>> > HIGH_MEM = Resources(cpu = 8.0, ram = 32 * GB, disk = 64 * GB)
>> > HIGH_CPU = Resources(cpu = 16.0, ram = 4 * GB, disk = 64 * GB)
>> >
>> > work_forever = Process(name = 'work_forever',
>> >   cmdline = '''
>> >     # TODO: Replace this with something that isn't pseudo-bash
>> >     while true; do
>> >       work_item=`take_from_work_queue {{profile.queue_name}}`
>> >       do_work "$work_item"
>> >       tell_work_queue_finished "{{profile.queue_name}}" "$work_item"
>> >     done
>> >   ''')
>> >
>> > task = Task(processes = [work_forever],
>> > *  resources = '{{profile.resources}}, # Note this is static per
>> > queue-name.*
>> > )
>> >
>> > service = Service(
>> >   task = task,
>> >   cluster = 'west',
>> >   role = 'service-account-name',
>> >   environment = 'prod',
>> >   name = '{{profile.queue_name}}_processor'
>> >   *instances = '{{profile.instances}}', # Scale here.*
>> > )
>> >
>> > jobs = [
>> >   service.bind(profile = Profile(
>> >     resources = HIGH_MEM,
>> >     queue_name = 'graph_traversals',
>> >     instances = 50,
>> >   )),
>> >   service.bind(profile = Profile(
>> >     resources = HIGH_CPU,
>> >     queue_name = 'compilations',
>> >     instances = 200,
>> >   )),
>> > ]
>> >
>> >
>> > On Wed, Feb 26, 2014 at 11:46 AM, Bryan Helmkamp <bryan@codeclimate.com
>> >wrote:
>> >
>> >> Thanks, Bill.
>> >>
>> >> Am I correct in understanding that is not possible to parameterize
>> >> individual Jobs, just Tasks? Therefore, since I don't know the job
>> >> definitions up front, I will have parameterized Task templates, and
>> >> generate a new Task every time I need to run a Job?
>> >>
>> >> Is that the recommended route?
>> >>
>> >> Our work is very non-uniform so I don't think work-stealing would be
>> >> efficient for us.
>> >>
>> >> -Bryan
>> >>
>> >> On Wed, Feb 26, 2014 at 12:49 PM, Bill Farner <wf...@apache.org>
>> wrote:
>> >> > Thanks for checking out Aurora!
>> >> >
>> >> > My short answer is that Aurora should handle thousands of short-lived
>> >> > tasks/jobs per day without trouble.  (If you proceed with this
>> approach
>> >> and
>> >> > encounter performance issues, feel free to file tickets!)  The DSL
>> does
>> >> > have some mechanisms for parameterization.  In your case since you
>> >> probably
>> >> > don't know all the job definitions upfront, you'll probably want to
>> >> > parameterize with environment variables.  I don't see this described
>> in
>> >> our
>> >> > docs, but you there's a little detail at the option declaration [1].
>> >> >
>> >> > Another approach worth considering is work-stealing, using a single
>> job
>> >> as
>> >> > your pool of workers.  I would find this easier to manage, but it
>> would
>> >> > only be suitable if your work items are sufficiently-uniform.
>> >> >
>> >> > Feel free to continue the discussion!  We're also pretty active in our
>> >> IRC
>> >> > channel if you'd prefer that medium.
>> >> >
>> >> >
>> >> > [1]
>> >> >
>> >>
>> https://github.com/apache/incubator-aurora/blob/master/src/main/python/apache/aurora/client/options.py#L170-L183
>> >> >
>> >> >
>> >> > -=Bill
>> >> >
>> >> >
>> >> > On Tue, Feb 25, 2014 at 10:11 PM, Bryan Helmkamp <
>> bryan@codeclimate.com
>> >> >wrote:
>> >> >
>> >> >> Hello,
>> >> >>
>> >> >> I am considering Aurora for a key component of our infrastructure.
>> >> >> Awesome work being done here.
>> >> >>
>> >> >> My question is: How suitable is Aurora for running short-lived tasks?
>> >> >>
>> >> >> Background: We (Code Climate) do static analysis of tens of thousands
>> >> >> of repositories every day. We run a variety of forms of analysis,
>> with
>> >> >> heterogeneous resource requirements, and thus our interest in Mesos.
>> >> >>
>> >> >> Looking at Aurora, a lot of the core features look very helpful to
>> us.
>> >> >> Where I am getting hung up is figuring out how to model short-lived
>> >> >> tasks as tasks/jobs. Long-running resource allocations are not really
>> >> >> an option for us due to the variation in our workloads.
>> >> >>
>> >> >> My first thought was to create a Task for each type of analysis we
>> >> >> run, and then start a new Job with the appropriate Task every time we
>> >> >> want to run analysis (regulated by a queue). This doesn't seem to
>> work
>> >> >> though. I can't `aurora create` the same `.aurora` file multiple
>> times
>> >> >> with different Job names (as far as I can tell). Also there is the
>> >> >> problem of how to customize each Job slightly (e.g. a payload).
>> >> >>
>> >> >> An obvious alternative is to create a unique Task every time we want
>> >> >> to run work. This would result in tens of thousands of tasks being
>> >> >> created every day, and from what I can tell Aurora does not intend to
>> >> >> be used like that. (Please correct me if I am wrong.)
>> >> >>
>> >> >> Basically, I would like to hook my job queue up to Aurora to perform
>> >> >> the actual work. There are a dozen different types of jobs, each with
>> >> >> different performance requirements. Every time a job runs, it has a
>> >> >> unique payload containing the definition of the work it should be
>> >> >> performed.
>> >> >>
>> >> >> Can Aurora be used this way? If so, what is the proper way to model
>> >> >> this with respect to Jobs and Tasks?
>> >> >>
>> >> >> Any/all help is appreciated.
>> >> >>
>> >> >> Thanks!
>> >> >>
>> >> >> -Bryan
>> >> >>
>> >> >> --
>> >> >> Bryan Helmkamp, Founder, Code Climate
>> >> >> bryan@codeclimate.com / 646-379-1810 / @brynary
>> >> >>
>> >>
>> >>
>> >>
>> >> --
>> >> Bryan Helmkamp, Founder, Code Climate
>> >> bryan@codeclimate.com / 646-379-1810 / @brynary
>> >>
>>
>>
>>
>> --
>> Bryan Helmkamp, Founder, Code Climate
>> bryan@codeclimate.com / 646-379-1810 / @brynary
>>



-- 
Bryan Helmkamp, Founder, Code Climate
bryan@codeclimate.com / 646-379-1810 / @brynary

Re: Suitibility of Aurora for one-time tasks

Posted by Bill Farner <wf...@apache.org>.
Can you offer some more details on what the workload execution looks like?
 Are these shell commands?  An application that's provided different
configuration?

-=Bill


On Wed, Feb 26, 2014 at 12:45 PM, Bryan Helmkamp <br...@codeclimate.com>wrote:

> Thanks, Kevin. The idea of always-on workers of varying sizes is
> effectively what we have right now in our non-Mesos world. The problem
> is that sometimes we end up with not enough workers for certain
> classes of jobs (e.g. High Memory), while part of the cluster sits
> idle.
>
> Conceptually, in my mind we would define approximately a dozen Tasks,
> one for each type of work we need to perform (with different resource
> requirements), and then run Jobs, each with a Task and a unique
> payload, but I don't think this model works with Mesos. It seems we'd
> need to create a unique Task for every Job.
>
> -Bryan
>
> On Wed, Feb 26, 2014 at 3:35 PM, Kevin Sweeney <ke...@apache.org> wrote:
> > A job is a group of nearly-identical tasks plus some constraints like
> rack
> > diversity. The scheduler considers each task within a job equivalently
> > schedulable, so you can't vary things like resource footprint. It's
> > perfectly fine to have several jobs with just a single task, as long as
> > each has a different job key (which is (role, environment, name)).
> >
> > Another approach is to have a bunch of uniform always-on workers (in
> > different sizes). This can be expressed as a Service like so:
> >
> > # workers.aurora
> > class Profile(Struct):
> >   queue_name = Required(String)
> >   resources = Required(Resources)
> >   instances = Required(Integer)
> >
> > HIGH_MEM = Resources(cpu = 8.0, ram = 32 * GB, disk = 64 * GB)
> > HIGH_CPU = Resources(cpu = 16.0, ram = 4 * GB, disk = 64 * GB)
> >
> > work_forever = Process(name = 'work_forever',
> >   cmdline = '''
> >     # TODO: Replace this with something that isn't pseudo-bash
> >     while true; do
> >       work_item=`take_from_work_queue {{profile.queue_name}}`
> >       do_work "$work_item"
> >       tell_work_queue_finished "{{profile.queue_name}}" "$work_item"
> >     done
> >   ''')
> >
> > task = Task(processes = [work_forever],
> > *  resources = '{{profile.resources}}, # Note this is static per
> > queue-name.*
> > )
> >
> > service = Service(
> >   task = task,
> >   cluster = 'west',
> >   role = 'service-account-name',
> >   environment = 'prod',
> >   name = '{{profile.queue_name}}_processor'
> >   *instances = '{{profile.instances}}', # Scale here.*
> > )
> >
> > jobs = [
> >   service.bind(profile = Profile(
> >     resources = HIGH_MEM,
> >     queue_name = 'graph_traversals',
> >     instances = 50,
> >   )),
> >   service.bind(profile = Profile(
> >     resources = HIGH_CPU,
> >     queue_name = 'compilations',
> >     instances = 200,
> >   )),
> > ]
> >
> >
> > On Wed, Feb 26, 2014 at 11:46 AM, Bryan Helmkamp <bryan@codeclimate.com
> >wrote:
> >
> >> Thanks, Bill.
> >>
> >> Am I correct in understanding that is not possible to parameterize
> >> individual Jobs, just Tasks? Therefore, since I don't know the job
> >> definitions up front, I will have parameterized Task templates, and
> >> generate a new Task every time I need to run a Job?
> >>
> >> Is that the recommended route?
> >>
> >> Our work is very non-uniform so I don't think work-stealing would be
> >> efficient for us.
> >>
> >> -Bryan
> >>
> >> On Wed, Feb 26, 2014 at 12:49 PM, Bill Farner <wf...@apache.org>
> wrote:
> >> > Thanks for checking out Aurora!
> >> >
> >> > My short answer is that Aurora should handle thousands of short-lived
> >> > tasks/jobs per day without trouble.  (If you proceed with this
> approach
> >> and
> >> > encounter performance issues, feel free to file tickets!)  The DSL
> does
> >> > have some mechanisms for parameterization.  In your case since you
> >> probably
> >> > don't know all the job definitions upfront, you'll probably want to
> >> > parameterize with environment variables.  I don't see this described
> in
> >> our
> >> > docs, but you there's a little detail at the option declaration [1].
> >> >
> >> > Another approach worth considering is work-stealing, using a single
> job
> >> as
> >> > your pool of workers.  I would find this easier to manage, but it
> would
> >> > only be suitable if your work items are sufficiently-uniform.
> >> >
> >> > Feel free to continue the discussion!  We're also pretty active in our
> >> IRC
> >> > channel if you'd prefer that medium.
> >> >
> >> >
> >> > [1]
> >> >
> >>
> https://github.com/apache/incubator-aurora/blob/master/src/main/python/apache/aurora/client/options.py#L170-L183
> >> >
> >> >
> >> > -=Bill
> >> >
> >> >
> >> > On Tue, Feb 25, 2014 at 10:11 PM, Bryan Helmkamp <
> bryan@codeclimate.com
> >> >wrote:
> >> >
> >> >> Hello,
> >> >>
> >> >> I am considering Aurora for a key component of our infrastructure.
> >> >> Awesome work being done here.
> >> >>
> >> >> My question is: How suitable is Aurora for running short-lived tasks?
> >> >>
> >> >> Background: We (Code Climate) do static analysis of tens of thousands
> >> >> of repositories every day. We run a variety of forms of analysis,
> with
> >> >> heterogeneous resource requirements, and thus our interest in Mesos.
> >> >>
> >> >> Looking at Aurora, a lot of the core features look very helpful to
> us.
> >> >> Where I am getting hung up is figuring out how to model short-lived
> >> >> tasks as tasks/jobs. Long-running resource allocations are not really
> >> >> an option for us due to the variation in our workloads.
> >> >>
> >> >> My first thought was to create a Task for each type of analysis we
> >> >> run, and then start a new Job with the appropriate Task every time we
> >> >> want to run analysis (regulated by a queue). This doesn't seem to
> work
> >> >> though. I can't `aurora create` the same `.aurora` file multiple
> times
> >> >> with different Job names (as far as I can tell). Also there is the
> >> >> problem of how to customize each Job slightly (e.g. a payload).
> >> >>
> >> >> An obvious alternative is to create a unique Task every time we want
> >> >> to run work. This would result in tens of thousands of tasks being
> >> >> created every day, and from what I can tell Aurora does not intend to
> >> >> be used like that. (Please correct me if I am wrong.)
> >> >>
> >> >> Basically, I would like to hook my job queue up to Aurora to perform
> >> >> the actual work. There are a dozen different types of jobs, each with
> >> >> different performance requirements. Every time a job runs, it has a
> >> >> unique payload containing the definition of the work it should be
> >> >> performed.
> >> >>
> >> >> Can Aurora be used this way? If so, what is the proper way to model
> >> >> this with respect to Jobs and Tasks?
> >> >>
> >> >> Any/all help is appreciated.
> >> >>
> >> >> Thanks!
> >> >>
> >> >> -Bryan
> >> >>
> >> >> --
> >> >> Bryan Helmkamp, Founder, Code Climate
> >> >> bryan@codeclimate.com / 646-379-1810 / @brynary
> >> >>
> >>
> >>
> >>
> >> --
> >> Bryan Helmkamp, Founder, Code Climate
> >> bryan@codeclimate.com / 646-379-1810 / @brynary
> >>
>
>
>
> --
> Bryan Helmkamp, Founder, Code Climate
> bryan@codeclimate.com / 646-379-1810 / @brynary
>

Re: Suitibility of Aurora for one-time tasks

Posted by Bryan Helmkamp <br...@codeclimate.com>.
Thanks, Kevin. The idea of always-on workers of varying sizes is
effectively what we have right now in our non-Mesos world. The problem
is that sometimes we end up with not enough workers for certain
classes of jobs (e.g. High Memory), while part of the cluster sits
idle.

Conceptually, in my mind we would define approximately a dozen Tasks,
one for each type of work we need to perform (with different resource
requirements), and then run Jobs, each with a Task and a unique
payload, but I don't think this model works with Mesos. It seems we'd
need to create a unique Task for every Job.

-Bryan

On Wed, Feb 26, 2014 at 3:35 PM, Kevin Sweeney <ke...@apache.org> wrote:
> A job is a group of nearly-identical tasks plus some constraints like rack
> diversity. The scheduler considers each task within a job equivalently
> schedulable, so you can't vary things like resource footprint. It's
> perfectly fine to have several jobs with just a single task, as long as
> each has a different job key (which is (role, environment, name)).
>
> Another approach is to have a bunch of uniform always-on workers (in
> different sizes). This can be expressed as a Service like so:
>
> # workers.aurora
> class Profile(Struct):
>   queue_name = Required(String)
>   resources = Required(Resources)
>   instances = Required(Integer)
>
> HIGH_MEM = Resources(cpu = 8.0, ram = 32 * GB, disk = 64 * GB)
> HIGH_CPU = Resources(cpu = 16.0, ram = 4 * GB, disk = 64 * GB)
>
> work_forever = Process(name = 'work_forever',
>   cmdline = '''
>     # TODO: Replace this with something that isn't pseudo-bash
>     while true; do
>       work_item=`take_from_work_queue {{profile.queue_name}}`
>       do_work "$work_item"
>       tell_work_queue_finished "{{profile.queue_name}}" "$work_item"
>     done
>   ''')
>
> task = Task(processes = [work_forever],
> *  resources = '{{profile.resources}}, # Note this is static per
> queue-name.*
> )
>
> service = Service(
>   task = task,
>   cluster = 'west',
>   role = 'service-account-name',
>   environment = 'prod',
>   name = '{{profile.queue_name}}_processor'
>   *instances = '{{profile.instances}}', # Scale here.*
> )
>
> jobs = [
>   service.bind(profile = Profile(
>     resources = HIGH_MEM,
>     queue_name = 'graph_traversals',
>     instances = 50,
>   )),
>   service.bind(profile = Profile(
>     resources = HIGH_CPU,
>     queue_name = 'compilations',
>     instances = 200,
>   )),
> ]
>
>
> On Wed, Feb 26, 2014 at 11:46 AM, Bryan Helmkamp <br...@codeclimate.com>wrote:
>
>> Thanks, Bill.
>>
>> Am I correct in understanding that is not possible to parameterize
>> individual Jobs, just Tasks? Therefore, since I don't know the job
>> definitions up front, I will have parameterized Task templates, and
>> generate a new Task every time I need to run a Job?
>>
>> Is that the recommended route?
>>
>> Our work is very non-uniform so I don't think work-stealing would be
>> efficient for us.
>>
>> -Bryan
>>
>> On Wed, Feb 26, 2014 at 12:49 PM, Bill Farner <wf...@apache.org> wrote:
>> > Thanks for checking out Aurora!
>> >
>> > My short answer is that Aurora should handle thousands of short-lived
>> > tasks/jobs per day without trouble.  (If you proceed with this approach
>> and
>> > encounter performance issues, feel free to file tickets!)  The DSL does
>> > have some mechanisms for parameterization.  In your case since you
>> probably
>> > don't know all the job definitions upfront, you'll probably want to
>> > parameterize with environment variables.  I don't see this described in
>> our
>> > docs, but you there's a little detail at the option declaration [1].
>> >
>> > Another approach worth considering is work-stealing, using a single job
>> as
>> > your pool of workers.  I would find this easier to manage, but it would
>> > only be suitable if your work items are sufficiently-uniform.
>> >
>> > Feel free to continue the discussion!  We're also pretty active in our
>> IRC
>> > channel if you'd prefer that medium.
>> >
>> >
>> > [1]
>> >
>> https://github.com/apache/incubator-aurora/blob/master/src/main/python/apache/aurora/client/options.py#L170-L183
>> >
>> >
>> > -=Bill
>> >
>> >
>> > On Tue, Feb 25, 2014 at 10:11 PM, Bryan Helmkamp <bryan@codeclimate.com
>> >wrote:
>> >
>> >> Hello,
>> >>
>> >> I am considering Aurora for a key component of our infrastructure.
>> >> Awesome work being done here.
>> >>
>> >> My question is: How suitable is Aurora for running short-lived tasks?
>> >>
>> >> Background: We (Code Climate) do static analysis of tens of thousands
>> >> of repositories every day. We run a variety of forms of analysis, with
>> >> heterogeneous resource requirements, and thus our interest in Mesos.
>> >>
>> >> Looking at Aurora, a lot of the core features look very helpful to us.
>> >> Where I am getting hung up is figuring out how to model short-lived
>> >> tasks as tasks/jobs. Long-running resource allocations are not really
>> >> an option for us due to the variation in our workloads.
>> >>
>> >> My first thought was to create a Task for each type of analysis we
>> >> run, and then start a new Job with the appropriate Task every time we
>> >> want to run analysis (regulated by a queue). This doesn't seem to work
>> >> though. I can't `aurora create` the same `.aurora` file multiple times
>> >> with different Job names (as far as I can tell). Also there is the
>> >> problem of how to customize each Job slightly (e.g. a payload).
>> >>
>> >> An obvious alternative is to create a unique Task every time we want
>> >> to run work. This would result in tens of thousands of tasks being
>> >> created every day, and from what I can tell Aurora does not intend to
>> >> be used like that. (Please correct me if I am wrong.)
>> >>
>> >> Basically, I would like to hook my job queue up to Aurora to perform
>> >> the actual work. There are a dozen different types of jobs, each with
>> >> different performance requirements. Every time a job runs, it has a
>> >> unique payload containing the definition of the work it should be
>> >> performed.
>> >>
>> >> Can Aurora be used this way? If so, what is the proper way to model
>> >> this with respect to Jobs and Tasks?
>> >>
>> >> Any/all help is appreciated.
>> >>
>> >> Thanks!
>> >>
>> >> -Bryan
>> >>
>> >> --
>> >> Bryan Helmkamp, Founder, Code Climate
>> >> bryan@codeclimate.com / 646-379-1810 / @brynary
>> >>
>>
>>
>>
>> --
>> Bryan Helmkamp, Founder, Code Climate
>> bryan@codeclimate.com / 646-379-1810 / @brynary
>>



-- 
Bryan Helmkamp, Founder, Code Climate
bryan@codeclimate.com / 646-379-1810 / @brynary

Re: Suitibility of Aurora for one-time tasks

Posted by Kevin Sweeney <ke...@apache.org>.
A job is a group of nearly-identical tasks plus some constraints like rack
diversity. The scheduler considers each task within a job equivalently
schedulable, so you can't vary things like resource footprint. It's
perfectly fine to have several jobs with just a single task, as long as
each has a different job key (which is (role, environment, name)).

Another approach is to have a bunch of uniform always-on workers (in
different sizes). This can be expressed as a Service like so:

# workers.aurora
class Profile(Struct):
  queue_name = Required(String)
  resources = Required(Resources)
  instances = Required(Integer)

HIGH_MEM = Resources(cpu = 8.0, ram = 32 * GB, disk = 64 * GB)
HIGH_CPU = Resources(cpu = 16.0, ram = 4 * GB, disk = 64 * GB)

work_forever = Process(name = 'work_forever',
  cmdline = '''
    # TODO: Replace this with something that isn't pseudo-bash
    while true; do
      work_item=`take_from_work_queue {{profile.queue_name}}`
      do_work "$work_item"
      tell_work_queue_finished "{{profile.queue_name}}" "$work_item"
    done
  ''')

task = Task(processes = [work_forever],
  resources = '{{profile.resources}}',  # Note: this is static per queue-name.
)

service = Service(
  task = task,
  cluster = 'west',
  role = 'service-account-name',
  environment = 'prod',
  name = '{{profile.queue_name}}_processor',
  instances = '{{profile.instances}}',  # Scale here.
)

jobs = [
  service.bind(profile = Profile(
    resources = HIGH_MEM,
    queue_name = 'graph_traversals',
    instances = 50,
  )),
  service.bind(profile = Profile(
    resources = HIGH_CPU,
    queue_name = 'compilations',
    instances = 200,
  )),
]
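
For the TODO above, a concrete worker loop could be a small script along
these lines (a sketch only; take_from_work_queue, do_work and
tell_work_queue_finished are hypothetical helpers wrapping whatever queue
you actually use):

#!/usr/bin/env bash
# worker_loop.sh <queue-name> -- pull work items forever and process them.
set -u
QUEUE="$1"

while true; do
  # Expect the helper to print one work-item id, or fail if the queue is empty.
  if ! work_item="$(take_from_work_queue "$QUEUE")"; then
    sleep 5
    continue
  fi
  if do_work "$work_item"; then
    tell_work_queue_finished "$QUEUE" "$work_item"
  else
    echo "work item $work_item failed; leaving it for retry" >&2
  fi
done

with the Process cmdline pointing at it, e.g. cmdline =
'./worker_loop.sh {{profile.queue_name}}' (however you choose to ship the
script into the sandbox). Launching a pool would then presumably be a
one-time `aurora create
west/service-account-name/prod/graph_traversals_processor workers.aurora`
(assuming the usual job-key-plus-config invocation), after which you only
touch the queue.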


On Wed, Feb 26, 2014 at 11:46 AM, Bryan Helmkamp <br...@codeclimate.com>wrote:

> Thanks, Bill.
>
> Am I correct in understanding that is not possible to parameterize
> individual Jobs, just Tasks? Therefore, since I don't know the job
> definitions up front, I will have parameterized Task templates, and
> generate a new Task every time I need to run a Job?
>
> Is that the recommended route?
>
> Our work is very non-uniform so I don't think work-stealing would be
> efficient for us.
>
> -Bryan
>
> On Wed, Feb 26, 2014 at 12:49 PM, Bill Farner <wf...@apache.org> wrote:
> > Thanks for checking out Aurora!
> >
> > My short answer is that Aurora should handle thousands of short-lived
> > tasks/jobs per day without trouble.  (If you proceed with this approach
> and
> > encounter performance issues, feel free to file tickets!)  The DSL does
> > have some mechanisms for parameterization.  In your case since you
> probably
> > don't know all the job definitions upfront, you'll probably want to
> > parameterize with environment variables.  I don't see this described in
> our
> > docs, but you there's a little detail at the option declaration [1].
> >
> > Another approach worth considering is work-stealing, using a single job
> as
> > your pool of workers.  I would find this easier to manage, but it would
> > only be suitable if your work items are sufficiently-uniform.
> >
> > Feel free to continue the discussion!  We're also pretty active in our
> IRC
> > channel if you'd prefer that medium.
> >
> >
> > [1]
> >
> https://github.com/apache/incubator-aurora/blob/master/src/main/python/apache/aurora/client/options.py#L170-L183
> >
> >
> > -=Bill
> >
> >
> > On Tue, Feb 25, 2014 at 10:11 PM, Bryan Helmkamp <bryan@codeclimate.com
> >wrote:
> >
> >> Hello,
> >>
> >> I am considering Aurora for a key component of our infrastructure.
> >> Awesome work being done here.
> >>
> >> My question is: How suitable is Aurora for running short-lived tasks?
> >>
> >> Background: We (Code Climate) do static analysis of tens of thousands
> >> of repositories every day. We run a variety of forms of analysis, with
> >> heterogeneous resource requirements, and thus our interest in Mesos.
> >>
> >> Looking at Aurora, a lot of the core features look very helpful to us.
> >> Where I am getting hung up is figuring out how to model short-lived
> >> tasks as tasks/jobs. Long-running resource allocations are not really
> >> an option for us due to the variation in our workloads.
> >>
> >> My first thought was to create a Task for each type of analysis we
> >> run, and then start a new Job with the appropriate Task every time we
> >> want to run analysis (regulated by a queue). This doesn't seem to work
> >> though. I can't `aurora create` the same `.aurora` file multiple times
> >> with different Job names (as far as I can tell). Also there is the
> >> problem of how to customize each Job slightly (e.g. a payload).
> >>
> >> An obvious alternative is to create a unique Task every time we want
> >> to run work. This would result in tens of thousands of tasks being
> >> created every day, and from what I can tell Aurora does not intend to
> >> be used like that. (Please correct me if I am wrong.)
> >>
> >> Basically, I would like to hook my job queue up to Aurora to perform
> >> the actual work. There are a dozen different types of jobs, each with
> >> different performance requirements. Every time a job runs, it has a
> >> unique payload containing the definition of the work it should be
> >> performed.
> >>
> >> Can Aurora be used this way? If so, what is the proper way to model
> >> this with respect to Jobs and Tasks?
> >>
> >> Any/all help is appreciated.
> >>
> >> Thanks!
> >>
> >> -Bryan
> >>
> >> --
> >> Bryan Helmkamp, Founder, Code Climate
> >> bryan@codeclimate.com / 646-379-1810 / @brynary
> >>
>
>
>
> --
> Bryan Helmkamp, Founder, Code Climate
> bryan@codeclimate.com / 646-379-1810 / @brynary
>

Re: Suitibility of Aurora for one-time tasks

Posted by Bryan Helmkamp <br...@codeclimate.com>.
Thanks, Bill.

Am I correct in understanding that it is not possible to parameterize
individual Jobs, just Tasks? Therefore, since I don't know the job
definitions up front, I will have parameterized Task templates, and
generate a new Task every time I need to run a Job?

Is that the recommended route?

Our work is very non-uniform so I don't think work-stealing would be
efficient for us.

-Bryan

On Wed, Feb 26, 2014 at 12:49 PM, Bill Farner <wf...@apache.org> wrote:
> Thanks for checking out Aurora!
>
> My short answer is that Aurora should handle thousands of short-lived
> tasks/jobs per day without trouble.  (If you proceed with this approach and
> encounter performance issues, feel free to file tickets!)  The DSL does
> have some mechanisms for parameterization.  In your case since you probably
> don't know all the job definitions upfront, you'll probably want to
> parameterize with environment variables.  I don't see this described in our
> docs, but you there's a little detail at the option declaration [1].
>
> Another approach worth considering is work-stealing, using a single job as
> your pool of workers.  I would find this easier to manage, but it would
> only be suitable if your work items are sufficiently-uniform.
>
> Feel free to continue the discussion!  We're also pretty active in our IRC
> channel if you'd prefer that medium.
>
>
> [1]
> https://github.com/apache/incubator-aurora/blob/master/src/main/python/apache/aurora/client/options.py#L170-L183
>
>
> -=Bill
>
>
> On Tue, Feb 25, 2014 at 10:11 PM, Bryan Helmkamp <br...@codeclimate.com>wrote:
>
>> Hello,
>>
>> I am considering Aurora for a key component of our infrastructure.
>> Awesome work being done here.
>>
>> My question is: How suitable is Aurora for running short-lived tasks?
>>
>> Background: We (Code Climate) do static analysis of tens of thousands
>> of repositories every day. We run a variety of forms of analysis, with
>> heterogeneous resource requirements, and thus our interest in Mesos.
>>
>> Looking at Aurora, a lot of the core features look very helpful to us.
>> Where I am getting hung up is figuring out how to model short-lived
>> tasks as tasks/jobs. Long-running resource allocations are not really
>> an option for us due to the variation in our workloads.
>>
>> My first thought was to create a Task for each type of analysis we
>> run, and then start a new Job with the appropriate Task every time we
>> want to run analysis (regulated by a queue). This doesn't seem to work
>> though. I can't `aurora create` the same `.aurora` file multiple times
>> with different Job names (as far as I can tell). Also there is the
>> problem of how to customize each Job slightly (e.g. a payload).
>>
>> An obvious alternative is to create a unique Task every time we want
>> to run work. This would result in tens of thousands of tasks being
>> created every day, and from what I can tell Aurora does not intend to
>> be used like that. (Please correct me if I am wrong.)
>>
>> Basically, I would like to hook my job queue up to Aurora to perform
>> the actual work. There are a dozen different types of jobs, each with
>> different performance requirements. Every time a job runs, it has a
>> unique payload containing the definition of the work it should be
>> performed.
>>
>> Can Aurora be used this way? If so, what is the proper way to model
>> this with respect to Jobs and Tasks?
>>
>> Any/all help is appreciated.
>>
>> Thanks!
>>
>> -Bryan
>>
>> --
>> Bryan Helmkamp, Founder, Code Climate
>> bryan@codeclimate.com / 646-379-1810 / @brynary
>>



-- 
Bryan Helmkamp, Founder, Code Climate
bryan@codeclimate.com / 646-379-1810 / @brynary

Re: Suitibility of Aurora for one-time tasks

Posted by Bill Farner <wf...@apache.org>.
Thanks for checking out Aurora!

My short answer is that Aurora should handle thousands of short-lived
tasks/jobs per day without trouble.  (If you proceed with this approach and
encounter performance issues, feel free to file tickets!)  The DSL does
have some mechanisms for parameterization.  In your case, since you likely
don't know all the job definitions upfront, you'll probably want to
parameterize with environment variables.  I don't see this described in our
docs, but there's a little detail at the option declaration [1].
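
For instance, something along these lines, mirroring the `-E` bindings
shown elsewhere in this thread (a sketch only, not verified against your
client version; `work_id`, `myrole`, and `analysis.aurora` are placeholder
names):

  aurora create -E work_id=$WORK_ID west/myrole/prod/analysis_$WORK_ID analysis.aurora

where analysis.aurora names its Job 'analysis_{{work_id}}' and references
{{work_id}} in its cmdline, so repeated submissions never reuse a job key.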

Another approach worth considering is work-stealing, using a single job as
your pool of workers.  I would find this easier to manage, but it would
only be suitable if your work items are sufficiently-uniform.

Feel free to continue the discussion!  We're also pretty active in our IRC
channel if you'd prefer that medium.


[1]
https://github.com/apache/incubator-aurora/blob/master/src/main/python/apache/aurora/client/options.py#L170-L183


-=Bill


On Tue, Feb 25, 2014 at 10:11 PM, Bryan Helmkamp <br...@codeclimate.com>wrote:

> Hello,
>
> I am considering Aurora for a key component of our infrastructure.
> Awesome work being done here.
>
> My question is: How suitable is Aurora for running short-lived tasks?
>
> Background: We (Code Climate) do static analysis of tens of thousands
> of repositories every day. We run a variety of forms of analysis, with
> heterogeneous resource requirements, and thus our interest in Mesos.
>
> Looking at Aurora, a lot of the core features look very helpful to us.
> Where I am getting hung up is figuring out how to model short-lived
> tasks as tasks/jobs. Long-running resource allocations are not really
> an option for us due to the variation in our workloads.
>
> My first thought was to create a Task for each type of analysis we
> run, and then start a new Job with the appropriate Task every time we
> want to run analysis (regulated by a queue). This doesn't seem to work
> though. I can't `aurora create` the same `.aurora` file multiple times
> with different Job names (as far as I can tell). Also there is the
> problem of how to customize each Job slightly (e.g. a payload).
>
> An obvious alternative is to create a unique Task every time we want
> to run work. This would result in tens of thousands of tasks being
> created every day, and from what I can tell Aurora does not intend to
> be used like that. (Please correct me if I am wrong.)
>
> Basically, I would like to hook my job queue up to Aurora to perform
> the actual work. There are a dozen different types of jobs, each with
> different performance requirements. Every time a job runs, it has a
> unique payload containing the definition of the work it should be
> performed.
>
> Can Aurora be used this way? If so, what is the proper way to model
> this with respect to Jobs and Tasks?
>
> Any/all help is appreciated.
>
> Thanks!
>
> -Bryan
>
> --
> Bryan Helmkamp, Founder, Code Climate
> bryan@codeclimate.com / 646-379-1810 / @brynary
>