Posted to user@aurora.apache.org by Ziliang Chen <zl...@gmail.com> on 2016/06/11 15:15:32 UTC

Re: Would you recommend Aurora?

Hi,

Great discussion here.
May I extend the question a little? I am wondering how Aurora scales:
can it schedule millions of cron jobs (running periodically, say every
1, 2 or 5 minutes) or service jobs? Is there any documentation or
performance benchmark for Aurora that I can refer to? I have heard that
Aurora can schedule several thousand jobs per second. I have never tested
that, so it would be good to confirm.

Thanks a lot!

On Thu, May 26, 2016 at 1:01 AM, Jillian Cocklin <
jillian.cocklin@danalinc.com> wrote:

> Thanks Brian & Maxim, those are great leads.  Awesome that Heron has gone
> open source!  Definitely glad to have learned more about Aurora – for the
> right situation it seems like a really great solution.
>
>
>
> Thanks,
>
> J.
>
>
>
> *From:* Brian Hatfield [mailto:bhatfield@twitter.com]
> *Sent:* Wednesday, May 25, 2016 9:57 AM
> *To:* user@aurora.apache.org
>
> *Subject:* Re: Would you recommend Aurora?
>
>
>
> I mentioned Heron yesterday in this thread - you might like to know that
> as of this morning, it's now open source:
> https://blog.twitter.com/2016/open-sourcing-twitter-heron
>
>
>
> On Wed, May 25, 2016 at 12:22 PM, Maxim Khutornenko <ma...@apache.org>
> wrote:
>
> Hi Jillian,
>
>
>
> You may still consider Aurora if you want more complex (à la Heron)
> orchestration around your batch processing workloads.
>
>
>
> That said, there are plenty of alternatives for batch processing if you
> feel that'll be too much to load:
> http://mesos.apache.org/documentation/latest/frameworks/
>
>
>
> There is also a young but promising framework specifically targeting large
> batch job counts that you may want to explore:
> https://github.com/twosigma/Cook.
>
>
>
> On Wed, May 25, 2016 at 8:12 AM, Jillian Cocklin <
> jillian.cocklin@danalinc.com> wrote:
>
> Thanks Brian and Rick - that's what I was starting to think too.  I
> appreciate your input and the quick responses.
>
>
>
> Best,
>
> J.
>
>
>
> _____________________________
> From: rick@chartbeat.com
> Sent: Wednesday, May 25, 2016 4:47 AM
> Subject: Re: Would you recommend Aurora?
> To: <us...@aurora.apache.org>
>
>
>
> Sounds to me like you want something like Spark or a traditional
> MapReduce framework.
>
>
> On May 24, 2016, at 9:36 PM, Brian Hatfield <bh...@twitter.com> wrote:
>
> It seems like Aurora would not be the solution to your problem entirely.
>
>
>
> It sounds like you either want a stream processor with a way to stream in
> the chunked batch (see also: Storm or Heron (which runs on Aurora)
> <https://blog.twitter.com/2015/flying-faster-with-twitter-heron>), or a
> way to process batch jobs (see also: Hadoop, which can run on Mesos
> <https://github.com/mesos/hadoop> and possibly Aurora).
>
>
>
> I'm not sure which fits your use case better based upon your description,
> but I hope that this is at least a seed of information in the right
> direction.
>
>
>
> Brian
>
>
>
> On Tue, May 24, 2016 at 9:14 PM, Jillian Cocklin <
> jillian.cocklin@danalinc.com> wrote:
>
> I’m analyzing Aurora as a potential candidate for a new project.  While
> the high-level architecture seems to be a good fit, I’m not seeing a lot of
> documentation that matches our use case.
>
> On an ongoing basis, we’ll receive batch files of records (~5 million
> records per batch), and based on record types we need to “process” them
> against our services.  We’d break up the records into small chunks,
> instantiate a job for each chunk, and have each job be automatically queued
> up to run on available resources (which can be autoscaled up/down as
> needed).
>
>
>
> At first glance it looked like Aurora could create jobs, but I can’t
> tell whether those can be defined as templates so that they can be
> dynamically instantiated, passed data, and run simultaneously.  Are there
> any best practices or code examples for this?  Most of what I’ve found
> fits better with the use case of having different static jobs (like cron
> jobs or IT services) that each need to be run on a periodic basis or
> continue running indefinitely.
>
>
>
> Can anyone let me know whether this is worth pursuing with Aurora?
>
>
>
> Thanks!
>
> J.



-- 
Regards, Zi-Liang

Mail:zlchen.ken@gmail.com

Re: Would you recommend Aurora?

Posted by Ziliang Chen <zl...@gmail.com>.
Thank you, Erb !

On Sat, Jun 18, 2016 at 12:40 AM, Erb, Stephan <St...@blue-yonder.com>
wrote:

> 1) It is really hard to answer that question, especially given that there
> is a huge difference between a scheduled cron job and a running job. Your
> best bet is probably to do some load testing for your particular use case,
> and to evaluate other design choices as necessary.
>
>
>
> 2) The link I provided for the 40 tasks per second is actually a config
> option. So you could change this, if absolutely necessary.



-- 
Regards, Zi-Liang

Mail:zlchen.ken@gmail.com

Re: Would you recommend Aurora?

Posted by "Erb, Stephan" <St...@blue-yonder.com>.
1) It is really hard to answer that question, especially given that there is a huge difference between a scheduled cron job and a running job. Your best bet is probably to do some load testing for your particular use case, and to evaluate other design choices as necessary.

2) The link I provided for the 40 tasks per second is actually a config option. So you could change this, if absolutely necessary.
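
For a rough sense of scale, the numbers mentioned in this thread can be put side by side. The sketch below is a back-of-the-envelope calculation, not a benchmark. It assumes that every cron run turns into exactly one task launch, takes the 1000 x 1000 job count and the 1, 2 and 5 minute periods from the questions in this thread, and compares the result with the 40 tasks per second default. The function name `required_launch_rate` is made up for illustration.

```python
# Back-of-the-envelope estimate: task launches per second needed by N
# periodic cron jobs, compared with the default rate cited in this thread.
# Assumption: every cron run results in exactly one task launch.

DEFAULT_TASKS_PER_SECOND = 40  # default scheduling rate cited in the thread


def required_launch_rate(job_count: int, period_seconds: int) -> float:
    """Average launches per second if each job fires once per period."""
    return job_count / period_seconds


if __name__ == "__main__":
    total_jobs = 1000 * 1000  # 1000 customers x 1000 cron jobs each
    for minutes in (1, 2, 5):
        rate = required_launch_rate(total_jobs, minutes * 60)
        factor = rate / DEFAULT_TASKS_PER_SECOND
        print(f"every {minutes} min: {rate:,.0f} launches/s "
              f"({factor:,.0f}x the default rate)")
```

Under these assumptions even the 5 minute case needs a bit over 3,300 launches per second, roughly 80 times the default rate, which is why load testing and alternative designs are worth considering before raising the limit.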

From: Ziliang Chen <zl...@gmail.com>
Reply-To: "user@aurora.apache.org" <us...@aurora.apache.org>
Date: Monday 13 June 2016 at 03:56
To: "user@aurora.apache.org" <us...@aurora.apache.org>
Subject: Re: Would you recommend Aurora?

Thanks Erb for the great details.
1) Assume we have 1000 customers, each with 1000 periodic cron jobs. I would like to schedule the total of 1M jobs across a pool of machines. If Aurora can't take this load, is there another candidate you would suggest?
2) 40 tasks per second: is there a way to change the default through configuration instead of modifying the code?

Thank you very much!

Re: Would you recommend Aurora?

Posted by Ziliang Chen <zl...@gmail.com>.
Thanks Erb for the great details.
1) Assume we have 1000 customers, each with 1000 periodic cron jobs. I
would like to schedule the total of 1M jobs across a pool of machines. If
Aurora can't take this load, is there another candidate you would suggest?
2) 40 tasks per second: is there a way to change the default through
configuration instead of modifying the code?

Thank you very much!

On Mon, Jun 13, 2016 at 1:50 AM, Erb, Stephan <St...@blue-yonder.com>
wrote:

> Could you clarify your cron use case? Millions of cron jobs that run as
> often as every minute sound more like a case for a couple of long-running
> processes that do the actual work with a short sleep in between, rather
> than spawning and distributing a task in Mesos & Aurora for each of them.
>
>
> Regarding Aurora's scale: Twitter has recently disclosed that they have
> 250,000 containers/tasks running, with the largest cluster being in the
> range of 30,000 nodes [1].  By default, Aurora does not try to schedule
> more than 40 tasks per second [2]. You can probably adjust that value,
> but doing so could have other downsides.
>
> [1] https://youtu.be/FU7wrqsRj3o?t=21m11s
>
> [2]
> https://github.com/apache/aurora/blob/master/src/main/java/org/apache/aurora/scheduler/scheduling/SchedulingModule.java#L39-L41



-- 
Regards, Zi-Liang

Mail:zlchen.ken@gmail.com

Re: Would you recommend Aurora?

Posted by "Erb, Stephan" <St...@blue-yonder.com>.
Could you clarify your cron use case? Millions of cron jobs that run as often as every minute sound more like a case for a couple of long-running processes that do the actual work with a short sleep in between, rather than spawning and distributing a task in Mesos & Aurora for each of them.


Regarding Aurora's scale: Twitter has recently disclosed that they have 250,000 containers/tasks running, with the largest cluster being in the range of 30,000 nodes [1].  By default, Aurora does not try to schedule more than 40 tasks per second [2]. You can probably adjust that value, but doing so could have other downsides.

[1] https://youtu.be/FU7wrqsRj3o?t=21m11s

[2] https://github.com/apache/aurora/blob/master/src/main/java/org/apache/aurora/scheduler/scheduling/SchedulingModule.java#L39-L41
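
The "long running processes with a little sleep in between" idea can be sketched roughly as follows. This is only an illustration under stated assumptions, not code from Aurora or from anyone in this thread: a single process keeps all of its periodic jobs in a min-heap ordered by next run time, sleeps until the earliest one is due, runs it, and reschedules it. The names `run_periodic_jobs` and `work` are hypothetical, and a real deployment would presumably shard the jobs across several such processes, each run as an ordinary long-running service.

```python
import heapq
import time
from typing import Callable, List, Tuple

# (next_run_time, job_id, period_seconds); job_id also breaks ties in the heap.
Entry = Tuple[float, int, float]


def run_periodic_jobs(
    periods: List[float],
    work: Callable[[int], None],
    stop_after: float,
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> None:
    """Run job i every periods[i] seconds until stop_after seconds elapse."""
    start = clock()
    heap: List[Entry] = [(start + p, i, p) for i, p in enumerate(periods)]
    heapq.heapify(heap)
    deadline = start + stop_after
    while heap and heap[0][0] <= deadline:
        next_run, job_id, period = heap[0]
        delay = next_run - clock()
        if delay > 0:
            sleep(delay)  # the "little sleep in between"
        work(job_id)
        # Reschedule relative to the planned time so that runs do not drift.
        heapq.heapreplace(heap, (next_run + period, job_id, period))


if __name__ == "__main__":
    # Two toy jobs with short periods so that the demo finishes quickly.
    run_periodic_jobs(
        periods=[0.1, 0.2],
        work=lambda job_id: print(f"running job {job_id}"),
        stop_after=0.5,
    )
```

The clock and sleep functions are parameters only so that the loop can be exercised without waiting in real time. The work itself runs inline here, so a slow job delays the ones behind it; handing work to a thread or process pool would be the obvious next step.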

________________________________
From: Ziliang Chen <zl...@gmail.com>
Sent: Saturday, June 11, 2016 17:15
To: user@aurora.apache.org
Subject: Re: Would you recommend Aurora?

Hi,

Great discussion here.
May I extend the question a little? I am wondering how Aurora scales: can it schedule millions of cron jobs (running periodically, say every 1, 2 or 5 minutes) or service jobs? Is there any documentation or performance benchmark for Aurora that I can refer to? I have heard that Aurora can schedule several thousand jobs per second. I have never tested that, so it would be good to confirm.

Thanks a lot!
