Posted to user@mesos.apache.org by Dmitry Goldenberg <dg...@gmail.com> on 2015/06/04 18:08:46 UTC

Cluster autoscaling in Spark+Mesos ?

A Mesos noob here. Could someone point me at the doc or summary for the
cluster autoscaling capabilities in Mesos?

Is there a way to feed it events and have it detect the need to bring in
more machines or decommission machines?  Is there a way to receive events
back that notify you that machines have been allocated or decommissioned?

Would this work within a certain set of
"preallocated"/pre-provisioned/"stand-by" machines or will Mesos go and
grab machines from the cloud?

What are the integration points of Apache Spark and Mesos?  What are the
true advantages of running Spark on Mesos?

Can Mesos autoscale the cluster based on some signals/events coming out of
Spark runtime or Spark consumers, then cause the consumers to run on the
updated cluster, or signal to the consumers to restart themselves into an
updated cluster?

Thanks.

Re: Cluster autoscaling in Spark+Mesos ?

Posted by Tim Chen <ti...@mesosphere.io>.
[What are the integration points of Apache Spark and Mesos?  What are the
true advantages of running Spark on Mesos?]

Spark runs on Mesos by acting as a framework/scheduler, and out of the box
Spark provides both a coarse-grained and a fine-grained scheduler.
I think the advantage of Spark running on Mesos is that it's easy to
define Spark-specific scheduling needs using the Mesos Scheduler APIs,
which can open up more optimization opportunities. Also, by running on
Mesos you can share the cluster with a lot more frameworks, and we're
adding a lot more support to make the multi-framework experience nicer.

[Can Mesos autoscale the cluster based on some signals/events coming out of
Spark runtime or Spark consumers, then cause the consumers to run on the
updated cluster, or signal to the consumers to restart themselves into an
updated cluster?]

Mesos won't autoscale out of the box, but Spark has dynamic allocation,
which can scale the number of executors down and back up based on Spark
metrics.
Potentially more can be done on the Spark scheduler side to signal more
events and scaling opportunities, so if you have ideas about Spark,
please feel free to email the dev@spark list, create JIRAs, or chat with
us on IRC/email for more discussion.
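The dynamic allocation Tim mentions is enabled through Spark configuration; a minimal sketch follows (flag names are from the Spark configuration docs; the master URL, executor bounds, and job file are placeholders, and dynamic allocation in Mesos mode also needs the external shuffle service running on each slave):

```shell
# Sketch: Spark dynamic allocation on Mesos (coarse-grained mode).
# Executor count scales between the min/max bounds based on load.
spark-submit \
  --master mesos://zk://master1:2181/mesos \
  --conf spark.mesos.coarse=true \
  --conf spark.shuffle.service.enabled=true \
  --conf spark.dynamicAllocation.enabled=true \
  --conf spark.dynamicAllocation.minExecutors=2 \
  --conf spark.dynamicAllocation.maxExecutors=50 \
  --conf spark.dynamicAllocation.executorIdleTimeout=60s \
  my_job.py
```

Note this scales executors within the cluster; it does not add or remove machines, which is the distinction the rest of the thread is about.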

Tim

On Thu, Jun 4, 2015 at 9:08 AM, Dmitry Goldenberg <dg...@gmail.com>
wrote:

> A Mesos noob here. Could someone point me at the doc or summary for the
> cluster autoscaling capabilities in Mesos?
>
> Is there a way to feed it events and have it detect the need to bring in
> more machines or decommission machines?  Is there a way to receive events
> back that notify you that machines have been allocated or decommissioned?
>
> Would this work within a certain set of
> "preallocated"/pre-provisioned/"stand-by" machines or will Mesos go and
> grab machines from the cloud?
>
> What are the integration points of Apache Spark and Mesos?  What are the
> true advantages of running Spark on Mesos?
>
> Can Mesos autoscale the cluster based on some signals/events coming out of
> Spark runtime or Spark consumers, then cause the consumers to run on the
> updated cluster, or signal to the consumers to restart themselves into an
> updated cluster?
>
> Thanks.
>

Re: Cluster autoscaling in Spark+Mesos ?

Posted by Sharma Podila <sp...@netflix.com>.
Not yet; we are working on making it available sometime soon (I know, I've
said that before). Until then, if you are interested, some details are
available in my slides from Nov at
http://www.slideshare.net/spodila/aws-reinvent-2014-talk-scheduling-using-apache-mesos-in-the-cloud


On Fri, Jun 5, 2015 at 12:05 AM, Ankur Chauhan <an...@malloc64.com> wrote:

> -----BEGIN PGP SIGNED MESSAGE-----
> Hash: SHA1
>
> Hi,
>
> @Sharma - Is Mantis/Fenzo available on GitHub or somewhere? I did find
> some Maven artifacts, but the repository netflix/fenzo is a 404. I am
> interested in learning about the bin-packing logic of Fenzo.
>
> - -- Ankur Chauhan
>
> On 04/06/2015 22:35, Sharma Podila wrote:
> > We Autoscale our Mesos cluster in EC2 from within our framework.
> > Scaling up can be easy via watching demand Vs supply. However,
> > scaling down requires bin packing the tasks tightly onto as few
> > servers as possible. Do you have any specific ideas on how you
> > would leverage Mantis/Mesos for Spark based jobs? Fenzo, the
> > scheduler part of Mantis, could be another point of leverage, which
> > could give a framework the ability to autoscale the cluster among
> > other benefits.
> >
> >
> >
> > On Thu, Jun 4, 2015 at 1:06 PM, Dmitry Goldenberg
> > <dgoldenberg123@gmail.com <ma...@gmail.com>>
> > wrote:
> >
> > Thanks, Vinod. I'm really interested in how we could leverage
> > something like Mantis and Mesos to achieve autoscaling in a
> > Spark-based data processing system...
> >
> > On Jun 4, 2015, at 3:54 PM, Vinod Kone <vinodkone@gmail.com
> > <ma...@gmail.com>> wrote:
> >
> >> Hey Dmitry. At the current time there is no built-in support for
> >> Mesos to autoscale nodes in the cluster. I've heard people
> >> (Netflix?) do it out of band on EC2.
> >>
> >> On Thu, Jun 4, 2015 at 9:08 AM, Dmitry Goldenberg
> >> <dgoldenberg123@gmail.com <ma...@gmail.com>>
> >> wrote:
> >>
> >> A Mesos noob here. Could someone point me at the doc or summary
> >> for the cluster autoscaling capabilities in Mesos?
> >>
> >> Is there a way to feed it events and have it detect the need to
> >> bring in more machines or decommission machines?  Is there a way
> >> to receive events back that notify you that machines have been
> >> allocated or decommissioned?
> >>
> >> Would this work within a certain set of
> >> "preallocated"/pre-provisioned/"stand-by" machines or will Mesos
> >> go and grab machines from the cloud?
> >>
> >> What are the integration points of Apache Spark and Mesos? What
> >> are the true advantages of running Spark on Mesos?
> >>
> >> Can Mesos autoscale the cluster based on some signals/events
> >> coming out of Spark runtime or Spark consumers, then cause the
> >> consumers to run on the updated cluster, or signal to the
> >> consumers to restart themselves into an updated cluster?
> >>
> >> Thanks.
> >>
> >>
> >
> -----BEGIN PGP SIGNATURE-----
>
> iQEcBAEBAgAGBQJVcUoqAAoJEOSJAMhvLp3LjYIIAK9pgU41hU3Dbn5tlVWxTK7y
> knsVOnVYiuA43DwDUTXgUUFNl67wMR0DAcueSPtUkXRfyWcgGtwDJfsF1R1vdlrN
> kAiSEVxOSnRb9Gg35HVjAE4Y4uYE5xZnULf6UWi65pIPUEV9nAm3i0K5chjyC/6T
> VE2QagNg3FurXrzeSMJkMrTuwIW+rWHkOifQMtnJb3HwqmdhidZlErXh7Sz5qiDv
> 0GMqjcEjpFK0ahrmDK4Nv675HitPOQN0R9V+sYhveKeRXe43CcoIUvk6yTlLN42Q
> oxl8HFLYxvZ4y+BlHuHO2sfVn6GJyO55sZWyk6k5BGVFT5RSCAjYME9jtCuSk3U=
> =RIIH
> -----END PGP SIGNATURE-----
>

Re: Cluster autoscaling in Spark+Mesos ?

Posted by Ankur Chauhan <an...@malloc64.com>.
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Hi,

@Sharma - Is Mantis/Fenzo available on GitHub or somewhere? I did find
some Maven artifacts, but the repository netflix/fenzo is a 404. I am
interested in learning about the bin-packing logic of Fenzo.

- -- Ankur Chauhan

On 04/06/2015 22:35, Sharma Podila wrote:
> We Autoscale our Mesos cluster in EC2 from within our framework.
> Scaling up can be easy via watching demand Vs supply. However,
> scaling down requires bin packing the tasks tightly onto as few
> servers as possible. Do you have any specific ideas on how you
> would leverage Mantis/Mesos for Spark based jobs? Fenzo, the
> scheduler part of Mantis, could be another point of leverage, which
> could give a framework the ability to autoscale the cluster among
> other benefits.
> 
> 
> 
> On Thu, Jun 4, 2015 at 1:06 PM, Dmitry Goldenberg 
> <dgoldenberg123@gmail.com <ma...@gmail.com>>
> wrote:
> 
> Thanks, Vinod. I'm really interested in how we could leverage 
> something like Mantis and Mesos to achieve autoscaling in a 
> Spark-based data processing system...
> 
> On Jun 4, 2015, at 3:54 PM, Vinod Kone <vinodkone@gmail.com 
> <ma...@gmail.com>> wrote:
> 
>> Hey Dmitry. At the current time there is no built-in support for 
>> Mesos to autoscale nodes in the cluster. I've heard people 
>> (Netflix?) do it out of band on EC2.
>> 
>> On Thu, Jun 4, 2015 at 9:08 AM, Dmitry Goldenberg 
>> <dgoldenberg123@gmail.com <ma...@gmail.com>>
>> wrote:
>> 
>> A Mesos noob here. Could someone point me at the doc or summary
>> for the cluster autoscaling capabilities in Mesos?
>> 
>> Is there a way to feed it events and have it detect the need to
>> bring in more machines or decommission machines?  Is there a way
>> to receive events back that notify you that machines have been
>> allocated or decommissioned?
>> 
>> Would this work within a certain set of 
>> "preallocated"/pre-provisioned/"stand-by" machines or will Mesos
>> go and grab machines from the cloud?
>> 
>> What are the integration points of Apache Spark and Mesos? What
>> are the true advantages of running Spark on Mesos?
>> 
>> Can Mesos autoscale the cluster based on some signals/events 
>> coming out of Spark runtime or Spark consumers, then cause the 
>> consumers to run on the updated cluster, or signal to the 
>> consumers to restart themselves into an updated cluster?
>> 
>> Thanks.
>> 
>> 
> 
-----BEGIN PGP SIGNATURE-----

iQEcBAEBAgAGBQJVcUoqAAoJEOSJAMhvLp3LjYIIAK9pgU41hU3Dbn5tlVWxTK7y
knsVOnVYiuA43DwDUTXgUUFNl67wMR0DAcueSPtUkXRfyWcgGtwDJfsF1R1vdlrN
kAiSEVxOSnRb9Gg35HVjAE4Y4uYE5xZnULf6UWi65pIPUEV9nAm3i0K5chjyC/6T
VE2QagNg3FurXrzeSMJkMrTuwIW+rWHkOifQMtnJb3HwqmdhidZlErXh7Sz5qiDv
0GMqjcEjpFK0ahrmDK4Nv675HitPOQN0R9V+sYhveKeRXe43CcoIUvk6yTlLN42Q
oxl8HFLYxvZ4y+BlHuHO2sfVn6GJyO55sZWyk6k5BGVFT5RSCAjYME9jtCuSk3U=
=RIIH
-----END PGP SIGNATURE-----

Re: Cluster autoscaling in Spark+Mesos ?

Posted by zhou weitao <zh...@gmail.com>.
Awesome idea, Alex. I was thinking something similar when I read the
volatility-feedback part of the book "Out of Control".

2015-06-06 10:24 GMT+08:00 Alex Gaudio <ad...@gmail.com>:

> Thanks James!  I'd love to talk with you further if you have any
> interesting or specific problems an algorithm like Relay's might solve.
>
> There are all sorts of possible extension problems I haven't spent time
> thinking about, such as:
> -  scaling services rather than tasks
> -  more accurately predicting non-linear changes to the metric
> -  using Relay's algorithm as a feedback mechanism to optimize machine
> learning problems.
>
> If you're going to Mesoscon, we could catch up then!
>
> Alex
>
>
> On Fri, Jun 5, 2015 at 11:39 AM CCAAT <cc...@tampabay.rr.com> wrote:
>
>> On 06/05/2015 10:09 AM, Alex Gaudio wrote:
>> > Hi @Ankur,
>>
>> > Next, we built Relay <https://github.com/sailthru/relay> and, the Mesos
>> > extension, Relay.Mesos <https://github.com/sailthru/relay.mesos>, to
>> > convert our small scripts into long-running instances we could then put
>> > on Marathon.
>> >>                 Thanks.
>>
>>
>>  From the first reference page:
>> " This type of problem is often quite complex, and there is a field
>> called Control Theory dedicated to problems "
>>
>> Oh WOW! Somebody else that understands controls.....
>>
>> I was going to suggest some code to build a hybrid Feedback (PID based
>> on resource utilizations) + Feedforward (based on chronologically
>> repetitive events) to solve this problem, because it's quite easy to
>> also integrate relay boards to boot-up/shut-down resources, if systems
>> are properly configured for such up/down cycling of hardware. There is
>> also legacy boot/shutdown controls for resources via PXE and such ether
>> based hardware tricks.
>>
>> I think using concepts of process control applied to specific
>> computational resources, such as ram, cpu, etc etc, also needs to be
>> addressed with autoscaling.  Quick glance at your projects and you
>> are well underway.
>>
>> Hint: You might want to explain some fundamentals of "Control Theory",
>> as I'm not sure this collective is aware of such simple, beautiful
>> and robust mathematics.
>>
>>
>> PS: avoid Nyquist stability, for now, but bode plot would be keen!
>>
>>
>> hth,
>> James
>>
>>
>>
>>

Re: Cluster autoscaling in Spark+Mesos ?

Posted by Alex Gaudio <ad...@gmail.com>.
Thanks James!  I'd love to talk with you further if you have any
interesting or specific problems an algorithm like Relay's might solve.

There are all sorts of possible extension problems I haven't spent time
thinking about, such as:
-  scaling services rather than tasks
-  more accurately predicting non-linear changes to the metric
-  using Relay's algorithm as a feedback mechanism to optimize machine
learning problems.

If you're going to Mesoscon, we could catch up then!

Alex


On Fri, Jun 5, 2015 at 11:39 AM CCAAT <cc...@tampabay.rr.com> wrote:

> On 06/05/2015 10:09 AM, Alex Gaudio wrote:
> > Hi @Ankur,
>
> > Next, we built Relay <https://github.com/sailthru/relay> and, the Mesos
> > extension, Relay.Mesos <https://github.com/sailthru/relay.mesos>, to
> > convert our small scripts into long-running instances we could then put
> > on Marathon.
> >>                 Thanks.
>
>
>  From the first reference page:
> " This type of problem is often quite complex, and there is a field
> called Control Theory dedicated to problems "
>
> Oh WOW! Somebody else that understands controls.....
>
> I was going to suggest some code to build a hybrid Feedback (PID based
> on resource utilizations) + Feedforward (based on chronologically
> repetitive events) to solve this problem, because it's quite easy to
> also integrate relay boards to boot-up/shut-down resources, if systems
> are properly configured for such up/down cycling of hardware. There is
> also legacy boot/shutdown controls for resources via PXE and such ether
> based hardware tricks.
>
> I think using concepts of process control applied to specific
> computational resources, such as ram, cpu, etc etc, also needs to be
> addressed with autoscaling.  Quick glance at your projects and you
> are well underway.
>
> Hint: You might want to explain some fundamentals of "Control Theory",
> as I'm not sure this collective is aware of such simple, beautiful
> and robust mathematics.
>
>
> PS: avoid Nyquist stability, for now, but bode plot would be keen!
>
>
> hth,
> James
>
>
>
>

Re: Cluster autoscaling in Spark+Mesos ?

Posted by CCAAT <cc...@tampabay.rr.com>.
On 06/05/2015 10:09 AM, Alex Gaudio wrote:
> Hi @Ankur,

> Next, we built Relay <https://github.com/sailthru/relay> and, the Mesos
> extension, Relay.Mesos <https://github.com/sailthru/relay.mesos>, to
> convert our small scripts into long-running instances we could then put
> on Marathon.
>>                 Thanks.


From the first reference page:
"This type of problem is often quite complex, and there is a field
called Control Theory dedicated to problems"

Oh WOW! Somebody else that understands controls.....

I was going to suggest some code to build a hybrid Feedback (PID based
on resource utilizations) + Feedforward (based on chronologically
repetitive events) controller to solve this problem, because it's quite
easy to also integrate relay boards to boot up/shut down resources, if
systems are properly configured for such up/down cycling of hardware.
There are also legacy boot/shutdown controls for resources via PXE and
similar Ethernet-based hardware tricks.
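As a concrete illustration of the hybrid James proposes, here is a minimal sketch in Python (the class name, gains, and schedule are illustrative assumptions, not code from any project mentioned in this thread):

```python
# Sketch: a hybrid feedback + feedforward sizing signal. The PID term
# reacts to measured CPU utilization error; the feedforward term
# anticipates chronologically repetitive load (e.g. a nightly batch).

class HybridScaler:
    def __init__(self, target_util=0.6, kp=10.0, ki=0.5, kd=2.0):
        self.target = target_util
        self.kp, self.ki, self.kd = kp, ki, kd
        self.integral = 0.0
        self.prev_error = 0.0

    def feedback(self, measured_util):
        """PID on utilization error; positive output => add capacity."""
        error = measured_util - self.target
        self.integral += error
        derivative = error - self.prev_error
        self.prev_error = error
        return self.kp * error + self.ki * self.integral + self.kd * derivative

    def feedforward(self, hour, schedule):
        """Anticipated extra hosts for known repetitive load at this hour."""
        return schedule.get(hour, 0)

    def desired_delta(self, measured_util, hour, schedule):
        """Host count change: feedback correction plus feedforward term."""
        return round(self.feedback(measured_util) + self.feedforward(hour, schedule))

scaler = HybridScaler()
# Under-target utilization, no scheduled burst: negative delta => scale down.
print(scaler.desired_delta(0.3, hour=3, schedule={20: 5}))
```

A real deployment would also need the cooldown/hysteresis discussed elsewhere in this thread to avoid oscillation, which is exactly the stability concern James alludes to.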

I think using concepts of process control applied to specific
computational resources, such as RAM, CPU, etc., also needs to be
addressed with autoscaling. A quick glance at your projects shows you
are well underway.

Hint: You might want to explain some fundamentals of "Control Theory",
as I'm not sure this collective is aware of such simple, beautiful
and robust mathematics.


PS: avoid Nyquist stability for now, but a Bode plot would be keen!


hth,
James




Re: Cluster autoscaling in Spark+Mesos ?

Posted by Alex Gaudio <ad...@gmail.com>.
Hi @Ankur,

It might be a bit late in the conversation to respond, but we had almost
exactly the same questions you did, so I thought I might share our story.

We originally needed to build a data science infrastructure and chose to do
it on top of Mesos. We first tried for a few months to use Spark to run
all of our applications. We quickly ran into all sorts of issues and
challenges (from solving confusing bugs to getting creative with how Spark
partitions things), and then realized that many of these problems were
more easily solved with simple scripts. After a couple of months, we
mostly moved off of Spark (though we're certainly excited to put future
applications on Spark - it is an awesome project and has a lot of promise!).

The next thing we tried was to scale thousands of instances of our simple
scripts on Marathon (with the idea of building some sort of very simple
autoscaler to manage them). We found that Marathon could not handle the
turnover of so many small tasks: it became unresponsive, got confused
about task state, and (I think) occasionally lost track of some tasks. On
the other hand, it works quite well for long-running applications, so we
use it for that.

Next, we built Relay <https://github.com/sailthru/relay> and its Mesos
extension, Relay.Mesos <https://github.com/sailthru/relay.mesos>, to
convert our small scripts into long-running instances we could then put on
Marathon. Relay.Mesos auto-scales thousands of instances of small scripts
on Mesos, and we can just run multiple Relay instances for better fault
tolerance. We're very happy with this simple little tool! It doesn't try
to solve a bin-packing (or task-clustering) problem; it attempts only to
solve the auto-scaling problem.

Finally, to deal with auto-scaling the Mesos nodes in the cluster, we use
EC2 Auto Scaling policies <http://aws.amazon.com/autoscaling/> based on
aggregate metrics (like CPU usage) reported from our Mesos slaves. The
scaling policies simply spin up (or tear down) Mesos slave instances that
then auto-register themselves into the cluster and are ready to run. We
currently track just CPU usage, and that seems to work well enough that we
haven't needed to invest more time in it.
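The EC2 side of this can be sketched with two AWS CLI calls (the group name, policy name, thresholds, and ARN below are placeholders, not Alex's actual configuration):

```shell
# Sketch: a scale-up policy on the slaves' Auto Scaling group, triggered
# by a CloudWatch alarm on the group's average CPU utilization.
aws autoscaling put-scaling-policy \
  --auto-scaling-group-name mesos-slaves \
  --policy-name cpu-scale-up \
  --adjustment-type ChangeInCapacity \
  --scaling-adjustment 1 \
  --cooldown 1800

aws cloudwatch put-metric-alarm \
  --alarm-name mesos-slaves-high-cpu \
  --namespace AWS/EC2 \
  --metric-name CPUUtilization \
  --dimensions Name=AutoScalingGroupName,Value=mesos-slaves \
  --statistic Average \
  --period 300 \
  --evaluation-periods 3 \
  --threshold 85 \
  --comparison-operator GreaterThanThreshold \
  --alarm-actions arn:aws:autoscaling:...:policy/cpu-scale-up
```

A mirror-image policy and alarm (scaling-adjustment -1, a low threshold) would handle scale-down.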

The key strategy we've adopted through this process is to make each tool we
use solve one very specific and well-understood problem. I have found that
trying to use "one tool to rule them all" is not practical because we have
a lot of different kinds of problems to solve. On the other hand, we now
have a zillion different tools. At first the tools are overwhelming to new
employees, but our group seems quite comfortable with all of them. I
believe this has also created a culture where it's encouraged, and
exciting, to try new things all the time.

I hope that was helpful, or if off the mark, at least interesting.

Alex

On Fri, Jun 5, 2015 at 4:37 AM Tim Chen <ti...@mesosphere.io> wrote:

> Hi Sharma,
>
> What metrics do you watch for demand and supply for Spark? Do you just
> watch node resources or you actually look at some Spark JMX stats?
>
> Tim
>
> On Thu, Jun 4, 2015 at 10:35 PM, Sharma Podila <sp...@netflix.com>
> wrote:
>
>> We Autoscale our Mesos cluster in EC2 from within our framework. Scaling
>> up can be easy via watching demand Vs supply. However, scaling down
>> requires bin packing the tasks tightly onto as few servers as possible.
>> Do you have any specific ideas on how you would leverage Mantis/Mesos for
>> Spark based jobs? Fenzo, the scheduler part of Mantis, could be another
>> point of leverage, which could give a framework the ability to autoscale
>> the cluster among other benefits.
>>
>>
>>
>> On Thu, Jun 4, 2015 at 1:06 PM, Dmitry Goldenberg <
>> dgoldenberg123@gmail.com> wrote:
>>
>>> Thanks, Vinod. I'm really interested in how we could leverage something
>>> like Mantis and Mesos to achieve autoscaling in a Spark-based data
>>> processing system...
>>>
>>> On Jun 4, 2015, at 3:54 PM, Vinod Kone <vi...@gmail.com> wrote:
>>>
>>> Hey Dmitry. At the current time there is no built-in support for Mesos
>>> to autoscale nodes in the cluster. I've heard people (Netflix?) do it out
>>> of band on EC2.
>>>
>>> On Thu, Jun 4, 2015 at 9:08 AM, Dmitry Goldenberg <
>>> dgoldenberg123@gmail.com> wrote:
>>>
>>>> A Mesos noob here. Could someone point me at the doc or summary for the
>>>> cluster autoscaling capabilities in Mesos?
>>>>
>>>> Is there a way to feed it events and have it detect the need to bring
>>>> in more machines or decommission machines?  Is there a way to receive
>>>> events back that notify you that machines have been allocated or
>>>> decommissioned?
>>>>
>>>> Would this work within a certain set of
>>>> "preallocated"/pre-provisioned/"stand-by" machines or will Mesos go and
>>>> grab machines from the cloud?
>>>>
>>>> What are the integration points of Apache Spark and Mesos?  What are
>>>> the true advantages of running Spark on Mesos?
>>>>
>>>> Can Mesos autoscale the cluster based on some signals/events coming out
>>>> of Spark runtime or Spark consumers, then cause the consumers to run on the
>>>> updated cluster, or signal to the consumers to restart themselves into an
>>>> updated cluster?
>>>>
>>>> Thanks.
>>>>
>>>
>>>
>>
>

Re: Cluster autoscaling in Spark+Mesos ?

Posted by Sharma Podila <sp...@netflix.com>.
> What metrics do you watch for demand and supply for Spark? Do you just
> watch node resources or you actually look at some Spark JMX stats?
>
> Tim

Sorry about the confusion. Here I was referring to us autoscaling within
our own framework, Mantis, not for Spark.
The scheduler knows demand vs. supply from its own assignments and is
therefore able to determine the idle resources, etc. You can find some
details in
http://www.slideshare.net/spodila/aws-reinvent-2014-talk-scheduling-using-apache-mesos-in-the-cloud
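The bookkeeping Sharma describes can be sketched as follows (a hypothetical simplification, not Mantis/Fenzo code; all names and the host size are assumptions): the scheduler already tracks each agent's capacity and its own assignments, so idle supply and pending demand fall out directly.

```python
import math

def scaling_signal(agents, assignments, pending_cpus, cpus_per_host=8):
    """agents: {agent_id: total_cpus}; assignments: {agent_id: cpus in use};
    pending_cpus: CPU demand of tasks awaiting placement.
    Returns hosts to add (positive) or fully idle hosts removable (negative)."""
    idle = sum(total - assignments.get(a, 0.0) for a, total in agents.items())
    if pending_cpus > idle:
        # Demand exceeds idle supply: request enough hosts to cover the gap.
        return math.ceil((pending_cpus - idle) / cpus_per_host)
    # Scale down only by hosts that are completely unassigned.
    empty = sum(1 for a in agents if assignments.get(a, 0.0) == 0.0)
    return -empty

agents = {"a1": 8, "a2": 8, "a3": 8}
# 12 idle CPUs but 20 pending: 8 CPUs short, so request one more host.
print(scaling_signal(agents, {"a1": 8, "a2": 4}, pending_cpus=20))
```

The scale-down branch is deliberately conservative: as the earlier messages note, shrinking below "fully idle hosts" requires bin-packing tasks onto fewer servers first.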



On Fri, Jun 5, 2015 at 1:37 AM, Tim Chen <ti...@mesosphere.io> wrote:

> Hi Sharma,
>
>
> What metrics do you watch for demand and supply for Spark? Do you just
> watch node resources or you actually look at some Spark JMX stats?
>
> Tim
>
> On Thu, Jun 4, 2015 at 10:35 PM, Sharma Podila <sp...@netflix.com>
> wrote:
>
>> We Autoscale our Mesos cluster in EC2 from within our framework. Scaling
>> up can be easy via watching demand Vs supply. However, scaling down
>> requires bin packing the tasks tightly onto as few servers as possible.
>> Do you have any specific ideas on how you would leverage Mantis/Mesos for
>> Spark based jobs? Fenzo, the scheduler part of Mantis, could be another
>> point of leverage, which could give a framework the ability to autoscale
>> the cluster among other benefits.
>>
>>
>>
>> On Thu, Jun 4, 2015 at 1:06 PM, Dmitry Goldenberg <
>> dgoldenberg123@gmail.com> wrote:
>>
>>> Thanks, Vinod. I'm really interested in how we could leverage something
>>> like Mantis and Mesos to achieve autoscaling in a Spark-based data
>>> processing system...
>>>
>>> On Jun 4, 2015, at 3:54 PM, Vinod Kone <vi...@gmail.com> wrote:
>>>
>>> Hey Dmitry. At the current time there is no built-in support for Mesos
>>> to autoscale nodes in the cluster. I've heard people (Netflix?) do it out
>>> of band on EC2.
>>>
>>> On Thu, Jun 4, 2015 at 9:08 AM, Dmitry Goldenberg <
>>> dgoldenberg123@gmail.com> wrote:
>>>
>>>> A Mesos noob here. Could someone point me at the doc or summary for the
>>>> cluster autoscaling capabilities in Mesos?
>>>>
>>>> Is there a way to feed it events and have it detect the need to bring
>>>> in more machines or decommission machines?  Is there a way to receive
>>>> events back that notify you that machines have been allocated or
>>>> decommissioned?
>>>>
>>>> Would this work within a certain set of
>>>> "preallocated"/pre-provisioned/"stand-by" machines or will Mesos go and
>>>> grab machines from the cloud?
>>>>
>>>> What are the integration points of Apache Spark and Mesos?  What are
>>>> the true advantages of running Spark on Mesos?
>>>>
>>>> Can Mesos autoscale the cluster based on some signals/events coming out
>>>> of Spark runtime or Spark consumers, then cause the consumers to run on the
>>>> updated cluster, or signal to the consumers to restart themselves into an
>>>> updated cluster?
>>>>
>>>> Thanks.
>>>>
>>>
>>>
>>
>

Re: Cluster autoscaling in Spark+Mesos ?

Posted by Tim Chen <ti...@mesosphere.io>.
Hi Sharma,

What metrics do you watch for demand and supply for Spark? Do you just
watch node resources or you actually look at some Spark JMX stats?

Tim

On Thu, Jun 4, 2015 at 10:35 PM, Sharma Podila <sp...@netflix.com> wrote:

> We Autoscale our Mesos cluster in EC2 from within our framework. Scaling
> up can be easy via watching demand Vs supply. However, scaling down
> requires bin packing the tasks tightly onto as few servers as possible.
> Do you have any specific ideas on how you would leverage Mantis/Mesos for
> Spark based jobs? Fenzo, the scheduler part of Mantis, could be another
> point of leverage, which could give a framework the ability to autoscale
> the cluster among other benefits.
>
>
>
> On Thu, Jun 4, 2015 at 1:06 PM, Dmitry Goldenberg <
> dgoldenberg123@gmail.com> wrote:
>
>> Thanks, Vinod. I'm really interested in how we could leverage something
>> like Mantis and Mesos to achieve autoscaling in a Spark-based data
>> processing system...
>>
>> On Jun 4, 2015, at 3:54 PM, Vinod Kone <vi...@gmail.com> wrote:
>>
>> Hey Dmitry. At the current time there is no built-in support for Mesos to
>> autoscale nodes in the cluster. I've heard people (Netflix?) do it out of
>> band on EC2.
>>
>> On Thu, Jun 4, 2015 at 9:08 AM, Dmitry Goldenberg <
>> dgoldenberg123@gmail.com> wrote:
>>
>>> A Mesos noob here. Could someone point me at the doc or summary for the
>>> cluster autoscaling capabilities in Mesos?
>>>
>>> Is there a way to feed it events and have it detect the need to bring in
>>> more machines or decommission machines?  Is there a way to receive events
>>> back that notify you that machines have been allocated or decommissioned?
>>>
>>> Would this work within a certain set of
>>> "preallocated"/pre-provisioned/"stand-by" machines or will Mesos go and
>>> grab machines from the cloud?
>>>
>>> What are the integration points of Apache Spark and Mesos?  What are the
>>> true advantages of running Spark on Mesos?
>>>
>>> Can Mesos autoscale the cluster based on some signals/events coming out
>>> of Spark runtime or Spark consumers, then cause the consumers to run on the
>>> updated cluster, or signal to the consumers to restart themselves into an
>>> updated cluster?
>>>
>>> Thanks.
>>>
>>
>>
>

Re: Cluster autoscaling in Spark+Mesos ?

Posted by Sharma Podila <sp...@netflix.com>.
We autoscale our Mesos cluster in EC2 from within our framework. Scaling up
can be easy by watching demand vs. supply. However, scaling down requires
bin-packing the tasks tightly onto as few servers as possible.
Do you have any specific ideas on how you would leverage Mantis/Mesos for
Spark-based jobs? Fenzo, the scheduler part of Mantis, could be another
point of leverage, one that could give a framework the ability to autoscale
the cluster, among other benefits.
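As a sketch of the scale-down side (an illustrative first-fit-decreasing heuristic, not Fenzo's actual algorithm): packing task CPU demands tightly onto as few hosts as possible is what identifies hosts that can be drained and terminated.

```python
# Sketch: first-fit-decreasing bin packing of task CPU demands onto
# hosts of fixed capacity. Hosts left out of the packing are candidates
# for removal when scaling down.

def first_fit_decreasing(task_cpus, host_capacity):
    """Pack tasks (CPU demands) onto the fewest hosts of the given capacity.
    Returns a list of hosts, each a list of the task demands placed on it."""
    hosts = []
    for demand in sorted(task_cpus, reverse=True):
        for host in hosts:
            if sum(host) + demand <= host_capacity:
                host.append(demand)
                break
        else:
            hosts.append([demand])  # no existing host fits; open a new one
    return hosts

packing = first_fit_decreasing([4, 2, 2, 1, 3, 4], host_capacity=8)
print(len(packing))  # 16 CPUs of demand fits on 2 eight-CPU hosts
```

In practice scale-down also has to respect constraints this ignores (task affinity, data locality, the cost of migrating running tasks), which is presumably where a scheduler library like Fenzo earns its keep.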



On Thu, Jun 4, 2015 at 1:06 PM, Dmitry Goldenberg <dg...@gmail.com>
wrote:

> Thanks, Vinod. I'm really interested in how we could leverage something
> like Mantis and Mesos to achieve autoscaling in a Spark-based data
> processing system...
>
> On Jun 4, 2015, at 3:54 PM, Vinod Kone <vi...@gmail.com> wrote:
>
> Hey Dmitry. At the current time there is no built-in support for Mesos to
> autoscale nodes in the cluster. I've heard people (Netflix?) do it out of
> band on EC2.
>
> On Thu, Jun 4, 2015 at 9:08 AM, Dmitry Goldenberg <
> dgoldenberg123@gmail.com> wrote:
>
>> A Mesos noob here. Could someone point me at the doc or summary for the
>> cluster autoscaling capabilities in Mesos?
>>
>> Is there a way to feed it events and have it detect the need to bring in
>> more machines or decommission machines?  Is there a way to receive events
>> back that notify you that machines have been allocated or decommissioned?
>>
>> Would this work within a certain set of
>> "preallocated"/pre-provisioned/"stand-by" machines or will Mesos go and
>> grab machines from the cloud?
>>
>> What are the integration points of Apache Spark and Mesos?  What are the
>> true advantages of running Spark on Mesos?
>>
>> Can Mesos autoscale the cluster based on some signals/events coming out
>> of Spark runtime or Spark consumers, then cause the consumers to run on the
>> updated cluster, or signal to the consumers to restart themselves into an
>> updated cluster?
>>
>> Thanks.
>>
>
>

Re: Cluster autoscaling in Spark+Mesos ?

Posted by Dmitry Goldenberg <dg...@gmail.com>.
Thanks, Vinod. I'm really interested in how we could leverage something like Mantis and Mesos to achieve autoscaling in a Spark-based data processing system...

> On Jun 4, 2015, at 3:54 PM, Vinod Kone <vi...@gmail.com> wrote:
> 
> Hey Dmitry. At the current time there is no built-in support for Mesos to autoscale nodes in the cluster. I've heard people (Netflix?) do it out of band on EC2.
> 
>> On Thu, Jun 4, 2015 at 9:08 AM, Dmitry Goldenberg <dg...@gmail.com> wrote:
>> A Mesos noob here. Could someone point me at the doc or summary for the cluster autoscaling capabilities in Mesos?
>> 
>> Is there a way to feed it events and have it detect the need to bring in more machines or decommission machines?  Is there a way to receive events back that notify you that machines have been allocated or decommissioned?
>> 
>> Would this work within a certain set of "preallocated"/pre-provisioned/"stand-by" machines or will Mesos go and grab machines from the cloud?
>> 
>> What are the integration points of Apache Spark and Mesos?  What are the true advantages of running Spark on Mesos?
>> 
>> Can Mesos autoscale the cluster based on some signals/events coming out of Spark runtime or Spark consumers, then cause the consumers to run on the updated cluster, or signal to the consumers to restart themselves into an updated cluster?
>> 
>> Thanks.
> 

Re: Cluster autoscaling in Spark+Mesos ?

Posted by Tim Chen <ti...@mesosphere.io>.
Hi Dmitry,

That certainly can work; it just needs to coordinate the events you
mentioned and make sure they happen accordingly. Currently the Spark
scheduler is very job-agnostic and doesn't understand what Spark job it is
running. That's the next type of optimization I'd like to put on the
roadmap: a scheduler that understands the job type it's running and can
support certain actions depending on what it is.

Do you have a specific use case with which you could prototype this? We
can certainly make this happen on the Spark side.

Tim





On Thu, Jun 4, 2015 at 2:11 PM, Dmitry Goldenberg <dg...@gmail.com>
wrote:

> Tim,
>
> Aware of more resources - is that if it runs on Mesos or via any type of
> cluster manager?  Our thinking was that once we can determine that the
> cluster has changed, we could notify the streaming consumers to finish
> processing the current batch, then terminate, then resume streaming with a
> new instance of the Context.  Would that not cause Spark to refresh its
> awareness of the cluster resources?
>
> - Dmitry
>
> On Thu, Jun 4, 2015 at 5:03 PM, Tim Chen <ti...@mesosphere.io> wrote:
>
>> Spark is aware there are more resources by getting more resource offers
>> and using those new offers.
>>
>> I don't think there is a way to refresh the Spark context for streaming.
>>
>> Tim
>>
>> On Thu, Jun 4, 2015 at 1:59 PM, Dmitry Goldenberg <
>> dgoldenberg123@gmail.com> wrote:
>>
>>> Thanks, Ankur. I'd be curious to understand how the data exchange
>>> happens in this case. How does Spark become aware of the fact that machines
>>> have been added to the cluster or have been removed from it?  And then, do
>>> you have some mechanism to perhaps restart the Spark consumers into
>>> refreshed Spark context's which are aware of the new cluster topology?
>>>
>>> On Thu, Jun 4, 2015 at 4:23 PM, Ankur Chauhan <an...@malloc64.com>
>>> wrote:
>>>
>>>> -----BEGIN PGP SIGNED MESSAGE-----
>>>> Hash: SHA1
>>>>
>>>> AFAIK Mesos does not support host level auto-scaling because that is
>>>> not the scope of the mesos-master or mesos-slave. In EC2 (like in my
>>>> case) we have autoscaling groups set with cloudwatch metrics hooked up
>>>> to scaling policies. In our case, we have the following.
>>>> * Add 1 host per AZ when cpu load is > 85% for 15 mins continuously.
>>>> * Remove 1 host if the cpu load is < 15% for 15 mins continuously.
>>>> * Similar monitoring + scale-up/scale-down based on memory.
>>>>
>>>> All of these rules have a cooldown period of 30mins so that we don't
>>>> end-up scaling up/down too fast.
>>>>
>>>> Then again, our workload is bursty (spark on mesos in fine-grained
>>>> mode). So, the new resources get used up and tasks distribute pretty
>>>> fast. The above may not work in case you have long-running tasks (such
>>>> as marathon tasks) because they would not be redistributed till some
>>>> task restarting happens.
>>>>
>>>> - -- Ankur
>>>>
>>>> On 04/06/2015 13:13, Dmitry Goldenberg wrote:
>>>> > Would it be accurate to say that Mesos helps you optimize resource
>>>> > utilization out of a preset  pool of resources, presumably servers?
>>>> > And its level of autoscaling is within that pool?
>>>> >
>>>> >
>>>> > On Jun 4, 2015, at 3:54 PM, Vinod Kone <vinodkone@gmail.com
>>>> > <ma...@gmail.com>> wrote:
>>>> >
>>>> >> Hey Dmitry. At the current time there is no built-in support for
>>>> >> Mesos to autoscale nodes in the cluster. I've heard people
>>>> >> (Netflix?) do it out of band on EC2.
>>>> >>
>>>> >> On Thu, Jun 4, 2015 at 9:08 AM, Dmitry Goldenberg
>>>> >> <dgoldenberg123@gmail.com <ma...@gmail.com>>
>>>> >> wrote:
>>>> >>
>>>> >> A Mesos noob here. Could someone point me at the doc or summary
>>>> >> for the cluster autoscaling capabilities in Mesos?
>>>> >>
>>>> >> Is there a way to feed it events and have it detect the need to
>>>> >> bring in more machines or decommission machines?  Is there a way
>>>> >> to receive events back that notify you that machines have been
>>>> >> allocated or decommissioned?
>>>> >>
>>>> >> Would this work within a certain set of
>>>> >> "preallocated"/pre-provisioned/"stand-by" machines or will Mesos
>>>> >> go and grab machines from the cloud?
>>>> >>
>>>> >> What are the integration points of Apache Spark and Mesos?  What
>>>> >> are the true advantages of running Spark on Mesos?
>>>> >>
>>>> >> Can Mesos autoscale the cluster based on some signals/events
>>>> >> coming out of Spark runtime or Spark consumers, then cause the
>>>> >> consumers to run on the updated cluster, or signal to the
>>>> >> consumers to restart themselves into an updated cluster?
>>>> >>
>>>> >> Thanks.
>>>> >>
>>>> >>
>>>> -----BEGIN PGP SIGNATURE-----
>>>>
>>>> iQEcBAEBAgAGBQJVcLO2AAoJEOSJAMhvLp3LDuEH/1Bu3vhALR8+TPbsM5TscDOy
>>>> vFwyb+ACh8tKL2XoXPwBaMkXU5qPFGX9Wa5weDNCqcUqbvoZ6G9ScrXbpTpWVFTn
>>>> n240CxKGMqplgelDZmQAlixlPB8jUi9ZUfn6Z4FjuPUz1scLSyIOATxh57z0qRyp
>>>> kdbS3pcU5ZmS9N/CHwNGOI9qwk7ebA1HPLqkRnBJLHKXJ6savW4FbANYb8OLWcAM
>>>> It2GzbyAdrMMs7dgeaaEPnvwqnF5nSf2aERA9EjFyxBhJMgKidlUxFSxvMTD1jkx
>>>> xjMZJeeVDqVsdZWtJkNwNsjXQG7X7f2bWY14rDL4XM59X8XCLnxkODRMTeGjXBM=
>>>> =cHZK
>>>> -----END PGP SIGNATURE-----
>>>>
>>>
>>>
>>
>

Re: Cluster autoscaling in Spark+Mesos ?

Posted by Dmitry Goldenberg <dg...@gmail.com>.
Tim,

Aware of more resources: is that the case only when it runs on Mesos, or with
any type of cluster manager? Our thinking was that once we can determine that the
cluster has changed, we could notify the streaming consumers to finish
processing the current batch, then terminate, then resume streaming with a
new instance of the Context.  Would that not cause Spark to refresh its
awareness of the cluster resources?

- Dmitry
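
The "finish the batch, then restart into a refreshed context" idea above can be
sketched roughly as follows. This is only an illustration of the coordination
logic: `FakeStreamingContext` is a hypothetical stand-in for a real Spark
`StreamingContext`, and the graceful-stop call mirrors (but does not claim to
reproduce) Spark Streaming's `ssc.stop(stopSparkContext=True,
stopGraceFully=True)` behavior.

```python
# Sketch of restarting a streaming consumer whenever the cluster topology
# changes. FakeStreamingContext stands in for a Spark StreamingContext;
# real code would stop gracefully so the in-flight micro-batch completes.

class FakeStreamingContext:
    """Hypothetical stand-in for pyspark.streaming.StreamingContext."""
    generation = 0

    def __init__(self):
        FakeStreamingContext.generation += 1
        self.id = FakeStreamingContext.generation
        self.running = True

    def stop(self, stop_spark_context=True, stop_gracefully=True):
        # A graceful stop lets the current batch finish before shutdown.
        self.running = False


def run_with_restarts(cluster_change_events):
    """Restart the context whenever the cluster is resized.

    cluster_change_events: one boolean per poll interval; True means
    "the cluster changed, restart into a fresh context".
    Returns the ids of the contexts used, in order.
    """
    used = []
    ssc = FakeStreamingContext()
    used.append(ssc.id)
    for changed in cluster_change_events:
        if changed:
            # Finish the in-flight batch, tear down, and come back up so the
            # new context registers against the resized cluster.
            ssc.stop(stop_spark_context=True, stop_gracefully=True)
            ssc = FakeStreamingContext()
            used.append(ssc.id)
    ssc.stop()
    return used
```

For example, two cluster-change events over four polling intervals would cycle
through three context instances in total.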

On Thu, Jun 4, 2015 at 5:03 PM, Tim Chen <ti...@mesosphere.io> wrote:

> Spark is aware there are more resources by getting more resource offers
> and using those new offers.
>
> I don't think there is a way to refresh the Spark context for streaming.
>
> Tim
>
> On Thu, Jun 4, 2015 at 1:59 PM, Dmitry Goldenberg <
> dgoldenberg123@gmail.com> wrote:
>
>> Thanks, Ankur. I'd be curious to understand how the data exchange happens
>> in this case. How does Spark become aware of the fact that machines have
>> been added to the cluster or have been removed from it?  And then, do you
>> have some mechanism to perhaps restart the Spark consumers into refreshed
>> Spark context's which are aware of the new cluster topology?
>>
>> On Thu, Jun 4, 2015 at 4:23 PM, Ankur Chauhan <an...@malloc64.com> wrote:
>>
>>> -----BEGIN PGP SIGNED MESSAGE-----
>>> Hash: SHA1
>>>
>>> AFAIK Mesos does not support host level auto-scaling because that is
>>> not the scope of the mesos-master or mesos-slave. In EC2 (like in my
>>> case) we have autoscaling groups set with cloudwatch metrics hooked up
>>> to scaling policies. In our case, we have the following.
>>> * Add 1 host per AZ when cpu load is > 85% for 15 mins continuously.
>>> * Remove 1 host if the cpu load is < 15% for 15 mins continuously.
>>> * Similar monitoring + scale-up/scale-down based on memory.
>>>
>>> All of these rules have a cooldown period of 30mins so that we don't
>>> end-up scaling up/down too fast.
>>>
>>> Then again, our workload is bursty (spark on mesos in fine-grained
>>> mode). So, the new resources get used up and tasks distribute pretty
>>> fast. The above may not work in case you have long-running tasks (such
>>> as marathon tasks) because they would not be redistributed till some
>>> task restarting happens.
>>>
>>> - -- Ankur
>>>
>>> On 04/06/2015 13:13, Dmitry Goldenberg wrote:
>>> > Would it be accurate to say that Mesos helps you optimize resource
>>> > utilization out of a preset  pool of resources, presumably servers?
>>> > And its level of autoscaling is within that pool?
>>> >
>>> >
>>> > On Jun 4, 2015, at 3:54 PM, Vinod Kone <vinodkone@gmail.com
>>> > <ma...@gmail.com>> wrote:
>>> >
>>> >> Hey Dmitry. At the current time there is no built-in support for
>>> >> Mesos to autoscale nodes in the cluster. I've heard people
>>> >> (Netflix?) do it out of band on EC2.
>>> >>
>>> >> On Thu, Jun 4, 2015 at 9:08 AM, Dmitry Goldenberg
>>> >> <dgoldenberg123@gmail.com <ma...@gmail.com>>
>>> >> wrote:
>>> >>
>>> >> A Mesos noob here. Could someone point me at the doc or summary
>>> >> for the cluster autoscaling capabilities in Mesos?
>>> >>
>>> >> Is there a way to feed it events and have it detect the need to
>>> >> bring in more machines or decommission machines?  Is there a way
>>> >> to receive events back that notify you that machines have been
>>> >> allocated or decommissioned?
>>> >>
>>> >> Would this work within a certain set of
>>> >> "preallocated"/pre-provisioned/"stand-by" machines or will Mesos
>>> >> go and grab machines from the cloud?
>>> >>
>>> >> What are the integration points of Apache Spark and Mesos?  What
>>> >> are the true advantages of running Spark on Mesos?
>>> >>
>>> >> Can Mesos autoscale the cluster based on some signals/events
>>> >> coming out of Spark runtime or Spark consumers, then cause the
>>> >> consumers to run on the updated cluster, or signal to the
>>> >> consumers to restart themselves into an updated cluster?
>>> >>
>>> >> Thanks.
>>> >>
>>> >>
>>> -----BEGIN PGP SIGNATURE-----
>>>
>>> iQEcBAEBAgAGBQJVcLO2AAoJEOSJAMhvLp3LDuEH/1Bu3vhALR8+TPbsM5TscDOy
>>> vFwyb+ACh8tKL2XoXPwBaMkXU5qPFGX9Wa5weDNCqcUqbvoZ6G9ScrXbpTpWVFTn
>>> n240CxKGMqplgelDZmQAlixlPB8jUi9ZUfn6Z4FjuPUz1scLSyIOATxh57z0qRyp
>>> kdbS3pcU5ZmS9N/CHwNGOI9qwk7ebA1HPLqkRnBJLHKXJ6savW4FbANYb8OLWcAM
>>> It2GzbyAdrMMs7dgeaaEPnvwqnF5nSf2aERA9EjFyxBhJMgKidlUxFSxvMTD1jkx
>>> xjMZJeeVDqVsdZWtJkNwNsjXQG7X7f2bWY14rDL4XM59X8XCLnxkODRMTeGjXBM=
>>> =cHZK
>>> -----END PGP SIGNATURE-----
>>>
>>
>>
>

Re: Cluster autoscaling in Spark+Mesos ?

Posted by Ankur Chauhan <an...@malloc64.com>.
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Yes, breaking computations into smaller, simpler jobs that run often would
generally be another way, but Spark will consume resource offers as provided
by Mesos without any problems or extra effort.

- -- Ankur

On 04/06/2015 14:03, Tim Chen wrote:
> Spark is aware there are more resources by getting more resource
> offers and using those new offers.
> 
> I don't think there is a way to refresh the Spark context for
> streaming.
> 
> Tim
> 
> On Thu, Jun 4, 2015 at 1:59 PM, Dmitry Goldenberg 
> <dgoldenberg123@gmail.com <ma...@gmail.com>>
> wrote:
> 
> Thanks, Ankur. I'd be curious to understand how the data exchange 
> happens in this case. How does Spark become aware of the fact that 
> machines have been added to the cluster or have been removed from 
> it?  And then, do you have some mechanism to perhaps restart the 
> Spark consumers into refreshed Spark context's which are aware of 
> the new cluster topology?
> 
> On Thu, Jun 4, 2015 at 4:23 PM, Ankur Chauhan <ankur@malloc64.com 
> <ma...@malloc64.com>> wrote:
> 
> AFAIK Mesos does not support host level auto-scaling because that
> is not the scope of the mesos-master or mesos-slave. In EC2 (like
> in my case) we have autoscaling groups set with cloudwatch metrics 
> hooked up to scaling policies. In our case, we have the following. 
> * Add 1 host per AZ when cpu load is > 85% for 15 mins
> continuously. * Remove 1 host if the cpu load is < 15% for 15 mins
> continuously. * Similar monitoring + scale-up/scale-down based on
> memory.
> 
> All of these rules have a cooldown period of 30mins so that we
> don't end-up scaling up/down too fast.
> 
> Then again, our workload is bursty (spark on mesos in fine-grained 
> mode). So, the new resources get used up and tasks distribute
> pretty fast. The above may not work in case you have long-running
> tasks (such as marathon tasks) because they would not be
> redistributed till some task restarting happens.
> 
> -- Ankur
> 
> On 04/06/2015 13:13, Dmitry Goldenberg wrote:
>> Would it be accurate to say that Mesos helps you optimize
>> resource utilization out of a preset  pool of resources,
>> presumably
> servers?
>> And its level of autoscaling is within that pool?
> 
> 
>> On Jun 4, 2015, at 3:54 PM, Vinod Kone <vinodkone@gmail.com
> <ma...@gmail.com>
>> <mailto:vinodkone@gmail.com <ma...@gmail.com>>>
>> wrote:
> 
>>> Hey Dmitry. At the current time there is no built-in support
>>> for Mesos to autoscale nodes in the cluster. I've heard people 
>>> (Netflix?) do it out of band on EC2.
>>> 
>>> On Thu, Jun 4, 2015 at 9:08 AM, Dmitry Goldenberg 
>>> <dgoldenberg123@gmail.com <ma...@gmail.com>
> <mailto:dgoldenberg123@gmail.com
> <ma...@gmail.com>>>
>>> wrote:
>>> 
>>> A Mesos noob here. Could someone point me at the doc or
>>> summary for the cluster autoscaling capabilities in Mesos?
>>> 
>>> Is there a way to feed it events and have it detect the need
>>> to bring in more machines or decommission machines?  Is there a
>>> way to receive events back that notify you that machines have
>>> been allocated or decommissioned?
>>> 
>>> Would this work within a certain set of 
>>> "preallocated"/pre-provisioned/"stand-by" machines or will
>>> Mesos go and grab machines from the cloud?
>>> 
>>> What are the integration points of Apache Spark and Mesos?
>>> What are the true advantages of running Spark on Mesos?
>>> 
>>> Can Mesos autoscale the cluster based on some signals/events 
>>> coming out of Spark runtime or Spark consumers, then cause the 
>>> consumers to run on the updated cluster, or signal to the 
>>> consumers to restart themselves into an updated cluster?
>>> 
>>> Thanks.
>>> 
>>> 
> 
> 
> 
-----BEGIN PGP SIGNATURE-----

iQEcBAEBAgAGBQJVcL4EAAoJEOSJAMhvLp3LozkIALzYkyY+vNhLw3Jucl/lzdrF
laUyJFhz4QJre6tUj5fRSDztLkmNLSeDNS8EZ3EyUBLMdMy9a4QbLHqkO+TrCp4X
jR4ar5BFTCC1h53tmHgHML4VGksPkQvK3dyb4DUYBhtf+sEXU/bUnEcVKlk+nCVP
VCy1H67j4UN+gPLQntEzKHou3aksd/Xr2GlQfapljps3aojnmO7W1Ytm7h1Z7UCW
kun17Bmw395PJOWPMtw93j0GZGuFGjg3h4Gbp62zquc1561xLoLR4g36zkzumA5J
bhX6uU4Z5ia0aZIuhxz8vwhTyARqPmyg7jdNqgAXseF6sjEsOk/0fUL13pv+TO4=
=E5xx
-----END PGP SIGNATURE-----

Re: Cluster autoscaling in Spark+Mesos ?

Posted by Tim Chen <ti...@mesosphere.io>.
Spark becomes aware that there are more resources by receiving more resource
offers and using those new offers.

I don't think there is a way to refresh the Spark context for streaming.

Tim
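
The offer flow Tim describes can be sketched with a toy scheduler. Note this
is a simplified illustration, not the real Mesos scheduler API: `Offer` is a
stub, and the callback name loosely mirrors the `resourceOffers(driver,
offers)` method a real Mesos framework scheduler implements. The point is
that when new agents join, the master simply sends the framework additional
offers, which it can accept or decline.

```python
# Toy coarse-grained scheduler: accept whole offers until a CPU target is
# met, decline the rest. New cluster capacity shows up as new offers.

from collections import namedtuple

Offer = namedtuple("Offer", ["agent", "cpus", "mem_mb"])


class CoarseGrainedScheduler:
    """Illustrative only; a real framework implements the Mesos Scheduler API."""

    def __init__(self, target_cpus):
        self.target_cpus = target_cpus
        self.acquired_cpus = 0
        self.agents = []

    def resource_offers(self, offers):
        """Called with each batch of offers; returns (accepted, declined)."""
        accepted, declined = [], []
        for offer in offers:
            if self.acquired_cpus < self.target_cpus:
                self.acquired_cpus += offer.cpus
                self.agents.append(offer.agent)
                accepted.append(offer)
            else:
                declined.append(offer)
        return accepted, declined
```

So a scheduler still short of its CPU target picks up an offer from a freshly
added agent automatically, with no explicit "cluster changed" notification.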

On Thu, Jun 4, 2015 at 1:59 PM, Dmitry Goldenberg <dg...@gmail.com>
wrote:

> Thanks, Ankur. I'd be curious to understand how the data exchange happens
> in this case. How does Spark become aware of the fact that machines have
> been added to the cluster or have been removed from it?  And then, do you
> have some mechanism to perhaps restart the Spark consumers into refreshed
> Spark context's which are aware of the new cluster topology?
>
> On Thu, Jun 4, 2015 at 4:23 PM, Ankur Chauhan <an...@malloc64.com> wrote:
>
>> -----BEGIN PGP SIGNED MESSAGE-----
>> Hash: SHA1
>>
>> AFAIK Mesos does not support host level auto-scaling because that is
>> not the scope of the mesos-master or mesos-slave. In EC2 (like in my
>> case) we have autoscaling groups set with cloudwatch metrics hooked up
>> to scaling policies. In our case, we have the following.
>> * Add 1 host per AZ when cpu load is > 85% for 15 mins continuously.
>> * Remove 1 host if the cpu load is < 15% for 15 mins continuously.
>> * Similar monitoring + scale-up/scale-down based on memory.
>>
>> All of these rules have a cooldown period of 30mins so that we don't
>> end-up scaling up/down too fast.
>>
>> Then again, our workload is bursty (spark on mesos in fine-grained
>> mode). So, the new resources get used up and tasks distribute pretty
>> fast. The above may not work in case you have long-running tasks (such
>> as marathon tasks) because they would not be redistributed till some
>> task restarting happens.
>>
>> - -- Ankur
>>
>> On 04/06/2015 13:13, Dmitry Goldenberg wrote:
>> > Would it be accurate to say that Mesos helps you optimize resource
>> > utilization out of a preset  pool of resources, presumably servers?
>> > And its level of autoscaling is within that pool?
>> >
>> >
>> > On Jun 4, 2015, at 3:54 PM, Vinod Kone <vinodkone@gmail.com
>> > <ma...@gmail.com>> wrote:
>> >
>> >> Hey Dmitry. At the current time there is no built-in support for
>> >> Mesos to autoscale nodes in the cluster. I've heard people
>> >> (Netflix?) do it out of band on EC2.
>> >>
>> >> On Thu, Jun 4, 2015 at 9:08 AM, Dmitry Goldenberg
>> >> <dgoldenberg123@gmail.com <ma...@gmail.com>>
>> >> wrote:
>> >>
>> >> A Mesos noob here. Could someone point me at the doc or summary
>> >> for the cluster autoscaling capabilities in Mesos?
>> >>
>> >> Is there a way to feed it events and have it detect the need to
>> >> bring in more machines or decommission machines?  Is there a way
>> >> to receive events back that notify you that machines have been
>> >> allocated or decommissioned?
>> >>
>> >> Would this work within a certain set of
>> >> "preallocated"/pre-provisioned/"stand-by" machines or will Mesos
>> >> go and grab machines from the cloud?
>> >>
>> >> What are the integration points of Apache Spark and Mesos?  What
>> >> are the true advantages of running Spark on Mesos?
>> >>
>> >> Can Mesos autoscale the cluster based on some signals/events
>> >> coming out of Spark runtime or Spark consumers, then cause the
>> >> consumers to run on the updated cluster, or signal to the
>> >> consumers to restart themselves into an updated cluster?
>> >>
>> >> Thanks.
>> >>
>> >>
>> -----BEGIN PGP SIGNATURE-----
>>
>> iQEcBAEBAgAGBQJVcLO2AAoJEOSJAMhvLp3LDuEH/1Bu3vhALR8+TPbsM5TscDOy
>> vFwyb+ACh8tKL2XoXPwBaMkXU5qPFGX9Wa5weDNCqcUqbvoZ6G9ScrXbpTpWVFTn
>> n240CxKGMqplgelDZmQAlixlPB8jUi9ZUfn6Z4FjuPUz1scLSyIOATxh57z0qRyp
>> kdbS3pcU5ZmS9N/CHwNGOI9qwk7ebA1HPLqkRnBJLHKXJ6savW4FbANYb8OLWcAM
>> It2GzbyAdrMMs7dgeaaEPnvwqnF5nSf2aERA9EjFyxBhJMgKidlUxFSxvMTD1jkx
>> xjMZJeeVDqVsdZWtJkNwNsjXQG7X7f2bWY14rDL4XM59X8XCLnxkODRMTeGjXBM=
>> =cHZK
>> -----END PGP SIGNATURE-----
>>
>
>

Re: Cluster autoscaling in Spark+Mesos ?

Posted by Dmitry Goldenberg <dg...@gmail.com>.
Thanks, Ankur. I'd be curious to understand how the data exchange happens
in this case. How does Spark become aware of the fact that machines have
been added to the cluster or have been removed from it?  And then, do you
have some mechanism to perhaps restart the Spark consumers into refreshed
Spark contexts which are aware of the new cluster topology?

On Thu, Jun 4, 2015 at 4:23 PM, Ankur Chauhan <an...@malloc64.com> wrote:

> -----BEGIN PGP SIGNED MESSAGE-----
> Hash: SHA1
>
> AFAIK Mesos does not support host level auto-scaling because that is
> not the scope of the mesos-master or mesos-slave. In EC2 (like in my
> case) we have autoscaling groups set with cloudwatch metrics hooked up
> to scaling policies. In our case, we have the following.
> * Add 1 host per AZ when cpu load is > 85% for 15 mins continuously.
> * Remove 1 host if the cpu load is < 15% for 15 mins continuously.
> * Similar monitoring + scale-up/scale-down based on memory.
>
> All of these rules have a cooldown period of 30mins so that we don't
> end-up scaling up/down too fast.
>
> Then again, our workload is bursty (spark on mesos in fine-grained
> mode). So, the new resources get used up and tasks distribute pretty
> fast. The above may not work in case you have long-running tasks (such
> as marathon tasks) because they would not be redistributed till some
> task restarting happens.
>
> - -- Ankur
>
> On 04/06/2015 13:13, Dmitry Goldenberg wrote:
> > Would it be accurate to say that Mesos helps you optimize resource
> > utilization out of a preset  pool of resources, presumably servers?
> > And its level of autoscaling is within that pool?
> >
> >
> > On Jun 4, 2015, at 3:54 PM, Vinod Kone <vinodkone@gmail.com
> > <ma...@gmail.com>> wrote:
> >
> >> Hey Dmitry. At the current time there is no built-in support for
> >> Mesos to autoscale nodes in the cluster. I've heard people
> >> (Netflix?) do it out of band on EC2.
> >>
> >> On Thu, Jun 4, 2015 at 9:08 AM, Dmitry Goldenberg
> >> <dgoldenberg123@gmail.com <ma...@gmail.com>>
> >> wrote:
> >>
> >> A Mesos noob here. Could someone point me at the doc or summary
> >> for the cluster autoscaling capabilities in Mesos?
> >>
> >> Is there a way to feed it events and have it detect the need to
> >> bring in more machines or decommission machines?  Is there a way
> >> to receive events back that notify you that machines have been
> >> allocated or decommissioned?
> >>
> >> Would this work within a certain set of
> >> "preallocated"/pre-provisioned/"stand-by" machines or will Mesos
> >> go and grab machines from the cloud?
> >>
> >> What are the integration points of Apache Spark and Mesos?  What
> >> are the true advantages of running Spark on Mesos?
> >>
> >> Can Mesos autoscale the cluster based on some signals/events
> >> coming out of Spark runtime or Spark consumers, then cause the
> >> consumers to run on the updated cluster, or signal to the
> >> consumers to restart themselves into an updated cluster?
> >>
> >> Thanks.
> >>
> >>
> -----BEGIN PGP SIGNATURE-----
>
> iQEcBAEBAgAGBQJVcLO2AAoJEOSJAMhvLp3LDuEH/1Bu3vhALR8+TPbsM5TscDOy
> vFwyb+ACh8tKL2XoXPwBaMkXU5qPFGX9Wa5weDNCqcUqbvoZ6G9ScrXbpTpWVFTn
> n240CxKGMqplgelDZmQAlixlPB8jUi9ZUfn6Z4FjuPUz1scLSyIOATxh57z0qRyp
> kdbS3pcU5ZmS9N/CHwNGOI9qwk7ebA1HPLqkRnBJLHKXJ6savW4FbANYb8OLWcAM
> It2GzbyAdrMMs7dgeaaEPnvwqnF5nSf2aERA9EjFyxBhJMgKidlUxFSxvMTD1jkx
> xjMZJeeVDqVsdZWtJkNwNsjXQG7X7f2bWY14rDL4XM59X8XCLnxkODRMTeGjXBM=
> =cHZK
> -----END PGP SIGNATURE-----
>

Re: Cluster autoscaling in Spark+Mesos ?

Posted by Ankur Chauhan <an...@malloc64.com>.
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

AFAIK Mesos does not support host-level autoscaling because that is
not the scope of the mesos-master or mesos-slave. In EC2 (as in my
case) we have autoscaling groups with CloudWatch metrics hooked up
to scaling policies. In our case, we have the following:
* Add 1 host per AZ when CPU load is > 85% for 15 mins continuously.
* Remove 1 host if the CPU load is < 15% for 15 mins continuously.
* Similar monitoring + scale-up/scale-down based on memory.

All of these rules have a cooldown period of 30 mins so that we don't
end up scaling up/down too fast.

Then again, our workload is bursty (Spark on Mesos in fine-grained
mode), so the new resources get used up and tasks distribute pretty
fast. The above may not work if you have long-running tasks (such
as Marathon tasks) because they would not be redistributed until some
task restarting happens.
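
The decision logic behind these rules can be sketched as below. The thresholds
and durations mirror the policy described above; everything else (function
name, sampling scheme) is an illustrative assumption, not real CloudWatch or
Auto Scaling API usage.

```python
# Sketch of a threshold + sustained-duration + cooldown scaling rule, as in
# the CloudWatch-style policy described above.

def scaling_decision(cpu_samples, minutes_per_sample, last_action_minutes_ago,
                     high=85.0, low=15.0, sustain_min=15, cooldown_min=30):
    """Return +1 (add a host per AZ), -1 (remove a host), or 0 (no change).

    cpu_samples: CPU utilization percentages, most recent last.
    """
    if last_action_minutes_ago < cooldown_min:
        return 0  # still cooling down from the previous scale event
    needed = sustain_min // minutes_per_sample
    if len(cpu_samples) < needed:
        return 0  # not enough history to judge a sustained breach
    window = cpu_samples[-needed:]
    if all(s > high for s in window):
        return +1  # sustained high load: scale up
    if all(s < low for s in window):
        return -1  # sustained low load: scale down
    return 0
```

A single low or high sample inside the window resets the decision to "no
change", which is what the "continuously" wording in the rules buys you.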

- -- Ankur

On 04/06/2015 13:13, Dmitry Goldenberg wrote:
> Would it be accurate to say that Mesos helps you optimize resource 
> utilization out of a preset  pool of resources, presumably servers?
> And its level of autoscaling is within that pool?
> 
> 
> On Jun 4, 2015, at 3:54 PM, Vinod Kone <vinodkone@gmail.com 
> <ma...@gmail.com>> wrote:
> 
>> Hey Dmitry. At the current time there is no built-in support for
>> Mesos to autoscale nodes in the cluster. I've heard people
>> (Netflix?) do it out of band on EC2.
>> 
>> On Thu, Jun 4, 2015 at 9:08 AM, Dmitry Goldenberg 
>> <dgoldenberg123@gmail.com <ma...@gmail.com>>
>> wrote:
>> 
>> A Mesos noob here. Could someone point me at the doc or summary 
>> for the cluster autoscaling capabilities in Mesos?
>> 
>> Is there a way to feed it events and have it detect the need to 
>> bring in more machines or decommission machines?  Is there a way 
>> to receive events back that notify you that machines have been 
>> allocated or decommissioned?
>> 
>> Would this work within a certain set of 
>> "preallocated"/pre-provisioned/"stand-by" machines or will Mesos 
>> go and grab machines from the cloud?
>> 
>> What are the integration points of Apache Spark and Mesos?  What 
>> are the true advantages of running Spark on Mesos?
>> 
>> Can Mesos autoscale the cluster based on some signals/events 
>> coming out of Spark runtime or Spark consumers, then cause the 
>> consumers to run on the updated cluster, or signal to the 
>> consumers to restart themselves into an updated cluster?
>> 
>> Thanks.
>> 
>> 
-----BEGIN PGP SIGNATURE-----

iQEcBAEBAgAGBQJVcLO2AAoJEOSJAMhvLp3LDuEH/1Bu3vhALR8+TPbsM5TscDOy
vFwyb+ACh8tKL2XoXPwBaMkXU5qPFGX9Wa5weDNCqcUqbvoZ6G9ScrXbpTpWVFTn
n240CxKGMqplgelDZmQAlixlPB8jUi9ZUfn6Z4FjuPUz1scLSyIOATxh57z0qRyp
kdbS3pcU5ZmS9N/CHwNGOI9qwk7ebA1HPLqkRnBJLHKXJ6savW4FbANYb8OLWcAM
It2GzbyAdrMMs7dgeaaEPnvwqnF5nSf2aERA9EjFyxBhJMgKidlUxFSxvMTD1jkx
xjMZJeeVDqVsdZWtJkNwNsjXQG7X7f2bWY14rDL4XM59X8XCLnxkODRMTeGjXBM=
=cHZK
-----END PGP SIGNATURE-----

Re: Cluster autoscaling in Spark+Mesos ?

Posted by Dmitry Goldenberg <dg...@gmail.com>.
Would it be accurate to say that Mesos helps you optimize resource utilization out of a preset pool of resources, presumably servers? And its level of autoscaling is within that pool?


> On Jun 4, 2015, at 3:54 PM, Vinod Kone <vi...@gmail.com> wrote:
> 
> Hey Dmitry. At the current time there is no built-in support for Mesos to autoscale nodes in the cluster. I've heard people (Netflix?) do it out of band on EC2.
> 
>> On Thu, Jun 4, 2015 at 9:08 AM, Dmitry Goldenberg <dg...@gmail.com> wrote:
>> A Mesos noob here. Could someone point me at the doc or summary for the cluster autoscaling capabilities in Mesos?
>> 
>> Is there a way to feed it events and have it detect the need to bring in more machines or decommission machines?  Is there a way to receive events back that notify you that machines have been allocated or decommissioned?
>> 
>> Would this work within a certain set of "preallocated"/pre-provisioned/"stand-by" machines or will Mesos go and grab machines from the cloud?
>> 
>> What are the integration points of Apache Spark and Mesos?  What are the true advantages of running Spark on Mesos?
>> 
>> Can Mesos autoscale the cluster based on some signals/events coming out of Spark runtime or Spark consumers, then cause the consumers to run on the updated cluster, or signal to the consumers to restart themselves into an updated cluster?
>> 
>> Thanks.
> 

Re: Cluster autoscaling in Spark+Mesos ?

Posted by Vinod Kone <vi...@gmail.com>.
Hey Dmitry. At the current time there is no built-in support for Mesos to
autoscale nodes in the cluster. I've heard people (Netflix?) do it out of
band on EC2.

On Thu, Jun 4, 2015 at 9:08 AM, Dmitry Goldenberg <dg...@gmail.com>
wrote:

> A Mesos noob here. Could someone point me at the doc or summary for the
> cluster autoscaling capabilities in Mesos?
>
> Is there a way to feed it events and have it detect the need to bring in
> more machines or decommission machines?  Is there a way to receive events
> back that notify you that machines have been allocated or decommissioned?
>
> Would this work within a certain set of
> "preallocated"/pre-provisioned/"stand-by" machines or will Mesos go and
> grab machines from the cloud?
>
> What are the integration points of Apache Spark and Mesos?  What are the
> true advantages of running Spark on Mesos?
>
> Can Mesos autoscale the cluster based on some signals/events coming out of
> Spark runtime or Spark consumers, then cause the consumers to run on the
> updated cluster, or signal to the consumers to restart themselves into an
> updated cluster?
>
> Thanks.
>