Posted to user@mesos.apache.org by Aaron Carey <ac...@ilm.com> on 2015/09/23 10:50:02 UTC

Metric for tasks queued/waiting?

Hi all,

Is there any way to get a metric of all tasks currently waiting/queued in Mesos (across all schedulers)? The snapshot metrics seem to cover every other kind of task state. This would be quite useful for auto-scaling purposes.

Thanks,
Aaron
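
For context, here is a minimal sketch of reading the per-state task counters out of the master's /metrics/snapshot output. The master address and the sample payload are assumptions made up for illustration; the "master/tasks_" key prefix follows the naming the master uses for its task counters. Nothing in the snapshot counts tasks that are still queued inside a framework scheduler.

```python
import json
from urllib.request import urlopen


def fetch_snapshot(master="http://localhost:5050"):
    """Fetch the master's metrics snapshot (the address is an assumption)."""
    with urlopen(master + "/metrics/snapshot") as resp:
        return json.load(resp)


def task_state_metrics(snapshot):
    """Keep only the per-state task counters, keyed by state name."""
    prefix = "master/tasks_"
    return {
        key[len(prefix):]: value
        for key, value in snapshot.items()
        if key.startswith(prefix)
    }


# Illustrative payload; the values are made up.
sample = {
    "master/tasks_running": 12.0,
    "master/tasks_staging": 1.0,
    "master/tasks_finished": 40.0,
    "master/cpus_total": 64.0,
}
states = task_state_metrics(sample)
print(states)
```

Against a live master, task_state_metrics(fetch_snapshot()) would give the same kind of mapping.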

Re: Metric for tasks queued/waiting?

Posted by Niklas Nielsen <ni...@mesosphere.io>.

Created a ticket for us to continue the discussion:
https://issues.apache.org/jira/browse/MESOS-3507

We can try to capture the explicit use-case from Aaron and maybe create
another ticket to track a more-or-less generic path we could go down.

Cheers,
Niklas

On 23 September 2015 at 15:55, Sharma Podila <sp...@netflix.com> wrote:

> Discussing in a separate place/JIRA ticket sounds good.
> Basically, representing contention using a summary of pending resource
> requests from each framework could be the hints to mesos master. However,
> this gets into intricacies, not the least of which is diversity of resource
> requests, qualified by queue depth.
> Another way to think of this could be that each framework could trigger a
> scale up individually (say, by hitting a mesos master or another
> independent service's endpoint to add additional agents/slaves). Even
> uncoordinated scale up actions from multiple frameworks should result in
> the same end result, modulo reservations/limits/etc. Then, mesos master
> needs to deal with only scale down, which it could perform based on offer
> rejections from frameworks, implying nobody needs that many agents/slaves.
>
> Maybe that's more details than needed in this discussion...
>
>
>
> On Wed, Sep 23, 2015 at 2:05 PM, Niklas Nielsen <ni...@mesosphere.io>
> wrote:
>
>> I'd love to see this solved in a general way; "How does the framework
>> communicate (insert intent, metric, hint, etc) to mesos".
>>
>> In a way, the 'webui_url' in the framework info conveys "This is how
>> you get to my web ui", as providing a webui was a common pattern for
>> frameworks.
>>
>> This could be expanded, so the framework can report an 'apiui_url' or
>> maybe even more specific "metrics_url" where the mesos master (or other
>> frameworks and 3rd party tooling) can get insights into queue depths,
>> resource preferences, etc.
>>
>> We can start discussing this further in a JIRA ticket :)
>>
>> Niklas
>>
>> On 23 September 2015 at 13:54, Alex Gaudio <ad...@gmail.com> wrote:
>>
>>> Hi Aaron,
>>>
>>> You might consider trying to solve the autoscaling problem with Relay, a
>>> Python tool I use to solve this problem.  Feel free to shoot me an email if
>>> you are interested.
>>>
>>> github.com/sailthru/relay
>>>
>>> Alex
>>>
>>> On Wed, Sep 23, 2015, 11:03 AM David Greenberg <ds...@gmail.com>
>>> wrote:
>>>
>>>> In addition, this technique could be implemented in the allocator with
>>>> an understanding of global demand:
>>>> https://www.youtube.com/watch?v=BkBMYUe76oI
>>>>
>>>> That would allow for tunable fair-sharing based on DRF-principles.
>>>>
>>>> On Wed, Sep 23, 2015 at 10:59 AM haosdent <ha...@gmail.com> wrote:
>>>>
>>>>> Feel free to open a story in jira if you think your ideas are awesome.
>>>>> :-)
>>>>>
>>>> On Sep 23, 2015 10:54 PM, "Sharma Podila" <sp...@netflix.com> wrote:
>>>>>
>>>>>> Ah, OK, thanks. Yes, Fenzo is a Java library.
>>>>>>
>>>>>> It might be a nice addition to Mesos master to get a global view of
>>>>>> contention for resources. In addition to autoscaling, it would be useful in
>>>>>> the allocator.
>>>>>>
>>>>>>
>>>>>>
>>>>>> On Wed, Sep 23, 2015 at 7:29 AM, Aaron Carey <ac...@ilm.com> wrote:
>>>>>>
>>>>>>> Thanks Sharma,
>>>>>>>
>>>>>>> I was in the audience for a talk you did about Fenzo at MesosCon :)
>>>>>>> It looked great but we're a python shop primarily so the Java requirement
>>>>>>> would be a problem for us.
>>>>>>>
>>>>>>> The scaling in the scheduler makes total sense, (obvious when you
>>>>>>> think about it!), I was naively hoping for some sort of knowledge of that
>>>>>>> back in the Mesos master as we were hoping to have scaling be independent
>>>>>>> of schedulers. I think this'll need a re-think!
>>>>>>>
>>>>>>> Thanks for your help!
>>>>>>>
>>>>>>> Aaron
>>>>>>>
>>>>>>> ------------------------------
>>>>>>> *From:* Sharma Podila [spodila@netflix.com]
>>>>>>> *Sent:* 23 September 2015 15:22
>>>>>>>
>>>>>>> *To:* user@mesos.apache.org
>>>>>>> *Subject:* Re: Metric for tasks queued/waiting?
>>>>>>>
>>>>>>> Jobs/tasks wait in framework schedulers, not mesos master.
>>>>>>> Autoscaling triggers must come from schedulers, not only because that's who
>>>>>>> knows the pending task set size, but, also because it knows how many of
>>>>>>> them need to be launched right away, on what kind of machines.
>>>>>>>
>>>>>>> We built such an autoscaling capability in our framework schedulers.
>>>>>>> The autoscaling is achieved by our library Fenzo
>>>>>>> <https://github.com/Netflix/Fenzo> which we open sourced recently.
>>>>>>> Also read about Fenzo autoscaling here
>>>>>>> <https://github.com/Netflix/Fenzo/wiki/Autoscaling>. You should
>>>>>>> look into using that if you are developing your own scheduler. Or, have
>>>>>>> your scheduler team pick up Fenzo for autoscaling.
>>>>>>>
>>>>>>> Also, note that scaling up is temptingly easy by watching the
>>>>>>> pending task queue. But, scaling down requires bin packing, etc. Other
>>>>>>> issues pop up as well, for example:
>>>>>>>
>>>>>>> - what if a user submits tasks that cannot be satisfied? Will
>>>>>>> autoscale keep increasing the cluster size unbounded?
>>>>>>> - what if you would like to have a heterogeneous mix of hosts and
>>>>>>> tasks? which kind of hosts do you need to autoscale based on which tasks
>>>>>>> are pending?
>>>>>>>
>>>>>>> These are automatically addressed in Fenzo.
>>>>>>>
>>>>>>> Sharma
>>>>>>>
>>>>>>>
>>>>>>> On Wed, Sep 23, 2015 at 4:56 AM, Aaron Carey <ac...@ilm.com> wrote:
>>>>>>>
>>>>>>>> No, I basically had the same question as Jim (but maybe didn't word
>>>>>>>> it so well ;))
>>>>>>>>
>>>>>>>> I'll have a look at your response there :)
>>>>>>>>
>>>>>>>> ------------------------------
>>>>>>>> *From:* haosdent [haosdent@gmail.com]
>>>>>>>> *Sent:* 23 September 2015 10:12
>>>>>>>> *To:* user@mesos.apache.org
>>>>>>>> *Subject:* Re: Metric for tasks queued/waiting?
>>>>>>>>
>>>>>>>> Does /metrics/snapshot not satisfy your requirement?
>>>>>>>>
>>>>>>>> On Wed, Sep 23, 2015 at 4:50 PM, Aaron Carey <ac...@ilm.com>
>>>>>>>> wrote:
>>>>>>>>
>>>>>>>>> Hi all,
>>>>>>>>>
>>>>>>>>> Is there any way to get a metric of all tasks currently
>>>>>>>>> waiting/queued in Mesos (across all schedulers)? The snapshot metrics seem
>>>>>>>>> to cover every other kind of task state. This would be quite useful for
>>>>>>>>> auto-scaling purposes.
>>>>>>>>>
>>>>>>>>> Thanks,
>>>>>>>>> Aaron
>>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> --
>>>>>>>> Best Regards,
>>>>>>>> Haosdent Huang
>>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>
>>
>

Re: Metric for tasks queued/waiting?

Posted by Sharma Podila <sp...@netflix.com>.

Discussing in a separate place/JIRA ticket sounds good.
Basically, representing contention using a summary of pending resource
requests from each framework could be the hints to mesos master. However,
this gets into intricacies, not the least of which is diversity of resource
requests, qualified by queue depth.
Another way to think of this could be that each framework could trigger a
scale up individually (say, by hitting a mesos master or another
independent service's endpoint to add additional agents/slaves). Even
uncoordinated scale up actions from multiple frameworks should result in
the same end result, modulo reservations/limits/etc. Then, mesos master
needs to deal with only scale down, which it could perform based on offer
rejections from frameworks, implying nobody needs that many agents/slaves.

Maybe that's more details than needed in this discussion...
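
A rough sketch of the two ideas above, under stated assumptions: frameworks report pending requests as (cpus, mem_mb) pairs, and the master tracks how many offers it sent and how many were declined. The function names, the data shapes, and the 0.9 threshold are hypothetical and not part of any Mesos API.

```python
from collections import Counter


def summarize_demand(pending_by_framework):
    """Sum pending resource requests across frameworks.

    pending_by_framework maps a framework name to a list of
    (cpus, mem_mb) requests still waiting in that framework's queue.
    """
    total = Counter()
    for requests in pending_by_framework.values():
        for cpus, mem_mb in requests:
            total["cpus"] += cpus
            total["mem_mb"] += mem_mb
            total["tasks"] += 1
    return dict(total)


def should_scale_down(offers_sent, offers_declined, threshold=0.9):
    """Treat a high decline ratio as a hint that agents are sitting idle."""
    if offers_sent == 0:
        return False
    return offers_declined / offers_sent >= threshold


pending = {"batch": [(1, 512), (2, 1024)], "web": [(0.5, 256)]}
demand = summarize_demand(pending)
print(demand)
print(should_scale_down(offers_sent=10, offers_declined=10))
```

A single summed total hides the diversity of requests mentioned above, which is exactly where the intricacies start.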



On Wed, Sep 23, 2015 at 2:05 PM, Niklas Nielsen <ni...@mesosphere.io>
wrote:

> I'd love to see this solved in a general way; "How does the framework
> communicate (insert intent, metric, hint, etc) to mesos".
>
> In a way, the 'webui_url' in the framework info conveys "This is how
> you get to my web ui", as providing a webui was a common pattern for
> frameworks.
>
> This could be expanded, so the framework can report an 'apiui_url' or
> maybe even more specific "metrics_url" where the mesos master (or other
> frameworks and 3rd party tooling) can get insights into queue depths,
> resource preferences, etc.
>
> We can start discussing this further in a JIRA ticket :)
>
> Niklas
>
> On 23 September 2015 at 13:54, Alex Gaudio <ad...@gmail.com> wrote:
>
>> Hi Aaron,
>>
>> You might consider trying to solve the autoscaling problem with Relay, a
>> Python tool I use to solve this problem.  Feel free to shoot me an email if
>> you are interested.
>>
>> github.com/sailthru/relay
>>
>> Alex
>>
>> On Wed, Sep 23, 2015, 11:03 AM David Greenberg <ds...@gmail.com>
>> wrote:
>>
>>> In addition, this technique could be implemented in the allocator with
>>> an understanding of global demand:
>>> https://www.youtube.com/watch?v=BkBMYUe76oI
>>>
>>> That would allow for tunable fair-sharing based on DRF-principles.
>>>
>>> On Wed, Sep 23, 2015 at 10:59 AM haosdent <ha...@gmail.com> wrote:
>>>
>>>> Feel free to open a story in jira if you think your ideas are awesome.
>>>> :-)
>>>>
>>> On Sep 23, 2015 10:54 PM, "Sharma Podila" <sp...@netflix.com> wrote:
>>>>
>>>>> Ah, OK, thanks. Yes, Fenzo is a Java library.
>>>>>
>>>>> It might be a nice addition to Mesos master to get a global view of
>>>>> contention for resources. In addition to autoscaling, it would be useful in
>>>>> the allocator.
>>>>>
>>>>>
>>>>>
>>>>> On Wed, Sep 23, 2015 at 7:29 AM, Aaron Carey <ac...@ilm.com> wrote:
>>>>>
>>>>>> Thanks Sharma,
>>>>>>
>>>>>> I was in the audience for a talk you did about Fenzo at MesosCon :)
>>>>>> It looked great but we're a python shop primarily so the Java requirement
>>>>>> would be a problem for us.
>>>>>>
>>>>>> The scaling in the scheduler makes total sense, (obvious when you
>>>>>> think about it!), I was naively hoping for some sort of knowledge of that
>>>>>> back in the Mesos master as we were hoping to have scaling be independent
>>>>>> of schedulers. I think this'll need a re-think!
>>>>>>
>>>>>> Thanks for your help!
>>>>>>
>>>>>> Aaron
>>>>>>
>>>>>> ------------------------------
>>>>>> *From:* Sharma Podila [spodila@netflix.com]
>>>>>> *Sent:* 23 September 2015 15:22
>>>>>>
>>>>>> *To:* user@mesos.apache.org
>>>>>> *Subject:* Re: Metric for tasks queued/waiting?
>>>>>>
>>>>>> Jobs/tasks wait in framework schedulers, not mesos master.
>>>>>> Autoscaling triggers must come from schedulers, not only because that's who
>>>>>> knows the pending task set size, but, also because it knows how many of
>>>>>> them need to be launched right away, on what kind of machines.
>>>>>>
>>>>>> We built such an autoscaling capability in our framework schedulers.
>>>>>> The autoscaling is achieved by our library Fenzo
>>>>>> <https://github.com/Netflix/Fenzo> which we open sourced recently.
>>>>>> Also read about Fenzo autoscaling here
>>>>>> <https://github.com/Netflix/Fenzo/wiki/Autoscaling>. You should look
>>>>>> into using that if you are developing your own scheduler. Or, have your
>>>>>> scheduler team pick up Fenzo for autoscaling.
>>>>>>
>>>>>> Also, note that scaling up is temptingly easy by watching the pending
>>>>>> task queue. But, scaling down requires bin packing, etc. Other issues pop
>>>>>> up as well, for example:
>>>>>>
>>>>>> - what if a user submits tasks that cannot be satisfied? Will
>>>>>> autoscale keep increasing the cluster size unbounded?
>>>>>> - what if you would like to have a heterogeneous mix of hosts and
>>>>>> tasks? which kind of hosts do you need to autoscale based on which tasks
>>>>>> are pending?
>>>>>>
>>>>>> These are automatically addressed in Fenzo.
>>>>>>
>>>>>> Sharma
>>>>>>
>>>>>>
>>>>>> On Wed, Sep 23, 2015 at 4:56 AM, Aaron Carey <ac...@ilm.com> wrote:
>>>>>>
>>>>>>> No, I basically had the same question as Jim (but maybe didn't word
>>>>>>> it so well ;))
>>>>>>>
>>>>>>> I'll have a look at your response there :)
>>>>>>>
>>>>>>> ------------------------------
>>>>>>> *From:* haosdent [haosdent@gmail.com]
>>>>>>> *Sent:* 23 September 2015 10:12
>>>>>>> *To:* user@mesos.apache.org
>>>>>>> *Subject:* Re: Metric for tasks queued/waiting?
>>>>>>>
>>>>>>> Does /metrics/snapshot not satisfy your requirement?
>>>>>>>
>>>>>>> On Wed, Sep 23, 2015 at 4:50 PM, Aaron Carey <ac...@ilm.com> wrote:
>>>>>>>
>>>>>>>> Hi all,
>>>>>>>>
>>>>>>>> Is there any way to get a metric of all tasks currently
>>>>>>>> waiting/queued in Mesos (across all schedulers)? The snapshot metrics seem
>>>>>>>> to cover every other kind of task state. This would be quite useful for
>>>>>>>> auto-scaling purposes.
>>>>>>>>
>>>>>>>> Thanks,
>>>>>>>> Aaron
>>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> --
>>>>>>> Best Regards,
>>>>>>> Haosdent Huang
>>>>>>>
>>>>>>
>>>>>>
>>>>>
>

Re: Metric for tasks queued/waiting?

Posted by Niklas Nielsen <ni...@mesosphere.io>.

I'd love to see this solved in a general way; "How does the framework
communicate (insert intent, metric, hint, etc) to mesos".

In a way, the 'webui_url' in the framework info conveys "This is how
you get to my web ui", as providing a webui was a common pattern for
frameworks.

This could be expanded, so the framework can report an 'apiui_url' or maybe
even more specific "metrics_url" where the mesos master (or other
frameworks and 3rd party tooling) can get insights into queue depths,
resource preferences, etc.

We can start discussing this further in a JIRA ticket :)

Niklas
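
To make the proposal concrete, here is a sketch of how a consumer might use such a field. Both 'metrics_url' and the 'queue_depth' key are hypothetical; neither exists in the framework info today. The fetch function is injected so the sketch runs without a network.

```python
def total_queue_depth(frameworks, fetch):
    """Sum the queue depths reported by each framework.

    frameworks: list of dicts carrying a hypothetical 'metrics_url' key.
    fetch: callable taking a URL and returning a dict of metrics.
    Frameworks that do not advertise a 'metrics_url' are skipped.
    """
    depth = 0
    for framework in frameworks:
        url = framework.get("metrics_url")
        if url is None:
            continue
        depth += fetch(url).get("queue_depth", 0)
    return depth


# Stand-in for HTTP GETs against each framework's endpoint (made-up data).
fake_responses = {
    "http://batch.example/metrics": {"queue_depth": 7},
    "http://web.example/metrics": {"queue_depth": 2},
}
frameworks = [
    {"name": "batch", "metrics_url": "http://batch.example/metrics"},
    {"name": "web", "metrics_url": "http://web.example/metrics"},
    {"name": "legacy"},  # advertises nothing, so it is skipped
]
queued = total_queue_depth(frameworks, fake_responses.__getitem__)
print(queued)
```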

On 23 September 2015 at 13:54, Alex Gaudio <ad...@gmail.com> wrote:

> Hi Aaron,
>
> You might consider trying to solve the autoscaling problem with Relay, a
> Python tool I use to solve this problem.  Feel free to shoot me an email if
> you are interested.
>
> github.com/sailthru/relay
>
> Alex
>
> On Wed, Sep 23, 2015, 11:03 AM David Greenberg <ds...@gmail.com>
> wrote:
>
>> In addition, this technique could be implemented in the allocator with an
>> understanding of global demand:
>> https://www.youtube.com/watch?v=BkBMYUe76oI
>>
>> That would allow for tunable fair-sharing based on DRF-principles.
>>
>> On Wed, Sep 23, 2015 at 10:59 AM haosdent <ha...@gmail.com> wrote:
>>
>>> Feel free to open a story in jira if you think your ideas are awesome. :-)
>>>
>> On Sep 23, 2015 10:54 PM, "Sharma Podila" <sp...@netflix.com> wrote:
>>>
>>>> Ah, OK, thanks. Yes, Fenzo is a Java library.
>>>>
>>>> It might be a nice addition to Mesos master to get a global view of
>>>> contention for resources. In addition to autoscaling, it would be useful in
>>>> the allocator.
>>>>
>>>>
>>>>
>>>> On Wed, Sep 23, 2015 at 7:29 AM, Aaron Carey <ac...@ilm.com> wrote:
>>>>
>>>>> Thanks Sharma,
>>>>>
>>>>> I was in the audience for a talk you did about Fenzo at MesosCon :) It
>>>>> looked great but we're a python shop primarily so the Java requirement
>>>>> would be a problem for us.
>>>>>
>>>>> The scaling in the scheduler makes total sense, (obvious when you
>>>>> think about it!), I was naively hoping for some sort of knowledge of that
>>>>> back in the Mesos master as we were hoping to have scaling be independent
>>>>> of schedulers. I think this'll need a re-think!
>>>>>
>>>>> Thanks for your help!
>>>>>
>>>>> Aaron
>>>>>
>>>>> ------------------------------
>>>>> *From:* Sharma Podila [spodila@netflix.com]
>>>>> *Sent:* 23 September 2015 15:22
>>>>>
>>>>> *To:* user@mesos.apache.org
>>>>> *Subject:* Re: Metric for tasks queued/waiting?
>>>>>
>>>>> Jobs/tasks wait in framework schedulers, not mesos master. Autoscaling
>>>>> triggers must come from schedulers, not only because that's who knows the
>>>>> pending task set size, but, also because it knows how many of them need to
>>>>> be launched right away, on what kind of machines.
>>>>>
>>>>> We built such an autoscaling capability in our framework schedulers.
>>>>> The autoscaling is achieved by our library Fenzo
>>>>> <https://github.com/Netflix/Fenzo> which we open sourced recently.
>>>>> Also read about Fenzo autoscaling here
>>>>> <https://github.com/Netflix/Fenzo/wiki/Autoscaling>. You should look
>>>>> into using that if you are developing your own scheduler. Or, have your
>>>>> scheduler team pick up Fenzo for autoscaling.
>>>>>
>>>>> Also, note that scaling up is temptingly easy by watching the pending
>>>>> task queue. But, scaling down requires bin packing, etc. Other issues pop
>>>>> up as well, for example:
>>>>>
>>>>> - what if a user submits tasks that cannot be satisfied? Will
>>>>> autoscale keep increasing the cluster size unbounded?
>>>>> - what if you would like to have a heterogeneous mix of hosts and
>>>>> tasks? which kind of hosts do you need to autoscale based on which tasks
>>>>> are pending?
>>>>>
>>>>> These are automatically addressed in Fenzo.
>>>>>
>>>>> Sharma
>>>>>
>>>>>
>>>>> On Wed, Sep 23, 2015 at 4:56 AM, Aaron Carey <ac...@ilm.com> wrote:
>>>>>
>>>>>> No, I basically had the same question as Jim (but maybe didn't word
>>>>>> it so well ;))
>>>>>>
>>>>>> I'll have a look at your response there :)
>>>>>>
>>>>>> ------------------------------
>>>>>> *From:* haosdent [haosdent@gmail.com]
>>>>>> *Sent:* 23 September 2015 10:12
>>>>>> *To:* user@mesos.apache.org
>>>>>> *Subject:* Re: Metric for tasks queued/waiting?
>>>>>>
>>>>>> Does /metrics/snapshot not satisfy your requirement?
>>>>>>
>>>>>> On Wed, Sep 23, 2015 at 4:50 PM, Aaron Carey <ac...@ilm.com> wrote:
>>>>>>
>>>>>>> Hi all,
>>>>>>>
>>>>>>> Is there any way to get a metric of all tasks currently
>>>>>>> waiting/queued in Mesos (across all schedulers)? The snapshot metrics seem
>>>>>>> to cover every other kind of task state. This would be quite useful for
>>>>>>> auto-scaling purposes.
>>>>>>>
>>>>>>> Thanks,
>>>>>>> Aaron
>>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>> --
>>>>>> Best Regards,
>>>>>> Haosdent Huang
>>>>>>
>>>>>
>>>>>
>>>>

RE: Metric for tasks queued/waiting?

Posted by Aaron Carey <ac...@ilm.com>.

Thanks Alex,

The problem here is more along the lines of getting the metrics to feed into the algorithm, rather than the algorithm itself. Relay looks very cool though, thanks :)

Aaron

________________________________
From: Alex Gaudio [adgaudio@gmail.com]
Sent: 23 September 2015 21:54
To: user@mesos.apache.org
Subject: Re: Metric for tasks queued/waiting?

Hi Aaron,

You might consider trying to solve the autoscaling problem with Relay, a Python tool I use to solve this problem.  Feel free to shoot me an email if you are interested.

github.com/sailthru/relay

Alex

On Wed, Sep 23, 2015, 11:03 AM David Greenberg <ds...@gmail.com> wrote:
In addition, this technique could be implemented in the allocator with an understanding of global demand: https://www.youtube.com/watch?v=BkBMYUe76oI

That would allow for tunable fair-sharing based on DRF-principles.

On Wed, Sep 23, 2015 at 10:59 AM haosdent <ha...@gmail.com> wrote:

Feel free to open a story in jira if you think your ideas are awesome. :-)

On Sep 23, 2015 10:54 PM, "Sharma Podila" <sp...@netflix.com> wrote:
Ah, OK, thanks. Yes, Fenzo is a Java library.

It might be a nice addition to Mesos master to get a global view of contention for resources. In addition to autoscaling, it would be useful in the allocator.

On Wed, Sep 23, 2015 at 7:29 AM, Aaron Carey <ac...@ilm.com> wrote:
Thanks Sharma,

I was in the audience for a talk you did about Fenzo at MesosCon :) It looked great but we're a python shop primarily so the Java requirement would be a problem for us.

The scaling in the scheduler makes total sense, (obvious when you think about it!), I was naively hoping for some sort of knowledge of that back in the Mesos master as we were hoping to have scaling be independent of schedulers. I think this'll need a re-think!

Thanks for your help!

Aaron

________________________________
From: Sharma Podila [spodila@netflix.com]
Sent: 23 September 2015 15:22

To: user@mesos.apache.org
Subject: Re: Metric for tasks queued/waiting?

Jobs/tasks wait in framework schedulers, not mesos master. Autoscaling triggers must come from schedulers, not only because that's who knows the pending task set size, but, also because it knows how many of them need to be launched right away, on what kind of machines.

We built such an autoscaling capability in our framework schedulers. The autoscaling is achieved by our library Fenzo <https://github.com/Netflix/Fenzo> which we open sourced recently. Also read about Fenzo autoscaling here <https://github.com/Netflix/Fenzo/wiki/Autoscaling>. You should look into using that if you are developing your own scheduler. Or, have your scheduler team pick up Fenzo for autoscaling.

Also, note that scaling up is temptingly easy by watching the pending task queue. But, scaling down requires bin packing, etc. Other issues pop up as well, for example:

- what if a user submits tasks that cannot be satisfied? Will autoscale keep increasing the cluster size unbounded?
- what if you would like to have a heterogeneous mix of hosts and tasks? which kind of hosts do you need to autoscale based on which tasks are pending?

These are automatically addressed in Fenzo.

Sharma
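
To illustrate the first pitfall, here is a small sketch of a scale-up estimate with two guard rails: it ignores tasks that could never fit on a single host, and it caps growth at a maximum cluster size. This is not how Fenzo does it; the function name, the single host shape, and the parameters are assumptions, and the estimate is a crude lower bound rather than real bin packing.

```python
import math


def agents_to_add(pending, host_cpus, host_mem_mb, current, max_agents):
    """Estimate how many agents to add for the pending tasks.

    pending: list of (cpus, mem_mb) requests.
    Tasks larger than one host are skipped so they cannot drive
    unbounded growth; the result never exceeds max_agents - current.
    """
    fits = [
        (cpus, mem_mb)
        for cpus, mem_mb in pending
        if cpus <= host_cpus and mem_mb <= host_mem_mb
    ]
    need_cpus = sum(cpus for cpus, _ in fits)
    need_mem = sum(mem_mb for _, mem_mb in fits)
    wanted = max(
        math.ceil(need_cpus / host_cpus),
        math.ceil(need_mem / host_mem_mb),
    )
    return max(0, min(wanted, max_agents - current))


# The third task asks for 64 CPUs and can never fit on an 8-CPU host.
pending = [(2, 4096), (4, 8192), (64, 1024)]
to_add = agents_to_add(pending, host_cpus=8, host_mem_mb=16384,
                       current=9, max_agents=10)
print(to_add)
```

The second pitfall (a heterogeneous mix of hosts) would need one such estimate per host type, plus a rule for which tasks go where.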

On Wed, Sep 23, 2015 at 4:56 AM, Aaron Carey <ac...@ilm.com> wrote:
No, I basically had the same question as Jim (but maybe didn't word it so well ;))

I'll have a look at your response there :)

________________________________
From: haosdent [haosdent@gmail.com]
Sent: 23 September 2015 10:12
To: user@mesos.apache.org
Subject: Re: Metric for tasks queued/waiting?

Does /metrics/snapshot not satisfy your requirement?

On Wed, Sep 23, 2015 at 4:50 PM, Aaron Carey <ac...@ilm.com> wrote:
Hi all,

Is there any way to get a metric of all tasks currently waiting/queued in Mesos (across all schedulers)? The snapshot metrics seem to cover every other kind of task state. This would be quite useful for auto-scaling purposes.

Thanks,
Aaron

--
Best Regards,
Haosdent Huang

Re: Metric for tasks queued/waiting?

Posted by Alex Gaudio <ad...@gmail.com>.

Hi Aaron,

You might consider trying to solve the autoscaling problem with Relay, a
Python tool I use to solve this problem.  Feel free to shoot me an email if
you are interested.

github.com/sailthru/relay

Alex

On Wed, Sep 23, 2015, 11:03 AM David Greenberg <ds...@gmail.com>
wrote:

> In addition, this technique could be implemented in the allocator with an
> understanding of global demand:
> https://www.youtube.com/watch?v=BkBMYUe76oI
>
> That would allow for tunable fair-sharing based on DRF-principles.
>
> On Wed, Sep 23, 2015 at 10:59 AM haosdent <ha...@gmail.com> wrote:
>
>> Feel free to open a story in jira if you think your ideas are awesome. :-)
>>
> On Sep 23, 2015 10:54 PM, "Sharma Podila" <sp...@netflix.com> wrote:
>>
>>> Ah, OK, thanks. Yes, Fenzo is a Java library.
>>>
>>> It might be a nice addition to Mesos master to get a global view of
>>> contention for resources. In addition to autoscaling, it would be useful in
>>> the allocator.
>>>
>>>
>>>
>>> On Wed, Sep 23, 2015 at 7:29 AM, Aaron Carey <ac...@ilm.com> wrote:
>>>
>>>> Thanks Sharma,
>>>>
>>>> I was in the audience for a talk you did about Fenzo at MesosCon :) It
>>>> looked great but we're a python shop primarily so the Java requirement
>>>> would be a problem for us.
>>>>
>>>> The scaling in the scheduler makes total sense, (obvious when you think
>>>> about it!), I was naively hoping for some sort of knowledge of that back in
>>>> the Mesos master as we were hoping to have scaling be independent of
>>>> schedulers. I think this'll need a re-think!
>>>>
>>>> Thanks for your help!
>>>>
>>>> Aaron
>>>>
>>>> ------------------------------
>>>> *From:* Sharma Podila [spodila@netflix.com]
>>>> *Sent:* 23 September 2015 15:22
>>>>
>>>> *To:* user@mesos.apache.org
>>>> *Subject:* Re: Metric for tasks queued/waiting?
>>>>
>>>> Jobs/tasks wait in framework schedulers, not mesos master. Autoscaling
>>>> triggers must come from schedulers, not only because that's who knows the
>>>> pending task set size, but, also because it knows how many of them need to
>>>> be launched right away, on what kind of machines.
>>>>
>>>> We built such an autoscaling capability in our framework schedulers.
>>>> The autoscaling is achieved by our library Fenzo
>>>> <https://github.com/Netflix/Fenzo> which we open sourced recently.
>>>> Also read about Fenzo autoscaling here
>>>> <https://github.com/Netflix/Fenzo/wiki/Autoscaling>. You should look
>>>> into using that if you are developing your own scheduler. Or, have your
>>>> scheduler team pick up Fenzo for autoscaling.
>>>>
>>>> Also, note that scaling up is temptingly easy by watching the pending
>>>> task queue. But, scaling down requires bin packing, etc. Other issues pop
>>>> up as well, for example:
>>>>
>>>> - what if a user submits tasks that cannot be satisfied? Will autoscale
>>>> keep increasing the cluster size unbounded?
>>>> - what if you would like to have a heterogeneous mix of hosts and
>>>> tasks? which kind of hosts do you need to autoscale based on which tasks
>>>> are pending?
>>>>
>>>> These are automatically addressed in Fenzo.
>>>>
>>>> Sharma
>>>>
>>>>
>>>> On Wed, Sep 23, 2015 at 4:56 AM, Aaron Carey <ac...@ilm.com> wrote:
>>>>
>>>>> No, I basically had the same question as Jim (but maybe didn't word it
>>>>> so well ;))
>>>>>
>>>>> I'll have a look at your response there :)
>>>>>
>>>>> ------------------------------
>>>>> *From:* haosdent [haosdent@gmail.com]
>>>>> *Sent:* 23 September 2015 10:12
>>>>> *To:* user@mesos.apache.org
>>>>> *Subject:* Re: Metric for tasks queued/waiting?
>>>>>
>>>>> Does /metrics/snapshot not satisfy your requirement?
>>>>>
>>>>> On Wed, Sep 23, 2015 at 4:50 PM, Aaron Carey <ac...@ilm.com> wrote:
>>>>>
>>>>>> Hi all,
>>>>>>
>>>>>> Is there any way to get a metric of all tasks currently
>>>>>> waiting/queued in Mesos (across all schedulers)? The snapshot metrics seem
>>>>>> to cover every other kind of task state. This would be quite useful for
>>>>>> auto-scaling purposes.
>>>>>>
>>>>>> Thanks,
>>>>>> Aaron
>>>>>>
>>>>>
>>>>>
>>>>>
>>>>> --
>>>>> Best Regards,
>>>>> Haosdent Huang
>>>>>
>>>>
>>>>
>>>

Re: Metric for tasks queued/waiting?

Posted by David Greenberg <ds...@gmail.com>.

In addition, this technique could be implemented in the allocator with an
understanding of global demand: https://www.youtube.com/watch?v=BkBMYUe76oI

That would allow for tunable fair-sharing based on DRF-principles.
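
For readers unfamiliar with DRF (Dominant Resource Fairness), here is a minimal sketch of the dominant-share calculation it is built on. This is not the Mesos allocator's code; the function names and data shapes are assumptions. A framework's dominant share is its largest fractional use of any single resource, and the next offer goes to the framework with the smallest dominant share.

```python
def dominant_share(usage, capacity):
    """Largest fraction of any one resource that this framework holds."""
    return max(usage.get(name, 0) / capacity[name] for name in capacity)


def next_framework(usages, capacity):
    """Pick the framework with the smallest dominant share."""
    return min(usages, key=lambda name: dominant_share(usages[name], capacity))


capacity = {"cpus": 9, "mem_gb": 18}
usages = {
    "A": {"cpus": 3, "mem_gb": 3},  # shares 3/9 and 3/18, dominant 1/3
    "B": {"cpus": 2, "mem_gb": 8},  # shares 2/9 and 8/18, dominant 4/9
}
chosen = next_framework(usages, capacity)
print(chosen)
```

Feeding pending demand into this choice, rather than usage alone, is roughly what an allocator with an understanding of global demand would add.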

On Wed, Sep 23, 2015 at 10:59 AM haosdent <ha...@gmail.com> wrote:

> Feel free to open a story in jira if you think your ideas are awesome. :-)
> On Sep 23, 2015 10:54 PM, "Sharma Podila" <sp...@netflix.com> wrote:
>
>> Ah, OK, thanks. Yes, Fenzo is a Java library.
>>
>> It might be a nice addition to Mesos master to get a global view of
>> contention for resources. In addition to autoscaling, it would be useful in
>> the allocator.
>>
>>
>>
>> On Wed, Sep 23, 2015 at 7:29 AM, Aaron Carey <ac...@ilm.com> wrote:
>>
>>> Thanks Sharma,
>>>
>>> I was in the audience for a talk you did about Fenzo at MesosCon :) It
>>> looked great but we're a python shop primarily so the Java requirement
>>> would be a problem for us.
>>>
>>> The scaling in the scheduler makes total sense, (obvious when you think
>>> about it!), I was naively hoping for some sort of knowledge of that back in
>>> the Mesos master as we were hoping to have scaling be independent of
>>> schedulers. I think this'll need a re-think!
>>>
>>> Thanks for your help!
>>>
>>> Aaron
>>>
>>> ------------------------------
>>> *From:* Sharma Podila [spodila@netflix.com]
>>> *Sent:* 23 September 2015 15:22
>>>
>>> *To:* user@mesos.apache.org
>>> *Subject:* Re: Metric for tasks queued/waiting?
>>>
>>> Jobs/tasks wait in framework schedulers, not mesos master. Autoscaling
>>> triggers must come from schedulers, not only because that's who knows the
>>> pending task set size, but, also because it knows how many of them need to
>>> be launched right away, on what kind of machines.
>>>
>>> We built such an autoscaling capability in our framework schedulers. The
>>> autoscaling is achieved by our library Fenzo
>>> <https://github.com/Netflix/Fenzo> which we open sourced recently. Also
>>> read about Fenzo autoscaling here
>>> <https://github.com/Netflix/Fenzo/wiki/Autoscaling>. You should look
>>> into using that if you are developing your own scheduler. Or, have your
>>> scheduler team pick up Fenzo for autoscaling.
>>>
>>> Also, note that scaling up is temptingly easy by watching the pending
>>> task queue. But, scaling down requires bin packing, etc. Other issues pop
>>> up as well, for example:
>>>
>>> - what if a user submits tasks that cannot be satisfied? Will autoscale
>>> keep increasing the cluster size unbounded?
>>> - what if you would like to have a heterogeneous mix of hosts and tasks?
>>> which kind of hosts do you need to autoscale based on which tasks are
>>> pending?
>>>
>>> These are automatically addressed in Fenzo.
>>>
>>> Sharma
>>>
>>>
>>> On Wed, Sep 23, 2015 at 4:56 AM, Aaron Carey <ac...@ilm.com> wrote:
>>>
>>>> No, I basically had the same question as Jim (but maybe didn't word it
>>>> so well ;))
>>>>
>>>> I'll have a look at your response there :)
>>>>
>>>> ------------------------------
>>>> *From:* haosdent [haosdent@gmail.com]
>>>> *Sent:* 23 September 2015 10:12
>>>> *To:* user@mesos.apache.org
>>>> *Subject:* Re: Metric for tasks queued/waiting?
>>>>
>>>> Does /metrics/snapshot not satisfy your requirement?
>>>>
>>>> On Wed, Sep 23, 2015 at 4:50 PM, Aaron Carey <ac...@ilm.com> wrote:
>>>>
>>>>> Hi all,
>>>>>
>>>>> Is there any way to get a metric of all tasks currently waiting/queued
>>>>> in Mesos (across all schedulers)? The snapshot metrics seem to cover every
>>>>> other kind of task state. This would be quite useful for auto-scaling
>>>>> purposes.
>>>>>
>>>>> Thanks,
>>>>> Aaron
>>>>>
>>>>
>>>>
>>>>
>>>> --
>>>> Best Regards,
>>>> Haosdent Huang
>>>>
>>>
>>>
>>

Re: Metric for tasks queued/waiting?

Posted by haosdent <ha...@gmail.com>.

Feel free to open a story in jira if you think your ideas are awesome. :-)
On Sep 23, 2015 10:54 PM, "Sharma Podila" <sp...@netflix.com> wrote:

> Ah, OK, thanks. Yes, Fenzo is a Java library.
>
> It might be a nice addition to Mesos master to get a global view of
> contention for resources. In addition to autoscaling, it would be useful in
> the allocator.
>
>
>
> On Wed, Sep 23, 2015 at 7:29 AM, Aaron Carey <ac...@ilm.com> wrote:
>
>> Thanks Sharma,
>>
>> I was in the audience for a talk you did about Fenzo at MesosCon :) It
>> looked great but we're a python shop primarily so the Java requirement
>> would be a problem for us.
>>
>> The scaling in the scheduler makes total sense, (obvious when you think
>> about it!), I was naively hoping for some sort of knowledge of that back in
>> the Mesos master as we were hoping to have scaling be independent of
>> schedulers. I think this'll need a re-think!
>>
>> Thanks for your help!
>>
>> Aaron
>>
>> ------------------------------
>> *From:* Sharma Podila [spodila@netflix.com]
>> *Sent:* 23 September 2015 15:22
>>
>> *To:* user@mesos.apache.org
>> *Subject:* Re: Metric for tasks queued/waiting?
>>
>> Jobs/tasks wait in framework schedulers, not mesos master. Autoscaling
>> triggers must come from schedulers, not only because that's who knows the
>> pending task set size, but, also because it knows how many of them need to
>> be launched right away, on what kind of machines.
>>
>> We built such an autoscaling capability in our framework schedulers. The
>> autoscaling is achieved by our library Fenzo
>> <https://github.com/Netflix/Fenzo> which we open sourced recently. Also
>> read about Fenzo autoscaling here
>> <https://github.com/Netflix/Fenzo/wiki/Autoscaling>. You should look
>> into using that if you are developing your own scheduler. Or, have your
>> scheduler team pick up Fenzo for autoscaling.
>>
>> Also, note that scaling up is temptingly easy by watching the pending
>> task queue. But, scaling down requires bin packing, etc. Other issues pop
>> up as well, for example:
>>
>> - what if a user submits tasks that cannot be satisfied? Will autoscale
>> keep increasing the cluster size unbounded?
>> - what if you would like to have a heterogeneous mix of hosts and tasks?
>> which kind of hosts do you need to autoscale based on which tasks are
>> pending?
>>
>> These are automatically addressed in Fenzo.
>>
>> Sharma
>>
>>
>> On Wed, Sep 23, 2015 at 4:56 AM, Aaron Carey <ac...@ilm.com> wrote:
>>
>>> No, I basically had the same question as Jim (but maybe didn't word it
>>> so well ;))
>>>
>>> I'll have a look at your response there :)
>>>
>>> ------------------------------
>>> *From:* haosdent [haosdent@gmail.com]
>>> *Sent:* 23 September 2015 10:12
>>> *To:* user@mesos.apache.org
>>> *Subject:* Re: Metric for tasks queued/waiting?
>>>
>>> Does /metrics/snapshot not satisfy your requirement?
>>>
>>> On Wed, Sep 23, 2015 at 4:50 PM, Aaron Carey <ac...@ilm.com> wrote:
>>>
>>>> Hi all,
>>>>
>>>> Is there any way to get a metric of all tasks currently waiting/queued
>>>> in Mesos (across all schedulers)? The snapshot metrics seem to cover every
>>>> other kind of task state? This would be quite useful for auto-scaling
>>>> purposes..
>>>>
>>>> Thanks,
>>>> Aaron
>>>>
>>>
>>>
>>>
>>> --
>>> Best Regards,
>>> Haosdent Huang
>>>
>>
>>
>

Re: Metric for tasks queued/waiting?

Posted by Sharma Podila <sp...@netflix.com>.

Ah, OK, thanks. Yes, Fenzo is a Java library.

It might be a nice addition to Mesos master to get a global view of
contention for resources. In addition to autoscaling, it would be useful in
the allocator.



On Wed, Sep 23, 2015 at 7:29 AM, Aaron Carey <ac...@ilm.com> wrote:

> Thanks Sharma,
>
> I was in the audience for a talk you did about Fenzo at MesosCon :) It
> looked great but we're a python shop primarily so the Java requirement
> would be a problem for us.
>
> The scaling in the scheduler makes total sense, (obvious when you think
> about it!), I was naively hoping for some sort of knowledge of that back in
> the Mesos master as we were hoping to have scaling be independent of
> schedulers. I think this'll need a re-think!
>
> Thanks for your help!
>
> Aaron
>
> ------------------------------
> *From:* Sharma Podila [spodila@netflix.com]
> *Sent:* 23 September 2015 15:22
>
> *To:* user@mesos.apache.org
> *Subject:* Re: Metric for tasks queued/waiting?
>
> Jobs/tasks wait in framework schedulers, not mesos master. Autoscaling
> triggers must come from schedulers, not only because that's who knows the
> pending task set size, but, also because it knows how many of them need to
> be launched right away, on what kind of machines.
>
> We built such an autoscaling capability in our framework schedulers. The
> autoscaling is achieved by our library Fenzo
> <https://github.com/Netflix/Fenzo> which we open sourced recently. Also
> read about Fenzo autoscaling here
> <https://github.com/Netflix/Fenzo/wiki/Autoscaling>. You should look into
> using that if you are developing your own scheduler. Or, have your
> scheduler team pick up Fenzo for autoscaling.
>
> Also, note that scaling up is temptingly easy by watching the pending task
> queue. But, scaling down requires bin packing, etc. Other issues pop up as
> well, for example:
>
> - what if a user submits tasks that cannot be satisfied? Will autoscale
> keep increasing the cluster size unbounded?
> - what if you would like to have a heterogeneous mix of hosts and tasks?
> which kind of hosts do you need to autoscale based on which tasks are
> pending?
>
> These are automatically addressed in Fenzo.
>
> Sharma
>
>
> On Wed, Sep 23, 2015 at 4:56 AM, Aaron Carey <ac...@ilm.com> wrote:
>
>> No, I basically had the same question as Jim (but maybe didn't word it so
>> well ;))
>>
>> I'll have a look at your response there :)
>>
>> ------------------------------
>> *From:* haosdent [haosdent@gmail.com]
>> *Sent:* 23 September 2015 10:12
>> *To:* user@mesos.apache.org
>> *Subject:* Re: Metric for tasks queued/waiting?
>>
>> Does /metrics/snapshot not satisfy your requirement?
>>
>> On Wed, Sep 23, 2015 at 4:50 PM, Aaron Carey <ac...@ilm.com> wrote:
>>
>>> Hi all,
>>>
>>> Is there any way to get a metric of all tasks currently waiting/queued
>>> in Mesos (across all schedulers)? The snapshot metrics seem to cover every
>>> other kind of task state? This would be quite useful for auto-scaling
>>> purposes..
>>>
>>> Thanks,
>>> Aaron
>>>
>>
>>
>>
>> --
>> Best Regards,
>> Haosdent Huang
>>
>
>

RE: Metric for tasks queued/waiting?

Posted by Aaron Carey <ac...@ilm.com>.

Thanks Sharma,

I was in the audience for a talk you did about Fenzo at MesosCon :) It looked great, but we're primarily a Python shop, so the Java requirement would be a problem for us.

The scaling in the scheduler makes total sense (obvious when you think about it!). I was naively hoping for some sort of knowledge of that back in the Mesos master, as we were hoping to have scaling be independent of schedulers. I think this'll need a re-think!

Thanks for your help!

Aaron

________________________________
From: Sharma Podila [spodila@netflix.com]
Sent: 23 September 2015 15:22
To: user@mesos.apache.org
Subject: Re: Metric for tasks queued/waiting?

Jobs/tasks wait in framework schedulers, not mesos master. Autoscaling triggers must come from schedulers, not only because that's who knows the pending task set size, but, also because it knows how many of them need to be launched right away, on what kind of machines.

We built such an autoscaling capability in our framework schedulers. The autoscaling is achieved by our library Fenzo <https://github.com/Netflix/Fenzo> which we open sourced recently. Also read about Fenzo autoscaling here <https://github.com/Netflix/Fenzo/wiki/Autoscaling>. You should look into using that if you are developing your own scheduler. Or, have your scheduler team pick up Fenzo for autoscaling.

Also, note that scaling up is temptingly easy by watching the pending task queue. But, scaling down requires bin packing, etc. Other issues pop up as well, for example:

- what if a user submits tasks that cannot be satisfied? Will autoscale keep increasing the cluster size unbounded?
- what if you would like to have a heterogeneous mix of hosts and tasks? which kind of hosts do you need to autoscale based on which tasks are pending?

These are automatically addressed in Fenzo.

Sharma


On Wed, Sep 23, 2015 at 4:56 AM, Aaron Carey <ac...@ilm.com> wrote:
No, I basically had the same question as Jim (but maybe didn't word it so well ;))

I'll have a look at your response there :)

________________________________
From: haosdent [haosdent@gmail.com]
Sent: 23 September 2015 10:12
To: user@mesos.apache.org
Subject: Re: Metric for tasks queued/waiting?

Does /metrics/snapshot not satisfy your requirement?

On Wed, Sep 23, 2015 at 4:50 PM, Aaron Carey <ac...@ilm.com> wrote:
Hi all,

Is there any way to get a metric of all tasks currently waiting/queued in Mesos (across all schedulers)? The snapshot metrics seem to cover every other kind of task state? This would be quite useful for auto-scaling purposes..

Thanks,
Aaron



--
Best Regards,
Haosdent Huang

Re: Metric for tasks queued/waiting?

Posted by Sharma Podila <sp...@netflix.com>.

Jobs/tasks wait in framework schedulers, not in the Mesos master. Autoscaling
triggers must come from the schedulers, not only because they know the
pending task set size, but also because they know how many of those tasks
need to be launched right away, and on what kind of machines.

We built such an autoscaling capability in our framework schedulers. The
autoscaling is achieved by our library Fenzo
<https://github.com/Netflix/Fenzo> which we open sourced recently. Also
read about Fenzo autoscaling here
<https://github.com/Netflix/Fenzo/wiki/Autoscaling>. You should look into
using that if you are developing your own scheduler. Or, have your
scheduler team pick up Fenzo for autoscaling.

Also, note that scaling up is temptingly easy by watching the pending task
queue. But, scaling down requires bin packing, etc. Other issues pop up as
well, for example:

- what if a user submits tasks that cannot be satisfied? Will autoscale
keep increasing the cluster size unbounded?
- what if you would like to have a heterogeneous mix of hosts and tasks?
which kind of hosts do you need to autoscale based on which tasks are
pending?

These are automatically addressed in Fenzo.
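
For a scheduler written in Python, the scale-up side of the points above
might be sketched roughly as follows. This is not Fenzo's API or algorithm;
it is only an illustration under stated assumptions: the names Task,
HostType and hosts_to_add are hypothetical, there is a single host type,
pending tasks are assumed not to fit on any existing host, and first-fit
decreasing packing is used as a stand-in estimate.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Task:
    # Hypothetical pending-task record kept by the framework scheduler.
    cpus: float
    mem_mb: float


@dataclass(frozen=True)
class HostType:
    # Hypothetical description of the one kind of host we can add.
    cpus: float
    mem_mb: float


def hosts_to_add(pending, host, current_hosts, max_hosts):
    """Estimate how many hosts of one type to add for the pending tasks."""
    # Ignore tasks that could never fit on this host type, so that an
    # unsatisfiable request cannot grow the cluster without bound.
    fits = [t for t in pending
            if t.cpus <= host.cpus and t.mem_mb <= host.mem_mb]

    # First-fit decreasing: place larger tasks first, each into the first
    # hypothetical new host that still has room for it.
    free = []  # remaining [cpus, mem_mb] on each hypothetical new host
    for task in sorted(fits, key=lambda t: (t.cpus, t.mem_mb), reverse=True):
        for slot in free:
            if task.cpus <= slot[0] and task.mem_mb <= slot[1]:
                slot[0] -= task.cpus
                slot[1] -= task.mem_mb
                break
        else:
            free.append([host.cpus - task.cpus, host.mem_mb - task.mem_mb])

    # Cap the request at the configured cluster size limit.
    headroom = max(0, max_hosts - current_hosts)
    return min(len(free), headroom)
```

The cap and the "can this ever fit" filter address the unbounded-growth
question; choosing among several host types and scaling down are left out
of this sketch.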

Sharma

On Wed, Sep 23, 2015 at 4:56 AM, Aaron Carey <ac...@ilm.com> wrote:

> No, I basically had the same question as Jim (but maybe didn't word it so
> well ;))
>
> I'll have a look at your response there :)
>
> ------------------------------
> *From:* haosdent [haosdent@gmail.com]
> *Sent:* 23 September 2015 10:12
> *To:* user@mesos.apache.org
> *Subject:* Re: Metric for tasks queued/waiting?
>
> Does /metrics/snapshot not satisfy your requirement?
>
> On Wed, Sep 23, 2015 at 4:50 PM, Aaron Carey <ac...@ilm.com> wrote:
>
>> Hi all,
>>
>> Is there any way to get a metric of all tasks currently waiting/queued in
>> Mesos (across all schedulers)? The snapshot metrics seem to cover every
>> other kind of task state? This would be quite useful for auto-scaling
>> purposes..
>>
>> Thanks,
>> Aaron
>>
>
>
>
> --
> Best Regards,
> Haosdent Huang
>

RE: Metric for tasks queued/waiting?

Posted by Aaron Carey <ac...@ilm.com>.

No, I basically had the same question as Jim (but maybe didn't word it so well ;))

I'll have a look at your response there :)

________________________________
From: haosdent [haosdent@gmail.com]
Sent: 23 September 2015 10:12
To: user@mesos.apache.org
Subject: Re: Metric for tasks queued/waiting?

Does /metrics/snapshot not satisfy your requirement?

On Wed, Sep 23, 2015 at 4:50 PM, Aaron Carey <ac...@ilm.com> wrote:
Hi all,

Is there any way to get a metric of all tasks currently waiting/queued in Mesos (across all schedulers)? The snapshot metrics seem to cover every other kind of task state? This would be quite useful for auto-scaling purposes..

Thanks,
Aaron

--
Best Regards,
Haosdent Huang

Re: Metric for tasks queued/waiting?

Posted by haosdent <ha...@gmail.com>.

Does /metrics/snapshot not satisfy your requirement?
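
For reference, a minimal Python sketch of reading the per-state task
metrics from that endpoint. The master URL passed in by the caller is a
placeholder, and the helper names fetch_snapshot and task_state_counts are
made up for this example. It assumes the snapshot is a flat JSON object
whose task metrics use the "master/tasks_" key prefix (for example
master/tasks_running and master/tasks_staging).

```python
import json
import urllib.request


def fetch_snapshot(master_url):
    """Fetch and decode the master's /metrics/snapshot JSON document."""
    url = master_url.rstrip("/") + "/metrics/snapshot"
    with urllib.request.urlopen(url) as resp:
        return json.load(resp)


def task_state_counts(snapshot):
    """Return only the master's per-state task metrics from a snapshot."""
    prefix = "master/tasks_"
    return {
        key[len(prefix):]: value
        for key, value in snapshot.items()
        if key.startswith(prefix)
    }
```

Usage would be along the lines of
task_state_counts(fetch_snapshot("http://<master-host>:5050")), with the
host and port adjusted to your cluster.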

On Wed, Sep 23, 2015 at 4:50 PM, Aaron Carey <ac...@ilm.com> wrote:

> Hi all,
>
> Is there any way to get a metric of all tasks currently waiting/queued in
> Mesos (across all schedulers)? The snapshot metrics seem to cover every
> other kind of task state? This would be quite useful for auto-scaling
> purposes..
>
> Thanks,
> Aaron
>



-- 
Best Regards,
Haosdent Huang