Posted to dev@mesos.apache.org by Benjamin Mahler <bm...@apache.org> on 2018/05/16 18:06:45 UTC

[Performance WG] Notes from meeting today

Hi folks,

Here are some notes from the performance meeting today.

(1) First I did a demo of flamescope, which you can find here:
https://github.com/Netflix/flamescope

It's a very useful tool. Hopefully we can make it easier for users to
generate the data that we can drop into flamescope when reporting any
performance issues. One of the open questions is how `perf record
--call-graph dwarf` compares to `perf record -g` with mesos compiled with
frame pointers. I haven't had time to check this yet.

When playing with the tool, it was easy to find some hot spots in the
cluster I was looking at (which was not necessarily representative). For
the agent, Jie filed:

https://issues.apache.org/jira/browse/MESOS-8901

And for the master, I noticed that metrics, state json generation (no
surprise), and a particular spot in the allocator were very expensive.

Metrics we'd like to address via migration to push gauges (Zhitao has
offered to help with this effort):

https://issues.apache.org/jira/browse/MESOS-8914

The state generation we'd like to address by streaming state into a
separate actor (and providing filtering as well). This will get further
investigated / prioritized very soon:

https://issues.apache.org/jira/browse/MESOS-8345

(2) Kapil discussed benchmarks for the long-standing "offer starvation"
issue:

https://issues.apache.org/jira/browse/MESOS-3202

I'll send out an email or document soon with some background on this issue
as well as our options to address it.

Let me know if you have any questions or feedback!

Ben

Re: [mesos-mail] Re: [Performance WG] Notes from meeting today

Posted by Benjamin Mahler <bm...@apache.org>.
I just pushed some initial documentation for this. It will show up soon
next to the memory profiling link:

http://mesos.apache.org/documentation/latest/#administration

On Fri, May 25, 2018 at 6:13 PM, Benjamin Mahler <bm...@apache.org> wrote:

> I'll write up some instructions with what I know so far and get it added
> to the website. In the meantime, here's what you need to do to generate a
> 60 second profile:
>
> $ sudo perf record -F 100 -a -g --call-graph dwarf -p <mesos-master-pid> -- sleep 60
> $ sudo perf script --header | c++filt > mesos-master.stacks
> $ gzip mesos-master.stacks
> # Share the mesos-master.stacks.gz file for analysis.
>
> It seems that frame pointer omission is ok, as long as '--call-graph
> dwarf' is provided to perf. I don't yet know if frame pointers yield better
> traces than '--call-graph dwarf' without frame pointers.
>
> If you want to use flamescope yourself, follow the instructions here and
> put the unzipped file above into the 'examples' directory:
> https://github.com/Netflix/flamescope
>

Re: [mesos-mail] Re: [Performance WG] Notes from meeting today

Posted by Benjamin Mahler <bm...@apache.org>.
I'll write up some instructions with what I know so far and get them added
to the website. In the meantime, here's what you need to do to generate a
60 second profile:

$ sudo perf record -F 100 -a -g --call-graph dwarf -p <mesos-master-pid> -- sleep 60
$ sudo perf script --header | c++filt > mesos-master.stacks
$ gzip mesos-master.stacks
# Share the mesos-master.stacks.gz file for analysis.
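
As a sketch only, the steps above could be wrapped in a small shell
helper so that the pid, duration, and output name are easy to vary. The
function name `profile_commands` and its arguments are made up for
illustration; it only prints the commands, so they can be reviewed before
anything is run under sudo.

```shell
#!/bin/sh
# Hypothetical helper (not part of Mesos or perf): print the three commands
# from the steps above for a given pid, duration, and output file name.
# Review the output, then pipe it to `sh` to actually take the profile.
profile_commands() {
  pid="$1"                          # pid of the process to profile
  seconds="${2:-60}"                # profile duration, defaults to 60 seconds
  out="${3:-mesos-master.stacks}"   # name of the stacks file to produce

  printf 'sudo perf record -F 100 -a -g --call-graph dwarf -p %s -- sleep %s\n' "$pid" "$seconds"
  printf 'sudo perf script --header | c++filt > %s\n' "$out"
  printf 'gzip %s\n' "$out"
}

# Example: profile_commands 1234 30 | sh
```

Printing rather than executing keeps the helper harmless on machines
where perf is not installed.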

It seems that frame pointer omission is ok, as long as '--call-graph dwarf'
is provided to perf. I don't yet know if frame pointers yield better traces
than '--call-graph dwarf' without frame pointers.
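
One rough way to look into that open question would be to take a second
profile with `--call-graph fp` against a build compiled with
`-fno-omit-frame-pointer`, and then compare sample counts and mean stack
depth between the two stacks files. The sketch below assumes the usual
`perf script` layout when call chains are recorded (an unindented line
per sample, indented frame lines, then a blank line). The name
`stack_depth_summary` is made up, and stack depth is only a proxy for
trace quality, not a verdict.

```shell
#!/bin/sh
# Hypothetical helper: summarize an (uncompressed) `perf script` output file.
# Assumed layout: lines starting with '#' are header comments, each sample
# starts with an unindented line, and each stack frame is an indented line.
stack_depth_summary() {
  awk '
    /^[^ \t#]/     { samples++ }   # first line of a sample
    /^[ \t]+[^ \t]/ { frames++ }   # one stack frame of the current sample
    END {
      if (samples > 0)
        printf "samples=%d mean_depth=%.1f\n", samples, frames / samples
      else
        print "samples=0"
    }
  ' "$1"
}

# Example: stack_depth_summary mesos-master.stacks
```

Running it on a dwarf profile and a frame pointer profile of the same
workload gives two lines that can be compared side by side.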

If you want to use flamescope yourself, follow the instructions here and
put the unzipped file above into the 'examples' directory:
https://github.com/Netflix/flamescope
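
For the last step, a minimal sketch of unzipping a shared profile into
the 'examples' directory of a local flamescope checkout. The function
name `stage_profile` and the checkout path are assumptions for
illustration; installing and starting flamescope itself is covered by the
instructions at the link above.

```shell
#!/bin/sh
# Hypothetical helper: unzip a shared profile into flamescope's examples
# directory and print the path of the unzipped file.
#   $1: path to the .gz file, e.g. mesos-master.stacks.gz
#   $2: path to a local flamescope checkout
stage_profile() {
  archive="$1"
  checkout="$2"
  dest="$checkout/examples/$(basename "$archive" .gz)"

  mkdir -p "$checkout/examples"
  # -c writes to stdout, so the original .gz file is left untouched
  gunzip -c "$archive" > "$dest" && printf '%s\n' "$dest"
}

# Example: stage_profile mesos-master.stacks.gz ~/flamescope
```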

On Thu, May 17, 2018 at 4:51 PM, Zhitao Li <zh...@gmail.com> wrote:

> Hi Ben,
>
> Thanks a lot, this is super informative.
>
> One question: will you write a blog/doc on how to generate flamescope
> graphs from either a micro-benchmark, or a real cluster? Also, do you know
> what configuration for compiling should be used to preserve proper debug
> symbols for both Mesos and 3rdparty libraries?
>

Re: [mesos-mail] Re: [Performance WG] Notes from meeting today

Posted by Judith Malnick <jm...@mesosphere.io>.
Hi, I just uploaded the video. It should finish processing in a couple of
minutes, and then you can find it here:
https://youtu.be/LyFYTVOaJfQ

On Thu, May 17, 2018 at 4:51 PM, Zhitao Li <zh...@gmail.com> wrote:

> Hi Ben,
>
> Thanks a lot, this is super informative.
>
> One question: will you write a blog/doc on how to generate flamescope
> graphs from either a micro-benchmark, or a real cluster? Also, do you know
> what configuration for compiling should be used to preserve proper debug
> symbols for both Mesos and 3rdparty libraries?
>



-- 
Judith Malnick
Community Manager
310-709-1517

Re: [mesos-mail] Re: [Performance WG] Notes from meeting today

Posted by Zhitao Li <zh...@gmail.com>.
Hi Ben,

Thanks a lot, this is super informative.

One question: will you write a blog/doc on how to generate flamescope
graphs from either a micro-benchmark, or a real cluster? Also, do you know
what configuration for compiling should be used to preserve proper debug
symbols for both Mesos and 3rdparty libraries?

On Wed, May 16, 2018 at 5:44 PM, Benjamin Mahler <bm...@apache.org> wrote:

> +Judith
>
> There should be a recording. Judith, do you know where they get posted?
>
> Benjamin, glad to hear it's useful, I'll continue doing it!
>



-- 
Cheers,

Zhitao Li

Re: [mesos-mail] Re: [Performance WG] Notes from meeting today

Posted by Benjamin Mahler <bm...@apache.org>.
+Judith

There should be a recording. Judith, do you know where they get posted?

Benjamin, glad to hear it's useful, I'll continue doing it!

On Wed, May 16, 2018 at 4:41 PM Gilbert Song <gi...@mesosphere.io> wrote:

> Do we have the recorded video for this meeting?
>

Re: [mesos-mail] Re: [Performance WG] Notes from meeting today

Posted by Gilbert Song <gi...@mesosphere.io>.
Do we have the recorded video for this meeting?

On Wed, May 16, 2018 at 1:54 PM, Benjamin Bannier <
benjamin.bannier@mesosphere.io> wrote:

> Hi Ben,
>
> thanks for taking the time to edit and share these detailed notes. Being
> able to asynchronously see the great work folks are doing surfaced is
> great, especially when put into context with thought like here.
>
>
> Benjamin
>

Re: [Performance WG] Notes from meeting today

Posted by Benjamin Bannier <be...@mesosphere.io>.
Hi Ben,

Thanks for taking the time to edit and share these detailed notes. Being
able to follow the great work folks are doing asynchronously is great,
especially when it is put into context as thoughtfully as it is here.


Benjamin
