Posted to kato-spec@incubator.apache.org by Andreas Grabner <an...@dynatrace.com> on 2009/11/04 11:46:34 UTC

RE: "Snapshot" support

Hi Steve - following up on the email I sent out two weeks ago. See my
answers below, starting with "AG:"

-----Original Message-----
From: Andreas Grabner 
Sent: Wednesday, 21 October 2009 10:41
To: kato-spec@incubator.apache.org
Subject: RE: "Snapshot" support

Hi Steve - find my answers below (lines starting with "AG:")

Let me know how you want to split up this thread as we are discussing
multiple topics here

thanks

-----Original Message-----
From: Steve Poole [mailto:spoole167@googlemail.com] 
Sent: Saturday, 17 October 2009 08:11
To: kato-spec@incubator.apache.org
Subject: Re: "Snapshot" support

On Wed, Oct 14, 2009 at 2:55 PM, Andreas Grabner <
andreas.grabner@dynatrace.com> wrote:

> Steve
>
> Thanks Andreas -  this is good stuff.  Questions below.

I propose we continue to  discuss  on this thread but with the aim of
pulling out the top level items as we go. For instance you've obviously
got
more requirements on JVMTI and we should pull that out as a separate
thread.    I'll do that once you've answered a few of my questions
below.

Thanks  again


>
> I am following up on Alois' email with some use cases that we have
> with our clients. Based on those use cases we also derived
> requirements.
>
>
>
> Use Case: Really Large Memory Dumps don't Scale
>
> Most of our enterprise customers run their applications on 64-bit
> systems with JVMs having > 1.5GB heap space.
>
>
Do you have info on the largest heap size you've encountered?

AG: We have seen heap sizes of 8GB

> Iterating through Java objects on the heap doesn't scale with growing
> heap sizes. Because object tagging creates a tag for every object on
> the heap, we quickly exhaust native memory.
>
> Do you have to tag all objects?

AG: Our Memory Snapshot feature visualizes the referrer tree of objects.
I believe we need to create a tag for each individual object in order
to get the reference information - unless there is another way of
walking the referrer tree that we are not aware of?
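
For context, the tagging approach described here typically looks like
the JVMTI 1.1 sketch below. This is a minimal illustration, not
dynaTrace's actual code (the function names are made up), but it shows
where the one-native-tag-per-live-object cost comes from:

#include <string.h>
#include <jvmti.h>

static jlong next_tag = 1;       /* one tag per visited object */

static jint JNICALL
ref_cb(jvmtiHeapReferenceKind kind, const jvmtiHeapReferenceInfo *info,
       jlong class_tag, jlong referrer_class_tag, jlong size,
       jlong *tag_ptr, jlong *referrer_tag_ptr, jint length,
       void *user_data)
{
    if (*tag_ptr == 0)           /* first time we see this object */
        *tag_ptr = next_tag++;   /* give it a stable identity */
    /* record the edge referrer -> object here; referrer_tag_ptr is
       NULL when the reference comes from a root */
    return JVMTI_VISIT_OBJECTS;  /* keep following references */
}

static void walk_referrer_tree(jvmtiEnv *jvmti)
{
    jvmtiHeapCallbacks callbacks;

    memset(&callbacks, 0, sizeof(callbacks));
    callbacks.heap_reference_callback = ref_cb;
    /* requires the can_tag_objects capability; the walk starts at the
       heap roots */
    (*jvmti)->FollowReferences(jvmti, 0, NULL, NULL, &callbacks, NULL);
}

The JVM has to keep a native-side tag-to-object mapping for every
object tagged this way, which is the memory cost described above.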


> Using the current JVMTI/PI APIs doesn't allow us to iterate over
> large heaps in a timely manner or without running into memory
> issues -> large memory dumps are often not possible!!
>

> Use Case: Provide full context information in OutOfMemory situation
>
> Capturing dump information in case of an OutOfMemory exception is key
> to understanding the root cause of this event.
>
> Access to the JVMTI interfaces to iterate the objects on the heap is
> not possible at this point in time, which makes it impossible to
> collect heap information in the same way as when creating a dump
> during "normal" runtime execution.
>
> Therefore no detailed memory dumps can be made in the event handler of
> an OutOfMemory exception!!
>

I don't understand this - the Eclipse MAT tool does detailed OOM
analysis and that uses the HPROF file for Sun JVMs and other dumps for
IBM JVMs - do you have extra requirements beyond what MAT can offer?

AG: In the next use case I explain why we prefer not to use dump files
for analysis but rather have a "central approach" where our agent
(which sits in the JVM) can grab all the information needed to perform
OOM analysis. In large distributed environments it's not feasible to
start collecting log files from different servers. With our agent
technology we can collect this data from within the JVM and send it
off to our central server that manages all JVMs in the system.
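
For context, JVMTI 1.1 does deliver a ResourceExhausted event when the
heap is exhausted; the complaint above is that the heap iteration
functions are not usable from inside that callback. A minimal
registration sketch (illustrative only; error handling omitted):

#include <string.h>
#include <jvmti.h>

static void JNICALL
resource_exhausted(jvmtiEnv *jvmti, JNIEnv *jni, jint flags,
                   const void *reserved, const char *description)
{
    if (flags & JVMTI_RESOURCE_EXHAUSTED_JAVA_HEAP) {
        /* The heap is gone: only pre-allocated, non-allocating work is
           safe here, e.g. signalling a separate agent thread to
           capture whatever it still can. */
    }
}

JNIEXPORT jint JNICALL
Agent_OnLoad(JavaVM *vm, char *options, void *reserved)
{
    jvmtiEnv *jvmti;
    jvmtiCapabilities caps;
    jvmtiEventCallbacks cb;

    if ((*vm)->GetEnv(vm, (void **)&jvmti, JVMTI_VERSION_1_1) != JNI_OK)
        return JNI_ERR;

    memset(&caps, 0, sizeof(caps));
    caps.can_generate_resource_exhaustion_heap_events = 1;
    (*jvmti)->AddCapabilities(jvmti, &caps);

    memset(&cb, 0, sizeof(cb));
    cb.ResourceExhausted = resource_exhausted;
    (*jvmti)->SetEventCallbacks(jvmti, &cb, sizeof(cb));
    (*jvmti)->SetEventNotificationMode(jvmti, JVMTI_ENABLE,
                                       JVMTI_EVENT_RESOURCE_EXHAUSTED,
                                       NULL);
    return JNI_OK;
}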

>
>
>
> Use Case: Central Management of Memory Dumps in Distributed Enterprise
> Applications
>
> Most of our enterprise customers run their distributed applications on
> multiple servers hosting multiple JVMs. Creating and analyzing memory
> dumps in these distributed scenarios must be centrally manageable.
>
> Creating dump files and storing them on the server machines is not a
> perfect solution because
>
> a)       Getting access to dump files on servers is often restricted
> by security policies
>
> b)       Local disk space is required
>
> Therefore a dump file approach is not an option for most of our
> customers!!
>
>  Understood.

>
>
>
> Requirements based on use cases
>
> *       Limit the native memory usage when iterating through objects
>
>        *       Eliminate the need for an additional object tag
> requiring native memory -> this can exhaust native memory with
> millions of objects
>        *       Instead of using the tag, use the object pointer
>        *       Must ensure that the object pointer stays constant (no
> objects are moved) throughout the iteration operation
>
>
> *       Enable JVMTI interface access to iterate through heap objects
> in case of resource exhaustion (OutOfMemory)
>
>        *       Having full access to all object heap interface
> functions allows us to capture this information in case of an OOM
>        *       Also have access to JVMTI interfaces for capturing
> stack traces
>        *       Can some part of this information also be made
> available in case of a more severe JVM crash?
>
> *       Native Interface for memory dump generation
>
That's an interesting idea - we were expecting to provide a Java API to
do that. Having a native version could easily make sense.

AG: Native would be our preference


>        *       In order to centrally manage memory dumps we need to be
> able to do it via a native interface within the JVM
>

I don't understand why you need to manage dumps via a native
interface?

AG: We have an agent that lives in the JVM. This agent sends memory
information to our central dynaTrace Server. This allows us to do
central management of all connected JVMs. I mentioned earlier that
working via dump files doesn't always work with our clients (security
policies, disk space, ...). As JVMTI already exists - why not extend
this API? We are also OK with a Java API as long as it works within the
JVM and not on dump files, and as long as the performance is not a
problem (compared to a native implementation).


>        *       JVMTI would be a perfect candidate assuming the
> existing limitations can be addressed
>        *       Separate native interface would be an alternative
> option
>
Agree - but in either case addressing the usage issues with JVMTI will
come down to understanding why JVMTI looks like it does now and how
other approaches may affect the runtime performance of a system.

AG: Agreed. Performance is a big topic for us. Getting this kind of
information must work fast - nobody wants to wait hours to grab a
detailed memory snapshot.

>
>
> Additional requirements (maybe not in the scope of this JSR)
>
>
I think these are all worth discussing - if you use the info then we
should explore if it makes sense to specify it.



> *       Access to Objects in PermGen
> *       Generation information when iterating through objects
>
>        *       which generation each object on the heap lives in
>
> *       Get access to Generation Sizes via JVMTI
>
>        *       Size information is available via JMX
>        *       so it should also be made available via the native
> interfaces
>
> *       Object Information on GC finished event
>
>        *       get information about how many objects have been
> moved/freed (either real object IDs or at least the size)
>        *       must be able to turn on/off this feature during runtime
> to keep overhead low when not needed
>
>
>
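
For context on the last bullet: JVMTI today offers
GarbageCollectionStart/Finish events but no per-object moved/freed
statistics, and the callbacks run under tight restrictions (no JNI and
almost no JVMTI calls are allowed inside them). A sketch of what the
existing API does support, including the runtime on/off toggle asked
for above (names are illustrative):

#include <string.h>
#include <jvmti.h>

static void JNICALL gc_finish(jvmtiEnv *jvmti)
{
    /* Restricted context: real agents just record a counter or
       timestamp here and defer heavy work to another thread. The
       per-object moved/freed counts requested above are not available
       from JVMTI today. */
}

static void enable_gc_events(jvmtiEnv *jvmti, int on)
{
    /* Can be flipped at runtime, keeping overhead near zero while the
       feature is off. Requires can_generate_garbage_collection_events. */
    (*jvmti)->SetEventNotificationMode(jvmti,
                                       on ? JVMTI_ENABLE : JVMTI_DISABLE,
                                       JVMTI_EVENT_GARBAGE_COLLECTION_FINISH,
                                       NULL);
}

static void setup_gc_events(jvmtiEnv *jvmti)
{
    jvmtiCapabilities caps;
    jvmtiEventCallbacks cb;

    memset(&caps, 0, sizeof(caps));
    caps.can_generate_garbage_collection_events = 1;
    (*jvmti)->AddCapabilities(jvmti, &caps);

    memset(&cb, 0, sizeof(cb));
    cb.GarbageCollectionFinish = gc_finish;
    (*jvmti)->SetEventCallbacks(jvmti, &cb, sizeof(cb));
}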
> Let me know if any of these use cases or requirements needs further
> explanation.
>
>
>
> Thanks
>
> Andi & Alois
>
>
>
>
>
> -----Original Message-----
> From: Alois Reitbauer
> Sent: Monday, 21 September 2009 17:01
> To: kato-spec@incubator.apache.org
> Cc: Andreas Grabner
> Subject: RE: "Snapshot" support
>
>
>
> Steve,
>
> we will be happy to contribute our use cases. I propose to start with
> memory dumps first and thread dumps later. Either Andi or I will come
> back with some concrete use cases.
>
> - Alois
>
>
>
> -----Original Message-----
>
> From: Steve Poole [mailto:spoole167@googlemail.com]
>
> Sent: Tuesday, 08 September 2009 06:31
>
> To: kato-spec@incubator.apache.org
>
> Subject: "Snapshot" support
>
>
>
> One of the capabilities that this API is intended to provide is
> support for "Snapshots".
>
> This is based on the idea that for various reasons the dumps that we
> can get today can be too big, take too long to generate, not have the
> right information, etc.
>
> Also we need to recognise that dumps are not only produced to help
> diagnose a failure.  Some users consume dumps as part of monitoring a
> live system.
>
> So we need to discuss (at least)
>
> a)  How dump content configuration would work
>
> b)  What sorts of data are needed in a snapshot dump
>
> This is the largest outstanding piece of the API.  Now with Alois and
> Andreas on board we can start to clarify use cases and resolve the
> design.
>
> Cheers
>
> Steve
>
>
>
>
>
>


-- 
Steve



Re: "Snapshot" support

Posted by Steve Poole <sp...@googlemail.com>.
Sorry Andreas - been out on vacation.    Made a quick reply below and I will
do a more detailed response later.


On Wed, Nov 4, 2009 at 11:46 AM, Andreas Grabner <
andreas.grabner@dynatrace.com> wrote:

> Hi Steve - following up on my email I've sent out two weeks ago. See my
> answers below starting with "AG:"
>
> > Iterating through Java objects on the heap doesn't scale with
> > growing heap sizes. Because object tagging creates a tag for every
> > object on the heap, we quickly exhaust native memory.
> >
> > Do you have to tag all objects?
>
> AG: Our Memory Snapshot feature visualizes the referrer tree of
> objects. I believe we need to create a tag for each individual object
> in order to get the reference information - unless there is another
> way of walking the referrer tree that we are not aware of?
>
For the RI and the JVMTI-based dump we've sidestepped the problem by
working from the threads and their object references. That means we
don't have to use tagging to find objects, but it does mean you have
to keep track of the ones you've seen before.
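
One way to read this - a sketch only, assuming a single
FollowReferences pass from the VM roots (which include the thread
stacks), with tags used purely as "seen before" markers during the
walk rather than as a pre-pass over the whole heap. This is a guess at
the shape of it, not necessarily what the RI actually does:

static jint JNICALL
visit_cb(jvmtiHeapReferenceKind kind, const jvmtiHeapReferenceInfo *info,
         jlong class_tag, jlong referrer_class_tag, jlong size,
         jlong *tag_ptr, jlong *referrer_tag_ptr, jint length,
         void *user_data)
{
    /* An object with several referrers is reported once per incoming
       reference, so a marker is needed to record each object only
       once - the "keep track of ones you've seen" bookkeeping. */
    if (*tag_ptr == 0) {
        *tag_ptr = 1;            /* mark as seen */
        /* record class_tag/size for this newly seen object */
    }
    if (referrer_tag_ptr == NULL) {
        /* a root reference, e.g. kind == JVMTI_HEAP_REFERENCE_STACK_LOCAL
           for a local variable on some thread's stack */
    }
    return JVMTI_VISIT_OBJECTS;
}

Even this spends one tag per object for the duration of the walk, so
the bookkeeping could equally live on the agent side; either way the
full-heap pre-tagging pass goes away.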



> I don't understand why you need to manage dumps via a native
> interface?
>
> AG: We have an agent that lives in the JVM. This agent sends memory
> information to our central dynaTrace Server. This allows us to do
> central management of all connected JVMs. I mentioned earlier that
> working via dump files doesn't always work with our clients (security
> policies, disk space, ...). As JVMTI already exists - why not extend
> this API? We are also OK with a Java API as long as it works within
> the JVM and not on dump files, and as long as the performance is not
> a problem (compared to a native implementation).
>
>
Hmm - OK, so we do need to be careful about not straying into the world
of tracing.  The JSR is intended to cover static data sets, even if
they are taken very frequently :-).  The data doesn't have to reside in
a dump file - it could of course just be sent down the wire to a remote
collection point.

My concern is that at this point specifying additions to JVMTI to
improve its performance or design is too early.  I appreciate that you
want to see JVMTI improved, but we need to move the discussion up a
level and focus on the externals (and leave the implementors the choice
of how they make it work).  If the way to make a sensible solution ends
up requiring new or improved native-level APIs then that's fine.


My understanding is that you do the following -

collect data
send it to a collection point
analyse it

(repeat)

Some of the questions we should be asking are

A)   What data is collected, and how do you define the criteria?
B)   What types of analysis take place, and (apart from just accessing
the data sent) what types of cross-collection information are required
(I'm thinking about object correlation)?
C)   How much data is required, and how often?

It's these sorts of questions that will help shape the API, and help us
drive down to how we actually need to imagine it being implemented.



>


-- 
Steve

Re: "Snapshot" support

Posted by Steve Poole <sp...@googlemail.com>.
Andreas, thanks for this info.

I don't think much of what you've described is outside the JSR scope at
the moment.  The JSR can deal with standardising the collection of data
(including statistics) and when dumps are triggered.  The form of the
data produced is intentionally implementation specific, but the data
reading API would be part of the standard.

I think it's worth splitting out the topics I've outlined into separate
threads for further discussion, so I'll create threads for discussing
statistics and dump triggers.



On Mon, Nov 16, 2009 at 6:42 PM, Andreas Grabner <
andreas.grabner@dynatrace.com> wrote:

> Thanks Steve. This email seems to grow a lot - so I placed your current
> questions at the top - see my combined answer below
>
> A)   What data is collected, and how do you define the criteria?
> B)   What types of analysis take place, and (apart from just accessing
> the data sent) what types of cross-collection information are required
> (I'm thinking about object correlation)?
> C)   How much data is required, and how often?
>
> Our goal is to collect heap information from a remote location (our
> collectors). We have two options in terms of the depth of data. Simple
> Dump only includes the number of instances per class. The Extended Dump
> includes object references. Data collection is either triggered
> manually, scheduled (e.g.: every 30 minutes during a load-test) or
> triggered (in case of a certain event - e.g.: heavy memory usage). The
> collected "raw" data is sent from the JVM to the central collector,
> which analyzes the raw data in terms of, e.g.: what are the most used
> classes, walking the referrer tree, ...
>
> Our challenges/requests are
> * that this is very slow for very large heap sizes, as we see it with
> our clients
> * we don't get this information in case of a severe runtime problem,
> e.g.: OutOfMemory
> * additionally we want to get more information about the individual
> objects on the heap, e.g.: which generation they live in
>
> Some of this exceeds the scope of this JSR - please advise on what we
> should keep in here and what should be discussed elsewhere
>
> Thanks
>



-- 
Steve

RE: "Snapshot" support

Posted by Alois Reitbauer <al...@dynatrace.com>.
Steve,

good to see these requirements moving into the JSR. I also agree that
the internal structures are implementation specific. The vital point is
that tooling providers like us need a standardized way of accessing
this information from within the JVM. I think a lot of the current
problems are caused because JVMTI/PI specify how a JVM vendor has to do
certain things instead of defining which data should be returned and
how to work with this data.

- Alois

-----Original Message-----
From: Steve Poole [mailto:spoole167@googlemail.com] 
Sent: Tuesday, 17 November 2009 10:31
To: kato-spec@incubator.apache.org
Subject: Re: "Snapshot" support

Andreas, thanks for this info.

I don't think much of what you've described is outside the JSR scope at
the moment.  The JSR can deal with standardising the collection of data
(including statistics) and when dumps are triggered.  The form of the
data produced is intentionally implementation specific, but the data
reading API would be part of the standard.

I think it's worth splitting out the topics I've outlined into separate
threads for further discussion, so I'll create threads for discussing
statistics and dump triggers.






-- 
Steve

RE: "Snapshot" support

Posted by Andreas Grabner <an...@dynatrace.com>.
Thanks Steve. This email seems to grow a lot - so I placed your current
questions at the top - see my combined answer below

A)   What data is collected, and how do you define the criteria?
B)   What types of analysis take place, and (apart from just accessing
the data sent) what types of cross-collection information are required
(I'm thinking about object correlation)?
C)   How much data is required, and how often?

Our goal is to collect heap information from a remote location (our
collectors). We have two options in terms of the depth of data. Simple
Dump only includes the number of instances per class. The Extended Dump
includes object references. Data collection is either triggered
manually, scheduled (e.g.: every 30 minutes during a load-test) or
triggered (in case of a certain event - e.g.: heavy memory usage). The
collected "raw" data is sent from the JVM to the central collector that
is analyzing the raw data in terms of, e.g.: what are the most used
classes, walking the referrer tree, ...
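
As an aside, this "Simple Dump" level of detail maps fairly directly
onto JVMTI 1.1's heap iteration, where only the loaded classes get tags
rather than every individual object. A rough sketch (illustrative
names, error handling omitted):

#include <stdlib.h>
#include <string.h>
#include <jvmti.h>

/* Heap callback: bump the per-class counter selected by the class tag.
   Returning 0 continues the iteration. */
static jint JNICALL
count_cb(jlong class_tag, jlong size, jlong *tag_ptr, jint length,
         void *user_data)
{
    jlong *counts = (jlong *)user_data;
    if (class_tag > 0)
        counts[class_tag - 1]++;
    return 0;
}

static void simple_dump(jvmtiEnv *jvmti)
{
    jint count;
    jclass *classes;
    jvmtiHeapCallbacks cb;
    jlong *counts;
    jint i;

    /* Tag the i-th loaded class with i+1 (needs can_tag_objects). */
    (*jvmti)->GetLoadedClasses(jvmti, &count, &classes);
    counts = (jlong *)calloc((size_t)count, sizeof(jlong));
    for (i = 0; i < count; i++)
        (*jvmti)->SetTag(jvmti, classes[i], (jlong)(i + 1));

    memset(&cb, 0, sizeof(cb));
    cb.heap_iteration_callback = count_cb;
    (*jvmti)->IterateThroughHeap(jvmti, 0, NULL, &cb, counts);

    /* counts[i] now holds the live-instance count of classes[i];
       resolve names with GetClassSignature and ship the histogram to
       the collector. */
    (*jvmti)->Deallocate(jvmti, (unsigned char *)classes);
    free(counts);
}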

Our challenges/requests are
* that this is very slow for very large heap sizes, as we see it with
our clients
* we don't get this information in case of a severe runtime problem,
e.g.: OutOfMemory
* additionally we want to get more information about the individual
objects on the heap, e.g.: which generation they live in

Some of this exceeds the scope of this JSR - please advise on what we
should keep in here and what should be discussed elsewhere

Thanks


