Posted to kato-spec@incubator.apache.org by Steve Poole <sp...@googlemail.com> on 2010/01/13 21:03:49 UTC

JSR 326 and Apache Kato - A "state of the nation" examination

Greetings all,

Discussions this year have got off to a good start and we're also really
close to providing that first driver which contains the changes we've
discussed over time.  With that in mind I think it's worth examining the
past, present and future of this work.

*A brief recap *

We've been working on this JSR for some time - since 5 August 2008
<http://jcp.org/en/jsr/detail?id=326>, to be precise.

At the start of the project we expected to be able to develop what I
called the "legs" under the code contributed by IBM.  These "legs" were
intended to map the API to the dumps that were available from a Sun JVM -
including being able to read Hotspot data from a core file.  We also
expected to drive quickly towards discussing the form of the future - how to
deal with titanic dumps and how not to have dumps at all.

Most of this didn't happen.  We did write an HPROF reader but we didn't
manage to develop a core file reader for the Hotspot JVM.  In that regard
we also examined the Serviceability Agent API
<http://www.usenix.org/events/jvm01/full_papers/russell/russell_html/index.html>,
but there were too many restrictions on use and operating environment.  It
turned out that it was not feasible for Apache Kato to develop a corefile
reader for Hotspot due to licensing issues and, more importantly, lack of
skills in Hotspot.

At that point we were somewhat stuck. (I did discuss this problem privately
with various JVM vendors but we did not reach a resolution.)

All was not lost - we wrote a prototype (in Python!) of a new dump that
used JVMTI.  The dump was the first to contain local variables. We hooked it
up to the Java debugger through our JDI connector to show that you could
use a familiar interface to analyse your problem.  Java DBX
<http://en.wikipedia.org/wiki/Dbx_%28debugger%29> for corefiles had
arrived.
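
For illustration, driving such a connector through the standard JDI
interfaces might look roughly like this (the connector name filter and the
"dump" argument key are assumptions, not the actual Kato values):

import com.sun.jdi.Bootstrap;
import com.sun.jdi.VirtualMachine;
import com.sun.jdi.connect.AttachingConnector;
import com.sun.jdi.connect.Connector;
import java.util.Map;

public class AttachToDumpSketch {
    public static void main(String[] args) throws Exception {
        // Look up a dump-reading connector registered with the JDI framework.
        // The name filter and the "dump" argument key below are hypothetical.
        AttachingConnector connector = Bootstrap.virtualMachineManager()
                .attachingConnectors().stream()
                .filter(c -> c.name().toLowerCase().contains("kato"))
                .findFirst()
                .orElseThrow(() -> new IllegalStateException("dump connector not installed"));

        Map<String, Connector.Argument> arguments = connector.defaultArguments();
        arguments.get("dump").setValue(args[0]);   // path to the dump file

        // From here on it is plain JDI: the "debuggee" is really a dump.
        VirtualMachine vm = connector.attach(arguments);
        vm.allThreads().forEach(t -> System.out.println(t.name()));
    }
}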

We also tacked on a JXPath <http://commons.apache.org/jxpath/> based layer
(now in the KatoView tool) that allowed you to query the API without writing
reams of code.
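
As a rough illustration of the idea (the bean classes below are simple
stand-ins, not the real Kato types), a JXPath expression replaces a
hand-written iterate-and-filter loop:

import org.apache.commons.jxpath.JXPathContext;
import java.util.Arrays;
import java.util.List;

public class QuerySketch {
    // Simple stand-ins for the objects a snapshot API might expose;
    // the real Kato types have different names and shapes.
    public static class JavaThread {
        private final String name;
        private final int frameCount;
        public JavaThread(String name, int frameCount) { this.name = name; this.frameCount = frameCount; }
        public String getName() { return name; }
        public int getFrameCount() { return frameCount; }
    }

    public static class SnapshotRoot {
        public List<JavaThread> getThreads() {
            return Arrays.asList(new JavaThread("main", 12), new JavaThread("worker-1", 3));
        }
    }

    public static void main(String[] args) {
        JXPathContext ctx = JXPathContext.newContext(new SnapshotRoot());
        // One XPath-style expression instead of a hand-written filter loop.
        System.out.println(ctx.getValue("threads[name='main']/frameCount"));  // prints 12
    }
}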

We took  JSR 326 to San Francisco and showed people what we had at JavaOne
BOF4870 <http://cwiki.apache.org/confluence/display/KATO/BOF4870>   and I
got to meet a few of you face to face for the first time.

After JavaOne we rewrote the Python prototype in C and started to bring
the first Early Draft Review
<http://cwiki.apache.org/KATO/jsr326specification.data/jsr326-edr-1-2009-08-21.pdf>
together, although it took a long time to get the EDR on to the JCP site.
That was mostly down to my learning a new process and dealing with a
licensing concern, where I learned about the concept of "collective
copyright" <http://en.wikipedia.org/wiki/Copyright_collective>.

After the EDR was out we started work on the first code release from Apache
Kato (all new stuff to learn). We still hadn't resolved the mismatch between
what data the API said it could offer and our inability to provide said data
(i.e. no Hotspot support).  The answer was to factor out the relationship
between Java entities and native-code entities and make it optional.  Now
those dumps that know nothing about processes or address spaces or even
pointers are not required to fake them.
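
A minimal sketch of that optionality pattern, with every type and method
name invented for illustration: the reader asks for native-level detail and
copes with its absence, instead of the dump provider faking processes and
pointers:

public class OptionalNativeDataSketch {
    // Illustrative stand-ins only; the real API uses different names.
    static class DataUnavailableException extends Exception {}
    interface ProcessView { String getCommandLine() throws DataUnavailableException; }
    interface RuntimeView { ProcessView getOwningProcess() throws DataUnavailableException; }

    static void describe(RuntimeView runtime) {
        try {
            // A core-file backed implementation can answer this; an HPROF-style
            // dump that knows nothing about processes reports the data as
            // unavailable instead of faking addresses and pointers.
            System.out.println("Command line: " + runtime.getOwningProcess().getCommandLine());
        } catch (DataUnavailableException e) {
            System.out.println("No native/process data in this dump type");
        }
    }

    public static void main(String[] args) {
        // A dump with no native context at all.
        describe(() -> { throw new DataUnavailableException(); });
    }
}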

Finally, and quite recently, we added to the API the first attempt at a
standard dump trigger mechanism, and we added an additional dump type that
will help us as we develop the snapshot and optionality designs.

*Today *

Let's look at the present.  It's January 2010 and there is a foot of snow
outside my window, which is unusual for where I live.  What else is unusual
is that we have an Expert Group which has been so very quiet.  It's time to
examine our situation and discuss what else we need to do to make this
project a success.

 At the highest level we need at least 4 things:

   1.  A design that will address our requirements.
   2.  A matching implementation that supports a high percentage of this
   design
   3.  Adoption by JVM vendors
   4.  A user community



*Design*

Do you know what our requirements are?  The original proposal for Kato is
here <http://wiki.apache.org/incubator/KatoProposal> and the JSR is here
<http://jcp.org/en/jsr/detail?id=326>.

Are these documents saying what you expected and want?  The Early Draft
Review
<http://cwiki.apache.org/KATO/jsr326specification.data/jsr326-edr-1-2009-08-21.pdf>
outlines more.


*Implementation *

We're going to provide a binary driver as soon as we possibly can for you
all to use - but you can check out the code and try building and using it
now.  We still have a technical hurdle: we are hampered by our inability
to make JVM modifications where necessary.  How should we resolve this?
Remember that we have to be able to provide a Reference Implementation to
match the specification.  We can legitimately justify having some edge
conditions that are not implemented, but it's no use to anyone if key parts
of the API are not implemented.  Having said that, it is reasonable to
consider a middle ground where we specify a new JVM interface that we
require to be provided by JVM vendors.  It depends on technical
circumstances, but that approach has more flexibility in implementation -
it's likely going to be easier to ask a JVM vendor to provide data to a new
standardized API.  My current thinking is that for now we minimise this
situation as much as possible and live with slower implementations, at
least until we've resolved the outstanding questions of adoption by JVM
vendors.

I think we've come to realize that the desire to be able to extract
information about a Hotspot JVM from a corefile is not going to happen, and
is actually not necessary.  We've said right from the beginning that dump
sizes are growing and we need to consider having smaller dumps.  Rather
than finding a way to read Hotspot data from a corefile we should move
directly to defining and implementing what a Snapshot Dump mechanism really
is.  My expectation is that we will only need JVM support for a
yet-to-be-designed low-level API which we can use to extract information
from a running JVM. I really don't know what form that API would take - it
might be something like JVMTI, it might be a set of native methods, or it
may just be new Java classes that the JVM vendor replaces.
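
As one hypothetical illustration of that last option - new Java classes
that the vendor replaces - the spec could define a small SPI discovered
through java.util.ServiceLoader. A sketch only, with invented names:

import java.util.ServiceLoader;

// A sketch only: the interface name and method are hypothetical, not part of JSR 326.
public class SnapshotProviderSketch {

    /** The spec would define a small SPI like this; each JVM vendor ships its own implementation. */
    public interface SnapshotWriter {
        void writeSnapshot(String path) throws java.io.IOException;
    }

    /** Standard ServiceLoader discovery: no Kato code needs to know which vendor it is running on. */
    public static SnapshotWriter loadVendorWriter() {
        for (SnapshotWriter writer : ServiceLoader.load(SnapshotWriter.class)) {
            return writer; // first vendor-supplied implementation found on the classpath
        }
        throw new IllegalStateException("No JVM-specific snapshot support installed");
    }
}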

What drives this discussion, and hence defines what we need from the JVM
vendors, comes from having the Snapshot concept clear in everyone's head.
Since this is new to everyone I want to provide an implementation that
embodies the concepts as soon as possible, so we can argue this through
from a practical, hands-on approach.


*Adoption by JVM Vendors*

Adoption by JVM vendors - and by that we mainly mean Sun and Oracle, since
IBM already has a similar implementation - is predicated on usefulness and
on the need to have JVM-specific code.  If there is no requirement for
JVM-specific changes then adoption is not really an issue.  If we have to
have JVM changes (and we will in the end) then we need to have either
Sun/Oracle or another JVM vendor develop these JVM changes. Otherwise we
have to find a third party who is willing to develop a GPL-licensed
extension to OpenJDK to support our requirements.

We're going to have to wait a few weeks until the Oracle/Sun acquisition is
completed before we can expect to get a sensible answer to the first
question.  It's also possible that we could go straight to the OpenJDK
folk and see if they want to play.  In either case, though, we would need
to have a good idea of the type of JVM changes and/or new data access we
need.

*User Community *

We need to agree who our users actually are.  I know that there are various
views, but let's get it clear.  My view is that our users are the tools
vendors and the more expert application programmers out there.  This API may
make life easier for the JVM vendor, but only in passing.  The major
objective is to help programmers solve their own problems, not to help JVM
vendors fix bugs in the JVM.  Do you agree?

What else makes a user community?  Having something to use is high up the
list.  We need to get what we have out the door and being used.  It's not as
simple as that, of course - we need documentation and usage examples, and
most importantly we need to be able to offer a compelling reason for using
our API.  Right now we're light on all of these.
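
As an illustration of the kind of usage example we should be publishing,
something of roughly this shape and size (the type names are hypothetical
stand-ins, not the real API):

import java.io.File;
import java.io.IOException;
import java.util.Arrays;

// All names below are hypothetical stand-ins, not the published Kato API; the point
// is only to show the shape and size a first "hello dump" usage example should have.
public class HelloDumpSketch {
    interface JavaThreadView { String getName(); int getFrameCount(); }
    interface SnapshotView extends AutoCloseable {
        Iterable<JavaThreadView> getThreads();
        void close();
    }
    interface SnapshotReader { SnapshotView open(File dump) throws IOException; }

    static void report(SnapshotReader reader, File dump) throws IOException {
        // Open a snapshot and list its Java threads -- the canonical first example.
        try (SnapshotView snapshot = reader.open(dump)) {
            for (JavaThreadView thread : snapshot.getThreads()) {
                System.out.println(thread.getName() + " : " + thread.getFrameCount() + " frames");
            }
        }
    }

    public static void main(String[] args) throws IOException {
        // Tiny in-memory fake so the sketch runs without a real dump file.
        SnapshotReader fake = dump -> new SnapshotView() {
            public Iterable<JavaThreadView> getThreads() {
                return Arrays.asList(new JavaThreadView() {
                    public String getName() { return "main"; }
                    public int getFrameCount() { return 7; }
                });
            }
            public void close() {}
        };
        report(fake, new File("example.snapshot"));
    }
}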



*What the future holds*

I can't say how much I really appreciate all the time and effort that people
have expended so far on this project. It is a shame that we've not had the
larger buy-in where we expected it but that may change.  I intend to keep
asking.

Right now, though, I need to ask more of you: you as a subscriber to this
mailing list, you as a member of the Expert Group, you as a contributor or
committer to Apache Kato, and you as a potential user of JSR 326.  I need
you to tell me if we are on the right track: are we going in the right
direction or not?  If we are doing what you expected, say so as well: it's
good to get confirmation.  If we're not addressing the issues you consider
need to be talked about - say so.  If you can help with documentation,
use-cases, evangelism, coding, testing, or anything - just say so.


In my view the future of this project  ranges from being *just* the place
where a new modern replacement for HPROF is developed all the way through to
delivering on those objectives we set ourselves in 2008.  I need your help
and active involvement right now  in determining our actual future.

Thanks

-- 
Steve

Re: JSR 326 and Apache Kato - A "state of the nation" examination

Posted by Stuart Monteith <st...@stoo.me.uk>.
My 2p too...


David Griffiths wrote:
> Hi Steve, this is my 2p (and definitely not IBM's 2p):
>
> I think you're heading off in the wrong direction. In one of your
> messages you said that developers didn't initially "get" the concept
> of using dumps for debugging their application problems. I'm with
> those developers. I think application developers already have a wealth
> of tools to assist with debugging and profiling their apps and they
> should continue to use those.
>
>    
Part of the value of Kato is that it should allow you to examine problem
state without attaching a debugger to, or profiling, your application (or
the server on which it is running).
> The gap in the market that I think Kato should be addressing is
> analyzing post-mortem dumps in a production environment. First-failure
> data capture (FFDC) dumps where you don't always have the opportunity
> to set all the options to give you exactly the type of dump or trace
> you'd like. And I think you should be targeting the "image" part of
> the API as much as the Java part. Give us access to as much info as
> possible to debug a problem.
>
>    
I agree with this. I always envisaged FFDC as a major use for DTFJ and Kato.
I'm more sceptical about the Image part of the API, as any given general
pure Java problem isn't going to manifest itself in the Image API. Having
said that, I do see its use, but I don't believe it is the first priority,
and it is the furthest from a Java application developer's view.
> This is the background of Kato. It is descended from DTFJ which is an
> internal IBM API for analyzing dumps. It is not limited to being used
> just by JVM service people to solve bugs in the JVM. Dan Julin can
> vouch for the fact that DTFJ is used widely by WebSphere to debug
> WebSphere application level issues. In fact Dan is one of the main
> customers for DTFJ.
>
>    
The DTFJ API is not an internal API. It is documented in the Diagnostics
Guide and is shipped and supported in IBM's JVMs.
> I think you should forget about trying to define your own snapshot
> dump format and concentrate instead on providing access to core dumps
> which already contain all the information and more that we need. The
> support for analyzing core files is poor. It's crazy that so many
> people are still using gdb/dbx/etc rather than pure Java solutions.
>    
I think we should be working on core file readers, as they can solve the
majority of use cases. But there are downsides. For one, the Sun HotSpot
JVM is at best GPL-licensed, which is incompatible with the Apache License
and puts us at risk; and secondly, without active involvement from Sun, a
reader is unlikely to be maintainable. The DRLVM would be an interesting
direction to go, but there are doubts about its adoption and its future.
> I think Kato should be mainly an API with maybe a reference
> implementation for some JVM on Linux. I don't understand why HotSpot
> is such an issue. It's up to either Sun or some third-party to provide
> a binary implementation of the Kato API for HotSpot core files. This
> should not be difficult to achieve.
>
>    
The issue is that we are developing a JSR as well as an Apache
Incubator project. For this we need a specification, a TCK, and most
importantly, a reference implementation. The reason for the interest in
the Hotspot JVM is that it is the JVM with the most market penetration.

> Easy problems already have easy solutions and plenty of them. It's the
> big complex production environments I think we should be targeting.
> The demand to analyze core dumps is there, what's missing is a
> JVM-neutral solution.
>
>    
This project's goals have evolved during its lifetime, through both
changing constraints and the feedback received from various parties. As a
result this isn't simply open-sourced DTFJ. We do want there to be
interest in the project, so solving the problems in the "big complex
production environments" may be less of a priority compared to the
everyday simple problems developers may find on their desktop. It seems we
might have to solve the latter before there'll be enough interest in the
former.

Regards,
     Stuart


> Cheers,
>
> Dave
>

-- 
Stuart Monteith
http://blog.stoo.me.uk/


Re: JSR 326 and Apache Kato - A "state of the nation" examination

Posted by David Griffiths <da...@gmail.com>.
Hi Alois, when you talk about memory do you mean Java heap or do you
mean all memory including native?

Cheers,

Dave

Re: JSR 326 and Apache Kato - A "state of the nation" examination

Posted by Steve Poole <sp...@googlemail.com>.
Hi Alois - I waited a few days to see what other people might say.

Thanks for your input. I've responded inline below.

On Wed, Jan 20, 2010 at 2:18 PM, Alois Reitbauer <
alois.reitbauer@dynatrace.com> wrote:

> Everybody,
>
> here is my perspective - the perspective of a performance tool vendor.
> The motivation for us to be part of the JSR  is to improve the
> possibilities of working with memory dumps, specifically for larger
> 64bit JVMs.  We see that the JVMTI API has reached its limits regarding
> performance and usability. It was a good API at the time created,
> however times and applications change.
>

Are you against improving JVMTI if we determined that it was a sensible
option?  I ask this because  there is a scenario where that would be the
case.  The scenario is tied up with thoughts I have about how we collect the
data required to go into a snapshot.  I won't write them here but I will
start a new thread on the snapshot API design.


> The major use case we have to support is memory analysis of large JVMs
> as well as complex production environments. By complex I am mainly
> talking about their size like environments with up to a couple of
> hundred JVMS. The usage of memory profiling goes beyond the resolution
> of an out-of-memory error. We see more and more companies that want to
> optimize their memory footprints. In addition to creating a single dump
> after a JVM has crashed, we see analysis cases where you want to compare
> a number of dumps over time, which requires efficient dumping
> mechanisms.
>
How important is tracking individual objects across dumps over time?


> We are not using vendor-specific dump formats at all, but have our own
> implementation consisting of an agent and server part. Working with dump
> files is in many use cases not practical. Specifically in production the
> logistics for working with files introduces unnecessary complexity -
> especially when fast problem analysis is required.  Nearly every tool
> vendor uses his own dump format. Maybe not initially, but after
> preprocessing. In order to work efficiently with large dumps you need to
> perform operations like index etc.
>
> However this does not mean that I propose an API at the abstraction
> level of JVMTI. I really like the idea of having a kind of query
> language, getting binary data back and then processing it in a
> standardized way. Every JVM vendor should be free in how he implements
> the data collection process. At the same time I want to be able to
> either write this information to disk or stream it somewhere for further
> processing. So we also need a streaming-type interface which allows us to
> process the information as a stream rather than a whole file.


"Streaming" that is a scary concept.  Though I suppose if the design of the
API has a visitor type pattern then it would not be too scary.
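
To make "visitor-type pattern" concrete, here is a minimal sketch (every
name invented for illustration only): the consumer registers callbacks and
the producer pushes records through them, so nothing forces the whole dump
to be materialised as a file first.

// Illustrative only -- not a proposed Kato interface, just the visitor/streaming shape.
public class StreamingVisitorSketch {
    /** Callbacks invoked as records arrive; the producer never builds a whole file. */
    public interface HeapVisitor {
        void visitClass(String className, long instanceCount, long totalBytes);
        void visitEnd();
    }

    /** Whatever produces the data (agent, JVM hook, dump reader) pushes records through the visitor. */
    public interface HeapSource {
        void accept(HeapVisitor visitor);
    }

    public static void main(String[] args) {
        // A fake source so the sketch runs; a real one would stream from a JVM or a dump.
        HeapSource source = visitor -> {
            visitor.visitClass("java.lang.String", 120_000, 5_600_000);
            visitor.visitClass("byte[]", 80_000, 42_000_000);
            visitor.visitEnd();
        };

        source.accept(new HeapVisitor() {
            public void visitClass(String className, long count, long bytes) {
                System.out.println(className + ": " + count + " instances, " + bytes + " bytes");
            }
            public void visitEnd() {
                System.out.println("-- end of stream --");
            }
        });
    }
}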


> Rethinking
> these requirements we need a protocol/API to communicate with the JVM to
> issue our queries and an API to process the binary stream we get back -
> a very similar approach to how JDBC works.
>
Agree - that's the sort of pattern I see too.


> The protocol part is important because I have to specify in advance
> which information I want. I do not necessarily want to get back all
> information. Monitoring the number of objects of a specific class in
> most cases requires creating a whole heap dump. I know that there are
> already other ways to do that, however none of them works for a
> create-dump-then-analyze approach.
>
> From my perspective we need to support the diagnosis of
> application-level problems as our primary goal. The end-user of dumps
> will always be developers. Who else is able to make sense of the data.
> However the environments  range from a small development JVM to a large
> 32 GB production JVM. Tools for the latter are very rare.
>
> Don't get me wrong. The work done in the KATO project is great. It is a
> great showcase and reference on the similarities and differences in
> vendor-specific dump formats.  I am wondering who you see as the users
> of KATO? We as a tool vendor will still require our own dump solution,
> for the technical reasons stated above.
>
We have to look at this effort as a multi-step approach.  Our final goal
has to be to improve the diagnostic capabilities for end-user customers
(those who run the Java applications).  That means improved tools, and that
means tools vendors writing said tools.  Tools vendors do not want to write
tools for one single JVM, so we have to provide them with a standard
interface.


> I also see joint efforts with JVM vendors as mandatory as otherwise we
> are not able to make a significant technological improvement here.
>

Agree entirely.  I'm going to hold off until the Sun/Oracle acquisition has
completed and then I will ask the Sun and Oracle EG reps to present their
position on this JSR.

> Starting with the OpenJDK project is a good first step. However at the
> end all vendors have to provide implementations. The new features of
> JVMTI for Java6 also show that there is activity and willingness to
> contribute.
>
> I am happy that there is now increased momentum again and I am looking
> forward to the future of JSR 326.  First we have to agree what this
> future should look like.
>
> Best
>
> Alois
>
>


-- 
Steve

RE: JSR 326 and Apache Kato - A "state of the nation" examination

Posted by Alois Reitbauer <al...@dynatrace.com>.
Steve,

to reply to your questions:

I am not against extending JVMTI. However I propose to define our "golden" solution first and check what can and should be done pragmatically. Additionally, don't get scared by streaming :-) - a visitor pattern will serve the same purpose. Whatever the actual implementation looks like, I think everybody got my point here.

Best

Alois


Re: JSR 326 and Apache Kato - A "state of the nation" examination

Posted by David Griffiths <da...@gmail.com>.
Hi Alois, could you please give some examples of the kind of
information you would like to see in a heap memory snapshot? You
earlier mentioned monitoring the number of instances of a particular
class. Do you think that obtaining that kind of info via JVMTI (e.g.
IterateOverInstancesOfClass) is insufficient?

Cheers,

Dave


RE: JSR 326 and Apache Kato - A "state of the nation" examination

Posted by Alois Reitbauer <al...@dynatrace.com>.
David,

good point. What we mostly see are heap memory issues. However, from time to time there are native memory problems as well.  So I would see them as a second-priority item from the point of view of an application developer. In the native area the most prominent problems I am aware of are resource leaks like sockets, (file) handles, etc.  Another memory-related issue is what I would call "perm gen" diagnosis - while being well aware that this is a Sun term.  Understanding which classes got loaded and how big they are is an issue I also see more frequently now.

Cheers,

Alois


RE: JSR 326 and Apache Kato - A "state of the nation" examination

Posted by Alois Reitbauer <al...@dynatrace.com>.
Everybody,

here is my perspective - the perspective of a performance tool vendor.
The motivation for us to be part of the JSR is to improve the
possibilities of working with memory dumps, specifically for larger
64-bit JVMs.  We see that the JVMTI API has reached its limits regarding
performance and usability. It was a good API at the time it was created;
however, times and applications change.

The major use case we have to support is memory analysis of large JVMs
as well as complex production environments. By complex I am mainly
talking about their size - environments with up to a couple of
hundred JVMs. The usage of memory profiling goes beyond the resolution
of an out-of-memory error. We see more and more companies that want to
optimize their memory footprints. In addition to creating a single dump
after a JVM has crashed, we see analysis cases where you want to compare
a number of dumps over time, which requires efficient dumping
mechanisms.

We are not using vendor-specific dump formats at all, but have our own
implementation consisting of an agent and a server part. Working with dump
files is not practical in many use cases. Specifically, in production the
logistics of working with files introduce unnecessary complexity -
especially when fast problem analysis is required.  Nearly every tool
vendor uses his own dump format - maybe not initially, but after
preprocessing. In order to work efficiently with large dumps you need to
perform operations like indexing, etc.

However this does not mean that I propose an API at the abstraction
level of JVMTI. I really like the idea of having a kind of query
language, getting binary data back and then processing it in a
standardized way. Every JVM vendor should be free in how he implements
the data collection process. At the same time I want to be able to
either write this information to disk or stream it somewhere for further
processing. So we also need a streaming-type interface which allows us to
process the information as a stream rather than a whole file. Rethinking
these requirements, we need a protocol/API to communicate with the JVM to
issue our queries and an API to process the binary stream we get back -
a very similar approach to how JDBC works.

The protocol part is important because I have to specify in advance
which information I want. I do not necessarily want to get back all the
information. Monitoring the number of objects of a specific class in
most cases requires creating a whole heap dump. I know that there are
already other ways to do that, however none of them works for a
create-dump-then-analyze approach.
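
To illustrate the JDBC-like shape described above (all names invented for
this sketch): the caller states up front which data it wants - here,
instance statistics for a single class - and consumes the result as a
stream, without ever requesting a full heap dump.

import java.util.stream.Stream;

// Illustrative sketch only; none of these types exist in Kato or in any JVM today.
public class DumpQuerySketch {

    /** A query built up front, roughly analogous to preparing a JDBC statement. */
    public static final class SnapshotQuery {
        public final String classNameFilter;
        public final boolean includeReferences;
        public SnapshotQuery(String classNameFilter, boolean includeReferences) {
            this.classNameFilter = classNameFilter;
            this.includeReferences = includeReferences;
        }
    }

    /** One record in the result stream. */
    public static final class ClassStats {
        public final String className;
        public final long instanceCount;
        public final long totalBytes;
        public ClassStats(String className, long instanceCount, long totalBytes) {
            this.className = className;
            this.instanceCount = instanceCount;
            this.totalBytes = totalBytes;
        }
    }

    /** The JVM- or agent-side service that executes queries and streams results back. */
    public interface SnapshotService {
        Stream<ClassStats> execute(SnapshotQuery query);
    }

    public static void main(String[] args) {
        // Fake service so the sketch runs; a real one would talk to a JVM or an agent.
        SnapshotService service = query ->
                Stream.of(new ClassStats(query.classNameFilter, 42_000, 1_300_000));

        // Ask only for what we need and consume the answer as a stream,
        // never as a whole heap dump on disk.
        SnapshotQuery query = new SnapshotQuery("com.example.Order", false);
        service.execute(query)
               .forEach(s -> System.out.println(s.className + ": " + s.instanceCount + " instances"));
    }
}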

From my perspective we need to support the diagnosis of
application-level problems as our primary goal. The end users of dumps
will always be developers - who else is able to make sense of the data?
However, the environments range from a small development JVM to a large
32 GB production JVM. Tools for the latter are very rare.

Don't get me wrong. The work done in the KATO project is great. It is a
great showcase and reference on the similarities and differences in
vendor-specific dump formats.  I am wondering who you see as the users
of KATO? We as a tool vendor will still require our own dump solution,
for the technical reasons stated above.

I also see joint efforts with JVM vendors as mandatory, as otherwise we
are not able to make a significant technological improvement here.
Starting with the OpenJDK project is a good first step. However, in the
end all vendors have to provide implementations. The new features of
JVMTI for Java 6 also show that there is activity and willingness to
contribute.

I am happy that there is now increased momentum again, and I am looking
forward to the future of JSR 326.  First we have to agree on what this
future should look like.

Best

Alois


Re: JSR 326 and Apache Kato - A "state of the nation" examination

Posted by David Griffiths <da...@gmail.com>.
Hi Steve, this is my 2p (and definitely not IBM's 2p):

I think you're heading off in the wrong direction. In one of your
messages you said that developers didn't initially "get" the concept
of using dumps for debugging their application problems. I'm with
those developers. I think application developers already have a wealth
of tools to assist with debugging and profiling their apps and they
should continue to use those.

The gap in the market that I think Kato should be addressing is
analyzing post-mortem dumps in a production environment. First-failure
data capture (FFDC) dumps where you don't always have the opportunity
to set all the options to give you exactly the type of dump or trace
you'd like. And I think you should be targeting the "image" part of
the API as much as the Java part. Give us access to as much info as
possible to debug a problem.

This is the background of Kato. It is descended from DTFJ which is an
internal IBM API for analyzing dumps. It is not limited to being used
just by JVM service people to solve bugs in the JVM. Dan Julin can
vouch for the fact that DTFJ is used widely by WebSphere to debug
WebSphere application level issues. In fact Dan is one of the main
customers for DTFJ.

I think you should forget about trying to define your own snapshot
dump format and concentrate instead on providing access to core dumps
which already contain all the information and more that we need. The
support for analyzing core files is poor. It's crazy that so many
people are still using gdb/dbx/etc rather than pure Java solutions.

I think Kato should be mainly an API with maybe a reference
implementation for some JVM on Linux. I don't understand why HotSpot
is such an issue. It's up to either Sun or some third-party to provide
a binary implementation of the Kato API for HotSpot core files. This
should not be difficult to achieve.

Easy problems already have easy solutions and plenty of them. It's the
big complex production environments I think we should be targeting.
The demand to analyze core dumps is there, what's missing is a
JVM-neutral solution.

Cheers,

Dave
