Posted to kato-spec@incubator.apache.org by Andrew Johnson <an...@uk.ibm.com> on 2009/04/21 14:20:52 UTC

My experience with DTFJ

I've used DTFJ to write the DTFJ Adapter for the Eclipse Memory Analyzer 
Tool.
http://www.eclipse.org/mat

Here are some comments I have about DTFJ API which I think also apply to 
Kato.

Predictable order from iterators
- I want the same results from the same dump, whether Kato/my program is 
run on the same or a different VM version. This is needed for ease of 
debugging etc.
- No iterators anywhere backed directly by plain HashMaps, as their 
iteration order is not guaranteed
- This should be documented as a requirement of the API
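
As a minimal sketch of the ordering point: a LinkedHashMap iterates in insertion order, so the order seen by the caller depends only on the order in which entries were read from the dump. OrderedClassRegistry and its methods are hypothetical names for illustration, not part of DTFJ or Kato.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical registry, not a DTFJ or Kato type.
final class OrderedClassRegistry {
    // LinkedHashMap keeps insertion order; a plain HashMap makes no such promise.
    private final Map<Long, String> classesByAddress = new LinkedHashMap<>();

    void add(long address, String className) {
        classesByAddress.put(address, className);
    }

    // Iterates over a copy, in the order the entries were added.
    Iterator<String> getClassNames() {
        return new ArrayList<>(classesByAddress.values()).iterator();
    }

    // Adds three classes in a fixed order and reads them back.
    static List<String> demo() {
        OrderedClassRegistry registry = new OrderedClassRegistry();
        registry.add(0x3000L, "java/lang/String");
        registry.add(0x1000L, "java/lang/Object");
        registry.add(0x2000L, "java/lang/Class");
        List<String> names = new ArrayList<>();
        for (Iterator<String> it = registry.getClassNames(); it.hasNext();) {
            names.add(it.next());
        }
        return names;
    }
}
```

Sorting by address (for example with a TreeMap) would be another way to get a repeatable order; which one is appropriate is a design choice for the API.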

Generics for iterators
- Specifies the return type from next()
- Would reduce coding errors
- CorruptData as a return type might cause a problem, but it is an 
interface
- need CorruptData objects of different types for errors. e.g. 
MyCorruptJavaObject implements JavaObject, CorruptData
- <? extends JavaClass> etc. in case the implementation is using a 
subclass collection
- getAddress() - clash in method names between CorruptData and JavaObject 
etc. - but does this matter if the object is corrupt? Should a CorruptData 
object which also implements another interface throw CorruptDataException 
whenever possible from the additional methods?
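
The generics idea above could look roughly like the following sketch. All of the types here are simplified, hypothetical stand-ins with the same names as the concepts discussed; they are not the real DTFJ or Kato interfaces, and getSize() and describe() are assumed methods chosen only to keep the example small.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

// Simplified stand-ins for illustration only.
interface CorruptData {
    String describe();
}

class CorruptDataException extends Exception {
    private final CorruptData data;
    CorruptDataException(CorruptData data) {
        super(data.describe());
        this.data = data;
    }
    CorruptData getCorruptData() { return data; }
}

interface JavaObject {
    long getSize() throws CorruptDataException;
}

final class GoodJavaObject implements JavaObject {
    private final long size;
    GoodJavaObject(long size) { this.size = size; }
    @Override public long getSize() { return size; }
}

// A corrupt entry that still fits in an iterator of JavaObject:
// it implements both interfaces and throws from the JavaObject methods.
final class MyCorruptJavaObject implements JavaObject, CorruptData {
    private final String reason;
    MyCorruptJavaObject(String reason) { this.reason = reason; }
    @Override public String describe() { return reason; }
    @Override public long getSize() throws CorruptDataException {
        throw new CorruptDataException(this);
    }
}

final class GenericHeapSketch {
    // The wildcard lets an implementation hand back a collection of a subclass.
    static Iterator<? extends JavaObject> getObjects() {
        List<JavaObject> objects = new ArrayList<>();
        objects.add(new GoodJavaObject(16));
        objects.add(new MyCorruptJavaObject("bad header"));
        objects.add(new GoodJavaObject(24));
        return objects.iterator();
    }

    // No cast is needed on next(); corrupt entries are skipped.
    static long totalSizeOfReadableObjects() {
        long total = 0;
        for (Iterator<? extends JavaObject> it = getObjects(); it.hasNext();) {
            JavaObject obj = it.next();
            try {
                total += obj.getSize();
            } catch (CorruptDataException e) {
                // Skip this entry; the cause is available via e.getCorruptData().
            }
        }
        return total;
    }

    // Corrupt entries can still be detected up front with instanceof.
    static int countCorrupt() {
        int count = 0;
        for (Iterator<? extends JavaObject> it = getObjects(); it.hasNext();) {
            if (it.next() instanceof CorruptData) {
                count++;
            }
        }
        return count;
    }
}
```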

In general iterator methods don't themselves throw exceptions - except in
ImageProcess:
    Iterator getLibraries() throws DataUnavailable, CorruptDataException;
Why?

Equality between objects from different dumps 
If you get an object twice from a dump and it is meant to represent the 
same thing then the hash code must match and they must compare equal. It's 
okay for the implementation to dynamically build objects however.
If you open the same dump twice should objects compare equal?
If you open two dumps at different times from the same run should 
identical objects compare equal (JavaRuntime, JavaClass)? How do we cope 
with JavaObjects which get moved by GC? Even a persistentHashCode and 
address match doesn't guarantee that it's the same object. The field values 
might have changed by then anyway.
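
One possible answer for the single-dump case, sketched under assumptions: equality is based on the dump identity plus the address, so two handles built dynamically for the same thing compare equal, while handles from different dumps never do. ObjectHandle and its dumpId field are hypothetical and not part of DTFJ or Kato; the sketch does not settle the cross-dump questions above.

```java
import java.util.Objects;

// Hypothetical handle class, not a DTFJ or Kato type.
final class ObjectHandle {
    private final String dumpId;  // assumed identifier of the opened dump
    private final long address;   // address of the object within that dump

    ObjectHandle(String dumpId, long address) {
        this.dumpId = Objects.requireNonNull(dumpId);
        this.address = address;
    }

    @Override
    public boolean equals(Object other) {
        if (this == other) {
            return true;
        }
        if (!(other instanceof ObjectHandle)) {
            return false;
        }
        ObjectHandle that = (ObjectHandle) other;
        return address == that.address && dumpId.equals(that.dumpId);
    }

    // Must agree with equals(): equal handles give equal hash codes.
    @Override
    public int hashCode() {
        return Objects.hash(dumpId, address);
    }
}
```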

ImageProcess.getID() - string or int
for process - should be convertible to int or long if possible using 
Integer.decode() or Long.decode()
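
For illustration, a caller could convert such an ID as below. Long.decode() is the real JDK method and accepts decimal, hexadecimal (0x...) and octal (leading 0) strings; ProcessIds and toLong() are hypothetical names.

```java
// Hypothetical helper; only Long.decode() is a real JDK call here.
final class ProcessIds {
    /**
     * Returns the numeric form of a process ID string,
     * or null if the string is not a number Long.decode() understands.
     */
    static Long toLong(String id) {
        try {
            return Long.decode(id);
        } catch (NumberFormatException e) {
            return null;
        }
    }
}
```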

ImageAddressSpace.getID() for address space - this is needed

Image.close() - this would be useful to tidy up temporary files etc., but 
how should this work? Should all API calls that need the image then throw 
an exception? If so, what?
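
One option, shown only as a sketch: calls made after close() throw an unchecked IllegalStateException. ClosableImageSketch and getHostName() are hypothetical stand-ins, and the choice of exception is an assumption, not something DTFJ or Kato specifies.

```java
// Hypothetical stand-in for an image that can be closed.
final class ClosableImageSketch implements AutoCloseable {
    private final String hostName;
    private boolean closed;

    ClosableImageSketch(String hostName) {
        this.hostName = hostName;
    }

    // Any call that needs the underlying image checks the state first.
    String getHostName() {
        ensureOpen();
        return hostName;
    }

    // A real implementation would also delete temporary files here.
    @Override
    public void close() {
        closed = true;
    }

    private void ensureOpen() {
        if (closed) {
            throw new IllegalStateException("Image has been closed");
        }
    }
}
```

An unchecked exception avoids adding a throws clause to every method, at the cost of callers not being forced to handle it.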

Class names with slash or dot as separator?
java/lang/Object or java.lang.Object ?
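
Whichever form the API picks, converting between the two is mechanical for ordinary class names, as this small sketch shows. ClassNames is a hypothetical helper, not part of DTFJ or Kato.

```java
// Hypothetical helper for converting between the two naming forms.
final class ClassNames {
    // java/lang/Object -> java.lang.Object
    static String toDotted(String name) {
        return name.replace('/', '.');
    }

    // java.lang.Object -> java/lang/Object
    static String toSlashed(String name) {
        return name.replace('.', '/');
    }
}
```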

Iterators over heap - should JavaObjects representing classes be found 
when walking over the heaps?
cf. DTFJ:
Java 1.4.2 - in separate heap
Java 5.0 - missing
Java 6 - interspersed, but JavaClass at different address to JavaObject

Thread roots
Global roots
JavaReference.getSource()
- if the location of a root is known to be connected with a thread but 
cannot be connected to a particular JavaStackFrame can a JavaThread be a 
source?

JavaReference.getSource() getTarget()
These just return an Object. Is there a more type-safe way of doing this? 
Perhaps not - the choice is a runtime ClassCastException or some other 
sort of exception from calls such as getSourceJavaObject().
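
A third possibility, sketched with hypothetical names (ReferenceSketch and getSourceAs() are not DTFJ or Kato API): a typed accessor that takes the expected class and returns null when the source is of another type, so the caller never sees a ClassCastException.

```java
// Hypothetical stand-in for a reference whose source can be of several types.
final class ReferenceSketch {
    private final Object source;

    ReferenceSketch(Object source) {
        this.source = source;
    }

    // The untyped accessor, as in the API discussed above.
    Object getSource() {
        return source;
    }

    // Typed accessor: returns the source if it has the requested type, else null.
    <T> T getSourceAs(Class<T> type) {
        return type.isInstance(source) ? type.cast(source) : null;
    }
}
```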

JavaObject.getReferences
If there are hidden references (e.g. from a JavaObject of a 
JavaClassLoader to the defined classes) should these be returned as well 
as the fields and the type? I.e. should getRoots and getReferences be 
sufficient to traverse the graph of objects and locate every live object?
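
The traversal in question can be sketched as a breadth-first walk from the roots. HeapNode and its references list are hypothetical stand-ins for an object and whatever getReferences would return; if hidden references were left out of that list, the objects reachable only through them would be missed by this walk.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical stand-in for a heap object and its outgoing references.
final class HeapNode {
    final String name;
    final List<HeapNode> references = new ArrayList<>();

    HeapNode(String name) {
        this.name = name;
    }
}

final class ReachabilitySketch {
    // Returns every node reachable from the roots, in discovery order.
    static Set<HeapNode> reachableFrom(List<HeapNode> roots) {
        Set<HeapNode> seen = new LinkedHashSet<>();
        Deque<HeapNode> pending = new ArrayDeque<>(roots);
        while (!pending.isEmpty()) {
            HeapNode node = pending.removeFirst();
            // add() returns false for nodes already visited, which handles cycles.
            if (seen.add(node)) {
                pending.addAll(node.references);
            }
        }
        return seen;
    }
}
```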

JavaObject/JavaClass - any commonality - should they have a common 
superclass?

Should the API have one-way or two-way links between objects? Two-way 
makes combining data from two different dump readers a bit harder. E.g. if 
the first reader delegates to the secondary reader, but the data from the 
secondary reader returns an ImagePointer (even from CorruptData) then you 
can get the ImageAddressSpace from the secondary reader, which won't match 
the primary reader.

JavaClass - would a getInstanceSize() method be useful? This wouldn't make 
sense for arrays unless it was for a zero length array.

ImagePointer - getPointerAt() knows the size as 32-bit or 64-bit, but 
pointer size is obtained from ImageProcess, so conceivably there could be 
one address space with several processes with different pointer sizes! How 
does getPointerAt know the size then?

ImageSection.isShared() - with other VMs? Does this include VMs in the 
same address space or process? Is shared an address space, process or 
runtime concept? Does it depend on the type of the section?

ImageStackFrame.getBasePointer() - do we have a hint whether the frame 
goes up or down from this address and what the size of the frame is?

JavaStackFrame.getBasePointer() - do we have a hint whether the frame goes 
up or down from this address and what the size of the frame is?

hashCode() and equals()
Why are these listed as methods in some of the interfaces? Is an 
implementation required to override the java.lang.Object version?
Why does JavaStackFrame have hashCode() listed in JavaDoc?

ManagedRuntime.getVersion() - Why CorruptDataException if no 
understandable version data (does that include no version data stored in 
the dump?)

JavaClass.getConstantPoolReferences() - this seems to just return 
JavaObjects. If the JavaObject represents a class, is there an easy way to 
get to the JavaClass? The reverse step is easy with JavaClass.getObject(). 
Should this API return both JavaClass and JavaObject instances? That would 
make it harder to use generics for the iterator.


Andrew Johnson






Unless stated otherwise above:
IBM United Kingdom Limited - Registered in England and Wales with number 
741598. 
Registered office: PO Box 41, North Harbour, Portsmouth, Hampshire PO6 3AU







Re: My experience with DTFJ

Posted by Steve Poole <sp...@googlemail.com>.
Thanks Andrew - appreciate the input.  You've listed many of the weaknesses
of the API plus some additional requirements.  I'm going to use your note as
a resource and turn the items listed into various discussion items - either
on the kato wiki or the Jira project.


