Posted to kato-spec@incubator.apache.org by Andrew Johnson <an...@uk.ibm.com> on 2009/04/21 14:20:52 UTC
My experience with DTFJ
I've used DTFJ to write the DTFJ Adapter for the Eclipse Memory Analyzer
Tool.
http://www.eclipse.org/mat
Here are some comments I have about DTFJ API which I think also apply to
Kato.
Predictable order from iterators
- I want the same results from the same dump, whether the same or a
different VM version is used to run Kato/my program. This is needed for
ease of debugging etc.
- No iterators anywhere should come straight from HashMaps
- This should be documented as a requirement of the API
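
A minimal sketch of the point about HashMaps, assuming a reader that keeps its classes in a map keyed by address. ClassTable and its methods are hypothetical names and are not part of DTFJ or Kato. Using a sorted map means the iteration order depends only on the dump contents, not on hash codes or on the VM running the reader.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical holder for classes found in a dump, keyed by address.
// Illustrative only; not a DTFJ or Kato type.
class ClassTable {
    // TreeMap iterates in ascending key order, so the order is the same
    // on every run and on every VM version.
    private final Map<Long, String> classesByAddress = new TreeMap<>();

    void add(long address, String className) {
        classesByAddress.put(address, className);
    }

    // Returns the class names in ascending address order.
    Iterator<String> getClassNames() {
        return new ArrayList<>(classesByAddress.values()).iterator();
    }
}
```

A LinkedHashMap would give insertion order instead, which is also predictable as long as the dump is always read in the same sequence.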
Generics for iterators
- Specifies the return type from next()
- Would reduce coding errors
- CorruptData as a return type might cause a problem, but it is an
interface
- need CorruptData objects of different types for errors. e.g.
MyCorruptJavaObject implements JavaObject, CorruptData
- <? extends JavaClass> etc. in case the implementation is using a
subclass collection
- getAddress() - clash in method names between CorruptData and JavaObject
etc. - but does this matter if the object is corrupt? Should a CorruptData
object which also implements another interface throw CorruptDataException
whenever possible from the additional methods?
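
One way the generics idea could look, as a sketch only. The types below are simplified stand-ins that reuse the DTFJ names for readability; the real interfaces have many more methods and different signatures. GoodJavaObject and HeapWalk are hypothetical. The corrupt entry implements both interfaces so it fits in a typed iterator, and it throws from the JavaObject method.

```java
import java.util.Iterator;

// Simplified stand-ins, NOT the real DTFJ declarations.
class CorruptDataException extends Exception {
    CorruptDataException(String message) { super(message); }
}

interface CorruptData {
    String describe();
}

interface JavaObject {
    long getSize() throws CorruptDataException;
}

// A readable object (hypothetical implementation class).
class GoodJavaObject implements JavaObject {
    private final long size;
    GoodJavaObject(long size) { this.size = size; }
    public long getSize() { return size; }
}

// A corrupt entry that can still be returned from a typed iterator.
class MyCorruptJavaObject implements JavaObject, CorruptData {
    public String describe() { return "unreadable object header"; }
    public long getSize() throws CorruptDataException {
        throw new CorruptDataException(describe());
    }
}

class HeapWalk {
    // The wildcard accepts an iterator over any JavaObject subtype.
    // No cast is needed on next(); corrupt entries are skipped.
    static long readableBytes(Iterator<? extends JavaObject> it) {
        long total = 0;
        while (it.hasNext()) {
            JavaObject obj = it.next();
            try {
                total += obj.getSize();
            } catch (CorruptDataException e) {
                // corrupt entry: contributes nothing to the total
            }
        }
        return total;
    }
}
```

The caller can still test `obj instanceof CorruptData` when it wants to report the corruption rather than skip it.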
In general iterator methods don't themselves throw exceptions - except
ImageProcess:
Iterator getLibraries() throws DataUnavailable, CorruptDataException;
Why?
Equality between objects from different dumps
If you get an object twice from a dump and it is meant to represent the
same thing then the hash code must match and they must compare equal. It's
okay for the implementation to build objects dynamically, however.
If you open the same dump twice should objects compare equal?
If you open two dumps at different times from the same run should
identical objects compare equal (JavaRuntime, JavaClass)? How do we cope
with JavaObjects which get moved by GC? Even a persistentHashCode and
address match doesn't guarantee that it's the same object. The field values
might have changed by then anyway.
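
A sketch of the within-one-dump requirement, assuming objects are identified by the dump they came from plus their address. ObjectHandle and dumpId are hypothetical. Handles are built on demand, yet two handles for the same address in the same dump compare equal and share a hash code. Treating handles from different dumps as unequal is just one possible answer to the questions above, not a recommendation.

```java
import java.util.Objects;

// Hypothetical handle created each time the dump is queried.
final class ObjectHandle {
    private final String dumpId;  // assumption: some stable ID for the opened dump
    private final long address;

    ObjectHandle(String dumpId, long address) {
        this.dumpId = dumpId;
        this.address = address;
    }

    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof ObjectHandle)) return false;
        ObjectHandle other = (ObjectHandle) o;
        return address == other.address && dumpId.equals(other.dumpId);
    }

    @Override
    public int hashCode() {
        // Must use the same fields as equals().
        return Objects.hash(dumpId, address);
    }
}
```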
ImageProcess.getID() - string or int
for process - should be convertible to int or long if possible using
Integer.decode() or Long.decode()
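
For illustration, a caller might convert the ID like this. ProcessIds is a hypothetical helper; Long.decode() is the standard library method and accepts decimal, hex (0x...) and octal (leading 0) forms.

```java
// Hypothetical helper for callers that want a numeric process ID.
class ProcessIds {
    // Returns the numeric ID, or null if the string cannot be decoded.
    static Long toNumber(String id) {
        try {
            return Long.decode(id.trim());
        } catch (NumberFormatException e) {
            return null;
        }
    }
}
```

Note that a leading zero is read as octal, so an implementation that zero-pads its IDs would decode to an unexpected value.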
ImageAddressSpace.getID() for address space - this is needed
Image.close() - this would be useful to tidy up temporary files etc., but
how should this work? Should all API calls that need the image then throw
an exception? If so, what?
Class names with slash or dot as separator?
java/lang/Object or java.lang.Object ?
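
Whichever form the API settles on, converting is a one-line job for the caller. A small sketch; ClassNames is a hypothetical helper. The slash form is the one used inside class files, and the dot form is what Class.getName() returns for ordinary classes.

```java
// Hypothetical helper for switching between the two class name forms.
class ClassNames {
    // java/lang/Object -> java.lang.Object
    static String toDotted(String internalName) {
        return internalName.replace('/', '.');
    }

    // java.lang.Object -> java/lang/Object
    static String toInternal(String dottedName) {
        return dottedName.replace('.', '/');
    }
}
```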
Iterators over heap - should JavaObjects representing classes be found
when walking over the heaps?
cf. DTFJ:
Java 1.4.2 - in separate heap
Java 5.0 - missing
Java 6 - interspersed, but JavaClass at different address to JavaObject
Thread roots
Global roots
JavaReference.getSource()
- if the location of a root is known to be connected with a thread but
cannot be connected to a particular JavaStackFrame can a JavaThread be a
source?
JavaReference.getSource() getTarget()
These just return an Object. Is there a more type-safe way of doing this?
Perhaps not - the choice is between a runtime ClassCastException and some
other sort of exception from calls such as getSourceJavaObject().
JavaObject.getReferences
If there are hidden references (e.g. from a JavaObject of a
JavaClassLoader to the defined classes) should these be returned as well
as the fields and the type? I.e. should getRoots and getReferences be
sufficient to traverse the graph of objects and locate every live object?
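
To make the question concrete, this is the kind of traversal a tool would want to write. Node, SimpleNode and Reachability are hypothetical stand-ins for whatever getRoots and getReferences return; they are not DTFJ types. The walk finds every live object only if hidden references are included in getReferences().

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical minimal node; stands in for JavaObject or JavaClass.
interface Node {
    List<Node> getReferences();
}

// Trivial implementation used to build example graphs.
class SimpleNode implements Node {
    final List<Node> refs = new ArrayList<>();
    public List<Node> getReferences() { return refs; }
}

class Reachability {
    // Breadth-first walk from the roots; returns every node reached,
    // in visit order. Cycles are handled by the 'seen' set.
    static Set<Node> reachable(List<Node> roots) {
        Set<Node> seen = new LinkedHashSet<>();
        Deque<Node> queue = new ArrayDeque<>();
        for (Node root : roots) {
            if (seen.add(root)) queue.add(root);
        }
        while (!queue.isEmpty()) {
            Node current = queue.remove();
            for (Node ref : current.getReferences()) {
                if (seen.add(ref)) queue.add(ref);
            }
        }
        return seen;
    }
}
```

If a class loader object did not report its defined classes as references, those classes (and anything only they refer to) would be missing from the result.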
JavaObject/JavaClass - any commonality - should they have a common
superclass?
Should the API have one-way or two-way links between objects? Two-way
makes combining data from two different dump readers a bit harder. E.g. if
the first reader delegates to the secondary reader, but the data from the
secondary reader returns an ImagePointer (even from CorruptData) then you
can get the ImageAddressSpace from the secondary reader, which won't match
the primary reader.
JavaClass - would a getInstanceSize() method be useful? This wouldn't make
sense for arrays unless it was for a zero length array.
ImagePointer - getPointerAt() knows the size as 32-bit or 64-bit, but
pointer size is obtained from ImageProcess, so conceivably there could be
one address space with several processes with different pointer sizes! How
does getPointerAt know the size then?
ImageSection.isShared() - with other VMs? Does this include VMs in the
same address space or process? Is shared an address space, process or
runtime concept? Does it depend on the type of the section?
ImageStackFrame.getBasePointer() - do we have a hint whether the frame
goes up or down from this address and what the size of the frame is?
JavaStackFrame.getBasePointer() - do we have a hint whether the frame goes
up or down from this address and what the size of the frame is?
hashCode() and equals()
Why are these listed as methods in some of the interfaces? Is an
implementation required to override the java.lang.Object version?
Why does JavaStackFrame have hashCode() listed in JavaDoc?
ManagedRuntime.getVersion() - Why CorruptDataException if no
understandable version data (does that include no version data stored in
the dump?)
JavaClass.getConstantPoolReferences() - this seems to just return
JavaObjects. If the JavaObject represents a class, is there an easy way to
get to the JavaClass? The reverse step is easy with JavaClass.getObject().
Should this API return both JavaClass and JavaObjects? That would make it
harder to use generics for the iterator.
Andrew Johnson
Re: My experience with DTFJ
Posted by Steve Poole <sp...@googlemail.com>.
Thanks Andrew - appreciate the input. You've listed many of the weaknesses
of the API plus some additional requirements. I'm going to use your note as
a resource and turn the items listed into various discussion items - either
on the kato wiki or the Jira project.