Posted to kato-spec@incubator.apache.org by Steve Poole <sp...@googlemail.com> on 2009/09/08 10:00:23 UTC

Round tripping entity identity

It's important for performance and scalability reasons that the API does not
force the loading of a complete dump into memory.

What that means in practice is that the application using the API will need
to be able to read entities in a "random access" manner. That in turn means
that we need a mechanism that can provide a "handle" to an entity and a way
for the API to convert the handle back into a full representation of the
entity in question.

Since the handle will probably have to be stored outside of Java (i.e. in a
database), the handle has to be a primitive data type.

Is this a reasonable view?

The MAT tool is the only example I have to date where this mechanism is
needed. I believe they use a long rather than a String.

Are there other uses we should consider? What about cross-dump
correlations?

Cheers

Steve

Re: Round tripping entity identity

Posted by Stuart Monteith <st...@stoo.me.uk>.
Currently this can only be done with JavaObject in the API.

To "round trip" a JavaObject using the API as it stands, you can do
the following:

Get the address of a JavaObject using:

    // javaObject is the JavaObject instance to be remembered
    long id = ((ImagePointer) javaObject.getID()).getAddress();

This id can be stored away, perhaps on disk, and the original object
can be retrieved by doing the following:

    ImageAddressSpace ias;  // the address space the object came from
    JavaObject obj = javaRuntime.getObjectAtAddress(ias.getPointer(id));

You can see that with the existing API there is some awkward conversion 
from pointers to long values and back again.

I'd say that there is some confusion here between the address of
something in memory and its identity. While it is valid
to use an address to identify something, it isn't always valid to use an
identifier as a means of addressing something.
For example, the hprof implementation of the API doesn't operate with 
addresses, but it still has identifiers.

I'd suggest that the "getID()" methods do just that, and look something 
like this:

    long getID();

This is making two assumptions:

    1. long is large enough to distinguish between different objects.
    2. Identifiers are only unique for objects of the same type.

Of course, for this to be complete there would have to be a method to 
construct an object of each type.

For example:

    JavaClass JavaRuntime.getClassWithID(long id);
    JavaMonitor JavaRuntime.getMonitorWithID(long id);
    JavaObject JavaRuntime.getObjectWithID(long id);
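
To make the per-type lookup idea concrete, here is a minimal sketch. All the type names in it (Identified, SketchObject, SketchClass, SketchRuntime) are hypothetical stand-ins and not the real API interfaces, and the in-memory maps are only there to keep the example self-contained. A real implementation would presumably re-read the entity from the dump instead. It illustrates assumption 2: the same long value can identify both an object and a class without ambiguity, because each lookup method only searches one type.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for "anything that has a primitive ID".
interface Identified {
    long getID();
}

// Hypothetical stand-in for JavaObject.
final class SketchObject implements Identified {
    private final long id;
    private final String className;

    SketchObject(long id, String className) {
        this.id = id;
        this.className = className;
    }

    public long getID() { return id; }
    String getClassName() { return className; }
}

// Hypothetical stand-in for JavaClass.
final class SketchClass implements Identified {
    private final long id;
    private final String name;

    SketchClass(long id, String name) {
        this.id = id;
        this.name = name;
    }

    public long getID() { return id; }
    String getName() { return name; }
}

// Hypothetical stand-in for JavaRuntime with one lookup method per type.
// IDs only need to be unique within a type, so each type has its own table.
final class SketchRuntime {
    private final Map<Long, SketchObject> objects = new HashMap<>();
    private final Map<Long, SketchClass> classes = new HashMap<>();

    void add(SketchObject o) { objects.put(o.getID(), o); }
    void add(SketchClass c) { classes.put(c.getID(), c); }

    // Both lookups return null when the ID is unknown for that type.
    SketchObject getObjectWithID(long id) { return objects.get(id); }
    SketchClass getClassWithID(long id) { return classes.get(id); }
}
```

A tool would keep only the long returned by getID() and the kind of entity it came from, then call the matching lookup method later.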

The alternative is to have a method like:
    Object JavaRuntime.getSomethingWithID(JavaID id);

where the JavaID is a type that encompasses the type of object as well 
as the ID.
It might look something like this:

public interface JavaID extends Serializable {
    /**
     * Returns true if this ID identifies an object of the passed type.
     */
    public boolean isa(Class type);

    /**
     * Identify the runtime this ID belongs to.
     */
    public JavaRuntime getRuntime();
}

Making JavaID serializable allows the object type to be stored
along with the id of the object itself. It may also include the
identifier of the runtime. However, I believe it will make indices
more difficult to build, as the serialized identifier will be a variable
number of bytes, which will be implementation-specific.
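
As an illustration of why a fixed-width identifier makes indices easier, here is a small sketch of an index of long IDs. The LongIdIndex class is hypothetical and not part of the API. Because every entry is exactly 8 bytes, slot n always lives at byte offset n * 8, so an entry can be read directly without scanning or a separate offset table. A variable-length serialized ID would not allow that.

```java
import java.nio.ByteBuffer;

// Hypothetical fixed-width index: slot i holds one long ID at byte
// offset i * Long.BYTES. The buffer stands in for a file or database column.
final class LongIdIndex {
    private final ByteBuffer buffer;
    private int count;

    LongIdIndex(int capacity) {
        buffer = ByteBuffer.allocate(capacity * Long.BYTES);
    }

    // Stores the ID in the next free slot and returns that slot number.
    // Throws IndexOutOfBoundsException once the capacity is exhausted.
    int append(long id) {
        buffer.putLong(count * Long.BYTES, id);
        return count++;
    }

    // Random access: the offset is computed, never searched for.
    long get(int slot) {
        if (slot < 0 || slot >= count) {
            throw new IndexOutOfBoundsException("slot " + slot);
        }
        return buffer.getLong(slot * Long.BYTES);
    }

    int size() { return count; }
}
```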

So I'd say I support having a primitive type as an identifier.

Cross dump correlation is interesting.
To compare objects you have to establish that:
    1. They are from the same process.
    2. The objects are the same.

Number 1 should be a simple matter of comparing the stored process
id, JVM version information, etc. It should be highly unlikely that two
dumps from two different processes could be compared by accident.
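
A rough sketch of the check in number 1, assuming each dump can supply a process id and a JVM version string. The DumpOrigin class and its fields are hypothetical. Which values a given dump format actually records, and whether more of them should be compared, is an open question.

```java
// Hypothetical summary of what a dump records about the process it came from.
final class DumpOrigin {
    private final String processId;
    private final String jvmVersion;

    DumpOrigin(String processId, String jvmVersion) {
        this.processId = processId;
        this.jvmVersion = jvmVersion;
    }

    // True only if every recorded value matches. This is a heuristic:
    // it rules out obvious mismatches before any object comparison is tried.
    boolean sameProcessAs(DumpOrigin other) {
        return other != null
                && processId.equals(other.processId)
                && jvmVersion.equals(other.jvmVersion);
    }
}
```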

Number 2's difficulty depends on the implementation and the type being 
compared. For example, JVMTI agents would have trouble consistently 
identifying anything because of the changing IDs they are given.
Given an implementation backed by a core file, I would expect the 
following to probably be consistent across dumps:
    ImageThread, ImageProcess, JavaRuntime, JavaThread, JavaHeap.

The most troublesome identities to pin down would be those of JavaObjects.
As the garbage collector can and will rearrange objects' positions in
memory, an object's address can't be used for identity across dumps.


The other issue I can think of is that these IDs will be reported to the
tool's users. For example, an object could be identified like
"java.util.HashMap@0x2d78e". Should we specify any significance to these
IDs? I'd think not, because of the variance between the different
implementations.
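
For what it's worth, a tool could produce that kind of label while treating the ID as an opaque number. The EntityLabels helper below is hypothetical and simply prints the long in hexadecimal after the type name. It makes no claim that the value is an address.

```java
// Hypothetical helper for displaying an entity to a tool user.
final class EntityLabels {
    // Formats the type name and the ID in hex, e.g. "Type@0x1f".
    // The ID is only rendered, never interpreted.
    static String label(String typeName, long id) {
        return typeName + "@0x" + Long.toHexString(id);
    }
}
```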

I think it's important to point out how ImageSections would still 
provide the locations of things in memory (excepting JavaClass, which is 
missing ImageSections).


Regards,
    Stuart



-- 
Stuart Monteith
http://blog.stoo.me.uk/