You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@harmony.apache.org by "Geir Magnusson Jr." <ge...@apache.org> on 2005/11/01 00:38:57 UTC
Re: Questions about GC implementations
Robin - can you post this to a JIRA please?
geir
On Oct 29, 2005, at 1:57 AM, Robin Garner wrote:
>
>
>>> In the garbage collectors I've worked with, the essential design is:
>>>
>>> - 'new' allocates space on the heap.
>>> - The header of each object contains a pointer (or equiv) to a
>>> per-class
>>> data structure, that points to the GC map for the object (and
>>> virtual
>>> dispatch tables etc etc)
>>> - Reference fields in objects contain pointers directly to the heap
>>> objects they reference.
>>> - Pointer loads and stores are (optionally) performed via
>>> barriers - the
>>> terminology is a little confusing: these are not synchronization
>>> barriers,
>>> but opportunities for the GC to intercept the load/store and do some
>>> additional processing. Write barriers are common, read barriers
>>> less so.
>>>
>>>
>>
>> This is also the approach I have taken, so I think we are
>> on the same page. I think we are just saying the same thing
>> in different words. When an object is to be allocated,
>> its pointer will be set by the allocator. This pointer
>> is 'robject.pgarbage' and is found in 'jvm/src/object.h'.
>> The reason you did not find one is because I have only
>> provided a stub GC implementation. The 'new' operation
>> is performed by object_instance_new() in 'jvm/src/object.c'
>> and includes a call to GC_OBJECT_NEW().
>>
>>
> Ok, so now I understand what the 'robject.pgarbage' pointer is -
> this is the pointer to the body of an object. So do I understand
> correctly that the body of an array is pointed to by a different
> pointer in an object header ? Do you realise that this means that
> a GETFIELD requires two indirections ?
>
> Do you realise that your object header is somewhere in the vicinity
> of 40 bytes ? The average payload of a java object (at least for
> SPECjvm98) is 16 bytes. Many of the fields in the bootJVM object
> header are actually constant across all objects of a class.
>
> The object table needs to go - I think this is a fundamental enough
> design feature that it needs to be removed as soon as possible. An
> object header should be 2 words at the most (3 for arrays), and
> they should be contiguous with the object.
>
>> Any time a reference to that object is made, its object
>> hash, of type 'jvm_object_hash' has a reference recorded
>> by GC_OBJECT_MKREF_FROM_OBJECT() or GC_OBJECT_FIELD_MKREF()
>> or GC_OBJECT_MKREF_FROM_JVM(). These are for internal
>> JVM references, references from Java object reference variables,
>> and references from Java local method reference variables,
>> respectively. When the reference is no longer needed, such as
>> when an object is destroyed or when a local method returns,
>> the reference is no longer needed. When this occurs,
>> then GC_OBJECT_RMREF_FROM_OBJECT() or GC_OBJECT_FIELD_RMREF()
>> or GC_OBJECT_RMREF_FROM_JVM() are called, respectively. When
>> an object is no longer used, then GC_OBJECT_DELETE() is called.
>>
>>
> So these I guess are what we would refer to in the GC world as
> barriers. My confusion is that a) they don't have enough
> parameters, and b) there are too many of them. Am I right in
> assuming that you are invoking a barrier on every update to a local
> variable, every update to a stack ? This is going to be way too
> inefficient.
>
> For a stop the world GC, what is needed is a barrier on a PUTFIELD
> and AASTORE, and a way that the GC can query the current pointers
> that the VM has into the heap when it starts a collection.
> Optionally, we can also have a PUTSTATIC barrier, and corresponding
> read barriers.
>
> I also don't get the distinction between the classes and objects.
> Either a class is an object, and lives in the heap - in which case
> it can be treated as an object - or it isn't and lives in the VM.
>
>
>> When a local methods is called, GC_STACK_NEW() is invoked to set
>> up GC for that stack frame. Adding and removing objects is done
>> per above. Adding and removing references to objects is done
>> with GC_STACK_MKREF_FROM_JVM() and GC_STACK_RMREF_FROM_JVM().
>> When it returns, GC_STACK_DELETE() is called.
>>
>>
> This is way too heavyweight. Stack manipulation is fast-path code,
> and shouldn't have to call out to the GC all the time.
>
>
>>> - There are many styles of collector, but the most common class uses
>>> tracing, in which a root set of pointers is used to determine an
>>> initial
>>> set of live objects, and the collector performs a transitive
>>> closure over
>>> this set to establish the set of all live objects. The root set is
>>> commonly the thread stacks and the static pointer fields.
>>>
>>>
>>
>> The macro GC_RUN() refers to the collector. The stub implementation
>> shows what it is supposed to do. The OBJECT_STATUS_GCREQ status bit
>> controls when an object needs collecting.
>>
>>
> So where does OBJECT_STATUS_GCREQ get set ? It's the GC (and only
> the GC) that knows when an object is alive or not (unless we do
> reachability analysis in the compiler/interpreter).
>
>
>> All of the above setting up and tearing down of objects and
>> references
>> to objects have equivalents for classes using the same rationale.
>> (The class GC pointer is 'rclass.pgarbage' in 'jvm/src/class.h'.
>> The local method GC pointer is 'JVMREG_STACK_GC_OFFSET' in
>> 'jvm/src/jvmreg.h' and is manipulated by GC_STACK_NEW() and
>> GC_STACK_DELETE().)
>>
>>
>>
>>> - The above is also complicated by
>>> . Reference types
>>> . Finalization
>>> . Locks
>>> . Address-based hashing
>>> The solutions to these are all pretty well known, but complicate
>>> the
>>>
>>>
>> design
>>
>>
>>> This is pretty much it - the rest (45 years of research) is
>>> optimizing the
>>> way this is all done.
>>>
>>>
>>>
>>>> Perhaps I am taking it a bit too simplistically,
>>>> but all I did was define a GC hook anywhere an object or
>>>> array reference or a class load or unload happened. How
>>>> does your concept differ? And what sort of approach should
>>>> be used instead.
>>>>
>>>>
>>> This I think brings us back to my initial question, asking what
>>> these
>>> hooks were supposed to do. I guess you're saying you had a vague
>>> idea
>>> that the GC might need to know about these events, so put hooks
>>> in for
>>> them.
>>>
>>> When I was looking around the code trying to find out where to start
>>> hooking in a managed heap, I looked for the 'new' or 'alloc'
>>> operation,
>>> and couldn't seem to find it.
>>>
>>>
>>>
>>
>> 'new' -- see object_instance_new() for objects,
>> class_static_new() for classes.
>>
>> 'delete' -- see object_instance_delete() for objects,
>> class_static_delete() for classes.
>>
>>
> Again, what is it that calls 'delete' - isn't that the GC's job ?
> I'm still confused.
>
>
>> All four of these functions perform a number of GC activities,
>> espcially calling the GC_xxx() macros listed above.
>>
>> Notice that, throughout the body of the code, _any_ time a reference
>> to an object (or class) is created or destroyed, one or more related
>> GC functions are called.
>>
>> What is _done_ with these events is up to the implementor
>> of the GC algorithm itself.
>>
>>
> The GC and the VM need to be in bed a bit more closely than this,
> at least for an implementation with reasonable performance.
>
> Attached is a tarball with 3 header files, and two (untested) GC
> implementations - unfortunately there's too much work for me to
> refactor the VM to match the interface and therefore to test. It's
> probably best treated as example code for now.
>
> cheers
> Robin
>
> <gc.tgz>
>
--
Geir Magnusson Jr +1-203-665-6437
geirm@apache.org
Re: Questions about GC implementations
Posted by "Geir Magnusson Jr." <ge...@apache.org>.
thanks
On Oct 31, 2005, at 10:57 PM, Robin Garner wrote:
> Done. This is issue #11.
>
> cheers
>
> Geir Magnusson Jr. wrote:
>
>
>> Robin - can you post this to a JIRA please?
>>
>> geir
>>
>> On Oct 29, 2005, at 1:57 AM, Robin Garner wrote:
>>
>>
>>>
>>>
>>>
>>>>> In the garbage collectors I've worked with, the essential
>>>>> design is:
>>>>>
>>>>> - 'new' allocates space on the heap.
>>>>> - The header of each object contains a pointer (or equiv) to a
>>>>> per-class
>>>>> data structure, that points to the GC map for the object (and
>>>>> virtual
>>>>> dispatch tables etc etc)
>>>>> - Reference fields in objects contain pointers directly to the
>>>>> heap
>>>>> objects they reference.
>>>>> - Pointer loads and stores are (optionally) performed via
>>>>> barriers - the
>>>>> terminology is a little confusing: these are not
>>>>> synchronization barriers,
>>>>> but opportunities for the GC to intercept the load/store and do
>>>>> some
>>>>> additional processing. Write barriers are common, read barriers
>>>>> less so.
>>>>>
>>>>>
>>>>>
>>>>
>>>> This is also the approach I have taken, so I think we are
>>>> on the same page. I think we are just saying the same thing
>>>> in different words. When an object is to be allocated,
>>>> its pointer will be set by the allocator. This pointer
>>>> is 'robject.pgarbage' and is found in 'jvm/src/object.h'.
>>>> The reason you did not find one is because I have only
>>>> provided a stub GC implementation. The 'new' operation
>>>> is performed by object_instance_new() in 'jvm/src/object.c'
>>>> and includes a call to GC_OBJECT_NEW().
>>>>
>>>>
>>>>
>>> Ok, so now I understand what the 'robject.pgarbage' pointer is -
>>> this is the pointer to the body of an object. So do I understand
>>> correctly that the body of an array is pointed to by a different
>>> pointer in an object header ? Do you realise that this means that
>>> a GETFIELD requires two indirections ?
>>>
>>> Do you realise that your object header is somewhere in the
>>> vicinity of 40 bytes ? The average payload of a java object (at
>>> least for SPECjvm98) is 16 bytes. Many of the fields in the
>>> bootJVM object header are actually constant across all objects of
>>> a class.
>>>
>>> The object table needs to go - I think this is a fundamental
>>> enough design feature that it needs to be removed as soon as
>>> possible. An object header should be 2 words at the most (3 for
>>> arrays), and they should be contiguous with the object.
>>>
>>>
>>>> Any time a reference to that object is made, its object
>>>> hash, of type 'jvm_object_hash' has a reference recorded
>>>> by GC_OBJECT_MKREF_FROM_OBJECT() or GC_OBJECT_FIELD_MKREF()
>>>> or GC_OBJECT_MKREF_FROM_JVM(). These are for internal
>>>> JVM references, references from Java object reference variables,
>>>> and references from Java local method reference variables,
>>>> respectively. When the reference is no longer needed, such as
>>>> when an object is destroyed or when a local method returns,
>>>> the reference is no longer needed. When this occurs,
>>>> then GC_OBJECT_RMREF_FROM_OBJECT() or GC_OBJECT_FIELD_RMREF()
>>>> or GC_OBJECT_RMREF_FROM_JVM() are called, respectively. When
>>>> an object is no longer used, then GC_OBJECT_DELETE() is called.
>>>>
>>>>
>>>>
>>> So these I guess are what we would refer to in the GC world as
>>> barriers. My confusion is that a) they don't have enough
>>> parameters, and b) there are too many of them. Am I right in
>>> assuming that you are invoking a barrier on every update to a
>>> local variable, every update to a stack ? This is going to be way
>>> too inefficient.
>>>
>>> For a stop the world GC, what is needed is a barrier on a
>>> PUTFIELD and AASTORE, and a way that the GC can query the current
>>> pointers that the VM has into the heap when it starts a
>>> collection. Optionally, we can also have a PUTSTATIC barrier, and
>>> corresponding read barriers.
>>>
>>> I also don't get the distinction between the classes and objects.
>>> Either a class is an object, and lives in the heap - in which
>>> case it can be treated as an object - or it isn't and lives in
>>> the VM.
>>>
>>>
>>>
>>>> When a local methods is called, GC_STACK_NEW() is invoked to set
>>>> up GC for that stack frame. Adding and removing objects is done
>>>> per above. Adding and removing references to objects is done
>>>> with GC_STACK_MKREF_FROM_JVM() and GC_STACK_RMREF_FROM_JVM().
>>>> When it returns, GC_STACK_DELETE() is called.
>>>>
>>>>
>>>>
>>> This is way too heavyweight. Stack manipulation is fast-path
>>> code, and shouldn't have to call out to the GC all the time.
>>>
>>>
>>>
>>>>> - There are many styles of collector, but the most common class
>>>>> uses
>>>>> tracing, in which a root set of pointers is used to determine
>>>>> an initial
>>>>> set of live objects, and the collector performs a transitive
>>>>> closure over
>>>>> this set to establish the set of all live objects. The root set is
>>>>> commonly the thread stacks and the static pointer fields.
>>>>>
>>>>>
>>>>>
>>>>
>>>> The macro GC_RUN() refers to the collector. The stub implementation
>>>> shows what it is supposed to do. The OBJECT_STATUS_GCREQ status bit
>>>> controls when an object needs collecting.
>>>>
>>>>
>>>>
>>> So where does OBJECT_STATUS_GCREQ get set ? It's the GC (and only
>>> the GC) that knows when an object is alive or not (unless we do
>>> reachability analysis in the compiler/interpreter).
>>>
>>>
>>>
>>>> All of the above setting up and tearing down of objects and
>>>> references
>>>> to objects have equivalents for classes using the same rationale.
>>>> (The class GC pointer is 'rclass.pgarbage' in 'jvm/src/class.h'.
>>>> The local method GC pointer is 'JVMREG_STACK_GC_OFFSET' in
>>>> 'jvm/src/jvmreg.h' and is manipulated by GC_STACK_NEW() and
>>>> GC_STACK_DELETE().)
>>>>
>>>>
>>>>
>>>>
>>>>> - The above is also complicated by
>>>>> . Reference types
>>>>> . Finalization
>>>>> . Locks
>>>>> . Address-based hashing
>>>>> The solutions to these are all pretty well known, but
>>>>> complicate the
>>>>>
>>>>>
>>>>>
>>>> design
>>>>
>>>>
>>>>
>>>>> This is pretty much it - the rest (45 years of research) is
>>>>> optimizing the
>>>>> way this is all done.
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>> Perhaps I am taking it a bit too simplistically,
>>>>>> but all I did was define a GC hook anywhere an object or
>>>>>> array reference or a class load or unload happened. How
>>>>>> does your concept differ? And what sort of approach should
>>>>>> be used instead.
>>>>>>
>>>>>>
>>>>>>
>>>>> This I think brings us back to my initial question, asking what
>>>>> these
>>>>> hooks were supposed to do. I guess you're saying you had a
>>>>> vague idea
>>>>> that the GC might need to know about these events, so put hooks
>>>>> in for
>>>>> them.
>>>>>
>>>>> When I was looking around the code trying to find out where to
>>>>> start
>>>>> hooking in a managed heap, I looked for the 'new' or 'alloc'
>>>>> operation,
>>>>> and couldn't seem to find it.
>>>>>
>>>>>
>>>>>
>>>>>
>>>>
>>>> 'new' -- see object_instance_new() for objects,
>>>> class_static_new() for classes.
>>>>
>>>> 'delete' -- see object_instance_delete() for objects,
>>>> class_static_delete() for classes.
>>>>
>>>>
>>>>
>>> Again, what is it that calls 'delete' - isn't that the GC's job ?
>>> I'm still confused.
>>>
>>>
>>>
>>>> All four of these functions perform a number of GC activities,
>>>> espcially calling the GC_xxx() macros listed above.
>>>>
>>>> Notice that, throughout the body of the code, _any_ time a
>>>> reference
>>>> to an object (or class) is created or destroyed, one or more
>>>> related
>>>> GC functions are called.
>>>>
>>>> What is _done_ with these events is up to the implementor
>>>> of the GC algorithm itself.
>>>>
>>>>
>>>>
>>> The GC and the VM need to be in bed a bit more closely than this,
>>> at least for an implementation with reasonable performance.
>>>
>>> Attached is a tarball with 3 header files, and two (untested) GC
>>> implementations - unfortunately there's too much work for me to
>>> refactor the VM to match the interface and therefore to test.
>>> It's probably best treated as example code for now.
>>>
>>> cheers
>>> Robin
>>>
>>> <gc.tgz>
>>>
>>>
>>
>>
>
>
--
Geir Magnusson Jr +1-203-665-6437
geirm@apache.org
Re: Questions about GC implementations
Posted by Robin Garner <Ro...@anu.edu.au>.
Done. This is issue #11.
cheers
Geir Magnusson Jr. wrote:
> Robin - can you post this to a JIRA please?
>
> geir
>
> On Oct 29, 2005, at 1:57 AM, Robin Garner wrote:
>
>>
>>
>>>> In the garbage collectors I've worked with, the essential design is:
>>>>
>>>> - 'new' allocates space on the heap.
>>>> - The header of each object contains a pointer (or equiv) to a
>>>> per-class
>>>> data structure, that points to the GC map for the object (and virtual
>>>> dispatch tables etc etc)
>>>> - Reference fields in objects contain pointers directly to the heap
>>>> objects they reference.
>>>> - Pointer loads and stores are (optionally) performed via barriers
>>>> - the
>>>> terminology is a little confusing: these are not synchronization
>>>> barriers,
>>>> but opportunities for the GC to intercept the load/store and do some
>>>> additional processing. Write barriers are common, read barriers
>>>> less so.
>>>>
>>>>
>>>
>>> This is also the approach I have taken, so I think we are
>>> on the same page. I think we are just saying the same thing
>>> in different words. When an object is to be allocated,
>>> its pointer will be set by the allocator. This pointer
>>> is 'robject.pgarbage' and is found in 'jvm/src/object.h'.
>>> The reason you did not find one is because I have only
>>> provided a stub GC implementation. The 'new' operation
>>> is performed by object_instance_new() in 'jvm/src/object.c'
>>> and includes a call to GC_OBJECT_NEW().
>>>
>>>
>> Ok, so now I understand what the 'robject.pgarbage' pointer is - this
>> is the pointer to the body of an object. So do I understand correctly
>> that the body of an array is pointed to by a different pointer in an
>> object header ? Do you realise that this means that a GETFIELD
>> requires two indirections ?
>>
>> Do you realise that your object header is somewhere in the vicinity
>> of 40 bytes ? The average payload of a java object (at least for
>> SPECjvm98) is 16 bytes. Many of the fields in the bootJVM object
>> header are actually constant across all objects of a class.
>>
>> The object table needs to go - I think this is a fundamental enough
>> design feature that it needs to be removed as soon as possible. An
>> object header should be 2 words at the most (3 for arrays), and they
>> should be contiguous with the object.
>>
>>> Any time a reference to that object is made, its object
>>> hash, of type 'jvm_object_hash' has a reference recorded
>>> by GC_OBJECT_MKREF_FROM_OBJECT() or GC_OBJECT_FIELD_MKREF()
>>> or GC_OBJECT_MKREF_FROM_JVM(). These are for internal
>>> JVM references, references from Java object reference variables,
>>> and references from Java local method reference variables,
>>> respectively. When the reference is no longer needed, such as
>>> when an object is destroyed or when a local method returns,
>>> the reference is no longer needed. When this occurs,
>>> then GC_OBJECT_RMREF_FROM_OBJECT() or GC_OBJECT_FIELD_RMREF()
>>> or GC_OBJECT_RMREF_FROM_JVM() are called, respectively. When
>>> an object is no longer used, then GC_OBJECT_DELETE() is called.
>>>
>>>
>> So these I guess are what we would refer to in the GC world as
>> barriers. My confusion is that a) they don't have enough parameters,
>> and b) there are too many of them. Am I right in assuming that you
>> are invoking a barrier on every update to a local variable, every
>> update to a stack ? This is going to be way too inefficient.
>>
>> For a stop the world GC, what is needed is a barrier on a PUTFIELD
>> and AASTORE, and a way that the GC can query the current pointers
>> that the VM has into the heap when it starts a collection.
>> Optionally, we can also have a PUTSTATIC barrier, and corresponding
>> read barriers.
>>
>> I also don't get the distinction between the classes and objects.
>> Either a class is an object, and lives in the heap - in which case it
>> can be treated as an object - or it isn't and lives in the VM.
>>
>>
>>> When a local methods is called, GC_STACK_NEW() is invoked to set
>>> up GC for that stack frame. Adding and removing objects is done
>>> per above. Adding and removing references to objects is done
>>> with GC_STACK_MKREF_FROM_JVM() and GC_STACK_RMREF_FROM_JVM().
>>> When it returns, GC_STACK_DELETE() is called.
>>>
>>>
>> This is way too heavyweight. Stack manipulation is fast-path code,
>> and shouldn't have to call out to the GC all the time.
>>
>>
>>>> - There are many styles of collector, but the most common class uses
>>>> tracing, in which a root set of pointers is used to determine an
>>>> initial
>>>> set of live objects, and the collector performs a transitive
>>>> closure over
>>>> this set to establish the set of all live objects. The root set is
>>>> commonly the thread stacks and the static pointer fields.
>>>>
>>>>
>>>
>>> The macro GC_RUN() refers to the collector. The stub implementation
>>> shows what it is supposed to do. The OBJECT_STATUS_GCREQ status bit
>>> controls when an object needs collecting.
>>>
>>>
>> So where does OBJECT_STATUS_GCREQ get set ? It's the GC (and only the
>> GC) that knows when an object is alive or not (unless we do
>> reachability analysis in the compiler/interpreter).
>>
>>
>>> All of the above setting up and tearing down of objects and references
>>> to objects have equivalents for classes using the same rationale.
>>> (The class GC pointer is 'rclass.pgarbage' in 'jvm/src/class.h'.
>>> The local method GC pointer is 'JVMREG_STACK_GC_OFFSET' in
>>> 'jvm/src/jvmreg.h' and is manipulated by GC_STACK_NEW() and
>>> GC_STACK_DELETE().)
>>>
>>>
>>>
>>>> - The above is also complicated by
>>>> . Reference types
>>>> . Finalization
>>>> . Locks
>>>> . Address-based hashing
>>>> The solutions to these are all pretty well known, but complicate the
>>>>
>>>>
>>> design
>>>
>>>
>>>> This is pretty much it - the rest (45 years of research) is
>>>> optimizing the
>>>> way this is all done.
>>>>
>>>>
>>>>
>>>>> Perhaps I am taking it a bit too simplistically,
>>>>> but all I did was define a GC hook anywhere an object or
>>>>> array reference or a class load or unload happened. How
>>>>> does your concept differ? And what sort of approach should
>>>>> be used instead.
>>>>>
>>>>>
>>>> This I think brings us back to my initial question, asking what these
>>>> hooks were supposed to do. I guess you're saying you had a vague idea
>>>> that the GC might need to know about these events, so put hooks in for
>>>> them.
>>>>
>>>> When I was looking around the code trying to find out where to start
>>>> hooking in a managed heap, I looked for the 'new' or 'alloc'
>>>> operation,
>>>> and couldn't seem to find it.
>>>>
>>>>
>>>>
>>>
>>> 'new' -- see object_instance_new() for objects,
>>> class_static_new() for classes.
>>>
>>> 'delete' -- see object_instance_delete() for objects,
>>> class_static_delete() for classes.
>>>
>>>
>> Again, what is it that calls 'delete' - isn't that the GC's job ? I'm
>> still confused.
>>
>>
>>> All four of these functions perform a number of GC activities,
>>> espcially calling the GC_xxx() macros listed above.
>>>
>>> Notice that, throughout the body of the code, _any_ time a reference
>>> to an object (or class) is created or destroyed, one or more related
>>> GC functions are called.
>>>
>>> What is _done_ with these events is up to the implementor
>>> of the GC algorithm itself.
>>>
>>>
>> The GC and the VM need to be in bed a bit more closely than this, at
>> least for an implementation with reasonable performance.
>>
>> Attached is a tarball with 3 header files, and two (untested) GC
>> implementations - unfortunately there's too much work for me to
>> refactor the VM to match the interface and therefore to test. It's
>> probably best treated as example code for now.
>>
>> cheers
>> Robin
>>
>> <gc.tgz>
>>
>