You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@harmony.apache.org by Simon Zhou <si...@gmail.com> on 2009/03/31 12:06:57 UTC

[GSoC] Proposal of GC-1: implement WeakReference support in Harmony concurrent GC

Hi All,
Here is my proposal, any suggestion is welcome!

*Title/Summary:* GC-1: Implement WeakReference support in Harmony Concurent
GC

*Student:* Simon.Zhou

*Student e-mail:* simon.harmony@gmail.com

*Student Major:* Computer Science

*Student Degree:* Master

*Student Graduation:* 2009.7

*Organization:* Fudan University

*Assigned Mentor:*

*Abstract:*

weak reference allows a program to maintain special reference to objects,
which is a useful API for the program interacting with the garbage
collector to some extent. Apache Harmony has a concurrent garbage collector
(code name: Tick) which can perform the garbage collection without
suspending the application threads except for root set enumeration. Tick do
not have weak reference supporting now, it treats all the references object
just as  strong references, which may cause inefficiency for applications
using weak references. In this project, I will add the weak reference
support for Tick, which would be different from the implementation in
generational GC, because the consistency should be maintained for the heap
objects carefully. Read barrier of the get() method of reference object will
be used for this implementation, and the performance issue will be seriously
considered. before implement phantom reference, I will also implement the
finalizer feature for Tick, although it is not recommanded to be used except
it is necessary. Besides this, I am trying to reduce the floating garbage
for Tick’s concurrent algorithms.

*Detailed Description:*

Tick is a concurrent GC of Harmony DRLVM, it can do the garbage collection
without suspending application threads. For concurrent GC, the most
important issue that should be dealt with is the inconsistency problem, that
is if we trace the heap object graph while the mutator updating the same
graph, inconsistency problem occurs, some live objects may be mistaken
relaimed as garbage. In order to keep the consistency of objects, write
barriers is inserted when concurrent tracing starts, which intercept all the
write operations of object reference and record useful data for rescaning.
Now there are 3 different concurrent collection algorithms in Tick: DLG,
Sliding-View, Mostly-Concurrent; and 3 different write barriers are used
repectively. see more information [1].

Experiment result shows that Tick can significantly reduce or even eliminate
the pause time caused by garbage collection. However, one drawback of Tick
is that, it do not support weak reference mechanism, which is a important
API of Java programming language.

Weak reference is an useful tool for whom are much care about the
application memory usage, e.g. it can help developers to cache some
temporary large objects for later reusing; also, to do some cleanup
operations just when a object is acturally relaimed. There are 3 kinds of
weak reference class in Java language (the weakness is decreasingly): soft
reference, weak reference, phantom reference. Soft reference is usually used
to maintain better cache for large object, such as pictures load from web;
weak reference can help to reduce the frequency of creating short-live
objects; phantom reference can usully help programmer to do cleanup
operations just when an object is physically relaimed. (from now on, I will
use ‘weakref’ to represent the set of all the three kinds of weak
references) All the three kinds of reference classes have a get() method, if
the object has not been actually relaimed, the get() method will return a
reference of the referent object (referent object is the object which
reference object point to) except phantom reference (phantom reference’s
get() method always return null, since it is just used for cleanup
operations when object is actually relaimed). see more information [2,3]

Weakrefs are always processed in GC. For generational GC, weakref feature
can be implemented easily: when minor collection occurs, weak reference
objects are equeued and their referents are reclaimed. when major collection
occurs, soft reference objects are enqueued and their referents are
reclaimed. when finalizer are processed, phantom reference objects are
enqueued and their referents are reclaimed.

Unlike generational GC, Tick just simply treats all the references in the
heap as strong references, which make memory pressure become bigger for the
applications using much data structure based on weakrefs, such as
WeakHashMap. In this project, my job is to implement this weakref feature
for Tick, and this work should be independent to the concurrent algorithms
used in Tick.

For concurrent GC, weakref processing is different with generational GC,
because the tracing thread is running concurrently with application threads.
Application threads may invoke the get() method to get the reference of
referent object, it may:
1, put it to the thread local stack as local var or calling parameter, such
as
    Object localvar = weakref.get();
    functionA(weakref.get());
2, assign it to another object slot, such as
    obj.field = weakref.get();
these operations will change the reachability of referent object, which may
cause the inconsistency issues in concurrent GC. In order to deal with it,
what we should do is to intercept these operations while application thread
is running. Because all these operations use the get() method to get the
reference of referent object, inserting the read barrier in the get() is a
simple and effective approach. The main job of the read barrier is to add
the pointer of referent object to a remembered set, then the garbage
collection thread will pick it up and trace from it.

I divided the work into 3 major steps:

The first step is to implement the read barrier for get() method of
reference object. My approch is implementing the barrier using a VMHelper
method, which is a VMMagic based helper function writen in Java. That is,
using Java to write some hot funtions in JVM, then these Java methods can be
compilered and even inlined into the native code produced by JIT compiler
[4]. This is very useful approach to reduce the read barrier’s overhead. In
this step, I will write a Java version read barrier and add some code to the
JIT compiler (Jitrino and JET) to make it compile the barrier correctly and
effiently.

The second step is to implement the weakref processing module in Tick. This
module will collect all the weakrefs while concurrent marking and put them
to different queues depending on its reference type. When concurrent marking
finishes, GC threads pick up the reference object in the queues to see if
the referent object is marked, if so, the referent object is live and should
not be processed, otherwise, process the referent object according the type
of the reference object respectively. moreover, if the reference object is
intercepted by the read barrier in the get() method, its referent object
should be marked while tracing, so the consistency is well maintained.

After this step, I will test Tick with benchmark using weakrefs, or using
data structure based on the weakrefs (such as WeakHashMap). The experiment
result will be analyzed and compared to Tick without weakrefs supporting and
generational GC with weakrefs supporting. Then I will form the result data
to a document report.

The last step is trying to reduce the floating garabge [5] for concurrent
GC. This work is not related to weakrefs feature, but its effect may be
similar to it, since the major goal of reducing floating garbage is to ease
the memory pressure while using concurrent GC. Floating garbage always exist
and may not be totally eliminated in concurrent GC, my job is trying to
optimize its impact as much as possible.

*Additional Information:*

Personal Information:
  I am a master canidate of Computer Science in Parallel Processing
Institution, Fudan University. My interested area includes Compiler,
Computer Architecure, Managed Runtime etc. I started to follow Harmony with
interests 2 year ago, and I used to contribute to DRLVM and have submited
some patches to the community. As an intern, I improved Harmony concurrent
GC by using state-based phase design and implementing an efficient scheduler
in the summer of 2008, the patch of this work has been merged to the source
trunk [6].

Schedule:

  I can work at least 3 days per week for this work, my schedule this
project as follow:
  April 10 ~ April 26 Communicate with the mentor and design the interface
of the implementation, such as, the native interface of get() method, weak
reference processing interfaces in GC module. Write down the design
document.

  April 27 ~ May 24 implementing read barrier for get() method of reference
object.

  May 25 ~ June 30 implementing the weak and soft reference processing
features in GC module, add finalizer feature to Tick

  July 1 ~ July 15 implementing phantom reference processing features.

  July 16 ~ Aug 1 test and analyze Tick with weak reference related
benchmark. write document for implementing and testing.

  Aug 1 ~ Aug 15 try to improve Tick by reducing floating garbage.

*References:*

[1]
http://xiao-feng.blogspot.com/2007/03/comparison-between-snapshot-based-gc.html
[2] http://www.realjenius.com/node/377
[3] http://www.pawlan.com/Monica/refobjs/
[4]
http://mail-archives.apache.org/mod_mbox/harmony-dev/200505.mbox/%3C4293E6BD.1060303@anu.edu.au%3E
[5] http://www.memorymanagement.org/glossary/f.html#floating.garbage
[6] http://issues.apache.org/jira/browse/HARMONY-5989

-- 
>From : Simon.Zhou@PPI, Fudan University

Re: [GSoC] Proposal of GC-1: implement WeakReference support in Harmony concurrent GC

Posted by Xiao-Feng Li <xi...@gmail.com>.
Simon,

Probably you can have a STW weakref/finalization version at first,
then check if you want to support concurrent processing. This would be
easier.

For thread suspension, please check:
vm_suspend_all_threads and vm_resume_all_threads in
gc_gen/src/common/gc_platform.h

Thanks,
xiaofeng

On Wed, Apr 15, 2009 at 1:27 PM, Simon Zhou <si...@gmail.com> wrote:
> Thank you Xiao-feng It is very helpful!
>
> I am thinking of take 1~5 as my step one.
>
> In step 2, I will implement a STW version of finalizer processing.
> IMHO, maybe I need implement a new interface of VMCore for suspending all
> java threads, now it only has a interface for suspending and enumerating
> rootset, but I do not need rootset in this case, maybe a new interface just
> for suspending is more efficient, I guess.
> then I will do some experimental study on the finalization to get the impact
> on the pause time.
>
> Thanks
> Simon
>
> 2009/4/14 Xiao-Feng Li <xi...@gmail.com>
>
>> Simon, here is my understanding.
>>
>> GC has following things with weakref in SATB:
>> 1. to stop tracing when meeting a weakref object, and remember it in a
>> GC-weakref-queue for later processing;
>> 2. to catch the referent of a weakref get(), and remember it as a
>> dirty object (or a normal reachable object);
>> 3. During marking, the remembered dirty objects should be traced as roots.
>>
>> 4. When marking finishes, GC need to scan the GC-weakref-queue to find
>> unreachable referents, etc. Those weakref's referent field is cleared
>> and passed to VM-weakref-queue for further processing. (VM processes
>> them by executing enqueue() method of them.)
>> 5. to sweep.
>>
>> Step 4 must be executed after marking finishes, but step 4 can be
>> executed concurrently with mutators, and the write barrier can be
>> turned off.
>>
>> Note, the above doesn't include finalizer processing. Finalization
>> process needs to trace the finalizable objects as normal scanning,
>> which might have race condition with mutators. So write barrier should
>> be kept for it.
>>
>> So in my opinion, step 4 can be conducted in STW minor with finalizer
>> processing. Then it will be very similar to what Harmony GC already
>> has.
>>
>> For step 1, check scan_weak_reference()'s call sites and implementation;
>> For step 4, check collector_identify_finref()'s call sites and
>> implementation.
>>
>> Thanks,
>> xiaofeng
>>
>> On Tue, Apr 14, 2009 at 10:33 AM, Simon Zhou <si...@gmail.com>
>> wrote:
>> > After some investigations, I am wondering when to processing WeakRef for
>> > concurrent GC. my understanding is as follows
>> >
>> > IMHO, we can process them, also concurrently with mutators, when the
>> > concurrent marking finishes (for 2 SATB alogorithms). the advantage is
>> that,
>> > the data structure is easy to implement, I guess.
>> >
>> > Another choice is to process them along with the marking phase, that is,
>> we
>> > should construct sync queues for 3 kinds of reference object, the read
>> > barrier will enqueue and GC threads will dequeue. The WeakRef processing
>> > will finishes with the concurrent marking phase.
>> > For mostly concurrent algorithm, we can just process it in the short STW
>> > phase.
>> > Would you like to give some comments on this? Thank you very much!
>> >
>> > Thanks
>> > Simon
>> >
>> > 2009/4/1 Xiao-Feng Li <xi...@gmail.com>
>> >
>> >> Simon, I've got some time reviewing your proposal. It looks very
>> >> feasible. Probably you'd better read my blog entry on WeakRef and
>> >> Finalizer processing of Harmony [1].
>> >>
>> >> One question, will you implement weakref support for all the three
>> >> concurrent GC algorithm?
>> >>
>> >> Btw, please go ahead to submit/update your proposal to the GSoC
>> >> application web.
>> >>
>> >> [1]
>> >>
>> http://xiao-feng.blogspot.com/2007/05/weak-reference-processing-in-apache.html
>> >>
>> >> Thanks,
>> >> xiaofeng
>> >>
>> >> On Tue, Mar 31, 2009 at 6:06 PM, Simon Zhou <si...@gmail.com>
>> >> wrote:
>> >>  > Hi All,
>> >> > Here is my proposal, any suggestion is welcome!
>> >> >
>> >> > *Title/Summary:* GC-1: Implement WeakReference support in Harmony
>> >> Concurent
>> >> > GC
>> >> >
>> >> > *Student:* Simon.Zhou
>> >> >
>> >> > *Student e-mail:* simon.harmony@gmail.com
>> >> >
>> >> > *Student Major:* Computer Science
>> >> >
>> >> > *Student Degree:* Master
>> >> >
>> >> > *Student Graduation:* 2009.7
>> >> >
>> >> > *Organization:* Fudan University
>> >> >
>> >> > *Assigned Mentor:*
>> >> >
>> >> > *Abstract:*
>> >> >
>> >> > weak reference allows a program to maintain special reference to
>> objects,
>> >> > which is a useful API for the program interacting with the garbage
>> >> > collector to some extent. Apache Harmony has a concurrent garbage
>> >> collector
>> >> > (code name: Tick) which can perform the garbage collection without
>> >> > suspending the application threads except for root set enumeration.
>> Tick
>> >> do
>> >> > not have weak reference supporting now, it treats all the references
>> >> object
>> >> > just as  strong references, which may cause inefficiency for
>> applications
>> >> > using weak references. In this project, I will add the weak reference
>> >> > support for Tick, which would be different from the implementation in
>> >> > generational GC, because the consistency should be maintained for the
>> >> heap
>> >> > objects carefully. Read barrier of the get() method of reference
>> object
>> >> will
>> >> > be used for this implementation, and the performance issue will be
>> >> seriously
>> >> > considered. before implement phantom reference, I will also implement
>> the
>> >> > finalizer feature for Tick, although it is not recommanded to be used
>> >> except
>> >> > it is necessary. Besides this, I am trying to reduce the floating
>> garbage
>> >> > for Tick’s concurrent algorithms.
>> >> >
>> >> > *Detailed Description:*
>> >> >
>> >> > Tick is a concurrent GC of Harmony DRLVM, it can do the garbage
>> >> collection
>> >> > without suspending application threads. For concurrent GC, the most
>> >> > important issue that should be dealt with is the inconsistency
>> problem,
>> >> that
>> >> > is if we trace the heap object graph while the mutator updating the
>> same
>> >> > graph, inconsistency problem occurs, some live objects may be mistaken
>> >> > relaimed as garbage. In order to keep the consistency of objects,
>> write
>> >> > barriers is inserted when concurrent tracing starts, which intercept
>> all
>> >> the
>> >> > write operations of object reference and record useful data for
>> >> rescaning.
>> >> > Now there are 3 different concurrent collection algorithms in Tick:
>> DLG,
>> >> > Sliding-View, Mostly-Concurrent; and 3 different write barriers are
>> used
>> >> > repectively. see more information [1].
>> >> >
>> >> > Experiment result shows that Tick can significantly reduce or even
>> >> eliminate
>> >> > the pause time caused by garbage collection. However, one drawback of
>> >> Tick
>> >> > is that, it do not support weak reference mechanism, which is a
>> important
>> >> > API of Java programming language.
>> >> >
>> >> > Weak reference is an useful tool for whom are much care about the
>> >> > application memory usage, e.g. it can help developers to cache some
>> >> > temporary large objects for later reusing; also, to do some cleanup
>> >> > operations just when a object is acturally relaimed. There are 3 kinds
>> of
>> >> > weak reference class in Java language (the weakness is decreasingly):
>> >> soft
>> >> > reference, weak reference, phantom reference. Soft reference is
>> usually
>> >> used
>> >> > to maintain better cache for large object, such as pictures load from
>> >> web;
>> >> > weak reference can help to reduce the frequency of creating short-live
>> >> > objects; phantom reference can usully help programmer to do cleanup
>> >> > operations just when an object is physically relaimed. (from now on, I
>> >> will
>> >> > use ‘weakref’ to represent the set of all the three kinds of weak
>> >> > references) All the three kinds of reference classes have a get()
>> method,
>> >> if
>> >> > the object has not been actually relaimed, the get() method will
>> return a
>> >> > reference of the referent object (referent object is the object which
>> >> > reference object point to) except phantom reference (phantom
>> reference’s
>> >> > get() method always return null, since it is just used for cleanup
>> >> > operations when object is actually relaimed). see more information
>> [2,3]
>> >> >
>> >> > Weakrefs are always processed in GC. For generational GC, weakref
>> feature
>> >> > can be implemented easily: when minor collection occurs, weak
>> reference
>> >> > objects are equeued and their referents are reclaimed. when major
>> >> collection
>> >> > occurs, soft reference objects are enqueued and their referents are
>> >> > reclaimed. when finalizer are processed, phantom reference objects are
>> >> > enqueued and their referents are reclaimed.
>> >> >
>> >> > Unlike generational GC, Tick just simply treats all the references in
>> the
>> >> > heap as strong references, which make memory pressure become bigger
>> for
>> >> the
>> >> > applications using much data structure based on weakrefs, such as
>> >> > WeakHashMap. In this project, my job is to implement this weakref
>> feature
>> >> > for Tick, and this work should be independent to the concurrent
>> >> algorithms
>> >> > used in Tick.
>> >> >
>> >> > For concurrent GC, weakref processing is different with generational
>> GC,
>> >> > because the tracing thread is running concurrently with application
>> >> threads.
>> >> > Application threads may invoke the get() method to get the reference
>> of
>> >> > referent object, it may:
>> >> > 1, put it to the thread local stack as local var or calling parameter,
>> >> such
>> >> > as
>> >> >    Object localvar = weakref.get();
>> >> >    functionA(weakref.get());
>> >> > 2, assign it to another object slot, such as
>> >> >    obj.field = weakref.get();
>> >> > these operations will change the reachability of referent object,
>> which
>> >> may
>> >> > cause the inconsistency issues in concurrent GC. In order to deal with
>> >> it,
>> >> > what we should do is to intercept these operations while application
>> >> thread
>> >> > is running. Because all these operations use the get() method to get
>> the
>> >> > reference of referent object, inserting the read barrier in the get()
>> is
>> >> a
>> >> > simple and effective approach. The main job of the read barrier is to
>> add
>> >> > the pointer of referent object to a remembered set, then the garbage
>> >> > collection thread will pick it up and trace from it.
>> >> >
>> >> > I divided the work into 3 major steps:
>> >> >
>> >> > The first step is to implement the read barrier for get() method of
>> >> > reference object. My approch is implementing the barrier using a
>> VMHelper
>> >> > method, which is a VMMagic based helper function writen in Java. That
>> is,
>> >> > using Java to write some hot funtions in JVM, then these Java methods
>> can
>> >> be
>> >> > compilered and even inlined into the native code produced by JIT
>> compiler
>> >> > [4]. This is very useful approach to reduce the read barrier’s
>> overhead.
>> >> In
>> >> > this step, I will write a Java version read barrier and add some code
>> to
>> >> the
>> >> > JIT compiler (Jitrino and JET) to make it compile the barrier
>> correctly
>> >> and
>> >> > effiently.
>> >> >
>> >> > The second step is to implement the weakref processing module in Tick.
>> >> This
>> >> > module will collect all the weakrefs while concurrent marking and put
>> >> them
>> >> > to different queues depending on its reference type. When concurrent
>> >> marking
>> >> > finishes, GC threads pick up the reference object in the queues to see
>> if
>> >> > the referent object is marked, if so, the referent object is live and
>> >> should
>> >> > not be processed, otherwise, process the referent object according the
>> >> type
>> >> > of the reference object respectively. moreover, if the reference
>> object
>> >> is
>> >> > intercepted by the read barrier in the get() method, its referent
>> object
>> >> > should be marked while tracing, so the consistency is well maintained.
>> >> >
>> >> > After this step, I will test Tick with benchmark using weakrefs, or
>> using
>> >> > data structure based on the weakrefs (such as WeakHashMap). The
>> >> experiment
>> >> > result will be analyzed and compared to Tick without weakrefs
>> supporting
>> >> and
>> >> > generational GC with weakrefs supporting. Then I will form the result
>> >> data
>> >> > to a document report.
>> >> >
>> >> > The last step is trying to reduce the floating garabge [5] for
>> concurrent
>> >> > GC. This work is not related to weakrefs feature, but its effect may
>> be
>> >> > similar to it, since the major goal of reducing floating garbage is to
>> >> ease
>> >> > the memory pressure while using concurrent GC. Floating garbage always
>> >> exist
>> >> > and may not be totally eliminated in concurrent GC, my job is trying
>> to
>> >> > optimize its impact as much as possible.
>> >> >
>> >> > *Additional Information:*
>> >> >
>> >> > Personal Information:
>> >> >  I am a master canidate of Computer Science in Parallel Processing
>> >> > Institution, Fudan University. My interested area includes Compiler,
>> >> > Computer Architecure, Managed Runtime etc. I started to follow Harmony
>> >> with
>> >> > interests 2 year ago, and I used to contribute to DRLVM and have
>> submited
>> >> > some patches to the community. As an intern, I improved Harmony
>> >> concurrent
>> >> > GC by using state-based phase design and implementing an efficient
>> >> scheduler
>> >> > in the summer of 2008, the patch of this work has been merged to the
>> >> source
>> >> > trunk [6].
>> >> >
>> >> > Schedule:
>> >> >
>> >> >  I can work at least 3 days per week for this work, my schedule this
>> >> > project as follow:
>> >> >  April 10 ~ April 26 Communicate with the mentor and design the
>> interface
>> >> > of the implementation, such as, the native interface of get() method,
>> >> weak
>> >> > reference processing interfaces in GC module. Write down the design
>> >> > document.
>> >> >
>> >> >  April 27 ~ May 24 implementing read barrier for get() method of
>> >> reference
>> >> > object.
>> >> >
>> >> >  May 25 ~ June 30 implementing the weak and soft reference processing
>> >> > features in GC module, add finalizer feature to Tick
>> >> >
>> >> >  July 1 ~ July 15 implementing phantom reference processing features.
>> >> >
>> >> >  July 16 ~ Aug 1 test and analyze Tick with weak reference related
>> >> > benchmark. write document for implementing and testing.
>> >> >
>> >> >  Aug 1 ~ Aug 15 try to improve Tick by reducing floating garbage.
>> >> >
>> >> > *References:*
>> >> >
>> >> > [1]
>> >> >
>> >>
>> http://xiao-feng.blogspot.com/2007/03/comparison-between-snapshot-based-gc.html
>> >> > [2] http://www.realjenius.com/node/377
>> >> > [3] http://www.pawlan.com/Monica/refobjs/
>> >> > [4]
>> >> >
>> >>
>> http://mail-archives.apache.org/mod_mbox/harmony-dev/200505.mbox/%3C4293E6BD.1060303@anu.edu.au%3E
>> >> > [5] http://www.memorymanagement.org/glossary/f.html#floating.garbage
>> >> > [6] http://issues.apache.org/jira/browse/HARMONY-5989
>> >> >
>> >> > --
>> >> > From : Simon.Zhou@PPI, Fudan University
>> >> >
>> >>
>> >>
>> >>
>> >>  --
>> >> http://people.apache.org/~xli <http://people.apache.org/%7Exli>
>> >>
>> >
>> >
>> >
>> > --
>> > From : Simon.Zhou@PPI, Fudan University
>> >
>>
>>
>>
>> --
>> http://people.apache.org/~xli <http://people.apache.org/%7Exli>
>>
>
>
>
> --
> From : Simon.Zhou@PPI, Fudan University
>



-- 
http://people.apache.org/~xli

Re: [GSoC] Proposal of GC-1: implement WeakReference support in Harmony concurrent GC

Posted by Simon Zhou <si...@gmail.com>.
Thank you Xiao-feng It is very helpful!

I am thinking of take 1~5 as my step one.

In step 2, I will implement a STW version of finalizer processing.
IMHO, maybe I need implement a new interface of VMCore for suspending all
java threads, now it only has a interface for suspending and enumerating
rootset, but I do not need rootset in this case, maybe a new interface just
for suspending is more efficient, I guess.
then I will do some experimental study on the finalization to get the impact
on the pause time.

Thanks
Simon

2009/4/14 Xiao-Feng Li <xi...@gmail.com>

> Simon, here is my understanding.
>
> GC has following things with weakref in SATB:
> 1. to stop tracing when meeting a weakref object, and remember it in a
> GC-weakref-queue for later processing;
> 2. to catch the referent of a weakref get(), and remember it as a
> dirty object (or a normal reachable object);
> 3. During marking, the remembered dirty objects should be traced as roots.
>
> 4. When marking finishes, GC need to scan the GC-weakref-queue to find
> unreachable referents, etc. Those weakref's referent field is cleared
> and passed to VM-weakref-queue for further processing. (VM processes
> them by executing enqueue() method of them.)
> 5. to sweep.
>
> Step 4 must be executed after marking finishes, but step 4 can be
> executed concurrently with mutators, and the write barrier can be
> turned off.
>
> Note, the above doesn't include finalizer processing. Finalization
> process needs to trace the finalizable objects as normal scanning,
> which might have race condition with mutators. So write barrier should
> be kept for it.
>
> So in my opinion, step 4 can be conducted in STW minor with finalizer
> processing. Then it will be very similar to what Harmony GC already
> has.
>
> For step 1, check scan_weak_reference()'s call sites and implementation;
> For step 4, check collector_identify_finref()'s call sites and
> implementation.
>
> Thanks,
> xiaofeng
>
> On Tue, Apr 14, 2009 at 10:33 AM, Simon Zhou <si...@gmail.com>
> wrote:
> > After some investigations, I am wondering when to processing WeakRef for
> > concurrent GC. my understanding is as follows
> >
> > IMHO, we can process them, also concurrently with mutators, when the
> > concurrent marking finishes (for 2 SATB alogorithms). the advantage is
> that,
> > the data structure is easy to implement, I guess.
> >
> > Another choice is to process them along with the marking phase, that is,
> we
> > should construct sync queues for 3 kinds of reference object, the read
> > barrier will enqueue and GC threads will dequeue. The WeakRef processing
> > will finishes with the concurrent marking phase.
> > For mostly concurrent algorithm, we can just process it in the short STW
> > phase.
> > Would you like to give some comments on this? Thank you very much!
> >
> > Thanks
> > Simon
> >
> > 2009/4/1 Xiao-Feng Li <xi...@gmail.com>
> >
> >> Simon, I've got some time reviewing your proposal. It looks very
> >> feasible. Probably you'd better read my blog entry on WeakRef and
> >> Finalizer processing of Harmony [1].
> >>
> >> One question, will you implement weakref support for all the three
> >> concurrent GC algorithm?
> >>
> >> Btw, please go ahead to submit/update your proposal to the GSoC
> >> application web.
> >>
> >> [1]
> >>
> http://xiao-feng.blogspot.com/2007/05/weak-reference-processing-in-apache.html
> >>
> >> Thanks,
> >> xiaofeng
> >>
> >> On Tue, Mar 31, 2009 at 6:06 PM, Simon Zhou <si...@gmail.com>
> >> wrote:
> >>  > Hi All,
> >> > Here is my proposal, any suggestion is welcome!
> >> >
> >> > *Title/Summary:* GC-1: Implement WeakReference support in Harmony
> >> Concurent
> >> > GC
> >> >
> >> > *Student:* Simon.Zhou
> >> >
> >> > *Student e-mail:* simon.harmony@gmail.com
> >> >
> >> > *Student Major:* Computer Science
> >> >
> >> > *Student Degree:* Master
> >> >
> >> > *Student Graduation:* 2009.7
> >> >
> >> > *Organization:* Fudan University
> >> >
> >> > *Assigned Mentor:*
> >> >
> >> > *Abstract:*
> >> >
> >> > weak reference allows a program to maintain special reference to
> objects,
> >> > which is a useful API for the program interacting with the garbage
> >> > collector to some extent. Apache Harmony has a concurrent garbage
> >> collector
> >> > (code name: Tick) which can perform the garbage collection without
> >> > suspending the application threads except for root set enumeration.
> Tick
> >> do
> >> > not have weak reference supporting now, it treats all the references
> >> object
> >> > just as  strong references, which may cause inefficiency for
> applications
> >> > using weak references. In this project, I will add the weak reference
> >> > support for Tick, which would be different from the implementation in
> >> > generational GC, because the consistency should be maintained for the
> >> heap
> >> > objects carefully. Read barrier of the get() method of reference
> object
> >> will
> >> > be used for this implementation, and the performance issue will be
> >> seriously
> >> > considered. before implement phantom reference, I will also implement
> the
> >> > finalizer feature for Tick, although it is not recommanded to be used
> >> except
> >> > it is necessary. Besides this, I am trying to reduce the floating
> garbage
> >> > for Tick’s concurrent algorithms.
> >> >
> >> > *Detailed Description:*
> >> >
> >> > Tick is a concurrent GC of Harmony DRLVM, it can do the garbage
> >> collection
> >> > without suspending application threads. For concurrent GC, the most
> >> > important issue that should be dealt with is the inconsistency
> problem,
> >> that
> >> > is if we trace the heap object graph while the mutator updating the
> same
> >> > graph, inconsistency problem occurs, some live objects may be mistaken
> >> > relaimed as garbage. In order to keep the consistency of objects,
> write
> >> > barriers is inserted when concurrent tracing starts, which intercept
> all
> >> the
> >> > write operations of object reference and record useful data for
> >> rescaning.
> >> > Now there are 3 different concurrent collection algorithms in Tick:
> DLG,
> >> > Sliding-View, Mostly-Concurrent; and 3 different write barriers are
> used
> >> > repectively. see more information [1].
> >> >
> >> > Experiment result shows that Tick can significantly reduce or even
> >> eliminate
> >> > the pause time caused by garbage collection. However, one drawback of
> >> Tick
> >> > is that, it do not support weak reference mechanism, which is a
> important
> >> > API of Java programming language.
> >> >
> >> > Weak reference is an useful tool for whom are much care about the
> >> > application memory usage, e.g. it can help developers to cache some
> >> > temporary large objects for later reusing; also, to do some cleanup
> >> > operations just when a object is acturally relaimed. There are 3 kinds
> of
> >> > weak reference class in Java language (the weakness is decreasingly):
> >> soft
> >> > reference, weak reference, phantom reference. Soft reference is
> usually
> >> used
> >> > to maintain better cache for large object, such as pictures load from
> >> web;
> >> > weak reference can help to reduce the frequency of creating short-live
> >> > objects; phantom reference can usully help programmer to do cleanup
> >> > operations just when an object is physically relaimed. (from now on, I
> >> will
> >> > use ‘weakref’ to represent the set of all the three kinds of weak
> >> > references) All the three kinds of reference classes have a get()
> method,
> >> if
> >> > the object has not been actually relaimed, the get() method will
> return a
> >> > reference of the referent object (referent object is the object which
> >> > reference object point to) except phantom reference (phantom
> reference’s
> >> > get() method always return null, since it is just used for cleanup
> >> > operations when object is actually relaimed). see more information
> [2,3]
> >> >
> >> > Weakrefs are always processed in GC. For generational GC, weakref
> feature
> >> > can be implemented easily: when minor collection occurs, weak
> reference
> >> > objects are equeued and their referents are reclaimed. when major
> >> collection
> >> > occurs, soft reference objects are enqueued and their referents are
> >> > reclaimed. when finalizer are processed, phantom reference objects are
> >> > enqueued and their referents are reclaimed.
> >> >
> >> > Unlike generational GC, Tick just simply treats all the references in
> the
> >> > heap as strong references, which make memory pressure become bigger
> for
> >> the
> >> > applications using much data structure based on weakrefs, such as
> >> > WeakHashMap. In this project, my job is to implement this weakref
> feature
> >> > for Tick, and this work should be independent to the concurrent
> >> algorithms
> >> > used in Tick.
> >> >
> >> > For concurrent GC, weakref processing is different with generational
> GC,
> >> > because the tracing thread is running concurrently with application
> >> threads.
> >> > Application threads may invoke the get() method to get the reference
> of
> >> > referent object, it may:
> >> > 1, put it to the thread local stack as local var or calling parameter,
> >> such
> >> > as
> >> >    Object localvar = weakref.get();
> >> >    functionA(weakref.get());
> >> > 2, assign it to another object slot, such as
> >> >    obj.field = weakref.get();
> >> > these operations will change the reachability of referent object,
> which
> >> may
> >> > cause the inconsistency issues in concurrent GC. In order to deal with
> >> it,
> >> > what we should do is to intercept these operations while application
> >> thread
> >> > is running. Because all these operations use the get() method to get
> the
> >> > reference of referent object, inserting the read barrier in the get()
> is
> >> a
> >> > simple and effective approach. The main job of the read barrier is to
> add
> >> > the pointer of referent object to a remembered set, then the garbage
> >> > collection thread will pick it up and trace from it.
> >> >
> >> > I divided the work into 3 major steps:
> >> >
> >> > The first step is to implement the read barrier for get() method of
> >> > reference object. My approch is implementing the barrier using a
> VMHelper
> >> > method, which is a VMMagic based helper function writen in Java. That
> is,
> >> > using Java to write some hot funtions in JVM, then these Java methods
> can
> >> be
> >> > compilered and even inlined into the native code produced by JIT
> compiler
> >> > [4]. This is very useful approach to reduce the read barrier’s
> overhead.
> >> In
> >> > this step, I will write a Java version read barrier and add some code
> to
> >> the
> >> > JIT compiler (Jitrino and JET) to make it compile the barrier
> correctly
> >> and
> >> > effiently.
> >> >
> >> > The second step is to implement the weakref processing module in Tick.
> >> This
> >> > module will collect all the weakrefs while concurrent marking and put
> >> them
> >> > to different queues depending on its reference type. When concurrent
> >> marking
> >> > finishes, GC threads pick up the reference object in the queues to see
> if
> >> > the referent object is marked, if so, the referent object is live and
> >> should
> >> > not be processed, otherwise, process the referent object according the
> >> type
> >> > of the reference object respectively. moreover, if the reference
> object
> >> is
> >> > intercepted by the read barrier in the get() method, its referent
> object
> >> > should be marked while tracing, so the consistency is well maintained.
> >> >
> >> > After this step, I will test Tick with benchmark using weakrefs, or
> using
> >> > data structure based on the weakrefs (such as WeakHashMap). The
> >> experiment
> >> > result will be analyzed and compared to Tick without weakrefs
> supporting
> >> and
> >> > generational GC with weakrefs supporting. Then I will form the result
> >> data
> >> > to a document report.
> >> >
> >> > The last step is trying to reduce the floating garabge [5] for
> concurrent
> >> > GC. This work is not related to weakrefs feature, but its effect may
> be
> >> > similar to it, since the major goal of reducing floating garbage is to
> >> ease
> >> > the memory pressure while using concurrent GC. Floating garbage always
> >> exist
> >> > and may not be totally eliminated in concurrent GC, my job is trying
> to
> >> > optimize its impact as much as possible.
> >> >
> >> > *Additional Information:*
> >> >
> >> > Personal Information:
> >> >  I am a master canidate of Computer Science in Parallel Processing
> >> > Institution, Fudan University. My interested area includes Compiler,
> >> > Computer Architecure, Managed Runtime etc. I started to follow Harmony
> >> with
> >> > interests 2 year ago, and I used to contribute to DRLVM and have
> submited
> >> > some patches to the community. As an intern, I improved Harmony
> >> concurrent
> >> > GC by using state-based phase design and implementing an efficient
> >> scheduler
> >> > in the summer of 2008, the patch of this work has been merged to the
> >> source
> >> > trunk [6].
> >> >
> >> > Schedule:
> >> >
> >> >  I can work at least 3 days per week for this work, my schedule this
> >> > project as follow:
> >> >  April 10 ~ April 26 Communicate with the mentor and design the
> interface
> >> > of the implementation, such as, the native interface of get() method,
> >> weak
> >> > reference processing interfaces in GC module. Write down the design
> >> > document.
> >> >
> >> >  April 27 ~ May 24 implementing read barrier for get() method of
> >> reference
> >> > object.
> >> >
> >> >  May 25 ~ June 30 implementing the weak and soft reference processing
> >> > features in GC module, add finalizer feature to Tick
> >> >
> >> >  July 1 ~ July 15 implementing phantom reference processing features.
> >> >
> >> >  July 16 ~ Aug 1 test and analyze Tick with weak reference related
> >> > benchmark. write document for implementing and testing.
> >> >
> >> >  Aug 1 ~ Aug 15 try to improve Tick by reducing floating garbage.
> >> >
> >> > *References:*
> >> >
> >> > [1]
> >> >
> >>
> http://xiao-feng.blogspot.com/2007/03/comparison-between-snapshot-based-gc.html
> >> > [2] http://www.realjenius.com/node/377
> >> > [3] http://www.pawlan.com/Monica/refobjs/
> >> > [4]
> >> >
> >>
> http://mail-archives.apache.org/mod_mbox/harmony-dev/200505.mbox/%3C4293E6BD.1060303@anu.edu.au%3E
> >> > [5] http://www.memorymanagement.org/glossary/f.html#floating.garbage
> >> > [6] http://issues.apache.org/jira/browse/HARMONY-5989
> >> >
> >> > --
> >> > From : Simon.Zhou@PPI, Fudan University
> >> >
> >>
> >>
> >>
> >>  --
> >> http://people.apache.org/~xli <http://people.apache.org/%7Exli>
> >>
> >
> >
> >
> > --
> > From : Simon.Zhou@PPI, Fudan University
> >
>
>
>
> --
> http://people.apache.org/~xli <http://people.apache.org/%7Exli>
>



-- 
>From : Simon.Zhou@PPI, Fudan University

Re: [GSoC] Proposal of GC-1: implement WeakReference support in Harmony concurrent GC

Posted by Xiao-Feng Li <xi...@gmail.com>.
Simon, here is my understanding.

GC has following things with weakref in SATB:
1. to stop tracing when meeting a weakref object, and remember it in a
GC-weakref-queue for later processing;
2. to catch the referent of a weakref get(), and remember it as a
dirty object (or a normal reachable object);
3. During marking, the remembered dirty objects should be traced as roots.

4. When marking finishes, GC need to scan the GC-weakref-queue to find
unreachable referents, etc. Those weakref's referent field is cleared
and passed to VM-weakref-queue for further processing. (VM processes
them by executing enqueue() method of them.)
5. to sweep.

Step 4 must be executed after marking finishes, but step 4 can be
executed concurrently with mutators, and the write barrier can be
turned off.

Note, the above doesn't include finalizer processing. Finalization
process needs to trace the finalizable objects as normal scanning,
which might have race condition with mutators. So write barrier should
be kept for it.

So in my opinion, step 4 can be conducted in STW minor with finalizer
processing. Then it will be very similar to what Harmony GC already
has.

For step 1, check scan_weak_reference()'s call sites and implementation;
For step 4, check collector_identify_finref()'s call sites and implementation.

Thanks,
xiaofeng

On Tue, Apr 14, 2009 at 10:33 AM, Simon Zhou <si...@gmail.com> wrote:
> After some investigations, I am wondering when to processing WeakRef for
> concurrent GC. my understanding is as follows
>
> IMHO, we can process them, also concurrently with mutators, when the
> concurrent marking finishes (for 2 SATB alogorithms). the advantage is that,
> the data structure is easy to implement, I guess.
>
> Another choice is to process them along with the marking phase, that is, we
> should construct sync queues for 3 kinds of reference object, the read
> barrier will enqueue and GC threads will dequeue. The WeakRef processing
> will finishes with the concurrent marking phase.
> For mostly concurrent algorithm, we can just process it in the short STW
> phase.
> Would you like to give some comments on this? Thank you very much!
>
> Thanks
> Simon
>
> 2009/4/1 Xiao-Feng Li <xi...@gmail.com>
>
>> Simon, I've got some time reviewing your proposal. It looks very
>> feasible. Probably you'd better read my blog entry on WeakRef and
>> Finalizer processing of Harmony [1].
>>
>> One question, will you implement weakref support for all the three
>> concurrent GC algorithm?
>>
>> Btw, please go ahead to submit/update your proposal to the GSoC
>> application web.
>>
>> [1]
>> http://xiao-feng.blogspot.com/2007/05/weak-reference-processing-in-apache.html
>>
>> Thanks,
>> xiaofeng
>>
>> On Tue, Mar 31, 2009 at 6:06 PM, Simon Zhou <si...@gmail.com>
>> wrote:
>>  > Hi All,
>> > Here is my proposal, any suggestion is welcome!
>> >
>> > *Title/Summary:* GC-1: Implement WeakReference support in Harmony
>> Concurent
>> > GC
>> >
>> > *Student:* Simon.Zhou
>> >
>> > *Student e-mail:* simon.harmony@gmail.com
>> >
>> > *Student Major:* Computer Science
>> >
>> > *Student Degree:* Master
>> >
>> > *Student Graduation:* 2009.7
>> >
>> > *Organization:* Fudan University
>> >
>> > *Assigned Mentor:*
>> >
>> > *Abstract:*
>> >
>> > weak reference allows a program to maintain special reference to objects,
>> > which is a useful API for the program interacting with the garbage
>> > collector to some extent. Apache Harmony has a concurrent garbage
>> collector
>> > (code name: Tick) which can perform the garbage collection without
>> > suspending the application threads except for root set enumeration. Tick
>> do
>> > not have weak reference supporting now, it treats all the references
>> object
>> > just as  strong references, which may cause inefficiency for applications
>> > using weak references. In this project, I will add the weak reference
>> > support for Tick, which would be different from the implementation in
>> > generational GC, because the consistency should be maintained for the
>> heap
>> > objects carefully. Read barrier of the get() method of reference object
>> will
>> > be used for this implementation, and the performance issue will be
>> seriously
>> > considered. before implement phantom reference, I will also implement the
>> > finalizer feature for Tick, although it is not recommanded to be used
>> except
>> > it is necessary. Besides this, I am trying to reduce the floating garbage
>> > for Tick’s concurrent algorithms.
>> >
>> > *Detailed Description:*
>> >
>> > Tick is a concurrent GC of Harmony DRLVM, it can do the garbage
>> collection
>> > without suspending application threads. For concurrent GC, the most
>> > important issue that should be dealt with is the inconsistency problem,
>> that
>> > is if we trace the heap object graph while the mutator updating the same
>> > graph, inconsistency problem occurs, some live objects may be mistaken
>> > relaimed as garbage. In order to keep the consistency of objects, write
>> > barriers is inserted when concurrent tracing starts, which intercept all
>> the
>> > write operations of object reference and record useful data for
>> rescaning.
>> > Now there are 3 different concurrent collection algorithms in Tick: DLG,
>> > Sliding-View, Mostly-Concurrent; and 3 different write barriers are used
>> > repectively. see more information [1].
>> >
>> > Experiment result shows that Tick can significantly reduce or even
>> eliminate
>> > the pause time caused by garbage collection. However, one drawback of
>> Tick
>> > is that, it do not support weak reference mechanism, which is a important
>> > API of Java programming language.
>> >
>> > Weak reference is an useful tool for whom are much care about the
>> > application memory usage, e.g. it can help developers to cache some
>> > temporary large objects for later reusing; also, to do some cleanup
>> > operations just when a object is acturally relaimed. There are 3 kinds of
>> > weak reference class in Java language (the weakness is decreasingly):
>> soft
>> > reference, weak reference, phantom reference. Soft reference is usually
>> used
>> > to maintain better cache for large object, such as pictures load from
>> web;
>> > weak reference can help to reduce the frequency of creating short-live
>> > objects; phantom reference can usully help programmer to do cleanup
>> > operations just when an object is physically relaimed. (from now on, I
>> will
>> > use ‘weakref’ to represent the set of all the three kinds of weak
>> > references) All the three kinds of reference classes have a get() method,
>> if
>> > the object has not been actually relaimed, the get() method will return a
>> > reference of the referent object (referent object is the object which
>> > reference object point to) except phantom reference (phantom reference’s
>> > get() method always return null, since it is just used for cleanup
>> > operations when object is actually relaimed). see more information [2,3]
>> >
>> > Weakrefs are always processed in GC. For generational GC, weakref feature
>> > can be implemented easily: when minor collection occurs, weak reference
>> > objects are equeued and their referents are reclaimed. when major
>> collection
>> > occurs, soft reference objects are enqueued and their referents are
>> > reclaimed. when finalizer are processed, phantom reference objects are
>> > enqueued and their referents are reclaimed.
>> >
>> > Unlike generational GC, Tick just simply treats all the references in the
>> > heap as strong references, which make memory pressure become bigger for
>> the
>> > applications using much data structure based on weakrefs, such as
>> > WeakHashMap. In this project, my job is to implement this weakref feature
>> > for Tick, and this work should be independent to the concurrent
>> algorithms
>> > used in Tick.
>> >
>> > For concurrent GC, weakref processing is different with generational GC,
>> > because the tracing thread is running concurrently with application
>> threads.
>> > Application threads may invoke the get() method to get the reference of
>> > referent object, it may:
>> > 1, put it to the thread local stack as local var or calling parameter,
>> such
>> > as
>> >    Object localvar = weakref.get();
>> >    functionA(weakref.get());
>> > 2, assign it to another object slot, such as
>> >    obj.field = weakref.get();
>> > these operations will change the reachability of referent object, which
>> may
>> > cause the inconsistency issues in concurrent GC. In order to deal with
>> it,
>> > what we should do is to intercept these operations while application
>> thread
>> > is running. Because all these operations use the get() method to get the
>> > reference of referent object, inserting the read barrier in the get() is
>> a
>> > simple and effective approach. The main job of the read barrier is to add
>> > the pointer of referent object to a remembered set, then the garbage
>> > collection thread will pick it up and trace from it.
>> >
>> > I divided the work into 3 major steps:
>> >
>> > The first step is to implement the read barrier for get() method of
>> > reference object. My approch is implementing the barrier using a VMHelper
>> > method, which is a VMMagic based helper function writen in Java. That is,
>> > using Java to write some hot funtions in JVM, then these Java methods can
>> be
>> > compilered and even inlined into the native code produced by JIT compiler
>> > [4]. This is very useful approach to reduce the read barrier’s overhead.
>> In
>> > this step, I will write a Java version read barrier and add some code to
>> the
>> > JIT compiler (Jitrino and JET) to make it compile the barrier correctly
>> and
>> > effiently.
>> >
>> > The second step is to implement the weakref processing module in Tick.
>> This
>> > module will collect all the weakrefs while concurrent marking and put
>> them
>> > to different queues depending on its reference type. When concurrent
>> marking
>> > finishes, GC threads pick up the reference object in the queues to see if
>> > the referent object is marked, if so, the referent object is live and
>> should
>> > not be processed, otherwise, process the referent object according the
>> type
>> > of the reference object respectively. moreover, if the reference object
>> is
>> > intercepted by the read barrier in the get() method, its referent object
>> > should be marked while tracing, so the consistency is well maintained.
>> >
>> > After this step, I will test Tick with benchmark using weakrefs, or using
>> > data structure based on the weakrefs (such as WeakHashMap). The
>> experiment
>> > result will be analyzed and compared to Tick without weakrefs supporting
>> and
>> > generational GC with weakrefs supporting. Then I will form the result
>> data
>> > to a document report.
>> >
>> > The last step is trying to reduce the floating garabge [5] for concurrent
>> > GC. This work is not related to weakrefs feature, but its effect may be
>> > similar to it, since the major goal of reducing floating garbage is to
>> ease
>> > the memory pressure while using concurrent GC. Floating garbage always
>> exist
>> > and may not be totally eliminated in concurrent GC, my job is trying to
>> > optimize its impact as much as possible.
>> >
>> > *Additional Information:*
>> >
>> > Personal Information:
>> >  I am a master canidate of Computer Science in Parallel Processing
>> > Institution, Fudan University. My interested area includes Compiler,
>> > Computer Architecure, Managed Runtime etc. I started to follow Harmony
>> with
>> > interests 2 year ago, and I used to contribute to DRLVM and have submited
>> > some patches to the community. As an intern, I improved Harmony
>> concurrent
>> > GC by using state-based phase design and implementing an efficient
>> scheduler
>> > in the summer of 2008, the patch of this work has been merged to the
>> source
>> > trunk [6].
>> >
>> > Schedule:
>> >
>> >  I can work at least 3 days per week for this work, my schedule this
>> > project as follow:
>> >  April 10 ~ April 26 Communicate with the mentor and design the interface
>> > of the implementation, such as, the native interface of get() method,
>> weak
>> > reference processing interfaces in GC module. Write down the design
>> > document.
>> >
>> >  April 27 ~ May 24 implementing read barrier for get() method of
>> reference
>> > object.
>> >
>> >  May 25 ~ June 30 implementing the weak and soft reference processing
>> > features in GC module, add finalizer feature to Tick
>> >
>> >  July 1 ~ July 15 implementing phantom reference processing features.
>> >
>> >  July 16 ~ Aug 1 test and analyze Tick with weak reference related
>> > benchmark. write document for implementing and testing.
>> >
>> >  Aug 1 ~ Aug 15 try to improve Tick by reducing floating garbage.
>> >
>> > *References:*
>> >
>> > [1]
>> >
>> http://xiao-feng.blogspot.com/2007/03/comparison-between-snapshot-based-gc.html
>> > [2] http://www.realjenius.com/node/377
>> > [3] http://www.pawlan.com/Monica/refobjs/
>> > [4]
>> >
>> http://mail-archives.apache.org/mod_mbox/harmony-dev/200505.mbox/%3C4293E6BD.1060303@anu.edu.au%3E
>> > [5] http://www.memorymanagement.org/glossary/f.html#floating.garbage
>> > [6] http://issues.apache.org/jira/browse/HARMONY-5989
>> >
>> > --
>> > From : Simon.Zhou@PPI, Fudan University
>> >
>>
>>
>>
>>  --
>> http://people.apache.org/~xli
>>
>
>
>
> --
> From : Simon.Zhou@PPI, Fudan University
>



-- 
http://people.apache.org/~xli

Re: [GSoC] Proposal of GC-1: implement WeakReference support in Harmony concurrent GC

Posted by Simon Zhou <si...@gmail.com>.
After some investigations, I am wondering when to processing WeakRef for
concurrent GC. my understanding is as follows

IMHO, we can process them, also concurrently with mutators, when the
concurrent marking finishes (for 2 SATB alogorithms). the advantage is that,
the data structure is easy to implement, I guess.

Another choice is to process them along with the marking phase, that is, we
should construct sync queues for 3 kinds of reference object, the read
barrier will enqueue and GC threads will dequeue. The WeakRef processing
will finishes with the concurrent marking phase.
For mostly concurrent algorithm, we can just process it in the short STW
phase.
Would you like to give some comments on this? Thank you very much!

Thanks
Simon

2009/4/1 Xiao-Feng Li <xi...@gmail.com>

> Simon, I've got some time reviewing your proposal. It looks very
> feasible. Probably you'd better read my blog entry on WeakRef and
> Finalizer processing of Harmony [1].
>
> One question, will you implement weakref support for all the three
> concurrent GC algorithm?
>
> Btw, please go ahead to submit/update your proposal to the GSoC
> application web.
>
> [1]
> http://xiao-feng.blogspot.com/2007/05/weak-reference-processing-in-apache.html
>
> Thanks,
> xiaofeng
>
> On Tue, Mar 31, 2009 at 6:06 PM, Simon Zhou <si...@gmail.com>
> wrote:
>  > Hi All,
> > Here is my proposal, any suggestion is welcome!
> >
> > *Title/Summary:* GC-1: Implement WeakReference support in Harmony
> Concurent
> > GC
> >
> > *Student:* Simon.Zhou
> >
> > *Student e-mail:* simon.harmony@gmail.com
> >
> > *Student Major:* Computer Science
> >
> > *Student Degree:* Master
> >
> > *Student Graduation:* 2009.7
> >
> > *Organization:* Fudan University
> >
> > *Assigned Mentor:*
> >
> > *Abstract:*
> >
> > weak reference allows a program to maintain special reference to objects,
> > which is a useful API for the program interacting with the garbage
> > collector to some extent. Apache Harmony has a concurrent garbage
> collector
> > (code name: Tick) which can perform the garbage collection without
> > suspending the application threads except for root set enumeration. Tick
> do
> > not have weak reference supporting now, it treats all the references
> object
> > just as  strong references, which may cause inefficiency for applications
> > using weak references. In this project, I will add the weak reference
> > support for Tick, which would be different from the implementation in
> > generational GC, because the consistency should be maintained for the
> heap
> > objects carefully. Read barrier of the get() method of reference object
> will
> > be used for this implementation, and the performance issue will be
> seriously
> > considered. before implement phantom reference, I will also implement the
> > finalizer feature for Tick, although it is not recommanded to be used
> except
> > it is necessary. Besides this, I am trying to reduce the floating garbage
> > for Tick’s concurrent algorithms.
> >
> > *Detailed Description:*
> >
> > Tick is a concurrent GC of Harmony DRLVM, it can do the garbage
> collection
> > without suspending application threads. For concurrent GC, the most
> > important issue that should be dealt with is the inconsistency problem,
> that
> > is if we trace the heap object graph while the mutator updating the same
> > graph, inconsistency problem occurs, some live objects may be mistaken
> > relaimed as garbage. In order to keep the consistency of objects, write
> > barriers is inserted when concurrent tracing starts, which intercept all
> the
> > write operations of object reference and record useful data for
> rescaning.
> > Now there are 3 different concurrent collection algorithms in Tick: DLG,
> > Sliding-View, Mostly-Concurrent; and 3 different write barriers are used
> > repectively. see more information [1].
> >
> > Experiment result shows that Tick can significantly reduce or even
> eliminate
> > the pause time caused by garbage collection. However, one drawback of
> Tick
> > is that, it do not support weak reference mechanism, which is a important
> > API of Java programming language.
> >
> > Weak reference is an useful tool for whom are much care about the
> > application memory usage, e.g. it can help developers to cache some
> > temporary large objects for later reusing; also, to do some cleanup
> > operations just when a object is acturally relaimed. There are 3 kinds of
> > weak reference class in Java language (the weakness is decreasingly):
> soft
> > reference, weak reference, phantom reference. Soft reference is usually
> used
> > to maintain better cache for large object, such as pictures load from
> web;
> > weak reference can help to reduce the frequency of creating short-live
> > objects; phantom reference can usully help programmer to do cleanup
> > operations just when an object is physically relaimed. (from now on, I
> will
> > use ‘weakref’ to represent the set of all the three kinds of weak
> > references) All the three kinds of reference classes have a get() method,
> if
> > the object has not been actually relaimed, the get() method will return a
> > reference of the referent object (referent object is the object which
> > reference object point to) except phantom reference (phantom reference’s
> > get() method always return null, since it is just used for cleanup
> > operations when object is actually relaimed). see more information [2,3]
> >
> > Weakrefs are always processed in GC. For generational GC, weakref feature
> > can be implemented easily: when minor collection occurs, weak reference
> > objects are equeued and their referents are reclaimed. when major
> collection
> > occurs, soft reference objects are enqueued and their referents are
> > reclaimed. when finalizer are processed, phantom reference objects are
> > enqueued and their referents are reclaimed.
> >
> > Unlike generational GC, Tick just simply treats all the references in the
> > heap as strong references, which make memory pressure become bigger for
> the
> > applications using much data structure based on weakrefs, such as
> > WeakHashMap. In this project, my job is to implement this weakref feature
> > for Tick, and this work should be independent to the concurrent
> algorithms
> > used in Tick.
> >
> > For concurrent GC, weakref processing is different with generational GC,
> > because the tracing thread is running concurrently with application
> threads.
> > Application threads may invoke the get() method to get the reference of
> > referent object, it may:
> > 1, put it to the thread local stack as local var or calling parameter,
> such
> > as
> >    Object localvar = weakref.get();
> >    functionA(weakref.get());
> > 2, assign it to another object slot, such as
> >    obj.field = weakref.get();
> > these operations will change the reachability of referent object, which
> may
> > cause the inconsistency issues in concurrent GC. In order to deal with
> it,
> > what we should do is to intercept these operations while application
> thread
> > is running. Because all these operations use the get() method to get the
> > reference of referent object, inserting the read barrier in the get() is
> a
> > simple and effective approach. The main job of the read barrier is to add
> > the pointer of referent object to a remembered set, then the garbage
> > collection thread will pick it up and trace from it.
> >
> > I divided the work into 3 major steps:
> >
> > The first step is to implement the read barrier for get() method of
> > reference object. My approch is implementing the barrier using a VMHelper
> > method, which is a VMMagic based helper function writen in Java. That is,
> > using Java to write some hot funtions in JVM, then these Java methods can
> be
> > compilered and even inlined into the native code produced by JIT compiler
> > [4]. This is very useful approach to reduce the read barrier’s overhead.
> In
> > this step, I will write a Java version read barrier and add some code to
> the
> > JIT compiler (Jitrino and JET) to make it compile the barrier correctly
> and
> > effiently.
> >
> > The second step is to implement the weakref processing module in Tick.
> This
> > module will collect all the weakrefs while concurrent marking and put
> them
> > to different queues depending on its reference type. When concurrent
> marking
> > finishes, GC threads pick up the reference object in the queues to see if
> > the referent object is marked, if so, the referent object is live and
> should
> > not be processed, otherwise, process the referent object according the
> type
> > of the reference object respectively. moreover, if the reference object
> is
> > intercepted by the read barrier in the get() method, its referent object
> > should be marked while tracing, so the consistency is well maintained.
> >
> > After this step, I will test Tick with benchmark using weakrefs, or using
> > data structure based on the weakrefs (such as WeakHashMap). The
> experiment
> > result will be analyzed and compared to Tick without weakrefs supporting
> and
> > generational GC with weakrefs supporting. Then I will form the result
> data
> > to a document report.
> >
> > The last step is trying to reduce the floating garabge [5] for concurrent
> > GC. This work is not related to weakrefs feature, but its effect may be
> > similar to it, since the major goal of reducing floating garbage is to
> ease
> > the memory pressure while using concurrent GC. Floating garbage always
> exist
> > and may not be totally eliminated in concurrent GC, my job is trying to
> > optimize its impact as much as possible.
> >
> > *Additional Information:*
> >
> > Personal Information:
> >  I am a master canidate of Computer Science in Parallel Processing
> > Institution, Fudan University. My interested area includes Compiler,
> > Computer Architecure, Managed Runtime etc. I started to follow Harmony
> with
> > interests 2 year ago, and I used to contribute to DRLVM and have submited
> > some patches to the community. As an intern, I improved Harmony
> concurrent
> > GC by using state-based phase design and implementing an efficient
> scheduler
> > in the summer of 2008, the patch of this work has been merged to the
> source
> > trunk [6].
> >
> > Schedule:
> >
> >  I can work at least 3 days per week for this work, my schedule this
> > project as follow:
> >  April 10 ~ April 26 Communicate with the mentor and design the interface
> > of the implementation, such as, the native interface of get() method,
> weak
> > reference processing interfaces in GC module. Write down the design
> > document.
> >
> >  April 27 ~ May 24 implementing read barrier for get() method of
> reference
> > object.
> >
> >  May 25 ~ June 30 implementing the weak and soft reference processing
> > features in GC module, add finalizer feature to Tick
> >
> >  July 1 ~ July 15 implementing phantom reference processing features.
> >
> >  July 16 ~ Aug 1 test and analyze Tick with weak reference related
> > benchmark. write document for implementing and testing.
> >
> >  Aug 1 ~ Aug 15 try to improve Tick by reducing floating garbage.
> >
> > *References:*
> >
> > [1]
> >
> http://xiao-feng.blogspot.com/2007/03/comparison-between-snapshot-based-gc.html
> > [2] http://www.realjenius.com/node/377
> > [3] http://www.pawlan.com/Monica/refobjs/
> > [4]
> >
> http://mail-archives.apache.org/mod_mbox/harmony-dev/200505.mbox/%3C4293E6BD.1060303@anu.edu.au%3E
> > [5] http://www.memorymanagement.org/glossary/f.html#floating.garbage
> > [6] http://issues.apache.org/jira/browse/HARMONY-5989
> >
> > --
> > From : Simon.Zhou@PPI, Fudan University
> >
>
>
>
>  --
> http://people.apache.org/~xli
>



-- 
>From : Simon.Zhou@PPI, Fudan University

Re: [GSoC] Proposal of GC-1: implement WeakReference support in Harmony concurrent GC

Posted by Simon Zhou <si...@gmail.com>.
Thank you Xiaofeng, [1] is very useful!

I am planning to implement weakref support for all the 3 concurrent
algorithms. Using read barrier, I suppose the solution of different
algorithms would be similar, but I am not sure, IMHO, it is because that the
referent of weakref can not be overwritten after weakref has been created.
Is it right?
I have submitted my proposal to the GSoC application webm, I will keep
updating it these days.

Thanks
Simon

2009/4/1 Xiao-Feng Li <xi...@gmail.com>

> Simon, I've got some time reviewing your proposal. It looks very
> feasible. Probably you'd better read my blog entry on WeakRef and
> Finalizer processing of Harmony [1].
>
> One question, will you implement weakref support for all the three
> concurrent GC algorithm?
>
> Btw, please go ahead to submit/update your proposal to the GSoC
> application web.
>
> [1]
> http://xiao-feng.blogspot.com/2007/05/weak-reference-processing-in-apache.html
>
> Thanks,
> xiaofeng
>
> On Tue, Mar 31, 2009 at 6:06 PM, Simon Zhou <si...@gmail.com>
> wrote:
> > Hi All,
> > Here is my proposal, any suggestion is welcome!
> >
> > *Title/Summary:* GC-1: Implement WeakReference support in Harmony
> Concurent
> > GC
> >
> > *Student:* Simon.Zhou
> >
> > *Student e-mail:* simon.harmony@gmail.com
> >
> > *Student Major:* Computer Science
> >
> > *Student Degree:* Master
> >
> > *Student Graduation:* 2009.7
> >
> > *Organization:* Fudan University
> >
> > *Assigned Mentor:*
> >
> > *Abstract:*
> >
> > weak reference allows a program to maintain special reference to objects,
> > which is a useful API for the program interacting with the garbage
> > collector to some extent. Apache Harmony has a concurrent garbage
> collector
> > (code name: Tick) which can perform the garbage collection without
> > suspending the application threads except for root set enumeration. Tick
> do
> > not have weak reference supporting now, it treats all the references
> object
> > just as  strong references, which may cause inefficiency for applications
> > using weak references. In this project, I will add the weak reference
> > support for Tick, which would be different from the implementation in
> > generational GC, because the consistency should be maintained for the
> heap
> > objects carefully. Read barrier of the get() method of reference object
> will
> > be used for this implementation, and the performance issue will be
> seriously
> > considered. before implement phantom reference, I will also implement the
> > finalizer feature for Tick, although it is not recommanded to be used
> except
> > it is necessary. Besides this, I am trying to reduce the floating garbage
> > for Tick’s concurrent algorithms.
> >
> > *Detailed Description:*
> >
> > Tick is a concurrent GC of Harmony DRLVM, it can do the garbage
> collection
> > without suspending application threads. For concurrent GC, the most
> > important issue that should be dealt with is the inconsistency problem,
> that
> > is if we trace the heap object graph while the mutator updating the same
> > graph, inconsistency problem occurs, some live objects may be mistaken
> > relaimed as garbage. In order to keep the consistency of objects, write
> > barriers is inserted when concurrent tracing starts, which intercept all
> the
> > write operations of object reference and record useful data for
> rescaning.
> > Now there are 3 different concurrent collection algorithms in Tick: DLG,
> > Sliding-View, Mostly-Concurrent; and 3 different write barriers are used
> > repectively. see more information [1].
> >
> > Experiment result shows that Tick can significantly reduce or even
> eliminate
> > the pause time caused by garbage collection. However, one drawback of
> Tick
> > is that, it do not support weak reference mechanism, which is a important
> > API of Java programming language.
> >
> > Weak reference is an useful tool for whom are much care about the
> > application memory usage, e.g. it can help developers to cache some
> > temporary large objects for later reusing; also, to do some cleanup
> > operations just when a object is acturally relaimed. There are 3 kinds of
> > weak reference class in Java language (the weakness is decreasingly):
> soft
> > reference, weak reference, phantom reference. Soft reference is usually
> used
> > to maintain better cache for large object, such as pictures load from
> web;
> > weak reference can help to reduce the frequency of creating short-live
> > objects; phantom reference can usully help programmer to do cleanup
> > operations just when an object is physically relaimed. (from now on, I
> will
> > use ‘weakref’ to represent the set of all the three kinds of weak
> > references) All the three kinds of reference classes have a get() method,
> if
> > the object has not been actually relaimed, the get() method will return a
> > reference of the referent object (referent object is the object which
> > reference object point to) except phantom reference (phantom reference’s
> > get() method always return null, since it is just used for cleanup
> > operations when object is actually relaimed). see more information [2,3]
> >
> > Weakrefs are always processed in GC. For generational GC, weakref feature
> > can be implemented easily: when minor collection occurs, weak reference
> > objects are equeued and their referents are reclaimed. when major
> collection
> > occurs, soft reference objects are enqueued and their referents are
> > reclaimed. when finalizer are processed, phantom reference objects are
> > enqueued and their referents are reclaimed.
> >
> > Unlike generational GC, Tick just simply treats all the references in the
> > heap as strong references, which make memory pressure become bigger for
> the
> > applications using much data structure based on weakrefs, such as
> > WeakHashMap. In this project, my job is to implement this weakref feature
> > for Tick, and this work should be independent to the concurrent
> algorithms
> > used in Tick.
> >
> > For concurrent GC, weakref processing is different with generational GC,
> > because the tracing thread is running concurrently with application
> threads.
> > Application threads may invoke the get() method to get the reference of
> > referent object, it may:
> > 1, put it to the thread local stack as local var or calling parameter,
> such
> > as
> >    Object localvar = weakref.get();
> >    functionA(weakref.get());
> > 2, assign it to another object slot, such as
> >    obj.field = weakref.get();
> > these operations will change the reachability of referent object, which
> may
> > cause the inconsistency issues in concurrent GC. In order to deal with
> it,
> > what we should do is to intercept these operations while application
> thread
> > is running. Because all these operations use the get() method to get the
> > reference of referent object, inserting the read barrier in the get() is
> a
> > simple and effective approach. The main job of the read barrier is to add
> > the pointer of referent object to a remembered set, then the garbage
> > collection thread will pick it up and trace from it.
> >
> > I divided the work into 3 major steps:
> >
> > The first step is to implement the read barrier for get() method of
> > reference object. My approch is implementing the barrier using a VMHelper
> > method, which is a VMMagic based helper function writen in Java. That is,
> > using Java to write some hot funtions in JVM, then these Java methods can
> be
> > compilered and even inlined into the native code produced by JIT compiler
> > [4]. This is very useful approach to reduce the read barrier’s overhead.
> In
> > this step, I will write a Java version read barrier and add some code to
> the
> > JIT compiler (Jitrino and JET) to make it compile the barrier correctly
> and
> > effiently.
> >
> > The second step is to implement the weakref processing module in Tick.
> This
> > module will collect all the weakrefs while concurrent marking and put
> them
> > to different queues depending on its reference type. When concurrent
> marking
> > finishes, GC threads pick up the reference object in the queues to see if
> > the referent object is marked, if so, the referent object is live and
> should
> > not be processed, otherwise, process the referent object according the
> type
> > of the reference object respectively. moreover, if the reference object
> is
> > intercepted by the read barrier in the get() method, its referent object
> > should be marked while tracing, so the consistency is well maintained.
> >
> > After this step, I will test Tick with benchmark using weakrefs, or using
> > data structure based on the weakrefs (such as WeakHashMap). The
> experiment
> > result will be analyzed and compared to Tick without weakrefs supporting
> and
> > generational GC with weakrefs supporting. Then I will form the result
> data
> > to a document report.
> >
> > The last step is trying to reduce the floating garabge [5] for concurrent
> > GC. This work is not related to weakrefs feature, but its effect may be
> > similar to it, since the major goal of reducing floating garbage is to
> ease
> > the memory pressure while using concurrent GC. Floating garbage always
> exist
> > and may not be totally eliminated in concurrent GC, my job is trying to
> > optimize its impact as much as possible.
> >
> > *Additional Information:*
> >
> > Personal Information:
> >  I am a master canidate of Computer Science in Parallel Processing
> > Institution, Fudan University. My interested area includes Compiler,
> > Computer Architecure, Managed Runtime etc. I started to follow Harmony
> with
> > interests 2 year ago, and I used to contribute to DRLVM and have submited
> > some patches to the community. As an intern, I improved Harmony
> concurrent
> > GC by using state-based phase design and implementing an efficient
> scheduler
> > in the summer of 2008, the patch of this work has been merged to the
> source
> > trunk [6].
> >
> > Schedule:
> >
> >  I can work at least 3 days per week for this work, my schedule this
> > project as follow:
> >  April 10 ~ April 26 Communicate with the mentor and design the interface
> > of the implementation, such as, the native interface of get() method,
> weak
> > reference processing interfaces in GC module. Write down the design
> > document.
> >
> >  April 27 ~ May 24 implementing read barrier for get() method of
> reference
> > object.
> >
> >  May 25 ~ June 30 implementing the weak and soft reference processing
> > features in GC module, add finalizer feature to Tick
> >
> >  July 1 ~ July 15 implementing phantom reference processing features.
> >
> >  July 16 ~ Aug 1 test and analyze Tick with weak reference related
> > benchmark. write document for implementing and testing.
> >
> >  Aug 1 ~ Aug 15 try to improve Tick by reducing floating garbage.
> >
> > *References:*
> >
> > [1]
> >
> http://xiao-feng.blogspot.com/2007/03/comparison-between-snapshot-based-gc.html
> > [2] http://www.realjenius.com/node/377
> > [3] http://www.pawlan.com/Monica/refobjs/
> > [4]
> >
> http://mail-archives.apache.org/mod_mbox/harmony-dev/200505.mbox/%3C4293E6BD.1060303@anu.edu.au%3E
> > [5] http://www.memorymanagement.org/glossary/f.html#floating.garbage
> > [6] http://issues.apache.org/jira/browse/HARMONY-5989
> >
> > --
> > From : Simon.Zhou@PPI, Fudan University
> >
>
>
>
> --
> http://people.apache.org/~xli <http://people.apache.org/%7Exli>
>



-- 
>From : Simon.Zhou@PPI, Fudan University

Re: [GSoC] Proposal of GC-1: implement WeakReference support in Harmony concurrent GC

Posted by Xiao-Feng Li <xi...@gmail.com>.
Simon, I've got some time reviewing your proposal. It looks very
feasible. Probably you'd better read my blog entry on WeakRef and
Finalizer processing of Harmony [1].

One question, will you implement weakref support for all the three
concurrent GC algorithm?

Btw, please go ahead to submit/update your proposal to the GSoC
application web.

[1] http://xiao-feng.blogspot.com/2007/05/weak-reference-processing-in-apache.html

Thanks,
xiaofeng

On Tue, Mar 31, 2009 at 6:06 PM, Simon Zhou <si...@gmail.com> wrote:
> Hi All,
> Here is my proposal, any suggestion is welcome!
>
> *Title/Summary:* GC-1: Implement WeakReference support in Harmony Concurent
> GC
>
> *Student:* Simon.Zhou
>
> *Student e-mail:* simon.harmony@gmail.com
>
> *Student Major:* Computer Science
>
> *Student Degree:* Master
>
> *Student Graduation:* 2009.7
>
> *Organization:* Fudan University
>
> *Assigned Mentor:*
>
> *Abstract:*
>
> weak reference allows a program to maintain special reference to objects,
> which is a useful API for the program interacting with the garbage
> collector to some extent. Apache Harmony has a concurrent garbage collector
> (code name: Tick) which can perform the garbage collection without
> suspending the application threads except for root set enumeration. Tick do
> not have weak reference supporting now, it treats all the references object
> just as  strong references, which may cause inefficiency for applications
> using weak references. In this project, I will add the weak reference
> support for Tick, which would be different from the implementation in
> generational GC, because the consistency should be maintained for the heap
> objects carefully. Read barrier of the get() method of reference object will
> be used for this implementation, and the performance issue will be seriously
> considered. before implement phantom reference, I will also implement the
> finalizer feature for Tick, although it is not recommanded to be used except
> it is necessary. Besides this, I am trying to reduce the floating garbage
> for Tick’s concurrent algorithms.
>
> *Detailed Description:*
>
> Tick is a concurrent GC of Harmony DRLVM, it can do the garbage collection
> without suspending application threads. For concurrent GC, the most
> important issue that should be dealt with is the inconsistency problem, that
> is if we trace the heap object graph while the mutator updating the same
> graph, inconsistency problem occurs, some live objects may be mistaken
> relaimed as garbage. In order to keep the consistency of objects, write
> barriers is inserted when concurrent tracing starts, which intercept all the
> write operations of object reference and record useful data for rescaning.
> Now there are 3 different concurrent collection algorithms in Tick: DLG,
> Sliding-View, Mostly-Concurrent; and 3 different write barriers are used
> repectively. see more information [1].
>
> Experiment result shows that Tick can significantly reduce or even eliminate
> the pause time caused by garbage collection. However, one drawback of Tick
> is that, it do not support weak reference mechanism, which is a important
> API of Java programming language.
>
> Weak reference is an useful tool for whom are much care about the
> application memory usage, e.g. it can help developers to cache some
> temporary large objects for later reusing; also, to do some cleanup
> operations just when a object is acturally relaimed. There are 3 kinds of
> weak reference class in Java language (the weakness is decreasingly): soft
> reference, weak reference, phantom reference. Soft reference is usually used
> to maintain better cache for large object, such as pictures load from web;
> weak reference can help to reduce the frequency of creating short-live
> objects; phantom reference can usully help programmer to do cleanup
> operations just when an object is physically relaimed. (from now on, I will
> use ‘weakref’ to represent the set of all the three kinds of weak
> references) All the three kinds of reference classes have a get() method, if
> the object has not been actually relaimed, the get() method will return a
> reference of the referent object (referent object is the object which
> reference object point to) except phantom reference (phantom reference’s
> get() method always return null, since it is just used for cleanup
> operations when object is actually relaimed). see more information [2,3]
>
> Weakrefs are always processed in GC. For generational GC, weakref feature
> can be implemented easily: when minor collection occurs, weak reference
> objects are equeued and their referents are reclaimed. when major collection
> occurs, soft reference objects are enqueued and their referents are
> reclaimed. when finalizer are processed, phantom reference objects are
> enqueued and their referents are reclaimed.
>
> Unlike generational GC, Tick just simply treats all the references in the
> heap as strong references, which make memory pressure become bigger for the
> applications using much data structure based on weakrefs, such as
> WeakHashMap. In this project, my job is to implement this weakref feature
> for Tick, and this work should be independent to the concurrent algorithms
> used in Tick.
>
> For concurrent GC, weakref processing is different with generational GC,
> because the tracing thread is running concurrently with application threads.
> Application threads may invoke the get() method to get the reference of
> referent object, it may:
> 1, put it to the thread local stack as local var or calling parameter, such
> as
>    Object localvar = weakref.get();
>    functionA(weakref.get());
> 2, assign it to another object slot, such as
>    obj.field = weakref.get();
> these operations will change the reachability of referent object, which may
> cause the inconsistency issues in concurrent GC. In order to deal with it,
> what we should do is to intercept these operations while application thread
> is running. Because all these operations use the get() method to get the
> reference of referent object, inserting the read barrier in the get() is a
> simple and effective approach. The main job of the read barrier is to add
> the pointer of referent object to a remembered set, then the garbage
> collection thread will pick it up and trace from it.
>
> I divided the work into 3 major steps:
>
> The first step is to implement the read barrier for get() method of
> reference object. My approch is implementing the barrier using a VMHelper
> method, which is a VMMagic based helper function writen in Java. That is,
> using Java to write some hot funtions in JVM, then these Java methods can be
> compilered and even inlined into the native code produced by JIT compiler
> [4]. This is very useful approach to reduce the read barrier’s overhead. In
> this step, I will write a Java version read barrier and add some code to the
> JIT compiler (Jitrino and JET) to make it compile the barrier correctly and
> effiently.
>
> The second step is to implement the weakref processing module in Tick. This
> module will collect all the weakrefs while concurrent marking and put them
> to different queues depending on its reference type. When concurrent marking
> finishes, GC threads pick up the reference object in the queues to see if
> the referent object is marked, if so, the referent object is live and should
> not be processed, otherwise, process the referent object according the type
> of the reference object respectively. moreover, if the reference object is
> intercepted by the read barrier in the get() method, its referent object
> should be marked while tracing, so the consistency is well maintained.
>
> After this step, I will test Tick with benchmark using weakrefs, or using
> data structure based on the weakrefs (such as WeakHashMap). The experiment
> result will be analyzed and compared to Tick without weakrefs supporting and
> generational GC with weakrefs supporting. Then I will form the result data
> to a document report.
>
> The last step is trying to reduce the floating garabge [5] for concurrent
> GC. This work is not related to weakrefs feature, but its effect may be
> similar to it, since the major goal of reducing floating garbage is to ease
> the memory pressure while using concurrent GC. Floating garbage always exist
> and may not be totally eliminated in concurrent GC, my job is trying to
> optimize its impact as much as possible.
>
> *Additional Information:*
>
> Personal Information:
>  I am a master canidate of Computer Science in Parallel Processing
> Institution, Fudan University. My interested area includes Compiler,
> Computer Architecure, Managed Runtime etc. I started to follow Harmony with
> interests 2 year ago, and I used to contribute to DRLVM and have submited
> some patches to the community. As an intern, I improved Harmony concurrent
> GC by using state-based phase design and implementing an efficient scheduler
> in the summer of 2008, the patch of this work has been merged to the source
> trunk [6].
>
> Schedule:
>
>  I can work at least 3 days per week for this work, my schedule this
> project as follow:
>  April 10 ~ April 26 Communicate with the mentor and design the interface
> of the implementation, such as, the native interface of get() method, weak
> reference processing interfaces in GC module. Write down the design
> document.
>
>  April 27 ~ May 24 implementing read barrier for get() method of reference
> object.
>
>  May 25 ~ June 30 implementing the weak and soft reference processing
> features in GC module, add finalizer feature to Tick
>
>  July 1 ~ July 15 implementing phantom reference processing features.
>
>  July 16 ~ Aug 1 test and analyze Tick with weak reference related
> benchmark. write document for implementing and testing.
>
>  Aug 1 ~ Aug 15 try to improve Tick by reducing floating garbage.
>
> *References:*
>
> [1]
> http://xiao-feng.blogspot.com/2007/03/comparison-between-snapshot-based-gc.html
> [2] http://www.realjenius.com/node/377
> [3] http://www.pawlan.com/Monica/refobjs/
> [4]
> http://mail-archives.apache.org/mod_mbox/harmony-dev/200505.mbox/%3C4293E6BD.1060303@anu.edu.au%3E
> [5] http://www.memorymanagement.org/glossary/f.html#floating.garbage
> [6] http://issues.apache.org/jira/browse/HARMONY-5989
>
> --
> From : Simon.Zhou@PPI, Fudan University
>



-- 
http://people.apache.org/~xli

Re: [GSoC] Proposal of GC-1: implement WeakReference support in Harmony concurrent GC

Posted by Xiao-Feng Li <xi...@gmail.com>.
Simon, excellent proposal. I need more time to fully examine your
proposal in detail. Let's continue the discussions here.

Thanks,
xiaofeng

On Tue, Mar 31, 2009 at 6:06 PM, Simon Zhou <si...@gmail.com> wrote:
> Hi All,
> Here is my proposal, any suggestion is welcome!
>
> *Title/Summary:* GC-1: Implement WeakReference support in Harmony Concurent
> GC
>
> *Student:* Simon.Zhou
>
> *Student e-mail:* simon.harmony@gmail.com
>
> *Student Major:* Computer Science
>
> *Student Degree:* Master
>
> *Student Graduation:* 2009.7
>
> *Organization:* Fudan University
>
> *Assigned Mentor:*
>
> *Abstract:*
>
> weak reference allows a program to maintain special reference to objects,
> which is a useful API for the program interacting with the garbage
> collector to some extent. Apache Harmony has a concurrent garbage collector
> (code name: Tick) which can perform the garbage collection without
> suspending the application threads except for root set enumeration. Tick do
> not have weak reference supporting now, it treats all the references object
> just as  strong references, which may cause inefficiency for applications
> using weak references. In this project, I will add the weak reference
> support for Tick, which would be different from the implementation in
> generational GC, because the consistency should be maintained for the heap
> objects carefully. Read barrier of the get() method of reference object will
> be used for this implementation, and the performance issue will be seriously
> considered. before implement phantom reference, I will also implement the
> finalizer feature for Tick, although it is not recommanded to be used except
> it is necessary. Besides this, I am trying to reduce the floating garbage
> for Tick’s concurrent algorithms.
>
> *Detailed Description:*
>
> Tick is a concurrent GC of Harmony DRLVM, it can do the garbage collection
> without suspending application threads. For concurrent GC, the most
> important issue that should be dealt with is the inconsistency problem, that
> is if we trace the heap object graph while the mutator updating the same
> graph, inconsistency problem occurs, some live objects may be mistaken
> relaimed as garbage. In order to keep the consistency of objects, write
> barriers is inserted when concurrent tracing starts, which intercept all the
> write operations of object reference and record useful data for rescaning.
> Now there are 3 different concurrent collection algorithms in Tick: DLG,
> Sliding-View, Mostly-Concurrent; and 3 different write barriers are used
> repectively. see more information [1].
>
> Experiment result shows that Tick can significantly reduce or even eliminate
> the pause time caused by garbage collection. However, one drawback of Tick
> is that, it do not support weak reference mechanism, which is a important
> API of Java programming language.
>
> Weak reference is an useful tool for whom are much care about the
> application memory usage, e.g. it can help developers to cache some
> temporary large objects for later reusing; also, to do some cleanup
> operations just when a object is acturally relaimed. There are 3 kinds of
> weak reference class in Java language (the weakness is decreasingly): soft
> reference, weak reference, phantom reference. Soft reference is usually used
> to maintain better cache for large object, such as pictures load from web;
> weak reference can help to reduce the frequency of creating short-live
> objects; phantom reference can usully help programmer to do cleanup
> operations just when an object is physically relaimed. (from now on, I will
> use ‘weakref’ to represent the set of all the three kinds of weak
> references) All the three kinds of reference classes have a get() method, if
> the object has not been actually relaimed, the get() method will return a
> reference of the referent object (referent object is the object which
> reference object point to) except phantom reference (phantom reference’s
> get() method always return null, since it is just used for cleanup
> operations when object is actually relaimed). see more information [2,3]
>
> Weakrefs are always processed in GC. For generational GC, weakref feature
> can be implemented easily: when minor collection occurs, weak reference
> objects are equeued and their referents are reclaimed. when major collection
> occurs, soft reference objects are enqueued and their referents are
> reclaimed. when finalizer are processed, phantom reference objects are
> enqueued and their referents are reclaimed.
>
> Unlike generational GC, Tick just simply treats all the references in the
> heap as strong references, which make memory pressure become bigger for the
> applications using much data structure based on weakrefs, such as
> WeakHashMap. In this project, my job is to implement this weakref feature
> for Tick, and this work should be independent to the concurrent algorithms
> used in Tick.
>
> For concurrent GC, weakref processing is different with generational GC,
> because the tracing thread is running concurrently with application threads.
> Application threads may invoke the get() method to get the reference of
> referent object, it may:
> 1, put it to the thread local stack as local var or calling parameter, such
> as
>    Object localvar = weakref.get();
>    functionA(weakref.get());
> 2, assign it to another object slot, such as
>    obj.field = weakref.get();
> these operations will change the reachability of referent object, which may
> cause the inconsistency issues in concurrent GC. In order to deal with it,
> what we should do is to intercept these operations while application thread
> is running. Because all these operations use the get() method to get the
> reference of referent object, inserting the read barrier in the get() is a
> simple and effective approach. The main job of the read barrier is to add
> the pointer of referent object to a remembered set, then the garbage
> collection thread will pick it up and trace from it.
>
> I divided the work into 3 major steps:
>
> The first step is to implement the read barrier for get() method of
> reference object. My approch is implementing the barrier using a VMHelper
> method, which is a VMMagic based helper function writen in Java. That is,
> using Java to write some hot funtions in JVM, then these Java methods can be
> compilered and even inlined into the native code produced by JIT compiler
> [4]. This is very useful approach to reduce the read barrier’s overhead. In
> this step, I will write a Java version read barrier and add some code to the
> JIT compiler (Jitrino and JET) to make it compile the barrier correctly and
> effiently.
>
> The second step is to implement the weakref processing module in Tick. This
> module will collect all the weakrefs while concurrent marking and put them
> to different queues depending on its reference type. When concurrent marking
> finishes, GC threads pick up the reference object in the queues to see if
> the referent object is marked, if so, the referent object is live and should
> not be processed, otherwise, process the referent object according the type
> of the reference object respectively. moreover, if the reference object is
> intercepted by the read barrier in the get() method, its referent object
> should be marked while tracing, so the consistency is well maintained.
>
> After this step, I will test Tick with benchmark using weakrefs, or using
> data structure based on the weakrefs (such as WeakHashMap). The experiment
> result will be analyzed and compared to Tick without weakrefs supporting and
> generational GC with weakrefs supporting. Then I will form the result data
> to a document report.
>
> The last step is trying to reduce the floating garabge [5] for concurrent
> GC. This work is not related to weakrefs feature, but its effect may be
> similar to it, since the major goal of reducing floating garbage is to ease
> the memory pressure while using concurrent GC. Floating garbage always exist
> and may not be totally eliminated in concurrent GC, my job is trying to
> optimize its impact as much as possible.
>
> *Additional Information:*
>
> Personal Information:
>  I am a master canidate of Computer Science in Parallel Processing
> Institution, Fudan University. My interested area includes Compiler,
> Computer Architecure, Managed Runtime etc. I started to follow Harmony with
> interests 2 year ago, and I used to contribute to DRLVM and have submited
> some patches to the community. As an intern, I improved Harmony concurrent
> GC by using state-based phase design and implementing an efficient scheduler
> in the summer of 2008, the patch of this work has been merged to the source
> trunk [6].
>
> Schedule:
>
>  I can work at least 3 days per week for this work, my schedule this
> project as follow:
>  April 10 ~ April 26 Communicate with the mentor and design the interface
> of the implementation, such as, the native interface of get() method, weak
> reference processing interfaces in GC module. Write down the design
> document.
>
>  April 27 ~ May 24 implementing read barrier for get() method of reference
> object.
>
>  May 25 ~ June 30 implementing the weak and soft reference processing
> features in GC module, add finalizer feature to Tick
>
>  July 1 ~ July 15 implementing phantom reference processing features.
>
>  July 16 ~ Aug 1 test and analyze Tick with weak reference related
> benchmark. write document for implementing and testing.
>
>  Aug 1 ~ Aug 15 try to improve Tick by reducing floating garbage.
>
> *References:*
>
> [1]
> http://xiao-feng.blogspot.com/2007/03/comparison-between-snapshot-based-gc.html
> [2] http://www.realjenius.com/node/377
> [3] http://www.pawlan.com/Monica/refobjs/
> [4]
> http://mail-archives.apache.org/mod_mbox/harmony-dev/200505.mbox/%3C4293E6BD.1060303@anu.edu.au%3E
> [5] http://www.memorymanagement.org/glossary/f.html#floating.garbage
> [6] http://issues.apache.org/jira/browse/HARMONY-5989
>
> --
> From : Simon.Zhou@PPI, Fudan University
>



-- 
http://people.apache.org/~xli