Posted to dev@harmony.apache.org by Robin Garner <ro...@anu.edu.au> on 2006/10/26 00:18:03 UTC

Re: [DRLVM][MMTk] current status and plan

>    follows.  Comments, suggestions are welcome.   It would be much
>    appreciated if Steve Blackburn and Robin Garner would reply to the
>    questions below directed to the "MMTk guys".

Interleaved below.

>       - The next step is to integrate MMTk in the early DRLVM boot
>       process.  The goal is to make sure all code the JIT generates
>       will allocate out of the MMTk heap.  This is a "chicken and egg"
>       kind of problem since no JITed code can execute until DRLVM has
>       a GC that is ready to support object allocation.  Most likely we
>       will use the MMTk notion of "ImmortalSpace" for early object
>       allocation.  Objects in ImmortalSpace are never collected, never
>       moved.  At this stage of MMTk/DRLVM porting, the cost of dead
>       uncollected objects wasting ImmortalSpace memory is not a concern.

The ability to specify the allocator on a per-call-site and per-type basis
can be useful here.

>       - Collection Class
>          - triggerCollection() method needs to be connected to the
>          Java API that forces a GC (this is low priority)
>          - prepareMutator() method probably needs to be integrated
>          with back-branch polling mechanism.  Also need to confirm
>          the requirement that a thread suspend request does indeed
>          force the target thread to be suspended at a GC safepoint.
>          (MMTk guys, can you confirm this?)

This method is called at the start of a collection for every mutator
context that the VM interface knows about.  The work is farmed out in
parallel between the available collector threads.  What exactly happens is
up to each client VM, but this is the chance that the VM gets to prepare
each mutator thread for having its roots enumerated.

MMTk imposes no particular constraints per se on where the VM suspends its
threads, and essentially provides this method to allow the VM to do
whatever is required to suspend mutator threads and/or advance them to a
safe point before their roots are enumerated.


>          - prepareCollector() method – It's not clear MMTk/DRLVM
>          needs to do anything. (MMTk folks please comment on what the
>          VM is supposed to do!)

The CollectorContext object is the per-thread context object for each GC
thread.  A uniprocessor, single-threaded Java program will have one
MutatorContext and one CollectorContext.  A uniprocessor multithreaded
Java program (i.e. all practical programs on a uniprocessor) will have one
CollectorContext and several MutatorContexts.

This method is provided so that the VM can do whatever it requires to a
collector thread.  If this is nothing (which in DRLVM it could well be),
this can be a no-op.  Whether that is true or not requires knowledge of
DRLVM internals that I don't have.

>          - rendezvous() method currently is "hacked" to support only
>          a single thread Java app.  This needs to be fixed.  It's
>          not critical until we need to support multithread GC apps.

This is a pretty standard barrier synchronization construct - shouldn't be
hard to implement.
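
A minimal sketch of such a barrier, assuming a fixed number of collector
threads.  The class name, the arrival-order return value and the handling
of interrupts are choices made for this illustration, not the MMTk or
DRLVM code:

```java
// Hypothetical sketch: a reusable barrier for a fixed number of threads.
// Nothing here is an MMTk or DRLVM API.
final class RendezvousBarrier {
    private final int parties;   // threads that must arrive before release
    private int arrived = 0;     // arrivals in the current round
    private int generation = 0;  // bumped each time the barrier trips

    RendezvousBarrier(int parties) {
        if (parties < 1) {
            throw new IllegalArgumentException("parties must be >= 1");
        }
        this.parties = parties;
    }

    /** Blocks until all parties have arrived; returns the 0-based arrival order. */
    synchronized int rendezvous() {
        int myGeneration = generation;
        int order = arrived++;
        if (arrived == parties) {
            // Last arrival: reset for the next round and release the waiters.
            arrived = 0;
            generation++;
            notifyAll();
        } else {
            boolean interrupted = false;
            // Waiting on the generation guards against spurious wakeups.
            while (generation == myGeneration) {
                try {
                    wait();
                } catch (InterruptedException e) {
                    interrupted = true; // keep waiting, restore the flag below
                }
            }
            if (interrupted) {
                Thread.currentThread().interrupt();
            }
        }
        return order;
    }
}
```

With a single collector thread the call returns immediately, so the same
code covers the single-threaded bring-up case.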

>          - scheduleFinalizerThread() – do nothing at this stage (It
>          will need to be fixed when MMTk/DRLVM is capable of running
>          workloads that need finalizers.)
>       - Lock class
>          - This looks complete.  (Can MMTk folks take a look and
>          confirm?)

Let's run through this together on Thursday.

>       - Memory Class
>          - This looks complete except for the large 450MB byte array
>          that is allocated from existing DRLVM GC.  This "hack"
>          will need to be removed (see below) to integrate MMTk into
>          early stage DRLVM boot process.  (Can MMTk folks confirm this
>          analysis?)

MMTk should be allocating memory directly through mmap.

>       - ObjectModel Class
>          - Interestingly, many methods are not called by any of the
>          initial GC algorithms  targeted (MarkSweep, SemiSpace,
>          GenMS and CopyMS).  These methods currently will execute a
>          "VM.assertions._assert(false);".  The plan is to implement
>          these methods when the assert()s are hit.  Most likely
>          this will happen when additional GC algorithms are tried.
>          - copy() implementation needs to be completed.

Should be straightforward.  You need to:
a) ask the VM how large the object will be when copied (allows for hash
words etc.),
b) call ((CollectorContext)this).allocCopy to allocate space,
c) ask the VM to copy the object,
d) install the forwarding pointer in the old object, and
e) call ((CollectorContext)this).postCopy.
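
The ordering of steps a) to e) can be sketched as follows.  The two
interfaces and every method signature are hypothetical stand-ins invented
for this illustration (objects are modelled as plain ints); only the names
allocCopy and postCopy come from the steps above, and their real
signatures may differ:

```java
// Hypothetical VM-side operations; not DRLVM or MMTk APIs.
interface VmObjectModel {
    int sizeWhenCopied(int object);               // step a
    void copyTo(int from, int to, int bytes);     // step c
    void setForwardingPointer(int from, int to);  // step d
}

// Hypothetical collector-side operations, named after steps b and e.
interface CopyAllocator {
    int allocCopy(int bytes);                     // step b
    void postCopy(int newObject, int bytes);      // step e
}

final class CopySketch {
    /** Copies one object and returns the reference to the new copy. */
    static int copy(int from, VmObjectModel vm, CopyAllocator collector) {
        int bytes = vm.sizeWhenCopied(from);   // a) size may include hash words
        int to = collector.allocCopy(bytes);   // b) reserve space in to-space
        vm.copyTo(from, to, bytes);            // c) the VM moves the bytes
        vm.setForwardingPointer(from, to);     // d) old object points at the copy
        collector.postCopy(to, bytes);         // e) collector-side bookkeeping
        return to;
    }
}
```

The sketch ignores the race between collector threads copying the same
object, which a parallel collector would have to resolve before step b).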

>          - getObjectType() returns an object of type MMType.  Currently
>          there is a very simple cache of MMType objects.  We need
>          to confirm this approach is functionally correct (MMTk guys,
>          please comment).  Then determine if a simple cache is good enough
>          to bring up workloads such as DaCapo and SPECjbb.  A
>          design issue that needs to be resolved – what part of the
>          MMTk heap should MMType objects be allocated from?  Maybe
>          ImmortalSpace (MMTk guys, is this correct?)

When a class is loaded, you should request an MMType from MMTk, then store
it somewhere that it can be returned by this call.  Allocating in
ImmortalSpace would be appropriate (or move before each GC using the
preCopyGCInstances hook).  JikesRVM does this (immortal allocation) using
compile-time allocation policy based on call site (by type) and object
type.
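
A minimal sketch of the "create at class load, return on lookup" pattern,
assuming the VM has some per-class handle to key on.  The registry class
is hypothetical, and T merely stands in for MMType; how the real MMType is
requested from MMTk is not shown:

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

// Hypothetical sketch: K is the VM's class handle, T stands in for MMType.
final class TypeDescriptorRegistry<K, T> {
    private final ConcurrentHashMap<K, T> byClass = new ConcurrentHashMap<>();
    private final Function<K, T> factory; // builds the descriptor for a class

    TypeDescriptorRegistry(Function<K, T> factory) {
        this.factory = factory;
    }

    /** Called by the class loader; creates the descriptor at most once per class. */
    T onClassLoaded(K classHandle) {
        return byClass.computeIfAbsent(classHandle, factory);
    }

    /** Backs a getObjectType()-style query; never creates a descriptor. */
    T lookup(K classHandle) {
        T descriptor = byClass.get(classHandle);
        if (descriptor == null) {
            throw new IllegalStateException("class not registered: " + classHandle);
        }
        return descriptor;
    }
}
```

In a real VM the descriptor would more likely hang off the class's own
metadata than live in a side table; the map only keeps the sketch
self-contained.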

>          - Options class
>             - This prints out MMTk options and needs finishing
>             (low priority)

It also allows MMTk to parse the command-line options it is given.  Good
value for the time invested.

>          - ReferenceGlue class
>             - This manages SoftReference, WeakReference and
>             PhantomReference.  Implementation of this class can
>             wait until advanced workloads require this support (probably
>             2007).

Yep.

>          - Scanning class
>             - Most of the methods are never called by any of the
>             initial GC algorithms we are bringing up.  (MMTk
>             guys, does this seem correct?)

  public abstract void scanObject(TraceStep trace, ObjectReference object);

This exists to allow MMTk to scan objects that can't be described by an
MMType.  The motivating example was some kinds of GHC (Haskell) closures
that are essentially stack fragments.  If all objects MMTk is required to
scan can be described easily by MMTypes, this isn't needed.

  public abstract void precopyChildren(TraceLocal trace, ObjectReference object);

Ditto.

  public abstract void resetThreadCounter();

Required for parallel GC.

  public abstract void preCopyGCInstances(TraceLocal trace);

If MMTk objects (and thread stacks) live in a non-moving space, this
probably has nothing to do.  Otherwise required.

>             - computeAllRoots() needs to be integrated with
>             DRLVM root set enumeration code.  (NOTE: this might
>             actually impact DRLVM's JIT/VM/GC interface.)


Yes.  Possibly the biggest issue in the integration.  This needs to call
trace.enqueRootLocation on each root pointer.
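
A rough sketch of the shape of that enumeration.  RootSink is a
hypothetical stand-in for the trace object, and the slot arrays stand in
for whatever DRLVM's root set enumeration actually yields.  The point is
only that slot addresses, not the referents, are handed over, so that a
copying collector can update each slot:

```java
// Hypothetical sketch; RootSink stands in for the trace object and is
// not an MMTk interface.
interface RootSink {
    void rootLocation(long slotAddress);
}

final class RootEnumerationSketch {
    /** Reports every slot once and returns the number of roots reported. */
    static int computeAllRoots(long[] staticSlots,
                               long[][] stackSlotsPerThread,
                               RootSink sink) {
        int count = 0;
        // Global/static reference slots.
        for (long slot : staticSlots) {
            sink.rootLocation(slot);
            count++;
        }
        // Reference slots found by walking each suspended thread's stack.
        for (long[] threadSlots : stackSlotsPerThread) {
            for (long slot : threadSlots) {
                sink.rootLocation(slot);
                count++;
            }
        }
        return count;
    }
}
```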

  public abstract void computeBootImageRoots(TraceLocal trace);

>          - Selected {CollectorContext, MutatorContext, Plan,
>          PlanConstraints}
>             - This is a simple wrapper layer, it looks to be
>             completely implemented (MMTk guys, is this correct?)

In a C VM, this is probably true.  You need to associate a MutatorContext
with every thread and a CollectorContext with every collector thread.

>          - Statistics class
>             - Need to port this when performance becomes an
>             issue (probably 2007)

Probably right.  Should be easy though.

>          - Strings class
>             - The current implementation does a
>             System.out.println().  This works fine when the GC
>             has enough space to allocate objects as println() executes.
>             The corner case where the GC runs out of space for a new
>             object while attempting to println() GC diagnostics has not
>             been thought out.  Maybe the MMTk guys have advice on this one.

Push this out to some native function that doesn't need to allocate. 
Putting printlns in allocators/barriers may be difficult without
implementing it properly.

>          - SynchronizedCounter class
>             - Need to add critical sections to this code.  (not
>             really needed until we bring up multithread GC apps)

Only used in parallel collectors, correct.
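
For when it is needed, a minimal sketch of a counter that is safe under
concurrent use.  It uses java.util.concurrent.atomic rather than explicit
critical sections, which is a substitution made for this illustration; the
class and its method names are hypothetical, not the MMTk class itself:

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical sketch of a thread-safe counter; not the MMTk class.
final class AtomicWorkCounter {
    private final AtomicInteger value = new AtomicInteger(0);

    /** Resets the counter to zero and returns the value it held before. */
    int reset() {
        return value.getAndSet(0);
    }

    /** Atomically increments and returns the value before the increment. */
    int increment() {
        return value.getAndIncrement();
    }

    /** Returns the current value without changing it. */
    int peek() {
        return value.get();
    }
}
```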

>          - DRLVM modifications needed to support MMTk
>          - Need to figure out how to attach both CollectorContext
>          and MutatorContext objects to DRLVM internal java thread data
>          structure.  Also when the java thread exits, the CollectorContext
>          and MutatorContext reference pointers need to be set to NULL.

Unless you want to break new ground, the CollectorContexts will probably
be fixed and immortal over a given run.  MutatorContexts need to be
allocated per running thread.  Managing the set of MutatorContext objects
is largely up to the VM (interface): as long as the appropriate iterator
returns the live contexts, I think it's open to many possible
implementations.
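
One of those many possible implementations, sketched under the assumption
that the VM can hook thread start and thread exit.  The registry class and
its method names are hypothetical, and M stands in for the VM's
MutatorContext type:

```java
import java.util.Iterator;
import java.util.concurrent.CopyOnWriteArrayList;

// Hypothetical sketch: tracks the live per-thread contexts.
final class MutatorRegistry<M> implements Iterable<M> {
    private final CopyOnWriteArrayList<M> live = new CopyOnWriteArrayList<>();

    /** Called when a Java thread starts and its context has been allocated. */
    void onThreadStart(M context) {
        live.addIfAbsent(context);
    }

    /** Called when a Java thread exits; its context is no longer reported. */
    void onThreadExit(M context) {
        live.remove(context);
    }

    /** Iterates over a snapshot of the contexts that were live at call time. */
    @Override
    public Iterator<M> iterator() {
        return live.iterator();
    }
}
```

A copy-on-write list keeps iteration free of locks at the cost of copying
on every thread start and exit, which seems a reasonable trade if threads
come and go far less often than collections run.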

As far as I can tell, the xxxContext is a subset of the TLS for a DRLVM
thread, or could at least be pointed to by (one of) the GC-specific
fields.

>       - GCSPY – this "should just work".  It's probably best to wait
>       until after we go multithread to try to bring up GCSPY.

Yep.  GCSpy is probably only interesting when DRLVM can run a larger set
of applications (i.e. your motivation would be to visualise GC behaviour of
apps that JikesRVM can't run).

>    - Project 5
>       - Debug and verify JIT support for MMTk's  "Uninterruptible"
>       class.  This basically means that the JIT needs to not insert GC
>       polling calls when JITing an MMTk class that extends
>       "Uninterruptible".  This project depends on VM and JIT support
>       for Back-branch polling.  It probably does not need to be fully
>       developed and debugged until we try to run multithread java apps.
>       The reason is that it requires two or more running Java threads
>       to create a condition where one thread wants to arbitrarily
>       suspend the other java threads at GC safepoints.

My feeling is multithreading is so fundamental to Java that this needs to
be done pdq.  I'm not sure how the helpers for GCV[45] are going to cope
without it.

>
>       --
>       Weldon Washburn
>       Intel Middleware Products Division
>



Re: [DRLVM][MMTk] current status and plan

Posted by Robin Garner <ro...@anu.edu.au>.
Weldon Washburn wrote:
> On 10/25/06, Robin Garner <ro...@anu.edu.au> wrote:
>>
>> >    follows.  Comments, suggestions are welcome.   It would be much
>> >    appreciated if Steve Blackburn and Robin Garner would reply to the
>> >    questions below directed to the "MMTk guys".
>>
>> Interleaved below.
>>
>> >          - prepareCollector() method – It's not clear MMTk/DRLVM
>> >          needs to do anything. (MMTk folks please comment on what the
>> >          VM is supposed to do!)
>>
>> The CollectorContext object is the per-thread context object for each GC
>> thread.  A uniprocessor, single-threaded Java program will have one
>> MutatorContext and one CollectorContext.  A uniprocessor multithreaded
>> Java program (i.e. all practical programs on a uniprocessor) will have one
>> CollectorContext and several MutatorContexts.
>>
>> This method is provided so that the VM can do whatever it requires to a
>> collector thread.  If this is nothing (which in DRLVM it could well be),
>> this can be a no-op.  Whether that is true or not requires knowledge of
>> DRLVM internals that I don't have.
> 
> 
> Can we simply hardwire the number of CollectorContexts to be one?  At a
> later date when this becomes a performance problem on a 4-way SMP, we can
> fix this problem.  What do you think?

For the purposes of initial implementation, sure.  But I'd be inclined 
to debug a parallel collector earlier than that.

>>
>> >          - scheduleFinalizerThread() – do nothing at this stage (It
>> >          will need to be fixed when MMTk/DRLVM is capable of running
>> >          workloads that need finalizers.)
>> >       - Lock class
>> >          - This looks complete.  (Can MMTk folks take a look and
>> >          confirm?)
>>
>> Let's run through this together on Thursday.
> 
> 
> Well, we did not get to this today in Portland Oregon.  Maybe you can tell
> me what needs to be done over harmony-dev.

I'll have a look in SVN and see what I can do.

>> >          - getObjectType() returns an object of type MMType.  Currently
>> >          there is a very simple cache of MMType objects.  We need
>> >          to confirm this approach is functionally correct (MMTk guys,
>> >          please comment).  Then determine if a simple cache is good enough
>> >          to bring up workloads such as DaCapo and SPECjbb.  A
>> >          design issue that needs to be resolved – what part of the
>> >          MMTk heap should MMType objects be allocated from?  Maybe
>> >          ImmortalSpace (MMTk guys, is this correct?)
>>
>> When a class is loaded, you should request an MMType from MMTk, then store
>> it somewhere that it can be returned by this call.  Allocating in
>> ImmortalSpace would be appropriate (or move before each GC using the
>> preCopyGCInstances hook).  JikesRVM does this (immortal allocation) using
>> compile-time allocation policy based on call site (by type) and object
>> type.
> 
> 
> I think this means that MMTk has a requirement on JIT functionality.  That
> is, the JIT must support compile-time allocation policy based on callsite.
> Is this correct?

I don't think it *requires* compile-time allocation policy.  As I say 
above, if you use preCopyGCInstances to move the MMType objects before 
GC, you should be able to allocate them in the general Java heap.  This 
goes for all the MMTk objects (xxContexts, Phases etc).

But VM/Compiler control over allocation policy is certainly very Nice To 
Have for a variety of reasons.

>>    - Project 5
>> >       - Debug and verify JIT support for MMTk's  "Uninterruptible"
>> >       class.  This basically means that the JIT needs to not insert GC
>> >       polling calls when JITing an MMTk class that extends
>> >       "Uninterruptible".  This project depends on VM and JIT support
>> >       for Back-branch polling.  It probably does not need to be fully
>> >       developed and debugged until we try to run multithread java apps.
>> >       The reason is that it requires two or more running Java threads
>> >       to create a condition where one thread wants to arbitrarily
>> >       suspend the other java threads at GC safepoints.
>>
>> My feeling is multithreading is so fundamental to Java that this needs to
>> be done pdq.  I'm not sure how the helpers for GCV[45] are going to cope
>> without it.
> 
> 
> Sorry for not being clear.  Yes, multithreading is fundamental to drlvm and
> it works.  I simply want to turn it off for this stage of the MMTk/DRLVM
> port.  Back-branch polling and Uninterruptible have to work for the ongoing
> DRLVM vmmagic vm helper coding that is currently under way.
> 


Re: [DRLVM][MMTk] current status and plan

Posted by Weldon Washburn <we...@gmail.com>.
On 10/25/06, Robin Garner <ro...@anu.edu.au> wrote:
>
> >    follows.  Comments, suggestions are welcome.   It would be much
> >    appreciated if Steve Blackburn and Robin Garner would reply to the
> >    questions below directed to the "MMTk guys".
>
> Interleaved below.
>
> >          - prepareCollector() method – It's not clear MMTk/DRLVM
> >          needs to do anything. (MMTk folks please comment on what the
> >          VM is supposed to do!)
>
> The CollectorContext object is the per-thread context object for each GC
> thread.  A uniprocessor, single-threaded Java program will have one
> MutatorContext and one CollectorContext.  A uniprocessor multithreaded
> Java program (i.e. all practical programs on a uniprocessor) will have one
> CollectorContext and several MutatorContexts.
>
> This method is provided so that the VM can do whatever it requires to a
> collector thread.  If this is nothing (which in DRLVM it could well be),
> this can be a no-op.  Whether that is true or not requires knowledge of
> DRLVM internals that I don't have.


Can we simply hardwire the number of CollectorContexts to be one?  At a
later date when this becomes a performance problem on a 4-way SMP, we can
fix this problem.  What do you think?

>
> >          - scheduleFinalizerThread() – do nothing at this stage (It
> >          will need to be fixed when MMTk/DRLVM is capable of running
> >          workloads that need finalizers.)
> >       - Lock class
> >          - This looks complete.  (Can MMTk folks take a look and
> >          confirm?)
>
> Let's run through this together on Thursday.


Well, we did not get to this today in Portland Oregon.  Maybe you can tell
me what needs to be done over harmony-dev.


> >          - getObjectType() returns an object of type MMType.  Currently
> >          there is a very simple cache of MMType objects.  We need
> >          to confirm this approach is functionally correct (MMTk guys,
> >          please comment).  Then determine if a simple cache is good enough
> >          to bring up workloads such as DaCapo and SPECjbb.  A
> >          design issue that needs to be resolved – what part of the
> >          MMTk heap should MMType objects be allocated from?  Maybe
> >          ImmortalSpace (MMTk guys, is this correct?)
>
> When a class is loaded, you should request an MMType from MMTk, then store
> it somewhere that it can be returned by this call.  Allocating in
> ImmortalSpace would be appropriate (or move before each GC using the
> preCopyGCInstances hook).  JikesRVM does this (immortal allocation) using
> compile-time allocation policy based on call site (by type) and object
> type.


I think this means that MMTk has a requirement on JIT functionality.  That
is, the JIT must support compile-time allocation policy based on callsite.
Is this correct?

>    - Project 5
> >       - Debug and verify JIT support for MMTk's  "Uninterruptible"
> >       class.  This basically means that the JIT needs to not insert GC
> >       polling calls when JITing an MMTk class that extends
> >       "Uninterruptible".  This project depends on VM and JIT support
> >       for Back-branch polling.  It probably does not need to be fully
> >       developed and debugged until we try to run multithread java apps.
> >       The reason is that it requires two or more running Java threads
> >       to create a condition where one thread wants to arbitrarily
> >       suspend the other java threads at GC safepoints.
>
> My feeling is multithreading is so fundamental to Java that this needs to
> be done pdq.  I'm not sure how the helpers for GCV[45] are going to cope
> without it.


Sorry for not being clear.  Yes, multithreading is fundamental to drlvm and
it works.  I simply want to turn it off for this stage of the MMTk/DRLVM
port.  Back-branch polling and Uninterruptible have to work for the ongoing
DRLVM vmmagic vm helper coding that is currently under way.

-- 
Weldon Washburn
Intel Enterprise Solutions Software Division