You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@harmony.apache.org by Egor Pasko <eg...@gmail.com> on 2006/10/05 12:09:35 UTC

[drlvm] Jitrino.OPT performs incorrect GC enumeration in nested loop with array accesses

Hello, JIT & GC gurus! 

I'd like to share my observations on HARMONY-1862 and move the
discussion from JIRA. Mikhail, Ivan, your opinions are extremely
valuable!

> Mikhail Fursov commented on HARMONY-1682:
> -----------------------------------------
> 
> Ivan, I checked the test you sent
> The test fails and GC runs only once before the failure.
> I moved all methods except the problem one to Jitrino.JET and 
> Jitrino.OPT reports only 2 items: one object and one [mptr, 
> object] pair with a small offset.

Mikhail, 
a small offtopic: do you use debugger to detect that one object is
reported? I cannot see it in the logs. I can see only "pairs dump":
========================================================================
__IR_DUMP_BEGIN__: pairs dump
========================================================================
inst=1 num_pairs=0
inst=8 num_pairs=0
inst=18 num_pairs=1
    mptr=19 base=-1 offset=2147483647
inst=26 num_pairs=1
    mptr=19 base=-1 offset=2147483647
inst=35 num_pairs=1
    mptr=19 base=-1 offset=2147483647
inst=38 num_pairs=0
inst=48 num_pairs=1
    mptr=19 base=-1 offset=2147483647
inst=49 num_pairs=1
    mptr=19 base=-1 offset=2147483647
inst=50 num_pairs=1
    mptr=19 base=-1 offset=2147483647
inst=51 num_pairs=1
    mptr=19 base=-1 offset=2147483647
inst=52 num_pairs=1
    mptr=19 base=-1 offset=2147483647
inst=55 num_pairs=1
    mptr=19 base=-1 offset=2147483647
inst=57 num_pairs=1
    mptr=19 base=-1 offset=2147483647
inst=59 num_pairs=1
    mptr=19 base=-1 offset=2147483647
inst=67 num_pairs=0
inst=70 num_pairs=0
inst=80 num_pairs=0
inst=84 num_pairs=0
inst=94 num_pairs=0
inst=96 num_pairs=0
inst=99 num_pairs=0
inst=102 num_pairs=0
========================================================================
__IR_DUMP_END__: pairs dump
========================================================================

it should be slightly deferrent from yours because I turned off some
preliminary optimizations ("devirt","inline") to have a smaller IR, so
inst IDs are different, and this is OK.

My suspicious place in the IR looks like:
----------------------------------------------------------------------
BB_12
  PersistentId = 20
  ExecCnt = 473637
  Loop: Depth=2, !hdr, hdr=BB_11
  Predcessors: BB_11
  Successors:  BB_13 [Prob=1] UN_48 [Prob=1e-07](loopexit)
    I22: (AD:t32:cls:ClTest$MyObject) =CopyPseudoInst (AU:t31[t19]:object)


BB_13
  PersistentId = 21
  ExecCnt = 473637
  Loop: Depth=2, !hdr, hdr=BB_11
  Predcessors: BB_12
  Successors:  BB_15 [Prob=1] UN_48 [Prob=1e-07](loopexit)
    I24: (AD:t33:vtb:cls:ClTest$MyObject) =CopyPseudoInst (AU:t35[t32+t34(0:vtao)]:vtb:cls:ClTest$MyObject)
    I25: t37:method:getPackages (ID:v8(EFLGS):uint32) =ADD t33:vtb:cls:ClTest$MyObject,t36(0:vtso:ClTest$MyObject.getPackages):int32
    I26: (AD:t39:cls:ClTest$Package[]) =CALL t38[t37]:ptr:uintptr (AU:t32:cls:ClTest$MyObject)
----------------------------------------------------------------------

"I22" is the memory read of "acl[i]". (Here "acl" is the name of
original array from the test) The address is composed from the array's
base, plus the loop invariant index. It was optimized out by "memopt"
to become a simple memory read (bounds checks were also aggressively
eliminated by "memopt")

"I26" is the corresponding safepoint. An I right?

Another "acl[i]" follows this point. (insts I31, I63 for me)

My question here is: "t19 is the mem address, but why is it itself
considered as a managed pointer?" In my opinion, correct reporting
would make "acl" a base pointer, "t19" as the final address, and "t19
- acl" as offset. 

What I suspect here to happen. We stop on "I26" with GC. GC moves
"acl" array to another address, while "t19" becomes invalid, because
it was not reported as an interior to "acl" (was it?)

I can stick to this place in GDB and detect what really happens, but a
proof of concept from someone would be, anyway, useful for my
understanding. Can an array be moved like this in current
implementation? Is it really no relation between "acl" and "t19"
reported by JIT? If the last assumption is true, behaviour is quite
predictable. "t19" address is not updated, and the next mem read
from "[t19]" leads to SEGV.

Without memopt, "t19" is composed as "base + offset", the
corresponding memread looks like this:
----------------------------------------------------------------------
BB_14
  PersistentId = 20
  ExecCnt = 473626
  Loop: Depth=2, !hdr, hdr=BB_11
  Predcessors: BB_12
  Successors:  BB_15 [Prob=1] UN_52 [Prob=1e-07](loopexit)
Layout Succ: BB_15
    I31: (AD:t43(EAX):cls:ClTest$MyObject) =CopyPseudoInst (AU:t42[t4(ESI)+t31(EBX)*t40(4)+t38(12)]:object)

BB_15
  PersistentId = 21
  ExecCnt = 473626
  Loop: Depth=2, !hdr, hdr=BB_11
  Predcessors: BB_14
  Successors:  BB_17 [Prob=1] UN_52 [Prob=1e-07](loopexit)
Layout Succ: BB_17
    I33: (AD:t44(ECX):vtb:cls:ClTest$MyObject) =CopyPseudoInst (AU:t46[t43(EAX)+t45(0:vtao)]:vtb:cls:ClTest$MyObject)
    I35: (AD:t50(EAX):cls:ClTest$Package[]) =CALL t49[t44(ECX)+t47(88:vtso:ClTest$MyObject.getPackages)]:ptr:uintptr (AU:t43(EAX):cls:ClTest$MyObject)
    I125: GCInfoPseudoInst (AU:t4(ESI):cls:ClTest$MyObject[]) [phase:gcpoints]([t4,0])
----------------------------------------------------------------------

And "pairs dump" is always zero:
inst=18 num_pairs=0

What we can do here within Jitrino is another story:
* disable "memopt"
* disable "addindex" optimization in "memopt", that would lead to
  explicit "+" operation on each access.
* make "memopt" mark "t19" as a "base + offset", support in GC may be required

> According to the IR I have this is OK.
> The GC enumeration point is:
> I36: (AD:t54(EAX):cls:ClTest$Package[]) =CALL t53(-390:m:ClTest$MyObject.getPackages):int32 (AU:t43(EAX):cls:ClTest$MyObject) 
> 
> I can help you to add more logging into Jitrino.OPT GCMap 
> algorithms. Can you imagine the logging that will help?

-- 
Egor Pasko, Intel Managed Runtime Division


---------------------------------------------------------------------
Terms of use : http://incubator.apache.org/harmony/mailing.html
To unsubscribe, e-mail: harmony-dev-unsubscribe@incubator.apache.org
For additional commands, e-mail: harmony-dev-help@incubator.apache.org


Re: [drlvm] Jitrino.OPT performs incorrect GC enumeration in nested loop with array accesses

Posted by Egor Pasko <eg...@gmail.com>.
On the 0x1F9 day of Apache Harmony Ivan Volosyuk wrote:
> On 05 Oct 2006 17:09:35 +0700, Egor Pasko <eg...@gmail.com> wrote:
> > Hello, JIT & GC gurus!
> >
> > I'd like to share my observations on HARMONY-1862 and move the
> > discussion from JIRA. Mikhail, Ivan, your opinions are extremely
> > valuable!
> >
> > > Mikhail Fursov commented on HARMONY-1682:
> > > -----------------------------------------
> > >
> > > Ivan, I checked the test you sent
> > > The test fails and GC runs only once before the failure.
> > > I moved all methods except the problem one to Jitrino.JET and
> > > Jitrino.OPT reports only 2 items: one object and one [mptr,
> > > object] pair with a small offset.
> 
> Well, I have little knowledge in jitrino internals, same for my
> ability to read IR trees.
> 
> "The offset == MAX_INT is interpreted as unknown. In this case the
> correct base for a
> managed pointer is chosen in runtime: the algorithm searches for the nearest
> known base."
> 
> Is this can be a problem? If the base pointer is optimized out and we
> will find different object base?

Yes, it can be. Base pointers should be preserved by Jitrino. There is
a mechanism for that in CG, but I see that the "base" is optimized out
as deadcode in "cg_dce". Mikhail, do you see that?

-- 
Egor Pasko, Intel Managed Runtime Division


---------------------------------------------------------------------
Terms of use : http://incubator.apache.org/harmony/mailing.html
To unsubscribe, e-mail: harmony-dev-unsubscribe@incubator.apache.org
For additional commands, e-mail: harmony-dev-help@incubator.apache.org


Re: [drlvm] Jitrino.OPT performs incorrect GC enumeration in nested loop with array accesses

Posted by Ivan Volosyuk <iv...@gmail.com>.
Agree with pluses and minuses :)

2 Egor, it is not so easy to find 'base' pointer having only interior
pointer to an object. So, base pointer should be saved or it should be
deducible somehow.

Could someone review the changes in the patch to jitrino? I'm afraid I
don't have sufficient knowledge of the jitrino internals to do that.
--
Ivan

On 10/9/06, Mikhail Fursov <mi...@gmail.com> wrote:
> On 10/9/06, Salikh Zakirov <Sa...@intel.com> wrote:
> >
> > IMHO, the best solution would be to use (slot,offset) everywhere,
> > and pre-compute and cache offsets before enumerating the base pointer.
>
>
> +1. Already did it in the patch.
>
> Or even enumerate base pointer *after* all interior pointers that depend on
> > it.
>
>
> -1. Adding more states that must be documented and understood. :)
>
>
>
> --
> Mikhail Fursov

---------------------------------------------------------------------
Terms of use : http://incubator.apache.org/harmony/mailing.html
To unsubscribe, e-mail: harmony-dev-unsubscribe@incubator.apache.org
For additional commands, e-mail: harmony-dev-help@incubator.apache.org


Re: [drlvm] Jitrino.OPT performs incorrect GC enumeration in nested loop with array accesses

Posted by Mikhail Fursov <mi...@gmail.com>.
On 10/9/06, Salikh Zakirov <Sa...@intel.com> wrote:
>
> IMHO, the best solution would be to use (slot,offset) everywhere,
> and pre-compute and cache offsets before enumerating the base pointer.


+1. Already did it in the patch.

Or even enumerate base pointer *after* all interior pointers that depend on
> it.


-1. Adding more states that must be documented and understood. :)



-- 
Mikhail Fursov

Re: [drlvm] Jitrino.OPT performs incorrect GC enumeration in nested loop with array accesses

Posted by Salikh Zakirov <Sa...@Intel.com>.
Mikhail Fursov wrote:
> On 10/7/06, Weldon Washburn <we...@gmail.com> wrote:
>>
>> #3 might have originally been put in the JIT/GC interface because the JIT
>> developers liked it.  I can't think of a reason why a GC would prefer
>> this
>> interface over #2 above.
> 
> If it could be  JIT developers decision we would like not to distinguish
> between bases and mptrs at all.:) I mean to keep only
> 'enumerate(base_or_mptr)' method
> I'm not sure the code JIT generates will be much better in this case, but
> JIT internals will become simpler.

Sorry, as I am already lost in #2, #3 and other denotations, my comment
may turn out to be irrelevant, but I think it is worth noting, that

JIT enumeration interface is actually passed through VM like JIT<->VM<->GC:

JIT calls VM funtion vm_enumerate_root_interior_pointer_with_base(slot,base,...)

vmcore\src\gc\root_set_enum_common.cpp
183 void vm_enumerate_root_interior_pointer_with_base(void **slot_root, void **slot_base, Boolean is_pinned)
184 {
185     int offset = (int)(POINTER_SIZE_INT)(*((Byte**)slot_root)-*((Byte**)slot_base));
186     gc_add_root_set_entry_interior_pointer(slot_root, offset, is_pinned);
187 }

And VM in turn calls GC function gc_add_root_set_entry_interior_pointer(slot,offset,...)

IMHO, the best solution would be to use (slot,offset) everywhere,
and pre-compute and cache offsets before enumerating the base pointer.
Or even enumerate base pointer *after* all interior pointers that depend on it.
 


---------------------------------------------------------------------
Terms of use : http://incubator.apache.org/harmony/mailing.html
To unsubscribe, e-mail: harmony-dev-unsubscribe@incubator.apache.org
For additional commands, e-mail: harmony-dev-help@incubator.apache.org


Re: [drlvm] Jitrino.OPT performs incorrect GC enumeration in nested loop with array accesses

Posted by Mikhail Fursov <mi...@gmail.com>.
On 10/7/06, Weldon Washburn <we...@gmail.com> wrote:
>
> #3 might have originally been put in the JIT/GC interface because the JIT
> developers liked it.  I can't think of a reason why a GC would prefer this
> interface over #2 above.

If it could be  JIT developers decision we would like not to distinguish
between bases and mptrs at all.:) I mean to keep only
'enumerate(base_or_mptr)' method
I'm not sure the code JIT generates will be much better in this case, but
JIT internals will become simpler.


Mikhail,
> Can you ask other JIT developers if it will hurt code optimization if we
> drop #3?  Maybe its a performance hit carrying around offset instead of
> interior pointer??
>

Ok I'll ask all experts I know to participate in this discussion if they
have something to add.

-- 
Mikhail Fursov

Re: [drlvm] Jitrino.OPT performs incorrect GC enumeration in nested loop with array accesses

Posted by Weldon Washburn <we...@gmail.com>.
On 10/6/06, Ivan Volosyuk <iv...@gmail.com> wrote:
>
> On 10/6/06, Mikhail Fursov <mi...@gmail.com> wrote:
> > On 10/6/06, Ivan Volosyuk <iv...@gmail.com> wrote:
> > >
> > >
> > > I would like to have second variant implemented and corresponding
> > > (method3) interface removed as unused anymore. But, I'm not sure if
> > > this is possible. Thread stack enumeration  is just a part of root set
> > > enumeration process. Roots can be already reported (and moved) even
> > > before the jitrino enumeration starts. Will the second variant work
> > > despite of this?
> >
> >
> > Why do you think it will not work?
> > Here is the algorithm
> > JIT reports 2 types of items: 1) bases 2) mptrs + offsets
> >  1)  Once JIT report base GC uses the same algorithm as before: it moves
> the
> > object and for subsequent reports of the same base it uses 'forwarding
> > pointer' as you mentioned before:
> > " if the second reference points  to previous position of object it will
> be
> > automatically updated  according to installed forwarding pointer."
> >
> > 2) When JIT reports mptr + offset, GC deduce the old object location,
> and
> > uses the same 'forwarding pointer' to adjust the mptr
> >
> > What is wrong with this schema?
>
> If it is possible to get somehow valid offset within object then it is
> great! I would be happy to see this solved this way. Waiting for your
> patch.


Good idea!  I like it!

If I understand this completely, I agree with using only one API to report
interior pointers.  That is #2 in Mikhail's email above.  It looks like:

2. VMEXPORT void vm_enumerate_root_interior_pointer(void **slot, int offset,
Boolean is_pinned);

And remove #3 which reports object base and interior pointer from the JIT/GC
interface.

#3 might have originally been put in the JIT/GC interface because the JIT
developers liked it.  I can't think of a reason why a GC would prefer this
interface over #2 above.

Mikhail,
Can you ask other JIT developers if it will hurt code optimization if we
drop #3?  Maybe its a performance hit carrying around offset instead of
interior pointer??


For, now I have attached to JIRA HARMONY-1682 the pessimistic variant
> of fix for those who may not be interested in the discussion and wants
> to have the problem fixed (workarounded?) anyhow.
> --
> Thanks,
> Ivan
>
> >
> >
> > There is one more way to make this work. Do the same as gc_v4 does:
> > > get full root set and then do the modifications to heap.
> > >
> >
> > Yes this is the most pessimistic variant we have if all other variants
> > failed
> >
> > --
> > Mikhail Fursov
> >
> >
>
> ---------------------------------------------------------------------
> Terms of use : http://incubator.apache.org/harmony/mailing.html
> To unsubscribe, e-mail: harmony-dev-unsubscribe@incubator.apache.org
> For additional commands, e-mail: harmony-dev-help@incubator.apache.org
>
>


-- 
Weldon Washburn
Intel Middleware Products Division

Re: [drlvm] Jitrino.OPT performs incorrect GC enumeration in nested loop with array accesses

Posted by Mikhail Fursov <mi...@gmail.com>.
On 10/6/06, Ivan Volosyuk <iv...@gmail.com> wrote:
>
> If it is possible to get somehow valid offset within object then it is
> great! I would be happy to see this solved this way. Waiting for your
> patch.
>

Folks,
I put the patch
"jit_gc.diff<http://issues.apache.org/jira/secure/attachment/12342565/jit_gc.diff>"
into the JIRA and removed all use of
'vm_enumerate_root_interior_pointer_with_base' from Jitrino.
I didn't clean GC to avoid conflicts with ongoing GCV5 integration.


-- 
Mikhail Fursov

Re: [drlvm] Jitrino.OPT performs incorrect GC enumeration in nested loop with array accesses

Posted by Mikhail Fursov <mi...@gmail.com>.
On 09 Oct 2006 14:25:58 +0700, Egor Pasko <eg...@gmail.com> wrote:
>
> BTW, what is mptr? is it 'base + offset' by definition? today I am
> more about dummy questions :)
>
> 'offset' is sometimes an operand (a mem location or a register) and
> not a mere int. As in our example, when arrays are accessed. Do I get
> it right?


Egor,  the managed (or interior) pointer is a pointer to the middle of the
object. This can be a field pointer, an array element pointer or a pointer
to VM's data in object that is usually stored in the first bytes of an
object.

'offset' usually is const 'int' value if it's known during the compilationt
time. In this case  mptr points to the field of an object. Offset is a
value placed in register or in memory if mptr is a pointer into the middle
of an array and there are no constant offset exists.

In the test that fails the offset within the array is unknown during the
compilation time: it changes inside of the loop.


Is it always easy to deduce which mem address relates to which base
> and preserve that information throughout all high-level optimizations?


I think yes. HLO operations are type safe and CG relies on it.


One example that does not let me sleep well at night.
> Some optimization (strength reduction) can convert a code like this:
> for (int i = 0; i < length; i++) {
>     A[A.base + i * sizeof(elem)] = X
> }
> to something like:
> for (int i = A.base; i < A.base + length * sizeof(elem);
>      i += sizeof(elem) ) {
>     A[i] = X
> }
>
> we need to report 'i' as a base + offset. Where is the base? Reporting
> 'i' as interior (to search A.base at runtime) has been easy, but now
> it is not supposed to work. What do we offer instead?
>
> Mikhail, maybe, your strategy covers that situation well, I just want
> be happy understanding all these things.
>
> In this case we do not know the offset during the compilation time. So
Jitrino.OPT will do the following
1) Make the base (the array opnd) to live through the whole loop
2) When reporting mptr Jitrino.OPT look for the nearest known base in memory
and report it as base for the managed pointer.
Subtracting the base address from mptr address we get the offset. The bug
reason is that GCv4.1 moves base object before we found the valid offset.

-- 
Mikhail Fursov

Re: [drlvm] Jitrino.OPT performs incorrect GC enumeration in nested loop with array accesses

Posted by Egor Pasko <eg...@gmail.com>.
On the 0x1FA day of Apache Harmony Mikhail Fursov wrote:
> On 10/6/06, Ivan Volosyuk <iv...@gmail.com> wrote:
> >
> > If it is possible to get somehow valid offset within object then it is
> > great! I would be happy to see this solved this way. Waiting for your
> > patch.
> 
> 
> Yes, before reporting all objects and mptr for a method JIT is able to
> calculate offsets and use them when reporting mptrs.

BTW, what is mptr? is it 'base + offset' by definition? today I am
more about dummy questions :)

'offset' is sometimes an operand (a mem location or a register) and
not a mere int. As in our example, when arrays are accessed. Do I get
it right?

Is it always easy to deduce which mem address relates to which base
and preserve that information throughout all high-level optimizations?

One example that does not let me sleep well at night.
Some optimization (strength reduction) can convert a code like this:
for (int i = 0; i < length; i++) {
    A[A.base + i * sizeof(elem)] = X
}
to something like:
for (int i = A.base; i < A.base + length * sizeof(elem); 
     i += sizeof(elem) ) {
    A[i] = X
}

we need to report 'i' as a base + offset. Where is the base? Reporting
'i' as interior (to search A.base at runtime) has been easy, but now
it is not supposed to work. What do we offer instead?

Mikhail, maybe, your strategy covers that situation well, I just want
be happy understanding all these things.

> For, now I have attached to JIRA HARMONY-1682 the pessimistic variant
> > of fix for those who may not be interested in the discussion and wants
> > to have the problem fixed (workarounded?) anyhow.
> >
> 
> Ok.
> 
> -- 
> Mikhail Fursov

-- 
Egor Pasko, Intel Managed Runtime Division


---------------------------------------------------------------------
Terms of use : http://incubator.apache.org/harmony/mailing.html
To unsubscribe, e-mail: harmony-dev-unsubscribe@incubator.apache.org
For additional commands, e-mail: harmony-dev-help@incubator.apache.org


Re: [drlvm] Jitrino.OPT performs incorrect GC enumeration in nested loop with array accesses

Posted by Mikhail Fursov <mi...@gmail.com>.
On 10/6/06, Ivan Volosyuk <iv...@gmail.com> wrote:
>
> If it is possible to get somehow valid offset within object then it is
> great! I would be happy to see this solved this way. Waiting for your
> patch.


Yes, before reporting all objects and mptr for a method JIT is able to
calculate offsets and use them when reporting mptrs.

For, now I have attached to JIRA HARMONY-1682 the pessimistic variant
> of fix for those who may not be interested in the discussion and wants
> to have the problem fixed (workarounded?) anyhow.
>

Ok.

-- 
Mikhail Fursov

Re: [drlvm] Jitrino.OPT performs incorrect GC enumeration in nested loop with array accesses

Posted by Ivan Volosyuk <iv...@gmail.com>.
On 10/6/06, Mikhail Fursov <mi...@gmail.com> wrote:
> On 10/6/06, Ivan Volosyuk <iv...@gmail.com> wrote:
> >
> >
> > I would like to have second variant implemented and corresponding
> > (method3) interface removed as unused anymore. But, I'm not sure if
> > this is possible. Thread stack enumeration  is just a part of root set
> > enumeration process. Roots can be already reported (and moved) even
> > before the jitrino enumeration starts. Will the second variant work
> > despite of this?
>
>
> Why do you think it will not work?
> Here is the algorithm
> JIT reports 2 types of items: 1) bases 2) mptrs + offsets
>  1)  Once JIT report base GC uses the same algorithm as before: it moves the
> object and for subsequent reports of the same base it uses 'forwarding
> pointer' as you mentioned before:
> " if the second reference points  to previous position of object it will be
> automatically updated  according to installed forwarding pointer."
>
> 2) When JIT reports mptr + offset, GC deduce the old object location, and
> uses the same 'forwarding pointer' to adjust the mptr
>
> What is wrong with this schema?

If it is possible to get somehow valid offset within object then it is
great! I would be happy to see this solved this way. Waiting for your
patch.

For, now I have attached to JIRA HARMONY-1682 the pessimistic variant
of fix for those who may not be interested in the discussion and wants
to have the problem fixed (workarounded?) anyhow.
--
Thanks,
Ivan

>
>
> There is one more way to make this work. Do the same as gc_v4 does:
> > get full root set and then do the modifications to heap.
> >
>
> Yes this is the most pessimistic variant we have if all other variants
> failed
>
> --
> Mikhail Fursov
>
>

---------------------------------------------------------------------
Terms of use : http://incubator.apache.org/harmony/mailing.html
To unsubscribe, e-mail: harmony-dev-unsubscribe@incubator.apache.org
For additional commands, e-mail: harmony-dev-help@incubator.apache.org


Re: [drlvm] Jitrino.OPT performs incorrect GC enumeration in nested loop with array accesses

Posted by Mikhail Fursov <mi...@gmail.com>.
On 10/6/06, Ivan Volosyuk <iv...@gmail.com> wrote:
>
>
> I would like to have second variant implemented and corresponding
> (method3) interface removed as unused anymore. But, I'm not sure if
> this is possible. Thread stack enumeration  is just a part of root set
> enumeration process. Roots can be already reported (and moved) even
> before the jitrino enumeration starts. Will the second variant work
> despite of this?


Why do you think it will not work?
Here is the algorithm
JIT reports 2 types of items: 1) bases 2) mptrs + offsets
 1)  Once JIT report base GC uses the same algorithm as before: it moves the
object and for subsequent reports of the same base it uses 'forwarding
pointer' as you mentioned before:
" if the second reference points  to previous position of object it will be
automatically updated  according to installed forwarding pointer."

2) When JIT reports mptr + offset, GC deduce the old object location, and
uses the same 'forwarding pointer' to adjust the mptr

What is wrong with this schema?


There is one more way to make this work. Do the same as gc_v4 does:
> get full root set and then do the modifications to heap.
>

Yes this is the most pessimistic variant we have if all other variants
failed

-- 
Mikhail Fursov

Re: [drlvm] Jitrino.OPT performs incorrect GC enumeration in nested loop with array accesses

Posted by Ivan Volosyuk <iv...@gmail.com>.
On 10/6/06, Mikhail Fursov <mi...@gmail.com> wrote:
> Ivan,
> I'm agree that the problem we have is more complicated then my example. And
> I'm glad to hear that there are no problem with the scenario from my example
> today.
>
> Jitrino.OPT uses  3 interface methods to report GC root set:
>
> 1. VMEXPORT void vm_enumerate_root_reference(Managed_Object_Handle *ref,
> Boolean is_pinned);
> 2. VMEXPORT void vm_enumerate_root_interior_pointer(void **slot, int offset,
> Boolean is_pinned);
> 3. VMEXPORT void vm_enumerate_root_interior_pointer_with_base(void
> **slot_root, void **slot_base, Boolean is_pinned);
>
> The method 1 is used for bases
> The method 2 is used for mptrs with offset known at compilation time
> The method 3 is used for mptrs with dynamic offset that is computed only in
> runtime
>
> In our case JIT reports the pair: new_base and old_mptr.
> JIT does not report offset here. So if you could process this situation with
> 'forwarding' pointer we have the half of the problem solved.
>
> The other half of the problem is finding a base for the mptr with unknown
> offset.
> The current implementation in JIT looks for a base right before reporting
> the mptr: it selects the nearest one. So if base pointer was updated by GC
> the search will produce incorrect results.
>
> I see the following solutions in our problem
> 1) Fix in GC and JIT: GC is learned how to deal with mptrs reported with
> outdated base + fix in JIT in base searching algorithm (here we have a lot
> of variants)
> 2) Fix in JIT only: precache all offsets and then report. In this case JIT
> will always use only method1 and method2.
>
> I vote for the item 2 and can provide a fix in a day.
> But in this case I propose to remove the method3 from GC-JIT interface at
> all, as deprecated.

I would like to have second variant implemented and corresponding
(method3) interface removed as unused anymore. But, I'm not sure if
this is possible. Thread stack enumeration  is just a part of root set
enumeration process. Roots can be already reported (and moved) even
before the jitrino enumeration starts. Will the second variant work
despite of this?

There is one more way to make this work. Do the same as gc_v4 does:
get full root set and then do the modifications to heap.

-- 
Ivan
Intel Middleware Products Division

---------------------------------------------------------------------
Terms of use : http://incubator.apache.org/harmony/mailing.html
To unsubscribe, e-mail: harmony-dev-unsubscribe@incubator.apache.org
For additional commands, e-mail: harmony-dev-help@incubator.apache.org


Re: [drlvm] Jitrino.OPT performs incorrect GC enumeration in nested loop with array accesses

Posted by Mikhail Fursov <mi...@gmail.com>.
Ivan,
I'm agree that the problem we have is more complicated then my example. And
I'm glad to hear that there are no problem with the scenario from my example
today.

Jitrino.OPT uses  3 interface methods to report GC root set:

1. VMEXPORT void vm_enumerate_root_reference(Managed_Object_Handle *ref,
Boolean is_pinned);
2. VMEXPORT void vm_enumerate_root_interior_pointer(void **slot, int offset,
Boolean is_pinned);
3. VMEXPORT void vm_enumerate_root_interior_pointer_with_base(void
**slot_root, void **slot_base, Boolean is_pinned);

The method 1 is used for bases
The method 2 is used for mptrs with offset known at compilation time
The method 3 is used for mptrs with dynamic offset that is computed only in
runtime

In our case JIT reports the pair: new_base and old_mptr.
JIT does not report offset here. So if you could process this situation with
'forwarding' pointer we have the half of the problem solved.

The other half of the problem is finding a base for the mptr with unknown
offset.
The current implementation in JIT looks for a base right before reporting
the mptr: it selects the nearest one. So if base pointer was updated by GC
the search will produce incorrect results.

I see the following solutions in our problem
1) Fix in GC and JIT: GC is learned how to deal with mptrs reported with
outdated base + fix in JIT in base searching algorithm (here we have a lot
of variants)
2) Fix in JIT only: precache all offsets and then report. In this case JIT
will always use only method1 and method2.

I vote for the item 2 and can provide a fix in a day.
But in this case I propose to remove the method3 from GC-JIT interface at
all, as deprecated.


On 10/6/06, Ivan Volosyuk <iv...@gmail.com> wrote:
>
> This is not the same. In your example, if the second reference points
> to previous position of object it will be automatically updated
> according to installed forwarding pointer.
>
> We have different problem. The reported offset within object is
> incorrect, as the base is taken from new location of object, but
> interior pointer points to old object location.
>
> Here is algorithm.
> before GC:
> *interior_pointer == base + offset
> after GC:
> *interior_pointer_updated == new_base + offset
>
> Reported offset should be: offset = *interior_pointer - base;
> But we have: offset' = *interior_pointer - new_base;
>
> After GC interior_pointer should be updated:
>   *interior_pointer_updated = new_base + offset
> But in our case it will be:
> *interiour_pointer_updated' = new_base + offset' = new_base +
> *interior_pointer - new_base = *interior_pointer
>
> So, interior_pointer will not be updated.
> --
> Ivan
>
> On 10/6/06, Mikhail Fursov <mi...@gmail.com> wrote:
> > Ivan,
> > the problem is described in the example in H1682, I can add it to the
> email
> > thread to invite other GC/JIT gurus to participate in the discussion.
> >
> > The example:
> > JIT has 2 references to report. Both of them point to the same object.
> JIT
> > expects that both references are updated when GC moves object.
> > 1) JIT reports reference 1
> > 2) GC moves object and updates reference 1.
> > 3) JIT reports reference 2, but the reference points to the old place
> where
> > the object was.
> >
> > This is almost the same situation we have today.
> >
> >
> >
> >
> > On 10/6/06, Ivan Volosyuk <iv...@gmail.com> wrote:
> > >
> > > Why don't we update roots just when it was reported? Thus we don't
> need to
> > > keep large array of roots and it can also have positive influence on
> > > d-cache utilization (need to proof this).
> > >
> >
> > --
> > Mikhail Fursov
>
> ---------------------------------------------------------------------
> Terms of use : http://incubator.apache.org/harmony/mailing.html
> To unsubscribe, e-mail: harmony-dev-unsubscribe@incubator.apache.org
> For additional commands, e-mail: harmony-dev-help@incubator.apache.org
>
>


-- 
Mikhail Fursov

Re: [drlvm] Jitrino.OPT performs incorrect GC enumeration in nested loop with array accesses

Posted by Ivan Volosyuk <iv...@gmail.com>.
This is not the same. In your example, if the second reference points
to previous position of object it will be automatically updated
according to installed forwarding pointer.

We have different problem. The reported offset within object is
incorrect, as the base is taken from new location of object, but
interior pointer points to old object location.

Here is algorithm.
before GC:
 *interior_pointer == base + offset
after GC:
 *interior_pointer_updated == new_base + offset

Reported offset should be: offset = *interior_pointer - base;
But we have: offset' = *interior_pointer - new_base;

After GC interior_pointer should be updated:
  *interior_pointer_updated = new_base + offset
But in our case it will be:
 *interiour_pointer_updated' = new_base + offset' = new_base +
*interior_pointer - new_base = *interior_pointer

So, interior_pointer will not be updated.
--
Ivan

On 10/6/06, Mikhail Fursov <mi...@gmail.com> wrote:
> Ivan,
> the problem is described in the example in H1682, I can add it to the email
> thread to invite other GC/JIT gurus to participate in the discussion.
>
> The example:
> JIT has 2 references to report. Both of them point to the same object. JIT
> expects that both references are updated when GC moves object.
> 1) JIT reports reference 1
> 2) GC moves object and updates reference 1.
> 3) JIT reports reference 2, but the reference points to the old place where
> the object was.
>
> This is almost the same situation we have today.
>
>
>
>
> On 10/6/06, Ivan Volosyuk <iv...@gmail.com> wrote:
> >
> > Why don't we update roots just when it was reported? Thus we don't need to
> > keep large array of roots and it can also have positive influence on
> > d-cache utilization (need to proof this).
> >
>
> --
> Mikhail Fursov

---------------------------------------------------------------------
Terms of use : http://incubator.apache.org/harmony/mailing.html
To unsubscribe, e-mail: harmony-dev-unsubscribe@incubator.apache.org
For additional commands, e-mail: harmony-dev-help@incubator.apache.org


Re: [drlvm] Jitrino.OPT performs incorrect GC enumeration in nested loop with array accesses

Posted by Mikhail Fursov <mi...@gmail.com>.
Ivan,
the problem is described in the example in H1682, I can add it to the email
thread to invite other GC/JIT gurus to participate in the discussion.

The example:
JIT has 2 references to report. Both of them point to the same object. JIT
expects that both references are updated when GC moves object.
1) JIT reports reference 1
2) GC moves object and updates reference 1.
3) JIT reports reference 2, but the reference points to the old place where
the object was.

This is almost the same situation we have today.




On 10/6/06, Ivan Volosyuk <iv...@gmail.com> wrote:
>
> Why don't we update roots just when it was reported? Thus we don't need to
> keep large array of roots and it can also have positive influence on
> d-cache utilization (need to proof this).
>

-- 
Mikhail Fursov

Re: [drlvm] Jitrino.OPT performs incorrect GC enumeration in nested loop with array accesses

Posted by Ivan Volosyuk <iv...@gmail.com>.
On 10/5/06, Mikhail Fursov <mi...@gmail.com> wrote:
> Ivan, could this enumeration scenario be possible?
>
> 1) JIT reports a base to GC
> 2) GC relocates the base and updates references.
> 3) JIT reports MPTR, looks for the base, founds the patched base on the
> stack and the resulting offset becomes invalid?
>
> The previous GC (gcv4) did not relocate object during the enumeration
> process. If gcv4.1 does this is the reason of the problem.
> ?

I see. Here is the problem.

The reason of the behaviour was: Root set size can be quite large on
some workloads. It can consume a lot of memory if would keep it. Why
don't we update roots just when it was reported? Thus we don't need to
keep large array of roots and it can also have positive influence on
d-cache utilization (need to proof this).

There is a command line switch: -Dgc.remember_root_set=true which will
cause GC to retrieve root set before any operations on heap.

-- 
Ivan
Intel Middleware Products Division

---------------------------------------------------------------------
Terms of use : http://incubator.apache.org/harmony/mailing.html
To unsubscribe, e-mail: harmony-dev-unsubscribe@incubator.apache.org
For additional commands, e-mail: harmony-dev-help@incubator.apache.org


Re: [drlvm] Jitrino.OPT performs incorrect GC enumeration in nested loop with array accesses

Posted by Mikhail Fursov <mi...@gmail.com>.
Ivan, could this enumeration scenario be possible?

1) JIT reports a base to GC
2) GC relocates the base and updates references.
3) JIT reports MPTR, looks for the base, founds the patched base on the
stack and the resulting offset becomes invalid?

The previous GC (gcv4) did not relocate object during the enumeration
process. If gcv4.1 does this is the reason of the problem.
?

On 10/5/06, Ivan Volosyuk <iv...@gmail.com> wrote:
>
> On 10/5/06, Mikhail Fursov <mi...@gmail.com> wrote:
> > On 10/5/06, Ivan Volosyuk <iv...@gmail.com> wrote:
> > >
> > > Is this can be a problem? If the base pointer is optimized out and we
> > > will find different object base?
> > > --
> > > Ivan
> > >
> > > The base pointer for mptr with unknown offset must live as long as
> mptr is
> > alive.
> > The 'gcpoint' pass adds pseudo usages for such bases after every call
> > instruction where mptr with unknown offset is used.
> > So if base pointer is lost this is a bug.
>
> I can make a patch which will check this assertion in the GC enumeration
> code.
> --
> Ivan
>
> ---------------------------------------------------------------------
> Terms of use : http://incubator.apache.org/harmony/mailing.html
> To unsubscribe, e-mail: harmony-dev-unsubscribe@incubator.apache.org
> For additional commands, e-mail: harmony-dev-help@incubator.apache.org
>
>


-- 
Mikhail Fursov

Re: [drlvm] Jitrino.OPT performs incorrect GC enumeration in nested loop with array accesses

Posted by Ivan Volosyuk <iv...@gmail.com>.
On 10/5/06, Mikhail Fursov <mi...@gmail.com> wrote:
> On 10/5/06, Ivan Volosyuk <iv...@gmail.com> wrote:
> >
> > Is this can be a problem? If the base pointer is optimized out and we
> > will find different object base?
> > --
> > Ivan
> >
> > The base pointer for mptr with unknown offset must live as long as mptr is
> alive.
> The 'gcpoint' pass adds pseudo usages for such bases after every call
> instruction where mptr with unknown offset is used.
> So if base pointer is lost this is a bug.

I can make a patch which will check this assertion in the GC enumeration code.
--
Ivan

---------------------------------------------------------------------
Terms of use : http://incubator.apache.org/harmony/mailing.html
To unsubscribe, e-mail: harmony-dev-unsubscribe@incubator.apache.org
For additional commands, e-mail: harmony-dev-help@incubator.apache.org


Re: [drlvm] Jitrino.OPT performs incorrect GC enumeration in nested loop with array accesses

Posted by Mikhail Fursov <mi...@gmail.com>.
On 10/5/06, Ivan Volosyuk <iv...@gmail.com> wrote:
>
> Is this can be a problem? If the base pointer is optimized out and we
> will find different object base?
> --
> Ivan
>
> The base pointer for mptr with unknown offset must live as long as mptr is
alive.
The 'gcpoint' pass adds pseudo usages for such bases after every call
instruction where mptr with unknown offset is used.
So if base pointer is lost this is a bug.

-- 
Mikhail Fursov

Re: [drlvm] Jitrino.OPT performs incorrect GC enumeration in nested loop with array accesses

Posted by Ivan Volosyuk <iv...@gmail.com>.
On 05 Oct 2006 17:09:35 +0700, Egor Pasko <eg...@gmail.com> wrote:
> Hello, JIT & GC gurus!
>
> I'd like to share my observations on HARMONY-1862 and move the
> discussion from JIRA. Mikhail, Ivan, your opinions are extremely
> valuable!
>
> > Mikhail Fursov commented on HARMONY-1682:
> > -----------------------------------------
> >
> > Ivan, I checked the test you sent
> > The test fails and GC runs only once before the failure.
> > I moved all methods except the problem one to Jitrino.JET and
> > Jitrino.OPT reports only 2 items: one object and one [mptr,
> > object] pair with a small offset.

Well, I have little knowledge in jitrino internals, same for my
ability to read IR trees.

"The offset == MAX_INT is interpreted as unknown. In this case the
correct base for a
managed pointer is chosen in runtime: the algorithm searches for the nearest
known base."

Is this can be a problem? If the base pointer is optimized out and we
will find different object base?
--
Ivan

---------------------------------------------------------------------
Terms of use : http://incubator.apache.org/harmony/mailing.html
To unsubscribe, e-mail: harmony-dev-unsubscribe@incubator.apache.org
For additional commands, e-mail: harmony-dev-help@incubator.apache.org


Re: [drlvm] Jitrino.OPT performs incorrect GC enumeration in nested loop with array accesses

Posted by Mikhail Fursov <mi...@gmail.com>.
On 05 Oct 2006 18:35:23 +0700, Egor Pasko <eg...@gmail.com> wrote:
>
> > So once managed pointer has no known static offset (the offset is
> > MAX_INT) the base is searched in runtime as I described above.
>
> should we document this? :)


I think  we should document the whole JIT.  The question is when :) ? I hope
the internal testing framework for Jitrino.OPT compiler will quicken it.

I don't quite well understand how an object reference is not reported
> at compile-time, but found at runtime.. Sniffing refs on a stack
> frame? Hm, is there any reason I thought Jitrino is not like that?...


 The log you pointed to is the log from MPTR resolving algorithm. This
algorithms is not interested in bases.
If you want to know all items that are reported to GC you should look for
GCInfoPseudoInst that are interested by GCMap pass. GCMap pass inserts these
insts after each call only for the logging purpose. The same
GCInfoPseudoInst is inserted by 'gcpoints' pass on the beginning of code
generation to extend bases life range for managed pointers with unknown
offsets.
Doing this we can guarantee that for every managed pointer with unknown
offset we have valid base pointer at every call.

So, the only way to detect all reported objects on a call-site is to
> stop on the call-site at runtime during enumeration? Not
> impressed. Does it log to the runtime log (rt.log) currently? I tried
> "-Djit.log=rt", but it produced no result. Is that a bug?


No, try -Djit.arg.log=rt or -Djit.CS_OPT.arg.log=rt to enable runtime
logging category.


-- 
Mikhail Fursov

Re: [drlvm] Jitrino.OPT performs incorrect GC enumeration in nested loop with array accesses

Posted by Egor Pasko <eg...@gmail.com>.
On the 0x1F9 day of Apache Harmony Mikhail Fursov wrote:
> Egor,
> You are right that log contains only (mptr,base) pairs. The offset ==
> MAX_INT is interpreted as unknown. In this case the correct base for a
> managed pointer is chosen in runtime: the algorithm searches for the nearest
> known base.

aah! good thing. so the only left is "acl" that should be
reported (so GC could find it). In HLO the instruction for "acl" is:
I26:ldind.unc:o   [t15] ((t9,t13)) -) t16:cls:ClTest$MyObject

I can attach my log to JIRA if it makes sense.

> In the log for the GCMap codegenerator's stage base is always '-1' and this
> is correct (I can explain why if needed). 

If it does not bother you much, I would be happy to know more.

> So once managed pointer has no known static offset (the offset is
> MAX_INT) the base is searched in runtime as I described above.

should we document this? :)

> + Yes, I watched GC enumeration results from my debugger. The logs has only
> compile-time information.

I don't quite well understand how an object reference is not reported
at compile-time, but found at runtime.. Sniffing refs on a stack
frame? Hm, is there any reason I thought Jitrino is not like that?...

So, the only way to detect all reported objects on a call-site is to
stop on the call-site at runtime during enumeration? Not
impressed. Does it log to the runtime log (rt.log) currently? I tried
"-Djit.log=rt", but it produced no result. Is that a bug?

> Right now I'm finishing magic support patch for JET and after it's finished
> (1-2 hours) I going to switch to this problem. I hope I will help.

Great! I am obviously much slower on this part than you!

> AFAIU the optimizer's path should be
> -Djit.CS_OPT.path.optimizer=ssa
> ,purge,simplify,uce,dce,lazyexc,memopt,simplify,uce,dce,lower,dessa,statprof,markglobals
> ?
> I ask it because I want to have the same IR numbers as you have

Yes, my path is exactly as yours:
-Djit.CS_OPT.path.optimizer=ssa,purge,simplify,uce,dce,lazyexc,memopt,simplify,uce,dce,lower,dessa,statprof,markglobals

> On 05 Oct 2006 17:09:35 +0700, Egor Pasko <eg...@gmail.com> wrote:
> >
> > Mikhail,
> > a small offtopic: do you use debugger to detect that one object is
> > reported? I cannot see it in the logs. I can see only "pairs dump":

-- 
Egor Pasko, Intel Managed Runtime Division


---------------------------------------------------------------------
Terms of use : http://incubator.apache.org/harmony/mailing.html
To unsubscribe, e-mail: harmony-dev-unsubscribe@incubator.apache.org
For additional commands, e-mail: harmony-dev-help@incubator.apache.org


Re: [drlvm] Jitrino.OPT performs incorrect GC enumeration in nested loop with array accesses

Posted by Mikhail Fursov <mi...@gmail.com>.
Egor,
You are right that log contains only (mptr,base) pairs. The offset ==
MAX_INT is interpreted as unknown. In this case the correct base for a
managed pointer is chosen in runtime: the algorithm searches for the nearest
known base.

In the log for the GCMap codegenerator's stage base is always '-1' and this
is correct (I can explain why if needed). So once managed pointer has no
known static offset (the offset is MAX_INT) the base is searched in runtime
as I described above.

+ Yes, I watched GC enumeration results from my debugger. The logs has only
compile-time information.
Right now I'm finishing magic support patch for JET and after it's finished
(1-2 hours) I going to switch to this problem. I hope I will help.

AFAIU the optimizer's path should be
-Djit.CS_OPT.path.optimizer=ssa
,purge,simplify,uce,dce,lazyexc,memopt,simplify,uce,dce,lower,dessa,statprof,markglobals
?
I ask it because I want to have the same IR numbers as you have

On 05 Oct 2006 17:09:35 +0700, Egor Pasko <eg...@gmail.com> wrote:
>
> Mikhail,
> a small offtopic: do you use debugger to detect that one object is
> reported? I cannot see it in the logs. I can see only "pairs dump":
>

-- 
Mikhail Fursov