
Posted to dev@harmony.apache.org by Mikhail Fursov <mi...@gmail.com> on 2006/08/09 16:26:04 UTC

[drlvm] Helper inlining in JIT

Folks,
Now that we have decided not to limit Harmony to a single VM/JIT/GC
implementation, I think it's time to start discussing how all these
components will work together effectively.
So I propose that we discuss a helper inlining interface.

Every component we have (VM, GC, ...) can have helpers that must be called
from managed code.
While generating native code for a Java method, the JIT inserts calls to
these helpers: "get_vtable", "allocate_new_object", "monitor_enter", etc.

Some of these helpers have a fast version, a "fast path": an algorithm that
is worth inlining into the generated method body to improve overall system
performance.

Here are some examples of such helpers:
1)  new object allocation
2)  write barriers
3)  monitor enters/exits
4)  fast access to TLS data
...

The problem with inlining a "fast-path" algorithm is that the JIT must know
the details of the algorithm it inlines.
There are different ways to solve this problem, and I will try to describe
some of them. Please add more if you know a better solution.
Which one to choose, and the details of the implementation, are the subject
of this discussion.


Solution 1.
Use "magic" methods (as MMTk does) and write the fast-path version of a
helper in Java.

A "magic" method is a method that is never compiled by the JIT; instead,
the JIT replaces it with native code.
MMTk (see link 1) has a set of 'unboxed' types that make it possible to
express pointer arithmetic, pointer comparisons, casts, and memory reads and
writes, including atomic operations, in the Java language.
We can use MMTk-like 'unboxed' types, with some new operations added, to
write the fast paths of helpers in Java. The JIT can then access the
bytecode of the fast-path version of a helper and inline it like an
ordinary Java method.
What do we need to add to MMTk-like unboxed types (please correct me if I've
missed something)?
a)  Fast access to TLS.
Used to access thread-local data at runtime: GC allocation, per-thread
profiling, monitor inlining, etc.
b)  An approach to calling native methods with different calling
conventions.
Needed to call the original, slow version of a helper if the fast path fails.
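To make the idea concrete without a VM, the sketch below models a few MMTk-style unboxed operations in plain Java. It is a toy built on assumptions: a `ByteBuffer` stands in for raw memory, addresses are `int` offsets into it, and the class names (`SimMemory`, `Address`) and method names only mirror the style used in this thread's examples (`plus`, `LT`, `loadAddress`, `store`) plus a compare-and-swap called `attempt`. They are not the real `org.vmmagic` classes. In a real VM each call would be a "magic" method that the JIT replaces with one or two machine instructions.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Simulated raw memory (hypothetical): a real VM would use the process address space.
final class SimMemory {
    static final ByteBuffer RAM =
            ByteBuffer.allocate(1 << 16).order(ByteOrder.LITTLE_ENDIAN);

    private SimMemory() { }
}

// Toy unboxed address type. With real vmmagic support these methods are never
// compiled as calls; the JIT emits the corresponding machine instructions instead.
final class Address {
    private final int value; // byte offset into SimMemory.RAM

    private Address(int value) { this.value = value; }

    static Address fromInt(int value) { return new Address(value); }

    int toInt() { return value; }

    // Pointer arithmetic and comparison.
    Address plus(int bytes) { return new Address(value + bytes); }

    boolean LT(Address other) { return value < other.value; }

    // Memory reads and writes.
    int loadInt() { return SimMemory.RAM.getInt(value); }

    void store(int v) { SimMemory.RAM.putInt(value, v); }

    Address loadAddress() { return new Address(loadInt()); }

    void store(Address a) { store(a.value); }

    // Atomic compare-and-swap; the lock only makes this simulation atomic.
    boolean attempt(int expected, int newValue) {
        synchronized (SimMemory.RAM) {
            if (loadInt() != expected) {
                return false;
            }
            store(newValue);
            return true;
        }
    }
}

class UnboxedDemo {
    public static void main(String[] args) {
        Address p = Address.fromInt(64);
        p.store(42);
        Address q = p.plus(4);
        q.store(p);                                    // q now holds a "pointer" to p
        System.out.println(q.loadAddress().loadInt()); // 42
        System.out.println(p.LT(q));                   // true
    }
}
```

Because the helper body is ordinary bytecode over such types, the JIT can inline it with its usual machinery and only needs to special-case the magic calls.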


Solution 2.
Standardize the algorithms by defining a custom interface for each of them.
GC, VM, and JIT components interact with each other through intercomponent
interfaces, e.g. the OPEN ones (see link 2).
The proposal is to standardize an interface for every algorithm, or family
of algorithms, used in fast-path helper versions. If the JIT supports one of
them, it can ask the VM or GC to provide a struct with function pointers and
constants that describe the algorithm.
E.g. for a "monitor exit" fast-path algorithm that checks a bit in the object
header with an atomic operation, it's enough to give the JIT the offset of
this bit.
Another example is the fast path of the "bump pointer" allocation algorithm.
Once the JIT is able to generate code that accesses thread-local data, and
knows the offsets of the 'current' and 'limit' pointers, it can generate code
that increments the 'current' pointer, or calls the original slow helper if
the check against 'limit' fails.
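A minimal sketch of Solution 2 for the bump-pointer case, assuming a hypothetical `BumpAllocatorInfo` descriptor (the interface and its methods are invented here for illustration). The GC supplies only the thread-local offsets of 'current' and 'limit'; the allocation logic itself is fixed and known to the JIT. A `ByteBuffer` stands in for the thread-local block, and the Java method plays the role of the code the JIT would emit.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Hypothetical per-algorithm descriptor: the GC fills in constants, and the JIT,
// which already knows the bump-pointer algorithm, only needs these values.
interface BumpAllocatorInfo {
    int currentOffset(); // offset of the 'current' pointer in the thread-local block
    int limitOffset();   // offset of the 'limit' pointer in the thread-local block
}

final class BumpPointerFastPath {
    static final int SLOW_PATH = -1; // means "call the original slow helper"

    private BumpPointerFastPath() { }

    // What JIT-generated code would do, written in Java over a simulated TLS block.
    static int allocate(ByteBuffer tls, BumpAllocatorInfo gc, int size) {
        int current = tls.getInt(gc.currentOffset());
        int limit = tls.getInt(gc.limitOffset());
        if (limit - current < size) {   // not enough room left in the buffer
            return SLOW_PATH;
        }
        tls.putInt(gc.currentOffset(), current + size);
        return current;                 // the new object starts at the old 'current'
    }
}

class Solution2Demo {
    public static void main(String[] args) {
        ByteBuffer tls = ByteBuffer.allocate(64).order(ByteOrder.LITTLE_ENDIAN);
        BumpAllocatorInfo gc = new BumpAllocatorInfo() {
            public int currentOffset() { return 16; }
            public int limitOffset() { return 20; }
        };
        tls.putInt(16, 4096); // current
        tls.putInt(20, 4128); // limit: 32 bytes left
        System.out.println(BumpPointerFastPath.allocate(tls, gc, 24)); // 4096
        System.out.println(BumpPointerFastPath.allocate(tls, gc, 24)); // -1
    }
}
```

The trade-off is visible here: the GC can change offsets freely, but any change to the algorithm itself needs a new standardized descriptor.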

Solution 3.
Develop and standardize a lightweight low-level language and parse it in the
JIT to generate a helper's fast path.
In this case, when the JIT decides to inline a helper call, it asks the VM or
GC for a textual representation of the helper's fast path and parses it into
its internal representation. This solution looks exactly like Solution 1,
where Java bytecode serves as the lightweight intercomponent language. So to
accept this solution we must first answer why bytecode + "magic" methods are
not enough.
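As a rough illustration of what Solution 3 would involve on the JIT side, here is a toy parser and interpreter for an invented fast-path language. The instruction set (`load`, `store`, `add`, `jgt`, `ret`, `slow`) is hypothetical and not a proposed standard; a real JIT would translate the parsed instructions into its IR instead of interpreting them. Thread-local storage is simulated as an `int[]` indexed by word.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy textual fast-path language: one instruction per line, "name:" defines a label.
final class FastPathScript {
    private final List<String[]> code = new ArrayList<>();       // parsed instructions
    private final Map<String, Integer> labels = new HashMap<>(); // label -> instruction index

    FastPathScript(String text) {
        for (String raw : text.split("\n")) {
            String line = raw.trim();
            if (line.isEmpty()) {
                continue;
            }
            if (line.endsWith(":")) {
                labels.put(line.substring(0, line.length() - 1), code.size());
            } else {
                code.add(line.split("\\s+"));
            }
        }
    }

    // Interprets the fast path; returns the result, or -1 meaning "call the slow helper".
    int run(int[] tls, int size) {
        Map<String, Integer> reg = new HashMap<>();
        reg.put("size", size);
        int pc = 0;
        while (true) {
            String[] insn = code.get(pc++);
            switch (insn[0]) {
                case "load":  reg.put(insn[1], tls[Integer.parseInt(insn[2])]); break;
                case "store": tls[Integer.parseInt(insn[1])] = val(reg, insn[2]); break;
                case "add":   reg.put(insn[1], val(reg, insn[2]) + val(reg, insn[3])); break;
                case "jgt":   if (val(reg, insn[1]) > val(reg, insn[2])) pc = labels.get(insn[3]); break;
                case "ret":   return val(reg, insn[1]);
                case "slow":  return -1;
                default:      throw new IllegalArgumentException("unknown op " + insn[0]);
            }
        }
    }

    // An operand is either a register name or an integer literal.
    private static int val(Map<String, Integer> reg, String operand) {
        Integer r = reg.get(operand);
        return r != null ? r : Integer.parseInt(operand);
    }
}

class Solution3Demo {
    public static void main(String[] args) {
        String bumpAlloc = String.join("\n",
                "load cur 0",            // cur = tls[0] ('current')
                "load lim 1",            // lim = tls[1] ('limit')
                "add end cur size",
                "jgt end lim slow_path",
                "store 0 end",           // tls[0] = end
                "ret cur",
                "slow_path:",
                "slow");
        FastPathScript script = new FastPathScript(bumpAlloc);
        int[] tls = {4096, 4128};
        System.out.println(script.run(tls, 24)); // 4096
        System.out.println(script.run(tls, 24)); // -1
    }
}
```

Even this tiny language needs a parser, label resolution, and operand rules, which is the cost Solution 1 avoids by reusing bytecode.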


In my opinion all of these solutions will work, but the first one (Solution
1) is much more flexible.

Does anyone have other ideas on how to design 'helper inlining' to make it
easy and reusable?
Is there any limitation of the 'magic' approach that could prevent us from
using it?


Links:
1)  http://cs.anu.edu.au/~Steve.Blackburn/pubs/papers/mmtk-icse-2004.pdf
2)  http://issues.apache.org/jira/browse/HARMONY-459

--
Mikhail Fursov

Re: [drlvm] Helper inlining in JIT

Posted by Xiao-Feng Li <xi...@gmail.com>.

On 8/9/06, Mikhail Fursov <mi...@gmail.com> wrote:
> Does anyone have other ideas on how to design 'helper inlining' to make it
> easy and reusable?
> Is there any limitation of the 'magic' approach that could prevent us from
> using it?
>

I agree that writing helper fast-path code in Java is a good idea.
From a practical engineering perspective, it might be easier to first
implement those magic methods as native methods and then move on
to JIT intrinsics.

Thanks,
xiaofeng

---------------------------------------------------------------------
Terms of use : http://incubator.apache.org/harmony/mailing.html
To unsubscribe, e-mail: harmony-dev-unsubscribe@incubator.apache.org
For additional commands, e-mail: harmony-dev-help@incubator.apache.org

Re: [drlvm] Helper inlining in JIT

Posted by Xiao-Feng Li <xi...@gmail.com>.

On 8/16/06, Rana Dasgupta <rd...@gmail.com> wrote:
> Xiaofeng,
>    Thanks for the excellent description. My question was not whether partial
> inlining of helpers (or fastpath inlining) is necessary. It was more whether
> a generalized framework is absolutely necessary to support it, or whether
> there is a finite set of helper/service fastpaths that one can teach the
> jit about, so that the jit knows a priori how to generate the code for these
> inline. As far as I know, several product implementations are OK with
> teaching the jit, and their performance is quite excellent.
>    This may not be "ideal" for modularity and openness, but is it as bad as
> it sounds? If you plugged a new gc into drlvm that introduced a completely
> new helper never used before, jitrino would need to know something about its
> semantics and when to call it. So the separation is not as clean as one
> might think.

The service routines' interfaces form a finite set, and I don't think this
work necessarily needs to be a framework. If the runtime services'
interfaces are well-defined, the JIT can always inline their
implementations. Once the interface is defined, any new GC has to
follow it in order for its services to be inlined. We don't want to
introduce the full complications of dynamic interface discovery at this
moment. :-) (I guess a framework is only needed when we want to
support multiple different IRs for inlining different service routines.)
But we still have two things to consider:

1. Which component is going to make the inlining decision? (Some new GC
may not provide an inlinable fast-path version, or we may just not
want to inline for some reason.)
2. How much flexibility can we give the service routine developer? (The
service routine may call other methods implemented internally or in a
library, natively or in Java.)

>    If I am overruled and we feel that a framework is absolutely necessary,
> what you are suggesting makes sense. We could have a two-phase approach:
>    1) We code some of the helpers in Java and jitrino can inline them. We
> code the rest (unsafe) in asm and invoke them as naked or frameless calls
> (not JNI). My experience is that the cost is not in the call/ret (unless
> you mispredict the branch), but in setting up and tearing down the
> prolog/epilog, aligning the frame, etc. We could add a helper calling
> convention to the jit for efficient parameter passing.
>    2) We develop the unsafe intrinsics, and if they give us better
> performance, we can use them to progressively replace the asm helpers.

Yes, this is what I thought as well. This work can also be useful for
the MMTk integration.

>   On a different topic, if we wanted to also optimize the JNI interface at
> some point, would we then develop a different framework for it :-) ?

Perhaps. :-) Service routine inlining is special because it involves
all the components of the JVM, in both their high-level and low-level
designs. E.g., to inline gc_alloc_object, we need to define the
functional interface, the calling conventions, and the module
interactions. If we get a new native interface and a related
compilation framework, we will probably change the current design of
service inlining. My estimate is that this Java fast-path inlining work
can be finished quickly, so it is worth a try anyway. :-)

Thanks,
xiaofeng


Re: [drlvm] Helper inlining in JIT

Posted by Weldon Washburn <we...@gmail.com>.

On 8/16/06, Xiao-Feng Li <xi...@gmail.com> wrote:
> This is off-topic now. :-)  But one (crazy) idea is to compile the
> native code with JVM compiler as well so that both Java and native
> code can use same IR.

Actually, it's not really off-topic. It is not crazy. ECMA CLI does
something very similar. The unspoken challenge is to extend the Java
bytecode set so that it can express arbitrary unsafe source code. In
a sense, the discussion about inlining JNI and native code, etc. is
really a discussion about which specific features of unsafe code the
JIT should comprehend. Right now, vmmagic does a good job of
providing "C"-style address arithmetic and "void *" pointers. It's a
start, but not complete. Given the above, perhaps it makes sense to
let the JIT developers do opportunistic inlining of assembly code
snippets of DRLVM where it makes sense. That is, don't design a complex
protocol for the JIT to discover what can be inlined; simply hardcode
to the situation. Leave the real optimizing inlining to situations
where the helper is indeed expressed as standard Java source code.

>
> Thanks,
> xiaofeng
>

-- 
Weldon Washburn
Intel Middleware Products Division


Re: [drlvm] Helper inlining in JIT

Posted by Weldon Washburn <we...@gmail.com>.

On 9/7/06, Mikhail Fursov <mi...@gmail.com> wrote:
>
> On 9/7/06, Rana Dasgupta <rd...@gmail.com> wrote:
> >
> > Hi Mikhail,
> >     Sorry I left this thread for a while. Are you implementing VMMagic
> > support in .OPT currently, and prototyping with bump allocation? I am just
> > trying to understand in what order we are doing this.
>
>
> I'm implementing VMMagics and going to use them to prototype bump pointer
> allocation. I hope to finish the VMMagic OPT package in a few weeks.
>
> >   Would it be possible to list the fastpath helpers so that the java
> > interfaces to access them could be defined? Not all of them need magic
> > class support, and one of us could just write them. I don't know the list
> > of intrinsics implemented already in .JET. Can we just use them as is, and
> > what else (other than TLS access, call support) would need to be added?
>
>
> The list of the helpers with fast path to be written in Java:
> 1) object allocation
> 2) array allocation
> 3) monitor enters
> 4) monitor exits
> 5) I hope we can move some profiling helpers to Java and remove the
> knowledge about their implementation from the JIT.
> 6) write and read barriers
> 7) back branch polling helper
> 8) ? Some TI helpers ?
>
> VM gurus, do you have anything to add to this list?


This is a good list to start with. I would focus on the first four for
now, because for the most part the interface from the common, fast
code to the slow, not-so-common code is really simple and well understood.
The idea is to start with the simplest thing possible that exercises all the
vmmagic functions you want to add (call, thread-local access).

Write barriers fall into a different category. They depend on the write
barrier algorithm the GC happens to implement, and on the language the GC
happens to be written in. In other words, the integration we recently did
for MMTk probably does not apply to what Xiao Feng will be doing in GCV5.
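To illustrate why a write barrier fast path depends on the GC's algorithm, here is a sketch of one common scheme, card marking, in which every reference store dirties the card covering the updated slot. This is not a description of what MMTk or GCV5 actually do; the 512-byte card size, the int-addressed simulated heap, and the class name are assumptions for illustration. A generational or snapshot-at-the-beginning collector would need a differently shaped fast path, often with a conditional slow path.

```java
// Hypothetical card-marking write barrier over a simulated, int-addressed heap.
final class CardMarkingBarrier {
    static final int CARD_SHIFT = 9; // 512-byte cards (an assumption for this sketch)

    private final int heapBase;
    private final byte[] cardTable;

    CardMarkingBarrier(int heapBase, int heapSize) {
        this.heapBase = heapBase;
        this.cardTable = new byte[(heapSize >>> CARD_SHIFT) + 1];
    }

    // Fast path run for every reference store "obj.field = value": the algorithm
    // is a subtraction, a shift, and a byte store, with no slow path at all.
    void onReferenceStore(int slotAddress) {
        cardTable[(slotAddress - heapBase) >>> CARD_SHIFT] = 1;
    }

    // Used by the collector to find regions that may hold interesting pointers.
    boolean isDirty(int slotAddress) {
        return cardTable[(slotAddress - heapBase) >>> CARD_SHIFT] != 0;
    }
}

class BarrierDemo {
    public static void main(String[] args) {
        CardMarkingBarrier barrier = new CardMarkingBarrier(0x10000, 4096);
        barrier.onReferenceStore(0x10000 + 600);            // lands in card 1
        System.out.println(barrier.isDirty(0x10000 + 512)); // true (same card)
        System.out.println(barrier.isDirty(0x10000));       // false (card 0)
    }
}
```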

Also, on closer look, there are allocation algorithms that are not as simple
and clean as "bump the pointer". For example, a non-moving collector might
search size-segregated lists to allocate an object. The point is that it
would be good if the JIT kept the existing helper interface in addition to
the work you are proposing.
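For contrast with bump-pointer allocation, a sketch of the size-segregated case, under assumed size classes (16 to 128 bytes) and with a Java deque standing in for each free list; all names are hypothetical. It shows that the fast path has a different shape (a size-class lookup plus a list pop), so a JIT that only understands "bump the pointer" would still need the existing helper interface for collectors like this.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.List;

// Hypothetical size-segregated allocator for a non-moving collector:
// one free list of cell addresses per size class.
final class SegregatedAllocator {
    static final int SLOW_PATH = -1;                              // "call the existing slow helper"
    private static final int[] SIZE_CLASSES = {16, 32, 64, 128}; // bytes (assumption)

    private final List<ArrayDeque<Integer>> freeLists = new ArrayList<>();

    SegregatedAllocator() {
        for (int i = 0; i < SIZE_CLASSES.length; i++) {
            freeLists.add(new ArrayDeque<Integer>());
        }
    }

    // Index of the smallest size class that fits, or -1 if the object is too large.
    static int sizeClassFor(int size) {
        for (int i = 0; i < SIZE_CLASSES.length; i++) {
            if (size <= SIZE_CLASSES[i]) {
                return i;
            }
        }
        return -1;
    }

    void addFreeCell(int sizeClass, int address) {
        freeLists.get(sizeClass).push(address);
    }

    // Fast path: pop a cell from the matching list; anything unusual goes slow.
    int fastAlloc(int size) {
        int sc = sizeClassFor(size);
        if (sc < 0 || freeLists.get(sc).isEmpty()) {
            return SLOW_PATH;
        }
        return freeLists.get(sc).pop();
    }
}

class SegregatedDemo {
    public static void main(String[] args) {
        SegregatedAllocator alloc = new SegregatedAllocator();
        alloc.addFreeCell(1, 2048);              // one free 32-byte cell
        System.out.println(alloc.fastAlloc(20)); // 2048
        System.out.println(alloc.fastAlloc(20)); // -1: list empty, use the slow path
    }
}
```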


> + I think that TLS and call support are enough.
>
>
> > BTW, there may be a small omission in the example below... if I am reading
> > this right...
>
>
> IMO "newCurrent = oldCurrent + size" is OK. Maybe the source of the
> problem is the variable name, i.e. 'ceiling' is always >= 'current' in my example.
>
> Thanks,
> > Rana
> >
> > On 8/28/06, Mikhail Fursov <mi...@gmail.com> wrote:
> >
> > > Folks,
> > > Here is the example of fast allocation helper written in Java with the
> > > help
> > > of VMMagic
> > > If nobody objects I'm starting to implement VMMagic support in
> > > Jitrino.OPTthis week.
> > >
> > >
> > >
> > > private static final int GC_TLS_OFFSET = 10;
> > > private static final int GC_CURRENT_OFFSET= GC_TLS_OFFSET + 0;
> > > private static final int GC_CEILING_OFFSET= GC_TLS_OFFSET + 4;
> > > private static final int OBJ_VTABLE_OFFSET = 0;
> > >
> > > //annotate with calling convention and real VM helper id/name
> > information
> > > private static Address slowAlloc(int vtable, int size) {throw new
> > > Error("must never be called!");}
> > >
> > > private static Address fastAlloc(int vtable, int size) {
> > >    Address tlsBase = TLS.getAddress();  //load thread local client
> area
> > > address
> > >
> > >    Address currentFieldAddress = tlsBase.plus(GC_CURRENT_OFFSET);
> > >    Address ceilingFieldAddress = tlsBase.plus(GC_CEILING_OFFSET);
> > >
> > >    Address newObjectAddress; //the result of the method
> > >
> > >    // check if there is enough size to do allocation in thread local
> > > buffer
> > >    Address current = currentFieldAddress.loadAddress();
> > >    Address ceiling = ceilingFieldAddress.loadAddress();
> > >    Address newCurrent = current.plus(size);
> > >    if (newCurrent.LT(ceiling)) {
> >
> >
> > >>    newCurrent = newCurrent.plus(-size);
> >
> >        currentFieldAddress.store(newCurrent.toWord());
> > >        newObjectAddress = newCurrent;
> > >        newObjectAddress.store(vtable, Offset.fromInt
> > (OBJ_VTABLE_OFFSET));
> > >
> > >    } else {
> > >        newObjectAddress = slowAlloc(vtable, size);
> > >    }
> > >    return newObjectAddress;
> > > }
> > >
> > > --
> > > Mikhail Fursov
> > >
> > >
> >
> >
>
>
> --
> Mikhail Fursov
>
>


-- 
Weldon Washburn
Intel Middleware Products Division

Re: [drlvm] Helper inlining in JIT

Posted by Mikhail Fursov <mi...@gmail.com>.

On 9/7/06, Rana Dasgupta <rd...@gmail.com> wrote:
>
> Hi Mikhail,
>     Sorry I left this thread for a while. Are you implementing VMMagic
> support in .OPT currently, and prototyping with bump allocation? I am just
> trying to understand in what order we are doing this.


I'm implementing VMMagics and going to use them to prototype bump pointer
allocation. I hope to finish the VMMagic OPT package in a few weeks.

>    Would it be possible to list the fastpath helpers so that the java
> interfaces to access them could be defined? Not all of them need magic
> class support, and one of us could just write them. I don't know the list
> of intrinsics implemented already in .JET. Can we just use them as is, and
> what else (other than TLS access, call support) would need to be added?


The list of the helpers with fast path to be written in Java:
1) object allocation
2) array allocation
3) monitor enters
4) monitor exits
5) I hope we can move some profiling helpers to Java and remove the
knowledge about their implementation from the JIT.
6) write and read barriers
7) back branch polling helper
8) ? Some TI helpers ?

VM gurus, do you have anything to add to this list?

+ I think that TLS and call support are enough.
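Monitor enter/exit is a good example of a fast path that needs only the features listed here: an atomic operation on the object header plus a call to the slow helper. Below is a sketch of a thin-lock fast path, assuming a hypothetical lock-word layout (owner thread id in bits 8 and up, recursion count in bits 0..7, zero meaning unlocked); an `AtomicInteger` stands in for the header word. It is not DRLVM's actual lock layout. Returning `false` means "call the slow helper", which would handle contention and lock inflation.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical thin lock over a simulated object-header lock word.
// Assumed layout: bits 8..31 = owner thread id (non-zero), bits 0..7 = recursion count.
final class ThinLock {
    private final AtomicInteger lockWord = new AtomicInteger(0); // 0 = unlocked

    // Fast path of monitor_enter; false means "call the slow helper".
    boolean fastEnter(int threadId) {
        int mine = threadId << 8;
        if (lockWord.compareAndSet(0, mine | 1)) {
            return true;                         // uncontended acquire
        }
        int w = lockWord.get();
        if ((w & ~0xFF) == mine && (w & 0xFF) < 0xFF) {
            // Other threads only CAS from 0, so the owner may use a plain write here.
            lockWord.set(w + 1);                 // recursive acquire by the owner
            return true;
        }
        return false;                            // contended, or count would overflow
    }

    // Fast path of monitor_exit; false means "call the slow helper".
    boolean fastExit(int threadId) {
        int w = lockWord.get();
        if ((w & ~0xFF) != (threadId << 8)) {
            return false;                        // not held by this thread
        }
        lockWord.set((w & 0xFF) == 1 ? 0 : w - 1);
        return true;
    }
}

class ThinLockDemo {
    public static void main(String[] args) {
        ThinLock lock = new ThinLock();
        System.out.println(lock.fastEnter(5)); // true: acquired
        System.out.println(lock.fastEnter(6)); // false: contended, go slow
        System.out.println(lock.fastExit(5));  // true: released
        System.out.println(lock.fastEnter(6)); // true: now free
    }
}
```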


> BTW, there may be a small omission in the example below... if I am reading
> this right...


IMO "newCurrent = oldCurrent + size" is OK. Maybe the source of the problem
is the variable name, i.e. 'ceiling' is always >= 'current' in my example.



-- 
Mikhail Fursov

Re: [drlvm] Helper inlining in JIT

Posted by Rana Dasgupta <rd...@gmail.com>.

Hi Mikhail,
    Sorry I left this thread for a while. Are you implementing VMMagic
support in .OPT currently, and prototyping with bump allocation? I am just
trying to understand in what order we are doing this.
   Would it be possible to list the fastpath helpers so that the java
interfaces to access them could be defined? Not all of them need magic
class support, and one of us could just write them. I don't know the list
of intrinsics implemented already in .JET. Can we just use them as is, and
what else (other than TLS access, call support) would need to be added?
  BTW, there may be a small omission in the example below... if I am reading
this right...

Thanks,
Rana

On 8/28/06, Mikhail Fursov <mi...@gmail.com> wrote:

> Folks,
> Here is an example of a fast allocation helper written in Java with the
> help of VMMagic.
> If nobody objects, I'll start implementing VMMagic support in
> Jitrino.OPT this week.
>
>
>
> private static final int GC_TLS_OFFSET = 10;
> private static final int GC_CURRENT_OFFSET= GC_TLS_OFFSET + 0;
> private static final int GC_CEILING_OFFSET= GC_TLS_OFFSET + 4;
> private static final int OBJ_VTABLE_OFFSET = 0;
>
> // Annotate with the calling convention and the real VM helper id/name.
> private static Address slowAlloc(int vtable, int size) {
>     throw new Error("must never be called!");
> }
>
> private static Address fastAlloc(int vtable, int size) {
>    Address tlsBase = TLS.getAddress();  // load thread-local client area address
>
>    Address currentFieldAddress = tlsBase.plus(GC_CURRENT_OFFSET);
>    Address ceilingFieldAddress = tlsBase.plus(GC_CEILING_OFFSET);
>
>    Address newObjectAddress; //the result of the method
>
>    // check if there is enough space to allocate in the thread-local buffer
>    Address current = currentFieldAddress.loadAddress();
>    Address ceiling = ceilingFieldAddress.loadAddress();
>    Address newCurrent = current.plus(size);
>    if (newCurrent.LT(ceiling)) {
>>       newCurrent = newCurrent.plus(-size);
>        currentFieldAddress.store(newCurrent.toWord());
>        newObjectAddress = newCurrent;
>        newObjectAddress.store(vtable, Offset.fromInt(OBJ_VTABLE_OFFSET));
>
>    } else {
>        newObjectAddress = slowAlloc(vtable, size);
>    }
>    return newObjectAddress;
> }
>
> --
> Mikhail Fursov
>
>

Re: [drlvm] Helper inlining in JIT

Posted by Weldon Washburn <we...@gmail.com>.

On 8/28/06, Mikhail Fursov <mi...@gmail.com> wrote:
> Folks,
> Here is the example of fast allocation helper written in Java with the help
> of VMMagic
> If nobody objects I'm starting to implement VMMagic support in
> Jitrino.OPTthis week.

I like it!  It makes sense.  No objections to what you propose.
  - Weldon



-- 
Weldon Washburn
Intel Middleware Products Division


Re: [drlvm] Helper inlining in JIT

Posted by Mikhail Fursov <mi...@gmail.com>.

Hi Xiao-Feng,
I think that the MMTk unboxed package + native call support + TLS access
magics are enough for all the helpers I know.
Do you know of any examples that would extend this list?

On 8/29/06, Xiao-Feng Li <xi...@gmail.com> wrote:
>
> Fursov, which intrinsics do you want to implement? The code you
> gave below has only a few examples. I think it would be a
> good idea to define the intrinsics well before coding them.
>
> Thanks,
> xiaofeng
>

-- 
Mikhail Fursov

Re: [drlvm] Helper inlining in JIT

Posted by Xiao-Feng Li <xi...@gmail.com>.

Fursov, which intrinsics do you want to implement? The code you
gave below has only a few examples. I think it would be a
good idea to define the intrinsics well before coding them.

Thanks,
xiaofeng

On 8/28/06, Mikhail Fursov <mi...@gmail.com> wrote:
> Folks,
> Here is an example of a fast allocation helper written in Java with the help
> of VMMagic.
> If nobody objects, I'll start implementing VMMagic support in
> Jitrino.OPT this week.
>
>
>
> private static final int GC_TLS_OFFSET = 10;
> private static final int GC_CURRENT_OFFSET= GC_TLS_OFFSET + 0;
> private static final int GC_CEILING_OFFSET= GC_TLS_OFFSET + 4;
> private static final int OBJ_VTABLE_OFFSET = 0;
>
> // Annotate with the calling convention and the real VM helper id/name.
> private static Address slowAlloc(int vtable, int size) {
>     throw new Error("must never be called!");
> }
>
> private static Address fastAlloc(int vtable, int size) {
>     Address tlsBase = TLS.getAddress();  // load thread-local client area address
>     Address currentFieldAddress = tlsBase.plus(GC_CURRENT_OFFSET);
>     Address ceilingFieldAddress = tlsBase.plus(GC_CEILING_OFFSET);
>
>     Address newObjectAddress; //the result of the method
>
>     // check if there is enough space to allocate in the thread-local buffer
>     Address current = currentFieldAddress.loadAddress();
>     Address ceiling = ceilingFieldAddress.loadAddress();
>     Address newCurrent = current.plus(size);
>     if (newCurrent.LT(ceiling)) {
>         currentFieldAddress.store(newCurrent.toWord());
>         newObjectAddress = newCurrent;
>         newObjectAddress.store(vtable, Offset.fromInt(OBJ_VTABLE_OFFSET));
>
>     } else {
>         newObjectAddress = slowAlloc(vtable, size);
>     }
>     return newObjectAddress;
> }
>
> --
> Mikhail Fursov
>
>


Re: [drlvm] Helper inlining in JIT

Posted by Mikhail Fursov <mi...@gmail.com>.

Folks,
Here is an example of a fast allocation helper written in Java with the help
of VMMagic.
If nobody objects, I'll start implementing VMMagic support in
Jitrino.OPT this week.



private static final int GC_TLS_OFFSET = 10;
private static final int GC_CURRENT_OFFSET = GC_TLS_OFFSET + 0;
private static final int GC_CEILING_OFFSET = GC_TLS_OFFSET + 4;
private static final int OBJ_VTABLE_OFFSET = 0;

// Annotate with the calling convention and the real VM helper id/name.
private static Address slowAlloc(int vtable, int size) {
    throw new Error("must never be called!");
}

private static Address fastAlloc(int vtable, int size) {
    Address tlsBase = TLS.getAddress();  // load thread-local client area address

    Address currentFieldAddress = tlsBase.plus(GC_CURRENT_OFFSET);
    Address ceilingFieldAddress = tlsBase.plus(GC_CEILING_OFFSET);

    Address newObjectAddress; //the result of the method

    // check if there is enough space to allocate in the thread-local buffer
    Address current = currentFieldAddress.loadAddress();
    Address ceiling = ceilingFieldAddress.loadAddress();
    Address newCurrent = current.plus(size);
    if (newCurrent.LT(ceiling)) {
        currentFieldAddress.store(newCurrent.toWord());
        newObjectAddress = newCurrent;
        newObjectAddress.store(vtable, Offset.fromInt(OBJ_VTABLE_OFFSET));

    } else {
        newObjectAddress = slowAlloc(vtable, size);
    }
    return newObjectAddress;
}
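The helper above can be replayed in plain Java to check its logic before VMMagic support exists. In the sketch below (a simulation, not VMMagic code) a `ByteBuffer` stands in for memory, addresses are `int` offsets, the offset constants are the illustrative values from the post, and returning `SLOW_PATH` stands in for calling `slowAlloc`. One detail worth noting: the object is placed at the old `current` value, because `newCurrent` points one past the end of the new object; this appears to be the omission Rana raises elsewhere in the thread. The sketch also accepts an exact fit (`<=` instead of `LT`), which is a choice made for the sketch rather than part of the post.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Plain-Java simulation of fastAlloc: memory is a ByteBuffer, addresses are int offsets.
final class BumpAllocSim {
    static final int GC_TLS_OFFSET = 10;
    static final int GC_CURRENT_OFFSET = GC_TLS_OFFSET + 0;
    static final int GC_CEILING_OFFSET = GC_TLS_OFFSET + 4;
    static final int OBJ_VTABLE_OFFSET = 0;
    static final int SLOW_PATH = -1; // stands in for "return slowAlloc(vtable, size)"

    final ByteBuffer mem = ByteBuffer.allocate(1 << 12).order(ByteOrder.LITTLE_ENDIAN);

    int fastAlloc(int tlsBase, int vtable, int size) {
        int currentField = tlsBase + GC_CURRENT_OFFSET;
        int ceilingField = tlsBase + GC_CEILING_OFFSET;

        int current = mem.getInt(currentField);
        int ceiling = mem.getInt(ceilingField);
        int newCurrent = current + size;
        if (newCurrent <= ceiling) {              // an exact fit still fits
            mem.putInt(currentField, newCurrent); // bump the pointer
            int newObject = current;              // object starts at the OLD current
            mem.putInt(newObject + OBJ_VTABLE_OFFSET, vtable);
            return newObject;
        }
        return SLOW_PATH;
    }
}

class BumpAllocDemo {
    public static void main(String[] args) {
        BumpAllocSim sim = new BumpAllocSim();
        int tlsBase = 0;
        sim.mem.putInt(tlsBase + BumpAllocSim.GC_CURRENT_OFFSET, 1024);
        sim.mem.putInt(tlsBase + BumpAllocSim.GC_CEILING_OFFSET, 1056);
        System.out.println(sim.fastAlloc(tlsBase, 0xCAFE, 16)); // 1024
        System.out.println(sim.fastAlloc(tlsBase, 0xCAFE, 16)); // 1040
        System.out.println(sim.fastAlloc(tlsBase, 0xCAFE, 16)); // -1
    }
}
```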

-- 
Mikhail Fursov

Re: [drlvm] Helper inlining in JIT

Posted by Rana Dasgupta <rd...@gmail.com>.

Xiao Feng,
  Yes, exactly; please see below...


On 8/16/06, Xiao-Feng Li <xi...@gmail.com> wrote:
>
> On 8/17/06, Rana Dasgupta <rd...@gmail.com> wrote:


      Adding:
        - We will define one or more interfaces for grouping the fastpaths.
These will consist of methods that compatible VMs and GCs will implement.


> >    - So we will write the inlinable fastpaths wherever possible in pure
> >    Java, using an annotated calling convention to call the slowpath (to
> >    support developer freedom :-) ).
> >    - Where the fastpaths cannot be expressed in pure Java, we will
> >    first use asm to develop the helper and a custom calling convention to
> >    invoke and test it
> >    - As and when the magic classes are all available (I have not seen
> >    Alex and Weldon's code), we will switch the second set above to Java +
> >    magic and start inlining these as well
> >    - We can start with the new object allocation helper and .Jet if we
> >    want to, I guess
> >    - For folks who are interested in this, the core helpers live in
> >    vmcore\src\jit\jit_runtime_support.cpp and the exports in
> >    vmcore\include\jit_export.h.  There is platform specific stuff under
> >    vmcore\util\[platform]\base
> >
>
> Thanks,


   Rana
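One way to express an "annotated calling convention" for slow paths is an ordinary Java annotation that the JIT reads when it inlines a fast path. The sketch below is hypothetical throughout: the annotation `SlowPathHelper`, its elements, the `CallingConvention` values, and the helper name `vm_alloc_slow` are invented for illustration and are not DRLVM or Jitrino APIs. Reflection stands in for the JIT reading class metadata.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Method;

// Hypothetical set of conventions a JIT might support for helper calls.
enum CallingConvention { STDCALL, CDECL, FASTCALL }

// Hypothetical marker: "calls to this Java stub are really calls to a VM helper".
@Retention(RetentionPolicy.RUNTIME)
@Target(ElementType.METHOD)
@interface SlowPathHelper {
    String helperName();
    CallingConvention convention();
}

final class AllocHelpers {
    private AllocHelpers() { }

    @SlowPathHelper(helperName = "vm_alloc_slow", convention = CallingConvention.STDCALL)
    static long slowAlloc(int vtable, int size) {
        throw new Error("must never be called!"); // the JIT emits a native call instead
    }

    // What a JIT could do when it meets a call to methodName: look up the annotation.
    static SlowPathHelper helperInfo(String methodName) {
        for (Method m : AllocHelpers.class.getDeclaredMethods()) {
            if (m.getName().equals(methodName)) {
                return m.getAnnotation(SlowPathHelper.class);
            }
        }
        return null;
    }
}

class AnnotationDemo {
    public static void main(String[] args) {
        SlowPathHelper info = AllocHelpers.helperInfo("slowAlloc");
        System.out.println(info.helperName() + " " + info.convention()); // vm_alloc_slow STDCALL
    }
}
```

Keeping the binding in metadata means the Java source of a fast path stays compilable by any Java compiler; only the JIT needs to understand the annotation.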

Re: [drlvm] Helper inlining in JIT

Posted by Xiao-Feng Li <xi...@gmail.com>.

On 8/17/06, Rana Dasgupta <rd...@gmail.com> wrote:
>    -  So we will write the inlinable  fastpaths wherever possible in pure
>    Java, using an annotated calling convention to call the slowpath ( to
>    support developer freedom :-) ).
>    - Where the fastpaths cannot be expressed in pure Java, we will
>    first use asm to develop the helper and a custom calling convention to
>    invoke and test it
>    - As and when the magic classes are all available( I have not seen
>    Alex and Weldon's code ), we will switch the second set  above to Java +
>    magic and start inlining these as well
>    - We can start with the new object allocation helper and .Jet if we
>    want to, I guess
>    - For folks who are interested in this, the core helpers live in
>    vmcore\src\jit\jit_runtime_support.cpp and the exports in
>    vmcore\include\jit_export.h.  There is platform specific stuff under
>    vmcore\util\[platform]\base
>

These make sense. My only addition: if we want the work to be easily reused
by other GCs, the interfaces for the fast-path routines should be
defined. If the set is small (I think it is), we can define one or
more Java interfaces including all the methods; then any compatible GC
can implement these interfaces to get the benefit of inlining. The
interfaces can be put into the kernel class directory, and the
implementation code will stay with the GC. (I am using the GC here only as
an example.)
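A minimal sketch of that idea, assuming a hypothetical `GCFastPaths` interface in the kernel classes (the name and methods are invented here; a real interface would likely use VMMagic types such as `Address` rather than `int`). Each GC supplies its own implementation, and a GC that offers no inlinable fast path can simply always answer "use the slow helper".

```java
// Hypothetical kernel-class interface grouping GC fast paths.
interface GCFastPaths {
    int SLOW_PATH = -1; // "call the GC's original slow helper"

    int allocate(int vtable, int size); // object address, or SLOW_PATH

    void writeBarrier(int slotAddress); // may be a no-op
}

// A bump-pointer GC (simplified: no TLS, vtable store omitted).
final class BumpPointerGC implements GCFastPaths {
    private int current;
    private final int limit;

    BumpPointerGC(int start, int end) {
        current = start;
        limit = end;
    }

    @Override
    public int allocate(int vtable, int size) {
        if (limit - current < size) {
            return SLOW_PATH;
        }
        int obj = current;
        current += size;
        return obj;
    }

    @Override
    public void writeBarrier(int slotAddress) { } // assumed: this simple GC needs no barrier
}

// A GC that provides no inlinable fast path: every call goes to the slow helper.
final class NoFastPathGC implements GCFastPaths {
    @Override
    public int allocate(int vtable, int size) { return SLOW_PATH; }

    @Override
    public void writeBarrier(int slotAddress) { }
}

class FastPathInterfaceDemo {
    public static void main(String[] args) {
        GCFastPaths[] gcs = { new BumpPointerGC(0, 32), new NoFastPathGC() };
        for (GCFastPaths gc : gcs) {
            System.out.println(gc.allocate(1, 16)); // 0, then -1
        }
    }
}
```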

Thanks,
xiaofeng


Re: [drlvm] Helper inlining in JIT

Posted by Weldon Washburn <we...@gmail.com>.

On 8/16/06, Rana Dasgupta <rd...@gmail.com> wrote:
> On 8/16/06, Xiao-Feng Li <xi...@gmail.com> wrote:
>
> > On 8/16/06, Mikhail Fursov <mi...@gmail.com> wrote:
> > >>
> > >> So why can't we optimize native (JNI) calls from Java code using
> > >> annotations similar to those used to annotate the helper's slow call
> > >> above? Once a developer annotates a native call with information about
> > >> its side effects, we can optimize the call and reduce JNI overhead.
> > >> ?
> >
> > >This is off-topic now. :-)  But one (crazy) idea is to compile the
> > >native code with JVM compiler as well so that both Java and native
> > >code can use same IR.
>
>
>  Let's skip JNI optimization on this thread, I am sorry I brought it up.
>
>   -  So we will write the inlinable  fastpaths wherever possible in pure
>   Java, using an annotated calling convention to call the slowpath ( to
>   support developer freedom :-) ).
>   - Where the fastpaths cannot be expressed in pure Java, we will
>   first use asm to develop the helper and a custom calling convention to
>   invoke and test it

Yes. I agree with this approach. Note that the assembler syntax is
different between Windows and Linux. Writing this sticky code twice
is not a good idea. Back when I did the original port of ORP to
Linux, I borrowed the code emitter from the JIT. Then I wrote basically
portable asm code that would compile at boot time and run equally well on
both Linux and Windows. LIL was created later to try to remove the asm. I
would vote for removing LIL entirely. It seems more productive to
focus energies on moving to Java than on fixing up LIL.

>   - As and when the magic classes are all available (I have not seen
>   Alex and Weldon's code), we will switch the second set above to Java +
>   magic and start inlining these as well

Actually, you really want two pieces.  First, the JIRA patches that
make Jitrino.JET do the correct thing with vmmagic classes.  See
http://issues.apache.org/jira/browse/HARMONY-816.  Second, the
dummy vmmagic classes.  You write code that calls these classes, then
compile it to bytecode.  When Jitrino.JET compiles this bytecode, it will
do "funny stuff" with the calls to the vmmagic classes: instead of
emitting ordinary calls, it replaces them with the corresponding native code.

The dummy vmmagic classes you need are contained in:
http://cs.anu.edu.au/~Steve.Blackburn/private/mmtk-20060714.jar

Incidentally, the code I am working on, the MMTk port, will be
interesting to you, but I suggest you look at it later.  The first
focus should be understanding how vmmagic works and what you can do
with it.

I have been trying to get vmmagic regression tests donated to open
source for a while.  These tests should help you with code
development.

>   - We can start with the new object allocation helper and .Jet if we
>   want to, I guess
>   - For folks who are interested in this, the core helpers live in
>   vmcore\src\jit\jit_runtime_support.cpp and the exports in
>   vmcore\include\jit_export.h.  There is platform specific stuff under
>   vmcore\util\[platform]\base
>
> Thanks,
> Rana
>


-- 
Weldon Washburn
Intel Middleware Products Division


Re: [drlvm] Helper inlining in JIT

Posted by Rana Dasgupta <rd...@gmail.com>.

On 8/16/06, Xiao-Feng Li <xi...@gmail.com> wrote:

> On 8/16/06, Mikhail Fursov <mi...@gmail.com> wrote:
> >>
> >> So why can't we optimize native (JNI) calls from Java code using
> >> annotations similar to those used to annotate the helper's slow call
> >> above? Once a developer annotates a native call with information
> >> about its side effects, we can optimize the call and reduce JNI
> >> overhead.
> >> ?
>
> >This is off-topic now. :-)  But one (crazy) idea is to compile the
> >native code with JVM compiler as well so that both Java and native
> >code can use same IR.


 Let's skip JNI optimization on this thread; I am sorry I brought it up.

   - So we will write the inlinable fastpaths wherever possible in pure
   Java, using an annotated calling convention to call the slowpath (to
   support developer freedom :-) ).
   - Where the fastpaths cannot be expressed in pure Java, we will
   first use asm to develop the helper and a custom calling convention to
   invoke and test it.
   - As and when the magic classes are all available (I have not seen
   Alex and Weldon's code), we will switch the second set above to Java +
   magic and start inlining these as well.
   - We can start with the new object allocation helper and .Jet if we
   want to, I guess
   - For folks who are interested in this, the core helpers live in
   vmcore\src\jit\jit_runtime_support.cpp and the exports in
   vmcore\include\jit_export.h.  There is platform specific stuff under
   vmcore\util\[platform]\base

Thanks,
Rana




Re: [drlvm] Helper inlining in JIT

Posted by Xiao-Feng Li <xi...@gmail.com>.

On 8/16/06, Mikhail Fursov <mi...@gmail.com> wrote:
> On 8/16/06, Xiao-Feng Li <xi...@gmail.com> wrote:
> >
> >
> > > AFAIU it's enough to annotate JNI method with calling convention
> > > details and to support it in JIT and VM. So I see no difference in
> > > implementation with helper inlining here. Just an extension or
> > > another version of helper inlining mechanism.
> > > Am I missing something about JNI support?
> >
> > Yes, I guess you were missing something. ;-)
> > The service routine inlining is to attack the performance issue
> > brought when Java code calls into VM native services. If the JNI is
> > not a problem, this Java inlining can be not very interesting.
>
>
> So why can't we optimize native (JNI) calls from Java code using annotations
> similar to those used to annotate the helper's slow call above?
> Once a developer annotates a native call with information about its side
> effects, we can optimize the call and reduce JNI overhead.
> ?

This is off-topic now. :-)  But one (crazy) idea is to compile the
native code with JVM compiler as well so that both Java and native
code can use same IR.

Thanks,
xiaofeng


Re: [drlvm] Helper inlining in JIT

Posted by Mikhail Fursov <mi...@gmail.com>.

On 8/16/06, Xiao-Feng Li <xi...@gmail.com> wrote:
>
>
> > AFAIU it's enough to annotate JNI method with calling convention
> > details and to support it in JIT and VM. So I see no difference in
> > implementation with helper inlining here. Just an extension or another
> > version of helper inlining mechanism.
> > Am I missing something about JNI support?
>
> Yes, I guess you were missing something. ;-)
> The service routine inlining is to attack the performance issue
> brought when Java code calls into VM native services. If the JNI is
> not a problem, this Java inlining can be not very interesting.


So why can't we optimize native (JNI) calls from Java code using annotations
similar to those used to annotate the helper's slow call above?
Once a developer annotates a native call with information about its side
effects, we can optimize the call and reduce JNI overhead.
?

-- 
Mikhail Fursov

Re: [drlvm] Helper inlining in JIT

Posted by Xiao-Feng Li <xi...@gmail.com>.

Mikhail, I wrote my reply before reading yours. :-)

On 8/16/06, Mikhail Fursov <mi...@gmail.com> wrote:
> The separation is not clean if there is a helper that we can't write in
> Java. Once we added TLS and native calls support to Java I can't imagine
> this helper today, but I'm still looking for it :)

I guess Rana's question on the framework is more about the decision making
for inlining and about the service routine interface. I replied to that in
my last email.

> Yes, this is a good point to start. The information about native method's
> calling convention (slow helper)  can be stored as annotation for the
> method. Here is an example from Xiao-Feng:
>
> Reference object_alloc(int size, Address vtable ) {
>       Reference result = ThreadLocalBlock().alloc(size) ;
>       if (!result.isNull()) {
>            result.setAddressAtOffset(vtable, VTABLE_OFFSET_IN_OBJECT);
>            return result;
>       }
>       //native slow path
>       return object_alloc_slowpath(size, vtable);
> }
>
> and object_alloc_slowpath() is the native version of the slow helper.
>
> We can add Java method to the same Java class:
> private static Reference object_alloc_slowpath(int size, Address vtable)
> and annotate it something like this
> @NativeHelper(helperName="slowAlloc", callingConvention="cdecl",
> enumerationPoint=true, ...)

Good! This is related to a point in my last email about "freedom to
the developer". Annotations can be used to make JIT inlining much
easier.

>
> What is the difference of unsafe intrinsic and asm helper?
>

Good question. :-) I think the difference is more conceptual, a matter of
design logic.

1. An intrinsic is a JIT-domain entity, while a helper in DRLVM is a VMCore
entity. An intrinsic can be written in whatever IR the JIT prefers.
2. Intrinsic code provider. Thread-local storage is more a VM concept,
but it is used by the JIT, and a JIT developer is unlikely to want to know
how TLS works in the underlying OSes. As stated in 1 above, we prefer the
JIT to hold all the intrinsics in its domain, which means we need some
agreement between VM and JIT for TLS access.

In reality, they can be very similar. :-)

> AFAIU  it's  enough to annotate JNI method with calling convention details
> and to support it in JIT and VM.  So I see no difference in implementation
> with helper inlining here. Just an extension or another version of helper
> inlining mechanism.
> Am I missing something about JNI support?

Yes, I guess you were missing something. ;-)
The service routine inlining is meant to attack the performance cost
incurred when Java code calls into VM native services. If JNI overhead were
not a problem, this Java inlining would not be very interesting.

> AFAIK Weldon and Alex made a lot of work related to support of unboxed types
> in DRLVM.
> We can reuse their work to prototype the inlining of the first helper's fast
> path in Jitrino.JET (it's easier) and after it succeed port the code to
> Jitrino.OPT compiler.

Agreed. I mentioned the same thing in my last email.

> Is new object allocation helper OK to start from? Any other ideas which
> helper to use to start?

I think it's a good idea.

Thanks,
xiaofeng


Re: [drlvm] Helper inlining in JIT

Posted by Mikhail Fursov <mi...@gmail.com>.

Rana,

On 8/16/06, Rana Dasgupta <rd...@gmail.com> wrote:
>
> Or whether there is a finite set of helpers/services fastpaths that one
> can teach the jit about, so that the jit knows a priori how to generate
> the code for these inline. As far as I know, several product
> implementations are OK with teaching the jit, and their performance is
> quite excellent.
>    This may not be "ideal" for modularity and openness, but is it as bad
> as it sounds? If you plugged a new gc into drlvm that introduced a
> completely new helper never used before, jitrino would need to know
> something about its semantics and when to call it. So the separation is
> not as clean as one might think.

I also thought about it; Solution 2 in my initial letter is almost the
same.
This solution requires modifying the JIT every time you need a new helper
implementation, and it limits a developer. E.g. a GC or ThreadManager
developer has to know JIT internals or depend on some JIT guru to add the
fast-path support to the JIT.
When you have several VM/GC/TM distributions, the JIT may become bloated
with a lot of helpers to support. Done this way, we share code between the
JIT and another component, while we could avoid it and keep the fast-path
version of the helper next to the helper itself rather than in a separate
component.
IMO Solution 2 is good for closed VMs like Sun's or BEA's but limits
Harmony. On the other hand, Solution 1 allows you to write a whole
component (e.g. GC) in Java and communicate with it effectively.

The separation is not clean if there is a helper that we can't write in
Java. Once we add TLS and native-call support to Java, I can't imagine such
a helper today, but I'm still looking for one :)

>    If I am overruled and we feel that a framework is absolutely necessary,

Not absolutely necessary, but IMO it is better by design. Once supported,
it will make helper optimization easier.

> what you are suggesting makes sense. We could have a 2 phased approach:-
>    1) We code some of the helpers in Java and jitrino can inline them. We
> code the rest(unsafe) in asm and invoke them as naked or frameless calls (
> not jni ). My experience is that the cost is not in the call/ret ( unless
> you mispredict the branch ), but in setting up and tearing down the
> prolog/epilog, aligning the frame etc. We could add a helper calling
> convention to the jit for efficient parameter passing.

Yes, this is a good point to start. The information about the native
method's calling convention (slow helper) can be stored as an annotation on
the method. Here is an example from Xiao-Feng:

Reference object_alloc(int size, Address vtable) {
      Reference result = ThreadLocalBlock().alloc(size);
      if (!result.isNull()) {
           result.setAddressAtOffset(vtable, VTABLE_OFFSET_IN_OBJECT);
           return result;
      }
      // native slow path
      return object_alloc_slowpath(size, vtable);
}

and object_alloc_slowpath() is the native version of the slow helper.

We can add a native method declaration to the same Java class:
private static native Reference object_alloc_slowpath(int size, Address vtable);
and annotate it with something like this:
@NativeHelper(helperName="slowAlloc", callingConvention="cdecl",
enumerationPoint=true, ...)

The same could be done for JNI methods too, but JNI requires additional
support from the VM.
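A minimal sketch of the annotation idea, assuming a hypothetical `NativeHelper` annotation type whose elements mirror the ones proposed above (`helperName`, `callingConvention`, `enumerationPoint`). Nothing here is an existing DRLVM or Jitrino API, and `long` stands in for the `Reference`/`Address` unboxed types. It only shows that such metadata can be declared on a `native` method and read back without ever linking or calling the native code:

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;

// Hypothetical annotation; element names follow the proposal in this thread.
@Retention(RetentionPolicy.RUNTIME)   // keep it in the class file and visible via reflection
@Target(ElementType.METHOD)
@interface NativeHelper {
    String helperName();
    String callingConvention() default "cdecl";
    boolean enumerationPoint() default false;  // assumed meaning: the call may trigger GC enumeration
}

class GCAllocHelpers {
    // Slow path declared as a native method. A JIT that understood @NativeHelper
    // would emit a direct call to the VM helper named here, using the declared
    // calling convention, instead of a JNI call.
    @NativeHelper(helperName = "slowAlloc", callingConvention = "cdecl", enumerationPoint = true)
    private static native long object_alloc_slowpath(int size, long vtable);
}

class HelperScanner {
    // Roughly what a JIT (or VM) could do when it meets a call to such a method:
    // look up the metadata instead of compiling a body.
    static NativeHelper describe(Class<?> owner, String name, Class<?>... params) {
        try {
            return owner.getDeclaredMethod(name, params).getAnnotation(NativeHelper.class);
        } catch (NoSuchMethodException e) {
            return null;  // no such helper declared
        }
    }

    public static void main(String[] args) {
        NativeHelper h = describe(GCAllocHelpers.class, "object_alloc_slowpath", int.class, long.class);
        // Reading annotations does not link the native method, so no library is needed.
        System.out.println(h.helperName() + " " + h.callingConvention() + " " + h.enumerationPoint());
        // prints: slowAlloc cdecl true
    }
}
```

Keeping the metadata on the Java declaration means the GC or VM developer, not the JIT developer, owns the description of the slow call.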

>    2) We develop the unsafe intrinsics and if using them we can get better
> performance we can use them to progressively replace the asm helpers.

What is the difference between an unsafe intrinsic and an asm helper?

>   On a different topic, if we wanted to also optimize the JNI interface at
> some point, would we then develop a different framework for it :-) ?

AFAIU it's enough to annotate a JNI method with calling convention details
and to support it in the JIT and VM. So I see no difference in implementation
from helper inlining here; it is just an extension or another version of the
helper inlining mechanism.
Am I missing something about JNI support?

AFAIK Weldon and Alex have done a lot of work on supporting unboxed types
in DRLVM.
We can reuse their work to prototype the inlining of the first helper's fast
path in Jitrino.JET (it's easier) and, once that succeeds, port the code to
the Jitrino.OPT compiler.
Is the new object allocation helper OK to start with? Any other ideas about
which helper to start with?

-- 
Mikhail Fursov

Re: [drlvm] Helper inlining in JIT

Posted by Rana Dasgupta <rd...@gmail.com>.

Xiaofeng,
   Thanks for the excellent description. My question was not whether partial
inlining of helpers (or fastpath inlining) is necessary. It was more whether
a generalized framework is absolutely necessary to support it, or whether
there is a finite set of helper/service fastpaths that one can teach the
jit about, so that the jit knows a priori how to generate the code for these
inline. As far as I know, several product implementations are OK with
teaching the jit, and their performance is quite excellent.
   This may not be "ideal" for modularity and openness, but is it as bad as
it sounds? If you plugged a new gc into drlvm that introduced a completely
new helper never used before, jitrino would need to know something about its
semantics and when to call it. So the separation is not as clean as one
might think.
   If I am overruled and we feel that a framework is absolutely necessary,
what you are suggesting makes sense. We could have a two-phased approach:
   1) We code some of the helpers in Java and jitrino can inline them. We
code the rest (unsafe) in asm and invoke them as naked or frameless calls
(not JNI). My experience is that the cost is not in the call/ret (unless
you mispredict the branch), but in setting up and tearing down the
prolog/epilog, aligning the frame, etc. We could add a helper calling
convention to the jit for efficient parameter passing.
   2) We develop the unsafe intrinsics, and if using them gives us better
performance, we use them to progressively replace the asm helpers.

  On a different topic, if we wanted to also optimize the JNI interface at
some point, would we then develop a different framework for it :-) ?

Thanks,
Rana

Re: [drlvm] Helper inlining in JIT

Posted by Xiao-Feng Li <xi...@gmail.com>.

Rana, good arguments. :-) Based on my experience in developing JIT and
GC in Java, I think there are two issues, or two levels of issues.

1. Runtime services inlining.

The word "helper" is not a widely used term. I think the idea is to
enable the JIT to inline the frequently accessed routines of the JVM into
jitted Java application code. These routines are runtime services provided
by the components of the JVM, including GC, threading, etc. In a JVM written
in C, they are native methods accessed by jitted code through JNI, fast JNI,
or a raw native interface, so performance can be a problem because
a) the generated code sequence depends on the compiler that compiled the
JVM, and b) the call/ret instructions, parameter passing, etc. have a
negative performance impact.

DRLVM uses (and has used) manually written machine-code sequences or a
specially defined IR to solve this problem, so that the runtime routines'
code sequences are guaranteed to be, or at least can be, inlined. To reduce
labor and maintenance effort, only the fast paths of the runtime service
routines are written manually in machine code or low-level IR.

These code sequences are called "helpers" in DRLVM because they
are provided by the VM core to help JIT compilation. This design has a
problem: some of the runtime services are actually provided by
components other than the VM core, such as GC. If the GC changes, the code
sequence may need to change accordingly. But the design goes this way in
pursuit of JVM modularity: the JIT only needs to query the VM core for all
the fast-path services, and it is the VM core's role to coordinate JIT
compilation and the other components.

It's desirable for the respective components to provide their own services
and have them inlined by the JIT. We can let the GC provide the object
allocation fast-path code sequence or low-level IR. Then we need an
additional set of interfaces for IR inlining besides the current
simple C functions. For example, we can write the routine fast path in
Java, so that the JIT compiler can inline it naturally, and what we
need is to define the Java interfaces for those services.

For example, the object allocation sequence can be written in Java
like this, in class GC_X:
Reference object_alloc(int size, Address vtable)
{
       Reference result = ThreadLocalBlock().alloc(size);
       if (!result.isNull()) {
            result.setAddressAtOffset(vtable, VTABLE_OFFSET_IN_OBJECT);
            return result;
       }

       // native slow path
       return object_alloc_slowpath(size, vtable);
}

This function is performance critical. Its inlining can be achieved
automatically if a) the gc_alloc interface is known to the JIT and b) the
JIT optimizer inlines small methods. (If the JIT doesn't inline
it, an annotation can be used to instruct it.)
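The fast-path/slow-path split above can be tried out on an ordinary JVM. The sketch below is only a simulation under stated assumptions: an `int[]` stands in for the heap, word indices stand in for `Address`/`Reference`, 0 plays the role of a null reference, and `SimHeap`, `TLAB_WORDS` and the block layout are hypothetical. `object_alloc` is the small, straight-line method a JIT would want to inline; `object_alloc_slowpath` refills the thread-local block under a lock and retries:

```java
// Simulation only: int[] "heap", word indices as addresses, 0 as null.
final class SimHeap {
    static final int VTABLE_OFFSET_IN_OBJECT = 0;   // assumed header layout: vtable in word 0
    static final int TLAB_WORDS = 256;              // hypothetical thread-local block size
    static final int[] MEMORY = new int[1 << 16];
    private static int globalTop = 8;               // words below 8 are reserved, so 0 is never an object

    // Per-thread bump-pointer block, standing in for ThreadLocalBlock().
    static final class ThreadLocalBlock {
        int current, ceiling;

        // Bump-pointer allocation; returns 0 (null) when the block is exhausted.
        int alloc(int sizeWords) {
            int start = current;
            if (ceiling - start < sizeWords) return 0;
            current = start + sizeWords;
            return start;
        }
    }

    private static final ThreadLocal<ThreadLocalBlock> TLB =
            ThreadLocal.withInitial(ThreadLocalBlock::new);

    // Fast path: no locking, one bounds check, one store. The inlining candidate.
    static int object_alloc(int sizeWords, int vtable) {
        if (sizeWords < 1) throw new IllegalArgumentException("an object needs at least a header word");
        int result = TLB.get().alloc(sizeWords);
        if (result != 0) {
            MEMORY[result + VTABLE_OFFSET_IN_OBJECT] = vtable;
            return result;
        }
        return object_alloc_slowpath(sizeWords, vtable);   // out-of-line slow path
    }

    // Slow path: carve a fresh block from the shared heap under a lock, then retry.
    static synchronized int object_alloc_slowpath(int sizeWords, int vtable) {
        int chunk = Math.max(sizeWords, TLAB_WORDS);
        if (globalTop + chunk > MEMORY.length) throw new OutOfMemoryError("simulated heap full");
        ThreadLocalBlock tlb = TLB.get();
        tlb.current = globalTop;
        tlb.ceiling = globalTop + chunk;
        globalTop += chunk;
        return object_alloc(sizeWords, vtable);    // guaranteed to fit now
    }
}

class AllocDemo {
    public static void main(String[] args) {
        int a = SimHeap.object_alloc(4, 42);   // empty block: goes through the slow path
        int b = SimHeap.object_alloc(4, 42);   // served by the fast path
        System.out.println(a + " " + b + " " + SimHeap.MEMORY[b]);   // prints: 8 12 42
    }
}
```

Only `object_alloc` needs to be inlined; the rare refill stays an ordinary out-of-line call, which is the shape the thread is aiming for.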

This is straightforward, and no special tricks are needed if all the methods
in the fast path are written in Java. However, not all the methods in a
runtime service routine's fast path have to be written in Java; they can
still be supported through the "helper" idea if that's desirable. (For
engineering purposes, native methods can be used as well.) But what if we
still want to use Java to write them? Then we come to the second issue, a
lower-level one.

2. Inlining of unsafe or low-level operations.

Java is not enough to write all runtime service routines. Because of
its safety properties and platform independence, Java has no direct
memory access, no address arithmetic, and no direct platform-specific
operations. For example, Reference.setAddressAtOffset() stores a
memory address at a given offset from a reference.
ThreadLocalBlock() retrieves the thread-local allocation block for
uncontended bump-pointer allocation, which needs OS-specific
thread-local-storage access. The alloc() invocation on
ThreadLocalBlock needs to do address arithmetic and memory access as
well.

These routines cannot be written in plain Java. We need to
tweak the Java language if we want to inline them. The tweaked Java
language just borrows Java syntax as an IR to denote the unsafe or
low-level operations. The advantage is that we don't need any
other compiler: the JIT compiler already understands JVM bytecode.

For example, the Reference class can be defined this way:
class Reference {
      int address;
      void setIntValueAtOffset(int value, int offset) {
           // empty
           // JIT to generate code of: *((int *)(address + offset)) = value;
      }
      // other methods
}

The method body is left empty intentionally so that the JIT can generate
whatever code sequence is proper for its execution platform, which
requires the JIT to treat the method as a compiler intrinsic.
Annotations can be used to instruct the JIT compiler.
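One way to keep such a class usable on a JIT that does not know the intrinsic (a situation Xiao-Feng raises in this message) is to give the method a reference body instead of an empty one. In the sketch below, which is an assumption rather than anything DRLVM defines, a hypothetical `@Intrinsic` annotation records the code a JIT should emit, while the body performs the same store on simulated little-endian memory (a `ByteBuffer`), so the class also runs on an ordinary JVM. `getIntValueAtOffset` is an illustrative addition:

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Hypothetical marker: a JIT that knows this method should replace calls with inline code.
@Retention(RetentionPolicy.RUNTIME)
@Target(ElementType.METHOD)
@interface Intrinsic {
    String emits();   // description of the code sequence the JIT is expected to generate
}

final class Reference {
    // Simulated flat memory, little-endian to match x86 (not thread-safe; fine for a sketch).
    static final ByteBuffer MEMORY = ByteBuffer.allocate(1024).order(ByteOrder.LITTLE_ENDIAN);

    final int address;

    Reference(int address) { this.address = address; }

    // An intrinsic-aware JIT never executes this body; it emits one store instead.
    // The body gives the same semantics for JITs (or interpreters) that do not.
    @Intrinsic(emits = "*((int *)(address + offset)) = value;")
    void setIntValueAtOffset(int value, int offset) {
        MEMORY.putInt(address + offset, value);
    }

    @Intrinsic(emits = "return *((int *)(address + offset));")
    int getIntValueAtOffset(int offset) {
        return MEMORY.getInt(address + offset);
    }
}

class ReferenceDemo {
    public static void main(String[] args) {
        Reference r = new Reference(64);
        r.setIntValueAtOffset(0xCAFE, 8);
        System.out.println(Integer.toHexString(r.getIntValueAtOffset(8)));   // prints: cafe
    }
}
```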

There are not many such unsafe/low-level operations, and they can be
categorized and well defined. Any JIT that wants to inline the Java
fast paths has to understand them. (BTW, as far as I know, this kind of
tweak is called VMMagic in JikesRVM.) Since inlining is only a
performance optimization, we need to consider whether DRLVM should also
work with a JIT that does not know about the Java services.

With these two issues clarified, this "helper inlining" proposal can
be separated into two tasks. One is to develop unsafe/low-level Java
intrinsics; the other is to develop the runtime service routines'
fast paths in Java (using native methods first, then Java intrinsics).

Then, back to your question: once we have the service routines written
in Java, we can let the JIT compile and inline them automatically; when
the unsafe Java intrinsics are ready, we can achieve good performance
with both kinds of support. There is no need to expose registers to the
Java level in either task. :-)

Thanks,
xiaofeng


Re: [drlvm] Helper inlining in JIT

Posted by Rana Dasgupta <rd...@gmail.com>.

Mikhail,
  I have some questions, or just arguments :-) First, though it is nice to
talk of an open helper inlining framework to call VM/GC helpers, isn't the
set of helpers used in JVMs more or less well known and standardized? In
other words, is a framework absolutely necessary? Can't the jit just
generate code inline for the known fastpath of the helper algorithm? For
example, if we replaced the DRLVM gc with gc_new, how many new helpers do
you think we would introduce?

Also, please see below for some inline comments...

On 8/11/06, Mikhail Fursov <mi...@gmail.com> wrote:
>
> Even this simple helper reveals a lot of features to support:
>
> 1) Access to TLS is required. It could be fs:[14h] on Windows or helper call
> on Linux (depends on kernel (?) ). Even if we have an access to TLS we need
> to know the offsets of our slots: 'current' and 'ceiling' in example.
> So we need to have a way to pass these values to JIT. May be a 'private
> static final' variables in the helper's class with special runtime
> annotations could be used to pass the values to JIT?

So basically, every new VM would need to implement the helper in an almost
identical way, or at least use identical final variables. I wonder how
much flexibility that leaves...

> 2) Calls support is needed. Both for "slow" helper version and to access
> TLS.
> This means also that calling convention support is needed. Runtime level
> annotations for helper calls could be used here. E.g. create a magic method
> 'slow_alloc' and teach JIT to call real helper instead of it. Get the
> calling convention info from annotation (?)

Is it not possible for the fastpath helper to just return a failure so that
the jit-generated conventional helper call path can kick in? In other words,
does the slow helper have to be called from the fast helper?
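Rana's alternative can be sketched as follows, assuming hypothetical names and a simulated single-threaded heap (an `int[]`, word indices as addresses, 0 as the failure value). The inlined fast path never calls anything; on failure it returns 0, and the code the JIT emits at each allocation site (`newObjectSite` here) makes the conventional out-of-line helper call itself:

```java
// Single-threaded simulation: int[] heap, word indices as addresses, 0 = failure.
final class FailFastAlloc {
    static final int NULL_REF = 0;
    static final int[] HEAP_WORDS = new int[4096];
    static int tlabCurrent = 16, tlabCeiling = 32;   // a tiny initial thread-local block
    static int slowCalls = 0;

    // Inlinable fast path: no calls, so no frame setup once inlined.
    static int allocFast(int sizeWords, int vtable) {
        int start = tlabCurrent;
        if (tlabCeiling - start < sizeWords) return NULL_REF;   // report failure to the caller
        tlabCurrent = start + sizeWords;
        HEAP_WORDS[start] = vtable;                              // header word holds the vtable
        return start;
    }

    // Conventional out-of-line helper (a native VM helper in a real system).
    static int allocSlow(int sizeWords, int vtable) {
        slowCalls++;
        int start = tlabCeiling;                                 // blocks are carved in order, so this is unused
        int end = start + Math.max(64, sizeWords);
        if (end > HEAP_WORDS.length) throw new OutOfMemoryError("simulated heap exhausted");
        tlabCurrent = start;
        tlabCeiling = end;
        return allocFast(sizeWords, vtable);
    }

    // What the JIT would emit at a "new" site: inlined fast path, guarded slow call.
    static int newObjectSite(int sizeWords, int vtable) {
        int ref = allocFast(sizeWords, vtable);
        return ref != NULL_REF ? ref : allocSlow(sizeWords, vtable);
    }

    public static void main(String[] args) {
        int a = newObjectSite(8, 1);   // fast path
        int b = newObjectSite(8, 2);   // fast path; the block is now full
        int c = newObjectSite(8, 3);   // fast path fails, the site calls the slow helper
        System.out.println(a + " " + b + " " + c + " slowCalls=" + slowCalls);
        // prints: 16 24 32 slowCalls=1
    }
}
```

One trade-off of this shape, offered as a general observation rather than a measurement: the fast path stays call-free, but the guarded fallback call is duplicated at every allocation site instead of living once inside the helper.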

> 3) Do we really want to expose native regs to Java: EAX, ECX... I vote do
> not to use them when writing helpers in Java and to allow to JIT to optimize
> unboxed operations (allocate registers by itself)

This again goes back to my original question...if the JIT knew the standard
fastpath, it could generate code optimally for it anyway with the best use
of registers?

These are just arguments before we set off developing a framework. I would
like to know if helpers vary a lot across VM implementations; it seems to me
they need to do the same old things. Even if a framework is needed, it seems
that we will land up inventing a language with unboxed primitives, virtual
register access, and special calling conventions for it to work optimally.
And then we want every VM to use this language to code fastpath helpers and
also to code the slower helpers in the conventional way. Maybe we should ask
VM developers to vote on how open this is....

Thanks,
Rana

Re: [drlvm] Helper inlining in JIT

Posted by Mikhail Fursov <mi...@gmail.com>.

On 8/10/06, Weldon Washburn <we...@gmail.com> wrote:
>
> On 8/9/06, Mikhail Fursov <mi...@gmail.com> wrote:
> > a)  Fast access to TLS.
>
> Yes!  Good idea.   What you want is a "mov eax, fs[14]" intrinsic for
> winxp and the equivalent for Linux.   It may be a stretch for vmmagic
> purists to add segment register address arithmetic to vmmagic.  Worst
> case, Harmony could add the segment register facility to Harmony svn.
>

Exactly.
To see why support for calls and calling conventions is also important, I
propose we try to write a real helper using unboxed types.
This way we can prove that the solution really works, check whether it is
convenient to use, and find out what new functionality is needed.

Here is a very simple example, in asm, of a bump-pointer allocation
helper's "fast path" for Windows:

Incoming params: vtable, size
Result - a new object reference.

main:
    mov ecx, fs:[14h]              // access to TLS
    mov eax, [ecx + current]       // access to 'current' field
    add eax, size                  // check if there is enough space to do
                                   // the allocation in the thread-local buffer
    cmp eax, [ecx + ceiling]
    ja  slow                       // unsigned compare: addresses are unsigned

fast:
    mov [ecx + current], eax       // save new 'current'
    sub eax, size                  // reposition to the start of the object
    mov [eax], vtable              // save type info
    mov objRef, eax                // save or return result
    ret                            // do not fall through into the slow path

slow:
    call real new obj helper..
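For comparison, here is a line-by-line Java transliteration of the asm fast path, runnable on an ordinary JVM. All of it is an assumption for illustration: memory is a little-endian `ByteBuffer`, the TLS block address is passed in as `tlsBase` instead of being read from `fs:[14h]`, the 'current'/'ceiling' slots sit at hypothetical byte offsets 0 and 4, and a return value of 0 means "take the slow path". The limit check uses an unsigned comparison (x86 `ja`, not the signed `jg`), because 32-bit addresses at or above 2 GB would otherwise compare as negative:

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Java transliteration of the asm fast path; the memory layout is simulated.
final class BumpPointerSketch {
    static final int CURRENT = 0, CEILING = 4;   // hypothetical slot offsets in the TLS block
    // Simulated memory (little-endian like x86; not thread-safe, fine for a sketch).
    static final ByteBuffer MEM = ByteBuffer.allocate(4096).order(ByteOrder.LITTLE_ENDIAN);

    /** Returns the new object's address, or 0 when the slow helper must be called. */
    static int fastAlloc(int tlsBase, int size, int vtable) {
        int ecx = tlsBase;                               // mov ecx, fs:[14h]
        int eax = MEM.getInt(ecx + CURRENT);             // mov eax, [ecx + current]
        eax += size;                                     // add eax, size
        if (Integer.compareUnsigned(eax, MEM.getInt(ecx + CEILING)) > 0) {
            return 0;                                    // cmp + unsigned jump-if-above to slow
        }
        MEM.putInt(ecx + CURRENT, eax);                  // mov [ecx + current], eax
        eax -= size;                                     // sub eax, size
        MEM.putInt(eax, vtable);                         // mov [eax], vtable
        return eax;                                      // result in eax
    }

    public static void main(String[] args) {
        int tls = 0;                                     // TLS block at simulated address 0
        MEM.putInt(tls + CURRENT, 1024);                 // thread-local buffer is [1024, 1040)
        MEM.putInt(tls + CEILING, 1040);
        System.out.println(fastAlloc(tls, 16, 0x77));    // prints: 1024
        System.out.println(fastAlloc(tls, 16, 0x77));    // prints: 0 (buffer full, go slow)
    }
}
```

With current = 0x7FFFFFF0, ceiling = 0x7FFFFFF8 and size = 16, the bumped pointer is 0x80000000; a signed compare treats it as negative and would wrongly accept the allocation, while `Integer.compareUnsigned` correctly sends it to the slow path.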

Even this simple helper reveals a lot of features to support:

1) Access to TLS is required. It could be fs:[14h] on Windows or a helper
call on Linux (depends on the kernel (?)). Even if we have access to TLS, we
need to know the offsets of our slots: 'current' and 'ceiling' in the
example. So we need a way to pass these values to the JIT. Maybe 'private
static final' variables in the helper's class with special runtime
annotations could be used to pass the values to the JIT?

2) Call support is needed, both for the "slow" helper version and to access
TLS.
This also means that calling convention support is needed. Runtime-level
annotations for helper calls could be used here. E.g. create a magic method
'slow_alloc' and teach the JIT to call the real helper instead of it. Get
the calling convention info from the annotation (?)

3) Do we really want to expose native regs to Java: EAX, ECX...? I vote not
to use them when writing helpers in Java and to let the JIT optimize
unboxed operations (allocate registers by itself).
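Point 1 can be sketched with plain reflection. The `@TlsSlot` annotation, the field names and the offset values below are all hypothetical; `JitConstantReader` only stands in for whatever part of the JIT would pick the values up and use them as immediate operands in the inlined code:

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Field;
import java.lang.reflect.Modifier;
import java.util.HashMap;
import java.util.Map;

// Hypothetical annotation: marks a constant describing the VM's TLS layout.
@Retention(RetentionPolicy.RUNTIME)
@Target(ElementType.FIELD)
@interface TlsSlot {
    String value();   // slot name used by the helper, e.g. "current"
}

final class AllocHelperLayout {
    // Byte offsets of the allocation slots inside the thread-local block.
    // The numbers are made up; a real VM would supply its own.
    @TlsSlot("current") private static final int CURRENT_OFFSET = 0x10;
    @TlsSlot("ceiling") private static final int CEILING_OFFSET = 0x14;
}

final class JitConstantReader {
    // Collects every static final int annotated with @TlsSlot.
    static Map<String, Integer> readSlots(Class<?> helperClass) {
        Map<String, Integer> slots = new HashMap<>();
        for (Field f : helperClass.getDeclaredFields()) {
            TlsSlot slot = f.getAnnotation(TlsSlot.class);
            int mods = f.getModifiers();
            if (slot == null || !Modifier.isStatic(mods) || !Modifier.isFinal(mods)
                    || f.getType() != int.class) {
                continue;   // not a layout constant
            }
            try {
                f.setAccessible(true);            // the fields are private
                slots.put(slot.value(), f.getInt(null));
            } catch (IllegalAccessException e) {
                throw new IllegalStateException(e);
            }
        }
        return slots;
    }

    public static void main(String[] args) {
        Map<String, Integer> slots = readSlots(AllocHelperLayout.class);
        System.out.println("current=" + slots.get("current") + " ceiling=" + slots.get("ceiling"));
        // prints: current=16 ceiling=20
    }
}
```

One Java-level caveat: a `static final int` initialized with a literal is a compile-time constant, so javac already copies its value into any bytecode that uses it. That is convenient when the helper is compiled against a fixed layout, but a VM that wants to supply the offsets at run time would have to initialize the fields from a non-constant expression, for example in a static initializer.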

So there are a lot of questions to discuss before we can use code like
Alex's (http://issues.apache.org/jira/browse/HARMONY-816) to write this
simple helper :)

-- 
Mikhail Fursov

Re: [drlvm] Helper inlining in JIT

Posted by Weldon Washburn <we...@gmail.com>.

On 8/9/06, Mikhail Fursov <mi...@gmail.com> wrote:
> Folks,
> Once we decided that we do not want to limit Harmony with only one VM/JIT/GC
> instance I think it's time to start discussion how all these components will
> work together effectively.
> So the proposal is to discuss helper inlining interface.
>
> Every component we have (VM, GC : )  can have helpers that must be called
> from managed code.
> While generating native code for Java method JIT inserts calls to these
> helpers: "get_vtable", "allocate_new_object", "monitor_enter" e.t.c.
>
> Some of these helpers have fast version: a "fast path", the algorithm that
> worth to be inlined into generated method body to improve overall system
> performance.
>
> Here is an example of such helpers:
> 1)  new object allocation
> 2)  write barriers
> 3)  monitor enters/exits
> 4)  fast access to TLS data
> ...
>
> The problem with inlining a "fast-path" algorithm is that the JIT must know
> the details of the algorithm it inlines.
> There are different ways to solve this problem, and I will try to describe
> some of them. Add more if you know a better solution.
> Which one to choose, and the details of the implementation, are the subject
> of discussion.
>
>
> Solution 1.
> Use "magic" methods (as MMTk does) and write the fast-path version of a
> helper in Java.

I really like this approach. I have spent lots of time writing tedious
code that glues JIT activation records to the underlying VM and OS. This
code is very hard to maintain. In general, moving to writing it in Java
is a good idea.

>
> A "magic" method is a method that is never compiled by the JIT but is
> replaced with native code.
> In MMTk (see link 1) there is a set of 'unboxed' types that allow pointer
> arithmetic, pointer comparisons, casts, and memory reads and writes,
> including atomic operations, to be written in the Java language.
> We can use MMTk-like 'unboxed' types, with some new operations added, to
> write fast paths for helpers in Java. The JIT can then access the bytecode
> of the fast-path helper version and inline it like a usual Java method.
> What we need to add to MMTk-like unboxed types (please correct me if I've
> missed something):
> a)  Fast access to TLS.

Yes! Good idea. What you want is a "mov eax, fs[14]" intrinsic for
WinXP and the equivalent for Linux. It may be a stretch for vmmagic
purists to add segment register address arithmetic to vmmagic. In the
worst case, Harmony could add the segment register facility to Harmony svn.
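To make the TLS intrinsic concrete, here is a sketch of a Solution 1 style bump-pointer allocation fast path written in plain Java. `ThreadLocalArea.forCurrentThread()` is the hypothetical magic method. A JIT that recognizes it would replace the call with a single segment-register load. The body shown, based on `java.lang.ThreadLocal`, is only a fallback so that the sketch runs on a stock JVM. Addresses are simulated with `long` values, and the chunk size and heap base are arbitrary assumptions.

```java
// Hypothetical per-thread allocation area; on a real VM this would live in TLS.
final class ThreadLocalArea {
    long current; // next free address
    long ceiling; // end of the thread's allocation chunk

    private static final ThreadLocal<ThreadLocalArea> TLS =
            ThreadLocal.withInitial(ThreadLocalArea::new);

    // "Magic" in spirit: a JIT supporting it would replace this call with one
    // segment-register load. The body is only a portable fallback.
    static ThreadLocalArea forCurrentThread() {
        return TLS.get();
    }
}

final class BumpAllocator {
    static final long CHUNK = 4096;  // illustrative chunk size
    static long nextChunk = 0x10000; // simulated global heap cursor
    static int slowCalls = 0;

    // Fast path: the part intended to be inlined into the caller by the JIT.
    static long alloc(int size) {
        ThreadLocalArea tla = ThreadLocalArea.forCurrentThread();
        long result = tla.current;
        long end = result + size;
        if (end <= tla.ceiling) {
            tla.current = end; // bump the pointer, no call needed
            return result;
        }
        return slowAlloc(tla, size);
    }

    // Slow path: stays an out-of-line call (a native helper in a real VM).
    static synchronized long slowAlloc(ThreadLocalArea tla, int size) {
        slowCalls++;
        long chunkSize = Math.max(CHUNK, size);
        tla.current = nextChunk;
        tla.ceiling = nextChunk + chunkSize;
        nextChunk += chunkSize;
        long result = tla.current;
        tla.current += size;
        return result;
    }
}
```

Only `alloc` is meant to be inlined. `slowAlloc` stays an out-of-line call, which is where the calling-convention question (b) comes in.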

> Used to access thread-local data in the runtime: GC allocation, per-thread
> profiling, monitor inlining, etc.
> b)  An approach to calling native methods with different calling
> conventions.
> Needed to call the original, slow helper version if the fast-path version
> fails.
>
>
> Solution 2.
> Standardize the algorithms by using a custom interface for each of them.
> The GC, VM and JIT components interact with each other through
> intercomponent interfaces, e.g. the OPEN ones (see link 2).
> The proposal is to standardize an interface for every algorithm or family
> of algorithms used in fast-path helper versions, so that if the JIT
> supports one of them, it can ask the VM or GC to provide a struct with
> function pointers and constants that describe the algorithm.
> E.g., for a "monitor exit" fast-path algorithm that checks a bit in the
> object header with an atomic operation, it's enough to provide the offset
> of this bit to the JIT.
> Another example is the fast path of a "bump pointer" allocation algorithm.
> Once the JIT is able to generate code that accesses thread-local data, and
> knows the offsets of the 'current' and 'limit' pointers, it can generate
> code that increments the 'current' pointer, or calls the original slow
> helper if the check against 'limit' fails.
>
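Here is a sketch of Solution 2 for the bump-pointer case. It assumes a hypothetical `BumpPointerAllocInfo` descriptor, a Java interface standing in for the struct of function pointers and constants. `ToyJit` uses only the two offsets and the slow-helper entry. The lambda it returns stands in for emitted machine code, and the thread-local block is simulated with a `ByteBuffer`. The offsets, heap base and chunk size are arbitrary assumptions.

```java
import java.nio.ByteBuffer;
import java.util.function.IntToLongFunction;

// Hypothetical descriptor a GC would hand to the JIT; names and layout are assumptions.
interface BumpPointerAllocInfo {
    int currentOffset();                      // offset of 'current' in the thread-local block
    int limitOffset();                        // offset of 'limit' in the thread-local block
    long slowAlloc(ByteBuffer tls, int size); // the original slow helper
}

final class ToyGc implements BumpPointerAllocInfo {
    private long nextChunk = 0x20000; // simulated heap cursor
    int slowCalls = 0;

    public int currentOffset() { return 0; }
    public int limitOffset() { return 8; }

    public long slowAlloc(ByteBuffer tls, int size) {
        slowCalls++;
        long start = nextChunk;
        long end = start + Math.max(4096, size);
        nextChunk = end;
        tls.putLong(currentOffset(), start + size); // object occupies [start, start+size)
        tls.putLong(limitOffset(), end);
        return start;
    }
}

final class ToyJit {
    // "Generates" an inlined fast path using only the descriptor's constants.
    static IntToLongFunction emitAllocFastPath(BumpPointerAllocInfo gc, ByteBuffer tls) {
        final int cur = gc.currentOffset();
        final int lim = gc.limitOffset();
        return size -> {
            long current = tls.getLong(cur);
            long next = current + size;
            if (next <= tls.getLong(lim)) {
                tls.putLong(cur, next); // increment 'current'
                return current;
            }
            return gc.slowAlloc(tls, size); // check against 'limit' failed
        };
    }
}
```

The JIT never sees the GC's allocation policy, only two integers and a call target. The "monitor exit" case would follow the same pattern with an even smaller descriptor carrying just the header bit offset.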
> Solution 3.
> Develop and standardize a lightweight low-level language, and parse it in
> the JIT to generate a helper's fast path.
> In this case, when the JIT decides to inline a helper call, it asks the VM
> or GC for a textual representation of the helper's fast path and parses it
> into its internal representation. This solution looks exactly like
> Solution 1, where Java bytecode serves as the lightweight intercomponent
> language. So to accept this solution, we must answer why bytecode +
> "magic" methods are not enough.
>
>
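To show what Solution 3 would add to the JIT, here is a toy parser for a hypothetical textual fast-path language. The syntax and opcodes are invented for this sketch. The parser only turns text into instruction objects. A real JIT would still have to translate these into its own IR, which is the work it already does for bytecode under Solution 1.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

// One instruction of a hypothetical fast-path language:
// "[label:] opcode operand, operand, ..."
final class Instr {
    final String label; // null when the line has no label
    final String opcode;
    final List<String> operands;

    Instr(String label, String opcode, List<String> operands) {
        this.label = label;
        this.opcode = opcode;
        this.operands = operands;
    }
}

final class FastPathParser {
    private static final Set<String> OPCODES =
            Set.of("load", "store", "add", "jgt", "ret", "call");

    // Turns the helper's textual fast path into the JIT's (toy) internal form.
    static List<Instr> parse(String source) {
        List<Instr> out = new ArrayList<>();
        for (String raw : source.split("\n")) {
            String line = raw.trim();
            if (line.isEmpty()) continue;
            String label = null;
            int colon = line.indexOf(':');
            if (colon >= 0) { // optional "label:" prefix
                label = line.substring(0, colon).trim();
                line = line.substring(colon + 1).trim();
            }
            String[] parts = line.split("\\s+", 2);
            if (!OPCODES.contains(parts[0])) {
                throw new IllegalArgumentException("unknown opcode: " + parts[0]);
            }
            List<String> args = new ArrayList<>();
            if (parts.length > 1) {
                for (String a : parts[1].split(",")) args.add(a.trim());
            }
            out.add(new Instr(label, parts[0], args));
        }
        return out;
    }
}
```

Every VM or GC that exports helpers this way, and every JIT that consumes them, would have to agree on this grammar. Bytecode plus "magic" methods already gives both sides a shared, verified format.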
> In my opinion, all of these solutions will work, but the first one
> (Solution 1) is much more flexible.

Agreed.    :)

>
> Does anyone have other ideas on how to design 'helper inlining' to make it
> easy and reusable?
> Is there any limitation of the 'magic' approach that could prevent us from
> using it?
>
>
> Links:
> 1)  http://cs.anu.edu.au/~Steve.Blackburn/pubs/papers/mmtk-icse-2004.pdf
> 2)  http://issues.apache.org/jira/browse/HARMONY-459


-- 
Weldon Washburn
Intel Middleware Products Division

---------------------------------------------------------------------
Terms of use : http://incubator.apache.org/harmony/mailing.html
To unsubscribe, e-mail: harmony-dev-unsubscribe@incubator.apache.org
For additional commands, e-mail: harmony-dev-help@incubator.apache.org