Posted to dev@harmony.apache.org by Egor Pasko <eg...@gmail.com> on 2006/07/10 17:44:52 UTC

[drlvm] interface call devirtualization

I looked through the list of TODO projects for JIT [1] and decided to write a
microbenchmark to detect how well interface call devirtualization works in JIT
(see below).

Jitrino.OPT turned out to be very slow (~2.5 times slower than JRockit (1.5/linux)).

I looked through the compile-time log of Jitrino.OPT (see below) and found the
reason: Jitrino.OPT did *not* hoist "interface vtable address load instruction
(ldintfcvt)" above the hottest loop. 

At the moment, I would wait a little before hacking load hoisting
since it looks more complicated than interface call
devirtualization. A strange thing: "interface call devirtualization"
would have boosted JRockit's performance too (I checked that with a
slightly changed benchmark).

So, it would be interesting to implement it!

Seems like the best choice is to start from a couple of easy heuristics:
* if there is only one loaded class that implements the interface, choose it
* if there are more, choose the one whose method was invoked earlier (compiled
  by some JIT, possibly some other JIT)
* if we have many candidate methods that are compiled, choose the most frequent
  one (needs a method-entry profile; the feature is likely to stay untouched for
  a while, I guess)
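
The three heuristics above can be sketched as a small selection routine. This is only an illustration, not Jitrino code: the Candidate and TargetSelector classes, their fields, and the idea of passing in a ready-made list of loaded implementors are all assumptions made for the example.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical summary of one loaded class that implements the interface. */
final class Candidate {
    final String className;
    final boolean methodCompiled;  // has some JIT already compiled the method?
    final long entryCount;         // method-entry profile counter (0 if no profile)

    Candidate(String className, boolean methodCompiled, long entryCount) {
        this.className = className;
        this.methodCompiled = methodCompiled;
        this.entryCount = entryCount;
    }
}

final class TargetSelector {
    /** Returns the class to guard on, or null when the heuristics give no answer. */
    static Candidate choose(List<Candidate> loadedImplementors) {
        // Heuristic 1: a single loaded implementor is the obvious choice.
        if (loadedImplementors.size() == 1) {
            return loadedImplementors.get(0);
        }
        // Heuristic 2: keep only implementors whose method is already compiled.
        List<Candidate> compiled = new ArrayList<>();
        for (Candidate c : loadedImplementors) {
            if (c.methodCompiled) {
                compiled.add(c);
            }
        }
        // Heuristic 3: among the compiled candidates, take the most frequent one.
        Candidate best = null;
        for (Candidate c : compiled) {
            if (best == null || c.entryCount > best.entryCount) {
                best = c;
            }
        }
        return best;
    }
}
```

A null result here stands for "leave the call as a plain interface call"; a real devirtualizer would still have to emit a guard, because more implementors can be loaded later.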

4 questions for now:

0. Does anybody want to participate? :))) 
   I am always ready to help with implementation details.

1. Does anybody have some additional elegant ideas?

2. How do you like the benchmark?

3. Should I create a JIRA for the issue ASAP? :)

P.S.:

The benchmark:
--------------------------------------------------------------------------------
import java.util.Date;

interface Intfc {
    public void reset();
    public void inc();
    public int getNum();

    static final long exercise = 1000000000;
}

class IntfcImpl implements Intfc {

    public IntfcImpl() { reset(); }

    private int num;

    public void inc() { num++; }

    public int getNum() { return num; }

    public void reset() { num = 0; }
}

class Runner {
    public Runner(Intfc o) { intfc_obj = o; }
    public void run()
    {
        intfc_obj.reset();

        /* uncomment (and comment out intfc_obj.inc() below) to measure the
           same loop with a call through the concrete class instead */
        //IntfcImpl impl_obj = (IntfcImpl) intfc_obj;

        for (long i = 0; i < Intfc.exercise; i++ ) {
            intfc_obj.inc();

            /* virtual call on the concrete class; use instead of the line above */
            //impl_obj.inc();
        }
    }

    public void measureTime()
    {
        Date before, after;

        before = new Date();
        run();
        after = new Date();
        System.out.println("run: " + (after.getTime() - before.getTime()));
    }

    private Intfc intfc_obj;
}

public class IntfcCaller {
    
    public static void main(String[] args) {
        IntfcImpl obj = new IntfcImpl();
 
        // use obj a little
        obj.reset();
        obj.inc();

        Runner runner = new Runner(obj);

        runner.measureTime();
        runner.measureTime();
        runner.measureTime();

        if ( obj.getNum() != Intfc.exercise ) {
            System.out.println("FAIL");
            System.exit(1);
        }
    }
}
--------------------------------------------------------------------------------

To get the compile-time log of method Runner::run() I added an extra option:

-Xjit LOG=\"singlefile,root=all,method=run\"

A piece of loop body:
--------------------------------------------------------------------------------
Block L5:
  Predecessors: L4
  Successors: L6 UNWIND
  I43:L5:
  I27:tauhastype      t18,cls:Intfc -) t20:tau
  I28:ldintfcvt t18,cls:Intfc ((t19)) -) t21:vtb:cls:Intfc
  I29:ldvfnslot [t21.Intfc::inc] ((t19)) -) t22:method:inc
  I31:callimem  [t22](t18) ((t19,t20)) -)
  GOTO L6
--------------------------------------------------------------------------------

Even the highest optimization level (option -Xjit opt::skip=off) does not help
:(

[1] http://mail-archives.apache.org/mod_mbox/incubator-harmony-dev/200606.mbox/%3Cuejxhim8n.fsf@gmail.com%3E 

-- 
Egor Pasko, Intel Managed Runtime Division


---------------------------------------------------------------------
Terms of use : http://incubator.apache.org/harmony/mailing.html
To unsubscribe, e-mail: harmony-dev-unsubscribe@incubator.apache.org
For additional commands, e-mail: harmony-dev-help@incubator.apache.org

Re: [drlvm] interface call devirtualization

Posted by Egor Pasko <eg...@gmail.com>.

On the 0x1A5 day of Apache Harmony Rana Dasgupta wrote:
>  Nice benchmark. 

I am soo glad ;)

> Yes, the cost of not devirtualizing as well as not hoisting
> ldintfcvt is high. I played around a little with this too and have some
> comments and questions...
> 
>   First some high level stuff....
>   1) What are the instructions like ldintfcvt, ldvfnslot, etc. in the jit
> dump? Are these part of jitrino HIR?

exactly! the High-level IR

let's look at these instructions in more detail:
  I28:ldintfcvt t18,cls:Intfc ((t19)) -) t21:vtb:cls:Intfc
  I29:ldvfnslot [t21.Intfc::inc] ((t19)) -) t22:method:inc

I28 and I29 are identifiers of instruction instances (this is an easy one:)

let's skip ((t19)) for simplicity (it shows explicit data
dependency to the corresponding null-check, usually represented as a
special operand in Jitrino.OPT)

*ldintfcvt* = Load Interface Vtable address
takes 2 parameters: (Object) and (Interface_name),
returns: address of the virtual table
(every object has a ref to its interface vtables).

You see, '-)' is an arrow :)

*ldvfnslot* = Load Virtual FuNction address SLOT
takes 2 parameters: (virtual table) and (item to search)
returns: address of the address of the method

*callimem* = Indirect Memory Call
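
To make the three steps concrete, here is a toy Java model of the ldintfcvt / ldvfnslot / callimem sequence, and of what hoisting the vtable load above the loop would save. It is a sketch only: the maps stand in for vtables, the DispatchModel and Obj names are made up, and nothing here reflects how DRLVM really lays out objects.

```java
import java.util.HashMap;
import java.util.Map;

/** Toy model of the three HIR steps; not how DRLVM actually lays out vtables. */
final class DispatchModel {
    /** Counts simulated ldintfcvt executions so the two loops can be compared. */
    static long vtableLoads = 0;

    /** A simulated object: per-interface tables mapping method name to code. */
    static final class Obj {
        int num;
        final Map<String, Map<String, Runnable>> intfcVtables = new HashMap<>();

        Obj() {
            Map<String, Runnable> intfc = new HashMap<>();
            intfc.put("inc", () -> num++);
            intfcVtables.put("Intfc", intfc);
        }
    }

    /** ldintfcvt: (object, interface name) -> interface vtable. */
    static Map<String, Runnable> ldintfcvt(Obj o, String intfcName) {
        vtableLoads++;
        return o.intfcVtables.get(intfcName);
    }

    /** ldvfnslot: (vtable, method name) -> slot holding the method. */
    static Runnable ldvfnslot(Map<String, Runnable> vtable, String method) {
        return vtable.get(method);
    }

    /** Loop as in the log: the vtable is reloaded on every iteration. */
    static void runNotHoisted(Obj o, int n) {
        for (int i = 0; i < n; i++) {
            Map<String, Runnable> vt = ldintfcvt(o, "Intfc");
            Runnable slot = ldvfnslot(vt, "inc");
            slot.run();  // callimem: indirect call through the slot
        }
    }

    /** Same loop with the loop-invariant vtable load hoisted above it. */
    static void runHoisted(Obj o, int n) {
        Map<String, Runnable> vt = ldintfcvt(o, "Intfc");
        for (int i = 0; i < n; i++) {
            ldvfnslot(vt, "inc").run();
        }
    }
}
```

Both loops do the same increments; only the number of simulated vtable loads differs (n versus 1), which is the cost the missing hoisting leaves in the hot loop.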

> While they seem more or less readable,
> is there any doc describing them ... since they are the first level internal
> representation, and anyone who wants to work with the jit needs to
> understand them?

Yes, the doc would be great! But there is no document describing HIR
commands at the moment. I am thinking of updating the sources with
self-documenting code.

For now, the easiest way to find out what an instruction does is to
look in Opcode.cpp, where the opcodeTable array is initialized.

we can find the entries for our 2 instructions:
    { Op_TauLdIntfcVTableAddr,  ..blablabla.. "ldintfcvt", 
        "ldintfcvt %0,%d ((%1)) -) %l",        }, 

    { Op_TauLdVirtFunAddrSlot,  ..blablabla.. "ldvfnslot",
         "ldvfnslot [%0.%d] ((%1)) -) %l",      },

skip that Tau for simplicity and you get more-or-less self-descriptive
names in the first column. What I am thinking of is adding an extra
column filled with more descriptive comments that could be printed on
request, say ... -Xjit print_hir_doc

It should not take too much time, I think. You can always ask what these
or those instructions mean, I'll try to explain.
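
As an illustration of the "extra column" idea, here is a sketch in Java (the real table lives in C++ in Opcode.cpp). The HirOp enum, its field names, and describeAll() are hypothetical; the mnemonics and format strings are the two shown above, and the doc texts just repeat the explanations from this mail.

```java
/** Hypothetical Java rendering of two opcodeTable rows plus a doc column. */
enum HirOp {
    TAU_LD_INTFC_VTABLE_ADDR("ldintfcvt", "ldintfcvt %0,%d ((%1)) -) %l",
        "Load interface vtable address: (object, interface) -> vtable address"),
    TAU_LD_VIRT_FUN_ADDR_SLOT("ldvfnslot", "ldvfnslot [%0.%d] ((%1)) -) %l",
        "Load virtual function address slot: (vtable, method) -> address of the method's address");

    final String mnemonic;
    final String format;
    final String doc;  // the proposed extra column

    HirOp(String mnemonic, String format, String doc) {
        this.mnemonic = mnemonic;
        this.format = format;
        this.doc = doc;
    }

    /** Roughly what a "print the HIR doc" request could produce: one line per opcode. */
    static String describeAll() {
        StringBuilder sb = new StringBuilder();
        for (HirOp op : values()) {
            sb.append(op.mnemonic).append(" : ").append(op.doc).append('\n');
        }
        return sb.toString();
    }
}
```

Keeping the description next to the mnemonic and print format means the doc cannot drift away from the table it documents.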

>   2) When experimenting with the JIT related command line options to
> ij.exe -Xjit... I found many of them listed in the vm/doc/GettingStarted
> guide... Just FYI for interested folks.

yeah, and many undocumented ones, specific to optimization
passes. They are easy to find in functions like readFlagsFromCommandLine

> On 10 Jul 2006 22:44:52 +0700, Egor Pasko <eg...@gmail.com> wrote:
> > > I looked through the list of TODO projects for JIT [1] and
> > > decided to write a microbenchmark detecting how good interface
> > > call devirtualization works in JIT (see below)
> >
> > > Jitrino.OPT showed very-very slow (~2.5 times slower than JRockit
> > > (1.5/linux)).
> >
> > > A strange thing, "interface call devirtualization"
> > > would have boosted JRockit's performance too (I checked that with a
> > > slightly changed benchmark).
> 
> 
>   Yes, this optimization would have helped here...I also converted this
> interface dispatch effectively to a virtual dispatch in your test and the
> performance significantly improves with the resultant devirtualization...
> 

yes, I can comment on your piece of dump

> Block L8:
>   Predecessors: L7
>   Successors: L11 L9
>   I74:L8:
>   I40:ldvtable  t13 ((t27)) -) t28:vtb:cls:IntfcImpl
>   I41:getvtable cls:IntfcImpl -) t29:vtb:cls:IntfcImpl
>   I42:if ceq.vtb t28, t29 goto L11
>   GOTO L9

this is an explicit condition equivalent to:
(if object t13 is of class IntfcImpl) 
// it is implemented via comparing corresponding virtual table addresses
So, this is where the guarded devirtualization of virtual calls shows itself.

> Block L9:
>   Predecessors: L8
>   Successors: L14 UNWIND
>   I37:L9:
>   I43:tauhastype      t13,cls:IntfcImpl -) t30:tau
>   I44:ldvfnslot [t28.IntfcImpl::inc] ((t27)) -) t31:method:inc
>   I46:callimem  [t31](t13) ((t27,t30)) -)
>   GOTO L14

This is a guard :)

> Block L11:
>   Predecessors: L8
>   Successors: L12 UNWIND
>   I38:L11:
>   I48:--- IntfcImpl::inc: ()
>   I49:chknull   t13 -) t32:tau
>   GOTO L12

this is the devirtualized path: you see "IntfcImpl::inc" is inlined here.
Whether inlining happens is decided by a set of heuristics, such as the
size of the inlined bytecode...
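
At the Java source level, the three blocks in the dump correspond roughly to the following hand-written guard. This is only an analogy under assumptions: the JIT compares vtable addresses rather than calling getClass(), and GuardedDevirtSketch, OtherImpl and the two counters are invented for the example.

```java
/** Source-level analogy of guarded devirtualization; the JIT does this on HIR. */
final class GuardedDevirtSketch {
    interface Intfc {
        void inc();
        int getNum();
    }

    static final class IntfcImpl implements Intfc {
        int num;
        public void inc() { num++; }
        public int getNum() { return num; }
    }

    /** A second implementor, so that the guard can actually fail. */
    static final class OtherImpl implements Intfc {
        int num;
        public void inc() { num += 2; }
        public int getNum() { return num; }
    }

    static int fastPathHits = 0;
    static int slowPathHits = 0;

    static void incGuarded(Intfc o) {
        if (o.getClass() == IntfcImpl.class) {  // block L8: compare the vtables
            ((IntfcImpl) o).num++;              // block L11: inlined IntfcImpl::inc
            fastPathHits++;
        } else {
            o.inc();                            // block L9: ordinary dispatch
            slowPathHits++;
        }
    }
}
```

The guard keeps the transformation safe when another class shows up at the call site: the result is the same either way, only the path taken differs.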

> > > So, that would be interesting to implement it!
> >
> > > Seems like the best choice is to start from a couple of easy heuristics:
> > > * if there is only one loaded class to implement the interface, choose it
> > > * if there are more, choose the one with its method invoked earlier
> > > (compiled by some JIT, possibly, some other JIT),
> 
>   If we forget the profile guidance for now, could you please elaborate more
> about how we should do this and on what exactly is happening now? 

OK, just need to check my ideas. In brief, I see 2 places where to
devirtualize: in Translator and in High-Level Optimizer. Each has its
own benefits. Translator is faster, but less heuristic-oriented (can
rely only on bytecode size).

> BTW, do we currently raise the IncompatibleClassChangeError if the
> objectref's class does not actually support the interface?

Yes, AFAIR, IncompatibleClassChangeError works, there should be a
special vtable entry on that. Not sure, I can check, if it helps.

> Do we cache the interface tables per class object and can we improve
> this cache search in the optimization? 

This is a kind of optimization in VM core. The original proposal was
to implement an optimization in Jitrino.OPT.

> In non trivial cases where many classes implement the same
> interface, the cache search may be more expensive than the slot look
> up.

hm, we could even store slots in objects, but this is not easy to do
:) performance impact is not obvious in all cases :(

>   We could also virtualize and then devirtualize the interface invocation
> when we can...somewhat like the jit dump above. What do you think?

yes, that's right, make a virtual call (guarded) and then rely on
guarded devirtualizer. Good and simple. Thanks!

> > > * if we have many candidate methods that are compiled, choose the most
> > > frequent one (need a method-entry profile, the feature is likely to stay
> > > untouched for a while, I guess)
> >
> > > 3. Should I create a JIRA for the issue ASAP? :)
> 
>   I would say yes, let's create a JIRA issue

OK, TBD (sorry, I am soo.. busy today:)

-- 
Egor Pasko, Intel Managed Runtime Division


Re: [drlvm] interface call devirtualization

Posted by Rana Dasgupta <rd...@gmail.com>.

Hi Egor,
 Nice benchmark. Yes, the cost of not devirtualizing as well as not hoisting
ldintfcvt is high. I played around a little with this too and have some
comments and questions...

  First some high level stuff....
  1) What are the instructions like ldintfcvt, ldvfnslot, etc. in the jit
dump? Are these part of jitrino HIR? While they seem more or less readable,
is there any doc describing them ... since they are the first level internal
representation, and anyone who wants to work with the jit needs to
understand them?
  2) When experimenting with the JIT related command line options to
ij.exe -Xjit... I found many of them listed in the vm/doc/GettingStarted
guide... Just FYI for interested folks.

  Please see below...

On 10 Jul 2006 22:44:52 +0700, Egor Pasko <eg...@gmail.com> wrote:
>
>
> > I looked through the list of TODO projects for JIT [1] and decided to
> > write a microbenchmark detecting how good interface call devirtualization
> > works in JIT (see below)
>
> > Jitrino.OPT showed very-very slow (~2.5 times slower than JRockit
> > (1.5/linux)).
>
> > A strange thing, "interface call devirtualization"
> > would have boosted JRockit's performance too (I checked that with a
> > slightly changed benchmark).

  Yes, this optimization would have helped here...I also converted this
interface dispatch effectively to a virtual dispatch in your test and the
performance significantly improves with the resultant devirtualization...

Block L8:
  Predecessors: L7
  Successors: L11 L9
  I74:L8:
  I40:ldvtable  t13 ((t27)) -) t28:vtb:cls:IntfcImpl
  I41:getvtable cls:IntfcImpl -) t29:vtb:cls:IntfcImpl
  I42:if ceq.vtb t28, t29 goto L11
  GOTO L9

Block L9:
  Predecessors: L8
  Successors: L14 UNWIND
  I37:L9:
  I43:tauhastype      t13,cls:IntfcImpl -) t30:tau
  I44:ldvfnslot [t28.IntfcImpl::inc] ((t27)) -) t31:method:inc
  I46:callimem  [t31](t13) ((t27,t30)) -)
  GOTO L14

Block L11:
  Predecessors: L8
  Successors: L12 UNWIND
  I38:L11:
  I48:--- IntfcImpl::inc: ()
  I49:chknull   t13 -) t32:tau
  GOTO L12

> > So, that would be interesting to implement it!
>
> > Seems like the best choice is to start from a couple of easy heuristics:
> > * if there is only one loaded class to implement the interface, choose it
> > * if there are more, choose the one with its method invoked earlier
> > (compiled by some JIT, possibly, some other JIT),

  If we forget the profile guidance for now, could you please elaborate more
about how we should do this and on what exactly is happening now? BTW, do we
currently raise the IncompatibleClassChangeError if the objectref's class
does not actually support the interface? Do we cache the interface tables
per class object and can we improve this cache search in the optimization?
In non-trivial cases where many classes implement the same interface, the
cache search may be more expensive than the slot lookup.
  We could also virtualize and then devirtualize the interface invocation
when we can...somewhat like the jit dump above. What do you think?

> > * if we have many candidate methods that are compiled, choose the most
> > frequent one (need a method-entry profile, the feature is likely to stay
> > untouched for a while, I guess)
>
> > 3. Should I create a JIRA for the issue ASAP? :)

  I would say yes, let's create a JIRA issue

Thanks,

    Rana

Re: [drlvm] interface call devirtualization

Posted by Mikhail Fursov <mi...@gmail.com>.

Folks,
I tried to fix the devirtualizer to process interfaces. I still can't run an
application as big as Eclipse, but the initial test in JIRA 957 works.
The performance benefit of interface devirtualization for this test is
rather small (14%), but the problem now is not with the devirtualizer but with
the loop optimizations, which do not move the vtable loading operation out of
the loop.
Here is a small report of what was done, how to use the patch, and some notes
on current DRLVM problems.

What was done:
1) Devirtualization of interface calls
2) Profile-directed devirtualization for both interface and class calls
3) Two new options added that work only if there is an entry-backedge
profile for a method and the devirtualizer is ON.
The first option is "opt::devirt::skip_interfaces": if set to 'false', the
devirtualizer processes interfaces and selects a direct call target by
method hotness profile.
The second option is "opt::devirt::profile_selection": if set to 'true', the
devirtualizer selects the direct call target by hotness profile for normal
methods (not interfaces).

Files in attachment (I am going to add them to JIRA in a minute):
1) patch.diff - the diff with changes
+ the diff includes the previous changes I posted to this thread that count
the percent of virtual methods ... (see previous post or just skip it, this
is not so important now)
2) em configuration file and run script for Windows: use these files to run
the test. But fix the paths in the files first.
3) Initial test from Egor with a decreased number of iterations: waiting 30
seconds for each measurement is too much  :)

Notes:
1) VM CHA analysis (method iteration) does not work if the root class is an
interface
2) RI works fast even for the first run: OSR ?
3) The slow version of the method (generated by JET) is executed one extra time
after the method is recompiled, I hope we will fix it one day.
4) We really need to improve our loop optimizations.
5) It's still unclear if method hotness profile is useful for direct target
selection in devirtualizer.

That's all for today. Please try/check the diff and comment (JIRA 957)

-- 
Mikhail Fursov

Re: [drlvm] interface call devirtualization

Posted by Mikhail Fursov <mi...@gmail.com>.

On 24 Jul 2006 12:17:18 +0700, Egor Pasko <eg...@gmail.com> wrote:
>
> Interesting results! QuickQuestions:
> * did you include interface calls in your investigations. Or is it
>   just invokevirtual that you tried?

I have no idea how each method was called. The result shows only
the percentage: actual calls of virtual methods at the top of the class
hierarchy / total number of virtual calls.

> * Why do you count number of methods with mult-dispatches? I would
>   count only interface calls having multiple dispatches.

I checked only methods with multiple virtual versions, because this is the
only case where the devirtualizer has a choice. If a method is virtual but has
only one version (the method is not overridden), it's impossible to make a
mistake in devirtualization :)

> For me it is interesting if
> the patch handles the situations when classes are loaded from time to
> time. We need to check each interface call for having multiple
> dispatches (at runtime).

But if we have a recompilation we can be sure that all of the "hot" classes
and methods are already loaded.

> I created HARMONY-957 for the issue and attached my benchmark. You can
> put your patch there, please.

My patch is a hack in the entry-backedge profiler code. So I will post it with
comments like 'this is just an example, not to be included in harmony code'

> OK, see you soon :)

I hope to finish it by the end of the week. Any help from other people
interested in harmony development is welcome :)

+ I'm going to ask Pavel Pervov, who wants to refactor the Class.h code, to add
more methods to simplify CHA analysis. The way I look for the virtual copy
of a method in the superclass in my patch is really ugly.

-- 
Mikhail Fursov

Re: [drlvm] interface call devirtualization

Posted by Egor Pasko <eg...@gmail.com>.

On the 0x1AD day of Apache Harmony Mikhail Fursov wrote:
> I promised to start to investigate devirtualization techniques and here is
> the first result:
> I hacked the entry-backedge profiler in DRLVM and added the dump of virtual
> methods call frequencies when VM is about to be shut down.
> The goal is to test that the last method in class hierarchy is the most
> probable one.
> 
> The numbers I've got for different runtimes:
> 
> Eclipse startup (with a couple of opened projects):
> Number of methods compiled: 18136
> Number of virtual methods with multiple dispatches: 4676
> Percent of the top methods frequency: 76%
> 
> DaCapo bloat:
> Number of methods compiled: 1635
> Number of virtual methods with multiple dispatches: 401
> Percent of the top methods frequency: 88%
> 
> DaCapo hsqldb:
> Number of methods compiled: 2322
> Number of virtual methods with multiple dispatches: 512
> Percent of the top methods frequency: 64%
> 
> DaCapo xalan:
> Number of methods compiled: 2857
> Number of virtual methods with multiple dispatches: 824
> Percent of the top methods frequency: 59%
> 
> The results show that the heuristics to devirtualize the top method in
> hierarchy is practical, but the frequency of 'middle' methods is too high
> and more advanced methods should be used.

Interesting results! QuickQuestions: 
* did you include interface calls in your investigations. Or is it
  just invokevirtual that you tried? 
* Why do you count number of methods with mult-dispatches? I would
  count only interface calls having multiple dispatches.

> Could I ask you or someone else to review my patch? It would be a pity if I
> had errors in it :)

Sure, I'll gratefully look at your patch and dig into understanding
your ideas, hoping to be successful :) For me it is interesting if
the patch handles the situations when classes are loaded from time to
time. We need to check each interface call for having multiple
dispatches (at runtime).

I created HARMONY-957 for the issue and attached my benchmark. You can
put your patch there, please.

> As the next task I will modify devirtualizer to select a method to
> devirtualize by the hottest frequency and will compare the results.

OK, see you soon :)

-- 
Egor Pasko, Intel Managed Runtime Division


Re: [drlvm] interface call devirtualization

Posted by Mikhail Fursov <mi...@gmail.com>.

Egor,
I promised to start to investigate devirtualization techniques and here is
the first result:
I hacked the entry-backedge profiler in DRLVM and added a dump of virtual
method call frequencies when the VM is about to be shut down.
The goal is to test that the last method in the class hierarchy is the most
probable one.

The numbers I've got for different runtimes:

Eclipse startup (with a couple of opened projects):
Number of methods compiled: 18136
Number of virtual methods with multiple dispatches: 4676
Percent of the top methods frequency: 76%

DaCapo bloat:
Number of methods compiled: 1635
Number of virtual methods with multiple dispatches: 401
Percent of the top methods frequency: 88%

DaCapo hsqldb:
Number of methods compiled: 2322
Number of virtual methods with multiple dispatches: 512
Percent of the top methods frequency: 64%

DaCapo xalan:
Number of methods compiled: 2857
Number of virtual methods with multiple dispatches: 824
Percent of the top methods frequency: 59%

The results show that the heuristic of devirtualizing the top method in the
hierarchy is practical, but the frequency of 'middle' methods is too high
and more advanced methods should be used.
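
For readers who want to reproduce the "percent of the top methods frequency" figure, here is one way it could be computed. This is a guess at the metric, assuming it means calls to the top version divided by all calls, summed over the methods that have multiple versions; TopMethodShare and Family are made-up names, and the patch itself may count differently.

```java
import java.util.List;

/** Sketch of the "top method" share metric; the definition is an assumption. */
final class TopMethodShare {
    /** Call counts for one virtual method that has several versions. */
    static final class Family {
        final long topCalls;    // calls that reached the version at the top of the hierarchy
        final long otherCalls;  // calls that reached any other ('middle') version

        Family(long topCalls, long otherCalls) {
            this.topCalls = topCalls;
            this.otherCalls = otherCalls;
        }
    }

    /** Percentage of all counted calls that reached the top version. */
    static double percentTop(List<Family> families) {
        long top = 0;
        long total = 0;
        for (Family f : families) {
            top += f.topCalls;
            total += f.topCalls + f.otherCalls;
        }
        return total == 0 ? 0.0 : 100.0 * top / total;
    }
}
```

With made-up counts of 3 top calls and 1 other call for one method, and 1 top call and 3 others for a second, percentTop gives 50, so a single hot 'middle' method can pull the overall figure down noticeably.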

Could I ask you or someone else to review my patch? It would be a pity if I
had errors in it :)

As the next task I will modify the devirtualizer to select the method to
devirtualize by the hottest frequency and will compare the results.


On 20 Jul 2006 09:41:37 +0700, Egor Pasko <eg...@gmail.com> wrote:
>
> On the 0x1AB day of Apache Harmony Mikhail Fursov wrote:
> > > Seems like the best choice is to start from a couple of easy heuristics:
> > > * if there is only one loaded class to implement the interface, choose it
> > > * if there are more, choose the one with its method invoked earlier
> > >   (compiled by some JIT, possibly, some other JIT),
> > > * if we have many candidate methods that are compiled, choose the most
> > >   frequent one (need a method-entry profile, the feature is likely to stay
> > >   untouched for a while, I guess)
> > >
> > >
> > > 1. Does anybody have some additional elegant ideas?
> > >
> > >
> > Egor,  I'm interested in devirtualizer improvement too.
>
> Great!
>
> > IMO the profile based devirtualization will probably have the best results
> > and is easy to implement and to check: infrastructure in Jitrino is ready to
> > do it right now with a help of entry-backedge profile collected for OPT by
> > JET.
>
> OK, if it is not so difficult to use the profile. BTW, if I have a
> backedge in my HIR, how do I identify it with JET's backedge? Or is
> this done automatically in some way I should not care about?
>
> > + more ideas I have:
> > 1) to use edge profiler as value based one if initial compilation
> > devirtualize all possible dispatches and count their edge frequencies.
>
> Oh, that's a real code bloater!:) IMHO, inlining of these 'possible
> dispatches' should be disabled. Needs to make a hint for inliner. Not
> a difficult task though. Just an extra flag in call instruction.
>
> > 2) implement a real value profiler - this is not an easy task, but may be
> > reused in other optimizations
>
> Yes, sure. This is a big item for ongoing development.
>
> > 3) Add special annotations to classlib code about the most probable
> > dispatch. E.g. if the variable type does not depend on user's environment
> > and developer can prove that 90% of time the variable is of specific class -
> > JIT can read this annotation from method during the compilation and to
> > devirtualize it.
>
> Okay, waiting for 1.5 support here. I like the idea of writing
> annotations for devirtualization, parallelization, and other. They can
> provide hints without changing program semantics. I think, they should
> be available not only in classlib, but for user code too.
>
> > I will read more about other devirtualization techniques we can use and will
> > reply in a several days with results.
>
> That would be great! Thanks for ideas!
>
> --
> Egor Pasko, Intel Managed Runtime Division
>
>


-- 
Mikhail Fursov

Re: [drlvm] interface call devirtualization

Posted by Egor Pasko <eg...@gmail.com>.

On the 0x1AB day of Apache Harmony Mikhail Fursov wrote:
> > Seems like the best choice is to start from a couple of easy heuristics:
> > * if there is only one loaded class to implement the interface, choose it
> > * if there are more, choose the one with its method invoked earlier
> >   (compiled by some JIT, possibly, some other JIT),
> > * if we have many candidate methods that are compiled, choose the most
> >   frequent one (need a method-entry profile, the feature is likely to stay
> >   untouched for a while, I guess)
> >
> >
> > 1. Does anybody have some additional elegant ideas?
> >
> >
> Egor,  I'm interested in devirtualizer improvement too.

Great!

> IMO the profile based devirtualization will probably have the best results
> and is easy to implement and to check: infrastructure in Jitrino is ready to
> do it right now with a help of entry-backedge profile collected for OPT by
> JET.

OK, if it is not so difficult to use the profile. BTW, if I have a
backedge in my HIR, how do I identify it with JET's backedge? Or is
this done automatically in some way I should not care about?

> + more ideas I have:
> 1) to use edge profiler as value based one if initial compilation
> devirtualize all possible dispatches and count their edge frequencies.

Oh, that's a real code bloater! :) IMHO, inlining of these 'possible
dispatches' should be disabled. That needs a hint for the inliner. Not
a difficult task though: just an extra flag in the call instruction.
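
To make the trade-off concrete, here is a minimal hand-written sketch of what idea 1 amounts to at a single call site: every known receiver class gets its own guarded direct call, and each branch carries a counter standing in for an edge-profile counter. The types Op, AddOne, Twice and the class EdgeCountedCallSite are hypothetical names made up for this illustration; nothing here is Jitrino code.

```java
// Hypothetical interface and implementations, for illustration only.
interface Op {
    int apply(int x);
}

class AddOne implements Op {
    public int apply(int x) { return x + 1; }
}

class Twice implements Op {
    public int apply(int x) { return x * 2; }
}

class EdgeCountedCallSite {
    // One counter per guarded branch. In a JIT these would be
    // edge-profile counters attached to the branches of the guard chain.
    static long addOneEdge;
    static long twiceEdge;
    static long fallbackEdge;

    // The interface call op.apply(x), expanded into one guarded direct
    // call per known implementor plus a fallback to normal dispatch.
    static int call(Op op, int x) {
        if (op.getClass() == AddOne.class) {
            addOneEdge++;
            return ((AddOne) op).apply(x);   // direct call candidate
        } else if (op.getClass() == Twice.class) {
            twiceEdge++;
            return ((Twice) op).apply(x);    // direct call candidate
        }
        fallbackEdge++;
        return op.apply(x);                  // regular interface dispatch
    }

    public static void main(String[] args) {
        Op[] ops = { new AddOne(), new Twice(), new AddOne() };
        for (Op op : ops) {
            call(op, 10);
        }
        System.out.println("AddOne edge: " + addOneEdge
                + ", Twice edge: " + twiceEdge
                + ", fallback edge: " + fallbackEdge);
    }
}
```

The guard chain grows with the number of implementors, and inlining each guarded target would multiply the code size again, which is why the inliner would need to be told to leave these calls alone during the profiling compilation.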

> 2) implement a real value profiler - this is not an easy task, but may be
> reused in other optimizations

Yes, sure. This is a big item for ongoing development.

> 3) Add special annotations to classlib code about the most probable
> dispatch. E.g. if the variable type does not depend on user's environment
> and developer can prove that 90% of time the variable is of specific class -
> JIT can read this annotation from method during the compilation and to
> devirtualize it.

Okay, waiting for 1.5 support here. I like the idea of writing
annotations for devirtualization, parallelization, and other
optimizations. They can provide hints without changing program
semantics. I think they should be available not only in classlib, but
for user code too.

> I will read more about other devirtualization techniques we can use and will
> reply in several days with results.

That would be great! Thanks for ideas!

-- 
Egor Pasko, Intel Managed Runtime Division



Re: [drlvm] interface call devirtualization

Posted by Mikhail Fursov <mi...@gmail.com>.

>
> Seems like the best choice is to start from a couple of easy heuristics:
> * if there is only one loaded class to implement the interface, choose it
> * if there are more, choose the one with its method invoked earlier (compiled
>   by some JIT, possibly, some other JIT),
> * if we have many candidate methods that are compiled, choose the most frequent
>   one (need a method-entry profile, the feature is likely to stay untouched for
>   a while, I guess)
>
>
> 1. Does anybody have some additional elegant ideas?
>
>
Egor, I'm interested in devirtualizer improvement too.
IMO the profile-based devirtualization will probably have the best results
and is easy to implement and to check: the infrastructure in Jitrino is ready
to do it right now with the help of the entry-backedge profile collected for
OPT by JET.
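
As a rough illustration of profile-based devirtualization, the sketch below picks the hottest receiver class from a map of entry counts and shows, written by hand, the guarded direct call a JIT could emit for that class. The map is only a stand-in for a method-entry profile, and Counter, FastCounter, SlowCounter and DevirtSketch are hypothetical names; none of this reflects the actual Jitrino or JET data structures.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical interface and implementations, for illustration only.
interface Counter {
    void inc();
    int getNum();
}

class FastCounter implements Counter {
    private int num;
    public void inc() { num++; }
    public int getNum() { return num; }
}

class SlowCounter implements Counter {
    private int num;
    public void inc() { num += 1; }
    public int getNum() { return num; }
}

class DevirtSketch {
    // Stand-in for a method-entry profile: receiver class -> entry count.
    // Returns the class with the highest count, or null for an empty profile.
    static Class<?> hottest(Map<Class<?>, Long> entryCounts) {
        Class<?> best = null;
        long bestCount = -1;
        for (Map.Entry<Class<?>, Long> e : entryCounts.entrySet()) {
            if (e.getValue() > bestCount) {
                best = e.getKey();
                bestCount = e.getValue();
            }
        }
        return best;
    }

    // Hand-written equivalent of a guarded devirtualized call site,
    // assuming the profile selected FastCounter as the likely receiver.
    static void incGuarded(Counter c) {
        if (c.getClass() == FastCounter.class) {
            ((FastCounter) c).inc();   // direct call, can be inlined
        } else {
            c.inc();                   // fallback: regular interface dispatch
        }
    }

    public static void main(String[] args) {
        Map<Class<?>, Long> profile = new HashMap<>();
        profile.put(FastCounter.class, 900L);
        profile.put(SlowCounter.class, 100L);
        System.out.println("Likely receiver: " + hottest(profile).getName());

        Counter c = new FastCounter();
        incGuarded(c);
        System.out.println("Count after one call: " + c.getNum());
    }
}
```

The guard keeps the transformation safe when another implementor shows up at run time: the program behaves the same either way, and only the common path gets cheaper.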

+ more ideas I have:
1) use the edge profiler as a value-based one, if the initial compilation
devirtualizes all possible dispatches and counts their edge frequencies.
2) implement a real value profiler - this is not an easy task, but it may be
reused in other optimizations
3) Add special annotations to classlib code about the most probable
dispatch. E.g. if the variable type does not depend on the user's environment
and the developer can prove that 90% of the time the variable is of a specific
class, the JIT can read this annotation from the method during compilation and
devirtualize it.
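
A minimal sketch of how idea 3 could look with Java 1.5 annotations follows. The annotation @ProbableReceiver and the classes Shape, Square, Geometry and HintReader are invented for this example and do not exist in any class library. The reader uses reflection only to keep the example self-contained; a JIT would presumably read the same information from the class file while compiling the method.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Method;

// Hypothetical hint annotation: names the most probable receiver class
// for interface calls made inside the annotated method.
@Retention(RetentionPolicy.RUNTIME)
@Target(ElementType.METHOD)
@interface ProbableReceiver {
    Class<?> value();
}

// Hypothetical interface and implementation, for illustration only.
interface Shape {
    double area();
}

class Square implements Shape {
    private final double side;
    Square(double side) { this.side = side; }
    public double area() { return side * side; }
}

class Geometry {
    // The hint says: s.area() below is almost always Square.area().
    // It does not change what the method computes.
    @ProbableReceiver(Square.class)
    static double total(Shape[] shapes) {
        double sum = 0;
        for (Shape s : shapes) {
            sum += s.area();
        }
        return sum;
    }
}

class HintReader {
    // Returns the hinted class for the first method with the given name,
    // or null if that method is missing or carries no hint.
    static Class<?> hintFor(Class<?> owner, String methodName) {
        for (Method m : owner.getDeclaredMethods()) {
            if (m.getName().equals(methodName)) {
                ProbableReceiver hint = m.getAnnotation(ProbableReceiver.class);
                return hint == null ? null : hint.value();
            }
        }
        return null;
    }

    public static void main(String[] args) {
        System.out.println("Hint for Geometry.total: "
                + hintFor(Geometry.class, "total"));
    }
}
```

Because the hint is advisory, a compiler that acts on it would still need a class guard with a fallback to normal dispatch, so a wrong annotation costs performance but not correctness.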

I will read more about other devirtualization techniques we can use and will
reply in several days with results.

-- 
Mikhail Fursov