You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@harmony.apache.org by sh...@computer.org on 2005/05/23 01:48:54 UTC
Re: Other interesting papers and research
From: Steve Blackburn <St...@anu.edu.au>
> > acoliver@apache.org wrote:
> >
> >> The approach of using C Compiler generated code rather than writing a
> >> full compiler appeals to me:
> >> http://www.csc.uvic.ca/~csc586a/papers/ertlgregg04.pdf
> >>
> >> I am curious on how well the approach performs compared to existing
> >> JITs.
> They automatically build themselves
> simple JIT backends (by extracting fragments produced by the ahead of
> time compiler). This sounds like a great way to achieve portability
> while achiving better performance than a conventional interpreter.
I guess it's a bit better or just comparable with a good interpreter.
In 1998, I have written such a JIT compiler concatinate code fragments
generated by GCC for each JVM instruction. Unfortunately, the JIT was
slightly slower than an interpreter in Sun's Classic VM. The
interpreter was written in x86 assembly language and implements
dynamic stack caching with 2 registers and 3 states. It performs much
better than the previous interpreter written in C.
Then I rewrote the JIT.
I am not very sure which is better for us, having a portable and so-so
baseline compiler or a good interpreter which is possibly less
portable than the compiler. There will be a trade off between memory
consumption, portability and so on.
Kazuyuki Shudo shudo@computer.org http://www.shudo.net/
Re: Other interesting papers and research
Posted by Davanum Srinivas <da...@gmail.com>.
Found a paper from David too...
http://www.research.ibm.com/people/d/dgrove/papers/cgo05.html
On 6/6/05, shudo@computer.org <sh...@computer.org> wrote:
> Hi Rob,
>
> > From: Robert Lougher <ro...@gmail.com>
> > Date: Mon, 6 Jun 2005 14:58:45 +0100
>
> > > > One thing to
> > > > note is that a threaded interpreter would see something like a 2-4x
> > > > expansion over "normal" bytecodes when it converts from bytecodes to its
> > > > internal form (arrays of function pointers).
> > >
> > > Direct threading interpreters like JDK's one work on plain Java
> > > bytecode and they do not need to expand normal bytecode instructions.
> > > Such expansion may have been required if Java bytecode is not linear
> > > and rather a tree or other complicated form.
> >
> > According to my understanding, an indirect threaded interpreter uses
> > the original bytecode stream. It's indirect because the handler
> > address must be looked up via the bytecode.
>
> Ah, thanks for the indication.
> My wording 'direct threading' was not correct.
>
> Threading (interpreting) techniques I referred as implemented in JDKs
> should be called 'token threading', neither direct nor indirect threading
> because they work directly on bytecode instructions withought any expansion.
> Note that the interpreter provides NEXT routines to for all native
> code fragments corresponding to VM instructions.
> For JVM, this wording like something threading is not very informative
> because direct interpretation of portable bytecode is naturally
> 'token threading'.
>
> Dave's last posting was based on direct threading technique and his saying
> was correct about direct threading but my posting was incorrect in advance.
>
> Kazuyuki Shudo shudo@computer.org http://www.shudo.net/
>
--
Davanum Srinivas - http://webservices.apache.org/~dims/
Re: Other interesting papers and research
Posted by sh...@computer.org.
Hi Rob,
> From: Robert Lougher <ro...@gmail.com>
> Date: Mon, 6 Jun 2005 14:58:45 +0100
> > > One thing to
> > > note is that a threaded interpreter would see something like a 2-4x
> > > expansion over "normal" bytecodes when it converts from bytecodes to its
> > > internal form (arrays of function pointers).
> >
> > Direct threading interpreters like JDK's one work on plain Java
> > bytecode and they do not need to expand normal bytecode instructions.
> > Such expansion may have been required if Java bytecode is not linear
> > and rather a tree or other complicated form.
>
> According to my understanding, an indirect threaded interpreter uses
> the original bytecode stream. It's indirect because the handler
> address must be looked up via the bytecode.
Ah, thanks for the indication.
My wording 'direct threading' was not correct.
Threading (interpreting) techniques I referred as implemented in JDKs
should be called 'token threading', neither direct nor indirect threading
because they work directly on bytecode instructions withought any expansion.
Note that the interpreter provides NEXT routines to for all native
code fragments corresponding to VM instructions.
For JVM, this wording like something threading is not very informative
because direct interpretation of portable bytecode is naturally
'token threading'.
Dave's last posting was based on direct threading technique and his saying
was correct about direct threading but my posting was incorrect in advance.
Kazuyuki Shudo shudo@computer.org http://www.shudo.net/
Re: Other interesting papers and research
Posted by Robert Lougher <ro...@gmail.com>.
Hi,
On 6/6/05, shudo@computer.org <sh...@computer.org> wrote:
> Hi Dave,
>
> > From: David P Grove <gr...@us.ibm.com>
> >
> > One thing to
> > note is that a threaded interpreter would see something like a 2-4x
> > expansion over "normal" bytecodes when it converts from bytecodes to its
> > internal form (arrays of function pointers).
>
> Direct threading interpreters like JDK's one work on plain Java
> bytecode and they do not need to expand normal bytecode instructions.
> Such expansion may have been required if Java bytecode is not linear
> and rather a tree or other complicated form.
>
According to my understanding, an indirect threaded interpreter uses
the original bytecode stream. It's indirect because the handler
address must be looked up via the bytecode. A direct threaded
interpreter removes this step by placing the actual handler addresses
in the rewritten instruction stream.
For what it's worth, JamVM supports both direct and indirect
threading, with static and dynamic stack-caching respectively. I seem
to recall an average code size increase of ~4x for JamVM's internal
instruction format but I'll need to recheck my figures to be sure.
Note, this is assuming a 32-bit architecture. Handler addresses will
be double on a 64-bit machine, and the code increase over bytecodes
therefore larger.
Rob.
Re: Other interesting papers and research
Posted by sh...@computer.org.
Hi Dave,
> From: David P Grove <gr...@us.ibm.com>
> shudo@computer.org wrote on 06/05/2005 10:48:29 PM:
>
> > - The machine code concatinating technique consumes much memory.
> > In my experience, generated machine code is about 10 times larger
> > than the original instructions in Java bytecode.
> >
> > In the paper, the authors have not mentioned memory consumption of the
> > technique. We cannot guess how much it is precisely, but it is
> > possible to be a big drawback. Yes, we can say the same for the
> > approach taking a baseline compiler instead of an interpreter (like
> > Jikes RVM). Memory consumption of the baseline compiler of Jike RVM
> > is very interesting.
>
> It's platform dependent of course, but on IA32 isn't too horrible. For
> example, running SPECjvm98 we see a 6.23x expansion from the Jikes RVM
> baseline compiler machine code bytes over bytecode bytes.
Thanks for giving us such an useful number.
It looks reasonable.
> One thing to
> note is that a threaded interpreter would see something like a 2-4x
> expansion over "normal" bytecodes when it converts from bytecodes to its
> internal form (arrays of function pointers).
Direct threading interpreters like JDK's one work on plain Java
bytecode and they do not need to expand normal bytecode instructions.
Such expansion may have been required if Java bytecode is not linear
and rather a tree or other complicated form.
Then,
> So, a 6x expansion is
> probably only roughly 2x worse than some interpreted systems would
> actually see in practice.
We have to just say the baseline compiler of Jikes RVM generates 6x
larger native code than the original bytecode instructions.
For Java-written JVM, it seems to be natural to have a baseline
compiler instead of an interpreter.
It looks complicated to have an interpreter for a Java-written JVM. We
hope that the architecture of a JVM (e.g. interpreter or baseline
compiler) is independent of the language for implementing a certain
part of JVM. But there seems to be an implication between them.
Any comment?
Kazuyuki Shudo shudo@computer.org http://www.shudo.net/
Re: Other interesting papers and research
Posted by David P Grove <gr...@us.ibm.com>.
shudo@computer.org wrote on 06/05/2005 10:48:29 PM:
> - The machine code concatinating technique consumes much memory.
> In my experience, generated machine code is about 10 times larger
> than the original instructions in Java bytecode.
>
> In the paper, the authors have not mentioned memory consumption of the
> technique. We cannot guess how much it is precisely, but it is
> possible to be a big drawback. Yes, we can say the same for the
> approach taking a baseline compiler instead of an interpreter (like
> Jikes RVM). Memory consumption of the baseline compiler of Jike RVM
> is very interesting.
It's platform dependent of course, but on IA32 isn't too horrible. For
example, running SPECjvm98 we see a 6.23x expansion from the Jikes RVM
baseline compiler machine code bytes over bytecode bytes. One thing to
note is that a threaded interpreter would see something like a 2-4x
expansion over "normal" bytecodes when it converts from bytecodes to its
internal form (arrays of function pointers). So, a 6x expansion is
probably only roughly 2x worse than some interpreted systems would
actually see in practice. You can get this data out of Jikes RVM for your
platform/program with -X:vm:measureCompilation=true.
--dave
Re: Other interesting papers and research
Posted by sh...@computer.org.
Hi Steve and all,
| The approach of using C Compiler generated code rather than writing a
| full compiler appeals to me:
| http://www.csc.uvic.ca/~csc586a/papers/ertlgregg04.pdf
> From: Steve Blackburn <St...@anu.edu.au>
> Date: Tue, 24 May 2005 21:08:05 +1000
> >>They automatically build themselves
> >>simple JIT backends (by extracting fragments produced by the ahead of
> >>time compiler). This sounds like a great way to achieve portability
> >>while achiving better performance than a conventional interpreter.
> >I guess it's a bit better or just comparable with a good interpreter.
>
> They say it is a lot better: "speedups of up to 1.87 over the fastest
> previous interpreter based technique, and performance comparable to
> simple native code compilers.
Their technique may be reasonable in cases,
but it is better to be aware of:
- On an Athlon processor, speedup (gforth-native over gforth-super) is
less than those on PowerPC. It is between 1.06 and 1.49.
- The machine code concatinating technique consumes much memory.
In my experience, generated machine code is about 10 times larger
than the original instructions in Java bytecode.
In the paper, the authors have not mentioned memory consumption of the
technique. We cannot guess how much it is precisely, but it is
possible to be a big drawback. Yes, we can say the same for the
approach taking a baseline compiler instead of an interpreter (like
Jikes RVM). Memory consumption of the baseline compiler of Jike RVM
is very interesting.
And the next one does not reduce the value of the technique, but
better to know:
- The compared interpreter (gforth-super) could be improved
by other techniques including stack caching.
This means that their machine code concatinating technique may
benefit from those remaining techniques.
> >In 1998, I have written such a JIT compiler concatinate code fragments
> >generated by GCC for each JVM instruction.
>
> Very interesting!
>
> >Unfortunately, the JIT was
> >slightly slower than an interpreter in Sun's Classic VM. The
> >interpreter was written in x86 assembly language and implements
> >dynamic stack caching with 2 registers and 3 states. It performs much
> >better than the previous interpreter written in C.
> >
> >Then I rewrote the JIT.
In reality, my old JIT compiler in C cannot be strictly compared to
the JDK interpreter in assembly language.
I try clarifying the situation.
inst. execution stack handling
(1)Ertl&Gregg's gforth-super direct threading <2> TOS in a register <2>
(2)Ertl&Gregg's gforth-native concat'd machine code <1> TOS in a register <2>
(3)JDK interpreter in C switch threading <3> memory <3>
(4)JDK interpreter in asm direct threading <2> stack caching <1>
(5)Shudo's first JIT concat'd machine code <1> memory <3>
(6)Shudo's lightweight JIT concat'd macihne code <1> stack caching <1>
Gforth-super (1) is the Ertl&Gregg's fastest interpreter compared to
the machine code concatinating technique (2).
(3) is the only interpreter which an ancient JDK 1.0.2 had.
(4) is an interpreter of JDK 1.1 written in assembly language.
(5) is my first JIT compiler which concatinates machine code
for each VM instruction.
(6) is shuJIT, the JIT compiler re-written after (5).
<1>-<3> mean the order of expected performance.
Machine code concatination proposed in the Ertl&Gregg's paper is
marked with <1> because it is the best in those in this table
for VM instruction execution.
(4) and (6) uses 2 machine registers for stack caching and
Ertl&Gregg's implementations uses one register
to cache the top of stack (TOS).
Ertl&Gregg's paper says that gforth-native (2) provided speedups up to
1.87 over gforth-super (1).
I wrote in a previous posting that (5) was slower than (4), but (5)
was inferior to (4) in stack handling. It is possible to be the reason
of the slowness and then I cannot say that the machine code
concatinating technique is not necessarily faster than a good
interpreter like (4).
The followings are other facts I experienced:
- Shudo's lightweight JIT (6) generates about 10 times larger native code
compared with the original Java bytecode instructions.
This is possible to be a drawback of an approach taking a baseline
compiler instead of an interpreter.
Does the Ertl&Gregg's machine concatinating technique suffer it?
How about a baseline compiler of Jikes RVM?
- An interpreter of JDK 1.1 written in assembly (4) executed
a numerical benchmark, Linpack about twice as fast as
the previous interpreter in C (3).
A lesson here is that an interpreter is often regarded as just slow
but a faster one and a slower one are very different in performance.
- Register utilization is still important even for an interpreter.
Interpreters (1) and (4), and lightweight JIT compilers (2) and (6)
utilizes one or two machine registers to cache values around TOS.
It is also important to map VM registers (e.g. PC) onto machine registers.
This means a good interpreter cannot be implemented in a completely
portable way.
Note that I do not know how Ertl&Gregg's implementations (1) and (2)
use a machine register to keep TOS. I guess they have a little
assembly code.
Kazuyuki Shudo shudo@computer.org http://www.shudo.net/
Re: Other interesting papers and research
Posted by Steve Blackburn <St...@anu.edu.au>.
shudo@computer.org wrote:
>>They automatically build themselves
>>simple JIT backends (by extracting fragments produced by the ahead of
>>time compiler). This sounds like a great way to achieve portability
>>while achiving better performance than a conventional interpreter.
>>
>>
>
>I guess it's a bit better or just comparable with a good interpreter.
>
>
They say it is a lot better: "speedups of up to 1.87 over the fastest
previous interpreter based technique, and performance comparable to
simple native code compilers. The effort required for retargeting our
implementation from the 386 to the PPC architecture was less than a
person day."
>In 1998, I have written such a JIT compiler concatinate code fragments
>generated by GCC for each JVM instruction.
>
Very interesting!
>Unfortunately, the JIT was
>slightly slower than an interpreter in Sun's Classic VM. The
>interpreter was written in x86 assembly language and implements
>dynamic stack caching with 2 registers and 3 states. It performs much
>better than the previous interpreter written in C.
>
>Then I rewrote the JIT.
>
>
It would be interesting to hear your perspective on Ertl & Gregg's
approach. Did they do something you had not done? Do they have any
particular insight? You are in an excellent position to make a critical
assessment of their work.
>I am not very sure which is better for us, having a portable and so-so
>baseline compiler or a good interpreter which is possibly less
>portable than the compiler. There will be a trade off between memory
>consumption, portability and so on.
>
>
Ideally we will have both as components and the capacity to choose
either or both depending on the build we are targetting.
Cheers,
--Steve