You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@harmony.apache.org by Nathan Beyer <nd...@apache.org> on 2007/04/18 07:03:15 UTC

[drlvm] Another mfence operation??? fast_thread_array???

I've run into another SSE2 operation in DRLVM, this time it's in the
hythr library, but I can't find the specific instance. According to
GDB, it's happening in fast_thread_array() in libhythr.so.

Is this from the 'apr_thread_ext.c' file? I'm going to attempt to
modify this to be like the changes I made to the 'atomics.cpp', so
that this can run on P3 (SSE) CPUs.

nathan@pants:~/harmony/drlvm-trunk/build/lnx_ia32_gcc_debug/deploy/jdk/jre/bin$
./java -Xint -cp /home/nathan/workspace/helloworld/bin HelloWorld
Illegal instruction
nathan@pants:~/harmony/drlvm-trunk/build/lnx_ia32_gcc_debug/deploy/jdk/jre/bin$
gdb --args ./java -Xint -cp /home/nathan/workspace/helloworld/bin
HelloWorld
GNU gdb 6.6-debian
Copyright (C) 2006 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB.  Type "show warranty" for details.
This GDB was configured as "i486-linux-gnu"...
Using host libthread_db library "/lib/tls/i686/cmov/libthread_db.so.1".
(gdb) run
Starting program:
/home/nathan/harmony/drlvm-trunk/build/lnx_ia32_gcc_debug/deploy/jdk/jre/bin/java
-Xint -cp /home/nathan/workspace/helloworld/bin HelloWorld
[Thread debugging using libthread_db enabled]
[New Thread -1214834000 (LWP 32414)]
[New Thread -1215992944 (LWP 32417)]
[New process 32414]
[New LWP 32414]

Program received signal SIGILL, Illegal instruction.
0xb7bc543f in fast_thread_array ()
   from /home/nathan/harmony/drlvm-trunk/build/lnx_ia32_gcc_debug/deploy/jdk/jre/bin/libhythr.so
(gdb) x/1i $eip
0xb7bc543f <fast_thread_array+510367>:  mfence
(gdb)

Re: [drlvm] Another mfence operation??? fast_thread_array???

Posted by Gregory Shimansky <gs...@gmail.com>.
Rana Dasgupta wrote:
> Nathan,
>   I was going through your patch and had a small suggestion. We had
> discussed on other related threads that on Windows we wanted to use
> the MSVC provided _ReadWriteBarrier intrinsic and the patch does that,
> but this may not be enough. This generates no instruction, but is a
> compiler fence only. x86 does not need a store fence, but for smp the
> load fence can be certainly needed.
>  That's the assumption on Linux where in apr_memory_rw_barrier() we
> insert both a compiler and an instruction fence, eg.,
> 
> apr_memory_rw_barrier()
> {
> ...
> #else
> asm volatile ("lock; addl $0, 0(%%esp)":::"memory");
> ...
> }
> 
> I agree that since there is no mfence on PIII, probably the dummy lock
> is the safest option for x86 wide code. But  "lock" is expensive,
> specially for Pentium 4 , Centrino etc. which have the "mfence"
> instruction. And this penalty on current platforms is only on Linux.
> My suggestion is to do the same on Windows as well. If we need the
> real fence, we need it on both....So on Windows...
> 
> On Windows in apr_thread_ext.c:
> ================================
> #if _MSC_VER < 1400
>     extern void _ReadWriteBarrier();
> #else
>     #include <intrin.h>
> #endif
> #pragma intrinsic (_ReadWriteBarrier)
> 
> APR_DECLARE(void) apr_memory_rw_barrier(){
> #ifdef _EM64T_
>     // create rwfence64() in asm file port\src\thread\win\rwfence.asm

Actually MSVC (both 2003 and 2005) have intrinsics for all(?) SSE2 
instructions, I found them just recently. So if you need to generate 
mfence you can use _mm_mfence() [1] for it (for sfence there is 
_mm_sfence() [2]). They require mmintrin.h header to me included.

[1] 
ms-help://MS.VSCC.v80/MS.MSDN.v80/MS.VisualStudio.v80.en/dv_vclang/html/a9c9b42f-de9e-4374-aeb6-5f65bfb669b6.htm
[2]
ms-help://MS.VSCC.v80/MS.MSDN.v80/MS.VisualStudio.v80.en/dv_vclang/html/b6c0d18e-3628-4318-826b-45f66782e870.htm

>     rwfence64();
> #else
>     __asm {lock add [esp], 0 }
> #endif
>       _ReadWriteBarrier();
> }
> 
> In port\src\thread\win\rwfence.asm:
> ===================================
> PUBLIC rwfence64
> 
> _TEXT SEGMENT
> 
> rwfence64 PROC
> 
>     mfence
> 
> 
> PROC ENDP
> TEXT ENDS
> 
> What do you think?
> 
> Thanks,
> Rana
> 
> On 4/17/07, Nathan Beyer <nd...@apache.org> wrote:
>> I think I have this resolved with this commit [1]. I'm now able to run
>> DRLVM in interpreted mode on my Quad P3 ... at least a 'Hello World'
>> app.
>>
>> -Nathan
>>
>> [1] http://svn.apache.org/viewvc?view=rev&rev=529880
>>
>> On 4/18/07, Nathan Beyer <nd...@apache.org> wrote:
>> > I've run into another SSE2 operation in DRLVM, this time it's in the
>> > hythr library, but I can't find the specific instance. According to
>> > GDB, it's happening in fast_thread_array() in libhythr.so.
>> >
>> > Is this from the 'apr_thread_ext.c' file? I'm going to attempt to
>> > modify this to be like the changes I made to the 'atomics.cpp', so
>> > that this can run on P3 (SSE) CPUs.
>> >
>> > 
>> nathan@pants:~/harmony/drlvm-trunk/build/lnx_ia32_gcc_debug/deploy/jdk/jre/bin$ 
>>
>> > ./java -Xint -cp /home/nathan/workspace/helloworld/bin HelloWorld
>> > Illegal instruction
>> > 
>> nathan@pants:~/harmony/drlvm-trunk/build/lnx_ia32_gcc_debug/deploy/jdk/jre/bin$ 
>>
>> > gdb --args ./java -Xint -cp /home/nathan/workspace/helloworld/bin
>> > HelloWorld
>> > GNU gdb 6.6-debian
>> > Copyright (C) 2006 Free Software Foundation, Inc.
>> > GDB is free software, covered by the GNU General Public License, and 
>> you are
>> > welcome to change it and/or distribute copies of it under certain 
>> conditions.
>> > Type "show copying" to see the conditions.
>> > There is absolutely no warranty for GDB.  Type "show warranty" for 
>> details.
>> > This GDB was configured as "i486-linux-gnu"...
>> > Using host libthread_db library "/lib/tls/i686/cmov/libthread_db.so.1".
>> > (gdb) run
>> > Starting program:
>> > 
>> /home/nathan/harmony/drlvm-trunk/build/lnx_ia32_gcc_debug/deploy/jdk/jre/bin/java 
>>
>> > -Xint -cp /home/nathan/workspace/helloworld/bin HelloWorld
>> > [Thread debugging using libthread_db enabled]
>> > [New Thread -1214834000 (LWP 32414)]
>> > [New Thread -1215992944 (LWP 32417)]
>> > [New process 32414]
>> > [New LWP 32414]
>> >
>> > Program received signal SIGILL, Illegal instruction.
>> > 0xb7bc543f in fast_thread_array ()
>> >    from 
>> /home/nathan/harmony/drlvm-trunk/build/lnx_ia32_gcc_debug/deploy/jdk/jre/bin/libhythr.so 
>>
>> > (gdb) x/1i $eip
>> > 0xb7bc543f <fast_thread_array+510367>:  mfence
>> > (gdb)
>> >
>>
> 


-- 
Gregory


Re: [drlvm] Another mfence operation??? fast_thread_array???

Posted by Nathan Beyer <nd...@apache.org>.
I've committed the changes at
http://svn.apache.org/viewvc?view=rev&revision=531314.

I'm waiting for the builds to make sure the x86_64 Windows builds are okay.

-Nathan

On 4/22/07, Rana Dasgupta <rd...@gmail.com> wrote:
> I spoke in haste :-)  In fact, you don't necessarily need a seperate
> asm file for generating the mfence . You can use another Visual Studio
> intrinsic:
>
> void mm_mfence(void)
>
> to generate the mfence instruction on x86-64 section of  your patch by
> including emmintrin.h.
>
> http://msdn2.microsoft.com/en-us/library/ba08y07y(VS.80).aspx
>
>
>
> On 4/22/07, Rana Dasgupta <rd...@gmail.com> wrote:
> > Nathan,
> >  It supports it for 32 bit x86, but not for 64 bit as an earlier
> > disclaimer of the same msdn docs state.
> >
> > "Inline assembly is not supported on the Itanium and x64 processors.
> > The following topics explain how to use the Visual C/C++ inline
> > assembler with x86 processors:..."
> >
> > So you will need to create a seperate file.
> >
> >  Thanks,
> > Rana
> >
> >
> >
> >
> >
> > On 4/21/07, Nathan Beyer <nd...@apache.org> wrote:
> > > Rana,
> > > You mentioned that VS2005 doesn't support inline assembly, but based
> > > on this reference [1], it seems like it's still supported?
> > >
> > > -Nathan
> > >
> > > [1] http://msdn2.microsoft.com/en-us/library/45yd4tzz(VS.80).aspx
> > >
> > > On 4/20/07, Rana Dasgupta <rd...@gmail.com> wrote:
> > > > Thanks much Nathan.
> > > >
> > > > On 4/20/07, Nathan Beyer <nd...@apache.org> wrote:
> > > > > Okay, it's clear as mud now! :) Thanks.
> > > > >
> > > > > I'll patch this in now.
> > > > >
> > > > > -Nathan
> > > > >
> > > > > On 4/20/07, Rana Dasgupta <rd...@gmail.com> wrote:
> > > > > > On 4/20/07, Nathan Beyer <nd...@apache.org> wrote:
> > > > > > > I agree that we need to add the lock as well on windows. BTW - the
> > > > > > > locking for RW barriers was pulled from this document [1]. The lock
> > > > > > > instruction seemed the best of the three options; use mfence (not an
> > > > > > > option), use no-op lock or cpuid, which is apparently very slow.
> > > > > >
> > > > > > Yes, I agree this is the safest if you really want a real read write
> > > > > > barrier that can be used anywhere, with least common denominator code
> > > > > > on x86, using a single binary image. But the penalty needs to be the
> > > > > > same on Windows and Linux.
> > > > > > >
> > > > > > > Another option would be looking at the operation just after the
> > > > > > > barrier; according to this [1], any lock-prefixed instruction will
> > > > > > > work on x86, so instead of a lock and wasted instruction, we could
> > > > > > > lock with the next instruction. Though, this may only be something a
> > > > > > > JIT would do.
> > > > > >
> > > > > > Not all instructions can take the lock prefix that causes an assert on
> > > > > > the bus, which is what you need for the instruction to serialize, so
> > > > > > this "may" not always work.
> > > > > > >
> > > > > > > What does the EM64T mean?
> > > > > > > Isn't that just saying P4? I'm still not
> > > > > > > quite clear on the differences between EM6T, IPF, x86_64/x64, etc?
> > > > > >
> > > > > > I agree, there are just too many acronyms that seem to be ever
> > > > > > changing!  EM64T here indicates 64 bit x86, also called AMD64
> > > > > > elsewhere, or x64. It does have a seperate binary for us, and for it
> > > > > > we use the VS2005 compiler. (For reference, your Linux part of the
> > > > > > patch seperates this out very nicely). Unfortunately, for VS2005, we
> > > > > > can't do __asm{ mfence } etc. since Microsoft does not support inline
> > > > > > assembly in this new compiler. So a seperate .asm file ( above ) is
> > > > > > needed. The build file will have to be modified to assemble this( see
> > > > > > other port/atomic/win etc. )directories, sorry I didn't add the build
> > > > > > change. So the above code will use mfence only on EM64T ( as in Linux
> > > > > > ) , but use the "lock add..." on all 32 bit x86 platforms. IPF  ( also
> > > > > > referred to as iA64 ) is Itanium, and we only support it on Linux
> > > > > > where you use "asm volatile("mf:::memory" which is correct. We don't
> > > > > > need to worry about it on Windows.
> > > > > >
> > > > > > >How
> > > > > > > do these map to the builds? There's not a separate EM64T end state,
> > > > > > > right?
> > > > > > >
> > > > > > > If that does just mean a P4, I don't think this code snippet is
> > > > > > > correct, unless we're going to start separating our builds that way.
> > > > > > >
> > > > > > > [1] http://gee.cs.oswego.edu/dl/jmm/cookbook.html
> > > > > > >
> > > > > > > -Nathan
> > > > > > >
> > > > > > > On 4/20/07, Rana Dasgupta <rd...@gmail.com> wrote:
> > > > > > > > Nathan,
> > > > > > > >    I was going through your patch and had a small suggestion. We had
> > > > > > > > discussed on other related threads that on Windows we wanted to use
> > > > > > > > the MSVC provided _ReadWriteBarrier intrinsic and the patch does that,
> > > > > > > > but this may not be enough. This generates no instruction, but is a
> > > > > > > > compiler fence only. x86 does not need a store fence, but for smp the
> > > > > > > > load fence can be certainly needed.
> > > > > > > >   That's the assumption on Linux where in apr_memory_rw_barrier() we
> > > > > > > > insert both a compiler and an instruction fence, eg.,
> > > > > > > >
> > > > > > > > apr_memory_rw_barrier()
> > > > > > > > {
> > > > > > > > ...
> > > > > > > > #else
> > > > > > > > asm volatile ("lock; addl $0, 0(%%esp)":::"memory");
> > > > > > > > ...
> > > > > > > > }
> > > > > > > >
> > > > > > > > I agree that since there is no mfence on PIII, probably the dummy lock
> > > > > > > > is the safest option for x86 wide code. But  "lock" is expensive,
> > > > > > > > specially for Pentium 4 , Centrino etc. which have the "mfence"
> > > > > > > > instruction. And this penalty on current platforms is only on Linux.
> > > > > > > > My suggestion is to do the same on Windows as well. If we need the
> > > > > > > > real fence, we need it on both....So on Windows...
> > > > > > > >
> > > > > > > > On Windows in apr_thread_ext.c:
> > > > > > > > ================================
> > > > > > > > #if _MSC_VER < 1400
> > > > > > > >         extern void _ReadWriteBarrier();
> > > > > > > > #else
> > > > > > > >         #include <intrin.h>
> > > > > > > > #endif
> > > > > > > > #pragma intrinsic (_ReadWriteBarrier)
> > > > > > > >
> > > > > > > > APR_DECLARE(void) apr_memory_rw_barrier(){
> > > > > > > > #ifdef _EM64T_
> > > > > > > >         // create rwfence64() in asm file port\src\thread\win\rwfence.asm
> > > > > > > >         rwfence64();
> > > > > > > > #else
> > > > > > > >         __asm {lock add [esp], 0 }
> > > > > > > > #endif
> > > > > > > >        _ReadWriteBarrier();
> > > > > > > > }
> > > > > > > >
> > > > > > > > In port\src\thread\win\rwfence.asm:
> > > > > > > > ===================================
> > > > > > > > PUBLIC rwfence64
> > > > > > > >
> > > > > > > > _TEXT SEGMENT
> > > > > > > >
> > > > > > > > rwfence64 PROC
> > > > > > > >
> > > > > > > >         mfence
> > > > > > > >
> > > > > > > >
> > > > > > > > PROC ENDP
> > > > > > > > TEXT ENDS
> > > > > > > >
> > > > > > > > What do you think?
> > > > > > > >
> > > > > > > > Thanks,
> > > > > > > > Rana
> > > > > > > >
> > > > > > > > On 4/17/07, Nathan Beyer <nd...@apache.org> wrote:
> > > > > > > > > I think I have this resolved with this commit [1]. I'm now able to run
> > > > > > > > > DRLVM in interpreted mode on my Quad P3 ... at least a 'Hello World'
> > > > > > > > > app.
> > > > > > > > >
> > > > > > > > > -Nathan
> > > > > > > > >
> > > > > > > > > [1] http://svn.apache.org/viewvc?view=rev&rev=529880
> > > > > > > > >
> > > > > > > > > On 4/18/07, Nathan Beyer <nd...@apache.org> wrote:
> > > > > > > > > > I've run into another SSE2 operation in DRLVM, this time it's in the
> > > > > > > > > > hythr library, but I can't find the specific instance. According to
> > > > > > > > > > GDB, it's happening in fast_thread_array() in libhythr.so.
> > > > > > > > > >
> > > > > > > > > > Is this from the 'apr_thread_ext.c' file? I'm going to attempt to
> > > > > > > > > > modify this to be like the changes I made to the 'atomics.cpp', so
> > > > > > > > > > that this can run on P3 (SSE) CPUs.
> > > > > > > > > >
> > > > > > > > > > nathan@pants:~/harmony/drlvm-trunk/build/lnx_ia32_gcc_debug/deploy/jdk/jre/bin$
> > > > > > > > > > ./java -Xint -cp /home/nathan/workspace/helloworld/bin HelloWorld
> > > > > > > > > > Illegal instruction
> > > > > > > > > > nathan@pants:~/harmony/drlvm-trunk/build/lnx_ia32_gcc_debug/deploy/jdk/jre/bin$
> > > > > > > > > > gdb --args ./java -Xint -cp /home/nathan/workspace/helloworld/bin
> > > > > > > > > > HelloWorld
> > > > > > > > > > GNU gdb 6.6-debian
> > > > > > > > > > Copyright (C) 2006 Free Software Foundation, Inc.
> > > > > > > > > > GDB is free software, covered by the GNU General Public License, and you are
> > > > > > > > > > welcome to change it and/or distribute copies of it under certain conditions.
> > > > > > > > > > Type "show copying" to see the conditions.
> > > > > > > > > > There is absolutely no warranty for GDB.  Type "show warranty" for details.
> > > > > > > > > > This GDB was configured as "i486-linux-gnu"...
> > > > > > > > > > Using host libthread_db library "/lib/tls/i686/cmov/libthread_db.so.1".
> > > > > > > > > > (gdb) run
> > > > > > > > > > Starting program:
> > > > > > > > > > /home/nathan/harmony/drlvm-trunk/build/lnx_ia32_gcc_debug/deploy/jdk/jre/bin/java
> > > > > > > > > > -Xint -cp /home/nathan/workspace/helloworld/bin HelloWorld
> > > > > > > > > > [Thread debugging using libthread_db enabled]
> > > > > > > > > > [New Thread -1214834000 (LWP 32414)]
> > > > > > > > > > [New Thread -1215992944 (LWP 32417)]
> > > > > > > > > > [New process 32414]
> > > > > > > > > > [New LWP 32414]
> > > > > > > > > >
> > > > > > > > > > Program received signal SIGILL, Illegal instruction.
> > > > > > > > > > 0xb7bc543f in fast_thread_array ()
> > > > > > > > > >    from /home/nathan/harmony/drlvm-trunk/build/lnx_ia32_gcc_debug/deploy/jdk/jre/bin/libhythr.so
> > > > > > > > > > (gdb) x/1i $eip
> > > > > > > > > > 0xb7bc543f <fast_thread_array+510367>:  mfence
> > > > > > > > > > (gdb)
> > > > > > > > > >
> > > > > > > > >
> > > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
>

Re: [drlvm] Another mfence operation??? fast_thread_array???

Posted by Rana Dasgupta <rd...@gmail.com>.
I spoke in haste :-)  In fact, you don't necessarily need a seperate
asm file for generating the mfence . You can use another Visual Studio
intrinsic:

void mm_mfence(void)

to generate the mfence instruction on x86-64 section of  your patch by
including emmintrin.h.

http://msdn2.microsoft.com/en-us/library/ba08y07y(VS.80).aspx



On 4/22/07, Rana Dasgupta <rd...@gmail.com> wrote:
> Nathan,
>  It supports it for 32 bit x86, but not for 64 bit as an earlier
> disclaimer of the same msdn docs state.
>
> "Inline assembly is not supported on the Itanium and x64 processors.
> The following topics explain how to use the Visual C/C++ inline
> assembler with x86 processors:..."
>
> So you will need to create a seperate file.
>
>  Thanks,
> Rana
>
>
>
>
>
> On 4/21/07, Nathan Beyer <nd...@apache.org> wrote:
> > Rana,
> > You mentioned that VS2005 doesn't support inline assembly, but based
> > on this reference [1], it seems like it's still supported?
> >
> > -Nathan
> >
> > [1] http://msdn2.microsoft.com/en-us/library/45yd4tzz(VS.80).aspx
> >
> > On 4/20/07, Rana Dasgupta <rd...@gmail.com> wrote:
> > > Thanks much Nathan.
> > >
> > > On 4/20/07, Nathan Beyer <nd...@apache.org> wrote:
> > > > Okay, it's clear as mud now! :) Thanks.
> > > >
> > > > I'll patch this in now.
> > > >
> > > > -Nathan
> > > >
> > > > On 4/20/07, Rana Dasgupta <rd...@gmail.com> wrote:
> > > > > On 4/20/07, Nathan Beyer <nd...@apache.org> wrote:
> > > > > > I agree that we need to add the lock as well on windows. BTW - the
> > > > > > locking for RW barriers was pulled from this document [1]. The lock
> > > > > > instruction seemed the best of the three options; use mfence (not an
> > > > > > option), use no-op lock or cpuid, which is apparently very slow.
> > > > >
> > > > > Yes, I agree this is the safest if you really want a real read write
> > > > > barrier that can be used anywhere, with least common denominator code
> > > > > on x86, using a single binary image. But the penalty needs to be the
> > > > > same on Windows and Linux.
> > > > > >
> > > > > > Another option would be looking at the operation just after the
> > > > > > barrier; according to this [1], any lock-prefixed instruction will
> > > > > > work on x86, so instead of a lock and wasted instruction, we could
> > > > > > lock with the next instruction. Though, this may only be something a
> > > > > > JIT would do.
> > > > >
> > > > > Not all instructions can take the lock prefix that causes an assert on
> > > > > the bus, which is what you need for the instruction to serialize, so
> > > > > this "may" not always work.
> > > > > >
> > > > > > What does the EM64T mean?
> > > > > > Isn't that just saying P4? I'm still not
> > > > > > quite clear on the differences between EM6T, IPF, x86_64/x64, etc?
> > > > >
> > > > > I agree, there are just too many acronyms that seem to be ever
> > > > > changing!  EM64T here indicates 64 bit x86, also called AMD64
> > > > > elsewhere, or x64. It does have a seperate binary for us, and for it
> > > > > we use the VS2005 compiler. (For reference, your Linux part of the
> > > > > patch seperates this out very nicely). Unfortunately, for VS2005, we
> > > > > can't do __asm{ mfence } etc. since Microsoft does not support inline
> > > > > assembly in this new compiler. So a seperate .asm file ( above ) is
> > > > > needed. The build file will have to be modified to assemble this( see
> > > > > other port/atomic/win etc. )directories, sorry I didn't add the build
> > > > > change. So the above code will use mfence only on EM64T ( as in Linux
> > > > > ) , but use the "lock add..." on all 32 bit x86 platforms. IPF  ( also
> > > > > referred to as iA64 ) is Itanium, and we only support it on Linux
> > > > > where you use "asm volatile("mf:::memory" which is correct. We don't
> > > > > need to worry about it on Windows.
> > > > >
> > > > > >How
> > > > > > do these map to the builds? There's not a separate EM64T end state,
> > > > > > right?
> > > > > >
> > > > > > If that does just mean a P4, I don't think this code snippet is
> > > > > > correct, unless we're going to start separating our builds that way.
> > > > > >
> > > > > > [1] http://gee.cs.oswego.edu/dl/jmm/cookbook.html
> > > > > >
> > > > > > -Nathan
> > > > > >
> > > > > > On 4/20/07, Rana Dasgupta <rd...@gmail.com> wrote:
> > > > > > > Nathan,
> > > > > > >    I was going through your patch and had a small suggestion. We had
> > > > > > > discussed on other related threads that on Windows we wanted to use
> > > > > > > the MSVC provided _ReadWriteBarrier intrinsic and the patch does that,
> > > > > > > but this may not be enough. This generates no instruction, but is a
> > > > > > > compiler fence only. x86 does not need a store fence, but for smp the
> > > > > > > load fence can be certainly needed.
> > > > > > >   That's the assumption on Linux where in apr_memory_rw_barrier() we
> > > > > > > insert both a compiler and an instruction fence, eg.,
> > > > > > >
> > > > > > > apr_memory_rw_barrier()
> > > > > > > {
> > > > > > > ...
> > > > > > > #else
> > > > > > > asm volatile ("lock; addl $0, 0(%%esp)":::"memory");
> > > > > > > ...
> > > > > > > }
> > > > > > >
> > > > > > > I agree that since there is no mfence on PIII, probably the dummy lock
> > > > > > > is the safest option for x86 wide code. But  "lock" is expensive,
> > > > > > > specially for Pentium 4 , Centrino etc. which have the "mfence"
> > > > > > > instruction. And this penalty on current platforms is only on Linux.
> > > > > > > My suggestion is to do the same on Windows as well. If we need the
> > > > > > > real fence, we need it on both....So on Windows...
> > > > > > >
> > > > > > > On Windows in apr_thread_ext.c:
> > > > > > > ================================
> > > > > > > #if _MSC_VER < 1400
> > > > > > >         extern void _ReadWriteBarrier();
> > > > > > > #else
> > > > > > >         #include <intrin.h>
> > > > > > > #endif
> > > > > > > #pragma intrinsic (_ReadWriteBarrier)
> > > > > > >
> > > > > > > APR_DECLARE(void) apr_memory_rw_barrier(){
> > > > > > > #ifdef _EM64T_
> > > > > > >         // create rwfence64() in asm file port\src\thread\win\rwfence.asm
> > > > > > >         rwfence64();
> > > > > > > #else
> > > > > > >         __asm {lock add [esp], 0 }
> > > > > > > #endif
> > > > > > >        _ReadWriteBarrier();
> > > > > > > }
> > > > > > >
> > > > > > > In port\src\thread\win\rwfence.asm:
> > > > > > > ===================================
> > > > > > > PUBLIC rwfence64
> > > > > > >
> > > > > > > _TEXT SEGMENT
> > > > > > >
> > > > > > > rwfence64 PROC
> > > > > > >
> > > > > > >         mfence
> > > > > > >
> > > > > > >
> > > > > > > PROC ENDP
> > > > > > > TEXT ENDS
> > > > > > >
> > > > > > > What do you think?
> > > > > > >
> > > > > > > Thanks,
> > > > > > > Rana
> > > > > > >
> > > > > > > On 4/17/07, Nathan Beyer <nd...@apache.org> wrote:
> > > > > > > > I think I have this resolved with this commit [1]. I'm now able to run
> > > > > > > > DRLVM in interpreted mode on my Quad P3 ... at least a 'Hello World'
> > > > > > > > app.
> > > > > > > >
> > > > > > > > -Nathan
> > > > > > > >
> > > > > > > > [1] http://svn.apache.org/viewvc?view=rev&rev=529880
> > > > > > > >
> > > > > > > > On 4/18/07, Nathan Beyer <nd...@apache.org> wrote:
> > > > > > > > > I've run into another SSE2 operation in DRLVM, this time it's in the
> > > > > > > > > hythr library, but I can't find the specific instance. According to
> > > > > > > > > GDB, it's happening in fast_thread_array() in libhythr.so.
> > > > > > > > >
> > > > > > > > > Is this from the 'apr_thread_ext.c' file? I'm going to attempt to
> > > > > > > > > modify this to be like the changes I made to the 'atomics.cpp', so
> > > > > > > > > that this can run on P3 (SSE) CPUs.
> > > > > > > > >
> > > > > > > > > nathan@pants:~/harmony/drlvm-trunk/build/lnx_ia32_gcc_debug/deploy/jdk/jre/bin$
> > > > > > > > > ./java -Xint -cp /home/nathan/workspace/helloworld/bin HelloWorld
> > > > > > > > > Illegal instruction
> > > > > > > > > nathan@pants:~/harmony/drlvm-trunk/build/lnx_ia32_gcc_debug/deploy/jdk/jre/bin$
> > > > > > > > > gdb --args ./java -Xint -cp /home/nathan/workspace/helloworld/bin
> > > > > > > > > HelloWorld
> > > > > > > > > GNU gdb 6.6-debian
> > > > > > > > > Copyright (C) 2006 Free Software Foundation, Inc.
> > > > > > > > > GDB is free software, covered by the GNU General Public License, and you are
> > > > > > > > > welcome to change it and/or distribute copies of it under certain conditions.
> > > > > > > > > Type "show copying" to see the conditions.
> > > > > > > > > There is absolutely no warranty for GDB.  Type "show warranty" for details.
> > > > > > > > > This GDB was configured as "i486-linux-gnu"...
> > > > > > > > > Using host libthread_db library "/lib/tls/i686/cmov/libthread_db.so.1".
> > > > > > > > > (gdb) run
> > > > > > > > > Starting program:
> > > > > > > > > /home/nathan/harmony/drlvm-trunk/build/lnx_ia32_gcc_debug/deploy/jdk/jre/bin/java
> > > > > > > > > -Xint -cp /home/nathan/workspace/helloworld/bin HelloWorld
> > > > > > > > > [Thread debugging using libthread_db enabled]
> > > > > > > > > [New Thread -1214834000 (LWP 32414)]
> > > > > > > > > [New Thread -1215992944 (LWP 32417)]
> > > > > > > > > [New process 32414]
> > > > > > > > > [New LWP 32414]
> > > > > > > > >
> > > > > > > > > Program received signal SIGILL, Illegal instruction.
> > > > > > > > > 0xb7bc543f in fast_thread_array ()
> > > > > > > > >    from /home/nathan/harmony/drlvm-trunk/build/lnx_ia32_gcc_debug/deploy/jdk/jre/bin/libhythr.so
> > > > > > > > > (gdb) x/1i $eip
> > > > > > > > > 0xb7bc543f <fast_thread_array+510367>:  mfence
> > > > > > > > > (gdb)
> > > > > > > > >
> > > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
>

Re: [drlvm] Another mfence operation??? fast_thread_array???

Posted by Rana Dasgupta <rd...@gmail.com>.
Nathan,
  It supports it for 32 bit x86, but not for 64 bit as an earlier
disclaimer of the same msdn docs state.

"Inline assembly is not supported on the Itanium and x64 processors.
The following topics explain how to use the Visual C/C++ inline
assembler with x86 processors:..."

So you will need to create a seperate file.

  Thanks,
Rana





On 4/21/07, Nathan Beyer <nd...@apache.org> wrote:
> Rana,
> You mentioned that VS2005 doesn't support inline assembly, but based
> on this reference [1], it seems like it's still supported?
>
> -Nathan
>
> [1] http://msdn2.microsoft.com/en-us/library/45yd4tzz(VS.80).aspx
>
> On 4/20/07, Rana Dasgupta <rd...@gmail.com> wrote:
> > Thanks much Nathan.
> >
> > On 4/20/07, Nathan Beyer <nd...@apache.org> wrote:
> > > Okay, it's clear as mud now! :) Thanks.
> > >
> > > I'll patch this in now.
> > >
> > > -Nathan
> > >
> > > On 4/20/07, Rana Dasgupta <rd...@gmail.com> wrote:
> > > > On 4/20/07, Nathan Beyer <nd...@apache.org> wrote:
> > > > > I agree that we need to add the lock as well on windows. BTW - the
> > > > > locking for RW barriers was pulled from this document [1]. The lock
> > > > > instruction seemed the best of the three options; use mfence (not an
> > > > > option), use no-op lock or cpuid, which is apparently very slow.
> > > >
> > > > Yes, I agree this is the safest if you really want a real read write
> > > > barrier that can be used anywhere, with least common denominator code
> > > > on x86, using a single binary image. But the penalty needs to be the
> > > > same on Windows and Linux.
> > > > >
> > > > > Another option would be looking at the operation just after the
> > > > > barrier; according to this [1], any lock-prefixed instruction will
> > > > > work on x86, so instead of a lock and wasted instruction, we could
> > > > > lock with the next instruction. Though, this may only be something a
> > > > > JIT would do.
> > > >
> > > > Not all instructions can take the lock prefix that causes an assert on
> > > > the bus, which is what you need for the instruction to serialize, so
> > > > this "may" not always work.
> > > > >
> > > > > What does the EM64T mean?
> > > > > Isn't that just saying P4? I'm still not
> > > > > quite clear on the differences between EM6T, IPF, x86_64/x64, etc?
> > > >
> > > > I agree, there are just too many acronyms that seem to be ever
> > > > changing!  EM64T here indicates 64 bit x86, also called AMD64
> > > > elsewhere, or x64. It does have a seperate binary for us, and for it
> > > > we use the VS2005 compiler. (For reference, your Linux part of the
> > > > patch seperates this out very nicely). Unfortunately, for VS2005, we
> > > > can't do __asm{ mfence } etc. since Microsoft does not support inline
> > > > assembly in this new compiler. So a seperate .asm file ( above ) is
> > > > needed. The build file will have to be modified to assemble this( see
> > > > other port/atomic/win etc. )directories, sorry I didn't add the build
> > > > change. So the above code will use mfence only on EM64T ( as in Linux
> > > > ) , but use the "lock add..." on all 32 bit x86 platforms. IPF  ( also
> > > > referred to as iA64 ) is Itanium, and we only support it on Linux
> > > > where you use "asm volatile("mf:::memory" which is correct. We don't
> > > > need to worry about it on Windows.
> > > >
> > > > >How
> > > > > do these map to the builds? There's not a separate EM64T end state,
> > > > > right?
> > > > >
> > > > > If that does just mean a P4, I don't think this code snippet is
> > > > > correct, unless we're going to start separating our builds that way.
> > > > >
> > > > > [1] http://gee.cs.oswego.edu/dl/jmm/cookbook.html
> > > > >
> > > > > -Nathan
> > > > >
> > > > > On 4/20/07, Rana Dasgupta <rd...@gmail.com> wrote:
> > > > > > Nathan,
> > > > > >    I was going through your patch and had a small suggestion. We had
> > > > > > discussed on other related threads that on Windows we wanted to use
> > > > > > the MSVC provided _ReadWriteBarrier intrinsic and the patch does that,
> > > > > > but this may not be enough. This generates no instruction, but is a
> > > > > > compiler fence only. x86 does not need a store fence, but for smp the
> > > > > > load fence can be certainly needed.
> > > > > >   That's the assumption on Linux where in apr_memory_rw_barrier() we
> > > > > > insert both a compiler and an instruction fence, eg.,
> > > > > >
> > > > > > apr_memory_rw_barrier()
> > > > > > {
> > > > > > ...
> > > > > > #else
> > > > > > asm volatile ("lock; addl $0, 0(%%esp)":::"memory");
> > > > > > ...
> > > > > > }
> > > > > >
> > > > > > I agree that since there is no mfence on PIII, probably the dummy lock
> > > > > > is the safest option for x86 wide code. But  "lock" is expensive,
> > > > > > specially for Pentium 4 , Centrino etc. which have the "mfence"
> > > > > > instruction. And this penalty on current platforms is only on Linux.
> > > > > > My suggestion is to do the same on Windows as well. If we need the
> > > > > > real fence, we need it on both....So on Windows...
> > > > > >
> > > > > > On Windows in apr_thread_ext.c:
> > > > > > ================================
> > > > > > #if _MSC_VER < 1400
> > > > > >         extern void _ReadWriteBarrier();
> > > > > > #else
> > > > > >         #include <intrin.h>
> > > > > > #endif
> > > > > > #pragma intrinsic (_ReadWriteBarrier)
> > > > > >
> > > > > > APR_DECLARE(void) apr_memory_rw_barrier(){
> > > > > > #ifdef _EM64T_
> > > > > >         // create rwfence64() in asm file port\src\thread\win\rwfence.asm
> > > > > >         rwfence64();
> > > > > > #else
> > > > > >         __asm {lock add [esp], 0 }
> > > > > > #endif
> > > > > >        _ReadWriteBarrier();
> > > > > > }
> > > > > >
> > > > > > In port\src\thread\win\rwfence.asm:
> > > > > > ===================================
> > > > > > PUBLIC rwfence64
> > > > > >
> > > > > > _TEXT SEGMENT
> > > > > >
> > > > > > rwfence64 PROC
> > > > > >
> > > > > >         mfence
> > > > > >
> > > > > >
> > > > > > PROC ENDP
> > > > > > TEXT ENDS
> > > > > >
> > > > > > What do you think?
> > > > > >
> > > > > > Thanks,
> > > > > > Rana
> > > > > >
> > > > > > On 4/17/07, Nathan Beyer <nd...@apache.org> wrote:
> > > > > > > I think I have this resolved with this commit [1]. I'm now able to run
> > > > > > > DRLVM in interpreted mode on my Quad P3 ... at least a 'Hello World'
> > > > > > > app.
> > > > > > >
> > > > > > > -Nathan
> > > > > > >
> > > > > > > [1] http://svn.apache.org/viewvc?view=rev&rev=529880
> > > > > > >
> > > > > > > On 4/18/07, Nathan Beyer <nd...@apache.org> wrote:
> > > > > > > > I've run into another SSE2 operation in DRLVM, this time it's in the
> > > > > > > > hythr library, but I can't find the specific instance. According to
> > > > > > > > GDB, it's happening in fast_thread_array() in libhythr.so.
> > > > > > > >
> > > > > > > > Is this from the 'apr_thread_ext.c' file? I'm going to attempt to
> > > > > > > > modify this to be like the changes I made to the 'atomics.cpp', so
> > > > > > > > that this can run on P3 (SSE) CPUs.
> > > > > > > >
> > > > > > > > nathan@pants:~/harmony/drlvm-trunk/build/lnx_ia32_gcc_debug/deploy/jdk/jre/bin$
> > > > > > > > ./java -Xint -cp /home/nathan/workspace/helloworld/bin HelloWorld
> > > > > > > > Illegal instruction
> > > > > > > > nathan@pants:~/harmony/drlvm-trunk/build/lnx_ia32_gcc_debug/deploy/jdk/jre/bin$
> > > > > > > > gdb --args ./java -Xint -cp /home/nathan/workspace/helloworld/bin
> > > > > > > > HelloWorld
> > > > > > > > GNU gdb 6.6-debian
> > > > > > > > Copyright (C) 2006 Free Software Foundation, Inc.
> > > > > > > > GDB is free software, covered by the GNU General Public License, and you are
> > > > > > > > welcome to change it and/or distribute copies of it under certain conditions.
> > > > > > > > Type "show copying" to see the conditions.
> > > > > > > > There is absolutely no warranty for GDB.  Type "show warranty" for details.
> > > > > > > > This GDB was configured as "i486-linux-gnu"...
> > > > > > > > Using host libthread_db library "/lib/tls/i686/cmov/libthread_db.so.1".
> > > > > > > > (gdb) run
> > > > > > > > Starting program:
> > > > > > > > /home/nathan/harmony/drlvm-trunk/build/lnx_ia32_gcc_debug/deploy/jdk/jre/bin/java
> > > > > > > > -Xint -cp /home/nathan/workspace/helloworld/bin HelloWorld
> > > > > > > > [Thread debugging using libthread_db enabled]
> > > > > > > > [New Thread -1214834000 (LWP 32414)]
> > > > > > > > [New Thread -1215992944 (LWP 32417)]
> > > > > > > > [New process 32414]
> > > > > > > > [New LWP 32414]
> > > > > > > >
> > > > > > > > Program received signal SIGILL, Illegal instruction.
> > > > > > > > 0xb7bc543f in fast_thread_array ()
> > > > > > > >    from /home/nathan/harmony/drlvm-trunk/build/lnx_ia32_gcc_debug/deploy/jdk/jre/bin/libhythr.so
> > > > > > > > (gdb) x/1i $eip
> > > > > > > > 0xb7bc543f <fast_thread_array+510367>:  mfence
> > > > > > > > (gdb)
> > > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
>

Re: [drlvm] Another mfence operation??? fast_thread_array???

Posted by Gregory Shimansky <gs...@gmail.com>.
Nathan Beyer wrote:
> Rana,
> You mentioned that VS2005 doesn't support inline assembly, but based
> on this reference [1], it seems like it's still supported?

Rana didn't write quite correctly. MSVC 2005 supports inline assembly in 
32-bit compilation mode only, just line MSVC 2003 does. For x86_64 
target there is no inline assembly.

> [1] http://msdn2.microsoft.com/en-us/library/45yd4tzz(VS.80).aspx
> 
> On 4/20/07, Rana Dasgupta <rd...@gmail.com> wrote:
>> Thanks much Nathan.
>>
>> On 4/20/07, Nathan Beyer <nd...@apache.org> wrote:
>> > Okay, it's clear as mud now! :) Thanks.
>> >
>> > I'll patch this in now.
>> >
>> > -Nathan
>> >
>> > On 4/20/07, Rana Dasgupta <rd...@gmail.com> wrote:
>> > > On 4/20/07, Nathan Beyer <nd...@apache.org> wrote:
>> > > > I agree that we need to add the lock as well on windows. BTW - the
>> > > > locking for RW barriers was pulled from this document [1]. The lock
>> > > > instruction seemed the best of the three options; use mfence 
>> (not an
>> > > > option), use no-op lock or cpuid, which is apparently very slow.
>> > >
>> > > Yes, I agree this is the safest if you really want a real read write
>> > > barrier that can be used anywhere, with least common denominator code
>> > > on x86, using a single binary image. But the penalty needs to be the
>> > > same on Windows and Linux.
>> > > >
>> > > > Another option would be looking at the operation just after the
>> > > > barrier; according to this [1], any lock-prefixed instruction will
>> > > > work on x86, so instead of a lock and wasted instruction, we could
>> > > > lock with the next instruction. Though, this may only be 
>> something a
>> > > > JIT would do.
>> > >
>> > > Not all instructions can take the lock prefix that causes an 
>> assert on
>> > > the bus, which is what you need for the instruction to serialize, so
>> > > this "may" not always work.
>> > > >
>> > > > What does the EM64T mean?
>> > > > Isn't that just saying P4? I'm still not
>> > > > quite clear on the differences between EM6T, IPF, x86_64/x64, etc?
>> > >
>> > > I agree, there are just too many acronyms that seem to be ever
>> > > changing!  EM64T here indicates 64 bit x86, also called AMD64
>> > > elsewhere, or x64. It does have a seperate binary for us, and for it
>> > > we use the VS2005 compiler. (For reference, your Linux part of the
>> > > patch seperates this out very nicely). Unfortunately, for VS2005, we
>> > > can't do __asm{ mfence } etc. since Microsoft does not support inline
>> > > assembly in this new compiler. So a seperate .asm file ( above ) is
>> > > needed. The build file will have to be modified to assemble this( see
>> > > other port/atomic/win etc. )directories, sorry I didn't add the build
>> > > change. So the above code will use mfence only on EM64T ( as in Linux
>> > > ) , but use the "lock add..." on all 32 bit x86 platforms. IPF  ( 
>> also
>> > > referred to as iA64 ) is Itanium, and we only support it on Linux
>> > > where you use "asm volatile("mf:::memory" which is correct. We don't
>> > > need to worry about it on Windows.
>> > >
>> > > >How
>> > > > do these map to the builds? There's not a separate EM64T end state,
>> > > > right?
>> > > >
>> > > > If that does just mean a P4, I don't think this code snippet is
>> > > > correct, unless we're going to start separating our builds that 
>> way.
>> > > >
>> > > > [1] http://gee.cs.oswego.edu/dl/jmm/cookbook.html
>> > > >
>> > > > -Nathan
>> > > >
>> > > > On 4/20/07, Rana Dasgupta <rd...@gmail.com> wrote:
>> > > > > Nathan,
>> > > > >    I was going through your patch and had a small suggestion. 
>> We had
>> > > > > discussed on other related threads that on Windows we wanted 
>> to use
>> > > > > the MSVC provided _ReadWriteBarrier intrinsic and the patch 
>> does that,
>> > > > > but this may not be enough. This generates no instruction, but 
>> is a
>> > > > > compiler fence only. x86 does not need a store fence, but for 
>> smp the
>> > > > > load fence can be certainly needed.
>> > > > >   That's the assumption on Linux where in 
>> apr_memory_rw_barrier() we
>> > > > > insert both a compiler and an instruction fence, eg.,
>> > > > >
>> > > > > apr_memory_rw_barrier()
>> > > > > {
>> > > > > ...
>> > > > > #else
>> > > > > asm volatile ("lock; addl $0, 0(%%esp)":::"memory");
>> > > > > ...
>> > > > > }
>> > > > >
>> > > > > I agree that since there is no mfence on PIII, probably the 
>> dummy lock
>> > > > > is the safest option for x86 wide code. But  "lock" is expensive,
>> > > > > specially for Pentium 4 , Centrino etc. which have the "mfence"
>> > > > > instruction. And this penalty on current platforms is only on 
>> Linux.
>> > > > > My suggestion is to do the same on Windows as well. If we need 
>> the
>> > > > > real fence, we need it on both....So on Windows...
>> > > > >
>> > > > > On Windows in apr_thread_ext.c:
>> > > > > ================================
>> > > > > #if _MSC_VER < 1400
>> > > > >         extern void _ReadWriteBarrier();
>> > > > > #else
>> > > > >         #include <intrin.h>
>> > > > > #endif
>> > > > > #pragma intrinsic (_ReadWriteBarrier)
>> > > > >
>> > > > > APR_DECLARE(void) apr_memory_rw_barrier(){
>> > > > > #ifdef _EM64T_
>> > > > >         // create rwfence64() in asm file 
>> port\src\thread\win\rwfence.asm
>> > > > >         rwfence64();
>> > > > > #else
>> > > > >         __asm {lock add [esp], 0 }
>> > > > > #endif
>> > > > >        _ReadWriteBarrier();
>> > > > > }
>> > > > >
>> > > > > In port\src\thread\win\rwfence.asm:
>> > > > > ===================================
>> > > > > PUBLIC rwfence64
>> > > > >
>> > > > > _TEXT SEGMENT
>> > > > >
>> > > > > rwfence64 PROC
>> > > > >
>> > > > >         mfence
>> > > > >
>> > > > >
>> > > > > PROC ENDP
>> > > > > TEXT ENDS
>> > > > >
>> > > > > What do you think?
>> > > > >
>> > > > > Thanks,
>> > > > > Rana
>> > > > >
>> > > > > On 4/17/07, Nathan Beyer <nd...@apache.org> wrote:
>> > > > > > I think I have this resolved with this commit [1]. I'm now 
>> able to run
>> > > > > > DRLVM in interpreted mode on my Quad P3 ... at least a 
>> 'Hello World'
>> > > > > > app.
>> > > > > >
>> > > > > > -Nathan
>> > > > > >
>> > > > > > [1] http://svn.apache.org/viewvc?view=rev&rev=529880
>> > > > > >
>> > > > > > On 4/18/07, Nathan Beyer <nd...@apache.org> wrote:
>> > > > > > > I've run into another SSE2 operation in DRLVM, this time 
>> it's in the
>> > > > > > > hythr library, but I can't find the specific instance. 
>> According to
>> > > > > > > GDB, it's happening in fast_thread_array() in libhythr.so.
>> > > > > > >
>> > > > > > > Is this from the 'apr_thread_ext.c' file? I'm going to 
>> attempt to
>> > > > > > > modify this to be like the changes I made to the 
>> 'atomics.cpp', so
>> > > > > > > that this can run on P3 (SSE) CPUs.
>> > > > > > >
>> > > > > > > 
>> nathan@pants:~/harmony/drlvm-trunk/build/lnx_ia32_gcc_debug/deploy/jdk/jre/bin$ 
>>
>> > > > > > > ./java -Xint -cp /home/nathan/workspace/helloworld/bin 
>> HelloWorld
>> > > > > > > Illegal instruction
>> > > > > > > 
>> nathan@pants:~/harmony/drlvm-trunk/build/lnx_ia32_gcc_debug/deploy/jdk/jre/bin$ 
>>
>> > > > > > > gdb --args ./java -Xint -cp 
>> /home/nathan/workspace/helloworld/bin
>> > > > > > > HelloWorld
>> > > > > > > GNU gdb 6.6-debian
>> > > > > > > Copyright (C) 2006 Free Software Foundation, Inc.
>> > > > > > > GDB is free software, covered by the GNU General Public 
>> License, and you are
>> > > > > > > welcome to change it and/or distribute copies of it under 
>> certain conditions.
>> > > > > > > Type "show copying" to see the conditions.
>> > > > > > > There is absolutely no warranty for GDB.  Type "show 
>> warranty" for details.
>> > > > > > > This GDB was configured as "i486-linux-gnu"...
>> > > > > > > Using host libthread_db library 
>> "/lib/tls/i686/cmov/libthread_db.so.1".
>> > > > > > > (gdb) run
>> > > > > > > Starting program:
>> > > > > > > 
>> /home/nathan/harmony/drlvm-trunk/build/lnx_ia32_gcc_debug/deploy/jdk/jre/bin/java 
>>
>> > > > > > > -Xint -cp /home/nathan/workspace/helloworld/bin HelloWorld
>> > > > > > > [Thread debugging using libthread_db enabled]
>> > > > > > > [New Thread -1214834000 (LWP 32414)]
>> > > > > > > [New Thread -1215992944 (LWP 32417)]
>> > > > > > > [New process 32414]
>> > > > > > > [New LWP 32414]
>> > > > > > >
>> > > > > > > Program received signal SIGILL, Illegal instruction.
>> > > > > > > 0xb7bc543f in fast_thread_array ()
>> > > > > > >    from 
>> /home/nathan/harmony/drlvm-trunk/build/lnx_ia32_gcc_debug/deploy/jdk/jre/bin/libhythr.so 
>>
>> > > > > > > (gdb) x/1i $eip
>> > > > > > > 0xb7bc543f <fast_thread_array+510367>:  mfence
>> > > > > > > (gdb)
>> > > > > > >
>> > > > > >
>> > > > >
>> > > >
>> > >
>> >
>>
> 


-- 
Gregory


Re: [drlvm] Another mfence operation??? fast_thread_array???

Posted by Nathan Beyer <nd...@apache.org>.
Rana,
You mentioned that VS2005 doesn't support inline assembly, but based
on this reference [1], it seems like it's still supported?

-Nathan

[1] http://msdn2.microsoft.com/en-us/library/45yd4tzz(VS.80).aspx

On 4/20/07, Rana Dasgupta <rd...@gmail.com> wrote:
> Thanks much Nathan.
>
> On 4/20/07, Nathan Beyer <nd...@apache.org> wrote:
> > Okay, it's clear as mud now! :) Thanks.
> >
> > I'll patch this in now.
> >
> > -Nathan
> >
> > On 4/20/07, Rana Dasgupta <rd...@gmail.com> wrote:
> > > On 4/20/07, Nathan Beyer <nd...@apache.org> wrote:
> > > > I agree that we need to add the lock as well on windows. BTW - the
> > > > locking for RW barriers was pulled from this document [1]. The lock
> > > > instruction seemed the best of the three options; use mfence (not an
> > > > option), use no-op lock or cpuid, which is apparently very slow.
> > >
> > > Yes, I agree this is the safest if you really want a real read write
> > > barrier that can be used anywhere, with least common denominator code
> > > on x86, using a single binary image. But the penalty needs to be the
> > > same on Windows and Linux.
> > > >
> > > > Another option would be looking at the operation just after the
> > > > barrier; according to this [1], any lock-prefixed instruction will
> > > > work on x86, so instead of a lock and wasted instruction, we could
> > > > lock with the next instruction. Though, this may only be something a
> > > > JIT would do.
> > >
> > > Not all instructions can take the lock prefix that causes an assert on
> > > the bus, which is what you need for the instruction to serialize, so
> > > this "may" not always work.
> > > >
> > > > What does the EM64T mean?
> > > > Isn't that just saying P4? I'm still not
> > > > quite clear on the differences between EM6T, IPF, x86_64/x64, etc?
> > >
> > > I agree, there are just too many acronyms that seem to be ever
> > > changing!  EM64T here indicates 64 bit x86, also called AMD64
> > > elsewhere, or x64. It does have a seperate binary for us, and for it
> > > we use the VS2005 compiler. (For reference, your Linux part of the
> > > patch seperates this out very nicely). Unfortunately, for VS2005, we
> > > can't do __asm{ mfence } etc. since Microsoft does not support inline
> > > assembly in this new compiler. So a seperate .asm file ( above ) is
> > > needed. The build file will have to be modified to assemble this( see
> > > other port/atomic/win etc. )directories, sorry I didn't add the build
> > > change. So the above code will use mfence only on EM64T ( as in Linux
> > > ) , but use the "lock add..." on all 32 bit x86 platforms. IPF  ( also
> > > referred to as iA64 ) is Itanium, and we only support it on Linux
> > > where you use "asm volatile("mf:::memory" which is correct. We don't
> > > need to worry about it on Windows.
> > >
> > > >How
> > > > do these map to the builds? There's not a separate EM64T end state,
> > > > right?
> > > >
> > > > If that does just mean a P4, I don't think this code snippet is
> > > > correct, unless we're going to start separating our builds that way.
> > > >
> > > > [1] http://gee.cs.oswego.edu/dl/jmm/cookbook.html
> > > >
> > > > -Nathan
> > > >
> > > > On 4/20/07, Rana Dasgupta <rd...@gmail.com> wrote:
> > > > > Nathan,
> > > > >    I was going through your patch and had a small suggestion. We had
> > > > > discussed on other related threads that on Windows we wanted to use
> > > > > the MSVC provided _ReadWriteBarrier intrinsic and the patch does that,
> > > > > but this may not be enough. This generates no instruction, but is a
> > > > > compiler fence only. x86 does not need a store fence, but for smp the
> > > > > load fence can be certainly needed.
> > > > >   That's the assumption on Linux where in apr_memory_rw_barrier() we
> > > > > insert both a compiler and an instruction fence, eg.,
> > > > >
> > > > > apr_memory_rw_barrier()
> > > > > {
> > > > > ...
> > > > > #else
> > > > > asm volatile ("lock; addl $0, 0(%%esp)":::"memory");
> > > > > ...
> > > > > }
> > > > >
> > > > > I agree that since there is no mfence on PIII, probably the dummy lock
> > > > > is the safest option for x86 wide code. But  "lock" is expensive,
> > > > > specially for Pentium 4 , Centrino etc. which have the "mfence"
> > > > > instruction. And this penalty on current platforms is only on Linux.
> > > > > My suggestion is to do the same on Windows as well. If we need the
> > > > > real fence, we need it on both....So on Windows...
> > > > >
> > > > > On Windows in apr_thread_ext.c:
> > > > > ================================
> > > > > #if _MSC_VER < 1400
> > > > >         extern void _ReadWriteBarrier();
> > > > > #else
> > > > >         #include <intrin.h>
> > > > > #endif
> > > > > #pragma intrinsic (_ReadWriteBarrier)
> > > > >
> > > > > APR_DECLARE(void) apr_memory_rw_barrier(){
> > > > > #ifdef _EM64T_
> > > > >         // create rwfence64() in asm file port\src\thread\win\rwfence.asm
> > > > >         rwfence64();
> > > > > #else
> > > > >         __asm {lock add [esp], 0 }
> > > > > #endif
> > > > >        _ReadWriteBarrier();
> > > > > }
> > > > >
> > > > > In port\src\thread\win\rwfence.asm:
> > > > > ===================================
> > > > > PUBLIC rwfence64
> > > > >
> > > > > _TEXT SEGMENT
> > > > >
> > > > > rwfence64 PROC
> > > > >
> > > > >         mfence
> > > > >
> > > > >
> > > > > PROC ENDP
> > > > > TEXT ENDS
> > > > >
> > > > > What do you think?
> > > > >
> > > > > Thanks,
> > > > > Rana
> > > > >
> > > > > On 4/17/07, Nathan Beyer <nd...@apache.org> wrote:
> > > > > > I think I have this resolved with this commit [1]. I'm now able to run
> > > > > > DRLVM in interpreted mode on my Quad P3 ... at least a 'Hello World'
> > > > > > app.
> > > > > >
> > > > > > -Nathan
> > > > > >
> > > > > > [1] http://svn.apache.org/viewvc?view=rev&rev=529880
> > > > > >
> > > > > > On 4/18/07, Nathan Beyer <nd...@apache.org> wrote:
> > > > > > > I've run into another SSE2 operation in DRLVM, this time it's in the
> > > > > > > hythr library, but I can't find the specific instance. According to
> > > > > > > GDB, it's happening in fast_thread_array() in libhythr.so.
> > > > > > >
> > > > > > > Is this from the 'apr_thread_ext.c' file? I'm going to attempt to
> > > > > > > modify this to be like the changes I made to the 'atomics.cpp', so
> > > > > > > that this can run on P3 (SSE) CPUs.
> > > > > > >
> > > > > > > nathan@pants:~/harmony/drlvm-trunk/build/lnx_ia32_gcc_debug/deploy/jdk/jre/bin$
> > > > > > > ./java -Xint -cp /home/nathan/workspace/helloworld/bin HelloWorld
> > > > > > > Illegal instruction
> > > > > > > nathan@pants:~/harmony/drlvm-trunk/build/lnx_ia32_gcc_debug/deploy/jdk/jre/bin$
> > > > > > > gdb --args ./java -Xint -cp /home/nathan/workspace/helloworld/bin
> > > > > > > HelloWorld
> > > > > > > GNU gdb 6.6-debian
> > > > > > > Copyright (C) 2006 Free Software Foundation, Inc.
> > > > > > > GDB is free software, covered by the GNU General Public License, and you are
> > > > > > > welcome to change it and/or distribute copies of it under certain conditions.
> > > > > > > Type "show copying" to see the conditions.
> > > > > > > There is absolutely no warranty for GDB.  Type "show warranty" for details.
> > > > > > > This GDB was configured as "i486-linux-gnu"...
> > > > > > > Using host libthread_db library "/lib/tls/i686/cmov/libthread_db.so.1".
> > > > > > > (gdb) run
> > > > > > > Starting program:
> > > > > > > /home/nathan/harmony/drlvm-trunk/build/lnx_ia32_gcc_debug/deploy/jdk/jre/bin/java
> > > > > > > -Xint -cp /home/nathan/workspace/helloworld/bin HelloWorld
> > > > > > > [Thread debugging using libthread_db enabled]
> > > > > > > [New Thread -1214834000 (LWP 32414)]
> > > > > > > [New Thread -1215992944 (LWP 32417)]
> > > > > > > [New process 32414]
> > > > > > > [New LWP 32414]
> > > > > > >
> > > > > > > Program received signal SIGILL, Illegal instruction.
> > > > > > > 0xb7bc543f in fast_thread_array ()
> > > > > > >    from /home/nathan/harmony/drlvm-trunk/build/lnx_ia32_gcc_debug/deploy/jdk/jre/bin/libhythr.so
> > > > > > > (gdb) x/1i $eip
> > > > > > > 0xb7bc543f <fast_thread_array+510367>:  mfence
> > > > > > > (gdb)
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
>

Re: [drlvm] Another mfence operation??? fast_thread_array???

Posted by Rana Dasgupta <rd...@gmail.com>.
Thanks much Nathan.

On 4/20/07, Nathan Beyer <nd...@apache.org> wrote:
> Okay, it's clear as mud now! :) Thanks.
>
> I'll patch this in now.
>
> -Nathan
>
> On 4/20/07, Rana Dasgupta <rd...@gmail.com> wrote:
> > On 4/20/07, Nathan Beyer <nd...@apache.org> wrote:
> > > I agree that we need to add the lock as well on windows. BTW - the
> > > locking for RW barriers was pulled from this document [1]. The lock
> > > instruction seemed the best of the three options; use mfence (not an
> > > option), use no-op lock or cpuid, which is apparently very slow.
> >
> > Yes, I agree this is the safest if you really want a real read write
> > barrier that can be used anywhere, with least common denominator code
> > on x86, using a single binary image. But the penalty needs to be the
> > same on Windows and Linux.
> > >
> > > Another option would be looking at the operation just after the
> > > barrier; according to this [1], any lock-prefixed instruction will
> > > work on x86, so instead of a lock and wasted instruction, we could
> > > lock with the next instruction. Though, this may only be something a
> > > JIT would do.
> >
> > Not all instructions can take the lock prefix that causes an assert on
> > the bus, which is what you need for the instruction to serialize, so
> > this "may" not always work.
> > >
> > > What does the EM64T mean?
> > > Isn't that just saying P4? I'm still not
> > > quite clear on the differences between EM6T, IPF, x86_64/x64, etc?
> >
> > I agree, there are just too many acronyms that seem to be ever
> > changing!  EM64T here indicates 64 bit x86, also called AMD64
> > elsewhere, or x64. It does have a seperate binary for us, and for it
> > we use the VS2005 compiler. (For reference, your Linux part of the
> > patch seperates this out very nicely). Unfortunately, for VS2005, we
> > can't do __asm{ mfence } etc. since Microsoft does not support inline
> > assembly in this new compiler. So a seperate .asm file ( above ) is
> > needed. The build file will have to be modified to assemble this( see
> > other port/atomic/win etc. )directories, sorry I didn't add the build
> > change. So the above code will use mfence only on EM64T ( as in Linux
> > ) , but use the "lock add..." on all 32 bit x86 platforms. IPF  ( also
> > referred to as iA64 ) is Itanium, and we only support it on Linux
> > where you use "asm volatile("mf:::memory" which is correct. We don't
> > need to worry about it on Windows.
> >
> > >How
> > > do these map to the builds? There's not a separate EM64T end state,
> > > right?
> > >
> > > If that does just mean a P4, I don't think this code snippet is
> > > correct, unless we're going to start separating our builds that way.
> > >
> > > [1] http://gee.cs.oswego.edu/dl/jmm/cookbook.html
> > >
> > > -Nathan
> > >
> > > On 4/20/07, Rana Dasgupta <rd...@gmail.com> wrote:
> > > > Nathan,
> > > >    I was going through your patch and had a small suggestion. We had
> > > > discussed on other related threads that on Windows we wanted to use
> > > > the MSVC provided _ReadWriteBarrier intrinsic and the patch does that,
> > > > but this may not be enough. This generates no instruction, but is a
> > > > compiler fence only. x86 does not need a store fence, but for smp the
> > > > load fence can be certainly needed.
> > > >   That's the assumption on Linux where in apr_memory_rw_barrier() we
> > > > insert both a compiler and an instruction fence, eg.,
> > > >
> > > > apr_memory_rw_barrier()
> > > > {
> > > > ...
> > > > #else
> > > > asm volatile ("lock; addl $0, 0(%%esp)":::"memory");
> > > > ...
> > > > }
> > > >
> > > > I agree that since there is no mfence on PIII, probably the dummy lock
> > > > is the safest option for x86 wide code. But  "lock" is expensive,
> > > > specially for Pentium 4 , Centrino etc. which have the "mfence"
> > > > instruction. And this penalty on current platforms is only on Linux.
> > > > My suggestion is to do the same on Windows as well. If we need the
> > > > real fence, we need it on both....So on Windows...
> > > >
> > > > On Windows in apr_thread_ext.c:
> > > > ================================
> > > > #if _MSC_VER < 1400
> > > >         extern void _ReadWriteBarrier();
> > > > #else
> > > >         #include <intrin.h>
> > > > #endif
> > > > #pragma intrinsic (_ReadWriteBarrier)
> > > >
> > > > APR_DECLARE(void) apr_memory_rw_barrier(){
> > > > #ifdef _EM64T_
> > > >         // create rwfence64() in asm file port\src\thread\win\rwfence.asm
> > > >         rwfence64();
> > > > #else
> > > >         __asm {lock add [esp], 0 }
> > > > #endif
> > > >        _ReadWriteBarrier();
> > > > }
> > > >
> > > > In port\src\thread\win\rwfence.asm:
> > > > ===================================
> > > > PUBLIC rwfence64
> > > >
> > > > _TEXT SEGMENT
> > > >
> > > > rwfence64 PROC
> > > >
> > > >         mfence
> > > >
> > > >
> > > > PROC ENDP
> > > > TEXT ENDS
> > > >
> > > > What do you think?
> > > >
> > > > Thanks,
> > > > Rana
> > > >
> > > > On 4/17/07, Nathan Beyer <nd...@apache.org> wrote:
> > > > > I think I have this resolved with this commit [1]. I'm now able to run
> > > > > DRLVM in interpreted mode on my Quad P3 ... at least a 'Hello World'
> > > > > app.
> > > > >
> > > > > -Nathan
> > > > >
> > > > > [1] http://svn.apache.org/viewvc?view=rev&rev=529880
> > > > >
> > > > > On 4/18/07, Nathan Beyer <nd...@apache.org> wrote:
> > > > > > I've run into another SSE2 operation in DRLVM, this time it's in the
> > > > > > hythr library, but I can't find the specific instance. According to
> > > > > > GDB, it's happening in fast_thread_array() in libhythr.so.
> > > > > >
> > > > > > Is this from the 'apr_thread_ext.c' file? I'm going to attempt to
> > > > > > modify this to be like the changes I made to the 'atomics.cpp', so
> > > > > > that this can run on P3 (SSE) CPUs.
> > > > > >
> > > > > > nathan@pants:~/harmony/drlvm-trunk/build/lnx_ia32_gcc_debug/deploy/jdk/jre/bin$
> > > > > > ./java -Xint -cp /home/nathan/workspace/helloworld/bin HelloWorld
> > > > > > Illegal instruction
> > > > > > nathan@pants:~/harmony/drlvm-trunk/build/lnx_ia32_gcc_debug/deploy/jdk/jre/bin$
> > > > > > gdb --args ./java -Xint -cp /home/nathan/workspace/helloworld/bin
> > > > > > HelloWorld
> > > > > > GNU gdb 6.6-debian
> > > > > > Copyright (C) 2006 Free Software Foundation, Inc.
> > > > > > GDB is free software, covered by the GNU General Public License, and you are
> > > > > > welcome to change it and/or distribute copies of it under certain conditions.
> > > > > > Type "show copying" to see the conditions.
> > > > > > There is absolutely no warranty for GDB.  Type "show warranty" for details.
> > > > > > This GDB was configured as "i486-linux-gnu"...
> > > > > > Using host libthread_db library "/lib/tls/i686/cmov/libthread_db.so.1".
> > > > > > (gdb) run
> > > > > > Starting program:
> > > > > > /home/nathan/harmony/drlvm-trunk/build/lnx_ia32_gcc_debug/deploy/jdk/jre/bin/java
> > > > > > -Xint -cp /home/nathan/workspace/helloworld/bin HelloWorld
> > > > > > [Thread debugging using libthread_db enabled]
> > > > > > [New Thread -1214834000 (LWP 32414)]
> > > > > > [New Thread -1215992944 (LWP 32417)]
> > > > > > [New process 32414]
> > > > > > [New LWP 32414]
> > > > > >
> > > > > > Program received signal SIGILL, Illegal instruction.
> > > > > > 0xb7bc543f in fast_thread_array ()
> > > > > >    from /home/nathan/harmony/drlvm-trunk/build/lnx_ia32_gcc_debug/deploy/jdk/jre/bin/libhythr.so
> > > > > > (gdb) x/1i $eip
> > > > > > 0xb7bc543f <fast_thread_array+510367>:  mfence
> > > > > > (gdb)
> > > > > >
> > > > >
> > > >
> > >
> >
>

Re: [drlvm] Another mfence operation??? fast_thread_array???

Posted by Nathan Beyer <nd...@apache.org>.
Okay, it's clear as mud now! :) Thanks.

I'll patch this in now.

-Nathan

On 4/20/07, Rana Dasgupta <rd...@gmail.com> wrote:
> On 4/20/07, Nathan Beyer <nd...@apache.org> wrote:
> > I agree that we need to add the lock as well on windows. BTW - the
> > locking for RW barriers was pulled from this document [1]. The lock
> > instruction seemed the best of the three options; use mfence (not an
> > option), use no-op lock or cpuid, which is apparently very slow.
>
> Yes, I agree this is the safest if you really want a real read write
> barrier that can be used anywhere, with least common denominator code
> on x86, using a single binary image. But the penalty needs to be the
> same on Windows and Linux.
> >
> > Another option would be looking at the operation just after the
> > barrier; according to this [1], any lock-prefixed instruction will
> > work on x86, so instead of a lock and wasted instruction, we could
> > lock with the next instruction. Though, this may only be something a
> > JIT would do.
>
> Not all instructions can take the lock prefix that causes an assert on
> the bus, which is what you need for the instruction to serialize, so
> this "may" not always work.
> >
> > What does the EM64T mean?
> > Isn't that just saying P4? I'm still not
> > quite clear on the differences between EM6T, IPF, x86_64/x64, etc?
>
> I agree, there are just too many acronyms that seem to be ever
> changing!  EM64T here indicates 64 bit x86, also called AMD64
> elsewhere, or x64. It does have a seperate binary for us, and for it
> we use the VS2005 compiler. (For reference, your Linux part of the
> patch seperates this out very nicely). Unfortunately, for VS2005, we
> can't do __asm{ mfence } etc. since Microsoft does not support inline
> assembly in this new compiler. So a seperate .asm file ( above ) is
> needed. The build file will have to be modified to assemble this( see
> other port/atomic/win etc. )directories, sorry I didn't add the build
> change. So the above code will use mfence only on EM64T ( as in Linux
> ) , but use the "lock add..." on all 32 bit x86 platforms. IPF  ( also
> referred to as iA64 ) is Itanium, and we only support it on Linux
> where you use "asm volatile("mf:::memory" which is correct. We don't
> need to worry about it on Windows.
>
> >How
> > do these map to the builds? There's not a separate EM64T end state,
> > right?
> >
> > If that does just mean a P4, I don't think this code snippet is
> > correct, unless we're going to start separating our builds that way.
> >
> > [1] http://gee.cs.oswego.edu/dl/jmm/cookbook.html
> >
> > -Nathan
> >
> > On 4/20/07, Rana Dasgupta <rd...@gmail.com> wrote:
> > > Nathan,
> > >    I was going through your patch and had a small suggestion. We had
> > > discussed on other related threads that on Windows we wanted to use
> > > the MSVC provided _ReadWriteBarrier intrinsic and the patch does that,
> > > but this may not be enough. This generates no instruction, but is a
> > > compiler fence only. x86 does not need a store fence, but for smp the
> > > load fence can be certainly needed.
> > >   That's the assumption on Linux where in apr_memory_rw_barrier() we
> > > insert both a compiler and an instruction fence, eg.,
> > >
> > > apr_memory_rw_barrier()
> > > {
> > > ...
> > > #else
> > > asm volatile ("lock; addl $0, 0(%%esp)":::"memory");
> > > ...
> > > }
> > >
> > > I agree that since there is no mfence on PIII, probably the dummy lock
> > > is the safest option for x86 wide code. But  "lock" is expensive,
> > > specially for Pentium 4 , Centrino etc. which have the "mfence"
> > > instruction. And this penalty on current platforms is only on Linux.
> > > My suggestion is to do the same on Windows as well. If we need the
> > > real fence, we need it on both....So on Windows...
> > >
> > > On Windows in apr_thread_ext.c:
> > > ================================
> > > #if _MSC_VER < 1400
> > >         extern void _ReadWriteBarrier();
> > > #else
> > >         #include <intrin.h>
> > > #endif
> > > #pragma intrinsic (_ReadWriteBarrier)
> > >
> > > APR_DECLARE(void) apr_memory_rw_barrier(){
> > > #ifdef _EM64T_
> > >         // create rwfence64() in asm file port\src\thread\win\rwfence.asm
> > >         rwfence64();
> > > #else
> > >         __asm {lock add [esp], 0 }
> > > #endif
> > >        _ReadWriteBarrier();
> > > }
> > >
> > > In port\src\thread\win\rwfence.asm:
> > > ===================================
> > > PUBLIC rwfence64
> > >
> > > _TEXT SEGMENT
> > >
> > > rwfence64 PROC
> > >
> > >         mfence
> > >
> > >
> > > PROC ENDP
> > > TEXT ENDS
> > >
> > > What do you think?
> > >
> > > Thanks,
> > > Rana
> > >
> > > On 4/17/07, Nathan Beyer <nd...@apache.org> wrote:
> > > > I think I have this resolved with this commit [1]. I'm now able to run
> > > > DRLVM in interpreted mode on my Quad P3 ... at least a 'Hello World'
> > > > app.
> > > >
> > > > -Nathan
> > > >
> > > > [1] http://svn.apache.org/viewvc?view=rev&rev=529880
> > > >
> > > > On 4/18/07, Nathan Beyer <nd...@apache.org> wrote:
> > > > > I've run into another SSE2 operation in DRLVM, this time it's in the
> > > > > hythr library, but I can't find the specific instance. According to
> > > > > GDB, it's happening in fast_thread_array() in libhythr.so.
> > > > >
> > > > > Is this from the 'apr_thread_ext.c' file? I'm going to attempt to
> > > > > modify this to be like the changes I made to the 'atomics.cpp', so
> > > > > that this can run on P3 (SSE) CPUs.
> > > > >
> > > > > nathan@pants:~/harmony/drlvm-trunk/build/lnx_ia32_gcc_debug/deploy/jdk/jre/bin$
> > > > > ./java -Xint -cp /home/nathan/workspace/helloworld/bin HelloWorld
> > > > > Illegal instruction
> > > > > nathan@pants:~/harmony/drlvm-trunk/build/lnx_ia32_gcc_debug/deploy/jdk/jre/bin$
> > > > > gdb --args ./java -Xint -cp /home/nathan/workspace/helloworld/bin
> > > > > HelloWorld
> > > > > GNU gdb 6.6-debian
> > > > > Copyright (C) 2006 Free Software Foundation, Inc.
> > > > > GDB is free software, covered by the GNU General Public License, and you are
> > > > > welcome to change it and/or distribute copies of it under certain conditions.
> > > > > Type "show copying" to see the conditions.
> > > > > There is absolutely no warranty for GDB.  Type "show warranty" for details.
> > > > > This GDB was configured as "i486-linux-gnu"...
> > > > > Using host libthread_db library "/lib/tls/i686/cmov/libthread_db.so.1".
> > > > > (gdb) run
> > > > > Starting program:
> > > > > /home/nathan/harmony/drlvm-trunk/build/lnx_ia32_gcc_debug/deploy/jdk/jre/bin/java
> > > > > -Xint -cp /home/nathan/workspace/helloworld/bin HelloWorld
> > > > > [Thread debugging using libthread_db enabled]
> > > > > [New Thread -1214834000 (LWP 32414)]
> > > > > [New Thread -1215992944 (LWP 32417)]
> > > > > [New process 32414]
> > > > > [New LWP 32414]
> > > > >
> > > > > Program received signal SIGILL, Illegal instruction.
> > > > > 0xb7bc543f in fast_thread_array ()
> > > > >    from /home/nathan/harmony/drlvm-trunk/build/lnx_ia32_gcc_debug/deploy/jdk/jre/bin/libhythr.so
> > > > > (gdb) x/1i $eip
> > > > > 0xb7bc543f <fast_thread_array+510367>:  mfence
> > > > > (gdb)
> > > > >
> > > >
> > >
> >
>

Re: [drlvm] Another mfence operation??? fast_thread_array???

Posted by Rana Dasgupta <rd...@gmail.com>.
On 4/20/07, Nathan Beyer <nd...@apache.org> wrote:
> I agree that we need to add the lock as well on windows. BTW - the
> locking for RW barriers was pulled from this document [1]. The lock
> instruction seemed the best of the three options; use mfence (not an
> option), use no-op lock or cpuid, which is apparently very slow.

Yes, I agree this is the safest if you really want a real read write
barrier that can be used anywhere, with least common denominator code
on x86, using a single binary image. But the penalty needs to be the
same on Windows and Linux.
>
> Another option would be looking at the operation just after the
> barrier; according to this [1], any lock-prefixed instruction will
> work on x86, so instead of a lock and wasted instruction, we could
> lock with the next instruction. Though, this may only be something a
> JIT would do.

Not all instructions can take the lock prefix that causes an assert on
the bus, which is what you need for the instruction to serialize, so
this "may" not always work.
>
> What does the EM64T mean?
> Isn't that just saying P4? I'm still not
> quite clear on the differences between EM6T, IPF, x86_64/x64, etc?

I agree, there are just too many acronyms that seem to be ever
changing!  EM64T here indicates 64 bit x86, also called AMD64
elsewhere, or x64. It does have a seperate binary for us, and for it
we use the VS2005 compiler. (For reference, your Linux part of the
patch seperates this out very nicely). Unfortunately, for VS2005, we
can't do __asm{ mfence } etc. since Microsoft does not support inline
assembly in this new compiler. So a seperate .asm file ( above ) is
needed. The build file will have to be modified to assemble this( see
other port/atomic/win etc. )directories, sorry I didn't add the build
change. So the above code will use mfence only on EM64T ( as in Linux
) , but use the "lock add..." on all 32 bit x86 platforms. IPF  ( also
referred to as iA64 ) is Itanium, and we only support it on Linux
where you use "asm volatile("mf:::memory" which is correct. We don't
need to worry about it on Windows.

>How
> do these map to the builds? There's not a separate EM64T end state,
> right?
>
> If that does just mean a P4, I don't think this code snippet is
> correct, unless we're going to start separating our builds that way.
>
> [1] http://gee.cs.oswego.edu/dl/jmm/cookbook.html
>
> -Nathan
>
> On 4/20/07, Rana Dasgupta <rd...@gmail.com> wrote:
> > Nathan,
> >    I was going through your patch and had a small suggestion. We had
> > discussed on other related threads that on Windows we wanted to use
> > the MSVC provided _ReadWriteBarrier intrinsic and the patch does that,
> > but this may not be enough. This generates no instruction, but is a
> > compiler fence only. x86 does not need a store fence, but for smp the
> > load fence can be certainly needed.
> >   That's the assumption on Linux where in apr_memory_rw_barrier() we
> > insert both a compiler and an instruction fence, eg.,
> >
> > apr_memory_rw_barrier()
> > {
> > ...
> > #else
> > asm volatile ("lock; addl $0, 0(%%esp)":::"memory");
> > ...
> > }
> >
> > I agree that since there is no mfence on PIII, probably the dummy lock
> > is the safest option for x86 wide code. But  "lock" is expensive,
> > specially for Pentium 4 , Centrino etc. which have the "mfence"
> > instruction. And this penalty on current platforms is only on Linux.
> > My suggestion is to do the same on Windows as well. If we need the
> > real fence, we need it on both....So on Windows...
> >
> > On Windows in apr_thread_ext.c:
> > ================================
> > #if _MSC_VER < 1400
> >         extern void _ReadWriteBarrier();
> > #else
> >         #include <intrin.h>
> > #endif
> > #pragma intrinsic (_ReadWriteBarrier)
> >
> > APR_DECLARE(void) apr_memory_rw_barrier(){
> > #ifdef _EM64T_
> >         // create rwfence64() in asm file port\src\thread\win\rwfence.asm
> >         rwfence64();
> > #else
> >         __asm {lock add [esp], 0 }
> > #endif
> >        _ReadWriteBarrier();
> > }
> >
> > In port\src\thread\win\rwfence.asm:
> > ===================================
> > PUBLIC rwfence64
> >
> > _TEXT SEGMENT
> >
> > rwfence64 PROC
> >
> >         mfence
> >
> >
> > PROC ENDP
> > TEXT ENDS
> >
> > What do you think?
> >
> > Thanks,
> > Rana
> >
> > On 4/17/07, Nathan Beyer <nd...@apache.org> wrote:
> > > I think I have this resolved with this commit [1]. I'm now able to run
> > > DRLVM in interpreted mode on my Quad P3 ... at least a 'Hello World'
> > > app.
> > >
> > > -Nathan
> > >
> > > [1] http://svn.apache.org/viewvc?view=rev&rev=529880
> > >
> > > On 4/18/07, Nathan Beyer <nd...@apache.org> wrote:
> > > > I've run into another SSE2 operation in DRLVM, this time it's in the
> > > > hythr library, but I can't find the specific instance. According to
> > > > GDB, it's happening in fast_thread_array() in libhythr.so.
> > > >
> > > > Is this from the 'apr_thread_ext.c' file? I'm going to attempt to
> > > > modify this to be like the changes I made to the 'atomics.cpp', so
> > > > that this can run on P3 (SSE) CPUs.
> > > >
> > > > nathan@pants:~/harmony/drlvm-trunk/build/lnx_ia32_gcc_debug/deploy/jdk/jre/bin$
> > > > ./java -Xint -cp /home/nathan/workspace/helloworld/bin HelloWorld
> > > > Illegal instruction
> > > > nathan@pants:~/harmony/drlvm-trunk/build/lnx_ia32_gcc_debug/deploy/jdk/jre/bin$
> > > > gdb --args ./java -Xint -cp /home/nathan/workspace/helloworld/bin
> > > > HelloWorld
> > > > GNU gdb 6.6-debian
> > > > Copyright (C) 2006 Free Software Foundation, Inc.
> > > > GDB is free software, covered by the GNU General Public License, and you are
> > > > welcome to change it and/or distribute copies of it under certain conditions.
> > > > Type "show copying" to see the conditions.
> > > > There is absolutely no warranty for GDB.  Type "show warranty" for details.
> > > > This GDB was configured as "i486-linux-gnu"...
> > > > Using host libthread_db library "/lib/tls/i686/cmov/libthread_db.so.1".
> > > > (gdb) run
> > > > Starting program:
> > > > /home/nathan/harmony/drlvm-trunk/build/lnx_ia32_gcc_debug/deploy/jdk/jre/bin/java
> > > > -Xint -cp /home/nathan/workspace/helloworld/bin HelloWorld
> > > > [Thread debugging using libthread_db enabled]
> > > > [New Thread -1214834000 (LWP 32414)]
> > > > [New Thread -1215992944 (LWP 32417)]
> > > > [New process 32414]
> > > > [New LWP 32414]
> > > >
> > > > Program received signal SIGILL, Illegal instruction.
> > > > 0xb7bc543f in fast_thread_array ()
> > > >    from /home/nathan/harmony/drlvm-trunk/build/lnx_ia32_gcc_debug/deploy/jdk/jre/bin/libhythr.so
> > > > (gdb) x/1i $eip
> > > > 0xb7bc543f <fast_thread_array+510367>:  mfence
> > > > (gdb)
> > > >
> > >
> >
>

Re: [drlvm] Another mfence operation??? fast_thread_array???

Posted by Nathan Beyer <nd...@apache.org>.
I agree that we need to add the lock as well on windows. BTW - the
locking for RW barriers was pulled from this document [1]. The lock
instruction seemed the best of the three options; use mfence (not an
option), use no-op lock or cpuid, which is apparently very slow.

Another option would be looking at the operation just after the
barrier; according to this [1], any lock-prefixed instruction will
work on x86, so instead of a lock and wasted instruction, we could
lock with the next instruction. Though, this may only be something a
JIT would do.

What does the EM64T mean? Isn't that just saying P4? I'm still not
quite clear on the differences between EM6T, IPF, x86_64/x64, etc? How
do these map to the builds? There's not a separate EM64T end state,
right?

If that does just mean a P4, I don't think this code snippet is
correct, unless we're going to start separating our builds that way.

[1] http://gee.cs.oswego.edu/dl/jmm/cookbook.html

-Nathan

On 4/20/07, Rana Dasgupta <rd...@gmail.com> wrote:
> Nathan,
>    I was going through your patch and had a small suggestion. We had
> discussed on other related threads that on Windows we wanted to use
> the MSVC provided _ReadWriteBarrier intrinsic and the patch does that,
> but this may not be enough. This generates no instruction, but is a
> compiler fence only. x86 does not need a store fence, but for smp the
> load fence can be certainly needed.
>   That's the assumption on Linux where in apr_memory_rw_barrier() we
> insert both a compiler and an instruction fence, eg.,
>
> apr_memory_rw_barrier()
> {
> ...
> #else
> asm volatile ("lock; addl $0, 0(%%esp)":::"memory");
> ...
> }
>
> I agree that since there is no mfence on PIII, probably the dummy lock
> is the safest option for x86 wide code. But  "lock" is expensive,
> specially for Pentium 4 , Centrino etc. which have the "mfence"
> instruction. And this penalty on current platforms is only on Linux.
> My suggestion is to do the same on Windows as well. If we need the
> real fence, we need it on both....So on Windows...
>
> On Windows in apr_thread_ext.c:
> ================================
> #if _MSC_VER < 1400
>         extern void _ReadWriteBarrier();
> #else
>         #include <intrin.h>
> #endif
> #pragma intrinsic (_ReadWriteBarrier)
>
> APR_DECLARE(void) apr_memory_rw_barrier(){
> #ifdef _EM64T_
>         // create rwfence64() in asm file port\src\thread\win\rwfence.asm
>         rwfence64();
> #else
>         __asm {lock add [esp], 0 }
> #endif
>        _ReadWriteBarrier();
> }
>
> In port\src\thread\win\rwfence.asm:
> ===================================
> PUBLIC rwfence64
>
> _TEXT SEGMENT
>
> rwfence64 PROC
>
>         mfence
>
>
> PROC ENDP
> TEXT ENDS
>
> What do you think?
>
> Thanks,
> Rana
>
> On 4/17/07, Nathan Beyer <nd...@apache.org> wrote:
> > I think I have this resolved with this commit [1]. I'm now able to run
> > DRLVM in interpreted mode on my Quad P3 ... at least a 'Hello World'
> > app.
> >
> > -Nathan
> >
> > [1] http://svn.apache.org/viewvc?view=rev&rev=529880
> >
> > On 4/18/07, Nathan Beyer <nd...@apache.org> wrote:
> > > I've run into another SSE2 operation in DRLVM, this time it's in the
> > > hythr library, but I can't find the specific instance. According to
> > > GDB, it's happening in fast_thread_array() in libhythr.so.
> > >
> > > Is this from the 'apr_thread_ext.c' file? I'm going to attempt to
> > > modify this to be like the changes I made to the 'atomics.cpp', so
> > > that this can run on P3 (SSE) CPUs.
> > >
> > > nathan@pants:~/harmony/drlvm-trunk/build/lnx_ia32_gcc_debug/deploy/jdk/jre/bin$
> > > ./java -Xint -cp /home/nathan/workspace/helloworld/bin HelloWorld
> > > Illegal instruction
> > > nathan@pants:~/harmony/drlvm-trunk/build/lnx_ia32_gcc_debug/deploy/jdk/jre/bin$
> > > gdb --args ./java -Xint -cp /home/nathan/workspace/helloworld/bin
> > > HelloWorld
> > > GNU gdb 6.6-debian
> > > Copyright (C) 2006 Free Software Foundation, Inc.
> > > GDB is free software, covered by the GNU General Public License, and you are
> > > welcome to change it and/or distribute copies of it under certain conditions.
> > > Type "show copying" to see the conditions.
> > > There is absolutely no warranty for GDB.  Type "show warranty" for details.
> > > This GDB was configured as "i486-linux-gnu"...
> > > Using host libthread_db library "/lib/tls/i686/cmov/libthread_db.so.1".
> > > (gdb) run
> > > Starting program:
> > > /home/nathan/harmony/drlvm-trunk/build/lnx_ia32_gcc_debug/deploy/jdk/jre/bin/java
> > > -Xint -cp /home/nathan/workspace/helloworld/bin HelloWorld
> > > [Thread debugging using libthread_db enabled]
> > > [New Thread -1214834000 (LWP 32414)]
> > > [New Thread -1215992944 (LWP 32417)]
> > > [New process 32414]
> > > [New LWP 32414]
> > >
> > > Program received signal SIGILL, Illegal instruction.
> > > 0xb7bc543f in fast_thread_array ()
> > >    from /home/nathan/harmony/drlvm-trunk/build/lnx_ia32_gcc_debug/deploy/jdk/jre/bin/libhythr.so
> > > (gdb) x/1i $eip
> > > 0xb7bc543f <fast_thread_array+510367>:  mfence
> > > (gdb)
> > >
> >
>

Re: [drlvm] Another mfence operation??? fast_thread_array???

Posted by Rana Dasgupta <rd...@gmail.com>.
Nathan,
   I was going through your patch and had a small suggestion. We had
discussed on other related threads that on Windows we wanted to use
the MSVC provided _ReadWriteBarrier intrinsic and the patch does that,
but this may not be enough. This generates no instruction, but is a
compiler fence only. x86 does not need a store fence, but for smp the
load fence can be certainly needed.
  That's the assumption on Linux where in apr_memory_rw_barrier() we
insert both a compiler and an instruction fence, eg.,

apr_memory_rw_barrier()
{
...
#else
asm volatile ("lock; addl $0, 0(%%esp)":::"memory");
...
}

I agree that since there is no mfence on PIII, probably the dummy lock
is the safest option for x86 wide code. But  "lock" is expensive,
specially for Pentium 4 , Centrino etc. which have the "mfence"
instruction. And this penalty on current platforms is only on Linux.
My suggestion is to do the same on Windows as well. If we need the
real fence, we need it on both....So on Windows...

On Windows in apr_thread_ext.c:
================================
#if _MSC_VER < 1400
	extern void _ReadWriteBarrier();
#else
	#include <intrin.h>
#endif
#pragma intrinsic (_ReadWriteBarrier)

APR_DECLARE(void) apr_memory_rw_barrier(){
#ifdef _EM64T_
	// create rwfence64() in asm file port\src\thread\win\rwfence.asm
	rwfence64();
#else
	__asm {lock add [esp], 0 }
#endif
       _ReadWriteBarrier();
}

In port\src\thread\win\rwfence.asm:
===================================
PUBLIC rwfence64

_TEXT SEGMENT

rwfence64 PROC

	mfence


PROC ENDP
TEXT ENDS

What do you think?

Thanks,
Rana

On 4/17/07, Nathan Beyer <nd...@apache.org> wrote:
> I think I have this resolved with this commit [1]. I'm now able to run
> DRLVM in interpreted mode on my Quad P3 ... at least a 'Hello World'
> app.
>
> -Nathan
>
> [1] http://svn.apache.org/viewvc?view=rev&rev=529880
>
> On 4/18/07, Nathan Beyer <nd...@apache.org> wrote:
> > I've run into another SSE2 operation in DRLVM, this time it's in the
> > hythr library, but I can't find the specific instance. According to
> > GDB, it's happening in fast_thread_array() in libhythr.so.
> >
> > Is this from the 'apr_thread_ext.c' file? I'm going to attempt to
> > modify this to be like the changes I made to the 'atomics.cpp', so
> > that this can run on P3 (SSE) CPUs.
> >
> > nathan@pants:~/harmony/drlvm-trunk/build/lnx_ia32_gcc_debug/deploy/jdk/jre/bin$
> > ./java -Xint -cp /home/nathan/workspace/helloworld/bin HelloWorld
> > Illegal instruction
> > nathan@pants:~/harmony/drlvm-trunk/build/lnx_ia32_gcc_debug/deploy/jdk/jre/bin$
> > gdb --args ./java -Xint -cp /home/nathan/workspace/helloworld/bin
> > HelloWorld
> > GNU gdb 6.6-debian
> > Copyright (C) 2006 Free Software Foundation, Inc.
> > GDB is free software, covered by the GNU General Public License, and you are
> > welcome to change it and/or distribute copies of it under certain conditions.
> > Type "show copying" to see the conditions.
> > There is absolutely no warranty for GDB.  Type "show warranty" for details.
> > This GDB was configured as "i486-linux-gnu"...
> > Using host libthread_db library "/lib/tls/i686/cmov/libthread_db.so.1".
> > (gdb) run
> > Starting program:
> > /home/nathan/harmony/drlvm-trunk/build/lnx_ia32_gcc_debug/deploy/jdk/jre/bin/java
> > -Xint -cp /home/nathan/workspace/helloworld/bin HelloWorld
> > [Thread debugging using libthread_db enabled]
> > [New Thread -1214834000 (LWP 32414)]
> > [New Thread -1215992944 (LWP 32417)]
> > [New process 32414]
> > [New LWP 32414]
> >
> > Program received signal SIGILL, Illegal instruction.
> > 0xb7bc543f in fast_thread_array ()
> >    from /home/nathan/harmony/drlvm-trunk/build/lnx_ia32_gcc_debug/deploy/jdk/jre/bin/libhythr.so
> > (gdb) x/1i $eip
> > 0xb7bc543f <fast_thread_array+510367>:  mfence
> > (gdb)
> >
>

Re: [drlvm] Another mfence operation??? fast_thread_array???

Posted by Nathan Beyer <nd...@apache.org>.
I think I have this resolved with this commit [1]. I'm now able to run
DRLVM in interpreted mode on my Quad P3 ... at least a 'Hello World'
app.

-Nathan

[1] http://svn.apache.org/viewvc?view=rev&rev=529880

On 4/18/07, Nathan Beyer <nd...@apache.org> wrote:
> I've run into another SSE2 operation in DRLVM, this time it's in the
> hythr library, but I can't find the specific instance. According to
> GDB, it's happening in fast_thread_array() in libhythr.so.
>
> Is this from the 'apr_thread_ext.c' file? I'm going to attempt to
> modify this to be like the changes I made to the 'atomics.cpp', so
> that this can run on P3 (SSE) CPUs.
>
> nathan@pants:~/harmony/drlvm-trunk/build/lnx_ia32_gcc_debug/deploy/jdk/jre/bin$
> ./java -Xint -cp /home/nathan/workspace/helloworld/bin HelloWorld
> Illegal instruction
> nathan@pants:~/harmony/drlvm-trunk/build/lnx_ia32_gcc_debug/deploy/jdk/jre/bin$
> gdb --args ./java -Xint -cp /home/nathan/workspace/helloworld/bin
> HelloWorld
> GNU gdb 6.6-debian
> Copyright (C) 2006 Free Software Foundation, Inc.
> GDB is free software, covered by the GNU General Public License, and you are
> welcome to change it and/or distribute copies of it under certain conditions.
> Type "show copying" to see the conditions.
> There is absolutely no warranty for GDB.  Type "show warranty" for details.
> This GDB was configured as "i486-linux-gnu"...
> Using host libthread_db library "/lib/tls/i686/cmov/libthread_db.so.1".
> (gdb) run
> Starting program:
> /home/nathan/harmony/drlvm-trunk/build/lnx_ia32_gcc_debug/deploy/jdk/jre/bin/java
> -Xint -cp /home/nathan/workspace/helloworld/bin HelloWorld
> [Thread debugging using libthread_db enabled]
> [New Thread -1214834000 (LWP 32414)]
> [New Thread -1215992944 (LWP 32417)]
> [New process 32414]
> [New LWP 32414]
>
> Program received signal SIGILL, Illegal instruction.
> 0xb7bc543f in fast_thread_array ()
>    from /home/nathan/harmony/drlvm-trunk/build/lnx_ia32_gcc_debug/deploy/jdk/jre/bin/libhythr.so
> (gdb) x/1i $eip
> 0xb7bc543f <fast_thread_array+510367>:  mfence
> (gdb)
>