Posted to dev@lucene.apache.org by Uwe Schindler <us...@apache.org> on 2013/03/05 22:48:40 UTC

JVM hanging when using G1GC on JDK8 b78 or b79 (Linux 32 bit)

Hi,

for a few months now we have been extensively testing various preview builds of JDK 8 for compatibility with Apache Lucene and Solr, so that we can find bugs early and avoid the problems we had with the release of Java 7 two years ago. We currently have a Linux (Ubuntu 64-bit) Jenkins machine with various JDKs installed (JDK 6, JDK 7, JDK 8 snapshot, IBM J9, an older JRockit); on every run of the test suite (which takes approx. 30-45 minutes), it picks a different one with different HotSpot and garbage collector settings.

So far JDK 8 b79 works very well on Linux. We found some strange behavior in earlier builds (possibly compiler bugs), but none at the moment. There is one configuration that consistently and reproducibly hangs in one of the tested modules: JDK 8 b79 (b78 behaves the same), 32-bit, with G1GC (server or client VM does not matter). The JVM running the tests becomes unresponsive: jstack and kill -3 have no effect (jstack cannot connect), a standard kill does not stop it, and only kill -9 actually kills it. The hang reproduces 100% of the time in this Lucene module.

I was able to attach GDB to the JVM and get a stack trace of all threads (see attachment, dump.txt). As you can see, all G1GC threads seem to hang in a syscall (os::park(), a condition-variable wait in the pthread library). Unfortunately, that is all I can give you: a Java stack trace is not possible because the JVM responds to neither kill -3 nor jstack. With all other garbage collectors the test passes within a few seconds; with 32-bit G1GC it can stand still for hours. The 64-bit JVM passes with G1GC, so only the 32-bit variant is affected. Client or server VM makes no difference.
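Because the hang was only seen with a 32-bit JVM running G1, a test run that picks a random JDK and collector could log whether it landed on that combination. A minimal sketch, assuming a HotSpot JVM: "sun.arch.data.model" is a HotSpot-specific system property ("32" or "64"), and matching collector MXBean names that start with "G1 " (e.g. "G1 Young Generation") is an assumption about how G1 names its beans, not something from this report. The class and method names are hypothetical, and the build number (b78/b79) is not checked.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.util.ArrayList;
import java.util.List;

class ReportedConfigCheck {
    /** True for a 32-bit data model combined with a G1 collector bean. */
    static boolean matchesReportedConfig(String dataModel, List<String> gcBeanNames) {
        boolean is32Bit = "32".equals(dataModel);
        boolean usesG1 = false;
        for (String name : gcBeanNames) {
            // Assumption: G1's beans are named like "G1 Young Generation" / "G1 Old Generation".
            if (name.startsWith("G1 ")) {
                usesG1 = true;
            }
        }
        return is32Bit && usesG1;
    }

    public static void main(String[] args) {
        List<String> names = new ArrayList<>();
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            names.add(gc.getName());
        }
        // HotSpot-specific property: "32" or "64" (may be absent on other VMs).
        String model = System.getProperty("sun.arch.data.model");
        System.out.println(System.getProperty("java.runtime.version")
                + ", data model " + model + ", collectors " + names
                + ", matches 32-bit G1 config: " + matchesReportedConfig(model, names));
    }
}
```

Passing the values in as parameters keeps the decision logic testable without having to start a 32-bit JVM.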

To reproduce:
- Use a 32-bit JDK 8 b78 or b79 (tested on 64-bit Linux, but this should not matter)
- Download the Lucene source code (e.g. the snapshot version we were testing with: https://builds.apache.org/job/Lucene-Artifacts-trunk/2212/artifact/lucene/dist/)
- Change to the directory lucene/analysis/uima and run:
	ant -Dargs="-server -XX:+UseG1GC" -Dtests.multiplier=3 -Dtests.jvms=1 test
After a while the test framework prints "stalled" messages (because the child VM actually running the tests no longer responds), together with its PID. Any attempt to get a stack trace or to kill it gets no response; only kill -9 helps. Choosing another garbage collector in the above command line makes the test finish after a few seconds, e.g. -Dargs="-server -XX:+UseConcMarkSweepGC"
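The "stalled" detection described above can be pictured as a watchdog around the child JVM. This is a minimal sketch, not the Lucene test framework's actual implementation: it starts a child process, waits a bounded time, and if the child has not exited, kills it with Process.destroyForcibly() (Java 8+), which in OpenJDK on Linux sends SIGKILL, the equivalent of kill -9. A plain destroy() sends SIGTERM, which, like the standard kill in the report, would not stop the hung JVM. The class name and the timeout are hypothetical, and "sleep 60" merely stands in for the real test JVM command line.

```java
import java.io.IOException;
import java.util.concurrent.TimeUnit;

class StallWatchdog {
    /**
     * Starts the command and waits up to timeoutMs for it to exit.
     * Returns true if the child exited on its own, false if it stalled and was killed.
     */
    static boolean runWithWatchdog(long timeoutMs, String... command)
            throws IOException, InterruptedException {
        Process child = new ProcessBuilder(command).inheritIO().start();
        if (child.waitFor(timeoutMs, TimeUnit.MILLISECONDS)) {
            return true; // exited in time (the exit code is not inspected here)
        }
        System.err.println("stalled: no exit after " + timeoutMs + " ms, killing child");
        // destroy() would send SIGTERM, which the hung JVM did not react to,
        // so use the forcible variant (SIGKILL in OpenJDK on Linux).
        child.destroyForcibly().waitFor();
        return false;
    }

    public static void main(String[] args) throws Exception {
        // "sleep 60" stands in for a hung worker JVM.
        boolean finished = runWithWatchdog(2_000, "sleep", "60");
        System.out.println(finished ? "finished" : "killed after stall");
    }
}
```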

I posted this bug report directly to the mailing list because, with earlier bug reports, there seemed to be a problem with bugs.sun.com: there was no response from any reviewer for several weeks. On the other hand, we were able to help find and fix javadoc and javac compiler bugs early, so I hope you can help with this bug, too.

Uwe

-----
Uwe Schindler
uschindler@apache.org 
Apache Lucene PMC Member / Committer
Bremen, Germany
http://lucene.apache.org/



RE: JVM hanging when using G1GC on JDK8 b78 or b79 (Linux 32 bit)

Posted by Uwe Schindler <us...@apache.org>.
Thanks, I'll try to use jstack with -F or -m!

Uwe



> -----Original Message-----
> From: Krystal Mok [mailto:rednaxelafx@gmail.com]
> Sent: Wednesday, March 06, 2013 7:08 AM
> To: Uwe Schindler
> Cc: hotspot-gc-dev@openjdk.java.net; hotspot-dev@openjdk.java.net;
> Dawid Weiss; dev@lucene.apache.org
> Subject: Re: JVM hanging when using G1GC on JDK8 b78 or b79 (Linux 32 bit)
> 
> Hi Uwe,
> 
> If you can attach gdb to it, then jstack -m and jstack -F should also work;
> that will get you the Java stack trace.
> (But it probably doesn't matter in this case, because the hang is probably a bug
> in the VM.)
> 
> - Kris
> 
> 
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
> For additional commands, e-mail: dev-help@lucene.apache.org




Re: JVM hanging when using G1GC on JDK8 b78 or b79 (Linux 32 bit)

Posted by Thomas Schatzl <th...@oracle.com>.
Hi,

On Wed, 2013-03-06 at 20:23 +1000, David Holmes wrote:
> On 6/03/2013 5:55 PM, Dawid Weiss wrote:
> >
> > Here you go:
> > http://pastebin.com/raw.php?i=b2PHLm1e
> 
> Thanks. I would have to say this seems to be the suspicious part:
> 
> Thread 22 (Thread 0xf20ffb40 (LWP 22939)):
[...]
>     from /var/lib/jenkins/tools/java/32bit/jdk1.8.0-ea-b79/jre/lib/i386/server/libjvm.so
> #6  0xf6b5ea41 in ConcurrentG1RefineThread::run_young_rs_sampling() ()
>     from 
> The suspendible thread set logic looks "tricky". Time for the G1 experts 
> to take over. :)

The young gen RS (remembered set) sampling thread does some statistical
updates while the application is running, so that less work needs to be
done in the STW (stop-the-world) pause.

It is always suspended at a safepoint; this is normal.

As Bengt mentioned, the problem seems to be thread 10, which is the VM
thread (the one responsible for bringing everything to a safepoint and
then distributing work).

According to the stack trace, this thread seems to be waiting for
synchronization with the marking threads because of a mark stack
overflow during weak reference processing.

However, all marking threads are already waiting because of the
safepoint operation, so the VM thread waits forever.

As Bengt mentioned, this thread shouldn't be waiting at all, and
shouldn't need to, because it seems to be the only thread working on
weak references anyway (i.e. this phase is single-threaded).

(All IMO.)

Thomas
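Thomas's explanation describes a coordinator waiting at a synchronization point for threads that can never arrive. The toy Java model below only illustrates that pattern under stated assumptions; it is not HotSpot's C++ code. The class name, the method name, and the count of four marking threads are hypothetical, and a CyclicBarrier stands in for G1's internal overflow synchronization. The calling thread plays the VM thread; the other parties are simply absent, which models marking threads that are suspended for the safepoint and never reach the barrier. The timeout exists only to make the hang observable; according to the analysis above, the real VM thread waits without one.

```java
import java.util.concurrent.BrokenBarrierException;
import java.util.concurrent.CyclicBarrier;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

class OverflowSyncModel {
    /**
     * Models the VM thread synchronizing with marking threads after a mark stack
     * overflow. Only the calling thread ever arrives at the barrier.
     *
     * @param expectedParties how many threads the barrier waits for
     * @return true if the sync completed, false if it would have waited forever
     */
    static boolean syncAfterOverflow(int expectedParties, long timeoutMs)
            throws InterruptedException {
        CyclicBarrier barrier = new CyclicBarrier(expectedParties);
        try {
            barrier.await(timeoutMs, TimeUnit.MILLISECONDS);
            return true;
        } catch (TimeoutException | BrokenBarrierException e) {
            return false; // without a timeout this would block indefinitely
        }
    }

    public static void main(String[] args) throws InterruptedException {
        // Single-threaded reference processing: one party, the barrier trips at once.
        System.out.println("parties=1: " + syncAfterOverflow(1, 500)); // prints "parties=1: true"
        // Barrier still sized for 4 (hypothetical) marking threads that never arrive.
        System.out.println("parties=4: " + syncAfterOverflow(4, 500)); // prints "parties=4: false"
    }
}
```

Sized for the single thread that actually does the work, the sync completes immediately; sized for workers that are parked at the safepoint, it can only time out or, without a timeout, hang.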





Re: JVM hanging when using G1GC on JDK8 b78 or b79 (Linux 32 bit)

Posted by David Holmes <da...@oracle.com>.
On 6/03/2013 5:55 PM, Dawid Weiss wrote:
>
> Here you go:
> http://pastebin.com/raw.php?i=b2PHLm1e

Thanks. I would have to say this seems to be the suspicious part:

Thread 22 (Thread 0xf20ffb40 (LWP 22939)):
#0  0xf7743430 in __kernel_vsyscall ()
#1  0xf771e96b in pthread_cond_wait@@GLIBC_2.3.2 () from /lib/i386-linux-gnu/libpthread.so.0
#2  0xf6ec849c in os::PlatformEvent::park() ()
    from /var/lib/jenkins/tools/java/32bit/jdk1.8.0-ea-b79/jre/lib/i386/server/libjvm.so
#3  0xf6e98b82 in Monitor::IWait(Thread*, long long) ()
    from /var/lib/jenkins/tools/java/32bit/jdk1.8.0-ea-b79/jre/lib/i386/server/libjvm.so
#4  0xf6e99370 in Monitor::wait(bool, long, bool) ()
    from /var/lib/jenkins/tools/java/32bit/jdk1.8.0-ea-b79/jre/lib/i386/server/libjvm.so
#5  0xf6b5fb16 in SuspendibleThreadSet::join() ()
    from /var/lib/jenkins/tools/java/32bit/jdk1.8.0-ea-b79/jre/lib/i386/server/libjvm.so
#6  0xf6b5ea41 in ConcurrentG1RefineThread::run_young_rs_sampling() ()
    from /var/lib/jenkins/tools/java/32bit/jdk1.8.0-ea-b79/jre/lib/i386/server/libjvm.so
#7  0xf6b5ef91 in ConcurrentG1RefineThread::run() ()

The suspendible thread set logic looks "tricky". Time for the G1 experts 
to take over. :)

David
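The frames above show thread 22 parked in SuspendibleThreadSet::join() via Monitor::wait and os::PlatformEvent::park, ending in pthread_cond_wait. As David notes in his earlier message, such a thread is not busy or "hung" but waiting for a notification that never comes. The sketch below is a Java-level analogue, not the VM-internal code path: a thread blocked in CountDownLatch.await() is parked (the JDK implements this with LockSupport.park) and reports Thread.State.WAITING until another thread signals it. The class and method names are hypothetical.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

class ParkedWaiter {
    /** Returns the state a thread reports while blocked waiting for a signal. */
    static Thread.State stateWhileWaiting() throws InterruptedException {
        CountDownLatch signal = new CountDownLatch(1);
        Thread waiter = new Thread(() -> {
            try {
                signal.await(); // parks the thread until countDown() is called
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }, "waiter");
        waiter.start();

        // Poll (for at most 5 s) until the waiter has actually parked.
        Thread.State observed = waiter.getState();
        long deadline = System.nanoTime() + TimeUnit.SECONDS.toNanos(5);
        while (observed != Thread.State.WAITING && System.nanoTime() < deadline) {
            Thread.sleep(1);
            observed = waiter.getState();
        }

        signal.countDown(); // the notification the hung G1 threads never received
        waiter.join();
        return observed;
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("waiter state while parked: " + stateWhileWaiting());
    }
}
```

The waiter consumes no CPU while parked; the only question, as in the VM, is who is supposed to wake it up.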

> Dawid
>
> On Wed, Mar 6, 2013 at 8:52 AM, David Holmes <david.holmes@oracle.com> wrote:
>
>     If the VM is completely unresponsive then it suggests we are at a
>     safepoint.
>
>     The GC threads are not "hung" in os::park; they are parked, waiting
>     to be notified of something.
>
>     The thing is to find out why they are not being woken up.
>
>     Can the gdb log be posted somewhere? I don't know if the attachment
>     made it to the original posting on hotspot-gc but it's no longer
>     available on hotspot-dev.
>
>     Thanks,
>     David
>



Re: JVM hanging when using G1GC on JDK8 b78 or b79 (Linux 32 bit)

Posted by Dawid Weiss <dw...@apache.org>.
Here you go:
http://pastebin.com/raw.php?i=b2PHLm1e

Dawid


RE: JVM hanging when using G1GC on JDK8 b78 or b79 (Linux 32 bit)

Posted by Uwe Schindler <us...@apache.org>.
Hi,

If you want the command line of the JVM running the tests, there is one trick:

Run the ANT test (the final step in my how-to). When it hangs (you should see ANT printing some "tests stalled" messages on stdout), go to a separate console and kill the worker JVM with kill -9 (ANT prints its PID before starting the tests).
Once you have killed the JVM, the ANT build should proceed and print debugging information about the "failed" JVM (the one you killed from the other terminal), including the full command line with the complete classpath. It should then be possible to run that JVM in isolation using the printed command line.
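As an alternative on Linux, the worker's full command line can also be read from /proc/<pid>/cmdline before killing it, using the PID that ANT prints; that file holds the arguments separated by NUL bytes. This is a minimal sketch, assuming Linux with /proc mounted; the class and method names are hypothetical, and it is not part of the Lucene build. Reading the file should work even for the hung JVM, since the kernel serves it without the JVM's cooperation, but on some older kernels the output was limited to one page, which a very long classpath could exceed.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

class ProcCmdline {
    /** Splits the NUL-separated contents of /proc/<pid>/cmdline into arguments. */
    static List<String> parse(byte[] raw) {
        List<String> args = new ArrayList<>();
        int start = 0;
        for (int i = 0; i < raw.length; i++) {
            if (raw[i] == 0) {
                args.add(new String(raw, start, i - start, StandardCharsets.UTF_8));
                start = i + 1;
            }
        }
        if (start < raw.length) { // tolerate a missing trailing NUL
            args.add(new String(raw, start, raw.length - start, StandardCharsets.UTF_8));
        }
        return args;
    }

    /** Reads the command line of a process (Linux only); pid may also be "self". */
    static List<String> read(String pid) throws IOException {
        return parse(Files.readAllBytes(Paths.get("/proc", pid, "cmdline")));
    }

    public static void main(String[] args) throws IOException {
        // Pass the PID that the test runner printed for the stalled worker JVM.
        System.out.println(String.join(" ", read(args.length > 0 ? args[0] : "self")));
    }
}
```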

Uwe



> -----Original Message-----
> From: John Cuthbertson [mailto:john.cuthbertson@oracle.com]
> Sent: Wednesday, March 06, 2013 8:21 PM
> To: Uwe Schindler
> Cc: 'Bengt Rutisson'; hotspot-gc-dev@openjdk.java.net;
> dev@lucene.apache.org
> Subject: Re: JVM hanging when using G1GC on JDK8 b78 or b79 (Linux 32 bit)
> 
> Hi Uwe,
> 
> Let me try with your detailed instructions below before you go to all of that
> trouble. I will let you know how I get on.
> 
> Thanks,
> 
> JohnC
> 
> On 3/6/2013 11:15 AM, Uwe Schindler wrote:
> > Hi,
> >
> > That's unfortunately not so easy, because of project dependencies. To run the test you have to compile Lucene Core, then the specific module plus the test framework (which is special to Lucene), and download some JARs from Maven Central (JAR hell, as usual).
> > If you give me some time, I can collect all needed JAR files from my local checkout and provide you with the correct command line plus a ZIP file, maybe with a shell script to start it up. It should be doable, but it needs some work to collect all dependencies for the classpath.
> >
> > If you want to do it quicker (should be quite fast to do):
> > - Download the ANT 1.8.2 binary distribution (unfortunately, ANT 1.8.4 has a bug that prevents it from working out of the box with Java 8): http://archive.apache.org/dist/ant/binaries/apache-ant-1.8.2-bin.tar.gz. I just wonder: isn't ANT needed to build the JDK class library itself? I remember that the FreeBSD OpenJDK build downloads ANT and does a large part of the compilation using ANT...
> > - Put the ANT bin/ dir into your PATH.
> > - Download the Apache Lucene source code from Jenkins: https://builds.apache.org/job/Lucene-Artifacts-trunk/2212/artifact/lucene/dist/lucene-5.0-2013-03-05_15-37-06-src.tgz
> > - Go to the extracted Lucene source dir and call "ant ivy-bootstrap" (this downloads Apache Ivy, so all dependencies can be fetched from Maven Central).
> > - Change to the module that fails: # cd analysis/uima
> > - Execute: # ant -Dargs="-server -XX:+UseG1GC" -Dtests.multiplier=3 -Dtests.jvms=1 test
> > - In a parallel console you might be able to attach to the process. The build in the main console runs inside ANT, and the test framework spawns separate worker JVM instances to execute the tests. This makes it hard to reproduce standalone (the command line passed to the child JVM is veeeeery long).
> >
> > I will work on putting together a precompiled ZIP file with all needed JARs plus the command line. Just tell me if you managed it with the above how-to; then I don't need to do this.
> > Uwe
> >
> >
> >
> >> -----Original Message-----
> >> From: John Cuthbertson [mailto:john.cuthbertson@oracle.com]
> >> Sent: Wednesday, March 06, 2013 7:51 PM
> >> To: Uwe Schindler
> >> Cc: 'Bengt Rutisson'; hotspot-gc-dev@openjdk.java.net;
> >> dev@lucene.apache.org
> >> Subject: Re: JVM hanging when using G1GC on JDK8 b78 or b79 (Linux 32
> >> bit)
> >>
> >> Hi Uwe,
> >>
> >> I've downloaded lucene-5.0-2013-03-05_15-37-06.zip from
> >> https://builds.apache.org/job/Lucene-Artifacts-trunk/2212/artifact/lucene/dist/
> >>
> >> I don't have ant on my workstation so do you have a java command line
> >> to run the test(s) that generate the error?
> >>
> >> Thanks,
> >>
> >> JohnC
> >>
> >> On 3/6/2013 3:16 AM, Uwe Schindler wrote:
> >>> Hi,
> >>>
> >>>> I think this is a VM bug and the thread dumps that Uwe produced are
> >>>> enough to start tracking down the root cause.
> >>> I hope it is enough! If I can help with more details, tell me what I
> >>> should do to track this down. Unfortunately, we have no isolated test
> >>> case (like a small Java class that triggers this bug); you have to run
> >>> the tests of this Lucene module. It only happens there, not in any
> >>> other Lucene test suite. It may be caused by a lot of GC activity in
> >>> this "UIMA" module or by a specific test.
> >>>> On 3/6/13 8:52 AM, David Holmes wrote:
> >>>>> If the VM is completely unresponsive then it suggests we are at a
> >>>>> safepoint.
> >>>> Yes, we are hanging during a stop-the-world GC, so we are at a
> >>>> safepoint.
> >>>>
> >>>>> The GC threads are not "hung" in os::park; they are parked,
> >>>>> waiting to be notified of something.
> >>>> It looks like the reference processing thread is stuck in a loop
> >>>> where it does wait(). So, the VM is hanging even if that stack
> >>>> trace also ends up in os::park().
> >>>>
> >>>>> The thing is to find out why they are not being woken up.
> >>>> Actually, in this case we should probably not even be calling wait...
> >>>>
> >>>>> Can the gdb log be posted somewhere? I don't know if the
> >>>>> attachment made it to the original posting on hotspot-gc but it's
> >>>>> no longer available on hotspot-dev.
> >>>> I received the attachment with the original email. I've attached it
> >>>> to the bug report that I created: 8009536. You can find it there if
> >>>> you want to. But I think we have a fairly good idea of what change
> >>>> caused the hang.
> >>> If it helps: unfortunately, we had some problems with recent JDK
> >>> builds because the javac and javadoc tools were not working correctly
> >>> and failed to build our source code. This has been fixed since b78.
> >>> Until then, we used build b65 (the last one that worked), and the
> >>> G1GC hangs did not appear with that version. So the hang must have
> >>> been introduced by a change between b65 and b78.
> >>> Uwe
> >>>
> >>>> Bengt
> >>>>
> >>>>> Thanks,
> >>>>> David
> 
> 
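Uwe's observation quoted above brackets the regression: b65 passes and b78 hangs. If the intermediate builds could be tested, a binary search would find the first failing build in at most four test runs (13 builds apart, halved each time). A minimal sketch, assuming failures are monotonic (every build after the first bad one also hangs); the class name is hypothetical, and the predicate is a placeholder for "run the UIMA tests with -XX:+UseG1GC on that 32-bit build and see whether they stall". Since the builds between b65 and b78 could not compile Lucene, each probe would presumably need classes compiled with a different JDK.

```java
import java.util.function.IntPredicate;

class BuildBisect {
    /**
     * Binary search for the first failing build.
     * Preconditions (not checked): goodBuild passes, badBuild fails,
     * and failures are monotonic between them.
     */
    static int firstBadBuild(int goodBuild, int badBuild, IntPredicate fails) {
        int lo = goodBuild; // invariant: lo passes
        int hi = badBuild;  // invariant: hi fails
        while (hi - lo > 1) {
            int mid = lo + (hi - lo) / 2;
            if (fails.test(mid)) {
                hi = mid;
            } else {
                lo = mid;
            }
        }
        return hi;
    }

    public static void main(String[] args) {
        // Placeholder predicate: pretend every build from b72 on hangs.
        System.out.println("first bad build: b" + firstBadBuild(65, 78, b -> b >= 72));
    }
}
```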


Re: JVM hanging when using G1GC on JDK8 b78 or b79 (Linux 32 bit)

Posted by John Cuthbertson <jo...@oracle.com>.
Hi Uwe,

Let me try with your detailed instructions below before you go to all of 
that trouble. I will let you know how I get on.

Thanks,

JohnC

On 3/6/2013 11:15 AM, Uwe Schindler wrote:
> Hi,
>
> That's unfortunately not so easy because of project dependencies. To run the test you have to compile Lucene Core, then the specific module plus the test framework (which is specific to Lucene), and download some JARs from Maven Central (JAR hell, as usual).
> If you give me some time, I can collect all the needed JAR files from my local checkout and provide you with the correct command line plus a ZIP file, maybe with a shell script to start it up. It should be doable, but it needs some work to collect all the dependencies for the classpath.
>
> If you want to do it more quickly (it should be quite fast to do):
> - Download the ANT 1.8.2 binary archive (unfortunately, ANT 1.8.4 has a bug that prevents it from working out of the box with Java 8): http://archive.apache.org/dist/ant/binaries/apache-ant-1.8.2-bin.tar.gz - I just wonder: isn't ANT needed to build the JDK class library itself? I remember that the FreeBSD OpenJDK build downloads ANT and does a large part of the compilation using ANT...
> - put the ANT bin/ dir into your PATH
> - download the Apache Lucene source code from Jenkins: https://builds.apache.org/job/Lucene-Artifacts-trunk/2212/artifact/lucene/dist/lucene-5.0-2013-03-05_15-37-06-src.tgz
> - go to the extracted Lucene source dir and call "ant ivy-bootstrap" (this downloads Apache Ivy, so that all dependencies can be downloaded from Maven Central)
> - change to the module that fails: # cd analysis/uima
> - execute: # ant -Dargs="-server -XX:+UseG1GC" -Dtests.multiplier=3 -Dtests.jvms=1 test
> - In a parallel console you might be able to attach to the process. The build in the main console runs inside ANT, and the test framework spawns separate worker JVMs to execute the tests. This makes it hard to reproduce standalone (the command line passed to the child JVM is veeeeery long).
>
> I will work on putting together a precompiled ZIP file with all needed JARs plus the command line. Just tell me if you managed with the above how-to; then I don’t need to do this.
> Uwe
>
> -----
> Uwe Schindler
> uschindler@apache.org
> Apache Lucene PMC Member / Committer
> Bremen, Germany
> http://lucene.apache.org/
>
>
>> -----Original Message-----
>> From: John Cuthbertson [mailto:john.cuthbertson@oracle.com]
>> Sent: Wednesday, March 06, 2013 7:51 PM
>> To: Uwe Schindler
>> Cc: 'Bengt Rutisson'; hotspot-gc-dev@openjdk.java.net;
>> dev@lucene.apache.org
>> Subject: Re: JVM hanging when using G1GC on JDK8 b78 or b79 (Linux 32 bit)
>>
>> Hi Uwe,
>>
>> I've downloaded  lucene-5.0-2013-03-05_15-37-06.zip from
>> https://builds.apache.org/job/Lucene-Artifacts-
>> trunk/2212/artifact/lucene/dist/
>>
>> I don't have ant on my workstation so do you have a java command line to
>> run the test(s) that generate the error?
>>
>> Thanks,
>>
>> JohnC
>>
>> On 3/6/2013 3:16 AM, Uwe Schindler wrote:
>>> Hi,
>>>
>>>> I think this is a VM bug and the thread dumps that Uwe produced are
>>>> enough to start tracking down the root cause.
>>> I hope it is enough! If I can help with more details, tell me what I should do
>>> to track this down. Unfortunately, we have no isolated test case (like a small
>>> Java class that triggers this bug); you have to run the test cases of this
>>> Lucene module. It only happens there, not in any other Lucene test suite. It
>>> may be caused by a lot of GC activity in this "UIMA" module or by a specific test.
>>>> On 3/6/13 8:52 AM, David Holmes wrote:
>>>>> If the VM is completely unresponsive then it suggests we are at a
>>>>> safepoint.
>>>> Yes, we are hanging during a stop-the-world GC, so we are at a safepoint.
>>>>
>>>>> The GC threads are not "hung" in os::park, they are parked, waiting
>>>>> to be notified of something.
>>>> It looks like the reference processing thread is stuck in a loop
>>>> where it does wait(). So, the VM is hanging even if that stack trace
>>>> also ends up in os::park().
>>>>
>>>>> The thing is to find out why they are not being woken up.
>>>> Actually, in this case we should probably not even be calling wait...
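David's point that the G1 threads are parked (waiting to be notified) rather than hung, and Bengt's remark that the reference processing thread should probably not be calling wait() at all, can be illustrated with a plain-Java analogy. This is a sketch under stated assumptions, not HotSpot's C++ code: `ParkedWaiterDemo` and its methods are hypothetical names. A thread in a guarded `wait()` loop uses no CPU and stays parked until another thread changes the condition and calls `notifyAll()` on the same monitor; if that signal never comes, the thread waits forever, which resembles the symptom of every GC thread sitting in `os::park()`.

```java
/**
 * Plain-Java analogy (not HotSpot code): a thread blocked in Object.wait() is
 * parked, not spinning, and only a notify on the same monitor after the guarded
 * condition has changed (or an interrupt) can release it.
 */
public class ParkedWaiterDemo {
    private final Object lock = new Object();
    private boolean workAvailable = false; // guarded by lock

    /** Classic guarded wait: re-check the condition after every wake-up. */
    void awaitWork() throws InterruptedException {
        synchronized (lock) {
            while (!workAvailable) {
                lock.wait(); // parks the thread until notify/notifyAll or interrupt
            }
        }
    }

    /** Changes the condition and wakes every waiter on the monitor. */
    void publishWork() {
        synchronized (lock) {
            workAvailable = true;
            lock.notifyAll();
        }
    }

    /**
     * Starts a waiter and reports whether it finished within timeoutMillis.
     * With notify == false nobody ever signals, so the waiter stays parked.
     */
    static boolean waiterFinishes(boolean notify, long timeoutMillis) {
        ParkedWaiterDemo demo = new ParkedWaiterDemo();
        Thread waiter = new Thread(() -> {
            try {
                demo.awaitWork();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // woken by interrupt, not by notify
            }
        }, "waiter");
        waiter.setDaemon(true); // never keep the JVM alive because of a stuck waiter
        waiter.start();
        if (notify) {
            demo.publishWork(); // safe even if the waiter has not reached wait() yet
        }
        try {
            waiter.join(timeoutMillis);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
        boolean finished = !waiter.isAlive();
        if (!finished) {
            waiter.interrupt(); // release the parked demo thread
        }
        return finished;
    }

    public static void main(String[] args) {
        System.out.println("notified:     finished = " + waiterFinishes(true, 2000));
        System.out.println("not notified: finished = " + waiterFinishes(false, 500));
    }
}
```

The guarded `while` loop is what makes an early `publishWork()` harmless; the hang only appears when the condition is never satisfied and nobody signals.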
>>>>
>>>>> Can the gdb log be posted somewhere? I don't know if the attachment
>>>>> made it to the original posting on hotspot-gc but it's no longer
>>>>> available on hotspot-dev.
>>>> I received the attachment with the original email. I've attached it
>>>> to the bug report that I created: 8009536. You can find it there if
>>>> you want to. But I think we have a fairly good idea of what change
>>>> caused the hang.
>>> If it helps: unfortunately, we had some problems with recent JDK builds,
>>> because the javac and javadoc tools were not working correctly and failed to build
>>> our source code. This was fixed in b78. Until then, we used build
>>> b65 (the last one that worked), and the G1GC hangs did not appear with
>>> this version. So the hang must have been introduced by a change between b65 and b78.
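Since b65 passed and b78 hangs, the regression lies somewhere in b66 to b78. Assuming every intermediate build can actually run the test suite (Uwe notes that some builds could not even compile Lucene, so this is an assumption) and that every build after the first bad one is also bad, a bisection over build numbers needs at most four test runs for these 13 candidates. A hypothetical sketch; `BuildBisect` and the predicate are illustrative names, and the "regression in b72" in `main` is made up for the demo:

```java
import java.util.function.IntPredicate;

/**
 * Hypothetical helper: binary search for the first "bad" JDK build number,
 * assuming every build from the first bad one onward is bad (monotonic).
 */
public class BuildBisect {
    /**
     * @param good  a build number known to pass (e.g. 65)
     * @param bad   a build number known to hang (e.g. 78); must be greater than good
     * @param isBad runs the test for one build, e.g. "does the uima suite hang with G1?"
     * @return the first bad build in (good, bad]
     */
    static int firstBadBuild(int good, int bad, IntPredicate isBad) {
        if (good >= bad) {
            throw new IllegalArgumentException("good must be < bad");
        }
        // Invariant: build 'good' passes and build 'bad' fails.
        while (bad - good > 1) {
            int mid = good + (bad - good) / 2;
            if (isBad.test(mid)) {
                bad = mid;
            } else {
                good = mid;
            }
        }
        return bad;
    }

    public static void main(String[] args) {
        int[] runs = {0}; // counts how many builds were actually tested
        int first = firstBadBuild(65, 78, b -> { runs[0]++; return b >= 72; });
        System.out.println("first bad build: b" + first + " after " + runs[0] + " test runs");
    }
}
```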
>>> Uwe
>>>
>>>> Bengt
>>>>
>>>>> Thanks,
>>>>> David
>>>>>
>>>>> On 6/03/2013 4:07 PM, Krystal Mok wrote:
>>>>>> Hi Uwe,
>>>>>>
>>>>>> If you can attach gdb to it, then jstack -m and jstack -F should
>>>>>> also work; that'll get you the Java stack trace.
>>>>>> (But it probably doesn't matter in this case, because the hang is
>>>>>> probably a bug in the VM.)
>>>>>>
>>>>>> - Kris
>>>>>>
>>>>>> On Wed, Mar 6, 2013 at 5:48 AM, Uwe Schindler <us...@apache.org>
>>>>>> wrote:
>>>>>>> Hi,
>>>>>>>
>>>>>>> For a few months now, we have been extensively testing various preview
>>>>>>> builds of JDK 8 for compatibility with Apache Lucene and Solr, so that
>>>>>>> we can find bugs early and avoid the problems we had with
>>>>>>> the release of Java 7 two years ago. Currently we have a Linux
>>>>>>> (Ubuntu 64-bit) Jenkins machine with various JDKs installed (JDK 6, JDK
>>>>>>> 7, JDK 8 snapshot, IBM J9, an older JRockit); it chooses a
>>>>>>> different one, with different HotSpot and garbage collector
>>>>>>> settings, on every run of the test suite (which takes approx. 30-45
>>>>>>> minutes).
>>>>>>> So far JDK 8 b79 works very well on Linux; we found some strange
>>>>>>> behavior in earlier builds (maybe compiler errors), but none
>>>>>>> at the moment. There is one configuration that consistently and
>>>>>>> reproducibly hangs in one of the tested modules: JDK 8 b79 (same
>>>>>>> for b78), 32-bit, with G1GC (server or client VM does not matter).
>>>>>>> The JVM running the tests becomes completely unresponsive
>>>>>>> (jstack and kill -3 have no effect or cannot connect, a standard kill
>>>>>>> does not stop it; only kill -9 actually kills it). It reproduces
>>>>>>> 100% of the time in this Lucene module.
>>>>>>>
>>>>>>> I was able to attach GDB to the JVM and get a stack trace of
>>>>>>> all threads (see attachment, dump.txt). As you can see, all G1GC threads
>>>>>>> seem to hang in a syscall (os::park(), a conditional wait in the
>>>>>>> pthread library). Unfortunately, that’s all I can give you. A Java
>>>>>>> stack trace is not possible because the JVM responds to neither kill
>>>>>>> -3 nor jstack. With all other garbage collectors the test
>>>>>>> passes without hangs in a few seconds; with 32-bit G1GC it can stand
>>>>>>> still for hours. The 64-bit JVM passes with G1GC, so only the 32-bit
>>>>>>> variant is affected. Client or server VM makes no difference.
>>>>>>>
>>>>>>> To reproduce:
>>>>>>> - Use a 32-bit JDK 8 b78 or b79 (tested on 64-bit Linux, but this
>>>>>>> should not matter)
>>>>>>> - Download the Lucene source code (e.g. the snapshot version we were
>>>>>>> testing with:
>>>>>>> https://builds.apache.org/job/Lucene-Artifacts-trunk/2212/artifact/lucene/dist/)
>>>>>>> - change to directory lucene/analysis/uima and run:
>>>>>>>            ant -Dargs="-server -XX:+UseG1GC" -Dtests.multiplier=3 -Dtests.jvms=1 test
>>>>>>> After a while the test framework prints "stalled" messages
>>>>>>> (because the child VM actually running the test no longer
>>>>>>> responds). The PID is also printed. Trying to get a stack trace or to kill it
>>>>>>> gets no response. Only kill -9 helps. Choosing another garbage collector in the
>>>>>>> above command line makes the test finish after a few seconds, e.g.
>>>>>>> -Dargs="-server -XX:+UseConcMarkSweepGC"
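The stall handling described in the report (the framework notices that the child JVM stopped responding, and only kill -9 ends it) can be approximated with the standard Java 8 `Process` API: `waitFor(long, TimeUnit)` bounds the wait, and `destroyForcibly()` is implemented on Unix with SIGKILL, whereas a plain kill (SIGTERM) had no effect in this report. This is a hypothetical sketch, not the Lucene test framework's actual implementation; `StallWatchdog`, its methods, and the main class name are placeholders.

```java
import java.io.File;
import java.io.IOException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.TimeUnit;

/**
 * Hypothetical stall watchdog: run a child JVM with a given GC flag, report a
 * stall after a timeout, and kill the child forcibly (SIGKILL on Unix).
 */
public class StallWatchdog {
    /** Builds the child JVM command line; classpath and mainClass are placeholders. */
    static List<String> buildCommand(String javaHome, List<String> jvmArgs,
                                     String classpath, String mainClass) {
        List<String> cmd = new ArrayList<>();
        cmd.add(javaHome + File.separator + "bin" + File.separator + "java");
        cmd.addAll(jvmArgs);
        cmd.add("-cp");
        cmd.add(classpath);
        cmd.add(mainClass);
        return cmd;
    }

    /** Returns true if the child exited in time, false if it stalled and was killed. */
    static boolean runWithWatchdog(List<String> cmd, long timeoutSeconds)
            throws IOException, InterruptedException {
        Process child = new ProcessBuilder(cmd).inheritIO().start();
        if (child.waitFor(timeoutSeconds, TimeUnit.SECONDS)) {
            return true;
        }
        System.err.println("child JVM stalled after " + timeoutSeconds + "s; killing it");
        child.destroyForcibly().waitFor(); // equivalent of kill -9 on Unix
        return false;
    }

    public static void main(String[] args) {
        List<String> cmd = buildCommand(System.getProperty("java.home"),
                Arrays.asList("-server", "-XX:+UseG1GC"),
                System.getProperty("java.class.path"), "some.hypothetical.TestMain");
        System.out.println(String.join(" ", cmd));
    }
}
```

Swapping `-XX:+UseG1GC` for `-XX:+UseConcMarkSweepGC` in the argument list mirrors the comparison made in the report.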
>>>>>>>
>>>>>>> I posted this bug report directly to the mailing list because
>>>>>>> there seemed to be a problem with earlier bug reports on
>>>>>>> bugs.sun.com: there was no response from any reviewer after
>>>>>>> several weeks. We were able to help find and fix javadoc and
>>>>>>> javac compiler bugs early, so I hope you can help with this bug, too.
>>>>>>>
>>>>>>> Uwe
>>>>>>>
>>>>>>> -----
>>>>>>> Uwe Schindler
>>>>>>> uschindler@apache.org
>>>>>>> Apache Lucene PMC Member / Committer Bremen, Germany
>>>>>>> http://lucene.apache.org/
>>>>>>>
>>>>>>>




Re: JVM hanging when using G1GC on JDK8 b78 or b79 (Linux 32 bit)

Posted by John Cuthbertson <jo...@oracle.com>.
Hi Dawid,

Thanks,

I'll give it a try. As I said, I'm fairly certain I have the fix, but I want
to verify it before sending it out for review.

JohnC

On 3/7/2013 1:37 AM, Dawid Weiss wrote:
>
>     At the top of the archive there is a repro.sh (or repro.cmd for
>     Windows) which reproduces the issue, JDK 1.8 included. I couldn't
>     get it to hang on a 1-CPU Linux VM (VMware). On Windows it hangs for
>     me all the time.
>
>
> Update: it hangs for me 100% of the time under VMware/Ubuntu if I bump the
> number of CPUs to 2. So:
>
> cd /tmp
> wget http://ophelia.cs.put.poznan.pl/~dweiss/download/_g1gc.tar.gz
> tar -zxf _g1gc.tar.gz
> cd _g1gc
> ./repro.sh
>
> Dawid


Re: JVM hanging when using G1GC on JDK8 b78 or b79 (Linux 32 bit)

Posted by Dawid Weiss <da...@cs.put.poznan.pl>.
> At the top of the archive there is a repro.sh (or repro.cmd for Windows)
> which reproduces the issue, JDK 1.8 included. I couldn't get it to hang on
> a 1-CPU Linux VM (VMware). On Windows it hangs for me all the time.
>

Update: it hangs for me 100% of the time under VMware/Ubuntu if I bump the number
of CPUs to 2. So:

cd /tmp
wget http://ophelia.cs.put.poznan.pl/~dweiss/download/_g1gc.tar.gz
tar -zxf _g1gc.tar.gz
cd _g1gc
./repro.sh

Dawid
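Dawid's observation (no hang on a 1-CPU VM, a 100% hang with 2 CPUs) is consistent with, though not proof of, a problem in how several GC worker threads coordinate. HotSpot sizes its parallel GC worker pool from the CPU count; the commonly documented x86 default for `ParallelGCThreads` is one worker per CPU up to 8 CPUs, plus 5/8 of a worker for each CPU beyond that. The sketch below encodes that assumed heuristic (other platforms used different ratios, and an explicit `-XX:ParallelGCThreads` overrides it); `DefaultGcWorkers` is a hypothetical name. Under this assumption, the 1-CPU VM runs a single parallel GC worker and the 2-CPU VM runs two.

```java
/**
 * Hypothetical sketch of the assumed default ParallelGCThreads heuristic on x86:
 * one GC worker per CPU up to 8 CPUs, then 5/8 of each additional CPU.
 */
public class DefaultGcWorkers {
    static int defaultParallelGcThreads(int ncpus) {
        if (ncpus <= 8) {
            return ncpus;
        }
        return 8 + ((ncpus - 8) * 5) / 8; // integer arithmetic rounds down
    }

    public static void main(String[] args) {
        int ncpus = Runtime.getRuntime().availableProcessors();
        System.out.println(ncpus + " CPUs -> " + defaultParallelGcThreads(ncpus)
                + " parallel GC workers (estimate)");
    }
}
```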

Re: JVM hanging when using G1GC on JDK8 b78 or b79 (Linux 32 bit)

Posted by Dawid Weiss <da...@cs.put.poznan.pl>.
I think I can help. Fetch this (sorry for putting together the whole world
-- space for time tradeoff):

http://ophelia.cs.put.poznan.pl/~dweiss/download/_g1gc.tar.gz

and unpack it to one of the following locations, depending on which system you want to test on:

if on Windows:
C:\_g1gc\

if on Linux (32-bit!):
/tmp/_g1gc/

At the top of the archive there is a repro.sh (or repro.cmd for Windows)
which reproduces the issue, JDK 1.8 included. I couldn't get it to hang on
a 1-CPU Linux VM (VMware). On Windows it hangs for me all the time.

Dawid

On Thu, Mar 7, 2013 at 7:44 AM, Uwe Schindler <us...@apache.org> wrote:

> Hi John,
>
> I only have time to work on a setup this evening, German time, because I am
> on a business trip today. I will get back to you. Unfortunately, I failed to
> quickly set up an easy classpath without Ivy downloading the JARs.
>
> Uwe
>
> -----
> Uwe Schindler
> uschindler@apache.org
> Apache Lucene PMC Member / Committer
> Bremen, Germany
> http://lucene.apache.org/
>
> *From:* John Cuthbertson [mailto:john.cuthbertson@oracle.com]
> *Sent:* Thursday, March 07, 2013 12:49 AM
> *To:* Uwe Schindler
> *Cc:* 'Bengt Rutisson'; hotspot-gc-dev@openjdk.java.net;
> dev@lucene.apache.org
> *Subject:* Re: JVM hanging when using G1GC on JDK8 b78 or b79 (Linux 32
> bit)
>
> Hi Uwe,
>
> An update:
>
> I have downloaded ant and the Lucene source.
>
> I attempted the ivy-bootstrap, but it failed to download the ivy-2.3.0.jar
> file, even after setting:
>
> ANT_OPTS=-Dhttp.proxyHost=<...> -Dhttp.proxyPort=<...>
>
> So I manually downloaded it and placed it into the ANT library directory, and now I get:
>
> ivy-bootstrap1:
>     [mkdir] Skipping /home/jcuthber/.ant/lib because it already exists.
>      [echo] installing ivy 2.3.0 to /home/jcuthber/.ant/lib
>       [get] Getting:
> http://repo1.maven.org/maven2/org/apache/ivy/ivy/2.3.0/ivy-2.3.0.jar
>       [get] To: /home/jcuthber/.ant/lib/ivy-2.3.0.jar
>       [get] Error getting
> http://repo1.maven.org/maven2/org/apache/ivy/ivy/2.3.0/ivy-2.3.0.jar to
> /home/jcuthber/.ant/lib/ivy-2.3.0.jar
> [available] Found: /home/jcuthber/.ant/lib/ivy-2.3.0.jar
>
> ivy-bootstrap2:
> Skipped because property 'ivy.bootstrap1.success' set.
>
> ivy-checksum:
>
> ivy-bootstrap:
>
> BUILD SUCCESSFUL
> Total time: 3 minutes 46 seconds
>
> Presumably I have to build the Lucene source before executing the tests.
> That seemed to go OK.
>
> When I run the analysis/uima tests, it seems to get hung up at the
> "resolve" target, even without specifying G1:
>
> cairnapple{jcuthber}:408> cd analysis/uima/
> cairnapple{jcuthber}:409> ls -l
> total 29
> -rw-r--r--   1 jcuthber staff       1473 Dec 10 10:39 build.xml
> -rw-rw-r--   1 jcuthber staff       6895 Mar  6 15:20 hotspot.log
> -rw-r--r--   1 jcuthber staff       1316 Mar 30  2012 ivy.xml
> drwxr-xr-x   2 jcuthber staff          2 Mar  5 07:42 lib/
> drwxr-xr-x   6 jcuthber staff          6 Mar  5 07:42 src/
>
> ivy-configure:
> [ivy:configure] Loading
> jar:file:/home/jcuthber/.ant/lib/ivy-2.3.0.jar!/org/apache/ivy/core/settings/ivy.properties
> [ivy:configure] :: Apache Ivy 2.3.0 - 20130110142753 ::
> http://ant.apache.org/ivy/ ::
> [ivy:configure] jakarta commons httpclient not found: using jdk url
> handling
> [ivy:configure] :: loading settings :: file =
> /export/bugs/8009536/lucene-5.0-2013-03-05_15-37-06/ivy-settings.xml
> [ivy:configure] no default ivy user dir defined: set to
> /home/jcuthber/.ivy2
> [ivy:configure] including url:
> jar:file:/home/jcuthber/.ant/lib/ivy-2.3.0.jar!/org/apache/ivy/core/settings/ivysettings-public.xml
> [ivy:configure] no default cache defined: set to /home/jcuthber/.ivy2/cache
> [ivy:configure] including url:
> jar:file:/home/jcuthber/.ant/lib/ivy-2.3.0.jar!/org/apache/ivy/core/settings/ivysettings-shared.xml
> [ivy:configure] including url:
> jar:file:/home/jcuthber/.ant/lib/ivy-2.3.0.jar!/org/apache/ivy/core/settings/ivysettings-local.xml
> [ivy:configure] including url:
> jar:file:/home/jcuthber/.ant/lib/ivy-2.3.0.jar!/org/apache/ivy/core/settings/ivysettings-main-chain.xml
> [ivy:configure] settings loaded (289ms)
> [ivy:configure]         default cache: /home/jcuthber/.ivy2/cache
> [ivy:configure]         default resolver: default
> [ivy:configure]         -- 7 resolvers:
> [ivy:configure]         working-chinese-mirror [ibiblio]
> [ivy:configure]         main [chain] [shared, public]
> [ivy:configure]         local [file]
> [ivy:configure]         shared [file]
> [ivy:configure]         sonatype-releases [ibiblio]
> [ivy:configure]         public [ibiblio]
> [ivy:configure]         default [chain] [local, main, sonatype-releases,
> working-chinese-mirror]
>
> resolve:
> [ivy:retrieve] no resolved descriptor found: launching default resolve
> Overriding previous definition of property "ivy.version"
> [ivy:retrieve] using ivy parser to parse
> file:/export/bugs/8009536/lucene-5.0-2013-03-05_15-37-06/analysis/uima/ivy.xml
> [ivy:retrieve] :: resolving dependencies ::
> org.apache.lucene#analyzers-uima;working@cairnapple
> [ivy:retrieve]  confs: [default]
> [ivy:retrieve]  validate = true
> [ivy:retrieve]  refresh = false
> [ivy:retrieve] resolving dependencies for configuration 'default'
> [ivy:retrieve] == resolving dependencies for
> org.apache.lucene#analyzers-uima;working@cairnapple [default]
> [ivy:retrieve] == resolving dependencies
> org.apache.lucene#analyzers-uima;working@cairnapple->org.apache.uima#Tagger;2.3.1
> [default->*]
> [ivy:retrieve] default: Checking cache for: dependency:
> org.apache.uima#Tagger;2.3.1 {*=[*]}
> [ivy:retrieve] don't use cache for org.apache.uima#Tagger;2.3.1:
> checkModified=true
> [ivy:retrieve]          tried
> /home/jcuthber/.ivy2/local/org.apache.uima/Tagger/2.3.1/ivys/ivy.xml
> [ivy:retrieve]          tried
> /home/jcuthber/.ivy2/local/org.apache.uima/Tagger/2.3.1/jars/Tagger.jar
> [ivy:retrieve]  local: no ivy file nor artifact found for
> org.apache.uima#Tagger;2.3.1
> [ivy:retrieve] main: Checking cache for: dependency:
> org.apache.uima#Tagger;2.3.1 {*=[*]}
> [ivy:retrieve]          tried
> /home/jcuthber/.ivy2/shared/org.apache.uima/Tagger/2.3.1/ivys/ivy.xml
> [ivy:retrieve]          tried
> /home/jcuthber/.ivy2/shared/org.apache.uima/Tagger/2.3.1/jars/Tagger.jar
> [ivy:retrieve]  shared: no ivy file nor artifact found for
> org.apache.uima#Tagger;2.3.1
> [ivy:retrieve]          tried
> http://repo1.maven.org/maven2/org/apache/uima/Tagger/2.3.1/Tagger-2.3.1.pom
>
> and there it hangs - presumably trying to access
> http://repo1.maven.org/maven2/org/apache/uima/Tagger/2.3.1/Tagger-2.3.1.pom
>
> There must be something in our proxy settings that won't allow this.
>
> JohnC
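The Ivy log above shows the two lookups that precede the hang: the "local" resolver path under ~/.ivy2/local/ and the Maven Central POM URL. It also prints "jakarta commons httpclient not found: using jdk url handling", so Ivy was using the JDK's own URL classes, which honor the standard `http.proxyHost`/`http.proxyPort` system properties passed via `ANT_OPTS`, and whose `URLConnection` has no connect or read timeout unless one is set. Whether a missing timeout explains the hang at the "resolve" step is only a guess. The hypothetical sketch below (`ArtifactLookup` and its methods are illustrative names; the local path pattern is inferred from the single log line, while real Ivy patterns are configurable) rebuilds both locations and shows a fetch through an explicit proxy that fails fast instead of blocking.

```java
import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.InetSocketAddress;
import java.net.Proxy;
import java.net.URL;

/** Hypothetical helper mirroring the lookups visible in the Ivy log. */
public class ArtifactLookup {
    static final String CENTRAL = "http://repo1.maven.org/maven2/";

    /** Maven 2 layout: groupId dots become slashes, then module/rev/module-rev.pom. */
    static String centralPomUrl(String org, String module, String rev) {
        return CENTRAL + org.replace('.', '/') + "/" + module + "/" + rev + "/"
                + module + "-" + rev + ".pom";
    }

    /** Local-resolver jar path as it appears in the log (pattern inferred, not Ivy's API). */
    static String ivyLocalJarPath(String userHome, String org, String module, String rev) {
        return userHome + "/.ivy2/local/" + org + "/" + module + "/" + rev
                + "/jars/" + module + ".jar";
    }

    /** Requests the URL through an optional HTTP proxy, giving up after timeoutMillis. */
    static int fetchStatus(String url, String proxyHost, int proxyPort, int timeoutMillis)
            throws IOException {
        Proxy proxy = proxyHost == null ? Proxy.NO_PROXY
                : new Proxy(Proxy.Type.HTTP, new InetSocketAddress(proxyHost, proxyPort));
        HttpURLConnection conn = (HttpURLConnection) new URL(url).openConnection(proxy);
        conn.setConnectTimeout(timeoutMillis); // default 0 means wait forever
        conn.setReadTimeout(timeoutMillis);
        try {
            return conn.getResponseCode();
        } finally {
            conn.disconnect();
        }
    }

    public static void main(String[] args) {
        System.out.println(centralPomUrl("org.apache.uima", "Tagger", "2.3.1"));
        System.out.println(ivyLocalJarPath("/home/jcuthber", "org.apache.uima", "Tagger", "2.3.1"));
    }
}
```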

RE: JVM hanging when using G1GC on JDK8 b78 or b79 (Linux 32 bit)

Posted by Uwe Schindler <us...@apache.org>.
Hi John,

 

I got back from my trip (CeBIT fair in Hannover) a few minutes ago. Dawid Weiss has already sent a solution; I hope that helps. My idea was to send you my own ~/.ivy2/ cache folder (which contains all the downloaded Maven Central artifacts), so that ANT/Ivy does not need to download them. But I think you can first try Dawid Weiss’ solution (he packed his complete Lucene folder). Bengt also gave some proxy settings to make it work with Oracle’s proxies, and he was able to reproduce the hang on Windows. So I don’t think I need to send another variant for reproducing it.

 

You can send the binary/patch to Bengt, who may be able to try it out!

 

Uwe

 

-----

Uwe Schindler

uschindler@apache.org 

Apache Lucene PMC Member / Committer

Bremen, Germany

http://lucene.apache.org/

 

From: John Cuthbertson [mailto:john.cuthbertson@oracle.com] 
Sent: Thursday, March 07, 2013 6:21 PM
To: Uwe Schindler
Cc: 'Bengt Rutisson'; hotspot-gc-dev@openjdk.java.net; dev@lucene.apache.org
Subject: Re: JVM hanging when using G1GC on JDK8 b78 or b79 (Linux 32 bit)

 

Hi Uwe,

Thanks. In the meantime, I'm going to ask within Oracle whether anyone has the magic formula for the proxy settings.

There might also be another test case I can try. IIRC, we had another application that overflowed during reference processing (when I specified a mark stack size of 16K). I'm going to try that.
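Marking collectors keep a stack of objects whose references still need scanning, and a fixed-size stack (such as the 16K one mentioned above, presumably configured through HotSpot's -XX:MarkStackSize option) needs a recovery path when it fills up. The sketch below is a generic, single-threaded model of one textbook recovery scheme: when the stack is full, the object is still marked but its scan is deferred, an overflow flag is set, and later passes rescan all marked objects for unmarked children. It is not G1's implementation, models neither reference processing nor parallel workers, and `BoundedMarkStack` is a hypothetical name. A capacity of 1 forces the overflow path.

```java
import java.util.Arrays;
import java.util.List;

/** Generic model of marking with a fixed-size mark stack and overflow recovery. */
public class BoundedMarkStack {
    private final List<int[]> graph; // graph.get(n) = objects referenced by object n
    private final boolean[] marked;
    private final int[] stack;
    private int top = 0;
    private boolean overflowed = false;

    BoundedMarkStack(List<int[]> graph, int capacity) {
        if (capacity < 1) {
            throw new IllegalArgumentException("capacity must be >= 1");
        }
        this.graph = graph;
        this.marked = new boolean[graph.size()];
        this.stack = new int[capacity];
    }

    /** Marks n and queues it for scanning; if the stack is full, records an overflow. */
    private void markAndPush(int n) {
        marked[n] = true;
        if (top < stack.length) {
            stack[top++] = n;
        } else {
            overflowed = true; // n is marked, but its children have not been scanned
        }
    }

    /** Scans queued objects until the stack is empty. */
    private void drain() {
        while (top > 0) {
            int n = stack[--top];
            for (int child : graph.get(n)) {
                if (!marked[child]) {
                    markAndPush(child);
                }
            }
        }
    }

    private boolean[] markFrom(int root) {
        markAndPush(root);
        drain();
        // Recovery: rescan every marked object for unmarked children. Each pass
        // either marks at least one new object or finishes without overflow.
        while (overflowed) {
            overflowed = false;
            for (int n = 0; n < marked.length; n++) {
                if (marked[n]) {
                    for (int child : graph.get(n)) {
                        if (!marked[child]) {
                            markAndPush(child);
                        }
                    }
                    drain();
                }
            }
        }
        return marked;
    }

    static boolean[] markReachable(List<int[]> graph, int root, int capacity) {
        return new BoundedMarkStack(graph, capacity).markFrom(root);
    }

    public static void main(String[] args) {
        // Object 6 references the root but is itself unreachable.
        List<int[]> heap = Arrays.asList(new int[] {1, 2, 3}, new int[] {4}, new int[] {5},
                new int[] {}, new int[] {}, new int[] {}, new int[] {0});
        System.out.println(Arrays.toString(markReachable(heap, 0, 1)));
        // prints [true, true, true, true, true, true, false]
    }
}
```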

I'm fairly certain I have the fix; I just want to verify it. I know you offered, but I don't think we can send out binaries under the table, though we can provide patches.

JohnC

On 3/6/2013 10:44 PM, Uwe Schindler wrote:

Hi John,

 

I only have time to work on a setup this evening, German time, because I am on a business trip today. I will get back to you. Unfortunately, I failed to quickly set up an easy classpath without Ivy downloading the JARs.

 

Uwe

 

-----

Uwe Schindler

uschindler@apache.org 

Apache Lucene PMC Member / Committer

Bremen, Germany

http://lucene.apache.org/

 

From: John Cuthbertson [mailto:john.cuthbertson@oracle.com] 
Sent: Thursday, March 07, 2013 12:49 AM
To: Uwe Schindler
Cc: 'Bengt Rutisson'; hotspot-gc-dev@openjdk.java.net; dev@lucene.apache.org
Subject: Re: JVM hanging when using G1GC on JDK8 b78 or b79 (Linux 32 bit)

 

Hi Uwe,

An update:

I have downloaded ant and the Lucene source.

I attempted the ivy-bootstrap, but it failed to download the ivy-2.3.0.jar file, even after setting:

ANT_OPTS=-Dhttp.proxyHost=<...> -Dhttp.proxyPort=<...>

So I manually downloaded it and placed it into the ANT library directory, and now I get:





ivy-bootstrap1:
    [mkdir] Skipping /home/jcuthber/.ant/lib because it already exists.
     [echo] installing ivy 2.3.0 to /home/jcuthber/.ant/lib
      [get] Getting: http://repo1.maven.org/maven2/org/apache/ivy/ivy/2.3.0/ivy-2.3.0.jar
      [get] To: /home/jcuthber/.ant/lib/ivy-2.3.0.jar
      [get] Error getting http://repo1.maven.org/maven2/org/apache/ivy/ivy/2.3.0/ivy-2.3.0.jar to /home/jcuthber/.ant/lib/ivy-2.3.0.jar
[available] Found: /home/jcuthber/.ant/lib/ivy-2.3.0.jar

ivy-bootstrap2:
Skipped because property 'ivy.bootstrap1.success' set.

ivy-checksum:

ivy-bootstrap:

BUILD SUCCESSFUL
Total time: 3 minutes 46 seconds

Presumably I have to build the Lucene source before executing the tests. That seemed to go OK.

When I run the analysis/uima tests, it seems to get hung up at the "resolve" target, even without specifying G1:





cairnapple{jcuthber}:408> cd analysis/uima/
cairnapple{jcuthber}:409> ls -l
total 29
-rw-r--r--   1 jcuthber staff       1473 Dec 10 10:39 build.xml
-rw-rw-r--   1 jcuthber staff       6895 Mar  6 15:20 hotspot.log
-rw-r--r--   1 jcuthber staff       1316 Mar 30  2012 ivy.xml
drwxr-xr-x   2 jcuthber staff          2 Mar  5 07:42 lib/
drwxr-xr-x   6 jcuthber staff          6 Mar  5 07:42 src/






ivy-configure:
[ivy:configure] Loading jar:file:/home/jcuthber/.ant/lib/ivy-2.3.0.jar!/org/apache/ivy/core/settings/ivy.properties
[ivy:configure] :: Apache Ivy 2.3.0 - 20130110142753 :: http://ant.apache.org/ivy/ ::
[ivy:configure] jakarta commons httpclient not found: using jdk url handling
[ivy:configure] :: loading settings :: file = /export/bugs/8009536/lucene-5.0-2013-03-05_15-37-06/ivy-settings.xml
[ivy:configure] no default ivy user dir defined: set to /home/jcuthber/.ivy2
[ivy:configure] including url: jar:file:/home/jcuthber/.ant/lib/ivy-2.3.0.jar!/org/apache/ivy/core/settings/ivysettings-public.xml
[ivy:configure] no default cache defined: set to /home/jcuthber/.ivy2/cache
[ivy:configure] including url: jar:file:/home/jcuthber/.ant/lib/ivy-2.3.0.jar!/org/apache/ivy/core/settings/ivysettings-shared.xml
[ivy:configure] including url: jar:file:/home/jcuthber/.ant/lib/ivy-2.3.0.jar!/org/apache/ivy/core/settings/ivysettings-local.xml
[ivy:configure] including url: jar:file:/home/jcuthber/.ant/lib/ivy-2.3.0.jar!/org/apache/ivy/core/settings/ivysettings-main-chain.xml
[ivy:configure] settings loaded (289ms)
[ivy:configure]         default cache: /home/jcuthber/.ivy2/cache
[ivy:configure]         default resolver: default
[ivy:configure]         -- 7 resolvers:
[ivy:configure]         working-chinese-mirror [ibiblio]
[ivy:configure]         main [chain] [shared, public]
[ivy:configure]         local [file]
[ivy:configure]         shared [file]
[ivy:configure]         sonatype-releases [ibiblio]
[ivy:configure]         public [ibiblio]
[ivy:configure]         default [chain] [local, main, sonatype-releases, working-chinese-mirror]

resolve:
[ivy:retrieve] no resolved descriptor found: launching default resolve
Overriding previous definition of property "ivy.version"
[ivy:retrieve] using ivy parser to parse file:/export/bugs/8009536/lucene-5.0-2013-03-05_15-37-06/analysis/uima/ivy.xml
[ivy:retrieve] :: resolving dependencies :: org.apache.lucene#analyzers-uima;working@cairnapple
[ivy:retrieve]  confs: [default]
[ivy:retrieve]  validate = true
[ivy:retrieve]  refresh = false
[ivy:retrieve] resolving dependencies for configuration 'default'
[ivy:retrieve] == resolving dependencies for org.apache.lucene#analyzers-uima;working@cairnapple [default]
[ivy:retrieve] == resolving dependencies org.apache.lucene#analyzers-uima;working@cairnapple->org.apache.uima#Tagger;2.3.1 [default->*]
[ivy:retrieve] default: Checking cache for: dependency: org.apache.uima#Tagger;2.3.1 {*=[*]}
[ivy:retrieve] don't use cache for org.apache.uima#Tagger;2.3.1: checkModified=true
[ivy:retrieve]          tried /home/jcuthber/.ivy2/local/org.apache.uima/Tagger/2.3.1/ivys/ivy.xml
[ivy:retrieve]          tried /home/jcuthber/.ivy2/local/org.apache.uima/Tagger/2.3.1/jars/Tagger.jar
[ivy:retrieve]  local: no ivy file nor artifact found for org.apache.uima#Tagger;2.3.1
[ivy:retrieve] main: Checking cache for: dependency: org.apache.uima#Tagger;2.3.1 {*=[*]}
[ivy:retrieve]          tried /home/jcuthber/.ivy2/shared/org.apache.uima/Tagger/2.3.1/ivys/ivy.xml
[ivy:retrieve]          tried /home/jcuthber/.ivy2/shared/org.apache.uima/Tagger/2.3.1/jars/Tagger.jar
[ivy:retrieve]  shared: no ivy file nor artifact found for org.apache.uima#Tagger;2.3.1
[ivy:retrieve]          tried http://repo1.maven.org/maven2/org/apache/uima/Tagger/2.3.1/Tagger-2.3.1.pom

and there it hangs - presumably trying to access http://repo1.maven.org/maven2/org/apache/uima/Tagger/2.3.1/Tagger-2.3.1.pom

There must be something with our proxy settings that won't allow this.

JohnC


On 03/06/13 11:15, Uwe Schindler wrote: 

Hi,
 
That's unfortunately not so easy, because of project dependencies. To run the test you have to compile Lucene Core, then the specific module + the test framework (which is specific to Lucene), and download some JARs from Maven Central (JAR hell, as usual).
If you give me some time, I would collect all needed JAR files from my local checkout and provide you with the correct cmd line + a ZIP file, maybe with a shell script to start it up. It should be doable, but it needs some work to collect all dependencies for the classpath.
 
If you want to do it quicker (should be quite fast to do):
- Download the ANT 1.8.2 binary distribution (unfortunately ANT 1.8.4 has a bug that keeps it from working out of the box with Java 8): http://archive.apache.org/dist/ant/binaries/apache-ant-1.8.2-bin.tar.gz - I just wonder: isn't ANT needed to build the JDK class library itself? I remember that the FreeBSD OpenJDK build downloads ANT and does a large part of the compilation using ANT...
- put the ANT bin/ dir into your PATH
- download the Apache Lucene source code from Jenkins: https://builds.apache.org/job/Lucene-Artifacts-trunk/2212/artifact/lucene/dist/lucene-5.0-2013-03-05_15-37-06-src.tgz
- go to the extracted Lucene source dir and call "ant ivy-bootstrap" (this downloads Apache Ivy, so all dependencies can be downloaded from Maven Central)
- change to the module that fails: # cd analysis/uima
- execute: # ant -Dargs="-server -XX:+UseG1GC" -Dtests.multiplier=3 -Dtests.jvms=1 test
- In a parallel console you might be able to attach to the process. The build in the main console runs inside ANT, and the test framework spawns separate worker JVM instances to execute the tests. This makes it hard to reproduce standalone (the command line passed to the child JVM is veeeeery long).
 
I will work on putting together a precompiled ZIP file with all needed JARs + the command line. Just tell me if you manage it with the above how-to; then I don't need to do this.
Uwe
 
-----
Uwe Schindler
uschindler@apache.org 
Apache Lucene PMC Member / Committer
Bremen, Germany
http://lucene.apache.org/
 
 
  

-----Original Message-----
From: John Cuthbertson [mailto:john.cuthbertson@oracle.com]
Sent: Wednesday, March 06, 2013 7:51 PM
To: Uwe Schindler
Cc: 'Bengt Rutisson'; hotspot-gc-dev@openjdk.java.net;
dev@lucene.apache.org
Subject: Re: JVM hanging when using G1GC on JDK8 b78 or b79 (Linux 32 bit)
 
Hi Uwe,
 
I've downloaded lucene-5.0-2013-03-05_15-37-06.zip from
https://builds.apache.org/job/Lucene-Artifacts-trunk/2212/artifact/lucene/dist/
 
I don't have ant on my workstation, so do you have a java command line to
run the test(s) that generate the error?
 
Thanks,
 
JohnC
 
On 3/6/2013 3:16 AM, Uwe Schindler wrote:
    

Hi,
 
      

I think this is a VM bug and the thread dumps that Uwe produced are
enough to start tracking down the root cause.
        

I hope it is enough! If I can help with more details, tell me what I should do
to track this down. Unfortunately, we have no isolated test case (like a small
Java class that triggers this bug); you have to run the test cases of this
Lucene module. It only happens there, not in any other Lucene test suite. It
may be caused by a lot of GC activity in this "UIMA" module or by a specific test.
    

On 3/6/13 8:52 AM, David Holmes wrote:
        

If the VM is completely unresponsive then it suggests we are at a
safepoint.
          

Yes, we are hanging during a stop-the-world GC, so we are at a safepoint.
 
        

The GC threads are not "hung" in os::park, they are parked - waiting
to be notified of something.
          

It looks like the reference processing thread is stuck in a loop
where it does wait(). So, the VM is hanging even if that stack trace
also ends up in os::park().
 
        

The thing is to find out why they are not being woken up.
          

Actually, in this case we should probably not even be calling wait...
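The distinction drawn here between a thread that is "hung" and one that is parked (blocked until someone notifies it) can be sketched with a small C++ toy. This is an assumption-laden illustration, not HotSpot's os::park()/Parker code: ParkEvent, park_for and demo are hypothetical names, and the timeout exists only so the example terminates. A real park with no notifier simply waits forever, which is what the parked G1 threads in the gdb dump look like.

```cpp
#include <chrono>
#include <condition_variable>
#include <mutex>
#include <thread>

// Hypothetical park/unpark event (illustration only, not HotSpot's Parker).
// A parked thread blocks until another thread grants it a permit.
class ParkEvent {
 public:
  // Blocks until unpark() grants a permit or the timeout expires.
  // Returns true if woken by unpark(), false on timeout.
  bool park_for(std::chrono::milliseconds timeout) {
    std::unique_lock<std::mutex> lock(mu_);
    // The predicate guards against spurious wake-ups and against an
    // unpark() that happened before we started waiting.
    bool woken = cv_.wait_for(lock, timeout, [this] { return permit_; });
    permit_ = false;  // consume the permit, if any
    return woken;
  }

  // Grants the permit and wakes one parked thread.
  void unpark() {
    {
      std::lock_guard<std::mutex> lock(mu_);
      permit_ = true;
    }
    cv_.notify_one();
  }

 private:
  std::mutex mu_;
  std::condition_variable cv_;
  bool permit_ = false;
};

// Parks the calling thread; if notifier_runs is true, a second thread unparks it.
// Without a notifier the caller only returns because of the demo timeout.
bool demo(bool notifier_runs, std::chrono::milliseconds timeout) {
  ParkEvent ev;
  std::thread notifier;
  if (notifier_runs) {
    notifier = std::thread([&ev] { ev.unpark(); });
  }
  bool woken = ev.park_for(timeout);
  if (notifier.joinable()) {
    notifier.join();
  }
  return woken;
}
```

Without a notifier, demo() returns false only because of the timeout; remove the timeout and the caller stays parked, exactly like a worker waiting for a wake-up that never comes.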
 
        

Can the gdb log be posted somewhere? I don't know if the attachment
made it to the original posting on hotspot-gc but it's no longer
available on hotspot-dev.
          

I received the attachment with the original email. I've attached it
to the bug report that I created: 8009536. You can find it there if
you want to. But I think we have a fairly good idea of what change
caused the hang.
        

If it helps: Unfortunately, we had some problems with recent JDK builds,
because the javac and javadoc tools were not working correctly and failed to build
our source code. This has been fixed since b78. Until then, we used build
b65 (the last one that worked), and the G1GC hangs did not appear with
this version. So the hang must have been introduced by a change between b65 and b78.
    

Uwe
 
      

Bengt
 
        

Thanks,
David
 
On 6/03/2013 4:07 PM, Krystal Mok wrote:
          

Hi Uwe,
 
If you can attach gdb to it, then jstack -m and jstack -F should
also work; that'll get you the Java stack trace.
(But it probably doesn't matter in this case, because the hang is
probably a bug in the VM.)
 
- Kris
 
On Wed, Mar 6, 2013 at 5:48 AM, Uwe Schindler <us...@apache.org>
wrote:
            

Hi,
 
For a few months we have been extensively testing various preview
builds of JDK 8 for compatibility with Apache Lucene and Solr, so
we can find bugs early and prevent the problems we had with
the release of Java 7 two years ago. Currently we have a Linux
(Ubuntu 64bit) Jenkins machine with various JDKs installed (JDK 6, JDK
7, JDK 8 snapshot, IBM J9, older JRockit); it chooses a
different one with different HotSpot and garbage collector
settings on every run of the test suite (which takes approx. 30-45
minutes).
    

JDK 8 b79 works very well on Linux so far. We found some strange
behavior in earlier builds (maybe compiler errors), but none
at the moment. There is one configuration that consistently and
reproducibly hangs in one of the tested modules: the configuration
uses JDK 8 b79 (same for b78), 32 bit, and G1GC (server or client
does not matter). The JVM running the tests hangs and becomes unresponsive
(jstack or kill -3 have no effect/cannot connect, a standard kill
does not stop it, only kill -9 actually kills it). It reproduces
100% of the time in this Lucene module (it always hangs).
 
I was able to connect to the JVM with GDB and get a stack trace of
all threads (see attachment, dump.txt). As you can see, all G1GC threads
seem to hang in a syscall (os::park(), a conditional wait in the
pthread library). Unfortunately, that's all I can give you. A Java
stack trace is not possible because the JVM responds to neither kill -3
nor jstack. With all other garbage collectors the test passes
without hangs in a few seconds; with 32 bit G1GC it can stand
still for hours. The 64 bit JVM passes with G1GC, so only the 32
bit variant is affected. Client or Server VM makes no difference.
 
To reproduce:
- Use a 32 bit JDK 8 b78 or b79 (tested on Linux 64 bit, but this
should not matter)
- Download the Lucene source code (e.g. the snapshot version we were
testing with:
https://builds.apache.org/job/Lucene-Artifacts-trunk/2212/artifact/lucene/dist/)
- change to directory lucene/analysis/uima and run:
          ant -Dargs="-server -XX:+UseG1GC" -Dtests.multiplier=3 -Dtests.jvms=1 test
After a while the test framework prints "stalled" messages
(because the child VM actually running the test no longer
responds). The PID is also printed. Try to get a stack trace or kill it: no
response.
Only kill -9 helps. Choosing another garbage collector in the
above command line makes the test finish after a few seconds, e.g.
-Dargs="-server -XX:+UseConcMarkSweepGC"
 
I posted this bug report directly to the mailing list because,
with earlier bug reports, there seems to be a problem with
bugs.sun.com: there was no response from any reviewer for
several weeks, while on the mailing lists we were able to help find and fix javadoc and
javac compiler bugs early. So I hope you can help with this bug, too.
 
Uwe
 
-----
Uwe Schindler
uschindler@apache.org
Apache Lucene PMC Member / Committer Bremen, Germany
http://lucene.apache.org/
 
 
              

 
  

 

 


Re: JVM hanging when using G1GC on JDK8 b78 or b79 (Linux 32 bit)

Posted by John Cuthbertson <jo...@oracle.com>.
Hi Uwe,

Thanks. In the meantime I'm going to ask within Oracle if anyone has the 
magic formula for the proxy settings.

There might also be another test case I can try. IIRC we had another 
application that overflowed during reference processing (if I specified 
a mark stack size of 16K). I'm going to try that.

I'm fairly certain I have the fix - just want to verify it. I know you 
offered, but I don't think we can send out under-the-table binaries, 
though we can provide patches.

JohnC

On 3/6/2013 10:44 PM, Uwe Schindler wrote:
>
> Hi John,
>
> I only have time to work on a setup this evening, German time, because 
> I am on a business trip today. I will come back to you. Unfortunately, I 
> failed to quickly set up an easy classpath without Ivy downloading the 
> JARs.
>
> Uwe
>
> -----
>
> Uwe Schindler
>
> uschindler@apache.org
>
> Apache Lucene PMC Member / Committer
>
> Bremen, Germany
>
> http://lucene.apache.org/
>
> *From:*John Cuthbertson [mailto:john.cuthbertson@oracle.com]
> *Sent:* Thursday, March 07, 2013 12:49 AM
> *To:* Uwe Schindler
> *Cc:* 'Bengt Rutisson'; hotspot-gc-dev@openjdk.java.net; 
> dev@lucene.apache.org
> *Subject:* Re: JVM hanging when using G1GC on JDK8 b78 or b79 (Linux 
> 32 bit)
>
> Hi Uwe,
>
> An update:
>
> I have downloaded ant and the Lucene source.
>
> I attempted the ivy-bootstrap but it failed to download the 
> ivy-2.3.0.jar file - even after setting:
>
> ANT_OPTS=-Dhttp.proxyHost=<...> -Dhttp.proxyPort=<...>
>
> So I manually downloaded and placed it into the ANT library and now get:
>
>
> ivy-bootstrap1:
>     [mkdir] Skipping /home/jcuthber/.ant/lib because it already exists.
>      [echo] installing ivy 2.3.0 to /home/jcuthber/.ant/lib
>       [get] Getting: 
> http://repo1.maven.org/maven2/org/apache/ivy/ivy/2.3.0/ivy-2.3.0.jar
>       [get] To: /home/jcuthber/.ant/lib/ivy-2.3.0.jar
>       [get] Error getting 
> http://repo1.maven.org/maven2/org/apache/ivy/ivy/2.3.0/ivy-2.3.0.jar 
> to /home/jcuthber/.ant/lib/ivy-2.3.0.jar
> [available] Found: /home/jcuthber/.ant/lib/ivy-2.3.0.jar
>
> ivy-bootstrap2:
> Skipped because property 'ivy.bootstrap1.success' set.
>
> ivy-checksum:
>
> ivy-bootstrap:
>
> BUILD SUCCESSFUL
> Total time: 3 minutes 46 seconds
>
> Presumably I have to build the Lucene source before executing the 
> tests. That seemed to go OK.
>
> When I run the analysis/uima tests it seems to get hung up at the 
> "resolve" target - even without specifying G1:
>
>
> cairnapple{jcuthber}:408> cd analysis/uima/
> cairnapple{jcuthber}:409> ls -l
> total 29
> -rw-r--r--   1 jcuthber staff       1473 Dec 10 10:39 build.xml
> -rw-rw-r--   1 jcuthber staff       6895 Mar  6 15:20 hotspot.log
> -rw-r--r--   1 jcuthber staff       1316 Mar 30  2012 ivy.xml
> drwxr-xr-x   2 jcuthber staff          2 Mar  5 07:42 lib/
> drwxr-xr-x   6 jcuthber staff          6 Mar  5 07:42 src/
>
>
>
> ivy-configure:
> [ivy:configure] Loading 
> jar:file:/home/jcuthber/.ant/lib/ivy-2.3.0.jar!/org/apache/ivy/core/settings/ivy.properties 
> [ivy:configure] :: Apache Ivy 2.3.0 - 20130110142753 :: 
> http://ant.apache.org/ivy/ ::
> [ivy:configure] jakarta commons httpclient not found: using jdk url 
> handling
> [ivy:configure] :: loading settings :: file = 
> /export/bugs/8009536/lucene-5.0-2013-03-05_15-37-06/ivy-settings.xml
> [ivy:configure] no default ivy user dir defined: set to 
> /home/jcuthber/.ivy2
> [ivy:configure] including url: 
> jar:file:/home/jcuthber/.ant/lib/ivy-2.3.0.jar!/org/apache/ivy/core/settings/ivysettings-public.xml 
> [ivy:configure] no default cache defined: set to 
> /home/jcuthber/.ivy2/cache
> [ivy:configure] including url: 
> jar:file:/home/jcuthber/.ant/lib/ivy-2.3.0.jar!/org/apache/ivy/core/settings/ivysettings-shared.xml 
> [ivy:configure] including url: 
> jar:file:/home/jcuthber/.ant/lib/ivy-2.3.0.jar!/org/apache/ivy/core/settings/ivysettings-local.xml 
> [ivy:configure] including url: 
> jar:file:/home/jcuthber/.ant/lib/ivy-2.3.0.jar!/org/apache/ivy/core/settings/ivysettings-main-chain.xml 
> [ivy:configure] settings loaded (289ms)
> [ivy:configure]         default cache: /home/jcuthber/.ivy2/cache
> [ivy:configure]         default resolver: default
> [ivy:configure]         -- 7 resolvers:
> [ivy:configure]         working-chinese-mirror [ibiblio]
> [ivy:configure]         main [chain] [shared, public]
> [ivy:configure]         local [file]
> [ivy:configure]         shared [file]
> [ivy:configure]         sonatype-releases [ibiblio]
> [ivy:configure]         public [ibiblio]
> [ivy:configure]         default [chain] [local, main, 
> sonatype-releases, working-chinese-mirror]
>
> resolve:
> [ivy:retrieve] no resolved descriptor found: launching default resolve
> Overriding previous definition of property "ivy.version"
> [ivy:retrieve] using ivy parser to parse 
> file:/export/bugs/8009536/lucene-5.0-2013-03-05_15-37-06/analysis/uima/ivy.xml 
> [ivy:retrieve] :: resolving dependencies :: 
> org.apache.lucene#analyzers-uima;working@cairnapple
> [ivy:retrieve]  confs: [default]
> [ivy:retrieve]  validate = true
> [ivy:retrieve]  refresh = false
> [ivy:retrieve] resolving dependencies for configuration 'default'
> [ivy:retrieve] == resolving dependencies for 
> org.apache.lucene#analyzers-uima;working@cairnapple [default]
> [ivy:retrieve] == resolving dependencies 
> org.apache.lucene#analyzers-uima;working@cairnapple->org.apache.uima#Tagger;2.3.1 
> [default->*]
> [ivy:retrieve] default: Checking cache for: dependency: 
> org.apache.uima#Tagger;2.3.1 {*=[*]}
> [ivy:retrieve] don't use cache for org.apache.uima#Tagger;2.3.1: 
> checkModified=true
> [ivy:retrieve]          tried 
> /home/jcuthber/.ivy2/local/org.apache.uima/Tagger/2.3.1/ivys/ivy.xml
> [ivy:retrieve]          tried 
> /home/jcuthber/.ivy2/local/org.apache.uima/Tagger/2.3.1/jars/Tagger.jar
> [ivy:retrieve]  local: no ivy file nor artifact found for 
> org.apache.uima#Tagger;2.3.1
> [ivy:retrieve] main: Checking cache for: dependency: 
> org.apache.uima#Tagger;2.3.1 {*=[*]}
> [ivy:retrieve]          tried 
> /home/jcuthber/.ivy2/shared/org.apache.uima/Tagger/2.3.1/ivys/ivy.xml
> [ivy:retrieve]          tried 
> /home/jcuthber/.ivy2/shared/org.apache.uima/Tagger/2.3.1/jars/Tagger.jar
> [ivy:retrieve]  shared: no ivy file nor artifact found for 
> org.apache.uima#Tagger;2.3.1
> [ivy:retrieve]          tried 
> http://repo1.maven.org/maven2/org/apache/uima/Tagger/2.3.1/Tagger-2.3.1.pom
>
> and there it hangs - presumably trying to access 
> http://repo1.maven.org/maven2/org/apache/uima/Tagger/2.3.1/Tagger-2.3.1.pom
>
> There must be something with our proxy settings that won't allow 
> this.
>
> JohnC
>
>


Re: JVM hanging when using G1GC on JDK8 b78 or b79 (Linux 32 bit)

Posted by Erick Erickson <er...@gmail.com>.
Awesome work all!


On Thu, Mar 7, 2013 at 7:24 AM, Bengt Rutisson <be...@oracle.com> wrote:

>
> John and Uwe,
>
> I followed the original instructions sent out by Uwe to reproduce the test.
> I got it up and running on my Windows x64 workstation using a 32 bit
> binary. The test hangs every time I run it.
>
> John, I think your proxy issues are due to the fact that ant picks up its
> proxy settings from Java. So you need to set the system properties
> http.proxyHost and http.proxyPort. I did this by exporting the
> _JAVA_OPTIONS environment variable as:
>
> _JAVA_OPTIONS=-Dhttp.proxyHost=<oracle www proxy> -Dhttp.proxyPort=<oracle
> proxy port>
>
> Let me know if this does not work for you. We can try to debug it offline.
>
> Since I could catch the hang in a debugger I could confirm both that the
> hang is indeed related to the recent change to the
> DrainMarkingStackClosures and that the problem is that we enter the
> termination protocol even when reference processing is single threaded.
>
> Looking at the comment in the constructor for G1CMDrainMarkingStackClosure:
>
>     // We only allow stealing and only enter the termination protocol
>     // in CMTask::do_marking_step() if this closure is being instantiated
>     // for parallel reference processing.
>     _do_stealing = _do_termination = is_par;
>
> I came up with a patch that makes the test work again. But I leave it to
> you, John, to figure out if this is the right way to solve the problem.
>
> diff --git a/src/share/vm/gc_implementation/g1/concurrentMark.cpp
> b/src/share/vm/gc_implementation/g1/concurrentMark.cpp
> --- a/src/share/vm/gc_implementation/g1/concurrentMark.cpp
> +++ b/src/share/vm/gc_implementation/g1/concurrentMark.cpp
> @@ -4336,7 +4336,9 @@
>          gclog_or_tty->print_cr("[%u] detected overflow", _worker_id);
>        }
>
> +      if (do_stealing || do_termination) {
>        _cm->enter_first_sync_barrier(_worker_id);
> +      }
>        // When we exit this sync barrier we know that all tasks have
>        // stopped doing marking work. So, it's now safe to
>        // re-initialise our data structures. At the end of this method,
> @@ -4347,8 +4349,10 @@
>        // We clear the local state of this task...
>        clear_region_fields();
>
> +      if (do_stealing || do_termination) {
>        // ...and enter the second barrier.
>        _cm->enter_second_sync_barrier(_worker_id);
> +      }
>        // At this point everything has bee re-initialised and we're
>        // ready to restart.
>      }
>
>
> Thanks,
> Bengt
>
>
> On 3/7/13 7:44 AM, Uwe Schindler wrote:
>
>  Hi John,
>
> I only have time to work on a setup this evening German time, because I am
> on a business trip today. Will come back to you. Unfortunately I failed to
> quickly set up an easy classpath without Ivy downloading the JARs.
>
> Uwe
>
> -----
> Uwe Schindler
> uschindler@apache.org
> Apache Lucene PMC Member / Committer
> Bremen, Germany
> http://lucene.apache.org/
>
> *From:* John Cuthbertson [mailto:john.cuthbertson@oracle.com]
> *Sent:* Thursday, March 07, 2013 12:49 AM
> *To:* Uwe Schindler
> *Cc:* 'Bengt Rutisson'; hotspot-gc-dev@openjdk.java.net;
> dev@lucene.apache.org
> *Subject:* Re: JVM hanging when using G1GC on JDK8 b78 or b79 (Linux 32
> bit)
>
> Hi Uwe,
>
> An update:
>
> I have downloaded ant and the Lucene source.
>
> I attempted the ivy-bootstrap but it failed to download the ivy-2.3.0.jar
> file - even after setting:
>
> ANT_OPTS=-Dhttp.proxyHost=<...> -Dhttp.proxyPort=<...>
>
> So I manually downloaded and placed it into the ANT library and now get:
>
> ivy-bootstrap1:
>     [mkdir] Skipping /home/jcuthber/.ant/lib because it already exists.
>      [echo] installing ivy 2.3.0 to /home/jcuthber/.ant/lib
>       [get] Getting:
> http://repo1.maven.org/maven2/org/apache/ivy/ivy/2.3.0/ivy-2.3.0.jar
>       [get] To: /home/jcuthber/.ant/lib/ivy-2.3.0.jar
>       [get] Error getting
> http://repo1.maven.org/maven2/org/apache/ivy/ivy/2.3.0/ivy-2.3.0.jar to
> /home/jcuthber/.ant/lib/ivy-2.3.0.jar
> [available] Found: /home/jcuthber/.ant/lib/ivy-2.3.0.jar
>
> ivy-bootstrap2:
> Skipped because property 'ivy.bootstrap1.success' set.
>
> ivy-checksum:
>
> ivy-bootstrap:
>
> BUILD SUCCESSFUL
> Total time: 3 minutes 46 seconds
>
> Presumably I have to build the Lucene source before executing the tests.
> That seemed to go OK.
>
> When I run the analysis/uima tests it seems to get hung up at the
> "resolve" target - even without specifying G1:
>
> cairnapple{jcuthber}:408> cd analysis/uima/
> cairnapple{jcuthber}:409> ls -l
> total 29
> -rw-r--r--   1 jcuthber staff       1473 Dec 10 10:39 build.xml
> -rw-rw-r--   1 jcuthber staff       6895 Mar  6 15:20 hotspot.log
> -rw-r--r--   1 jcuthber staff       1316 Mar 30  2012 ivy.xml
> drwxr-xr-x   2 jcuthber staff          2 Mar  5 07:42 lib/
> drwxr-xr-x   6 jcuthber staff          6 Mar  5 07:42 src/
>
> ivy-configure:
> [ivy:configure] Loading
> jar:file:/home/jcuthber/.ant/lib/ivy-2.3.0.jar!/org/apache/ivy/core/settings/ivy.properties
> [ivy:configure] :: Apache Ivy 2.3.0 - 20130110142753 ::
> http://ant.apache.org/ivy/ ::
> [ivy:configure] jakarta commons httpclient not found: using jdk url
> handling
> [ivy:configure] :: loading settings :: file =
> /export/bugs/8009536/lucene-5.0-2013-03-05_15-37-06/ivy-settings.xml
> [ivy:configure] no default ivy user dir defined: set to
> /home/jcuthber/.ivy2
> [ivy:configure] including url:
> jar:file:/home/jcuthber/.ant/lib/ivy-2.3.0.jar!/org/apache/ivy/core/settings/ivysettings-public.xml
> [ivy:configure] no default cache defined: set to /home/jcuthber/.ivy2/cache
> [ivy:configure] including url:
> jar:file:/home/jcuthber/.ant/lib/ivy-2.3.0.jar!/org/apache/ivy/core/settings/ivysettings-shared.xml
> [ivy:configure] including url:
> jar:file:/home/jcuthber/.ant/lib/ivy-2.3.0.jar!/org/apache/ivy/core/settings/ivysettings-local.xml
> [ivy:configure] including url:
> jar:file:/home/jcuthber/.ant/lib/ivy-2.3.0.jar!/org/apache/ivy/core/settings/ivysettings-main-chain.xml
> [ivy:configure] settings loaded (289ms)
> [ivy:configure]         default cache: /home/jcuthber/.ivy2/cache
> [ivy:configure]         default resolver: default
> [ivy:configure]         -- 7 resolvers:
> [ivy:configure]         working-chinese-mirror [ibiblio]
> [ivy:configure]         main [chain] [shared, public]
> [ivy:configure]         local [file]
> [ivy:configure]         shared [file]
> [ivy:configure]         sonatype-releases [ibiblio]
> [ivy:configure]         public [ibiblio]
> [ivy:configure]         default [chain] [local, main, sonatype-releases,
> working-chinese-mirror]
>
> resolve:
> [ivy:retrieve] no resolved descriptor found: launching default resolve
> Overriding previous definition of property "ivy.version"
> [ivy:retrieve] using ivy parser to parse
> file:/export/bugs/8009536/lucene-5.0-2013-03-05_15-37-06/analysis/uima/ivy.xml
> [ivy:retrieve] :: resolving dependencies ::
> org.apache.lucene#analyzers-uima;working@cairnapple
> [ivy:retrieve]  confs: [default]
> [ivy:retrieve]  validate = true
> [ivy:retrieve]  refresh = false
> [ivy:retrieve] resolving dependencies for configuration 'default'
> [ivy:retrieve] == resolving dependencies for
> org.apache.lucene#analyzers-uima;working@cairnapple [default]
> [ivy:retrieve] == resolving dependencies
> org.apache.lucene#analyzers-uima;working@cairnapple->org.apache.uima#Tagger;2.3.1
> [default->*]
> [ivy:retrieve] default: Checking cache for: dependency:
> org.apache.uima#Tagger;2.3.1 {*=[*]}
> [ivy:retrieve] don't use cache for org.apache.uima#Tagger;2.3.1:
> checkModified=true
> [ivy:retrieve]          tried
> /home/jcuthber/.ivy2/local/org.apache.uima/Tagger/2.3.1/ivys/ivy.xml
> [ivy:retrieve]          tried
> /home/jcuthber/.ivy2/local/org.apache.uima/Tagger/2.3.1/jars/Tagger.jar
> [ivy:retrieve]  local: no ivy file nor artifact found for
> org.apache.uima#Tagger;2.3.1
> [ivy:retrieve] main: Checking cache for: dependency:
> org.apache.uima#Tagger;2.3.1 {*=[*]}
> [ivy:retrieve]          tried
> /home/jcuthber/.ivy2/shared/org.apache.uima/Tagger/2.3.1/ivys/ivy.xml
> [ivy:retrieve]          tried
> /home/jcuthber/.ivy2/shared/org.apache.uima/Tagger/2.3.1/jars/Tagger.jar
> [ivy:retrieve]  shared: no ivy file nor artifact found for
> org.apache.uima#Tagger;2.3.1
> [ivy:retrieve]          tried
> http://repo1.maven.org/maven2/org/apache/uima/Tagger/2.3.1/Tagger-2.3.1.pom
>
> and there it hangs - presumably trying to access
> http://repo1.maven.org/maven2/org/apache/uima/Tagger/2.3.1/Tagger-2.3.1.pom
>
> There must be something with our proxy settings that won't allow this.
>
> JohnC
>
>
> On 03/06/13 11:15, Uwe Schindler wrote:
>
> Hi,
>
> That's unfortunately not so easy, because of project dependencies. To run the test you have to compile Lucene Core, then the specific module + the test framework (which is special for Lucene), and download some JARs from Maven Central (JAR hell, as usual).
> If you give me some time, I would collect all needed JAR files from my local checkout and provide you the correct cmd line + a ZIP file with maybe a shell script to start up. It should be doable, but needs some work to collect all dependencies for the classpath.
>
> If you want to do it quicker (should be quite fast to do):
> - Download ANT 1.8.2 binary zip (unfortunately ANT 1.8.4 has a bug that keeps it from working out of the box with Java 8): http://archive.apache.org/dist/ant/binaries/apache-ant-1.8.2-bin.tar.gz - I just wonder about the fact: isn't ANT needed to build the JDK classlib itself? I remember that the FreeBSD OpenJDK build downloads ANT and does a large part of the compilation using ANT...
> - put the ANT bin/ dir into your PATH
> - download the Apache Lucene source code from Jenkins: https://builds.apache.org/job/Lucene-Artifacts-trunk/2212/artifact/lucene/dist/lucene-5.0-2013-03-05_15-37-06-src.tgz
> - go to the extracted lucene source dir, call "ant ivy-bootstrap" (this will download Apache IVY, so all dependencies can be downloaded from Maven Central)
> - change to the module that fails: # cd analysis/uima
> - execute: # ant -Dargs="-server -XX:+UseG1GC" -Dtests.multiplier=3 -Dtests.jvms=1 test
> - In a parallel console you might be able to attach to the process. The build in the main console runs inside ANT, and the test framework spawns separate worker JVM instances to execute the tests. This makes it hard to reproduce standalone (the command line passed to the child JVM is veeeeery long).
>
> I will work on putting together a precompiled ZIP file with all needed JARs + the command line. Just tell me if you managed it with the above howto; then I don’t need to do this.
>
> Uwe
>
> -----
> Uwe Schindler
> uschindler@apache.org
> Apache Lucene PMC Member / Committer
> Bremen, Germany
> http://lucene.apache.org/
>
>  -----Original Message-----
> From: John Cuthbertson [mailto:john.cuthbertson@oracle.com]
> Sent: Wednesday, March 06, 2013 7:51 PM
> To: Uwe Schindler
> Cc: 'Bengt Rutisson'; hotspot-gc-dev@openjdk.java.net;
> dev@lucene.apache.org
> Subject: Re: JVM hanging when using G1GC on JDK8 b78 or b79 (Linux 32 bit)
>
> Hi Uwe,
>
> I've downloaded lucene-5.0-2013-03-05_15-37-06.zip from
> https://builds.apache.org/job/Lucene-Artifacts-trunk/2212/artifact/lucene/dist/
>
> I don't have ant on my workstation so do you have a java command line to
> run the test(s) that generate the error?
>
> Thanks,
>
> JohnC
>
> On 3/6/2013 3:16 AM, Uwe Schindler wrote:
>
>     Hi,
>
>         I think this is a VM bug and the thread dumps that Uwe produced are
>         enough to start tracking down the root cause.
>
>     I hope it is enough! If I can help with more details, tell me what I should do
>     to track this down. Unfortunately, we have no isolated test case (like a small
>     java class that triggers this bug) - you have to run the test cases of this
>     Lucene module. It only happens there, not in any other Lucene test suite. It
>     may be caused by a lot of GC activity in this "UIMA" module or a specific test.
>
>         On 3/6/13 8:52 AM, David Holmes wrote:
>
>             If the VM is completely unresponsive then it suggests we are at a
>             safepoint.
>
>         Yes, we are hanging during a stop-the-world GC, so we are at a safepoint.
>
>             The GC threads are not "hung" in os::park, they are parked - waiting
>             to be notified of something.
>
>         It looks like the reference processing thread is stuck in a loop
>         where it does wait(). So, the VM is hanging even if that stack trace
>         also ends up in os::park().
>
>             The thing is to find out why they are not being woken up.
>
>         Actually, in this case we should probably not even be calling wait...
>
>             Can the gdb log be posted somewhere? I don't know if the attachment
>             made it to the original posting on hotspot-gc but it's no longer
>             available on hotspot-dev.
>
>         I received the attachment with the original email. I've attached it
>         to the bug report that I created: 8009536. You can find it there if
>         you want to. But I think we have a fairly good idea of what change
>         caused the hang.
>
>     If it helps: Unfortunately, we had some problems with recent JDK builds,
>     because the javac and javadoc tools were not working correctly, failing to build
>     our source code. This was fixed in b78. Until then, we used build
>     b65 (which was the last one working), and the G1GC hangs did not appear on
>     this version. So the hang must have been introduced by a change between b65 and b78.
>
>     Uwe
>
>         Bengt
>
>             Thanks,
>             David
>
>             On 6/03/2013 4:07 PM, Krystal Mok wrote:
>
>                 Hi Uwe,
>
>                 If you can attach gdb to it, then jstack -m and jstack -F should
>                 also work; that'll get you the Java stack trace.
>                 (But it probably doesn't matter in this case, because the hang is
>                 probably a bug in the VM.)
>
>                 - Kris
>
>                 On Wed, Mar 6, 2013 at 5:48 AM, Uwe Schindler
>                 <us...@apache.org> wrote:
>
>                     Hi,
>
>                     For a few months now we have been extensively testing various preview
>                     builds of JDK 8 for compatibility with Apache Lucene and Solr, so
>                     we can find any bugs early and prevent the problems we had with
>                     the release of Java 7 two years ago. Currently we have a Linux
>                     (Ubuntu 64bit) Jenkins machine that has various JDKs (JDK 6, JDK
>                     7, JDK 8 snapshot, IBM J9, older JRockit) installed, choosing a
>                     different one with different hotspot and garbage collector
>                     settings on every run of the test suite (which takes approx. 30-45
>                     minutes).
>
>                     JDK 8 b79 works very well on Linux so far; we found some strange
>                     behavior in early versions (maybe compiler errors), but none
>                     at the moment. There is one configuration that constantly and
>                     reproducibly hangs in one module that is tested: The configuration
>                     uses JDK 8 b79 (same for b78), 32 bit, and G1GC (server or client
>                     does not matter). The JVM running the tests hangs and becomes unresponsive
>                     (jstack or kill -3 have no effect/cannot connect, standard kill
>                     does not stop it, only kill -9 actually kills it). It can be
>                     reproduced in this Lucene module 100% (it hangs always).
>
>                     I was able to connect with GDB to the JVM and get a stack trace on
>                     all threads (see attachment, dump.txt). As you see, all threads of
>                     G1GC seem to hang in a syscall (os::park(), a conditional wait in
>                     the pthread library). Unfortunately that’s all I can give you. A Java
>                     stack trace is not possible because the JVM reacts to neither kill
>                     -3 nor jstack. With all other garbage collectors it passes the
>                     test without hangs in a few seconds; with 32 bit G1GC it can stand
>                     still for hours. The 64 bit JVM passes with G1GC, so only the 32
>                     bit variant is affected. Client or Server VM makes no difference.
>
>                     To reproduce:
>                     - Use a 32 bit JDK 8 b78 or b79 (tested on Linux 64 bit, but this
>                       should not matter)
>                     - Download Lucene Source code (e.g. the snapshot version we were
>                       testing with:
>                       https://builds.apache.org/job/Lucene-Artifacts-trunk/2212/artifact/lucene/dist/)
>                     - change to directory lucene/analysis/uima and run:
>                       ant -Dargs="-server -XX:+UseG1GC" -Dtests.multiplier=3 -Dtests.jvms=1 test
>
>                     After a while the test framework prints "stalled" messages
>                     (because the child VM actually running the test no longer
>                     responds). The PID is also printed. Try to get a stack trace or kill it: no
>                     response. Only kill -9 helps. Choosing another garbage collector in the
>                     above command line makes the test finish after a few seconds, e.g.
>                     -Dargs="-server -XX:+UseConcMarkSweepGC"
>
>                     I posted this bug report directly to the mailing list, because
>                     with earlier bug reports there seemed to be a problem with
>                     bugs.sun.com - there was no response from any reviewer after
>                     several weeks. We were able to help find and fix javadoc and
>                     javac compiler bugs early, so I hope you can help with this bug, too.
>
>                     Uwe
>
>                     -----
>                     Uwe Schindler
>                     uschindler@apache.org
>                     Apache Lucene PMC Member / Committer
>                     Bremen, Germany
>                     http://lucene.apache.org/

Re: JVM hanging when using G1GC on JDK8 b78 or b79 (Linux 32 bit)

Posted by Bengt Rutisson <be...@oracle.com>.
John and Uwe,

I followed the original instructions sent out by Uwe to reproduce the 
test. I got it up and running on my Windows x64 workstation using a 
32-bit binary. The test hangs every time I run it.

John, I think your proxy issues are due to the fact that ant picks up 
its proxy settings from Java. So you need to set the system properties 
http.proxyHost and http.proxyPort. I did this by exporting the 
_JAVA_OPTIONS environment variable as:

_JAVA_OPTIONS="-Dhttp.proxyHost=<oracle www proxy> -Dhttp.proxyPort=<oracle proxy port>"

Let me know if this does not work for you. We can try to debug it offline.

Since I could catch the hang in a debugger I could confirm both that the 
hang is indeed related to the recent change to the 
DrainMarkingStackClosures and that the problem is that we enter the 
termination protocol even when reference processing is single threaded.

Looking at the comment in the constructor for G1CMDrainMarkingStackClosure:

     // We only allow stealing and only enter the termination protocol
     // in CMTask::do_marking_step() if this closure is being instantiated
     // for parallel reference processing.
     _do_stealing = _do_termination = is_par;

I came up with a patch that makes the test work again. But I leave it to 
you, John, to figure out if this is the right way to solve the problem.

diff --git a/src/share/vm/gc_implementation/g1/concurrentMark.cpp b/src/share/vm/gc_implementation/g1/concurrentMark.cpp
--- a/src/share/vm/gc_implementation/g1/concurrentMark.cpp
+++ b/src/share/vm/gc_implementation/g1/concurrentMark.cpp
@@ -4336,7 +4336,9 @@
         gclog_or_tty->print_cr("[%u] detected overflow", _worker_id);
       }
 
+      if (do_stealing || do_termination) {
       _cm->enter_first_sync_barrier(_worker_id);
+      }
       // When we exit this sync barrier we know that all tasks have
       // stopped doing marking work. So, it's now safe to
       // re-initialise our data structures. At the end of this method,
@@ -4347,8 +4349,10 @@
       // We clear the local state of this task...
       clear_region_fields();
 
+      if (do_stealing || do_termination) {
       // ...and enter the second barrier.
       _cm->enter_second_sync_barrier(_worker_id);
+      }
       // At this point everything has bee re-initialised and we're
       // ready to restart.
     }


Thanks,
Bengt

On 3/7/13 7:44 AM, Uwe Schindler wrote:
>
> Hi John,
>
> I only have time to work on a setup this evening German time, because 
> I am on a business trip today. Will come back to you. Unfortunately I 
> failed to quickly set up an easy classpath without Ivy downloading the 
> JARs.
>
> Uwe
>
> -----
>
> Uwe Schindler
>
> uschindler@apache.org
>
> Apache Lucene PMC Member / Committer
>
> Bremen, Germany
>
> http://lucene.apache.org/
>
> *From:* John Cuthbertson [mailto:john.cuthbertson@oracle.com]
> *Sent:* Thursday, March 07, 2013 12:49 AM
> *To:* Uwe Schindler
> *Cc:* 'Bengt Rutisson'; hotspot-gc-dev@openjdk.java.net; 
> dev@lucene.apache.org
> *Subject:* Re: JVM hanging when using G1GC on JDK8 b78 or b79 (Linux 
> 32 bit)
>
> Hi Uwe,
>
> An update:
>
> I have downloaded ant and the Lucene source.
>
> I attempted the ivy-bootstrap but it failed to download the 
> ivy-2.3.0.jar file - even after setting:
>
> ANT_OPTS=-Dhttp.proxyHost=<...> -Dhttp.proxyPort=<...>
>
> So I manually downloaded and placed it into the ANT library and now get:
>
>
> ivy-bootstrap1:
>     [mkdir] Skipping /home/jcuthber/.ant/lib because it already exists.
>      [echo] installing ivy 2.3.0 to /home/jcuthber/.ant/lib
>       [get] Getting: 
> http://repo1.maven.org/maven2/org/apache/ivy/ivy/2.3.0/ivy-2.3.0.jar
>       [get] To: /home/jcuthber/.ant/lib/ivy-2.3.0.jar
>       [get] Error getting 
> http://repo1.maven.org/maven2/org/apache/ivy/ivy/2.3.0/ivy-2.3.0.jar 
> to /home/jcuthber/.ant/lib/ivy-2.3.0.jar
> [available] Found: /home/jcuthber/.ant/lib/ivy-2.3.0.jar
>
> ivy-bootstrap2:
> Skipped because property 'ivy.bootstrap1.success' set.
>
> ivy-checksum:
>
> ivy-bootstrap:
>
> BUILD SUCCESSFUL
> Total time: 3 minutes 46 seconds
>
> Presumably I have to build the Lucene source before executing the 
> tests. That seemed to go OK.
>
> When I run the analysis/uima tests it seems to get hung up at the 
> "resolve" target - even without specifying G1:
>
>
> cairnapple{jcuthber}:408> cd analysis/uima/
> cairnapple{jcuthber}:409> ls -l
> total 29
> -rw-r--r--   1 jcuthber staff       1473 Dec 10 10:39 build.xml
> -rw-rw-r--   1 jcuthber staff       6895 Mar  6 15:20 hotspot.log
> -rw-r--r--   1 jcuthber staff       1316 Mar 30  2012 ivy.xml
> drwxr-xr-x   2 jcuthber staff          2 Mar  5 07:42 lib/
> drwxr-xr-x   6 jcuthber staff          6 Mar  5 07:42 src/
>
>
>
> ivy-configure:
> [ivy:configure] Loading 
> jar:file:/home/jcuthber/.ant/lib/ivy-2.3.0.jar!/org/apache/ivy/core/settings/ivy.properties 
> [ivy:configure] :: Apache Ivy 2.3.0 - 20130110142753 :: 
> http://ant.apache.org/ivy/ ::
> [ivy:configure] jakarta commons httpclient not found: using jdk url 
> handling
> [ivy:configure] :: loading settings :: file = 
> /export/bugs/8009536/lucene-5.0-2013-03-05_15-37-06/ivy-settings.xml
> [ivy:configure] no default ivy user dir defined: set to 
> /home/jcuthber/.ivy2
> [ivy:configure] including url: 
> jar:file:/home/jcuthber/.ant/lib/ivy-2.3.0.jar!/org/apache/ivy/core/settings/ivysettings-public.xml 
> [ivy:configure] no default cache defined: set to 
> /home/jcuthber/.ivy2/cache
> [ivy:configure] including url: 
> jar:file:/home/jcuthber/.ant/lib/ivy-2.3.0.jar!/org/apache/ivy/core/settings/ivysettings-shared.xml 
> [ivy:configure] including url: 
> jar:file:/home/jcuthber/.ant/lib/ivy-2.3.0.jar!/org/apache/ivy/core/settings/ivysettings-local.xml 
> [ivy:configure] including url: 
> jar:file:/home/jcuthber/.ant/lib/ivy-2.3.0.jar!/org/apache/ivy/core/settings/ivysettings-main-chain.xml 
> [ivy:configure] settings loaded (289ms)
> [ivy:configure]         default cache: /home/jcuthber/.ivy2/cache
> [ivy:configure]         default resolver: default
> [ivy:configure]         -- 7 resolvers:
> [ivy:configure]         working-chinese-mirror [ibiblio]
> [ivy:configure]         main [chain] [shared, public]
> [ivy:configure]         local [file]
> [ivy:configure]         shared [file]
> [ivy:configure]         sonatype-releases [ibiblio]
> [ivy:configure]         public [ibiblio]
> [ivy:configure]         default [chain] [local, main, 
> sonatype-releases, working-chinese-mirror]
>
> resolve:
> [ivy:retrieve] no resolved descriptor found: launching default resolve
> Overriding previous definition of property "ivy.version"
> [ivy:retrieve] using ivy parser to parse 
> file:/export/bugs/8009536/lucene-5.0-2013-03-05_15-37-06/analysis/uima/ivy.xml 
> [ivy:retrieve] :: resolving dependencies :: 
> org.apache.lucene#analyzers-uima;working@cairnapple
> [ivy:retrieve]  confs: [default]
> [ivy:retrieve]  validate = true
> [ivy:retrieve]  refresh = false
> [ivy:retrieve] resolving dependencies for configuration 'default'
> [ivy:retrieve] == resolving dependencies for 
> org.apache.lucene#analyzers-uima;working@cairnapple [default]
> [ivy:retrieve] == resolving dependencies 
> org.apache.lucene#analyzers-uima;working@cairnapple->org.apache.uima#Tagger;2.3.1 
> [default->*]
> [ivy:retrieve] default: Checking cache for: dependency: 
> org.apache.uima#Tagger;2.3.1 {*=[*]}
> [ivy:retrieve] don't use cache for org.apache.uima#Tagger;2.3.1: 
> checkModified=true
> [ivy:retrieve]          tried 
> /home/jcuthber/.ivy2/local/org.apache.uima/Tagger/2.3.1/ivys/ivy.xml
> [ivy:retrieve]          tried 
> /home/jcuthber/.ivy2/local/org.apache.uima/Tagger/2.3.1/jars/Tagger.jar
> [ivy:retrieve]  local: no ivy file nor artifact found for 
> org.apache.uima#Tagger;2.3.1
> [ivy:retrieve] main: Checking cache for: dependency: 
> org.apache.uima#Tagger;2.3.1 {*=[*]}
> [ivy:retrieve]          tried 
> /home/jcuthber/.ivy2/shared/org.apache.uima/Tagger/2.3.1/ivys/ivy.xml
> [ivy:retrieve]          tried 
> /home/jcuthber/.ivy2/shared/org.apache.uima/Tagger/2.3.1/jars/Tagger.jar
> [ivy:retrieve]  shared: no ivy file nor artifact found for 
> org.apache.uima#Tagger;2.3.1
> [ivy:retrieve]          tried 
> http://repo1.maven.org/maven2/org/apache/uima/Tagger/2.3.1/Tagger-2.3.1.pom
>
> and there it hangs - presumably trying to access 
> http://repo1.maven.org/maven2/org/apache/uima/Tagger/2.3.1/Tagger-2.3.1.pom
>
> There must be something with our proxy settings that won't allow 
> this.
>
> JohnC
>
>
> On 03/06/13 11:15, Uwe Schindler wrote:
>
> Hi,
>   
> That's unfortunately not so easy, because of project dependencies. To run the test you have to compile Lucene Core, then the specific module + the test framework (which is special for Lucene), and download some JARs from Maven Central (JAR hell, as usual).
> If you give me some time, I would collect all needed JAR files from my local checkout and provide you the correct cmd line + a ZIP file with maybe a shell script to start up. It should be doable, but needs some work to collect all dependencies for the classpath.
>
> If you want to do it quicker (should be quite fast to do):
> - Download ANT 1.8.2 binary zip (unfortunately ANT 1.8.4 has a bug that keeps it from working out of the box with Java 8): http://archive.apache.org/dist/ant/binaries/apache-ant-1.8.2-bin.tar.gz - I just wonder about the fact: isn't ANT needed to build the JDK classlib itself? I remember that the FreeBSD OpenJDK build downloads ANT and does a large part of the compilation using ANT...
> - put the ANT bin/ dir into your PATH
> - download the Apache Lucene source code from Jenkins: https://builds.apache.org/job/Lucene-Artifacts-trunk/2212/artifact/lucene/dist/lucene-5.0-2013-03-05_15-37-06-src.tgz
> - go to the extracted lucene source dir, call "ant ivy-bootstrap" (this will download Apache IVY, so all dependencies can be downloaded from Maven Central)
> - change to the module that fails: # cd analysis/uima
> - execute: # ant -Dargs="-server -XX:+UseG1GC" -Dtests.multiplier=3 -Dtests.jvms=1 test
> - In a parallel console you might be able to attach to the process. The build in the main console runs inside ANT, and the test framework spawns separate worker JVM instances to execute the tests. This makes it hard to reproduce standalone (the command line passed to the child JVM is veeeeery long).
>
> I will work on putting together a precompiled ZIP file with all needed JARs + the command line. Just tell me if you managed it with the above howto; then I don’t need to do this.
> Uwe
>   
> -----
> Uwe Schindler
> uschindler@apache.org
> Apache Lucene PMC Member / Committer
> Bremen, Germany
> http://lucene.apache.org/
>
>     -----Original Message-----
>     From: John Cuthbertson [mailto:john.cuthbertson@oracle.com]
>     Sent: Wednesday, March 06, 2013 7:51 PM
>     To: Uwe Schindler
>     Cc: 'Bengt Rutisson'; hotspot-gc-dev@openjdk.java.net;
>     dev@lucene.apache.org
>     Subject: Re: JVM hanging when using G1GC on JDK8 b78 or b79 (Linux 32 bit)
>
>     Hi Uwe,
>
>     I've downloaded lucene-5.0-2013-03-05_15-37-06.zip from
>     https://builds.apache.org/job/Lucene-Artifacts-trunk/2212/artifact/lucene/dist/
>
>     I don't have ant on my workstation so do you have a java command line to
>     run the test(s) that generate the error?
>
>     Thanks,
>
>     JohnC
>
>     On 3/6/2013 3:16 AM, Uwe Schindler wrote:
>
>         Hi,
>
>             I think this is a VM bug and the thread dumps that Uwe produced are
>             enough to start tracking down the root cause.
>
>         I hope it is enough! If I can help with more details, tell me what I should do
>         to track this down. Unfortunately, we have no isolated test case (like a small
>         java class that triggers this bug) - you have to run the test cases of this
>         Lucene module. It only happens there, not in any other Lucene test suite. It
>         may be caused by a lot of GC activity in this "UIMA" module or a specific test.
>
>             On 3/6/13 8:52 AM, David Holmes wrote:
>
>                 If the VM is completely unresponsive then it suggests we are at a
>                 safepoint.
>
>             Yes, we are hanging during a stop-the-world GC, so we are at a safepoint.
>
>                 The GC threads are not "hung" in os::park, they are parked - waiting
>                 to be notified of something.
>
>             It looks like the reference processing thread is stuck in a loop
>             where it does wait(). So, the VM is hanging even if that stack trace
>             also ends up in os::park().
>
>                 The thing is to find out why they are not being woken up.
>
>             Actually, in this case we should probably not even be calling wait...
>
>                 Can the gdb log be posted somewhere? I don't know if the attachment
>                 made it to the original posting on hotspot-gc but it's no longer
>                 available on hotspot-dev.
>
>             I received the attachment with the original email. I've attached it
>             to the bug report that I created: 8009536. You can find it there if
>             you want to. But I think we have a fairly good idea of what change
>             caused the hang.
>
>         If it helps: Unfortunately, we had some problems with recent JDK builds,
>         because the javac and javadoc tools were not working correctly, failing to build
>         our source code. This was fixed in b78. Until then, we used build
>         b65 (which was the last one working), and the G1GC hangs did not appear on
>         this version. So the hang must have been introduced by a change between b65 and b78.
>
>         Uwe
>
>             Bengt
>
>                 Thanks,
>                 David
>
>                 On 6/03/2013 4:07 PM, Krystal Mok wrote:
>
>                     Hi Uwe,
>
>                     If you can attach gdb to it, then jstack -m and jstack -F should
>                     also work; that'll get you the Java stack trace.
>                     (But it probably doesn't matter in this case, because the hang is
>                     probably a bug in the VM.)
>
>                     - Kris
>
>                     On Wed, Mar 6, 2013 at 5:48 AM, Uwe Schindler
>                     <us...@apache.org> wrote:
>
>                         Hi,
>
>                         For a few months now we have been extensively testing various preview
>                         builds of JDK 8 for compatibility with Apache Lucene and Solr, so
>                         we can find any bugs early and prevent the problems we had with
>                         the release of Java 7 two years ago. Currently we have a Linux
>                         (Ubuntu 64bit) Jenkins machine that has various JDKs (JDK 6, JDK
>                         7, JDK 8 snapshot, IBM J9, older JRockit) installed, choosing a
>                         different one with different hotspot and garbage collector
>                         settings on every run of the test suite (which takes approx. 30-45
>                         minutes).
>
>                         JDK 8 b79 works very well on Linux so far; we found some strange
>                         behavior in early versions (maybe compiler errors), but none
>                         at the moment. There is one configuration that constantly and
>                         reproducibly hangs in one module that is tested: The configuration
>                         uses JDK 8 b79 (same for b78), 32 bit, and G1GC (server or client
>                         does not matter). The JVM running the tests hangs and becomes unresponsive
>                         (jstack or kill -3 have no effect/cannot connect, standard kill
>                         does not stop it, only kill -9 actually kills it). It can be
>
>                         reproduced in this Lucene module 100% (it hangs always).
>
>                           
>
>                         I was able to connect with GDB to the JVM and get a stack trace on
>
>                         all threads (see attachment, dump.txt). As you see all threads of
>
>                         G1GC seem to hang in a syscall (os:park(), a conditional wait in
>
>                         pthread library). Unfortunately that’s all I can give you. A Java
>
>                         stacktrace is not possible because the JVM reacts on neither kill
>
>                         -3 nor jstack. With all other garbage collectors it passes the
>
>                         test without hangs in a few seconds, with 32 bit G1GC it can stand
>
>                         still for hours. The 64 bit JVM passes with G1GC, so only the 32
>
>                         bit variant is affected. Client or Server VM makes no difference.
>
>                           
>
>                         To reproduce:
>
>                         - Use a 32 bit JDK 8 b78 or b79 (tested on Linux 64 bit, but this
>
>                         should not matter)
>
>                         - Download Lucene Source code (e.g. the snapshot version we were
>
>                         testing with:
>
>                         https://builds.apache.org/job/Lucene-Artifacts-
>
>                                        
>
>             trunk/2212/artifact/lucene/dist/)
>
>                      
>
>                         - change to directory lucene/analysis/uima and run:
>
>                                    ant -Dargs="-server -XX:+UseG1GC" -Dtests.multiplier=3
>
>                         -Dtests.jvms=1 test
>
>                         After a while the test framework prints "stalled" messages
>
>                         (because the child VM actually running the test no longer
>
>                         responds). The PID is also printed. Try to get a stack trace or kill it, no
>
>                                        
>
>     response.
>
>          
>
>                         Only kill -9 helps. Choosing another garbage collector in the
>
>                         above command line makes the test finish after a few seconds, e.g.
>
>                         -Dargs="-server -XX:+UseConcMarkSweepGC"
>
>                           
>
>                         I posted this bug report directly to the mailing list, because
>
>                         with earlier bug reports, there seem to be a problem with
>
>                         bugs.sun.com - there is no response from any reviewer after
>
>                         several weeks and we were able to help to find and fix javadoc and
>
>                         javac-compiler bugs early. So I hope you can help for this bug, too.
>
>                           
>
>                         Uwe
>
>                           
>
>                         -----
>
>                         Uwe Schindler
>
>                         uschindler@apache.org  <ma...@apache.org>
>
>                         Apache Lucene PMC Member / Committer Bremen, Germany
>
>                         http://lucene.apache.org/
>
>                           
>
>                           
>
>                                        
>
>   
>    
>


RE: JVM hanging when using G1GC on JDK8 b78 or b79 (Linux 32 bit)

Posted by Uwe Schindler <us...@apache.org>.
Hi John,

 

I will only have time to work on a setup this evening (German time), because I am on a business trip today. I will come back to you. Unfortunately, I did not manage to quickly set up a simple classpath without Ivy downloading the JARs.

 

Uwe

 

-----

Uwe Schindler

uschindler@apache.org 

Apache Lucene PMC Member / Committer

Bremen, Germany

http://lucene.apache.org/

 

From: John Cuthbertson [mailto:john.cuthbertson@oracle.com] 
Sent: Thursday, March 07, 2013 12:49 AM
To: Uwe Schindler
Cc: 'Bengt Rutisson'; hotspot-gc-dev@openjdk.java.net; dev@lucene.apache.org
Subject: Re: JVM hanging when using G1GC on JDK8 b78 or b79 (Linux 32 bit)

 

Hi Uwe,

An update:

I have downloaded ant and the Lucene source.

I attempted the ivy-bootstrap, but it failed to download the ivy-2.3.0.jar file, even after setting:

ANT_OPTS=-Dhttp.proxyHost=<...> -Dhttp.proxyPort=<...>

So I manually downloaded and placed it into the ANT library and now get:




ivy-bootstrap1:
    [mkdir] Skipping /home/jcuthber/.ant/lib because it already exists.
     [echo] installing ivy 2.3.0 to /home/jcuthber/.ant/lib
      [get] Getting: http://repo1.maven.org/maven2/org/apache/ivy/ivy/2.3.0/ivy-2.3.0.jar
      [get] To: /home/jcuthber/.ant/lib/ivy-2.3.0.jar
      [get] Error getting http://repo1.maven.org/maven2/org/apache/ivy/ivy/2.3.0/ivy-2.3.0.jar to /home/jcuthber/.ant/lib/ivy-2.3.0.jar
[available] Found: /home/jcuthber/.ant/lib/ivy-2.3.0.jar

ivy-bootstrap2:
Skipped because property 'ivy.bootstrap1.success' set.

ivy-checksum:

ivy-bootstrap:

BUILD SUCCESSFUL
Total time: 3 minutes 46 seconds

Presumably I have to build the Lucene source before executing the tests. That seemed to go OK.

When I run the analysis/uima tests it seems to get hung up at the "resolve" target - even without specifying G1:




cairnapple{jcuthber}:408> cd analysis/uima/
cairnapple{jcuthber}:409> ls -l
total 29
-rw-r--r--   1 jcuthber staff       1473 Dec 10 10:39 build.xml
-rw-rw-r--   1 jcuthber staff       6895 Mar  6 15:20 hotspot.log
-rw-r--r--   1 jcuthber staff       1316 Mar 30  2012 ivy.xml
drwxr-xr-x   2 jcuthber staff          2 Mar  5 07:42 lib/
drwxr-xr-x   6 jcuthber staff          6 Mar  5 07:42 src/





ivy-configure:
[ivy:configure] Loading jar:file:/home/jcuthber/.ant/lib/ivy-2.3.0.jar!/org/apache/ivy/core/settings/ivy.properties
[ivy:configure] :: Apache Ivy 2.3.0 - 20130110142753 :: http://ant.apache.org/ivy/ ::
[ivy:configure] jakarta commons httpclient not found: using jdk url handling
[ivy:configure] :: loading settings :: file = /export/bugs/8009536/lucene-5.0-2013-03-05_15-37-06/ivy-settings.xml
[ivy:configure] no default ivy user dir defined: set to /home/jcuthber/.ivy2
[ivy:configure] including url: jar:file:/home/jcuthber/.ant/lib/ivy-2.3.0.jar!/org/apache/ivy/core/settings/ivysettings-public.xml
[ivy:configure] no default cache defined: set to /home/jcuthber/.ivy2/cache
[ivy:configure] including url: jar:file:/home/jcuthber/.ant/lib/ivy-2.3.0.jar!/org/apache/ivy/core/settings/ivysettings-shared.xml
[ivy:configure] including url: jar:file:/home/jcuthber/.ant/lib/ivy-2.3.0.jar!/org/apache/ivy/core/settings/ivysettings-local.xml
[ivy:configure] including url: jar:file:/home/jcuthber/.ant/lib/ivy-2.3.0.jar!/org/apache/ivy/core/settings/ivysettings-main-chain.xml
[ivy:configure] settings loaded (289ms)
[ivy:configure]         default cache: /home/jcuthber/.ivy2/cache
[ivy:configure]         default resolver: default
[ivy:configure]         -- 7 resolvers:
[ivy:configure]         working-chinese-mirror [ibiblio]
[ivy:configure]         main [chain] [shared, public]
[ivy:configure]         local [file]
[ivy:configure]         shared [file]
[ivy:configure]         sonatype-releases [ibiblio]
[ivy:configure]         public [ibiblio]
[ivy:configure]         default [chain] [local, main, sonatype-releases, working-chinese-mirror]

resolve:
[ivy:retrieve] no resolved descriptor found: launching default resolve
Overriding previous definition of property "ivy.version"
[ivy:retrieve] using ivy parser to parse file:/export/bugs/8009536/lucene-5.0-2013-03-05_15-37-06/analysis/uima/ivy.xml
[ivy:retrieve] :: resolving dependencies :: org.apache.lucene#analyzers-uima;working@cairnapple
[ivy:retrieve]  confs: [default]
[ivy:retrieve]  validate = true
[ivy:retrieve]  refresh = false
[ivy:retrieve] resolving dependencies for configuration 'default'
[ivy:retrieve] == resolving dependencies for org.apache.lucene#analyzers-uima;working@cairnapple [default]
[ivy:retrieve] == resolving dependencies org.apache.lucene#analyzers-uima;working@cairnapple->org.apache.uima#Tagger;2.3.1 [default->*]
[ivy:retrieve] default: Checking cache for: dependency: org.apache.uima#Tagger;2.3.1 {*=[*]}
[ivy:retrieve] don't use cache for org.apache.uima#Tagger;2.3.1: checkModified=true
[ivy:retrieve]          tried /home/jcuthber/.ivy2/local/org.apache.uima/Tagger/2.3.1/ivys/ivy.xml
[ivy:retrieve]          tried /home/jcuthber/.ivy2/local/org.apache.uima/Tagger/2.3.1/jars/Tagger.jar
[ivy:retrieve]  local: no ivy file nor artifact found for org.apache.uima#Tagger;2.3.1
[ivy:retrieve] main: Checking cache for: dependency: org.apache.uima#Tagger;2.3.1 {*=[*]}
[ivy:retrieve]          tried /home/jcuthber/.ivy2/shared/org.apache.uima/Tagger/2.3.1/ivys/ivy.xml
[ivy:retrieve]          tried /home/jcuthber/.ivy2/shared/org.apache.uima/Tagger/2.3.1/jars/Tagger.jar
[ivy:retrieve]  shared: no ivy file nor artifact found for org.apache.uima#Tagger;2.3.1
[ivy:retrieve]          tried http://repo1.maven.org/maven2/org/apache/uima/Tagger/2.3.1/Tagger-2.3.1.pom

and there it hangs - presumably trying to access http://repo1.maven.org/maven2/org/apache/uima/Tagger/2.3.1/Tagger-2.3.1.pom

There must be something in our proxy settings that won't allow this.

JohnC


On 03/06/13 11:15, Uwe Schindler wrote: 

Hi,
 
Unfortunately, that's not so easy because of project dependencies. To run the test, you have to compile Lucene Core, then the specific module and the test framework (which is specific to Lucene), and download some JARs from Maven Central (JAR hell, as usual).
If you give me some time, I can collect all needed JAR files from my local checkout and provide you with the correct command line plus a ZIP file, maybe with a shell script to start it. It should be doable, but it takes some work to collect all dependencies for the classpath.

If you want to do it quicker (should be quite fast to do):
- Download the ANT 1.8.2 binary distribution (unfortunately, ANT 1.8.4 has a bug that keeps it from working out of the box with Java 8): http://archive.apache.org/dist/ant/binaries/apache-ant-1.8.2-bin.tar.gz (I just wonder: isn't ANT needed to build the JDK class library itself? I remember that the FreeBSD OpenJDK build downloads ANT and does a large part of the compilation with it...)
- Put the ANT bin/ directory into your PATH
- Download the Apache Lucene source code from Jenkins: https://builds.apache.org/job/Lucene-Artifacts-trunk/2212/artifact/lucene/dist/lucene-5.0-2013-03-05_15-37-06-src.tgz
- Go to the extracted Lucene source directory and run "ant ivy-bootstrap" (this downloads Apache Ivy, so that all dependencies can be downloaded from Maven Central)
- Change to the module that fails: # cd analysis/uima
- Execute: # ant -Dargs="-server -XX:+UseG1GC" -Dtests.multiplier=3 -Dtests.jvms=1 test
- In a parallel console, you might be able to attach to the process. The build in the main console runs inside ANT, and the test framework spawns separate worker JVM instances to execute the tests. This makes it hard to reproduce standalone (the command line passed to the child JVM is veeeeery long).
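The "stalled worker" situation described above can be sketched in plain Java: start a child JVM with given flags, treat it as stalled if it does not exit within a timeout, and then kill it forcibly (the equivalent of kill -9, the only thing that stopped the hung JVM here). This is a minimal sketch under stated assumptions: the class name, the timeout, and the use of plain "-version" as a stand-in for the real test command line are all hypothetical, and the actual Lucene test framework does considerably more.

```java
import java.io.File;
import java.io.IOException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.TimeUnit;

// Hypothetical stall watchdog for a child JVM; not the Lucene test framework's code.
class ChildJvmWatchdog {

    /** Path to the java launcher of the JVM that runs this code. */
    static String javaLauncher() {
        return System.getProperty("java.home") + File.separator + "bin" + File.separator + "java";
    }

    /**
     * Starts a child JVM with the given arguments and waits up to timeoutSeconds.
     * Returns true if the child exited in time; otherwise reports it as stalled,
     * kills it forcibly and returns false.
     */
    static boolean runWithTimeout(List<String> jvmArgs, long timeoutSeconds)
            throws IOException, InterruptedException {
        List<String> cmd = new ArrayList<>();
        cmd.add(javaLauncher());
        cmd.addAll(jvmArgs);
        Process child = new ProcessBuilder(cmd)
                .redirectErrorStream(true)                       // merge stderr into stdout
                .redirectOutput(ProcessBuilder.Redirect.INHERIT) // and show it on our console
                .start();
        if (child.waitFor(timeoutSeconds, TimeUnit.SECONDS)) {
            return true; // child finished on its own
        }
        System.err.println("child JVM stalled after " + timeoutSeconds + "s: " + cmd);
        // Forcible kill (SIGKILL on Unix); in the report, only kill -9 stopped the hung JVM.
        child.destroyForcibly();
        return false;
    }

    public static void main(String[] args) throws Exception {
        // Example flags mirroring the -Dargs value from the report; "-version"
        // stands in for the real (much longer) test command line.
        boolean finished = runWithTimeout(
                Arrays.asList("-server", "-XX:+UseG1GC", "-version"), 60);
        System.out.println(finished ? "finished" : "stalled");
    }
}
```

Process.waitFor(long, TimeUnit) and destroyForcibly() were added in Java 8, so this sketch needs at least a JDK 8 build to run.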
 
I will work on putting together a precompiled ZIP file with all needed JARs plus the command line. Just tell me if you manage with the above how-to; then I don't need to do this.
Uwe
 
-----
Uwe Schindler
uschindler@apache.org 
Apache Lucene PMC Member / Committer
Bremen, Germany
http://lucene.apache.org/
 
 
  

-----Original Message-----
From: John Cuthbertson [mailto:john.cuthbertson@oracle.com]
Sent: Wednesday, March 06, 2013 7:51 PM
To: Uwe Schindler
Cc: 'Bengt Rutisson'; hotspot-gc-dev@openjdk.java.net;
dev@lucene.apache.org
Subject: Re: JVM hanging when using G1GC on JDK8 b78 or b79 (Linux 32 bit)
 
Hi Uwe,
 
I've downloaded lucene-5.0-2013-03-05_15-37-06.zip from
https://builds.apache.org/job/Lucene-Artifacts-trunk/2212/artifact/lucene/dist/
 
I don't have ant on my workstation, so do you have a java command line to
run the test(s) that trigger the error?
 
Thanks,
 
JohnC
 
On 3/6/2013 3:16 AM, Uwe Schindler wrote:
    

Hi,
 
      

I think this is a VM bug and the thread dumps that Uwe produced are
enough to start tracking down the root cause.
        

I hope it is enough! If I can help with more details, tell me what I should do
to track this down. Unfortunately, we have no isolated test case (like a small
Java class that triggers this bug); you have to run the test cases of this
Lucene module. It only happens there, not in any other Lucene test suite. It
may be caused by a lot of GC activity in this "UIMA" module or by a specific test.
    

On 3/6/13 8:52 AM, David Holmes wrote:
        

If the VM is completely unresponsive then it suggests we are at a
safepoint.
          

Yes, we are hanging during a stop-the-world GC, so we are at a safepoint.
 
        

The GC threads are not "hung" in os::park, they are parked - waiting
to be notified of something.
          

It looks like the reference processing thread is stuck in a loop
where it does wait(). So, the VM is hanging even if that stack trace
also ends up in os::park().
 
        

The thing is to find out why they are not being woken up.
          

Actually, in this case we should probably not even be calling wait...
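The distinction between "hung" and "parked, waiting to be notified" can be illustrated with a generic Java analogy (HotSpot's GC worker threads use the VM's own C++ monitors, not java.lang.Object.wait(), so this is not the VM's code). A thread in a guarded wait shows up as parked in a stack dump and only makes progress when another thread changes the condition and notifies it; if that never happens, it stays parked indefinitely, which from the outside looks exactly like a hang. The class and method names are hypothetical, and a timeout is added only so the example terminates.

```java
import java.util.concurrent.TimeUnit;

// Generic Java analogy for "parked, waiting to be notified"; HotSpot's GC worker
// threads use the VM's own C++ monitors, not Object.wait().
class GuardedFlag {
    private final Object lock = new Object();
    private boolean ready = false; // guarded by lock

    /** Waits until signal() has been called or the timeout expires. */
    boolean awaitReady(long timeoutMillis) throws InterruptedException {
        long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(timeoutMillis);
        synchronized (lock) {
            // Re-check the condition in a loop: wake-ups can be spurious, and if
            // nobody ever notifies us we stay parked in this loop.
            while (!ready) {
                long remaining = deadline - System.nanoTime();
                if (remaining <= 0) {
                    return false; // never woken up: from outside this looks like a hang
                }
                TimeUnit.NANOSECONDS.timedWait(lock, remaining);
            }
            return true;
        }
    }

    /** Makes the condition true and wakes every waiting thread. */
    void signal() {
        synchronized (lock) {
            ready = true;
            lock.notifyAll();
        }
    }

    public static void main(String[] args) throws InterruptedException {
        GuardedFlag flag = new GuardedFlag();
        System.out.println(flag.awaitReady(100)); // false: nobody called signal()
        new Thread(flag::signal).start();
        System.out.println(flag.awaitReady(5000)); // true once the other thread signals
    }
}
```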
 
        

Can the gdb log be posted somewhere? I don't know if the attachment
made it to the original posting on hotspot-gc but it's no longer
available on hotspot-dev.
          

I received the attachment with the original email. I've attached it
to the bug report that I created: 8009536. You can find it there if
you want to. But I think we have a fairly good idea of what change
caused the hang.
        

If it helps: unfortunately, we had some problems with recent JDK builds,
because the javac and javadoc tools were not working correctly and failed to
build our source code. This was fixed in b78. Until then, we used build
b65 (the last one that worked), and the G1GC hangs did not appear with
that version. So the hang must have been introduced by a change between
b65 and b78.
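If the intermediate builds between b65 (good) and b78 (bad) could be run against the test suite, the regressing build could be narrowed down by binary search instead of trying every build. A sketch, assuming the failure is monotone (once it appears, every later build also hangs) and assuming a hypothetical predicate that runs the uima tests on a given build and reports whether they stall:

```java
import java.util.function.IntPredicate;

// Sketch: binary search for the first failing build, assuming that once the
// regression appears, every later build also fails.
class BuildBisect {
    /**
     * Returns the first build in (good, bad] for which hangs.test(build) is true.
     * Precondition: build "good" is known to pass and build "bad" is known to fail.
     */
    static int firstBadBuild(int good, int bad, IntPredicate hangs) {
        while (bad - good > 1) {
            int mid = good + (bad - good) / 2;
            if (hangs.test(mid)) {
                bad = mid;  // regression is at mid or earlier
            } else {
                good = mid; // regression is after mid
            }
        }
        return bad;
    }

    public static void main(String[] args) {
        // Hypothetical predicate with an arbitrary threshold, for illustration only.
        // In practice it would run the uima tests on that build with -XX:+UseG1GC.
        IntPredicate hangs = build -> build >= 72;
        System.out.println(firstBadBuild(65, 78, hangs)); // prints 72
    }
}
```

For the 13 candidate builds b66 to b78, this needs at most four test runs instead of up to thirteen.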
    

Uwe
 
      

Bengt
 
        

Thanks,
David
 
On 6/03/2013 4:07 PM, Krystal Mok wrote:
          

Hi Uwe,
 
If you can attach gdb to it, then jstack -m and jstack -F should
also work; that'll get you the Java stack trace.
(But it probably doesn't matter in this case, because the hang is
probably a bug in the VM.)
 
- Kris
 
On Wed, Mar 6, 2013 at 5:48 AM, Uwe Schindler
<us...@apache.org> wrote:
            

Hi,
 
For a few months now, we have been extensively testing various preview
builds of JDK 8 for compatibility with Apache Lucene and Solr, so
we can find any bugs early and prevent the problems we had with
the release of Java 7 two years ago. Currently we have a Linux
(Ubuntu 64bit) Jenkins machine with various JDKs installed (JDK 6, JDK
7, JDK 8 snapshot, IBM J9, older JRockit); every run of the test suite
(which takes approx. 30-45 minutes) picks a different one with
different HotSpot and garbage collector settings.
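The randomized Jenkins setup described above can be sketched as follows: each run picks one JDK and one collector, and logging the seed makes a failing combination (such as 32-bit JDK 8 with G1) easy to re-run. The JDK paths and flag lists are hypothetical placeholders, not the actual Jenkins configuration.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Random;

// Hypothetical per-run JVM/GC selection; paths and flags are placeholders.
class RandomJvmConfig {
    static final List<String> JDKS = Arrays.asList(
            "/opt/jdk6-64", "/opt/jdk7-64", "/opt/jdk8-32", "/opt/jdk8-64");
    static final List<String> GC_FLAGS = Arrays.asList(
            "-XX:+UseSerialGC", "-XX:+UseParallelGC",
            "-XX:+UseConcMarkSweepGC", "-XX:+UseG1GC");

    /** Picks {jdk, gcFlag}; the same seed always yields the same pair. */
    static String[] pick(long seed) {
        Random rnd = new Random(seed);
        String jdk = JDKS.get(rnd.nextInt(JDKS.size()));
        String gc = GC_FLAGS.get(rnd.nextInt(GC_FLAGS.size()));
        return new String[] {jdk, gc};
    }

    public static void main(String[] args) {
        // Pass a previously logged seed to re-run exactly the same combination.
        long seed = args.length > 0 ? Long.parseLong(args[0]) : System.nanoTime();
        String[] cfg = pick(seed);
        System.out.println("seed=" + seed + " jdk=" + cfg[0] + " gc=" + cfg[1]);
    }
}
```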
    

So far, JDK 8 b79 works very well on Linux. We found some strange
behavior in early versions (maybe compiler errors), but none at the
moment. There is one configuration that consistently and
reproducibly hangs in one of the tested modules: JDK 8 b79 (same
for b78), 32 bit, and G1GC (server or client does not matter).
The JVM running the tests hangs and becomes unresponsive (jstack
and kill -3 have no effect or cannot connect, a standard kill
does not stop it, only kill -9 actually kills it). It reproduces
100% of the time in this Lucene module (it always hangs).
 
I was able to connect to the JVM with GDB and get a stack trace of
all threads (see attachment, dump.txt). As you can see, all G1GC
threads seem to hang in a syscall (os::park(), a conditional wait in
the pthread library). Unfortunately, that's all I can give you. A Java
stack trace is not possible because the JVM reacts to neither kill -3
nor jstack. With all other garbage collectors the test passes
within a few seconds without hanging; with 32-bit G1GC it can stand
still for hours. The 64-bit JVM passes with G1GC, so only the 32-bit
variant is affected. Client or Server VM makes no difference.
 
To reproduce:
- Use a 32-bit JDK 8 b78 or b79 (tested on 64-bit Linux, but this
should not matter)
- Download the Lucene source code (e.g. the snapshot version we were
testing with:
https://builds.apache.org/job/Lucene-Artifacts-trunk/2212/artifact/lucene/dist/)
- Change to the directory lucene/analysis/uima and run:
          ant -Dargs="-server -XX:+UseG1GC" -Dtests.multiplier=3 -Dtests.jvms=1 test
After a while, the test framework prints "stalled" messages
(because the child VM actually running the test no longer
responds). The PID is also printed. If you try to get a stack trace
or kill it, there is no response. Only kill -9 helps. Choosing
another garbage collector in the above command line makes the test
finish after a few seconds, e.g.
-Dargs="-server -XX:+UseConcMarkSweepGC"
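Because the hang only shows up with the 32-bit VM and G1, it can help to confirm what a given set of flags actually produces in the JVM. A small check, assuming a HotSpot VM: sun.arch.data.model is a HotSpot-specific system property (hence the "unknown" fallback), and HotSpot names its G1 collector MXBeans with a "G1" prefix (e.g. "G1 Young Generation"). The class and method names are hypothetical.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.util.ArrayList;
import java.util.List;

// Reports which collectors and which data model the running JVM uses.
class GcInfo {
    /** Names of the active collectors, e.g. "G1 Young Generation" under G1 on HotSpot. */
    static List<String> collectorNames() {
        List<String> names = new ArrayList<>();
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            names.add(gc.getName());
        }
        return names;
    }

    /** "32" or "64" on HotSpot; the property is vendor-specific, hence the fallback. */
    static String dataModel() {
        return System.getProperty("sun.arch.data.model", "unknown");
    }

    static boolean usesG1() {
        for (String name : collectorNames()) {
            if (name.startsWith("G1")) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        System.out.println("data model: " + dataModel());
        System.out.println("collectors: " + collectorNames());
        System.out.println("32-bit + G1 (the reported combination): "
                + ("32".equals(dataModel()) && usesG1()));
    }
}
```

Running it with the same flags as the test JVM (e.g. java -server -XX:+UseG1GC GcInfo on the 32-bit JDK) shows whether that JVM matches the affected combination.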
 
I posted this bug report directly to the mailing list because,
with earlier bug reports, there seemed to be a problem with
bugs.sun.com: there was no response from any reviewer after
several weeks. We were able to help find and fix javadoc and
javac compiler bugs early, so I hope you can help with this bug, too.
 
Uwe
 
-----
Uwe Schindler
uschindler@apache.org
Apache Lucene PMC Member / Committer
Bremen, Germany
http://lucene.apache.org/
 
 
              

 
  

 


Re: JVM hanging when using G1GC on JDK8 b78 or b79 (Linux 32 bit)

Posted by John Cuthbertson <jo...@oracle.com>.
Hi Uwe,

An update:

I have downloaded ant and the Lucene source.

I attempted the ivy-bootstrap, but it failed to download the
ivy-2.3.0.jar file, even after setting:

ANT_OPTS=-Dhttp.proxyHost=<...> -Dhttp.proxyPort=<...>

So I manually downloaded and placed it into the ANT library and now get:

> ivy-bootstrap1:
>     [mkdir] Skipping /home/jcuthber/.ant/lib because it already exists.
>      [echo] installing ivy 2.3.0 to /home/jcuthber/.ant/lib
>       [get] Getting: http://repo1.maven.org/maven2/org/apache/ivy/ivy/2.3.0/ivy-2.3.0.jar
>       [get] To: /home/jcuthber/.ant/lib/ivy-2.3.0.jar
>       [get] Error getting http://repo1.maven.org/maven2/org/apache/ivy/ivy/2.3.0/ivy-2.3.0.jar to /home/jcuthber/.ant/lib/ivy-2.3.0.jar
> [available] Found: /home/jcuthber/.ant/lib/ivy-2.3.0.jar
>
> ivy-bootstrap2:
> Skipped because property 'ivy.bootstrap1.success' set.
>
> ivy-checksum:
>
> ivy-bootstrap:
>
> BUILD SUCCESSFUL
> Total time: 3 minutes 46 seconds

Presumably I have to build the Lucene source before executing the
tests. That seemed to go OK.

When I run the analysis/uima tests it seems to get hung up at the 
"resolve" target - even without specifying G1:

> cairnapple{jcuthber}:408> cd analysis/uima/
> cairnapple{jcuthber}:409> ls -l
> total 29
> -rw-r--r--   1 jcuthber staff       1473 Dec 10 10:39 build.xml
> -rw-rw-r--   1 jcuthber staff       6895 Mar  6 15:20 hotspot.log
> -rw-r--r--   1 jcuthber staff       1316 Mar 30  2012 ivy.xml
> drwxr-xr-x   2 jcuthber staff          2 Mar  5 07:42 lib/
> drwxr-xr-x   6 jcuthber staff          6 Mar  5 07:42 src/

> ivy-configure:
> [ivy:configure] Loading jar:file:/home/jcuthber/.ant/lib/ivy-2.3.0.jar!/org/apache/ivy/core/settings/ivy.properties
> [ivy:configure] :: Apache Ivy 2.3.0 - 20130110142753 :: http://ant.apache.org/ivy/ ::
> [ivy:configure] jakarta commons httpclient not found: using jdk url handling
> [ivy:configure] :: loading settings :: file = /export/bugs/8009536/lucene-5.0-2013-03-05_15-37-06/ivy-settings.xml
> [ivy:configure] no default ivy user dir defined: set to /home/jcuthber/.ivy2
> [ivy:configure] including url: jar:file:/home/jcuthber/.ant/lib/ivy-2.3.0.jar!/org/apache/ivy/core/settings/ivysettings-public.xml
> [ivy:configure] no default cache defined: set to /home/jcuthber/.ivy2/cache
> [ivy:configure] including url: jar:file:/home/jcuthber/.ant/lib/ivy-2.3.0.jar!/org/apache/ivy/core/settings/ivysettings-shared.xml
> [ivy:configure] including url: jar:file:/home/jcuthber/.ant/lib/ivy-2.3.0.jar!/org/apache/ivy/core/settings/ivysettings-local.xml
> [ivy:configure] including url: jar:file:/home/jcuthber/.ant/lib/ivy-2.3.0.jar!/org/apache/ivy/core/settings/ivysettings-main-chain.xml
> [ivy:configure] settings loaded (289ms)
> [ivy:configure]         default cache: /home/jcuthber/.ivy2/cache
> [ivy:configure]         default resolver: default
> [ivy:configure]         -- 7 resolvers:
> [ivy:configure]         working-chinese-mirror [ibiblio]
> [ivy:configure]         main [chain] [shared, public]
> [ivy:configure]         local [file]
> [ivy:configure]         shared [file]
> [ivy:configure]         sonatype-releases [ibiblio]
> [ivy:configure]         public [ibiblio]
> [ivy:configure]         default [chain] [local, main, sonatype-releases, working-chinese-mirror]
>
> resolve:
> [ivy:retrieve] no resolved descriptor found: launching default resolve
> Overriding previous definition of property "ivy.version"
> [ivy:retrieve] using ivy parser to parse file:/export/bugs/8009536/lucene-5.0-2013-03-05_15-37-06/analysis/uima/ivy.xml
> [ivy:retrieve] :: resolving dependencies :: org.apache.lucene#analyzers-uima;working@cairnapple
> [ivy:retrieve]  confs: [default]
> [ivy:retrieve]  validate = true
> [ivy:retrieve]  refresh = false
> [ivy:retrieve] resolving dependencies for configuration 'default'
> [ivy:retrieve] == resolving dependencies for org.apache.lucene#analyzers-uima;working@cairnapple [default]
> [ivy:retrieve] == resolving dependencies org.apache.lucene#analyzers-uima;working@cairnapple->org.apache.uima#Tagger;2.3.1 [default->*]
> [ivy:retrieve] default: Checking cache for: dependency: org.apache.uima#Tagger;2.3.1 {*=[*]}
> [ivy:retrieve] don't use cache for org.apache.uima#Tagger;2.3.1: checkModified=true
> [ivy:retrieve]          tried /home/jcuthber/.ivy2/local/org.apache.uima/Tagger/2.3.1/ivys/ivy.xml
> [ivy:retrieve]          tried /home/jcuthber/.ivy2/local/org.apache.uima/Tagger/2.3.1/jars/Tagger.jar
> [ivy:retrieve]  local: no ivy file nor artifact found for org.apache.uima#Tagger;2.3.1
> [ivy:retrieve] main: Checking cache for: dependency: org.apache.uima#Tagger;2.3.1 {*=[*]}
> [ivy:retrieve]          tried /home/jcuthber/.ivy2/shared/org.apache.uima/Tagger/2.3.1/ivys/ivy.xml
> [ivy:retrieve]          tried /home/jcuthber/.ivy2/shared/org.apache.uima/Tagger/2.3.1/jars/Tagger.jar
> [ivy:retrieve]  shared: no ivy file nor artifact found for org.apache.uima#Tagger;2.3.1
> [ivy:retrieve]          tried http://repo1.maven.org/maven2/org/apache/uima/Tagger/2.3.1/Tagger-2.3.1.pom

and there it hangs - presumably trying to access
http://repo1.maven.org/maven2/org/apache/uima/Tagger/2.3.1/Tagger-2.3.1.pom

There must be something in our proxy settings that won't allow this.
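Ivy's log above says "jakarta commons httpclient not found: using jdk url handling", so its downloads go through java.net.URL, whose HTTP handler consults the JDK's default ProxySelector and, through it, the standard http.proxyHost and http.proxyPort system properties. One way to check whether the values passed via ANT_OPTS would actually be picked up by a JVM is to ask the selector directly. This is a diagnostic sketch only; it does not change Ivy's behaviour, and the class name is hypothetical.

```java
import java.net.Proxy;
import java.net.ProxySelector;
import java.net.URI;
import java.util.List;

// Diagnostic only: asks the JDK which proxy it would use for a URL.
class ProxyCheck {
    static List<Proxy> proxiesFor(String url) {
        // The default selector reads http.proxyHost / http.proxyPort for http:// URLs.
        return ProxySelector.getDefault().select(URI.create(url));
    }

    public static void main(String[] args) {
        String url = "http://repo1.maven.org/maven2/org/apache/uima/Tagger/2.3.1/Tagger-2.3.1.pom";
        // Prints [DIRECT] when no proxy would be used for this URL.
        System.out.println(proxiesFor(url));
    }
}
```

Running it with the same -Dhttp.proxyHost=<...> -Dhttp.proxyPort=<...> values as in ANT_OPTS should print an HTTP proxy entry rather than [DIRECT] if the settings are being seen.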

JohnC


On 03/06/13 11:15, Uwe Schindler wrote:
> Hi,
>
> Unfortunately, that's not so easy because of project dependencies. To run the test, you have to compile Lucene Core, then the specific module and the test framework (which is specific to Lucene), and download some JARs from Maven Central (JAR hell, as usual).
> If you give me some time, I can collect all needed JAR files from my local checkout and provide you with the correct command line plus a ZIP file, maybe with a shell script to start it. It should be doable, but it takes some work to collect all dependencies for the classpath.
>
> If you want to do it quicker (should be quite fast to do):
> - Download the ANT 1.8.2 binary distribution (unfortunately, ANT 1.8.4 has a bug that keeps it from working out of the box with Java 8): http://archive.apache.org/dist/ant/binaries/apache-ant-1.8.2-bin.tar.gz (I just wonder: isn't ANT needed to build the JDK class library itself? I remember that the FreeBSD OpenJDK build downloads ANT and does a large part of the compilation with it...)
> - Put the ANT bin/ directory into your PATH
> - Download the Apache Lucene source code from Jenkins: https://builds.apache.org/job/Lucene-Artifacts-trunk/2212/artifact/lucene/dist/lucene-5.0-2013-03-05_15-37-06-src.tgz
> - Go to the extracted Lucene source directory and run "ant ivy-bootstrap" (this downloads Apache Ivy, so that all dependencies can be downloaded from Maven Central)
> - Change to the module that fails: # cd analysis/uima
> - Execute: # ant -Dargs="-server -XX:+UseG1GC" -Dtests.multiplier=3 -Dtests.jvms=1 test
> - In a parallel console, you might be able to attach to the process. The build in the main console runs inside ANT, and the test framework spawns separate worker JVM instances to execute the tests. This makes it hard to reproduce standalone (the command line passed to the child JVM is veeeeery long).
>
> I will work on putting together a precompiled ZIP file with all needed JARs plus the command line. Just tell me if you manage with the above how-to; then I don't need to do this.
> Uwe
>
> -----
> Uwe Schindler
> uschindler@apache.org 
> Apache Lucene PMC Member / Committer
> Bremen, Germany
> http://lucene.apache.org/
>
>
>   
>> -----Original Message-----
>> From: John Cuthbertson [mailto:john.cuthbertson@oracle.com]
>> Sent: Wednesday, March 06, 2013 7:51 PM
>> To: Uwe Schindler
>> Cc: 'Bengt Rutisson'; hotspot-gc-dev@openjdk.java.net;
>> dev@lucene.apache.org
>> Subject: Re: JVM hanging when using G1GC on JDK8 b78 or b79 (Linux 32 bit)
>>
>> Hi Uwe,
>>
>> I've downloaded lucene-5.0-2013-03-05_15-37-06.zip from
>> https://builds.apache.org/job/Lucene-Artifacts-trunk/2212/artifact/lucene/dist/
>>
>> I don't have ant on my workstation, so do you have a java command line to
>> run the test(s) that trigger the error?
>>
>> Thanks,
>>
>> JohnC
>>
>> On 3/6/2013 3:16 AM, Uwe Schindler wrote:
>>     
>>> Hi,
>>>
>>>       
>>>> I think this is a VM bug and the thread dumps that Uwe produced are
>>>> enough to start tracking down the root cause.
>>>>         
>>> I hope it is enough! If I can help with more details, tell me what I should do
>>> to track this down. Unfortunately, we have no isolated test case (like a small
>>> Java class that triggers this bug); you have to run the test cases of this
>>> Lucene module. It only happens there, not in any other Lucene test suite. It
>>> may be caused by a lot of GC activity in this "UIMA" module or by a specific test.
>>     
>>>> On 3/6/13 8:52 AM, David Holmes wrote:
>>>>         
>>>>> If the VM is completely unresponsive then it suggests we are at a
>>>>> safepoint.
>>>>>           
>>>> Yes, we are hanging during a stop-the-world GC, so we are at a safepoint.
>>>>
>>>>         
>>>>> The GC threads are not "hung" in os::park, they are parked - waiting
>>>>> to be notified of something.
>>>>>           
>>>> It looks like the reference processing thread is stuck in a loop
>>>> where it does wait(). So, the VM is hanging even if that stack trace
>>>> also ends up in os::park().
>>>>
>>>>         
>>>>> The thing is to find out why they are not being woken up.
>>>>>           
>>>> Actually, in this case we should probably not even be calling wait...
>>>>
>>>>         
>>>>> Can the gdb log be posted somewhere? I don't know if the attachment
>>>>> made it to the original posting on hotspot-gc but it's no longer
>>>>> available on hotspot-dev.
>>>>>           
>>>> I received the attachment with the original email. I've attached it
>>>> to the bug report that I created: 8009536. You can find it there if
>>>> you want to. But I think we have a fairly good idea of what change
>>>> caused the hang.
>>>>         
>>> If it helps: unfortunately, we had some problems with recent JDK builds,
>>> because the javac and javadoc tools were not working correctly and failed to
>>> build our source code. This was fixed in b78. Until then, we used build
>>> b65 (the last one that worked), and the G1GC hangs did not appear with
>>> that version. So the hang must have been introduced by a change between
>>> b65 and b78.
>>     
>>> Uwe
>>>
>>>       
>>>> Bengt
>>>>
>>>>         
>>>>> Thanks,
>>>>> David
>>>>>
>>>>> On 6/03/2013 4:07 PM, Krystal Mok wrote:
>>>>>           
>>>>>> Hi Uwe,
>>>>>>
>>>>>> If you can attach gdb to it, then jstack -m and jstack -F should
>>>>>> also work; that'll get you the Java stack trace.
>>>>>> (But it probably doesn't matter in this case, because the hang is
>>>>>> probably a bug in the VM.)
>>>>>>
>>>>>> - Kris
>>>>>>
>>>>>> On Wed, Mar 6, 2013 at 5:48 AM, Uwe Schindler
>>>>>> <us...@apache.org> wrote:
>>>>>>             
>>>>>>> Hi,
>>>>>>>
>>>>>>> For a few months now, we have been extensively testing various preview
>>>>>>> builds of JDK 8 for compatibility with Apache Lucene and Solr, so
>>>>>>> we can find any bugs early and prevent the problems we had with
>>>>>>> the release of Java 7 two years ago. Currently we have a Linux
>>>>>>> (Ubuntu 64bit) Jenkins machine with various JDKs installed (JDK 6, JDK
>>>>>>> 7, JDK 8 snapshot, IBM J9, older JRockit); every run of the test suite
>>>>>>> (which takes approx. 30-45 minutes) picks a different one with
>>>>>>> different HotSpot and garbage collector settings.
>>>>>>>
>>>>>>> So far, JDK 8 b79 works very well on Linux. We found some strange
>>>>>>> behavior in early versions (maybe compiler errors), but none at the
>>>>>>> moment. There is one configuration that consistently and
>>>>>>> reproducibly hangs in one of the tested modules: JDK 8 b79 (same
>>>>>>> for b78), 32 bit, and G1GC (server or client does not matter).
>>>>>>> The JVM running the tests hangs and becomes unresponsive (jstack
>>>>>>> and kill -3 have no effect or cannot connect, a standard kill
>>>>>>> does not stop it, only kill -9 actually kills it). It reproduces
>>>>>>> 100% of the time in this Lucene module (it always hangs).
>>>>>>>
>>>>>>> I was able to connect to the JVM with GDB and get a stack trace of
>>>>>>> all threads (see attachment, dump.txt). As you can see, all G1GC
>>>>>>> threads seem to hang in a syscall (os::park(), a conditional wait in
>>>>>>> the pthread library). Unfortunately, that's all I can give you. A Java
>>>>>>> stack trace is not possible because the JVM reacts to neither kill -3
>>>>>>> nor jstack. With all other garbage collectors the test passes
>>>>>>> within a few seconds without hanging; with 32-bit G1GC it can stand
>>>>>>> still for hours. The 64-bit JVM passes with G1GC, so only the 32-bit
>>>>>>> variant is affected. Client or Server VM makes no difference.
>>>>>>>
>>>>>>> To reproduce:
>>>>>>> - Use a 32 bit JDK 8 b78 or b79 (tested on Linux 64 bit, but this
>>>>>>> should not matter)
>>>>>>> - Download Lucene Source code (e.g. the snapshot version we were
>>>>>>> testing with:
>>>>>>> https://builds.apache.org/job/Lucene-Artifacts-
>>>>>>>               
>>>> trunk/2212/artifact/lucene/dist/)
>>>>         
>>>>>>> - change to directory lucene/analysis/uima and run:
>>>>>>>           ant -Dargs="-server -XX:+UseG1GC" -Dtests.multiplier=3
>>>>>>> -Dtests.jvms=1 test
>>>>>>> After a while the test framework prints "stalled" messages
>>>>>>> (because the child VM actually running the test no longer
>>>>>>> responds). The PID is also printed. Try to get a stack trace or kill it, no
>>>>>>>               
>> response.
>>     
>>>>>>> Only kill -9 helps. Choosing another garbage collector in the
>>>>>>> above command line makes the test finish after a few seconds, e.g.
>>>>>>> -Dargs="-server -XX:+UseConcMarkSweepGC"
>>>>>>>
>>>>>>> I posted this bug report directly to the mailing list, because
>>>>>>> with earlier bug reports, there seem to be a problem with
>>>>>>> bugs.sun.com - there is no response from any reviewer after
>>>>>>> several weeks and we were able to help to find and fix javadoc and
>>>>>>> javac-compiler bugs early. So I hope you can help for this bug, too.
>>>>>>>
>>>>>>> Uwe
>>>>>>>
>>>>>>> -----
>>>>>>> Uwe Schindler
>>>>>>> uschindler@apache.org
>>>>>>> Apache Lucene PMC Member / Committer Bremen, Germany
>>>>>>> http://lucene.apache.org/
>>>>>>>
>>>>>>>
>>>>>>>               
>
>   


RE: JVM hanging when using G1GC on JDK8 b78 or b79 (Linux 32 bit)

Posted by Uwe Schindler <us...@apache.org>.
Hi,

That's unfortunately not so easy, because of project dependencies. To run the test you have to compile Lucene Core, then the specific module plus the test framework (which is special to Lucene), and download some JARs from Maven Central (JAR hell, as usual).
If you give me some time, I could collect all needed JAR files from my local checkout and provide you with the correct command line plus a ZIP file, maybe with a shell script to start it up. It should be doable, but it needs some work to collect all dependencies for the classpath.

If you want to do it quicker (should be quite fast to do):
- Download the ANT 1.8.2 binary distribution (unfortunately ANT 1.8.4 has a bug that prevents it from working out of the box with Java 8): http://archive.apache.org/dist/ant/binaries/apache-ant-1.8.2-bin.tar.gz - I just wonder: isn't ANT needed to build the JDK class library itself? I remember that the FreeBSD OpenJDK build downloads ANT and does a large part of the compilation using ANT...
- put the ANT bin/ dir into your PATH
- download the Apache Lucene source code from Jenkins: https://builds.apache.org/job/Lucene-Artifacts-trunk/2212/artifact/lucene/dist/lucene-5.0-2013-03-05_15-37-06-src.tgz
- go to extracted lucene source dir, call "ant ivy-bootstrap" (this will download Apache IVY, so all dependencies can be downloaded from Maven Central)
- change to the module that fails: # cd analysis/uima
- execute: # ant -Dargs="-server -XX:+UseG1GC" -Dtests.multiplier=3 -Dtests.jvms=1 test
- In a parallel console you might be able to attach to the hanging process. Note that the build in the main console runs inside ANT, and the test framework spawns separate worker JVMs to execute the tests. This makes it hard to reproduce standalone (the command line passed to the child JVM is veeeeery long).
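
A minimal sketch of the last command of the how-to above, assuming a POSIX shell run from the lucene/analysis/uima directory. The function name repro_cmd is hypothetical (it is not part of the Lucene build); it only prints the ant command line for a caller-chosen set of JVM flags, so the hanging G1GC run and a control run with another collector differ in nothing but those flags.

```shell
#!/bin/sh
# Hypothetical helper (not part of Lucene): print the ant command from the
# how-to, passing the given JVM flags to the forked test JVM via -Dargs.
# With no arguments it defaults to the failing G1GC configuration.
repro_cmd() {
  if [ "$#" -gt 0 ]; then
    jvm_flags="$*"
  else
    jvm_flags="-XX:+UseG1GC"
  fi
  printf 'ant -Dargs="-server %s" -Dtests.multiplier=3 -Dtests.jvms=1 test\n' "$jvm_flags"
}

# Failing configuration, a control run with CMS, and G1 with a larger
# marking stack (the workaround suggested later in the thread; untested here).
repro_cmd
repro_cmd -XX:+UseConcMarkSweepGC
repro_cmd -XX:+UseG1GC -XX:MarkStackSize=4M
```

To launch a printed command, pass it through eval, e.g. eval "$(repro_cmd)". Printing before running keeps the exact JVM flags visible when comparing runs.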

I will work on putting together a precompiled ZIP file with all needed JARs plus the command line. Just tell me if you manage with the above how-to; then I don't need to do this.
Uwe

-----
Uwe Schindler
uschindler@apache.org 
Apache Lucene PMC Member / Committer
Bremen, Germany
http://lucene.apache.org/


> -----Original Message-----
> From: John Cuthbertson [mailto:john.cuthbertson@oracle.com]
> Sent: Wednesday, March 06, 2013 7:51 PM
> To: Uwe Schindler
> Cc: 'Bengt Rutisson'; hotspot-gc-dev@openjdk.java.net;
> dev@lucene.apache.org
> Subject: Re: JVM hanging when using G1GC on JDK8 b78 or b79 (Linux 32 bit)
> 
> Hi Uwe,
> 
> I've downloaded lucene-5.0-2013-03-05_15-37-06.zip from
> https://builds.apache.org/job/Lucene-Artifacts-trunk/2212/artifact/lucene/dist/
> 
> I don't have ant on my workstation so do you have a java command line to
> run the test(s) that generate the error?
> 
> Thanks,
> 
> JohnC
> 
> On 3/6/2013 3:16 AM, Uwe Schindler wrote:
> > Hi,
> >
> >> I think this is a VM bug and the thread dumps that Uwe produced are
> >> enough to start tracking down the root cause.
> > I hope it is enough! If I can help with more details, tell me what I should do
> > to track this down. Unfortunately, we have no isolated test case (like a small
> > Java class that triggers this bug) - you have to run the test cases of this
> > Lucene module. It only happens there, not in any other Lucene test suite. It
> > may be caused by a lot of GC activity in this "UIMA" module or a specific test.
> >
> >> On 3/6/13 8:52 AM, David Holmes wrote:
> >>> If the VM is completely unresponsive then it suggests we are at a
> >>> safepoint.
> >> Yes, we are hanging during a stop-the-world GC, so we are at a safepoint.
> >>
> >>> The GC threads are not "hung" in os::park, they are parked - waiting
> >>> to be notified of something.
> >> It looks like the reference processing thread is stuck in a loop
> >> where it does wait(). So, the VM is hanging even if that stack trace
> >> also ends up in os::park().
> >>
> >>> The thing is to find out why they are not being woken up.
> >> Actually, in this case we should probably not even be calling wait...
> >>
> >>> Can the gdb log be posted somewhere? I don't know if the attachment
> >>> made it to the original posting on hotspot-gc but it's no longer
> >>> available on hotspot-dev.
> >> I received the attachment with the original email. I've attached it
> >> to the bug report that I created: 8009536. You can find it there if
> >> you want to. But I think we have a fairly good idea of what change
> >> caused the hang.
> > If it helps: Unfortunately, we had some problems with recent JDK builds,
> > because javac and javadoc tools were not working correctly, failing to build
> > our source code. Since b78 this was fixed. Until this was fixed, we used build
> > b65 (which was the last one working) and the G1GC hangs did not appear on
> > this version. So it must have been caused by a change between b65 and b78.
> >
> > Uwe
> >
> >> Bengt
> >>
> >>> Thanks,
> >>> David
> >>>
> >>> On 6/03/2013 4:07 PM, Krystal Mok wrote:
> >>>> Hi Uwe,
> >>>>
> >>>> If you can attach gdb to it, then jstack -m and jstack -F should
> >>>> also work; that'll get you the Java stack trace.
> >>>> (But it probably doesn't matter in this case, because the hang is
> >>>> probably a bug in the VM.)
> >>>>
> >>>> - Kris
> >>>>


---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org


Re: JVM hanging when using G1GC on JDK8 b78 or b79 (Linux 32 bit)

Posted by John Cuthbertson <jo...@oracle.com>.
Hi Uwe,

I've downloaded lucene-5.0-2013-03-05_15-37-06.zip from 
https://builds.apache.org/job/Lucene-Artifacts-trunk/2212/artifact/lucene/dist/

I don't have ant on my workstation so do you have a java command line to 
run the test(s) that generate the error?

Thanks,

JohnC

On 3/6/2013 3:16 AM, Uwe Schindler wrote:
> Hi,
>   
>> I think this is a VM bug and the thread dumps that Uwe produced are enough
>> to start tracking down the root cause.
> I hope it is enough! If I can help with more details, tell me what I should do to track this down. Unfortunately, we have no isolated test case (like a small Java class that triggers this bug) - you have to run the test cases of this Lucene module. It only happens there, not in any other Lucene test suite. It may be caused by a lot of GC activity in this "UIMA" module or a specific test.
>
>> On 3/6/13 8:52 AM, David Holmes wrote:
>>> If the VM is completely unresponsive then it suggests we are at a
>>> safepoint.
>> Yes, we are hanging during a stop-the-world GC, so we are at a safepoint.
>>
>>> The GC threads are not "hung" in os::park, they are parked - waiting
>>> to be notified of something.
>> It looks like the reference processing thread is stuck in a loop where it does
>> wait(). So, the VM is hanging even if that stack trace also ends up in
>> os::park().
>>
>>> The thing is to find out why they are not being woken up.
>> Actually, in this case we should probably not even be calling wait...
>>
>>> Can the gdb log be posted somewhere? I don't know if the attachment
>>> made it to the original posting on hotspot-gc but it's no longer
>>> available on hotspot-dev.
>> I received the attachment with the original email. I've attached it to
>> the bug report that I created: 8009536. You can find it there if you
>> want to. But I think we have a fairly good idea of what change caused
>> the hang.
> If it helps: Unfortunately, we had some problems with recent JDK builds, because javac and javadoc tools were not working correctly, failing to build our source code. Since b78 this was fixed. Until this was fixed, we used build b65 (which was the last one working) and the G1GC hangs did not appear on this version. So it must have been caused by a change between b65 and b78.
>
> Uwe
>
>> Bengt
>>
>>> Thanks,
>>> David
>>>
>>> On 6/03/2013 4:07 PM, Krystal Mok wrote:
>>>> Hi Uwe,
>>>>
>>>> If you can attach gdb to it, then jstack -m and jstack -F should also
>>>> work; that'll get you the Java stack trace.
>>>> (But it probably doesn't matter in this case, because the hang is
>>>> probably a bug in the VM.)
>>>>
>>>> - Kris
>>>>




RE: JVM hanging when using G1GC on JDK8 b78 or b79 (Linux 32 bit)

Posted by Uwe Schindler <us...@apache.org>.
Hi,
 
> I think this is a VM bug and the thread dumps that Uwe produced are enough
> to start tracking down the root cause.

I hope it is enough! If I can help with more details, tell me what I should do to track this down. Unfortunately, we have no isolated test case (like a small Java class that triggers this bug) - you have to run the test cases of this Lucene module. It only happens there, not in any other Lucene test suite. It may be caused by a lot of GC activity in this "UIMA" module or a specific test.

> On 3/6/13 8:52 AM, David Holmes wrote:
> > If the VM is completely unresponsive then it suggests we are at a
> > safepoint.
> Yes, we are hanging during a stop-the-world GC, so we are at a safepoint.
> 
> >
> > The GC threads are not "hung" in os::park, they are parked - waiting
> > to be notified of something.
> 
> It looks like the reference processing thread is stuck in a loop where it does
> wait(). So, the VM is hanging even if that stack trace also ends up in
> os::park().
> 
> >
> > The thing is to find out why they are not being woken up.
> 
> Actually, in this case we should probably not even be calling wait...
> 
> >
> > Can the gdb log be posted somewhere? I don't know if the attachment
> > made it to the original posting on hotspot-gc but it's no longer
> > available on hotspot-dev.
> 
> I received the attachment with the original email. I've attached it to
> the bug report that I created: 8009536. You can find it there if you
> want to. But I think we have a fairly good idea of what change caused
> the hang.

If it helps: Unfortunately, we had some problems with recent JDK builds, because javac and javadoc tools were not working correctly, failing to build our source code. Since b78 this was fixed. Until this was fixed, we used build b65 (which was the last one working) and the G1GC hangs did not appear on this version. So it must have been caused by a change between b65 and b78.
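
Since b65 is the last build known to pass and b78 the first known to hang, one hypothetical way to narrow the window further is a bisection over the builds in between. A minimal sketch, assuming every build number in the window is available and that the caller supplies is_bad, a hypothetical function that selects JDK 8 build $1, runs the UIMA tests with a timeout, and returns 0 if the child JVM hangs:

```shell
#!/bin/sh
# Hypothetical sketch: binary search for the first hanging JDK 8 build.
# Assumes is_bad BUILD is defined by the caller and returns 0 (true) when
# build BUILD hangs, and that every build number in the range can be tested.
first_bad_build() {
  good=$1   # last build known to pass, e.g. 65
  bad=$2    # first build known to hang, e.g. 78
  # Invariant: build $good passes and build $bad hangs.
  while [ $((bad - good)) -gt 1 ]; do
    mid=$(( (good + bad) / 2 ))
    if is_bad "$mid"; then
      bad=$mid
    else
      good=$mid
    fi
  done
  # The first bad build is the one right after the last good one.
  echo "b$bad"
}
```

For this window, first_bad_build 65 78 needs at most four test runs instead of testing each of the twelve builds b66 to b77 one by one.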

Uwe

> Bengt
> 
> >
> > Thanks,
> > David
> >
> > On 6/03/2013 4:07 PM, Krystal Mok wrote:
> >> Hi Uwe,
> >>
> >> If you can attach gdb to it, then jstack -m and jstack -F should also
> >> work; that'll get you the Java stack trace.
> >> (But it probably doesn't matter in this case, because the hang is
> >> probably a bug in the VM.)
> >>
> >> - Kris
> >>




Re: JVM hanging when using G1GC on JDK8 b78 or b79 (Linux 32 bit)

Posted by Simon Willnauer <si...@gmail.com>.
Uwe, you rock! Besides being my morning entertainment, this was an awesome job!

simon

On Tue, Mar 12, 2013 at 8:31 AM, Tommaso Teofili
<to...@gmail.com> wrote:
> thanks Uwe!
>
>
> 2013/3/12 Robert Muir <rc...@gmail.com>
>>
>> Uwe: Thanks for working with them to get all these issues fixed.
>>
>> On Mon, Mar 11, 2013 at 7:34 PM, Uwe Schindler <uw...@thetaphi.de> wrote:
>> > Hi,
>> >
>> > FYI, Oracle has a fix for the G1GC hang in UIMA waiting for review:
>> >
>> > Issue: http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=8009536
>> > Webrev:
>> > http://mail.openjdk.java.net/pipermail/hotspot-gc-dev/2013-March/006215.html
>> > Patch: http://cr.openjdk.java.net/~johnc/8009536/webrev.0/
>> >
>> > Thanks to John Cuthbertson and Bengt Rutisson @ Oracle for fixing so
>> > fast! We just have to wait for a new JDK8 build with that fix included (and
>> > some more for the other Lucene-related bugs).
>> > Uwe
>> >
>> > -----
>> > Uwe Schindler
>> > H.-H.-Meier-Allee 63, D-28213 Bremen
>> > http://www.thetaphi.de
>> > eMail: uwe@thetaphi.de
>> >
>> >
>> >> -----Original Message-----
>> >> From: Mark Miller [mailto:markrmiller@gmail.com]
>> >> Sent: Wednesday, March 06, 2013 7:52 PM
>> >> To: dev@lucene.apache.org
>> >> Subject: Re: JVM hanging when using G1GC on JDK8 b78 or b79 (Linux 32
>> >> bit)
>> >>
>> >> Awesome work Uwe! Nice job getting this some attention.
>> >>
>> >> - mark
>> >>
>> >> On Mar 6, 2013, at 10:41 AM, Uwe Schindler <uw...@thetaphi.de> wrote:
>> >>
>> >> > It seems that there is already an explanation from the Oracle
>> >> > engineer:
>> >> >
>> >> >> -----Original Message-----
>> >> >> From: John Cuthbertson [mailto:john.cuthbertson@oracle.com]
>> >> >> Sent: Wednesday, March 06, 2013 7:04 PM
>> >> >> To: Thomas Schatzl
>> >> >> Cc: Uwe Schindler; hotspot-gc-dev@openjdk.java.net; 'David Holmes';
>> >> >> 'Dawid Weiss'; hotspot-dev@openjdk.java.net
>> >> >> Subject: Re: JVM hanging when using G1GC on JDK8 b78 or b79 (Linux 32 bit)
>> >> >>
>> >> >> Hi Everyone,
>> >> >>
>> >> >> All:
>> >> >> I've looked at the bug report (haven't tried to reproduce it yet) and
>> >> >> Bengt's analysis is correct. The concurrent mark thread is entering
>> >> >> the synchronization protocol in a marking step call. That code is
>> >> >> waiting for some non-existent workers to terminate before proceeding.
>> >> >> Normally we shouldn't be entering that code but I think we overflowed
>> >> >> the global marking stack (I updated the CR at ~1am my time with that
>> >> >> conjecture). I think I missed a set_phase() call to tell the parallel
>> >> >> terminator that we only have one thread and it's picking up the
>> >> >> number of workers that executed the remark parallel task.
>> >> >>
>> >> >> Thomas: you were on the right track with your comment about the
>> >> >> marking stack size.
>> >> >>
>> >> >> David:
>> >> >> Thanks for helping out here. The stack trace you mentioned was for
>> >> >> one of the refinement threads - a concurrent GC thread. When a
>> >> >> concurrent GC thread "joins" the suspendible thread set, it means
>> >> >> that it will observe and participate in safepoint operations, i.e.
>> >> >> the thread will notice that it should reach a safepoint and the
>> >> >> safepoint synchronizer code will wait for it to block.
>> >> >> When we wish a concurrent GC thread to not observe safepoints, that
>> >> >> thread leaves the suspendible thread set. I think the name could be a
>> >> >> bit better and Tony, before he left, had a change that used a scoped
>> >> >> object to join and leave the STS that hasn't been integrated yet.
>> >> >> IIRC Tony wasn't happy with the name he chose for that also.
>> >> >>
>> >> >> Uwe:
>> >> >> Thanks for bringing this up and my apologies for not replying sooner.
>> >> >> I will have a fix fairly soon. If I'm correct about it being caused
>> >> >> by overflowing the marking stack, you can work around the issue by
>> >> >> increasing MarkStackSize. You could try increasing it to 2M or 4M
>> >> >> entries (which is the current max size).
>> >> >>
>> >> >> Cheers,
>> >> >>
>> >> >> JohnC
>> >> >
>> >> > -----
>> >> > Uwe Schindler
>> >> > H.-H.-Meier-Allee 63, D-28213 Bremen
>> >> > http://www.thetaphi.de
>> >> > eMail: uwe@thetaphi.de
>> >> >
>> >> >
>> >> >> -----Original Message-----
>> >> >> From: Uwe Schindler [mailto:uwe@thetaphi.de]
>> >> >> Sent: Wednesday, March 06, 2013 1:35 PM
>> >> >> To: dev@lucene.apache.org
>> >> >> Subject: FW: JVM hanging when using G1GC on JDK8 b78 or b79 (Linux 32 bit)
>> >> >>
>> >> >> They already understood the G1GC problem with JDK 8 b78/b79 and
>> >> >> are working on a fix. This was really fast:
>> >> >> http://mail.openjdk.java.net/pipermail/hotspot-gc-dev/2013-March/006128.html
>> >> >>
>> >> >> Uwe
>> >> >>
>> >> >> -----
>> >> >> Uwe Schindler
>> >> >> H.-H.-Meier-Allee 63, D-28213 Bremen
>> >> >> http://www.thetaphi.de
>> >> >> eMail: uwe@thetaphi.de
>> >> >>
>> >> >
>> >> >
>> >> >
>> >> >
>> >>
>> >>
>> >
>> >
>> >
>>
>>
>



Re: JVM hanging when using G1GC on JDK8 b78 or b79 (Linux 32 bit)

Posted by Tommaso Teofili <to...@gmail.com>.
thanks Uwe!


2013/3/12 Robert Muir <rc...@gmail.com>

> Uwe: Thanks for working with them to get all these issues fixed.
>
> On Mon, Mar 11, 2013 at 7:34 PM, Uwe Schindler <uw...@thetaphi.de> wrote:
> > Hi,
> >
> > FYI, Oracle has a fix for the G1GC hang in UIMA waiting for review:
> >
> > Issue: http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=8009536
> > Webrev:
> > http://mail.openjdk.java.net/pipermail/hotspot-gc-dev/2013-March/006215.html
> > Patch: http://cr.openjdk.java.net/~johnc/8009536/webrev.0/
> >
> > Thanks to John Cuthbertson and Bengt Rutisson @ Oracle for fixing so
> > fast! We just have to wait for a new JDK8 build with that fix included (and
> > some more for the other Lucene-related bugs).
> > Uwe
> >
> > -----
> > Uwe Schindler
> > H.-H.-Meier-Allee 63, D-28213 Bremen
> > http://www.thetaphi.de
> > eMail: uwe@thetaphi.de
> >
> >
> >> -----Original Message-----
> >> From: Mark Miller [mailto:markrmiller@gmail.com]
> >> Sent: Wednesday, March 06, 2013 7:52 PM
> >> To: dev@lucene.apache.org
> >> Subject: Re: JVM hanging when using G1GC on JDK8 b78 or b79 (Linux 32 bit)
> >>
> >> Awesome work Uwe! Nice job getting this some attention.
> >>
> >> - mark
> >>
> >> On Mar 6, 2013, at 10:41 AM, Uwe Schindler <uw...@thetaphi.de> wrote:
> >>
> >> > It seems that there is already an explanation from the Oracle engineer:
> >> >
> >> >> -----Original Message-----
> >> >> From: John Cuthbertson [mailto:john.cuthbertson@oracle.com]
> >> >> Sent: Wednesday, March 06, 2013 7:04 PM
> >> >> To: Thomas Schatzl
> >> >> Cc: Uwe Schindler; hotspot-gc-dev@openjdk.java.net; 'David Holmes';
> >> >> 'Dawid Weiss'; hotspot-dev@openjdk.java.net
> >> >> Subject: Re: JVM hanging when using G1GC on JDK8 b78 or b79 (Linux 32
> >> >> bit)
> >> >>
> >> >> Hi Everyone,
> >> >>
> >> >> All:
> >> >> I've looked at the bug report (haven't tried to reproduce it yet) and
> >> >> Bengt's analysis is correct. The concurrent mark thread is entering
> >> >> the synchronization protocol in a marking step call. That code is
> >> >> waiting for some non-existent workers to terminate before proceeding.
> >> >> Normally we shouldn't be entering that code but I think we overflowed
> >> >> the global marking stack (I updated the CR at ~1am my time with that
> >> >> conjecture). I think I missed a set_phase() call to tell the parallel
> >> >> terminator that we only have one thread and it's picking up the
> >> >> number of workers that executed the remark parallel task.
> >> >>
> >> >> Thomas: you were on the right track with your comment about the
> >> >> marking stack size.
> >> >>
> >> >> David:
> >> >> Thanks for helping out here. The stack trace you mentioned was for
> >> >> one of the refinement threads - a concurrent GC thread. When a
> >> >> concurrent GC thread "joins" the suspendible thread set, it means
> >> >> that it will observe and participate in safepoint operations, i.e.
> >> >> the thread will notice that it should reach a safepoint and the
> >> >> safepoint synchronizer code will wait for it to block.
> >> >> When we wish a concurrent GC thread to not observe safepoints, that
> >> >> thread leaves the suspendible thread set. I think the name could be a
> >> >> bit better and Tony, before he left, had a change that used a scoped
> >> >> object to join and leave the STS that hasn't been integrated yet.
> >> >> IIRC Tony wasn't happy with the name he chose for that also.
> >> >>
> >> >> Uwe:
> >> >> Thanks for bringing this up and my apologies for not replying sooner.
> >> >> I will have a fix fairly soon. If I'm correct about it being caused
> >> >> by overflowing the marking stack, you can work around the issue by
> >> >> increasing the MarkStackSize. You could try increasing it to 2M or 4M
> >> >> entries (which is the current max size).
> >> >>
> >> >> Cheers,
> >> >>
> >> >> JohnC
> >> >
> >> > -----
> >> > Uwe Schindler
> >> > H.-H.-Meier-Allee 63, D-28213 Bremen
> >> > http://www.thetaphi.de
> >> > eMail: uwe@thetaphi.de
> >> >
> >> >
> >> >> -----Original Message-----
> >> >> From: Uwe Schindler [mailto:uwe@thetaphi.de]
> >> >> Sent: Wednesday, March 06, 2013 1:35 PM
> >> >> To: dev@lucene.apache.org
> >> >> Subject: FW: JVM hanging when using G1GC on JDK8 b78 or b79 (Linux 32
> >> >> bit)
> >> >>
> >> >> They already understand the G1GC problem with JDK 8 b78/b79 and
> >> >> are working on a fix. This was really fast:
> >> >> http://mail.openjdk.java.net/pipermail/hotspot-gc-dev/2013-March/006128.html
> >> >>
> >> >> Uwe
> >> >>
> >> >> -----
> >> >> Uwe Schindler
> >> >> H.-H.-Meier-Allee 63, D-28213 Bremen
> >> >> http://www.thetaphi.de
> >> >> eMail: uwe@thetaphi.de
> >> >>
> >> >
> >> >
> >> >
> >> > ---------------------------------------------------------------------
> >> > To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org For
> >> > additional commands, e-mail: dev-help@lucene.apache.org
> >> >
> >>
> >>
> >
> >
> >
>
>
>

Re: JVM hanging when using G1GC on JDK8 b78 or b79 (Linux 32 bit)

Posted by Robert Muir <rc...@gmail.com>.
Uwe: Thanks for working with them to get all these issues fixed.

On Mon, Mar 11, 2013 at 7:34 PM, Uwe Schindler <uw...@thetaphi.de> wrote:
> Hi,
>
> FYI, Oracle has a fix for the G1GC hang in UIMA waiting for review:
>
> Issue: http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=8009536
> Webrev: http://mail.openjdk.java.net/pipermail/hotspot-gc-dev/2013-March/006215.html
> Patch: http://cr.openjdk.java.net/~johnc/8009536/webrev.0/
>
> Thanks to John Cuthbertson and Bengt Rutisson @ Oracle for fixing so fast! We just have to wait for a new JDK8 build with that fix included (and some more for the other Lucene-related bugs).
> Uwe
>
> -----
> Uwe Schindler
> H.-H.-Meier-Allee 63, D-28213 Bremen
> http://www.thetaphi.de
> eMail: uwe@thetaphi.de
>
>
>> -----Original Message-----
>> From: Mark Miller [mailto:markrmiller@gmail.com]
>> Sent: Wednesday, March 06, 2013 7:52 PM
>> To: dev@lucene.apache.org
>> Subject: Re: JVM hanging when using G1GC on JDK8 b78 or b79 (Linux 32 bit)
>>
>> Awesome work Uwe! Nice job getting this some attention.
>>
>> - mark
>>
>> On Mar 6, 2013, at 10:41 AM, Uwe Schindler <uw...@thetaphi.de> wrote:
>>
>> > It seems that there is already an explanation from the Oracle engineer:
>> >
>> >> -----Original Message-----
>> >> From: John Cuthbertson [mailto:john.cuthbertson@oracle.com]
>> >> Sent: Wednesday, March 06, 2013 7:04 PM
>> >> To: Thomas Schatzl
>> >> Cc: Uwe Schindler; hotspot-gc-dev@openjdk.java.net; 'David Holmes';
>> >> 'Dawid Weiss'; hotspot-dev@openjdk.java.net
>> >> Subject: Re: JVM hanging when using G1GC on JDK8 b78 or b79 (Linux 32
>> >> bit)
>> >>
>> >> Hi Everyone,
>> >>
>> >> All:
>> >> I've looked at the bug report (haven't tried to reproduce it yet) and
>> >> Bengt's analysis is correct. The concurrent mark thread is entering
>> >> the synchronization protocol in a marking step call. That code is
>> >> waiting for some non-existent workers to terminate before proceeding.
>> >> Normally we shouldn't be entering that code but I think we overflowed
>> >> the global marking stack (I updated the CR at ~1am my time with that
>> >> conjecture). I think I missed a set_phase() call to tell the parallel
>> >> terminator that we only have one thread and it's picking up the
>> >> number of workers that executed the remark parallel task.
>> >>
>> >> Thomas: you were on the right track with your comment about the
>> >> marking stack size.
>> >>
>> >> David:
>> >> Thanks for helping out here. The stack trace you mentioned was for
>> >> one of the refinement threads - a concurrent GC thread. When a
>> >> concurrent GC thread "joins" the suspendible thread set, it means
>> >> that it will observe and participate in safepoint operations, i.e.
>> >> the thread will notice that it should reach a safepoint and the safepoint
>> >> synchronizer code will wait for it to block.
>> >> When we wish a concurrent GC thread to not observe safepoints, that
>> >> thread leaves the suspendible thread set. I think the name could be a
>> >> bit better and Tony, before he left, had a change that used a scoped
>> >> object to join and leave the STS that hasn't been integrated yet.
>> >> IIRC Tony wasn't happy with the name he chose for that also.
>> >>
>> >> Uwe:
>> >> Thanks for bringing this up and my apologies for not replying sooner.
>> >> I will have a fix fairly soon. If I'm correct about it being caused
>> >> by overflowing the marking stack, you can work around the issue by
>> >> increasing the MarkStackSize. You could try increasing it to 2M or 4M
>> >> entries (which is the current max size).
>> >>
>> >> Cheers,
>> >>
>> >> JohnC
>> >
>> > -----
>> > Uwe Schindler
>> > H.-H.-Meier-Allee 63, D-28213 Bremen
>> > http://www.thetaphi.de
>> > eMail: uwe@thetaphi.de
>> >
>> >
>> >> -----Original Message-----
>> >> From: Uwe Schindler [mailto:uwe@thetaphi.de]
>> >> Sent: Wednesday, March 06, 2013 1:35 PM
>> >> To: dev@lucene.apache.org
>> >> Subject: FW: JVM hanging when using G1GC on JDK8 b78 or b79 (Linux 32
>> >> bit)
>> >>
>> >> They already understand the G1GC problem with JDK 8 b78/b79 and
>> >> are working on a fix. This was really fast:
>> >> http://mail.openjdk.java.net/pipermail/hotspot-gc-dev/2013-March/006128.html
>> >>
>> >> Uwe
>> >>
>> >> -----
>> >> Uwe Schindler
>> >> H.-H.-Meier-Allee 63, D-28213 Bremen
>> >> http://www.thetaphi.de
>> >> eMail: uwe@thetaphi.de
>> >>
>> >
>> >
>> >
>> >
>>
>>
>
>
>



RE: JVM hanging when using G1GC on JDK8 b78 or b79 (Linux 32 bit)

Posted by Uwe Schindler <uw...@thetaphi.de>.
Hi,

FYI, Oracle has a fix for the G1GC hang in UIMA waiting for review:

Issue: http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=8009536
Webrev: http://mail.openjdk.java.net/pipermail/hotspot-gc-dev/2013-March/006215.html
Patch: http://cr.openjdk.java.net/~johnc/8009536/webrev.0/

Thanks to John Cuthbertson and Bengt Rutisson @ Oracle for fixing so fast! We just have to wait for a new JDK8 build with that fix included (and some more for the other Lucene-related bugs).
Uwe

-----
Uwe Schindler
H.-H.-Meier-Allee 63, D-28213 Bremen
http://www.thetaphi.de
eMail: uwe@thetaphi.de


> -----Original Message-----
> From: Mark Miller [mailto:markrmiller@gmail.com]
> Sent: Wednesday, March 06, 2013 7:52 PM
> To: dev@lucene.apache.org
> Subject: Re: JVM hanging when using G1GC on JDK8 b78 or b79 (Linux 32 bit)
> 
> Awesome work Uwe! Nice job getting this some attention.
> 
> - mark
> 
> On Mar 6, 2013, at 10:41 AM, Uwe Schindler <uw...@thetaphi.de> wrote:
> 
> > It seems that there is already an explanation from the Oracle engineer:
> >
> >> -----Original Message-----
> >> From: John Cuthbertson [mailto:john.cuthbertson@oracle.com]
> >> Sent: Wednesday, March 06, 2013 7:04 PM
> >> To: Thomas Schatzl
> >> Cc: Uwe Schindler; hotspot-gc-dev@openjdk.java.net; 'David Holmes';
> >> 'Dawid Weiss'; hotspot-dev@openjdk.java.net
> >> Subject: Re: JVM hanging when using G1GC on JDK8 b78 or b79 (Linux 32
> >> bit)
> >>
> >> Hi Everyone,
> >>
> >> All:
> >> I've looked at the bug report (haven't tried to reproduce it yet) and
> >> Bengt's analysis is correct. The concurrent mark thread is entering
> >> the synchronization protocol in a marking step call. That code is
> >> waiting for some non-existent workers to terminate before proceeding.
> >> Normally we shouldn't be entering that code but I think we overflowed
> >> the global marking stack (I updated the CR at ~1am my time with that
> >> conjecture). I think I missed a set_phase() call to tell the parallel
> >> terminator that we only have one thread and it's picking up the
> >> number of workers that executed the remark parallel task.
> >>
> >> Thomas: you were on the right track with your comment about the
> >> marking stack size.
> >>
> >> David:
> >> Thanks for helping out here. The stack trace you mentioned was for
> >> one of the refinement threads - a concurrent GC thread. When a
> >> concurrent GC thread "joins" the suspendible thread set, it means
> >> that it will observe and participate in safepoint operations, i.e.
> >> the thread will notice that it should reach a safepoint and the safepoint
> >> synchronizer code will wait for it to block.
> >> When we wish a concurrent GC thread to not observe safepoints, that
> >> thread leaves the suspendible thread set. I think the name could be a
> >> bit better and Tony, before he left, had a change that used a scoped
> >> object to join and leave the STS that hasn't been integrated yet.
> >> IIRC Tony wasn't happy with the name he chose for that also.
> >>
> >> Uwe:
> >> Thanks for bringing this up and my apologies for not replying sooner.
> >> I will have a fix fairly soon. If I'm correct about it being caused
> >> by overflowing the marking stack, you can work around the issue by
> >> increasing the MarkStackSize. You could try increasing it to 2M or 4M
> >> entries (which is the current max size).
> >>
> >> Cheers,
> >>
> >> JohnC
> >
> > -----
> > Uwe Schindler
> > H.-H.-Meier-Allee 63, D-28213 Bremen
> > http://www.thetaphi.de
> > eMail: uwe@thetaphi.de
> >
> >
> >> -----Original Message-----
> >> From: Uwe Schindler [mailto:uwe@thetaphi.de]
> >> Sent: Wednesday, March 06, 2013 1:35 PM
> >> To: dev@lucene.apache.org
> >> Subject: FW: JVM hanging when using G1GC on JDK8 b78 or b79 (Linux 32
> >> bit)
> >>
> >> They already understand the G1GC problem with JDK 8 b78/b79 and
> >> are working on a fix. This was really fast:
> >> http://mail.openjdk.java.net/pipermail/hotspot-gc-dev/2013-March/006128.html
> >>
> >> Uwe
> >>
> >> -----
> >> Uwe Schindler
> >> H.-H.-Meier-Allee 63, D-28213 Bremen
> >> http://www.thetaphi.de
> >> eMail: uwe@thetaphi.de
> >>
> >
> >
> >
> >
> 
> 



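John's suggested workaround, quoted in the thread, is to raise MarkStackSize (to 2M or 4M entries) until a JDK 8 build with the fix is available. One way to confirm that a flag such as `-XX:MarkStackSize=4M` actually reached the forked test JVM is to read it back through HotSpot's diagnostic MXBean from inside that JVM. This is a minimal sketch, assuming a HotSpot-based JDK 7 or later (it uses the JDK-specific `com.sun.management` API); the class and method names are made up for illustration, and the flag's units and limits may differ between JDK versions.

```java
import com.sun.management.HotSpotDiagnosticMXBean;
import com.sun.management.VMOption;
import java.lang.management.ManagementFactory;

public class MarkStackSizeCheck {

    /**
     * Returns "value (origin: ...)" for a HotSpot -XX option, or null if this VM
     * does not expose the diagnostic MXBean or does not know the option.
     */
    public static String vmOption(String name) {
        try {
            HotSpotDiagnosticMXBean hotspot =
                    ManagementFactory.getPlatformMXBean(HotSpotDiagnosticMXBean.class);
            if (hotspot == null) {
                return null; // management interface not implemented by this VM
            }
            VMOption option = hotspot.getVMOption(name);
            return option.getValue() + " (origin: " + option.getOrigin() + ")";
        } catch (IllegalArgumentException e) {
            return null; // unknown option name, or not a platform MXBean on this VM
        }
    }

    public static void main(String[] args) {
        // Start the JVM with e.g. -XX:MarkStackSize=4M to see the raised value here.
        System.out.println("MarkStackSize    = " + vmOption("MarkStackSize"));
        System.out.println("MarkStackSizeMax = " + vmOption("MarkStackSizeMax"));
    }
}
```

Calling this from a test in the hanging module (with the flag appended to the `-Dargs` value of the ant command) would show whether the setting took effect before judging whether the workaround helps.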

Re: JVM hanging when using G1GC on JDK8 b78 or b79 (Linux 32 bit)

Posted by Mark Miller <ma...@gmail.com>.
Awesome work Uwe! Nice job getting this some attention.

- mark

On Mar 6, 2013, at 10:41 AM, Uwe Schindler <uw...@thetaphi.de> wrote:

> It seems that there is already an explanation from the Oracle engineer:
> 
>> -----Original Message-----
>> From: John Cuthbertson [mailto:john.cuthbertson@oracle.com]
>> Sent: Wednesday, March 06, 2013 7:04 PM
>> To: Thomas Schatzl
>> Cc: Uwe Schindler; hotspot-gc-dev@openjdk.java.net; 'David Holmes';
>> 'Dawid Weiss'; hotspot-dev@openjdk.java.net
>> Subject: Re: JVM hanging when using G1GC on JDK8 b78 or b79 (Linux 32 bit)
>> 
>> Hi Everyone,
>> 
>> All:
>> I've looked at the bug report (haven't tried to reproduce it yet) and Bengt's
>> analysis is correct. The concurrent mark thread is entering the
>> synchronization protocol in a marking step call. That code is waiting for some
>> non-existent workers to terminate before proceeding. Normally we
>> shouldn't be entering that code but I think we overflowed the global marking
>> stack (I updated the CR at ~1am my time with that conjecture). I think I
>> missed a set_phase() call to tell the parallel terminator that we only have one
>> thread and it's picking up the number of workers that executed the remark
>> parallel task.
>> 
>> Thomas: you were on the right track with your comment about the marking
>> stack size.
>> 
>> David:
>> Thanks for helping out here. The stack trace you mentioned was for one of the
>> refinement threads - a concurrent GC thread. When a concurrent GC thread
>> "joins" the suspendible thread set, it means that it will observe and
>> participate in safepoint operations, i.e. the thread will notice that it should
>> reach a safepoint and the safepoint synchronizer code will wait for it to block.
>> When we wish a concurrent GC thread to not observe safepoints, that
>> thread leaves the suspendible thread set. I think the name could be a bit
>> better and Tony, before he left, had a change that used a scoped object to
>> join and leave the STS that hasn't been integrated yet. IIRC Tony wasn't
>> happy with the name he chose for that also.
>> 
>> Uwe:
>> Thanks for bringing this up and my apologies for not replying sooner. I will
>> have a fix fairly soon. If I'm correct about it being caused by overflowing the
>> marking stack, you can work around the issue by increasing the
>> MarkStackSize. You could try increasing it to 2M or 4M entries (which is the
>> current max size).
>> 
>> Cheers,
>> 
>> JohnC
> 
> -----
> Uwe Schindler
> H.-H.-Meier-Allee 63, D-28213 Bremen
> http://www.thetaphi.de
> eMail: uwe@thetaphi.de
> 
> 
>> -----Original Message-----
>> From: Uwe Schindler [mailto:uwe@thetaphi.de]
>> Sent: Wednesday, March 06, 2013 1:35 PM
>> To: dev@lucene.apache.org
>> Subject: FW: JVM hanging when using G1GC on JDK8 b78 or b79 (Linux 32 bit)
>> 
>> They already understand the G1GC problem with JDK 8 b78/b79 and are working
>> on a fix. This was really fast:
>> http://mail.openjdk.java.net/pipermail/hotspot-gc-dev/2013-March/006128.html
>> 
>> Uwe
>> 
>> -----
>> Uwe Schindler
>> H.-H.-Meier-Allee 63, D-28213 Bremen
>> http://www.thetaphi.de
>> eMail: uwe@thetaphi.de
>> 
> 
> 
> 
> 



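John explains that a concurrent GC thread joins the suspendible thread set to take part in safepoint synchronization, leaves it when it should not observe safepoints, and that Tony had an unintegrated change using a scoped object to join and leave. The sketch below is only a toy Java model of that scoped-object idea: try-with-resources guarantees the "leave" happens even on an early return or exception. `SuspendibleSetSketch` and `Joiner` are hypothetical names, and this is not HotSpot's C++ code.

```java
import java.util.concurrent.atomic.AtomicInteger;

/**
 * Toy model of a "suspendible thread set": a count of threads that a safepoint
 * coordinator would have to wait for. Hypothetical names; not HotSpot code.
 */
public class SuspendibleSetSketch {
    private final AtomicInteger joined = new AtomicInteger();

    public void join() { joined.incrementAndGet(); }   // thread now observes safepoints
    public void leave() { joined.decrementAndGet(); }  // thread no longer observes them
    public int joinedCount() { return joined.get(); }

    /** Scoped joiner: joins when created and leaves when closed, even on exceptions. */
    public final class Joiner implements AutoCloseable {
        private Joiner() { join(); }

        @Override
        public void close() { leave(); }
    }

    public Joiner joinScoped() { return new Joiner(); }

    public static void main(String[] args) {
        SuspendibleSetSketch sts = new SuspendibleSetSketch();
        try (Joiner scope = sts.joinScoped()) {
            System.out.println("inside scope: " + sts.joinedCount()); // prints "inside scope: 1"
        }
        System.out.println("after scope: " + sts.joinedCount());      // prints "after scope: 0"
    }
}
```

The point of the scoped form is that a forgotten `leave()` (for example on an error path) cannot leave a thread registered as a participant that the coordinator then waits for.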

RE: JVM hanging when using G1GC on JDK8 b78 or b79 (Linux 32 bit)

Posted by Uwe Schindler <uw...@thetaphi.de>.
It seems that there is already an explanation from the Oracle engineer:

> -----Original Message-----
> From: John Cuthbertson [mailto:john.cuthbertson@oracle.com]
> Sent: Wednesday, March 06, 2013 7:04 PM
> To: Thomas Schatzl
> Cc: Uwe Schindler; hotspot-gc-dev@openjdk.java.net; 'David Holmes';
> 'Dawid Weiss'; hotspot-dev@openjdk.java.net
> Subject: Re: JVM hanging when using G1GC on JDK8 b78 or b79 (Linux 32 bit)
> 
> Hi Everyone,
> 
> All:
> I've looked at the bug report (haven't tried to reproduce it yet) and Bengt's
> analysis is correct. The concurrent mark thread is entering the
> synchronization protocol in a marking step call. That code is waiting for some
> non-existent workers to terminate before proceeding. Normally we
> shouldn't be entering that code but I think we overflowed the global marking
> stack (I updated the CR at ~1am my time with that conjecture). I think I
> missed a set_phase() call to tell the parallel terminator that we only have one
> thread and it's picking up the number of workers that executed the remark
> parallel task.
> 
> Thomas: you were on the right track with your comment about the marking
> stack size.
> 
> David:
> Thanks for helping out here. The stack trace you mentioned was for one of the
> refinement threads - a concurrent GC thread. When a concurrent GC thread
> "joins" the suspendible thread set, it means that it will observe and
> participate in safepoint operations, i.e. the thread will notice that it should
> reach a safepoint and the safepoint synchronizer code will wait for it to block.
> When we wish a concurrent GC thread to not observe safepoints, that
> thread leaves the suspendible thread set. I think the name could be a bit
> better and Tony, before he left, had a change that used a scoped object to
> join and leave the STS that hasn't been integrated yet. IIRC Tony wasn't
> happy with the name he chose for that also.
> 
> Uwe:
> Thanks for bringing this up and my apologies for not replying sooner. I will
> have a fix fairly soon. If I'm correct about it being caused by overflowing the
> marking stack, you can work around the issue by increasing the
> MarkStackSize. You could try increasing it to 2M or 4M entries (which is the
> current max size).
> 
> Cheers,
> 
> JohnC

-----
Uwe Schindler
H.-H.-Meier-Allee 63, D-28213 Bremen
http://www.thetaphi.de
eMail: uwe@thetaphi.de


> -----Original Message-----
> From: Uwe Schindler [mailto:uwe@thetaphi.de]
> Sent: Wednesday, March 06, 2013 1:35 PM
> To: dev@lucene.apache.org
> Subject: FW: JVM hanging when using G1GC on JDK8 b78 or b79 (Linux 32 bit)
> 
> They already understand the G1GC problem with JDK 8 b78/b79 and are working
> on a fix. This was really fast:
> http://mail.openjdk.java.net/pipermail/hotspot-gc-dev/2013-March/006128.html
> 
> Uwe
> 
> -----
> Uwe Schindler
> H.-H.-Meier-Allee 63, D-28213 Bremen
> http://www.thetaphi.de
> eMail: uwe@thetaphi.de
> 




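The failure mode John describes, a single concurrent-mark thread entering a termination protocol that still expects the worker count from the earlier parallel remark task, can be imitated in plain Java. This is an analogy under stated assumptions: a `CountDownLatch` stands in for HotSpot's parallel terminator (which is C++ and works differently internally), the stale count of 4 is arbitrary, and a timeout replaces the unbounded wait so the example terminates.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

/**
 * Analogy only: a CountDownLatch stands in for a parallel task terminator that
 * completes once the expected number of workers have offered termination.
 */
public class TerminatorSketch {

    /**
     * The lone marking thread offers termination, then waits for the other
     * participants it believes exist. Returns true if termination completed
     * within the timeout, false if it would have kept waiting.
     */
    public static boolean offerTermination(int expectedWorkers, long timeoutMillis)
            throws InterruptedException {
        CountDownLatch terminator = new CountDownLatch(expectedWorkers);
        terminator.countDown(); // the only thread actually running checks in
        return terminator.await(timeoutMillis, TimeUnit.MILLISECONDS);
    }

    public static void main(String[] args) throws InterruptedException {
        // Stale worker count left over from an earlier parallel phase (4 is arbitrary):
        // the other three workers do not exist, so nobody else ever arrives.
        System.out.println(offerTermination(4, 200)); // false
        // Count reset to the one thread that is really running.
        System.out.println(offerTermination(1, 200)); // true
    }
}
```

Without the timeout, the first call would block forever, consistent with the threads sitting indefinitely in `os::park()` in the GDB dump. The `set_phase()` call John suspects is missing would play the role of the second call, telling the terminator that only one thread takes part.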

FW: JVM hanging when using G1GC on JDK8 b78 or b79 (Linux 32 bit)

Posted by Uwe Schindler <uw...@thetaphi.de>.
They already understand the G1GC problem with JDK 8 b78/b79 and are working on a fix. This was really fast:
http://mail.openjdk.java.net/pipermail/hotspot-gc-dev/2013-March/006128.html

Uwe

-----
Uwe Schindler
H.-H.-Meier-Allee 63, D-28213 Bremen
http://www.thetaphi.de
eMail: uwe@thetaphi.de



Re: JVM hanging when using G1GC on JDK8 b78 or b79 (Linux 32 bit)

Posted by David Holmes <da...@oracle.com>.
If the VM is completely unresponsive then it suggests we are at a safepoint.

The GC threads are not "hung" in os::park; they are parked, waiting to
be notified of something.

The thing is to find out why they are not being woken up.

Can the gdb log be posted somewhere? I don't know if the attachment made 
it to the original posting on hotspot-gc but it's no longer available on 
hotspot-dev.

Thanks,
David

On 6/03/2013 4:07 PM, Krystal Mok wrote:
> Hi Uwe,
>
> If you can attach gdb to it, then jstack -m and jstack -F should also
> work; that'll get you the Java stack trace.
> (But it probably doesn't matter in this case, because the hang is
> probably a bug in the VM).
>
> - Kris
>
> On Wed, Mar 6, 2013 at 5:48 AM, Uwe Schindler <us...@apache.org> wrote:
>> Hi,
>>
>> For a few months we have been extensively testing various preview builds of JDK 8 for compatibility with Apache Lucene and Solr, so we can find bugs early and prevent the problems we had with the release of Java 7 two years ago. Currently we have a Linux (Ubuntu 64bit) Jenkins machine with various JDKs installed (JDK 6, JDK 7, JDK 8 snapshot, IBM J9, older JRockit); on every run of the test suite (which takes approx. 30-45 minutes) it picks a different one with different HotSpot and garbage collector settings.
>>
>> JDK 8 b79 works very well on Linux so far; we found some strange behavior in early versions (maybe compiler errors), but none at the moment. There is one configuration that consistently and reproducibly hangs in one of the tested modules: JDK 8 b79 (same for b78), 32 bit, with G1GC (server or client VM does not matter). The JVM running the tests hangs and becomes unresponsive (jstack or kill -3 have no effect/cannot connect, a standard kill does not stop it, only kill -9 actually kills it). It reproduces 100% in this Lucene module (it always hangs).
>>
>> I was able to attach GDB to the JVM and get a stack trace of all threads (see attachment, dump.txt). As you can see, all G1GC threads seem to hang in a syscall (os::park(), a conditional wait in the pthread library). Unfortunately that’s all I can give you. A Java stack trace is not possible because the JVM responds to neither kill -3 nor jstack. With all other garbage collectors the test passes without hangs in a few seconds; with 32 bit G1GC it can stand still for hours. The 64 bit JVM passes with G1GC, so only the 32 bit variant is affected. Client or Server VM makes no difference.
>>
>> To reproduce:
>> - Use a 32 bit JDK 8 b78 or b79 (tested on Linux 64 bit, but this should not matter)
>> - Download the Lucene source code (e.g. the snapshot version we were testing with: https://builds.apache.org/job/Lucene-Artifacts-trunk/2212/artifact/lucene/dist/)
>> - Change to the directory lucene/analysis/uima and run:
>>          ant -Dargs="-server -XX:+UseG1GC" -Dtests.multiplier=3 -Dtests.jvms=1 test
>> After a while the test framework prints "stalled" messages (because the child VM actually running the test no longer responds). The PID is also printed. Trying to get a stack trace or to kill it gets no response; only kill -9 helps. Choosing another garbage collector in the above command line makes the test finish after a few seconds, e.g. -Dargs="-server -XX:+UseConcMarkSweepGC"
>>
>> I posted this bug report directly to the mailing list because there seems to be a problem with bugs.sun.com: earlier bug reports got no response from any reviewer for several weeks, while we were able to help find and fix javadoc and javac compiler bugs early. So I hope you can help with this bug, too.
>>
>> Uwe
>>
>> -----
>> Uwe Schindler
>> uschindler@apache.org
>> Apache Lucene PMC Member / Committer
>> Bremen, Germany
>> http://lucene.apache.org/
>>
>>


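David's distinction between a hung thread and a parked thread can be seen at the Java level with `java.util.concurrent.locks.LockSupport`. This is an analogy only (HotSpot's internal `os::park` is C++ and not the same code path as `LockSupport.park`): a parked thread reports `WAITING`, is blocked rather than spinning, and resumes only after someone changes the condition it waits on and unparks it. Class and method names are illustrative; the lambda requires Java 8 or later.

```java
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.locks.LockSupport;

public class ParkedNotHung {

    /** Starts a thread that parks until released and returns the state seen while it was parked. */
    public static Thread.State observeParkedState() throws InterruptedException {
        AtomicBoolean released = new AtomicBoolean(false);
        Thread waiter = new Thread(() -> {
            // park() may return spuriously, so re-check the condition in a loop.
            while (!released.get()) {
                LockSupport.park();
            }
        }, "parked-waiter");
        waiter.start();

        // Poll until the waiter is actually parked (bounded, so the sketch cannot spin forever).
        long deadline = System.nanoTime() + TimeUnit.SECONDS.toNanos(5);
        Thread.State observed;
        while ((observed = waiter.getState()) != Thread.State.WAITING) {
            if (System.nanoTime() - deadline > 0) {
                released.set(true);
                LockSupport.unpark(waiter);
                throw new IllegalStateException("waiter did not park in time: " + observed);
            }
            Thread.sleep(10);
        }

        // The parked thread only moves on once the condition changes and it is unparked.
        released.set(true);
        LockSupport.unpark(waiter);
        waiter.join();
        return observed;
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println(observeParkedState()); // WAITING
    }
}
```

In these terms, the question David raises is which thread was supposed to set the condition and unpark the GC threads, and why it never did.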

Re: JVM hanging when using G1GC on JDK8 b78 or b79 (Linux 32 bit)

Posted by Krystal Mok <re...@gmail.com>.
Hi Uwe,

If you can attach gdb to it, then jstack -m and jstack -F should also
work; that'll get you the Java stack trace.
(But it probably doesn't matter in this case, because the hang is
probably a bug in the VM).

- Kris

On Wed, Mar 6, 2013 at 5:48 AM, Uwe Schindler <us...@apache.org> wrote:
> Hi,
>
> For a few months we have been extensively testing various preview builds of JDK 8 for compatibility with Apache Lucene and Solr, so we can find bugs early and prevent the problems we had with the release of Java 7 two years ago. Currently we have a Linux (Ubuntu 64bit) Jenkins machine with various JDKs installed (JDK 6, JDK 7, JDK 8 snapshot, IBM J9, older JRockit); on every run of the test suite (which takes approx. 30-45 minutes) it picks a different one with different HotSpot and garbage collector settings.
>
> JDK 8 b79 works very well on Linux so far; we found some strange behavior in early versions (maybe compiler errors), but none at the moment. There is one configuration that consistently and reproducibly hangs in one of the tested modules: JDK 8 b79 (same for b78), 32 bit, with G1GC (server or client VM does not matter). The JVM running the tests hangs and becomes unresponsive (jstack or kill -3 have no effect/cannot connect, a standard kill does not stop it, only kill -9 actually kills it). It reproduces 100% in this Lucene module (it always hangs).
>
> I was able to attach GDB to the JVM and get a stack trace of all threads (see attachment, dump.txt). As you can see, all G1GC threads seem to hang in a syscall (os::park(), a conditional wait in the pthread library). Unfortunately that’s all I can give you. A Java stack trace is not possible because the JVM responds to neither kill -3 nor jstack. With all other garbage collectors the test passes without hangs in a few seconds; with 32 bit G1GC it can stand still for hours. The 64 bit JVM passes with G1GC, so only the 32 bit variant is affected. Client or Server VM makes no difference.
>
> To reproduce:
> - Use a 32 bit JDK 8 b78 or b79 (tested on Linux 64 bit, but this should not matter)
> - Download the Lucene source code (e.g. the snapshot version we were testing with: https://builds.apache.org/job/Lucene-Artifacts-trunk/2212/artifact/lucene/dist/)
> - Change to the directory lucene/analysis/uima and run:
>         ant -Dargs="-server -XX:+UseG1GC" -Dtests.multiplier=3 -Dtests.jvms=1 test
> After a while the test framework prints "stalled" messages (because the child VM actually running the test no longer responds). The PID is also printed. Trying to get a stack trace or to kill it gets no response; only kill -9 helps. Choosing another garbage collector in the above command line makes the test finish after a few seconds, e.g. -Dargs="-server -XX:+UseConcMarkSweepGC"
>
> I posted this bug report directly to the mailing list because there seems to be a problem with bugs.sun.com: earlier bug reports got no response from any reviewer for several weeks, while we were able to help find and fix javadoc and javac compiler bugs early. So I hope you can help with this bug, too.
>
> Uwe
>
> -----
> Uwe Schindler
> uschindler@apache.org
> Apache Lucene PMC Member / Committer
> Bremen, Germany
> http://lucene.apache.org/
>
>



Fwd: JVM hanging when using G1GC on JDK8 b78 or b79 (Linux 32 bit)

Posted by Tommaso Teofili <to...@gmail.com>.
FYI,
there's a strange (interesting?) bug in the latest Java 8 preview builds which is
triggered by the UIMA Lucene Analyzers tests [1]
Regards,
Tommaso

[1] : http://markmail.org/thread/wtz2mnpx7b63lxjn

---------- Forwarded message ----------
From: Uwe Schindler <us...@apache.org>
Date: 2013/3/5
Subject: JVM hanging when using G1GC on JDK8 b78 or b79 (Linux 32 bit)
To: hotspot-gc-dev@openjdk.java.net, hotspot-dev@openjdk.java.net
Cc: Dawid Weiss <dw...@apache.org>, dev@lucene.apache.org


Hi,

For a few months we have been extensively testing various preview builds of
JDK 8 for compatibility with Apache Lucene and Solr, so we can find bugs
early and prevent the problems we had with the release of Java 7 two years
ago. Currently we have a Linux (Ubuntu 64bit) Jenkins machine with various
JDKs installed (JDK 6, JDK 7, JDK 8 snapshot, IBM J9, older JRockit); on
every run of the test suite (which takes approx. 30-45 minutes) it picks a
different one with different HotSpot and garbage collector settings.

JDK 8 b79 works very well on Linux so far; we found some strange behavior
in early versions (maybe compiler errors), but none at the moment. There is
one configuration that consistently and reproducibly hangs in one of the
tested modules: JDK 8 b79 (same for b78), 32 bit, with G1GC (server or
client VM does not matter). The JVM running the tests hangs and becomes
unresponsive (jstack or kill -3 have no effect/cannot connect, a standard
kill does not stop it, only kill -9 actually kills it). It reproduces 100%
in this Lucene module (it always hangs).

I was able to attach GDB to the JVM and get a stack trace of all threads
(see attachment, dump.txt). As you can see, all G1GC threads seem to hang in
a syscall (os::park(), a conditional wait in the pthread library).
Unfortunately that’s all I can give you. A Java stack trace is not possible
because the JVM responds to neither kill -3 nor jstack. With all other
garbage collectors the test passes without hangs in a few seconds; with 32
bit G1GC it can stand still for hours. The 64 bit JVM passes with G1GC, so
only the 32 bit variant is affected. Client or Server VM makes no
difference.

To reproduce:
- Use a 32 bit JDK 8 b78 or b79 (tested on Linux 64 bit, but this should
not matter)
- Download the Lucene source code (e.g. the snapshot version we were
testing with:
https://builds.apache.org/job/Lucene-Artifacts-trunk/2212/artifact/lucene/dist/)
- Change to the directory lucene/analysis/uima and run:
        ant -Dargs="-server -XX:+UseG1GC" -Dtests.multiplier=3 -Dtests.jvms=1 test
After a while the test framework prints "stalled" messages (because the
child VM actually running the test no longer responds). The PID is also
printed. Trying to get a stack trace or to kill it gets no response; only
kill -9 helps. Choosing another garbage collector in the above command line
makes the test finish after a few seconds, e.g.
-Dargs="-server -XX:+UseConcMarkSweepGC"

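The hang was reported only for 32-bit JVMs running G1, while the Jenkins job picks a different JDK and collector on every run. A small check like the following, run inside the forked test JVM, could confirm which configuration a given run actually used. It is a sketch: `sun.arch.data.model` is a HotSpot-specific system property that other VMs may not set, and the `G1` name prefix is an assumption based on the collector names HotSpot reports (for example "G1 Young Generation").

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.util.ArrayList;
import java.util.List;

public class JvmConfigReport {

    /** "32", "64", or "unknown" if the HotSpot-specific property is absent. */
    public static String dataModel() {
        return System.getProperty("sun.arch.data.model", "unknown");
    }

    /** Names of the collectors registered as MXBeans in this JVM. */
    public static List<String> collectorNames() {
        List<String> names = new ArrayList<>();
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            names.add(gc.getName());
        }
        return names;
    }

    /** True for the combination reported to hang: a 32-bit JVM using G1. */
    public static boolean matchesReportedHangConfig() {
        if (!"32".equals(dataModel())) {
            return false;
        }
        for (String name : collectorNames()) {
            if (name.startsWith("G1")) { // assumption: G1 beans are named "G1 ..."
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        System.out.println("data model: " + dataModel());
        System.out.println("collectors: " + collectorNames());
        System.out.println("32-bit + G1: " + matchesReportedHangConfig());
    }
}
```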
I posted this bug report directly to the mailing list because there seems
to be a problem with bugs.sun.com: earlier bug reports got no response from
any reviewer for several weeks, while we were able to help find and fix
javadoc and javac compiler bugs early. So I hope you can help with this
bug, too.

Uwe

-----
Uwe Schindler
uschindler@apache.org
Apache Lucene PMC Member / Committer
Bremen, Germany
http://lucene.apache.org/



