Posted to dev@lucene.apache.org by Uwe Schindler <uw...@thetaphi.de> on 2016/02/05 12:06:38 UTC

FW: More SIGSEGV problems with JDK9 EA build 102

Hi,

I contacted the hotspot team about the build 102 failures:
================================================

Hi Hotspot team,

we are testing build 102 of the JDK 9 EA builds (without Jigsaw). Previously we used build 94 without any problems (except the compact strings problem, which we worked around with the "-XX:-CompactStrings" flag). Tobias told me that this problem should be fixed in later EA builds.

With build 102 we got multiple SIGSEGVs or failed test assertions in our code (see also the talk about our randomized testing at FOSDEM). One of these issues looks like https://bugs.openjdk.java.net/browse/JDK-8148490. After passing "-XX:-UseSuperWord" the problems disappeared (we have some classes in Lucene, e.g., CompressingTermVectorsWriter.java, that are very sensitive to vectorization, and failures there are almost always AVX optimization problems, so it was quite clear that the issue was AVX). Once this is fixed and part of the EA builds, we can re-enable SuperWord in testing; for now we leave it disabled.
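To illustrate the kind of code this concerns, here is a minimal, hypothetical sanity check (not taken from CompressingTermVectorsWriter or any other Lucene class; the names SuperWordCheck, add and check are made up). It runs a plain element-wise int loop of the shape that C2's SuperWord pass can turn into SIMD (e.g. AVX) instructions, warms it up so it gets JIT-compiled, and verifies every result against a closed-form value. It is an assumption, not a guarantee, that this particular loop gets vectorized, so a clean run proves nothing; a mismatch would point at the JIT.

```java
// Hypothetical sanity check, not taken from Lucene: a simple element-wise
// int loop of the kind C2's SuperWord pass can auto-vectorize (e.g. with AVX).
public class SuperWordCheck {

    // Element-wise addition; a candidate for auto-vectorization once C2 compiles it.
    static void add(int[] a, int[] b, int[] c) {
        for (int i = 0; i < c.length; i++) {
            c[i] = a[i] + b[i];
        }
    }

    // Runs the loop many times so the JIT has a chance to compile it, then checks
    // every element against a closed-form value computed without the loop.
    static boolean check(int n, int iterations) {
        int[] a = new int[n];
        int[] b = new int[n];
        int[] c = new int[n];
        for (int i = 0; i < n; i++) {
            a[i] = i;
            b[i] = 2 * i;
        }
        for (int iter = 0; iter < iterations; iter++) {
            add(a, b, c);
            for (int i = 0; i < n; i++) {
                if (c[i] != 3 * i) {
                    return false; // wrong result: possible miscompilation
                }
            }
        }
        return true;
    }

    public static void main(String[] args) {
        // An odd length also exercises the scalar tail after any vectorized main loop.
        boolean ok = check(1003, 20_000);
        System.out.println(ok ? "ok" : "MISMATCH: possible miscompilation");
    }
}
```

Comparing runs with the default settings and with "-XX:-UseSuperWord" is one way to narrow down whether a failure depends on vectorization.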

But even with "-XX:-UseSuperWord -XX:-CompactStrings" passed on the command line, we still get a crash or test failure on every third run (more often on 32 bits, but sometimes on 64 bits as well). In most cases the failures are SIGSEGVs at various places inside the JVM or libc. We also get "invalid pointer" errors from free(). As said before, there were no problems with build 94, but this happens all the time with build 102.
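Because the flags have to reach every forked test JVM, it may help to confirm them from inside the running JVM. A minimal sketch, assuming a HotSpot JVM that exposes com.sun.management.HotSpotDiagnosticMXBean; the class name VmFlagCheck and its option method are hypothetical:

```java
import com.sun.management.HotSpotDiagnosticMXBean;
import com.sun.management.VMOption;
import java.lang.management.ManagementFactory;

// Hypothetical helper: reports how selected HotSpot options are set in this JVM,
// e.g. to confirm that -XX:-UseSuperWord -XX:-CompactStrings reached a test JVM.
public class VmFlagCheck {

    // Returns the named option, or null if this JVM does not know the flag
    // (CompactStrings only exists in JDK 9 and later; UseSuperWord is a C2 flag).
    static VMOption option(String name) {
        HotSpotDiagnosticMXBean bean =
                ManagementFactory.getPlatformMXBean(HotSpotDiagnosticMXBean.class);
        try {
            return bean.getVMOption(name);
        } catch (IllegalArgumentException e) {
            // Thrown when no VM option with the given name exists.
            return null;
        }
    }

    public static void main(String[] args) {
        for (String name : new String[] {"UseSuperWord", "CompactStrings"}) {
            VMOption opt = option(name);
            if (opt == null) {
                System.out.println(name + ": not supported by this JVM");
            } else {
                // getOrigin() tells whether the value is the default or was set explicitly.
                System.out.println(name + " = " + opt.getValue()
                        + " (origin: " + opt.getOrigin() + ")");
            }
        }
    }
}
```

An origin of VM_CREATION indicates the value was set when the JVM was created (for example on the command line), while DEFAULT means the flag was not overridden.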

Does anybody have an idea what the problem could be? The crashes look like this:

3 SIGSEGVs:
   [junit4] >>> JVM J2 emitted unexpected output (verbatim) ----
   [junit4] #
   [junit4] # A fatal error has been detected by the Java Runtime Environment:
   [junit4] #
   [junit4] #  SIGSEGV (0xb) at pc=0x00007fc2be539edc, pid=10944, tid=11054
   [junit4] #
   [junit4] # JRE version: Java(TM) SE Runtime Environment (9.0+102) (build 9-ea+102-2016-01-21-001533.javare.4316.nc)
   [junit4] # Java VM: Java HotSpot(TM) 64-Bit Server VM (9-ea+102-2016-01-21-001533.javare.4316.nc, mixed mode, tiered, g1 gc, linux-amd64)
   [junit4] # Problematic frame:
   [junit4] # C  [libc.so.6+0x7fedc][thread 12464 also had an error]
   [junit4] # [ timer expired, abort... ]
   [junit4] <<< JVM J2: EOF ----

   [junit4] >>> JVM J2 emitted unexpected output (verbatim) ----
   [junit4] #
   [junit4] # A fatal error has been detected by the Java Runtime Environment:
   [junit4] #
   [junit4] #  SIGSEGV (0xb) at pc=0xf6ac0445, pid=13204, tid=13307
   [junit4] #
   [junit4] # JRE version: Java(TM) SE Runtime Environment (9.0+102) (build 9-ea+102-2016-01-21-001243.javare.4316.nc)
   [junit4] # Java VM: Java HotSpot(TM) Server VM (9-ea+102-2016-01-21-001243.javare.4316.nc, mixed mode, tiered, g1 gc, linux-x86)
   [junit4] # Problematic frame:
   [junit4] # V  [libjvm.so+0x241445]  ChunkPool::allocate(unsigned int, AllocFailStrategy::AllocFailEnum)+0x25
   [junit4] #
   [junit4] # No core dump will be written. Core dumps have been disabled. To enable core dumping, try "ulimit -c unlimited" before starting Java again
   [junit4] #
   [junit4] # An error report file with more information is saved as:
   [junit4] # /home/jenkins/workspace/Lucene-Solr-5.x-Linux/lucene/build/backward-codecs/test/J2/hs_err_pid13204.log
   [junit4] 
   [junit4] [error occurred during error reporting , id 0xb]
   [junit4] 
   [junit4] #
   [junit4] # If you would like to submit a bug report, please visit:
   [junit4] #   http://bugreport.java.com/bugreport/crash.jsp
   [junit4] #
   [junit4] <<< JVM J2: EOF ----

   [junit4] >>> JVM J2 emitted unexpected output (verbatim) ----
   [junit4] #
   [junit4] # A fatal error has been detected by the Java Runtime Environment:
   [junit4] #
   [junit4] #  SIGSEGV (0xb) at pc=0xf7249300, pid=31605, tid=31729
   [junit4] #
   [junit4] # JRE version: Java(TM) SE Runtime Environment (9.0+102) (build 9-ea+102-2016-01-21-001243.javare.4316.nc)
   [junit4] # Java VM: Java HotSpot(TM) Server VM (9-ea+102-2016-01-21-001243.javare.4316.nc, mixed mode, tiered, g1 gc, linux-x86)
   [junit4] # Problematic frame:
   [junit4] # V  [libjvm.so+0x9b3300]  Type::cmp(Type const*, Type const*)+0x10
   [junit4] #
   [junit4] # No core dump will be written. Core dumps have been disabled. To enable core dumping, try "ulimit -c unlimited" before starting Java again
   [junit4] #
   [junit4] # An error report file with more information is saved as:
   [junit4] # /home/jenkins/workspace/Lucene-Solr-trunk-Linux/lucene/build/core/test/J2/hs_err_pid31605.log
   [junit4] #
   [junit4] # Compiler replay data is saved as:
   [junit4] # /home/jenkins/workspace/Lucene-Solr-trunk-Linux/lucene/build/core/test/J2/replay_pid31605.log
   [junit4] #
   [junit4] # If you would like to submit a bug report, please visit:
   [junit4] #   http://bugreport.java.com/bugreport/crash.jsp
   [junit4] #
   [junit4] <<< JVM J2: EOF ----

"invalid pointer" on free():

   [junit4] JVM J0: stderr was not empty, see: /home/jenkins/workspace/Lucene-Solr-trunk-Linux/lucene/build/test-framework/test/temp/junit4-J0-20160204_124703_166.syserr
   [junit4] >>> JVM J0 emitted unexpected output (verbatim) ----
   [junit4] *** Error in `/home/jenkins/tools/java/64bit/jdk-9-ea+102/bin/java': free(): invalid pointer: 0x00007f74bd485960 ***
   [junit4] <<< JVM J0: EOF ----

   [junit4] JVM J0: stderr was not empty, see: /home/jenkins/workspace/Lucene-Solr-trunk-Linux/lucene/build/test-framework/test/temp/junit4-J0-20160203_223446_729.syserr
   [junit4] >>> JVM J0 emitted unexpected output (verbatim) ----
   [junit4] *** Error in `/home/jenkins/tools/java/64bit/jdk-9-ea+102/bin/java': free(): invalid pointer: 0x00007f53c8b0ece0 ***
   [junit4] <<< JVM J0: EOF ----

(after that it exited with exit code 134)

I no longer have the hs_err_pid files for all failures (they are deleted after 20 Jenkins job runs), but I can provide them if needed; for the recent ones I clicked "preserve" on the Jenkins job (sorry for not doing that earlier!).

I just wanted to know if anybody has an idea what could cause this.
Today I saw a patch for 8149038 ("SIGSEGV at frame::is_interpreted_frame_valid -> StubRoutines::SafeFetchN"), which could be related.

Kind regards and thanks for the nice FOSDEM meeting last week,

Uwe

-----
Uwe Schindler
H.-H.-Meier-Allee 63, D-28213 Bremen
http://www.thetaphi.de
eMail: uwe@thetaphi.de



