Posted to dev@lucene.apache.org by Chris Hostetter <ho...@fucit.org> on 2019/07/02 02:04:06 UTC

RE: Need to upgrade jenkins jdk-11 jobs >= 11.0.3 to fix JVM SSL bugs


Uwe: when you upgraded the JVMs, were there by any chance other 
(potentially inadvertent) changes to the VMs -- particularly the Windows 
VM?

Since June 22nd, we've seen ReplicationFactorTest fail in _almost_ every 
Windows build from your server, regardless of JVM version used, on 
both master and branch_8x -- and when it fails, the failure repro 
checks at the end of the build "succeed" in reproducing the failure 
anywhere from 1-5 times.  All of the failures occur at the exact same line 
number of the test.

(Prior to June 22nd, this test only failed a total of 5 times in 2019, 
across all jenkins servers/OSes, etc... and at a quick glance, rarely in 
this specific spot)

Based on my review of a handful of the logs from the past week, it looks 
like the failures *may* be caused by CPU starvation -- but even that's a 
guess, since similar situations are also tested earlier in the same test 
and it never seems to fail during them.  (At this point the test simulates 
3 servers with one or 2 of the replicas partitioned off via a closed proxy: 
one update thread on the leader is continuously trying to reconnect, while 
another leader thread is sending updates to the remaining "live" replica -- 
but that live replica doesn't seem to "see" the request until 30 seconds 
later, after the first update thread appears to have given up on the "down" 
replica.)

in any case -- i'm wondering if perhaps the number of "virtual 
CPUs" or some other VM related setting might have changed on your Windows 
test instance.



: Date: Sat, 22 Jun 2019 18:37:42 +0200
: From: Uwe Schindler <uw...@thetaphi.de>
: To: dev@lucene.apache.org, Chris Hostetter <ho...@fucit.org>
: Subject: RE: Need to upgrade jenkins jdk-11 jobs >= 11.0.3 to fix JVM SSL bugs
: 
: Hi Hoss,
: 
: sorry for the delay, I updated the JDKs on Policeman Jenkins. But there are some things to mention:
: 
: *	JDK 11.0.3 is *not* available free of charge from Oracle, nor do they supply OpenJDK convenience builds. Because of this we have to use JDK 11.0.3 from AdoptOpenJDK (Hotspot variant). I have installed those builds on Linux, Windows, MacOSX
: *	JDK 12.0.1 was also installed on Linux, Windows, MacOSX (also AdoptOpenJDK)
: *	JDK 13-ea+26 was installed on Linux and Windows
: *	JDK-13-ea+shipilev-fastdebug (nightly) was updated to yesterday’s build on Linux
: 
: If the SSL errors are really coming from this and users have to install 11.0.3 at minimum, we have to mention this in the release notes and on the web page. In particular, we have to tell people to either pay Oracle to get 11.0.3 LTS or to use AdoptOpenJDK-Hotspot or Corretto (untested). Of course this is only a limitation if you enable TLS, which most people don’t do on their Solr servers.
: 
: Uwe
: 
: -----
: Uwe Schindler
: Achterdiek 19, D-28357 Bremen
: https://www.thetaphi.de
: eMail: uwe@thetaphi.de
: 
: From: Uwe Schindler <uw...@thetaphi.de> 
: Sent: Saturday, June 22, 2019 10:19 AM
: To: Chris Hostetter <ho...@fucit.org>
: Cc: Lucene Dev <de...@lucene.apache.org>
: Subject: Re: Need to upgrade jenkins jdk-11 jobs >= 11.0.3 to fix JVM SSL bugs
: 
: Ok, will work on it later today.
: 
: Uwe
: 
: On June 22, 2019 12:23:17 AM UTC, Chris Hostetter <hossman_lucene@fucit.org> wrote:
: 
: We also need to upgrade the jdk-13 jenkins jobs to at least 13-ea+26, 
: which includes the fix for JDK-8224829...
: 
:       https://bugs.openjdk.java.net/browse/JDK-8224829
: 
: : Date: Tue, 18 Jun 2019 14:44:51 -0700 (MST)
: : From: Chris Hostetter <hossman_lucene@fucit.org>
: : To: Uwe Schindler <uwe@thetaphi.de>
: : Cc: Lucene Dev <dev@lucene.apache.org>
: : Subject: Need to upgrade jenkins jdk-11 jobs >= 11.0.3 to fix JVM SSL bugs
: : 
: : 
: : TL;DR: Uwe: can you please upgrade the jdk-11 used on the apache lucene jenkins
: : jobs and your policeman jenkins jobs to 11.0.3?
: : 
: : ---
: : 
: : Dat & I have (coincidentally) found ourselves both looking into some
: : (long-standing) SSL weirdness that has only ever manifested on java>=11.
: : 
: : Details can be found in SOLR-12988 & SOLR-12990 but the long and short of it
: : is there are at least 2 known OpenJDK bugs in SSL that have been fixed in
: : 11.0.3, which we are seeing evidence of in jenkins builds using 11.0.2....
: : 
: :  https://bugs.openjdk.java.net/browse/JDK-8213202
: :  https://bugs.openjdk.java.net/browse/JDK-8212885 / JDK-8220723
: : 
: : (The nature of these bugs makes it hard -- at least AFAICT -- to try to write
: : any "assume" logic to auto-detect if they apply to the current JVM.)
: : 
: : There may in fact still be other SSL related bugs in jdk 11.0.3, but it will
: : be hard to know until we at least upgrade to 11.0.3 to see what still fails.
: : 
: : Uwe / whoever has access: if you could help us out here it would be
: : appreciated.
: : 
: : 
: : 
: : -Hoss
: :  http://www.lucidworks.com/
: : 
: 
: -Hoss
:  http://www.lucidworks.com/
: 
: 
: --
: Uwe Schindler
: Achterdiek 19, 28357 Bremen
: https://www.thetaphi.de
: 
: 

-Hoss
http://www.lucidworks.com/
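Uwe's note above that TLS users would need at least 11.0.3, together with Hoss's remark that the bugs themselves are hard to detect with "assume" logic, suggests that the closest practical check is a gate on the reported JDK version. The sketch below is a hypothetical helper (`JdkVersionGate`, `meetsMinimum` and `runningJdkMeetsMinimum` are illustrative names, not Lucene/Solr APIs) built on `java.lang.Runtime.Version` (Java 9+). It assumes version floors are enough: it only reports which release is running, cannot tell whether a given SSL bug is actually present, and treats every later feature release (such as 12.x) as passing, which this thread does not establish.

```java
/**
 * Hypothetical helper, not part of Lucene/Solr: compares JDK version strings
 * using java.lang.Runtime.Version (available since Java 9).
 */
final class JdkVersionGate {

    private JdkVersionGate() {}

    /**
     * Returns true if {@code actual} is at or above {@code minimum}.
     * Runtime.Version.compareTo orders by version numbers first, then
     * pre-release ("ea" sorts below the GA release), then build number,
     * then optional build information.
     *
     * @throws IllegalArgumentException if either string is not a valid version
     */
    static boolean meetsMinimum(String actual, String minimum) {
        Runtime.Version a = Runtime.Version.parse(actual);
        Runtime.Version m = Runtime.Version.parse(minimum);
        return a.compareTo(m) >= 0;
    }

    /** Same check, applied to the JVM that is currently running. */
    static boolean runningJdkMeetsMinimum(String minimum) {
        return Runtime.version().compareTo(Runtime.Version.parse(minimum)) >= 0;
    }

    public static void main(String[] args) {
        // Floors mentioned in this thread: 11.0.3 on the JDK 11 line, and
        // 13-ea+26 (which includes the fix for JDK-8224829) for JDK 13 EA builds.
        System.out.println("11.0.2 >= 11.0.3: " + meetsMinimum("11.0.2", "11.0.3"));
        System.out.println("11.0.3 >= 11.0.3: " + meetsMinimum("11.0.3", "11.0.3"));
        System.out.println("13-ea+25 >= 13-ea+26: " + meetsMinimum("13-ea+25", "13-ea+26"));
        System.out.println("running JVM " + Runtime.version()
                + " >= 11.0.3: " + runningJdkMeetsMinimum("11.0.3"));
    }
}
```

In a JUnit 4 test, the boolean could be handed to `org.junit.Assume.assumeTrue(String, boolean)` to skip SSL tests on older JVMs; upgrading the Jenkins JDKs, as this thread does, avoids needing such a gate at all.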

RE: Need to upgrade jenkins jdk-11 jobs >= 11.0.3 to fix JVM SSL bugs

Posted by Chris Hostetter <ho...@fucit.org>.
: no changes in the VM configurations (same number of cores, everything the same). But one change:

hmmm, ok .. just a shot in the dark.  thanks for confirming.

: So it was updated to the Windows 10 Feature Update 1903… on June 22nd. 
: I have no idea what this change may have caused in our tests; but if 
: there is something different, it would also affect Windows Server users 
: upgrading to the latest Windows Server versions.

yeah ... i'm really not sure how anything OS-level could be causing this 
... even my "CPU contention" guess doesn't _really_ make sense to me, 
since earlier points in the test are basically doing the exact same thing, 
and should need basically the same number of threads ... yet whenever the 
test fails, it fails at this exact spot.

weird.


-Hoss
http://www.lucidworks.com/

RE: Need to upgrade jenkins jdk-11 jobs >= 11.0.3 to fix JVM SSL bugs

Posted by Uwe Schindler <uw...@thetaphi.de>.
Hi,

no changes in the VM configurations (same number of cores, everything the same). But one change:

So it was updated to the Windows 10 Feature Update 1903… on June 22nd. I have no idea what this change may have caused in our tests; but if there is something different, it would also affect Windows Server users upgrading to the latest Windows Server versions.

Nevertheless, it’s now installing the next 2019-06 cumulative update.

Uwe

-----
Uwe Schindler
Achterdiek 19, D-28357 Bremen
https://www.thetaphi.de
eMail: uwe@thetaphi.de