Posted to dev@lucene.apache.org by Marcus Eagan <ma...@gmail.com> on 2020/08/10 23:12:08 UTC

Performance in Solr 9 / Java 11

In my IDE, I have a few profiling tools that I bounce between. I started
using them in my work at Lucidworks and continue to use them in my current
work. I suspect that there are performance improvements in Java 11 that we
can exploit further. Has anyone, possibly Mark Miller or Uwe Schindler
<uw...@thetaphi.de>, investigated performance improvements specific to the
newer version of Java on master? There are some obvious ones that we get for
free, like a better GC, but I am curious about prior work in this area before
I publish anything that might be redundant or irrelevant.

Best,

-- 
Marcus Eagan

RE: Performance in Solr 9 / Java 11

Posted by Uwe Schindler <uw...@thetaphi.de>.
Hi,

 

Yes, MR-JAR builds were always done in Lucene 8.x (in Lucene’s part of the build). You might think: “How can that be, when I, as release manager, used JDK 8 to build and am forced to do so?”

 

This is a trick that works for some types of JDK 9+ improvements, namely those that rely on simple “replacement” operations for static methods that are highly optimized in later JDK versions using a compiler intrinsic. The trick is the following: we have (possibly slower) replacement implementations in Lucene’s codebase for the new methods (see the classes oal.util.FutureArrays and FutureObjects) with identical static signatures (this is why it only works for that case). After the code is compiled, an additional Ant task runs that uses ASM to patch the class files produced by the Java 8 compiler, replacing the usages of the FutureXxxx classes with the corresponding JDK classes: https://github.com/apache/lucene-solr/blob/branch_8x/lucene/tools/src/groovy/patch-mrjar-classes.groovy The patched classes are written to build/java9 and packaged into the JAR file below META-INF/versions/9.
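To make the "identical static signature" point concrete, here is a minimal sketch of what such a replacement method could look like. It is not Lucene's actual FutureArrays source; the class name FutureArraysSketch is made up for illustration. It mirrors the signature and documented behavior of java.util.Arrays#mismatch(byte[], int, int, byte[], int, int) from Java 9 while using only Java 8 APIs, which is what would allow a bytecode patcher to swap the owner class of the call without touching anything else.

```java
/** Hypothetical stand-in for a FutureArrays-style helper; not Lucene's actual code. */
final class FutureArraysSketch {
  private FutureArraysSketch() {}

  /**
   * Same static signature as java.util.Arrays#mismatch(byte[], int, int, byte[], int, int)
   * (Java 9+), implemented with a plain loop so it compiles and runs on Java 8.
   * Returns the relative index of the first difference, or -1 if the ranges are equal.
   */
  static int mismatch(byte[] a, int aFromIndex, int aToIndex,
                      byte[] b, int bFromIndex, int bToIndex) {
    checkRange(a.length, aFromIndex, aToIndex);
    checkRange(b.length, bFromIndex, bToIndex);
    int aLen = aToIndex - aFromIndex;
    int bLen = bToIndex - bFromIndex;
    int len = Math.min(aLen, bLen);
    for (int i = 0; i < len; i++) {
      if (a[aFromIndex + i] != b[bFromIndex + i]) {
        return i;
      }
    }
    // One range is a proper prefix of the other: the mismatch is at the shorter length.
    return aLen == bLen ? -1 : len;
  }

  /** Mirrors the exception types documented for the JDK method. */
  private static void checkRange(int arrayLength, int from, int to) {
    if (from > to) {
      throw new IllegalArgumentException("fromIndex(" + from + ") > toIndex(" + to + ")");
    }
    if (from < 0 || to > arrayLength) {
      throw new ArrayIndexOutOfBoundsException("range [" + from + ", " + to + ") out of bounds");
    }
  }

  public static void main(String[] args) {
    byte[] a = {1, 2, 3, 4};
    byte[] b = {1, 2, 9, 4};
    // Prints the index of the first differing byte within the compared ranges.
    System.out.println(mismatch(a, 0, a.length, b, 0, b.length));
  }
}
```

Because the parameter list and return type match the JDK method exactly, a call site compiled against this class differs from a call to java.util.Arrays only in the class name it references.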

 

In master this was removed. If a similar opportunity arises with later JDK versions (static methods that got highly optimized, intrinsic-backed versions in later Java versions), we can do the same; the above script just needs to be included in the Gradle build. But I have not found anything like this so far.

 

The classical “MR builds” are much more complex to set up, as they require every developer to have the later Java version installed, and we need to do parallel compilation of different source trees, which also carries the risk of producing broken code that won’t work at runtime. So the “just replace method calls in Java 8 classes” approach is very convenient and safe, but limited. For more complex cases, it is very often better to add a factory pattern to your source code that uses a completely different implementation at runtime (like FSDirectory.open()). One example that might come later is a replacement for MMapDirectory. I am working on this, but it’s not yet usable (see my short talk at barcamp @ berlinbuzzwords this year). Here, we would have a separate implementation of MMapDirectory, such as MemorySegmentDirectory, that is chosen at runtime depending on the Java version. We would have a separate Gradle module to implement that, which requires a later compiler. The reason for that approach is that the code is so different that a classical MR JAR would complicate things a lot. It’s better to declare this as a separate “implementation class”, so users get better stack traces on error, as they actually see which implementation is used.
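As a rough illustration of the factory idea, the sketch below picks an implementation at runtime based on the running Java version. All names here (ByteSource, LegacyByteSource, ModernByteSource, ByteSourceFactory, MIN_MODERN_VERSION) are hypothetical and are not Lucene classes; the version threshold is arbitrary. In a real build the newer implementation would live in a separately compiled module and would probably be loaded reflectively, which this sketch leaves out for brevity.

```java
/** Hypothetical abstraction; stands in for something like a directory implementation. */
interface ByteSource {
  String describe();
}

/** Baseline implementation that only needs the minimum supported Java version. */
final class LegacyByteSource implements ByteSource {
  @Override
  public String describe() {
    return "LegacyByteSource";
  }
}

/** Placeholder for an implementation that would be compiled against a newer JDK. */
final class ModernByteSource implements ByteSource {
  @Override
  public String describe() {
    return "ModernByteSource";
  }
}

final class ByteSourceFactory {
  /** Illustrative threshold only; not a statement about any real API's requirements. */
  static final int MIN_MODERN_VERSION = 16;

  private ByteSourceFactory() {}

  /** Maps "1.8" to 8 and "11" to 11, following the java.specification.version format. */
  static int parseFeatureVersion(String specVersion) {
    String v = specVersion.startsWith("1.") ? specVersion.substring(2) : specVersion;
    int dot = v.indexOf('.');
    if (dot >= 0) {
      v = v.substring(0, dot);
    }
    return Integer.parseInt(v);
  }

  /** Chooses the implementation for an explicit version; kept separate so it is testable. */
  static ByteSource open(int featureVersion, int minModernVersion) {
    return featureVersion >= minModernVersion ? new ModernByteSource() : new LegacyByteSource();
  }

  /** Chooses the implementation for the JVM this code is running on. */
  static ByteSource open() {
    int version = parseFeatureVersion(System.getProperty("java.specification.version"));
    return open(version, MIN_MODERN_VERSION);
  }

  public static void main(String[] args) {
    System.out.println("Selected: " + open().describe());
  }
}
```

Since each implementation is an ordinary named class, a stack trace shows directly which one was selected, which is the debugging benefit described above.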

 

In short: the release manager for 8.x does not need to care; it is fully automatic and requires no later version of Java during the build.

 

Uwe

 

-----

Uwe Schindler

Achterdiek 19, D-28357 Bremen

https://www.thetaphi.de

eMail: uwe@thetaphi.de

 

From: Adrien Grand <jp...@gmail.com> 
Sent: Sunday, August 30, 2020 9:00 AM
To: Lucene Dev <de...@lucene.apache.org>
Subject: Re: Performance in Solr 9 / Java 11

 

Tomoko is correct, an MR JAR is created not only upon release but also every time you create a lucene-core JAR on branch_8x.

 


Re: Performance in Solr 9 / Java 11

Posted by Adrien Grand <jp...@gmail.com>.
Tomoko is correct, an MR JAR is created not only upon release but also
every time you create a lucene-core JAR on branch_8x.

On Sun, Aug 30, 2020 at 5:49 AM Tomoko Uchida <to...@gmail.com>
wrote:

> I believe mr-jar build is enabled in the 8x branch (LUCENE-7966), and the
> workaround was dropped on the master branch when the minimum java version
> was bumped up to java 11 (LUCENE-8738); if my understanding is correct.
>
> $ jar tf core/lucene-core-8.6.1.jar | grep META-INF/versions
> META-INF/versions/
> META-INF/versions/9/
> META-INF/versions/9/org/
> META-INF/versions/9/org/apache/
> ...
>
> $ jar tf core/build/libs/lucene-core-9.0.0-SNAPSHOT.jar | grep
> META-INF/versions
> // no outputs
>
>
>
>
> On Sun, Aug 30, 2020 at 6:48 Mike Drob <md...@apache.org> wrote:
>
>> Do you know if these mr-jars are built by default as part of the release
>> process? I definitely had no idea about them when doing 8.5.2 and did not
>> even think to verify anything about it.
>>
>> On Sat, Aug 29, 2020 at 4:05 PM Adrien Grand <jp...@gmail.com> wrote:
>>
>>> It may only be indirectly related to your question, but there is support
>>> for vectorized operations of byte[] arrays that was added in JDK 13 (this
>>> blog https://richardstartin.github.io/posts/vectorised-byte-operations explains
>>> well what it is about) that we started leveraging for compressing terms
>>> dictionaries in Lucene 8.5:
>>> https://issues.apache.org/jira/browse/LUCENE-4702.
>>>
>>> I don't know how well this is known but our build also has logic to
>>> create multi-release JARs. We don't use it in master today but it's used on
>>> branch_8x, which requires Java 8, in order to use APIs that were introduced
>>> in Java 9 such as Arrays#mismatch. See the "patch-mr-jar" target in the
>>> branch_8x build:
>>> https://github.com/apache/lucene-solr/blob/branch_8x/lucene/common-build.xml#L602.
>>> So if APIs that could help performance were introduced in say JDK 15, we
>>> might still be able to leverage them in Lucene/Solr 9 using the same
>>> mechanism.
>>>
>>>
>>>
>>> On Tue, Aug 11, 2020 at 1:12 AM Marcus Eagan <ma...@gmail.com>
>>> wrote:
>>>
>>>> In my IDE, I have a few profiling tools that I bounce between that I
>>>> started using in my work at Lucidworks but I continue to use in my current
>>>> work today. I have suspicions that there may be some performance
>>>> improvements in Java 11 that we can exploit further.  I'm curious as to if
>>>> there has been any investigation, possibly Mark Miller or
>>>> @uwe@thetaphi.de <uw...@thetaphi.de>,  into performance improvements
>>>> specific to the newer version of Java in Master? There are some obvious
>>>> ones that we get for free, like a better GC, but curious as to prior work
>>>> in this area before publishing anything that might be redundant or
>>>> irrelevant.
>>>>
>>>> Best,
>>>>
>>>> --
>>>> Marcus Eagan
>>>>
>>>>
>>>>
>>>>
>>>
>>> --
>>> Adrien
>>>
>>>
>>>

-- 
Adrien

Re: Performance in Solr 9 / Java 11

Posted by Tomoko Uchida <to...@gmail.com>.
If my understanding is correct, the MR-JAR build is enabled on the 8x branch
(LUCENE-7966), and the workaround was dropped on the master branch when the
minimum Java version was bumped to Java 11 (LUCENE-8738).

$ jar tf core/lucene-core-8.6.1.jar | grep META-INF/versions
META-INF/versions/
META-INF/versions/9/
META-INF/versions/9/org/
META-INF/versions/9/org/apache/
...

$ jar tf core/build/libs/lucene-core-9.0.0-SNAPSHOT.jar | grep META-INF/versions
// no output
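
The same check can be scripted. Below is a small sketch that looks for entries under META-INF/versions/ and for the Multi-Release attribute in the manifest main section. The class name MrJarCheck and its helper methods are made up for illustration; only java.util.jar APIs from the JDK are used, and the JAR path is whatever you pass on the command line.

```java
import java.io.IOException;
import java.util.List;
import java.util.jar.JarEntry;
import java.util.jar.JarFile;
import java.util.jar.Manifest;
import java.util.stream.Collectors;

/** Hypothetical helper; reports whether a JAR looks like a multi-release JAR. */
final class MrJarCheck {
  private MrJarCheck() {}

  /** True if any entry name lives under the versioned directory of an MR JAR. */
  static boolean hasVersionedEntries(Iterable<String> entryNames) {
    for (String name : entryNames) {
      if (name.startsWith("META-INF/versions/")) {
        return true;
      }
    }
    return false;
  }

  /** True if the manifest main section declares "Multi-Release: true". */
  static boolean declaresMultiRelease(Manifest manifest) {
    if (manifest == null) {
      return false;
    }
    String value = manifest.getMainAttributes().getValue("Multi-Release");
    return value != null && "true".equalsIgnoreCase(value.trim());
  }

  public static void main(String[] args) throws IOException {
    if (args.length != 1) {
      System.err.println("usage: java MrJarCheck <path-to-jar>");
      return;
    }
    try (JarFile jar = new JarFile(args[0])) {
      // Collect all entry names, the programmatic equivalent of `jar tf`.
      List<String> names = jar.stream().map(JarEntry::getName).collect(Collectors.toList());
      System.out.println("versioned entries: " + hasVersionedEntries(names));
      System.out.println("Multi-Release attribute: " + declaresMultiRelease(jar.getManifest()));
    }
  }
}
```

Checking both conditions is useful because versioned entries are only honored by the runtime when the manifest attribute is also present.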





Re: Performance in Solr 9 / Java 11

Posted by Mike Drob <md...@apache.org>.
Do you know if these mr-jars are built by default as part of the release
process? I definitely had no idea about them when doing 8.5.2 and did not
even think to verify anything about it.


Re: Performance in Solr 9 / Java 11

Posted by Adrien Grand <jp...@gmail.com>.
It may only be indirectly related to your question, but support for
vectorized operations on byte[] arrays was added in JDK 13 (this blog post
https://richardstartin.github.io/posts/vectorised-byte-operations explains
well what it is about), and we started leveraging it for compressing terms
dictionaries in Lucene 8.5:
https://issues.apache.org/jira/browse/LUCENE-4702.

I don't know how well this is known but our build also has logic to create
multi-release JARs. We don't use it in master today but it's used on
branch_8x, which requires Java 8, in order to use APIs that were introduced
in Java 9 such as Arrays#mismatch. See the "patch-mr-jar" target in the
branch_8x build:
https://github.com/apache/lucene-solr/blob/branch_8x/lucene/common-build.xml#L602.
So if APIs that could help performance were introduced in say JDK 15, we
might still be able to leverage them in Lucene/Solr 9 using the same
mechanism.
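
For readers who have not used Arrays#mismatch, here is a minimal sketch of one plausible use: computing the length of the common prefix of two byte arrays. This is only an illustration of the JDK API (Java 9 or later is required to compile it), not a copy of how Lucene calls it; the class name CommonPrefix is made up.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

/** Hypothetical example class; requires Java 9+ for Arrays.mismatch. */
final class CommonPrefix {
  private CommonPrefix() {}

  /** Returns how many leading bytes the two arrays share. */
  static int commonPrefixLength(byte[] a, byte[] b) {
    // Arrays.mismatch returns -1 when both arrays are equal, otherwise the index of
    // the first difference (or the shorter length if one is a prefix of the other).
    int i = Arrays.mismatch(a, b);
    return i < 0 ? a.length : i;
  }

  public static void main(String[] args) {
    byte[] first = "lucene".getBytes(StandardCharsets.UTF_8);
    byte[] second = "lucid".getBytes(StandardCharsets.UTF_8);
    System.out.println(commonPrefixLength(first, second));
  }
}
```

The appeal of calling the JDK method instead of a hand-written loop is that the JVM can replace it with an intrinsic on platforms that support one.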




-- 
Adrien