Posted to dev@lucene.apache.org by "Uwe Schindler (JIRA)" <ji...@apache.org> on 2019/04/28 15:32:00 UTC

[jira] [Comment Edited] (LUCENE-8780) Improve ByteBufferGuard in Java 11

    [ https://issues.apache.org/jira/browse/LUCENE-8780?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16828039#comment-16828039 ] 

Uwe Schindler edited comment on LUCENE-8780 at 4/28/19 3:31 PM:
----------------------------------------------------------------

That's the result after 20 runs with 6 searcher threads (with ParallelGC) on Mike's lucenebench:

{noformat}
use java command /home/jenkins/tools/java/64bit/jdk-11.0.2/bin/java -server -Xms2g -Xmx2g -XX:+UseParallelGC -Xbatch

JAVA:
openjdk version "11.0.2" 2019-01-15
OpenJDK Runtime Environment 18.9 (build 11.0.2+9)
OpenJDK 64-Bit Server VM 18.9 (build 11.0.2+9, mixed mode)

OS:
Linux serv1.sd-datasolutions.de 4.18.0-17-generic #18~18.04.1-Ubuntu SMP Fri Mar 15 15:27:12 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux

[...]

Report after iter 19:
                    Task    QPS orig      StdDev   QPS patch      StdDev                Pct diff
                  IntNRQ       30.88      (0.6%)       26.33      (0.8%)  -14.7% ( -16% -  -13%)
                PKLookup      107.70      (2.7%)       94.31      (2.9%)  -12.4% ( -17% -   -7%)
             AndHighHigh       10.76     (11.5%)       10.17      (3.3%)   -5.4% ( -18% -   10%)
                  Fuzzy2       45.10      (7.7%)       43.21      (9.0%)   -4.2% ( -19% -   13%)
         LowSloppyPhrase        7.28     (16.8%)        6.98      (6.3%)   -4.2% ( -23% -   22%)
            OrHighNotLow      783.24      (7.1%)      751.37      (2.5%)   -4.1% ( -12% -    5%)
           OrHighNotHigh      934.39      (6.5%)      896.38      (2.1%)   -4.1% ( -11% -    4%)
                 Respell       45.36     (10.6%)       43.65      (7.0%)   -3.8% ( -19% -   15%)
           OrNotHighHigh      779.95      (3.8%)      752.28      (1.8%)   -3.5% (  -8% -    2%)
        HighSloppyPhrase       10.37     (12.8%)       10.03      (3.5%)   -3.3% ( -17% -   14%)
               LowPhrase       11.60      (8.9%)       11.23      (1.7%)   -3.2% ( -12% -    8%)
                 LowTerm     1694.00      (8.9%)     1642.34      (5.5%)   -3.0% ( -16% -   12%)
                 MedTerm     1292.82      (9.3%)     1253.69      (8.2%)   -3.0% ( -18% -   15%)
              AndHighMed       71.41      (9.9%)       69.77      (7.5%)   -2.3% ( -17% -   16%)
            OrNotHighMed      634.32      (7.2%)      620.67      (7.5%)   -2.2% ( -15% -   13%)
                 Prefix3      110.65     (14.9%)      108.55      (8.7%)   -1.9% ( -22% -   25%)
               OrHighLow      347.02      (4.3%)      340.51      (9.9%)   -1.9% ( -15% -   12%)
            OrNotHighLow      591.61      (5.5%)      580.60      (9.0%)   -1.9% ( -15% -   13%)
            OrHighNotMed     1258.21      (1.8%)     1237.28      (5.0%)   -1.7% (  -8% -    5%)
                  Fuzzy1       91.79      (4.3%)       90.77     (11.1%)   -1.1% ( -15% -   14%)
               OrHighMed       10.29      (7.9%)       10.25     (11.8%)   -0.4% ( -18% -   20%)
                Wildcard       52.28      (6.3%)       52.21      (6.8%)   -0.1% ( -12% -   13%)
              OrHighHigh        8.16      (6.9%)        8.22      (9.3%)    0.8% ( -14% -   18%)
              AndHighLow      563.89      (9.1%)      569.31     (15.3%)    1.0% ( -21% -   27%)
              HighPhrase       15.88      (9.3%)       16.04     (13.0%)    1.0% ( -19% -   25%)
               MedPhrase       14.84      (9.0%)       15.15     (12.8%)    2.1% ( -18% -   26%)
            HighSpanNear        2.16      (9.8%)        2.21     (10.1%)    2.3% ( -16% -   24%)
         MedSloppyPhrase       18.48     (15.4%)       18.96     (18.9%)    2.6% ( -27% -   43%)
             MedSpanNear       17.75      (3.8%)       18.31     (10.0%)    3.1% ( -10% -   17%)
                HighTerm     1031.00      (9.9%)     1068.12     (17.1%)    3.6% ( -21% -   33%)
             LowSpanNear        8.22      (5.5%)        8.53     (13.3%)    3.7% ( -14% -   23%)
   HighTermDayOfYearSort        9.78     (11.0%)       10.25     (18.2%)    4.8% ( -21% -   38%)
       HighTermMonthSort       23.40     (26.5%)       27.11     (32.1%)   15.9% ( -33% -  101%)
{noformat}

The total runtime of each run did not change: it was always approx. 280s per run, patched and unpatched. Not sure how to interpret this.
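
For reading the table, here is a minimal sketch of how the "Pct diff" column relates to the two QPS columns. It assumes the column is simply the relative change of the patched mean QPS against the original mean QPS, which matches the rows above. How the benchmark derives the bracketed range is not shown in this report, so the sketch leaves it out. The class and method names are made up for illustration.

```java
import java.util.Locale;

/** Hypothetical helper; not part of the benchmark tooling. */
final class PctDiffSketch {

  /** Relative change of the patched mean QPS against the original mean QPS, in percent. */
  static double pctDiff(double qpsOrig, double qpsPatch) {
    return 100.0 * (qpsPatch - qpsOrig) / qpsOrig;
  }

  /** Formats with one decimal place, like the report. */
  static String format(double pct) {
    return String.format(Locale.ROOT, "%.1f%%", pct);
  }

  public static void main(String[] args) {
    // QPS values copied from the report above.
    System.out.println("IntNRQ            " + format(pctDiff(30.88, 26.33)));
    System.out.println("PKLookup          " + format(pctDiff(107.70, 94.31)));
    System.out.println("HighTermMonthSort " + format(pctDiff(23.40, 27.11)));
  }
}
```

In the report, IntNRQ and PKLookup are the only two tasks whose bracketed range lies entirely below zero. For every other task the range spans zero.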


> Improve ByteBufferGuard in Java 11
> ----------------------------------
>
>                 Key: LUCENE-8780
>                 URL: https://issues.apache.org/jira/browse/LUCENE-8780
>             Project: Lucene - Core
>          Issue Type: Improvement
>          Components: core/store
>    Affects Versions: master (9.0)
>            Reporter: Uwe Schindler
>            Assignee: Uwe Schindler
>            Priority: Major
>              Labels: Java11
>         Attachments: LUCENE-8780.patch
>
>          Time Spent: 10m
>  Remaining Estimate: 0h
>
> In LUCENE-7409 we added {{ByteBufferGuard}} to protect MMapDirectory from crashing the JVM with SIGSEGV when you close and unmap the mmapped buffers of an IndexInput while another thread is accessing it.
> The idea was to do a volatile write access to flush the caches (to trigger a full fence) and set a non-volatile boolean to true. All accesses would check the boolean and stop the caller from accessing the underlying ByteBuffer. This worked most of the time, until the JVM optimized away the plain read access to the boolean (you can easily see this after some runtime of our by-default ignored testcase).
> With master on Java 11, we can improve the whole thing. Using VarHandles you can use any access type when reading or writing the boolean. After reading Doug Lea's explanation <http://gee.cs.oswego.edu/dl/html/j9mm.html> and some testing, I was no longer able to crash my JDK (even after running for minutes unmapping bytebuffers).
> The approach is the same: we do a full-fenced write (standard volatile write) when we unmap, then we yield the thread (to finish in-flight reads in other threads) and then unmap all byte buffers.
> On the test side (read access), instead of using a plain read, we use the new "opaque read". Opaque reads are like plain reads; only the ordering requirements differ. Actually the main difference is explained by Doug like this: "For example in constructions in which the only modification of some variable x is for one thread to write in Opaque (or stronger) mode, X.setOpaque(this, 1), any other thread spinning in while(X.getOpaque(this)!=1){} will eventually terminate. Note that this guarantee does NOT hold in Plain mode, in which spin loops may (and usually do) infinitely loop -- they are not required to notice that a write ever occurred in another thread if it was not seen on first encounter." And that's what we want to have: we don't want to do volatile reads, but we want to prevent the compiler from optimizing away our read of the boolean. So we want it to "eventually" see the change. Because of the much stronger volatile write, the cache effects should become visible even faster (like in our Java 8 approach, just now we improved our read side).
> The new code is much slimmer (theoretically we could also use an AtomicBoolean for that and use the new method {{getOpaque()}}, but I wanted to prevent extra method calls, so I used a VarHandle directly).
> It's set up like this:
> - The underlying boolean field is a private member (with an "unused" SuppressWarnings, as the Java compiler considers it unused), marked as volatile (that's the recommendation, but in reality it does not matter at all).
> - We create a VarHandle to access this boolean; we never access the field directly (this is why the volatile marking does not affect us).
> - We use VarHandle.setVolatile() to change our "invalidated" boolean to "true", thereby enforcing a full fence.
> - On the read side we use VarHandle.getOpaque() instead of VarHandle.get(), which would be a plain read like in our old code for Java 8.
> I had to tune our test a bit, as the VarHandles make it take longer until it actually crashes (as optimizations kick in later). I also used a random for the reads to prevent the optimizer from removing all the bytebuffer reads. When we commit this, we can disable the test again (it takes approx 50 secs on my machine).
> I'd still like to see the differences between the plain read and the opaque read in production, so maybe [~mikemccand] or [~rcmuir] can do a comparison with the nightly benchmarker?
> Have fun, maybe [~dweiss] has some ideas, too.
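
For readers without the patch at hand, the setup listed in the description above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the code from LUCENE-8780.patch: the class name, the method names and the IllegalStateException (a stand-in for whatever the real guard throws) are invented here, and the actual unmapping is left out.

```java
import java.lang.invoke.MethodHandles;
import java.lang.invoke.VarHandle;
import java.nio.ByteBuffer;

/** Hypothetical sketch of a VarHandle-based guard; names are illustrative only. */
final class OpaqueGuardSketch {

  // Only ever accessed through the VarHandle below, so javac reports it as unused.
  @SuppressWarnings("unused")
  private volatile boolean invalidated = false;

  private static final VarHandle INVALIDATED;

  static {
    try {
      INVALIDATED = MethodHandles.lookup()
          .findVarHandle(OpaqueGuardSketch.class, "invalidated", boolean.class);
    } catch (ReflectiveOperationException e) {
      throw new ExceptionInInitializerError(e);
    }
  }

  /** Write side: called before the buffers would be unmapped. */
  void invalidate() {
    INVALIDATED.setVolatile(this, true); // volatile write, i.e. fully fenced
    Thread.yield();                      // give in-flight reads a chance to finish
    // A real guard would unmap the byte buffers here; this sketch only flips the flag.
  }

  /** Read side: an opaque read, so the check cannot be optimized away like a plain read. */
  private void ensureValid() {
    if ((boolean) INVALIDATED.getOpaque(this)) {
      // Stand-in exception; an assumption of this sketch.
      throw new IllegalStateException("buffer already closed");
    }
  }

  /** Guarded access: check the flag, then read from the buffer. */
  byte getByte(ByteBuffer buffer, int index) {
    ensureValid();
    return buffer.get(index);
  }
}
```

Every guarded read goes through ensureValid() first, so once invalidate() has run, readers are expected to observe the flag eventually and fail with an exception instead of touching an unmapped buffer.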


