Posted to issues@lucene.apache.org by "Uwe Schindler (Jira)" <ji...@apache.org> on 2021/09/19 11:58:00 UTC
[jira] [Commented] (LUCENE-10113) Improve ByteArrayDataInput to read primitive short/int/long natively using VarHandles

    [ https://issues.apache.org/jira/browse/LUCENE-10113?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17417323#comment-17417323 ] 

Uwe Schindler commented on LUCENE-10113:
----------------------------------------

Performance comparison:

{noformat}
                    Task QPS baseline      StdDev QPS my_modified_version      StdDev                Pct diff p-value
       HighTermMonthSort       80.36     (11.4%)       78.81      (7.9%)   -1.9% ( -19% -   19%) 0.533
    HighTermTitleBDVSort       12.47     (20.5%)       12.23     (18.6%)   -1.9% ( -33% -   46%) 0.756
        HighSloppyPhrase       14.80      (3.9%)       14.61      (3.4%)   -1.2% (  -8% -    6%) 0.287
               OrHighLow      191.33      (3.1%)      189.67      (3.8%)   -0.9% (  -7% -    6%) 0.436
            HighSpanNear        2.76      (2.6%)        2.74      (4.4%)   -0.6% (  -7% -    6%) 0.575
                  Fuzzy2       57.57      (1.6%)       57.21      (2.2%)   -0.6% (  -4% -    3%) 0.304
              HighPhrase       39.92      (3.5%)       39.68      (2.8%)   -0.6% (  -6% -    5%) 0.546
             LowSpanNear        8.42      (2.3%)        8.39      (2.8%)   -0.4% (  -5% -    4%) 0.663
                Wildcard       31.79      (3.8%)       31.74      (3.9%)   -0.2% (  -7% -    7%) 0.891
                  IntNRQ       94.20      (3.5%)       94.05      (4.1%)   -0.2% (  -7% -    7%) 0.898
             MedSpanNear       29.62      (2.3%)       29.59      (2.7%)   -0.1% (  -4% -    5%) 0.894
   HighTermDayOfYearSort       56.16      (7.7%)       56.11      (7.5%)   -0.1% ( -14% -   16%) 0.969
         LowSloppyPhrase       10.42      (2.7%)       10.42      (2.4%)   -0.0% (  -4% -    5%) 0.985
         MedSloppyPhrase        7.32      (2.5%)        7.32      (2.3%)    0.0% (  -4% -    4%) 0.977
                 Prefix3       12.62      (9.1%)       12.63     (10.5%)    0.1% ( -17% -   21%) 0.987
                  Fuzzy1       89.28      (1.1%)       89.34      (2.0%)    0.1% (  -3% -    3%) 0.903
   BrowseMonthSSDVFacets        5.04      (5.5%)        5.04      (5.2%)    0.1% ( -10% -   11%) 0.966
            OrNotHighLow      571.04      (1.5%)      571.46      (2.7%)    0.1% (  -4% -    4%) 0.917
     MedIntervalsOrdered       36.70      (5.6%)       36.78      (5.7%)    0.2% ( -10% -   12%) 0.905
                PKLookup      203.36      (3.8%)      203.81      (3.2%)    0.2% (  -6% -    7%) 0.845
    HighIntervalsOrdered        3.55      (5.3%)        3.56      (5.0%)    0.2% (  -9% -   11%) 0.891
                 Respell       59.27      (1.3%)       59.44      (1.7%)    0.3% (  -2% -    3%) 0.548
               MedPhrase      367.55      (2.0%)      368.61      (1.7%)    0.3% (  -3% -    4%) 0.623
              AndHighLow      560.26      (3.3%)      561.90      (3.8%)    0.3% (  -6% -    7%) 0.795
            OrNotHighMed      971.44      (2.4%)      974.30      (3.2%)    0.3% (  -5% -    6%) 0.742
               LowPhrase       41.63      (2.5%)       41.76      (2.3%)    0.3% (  -4% -    5%) 0.672
     LowIntervalsOrdered       94.44      (3.1%)       94.75      (3.2%)    0.3% (  -5% -    6%) 0.744
                 MedTerm     1590.31      (4.9%)     1596.02      (5.0%)    0.4% (  -9% -   10%) 0.819
           OrHighNotHigh      958.25      (3.5%)      964.26      (3.7%)    0.6% (  -6% -    8%) 0.581
                 LowTerm     1527.92      (2.5%)     1538.97      (3.0%)    0.7% (  -4% -    6%) 0.412
              OrHighHigh       26.32      (3.0%)       26.55      (3.4%)    0.9% (  -5% -    7%) 0.373
            OrHighNotMed     1177.62      (4.3%)     1188.50      (4.8%)    0.9% (  -7% -   10%) 0.522
            OrHighNotLow     1215.18      (4.5%)     1227.52      (4.6%)    1.0% (  -7% -   10%) 0.481
               OrHighMed       65.77      (4.0%)       66.50      (3.7%)    1.1% (  -6% -    9%) 0.365
              AndHighMed       44.34      (4.4%)       44.84      (5.0%)    1.1% (  -7% -   11%) 0.449
           OrNotHighHigh      783.60      (3.9%)      792.60      (4.6%)    1.1% (  -7% -    9%) 0.392
             AndHighHigh       38.95      (4.7%)       39.44      (4.6%)    1.3% (  -7% -   11%) 0.392
BrowseDayOfYearSSDVFacets        4.68     (10.0%)        4.77      (9.7%)    1.9% ( -16% -   23%) 0.551
   BrowseMonthTaxoFacets        1.20      (9.0%)        1.23      (9.7%)    2.3% ( -15% -   23%) 0.437
BrowseDayOfYearTaxoFacets        1.15      (9.7%)        1.18     (11.0%)    2.4% ( -16% -   25%) 0.461
                HighTerm     2329.95      (4.5%)     2391.41      (5.3%)    2.6% (  -6% -   13%) 0.092
    BrowseDateTaxoFacets        1.16      (9.7%)        1.19     (11.1%)    2.7% ( -16% -   25%) 0.421
              TermDTSort       65.25      (7.7%)       68.06     (10.3%)    4.3% ( -12% -   24%) 0.132


CPU merged search profile for my_modified_version:
PROFILE SUMMARY from 1454870 events (total: 1M)
  tests.profile.mode=cpu
  tests.profile.count=30
  tests.profile.stacksize=1
  tests.profile.linenumbers=false
PERCENT       CPU SAMPLES   STACK
9.22%         134161        org.apache.lucene.util.packed.DirectMonotonicReader#get()
8.22%         119591        org.apache.lucene.util.packed.DirectReader$DirectPackedReader12#get()
5.35%         77845         org.apache.lucene.codecs.lucene90.Lucene90DocValuesProducer$17#binaryValue()
3.52%         51219         java.util.Collections$UnmodifiableCollection$1#<init>()
3.49%         50705         org.apache.lucene.facet.taxonomy.FastTaxonomyFacetCounts#countAll()
2.44%         35538         org.apache.lucene.store.ByteBufferGuard#getShort()
2.37%         34551         java.nio.Buffer#scope()
2.03%         29597         java.nio.ByteBuffer#getArray()
2.02%         29320         jdk.internal.misc.Unsafe#convEndian()
1.91%         27740         java.nio.DirectByteBuffer#getShort()
1.90%         27626         org.apache.lucene.store.ByteBufferIndexInput#readBytes()
1.81%         26300         java.util.Objects#checkIndex()
1.76%         25665         org.apache.lucene.codecs.lucene90.Lucene90PostingsReader$EverythingEnum#advance()
1.60%         23302         org.apache.lucene.codecs.lucene90.Lucene90DocValuesProducer$VaryingBPVReader#getLongValue()
1.51%         21980         org.apache.lucene.codecs.lucene90.Lucene90PostingsReader$EverythingEnum#nextPosition()
1.51%         21925         org.apache.lucene.queries.intervals.OrderedIntervalsSource$OrderedIntervalIterator#nextInterval()
1.22%         17802         jdk.internal.misc.Unsafe#copyMemory()
1.16%         16832         org.apache.lucene.codecs.lucene90.Lucene90PostingsReader$BlockImpactsPostingsEnum#advance()
1.14%         16544         org.apache.lucene.facet.sortedset.SortedSetDocValuesFacetCounts#countOneSegment()
1.11%         16206         jdk.internal.util.Preconditions#checkFromIndexSize()
1.10%         16072         org.apache.lucene.codecs.lucene90.ForUtil#expand8()
1.07%         15617         java.nio.Buffer#position()
1.05%         15245         org.apache.lucene.util.packed.DirectReader$DirectPackedReader4#get()
0.97%         14144         org.apache.lucene.queries.spans.NearSpansOrdered#stretchToOrder()
0.95%         13785         org.apache.lucene.search.ConjunctionDISI#doNext()
0.94%         13737         org.apache.lucene.queries.intervals.IntervalFilter#nextInterval()
0.94%         13647         org.apache.lucene.codecs.lucene90.Lucene90PostingsReader$EverythingEnum#skipPositions()
0.86%         12467         java.util.Collections$UnmodifiableCollection$1#hasNext()
0.82%         11951         org.apache.lucene.search.Weight$DefaultBulkScorer#scoreAll()
0.82%         11948         org.apache.lucene.codecs.lucene90.Lucene90DocValuesProducer$20#ordValue()


CPU merged search profile for baseline:
PROFILE SUMMARY from 1455846 events (total: 1M)
  tests.profile.mode=cpu
  tests.profile.count=30
  tests.profile.stacksize=1
  tests.profile.linenumbers=false
PERCENT       CPU SAMPLES   STACK
9.92%         144437        org.apache.lucene.util.packed.DirectMonotonicReader#get()
8.91%         129653        org.apache.lucene.util.packed.DirectReader$DirectPackedReader12#get()
6.42%         93437         org.apache.lucene.codecs.lucene90.Lucene90DocValuesProducer$17#binaryValue()
3.60%         52446         java.util.Collections$UnmodifiableCollection$1#<init>()
3.30%         48044         org.apache.lucene.facet.taxonomy.FastTaxonomyFacetCounts#countAll()
2.33%         33969         org.apache.lucene.store.ByteBufferIndexInput#readBytes()
2.26%         32835         java.nio.ByteBuffer#getArray()
1.99%         28936         java.util.Objects#checkIndex()
1.96%         28474         org.apache.lucene.store.ByteBufferGuard#getShort()
1.92%         27908         java.nio.Buffer#scope()
1.87%         27199         jdk.internal.util.Preconditions#checkFromIndexSize()
1.77%         25702         org.apache.lucene.codecs.lucene90.Lucene90PostingsReader$EverythingEnum#advance()
1.71%         24943         jdk.internal.misc.Unsafe#convEndian()
1.67%         24365         org.apache.lucene.codecs.lucene90.Lucene90DocValuesProducer$VaryingBPVReader#getLongValue()
1.67%         24310         org.apache.lucene.facet.sortedset.SortedSetDocValuesFacetCounts#countOneSegment()
1.57%         22795         org.apache.lucene.queries.intervals.OrderedIntervalsSource$OrderedIntervalIterator#nextInterval()
1.51%         21924         org.apache.lucene.codecs.lucene90.Lucene90PostingsReader$EverythingEnum#nextPosition()
1.23%         17931         java.nio.Buffer#position()
1.18%         17145         jdk.internal.misc.Unsafe#copyMemory()
1.16%         16845         org.apache.lucene.codecs.lucene90.Lucene90PostingsReader$BlockImpactsPostingsEnum#advance()
1.12%         16270         org.apache.lucene.codecs.lucene90.ForUtil#expand8()
1.10%         15971         java.nio.DirectByteBuffer#getShort()
0.99%         14447         org.apache.lucene.util.packed.DirectReader$DirectPackedReader4#get()
0.98%         14252         org.apache.lucene.codecs.lucene90.Lucene90PostingsReader$EverythingEnum#skipPositions()
0.95%         13817         org.apache.lucene.queries.intervals.IntervalFilter#nextInterval()
0.93%         13573         org.apache.lucene.queries.spans.NearSpansOrdered#stretchToOrder()
0.92%         13323         org.apache.lucene.search.ConjunctionDISI#doNext()
0.80%         11677         java.util.Collections$UnmodifiableCollection$1#hasNext()
0.78%         11422         org.apache.lucene.search.Weight$DefaultBulkScorer#scoreAll()
0.76%         11123         org.apache.lucene.queries.spans.NearSpansOrdered#nextStartPosition()


HEAP merged search profile for my_modified_version:
PROFILE SUMMARY from 78058 events (total: 27928M)
  tests.profile.mode=heap
  tests.profile.count=30
  tests.profile.stacksize=1
  tests.profile.linenumbers=false
PERCENT       HEAP SAMPLES  STACK
17.16%        4792M         org.apache.lucene.util.FixedBitSet#<init>()
8.40%         2344M         org.apache.lucene.search.ExactPhraseMatcher$1$1#getImpacts()
7.41%         2070M         org.apache.lucene.search.ExactPhraseMatcher$1#getImpacts()
6.80%         1898M         java.util.AbstractList#iterator()
5.46%         1524M         org.apache.lucene.codecs.lucene90.blocktree.SegmentTermsEnumFrame#<init>()
3.18%         888M          org.apache.lucene.util.ArrayUtil#growExact()
2.97%         830M          org.apache.lucene.util.BytesRef#<init>()
2.69%         750M          java.util.ArrayList#grow()
2.60%         726M          org.apache.lucene.util.fst.ByteSequenceOutputs#read()
2.56%         715M          org.apache.lucene.queryparser.charstream.FastCharStream#refill()
2.43%         677M          org.apache.lucene.codecs.lucene90.Lucene90PostingsReader$BlockDocsEnum#<init>()
1.79%         499M          java.nio.DirectByteBufferR#duplicate()
1.67%         466M          org.apache.lucene.queryparser.charstream.FastCharStream#GetImage()
1.66%         463M          jdk.internal.misc.Unsafe#allocateUninitializedArray()
1.58%         440M          org.apache.lucene.codecs.lucene90.blocktree.SegmentTermsEnum#getFrame()
1.55%         433M          java.util.ArrayList#iterator()
1.34%         375M          org.apache.lucene.util.PriorityQueue#<init>()
1.30%         363M          org.apache.lucene.codecs.lucene90.blocktree.IntersectTermsEnumFrame#load()
1.20%         333M          org.apache.lucene.codecs.lucene90.Lucene90PostingsReader$BlockImpactsDocsEnum#<init>()
1.14%         318M          org.apache.lucene.codecs.lucene90.blocktree.SegmentTermsEnum#<init>()
1.11%         309M          org.apache.lucene.codecs.lucene90.Lucene90PostingsReader#newTermState()
1.09%         305M          org.apache.lucene.codecs.lucene90.ForUtil#<init>()
0.87%         244M          java.nio.DirectByteBufferR#slice()
0.83%         230M          org.apache.lucene.codecs.lucene90.Lucene90PostingsReader$EverythingEnum#<init>()
0.82%         229M          org.apache.lucene.codecs.lucene90.Lucene90PostingsReader$BlockImpactsPostingsEnum#<init>()
0.80%         223M          org.apache.lucene.codecs.lucene90.compressing.Lucene90CompressingStoredFieldsReader$BlockState#document()
0.73%         204M          java.nio.DirectByteBufferR#asLongBuffer()
0.73%         203M          org.apache.lucene.codecs.lucene90.blocktree.IntersectTermsEnumFrame#<init>()
0.72%         201M          java.util.AbstractList#listIterator()
0.68%         191M          java.util.Arrays#copyOf()


HEAP merged search profile for baseline:
PROFILE SUMMARY from 78116 events (total: 27923M)
  tests.profile.mode=heap
  tests.profile.count=30
  tests.profile.stacksize=1
  tests.profile.linenumbers=false
PERCENT       HEAP SAMPLES  STACK
17.15%        4789M         org.apache.lucene.util.FixedBitSet#<init>()
8.30%         2317M         org.apache.lucene.search.ExactPhraseMatcher$1$1#getImpacts()
7.31%         2040M         org.apache.lucene.search.ExactPhraseMatcher$1#getImpacts()
6.89%         1925M         java.util.AbstractList#iterator()
5.39%         1506M         org.apache.lucene.codecs.lucene90.blocktree.SegmentTermsEnumFrame#<init>()
3.24%         904M          org.apache.lucene.util.ArrayUtil#growExact()
3.01%         840M          org.apache.lucene.util.BytesRef#<init>()
2.72%         759M          java.util.ArrayList#grow()
2.59%         724M          org.apache.lucene.util.fst.ByteSequenceOutputs#read()
2.50%         697M          org.apache.lucene.queryparser.charstream.FastCharStream#refill()
2.41%         673M          org.apache.lucene.codecs.lucene90.Lucene90PostingsReader$BlockDocsEnum#<init>()
1.85%         515M          java.nio.DirectByteBufferR#duplicate()
1.69%         472M          jdk.internal.misc.Unsafe#allocateUninitializedArray()
1.60%         446M          org.apache.lucene.queryparser.charstream.FastCharStream#GetImage()
1.56%         434M          org.apache.lucene.codecs.lucene90.blocktree.SegmentTermsEnum#getFrame()
1.50%         419M          java.util.ArrayList#iterator()
1.36%         378M          org.apache.lucene.util.PriorityQueue#<init>()
1.33%         371M          org.apache.lucene.codecs.lucene90.blocktree.IntersectTermsEnumFrame#load()
1.15%         321M          org.apache.lucene.codecs.lucene90.blocktree.SegmentTermsEnum#<init>()
1.12%         313M          org.apache.lucene.codecs.lucene90.Lucene90PostingsReader$BlockImpactsDocsEnum#<init>()
1.07%         298M          org.apache.lucene.codecs.lucene90.Lucene90PostingsReader#newTermState()
1.02%         284M          org.apache.lucene.codecs.lucene90.ForUtil#<init>()
0.93%         260M          java.nio.DirectByteBufferR#slice()
0.82%         228M          org.apache.lucene.codecs.lucene90.Lucene90PostingsReader$EverythingEnum#<init>()
0.81%         227M          org.apache.lucene.codecs.lucene90.Lucene90PostingsReader$BlockImpactsPostingsEnum#<init>()
0.81%         225M          org.apache.lucene.codecs.lucene90.compressing.Lucene90CompressingStoredFieldsReader$BlockState#document()
0.77%         214M          java.nio.DirectByteBufferR#asLongBuffer()
0.69%         193M          org.apache.lucene.queryparser.classic.Token#newToken()
0.67%         187M          java.util.AbstractList#listIterator()
0.66%         185M          java.util.Arrays#copyOf()
{noformat}

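It looks like there's a slight improvement in some queries and sorts (for example HighTerm at +2.6% and TermDTSort at +4.3%), although no difference in this run reaches a p-value below 0.05. The new code is much cleaner, so I see no reason not to commit this. I am still open to suggestions about the FST readers.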

> Improve ByteArrayDataInput to read primitive short/int/long natively using VarHandles
> -------------------------------------------------------------------------------------
>
>                 Key: LUCENE-10113
>                 URL: https://issues.apache.org/jira/browse/LUCENE-10113
>             Project: Lucene - Core
>          Issue Type: Improvement
>          Components: core/store
>    Affects Versions: main (9.0)
>            Reporter: Uwe Schindler
>            Assignee: Uwe Schindler
>            Priority: Major
>             Fix For: main (9.0)
>
>          Time Spent: 10m
>  Remaining Estimate: 0h
>
> LUCENE-10112 reminded me of something I wanted to do long ago: for basically all IndexInputs/DataInputs we are able to natively read short, int, and long values in little endian with single CPU instructions (because they use ByteBuffer's methods that support primitive reads). Only ByteArrayDataInput still uses manual code based on the inherited byte-by-byte approach: it reads single bytes and combines them in little-endian order.
> The approach here is to use Java 9+ VarHandles to read int/long/short values with single CPU instructions instead of manually recombining the bytes. The trick is to create a "view" VarHandle that accesses the byte array using the same mechanisms as ByteBuffers or JDK 17 MemorySegments (under the hood it uses Unsafe to issue the CPU instructions and optionally swaps bytes if the platform endianness is BE).
> In LUCENE-10112 similar work was done for LZ4, and a microbenchmark was written that showed a significant speed improvement when accessing the types with a VarHandle.
> P.S.: The same applies to FST.BytesReader and/or ByteSliceReader, but I am not sure whether those use the int/short/long methods at all. At least this one does not override the methods to read ints, longs and shorts, so there is no optimization at all. FST seems to read only bytes and byte[], and ByteSliceReader mostly VInts.
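
For readers unfamiliar with the technique described in the issue, here is a minimal sketch contrasting the byte-by-byte little-endian read with a read through a byte-array "view" VarHandle. It is not the actual Lucene patch: the class name LittleEndianReads, the constant names and the method names are made up for illustration. The only JDK API assumed is MethodHandles.byteArrayViewVarHandle (Java 9+).

```java
import java.lang.invoke.MethodHandles;
import java.lang.invoke.VarHandle;
import java.nio.ByteOrder;

// Hypothetical helper class, for illustration only (not part of Lucene).
final class LittleEndianReads {
  // View VarHandles that interpret a byte[] as short/int/long values at a
  // given byte offset, always in little-endian order.
  private static final VarHandle VH_LE_SHORT =
      MethodHandles.byteArrayViewVarHandle(short[].class, ByteOrder.LITTLE_ENDIAN);
  private static final VarHandle VH_LE_INT =
      MethodHandles.byteArrayViewVarHandle(int[].class, ByteOrder.LITTLE_ENDIAN);
  private static final VarHandle VH_LE_LONG =
      MethodHandles.byteArrayViewVarHandle(long[].class, ByteOrder.LITTLE_ENDIAN);

  // Manual variant: read four single bytes and combine them, lowest byte first.
  static int readIntManual(byte[] bytes, int pos) {
    return (bytes[pos] & 0xFF)
        | (bytes[pos + 1] & 0xFF) << 8
        | (bytes[pos + 2] & 0xFF) << 16
        | (bytes[pos + 3] & 0xFF) << 24;
  }

  // VarHandle variants: one call per value; pos is a byte offset and does
  // not need to be aligned for plain get access.
  static short readShort(byte[] bytes, int pos) {
    return (short) VH_LE_SHORT.get(bytes, pos);
  }

  static int readInt(byte[] bytes, int pos) {
    return (int) VH_LE_INT.get(bytes, pos);
  }

  static long readLong(byte[] bytes, int pos) {
    return (long) VH_LE_LONG.get(bytes, pos);
  }

  public static void main(String[] args) {
    byte[] data = {0x78, 0x56, 0x34, 0x12, (byte) 0xEF, (byte) 0xCD, (byte) 0xAB, (byte) 0x89};
    System.out.println(Integer.toHexString(readIntManual(data, 0))); // 12345678
    System.out.println(Integer.toHexString(readInt(data, 0)));       // 12345678
    System.out.println(Long.toHexString(readLong(data, 0)));         // 89abcdef12345678
    System.out.println(Integer.toHexString(readShort(data, 1) & 0xFFFF)); // 3456
  }
}
```

Both variants return the same values. Whether the VarHandle variant is faster in a given workload has to be measured, as the benchmark in the comment above does.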


