Posted to issues@lucene.apache.org by "Uwe Schindler (Jira)" <ji...@apache.org> on 2021/09/19 11:58:00 UTC
[jira] [Commented] (LUCENE-10113) Improve ByteArrayDataInput to
read primitive short/int/long natively using VarHandles
[ https://issues.apache.org/jira/browse/LUCENE-10113?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17417323#comment-17417323 ]
Uwe Schindler commented on LUCENE-10113:
----------------------------------------
Performance comparison:
{noformat}
Task QPS baseline StdDev QPS my_modified_version StdDev Pct diff p-value
HighTermMonthSort 80.36 (11.4%) 78.81 (7.9%) -1.9% ( -19% - 19%) 0.533
HighTermTitleBDVSort 12.47 (20.5%) 12.23 (18.6%) -1.9% ( -33% - 46%) 0.756
HighSloppyPhrase 14.80 (3.9%) 14.61 (3.4%) -1.2% ( -8% - 6%) 0.287
OrHighLow 191.33 (3.1%) 189.67 (3.8%) -0.9% ( -7% - 6%) 0.436
HighSpanNear 2.76 (2.6%) 2.74 (4.4%) -0.6% ( -7% - 6%) 0.575
Fuzzy2 57.57 (1.6%) 57.21 (2.2%) -0.6% ( -4% - 3%) 0.304
HighPhrase 39.92 (3.5%) 39.68 (2.8%) -0.6% ( -6% - 5%) 0.546
LowSpanNear 8.42 (2.3%) 8.39 (2.8%) -0.4% ( -5% - 4%) 0.663
Wildcard 31.79 (3.8%) 31.74 (3.9%) -0.2% ( -7% - 7%) 0.891
IntNRQ 94.20 (3.5%) 94.05 (4.1%) -0.2% ( -7% - 7%) 0.898
MedSpanNear 29.62 (2.3%) 29.59 (2.7%) -0.1% ( -4% - 5%) 0.894
HighTermDayOfYearSort 56.16 (7.7%) 56.11 (7.5%) -0.1% ( -14% - 16%) 0.969
LowSloppyPhrase 10.42 (2.7%) 10.42 (2.4%) -0.0% ( -4% - 5%) 0.985
MedSloppyPhrase 7.32 (2.5%) 7.32 (2.3%) 0.0% ( -4% - 4%) 0.977
Prefix3 12.62 (9.1%) 12.63 (10.5%) 0.1% ( -17% - 21%) 0.987
Fuzzy1 89.28 (1.1%) 89.34 (2.0%) 0.1% ( -3% - 3%) 0.903
BrowseMonthSSDVFacets 5.04 (5.5%) 5.04 (5.2%) 0.1% ( -10% - 11%) 0.966
OrNotHighLow 571.04 (1.5%) 571.46 (2.7%) 0.1% ( -4% - 4%) 0.917
MedIntervalsOrdered 36.70 (5.6%) 36.78 (5.7%) 0.2% ( -10% - 12%) 0.905
PKLookup 203.36 (3.8%) 203.81 (3.2%) 0.2% ( -6% - 7%) 0.845
HighIntervalsOrdered 3.55 (5.3%) 3.56 (5.0%) 0.2% ( -9% - 11%) 0.891
Respell 59.27 (1.3%) 59.44 (1.7%) 0.3% ( -2% - 3%) 0.548
MedPhrase 367.55 (2.0%) 368.61 (1.7%) 0.3% ( -3% - 4%) 0.623
AndHighLow 560.26 (3.3%) 561.90 (3.8%) 0.3% ( -6% - 7%) 0.795
OrNotHighMed 971.44 (2.4%) 974.30 (3.2%) 0.3% ( -5% - 6%) 0.742
LowPhrase 41.63 (2.5%) 41.76 (2.3%) 0.3% ( -4% - 5%) 0.672
LowIntervalsOrdered 94.44 (3.1%) 94.75 (3.2%) 0.3% ( -5% - 6%) 0.744
MedTerm 1590.31 (4.9%) 1596.02 (5.0%) 0.4% ( -9% - 10%) 0.819
OrHighNotHigh 958.25 (3.5%) 964.26 (3.7%) 0.6% ( -6% - 8%) 0.581
LowTerm 1527.92 (2.5%) 1538.97 (3.0%) 0.7% ( -4% - 6%) 0.412
OrHighHigh 26.32 (3.0%) 26.55 (3.4%) 0.9% ( -5% - 7%) 0.373
OrHighNotMed 1177.62 (4.3%) 1188.50 (4.8%) 0.9% ( -7% - 10%) 0.522
OrHighNotLow 1215.18 (4.5%) 1227.52 (4.6%) 1.0% ( -7% - 10%) 0.481
OrHighMed 65.77 (4.0%) 66.50 (3.7%) 1.1% ( -6% - 9%) 0.365
AndHighMed 44.34 (4.4%) 44.84 (5.0%) 1.1% ( -7% - 11%) 0.449
OrNotHighHigh 783.60 (3.9%) 792.60 (4.6%) 1.1% ( -7% - 9%) 0.392
AndHighHigh 38.95 (4.7%) 39.44 (4.6%) 1.3% ( -7% - 11%) 0.392
BrowseDayOfYearSSDVFacets 4.68 (10.0%) 4.77 (9.7%) 1.9% ( -16% - 23%) 0.551
BrowseMonthTaxoFacets 1.20 (9.0%) 1.23 (9.7%) 2.3% ( -15% - 23%) 0.437
BrowseDayOfYearTaxoFacets 1.15 (9.7%) 1.18 (11.0%) 2.4% ( -16% - 25%) 0.461
HighTerm 2329.95 (4.5%) 2391.41 (5.3%) 2.6% ( -6% - 13%) 0.092
BrowseDateTaxoFacets 1.16 (9.7%) 1.19 (11.1%) 2.7% ( -16% - 25%) 0.421
TermDTSort 65.25 (7.7%) 68.06 (10.3%) 4.3% ( -12% - 24%) 0.132
CPU merged search profile for my_modified_version:
PROFILE SUMMARY from 1454870 events (total: 1M)
tests.profile.mode=cpu
tests.profile.count=30
tests.profile.stacksize=1
tests.profile.linenumbers=false
PERCENT CPU SAMPLES STACK
9.22% 134161 org.apache.lucene.util.packed.DirectMonotonicReader#get()
8.22% 119591 org.apache.lucene.util.packed.DirectReader$DirectPackedReader12#get()
5.35% 77845 org.apache.lucene.codecs.lucene90.Lucene90DocValuesProducer$17#binaryValue()
3.52% 51219 java.util.Collections$UnmodifiableCollection$1#<init>()
3.49% 50705 org.apache.lucene.facet.taxonomy.FastTaxonomyFacetCounts#countAll()
2.44% 35538 org.apache.lucene.store.ByteBufferGuard#getShort()
2.37% 34551 java.nio.Buffer#scope()
2.03% 29597 java.nio.ByteBuffer#getArray()
2.02% 29320 jdk.internal.misc.Unsafe#convEndian()
1.91% 27740 java.nio.DirectByteBuffer#getShort()
1.90% 27626 org.apache.lucene.store.ByteBufferIndexInput#readBytes()
1.81% 26300 java.util.Objects#checkIndex()
1.76% 25665 org.apache.lucene.codecs.lucene90.Lucene90PostingsReader$EverythingEnum#advance()
1.60% 23302 org.apache.lucene.codecs.lucene90.Lucene90DocValuesProducer$VaryingBPVReader#getLongValue()
1.51% 21980 org.apache.lucene.codecs.lucene90.Lucene90PostingsReader$EverythingEnum#nextPosition()
1.51% 21925 org.apache.lucene.queries.intervals.OrderedIntervalsSource$OrderedIntervalIterator#nextInterval()
1.22% 17802 jdk.internal.misc.Unsafe#copyMemory()
1.16% 16832 org.apache.lucene.codecs.lucene90.Lucene90PostingsReader$BlockImpactsPostingsEnum#advance()
1.14% 16544 org.apache.lucene.facet.sortedset.SortedSetDocValuesFacetCounts#countOneSegment()
1.11% 16206 jdk.internal.util.Preconditions#checkFromIndexSize()
1.10% 16072 org.apache.lucene.codecs.lucene90.ForUtil#expand8()
1.07% 15617 java.nio.Buffer#position()
1.05% 15245 org.apache.lucene.util.packed.DirectReader$DirectPackedReader4#get()
0.97% 14144 org.apache.lucene.queries.spans.NearSpansOrdered#stretchToOrder()
0.95% 13785 org.apache.lucene.search.ConjunctionDISI#doNext()
0.94% 13737 org.apache.lucene.queries.intervals.IntervalFilter#nextInterval()
0.94% 13647 org.apache.lucene.codecs.lucene90.Lucene90PostingsReader$EverythingEnum#skipPositions()
0.86% 12467 java.util.Collections$UnmodifiableCollection$1#hasNext()
0.82% 11951 org.apache.lucene.search.Weight$DefaultBulkScorer#scoreAll()
0.82% 11948 org.apache.lucene.codecs.lucene90.Lucene90DocValuesProducer$20#ordValue()
CPU merged search profile for baseline:
PROFILE SUMMARY from 1455846 events (total: 1M)
tests.profile.mode=cpu
tests.profile.count=30
tests.profile.stacksize=1
tests.profile.linenumbers=false
PERCENT CPU SAMPLES STACK
9.92% 144437 org.apache.lucene.util.packed.DirectMonotonicReader#get()
8.91% 129653 org.apache.lucene.util.packed.DirectReader$DirectPackedReader12#get()
6.42% 93437 org.apache.lucene.codecs.lucene90.Lucene90DocValuesProducer$17#binaryValue()
3.60% 52446 java.util.Collections$UnmodifiableCollection$1#<init>()
3.30% 48044 org.apache.lucene.facet.taxonomy.FastTaxonomyFacetCounts#countAll()
2.33% 33969 org.apache.lucene.store.ByteBufferIndexInput#readBytes()
2.26% 32835 java.nio.ByteBuffer#getArray()
1.99% 28936 java.util.Objects#checkIndex()
1.96% 28474 org.apache.lucene.store.ByteBufferGuard#getShort()
1.92% 27908 java.nio.Buffer#scope()
1.87% 27199 jdk.internal.util.Preconditions#checkFromIndexSize()
1.77% 25702 org.apache.lucene.codecs.lucene90.Lucene90PostingsReader$EverythingEnum#advance()
1.71% 24943 jdk.internal.misc.Unsafe#convEndian()
1.67% 24365 org.apache.lucene.codecs.lucene90.Lucene90DocValuesProducer$VaryingBPVReader#getLongValue()
1.67% 24310 org.apache.lucene.facet.sortedset.SortedSetDocValuesFacetCounts#countOneSegment()
1.57% 22795 org.apache.lucene.queries.intervals.OrderedIntervalsSource$OrderedIntervalIterator#nextInterval()
1.51% 21924 org.apache.lucene.codecs.lucene90.Lucene90PostingsReader$EverythingEnum#nextPosition()
1.23% 17931 java.nio.Buffer#position()
1.18% 17145 jdk.internal.misc.Unsafe#copyMemory()
1.16% 16845 org.apache.lucene.codecs.lucene90.Lucene90PostingsReader$BlockImpactsPostingsEnum#advance()
1.12% 16270 org.apache.lucene.codecs.lucene90.ForUtil#expand8()
1.10% 15971 java.nio.DirectByteBuffer#getShort()
0.99% 14447 org.apache.lucene.util.packed.DirectReader$DirectPackedReader4#get()
0.98% 14252 org.apache.lucene.codecs.lucene90.Lucene90PostingsReader$EverythingEnum#skipPositions()
0.95% 13817 org.apache.lucene.queries.intervals.IntervalFilter#nextInterval()
0.93% 13573 org.apache.lucene.queries.spans.NearSpansOrdered#stretchToOrder()
0.92% 13323 org.apache.lucene.search.ConjunctionDISI#doNext()
0.80% 11677 java.util.Collections$UnmodifiableCollection$1#hasNext()
0.78% 11422 org.apache.lucene.search.Weight$DefaultBulkScorer#scoreAll()
0.76% 11123 org.apache.lucene.queries.spans.NearSpansOrdered#nextStartPosition()
HEAP merged search profile for my_modified_version:
PROFILE SUMMARY from 78058 events (total: 27928M)
tests.profile.mode=heap
tests.profile.count=30
tests.profile.stacksize=1
tests.profile.linenumbers=false
PERCENT HEAP SAMPLES STACK
17.16% 4792M org.apache.lucene.util.FixedBitSet#<init>()
8.40% 2344M org.apache.lucene.search.ExactPhraseMatcher$1$1#getImpacts()
7.41% 2070M org.apache.lucene.search.ExactPhraseMatcher$1#getImpacts()
6.80% 1898M java.util.AbstractList#iterator()
5.46% 1524M org.apache.lucene.codecs.lucene90.blocktree.SegmentTermsEnumFrame#<init>()
3.18% 888M org.apache.lucene.util.ArrayUtil#growExact()
2.97% 830M org.apache.lucene.util.BytesRef#<init>()
2.69% 750M java.util.ArrayList#grow()
2.60% 726M org.apache.lucene.util.fst.ByteSequenceOutputs#read()
2.56% 715M org.apache.lucene.queryparser.charstream.FastCharStream#refill()
2.43% 677M org.apache.lucene.codecs.lucene90.Lucene90PostingsReader$BlockDocsEnum#<init>()
1.79% 499M java.nio.DirectByteBufferR#duplicate()
1.67% 466M org.apache.lucene.queryparser.charstream.FastCharStream#GetImage()
1.66% 463M jdk.internal.misc.Unsafe#allocateUninitializedArray()
1.58% 440M org.apache.lucene.codecs.lucene90.blocktree.SegmentTermsEnum#getFrame()
1.55% 433M java.util.ArrayList#iterator()
1.34% 375M org.apache.lucene.util.PriorityQueue#<init>()
1.30% 363M org.apache.lucene.codecs.lucene90.blocktree.IntersectTermsEnumFrame#load()
1.20% 333M org.apache.lucene.codecs.lucene90.Lucene90PostingsReader$BlockImpactsDocsEnum#<init>()
1.14% 318M org.apache.lucene.codecs.lucene90.blocktree.SegmentTermsEnum#<init>()
1.11% 309M org.apache.lucene.codecs.lucene90.Lucene90PostingsReader#newTermState()
1.09% 305M org.apache.lucene.codecs.lucene90.ForUtil#<init>()
0.87% 244M java.nio.DirectByteBufferR#slice()
0.83% 230M org.apache.lucene.codecs.lucene90.Lucene90PostingsReader$EverythingEnum#<init>()
0.82% 229M org.apache.lucene.codecs.lucene90.Lucene90PostingsReader$BlockImpactsPostingsEnum#<init>()
0.80% 223M org.apache.lucene.codecs.lucene90.compressing.Lucene90CompressingStoredFieldsReader$BlockState#document()
0.73% 204M java.nio.DirectByteBufferR#asLongBuffer()
0.73% 203M org.apache.lucene.codecs.lucene90.blocktree.IntersectTermsEnumFrame#<init>()
0.72% 201M java.util.AbstractList#listIterator()
0.68% 191M java.util.Arrays#copyOf()
HEAP merged search profile for baseline:
PROFILE SUMMARY from 78116 events (total: 27923M)
tests.profile.mode=heap
tests.profile.count=30
tests.profile.stacksize=1
tests.profile.linenumbers=false
PERCENT HEAP SAMPLES STACK
17.15% 4789M org.apache.lucene.util.FixedBitSet#<init>()
8.30% 2317M org.apache.lucene.search.ExactPhraseMatcher$1$1#getImpacts()
7.31% 2040M org.apache.lucene.search.ExactPhraseMatcher$1#getImpacts()
6.89% 1925M java.util.AbstractList#iterator()
5.39% 1506M org.apache.lucene.codecs.lucene90.blocktree.SegmentTermsEnumFrame#<init>()
3.24% 904M org.apache.lucene.util.ArrayUtil#growExact()
3.01% 840M org.apache.lucene.util.BytesRef#<init>()
2.72% 759M java.util.ArrayList#grow()
2.59% 724M org.apache.lucene.util.fst.ByteSequenceOutputs#read()
2.50% 697M org.apache.lucene.queryparser.charstream.FastCharStream#refill()
2.41% 673M org.apache.lucene.codecs.lucene90.Lucene90PostingsReader$BlockDocsEnum#<init>()
1.85% 515M java.nio.DirectByteBufferR#duplicate()
1.69% 472M jdk.internal.misc.Unsafe#allocateUninitializedArray()
1.60% 446M org.apache.lucene.queryparser.charstream.FastCharStream#GetImage()
1.56% 434M org.apache.lucene.codecs.lucene90.blocktree.SegmentTermsEnum#getFrame()
1.50% 419M java.util.ArrayList#iterator()
1.36% 378M org.apache.lucene.util.PriorityQueue#<init>()
1.33% 371M org.apache.lucene.codecs.lucene90.blocktree.IntersectTermsEnumFrame#load()
1.15% 321M org.apache.lucene.codecs.lucene90.blocktree.SegmentTermsEnum#<init>()
1.12% 313M org.apache.lucene.codecs.lucene90.Lucene90PostingsReader$BlockImpactsDocsEnum#<init>()
1.07% 298M org.apache.lucene.codecs.lucene90.Lucene90PostingsReader#newTermState()
1.02% 284M org.apache.lucene.codecs.lucene90.ForUtil#<init>()
0.93% 260M java.nio.DirectByteBufferR#slice()
0.82% 228M org.apache.lucene.codecs.lucene90.Lucene90PostingsReader$EverythingEnum#<init>()
0.81% 227M org.apache.lucene.codecs.lucene90.Lucene90PostingsReader$BlockImpactsPostingsEnum#<init>()
0.81% 225M org.apache.lucene.codecs.lucene90.compressing.Lucene90CompressingStoredFieldsReader$BlockState#document()
0.77% 214M java.nio.DirectByteBufferR#asLongBuffer()
0.69% 193M org.apache.lucene.queryparser.classic.Token#newToken()
0.67% 187M java.util.AbstractList#listIterator()
0.66% 185M java.util.Arrays#copyOf()
{noformat}
It looks like there's a slight improvement for some query and sorting tasks (for example TermDTSort at +4.3%, p-value 0.132, and HighTerm at +2.6%, p-value 0.092). The new code is much cleaner, so I see no reason not to commit this. I am still open to suggestions about the FST readers.
> Improve ByteArrayDataInput to read primitive short/int/long natively using VarHandles
> -------------------------------------------------------------------------------------
>
> Key: LUCENE-10113
> URL: https://issues.apache.org/jira/browse/LUCENE-10113
> Project: Lucene - Core
> Issue Type: Improvement
> Components: core/store
> Affects Versions: main (9.0)
> Reporter: Uwe Schindler
> Assignee: Uwe Schindler
> Priority: Major
> Fix For: main (9.0)
>
> Time Spent: 10m
> Remaining Estimate: 0h
>
> LUCENE-10112 reminded me about something I wanted to do long ago: basically, for all IndexInputs/DataInputs we are able to natively read short, int, and long values in little endian with single CPU instructions (by using ByteBuffer's methods that support primitive reads). Only ByteArrayDataInput still uses manual code based on the inherited byte-by-byte approach of reading single bytes and combining them in little-endian order.
> The approach here is to use Java 9+ VarHandles to read int/long/short values as single CPU instructions instead of manually recombining the bytes. The trick is to create a "view" VarHandle, which allows accessing the byte array using the same mechanisms as ByteBuffers or JDK 17 MemorySegments (under the hood it uses Unsafe to issue CPU instructions and optionally swaps bytes if the platform endianness is BE).
> In LUCENE-10112 similar work was done for LZ4, and a microbenchmark was written that showed a significant speed improvement when accessing the types with a VarHandle.
> P.S.: The same applies to FST.BytesReader and/or ByteSliceReader, but I am not sure whether those use the int/short/long methods at all. At least this one does not override the methods to read ints, longs and shorts, so there is no optimization at all. FST seems to read bytes and byte[] only, and ByteSliceReader mostly VInts.
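For readers unfamiliar with the technique described in the issue, here is a minimal sketch of the two ways to read a little-endian value from a byte[]: combining the bytes manually, and using a byte-array "view" VarHandle. It is not the patch attached to LUCENE-10113; the class name VarHandleReadSketch and its method names are hypothetical and chosen only for illustration. The only JDK API assumed is MethodHandles.byteArrayViewVarHandle (Java 9+).

```java
import java.lang.invoke.MethodHandles;
import java.lang.invoke.VarHandle;
import java.nio.ByteOrder;

// Hypothetical illustration class; not part of Lucene.
public class VarHandleReadSketch {

  // View VarHandles that interpret a region of a byte[] as a little-endian primitive.
  static final VarHandle VH_LE_SHORT =
      MethodHandles.byteArrayViewVarHandle(short[].class, ByteOrder.LITTLE_ENDIAN);
  static final VarHandle VH_LE_INT =
      MethodHandles.byteArrayViewVarHandle(int[].class, ByteOrder.LITTLE_ENDIAN);
  static final VarHandle VH_LE_LONG =
      MethodHandles.byteArrayViewVarHandle(long[].class, ByteOrder.LITTLE_ENDIAN);

  // Manual approach: read four single bytes and combine them in little-endian order.
  static int readIntManual(byte[] b, int pos) {
    return (b[pos] & 0xFF)
        | (b[pos + 1] & 0xFF) << 8
        | (b[pos + 2] & 0xFF) << 16
        | (b[pos + 3] & 0xFF) << 24;
  }

  // VarHandle approach: one call per value; pos is a byte offset into the array.
  static short readShortVarHandle(byte[] b, int pos) {
    return (short) VH_LE_SHORT.get(b, pos);
  }

  static int readIntVarHandle(byte[] b, int pos) {
    return (int) VH_LE_INT.get(b, pos);
  }

  static long readLongVarHandle(byte[] b, int pos) {
    return (long) VH_LE_LONG.get(b, pos);
  }

  public static void main(String[] args) {
    byte[] data = {0x78, 0x56, 0x34, 0x12, (byte) 0xEF, (byte) 0xCD, (byte) 0xAB, (byte) 0x90};
    System.out.println(Integer.toHexString(readIntManual(data, 0)));
    System.out.println(Integer.toHexString(readIntVarHandle(data, 0)));
    System.out.println(Long.toHexString(readLongVarHandle(data, 0)));
  }
}
```

Plain get() on such a view VarHandle also works at unaligned byte offsets, and an out-of-range offset throws IndexOutOfBoundsException, so the caller gets bounds checking without extra code. Whether the JIT actually compiles the call down to a single load depends on the JVM and platform.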