Posted to issues@lucene.apache.org by "Robert Muir (Jira)" <ji...@apache.org> on 2021/10/02 13:27:00 UTC
[jira] [Commented] (LUCENE-10128) large indexing slowdown after increasing HNSW beam width
[ https://issues.apache.org/jira/browse/LUCENE-10128?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17423529#comment-17423529 ]
Robert Muir commented on LUCENE-10128:
--------------------------------------
I opened LUCENE-10142 to try to reduce the large amount of time I see spent in java.util.Random here.
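
For context on why java.util.Random can be this visible in a profile: in OpenJDK its seed is kept in an AtomicLong and advanced with a compare-and-set loop on every call, which is likely where the AtomicLong#get() samples in the profile quoted below come from. A minimal sketch of the comparison follows, assuming a single-threaded caller. The class RngSketch and its helper methods are hypothetical names for illustration only, and java.util.SplittableRandom is just one possible non-atomic alternative, not necessarily what LUCENE-10142 ends up doing.

```java
import java.util.Random;
import java.util.SplittableRandom;

// Hypothetical illustration class; not part of Lucene.
public class RngSketch {

    // Draws n bounded ints from java.util.Random and sums them.
    // Each nextInt() call updates the generator's AtomicLong seed.
    static long sumWithRandom(long seed, int n, int bound) {
        Random random = new Random(seed);
        long sum = 0;
        for (int i = 0; i < n; i++) {
            sum += random.nextInt(bound);
        }
        return sum;
    }

    // Same loop with java.util.SplittableRandom, which keeps its state
    // in plain fields and is not safe to share between threads.
    static long sumWithSplittableRandom(long seed, int n, int bound) {
        SplittableRandom random = new SplittableRandom(seed);
        long sum = 0;
        for (int i = 0; i < n; i++) {
            sum += random.nextInt(bound);
        }
        return sum;
    }

    public static void main(String[] args) {
        int n = 50_000_000;
        int bound = 16;

        // Crude wall-clock timing only; a real comparison should use JMH.
        long t0 = System.nanoTime();
        long a = sumWithRandom(42L, n, bound);
        long t1 = System.nanoTime();
        long b = sumWithSplittableRandom(42L, n, bound);
        long t2 = System.nanoTime();

        System.out.println("Random:           " + (t1 - t0) / 1_000_000 + " ms (sum=" + a + ")");
        System.out.println("SplittableRandom: " + (t2 - t1) / 1_000_000 + " ms (sum=" + b + ")");
    }
}
```

The two generators produce different sequences for the same seed, so any swap like this would change which values are drawn while keeping runs reproducible for a fixed seed.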
> large indexing slowdown after increasing HNSW beam width
> --------------------------------------------------------
>
> Key: LUCENE-10128
> URL: https://issues.apache.org/jira/browse/LUCENE-10128
> Project: Lucene - Core
> Issue Type: Task
> Reporter: Robert Muir
> Priority: Major
> Attachments: LUCENE-10128_remove_sparse_fixed_bitset_reflection.patch, Screen_Shot_2021-09-28_at_09.10.15.png
>
>
> Just opening a ticket in case there is anything we could/should do about it. Looking at Mike's nightly benchmarks, I see a large (roughly 4x) drop in vector indexing performance after LUCENE-10109.
> There are some new entries among the top CPU offenders:
> {noformat}
> PERCENT   CPU SAMPLES   STACK
> 19.93%    821395        org.apache.lucene.util.VectorUtil#dotProduct()
> 13.80%    568786        org.apache.lucene.util.LongHeap#downHeap()
> 11.06%    455711        org.apache.lucene.codecs.KnnVectorsWriter$VectorValuesMerger$MergerRandomAccess#vectorValue()
> 9.84%     405678        org.apache.lucene.util.LongHeap#upHeap()
> 6.72%     276931        java.util.concurrent.atomic.AtomicLong#get()
> 5.30%     218564        org.apache.lucene.util.LongHeap$2#lessThan()
> 2.69%     110872        java.util.Arrays#binarySearch0()
> 2.58%     106294        org.apache.lucene.util.hnsw.HnswGraph#search()
> 1.90%     78254         org.apache.lucene.util.LongHeap#push()
> {noformat}
> Compare that to before, when the profile stacks looked like this:
> {noformat}
> PERCENT   CPU SAMPLES   STACK
> 13.58%    171575        org.apache.lucene.util.VectorUtil#dotProduct()
> 10.13%    127904        org.apache.lucene.util.LongHeap#downHeap()
> 9.84%     124257        org.apache.lucene.util.LongHeap#upHeap()
> 6.26%     79125         java.util.ArrayList#elementData()
> 4.34%     54831         java.util.Random#nextInt()
> 3.98%     50255         org.apache.lucene.util.BytesRefHash#equals()
> 3.69%     46594         org.apache.lucene.util.ByteBlockPool#allocSlice()
> 2.62%     33118         org.apache.lucene.util.BytesRefHash#findHash()
> 2.24%     28275         org.apache.lucene.analysis.standard.StandardTokenizerImpl#getNextToken()
> 2.14%     27033         org.apache.lucene.analysis.standard.StandardTokenizer#incrementToken()
> {noformat}
> At a glance, it seems to me that although some perf differences should be expected, merging itself may have become more costly. Maybe there is something we can optimize there.
--
This message was sent by Atlassian Jira
(v8.3.4#803005)