Posted to issues@lucene.apache.org by "Robert Muir (Jira)" <ji...@apache.org> on 2021/02/21 15:50:00 UTC
[jira] [Commented] (LUCENE-9795) investigate large checkindex/grouping regression in nightly benchmarks
[ https://issues.apache.org/jira/browse/LUCENE-9795?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17287988#comment-17287988 ]
Robert Muir commented on LUCENE-9795:
-------------------------------------
OK, I think I can explain the checkindex stuff.
When profiling the unit tests, I see this stack as the top CPU consumer:
{noformat}
java.nio.ByteBuffer#get()
at java.nio.DirectByteBuffer#get()
at org.apache.lucene.store.ByteBufferGuard#getBytes()
at org.apache.lucene.store.ByteBufferIndexInput#readBytes()
at org.apache.lucene.store.MockIndexInputWrapper#readBytes()
at org.apache.lucene.util.compress.LZ4#decompress()
at org.apache.lucene.codecs.lucene80.Lucene80DocValuesProducer$TermsDict#decompressBlock()
at org.apache.lucene.codecs.lucene80.Lucene80DocValuesProducer$TermsDict#next()
at org.apache.lucene.codecs.lucene80.Lucene80DocValuesProducer$TermsDict#seekExact()
at org.apache.lucene.codecs.lucene80.Lucene80DocValuesProducer$BaseSortedDocValues#lookupOrd()
at org.apache.lucene.index.SortedDocValues#binaryValue()
at org.apache.lucene.index.CheckIndex#checkBinaryDocValues()
{noformat}
I don't think CheckIndex should retrieve the bytes for every document of a SORTED field as if it were a BINARY field. This looks like a leftover to me. I will upload a simple patch.
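To make the cost difference concrete, here is a minimal, self-contained sketch. It is not Lucene code and not the patch: `ToyTermsDict`, `checkPerDocument` and `checkPerOrdinal` are hypothetical names, and the sketch simply assumes that every ordinal lookup pays for one block decode. Under that assumption, checking term bytes once per document costs as many decodes as there are documents, while validating the per-document ordinals as plain ints and resolving each unique term once costs only as many decodes as there are unique values.

```java
public class SortedLookupCostSketch {

    /** Toy stand-in for a compressed terms dictionary; NOT Lucene's implementation. */
    static final class ToyTermsDict {
        private final String[] terms; // sorted unique values; array index == ordinal
        long blockDecodes = 0;        // how many simulated block decodes were paid

        ToyTermsDict(String[] terms) {
            this.terms = terms;
        }

        /** Assumption of this sketch: each lookup decodes the block holding 'ord'. */
        String lookupOrd(int ord) {
            blockDecodes++;
            return terms[ord];
        }

        int valueCount() {
            return terms.length;
        }
    }

    /** BINARY-style check: fetch the term bytes once per document. */
    static long checkPerDocument(ToyTermsDict dict, int[] docToOrd) {
        long before = dict.blockDecodes;
        for (int ord : docToOrd) {
            if (dict.lookupOrd(ord) == null) {
                throw new IllegalStateException("null term for ord " + ord);
            }
        }
        return dict.blockDecodes - before;
    }

    /** SORTED-style check: validate ordinals as ints, resolve each unique term once. */
    static long checkPerOrdinal(ToyTermsDict dict, int[] docToOrd) {
        long before = dict.blockDecodes;
        for (int ord : docToOrd) {
            if (ord < 0 || ord >= dict.valueCount()) {
                throw new IllegalStateException("ord out of bounds: " + ord);
            }
        }
        for (int ord = 0; ord < dict.valueCount(); ord++) {
            if (dict.lookupOrd(ord) == null) {
                throw new IllegalStateException("null term for ord " + ord);
            }
        }
        return dict.blockDecodes - before;
    }

    public static void main(String[] args) {
        ToyTermsDict dict = new ToyTermsDict(new String[] {"apple", "banana", "cherry"});
        int[] docToOrd = new int[1000];
        for (int doc = 0; doc < docToOrd.length; doc++) {
            docToOrd[doc] = doc % 3; // every document points at one of three terms
        }
        System.out.println("per-document decodes: " + checkPerDocument(dict, docToOrd));
        System.out.println("per-ordinal decodes:  " + checkPerOrdinal(dict, docToOrd));
    }
}
```

The real per-lookup cost depends on the codec; the sketch only illustrates why a per-document byte check scales with the document count rather than with the number of unique values.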
The grouping stuff should probably be a separate issue. I suspect the grouping logic may be doing something similarly inefficient (reading tons of term bytes instead of using ordinals, or something like that).
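As an illustration of what "using ordinals" could mean here, the following is a hypothetical, self-contained sketch and not the actual grouping code (which has not been examined yet). `OrdinalGroupingSketch` and `groupCounts` are made-up names, and a plain `String[]` stands in for the terms dictionary. The hot loop touches only int ordinals, and each group's term is resolved once at the end.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class OrdinalGroupingSketch {

    /**
     * Counts documents per group using ordinals only, then resolves the term
     * for each non-empty group exactly once. 'termsByOrd' is a toy stand-in
     * for a terms dictionary (array index == ordinal).
     */
    static Map<String, Integer> groupCounts(String[] termsByOrd, int[] docToOrd) {
        int[] counts = new int[termsByOrd.length];
        for (int ord : docToOrd) {
            counts[ord]++; // int-only work per document, no term bytes read
        }
        Map<String, Integer> result = new LinkedHashMap<>();
        for (int ord = 0; ord < counts.length; ord++) {
            if (counts[ord] > 0) {
                result.put(termsByOrd[ord], counts[ord]); // one term lookup per group
            }
        }
        return result;
    }

    public static void main(String[] args) {
        String[] terms = {"apple", "banana", "cherry"};
        int[] docToOrd = {0, 2, 2, 0, 0};
        System.out.println(groupCounts(terms, docToOrd));
    }
}
```

Whether the real grouping collectors can be restructured this way is an open question for the separate issue.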
> investigate large checkindex/grouping regression in nightly benchmarks
> ----------------------------------------------------------------------
>
> Key: LUCENE-9795
> URL: https://issues.apache.org/jira/browse/LUCENE-9795
> Project: Lucene - Core
> Issue Type: Improvement
> Reporter: Robert Muir
> Priority: Major
> Attachments: Screen_Shot_2021-02-21_at_09.17.53.png, Screen_Shot_2021-02-21_at_09.30.30.png
>
>
> In the nightly benchmark, checkindex times increased by more than 4x on the 2/16 datapoint.
> Looking at the commits on 2/15, the most obvious thing to look into is the docvalues terms dict compression: LUCENE-9663.
> I will try to pinpoint it further. My concern is a perf bug, such as every single term causing decompression of the whole block repeatedly (a missing seek-within-block optimization?).