Posted to dev@lucene.apache.org by "Tomás Fernández Löbbe (JIRA)" <ji...@apache.org> on 2019/02/08 23:28:00 UTC
[jira] [Resolved] (LUCENE-8662) Change
TermsEnum.seekExact(BytesRef) to abstract + delegate seekExact(BytesRef) in
FilterLeafReader.FilterTermsEnum
[ https://issues.apache.org/jira/browse/LUCENE-8662?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Tomás Fernández Löbbe resolved LUCENE-8662.
-------------------------------------------
Resolution: Fixed
Fix Version/s: (was: 7.7)
> Change TermsEnum.seekExact(BytesRef) to abstract + delegate seekExact(BytesRef) in FilterLeafReader.FilterTermsEnum
> -------------------------------------------------------------------------------------------------------------------
>
> Key: LUCENE-8662
> URL: https://issues.apache.org/jira/browse/LUCENE-8662
> Project: Lucene - Core
> Issue Type: Improvement
> Components: core/search
> Affects Versions: 5.5.5, 6.6.5, 7.6, 8.0
> Reporter: jefferyyuan
> Priority: Major
> Labels: query
> Fix For: 8.0
>
> Attachments: output of test program.txt
>
> Time Spent: 1h
> Remaining Estimate: 0h
>
> Recently, in production, we found that Solr used a lot of memory (more than 10 GB) during recovery or commit for a small index (3.5 GB).
> The stack trace is:
>
> {code:java}
> Thread 0x4d4b115c0
> at org.apache.lucene.store.DataInput.readVInt()I (DataInput.java:125)
> at org.apache.lucene.codecs.blocktree.SegmentTermsEnumFrame.loadBlock()V (SegmentTermsEnumFrame.java:157)
> at org.apache.lucene.codecs.blocktree.SegmentTermsEnumFrame.scanToTermNonLeaf(Lorg/apache/lucene/util/BytesRef;Z)Lorg/apache/lucene/index/TermsEnum$SeekStatus; (SegmentTermsEnumFrame.java:786)
> at org.apache.lucene.codecs.blocktree.SegmentTermsEnumFrame.scanToTerm(Lorg/apache/lucene/util/BytesRef;Z)Lorg/apache/lucene/index/TermsEnum$SeekStatus; (SegmentTermsEnumFrame.java:538)
> at org.apache.lucene.codecs.blocktree.SegmentTermsEnum.seekCeil(Lorg/apache/lucene/util/BytesRef;)Lorg/apache/lucene/index/TermsEnum$SeekStatus; (SegmentTermsEnum.java:757)
> at org.apache.lucene.index.FilterLeafReader$FilterTermsEnum.seekCeil(Lorg/apache/lucene/util/BytesRef;)Lorg/apache/lucene/index/TermsEnum$SeekStatus; (FilterLeafReader.java:185)
> at org.apache.lucene.index.TermsEnum.seekExact(Lorg/apache/lucene/util/BytesRef;)Z (TermsEnum.java:74)
> at org.apache.solr.search.SolrIndexSearcher.lookupId(Lorg/apache/lucene/util/BytesRef;)J (SolrIndexSearcher.java:823)
> at org.apache.solr.update.VersionInfo.getVersionFromIndex(Lorg/apache/lucene/util/BytesRef;)Ljava/lang/Long; (VersionInfo.java:204)
> at org.apache.solr.update.UpdateLog.lookupVersion(Lorg/apache/lucene/util/BytesRef;)Ljava/lang/Long; (UpdateLog.java:786)
> at org.apache.solr.update.VersionInfo.lookupVersion(Lorg/apache/lucene/util/BytesRef;)Ljava/lang/Long; (VersionInfo.java:194)
> at org.apache.solr.update.processor.DistributedUpdateProcessor.versionAdd(Lorg/apache/solr/update/AddUpdateCommand;)Z (DistributedUpdateProcessor.java:1051)
> {code}
> We reproduced the problem locally with the following code, which uses only Lucene:
> {code:java}
> public static void main(String[] args) throws IOException {
>   FSDirectory index = FSDirectory.open(Paths.get("the-index"));
>   try (IndexReader reader = new ExitableDirectoryReader(DirectoryReader.open(index),
>       new QueryTimeoutImpl(1000 * 60 * 5))) {
>     String id = "the-id";
>     BytesRef text = new BytesRef(id);
>     for (LeafReaderContext lf : reader.leaves()) {
>       TermsEnum te = lf.reader().terms("id").iterator();
>       System.out.println(te.seekExact(text));
>     }
>   }
> }
> {code}
>
> I added System.out.println("ord: " + ord); in codecs.blocktree.SegmentTermsEnum.getFrame(int).
> Please check the attached output of test program.txt.
>
> We found the root cause:
> FilterLeafReader.FilterTermsEnum does not override seekExact(BytesRef), so it falls back to the base class TermsEnum.seekExact(BytesRef) implementation, which is very inefficient in this case because it goes through seekCeil(BytesRef):
> {code:java}
> public boolean seekExact(BytesRef text) throws IOException {
>   return seekCeil(text) == SeekStatus.FOUND;
> }
> {code}
> The fix is simple: override seekExact(BytesRef) in FilterLeafReader.FilterTermsEnum so that it delegates to the wrapped enum:
> {code:java}
> @Override
> public boolean seekExact(BytesRef text) throws IOException {
>   return in.seekExact(text);
> }
> {code}
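
To make the delegation problem easier to see without a real index, here is a minimal, self-contained sketch. It does not use Lucene: SimpleTermsEnum, FastEnum, NonDelegatingFilter and DelegatingFilter are hypothetical stand-ins for TermsEnum, the codec's enum, and the wrapper before and after the fix. Terms are plain strings and the call counters exist only for illustration. It shows which method of the wrapped enum is actually reached in each case, not the memory behavior from the report.

```java
import java.util.Arrays;
import java.util.TreeSet;

public class SeekExactDelegationSketch {

    /** Hypothetical stand-in for TermsEnum: seekExact defaults to seekCeil. */
    abstract static class SimpleTermsEnum {
        /** Returns the smallest term >= text, or null if there is none. */
        abstract String seekCeil(String text);

        /** Default implementation, analogous to the base-class fallback quoted above. */
        boolean seekExact(String text) {
            return text.equals(seekCeil(text));
        }
    }

    /** Stand-in for an enum that has its own, cheaper exact lookup. */
    static class FastEnum extends SimpleTermsEnum {
        private final TreeSet<String> terms;
        int seekCeilCalls = 0;   // illustration only
        int seekExactCalls = 0;  // illustration only

        FastEnum(String... terms) {
            this.terms = new TreeSet<>(Arrays.asList(terms));
        }

        @Override
        String seekCeil(String text) {
            seekCeilCalls++;
            return terms.ceiling(text);
        }

        @Override
        boolean seekExact(String text) {
            seekExactCalls++;
            return terms.contains(text);
        }
    }

    /** Wrapper that forwards seekCeil but inherits the default seekExact. */
    static class NonDelegatingFilter extends SimpleTermsEnum {
        final SimpleTermsEnum in;

        NonDelegatingFilter(SimpleTermsEnum in) {
            this.in = in;
        }

        @Override
        String seekCeil(String text) {
            return in.seekCeil(text);
        }
    }

    /** Wrapper with the fix applied: seekExact is forwarded as well. */
    static class DelegatingFilter extends NonDelegatingFilter {
        DelegatingFilter(SimpleTermsEnum in) {
            super(in);
        }

        @Override
        boolean seekExact(String text) {
            return in.seekExact(text);
        }
    }

    public static void main(String[] args) {
        FastEnum a = new FastEnum("a-id", "the-id", "z-id");
        System.out.println(new NonDelegatingFilter(a).seekExact("the-id"));
        System.out.println("seekCeil calls: " + a.seekCeilCalls
            + ", seekExact calls: " + a.seekExactCalls);

        FastEnum b = new FastEnum("a-id", "the-id", "z-id");
        System.out.println(new DelegatingFilter(b).seekExact("the-id"));
        System.out.println("seekCeil calls: " + b.seekCeilCalls
            + ", seekExact calls: " + b.seekExactCalls);
    }
}
```

Both wrappers return the same answer, but only the delegating one reaches the wrapped enum's own seekExact. The non-delegating one silently routes every exact lookup through seekCeil, which matches the FilterTermsEnum.seekCeil frame in the stack trace above.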