Posted to dev@lucene.apache.org by "David Smiley (JIRA)" <ji...@apache.org> on 2019/05/16 03:22:00 UTC
[jira] [Closed] (LUCENE-8800) FieldsReader#terms poor performance
on a index with many field names sharing common prefix
[ https://issues.apache.org/jira/browse/LUCENE-8800?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
David Smiley closed LUCENE-8800.
--------------------------------
> FieldsReader#terms poor performance on a index with many field names sharing common prefix
> ------------------------------------------------------------------------------------------
>
> Key: LUCENE-8800
> URL: https://issues.apache.org/jira/browse/LUCENE-8800
> Project: Lucene - Core
> Issue Type: Improvement
> Components: core/codecs
> Affects Versions: 8.0
> Reporter: Huy Le
> Priority: Major
> Attachments: Screen Shot 2019-05-15 at 5.08.26 pm.png
>
>
> We have experienced poor performance on an index with many fields whose names share a common prefix. Stack sampling with JProfiler showed a hotspot in the method FieldsReader#terms.
> !Screen Shot 2019-05-15 at 5.08.26 pm.png!
> Looking at the source code, I see that a TreeMap is used to map field names to FieldsProducer instances, which means each lookup incurs O(log N) comparisons.
> {code:java}
> private static class FieldsReader extends FieldsProducer {
>   ...
>   private final Map<String,FieldsProducer> fields = new TreeMap<>();
>   ...
>   @Override
>   public Terms terms(String field) throws IOException {
>     FieldsProducer fieldsProducer = fields.get(field);
>     return fieldsProducer == null ? null : fieldsProducer.terms(field);
>   }
> {code}
> The problem becomes much worse when field names are long and share a common prefix, because each comparison has to scan the entire shared prefix before it reaches a differing character.
> In our case, the index has around 6000 fields of the form customfield_*. I wonder if we could change the TreeMap to a HashMap, or to a LinkedHashMap populated in sorted order if we want to preserve the sorted iteration order, to improve the situation.
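The cost described above, and the suggested alternative, can be sketched in isolation. This is a minimal illustration, not Lucene's code: the class FieldLookupSketch, the CountingComparator helper, and the Integer placeholder values (standing in for FieldsProducer) are all hypothetical. It counts how many key comparisons a single TreeMap lookup performs with customfield_-style names, and shows that a LinkedHashMap copied from a sorted map keeps the sorted iteration order while offering hash-based lookups.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch; Integer values stand in for FieldsProducer instances.
final class FieldLookupSketch {

    /** Counts how many times the TreeMap asks to compare two keys. */
    static final class CountingComparator implements Comparator<String> {
        long comparisons = 0;

        @Override
        public int compare(String a, String b) {
            comparisons++;
            // Natural String order: characters are compared one by one, so a
            // shared prefix is scanned before a difference is found.
            return a.compareTo(b);
        }
    }

    /** Builds names such as customfield_0, customfield_1, ... */
    static List<String> fieldNames(int n) {
        List<String> names = new ArrayList<>();
        for (int i = 0; i < n; i++) {
            names.add("customfield_" + i);
        }
        return names;
    }

    /** Returns the number of key comparisons made by one TreeMap.get call. */
    static long treeMapComparisonsForOneGet(int n) {
        CountingComparator cmp = new CountingComparator();
        TreeMap<String, Integer> tree = new TreeMap<>(cmp);
        for (String name : fieldNames(n)) {
            tree.put(name, name.length());
        }
        cmp.comparisons = 0; // ignore comparisons made while building the map
        tree.get("customfield_" + (n - 1));
        return cmp.comparisons;
    }

    /** Sorts the keys once, then copies them into a LinkedHashMap in that order. */
    static Map<String, Integer> sortedLinkedHashMap(int n) {
        TreeMap<String, Integer> sorted = new TreeMap<>();
        for (String name : fieldNames(n)) {
            sorted.put(name, name.length());
        }
        // The copy is filled in the source map's iteration order, so later
        // iteration stays sorted while get() uses hashing.
        return new LinkedHashMap<>(sorted);
    }

    public static void main(String[] args) {
        int n = 6000; // roughly the field count mentioned in the report
        System.out.println("TreeMap comparisons for one get: "
                + treeMapComparisonsForOneGet(n));
        Map<String, Integer> fields = sortedLinkedHashMap(n);
        System.out.println("First key in iteration order: "
                + fields.keySet().iterator().next());
    }
}
```

Note that a hash lookup still hashes the key and performs an equals check on a match, so it is not free for long names; the difference is that it avoids the repeated prefix scans of a tree descent. Whether this trade-off suits FieldsReader would need to be measured on a real index.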