Posted to dev@lucene.apache.org by "David Smiley (JIRA)" <ji...@apache.org> on 2019/05/16 03:22:00 UTC

[jira] [Closed] (LUCENE-8800) FieldsReader#terms poor performance on an index with many field names sharing common prefix

     [ https://issues.apache.org/jira/browse/LUCENE-8800?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

David Smiley closed LUCENE-8800.
--------------------------------

> FieldsReader#terms poor performance on an index with many field names sharing common prefix
> -------------------------------------------------------------------------------------------
>
>                 Key: LUCENE-8800
>                 URL: https://issues.apache.org/jira/browse/LUCENE-8800
>             Project: Lucene - Core
>          Issue Type: Improvement
>          Components: core/codecs
>    Affects Versions: 8.0
>            Reporter: Huy Le
>            Priority: Major
>         Attachments: Screen Shot 2019-05-15 at 5.08.26 pm.png
>
>
> We have experienced poor performance on an index with many fields whose names share a common prefix. Stack sampling with JProfiler showed a hotspot in the method FieldsReader#terms.
> !Screen Shot 2019-05-15 at 5.08.26 pm.png!
> Looking at the source code, I saw that a TreeMap is used to map each field name to its FieldsProducer, which means a lookup incurs O(log N) string comparisons.
> {code:java}
> private static class FieldsReader extends FieldsProducer {
>     ...    
>     private final Map<String,FieldsProducer> fields = new TreeMap<>();
>     ...
>     @Override
>     public Terms terms(String field) throws IOException {
>       FieldsProducer fieldsProducer = fields.get(field);
>       return fieldsProducer == null ? null : fieldsProducer.terms(field);
>     }
> {code}
> The problem becomes much worse when field names are long and share a common prefix, because each comparison has to scan past the entire shared prefix before it reaches a differing character.
> In our case, the index has around 6000 fields named customfield_*. I wonder if we could change the TreeMap to a HashMap, or to a LinkedHashMap if we want to preserve the sorted order (LinkedHashMap iterates in insertion order, so it stays sorted as long as fields are inserted in sorted order), to improve the situation.
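To make the cost described in the report concrete, here is a small self-contained Java sketch. It is not Lucene code: it only models the field-name map. It counts how many characters TreeMap lookups inspect for 6000 names with a one-character prefix versus the `customfield_` prefix, and checks that a LinkedHashMap copied from the TreeMap keeps the sorted iteration order. The class name `FieldLookupCost`, its helper methods, and the integer values standing in for FieldsProducer are all hypothetical. The comparator reimplements the ordering of `String.compareTo` purely so that it can count characters.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical model of the name -> FieldsProducer map; values are plain ints.
final class FieldLookupCost {
    // Total characters inspected by COUNTING since the last reset.
    static long charsCompared = 0;

    // Same ordering as String.compareTo (char by char, then by length),
    // instrumented to count every character it has to inspect.
    static final Comparator<String> COUNTING = (a, b) -> {
        int n = Math.min(a.length(), b.length());
        for (int i = 0; i < n; i++) {
            charsCompared++;
            char ca = a.charAt(i);
            char cb = b.charAt(i);
            if (ca != cb) {
                return ca - cb;
            }
        }
        return a.length() - b.length();
    };

    // Builds a TreeMap with names "prefix0", "prefix1", ..., "prefix<numFields-1>".
    static TreeMap<String, Integer> buildFields(String prefix, int numFields) {
        TreeMap<String, Integer> fields = new TreeMap<>(COUNTING);
        for (int i = 0; i < numFields; i++) {
            fields.put(prefix + i, i);
        }
        return fields;
    }

    // Average number of characters inspected per successful TreeMap.get().
    static double avgCharsPerLookup(String prefix, int numFields) {
        TreeMap<String, Integer> fields = buildFields(prefix, numFields);
        charsCompared = 0; // ignore the cost of building the map
        for (int i = 0; i < numFields; i++) {
            fields.get(prefix + i);
        }
        return (double) charsCompared / numFields;
    }

    // Hypothetical replacement: a LinkedHashMap copied from the sorted TreeMap
    // iterates in insertion order (here, sorted order), while get() becomes a
    // hash lookup instead of O(log N) string comparisons.
    static boolean sortedOrderPreserved(int numFields) {
        TreeMap<String, Integer> sorted = buildFields("customfield_", numFields);
        Map<String, Integer> linked = new LinkedHashMap<>(sorted);
        List<String> treeOrder = new ArrayList<>(sorted.keySet());
        List<String> linkedOrder = new ArrayList<>(linked.keySet());
        return treeOrder.equals(linkedOrder);
    }

    public static void main(String[] args) {
        int numFields = 6000; // roughly the field count mentioned in the report
        System.out.printf("prefix \"f\":            %.1f chars per lookup%n",
                avgCharsPerLookup("f", numFields));
        System.out.printf("prefix \"customfield_\": %.1f chars per lookup%n",
                avgCharsPerLookup("customfield_", numFields));
        System.out.println("LinkedHashMap keeps sorted order: " + sortedOrderPreserved(numFields));
    }
}
```

Because both name sets sort in the same order, the two trees have the same shape and perform the same number of comparisons; the longer prefix simply adds 11 extra characters to every one of them. A hash lookup, by contrast, touches the name a small constant number of times (computing `hashCode()` if it is not already cached, then typically one `equals()` on a hit) regardless of how many fields exist.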



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)