Posted to issues@lucene.apache.org by "Bruno Roustant (Jira)" <ji...@apache.org> on 2019/12/02 15:59:00 UTC

[jira] [Commented] (LUCENE-8041) All Fields.terms(fld) impls should be O(1) not O(log(N))

    [ https://issues.apache.org/jira/browse/LUCENE-8041?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16986153#comment-16986153 ] 

Bruno Roustant commented on LUCENE-8041:
----------------------------------------

I'm sorry, I created and worked on what is effectively a duplicate Jira issue, LUCENE-9045 (now linked to this one as a child). I only just learned about this one.

LUCENE-9045 fixed the problem for BlockTree and PerFieldPostingsFormat only.

I read in the thread that we should work on making term vectors consistent across the index. Should I create another Jira issue specific to that (and close this one as a duplicate)? Or should I keep this one and maybe rename it?

> All Fields.terms(fld) impls should be O(1) not O(log(N))
> --------------------------------------------------------
>
>                 Key: LUCENE-8041
>                 URL: https://issues.apache.org/jira/browse/LUCENE-8041
>             Project: Lucene - Core
>          Issue Type: Improvement
>            Reporter: David Smiley
>            Priority: Major
>         Attachments: LUCENE-8041-LinkedHashMap.patch, LUCENE-8041.patch
>
>
> I've seen apps that have a good number of fields -- hundreds.  The O(log(N)) of TreeMap definitely shows up in a profiler; sometimes 20% of search time, if I recall.  There are many Fields implementations that are impacted... in part because Fields is the base class of FieldsProducer.
> As an aside, I hope Fields goes away some day; FieldsProducer should be TermsProducer and not have an iterator of fields. If DocValuesProducer doesn't have this then why should the terms index part of our API have it?  If we did this then the issue here would be a simple transition to a HashMap.
> Or maybe we can switch to HashMap and relax the definition of Fields.iterator to not necessarily be sorted?
> Perhaps the fix can be a relatively simple conversion over to LinkedHashMap in many cases if we can assume when we initialize these internal maps that we consume them in sorted order to begin with.
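
To illustrate the LinkedHashMap idea from the description, here is a minimal sketch. It is not Lucene code and not taken from either attached patch: the class SortedInsertFieldMap and its methods are hypothetical names, and String.compareTo is assumed as the field ordering. The point is only that a LinkedHashMap filled in sorted order gives expected O(1) lookups while still iterating in sorted order.

```java
import java.util.Collections;
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical stand-in for a Fields-like container; NOT Lucene's actual API.
final class SortedInsertFieldMap<T> {
    // LinkedHashMap iterates in insertion order and has expected O(1) get().
    private final Map<String, T> byName = new LinkedHashMap<>();
    private String lastName = null;

    // Callers must add fields in strictly ascending name order. That is the
    // assumption the description makes about how these maps are initialized.
    void add(String name, T terms) {
        if (lastName != null && lastName.compareTo(name) >= 0) {
            throw new IllegalArgumentException(
                "fields must be added in ascending order: " + lastName + " then " + name);
        }
        byName.put(name, terms);
        lastName = name;
    }

    // Hash lookup instead of a TreeMap's O(log(N)) tree descent.
    T terms(String field) {
        return byName.get(field);
    }

    // Insertion order equals sorted order because add() enforces it.
    Iterator<String> iterator() {
        return Collections.unmodifiableSet(byName.keySet()).iterator();
    }

    int size() {
        return byName.size();
    }
}
```

With fields added as "author", "body", "title", terms("body") is a single hash lookup and iterator() still yields the names in sorted order. The check in add() makes the sorted-insertion assumption explicit, so a violation fails fast instead of silently producing an unsorted iterator.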



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
