Posted to dev@lucene.apache.org by "Robert Muir (JIRA)" <ji...@apache.org> on 2015/05/22 15:39:17 UTC

[jira] [Commented] (LUCENE-6496) Updatable OrdinalMap

    [ https://issues.apache.org/jira/browse/LUCENE-6496?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14556147#comment-14556147 ] 

Robert Muir commented on LUCENE-6496:
-------------------------------------

Can we avoid the interface here? There is already an "interface", which is LongValues getGlobalOrds() and so on. Otherwise, let's remove that.

Can we consider just keeping the current one as-is and trying out the updatable one in sandbox or similar, so it can be iterated on?

The reason I ask: it's critical to keep the complexity of this thing low. It is used by IndexWriter for merging.

> Updatable OrdinalMap 
> ---------------------
>
>                 Key: LUCENE-6496
>                 URL: https://issues.apache.org/jira/browse/LUCENE-6496
>             Project: Lucene - Core
>          Issue Type: Bug
>            Reporter: Martijn van Groningen
>            Priority: Minor
>         Attachments: LUCENE-6496.patch
>
>
> The MultiDocValues.OrdinalMap that we have today requires a rebuild on each reopen. Once the OrdinalMap has been built, lookups are fast and the logic is simple. Many times rebuilding the OrdinalMap isn't even an issue, because for low to medium cardinality fields the rebuild doesn't take that much time. The time required to build the OrdinalMap depends on the number of unique terms in a field.
> For high cardinality fields (let's say >= 1M terms), rebuilding the OrdinalMap can take some time to complete. This can then impact the NRT aspect of many applications (facets may rely on ordinal maps being rebuilt before a new search can happen after the reopen).
> I'd like to explore a different OrdinalMap implementation that doesn't need to be rebuilt on each reopen. There are simple improvements that can be made:
> * Let's say docs have only been marked as deleted; then we can basically reuse the OrdinalMap that has already been built.
> * If no new terms have been introduced, we can just add segment ordinal to global ordinal lookups to the OrdinalMap that has already been built.
> I think a complete OrdinalMap rebuild is inevitable, but it would be great if we could rebuild on a flush / merge instead of on each reopen.
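
To make the two cases in the description concrete, here is a minimal sketch in plain Java. It is a toy model only: ToyOrdinalMap and all of its methods are hypothetical names invented for illustration. It is neither Lucene's MultiDocValues.OrdinalMap nor the attached patch, and it uses sorted String arrays where Lucene would use terms from doc values. It shows a full build whose cost depends on the number of unique terms, and an incremental path that only adds a segment ordinal to global ordinal lookup when the new segment introduces no new terms.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.TreeSet;

// Toy model only (hypothetical): not Lucene's OrdinalMap and not the patch.
final class ToyOrdinalMap {
    // Sorted, unique terms across all segments; the index is the global ordinal.
    private final String[] globalTerms;
    // One lookup table per segment: segment ordinal -> global ordinal.
    private final List<long[]> segmentToGlobal = new ArrayList<>();

    private ToyOrdinalMap(String[] globalTerms) {
        this.globalTerms = globalTerms;
    }

    /** Full rebuild. Each segment's terms must be sorted and unique. */
    static ToyOrdinalMap build(List<String[]> segments) {
        TreeSet<String> all = new TreeSet<>();
        for (String[] terms : segments) {
            all.addAll(Arrays.asList(terms));
        }
        ToyOrdinalMap map = new ToyOrdinalMap(all.toArray(new String[0]));
        for (String[] terms : segments) {
            // Cannot fail here: every term was just added to the global list.
            map.tryAddSegment(terms);
        }
        return map;
    }

    /**
     * Incremental path: registers a new segment without touching the global
     * term list. Returns false (and changes nothing) if the segment contains
     * a term the map does not know, in which case the caller must rebuild.
     */
    boolean tryAddSegment(String[] sortedSegmentTerms) {
        long[] lookup = new long[sortedSegmentTerms.length];
        for (int ord = 0; ord < lookup.length; ord++) {
            int global = Arrays.binarySearch(globalTerms, sortedSegmentTerms[ord]);
            if (global < 0) {
                return false; // new term: global ordinals would shift
            }
            lookup[ord] = global;
        }
        segmentToGlobal.add(lookup);
        return true;
    }

    long globalOrd(int segmentIndex, int segmentOrd) {
        return segmentToGlobal.get(segmentIndex)[segmentOrd];
    }

    int segmentCount() {
        return segmentToGlobal.size();
    }

    long valueCount() {
        return globalTerms.length;
    }
}
```

In this model the deletes-only case needs no code at all: marking documents as deleted changes neither the segment term lists nor the global ordinals, so the existing map can be reused as-is. A caller would try tryAddSegment for each new segment after a reopen and fall back to build only when it returns false.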


