Posted to dev@lucene.apache.org by "Michael McCandless (JIRA)" <ji...@apache.org> on 2016/05/06 21:14:12 UTC

[jira] [Commented] (LUCENE-6766) Make index sorting a first-class citizen

    [ https://issues.apache.org/jira/browse/LUCENE-6766?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15274734#comment-15274734 ] 

Michael McCandless commented on LUCENE-6766:
--------------------------------------------

I've been slowly iterating here and pushing changes to https://github.com/mikemccand/lucene-solr/tree/index_sort

There are tons of nocommits, but tests do pass, including index sorting tests (though they still need improving).

Some details:

  - I added a new {{DocIDMerger}} helper class, and the default merge impls use it to abstract away how to iterate the documents from the N sub-readers, whether they are simply concatenated or merge-sorted.  I think this should be quite a bit more efficient than what {{SortingMergePolicy}} does today, but it does add some code complexity, which I think is OK/contained.

  - {{SlowCompositeReader}} is no longer used for index sorting

  - Points now work fine w/ index sorting

  - CheckIndex verifies the claimed per-segment index sort is in fact true

  - IW gets angry if you open an existing index with a different index sort

  - Only simple sort types are allowed; no CUSTOM, SCORE or REWRITEABLE

  - I made a new {{Lucene62Codec}}, with a new {{Lucene62SegmentInfoFormat}} that supports index sorting.

  - I added {{LeafReader.getIndexSort}} so apps can check if a given segment was sorted

  - I disable bulk merge optimizations when index sorting is present
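
To make the first item above concrete, here is a minimal sketch of the two iteration modes (concatenated vs. merge-sorted).  It is not the actual {{DocIDMerger}} code on the branch: the class {{DocMergeSketch}}, the {{Sub}} holder and the use of a single long sort key per document are all assumptions made for illustration only.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.PriorityQueue;

/** Hypothetical illustration only; this is NOT Lucene's DocIDMerger. */
public class DocMergeSketch {

    /** One sub-reader. sortKeys[docID] is that doc's sort value; each segment is assumed already sorted. */
    static final class Sub {
        final int ord;          // position of this sub-reader among the N inputs
        final long[] sortKeys;  // assumed: one long sort key per document
        int next;               // next docID to hand out from this sub-reader

        Sub(int ord, long[] sortKeys) {
            this.ord = ord;
            this.sortKeys = sortKeys;
        }

        boolean exhausted() {
            return next >= sortKeys.length;
        }

        long currentKey() {
            return sortKeys[next];
        }
    }

    /** Returns {subOrd, docID} pairs in the order the merged segment would visit them. */
    static List<int[]> merge(long[][] segments, boolean indexSorted) {
        List<int[]> out = new ArrayList<>();
        if (!indexSorted) {
            // No index sort: simply concatenate the sub-readers in order.
            for (int ord = 0; ord < segments.length; ord++) {
                for (int doc = 0; doc < segments[ord].length; doc++) {
                    out.add(new int[] {ord, doc});
                }
            }
            return out;
        }
        // Index sort: N-way merge sort, breaking ties by sub-reader order.
        PriorityQueue<Sub> queue = new PriorityQueue<>((a, b) -> {
            int cmp = Long.compare(a.currentKey(), b.currentKey());
            return cmp != 0 ? cmp : Integer.compare(a.ord, b.ord);
        });
        for (int ord = 0; ord < segments.length; ord++) {
            Sub sub = new Sub(ord, segments[ord]);
            if (!sub.exhausted()) {
                queue.add(sub);
            }
        }
        while (!queue.isEmpty()) {
            Sub top = queue.poll();
            out.add(new int[] {top.ord, top.next});
            top.next++;
            if (!top.exhausted()) {
                queue.add(top); // re-insert with its new current key
            }
        }
        return out;
    }
}
```

The point is that callers only see one "next document" stream, and whether that stream is concatenated or merge-sorted is hidden behind the helper.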

IW flush still does not sort, so at merge time we wrap such segments with {{SortingLeafReader}}.  It is quite ugly that an index can have some segments sorted and some not.  E.g. it means IW's check for whether the new index sort matches the existing one is only best effort.  But this is already an enormous change, so I think we really have to look into "sort on flush" (which is hairy by itself) later, separately.
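
As a rough picture of what presenting an unsorted, flushed segment in sorted order involves, the sketch below computes a doc ID permutation from per-document sort keys.  It is only an illustration under stated assumptions: {{SortedViewSketch}} and {{newToOld}} are hypothetical names, a single long key per document stands in for a real sort, and this is not how {{SortingLeafReader}} is actually implemented.

```java
import java.util.Arrays;

/** Hypothetical illustration only; NOT the SortingLeafReader implementation. */
public class SortedViewSketch {

    /**
     * Given one (assumed) long sort key per document of an unsorted segment,
     * returns newToOld, where newToOld[newDocID] is the original docID.
     * Documents with equal keys keep their original relative order.
     */
    static int[] newToOld(long[] keys) {
        Integer[] docs = new Integer[keys.length];
        for (int i = 0; i < docs.length; i++) {
            docs[i] = i;
        }
        // Arrays.sort on an object array is a stable sort, so ties preserve docID order.
        Arrays.sort(docs, (a, b) -> Long.compare(keys[a], keys[b]));
        int[] result = new int[docs.length];
        for (int i = 0; i < result.length; i++) {
            result[i] = docs[i];
        }
        return result;
    }
}
```

For example, keys {30, 10, 20, 10} give the mapping [1, 3, 2, 0]: a reader wrapped this way would return original doc 1 first and original doc 0 last.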


> Make index sorting a first-class citizen
> ----------------------------------------
>
>                 Key: LUCENE-6766
>                 URL: https://issues.apache.org/jira/browse/LUCENE-6766
>             Project: Lucene - Core
>          Issue Type: Improvement
>            Reporter: Adrien Grand
>            Priority: Minor
>         Attachments: LUCENE-6766.patch
>
>
> Today index sorting is a very expert feature. You need to use a custom merge policy, custom collectors, etc. I would like to explore making it a first-class citizen so that:
>  - the sort order could be configured on IndexWriterConfig
>  - segments would record the sort order that was used to write them
>  - IndexSearcher could automatically early terminate when computing top docs on a sort order that is a prefix of the sort order of a segment (and if the user is not interested in totalHits).
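
The "prefix" condition in the last item of the description can be sketched as follows.  This is only an illustration under stated assumptions: {{SortPrefixSketch}} and {{canEarlyTerminate}} are hypothetical names, and each sort field is reduced to a plain string such as "timestamp:desc" rather than a real Lucene sort field.

```java
import java.util.List;

/** Hypothetical illustration only; not an existing Lucene API. */
public class SortPrefixSketch {

    /**
     * Returns true if the requested sort is a non-empty prefix of the segment's
     * index sort, i.e. the segment's doc ID order already agrees with the
     * requested order, so collection could stop after the top N docs.
     * Each entry is an assumed "field:direction" string.
     */
    static boolean canEarlyTerminate(List<String> querySort, List<String> segmentSort) {
        if (querySort.isEmpty() || querySort.size() > segmentSort.size()) {
            return false;
        }
        for (int i = 0; i < querySort.size(); i++) {
            if (!querySort.get(i).equals(segmentSort.get(i))) {
                return false;
            }
        }
        return true;
    }
}
```

Under these assumptions, a segment sorted by ["timestamp:desc", "id:asc"] could early terminate a query sorted by ["timestamp:desc"], but not one sorted by ["id:asc"].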


