Posted to dev@lucene.apache.org by "AMIRAULT Martin (JIRA)" <ji...@apache.org> on 2016/10/07 03:03:21 UTC
[jira] [Closed] (LUCENE-7482) Faster sorted index search for reverse order search
[ https://issues.apache.org/jira/browse/LUCENE-7482?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
AMIRAULT Martin closed LUCENE-7482.
-----------------------------------
Resolution: Invalid
Sorry, just realized that actually my implementation assumed that all documents match, which most of the time is not the case.
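A minimal sketch of the problem, in plain Java with no Lucene dependency. The class and method names (CutoffSketch, collectWithCutoff, lastKMatches) are hypothetical stand-ins, and the boolean array is an assumed model of which docIds a query matches. It compares the docId cutoff from the proposal with the last k matching docIds of a segment, which is what the search actually needs:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class CutoffSketch {
    /** Models the proposed filter: keep matching docIds >= maxDoc - k. */
    static List<Integer> collectWithCutoff(int maxDoc, boolean[] matches, int k) {
        int collectFrom = Math.max(0, maxDoc - k);
        List<Integer> collected = new ArrayList<>();
        for (int doc = 0; doc < maxDoc; doc++) {
            if (matches[doc] && doc >= collectFrom) {
                collected.add(doc);
            }
        }
        return collected;
    }

    /** Reference result: the last k matching docIds of the segment. */
    static List<Integer> lastKMatches(int maxDoc, boolean[] matches, int k) {
        List<Integer> all = new ArrayList<>();
        for (int doc = 0; doc < maxDoc; doc++) {
            if (matches[doc]) {
                all.add(doc);
            }
        }
        int from = Math.max(0, all.size() - k);
        return new ArrayList<>(all.subList(from, all.size()));
    }

    public static void main(String[] args) {
        int maxDoc = 10;
        int k = 3;

        // Case 1: every document matches, so the cutoff gives the right answer.
        boolean[] allMatch = new boolean[maxDoc];
        Arrays.fill(allMatch, true);
        System.out.println(collectWithCutoff(maxDoc, allMatch, k)); // [7, 8, 9]

        // Case 2: only even docIds match, so the cutoff drops valid hits.
        boolean[] evenOnly = new boolean[maxDoc];
        for (int doc = 0; doc < maxDoc; doc += 2) {
            evenOnly[doc] = true;
        }
        System.out.println(collectWithCutoff(maxDoc, evenOnly, k)); // [8]
        System.out.println(lastKMatches(maxDoc, evenOnly, k));      // [4, 6, 8]
    }
}
```

With a selective query the cutoff returns fewer than k hits even though k matching documents exist, which is why the issue was closed as invalid.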
> Faster sorted index search for reverse order search
> ---------------------------------------------------
>
> Key: LUCENE-7482
> URL: https://issues.apache.org/jira/browse/LUCENE-7482
> Project: Lucene - Core
> Issue Type: New Feature
> Reporter: AMIRAULT Martin
> Priority: Minor
>
> We currently use Lucene at my company for our main product.
> Our search functionality is quite basic, and the results are always sorted on a predefined field. The user can only choose the sort order (Asc/Desc).
> I am currently investigating the index sort feature together with EarlyTerminatingSortingCollector.
> It is a shame that searching a sorted index in reverse order is not optimized at all, and I was wondering whether it could be made faster by creating a special "ReverseSortingCollector" for this purpose.
> I am aware that a postings list is designed to always be iterated in the same order, so this is not about early-terminating the search but about filtering out unneeded documents more efficiently.
> If a segment is sorted in the reverse of the requested order, we can easily work out the docId from which documents should be collected.
> Here is a quick code sample:
> {code:title=ReverseSortingCollector.java|borderStyle=solid}
> import java.io.IOException;
>
> import org.apache.lucene.index.LeafReader;
> import org.apache.lucene.index.LeafReaderContext;
> import org.apache.lucene.search.FilterCollector;
> import org.apache.lucene.search.FilterLeafCollector;
> import org.apache.lucene.search.LeafCollector;
> import org.apache.lucene.search.Sort;
>
> public class ReverseSortingCollector extends FilterCollector {
>   /** Sort used to sort the search results */
>   protected final Sort sort;
>   /** Number of documents to collect in each segment */
>   protected final int numDocsToCollect;
>
>   [...]
>
>   @Override
>   public LeafCollector getLeafCollector(LeafReaderContext context) throws IOException {
>     LeafReader reader = context.reader();
>     Sort segmentSort = reader.getIndexSort();
>     if (isReverseOrder(sort, segmentSort)) {
>       // The segment is sorted in the reverse of the search sort, so we can
>       // work out the docId from which we should collect.
>       // Note: this assumes docIds span [0, numDocs()); with deleted
>       // documents the upper bound is maxDoc().
>       final long collectFrom = reader.numDocs() - numDocsToCollect;
>
>       return new FilterLeafCollector(in.getLeafCollector(context)) {
>         @Override
>         public void collect(int doc) throws IOException {
>           if (doc >= collectFrom) { // only delegate for the tail of the segment
>             super.collect(doc);
>           }
>         }
>       };
>     } else {
>       return in.getLeafCollector(context);
>     }
>   }
> }
> {code}
> This is especially efficient when used along with TopFieldCollector, as many doc-value lookups no longer take place.
> In my experiment it reduced search time by 90%.
> However, I am not sure it is correct, as my knowledge of Lucene is still quite limited.
> In particular, is it correct to assume that the docIds of a LeafReader always span 0 to LeafReader.numDocs()?
> Note: paging is not supported. It could eventually be implemented by providing a way to look up the docId to start from, based on the last document collected (e.g. for a LongPoint field, querying the docId closest to the previously returned value...).
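On the docId question in the description: in Lucene, a segment's docIds run from 0 to maxDoc() - 1, and numDocs() excludes deleted documents, so the two differ as soon as a segment has deletions. The sketch below models this in plain Java without any Lucene API; SegmentIdsSketch, its fields, and collectFrom are hypothetical names used only for illustration:

```java
import java.util.BitSet;

class SegmentIdsSketch {
    /** One past the largest docId in the segment (hypothetical model of maxDoc()). */
    final int maxDoc;
    /** Bit i is set when docId i has not been deleted. */
    final BitSet liveDocs;

    SegmentIdsSketch(int maxDoc, int... deletedDocs) {
        this.maxDoc = maxDoc;
        this.liveDocs = new BitSet(maxDoc);
        this.liveDocs.set(0, maxDoc);
        for (int doc : deletedDocs) {
            this.liveDocs.clear(doc);
        }
    }

    /** Number of live documents (hypothetical model of numDocs()). */
    int numDocs() {
        return liveDocs.cardinality();
    }

    /** First docId to collect when keeping the last k docIds below upperBound. */
    static int collectFrom(int upperBound, int k) {
        return Math.max(0, upperBound - k);
    }

    public static void main(String[] args) {
        // Ten docIds (0..9), four of them deleted.
        SegmentIdsSketch segment = new SegmentIdsSketch(10, 1, 2, 3, 4);
        int k = 3;
        System.out.println("numDocs = " + segment.numDocs());
        System.out.println("cutoff from numDocs = " + collectFrom(segment.numDocs(), k));
        System.out.println("cutoff from maxDoc  = " + collectFrom(segment.maxDoc, k));
    }
}
```

In this model the cutoff derived from numDocs lands at docId 3 instead of 7, so the filter lets through more documents than intended. This only addresses the docId range; it does not fix the all-documents-match assumption that led to the issue being closed.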