Posted to java-user@lucene.apache.org by Paul Hill <pa...@metajure.com> on 2012/06/08 19:32:52 UTC

IndexSearcher.search(query, filter, collector) considered less efficient

I noticed today that my code calls
IndexSearcher.search(Query query, Filter filter, Collector collector)
but I also noticed that the docs say:

"Applications should only use this if they need all of the matching documents. The high-level search API (Searcher.search(Query, Filter, int)
) is usually more efficient, as it skips non-high-scoring hits."
   http://lucene.apache.org/core/old_versioned_docs/versions/3_5_0/api/core/org/apache/lucene/search/IndexSearcher.html#searchAfter%28org.apache.lucene.search.ScoreDoc,%20org.apache.lucene.search.Query,%20int%29
Which makes complete sense since I didn't provide it with any count limit.
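In other words, the "more efficient" high-level call would presumably look something like this (the 100 is just an example limit; searcher, userQuery and securityFilter are the same objects as in my call below):

            // count-limited form of the search; 100 is an arbitrary example limit
            TopDocs topDocs = searcher.search(userQuery, securityFilter, 100);
            for (ScoreDoc scoreDoc : topDocs.scoreDocs) {
                // ... scoreDoc.doc is one of the top-scoring hits ...
            }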
My original, but apparently inefficient call is:
            searcher.search(userQuery, securityFilter, dedupingCollector);
The userQuery is really an enhanced query built from what the user entered, not literally the user's query.
The dedupingCollector uses a FieldCache (FieldCache.DEFAULT.getStrings(reader, deDupField)) to work out which documents to collect and which to reject, saving a list of the first occurrences of documents.
I don't think I can use the contrib DuplicateFilter, because my duplicates are not guaranteed to be in the same index segment.
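Roughly, the deduping collector looks something like this (a simplified sketch against the 3.x Collector API; the class name and the way first occurrences are stored are just placeholders):

    import java.io.IOException;
    import java.util.ArrayList;
    import java.util.HashSet;
    import java.util.List;
    import java.util.Set;

    import org.apache.lucene.index.IndexReader;
    import org.apache.lucene.search.Collector;
    import org.apache.lucene.search.FieldCache;
    import org.apache.lucene.search.Scorer;

    // Simplified sketch (Lucene 3.x Collector API); names are placeholders.
    public class DedupingCollector extends Collector {
        private final String deDupField;
        private final Set<String> seenKeys = new HashSet<String>();
        private final List<Integer> firstOccurrences = new ArrayList<Integer>();
        private String[] keys;   // per-segment values from the FieldCache
        private int docBase;

        public DedupingCollector(String deDupField) {
            this.deDupField = deDupField;
        }

        @Override
        public void setScorer(Scorer scorer) {
            // scores are not needed for deduping
        }

        @Override
        public void setNextReader(IndexReader reader, int docBase) throws IOException {
            this.keys = FieldCache.DEFAULT.getStrings(reader, deDupField);
            this.docBase = docBase;
        }

        @Override
        public void collect(int doc) {
            String key = keys[doc];
            // seenKeys spans segments, so duplicates in different segments are still caught
            if (key == null || seenKeys.add(key)) {
                firstOccurrences.add(docBase + doc);
            }
        }

        @Override
        public boolean acceptsDocsOutOfOrder() {
            return false; // collect in doc order so "first occurrence" is well defined
        }

        public List<Integer> getFirstOccurrences() {
            return firstOccurrences;
        }
    }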

So am I being misled by my interpretation of the JavaDoc comment, even though I really DON'T "need all of the matching documents", or is there some way to work a count limit and filtering into the whole chain of filters and collectors?

-Paul

Re: IndexSearcher.search(query, filter, collector) considered less efficient

Posted by Michael McCandless <lu...@mikemccandless.com>.
I think that javadoc is stale; my guess is it was written back when
the collect method took a score, but we changed that so the collector
calls .score() if it really needs the score... so I can't think of why
that search method is inherently inefficient.
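In other words, with the current API a collector only pays for scoring when it asks for it, roughly like this (just a sketch, not code from the codebase):

    import java.io.IOException;

    import org.apache.lucene.index.IndexReader;
    import org.apache.lucene.search.Collector;
    import org.apache.lucene.search.Scorer;

    // Sketch: the score is pulled lazily via Scorer.score() only when needed.
    public class MyCollector extends Collector {
        private Scorer scorer;

        @Override
        public void setScorer(Scorer scorer) {
            this.scorer = scorer;
        }

        @Override
        public void collect(int doc) throws IOException {
            // a collector that never calls scorer.score() skips the scoring work
            float score = scorer.score();
            // ... use doc and score ...
        }

        @Override
        public void setNextReader(IndexReader reader, int docBase) {
        }

        @Override
        public boolean acceptsDocsOutOfOrder() {
            return true;
        }
    }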

I'll fix the javadocs (remove that warning).

Mike McCandless

http://blog.mikemccandless.com

