Posted to java-user@lucene.apache.org by Benson Margulies <bi...@gmail.com> on 2012/02/19 15:21:48 UTC

Counting all the hits with parallel searching

If I have a lot of segments and an executor service in my searcher,
the following runs out of memory instantly, building giant heaps. Is
there another way to express this? Should I file a JIRA asking that
the parallel code handle this more gracefully?

int longestMentionFreq = searcher.search(longestMentionQuery, filter,
Integer.MAX_VALUE).totalHits + 1;



Re: Counting all the hits with parallel searching

Posted by Robert Muir <rc...@gmail.com>.
On Sun, Feb 19, 2012 at 10:23 AM, Benson Margulies
<bi...@gmail.com> wrote:
> Thanks, that's what I needed.
>

Thanks for bringing this up. I think it's a common issue, so I created
https://issues.apache.org/jira/browse/LUCENE-3799 to hopefully improve
the documentation around this.

-- 
lucidimagination.com



Re: Counting all the hits with parallel searching

Posted by Benson Margulies <bi...@gmail.com>.
Thanks, that's what I needed.

On Feb 19, 2012, at 9:51 AM, Robert Muir <rc...@gmail.com> wrote:

> On Sun, Feb 19, 2012 at 9:21 AM, Benson Margulies <bi...@gmail.com> wrote:
>> If I have a lot of segments and an executor service in my searcher,
>> the following runs out of memory instantly, building giant heaps. Is
>> there another way to express this? Should I file a JIRA asking that
>> the parallel code handle this more gracefully?
>>
>> int longestMentionFreq = searcher.search(longestMentionQuery, filter,
>> Integer.MAX_VALUE).totalHits + 1;
>>
>
> The _n_ you pass there is the actual number of results that you need
> to display to the user, in top-N order, so in most cases this should
> be something like 20.
>
> This is because the search builds a priority queue of size _n_ to
> return results in sorted order.
>
> Don't pass huge numbers here: if you are not actually returning pages
> of results to the user, but just counting hits, pass a
> TotalHitCountCollector instead.
>
> --
> lucidimagination.com
>



Re: Counting all the hits with parallel searching

Posted by Robert Muir <rc...@gmail.com>.
On Sun, Feb 19, 2012 at 9:21 AM, Benson Margulies <bi...@gmail.com> wrote:
> If I have a lot of segments and an executor service in my searcher,
> the following runs out of memory instantly, building giant heaps. Is
> there another way to express this? Should I file a JIRA asking that
> the parallel code handle this more gracefully?
>
> int longestMentionFreq = searcher.search(longestMentionQuery, filter,
> Integer.MAX_VALUE).totalHits + 1;
>

The _n_ you pass there is the actual number of results that you need
to display to the user, in top-N order, so in most cases this should
be something like 20.

This is because the search builds a priority queue of size _n_ to
return results in sorted order.

Don't pass huge numbers here: if you are not actually returning pages
of results to the user, but just counting hits, pass a
TotalHitCountCollector instead.
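
For example, a minimal sketch along those lines (reusing the names from
your snippet; TotalHitCountCollector lives in org.apache.lucene.search):

  // TotalHitCountCollector just increments a counter per matching doc,
  // so no priority queue is allocated no matter how many hits there are.
  TotalHitCountCollector counter = new TotalHitCountCollector();
  searcher.search(longestMentionQuery, filter, counter);
  int longestMentionFreq = counter.getTotalHits() + 1;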

-- 
lucidimagination.com



RE: Counting all the hits with parallel searching

Posted by Uwe Schindler <uw...@thetaphi.de>.
By passing Integer.MAX_VALUE you are asking Lucene to allocate a priority queue of that size for collecting results, which is what causes the OOM. With Lucene, if you are using TopDocs, the idea is to fetch only a limited number of top-ranking documents to display as search results. The user is not interested in the two-millionth result page, so pass a small number of top hits.

To simply count all hits, as you seem to want to do, there is a separate collector available: http://goo.gl/XsPVR
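
Roughly, a sketch of both patterns (query and filter here stand in for
your own objects):

  // For display: collect only the page you will actually show, e.g. 20 hits.
  TopDocs page = searcher.search(query, filter, 20);

  // For counting only: TotalHitCountCollector never builds a priority queue.
  TotalHitCountCollector counter = new TotalHitCountCollector();
  searcher.search(query, filter, counter);
  int totalHits = counter.getTotalHits();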

-----
Uwe Schindler
H.-H.-Meier-Allee 63, D-28213 Bremen
http://www.thetaphi.de
eMail: uwe@thetaphi.de

> -----Original Message-----
> From: Benson Margulies [mailto:bimargulies@gmail.com]
> Sent: Sunday, February 19, 2012 3:22 PM
> To: java-user@lucene.apache.org
> Subject: Counting all the hits with parallel searching
> 
> If I have a lot of segments and an executor service in my searcher, the
> following runs out of memory instantly, building giant heaps. Is there another
> way to express this? Should I file a JIRA asking that the parallel code handle
> this more gracefully?
> 
> int longestMentionFreq = searcher.search(longestMentionQuery, filter,
> Integer.MAX_VALUE).totalHits + 1;
> 

