Posted to user@lucenenet.apache.org by Omri Suissa <om...@diffdoof.com> on 2012/12/04 17:35:58 UTC

How to get all documents that match a query?

Hi,

I want to enumerate all the documents in the index that match a specific
query (say, "cat") and perform a task in my database for each of them.

When I search, I have to pass a collector. My usual search method uses
TopScoreDocCollector, whose Create method takes a numHits argument.

I don't want to limit the number of documents (if there are 10M matching
documents, I want all 10M results), so I pass int.MaxValue, but then I get
an OutOfMemoryException.

Lucene doesn't support paging, so I can't ask for X documents at a time
starting from document Y (to get the 20th document, I first have to
collect all the documents before it).

What can I do?
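For context, the paging cost described above can be sketched like this. It is a minimal illustration assuming the Lucene.Net 3.0.x API, where `Searcher.Search(Query, int)` returns the top n hits as `TopDocs`. The `NaivePaging` class and `GetPage` method are hypothetical names. Each page re-runs the search for the top `start + pageSize` hits and discards the first `start`, so a deep page costs as much as collecting every hit before it.

```csharp
using System;
using System.Collections.Generic;
using Lucene.Net.Search;

// Hypothetical helper illustrating "paging" with Lucene.Net 3.0.x:
// there is no way to start collecting at hit Y, so each page re-runs the
// search for the top (start + pageSize) hits and drops the first 'start'.
public static class NaivePaging
{
    public static IList<ScoreDoc> GetPage(Searcher searcher, Query query, int start, int pageSize)
    {
        // Guard against negative offsets and empty or negative page sizes.
        if (start < 0) throw new ArgumentOutOfRangeException("start");
        if (pageSize <= 0) throw new ArgumentOutOfRangeException("pageSize");

        // The top-N collection behind this call tracks up to start + pageSize
        // hits, so the work grows with how deep the page is.
        TopDocs top = searcher.Search(query, checked(start + pageSize));

        var page = new List<ScoreDoc>();
        for (int i = start; i < top.ScoreDocs.Length; i++)
        {
            page.Add(top.ScoreDocs[i]);
        }
        return page;
    }
}
```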



Thanks,

Omri

Re: How to get all documents that match a query?

Posted by Omri Suissa <om...@diffdoof.com>.
Hi,
Thanks!

Omri Suissa, VP R&D
DiffDoof Ltd.
11, Galgaley Haplada Street, P.O.Box 2150
Herzlia Pituach 46120, Israel
Tel:  +972 9 7724228
Cell: +972 54 5395206
Fax:  +972 9 9512577
www.DiffDoof.com <http://www.DiffDoof.com>



On Tue, Dec 4, 2012 at 6:42 PM, Simon Svensson <si...@devhost.se> wrote:

>  Hi,
>
> You could build a custom collector that does this by reading domain ids in
> the Collect method. You won't hit an OutOfMemoryException if you avoid
> reading all hits into an array (or other storage) and instead process the
> hits "as they come".
>
> Example using DelegatingCollector <https://github.com/devhost/Corelicious/blob/master/Corelicious.Lucene/DelagatingCollector.cs>:
>
> var collector = new DelegatingCollector((reader, id) => {
>     var document = reader.Document(id);
>     // Do something with your document.
> });
> searcher.Search(query, collector);

Re: How to get all documents that match a query?

Posted by Simon Svensson <si...@devhost.se>.
Hi,

You could build a custom collector that does this by reading domain ids
in the Collect method. You won't hit an OutOfMemoryException if you
avoid reading all hits into an array (or other storage) and instead
process the hits "as they come".

Example using DelegatingCollector
<https://github.com/devhost/Corelicious/blob/master/Corelicious.Lucene/DelagatingCollector.cs>:


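// DelegatingCollector (from Corelicious, linked above) calls the lambda
// for each hit as it is collected, so no list of hits is built up.
var collector = new DelegatingCollector((reader, id) => {
    var document = reader.Document(id);
    // Do something with your document.
});
searcher.Search(query, collector);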
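Following the suggestion to read domain ids in the Collect method, here is a minimal sketch of such a custom collector. It assumes Lucene.Net 3.0.x's abstract `Collector` class (`SetScorer`, `SetNextReader`, `Collect`, `AcceptsDocsOutOfOrder`) and an index where each document stores its database key in a field named "id" (a hypothetical field name). `IdStreamingCollector` is likewise a hypothetical name, not the Corelicious `DelegatingCollector` implementation. Each id is handed to a callback as soon as the hit is collected, so memory use does not grow with the number of matches.

```csharp
using System;
using Lucene.Net.Documents;
using Lucene.Net.Index;
using Lucene.Net.Search;

// Hypothetical collector: passes the stored "id" of every matching document
// to a callback immediately instead of keeping a list of hits in memory.
public sealed class IdStreamingCollector : Collector
{
    private readonly string idField;
    private readonly Action<string> onId;
    private IndexReader currentReader;

    public IdStreamingCollector(string idField, Action<string> onId)
    {
        if (idField == null) throw new ArgumentNullException("idField");
        if (onId == null) throw new ArgumentNullException("onId");
        this.idField = idField;
        this.onId = onId;
    }

    // Scores are not needed, so the scorer is ignored.
    public override void SetScorer(Scorer scorer)
    {
    }

    // Called once per index segment. Doc ids passed to Collect are relative
    // to this segment's reader, so documents are loaded through it.
    public override void SetNextReader(IndexReader reader, int docBase)
    {
        currentReader = reader;
    }

    public override void Collect(int doc)
    {
        // Load this hit's stored fields, pass the id on, and keep nothing.
        Document stored = currentReader.Document(doc);
        onId(stored.Get(idField));
    }

    // Hits may arrive in any doc-id order; this collector does not care.
    public override bool AcceptsDocsOutOfOrder
    {
        get { return true; }
    }
}
```

Usage, assuming an existing `searcher` and `query` as in the thread: `searcher.Search(query, new IdStreamingCollector("id", id => { /* update the database row for id */ }));`. Loading stored fields for every hit is simple but can be slow over millions of hits; Lucene's field cache is a commonly used faster way to read one value per document, at the cost of holding those values in memory.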
