Posted to user@lucenenet.apache.org by Richard Wilde <ri...@wildesoft.net> on 2012/11/28 22:55:02 UTC

Get number of documents

Hi, I have inherited the following code, which searches for
"Extract:Simon AND Type:Contact". I want to:

1) find the total number of documents that match the query, and

2) return only the first X IDs (X being maxResults).

However, this code passes 10,000 to the TopScoreDocCollector.Create method
and then loops over the hits, breaking out of the loop once it reaches
maxResults.

I am sure there is a better approach, but Google has not helped me so far.
Can anyone point me in the right direction? I am using v2.9.4.2; would
v3.0.3 serve me better here?

The code I have is:

var indexSearch = new IndexSearcher(indexReader);
var queryParser = new QueryParser(Version.LUCENE_29, "Extract", Analyzer);

var special = "Simon AND Type:Contact";

var collector = TopScoreDocCollector.Create(10000, true);
indexSearch.Search(queryParser.Parse(special), collector);
var hits = collector.TopDocs().ScoreDocs;

for (var i = 0; i < hits.Length; i++)
{
    if (i >= maxResults) break;
    var document = indexSearch.Doc(hits[i].Doc);
    luceneDocuments.Add(document);
}

Thanks

Rippo



Re: Get number of documents

Posted by Simon Svensson <si...@devhost.se>.
Hi.

 1. TopScoreDocCollector.TopDocs
    <http://lucene.apache.org/core/old_versioned_docs/versions/2_9_0/api/all/org/apache/lucene/search/TopDocsCollector.html#topDocs%28%29>
    returns a TopDocs
    <http://lucene.apache.org/core/old_versioned_docs/versions/2_9_0/api/all/org/apache/lucene/search/TopDocs.html>
    instance, which has two properties: |scoreDocs| and |totalHits|.
    You're looking for the latter.

 2. The first parameter to TopScoreDocCollector.Create is the |numHits|
    parameter. Pass |maxResults| instead of 10000 if you only want the
    first |maxResults| results. |totalHits| still counts every matching
    document, however small |numHits| is.

// Simon
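Combining both suggestions, a minimal sketch of the revised code might look like the following. It assumes Lucene.Net 3.0.3, where TopDocs.TotalHits, TopDocs.ScoreDocs and ScoreDoc.Doc are properties (the question's code already uses ScoreDocs and Doc); older 2.9.x builds may expose them as fields (totalHits, scoreDocs, doc) instead. The ContactSearch class and its Search method are hypothetical wrappers; indexReader, the analyzer, maxResults and luceneDocuments correspond to the names in the question's code.

```csharp
using System.Collections.Generic;
using Lucene.Net.Analysis;
using Lucene.Net.Documents;
using Lucene.Net.Index;
using Lucene.Net.QueryParsers;
using Lucene.Net.Search;
using Version = Lucene.Net.Util.Version;

// Hypothetical wrapper around the question's code (assumes Lucene.Net 3.0.3 member names).
public static class ContactSearch
{
    // Loads at most maxResults documents into luceneDocuments and returns
    // the total number of documents matching queryText.
    // Assumes maxResults >= 1.
    public static int Search(IndexReader indexReader, Analyzer analyzer, string queryText,
                             int maxResults, List<Document> luceneDocuments)
    {
        var indexSearch = new IndexSearcher(indexReader);
        var queryParser = new QueryParser(Version.LUCENE_29, "Extract", analyzer);

        // numHits = maxResults: the collector keeps only the best maxResults hits...
        var collector = TopScoreDocCollector.Create(maxResults, true);
        indexSearch.Search(queryParser.Parse(queryText), collector);

        // ...while TotalHits still counts every document that matched the query.
        var topDocs = collector.TopDocs();
        foreach (ScoreDoc hit in topDocs.ScoreDocs)
        {
            luceneDocuments.Add(indexSearch.Doc(hit.Doc));
        }
        return topDocs.TotalHits;
    }
}
```

Because numHits now equals maxResults, the loop needs no break, and the collector's internal queue is sized for maxResults entries rather than 10,000.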
