Posted to java-user@lucene.apache.org by Gary Mangum <ga...@gmail.com> on 2006/01/17 06:20:02 UTC

How do I get a count of all search results inside of my content?

I am trying to find out a quick way to get a complete count of all search
results found in all of my Documents.

Let me back up...

I have split the content that I am searching into many Documents and then
indexed this content.  Each Document represents about one "paragraph" of
data.

Now I search all of my Documents for a word or phrase.

If I understand correctly, the Hits that are returned tell me which
Documents contain the information that I am searching for.  And Hits.length()
would tell me how many documents contain my information.

I would like to know how many total results were found for my search.  In
other words, if a Document contains the word or phrase more than once, I
would like to know this information so that I can return a "true" count of
search results that were found across all of my Documents.  It seems that
Lucene must already know this information since it searched the Document
already when it scored and added it to my Hits.

What is the best way to get this information quickly?

Thanks,


Gary

Re: How do I get a count of all search results inside of my content?

Posted by Chris Hostetter <ho...@fucit.org>.
1) There's no need to send the same message twice just because you didn't
get a rapid response to the first one ... in most parts of the US this has
been a three-day weekend, so it's not that surprising that no one has written
a reply since you first asked this question Friday night.

2) You need to be careful about your terminology...

: I would like to know how many total results were found for my search.  In
: other words, if a Document contains the word or phrase more than once, I
: would like to know this information so that I can return a "true" count of
: search results that were found across all of my Documents.  It seems that

The total number of results of your search is Hits.length().
One result is one matching document.  What you are asking for is information
about the frequency of a word or phrase.

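The TermEnum class makes it easy to find out the document frequency of a
term (how many documents contain it) in your entire index.  If you need the
number of occurrences inside each document, TermDocs reports that via freq().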
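
To make the terminology concrete, here is a small plain-Java illustration of
the difference between "matching documents" and "total occurrences" of a term.
It deliberately does not use Lucene: the class name, the method names, and the
naive tokenizer are all made up for this sketch, and the "index" is just an
in-memory list of paragraphs.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

// Hypothetical illustration only; not Lucene code.
final class OccurrenceCountSketch {

    // Number of documents that contain the term at least once.
    // This is the kind of number Hits.length() gives you.
    static int matchingDocuments(List<String> docs, String term) {
        int count = 0;
        for (String doc : docs) {
            if (occurrences(doc, term) > 0) {
                count++;
            }
        }
        return count;
    }

    // Total number of times the term appears across all documents.
    // This is the "true" count the original question asks about.
    static int totalOccurrences(List<String> docs, String term) {
        int total = 0;
        for (String doc : docs) {
            total += occurrences(doc, term);
        }
        return total;
    }

    // Naive tokenizer (an assumption of this sketch): lowercase the text and
    // split on anything that is not an ASCII letter or digit.
    // The term is expected to be lowercase already.
    static int occurrences(String doc, String term) {
        int n = 0;
        for (String token : doc.toLowerCase(Locale.ROOT).split("[^a-z0-9]+")) {
            if (token.equals(term)) {
                n++;
            }
        }
        return n;
    }

    public static void main(String[] args) {
        List<String> docs = Arrays.asList(
            "Lucene indexes text. Lucene searches text.",
            "Each paragraph is one document.",
            "Search with Lucene.");
        System.out.println("matching documents: " + matchingDocuments(docs, "lucene"));
        System.out.println("total occurrences:  " + totalOccurrences(docs, "lucene"));
    }
}
```

With the three sample paragraphs above, two documents match "lucene" but the
word occurs three times in total, which is exactly the gap between the two
numbers being discussed in this thread.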

The frequency of a phrase is more complicated.   I would suggest you start
by looking at the documentation on Similarity and the way scores are
calculated.  I believe that it is possible to write an implementation of
Similarity that will result in the raw score of a PhraseQuery on
any document being the number of times that phrase appears in the
document.  You will then need to use a HitCollector to sum the raw scores,
so that they don't get normalized for you the way the scores in Hits are.
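
The shape of that idea (a raw per-document score equal to the phrase count,
summed by a collector) can be sketched in plain Java.  This is only a rough
model under stated assumptions: RawScoreCollector, SummingCollector,
phraseFrequency and search are hypothetical names invented here, the matching
is naive case-insensitive substring matching rather than Lucene's analyzed
phrase matching, and no custom Similarity is actually implemented.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

// Hypothetical illustration only; not Lucene code.
final class PhraseCountSketch {

    // Stand-in for a collector callback: it receives each matching
    // document's id together with its raw, un-normalized score.
    interface RawScoreCollector {
        void collect(int doc, float score);
    }

    // Sums the raw scores. If every raw score is a phrase count,
    // the sum is the total number of phrase occurrences.
    static final class SummingCollector implements RawScoreCollector {
        int matchingDocs = 0;
        float total = 0f;

        @Override
        public void collect(int doc, float score) {
            matchingDocs++;
            total += score;
        }
    }

    // Counts non-overlapping, case-insensitive substring occurrences of the
    // phrase. This plays the role of the "raw score" in this sketch.
    static int phraseFrequency(String doc, String phrase) {
        String d = doc.toLowerCase(Locale.ROOT);
        String p = phrase.toLowerCase(Locale.ROOT);
        if (p.isEmpty()) {
            return 0; // avoid looping forever on an empty phrase
        }
        int n = 0;
        for (int i = d.indexOf(p); i >= 0; i = d.indexOf(p, i + p.length())) {
            n++;
        }
        return n;
    }

    // Toy "search": reports every document with at least one occurrence,
    // passing the occurrence count as the raw score.
    static void search(List<String> docs, String phrase, RawScoreCollector collector) {
        for (int id = 0; id < docs.size(); id++) {
            int freq = phraseFrequency(docs.get(id), phrase);
            if (freq > 0) {
                collector.collect(id, freq);
            }
        }
    }

    public static void main(String[] args) {
        List<String> docs = Arrays.asList(
            "the quick fox and the quick fox",
            "a slow fox",
            "Quick fox!");
        SummingCollector sum = new SummingCollector();
        search(docs, "quick fox", sum);
        System.out.println("matching documents: " + sum.matchingDocs);
        System.out.println("total occurrences:  " + sum.total);
    }
}
```

The point of the design is that the collector sees each document's score
before any normalization, so the sum only means "total occurrences" if the
scoring really does produce a plain count per document.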




-Hoss

