Posted to java-user@lucene.apache.org by Lisa Lee <co...@hanmail.net> on 2008/02/01 12:46:12 UTC

How can I get document's top n raw score?

I need to know a document's top n raw scores and the terms they belong to.

For example,

Suppose a document contains the terms {apple, banana, coconut}, and I need
the top 2 scores in that document.

The simple way is to search for every term in the document and sort the
scores, as described below.

First, search for the term 'apple' and record the score using the
IndexSearcher, Query, Hits, Document and Explanation classes.
Second, search for the term 'banana' in the same way.
Third, search for the term 'coconut' in the same way.
Last, compare these scores and pick the 2 terms with the highest scores.
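
The last step above (comparing the collected scores and keeping the best n)
can be sketched in plain Java as follows. This is only a minimal sketch: the
class TopTermScores and its method topN are hypothetical names, not part of
Lucene, and it assumes the per-term scores have already been gathered into a
map by whatever means.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.PriorityQueue;

// Hypothetical helper, not part of Lucene: keeps only the n best (term, score) pairs.
class TopTermScores {

    // One term together with the raw score recorded for it.
    static final class TermScore {
        final String term;
        final float score;

        TermScore(String term, float score) {
            this.term = term;
            this.score = score;
        }
    }

    // Returns the n highest-scoring entries, best first.
    static List<TermScore> topN(Map<String, Float> scoresByTerm, int n) {
        List<TermScore> result = new ArrayList<>();
        if (n <= 0) {
            return result;
        }
        // Min-heap on score: the weakest of the current top n sits at the head.
        PriorityQueue<TermScore> heap =
                new PriorityQueue<>(n, (a, b) -> Float.compare(a.score, b.score));
        for (Map.Entry<String, Float> e : scoresByTerm.entrySet()) {
            heap.add(new TermScore(e.getKey(), e.getValue()));
            if (heap.size() > n) {
                heap.poll(); // drop the lowest score seen so far
            }
        }
        while (!heap.isEmpty()) {
            result.add(heap.poll()); // comes out in ascending order
        }
        Collections.reverse(result); // best first
        return result;
    }
}
```

Because the heap never holds more than n entries, the selection itself stays
cheap even for documents with many terms; the expensive part remains
obtaining the score for each term.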

This is no problem when the number of documents is small.
But I have to handle over 1,000,000 documents and over 20,000,000 terms.

Is there any way to solve this problem more quickly?

I use Lucene version 2.2.0.
-- 
View this message in context: http://www.nabble.com/How-can-I-get-document%27s-top-n-raw-score--tp15224916p15224916.html
Sent from the Lucene - Java Users mailing list archive at Nabble.com.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


Re: How can I get document's top n raw score?

Posted by Grant Ingersoll <gs...@apache.org>.
I'm not sure I understand what you are asking, but you can get
non-normalized scores by using the lower-level, non-Hits-based search
methods, such as the ones that return TopDocs.

However, scores are not really all that comparable across queries.
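
To illustrate the difference between raw and normalized scores, here is a
small plain-Java sketch. It assumes, as I understand the Hits class in
Lucene 2.x, that scores are rescaled by 1/max only when the best raw score
exceeds 1.0; the class ScoreNormalization and its method normalize are
hypothetical and are not Lucene code.

```java
// Hypothetical illustration, not Lucene source code.
class ScoreNormalization {

    // Scales all scores by 1/max when max exceeds 1.0; otherwise returns them unchanged.
    static float[] normalize(float[] rawScores) {
        float max = 0f;
        for (float s : rawScores) {
            max = Math.max(max, s);
        }
        // Assumed rule: only rescale when the best score is above 1.0.
        float norm = (max > 1.0f) ? 1.0f / max : 1.0f;
        float[] out = new float[rawScores.length];
        for (int i = 0; i < rawScores.length; i++) {
            out[i] = rawScores[i] * norm;
        }
        return out;
    }
}
```

Under that assumption, two queries whose best raw scores are 4.0 and 8.0
would both report 1.0 for their top hit after normalization, so the raw
values are the ones to record if the numbers are to be compared at all.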

-Grant

--------------------------
Grant Ingersoll
http://lucene.grantingersoll.com
http://www.lucenebootcamp.com

Lucene Helpful Hints:
http://wiki.apache.org/lucene-java/BasicsOfPerformance
http://wiki.apache.org/lucene-java/LuceneFAQ




