You are viewing a plain text version of this content. The canonical link for it is here.

Posted to java-user@lucene.apache.org by "Philippe Deslauriers (Beetext)" <de...@beetext.com> on 2006/04/27 14:39:49 UTC

Occurence (freq) and ordering

Hi again,

 

Upgrading from lucene 1.3 to 1.9.

 

We need to order the result in order of occurrences (score of a doc = sum of
occurrences of all Query).

In lucene 1.3 we did rewrite all the Query classes (BooleanQuery,
PhraseQuery, etc..) to reach our goals, but is there an easier way to do it
in 1.9?

 

I am just starting to read on Similarity, weights etc.

 

Can someone give me a heads up?

 

Thanks!

 

Philippe Deslauriers

RE: Occurence (freq) and ordering

Posted by Chris Hostetter <ho...@fucit.org>.

: By example doc contains 3 times the word "test", and 1 time the word
: "example", and the query was looking for both words, the score for the doc
: should be 4.
:
: But whatever I do, score is 1.

1) this is where Searcher.explain really comes in handy ... it will help
you seewhat is going on.

2) are you using the Hits API?  if so then if the high score is above 1,
all scores are normalized realtive hte high score.  use the TopDocs search
method instead.



-Hoss


---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org

RE: Occurence (freq) and ordering

Posted by Philippe Deslauriers <de...@beetext.com>.

Thanks for the Field.setOmitNorms(true) tip!

Regarding the Similarity implementation I am trying to do, somehow it does
not work.

Here's what I understand:

Scorer implementation uses the method defined in Similarity, to compute
score. (the formula expressed in
"http://lucene.apache.org/java/docs/api/org/apache/lucene/search/Similarity.
html" is implemented in the scorer.  

According to the formula if all my methods return 1, except for tf(freq)
which simply returns freq, all should work.

score(q,d) = SUM (t in q) : tf(freq) * 1 * 1 * 1 * 1 * 1 * 1

By example doc contains 3 times the word "test", and 1 time the word
"example", and the query was looking for both words, the score for the doc
should be 4.

But whatever I do, score is 1.

What I am missing?

Thanks

Phil.

-----Original Message-----
From: Chris Hostetter [mailto:hossman_lucene@fucit.org] 
Sent: Thursday, April 27, 2006 2:21 PM
To: java-user@lucene.apache.org
Subject: Re: Occurence (freq) and ordering


: Upgrading from lucene 1.3 to 1.9.

: We need to order the result in order of occurrences (score of a doc = sum
of
: occurrences of all Query).

: I am just starting to read on Similarity, weights etc.

You are definitely on the right track with Similarity.  What you want is a
Similarity implimentation where the values returned by most methods are
either 0 or 1, except for the tf(int) and tf(float) which should be an
identify function.

If you *allways* want *every* query to work this way, then you may also
want to look at using the new Field.setOmitNorms(true) option when you
index your documents.  It not only removes the lengthNorm from the scoring
equation, but it can help to reduce the size of your index (which i seem
to recall you were concerned with an another thread)


-Hoss


---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org

Re: Occurence (freq) and ordering

Posted by Chris Hostetter <ho...@fucit.org>.

: Upgrading from lucene 1.3 to 1.9.

: We need to order the result in order of occurrences (score of a doc = sum of
: occurrences of all Query).

: I am just starting to read on Similarity, weights etc.

You are definitely on the right track with Similarity.  What you want is a
Similarity implimentation where the values returned by most methods are
either 0 or 1, except for the tf(int) and tf(float) which should be an
identify function.

If you *allways* want *every* query to work this way, then you may also
want to look at using the new Field.setOmitNorms(true) option when you
index your documents.  It not only removes the lengthNorm from the scoring
equation, but it can help to reduce the size of your index (which i seem
to recall you were concerned with an another thread)


-Hoss


---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org