Posted to java-user@lucene.apache.org by Soeren Pekrul <so...@gmx.de> on 2006/11/09 10:21:40 UTC

Scoring depending on terms combination

How can I manipulate the score depending on which combination of query 
terms a result document contains? No single term is important on its 
own (that could simply be boosted); what matters is the combination of terms.

The user searches for the terms A, B, C and D.
Of course, the document containing all terms has the highest score. The 
document containing just the terms B and C has a higher score than the 
document containing the terms A and B.

A+B+C+D > B+C > A+B

I know the combinations and their boosts at query time.

Does anybody have an idea how to do this?

Thanks. Sören

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


Re: Scoring depending on terms combination

Posted by Soeren Pekrul <so...@gmx.de>.
Chris Hostetter wrote:
> that's a pretty specific and not altogether intuitive ranking... can you
> elaborate on your actual use case? ... why is B+C better than A+B? ... are
> these rules specific to a known list of terms, or is it a general rule
> relating to how you parse the user's input?

The original user query was a Boolean query:
+(A B) +(C D)

It is possible that this query is too restrictive, so in addition to the 
hits matching the original query I would like to give the user more hits.
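
Why B+C ranks above A+B follows from that query: +(A B) +(C D) needs at least one term from each group, so it is satisfied exactly by A+C, A+D, B+C and B+D, while A+B leaves the second group empty. A minimal plain-Java sketch (no Lucene classes; RequiredGroups, matches and combinations are hypothetical names) that expands the required groups into those combinations and checks a document's terms against them:

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

/** Plain-Java model of a query made of required OR-groups, e.g. +(A B) +(C D). */
public class RequiredGroups {

    /** A document matches when it shares at least one term with every required group. */
    static boolean matches(List<Set<String>> groups, Set<String> docTerms) {
        for (Set<String> group : groups) {
            if (Collections.disjoint(group, docTerms)) {
                return false;
            }
        }
        return true;
    }

    /** Minimal satisfying combinations: one term picked from each group (cross product). */
    static List<Set<String>> combinations(List<Set<String>> groups) {
        List<Set<String>> result = new ArrayList<>();
        result.add(new TreeSet<>());
        for (Set<String> group : groups) {
            List<Set<String>> next = new ArrayList<>();
            for (Set<String> partial : result) {
                for (String term : group) {
                    Set<String> extended = new TreeSet<>(partial);
                    extended.add(term);
                    next.add(extended);
                }
            }
            result = next;
        }
        return result;
    }

    public static void main(String[] args) {
        // +(A B) +(C D); TreeSet keeps the printed order deterministic
        Set<String> first = new TreeSet<>(List.of("A", "B"));
        Set<String> second = new TreeSet<>(List.of("C", "D"));
        List<Set<String>> query = List.of(first, second);
        System.out.println(combinations(query));               // [[A, C], [A, D], [B, C], [B, D]]
        System.out.println(matches(query, Set.of("B", "C")));  // true
        System.out.println(matches(query, Set.of("A", "B")));  // false
    }
}
```

These four combinations are the natural candidates for the highest boosts; combinations outside them, such as A+B, are the "additional hits".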

> off the top of my head, i would suggest building one big BooleanQuery and
> putting each of the permutations you care about in it as subqueries with
> boosts that correspond to their importance.  you'll probably want to
> disable the coord, and depending on how you want things to work if a doc
> matches your "A+B" clause *and* matches your "B+C" clause you may want to
> use a DisjunctionMaxQuery with a 0.0f tiebreaker value instead of a
> BooleanQuery.

My first idea was subclassing TopDocCollector and overriding the 
collect method. In that method I wanted to look up the terms of the 
current document, calculate a new score and call the base class's 
collect method with the new score as argument. I am afraid that would 
take too much time.
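
The collector idea can be modelled in plain Java as below. This is a hypothetical sketch, not Lucene's TopDocCollector: the docTerms map stands in for the per-document term lookup (in a real index, for example, term vectors or stored fields), the comboBoosts table and all names are assumptions, and collect(doc, score) only mirrors the role of the overridden method. The lookup runs once per hit, which is where the feared cost comes from.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.PriorityQueue;
import java.util.Set;

/** Hypothetical model of a rescoring collector: keeps the n best hits after rescoring. */
public class RescoringCollector {

    static final class Hit {
        final int doc;
        final float score;
        Hit(int doc, float score) { this.doc = doc; this.score = score; }
        @Override public String toString() { return doc + ":" + score; }
    }

    private final Map<Integer, Set<String>> docTerms;  // stand-in for the per-document term lookup
    private final Map<Set<String>, Float> comboBoosts; // assumed table: term combination -> score
    private final int n;
    // Min-heap on score, so the weakest retained hit is evicted first.
    private final PriorityQueue<Hit> top =
            new PriorityQueue<>((a, b) -> Float.compare(a.score, b.score));

    RescoringCollector(Map<Integer, Set<String>> docTerms,
                       Map<Set<String>, Float> comboBoosts, int n) {
        this.docTerms = docTerms;
        this.comboBoosts = comboBoosts;
        this.n = n;
    }

    /** Mirrors an overridden collect(doc, score): the original score is replaced, not combined. */
    void collect(int doc, float originalScore) {
        Set<String> terms = docTerms.getOrDefault(doc, Set.of()); // the costly per-hit lookup
        float best = 0f;
        for (Map.Entry<Set<String>, Float> e : comboBoosts.entrySet()) {
            if (terms.containsAll(e.getKey())) {
                best = Math.max(best, e.getValue());
            }
        }
        if (best == 0f) {
            return; // no interesting combination: drop the hit
        }
        if (top.size() < n) {
            top.add(new Hit(doc, best));
        } else if (best > top.peek().score) {
            top.poll();
            top.add(new Hit(doc, best));
        }
    }

    /** Retained hits, best first. */
    List<Hit> topDocs() {
        List<Hit> result = new ArrayList<>(top);
        result.sort((a, b) -> Float.compare(b.score, a.score));
        return result;
    }

    public static void main(String[] args) {
        Map<Integer, Set<String>> docs = Map.of(
                1, Set.of("A", "B", "C", "D"),
                2, Set.of("B", "C"),
                3, Set.of("A", "B"),
                4, Set.of("D"));
        Map<Set<String>, Float> boosts = Map.of(
                Set.of("A", "B", "C", "D"), 10f,
                Set.of("B", "C"), 5f,
                Set.of("A", "B"), 2f);
        RescoringCollector collector = new RescoringCollector(docs, boosts, 3);
        for (int doc = 1; doc <= 4; doc++) {
            collector.collect(doc, 1f);
        }
        System.out.println(collector.topDocs()); // [1:10.0, 2:5.0, 3:2.0]
    }
}
```

The ranking logic itself is cheap; the per-hit term lookup is what the boosted-query approach avoids, because the index does the combination matching.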

Boolean queries for each interesting combination with a corresponding 
boost value should be faster.

Thank you, Hoss.

Sören



Re: Scoring depending on terms combination

Posted by Chris Hostetter <ho...@fucit.org>.
: The user searches for the terms A, B, C and D.
: Of course, the document containing all terms has the highest score. The
: document containing just the terms B and C has a higher score than the
: document containing the terms A and B.
:
: A+B+C+D > B+C > A+B
:
: I know the boosting combinations at query time.

that's a pretty specific and not altogether intuitive ranking... can you
elaborate on your actual use case? ... why is B+C better than A+B? ... are
these rules specific to a known list of terms, or is it a general rule
relating to how you parse the user's input?

off the top of my head, i would suggest building one big BooleanQuery and
putting each of the permutations you care about in it as subqueries with
boosts that correspond to their importance.  you'll probably want to
disable the coord, and depending on how you want things to work if a doc
matches your "A+B" clause *and* matches your "B+C" clause you may want to
use a DisjunctionMaxQuery with a 0.0f tiebreaker value instead of a
BooleanQuery.
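
The difference between the two ways of combining clauses can be seen in a simplified model that assumes each matching clause contributes exactly its boost (real Lucene scores also depend on tf, idf and norms, so the numbers are only illustrative). In Lucene each clause would be something like a BooleanQuery of required TermQuerys with a boost set via setBoost; here the clauses, boost values and method names are hypothetical. sum models a BooleanQuery with coord disabled, and disMax models a DisjunctionMaxQuery: the best clause score plus the tiebreaker times the other clause scores.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

/** Simplified model: every matching clause contributes exactly its boost. */
public class CombinationScoring {

    /** A conjunction of terms with a boost, e.g. "B+C" boosted to 3. */
    static final class Clause {
        final Set<String> terms;
        final double boost;
        Clause(double boost, String... terms) {
            this.boost = boost;
            this.terms = Set.of(terms);
        }
    }

    // Hypothetical boosts chosen so that A+B+C+D > B+C > A+B.
    static final List<Clause> CLAUSES = List.of(
            new Clause(10, "A", "B", "C", "D"),
            new Clause(3, "B", "C"),
            new Clause(1, "A", "B"));

    /** Boosts of the clauses whose terms all occur in the document. */
    static List<Double> matchingBoosts(List<Clause> clauses, Set<String> doc) {
        List<Double> scores = new ArrayList<>();
        for (Clause c : clauses) {
            if (doc.containsAll(c.terms)) {
                scores.add(c.boost);
            }
        }
        return scores;
    }

    /** BooleanQuery with coord disabled: the scores of matching clauses add up. */
    static double sum(List<Double> scores) {
        double total = 0.0;
        for (double s : scores) {
            total += s;
        }
        return total;
    }

    /** DisjunctionMaxQuery: best clause score plus tieBreaker times the other scores. */
    static double disMax(List<Double> scores, double tieBreaker) {
        if (scores.isEmpty()) {
            return 0.0;
        }
        double max = Collections.max(scores);
        return max + tieBreaker * (sum(scores) - max);
    }

    public static void main(String[] args) {
        List<Set<String>> docs = List.of(
                Set.of("A", "B", "C", "D"), Set.of("A", "B", "C"),
                Set.of("B", "C"), Set.of("A", "B"));
        for (Set<String> doc : docs) {
            List<Double> s = matchingBoosts(CLAUSES, doc);
            System.out.println(String.join("+", new TreeSet<String>(doc))
                    + " sum=" + sum(s) + " disMax=" + disMax(s, 0.0));
        }
    }
}
```

With the tiebreaker at 0.0, a document containing A, B and C scores 3 (only its best clause, B+C), the same as a B+C document; summing gives it 4, because the A+B clause adds to the B+C clause.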




-Hoss

