You are viewing a plain text version of this content. The canonical link for it is here.
Posted to java-user@lucene.apache.org by Soeren Pekrul <so...@gmx.de> on 2006/11/09 10:21:40 UTC
Scoring depending on terms combination
How can I manipulate the score depending on the combination of query
terms containing in the result document? Not a single term is important.
That could be boosted. Important is the combination of terms.
The user searches for the terms A, B, C and D.
Of-course, the document containing all terms has the highest score. The
document containing just the terms B and C has a higher score than the
document containing the terms A and B.
A+B+C+D > B+C > A+B
I know the boosting combinations at query time.
Has anybody an idea how to do this?
Thanks. Sören
---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org
Re: Scoring depending on terms combination
Posted by Soeren Pekrul <so...@gmx.de>.
Chris Hostetter wrote:
> that's a pretty specific and not all together intuitive ranking... can you
> elaborate on your actual use case? ... why is B+C better then A+B ? .. are
> these rules specific to a known list of terms, or is a general rule
> relating to how you parse the users input?
The original user query was a Boolean query:
+(A B) +(C D)
It is possible that this query is to restrict. So I would like to give
the user to the hits matching his original query additional hits.
> off the top of my head, i would suggest building one big BooleanQuery and
> putting each of the permutations you care about in it as subqueries with
> boosts that corripsond to their importance. you'll probably want to
> disable the coord, and depending on how you want things to work if a doc
> matches your "A+B" clause *and* matches your "B+C" clause you may want to
> use a DisjunctionMaxQuery with a 0.0f tiebreaker value instead of a
> BooleanQuery.
My first idea was sub classing TopDocCollector and overriding the
collect function. In this function I wanted to ask for terms of the
current document, calculate the score and call the collect function of
the base class with the new score as argument. I afraid it takes to much
time.
Boolean queries for each interesting combination with a corresponding
boost value should be faster.
Thank you, Hoss.
Sören
---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org
Re: Scoring depending on terms combination
Posted by Chris Hostetter <ho...@fucit.org>.
: The user searches for the terms A, B, C and D.
: Of-course, the document containing all terms has the highest score. The
: document containing just the terms B and C has a higher score than the
: document containing the terms A and B.
:
: A+B+C+D > B+C > A+B
:
: I know the boosting combinations at query time.
that's a pretty specific and not all together intuitive ranking... can you
elaborate on your actual use case? ... why is B+C better then A+B ? .. are
these rules specific to a known list of terms, or is a general rule
relating to how you parse the users input?
off the top of my head, i would suggest building one big BooleanQuery and
putting each of the permutations you care about in it as subqueries with
boosts that corripsond to their importance. you'll probably want to
disable the coord, and depending on how you want things to work if a doc
matches your "A+B" clause *and* matches your "B+C" clause you may want to
use a DisjunctionMaxQuery with a 0.0f tiebreaker value instead of a
BooleanQuery.
-Hoss
---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org