Posted to java-user@lucene.apache.org by Paul Allan Hill <pa...@metajure.com> on 2012/02/08 23:42:11 UTC

Please explain DisjunctionMaxQuery JavaDoc.

What the heck is the JavaDoc for DisjunctionMaxQuery saying:

"A query that generates the union of documents produced by its subqueries, and that scores each document with the maximum score for that document as produced by any subquery, plus a tie breaking increment for any additional matching subqueries. This is useful when searching for a word in multiple fields with different boost factors (so that the fields cannot be combined equivalently into a single search field). We want the primary score to be the one associated with the highest boost, not the sum of the field scores (as BooleanQuery would give). If the query is "albino elephant" this ensures that "albino" matching one field and "elephant" matching another gets a higher score than "albino" matching both fields. To get this result, use both BooleanQuery and DisjunctionMaxQuery: for each term a DisjunctionMaxQuery searches for it in each field, while the set of these DisjunctionMaxQuery's is combined into a BooleanQuery. The tie breaker capability allows results that include the same term in multiple fields to be judged better than results that include this term in only the best of those multiple fields, without confusing this with the better case of two different terms in the multiple fields."

"Maximum ...  as produced by any subquery", OK that makes sense.  We pick the score that is the highest
If you have
DMQ ( Q1, Q2, Q3 )
and the subquery scores are (0.1, 0.2, 0.1), then Q2 wins and the overall score is 0.2, right?
But then what is the meaning of "any additional matching subqueries"?
Is the description then saying one of the following?

(1)    Running with the idea that something has to tie to involve a tie-breaker, I might say "If two subqueries both have the maximum score among all the subqueries, the score will be the maximum score increased by the tie breaker increment"
Example: a DMQ with an increment of 0.15 and three subqueries ( Q1, Q2, Q3 ) that score (0.1, 0.2, 0.2). Because there are two 0.2 scores, the score for this query will be 0.2 + 0.15, or 0.35. If the scores are (0.1, 0.1, 0.2), the overall score is 0.2, because we had only one maximum.

OR alternately forgetting the idea that anything is tied within the set of subqueries


(2)    "if in addition to the maximum subquery score there are any other subqueries with nonzero scores, the overall score is increased by the tiebreaker increment."

Example: Using the same increment of 0.15, if the scores are (0.0, 0.0, 0.2) the result is 0.2, but (0.0, 0.1, 0.2) scores 0.35.
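
To make the two readings concrete, here is a minimal Java sketch that computes both for the example scores. The class and method names are hypothetical, and neither method is taken from Lucene; they only encode interpretations (1) and (2) as worded above.

```java
// Hypothetical helper, not part of Lucene: it only encodes the two
// candidate readings of the JavaDoc so they can be compared side by side.
final class TieBreakInterpretations {

    // Interpretation (1): add the increment only when two or more
    // subqueries share the (nonzero) maximum score.
    static float tiedMaxima(float increment, float... scores) {
        float max = 0f;
        for (float s : scores) {
            max = Math.max(max, s);
        }
        int atMax = 0;
        for (float s : scores) {
            if (s == max) {
                atMax++;
            }
        }
        return (max > 0f && atMax > 1) ? max + increment : max;
    }

    // Interpretation (2): add the increment when at least one subquery
    // besides the best one has a nonzero score.
    static float anyOtherMatch(float increment, float... scores) {
        float max = 0f;
        int matching = 0;
        for (float s : scores) {
            max = Math.max(max, s);
            if (s > 0f) {
                matching++;
            }
        }
        return matching > 1 ? max + increment : max;
    }

    public static void main(String[] args) {
        // Interpretation (1): about 0.35, then about 0.2.
        System.out.println(tiedMaxima(0.15f, 0.1f, 0.2f, 0.2f));
        System.out.println(tiedMaxima(0.15f, 0.1f, 0.1f, 0.2f));
        // Interpretation (2): about 0.2, then about 0.35.
        System.out.println(anyOtherMatch(0.15f, 0.0f, 0.0f, 0.2f));
        System.out.println(anyOtherMatch(0.15f, 0.0f, 0.1f, 0.2f));
    }
}
```

The two readings disagree on inputs such as (0.1, 0.1, 0.2): reading (1) leaves the score at 0.2, while reading (2) raises it to about 0.35.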

I'm leaning toward interpretation #2, but "tie breaking for ... additional matching..." does not say that to me, because I don't see any tie.
Once I understand that, I'll ask about how to "use both BooleanQuery and DisjunctionMaxQuery".

-Paul

RE: Please explain DisjunctionMaxQuery JavaDoc.

Posted by Paul Allan Hill <pa...@metajure.com>.

> -----Original Message-----
> From: Paul Allan Hill [mailto:paul@metajure.com]
> Sent: Wednesday, February 08, 2012 2:42 PM
> To: java-user@lucene.apache.org
> Subject: Please explain DisjunctionMaxQuery JavaDoc.
> 
> What the heck does is the JavaDoc for DisjunctionMaxQuery saying:
> 
>[...] plus a tie
> breaking increment 

Oh my, the first problem is that the class description discusses a "tie breaking increment", but the API says tie breaking multiplier.
Then, wandering around in the code, I find
DisjunctionMaxScorer.score()
...
return scoreMax + (scoreSum - scoreMax) * tieBreakerMultiplier;
...
which, upon examination, IS "the score of each non-maximum disjunct for a document is multiplied by this weight and added into the final score", as described in the c'tor of DisjunctionMaxQuery.
But what this has to do with any idea of a "tie" anywhere in this query, I don't know.
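
As a rough check of what that return line does, here is a standalone Java sketch that applies the same expression to plain float arrays. It is not the Lucene scorer itself: the class and method names are made up for illustration, and the per-subquery scores are simply passed in rather than computed by real subqueries.

```java
// Hypothetical stand-in for the arithmetic in the quoted return statement;
// only the final expression is taken from the code quoted above.
final class DisMaxScoreSketch {

    static float combine(float tieBreakerMultiplier, float... subScores) {
        float scoreSum = 0f;
        float scoreMax = 0f;
        for (float s : subScores) {
            scoreSum += s;                    // total over all matching subqueries
            scoreMax = Math.max(scoreMax, s); // best single subquery
        }
        // Best score, plus every other subquery's score scaled by the multiplier.
        return scoreMax + (scoreSum - scoreMax) * tieBreakerMultiplier;
    }

    public static void main(String[] args) {
        // Same example scores as in the first message, multiplier 0.15.
        System.out.println(combine(0.15f, 0.1f, 0.2f, 0.2f)); // about 0.245
        System.out.println(combine(0.15f, 0.0f, 0.1f, 0.2f)); // about 0.215
        System.out.println(combine(0.15f, 0.0f, 0.0f, 0.2f)); // about 0.2
    }
}
```

Under this formula neither reading (1) nor reading (2) holds: (0.1, 0.2, 0.2) gives about 0.245 and (0.0, 0.1, 0.2) gives about 0.215, not 0.35. With a multiplier of 0 the expression reduces to the plain maximum, and with a multiplier of 1 it reduces to the sum of the subquery scores.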

-Paul
