Posted to solr-user@lucene.apache.org by Smith G <gu...@gmail.com> on 2010/03/08 12:39:02 UTC

question related to coord() [might be expert level]

Hello,
         I have learned that the coord() value is calculated for each
sub-query (BooleanQuery) present in the main query.
For example (f = field, k = keyword):

            (( f1:k1 OR f2:k2) OR f3:k3) OR f4:k4

Here, if I am correct, coord() is calculated 3 times in total. My
goal is to boost (or edit the formula of) only the coord() value that
is calculated last. It may seem strange until you know why it is needed.

   We are expanding the query using a QueryParser plugin, which adds
synonym terms for each field.
For example: town:lausanne is expanded to (town:lausanne OR
city:lausanne).
Consider a bigger query. Let f1s1 be the first synonym of f1,
f1s2 the second synonym of f1, and so on.
The query mentioned above is then expanded to:

(((f1:k1 or f1s1:k1 or f1s2:k1) OR (f2:k2 or f2s1:k2)) OR (f3:k3 or
f3s1:k3))  OR  f4:k4  [assume no synonyms for f4] .

So here it makes sense to edit the coord formula only for the last
"coord" value, not for every sub-boolean query, because there could be
10 synonyms in some cases.
My questions:

1) Is there any way to find out, inside Similarity, whether the
current call is the last coord()?

2) Or is there any other place where we can edit and reach our goal?

3) I have found the usage of "Coordinator" inside "BooleanScorer2",
which suggests there could be a way to boost the last element of
coordFactors[], but I do not know whether a plugin could do that,
or even what the effect would be.

  This seems really expert level [for my knowledge], so I am seeking some help.

Thanks.

Re: question related to coord() [might be expert level]

Posted by Chris Hostetter <ho...@fucit.org>.
:          I came to know that coord() value is being calculated on each
: sub-query (BooleanQuery) present in the main query.
: For Ex : f-field, k-keyword
: 
:             (( f1:k1 OR f2:k2) OR f3:k3) OR f4:k4
: 
: Here if I am correct, coord() is being calculated totally 3 times. My

More specifically: every BooleanQuery has a "coord" value it factors into 
its scoring, based on how many of the clauses match.  This is true even if 
the BooleanQuery is a clause in another BooleanQuery.

Note that this is the fundamental difference between the example query 
listed above and something like...

      ( f1:k1 f2:k2 f3:k3 f4:k4 )

...both queries should match the exact same set of documents, but the 
scores will be different because of the coord factor (and the queryNorm)
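
To make the difference concrete, here is a small arithmetic sketch. It
assumes the default coord formula, overlap / maxOverlap, gives every
matching term a raw score of 1.0, and ignores queryNorm and term weights.
The class and method names are made up for illustration; this is not
Lucene code.

```java
public class CoordArithmetic {
    // Assumed default formula: fraction of a BooleanQuery's clauses that matched.
    public static float coord(int overlap, int maxOverlap) {
        return (float) overlap / maxOverlap;
    }

    // Document matches only f1:k1; query is the flat ( f1:k1 f2:k2 f3:k3 f4:k4 ).
    public static float flatScore() {
        return coord(1, 4) * 1.0f;
    }

    // Same document; query is (( f1:k1 OR f2:k2) OR f3:k3) OR f4:k4.
    // Each of the three nested BooleanQueries has 2 clauses, of which 1 matches.
    public static float nestedScore() {
        float inner = coord(1, 2) * 1.0f;    // ( f1:k1 OR f2:k2 )
        float middle = coord(1, 2) * inner;  // ( inner OR f3:k3 )
        return coord(1, 2) * middle;         // ( middle OR f4:k4 )
    }

    public static void main(String[] args) {
        System.out.println("flat   = " + flatScore());
        System.out.println("nested = " + nestedScore());
    }
}
```

Under these assumptions the flat query gives 0.25 and the nested one
0.125 for the same document, because each level of nesting applies its
own coord factor.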


: (((f1:k1 or f1s1:k1 or f1s2:k1) OR (f2:k2 or f2s1:k2)) OR (f3:k3 or
: f3s1:k3))  OR  f4:k4  [assume no synonyms for f4] .
: 
: So, here it makes sense to edit coord formula for the last "coord"
: value, but not for every sub-boolean query because there could be 10
: synonyms in some cases, etc..
: My questions..
: 
: 1) Is there any chance of finding out inside Similarity whether
: current one is the last coord() ?
: 
: 2) Or is there any other place where we can edit and reach our goal.

The right way to do this is that when your code constructs the "inner" 
BooleanQueries, it should modify the Similarity instances used by 
those BooleanQueries to ignore the coord value -- this is such a common 
thing for BooleanQueries that there is actually a constructor arg to ask 
BooleanQuery to do it automatically (disableCoord) ...

http://lucene.apache.org/java/3_0_0/api/all/org/apache/lucene/search/BooleanQuery.html#BooleanQuery%28boolean%29
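
As a rough illustration of the effect, the toy model below rebuilds the
expanded query from the question and scores a document that matches only
the synonym field f1s1. It is a sketch, not Lucene code: Node, term() and
or() are hypothetical helpers, every matching term scores 1.0, and coord
is assumed to be overlap / maxOverlap. The disableCoord flag on or()
stands in for the BooleanQuery(boolean) constructor linked above.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

public class DisableCoordSketch {
    // Hypothetical query node: returns a score for a set of matched "field:keyword" terms.
    public interface Node {
        float score(Set<String> matched);
    }

    // Leaf: scores 1.0 when the term is matched, 0 otherwise (toy assumption).
    public static Node term(String t) {
        return matched -> matched.contains(t) ? 1.0f : 0.0f;
    }

    // Boolean OR: sum of matching clause scores, times coord unless it is disabled.
    public static Node or(boolean disableCoord, Node... clauses) {
        return matched -> {
            int overlap = 0;
            float sum = 0f;
            for (Node c : clauses) {
                float s = c.score(matched);
                if (s > 0f) {
                    overlap++;
                    sum += s;
                }
            }
            if (overlap == 0) {
                return 0f;
            }
            float coord = disableCoord ? 1.0f : (float) overlap / clauses.length;
            return coord * sum;
        };
    }

    // (((f1 synonyms) OR (f2 synonyms)) OR (f3 synonyms)) OR f4:k4
    // Only the outermost OR always keeps its coord factor.
    public static Node expanded(boolean disableInnerCoord) {
        Node g1 = or(disableInnerCoord, term("f1:k1"), term("f1s1:k1"), term("f1s2:k1"));
        Node g2 = or(disableInnerCoord, term("f2:k2"), term("f2s1:k2"));
        Node g3 = or(disableInnerCoord, term("f3:k3"), term("f3s1:k3"));
        Node pair = or(disableInnerCoord, g1, g2);
        Node triple = or(disableInnerCoord, pair, g3);
        return or(false, triple, term("f4:k4"));
    }

    public static void main(String[] args) {
        Set<String> doc = new HashSet<>(Arrays.asList("f1s1:k1"));
        System.out.println("coord everywhere: " + expanded(false).score(doc));
        System.out.println("inner coord off:  " + expanded(true).score(doc));
    }
}
```

In this model the document scores about 1/24 when every level applies
coord, and 0.5 when only the outermost query does, so the number of
synonyms per field no longer drags the score down.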


-Hoss