Posted to java-user@lucene.apache.org by Sebastian Marius Kirsch <sk...@sebastian-kirsch.org> on 2006/03/07 19:15:34 UTC

Scoring with FunctionQueries?

Hello,

I have been trying out Yonik's excellent FunctionQuery (from Solr),
but am having some problems regarding the scoring of FunctionQueries
in conjunction with other queries.

I am currently researching a data fusion approach, where you have
several separate scores for a document and combine them to produce a
composite score. One of these scores is the regular Lucene score,
which I essentially treat as a black box.

I'm trying a simple linear combination, i.e.

score = a * score_a + b * score_b ...
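A minimal sketch of the linear combination above, in plain Java with no Lucene dependency. The class and method names are hypothetical. The 0.5 weights and the two input scores are only illustrative; the scores are borrowed from the explanations later in this thread.

```java
// Hypothetical helper illustrating score = a * score_a + b * score_b + ...
final class LinearFusion {
    private LinearFusion() {}

    /** Weighted sum of per-source scores: weights[i] multiplies scores[i]. */
    static double combine(double[] weights, double[] scores) {
        if (weights.length != scores.length) {
            throw new IllegalArgumentException("need exactly one weight per score");
        }
        double total = 0.0;
        for (int i = 0; i < scores.length; i++) {
            total += weights[i] * scores[i];
        }
        return total;
    }

    public static void main(String[] args) {
        double fullTextScore = 0.82885367; // Lucene score, treated as a black box
        double staticScore = 0.6;          // e.g. the value of a stored "score" field
        double fused = combine(new double[] {0.5, 0.5},
                               new double[] {fullTextScore, staticScore});
        System.out.println("fused score = " + fused);
    }
}
```

In the setup described here, a and b would presumably be expressed as per-clause boosts, with only score_a coming from Lucene's normal (normalized) scoring path.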

I have one Query produced by a QueryParser, and one (possibly several)
FunctionQueries which provide additional scores. I combine them with a
BooleanQuery with required clauses.

When I look at the explanation of such a combined Query, I see that
the scores of the subqueries are all multiplied by the query norm --
but I want only the score of the full-text query to be multiplied by
the query norm. The function queries should be added to the final
query as they are (the factors a, b, ... could be set using a query
boost.)

How do I achieve that? I'm rather lost in the forest of Scorer,
Similarity and Weight right now. Which is the right place to add such
a modification, so that it doesn't mess up the rest of the scoring?


I already tried extending BooleanQuery so that getSimilarity returns a
Similarity which overrides just queryNorm, to return 1.0. But this
queryNorm is then used for both the FunctionQuery and the full-text
query.


Thanks very much for your answers.

Regards, Sebastian

-- 
Sebastian Kirsch <sk...@sebastian-kirsch.org> [http://www.sebastian-kirsch.org/]

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org

Re: Scoring with FunctionQueries?

Posted by Sebastian Marius Kirsch <sk...@sebastian-kirsch.org>.

On Tue, Mar 07, 2006 at 06:19:53PM -0800, Chris Hostetter wrote:
> once you've tried the suggestions above, can you send out a
> self-contained JUnit test showing the problems?

Thanks, Chris, glad you agree that it doesn't work the way you would
expect. I will try your suggestions and send in a JUnit test. It may take
a while, though, as I'm just leaving for CeBIT.

Oh, FWIW, I'm using Lucene 1.9.1 and the Solr nightly build from
2006-03-07 for the FunctionQuery implementation.

Regards, Sebastian

-- 
Sebastian Kirsch <sk...@sebastian-kirsch.org> [http://www.sebastian-kirsch.org/]

Re: Scoring with FunctionQueries?

Posted by Chris Hostetter <ho...@fucit.org>.

: I tried both approaches, and neither seems to do what I
: want. Perhaps I did not understand you properly.

From what I can tell, it looks like you understood me perfectly; I too am
baffled by the results you are getting.  I have a few thoughts:

1) Check the raw score you get for these docs using a HitCollector and
compare that with the value from explain ... the explain info is calculated
through a parallel code path which differs from the normal search/score
code path, and it's totally possible there are bugs (BooleanQuery, for
example, will happily deal with sub queries that return scores <= 0, but
its explain function will not ... I don't think that's the issue here, but
it may be something similar).

2) Add some logging (or set some breakpoints) to your custom similarities'
   queryNorm methods (and your getSimilarity methods) to see
   if/when/how often the methods are being called.

3) Try eliminating some variables and see what happens ...
   a) create concrete subclasses instead of using anonymous instances with
      overridden methods.
   b) don't bother using FunctionQuery, just use two separate TermQueries
      with different getSimilarity() methods (FunctionQuery is fairly new
      ... there may be bugs in it; also, this way, if you still have a
      problem, you have a use case that anyone familiar with Lucene will
      understand even if they've never seen FunctionQuery).
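For the logging suggestion above, a plain-Java sketch of the idea: wrap the norm function in a decorator that counts and prints every call before delegating. QueryNormFunction, DefaultNorm, LoggingNorm and QueryNormLoggingDemo are hypothetical stand-ins, not Lucene classes; in Lucene itself the equivalent would be a DefaultSimilarity subclass whose queryNorm(float) override logs and then returns super.queryNorm(sumOfSquaredWeights). DefaultNorm reproduces DefaultSimilarity's formula, 1 / sqrt(sumOfSquaredWeights).

```java
// Hypothetical stand-in for the one Similarity hook under investigation.
interface QueryNormFunction {
    float queryNorm(float sumOfSquaredWeights);
}

// Same formula as DefaultSimilarity.queryNorm: 1 / sqrt(sumOfSquaredWeights).
final class DefaultNorm implements QueryNormFunction {
    @Override
    public float queryNorm(float sumOfSquaredWeights) {
        return (float) (1.0 / Math.sqrt(sumOfSquaredWeights));
    }
}

// Decorator: counts and logs each call (with a label), then delegates.
final class LoggingNorm implements QueryNormFunction {
    private final String label;
    private final QueryNormFunction delegate;
    private int calls = 0;

    LoggingNorm(String label, QueryNormFunction delegate) {
        this.label = label;
        this.delegate = delegate;
    }

    @Override
    public float queryNorm(float sumOfSquaredWeights) {
        calls++;
        float norm = delegate.queryNorm(sumOfSquaredWeights);
        System.err.println(label + ": queryNorm call #" + calls
                + ", sumOfSquaredWeights=" + sumOfSquaredWeights + " -> " + norm);
        return norm;
    }

    int callCount() {
        return calls;
    }
}

final class QueryNormLoggingDemo {
    public static void main(String[] args) {
        LoggingNorm termNorm = new LoggingNorm("TermQuery", new DefaultNorm());
        LoggingNorm functionNorm = new LoggingNorm("FunctionQuery", sum -> 1.0f);
        termNorm.queryNorm(4.0f);
        // A count that stays at 0 means that hook was never consulted.
        System.out.println("TermQuery calls: " + termNorm.callCount()
                + ", FunctionQuery calls: " + functionNorm.callCount());
    }
}
```

The point of the counters is the one Hoss raises: a similarity whose queryNorm is never called during a search cannot be what produced the norm shown in the explanation.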

: I generated a small in-memory index (six documents) for testing your
: suggestions, with some text in field "content" and a numeric score in
: field "score". Following are the code I used and the explanations I
: obtained.

Once you've tried the suggestions above, can you send out a
self-contained JUnit test showing the problems?

: Please see the code above. I have not delved into the depths of Lucene
: yet, but it seems that Lucene uses only one similarity instance for
: scoring all clauses in the boolean query, and doesn't honour the
: similarity instances provided by the individual clauses.

I just double-checked, and I can't see any way that could be happening --
but you're seeing something weird, so *something* isn't working the way I
thought.  As I said, if you can post a self-contained unit test that
demonstrates the problem, then maybe someone can spot the glitch.




-Hoss


Re: Scoring with FunctionQueries?

Posted by Sebastian Marius Kirsch <sk...@sebastian-kirsch.org>.

Dear Chris,

thanks very much for your quick answer.

I tried both approaches, and neither seems to do what I
want. Perhaps I did not understand you properly.

I generated a small in-memory index (six documents) for testing your
suggestions, with some text in field "content" and a numeric score in
field "score". Following are the code I used and the explanations I
obtained.

On Tue, Mar 07, 2006 at 11:10:51AM -0800, Chris Hostetter wrote:
> 1) change the default similarity (using Similarity.setDefault(Similarity))
> used by all queries to a version with queryNorm returning a constant, and
> then in the few queries where you want the more traditional queryNorm,
> override the getSimilarity method inline...
> 
>    Query q = new TermQuery(new Term("foo","bar")) {
>       public Similarity getSimilarity(Searcher s) {
>         return new DefaultSimilarity();
>       }
>    };

This is the code I used:

    IndexSearcher searcher = new IndexSearcher(directory);
 
    searcher.setSimilarity(new DefaultSimilarity() {
      public float queryNorm(float sumOfSquaredWeight) {
        return 1.0f;
      }
    });
    
    TermQuery tq = new TermQuery(new Term("content", "desmond")) {
      public Similarity getSimilarity(Searcher s) {
        return new DefaultSimilarity();
      }
    };
    
    FunctionQuery fq = new FunctionQuery(new FloatFieldSource("score"));
    
    BooleanQuery bq = new BooleanQuery();
    bq.add(fq, BooleanClause.Occur.SHOULD);
    bq.add(tq, BooleanClause.Occur.MUST);
 
And this is the explanation I obtained:

2.526826 = sum of:
  0.6 = FunctionQuery(org.apache.solr.search.function.FloatFieldSource:float(score)), product of:
    0.6 = float(score)=0.6
    1.0 = boost
    1.0 = queryNorm
  1.926826 = weight(content:desmond in 3), product of:
    2.0986123 = queryWeight(content:desmond), product of:
      2.0986123 = idf(docFreq=1)
      1.0 = queryNorm
    0.9181429 = fieldWeight(content:desmond in 3), product of:
      1.0 = tf(termFreq(content:desmond)=1)
      2.0986123 = idf(docFreq=1)
      0.4375 = fieldNorm(field=content, doc=3)

So, as you see, the query norm for the FunctionQuery is 1.0, but this
same query norm is also used for the TermQuery (when it should be
computed from the terms in the query).
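As a cross-check, the sketch below recomputes the first explanation by hand. It assumes Lucene 1.9's DefaultSimilarity formulas: idf = ln(numDocs / (docFreq + 1)) + 1, with numDocs = 6 for the six-document test index; tf = sqrt(freq); a TermQuery clause score of queryWeight * fieldWeight = (idf * boost * queryNorm) * (tf * idf * fieldNorm); and a FunctionQuery clause score of value * boost * queryNorm, as the explanation prints it. The fieldNorm of 0.4375 is copied from the explanation rather than derived, and the class name is hypothetical.

```java
// Hypothetical hand recomputation of the first explanation (queryNorm = 1.0).
final class FirstExplanation {
    private FirstExplanation() {}

    // DefaultSimilarity-style idf: ln(numDocs / (docFreq + 1)) + 1
    static double idf(int docFreq, int numDocs) {
        return Math.log(numDocs / (double) (docFreq + 1)) + 1.0;
    }

    // FunctionQuery clause: raw field value * boost * queryNorm
    static double functionClause(double value, double boost, double queryNorm) {
        return value * boost * queryNorm;
    }

    // TermQuery clause: queryWeight * fieldWeight
    static double termClause(double tf, double idf, double fieldNorm,
                             double boost, double queryNorm) {
        double queryWeight = idf * boost * queryNorm;
        double fieldWeight = tf * idf * fieldNorm;
        return queryWeight * fieldWeight;
    }

    // Sum of both clauses for doc 3, using the values from the explanation
    static double total(double queryNorm) {
        double idf = idf(1, 6);
        return functionClause(0.6, 1.0, queryNorm)
             + termClause(Math.sqrt(1.0), idf, 0.4375, 1.0, queryNorm);
    }

    public static void main(String[] args) {
        System.out.println("idf   = " + idf(1, 6));
        System.out.println("total = " + total(1.0));
    }
}
```

With queryNorm = 1.0 this reproduces idf = 2.0986123 and the total of 2.526826, i.e. the constant norm from the searcher's similarity is applied to the TermQuery clause as well.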

> 2) reverse step one ... override getSimilarity() just in the classes you
> want queryNorm to be constant in, and leave the default alone.

OK, so this would look like the following:

    IndexSearcher searcher = new IndexSearcher(directory);
 
    TermQuery tq = new TermQuery(new Term("content", "desmond"));
    FunctionQuery fq = new FunctionQuery(new FloatFieldSource("score")) {
      public Similarity getSimilarity(Searcher s) {
        return new DefaultSimilarity() {
          public float queryNorm(float sumOfSquaredWeight) {
            return 1.0f;
          }
        };
      }
    };
    
    BooleanQuery bq = new BooleanQuery();
    bq.add(fq, BooleanClause.Occur.SHOULD);
    bq.add(tq, BooleanClause.Occur.MUST);

And what I get as an explanation is this:

1.0869528 = sum of:
  0.25809917 = FunctionQuery(org.apache.solr.search.function.FloatFieldSource:float(score)), product of:
    0.6 = float(score)=0.6
    1.0 = boost
    0.43016526 = queryNorm
  0.82885367 = weight(content:desmond in 3), product of:
    0.90275013 = queryWeight(content:desmond), product of:
      2.0986123 = idf(docFreq=1)
      0.43016526 = queryNorm
    0.9181429 = fieldWeight(content:desmond in 3), product of:
      1.0 = tf(termFreq(content:desmond)=1)
      2.0986123 = idf(docFreq=1)
      0.4375 = fieldNorm(field=content, doc=3)

So, this is also wrong, but in a different way -- the queryNorm for
the FunctionQuery should be 1.0.
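The second explanation's numbers are consistent with a single queryNorm being computed once for the whole BooleanQuery and then applied to every clause. The sketch below applies DefaultSimilarity's queryNorm = 1 / sqrt(sumOfSquaredWeights) to the TermQuery's (idf * boost)^2 plus a contribution of 1.0 from the FunctionQuery. That 1.0 (its boost squared) is an inference from the numbers, not something taken from the FunctionQuery source. The class and method names are hypothetical.

```java
// Hypothetical recomputation of the second explanation, assuming ONE queryNorm
// derived from the summed squared weights of all clauses.
final class SharedQueryNorm {
    private SharedQueryNorm() {}

    static final double IDF = Math.log(6 / 2.0) + 1.0; // docFreq = 1, numDocs = 6
    static final double FIELD_NORM = 0.4375;            // copied from explain()

    // DefaultSimilarity.queryNorm formula
    static double queryNorm(double sumOfSquaredWeights) {
        return 1.0 / Math.sqrt(sumOfSquaredWeights);
    }

    static double sharedNorm() {
        double termPart = IDF * IDF;  // (idf * boost)^2 with boost = 1
        double functionPart = 1.0;    // assumed: FunctionQuery contributes boost^2
        return queryNorm(termPart + functionPart);
    }

    static double total() {
        double norm = sharedNorm();
        double function = 0.6 * 1.0 * norm;                           // value * boost * norm
        double term = (IDF * 1.0 * norm) * (1.0 * IDF * FIELD_NORM);  // queryWeight * fieldWeight
        return function + term;
    }

    public static void main(String[] args) {
        System.out.println("shared queryNorm = " + sharedNorm());
        System.out.println("total            = " + total());
    }
}
```

Under this assumption both clauses receive the same 0.43016526, and the total comes out at 1.0869528, matching the explanation. That fits the observation that the FunctionQuery's own getSimilarity() override does not seem to be consulted for the norm.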

I hope I interpreted your explanations correctly, and this is what you
intended me to try.


So, what I *really* want is something like this (modulo normalization;
I might want to boost both clauses to 0.5, but I'm not worrying about
that right now):

1.42885367 = sum of:
  0.6 = FunctionQuery(org.apache.solr.search.function.FloatFieldSource:float(score)), product of:
    0.6 = float(score)=0.6
    1.0 = boost
    1.0 = queryNorm
  0.82885367 = weight(content:desmond in 3), product of:
    0.90275013 = queryWeight(content:desmond), product of:
      2.0986123 = idf(docFreq=1)
      0.43016526 = queryNorm
    0.9181429 = fieldWeight(content:desmond in 3), product of:
      1.0 = tf(termFreq(content:desmond)=1)
      2.0986123 = idf(docFreq=1)
      0.4375 = fieldNorm(field=content, doc=3)
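A sketch of the target score above: the FunctionQuery value added raw (norm 1.0, so its coefficient plays the role of the boost) plus the normalized TermQuery clause. The target reuses the shared norm 0.43016526 for the TermQuery. If the norm were instead computed from the TermQuery's own terms, as the remark after the first explanation says it should be, DefaultSimilarity would give 1 / idf for this single-term query, making queryWeight 1.0 and the clause score equal to fieldWeight. Both variants are computed below; the names are hypothetical and fieldNorm is copied from the explanation.

```java
// Hypothetical sketch of the desired fusion: raw function value + normalized term score.
final class DesiredFusion {
    private DesiredFusion() {}

    static final double IDF = Math.log(6 / 2.0) + 1.0;          // docFreq = 1, numDocs = 6
    static final double FIELD_NORM = 0.4375;                     // copied from explain()
    static final double FIELD_WEIGHT = 1.0 * IDF * FIELD_NORM;   // tf * idf * fieldNorm

    // TermQuery clause score for a given queryNorm (boost = 1).
    static double termClause(double queryNorm) {
        return (IDF * queryNorm) * FIELD_WEIGHT;
    }

    // a * rawFunctionValue + b * termClause(queryNorm); the function value is NOT normalized.
    static double fused(double a, double rawFunctionValue, double b, double queryNorm) {
        return a * rawFunctionValue + b * termClause(queryNorm);
    }

    public static void main(String[] args) {
        double sharedNorm = 0.43016526;   // value shown in the explanations above
        double termOnlyNorm = 1.0 / IDF;  // norm from the TermQuery by itself
        System.out.println("target above:   " + fused(1.0, 0.6, 1.0, sharedNorm));
        System.out.println("term-only norm: " + fused(1.0, 0.6, 1.0, termOnlyNorm));
        System.out.println("0.5 / 0.5 mix:  " + fused(0.5, 0.6, 0.5, sharedNorm));
    }
}
```

With the term-only norm the target would come to about 1.518 rather than 1.429, so it may be worth deciding which of the two norms is actually wanted before changing the scoring code.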

> Hmmm ... that really doesn't sound right. Are you sure you don't mean you
> changed the default similarity, or changed the similarity on the searcher?

Please see the code above. I have not delved into the depths of Lucene
yet, but it seems that Lucene uses only one similarity instance for
scoring all clauses in the boolean query, and doesn't honour the
similarity instances provided by the individual clauses.

Or I'm wrong somewhere ;)

I've also wondered whether I might get by with not normalizing
the query, or with using a queryNorm of 1.0 everywhere. But then the
magnitudes of the Lucene similarity score and my "static score" would
not be comparable, of course.


I hope someone with more insight into Lucene scoring can shed light on
this.

Regards, Sebastian

-- 
Sebastian Kirsch <sk...@sebastian-kirsch.org> [http://www.sebastian-kirsch.org/]

Re: Scoring with FunctionQueries?

Posted by Chris Hostetter <ho...@fucit.org>.

: but I want only the score of the full-text query to be multiplied by
: the query norm. The function queries should be added to the final
: query as they are (the factors a, b, ... could be set using a query
: boost.)
:
: How do I achieve that? I'm rather lost in the forest of Scorer,
: Similarity and Weight right now. Which is the right place to add such
: a modification, so that it doesn't mess up the rest of the scoring?

Two approaches would work, depending on your goal:

1) change the default similarity (using Similarity.setDefault(Similarity))
used by all queries to a version with queryNorm returning a constant, and
then in the few queries where you want the more traditional queryNorm,
override the getSimilarity method inline...

   Query q = new TermQuery(new Term("foo","bar")) {
      public Similarity getSimilarity(Searcher s) {
        return new DefaultSimilarity();
      }
   };

2) reverse step one ... override getSimilarity() just in the classes you
want queryNorm to be constant in, and leave the default alone.

: I already tried extending BooleanQuery so that getSimilarity returns a
: Similarity which overrides just queryNorm, to return 1.0. But this
: queryNorm is then used for both the FunctionQuery and the full-text
: query.

Hmmm ... that really doesn't sound right. Are you sure you don't mean you
changed the default similarity, or changed the similarity on the searcher?



-Hoss