You are viewing a plain text version of this content. The canonical link for it is here.

Posted to java-user@lucene.apache.org by Chun Wei Ho <cw...@gmail.com> on 2006/07/11 04:32:52 UTC

Reducing the boost for a particular Term

I have a index from which I have a number of documents from authors,
but would like to drop the relevance/score for documents from one
particular author using the query. That is for documents returned by
querying: (content:"miracle cure"), I would like to reduce the
relevancy of authorid:3024

However I tried several combinations and they didnt work the way I
expected, e.g.

+(content:"miracle cure") (authorid:3024^0.5)
=> increases authorid:3024 score instead

+(content:"miracle cure") +(authorid:3024^0.5 ((-authorid:3024))^10.0)
=> The boosted optional prohibited term seems to have no effect

+(content:"miracle cure") (-authorid:3024^0.5)
=> Again, the boosted optional prohibited term seems to have no effect

Also this approach appears to add a small modifier to the overall
relevancy returned by +(content:"miracle cure"). Is there a better way
to do a multiplicative modification - so that for example the score as
returned by +(content:"miracle cure") is reduced to 0.6 of its value
if the authorid is 3024.

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org

Re: Reducing the boost for a particular Term

Posted by Chris Hostetter <ho...@fucit.org>.

: particular author using the query. That is for documents returned by
: querying: (content:"miracle cure"), I would like to reduce the
: relevancy of authorid:3024

: +(content:"miracle cure") +(authorid:3024^0.5 ((-authorid:3024))^10.0)
: => The boosted optional prohibited term seems to have no effect
:
: +(content:"miracle cure") (-authorid:3024^0.5)
: => Again, the boosted optional prohibited term seems to have no effect

that's because you can't have a boolean query containing nothing but
negative clauses ... to "penalize" documents that match a query, you must
reward documents which match the inverse of that query.   From a Set
perspective the inverse of a query like "authorid:3024" is...

      +(+matchalldocs:true -authorid:3024)

...but for your purposes, you could just as easily use...

   +content:"miracle cure" +(+content:"miracle cure" -authorid:3024)

: Also this approach appears to add a small modifier to the overall
: relevancy returned by +(content:"miracle cure"). Is there a better way
: to do a multiplicative modification - so that for example the score as
: returned by +(content:"miracle cure") is reduced to 0.6 of its value
: if the authorid is 3024.

First off: don't place to much stock in the specific numeric score value
returned by a search, Scores aren't absolute, so trying to do absolute
things to them isn't really meaningful.

Second: take a look at the Explanation class (and the Searcher.explain
method) ... it will help you understand why you get the scores you are
getting, and how methods in the Similarity class (which you can change)
affect things.  With the query structure i outlined above, you can
probably get pretty close to what you want just by tewaking the boosts.

Third: if you really want to go all out, you could write a HitCollector to
penalize documents by performing some arbitrary match on the
scores of documents matching some specific criteria (like the authorid)



-Hoss


---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org