Posted to java-user@lucene.apache.org by Eric Jain <Er...@isb-sib.ch> on 2006/01/09 10:34:36 UTC
Scoring by number of terms in field
Lucene seems to prefer matches in shorter documents. Is it possible to
influence the scoring mechanism to have matches in shorter fields score
higher instead?
For example, a query for "europe" should rank:
1. title:"Europe"
2. title:"History of Europe"
3. title:"Travel in Europe, Middle East and Africa"
4. subtitle:"Fairy Tales from Europe"
---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org
Re: Scoring by number of terms in field
Posted by Eric Jain <Er...@isb-sib.ch>.
Paul Elschot wrote:
> In case you prefer to use the maximum score over the clauses you
> can use the DisjunctionMaxQuery from the development version.
Yes, that may help! I'll need to have a look...
Re: Scoring by number of terms in field
Posted by Paul Elschot <pa...@xs4all.nl>.
On Tuesday 10 January 2006 07:32, Eric Jain wrote:
> Paul Elschot wrote:
> >>For example, a query for "europe" should rank:
> >>
> >>1. title:"Europe"
> >>2. title:"History of Europe"
> >>3. title:"Travel in Europe, Middle East and Africa"
> >>4. subtitle:"Fairy Tales from Europe"
> >
> > Perhaps with this query (assuming the default implicit OR):
> >
> > title:europe subtitle:europe^0.5 body:europe
>
> This will ensure that match 4 appears at the end, but as far as I can see
> this won't help with getting matches 1-3 ordered correctly? Note that match
> 1 for example may have a "description" field that contains a lot of terms, but
> no mention of the query term.
In general, the length of a field that does not match does not influence
the score, but it's still not easy to predict the order in this case.
With an OR over multiple fields:
- when a shorter field matches, it usually dominates the score for a
  document.
- when other fields match, they contribute to the score via the sum
  of the clause scores and via the coordination factor in the
  DefaultSimilarity.
In case you prefer to use the maximum score over the clauses you
can use the DisjunctionMaxQuery from the development version.
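To make the difference between the two combination rules concrete, here is a small self-contained Java sketch. It does not use Lucene; it only models the arithmetic, assuming that the coordination factor is the fraction of clauses that matched and that DisjunctionMaxQuery scores a document as the maximum clause score plus a tie-breaker times the remaining clause scores. The class name, the method names and the per-clause scores are made up for illustration, and real scores include further factors such as the query norm.

```java
/** Simplified model of two ways to combine per-field clause scores. */
final class ClauseCombinationSketch {

    /** Sum of the matching clause scores, scaled by matching / total clauses. */
    static float sumWithCoord(float[] clauseScores) {
        if (clauseScores.length == 0) {
            return 0f;
        }
        float sum = 0f;
        int matching = 0;
        for (float score : clauseScores) {
            if (score > 0f) {        // a score of 0 stands for "clause did not match"
                sum += score;
                matching++;
            }
        }
        return sum * matching / clauseScores.length;
    }

    /** Maximum clause score plus tieBreaker times the sum of the other clause scores. */
    static float disjunctionMax(float[] clauseScores, float tieBreaker) {
        float max = 0f;
        float sum = 0f;
        for (float score : clauseScores) {
            sum += score;
            max = Math.max(max, score);
        }
        return max + (sum - max) * tieBreaker;
    }

    public static void main(String[] args) {
        // Hypothetical clause scores in the order title, subtitle, body.
        float[] shortTitleOnly = {1.0f, 0f, 0f};      // e.g. title:"Europe"
        float[] titleAndBody   = {0.577f, 0f, 0.3f};  // e.g. title:"History of Europe" plus a body match

        System.out.println("sum+coord, short title only: " + sumWithCoord(shortTitleOnly));
        System.out.println("sum+coord, title and body:   " + sumWithCoord(titleAndBody));
        System.out.println("max,       short title only: " + disjunctionMax(shortTitleOnly, 0f));
        System.out.println("max,       title and body:   " + disjunctionMax(titleAndBody, 0f));
    }
}
```

With these made-up numbers, the document that matches only in its one-word title ranks below the two-field match under the summed score, and above it under the maximum.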
Regards,
Paul Elschot
Re: Scoring by number of terms in field
Posted by Eric Jain <Er...@isb-sib.ch>.
Paul Elschot wrote:
>>For example, a query for "europe" should rank:
>>
>>1. title:"Europe"
>>2. title:"History of Europe"
>>3. title:"Travel in Europe, Middle East and Africa"
>>4. subtitle:"Fairy Tales from Europe"
>
> Perhaps with this query (assuming the default implicit OR):
>
> title:europe subtitle:europe^0.5 body:europe
This will ensure that match 4 appears at the end, but as far as I can see
this won't help with getting matches 1-3 ordered correctly? Note that match
1 for example may have a "description" field that contains a lot of terms, but
no mention of the query term.
Re: Scoring by number of terms in field
Posted by Paul Elschot <pa...@xs4all.nl>.
On Monday 09 January 2006 10:34, Eric Jain wrote:
> Lucene seems to prefer matches in shorter documents. Is it possible to
> influence the scoring mechanism to have matches in shorter fields score
> higher instead?
A query always matches in at least one field of a document, so the length
that counts is that of the matching field.
>
> For example, a query for "europe" should rank:
>
> 1. title:"Europe"
> 2. title:"History of Europe"
> 3. title:"Travel in Europe, Middle East and Africa"
> 4. subtitle:"Fairy Tales from Europe"
Perhaps with this query (assuming the default implicit OR):
title:europe subtitle:europe^0.5 body:europe
Regards,
Paul Elschot
Re: Scoring by number of terms in field
Posted by Erik Hatcher <er...@ehatchersolutions.com>.
Sorry for the quick reply, but yes you can accomplish this by
tweaking a custom Similarity implementation (or DefaultSimilarity
subclass). Check out IndexSearcher.explain on a query and a document
and then tinker.
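As a rough illustration of what there is to tinker with, the sketch below models the length normalization factor in plain Java, without depending on Lucene. It assumes the default behaviour of DefaultSimilarity.lengthNorm, which returns 1/sqrt(numTerms) for a field. The steeper 1/numTerms variant, the class and method names, and the naive tokenization are hypothetical. In a real setup the override would go into a DefaultSimilarity subclass that is set on the IndexWriter before indexing, since norms are computed at index time.

```java
/** Simplified model of per-field length normalization. */
final class LengthNormSketch {

    /** Default-style norm: 1 / sqrt(number of terms in the field). */
    static float defaultLengthNorm(int numTerms) {
        return (float) (1.0 / Math.sqrt(numTerms));
    }

    /** Hypothetical steeper norm that penalizes longer fields more strongly. */
    static float steepLengthNorm(int numTerms) {
        return 1.0f / numTerms;
    }

    /** Naive term count: split on non-word characters (an assumption, not a Lucene analyzer). */
    static int countTerms(String fieldValue) {
        String trimmed = fieldValue.trim();
        return trimmed.isEmpty() ? 0 : trimmed.split("\\W+").length;
    }

    public static void main(String[] args) {
        String[] titles = {
            "Europe",
            "History of Europe",
            "Travel in Europe, Middle East and Africa"
        };
        for (String title : titles) {
            int numTerms = countTerms(title);
            System.out.println(title
                    + " | terms=" + numTerms
                    + " | default=" + defaultLengthNorm(numTerms)
                    + " | steep=" + steepLengthNorm(numTerms));
        }
    }
}
```

Both variants already order the three titles as requested; the steeper one only widens the gap. Lucene stores norms with reduced precision, so small differences in field length may not change the ranking, which is something IndexSearcher.explain should make visible.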
Erik
On Jan 9, 2006, at 4:34 AM, Eric Jain wrote:
> Lucene seems to prefer matches in shorter documents. Is it possible
> to influence the scoring mechanism to have matches in shorter
> fields score higher instead?
>
> For example, a query for "europe" should rank:
>
> 1. title:"Europe"
> 2. title:"History of Europe"
> 3. title:"Travel in Europe, Middle East and Africa"
> 4. subtitle:"Fairy Tales from Europe"