Posted to solr-user@lucene.apache.org by Eric Kilby <ki...@stylefeeder.com> on 2009/01/09 18:56:30 UTC

Boosting based on number of values in multiValued field?

hi,

I'm looking through the list archives and the documentation on boost
queries, and I don't see anything that matches this case.  

I have an index of documents, some of which are very similar but not
identical.  Therefore the scores are very close and the ordering is affected
by somewhat arbitrary factors.  When I do a query the similar documents come
up close together, so that's a good start.  

Each document has a multivalued field, with 1-n values in it (as many as
20).  The actual values don't matter to me, but the number of values is a
rough proxy for the quality of a record.  I'd like to apply a very small
boost based on the number of values in that field, so that among a set of
similar documents the ones with more values will score higher and sort ahead
of those with fewer values.

Is there currently a function or set of functions that can be applied to
this use case?  If not, is there a place where I could build and contribute
something?  In that case I'd appreciate a pointer on where to start.

thanks,
Eric


Re: Boosting based on number of values in multiValued field?

Posted by Erik Hatcher <er...@ehatchersolutions.com>.
On Jan 9, 2009, at 12:56 PM, Eric Kilby wrote:
> Each document has a multivalued field, with 1-n values in it (as many as
> 20).  The actual values don't matter to me, but the number of values is a
> rough proxy for the quality of a record.  I'd like to apply a very small
> boost based on the number of values in that field, so that among a set of
> similar documents the ones with more values will score higher and sort ahead
> of those with fewer values.

The simplest technique would be to have your indexer add another field  
with the count (or some boost factor based on it), and then leverage  
that.  Perhaps even use the document boost capability at indexing time.
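
A minimal sketch of the count-field idea, under several assumptions: the
multivalued field is called "tags", the new count field is called
"tag_count" and is indexed as a numeric type, and queries go through the
dismax parser, whose bf parameter adds a function's value to the score.
The field names, the helper functions, and the 0.01 weight are all
hypothetical and only illustrate the shape of the approach.

```python
# Sketch only: "tags", "tag_count", and the 0.01 weight are assumed names
# and values, not anything taken from the original thread.

def add_value_count(doc, source_field="tags", count_field="tag_count"):
    """Return a copy of doc with the number of values in source_field
    stored in count_field, ready to be sent to the indexer."""
    values = doc.get(source_field)
    if values is None:
        values = []            # missing field counts as zero values
    elif not isinstance(values, (list, tuple)):
        values = [values]      # a single bare value counts as one
    enriched = dict(doc)       # leave the caller's document untouched
    enriched[count_field] = len(values)
    return enriched


def boosted_params(user_query, count_field="tag_count", weight=0.01):
    """Build dismax request parameters with a small additive boost.

    linear(x,m,c) is a Solr function query that evaluates to m*x + c,
    so each extra value adds `weight` to the score.
    """
    return {
        "defType": "dismax",
        "q": user_query,
        "bf": f"linear({count_field},{weight},0)",
    }
```

The weight would likely need tuning against real scores so that the boost
only breaks ties among near-identical documents rather than reordering
clearly different ones.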

	Erik