Posted to user@lucenenet.apache.org by Matt Honeycutt <mb...@gmail.com> on 2009/10/29 18:14:56 UTC

Boosting Multi-Value Fields

I am cross-posting this question here on behalf of a co-worker. The
original question is on StackOverflow at
http://stackoverflow.com/questions/1645197/boosting-multi-value-fields, but
I thought there was a better chance that someone on this list has come across
something similar than someone in the more general StackOverflow crowd. Any
suggestions would be greatly appreciated.

---------------------

I have a set of documents containing scored items that I'd like to index.
Our data structure looks like:

Document
  ID
  Text
  List<RelatedScore>

RelatedScore
  ID
  Score
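For reference, a minimal C# rendering of that structure might look like the following. The post does not give the actual member types, so string IDs and a float score in [0, 1] are assumptions, and the class name `IndexedDocument` is hypothetical (chosen to avoid clashing with Lucene's own `Document` class).

```csharp
using System.Collections.Generic;

// Hypothetical model of the data described above; member types are assumptions.
public class RelatedScore
{
    public string ID { get; set; }
    public float Score { get; set; }  // assumed to lie in [0, 1]
}

public class IndexedDocument
{
    public string ID { get; set; }
    public string Text { get; set; }
    public List<RelatedScore> RelatedScores { get; set; } = new List<RelatedScore>();
}
```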

My first thought was to add each RelatedScore's ID as one value of a
multi-value field, using the field's Boost property to weight that particular
item's score at search time.

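using Lucene.Net.Documents;

// Add one "RelatedScore" value per related item, each with its own boost.
foreach (var relatedScore in document.RelatedScores) {
  // Field values must be strings; ToString() is harmless if ID already is one.
  var field = new Field("RelatedScore", relatedScore.ID.ToString(),
                        Field.Store.YES, Field.Index.UN_TOKENIZED);
  field.SetBoost((float)relatedScore.Score);  // SetBoost takes a float
  luceneDoc.Add(field);
}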

However, it appears that the norm is calculated once for the entire
multi-value field: all the "RelatedScore" values for a document end up with
the same score.
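This matches how Lucene 2.x computes norms: a document stores a single norm per field name, and when it contains several fields with the same name, their boosts are multiplied together before being combined with the length normalization (and then encoded into a single byte). The sketch below models that arithmetic for `DefaultSimilarity`, whose length norm is 1/sqrt(number of terms). It is an illustration, not Lucene code: `NormModel` and `FieldNorm` are hypothetical names, and the document boost and the lossy byte encoding are ignored.

```csharp
using System;
using System.Linq;

public static class NormModel
{
    // Illustrative model of a DefaultSimilarity-style norm for one field name:
    // the boosts of all same-named field instances are multiplied together,
    // then scaled by lengthNorm = 1 / sqrt(numTerms). Each un-tokenized value
    // contributes exactly one term. Document boost and byte encoding ignored.
    public static float FieldNorm(float[] valueBoosts)
    {
        if (valueBoosts.Length == 0)
            throw new ArgumentException("At least one field value is required.");
        float boost = valueBoosts.Aggregate(1.0f, (acc, b) => acc * b);
        float lengthNorm = (float)(1.0 / Math.Sqrt(valueBoosts.Length));
        return boost * lengthNorm;
    }
}
```

For three IDs boosted 1.0, 0.5 and 0.3, the model gives one norm of about 0.087 (0.15 × 1/sqrt(3)), and whichever of the three IDs matches a query, that same value is used, so the per-value boosts cannot be told apart at search time.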

Is there a mechanism in Lucene that allows for this? I would rather not
create another index just to account for this; it feels like there should be
a way to do it with a single index. If there isn't, here are a few ideas we
have for working around it:

   1. Insert the multi-value field items in descending order of score, then
   somehow add position-aware analysis that assigns a higher boost/score to
   the first items in the field.
   2. Add a high-scoring item to the field multiple times. So a RelatedScore
   with Score==1 might be added three times, while a RelatedScore with
   Score==.3 would only be added once.
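Both workarounds come down to deciding which values to write into the field, and in what order, before indexing. The helper below is a hypothetical sketch that assumes scores lie in [0, 1]. `OrderByScore` covers idea 1's descending insertion order (the position-aware scoring itself would still need custom analysis or query code, which is not shown), and `RepeatByScore` covers idea 2 by repeating each ID ceil(score × maxRepeats) times, so with `maxRepeats = 3` a score of 1 gives three copies and a score of 0.3 gives one, as in the example above. Each returned string would then be added as its own "RelatedScore" field value.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class RelatedScoreWorkarounds
{
    // Idea 1: IDs in descending score order, so earlier positions hold
    // higher-scored items. (Turning position into score is not done here.)
    public static List<string> OrderByScore(IEnumerable<(string Id, float Score)> items)
    {
        return items.OrderByDescending(i => i.Score).Select(i => i.Id).ToList();
    }

    // Idea 2: repeat each ID in proportion to its score (assumed in [0, 1]),
    // always at least once, so higher scores yield a higher term frequency.
    public static List<string> RepeatByScore(
        IEnumerable<(string Id, float Score)> items, int maxRepeats)
    {
        var values = new List<string>();
        foreach (var (id, score) in items)
        {
            int copies = Math.Max(1, (int)Math.Ceiling(score * maxRepeats));
            values.AddRange(Enumerable.Repeat(id, copies));
        }
        return values;
    }
}
```

Note that under `DefaultSimilarity` the term-frequency factor is sqrt(freq), so three copies raise that factor only from 1 to about 1.73, and every extra copy also adds a term to the field, lowering its length norm for all values. That compression is part of the fidelity loss anticipated below.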

Both of these will result in a loss of search fidelity on these fields, yes,
but they may be good enough. Any thoughts on this?