You are viewing a plain text version of this content. The canonical link for it is here.
Posted to general@lucene.apache.org by cristh <ha...@yahoo.com> on 2011/04/16 17:39:52 UTC

Choosing boosting in Lucene

Hello,

I have a few questions about boosting in Lucene. I am running a research
project where I have, for each document, 4 fields: f1, f2, f3, f4. I also
have a set of queries for my corpus, and I know the relevant documents for
each of these queries. What I want to study is how boosting affects the
search results of these queries. Basically, I want to show that by boosting
some of these fields the results are better (I hope).
I have, though, a few essential questions that I cannot figure out and I
would really appreciate some help...

1. Is there any difference between boosting the fields at index time and
boosting the terms in the queries which appear in these fields at search
time? 
Again, I know beforehand the set of queries and also the terms in these
queries which appear in the documents in the corpus in each of the fields.

2. In what range are boosting values usually chosen? I.e., should I choose
boosts in a 0.5-2 range (say 0.5, 1, 1.5, 2), like I have seen in soem
examples, or is it the same if I choose boosts in a range like 50-200
(respectively 50, 100, 150, 200)? 

3. How sensitive is boosting in Lucene? For example, if I know approximately
the importance of each field, and I want to assign boosting values
accordingly, what would be good differences between the values of the
boosting factor for the different fields? More precisely, if the importance
order is f1<f2<f3<f4, will it matter if I choose the boosts as (1,2,3,4), or
(1, 5, 10, 15)?

4. Is there any method besides trial and error for finding the boosts for
each field that work the best for a particular corpus? 

Thank you very much,
Cristina

--
View this message in context: http://lucene.472066.n3.nabble.com/Choosing-boosting-in-Lucene-tp2828117p2828117.html
Sent from the Lucene - General mailing list archive at Nabble.com.