Posted to java-user@lucene.apache.org by Kasun Perera <ka...@opensource.lk> on 2012/06/19 11:56:49 UTC

Different Weights to Lucene fields with Okapi Similarity

Based on this link http://www2002.org/CDROM/refereed/643/node6.html , I'm
calculating the Okapi similarity between a query document and another
document using Lucene, with the formulas given below.

I have indexed the documents using 3 fields. I want to give higher weight
to field 2 and field 3. I can't use Lucene's boost function since I'm using
my own similarity function. Can anyone suggest a method for giving
different weights to fields with this Okapi similarity function?

This is the Okapi similarity scheme that I have used:

sim(query, doc) = sum(t in terms(query), freq(t, query) * w(t, doc))

where (from the second link, slightly modified as I think the formula in
the link is incorrect)

w(t, doc) = idf(t) * (k+1)*freq(t, doc) / (k*(1-b + b*ls(doc)) + freq(t, doc))

ls(doc) = len(doc)/avgdoclen

and idf(t) = log((totalNumIndexedDocs - docFreq + 0.5)/(docFreq + 0.5)),
where docFreq is the number of indexed documents containing term t and
freq(t, doc) is the frequency of term t in document doc.

Choosing b=0.75 and k = 1.2 you get

w(t, doc) = idf(t) * 2.2*freq(t, doc) / (1.2*(0.25+0.75*ls(doc)) + freq(t, doc))
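
The formulas above can be sketched in plain Java as follows. This is only a
minimal illustration and does not use Lucene's API: the class OkapiSketch,
its method names and the map-based inputs are hypothetical. It assumes
k = 1.2 and b = 0.75, so that the length normalisation is
0.25 + 0.75*ls(doc) as in the last formula.

```java
import java.util.Map;

/** Hypothetical helper for illustration; not part of Lucene. */
final class OkapiSketch {
    static final double K = 1.2;   // assumed value of k
    static final double B = 0.75;  // assumed value of b

    /** idf(t) = log((totalNumIndexedDocs - docFreq + 0.5) / (docFreq + 0.5)) */
    static double idf(long totalNumIndexedDocs, long docFreq) {
        return Math.log((totalNumIndexedDocs - docFreq + 0.5) / (docFreq + 0.5));
    }

    /** w(t, doc) = idf(t) * (k+1)*freq / (k*(1-b + b*ls(doc)) + freq) */
    static double weight(double idf, double freqInDoc, double docLen, double avgDocLen) {
        double ls = docLen / avgDocLen; // ls(doc) = len(doc)/avgdoclen
        return idf * (K + 1) * freqInDoc / (K * (1 - B + B * ls) + freqInDoc);
    }

    /** sim(query, doc) = sum over query terms t of freq(t, query) * w(t, doc) */
    static double sim(Map<String, Integer> queryFreqs,
                      Map<String, Integer> docTermFreqs,
                      Map<String, Long> docFreqs,
                      long totalNumIndexedDocs,
                      double docLen,
                      double avgDocLen) {
        double score = 0.0;
        for (Map.Entry<String, Integer> e : queryFreqs.entrySet()) {
            int freqInDoc = docTermFreqs.getOrDefault(e.getKey(), 0);
            if (freqInDoc == 0) {
                continue; // a term absent from doc contributes w(t, doc) = 0
            }
            long df = docFreqs.getOrDefault(e.getKey(), 0L);
            double w = weight(idf(totalNumIndexedDocs, df), freqInDoc, docLen, avgDocLen);
            score += e.getValue() * w;
        }
        return score;
    }
}
```

For a document of average length, ls(doc) = 1 and the weight reduces to
idf(t) * 2.2*freq(t, doc) / (1.2 + freq(t, doc)).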

-- 
Regards

Kasun Perera

Different Weights to Lucene fields with Okapi Similarity

Posted by Kasun Perera <ka...@opensource.lk>.
Resending again, since my question didn't get much attention

---------- Forwarded message ----------
From: Kasun Perera <ka...@opensource.lk>
Date: Tue, Jun 19, 2012 at 3:26 PM
Subject: Different Weights to Lucene fields with Okapi Similarity
To: java-user@lucene.apache.org