Posted to java-user@lucene.apache.org by Oliver Xu <ol...@gmail.com> on 2013/06/12 15:00:01 UTC

A Problem in Customizing DefaultSimilarity

Dear all,

 

I built my own scoring class by extending DefaultSimilarity. I overrode
three of its methods:

1. public float lengthNorm(FieldInvertState state)

2. public float tf(float freq)

3. public float idf(long docFreq, long numDocs)

 

However, using print statements (added to show which method is called, and
when), I found that only tf() and idf() were called during a search. The
method lengthNorm(), which is the one I really wanted to work on, was never
called.

 

I rolled back to Lucene350 and checked again. DefaultSimilarity under
Lucene 350 uses a computeNorm() method instead of lengthNorm(). Again, the
overridden computeNorm() was never called.

 

I used explain() to check the components of each document's score. Besides
the idf and tf scores, I did find a fieldNorm score, which has something to
do with the document length.

 

My questions are:

1.       Why are the overridden lengthNorm() (under Lucene410) or
computeNorm() (under Lucene350) methods not called during a search?

2.       How and where is fieldNorm calculated?

 

Thank you very much!

 

Oliver

 

Oliver Xu(徐永)
Aigine InfoTech Co.(语擎科技)
W: www.aigine.com
T: +86-189189 02886
E: oliver.xu@aigine.com
MSN: oliver_xuyong@msn.com
Weibo: 语擎-集体智慧编程

 


Re: Reply: [SPAM] Re: A Problem in Customizing DefaultSimilarity

Posted by Varun Thacker <va...@gmail.com>.
Hi Oliver,
I would like to add a couple of things here.

1. For your lengthNorm to be called, you will also have to make sure that
you set your custom similarity class at index time by calling
IndexWriterConfig.setSimilarity(new CustomSimilarity());


2. From what I can see in the source, computeNorm basically computes the
field norm and encodes it so that it can be stored in the index.
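
To illustrate point 2: because the norm is squeezed into a single byte
before it is written to the index, the fieldNorm reported at search time is a
rounded version of what the similarity computed. The sketch below is a
stand-alone model of that round trip, patterned on the one-byte float
encoding that Lucene 3.x/4.x uses for norms (SmallFloat.floatToByte315). The
class NormEncodingSketch and its method names are made up for this example
and the bit layout is an assumption to be checked against the Lucene source
for your version. Under that assumption, a norm of 0.577 (three tokens) is
stored as 0.5, and 0.316 (ten tokens) as 0.3125.

```java
// Hypothetical stand-alone sketch; this is not Lucene code.
class NormEncodingSketch {

    // Lossy encoding: keep only the top bits of the IEEE-754 representation.
    static byte encode(float f) {
        int bits = Float.floatToRawIntBits(f);
        int small = bits >> 21;        // drop the 21 low mantissa bits
        int zero = (63 - 15) << 3;     // offset of the smallest encodable value
        if (small <= zero) {
            // zero or negative input maps to 0, tiny positive input to 1
            return (bits <= 0) ? (byte) 0 : (byte) 1;
        }
        if (small >= zero + 0x100) {
            return -1;                 // too large: clamp to the biggest value
        }
        return (byte) (small - zero);
    }

    // Inverse of encode(): rebuild a float from the stored byte.
    static float decode(byte b) {
        if (b == 0) {
            return 0.0f;
        }
        int bits = (b & 0xff) << 21;
        bits += (63 - 15) << 24;
        return Float.intBitsToFloat(bits);
    }

    public static void main(String[] args) {
        float[] norms = {1.0f, 0.577f, 0.5f, 0.316f, 0.1f};
        for (float exact : norms) {
            float stored = decode(encode(exact));
            System.out.println("computed=" + exact + " stored=" + stored);
        }
    }
}
```

This is also why two fields of slightly different length can show the same
fieldNorm in the explain() output.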




-- 


Regards,
Varun Thacker
http://www.vthacker.in/

Reply: [SPAM] Re: A Problem in Customizing DefaultSimilarity

Posted by "Oliver Xu (Aigine Co)" <ol...@aigine.com>.
Lovely. Thank you very much! Oliver



Re: A Problem in Customizing DefaultSimilarity

Posted by Koji Sekiguchi <ko...@r.email.ne.jp>.
Hi Oliver,

> My questions are:
> 
> 1.       Why are the overrided lengthNorm() (under Lucene410) or
> computeNorm() (under Lucene350) methods not called during a searching
> process?

Whether or not you override the method, Lucene calls it at index time only.
The length norm can be calculated from the number of tokens in the field, so
there is no need to call it at search time.
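
The timing described above can be pictured with a toy model. The sketch below
is not Lucene code: ToySimilarity, ToyIndex and all of their methods are
hypothetical stand-ins, and the score is simplified to tf * idf * norm. It
only shows the pattern: lengthNorm runs once when a document is added and its
result is stored, while a search calls tf and idf and merely reads the stored
norm back.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-ins for illustration only; these are not Lucene classes.
class ToySimilarity {
    int lengthNormCalls = 0;
    int tfCalls = 0;
    int idfCalls = 0;

    float lengthNorm(int numTerms) {
        lengthNormCalls++;
        return (float) (1.0 / Math.sqrt(numTerms));
    }

    float tf(float freq) {
        tfCalls++;
        return (float) Math.sqrt(freq);
    }

    float idf(long docFreq, long numDocs) {
        idfCalls++;
        return (float) (Math.log(numDocs / (double) (docFreq + 1)) + 1.0);
    }
}

class ToyIndex {
    private final ToySimilarity sim;
    private final List<String[]> docs = new ArrayList<>();
    private final List<Float> norms = new ArrayList<>();

    ToyIndex(ToySimilarity sim) {
        this.sim = sim;
    }

    // "Index time": the norm is computed once and stored with the document.
    void addDocument(String text) {
        String[] tokens = text.toLowerCase().split("\\s+");
        docs.add(tokens);
        norms.add(sim.lengthNorm(tokens.length));
    }

    // "Search time": tf and idf are computed, the norm is only read back.
    float[] search(String term) {
        int[] freqs = new int[docs.size()];
        int docFreq = 0;
        for (int i = 0; i < docs.size(); i++) {
            for (String token : docs.get(i)) {
                if (token.equals(term)) {
                    freqs[i]++;
                }
            }
            if (freqs[i] > 0) {
                docFreq++;
            }
        }
        float idf = sim.idf(docFreq, docs.size());
        float[] scores = new float[docs.size()];
        for (int i = 0; i < scores.length; i++) {
            if (freqs[i] > 0) {
                scores[i] = sim.tf(freqs[i]) * idf * norms.get(i);
            }
        }
        return scores;
    }
}

class ToyIndexDemo {
    public static void main(String[] args) {
        ToySimilarity sim = new ToySimilarity();
        ToyIndex index = new ToyIndex(sim);
        index.addDocument("lucene scoring");
        index.addDocument("lucene norms are computed while indexing documents");
        index.search("lucene");
        index.search("norms");
        System.out.println("lengthNorm calls: " + sim.lengthNormCalls);
        System.out.println("tf calls: " + sim.tfCalls);
        System.out.println("idf calls: " + sim.idfCalls);
    }
}
```

In this model, changing lengthNorm has no effect on documents that were
already added. The same holds for a real index: the documents have to be
indexed again with the custom similarity before new norms show up in scores.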

> 2.       How and where is fieldNorm calculated?

I think fieldNorm = lengthNorm * boost of the field. Please see the Javadoc for more details:

http://lucene.apache.org/core/4_3_0/core/org/apache/lucene/search/similarities/TFIDFSimilarity.html
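
As a small worked example of that formula, the sketch below assumes the
default length norm of DefaultSimilarity, 1/sqrt(number of terms in the
field), and multiplies it by the field boost. FieldNormSketch and fieldNorm
are hypothetical names used only here, and the result is the value before it
is encoded for storage in the index.

```java
// Hypothetical helper for illustration; it mirrors the formula, not Lucene's API.
class FieldNormSketch {

    // boost: index-time boost of the field (1.0 when none is set)
    // numTerms: number of tokens indexed for the field
    static float fieldNorm(float boost, int numTerms) {
        float lengthNorm = (float) (1.0 / Math.sqrt(numTerms));
        return boost * lengthNorm;
    }

    public static void main(String[] args) {
        System.out.println(fieldNorm(1.0f, 4));    // prints 0.5
        System.out.println(fieldNorm(2.0f, 4));    // prints 1.0
        System.out.println(fieldNorm(1.0f, 100));  // prints 0.1
    }
}
```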

koji
-- 
http://soleami.com/blog/automatically-acquiring-synonym-knowledge-from-wikipedia.html

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org