Posted to java-user@lucene.apache.org by Michael Celona <mc...@criticalmention.com> on 2005/02/07 14:48:29 UTC

Similarity coord,lengthNorm

I have text fields of varying length that I am searching on.  I would like
relevancy to be dictated predominantly by the number of terms in my query
that match.  Right now I am seeing a high relevancy for a single word
matching in a small document even though not all the terms in my query
match.  Does anyone have an example of a custom Similarity subclass that
overrides the coord and lengthNorm methods?

 

Thanks..

Michael 


Re: Similarity coord,lengthNorm

Posted by Andrzej Bialecki <ab...@getopt.org>.
Erik Hatcher wrote:
> 
> On Feb 7, 2005, at 8:53 AM, Michael Celona wrote:
> 
>> Would fixing the lengthNorm to 1 fix this problem?
> 
> 
> Yes, it would eliminate the length of a field as a factor.
> 
> Your best bet is to set up a test harness where you can try out various 
> tweaks to Similarity, but setting the length normalization factor to 1.0 
> may be all you need to do, as the coord() takes care of the other factor 
> you're after.

Next week I'm releasing a new version of Luke, which includes a custom 
Similarity designer (using the Rhino JavaScript engine). It makes 
experimenting with Similarity super easy.

-- 
Best regards,
Andrzej Bialecki
  ___. ___ ___ ___ _ _   __________________________________
[__ || __|__/|__||\/|  Information Retrieval, Semantic Web
___|||__||  \|  ||  |  Embedded Unix, System Integration
http://www.sigram.com  Contact: info at sigram dot com


---------------------------------------------------------------------
To unsubscribe, e-mail: lucene-user-unsubscribe@jakarta.apache.org
For additional commands, e-mail: lucene-user-help@jakarta.apache.org


Re: Similarity coord,lengthNorm

Posted by Erik Hatcher <er...@ehatchersolutions.com>.
On Feb 7, 2005, at 8:53 AM, Michael Celona wrote:
> Would fixing the lengthNorm to 1 fix this problem?

Yes, it would eliminate the length of a field as a factor.

Your best bet is to set up a test harness where you can try out various 
tweaks to Similarity, but setting the length normalization factor to 
1.0 may be all you need to do, as the coord() takes care of the other 
factor you're after.

	Erik
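
A rough, self-contained sketch of the kind of test harness suggested above. 
It does not use Lucene itself: Sim, DEFAULT, FLAT and score() are 
hypothetical names that only imitate the coord(overlap, maxOverlap) and 
lengthNorm(fieldName, numTokens) hooks, and tf, idf, queryNorm and boosts 
are all assumed to be 1. It compares a 4-token field matching 1 of 3 query 
terms with a 400-token field matching all 3.

```java
/**
 * Toy model of the two scoring hooks discussed in this thread.
 * This is NOT Lucene code: Sim, DEFAULT, FLAT and score() are hypothetical
 * names, and tf, idf, queryNorm and boosts are all assumed to be 1.
 */
public class SimilaritySketch {

    /** Hypothetical stand-in for the two Similarity methods in question. */
    interface Sim {
        float coord(int overlap, int maxOverlap);
        float lengthNorm(String fieldName, int numTokens);
    }

    /** Imitates the default behaviour: overlap/maxOverlap and 1/sqrt(numTokens). */
    static final Sim DEFAULT = new Sim() {
        public float coord(int overlap, int maxOverlap) {
            return overlap / (float) maxOverlap;
        }
        public float lengthNorm(String fieldName, int numTokens) {
            return (float) (1.0 / Math.sqrt(numTokens));
        }
    };

    /** Same coord, but length normalization fixed at 1.0 for every field. */
    static final Sim FLAT = new Sim() {
        public float coord(int overlap, int maxOverlap) {
            return DEFAULT.coord(overlap, maxOverlap);
        }
        public float lengthNorm(String fieldName, int numTokens) {
            return 1.0f;
        }
    };

    /** Simplified score: coord times one lengthNorm contribution per matched term. */
    static float score(Sim sim, int matched, int queryTerms, int fieldTokens) {
        float perTerm = sim.lengthNorm("body", fieldTokens);
        return sim.coord(matched, queryTerms) * matched * perTerm;
    }

    public static void main(String[] args) {
        // Short field: 4 tokens, 1 of 3 query terms matched.
        // Long field: 400 tokens, 3 of 3 query terms matched.
        System.out.println("default short: " + score(DEFAULT, 1, 3, 4));
        System.out.println("default long:  " + score(DEFAULT, 3, 3, 400));
        System.out.println("flat short:    " + score(FLAT, 1, 3, 4));
        System.out.println("flat long:     " + score(FLAT, 3, 3, 400));
    }
}
```

Under these assumptions the short field outranks the long one with the 
default length normalization, and the order reverses once lengthNorm is 
fixed at 1.0. In a real index the length norm is stored at indexing time, 
so the custom Similarity would need to be in effect when indexing, not only 
when searching.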





RE: Similarity coord,lengthNorm

Posted by Michael Celona <mc...@criticalmention.com>.
Would fixing the lengthNorm to 1 fix this problem?

Michael




