Posted to java-user@lucene.apache.org by Daniel Einspanjer <de...@gmail.com> on 2007/05/18 00:46:49 UTC

Is it possible to use a custom similarity class to cause extra terms in a field to lower score?

If I have two items in an index:
Terminator 2
Terminator 2: Judgment Day

And I score them against the query +title:(Terminator 2)
they come up with the same score (which makes sense, it just isn't
quite what I want)

Would there be some method or combination of methods in Similarity
that I could easily override to allow me to penalize the second item
because it had "unused terms"?

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org

Re: Is it possible to use a custom similarity class to cause extra terms in a field to lower score?

Posted by Daniel Einspanjer <de...@gmail.com>.

Oops.  I do indeed have omitNorms turned on.  I will re-read the
documentation on it and look at turning it off.

Sorry for the bother. :/

On 5/17/07, Chris Hostetter <ho...@fucit.org> wrote:
>
> : Terminator 2
> : Terminator 2: Judgment Day
> :
> : And I score them against the query +title:(Terminator 2)
>
> : Would there be some method or combination of methods in Similarity
> : that I could easily override to allow me to penalize the second item
> : because it had "unused terms"?
>
> that's what the DefaultSimilarity does, it uses the (length)norm
> information stored when the documents are indexed to know which one is a
> better match (because it matches on a shorter field)
>
> If you aren't seeing that behavior then perhaps you turned on omitNorms for
> that field, or perhaps the byte encoding is making the distinction between
> your various terms too small -- overriding the lengthNorm function and
> reindexing might help.
>
>
>
> -Hoss

Re: Is it possible to use a custom similarity class to cause extra terms in a field to lower score?

Posted by Chris Hostetter <ho...@fucit.org>.

: Terminator 2
: Terminator 2: Judgment Day
:
: And I score them against the query +title:(Terminator 2)

: Would there be some method or combination of methods in Similarity
: that I could easily override to allow me to penalize the second item
: because it had "unused terms"?

that's what the DefaultSimilarity does, it uses the (length)norm
information stored when the documents are indexed to know which one is a
better match (because it matches on a shorter field)

If you aren't seeing that behavior then perhaps you turned on omitNorms for
that field, or perhaps the byte encoding is making the distinction between
your various terms too small -- overriding the lengthNorm function and
reindexing might help.



-Hoss
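
To illustrate the length-norm point above, here is a minimal sketch in plain
Java with no Lucene dependency. defaultLengthNorm uses the same formula as
DefaultSimilarity.lengthNorm(String fieldName, int numTerms), which is
1 / sqrt(numTerms). steeperLengthNorm is a hypothetical alternative (not
part of Lucene) showing the kind of formula one might return from an
overridden lengthNorm to penalize longer fields more strongly. The token
counts (2 and 4) are an assumption about how the analyzer splits the two
titles.

```java
// Sketch only: mirrors the length-norm arithmetic without using Lucene itself.
class LengthNormSketch {

    // Same formula as DefaultSimilarity.lengthNorm: 1 / sqrt(numTerms).
    static float defaultLengthNorm(int numTerms) {
        return (float) (1.0 / Math.sqrt(numTerms));
    }

    // Hypothetical steeper variant (an assumption, not a Lucene API): 1 / numTerms.
    // In a real index this would be the body of an overridden lengthNorm,
    // and documents would need to be reindexed for it to take effect.
    static float steeperLengthNorm(int numTerms) {
        return 1.0f / numTerms;
    }

    public static void main(String[] args) {
        int shortTitle = 2; // "Terminator 2" (assumed token count)
        int longTitle = 4;  // "Terminator 2: Judgment Day" (assumed token count)

        System.out.println("default, short title: " + defaultLengthNorm(shortTitle));
        System.out.println("default, long title:  " + defaultLengthNorm(longTitle));
        System.out.println("steeper, short title: " + steeperLengthNorm(shortTitle));
        System.out.println("steeper, long title:  " + steeperLengthNorm(longTitle));
    }
}
```

With the default formula the shorter title's norm is larger by a factor of
sqrt(2); with the steeper variant the factor is 2. Note that norms are stored
as a single byte per field, so the values actually used at search time are
less precise than the floats printed here.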