
Posted to java-user@lucene.apache.org by Kalvir Sandhu <ka...@kalv.co.uk> on 2007/08/31 18:08:23 UTC

Weighting Issue

Hi all.

I am working on building a Lucene index to search names of people. I want to
be able to score things differently. Here is an example of the behaviour I
need.

Doc 1 with aliases
name: Bob Jones
alias: John Smith Andrew Jones

Doc 2 without aliases
name: John Andrew Smith
alias: none

When I run a search with the Lucene query:
name:(John Smith) alias:(John Smith)

I get Doc 2 as a higher-scored result than Doc 1, and the score of Doc 2 is
quite low. I need the score to not reflect how many names were assigned to
the document. I have been playing with DefaultSimilarity to override the
scoring for certain fields, but I am not getting anywhere.

I could use a ConstantScoreQuery, but I want to be able to perform fuzzy
queries sometimes too.

Any ideas?

Kalv.

Re: Weighting Issue

Posted by Chris Hostetter <ho...@fucit.org>.

> Have you tried giving the name field a boost?  E.g. name:(John Smith)^10
> alias:(John Smith)

I'm also guessing you'd be much happier with a sloppy phrase query than
with the boolean queries you are currently using:

    name:"John Smith"~3^10 alias:"John Smith"~3
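
For building that query string from user input, here is a minimal sketch in
plain Java. The class and method names (NameQuerySketch, sloppyNameQuery) are
hypothetical and not part of Lucene; it only assembles the text you would
hand to the query parser, and it only escapes double quotes, not the other
special characters the parser recognises.

```java
public class NameQuerySketch {

    // Hypothetical helper: builds
    //   name:"<fullName>"~<slop>^<nameBoost> alias:"<fullName>"~<slop>
    // Assumption: the fields are called "name" and "alias", as in the thread.
    static String sloppyNameQuery(String fullName, int slop, int nameBoost) {
        // Escape embedded double quotes so the phrase stays well-formed.
        String phrase = "\"" + fullName.replace("\"", "\\\"") + "\"";
        return "name:" + phrase + "~" + slop + "^" + nameBoost
                + " alias:" + phrase + "~" + slop;
    }

    public static void main(String[] args) {
        // Prints the query string for the example in this thread.
        System.out.println(sloppyNameQuery("John Smith", 3, 10));
    }
}
```

With a slop of 3, "John Andrew Smith" still matches the phrase "John Smith",
because only one position has to move.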


-Hoss


Re: Weighting Issue

Posted by Kalvir Sandhu <ka...@kalv.co.uk>.

Thanks for the reply. I have tried boosting, but not the way you suggested.
I tried boosting the alias field so that a match there would score as high
as a match on the name field, but the score didn't increase enough. For
example:

name:(John Smith) alias:(John Smith)^10

I think it has something to do with the fact that there are a lot of terms
stored in that document's alias field, which lowers its weighting.
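
A rough sketch of that effect in plain Java, assuming the classic
DefaultSimilarity behaviour where a field's length norm is
1/sqrt(number of terms in the field). The class name LengthNormSketch is
made up for illustration; this is not Lucene code, only the arithmetic.

```java
public class LengthNormSketch {

    // Assumption: mirrors the classic DefaultSimilarity length norm,
    // 1 / sqrt(numTerms). More terms in a field means a smaller factor.
    static float lengthNorm(int numTerms) {
        return (float) (1.0 / Math.sqrt(numTerms));
    }

    public static void main(String[] args) {
        // Doc 1 alias field: "John Smith Andrew Jones" has 4 terms.
        System.out.println("Doc 1 alias norm: " + lengthNorm(4));
        // Doc 2 name field: "John Andrew Smith" has 3 terms.
        System.out.println("Doc 2 name norm:  " + lengthNorm(3));
    }
}
```

Under that assumption the alias field of Doc 1 gets a factor of 0.5 and the
name field of Doc 2 gets roughly 0.577, and the gap widens with every alias
added. If that is the cause, overriding lengthNorm in a Similarity subclass
to return a constant would be one thing to try; the index would need to be
rebuilt afterwards, since norms are written at index time.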

On 8/31/07, Michael Stoppelman <st...@gmail.com> wrote:
>
> Kalvir,
>
> Have you tried giving the name field a boost?  E.g. name:(John Smith)^10
> alias:(John Smith)
>
> -M

Re: Weighting Issue

Posted by Michael Stoppelman <st...@gmail.com>.

Kalvir,

Have you tried giving the name field a boost?  E.g. name:(John Smith)^10
alias:(John Smith)

-M
