Posted to java-user@lucene.apache.org by Brian Goetz <br...@quiotix.com> on 2002/09/21 11:08:50 UTC

Term boosting

I've got a searching problem which I know lots of other people have run 
across too.  We've got documents which have keywords (which we extract and 
put into a 'keywords' field) and also have body text (which we put in a 
'body' field).

Let's say we search for "text retrieval".  We want to find documents that 
have "text retrieval" in the body OR in the keywords, but we want to weight 
hits on the keywords more heavily.  I can't boost the tokens in the index 
base, so I have to do that through the query.

If I convert a query for phrase Q into this:
   body:Q OR keywords:Q^N
does that do what I want?
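
For illustration, the rewrite described above could be sketched as a small helper that produces the query string in Lucene's query syntax. This is only a sketch under assumptions: the class name BoostedQuery and the method rewrite are hypothetical, the field names 'body' and 'keywords' are the ones from this message, and the resulting string would still have to be handed to a query parser.

```java
// Hypothetical helper, not part of Lucene: builds the two-field query string.
final class BoostedQuery {
    private BoostedQuery() {}

    /** Returns body:"Q" OR keywords:"Q"^N for a phrase Q and a boost factor N. */
    static String rewrite(String phrase, float boost) {
        // Escape backslashes and embedded double quotes so the phrase
        // stays a single quoted phrase in the query syntax.
        String quoted = "\"" + phrase.replace("\\", "\\\\").replace("\"", "\\\"") + "\"";
        // The boost is attached only to the 'keywords' clause.
        return "body:" + quoted + " OR keywords:" + quoted + "^" + boost;
    }

    public static void main(String[] args) {
        System.out.println(rewrite("text retrieval", 3.0f));
    }
}
```

The phrase is quoted so that the boost applies to the whole phrase in the 'keywords' clause rather than to its last word only.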

How should I select the boost factor N?  Are there negative consequences to 
this strategy?  Am I better off doing two queries and merging the results 
myself?
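
To make the "two queries and merge" alternative concrete, here is a rough sketch of what merging by hand might look like. It assumes each search yields a map from a document id to a score, and it combines them as bodyScore + N * keywordScore. The class MergeHits and its merge method are hypothetical, and scores from two separate searches may not be on a comparable scale, which is one reason a single boosted query may be preferable.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper, not part of Lucene: merges two scored result sets.
final class MergeHits {
    private MergeHits() {}

    /** Combines scores per document as bodyScore + boost * keywordScore, best first. */
    static List<Map.Entry<String, Double>> merge(Map<String, Double> bodyHits,
                                                 Map<String, Double> keywordHits,
                                                 double boost) {
        // Start from the body scores, then add the weighted keyword scores.
        Map<String, Double> combined = new HashMap<>(bodyHits);
        for (Map.Entry<String, Double> e : keywordHits.entrySet()) {
            combined.merge(e.getKey(), boost * e.getValue(), Double::sum);
        }
        // Rank documents by combined score, highest first.
        List<Map.Entry<String, Double>> ranked = new ArrayList<>(combined.entrySet());
        ranked.sort(Map.Entry.<String, Double>comparingByValue().reversed());
        return ranked;
    }
}
```

A document that matches only in 'keywords' still appears in the merged list, carrying just its weighted keyword score.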


--
Brian Goetz
Quiotix Corporation
brian@quiotix.com           Tel: 650-843-1300            Fax: 650-324-8032

http://www.quiotix.com


--
To unsubscribe, e-mail:   <ma...@jakarta.apache.org>
For additional commands, e-mail: <ma...@jakarta.apache.org>


Re: Term boosting

Posted by Clemens Marschner <cm...@lanlab.de>.
I had the same problem as Brian. Since you have to rewrite the query
anyway to search two different fields, it makes no difference whether you
use term boosting or field boosting; performance is the same.
For new applications I'd say field boosting is a little simpler, because it
saves a few steps during the query rewriting phase. I had already written
my query rewriting when field boosting came up, so for me there is no
difference.

By the way, I changed the query classes to allow query rewriting: I made
them Cloneable and added setter methods. If there's interest, I'll
contribute the patches ASAP.

--Clemens

----- Original Message -----
From: "Alex Murzaku" <mu...@yahoo.com>
To: "Lucene Users List" <lu...@jakarta.apache.org>
Sent: Saturday, September 21, 2002 5:38 PM
Subject: Re: Term boosting


> Wouldn't field boosting (the new capability added as of
> http://www.mail-archive.com/lucene-dev@jakarta.apache.org/msg01727.html)
> be a simpler solution?




Re: Term boosting

Posted by Alex Murzaku <mu...@yahoo.com>.
Wouldn't field boosting (the new capability added as of
http://www.mail-archive.com/lucene-dev@jakarta.apache.org/msg01727.html)
be a simpler solution? I would just set the boost for the 'keywords'
field to something higher than one, depending on your requirements. As
for the value of the boost, I have noticed that it needs quite some
tweaking, since there doesn't appear to be a magic formula. In a
similar situation, I just kept modifying it until I got something that
satisfied my users. It was funny because, in typical Monty Python style,
we ended up deciding that "the number shall be three..."


