Posted to java-user@lucene.apache.org by Brian Goetz <br...@quiotix.com> on 2002/09/21 11:08:50 UTC
Term boosting
I've got a searching problem which I know lots of other people have run
across too. We've got documents which have keywords (which we extract and
put into a 'keywords' field) and also have body text (which we put in a
'body' field.)
Let's say we search for "text retrieval". We want to find documents that
have "text retrieval" in the body OR in the keywords, but we want to weight
hits on the keywords more heavily. I can't boost the tokens in the index
base, so I have to do that through the query.
If I convert a query for phrase Q into this:
body:Q OR keywords:Q^n
does that do what I want?
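A minimal sketch of that rewrite as plain string building, assuming the two field names above. The class and method names are hypothetical and not part of Lucene; the result is only the query text that would then be handed to a query parser.

```java
// Hypothetical helper (not a Lucene class): builds the rewritten query text only.
final class BoostedQueryText {

    // Turns a phrase Q into: body:"Q" OR keywords:"Q"^n
    static String rewrite(String phrase, float keywordBoost) {
        if (phrase.indexOf('"') >= 0) {
            // Escaping rules are left out of this sketch, so embedded quotes are rejected.
            throw new IllegalArgumentException("phrase must not contain a double quote");
        }
        if (keywordBoost <= 0f) {
            throw new IllegalArgumentException("boost must be positive");
        }
        String quoted = "\"" + phrase + "\"";
        return "body:" + quoted + " OR keywords:" + quoted + "^" + keywordBoost;
    }

    public static void main(String[] args) {
        System.out.println(rewrite("text retrieval", 3.0f));
    }
}
```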
How should I select the boost factor n? Are there negative consequences to
this strategy? Am I better off doing two queries and merging the results
myself?
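For comparison, the "two queries, merge myself" alternative could look roughly like the sketch below. It assumes each search has already produced a map from document id to score; the maps, the ids and the linear weighting are illustrative assumptions, not how Lucene combines scores internally.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper: merges the hits of two separate searches by hand.
final class MergedSearch {

    // Returns document ids ordered by bodyScore + keywordWeight * keywordScore,
    // highest first; ties are broken by document id to keep the order stable.
    static List<String> merge(Map<String, Float> bodyHits,
                              Map<String, Float> keywordHits,
                              float keywordWeight) {
        Map<String, Float> combined = new HashMap<>(bodyHits);
        for (Map.Entry<String, Float> e : keywordHits.entrySet()) {
            combined.merge(e.getKey(), e.getValue() * keywordWeight, Float::sum);
        }
        List<String> ranked = new ArrayList<>(combined.keySet());
        ranked.sort((a, b) -> {
            int byScore = Float.compare(combined.get(b), combined.get(a));
            return byScore != 0 ? byScore : a.compareTo(b);
        });
        return ranked;
    }
}
```

The merge itself is short, but the caller has to run both searches and hold both hit sets in memory, which the single rewritten query avoids.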
--
Brian Goetz
Quiotix Corporation
brian@quiotix.com Tel: 650-843-1300 Fax: 650-324-8032
http://www.quiotix.com
--
To unsubscribe, e-mail: <ma...@jakarta.apache.org>
For additional commands, e-mail: <ma...@jakarta.apache.org>
Re: Term boosting
Posted by Clemens Marschner <cm...@lanlab.de>.
I had the same problem as Brian. But since you have to rewrite the query
anyway to search two different fields, it makes no difference whether you
use term or field boosting. Performance is the same.
For new applications I'd say field boosting is a little simpler because you
save some steps during the query rewriting phase. Since I had already
written that code when field boosting came up, it makes no difference for me.
By the way, I changed the query classes to allow query rewriting: I made
them Cloneable and added setter methods. If there's interest, I'll
contribute the patches as soon as possible.
--Clemens
----- Original Message -----
From: "Alex Murzaku" <mu...@yahoo.com>
To: "Lucene Users List" <lu...@jakarta.apache.org>
Sent: Saturday, September 21, 2002 5:38 PM
Subject: Re: Term boosting
> Wouldn't field boosting (the new capability added as of
> http://www.mail-archive.com/lucene-dev@jakarta.apache.org/msg01727.html)
> be a simpler solution?
Re: Term boosting
Posted by Alex Murzaku <mu...@yahoo.com>.
Wouldn't field boosting (the new capability added as of
http://www.mail-archive.com/lucene-dev@jakarta.apache.org/msg01727.html)
be a simpler solution? I would just set the boost for the 'keywords'
field to something higher than one, depending on your requirements. As
for the boost value, I have noticed that it needs quite some tweaking,
since there doesn't appear to be a magic formula. In a similar
situation, I just kept modifying it until I got something that
satisfied my users. It was funny because, in typical Monty Python style,
we ended up deciding that "the number shall be three..."
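
That kind of trial and error can be sketched as a small sweep over candidate boost values. The model below is a deliberate simplification (combined score = body score + n * keyword score) and the scores are made-up numbers; Lucene's real scoring has more factors, so this only illustrates how the ranking can flip as n grows.

```java
// Hypothetical sketch: tries several boost values against made-up scores.
final class BoostSweep {

    // Simplified scoring model (an assumption, not Lucene's formula).
    static float combined(float bodyScore, float keywordScore, float n) {
        return bodyScore + n * keywordScore;
    }

    // True if a document matching only in 'keywords' outranks one matching only in 'body'.
    static boolean keywordOnlyWins(float bodyScore, float keywordScore, float n) {
        return combined(0f, keywordScore, n) > combined(bodyScore, 0f, n);
    }

    public static void main(String[] args) {
        float[] candidates = {1f, 2f, 3f, 5f};
        for (float n : candidates) {
            // 0.9 and 0.4 are illustrative scores, not measurements.
            System.out.println("n=" + n + " keyword-only hit ranks first: "
                    + keywordOnlyWins(0.9f, 0.4f, n));
        }
    }
}
```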