Posted to java-user@lucene.apache.org by Jeremy Hanna <je...@mac.com> on 2006/04/14 00:29:55 UTC
Boosting Fields (in index) or Queries
I have a situation where I'm indexing database entries and have
fields such as:
name
sku
model
category name
description
features
specifications
I am trying to give the name, category name, and description fields a
higher priority than the others.
I've tried setting the fields' boost values while indexing the db, and
it seemed to have little or no effect.
For the queries, I started with the MultiFieldQueryParser but found
that it doesn't let you set a priority or boost value on any of the
fields at query time. Then I tried separate query parsers - one for
each field. That way I could set a boost level on each of the queries
created by those parsers. I joined them together in a BooleanQuery,
each added with BooleanClause.Occur.SHOULD. I ended up setting the
features, specifications, and description parsers to default to
QueryParser.Operator.AND and that helped, but the boost value seems to
do nothing.
I tried setting the categoryParser's query boost to 4.0f, then 8.0f,
then 20.0f, and have tried lowering the boost on other queries, but the
order of the results doesn't change at all.
I am using Lucene 1.9.1, and for my database I'm using Hibernate on top
of MySQL 5, with ArrayLists and the bag mapping in Hibernate.
Does anyone have any thoughts or suggestions?
Thanks!
Jeremy
---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org
Re: Boosting Fields (in index) or Queries
Posted by Jeremy Hanna <je...@mac.com>.
I still have a similar problem with the boost factor. I changed the
name parser to use the AND operator and set that query's boost to a
very high value relative to the others. I also kept a regular OR-based
name query so that partial matches aren't ruled out. However, whenever
I change the boost values on the queries, nothing, absolutely nothing,
changes in the results. Besides that, I search for: playstation game.
The only record that has both playstation and game in the name field
is hit number 20. That's really why I added the ANDed name query with
such a high boost value, to see if it would bring that single record
towards the top, but nothing. Am I doing something wrong in all of
this? Am I applying the boost incorrectly?
On Apr 14, 2006, at 1:43 PM, Michael D. Curtin wrote:
> Jeremy Hanna wrote:
>
>> I would use a database function to force the ordering like the
>> one you provided that works in Oracle, but it doesn't look like
>> mysql 5 supports that. If anyone else knows of a way to force
>> the ordering using mysql 5 queries, please respond. I think I'll
>> just re-sort them when they get back though.
>
> If there's nothing in the relational table that specifies the
> ordering, I'm afraid you've probably got similar problems in other
> places. RDBMSes don't guarantee to return rows in the order they
> were INSERTed. Sure, early in the life of a table that will tend
> to happen, but as DELETEs, then UPDATEs and new INSERTs get
> processed, the on-disk order tends to get pretty jumbled. Note
> that I'm talking about anything that uses the results of your
> SELECT, not just your Lucene-related code.
>
> If ordering of the rows is something your app needs, I recommend
> adding a column that is expressly for ordering. A one-up integer
> or something like that. I don't remember what the keyword in MySQL
> is for that, but I'm pretty sure there is one. Then you can code
> all your SELECTs with an ORDER BY clause that does what you want.
>
> Good luck!
>
> --MDC
>
Re: Boosting Fields (in index) or Queries
Posted by "Michael D. Curtin" <mi...@curtin.com>.
Jeremy Hanna wrote:
> I would use a database function to force the ordering like the one you
> provided that works in Oracle, but it doesn't look like mysql 5
> supports that. If anyone else knows of a way to force the ordering
> using mysql 5 queries, please respond. I think I'll just re-sort them
> when they get back though.
If there's nothing in the relational table that specifies the ordering, I'm
afraid you've probably got similar problems in other places. RDBMSes don't
guarantee to return rows in the order they were INSERTed. Sure, early in the
life of a table that will tend to happen, but as DELETEs, then UPDATEs and new
INSERTs get processed, the on-disk order tends to get pretty jumbled. Note
that I'm talking about anything that uses the results of your SELECT, not just
your Lucene-related code.
If ordering of the rows is something your app needs, I recommend adding a
column that is expressly for ordering. A one-up integer or something like
that. I don't remember what the keyword in MySQL is for that, but I'm pretty
sure there is one. Then you can code all your SELECTs with an ORDER BY clause
that does what you want.
Good luck!
--MDC
Re: Boosting Fields (in index) or Queries
Posted by Jeremy Hanna <je...@mac.com>.
I would use a database function to force the ordering like the one
you provided that works in Oracle, but it doesn't look like mysql 5
supports that. If anyone else knows of a way to force the ordering
using mysql 5 queries, please respond. I think I'll just re-sort them
when they get back though.
Thanks!
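For what it's worth, MySQL does have a FIELD() function that returns the
position of its first argument in the list that follows, and it is commonly
used in an ORDER BY for exactly this purpose. The sketch below only builds
the SQL string; the class name OrderedInQuery and the method build are made
up for illustration, the table and column names come from the query quoted
in this thread, and it assumes the ids are numeric and that the query is
sent as native SQL rather than HQL.

```java
import java.util.Arrays;
import java.util.List;

/** Hypothetical helper: builds an id lookup that keeps the caller's id order. */
final class OrderedInQuery {

    /**
     * Returns a MySQL query that fetches the given product ids and sorts the
     * rows by each id's position in the list via ORDER BY FIELD(...).
     * The ids are numeric, so appending them directly is safe here; string
     * keys would need bound parameters instead.
     */
    static String build(List<Long> rankedIds) {
        if (rankedIds.isEmpty()) {
            throw new IllegalArgumentException("need at least one id");
        }
        StringBuilder idList = new StringBuilder();
        for (Long id : rankedIds) {
            if (idList.length() > 0) {
                idList.append(", ");
            }
            idList.append(id.longValue());
        }
        return "SELECT * FROM product WHERE id IN (" + idList + ")"
             + " ORDER BY FIELD(id, " + idList + ")";
    }

    public static void main(String[] args) {
        // Ids in the order Lucene ranked them.
        System.out.println(build(Arrays.asList(444L, 333L, 555L, 888L)));
    }
}
```

For the four ids above this produces a single statement ending in
ORDER BY FIELD(id, 444, 333, 555, 888), so the ordering is done in the same
round-trip as the lookup.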
On Apr 14, 2006, at 11:39 AM, Bryzek.Michael wrote:
> We tried two approaches:
>
> 1) Pull data from the db in arbitrary order and then sort in the
> application AFTER the retrieve. This will require two passes over
> the results.
>
> 2) Add an order by clause to the select. In Oracle, you could do
> something like "order by decode(id,444,1,333,2,555,3,888,4,...)". This
> will force the order you want in the query from the db.
>
> FWIW, after trying both of the above in production, we changed our
> strategy to avoid the db hit altogether, storing everything we
> needed for presentation within the Lucene index. We saw a net
> performance increase AND simpler code when we did this.
>
> -Mike
>
> -----Original Message-----
> From: Jeremy Hanna [mailto:jeremy_hanna@mac.com]
> Sent: Fri 4/14/06 1:15 PM
> To: java-user@lucene.apache.org
> Cc:
> Subject: Re: Boosting Fields (in index) or Queries
>
> Wow, I finally found out why I was getting results in the wrong order
> - I got the results in the correct order from the Lucene index. I
> got the explanation of each of the results along with their database
> id and found the ordering mismatch.
> The problem is in the database call. I am calling:
>
> select * from product where id in (444, 333, 555, 888);
>
> and the ordering that comes back is not preserved. So the results
> are correct but the ordering and hence all of the relevancy is out
> the window. So that at least leads me to the actual problem. Now I
> have to figure out how I'll approach reordering the results because I
> don't believe that there's any way to force the ordering of a list
> and I don't want to call a separate database query for each id (lots
> of database round-trips).
>
> Thanks for the help Erik!
>
> On Apr 13, 2006, at 7:13 PM, Erik Hatcher wrote:
>
>>
>> On Apr 13, 2006, at 8:55 PM, Jeremy Hanna wrote:
>>> Looking at the results, the first document in the results should
>>> hopefully be near the bottom and the Explanation for this document
>>> has a Description/Details (using the toString() on the
>>> Explanation) of:
>>>
>>> product of:
>>> 0.0 = sum of:
>>> 0.0 = coord(0/7)
>>>
>>> So I'm kind of at a loss as to what's going on. Am I just doing
>>> something crazy weird in my code? I didn't find that many
>>> examples out there, so I'm kind of winging it according to what
>>> I've read in the javadocs and what examples I could find.
>>
>> Be sure to pass the document id, not the hit number, to explain().
>> Looks like you passed an id of an unmatched document.
>>
>> Erik
RE: Boosting Fields (in index) or Queries
Posted by "Bryzek.Michael" <Mi...@uwa.unitedway.org>.
We tried two approaches:
1) Pull data from the db in arbitrary order and then sort in the application AFTER the retrieve. This will require two passes over the results.
2) Add an order by clause to the select. In Oracle, you could do something like "order by decode(id,444,1,333,2,555,3,888,4,...)". This will force the order you want in the query from the db.
FWIW, after trying both of the above in production, we changed our strategy to avoid the db hit altogether, storing everything we needed for presentation within the Lucene index. We saw a net performance increase AND simpler code when we did this.
-Mike
-----Original Message-----
From: Jeremy Hanna [mailto:jeremy_hanna@mac.com]
Sent: Fri 4/14/06 1:15 PM
To: java-user@lucene.apache.org
Cc:
Subject: Re: Boosting Fields (in index) or Queries
Wow, I finally found out why I was getting results in the wrong order
- I got the results in the correct order from the Lucene index. I
got the explanation of each of the results along with their database
id and found the ordering mismatch.
The problem is in the database call. I am calling:
select * from product where id in (444, 333, 555, 888);
and the ordering that comes back is not preserved. So the results
are correct but the ordering and hence all of the relevancy is out
the window. So that at least leads me to the actual problem. Now I
have to figure out how I'll approach reordering the results because I
don't believe that there's any way to force the ordering of a list
and I don't want to call a separate database query for each id (lots
of database round-trips).
Thanks for the help Erik!
On Apr 13, 2006, at 7:13 PM, Erik Hatcher wrote:
>
> On Apr 13, 2006, at 8:55 PM, Jeremy Hanna wrote:
>> Looking at the results, the first document in the results should
>> hopefully be near the bottom and the Explanation for this document
>> has a Description/Details (using the toString() on the
>> Explanation) of:
>>
>> product of:
>> 0.0 = sum of:
>> 0.0 = coord(0/7)
>>
>> So I'm kind of at a loss as to what's going on. Am I just doing
>> something crazy weird in my code? I didn't find that many
>> examples out there, so I'm kind of winging it according to what
>> I've read in the javadocs and what examples I could find.
>
> Be sure to pass the document id, not the hit number, to explain().
> Looks like you passed an id of an unmatched document.
>
> Erik
Re: Boosting Fields (in index) or Queries
Posted by Jeremy Hanna <je...@mac.com>.
Wow, I finally found out why I was getting results in the wrong order
- I got the results in the correct order from the Lucene index. I
got the explanation of each of the results along with their database
id and found the ordering mismatch.
The problem is in the database call. I am calling:
select * from product where id in (444, 333, 555, 888);
and the ordering that comes back is not preserved. So the results
are correct but the ordering and hence all of the relevancy is out
the window. So that at least leads me to the actual problem. Now I
have to figure out how I'll approach reordering the results because I
don't believe that there's any way to force the ordering of a list
and I don't want to call a separate database query for each id (lots
of database round-trips).
Thanks for the help Erik!
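One way to put the rows back into Lucene's order without extra round-trips
is to re-sort them in the application: index the rows by id, then walk the
ids in hit order. The following is only a sketch under assumptions; the
Product class is a hypothetical stand-in for the Hibernate entity, and
RelevanceOrder and reorder are names invented for this example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class RelevanceOrder {

    /** Hypothetical stand-in for a row loaded from the product table. */
    static final class Product {
        final long id;
        final String name;

        Product(long id, String name) {
            this.id = id;
            this.name = name;
        }
    }

    /**
     * Returns the rows arranged in the order given by rankedIds (the ids in
     * Lucene hit order). Ids with no matching row are skipped.
     */
    static List<Product> reorder(List<Long> rankedIds, List<Product> rows) {
        // Pass 1: index the unordered rows by primary key.
        Map<Long, Product> byId = new HashMap<Long, Product>();
        for (Product row : rows) {
            byId.put(row.id, row);
        }
        // Pass 2: walk the ids in relevance order and pick up each row.
        List<Product> ordered = new ArrayList<Product>(rankedIds.size());
        for (Long id : rankedIds) {
            Product row = byId.get(id);
            if (row != null) {
                ordered.add(row);
            }
        }
        return ordered;
    }

    public static void main(String[] args) {
        // Ids as ranked by Lucene, and rows in whatever order the db chose.
        List<Long> rankedIds = Arrays.asList(444L, 333L, 555L, 888L);
        List<Product> rows = Arrays.asList(
                new Product(333L, "product-333"),
                new Product(444L, "product-444"),
                new Product(555L, "product-555"),
                new Product(888L, "product-888"));
        for (Product p : reorder(rankedIds, rows)) {
            System.out.println(p.id + " " + p.name);
        }
    }
}
```

This keeps the single "where id in (...)" query and costs two linear passes
over the result set, which should be negligible next to the database call.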
On Apr 13, 2006, at 7:13 PM, Erik Hatcher wrote:
>
> On Apr 13, 2006, at 8:55 PM, Jeremy Hanna wrote:
>> Looking at the results, the first document in the results should
>> hopefully be near the bottom and the Explanation for this document
>> has a Description/Details (using the toString() on the
>> Explanation) of:
>>
>> product of:
>> 0.0 = sum of:
>> 0.0 = coord(0/7)
>>
>> So I'm kind of at a loss as to what's going on. Am I just doing
>> something crazy weird in my code? I didn't find that many
>> examples out there, so I'm kind of winging it according to what
>> I've read in the javadocs and what examples I could find.
>
> Be sure to pass the document id, not the hit number, to explain().
> Looks like you passed an id of an unmatched document.
>
> Erik
Re: Boosting Fields (in index) or Queries
Posted by Erik Hatcher <er...@ehatchersolutions.com>.
On Apr 13, 2006, at 8:55 PM, Jeremy Hanna wrote:
> Looking at the results, the first document in the results should
> hopefully be near the bottom and the Explanation for this document
> has a Description/Details (using the toString() on the Explanation)
> of:
>
> product of:
> 0.0 = sum of:
> 0.0 = coord(0/7)
>
> So I'm kind of at a loss as to what's going on. Am I just doing
> something crazy weird in my code? I didn't find that many examples
> out there, so I'm kind of winging it according to what I've read in
> the javadocs and what examples I could find.
Be sure to pass the document id, not the hit number, to explain().
Looks like you passed an id of an unmatched document.
Erik
Re: Boosting Fields (in index) or Queries
Posted by Jeremy Hanna <je...@mac.com>.
Thanks for the tip. I'm trying to decipher what the explanation
tells me right now.
Btw, here is the code that I'm currently running:
////////
import org.apache.lucene.queryParser.QueryParser;
import org.apache.lucene.search.BooleanClause;
import org.apache.lucene.search.BooleanQuery;
import org.apache.lucene.search.Query;

// Fragment: analyzer, queryString, indexSearcher and hits are declared elsewhere.
// One parser per field, so each field's query can carry its own boost.
QueryParser nameParser = new QueryParser("name", analyzer);
QueryParser categoryParser = new QueryParser("category", analyzer);
QueryParser descriptionParser = new QueryParser("description", analyzer);
QueryParser featuresParser = new QueryParser("features", analyzer);
QueryParser specificationsParser = new QueryParser("specifications", analyzer);
QueryParser skuParser = new QueryParser("sku", analyzer);
QueryParser modelParser = new QueryParser("model", analyzer);

// The long free-text fields must contain every term of the query.
descriptionParser.setDefaultOperator(QueryParser.Operator.AND);
featuresParser.setDefaultOperator(QueryParser.Operator.AND);
specificationsParser.setDefaultOperator(QueryParser.Operator.AND);

BooleanQuery booleanQuery = new BooleanQuery();

// Only the category query is boosted; the other clauses keep the default boost.
Query categoryQuery = categoryParser.parse(queryString);
categoryQuery.setBoost(20.0f);
booleanQuery.add(categoryQuery, BooleanClause.Occur.SHOULD);

booleanQuery.add(nameParser.parse(queryString), BooleanClause.Occur.SHOULD);
booleanQuery.add(descriptionParser.parse(queryString), BooleanClause.Occur.SHOULD);
booleanQuery.add(featuresParser.parse(queryString), BooleanClause.Occur.SHOULD);
booleanQuery.add(specificationsParser.parse(queryString), BooleanClause.Occur.SHOULD);
booleanQuery.add(skuParser.parse(queryString), BooleanClause.Occur.SHOULD);
booleanQuery.add(modelParser.parse(queryString), BooleanClause.Occur.SHOULD);

// One search is enough: the plain search() already orders hits by relevance,
// so the second call with Sort.RELEVANCE was redundant.
hits = indexSearcher.search(booleanQuery);
////////
Looking at the results, the first document returned is one I would
expect near the bottom, and the Explanation for this document (using
toString() on the Explanation) reads:
product of:
0.0 = sum of:
0.0 = coord(0/7)
So I'm kind of at a loss as to what's going on. Am I just doing
something crazy weird in my code? I didn't find that many examples
out there, so I'm kind of winging it according to what I've read in
the javadocs and what examples I could find.
Thanks,
Jeremy
On Apr 13, 2006, at 6:17 PM, Erik Hatcher wrote:
> The best recommendation is to have a look at the Explanation
> returned from IndexSearcher.explain() for a specific query and
> document to trace how things are being scored. Is it possible
> you're boosting all documents by the same amount?
>
> Erik
>
>
> On Apr 13, 2006, at 6:29 PM, Jeremy Hanna wrote:
>
>> I have a situation where I'm indexing database entries and have
>> fields such as:
>>
>> name
>> sku
>> model
>> category name
>> description
>> features
>> specifications
>>
>> I am trying to set a priority higher for the name, category name,
>> and description.
>>
>> I've tried setting the fields' boost values as I've indexed the db
>> and it seemed to have little or no result.
>>
>> When I tried to do the queries, I started using the
>> MultiFieldQueryParser but found that those don't have priority or
>> boost values you can set on any of the Fields at query time. Then
>> I tried to have separate query parsers - one for each field. That
>> way I could set a boost level for each of the queries created by
>> those query parsers. I joined them together with a BooleanQuery
>> and all of them set to BooleanClause.Occur.SHOULD. I ended up
>> setting the features, specifications, and description to default
>> to QueryParser.Operator.AND and that helped, but the boost value seems
>> to do nothing.
>>
>> I try to set the categoryParser's query boost to 4.0f, then 8.0f,
>> then 20.0f and have tried downgrading other queries, but the
>> results don't change at all in their order.
>>
>> I am using 1.9.1 and for my database I'm using hibernate to mysql
>> 5 and ArrayLists with the bag mapping in hibernate.
>>
>> Does anyone have any thoughts or suggestions?
>>
>> Thanks!
>>
>> Jeremy
Re: Boosting Fields (in index) or Queries
Posted by Erik Hatcher <er...@ehatchersolutions.com>.
The best recommendation is to have a look at the Explanation returned
from IndexSearcher.explain() for a specific query and document to
trace how things are being scored. Is it possible you're boosting
all documents by the same amount?
Erik
On Apr 13, 2006, at 6:29 PM, Jeremy Hanna wrote:
> I have a situation where I'm indexing database entries and have
> fields such as:
>
> name
> sku
> model
> category name
> description
> features
> specifications
>
> I am trying to set a priority higher for the name, category name,
> and description.
>
> I've tried setting the fields' boost values as I've indexed the db
> and it seemed to have little or no result.
>
> When I tried to do the queries, I started using the
> MultiFieldQueryParser but found that those don't have priority or
> boost values you can set on any of the Fields at query time. Then
> I tried to have separate query parsers - one for each field. That
> way I could set a boost level for each of the queries created by
> those query parsers. I joined them together with a BooleanQuery
> and all of them set to BooleanClause.Occur.SHOULD. I ended up
> setting the features, specifications, and description to default to
> QueryParser.Operator.AND and that helped, but the boost value seems to do
> nothing.
>
> I try to set the categoryParser's query boost to 4.0f, then 8.0f,
> then 20.0f and have tried downgrading other queries, but the
> results don't change at all in their order.
>
> I am using 1.9.1 and for my database I'm using hibernate to mysql 5
> and ArrayLists with the bag mapping in hibernate.
>
> Does anyone have any thoughts or suggestions?
>
> Thanks!
>
> Jeremy