Posted to java-user@lucene.apache.org by Jeremy Hanna <je...@mac.com> on 2006/04/14 00:29:55 UTC

Boosting Fields (in index) or Queries

I have a situation where I'm indexing database entries and have  
fields such as:

name
sku
model
category name
description
features
specifications

I am trying to give higher priority to the name, category name, and
description fields.

I've tried setting the fields' boost values at index time, but it
seemed to have little or no effect.

When I moved on to the queries, I started with MultiFieldQueryParser
but found that it doesn't let you set a priority or boost value on
any of the fields at query time.  So I created a separate query
parser for each field; that way I could set a boost on each of the
queries those parsers create.  I joined them together in a
BooleanQuery, each added with BooleanClause.Occur.SHOULD.  I ended up
setting the features, specifications, and description parsers to
default to QueryParser.Operator.AND, and that helped, but the boost
value seems to do nothing.

I tried setting the boost on the category query to 4.0f, then 8.0f,
then 20.0f, and tried lowering the boost on the other queries, but
the order of the results doesn't change at all.

I am using Lucene 1.9.1; for the database I'm using Hibernate on
MySQL 5, with ArrayLists and the bag mapping in Hibernate.

Does anyone have any thoughts or suggestions?

Thanks!

Jeremy

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


Re: Boosting Fields (in index) or Queries

Posted by Jeremy Hanna <je...@mac.com>.
I still have a similar problem with the boost factor.  I changed the
name query to use the AND operator and set that query's boost to a
very high value relative to the others.  I also kept a regular
OR-based name query so that partial matches aren't ruled out.
However, when I change the boost values on the queries, absolutely
nothing changes in the results.  For example, I search for:
playstation game.  The only record that has both playstation and game
in the name field is hit number 20.  That's why I added the ANDed
name query with such a high boost value: to see if it would bring
that single record toward the top, but it didn't move.  Am I applying
the boost incorrectly?

On Apr 14, 2006, at 1:43 PM, Michael D. Curtin wrote:

> Jeremy Hanna wrote:
>
>> I would use a database function to force the ordering like the
>> one you provided that works in Oracle, but it doesn't look like
>> MySQL 5 supports that.  If anyone else knows of a way to force
>> the ordering using MySQL 5 queries, please respond.  I think I'll
>> just re-sort them when they get back though.
>
> If there's nothing in the relational table that specifies the  
> ordering, I'm afraid you've probably got similar problems in other  
> places.  RDBMSes don't guarantee to return rows in the order they  
> were INSERTed.  Sure, early in the life of a table that will tend  
> to happen, but as DELETEs, then UPDATEs and new INSERTs get  
> processed, the on-disk order tends to get pretty jumbled.  Note  
> that I'm talking about anything that uses the results of your  
> SELECT, not just your Lucene-related code.
>
> If ordering of the rows is something your app needs, I recommend  
> adding a column that is expressly for ordering.  A one-up integer  
> or something like that.  I don't remember what the keyword in MySQL  
> is for that, but I'm pretty sure there is one.  Then you can code  
> all your SELECTs with an ORDER BY clause that does what you want.
>
> Good luck!
>
> --MDC




Re: Boosting Fields (in index) or Queries

Posted by "Michael D. Curtin" <mi...@curtin.com>.
Jeremy Hanna wrote:

> I would use a database function to force the ordering like the one you
> provided that works in Oracle, but it doesn't look like MySQL 5
> supports that.  If anyone else knows of a way to force the ordering
> using MySQL 5 queries, please respond.  I think I'll just re-sort them
> when they get back though.

If there's nothing in the relational table that specifies the ordering, I'm 
afraid you've probably got similar problems in other places.  RDBMSes don't 
guarantee to return rows in the order they were INSERTed.  Sure, early in the 
life of a table that will tend to happen, but as DELETEs, then UPDATEs and new 
INSERTs get processed, the on-disk order tends to get pretty jumbled.  Note 
that I'm talking about anything that uses the results of your SELECT, not just 
your Lucene-related code.

If ordering of the rows is something your app needs, I recommend adding a 
column that is expressly for ordering.  A one-up integer or something like 
that.  I don't remember what the keyword in MySQL is for that, but I'm pretty 
sure there is one.  Then you can code all your SELECTs with an ORDER BY clause 
that does what you want.

Good luck!

--MDC



Re: Boosting Fields (in index) or Queries

Posted by Jeremy Hanna <je...@mac.com>.
I would use a database function to force the ordering like the one
you provided that works in Oracle, but it doesn't look like MySQL 5
supports that.  If anyone else knows of a way to force the ordering
using MySQL 5 queries, please respond.  I think I'll just re-sort them
when they get back though.
Thanks!
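
On the MySQL side, one option is the FIELD() string function, which
returns the position of its first argument in the list that follows,
so it can play the role that decode() plays in the Oracle example.
The sketch below only builds the SQL text. The class and method names
(OrderedInQuery, build) are made up for illustration, and the ids
would have to be bound twice: once for IN and once for FIELD.

```java
import java.util.Collections;

class OrderedInQuery {
    /**
     * Builds a query of the form
     *   SELECT * FROM t WHERE id IN (?, ?, ...) ORDER BY FIELD(id, ?, ?, ...)
     * The table name is assumed to be a trusted constant, not user input.
     */
    static String build(String table, int idCount) {
        if (idCount < 1) {
            throw new IllegalArgumentException("need at least one id");
        }
        // One "?" placeholder per id, reused in both the IN and FIELD lists.
        String marks = String.join(", ", Collections.nCopies(idCount, "?"));
        return "SELECT * FROM " + table
             + " WHERE id IN (" + marks + ")"
             + " ORDER BY FIELD(id, " + marks + ")";
    }
}
```

For the four ids in this thread, build("product", 4) yields a
statement whose rows come back in the order the ids are bound to the
FIELD() placeholders, which would be the Lucene hit order.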

On Apr 14, 2006, at 11:39 AM, Bryzek.Michael wrote:

> We tried two approaches:
>
>   1) Pull data from the db in arbitrary order and then sort in the  
> application  AFTER the retrieve. This will require two passes over  
> the results.
>
>   2) Add an order by clause to the select. In Oracle, you could do  
> something like "order by decode(444,1,333,2,555,3,888,4,...)". This  
> will force the order you want in the query from the db.
>
> FWIW, after trying both of the above in production, we changed our  
> strategy to avoid the db hit altogether, storing everything we  
> needed for presentation within the Lucene index. We saw a net  
> performance increase AND simpler code when we did this.
>
> -Mike




RE: Boosting Fields (in index) or Queries

Posted by "Bryzek.Michael" <Mi...@uwa.unitedway.org>.
We tried two approaches:

  1) Pull data from the db in arbitrary order and then sort in the application  AFTER the retrieve. This will require two passes over the results.

  2) Add an order by clause to the select. In Oracle, you could do something like "order by decode(444,1,333,2,555,3,888,4,...)". This will force the order you want in the query from the db.
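
Approach 1 does not need a comparison sort; a map keyed by id is
enough. A minimal sketch, assuming the ranked ids from Lucene are
already in a list: one pass indexes the rows by id, a second pass
emits them in hit order. The Product class and the HitOrder and
inHitOrder names are hypothetical stand-ins, not anything from
Hibernate or Lucene.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class HitOrder {
    /** Minimal stand-in for a database row; only the id matters here. */
    static final class Product {
        final long id;
        final String name;
        Product(long id, String name) { this.id = id; this.name = name; }
    }

    /** Returns the rows rearranged to follow the order of rankedIds. */
    static List<Product> inHitOrder(List<Long> rankedIds, List<Product> rows) {
        // Pass 1: index the rows by id, whatever order the database used.
        Map<Long, Product> byId = new HashMap<>();
        for (Product p : rows) {
            byId.put(p.id, p);
        }
        // Pass 2: walk the ids in relevance order and pick up each row.
        List<Product> ordered = new ArrayList<>(rankedIds.size());
        for (Long id : rankedIds) {
            Product p = byId.get(id);
            if (p != null) {          // skip ids the database did not return
                ordered.add(p);
            }
        }
        return ordered;
    }
}
```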

FWIW, after trying both of the above in production, we changed our strategy to avoid the db hit altogether, storing everything we needed for presentation within the Lucene index. We saw a net performance increase AND simpler code when we did this.

-Mike

-----Original Message-----
From:	Jeremy Hanna [mailto:jeremy_hanna@mac.com]
Sent:	Fri 4/14/06 1:15 PM
To:	java-user@lucene.apache.org
Cc:	
Subject:	Re: Boosting Fields (in index) or Queries

Wow, I finally found out why I was getting results in the wrong order  
- I got the results in the correct order from the Lucene index.  I  
got the explanation of each of the results along with their database  
id and found the ordering mismatch.
The problem is in the database call.  I am calling:

select * from product where id in (444, 333, 555, 888);

and the ordering that comes back is not preserved.  So the results  
are correct but the ordering and hence all of the relevancy is out  
the window.  So that at least leads me to the actual problem.  Now I  
have to figure out how I'll approach reordering the results because I  
don't believe that there's any way to force the ordering of a list  
and I don't want to call a separate database query for each id (lots  
of database round-trips).

Thanks for the help Erik!

On Apr 13, 2006, at 7:13 PM, Erik Hatcher wrote:

>
> On Apr 13, 2006, at 8:55 PM, Jeremy Hanna wrote:
>> Looking at the results, the first document in the results should  
>> hopefully be near the bottom and the Explanation for this document  
>> has a Description/Details (using the toString() on the  
>> Explanation) of:
>>
>> product of:
>>   0.0 = sum of:
>>   0.0 = coord(0/7)
>>
>> So I'm kind of at a loss as to what's going on.  Am I just doing  
>> something crazy weird in my code?  I didn't find that many  
>> examples out there, so I'm kind of winging it according to what  
>> I've read in the javadocs and what examples I could find.
>
> Be sure to pass the document id, not the hit number, to explain().   
> Looks like you passed an id of an unmatched document.
>
> 	Erik




Re: Boosting Fields (in index) or Queries

Posted by Jeremy Hanna <je...@mac.com>.
Wow, I finally found out why I was getting results in the wrong order  
- I got the results in the correct order from the Lucene index.  I  
got the explanation of each of the results along with their database  
id and found the ordering mismatch.
The problem is in the database call.  I am calling:

select * from product where id in (444, 333, 555, 888);

and the ordering that comes back is not preserved.  So the results  
are correct but the ordering and hence all of the relevancy is out  
the window.  So that at least leads me to the actual problem.  Now I  
have to figure out how I'll approach reordering the results because I  
don't believe that there's any way to force the ordering of a list  
and I don't want to call a separate database query for each id (lots  
of database round-trips).

Thanks for the help Erik!

On Apr 13, 2006, at 7:13 PM, Erik Hatcher wrote:

>
> On Apr 13, 2006, at 8:55 PM, Jeremy Hanna wrote:
>> Looking at the results, the first document in the results should  
>> hopefully be near the bottom and the Explanation for this document  
>> has a Description/Details (using the toString() on the  
>> Explanation) of:
>>
>> product of:
>>   0.0 = sum of:
>>   0.0 = coord(0/7)
>>
>> So I'm kind of at a loss as to what's going on.  Am I just doing  
>> something crazy weird in my code?  I didn't find that many  
>> examples out there, so I'm kind of winging it according to what  
>> I've read in the javadocs and what examples I could find.
>
> Be sure to pass the document id, not the hit number, to explain().   
> Looks like you passed an id of an unmatched document.
>
> 	Erik




Re: Boosting Fields (in index) or Queries

Posted by Erik Hatcher <er...@ehatchersolutions.com>.
On Apr 13, 2006, at 8:55 PM, Jeremy Hanna wrote:
> Looking at the results, the first document in the results should  
> hopefully be near the bottom and the Explanation for this document  
> has a Description/Details (using the toString() on the Explanation)  
> of:
>
> product of:
>   0.0 = sum of:
>   0.0 = coord(0/7)
>
> So I'm kind of at a loss as to what's going on.  Am I just doing  
> something crazy weird in my code?  I didn't find that many examples  
> out there, so I'm kind of winging it according to what I've read in  
> the javadocs and what examples I could find.

Be sure to pass the document id, not the hit number, to explain().   
Looks like you passed an id of an unmatched document.
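
For context on the quoted Explanation: coord(0/7) says that none of
the seven SHOULD clauses matched the document that was explained,
which is why the whole product is 0.0. In the 1.9 API the document id
for the n-th hit comes from Hits.id(n). The sketch below is not
Lucene code; it only mirrors the arithmetic of the default coord
factor (matched clauses divided by total clauses), with made-up names
(CoordSketch, score).

```java
class CoordSketch {
    /** Fraction of the query's clauses that matched the document. */
    static float coord(int overlap, int maxOverlap) {
        return overlap / (float) maxOverlap;
    }

    /** Sum of the matching clause scores, scaled by the coord factor. */
    static float score(float[] matchedClauseScores, int totalClauses) {
        float sum = 0f;
        for (float s : matchedClauseScores) {
            sum += s;
        }
        return sum * coord(matchedClauseScores.length, totalClauses);
    }
}
```

With no matching clauses the sum and the coord factor are both zero,
so no query boost can lift the score of such a document.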

	Erik




Re: Boosting Fields (in index) or Queries

Posted by Jeremy Hanna <je...@mac.com>.
Thanks for the tip.  I'm trying to decipher what the explanation  
tells me right now.

Btw, here is the code that I'm currently running:

////////

QueryParser nameParser = new QueryParser("name", analyzer);
QueryParser categoryParser = new QueryParser("category", analyzer);
QueryParser descriptionParser = new QueryParser("description",  
analyzer);
QueryParser featuresParser = new QueryParser("features", analyzer);
QueryParser specificationsParser = new QueryParser("specifications",  
analyzer);
QueryParser skuParser = new QueryParser("sku", analyzer);
QueryParser modelParser = new QueryParser("model", analyzer);

descriptionParser.setDefaultOperator(QueryParser.Operator.AND);
featuresParser.setDefaultOperator(QueryParser.Operator.AND);
specificationsParser.setDefaultOperator(QueryParser.Operator.AND);

BooleanQuery booleanQuery = new BooleanQuery();
Query current = null;

current = categoryParser.parse(queryString);
current.setBoost(20.0f);
booleanQuery.add(current, BooleanClause.Occur.SHOULD);

booleanQuery.add(nameParser.parse(queryString),  
BooleanClause.Occur.SHOULD);
booleanQuery.add(descriptionParser.parse(queryString),  
BooleanClause.Occur.SHOULD);
booleanQuery.add(featuresParser.parse(queryString),  
BooleanClause.Occur.SHOULD);
booleanQuery.add(specificationsParser.parse(queryString),  
BooleanClause.Occur.SHOULD);
booleanQuery.add(skuParser.parse(queryString),  
BooleanClause.Occur.SHOULD);
booleanQuery.add(modelParser.parse(queryString),  
BooleanClause.Occur.SHOULD);


// Hits are sorted by relevance by default, so a single search call is enough.
hits = indexSearcher.search(booleanQuery);

////////

Looking at the results, the first document in the results should  
hopefully be near the bottom and the Explanation for this document  
has a Description/Details (using the toString() on the Explanation) of:

product of:
   0.0 = sum of:
   0.0 = coord(0/7)

So I'm kind of at a loss as to what's going on.  Am I just doing  
something crazy weird in my code?  I didn't find that many examples  
out there, so I'm kind of winging it according to what I've read in  
the javadocs and what examples I could find.

Thanks,
Jeremy

On Apr 13, 2006, at 6:17 PM, Erik Hatcher wrote:

> The best recommendation is to have a look at the Explanation  
> returned from IndexSearcher.explain() for a specific query and  
> document to trace how things are being scored.  Is it possible  
> you're boosting all documents by the same amount?
>
> 	Erik


Re: Boosting Fields (in index) or Queries

Posted by Erik Hatcher <er...@ehatchersolutions.com>.
The best recommendation is to have a look at the Explanation returned  
from IndexSearcher.explain() for a specific query and document to  
trace how things are being scored.  Is it possible you're boosting  
all documents by the same amount?
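
The point behind that question can be shown with plain arithmetic: if
every document's score is multiplied by the same factor, the ranking
cannot change; only a boost that affects some documents more than
others can reorder them. A toy sketch with hypothetical names
(BoostRanking, rank, boostAll) and invented scores, not Lucene's
scoring code:

```java
import java.util.ArrayList;
import java.util.List;

class BoostRanking {
    /** Returns document indexes ordered by descending score. */
    static List<Integer> rank(float[] scores) {
        List<Integer> order = new ArrayList<>();
        for (int i = 0; i < scores.length; i++) {
            order.add(i);
        }
        order.sort((a, b) -> Float.compare(scores[b], scores[a]));
        return order;
    }

    /** Multiplies every score by the same factor (a uniform boost). */
    static float[] boostAll(float[] scores, float factor) {
        float[] out = new float[scores.length];
        for (int i = 0; i < scores.length; i++) {
            out[i] = scores[i] * factor;
        }
        return out;
    }
}
```

Ranking the original scores and ranking boostAll(scores, 20.0f) give
the same order; scaling just one entry is what moves a document.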

	Erik


On Apr 13, 2006, at 6:29 PM, Jeremy Hanna wrote:

> I have a situation where I'm indexing database entries and have  
> fields such as:
>
> name
> sku
> model
> category name
> description
> features
> specifications
>
> I am trying to set a priority higher for the name, category name,  
> and description.
>
> I've tried setting the fields' boost values as I've indexed the db  
> and it seemed to have little or no result.
>
> When I tried to do the queries, I started using the  
> MultiFieldQueryParser but found that those don't have priority or  
> boost values you can set on any of the Fields at query time.  Then  
> I tried to have separate query parsers - one for each field.  That  
> way I could set a boost level for each of the queries created by  
> those query parsers.  I joined them together with a BooleanQuery  
> and all of them set to BooleanQuery.Occur.SHOULD.  I ended up  
> setting the features, specifications, and description to default to  
> Query.Operator.AND and that helped, but the boost value seems to do  
> nothing.
>
> I try to set the categoryParser's query boost to 4.0f, then 8.0f,  
> then 20.0f and have tried downgrading other queries, but the  
> results don't change at all in their order.
>
> I am using 1.9.1 and for my database I'm using hibernate to mysql 5  
> and ArrayLists with the bag mapping in hibernate.
>
> Does anyone have any thoughts or suggestions?
>
> Thanks!
>
> Jeremy

