Posted to solr-user@lucene.apache.org by Paul Rosen <pa...@performantsoftware.com> on 2009/11/06 22:24:07 UTC

de-boosting certain facets during search

Hi,

I'm using solr-ruby-0.0.8 and solr 1.4.

My data contains a faceted field called "genre". We would like one 
particular genre (the one named "Citation") to show up last in the results.

I'm having trouble figuring out how to add the boost parameter to the 
solr-ruby call. Here is my code:

req = Solr::Request::Standard.new(:start => start,
   :rows => max,
   :sort => sort_param,
   :query => query,
   :filter_queries => filter_queries,
   :field_list => @field_list,
   :facets => {:fields => @facet_fields,
     :mincount => 1,
     :missing => true,
     :limit => -1},
   :highlighting => {:field_list => ['text'],
     :fragment_size => 600},
   :shards => @cores)

response = @solr.send(req)

Do I just format it inside my query, like this:

query = query + " AND genre:Citation^.01"

or in filter_query, like this:

filter_queries.push("genre:Citation^.01")

or is there a hash parameter that I set?

(Note that the user can select Citation explicitly. I'll probably 
special case that.)

I've tried variations of the above, but I've had no luck so far.

Thanks,
Paul

Re: de-boosting certain facets during search

Posted by Paul Rosen <pa...@performantsoftware.com>.
Thanks Erik,

Your suggestion below works great.

And we do want a particularly relevant Citation to appear higher in the 
list.

I'm guessing that the value of the boost (you've given "5" in your 
example) is important to getting the Citations to be just high enough.

Is there a way for me to determine, in a generic way, what a good value 
for that boost would be? Since there are an infinite number of possible 
queries the user can make, I don't think trial and error is particularly 
useful.

Are there any rules-of-thumb for determining that number?

Erik Hatcher wrote:
> Paul,
> 
> Inline below...
> 
> On Nov 9, 2009, at 6:28 PM, Paul Rosen wrote:
>> If I could just create the desired URL, I can probably work backwards 
>> and construct the correct ruby call.
> 
> Right, this list will always serve you best if you take the Ruby out of 
> the equation.  solr-ruby, while cool and all, isn't very well known by 
> many, but Solr URLs are universal lingo here.
> 
>> http://localhost:8983/solr/resources/select?hl.fragsize=600
>> &hl=true
>> &facet.field=genre
>> &facet.field=archive
>> &facet.limit=-1
>> &qt=standard
>> &start=0
>> &fq=archive%3A%22blake%22
>> &hl.fl=text
>> &fl=uri%2Carchive%2Cdate_label%2Cgenre
>> &facet=true
>> &q=%28history%29
>> &rows=60
>> &facet.missing=true
>> &facet.mincount=1
>>
>> What this search returns from my index is 53 hits. The first 43 
>> contain the genre field value "Citation" and the last 10 do not (they 
>> contain other values in that field.)
>>
>> Note: the genre field is multivalued, if that matters.
> 
> It matters if you want to sort by genre.  It doesn't make sense to sort 
> by a multivalued field though.
> 
>> I'd like the search to put all of the objects that contain genre 
>> "Citation" below the 10 objects that do not contain that genre.
> 
> Are you dogmatic about them _all_ appearing below?  Or might it be ok if 
> a Citation that has substantially better term matching than another type 
> of object appears ahead in the results?
> 
>> I've read the various pages on boosting, but since I'm not actively 
>> searching on the field that I want to put a boost value on, I'm not 
>> sure how to go about this.
> 
> How this is done is dependent on the query parser.  You're using the 
> Lucene query parser.  Something like this might work for you:
> 
>     http://localhost:8983/solr/select?q=ipod%20%20OR%20%28ipod%20-manu:Belkin%29^5&debugQuery=true
> 
> URL-decoded, that is q=ipod  OR (ipod -manu:Belkin)^5, where the user's 
> query is repeated in a second clause that boosts all documents that 
> are not from a particular manufacturer, using the example docs that Solr 
> ships with.
> 
> Be sure to use debugQuery=true to look at the score explanations (try 
> looking at the output in the wt=ruby&indent=on format for best 
> readability).
> 
> Additionally...
> 
> 
>>
>> Thanks for any hints.
>>
>> Paul Rosen wrote:
>>> Hi,
>>> I'm using solr-ruby-0.0.8 and solr 1.4.
>>> My data contains a faceted field called "genre". We would like one 
>>> particular genre, (the one named "Citation") to show up last in the 
>>> results.
>>> I'm having trouble figuring out how to add the boost parameter to the 
>>> solr-ruby call. Here is my code:
>>> req = Solr::Request::Standard.new(:start => start,
>>>  :rows => max,
>>>  :sort => sort_param,
>>>  :query => query,
>>>  :filter_queries => filter_queries,
>>>  :field_list => @field_list,
>>>  :facets => {:fields => @facet_fields,
>>>    :mincount => 1,
>>>    :missing => true,
>>>    :limit => -1},
>>>  :highlighting => {:field_list => ['text'],
>>>    :fragment_size => 600},
>>>    :shards => @cores)
>>> response = @solr.send(req)
>>> Do I just format it inside my query, like this:
>>> query = query + " AND genre:Citation^.01"
>>> or in filter_query, like this:
>>> filter_queries.push("genre:Citation^.01")
>>> or is there a hash parameter that I set?
> 
> Filter queries (fq) do not contribute to the score, so boosting them 
> makes no difference to the score at all.
> 
>>> (Note that the user can select Citation explicitly. I'll probably 
>>> special case that.)
>>> I've tried variations of the above, but I've had no luck so far.
>>> Thanks,
>>> Paul
>>
> 
>     Erik
> 
> 


Re: de-boosting certain facets during search

Posted by Erik Hatcher <er...@gmail.com>.
Paul,

Inline below...

On Nov 9, 2009, at 6:28 PM, Paul Rosen wrote:
> If I could just create the desired URL, I can probably work  
> backwards and construct the correct ruby call.

Right, this list will always serve you best if you take the Ruby out  
of the equation.  solr-ruby, while cool and all, isn't very well known  
by many, but Solr URLs are universal lingo here.

> http://localhost:8983/solr/resources/select?hl.fragsize=600
> &hl=true
> &facet.field=genre
> &facet.field=archive
> &facet.limit=-1
> &qt=standard
> &start=0
> &fq=archive%3A%22blake%22
> &hl.fl=text
> &fl=uri%2Carchive%2Cdate_label%2Cgenre
> &facet=true
> &q=%28history%29
> &rows=60
> &facet.missing=true
> &facet.mincount=1
>
> What this search returns from my index is 53 hits. The first 43  
> contain the genre field value "Citation" and the last 10 do not  
> (they contain other values in that field.)
>
> Note: the genre field is multivalued, if that matters.

It matters if you want to sort by genre.  It doesn't make sense to  
sort by a multivalued field though.

> I'd like the search to put all of the objects that contain genre  
> "Citation" below the 10 objects that do not contain that genre.

Are you dogmatic about them _all_ appearing below?  Or might it be ok  
if a Citation that has substantially better term matching than another  
type of object appears ahead in the results?

> I've read the various pages on boosting, but since I'm not actively  
> searching on the field that I want to put a boost value on, I'm not  
> sure how to go about this.

How this is done is dependent on the query parser.  You're using the  
Lucene query parser.  Something like this might work for you:

     http://localhost:8983/solr/select?q=ipod%20%20OR%20%28ipod%20-manu:Belkin%29^5&debugQuery=true

URL-decoded, that is q=ipod  OR (ipod -manu:Belkin)^5, where the  
user's query is repeated in a second clause that boosts all  
documents that are not from a particular manufacturer, using the example  
docs that Solr ships with.
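
A minimal Ruby sketch of building that kind of query string, applied to the genre field from this thread rather than the manu example. The helper name deboost_query is hypothetical (it is not part of solr-ruby), and the default boost of 5 is only the value from the example above, not a recommendation:

```ruby
# Hypothetical helper (not part of solr-ruby): repeat the user's query in a
# second, boosted clause that excludes documents with the given field value.
# Documents without that value match both clauses and so tend to score higher.
def deboost_query(user_query, field, value, boost = 5)
  "(#{user_query}) OR ((#{user_query}) -#{field}:\"#{value}\")^#{boost}"
end

puts deboost_query("history", "genre", "Citation")
# => (history) OR ((history) -genre:"Citation")^5
```

The resulting string should be usable as the :query value in the Solr::Request::Standard call quoted further down, though that is untested here.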

Be sure to use debugQuery=true to look at the score explanations (try  
looking at the output in the wt=ruby&indent=on format for best  
readability).

Additionally...


>
> Thanks for any hints.
>
> Paul Rosen wrote:
>> Hi,
>> I'm using solr-ruby-0.0.8 and solr 1.4.
>> My data contains a faceted field called "genre". We would like one  
>> particular genre, (the one named "Citation") to show up last in the  
>> results.
>> I'm having trouble figuring out how to add the boost parameter to  
>> the solr-ruby call. Here is my code:
>> req = Solr::Request::Standard.new(:start => start,
>>  :rows => max,
>>  :sort => sort_param,
>>  :query => query,
>>  :filter_queries => filter_queries,
>>  :field_list => @field_list,
>>  :facets => {:fields => @facet_fields,
>>    :mincount => 1,
>>    :missing => true,
>>    :limit => -1},
>>  :highlighting => {:field_list => ['text'],
>>    :fragment_size => 600},
>>    :shards => @cores)
>> response = @solr.send(req)
>> Do I just format it inside my query, like this:
>> query = query + " AND genre:Citation^.01"
>> or in filter_query, like this:
>> filter_queries.push("genre:Citation^.01")
>> or is there a hash parameter that I set?

Filter queries (fq) do not contribute to the score, so boosting them  
makes no difference to the score at all.
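
To illustrate the difference, here is a rough Ruby sketch that assembles the request parameters with the boosted clause in q (scored) and the archive restriction in fq (unscored). The helper name select_params is hypothetical, and it only uses Ruby's standard URI library, not solr-ruby:

```ruby
require "uri"

# Hypothetical helper: build the query string for a /select request.
# scored_query goes in q, where boosts affect ranking; filters go in fq,
# where they only restrict the result set and any boost would be ignored.
def select_params(scored_query, filters)
  URI.encode_www_form(
    "q" => scored_query,
    "fq" => filters,          # an array yields one fq parameter per entry
    "debugQuery" => "true"    # include score explanations in the response
  )
end

puts select_params("(history) OR ((history) -genre:Citation)^5",
                   ['archive:"blake"'])
```

Appending the printed string to http://localhost:8983/solr/resources/select? should give a URL that can be tried directly in a browser.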

>> (Note that the user can select Citation explicitly. I'll probably  
>> special case that.)
>> I've tried variations of the above, but I've had no luck so far.
>> Thanks,
>> Paul
>

	Erik



Re: de-boosting certain facets during search

Posted by Paul Rosen <pa...@performantsoftware.com>.
I'm still going around in a circle on this. I'm not sure why it's not 
sinking in...

If I could just create the desired URL, I can probably work backwards 
and construct the correct ruby call.

Here is the URL that I'm currently creating (I've added newlines here 
for readability):

http://localhost:8983/solr/resources/select?hl.fragsize=600
&hl=true
&facet.field=genre
&facet.field=archive
&facet.limit=-1
&qt=standard
&start=0
&fq=archive%3A%22blake%22
&hl.fl=text
&fl=uri%2Carchive%2Cdate_label%2Cgenre
&facet=true
&q=%28history%29
&rows=60
&facet.missing=true
&facet.mincount=1

What this search returns from my index is 53 hits. The first 43 contain 
the genre field value "Citation" and the last 10 do not (they contain 
other values in that field.)

Note: the genre field is multivalued, if that matters.

I'd like the search to put all of the objects that contain genre 
"Citation" below the 10 objects that do not contain that genre.

I've read the various pages on boosting, but since I'm not actively 
searching on the field that I want to put a boost value on, I'm not sure 
how to go about this.

Thanks for any hints.
