Posted to solr-user@lucene.apache.org by Paul Rosen <pa...@performantsoftware.com> on 2009/11/06 22:24:07 UTC
de-boosting certain facets during search
Hi,
I'm using solr-ruby-0.0.8 and solr 1.4.
My data contains a faceted field called "genre". We would like one
particular genre (the one named "Citation") to show up last in the results.
I'm having trouble figuring out how to add the boost parameter to the
solr-ruby call. Here is my code:
req = Solr::Request::Standard.new(:start => start,
:rows => max,
:sort => sort_param,
:query => query,
:filter_queries => filter_queries,
:field_list => @field_list,
:facets => {:fields => @facet_fields,
:mincount => 1,
:missing => true,
:limit => -1},
:highlighting => {:field_list => ['text'],
:fragment_size => 600},
:shards => @cores)
response = @solr.send(req)
Do I just format it inside my query, like this:
query = query + "AND genre:Citation^.01"
or in filter_query, like this:
filter_queries.push("genre:Citation^.01")
or is there a hash parameter that I set?
(Note that the user can select Citation explicitly. I'll probably
special case that.)
I've tried variations of the above, but I've had no luck so far.
Thanks,
Paul
Re: de-boosting certain facets during search
Posted by Paul Rosen <pa...@performantsoftware.com>.
Thanks Erik,
Your suggestion below works great.
And we do want a particularly relevant Citation to appear higher in the
list.
I'm guessing that the value of the boost (you've given "5" in your
example) is important to getting the Citations to be just high enough.
Is there a way for me to determine, in a generic way, what a good value
for that boost would be? Since there are an infinite number of possible
queries the user can make, I don't think trial and error is particularly
useful.
Are there any rules-of-thumb for determining that number?
Re: de-boosting certain facets during search
Posted by Erik Hatcher <er...@gmail.com>.
Paul,
Inline below...
On Nov 9, 2009, at 6:28 PM, Paul Rosen wrote:
> If I could just create the desired URL, I can probably work
> backwards and construct the correct ruby call.
Right, this list will always serve you best if you take the Ruby out
of the equation. solr-ruby, while cool and all, isn't very well known
by many, but Solr URLs are universal lingo here.
> http://localhost:8983/solr/resources/select?hl.fragsize=600
> &hl=true
> &facet.field=genre
> &facet.field=archive
> &facet.limit=-1
> &qt=standard
> &start=0
> &fq=archive%3A%22blake%22
> &hl.fl=text
> &fl=uri%2Carchive%2Cdate_label%2Cgenre
> &facet=true
> &q=%28history%29
> &rows=60
> &facet.missing=true
> &facet.mincount=1
>
> What this search returns from my index is 53 hits. The first 43
> contain the genre field value "Citation" and the last 10 do not
> (they contain other values in that field.)
>
> Note: the genre field is multivalued, if that matters.
It matters if you want to sort by genre. It doesn't make sense to
sort by a multivalued field though.
> I'd like the search to put all of the objects that contain genre
> "Citation" below the 10 objects that do not contain that genre.
Are you dogmatic about them _all_ appearing below? Or might it be ok
if a Citation that has substantially better term matching than another
type of object appears ahead in the results?
> I've read the various pages on boosting, but since I'm not actively
> searching on the field that I want to put a boost value on, I'm not
> sure how to go about this.
How this is done is dependent on the query parser. You're using the
Lucene query parser. Something like this might work for you:
http://localhost:8983/solr/select?q=ipod%20%20OR%20%28ipod%20-manu:Belkin%29^5&debugQuery=true
Without the URL encoding, that is q=ipod OR (ipod -manu:Belkin)^5, where
the user's query is repeated in a second clause that boosts all
documents that are not from a particular manufacturer (this uses the
example docs that Solr ships with).
Be sure to use debugQuery=true to look at the score explanations (try
looking at the output in the wt=ruby&indent=on format for best
readability).
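For the genre case discussed in this thread, the same rewrite could be sketched in Ruby roughly as follows. This is only an illustration: the helper name deboost_query is made up here, the default boost of 5 is simply the value from the example above, and the field value is inserted as-is, so a value containing spaces or special characters would need quoting or escaping first.

```ruby
# Hypothetical helper (not part of solr-ruby): repeats the user's query in a
# second clause that excludes one field value, and boosts that second clause.
def deboost_query(user_query, field, value, boost = 5)
  base = "(#{user_query})" # parenthesize so the user's own operators stay grouped
  "#{base} OR (#{base} -#{field}:#{value})^#{boost}"
end

puts deboost_query("history", "genre", "Citation")
# prints: (history) OR ((history) -genre:Citation)^5
```

The resulting string would presumably go wherever the plain query went before (the :query option in the solr-ruby call quoted in this thread), not into a filter query.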
Additionally...
>
> Thanks for any hints.
>
> Paul Rosen wrote:
>> Hi,
>> I'm using solr-ruby-0.0.8 and solr 1.4.
>> My data contains a faceted field called "genre". We would like one
>> particular genre, (the one named "Citation") to show up last in the
>> results.
>> I'm having trouble figuring out how to add the boost parameter to
>> the solr-ruby call. Here is my code:
>> req = Solr::Request::Standard.new(:start => start,
>> :rows => max,
>> :sort => sort_param,
>> :query => query,
>> :filter_queries => filter_queries,
>> :field_list => @field_list,
>> :facets => {:fields => @facet_fields,
>> :mincount => 1,
>> :missing => true,
>> :limit => -1},
>> :highlighting => {:field_list => ['text'],
>> :fragment_size => 600},
>> :shards => @cores)
>> response = @solr.send(req)
>> Do I just format it inside my query, like this:
>> query = query + "AND genre:Citation^.01"
>> or in filter_query, like this:
>> filter_queries.push("genre:Citation^.01")
>> or is there a hash parameter that I set?
filter queries (fq) do not contribute to the score, so boosting them
makes no score difference at all.
>> (Note that the user can select Citation explicitly. I'll probably
>> special case that.)
>> I've tried variations of the above, but I've had no luck so far.
>> Thanks,
>> Paul
>
Erik
Re: de-boosting certain facets during search
Posted by Paul Rosen <pa...@performantsoftware.com>.
I'm still going around in a circle on this. I'm not sure why it's not
sinking in...
If I could just create the desired URL, I can probably work backwards
and construct the correct ruby call.
Here is the URL that I'm currently creating (I've added newlines here
for readability):
http://localhost:8983/solr/resources/select?hl.fragsize=600
&hl=true
&facet.field=genre
&facet.field=archive
&facet.limit=-1
&qt=standard
&start=0
&fq=archive%3A%22blake%22
&hl.fl=text
&fl=uri%2Carchive%2Cdate_label%2Cgenre
&facet=true
&q=%28history%29
&rows=60
&facet.missing=true
&facet.mincount=1
What this search returns from my index is 53 hits. The first 43 contain
the genre field value "Citation" and the last 10 do not (they contain
other values in that field.)
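A URL like the one above can also be built from a plain hash of parameters using only Ruby's standard library, which makes it easier to experiment with the q value independently of solr-ruby. A minimal sketch, assuming the same host, core, and parameter values as above; the helper name solr_select_url is made up for illustration:

```ruby
require "uri"

# Hypothetical helper: turns a hash of Solr parameters into a select URL.
# Array values become repeated parameters (e.g. two facet.field entries).
def solr_select_url(base, params)
  "#{base}?#{URI.encode_www_form(params)}"
end

params = {
  "q"              => "(history)",
  "fq"             => 'archive:"blake"',
  "fl"             => "uri,archive,date_label,genre",
  "start"          => 0,
  "rows"           => 60,
  "facet"          => true,
  "facet.field"    => ["genre", "archive"],
  "facet.limit"    => -1,
  "facet.mincount" => 1,
  "facet.missing"  => true,
  "hl"             => true,
  "hl.fl"          => "text",
  "hl.fragsize"    => 600,
  "qt"             => "standard"
}

url = solr_select_url("http://localhost:8983/solr/resources/select", params)
puts url
```

Only the "q" entry would need to change to try a different query; the parameters come out in a different order than in the URL above, which should not matter to Solr.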
Note: the genre field is multivalued, if that matters.
I'd like the search to put all of the objects that contain genre
"Citation" below the 10 objects that do not contain that genre.
I've read the various pages on boosting, but since I'm not actively
searching on the field that I want to put a boost value on, I'm not sure
how to go about this.
Thanks for any hints.