Posted to solr-user@lucene.apache.org by Robert Brown <ro...@intelcompute.com> on 2011/10/06 15:58:24 UTC
negative boosts for docs with common field value
Hi,
For the sake of simplicity, I have an index with docs containing the
following fields:
Title
Description
Author
Some searches will obviously be saturated by docs from any given
author if they've simply written more.
I'd like to give a negative boost to these matches, thereby making
sure that one author doesn't saturate the results just because they've
written 500 documents, compared to others who may have written only 2-3
documents.
The actual author value doesn't matter; I just want to bring down the
score of docs by any common author to give more varied results.
What's the easiest approach for this, and is it even possible at query
time? I could do this at index time but would prefer a Solr solution.
Solr 3.4 using edismax handler
Thanks,
Rob
Re: negative boosts for docs with common field value
Posted by Markus Jelsma <ma...@openindex.io>.
http://wiki.apache.org/solr/SolrRelevancyFAQ#How_do_I_give_a_very_low_boost_to_documents_that_match_my_query
Re: negative boosts for docs with common field value
Posted by Chris Hostetter <ho...@fucit.org>.
: The setup for this question was to simplify the actual environment,
: we're not actually demoting popular authors.
Well, the better you describe your problem in terms of your *actual* goal,
the more likely people can help give you applicable answers...
https://people.apache.org/~hossman/#xyproblem
XY Problem
Your question appears to be an "XY Problem" ... that is: you are dealing
with "X", you are assuming "Y" will help you, and you are asking about "Y"
without giving more details about the "X" so that we can understand the
full issue. Perhaps the best solution doesn't involve "Y" at all?
See Also: http://www.perlmonks.org/index.pl?node_id=542341
-Hoss
Re: negative boosts for docs with common field value
Posted by Rob Brown <ro...@intelcompute.com>.
The setup for this question was to simplify the actual environment,
we're not actually demoting popular authors.
Perhaps index-time (negative) boosts are indeed the only way.
--
IntelCompute
Web Design and Online Marketing
http://www.intelcompute.com
-----Original Message-----
From: Chris Hostetter <ho...@fucit.org>
Reply-to: solr-user@lucene.apache.org
To: solr-user@lucene.apache.org
Subject: Re: negative boosts for docs with common field value
Date: Tue, 11 Oct 2011 15:37:03 -0700 (PDT)
: Some searches will obviously be saturated by docs from any given author if
: they've simply written more.
:
: I'd like to give a negative boost to these matches, thereby making sure that
: one author doesn't saturate the results just because they've written 500
: documents, compared to others who may have only written 2-3 documents.
:
: The actual author value doesn't matter, I just want to bring down the score of
: docs by any common author to give more varied results.
:
: What's the easiest approach for this, and is it even possible at query time?
: I could do this at index time but would prefer a Solr solution.
w/o a custom plugin, the only way i know of to do something like this
would be to index a numeric "author_prolificness" field in each doc and
use that as the basis of a function query.
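A minimal sketch of what that could look like from a client, assuming a numeric field named author_prolificness has been indexed on every doc (the field name comes from the suggestion above; the constants 1,10,10 and the qf list are illustrative assumptions, not tuned values). It relies on edismax's multiplicative "boost" parameter and Solr's recip(x,m,a,b) function, which evaluates to a / (m*x + b):

```python
from urllib.parse import urlencode


def recip(x, m=1.0, a=10.0, b=10.0):
    """Local mirror of Solr's recip(x,m,a,b) = a / (m*x + b), for inspection only."""
    return a / (m * x + b)


def build_params(user_query):
    """Request parameters for an edismax query damped by author prolificness."""
    return {
        "defType": "edismax",
        "q": user_query,
        # Assumed query fields, taken from the field list in the original post.
        "qf": "Title Description",
        # edismax multiplies each doc's relevance score by this function value,
        # so docs whose author has many documents get a smaller multiplier.
        "boost": "recip(author_prolificness,1,10,10)",
    }


# URL-encoded query string to append to the select handler URL.
query_string = urlencode(build_params("nightfall"))

# Multiplier seen by an author with 2 docs versus one with 500 docs.
light, heavy = recip(2), recip(500)
```

With these constants an author with 2 documents keeps most of the score (10/12) while one with 500 documents keeps very little (10/510), which is exactly the kind of penalty the rest of this message argues against for real user queries.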
but honestly: i *really* don't think you want to do this - not if you are
dealing with real user queries (maybe if this is for some synthetically
generated "related documents" or "interesting documents" query)
Imagine a user is searching for a *very* specific title (e.g. "Nightfall")
by a very prolific author ("Isaac Asimov"). What you're describing would
penalize the desired match just because the author is prolific -- even if
the user types in the exact title of a document, so that some much more
esoteric document with the same title by an author who has written nothing
else ("Stephen Leather") would likely score higher.
I mean: if someone types in "Romeo and Juliet" do you really want to score
documents by "Shakespeare" lower than documents by "Stanley W. Wells" just
because Wells has written fewer total books?
-Hoss
Re: negative boosts for docs with common field value
Posted by Otis Gospodnetic <ot...@yahoo.com>.
Robert,
We've handled situations like this before by writing a custom Solr SearchComponent that acts as a diversifier with pluggable diversification algorithms. Maybe something like that would work for you, too?
Otis
----
Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
Lucene ecosystem search :: http://search-lucene.com/
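The diversifier idea above is not spelled out in the thread, so the following is only a sketch of one possible diversification algorithm, written as client-side Python rather than as a Solr SearchComponent (which would be Java). The function name, the "score" and "author" keys, and the decay factor are all hypothetical. Each additional document by an author already seen has its score multiplied by a further decay factor, so no results are dropped and nothing is grouped:

```python
from collections import defaultdict


def diversify(docs, key="author", decay=0.5):
    """Re-rank docs so repeated values of `key` are progressively demoted.

    docs: list of dicts, each with a numeric "score" and a `key` entry.
    Returns a new list sorted by the adjusted score; inputs are not mutated.
    """
    seen = defaultdict(int)  # how many docs per author have been emitted so far
    rescored = []
    # Walk the docs in original relevance order, best first.
    for doc in sorted(docs, key=lambda d: d["score"], reverse=True):
        penalty = decay ** seen[doc[key]]  # 1, decay, decay^2, ...
        seen[doc[key]] += 1
        rescored.append({**doc, "adjusted": doc["score"] * penalty})
    return sorted(rescored, key=lambda d: d["adjusted"], reverse=True)


results = [
    {"id": "a1", "author": "A", "score": 10.0},
    {"id": "a2", "author": "A", "score": 9.0},
    {"id": "a3", "author": "A", "score": 8.0},
    {"id": "b1", "author": "B", "score": 7.0},
]
reranked = [d["id"] for d in diversify(results)]
```

Because the penalty depends only on how many docs by the same author rank higher for this query, the best match by a prolific author is never demoted, which avoids the "Nightfall" problem raised elsewhere in the thread.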
----- Original Message -----
> From: Robert Brown <ro...@intelcompute.com>
> To: solr-user@lucene.apache.org
> Cc:
> Sent: Thursday, October 6, 2011 10:31 AM
> Subject: Re: negative boosts for docs with common field value
>
> We don't want to limit the number of results coming back, so
> unfortunately grouping doesn't quite fix it. It would also, by nature,
> pull docs by a particular author together even when they would not
> otherwise be adjacent in the results.
>
>
>
> On Thu, 6 Oct 2011 07:16:48 -0700 (PDT), Ahmet Arslan
> <io...@yahoo.com> wrote:
>>
>> You can consider grouping results by author name. Display 2-3 results
>> per author, and put a link saying "see remaining xxx documents of this
>> author"
>>
>> http://wiki.apache.org/solr/FieldCollapsing
>
Re: negative boosts for docs with common field value
Posted by Robert Brown <ro...@intelcompute.com>.
We don't want to limit the number of results coming back, so
unfortunately grouping doesn't quite fix it. It would also, by nature,
pull docs by a particular author together even when they would not
otherwise be adjacent in the results.
On Thu, 6 Oct 2011 07:16:48 -0700 (PDT), Ahmet Arslan
<io...@yahoo.com> wrote:
> You can consider grouping results by author name. Display 2-3 results
> per author, and put a link saying "see remaining xxx documents of this
> author"
>
> http://wiki.apache.org/solr/FieldCollapsing
Re: negative boosts for docs with common field value
Posted by Ahmet Arslan <io...@yahoo.com>.
You can consider grouping results by author name. Display 2-3 results per author, and put a link saying "see remaining xxx documents of this author"
http://wiki.apache.org/solr/FieldCollapsing
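For reference, the grouping suggestion might translate into request parameters roughly like the sketch below. It assumes the Author field from the original post is a single-valued indexed field suitable for grouping, and the limit of 3 per author is just the "2-3 results" figure from this message; group, group.field and group.limit are the result grouping parameters described on the wiki page linked above.

```python
from urllib.parse import urlencode


def grouped_params(user_query, per_author=3):
    """Request parameters that collapse results to a few docs per author."""
    return {
        "defType": "edismax",
        "q": user_query,
        "group": "true",            # turn result grouping on
        "group.field": "Author",    # assumed single-valued field to group by
        "group.limit": str(per_author),  # docs returned inside each group
    }


# URL-encoded query string to append to the select handler URL.
query_string = urlencode(grouped_params("nightfall"))
```

The response then comes back as one group per author, each holding at most per_author docs, so the "see remaining xxx documents" link would need a follow-up query filtered on that author.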