Posted to solr-user@lucene.apache.org by Robert Brown <ro...@intelcompute.com> on 2011/10/06 15:58:24 UTC

negative boosts for docs with common field value

Hi,

For the sake of simplicity, I have an index with docs containing the 
following fields:

Title
Description
Author

Some searches will obviously be saturated by docs from any given 
author if they've simply written more.

I'd like to give a negative boost to these matches, thereby making 
sure that one author doesn't saturate the results just because they've 
written 500 documents, compared to others who may have written only 2-3 
documents.

The actual author value doesn't matter, I just want to bring down the 
score of docs by any common author to give more varied results.

What's the easiest approach for this, and is it even possible at query 
time?  I could do this at index time but would prefer a Solr solution.

Solr 3.4 using edismax handler

Thanks,
Rob


Re: negative boosts for docs with common field value

Posted by Markus Jelsma <ma...@openindex.io>.
http://wiki.apache.org/solr/SolrRelevancyFAQ#How_do_I_give_a_very_low_boost_to_documents_that_match_my_query


Re: negative boosts for docs with common field value

Posted by Chris Hostetter <ho...@fucit.org>.
: The setup for this question was to simplify the actual environment,
: we're not actually demoting popular authors.

Well, the better you describe your problem in terms of your *actual* goal, 
the more likely people can help give you applicable answers...

https://people.apache.org/~hossman/#xyproblem
XY Problem

Your question appears to be an "XY Problem" ... that is: you are dealing
with "X", you are assuming "Y" will help you, and you are asking about "Y"
without giving more details about the "X" so that we can understand the
full issue.  Perhaps the best solution doesn't involve "Y" at all?
See Also: http://www.perlmonks.org/index.pl?node_id=542341



-Hoss

Re: negative boosts for docs with common field value

Posted by Rob Brown <ro...@intelcompute.com>.
The setup for this question was to simplify the actual environment,
we're not actually demoting popular authors.

Perhaps index-time (negative) boosts are indeed the only way.


-- 

IntelCompute
Web Design and Online Marketing

http://www.intelcompute.com


-----Original Message-----
From: Chris Hostetter <ho...@fucit.org>
Reply-to: solr-user@lucene.apache.org
To: solr-user@lucene.apache.org
Subject: Re: negative boosts for docs with common field value
Date: Tue, 11 Oct 2011 15:37:03 -0700 (PDT)

: Some searches will obviously be saturated by docs from any given author if
: they've simply written more.
: 
: I'd like to give a negative boost to these matches, there-by making sure that
: 1 Author doesn't saturate the results just because they've written 500
: documents, compared to others who may have only written 2-3 documents.
: 
: The actual author value doesn't matter, I just want to bring down the score of
: docs by any common author to give more varied results.
: 
: What's the easiest approach for this, and is it even possible at query time?
: I could do this at index time but would prefer a Solr solution.

w/o a custom plugin, the only way i know of to do something like this 
would be to index a numeric "author_prolificness" field in each doc and 
use that as the basis of a function query.
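
For illustration only, a minimal Python sketch of that idea: count documents 
per author before indexing, store the count in "author_prolificness", and 
multiply the score by a decreasing function of it at query time. The sample 
docs, the qf fields and the recip(author_prolificness,1,1,1) curve are 
assumptions made for the example, not something tested against this index; 
"boost" is the multiplicative function parameter of edismax.

```python
from collections import Counter
from urllib.parse import urlencode

# Hypothetical sample docs; field names follow the thread.
docs = [
    {"id": "1", "Title": "Nightfall", "Author": "Isaac Asimov"},
    {"id": "2", "Title": "Foundation", "Author": "Isaac Asimov"},
    {"id": "3", "Title": "I, Robot", "Author": "Isaac Asimov"},
    {"id": "4", "Title": "Nightfall", "Author": "Stephen Leather"},
]


def add_prolificness(docs):
    """Return copies of docs with a per-author document count added."""
    counts = Counter(d["Author"] for d in docs)
    return [dict(d, author_prolificness=counts[d["Author"]]) for d in docs]


def recip(x, m, a, b):
    """Same formula as Solr's recip(x,m,a,b): a / (m*x + b)."""
    return a / (m * x + b)


def demotion_params(user_query):
    """Request parameters for an edismax query with a multiplicative demotion."""
    return {
        "defType": "edismax",
        "q": user_query,
        "qf": "Title Description",  # assumed query fields
        "boost": "recip(author_prolificness,1,1,1)",  # assumed curve
    }


enriched = add_prolificness(docs)
query_string = urlencode(demotion_params("nightfall"))
```

With these numbers an author with 3 docs gets a multiplier of 1/(3+1) = 0.25 
and an author with 1 doc gets 0.5, which is exactly the effect questioned in 
the paragraphs that follow.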

but honestly: i *really* don't think you want to do this - not if you are 
dealing with real user queries (maybe if this is for some synthetically 
generated "related documents" or "interesting documents" query)

Imagine a user is searching for a *very* specific title (e.g. "Nightfall") 
by a very prolific author ("Isaac Asimov").  What you're describing would 
penalize the desired match just because the author is prolific -- even if 
the user types in the exact title of a document, so that some much more 
esoteric document with the same title by an author who has written nothing 
else ("Stephen Leather") would likely score higher.


I mean: if someone types in "Romeo and Juliet" do you really want to score 
documents by "Shakespeare" lower than documents by "Stanley W. Wells" just 
because Wells has written fewer total books?



-Hoss


Re: negative boosts for docs with common field value

Posted by Otis Gospodnetic <ot...@yahoo.com>.
Robert,

We've handled situations like this before by writing a custom Solr SearchComponent that acts as a diversifier with pluggable diversification algorithms.  Maybe something like that would work for you, too?
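
As a rough illustration of what one such diversification algorithm could look like, here is a hypothetical greedy re-ranker written in Python rather than as a Java SearchComponent. It is not the component mentioned above; the function name, the penalty value and the result shape (dicts with "score" and "Author") are all assumptions. It reorders results without dropping any, so the result count is unchanged.

```python
from collections import defaultdict


def diversify(results, penalty=0.5):
    """Greedy re-rank: each pick divides a doc's score by
    (1 + penalty * number of docs already picked from the same author)."""
    remaining = list(results)
    picked_per_author = defaultdict(int)
    ordered = []
    while remaining:
        best = max(
            remaining,
            key=lambda d: d["score"] / (1.0 + penalty * picked_per_author[d["Author"]]),
        )
        remaining.remove(best)
        picked_per_author[best["Author"]] += 1
        ordered.append(best)
    return ordered


# Hypothetical result list, already sorted by relevance score.
results = [
    {"id": "a1", "Author": "a", "score": 10.0},
    {"id": "a2", "Author": "a", "score": 9.0},
    {"id": "a3", "Author": "a", "score": 8.0},
    {"id": "b1", "Author": "b", "score": 7.0},
    {"id": "c1", "Author": "c", "score": 5.0},
]
reranked_ids = [d["id"] for d in diversify(results)]
```

The loop is quadratic in the number of docs, so a sketch like this would only make sense on the top N results of a page, not on the full result set.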

Otis
----

Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
Lucene ecosystem search :: http://search-lucene.com/


----- Original Message -----
> From: Robert Brown <ro...@intelcompute.com>
> To: solr-user@lucene.apache.org
> Cc: 
> Sent: Thursday, October 6, 2011 10:31 AM
> Subject: Re: negative boosts for docs with common field value
> 
> We don't want to limit the number of results coming back, so
> unfortunately grouping doesn't quite fix it, plus it would, by nature,
> group docs by a particular Author together which might not necessarily
> be adjacent.
> 
> 
> 
> On Thu, 6 Oct 2011 07:16:48 -0700 (PDT), Ahmet Arslan
> <io...@yahoo.com> wrote:
>>  You can consider grouping results by author name. Display 2-3 results
>>  per author, and put a link saying "see remaining xxx documents of this
>>  author"
>> 
>>  http://wiki.apache.org/solr/FieldCollapsing
>

Re: negative boosts for docs with common field value

Posted by Robert Brown <ro...@intelcompute.com>.
We don't want to limit the number of results coming back, so
unfortunately grouping doesn't quite fix it, plus it would, by nature,
group docs by a particular Author together which might not necessarily
be adjacent.



On Thu, 6 Oct 2011 07:16:48 -0700 (PDT), Ahmet Arslan
<io...@yahoo.com> wrote:
> You can consider grouping results by author name. Display 2-3 results
> per author, and put a link saying "see remaining xxx documents of this
> author"
> 
> http://wiki.apache.org/solr/FieldCollapsing


Re: negative boosts for docs with common field value

Posted by Ahmet Arslan <io...@yahoo.com>.
You can consider grouping results by author name. Display 2-3 results per author, and put a link saying "see remaining xxx documents of this author"

http://wiki.apache.org/solr/FieldCollapsing
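
A minimal sketch of the request parameters for that, built in Python for illustration. The group, group.field and group.limit parameters are the ones described on the FieldCollapsing page; the field name "Author" comes from the original post, while the helper names and the response shape used for the "see remaining" count are assumptions.

```python
from urllib.parse import urlencode


def grouping_params(user_query, per_author=3):
    """Request parameters that group results by author, a few docs per group."""
    return {
        "defType": "edismax",
        "q": user_query,
        "group": "true",
        "group.field": "Author",
        "group.limit": str(per_author),
    }


def remaining_count(group):
    """Docs of this author not shown, assuming each group carries a
    doclist with numFound and the returned docs."""
    doclist = group["doclist"]
    return doclist["numFound"] - len(doclist["docs"])


query_string = urlencode(grouping_params("nightfall"))

# Hypothetical group as it might appear in a grouped response.
sample_group = {
    "groupValue": "Isaac Asimov",
    "doclist": {"numFound": 500, "docs": [{"id": "1"}, {"id": "2"}, {"id": "3"}]},
}
hidden = remaining_count(sample_group)
```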