Posted to solr-user@lucene.apache.org by CIF Search <ci...@gmail.com> on 2009/02/26 12:04:29 UTC

custom reranking

We have a distributed index consisting of several shards. Some documents
may be repeated across shards. We want to remove the duplicate records
from the documents returned by the shards, then re-order the results by
grouping them with a clustering algorithm and reranking the documents
within each cluster by the log of a particular returned field value.
How do we go about achieving this? Should we write this logic by
implementing QueryResponseWriter? Also, if we remove duplicate records, the
total number of records actually returned is less than what was asked for
in the query.

Regards,
CI
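
The shortfall mentioned in the question can be pictured with a small
sketch. This is plain Python over already-merged shard results, not Solr
code; the "id" and "score" keys, the sample documents and the function name
are all assumptions made for illustration.

```python
def dedupe_and_trim(merged_docs, rows):
    """Drop duplicate documents by 'id', keeping the best-scoring copy,
    then return at most `rows` documents ordered by descending score."""
    best = {}
    for doc in merged_docs:
        kept = best.get(doc["id"])
        if kept is None or doc["score"] > kept["score"]:
            best[doc["id"]] = doc
    ranked = sorted(best.values(), key=lambda d: d["score"], reverse=True)
    return ranked[:rows]


# Hypothetical output of two shards; document "a" lives on both.
shard1 = [{"id": "a", "score": 2.0}, {"id": "b", "score": 1.5}]
shard2 = [{"id": "a", "score": 1.8}, {"id": "c", "score": 1.0}]

# The client wanted 4 rows, but only 3 unique documents survive.
page = dedupe_and_trim(shard1 + shard2, rows=4)
```

Fetching more rows per shard than the client asked for, and trimming after
de-duplication, is one way to reduce the shortfall. How much to over-fetch
depends on how many duplicates are expected.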

Re: custom reranking

Posted by Grant Ingersoll <gs...@apache.org>.
Yeah, that is a good idea. Some of it is already available: part through
editorial boosting, part through function queries, a custom similarity
factory, custom sorting and other features.

User feedback and click log analysis would be nice features to have as  
well.

http://wiki.apache.org/solr/HowToContribute   ;-)

On Apr 7, 2009, at 6:38 AM, CIF Search wrote:

> Would it not be a good idea to provide Ranking as solr plugin, in which
> users can write their custom ranking algorithms and reorder the results
> returned by Solr in whichever way they need. It may also help Solr users to
> incorporate learning (from search user feedback - such as click logs), and
> reorder the results returned by Solr accordingly and not depend purely on
> relevance as we do today.
>
> Regards,
> CI

--------------------------
Grant Ingersoll
http://www.lucidimagination.com/

Search the Lucene ecosystem (Lucene/Solr/Nutch/Mahout/Tika/Droids)  
using Solr/Lucene:
http://www.lucidimagination.com/search


Re: custom reranking

Posted by CIF Search <ci...@gmail.com>.
Would it not be a good idea to provide ranking as a Solr plugin, so that
users can write their own ranking algorithms and reorder the results
returned by Solr in whichever way they need? It could also help Solr users
incorporate learning from search user feedback (such as click logs) and
reorder the results accordingly, rather than depending purely on relevance
as we do today.

Regards,
CI
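
As a rough illustration of the kind of reordering meant here, the sketch
below blends the relevance score with a boost derived from click counts.
It is plain Python applied to results after they come back, not an existing
Solr plugin interface; the function name, the `weight` parameter and the
click counts are hypothetical.

```python
import math


def rerank(docs, clicks, weight=0.5):
    """Reorder docs by relevance score plus a click-based boost.

    `clicks` maps document id -> click count from a (hypothetical) click log.
    log1p keeps the boost at zero for documents with no clicks.
    """
    def blended(doc):
        return doc["score"] + weight * math.log1p(clicks.get(doc["id"], 0))

    return sorted(docs, key=blended, reverse=True)


results = [{"id": "a", "score": 1.2}, {"id": "b", "score": 1.0}]
click_log = {"b": 20}  # made-up counts
reordered = rerank(results, click_log)
```

With an empty click log the order is the plain relevance order, so the
boost only matters where feedback exists.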



Re: custom reranking

Posted by Grant Ingersoll <gs...@apache.org>.
On Feb 26, 2009, at 11:16 PM, CIF Search wrote:

> I believe the query component will generate the query in such a way that i
> get the results that i want, but not process the returned results, is that
> correct? Is there a way in which i can group the returned results, and rank
> each group separately, and return the results together. In other words which
> component do I need to write to reorder the returned results as per my
> requirements.

I'd have a look at what I did for the Clustering patch, i.e. SOLR-769. It
may even be the case that you can simply plug in your own SolrClusterer, or
whatever it's called. Or, if it doesn't quite fit your needs, give me
feedback or a patch and we can update it. I'm definitely open to ideas on it.
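
Independent of how the cluster labels are produced, the "group, then rank
each group" step asked about above can be sketched in plain Python. Nothing
here is the SOLR-769 API: the `cluster_of` mapping, the "popularity" field
and the function name are assumptions for illustration.

```python
import math
from collections import defaultdict


def rerank_within_clusters(docs, cluster_of, field):
    """Group docs by cluster label, then sort each group by log(field) descending.

    Clusters are emitted in order of first appearance in the incoming ranking.
    Assumes every value of `field` is positive so the logarithm is defined.
    """
    groups = defaultdict(list)
    for doc in docs:
        groups[cluster_of[doc["id"]]].append(doc)
    ordered = []
    for members in groups.values():
        members.sort(key=lambda d: math.log(d[field]), reverse=True)
        ordered.extend(members)
    return ordered


docs = [
    {"id": "d1", "popularity": 10},
    {"id": "d2", "popularity": 1000},
    {"id": "d3", "popularity": 50},
    {"id": "d4", "popularity": 5},
]
labels = {"d1": "x", "d2": "y", "d3": "x", "d4": "y"}  # hypothetical clusters
ordered = rerank_within_clusters(docs, labels, "popularity")
```

Because the logarithm is monotonic, sorting by log(field) alone gives the
same order as sorting by the raw value. The log only changes the outcome
once it is combined with another score.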


> Also, the deduplication patch seems interesting, but it doesnt appear to be
> expected to work across multiple shards.

Yeah, that does seem a bit tricky. Since Solr doesn't support distributed
indexing, de-duplication across shards would be hard to support just yet.
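
Until something like that works across shards, one workaround is to
de-duplicate on the client after the shard results are merged. A minimal
sketch, assuming duplicates can be recognised by hashing a few field values;
the field names and function names are illustrative and not part of Solr or
of the patch.

```python
import hashlib


def signature(doc, fields):
    """Hash the chosen field values into a hex digest used as a duplicate key."""
    joined = "\x1f".join(str(doc.get(f, "")) for f in fields)
    return hashlib.md5(joined.encode("utf-8")).hexdigest()


def drop_duplicates(merged_docs, fields):
    """Keep the first document seen for each signature, preserving order."""
    seen = set()
    unique = []
    for doc in merged_docs:
        sig = signature(doc, fields)
        if sig not in seen:
            seen.add(sig)
            unique.append(doc)
    return unique


# Hypothetical merged results: the first two differ only in their shard-local id.
merged = [
    {"id": "s1-7", "title": "Solr", "body": "search"},
    {"id": "s2-3", "title": "Solr", "body": "search"},
    {"id": "s2-9", "title": "Lucene", "body": "library"},
]
unique = drop_duplicates(merged, ["title", "body"])
```

An exact hash only catches documents whose chosen fields match exactly;
near-duplicates would need a fuzzier signature.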





Re: custom reranking

Posted by CIF Search <ci...@gmail.com>.
I believe the query component will generate the query in such a way that I
get the results that I want, but will not process the returned results. Is
that correct? Is there a way to group the returned results, rank each group
separately, and return the results together? In other words, which
component do I need to write to reorder the returned results as per my
requirements?

Also, the deduplication patch seems interesting, but it doesn't appear to be
designed to work across multiple shards.

Regards,
CI


Re: custom reranking

Posted by Grant Ingersoll <gs...@apache.org>.
On Feb 26, 2009, at 6:04 AM, CIF Search wrote:

> We have a distributed index consisting of several shards. There could be
> some documents repeated across shards. We want to remove the duplicate
> records from the documents returned from the shards, and re-order the
> results by grouping them on the basis of a clustering algorithm and
> reranking the documents within a cluster on the basis of log of a particular
> returned field value.


I think you would have to implement your own QueryComponent. However, you
may be able to get away with implementing/using Solr's FunctionQuery
capabilities.

FieldCollapsing is also a likely source of inspiration/help
(http://www.lucidimagination.com/search/?q=Field+Collapsing#/s:email,issues).

As a side note, have you looked at
http://issues.apache.org/jira/browse/SOLR-769 ?

You might also have a look at the de-duplication patch that is working its
way through dev: http://wiki.apache.org/solr/Deduplication
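
For the FunctionQuery route, a request might look roughly like the sketch
below. It assumes the `_val_` hook for embedding a function query, the
`log()` function and the `shards` parameter for distributed search, which
should be checked against the Solr version in use. The host names, the
"popularity" field and the query text are made up.

```python
from urllib.parse import urlencode, parse_qs

# Everything below is illustrative: hosts, field name and query text are
# assumptions, not values taken from this thread.
params = {
    "q": 'ipod _val_:"log(popularity)"',          # add log(field) to the score
    "shards": "host1:8983/solr,host2:8983/solr",  # fan the query out to shards
    "fl": "id,score,popularity",
    "rows": 20,
}
query_string = urlencode(params)
url = "http://host1:8983/solr/select?" + query_string
```

This only folds the field value into the relevance score. It does not
remove duplicates or group results, so those steps would still need a
custom component or client-side processing.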


> How do we go about achieving this? Should we write this logic by
> implementing QueryResponseWriter. Also if we remove duplicate records, the
> total number of records that are actually returned are less than what were
> asked for in the query.
>
> Regards,
> CI
