Posted to solr-user@lucene.apache.org by Cheng Zhang <zh...@yahoo.com> on 2009/02/26 00:54:48 UTC

unique result

Is it possible to have Solr remove duplicate query results?

For example, instead of returning

<result name="response" numFound="572" start="0">
 <doc>  <str name="productGroup_t_i_s_nm">Wireless</str> </doc>
 <doc>  <str name="productGroup_t_i_s_nm">Wireless</str> </doc>
 <doc>  <str name="productGroup_t_i_s_nm">Wireless</str> </doc>
 <doc>  <str name="productGroup_t_i_s_nm">Video Games</str> </doc>
 <doc>  <str name="productGroup_t_i_s_nm">Video Games</str> </doc>
</result>

return:
  <result name="response" numFound="572" start="0">
   <doc>  <str name="productGroup_t_i_s_nm">Wireless</str> </doc>
   <doc>  <str name="productGroup_t_i_s_nm">Video Games</str> </doc>
  </result>

Thanks a lot,
Kevin


Re: unique result

Posted by "ristretto.rb" <ri...@gmail.com>.
FWIW... We run a hash of the content and other bits of our docs, and
then remove duplicates according to specific algorithms. (Exactly the
same page content can clearly be hosted on many different URLs and
domains.) Then the chosen ones are indexed. Though we toss the
synonyms in the index too, so we know all its other "names."
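
A minimal sketch of the hash-then-dedupe idea described above, for illustration only. It is not the poster's actual code: the field names ("url", "content", "aliases"), the whitespace/case normalization, and the choice of SHA-1 are all assumptions, and the real "specific algorithms" are not given in this thread.

```python
import hashlib
from collections import OrderedDict


def content_signature(text):
    """Hash the content after collapsing whitespace and lower-casing it.

    The normalization rule is an assumption; pick whatever defines
    "same content" for your documents.
    """
    normalized = " ".join(text.split()).lower()
    return hashlib.sha1(normalized.encode("utf-8")).hexdigest()


def dedupe(docs):
    """Keep the first doc seen for each signature.

    URLs of later duplicates are recorded on the kept doc under the
    hypothetical "aliases" key, so the other "names" are not lost.
    """
    chosen = OrderedDict()
    for doc in docs:
        sig = content_signature(doc["content"])
        if sig not in chosen:
            kept = dict(doc)          # copy so the input is not mutated
            kept["signature"] = sig
            kept["aliases"] = []
            chosen[sig] = kept
        else:
            chosen[sig]["aliases"].append(doc["url"])
    return list(chosen.values())
```

Only the docs returned by dedupe() would then be sent to the indexer, with the aliases stored as an extra multi-valued field if that is wanted.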

cheers
gene

Gene Campbell
http://www.picante.co.nz
gene at picante point co point nz

http://www.travelbeen.com - "the social search engine for travel"

On Fri, Feb 27, 2009 at 5:53 AM, Cheng Zhang <zh...@yahoo.com> wrote:
> It's exactly what I'm looking for. Thank you Grant.

Re: unique result

Posted by Cheng Zhang <zh...@yahoo.com>.
It's exactly what I'm looking for. Thank you Grant. 


----- Original Message ----
From: Grant Ingersoll <gs...@apache.org>
To: solr-user@lucene.apache.org
Sent: Thursday, February 26, 2009 6:56:22 AM
Subject: Re: unique result

I presume these all have different unique ids?

If you can address it at indexing time, then have a look at https://issues.apache.org/jira/browse/SOLR-799

Otherwise, you might look at https://issues.apache.org/jira/browse/SOLR-236


Re: unique result

Posted by Grant Ingersoll <gs...@apache.org>.
I presume these all have different unique ids?

If you can address it at indexing time, then have a look at https://issues.apache.org/jira/browse/SOLR-799

Otherwise, you might look at https://issues.apache.org/jira/browse/SOLR-236
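
For the query-time route, a hedged sketch: Solr releases later than this thread ship result grouping, which is controlled by the request parameters group, group.field and group.limit. The helper below only builds such a request URL and sends nothing. The function name and the base URL in the usage note are assumptions, and whether these parameters are available depends on the Solr version in use.

```python
from urllib.parse import urlencode


def grouped_query_url(base_url, field, query="*:*", rows=10):
    """Build a select URL that asks Solr for one document per field value.

    base_url is whatever select handler your installation exposes
    (an assumption here); nothing is sent over the network.
    """
    params = [
        ("q", query),
        ("rows", rows),            # with grouping on, this counts groups
        ("group", "true"),         # turn result grouping on
        ("group.field", field),    # collapse on this field's value
        ("group.limit", 1),        # keep one document per group
    ]
    return base_url + "?" + urlencode(params)
```

For example, grouped_query_url("http://localhost:8983/solr/select", "productGroup_t_i_s_nm") would ask for one "Wireless" doc and one "Video Games" doc instead of the repeated entries. Note that a grouped response nests its docs under a "grouped" section rather than the flat result list shown in the question.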


--------------------------
Grant Ingersoll
http://www.lucidimagination.com/

Search the Lucene ecosystem (Lucene/Solr/Nutch/Mahout/Tika/Droids)  
using Solr/Lucene:
http://www.lucidimagination.com/search