Posted to solr-user@lucene.apache.org by nagarjuna <na...@gmail.com> on 2011/10/04 10:55:27 UTC

how to avoid duplicates in search results?

Hi everybody,
I got the following response:
<code>
<?xml version="1.0" encoding="UTF-8" ?>
<response>
  <lst name="responseHeader">
    <int name="status">0</int>
    <int name="QTime">0</int>
    <lst name="params">
      <str name="df">groups</str>
      <str name="indent">on</str>
      <str name="start">0</str>
      <str name="q">participate</str>
      <str name="version">2.2</str>
      <str name="rows">30</str>
    </lst>
  </lst>
  <result name="response" numFound="2" start="0">
    <doc>
      <str name="description">testing group</str>
      <str name="name">testing group</str>
      <str name="url">http://abc.xyz.com/groups/testing-group/discussions/62</str>
    </doc>
    <doc>
      <str name="description">testing group</str>
      <str name="name">testing group</str>
      <str name="url">http://abc.xyz.com/groups/testing-group/discussions/62</str>
    </doc>
  </result>
</response>
</code>

I need to remove the duplicate results.

Can anyone give me suggestions?

--
View this message in context: http://lucene.472066.n3.nabble.com/how-to-avoid-duplicates-in-search-results-tp3392524p3392524.html
Sent from the Solr - User mailing list archive at Nabble.com.

Re: how to avoid duplicates in search results?

Posted by Chris Hostetter <ho...@fucit.org>.
: There is also a Document Duplicate Detection at index time:
: http://wiki.apache.org/solr/Deduplication

Or just setting "url" as your uniqueKey field would solve this simple
use case, but it's not entirely clear what else you consider "duplicates"
besides this one example.
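
A minimal sketch of what the uniqueKey suggestion might look like in schema.xml. It assumes "url" is a single-valued field of the stock "string" type; the field attributes shown are assumptions, not taken from the poster's schema:

```xml
<!-- schema.xml (sketch): the "url" field definition is an assumption -->
<fields>
  <!-- a uniqueKey field should be indexed, single-valued and not tokenized -->
  <field name="url" type="string" indexed="true" stored="true" required="true" />
  <field name="name" type="string" indexed="true" stored="true" />
  <field name="description" type="string" indexed="true" stored="true" />
</fields>

<!-- adding a document whose url matches an existing one replaces the old document -->
<uniqueKey>url</uniqueKey>
```

Duplicates that are already in the index stay there until the content is reindexed.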

: > <doc>
: >   <str name="description">testing group</str>
: >   <str name="name">testing group</str>
: >   <str name="url">http://abc.xyz.com/groups/testing-group/discussions/62</str>
: > </doc>
: > <doc>
: >   <str name="description">testing group</str>
: >   <str name="name">testing group</str>
: >   <str name="url">http://abc.xyz.com/groups/testing-group/discussions/62</str>
: > </doc>

-Hoss

Re: how to avoid duplicates in search results?

Posted by Edoardo Tosca <e....@sourcesense.com>.
You can probably use the Grouping feature:
http://wiki.apache.org/solr/FieldCollapsing#Request_Parameters
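
As a rough sketch, grouping could be switched on as request handler defaults in solrconfig.xml, collapsing on "url" so that each distinct URL is returned once. The handler name is an assumption, and this presumes a Solr version that supports grouping and a single-valued, indexed "url" field:

```xml
<!-- solrconfig.xml (sketch): the handler name "/select" is an assumption -->
<requestHandler name="/select" class="solr.SearchHandler">
  <lst name="defaults">
    <!-- turn on result grouping (field collapsing) -->
    <str name="group">true</str>
    <!-- documents with the same url fall into the same group -->
    <str name="group.field">url</str>
    <!-- return the top document of each group as a flat result list -->
    <str name="group.main">true</str>
  </lst>
</requestHandler>
```

The same parameters can also be passed on the query URL (group=true&group.field=url&group.main=true) instead of being set as defaults.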

There is also a Document Duplicate Detection at index time:
http://wiki.apache.org/solr/Deduplication
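
A sketch of an index-time deduplication chain, adapted from the example on that wiki page. The "signature" field name and the choice of fields to hash are assumptions for this thread's documents; the signature field would also need to be defined (indexed) in schema.xml, and the chain has to be selected by the update handler:

```xml
<!-- solrconfig.xml (sketch): field names are assumptions -->
<updateRequestProcessorChain name="dedupe">
  <processor class="solr.processor.SignatureUpdateProcessorFactory">
    <bool name="enabled">true</bool>
    <!-- field that stores the computed signature -->
    <str name="signatureField">signature</str>
    <!-- a new document replaces existing documents with the same signature -->
    <bool name="overwriteDupes">true</bool>
    <!-- fields whose values are hashed to build the signature -->
    <str name="fields">name,description,url</str>
    <str name="signatureClass">solr.processor.Lookup3Signature</str>
  </processor>
  <processor class="solr.LogUpdateProcessorFactory" />
  <processor class="solr.RunUpdateProcessorFactory" />
</updateRequestProcessorChain>
```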

-- 
Edoardo Tosca
Sourcesense - making sense of Open Source: http://www.sourcesense.com