Posted to solr-user@lucene.apache.org by Dan Davis <da...@gmail.com> on 2013/08/23 04:21:52 UTC

Removing duplicates during a query

Suppose I have two documents with different ids, and there is another field,
for instance "content-hash", which is something like a 16-byte hash of the
content.

Can Solr be configured to return just one copy, and drop the other if both
are relevant?

If Solr does drop one result, do you get any indication in the document
that was kept that there was another copy?

Re: Removing duplicates during a query

Posted by Dan Davis <da...@gmail.com>.

OK - I see that this can be done with Field Collapsing/Grouping.  I also
see the mentions in the Wiki for avoiding duplicates using a 16-byte hash.

So, question withdrawn...
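
For anyone finding this thread later, here is a rough sketch of how the
grouping approach might look from a client. It assumes the hash lives in
a field literally named "content-hash" (the name from my question, not a
Solr default) and that the response uses Solr's default grouped JSON
layout. The helper names and the sample response are made up for
illustration. The numFound value inside each group's doclist is what
tells you whether the kept document had other copies.

```python
from urllib.parse import urlencode

# Assumption: the field name from the question; adjust to your schema.
HASH_FIELD = "content-hash"


def grouping_params(query, field=HASH_FIELD):
    """Build request parameters that collapse results sharing one hash."""
    return {
        "q": query,
        "wt": "json",
        "group": "true",       # turn on Result Grouping
        "group.field": field,  # one group per distinct hash value
        "group.limit": 1,      # return only the top document of each group
    }


def unique_docs(response, field=HASH_FIELD):
    """Return (document, copies) pairs, one pair per group.

    copies is the group's numFound, so a value above 1 means other
    documents with the same hash also matched the query.
    """
    results = []
    for group in response["grouped"][field]["groups"]:
        doclist = group["doclist"]
        if doclist["docs"]:
            results.append((doclist["docs"][0], doclist["numFound"]))
    return results


# Illustrative response in the default grouped layout; not real data.
sample = {
    "grouped": {
        "content-hash": {
            "matches": 3,
            "groups": [
                {
                    "groupValue": "a1b2",
                    "doclist": {"numFound": 2, "start": 0,
                                "docs": [{"id": "doc1"}]},
                },
                {
                    "groupValue": "c3d4",
                    "doclist": {"numFound": 1, "start": 0,
                                "docs": [{"id": "doc3"}]},
                },
            ],
        }
    }
}

# Query string to append to a select handler URL.
print(urlencode(grouping_params("solr")))

for doc, copies in unique_docs(sample):
    print(doc["id"], "copies matched:", copies)
```

This only hides duplicates at query time; both copies stay in the index.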

On Thu, Aug 22, 2013 at 10:21 PM, Dan Davis <da...@gmail.com> wrote:

> Suppose I have two documents with different id, and there is another
> field, for instance "content-hash" which is something like a 16-byte hash
> of the content.
>
> Can Solr be configured to return just one copy, and drop the other if both
> are relevant?
>
> If Solr does drop one result, do you get any indication in the document
> that was kept that there was another copy?