Posted to solr-user@lucene.apache.org by Michael <so...@gmail.com> on 2009/09/30 21:41:54 UTC

Conditional deduplication

If I index a bunch of email documents, is there a way to say "show me all
email documents, but only one per To: email address",
so that if there are a total of 10 distinct To: fields in the corpus, I get
back 10 email documents?

I'm aware of http://wiki.apache.org/solr/Deduplication but I want to retain
the ability to search across all of my email documents most of the time, and
only occasionally search for the distinct ones.

Essentially I want to do a
SELECT DISTINCT to_field FROM documents
where a normal search is a
SELECT * FROM documents

Thanks for any pointers.

Re: Conditional deduplication

Posted by Mauricio Scheffer <ma...@gmail.com>.
See http://wiki.apache.org/solr/FieldCollapsing
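
For illustration only, here is a rough sketch of how the "occasionally
distinct" search might look from a client. It assumes a later Solr release
where field collapsing is exposed as result grouping (the group, group.field
and group.limit request parameters); the 2009 patch described on the wiki page
used different parameter names. The base URL, the to_field name and the
build_select_url helper are hypothetical, and as far as I know grouping
expects a single-valued field.

```python
from urllib.parse import urlencode


def build_select_url(base_url, q="*:*", distinct_field=None, rows=10):
    """Build a Solr /select URL (hypothetical helper, not part of Solr).

    With distinct_field=None this is an ordinary search over all documents.
    With distinct_field set, it asks Solr to group results on that field and
    return one document per distinct value.
    """
    params = {"q": q, "rows": rows, "wt": "json"}
    if distinct_field is not None:
        # Result grouping parameters (assumes a Solr version that has them).
        params.update({
            "group": "true",
            "group.field": distinct_field,
            "group.limit": 1,  # one document per group
        })
    return base_url.rstrip("/") + "/select?" + urlencode(params)


if __name__ == "__main__":
    base = "http://localhost:8983/solr/emails"  # assumed core location
    print(build_select_url(base))                            # normal search
    print(build_select_url(base, distinct_field="to_field")) # one per To:
```

Nothing changes at index time, so the normal search still sees every
document; only the occasional distinct query carries the extra parameters.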
