Posted to dev@lucene.apache.org by Grant Ingersoll <gs...@apache.org> on 2007/10/02 17:25:21 UTC

Re: DuplicatesFilter - one for contrib?

+1

-Grant
On Sep 30, 2007, at 4:47 PM, markharw00d wrote:

> I've put together a new Filter and JUnit test for eliminating
> duplicates from search results.
>
> The typical usage scenario is an index in which multiple documents
> share an untokenized field value (e.g. the same primary key or URL).
> It is desirable to keep the copies in the index because some searches
> need to see the multiple versions (e.g. to view a revision history
> for a document). However, when a search needs to return only one
> version of each document (often the latest version), this filter can
> be used as an efficient means of filtering the results. The bitset
> produced marks ALL the "master" docs in an index for a field, so this
> filter can be safely cached for reuse with any query.
>
>        DuplicateFilter df=new DuplicateFilter(KEY_FIELD_NAME);
>        df.setKeepMode(DuplicateFilter.KM_USE_LAST_OCCURRENCE);
>        Hits h = searcher.search(query,df);
>
>
> If anyone else finds this useful I'll commit it.
>
> Cheers
> Mark
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-dev-help@lucene.apache.org
>
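
For anyone following the thread, the "master docs" bitset described in the quoted message can be sketched in plain Java, without Lucene. This is only an illustration of the idea, not the code Mark proposes to commit: the class name DuplicateBitsSketch, the method masterDocs, and the array-of-keys input are all hypothetical, and it assumes every document carries a key value.

```java
import java.util.BitSet;
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-alone sketch; not part of Lucene or the proposed filter.
class DuplicateBitsSketch {

    /**
     * Marks exactly one document per distinct key value.
     * keyByDocId[i] is the (untokenized) key of document i; assumed non-null.
     * keepLast = true keeps the highest doc id per key, false keeps the lowest.
     */
    static BitSet masterDocs(String[] keyByDocId, boolean keepLast) {
        BitSet bits = new BitSet(keyByDocId.length);
        Map<String, Integer> chosen = new HashMap<>();
        for (int docId = 0; docId < keyByDocId.length; docId++) {
            String key = keyByDocId[docId];
            Integer previous = chosen.get(key);
            if (previous == null) {
                // First time this key is seen: provisionally keep this doc.
                chosen.put(key, docId);
                bits.set(docId);
            } else if (keepLast) {
                // A later occurrence replaces the earlier one.
                bits.clear(previous);
                bits.set(docId);
                chosen.put(key, docId);
            }
            // Otherwise (keep-first mode) the later duplicate stays unset.
        }
        return bits;
    }

    public static void main(String[] args) {
        String[] keys = {"a", "b", "a", "c", "b"};
        System.out.println(masterDocs(keys, true));   // prints {2, 3, 4}
        System.out.println(masterDocs(keys, false));  // prints {0, 1, 3}
    }
}
```

The bitset depends only on the contents of the index and the chosen field, never on the query, which is why the quoted message says the filter can be cached and reused with any query.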


