Posted to dev@lucy.apache.org by Marvin Humphrey <ma...@rectangular.com> on 2011/12/13 03:54:25 UTC

[lucy-dev] LucyX::Search::DedupingSearcher

Greets,

A while back, I wrote a deduping searcher for KinoSearch 0.2x.  A few users
have expressed interest in such a beast, so I've contributed the old code to a
JIRA issue.  When someone wants to work on it, I'll provide help on how to
update it for Lucy.

  https://issues.apache.org/jira/browse/LUCY-198

The API is an IndexSearcher subclass with two extra constructor arguments:
hits_per_unique and dedup_field.  The algorithm, which comes from an old
Lucene module (IIRC, Andrzej Bialecki and Doug Cutting were involved), is to
rerun the search as many times as necessary, adding a filter on each later
iteration to exclude unwanted results.
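
For anyone who wants a feel for the loop before reading the KinoSearch code,
here is a rough Python sketch of one way the rerun-and-filter idea could work.
It is not the code attached to the JIRA issue and it is not Lucy's API.  Only
hits_per_unique and dedup_field come from the description above; the
raw_search callable, the "id" key on each hit, and the in-memory toy index are
assumptions made up for illustration.

```python
from collections import Counter


def make_raw_search(docs, dedup_field):
    """Hypothetical stand-in for the underlying searcher: docs are already
    in rank order, and `excluded` plays the role of the exclusion filter."""
    def raw_search(num_wanted, excluded):
        matches = [d for d in docs if d[dedup_field] not in excluded]
        return matches[:num_wanted]
    return raw_search


def deduping_search(raw_search, num_wanted, hits_per_unique, dedup_field):
    """Return up to num_wanted hits, with at most hits_per_unique hits
    sharing any one value of dedup_field."""
    kept = []              # accepted hits, in rank order
    kept_ids = set()       # ids of accepted hits, so reruns don't duplicate
    counts = Counter()     # accepted hits per dedup_field value
    saturated = set()      # values to filter out of later reruns

    while len(kept) < num_wanted:
        batch = raw_search(num_wanted, frozenset(saturated))
        before = (len(kept), len(saturated))
        for hit in batch:
            if hit["id"] in kept_ids:
                continue   # already accepted on an earlier pass
            value = hit[dedup_field]
            if counts[value] >= hits_per_unique:
                saturated.add(value)   # unwanted: exclude on the next pass
                continue
            kept.append(hit)
            kept_ids.add(hit["id"])
            counts[value] += 1
            if len(kept) == num_wanted:
                break
        exhausted = len(batch) < num_wanted   # nothing left to fetch
        stalled = (len(kept), len(saturated)) == before
        if exhausted or stalled:
            break
    return kept


# Toy index: nine ranked docs spread over four sites.
sites = ["a", "a", "a", "b", "a", "c", "b", "b", "d"]
docs = [{"id": i + 1, "site": s} for i, s in enumerate(sites)]
search = make_raw_search(docs, "site")
top = deduping_search(search, num_wanted=4, hits_per_unique=1,
                      dedup_field="site")
print([hit["id"] for hit in top])
```

In this sketch each rerun asks for the same number of hits but filters out
every value that has already hit its cap, so later passes reach further down
the ranking without reordering what was kept earlier.  A real implementation
would express the exclusion as a filter on the query rather than as a Python
set.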

If anybody wants to work on it or has commentary about the design, speak up.
Otherwise, if there are no objections, expect it to arrive in trunk as
LucyX::Search::DedupingSearcher when somebody finds the tuits to work up a
patch.

Marvin Humphrey