Posted to user@lucenenet.apache.org by Trevor Watson <tw...@datassimilate.com> on 2011/06/21 20:43:35 UTC

[Lucene.Net] De-Duplication or flagging as duplicate

     Is there a way to indicate that a hit is a duplicate of another hit 
in the HitsCollector object or TopDocs?

     We have an MD5 field that contains an MD5 hash of each selected 
file, and we would like to indicate in a DataGrid that an item is a 
duplicate. The grid uses VirtualMode to read its information from a 
database, so we want to do this without looping through each row in the 
table (and thus hitting the database for each row).

So what we do is:

1. Pass the Hits object to the control.
2. The control creates a DataGrid in virtual mode with Hits.Count() rows.
3. When CellValueNeeded is fired (once for each visible row), we get the 
   FileId from the Hits collection and then read the row information 
   from a database.

I tried adding code to the CellValueNeeded handler that checked whether 
the Lucene index contained another document with the same MD5 that 
matched the same search criteria and had an ID in the range from 0 up to 
the current FileId (exclusive). But this added too much overhead to the 
loading of the table.
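
To make that check concrete, here is a rough sketch of its logic, written 
in Python for brevity. It is not our actual code. It assumes a 
hypothetical in-memory list of (file_id, md5) pairs standing in for the 
hits that match the search criteria, whereas the real check queries the 
Lucene index. The names `hits` and `is_duplicate` are illustrative only.

```python
from typing import List, Tuple

# Hypothetical stand-in for the search results: (file_id, md5) pairs.
# In the real application this information comes from the Lucene index.
Hit = Tuple[int, str]


def is_duplicate(hits: List[Hit], file_id: int) -> bool:
    """Return True if a hit with a lower ID shares this hit's MD5."""
    # Look up the MD5 of the row currently being rendered.
    target = next(md5 for fid, md5 in hits if fid == file_id)
    # Check IDs from 0 up to file_id (exclusive) for the same MD5.
    return any(md5 == target for fid, md5 in hits if 0 <= fid < file_id)


if __name__ == "__main__":
    sample = [(0, "aaa"), (1, "bbb"), (2, "aaa")]
    for fid, _ in sample:
        print(fid, is_duplicate(sample, fid))
```

Because this runs for every visible row, each CellValueNeeded event pays 
for a further lookup on top of the database read.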

Thanks in advance.
Trevor