Posted to user@lucenenet.apache.org by Trevor Watson <tw...@datassimilate.com> on 2011/06/21 20:43:35 UTC
[Lucene.Net] De-Duplication or flagging as duplicate
Is there a way to indicate that a hit is a duplicate of another hit
in the HitsCollector object or TopDocs?
We have an MD5 field that holds an MD5 hash of each selected file, and
we would like to indicate in a DataGrid (which uses VirtualMode to read
its information from a database) that an item is a duplicate, without
looping through each row in the table (and thus hitting the database
for each row).
So what we do is:
1. Pass the Hits object to the control.
2. The control creates a DataGrid in virtual mode with Hits.Count() rows.
3. When CellValueNeeded fires (for each visible row), we get the FileId
   from the Hits collection and then read that row's information from
   the database.
I tried adding code to CellValueNeeded that checked whether the Lucene
index contained a document with the same MD5, matching the same search
criteria, with an ID from 0 up to (but not including) the FileId. But
this added too much overhead to loading the table.
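One way to avoid a per-row index query is to compute all duplicate flags in a single pass before the grid is shown. The sketch below is in Java because Lucene.Net mirrors Java Lucene, and it deliberately avoids any Lucene API. It assumes the FileId and MD5 of every hit have already been read once, for example from stored fields. That read is still one pass over the hits, but it needs no database query and no extra search. The duplicate rule mirrors the check described above: a hit is flagged when another matching hit has the same MD5 and a smaller FileId. The names `DuplicateFlags`, `Hit` and `computeDuplicateFlags` are hypothetical.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class DuplicateFlags {
    // Hypothetical holder for the two values read from each hit.
    public static final class Hit {
        final int fileId;
        final String md5;

        public Hit(int fileId, String md5) {
            this.fileId = fileId;
            this.md5 = md5;
        }
    }

    // flags[i] is true when hits.get(i) shares its MD5 with another hit
    // that has a smaller FileId. Runs in O(n) over the hit list.
    public static boolean[] computeDuplicateFlags(List<Hit> hits) {
        // First pass: smallest FileId seen for each MD5 value.
        Map<String, Integer> minIdByMd5 = new HashMap<>();
        for (Hit h : hits) {
            minIdByMd5.merge(h.md5, h.fileId, Integer::min);
        }
        // Second pass: any hit that is not the smallest FileId for its MD5 is a duplicate.
        boolean[] flags = new boolean[hits.size()];
        for (int i = 0; i < hits.size(); i++) {
            Hit h = hits.get(i);
            flags[i] = h.fileId > minIdByMd5.get(h.md5);
        }
        return flags;
    }
}
```

The flags are indexed in the same order as the hits. If the grid rows follow the hit order, the CellValueNeeded handler only needs to look up the flag for the requested row index, which is a constant-time array read.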
Thanks in advance.
Trevor