Posted to java-user@lucene.apache.org by Paul Taylor <pa...@fastmail.fm> on 2011/12/20 20:38:56 UTC
Query to find documents which contain the same value for a field,
i.e. duplicate fields
So I had this code that would return all documents where more than one
document had the same value for fieldName. Trouble is, I didn't realise
this could return documents that had been deleted, so I'm wondering what
an equivalent using queries would be.
public List<Integer> getDuplicates(int columnModelId)
{
    String fieldName = String.valueOf(columnModelId);
    List<Integer> matches = new ArrayList<Integer>();
    if (AudioDataModel.getInstance().getRowCount() == 0)
    {
        return matches;
    }
    IndexReader ir;
    try
    {
        ir = getIndexReader();
        // Position the enumeration at the first term at or after (fieldName, "")
        TermEnum terms = ir.terms(new Term(fieldName, ""));
        do
        {
            // The first term returned may already belong to a later field
            if (terms.term() != null && terms.term().field().equals(fieldName))
            {
                // More than one document contains this value
                if (terms.docFreq() > 1)
                {
                    TermDocs termDocs = ir.termDocs(terms.term());
                    while (termDocs.next())
                    {
                        Document d = ir.document(termDocs.doc());
                        matches.add(Integer.valueOf(d.getFieldable(ROW_NUMBER).stringValue()));
                    }
                }
            }
        }
        while (terms.next() && terms.term().field().equals(fieldName));
    }
    catch (IOException ioe)
    {
        MainWindow.logger.log(Level.WARNING,
            "DataIndexer.Problem searching for duplicates:" + ioe.getMessage(), ioe);
    }
    return matches;
}
---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org
Re: Query to find documents which contain the same value for a field,
i.e. duplicate fields
Posted by Paul Taylor <pa...@fastmail.fm>.
FYI
I stuck with this code but added the IndexReader.isDeleted() check to
ensure the doc was still valid.
Paul
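
For anyone reading this later, here is a rough sketch of the idea behind that fix: skip deleted documents, and only report a value as duplicated when more than one live document still holds it. This is not the code from the thread and it does not use Lucene. The Map stands in for walking TermEnum/TermDocs, the BitSet stands in for IndexReader.isDeleted(int), and the class and method names are made up for illustration. Counting live documents rather than relying on docFreq() is my own assumption; as far as I know, docFreq() in Lucene 3.x does not take deletions into account.

```java
import java.util.ArrayList;
import java.util.BitSet;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical stand-in, not Lucene code.
class LiveDuplicates {

    /**
     * postings: field value -> ids of documents holding that value
     *           (stand-in for iterating TermEnum and TermDocs).
     * deleted:  bit set with one bit per deleted document id
     *           (stand-in for IndexReader.isDeleted(int)).
     */
    static List<Integer> findDuplicates(Map<String, int[]> postings, BitSet deleted) {
        List<Integer> matches = new ArrayList<Integer>();
        for (int[] docs : postings.values()) {
            // Keep only the documents that have not been deleted
            List<Integer> live = new ArrayList<Integer>();
            for (int doc : docs) {
                if (!deleted.get(doc)) {
                    live.add(doc);
                }
            }
            // A value is a duplicate only if two or more live documents share it
            if (live.size() > 1) {
                matches.addAll(live);
            }
        }
        return matches;
    }

    public static void main(String[] args) {
        Map<String, int[]> postings = new TreeMap<String, int[]>();
        postings.put("abbey road", new int[] {0, 1, 2});
        postings.put("let it be", new int[] {3, 4});
        postings.put("revolver", new int[] {5});

        BitSet deleted = new BitSet();
        deleted.set(1);
        deleted.set(4);

        System.out.println(findDuplicates(postings, deleted)); // prints [0, 2]
    }
}
```

Here "let it be" is held by two documents, but one of them is deleted, so its remaining document is not reported as a duplicate.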