Posted to java-user@lucene.apache.org by Paul Taylor <pa...@fastmail.fm> on 2011/12/20 20:38:56 UTC

Query to find documents which contain the same value for a field, i.e. duplicate fields

I had this code, which returns every document whose value for fieldName
is shared by at least one other document. The trouble is that I didn't
realise it could also return documents that had been deleted, so I'm
wondering what the equivalent using queries would be.


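import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import java.util.logging.Level;
import org.apache.lucene.document.Document;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.index.Term;
import org.apache.lucene.index.TermDocs;
import org.apache.lucene.index.TermEnum;

public List<Integer> getDuplicates(int columnModelId)
{
    String fieldName = String.valueOf(columnModelId);
    List<Integer> matches = new ArrayList<Integer>();
    if (AudioDataModel.getInstance().getRowCount() == 0)
    {
        return matches;
    }

    try
    {
        IndexReader ir = getIndexReader();
        // Position the enumeration at the first term of this field
        // (or at the next term after it, if the field has no terms).
        TermEnum terms = ir.terms(new Term(fieldName, ""));
        do
        {
            Term term = terms.term();
            // Stop once the terms are exhausted or belong to another field.
            if (term == null || !term.field().equals(fieldName))
            {
                break;
            }
            // More than one document was indexed with this value.
            // No deleted-document check here (the problem described above).
            if (terms.docFreq() > 1)
            {
                TermDocs termDocs = ir.termDocs(term);
                while (termDocs.next())
                {
                    Document d = ir.document(termDocs.doc());
                    matches.add(Integer.valueOf(d.getFieldable(ROW_NUMBER).stringValue()));
                }
                termDocs.close();
            }
        }
        while (terms.next());
        terms.close();
    }
    catch (IOException ioe)
    {
        MainWindow.logger.log(Level.WARNING, "DataIndexer.Problem searching for duplicates:" + ioe.getMessage(), ioe);
    }
    return matches;
}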

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


Re: Query to find documents which contain the same value for a field, i.e. duplicate fields

Posted by Paul Taylor <pa...@fastmail.fm>.
FYI:

I stuck with this code but added an IndexReader.isDeleted() check to
ensure each document was still valid.
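
For anyone wondering what such a check amounts to, here is a minimal,
self-contained sketch in plain Java rather than against the Lucene API.
It is not the code actually used above. The class LiveDuplicates, the
method findDuplicates, the postings map and the deleted bit set are all
hypothetical stand-ins: the map plays the role of the term enumeration
and the bit set plays the role of IndexReader.isDeleted(). The sketch
also recounts the live documents per value, on the assumption that the
stored document frequency can still include deleted documents.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.BitSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class LiveDuplicates {

    /**
     * Hypothetical model of the duplicate search.
     *
     * @param postings field value -> ids of all documents indexed with that
     *                 value, deleted ones included
     * @param deleted  bit set of deleted doc ids, standing in for
     *                 IndexReader.isDeleted(docId)
     * @return ids of live documents whose value is shared by at least one
     *         other live document
     */
    public static List<Integer> findDuplicates(Map<String, List<Integer>> postings,
                                               BitSet deleted) {
        List<Integer> matches = new ArrayList<Integer>();
        for (List<Integer> docs : postings.values()) {
            // Cheap pre-filter, analogous to the docFreq() > 1 test.
            if (docs.size() < 2) {
                continue;
            }
            // Keep only documents that have not been deleted.
            List<Integer> live = new ArrayList<Integer>();
            for (int doc : docs) {
                if (!deleted.get(doc)) {
                    live.add(doc);
                }
            }
            // The value is a real duplicate only if two or more live docs share it.
            if (live.size() > 1) {
                matches.addAll(live);
            }
        }
        return matches;
    }

    public static void main(String[] args) {
        Map<String, List<Integer>> postings = new LinkedHashMap<String, List<Integer>>();
        postings.put("Abbey Road", Arrays.asList(0, 1, 2));
        postings.put("Revolver", Arrays.asList(3, 4));
        postings.put("Help!", Arrays.asList(5));

        BitSet deleted = new BitSet();
        deleted.set(1); // one of three "Abbey Road" docs is deleted
        deleted.set(4); // one of two "Revolver" docs is deleted

        System.out.println(findDuplicates(postings, deleted)); // prints [0, 2]
    }
}
```

In the method from the original post, the corresponding change would be
to test ir.isDeleted(termDocs.doc()) before calling ir.document(...),
and to add rows to matches only when more than one live document remains
for the term.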

Paul

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org