Posted to java-user@lucene.apache.org by Robert Koberg <ro...@koberg.com> on 2003/06/20 19:45:41 UTC

can't delete from an index using IndexReader.delete()

Hi,

I am using the latest binary distro (lucene-20030620.jar).  I am trying to
delete an entry from an index and then add it back with updated information.

The entry is a piece of XML content with some metadata added to the
Document. I try to delete the entry using a Term built from the Field 'id'
and the value of that field. The value is correct. What happens is that two
entries exist after executing the code below.

So, creating a Query for field 'id' with an example value 'abc' will return
two hits. Any ideas what I am doing wrong? Is this a bug?

Also, if you see anything I am doing stupidly or that can be improved,
please let me know.

Thanks,
-Rob


IndexReader reader =
    IndexReader.open(project.search_index_path.getNativePath());
reader.delete(new Term("id", member.content_idref));
reader.close();

ISO8601Converter iso_conv = new ISO8601Converter();

try {
  IndexWriter writer = new IndexWriter(
      project.search_index_path.getNativePath(), new StandardAnalyzer(), false);

  File f = new File(
      project.content_path.lookup(member.content_idref.concat(".xml")).getNativePath());

  XMLSearchHandler hdlr = new XMLSearchHandler(f);

  Document doc = hdlr.getDocument();

  doc.add(Field.Text("id", member.content_idref));
  doc.add(Field.Text("status", status));
  doc.add(Field.Text("type", target_elem.getAttributeValue("type")));
  doc.add(Field.Text("creator", target_elem.getAttributeValue("creator")));
  doc.add(Field.Text("last_mod_by", member.full_name));
  doc.add(Field.Text("modified", DateField.dateToString(
      iso_conv.parse(target_elem.getAttributeValue("modified"), new ParsePosition(0)))));
  doc.add(Field.Text("created", DateField.dateToString(
      iso_conv.parse(target_elem.getAttributeValue("created"), new ParsePosition(0)))));
  doc.add(Field.Text("label", label));
  doc.add(Field.Text("keywords", keywords));

  writer.addDocument(doc);

  writer.optimize();
  writer.close();

} catch (Exception e) {
  ...
}


---------------------------------------------------------------------
To unsubscribe, e-mail: lucene-user-unsubscribe@jakarta.apache.org
For additional commands, e-mail: lucene-user-help@jakarta.apache.org


RE: can't delete from an index using IndexReader.delete()

Posted by Robert Koberg <ro...@koberg.com>.
Thanks Jeff and Otis,

After some more testing I am finding that the bug affects only certain docs.

For example, a Document in the index with any of the following IDs will
not be deleted:

'preamble_content'
'toc_testor'
'B724547'

The following IDs will work and delete the Document from the index:

'art_01_section_01'
'a266122794'

When I edit the metadata of one of the duplicates (triplicates, etc.) and
save it again (which goes through the delete and then re-add again), another
entry is added to the index with the same id Term.

On providing complete code - I will try to get that out to the list. It
currently reads from config XML files. I will try to make a simple example.

Thanks again,
-Rob
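
A note on the ID lists above: the thread does not establish a cause, but one
hypothesis fits the pattern. Field.Text fields are passed through the
analyzer at index time, while reader.delete(Term) compares the raw string,
so an id that the analyzer lowercases or splits would never match. The
sketch below is a toy model in plain Java, not Lucene code. ToyAnalyzer and
both of its rules are assumptions made for illustration and only approximate
what StandardAnalyzer might do.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Toy model only: NOT Lucene code. It imitates two things an analyzer
// might do to a tokenized field: lowercase the text, and split
// letters-only values at underscores.
class ToyAnalyzer {

  // Returns the terms this toy analyzer would store for a field value.
  static List<String> analyze(String value) {
    String lower = value.toLowerCase(Locale.ROOT);
    List<String> terms = new ArrayList<String>();
    if (lower.matches(".*[0-9].*")) {
      // Assumption: a value containing digits is kept as a single term.
      terms.add(lower);
    } else {
      // Assumption: a letters-only value is split at each underscore.
      for (String part : lower.split("_")) {
        if (!part.isEmpty()) {
          terms.add(part);
        }
      }
    }
    return terms;
  }

  // An exact-term delete can only match if the raw id is itself a stored term.
  static boolean exactTermMatches(String id) {
    return analyze(id).contains(id);
  }

  public static void main(String[] args) {
    String[] ids = {
      "preamble_content", "toc_tester", "B724547",
      "art_01_section_01", "a266122794"
    };
    for (String id : ids) {
      System.out.println(id + " -> " + analyze(id)
          + " exact match: " + exactTermMatches(id));
    }
  }
}
```

Under these assumptions only 'art_01_section_01' and 'a266122794' survive
analysis unchanged, which matches the IDs that delete correctly. If that is
what is happening, indexing the id untokenized (Field.Keyword in this Lucene
generation) would be the thing to try.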



> -----Original Message-----
> From: Jeff Linwood [mailto:jeff@greenninja.com]
> Sent: Sunday, June 22, 2003 9:54 PM
> To: Lucene Users List
> 
> Hi,
> 
> Can you check the return value of your reader.delete(...); call?
> According to the Javadocs, it should return the number of documents it
> deleted, maybe you can verify that it is deleting an entry?
> 
> Jeff


Re: can't delete from an index using IndexReader.delete()

Posted by Jeff Linwood <je...@greenninja.com>.
Hi,

Can you check the return value of your reader.delete(...); call? 
According to the Javadocs, it should return the number of documents it 
deleted, maybe you can verify that it is deleting an entry?

Jeff
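
To illustrate why the return value matters: a delete-then-add update
silently produces a duplicate whenever the delete matches nothing. The
sketch below uses a hypothetical in-memory ToyIndex (plain Java, not the
Lucene API) whose delete returns a count, as the Javadocs are said to
describe for reader.delete. The stored value "b724547" is an assumed
example, not something taken from the real index.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

// Toy stand-in for an index: each entry is the stored "id" term of one
// document. Hypothetical class for illustration, not part of Lucene.
class ToyIndex {
  private final List<String> idTerms = new ArrayList<String>();

  void add(String idTerm) {
    idTerms.add(idTerm);
  }

  // Removes every document whose id term equals termText exactly and
  // returns how many were removed.
  int delete(String termText) {
    int deleted = 0;
    for (Iterator<String> it = idTerms.iterator(); it.hasNext(); ) {
      if (it.next().equals(termText)) {
        it.remove();
        deleted++;
      }
    }
    return deleted;
  }

  int numDocs() {
    return idTerms.size();
  }

  // Delete-then-add update. Returns the delete count so the caller can
  // detect a silent miss (0 means the old entry is still in the index).
  int update(String lookupText, String storedTerm) {
    int deleted = delete(lookupText);
    add(storedTerm);
    return deleted;
  }

  public static void main(String[] args) {
    ToyIndex index = new ToyIndex();
    // Assumption: the stored form differs from the text used for lookup.
    index.add("b724547");
    int deleted = index.update("B724547", "b724547");
    // prints "deleted=0 numDocs=2"
    System.out.println("deleted=" + deleted + " numDocs=" + index.numDocs());
  }
}
```

A count of 0 followed by a growing numDocs is the signature of the duplicate
entries described in this thread.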



RE: can't delete from an index using IndexReader.delete()

Posted by Robert Koberg <ro...@koberg.com>.
Here is a simple class that reproduces the problem (it happens with the last
stable release too). Let me know if you would prefer this as an attachment.

Call it like this:
java TestReaderDelete existing_id new_label

For example, try:
java TestReaderDelete B724547 ppppppp

and then try:
java TestReaderDelete a266122794 ppppppp

If an index has not been created yet, the class will create one. Keep
running one of the above example commands (with and without deleting the
index directory first) and watch what the System.out.println output shows.



import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.Term;

import java.io.File;
import java.io.IOException;
import java.util.HashMap;
import java.util.Iterator;
import java.util.Map;


class TestReaderDelete {

  public static void main(String[] args)
    throws IOException
  {
    File index = new File("./testindex");
    // Build a small test index on the first run.
    if (!index.exists()) {
      HashMap test_map = new HashMap();
      test_map.put("preamble_content", "Preamble content bbb");
      test_map.put("art_01_section_01", "Article 1, Section 1");
      test_map.put("toc_tester", "Test TOC XML bbb");
      test_map.put("B724547", "bio example");
      test_map.put("a266122794", "tester");
      indexFiles(index, test_map);
    }
    if (args.length < 2) {
      System.out.println("usage: java TestReaderDelete existing_id new_label");
      return;
    }
    String identifier = args[0];
    String new_label = args[1];
    testDeleteAndAdd(index, identifier, new_label);
  }


  // Creates a fresh index with one Document per map entry (id -> label).
  public static void indexFiles(File index, HashMap test_map)
  {
    try {
      IndexWriter writer = new IndexWriter(index, new StandardAnalyzer(), true);
      for (Iterator i = test_map.entrySet().iterator(); i.hasNext(); ) {
        Map.Entry e = (Map.Entry) i.next();
        System.out.println("Adding: " + e.getKey() + " = " + e.getValue());
        Document doc = new Document();
        doc.add(Field.Text("id", (String) e.getKey()));
        doc.add(Field.Text("label", (String) e.getValue()));
        writer.addDocument(doc);
      }
      writer.optimize();
      writer.close();
    } catch (Exception e) {
      System.out.println(" caught a " + e.getClass() +
                         "\n with message: " + e.getMessage());
    }
  }


  // Deletes the Document matching the id Term, then re-adds it with a new label.
  public static void testDeleteAndAdd(File index, String identifier, String new_label)
    throws IOException
  {
    Term id_term = new Term("id", identifier);
    IndexReader reader = IndexReader.open(index);
    System.out.println("!!! reader.numDocs() : " + reader.numDocs());
    System.out.println("reader.indexExists(): " + IndexReader.indexExists(index));

    System.out.println("term field: " + id_term.field());
    System.out.println("term text: " + id_term.text());
    System.out.println("reader.docFreq: " + reader.docFreq(id_term));
    System.out.println("deleting target now...");
    int deleted_num = reader.delete(id_term);
    System.out.println("*** deleted_num: " + deleted_num);
    reader.close();
    try {
      IndexWriter writer = new IndexWriter(index, new StandardAnalyzer(), false);
      Document doc = new Document();
      doc.add(Field.Text("id", identifier));
      doc.add(Field.Text("label", new_label));
      writer.addDocument(doc);
      writer.optimize();
      writer.close();
    } catch (Exception e) {
      System.out.println(" caught a " + e.getClass() +
                         "\n with message: " + e.getMessage());
    }

    // The first reader is closed, so reopen one to count the documents.
    IndexReader after = IndexReader.open(index);
    System.out.println("!!! reader.numDocs() after deleting and adding : " +
                       after.numDocs());
    after.close();
  }

}



> -----Original Message-----
> From: Otis Gospodnetic [mailto:otis_gospodnetic@yahoo.com]
> Sent: Sunday, June 22, 2003 9:42 PM
> To: Lucene Users List
> 
> The code looks fine.  Unfortunately, the provided code is not a full,
> self-sufficient class that I can run on my machine to verify the
> behaviour that you are describing.
> 
> Otis





Re: can't delete from an index using IndexReader.delete()

Posted by Otis Gospodnetic <ot...@yahoo.com>.
The code looks fine.  Unfortunately, the provided code is not a full,
self-sufficient class that I can run on my machine to verify the
behaviour that you are describing.

Otis
