Posted to java-user@lucene.apache.org by Dhruba Borthakur <dh...@hotmail.com> on 2004/02/20 10:06:09 UTC

Re: can't delete from an index using IndexReader.delete()

Hi folks,

I am using the latest and greatest Lucene jar file and am facing a problem with
deleting documents from the index. Browsing the mail archive, I found that the
following email (June 2003) describes the exact problem I am encountering.

In short: I am using Field.Text("id", "value") to mark a document. Then I use
reader.delete(new Term("id", "value")) to remove the document: this call
returns 0 and fails to delete the document. The attached sample program shows
this behaviour.

I would appreciate it a lot if anybody on this list has encountered this
problem and would like to share his/her solution with me.

thanks,
dhruba


From: Robert Koberg <ro...@koberg.com>
Subject: can't delete from an index using IndexReader.delete()
Date: Mon, 23 Jun 2003 14:38:25 -0700

Here is a simple class that can reproduce the problem (it happens with the last
stable release too). Let me know if you would prefer this as an attachment.

Call it like this:
java TestReaderDelete existing_id new_label

For example, try:
java TestReaderDelete B724547 ppppppp

and then try:
java TestReaderDelete a266122794 ppppppp

If an index has not been created, the program will create one. Keep running one
of the example commands above (with and without deleting the index directory)
and watch what the System.out.println calls print.



import org.apache.lucene.index.IndexReader;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.Term;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;

import java.io.*;
import java.util.*;


class TestReaderDelete {



public static void main(String[] args)
  throws IOException
{
  File index = new File("./testindex");
  if (!index.exists()) {
    HashMap test_map = new HashMap();
    test_map.put("preamble_content", "Preamble content bbb");
    test_map.put("art_01_section_01", "Article 1, Section 1");
    test_map.put("toc_tester", "Test TOC XML bbb");
    test_map.put("B724547", "bio example");
    test_map.put("a266122794", "tester");
    indexFiles(index, test_map);
  }
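  if (args.length < 2) {
    System.out.println("usage: java TestReaderDelete existing_id new_label");
    return;
  }
  String identifier = args[0];
  String new_label = args[1];
  testDeleteAndAdd(index, identifier, new_label);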
}


public static void indexFiles(File index, HashMap test_map)
{
  try {
    IndexWriter writer = new IndexWriter(index, new StandardAnalyzer(), true);
    for (Iterator i = test_map.entrySet().iterator(); i.hasNext(); ) {
      Map.Entry e = (Map.Entry) i.next();
      System.out.println("Adding: " + e.getKey() + " = " + e.getValue());
      Document doc = new Document();
      // Field.Text runs the value through the writer's analyzer
      // (StandardAnalyzer), so the indexed id terms may differ from the raw key.
      doc.add(Field.Text("id", (String) e.getKey()));
      doc.add(Field.Text("label", (String) e.getValue()));
      writer.addDocument(doc);
    }
    writer.optimize();
    writer.close();
  } catch (Exception e) {
    System.out.println(" caught a " + e.getClass() +
        "\n with message: " + e.getMessage());
  }
}


public static void testDeleteAndAdd(File index, String identifier,
    String new_label)
  throws IOException
{
  IndexReader reader = IndexReader.open(index);
  System.out.println("!!! reader.numDocs() : " + reader.numDocs());
  // indexExists is a static method, so call it on the class.
  System.out.println("reader.indexExists(): " + IndexReader.indexExists(index));

  Term idTerm = new Term("id", identifier);
  System.out.println("term field: " + idTerm.field());
  System.out.println("term text: " + idTerm.text());
  System.out.println("reader.docFreq: " + reader.docFreq(idTerm));
  System.out.println("deleting target now...");
  // The Term text is matched as-is; it is not run through the analyzer.
  int deleted_num = reader.delete(idTerm);
  System.out.println("*** deleted_num: " + deleted_num);
  reader.close();
  try {
    IndexWriter writer = new IndexWriter(index, new StandardAnalyzer(), false);
    Document doc = new Document();
    doc.add(Field.Text("id", identifier));
    doc.add(Field.Text("label", new_label));
    writer.addDocument(doc);
    writer.optimize();
    writer.close();
  } catch (Exception e) {
    System.out.println(" caught a " + e.getClass() +
        "\n with message: " + e.getMessage());
  }

  // The first reader is closed (and would not see the new document anyway),
  // so open a fresh one to count documents.
  IndexReader after = IndexReader.open(index);
  System.out.println("!!! reader.numDocs() after deleting and adding : " +
      after.numDocs());
  after.close();
}

}
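The two suggested ids behave differently, and one plausible reason can be checked without Lucene at all. Assuming StandardAnalyzer keeps an alphanumeric id such as B724547 as a single token but lowercases it, the term stored for that document is b724547, while new Term("id", "B724547") looks for the exact, uppercase text. The class below is a hypothetical, stdlib-only simulation (LowercaseMatchDemo, analyze and deleteCount are made-up names, and only the lowercasing step of the analyzer is modelled); it is not Lucene code.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

/** Stdlib-only simulation of Field.Text + IndexReader.delete(Term); not Lucene code. */
public class LowercaseMatchDemo {

    // The two ids the repro suggests trying, as they were passed to Field.Text.
    static final String[] IDS = { "B724547", "a266122794" };

    /** Assumption: models only the lowercasing step of StandardAnalyzer. */
    public static String analyze(String value) {
        return value.toLowerCase(Locale.ROOT);
    }

    /** Terms stored in the "id" field, one per document (each id assumed to be one token). */
    public static List<String> indexedTerms() {
        List<String> terms = new ArrayList<>();
        for (String id : IDS) {
            terms.add(analyze(id));
        }
        return terms;
    }

    /** Mimics delete(new Term("id", text)): the query text is NOT analyzed. */
    public static int deleteCount(String text) {
        int count = 0;
        for (String term : indexedTerms()) {
            if (term.equals(text)) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        System.out.println("B724547    -> " + deleteCount("B724547"));    // prints 0
        System.out.println("a266122794 -> " + deleteCount("a266122794")); // prints 1
    }
}
```

Under this assumption, the real program would be expected to report deleted_num: 0 for B724547 and 1 for a266122794, since only the already-lowercase id survives analysis unchanged.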



   -----Original Message-----
From: Otis Gospodnetic [mailto:otis_gospodnetic@yahoo.com]
Sent: Sunday, June 22, 2003 9:42 PM
To: Lucene Users List

The code looks fine.  Unfortunately, the provided code is not a full,
self-sufficient class that I can run on my machine to verify the
behaviour that you are describing.

Otis

Re: Re:can't delete from an index using IndexReader.delete()

Posted by Morus Walter <mo...@tanto.de>.
Dhruba Borthakur writes:
> Hi folks,
> 
> I am using the latest and greatest Lucene jar file and am facing a problem
> with deleting documents from the index. Browsing the mail archive, I found
> that the following email (June 2003) listed the exact problem that I am
> encountering.
> 
> In short: I am using Field.Text("id", "value") to mark a document. Then I
> use reader.delete(new Term("id", "value")) to remove the document: this
> call returns 0 and fails to delete the document. The attached sample program
> shows this behaviour.
> 
You don't tell us what your ids look like, but Field.Text("id", value)
tokenizes value, that is, it splits value into whatever the analyzer considers
to be tokens and creates a term for each token, whereas new Term("id", value)
creates a single term containing value unchanged.

So I guess your ids are treated as several tokens by the analyzer you use,
and therefore they won't be matched by the term you construct for the delete.

Using keyword fields instead of text fields for the id should help.
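To see why a keyword field helps, here is a toy inverted index in plain Java; it is a hypothetical sketch, not Lucene's implementation. A "text" id runs through an analyzer, here an assumed stand-in that lowercases and splits on anything other than ASCII letters and digits (this only approximates StandardAnalyzer). A "keyword" id is stored whole as one term, which is what Field.Keyword does in Lucene 1.x. Deleting by the exact id text then finds only the keyword-indexed document. KeywordVsTextDemo, addDocument and deleteByTerm are illustrative names.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

/** Toy inverted index contrasting tokenized ("text") and untokenized ("keyword") ids. */
public class KeywordVsTextDemo {

    // term text -> numbers of the documents containing that term
    private final Map<String, Set<Integer>> postings = new HashMap<>();
    private final Set<Integer> deleted = new HashSet<>();
    private int nextDoc = 0;

    /** Hypothetical analyzer: lowercase, then split on non-[a-z0-9] characters. */
    public static List<String> analyze(String value) {
        List<String> tokens = new ArrayList<>();
        for (String t : value.toLowerCase(Locale.ROOT).split("[^a-z0-9]+")) {
            if (!t.isEmpty()) {
                tokens.add(t);
            }
        }
        return tokens;
    }

    /** Adds one document whose id is indexed either as a single keyword term or as analyzed text. */
    public int addDocument(String id, boolean keyword) {
        int doc = nextDoc++;
        List<String> terms = keyword ? Collections.singletonList(id) : analyze(id);
        for (String term : terms) {
            postings.computeIfAbsent(term, k -> new HashSet<>()).add(doc);
        }
        return doc;
    }

    /** Like delete(Term): exact lookup of the term text, returns how many live docs were deleted. */
    public int deleteByTerm(String text) {
        int count = 0;
        for (int doc : postings.getOrDefault(text, Collections.<Integer>emptySet())) {
            if (deleted.add(doc)) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        String id = "art_01_section_01";

        KeywordVsTextDemo text = new KeywordVsTextDemo();
        text.addDocument(id, false);   // indexed as art, 01, section
        System.out.println("text field:    " + text.deleteByTerm(id));    // prints 0

        KeywordVsTextDemo keyword = new KeywordVsTextDemo();
        keyword.addDocument(id, true); // indexed as art_01_section_01
        System.out.println("keyword field: " + keyword.deleteByTerm(id)); // prints 1
    }
}
```

In the actual repro, the corresponding change would be doc.add(Field.Keyword("id", ...)) in place of doc.add(Field.Text("id", ...)) for the id field; the toy index only illustrates why the exact-term lookup then succeeds, including for mixed-case ids such as B724547.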

Morus



Re: can't delete from an index using IndexReader.delete()

Posted by "Kevin A. Burton" <bu...@newsmonster.org>.
Dhruba Borthakur wrote:

> Hi folks,
>
> I am using the latest and greatest Lucene jar file and am facing a problem
> with deleting documents from the index. Browsing the mail archive, I found
> that the following email (June 2003) listed the exact problem that I am
> encountering.
>
> In short: I am using Field.Text("id", "value") to mark a document. Then I
> use reader.delete(new Term("id", "value")) to remove the document: this
> call returns 0 and fails to delete the document. The attached sample
> program shows this behaviour.

Agreed... your values are probably being tokenized when they are indexed... try adding them as untokenized Keyword fields...

Kevin
