Posted to java-user@lucene.apache.org by mahaveer jain <ja...@yahoo.com> on 2004/12/30 16:18:39 UTC

Deleting index for DB indexing

Hi All,
 
I am using Lucene to index my DB. Two of the columns are indexed as Keyword fields.
Now I want to delete documents from my index based on these two keywords.
 
Is this possible? If not, what is the alternative?
 
Thanks 
Mahaveer
 
 

		

Re: Deleting index for DB indexing

Posted by Morus Walter <mo...@gmx.de>.

mahaveer jain writes:

> I am using Lucene to index my DB. Two of the columns are indexed as Keyword fields.
> Now I want to delete documents from my index based on these two keywords.
>  
> Is this possible? If not, what is the alternative?
>  
You can delete documents based on document number from an index reader.
You can get document numbers from searches.
So if you can search documents to be deleted based on your keywords, there
should be no problem deleting them...

HTH
	Morus

---------------------------------------------------------------------
To unsubscribe, e-mail: lucene-user-unsubscribe@jakarta.apache.org
For additional commands, e-mail: lucene-user-help@jakarta.apache.org

Re: Deleting index for DB indexing

Posted by mohamed ebrahim faisal <eb...@hotmail.com>.

Hi

You can try the following code to delete documents based on Keyword fields.


import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.index.Term;
import org.apache.lucene.index.TermDocs;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.queryParser.QueryParser;
import org.apache.lucene.search.Hits;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.TermQuery;

import org.apache.lucene.search.Searcher;

public class LuceneDelete
{
	private static final String[] strSTOP_WORDS =
        {
			"and",
			"are"
			 };
	private void test() throws Exception
	{
		Analyzer objAnalyzer = new StandardAnalyzer();
		IndexWriter index = new IndexWriter("index",objAnalyzer, true );


		Document objDocument = new Document();

		objDocument.add( Field.Keyword("name","Ebrahim Faisal"));
		objDocument.add( Field.Text("address","Chennai"));
		objDocument.add( Field.Keyword("designation","Software Engineer"));
		objDocument.add( Field.UnIndexed("xyz","123 IndexWriter index"));

		index.addDocument( objDocument );

		objDocument = new Document();

		objDocument.add( Field.Keyword("name","John Smith"));
		objDocument.add( Field.Text("address","Delhi"));
		objDocument.add( Field.Keyword("designation","Sr. Software Engineer"));
		objDocument.add( Field.UnIndexed("xyz","456 StandardAnalyzer true"));

		index.addDocument( objDocument );

		index.optimize();
		index.close();

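		// Deletion: walk every document whose "name" Keyword field matches the
		// term exactly, then check the second Keyword field before deleting.

		IndexReader objIndexReader = IndexReader.open("index");

		TermDocs objTermDocs = objIndexReader.termDocs(new Term("name","Ebrahim Faisal"));

		while( objTermDocs.next() )
		{
			int docNum = objTermDocs.doc();
			objDocument = objIndexReader.document( docNum );
			// Constant-first comparison avoids a NullPointerException when a
			// document has no "designation" field.
			if( "Software Engineer".equalsIgnoreCase( objDocument.get("designation") ) )
			{
				objIndexReader.delete( docNum );
			}
		}
		objTermDocs.close();
		objIndexReader.close();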


		// Verify: the "John Smith" document (address "Delhi") should still be found.
		Searcher objIndexSearcher = new IndexSearcher("index");

		Query objQuery = QueryParser.parse("Delhi", "address", objAnalyzer);


		Hits objHits = objIndexSearcher.search(objQuery);

		System.out.println(" objHits "+objHits.length());

		for (int nStart = 0; nStart < objHits.length(); nStart++)
		{
			objDocument = objHits.doc(nStart);
			System.out.println(" address "+objDocument.get("address"));
		}
		objIndexSearcher.close();
		objIndexSearcher = null;


	}
	public static void main(String[] args) throws Exception
	{
		new LuceneDelete().test();
	}
}




E.FAISAL





Re: Deleting index for DB indexing

Posted by mahaveer jain <ja...@yahoo.com>.

Thanks Paul,

Your idea seems good. I'll try that. I have one more question: does the new key I create have to be a Keyword field, or can it be just a column in the index?

Mahaveer

Paul <pa...@gmail.com> wrote:
On Thu, 30 Dec 2004 08:36:04 -0800 (PST), mahaveer jain
wrote:
> I am indexing more than 5 tables, and each of them has an autoincrement column
> as its primary key. So if I find a DocNum, it may happen that I
> delete a document I don't want to delete.

You need to create your own global ID. I had the same problem (but I
used an MD5 hash value). One solution is to give each of your tables an
internal number and, when creating your Lucene documents, add an
additional field with something like "dbInternalId*100+dbNumber", so
that DB record 5 in table 3 results in 503. When documents are deleted
from your DB and you need to update the index, you simply create a
term whose value is calculated the same way and delete the document
with IndexReader.delete(Term).
Instead of calculating, you can use string concatenation as well :)

Paul


Re: Deleting index for DB indexing

Posted by Paul <pa...@gmail.com>.

On Thu, 30 Dec 2004 08:36:04 -0800 (PST), mahaveer jain
<ja...@yahoo.com> wrote:
> I am indexing more than 5 tables, and each of them has an autoincrement column
> as its primary key. So if I find a DocNum, it may happen that I
> delete a document I don't want to delete.

You need to create your own global ID. I had the same problem (but I
used an MD5 hash value). One solution is to give each of your tables an
internal number and, when creating your Lucene documents, add an
additional field with something like "dbInternalId*100+dbNumber", so
that DB record 5 in table 3 results in 503. When documents are deleted
from your DB and you need to update the index, you simply create a
term whose value is calculated the same way and delete the document
with IndexReader.delete(Term).
Instead of calculating, you can use string concatenation as well :)

Paul
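
The two ways of building a global ID described above could be sketched roughly as follows. This is only an illustration, not code from the thread: the class name, method names, the "orders" table name, and the ":" separator are all hypothetical, and the numeric variant assumes fewer than 100 tables so that the table number fits in the last two digits. The resulting value would be stored in an additional field and later used to build the Term passed to IndexReader.delete(Term).

```java
public class GlobalIdSketch {
    // Assumption: fewer than 100 tables, so the table number occupies
    // the last two decimal digits of the combined ID.
    static final int MAX_TABLES = 100;

    /** Numeric variant: record 5 in table 3 becomes 5 * 100 + 3 = 503. */
    static long numericId(long recordId, int tableNumber) {
        if (tableNumber < 0 || tableNumber >= MAX_TABLES) {
            throw new IllegalArgumentException("table number out of range: " + tableNumber);
        }
        if (recordId < 0) {
            throw new IllegalArgumentException("record id must not be negative: " + recordId);
        }
        return recordId * MAX_TABLES + tableNumber;
    }

    /** String variant: concatenate a table name and the record id with a separator. */
    static String stringId(String tableName, long recordId) {
        return tableName + ":" + recordId;
    }

    public static void main(String[] args) {
        System.out.println(numericId(5, 3));       // prints 503
        System.out.println(stringId("orders", 5)); // prints orders:5
    }
}
```

The string variant has no limit on the number of tables, which may make it the safer choice when the schema is likely to grow.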


Re: Deleting index for DB indexing

Posted by mahaveer jain <ja...@yahoo.com>.

I am indexing more than 5 tables, and each of them has an autoincrement column as its primary key. So if I find a DocNum, it may happen that I delete a document I don't want to delete.

Paul <pa...@gmail.com> wrote:
Alternative: create a hashed value which is unique within your DB
(e.g. use MD5) and store it in an additional field. Afterwards you can
delete documents from the index with IndexReader.delete(Term).
Without that additional field, you can use the IndexSearcher to
retrieve your documents from the index and then use
IndexReader.delete(int docNum) to delete these documents.

Paul

On Thu, 30 Dec 2004 07:18:39 -0800 (PST), mahaveer jain
wrote:
> Hi All,
> 
> I am using Lucene to index my DB. Two of the columns are indexed as Keyword fields.
> Now I want to delete documents from my index based on these two keywords.
> 
> Is this possible? If not, what is the alternative?
> 
> Thanks
> Mahaveer
> 
> 


Re: Deleting index for DB indexing

Posted by Paul <pa...@gmail.com>.

Alternative: create a hashed value which is unique within your DB
(e.g. use MD5) and store it in an additional field. Afterwards you can
delete documents from the index with IndexReader.delete(Term).
Without that additional field, you can use the IndexSearcher to
retrieve your documents from the index and then use
IndexReader.delete(int docNum) to delete these documents.

Paul
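
A minimal sketch of the hashed-key idea, using only the JDK. The post does not say what exactly gets hashed, so hashing the table name plus primary key is an assumption here, as are the class name, the method name, the "orders" and "customers" table names, and the ":" separator. The hex string would be stored in the additional field and used as the term value for deletion.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

public class Md5KeySketch {
    /**
     * Builds a 32-character lowercase hex MD5 key from a table name and a
     * primary key. Assumption: this pair identifies a row uniquely in the DB.
     */
    static String md5Key(String tableName, long recordId) {
        try {
            MessageDigest md = MessageDigest.getInstance("MD5");
            byte[] digest = md.digest(
                    (tableName + ":" + recordId).getBytes(StandardCharsets.UTF_8));
            StringBuilder hex = new StringBuilder(digest.length * 2);
            for (byte b : digest) {
                hex.append(String.format("%02x", b)); // two hex digits per byte
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            // MD5 is a standard JDK algorithm, so this is not expected to happen.
            throw new IllegalStateException("MD5 not available", e);
        }
    }

    public static void main(String[] args) {
        // The same row always yields the same key; different tables yield different keys.
        System.out.println(md5Key("orders", 5));
        System.out.println(md5Key("customers", 5));
    }
}
```

Because the key is computed the same way at indexing time and at deletion time, the index never needs to be searched to find the document number first.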

On Thu, 30 Dec 2004 07:18:39 -0800 (PST), mahaveer jain
<ja...@yahoo.com> wrote:
> Hi All,
> 
> I am using Lucene to index my DB. Two of the columns are indexed as Keyword fields.
> Now I want to delete documents from my index based on these two keywords.
> 
> Is this possible? If not, what is the alternative?
> 
> Thanks
> Mahaveer
> 
> 
