Posted to java-user@lucene.apache.org by Dominik Bruhn <do...@dbruhn.de> on 2006/07/07 12:13:55 UTC

addIndexes getting slower and slower plus eating up Mem

Hi,
I use the following code to index about 1 million documents into an empty index:
=============
	private static void do_searchindex(Connection target) throws SQLException, IOException {
		int i = 1164;
		PostIndexer.createIndexDir();	// creates the index directory
		// create=false: append to the existing index instead of overwriting it
		IndexWriter fsWriter = new IndexWriter(PostIndexer.getIndexDir(), PostIndexer.getAnalyser(), false);
		while (do_searchindex(fsWriter, target, i) > 0) {
			i++;
		}
		fsWriter.close();
	}

	private static int do_searchindex(IndexWriter writer, Connection ctarget, int page) throws SQLException, IOException {
		Statement stmt = ctarget.createStatement();
		ResultSet rs = stmt.executeQuery("SELECT postid,db_post.threadid,posttext,db_thread.threadtitle"
				+ " FROM db_post LEFT JOIN db_thread ON (db_thread.threadid=db_post.threadid)"
				+ " ORDER BY postid DESC LIMIT " + (page * 500) + ",500 ;");
		int c = 0;

		// Index one page of 500 rows into a temporary in-memory index
		RAMDirectory ramDir = new RAMDirectory();
		IndexWriter ramWriter = new IndexWriter(ramDir, PostIndexer.getAnalyser(), true);

		while (rs.next()) {
			PostIndexer.addToIndex(ramWriter, rs.getInt("postid"), rs.getString("posttext"), rs.getString("threadtitle"));
			c++;
		}
		// Close the RAM writer first so that all buffered documents are flushed
		// into ramDir before it is merged into the on-disk index
		ramWriter.close();
		writer.addIndexes(new Directory[] { ramDir });

		rs.close();
		stmt.close();	// release the statement as well as the result set
		System.out.println("Did Page " + page);
		return c;
	}
=================

The code for "PostIndexer.addToIndex" is:
===============
		Document doc = new Document();
		Field title = new Field("title", threadtitle, Field.Store.NO, Field.Index.TOKENIZED, Field.TermVector.NO);
		title.setBoost(2);

		doc.add(title);
		doc.add(new Field("text", posttext, Field.Store.NO, Field.Index.TOKENIZED, Field.TermVector.YES));
		doc.add(new Field("id", "" + postid, Field.Store.YES, Field.Index.UN_TOKENIZED));

		writer.addDocument(doc);
============

When I run this code, the first 500 entries get added in about 2 seconds, but
entries 1167*500 to (1167+1)*500 take more than 10 minutes. RAM usage also
increases dramatically. Is this normal behaviour, a mistake in my code, or a
bug in Lucene? I remember someone here on the list talking about this problem,
but I can't find the post anymore.
Thanks
-- 
Dominik Bruhn
mailto: dominik@dbruhn.de
http://www.dbruhn.de

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


Re: addIndexes getting slower and slower plus eating up Mem

Posted by Dominik Bruhn <do...@dbruhn.de>.
Hi,

On Friday 07 July 2006 12:23 mark harwood wrote:
> Out of interest, why are you using a RAMDirectory here? An IndexWriter uses
> one internally of size IndexWriter.setMaxBufferedDocs so you get the
> benefits of buffering automatically when writing to a File-based directory.
Really? I read about the trick here, so I used it. I'll try without it to see
if it makes a difference!

Thanks
-- 
Dominik Bruhn
mailto: dominik@dbruhn.de
http://www.dbruhn.de



Re: addIndexes getting slower and slower plus eating up Mem

Posted by mark harwood <ma...@yahoo.co.uk>.
The cause is that addIndexes() currently always runs an optimize after the merge. If I recall correctly, optimize() creates a complete copy of the existing index and then deletes the old one, so it shouldn't be done too often. In your loop, every 500-document batch triggers such a rewrite of the whole index, so each call gets slower as the index grows.
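
To put a rough number on that, here is a small back-of-the-envelope model in plain Java. It is not Lucene code. It assumes, as a simplification, that each optimize rewrites every document currently in the index exactly once; the class and method names are made up for illustration.

```java
public class OptimizeCostModel {

    /**
     * Counts how many documents get rewritten in total when the whole
     * index is copied once after every batch (simplified model, see above).
     */
    static long docsCopiedWithOptimizePerBatch(int batches, int batchSize) {
        long total = 0;
        long indexSize = 0;
        for (int b = 0; b < batches; b++) {
            indexSize += batchSize; // the new batch is merged in
            total += indexSize;     // the optimize rewrites the whole index
        }
        return total;
    }

    public static void main(String[] args) {
        int batchSize = 500;
        int batches = 2000; // 2000 * 500 = about 1 million documents
        long indexed = (long) batches * batchSize;
        long copied = docsCopiedWithOptimizePerBatch(batches, batchSize);
        System.out.println("documents indexed: " + indexed);
        System.out.println("documents copied:  " + copied);
    }
}
```

Under this model, indexing 1 million documents in batches of 500 copies about one billion documents in total, which would explain why the later pages take so much longer than the first one.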

Out of interest, why are you using a RAMDirectory here? An IndexWriter uses one internally, with its size controlled by IndexWriter.setMaxBufferedDocs, so you get the benefits of buffering automatically when writing to a file-based directory.
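
A minimal sketch of that approach, assuming the Lucene 2.0 API and its jar on the classpath: documents are added straight to the file-based writer, and the index is optimized once at the end. The index path, the StandardAnalyzer, the placeholder documents and the buffer size of 500 are assumptions for illustration, not taken from the original code.

```java
import java.io.IOException;

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.index.IndexWriter;

public class DirectIndexingSketch {
    public static void main(String[] args) throws IOException {
        // Hypothetical index path; create=true starts a new, empty index.
        IndexWriter writer = new IndexWriter("/tmp/post-index", new StandardAnalyzer(), true);
        // Number of documents buffered in memory before a segment is written to disk.
        writer.setMaxBufferedDocs(500);
        try {
            // Placeholder documents standing in for the rows read from the database.
            for (int postid = 1; postid <= 1000; postid++) {
                Document doc = new Document();
                doc.add(new Field("id", "" + postid, Field.Store.YES, Field.Index.UN_TOKENIZED));
                doc.add(new Field("text", "placeholder text for post " + postid,
                        Field.Store.NO, Field.Index.TOKENIZED));
                writer.addDocument(doc);
            }
            // Optimize once, after all documents are in, rather than per batch.
            writer.optimize();
        } finally {
            writer.close();
        }
    }
}
```

With this layout there is no per-batch addIndexes() call, so the full-index rewrite happens only once.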

Cheers
Mark


