Posted to java-user@lucene.apache.org by Marc Dumontier <md...@rogers.com> on 2003/06/27 05:57:07 UTC

parallelizing index building

Hi,

I'm indexing 500 XML files, each ~150 MB, on an 8-CPU machine.

I'm wondering what the best strategy for making maximum use of resources is. I have tweaked the single-process indexer to buffer 5000 records (not files) in memory before writing them out to disk.

Should I create an IndexThread and share the IndexWriter object across 5 threads, then monitor when one ends to start another, etc.? Or should I create different indexes and then do a series of merges?

Any help would be appreciated.

Thanks,
Marc Dumontier
Bioinformatics Application Developer
Blueprint Initiative
Mount Sinai Hospital
Toronto
http://www.bind.ca
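The first option Marc describes (one IndexWriter shared by a pool of worker threads) might look roughly like the sketch below. It is not code from the thread: it assumes a current Lucene release (9.x API) on the classpath, and the class name `SharedWriterIndexer`, the `body` field, and the idea of handing each file over as a list of already-extracted record strings are hypothetical stand-ins for the real XML parsing. `setMaxBufferedDocs(5000)` plays the role of Marc's "5000 records in memory before writing out to disk", and a fixed thread pool replaces the hand-written "monitor when one ends to start another" logic.

```java
// Sketch assuming Lucene 9.x on the classpath; class, method and field names are hypothetical.
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.TextField;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.FSDirectory;

public class SharedWriterIndexer {

    /** Indexes every "file" (here: a list of record strings) through one shared writer. */
    public static void indexShared(Path indexDir, List<List<String>> files, int threads)
            throws Exception {
        IndexWriterConfig cfg = new IndexWriterConfig(new StandardAnalyzer())
                .setOpenMode(IndexWriterConfig.OpenMode.CREATE)
                // Flush a new segment after about 5000 buffered documents (counted per
                // indexing thread in recent releases), or earlier if the RAM buffer fills.
                .setMaxBufferedDocs(5000);
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try (Directory dir = FSDirectory.open(indexDir);
             IndexWriter writer = new IndexWriter(dir, cfg)) {
            List<Future<?>> pending = new ArrayList<>();
            for (List<String> records : files) {
                // The pool starts the next file as soon as a worker becomes free.
                pending.add(pool.submit(() -> {
                    for (String text : records) {
                        Document doc = new Document();
                        doc.add(new TextField("body", text, Field.Store.NO));
                        writer.addDocument(doc); // IndexWriter is thread-safe
                    }
                    return null;
                }));
            }
            for (Future<?> f : pending) {
                f.get(); // wait for every file and rethrow any indexing failure
            }
        } finally {
            pool.shutdown();
        }
    }
}
```

Because all threads feed a single writer and a single directory, no merge step is needed afterwards; the trade-off against the separate-indexes approach is discussed in the replies.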

Re: parallelizing index building

Posted by Victor Hadianto <vi...@nuix.com.au>.
> Where can I find any sample code or documentation about merging a set of
> small indexes into one big index?

It's very simple: just add the index directories to your IndexWriter using
the addIndexes() method, call optimize(), and then close().

That's all.
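Victor's recipe translates roughly into the sketch below. It assumes a current Lucene release (9.x API), where `addIndexes` takes `Directory` arguments and `optimize()` has been replaced by `forceMerge(1)`; the class `MergeIndexes` and its `merge` method are hypothetical names. The part indexes should already be closed (no writer open on them), and the target must not be one of the parts.

```java
// Sketch assuming Lucene 9.x on the classpath; MergeIndexes/merge are hypothetical names.
import java.nio.file.Path;
import java.util.List;

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.FSDirectory;

public class MergeIndexes {

    /** Merges the (closed) indexes in {@code parts} into a fresh index at {@code target}. */
    public static void merge(Path target, List<Path> parts) throws Exception {
        Directory[] sources = new Directory[parts.size()];
        try {
            for (int i = 0; i < sources.length; i++) {
                sources[i] = FSDirectory.open(parts.get(i));
            }
            IndexWriterConfig cfg = new IndexWriterConfig(new StandardAnalyzer())
                    .setOpenMode(IndexWriterConfig.OpenMode.CREATE); // overwrite any old index
            try (Directory dest = FSDirectory.open(target);
                 IndexWriter writer = new IndexWriter(dest, cfg)) {
                writer.addIndexes(sources); // copy all segments of all parts
                writer.forceMerge(1);       // later name for optimize()
            }                               // close() commits the merged index
        } finally {
            for (Directory d : sources) {
                if (d != null) {
                    d.close();
                }
            }
        }
    }
}
```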


> Lixin

victor


-- 
Victor Hadianto

NUIX Pty Ltd
Level 8, 143 York Street, Sydney 2000
Phone: (02) 9283 9010
Fax:   (02) 9283 9020


---------------------------------------------------------------------
To unsubscribe, e-mail: lucene-user-unsubscribe@jakarta.apache.org
For additional commands, e-mail: lucene-user-help@jakarta.apache.org


RE: parallelizing index building

Posted by Lixin Meng <li...@fulldegree.com>.
Where can I find any sample code or documentation about merging a set of
small indexes into one big index?

Lixin

-----Original Message-----
From: Doug Cutting [mailto:cutting@lucene.com]
Sent: Monday, June 30, 2003 10:24 AM
To: Lucene Users List
Subject: Re: parallelizing index building

Marc Dumontier wrote:
> I'm indexing 500 XML files, each ~150 MB, on an 8-CPU machine.
>
> I'm wondering what the best strategy for making maximum use of resources
> is. I have tweaked the single-process indexer to buffer 5000 records (not
> files) in memory before writing them out to disk.
>
> Should I create an IndexThread and share the IndexWriter object across 5
> threads, then monitor when one ends to start another, etc.? Or should I
> create different indexes and then do a series of merges?

Creating multiple indexes in parallel and then merging them at the end
will probably be fastest.

Doug




Re: parallelizing index building

Posted by Doug Cutting <cu...@lucene.com>.
Marc Dumontier wrote:
> I'm indexing 500 XML files, each ~150 MB, on an 8-CPU machine.
> 
> I'm wondering what the best strategy for making maximum use of resources is. I have tweaked the single-process indexer to buffer 5000 records (not files) in memory before writing them out to disk.
> 
> Should I create an IndexThread and share the IndexWriter object across 5 threads, then monitor when one ends to start another, etc.? Or should I create different indexes and then do a series of merges?

Creating multiple indexes in parallel and then merging them at the end 
will probably be fastest.

Doug
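Doug's suggestion (independent indexes built in parallel, merged at the end) can be sketched as follows. This is a minimal sketch under stated assumptions, not code from the thread: it assumes the Lucene 9.x API on the classpath, takes each shard as a list of already-extracted record strings, and uses hypothetical names (`ParallelShardIndexer`, `build`, `indexShard`, the `body` field). Each worker owns a private IndexWriter and temporary directory, so workers never share a writer; the main thread then folds the finished shards into the final index with `addIndexes` and calls `forceMerge(1)`, the later name for `optimize()`.

```java
// Sketch assuming Lucene 9.x on the classpath; all names here are hypothetical.
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.TextField;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.FSDirectory;

public class ParallelShardIndexer {

    /** Indexes every shard into its own temporary index in parallel, then merges them. */
    public static void build(Path finalIndex, List<List<String>> shards) throws Exception {
        int threads = Math.max(1, Math.min(shards.size(),
                Runtime.getRuntime().availableProcessors()));
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<Path>> parts = new ArrayList<>();
            for (List<String> records : shards) {
                parts.add(pool.submit(() -> indexShard(records)));
            }
            IndexWriterConfig cfg = new IndexWriterConfig(new StandardAnalyzer())
                    .setOpenMode(IndexWriterConfig.OpenMode.CREATE);
            try (Directory dest = FSDirectory.open(finalIndex);
                 IndexWriter writer = new IndexWriter(dest, cfg)) {
                for (Future<Path> part : parts) {
                    // get() blocks until that shard's writer has been closed.
                    try (Directory src = FSDirectory.open(part.get())) {
                        writer.addIndexes(src);
                    }
                }
                writer.forceMerge(1);
            }
        } finally {
            pool.shutdown();
        }
    }

    /** Worker task: builds one private index; the temp directory is not cleaned up. */
    private static Path indexShard(List<String> records) throws Exception {
        Path dir = Files.createTempDirectory("shard");
        try (Directory d = FSDirectory.open(dir);
             IndexWriter w = new IndexWriter(d, new IndexWriterConfig(new StandardAnalyzer()))) {
            for (String text : records) {
                Document doc = new Document();
                doc.add(new TextField("body", text, Field.Store.NO));
                w.addDocument(doc);
            }
        } // closing the writer commits the shard
        return dir;
    }
}
```

For 500 input files, one reasonable choice is to group them into roughly one shard per CPU so the final merge has few inputs; the temporary shard directories are left on disk here and would need cleaning up in real use.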

