Posted to java-user@lucene.apache.org by Vince Taluskie <vi...@taluskie.com> on 2003/03/21 04:46:45 UTC

merging indexes and RamDirs?

Howdy All,

I am looking for ways to improve the speed of my indexing.
First, is it possible (and if so, how) to merge Lucene indexes of
similarly structured documents (same number and type of fields), or
to coordinate several machines updating the same index?  For my
application (an estimated 360M Lucene documents across 30k physical
files), I'd like to parallelize the indexing across as many CPUs as I
can and then merge the results back together - or use a MultiSearcher
across all the individual indexes if merging is not an option.
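
As a rough illustration of the parallelize-then-merge plan described
above, here is a minimal sketch in plain Java.  It deliberately does
not use Lucene: a toy term-to-document-id map stands in for an index,
and the names ShardMergeSketch, indexShard, merge and indexInParallel
are hypothetical.  In Lucene itself the merge step is usually done
with IndexWriter's addIndexes method; check the Javadoc for your
version for the exact signature.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.TreeSet;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

/** Toy stand-in for an index: term -> sorted set of document ids. Not Lucene. */
final class ShardMergeSketch {

    /** Builds a small in-memory inverted index for one shard of documents. */
    static Map<String, TreeSet<Integer>> indexShard(Map<Integer, String> docs) {
        Map<String, TreeSet<Integer>> index = new TreeMap<>();
        for (Map.Entry<Integer, String> doc : docs.entrySet()) {
            for (String term : doc.getValue().toLowerCase().split("\\s+")) {
                if (!term.isEmpty()) {
                    index.computeIfAbsent(term, t -> new TreeSet<>()).add(doc.getKey());
                }
            }
        }
        return index;
    }

    /** Merges per-shard indexes into one. Assumes document ids are unique across shards. */
    static Map<String, TreeSet<Integer>> merge(List<Map<String, TreeSet<Integer>>> shards) {
        Map<String, TreeSet<Integer>> merged = new TreeMap<>();
        for (Map<String, TreeSet<Integer>> shard : shards) {
            for (Map.Entry<String, TreeSet<Integer>> e : shard.entrySet()) {
                merged.computeIfAbsent(e.getKey(), t -> new TreeSet<>()).addAll(e.getValue());
            }
        }
        return merged;
    }

    /** Indexes each shard on a worker thread, then merges the per-shard results. */
    static Map<String, TreeSet<Integer>> indexInParallel(List<Map<Integer, String>> shards) {
        int cpus = Runtime.getRuntime().availableProcessors();
        int workers = Math.max(1, Math.min(shards.size(), cpus));
        ExecutorService pool = Executors.newFixedThreadPool(workers);
        try {
            List<Future<Map<String, TreeSet<Integer>>>> pending = new ArrayList<>();
            for (Map<Integer, String> shard : shards) {
                Callable<Map<String, TreeSet<Integer>>> task = () -> indexShard(shard);
                pending.add(pool.submit(task));
            }
            List<Map<String, TreeSet<Integer>>> built = new ArrayList<>();
            for (Future<Map<String, TreeSet<Integer>>> f : pending) {
                built.add(f.get()); // blocks until that shard is finished
            }
            return merge(built);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while indexing", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("a shard failed to index", e);
        } finally {
            pool.shutdown();
        }
    }
}
```

The point of the design is that the workers share no state: each
builds its own index, and only the final merge touches all of them.
If merging turns out to be too expensive, the per-shard indexes could
instead be left separate and searched together, which is the
MultiSearcher alternative mentioned above.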

Secondly, I'd like to know more about performing indexing in a
RAMDirectory and flushing those indexes back out to an FSDirectory.  I
was running some indexing tests on a Solaris-based machine and my
indexing speed went up by a factor of 3 when I pointed my indexing
program to store its index in a tmpfs (RAM-based) filesystem rather
than on a physical disk - so I would imagine that I'd see a similar
speedup with a RAMDirectory, and it would be portable to non-Solaris
machines as well.  Would it be as simple as getting a list() from the
RAMDir, then an openFile() on each file and writing that Stream out to
disk?
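
To make the "list the files, then write each one out" idea concrete,
here is a minimal sketch in plain Java.  It is not Lucene code: the
class RamFlushSketch and its methods writeFile, list and flushTo are
hypothetical stand-ins for a RAM-resident directory, used only to show
the shape of the copy loop.  A reasonable assumption is that whatever
is writing into memory has finished (and closed) before the copy
starts, so the files on disk form a consistent set.  With Lucene
itself, passing the RAMDirectory to IndexWriter's addIndexes on a
disk-backed writer is the route usually suggested instead of copying
files by hand; check the Javadoc for your version.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Map;
import java.util.TreeMap;

/** Toy stand-in for a RAM-resident directory: file name -> contents. Not Lucene. */
final class RamFlushSketch {
    private final Map<String, byte[]> files = new TreeMap<>();

    /** Stores a copy of the given bytes under the given (hypothetical) file name. */
    void writeFile(String name, byte[] contents) {
        files.put(name, contents.clone());
    }

    /** Returns the names of all files currently held in memory, in sorted order. */
    String[] list() {
        return files.keySet().toArray(new String[0]);
    }

    /** Copies every in-memory file into targetDir and returns how many were written. */
    int flushTo(Path targetDir) {
        try {
            Files.createDirectories(targetDir);
            for (String name : list()) {
                Files.write(targetDir.resolve(name), files.get(name));
            }
            return files.size();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

All the fast work happens against the in-memory map, and the disk is
touched once, sequentially, at the end, which is the same trade-off
the tmpfs experiment above was exploiting.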

Thanks,

Vince Taluskie


---------------------------------------------------------------------
To unsubscribe, e-mail: lucene-user-unsubscribe@jakarta.apache.org
For additional commands, e-mail: lucene-user-help@jakarta.apache.org


Re: merging indexes and RamDirs?

Posted by Otis Gospodnetic <ot...@yahoo.com>.
The most recent article about Lucene published on
http://www.onjava.com/ covers exactly this kind of thing.  It
should answer the questions in your email.

Otis

--- Vince Taluskie <vi...@taluskie.com> wrote:
> Howdy All,
> 
> I am interested in several things to improve the speed of my
> indexing.  First would be to find out if it's possible (as well as
> how) to merge Lucene indexes of similarly structured (same number of
> and type of fields) documents or coordinate several machines
> updating the same index.  For my application (estimate of 360M
> Lucene documents across 30k physical files), I'd like to parallelize
> the indexing across as many CPUs as I can and then merge the results
> back together - or use a MultiSearcher across all the individual
> indexes if merge is not an option.
> 
> Secondly, I'd like to know more about performing indexing in a
> RAMDirectory and flushing those indexes back out to a FSDirectory.
> I was performing some tests of indexing on a Solaris-based machine
> and my indexing speed went up by a factor of 3 when I pointed my
> indexing program to store its index in a tmpfs (ram-based)
> filesystem rather than a physical disk - so I would imagine that I'd
> see a similar speedup with a RAMDirectory and it would be portable
> to non-Solaris machines as well.  Would it be as simple as getting a
> list() from the RAMDir, then an openFile() on each file and writing
> that Stream out to disk?
> 
> Thanks,
> 
> Vince Taluskie

