Posted to java-user@lucene.apache.org by hu andy <an...@gmail.com> on 2006/03/23 08:45:19 UTC

Speed up Indexing

Hi, everyone. I have a large amount of XML files, 1 GB in size. I use Lucene
(the .NET edition) to index them. Each document has 8 fields: 4 keyword
fields and 4 unstored fields. I have set minMergeDocs to 10000 and
mergeFactor to 100. Indexing took about 2.5 hours (3 GB of main memory, P4
CPU). I also tried in-memory indexing, which also took more than 2.5 hours.
Due to a performance requirement, I need to complete the indexing in one hour
without using a distributed or clustered system. Is that possible? Is Java
Lucene faster than the .NET one? Any advice would be appreciated.
Thank you in advance.

Re: Speed up Indexing

Posted by Jeff Rodenburg <je...@gmail.com>.
I run Lucene.Net as well. Your indexing performance depends on more factors
than whether you're using the Java or C# version. As a basic suggestion,
learn what you can about minMergeDocs and mergeFactor, as well as the
compound file format. Try different combinations to see which are faster
and which are slower.
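
One way to try combinations systematically is a small timing harness. The
sketch below is plain Java and does not call Lucene at all: the Settings and
Result classes, the grid and timeAll methods, and the indexRun callback are
all hypothetical names made up for illustration. In a real test, indexRun
would be your own code that applies the three settings to a fresh index
writer and indexes a fixed sample of documents.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.function.Consumer;

// Sketch of a timing harness; none of these types come from Lucene.
class IndexTuningSketch {

    // One combination of the three settings named in the reply.
    static final class Settings {
        final int minMergeDocs;
        final int mergeFactor;
        final boolean compoundFile;

        Settings(int minMergeDocs, int mergeFactor, boolean compoundFile) {
            this.minMergeDocs = minMergeDocs;
            this.mergeFactor = mergeFactor;
            this.compoundFile = compoundFile;
        }

        @Override
        public String toString() {
            return "minMergeDocs=" + minMergeDocs
                    + " mergeFactor=" + mergeFactor
                    + " compoundFile=" + compoundFile;
        }
    }

    // A combination together with the wall-clock time of one run.
    static final class Result {
        final Settings settings;
        final long nanos;

        Result(Settings settings, long nanos) {
            this.settings = settings;
            this.nanos = nanos;
        }
    }

    // Enumerates every combination of the candidate values.
    static List<Settings> grid(int[] minMergeDocs, int[] mergeFactors, boolean[] compound) {
        List<Settings> out = new ArrayList<>();
        for (int m : minMergeDocs) {
            for (int f : mergeFactors) {
                for (boolean c : compound) {
                    out.add(new Settings(m, f, c));
                }
            }
        }
        return out;
    }

    // Runs indexRun once per combination and returns the timings, fastest first.
    // indexRun is a placeholder for the caller's own indexing job.
    static List<Result> timeAll(List<Settings> grid, Consumer<Settings> indexRun) {
        List<Result> results = new ArrayList<>();
        for (Settings s : grid) {
            long start = System.nanoTime();
            indexRun.accept(s);
            results.add(new Result(s, System.nanoTime() - start));
        }
        results.sort(Comparator.comparingLong(r -> r.nanos));
        return results;
    }
}
```

A single run per combination is noisy, so it is probably worth repeating each
run a few times on the same sample and writing to an empty directory each
time before trusting the ordering.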

As a strategy for your specific scenario, you might consider building
several indexes in parallel, then merging the indexes at the end.
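
The shape of that strategy can be sketched without Lucene. The toy below is
an assumption-laden stand-in, not Lucene code: each "index" is just an
in-memory map from term to document ids, and the class and method names
(ParallelIndexSketch, buildPartial, buildAndMerge) are invented for this
example. It splits the documents into slices, indexes each slice on its own
thread, and merges the partial results at the end. With Lucene itself the
final step would go through the library's own index-merging call (IndexWriter
has an addIndexes method for this, as far as I know) rather than hand-written
merging.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.TreeSet;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Toy stand-in for "build several indexes in parallel, then merge".
class ParallelIndexSketch {

    // Builds one partial index over docs[from, to) on the calling thread.
    // The document id is simply the position in the list.
    static Map<String, TreeSet<Integer>> buildPartial(List<String> docs, int from, int to) {
        Map<String, TreeSet<Integer>> index = new TreeMap<>();
        for (int id = from; id < to; id++) {
            for (String term : docs.get(id).toLowerCase().split("\\W+")) {
                if (!term.isEmpty()) {
                    index.computeIfAbsent(term, t -> new TreeSet<>()).add(id);
                }
            }
        }
        return index;
    }

    // Splits docs into contiguous slices, indexes each slice on a pool thread,
    // then merges the partial indexes into one. Requires workers >= 1.
    static Map<String, TreeSet<Integer>> buildAndMerge(List<String> docs, int workers) {
        ExecutorService pool = Executors.newFixedThreadPool(workers);
        try {
            List<Future<Map<String, TreeSet<Integer>>>> parts = new ArrayList<>();
            int chunk = (docs.size() + workers - 1) / workers; // ceiling division
            for (int from = 0; from < docs.size(); from += chunk) {
                final int start = from;
                final int end = Math.min(docs.size(), from + chunk);
                parts.add(pool.submit(() -> buildPartial(docs, start, end)));
            }
            // Merge step: union the doc-id sets of each term across all parts.
            Map<String, TreeSet<Integer>> merged = new TreeMap<>();
            for (Future<Map<String, TreeSet<Integer>>> part : parts) {
                for (Map.Entry<String, TreeSet<Integer>> e : part.get().entrySet()) {
                    merged.computeIfAbsent(e.getKey(), t -> new TreeSet<>()).addAll(e.getValue());
                }
            }
            return merged;
        } catch (InterruptedException | ExecutionException ex) {
            throw new RuntimeException("partial index build failed", ex);
        } finally {
            pool.shutdown();
        }
    }
}
```

Whether this helps on a single machine depends on where the time goes. If
parsing the XML and analyzing text dominate, extra threads can help on a
multi-core box; if the disk is the bottleneck, the gain may be small.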

Hope this helps.

-- j

