Posted to java-user@lucene.apache.org by Laxmilal Menariya <lm...@chambal.com> on 2009/08/10 09:23:17 UTC

Taking too much time in optimization

Hello everyone,

I have created a sample application that indexes file properties; it has
indexed approximately 107K files.

I was getting an OutOfMemoryError after about 100K documents while indexing,
which I traced to maxBufferedDocs=100K. After that I call the optimize()
method, which takes far too long (approximately 12 hours), and the index size
is more than 500 GB, which is too large.

I am using Lucene 2.4.0. Could someone please let me know what is wrong with
my configuration?

My configuration is:

      lucWriter = new IndexWriter("C:\\Laxmilal", new KeywordAnalyzer(), true);
      lucWriter.setMergeFactor(1000);
      lucWriter.setMaxMergeDocs(Integer.MAX_VALUE);
      lucWriter.setMaxBufferedDocs(100000);


-- 
Thanks,
Laxmilal Menariya

http://www.bucketexplorer.com/
http://www.sdbexplorer.com/
http://www.chambal.com/

Re: Taking too much time in optimization

Posted by Laxmilal Menariya <lm...@chambal.com>.
Thanks, I will try.

On Tue, Aug 11, 2009 at 6:08 AM, Otis Gospodnetic <
otis_gospodnetic@yahoo.com> wrote:

> Hi,
>
> That mergeFactor is too high.  I suggest going back to the default (10).
> maxBufferedDocs is an old and not very accurate setting (imagine what
> happens with the JVM heap if your indexer hits a SUPER LARGE document).
> Use setRAMBufferSizeMB instead.
>
> Otis
> --
> Sematext is hiring -- http://sematext.com/about/jobs.html?mls
> Lucene, Solr, Nutch, Katta, Hadoop, HBase, UIMA, NLP, NER, IR
>
>
>
> ----- Original Message ----
> > From: Laxmilal Menariya <lm...@chambal.com>
> > To: java-user@lucene.apache.org
> > Sent: Monday, August 10, 2009 3:23:17 AM
> > Subject: Taking too much time in optimization
> >
> > Hello everyone,
> >
> > I have created a sample application that indexes file properties; it has
> > indexed approximately 107K files.
> >
> > I was getting an OutOfMemoryError after about 100K documents while
> > indexing, which I traced to maxBufferedDocs=100K. After that I call the
> > optimize() method, which takes far too long (approximately 12 hours),
> > and the index size is more than 500 GB, which is too large.
> >
> > I am using Lucene 2.4.0. Could someone please let me know what is wrong
> > with my configuration?
> >
> > My configuration is:
> >
> >       lucWriter = new IndexWriter("C:\\Laxmilal", new KeywordAnalyzer(), true);
> >       lucWriter.setMergeFactor(1000);
> >       lucWriter.setMaxMergeDocs(Integer.MAX_VALUE);
> >       lucWriter.setMaxBufferedDocs(100000);
> >
> >
> > --
> > Thanks,
> > Laxmilal Menariya
> >
> > http://www.bucketexplorer.com/
> > http://www.sdbexplorer.com/
> > http://www.chambal.com/
>


-- 
Thanks,
Laxmilal Menariya

http://www.bucketexplorer.com/
http://www.sdbexplorer.com/
http://www.chambal.com/

Re: Taking too much time in optimization

Posted by Otis Gospodnetic <ot...@yahoo.com>.
Hi,

That mergeFactor is too high.  I suggest going back to the default (10).
maxBufferedDocs is an old and not very accurate setting (imagine what happens with the JVM heap if your indexer hits a SUPER LARGE document).  Use setRAMBufferSizeMB instead.
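
A minimal sketch of that configuration, assuming the Lucene 2.4.x API; the
64 MB buffer size is an illustrative value (not a figure from this thread),
and the index path and analyzer are carried over from the original post:

      import org.apache.lucene.analysis.KeywordAnalyzer;
      import org.apache.lucene.index.IndexWriter;

      // Sketch for Lucene 2.4.x; MaxFieldLength.UNLIMITED replaces the
      // deprecated three-argument constructor used in the original post.
      IndexWriter lucWriter = new IndexWriter(
              "C:\\Laxmilal", new KeywordAnalyzer(), true,
              IndexWriter.MaxFieldLength.UNLIMITED);

      // Keep the default merge factor of 10; at 1000, optimize() has to
      // merge an enormous number of segments in one pass.
      lucWriter.setMergeFactor(10);

      // Flush by RAM usage instead of document count, so heap use stays
      // bounded even if one very large document comes through. 64 MB is
      // an assumed value; tune it to your heap.
      lucWriter.setRAMBufferSizeMB(64.0);

      // Leave doc-count flushing disabled (this is the default).
      lucWriter.setMaxBufferedDocs(IndexWriter.DISABLE_AUTO_FLUSH);

With flushing tied to RAM rather than a document count, segment sizes stay
predictable regardless of how large individual documents are.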

Otis
--
Sematext is hiring -- http://sematext.com/about/jobs.html?mls
Lucene, Solr, Nutch, Katta, Hadoop, HBase, UIMA, NLP, NER, IR



----- Original Message ----
> From: Laxmilal Menariya <lm...@chambal.com>
> To: java-user@lucene.apache.org
> Sent: Monday, August 10, 2009 3:23:17 AM
> Subject: Taking too much time in optimization
> 
> Hello everyone,
> 
> I have created a sample application that indexes file properties; it has
> indexed approximately 107K files.
>
> I was getting an OutOfMemoryError after about 100K documents while indexing,
> which I traced to maxBufferedDocs=100K. After that I call the optimize()
> method, which takes far too long (approximately 12 hours), and the index
> size is more than 500 GB, which is too large.
>
> I am using Lucene 2.4.0. Could someone please let me know what is wrong with
> my configuration?
>
> My configuration is:
> 
>       lucWriter = new IndexWriter("C:\\Laxmilal", new KeywordAnalyzer(), true);
>       lucWriter.setMergeFactor(1000);
>       lucWriter.setMaxMergeDocs(Integer.MAX_VALUE);
>       lucWriter.setMaxBufferedDocs(100000);
> 
> 
> -- 
> Thanks,
> Laxmilal Menariya
> 
> http://www.bucketexplorer.com/
> http://www.sdbexplorer.com/
> http://www.chambal.com/

